Parsing various log formats
After an input module has received a log message and generated an event record for it, additional parsing may be required. This parsing can be implemented by a dedicated module, or in the NXLog language using regular expressions and other string manipulation functions.
The following sections provide configuration examples for parsing log formats commonly used by applications.
Common & Combined Log Formats
The Common Log Format (or NCSA Common Log Format) and Combined Log Format are access log formats used by web servers. These are the same, except that the Combined Log Format uses two additional fields.
Common Log Format syntax:
host ident authuser [date] "request" status size
Combined Log Format syntax:
host ident authuser [date] "request" status size "referer" "user-agent"
If a field is not available, a hyphen (-) is used as a placeholder.
| Field | Description | 
|---|---|
| host | IP address of the client | 
| ident | RFC 1413 identity of the client | 
| authuser | Username of the user accessing the document (not applicable for public documents) | 
| date | Timestamp of the request | 
| request | Request line received from the client | 
| status | HTTP status code returned to the client | 
| size | Size of the object returned to the client (measured in bytes) | 
| referer | URL from which the user was referred | 
| user-agent | User agent string sent by the client | 
This configuration uses a regular expression to parse the fields in each record. The parsedate() function is used to convert the timestamp string into a datetime type for later processing or conversion as required.
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
This example is like the previous one, except that it also parses the two additional fields unique to the Combined Log Format. It also includes an om_file instance configured to discard all events not related to the user john and to write the remaining events to a file in JSON format.
<Extension _json>
    Module  xm_json
</Extension>
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>
<Output out>
    Module  om_file
    File    '/var/log/john_access.log'
    <Exec>
        if not (defined($AccountName) and ($AccountName == 'john')) drop();
        to_json();
    </Exec>
</Output>
For information about using the Common and Combined Log Formats with the Apache HTTP Server, see Apache HTTP Server.
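Regular expression captures are strings, so the status and size fields set by the configurations above hold string values. Where numeric comparison is needed, the captures can be converted with the integer() function. The following is a minimal sketch of that idea, not a definitive configuration: it narrows the status capture to digits so the conversion cannot fail on valid input, and the $ServerError field is a hypothetical name used only for illustration.

```
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\d+)\ (\S+)/
        {
            # Store the status code as an integer rather than a string
            $HTTPResponseStatus = integer($6);
            # The size is a hyphen when no value is available, so convert
            # only real values
            if $7 != '-' $FileSize = integer($7);
            # Hypothetical field: flag server-side errors (5xx)
            if $HTTPResponseStatus >= 500 $ServerError = TRUE;
        }
    </Exec>
</Input>
```

With integer fields, later processing can compare values numerically instead of lexically.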
Parsing syslog events
The xm_syslog module provides the parse_syslog() procedure, which will parse a BSD or IETF Syslog formatted raw event to create fields in the event record.
This example shows a Syslog event as it is received via UDP and processed by the parse_syslog() procedure.
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
The following configuration loads the xm_syslog extension module and then uses the Exec directive to execute the parse_syslog() procedure for each event.
<Extension _syslog>
    Module  xm_syslog
</Extension>
<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog();
</Input>
<Output out>
    Module  om_null
</Output>
This results in the following fields being added to the event record by parse_syslog().
| Field | Value | 
|---|---|
| $Message | Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2 | 
| $SyslogSeverityValue | 6 | 
| $SyslogSeverity | INFO | 
| $SeverityValue | 2 | 
| $Severity | INFO | 
| $SyslogFacilityValue | 4 | 
| $SyslogFacility | AUTH | 
| $EventTime | 2016-11-22 10:30:12 | 
| $Hostname | myhost | 
| $SourceName | sshd | 
| $ProcessID | 8459 | 
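Once parse_syslog() has populated the fields listed above, the free-form $Message field can be parsed further with a regular expression. The sketch below assumes the sshd message layout shown in the sample event. The target field names $AccountName, $SourceIPAddress, and $SourcePort are chosen for illustration and are not set by xm_syslog itself.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # $SourceName and $Message are set by parse_syslog() above
        if $SourceName == 'sshd' and
           $Message =~ /^Failed password for invalid user (\S+) from (\S+) port (\d+)/
        {
            $AccountName = $1;
            $SourceIPAddress = $2;
            # Captures are strings; convert the port to an integer
            $SourcePort = integer($3);
        }
    </Exec>
</Input>
```

For the sample event, the captures would be linda, 192.168.1.60, and 38176.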
Field Delimited Formats (CSV)
Many applications write and read log files in which field values are delimited by commas, spaces, or semicolons. The xm_csv module can both generate and parse these log formats. Multiple xm_csv instances can be used to reorder, add, remove, or modify fields before outputting to a different CSV log format.
This example reads from the input file and parses it with the parse_csv() procedure from the csv1 instance, where the field names, types, and order within the record are defined. The $date field is then set to the current time and the $number field is set to 0 if it is not already defined. Finally, the to_csv() procedure from the csv2 instance is used to generate output with the additional date field, a different delimiter, and a different field order.
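The steps described above can be sketched as follows. This is a reconstruction from the description, not a definitive configuration: the field names $id and $name, the field types, the delimiters, and the file paths are assumptions made for illustration.

```
<Extension csv1>
    Module      xm_csv
    # Field names, types, and order of the input records (assumed)
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ','
</Extension>

<Extension csv2>
    Module      xm_csv
    # Output layout: extra $date field, different order and delimiter
    Fields      $id, $number, $name, $date
    Delimiter   ';'
</Extension>

<Input filein>
    Module      im_file
    # Hypothetical input path
    File        "/tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output fileout>
    Module      om_file
    # Hypothetical output path
    File        "/tmp/output"
</Output>

<Route r>
    Path        filein => fileout
</Route>
```

Using two instances keeps the input and output layouts independent, so either can be changed without touching the other.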
Input sample:
1, "John K.", 42
2, "Joe F.", 43
Output sample:
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20
JSON
The xm_json module provides procedures for generating and parsing log data in JSON format.
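For the generating direction, a minimal sketch follows. It assumes Syslog input read from a hypothetical file path, parses each event with parse_syslog() from xm_syslog, and writes the resulting event record as JSON with the to_json() procedure.

```
<Extension _json>
    Module  xm_json
</Extension>

<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog_in>
    Module  im_file
    # Hypothetical input path
    File    "/var/log/messages"
    Exec    parse_syslog();
</Input>

<Output json_out>
    Module  om_file
    # Hypothetical output path
    File    "/var/log/messages.json"
    # to_json() replaces $raw_event with the JSON form of the event record
    Exec    to_json();
</Output>

<Route r>
    Path    syslog_in => json_out
</Route>
```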
The following example reads JSON-formatted data from a file with the im_file module. The parse_json() procedure then parses the data, setting each JSON field to a field in the event record.
<Extension _json>
    Module  xm_json
</Extension>
<Input in>
    Module  im_file
    File    "/var/log/app.json"
    Exec    parse_json();
</Input>
W3C Extended Log File Format
See the specification draft of the W3C format. The dedicated xm_w3c parser module can be used to process W3C formatted logs. See also the W3C section in the Microsoft IIS chapter.
#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html
This configuration reads the W3C format log file and parses it with the xm_w3c module. The fields in the event record are converted to JSON and the logs are forwarded via TCP.
<Extension _json>
    Module      xm_json
</Extension>
<Extension w3c_parser>
    Module      xm_w3c
</Extension>
<Input w3c>
    Module      im_file
    File        '/var/log/httpd-log'
    InputType   w3c_parser
</Input>
<Output tcp>
    Module      om_tcp
    Host        192.168.12.1
    Port        1514
    Exec        to_json();
</Output>
The W3C log format can also be parsed with the xm_csv module if using NXLog Community Edition.
The following configuration reads a W3C file and tokenizes it with the CSV parser. Header lines starting with a leading hash mark (#) are ignored. The $EventTime field is set from the parsed date and time fields.
Note: The fields in the xm_csv module instance below must be updated to correspond with the fields in the W3C file to be parsed.
<Extension w3c_parser>
    Module          xm_csv
    Fields          $date, $time, $HTTPMethod, $HTTPURL
    FieldTypes      string, string, string, string
    Delimiter       ' '
    EscapeChar      '"'
    QuoteChar       '"'
    EscapeControl   FALSE
    UndefValue      -
</Extension>
<Extension _json>
    Module          xm_json
</Extension>
<Input w3c>
    Module          im_file
    File            '/var/log/httpd-log'
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            w3c_parser->parse_csv();
            $EventTime = parsedate($date + " " + $time);
        }
    </Exec>
</Input>
XML
The xm_xml module can be used for generating and parsing structured data in XML format.
This configuration uses the im_file module to read from a file. The parse_xml() procedure then parses the XML into fields in the event record.
<Extension _xml>
    Module  xm_xml
</Extension>
<Input in>
    Module  im_file
    File    "/var/log/app.xml"
    Exec    parse_xml();
</Input> 
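Because parse_xml() produces ordinary event record fields, the parsed data can be written out in another format. The sketch below is one possible arrangement under stated assumptions: each line of the input file holds one complete XML document, and the output path is hypothetical. It converts the parsed events to JSON with to_json() from xm_json.

```
<Extension _xml>
    Module  xm_xml
</Extension>

<Extension _json>
    Module  xm_json
</Extension>

<Input in>
    Module  im_file
    File    "/var/log/app.xml"
    # Assumes one complete XML document per line
    Exec    parse_xml();
</Input>

<Output out>
    Module  om_file
    # Hypothetical output path
    File    "/var/log/app_converted.json"
    Exec    to_json();
</Output>

<Route r>
    Path    in => out
</Route>
```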
   