Parsing various log formats

After an input module has received a log message and generated an event record for it, additional parsing may be required. This parsing can be implemented by a dedicated module, or in the NXLog language using regular expressions and other string manipulation functions.

The following sections provide configuration examples for parsing log formats commonly used by applications.

Common & Combined Log Formats

The Common Log Format (or NCSA Common Log Format) and the Combined Log Format are access log formats used by web servers. The two formats are identical, except that the Combined Log Format appends two additional fields, referer and user-agent, to each record.

Common Log Format syntax
host ident authuser [date] "request" status size
Combined Log Format syntax
host ident authuser [date] "request" status size "referer" "user-agent"

If a field is not available, a hyphen (-) is used as a placeholder.

Table 1. Fields
Field        Description
host         IP address of the client
ident        RFC 1413 identity of the client
authuser     Username of the user accessing the document (not applicable for public documents)
date         Timestamp of the request
request      Request line received from the client
status       HTTP status code returned to the client
size         Size of the object returned to the client, in bytes
referer      URL from which the user was referred
user-agent   User agent string sent by the client

Example 1. Parsing the Common Log Format

This configuration uses a regular expression to parse the fields in each record. The parsedate() function is used to convert the timestamp string into a datetime type for later processing or conversion as required.

nxlog.conf
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
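Before deploying the regular expression, it can help to check what each capture group receives. The Python sketch below is an illustration only, not NXLog code: it compiles the same pattern with the standard re module, applies the same hyphen checks, and uses datetime.strptime() with an assumed Apache-style timestamp format as a stand-in for parsedate(). The function name parse_common() and the sample line are hypothetical.

```python
import re
from datetime import datetime

# Same pattern as the Exec block above. In verbose mode (?x), unescaped
# whitespace (including the line break) is ignored, so only the escaped
# spaces ("\ ") match literal spaces.
COMMON_LOG_RE = re.compile(r'''(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                               \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)''')


def parse_common(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = COMMON_LOG_RE.match(line)
    if m is None:
        return None
    host, user, date, method, url, status, size = m.groups()
    fields = {
        "Hostname": host,
        # Stand-in for parsedate(); assumes timestamps such as
        # 10/Oct/2000:13:55:36 -0700.
        "EventTime": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "HTTPMethod": method,
        "HTTPURL": url,
        "HTTPResponseStatus": status,
    }
    # A hyphen means "not available", so the field is left undefined.
    if user != "-":
        fields["AccountName"] = user
    if size != "-":
        fields["FileSize"] = size
    return fields


print(parse_common('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
                   '"GET /apache_pb.gif HTTP/1.0" 200 2326'))
```

Because group 5 uses the greedy (.+), a request path containing spaces still ends up in $HTTPURL; a line only fails to match when the HTTP/x.y protocol marker is missing from the request line.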
Example 2. Parsing the Combined Log Format

This example is like the previous one, except that it also parses the two additional fields of the Combined Log Format. It also includes an om_file instance that discards all events not related to the user john and writes the remaining events to a file in JSON format.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>

<Output out>
    Module  om_file
    File    '/var/log/john_access.log'
    <Exec>
        if not (defined($AccountName) and ($AccountName == 'john')) drop();
        to_json();
    </Exec>
</Output>
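The following Python sketch approximates the combined effect of the Input and Output blocks in Example 2: match the Combined Log Format pattern, keep only events whose user is john, and serialize the remaining fields with json.dumps(). The function name john_only_json() and the sample line are hypothetical, the timestamp is left out for brevity, and NXLog's own JSON output (field order, datetime formatting) may differ.

```python
import json
import re

# The pattern from Example 2: the Common Log Format pattern plus the
# quoted referer and user-agent fields.
COMBINED_LOG_RE = re.compile(r'''(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                                 \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                                 \ \"([^\"]+)\"''')


def john_only_json(line):
    """Return a JSON string for john's requests, or None if the event is dropped."""
    m = COMBINED_LOG_RE.match(line)
    if m is None:
        return None
    host, user, _date, method, url, status, size, referer, agent = m.groups()
    # Mirrors the om_file Exec block: a "-" user leaves $AccountName
    # undefined, which also fails the test, so only john's events survive.
    if user != "john":
        return None
    record = {"Hostname": host, "AccountName": user, "HTTPMethod": method,
              "HTTPURL": url, "HTTPResponseStatus": status}
    # Hyphen placeholders are skipped, as in the Input block.
    for name, value in (("FileSize", size), ("HTTPReferer", referer),
                        ("HTTPUserAgent", agent)):
        if value != "-":
            record[name] = value
    return json.dumps(record)
```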

For information about using the Common and Combined Log Formats with the Apache HTTP Server, see Apache HTTP Server.

Parsing syslog events

The xm_syslog module provides the parse_syslog() procedure, which parses a BSD (RFC 3164) or IETF (RFC 5424) Syslog formatted raw event and creates the corresponding fields in the event record.

Example 3. Parsing a syslog event with parse_syslog()

This example shows a Syslog event as it is received via UDP and processed by the parse_syslog() procedure.

Syslog message
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2

The following configuration loads the xm_syslog extension module and then uses the Exec directive to execute the parse_syslog() procedure for each event.

nxlog.conf
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog();
</Input>

<Output out>
    Module  om_null
</Output>

This results in the following fields being added to the event record by parse_syslog().

Table 2. Syslog fields added by parse_syslog()
Field                  Value
$Message               Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
$SyslogSeverityValue   6
$SyslogSeverity        INFO
$SeverityValue         2
$Severity              INFO
$SyslogFacilityValue   4
$SyslogFacility        AUTH
$EventTime             2016-11-22 10:30:12
$Hostname              myhost
$SourceName            sshd
$ProcessID             8459
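The facility and severity values in Table 2 both come from the priority value <38> at the start of the message: PRI = facility * 8 + severity, and 38 = 4 * 8 + 6, which gives facility 4 (AUTH) and severity 6 (INFO). The Python sketch below shows that decoding together with a simplified BSD header match. It is an illustration under stated assumptions, not a reimplementation of parse_syslog(): it handles only the BSD format, does not set $EventTime, $Severity, or $SeverityValue, and the keyword lists are assumed names following the order of the RFC 5424 code tables (only AUTH and INFO are confirmed by Table 2). parse_bsd_syslog() is a hypothetical name.

```python
import re

# Assumed keywords, in the order of the RFC 5424 facility (0-23) and
# severity (0-7) tables; only AUTH and INFO are confirmed by Table 2.
FACILITIES = ["KERN", "USER", "MAIL", "DAEMON", "AUTH", "SYSLOG", "LPR", "NEWS",
              "UUCP", "CRON", "AUTHPRIV", "FTP", "NTP", "AUDIT", "ALERT", "CLOCK",
              "LOCAL0", "LOCAL1", "LOCAL2", "LOCAL3",
              "LOCAL4", "LOCAL5", "LOCAL6", "LOCAL7"]
SEVERITIES = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]

# Simplified BSD layout: <PRI>Mmm dd hh:mm:ss host tag[pid]: message
BSD_SYSLOG_RE = re.compile(
    r"^<(\d{1,3})>(\w{3} [ \d]\d \d\d:\d\d:\d\d) (\S+) ([^\s\[:]+)(?:\[(\d+)\])?: (.*)$")


def parse_bsd_syslog(line):
    """Return a dict of syslog fields, or None for unsupported input."""
    m = BSD_SYSLOG_RE.match(line)
    if m is None:
        return None
    pri, timestamp, host, tag, pid, message = m.groups()
    # PRI = facility * 8 + severity, so divmod() recovers both parts.
    facility, severity = divmod(int(pri), 8)
    if facility >= len(FACILITIES):
        return None  # a PRI above 191 is invalid
    fields = {
        "SyslogFacilityValue": facility,
        "SyslogFacility": FACILITIES[facility],
        "SyslogSeverityValue": severity,
        "SyslogSeverity": SEVERITIES[severity],
        "Hostname": host,
        "SourceName": tag,
        "Message": message,
        # BSD timestamps carry no year, so the string is kept as-is here.
        "Timestamp": timestamp,
    }
    if pid is not None:
        fields["ProcessID"] = int(pid)
    return fields
```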

Field Delimited Formats (CSV)

Many applications create and consume log files in which field values are separated by a delimiter such as a comma, space, or semicolon. The xm_csv module can both generate and parse these log formats. Multiple xm_csv instances can be used to reorder, add, remove, or modify fields before outputting to a different CSV log format.

Example 4. Complex CSV format conversion

This example reads records from the input file and parses them with the parse_csv() procedure of the csv1 instance, which defines the field names, types, and order within the record. The $date field is then set to the current time, and the $number field is set to 0 if it is not already defined. Finally, the to_csv() procedure of the csv2 instance generates output with the additional date field, a different delimiter, and a different field order.

nxlog.conf
# csv1: comma-delimited input with three typed fields
<Extension csv1>
    Module          xm_csv
    Fields          $id, $name, $number
    FieldTypes      integer, string, integer
    Delimiter       ','
</Extension>

# csv2: semicolon-delimited output with reordered fields and $date added
<Extension csv2>
    Module          xm_csv
    Fields          $id, $number, $name, $date
    Delimiter       ';'
</Extension>

<Input in>
    Module          im_file
    File            '/tmp/input'
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined($number) $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output out>
    Module          om_file
    File            '/tmp/output'
</Output>
Input sample
1, "John K.", 42
2, "Joe F.", 43
Output sample
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20
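The conversion in Example 4 can be traced step by step with Python's csv module. This sketch is an approximation: the field names id and name are assumed (the prose names only $number and $date), and the rule of quoting string fields while leaving numbers and dates bare is inferred from the output sample above rather than taken from the xm_csv documentation. convert_line() is a hypothetical helper; passing a fixed timestamp makes the output reproducible.

```python
import csv
from datetime import datetime


def quote(value):
    """Quote strings only, as in the output sample; numbers and dates stay bare."""
    if isinstance(value, str):
        return '"' + value.replace('"', '""') + '"'
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return str(value)


def convert_line(line, now=None):
    # First format: comma-delimited id, name, number. skipinitialspace
    # drops the blank after each comma so the quoted name is recognized.
    id_, name, number = next(csv.reader([line], skipinitialspace=True))
    record = {
        "id": int(id_),
        "name": name,
        # An empty (undefined) number defaults to 0.
        "number": int(number) if number else 0,
        "date": now if now is not None else datetime.now(),
    }
    # Second format: semicolon-delimited, reordered, with the date appended.
    return ";".join(quote(record[k]) for k in ("id", "number", "name", "date"))


print(convert_line('1, "John K.", 42', datetime(2011, 1, 15, 23, 45, 20)))
```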

JSON

The xm_json module provides procedures for generating and parsing log data in JSON format.

Example 5. Using the xm_json module for parsing JSON

This example reads JSON-formatted data from a file with the im_file module. The parse_json() procedure then parses the data, setting each JSON field to a field in the event record.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Input in>
    Module  im_file
    File    "/var/log/app.json"
    Exec    parse_json();
</Input>
Example 6. Using the xm_json module for generating JSON logs

Here, the to_json() procedure writes all fields of the event record to $raw_event in JSON format, which the om_file module then writes to a file.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Output out>
    Module  om_file
    File    "/var/log/json.log"
    Exec    to_json();
</Output>
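Examples 5 and 6 rely on each line of the file holding one JSON object. A rough Python equivalent of that round trip, using only the standard json module, is sketched below; parse_json_line() and to_json_line() are hypothetical names, and NXLog's treatment of nested objects and datetime values is not modeled.

```python
import json


def parse_json_line(raw_event):
    """Parse one line holding a JSON object into a dict of fields."""
    record = json.loads(raw_event)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object, got %s" % type(record).__name__)
    return record


def to_json_line(record):
    """Serialize every field into one compact line of JSON."""
    return json.dumps(record, separators=(",", ":"))


event = parse_json_line('{"Hostname": "myhost", "SeverityValue": 2}')
event["Processed"] = True
print(to_json_line(event))
```

Keeping each serialized record on a single line matters because a line-based reader such as im_file treats every line as a separate event.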

W3C Extended Log File Format

The W3C Extended Log File Format is described in a W3C specification draft. The dedicated xm_w3c parser module can be used to process W3C formatted logs. See also the W3C section in the Microsoft IIS chapter.

Log sample
#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html
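Unlike a fixed CSV layout, a W3C file names its own columns in the #Fields directive, and a new #Fields directive can appear later in the file. The Python sketch below illustrates that idea on the log sample above; it is not the xm_w3c implementation. It assumes single-space separators with no quoted values and treats a hyphen as a missing value; parse_w3c() is a hypothetical name.

```python
def parse_w3c(lines):
    """Yield one dict per data line, keyed by the most recent #Fields header."""
    fields = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # #Version, #Date and other directives are skipped; #Fields
            # names the space-separated columns of the lines that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if fields is None or not line:
            continue  # no header seen yet, or an empty line
        values = line.split(" ")
        # A hyphen marks a missing value, so it becomes None here.
        yield {name: (None if value == "-" else value)
               for name, value in zip(fields, values)}


sample = [
    "#Version: 1.0",
    "#Date: 2011-07-01 00:00:00",
    "#Fields: date time cs-method cs-uri",
    "2011-07-01 00:34:23 GET /foo/bar1.html",
]
for row in parse_w3c(sample):
    print(row)
```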
Example 7. Parsing W3C format with xm_w3c

This configuration reads the W3C format log file and parses it with the xm_w3c module. The fields in the event record are converted to JSON and the logs are forwarded via TCP.

nxlog.conf
<Extension _json>
    Module      xm_json
</Extension>

<Extension w3c_parser>
    Module      xm_w3c
</Extension>

<Input w3c>
    Module      im_file
    File        '/var/log/httpd-log'
    InputType   w3c_parser
</Input>

<Output tcp>
    Module      om_tcp
    Host        192.168.12.1
    Port        1514
    Exec        to_json();
</Output>

The W3C log format can also be parsed with the xm_csv module if using NXLog Community Edition.

Example 8. Parsing W3C format with xm_csv

The following configuration reads a W3C file and tokenizes it with the CSV parser. Header lines beginning with a hash mark (#) are dropped. The $EventTime field is set from the parsed date and time fields.

The Fields directive in the xm_csv module instance below must be updated to match the fields listed in the #Fields header of the W3C file to be parsed.
nxlog.conf
<Extension w3c_parser>
    Module          xm_csv
    Fields          $date, $time, $HTTPMethod, $HTTPURL
    FieldTypes      string, string, string, string
    Delimiter       ' '
    EscapeChar      '"'
    QuoteChar       '"'
    EscapeControl   FALSE
    UndefValue      -
</Extension>

<Extension _json>
    Module          xm_json
</Extension>

<Input w3c>
    Module          im_file
    File            '/var/log/httpd-log'
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            w3c_parser->parse_csv();
            $EventTime = parsedate($date + " " + $time);
        }
    </Exec>
</Input>

XML

The xm_xml module can be used for generating and parsing structured data in XML format.

Example 9. Using the xm_xml module for parsing XML formatted logs

This configuration uses the im_file module to read from a file. The parse_xml() procedure then parses the XML into fields in the event record.

nxlog.conf
<Extension _xml>
    Module  xm_xml
</Extension>

<Input in>
    Module  im_file
    File    "/var/log/app.xml"
    Exec    parse_xml();
</Input>
Example 10. Using the xm_xml module for generating XML logs

Here, the to_xml() procedure generates XML from the fields in the event record, and the om_file module writes it to a file.

nxlog.conf
<Extension _xml>
    Module  xm_xml
</Extension>

<Output out>
    Module  om_file
    File    "/var/log/logs.xml"
    Exec    to_xml();
</Output>
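To see the shape of the data that flows through Examples 9 and 10, the Python sketch below converts a flat record to XML and back with the standard xml.etree.ElementTree module. It assumes a flat <Event> element with one child element per field; the exact layout produced by to_xml(), and how parse_xml() handles attributes or nested elements, are not modeled. record_to_xml() and xml_to_record() are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Build <Event><Field>value</Field>...</Event> from a flat dict."""
    event = ET.Element("Event")
    for name, value in record.items():
        ET.SubElement(event, name).text = str(value)
    # encoding="unicode" returns a str without an XML declaration;
    # special characters such as < and & are escaped automatically.
    return ET.tostring(event, encoding="unicode")


def xml_to_record(raw_event):
    """Map each child element of the root to a field (values stay strings)."""
    root = ET.fromstring(raw_event)
    return {child.tag: child.text for child in root}


xml_text = record_to_xml({"Hostname": "myhost", "SeverityValue": 2})
print(xml_text)
print(xml_to_record(xml_text))
```

Note that the round trip returns every value as a string, so numeric fields such as SeverityValue would need converting back if their type matters.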