Parsing various log formats
After an input module has received a log message and generated an event record for it, additional parsing may be required. This parsing can be done by a dedicated module, or in the NXLog language using its regular expression and string manipulation features.
The following sections provide configuration examples for parsing log formats commonly used by applications.
Common & Combined Log Formats
The Common Log Format (also known as the NCSA Common Log Format) and the Combined Log Format are access log formats used by web servers. They are identical except that the Combined Log Format appends two fields, referer and user-agent.
host ident authuser [date] "request" status size
host ident authuser [date] "request" status size "referer" "user-agent"
If a field is not available, a hyphen (-) is used as a placeholder.
Field | Description |
---|---|
host | IP address of the client |
ident | RFC 1413 identity of the client |
authuser | Username of the user accessing the document (not applicable for public documents) |
date | Timestamp of the request |
request | Request line received from the client |
status | HTTP status code returned to the client |
size | Size of the object returned to the client, in bytes |
referer | URL from which the user was referred |
user-agent | User agent string sent by the client |
This configuration uses a regular expression to parse the fields in each record. The parsedate() function is used to convert the timestamp string into a datetime type for later processing or conversion as required.
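<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        # (?x) ignores unescaped whitespace, so the pattern can span two lines;
        # literal spaces are therefore written as "\ ".
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            # A hyphen means the value is unavailable, so the field is not set.
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>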
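To try the same pattern outside NXLog, the following is a minimal Python sketch of the Common Log Format parsing above. It is not part of NXLog: the sample line is illustrative, and the strptime() format assumes the usual Apache timestamp layout (for example 10/Oct/2000:13:55:36 -0700).

```python
import re
from datetime import datetime

# Same layout as the NXLog regular expression above: host, ident (matched but
# not captured), authuser, [date], "method URL HTTP/x.y", status, size.
CLF_RE = re.compile(
    r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP/\d\.\d" (\S+) (\S+)'
)


def parse_clf(line):
    """Return a dict of event fields, or None if the line does not match."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    host, user, date, method, url, status, size = m.groups()
    record = {
        "Hostname": host,
        # Assumed Apache-style timestamp, e.g. "10/Oct/2000:13:55:36 -0700".
        "EventTime": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "HTTPMethod": method,
        "HTTPURL": url,
        "HTTPResponseStatus": status,
    }
    # A hyphen is only a placeholder, so leave those fields unset.
    if user != "-":
        record["AccountName"] = user
    if size != "-":
        record["FileSize"] = size
    return record


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_clf(line))
```

As in the NXLog version, unavailable values are simply absent from the record rather than stored as "-".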
This example is like the previous one, except that it also parses the two additional fields of the Combined Log Format. It also shows an om_file instance configured to discard all events not related to the user john and to write the remaining events to a file in JSON format.
<Extension _json>
    Module  xm_json
</Extension>
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>
<Output out>
    Module  om_file
    File    '/var/log/john_access.log'
    <Exec>
        if not (defined($AccountName) and ($AccountName == 'john')) drop();
        to_json();
    </Exec>
</Output>
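The parse, filter, and JSON steps of this example can be sketched in Python as follows. This is an illustration only, not how NXLog implements them: the two sample lines are made up, and EventTime is kept as a string here instead of being converted as parsedate() does.

```python
import json
import re

# Common Log Format regex plus the quoted "referer" and "user-agent" fields.
COMBINED_RE = re.compile(
    r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP/\d\.\d" (\S+) (\S+)'
    r' "([^"]+)" "([^"]+)"'
)
NAMES = ("Hostname", "AccountName", "EventTime", "HTTPMethod", "HTTPURL",
         "HTTPResponseStatus", "FileSize", "HTTPReferer", "HTTPUserAgent")
# Fields that the NXLog Exec block only sets when they are not a hyphen.
OPTIONAL = {"AccountName", "FileSize", "HTTPReferer", "HTTPUserAgent"}


def parse_combined(line):
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return {name: value for name, value in zip(NAMES, m.groups())
            if not (name in OPTIONAL and value == "-")}


def john_events_as_json(lines):
    """Yield a JSON line for each event whose AccountName is 'john'."""
    for line in lines:
        record = parse_combined(line)
        # Unparsed lines have no AccountName, so they are dropped as well.
        if record is not None and record.get("AccountName") == "john":
            yield json.dumps(record)


logs = [
    '10.0.0.5 - john [22/Nov/2016:10:30:12 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.6 - jane [22/Nov/2016:10:31:00 +0000] "GET /about.html HTTP/1.1" '
    '404 - "-" "curl/7.50"',
]
for event in john_events_as_json(logs):
    print(event)
```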
For information about using the Common and Combined Log Formats with the Apache HTTP Server, see the Apache HTTP Server chapter.
Parsing syslog events
The xm_syslog module provides the parse_syslog() procedure, which parses a BSD (RFC 3164) or IETF (RFC 5424) Syslog formatted raw event and creates fields in the event record.
This example shows a Syslog event as it is received via UDP and processed by the parse_syslog() procedure.
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
The following configuration loads the xm_syslog extension module and then uses the Exec directive to execute the parse_syslog() procedure for each event.
<Extension _syslog>
    Module  xm_syslog
</Extension>
<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog();
</Input>
<Output out>
    Module  om_null
</Output>
This results in the following fields being added to the event record by parse_syslog().
Field | Value |
---|---|
$Message | Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2 |
$SyslogSeverityValue | 6 |
$SyslogSeverity | INFO |
$SeverityValue | 2 |
$Severity | INFO |
$SyslogFacilityValue | 4 |
$SyslogFacility | AUTH |
$EventTime | 2016-11-22 10:30:12 |
$Hostname | myhost |
$SourceName | sshd |
$ProcessID | 8459 |
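The numeric values in this table come from the <38> priority prefix: PRI = facility × 8 + severity, so 38 decodes to facility 4 (AUTH) and severity 6 (INFO). The Python sketch below shows that arithmetic and a simplified split of the BSD header. It is not the xm_syslog implementation. The severity keywords other than INFO are an assumption made for this sketch, NXLog's normalized $SeverityValue scale is not reproduced, and because a BSD timestamp carries no year, the caller must supply one.

```python
import re
from datetime import datetime

# BSD (RFC 3164) style: <PRI>Mmm dd hh:mm:ss HOST TAG[PID]: MESSAGE
BSD_RE = re.compile(
    r'^<(\d{1,3})>([A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) (\S+) '
    r'([^\s\[:]+)(?:\[(\d+)\])?: (.*)$'
)
# Severity codes 0-7; keywords other than INFO are chosen for this sketch.
SEVERITIES = ("EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG")


def parse_bsd_syslog(line, year):
    """Split a BSD Syslog line into fields; the header itself has no year."""
    m = BSD_RE.match(line)
    if m is None:
        return None
    pri, stamp, host, tag, pid, message = m.groups()
    facility, severity = divmod(int(pri), 8)  # PRI = facility * 8 + severity
    return {
        "SyslogFacilityValue": facility,
        "SyslogSeverityValue": severity,
        "SyslogSeverity": SEVERITIES[severity],
        "EventTime": datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"),
        "Hostname": host,
        "SourceName": tag,
        "ProcessID": int(pid) if pid else None,
        "Message": message,
    }


sample = ("<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid "
          "user linda from 192.168.1.60 port 38176 ssh2")
print(parse_bsd_syslog(sample, 2016))
```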
Field Delimited Formats (CSV)
Many applications write logs in which field values are separated by a delimiter such as a comma, space, or semicolon. The xm_csv module can both generate and parse these formats. Multiple xm_csv instances can be used to reorder, add, remove, or modify fields before writing them out in a different CSV format.
This example reads from the input file and parses it with the parse_csv() procedure of the csv1 instance, where the field names, types, and order within the record are defined. The $date field is then set to the current time, and the $number field is set to 0 if it is not already defined. Finally, the to_csv() procedure of the csv2 instance generates output with the additional date field, a different delimiter, and a different field order.
<Extension csv1>
    Module      xm_csv
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ,
</Extension>
<Extension csv2>
    Module      xm_csv
    Fields      $id, $number, $name, $date
    Delimiter   ;
</Extension>
<Input in>
    Module      im_file
    File        "tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>
<Output out>
    Module      om_file
    File        "tmp/output"
</Output>
Input sample:
1, "John K.", 42
2, "Joe F.", 43
Output sample:
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20
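The same conversion can be sketched in Python with the standard csv module, which may help when checking expected output. This is an illustration under stated assumptions, not xm_csv: only the name column is quoted, to mirror the output sample, and xm_csv's own quoting rules may differ. A fixed timestamp stands in for now() so the result is reproducible.

```python
import csv
from datetime import datetime


def convert(lines, now):
    """Re-emit 'id, "name", number' rows as 'id;number;"name";date'."""
    out = []
    # skipinitialspace drops the blank after each comma so that the quoted
    # name is recognised as a single quoted field.
    for row in csv.reader(lines, skipinitialspace=True):
        id_, name, number = (row + ["", "", ""])[:3]  # pad short rows
        number = int(number) if number else 0  # $number defaults to 0
        quoted_name = '"' + name.replace('"', '""') + '"'
        out.append(f"{int(id_)};{number};{quoted_name};{now:%Y-%m-%d %H:%M:%S}")
    return out


stamp = datetime(2011, 1, 15, 23, 45, 20)  # fixed time instead of now()
for row in convert(['1, "John K.", 42', '2, "Joe F.", 43'], stamp):
    print(row)
```

Running it on the input sample prints the two rows of the output sample.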
JSON
The xm_json module provides procedures for generating and parsing log data in JSON format.
This example reads JSON-formatted data from a file with the im_file module. The parse_json() procedure then parses the data, setting each JSON field as a field in the event record.
<Extension _json>
    Module  xm_json
</Extension>
<Input in>
    Module  im_file
    File    "/var/log/app.json"
    Exec    parse_json();
</Input>
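Because im_file reads the file line by line by default, each line is expected to hold one complete JSON object; a pretty-printed object spread over several lines would arrive as several partial events. The Python sketch below illustrates that line-per-record model. It is not parse_json() itself, and the sample records are made up.

```python
import json


def parse_json_lines(lines):
    """Parse one JSON object per line, as when each line is read as one event."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError("each line must hold a JSON object")
        records.append(obj)  # top-level keys become the event's fields
    return records


events = parse_json_lines([
    '{"EventTime": "2016-11-22 10:30:12", "Message": "login ok"}',
    '',
    '{"Severity": "INFO", "ProcessID": 8459}',
])
print(events)
```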
W3C Extended Log File Format
The W3C Extended Log File Format is described in a W3C specification draft. The dedicated xm_w3c parser module can be used to process W3C-formatted logs. See also the W3C section in the Microsoft IIS chapter.
#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html
This configuration reads the W3C format log file and parses it with the xm_w3c module. The fields in the event record are converted to JSON and the logs are forwarded via TCP.
<Extension _json>
    Module      xm_json
</Extension>
<Extension w3c_parser>
    Module      xm_w3c
</Extension>
<Input w3c>
    Module      im_file
    File        '/var/log/httpd-log'
    InputType   w3c_parser
</Input>
<Output tcp>
    Module      om_tcp
    Host        192.168.12.1
    Port        1514
    Exec        to_json();
</Output>
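A W3C parser can take its field names from the #Fields directive instead of a fixed list. The Python sketch below shows that idea on the sample log above. It is an illustration, not xm_w3c's implementation: it skips other directives, maps "-" to an undefined value, and splits values on whitespace without handling quoted values. The JSON conversion and TCP forwarding are not shown.

```python
def parse_w3c(lines):
    """Yield one dict per data line, keyed by the latest #Fields directive."""
    fields = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue  # #Version, #Date, etc. carry no event data
        if fields is None:
            raise ValueError("data line seen before any #Fields directive")
        # Simplification: quoted values containing spaces are not handled.
        values = line.split()
        yield {name: (None if value == "-" else value)
               for name, value in zip(fields, values)}


sample = """#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html""".splitlines()
for record in parse_w3c(sample):
    print(record)
```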
The W3C log format can also be parsed with the xm_csv module if you are using NXLog Community Edition. The following configuration reads a W3C file and tokenizes it with the CSV parser. Header lines starting with a hash mark (#) are ignored. The $EventTime field is set from the parsed date and time fields.
The fields in the xm_csv module instance below must be updated to correspond with the fields in the W3C file to be parsed.
<Extension w3c_parser>
    Module          xm_csv
    Fields          $date, $time, $HTTPMethod, $HTTPURL
    FieldTypes      string, string, string, string
    Delimiter       ' '
    EscapeChar      '"'
    QuoteChar       '"'
    EscapeControl   FALSE
    UndefValue      -
</Extension>
<Extension _json>
    Module          xm_json
</Extension>
<Input w3c>
    Module          im_file
    File            '/var/log/httpd-log'
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            w3c_parser->parse_csv();
            $EventTime = parsedate($date + " " + $time);
        }
    </Exec>
</Input>
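The fixed-field approach of this configuration can be sketched in Python with a space-delimited CSV reader: header lines are skipped, a quoted value may contain spaces, "-" plays the role of UndefValue, and the timestamp is built from the date and time fields. This is only an illustration of the technique, not xm_csv itself; the FIELDS list and the sample lines are assumptions for the sketch.

```python
import csv
from datetime import datetime

# Must list the columns in the same order as the file's #Fields header.
FIELDS = ["date", "time", "HTTPMethod", "HTTPURL"]


def parse_w3c_fixed(lines):
    """Tokenize W3C data lines with a space-delimited CSV reader."""
    data = (line for line in lines if line.strip() and not line.startswith("#"))
    records = []
    for row in csv.reader(data, delimiter=" ", quotechar='"'):
        # "-" acts like UndefValue: the field is left undefined (None).
        record = {name: (None if value == "-" else value)
                  for name, value in zip(FIELDS, row)}
        record["EventTime"] = datetime.strptime(
            record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S")
        records.append(record)
    return records


lines = [
    "#Fields: date time cs-method cs-uri",
    "2011-07-01 00:34:23 GET /foo/bar1.html",
    '2011-07-01 12:21:16 GET "/foo/bar 2.html"',
    "2011-07-01 12:45:52 GET -",
]
print(parse_w3c_fixed(lines))
```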
XML
The xm_xml module can be used for generating and parsing structured data in XML format.
This configuration uses the im_file module to read from a file. The parse_xml() procedure then parses the XML into fields in the event record.
<Extension _xml>
    Module  xm_xml
</Extension>
<Input in>
    Module  im_file
    File    "/var/log/app.xml"
    Exec    parse_xml();
</Input>
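To illustrate the general idea of turning an XML record into event fields, here is a minimal Python sketch using xml.etree.ElementTree. It assumes a hypothetical flat layout in which each record is one element whose direct children become fields. It is not parse_xml(), and it makes no claim about how NXLog handles attributes or nested elements.

```python
import xml.etree.ElementTree as ET


def parse_xml_record(text):
    """Map each direct child of a flat XML record to a field of the same name."""
    root = ET.fromstring(text)
    # Attributes and deeper nesting are ignored in this sketch.
    return {child.tag: (child.text or "") for child in root}


record = parse_xml_record(
    "<Event><Hostname>myhost</Hostname><Message>disk full</Message></Event>")
print(record)
```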