W3C (xm_w3c)

This module provides a parser that can process data in the W3C Extended Log File Format. It also understands the Zeek format and Microsoft Exchange Message Tracking logs. While the xm_csv module can be used to parse these formats, xm_w3c has the advantage of automatically extracting information from the headers. This makes it much easier to parse such log files without the need to explicitly define the fields that appear in the input.

A common W3C log source is Microsoft IIS, which produces output like the following:

#Software: Microsoft Internet Information Services 7.0
#Version: 1.0
#Date: 2010-02-13 07:08:22
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2010-02-13 07:08:21 W3SVC76 DNP1WEB1 174.120.30.2 GET / - 80 - 61.135.169.37 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.0.1)+Gecko/2008070208+Firefox/3.0.1 - http://www.baidu.com/s?wd=QQ www.domain.com 200 0 0 29554 273 1452
2010-02-13 07:25:00 W3SVC76 DNP1WEB1 174.120.30.2 GET /index.htm - 80 - 119.63.198.110 HTTP/1.1 Baiduspider+(+http://www.baidu.jp/spider/) - - www.itcsoftware.com 200 0 0 17791 210 551

The format generated by Zeek is similar, as it too defines the field names in the header. The field types and separator characters are also specified in the header. This allows the parser to automatically process the data. Below is a sample from Zeek:

#separator \x09
#set_separator  ,
#empty_field    (empty)
#unset_field    -
#path   dns
#open   2013-04-09-21-01-43
#fields ts      uid     id.orig_h       id.orig_p       id.resp_h       id.resp_p       proto   trans_id        query   qclass  qclass_name     qtype   qtype_name      rcode   rcode_name      AA      TC      RD      RA      Z       answers TTLs
#types  time    string  addr    port    addr    port    enum    count   string  count   string  count   string  count   string  bool    bool    bool    bool    count   vector[string]  vector[interval]
1210953058.350065       m2EJRWK7sCg     192.168.2.16    1920    192.168.2.1     53      udp     16995   ipv6.google.com 1       C_INTERNET      28      AAAA    0       NOERROR F       F       T       T       0       ipv6.l.google.com,2001:4860:0:2001::68       8655.000000,300.000000
1210953058.350065       m2EJRWK7sCg     192.168.2.16    1920    192.168.2.1     53      udp     16995   ipv6.google.com 1       C_INTERNET      28      AAAA    0       NOERROR F       F       T       T       0       ipv6.l.google.com,2001:4860:0:2001::68       8655.000000,300.000000

To use the parser in an input module, the InputType directive must reference the instance name of the xm_w3c module. See the example below.
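
As a minimal sketch of this wiring for a Zeek log, the fragment below declares an xm_w3c instance and references it from an im_file input. The instance names (w3c_parser, zeek_dns) and the file path are assumptions made for illustration and do not come from the original. No Delimiter is set, on the assumption that the separator declared in the Zeek header is picked up by the parser as described above.

```
<Extension w3c_parser>
    Module      xm_w3c
</Extension>

<Input zeek_dns>
    Module      im_file
    # Illustrative path only; adjust to where Zeek writes its logs
    File        '/opt/zeek/logs/current/dns.log'
    # Must match the xm_w3c instance name declared above
    InputType   w3c_parser
</Input>
```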

Configuration

The xm_w3c module accepts the following directives in addition to the common module directives.

Optional directives

Delimiter

This optional directive takes a single character as its argument and sets the delimiter used to separate fields. The accepted notations are described under Specifying Quote, Escape, and Delimiter Characters below. If this directive is not specified, the parser detects whether the space or the tab character is used as the delimiter. For Microsoft Exchange Message Tracking logs the comma must be set as the delimiter:

Delimiter ,

There is no delimiter after the last field in W3C, but Microsoft Exchange Message Tracking logs can contain a trailing comma.
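
A sketch of a complete parser and input pair for Exchange Message Tracking logs might look like the following. The instance names and the log path are assumptions: the path shown is a commonly used default for Exchange installations and should be verified on the server in question.

```
<Extension exchange_w3c>
    Module      xm_w3c
    # Exchange Message Tracking logs are comma-separated
    Delimiter   ,
</Extension>

<Input exchange_tracking>
    Module      im_file
    # Assumed location; verify the path on your server
    File        'C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Logs\MessageTracking\MSGTRK*.LOG'
    InputType   exchange_w3c
</Input>
```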

FieldType

This optional directive can be used to specify a field type for a particular field. For example, to parse a ByteSent field as an integer, use FieldType ByteSent integer. This directive can be used more than once to provide types for multiple fields.
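
In context, the directive sits inside the xm_w3c instance block. The sketch below repeats it for two fields. ByteSent is taken from the text above, while TimeTaken and the instance name w3c_parser are hypothetical and shown only to illustrate using the directive more than once.

```
<Extension w3c_parser>
    Module      xm_w3c
    # From the text above: parse ByteSent as an integer
    FieldType   ByteSent integer
    # Hypothetical second field, illustrating a repeated directive
    FieldType   TimeTaken integer
</Extension>
```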

Specifying Quote, Escape, and Delimiter Characters

The Delimiter directive can be specified in several ways.

Unquoted single character

Any printable character can be specified as an unquoted character, except for the backslash (\):

Delimiter ;

Control characters

The following non-printable characters can be specified with escape sequences:

\a

audible alert (bell)

\b

backspace

\t

horizontal tab

\n

newline

\v

vertical tab

\f

formfeed

\r

carriage return

For example, to use TAB delimiting:

Delimiter \t

A character in single quotes

The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:

Delimiter ' '

Printable characters can also be enclosed:

Delimiter ';'

The backslash can be specified when enclosed within quotes:

Delimiter '\'

A character in double quotes

Double quotes can be used like single quotes:

Delimiter " "

The backslash can be specified when enclosed within double quotes:

Delimiter "\"

A hexadecimal ASCII code

A hexadecimal ASCII character code can be used if it is prefixed with 0x. For example, the space can be specified as:

Delimiter 0x20

This is equivalent to:

Delimiter " "
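
The same equivalence should hold for control characters. As a sketch, the two hypothetical instances below both select the tab character, once with the escape sequence and once with its hexadecimal ASCII code (0x09). The instance names are illustrative only.

```
# Both instances use the tab character as the delimiter
<Extension w3c_tab_escape>
    Module      xm_w3c
    Delimiter   \t
</Extension>

<Extension w3c_tab_hex>
    Module      xm_w3c
    # 0x09 is the ASCII code for horizontal tab
    Delimiter   0x09
</Extension>
```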

Fields

The following fields are used by xm_w3c.

$EventTime (type: datetime)

Constructed from the date and time fields in the input, or from a date-time field.

$SourceName (type: string)

The string in the Software header, such as Microsoft Internet Information Services 7.0.

Examples

Example 1. Parsing advanced IIS logs

The following configuration parses logs from the IIS Advanced Logging Module using the pipe delimiter. The logs are converted to JSON.

nxlog.conf
<Extension json>
    Module      xm_json
</Extension>

<Extension w3cinput>
    Module      xm_w3c
    Delimiter   |
</Extension>

<Input w3c>
    Module      im_file
    File        'C:\inetpub\logs\LogFiles\W3SVC\ex*.log'
    InputType   w3cinput
</Input>

<Output file>
    Module      om_file
    File        'C:\test\IIS.json'
    Exec        to_json();
</Output>

<Route w3c_to_json>
    Path        w3c => file
</Route>