W3C (xm_w3c)
This module provides a parser that can process data in the W3C Extended Log File Format. It also understands the Zeek format and Microsoft Exchange Message Tracking logs. While the xm_csv module can be used to parse these formats, xm_w3c has the advantage of automatically extracting information from the headers. This makes it much easier to parse such log files without the need to explicitly define the fields that appear in the input.
To examine the supported platforms, see the list of installer packages in the Available Modules chapter.
A common W3C log source is Microsoft IIS, which produces output like the following:
#Software: Microsoft Internet Information Services 7.0
#Version: 1.0
#Date: 2010-02-13 07:08:22
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2010-02-13 07:08:21 W3SVC76 DNP1WEB1 174.120.30.2 GET / - 80 - 61.135.169.37 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.0.1)+Gecko/2008070208+Firefox/3.0.1 - http://www.baidu.com/s?wd=QQ www.domain.com 200 0 0 29554 273 1452
2010-02-13 07:25:00 W3SVC76 DNP1WEB1 174.120.30.2 GET /index.htm - 80 - 119.63.198.110 HTTP/1.1 Baiduspider+(+http://www.baidu.jp/spider/) - - www.itcsoftware.com 200 0 0 17791 210 551
The format generated by Zeek is similar, as it too defines the field names in the header. The field types and separator characters are also specified in the header. This allows the parser to automatically process the data. Below is a sample from Zeek:
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path dns
#open 2013-04-09-21-01-43
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto trans_id query qclass qclass_name qtype qtype_name rcode rcode_name AA TC RD RA Z answers TTLs
#types time string addr port addr port enum count string count string count string count string bool bool bool bool count vector[string] vector[interval]
1210953058.350065 m2EJRWK7sCg 192.168.2.16 1920 192.168.2.1 53 udp 16995 ipv6.google.com 1 C_INTERNET 28 AAAA 0 NOERROR F F T T 0 ipv6.l.google.com,2001:4860:0:2001::68 8655.000000,300.000000
1210953058.350065 m2EJRWK7sCg 192.168.2.16 1920 192.168.2.1 53 udp 16995 ipv6.google.com 1 C_INTERNET 28 AAAA 0 NOERROR F F T T 0 ipv6.l.google.com,2001:4860:0:2001::68 8655.000000,300.000000
To use the parser in an input module, the InputType directive must reference the instance name of the xm_w3c module. See the example below.
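As a minimal sketch of that wiring, the fragment below reads a Zeek DNS log like the sample above. The instance names zeek_parser and zeek_dns and the log file path are hypothetical. No Delimiter is set, so the parser should fall back to detecting the space or tab separator. An Output and a Route block, as in the full example at the end of this page, are still needed to deliver the events anywhere.

```
<Extension zeek_parser>
    Module      xm_w3c
</Extension>

<Input zeek_dns>
    Module      im_file
    # Hypothetical path: adjust to where Zeek writes its logs
    File        '/opt/zeek/logs/current/dns.log'
    # Must match the instance name of the xm_w3c extension above
    InputType   zeek_parser
</Input>
```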
Configuration
The xm_w3c module accepts the following directives in addition to the common module directives.
Optional directives
Delimiter
This optional directive takes a single character (see below) as an argument to specify the delimiter character used to separate fields. If this directive is not specified, the default delimiter is either the space or the tab character, as detected. For Microsoft Exchange Message Tracking logs, the comma must be set as the delimiter:
Delimiter ,
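A sketch of a complete parser setup for Microsoft Exchange Message Tracking logs, using only the directives described on this page. The instance names exchange_parser and exchange_tracking and the file path are hypothetical placeholders, not the actual Exchange log location.

```
<Extension exchange_parser>
    Module      xm_w3c
    # Message Tracking logs are comma-delimited
    Delimiter   ,
</Extension>

<Input exchange_tracking>
    Module      im_file
    # Hypothetical path: point this at the Exchange message tracking log directory
    File        'C:\ExchangeLogs\MessageTracking\*.LOG'
    InputType   exchange_parser
</Input>
```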
This optional directive can be used to specify a field type for a particular field.
For example, to parse a
Specifying Quote, Escape, and Delimiter Characters
The Delimiter directive can be specified in several ways.
- Unquoted single character
Any printable character can be specified as an unquoted character, except for the backslash (\):
Delimiter ;
- Control characters
The following non-printable characters can be specified with escape sequences:
  - \a: audible alert (bell)
  - \b: backspace
  - \t: horizontal tab
  - \n: newline
  - \v: vertical tab
  - \f: formfeed
  - \r: carriage return
For example, to use TAB delimiting:
Delimiter \t
- A character in single quotes
The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:
Delimiter ' '
Printable characters can also be enclosed:
Delimiter ';'
The backslash can be specified when enclosed within quotes:
Delimiter '\'
- A character in double quotes
Double quotes can be used like single quotes:
Delimiter " "
The backslash can be specified when enclosed within double quotes:
Delimiter "\"
- A hexadecimal ASCII code
Hexadecimal ASCII character codes can be used when prefixed with 0x. For example, the space can be specified as:
Delimiter 0x20
This is equivalent to:
Delimiter " "
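For tab-separated input such as the Zeek sample, whose header declares #separator \x09, the escape sequence and the hexadecimal forms listed above should be interchangeable. The sketch below uses a hypothetical instance name. Setting the delimiter explicitly is optional, since space or tab is detected by default, but it makes the intent visible in the configuration.

```
<Extension tab_parser>
    Module      xm_w3c
    # TAB given as an escape sequence
    Delimiter   \t
    # Expected to be equivalent: TAB as a hexadecimal ASCII code (0x09)
    # Delimiter 0x09
</Extension>
```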
Examples
The following configuration parses logs from the IIS Advanced Logging Module using the pipe delimiter. The logs are converted to JSON.
<Extension json>
    Module      xm_json
</Extension>

<Extension w3cinput>
    Module      xm_w3c
    # IIS Advanced Logging Module output uses the pipe character
    Delimiter   |
</Extension>

<Input w3c>
    Module      im_file
    File        'C:\inetpub\logs\LogFiles\W3SVC\ex*.log'
    # Parse the files with the xm_w3c instance defined above
    InputType   w3cinput
</Input>

<Output file>
    Module      om_file
    File        'C:\test\IIS.json'
    # Convert each parsed record to JSON
    Exec        to_json();
</Output>

<Route w3c_to_json>
    Path        w3c => file
</Route>