Key-Value Pairs (xm_kvp)
This module provides functions and procedures for processing data formatted as key-value pairs (KVPs), also commonly called "name-value pairs". The module can both parse and generate key-value formatted data.
It is quite common to have a different set of keys in each log line when accepting key-value formatted input messages. Extracting values from such logs using regular expressions can be quite cumbersome. The xm_kvp extension module automates this process.
Log messages containing key-value pairs typically look like one of the following:
- key1: value1, key2: value2, key42: value42
- key1="value 1"; key2="value 2"
- Application=smtp, Event='Protocol Conversation', status='Client Request', ClientRequest='HELO 1.2.3.4'
Keys are usually separated from the value using an equal sign (=) or a colon (:), and the key-value pairs are delimited with a comma (,), a semicolon (;), or a space.
In addition, keys and values may be quoted and may contain escaped characters.
The module tries to guess the format automatically, or the format can be specified explicitly using the configuration directives below.
Keys and unquoted values are trimmed of any leading or trailing white space when extracted as fields.
Keys are limited to 255 bytes. Longer keys are skipped together with their associated values.
It is possible to use more than one xm_kvp module instance with different options to support different KVP formats at the same time. For this reason, the functions and procedures exported by the module are public and must be referenced by the module instance name.
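As a minimal sketch of this, the fragment below defines two instances, one for each of the first two sample formats shown above, and picks one per record with a regular expression. The instance names kvp_colon and kvp_eq, the input path, and the format test are hypothetical.

```
<Extension kvp_colon>
    Module          xm_kvp
    # For lines such as: key1: value1, key2: value2
    KVDelimiter     :
    KVPDelimiter    ,
</Extension>

<Extension kvp_eq>
    Module          xm_kvp
    # For lines such as: key1="value 1"; key2="value 2"
    KVDelimiter     =
    KVPDelimiter    ;
    ValueQuoteChar  "
</Extension>

<Input mixed>
    Module  im_file
    # Hypothetical path
    File    "/var/log/app/mixed.log"
    <Exec>
        # Each procedure call is prefixed with the instance name
        if $raw_event =~ /^\w+:/ kvp_colon->parse_kvp();
        else kvp_eq->parse_kvp();
    </Exec>
</Input>
```

Because each instance keeps its own delimiter settings, neither has to rely on format guessing.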
Configuration
The xm_kvp module accepts the following directives in addition to the common module directives.
Optional directives
DetectNumericValue
If this optional boolean directive is set to TRUE, the parse_kvp() procedure will try to parse numeric values as integers first. The default is TRUE (numeric values will be parsed as integers and unquoted in the output). Note that floating-point numbers are not handled.
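A minimal sketch, assuming the boolean directive above is named DetectNumericValue: with detection switched off, a pair such as Age=42 would be expected to produce $Age as the string "42" rather than the integer 42, so to_json() would write it quoted. The instance name kvp_str is hypothetical.

```
<Extension kvp_str>
    Module              xm_kvp
    KVDelimiter         =
    KVPDelimiter        ,
    # Keep numeric-looking values such as 42 as strings
    DetectNumericValue  FALSE
</Extension>
```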
EscapeChar
This optional directive takes a single character (see below) as an argument.
It specifies the character used for escaping special characters. The escape character is used to prefix the following characters: the EscapeChar itself, the KeyQuoteChar, and the ValueQuoteChar.
If EscapeControl is TRUE, control characters such as the newline are also escaped.
EscapeControl
If this optional boolean directive is set to TRUE, control characters are also escaped. See the EscapeChar directive for details. The default is TRUE (control characters are escaped). Note that this is necessary to support single-line KVP field lists containing line breaks.
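The fragment below is a minimal sketch of that trade-off for output: it keeps the backslash as the escape character but turns off control-character escaping. Following the note above, a field value containing a line break would then presumably be written with a literal newline, so a single event could span several lines of the output file. The instance name kvp_out and the file path are hypothetical.

```
<Extension kvp_out>
    Module          xm_kvp
    KVPDelimiter    ;
    KVDelimiter     =
    # Backslash as the escape character (quoted form)
    EscapeChar      '\'
    # Do not escape control characters such as newlines
    EscapeControl   FALSE
</Extension>

<Output out>
    Module  om_file
    # Hypothetical path
    File    "/tmp/output.kvp"
    Exec    kvp_out->to_kvp();
</Output>
```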
IncludeHiddenFields
This boolean directive specifies whether the to_kvp() function and the to_kvp() procedure include hidden fields, that is, fields whose names begin with a dot (.).
KeyQuoteChar
This optional directive takes a single character (see below) as an argument. It specifies the quote character for enclosing key names. If this directive is not specified, the module will accept single-quoted keys, double-quoted keys, and unquoted keys. White-space characters are not permitted as the KeyQuoteChar.
KVDelimiter
This optional directive takes a single character (see below) as an argument.
It specifies the delimiter character used to separate the key from the value.
If this directive is not set and the parse_kvp() procedure is used, the module will try to guess the delimiter from the following: the colon (:) or the equal sign (=).
KVPDelimiter
This optional directive takes a single character (see below) as an argument.
It specifies the delimiter character used to separate the key-value pairs. If this directive is not set and the parse_kvp() procedure is used, the module will try to guess the delimiter from the following: the comma (,), the semicolon (;), or the space.
QuoteMethod
This optional directive specifies the quote method that to_kvp() uses for values.
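The only QuoteMethod value shown on this page is None, which appears commented out in the Apache example below. As a sketch, an output instance that writes values without quotes might look like the following; the instance name is hypothetical. Without quotes, a value that itself contains the pair delimiter would presumably be split into separate pairs if the output were parsed again.

```
<Extension kvp_noquote>
    Module          xm_kvp
    KVPDelimiter    ;
    KVDelimiter     =
    # Write values without enclosing quotes
    QuoteMethod     None
</Extension>
```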
ValueQuoteChar
This optional directive takes a single character (see below) as an argument. It specifies the quote character for enclosing values. If this directive is not specified, the module will accept single-quoted values, double-quoted values, and unquoted values. Quoting is normally used when the value contains a space or the KVDelimiter character. White-space characters are not permitted as the ValueQuoteChar.
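For instance, the third sample line at the top of this page encloses values in single quotes. A minimal sketch for that format follows; the instance name kvp_sq and the input path are hypothetical. With this setting, $Event would be expected to contain Protocol Conversation without the surrounding quotes.

```
<Extension kvp_sq>
    Module          xm_kvp
    KVDelimiter     =
    KVPDelimiter    ,
    # A single quote, specified inside double quotes
    ValueQuoteChar  "'"
</Extension>

<Input smtp>
    Module  im_file
    # Hypothetical path
    File    "/var/log/app/smtp.log"
    Exec    kvp_sq->parse_kvp();
</Input>
```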
Specifying Quote, Escape, and Delimiter Characters
The KeyQuoteChar, ValueQuoteChar, EscapeChar, KVDelimiter, and KVPDelimiter directives can be specified in several ways.
- Unquoted single character
Any printable character can be specified as an unquoted character, except for the backslash (\):
KVPDelimiter ;
- Control characters
The following non-printable characters can be specified with escape sequences:
  - \a: audible alert (bell)
  - \b: backspace
  - \t: horizontal tab
  - \n: newline
  - \v: vertical tab
  - \f: formfeed
  - \r: carriage return
For example, to use TAB delimiting:
KVPDelimiter \t
- A character in single quotes
The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:
KVPDelimiter ' '
Printable characters can also be enclosed:
KVPDelimiter ';'
The backslash can be specified when enclosed within quotes:
EscapeChar '\'
- A character in double quotes
Double quotes can be used like single quotes:
KVPDelimiter " "
The backslash can be specified when enclosed within double quotes:
EscapeChar "\"
- A hexadecimal ASCII code
Hexadecimal ASCII character codes can be used prefixed with 0x. For example, the space can be specified as:
KVPDelimiter 0x20
This is equivalent to:
KVPDelimiter " "
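The forms can be mixed within one instance. The hypothetical instance below, sketched for tab-separated pairs, uses a different form for each directive.

```
<Extension kvp_tsv>
    Module          xm_kvp
    # Printable character, unquoted
    KVDelimiter     =
    # Tab, as an escape sequence
    KVPDelimiter    \t
    # Backslash, which must be quoted
    EscapeChar      '\'
    # Double quote (ASCII 0x22), as a hexadecimal code
    ValueQuoteChar  0x22
</Extension>
```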
Functions
The following functions are exported by xm_kvp.
string to_kvp()
Convert the internal fields to a single key-value pair formatted string.
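Because to_kvp() is also available as a function, its result can be combined with other expressions before it is written. A minimal sketch, assuming an xm_kvp instance named kvp as in the examples below; the output path and the "app: " prefix are hypothetical.

```
<Output out>
    Module  om_file
    # Hypothetical path
    File    "/tmp/output.kvp"
    # Prefix the generated KVP string with a fixed tag
    Exec    $raw_event = "app: " + kvp->to_kvp();
</Output>
```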
Procedures
The following procedures are exported by xm_kvp.
parse_kvp();
Parse the $raw_event field as key-value pairs and populate the internal fields using the key names.
parse_kvp(string source);
Parse the given string as key-value pairs and populate the internal fields using the key names.
reset_kvp();
Reset the KVP parser so that the autodetected KeyQuoteChar, ValueQuoteChar, KVDelimiter, and KVPDelimiter characters can be detected again.
to_kvp();
Format the internal fields as key-value pairs and put the result into the $raw_event field. Note that the IncludeHiddenFields directive affects which fields are included in the output.
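When records in different KVP formats arrive through the same instance and none of the delimiter or quote directives are set, calling reset_kvp() before each parse should let the module guess the format anew for every record. A minimal sketch; the input name and path are hypothetical.

```
<Extension kvp>
    # No delimiter or quote directives: these are autodetected
    Module  xm_kvp
</Extension>

<Input varied>
    Module  im_file
    # Hypothetical path
    File    "/var/log/app/varied.log"
    <Exec>
        # Forget delimiters and quote characters guessed from earlier records
        kvp->reset_kvp();
        kvp->parse_kvp();
    </Exec>
</Input>
```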
Creating and populating fields
The parse_kvp() procedure parses a string containing a list of key-value pairs into structured data.
It expects the $raw_event field or the string passed as a parameter to be in the following format:
Key1=Value1, Key2=Value2
The key-value delimiter (= in this example) can be defined by the KVDelimiter directive, while the key-value pair delimiter (, in this example) can be defined by the KVPDelimiter directive.
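The string form of the procedure is useful when the pairs are embedded in a longer line. As a sketch, assuming an instance named kvp configured with KVDelimiter = and KVPDelimiter , (as in the first example below) and a hypothetical line format in which the pairs follow a bracketed tag, the pairs can be captured with a regular expression and passed to parse_kvp():

```
<Input app>
    Module  im_file
    # Hypothetical path and line format: [app] Key1=Value1, Key2=Value2
    File    "/var/log/app/app.log"
    <Exec>
        # $1 holds the text after the bracketed tag
        if $raw_event =~ /^\[\w+\] (.*)$/ kvp->parse_kvp($1);
    </Exec>
</Input>
```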
Once a log record is parsed with this procedure, fields are created based on the keys and values in the string. The fields can be used for further log processing or to convert the log record into a different output format. See the Examples below for how to parse key-value pairs and manipulate fields with NXLog Agent.
Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
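As an alternative to separate delete() calls, xm_rewrite can remove several fields in one step. A minimal sketch, assuming xm_rewrite's Delete directive and process() procedure, plus kvp and json instances as in the examples below; the drop_meta instance name and the input path are hypothetical.

```
<Extension drop_meta>
    Module  xm_rewrite
    # Fields added by im_file that are not needed in the output
    Delete  EventReceivedTime, SourceModuleName, SourceModuleType
</Extension>

<Input in>
    Module  im_file
    # Hypothetical path
    File    "/var/log/app/kvp.log"
    Exec    kvp->parse_kvp(); drop_meta->process(); to_json();
</Input>
```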
Examples
The following examples illustrate various scenarios for parsing KVPs, whether embedded, encapsulated (in Syslog, for example), or alone. In most cases, the logs are converted from KVP input files to JSON output files; the last example writes KVP output instead.
The following two lines of input are in a simple KVP format where each line consists of various keys with values assigned to them.
Name=John, Age=42, Weight=84, Height=142
Name=Mike, Weight=64, Age=24, Pet=dog, Height=172
This input can be parsed with the following configuration.
The parsed fields can be used in NXLog Agent expressions: a new field named $Overweight is added and set to TRUE if the weight exceeds the height minus 100.
Finally, a few automatically added fields are removed, and the log is then converted to JSON.
<Extension kvp>
Module xm_kvp
KVPDelimiter ,
KVDelimiter =
EscapeChar \\
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input filein>
Module im_file
File "modules/extension/kvp/xm_kvp5.in"
<Exec>
if $raw_event =~ /^#/ drop();
else
{
kvp->parse_kvp();
delete($EventReceivedTime);
delete($SourceModuleName);
delete($SourceModuleType);
if ( integer($Weight) > integer($Height) - 100 ) $Overweight = TRUE;
to_json();
}
</Exec>
</Input>
<Output fileout>
Module om_file
File 'tmp/output'
</Output>
<Route parse_kvp>
Path filein => fileout
</Route>
{"Name":"John","Age":42,"Weight":84,"Height":142,"Overweight":true}
{"Name":"Mike","Weight":64,"Age":24,"Pet":"dog","Height":172}
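Other output formats only require a different conversion at the end of the Exec block. As a sketch, replacing to_json() with the to_kvp() procedure of the same kvp instance should write the fields back as key-value pairs using the configured KVPDelimiter (,) and KVDelimiter (=), much as the Apache example below does with its kvp2 instance. The exact quoting of values depends on QuoteMethod, so no output is shown here.

```
<Exec>
    if $raw_event =~ /^#/ drop();
    else
    {
        kvp->parse_kvp();
        delete($EventReceivedTime);
        delete($SourceModuleName);
        delete($SourceModuleType);
        if ( integer($Weight) > integer($Height) - 100 ) $Overweight = TRUE;
        # Write the fields as key-value pairs instead of JSON
        kvp->to_kvp();
    }
</Exec>
```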
The following lines are from a Cisco ACS source.
<38>2010-10-12 21:01:29 10.0.1.1 CisACS_02_FailedAuth 1k1fg93nk 1 0 Message-Type=Authen failed,User-Name=John,NAS-IP-Address=10.0.1.2,AAA Server=acs01
<38>2010-10-12 21:01:31 10.0.1.1 CisACS_02_FailedAuth 2k1fg63nk 1 0 Message-Type=Authen failed,User-Name=Foo,NAS-IP-Address=10.0.1.2,AAA Server=acs01
These logs are in syslog format: each record contains a fixed set of values followed by a set of KVPs. The following configuration can be used to process them and convert them to JSON.
<Extension json>
Module xm_json
</Extension>
<Extension syslog>
Module xm_syslog
</Extension>
<Extension kvp>
Module xm_kvp
KVDelimiter =
KVPDelimiter ,
</Extension>
<Input cisco>
Module im_file
File "modules/extension/kvp/cisco_acs.in"
<Exec>
parse_syslog_bsd();
if ( $Message =~ /^CisACS_(\d\d)_(\S+) (\S+) (\d+) (\d+) (.*)$/ )
{
$ACSCategoryNumber = $1;
$ACSCategoryName = $2;
$ACSMessageId = $3;
$ACSTotalSegments = $4;
$ACSSegmentNumber = $5;
$Message = $6;
kvp->parse_kvp($Message);
}
else log_warning("does not match: " + to_json());
</Exec>
</Input>
<Output file>
Module om_file
File "tmp/output"
Exec delete($EventReceivedTime);
Exec to_json();
</Output>
<Route cisco_to_file>
Path cisco => file
</Route>
{"SourceModuleName":"cisco","SourceModuleType":"im_file","SyslogFacilityValue":4,"SyslogFacility":"AUTH","SyslogSeverityValue":6,"SyslogSeverity":"INFO","SeverityValue":2,"Severity":"INFO","Hostname":"10.0.1.1","EventTime":"2010-10-12 21:01:29","Message":"Message-Type=Authen failed,User-Name=John,NAS-IP-Address=10.0.1.2,AAA Server=acs01","ACSCategoryNumber":"02","ACSCategoryName":"FailedAuth","ACSMessageId":"1k1fg93nk","ACSTotalSegments":"1","ACSSegmentNumber":"0","Message-Type":"Authen failed","User-Name":"John","NAS-IP-Address":"10.0.1.2","AAA Server":"acs01"}
{"SourceModuleName":"cisco","SourceModuleType":"im_file","SyslogFacilityValue":4,"SyslogFacility":"AUTH","SyslogSeverityValue":6,"SyslogSeverity":"INFO","SeverityValue":2,"Severity":"INFO","Hostname":"10.0.1.1","EventTime":"2010-10-12 21:01:31","Message":"Message-Type=Authen failed,User-Name=Foo,NAS-IP-Address=10.0.1.2,AAA Server=acs01","ACSCategoryNumber":"02","ACSCategoryName":"FailedAuth","ACSMessageId":"2k1fg63nk","ACSTotalSegments":"1","ACSSegmentNumber":"0","Message-Type":"Authen failed","User-Name":"Foo","NAS-IP-Address":"10.0.1.2","AAA Server":"acs01"}
The following line is from a Sidewinder log source.
date="May 5 14:34:40 2009 MDT",fac=f_mail_filter,area=a_kmvfilter,type=t_mimevirus_reject,pri=p_major,pid=10174,ruid=0,euid=0,pgid=10174,logid=0,cmd=kmvfilter,domain=MMF1,edomain=MMF1,message_id=(null),srcip=66.74.184.9,mail_sender=<habuzeid6@…>,virus_name=W32/Netsky.c@MM!zip,reason="Message scan detected a Virus in msg Unknown, message being Discarded, and not quarantined"
This can be parsed and converted to JSON with the following configuration.
<Extension kvp>
Module xm_kvp
KVPDelimiter ,
KVDelimiter =
EscapeChar \\
ValueQuoteChar "
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input sidewinder>
Module im_file
File "modules/extension/kvp/sidewinder.in"
Exec kvp->parse_kvp(); delete($EventReceivedTime); to_json();
</Input>
<Output file>
Module om_file
File 'tmp/output'
</Output>
<Route sidewinder_to_file>
Path sidewinder => file
</Route>
{"SourceModuleName":"sidewinder","SourceModuleType":"im_file","date":"May 5 14:34:40 2009 MDT","fac":"f_mail_filter","area":"a_kmvfilter","type":"t_mimevirus_reject","pri":"p_major","pid":10174,"ruid":0,"euid":0,"pgid":10174,"logid":0,"cmd":"kmvfilter","domain":"MMF1","edomain":"MMF1","message_id":"(null)","srcip":"66.74.184.9","mail_sender":"<habuzeid6@…>","virus_name":"W32/Netsky.c@MM!zip","reason":"Message scan detected a Virus in msg Unknown, message being Discarded, and not quarantined"}
URLs in HTTP requests frequently contain URL parameters, which are a special kind of key-value pairs delimited by the ampersand (&).
Here is an example of two HTTP requests logged by the Apache web server in the Combined Log Format.
192.168.1.1 - foo [11/Jun/2013:15:44:34 +0200] "GET /do?action=view&obj_id=2 HTTP/1.1" 200 1514 "https://localhost" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0"
192.168.1.1 - - [11/Jun/2013:15:44:44 +0200] "GET /do?action=delete&obj_id=42 HTTP/1.1" 401 788 "https://localhost" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0"
The following configuration file parses the access log and extracts all the fields. The request parameters are extracted into the $HTTPParams field using a regular expression, and then this field is further parsed using the KVP parser. At the end of the processing all fields are converted to KVP format using the to_kvp() procedure of the kvp2 instance.
<Extension kvp>
Module xm_kvp
KVPDelimiter &
KVDelimiter =
</Extension>
<Extension kvp2>
Module xm_kvp
KVPDelimiter ;
KVDelimiter =
#QuoteMethod None
</Extension>
<Input apache>
Module im_file
File "modules/extension/kvp/apache_url.in"
<Exec>
if $raw_event =~ /(?x)^(\S+)\ (\S+)\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
\ HTTP.\d\.\d\"\ (\d+)\ (\d+)\ \"([^\"]+)\"\ \"([^\"]+)\"/
{
$Hostname = $1;
if $3 != '-' $AccountName = $3;
$EventTime = parsedate($4);
$HTTPMethod = $5;
$HTTPURL = $6;
$HTTPResponseStatus = $7;
$FileSize = $8;
$HTTPReferer = $9;
$HTTPUserAgent = $10;
if $HTTPURL =~ /\?(.+)/ { $HTTPParams = $1; }
kvp->parse_kvp($HTTPParams);
delete($EventReceivedTime);
kvp2->to_kvp();
}
</Exec>
</Input>
<Output file>
Module om_file
File 'tmp/output'
</Output>
<Route apache_to_file>
Path apache => file
</Route>
The two request parameters action and obj_id then appear at the end of the KVP formatted lines.
SourceModuleName=apache;SourceModuleType=im_file;Hostname=192.168.1.1;AccountName=foo;EventTime=2013-06-11T13:44:34.000000Z;HTTPMethod=GET;HTTPURL=/do?action=view&obj_id=2;HTTPResponseStatus=200;FileSize=1514;HTTPReferer=https://localhost;HTTPUserAgent='Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0';HTTPParams=action=view&obj_id=2;action=view;obj_id=2;
SourceModuleName=apache;SourceModuleType=im_file;Hostname=192.168.1.1;EventTime=2013-06-11T13:44:44.000000Z;HTTPMethod=GET;HTTPURL=/do?action=delete&obj_id=42;HTTPResponseStatus=401;FileSize=788;HTTPReferer=https://localhost;HTTPUserAgent='Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0';HTTPParams=action=delete&obj_id=42;action=delete;obj_id=42;
URL escaping (percent-encoding) is not handled.