Key-Value Pairs (xm_kvp)

This module provides functions and procedures for processing data formatted as key-value pairs (KVPs), also commonly called "name-value pairs". The module can both parse and generate key-value formatted data.

It is quite common to have a different set of keys in each log line when accepting key-value formatted input messages. Extracting values from such logs using regular expressions can be quite cumbersome. The xm_kvp extension module automates this process.

Log messages containing key-value pairs typically look like one of the following:

  • key1: value1, key2: value2, key42: value42

  • key1="value 1"; key2="value 2"

  • Application=smtp, Event='Protocol Conversation', status='Client Request', ClientRequest='HELO 1.2.3.4'

A key is usually separated from its value by an equal sign (=) or a colon (:), and key-value pairs are delimited by a comma (,), a semicolon (;), or a space. In addition, keys and values may be quoted and may contain escaped characters. The module will try to guess the format unless it is explicitly specified using the configuration directives below.

Keys and unquoted values are trimmed of any leading or trailing white space when extracted as fields.

Keys are limited to 255 bytes. Keys larger than this are skipped together with their associated values.

It is possible to use more than one xm_kvp module instance with different options to support different KVP formats at the same time. For this reason, the functions and procedures exported by the module are public and must be referenced by the module instance name.
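As a minimal sketch of using two instances side by side, the configuration below defines one parser per format and calls each through its instance name. The instance names kvp_comma and kvp_semicolon, the input name, and the file path are placeholders chosen for illustration, and the test on $raw_event is only one possible way of choosing between the parsers.

```
<Extension kvp_comma>
    Module          xm_kvp
    KVDelimiter     =
    KVPDelimiter    ,
</Extension>

<Extension kvp_semicolon>
    Module          xm_kvp
    KVDelimiter     =
    KVPDelimiter    ;
</Extension>

<Input in>
    Module          im_file
    File            "/path/to/input.log"
    <Exec>
        # Hypothetical rule: records containing a semicolon use the second format
        if $raw_event =~ /;/ kvp_semicolon->parse_kvp();
        else kvp_comma->parse_kvp();
    </Exec>
</Input>
```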

Configuration

The xm_kvp module accepts the following directives in addition to the common module directives.

Optional directives

DetectNumericValues

If this optional boolean directive is set to TRUE, the parse_kvp() procedure will try to parse numeric values as integers first. The default is TRUE (numeric values will be parsed as integers and unquoted in the output). Note that floating-point numbers will not be handled.

EscapeChar

This optional directive takes a single character (see below) as argument. It specifies the character used for escaping special characters. The escape character is used to prefix the following characters: the EscapeChar itself, the KeyQuoteChar, and the ValueQuoteChar. If EscapeControl is TRUE, the newline (\n), carriage return (\r), tab (\t), and backspace (\b) control characters are also escaped. The default escape character is the backslash (\). White-space characters are not permitted to be configured as the EscapeChar.

EscapeControl

If this optional boolean directive is set to TRUE, control characters are also escaped. See the EscapeChar directive for details. The default is TRUE (control characters are escaped). Note that this is necessary to support single-line KVP field lists containing line breaks.

IncludeHiddenFields

This boolean directive specifies whether the to_kvp() function and the to_kvp() procedure should include fields whose names begin with a dot (.) or an underscore (_). The default is TRUE, in which case the generated text contains these otherwise excluded fields.

KeyQuoteChar

This optional directive takes a single character (see below) as argument. It specifies the quote character for enclosing key names. If this directive is not specified, the module will accept single-quoted keys, double-quoted keys, and unquoted keys. White-space characters are not permitted to be configured as the KeyQuoteChar.

KVDelimiter

This optional directive takes a single character (see below) as argument. It specifies the delimiter character used to separate the key from the value. If this directive is not set and the parse_kvp() procedure is used, the module will try to guess the delimiter from the following: the colon (:) or the equal sign (=).

KVPDelimiter

This optional directive takes a single character (see below) as argument. It specifies the delimiter character used to separate the key-value pairs. If this directive is not set and the parse_kvp() procedure is used, the module will try to guess the delimiter from the following: the comma (,), the semicolon (;), or the space.

QuoteMethod

This directive can be used to specify the quote method used for the values by to_kvp().

All

The values will always be quoted. This is the default.

Delimiter

The value will only be enclosed in quotes if it contains the delimiter character.

None

The values will not be quoted.

ValueQuoteChar

This optional directive takes a single character (see below) as argument. It specifies the quote character for enclosing values. If this directive is not specified, the module will accept single-quoted values, double-quoted values, and unquoted values. Normally, quoting is used when the value contains a space or the KVDelimiter character. White-space characters are not permitted to be configured as the ValueQuoteChar.
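To show how these directives combine, here is a sketch of an instance that sets several of them explicitly. The instance name kvp_out is a placeholder, and the chosen values are illustrative rather than recommended settings.

```
<Extension kvp_out>
    Module                 xm_kvp
    KVDelimiter            =
    KVPDelimiter           ;
    # Quote a value only when it contains the delimiter character
    QuoteMethod            Delimiter
    # Leave out fields whose names start with a dot or an underscore
    IncludeHiddenFields    FALSE
    # Do not try to parse numeric values as integers
    DetectNumericValues    FALSE
    # Escape newline, carriage return, tab, and backspace (the default)
    EscapeControl          TRUE
</Extension>
```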

Specifying Quote, Escape, and Delimiter Characters

The KeyQuoteChar, ValueQuoteChar, EscapeChar, KVDelimiter, and KVPDelimiter directives can be specified in several ways.

Unquoted single character

Any printable character can be specified as an unquoted character, except for the backslash (\):

Delimiter ;

Control characters

The following non-printable characters can be specified with escape sequences:

\a

audible alert (bell)

\b

backspace

\t

horizontal tab

\n

newline

\v

vertical tab

\f

formfeed

\r

carriage return

For example, to use TAB delimiting:

Delimiter \t

A character in single quotes

The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:

Delimiter ' '

Printable characters can also be enclosed:

Delimiter ';'

The backslash can be specified when enclosed within quotes:

Delimiter '\'

A character in double quotes

Double quotes can be used like single quotes:

Delimiter " "

The backslash can be specified when enclosed within double quotes:

Delimiter "\"

A hexadecimal ASCII code

Hexadecimal ASCII character codes can be used prefixed with 0x. For example, the space can be specified as:

Delimiter 0x20

This is equivalent to:

Delimiter " "
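The snippets above use Delimiter as a stand-in for any of the five directives. As a sketch of how the same notations might look with the real directive names, the following instance (the name kvp is a placeholder) combines a hexadecimal code, a control character, and a quoted backslash.

```
<Extension kvp>
    Module          xm_kvp
    # Colon given as a hexadecimal ASCII code
    KVDelimiter     0x3A
    # Tab given as a control character escape sequence
    KVPDelimiter    \t
    # Backslash enclosed in single quotes
    EscapeChar      '\'
</Extension>
```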

Functions

The following functions are exported by xm_kvp.

string to_kvp()

Convert the internal fields to a single key-value pair formatted string.
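Because to_kvp() is a function here, its return value can be used inside an expression. A small sketch, assuming an instance named kvp and an output named out (both placeholders), writes the KVP representation of each record to the NXLog Agent log without modifying $raw_event:

```
<Extension kvp>
    Module          xm_kvp
    KVDelimiter     =
    KVPDelimiter    ;
</Extension>

<Output out>
    Module          om_file
    File            "/path/to/output.log"
    # Log the fields of the current record as a single KVP string
    Exec            log_info("fields: " + kvp->to_kvp());
</Output>
```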

Procedures

The following procedures are exported by xm_kvp.

parse_kvp();

Parse the $raw_event field as key-value pairs and populate the internal fields using the key names.

parse_kvp(string source);

Parse the given string as key-value pairs and populate the internal fields using the key names.

parse_kvp(string source, string prefix);

Parse the given string as key-value pairs and populate the internal fields using the key names prefixed with the value of the second parameter.

reset_kvp();

Reset the KVP parser so that the autodetected KeyQuoteChar, ValueQuoteChar, KVDelimiter, and KVPDelimiter characters can be detected again.

to_kvp();

Format the internal fields as key-value pairs and put this into the $raw_event field.

Note that the IncludeHiddenFields directive affects which fields are included in the output.
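The following sketch shows how the two-argument form of parse_kvp() and reset_kvp() might be called together. The instance name kvp, the input name, the file path, and the prefix kv_ are placeholders. Calling reset_kvp() for every record is an assumption that only makes sense when no delimiters are configured and the format can differ from one record to the next.

```
<Extension kvp>
    Module          xm_kvp
</Extension>

<Input in>
    Module          im_file
    File            "/path/to/input.log"
    <Exec>
        # Let the delimiters and quote characters be detected again for this record
        kvp->reset_kvp();
        # Parse $raw_event; a key named user populates the field $kv_user
        kvp->parse_kvp($raw_event, "kv_");
    </Exec>
</Input>
```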

Creating and populating fields

The parse_kvp() procedure parses a string containing a list of key-value pairs into structured data. It expects the $raw_event field or the string passed as a parameter to be in the following format:

Key1=Value1, Key2=Value2

The key-value delimiter (= in this example) can be defined with the KVDelimiter directive, while the key-value pair delimiter (, in this example) can be defined with the KVPDelimiter directive.

Once a log record is parsed with this procedure, fields are created based on the keys and values in the string. The fields can be used for further log processing or to convert the log record into a different output format. See the Examples below for how to parse key-value pairs and manipulate fields with NXLog Agent.

Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
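As a sketch of the second option, the configuration below removes the fields added by the input module before the record is converted. It assumes the Delete directive and the process() procedure of xm_rewrite, so check that module's documentation before relying on it. The instance names, the input name, and the file path are placeholders.

```
<Extension rewrite>
    Module          xm_rewrite
    # Fields to remove from every record passed to process()
    Delete          EventReceivedTime, SourceModuleName, SourceModuleType
</Extension>

<Extension kvp>
    Module          xm_kvp
</Extension>

<Extension json>
    Module          xm_json
</Extension>

<Input in>
    Module          im_file
    File            "/path/to/input.log"
    <Exec>
        kvp->parse_kvp();
        rewrite->process();
        to_json();
    </Exec>
</Input>
```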

Examples

The following examples illustrate various scenarios for parsing KVPs, whether embedded, encapsulated (in Syslog, for example), or alone. In most cases, the logs are converted from KVP input files to JSON output files, though there are many other possibilities.

Example 1. Simple KVP parsing

The following two lines of input are in a simple KVP format where each line consists of various keys with values assigned to them.

Input sample
Name=John, Age=42, Weight=84, Height=142
Name=Mike, Weight=64, Age=24, Pet=dog, Height=172

This input can be parsed with the following configuration. The parsed fields can be used in NXLog Agent expressions: a new field named $Overweight is added and set to TRUE if the conditions are met. Finally, a few automatically added fields are removed, and the log is then converted to JSON.

nxlog.conf
<Extension kvp>
    Module          xm_kvp
    KVPDelimiter    ,
    KVDelimiter     =
    EscapeChar      \\
</Extension>

<Extension json>
    Module          xm_json
</Extension>

<Input filein>
    Module          im_file
    File            "modules/extension/kvp/xm_kvp5.in"
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            kvp->parse_kvp();
            delete($EventReceivedTime);
            delete($SourceModuleName);
            delete($SourceModuleType);
            if ( integer($Weight) > integer($Height) - 100 ) $Overweight = TRUE;
            to_json();
        }
    </Exec>
</Input>

<Output fileout>
    Module          om_file
    File            'tmp/output'
</Output>

<Route parse_kvp>
    Path            filein => fileout
</Route>

Output sample
{"Name":"John","Age":42,"Weight":84,"Height":142,"Overweight":true}
{"Name":"Mike","Weight":64,"Age":24,"Pet":"dog","Height":172}

Example 2. Parsing KVPs in Cisco ACS syslog

The following lines are from a Cisco ACS source.

Input sample
<38>2010-10-12 21:01:29 10.0.1.1 CisACS_02_FailedAuth 1k1fg93nk 1 0 Message-Type=Authen failed,User-Name=John,NAS-IP-Address=10.0.1.2,AAA Server=acs01
<38>2010-10-12 21:01:31 10.0.1.1 CisACS_02_FailedAuth 2k1fg63nk 1 0 Message-Type=Authen failed,User-Name=Foo,NAS-IP-Address=10.0.1.2,AAA Server=acs01

These logs are in syslog format with a set of values present in each record and an additional set of KVPs. The following configuration can be used to process this and convert it to JSON.

nxlog.conf
<Extension json>
    Module          xm_json
</Extension>

<Extension syslog>
    Module          xm_syslog
</Extension>

<Extension kvp>
    Module          xm_kvp
    KVDelimiter     =
    KVPDelimiter    ,
</Extension>

<Input cisco>
    Module          im_file
    File            "modules/extension/kvp/cisco_acs.in"
    <Exec>
        parse_syslog_bsd();
        if ( $Message =~ /^CisACS_(\d\d)_(\S+) (\S+) (\d+) (\d+) (.*)$/ )
        {
            $ACSCategoryNumber = $1;
            $ACSCategoryName = $2;
            $ACSMessageId = $3;
            $ACSTotalSegments = $4;
            $ACSSegmentNumber = $5;
            $Message = $6;
            kvp->parse_kvp($Message);
        }
        else log_warning("does not match: " + to_json());
    </Exec>
</Input>

<Output file>
    Module          om_file
    File            "tmp/output"
    Exec            delete($EventReceivedTime);
    Exec            to_json();
</Output>

<Route cisco_to_file>
    Path            cisco => file
</Route>

Output sample
{"SourceModuleName":"cisco","SourceModuleType":"im_file","SyslogFacilityValue":4,"SyslogFacility":"AUTH","SyslogSeverityValue":6,"SyslogSeverity":"INFO","SeverityValue":2,"Severity":"INFO","Hostname":"10.0.1.1","EventTime":"2010-10-12 21:01:29","Message":"Message-Type=Authen failed,User-Name=John,NAS-IP-Address=10.0.1.2,AAA Server=acs01","ACSCategoryNumber":"02","ACSCategoryName":"FailedAuth","ACSMessageId":"1k1fg93nk","ACSTotalSegments":"1","ACSSegmentNumber":"0","Message-Type":"Authen failed","User-Name":"John","NAS-IP-Address":"10.0.1.2","AAA Server":"acs01"}
{"SourceModuleName":"cisco","SourceModuleType":"im_file","SyslogFacilityValue":4,"SyslogFacility":"AUTH","SyslogSeverityValue":6,"SyslogSeverity":"INFO","SeverityValue":2,"Severity":"INFO","Hostname":"10.0.1.1","EventTime":"2010-10-12 21:01:31","Message":"Message-Type=Authen failed,User-Name=Foo,NAS-IP-Address=10.0.1.2,AAA Server=acs01","ACSCategoryNumber":"02","ACSCategoryName":"FailedAuth","ACSMessageId":"2k1fg63nk","ACSTotalSegments":"1","ACSSegmentNumber":"0","Message-Type":"Authen failed","User-Name":"Foo","NAS-IP-Address":"10.0.1.2","AAA Server":"acs01"}

Example 3. Parsing KVPs in Sidewinder logs

The following line is from a Sidewinder log source.

Input sample
date="May 5 14:34:40 2009 MDT",fac=f_mail_filter,area=a_kmvfilter,type=t_mimevirus_reject,pri=p_major,pid=10174,ruid=0,euid=0,pgid=10174,logid=0,cmd=kmvfilter,domain=MMF1,edomain=MMF1,message_id=(null),srcip=66.74.184.9,mail_sender=<habuzeid6@…>,virus_name=W32/Netsky.c@MM!zip,reason="Message scan detected a Virus in msg Unknown, message being Discarded, and not quarantined"

This can be parsed and converted to JSON with the following configuration.

nxlog.conf
<Extension kvp>
    Module          xm_kvp
    KVPDelimiter    ,
    KVDelimiter     =
    EscapeChar      \\
    ValueQuoteChar  "
</Extension>

<Extension json>
    Module          xm_json
</Extension>

<Input sidewinder>
    Module          im_file
    File            "modules/extension/kvp/sidewinder.in"
    Exec            kvp->parse_kvp(); delete($EventReceivedTime); to_json();
</Input>

<Output file>
    Module          om_file
    File            'tmp/output'
</Output>

<Route sidewinder_to_file>
    Path            sidewinder => file
</Route>

Output sample
{"SourceModuleName":"sidewinder","SourceModuleType":"im_file","date":"May 5 14:34:40 2009 MDT","fac":"f_mail_filter","area":"a_kmvfilter","type":"t_mimevirus_reject","pri":"p_major","pid":10174,"ruid":0,"euid":0,"pgid":10174,"logid":0,"cmd":"kmvfilter","domain":"MMF1","edomain":"MMF1","message_id":"(null)","srcip":"66.74.184.9","mail_sender":"<habuzeid6@…>","virus_name":"W32/Netsky.c@MM!zip","reason":"Message scan detected a Virus in msg Unknown, message being Discarded, and not quarantined"}

Example 4. Parsing URL request parameters in Apache Access logs

URLs in HTTP requests frequently contain URL parameters which are a special kind of key-value pairs delimited by the ampersand (&). Here is an example of two HTTP requests logged by the Apache web server in the Combined Log Format.

Input sample
192.168.1.1 - foo [11/Jun/2013:15:44:34 +0200] "GET /do?action=view&obj_id=2 HTTP/1.1" 200 1514 "https://localhost" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0"
192.168.1.1 - - [11/Jun/2013:15:44:44 +0200] "GET /do?action=delete&obj_id=42 HTTP/1.1" 401 788 "https://localhost" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0"

The following configuration file parses the access log and extracts all the fields. The request parameters are extracted into the $HTTPParams field using a regular expression, and this field is then parsed using the KVP parser. At the end of processing, all fields are converted to KVP format using the to_kvp() procedure of the kvp2 instance.

nxlog.conf
<Extension kvp>
    Module          xm_kvp
    KVPDelimiter    &
    KVDelimiter     =
</Extension>

<Extension kvp2>
    Module          xm_kvp
    KVPDelimiter    ;
    KVDelimiter     =
    #QuoteMethod    None
</Extension>

<Input apache>
    Module          im_file
    File            "modules/extension/kvp/apache_url.in"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ (\S+)\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP.\d\.\d\"\ (\d+)\ (\d+)\ \"([^\"]+)\"\ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $3 != '-' $AccountName = $3;
            $EventTime = parsedate($4);
            $HTTPMethod = $5;
            $HTTPURL = $6;
            $HTTPResponseStatus = $7;
            $FileSize = $8;
            $HTTPReferer = $9;
            $HTTPUserAgent = $10;
            # Parse the query string only when the URL contains one
            if $HTTPURL =~ /\?(.+)/
            {
                $HTTPParams = $1;
                kvp->parse_kvp($HTTPParams);
            }
            delete($EventReceivedTime);
            kvp2->to_kvp();
        }
    </Exec>
</Input>

<Output file>
    Module          om_file
    File            'tmp/output'
</Output>

<Route apache_to_file>
    Path            apache => file
</Route>

The two request parameters action and obj_id then appear at the end of the KVP formatted lines.

Output sample
SourceModuleName=apache;SourceModuleType=im_file;Hostname=192.168.1.1;AccountName=foo;EventTime=2013-06-11T13:44:34.000000Z;HTTPMethod=GET;HTTPURL=/do?action=view&obj_id=2;HTTPResponseStatus=200;FileSize=1514;HTTPReferer=https://localhost;HTTPUserAgent='Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0';HTTPParams=action=view&obj_id=2;action=view;obj_id=2;
SourceModuleName=apache;SourceModuleType=im_file;Hostname=192.168.1.1;EventTime=2013-06-11T13:44:44.000000Z;HTTPMethod=GET;HTTPURL=/do?action=delete&obj_id=42;HTTPResponseStatus=401;FileSize=788;HTTPReferer=https://localhost;HTTPUserAgent='Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Firefox/17.0';HTTPParams=action=delete&obj_id=42;action=delete;obj_id=42;

Note that URL escaping is not handled by this configuration.