Parse unstructured logs with NXLog Agent

Log parsing is the process of extracting data from unstructured event log records based on rules that segment messages into named fields or columns.

When NXLog Agent receives a log event, it creates an event record consisting of a $raw_event field and other core fields. Input modules that automatically parse log records create additional fields. See also NXLog Agent log processing overview.

NXLog Agent supports regular expressions with capturing groups, Grok patterns, and NXLog patterns for parsing logs. Below, we provide various examples of extracting values from a raw event with NXLog Agent.

Using regular expressions

Regular expressions are one of the most common methods to parse unstructured log data into fields. See Regular Expressions in the NXLog Agent Reference Manual for more information on using regular expressions.

Example 1. Parsing logs with a regular expression

This configuration receives syslog messages over UDP and parses records into structured data using the parse_syslog() procedure of the xm_syslog module. This procedure adds the $Message field to the event record.

It then uses a regular expression to parse the $Message field and extract the authentication method, username, and source IP address if available.

nxlog.conf
<Extension syslog>
    Module        xm_syslog
</Extension>

<Input udp>
    Module        im_udp
    ListenAddr    0.0.0.0:514
    <Exec>
        parse_syslog();
        if $Message =~ /(?x)^Failed\ (\S+)\ for(?:\ invalid\ user)?\ (\S+)\ from
                        \ (\S+)\ port\ \d+\ ssh2$/
        {
            $AuthMethod = $1;
            $AccountName = $2;
            $SourceIPAddress = $3;
        }
    </Exec>
</Input>
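To sanity-check a pattern like this before deploying it, you can test the same regular expression outside NXlog Agent. The following Python sketch applies the verbose-mode expression above to the sample message and builds the same three fields; the field names mirror the configuration and the snippet is only an illustration, not part of the agent:

```python
import re

# Same pattern as the Exec block above, in verbose (/x) mode,
# where literal spaces must be escaped
PATTERN = re.compile(r"""^Failed\ (\S+)\ for(?:\ invalid\ user)?\ (\S+)\ from
                         \ (\S+)\ port\ \d+\ ssh2$""", re.X)

message = ("Failed password for invalid user linda "
           "from 192.168.1.60 port 38176 ssh2")

record = {}
m = PATTERN.match(message)
if m:
    record = {
        "AuthMethod": m.group(1),        # password
        "AccountName": m.group(2),       # linda
        "SourceIPAddress": m.group(3),   # 192.168.1.60
    }
```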

NXLog Agent also supports named capturing groups. Each named capturing group you specify in the regular expression is automatically added to the event record as a field with the same name. The following configuration is the same as above but uses a regular expression with named capturing groups.

nxlog.conf
<Extension syslog>
    Module        xm_syslog
</Extension>

<Input udp>
    Module        im_udp
    ListenAddr    0.0.0.0:514
    <Exec>
        parse_syslog();
        $Message =~ /(?x)^Failed\ (?<AuthMethod>\S+)\ for(?:\ invalid\ user)?
                     \ (?<AccountName>\S+)\ from\ (?<SourceIPAddress>\S+)\ port
                     \ \d+\ ssh2$/;
    </Exec>
</Input>
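The named-group variant can be checked the same way. Note that NXLog Agent uses PCRE syntax, `(?<name>...)`, while Python spells named groups `(?P<name>...)`; otherwise the expression below is equivalent to the one in the configuration. This is an illustrative sketch only:

```python
import re

# Named-group version: groupdict() yields field names directly,
# much as NXLog Agent adds named captures to the event record
PATTERN = re.compile(
    r"^Failed (?P<AuthMethod>\S+) for(?: invalid user)? (?P<AccountName>\S+)"
    r" from (?P<SourceIPAddress>\S+) port \d+ ssh2$")

message = ("Failed password for invalid user linda "
           "from 192.168.1.60 port 38176 ssh2")

fields = PATTERN.match(message).groupdict()
# fields == {'AuthMethod': 'password', 'AccountName': 'linda',
#            'SourceIPAddress': '192.168.1.60'}
```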

The following is a syslog message collected from a Linux host.

Input sample
<38>Oct 16 12:15:30 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2

When the NXLog Agent configurations above process this message, they add the following fields to the log record in addition to the core fields.

Field               Value
$AuthMethod         password
$AccountName        linda
$SourceIPAddress    192.168.1.60

Using Grok patterns

The xm_grok module supports parsing unstructured log data with Grok patterns. Below, we demonstrate how to parse Apache logs using Grok patterns.

Example 2. Parsing Apache logs with Grok patterns

The first step is to define the Grok patterns to parse your logs. In this example, we define patterns for Apache access and error logs.

Apache access log
192.168.3.20 - - [16/Oct/2023] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

You can parse Apache access log events like the above with the following Grok pattern:

ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\]
"%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}"
%{INT:http_status_code} %{INT:response_size}
Apache error log
[Mon Oct 16 12:15:30 2023] [error] [client 192.168.3.123] Directory index forbidden
by rule: /home/test/

You can parse Apache error log events like the above with the following Grok pattern:

ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\]
\[client %{IP:client_address}\] %{GREEDYDATA:message}

Next, save your patterns in a Grok pattern file like the one below.

patterns.txt
INT (?:[+-]?(?:[0-9]+))
YEAR (?>\d\d){1,2}
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
GREEDYDATA .*
IP (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
TIMESTAMP_ACCESS %{INT}\/%{MONTH}\/%{YEAR}(:%{HOUR}:%{MINUTE}:%{SECOND} %{GREEDYDATA})?
TIMESTAMP_ERROR %{DAY} %{MONTH} %{INT} %{HOUR}:%{MINUTE}:%{SECOND} %{YEAR}
METHOD (GET|POST|PUT|DELETE|HEAD|TRACE|OPTIONS|CONNECT|PATCH){1}
HTTP_VERSION 1\.(0|1)

ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\] "%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}" %{INT:http_status_code} %{INT:response_size}
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] \[client %{IP:client_address}\] %{GREEDYDATA:message}

An online search will yield many examples of ready-made Grok patterns. See the Logstash Grok patterns on GitHub to start with.
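A Grok pattern is essentially shorthand for a regular expression: each `%{NAME}` reference expands to the named pattern's regex, and `%{NAME:field}` additionally turns the match into a named capture. The following Python sketch emulates that expansion with a deliberately simplified pattern set (the `IP`, `UNIXPATH`, and `TIMESTAMP_ACCESS` definitions here are reduced for brevity and are not the full patterns from the file above); xm_grok performs the real expansion internally:

```python
import re

# Simplified pattern definitions; only enough to parse the sample line
PATTERNS = {
    "INT": r"[+-]?[0-9]+",
    "IP": r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}",
    "METHOD": r"GET|POST|PUT|DELETE|HEAD|TRACE|OPTIONS|CONNECT|PATCH",
    "UNIXPATH": r"/[\w_%!$@:.,+~/-]*",
    "HTTP_VERSION": r"1\.(?:0|1)",
    "TIMESTAMP_ACCESS": r"[0-9]{1,2}/[A-Za-z]{3}/[0-9]{4}",
    "ACCESS_LOG": (r'%{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\] '
                   r'"%{METHOD:http_method} %{UNIXPATH:uri} '
                   r'HTTP/%{HTTP_VERSION:http_version}" '
                   r'%{INT:http_status_code} %{INT:response_size}'),
}

REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(pattern):
    """Recursively replace %{NAME} / %{NAME:field} with the underlying regex."""
    def sub(m):
        name, field = m.group(1), m.group(2)
        inner = expand(PATTERNS[name])
        return f"(?P<{field}>{inner})" if field else f"(?:{inner})"
    return REF.sub(sub, pattern)

def match_grok(line, name):
    m = re.match(expand(PATTERNS[name]), line)
    return m.groupdict() if m else None

fields = match_grok(
    '192.168.3.20 - - [16/Oct/2023] "GET /cgi-bin/try/ HTTP/1.0" 200 3395',
    "ACCESS_LOG")
```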

You are now ready to process logs with Grok patterns. This configuration reads Apache logs with the im_file input module, which stores the unparsed log event in the $raw_event field. It then uses the match_grok() function of the xm_grok module to test $raw_event against the Grok patterns defined in patterns.txt. If none of the patterns match, it writes an informational message to the NXLog Agent log file.

nxlog.conf
<Extension grok>
    Module        xm_grok
    Pattern       /path/to/patterns.txt
</Extension>

<Input apache>
    Module        im_file
    File          '/var/log/httpd/*'
    <Exec>
        if not (match_grok($raw_event, "%{ACCESS_LOG}") or
                match_grok($raw_event, "%{ERROR_LOG}"))
        {
            log_info('Event did not match any pattern');
        }
    </Exec>
</Input>

When the NXLog Agent configuration above processes Apache access and error log events, it adds the following fields to the log record in addition to the core fields.

Table 1. Apache access log fields added by NXLog Agent

Field                Value
$ip_address          192.168.3.20
$timestamp           16/Oct/2023
$http_method         GET
$uri                 /cgi-bin/try/
$http_version        1.0
$http_status_code    200
$response_size       3395

Table 2. Apache error log fields added by NXLog Agent

Field              Value
$timestamp         Mon Oct 16 12:15:30 2023
$severity          error
$client_address    192.168.3.123
$message           Directory index forbidden by rule: /home/test/

Using the pattern matcher (xm_pattern)

Pattern-matching methods often rely on regular expressions. Unfortunately, linearly evaluating many regular-expression-based patterns slows down log processing. The xm_pattern module provides a more efficient alternative to evaluating a long series of regular expressions within an Exec block: patterns are organized into groups, and a group's patterns are only tested when its match fields succeed.

Example 3. Parsing logs with xm_pattern

The first step is to define the patterns to parse your logs. In this example, we define patterns for SSH authentication failures.

SSH authentication failure log event
<38>Oct 16 12:15:30 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2

The following NXLog pattern file defines an ssh group that uses the $SourceName field. If an event matches the source, it parses the $Message field with a regular expression that matches SSH authentication failures. See the pattern database schema in the NXLog Agent Reference Manual for details of the XML schema.

patterndb.xml
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
    <created>2023-10-16 10:21:24</created>
    <version>1</version>
    <!-- First and only pattern group in this file -->
    <group>
        <name>ssh</name>
        <id>40</id>
        <!-- Only try to match this group if $SourceName == "sshd" -->
        <matchfield>
            <name>SourceName</name>
            <type>exact</type>
            <value>sshd</value>
        </matchfield>
        <!-- First and only pattern in this pattern group -->
        <pattern>
            <id>1</id>
            <name>ssh auth failure</name>
            <!-- Match regular expression on $Message field -->
            <matchfield>
                <name>Message</name>
                <type>regexp</type>
                <value>^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2</value>
                <!-- Set fields from the 3 capturing groups -->
                <capturedfield>
                    <name>AuthMethod</name>
                    <type>string</type>
                </capturedfield>
                <capturedfield>
                    <name>AccountName</name>
                    <type>string</type>
                </capturedfield>
                <capturedfield>
                    <name>SourceIPAddress</name>
                    <type>string</type>
                </capturedfield>
            </matchfield>
        </pattern>
    </group>
</patterndb>
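The matching logic this file describes can be sketched outside NXLog Agent: the group-level exact matchfield gates the group on $SourceName, then the regexp matchfield is applied to $Message, and the positional captures are mapped, in order, onto the listed capturedfield names. The Python below is an illustration of that flow only, not how xm_pattern is implemented:

```python
import re

# Sketch of the patterndb.xml logic above: an exact matchfield gates
# the group, a regexp matchfield captures values, and the captures
# are mapped, in order, onto the capturedfield names
SSH_AUTH_FAILURE = {
    "gate": ("SourceName", "sshd"),   # <type>exact</type> matchfield
    "regexp": re.compile(
        r"^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2"),
    "fields": ["AuthMethod", "AccountName", "SourceIPAddress"],
}

def match_pattern(record, pattern):
    name, value = pattern["gate"]
    if record.get(name) != value:     # group-level matchfield fails
        return record
    m = pattern["regexp"].match(record.get("Message", ""))
    if m:                             # map captures to field names
        record.update(zip(pattern["fields"], m.groups()))
    return record

record = {
    "SourceName": "sshd",
    "Message": ("Failed password for invalid user linda "
                "from 192.168.1.60 port 38176 ssh2"),
}
record = match_pattern(record, SSH_AUTH_FAILURE)
```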

You can now use the pattern file with NXLog Agent. This configuration collects Linux system messages with im_uds and parses records into structured data using the parse_syslog_bsd() procedure of the xm_syslog module. This procedure adds the $SourceName and $Message fields to the event record.

It then uses the match_pattern() procedure of the xm_pattern module to process the record against the patterns defined in the pattern database.

nxlog.conf
<Extension syslog>
    Module         xm_syslog
</Extension>

<Extension pattern>
    Module         xm_pattern
    PatternFile    '/path/to/patterndb.xml'
</Extension>

<Input uds>
    Module         im_uds
    UDS            /dev/log
    <Exec>
        parse_syslog_bsd();
        match_pattern();
    </Exec>
</Input>

When the NXLog Agent configuration above processes an SSH authentication event, it adds the following fields to the log record in addition to the core fields.

Field               Value
$AuthMethod         password
$AccountName        linda
$SourceIPAddress    192.168.1.60