Extracting data

When NXLog receives an event, it creates an event record with a $raw_event field, other core fields like $EventReceivedTime, and any fields provided by the particular module (see Fields for more information). This section explores the various ways that NXLog can be configured to extract values from the raw event.

Some log sources (like Windows Event Log collected via im_msvistalog) already contain structured data. In this case, there is often no additional extraction required; see Log classification.
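
As a hedged illustration of working with already-structured data, the sketch below filters Windows events directly on a field supplied by im_msvistalog, with no parsing step. The instance name eventlog and the choice of event ID 4625 (failed logon) are assumptions made for this example only.

```
<Input eventlog>
    Module  im_msvistalog
    # $EventID is provided by the module, so no extraction is needed.
    # Keep only failed logon events (ID 4625 is chosen for illustration).
    Exec    if $EventID != 4625 drop();
</Input>
```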

Log parsing with regular expressions via the Exec directive

NXLog supports the use of regular expressions for parsing log fields. For detailed information about regular expressions in NXLog, see the Reference Manual Regular Expressions section.

Example 1. Parsing logs with regular expressions
Syslog message
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2

With this configuration, the Syslog message shown above is first parsed with parse_syslog(), which creates a $Message field in the event record. A regular expression is then applied to the $Message field and, if it matches, the captured values are assigned to additional fields.

nxlog.conf
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        if $Message =~ /(?x)^Failed\ (\S+)\ for(?:\ invalid\ user)?\ (\S+)\ from
                        \ (\S+)\ port\ \d+\ ssh2$/
        {
            $AuthMethod = $1;
            $AccountName = $2;
            $SourceIPAddress = $3;
        }
    </Exec>
</Input>

Named capturing groups are also supported. Each named group that matches is automatically added to the event record as a field with the same name.

nxlog.conf
<Input in>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        $Message =~ /(?x)^Failed\ (?<AuthMethod>\S+)\ for(?:\ invalid\ user)?
                     \ (?<AccountName>\S+)\ from\ (?<SourceIPAddress>\S+)\ port
                     \ \d+\ ssh2$/;
    </Exec>
</Input>

Table 1. Additional fields parsed by regular expression

Field               Value
$AuthMethod         password
$AccountName        linda
$SourceIPAddress    192.168.1.60
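
Named captures can also be used as the condition of an if statement, so that follow-up statements run only when the message matches. The following is a minimal sketch under that assumption; the $TaxonomyStatus field and its value are illustrative choices for this example, not something the configurations above set.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input in>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # The named groups populate $AuthMethod, $AccountName, and
        # $SourceIPAddress only when the expression matches.
        if $Message =~ /(?x)^Failed\ (?<AuthMethod>\S+)\ for(?:\ invalid\ user)?
                        \ (?<AccountName>\S+)\ from\ (?<SourceIPAddress>\S+)\ port
                        \ \d+\ ssh2$/
        {
            # Illustrative follow-up: tag the event for later filtering.
            $TaxonomyStatus = 'Failure';
        }
    </Exec>
</Input>
```

Because the expression uses the (?x) extended flag, every literal space must be escaped as in the examples above; an unescaped space is ignored by the regular expression engine.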

Pattern matching with Grok

The xm_grok module provides parsing for unstructured log messages with Grok patterns.

The examples below demonstrate how to parse Apache logs using Grok patterns.

Example 2. Creating the pattern to parse the Access log

The message below is a sample of an Apache Access log.

Apache Access log
192.168.3.20 - - [28/Jun/2019] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

The above Apache message can be parsed using the Grok pattern below.

Pattern for the Access log
ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\]
"%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}"
%{INT:http_status_code} %{INT:response_size}

Example 3. Creating the pattern to parse the Error Message

The message below is a sample of an Apache error message.

Apache Error Message
[Fri Dec 16 01:46:23 2019] [error] [client 1.2.3.4] Directory index forbidden
by rule: /home/test/

The above Apache log message can be parsed using the Grok pattern below.

Pattern for the Error Message
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\]
\[client %{IP:client_address}\] %{GREEDYDATA:message}

Lists of Grok patterns are available in various repositories. As an example, see the logstash-plugin section on GitHub.

Example 4. Configuring NXLog to parse Apache logs

The following configuration reads lines from the apache_entries.log file using the im_file module; each line is stored in the $raw_event field.

The xm_grok module instance loads the patterns from the patterns.txt file. The match_grok() function then tries the ACCESS_LOG pattern and, if that fails, the ERROR_LOG pattern against the $raw_event field. If neither pattern matches, an internal message is logged.

nxlog.conf
<Extension grok>
    Module     xm_grok
    Pattern    patterns.txt
</Extension>

<Input messages>
    Module     im_file
    File       "apache_entries.log"
    <Exec>
        if not ( match_grok($raw_event, "%{ACCESS_LOG}") or
                 match_grok($raw_event, "%{ERROR_LOG}"))
        {
            log_info('Event did not match any pattern');
        }
    </Exec>
</Input>

This example uses the patterns.txt file, which contains all necessary Grok patterns.

patterns.txt
INT (?:[+-]?(?:[0-9]+))
YEAR (?>\d\d){1,2}
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
GREEDYDATA .*
IP (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
TIMESTAMP_ACCESS %{INT}\/%{MONTH}\/%{YEAR}(:%{HOUR}:%{MINUTE}:%{SECOND} %{GREEDYDATA})?
TIMESTAMP_ERROR %{DAY} %{MONTH} %{INT} %{HOUR}:%{MINUTE}:%{SECOND} %{YEAR}
METHOD (GET|POST|PUT|DELETE|HEAD|TRACE|OPTIONS|CONNECT|PATCH){1}
HTTP_VERSION 1\.(0|1)

ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\] "%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}" %{INT:http_status_code} %{INT:response_size}
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] \[client %{IP:client_address}\] %{GREEDYDATA:message}
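
Once a Grok pattern has matched, the captured fields can be processed further in the same Exec block. The sketch below is a variation of the configuration in Example 4 and assumes that the captured values are stored as strings, so it converts the two numeric fields of the access log with the integer() function. Whether this conversion is needed depends on how the fields are used downstream.

```
<Extension grok>
    Module     xm_grok
    Pattern    patterns.txt
</Extension>

<Input messages>
    Module     im_file
    File       "apache_entries.log"
    <Exec>
        if match_grok($raw_event, "%{ACCESS_LOG}")
        {
            # Assumption: captures arrive as strings; convert the numeric ones.
            $http_status_code = integer($http_status_code);
            $response_size = integer($response_size);
        }
        else if not match_grok($raw_event, "%{ERROR_LOG}")
        {
            log_info('Event did not match any pattern');
        }
    </Exec>
</Input>
```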

Pattern matching with pm_pattern

Regular expressions are widely used in pattern matching. Unfortunately, a large number of regular-expression patterns does not scale well, because the patterns must be evaluated linearly, one after another. The pm_pattern module provides more efficient pattern matching than a series of regular expressions in Exec directives.

Example 5. Using regular expressions with pm_pattern
Syslog Message
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2

With this configuration, the above Syslog message is first parsed with parse_syslog_bsd(), which creates a $Message field in the event record. Then, the pm_pattern module is used with a pattern XML file to further parse the record.

nxlog.conf
<Extension _syslog>
    Module      xm_syslog
</Extension>

<Input uds>
    Module      im_uds
    UDS         /dev/log
    Exec        parse_syslog_bsd();
</Input>

<Processor pattern>
    Module      pm_pattern
    PatternFile /var/lib/nxlog/patterndb.xml
</Processor>

<Output file>
    Module      om_file
    File        "/var/log/messages"
    Exec        to_syslog_bsd();
</Output>

<Route uds_to_file>
    Path        uds => pattern => file
</Route>

The patterns for the pm_pattern module instance above are declared in the following patterndb.xml file.

Pattern database (patterndb.xml)
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
  <created>2010-01-01 01:02:03</created>
  <version>42</version>
  <!-- First and only pattern group in this file -->
  <group>
    <name>ssh</name>
    <id>42</id>
    <!-- Only try to match this group if $SourceName == "sshd" -->
    <matchfield>
      <name>SourceName</name>
      <type>exact</type>
      <value>sshd</value>
    </matchfield>
    <!-- First and only pattern in this pattern group -->
    <pattern>
      <id>1</id>
      <name>ssh auth failure</name>
      <!-- Do regular expression match on $Message field -->
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2</value>
        <!-- Set 3 event record fields from captured strings -->
        <capturedfield>
          <name>AuthMethod</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIPAddress</name>
          <type>string</type>
        </capturedfield>
      </matchfield>
      <!-- Set additional fields if pattern matches -->
      <set>
        <field>
          <name>TaxonomyAction</name>
          <value>Authenticate</value>
          <type>string</type>
        </field>
        <field>
          <name>TaxonomyStatus</name>
          <value>Failure</value>
          <type>string</type>
        </field>
      </set>
    </pattern>
  </group>
</patterndb>

Table 2. Fields added by pm_pattern

Field               Value
$AuthMethod         password
$AccountName        linda
$SourceIPAddress    192.168.1.60
$TaxonomyAction     Authenticate
$TaxonomyStatus     Failure
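
The fields set by the pattern are available to every module instance that follows pm_pattern in the route. As a minimal sketch, the output instance from Example 5 could be extended to log an internal warning for each authentication failure; the wording of the warning is an assumption made for this example.

```
<Output file>
    Module      om_file
    File        "/var/log/messages"
    <Exec>
        # $TaxonomyStatus, $AccountName, and $SourceIPAddress are set
        # by the pattern when it matches.
        if $TaxonomyStatus == 'Failure'
            log_warning('SSH authentication failure for ' + $AccountName +
                        ' from ' + $SourceIPAddress);
        to_syslog_bsd();
    </Exec>
</Output>
```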

NXLog Manager provides an interface for writing pattern files, and will also test sample events to aid in establishing the correct match patterns. The pattern functions can be accessed from the PATTERNS menu in the page header.

Example 6. Creating patterns with NXLog Manager

The following instructions explain the steps required for creating the above pattern database with NXLog Manager.

  1. Open PATTERNS > CREATE GROUP. Enter a Name for the new pattern group, and optionally a Description, in the Properties section. The name is used to refer to the pattern group later. The name of the above pattern group is ssh.

  2. Add a match field by clicking Add Field in the Match section. Only messages that match will be further processed by this pattern group. In the above example, there is no reason to attempt any matches if the $SourceName field does not equal sshd. The above pattern group uses Field name=SourceName, Match=EXACT, and Value=sshd.

    Figure: NXLog Manager Pattern Group Match Section

  3. Save the new pattern group.

  4. Open PATTERNS > CREATE FIELD to create a new field to be used when creating new patterns. For the above example, the $AuthMethod field must be added because it is not in the default set provided by NXLog Manager. Set Name=AuthMethod and Field Type=STRING, then click Save.

  5. Open PATTERNS > CREATE PATTERN. In the Pattern Info section, enter a Pattern Name and optionally a Pattern Description. Select the correct Pattern Group from the list. In the above example, the ssh pattern group is used.

  6. In the Match section, set match values for the fields to be matched. If a regular expression match with captured subgroups is detected, the interface will provide a Captured fields list where target fields can be selected. The above example uses Field name=Message, Match=REGEXP, and Value=^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2$. The three captured fields are AuthMethod, AccountName, and SourceIPAddress.

    Figure: NXLog Manager Pattern Match Section

  7. The Set section allows fields to be set if the match is successful. Click Add Field for each field. The above example sets $TaxonomyStatus to Failure and $TaxonomyAction to Authenticate.

    Figure: NXLog Manager Pattern Set Section

  8. The Action section accepts NXLog language statements, such as those used in an Exec directive. Click Add action, type in the statement, and click Verify to make sure the statement is valid. The above example does not include any NXLog language statements.

  9. The final tabbed section allows test messages to be entered to verify that the match works as expected. Click the + to add a test case. To test the above example, add a Value for the Message field: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2. Click Update Test Cases in the Match section to automatically fill the captured fields. Verify that the fields are set as expected. Additional test cases can be added to test other events.

    Figure: NXLog Manager Pattern Test Case Section

  10. Save the new pattern. Then click Export to download the pattern.xml file or use the pattern to configure a managed agent.

See the NXLog Manager User Guide for more information.

Using the extracted fields

The previous sections explore ways that the log message can be parsed and new fields added to the event record. Once the required data has been extracted and corresponding fields created, there are various ways to use this new data.

  • A field or set of fields can be matched by string or regular expression to trigger alerts, perform filtering, or further classify the event.

  • Fields in the event record can be renamed, modified, or deleted.

  • Event correlation can be used to execute statements or suppress messages based on matching events inside a specified window.

  • Some output formats, such as JSON and the NXLog Binary format, preserve the full set of fields in the event record.
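
Several of the uses listed above can be sketched in a single output instance. The fragment below is illustrative only: the instance name, the file path, the new field name $User, and the choice of fields to rename and delete are all assumptions, and the instance still needs to be added to a route. It filters on an extracted field, renames one field, deletes another, and writes the remaining fields as JSON with the xm_json module.

```
<Extension _json>
    Module  xm_json
</Extension>

<Output ssh_failures>
    Module  om_file
    # Hypothetical output path
    File    "/var/log/nxlog/ssh_failures.json"
    <Exec>
        # Filtering: discard events that carry no extracted $AuthMethod
        if not defined $AuthMethod drop();

        # Renaming and deleting fields
        rename_field("AccountName", "User");
        delete($ProcessID);

        # Preserve all remaining fields by serializing the record as JSON
        to_json();
    </Exec>
</Output>
```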