Parse unstructured logs with NXLog Agent
Log parsing extracts data from unstructured event log records using rules that segment messages into named fields or columns.
When NXLog Agent receives a log event, it creates an event record consisting of a $raw_event field and other core fields.
Input modules that automatically parse log records create additional fields.
See also NXLog Agent log processing overview.
NXLog Agent supports regular expressions with capturing groups, Grok patterns, and NXLog patterns for parsing logs. Below, we provide various examples of extracting values from a raw event with NXLog Agent.
Using regular expressions
Regular expressions are one of the most common methods to parse unstructured log data into fields. See Regular Expressions in the NXLog Agent Reference Manual for more information on using regular expressions.
This configuration receives syslog messages over UDP and parses records into structured data using the parse_syslog() procedure of the xm_syslog module. This procedure adds the $Message field to the event record.
It then uses a regular expression to parse the $Message field and extract the authentication method, username, and source IP address if available.
<Extension syslog>
Module xm_syslog
</Extension>
<Input udp>
Module im_udp
ListenAddr 0.0.0.0:514
<Exec>
parse_syslog();
if $Message =~ /(?x)^Failed\ (\S+)\ for(?:\ invalid\ user)?\ (\S+)\ from
\ (\S+)\ port\ \d+\ ssh2$/
{
$AuthMethod = $1;
$AccountName = $2;
$SourceIPAddress = $3;
}
</Exec>
</Input>
NXLog Agent also supports named capturing groups. Each named group in the regular expression is automatically added to the event record as a field with the same name. The following configuration is the same as above but uses a regular expression with named capturing groups.
<Extension syslog>
Module xm_syslog
</Extension>
<Input udp>
Module im_udp
ListenAddr 0.0.0.0:514
<Exec>
parse_syslog();
$Message =~ /(?x)^Failed\ (?<AuthMethod>\S+)\ for(?:\ invalid\ user)?
\ (?<AccountName>\S+)\ from\ (?<SourceIPAddress>\S+)\ port
\ \d+\ ssh2$/;
</Exec>
</Input>
The following is a syslog message collected from a Linux host.
<38>Oct 16 12:15:30 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
When the NXLog Agent configurations above process this message, they add the following fields to the log record in addition to the core fields.
Field | Value |
---|---|
$AuthMethod | password |
$AccountName | linda |
$SourceIPAddress | 192.168.1.60 |
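Before deploying an expression, it can help to test it offline against a sample message. The following Python sketch is one way to do that and is not part of NXLog Agent. It assumes Python's re module, which spells named groups as (?P<name>...) rather than (?<name>...), and the helper name extract_ssh_failure is hypothetical.

```python
import re

# Same expression as the named-group configuration above, in extended mode.
# Python needs (?P<name>...) for named groups; every literal space is escaped
# because extended (verbose) mode ignores unescaped whitespace in the pattern.
SSH_FAILURE = re.compile(
    r"""^Failed\ (?P<AuthMethod>\S+)\ for(?:\ invalid\ user)?
        \ (?P<AccountName>\S+)\ from\ (?P<SourceIPAddress>\S+)\ port
        \ \d+\ ssh2$""",
    re.VERBOSE,
)


def extract_ssh_failure(message):
    """Return the captured fields as a dict, or None if the message does not match."""
    match = SSH_FAILURE.match(message)
    return match.groupdict() if match else None


# The $Message part of the sample syslog record (header already stripped).
sample = "Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2"
fields = extract_ssh_failure(sample)
print(fields)
```

Regular expression dialects differ in details, so a pattern that passes this check should still be verified in NXLog Agent itself.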
Using Grok patterns
The xm_grok module supports parsing unstructured log data with Grok patterns. Below, we demonstrate how to parse Apache logs using Grok patterns.
The first step is to define the Grok patterns to parse your logs. In this example, we define patterns for Apache access and error logs.
192.168.3.20 - - [16/Oct/2023] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
You can parse Apache access log events like the above with the following Grok pattern:
ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\]
"%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}"
%{INT:http_status_code} %{INT:response_size}
[Mon Oct 16 12:15:30 2023] [error] [client 192.168.3.123] Directory index forbidden by rule: /home/test/
You can parse Apache error log events like the above with the following Grok pattern:
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\]
\[client %{IP:client_address}\] %{GREEDYDATA:message}
Next, save your patterns in a Grok pattern file like the one below.
INT (?:[+-]?(?:[0-9]+))
YEAR (?>\d\d){1,2}
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
GREEDYDATA .*
IP (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
TIMESTAMP_ACCESS %{INT}\/%{MONTH}\/%{YEAR}(:%{HOUR}:%{MINUTE}:%{SECOND} %{GREEDYDATA})?
TIMESTAMP_ERROR %{DAY} %{MONTH} %{INT} %{HOUR}:%{MINUTE}:%{SECOND} %{YEAR}
METHOD (GET|POST|PUT|DELETE|HEAD|TRACE|OPTIONS|CONNECT|PATCH){1}
HTTP_VERSION 1\.(0|1)
ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\] "%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}" %{INT:http_status_code} %{INT:response_size}
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] \[client %{IP:client_address}\] %{GREEDYDATA:message}
An online search will yield many examples of ready-made Grok patterns. See the Logstash Grok patterns on GitHub to start with.
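To see how named references such as %{IP:client_address} resolve into a single regular expression, consider the following Python sketch. It only illustrates the idea and does not describe how xm_grok is implemented. The pattern table is a simplified subset of the file above (TIME is a hypothetical shortcut for HOUR, MINUTE, and SECOND), and the function names expand and parse_error_log are hypothetical.

```python
import re

# Simplified subset of the pattern file, for illustration only.
PATTERNS = {
    "INT": r"[+-]?[0-9]+",
    "DAY": r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)",
    "MONTH": r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    "TIME": r"\d{2}:\d{2}:\d{2}",
    "YEAR": r"\d{4}",
    "LOGLEVEL": r"(?:error|warn|notice|info|debug)",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "GREEDYDATA": r".*",
    "TIMESTAMP_ERROR": r"%{DAY} %{MONTH} %{INT} %{TIME} %{YEAR}",
    "ERROR_LOG": r"\[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] "
                 r"\[client %{IP:client_address}\] %{GREEDYDATA:message}",
}

# Matches %{NAME} and %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Recursively replace pattern references with their regex source."""
    def substitute(ref):
        name, field = ref.group(1), ref.group(2)
        body = expand(PATTERNS[name])
        # A reference with a field name becomes a named capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, pattern)


ERROR_LOG_RE = re.compile(expand("%{ERROR_LOG}"))


def parse_error_log(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = ERROR_LOG_RE.match(line)
    return match.groupdict() if match else None


line = ("[Mon Oct 16 12:15:30 2023] [error] [client 192.168.3.123] "
        "Directory index forbidden by rule: /home/test/")
print(parse_error_log(line))
```

Wrapping every expanded reference in a group keeps alternations such as the DAY list from leaking into the surrounding expression.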
You are now ready to process logs with Grok patterns.
This configuration reads Apache logs with the im_file input module, which stores the unparsed log event in the $raw_event field.
It then uses the match_grok() function of the xm_grok module to test $raw_event against Grok patterns defined in patterns.txt.
If none of the patterns match, it writes an informational event to the NXLog Agent log file.
<Extension grok>
Module xm_grok
Pattern /path/to/patterns.txt
</Extension>
<Input apache>
Module im_file
File '/var/log/httpd/*'
<Exec>
if not (match_grok($raw_event, "%{ACCESS_LOG}") or
match_grok($raw_event, "%{ERROR_LOG}"))
{
log_info('Event did not match any pattern');
}
</Exec>
</Input>
When the NXLog Agent configuration above processes Apache access and error log events, it adds the following fields to the log record in addition to the core fields.
Access log event:

Field | Value |
---|---|
$ip_address | 192.168.3.20 |
$timestamp | 16/Oct/2023 |
$http_method | GET |
$uri | /cgi-bin/try/ |
$http_version | 1.0 |
$http_status_code | 200 |
$response_size | 3395 |

Error log event:

Field | Value |
---|---|
$timestamp | Mon Oct 16 12:15:30 2023 |
$severity | error |
$client_address | 192.168.3.123 |
$message | Directory index forbidden by rule: /home/test/ |
Using Pattern matcher (xm_pattern)
Pattern-matching methods often use regular expressions. However, evaluating many regular expression-based patterns one after another slows down log processing. The xm_pattern module provides a more efficient alternative to evaluating regular expressions within an Exec block.
The first step is to define the patterns to parse your logs. In this example, we define patterns for SSH authentication failures.
<38>Oct 16 12:15:30 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
The following NXLog pattern file defines an ssh group that applies only to events whose $SourceName field is sshd.
If an event matches the source, it parses the $Message field with a regular expression that matches SSH authentication failures.
See the pattern database schema in the NXLog Agent Reference Manual for details of the XML schema.
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
<created>2023-10-16 10:21:24</created>
<version>1</version>
<!-- First and only pattern group in this file -->
<group>
<name>ssh</name>
<id>40</id>
<!-- Only try to match this group if $SourceName == "sshd" -->
<matchfield>
<name>SourceName</name>
<type>exact</type>
<value>sshd</value>
</matchfield>
<!-- First and only pattern in this pattern group -->
<pattern>
<id>1</id>
<name>ssh auth failure</name>
<!-- Match regular expression on $Message field -->
<matchfield>
<name>Message</name>
<type>regexp</type>
<value>^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2</value>
<!-- Set fields from the 3 capturing groups -->
<capturedfield>
<name>AuthMethod</name>
<type>string</type>
</capturedfield>
<capturedfield>
<name>AccountName</name>
<type>string</type>
</capturedfield>
<capturedfield>
<name>SourceIPAddress</name>
<type>string</type>
</capturedfield>
</matchfield>
</pattern>
</group>
</patterndb>
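The group-level matchfield acts as a cheap gate: the regular expressions in a group are only tried when the exact-match condition holds. The following Python sketch models that lookup for the pattern file above. It is a conceptual illustration under that assumption, not a description of how xm_pattern works internally, and the PATTERN_GROUPS structure and match_patterns function are hypothetical.

```python
import re

# Hypothetical in-memory form of the pattern database above.
PATTERN_GROUPS = [
    {
        "name": "ssh",
        "match": ("SourceName", "sshd"),  # exact-match gate for the whole group
        "patterns": [
            {
                "name": "ssh auth failure",
                "field": "Message",
                "regex": re.compile(
                    r"^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2"
                ),
                # Field names are assigned to capturing groups in order.
                "captured": ["AuthMethod", "AccountName", "SourceIPAddress"],
            }
        ],
    }
]


def match_patterns(event):
    """Add captured fields to event in place; return the matched pattern name or None."""
    for group in PATTERN_GROUPS:
        gate_field, gate_value = group["match"]
        if event.get(gate_field) != gate_value:
            continue  # skip every regular expression in this group
        for pattern in group["patterns"]:
            found = pattern["regex"].search(event.get(pattern["field"], ""))
            if found:
                event.update(zip(pattern["captured"], found.groups()))
                return pattern["name"]
    return None


event = {
    "SourceName": "sshd",
    "Message": "Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2",
}
print(match_patterns(event), event)
```

An event from another source, for example one with SourceName set to cron, is rejected by the string comparison alone and never reaches the regular expression.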
You can now use the pattern file with NXLog Agent. This configuration collects Linux system messages with im_uds and parses records into structured data using the parse_syslog_bsd() procedure of the xm_syslog module. This procedure adds the $SourceName and $Message fields to the event record.
It then uses the match_pattern() procedure of the xm_pattern module to process the record against the patterns defined in the pattern database.
<Extension syslog>
Module xm_syslog
</Extension>
<Extension pattern>
Module xm_pattern
PatternFile '/path/to/patterndb.xml'
</Extension>
<Input uds>
Module im_uds
UDS /dev/log
<Exec>
parse_syslog_bsd();
match_pattern();
</Exec>
</Input>
When the NXLog Agent configuration above processes an SSH authentication event, it adds the following fields to the log record in addition to the core fields.
Field | Value |
---|---|
$AuthMethod | password |
$AccountName | linda |
$SourceIPAddress | 192.168.1.60 |