Pattern Matcher (xm_pattern)
This module makes it possible to perform pattern matching with a pattern database file in XML format. Using xm_pattern is more efficient than listing NXLog regular expression rules in Exec directives, because the module is designed so that patterns do not need to be matched linearly. Regular expression capture groups can be used to set additional fields in the event record, and arbitrary fields can be added when a pattern matches, for message classification. In addition, the module automatically reorders patterns internally on the fly for further speed improvements.
To examine the supported platforms, see the list of installer packages in the Available Modules chapter.
There are other techniques, such as the radix tree, that solve the linearity problem; the drawback is that these usually require the user to learn a special syntax for specifying patterns. If the log message is already parsed and is not treated as a single-line message, then it is possible to process only a subset of the patterns, which partially solves the linearity problem. With the other performance improvements employed within the xm_pattern module, its speed is comparable to that of the other techniques. Unlike those techniques, however, xm_pattern uses regular expressions, which are familiar to users and can easily be migrated from other tools.
Traditionally, pattern matching on log messages has treated the log message as a single string and executed the pattern (a regular expression or a radix-tree-based pattern) against it. To match patterns against logs that contain structured data (such as the Windows EventLog), this structured data (the fields of the log) must first be converted to a single string. This is a simple but inefficient method used by many tools.
The NXLog patterns defined in the XML pattern database file can contain more than one field. This allows multi-dimensional pattern matching. Thus, with NXLog’s xm_pattern module there is no need to convert all fields into a single string, because the module can match against multiple fields directly.
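As a minimal sketch of multi-dimensional matching, the hypothetical pattern fragment below (to be placed inside a group element, as in the full pattern database example later in this section) uses two matchfield elements, one per log field. It assumes that a pattern matches only when all of its matchfield conditions are satisfied; the pattern ID, name, and the pam_unix message format are illustrative assumptions.

```xml
<!-- Hypothetical pattern: assumed to match only when BOTH matchfield
     conditions hold. ID, name, and message format are illustrative. -->
<pattern>
  <id>10</id>
  <name>sshd session opened</name>
  <!-- First dimension: exact comparison on $SourceName -->
  <matchfield>
    <name>SourceName</name>
    <type>exact</type>
    <value>sshd</value>
  </matchfield>
  <!-- Second dimension: regular expression on $Message;
       the single capture group fills $AccountName -->
  <matchfield>
    <name>Message</name>
    <type>regexp</type>
    <value>^pam_unix\(sshd:session\): session opened for user (\S+)</value>
    <capturedfield>
      <name>AccountName</name>
      <type>STRING</type>
    </capturedfield>
  </matchfield>
</pattern>
```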
Patterns can be grouped together under pattern groups. Pattern groups serve an optimization purpose. A group can have an optional matchfield block which checks a condition. If the condition (such as $SourceName matches sshd) is satisfied, the xm_pattern module descends into the group and checks each pattern against the log. If the pattern group’s condition does not match ($SourceName is not sshd), the module skips all patterns in the group without having to check each pattern individually.
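A minimal, hypothetical sketch of a pattern group: the group-level matchfield is evaluated first, and only if $SourceName equals sshd are the patterns inside the group tried. The pattern shown uses the exact match type on $Message and has no set block; its ID, name, and message text are illustrative assumptions.

```xml
<!-- Hypothetical group: if $SourceName is not "sshd", none of the
     patterns in this group are tried. -->
<group>
  <id>1</id>
  <name>ssh</name>
  <!-- Group condition, checked once for the whole group -->
  <matchfield>
    <name>SourceName</name>
    <type>exact</type>
    <value>sshd</value>
  </matchfield>
  <pattern>
    <id>1</id>
    <name>sshd listening</name>
    <!-- Tried only after the group condition has matched -->
    <matchfield>
      <name>Message</name>
      <type>exact</type>
      <value>Server listening on 0.0.0.0 port 22.</value>
    </matchfield>
  </pattern>
  <!-- Further sshd patterns would go here -->
</group>
```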
When the xm_pattern module finds a matching pattern, the $PatternID and $PatternName fields are set on the log message. These can be used later, for example in conditional processing and in correlation rules of the pm_evcorr module.
The xm_pattern module does not process all patterns; it stops after the first matching pattern is found. This means that at most one pattern can match a log message, so multiple patterns that can match the same subset of logs should be avoided. For example, with the two regular expression patterns ^\d+ and ^\d\d, only one of them will match a message that starts with two digits, and which one matches is not consistent, because xm_pattern changes the internal order of patterns and pattern groups dynamically (patterns with the highest match count are placed first and tried first). For a strictly linear pattern matcher, see the Exec directive.
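One way to avoid overlapping patterns is to make their regular expressions mutually exclusive. The hypothetical fragment below (IDs and names are illustrative) replaces ^\d+ and ^\d\d with expressions that cannot both match the same message: ^\d\d(?!\d) uses a PCRE2 negative lookahead to require exactly two leading digits, while ^\d{3,} requires three or more. A message with a single leading digit matches neither, so a third pattern would be needed if such messages must also be classified.

```xml
<!-- Hypothetical patterns with mutually exclusive regular expressions,
     so the dynamic reordering cannot change which one matches. -->
<pattern>
  <id>20</id>
  <name>two leading digits</name>
  <matchfield>
    <name>Message</name>
    <type>regexp</type>
    <!-- Exactly two digits: the lookahead rejects a third digit -->
    <value>^\d\d(?!\d)</value>
  </matchfield>
</pattern>
<pattern>
  <id>21</id>
  <name>three or more leading digits</name>
  <matchfield>
    <name>Message</name>
    <type>regexp</type>
    <value>^\d{3,}</value>
  </matchfield>
</pattern>
```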
The XML Schema Definition (XSD) for the pattern database file is available in the nxlog-public/contrib repository.
Configuration
The xm_pattern module accepts the following directives in addition to the common module directives.
- PatternFile: This mandatory directive specifies the name of the pattern database file.
Functions
The following functions are exported by xm_pattern.
- boolean match_pattern(): Execute the match_pattern() procedure. If the event is successfully matched, return TRUE, otherwise FALSE.
Procedures
The following procedures are exported by xm_pattern.
- match_pattern(): Attempt to match the current event according to the PatternFile. If a pattern matches, execute its statements and add its fields as specified in the pattern database.
Examples
This configuration reads syslog messages from a file and parses them with parse_syslog(). The events are then further processed with a pattern file and the corresponding match_pattern() procedure to add fields to SSH authentication success or failure events. The matching is done against the $SourceName and $Message fields, so syslog parsing must be performed before pattern matching can work.
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension pattern>
    Module      xm_pattern
    PatternFile 'modules/extension/pattern/patterndb2-3.xml'
</Extension>

<Input in>
    Module      im_file
    File        'test2.log'
    # parse_syslog() sets $SourceName and $Message, which the patterns match against
    <Exec>
        parse_syslog();
        match_pattern();
    </Exec>
</Input>
The following pattern database contains two patterns to match SSH authentication messages. The patterns are under a group named ssh, which checks whether the $SourceName field is sshd and only tries to match the patterns if the logs are indeed from sshd. When a pattern matches, it extracts the $AuthMethod, $AccountName, and $SourceIP4Address fields from the log message. Additionally, $TaxonomyStatus and $TaxonomyAction are set. The second pattern also shows an example of an Exec block, which is evaluated when the pattern matches.
For the full syntax and semantics of the regular expressions supported by PCRE2, please see the pcre2pattern documentation.
The number of capture groups in the regular expression must be exactly equal to the number of capturedfield elements; otherwise, parsing will terminate.
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
  <created>2018-01-01 01:02:03</created>
  <version>4</version>
  <group>
    <id>1</id>
    <name>ssh</name>
    <matchfield>
      <name>SourceName</name>
      <type>exact</type>
      <value>sshd</value>
    </matchfield>
    <pattern>
      <id>1</id>
      <name>ssh auth success</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Accepted (\S+) for (\S+) from (\S+) port \d+ ssh2</value>
        <capturedfield>
          <name>AuthMethod</name>
          <type>STRING</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>STRING</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIP4Address</name>
          <type>IP4ADDR</type>
        </capturedfield>
      </matchfield>
      <set>
        <field>
          <name>TaxonomyStatus</name>
          <type>STRING</type>
          <value>success</value>
        </field>
        <field>
          <name>TaxonomyAction</name>
          <type>STRING</type>
          <value>authenticate</value>
        </field>
      </set>
    </pattern>
    <pattern>
      <id>2</id>
      <name>ssh auth failure</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Failed (\S+) for invalid user (\S+) from (\S+) port \d+ ssh2</value>
        <capturedfield>
          <name>AuthMethod</name>
          <type>STRING</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>STRING</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIP4Address</name>
          <type>IP4ADDR</type>
        </capturedfield>
      </matchfield>
      <set>
        <field>
          <name>TaxonomyStatus</name>
          <type>STRING</type>
          <value>failure</value>
        </field>
        <field>
          <name>TaxonomyAction</name>
          <type>STRING</type>
          <value>authenticate</value>
        </field>
      </set>
      <exec>
        $TestField = 'test';
        $TestField = $TestField + 'value';
      </exec>
    </pattern>
  </group>
</patterndb>
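Regarding the rule that the number of capture groups must equal the number of capturedfield elements: when a regular expression needs grouping that should not produce a field, a PCRE2 non-capturing group (?:...) keeps the counts aligned. The hypothetical matchfield below is a sketch of a variant of the ssh auth failure pattern that would also match failures for existing users; it still has exactly three capture groups for its three capturedfield elements.

```xml
<!-- Hypothetical variant of the "ssh auth failure" matchfield.
     (?:invalid user )? is optional and non-capturing, so the expression
     still has exactly three capture groups, one per capturedfield. -->
<matchfield>
  <name>Message</name>
  <type>regexp</type>
  <value>^Failed (\S+) for (?:invalid user )?(\S+) from (\S+) port \d+ ssh2</value>
  <capturedfield>
    <name>AuthMethod</name>
    <type>STRING</type>
  </capturedfield>
  <capturedfield>
    <name>AccountName</name>
    <type>STRING</type>
  </capturedfield>
  <capturedfield>
    <name>SourceIP4Address</name>
    <type>IP4ADDR</type>
  </capturedfield>
</matchfield>
```

With this variant, both "Failed password for root from 192.0.2.10 port 22 ssh2" and "Failed password for invalid user bob from 192.0.2.10 port 22 ssh2" would match, setting $AccountName to root and bob respectively.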
This example uses the same configuration and pattern file as the first example, but it uses the match_pattern() function to discard any event that is not matched by the pattern file.
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension pattern>
    Module      xm_pattern
    PatternFile 'modules/extension/pattern/patterndb2-3.xml'
</Extension>

<Input in>
    Module      im_file
    File        'test2.log'
    # Drop every event that no pattern in the pattern file matches
    <Exec>
        parse_syslog();
        if not match_pattern() drop();
    </Exec>
</Input>