XML (xm_xml)

This module provides functions and procedures for working with data formatted as Extensible Markup Language (XML). It can convert log messages to XML format and can parse XML into fields.

To examine the supported platforms, see the list of installer packages in the Available Modules chapter.

Configuration

The xm_xml module accepts the following directives in addition to the common module directives.

IgnoreRootTag

This optional boolean directive causes parse_xml() to omit the root tag when setting field names. For example, when this is set to FALSE and the root tag is Event, a field might be named $Event.timestamp. With this directive set to TRUE, that field name would be $timestamp. The default value is TRUE.
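
As a minimal sketch of the non-default setting, the following extension instance keeps the root tag in field names. Assuming an input record such as <Event><timestamp>2021-11-10 09:42:42</timestamp></Event> (a made-up sample, not taken from a real log source), parse_xml() would then be expected to create $Event.timestamp rather than $timestamp.

nxlog.conf
<Extension xml>
    Module         xm_xml
    # Keep the root tag as a prefix in generated field names
    IgnoreRootTag  FALSE
</Extension>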

IncludeHiddenFields

This optional boolean directive specifies whether the to_xml() function and the to_xml() procedure include fields whose names have a leading underscore (_). If IncludeHiddenFields is set to TRUE, the generated XML contains these otherwise excluded fields. The default is TRUE.

Note that a leading dot (.) is not allowed in XML names, so fields whose names begin with a dot (.) are always excluded from XML output.
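
The following sketch illustrates one way the directive might be used to keep a working field out of the generated XML. The field name $_note and the file path are hypothetical and exist only for illustration. Test the resulting output before relying on it.

nxlog.conf
<Extension xml>
    Module               xm_xml
    # Leave out fields whose names start with an underscore
    IncludeHiddenFields  FALSE
</Extension>

<Input from_file>
    Module    im_file
    File      '/path/to/log/file'
    <Exec>
        # $_note is a hypothetical helper field used only in this block
        $_note = 'internal';

        # With IncludeHiddenFields set to FALSE, $_note is expected
        # to be absent from the XML written to $raw_event
        to_xml();
    </Exec>
</Input>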

ParseAttributes

When this optional boolean directive is set to TRUE, parse_xml() will also parse XML attributes. The default is FALSE (attributes are not parsed). For example, if ParseAttributes is set to TRUE, the following would be parsed into $Msg.time, $Msg.type, and $Msg:

<Msg time='2014-06-27T00:27:38' type='ERROR'>foo</Msg>
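
A minimal sketch of a configuration that enables attribute parsing is shown below. The instance names and the file path are placeholders. The field names in the comment are the ones listed above for the sample record.

nxlog.conf
<Extension xml>
    Module           xm_xml
    # Also create fields from XML attributes
    ParseAttributes  TRUE
</Extension>

<Input from_file>
    Module    im_file
    File      '/path/to/log/file'
    # For the sample record, this is expected to create
    # $Msg.time, $Msg.type, and $Msg
    Exec      parse_xml();
</Input>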

RootTag

This optional directive can be used to specify the name of the root tag that will be used by to_xml() to generate XML. The default RootTag is Event.
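
As an illustration, the following extension instance changes the root tag. The name LogRecord is an arbitrary choice for this sketch. With it, to_xml() would be expected to wrap the fields in <LogRecord>...</LogRecord> instead of <Event>...</Event>.

nxlog.conf
<Extension xml>
    Module   xm_xml
    # Root element name used by to_xml(); LogRecord is a made-up example
    RootTag  LogRecord
</Extension>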

PrefixWinEvent

When this optional boolean directive is set to TRUE, parse_windows_eventlog_xml() creates fields prefixed with EventData. from the <EventData> section of the event XML and fields prefixed with UserData. from the <UserData> section. The default is FALSE.
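
The sketch below shows how the directive could be combined with parse_windows_eventlog_xml(). It assumes a file in which each record is one Windows Event Log XML document. The file path is a placeholder and the field name in the comment is hypothetical, because the actual names depend on the content of each event.

nxlog.conf
<Extension xml>
    Module          xm_xml
    # Prefix fields taken from <EventData> and <UserData>
    PrefixWinEvent  TRUE
</Extension>

<Input from_file>
    Module    im_file
    File      '/path/to/eventlog/xml/file'
    # A <Data Name="TargetUserName"> element inside <EventData> might
    # then end up in a field such as $EventData.TargetUserName
    Exec      parse_windows_eventlog_xml();
</Input>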

Functions

The following functions are exported by xm_xml.

string extract_xml(string xpath)

Search the $raw_event field using the specified XPath expression. If successfully matched, it returns the value of the matched node. If no match is found, it returns an empty string.

string extract_xml(string xpath, string xml_data)

Search xml_data using the specified XPath expression. If successfully matched, it returns the value of the matched node. If no match is found, it returns an empty string.
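
Both forms of the function could be used as in the following sketch. The node paths, the file path, and the $Payload field are assumptions made for illustration. $Payload stands for any field that already holds an XML string.

nxlog.conf
<Extension xml>
    Module    xm_xml
</Extension>

<Input from_file>
    Module    im_file
    File      '/path/to/log/file'
    <Exec>
        # One-argument form: search $raw_event
        $Severity = extract_xml("/Event/EventData/Severity");

        # An empty string means the expression matched nothing
        if $Severity == '' drop();

        # Two-argument form: search the XML held in another field
        # ($Payload is hypothetical and is not set by im_file)
        $User = extract_xml("/Event/User", $Payload);
    </Exec>
</Input>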

string to_xml()

Convert the fields to XML and return this as a string value. The $raw_event field and any field having a leading dot (.) or underscore (_) will be automatically excluded.

Note that the IncludeHiddenFields directive determines whether fields having a leading underscore (_) are included in the output.

Procedures

The following procedures are exported by xm_xml.

extract_xml(string xpath);

Search the $raw_event field using the specified XPath expression. If successfully matched, it rewrites the $raw_event field with the value of the matched node. If no match is found, it returns an empty string.

extract_xml(string xpath, string xml_data);

Search xml_data using the specified XPath expression. If successfully matched, it rewrites the $raw_event field with the value of the matched node. If no match is found, it returns an empty string.

parse_windows_eventlog_xml();

Parse the $raw_event field as Windows Event Log XML input.

Any CR LF and any CR that is not followed by an LF will be translated to a single LF.

parse_windows_eventlog_xml(string source);

Parse the given string as Windows Event Log XML.

Any CR LF and any CR that is not followed by an LF will be translated to a single LF.

parse_xml();

Parse the $raw_event field as XML input.

parse_xml(string source);

Parse the given string as XML format.
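
The string variant is useful when the XML is carried inside another field. The sketch below assumes syslog messages whose message part is an XML document. It relies on parse_syslog() from xm_syslog to populate $Message first. The listening address is a placeholder.

nxlog.conf
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension xml>
    Module      xm_xml
</Extension>

<Input tcp>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
    <Exec>
        # Split the syslog header from the message part
        parse_syslog();

        # Parse the XML carried in $Message instead of $raw_event
        parse_xml($Message);
    </Exec>
</Input>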

to_xml();

Convert the fields to XML and put this into the $raw_event field. The $raw_event field and any field having a leading dot (.) or underscore (_) will be automatically excluded.

Note that the IncludeHiddenFields directive determines whether fields having a leading underscore (_) are included in the output.

Creating and populating fields

The parse_xml() procedure parses a string containing XML into structured data. It expects the $raw_event field or the string passed as a parameter to be in the following format:

<Event><Tag1>Value 1</Tag1><Tag2>Value 2</Tag2></Event>

Once a log record is parsed with this procedure, fields are created according to the tags and values in the XML. The fields can be used for further log processing or to convert the log record into a different output format. For an example of how to parse XML log records and manipulate fields, see Parsing XML below.

Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.

Extracting values from XML

The extract_xml() function or procedure extracts a node’s value from an XML tree.

Both the function and the procedure accept an XPath expression to select the required node and an optional string containing the XML data to parse. If no XML data is specified, the content of the $raw_event core field is parsed.

Accepted XPath expressions

extract_xml() implements a subset of XPath that is most commonly used:

  • Selectors

    • /node to select by name

    • //node to select by name or a first child with this name

    • /* to select any node (usually narrowed down by predicates)

    • @attribute to get the value of the named attribute; it can only appear as the last step of an expression

  • Predicates (filters)

    • [n] where n is the numeric index of the array element to select

    • [@attribute=value] to narrow nodes by the value of an attribute

An expression can contain a sequence of the above selectors and predicates.
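
The following Exec block sketches how the selectors and predicates listed above might be combined. The element and attribute names are hypothetical. Whether an attribute value in a predicate needs quoting, and whether [n] counts from 0 or from 1, is not stated in this section, so treat those two expressions as assumptions and verify them against real data.

nxlog.conf
<Exec>
    # /node steps: walk from Event to EventData to Message
    $Msg = extract_xml("/Event/EventData/Message");

    # @attribute as the last step: value of the Name attribute
    $Provider = extract_xml("/Event/System/Provider/@Name");

    # [n] predicate: pick one of several Data siblings by index
    # (index base is an assumption to verify)
    $Data = extract_xml("/Event/EventData/Data[2]");

    # /* narrowed by an attribute predicate, written in the
    # [@attribute=value] form shown above (quoting is an assumption)
    $User = extract_xml("/Event/EventData/*[@Name=TargetUserName]");
</Exec>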

When specifying the //node selector, extract_xml() does not perform a recursive search but limits the scope to "self-or-child".

The output of the /* selector is not deterministic and can vary from one system to another. Therefore, if the selector matches multiple nodes on the same level, you cannot assume it will always return the same node.

Returned data

The extract_xml() function returns a string containing the text of the matched node or attribute, or XML if the matched node contains child elements. The extract_xml() procedure sets the value of the $raw_event field instead of returning a value.

Currently, the returned value is limited to 1 MiB of data.

See Extracting values from XML below for an example.

Examples

Example 1. Syslog to XML format conversion

The following configuration accepts syslog (both BSD and IETF) and converts it to XML.

nxlog.conf
<Extension syslog>
    Module  xm_syslog
</Extension>

<Extension xml>
    Module      xm_xml
</Extension>

<Input tcp>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
    Exec        parse_syslog(); to_xml();
</Input>

<Output file>
    Module      om_file
    File        "/var/log/log.xml"
</Output>

<Route tcp_to_file>
    Path        tcp => file
</Route>

Input sample
<30>Sep 30 15:45:43 host44.localdomain.hu acpid: 1 client rule loaded

Output sample
<Event>
  <MessageSourceAddress>127.0.0.1</MessageSourceAddress>
  <EventReceivedTime>2012-03-08 15:05:39</EventReceivedTime>
  <SyslogFacilityValue>3</SyslogFacilityValue>
  <SyslogFacility>DAEMON</SyslogFacility>
  <SyslogSeverityValue>6</SyslogSeverityValue>
  <SyslogSeverity>INFO</SyslogSeverity>
  <SeverityValue>2</SeverityValue>
  <Severity>INFO</Severity>
  <Hostname>host44.localdomain.hu</Hostname>
  <EventTime>2012-09-30 15:45:43</EventTime>
  <SourceName>acpid</SourceName>
  <Message>1 client rule loaded</Message>
</Event>

Example 2. Converting Windows Event Log to syslog-encapsulated XML

The following configuration reads the Windows Event Log and converts it to the BSD Syslog format where the message part contains the fields in XML.

nxlog.conf
<Extension syslog>
    Module  xm_syslog
</Extension>

<Extension xml>
    Module  xm_xml
</Extension>

<Input eventlog>
    Module  im_msvistalog
    Exec    $Message = to_xml(); to_syslog_bsd();
</Input>

<Output tcp>
    Module  om_tcp
    Host    192.168.1.1:1514
</Output>

<Route eventlog_to_tcp>
    Path    eventlog => tcp
</Route>

Output sample
<14>Mar  8 15:12:12 WIN-OUNNPISDHIG Service_Control_Manager: <Event><EventTime>2012-03-08 15:12:12</EventTime><EventTimeWritten>2012-03-08 15:12:12</EventTimeWritten><Hostname>WIN-OUNNPISDHIG</Hostname><EventType>INFO</EventType><SeverityValue>2</SeverityValue><Severity>INFO</Severity><SourceName>Service Control Manager</SourceName><FileName>System</FileName><EventID>7036</EventID><CategoryNumber>0</CategoryNumber><RecordNumber>6791</RecordNumber><Message>The nxlog service entered the running state. </Message><EventReceivedTime>2012-03-08 15:12:14</EventReceivedTime></Event>

Example 3. Parsing XML

This configuration uses the im_file input module to collect XML logs from a file. Log records are parsed into structured data using the parse_xml() procedure. If the log record has a severity equal to NOTICE, it is changed to INFO. Core fields that are not required are deleted, and the log record is converted back to XML using the to_xml() procedure.

nxlog.conf
<Extension xml>
    Module    xm_xml
</Extension>

<Input from_file>
    Module    im_file
    File      '/tmp/input'
    <Exec>
        # Parse $raw_event and create fields
        parse_xml();

        # Change the value of the $Severity field
        if ($Severity == 'NOTICE')
            $Severity = 'INFO';

        # Delete core fields that are not required
        delete($SourceModuleType);
        delete($SourceModuleName);

        # Convert fields back to XML
        to_xml();
    </Exec>
</Input>

Input sample
<Event>
  <Hostname>Ubuntu-VM</Hostname>
  <Message>The service has started.</Message>
  <Severity>NOTICE</Severity>
</Event>

The output sample below contains all fields from the input sample and the $EventReceivedTime core field.

Output sample
<Event>
  <EventReceivedTime>2021-11-10 09:42:42</EventReceivedTime>
  <Hostname>Ubuntu-VM</Hostname>
  <Message>The service has started.</Message>
  <Severity>INFO</Severity>
</Event>

Example 4. Extracting values from XML

This configuration collects XML-formatted event logs using the im_file input module, which populates the $raw_event core field with the log line read from the file. It then uses the extract_xml() procedure to extract the EventData node from the original event and rewrite the $raw_event field.

nxlog.conf
<Extension xml>
    Module    xm_xml
</Extension>

<Input in>
    Module    im_file
    File      '/path/to/log/file'
    Exec      extract_xml("/Event/EventData");
</Input>

Input sample
<Event>
  <Stream>stdout</Stream>
  <EventData>
    <EventTime>2020-03-31 14:32:12</EventTime>
    <Message>The process has started</Message>
    <Severity>INFO</Severity>
  </EventData>
</Event>

Output sample
<EventData>
  <EventTime>2020-03-31 14:32:12</EventTime>
  <Message>The process has started</Message>
  <Severity>INFO</Severity>
</EventData>