XML (xm_xml)
This module provides functions and procedures for working with data formatted as Extensible Markup Language (XML). It can convert log messages to XML format and can parse XML into fields.
To examine the supported platforms, see the list of installer packages in the Available Modules chapter.
Configuration
The xm_xml module accepts the following directives in addition to the common module directives.
- IgnoreRootTag
-
This optional boolean directive causes parse_xml() to omit the root tag when setting field names. For example, when this is set to FALSE and the RootTag is set to Event, a field might be named $Event.timestamp. With this directive set to TRUE, that field name would be $timestamp. The default value is TRUE.
Note that a leading dot (.) is not allowed in XML attribute names, so field names with a leading dot (.) will always be excluded from XML output.
- ParseAttributes
-
When this optional boolean directive is set to TRUE, parse_xml() will also parse XML attributes. The default is FALSE (attributes are not parsed). For example, if ParseAttributes is set to TRUE, the following would be parsed into $Msg.time, $Msg.type, and $Msg:
<Msg time='2014-06-27T00:27:38' type='ERROR'>foo</Msg>
- RootTag
-
This optional directive can be used to specify the name of the root tag that will be used by to_xml() to generate XML. The default RootTag is Event.
- PrefixWinEvent
-
When this optional boolean directive is set to TRUE, parse_windows_eventlog_xml() will create fields prefixed with EventData. from the <EventData> section of the event XML and fields prefixed with UserData. from the <UserData> section. The default PrefixWinEvent is FALSE.
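As an illustration, several of the directives above might be combined in one extension instance as follows. This is a minimal sketch intended only to show the syntax, not a recommended setup; the root tag name Record is a hypothetical value chosen for the example.

```
<Extension xml>
    Module          xm_xml
    # Keep the root tag as part of the field names created by parse_xml()
    IgnoreRootTag   FALSE
    # Also create fields from XML attributes
    ParseAttributes TRUE
    # Root tag used by to_xml(); "Record" is a hypothetical name
    RootTag         Record
</Extension>
```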
Functions
The following functions are exported by xm_xml.
- string to_xml()
-
Converts the fields to XML and returns the result as a string value. The $raw_event field and any field having a leading dot (.) or underscore (_) will be automatically excluded. Note that the IncludeHiddenFields directive affects which fields are included in the output.
Procedures
The following procedures are exported by xm_xml.
extract_xml(string xpath);
-
Search the $raw_event field using the specified XPath expression. If successfully matched, it rewrites the $raw_event field with the value of the matched node. If no match is found, it returns an empty string.
parse_windows_eventlog_xml();
-
Parse the $raw_event field as Windows Event Log XML input. Any CR LF sequence and any CR that is not followed by an LF will be translated to a single LF.
parse_windows_eventlog_xml(string source);
-
Parse the given string as Windows Event Log XML format. Any CR LF sequence and any CR that is not followed by an LF will be translated to a single LF.
parse_xml();
-
Parse the $raw_event field as XML input.
parse_xml(string source);
-
Parse the given string as XML format.
to_xml();
-
Convert the fields to XML and put the result into the $raw_event field. The $raw_event field and any field having a leading dot (.) or underscore (_) will be automatically excluded. Note that the IncludeHiddenFields directive affects which fields are included in the output.
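The variants that take a string argument are useful when the XML is carried in a field other than $raw_event. The sketch below assumes syslog messages whose message part is an XML document, so that parse_syslog() populates $Message before it is handed to parse_xml(). The instance names and the listening address are placeholders.

```
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension xml>
    Module      xm_xml
</Extension>

<Input tcp>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
    <Exec>
        # Split the syslog header from the message part
        parse_syslog();
        # Parse the XML carried in $Message instead of $raw_event
        parse_xml($Message);
    </Exec>
</Input>
```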
Creating and populating fields
The parse_xml() procedure parses a string containing XML into structured data. It expects the $raw_event field or the string passed as a parameter to be in the following format:
<Event><Tag1>Value 1</Tag1><Tag2>Value 2</Tag2></Event>
Once a log record is parsed with this procedure, fields are created according to the tags and values in the XML. The fields can be used for further log processing or to convert the log record into a different output format. For an example of how to parse XML log records and manipulate fields, see Parsing XML below.
Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
Extracting values from XML
The extract_xml() function and the extract_xml() procedure both accept an XPath expression to select the required node and an optional string containing the XML data to parse.
If no XML data is specified, the content of the $raw_event core field is parsed.
- Accepted XPath expressions
-
extract_xml() implements the most commonly used subset of XPath:
Selectors
- /node to select by name
- //node to select by name or a first child with this name
- /* to select any node (usually narrowed down by predicates)
- @attribute to get the value of a named attribute; this can only be an end-point
Predicates (filters)
- [n] where n is the numeric index of the array element to select
- [@attribute=value] to narrow nodes by the value of an attribute
An expression can contain a sequence of the above selectors and predicates.
When specifying the //node selector, extract_xml() does not perform a recursive search but limits the scope to "self-or-child".
The output of the /* selector is not deterministic and can vary from one system to another. Therefore, if the selector matches multiple nodes on the same level, you cannot assume it will return the same node.
- Returned data
-
The extract_xml() function returns a string containing the node’s or attribute’s text, or XML if the matched node contains XML elements. The extract_xml() procedure sets the value of the $raw_event field instead of returning a value.
Currently, the returned value is limited to 1 MiB of data.
See Extracting values from XML below for an example.
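To show the function form alongside the selectors described above, the following sketch assigns extracted values to fields instead of rewriting $raw_event. Several details are assumptions rather than confirmed behavior. The argument order (XPath expression first, XML string second) is inferred from the description above. The /Msg/@type expression assumes that an attribute selector is appended to a node path as in standard XPath. The field names $EventTime, $Severity, and $MsgType are chosen for illustration, and the three expressions target differently shaped records (the <Event> sample with an <EventData> child, and the <Msg> element from the ParseAttributes description).

```
<Extension xml>
    Module  xm_xml
</Extension>

<Input in>
    Module  im_file
    File    '/path/to/log/file'
    <Exec>
        # Function form: search $raw_event and return the matched node's text
        $EventTime = extract_xml("/Event/EventData/EventTime");
        # Pass the XML to search explicitly (assumed argument order)
        $Severity = extract_xml("/Event/EventData/Severity", $raw_event);
        # Attribute selector as the end-point of the expression (assumed syntax)
        $MsgType = extract_xml("/Msg/@type");
    </Exec>
</Input>
```

Because the function returns a value rather than modifying $raw_event, the original record remains available for further processing.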
Examples
The following configuration accepts syslog (both BSD and IETF) and converts it to XML.
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension xml>
    Module      xm_xml
</Extension>

<Input tcp>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
    Exec        parse_syslog(); to_xml();
</Input>

<Output file>
    Module      om_file
    File        "/var/log/log.xml"
</Output>

<Route tcp_to_file>
    Path        tcp => file
</Route>
Input sample:
<30>Sep 30 15:45:43 host44.localdomain.hu acpid: 1 client rule loaded
Output sample:
<Event>
<MessageSourceAddress>127.0.0.1</MessageSourceAddress>
<EventReceivedTime>2012-03-08 15:05:39</EventReceivedTime>
<SyslogFacilityValue>3</SyslogFacilityValue>
<SyslogFacility>DAEMON</SyslogFacility>
<SyslogSeverityValue>6</SyslogSeverityValue>
<SyslogSeverity>INFO</SyslogSeverity>
<SeverityValue>2</SeverityValue>
<Severity>INFO</Severity>
<Hostname>host44.localdomain.hu</Hostname>
<EventTime>2012-09-30 15:45:43</EventTime>
<SourceName>acpid</SourceName>
<Message>1 client rule loaded</Message>
</Event>
The following configuration reads the Windows Event Log and converts it to the BSD Syslog format where the message part contains the fields in XML.
<Extension syslog>
    Module  xm_syslog
</Extension>

<Extension xml>
    Module  xm_xml
</Extension>

<Input eventlog>
    Module  im_msvistalog
    Exec    $Message = to_xml(); to_syslog_bsd();
</Input>

<Output tcp>
    Module  om_tcp
    Host    192.168.1.1:1514
</Output>

<Route eventlog_to_tcp>
    Path    eventlog => tcp
</Route>
Output sample:
<14>Mar 8 15:12:12 WIN-OUNNPISDHIG Service_Control_Manager: <Event><EventTime>2012-03-08 15:12:12</EventTime><EventTimeWritten>2012-03-08 15:12:12</EventTimeWritten><Hostname>WIN-OUNNPISDHIG</Hostname><EventType>INFO</EventType><SeverityValue>2</SeverityValue><Severity>INFO</Severity><SourceName>Service Control Manager</SourceName><FileName>System</FileName><EventID>7036</EventID><CategoryNumber>0</CategoryNumber><RecordNumber>6791</RecordNumber><Message>The nxlog service entered the running state. </Message><EventReceivedTime>2012-03-08 15:12:14</EventReceivedTime></Event>
This configuration uses the im_file input module to collect XML logs from file.
Log records are parsed into structured data using the parse_xml() procedure.
If the log record has a severity equal to NOTICE, it is changed to INFO.
Core fields that are not required are deleted and the log record is converted back to XML using the to_xml() procedure.
<Extension xml>
    Module  xm_xml
</Extension>

<Input from_file>
    Module  im_file
    File    '/tmp/input'
    <Exec>
        # Parse $raw_event and create fields
        parse_xml();

        # Change the value of the $Severity field
        if ($Severity == 'NOTICE')
            $Severity = 'INFO';

        # Delete core fields that are not required
        delete($SourceModuleType);
        delete($SourceModuleName);

        # Convert fields back to XML
        to_xml();
    </Exec>
</Input>
Input sample:
<Event>
<Hostname>Ubuntu-VM</Hostname>
<Message>The service has started.</Message>
<Severity>NOTICE</Severity>
</Event>
The output sample below contains all fields from the input sample and the $EventReceivedTime core field.
<Event>
<EventReceivedTime>2021-11-10 09:42:42</EventReceivedTime>
<Hostname>Ubuntu-VM</Hostname>
<Message>The service has started.</Message>
<Severity>INFO</Severity>
</Event>
This configuration collects XML-formatted event logs using the im_file input module, which populates the $raw_event core field with the log line read from the file.
It then uses the extract_xml() procedure to extract the EventData node from the original event and rewrite the $raw_event field.
<Extension xml>
    Module  xm_xml
</Extension>

<Input in>
    Module  im_file
    File    '/path/to/log/file'
    Exec    extract_xml("/Event/EventData");
</Input>
Input sample:
<Event>
<Stream>stdout</Stream>
<EventData>
<EventTime>2020-03-31 14:32:12</EventTime>
<Message>The process has started</Message>
<Severity>INFO</Severity>
</EventData>
</Event>
Output sample:
<EventData>
<EventTime>2020-03-31 14:32:12</EventTime>
<Message>The process has started</Message>
<Severity>INFO</Severity>
</EventData>
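Because the extract_xml() procedure rewrites $raw_event, it can be followed by parse_xml() to turn the extracted node into fields. The following is a sketch of that combination, assuming the same input as above. The exact field names that result depend on the IgnoreRootTag setting, so the configuration should be tested against real data before deployment.

```
<Extension xml>
    Module  xm_xml
</Extension>

<Input in>
    Module  im_file
    File    '/path/to/log/file'
    <Exec>
        # Replace $raw_event with the <EventData> node
        extract_xml("/Event/EventData");
        # Create fields from the child elements of the extracted node
        parse_xml();
    </Exec>
</Input>
```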