JSON (xm_json)
This module provides features to parse or generate JSON data.
Logs in JSON format can be parsed into structured data using one of two methods:
- By calling the parse_json() procedure, which expects the $raw_event field or the string passed as a parameter to contain a valid JSON object. The most common use case is an input module that reads line-based data where each line is a JSON object.
- By specifying the name of an xm_json module instance in the InputType directive of the input module, which instructs the module to parse the raw data directly. This method supports JSON spanning multiple lines (e.g., prettified JSON), JSON arrays, and newline-delimited JSON. See the example on Processing multiple JSON objects.
The JSON specification does not define a type for datetime values,
so these are represented as JSON strings. The JSON parser in this module
can automatically detect datetime values (see the ParseDate directive), so it is
not necessary to call parsedate() explicitly.
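As a rough illustration of the detection step, the Python sketch below treats a string as a date candidate only if it matches the regular expression quoted under ParseDate, and converts it only if it actually parses. This is not xm_json's implementation: datetime.fromisoformat() is a stand-in for the module's own date parser, whose accepted formats are not listed here, so the set of recognized formats is an assumption.

```python
import re
from datetime import datetime

# Candidate test quoted in the ParseDate description: a leading 4-digit year and a dash.
YEAR_PREFIX = re.compile(r"^[12][0-9]{3}-")

def maybe_parse_date(value: str):
    """Return a datetime if `value` looks like a timestamp and parses, else `value` unchanged."""
    if not YEAR_PREFIX.match(value):
        return value
    # Assumption: ISO 8601-style input; fromisoformat() stands in for xm_json's parser.
    candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(candidate)
    except ValueError:
        return value  # looked like a date but was not one: keep the original string

for sample in ("2015-01-01T00:00:00.000Z", "2011-03-08 14:22:41", "2021-only", "INFO"):
    print(repr(maybe_parse_date(sample)))
```

The two-stage check mirrors the documented behavior: the cheap regular expression filters out most strings, and only candidates pay the cost of a full parse.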
The length of JSON keys is limited to a maximum of 499 characters. Any attempt to use a longer key will cause the current operation to fail and will probably result in data loss.
The records can be produced as JSON array elements by specifying the name of an xm_json module instance in the OutputType directive of the output module. See the example on Produce JSON array.
To examine the supported platforms, see the list of installer packages in the Available Modules chapter.
Configuration
The xm_json module accepts the following directives in addition to the common module directives.
- DateFormat
This optional directive can be used to set the format of the datetime strings in the generated JSON. This directive is similar to the global DateFormat, but is independent of it: it is defined separately and has its own default. If this directive is not specified, the default is YYYY-MM-DDThh:mm:ss.sTZ.
- DetectNestedJSON
This optional directive can be used to disable the autodetection of nested JSON strings when calling the to_json() function or the to_json() procedure. For example, consider a field $key which contains the string value {"subkey":42}. If DetectNestedJSON is set to FALSE, to_json() will produce {"key":"{\"subkey\":42}"}. If DetectNestedJSON is set to TRUE (the default), the result is {"key":{"subkey":42}}, a valid nested JSON record.
- Flatten
This optional boolean directive specifies that the parse_json() procedure should flatten nested JSON, creating field names with dot notation. The default is FALSE. If Flatten is set to TRUE, the following JSON will populate the fields $event.time and $event.severity:
{"event":{"time":"2015-01-01T00:00:00.000Z","severity":"ERROR"}}
- ForceUTF8
This optional boolean directive specifies whether the generated JSON should be valid UTF-8. The JSON specification requires JSON records to be UTF-8 encoded, and some tools fail to parse JSON that is not valid UTF-8. If ForceUTF8 is set to TRUE, the generated JSON will be validated and any invalid character will be replaced with a question mark (?). The default is FALSE.
- ParseDate
If this boolean directive is set to TRUE, xm_json will attempt to parse as a timestamp any string that appears to begin with a 4-digit year (as a regular expression, ^[12][0-9]{3}-). If this directive is set to FALSE, xm_json will not attempt to parse these strings. The default is TRUE.
- PrettyPrint
If set to TRUE, this optional boolean directive specifies that the generated JSON should be pretty-printed, with each key-value pair printed on a new indented line. Note that this adds line breaks to the JSON records, which can cause parser errors in some tools that expect single-line JSON. If this directive is not specified, the default is FALSE.
- UnFlatten
This optional boolean directive specifies that the to_json() procedure should generate nested JSON when field names contain a dot (.). For example, if UnFlatten is set to TRUE, the two fields $event.time and $event.severity will be converted to JSON as follows:
{"event":{"time":"2015-01-01T00:00:00.000Z","severity":"ERROR"}}
When UnFlatten is set to FALSE (the default if not specified), the following JSON would result:
{"event.time":"2015-01-01T00:00:00.000Z","event.severity":"ERROR"}
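The Flatten and UnFlatten directives are inverse transformations between nested objects and dot-separated field names. The Python sketch below models them on plain dictionaries, using the $event.time and $event.severity example from above. It is an illustration under stated assumptions, not xm_json's code: the function names are hypothetical, arrays are assumed to be left intact, and conflicting names (such as both a and a.b) are not handled.

```python
import json

def flatten(obj, prefix=""):
    """Turn nested objects into dot-separated field names (Flatten TRUE)."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        else:
            fields[name] = value  # assumption: arrays and scalars are kept as-is
    return fields

def unflatten(fields):
    """Rebuild nested objects from dot-separated field names (UnFlatten TRUE)."""
    root = {}
    for name, value in fields.items():
        *parents, leaf = name.split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate objects on demand
        node[leaf] = value
    return root

if __name__ == "__main__":
    record = {"event": {"time": "2015-01-01T00:00:00.000Z", "severity": "ERROR"}}
    fields = flatten(record)
    print(fields)
    print(json.dumps(unflatten(fields), separators=(",", ":")))
```

Because the two functions round-trip, a record parsed with Flatten set to TRUE and written with UnFlatten set to TRUE keeps its original nesting in this model.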
Functions
The following functions are exported by xm_json.
- string to_json()
Convert the fields to JSON and return this as a string value. Any field having a leading dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The existing $raw_event field is never included in the generated JSON.
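The field-selection rules of to_json(), together with the DetectNestedJSON behavior described earlier, can be modeled in a few lines of Python. In this sketch, fields are a dictionary keyed by name without the leading $, the parameter names are hypothetical stand-ins for IncludeHiddenFields and DetectNestedJSON, and the nested-JSON test (a string that starts with { or [ and parses) is an assumption about how detection works.

```python
import json

def _maybe_nested(text):
    """Assumed DetectNestedJSON heuristic: embed strings that parse as an object or array."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            return json.loads(stripped)
        except ValueError:
            pass  # not valid JSON: keep it as a plain string
    return text

def to_json(fields, include_hidden=False, detect_nested=True):
    """Serialize fields the way the to_json() description states."""
    out = {}
    for name, value in fields.items():
        if name == "raw_event":
            continue  # $raw_event is never included
        if name[:1] in (".", "_") and not include_hidden:
            continue  # hidden fields are skipped unless explicitly included
        if detect_nested and isinstance(value, str):
            value = _maybe_nested(value)
        out[name] = value
    return json.dumps(out, separators=(",", ":"))

if __name__ == "__main__":
    print(to_json({"raw_event": "...", "_tmp": 1, "key": '{"subkey":42}'}))
```

The function form returns the string; the procedure form described below would instead store the same string in $raw_event.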
Procedures
The following procedures are exported by xm_json.
- extract_json(string jsonpath);
Search the $raw_event field using the specified JSONPath expression. If a match is found, the $raw_event field is rewritten with the value of the matched node. If no match is found, $raw_event is set to an empty string.
- parse_json();
Parse the $raw_event field as JSON input.
- parse_json(string source);
Parse the given string as JSON.
- to_json();
Convert the fields to JSON and put this into the $raw_event field. Any field having a leading dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The existing $raw_event field is never included in the generated JSON.
Creating and populating fields
The parse_json() procedure parses a string containing a JSON object into structured data.
It expects the $raw_event field or the string passed as a parameter to be in the following format:
{"Key1":"Value1","Key2":"Value2"}
Once a log record is parsed with this procedure, fields are created according to the keys and values in the JSON object. The fields can be used for further log processing or to convert the log record into a different output format. For an example of how to parse JSON log records and manipulate fields, see Parsing JSON below.
Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
Extracting values from JSON
Both the extract_json() function and the extract_json() procedure accept a JSONPath expression to select the required node and an optional string containing the JSON data to parse.
If no JSON data is specified, the content of the $raw_event core field is parsed.
- Accepted JSONPath expressions
extract_json() implements the most commonly used subset of JSONPath, in $.a.['b']['c'].d[1].['e'][1]..f notation:
  - Node selection by name, e.g., $.a or $['a'].
  - Array element selection by index, including multi-dimensional arrays, e.g., $.a[1], $.a[1][1][1], or $['a'][1][1][1].
  - Search for a named node in one nesting layer, e.g., $..a.
  - Any sequence of the above expressions.
When specifying a $..a-type expression, extract_json() does not perform a recursive search for nodes but limits the scope to the child nodes of the current level. The first match is returned.
- Returned data
The extract_json() function returns a string containing the node's value, or JSON if the matched node contains a JSON object. The extract_json() procedure sets the value of the $raw_event field instead of returning a value. Since data is parsed and rewritten as a string, floating-point values always include 6 digits after the decimal point.
See Extracting values from JSON below for an example.
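To make the accepted JSONPath subset and the returned-data rules concrete, the Python sketch below tokenizes a path, walks the parsed JSON, and renders the result as a string. It is a hypothetical model, not xm_json's code. Two points are assumptions: the one-layer ..name search is read as "the current node, then each of its direct children, first match wins", and integers, booleans, and null are rendered as JSON literals, which the description above does not specify.

```python
import json
import re

# One step of the supported subset: ..name, [N], ['name'] or .['name'], .name
TOKEN = re.compile(
    r"\.\.(?P<deep>[^.\[\]]+)"
    r"|\.?\[(?:'(?P<qkey>[^']*)'|(?P<idx>\d+))\]"
    r"|\.(?P<key>[^.\[\]]+)"
)
MISSING = object()

def tokenize(path):
    if not path.startswith("$"):
        raise ValueError("JSONPath must start with $")
    steps, pos = [], 1
    while pos < len(path):
        m = TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported JSONPath syntax at {path[pos:]!r}")
        if m.group("deep") is not None:
            steps.append(("deep", m.group("deep")))
        elif m.group("idx") is not None:
            steps.append(("index", int(m.group("idx"))))
        else:
            key = m.group("qkey") if m.group("qkey") is not None else m.group("key")
            steps.append(("key", key))
        pos = m.end()
    return steps

def search_one_layer(node, name):
    # Assumed reading of "one nesting layer": the current node, then its direct children.
    if isinstance(node, dict) and name in node:
        return node[name]
    children = node.values() if isinstance(node, dict) else node if isinstance(node, list) else ()
    for child in children:
        if isinstance(child, dict) and name in child:
            return child[name]  # first match wins; no deeper recursion
    return MISSING

def select(node, steps):
    for kind, arg in steps:
        if kind == "key":
            node = node.get(arg, MISSING) if isinstance(node, dict) else MISSING
        elif kind == "index":
            node = node[arg] if isinstance(node, list) and arg < len(node) else MISSING
        else:
            node = search_one_layer(node, arg)
        if node is MISSING:
            break
    return node

def render(value):
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))  # objects come back as JSON
    if isinstance(value, float):
        return f"{value:.6f}"  # 6 digits after the decimal point
    if isinstance(value, str):
        return value
    return json.dumps(value)  # assumption: ints, booleans, null as JSON literals

def extract_json(jsonpath, raw_event):
    """Model of the procedure: the new $raw_event value, or "" when nothing matches."""
    node = select(json.loads(raw_event), tokenize(jsonpath))
    return "" if node is MISSING else render(node)

if __name__ == "__main__":
    raw = '{"Stream":"stdout","EventData":{"Message":"The process has started"}}'
    print(extract_json("$.EventData", raw))
```

Limiting ..name to one layer keeps the lookup cost bounded by the size of the current node rather than the whole document, which is the trade-off the description above implies.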
Examples
The following configuration accepts syslog (both BSD and IETF) via TCP and converts it to JSON.
<Extension syslog>
Module xm_syslog
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input tcp>
Module im_tcp
ListenAddr 0.0.0.0:1514
Exec parse_syslog(); to_json();
</Input>
<Output file>
Module om_file
File "/var/log/json.txt"
</Output>
<Route tcp_to_file>
Path tcp => file
</Route>
Input sample:
<30>Sep 30 15:45:43 host44.localdomain.hu acpid: 1 client rule loaded
Output sample:
{
"MessageSourceAddress":"127.0.0.1",
"EventReceivedTime":"2011-03-08 14:22:41",
"SyslogFacilityValue":1,
"SyslogFacility":"DAEMON",
"SyslogSeverityValue":5,
"SyslogSeverity":"INFO",
"SeverityValue":2,
"Severity":"INFO",
"Hostname":"host44.localdomain.hu",
"EventTime":"2011-09-30 14:45:43",
"SourceName":"acpid",
"Message":"1 client rule loaded "
}
The following configuration reads the Windows Event Log and converts events to the BSD syslog format, with the message part containing the fields in JSON.
<Extension syslog>
Module xm_syslog
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input eventlog>
Module im_msvistalog
Exec $Message = to_json(); to_syslog_bsd();
</Input>
<Output tcp>
Module om_tcp
Host 192.168.1.1:1514
</Output>
<Route eventlog_json_tcp>
Path eventlog => tcp
</Route>
Output sample:
<14>Mar 8 14:40:11 WIN-OUNNPISDHIG Service_Control_Manager: {"EventTime":"2012-03-08 14:40:11","EventTimeWritten":"2012-03-08 14:40:11","Hostname":"WIN-OUNNPISDHIG","EventType":"INFO","SeverityValue":2,"Severity":"INFO","SourceName":"Service Control Manager","FileName":"System","EventID":7036,"CategoryNumber":0,"RecordNumber":6788,"Message":"The nxlog service entered the running state. ","EventReceivedTime":"2012-03-08 14:40:12"}
The NXLog configuration below uses the im_file input module to collect JSON log lines from a file.
Log records are parsed into structured data using the parse_json() procedure.
If the log record has a severity of NOTICE, it is changed to INFO.
Core fields that are not required are deleted, and the log record is converted back to JSON using the to_json() procedure.
<Extension json>
Module xm_json
</Extension>
<Input from_file>
Module im_file
File "/tmp/input"
<Exec>
# Parse $raw_event and create fields
parse_json();
# Change the value of the $Severity field
if ($Severity == 'NOTICE')
$Severity = 'INFO';
# Delete core fields that are not required
delete($SourceModuleType);
delete($SourceModuleName);
# Convert fields back to JSON
to_json();
</Exec>
</Input>
Input sample:
{
"Hostname": "Ubuntu-VM",
"Message": "The service has started.",
"Severity": "NOTICE"
}
The output sample below contains all fields from the input sample and the $EventReceivedTime core field.
{
"EventReceivedTime": "2021-08-29T22:50:12.000000-07:00",
"Hostname": "Ubuntu-VM",
"Message": "The service has started.",
"Severity": "INFO"
}
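The three input samples below present the same pair of JSON objects in different forms: pretty-printed objects separated by a comma, a JSON array, and newline-delimited JSON. Each of them can be parsed by using the InputType directive of the input module.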
{
"Date": "2021-08-29", "Time": "22:50:12",
"Message": "The process has started"
},
{
"Message": "The process has stopped!"
}
[{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}, {"Message": "The process has stopped!"}]
{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}
{"Message": "The process has stopped!"}
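The reason all three samples yield the same records is that the parser reads top-level JSON values one after another instead of one per line. The Python sketch below models this with json.JSONDecoder.raw_decode(); it is an illustration, not NXLog's parser. Skipping commas between top-level values is an assumption inferred from the first sample, and expanding a top-level array into one record per element follows the JSON array support described at the top of this page.

```python
import json

_SEPARATORS = " \t\r\n,"

def iter_json_records(text):
    """Yield every record from concatenated, comma-separated, array, or NDJSON input."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace and commas between top-level values (assumption based on sample 1).
        while pos < len(text) and text[pos] in _SEPARATORS:
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)  # returns the value and the index after it
        if isinstance(value, list):
            yield from value  # each element of a top-level array becomes a record
        else:
            yield value

if __name__ == "__main__":
    ndjson = '{"Message": "The process has started"}\n{"Message": "The process has stopped!"}\n'
    for record in iter_json_records(ndjson):
        print(record)
```

Decoding value by value, rather than splitting on newlines, is what lets multi-line objects and single-line arrays share one code path.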
The configuration below parses JSON data using the xm_json module instance specified by the InputType directive.
The value of this directive must correspond to the name of the xm_json module instance, in this case json_parser.
Using the InputType directive eliminates the need to call the parse_json() procedure.
The to_json() procedure is used to generate the output as JSON, which is only necessary because the configuration deletes some fields.
<Extension json_parser>
Module xm_json
</Extension>
<Input from_file>
Module im_file
File 'C:\input.txt'
InputType json_parser
<Exec>
# Checking for the $Date field presence in the entry
if defined $Date
{
# Create the $EventTime field from $Date and $Time, joined with a space to match the format string
$EventTime = strptime($Date + " " + $Time, "%Y-%m-%d %T");
}
else
{
# Using the $EventReceivedTime value if $Date is not specified
$EventTime = $EventReceivedTime;
}
# Deleting several fields to make the output shorter
delete($Date);
delete($Time);
delete($EventReceivedTime);
delete($SourceModuleName);
delete($SourceModuleType);
# Convert fields back to JSON
to_json();
</Exec>
</Input>
The output is the same for all three input samples.
{"Message":"The process has started","EventTime":"2021-08-29T22:50:12.000000-07:00"}
{"Message":"The process has stopped!","EventTime":"2021-08-29T22:50:26.979726-07:00"}
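The format string "%Y-%m-%d %T" expects a space between the date and the time, while plain concatenation of "2021-08-29" and "22:50:12" produces "2021-08-2922:50:12". Whether NXLog's strptime() tolerates the missing separator depends on the underlying implementation, so joining the values with an explicit space is the safer choice. The Python sketch below illustrates the difference using Python's own datetime.strptime(), which does require the separator; %H:%M:%S is spelled out because it is equivalent to %T. The helper names are hypothetical.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"  # same meaning as "%Y-%m-%d %T"

def combine(date, time):
    """Join the date and time with a space so the string matches FORMAT."""
    return datetime.strptime(date + " " + time, FORMAT)

def combine_without_separator(date, time):
    """Plain concatenation, e.g. "2021-08-2922:50:12"; Python's strptime rejects it."""
    return datetime.strptime(date + time, FORMAT)

if __name__ == "__main__":
    print(combine("2021-08-29", "22:50:12"))
    try:
        combine_without_separator("2021-08-29", "22:50:12")
    except ValueError as err:
        print("rejected:", err)
```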
The configuration below generates the output as a JSON array.
<Extension json_ex>
Module xm_json
</Extension>
<Input in>
Module im_file
File 'C:\input.txt'
InputType json_ex
</Input>
<Output out>
Module om_file
<Exec>
delete($EventReceivedTime);
delete($SourceModuleName);
delete($SourceModuleType);
</Exec>
OutputType json_ex
File 'C:\output.txt'
</Output>
Input sample:
{ "uid": 175, "user": "terry", "os": "linux", "active": false, "ssh": null }
[ { "uid": 235, "user": "alex", "os": "linux", "active": true, "ssh": null }, { "uid": 333, "user": "roger", "os": "windows", "active": false, "ssh": null } ]
{ "uid": 853, "user": "ben", "os": "windows", "active": false, "ssh": null }
Output sample:
[ {"uid":175,"user":"terry","os":"linux","active":false,"ssh":null}, {"uid":235,"user":"alex","os":"linux","active":true,"ssh":null}, {"uid":333,"user":"roger","os":"windows","active":false,"ssh":null}, {"uid":853,"user":"ben","os":"windows","active":false,"ssh":null} ]
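The round trip in this example can be modeled in Python: each input line holds either one object or an array of objects, and every record ends up as one element of a single output array. In the sketch below, read_records() and write_json_array() are hypothetical helpers, the spacing around the array elements is copied from the output sample rather than from any specification, and the whole array is built in memory, which NXLog may not do.

```python
import json

def read_records(lines):
    """Yield one record per JSON object; a line holding an array yields each element."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line)
        if isinstance(value, list):
            yield from value
        else:
            yield value

def write_json_array(records):
    """Serialize records as the elements of one JSON array, spaced like the output sample."""
    items = (json.dumps(record, separators=(",", ":")) for record in records)
    return "[ " + ", ".join(items) + " ]"

if __name__ == "__main__":
    lines = [
        '{ "uid": 175, "user": "terry", "os": "linux", "active": false, "ssh": null }',
        '{ "uid": 853, "user": "ben", "os": "windows", "active": false, "ssh": null }',
    ]
    print(write_json_array(read_records(lines)))
```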
This configuration collects JSON-formatted event logs using the im_file input module, which populates the $raw_event core field with the log line read from the file.
It then uses the extract_json() procedure to extract the EventData node from the original event and rewrite the $raw_event field.
<Extension json_ex>
Module xm_json
</Extension>
<Input in>
Module im_file
File '/path/to/log/file'
Exec extract_json("$.EventData");
</Input>
Input sample:
{
"Stream": "stdout",
"EventData": {
"EventTime": "2020-03-31 09:35:12",
"Message": "The process has started",
"Severity": "INFO"
}
}
Output sample:
{
"EventTime": "2020-03-31 09:35:12",
"Message": "The process has started",
"Severity": "INFO"
}