JSON (xm_json)
This module provides features to parse or generate JSON data.
Logs in JSON format can be parsed into structured data using one of two methods:
- By calling the parse_json() procedure, which expects the $raw_event field or the string passed as a parameter to contain a valid JSON object. The most common use case for this is when an input module reads line-based data and each line is a JSON object.
- By specifying the name of an xm_json module instance in the InputType directive of the input module, which instructs the module to parse the raw data directly. This method supports JSON spanning multiple lines (e.g., prettified JSON), JSON arrays, and newline-delimited JSON. See the example on Processing multiple JSON objects.
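To make the difference between the two methods concrete, here is a minimal Python sketch of reading a stream that may contain prettified objects, arrays of objects, or newline-delimited JSON, and producing one record per object. It is an illustration only, not how xm_json is implemented: iter_json_records() is a hypothetical helper, and skipping commas between top-level objects is an assumption made so that it also accepts the comma-separated sample in the Examples section.

```python
import json


def iter_json_records(text):
    """Yield one record per JSON object found in text (hypothetical helper).

    Handles prettified objects spanning several lines, top-level arrays of
    objects, and newline-delimited JSON. Assumption: commas between
    top-level values are tolerated and skipped.
    """
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # Skip whitespace and separators between top-level values.
        while pos < end and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= end:
            return
        # raw_decode() parses one JSON value and reports where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            yield from value  # each array element becomes its own record
        else:
            yield value
```

A line-based reader combined with parse_json() corresponds to calling json.loads() once per line, which only works for the newline-delimited form.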
Log records can be converted to JSON format by calling the to_json() function or procedure. The records can be produced as JSON array elements by specifying the name of an xm_json module instance in the OutputType directive of the output module. See the example on Produce JSON array.
When using this module, keep in mind the following limitations:
- The JSON specification does not define a type for datetime values, so these are represented as JSON strings. The JSON parser in this module can automatically detect datetime values, so it is not necessary to explicitly use parsedate().
- The parse_json() and to_json() functions do not support floating-point or integer values that exceed the limits of the 32-bit or 64-bit instruction set architecture of the computer where NXLog runs. Such values are converted into strings to preserve the original data.
- The length of JSON keys is limited to a maximum of 499 characters. Any attempt to use a longer key will cause the current operation to fail and will probably result in data loss.
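The limitations above can be illustrated with a short Python sketch built on the parse hooks of the standard json module. Everything here is an assumption for illustration and not NXLog's parser: the 64-bit integer range stands in for the platform limit, treating a string that starts with YYYY-MM-DD as a datetime candidate is a guess at the autodetection rule, and for brevity only string values directly inside objects are checked.

```python
import json
import math
import re
from datetime import datetime

MAX_KEY_LENGTH = 499                           # key length limit stated above
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1       # assumption: 64-bit platform
_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")  # assumed detection rule


def _parse_int(text):
    value = int(text)
    # Out-of-range integers keep their original text instead of overflowing.
    return value if INT64_MIN <= value <= INT64_MAX else text


def _parse_float(text):
    value = float(text)
    # Values too large for a double (such as 1e400) also keep their text.
    return value if math.isfinite(value) else text


def _maybe_datetime(value):
    if isinstance(value, str) and _DATE_PREFIX.match(value):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            pass  # looked like a date but is not one: keep the string
    return value


def _build_object(pairs):
    fields = {}
    for key, value in pairs:
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"JSON key exceeds {MAX_KEY_LENGTH} characters")
        fields[key] = _maybe_datetime(value)
    return fields


def parse_json(text):
    """Hypothetical stand-in for parse_json() on a single JSON object."""
    return json.loads(text, parse_int=_parse_int, parse_float=_parse_float,
                      object_pairs_hook=_build_object)
```

Keeping the original text of an out-of-range number is what "converted into strings to preserve the original data" amounts to: no digits are lost, but the value is no longer numeric.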
To examine the supported platforms, see the list of installer packages in the Available Modules chapter.
Configuration
The xm_json module accepts the following directives in addition to the common module directives.
Optional directives
This optional directive can be used to set the format of the datetime strings in the generated JSON.
This directive is similar to the global DateFormat, but is independent of it: this directive is defined separately and has its own default.
If this directive is not specified, the default is
This optional directive can be used to disable the autodetection of nested JSON strings when calling the to_json() function or the to_json() procedure.
For example, consider a field
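The nested JSON autodetection that this directive controls can be sketched in Python as follows. This is an illustration under stated assumptions, not NXLog's implementation: fields_to_json() is a hypothetical helper, and it treats a string field as nested JSON only when the string parses as a JSON object or array.

```python
import json


def fields_to_json(fields, detect_nested=True):
    """Serialize fields, optionally embedding strings that contain JSON.

    Hypothetical helper. Assumption: a string counts as nested JSON only
    if it parses as a JSON object or array.
    """
    output = {}
    for name, value in fields.items():
        if detect_nested and isinstance(value, str):
            try:
                parsed = json.loads(value)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                parsed = None
            if isinstance(parsed, (dict, list)):
                value = parsed  # embed as real nested JSON, not a quoted string
        output[name] = value
    return json.dumps(output, separators=(",", ":"))
```

With detection enabled, a field holding the text {"subkey":42} becomes a nested object; with it disabled, the same text is emitted as an escaped string.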
This optional boolean directive specifies that the parse_json() procedure should flatten nested JSON, creating field names with dot notation.
The default is
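A minimal Python sketch of what flattening with dot notation means for field names. flatten() is a hypothetical helper, not NXLog code, and leaving arrays unflattened is an assumption, since the description above only covers nested objects.

```python
def flatten(obj, prefix=""):
    """Return a flat dict whose keys are dot-joined paths into obj."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))  # descend into nested objects
        else:
            fields[name] = value  # scalars and (by assumption) arrays kept as-is
    return fields
```

For example, the object {"event": {"severity": "ERROR"}} yields a single field named event.severity.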
This optional boolean directive specifies whether the generated JSON should be valid UTF-8.
The JSON specification requires JSON records to be UTF-8 encoded, and some tools fail to parse JSON if it is not valid UTF-8.
If ForceUTF8 is set to
This boolean directive specifies that the to_json() function or the to_json() procedure should include fields having a leading dot (.) or underscore (_) in their names.
If this boolean directive is set to
If set to
This optional boolean directive specifies that the to_json() procedure should generate nested JSON when field names contain the dot (.) character.
When UnFlatten is set to
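The reverse operation, turning dot-separated field names back into nested objects, can be sketched in Python like this. unflatten() is a hypothetical helper; raising an error when a field such as a coexists with a.b is an assumption, because the behavior for such conflicts is not described here.

```python
def unflatten(fields):
    """Turn {"event.time": ..., "event.severity": ...} into nested dicts."""
    nested = {}
    for name, value in fields.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # Assumption: "a" plus "a.b" is treated as a conflict.
                raise ValueError(f"field {name!r} conflicts with a non-object value")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"field {name!r} conflicts with nested fields")
        node[leaf] = value
    return nested
```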
Functions
The following functions are exported by xm_json.
string to_json()
Convert the fields to JSON and return this as a string value. Any field having a leading dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The existing $raw_event field is never included in the generated JSON.
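The field-selection rules of to_json() can be sketched in Python as follows. The record is modeled as a plain dict whose keys omit the $ prefix, and to_json() here is a hypothetical stand-in rather than NXLog code; whether NXLog alters the names of hidden fields when it includes them is not covered by this sketch.

```python
import json


def to_json(record, include_hidden_fields=False):
    """Hypothetical stand-in: serialize a record as described above."""
    output = {}
    for name, value in record.items():
        if name == "raw_event":
            continue  # $raw_event is never included
        if name.startswith((".", "_")) and not include_hidden_fields:
            continue  # hidden fields are skipped unless explicitly requested
        output[name] = value
    return json.dumps(output, separators=(",", ":"))
```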
Procedures
The following procedures are exported by xm_json.
extract_json(string jsonpath);
Search the $raw_event field using the specified JSONPath expression. If a match is found, it rewrites the $raw_event field with the value of the matched node. If no match is found, $raw_event is set to an empty string.
parse_json();
Parse the $raw_event field as JSON input.
parse_json(string source);
Parse the given string as JSON.
to_json();
Convert the fields to JSON and write the result to the $raw_event field. Any field starting with a dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The $raw_event core field is not included when converting fields to JSON.
Creating and populating fields
The parse_json() procedure parses a string containing a JSON object into structured data.
It expects the $raw_event field or the string passed as a parameter to be in the following format:
{"Key1":"Value1","Key2":"Value2"}
Once a log record is parsed with this procedure, fields are created according to the keys and values in the JSON object. The fields can be used for further log processing or to convert the log record into a different output format. For an example of how to parse JSON log records and manipulate fields, see Parsing JSON below.
Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.
Extracting values from JSON
Both the extract_json() function and the extract_json() procedure accept a JSONPath expression to select the required node and an optional string containing the JSON data to parse. If no JSON data is specified, the content of the $raw_event core field is parsed.
Accepted JSONPath expressions
extract_json() implements a subset of JSONPath that is most commonly used in $.a.['b']['c'].d[1].['e'][1]..f notation:
- Node selection by name, e.g., $.a or $['a'].
- Array element selection by index, including multi-dimensional arrays, e.g., $.a[1], $.a[1][1][1], or $['a'][1][1][1].
- Search for a named node in one nesting layer, e.g., $..a.
- Any sequence of the above expressions.
When specifying a $..a-type expression, extract_json() does not perform a recursive search for nodes but limits the scope to the child nodes of the current level. The first match is returned.
Returned data
The extract_json() function returns a string containing the node’s value, or JSON if the matched node contains a JSON object. The extract_json() procedure sets the value of the $raw_event field instead of returning a value. Since data is parsed and rewritten as a string, floating-point values always include 6 digits after the decimal point.
See Extracting values from JSON below for an example.
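The following Python sketch models the accepted JSONPath subset and the returned data described above. It is not NXLog's implementation: extract_json() here is a hypothetical Python function named after the NXLog one, the tokenizer grammar is reconstructed from the listed examples, and the handling of $..name (check the current node and the nodes directly below it, return the first match, never recurse deeper) is an assumed reading of the non-recursive search.

```python
import json
import re

# One token of the JSONPath subset (grammar assumed from the examples):
#   ..name    search the current level for "name" (non-recursive)
#   .name     child node selected by name
#   ['name']  child node selected by name; a leading dot is optional
#   [n]       array element selected by zero-based index
_TOKEN = re.compile(r"\.\.(\w+)|\.(\w+)|\.?\['([^']*)'\]|\[(\d+)\]")
_NO_MATCH = object()


def tokenize(path):
    if not path.startswith("$"):
        raise ValueError("a JSONPath expression must start with $")
    tokens, pos = [], 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax: {path[pos:]!r}")
        search, name, quoted, index = m.groups()
        if search is not None:
            tokens.append(("search", search))
        elif index is not None:
            tokens.append(("index", int(index)))
        else:
            tokens.append(("name", name if name is not None else quoted))
        pos = m.end()
    return tokens


def select(node, tokens):
    for kind, key in tokens:
        if kind == "name":
            if not isinstance(node, dict) or key not in node:
                return _NO_MATCH
            node = node[key]
        elif kind == "index":
            if not isinstance(node, list) or key >= len(node):
                return _NO_MATCH
            node = node[key]
        else:
            # Assumed reading: look at the current node and the nodes
            # directly below it, and take the first object containing key.
            if isinstance(node, dict):
                candidates = [node, *node.values()]
            elif isinstance(node, list):
                candidates = list(node)
            else:
                candidates = []
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    node = candidate[key]
                    break
            else:
                return _NO_MATCH
    return node


def to_text(value):
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))  # objects come back as JSON
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        return f"{value:f}"  # always 6 digits after the decimal point
    if value is None:
        return "null"
    return str(value)


def extract_json(path, raw_event):
    """Return the matched node as a string, or "" when nothing matches."""
    value = select(json.loads(raw_event), tokenize(path))
    return "" if value is _NO_MATCH else to_text(value)
```

The procedure form would simply assign this result back to the record's raw_event instead of returning it.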
Examples
Converting logs to JSON
The following configuration accepts syslog (both BSD and IETF) via TCP and converts it to JSON.
<Extension syslog>
Module xm_syslog
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input tcp>
Module im_tcp
ListenAddr 0.0.0.0:1514
Exec parse_syslog(); to_json();
</Input>
<Output file>
Module om_file
File "/var/log/json.txt"
</Output>
<Route tcp_to_file>
Path tcp => file
</Route>
Input sample
<30>Sep 30 15:45:43 host44.localdomain.hu acpid: 1 client rule loaded
Output sample
{
"MessageSourceAddress":"127.0.0.1",
"EventReceivedTime":"2011-03-08 14:22:41",
"SyslogFacilityValue":1,
"SyslogFacility":"DAEMON",
"SyslogSeverityValue":5,
"SyslogSeverity":"INFO",
"SeverityValue":2,
"Severity":"INFO",
"Hostname":"host44.localdomain.hu",
"EventTime":"2011-09-30 14:45:43",
"SourceName":"acpid",
"Message":"1 client rule loaded "
}
The following configuration reads the Windows Event Log and converts events to the BSD syslog format, with the message part containing the fields in JSON.
<Extension syslog>
Module xm_syslog
</Extension>
<Extension json>
Module xm_json
</Extension>
<Input eventlog>
Module im_msvistalog
Exec $Message = to_json(); to_syslog_bsd();
</Input>
<Output tcp>
Module om_tcp
Host 192.168.1.1:1514
</Output>
<Route eventlog_json_tcp>
Path eventlog => tcp
</Route>
<14>Mar 8 14:40:11 WIN-OUNNPISDHIG Service_Control_Manager: {"EventTime":"2012-03-08 14:40:11","EventTimeWritten":"2012-03-08 14:40:11","Hostname":"WIN-OUNNPISDHIG","EventType":"INFO","SeverityValue":2,"Severity":"INFO","SourceName":"Service Control Manager","FileName":"System","EventID":7036,"CategoryNumber":0,"RecordNumber":6788,"Message":"The nxlog service entered the running state. ","EventReceivedTime":"2012-03-08 14:40:12"}
Parsing JSON logs
The NXLog configuration below uses the im_file input module to collect JSON log lines from a file. Log records are parsed into structured data using the parse_json() procedure. If the log record has a severity equal to NOTICE, it is changed to INFO. Core fields that are not required are deleted, and the log record is converted back to JSON using the to_json() procedure.
<Extension json>
Module xm_json
</Extension>
<Input from_file>
Module im_file
File "/tmp/input"
<Exec>
# Parse $raw_event and create fields
parse_json();
# Change the value of the $Severity field
if ($Severity == 'NOTICE')
$Severity = 'INFO';
# Delete core fields that are not required
delete($SourceModuleType);
delete($SourceModuleName);
# Convert fields back to JSON
to_json();
</Exec>
</Input>
{
"Hostname": "Ubuntu-VM",
"Message": "The service has started.",
"Severity": "NOTICE"
}
The output sample below contains all fields from the input sample and the
$EventReceivedTime
core field.
{
"EventReceivedTime": "2021-08-29T22:50:12.000000-07:00",
"Hostname": "Ubuntu-VM",
"Message": "The service has started.",
"Severity": "INFO"
}
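For readers who want to trace the Exec block above step by step, the following Python sketch mirrors it on a plain dict. process() is a hypothetical helper, not part of NXLog, and the SourceModuleName and SourceModuleType values stand in for fields added by the input module; they are illustrative only.

```python
import json


def process(raw_event, event_received_time):
    # Fields the input module adds on its own (illustrative values).
    record = {
        "EventReceivedTime": event_received_time,
        "SourceModuleName": "from_file",
        "SourceModuleType": "im_file",
    }
    record.update(json.loads(raw_event))      # parse_json();
    if record.get("Severity") == "NOTICE":    # if ($Severity == 'NOTICE')
        record["Severity"] = "INFO"           #     $Severity = 'INFO';
    record.pop("SourceModuleType", None)      # delete($SourceModuleType);
    record.pop("SourceModuleName", None)      # delete($SourceModuleName);
    return json.dumps(record)                 # to_json();
```

As in the output sample, the only field left over from the input module is EventReceivedTime.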
The following input samples contain the same pair of JSON objects in three different forms: prettified objects spanning multiple lines, a JSON array, and newline-delimited JSON. All three can be parsed using the InputType directive of the input module.
{
"Date": "2021-08-29", "Time": "22:50:12",
"Message": "The process has started"
},
{
"Message": "The process has stopped!"
}
[{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}, {"Message": "The process has stopped!"}]
{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}
{"Message": "The process has stopped!"}
The configuration below parses JSON data using the xm_json module instance specified by the InputType directive. The value of this directive must correspond to the name of the xm_json module instance, json_parser in this case. Using the InputType directive eliminates the need to use the parse_json() procedure. The to_json() procedure is used to generate the output as JSON, which is only necessary because the configuration deletes some fields.
<Extension json_parser>
Module xm_json
</Extension>
<Input from_file>
Module im_file
File 'C:\input.txt'
InputType json_parser
<Exec>
# Checking for the $Date field presence in the entry
if defined $Date
{
# Creating the $EventTime field based on the $Date and $Time values
$EventTime = strptime($Date + $Time,"%Y-%m-%d %T");
}
else
{
# Using the $EventReceivedTime value if $Date is not specified
$EventTime = $EventReceivedTime;
}
# Deleting several fields to make the output shorter
delete($Date);
delete($Time);
delete($EventReceivedTime);
delete($SourceModuleName);
delete($SourceModuleType);
# Convert fields back to JSON
to_json();
</Exec>
</Input>
The output is the same for all three input samples.
{"Message":"The process has started","EventTime":"2021-08-29T22:50:12.000000-07:00"}
{"Message":"The process has stopped!","EventTime":"2021-08-29T22:50:26.979726-07:00"}
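The EventTime logic in the Exec block above can be modeled in Python like this. set_event_time() is a hypothetical helper; it assumes Date and Time arrive as plain strings and joins them with a space so the combined value matches the format string exactly. To stay portable across Python versions, %H:%M:%S is used instead of %T.

```python
from datetime import datetime


def set_event_time(fields):
    """Derive EventTime, then drop the fields the configuration deletes."""
    if "Date" in fields:
        stamp = fields["Date"] + " " + fields["Time"]
        fields["EventTime"] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    else:
        # No Date field: fall back to the time the event was received.
        fields["EventTime"] = fields["EventReceivedTime"]
    for name in ("Date", "Time", "EventReceivedTime",
                 "SourceModuleName", "SourceModuleType"):
        fields.pop(name, None)
    return fields
```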
This configuration collects JSON-formatted event logs using the im_file input module, which populates the $raw_event core field with the log line read from the file. It then uses the extract_json() procedure to extract the EventData node from the original event and rewrite the $raw_event field.
<Extension json_ex>
Module xm_json
</Extension>
<Input in>
Module im_file
File '/path/to/log/file'
Exec extract_json("$.EventData");
</Input>
Input sample
{
"Stream": "stdout",
"EventData": {
"EventTime": "2020-03-31 09:35:12",
"Message": "The process has started",
"Severity": "INFO"
}
}
Output sample
{
"EventTime": "2020-03-31 09:35:12",
"Message": "The process has started",
"Severity": "INFO"
}
This example demonstrates how to use the parse_json() function to process JSON logs containing composite data. The function converts the raw JSON event into an object, allowing you to access individual fields.
Input sample
{"Message": "The cat has come", "Attributes": {"Color": "black", "Weight": 10}}
{"Message": "The cat has gone"}
This configuration collects logs from a file and parses log records using the parse_json() function. Variables prefixed with $$ are module variables, which means they are only accessible by that module instance and are not propagated through the log processing pipeline.
<Extension json_ex>
Module xm_json
</Extension>
<Input in1>
Module im_file
File '/path/to/log/file'
<Exec>
$$parsed = json_ex->parse_json($raw_event); (1)
log_info("Parsed type:" + type($$parsed)); (2)
log_info("Message value:" + $$parsed("Message")); (3)
log_info("Attrib value:" + $$parsed("Attributes")); (4)
</Exec>
</Input>
(1) Parses the raw JSON event and stores the structured result in the $$parsed variable. The parse_json() function automatically detects whether the input is a JSON object or an array.
(2) Logs the data type of the $$parsed variable, which should be hash for objects and array for lists. This verifies that the data was parsed correctly.
(3) Extracts the value of the Message key from the parsed JSON. If the key is missing, it logs an empty value.
(4) Retrieves the Attributes field from the JSON object. If this field contains a nested object, the structure is preserved. If the key is missing, it logs an empty value.
Notice that the second log entry in the output sample below has an empty Attributes value. If a field is missing from the JSON input, the parse_json() function sets its value to undef.
2025-02-02 10:20:30 INFO [im_file|in1] Parsed type:hash
2025-02-02 10:20:30 INFO [im_file|in1] Message value:The cat has come
2025-02-02 10:20:30 INFO [im_file|in1] Attrib value:( 'Color' => 'black', 'Weight' => 10 )
2025-02-02 10:20:30 INFO [im_file|in1] Parsed type:hash
2025-02-02 10:20:30 INFO [im_file|in1] Message value:The cat has gone
2025-02-02 10:20:30 INFO [im_file|in1] Attrib value:
In addition to objects, parse_json() can automatically detect JSON arrays.
[[101, "jdoe", "ssh_login"], [102, "asmith", "failed_ssh"], [103, "mjane", "session_closed"]]
This configuration demonstrates how the parse_json() function processes JSON arrays.
<Extension json_ex>
Module xm_json
</Extension>
<Input in1>
Module im_file
File '/path/to/log/file'
<Exec>
$$parsed = json_ex->parse_json($raw_event); (1)
log_info("Parsed type:" + type($$parsed)); (2)
log_info("Value:" + $$parsed[1]); (3)
log_info("Invalue:" + $$parsed[2][0]); (4)
</Exec>
</Input>
(1) Parses the raw JSON event and stores the structured result in the $$parsed variable. The parse_json() function automatically detects whether the input is a JSON object or an array.
(2) Logs the data type of the $$parsed variable, which should be array in this case. This verifies that the data was parsed correctly.
(3) Extracts the second element from the top-level array. Since arrays use zero-based indexing, $$parsed[1] refers to the second item in the list.
(4) Outputs the first element of the third nested array. If the data structure is different or the index is out of range, the result is an error or an empty value.
2025-02-02 11:14:40 INFO [im_file|in1] Parsed type:array
2025-02-02 11:14:40 INFO [im_file|in1] Value:[ 102, "asmith", "failed_ssh" ]
2025-02-02 11:14:40 INFO [im_file|in1] Invalue:103
Output logs in JSON format
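The configuration below generates the output as a JSON array. Because the input module also uses InputType json_ex, input lines containing either a single object or an array of objects are parsed into individual records.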
<Extension json_ex>
Module xm_json
</Extension>
<Input in>
Module im_file
File 'C:\input.txt'
InputType json_ex
</Input>
<Output out>
Module om_file
<Exec>
delete($EventReceivedTime);
delete($SourceModuleName);
delete($SourceModuleType);
</Exec>
OutputType json_ex
File 'C:\output.txt'
</Output>
Input sample
{ "uid": 175, "user": "terry", "os": "linux", "active": false, "ssh": null }
[ { "uid": 235, "user": "alex", "os": "linux", "active": true, "ssh": null }, { "uid": 333, "user": "roger", "os": "windows", "active": false, "ssh": null } ]
{ "uid": 853, "user": "ben", "os": "windows", "active": false, "ssh": null }
Output sample
[ {"uid":175,"user":"terry","os":"linux","active":false,"ssh":null}, {"uid":235,"user":"alex","os":"linux","active":true,"ssh":null}, {"uid":333,"user":"roger","os":"windows","active":false,"ssh":null}, {"uid":853,"user":"ben","os":"windows","active":false,"ssh":null} ]
Some SIEMs require sending data using a specific schema, which may include compound values. There are multiple ways to output compound values with the xm_json module. The configuration in these examples uses the im_testgen module to generate log events in the following format.
0@Tue Feb 04 13:05:37 2025
The first method is to use the to_json() procedure. However, this method may not correctly process complex structures, such as JSON objects inside arrays or nested arrays in JSON objects.
<Extension json>
Module xm_json
</Extension>
<Input in1>
Module im_testgen
MaxCount 1
<Exec>
$generated = ( 'Message' => 'The cat has come', (1)
'Attributes' =>
( 'Color' => 'black',
'Weight' => 10 )
);
log_info("Generated type:" + type($generated)); (2)
</Exec>
</Input>
<Output out1>
Module om_file
File '/output/output.json'
Exec to_json(); (3)
</Output>
(1) Creates a hash field called $generated comprising two keys: Message and Attributes.
(2) Logs the data type of the $generated field. This verifies that the data is stored as a hash.
(3) Converts the log record, including the $generated field, into a JSON-formatted string and writes it to the $raw_event core field.
{
"SeverityValue": 2,
"EventTime": "2025-02-04T12:51:53.265985+00:00",
"SourceName": "nxlog",
"ProcessID": 6361,
"EventReceivedTime": "2025-02-05T12:51:53.265995+00:00",
"SourceModuleName": "in1",
"SourceModuleType": "im_testgen",
"Hostname": "localhost",
"generated": {
"Message": "The cat has come",
"Attributes": {
"Color": "black",
"Weight": 10
}
}
}
An alternative approach is to use the OutputType directive. This method writes the records as elements of a JSON array but does not allow for formatting adjustments.
<Extension json>
Module xm_json
</Extension>
<Input in1>
Module im_testgen
MaxCount 1
<Exec>
$generated = ( 'Message' => 'The cat has come', (1)
'Attributes' =>
( 'Color' => 'black',
'Weight' => 10 )
);
log_info("Generated type:" + type($generated)); (2)
</Exec>
</Input>
<Output out1>
Module om_file
File '/output/output.json'
OutputType json (3)
</Output>
(1) Creates a hash field called $generated comprising two keys: Message and Attributes.
(2) Logs the data type of the $generated field. This verifies that the data is stored as a hash.
(3) Specifies that the json (xm_json) extension module instance formats the output. This method maintains the event’s original structure, including any automatically added event fields, but does not allow for custom formatting or pretty-printing.
[
{
"SeverityValue": 2,
"EventTime": "2024-05-06 12:39:48",
"SourceName": "nxlog",
"ProcessID": 6340,
"EventReceivedTime": "2024-05-06 12:39:48",
"SourceModuleName": "in1",
"SourceModuleType": "im_testgen",
"Hostname": "localhost",
"generated": {
"Message": "The cat has come",
"Attributes": {
"Color": "black",
"Weight": 10
}
}
}
]
There are a few things to consider when using this method:
- Unlike the to_json() procedure, the OutputType directive does not allow for formatting or prettifying the output.
- The closing bracket of the array is added to the stream only when the file is closed.
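The reason the closing bracket is delayed can be seen in a minimal Python sketch of an array writer. JsonArrayWriter is a hypothetical class, not part of NXLog; it mimics the spacing of the earlier array output sample. The opening bracket is written with the first record, but the closing bracket can only be written once the writer knows that no more records will follow, that is, in close().

```python
import io
import json


class JsonArrayWriter:
    """Write records as elements of one JSON array on a text stream."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0

    def write(self, record):
        # The opening bracket goes out with the first record; later
        # records are preceded by a separator.
        self.stream.write("[ " if self.count == 0 else ", ")
        self.stream.write(json.dumps(record, separators=(",", ":")))
        self.count += 1

    def close(self):
        # Only now is it certain that no more records follow.
        if self.count == 0:
            self.stream.write("[")
        self.stream.write(" ]")


if __name__ == "__main__":
    buffer = io.StringIO()
    writer = JsonArrayWriter(buffer)
    for record in ({"uid": 175, "user": "terry"}, {"uid": 235, "user": "alex"}):
        writer.write(record)
    print(buffer.getvalue())  # no closing bracket yet
    writer.close()
    print(buffer.getvalue())
```

Until close() runs, the stream is not valid JSON, which is why a consumer reading the file while it is still open sees an unterminated array.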