JSON (xm_json)

This module provides features to parse or generate JSON data.

Log records can be converted to JSON format by calling the to_json() function or procedure.

Logs in JSON format can be parsed into structured data using one of two methods:

  • By calling the parse_json() procedure, which expects the $raw_event field or the string passed as a parameter to contain a valid JSON object. The most common use case is when an input module reads line-based data and each line is a JSON object.

  • By specifying the name of an xm_json module instance in the InputType directive of the input module, which instructs the module to parse the raw data directly. This method supports JSON spanning multiple lines (e.g. prettified JSON), JSON arrays, and newline-delimited JSON. See the example on Processing multiple JSON objects.

The JSON specification does not define a type for datetime values, so these are represented as JSON strings. The JSON parser in this module detects datetime values automatically, so it is not necessary to call parsedate() explicitly.
The length of JSON keys is limited to a maximum of 499 characters. Any attempt to use a longer key will cause the current operation to fail and will likely result in data loss.

The records can be produced as JSON array elements by specifying the name of an xm_json module instance in the OutputType directive of the output module. See the example on Produce JSON array.

Configuration

The xm_json module accepts the following directives in addition to the common module directives.

Optional directives

DateFormat

This optional directive sets the format of the datetime strings in the generated JSON. It is similar to the global DateFormat directive but is independent of it: it is defined separately and has its own default. If this directive is not specified, the default is YYYY-MM-DDThh:mm:ss.sTZ.
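
For example, a module instance can override the default as follows (a minimal sketch; the format value shown is assumed to be one of the fixed layouts also accepted by the global DateFormat directive):

<Extension json>
    Module      xm_json
    DateFormat  YYYY-MM-DD hh:mm:ss
</Extension>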

DetectNestedJSON

This optional directive can be used to disable the autodetection of nested JSON strings when calling the to_json() function or the to_json() procedure. For example, consider a field $key which contains the string value of {"subkey":42}. If DetectNestedJSON is set to FALSE, to_json() will produce {"key":"{\"subkey\":42}"}. If DetectNestedJSON is set to TRUE (the default), the result is {"key":{"subkey":42}}—a valid nested JSON record.
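
The following sketch shows the directive being disabled; the input file path and the $key assignment are illustrative only:

<Extension json>
    Module            xm_json
    DetectNestedJSON  FALSE
</Extension>

<Input in>
    Module    im_file
    File      '/tmp/input'
    # With DetectNestedJSON FALSE, the JSON string in $key stays escaped in the output
    Exec      $key = '{"subkey":42}'; to_json();
</Input>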

Flatten

This optional boolean directive specifies that the parse_json() procedure should flatten nested JSON, creating field names with dot notation. The default is FALSE. If Flatten is set to TRUE, the following JSON will populate the fields $event.time and $event.severity:

{"event":{"time":"2015-01-01T00:00:00.000Z","severity":"ERROR"}}

ForceUTF8

This optional boolean directive specifies whether the generated JSON should be valid UTF-8. The JSON specification requires JSON records to be UTF-8 encoded, and some tools fail to parse JSON if it is not valid UTF-8. If ForceUTF8 is set to TRUE, the generated JSON will be validated and any invalid character will be replaced with a question mark (?). The default is FALSE.

IncludeHiddenFields

This optional boolean directive specifies whether the to_json() function and the to_json() procedure should include fields having a leading dot (.) or underscore (_) in their names. The default is TRUE. If IncludeHiddenFields is set to TRUE, the generated JSON will contain these otherwise excluded fields.
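
For example, such fields can be excluded from the generated JSON by disabling the directive:

<Extension json>
    Module               xm_json
    IncludeHiddenFields  FALSE
</Extension>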

ParseDate

If this boolean directive is set to TRUE, xm_json will attempt to parse as a timestamp any string that appears to begin with a 4-digit year (as a regular expression, ^[12][0-9]{3}-). If this directive is set to FALSE, xm_json will not attempt to parse these strings. The default is TRUE.

PrettyPrint

If set to TRUE, this optional boolean directive specifies that the generated JSON should be pretty-printed, where each key-value is printed on a new indented line. Note that this adds line breaks to the JSON records, which can cause parser errors in some tools that expect single-line JSON. If this directive is not specified, the default is FALSE.
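
For example, pretty-printing can be enabled on a module instance like this:

<Extension json>
    Module       xm_json
    PrettyPrint  TRUE
</Extension>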

UnFlatten

This optional boolean directive specifies that the to_json() procedure should generate nested JSON when field names contain a dot (.). For example, if UnFlatten is set to TRUE, the two fields $event.time and $event.severity will be converted to JSON as follows:

{"event":{"time":"2015-01-01T00:00:00.000Z","severity":"ERROR"}}

When UnFlatten is set to FALSE (the default if not specified), the following JSON would result:

{"event.time":"2015-01-01T00:00:00.000Z","event.severity":"ERROR"}

Functions

The following functions are exported by xm_json.

string extract_json(string jsonpath)

Search the $raw_event field using the specified JSONPath expression. If successfully matched, it returns the value of the matched node. If no match is found, it returns an empty string.

string extract_json(string jsonpath, string json_data)

Search json_data using the specified JSONPath expression. If successfully matched, it returns the value of the matched node. If no match is found, it returns an empty string.
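
For example, both forms can be used in the Exec directive of another module instance (the $Username and $Message field names are hypothetical):

# Extract a value from the JSON in $raw_event
Exec    $Username = extract_json("$.user");

# Extract a value from a JSON string held in another field
Exec    $Username = extract_json("$.user", $Message);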

unknown parse_json(string source)

Parse the string parameter as JSON, returning a simple or compound value.

string to_json()

Convert the fields to JSON and return this as a string value. Any field having a leading dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The existing $raw_event field is never included in the generated JSON.

string to_json(unknown value)

Converts the value parameter of any simple or compound type to JSON and returns this as a string value.
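
For example, the zero-argument form can serialize all fields into a single field, and the one-argument form can serialize an individual value (the $Message and $AccountName field names are hypothetical; log_info() is the core logging procedure):

# Store all fields as a JSON string in $Message
Exec    $Message = to_json();

# Write a single value as JSON to the internal log
Exec    log_info(to_json($AccountName));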

Procedures

The following procedures are exported by xm_json.

extract_json(string jsonpath);

Search the $raw_event field using the specified JSONPath expression. If successfully matched, it rewrites the $raw_event field with the value of the matched node. If no match is found, $raw_event will be set to an empty string.

extract_json(string jsonpath, string json_data);

Search json_data using the specified JSONPath expression. If successfully matched, it rewrites the $raw_event field with the value of the matched node. If no match is found, $raw_event will be set to an empty string.

parse_json();

Parse the $raw_event field as JSON input.

parse_json(string source);

Parse the given string as JSON.

to_json();

Convert the fields to JSON and put this into the $raw_event field. Any field having a leading dot (.) or underscore (_) will be automatically excluded unless the IncludeHiddenFields directive is set to TRUE. The existing $raw_event field is never included in the generated JSON.

Creating and populating fields

The parse_json() procedure parses a string containing a JSON object into structured data. It expects the $raw_event field or the string passed as a parameter to be in the following format:

{"Key1":"Value1","Key2":"Value2"}

Once a log record is parsed with this procedure, fields are created according to the keys and values in the JSON object. The fields can be used for further log processing or to convert the log record into a different output format. For an example of how to parse JSON log records and manipulate fields, see Parsing JSON below.

Input modules may create additional fields containing various information. When converting to a different format, such fields will be included in the output log record, which may consume additional memory and bandwidth. For efficient handling of log records, consult the Fields section in the documentation of input modules and test the configuration before deployment. To delete any unwanted fields, use the delete() procedure or the xm_rewrite extension.

Extracting values from JSON

The extract_json() function or procedure extracts a node’s value from a JSON tree.

Both function and procedure accept a JSONPath expression to select the required node and an optional string containing the JSON data to parse. If no JSON data is specified, the content of the $raw_event core field is parsed.

Accepted JSONPath expressions

extract_json() implements the most commonly used subset of JSONPath, in the $.a.['b']['c'].d[1].['e'][1]..f notation:

  • Node selection by name, e.g., $.a or $['a'].

  • Array elements selection by index, including multi-dimensional arrays, e.g., $.a[1] or $.a[1][1][1] or $['a'][1][1][1].

  • Search for a named node in one nesting layer, e.g., $..a.

  • Any sequence of the above expressions.

When specifying a $..a-type expression, extract_json() does not perform a recursive search for nodes but limits the scope to only child nodes of the current level. The first match is returned.
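
As an illustration, the expressions below select nodes from a hypothetical record {"a":{"b":[{"c":"x"},{"c":"y"}]}} held in $raw_event:

# Select a named node, then an array element by index, then a nested name
Exec    $First = extract_json("$.a.b[0].c");

# Search the child nodes of $.a.b[0] for a node named "c"
Exec    $Second = extract_json("$.a.b[0]..c");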

Returned data

The extract_json() function returns a string containing the node’s value or JSON if the matched node contains a JSON object. The extract_json() procedure sets the value of the $raw_event field instead of returning a value.

Since data is parsed and rewritten as a string, floating-point values always include 6 digits after the decimal point.

See Extracting values from JSON below for an example.

Examples

Example 1. Converting syslog to JSON format

The following configuration accepts syslog (both BSD and IETF) via TCP and converts it to JSON.

nxlog.conf
<Extension syslog>
    Module  xm_syslog
</Extension>

<Extension json>
    Module  xm_json
</Extension>

<Input tcp>
    Module        im_tcp
    ListenAddr    0.0.0.0:1514
    Exec          parse_syslog(); to_json();
</Input>

<Output file>
    Module  om_file
    File    "/var/log/json.txt"
</Output>

<Route tcp_to_file>
    Path    tcp => file
</Route>
Input sample
<30>Sep 30 15:45:43 host44.localdomain.hu acpid: 1 client rule loaded
Output sample
{
  "MessageSourceAddress":"127.0.0.1",
  "EventReceivedTime":"2011-03-08 14:22:41",
  "SyslogFacilityValue":1,
  "SyslogFacility":"DAEMON",
  "SyslogSeverityValue":5,
  "SyslogSeverity":"INFO",
  "SeverityValue":2,
  "Severity":"INFO",
  "Hostname":"host44.localdomain.hu",
  "EventTime":"2011-09-30 14:45:43",
  "SourceName":"acpid",
  "Message":"1 client rule loaded "
}
Example 2. Converting Windows Event Log to syslog-encapsulated JSON

The following configuration reads the Windows Event Log and converts events to the BSD syslog format, with the message part containing the fields in JSON.

nxlog.conf
<Extension syslog>
    Module      xm_syslog
</Extension>

<Extension json>
    Module      xm_json
</Extension>

<Input eventlog>
    Module      im_msvistalog
    Exec        $Message = to_json(); to_syslog_bsd();
</Input>

<Output tcp>
    Module      om_tcp
    Host        192.168.1.1:1514
</Output>

<Route eventlog_json_tcp>
    Path        eventlog => tcp
</Route>
Output sample
<14>Mar  8 14:40:11 WIN-OUNNPISDHIG Service_Control_Manager: {"EventTime":"2012-03-08 14:40:11","EventTimeWritten":"2012-03-08 14:40:11","Hostname":"WIN-OUNNPISDHIG","EventType":"INFO","SeverityValue":2,"Severity":"INFO","SourceName":"Service Control Manager","FileName":"System","EventID":7036,"CategoryNumber":0,"RecordNumber":6788,"Message":"The nxlog service entered the running state. ","EventReceivedTime":"2012-03-08 14:40:12"}
Example 3. Parsing JSON

The NXLog Agent configuration below uses the im_file input module to collect JSON log lines from a file. Log records are parsed into structured data using the parse_json() procedure. If the log record has a severity equal to NOTICE, it is changed to INFO. Core fields that are not required are deleted and the log record is converted back to JSON using the to_json() procedure.

nxlog.conf
<Extension json>
    Module      xm_json
</Extension>

<Input from_file>
    Module      im_file
    File        "/tmp/input"
    <Exec>
        # Parse $raw_event and create fields
        parse_json();

        # Change the value of the $Severity field
        if ($Severity == 'NOTICE')
            $Severity = 'INFO';

        # Delete core fields that are not required
        delete($SourceModuleType);
        delete($SourceModuleName);

        # Convert fields back to JSON
        to_json();
    </Exec>
</Input>
Input sample
{
  "Hostname": "Ubuntu-VM",
  "Message": "The service has started.",
  "Severity": "NOTICE"
}

The output sample below contains all fields from the input sample and the $EventReceivedTime core field.

Output sample
{
  "EventReceivedTime": "2021-08-29T22:50:12.000000-07:00",
  "Hostname": "Ubuntu-VM",
  "Message": "The service has started.",
  "Severity": "INFO"
}
Example 4. Processing multiple JSON objects

The sample inputs below contain the same pair of JSON objects in different forms, all of which can be parsed by using the InputType directive of the input module.

JSON objects spanning multiple lines
{
  "Date": "2021-08-29", "Time": "22:50:12",
  "Message": "The process has started"
},
{
  "Message": "The process has stopped!"
}
Array of JSON objects
[{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}, {"Message": "The process has stopped!"}]
Newline-delimited JSON objects
{"Date": "2021-08-29", "Time": "22:50:12", "Message": "The process has started"}
{"Message": "The process has stopped!"}

The configuration below parses JSON data using the xm_json module instance specified by the InputType directive. The value of this directive must correspond to the name of the xm_json module instance, in this case json_parser. Using the InputType directive eliminates the need to call the parse_json() procedure.

The to_json() procedure is used to generate the output as JSON, which is only necessary because the configuration deletes some fields.

nxlog.conf
<Extension json_parser>
    Module      xm_json
</Extension>

<Input from_file>
    Module      im_file
    File        'C:\input.txt'
    InputType   json_parser
    <Exec>
        # Check whether the $Date field is present in the record
        if defined $Date
        {
            # Create the $EventTime field from the $Date and $Time values
            $EventTime = strptime($Date + " " + $Time, "%Y-%m-%d %T");
        }
        else
        {
            # Use the $EventReceivedTime value if $Date is not specified
            $EventTime = $EventReceivedTime;
        }
        # Delete several fields to make the output shorter
        delete($Date);
        delete($Time);
        delete($EventReceivedTime);
        delete($SourceModuleName);
        delete($SourceModuleType);

        # Convert fields back to JSON
        to_json();
    </Exec>
</Input>

The output is the same for all three input samples.

Output sample in JSON
{"Message":"The process has started","EventTime":"2021-08-29T22:50:12.000000-07:00"}
{"Message":"The process has stopped!","EventTime":"2021-08-29T22:50:26.979726-07:00"}
Example 5. Produce JSON array

The configuration below generates the output as a JSON array.

nxlog.conf
<Extension json_ex>
    Module       xm_json
</Extension>

<Input in>
    Module       im_file
    File         'C:\input.txt'
    InputType    json_ex
</Input>

<Output out>
    Module       om_file

    <Exec>
        delete($EventReceivedTime);
        delete($SourceModuleName);
        delete($SourceModuleType);
    </Exec>

    OutputType   json_ex
    File         'C:\output.txt'
</Output>
Input data sample
{ "uid": 175, "user": "terry", "os": "linux", "active": false, "ssh": null }
[ { "uid": 235, "user": "alex", "os": "linux", "active": true, "ssh": null }, { "uid": 333, "user": "roger", "os": "windows", "active": false, "ssh": null } ]
{ "uid": 853, "user": "ben", "os": "windows", "active": false, "ssh": null }
Output sample of produced JSON array
[ {"uid":175,"user":"terry","os":"linux","active":false,"ssh":null}, {"uid":235,"user":"alex","os":"linux","active":true,"ssh":null}, {"uid":333,"user":"roger","os":"windows","active":false,"ssh":null}, {"uid":853,"user":"ben","os":"windows","active":false,"ssh":null} ]
Example 6. Extracting values from JSON data

This configuration collects JSON-formatted event logs using the im_file input module, which populates the $raw_event core field with the log line read from the file. It then uses the extract_json() procedure to extract the EventData node from the original event and rewrite the $raw_event field.

nxlog.conf
<Extension json_ex>
    Module    xm_json
</Extension>

<Input in>
    Module    im_file
    File      '/path/to/log/file'
    Exec      extract_json("$.EventData");
</Input>
Input sample
{
  "Stream": "stdout",
  "EventData": {
    "EventTime": "2020-03-31 09:35:12",
    "Message": "The process has started",
    "Severity": "INFO"
  }
}
Output sample
{
  "EventTime": "2020-03-31 09:35:12",
  "Message": "The process has started",
  "Severity": "INFO"
}