NXLog Docs

Log normalization

Log normalization is required when events from different log sources need to be translated to a common data structure. You need to normalize logs according to the platform taxonomy when forwarding events to a SIEM or log analytics platform. Almost all SIEM solutions have taxonomies for different types of logs, such as:

  • Logon events (successful or failed login, logoff)

  • File access (auditing)

  • Network access (VPN connections, web access)

Examples include the Elastic Common Schema (ECS), which defines a standard set of fields for storing event data in Elasticsearch, and the Unified Data Model (UDM), which applies when forwarding events to Google Chronicle. Data normalization enables SIEMs to efficiently interpret logs across different sources, facilitates event correlation, and makes it easier for you to work with the data in dashboards and reports.

Methods to normalize data

NXLog enables you to translate logs from different sources into a single taxonomy. Once logs are parsed, you can map event fields to the required schema, enrich log records with additional fields, and output events in a different format altogether.

Mapping fields

The NXLog language provides the rename_field() and delete() procedures for simple manipulation of fields.

Example 1. Renaming and deleting fields

This example reads syslog messages from a file and parses records into structured data using the parse_syslog() procedure of the xm_syslog module. The NXLog $SourceModuleType and $SourceModuleName core fields are renamed to $NXLogModuleType and $NXLogModuleName respectively, and the NXLog $SeverityValue and $Severity fields are deleted. Finally, records are converted to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();

        # Rename a field by passing the field names as strings
        # or the fields themselves
        rename_field("SourceModuleType", "NXLogModuleType");
        rename_field($SourceModuleName, $NXLogModuleName);

        # Delete a field by passing the field name as a string
        # or the field itself
        delete("SeverityValue");
        delete($Severity);

        to_json();
    </Exec>
</Input>
Input sample

The following is a syslog message collected from a Linux host.

Mar  9 23:11:26 NXLog-Server-1 systemd[1]: Started NXLog daemon.
Output sample

The following JSON object shows the same log record after it was processed by NXLog.

{
  "EventReceivedTime": "2022-03-09T23:11:26.880942+01:00",
  "NXLogModuleName": "system_messages",
  "NXLogModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "Hostname": "NXLog-Server-1",
  "EventTime": "2022-03-09T23:11:26.000000+01:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon."
}

For more advanced processing, the xm_rewrite module allows you to rename or delete fields, specify a list of fields to retain, and transform the data based on custom processing.

Example 2. Mapping fields with xm_rewrite

This example reads syslog messages from a file. Records are parsed into structured data using the parse_syslog() procedure of the xm_syslog module. The xm_rewrite module is used to map NXLog fields to the Elastic Common Schema. Finally, records are converted to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Extension syslog_ecs>
    Module    xm_rewrite
    Rename    EventTime, @timestamp
    Rename    EventReceivedTime, event.ingested
    Rename    Severity, event.severity
    Rename    SeverityValue, log.level
    Rename    SyslogSeverityValue, log.syslog.severity.code
    Rename    SyslogSeverity, log.syslog.severity.name
    Rename    SyslogFacilityValue, log.syslog.facility.code
    Rename    SyslogFacility, log.syslog.facility.name
    Rename    ProcessID, process.pid
    Rename    SourceName, service.type
    Rename    Message, message
    Rename    SourceModuleType, nxlog.module.type
    Rename    SourceModuleName, nxlog.module.name
    <Exec>
        ${event.original} = $raw_event;
        ${ecs.version} = "8.0.1";
        if $Hostname =~ /^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$/
        {
            ${host.ip} = $Hostname;
        }
        else
        {
            ${host.hostname} = $Hostname;
        }
    </Exec>
    Delete    Hostname
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();
        syslog_ecs->process();
        to_json();
    </Exec>
</Input>
Input sample

The following is a syslog message collected from a Linux host.

Mar  9 22:32:33 NXLog-Server-1 systemd[1]: Started NXLog daemon.
Output sample

The following JSON object shows the same log record after it was processed by NXLog.

{
  "event.ingested": "2022-03-09T22:32:33.485096+01:00",
  "nxlog.module.name": "system_messages",
  "nxlog.module.type": "im_file",
  "log.syslog.facility.code": 1,
  "log.syslog.facility.name": "USER",
  "log.syslog.severity.code": 5,
  "log.syslog.severity.name": "NOTICE",
  "log.level": 2,
  "event.severity": "INFO",
  "@timestamp": "2022-03-09T22:32:33.000000+01:00",
  "service.type": "systemd",
  "process.pid": 1,
  "message": "Started NXLog daemon.",
  "event.original": "Mar  9 22:32:33 NXLog-Server-1 systemd[1]: Started NXLog daemon.",
  "ecs.version": "8.0.1",
  "host.hostname": "NXLog-Server-1"
}
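
The xm_rewrite module can also retain a fixed list of fields and discard everything else, which Example 2 does not show. The following is a minimal sketch using the module's Keep directive. The instance name syslog_minimal and the choice of fields are illustrative assumptions, not part of the original example.

```
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

# Hypothetical xm_rewrite instance that keeps a small set of fields
<Extension syslog_minimal>
    Module    xm_rewrite
    # Retain only the listed fields; other fields are removed from the record
    Keep      EventTime, Hostname, SourceName, ProcessID, Message
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();
        syslog_minimal->process();
        to_json();
    </Exec>
</Input>
```

A retain list like this can be easier to maintain than a long series of Delete directives when the target schema accepts only a few fields.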

Data enrichment

Normalization may require log records to include a set of standard metadata fields, such as labels that describe the environment where the event was generated and keywords to tag the event. Such data might not be part of the event record but must be added from an external source. NXLog provides several methods to enrich log records.

The envvar general directive allows you to use operating system environment variables accessible by the NXLog user.

Example 3. Using environment variables

This example uses the envvar directive to load three environment variables that hold CPU information. The im_msvistalog module is used to read Windows events from the System log. Log records are enriched with the CPU information from the environment variables and then converted to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
envvar PROCESSOR_IDENTIFIER
envvar PROCESSOR_ARCHITECTURE
envvar NUMBER_OF_PROCESSORS

<Extension json>
    Module    xm_json
</Extension>

<Input eventlog>
    Module    im_msvistalog
    <QueryXML>
        <QueryList>
             <Query Id="0">
                <Select Path="System">*</Select>
             </Query>
        </QueryList>
    </QueryXML>
    <Exec>
        ${meta.processor} = '%PROCESSOR_IDENTIFIER%';
        ${meta.processor_arch} = '%PROCESSOR_ARCHITECTURE%';
        ${meta.processor_count} = %NUMBER_OF_PROCESSORS%;
        to_json();
    </Exec>
</Input>
Output sample

The following JSON object shows a Windows System event after it was processed by NXLog.

{
  "EventTime": "2022-03-09T18:25:40.946416+01:00",
  "Hostname": "SERVER1.example.com",
  "Keywords": "9259400833873739776",
  "LevelValue": 4,
  "EventType": "INFO",
  "SeverityValue": 2,
  "Severity": "INFO",
  "EventID": 7036,
  "SourceName": "Service Control Manager",
  "ProviderGuid": "{555908D1-A6D7-4695-8E1E-26931D2012F4}",
  "Version": 0,
  "TaskValue": 0,
  "OpcodeValue": 0,
  "RecordNumber": 23944,
  "ExecutionProcessID": 536,
  "ExecutionThreadID": 1700,
  "Channel": "System",
  "Message": "The nxlog service entered the running state.",
  "Level": "Information",
  "param1": "nxlog",
  "param2": "running",
  "EventData.Binary": "6E0078006C006F0067002F0034000000",
  "EventReceivedTime": "2022-03-09T18:25:42.962159+01:00",
  "SourceModuleName": "eventlog",
  "SourceModuleType": "im_msvistalog",
  "meta.processor": "Intel64 Family 6 Model 165 Stepping 5, GenuineIntel",
  "meta.processor_arch": "AMD64",
  "meta.processor_count": 16
}

The include and include_stdout general directives allow you to load data into the NXLog configuration from a file or script respectively. For example, with include_stdout, you can execute a script to read dynamic data and inject the script’s output into the configuration.

Example 4. Loading values from a file or script

This example uses two files to inject static and dynamic values into the NXLog configuration. The first file defines two static values for the operating system name and version.

env.conf
define OS_NAME    Linux Ubuntu
define OS_VER     20.04

The second is a bash script to retrieve CPU information from the operating system and output the values to the standard output.

env.sh
#!/bin/bash

# Collect CPU details from the operating system
PROCESSOR=$(grep 'name' /proc/cpuinfo | uniq)
PROCESSOR_COUNT=$(grep -c 'process' /proc/cpuinfo)
PROCESSOR_ARCH=$(uname -m)

# Print NXLog define directives to the standard output
echo "define PROCESSOR $PROCESSOR"
echo "define PROCESSOR_COUNT $PROCESSOR_COUNT"
echo "define PROCESSOR_ARCH $PROCESSOR_ARCH"

The above files are included in the NXLog configuration using the include and include_stdout directives. The im_file input module reads syslog messages from a file, and records are parsed into structured data using the parse_syslog() procedure of the xm_syslog module. Log records are enriched with the operating system and CPU information loaded from the file and script, and then converted to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
include           /opt/nxlog/etc/env.conf
include_stdout    /opt/nxlog/etc/env.sh

<Extension json>
    Module        xm_json
</Extension>

<Extension syslog>
    Module        xm_syslog
</Extension>

<Input system_messages>
    Module        im_file
    File          '/var/log/syslog'
    <Exec>
        parse_syslog();

        ${meta.os_name} = '%OS_NAME%';
        ${meta.os_ver} = '%OS_VER%';

        ${meta.processor} = '%PROCESSOR%';
        ${meta.processor_arch} = '%PROCESSOR_ARCH%';
        ${meta.processor_count} = %PROCESSOR_COUNT%;

        to_json();
    </Exec>
</Input>
Input sample

The following is a syslog message collected from a Linux host.

Mar  9 17:02:16 NXLog-Server-1 systemd[1]: Started NXLog daemon.
Output sample

The following JSON object shows the same log record after it was processed by NXLog.

{
  "EventReceivedTime": "2022-03-09T17:02:16.172998+01:00",
  "SourceModuleName": "system_messages",
  "SourceModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "SeverityValue": 2,
  "Severity": "INFO",
  "Hostname": "NXLog-Server-1",
  "EventTime": "2022-03-09T17:02:16.000000+01:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon.",
  "meta.os_name": "Linux Ubuntu",
  "meta.os_ver": "20.04",
  "meta.processor": "model name\t: Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz",
  "meta.processor_arch": "x86_64",
  "meta.processor_count": 16
}

The NXLog language contains several other functions that you can use for log enrichment, such as the host_ip() and hostname() functions. See Functions in the NXLog Enterprise Edition Reference Manual for a complete listing.
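
As a rough sketch of how these two functions could be used for enrichment, the configuration below adds the hostname and IP address of the machine running NXLog to each record. The field names $meta.collector_hostname and $meta.collector_ip are hypothetical and chosen only for illustration.

```
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();

        # Hypothetical field names; use names that match your target schema
        ${meta.collector_hostname} = hostname();
        ${meta.collector_ip} = host_ip();

        to_json();
    </Exec>
</Input>
```

These values describe the host where NXLog is running, which may differ from the $Hostname parsed out of the log message when events are relayed from other systems.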

Output log format

JSON is one of the most common formats supported by modern SIEM solutions. NXLog supports data conversion to JSON with the xm_json module, as demonstrated by the examples above. It also supports data conversion from and to other formats through its other extension modules.

Refer to the module documentation in the NXLog EE Reference Manual for further details. You will also find SIEM-specific examples in our extensive list of Integration guides.
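
As an illustration of switching the output format, the following sketch replaces to_json() with the to_xml() procedure of the xm_xml module. It assumes the same syslog input used in the earlier examples. Other formats follow the same pattern with their own module and procedure.

```
<Extension xml>
    Module    xm_xml
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();

        # Serialize the record's fields to XML instead of JSON
        to_xml();
    </Exec>
</Input>
```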

Standard NXLog fields

The fields available for a log record depend on the log source and which input or extension modules processed it. However, some fields are standard across several NXLog modules.

The following table lists the fields created by the NXLog core for every record.

Table 1. Core fields
Field | Type | Description
raw_event | string | The data received from stream modules (e.g. im_file, im_tcp, etc.).
EventReceivedTime | datetime | The time when the event is received or collected by NXLog. *
Hostname | string | The IP address or hostname where the event originated.
SourceModuleName | string | The name of the NXLog input module instance. *
SourceModuleType | string | The type of the NXLog input module instance (e.g. im_file). *

* If these fields already exist, they will not be overwritten.

The following fields are standard across several modules.

Table 2. Event fields
Field | Type | Description
EventTime | datetime | The date and time of the event.
EventType | string | The type of event according to the log source. For example, for Windows events it represents the severity (CRITICAL, ERROR, etc.), while for IBM AIX audit logs it represents the type of audit event (USER_Login, FILE_Unlink, etc.).
Message | string | The event message.
MessageSourceAddress | ipaddr | The IP address of the remote host. Available in network modules (e.g. im_tcp, im_udp).
ProcessID | string, integer | The ID of the process that generated the event.
Severity | string | Severity name corresponding to the SeverityValue: Debug (1), Info (2), Warning (3), Error (4), Critical (5).
SeverityValue | string, integer | NXLog normalized severity value between 1 and 5. Refer to the module documentation for how this value is mapped to the severity set by the log source.
SourceName | string | The application or device that generated the event.

The field type may differ from one module to another. It is essential to handle data types according to your requirements when normalizing data. The NXLog language provides several data conversion functions. Refer to Functions in the NXLog Enterprise Edition Reference Manual.

See the module documentation in the NXLog EE Reference Manual for a list of fields created by each module.
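
Because fields such as ProcessID and SeverityValue can be a string in one module and an integer in another, it can help to convert them explicitly before output. The following is a minimal sketch that assumes the target schema expects integers for both fields. It uses the integer() conversion function; string() can be used in the same way for the opposite direction.

```
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();

        # Convert only when the field is present in the record
        if defined $ProcessID $ProcessID = integer($ProcessID);
        if defined $SeverityValue $SeverityValue = integer($SeverityValue);

        to_json();
    </Exec>
</Input>
```

Checking with defined first avoids calling the conversion function on a field that is missing from the record, for example when a syslog message carries no process ID.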