Normalize logs with NXLog Agent

Data normalization enables SIEMs to interpret logs from different sources efficiently, facilitates event correlation, and makes it easier for you to work with the data in dashboards and reports. Almost all SIEM solutions have taxonomies for different types of logs. In addition, they use a set of standard metadata fields, such as labels that describe the system that generated the event. Such data might not be part of the event record, in which case you must add it from an external source.

Below, we provide examples of transforming and enriching log records with NXLog Agent.

Rewrite logs

Regular expressions are one of the most common ways to search for and replace text in log records.

Example 1. Search text with a regular expression

This configuration reads syslog messages from a file with the im_file input module, which stores the log line in the $raw_event field. It then uses a case-insensitive regular expression to search the message for the string error and, if found, enriches the log record with a $Severity field set to ERROR. It also copies the raw line to the $Message field.

Finally, it converts records to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        if $raw_event =~ /error/i
        {
            $Severity = "ERROR";
        }
        $Message = $raw_event;
        to_json();
    </Exec>
</Input>

The following is a syslog message collected from a Linux host.

Input sample
Oct 11 15:50:21 SERVER-1 xdg-desktop-por[15999]: Failed to get application states: GDBus.Error:org.freedesktop.portal.Error.Failed: Could not get window list

The following JSON object shows the same log record after NXLog Agent processed it.

Output sample
{
  "EventReceivedTime": "2023-10-11T15:50:26.252732+02:00",
  "SourceModuleName": "system_messages",
  "SourceModuleType": "im_file",
  "Hostname": "SERVER-1",
  "Severity": "ERROR",
  "Message": "Oct 11 15:50:21 SERVER-1 xdg-desktop-por[15999]: Failed to get application states: GDBus.Error:org.freedesktop.portal.Error.Failed: Could not get window list"
}

NXLog Agent also supports the s/// substitution operator, which replaces text in place. Add the /g global modifier to replace every match instead of only the first one.

Example 2. Replace text with a regular expression

This configuration reads syslog messages from a file with the im_file input module, which stores the log line in the $raw_event field. It then uses a regular expression to search the message for IP addresses and, if found, replaces them with ******.

nxlog.conf
<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        if $raw_event =~ s/((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}/******/g 
        {
            log_info("Sanitized IP address");
        }
    </Exec>
</Input>

The following is a syslog message containing an IP address.

Input sample
Oct 11 09:17:41 SERVER-1 NetworkManager[547]: <info>  [1698218261.0783] dhcp4 (enp0s3): option ip_address => '10.0.0.103'

The following is the same log message after NXLog Agent processed it.

Output sample
Oct 11 09:17:41 SERVER-1 NetworkManager[547]: <info>  [1698218261.0783] dhcp4 (enp0s3): option ip_address => '******'
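
Regular expressions can also extract a value into its own field. The following is a minimal sketch rather than part of the example above: it relies on captured groups being available as $1, $2, and so on after a successful match, and the $IPAddress field name is hypothetical. The pattern is deliberately loose and does not validate that each octet is in the 0-255 range.

```
<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        # Capture the first dotted-quad value found in the log line.
        # $IPAddress is an illustrative field name, not a standard field.
        if $raw_event =~ /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/
        {
            $IPAddress = $1;
        }
    </Exec>
</Input>
```

If you combine extraction with sanitization, run the extraction first. Otherwise, the value is already masked by the time the pattern is evaluated.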

Rename and delete fields

The NXLog language provides the rename_field() and delete() procedures for simple manipulation of fields.

Example 3. Rename and delete fields

This configuration reads syslog messages from a file and parses records into structured data using the parse_syslog() procedure of the xm_syslog module.

It renames the NXLog Agent $SourceModuleType and $SourceModuleName core fields to $NXLogModuleType and $NXLogModuleName and deletes the $SeverityValue and $Severity fields.

Finally, it converts records to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();

        # Rename a field by passing the field names as strings
        # or the fields themselves
        rename_field("SourceModuleType", "NXLogModuleType");
        rename_field($SourceModuleName, $NXLogModuleName);

        # Delete a field by passing the field name as a string
        # or the field itself
        delete("SeverityValue");
        delete($Severity);

        to_json();
    </Exec>
</Input>

The following is a syslog message collected from a Linux host.

Input sample
Oct  11 23:11:26 SERVER-1 systemd[1]: Started NXLog daemon.

The following JSON object shows the same log record after NXLog Agent processed it.

Output sample
{
  "EventReceivedTime": "2023-10-11T23:11:26.880942+01:00",
  "NXLogModuleName": "system_messages",
  "NXLogModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "Hostname": "SERVER-1",
  "EventTime": "2023-10-11T23:11:26.000000+01:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon."
}

Map fields

For more advanced log transformation, you can use the xm_rewrite module to rename or delete fields, specify a list of fields to retain, and transform the data with custom processing.

Example 4. Map fields with xm_rewrite

This configuration reads syslog messages from a file. It parses records into structured data using the parse_syslog() procedure of the xm_syslog module.

It then uses the xm_rewrite module to map NXLog Agent fields to the Elastic Common Schema and converts records to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Extension syslog>
    Module    xm_syslog
</Extension>

<Extension syslog_ecs>
    Module    xm_rewrite
    Rename    EventTime, @timestamp
    Rename    EventReceivedTime, event.ingested
    Rename    Severity, event.severity
    Rename    SeverityValue, log.level
    Rename    SyslogSeverityValue, log.syslog.severity.code
    Rename    SyslogSeverity, log.syslog.severity.name
    Rename    SyslogFacilityValue, log.syslog.facility.code
    Rename    SyslogFacility, log.syslog.facility.name
    Rename    ProcessID, process.pid
    Rename    SourceName, service.type
    Rename    Message, message
    Rename    SourceModuleType, nxlog.module.type
    Rename    SourceModuleName, nxlog.module.name
    <Exec>
        ${event.original} = $raw_event;
        ${ecs.version} = "8.0.1";
        if $Hostname =~ /^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$/
        {
            ${host.ip} = $Hostname;
        }
        else
        {
            ${host.hostname} = $Hostname;
        }
    </Exec>
    Delete    Hostname
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        parse_syslog();
        syslog_ecs->process();
        to_json();
    </Exec>
</Input>

The following is a syslog message collected from a Linux host.

Input sample
Oct  11 23:11:26 SERVER-1 systemd[1]: Started NXLog daemon.

The following JSON object shows the same log record after NXLog Agent processed it.

Output sample
{
  "event.ingested": "2023-10-11T23:11:26.485096+01:00",
  "nxlog.module.name": "system_messages",
  "nxlog.module.type": "im_file",
  "log.syslog.facility.code": 1,
  "log.syslog.facility.name": "USER",
  "log.syslog.severity.code": 5,
  "log.syslog.severity.name": "NOTICE",
  "log.level": 2,
  "event.severity": "INFO",
  "@timestamp": "2023-10-11T23:11:26.000000+01:00",
  "service.type": "systemd",
  "process.pid": 1,
  "message": "Started NXLog daemon.",
  "event.original": "Oct  11 23:11:26 SERVER-1 systemd[1]: Started NXLog daemon.",
  "ecs.version": "8.0.1",
  "host.hostname": "SERVER-1"
}
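
The xm_rewrite module can also retain a fixed list of fields and discard the rest. The following is a minimal sketch that assumes the module's Keep directive; the syslog_minimal instance name and the selected fields are illustrative and not part of the example above.

```
<Extension syslog_minimal>
    Module    xm_rewrite
    # Keep only these fields and remove all others from the record
    Keep      EventTime, Hostname, SourceName, Message
</Extension>
```

To apply it, call syslog_minimal->process(); after parse_syslog() in the Exec block of the input instance, in the same way the configuration above calls syslog_ecs->process().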

Use environment variables

You can enrich log records with information from the operating system’s environment variables by loading them with the envvar general directive. NXLog Agent can only access environment variables that are visible to its service user.

Example 5. Enrich logs with environment variables

This configuration uses the envvar directive to load three environment variables that hold CPU information. It uses the im_msvistalog module to collect Windows events from the System log.

It then enriches log records with the CPU information from the environment variables and converts them to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
envvar PROCESSOR_IDENTIFIER
envvar PROCESSOR_ARCHITECTURE
envvar NUMBER_OF_PROCESSORS

<Extension json>
    Module    xm_json
</Extension>

<Input eventlog>
    Module    im_msvistalog
    <QueryXML>
        <QueryList>
             <Query Id="0">
                <Select Path="System">*</Select>
             </Query>
        </QueryList>
    </QueryXML>
    <Exec>
        ${meta.processor} = '%PROCESSOR_IDENTIFIER%';
        ${meta.processor_arch} = '%PROCESSOR_ARCHITECTURE%';
        ${meta.processor_count} = %NUMBER_OF_PROCESSORS%;
        to_json();
    </Exec>
</Input>

The following JSON object shows a Windows System event after NXLog Agent processed it.

Output sample
{
  "EventTime": "2023-10-11T18:25:40.946416+01:00",
  "Hostname": "SERVER-1.example.com",
  "Keywords": "9259400833873739776",
  "LevelValue": 4,
  "EventType": "INFO",
  "SeverityValue": 2,
  "Severity": "INFO",
  "EventID": 7036,
  "SourceName": "Service Control Manager",
  "ProviderGuid": "{555908D1-A6D7-4695-8E1E-26931D2012F4}",
  "Version": 0,
  "TaskValue": 0,
  "OpcodeValue": 0,
  "RecordNumber": 23944,
  "ExecutionProcessID": 536,
  "ExecutionThreadID": 1700,
  "Channel": "System",
  "Message": "The nxlog service entered the running state.",
  "Level": "Information",
  "param1": "nxlog",
  "param2": "running",
  "EventData.Binary": "6E0078006C006F0067002F0034000000",
  "EventReceivedTime": "2022-03-09T18:25:42.962159+01:00",
  "SourceModuleName": "eventlog",
  "SourceModuleType": "im_msvistalog",
  "meta.processor": "Intel64 Family 6 Model 165 Stepping 5, GenuineIntel",
  "meta.processor_arch": "AMD64",
  "meta.processor_count": 16
}

Load data from a file or script

The include and include_stdout general directives allow you to load data from a file or script into the NXLog Agent configuration. For example, with include_stdout, you can execute a script to read dynamic data and inject the script’s output into the configuration.

Example 6. Inject metadata into the NXLog Agent configuration

This configuration uses two files to inject static and dynamic values into the NXLog Agent configuration. The first file defines two static values for the operating system name and version.

env.conf
define OS_NAME    Linux Ubuntu
define OS_VER     20.04

The second file is a Bash script that retrieves CPU information from the operating system and prints it to standard output as define directives.

env.sh
#!/bin/bash

# Read the CPU model line once, even on multi-core hosts
PROCESSOR=$(grep 'model name' /proc/cpuinfo | uniq)
# Count the logical processors
PROCESSOR_COUNT=$(grep -c '^processor' /proc/cpuinfo)
PROCESSOR_ARCH=$(uname -m)

echo "define PROCESSOR $PROCESSOR"
echo "define PROCESSOR_COUNT $PROCESSOR_COUNT"
echo "define PROCESSOR_ARCH $PROCESSOR_ARCH"
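
To illustrate what include_stdout injects, the following shows what the script would print on the host used for the output sample later in this example. The values are taken from that sample and will differ on other machines. The whitespace between "model name" and the colon is the tab character that /proc/cpuinfo contains.

```
define PROCESSOR model name	: Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz
define PROCESSOR_COUNT 16
define PROCESSOR_ARCH x86_64
```

NXLog Agent reads these lines as if they were written directly in the configuration file, which makes the values available as %PROCESSOR%, %PROCESSOR_COUNT%, and %PROCESSOR_ARCH%.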

The above files are included in the NXLog Agent configuration using the include and include_stdout directives.

The configuration reads syslog messages from a file with the im_file input module and parses records into structured data using the parse_syslog() procedure of the xm_syslog module. It then enriches log records with the operating system and CPU information and converts them to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
include           /opt/nxlog/etc/env.conf
include_stdout    /opt/nxlog/etc/env.sh

<Extension json>
    Module        xm_json
</Extension>

<Extension syslog>
    Module        xm_syslog
</Extension>

<Input system_messages>
    Module        im_file
    File          '/var/log/syslog'
    <Exec>
        parse_syslog();

        ${meta.os_name} = '%OS_NAME%';
        ${meta.os_ver} = '%OS_VER%';

        ${meta.processor} = '%PROCESSOR%';
        ${meta.processor_arch} = '%PROCESSOR_ARCH%';
        ${meta.processor_count} = %PROCESSOR_COUNT%;

        to_json();
    </Exec>
</Input>

The following is a syslog message collected from a Linux host.

Input sample
Oct  11 23:11:26 SERVER-1 systemd[1]: Started NXLog daemon.

The following JSON object shows the same log record after NXLog Agent processed it.

Output sample
{
  "EventReceivedTime": "2023-10-11T23:11:26.172998+01:00",
  "SourceModuleName": "system_messages",
  "SourceModuleType": "im_file",
  "SyslogFacilityValue": 1,
  "SyslogFacility": "USER",
  "SyslogSeverityValue": 5,
  "SyslogSeverity": "NOTICE",
  "SeverityValue": 2,
  "Severity": "INFO",
  "Hostname": "SERVER-1",
  "EventTime": "2023-10-11T23:11:26.000000+01:00",
  "SourceName": "systemd",
  "ProcessID": 1,
  "Message": "Started NXLog daemon.",
  "meta.os_name": "Linux Ubuntu",
  "meta.os_ver": "20.04",
  "meta.processor": "model name\t: Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz",
  "meta.processor_arch": "x86_64",
  "meta.processor_count": 16
}

See also
  • The Logs table reference lists the NXLog Agent core and standard fields.

  • The NXLog language contains several other functions that you can use for log enrichment, such as the host_ip() and hostname() functions. See Functions in the NXLog Agent Reference Manual for a complete listing.
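
As an illustration of the functions mentioned above, the following minimal sketch adds the agent's hostname and IP address to each record. It assumes hostname() and host_ip() are called without arguments, and the meta.agent_hostname and meta.agent_ip field names are hypothetical.

```
<Extension json>
    Module    xm_json
</Extension>

<Input system_messages>
    Module    im_file
    File      '/var/log/syslog'
    <Exec>
        # Illustrative field names; adapt them to your SIEM's schema
        ${meta.agent_hostname} = hostname();
        ${meta.agent_ip} = host_ip();
        to_json();
    </Exec>
</Input>
```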