Log normalization
Log normalization is required when events from different log sources need to be translated to a common data structure. When forwarding events to a SIEM or log analytics platform, you need to normalize logs according to that platform's taxonomy. Almost all SIEM solutions define taxonomies for different types of logs, such as:
- Logon events (successful or failed login, logoff)
- File access (auditing)
- Network access (VPN connections, web access)
Examples include the Elastic Common Schema (ECS), which defines a standard set of fields for storing event data in Elasticsearch, and the Unified Data Model (UDM), which is used when forwarding events to Google Chronicle. Data normalization enables SIEMs to efficiently interpret logs from different sources, facilitates event correlation, and makes it easier for you to work with the data in dashboards and reports.
Methods to normalize data
NXLog enables you to translate logs from different sources into a single taxonomy. Once logs are parsed, you can map event fields to the required schema, enrich log records with additional fields, and output events in a different format altogether.
See also Extracting data, Log classification, and Rewriting and modifying logs.
Mapping fields
The NXLog language provides the rename_field() and delete() procedures for simple manipulation of fields.
This example reads syslog messages from a file and parses records into structured data using the parse_syslog() procedure of the xm_syslog module. The NXLog $SourceModuleType and $SourceModuleName core fields are renamed to $NXLogModuleType and $NXLogModuleName respectively, and the NXLog $SeverityValue and $Severity fields are deleted.
Finally, records are converted to JSON format using the to_json() procedure of the xm_json module.
<Extension json>
Module xm_json
</Extension>
<Extension syslog>
Module xm_syslog
</Extension>
<Input system_messages>
Module im_file
File '/var/log/syslog'
<Exec>
parse_syslog();
# Rename a field by passing the field names as strings
# or the fields themselves
rename_field("SourceModuleType", "NXLogModuleType");
rename_field($SourceModuleName, $NXLogModuleName);
# Delete a field by passing the field name as a string
# or the field itself
delete("SeverityValue");
delete($Severity);
to_json();
</Exec>
</Input>
The following is a syslog message collected from a Linux host.
Mar 9 23:11:26 NXLog-Server-1 systemd[1]: Started NXLog daemon.
The following JSON object shows the same log record after it was processed by NXLog.
{
"EventReceivedTime": "2022-03-09T23:11:26.880942+01:00",
"NXLogModuleName": "system_messages",
"NXLogModuleType": "im_file",
"SyslogFacilityValue": 1,
"SyslogFacility": "USER",
"SyslogSeverityValue": 5,
"SyslogSeverity": "NOTICE",
"Hostname": "NXLog-Server-1",
"EventTime": "2022-03-09T23:11:26.000000+01:00",
"SourceName": "systemd",
"ProcessID": 1,
"Message": "Started NXLog daemon."
}
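To make the effect of rename_field() and delete() concrete, the following Python sketch applies the same four operations to a record modelled as a plain dict. This is only an illustration under that assumption: the helper names mirror the NXLog procedures, but they are hypothetical Python functions, not an NXLog API.

```python
def rename_field(record: dict, old: str, new: str) -> None:
    """Move the value stored under `old` to `new` (no-op if `old` is missing)."""
    if old in record:
        record[new] = record.pop(old)


def delete(record: dict, name: str) -> None:
    """Remove a field; deleting a missing field is a no-op in this sketch."""
    record.pop(name, None)


def normalize(record: dict) -> dict:
    """Apply the same mapping as the Exec block above to a copy of `record`."""
    out = dict(record)  # work on a copy so the caller's record is untouched
    rename_field(out, "SourceModuleType", "NXLogModuleType")
    rename_field(out, "SourceModuleName", "NXLogModuleName")
    delete(out, "SeverityValue")
    delete(out, "Severity")
    return out


if __name__ == "__main__":
    parsed = {
        "SourceModuleName": "system_messages",
        "SourceModuleType": "im_file",
        "SeverityValue": 2,
        "Severity": "INFO",
        "Message": "Started NXLog daemon.",
    }
    print(normalize(parsed))
```

Working on a copy keeps the sketch side-effect free; NXLog itself modifies the record in place.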
For more advanced processing, the xm_rewrite module allows you to rename or delete fields, specify a list of fields to retain, and transform the data based on custom processing.
This example reads syslog messages from a file. Records are parsed into structured data using the parse_syslog() procedure of the xm_syslog module, and the xm_rewrite module maps NXLog fields to the Elastic Common Schema. Finally, records are converted to JSON format using the to_json() procedure of the xm_json module.
<Extension json>
Module xm_json
</Extension>
<Extension syslog>
Module xm_syslog
</Extension>
<Extension syslog_ecs>
Module xm_rewrite
Rename EventTime, @timestamp
Rename EventReceivedTime, event.ingested
Rename Severity, event.severity
Rename SeverityValue, log.level
Rename SyslogSeverityValue, log.syslog.severity.code
Rename SyslogSeverity, log.syslog.severity.name
Rename SyslogFacilityValue, log.syslog.facility.code
Rename SyslogFacility, log.syslog.facility.name
Rename ProcessID, process.pid
Rename SourceName, service.type
Rename Message, message
Rename SourceModuleType, nxlog.module.type
Rename SourceModuleName, nxlog.module.name
<Exec>
${event.original} = $raw_event;
${ecs.version} = "8.0.1";
if $Hostname =~ /^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$/
{
${host.ip} = $Hostname;
}
else
{
${host.hostname} = $Hostname;
}
</Exec>
Delete Hostname
</Extension>
<Input system_messages>
Module im_file
File '/var/log/syslog'
<Exec>
parse_syslog();
syslog_ecs->process();
to_json();
</Exec>
</Input>
The following is a syslog message collected from a Linux host.
Mar 9 22:32:33 NXLog-Server-1 systemd[1]: Started NXLog daemon.
The following JSON object shows the same log record after it was processed by NXLog.
{
"event.ingested": "2022-03-09T22:32:33.485096+01:00",
"nxlog.module.name": "system_messages",
"nxlog.module.type": "im_file",
"log.syslog.facility.code": 1,
"log.syslog.facility.name": "USER",
"log.syslog.severity.code": 5,
"log.syslog.severity.name": "NOTICE",
"log.level": 2,
"event.severity": "INFO",
"@timestamp": "2022-03-09T22:32:33.000000+01:00",
"service.type": "systemd",
"process.pid": 1,
"message": "Started NXLog daemon.",
"event.original": "Mar 9 22:32:33 NXLog-Server-1 systemd[1]: Started NXLog daemon.",
"ecs.version": "8.0.1",
"host.hostname": "NXLog-Server-1"
}
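The syslog_ecs instance combines a fixed rename table with a small amount of logic for the host fields. The Python sketch below reproduces that flow on a dict-based record, assuming the directives run in the order they appear (renames, then Exec, then Delete). The to_ecs() helper is hypothetical; the rename table and regular expression are copied from the configuration above.

```python
import re

# Rename table copied from the syslog_ecs xm_rewrite instance above.
ECS_RENAMES = {
    "EventTime": "@timestamp",
    "EventReceivedTime": "event.ingested",
    "Severity": "event.severity",
    "SeverityValue": "log.level",
    "SyslogSeverityValue": "log.syslog.severity.code",
    "SyslogSeverity": "log.syslog.severity.name",
    "SyslogFacilityValue": "log.syslog.facility.code",
    "SyslogFacility": "log.syslog.facility.name",
    "ProcessID": "process.pid",
    "SourceName": "service.type",
    "Message": "message",
    "SourceModuleType": "nxlog.module.type",
    "SourceModuleName": "nxlog.module.name",
}

# Same pattern as in the Exec block: four dot-separated groups of 1-3 digits.
IPV4_LIKE = re.compile(r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$")


def to_ecs(record: dict, ecs_version: str = "8.0.1") -> dict:
    # 1. Rename mapped fields; unmapped fields keep their names.
    out = {ECS_RENAMES.get(name, name): value for name, value in record.items()}
    # 2. Keep the original line and record the ECS version. raw_event is popped
    #    because the example output above contains no raw_event key.
    out["event.original"] = out.pop("raw_event", None)
    out["ecs.version"] = ecs_version
    # 3. Route Hostname to host.ip or host.hostname, 4. then drop Hostname.
    hostname = out.pop("Hostname", None)
    if hostname is not None:
        field = "host.ip" if IPV4_LIKE.match(hostname) else "host.hostname"
        out[field] = hostname
    return out
```

Note that the pattern only checks the shape of a dotted quad, so a value such as 999.1.1.1 would still be stored in host.ip, and IPv6 addresses end up in host.hostname. A stricter check is possible if your sources require it.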
Data enrichment
Normalization may require log records to include a set of standard metadata fields, such as labels that describe the environment where the event was generated and keywords to tag the event. Such data might not be part of the event record but must be added from an external source. NXLog provides several methods to enrich log records.
The envvar general directive allows you to use operating system environment variables accessible by the NXLog user.
This example uses the envvar directive to make three environment variables containing CPU information available to the configuration. The im_msvistalog module reads Windows events from the System log. Log records are enriched with the CPU information from the environment variables and then converted to JSON format using the to_json() procedure of the xm_json module.
envvar PROCESSOR_IDENTIFIER
envvar PROCESSOR_ARCHITECTURE
envvar NUMBER_OF_PROCESSORS
<Extension json>
Module xm_json
</Extension>
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id="0">
<Select Path="System">*</Select>
</Query>
</QueryList>
</QueryXML>
<Exec>
${meta.processor} = '%PROCESSOR_IDENTIFIER%';
${meta.processor_arch} = '%PROCESSOR_ARCHITECTURE%';
${meta.processor_count} = %NUMBER_OF_PROCESSORS%;
to_json();
</Exec>
</Input>
The following JSON object shows a Windows System event after it was processed by NXLog.
{
"EventTime": "2022-03-09T18:25:40.946416+01:00",
"Hostname": "SERVER1.example.com",
"Keywords": "9259400833873739776",
"LevelValue": 4,
"EventType": "INFO",
"SeverityValue": 2,
"Severity": "INFO",
"EventID": 7036,
"SourceName": "Service Control Manager",
"ProviderGuid": "{555908D1-A6D7-4695-8E1E-26931D2012F4}",
"Version": 0,
"TaskValue": 0,
"OpcodeValue": 0,
"RecordNumber": 23944,
"ExecutionProcessID": 536,
"ExecutionThreadID": 1700,
"Channel": "System",
"Message": "The nxlog service entered the running state.",
"Level": "Information",
"param1": "nxlog",
"param2": "running",
"EventData.Binary": "6E0078006C006F0067002F0034000000",
"EventReceivedTime": "2022-03-09T18:25:42.962159+01:00",
"SourceModuleName": "eventlog",
"SourceModuleType": "im_msvistalog",
"meta.processor": "Intel64 Family 6 Model 165 Stepping 5, GenuineIntel",
"meta.processor_arch": "AMD64",
"meta.processor_count": 16
}
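For comparison, the same enrichment can be sketched in Python with a hypothetical enrich_with_cpu_info() helper and a dict-based record. One difference to keep in mind: NXLog resolves the %NAME% references in its configuration, whereas this sketch reads the environment each time it is called. The environment mapping is a parameter so the function can be tested without touching the real environment.

```python
import os
from typing import Mapping


def enrich_with_cpu_info(record: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of `record` with meta.* fields taken from `env`."""
    out = dict(record)
    # Stored as strings, like the quoted '%...%' references in the NXLog example.
    for env_name, field in (
        ("PROCESSOR_IDENTIFIER", "meta.processor"),
        ("PROCESSOR_ARCHITECTURE", "meta.processor_arch"),
    ):
        if env_name in env:
            out[field] = env[env_name]
    # %NUMBER_OF_PROCESSORS% is unquoted in the NXLog example, so the output
    # holds a number (16), not a string ("16"); convert explicitly here.
    count = env.get("NUMBER_OF_PROCESSORS")
    if count is not None:
        out["meta.processor_count"] = int(count)
    return out
```

Missing variables are simply skipped in this sketch rather than producing empty fields.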
The include and include_stdout general directives allow you to load data into the NXLog configuration from a file or from the standard output of a script, respectively. For example, with include_stdout, you can execute a script that reads dynamic data and injects its output into the configuration.
This example uses two files to inject static and dynamic values into the NXLog configuration. The first file defines two static values for the operating system name and version.
define OS_NAME Linux Ubuntu
define OS_VER 20.04
The second is a bash script that retrieves CPU information from the operating system and prints it to standard output as define directives.
#!/bin/bash
# First "model name" line only, so the define below stays on one line
PROCESSOR=$(grep -m1 'model name' /proc/cpuinfo)
# Number of logical CPUs: /proc/cpuinfo has one "processor" line per CPU
PROCESSOR_COUNT=$(grep -c '^processor' /proc/cpuinfo)
PROCESSOR_ARCH=$(uname -m)
echo "define PROCESSOR $PROCESSOR"
echo "define PROCESSOR_COUNT $PROCESSOR_COUNT"
echo "define PROCESSOR_ARCH $PROCESSOR_ARCH"
The files above are included in the NXLog configuration using the include and include_stdout directives. The im_file input module reads syslog messages from a file, and the parse_syslog() procedure of the xm_syslog module parses them into structured data. Log records are then enriched with the operating system and CPU information from the included file and script, and converted to JSON format using the to_json() procedure of the xm_json module.
include /opt/nxlog/etc/env.conf
include_stdout /opt/nxlog/etc/env.sh
<Extension json>
Module xm_json
</Extension>
<Extension syslog>
Module xm_syslog
</Extension>
<Input system_messages>
Module im_file
File '/var/log/syslog'
<Exec>
parse_syslog();
${meta.os_name} = '%OS_NAME%';
${meta.os_ver} = '%OS_VER%';
${meta.processor} = '%PROCESSOR%';
${meta.processor_arch} = '%PROCESSOR_ARCH%';
${meta.processor_count} = %PROCESSOR_COUNT%;
to_json();
</Exec>
</Input>
The following is a syslog message collected from a Linux host.
Mar 9 17:02:16 NXLog-Server-1 systemd[1]: Started NXLog daemon.
The following JSON object shows the same log record after it was processed by NXLog.
{
"EventReceivedTime": "2022-03-09T17:02:16.172998+01:00",
"SourceModuleName": "system_messages",
"SourceModuleType": "im_file",
"SyslogFacilityValue": 1,
"SyslogFacility": "USER",
"SyslogSeverityValue": 5,
"SyslogSeverity": "NOTICE",
"SeverityValue": 2,
"Severity": "INFO",
"Hostname": "NXLog-Server-1",
"EventTime": "2022-03-09T17:02:16.000000+01:00",
"SourceName": "systemd",
"ProcessID": 1,
"Message": "Started NXLog daemon.",
"meta.os_name": "Linux Ubuntu",
"meta.os_ver": "20.04",
"meta.processor": "model name\t: Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz",
"meta.processor_arch": "x86_64",
"meta.processor_count": 16
}
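The example above relies on define directives and %NAME% references working together. The Python sketch below is a simplified model of that substitution, assuming each definition is a single `define NAME value` line; parse_defines() and expand() are hypothetical helpers, and NXLog's own configuration parser handles more cases than this. It also shows why quoting matters: '%OS_NAME%' expands to a string literal, while the unquoted %PROCESSOR_COUNT% expands to a bare number, which is why meta.processor_count is 16 rather than "16" in the output.

```python
import re

DEFINE_LINE = re.compile(r"^define\s+(\w+)\s+(.*)$")
REFERENCE = re.compile(r"%(\w+)%")


def parse_defines(text: str) -> dict:
    """Collect NAME -> value pairs from lines of the form 'define NAME value'."""
    defines = {}
    for line in text.splitlines():
        match = DEFINE_LINE.match(line.strip())
        if match:
            defines[match.group(1)] = match.group(2).strip()
    return defines


def expand(config_line: str, defines: dict) -> str:
    """Replace %NAME% with its defined value; unknown names are left untouched."""
    return REFERENCE.sub(lambda m: defines.get(m.group(1), m.group(0)), config_line)


if __name__ == "__main__":
    defines = parse_defines("define OS_NAME Linux Ubuntu\ndefine PROCESSOR_COUNT 16\n")
    print(expand("${meta.os_name} = '%OS_NAME%';", defines))
    print(expand("${meta.processor_count} = %PROCESSOR_COUNT%;", defines))
```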
The NXLog language contains several other functions that you can use for log enrichment, such as the host_ip() and hostname() functions. See Functions in the NXLog Enterprise Edition Reference Manual for a complete listing.
Output log format
JSON is one of the most common formats supported by modern SIEM solutions. NXLog supports data conversion to JSON with the xm_json module, as demonstrated by the examples above. It also supports conversion to and from several other formats through dedicated modules.
Refer to the module documentation in the NXLog Enterprise Edition Reference Manual for further details. You will also find SIEM-specific examples in our extensive list of Integration guides.
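When converting records to an output format, the main decision is how values that have no native representation in that format are written. As an illustration only, the hypothetical to_json_line() helper below writes datetime values in the same shape as the timestamps in the examples above (microseconds plus UTC offset) and, like those example outputs, leaves raw_event out. It is not NXLog's to_json() implementation.

```python
import json
from datetime import datetime


def _encode(value):
    """Fallback encoder for values json.dumps cannot handle natively."""
    if isinstance(value, datetime):
        # e.g. 2022-03-09T23:11:26.000000+01:00 for a timezone-aware datetime
        return value.isoformat(timespec="microseconds")
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json_line(record: dict) -> str:
    """Serialize a record as a single line of JSON, without raw_event."""
    fields = {name: value for name, value in record.items() if name != "raw_event"}
    return json.dumps(fields, default=_encode, ensure_ascii=False)
```

Raising TypeError for unknown types makes unsupported values fail loudly instead of being silently stringified.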
Standard NXLog fields
The fields available for a log record depend on the log source and which input or extension modules processed it. However, some fields are standard across several NXLog modules.
The following table lists the fields created by the NXLog core for every record.
Field | Type | Description |
---|---|---|
raw_event | string | The data received from stream modules (such as im_file and im_tcp). |
EventReceivedTime | datetime | The time when the event is received or collected by NXLog. * |
Hostname | string | The IP address or hostname where the event originated. |
SourceModuleName | string | The name of the NXLog input module instance. * |
SourceModuleType | string | The type of the NXLog input module instance (e.g. im_file). * |
* If these fields already exist, they will not be overwritten.
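In other words, the marked core fields behave like "set only if absent". A minimal Python sketch of that rule, using a hypothetical add_core_fields() helper and a dict-based record (both assumptions for illustration):

```python
from datetime import datetime, timezone


def add_core_fields(record: dict, module_name: str, module_type: str,
                    received: datetime = None) -> dict:
    """Add the marked core fields without overwriting values already present."""
    out = dict(record)
    if received is None:
        received = datetime.now(timezone.utc)
    # setdefault() only assigns when the key is missing, matching the footnote.
    out.setdefault("EventReceivedTime", received)
    out.setdefault("SourceModuleName", module_name)
    out.setdefault("SourceModuleType", module_type)
    return out
```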
The following fields are standard across several modules.
Field | Description |
---|---|
EventTime | The date and time of the event. |
EventType | The type of event according to the log source; for Windows events, for example, it represents the severity (such as INFO). |
Message | The event message. |
MessageSourceAddress | The IP address of the remote host. Available in network modules (such as im_tcp and im_udp). |
ProcessID | The ID of the process that generated the event. |
Severity | The severity name corresponding to SeverityValue: DEBUG (1), INFO (2), WARNING (3), ERROR (4), CRITICAL (5). |
SeverityValue | The NXLog normalized severity value, from 1 to 5. Refer to the module documentation for how this value is mapped from the severity set by the log source. |
SourceName | The application or device that generated the event. |
The field type may differ from one module to another, so it is essential to handle data types according to your requirements when normalizing data. The NXLog language provides several data conversion functions; refer to Functions in the NXLog Enterprise Edition Reference Manual.
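The severity scale and the note on field types can be combined into a small normalization step. The Python sketch below uses a hypothetical normalize_severity_and_types() helper: it coerces SeverityValue to an integer, derives the matching Severity name (in upper case, as in the JSON examples above), and, as an illustrative assumption, converts a numeric-string ProcessID to an integer.

```python
SEVERITY_NAMES = {1: "DEBUG", 2: "INFO", 3: "WARNING", 4: "ERROR", 5: "CRITICAL"}


def normalize_severity_and_types(record: dict) -> dict:
    """Return a copy with a consistent SeverityValue/Severity pair and typed fields."""
    out = dict(record)
    if "SeverityValue" in out:
        value = int(out["SeverityValue"])  # accept "2" as well as 2
        if value not in SEVERITY_NAMES:
            raise ValueError(f"SeverityValue out of range 1-5: {value}")
        out["SeverityValue"] = value
        out["Severity"] = SEVERITY_NAMES[value]
    # Some sources may deliver numeric IDs as strings; store them as integers.
    pid = out.get("ProcessID")
    if isinstance(pid, str) and pid.isdigit():
        out["ProcessID"] = int(pid)
    return out
```

Rejecting out-of-range values, rather than clamping them, keeps mapping errors visible during testing.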
See the module documentation in the NXLog Enterprise Edition Reference Manual for a list of fields created by each module.