Reducing bandwidth and data size
There are several ways that NXLog can be configured to reduce the size of log data. This can help lower bandwidth requirements during transport, storage requirements, and licensing costs for commercial SIEM systems that charge based on data volume.
The three main strategies for achieving this goal are covered in the following sections:
- Filtering events by removing unnecessary or duplicate events at the source, so that less data needs to be transported and stored. This reduces the data size during all subsequent stages of processing.
- Trimming events by removing extra content or fields from event records, which can reduce the total volume of log data.
- Compressing during transport, which can drastically reduce bandwidth requirements for events being forwarded.
To achieve the best results, it is important to understand how fields work in NXLog and which fields are being transferred or stored.
For example, removing or modifying fields without modifying $raw_event will not reduce data requirements at all for an output module instance that uses only $raw_event.
See Event records and fields for details, as well as the explanation in Compressing during transport below.
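The following sketch illustrates this point under some assumptions: the input file path, the destination address and port, and the whitespace-collapsing rule are placeholders. The Exec block modifies $Message and then regenerates $raw_event with the xm_syslog to_syslog_bsd() procedure, so that the shorter message is what om_tcp actually sends.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input in>
    Module  im_file
    # Placeholder path
    File    '/var/log/messages'
    <Exec>
        parse_syslog();
        # Changing $Message alone has no effect on what om_tcp sends,
        # because om_tcp transfers only $raw_event by default
        $Message =~ s/\s+/ /g;
        # Regenerate $raw_event from the (modified) fields
        to_syslog_bsd();
    </Exec>
</Input>

<Output out>
    Module  om_tcp
    # Placeholder destination
    Host    10.2.0.2
    Port    1514
</Output>
```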
Filtering events
Depending on the logging requirements and the log source, it may be possible to simply discard certain events. NXLog can be configured to filter events based on nearly any set of criteria. See also Filtering logs.
In this example, an NXLog agent is configured to collect Syslog messages from devices on the local network. Events are parsed with the xm_syslog parse_syslog() procedure, which sets the $SeverityValue field. Any event with a normalized severity lower than 3 (WARNING), that is, DEBUG and INFO events, is discarded.
<Extension _syslog>
Module xm_syslog
</Extension>
<Input syslog>
Module im_udp
Host 0.0.0.0
Port 514
Exec parse_syslog(); if $SeverityValue < 3 drop();
</Input>
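Filtering criteria can also be combined. In this sketch, the severity rule from the example above is extended with a content-based rule; the message pattern is a hypothetical example of a noisy event type and should be replaced with a pattern for events that are actually safe to discard.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # Drop DEBUG and INFO events
        if $SeverityValue < 3 drop();
        # Hypothetical pattern: drop a noisy, low-value message type
        else if $Message =~ /^Connection closed by / drop();
    </Exec>
</Input>
```

The else if is used because, once drop() has discarded the event, further statements must not act on it.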
Similarly, the pm_norepeat module can be used to detect, count, and discard duplicate events.
In their place, pm_norepeat generates a single event with a "last message repeated n times" message.
With this configuration, NXLog collects Syslog messages from hosts on the local network with im_udp and parses them with the xm_syslog parse_syslog() procedure.
Events are then routed through a pm_norepeat module instance, where the $Hostname, $Message, and $SourceName fields are checked to detect duplicate messages.
Finally, events are sent to a remote host with om_batchcompress.
<Extension _syslog>
Module xm_syslog
</Extension>
<Input syslog_udp>
Module im_udp
Host 0.0.0.0
Port 514
Exec parse_syslog();
</Input>
<Processor norepeat>
Module pm_norepeat
CheckFields Hostname, Message, SourceName
</Processor>
<Output out>
Module om_batchcompress
Host 10.2.0.2
Port 2514
</Output>
<Route r>
Path syslog_udp => norepeat => out
</Route>
Trimming events
NXLog can be configured to parse events into various fields in the event record. Once events have been parsed, a whitelist can be used to retain only a set of important fields and discard the rest. See Rewriting and modifying logs for more information about modifying events.
This configuration reads from the Windows Event Log with im_msvistalog and uses an xm_rewrite module instance to discard any fields in the event record that are not included in the whitelist. The xm_rewrite instance below could be used with multiple sources; for example, the whitelist would also be suitable for the xm_syslog fields.
The xm_rewrite module does not remove the $raw_event field.
<Extension whitelist>
Module xm_rewrite
Keep AccountName, Channel, EventID, EventReceivedTime, EventTime, Hostname, \
Severity, SeverityValue, SourceName
</Extension>
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id='0'>
<Select Path='Security'>*[System/Level<=4]</Select>
</Query>
</QueryList>
</QueryXML>
Exec whitelist->process();
</Input>
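Because the whitelist instance is independent of the input, the same instance can be called from other inputs. A minimal sketch, assuming the whitelist extension from the configuration above is defined in the same configuration file; the listening address and port are placeholders. Note that $Message is not in the whitelist, so it is removed, while $raw_event still carries the original Syslog line.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    # parse_syslog() sets Hostname, Severity, SeverityValue, SourceName,
    # EventTime, and other fields; the whitelist keeps only those listed
    Exec    parse_syslog(); whitelist->process();
</Input>
```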
In some cases, event messages contain a lot of extra data that is duplicated across multiple events of the same type. One example of this is the "descriptive event data" that Microsoft has introduced for the Windows Event Log. By removing this verbose text from common events, event sizes can be reduced significantly while still preserving all the forensic details of the event.
The following configuration collects events from the Application, Security, and System channels. Rules are included for truncating the messages of Security events with IDs 4688 and 4769.
In this example, the $Message field is truncated, but the $raw_event field is not.
For most input modules, $raw_event includes the contents of $Message and other fields (see the im_msvistalog $raw_event field).
To update the $raw_event field as well, add a statement that rebuilds it (see the comment in the configuration example).
See also Compressing during transport below for more details.
Sample Security event 4769 message before truncation:
A Kerberos service ticket was requested. Account Information: Account Name: WINAD$@TEST.COM Account Domain: TEST.COM Logon GUID: {55a7f67c-a32c-150a-29f1-7e173ff130a7} Service Information: Service Name: WINAD$ Service ID: TEST\WINAD$ Network Information: Client Address: ::1 Client Port: 0 Additional Information: Ticket Options: 0x40810000 Ticket Encryption Type: 0x12 Failure Code: 0x0 Transited Services: - This event is generated every time access is requested to a resource such as a computer or a Windows service. The service name indicates the resource to which access was requested. This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event. The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket. Ticket options, encryption types, and failure codes are defined in RFC 4120.
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id="0">
<Select Path="Application">
*[System[(Level<=4)]]</Select>
<Select Path="Security">
*[System[(Level<=4)]]</Select>
<Select Path="System">
*[System[(Level<=4)]]</Select>
</Query>
</QueryList>
</QueryXML>
<Exec>
if ($Channel == 'Security') and ($EventID == 4688)
$Message =~ s/\s*Token Elevation Type indicates the type of .*$//s;
else if ($Channel == 'Security') and ($EventID == 4769)
$Message =~ s/\s*This event is generated every time access is .*$//s;
# Additional rules can be added here
# ...
# Optionally, update the $raw_event field
#$raw_event = $EventTime + ' ' + $Message;
</Exec>
</Input>
The same message after truncation:
A Kerberos service ticket was requested. Account Information: Account Name: WINAD$@TEST.COM Account Domain: TEST.COM Logon GUID: {55a7f67c-a32c-150a-29f1-7e173ff130a7} Service Information: Service Name: WINAD$ Service ID: TEST\WINAD$ Network Information: Client Address: ::1 Client Port: 0 Additional Information: Ticket Options: 0x40810000 Ticket Encryption Type: 0x12 Failure Code: 0x0 Transited Services: -
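If the output module sends only $raw_event, the shorter $Message takes effect only after $raw_event is rebuilt. The following sketch applies only the rule for event ID 4769 and then rebuilds $raw_event; the chosen layout (time, hostname, event ID, and message) is an assumption, not a required format.

```
<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id="0">
                <Select Path="Security">*[System[(EventID=4769)]]</Select>
            </Query>
        </QueryList>
    </QueryXML>
    <Exec>
        # Remove the descriptive text from the message
        $Message =~ s/\s*This event is generated every time access is .*$//s;
        # Rebuild $raw_event so output modules that send only $raw_event
        # also benefit from the shorter $Message (layout is an example)
        $raw_event = $EventTime + ' ' + $Hostname + ' ' + string($EventID) + ' ' + $Message;
    </Exec>
</Input>
```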
There are cases when large events may cause a problem during transport or for processing by the receiving end. One example is packet fragmentation when sending large events over UDP. To prevent this issue, the event may be truncated to make sure that it does not exceed a specific size.
The following configuration reads from the Windows Event Log with im_msvistalog and truncates the event to 1000 bytes by using the substr() function.
This function accepts an input string and returns the substring between the given starting and ending positions, which are byte offsets from the beginning of the string.
This method discards any data after the specified position, so it should only be used in the rare cases where the packet size must not exceed a set limit.
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id='0'>
<Select Path='Security'>*[System/Level=4]</Select>
</Query>
</QueryList>
</QueryXML>
Exec $raw_event = substr($raw_event, 0, 1000);
</Input>
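When the size limit matters only for a UDP hop, the truncation can be done in the output instance instead, and only for events that exceed the limit. A sketch under these assumptions: the destination address and port are placeholders, and the size() function is used to check the length of $raw_event in bytes first. Because the offsets are in bytes, the cut may fall in the middle of a multi-byte UTF-8 character.

```
<Output udp>
    Module  om_udp
    # Placeholder destination
    Host    10.2.0.2
    Port    514
    # Truncate only events longer than 1000 bytes
    Exec    if size($raw_event) > 1000 $raw_event = substr($raw_event, 0, 1000);
</Output>
```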
Compressing during transport
There are several ways that event data can be transported between NXLog agents, including the *m_tcp and *m_ssl modules. However, those modules do not provide data compression. The im_batchcompress and om_batchcompress modules, available in NXLog Enterprise Edition, can be used to transfer events in compressed (and optionally, encrypted) batches.
The following chart compares the data requirements for the *m_tcp, *m_ssl (with TLSv1.2), and *m_batchcompress module pairs. It is based on a sample of BSD Syslog records parsed with parse_syslog(). The values shown reflect the total bi-directional bytes transferred at the packet level. Actual ratios will vary in practice depending on network conditions and how compressible the event data is.
Note that the om_tcp and om_ssl modules (among others) transfer only the $raw_event field by default, but can be configured to transfer all fields with OutputType Binary.
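A minimal sketch of an om_tcp and im_tcp pair that transfers complete event records in the NXLog binary format; the address and port are placeholders. The receiving instance needs InputType Binary to restore the fields, so this format is only suitable when both ends are NXLog agents.

```
# Sending agent
<Output out>
    Module      om_tcp
    Host        10.2.0.2
    Port        1514
    # Send the complete event record, not only $raw_event
    OutputType  Binary
</Output>

# Receiving agent
<Input in>
    Module      im_tcp
    ListenAddr  10.2.0.2
    Port        1514
    # Parse the NXLog binary format to restore all fields
    InputType   Binary
</Input>
```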
The om_batchcompress module transfers all fields in the event record, but it is possible to send only the $raw_event field by first removing the other fields (see Generating $raw_event and removing other fields below).
Simply configuring the *m_batchcompress modules for the transfer of event data between NXLog agents can significantly reduce the bandwidth requirements for that part of the log path.
The table below displays the comparison of sending the same data set using different methods and modules:
Compression method | Modules used | Event size | Diff vs. baseline (om_tcp) | Sender CPU usage | Receiver CPU usage | EPS sender | EPS receiver |
---|---|---|---|---|---|---|---|
None | om_tcp, im_tcp | 112 | 0.00% | 141 | 215.07 | 83091.8 | 84169.9 |
None | om_ssl, im_ssl | 301.7 | +169.38% | 141.34 | 191.9 | 33161.4 | 47482.9 |
SSLCompression | om_ssl, im_ssl | 293.2 | +161.79% | 138.98 | 190.69 | 34497.7 | 47128.5 |
Batch compression | om_batchcompress, im_batchcompress | 18.4 | -83.57% | 119.69 | 181.1 | 36252.1 | 77491.8 |
The compression ratios show that enabling SSLCompression yields only a minimal improvement in message size (293.2 versus 301.7).
Batch compression fares much better because it compresses many events together in each batch, so content that repeats across events can be compressed far more effectively.
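The SSLCompression row in the table corresponds to enabling TLS-level compression on the *m_ssl modules. A sketch of the sending side, assuming placeholder certificate paths, address, and port; check the om_ssl and im_ssl configuration details for the exact requirements on both ends. Given the results above, this is rarely worth enabling for size reduction alone.

```
<Output out>
    Module          om_ssl
    Host            10.2.0.2
    Port            6514
    # Placeholder certificate paths
    CAFile          '/opt/nxlog/cert/ca.pem'
    CertFile        '/opt/nxlog/cert/client-cert.pem'
    CertKeyFile     '/opt/nxlog/cert/client-key.pem'
    # Request compression at the TLS layer
    SSLCompression  TRUE
</Output>
```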
With the following configuration, an NXLog agent uses om_batchcompress to send events in compressed batches to a remote NXLog agent.
The *m_batchcompress modules also support SSL/TLS encryption; see the im_batchcompress and om_batchcompress configuration details.
<Input in>
Module im_file
File 'input.log'
</Input>
<Output out>
Module om_batchcompress
Host 10.2.0.2
Port 2514
</Output>
The remote NXLog agent receives and decompresses the received batches with im_batchcompress. All fields in an event are available to the receiving agent.
<Input in>
Module im_batchcompress
ListenAddr 10.2.0.2
Port 2514
</Input>
<Output out>
Module om_file
File 'output.log'
</Output>
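A sketch of the same pair with encryption enabled, assuming the UseSSL directive and the usual CAFile, CertFile, and CertKeyFile certificate directives of the *m_batchcompress modules; the certificate paths are placeholders. Verify the directive names against the im_batchcompress and om_batchcompress configuration details.

```
# Sending agent
<Output out>
    Module       om_batchcompress
    Host         10.2.0.2
    Port         2514
    # Encrypt the compressed batches with SSL/TLS
    UseSSL       TRUE
    CAFile       '/opt/nxlog/cert/ca.pem'
    CertFile     '/opt/nxlog/cert/client-cert.pem'
    CertKeyFile  '/opt/nxlog/cert/client-key.pem'
</Output>

# Receiving agent
<Input in>
    Module       im_batchcompress
    ListenAddr   10.2.0.2
    Port         2514
    UseSSL       TRUE
    CAFile       '/opt/nxlog/cert/ca.pem'
    CertFile     '/opt/nxlog/cert/server-cert.pem'
    CertKeyFile  '/opt/nxlog/cert/server-key.pem'
</Input>
```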
To further reduce the size of the batches transferred by the *m_batchcompress modules, and if only the $raw_event field will be needed later in the log path, the extra fields can be removed from the event record prior to transfer.
This can be done with an xm_rewrite instance for multiple fields or with the delete() procedure (see Renaming and deleting fields in a log message).
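For a handful of fields, the delete() procedure can be called directly in an Exec directive instead of using an xm_rewrite instance. In this sketch, the deleted fields are examples of core fields that NXLog input modules typically add to event records; which fields are safe to remove depends on what the receiving side needs.

```
<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Port    2514
    <Exec>
        # Remove individual fields that are not needed downstream
        delete($SourceModuleName);
        delete($SourceModuleType);
        delete($EventReceivedTime);
    </Exec>
</Output>
```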
In this configuration, events are collected from the Windows Event Log with im_msvistalog, which sets $raw_event and many other fields.
To reduce the size of the events, only the $raw_event field is retained; all the other fields in the event record are removed by the xm_rewrite module instance (called by clean->process()).
Rather than using the default im_msvistalog $raw_event field, it would also be possible to customize it, for example with $raw_event = $EventTime + ' ' + $Message or with the xm_json to_json() procedure.
<Extension clean>
Module xm_rewrite
Keep raw_event
</Extension>
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id='0'>
<Select Path='Security'>*[System/Level<=4]</Select>
</Query>
</QueryList>
</QueryXML>
</Input>
<Output out>
Module om_batchcompress
Host 10.2.0.2
Exec clean->process();
</Output>
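As a variation on the configuration above, $raw_event can be customized before the other fields are removed. This sketch assumes the same eventlog input; the $EventTime + ' ' + $Message format is one of the options mentioned above, not a required layout.

```
<Extension clean>
    Module  xm_rewrite
    Keep    raw_event
</Extension>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    <Exec>
        # Build a compact $raw_event from selected fields first...
        $raw_event = $EventTime + ' ' + $Message;
        # ...then remove every field except $raw_event
        clean->process();
    </Exec>
</Output>
```

The order of the two statements matters: once clean->process() has run, $EventTime and $Message no longer exist in the event record.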
Alternatively, if the various fields in the event record will be handled later in the log path, the $raw_event field can be set to an empty string (but see the warning below).
This configuration collects events from the Windows Event Log with im_msvistalog, which writes multiple fields to the event record.
In this case, the $raw_event field contains the same data as other fields.
Because the om_batchcompress module instance sends all the fields in the event record, the $raw_event field can be emptied.
Many output modules operate on the $raw_event field only.
It should not be set to an empty string unless the output module sends all the event fields (om_batchcompress, or a module configured with OutputType Binary), and the same must hold for every subsequent agent and module in the log path.
Otherwise, a module instance will encounter an empty $raw_event.
For this reason, the following example is not recommended in general.
<Input eventlog>
Module im_msvistalog
<QueryXML>
<QueryList>
<Query Id='1'>
<Select Path='Security'>*[System/Level<=4]</Select>
</Query>
</QueryList>
</QueryXML>
</Input>
<Output out>
Module om_batchcompress
Host 10.2.0.2
Exec $raw_event = '';
</Output>
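If this approach is used anyway, the receiving agent must rebuild $raw_event before any module that relies on it. A minimal sketch, assuming the receiver writes to a file and that JSON produced by the xm_json to_json() procedure is an acceptable format for that file.

```
<Extension _json>
    Module  xm_json
</Extension>

<Input in>
    Module      im_batchcompress
    ListenAddr  10.2.0.2
    Port        2514
    # $raw_event arrives empty, so rebuild it from the received fields
    # before it reaches a module that writes only $raw_event
    Exec        to_json();
</Input>

<Output out>
    Module  om_file
    File    'output.log'
</Output>
```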