Reliable message delivery

Sometimes regulatory compliance or other requirements mandate that the logging infrastructure function in an ultra-reliable manner. NXLog Enterprise Edition can be configured to guarantee that:

log data is safe even in case of a crash,
no messages are lost due to intermittent network issues, and
there is no message duplication.

Crash-safe operation

A host or NXLog crash can happen for various reasons, including power failures without a UPS, kernel panics, and software bugs. To protect against data loss in these situations, the following techniques are implemented in NXLog Enterprise Edition.

Log messages are buffered in various places in NXLog, and buffered messages can be lost in the case of a crash. Persistent module message queues can be enabled so that these messages are stored on disk instead of in memory. Each log message is removed from the queue only after successful delivery. See the PersistLogqueue and SyncLogqueue global configuration directives, and the PersistLogqueue and SyncLogqueue module directives.

Processor modules remove log messages from their queues before delivery, which can result in data loss. Do not use processor modules when high-reliability operation is required.

Input positions (for im_file and other modules) are saved in the cache file, and by default, this file is only saved to disk on shutdown. In case of a crash, some events may be duplicated or lost depending on the value of the ReadFromLast directive. This data can be periodically flushed and synced to disk using the CacheFlushInterval and CacheSync directives.
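The "remove only after successful delivery" rule for persistent queues can be illustrated with a minimal Python sketch. This is not NXLog's implementation: the class name, the one-JSON-line-per-message file format, and the `send` callback are all hypothetical, and the `sync` flag only plays the role that SyncLogqueue has in the text above.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Toy disk-backed FIFO queue (hypothetical format: one JSON line per message)."""

    def __init__(self, path, sync=True):
        self.path = path
        self.sync = sync  # stands in for SyncLogqueue in this sketch
        self.items = []
        # On startup, recover whatever was on disk before the last stop or crash.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.items = [json.loads(line) for line in f if line.strip()]

    def _rewrite(self):
        # Write the full queue to a temporary file, then atomically swap it in.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for item in self.items:
                f.write(json.dumps(item) + "\n")
            f.flush()
            if self.sync:
                os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def push(self, message):
        self.items.append(message)
        self._rewrite()

    def deliver_next(self, send):
        """Send the oldest message; remove it only if send() reports success."""
        if not self.items:
            return False
        if send(self.items[0]):
            self.items.pop(0)
            self._rewrite()
            return True
        return False


# Usage: a failed delivery leaves the message on disk for the next start.
queue_path = os.path.join(tempfile.mkdtemp(), "queue.jsonl")
queue = PersistentQueue(queue_path)
queue.push("event 1")
queue.deliver_next(lambda message: False)   # delivery fails, nothing is removed
recovered = PersistentQueue(queue_path)     # simulates a restart after a crash
```

Rewriting the whole file on every change is deliberately naive; it keeps the sketch short and also shows why syncing on every operation costs throughput.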

Example 1. Configuration for crash-safe operation

In this example, the log queues are synced to disk after each successful delivery. The cache file containing the current read position is also flushed and synced to disk after each event is read from the file. Note that these reliability features, when enabled, significantly reduce processing speed.

nxlog.conf

PersistLogqueue TRUE
SyncLogqueue TRUE
CacheFlushInterval always
CacheSync TRUE

<Input in>
    Module  im_file
    File    'input.log'
</Input>

<Output out>
    Module  om_tcp
    Host    10.0.0.1
    Port    1514
</Output>
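To see why flushing the input position after every read matters, consider the following Python sketch of a file reader that keeps its offset in a small cache file. It is an illustration only: the function names and the cache file layout (a single integer offset) are assumptions, and the `flush_always` flag merely mimics the effect of setting CacheFlushInterval to always together with CacheSync.

```python
import os
import tempfile


def save_position(cache_path, offset, sync=True):
    # Write to a temporary file and rename, so a crash never leaves a torn cache.
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(offset))
        f.flush()
        if sync:
            os.fsync(f.fileno())
    os.replace(tmp, cache_path)


def load_position(cache_path):
    try:
        with open(cache_path, encoding="utf-8") as f:
            return int(f.read())
    except FileNotFoundError:
        return 0  # no saved position: start from the beginning


def read_events(log_path, cache_path, limit, flush_always):
    """Read up to `limit` lines from the saved offset, then stop abruptly.

    Returning without a final save stands in for a crash: the position only
    survives if it was flushed after each event.
    """
    events = []
    with open(log_path, "rb") as f:
        f.seek(load_position(cache_path))
        for _ in range(limit):
            line = f.readline()
            if not line:
                break
            events.append(line.decode("utf-8").rstrip("\n"))
            if flush_always:
                save_position(cache_path, f.tell())
    return events
```

With `flush_always=True`, a second call resumes after the last event that was read. With `flush_always=False`, the second call starts from offset 0 again and returns the same events a second time, which is the duplication case described above.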

Reliable network delivery

NXLog Enterprise Edition exclusive feature

The TCP protocol provides guaranteed packet delivery via packet-level acknowledgment. Unfortunately, if the receiver closes the TCP connection prematurely while messages are being transmitted, unsent data stored in the socket buffers will be lost since this is handled by the operating system instead of the application (NXLog). This can result in message loss and affects im_tcp, om_tcp, im_ssl, and om_ssl. See the diagram in All buffers in a basic route.

The solution to this unreliability in the TCP protocol is application-level acknowledgment. NXLog provides two pairs of modules for this purpose.
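The idea of application-level acknowledgment can be sketched in a few lines of Python. This is a simulation rather than a description of any NXLog module: `FlakyReceiver`, `send_reliably`, and the literal "ACK" reply are hypothetical names chosen for the illustration. The point is that the sender keeps a message until the receiving application confirms it, rather than trusting that data handed to the socket has arrived.

```python
class FlakyReceiver:
    """Simulated peer that drops the connection on chosen attempts."""

    def __init__(self, fail_on_attempts):
        self.fail_on_attempts = set(fail_on_attempts)
        self.attempts = 0
        self.stored = []

    def handle(self, message):
        self.attempts += 1
        if self.attempts in self.fail_on_attempts:
            # The connection closes before the message is stored or acknowledged.
            raise ConnectionError("connection closed before acknowledgment")
        self.stored.append(message)
        return "ACK"


def send_reliably(messages, receiver, max_attempts=5):
    """Keep each message pending until the receiver acknowledges it."""
    pending = list(messages)
    while pending:
        for _ in range(max_attempts):
            try:
                if receiver.handle(pending[0]) == "ACK":
                    pending.pop(0)  # safe to forget only after the acknowledgment
                    break
            except ConnectionError:
                continue  # reconnect and resend the same message
        else:
            raise RuntimeError("giving up after repeated failures")


receiver = FlakyReceiver(fail_on_attempts={2, 3})
send_reliably(["m1", "m2", "m3"], receiver)
```

In this simulation a failure always happens before the message is stored. A real receiver can also store a message and lose the connection before its acknowledgment reaches the sender, in which case the resend produces a duplicate. The section on protection against duplication addresses that case.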

NXLog can use the HTTP/HTTPS protocol to provide guaranteed message delivery over the network, optionally with TLS/SSL. The client (om_http) sends the event in an HTTP POST request. The server (im_http, only available in NXLog Enterprise Edition) responds with a status code indicating successful message reception.

Example 2. HTTPS log transfer

In the following configuration example, a client reads logs from a file and transmits the logs over an SSL-secured HTTP connection.

nxlog.conf (client/sending)

<Input in>
    Module              im_file
    File                'input.log'
</Input>

<Output out>
    Module              om_http
    URL                 https://10.0.0.1:8080/
    HTTPSCertFile       %CERTDIR%/client-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/client-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
</Output>

The remote NXLog agent accepts the HTTPS connections and stores the received messages in a file. The contents of input.log will be replicated in output.log.

nxlog.conf (server/receiving)

<Input in>
    Module              im_http
    ListenAddr          0.0.0.0
    Port                8080
    HTTPSCertFile       %CERTDIR%/server-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/server-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
</Input>

<Output out>
    Module              om_file
    File                'output.log'
</Output>

The om_batchcompress and im_batchcompress modules, available in NXLog Enterprise Edition, also provide acknowledgment as part of the batchcompress protocol.

Example 3. Batched log transfer

With the following configuration, a client reads logs from a file and transmits the logs in compressed batches to a remote NXLog agent.

nxlog.conf (client/sending)

<Input in>
    Module  im_file
    File    'input.log'
</Input>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Port    2514
</Output>

The remote NXLog agent receives and decompresses the message batches and stores the individual messages in a file. The contents of input.log will be replicated in output.log.

nxlog.conf (server/receiving)

<Input in>
    Module      im_batchcompress
    ListenAddr  10.2.0.2
    Port        2514
</Input>

<Output out>
    Module  om_file
    File    'output.log'
</Output>

Protection against duplication

NXLog Enterprise Edition exclusive feature

If the contents of the cache file containing the event position are lost, the module can either read everything from the beginning or risk losing some messages. In the former case, messages may be duplicated. When using persistent queues, messages are not removed from the queue until they have been successfully delivered. If a crash occurs just before removal, the message will be sent again after the agent restarts, resulting in a duplicate.

In some cases, it may be very important that a log message is not duplicated. For example, a duplicated message may trigger the same alarm a second time or cause an extra entry in a financial transaction log. NXLog Enterprise Edition can be configured to prevent duplicate messages from occurring.

The best way to prevent duplicated messages is by using serial numbers, as it is only possible to detect duplicates at the receiver. The receiver can keep track of what has been received by storing the serial number of the last message. If a message is received with the same or a lower serial number from the same source, the message is simply discarded.
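The receiver-side check described above can be written as a short Python sketch. The class and method names are hypothetical and only illustrate the rule: remember the last serial number per source, and discard anything that is not strictly greater.

```python
class DuplicateFilter:
    """Tracks the last serial number seen from each source."""

    def __init__(self):
        self.last_serial = {}

    def accept(self, source, serial):
        """Return True to keep the message, False to discard it as a duplicate."""
        last = self.last_serial.get(source)
        if last is not None and serial <= last:
            return False  # same or lower serial from the same source
        self.last_serial[source] = serial
        return True


dedup = DuplicateFilter()
results = [
    dedup.accept("client-a", 10),  # first message from this source
    dedup.accept("client-a", 11),  # higher serial
    dedup.accept("client-a", 11),  # resent after a reconnect
    dedup.accept("client-b", 5),   # a different source has its own counter
]
```

The check relies on each source sending serial numbers in increasing order; a message that legitimately arrives out of order would be discarded as well.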

In NXLog Enterprise Edition, duplication prevention works as follows.

Each module that receives a message directly from an input source or another module in the route assigns a field named $__SERIAL__$ with a monotonically increasing serial number. The serial number is taken from a global generator and is incremented after each fetch so that two messages received at two modules simultaneously will not have the same serial number. The serial number is initialized to the seconds elapsed since the UNIX epoch when NXLog is started. This way it can provide 1,000,000 serial numbers per second without problems in case it is stopped and restarted. Otherwise, the value would need to be saved and synced to disk after each serial number fetch, which would adversely affect performance. When a module receives a message, it checks the value of the $__SERIAL__$ field against the last saved value.
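A global serial generator of this kind might look like the following Python sketch. The text above does not spell out how the start time maps onto the figure of 1,000,000 serial numbers per second, so this sketch assumes the seed is the start time in seconds multiplied by 1,000,000. That scaling, the class name, and the `per_second` parameter are assumptions made for illustration, not documented NXLog behavior.

```python
import threading
import time


class SerialGenerator:
    """Process-wide counter seeded from the start time (assumed scaling)."""

    def __init__(self, start_seconds=None, per_second=1_000_000):
        if start_seconds is None:
            start_seconds = int(time.time())
        # Assumption: reserve `per_second` serial numbers for each second of
        # uptime, so a restart one second later begins above the old range.
        self._next = start_seconds * per_second
        self._lock = threading.Lock()

    def fetch(self):
        # The lock ensures two modules fetching at once get different values.
        with self._lock:
            value = self._next
            self._next += 1
            return value


first_run = SerialGenerator(start_seconds=1000)
before_restart = [first_run.fetch() for _ in range(3)]
second_run = SerialGenerator(start_seconds=1001)  # restarted one second later
after_restart = second_run.fetch()
```

Under this assumption, serial numbers issued after a restart stay above those issued before it as long as the average rate remains below `per_second`, and nothing has to be written to disk on each fetch.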

The im_http module keeps the value of the last $__SERIAL__$ for each client. It is only possible to know and identify the client (om_http sender) in HTTPS mode. The Common Name (CN) in the certificate subject is used and is assumed to uniquely identify the client.

The remote IP and port number cannot be used to identify the remote sender because the remote port is assigned dynamically and changes for every connection. Thus if a client sends a message, disconnects, reconnects, and then sends the same message again, it is impossible to know if this is the same client or another. For this reason, it is not possible to protect against message duplication with plain TCP or HTTP when multiple clients connect from the same IP. The im_ssl and im_batchcompress modules do not have the certificate subject extraction implemented at this time.
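The effect of the choice of sender identity can be shown with a small Python simulation. All names, addresses, and serial numbers here are made up for the illustration; the dictionaries stand in for connections, and "cn" stands in for the Common Name taken from the client certificate.

```python
def run(connections, key_fn):
    """Return, for each message in order, whether it is flagged as a duplicate."""
    last_seen = {}
    flagged = []
    for conn in connections:
        key = key_fn(conn)
        last = last_seen.get(key)
        if last is not None and conn["serial"] <= last:
            flagged.append(True)
        else:
            last_seen[key] = conn["serial"]
            flagged.append(False)
    return flagged


def key_by_addr(conn):
    return (conn["ip"], conn["port"])


def key_by_ip(conn):
    return conn["ip"]


def key_by_cn(conn):
    return conn["cn"]


# One client sends serial 42, reconnects from a new ephemeral port, and resends.
first = {"cn": "client-a", "ip": "192.0.2.10", "port": 50123, "serial": 42}
resend = {"cn": "client-a", "ip": "192.0.2.10", "port": 50987, "serial": 42}

missed = run([first, resend], key_by_addr)   # the port changed, so the key changed
caught = run([first, resend], key_by_cn)     # the CN is stable across connections

# Two different clients share one IP address, for example behind NAT.
a = {"cn": "client-a", "ip": "192.0.2.10", "port": 50123, "serial": 100}
b = {"cn": "client-b", "ip": "192.0.2.10", "port": 50124, "serial": 50}

wrongly_dropped = run([a, b], key_by_ip)     # b looks like an old message from a
kept = run([a, b], key_by_cn)
```

Keying on IP and port misses the resend, and keying on IP alone discards a legitimate message from the second client. Only the certificate-based key behaves correctly in both cases.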

All other non-network modules use the value of $SourceModuleName, which is automatically set to the name of the module instance generating the log message. This value is assumed to uniquely identify the source. The value of $SourceModuleName is not overwritten if it already exists. Note that this may present problems in some complex setups.

The algorithm is implemented in a single procedure named duplicate_guard(), which can be used in modules to prevent message duplication. The dropped() function can then be used to test whether the current log message has been dropped.

Example 4. Disallowing duplicated messages

The following client and server configuration examples extend the earlier HTTPS example to provide ultra-reliable operation, where messages cannot be lost locally due to a crash, lost over the network, or duplicated.

nxlog.conf (client/sending)

PersistLogqueue TRUE
SyncLogqueue TRUE
CacheFlushInterval always
CacheSync TRUE

<Input in>
    Module              im_file
    File                'input.log'
</Input>

<Output out>
    Module              om_http
    URL                 https://10.0.0.1:8080/
    HTTPSCertFile       %CERTDIR%/client-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/client-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
    Exec                duplicate_guard();
</Output>

The server accepts the HTTPS connections and stores the received messages in a file. The contents of input.log will be replicated in output.log.

nxlog.conf (server/receiving)

PersistLogqueue TRUE
SyncLogqueue TRUE
CacheFlushInterval always
CacheSync TRUE

<Input in>
    Module              im_http
    ListenAddr          0.0.0.0
    Port                8080
    HTTPSCertFile       %CERTDIR%/server-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/server-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
    Exec                duplicate_guard();
</Input>

<Output out>
    Module              om_file
    File                'output.log'
    Exec                duplicate_guard();
</Output>