Protect against duplicate data

NXLog Agent implements mechanisms to protect against data loss. For example, several input modules save the position of the last record they processed in a cache file. If the cache file is corrupted, the module re-reads all available data rather than risk losing records, which may result in duplicates.
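For instance, im_file records its read position when the SavePos directive is enabled (the default), and the global CacheDir directive sets where the cache file is stored. A minimal sketch, assuming default cache behavior; the path shown is illustrative:

nxlog.conf
CacheDir    /opt/nxlog/var/spool/nxlog

<Input file>
    Module     im_file
    File       '/path/to/input.log'
    SavePos    TRUE
</Input>

If the cache file at this location is lost or corrupted, im_file falls back to reading the file from the beginning, which is the duplication scenario described above.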

Duplicate data may also occur when using persistent queues. Data is only removed from the queue after it is delivered successfully, so if a crash occurs after delivery but before removal, the module resends the data when NXLog Agent restarts.
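Persistent queues are enabled with the PersistLogqueue directive. A sketch of enabling a disk-based queue for a single output instance rather than globally (the per-instance placement of PersistLogqueue is an assumption here; the URL is illustrative):

nxlog.conf
<Output siem>
    Module             om_http
    URL                http://siem.example.com:8080/
    PersistLogqueue    TRUE
</Output>

With this setting, records queued for the siem instance survive an agent restart, at the cost of the resend-after-crash behavior described above.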

If data duplication is undesirable, you can configure NXLog Agent to prevent duplicate records.

Detect duplicate records

NXLog Agent provides the duplicate_guard() procedure to detect and discard duplicate data in the cases described above. It does this by comparing the record's serial number with the last serial number the module saved; if the record's serial number is older, the procedure discards the record.

Example 1. Detecting and discarding duplicate data

This configuration enables disk-based queues for all processor and output module instances, and calls the duplicate_guard() procedure in the output instance to avoid sending duplicate records to the SIEM.

nxlog.conf
PersistLogqueue    TRUE    (1)

<Input file>
    Module         im_file
    File           '/path/to/input.log'
</Input>

<Output siem>
    Module         om_http
    URL            http://siem.example.com:8080/
    Exec           duplicate_guard();
</Output>
1 Setting PersistLogqueue to TRUE enables disk-based log queues for all module instances.