Processing errors

This page provides troubleshooting tips and solutions for when NXLog Agent stops collecting logs or log processing performance degrades.

NXLog Agent stops processing logs

Symptom

NXLog Agent stops collecting and processing logs even though the log source has new events.

Possible reason

NXLog Agent’s flow control pauses log collection when a processor or output module instance in the route fails. The flow control mechanism is designed to prevent data loss and is enabled by default. This issue can happen even if NXLog Agent is configured to forward logs from a single input instance to multiple output instances. If one of the outputs fails, the input instance stops collecting logs, blocking the entire route.

Investigation

Verify whether any processor or output module instances in the same route(s) as the input instance are encountering errors. It is best to check the NXLog Agent log file first to determine if any of the instances report errors. The default log file location is:

Windows

C:\Program Files\nxlog\data\nxlog.log

Linux and macOS

/opt/nxlog/var/log/nxlog/nxlog.log
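
As a quick first check, you can filter the log file for error entries. The following is a minimal sketch, assuming log lines use the format "date time SEVERITY message" (as in the sample error shown later on this page) and the default Linux log path. The function name recent_errors is hypothetical and not part of NXLog Agent.

```shell
# Print the most recent ERROR entries from an NXLog Agent log file.
# Assumes lines look like "2024-07-04 15:26:37 ERROR message...".
# recent_errors is a hypothetical helper name, not an NXLog Agent command.
recent_errors() {
    logfile="${1:-/opt/nxlog/var/log/nxlog/nxlog.log}"
    count="${2:-20}"
    # In the assumed format, the third whitespace-separated field is the severity.
    awk '$3 == "ERROR"' "$logfile" | tail -n "$count"
}
```

Calling recent_errors with no arguments reads the default Linux log file and prints up to 20 entries. Pass a different path and count as the first and second arguments. Depending on file permissions, you may need to run it as root.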

Solution

Resolve the issue with the failing processor or output module instance so that NXLog Agent can resume collecting logs. Alternatively, you can turn off flow control, either globally or for a single instance, so that NXLog Agent continues collecting logs while a processor or output instance is in an error state. See the FlowControl directive.

Turning off flow control is suitable when you forward logs to multiple destinations and can accept that a destination misses logs while it is down. In this case, NXLog Agent does not resend the missed logs to the failing destination once it becomes operational again.

If there is only one output instance in the route and you turn off flow control, NXLog Agent discards the logs it collects once the log queue is full. If an output instance has the potential to fail and you must turn off flow control, you can add a memory or disk-based buffer to your route with the Buffer (pm_buffer) module. See NXLog Agent buffering and flow control in the NXLog Platform User Guide for more information.

Too many open files

Symptom

NXLog Agent performance has degraded, and the log file contains errors similar to the following:

2024-07-04 15:26:37 ERROR SSL error, failed to load ca cert from '/opt/nxlog/var/lib/nxlog/cert/agent-ca.pem', reason: Too many open files, system lib, system lib

Possible reason

The operating system limits the number of files a process may have open at the same time. Once a process reaches this limit, further attempts to open a file fail with a "Too many open files" error.

Investigation

On Linux, you can check the NXLog Agent process limits with the following command:

$ cat /proc/$(sudo cat /opt/nxlog/var/run/nxlog/nxlog.pid)/limits

On systems not using /proc, such as macOS, use the following command to check the system’s open file limit:

$ sysctl kern.maxfiles

On Linux, the equivalent system-wide setting is:

$ sysctl fs.file-max

See Inspecting open file handles for more information on checking NXLog Agent’s open files.
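
To see how close a process is to its limit, you can compare the number of open file descriptors with the soft limit. The following is a minimal sketch for Linux only, assuming the standard layout of /proc/<pid>/limits and /proc/<pid>/fd. The function name fd_usage is hypothetical.

```shell
# Report open file descriptors versus the soft limit for a process.
# Linux only: relies on /proc/<pid>/limits and /proc/<pid>/fd.
# fd_usage is a hypothetical helper name, not an NXLog Agent command.
fd_usage() {
    pid="$1"
    # In the "Max open files" row, the fourth field is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    echo "open=$open soft_limit=$soft"
}
```

For NXLog Agent, pass the PID from /opt/nxlog/var/run/nxlog/nxlog.pid as the argument. Listing another user's /proc/<pid>/fd directory typically requires root privileges.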

Solution

Ensure that the open file limit imposed on the NXLog Agent process is appropriate for your configuration. On Linux, systemd may apply its own limits to a service instead of the system-wide limits. In this case, you can override the file limit for the NXLog Agent service by creating /etc/systemd/system/nxlog.service.d/override.conf with the following content:

[Service]
LimitNOFILE=100000

Reload the systemd configuration, then restart the NXLog Agent service so that the new limit applies to the running process:

$ sudo systemctl daemon-reload
$ sudo systemctl restart nxlog
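
If you prefer to script the override, the drop-in file can be generated as follows. This is a minimal sketch, not an official installation step. The function name write_nofile_override is hypothetical, and the directory and limit are passed as arguments so you can try it against a temporary directory first.

```shell
# Write a systemd drop-in that sets LimitNOFILE for a service.
# write_nofile_override is a hypothetical helper name.
# $1: drop-in directory, e.g. /etc/systemd/system/nxlog.service.d
# $2: open file limit, e.g. 100000
write_nofile_override() {
    dropin_dir="$1"
    limit="$2"
    # Create the drop-in directory if it does not exist yet.
    mkdir -p "$dropin_dir"
    # Overwrites any existing override.conf in that directory.
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/override.conf"
}
```

Run it as root with /etc/systemd/system/nxlog.service.d and 100000 to produce the file shown above, then reload systemd as described. Assuming the service is named nxlog, as the drop-in path suggests, systemctl show nxlog --property=LimitNOFILE reports the limit systemd has loaded for the unit.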