Common issues

Some common NXLog issues can be identified by the type of errors they generate in the internal logs. Match the error message to an example in one of the following sections, and follow the suggested steps to resolve the issue.

Startup error

You may receive this error message in the log file when NXLog fails to start (line break added):

nxlog failed to start: Invalid keyword: ÿþ# at \
C:\Program Files (x86)\nxlog\conf\nxlog.conf:1

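This issue occurs because the NXLog configuration file has been saved in UTF-16 text encoding or in UTF-8 text encoding with a BOM header. The characters before the # in the error message are the byte order mark (BOM), which NXLog read as part of the first line.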

Open the configuration file in a text editor and save it using ASCII encoding or plain UTF-8.

On Windows, you can use Notepad to correct the text encoding of this file.
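The same correction can be scripted. The following is a minimal Python sketch, not part of NXLog, that detects a UTF-8 or UTF-16 byte order mark and re-encodes the content as plain UTF-8. The function names are illustrative, and the sketch assumes the file is small enough to read into memory.

```python
import codecs


def detect_bom(data: bytes):
    """Return a Python codec name if data starts with a known BOM, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # uses the BOM to pick the byte order, then strips it
    return None


def to_plain_utf8(data: bytes) -> bytes:
    """Re-encode BOM-prefixed content as UTF-8 without a BOM."""
    encoding = detect_bom(data)
    if encoding is None:
        return data  # no BOM found; leave the content untouched
    return data.decode(encoding).encode("utf-8")


def fix_config(path: str) -> bool:
    """Rewrite the file in place if needed; return True if it was changed."""
    with open(path, "rb") as f:
        original = f.read()
    fixed = to_plain_utf8(original)
    if fixed != original:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != original
```

For example, calling fix_config() with the path from the error message would rewrite that file only if a BOM is present. Keep a backup copy of the configuration before running it.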

Access errors

These errors can be generated when attempting to read log data from files on Linux or Event Log data on Microsoft Windows. Both types of errors are caused by a lack of permissions.

Additionally, access problems can occur when the internal log file is in use by another application while you are trying to read its contents.

Permission-related error on Linux

When configured to read from a file in the /var/log directory on Linux, NXLog may log the following error:

ERROR failed to open /var/log/messages;Permission denied

This error occurs because NXLog does not have permission to read the file with the configured User and Group. See Reading Rsyslog log files for more information about using NXLog to read files from the /var/log directory.
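To see why a particular account is denied, compare the file's owner, group, and mode with the account's user and group IDs. The sketch below is a simplified Python model of the classic POSIX read check. It deliberately ignores root privileges, ACLs, and the search permission required on parent directories. The function name is illustrative, and the user name nxlog in the usage comments is an assumption about how NXLog is configured.

```python
import stat


def can_read(mode: int, file_uid: int, file_gid: int, uid: int, gids: set) -> bool:
    """Evaluate the owner/group/other read bit that applies to the given user."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner class applies
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)  # group class applies
    return bool(mode & stat.S_IROTH)  # everyone else


# Usage sketch (Linux), assuming NXLog runs as the user "nxlog":
#   import os, pwd
#   st = os.stat("/var/log/messages")
#   user = pwd.getpwnam("nxlog")
#   gids = set(os.getgrouplist(user.pw_name, user.pw_gid))
#   print(can_read(st.st_mode, st.st_uid, st.st_gid, user.pw_uid, gids))
```

If the result is False, the usual options are to add the NXLog user to the group that owns the file or to adjust the file mode.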

Permission error of Windows Event Log

When collecting events from the Windows Event Log, the user running NXLog may not have sufficient permissions to access certain channels. When NXLog is running as a service, this applies to the user that the service is configured to run as. In this case, the NXLog log file shows errors such as:

WARNING [im_msvistalog|windows] failed to subscribe to msvistalog events,access denied [error code: 5]: Access is denied.
WARNING [im_msvistalog|windows] Invalid channel: 'Security': Access is denied.

or

WARNING [im_msvistalog|windows] ignoring source as it cannot be subscribed to (error code: 5)

When this error occurs, the user needs to be granted access to read the specified channel. For default Event Log channels, it is usually sufficient to add the user to the built-in Event Log Readers group by following these steps:

  1. Open the Computer Management MMC snap-in by opening the Windows Start menu, typing compmgmt.msc, and pressing Enter.

  2. Expand System Tools > Local Users and Groups > Groups.

  3. Double-click on the Event Log Readers group and add the NXLog user to it.

If the error persists, permission needs to be granted using Group Policy for the default Windows Event Log channels or the Windows Registry for other channels. Permissions are specified using the Security Descriptor Definition Language (SDDL).

  1. The first step is to retrieve the SID of the NXLog user. From a command prompt, run the following command:

    > wmic useraccount where name='<username>' get sid
  2. For default Windows Event Log channels:

    1. Open a command prompt as an administrator and run the following command after replacing <channel_name> with the actual channel name.

      > wevtutil gl <channel_name>

      In this example, the Security channel was chosen.

    2. Take note of the channelAccess value.

      [Figure: wevtutil command result]

    3. Open the Group Policy Editor by opening the Windows Start menu, typing gpedit.msc, and pressing Enter.

    4. Expand Computer Configuration > Administrative Templates > Windows Components > Event Log Service.

    5. Select the required channel from the list, for example Security. Double-click the Configure log access policy setting to edit it.

    6. Select the Enabled option.

    7. Under Log Access, enter the channelAccess value retrieved above.

    8. Append the permission for the NXLog service user to the Log Access value. Add the following permission to grant the user read access:

      (A;;0x1;;;<user_sid>)

      Here, A means allow and 0x1 means read. You will need to replace <user_sid> with the SID retrieved in step 1.

      [Figure: Configure log access policy]

    9. From a command prompt, run the following command to apply the updated policy:

      > gpupdate /force
  3. For other Windows Event Log channels:

    1. Open the Registry Editor by opening the Windows Start menu, typing regedit, and pressing Enter.

    2. Expand the following registry key:

      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels
    3. From the list of keys, find the channel shown in the error and click on it.

    4. In the right pane, double-click the ChannelAccess value to modify it.

    5. Append the permission for the NXLog service user to the existing value. Add the following permission to grant the user read access:

      (A;;0x1;;;<user_sid>)

      Here, A means allow and 0x1 means read. You will need to replace <user_sid> with the SID retrieved in step 1.

      [Figure: Windows Registry Channel Access Permissions]

    6. Repeat these steps for each channel showing the error.

    7. Restart Windows. This step is important because the new permissions will not be applied until Windows has been restarted.

      Your channel selection might be stored in a different registry key than the one specified above. If so, you will need to do some research to determine the correct registry key for each required channel.
      These steps require altering the Windows Registry and should be executed with care. Incorrect modifications could potentially render the system unusable.
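Building the new Log Access or ChannelAccess value by hand is error-prone, so it can help to generate the string first and then paste it in. The following Python sketch appends the read permission described above to an existing SDDL string. It assumes the existing value ends with its DACL entries (no trailing S: section). The function name, the sample channelAccess value, and the sample SID are illustrative only.

```python
import re

# Basic shape of a SID string, for example S-1-5-32-573
SID_PATTERN = re.compile(r"^S-1-\d+(-\d+)+$")


def append_read_ace(channel_access: str, user_sid: str) -> str:
    """Append an allow-read ACE for user_sid unless it is already present."""
    if not SID_PATTERN.match(user_sid):
        raise ValueError(f"not a SID: {user_sid!r}")
    ace = f"(A;;0x1;;;{user_sid})"  # A = allow, 0x1 = read
    if ace in channel_access:
        return channel_access  # permission already granted
    return channel_access + ace


# Illustrative values; use the output of wevtutil and wmic on your system.
current = "O:BAG:SYD:(A;;0xf0005;;;SY)(A;;0x5;;;BA)(A;;0x1;;;S-1-5-32-573)"
sid = "S-1-5-21-1111111111-2222222222-3333333333-1001"
print(append_read_ace(current, sid))
```

The function is idempotent, so running it again on its own output does not add a duplicate entry.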

Cannot write logs to a mapped network drive

When configured to write logs to a mapped network drive, NXLog logs one of the following errors:

ERROR [om_file|audit] Couldn't create directory: z:\logs\audit_secure (perms=OS_DEFAULT)

or

ERROR [om_file|audit] apr_file_write failed; The request is not supported.

This issue occurs because a mapped network drive, such as an NFS mount, is only available to the user and session that created it. By default, the NXLog service runs under the System account, which uses a different session and therefore does not have access to the user’s drive mappings.

One workaround for this issue would be to run NXLog as an application instead of a service. For example, the following batch script maps DC1:/log to Z: and starts NXLog interactively. You can then configure Task Scheduler to execute this script on Windows startup and disable the NXLog service.

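@echo off
rem Mount the NFS export DC1:/log as drive Z: in this session
mount -o fileaccess=777 DC1:/log Z:
rem Change to the NXLog installation directory (/d also switches drives)
cd /d "C:\Program Files\nxlog"
rem Run NXLog in the foreground with the specified configuration file
nxlog.exe -f -c conf\nxlog.conf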

Log file is in use by another application

If you try to view the internal log file of NXLog on Windows, you may receive an error message indicating that the log file is in use by another application and cannot be accessed.

To resolve this issue, either open the log file with an application that does not use exclusive locking (such as Notepad), or stop NXLog before opening the log file.

Connection error

When using the om_tcp and om_ssl modules to transfer logs to remote hosts, network connectivity can be an issue, as with any application that depends on the network. The first step is to search for any “couldn’t connect to <ip-address>:<port>;” errors in NXLog’s log file.

The final portion of the “couldn’t connect to …;” error message, after the semicolon, indicates the cause of the failure.
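When there are many such entries, the search can be automated. The following Python sketch extracts the address, port, and reason from a log line and maps the reasons covered in this section to a short description. The function name, the regular expression, and the hint texts are illustrative and not part of NXLog.

```python
import re

# Matches the "couldn't connect to <host>:<port>;<reason>" part of a log line
CONNECT_ERROR = re.compile(
    r"couldn't connect to (?P<host>[^;]+):(?P<port>\d+);(?P<reason>.*)$"
)

# Reasons discussed in this section and what they indicate
HINTS = {
    "No route to host": "remote host is unreachable",
    "Connection refused": "remote host refused the connection",
    "Operation timed out": "connection attempt timed out",
    "The timeout specified has expired": "connection attempt timed out",
}


def parse_connect_error(line: str):
    """Return (host, port, reason, hint), or None if the line does not match."""
    match = CONNECT_ERROR.search(line)
    if match is None:
        return None
    reason = match.group("reason").strip()
    hint = HINTS.get(reason, "unknown reason")
    return match.group("host"), int(match.group("port")), reason, hint
```

Applying parse_connect_error() to each line of the log file and counting the results per host gives a quick overview of which destinations are failing and why.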

The remote host is unreachable

If the error message in the NXLog log file ends with “No route to host”, this is the most severe of the connection errors. This means the destination host cannot be found on the network. For all practical purposes, it is offline.

ERROR [om_tcp|tcp_out] couldn't connect to 192.168.0.111:1514;No route to host
INFO [om_tcp|tcp_out] reconnecting to 192.168.0.111:1514 in 1 sec

This could be due to any number of reasons:

  • Physical: Network adapter failure, cable damaged or disconnected, host is powered off, etc.

  • System problems: Networking on the remote host is either disabled or non-functional.

  • Isolated network: The host has been moved to a different network, or subnet, to which there is no route.

In this situation, it might be helpful to know whether this NXLog agent has ever successfully connected to the remote host. To find out, search for “successfully connected to <ip-address>:<port>” in the current NXLog log file (or older, rotated log files), where <ip-address> represents the hostname or IP address of the unreachable host and <port> represents the port number it is supposed to be listening on.

The remote host refuses to connect

If the error message in the NXLog log file ends with “Connection refused”, this means the remote host is reachable over the network but is actively rejecting the connection, typically because no service is listening on the port or a firewall rule rejects it. The connection will not succeed until the remote service and network rules allow it.

In the following example, the NXLog agent has accepted an incoming connection from a remote host identified by IP address 192.168.0.111 on port 45220. From this, we can deduce that 192.168.0.111 is not only reachable, but can also send logs to this NXLog agent. However, even though port 1514 is open on the 192.168.0.111 host, the “Connection refused” error message indicates that the NXLog agent cannot send the processed, enriched logs back to it.

INFO [im_tcp|tcp_in] connection accepted from 192.168.0.111:45220
WARNING [im_tcp|tcp_in] TCP connection closed from 192.168.0.111:45220: End of file found
INFO [om_tcp|tcp_out] connecting to 192.168.0.111:1514
ERROR [om_tcp|tcp_out] couldn't connect to 192.168.0.111:1514;Connection refused

In this case, the receiving host identified by IP address 192.168.0.111 is behind a firewall that restricts incoming traffic to its low-range port numbers. Reconfiguring both hosts to use a higher port number, like 31514, to bypass the firewall restrictions solved this problem.

INFO [om_tcp|tcp_out] connecting to 192.168.0.111:31514
INFO [im_tcp|tcp_in] listening on 0.0.0.0:1514
INFO [om_tcp|tcp_out] successfully connected to 192.168.0.111:31514
INFO [im_tcp|tcp_in] connection accepted from 192.168.0.111:45894
WARNING [im_tcp|tcp_in] TCP connection closed from 192.168.0.111:45894: End of file found

More frequently, firewalls are configured to silently drop unwanted packets. The connection attempt then fails only after a timeout period, resulting in one of the following forms of connection timeout error:

ERROR [om_tcp|tcp_out] couldn't connect to 192.168.0.50:1515;Operation timed out
ERROR [om_ssl|ssl_out] couldn't connect to 192.168.0.50:1515;The timeout specified has expired

Both forms of the error message above can apply to either the om_tcp or the om_ssl module.

To resolve this issue:

  • Check that no firewall, gateway, or other network issue is blocking the connection.

  • Verify that the system can resolve the host name used in the Host directive of the configuration file.
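Both checks can be run from the host where NXLog is installed. The following Python sketch first resolves the host name and then attempts a plain TCP connection, returning a short status string. The function name and the status strings are illustrative. A result of "connected" only shows that the port accepts TCP connections, not that the service behind it is working.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to resolve host and open a TCP connection; return a short status."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"  # the name in the Host directive cannot be resolved
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"  # host reachable, but the connection was rejected
    except (socket.timeout, TimeoutError):
        return "timeout"  # packets are probably being dropped on the way
    except OSError as exc:
        return f"error: {exc}"  # for example, no route to host


# Example call using the address and port from the log samples above:
#   print(check_tcp("192.168.0.111", 1514))
```

Running the check from the NXLog host, rather than from a workstation, matters because firewall rules often differ per source address.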

Common firewall configuration issues

Incorrectly configured firewalls are a common cause of network connection errors. To communicate with network devices, SIEMs, and other NXLog agents, NXLog requires an unobstructed connection between hosts.

Unix-based devices commonly use the iptables utility program for managing firewall rules. In some cases, the firewall configuration could block communication channels between an NXLog agent and the sender or receiver. To test whether the firewall configuration is blocking the connection, you can use the following iptables command to flush (remove) all firewall rules that are currently enabled.

# iptables -F

Flushing the firewall rules will essentially delete your firewall configuration, removing any protection that it provides. This should only be used for testing network connectivity. Be sure to back up your firewall rules before flushing them, and reconfigure your firewall properly before continuing production operation.

Certificate/TLS issues

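This section explains how to handle certificate verification failures, connection resets, incorrect certificate purposes, cipher mismatches, and remote services that do not support SSL.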

Certificate cannot be verified

Authentication issues stem from unsuccessful verification of the certificate against the CA on the client and server sides. For example, the specified CA file may not exist, or the configuration may point to the wrong CA file.

A typical authentication error message is provided below:

SSL certificate problem: unable to get local issuer certificate

Below are more examples of the verification error message:

ERROR SSL certificate verification failed: unable to get local issuer certificate (err: 20)

and

ERROR SSL certificate verification failed: certificate has expired (err: 10)

Using a self-signed certificate produces the following message:

ERROR SSL certificate verification failed: self signed certificate (err: 18)

As a first troubleshooting step, set the AllowUntrusted directive to TRUE and restart the agent. The manager instance should then be able to establish a connection with the agent, which confirms that the problem lies in certificate verification.

Additionally, CA and certificate files can be verified using the openssl tool. To verify the CA file, use the command below:

# openssl s_client -CAfile <path_to_CA_file> -connect <host:port>

To verify the server’s certificate against the CA certificate specified by the CAFile directive, use the following command:

# openssl s_client -connect <host:port> -cert <path_to_cert_file> -key <path_to_key_file> -CAfile <path_to_CA_file> -verify 1

Instead of the CA file, the path to the CA directory can be specified per the command below:

# openssl s_client -connect <host:port> -cert <path_to_cert_file> -key <path_to_key_file> -CApath <path_to_CA_dir> -verify 1
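The number in parentheses at the end of each verification error is the OpenSSL verification result code. The following Python sketch extracts that code from a log line and looks up a short explanation for the codes that appear on this page. The function name and the explanation texts are illustrative. The table is intentionally limited to these four codes.

```python
import re

# OpenSSL verification result codes that appear in the messages on this page
VERIFY_CODES = {
    10: "the certificate has expired",
    18: "the certificate is self-signed and not in the trusted CA list",
    20: "the issuer was not found in the configured CA file or directory",
    26: "the certificate was created with an incompatible purpose",
}


def explain_verify_error(line: str):
    """Return (code, explanation), or None if the line has no '(err: N)' part."""
    match = re.search(r"\(err: (\d+)\)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, VERIFY_CODES.get(code, "unknown code; see the OpenSSL documentation")
```

A lookup like this is mainly useful when scanning large log files, where grouping errors by code shows whether one certificate or the whole CA setup is affected.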

Connection being reset

The following error is generated when the connection is being reset:

ERROR remote ssl socket was reset? (SSL_ERROR_SSL with errno=9); End of file found

This occurs in the following cases:

  • The agent is presenting a certificate that cannot be verified by NXLog Manager

  • The connection is being terminated, for example by a firewall

To troubleshoot this problem, check the other party’s logs and network packet captures. The SSL stack refusing the connection may log a more precise reason.

Incorrect certificate purpose

Certificates created with an incorrect or incompatible "certificate purpose" produce the following error:

ERROR SSL certificate verification failed: unsupported certificate purpose (err: 26)

To examine certificates, use the following command:

# openssl x509 -text -in <path_to_cert_file>

No shared ciphers

When the configuration applies an SSL cipher restriction (for compliance reasons), the two parties may not have any ciphers in common. This produces the following error:

ERROR SSL error, SSL_ERROR_SSL: retval -1, from 127.0.0.1:33240, reason: no shared cipher

To troubleshoot this problem, run openssl with the same cipher set to check whether the problem can be reproduced:

openssl s_server -key agent-key.pem -CAfile agent-ca.pem -cert agent-cert.pem -port 7000 -cipher ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256

This command emulates the setup of the following listener:

<Input im_batchcompress>
    Module           im_batchcompress
    ListenAddr       0.0.0.0:2514
    CAFile           %CERTDIR%/agent-ca.pem
    CertFile         %CERTDIR%/agent-cert.pem
    CertKeyFile      %CERTDIR%/agent-key.pem
    AllowUntrusted   TRUE
    SSLCipher        ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256
</Input>

To test the listener from the sample above, run the following command:

openssl s_client -key agent-key.pem -CAfile agent-ca.pem -cert agent-cert.pem -connect localhost:2514
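Before testing with openssl, it can be worth comparing the two cipher lists directly. The following Python sketch returns the cipher names that two colon-separated lists have in common. It compares explicit names only and does not expand OpenSSL keywords such as HIGH or exclusions such as !aNULL. The function name is illustrative.

```python
def shared_ciphers(local: str, remote: str) -> list:
    """Return cipher names present in both colon-separated lists, in local order."""
    remote_set = {name.strip() for name in remote.split(":") if name.strip()}
    return [
        name.strip()
        for name in local.split(":")
        if name.strip() in remote_set
    ]


# Example with two explicit lists, where local would be the SSLCipher value
local = "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256"
remote = "DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256"
print(shared_ciphers(local, remote))
```

An empty result means the two explicit lists cannot agree on a cipher, which matches the “no shared cipher” error.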

SSL not supported by the remote service

The following error is generated when an NXLog client configured to exchange data over SSL is connecting to a service configured without encryption:

ERROR SSL error, SSL_ERROR_SSL: retval -1, reason: wrong version number

To solve this problem, check whether the remote service supports SSL and, if it does, enable it. Otherwise, disable SSL encryption in the NXLog configuration.

Log format error

If you are using Logstash, you may find that log entries are concatenated. To mitigate the error, make sure that you are using the json_lines codec in your Logstash server configuration.

The default json codec in Logstash sometimes fails to parse log entries passed from NXLog. Switch to the json_lines codec for better reliability.
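The difference between the two codecs is a matter of framing. The following Python sketch is an illustration of that idea only, not of Logstash internals. It assumes that NXLog sends one JSON record per line, and the field name EventID in the sample data is hypothetical. Parsing the whole stream as a single JSON document fails, while parsing it line by line yields one record per entry.

```python
import json


def parse_json_lines(data: str) -> list:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in data.splitlines() if line.strip()]


# Two records as they might arrive over one connection (hypothetical data)
stream = '{"EventID": 1}\n{"EventID": 2}\n'

try:
    json.loads(stream)  # treats the whole stream as one document
except json.JSONDecodeError as exc:
    print("single-document parse failed:", exc.msg)

print(parse_json_lines(stream))  # one dictionary per log entry
```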

When NXLog evaluates a directive that requires log data, and that data is unavailable in the current context, it logs the following missing record error:

missing record, assignment possibly after drop()

This error occurs when attempting to access a field from the Exec directive of a Schedule block. Log data is never available to a scheduled Exec directive because its execution is not triggered by a log message.

An attempt to access a field can occur directly with a field assignment, or indirectly by calling a function or procedure that accesses log data.

Processing errors

This category covers interrupted processing when an output fails and problems related to open file limits.

Termination of processing

NXLog can send one log stream to multiple outputs. This can be configured by either using the same input in multiple routes or using multiple outputs in the same route (see Routes). By default, when one of the outputs fails, NXLog will stop sending logs to all outputs. This is caused by NXLog’s flow control mechanism, which is designed to prevent messages from being lost. Flow control pauses an Input or Processor module when the next module in the route is not ready to accept data.

In some cases, it is preferred for NXLog to continue sending logs to the remaining active output and discard logs for the failed output. The simplest solution is to disable flow control. This can be done globally with the global FlowControl directive, or for the corresponding Input (and Processor, if any) modules only, with the module FlowControl directive.

With flow control disabled, an Input or Processor module will continue to process logs even if the next module’s buffers are full (and the logs will be dropped).

To retain the improved message durability provided by flow control, it is possible to instead explicitly specify when to drop logs by using a separate route for each output that may fail. Add a pm_buffer module instance to that route, and configure the buffer to drop logs when it reaches a certain size. The output that follows the buffer can fail without causing any Input or Processor module before the buffer to pause processing. For examples, see the Using buffers section.
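The trade-off can be pictured with a small model. The following Python sketch is a conceptual illustration only and does not reflect how pm_buffer is implemented. The class name and the record-count limit are hypothetical. It shows a bounded buffer that discards new records once it is full instead of pausing the sender.

```python
from collections import deque


class DroppingBuffer:
    """Bounded FIFO that discards new records once it is full."""

    def __init__(self, max_records: int):
        self.max_records = max_records
        self.queue = deque()
        self.dropped = 0

    def push(self, record) -> bool:
        """Store a record; return False if it was dropped because the buffer is full."""
        if len(self.queue) >= self.max_records:
            self.dropped += 1  # drop instead of pausing the sender
            return False
        self.queue.append(record)
        return True

    def drain(self) -> list:
        """Hand all buffered records to the output once it is available again."""
        records = list(self.queue)
        self.queue.clear()
        return records
```

Because push() never blocks, whatever feeds this buffer keeps running while the output is down. The cost is that records arriving after the limit is reached are lost, which is the behavior the paragraph above describes as explicitly choosing when to drop logs.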

Check open files and limits

When a system nears or exceeds its open files limit, significant performance penalties typically follow quickly. lsof (List Open Files) is a common debugging tool found on the majority of Linux systems and can reveal a great deal about the running system.

On Linux, run the following command to see, for example, which files NXLog has open:

$ lsof -u nxlog

This example returns the number of unique open files (the grep filter keeps only the file name fields that lsof -Fn outputs):

$ lsof -Fn -u nxlog | grep '^n' | sort -u | wc -l

To check NXLog system limits, use the following command:

$ cat /proc/$(sudo cat /opt/nxlog/var/run/nxlog/nxlog.pid)/limits
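The row of interest in that output is “Max open files”. The following Python sketch, for Linux only, extracts the soft and hard limits from the text of a limits file. The function name and the sample text are illustrative.

```python
def parse_max_open_files(limits_text: str):
    """Return (soft, hard) from the 'Max open files' row; None means unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max", "open", "files", soft limit, hard limit, units
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


# Sample excerpt in the layout used by /proc/<pid>/limits (illustrative values)
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            1024                 4096                 files\n"
)
print(parse_max_open_files(sample))

# Usage sketch: compare the soft limit with the number of open descriptors
#   import os
#   pid = 1234  # hypothetical PID, read it from nxlog.pid
#   print(len(os.listdir(f"/proc/{pid}/fd")))
```

If the number of open descriptors is close to the soft limit, raising the limit is usually the next step.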

On systems not using /proc, check the system’s open file limit with:

$ sysctl kern.maxfiles

or with:

$ sysctl fs.file-max

There is no Windows equivalent to lsof. You can use Windows Process Explorer from Microsoft’s Windows Sysinternals to inspect which program has files or directories open.

Systemd and open files limit

There are certain cases where systemd ignores system-level file limits. This can generate the “Too many open files” error.

2019-01-22 15:26:37 ERROR SSL error, failed to load ca cert from '/opt/nxlog/var/lib/nxlog/cert/agent-ca.pem', reason: Too many open files, system lib, system lib

This scenario requires edits to the service file or an override. To check NXLog system limits, use the following command:

$ cat /proc/$(cat /opt/nxlog/var/run/nxlog/nxlog.pid)/limits

On systems not using /proc, check the system’s open file limit:

$ sysctl kern.maxfiles

To adjust limits for NXLog, create /etc/systemd/system/nxlog.service.d/override.conf and add the following definition:

[Service]
LimitNOFILE=100000

Reload the systemd configuration and restart the NXLog service to apply the new limit:

$ systemctl daemon-reload
$ systemctl restart nxlog