Common issues

Most common issues can be resolved by checking the NXLog internal logs to identify the symptom, finding the corresponding description below, and then following the suggested remediation steps.

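The issues are grouped into the following categories: startup error, access errors, connection error, certificate/TLS issues, log format error, and processing errors.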

Startup error

You may receive this error message in the log file when NXLog fails to start (line break added):

nxlog failed to start: Invalid keyword: ÿþ# at \
C:\Program Files (x86)\nxlog\conf\nxlog.conf:1

This issue occurs because the NXLog configuration file has been saved in either UTF-16 text encoding, or UTF-8 text encoding with a BOM header.

Open the configuration file in a text editor and save it using ASCII encoding or plain UTF-8.

On Windows, you can use Notepad to correct the text encoding of this file.
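The characters ÿþ in the error message are the UTF-16 little-endian byte order mark (bytes FF FE) displayed as text. As a minimal sketch, the check and the conversion can also be scripted. The helper names `detect_bom` and `to_plain_utf8` are hypothetical and not part of NXLog; the sketch only handles the UTF-8 and UTF-16 marks discussed above.

```python
import codecs

# Byte order marks that the startup error above points to.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
)


def detect_bom(data):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


def to_plain_utf8(data):
    """Re-encode BOM-prefixed configuration bytes as UTF-8 without a BOM."""
    label = detect_bom(data)
    if label is None:
        return data  # nothing to fix
    # The "utf-8-sig" and "utf-16" codecs both consume the BOM while decoding.
    encoding = "utf-8-sig" if label == "UTF-8 with BOM" else "utf-16"
    return data.decode(encoding).encode("utf-8")
```

To use it, read the configuration file in binary mode, pass the bytes to `detect_bom`, and write the result of `to_plain_utf8` back only after making a backup copy of the file.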

Access errors

This section discusses errors generated when reading log data from files on Linux and when reading Event Log data on Microsoft Windows. Both types of errors are caused by insufficient permissions.

Additionally, the internal log file may be in use by another application, which can cause access problems when you try to read its contents.

Permission-related error on Linux

When configured to read from a file in the /var/log directory on Linux, NXLog may log the following error:

ERROR failed to open /var/log/messages;Permission denied

This error occurs because NXLog does not have permission to read the file with the configured User and Group. See Reading Rsyslog log files for more information about using NXLog to read files from the /var/log directory.
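To reason about why a given User and Group cannot read a file, the classic Unix permission check can be sketched as follows. This is a simplified illustration, not how NXLog or the kernel implements it: it ignores root, ACLs, and the permissions of parent directories, and the function name `can_read` and the numeric IDs in the usage note are hypothetical.

```python
import stat


def can_read(mode, file_uid, file_gid, uid, gids):
    """Classic Unix read check: owner bits, then group bits, then other bits.

    Simplified on purpose: ignores root, ACLs, and directory permissions.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)
```

For example, a file with mode 0640 owned by UID 0 and GID 4 is not readable by a user with UID 999 unless GID 4 is among that user's groups. On a real system, the inputs could be taken from `os.stat()` for the file and from `pwd.getpwnam()` and `os.getgrouplist()` for the user.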

Permission error of Windows Event Log

When collecting events from the Windows Event Log, the user running NXLog may not have sufficient permissions to access certain channels. When NXLog is running as a service, this applies to the user that the service is configured to run as. In this case, the NXLog log file shows errors such as:

WARNING [im_msvistalog|windows] failed to subscribe to msvistalog events,access denied [error code: 5]: Access is denied.
WARNING [im_msvistalog|windows] Invalid channel: 'Security': Access is denied.

or

WARNING [im_msvistalog|windows] ignoring source as it cannot be subscribed to (error code: 5)

When this error occurs, the user needs to be granted access to read the specified channel. For default Event Log channels, it is usually sufficient to add the user to the built-in Event Log Readers group by following these steps:

  1. Open the Computer Management MMC snap-in by opening the Windows Start menu, typing compmgmt.msc, and pressing Enter.

  2. Expand System Tools > Local Users and Groups > Groups.

  3. Double-click on the Event Log Readers group and add the NXLog user to it.

If the error persists, permission needs to be granted using Group Policy for the default Windows Event Log channels or the Windows Registry for other channels. Permissions are specified using the Security Descriptor Definition Language (SDDL).

  1. The first step is to retrieve the SID of the NXLog user. From a command prompt, run the following command:

    > wmic useraccount where name='<username>' get sid
  2. For default Windows Event Log channels:

    1. Open a command prompt as an administrator and run the following command after replacing <channel_name> with the actual channel name.

      > wevtutil gl <channel_name>

    2. Take note of the channelAccess value. In this example, the Security channel was chosen:

      wevtutil command result

    3. Open the Group Policy Editor by opening the Windows Start menu, typing gpedit.msc, and pressing Enter.

    4. Expand Computer Configuration > Administrative Templates > Windows Components > Event Log Service.

    5. Select the required channel from the list, for example Security. Double-click the Configure log access policy setting to edit it.

    6. Select the Enabled option.

    7. Under Log Access, enter the channelAccess value retrieved above.

    8. Append the permission for the NXLog service user to the Log Access value. Add the following permission to grant the user read access:

      (A;;0x1;;;<user_sid>)

      Here, A means allow and 0x1 means read. You will need to replace <user_sid> with the SID retrieved in step 1.

      Configure log access policy

    9. From a command prompt, run the following command to apply the updated policy:

      > gpupdate /force
  3. For other Windows Event Log channels:

    1. Open the Registry Editor by opening the Windows Start menu, typing regedit, and pressing Enter.

    2. Expand the following registry key:

      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels
    3. From the list of keys, find the channel shown in the error and click on it.

    4. In the right pane, double-click the ChannelAccess value to modify it.

    5. Append the permission for the NXLog service user to the existing value. Add the following permission to grant the user read access:

      (A;;0x1;;;<user_sid>)

      Here, A means allow and 0x1 means read. You will need to replace <user_sid> with the SID retrieved in step 1.

      Windows Registry Channel Access Permissions
    6. Repeat these steps for each channel showing the error.

    7. Restart Windows. This step is important because the new permissions will not be applied until Windows has been restarted.

      Your channel selection might be stored in a different registry key than the one specified above. If so, you will need to determine the correct registry key for each required channel.

      These steps require altering the Windows Registry and should be executed with care. Incorrect modifications could render the system unusable.
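The append step used in both procedures above can be sketched as a small helper. This is an illustration only: the function name `append_read_ace` is hypothetical, the SDDL string and SID in the usage note are made-up sample values, and the sketch assumes that the existing channelAccess value ends with its DACL entries (no trailing S: section).

```python
def append_read_ace(channel_access, user_sid):
    """Append an allow/read ACE for user_sid to a channelAccess SDDL string."""
    ace = f"(A;;0x1;;;{user_sid})"
    if ace in channel_access:
        return channel_access  # already granted, avoid a duplicate entry
    return channel_access + ace
```

With a sample value of `O:BAG:SYD:(A;;0xf0005;;;SY)(A;;0x5;;;BA)` and a sample SID of `S-1-5-21-1111111111-2222222222-3333333333-1001`, the helper returns the original string followed by `(A;;0x1;;;S-1-5-21-1111111111-2222222222-3333333333-1001)`. Calling it again with the same SID leaves the value unchanged.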

Log file is in use by another application

If you try to view the internal log file of NXLog on Windows, you may receive an error message indicating that the log file is in use by another application and cannot be accessed.

To resolve this issue, either open the log file with an application that does not use exclusive locking (such as Notepad), or stop NXLog before opening the log file.

Connection error

When using the im_tcp and im_ssl modules to transfer data over the network, firewalls and other network issues can prevent successful connections. This can result in a Connection refused error:

ERROR [im_tcp|tcp] couldn't connect to 0.0.0.0:1514;Connection refused

ERROR [im_ssl|ssl] couldn't connect to 0.0.0.0:23456;Connection refused

To resolve this issue:

  • Check that no firewall, gateway, or other network issue is blocking the connection

  • Verify that the system can resolve the host name used in the Host directive of the configuration file
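Both checks can be approximated from the affected host with a short script. The following is a minimal sketch using only the Python standard library; the function name `check_endpoint` and its return values are hypothetical, and a result of "ok" only shows that a TCP connection can be opened, not that the remote service is an NXLog listener.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Return 'ok', 'refused', 'timeout', 'unresolved', or 'error' for a TCP endpoint."""
    try:
        # Name resolution: fails if the Host value cannot be resolved.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolved"
    family, socktype, proto, _, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except ConnectionRefusedError:
        return "refused"  # nothing is listening, or the connection was rejected
    except socket.timeout:
        return "timeout"  # often a firewall silently dropping packets
    except OSError:
        return "error"
    return "ok"
```

A "refused" result matches the Connection refused error shown above, while a "timeout" result usually points to packets being dropped along the path.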

Certificate/TLS issues

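This section explains how to handle the following issues: certificates that cannot be verified, connections being reset, incorrect certificate purpose, no shared ciphers, and SSL not supported by the remote service.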

Certificate cannot be verified

Authentication issues typically stem from unsuccessful verification of the certificate against the CA on the client or server side. For example, the specified CA file may not exist, or the configuration may point to the wrong CA file.

A typical authentication error message is provided below:

SSL certificate problem: unable to get local issuer certificate

Below are more examples of the verification error message:

ERROR SSL certificate verification failed: unable to get local issuer certificate (err: 20)

and

ERROR SSL certificate verification failed: certificate has expired (err: 10)

Using a self-signed certificate produces the following message:

ERROR SSL certificate verification failed: self signed certificate (err: 18)

The first step in troubleshooting such problems is to set the AllowUntrusted directive to TRUE and restart the agent. If certificate verification was the cause, the manager instance should then be able to establish a connection with the agent.

Additionally, CA and certificate files can be verified using the openssl tool. To verify the CA file, use the command below:

# openssl s_client -CAfile <path_to_CA_file> -connect <host:port>

To verify the server’s certificate against the CA certificate specified by the CAFile directive, use the following command:

# openssl s_client -connect <host:port> -cert <path_to_cert_file> -key <path_to_key_file> -CAfile <path_to_CA_file> -verify 1

Instead of the CA file, the path to the CA directory can be specified, as in the command below:

# openssl s_client -connect <host:port> -cert <path_to_cert_file> -key <path_to_key_file> -CApath <path_to_CA_dir> -verify 1
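The number in parentheses at the end of each verification error in this section, for example (err: 20), is the OpenSSL certificate verification code. When triaging many log lines, a small lookup can help. The sketch below is an illustration: the names `HINTS` and `explain` are hypothetical, and the hints only cover the three codes shown in this section.

```python
import re

# OpenSSL verification codes that appear in the error messages in this section.
HINTS = {
    10: "certificate has expired: renew or replace the certificate",
    18: "self-signed certificate: the peer certificate is not signed by a CA",
    20: "issuer not found locally: check the CA file or directory in use",
}

ERR_RE = re.compile(r"\(err: (\d+)\)")


def explain(log_line):
    """Return a hint for the (err: N) code in an SSL error line, or None if absent."""
    match = ERR_RE.search(log_line)
    if match is None:
        return None
    return HINTS.get(int(match.group(1)), "unlisted verification code")
```

Lines without an (err: N) suffix return None, and codes outside the table return a generic placeholder rather than a guess.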

Connection being reset

The following error is generated when the connection is being reset:

ERROR remote ssl socket was reset? (SSL_ERROR_SSL with errno=9); End of file found

This occurs in the following cases:

  • The agent is presenting a certificate that cannot be verified by NXLog Manager

  • The connection is being terminated, for example by a firewall

To troubleshoot this problem, check the other party’s logs and network packet captures. The SSL stack refusing the connection may log a more precise reason.

Incorrect certificate purpose

Certificates created with an incorrect or incompatible "certificate purpose" produce the following error:

ERROR SSL certificate verification failed: unsupported certificate purpose (err: 26)

To examine certificates, use the following command:

# openssl x509 -text -in <path_to_cert_file>

No shared ciphers

When the configuration applies an SSL cipher restriction (for compliance reasons), the two parties may not have any ciphers in common. This produces the following error:

ERROR SSL error, SSL_ERROR_SSL: retval -1, from 127.0.0.1:33240, reason: no shared cipher

To troubleshoot this problem, run an openssl test server with the same cipher set to check whether the problem can be reproduced:

openssl s_server -key agent-key.pem -CAfile agent-ca.pem -cert agent-cert.pem -port 7000 -cipher ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256

This command emulates the setup of the following listener:

<Input im_batchcompress>
    Module           im_batchcompress
    ListenAddr       0.0.0.0:2514
    CAFile           %CERTDIR%/agent-ca.pem
    CertFile         %CERTDIR%/agent-cert.pem
    CertKeyFile      %CERTDIR%/agent-key.pem
    AllowUntrusted   TRUE
    SSLCipher        ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256
</Input>

To test the listener from the sample above, run the following command:

openssl s_client -key agent-key.pem -CAfile agent-ca.pem -cert agent-cert.pem -connect localhost:2514
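Before testing with openssl, it can be useful to compare the cipher lists configured on both sides. The following minimal sketch compares two colon-separated lists of literal cipher names, such as the SSLCipher value above. The function name `shared_ciphers` is hypothetical, and the sketch does not expand OpenSSL keywords such as HIGH or exclusions that start with an exclamation mark.

```python
def shared_ciphers(ours, theirs):
    """Return the cipher names present in both colon-separated lists, in our order."""
    offered = set(theirs.split(":"))
    return [name for name in ours.split(":") if name in offered]
```

An empty result for the two configured lists is consistent with the no shared cipher error; a non-empty result suggests looking elsewhere, for example at the protocol versions or certificates in use.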

SSL not supported by the remote service

The following error is generated when an NXLog client configured to exchange data over SSL is connecting to a service configured without encryption:

ERROR SSL error, SSL_ERROR_SSL: retval -1, reason: wrong version number

To solve this problem, check whether the remote service is configured to use SSL, and enable it if it is not. If the service cannot use SSL, disable SSL encryption in NXLog.

Log format error

If you are using Logstash, you may find that log entries are concatenated. To mitigate the error, make sure that you are using the json_lines codec in your Logstash server configuration.

The default json codec in Logstash sometimes fails to parse log entries passed from NXLog. Switch to the json_lines codec for better reliability.
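The difference between parsing a payload as a single JSON document and as newline-delimited JSON can be illustrated outside of Logstash. The Python sketch below is not Logstash code; it assumes that each record arrives as one JSON object per line, and the function name `parse_json_lines` and the sample field name are hypothetical.

```python
import json


def parse_json_lines(payload):
    """Parse newline-delimited JSON, one record per non-empty line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]
```

Given a payload of two objects separated by a newline, `json.loads(payload)` raises a `json.JSONDecodeError` because of the extra data after the first object, while `parse_json_lines(payload)` returns two separate records.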

When NXLog evaluates a directive that requires log data, but no log data is available in the current context, it produces the following missing record error:

missing record, assignment possibly after drop()

This error occurs when attempting to access a field from the Exec directive of a Schedule block. Log data is never available to a scheduled Exec directive because its execution is not triggered by a log message.

An attempt to access a field can occur directly with a field assignment, or indirectly by calling a function or procedure that accesses log data.

Processing errors

This category encompasses the following types of errors:

Termination of processing

NXLog can send one log stream to multiple outputs. This can be configured by either using the same input in multiple routes or using multiple outputs in the same route (see Routes). By default, when one of the outputs fails, NXLog will stop sending logs to all outputs. This is caused by NXLog's flow control mechanism, which is designed to prevent messages from being lost. Flow control pauses an Input or Processor module when the next module in the route is not ready to accept data.

In some cases, it is preferred for NXLog to continue sending logs to the remaining active output and discard logs for the failed output. The simplest solution is to disable flow control. This can be done globally with the global FlowControl directive, or for the corresponding Input (and Processor, if any) modules only, with the module FlowControl directive.

With flow control disabled, an Input or Processor module will continue to process logs even if the next module’s buffers are full (and the logs will be dropped).

To retain the improved message durability provided by flow control, it is possible to instead explicitly specify when to drop logs by using a separate route for each output that may fail. Add a pm_buffer module instance to that route, and configure the buffer to drop logs when it reaches a certain size. The output that follows the buffer can fail without causing any Input or Processor module before the buffer to pause processing. For examples, see the Using buffers section.
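The trade-off between pausing and dropping can be pictured with a toy model. The sketch below is not NXLog's implementation and does not use any NXLog API; the class name `Route`, its capacity parameter, and the return values are hypothetical and serve only to contrast the two behaviors described above.

```python
from collections import deque


class Route:
    """Toy model of one hop: a bounded buffer in front of the next module."""

    def __init__(self, capacity, flow_control=True):
        self.buffer = deque()
        self.capacity = capacity
        self.flow_control = flow_control
        self.dropped = 0

    def offer(self, record):
        """Try to pass a record on; return 'queued', 'paused', or 'dropped'."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(record)
            return "queued"
        if self.flow_control:
            return "paused"  # the upstream module has to wait and retry later
        self.dropped += 1
        return "dropped"  # upstream keeps going, but this record is lost
```

With flow control enabled, a full buffer makes the upstream module wait, which is why a failed output can stall every output fed by the same input. With flow control disabled, the upstream module continues and the excess records are lost.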

Check open files and limits

When a system nears or exceeds its open files limit, significant performance penalties typically follow quickly. lsof (list open files) is a common debugging tool found on the majority of Linux systems and can reveal a great deal about the running system.

On Linux, run the following command to see, for example, which files NXLog has open:

$ lsof -u nxlog

This example returns the number of open files:

$ lsof -Fn -u nxlog | sort | uniq | wc -l

To check NXLog system limits, use the following command:

$ cat /proc/$(sudo cat /opt/nxlog/var/run/nxlog/nxlog.pid)/limits

On systems not using /proc, check the system’s open file limit with:

$ sysctl kern.maxfiles

or with:

$ sysctl fs.file-max

There is no Windows equivalent to lsof. You can use Windows Process Explorer from Microsoft’s Windows Sysinternals to inspect which program has files or directories open.
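On Linux, the output of the limits command above can also be parsed by a script, for example to compare the limit with the number of open files. The following is a minimal sketch; the function name `parse_max_open_files` is hypothetical, and it assumes the usual column layout of /proc/<pid>/limits with the soft limit followed by the hard limit.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Each value is an int, or None when the kernel reports 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")
```

The number of files a process currently has open can be estimated by counting the entries in /proc/<pid>/fd, for example with `os.listdir()`, which typically requires root or the same user as the process.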

Systemd and open files limit

There are certain cases where systemd ignores system-level file limits. This can generate the "Too many open files" error:

2019-01-22 15:26:37 ERROR SSL error, failed to load ca cert from '/opt/nxlog/var/lib/nxlog/cert/agent-ca.pem', reason: Too many open files, system lib, system lib

This scenario requires edits to the service file or an override. To check NXLog system limits, use the following command:

$ cat /proc/$(cat /opt/nxlog/var/run/nxlog/nxlog.pid)/limits

On systems not using /proc, check the system’s open file limit:

$ sysctl kern.maxfiles

To adjust limits for nxlog, create /etc/systemd/system/nxlog.service.d/override.conf and add the following definition:

[Service]
LimitNOFILE=100000

Reload the systemd configuration with:

$ systemctl daemon-reload

Then restart the nxlog service so that the new limit applies to the running process.