NXLog Agent data collection modes

NXLog Agent can collect telemetry data in three modes. Each mode has different characteristics, and you can use any combination of modes within your telemetry data pipeline.

Agent-based data collection: NXLog Agent runs on the system that generates the telemetry data.
Agentless data collection: Applications or devices send telemetry data to NXLog Agent over the network. When using agentless data collection, there are additional configuration requirements to ensure that NXLog Platform accurately counts log sources.
Offline data processing: Use the nxlog-processor(8) tool to process telemetry data manually or via a script.

Agent-based data collection

In agent-based data collection, you install NXLog Agent on the source and configure it to collect, process, and forward telemetry data.

Agent-based data collection is especially suitable if you need to transform the data before forwarding it to the destination or require secure and reliable data transfer. We recommend this mode for most use cases.

Figure 1. Agent-based data collection

Agent-based data collection offers significant advantages over agentless collection, some of which are:

You have more data collection options. Telemetry data sources often provide multiple output methods and formats you can choose from according to your requirements. For example, you can collect logs from a file rather than being tied to using an unreliable logging process to send logs over the network.
You can filter, normalize, and enrich telemetry data before forwarding it to the destination. NXLog Agent has a comprehensive list of data processing capabilities, including transforming events and metrics to a different format, such as JSON, XML, or CSV.
You have complete control over how you transfer telemetry data. NXLog Agent supports several network protocols, including TLS/SSL over TCP and HTTPS for secure data transfer. You can also compress data and implement buffering if necessary.
You can implement reliable and secure data collection. NXLog Agent includes delivery guarantees and flow control systems to ensure telemetry data reaches the destination. You can also monitor the health of NXLog Agent instances from NXLog Platform to maintain operational integrity.

Although agent-based data collection has its benefits, there are instances where installing an agent on the source is not possible, including:

Many network devices and embedded systems, such as routers and firewalls, do not support installing third-party software.
Compliance or regulatory mandates may prohibit you from installing third-party software on certain systems.

Agentless data collection

In agentless data collection, you configure a central NXLog Agent instance to receive and process telemetry data from remote sources. You then configure applications or devices to send telemetry data to this NXLog Agent instance over the network using protocols such as TCP, UDP, HTTP, WEF, and MSRPC.

We only recommend agentless data collection for sources where you cannot install third-party software, such as network devices and legacy or embedded systems.

Figure 2. Agentless data collection

Agentless data collection can be advantageous because you do not need to install additional software on the source, and applications and devices that support forwarding telemetry data over the network generally only require minimal configuration.

However, it also has some disadvantages that are worth considering, including:

Agentless data collection may be slower than agent-based collection. For example, on Windows, the Windows Management Instrumentation (WMI) process used to forward events can consume a considerable amount of system resources compared to NXLog Agent.
Data may not be transferred reliably and securely. For example, most syslog forwarders use UDP to transfer logs over the network, which is neither reliable nor secure. In addition, it is unlikely that you’ll be able to monitor the health of the forwarding process, resulting in potential data loss if the process or communication breaks down.

Configuration requirements for agentless data collection

NXLog Platform identifies agentless log sources by the IP address and port number used to connect. NXLog Platform must consider both the source IP address and port when counting log sources, given that connectionless protocols such as UDP may be used, and connections may pass through a proxy or load balancer. A system that does not consistently connect from the same IP address and port is counted multiple times.
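The counting rule described above can be illustrated with a minimal Python sketch. It is not NXLog Platform's implementation; the function name count_log_sources and the sample addresses are hypothetical, and the sketch only assumes that each distinct (source IP, source port) pair is treated as one log source.

```python
from typing import Iterable, Tuple

# A peer is the (source IP, source port) pair seen by the receiving agent.
Peer = Tuple[str, int]


def count_log_sources(peers: Iterable[Peer]) -> int:
    """Count distinct (source IP, source port) pairs.

    Illustrative only: assumes one log source per unique pair.
    """
    return len(set(peers))


# One device that changes its source port is seen as several sources.
rotating_ports = [
    ("192.0.2.10", 50001),
    ("192.0.2.10", 50002),
    ("192.0.2.10", 50003),
]

# Two devices that each keep a fixed source port are seen as two sources,
# no matter how many messages they send.
fixed_ports = [
    ("192.0.2.10", 50514),
    ("192.0.2.11", 50514),
    ("192.0.2.10", 50514),
]

print(count_log_sources(rotating_ports))
print(count_log_sources(fixed_ports))
```

Under this assumption, the first list yields three sources for a single device, while the second yields two sources for two devices.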

Sending data over UDP

When using UDP, some devices use a different outbound port when establishing each network connection, causing NXLog Platform to count the same device multiple times.

For example, a router sending data over UDP using three different outbound ports results in 3 log sources.

You must configure devices sending data over UDP to use the same source port to avoid duplication.

For example, two routers sending data over UDP, with each device using the same outbound port, results in 2 log sources.
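How you pin the source port depends on the device or application. For a custom sender, a minimal Python sketch is shown here. It assumes the sender may bind to a port of its choosing; the function names open_udp_sender and send_events are hypothetical and not part of any NXLog tooling.

```python
import socket
from typing import Iterable, Tuple


def open_udp_sender(source_port: int, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Create a UDP socket bound to a fixed local (source) port.

    Without bind(), the operating system picks an ephemeral source port
    for each new socket, so a sender that reopens its socket would show
    up with a different port each time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, source_port))
    return sock


def send_events(sock: socket.socket, collector: Tuple[str, int],
                events: Iterable[str]) -> None:
    """Send each event as one datagram from the same bound socket."""
    for event in events:
        sock.sendto(event.encode("utf-8"), collector)
```

For instance, calling open_udp_sender(50514) once and reusing the returned socket for every send_events call keeps the source port at 50514 for all datagrams. The port number is only an example value.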

Sending data through a proxy or network load balancer

If devices send data to an NXLog Agent cluster via a reverse proxy or load balancer, the same device can be counted once for each NXLog Agent instance through which the data passes.

For example, a router sending data to an NXLog Agent cluster via a load balancer that distributes connections between two agent instances will result in 2 log sources:

NXLog Agent collecting data using a load balancer

To avoid this, you must configure persistent connections (session persistence) on your load balancer, for example, by using hash-based routing by source address.

For example, two routers sending data via a load balancer, which always routes connections from the same source to the same agent instance, results in 2 log sources.
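The idea behind hash-based routing by source address can be sketched in Python as follows. This is a conceptual illustration, not the configuration of any particular load balancer; the function name pick_agent and the agent names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_agent(source_ip: str, agents: Sequence[str]) -> str:
    """Map a source IP address to one agent instance deterministically.

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(agents)
    return agents[index]


agents = ["agent-1", "agent-2"]  # hypothetical instance names

# Every connection from the same router lands on the same agent instance,
# so each router is seen by exactly one instance.
seen = set()
for _ in range(5):
    for router_ip in ("192.0.2.10", "192.0.2.11"):
        seen.add((router_ip, pick_agent(router_ip, agents)))

print(len(seen))
```

Because the mapping depends only on the source address and the list of instances, repeated connections from one device never spread across instances. Note that in this simple modulo scheme, adding or removing an instance can remap existing sources.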

Offline data processing

While the other two modes process data in real time, you can process telemetry data offline with the nxlog-processor(8) tool. The tool is similar to the main NXLog Agent service and uses the same configuration system but runs in the foreground and exits once it processes all the data.

There are several reasons why you may need to process telemetry data offline, such as:

Transferring events from log files to a database.
Converting telemetry data to a different format.
Testing patterns.
Correlating events.
Checking HMAC message integrity.
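To make the run-to-completion idea concrete, here is a small Python sketch of a one-off format conversion from CSV to JSON lines. It is a conceptual stand-in only and does not use or reproduce nxlog-processor(8); the function name csv_to_json_lines and the sample fields are hypothetical.

```python
import csv
import io
import json
from typing import Iterable, Iterator


def csv_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Convert CSV records (first row is the header) to JSON strings."""
    for record in csv.DictReader(lines):
        yield json.dumps(record)


# Sample input; in practice this would be an existing log file.
sample = io.StringIO(
    "timestamp,host,message\n"
    "2024-01-01T00:00:00Z,web1,service started\n"
    "2024-01-01T00:00:05Z,web1,listening on port 8080\n"
)

# Process every record, then finish: there is no long-running service.
for line in csv_to_json_lines(sample):
    print(line)
```

The script reads a finite input, converts each record, and exits when the input is exhausted, which mirrors the batch nature of offline processing.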