Distribute log processing with NXLog Agent
Load balancing is the process of distributing workload across multiple servers to reduce the load on each one, improving processing throughput and availability.
Below, we provide two methods that you can use to distribute NXLog Agent load.
Using a network load balancer
Whenever possible, using a network layer load balancer is the best method to distribute connections for higher throughput. A network load balancer can distribute connections between multiple identically configured NXLog Agents.
Alternatively, you can take advantage of NXLog Agent’s multi-threaded architecture to distribute log processing between multiple input module instances of the same agent.
There are several commercial and open-source network load balancers available. We will use NGINX for our example. You will need NGINX Plus or NGINX Open Source with the TCP and UDP load balancing feature: TCP load balancing was added in version 1.9.0 and UDP load balancing in version 1.9.13, so use 1.9.13 or later for this example. See the installation instructions in the NGINX Admin Guide to get started.
This NGINX configuration distributes UDP messages and TCP connections to an NXLog Agent configured with multiple input instances, each listening on a different port.
On Debian-based systems, the default location of the NGINX configuration file is /etc/nginx/nginx.conf, but this may vary depending on your distribution.
The NGINX load balancer routes UDP traffic per message and TCP traffic per connection. As a result, load balancing TCP traffic works best when log sources send a similar number of events.
load_module /usr/lib/nginx/modules/ngx_stream_module.so; (1)
stream {
    upstream nxlog_udp { (2)
        server 192.168.1.81:1001;
        server 192.168.1.81:1002;
    }
    upstream nxlog_tcp { (3)
        server 192.168.1.81:1003;
        server 192.168.1.81:1004;
    }
    server {
        listen 192.168.1.81:514 udp; (4)
        proxy_pass nxlog_udp;
        proxy_responses 0;
    }
    server {
        listen 192.168.1.81:1514; (5)
        proxy_pass nxlog_tcp;
    }
}
worker_rlimit_nofile 1000000;
events {
    worker_connections 20000; (6)
}
(1) The NGINX stream module must either be loaded dynamically, as shown here, or compiled into NGINX with the --with-stream configuration parameter.
(2) Lists the NXLog Agent input instances listening for UDP messages.
(3) Lists the NXLog Agent input instances listening for TCP connections.
(4) Specifies the IP address and port NGINX listens on for UDP messages. Configure your sources to send logs to this IP address and port.
(5) Specifies the IP address and port NGINX listens on for TCP connections. Configure your sources to send logs to this IP address and port.
(6) The maximum number of simultaneous connections each NGINX worker process can open, including connections to the upstream NXLog Agent instances.
Refer to the NGINX documentation for more information on the available configuration directives.
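To illustrate why the per-message (UDP) versus per-connection (TCP) behavior described in the note above matters, the following Python sketch simulates plain round-robin assignment of three sources to two input instances. One source sends 20,000 events and the other two send 1,000 each. This is a simplified model, not NGINX's implementation. It assumes strict round robin with no weights or timing effects, treats every UDP message as independently balanced, and uses hypothetical backend names.

```python
from itertools import cycle


def per_message(sources, backends):
    """UDP-style balancing: every message goes to the next backend in turn."""
    next_backend = cycle(backends)
    load = {b: 0 for b in backends}
    for count in sources.values():
        for _ in range(count):
            load[next(next_backend)] += 1
    return load


def per_connection(sources, backends):
    """TCP-style balancing: each source's connection is pinned to one backend."""
    next_backend = cycle(backends)
    load = {b: 0 for b in backends}
    for count in sources.values():
        load[next(next_backend)] += count
    return load


# Hypothetical event counts per source and hypothetical backend names.
sources = {"A": 20_000, "B": 1_000, "C": 1_000}
backends = ["input_1", "input_2"]

print(per_message(sources, backends))     # {'input_1': 11000, 'input_2': 11000}
print(per_connection(sources, backends))  # {'input_1': 21000, 'input_2': 1000}
```

With per-message balancing, both instances receive the same share. With per-connection balancing, the busy source pins almost all of the work to one instance.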
This NXLog Agent configuration defines the UDP input instances referenced by the nxlog_udp upstream: two identical instances of the im_udp input module listening on ports 1001 and 1002.
<Extension syslog>
    Module xm_syslog
</Extension>
<Extension json>
    Module xm_json
</Extension>
<Input udp_1>
    Module im_udp
    ListenAddr 0.0.0.0:1001
    <Exec> (1)
        parse_syslog(); (2)
        to_json(); (3)
    </Exec>
</Input>
<Input udp_2>
    Module im_udp
    ListenAddr 0.0.0.0:1002
    <Exec>
        parse_syslog();
        to_json();
    </Exec>
</Input>
<Output file>
    Module om_file
    File '/path/to/output/file'
</Output>
<Route r1>
    Path udp_1, udp_2 => file (4)
</Route>
(1) Exec block for heavy parsing.
(2) Parses syslog messages into structured data using the parse_syslog() procedure of the xm_syslog module.
(3) Converts the record to JSON using the to_json() procedure of the xm_json module.
(4) Routes messages from all input instances to a single output.
Using NXLog Agent modules as threads
If deploying a network load balancer is not an option, you can implement parallelization within the NXLog Agent configuration. There are several options, depending on your use case.
The first method is to implement selector logic in the input instance's Exec block that reroutes individual events to multiple identical output instances. This way, any processing-intensive work is distributed between different threads.
This configuration uses the im_tcp input module to listen for connections on port 1514. It then distributes messages among three identical output instances in round-robin order.
Flow control is explicitly disabled when rerouting messages, so NXLog Agent drops messages if the target module's queue is full.
<Extension syslog>
    Module xm_syslog
</Extension>
<Extension json>
    Module xm_json
</Extension>
<Input tcp_routing>
    Module im_tcp
    ListenAddr 0.0.0.0:1514
    <Exec>
        if (get_var("linecounter") == undef) set_var("linecounter", 0); (1)
        set_var("linecounter", get_var("linecounter") + 1); (2)
        if get_var("linecounter") == 2 reroute("2"); (3)
        if get_var("linecounter") == 3 {
            reroute("3");
            set_var("linecounter", 0); (4)
        }
        log_info(get_var("linecounter")); (5)
    </Exec>
</Input>
<Input null>
    Module im_null
</Input>
<Output file_1>
    Module om_file
    File '/path/to/output/file_1'
    <Exec> (6)
        parse_syslog(); (7)
        to_json(); (8)
    </Exec>
</Output>
<Output file_2>
    Module om_file
    File '/path/to/output/file_2'
    <Exec>
        parse_syslog();
        to_json();
    </Exec>
</Output>
<Output file_3>
    Module om_file
    File '/path/to/output/file_3'
    <Exec>
        parse_syslog();
        to_json();
    </Exec>
</Output>
<Route 1>
    Path tcp_routing => file_1
</Route>
<Route 2>
    Path null => file_2
</Route>
<Route 3>
    Path null => file_3
</Route>
(1) Initializes the linecounter module variable to 0 if it is not yet defined, using the get_var() function and the set_var() procedure. Messages that are not rerouted fall through to route 1.
(2) Increases the counter by 1.
(3) Reroutes the message to the corresponding route (2 or 3) with the reroute() procedure.
(4) Resets the counter once it reaches the number of output instances.
(5) The log_info() procedure writes the counter's value to the NXLog Agent log file. Use it for testing purposes only.
(6) Exec block for heavy parsing.
(7) Parses syslog messages into structured data using the parse_syslog() procedure of the xm_syslog module.
(8) Converts the record to JSON using the to_json() procedure of the xm_json module.
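The selector logic in the tcp_routing Exec block amounts to a round-robin counter. The following Python sketch mirrors it statement by statement so you can check which route each event takes. The function name and the dictionary standing in for the input instance's module variables are illustrative only, not NXLog APIs. The log_info() call is omitted.

```python
from collections import Counter


def select_route(module_vars):
    """Mirror of the tcp_routing Exec block; returns the route an event takes."""
    # if (get_var("linecounter") == undef) set_var("linecounter", 0);
    if module_vars.get("linecounter") is None:
        module_vars["linecounter"] = 0
    # set_var("linecounter", get_var("linecounter") + 1);
    module_vars["linecounter"] += 1
    if module_vars["linecounter"] == 2:
        return "2"  # reroute("2");
    if module_vars["linecounter"] == 3:
        module_vars["linecounter"] = 0  # reset after the last output instance
        return "3"  # reroute("3");
    return "1"  # not rerouted: the event falls through to route 1


module_vars = {}  # stands in for the input instance's module variables
routes = [select_route(module_vars) for _ in range(9)]
print(routes)           # ['1', '2', '3', '1', '2', '3', '1', '2', '3']
print(Counter(routes))  # each route receives 3 of the 9 events
```

Because the counter lives in a single input instance, the events cycle through the three routes in a fixed order. Each output instance therefore receives one third of the traffic.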
Another option when receiving logs over the network is to route connections to multiple identical input instances by enabling the ReusePort directive of the im_tcp or im_udp modules, which allows multiple threads to receive data on the same port. Routing works best when many simultaneous connections deliver approximately the same number of events; otherwise, connection distribution may be skewed and not yield any benefits.
Let's consider an example where each of four input threads can handle 7,000 EPS with parsing enabled, and three agents (sources A, B, and C) send a cumulative 22,000 EPS.
One might conclude that the total throughput of the four threads (28,000 EPS) is sufficient to handle the influx. However, each source's connection is bound to a single input thread. Therefore, if source A delivers 20,000 EPS while sources B and C deliver 1,000 EPS each, the maximum throughput will not scale as expected. Assuming each connection lands on a different thread, the throughput equals the saturation throughput of Input 1, the thread handling source A (7,000 EPS), plus 2 * 1,000 EPS for sources B and C, resulting in 9,000 EPS. The remaining 13,000 EPS ((20,000 + 2,000) - 9,000) causes backpressure and significant delivery delays.
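The arithmetic in this example can be reproduced in a few lines of Python. This sketch makes the same assumptions as the example: each source opens one connection, each connection lands on a different input thread (there are at least as many threads as sources), and a thread never processes more than its 7,000 EPS saturation throughput.

```python
PER_THREAD_EPS = 7_000  # saturation throughput of one input thread, with parsing
THREADS = 4
source_eps = {"A": 20_000, "B": 1_000, "C": 1_000}

# Naive estimate: treat the four threads as one shared pool of capacity.
pooled_capacity = THREADS * PER_THREAD_EPS

# Realistic estimate: each connection is bound to one thread, so each
# source is capped at that thread's saturation throughput.
assert len(source_eps) <= THREADS  # one thread per connection
effective_eps = sum(min(eps, PER_THREAD_EPS) for eps in source_eps.values())
incoming_eps = sum(source_eps.values())
backlog_eps = incoming_eps - effective_eps

print(pooled_capacity, incoming_eps, effective_eps, backlog_eps)
# 28000 22000 9000 13000
```

The pooled estimate (28,000 EPS) exceeds the incoming rate. The per-connection cap, however, limits the effective throughput to 9,000 EPS.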
Distributing connections between threads is handled by the operating system. In our tests, distribution was poor when there were only a few connections.
This configuration defines two identical im_tcp input module instances listening for connections on port 1514. The ReusePort directive allows both instances to receive data concurrently on the same port.
<Extension syslog>
    Module xm_syslog
</Extension>
<Extension json>
    Module xm_json
</Extension>
<Input tcp_1>
    Module im_tcp
    ListenAddr 0.0.0.0:1514
    ReusePort TRUE
    <Exec> (1)
        parse_syslog(); (2)
        to_json(); (3)
    </Exec>
</Input>
<Input tcp_2>
    Module im_tcp
    ListenAddr 0.0.0.0:1514
    ReusePort TRUE
    <Exec>
        parse_syslog();
        to_json();
    </Exec>
</Input>
<Output file>
    Module om_file
    File '/path/to/output/file'
</Output>
<Route 1>
    Path tcp_1, tcp_2 => file (4)
</Route>
(1) Exec block for heavy parsing.
(2) Parses syslog messages into structured data using the parse_syslog() procedure of the xm_syslog module.
(3) Converts the record to JSON using the to_json() procedure of the xm_json module.
(4) Routes messages from all input instances to a single output.
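At the socket level, several listeners sharing one port is what the SO_REUSEPORT socket option provides on Linux (kernel 3.9 and later). We assume, without confirming it in NXLog Agent's internals, that the ReusePort directive relies on this option. The following Python sketch binds two TCP listeners to the same loopback port, with SO_REUSEPORT set on each socket before bind(). On Linux, the kernel then distributes incoming connections between the listeners. The sketch illustrates the operating system mechanism only. It does not run on platforms without SO_REUSEPORT, such as Windows.

```python
import socket


def reuseport_listener(port):
    """Create a TCP listener on 127.0.0.1:port with SO_REUSEPORT enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every sharing socket *before* bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


# Port 0 lets the OS pick a free port for the first listener; the second
# listener then binds to that same port, much like tcp_1 and tcp_2 on 1514.
first = reuseport_listener(0)
shared_port = first.getsockname()[1]
second = reuseport_listener(shared_port)
second_port = second.getsockname()[1]

print(shared_port == second_port)  # True

first.close()
second.close()
```

Without SO_REUSEPORT on both sockets, the second bind() would typically fail with "Address already in use".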