High Availability (HA)

The two main components of any HA deployment are Failover for fault tolerance and Load balancing for maintaining an acceptable level of performance. NXLog integrates with third-party load balancing solutions and ships with built-in failover capabilities.

Failover

Configuring a cluster of redundant, multiple systems able to provide identical functionality is the first prerequisite for implementing failover. Such a cluster is also known as an active-passive cluster. See NXLog Failover Mode for a detailed guide on configuring NXLog externally managed active-passive clusters.

Creating an NXLog failover cluster for centralized log collection

The following diagram illustrates how the externally managed failover functionality inherent in each of the NXLog v5 agents belonging to the Log Sources group can be individually configured to define not only the nodes that will comprise the NXLog Relay Cluster, but also which node will be active.

{productName} Relay Cluster used in failover mode for centralized logging
Figure 1. NXLog Relay Cluster used in failover mode for centralized logging

The architecture exhibited above includes a Raijin Server Cluster comprised of two nodes. It too will operate in the same manner as the NXLog Relay Cluster, failing over to the next passive node if the active node fails.

To summarize, the failover for the NXLog Relay Cluster is configured in the om_tcp instance of each agent in the Log Sources group. The failover for the Raijin Server Cluster is configured in the om_raijin output instance of each agent in the NXLog Relay Cluster.

Only NXLog v5 agents can provide external failover, thus the architecture presented here works only with agent-based log collection, not with agentless log collection.

Examples of failover implementations

The following sections cover the configurations needed to manage a cluster of NXLog relay nodes in active-passive mode. It is important to note that it is the NXLog agents in the Log Sources tier depicted above that need to fulfill the NXLog v5 requirement in order to natively handle failover, not the NXLog agents in the Relay Cluster. For NXLog deployments that have not yet been upgraded to v5, see the next diagram.

Configuring NXLog v5 and higher for failover

Example 1. Sending logs to an NXLog Relay Cluster in failover mode

The following sample configurations are based on the architecture depicted above. For failover to function as depicted, each of the NXLog relay nodes should have identical module instances (input, out, relay, etc.), exception for their ListenAddr directive.

nxlog.conf (Configuration to be used on agents in the Log Sources tier)
<Output to_relay>
    Module         om_tcp
    OutputType     Binary
    Host           192.168.1.51:1514
    Host           192.168.1.52:1514
    Host           192.168.1.53:1514
</Output>
nxlog.conf (Configuration to be used on agents in the Relay Cluster tier)
<Extension _json>
    Module        xm_json
</Extension>

<Input from_Log_Sources>
    Module        im_tcp
    ListenAddr    0.0.0.0:1514
    InputType     Binary
</Input>

<Output to_archive>
    Module        om_tcp
    Host          archival-fs.example.com:1514
    OutputType    Binary
</Output>

<Output to_siem>
    Module        om_tcp
    Host          siem.example.com:1515
    Exec          to_json();
</Output>
The configurations for sending events to Amazon S3 and SIEMs like Microsoft Azure Sentinel or Splunk require additional setup that is beyond the scope of this chapter. Should you opt to include them in your HA Centralized Log Collection architecture, please visit their respective integration chapters for details on how to prepare and configure the NXLog Relay Cluster nodes for forwarding events to these third-party log analytics and/or long-term archival solutions: Amazon S3, Microsoft Azure Sentinel, and Splunk.

Using a third-party solution to provide failover within the NXLog Relay Cluster

For NXLog EE deployments that have not yet been upgraded to v5.0, failover needs to be managed by using a third-party load balancing solution that can be configured to operate in active-passive mode, such as HAProxy or Nginx.

Using a third-party solution to provide failover within the {productName} Relay Cluster
Figure 2. Using a third-party solution to provide failover within the NXLog Relay Cluster
Example 2. NXLog configurations common to both third-party implementations

In the following examples, either HAProxy or Nginx can be configured to automatically select the next available NXLog node in the Relay Cluster if the current node becomes unresponsive. Using this approach, all Log Sources will have their output instances configured to send events to the third-party failover provider as depicted in the diagram above.

nxlog.conf (Configuration for each log source sending to the failover server)
<Output to_3rd_party_failover>
    Module         om_tcp
    OutputType     Binary
    Host           192.168.1.50:1514
</Output>

Each of the NXLog nodes in the NXLog Relay Cluster are configured to receive events from the third-party failover provider.

nxlog.conf (Configuration for each node in the NXLog Relay Cluster)
<Extension _json>
    Module        xm_json
</Extension>

<Input from_Log_Sources>
    Module        im_tcp
    ListenAddr    0.0.0.0:1514
    InputType     Binary
</Input>

<Output to_archive>
    Module        om_tcp
    Host          archival-fs.example.com:1514
    OutputType    Binary
</Output>

<Output to_siem>
    Module        om_tcp
    Host          siem.example.com:1515
    Exec          to_json();
</Output>
Example 3. Using HAProxy to provide failover for an NXLog Relay Cluster

In this example HAProxy is configured for active-passive mode. The frontend section defines the listening IP and port (0.0.0.0:1514) of the host where HAProxy is running.

The backend section defines the members of the NXLog cluster, assigns them labels (nxlog_1, nxlog_2, nxlog_3), along with their listening IP addresses and ports. The second and third servers (nxlog_2 and nxlog_3) are flagged with backup to indicate they should remain passive as long as nxlog_1 is accepting connections and can relay events. The roundrobin algorithm means each server in the cluster will tested in order, cycling back to the first server again if the last one in the list fails while serving as the primary due to previous failures of the ones before it.

/etc/haproxy/haproxy.cfg
frontend tcp_front
    bind *:1514
    stats uri /haproxy?stats
    default_backend tcp_back

backend  tcp_back
    balance roundrobin
    server nxlog_1 192.168.1.51:1514 check
    server nxlog_2 192.168.1.52:1514 check backup
    server nxlog_3 192.168.1.53:1514 check backup
Example 4. Using Nginx to provide failover for an NXLog Relay Cluster

In this example Nginx is configured for active-passive mode within the context of a stream block. This is important because HTTP connections are handled differently than TCP (stream) connections.

Within the server block, the listen directive defines only the port (effectively 0.0.0.0:1514) that the Nginx host will be accepting connections on. The proxy_pass directive defines the name of the upstream block, nxlog_relay_cluster, to which TCP traffic will be directed.

The named (nxlog_relay_cluster) upsteam block defines each server member of the NXLog cluster by its hostname or IP address and port number. The second and third servers (192.168.1.52:1514 and 192.168.1.53:1514) are flagged with backup to indicate they should remain passive as long as 192.168.1.51:1514 is accepting connections and can relay events. Should the first server fail to accept connections and relay events, each susequent server in the cluster will tested in order, cycling back to the first server again if the last one in the list fails while serving as the primary due to previous failures of the ones before it.

/etc/nginx/nginx.conf
stream {
    upstream nxlog_relay_cluster {
        server 192.168.1.51:1514;
        server 192.168.1.52:1514 backup;
        server 192.168.1.53:1514 backup;
    }

    server {
        listen 1514;
        proxy_pass nxlog_relay_cluster;
    }
}

Load balancing

The primary objective of a load balancer is to distribute the workload within a computing environment across a cluster of nodes to mitigate a number of performance issues, such as:

  • Sudden spikes in workload that would otherwise overwhelm a non-distributed computing environment that can easily slow the entire computing environment down to a crawl or even a halt.

  • In an unmanaged cluster, some compute resources will inevitably be working at peak capacity while others sit idle due the random nature of tasks and network traffic. A load balancer can efficiently distribute network traffic and mete out compute tasks across multiple nodes. This optimizes resource utilization and maximizes throughput.

Load balancing can also be used to scale out your application and increase its performance and redundancy.

Architecture

The active-active mode of managing a load balanced cluster of nodes provides the most cost effective use of a cluster and the highest performance attainable. When comparing the architecture depicted in the following diagram with the previous one for Failover, two major differences are apparent:

  1. An additional Load Balancing tier has been inserted as a mediator between the Log Sources tier and the NXLog Relay Cluster tier.

  2. All nodes in the NXLog Relay Cluster are now active; none are sitting idle waiting for a failure to occur.

For this reason, load balancers are associated first and foremost with performance rather than fault tolerance.

Using load balancing with a {productName} Relay Cluster for centralized logging
Figure 3. Using load balancing with a NXLog Relay Cluster for centralized logging

Example implementations of load balancing solutions

In the following examples, the NXLog Relay Cluster is configured for active-active mode. All nodes will be actively used and managed by a load balancer. The dual-node Load Balancer cluster shown above will operate in active-passive (failover) mode by virtue of NXLog Enterprise Edition v5’s built-in externally managed failover capabilities. Each output instance of the productName agents belonging to the Log Sources group in the diagram is configured to send to both load balancers.

Since neither pre-v5 NXLog Enterprise Edition nor HAProxy nor Nginx (Open Source) ship with built-in failover capabilities, only a single load balancer can be used in standalone mode when configuring pre-v5 Log Sources.
Example 5. NXLog configurations common to both load balancing implementations
nxlog.conf (Configuration for each log source sending to the load balancer cluster)
<Output to_relay>
    Module         om_tcp
    OutputType     Binary

#   Load Balancer 1 - ACTIVE
    Host           192.168.1.50:1514

#   Load Balancer 2 - passive
    Host           192.168.1.60:1514

</Output>

Each of the NXLog nodes in the NXLog Relay Cluster are configured to receive events from the third-party failover provider.

nxlog.conf (Configuration for each node in the NXLog Relay Cluster)
<Extension _json>
    Module        xm_json
</Extension>

<Input from_Log_Sources>
    Module        im_tcp
    ListenAddr    0.0.0.0:1514
    InputType     Binary
</Input>

<Output to_archive>
    Module        om_tcp
    Host          archival-fs.example.com:1514
    OutputType    Binary
</Output>

<Output to_siem>
    Module        om_tcp
    Host          siem.example.com:1515
    Exec          to_json();
</Output>
Example 6. Using HAProxy to provide load balancing for an NXLog Relay Cluster

In this example HAProxy is configured for active-active mode. The leastconn load balancing algorithm is recommended for TCP connections that are long-lived.

/etc/haproxy/haproxy.cfg
frontend tcp_front
    bind *:1514
    default_backend tcp_back

backend  tcp_back
    balance leastconn
    mode tcp
    server nxlog_1 192.168.1.51:1514 check
    server nxlog_2 192.168.1.52:1514 check
    server nxlog_3 192.168.1.53:1514 check
    server nxlog_4 192.168.1.54:1514 check
    server nxlog_5 192.168.1.55:1514 check
Example 7. Using Nginx to provide load balancing for an NXLog Relay Cluster

In this example Nginx is configured for active-active mode. The least_conn load balancing algorithm is recommended for TCP connections that are long-lived.

/etc/nginx/nginx.conf
stream {
    upstream nxlog_relay_cluster {
        least_conn;
        server 192.168.1.51:1514;
        server 192.168.1.52:1514;
        server 192.168.1.53:1514;
        server 192.168.1.54:1514;
        server 192.168.1.55:1514;
    }

    server {
        listen 1514;
        proxy_pass nxlog_relay_cluster;
    }
}
Disclaimer

While we endeavor to keep the information in this topic up to date and correct, NXLog makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability, or availability of the content represented here.

Last revision: 11 May 2021