NXLog failover mode

When using failover-enabled NXLog modules, it is important to understand that an NXLog active-passive (failover) cluster is configured quite differently from most third-party failover implementations.

Failover-enabled modules

All network-based NXLog modules support externally managed failover:

Batched Compression (im_batchcompress, om_batchcompress)
DBI (im_dbi)
Elasticsearch (om_elasticsearch)
HTTP(s) (im_http, om_http)
Microsoft Azure (im_azure, om_azure)
Raijin (om_raijin)
Redis (im_redis)
Remote Management (xm_admin)
TCP (im_tcp, om_tcp)
TLS/SSL (im_ssl, om_ssl)
UDP (im_udp, om_udp, om_udpspoof)

Types of failover

Self-managed failover, typically referred to as an active-passive cluster of nodes, provides almost the same functionality as the less common, externally managed variety. The main difference between the two is where (in which tier) the following is found or occurs:

Configuration of which nodes comprise the cluster and which node is the default active node
Detection of a fault within the active node
Selection of the next passive node to be promoted to active status when a fault is detected

Self-managed

The most common implementation of failover provides same-tier failover and is managed within the cluster itself. Its main advantage is that a cluster can be deployed without any need for management from external hosts. This type of failover is typically only practical for large organizations that need to provide high availability (HA) for client-server applications dependent on session continuity. Due to the technical complexity of this approach, configuring and deploying such clusters usually requires significant planning and resources.

Externally managed

Application-based, externally defined and managed failover solutions are not common. Emulating Active/Passive Application Clustering with HAProxy is one example. Although the Open Source edition of Nginx does not document this capability explicitly, this post provides an example of how an active-passive cluster can be configured and managed using Nginx.

An externally managed failover cluster is unaware of its peers and none of the nodes contain any configuration details defining themselves as part of a cluster. It relies entirely on the external hosts that will be accessing it to define which node is active and which nodes will be considered viable passive peers. If the active node fails, each external host’s failover-enabled agent, in this case NXLog, will actively try to connect to the next available passive node and promote it to be the active node for its own use. Each external host performs this task independently.

Using NXLog failover to create clusters

Configuring an NXLog relay cluster

Figure 1. NXLog relay cluster used in failover mode for centralized logging

Example 1. NXLog creating a 3-node failover cluster

The following configuration example can be used to create the architecture seen in the diagram. When implemented, only the first node will be actively receiving events. The other two nodes will sit idle unless the first node fails.

To implement this architecture, the following to_relay output instance needs to be included in the configuration file for both Linux servers, both Windows DNS servers, and both Windows hosts collecting Sysmon events.

This completes the definition of the cluster and its nodes, as well as the sending side of the architecture.

nxlog.conf (Configuration for each log source sending to the failover cluster)

# External Failover Cluster - Active-Passive
<Output to_relay>
    Module  om_tcp

#   Node 1  ACTIVE
    Host    192.168.1.51:1514

#   Node 2  passive
    Host    192.168.1.52:1514

#   Node 3  passive
    Host    192.168.1.53:1514

</Output>
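
A sending agent also needs an input and a route before the to_relay instance does anything. As a minimal sketch for one of the Linux servers, assuming events are read from /var/log/syslog with im_file (the from_syslog and to_cluster instance names and the file path are illustrative and not part of the original example):

```
# Hypothetical input: read local syslog events from a file
<Input from_syslog>
    Module  im_file
    File    "/var/log/syslog"
</Input>

# Send everything to the to_relay output defined above; which cluster
# node actually receives the events is decided by the Host list there
<Route to_cluster>
    Path  from_syslog  =>  to_relay
</Route>
```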

For the receiving side, the following from_log_sources input instance needs to be included in the configuration file of every node in the cluster. The to_siem output instance and the relay route complete the relay path depicted in the diagram, sending the events to their final destination, the SIEM Log Collection Server.

nxlog.conf (Configuration for each node in the failover cluster)

# Receiving Events as a Node in a Failover Cluster
<Input from_log_sources>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
</Input>

# Relay the Events to the SIEM
<Output to_siem>
    Module      om_tcp
    Host        siem.example.com:1514
</Output>

<Route relay>
    Path  from_log_sources  =>  to_siem
</Route>
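
Since the TLS/SSL modules are also listed as failover-enabled, the same pattern should carry over to encrypted transport. The following is an untested sketch and not part of the original example: port 6514 and all certificate paths are assumptions, mutual certificate authentication is assumed, and the two blocks would replace their om_tcp and im_tcp counterparts above.

```
# Sender side: same three-node Host list, TLS instead of plain TCP
<Output to_relay>
    Module       om_ssl
    Host         192.168.1.51:6514
    Host         192.168.1.52:6514
    Host         192.168.1.53:6514
    # Hypothetical certificate paths; adjust to the local PKI
    CAFile       /opt/nxlog/cert/rootCA.pem
    CertFile     /opt/nxlog/cert/agent-cert.pem
    CertKeyFile  /opt/nxlog/cert/agent-key.pem
</Output>

# Relay node side: TLS listener in place of im_tcp
<Input from_log_sources>
    Module       im_ssl
    ListenAddr   0.0.0.0:6514
    CAFile       /opt/nxlog/cert/rootCA.pem
    CertFile     /opt/nxlog/cert/relay-cert.pem
    CertKeyFile  /opt/nxlog/cert/relay-key.pem
</Input>
```

Every relay node needs a certificate that the agents will accept, because an agent may connect to any node in its Host list after a failover.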

Configuring a hybrid load balancing/failover cluster

Figure 2. NXLog relay cluster used in hybrid load balancing/failover mode for centralized logging

As depicted in the diagram above, each agent can be configured to have a different active node from that of other agents configured to use the same NXLog Relay Cluster. This enables the creation of a hybrid failover/load balancing configuration.

From each individual agent’s perspective, it is communicating with an active-passive cluster, but in reality the cluster operates as an active-active load balancer (albeit a static one). This technique prevents the passive nodes from sitting completely idle and, depending on the number of nodes, can dramatically increase performance.

The following output instance is used by both Linux servers. The first relay node has been set as the default active node for their failover setup.

nxlog.conf (Configuration for the Linux servers)

# External Failover Cluster - ACTIVE Node 1

<Output to_relay>
    Module  om_tcp

#   Node 1  ACTIVE
    Host    192.168.1.51:1514

#   Node 2  passive
    Host    192.168.1.52:1514

#   Node 3  passive
    Host    192.168.1.53:1514

</Output>

The following output instance is used by both Windows DNS servers. The second relay node has been set as the default active node for their failover setup.

nxlog.conf (Configuration for the Windows DNS Servers)

# External Failover Cluster - ACTIVE Node 2
<Output to_relay>
    Module  om_tcp

#   Node 2  ACTIVE
    Host    192.168.1.52:1514

#   Node 3  passive
    Host    192.168.1.53:1514

#   Node 1  passive
    Host    192.168.1.51:1514

</Output>

The following output instance is used by both Windows servers collecting Sysmon events. The third relay node has been set as the default active node for their failover setup.

nxlog.conf (Configuration for the Windows servers collecting Sysmon events)

# External Failover Cluster - ACTIVE Node 3
<Output to_relay>
    Module  om_tcp

#   Node 3  ACTIVE
    Host    192.168.1.53:1514

#   Node 1  passive
    Host    192.168.1.51:1514

#   Node 2  passive
    Host    192.168.1.52:1514

</Output>
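
The three agent configurations above differ only in the order of their Host lines. One way to keep the node addresses consistent across them is to define each address once and rotate only the references. This is sketched here on the assumption that the define directive and %NAME% substitution are available in the deployed NXLog version; the RELAY1 to RELAY3 constant names are illustrative.

```
# Same three definitions in every agent configuration
define RELAY1 192.168.1.51:1514
define RELAY2 192.168.1.52:1514
define RELAY3 192.168.1.53:1514

# Windows DNS servers: relay node 2 first, so it is the default active node
<Output to_relay>
    Module  om_tcp
    Host    %RELAY2%
    Host    %RELAY3%
    Host    %RELAY1%
</Output>
```

With this layout, changing a node's address means editing one define line per agent, and the per-group rotation stays readable at a glance.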

The receiving side is unchanged: every node in the cluster uses the same configuration as in Example 1.

nxlog.conf (Configuration for each node in the failover cluster)

# Receiving Events as a Node in a Failover Cluster
<Input from_log_sources>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
</Input>

# Relay the Events to the SIEM
<Output to_siem>
    Module      om_tcp
    Host        siem.example.com:1514
</Output>

<Route relay>
    Path  from_log_sources  =>  to_siem
</Route>