NXLog failover mode
When using failover-enabled NXLog modules, it is important to understand that an NXLog active-passive (failover) cluster is configured quite differently from most third-party failover implementations.
Failover-enabled modules
All network-based NXLog modules support externally managed failover:
- Batched Compression (im_batchcompress, om_batchcompress)
- DBI (im_dbi)
- Elasticsearch (om_elasticsearch)
- Raijin (om_raijin)
- Redis (im_redis)
- Remote Management (xm_admin)
- UDP (im_udp, om_udp, om_udpspoof)
Types of failover
Self-managed failover, typically referred to as an active-passive cluster of nodes, provides almost the same functionality as the less common, externally managed variety. The main difference between the two is the tier in which the following are defined or take place:
- Configuration of which nodes comprise the cluster and which node is the default active node
- Detection of a fault within the active node
- Selection of the next passive node to be promoted to active status when a fault is detected
Self-managed
The most common implementation of failover provides same-tier failover and is managed within the cluster itself. Its main advantage is that a cluster can be deployed without any need for management from external hosts. This type of failover is typically only practical for large organizations that need to provide high availability (HA) for client-server applications that depend on session continuity. Due to the technical complexity of this approach, configuring and deploying such clusters usually requires significant planning and resources.
Externally managed
Application-based, externally defined and managed failover solutions are not common. Emulating Active/Passive Application Clustering with HAProxy is one example. Although the Open Source edition of Nginx does not document this capability explicitly, this post provides an example of how an active-passive cluster can be configured and managed using Nginx.
An externally managed failover cluster is unaware of its peers and none of the nodes contain any configuration details defining themselves as part of a cluster. It relies entirely on the external hosts that will be accessing it to define which node is active and which nodes will be considered viable passive peers. If the active node fails, each external host’s failover-enabled agent, in this case NXLog, will actively try to connect to the next available passive node and promote it to be the active node for its own use. Each external host performs this task independently.
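The per-host selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NXLog's implementation: the function name `pick_active`, the `is_reachable` callback, and the wrap-around search order are assumptions made for the example, and the relay addresses are the ones used in the configurations later on this page.

```python
from typing import Callable, Optional, Sequence


def pick_active(
    hosts: Sequence[str],
    is_reachable: Callable[[str], bool],
    start: int = 0,
) -> Optional[str]:
    """Return the first reachable host, walking the list in order.

    The search begins at index `start` and wraps around, so every host
    is tried at most once. Returns None if no host is reachable.
    """
    for offset in range(len(hosts)):
        candidate = hosts[(start + offset) % len(hosts)]
        if is_reachable(candidate):
            return candidate
    return None


# Hypothetical state: the default active node (first in the list) is down.
relays = ["192.168.1.51:1514", "192.168.1.52:1514", "192.168.1.53:1514"]
down = {"192.168.1.51:1514"}

# This host now treats the second node as active for its own use.
print(pick_active(relays, lambda host: host not in down))
```

Because every external host runs this selection on its own, two hosts can briefly disagree about which node is active. The cluster nodes never need to know.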
Using NXLog failover to create clusters
Configuring an NXLog relay cluster
The following configuration example can be used to create the architecture seen in the diagram. When implemented, only the first node will be actively receiving events. The other two nodes will sit idle unless the first node fails.
To implement this architecture, the following to_relay output instance needs to be included in the configuration file for both Linux servers, both Windows DNS servers, and both Windows hosts collecting Sysmon events.
This completes the definition of the cluster and its nodes, as well as the sending side of the architecture.
# External Failover Cluster - Active-Passive
<Output to_relay>
Module om_tcp
# Node 1 ACTIVE
Host 192.168.1.51:1514
# Node 2 passive
Host 192.168.1.52:1514
# Node 3 passive
Host 192.168.1.53:1514
</Output>
For the receiving side, the following from_log_sources input instance needs to be included in the configuration file of every node in the cluster.
The to_siem output instance and the relay route complete the relay path depicted in the diagram, sending the events to their final destination, the SIEM Log Collection Server.
# Receiving Events as a Node in a Failover Cluster
<Input from_log_sources>
Module im_tcp
ListenAddr 0.0.0.0:1514
</Input>
# Relay the Events to the SIEM
<Output to_siem>
Module om_tcp
Host siem.example.com:1514
</Output>
<Route relay>
Path from_log_sources => to_siem
</Route>
Configuring a hybrid load balancing/failover cluster
As depicted in the diagram above, each agent can be configured to have a different active node from that of other agents configured to use the same NXLog Relay Cluster. This enables the creation of a hybrid failover/load balancing configuration.
From each individual agent’s perspective, it is communicating with an active-passive cluster, but in reality the cluster operates as an active-active load balancer (albeit a static one). This technique prevents the passive nodes from sitting completely idle and, depending on the number of nodes, can substantially increase the throughput of the cluster.
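One way to derive the per-group host order is to rotate the same node list by one position for each group of agents. The following Python sketch prints one to_relay output instance per group. The helper names `rotate` and `render_output`, the group labels, and the indentation of the generated text are assumptions made for this illustration. Only the Module and Host directives already used on this page are emitted.

```python
from typing import List, Tuple

Node = Tuple[int, str]  # (node number, "ip:port")

RELAY_NODES: List[Node] = [
    (1, "192.168.1.51:1514"),
    (2, "192.168.1.52:1514"),
    (3, "192.168.1.53:1514"),
]


def rotate(nodes: List[Node], shift: int) -> List[Node]:
    """Rotate a non-empty node list left by `shift` positions."""
    shift %= len(nodes)
    return nodes[shift:] + nodes[:shift]


def render_output(nodes: List[Node], name: str = "to_relay") -> str:
    """Render an om_tcp output block; the first host is the default active node."""
    lines = [f"<Output {name}>", "    Module  om_tcp"]
    for position, (number, host) in enumerate(nodes):
        role = "ACTIVE" if position == 0 else "passive"
        lines.append(f"    # Node {number} {role}")
        lines.append(f"    Host    {host}")
    lines.append("</Output>")
    return "\n".join(lines)


# Hypothetical labels for the three pairs of agents in the diagram.
AGENT_GROUPS = ["Linux servers", "Windows DNS servers", "Windows Sysmon hosts"]

for index, group in enumerate(AGENT_GROUPS):
    print(f"# {group}")
    print(render_output(rotate(RELAY_NODES, index)))
    print()
```

Rotating rather than shuffling keeps the fallback order predictable: every group falls back to the node that follows its own active node in the list.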
The following output instance is used by both Linux servers. The first relay node has been set as the default active node for their failover setup.
# External Failover Cluster - ACTIVE Node 1
<Output to_relay>
Module om_tcp
# Node 1 ACTIVE
Host 192.168.1.51:1514
# Node 2 passive
Host 192.168.1.52:1514
# Node 3 passive
Host 192.168.1.53:1514
</Output>
The following output instance is used by both Windows DNS servers. The second relay node has been set as the default active node for their failover setup.
# External Failover Cluster - ACTIVE Node 2
<Output to_relay>
Module om_tcp
# Node 2 ACTIVE
Host 192.168.1.52:1514
# Node 3 passive
Host 192.168.1.53:1514
# Node 1 passive
Host 192.168.1.51:1514
</Output>
The following output instance is used by both Windows servers collecting Sysmon events. The third relay node has been set as the default active node for their failover setup.
# External Failover Cluster - ACTIVE Node 3
<Output to_relay>
Module om_tcp
# Node 3 ACTIVE
Host 192.168.1.53:1514
# Node 1 passive
Host 192.168.1.51:1514
# Node 2 passive
Host 192.168.1.52:1514
</Output>
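Because the load balancing is static, it helps to check how the agents redistribute when a relay node fails. The sketch below counts agents per node for the three host orders above. It assumes two agents per group, as in the diagram, and that each agent moves to the next available host in its own list. The names `GROUP_ORDER` and `agents_per_node` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet

# Host order per agent group, taken from the three output instances above.
GROUP_ORDER = {
    "Linux servers": ["192.168.1.51", "192.168.1.52", "192.168.1.53"],
    "Windows DNS servers": ["192.168.1.52", "192.168.1.53", "192.168.1.51"],
    "Windows Sysmon hosts": ["192.168.1.53", "192.168.1.51", "192.168.1.52"],
}
AGENTS_PER_GROUP = 2  # assumption: two hosts per group, as in the diagram


def agents_per_node(failed: FrozenSet[str] = frozenset()) -> Dict[str, int]:
    """Count how many agents send to each relay node, given failed nodes."""
    load: Counter = Counter()
    for order in GROUP_ORDER.values():
        # Each agent uses the first host in its list that has not failed.
        target = next((host for host in order if host not in failed), None)
        if target is not None:
            load[target] += AGENTS_PER_GROUP
    return dict(load)


print(agents_per_node())                             # all nodes healthy
print(agents_per_node(frozenset({"192.168.1.53"})))  # node 3 has failed
```

With all nodes healthy, each node serves two agents. If node 3 fails, the Sysmon hosts move to node 1, which then serves four agents while node 2 still serves two, so each node should be sized to absorb the load of at least one peer.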
For the receiving side, the following from_log_sources input instance needs to be included in the configuration file of every node in the cluster.
The to_siem output instance and the relay route complete the relay path depicted in the diagram, sending the events to their final destination, the SIEM Log Collection Server.
# Receiving Events as a Node in a Failover Cluster
<Input from_log_sources>
Module im_tcp
ListenAddr 0.0.0.0:1514
</Input>
# Relay the Events to the SIEM
<Output to_siem>
Module om_tcp
Host siem.example.com:1514
</Output>
<Route relay>
Path from_log_sources => to_siem
</Route>