Windows Server Failover Clustering

Windows Server Failover Clustering (WSFC) is a system-level, high-availability feature that allows groups of servers (nodes) to act as standby nodes for each other. Nodes exchange periodic status messages, known as "heartbeats," over the LAN. Either the active server sends heartbeats to the standby node (push heartbeat), or the standby node periodically requests them from the active server (pull heartbeat). See Failover Clustering in Windows Server on Microsoft Docs for more information about WSFC.
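To make the push model concrete, here is a minimal Python sketch of how a standby node might decide that its peer has failed. It is a toy model, not WSFC's implementation: the HeartbeatMonitor class and its interval and threshold parameters are hypothetical, and the values used are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy push-heartbeat failure detector (hypothetical, not WSFC code).

    interval:  expected seconds between heartbeats from a peer
    threshold: number of intervals that may elapse without a heartbeat
               before the peer is considered down
    """
    interval: float = 1.0
    threshold: int = 5
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Push model: the active node sends, the standby records the arrival time.
        self.last_seen[node] = now

    def is_down(self, node: str, now: float) -> bool:
        # A node we have never heard from is treated as down.
        seen = self.last_seen.get(node)
        if seen is None:
            return True
        return now - seen > self.interval * self.threshold


if __name__ == "__main__":
    monitor = HeartbeatMonitor(interval=1.0, threshold=5)
    monitor.record_heartbeat("ClusterNode1", now=100.0)
    print(monitor.is_down("ClusterNode1", now=103.0))  # False: 3 s of silence
    print(monitor.is_down("ClusterNode1", now=106.5))  # True: 6.5 s > 5 s
```

In a pull variant, the standby would call record_heartbeat() only when its periodic request to the active node succeeds; the timeout check stays the same.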

NXLog Agent can collect logs generated by WSFC, parse them, and forward them to a destination of your choice.

WSFC logging changes in Windows Server 2008 R2

Before Windows Server 2008 R2, WSFC recorded cluster operations and activity in several log files. Windows Server 2008 R2 and newer versions consolidate logging in Windows Event Log and ETW. The following table maps the log files used in earlier versions to their counterparts in Windows Server 2008 R2 and newer.

Table 1. Windows cluster logs comparison with Windows Server 2008 R2

| Pre-2008 R2 log file | Log functionality | Logging in Windows Server 2008 R2 and newer |
|---|---|---|
| %systemroot%\Cluster\cluster.log | Debug log file for clustering operations | Debug-level events are available through ETW. |
| %systemroot%\system32\LogFiles\Cluster\clcfgsrv.log | Cluster installer logs | An HTML installation report is created in %systemroot%\Cluster\CreateCluster.htm. |
| %systemroot%\system32\LogFiles\Cluster\clusocm.log | Records cluster-related activity during an operating system upgrade | This is now part of core Windows event logging. ETW also has two providers for tracing Cluster-Aware Updating. |
| %systemroot%\system32\LogFiles\Cluster\cluscomp.log | Records activity during the compatibility check at the start of an operating system upgrade on a cluster node | This is now part of core Windows event logging. |

Logging across a Windows cluster

In a Windows cluster, the currently active node takes on logging responsibility. As a result, you can install NXLog Agent on any member server, and it will process all logs from the cluster. The following diagram shows an example two-node cluster:

Windows Server Failover Clustering setup

Where:

  • Cluster-DC is the domain controller. It is required because cluster members must be part of the same domain.

  • ClusterNode1 and ClusterNode2 are cluster members.

  • ClusterStorage is the iSCSI storage used by the cluster; logs are saved to a location on this disk.

If NXLog Agent is configured on ClusterNode1 and ClusterNode2 to output logs to separate files, both agents write the same events to their respective files. The hostname in each event identifies the node where it was generated.

System event logged on a member server
2022-05-31 18:47:35 ClusterNode2.example.com INFO Keywords="9259400833873739776" EventType="INFO" SeverityValue="2" EventID="7036" SourceName="Service Control Manager" ProviderGuid="{555908D1-A6D7-4695-8E1E-26931D2012F4}" Version="0" TaskValue="0" OpcodeValue="0" RecordNumber="6634" ExecutionProcessID="652" ExecutionThreadID="2340" Channel="System" Message="The Software Protection service entered the stopped state." param1="Software Protection" param2="stopped" EventData.Binary="7300700070007300760063002F0031000000"

The advantage of such a setup is that if the active node fails, any other node running NXLog Agent continues to process logs and write them to its own output.
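Because each agent writes its own copy, a consumer that reads both output files may want to merge them without duplicates. Below is a minimal Python sketch, assuming the output lines follow the layout of the sample System event above and that duplicate copies share the same Hostname, Channel, and RecordNumber values; parse_line() and merge_node_outputs() are hypothetical helper names.

```python
import re

# Leading "date time hostname severity" fields of a line in the sample layout.
HEADER_RE = re.compile(r"^(\S+ \S+) (\S+) (\S+) ")
# key="value" pairs such as EventID="7036" or EventData.Binary="...".
FIELD_RE = re.compile(r'([\w.]+)="([^"]*)"')


def parse_line(line: str) -> dict:
    """Split one event line (layout as in the sample) into a dict."""
    header = HEADER_RE.match(line)
    if header is None:
        raise ValueError(f"unrecognized line: {line[:40]!r}")
    event = dict(FIELD_RE.findall(line))
    event["EventTime"], event["Hostname"], event["Severity"] = header.groups()
    return event


def merge_node_outputs(*files_lines):
    """Merge event lines collected by several nodes, dropping duplicates.

    Assumption: copies of one event share Hostname, Channel, and RecordNumber.
    """
    seen = set()
    merged = []
    for lines in files_lines:
        for line in lines:
            event = parse_line(line)
            key = (event["Hostname"], event.get("Channel"), event.get("RecordNumber"))
            if key not in seen:
                seen.add(key)
                merged.append(event)
    return merged
```

For example, passing the lines read from the ClusterNode1 and ClusterNode2 output files to merge_node_outputs() returns each event once, in the order it was first seen.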

Collecting events from Windows Event Log

WSFC writes events to the following logs:

  • System

  • Microsoft-Windows-FailoverClustering/Operational (found in Event Viewer under Application and Services Logs > Microsoft > Windows > FailoverClustering)

John Marlin, a Senior Product Manager for High Availability and Storage at Microsoft, provides a detailed List of Failover Cluster Events in Windows 2016/2019 in his Microsoft Tech Community blog post. Additionally, you can find a complete list of Failover Clustering system log events on Microsoft Docs.

FailoverClustering also includes other log categories that are not used; they may be placeholders for future updates.

You can configure NXLog Agent to collect all events from the FailoverClustering source or specify a query to collect only a subset of events.

Example 1. Collecting Windows Failover cluster events

This configuration uses the im_msvistalog input module to collect the following subset of FailoverClustering events:

Event ID 1000 (UNEXPECTED_FATAL_ERROR)

This event is generated when a software or hardware-related issue prevents the cluster service from starting on a node.

Event ID 1006 (NM_EVENT_MEMBERSHIP_HALT)

This event is generated when the cluster service is halted on a member node due to a lack of connectivity with other cluster nodes.

Event ID 1635 (RCM_RESOURCE_FAILURE_INFO)

This event is generated by the Resource Control Manager when a specific shared resource within the cluster fails to come online. It is typically seen in SQL Server and shared disk failures.

Event ID 1637 (RCM_RESOURCE_STATE_TRANSITION)

This event is generated by the Resource Control Manager when there’s a state transition in any clustered resource. This event doesn’t necessarily reflect an error.

Before sending events to their destination, the configuration converts them to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module    xm_json
</Extension>

<Input cluster_evt>
    Module    im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id="0" Path="System">
                <Select Path="System">*[System[Provider[@Name='Microsoft-Windows-FailoverClustering']
                    and (EventID=1000 or EventID=1006)]]
                </Select>
                <Select Path="Microsoft-Windows-FailoverClustering/Operational">
                    *[System[(EventID=1635 or EventID=1637)]]
                </Select>
            </Query>
        </QueryList>
    </QueryXML>
    Exec      to_json();
</Input>
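When the list of event IDs grows, typing the XPath by hand becomes error-prone. The following Python sketch generates an equivalent QueryList using only the standard library; build_query_xml() and its input format are hypothetical, and the result is a single line that you could paste inside the QueryXML block (whitespace between elements does not matter).

```python
import xml.etree.ElementTree as ET


def build_query_xml(selects: dict) -> str:
    """Build a QueryList with one <Select> per channel path.

    `selects` maps a channel path to (provider name or None, list of event IDs).
    """
    query_list = ET.Element("QueryList")
    # Same outer Query attributes as the configuration above.
    query = ET.SubElement(query_list, "Query", Id="0", Path="System")
    for path, (provider, event_ids) in selects.items():
        ids = " or ".join(f"EventID={i}" for i in event_ids)
        if provider:
            xpath = f"*[System[Provider[@Name='{provider}'] and ({ids})]]"
        else:
            xpath = f"*[System[({ids})]]"
        select = ET.SubElement(query, "Select", Path=path)
        select.text = xpath
    return ET.tostring(query_list, encoding="unicode")


if __name__ == "__main__":
    print(build_query_xml({
        "System": ("Microsoft-Windows-FailoverClustering", [1000, 1006]),
        "Microsoft-Windows-FailoverClustering/Operational": (None, [1635, 1637]),
    }))
```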

The following JSON output, after processing by NXLog Agent, shows event ID 1637, logged when the cluster IP address resource transitioned to the OnlinePending state.

Output sample
{
  "EventTime": "2022-05-31T19:48:40.128046-08:00",
  "Hostname": "node2.example.com",
  "Keywords": "4611686018427387904",
  "EventType": "INFO",
  "SeverityValue": 2,
  "Severity": "INFO",
  "EventID": 1637,
  "SourceName": "Microsoft-Windows-FailoverClustering",
  "ProviderGuid": "{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}",
  "Version": 1,
  "TaskValue": 3,
  "OpcodeValue": 0,
  "RecordNumber": 1450,
  "ExecutionProcessID": 2276,
  "ExecutionThreadID": 3544,
  "Channel": "Microsoft-Windows-FailoverClustering/Operational",
  "Domain": "NT AUTHORITY",
  "AccountName": "SYSTEM",
  "UserID": "S-1-5-18",
  "AccountType": "Well Known Group",
  "Message": "Cluster resource 'Cluster IP Address' in clustered role 'Cluster Group' has transitioned from state OnlineCallIssued to state OnlinePending.",
  "Category": "Resource Control Manager",
  "Opcode": "Info",
  "ResourceName": "Cluster IP Address",
  "GroupName": "Cluster Group",
  "FromState": "OnlineCallIssued",
  "ToState": "OnlinePending",
  "FromStateValue": "133",
  "ToStateValue": "129",
  "EventReceivedTime": "2022-05-31T19:48:40.956099-08:00",
  "SourceModuleName": "cluster_evt",
  "SourceModuleType": "im_msvistalog"
}
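If you consume this JSON downstream, a small script can turn records into readable summaries. The Python sketch below assumes one JSON record per line (to_json() typically emits single-line JSON; the sample above is pretty-printed for readability) and labels the event IDs selected in Example 1. The summarize() function and its label wording are hypothetical.

```python
import json

# Event IDs collected in Example 1 and the condition each one reports.
CLUSTER_EVENTS = {
    1000: "Cluster service could not start on a node",
    1006: "Cluster service halted on a node (lost connectivity)",
    1635: "Clustered resource failed to come online",
    1637: "Clustered resource changed state",
}


def summarize(json_line: str) -> str:
    """Return a one-line summary of a FailoverClustering JSON record."""
    event = json.loads(json_line)
    event_id = event["EventID"]
    label = CLUSTER_EVENTS.get(event_id, "Other FailoverClustering event")
    summary = f'{event["Hostname"]}: {event_id} {label}'
    # Event 1637 carries the resource name plus the old and new states.
    if event_id == 1637:
        summary += (f' ({event["ResourceName"]}: '
                    f'{event["FromState"]} -> {event["ToState"]})')
    return summary
```

For the sample record, summarize() returns "node2.example.com: 1637 Clustered resource changed state (Cluster IP Address: OnlineCallIssued -> OnlinePending)".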

Collecting ETW logs

Event Tracing for Windows (ETW) is an advanced debugging feature provided by Microsoft that allows you to create customized event tracing using a provider-consumer model. For more information on how ETW works, refer to About Event Tracing on Microsoft Docs.

WSFC ETW providers

Microsoft documentation on ETW providers is sparse, so working with them often requires trial and error. The following are three providers you are likely to come across:

Microsoft-Windows-FailoverClustering-SoftwareStorageBusTarget

Shared storage is central to WSFC operations. Whether you are running SCSI, iSCSI, or Fibre Channel, you may need visibility into input and output operations and the hardware state of your storage adapters. This provider helps you monitor storage operations on cluster shared storage.

Microsoft-Windows-FailoverClustering-WMIProvider

WMI (Windows Management Instrumentation) is a feature-rich alternative for managing servers. If you manage your cluster with WMI, this is a key ETW provider.

Microsoft-Windows-ClusterAwareUpdating

Cluster-Aware Updating is an optional feature that allows administrators to update cluster members safely. While a node is being updated, it enters a special maintenance mode that can suspend its operations and move any active roles off the server as needed. This ETW provider gives you granular visibility into the transitions, stages, and operations related to this feature. See Cluster-Aware Updating overview on Microsoft Docs for more information.

Follow these steps to obtain the WSFC ETW parameters for your NXLog Agent configuration:

  1. Determine the ETW provider(s) you need. Execute the following command to list all the available providers:

    > logman query providers

    The following providers are related to WSFC at the time of writing:

    • Microsoft-Windows-FailoverClustering {BAF908EA-3421-4CA9-9B84-6689B8C6F85F}

    • Microsoft-Windows-FailoverClustering-Client {A82FDA5D-745F-409C-B0FE-18AE0678A0E0}

    • Microsoft-Windows-FailoverClustering-ClusBflt-Diagnostic {923BCB94-58D2-42BE-BBA9-B1315F363838}

    • Microsoft-Windows-FailoverClustering-ClusDisk-Diagnostic {7FEF367F-E76C-4592-9912-E12B36A99780}

    • Microsoft-Windows-FailoverClustering-Clusport-Diagnostic {29C07D0E-E5A0-4E85-A004-1F668531CE22}

    • Microsoft-Windows-FailoverClustering-CsvFlt-Diagnostic {151D3C03-E442-4C4F-AF20-BD48FF41F793}

    • Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic {6A86AE90-4E9B-4186-B1D1-9CE0E02BCBC1}

    • Microsoft-Windows-FailoverClustering-Manager {11B3C6B7-E06F-4191-BBB9-7099FFF55614}

    • Microsoft-Windows-FailoverClustering-NetFt {C1FCCEB3-3F19-42A9-95B9-27B550FA1FBA}

    • Microsoft-Windows-FailoverClustering-SoftwareStorageBusTarget {0AC0708A-A44E-49EF-AA7E-FBE8CCC603A6}

    • Microsoft-Windows-FailoverClustering-WMIProvider {0461BE3C-BC15-4BAD-9A9E-51F3FADFEC75}

    • Microsoft-Windows-ClusterAwareUpdating {10629806-46F2-4366-9092-53025E067E8C}

    • Microsoft-Windows-ClusterAwareUpdating-Management {9B9E93D6-5569-4179-8C8A-5201CB2B9536}

  2. Use the provider GUID to query the keywords available for tracing, e.g.:

    > logman query providers "{A82FDA5D-745F-409C-B0FE-18AE0678A0E0}"

    You should see output similar to the following:

    Provider                                 GUID
    -------------------------------------------------------------------------------
    Microsoft-Windows-FailoverClustering-Client {A82FDA5D-745F-409C-B0FE-18AE0678A0E0}
    
    Value               Keyword              Description
    -------------------------------------------------------------------------------
    0x0000000000000001  Cluster              Cluster
    0x0000000000000002  Node
    0x0000000000000004  Group
    0x0000000000000008  Resource
    0x0000000000000010  Network
    0x0000000000000020  NetInt
    0x0000000000000040  Quorum
    0x0000000000000080  Reconnect            Reconnect
    0x0000000000000100  ResType
    0x0000000000000200  Property
    0x0000000000000400  RPCLog
    0x8000000000000000  System               System
    0x4000000000000000  Microsoft-Windows-FailoverClustering-Client/Diagnostic Microsoft-Windows-FailoverClustering-Client/Diagnostic
    
    Value               Level                Description
    -------------------------------------------------------------------------------
    0x01                win:Critical         Critical
    0x02                win:Error            Error
    0x03                win:Warning          Warning
    0x04                win:Informational    Information
    0x05                win:Verbose          Verbose
    
    PID                 Image
    -------------------------------------------------------------------------------
    0x000005d4          C:\Windows\System32\msdtc.exe
    0x000007a4          C:\Windows\System32\spoolsv.exe
    0x00000878          C:\Windows\System32\svchost.exe
    
    The command completed successfully.

  3. The keywords in the output above represent categories of events that can be included in the trace session. Take note of the hexadecimal values next to the keywords you’re interested in. Their sum is the keyword mask (not the tracing level, which is set separately using the values in the Level table) for your NXLog Agent configuration. For example, to trace Cluster, Node, Network, and Quorum events, the resulting value is 0x0000000000000053 (0x01 + 0x02 + 0x10 + 0x40).
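Rather than adding the values by hand, you can compute the mask from the logman output. The following Python sketch is one way to do it, assuming you paste the keyword rows into a string; parse_keywords() and keyword_mask() are hypothetical helper names, and LOGMAN_KEYWORDS holds an excerpt of the output shown above. Because each keyword occupies a distinct bit, combining them with bitwise OR gives the same result as summing them.

```python
import re

# Excerpt of the keyword rows printed by logman (Value, Keyword, Description).
LOGMAN_KEYWORDS = """\
0x0000000000000001  Cluster              Cluster
0x0000000000000002  Node
0x0000000000000004  Group
0x0000000000000008  Resource
0x0000000000000010  Network
0x0000000000000020  NetInt
0x0000000000000040  Quorum
0x0000000000000080  Reconnect            Reconnect
"""

# Keyword rows use 16 hex digits; Level rows (0x01) and PID rows do not match.
ROW_RE = re.compile(r"^\s*(0x[0-9A-Fa-f]{16})\s+(\S+)")


def parse_keywords(text: str) -> dict:
    """Map keyword names to their 64-bit values."""
    keywords = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            keywords[match.group(2)] = int(match.group(1), 16)
    return keywords


def keyword_mask(keywords: dict, wanted: list) -> int:
    """OR together the selected keywords (equal to their sum for distinct bits)."""
    mask = 0
    for name in wanted:
        mask |= keywords[name]
    return mask


if __name__ == "__main__":
    keywords = parse_keywords(LOGMAN_KEYWORDS)
    mask = keyword_mask(keywords, ["Cluster", "Node", "Network", "Quorum"])
    print(f"0x{mask:016X}")  # 0x0000000000000053
```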

Example 2. Collecting Failover Clustering client trace logs

This configuration uses the im_etw input module to collect ETW events from the Microsoft-Windows-FailoverClustering-Client provider.

The Level directive specifies that it should capture events at the Warning level and above (Warning, Error, and Critical).

The MatchAnyKeyword directive specifies the keyword mask calculated above (0x53, for the Cluster, Node, Network, and Quorum keywords).

Finally, the configuration converts events to JSON format using the to_json() procedure of the xm_json module.

nxlog.conf
<Extension json>
    Module             xm_json
</Extension>

<Input cluster_etw>
    Module             im_etw
    Provider           Microsoft-Windows-FailoverClustering-Client
    Level              Warning
    MatchAnyKeyword    0x00000053
    Exec               to_json();
</Input>
Output sample
{
  "SourceName": "Microsoft-Windows-FailoverClustering",
  "ProviderGuid": "{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}",
  "Channel": "Microsoft-Windows-FailoverClustering/DiagnosticVerbose ",
  "EventID": 5408,
  "Version": 0,
  "ChannelID": 18,
  "OpcodeValue": 0,
  "TaskValue": 0,
  "Keywords": "1152921504606846976",
  "EventTime": "2022-05-31T16:47:18.032921-08:00",
  "ExecutionProcessID": 4336,
  "ExecutionThreadID": 2392,
  "EventType": "DEBUG",
  "SeverityValue": 1,
  "Severity": "DEBUG",
  "Hostname": "node2",
  "Domain": "NT AUTHORITY",
  "AccountName": "SYSTEM",
  "UserID": "S-1-5-18",
  "AccountType": "Well Known Group",
  "Flags": "EXTENDED_INFO|IS_64_BIT_HEADER|PROCESSOR_INDEX (577)",
  "LogString": "[RCM] rcm::PreemptionTracker::GetPreemptedGroups()",
  "EventReceivedTime": "2022-05-31T16:47:19.035684-08:00",
  "SourceModuleName": "cluster_etw",
  "SourceModuleType": "im_etw"
}
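The Level and MatchAnyKeyword directives correspond to the filtering that ETW applies when a trace session enables a provider. The following Python sketch is a simplified model of that rule as we understand it from Microsoft's EnableTraceEx2 documentation; it is not NXLog code, etw_event_passes() is a hypothetical name, and the special case of an all-zero MatchAnyKeyword is not modeled.

```python
# Numeric ETW levels as listed in the logman output: lower is more severe.
LEVELS = {"Critical": 1, "Error": 2, "Warning": 3, "Information": 4, "Verbose": 5}


def etw_event_passes(event_level: int, event_keywords: int, session_level: int,
                     match_any: int, match_all: int = 0) -> bool:
    """Simplified model of ETW level and keyword filtering (not NXLog code).

    Not modeled: the special handling of an all-zero MatchAnyKeyword.
    """
    # Level filter: an event passes if its level is <= the session level.
    if event_level > session_level:
        return False
    # Events with no keyword bits set bypass the keyword filter.
    if event_keywords == 0:
        return True
    # MatchAnyKeyword: at least one bit in common.
    # MatchAllKeyword: every bit of match_all must be present.
    return (event_keywords & match_any) != 0 and \
        (event_keywords & match_all) == match_all


if __name__ == "__main__":
    mask = 0x53  # Cluster | Node | Network | Quorum
    session = LEVELS["Warning"]
    print(etw_event_passes(LEVELS["Error"], 0x40, session, mask))    # True
    print(etw_event_passes(LEVELS["Verbose"], 0x40, session, mask))  # False
    print(etw_event_passes(LEVELS["Error"], 0x04, session, mask))    # False
```

Under this model, an Error event tagged only with the Group keyword (0x04) is dropped because it shares no bits with 0x53, while a Warning event tagged with Quorum (0x40) is delivered.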
Disclaimer

While we endeavor to keep the information in our guides up to date and correct, NXLog makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the content represented here. We update our screenshots and instructions on a best-effort basis.

The accuracy of this content was tested and verified in our lab environment at the time of the last revision with the following software versions:

Microsoft Windows Server 2019 Standard
NXLog Agent version 5.4.7313

Last revision: 7 June 2022