Detect an inactive data source with NXLog Agent
Telemetry data is essential to your IT infrastructure, especially events and metrics from business-critical applications. An agent not processing data may indicate a problem with the data source or destination, such as a software or hardware error, unplanned network interruption, or a malicious attack.
NXLog Agent provides statistical counters, which you can pair with a scheduled check to alert you if the agent stops processing data. Below, we give an example of generating an alert when NXLog Agent has not received data for over an hour.
This configuration receives data over TCP with the im_tcp input module. It creates a statistical counter named msgrate within the context of the input module instance with the create_stat() procedure and increases it by 1 with the add_stat() procedure for every record it processes. The message rate counter has an interval of one hour (3600 seconds), i.e., the counter value is reset to 0 every hour.
In addition, the configuration uses a Schedule block, which checks the value of the same statistical counter every hour. If the value of msgrate is 1 or less, it logs an error message in the NXLog Agent log file.
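<Input tcp>
    Module      im_tcp
    ListenAddr  0.0.0.0:1514
    <Exec>
        # Create the hourly rate counter and count each incoming record
        create_stat("msgrate", "RATE", 3600);
        add_stat("msgrate", 1);
    </Exec>
    <Schedule>
        # Check the counter once per hour
        Every   3600 sec
        <Exec>
            # Same name and interval as above, so the counter is available
            # here even if no record has arrived yet
            create_stat("msgrate", "RATE", 3600);
            add_stat("msgrate", 0);
            if defined get_stat("msgrate") and get_stat("msgrate") <= 1
                log_error("No messages received from the source for the last hour!");
        </Exec>
    </Schedule>
</Input>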
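The same pattern can be adapted to a shorter detection window. The following is a minimal sketch, not a tested configuration: the instance name tcp_10min, the port 1515, and the 600-second window are assumptions chosen for illustration, and it uses only the directives and procedures from the example above.

```
<Input tcp_10min>
    Module      im_tcp
    ListenAddr  0.0.0.0:1515
    <Exec>
        # Hypothetical 10-minute window instead of one hour
        create_stat("msgrate", "RATE", 600);
        add_stat("msgrate", 1);
    </Exec>
    <Schedule>
        # Keep the check period equal to the counter interval
        Every   600 sec
        <Exec>
            create_stat("msgrate", "RATE", 600);
            add_stat("msgrate", 0);
            if defined get_stat("msgrate") and get_stat("msgrate") <= 1
                log_error("No messages received from the source for the last 10 minutes!");
        </Exec>
    </Schedule>
</Input>
```

A shorter window detects an outage sooner but may raise false alerts for sources that are normally quiet for long periods, so choose it based on the expected traffic of the source.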
NXLog Agent provides several other ways to generate alerts. See Trigger alerts with NXLog Agent for more examples.