Detecting an inactive agent or log source
A common requirement is to detect when a source stops sending log messages. Silence usually indicates a problem such as a network connectivity issue, a server that is down, or an unresponsive application or system service. Such problems should normally be caught by monitoring tools (such as Nagios or OpenView), but the absence of logs is also a good reason to investigate.
Note: The im_mark module is designed as a means of monitoring the health of the NXLog agent by generating "mark" messages every 30 minutes. The message text and interval are configurable.
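As a minimal sketch of the im_mark behavior described above, the fragment below overrides both defaults. The instance name `mark`, the one-minute interval, and the message text are illustrative assumptions; `Mark` and `MarkInterval` are the im_mark directives for the message text and for the interval in minutes. A route to an output is still needed for the messages to go anywhere.

```
<Input mark>
    Module        im_mark
    # Interval in minutes (the default is 30)
    MarkInterval  1
    # Text of the generated message
    Mark          -=| MARK |=-
</Input>
```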
One solution is to combine statistical counters with Schedule blocks. The input module updates a statistical counter configured to calculate events per hour. In the same input module, a Schedule block periodically checks the value of that counter. When the event rate is zero or drops below a chosen limit, an appropriate action can be executed, such as sending an alert email or generating an internal warning message. Note that there are other ways to address this issue, and this method may not be optimal for all situations.
The following configuration example creates a statistical counter in the context of the im_tcp module to calculate the number of events received per hour. The Schedule block within the context of the same module checks the value of the msgrate statistical counter and generates an internal error message when no logs have been received within the last hour.
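    <Input in>
        Module  im_tcp
        Port    2345
        <Exec>
            # Runs for every received event: create the hourly rate counter
            # if it does not exist yet, then count this event.
            create_stat("msgrate", "RATE", 3600);
            add_stat("msgrate", 1);
        </Exec>
        <Schedule>
            Every   3600 sec
            <Exec>
                # Make sure the counter exists even if no event has arrived,
                # using the same type and interval as above.
                create_stat("msgrate", "RATE", 3600);
                # Adding zero updates the counter without counting an event.
                add_stat("msgrate", 0);
                if defined get_stat("msgrate") and get_stat("msgrate") <= 1
                    log_error("No messages received from the source!");
            </Exec>
        </Schedule>
    </Input>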
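The same pattern can also cover the "drops below a certain limit" case mentioned earlier. The following is a hedged variation rather than a recommended setup: the threshold of 100 events per hour and the wording of the warning are assumptions made for illustration, and `log_warning()` is used in place of `log_error()` so that a low rate is reported as a warning.

```
<Input in>
    Module  im_tcp
    Port    2345
    <Exec>
        # Count each received event toward the hourly rate.
        create_stat("msgrate", "RATE", 3600);
        add_stat("msgrate", 1);
    </Exec>
    <Schedule>
        Every   3600 sec
        <Exec>
            # Ensure the counter exists even if nothing has arrived yet.
            create_stat("msgrate", "RATE", 3600);
            add_stat("msgrate", 0);
            # 100 is a hypothetical minimum; tune it to the source's normal volume.
            if defined get_stat("msgrate") and get_stat("msgrate") < 100
                log_warning("Event rate from the source is below the expected minimum");
        </Exec>
    </Schedule>
</Input>
```

A threshold above zero catches a source that is degraded but not fully silent, at the cost of false alarms during legitimately quiet periods.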