Detecting an inactive agent or log source

A common requirement is to detect when no log messages are arriving from a source. This usually indicates a problem such as a network connectivity issue, a server that is down, or an unresponsive application or system service. Such problems are usually best detected by dedicated monitoring tools (such as Nagios or OpenView), but the absence of logs is also a good reason to investigate.

The im_mark module is designed as a means of monitoring the health of the NXLog agent itself: it generates "mark" messages, every 30 minutes by default. Both the message text and the interval are configurable.
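A minimal sketch of an im_mark input follows. It assumes the im_mark directives MarkInterval (interval in minutes) and Mark (message text); the instance name mark is arbitrary, and the values shown simply restate the defaults.

<Input mark>
    Module        im_mark
    # Interval between mark messages, in minutes (30 is the default)
    MarkInterval  30
    # Text of each generated mark message
    Mark          -- MARK --
</Input>

Like any other input, this instance only produces visible output once it is connected to an output module in a Route.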

One way to detect a silent source is to combine statistical counters with scheduled checks. The input module updates a statistical counter configured to calculate events per hour, and a Schedule block in the same input module periodically checks the counter's value. When the event rate is zero or drops below a set limit, an appropriate action can be executed, such as sending an alert email or generating an internal warning message. This is not the only approach, and it may not be optimal in every situation.

Example 1. Alerting on absence of log messages

The following configuration example creates a statistical counter in the im_tcp input instance to calculate the number of events received per hour. The Schedule block in the same instance checks the value of the msgrate counter every hour and generates an internal error message when the rate has dropped to one or fewer, that is, when the source has effectively stopped sending logs.

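<Input in>
    Module  im_tcp
    Port    2345
    # Count every received event in an hourly RATE counter
    <Exec>
        create_stat("msgrate", "RATE", 3600);
        add_stat("msgrate", 1);
    </Exec>
    # Once per hour, check the counter and log an internal error if the rate is at most 1
    <Schedule>
        Every   3600 sec
        <Exec>
            create_stat("msgrate", "RATE", 10);
            add_stat("msgrate", 0);
            if defined get_stat("msgrate") and get_stat("msgrate") <= 1
                log_error("No messages received from the source!");
        </Exec>
    </Schedule>
</Input>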