
Don't call me, I'll call you. Heartbeat Monitoring Demystified

Posted May 7th, 2009 by berkay
in
  • event management
  • monitoring
  • RapidInsight
  • RapidOSS

“Polling” and “listening” are the most common monitoring techniques. In polling, the monitoring system periodically calls the monitored systems to check whether they are operating as they should and to collect data (performance, etc.). In listening, the monitoring system waits (listens) for the monitored systems to send information, either directly or indirectly (through other systems, etc.). In network monitoring, a monitoring system querying the SNMP agents on devices is an example of polling, while processing traps and syslog messages are examples of listening.

Heartbeat monitoring is another technique that can be considered an enhanced form of listening. Some systems periodically send events, aka heartbeats, to signal that they are operating as they should. In this case it is the absence of heartbeat events, not the events themselves, that indicates a problem, so these events need to be processed differently from typical events.

It has always been possible to implement the logic needed to handle heartbeat events in RapidOSS. As of version 3.3, RapidOSS includes a reference implementation for processing heartbeat events.

Instead of creating an event for each heartbeat and then processing/correlating these events, heartbeats are stored in a separate class. The RsHeartBeat class provides all the functionality required to process and store heartbeat messages. The process is straightforward:

1. Identify the objects for which periodic heartbeats are expected, giving each a name and the expected heartbeat interval.
      RsHeartBeat.configureHeartBeatMonitoring("System1", 60)

2. Process received heartbeat events
      RsHeartBeat.recordHeartBeat("System1")

3. If no heartbeat is received from a monitored system for longer than the specified heartbeat interval, an event is created.

4. The event is cleared automatically when a heartbeat is received again.
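
To make the four steps concrete, here is a minimal stand-alone sketch of the same logic in Python. It is not the RsHeartBeat implementation: the class and method names (HeartbeatMonitor, configure, record, check) are hypothetical, the interval is assumed to be in seconds, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonitoredSystem:
    interval: float           # expected time between heartbeats (assumed seconds)
    last_seen: float          # time of the last heartbeat, or of configuration
    event_open: bool = False  # True while a "missing heartbeat" event is active


class HeartbeatMonitor:
    """Hypothetical illustration only; not the RapidOSS RsHeartBeat class."""

    def __init__(self) -> None:
        self.systems: Dict[str, MonitoredSystem] = {}
        self.log: List[str] = []

    def configure(self, name: str, interval: float, now: float) -> None:
        # Step 1: register the system. The clock starts at configuration time,
        # so a system that never reports is still detected.
        self.systems[name] = MonitoredSystem(interval=interval, last_seen=now)

    def record(self, name: str, now: float) -> None:
        # Step 2: store the heartbeat instead of creating an event for it.
        system = self.systems[name]
        system.last_seen = now
        if system.event_open:
            # Step 4: a returning heartbeat clears the open event.
            system.event_open = False
            self.log.append(f"CLEARED {name}")

    def check(self, now: float) -> None:
        # Step 3: raise one event per system whose silence exceeds its interval.
        for name, system in self.systems.items():
            overdue = now - system.last_seen > system.interval
            if overdue and not system.event_open:
                system.event_open = True
                self.log.append(f"MISSING {name}")


monitor = HeartbeatMonitor()
monitor.configure("System1", 60, now=0)
monitor.record("System1", now=30)
monitor.check(now=80)               # 50 s since the last heartbeat: still fine
monitor.check(now=100)              # 70 s of silence: event is created
monitor.check(now=110)              # still silent: no duplicate event
monitor.record("System1", now=120)  # heartbeat is back: event is cleared
print(monitor.log)
```

In a real deployment the check would run on a timer and read the current time itself; passing `now` explicitly is only a convenience for the sketch.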

There are a number of use cases where the heartbeat monitoring approach can be very useful. When using the listening approach to receive events from external systems, a heartbeat can confirm that the sending system is operational (no news is not necessarily good news!). In fact, we often use this technique to monitor the monitoring systems themselves. For example, to ensure that SNMP traps are processed throughout the system, a trap may be sent periodically as a heartbeat. The absence of this trap for longer than the specified period would indicate a problem somewhere in the chain that has to be addressed immediately.
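
The "monitor the monitoring system" idea can be sketched the same way. In the Python illustration below, a heartbeat event is pushed through a chain of processing stages, and a watchdog at the far end only checks how long it has been since a heartbeat last arrived. Everything here is hypothetical (the stage functions, the ChainWatchdog class, the 120-second limit); no real SNMP traps are sent.

```python
from typing import Callable, List, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]


def run_chain(stages: List[Stage], event: Event) -> Optional[Event]:
    """Pass an event through each stage; a stage returning None drops it."""
    current: Optional[Event] = event
    for stage in stages:
        current = stage(current)
        if current is None:
            return None
    return current


class ChainWatchdog:
    """Sits at the end of the chain and tracks the last heartbeat it saw."""

    def __init__(self, max_silence: float) -> None:
        self.max_silence = max_silence  # assumed to be in seconds
        self.last_heartbeat = 0.0

    def deliver(self, event: Optional[Event], now: float) -> None:
        if event is not None and event.get("type") == "heartbeat":
            self.last_heartbeat = now

    def chain_healthy(self, now: float) -> bool:
        return now - self.last_heartbeat <= self.max_silence


def receiver(event: Event) -> Optional[Event]:
    return event


def healthy_parser(event: Event) -> Optional[Event]:
    return {**event, "parsed": True}


def broken_parser(event: Event) -> Optional[Event]:
    return None  # simulates a stage that silently loses events


watchdog = ChainWatchdog(max_silence=120)
heartbeat = {"type": "heartbeat", "source": "trap-sender"}

# The heartbeat makes it through a working chain.
watchdog.deliver(run_chain([receiver, healthy_parser], heartbeat), now=60)
healthy_at_100 = watchdog.chain_healthy(now=100)

# A stage starts dropping events; the sender is still sending on schedule.
watchdog.deliver(run_chain([receiver, broken_parser], heartbeat), now=120)
watchdog.deliver(run_chain([receiver, broken_parser], heartbeat), now=180)
healthy_at_200 = watchdog.chain_healthy(now=200)

print(healthy_at_100, healthy_at_200)
```

The watchdog never needs to know which stage failed. The missing heartbeat only says that something between sender and receiver is broken, which is the signal to start looking.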

I believe that the functionality included as part of RapidOSS v3.3 will reduce the barriers to using this technique and make it easy to implement heartbeat monitoring.
