Application level monitoring refers to monitoring the actual functionality of an application rather than just whether it is running. It is not uncommon for an application to be up and running yet not working as it should, due to a variety of reasons. Application level monitoring requires simulating the common activities an application performs to ensure it is working as expected, rather than relying on more typical methods such as checking whether the process is running or the port is active.
For example, trap processing in Smarts involves a number of components, such as the SNMP Trap Adapter, the Open Integration Server (OI), and the Service Assurance Manager (SAM). In addition, it may also be necessary to consider the network devices and firewall configuration. For example, if a firewall change blocks the SNMP trap port, no trap would be received and processed by Smarts, yet there would be no notification indicating the problem, since all application components are still functional. Any problem with the components involved, or with the communications among them, might impede the processing of the traps.
The traditional approach is to monitor each component in the chain individually. Experience shows this approach to be inadequate for critical applications. There are also tools that simulate transactions; however, these tools are often not flexible enough and need extensive customization to handle monitoring of complex applications.
To ensure Smarts trap processing is working as it should, the simplest approach would be:
- send an SNMP trap to the Smarts Trap Adapter (from a different server)
- verify that a notification is created in the SAM server for the trap
This would be sufficient to determine whether or not Smarts trap processing is working. When there is a problem, we may not be able to determine exactly where it is (which component, between which components, etc.), but once we are aware of it, further investigation can be performed to determine the cause.
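The two steps above can be sketched as a single end-to-end check. This is only a minimal illustration, not the implementation used in this post: `send_trap` and `notification_exists` are hypothetical callables standing in for whatever actually sends the trap and queries the SAM server, and the timeout values are arbitrary assumptions.

```python
import time
from typing import Callable


def check_trap_processing(
    send_trap: Callable[[], None],
    notification_exists: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send one test trap, then poll until the notification appears or we give up.

    Both callables are placeholders (assumptions): one sends the trap,
    the other asks the SAM server whether the notification exists.
    """
    send_trap()
    waited = 0.0
    while True:
        if notification_exists():
            return True          # trap made it through the whole chain
        if waited >= timeout_s:
            return False         # nothing arrived in time: something is broken
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing the two steps in as functions keeps the check independent of how the trap is sent and how the server is queried, so either side can be replaced or faked for testing.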
It may also be possible to aid the investigation by automating some of the steps and increasing the granularity of the monitoring. For example, the following approach would give us a better idea of where the problem may be:
- send an SNMP trap to the Smarts Trap Adapter
- verify that a notification is created in the OI server with the right property values. This will indicate whether the trap adapter and the OI Server are working and the trap adapter is communicating with the OI server. If the OI server is accessible but the notification does not exist, it would indicate a problem with the trap adapter or with the communications with the trap adapter.
- verify that a notification is created in the SAM server with the expected notification property values (a hook script may change the notification property values). This will indicate whether the SAM server is working and communicating with the OI server. If the SAM server is responding but the notification does not exist, this may indicate a problem with OI to SAM communications, the hook script, etc.
As described above, this approach would facilitate easier troubleshooting by narrowing down the possibilities. We can determine whether the problem is related to the trap adapter, the OI server, or the SAM server.
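The narrowing-down logic can be written as a small decision function. The sketch below is an illustration under assumptions: the four boolean inputs are hypothetical results of checks against the OI and SAM servers, and the returned strings simply restate the suspects listed in the steps above.

```python
from typing import Optional


def locate_problem(
    oi_reachable: bool,
    oi_has_notification: bool,
    sam_reachable: bool,
    sam_has_notification: bool,
) -> Optional[str]:
    """Map check results to the most likely problem area, or None if all is well."""
    if not oi_reachable:
        return "OI server is not responding"
    if not oi_has_notification:
        # OI is up but never saw the trap.
        return "trap adapter, or communication between the trap adapter and the OI server"
    if not sam_reachable:
        return "SAM server is not responding"
    if not sam_has_notification:
        # OI has the notification but SAM does not.
        return "OI to SAM communication, or the hook script"
    return None


if __name__ == "__main__":
    print(locate_problem(True, False, True, False))
```

The checks are evaluated in the order the trap travels, so the first failing stage is the one reported.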
Next we will go through the implementation of an application level monitoring solution for Smarts trap processing. We could implement the full functionality above with a scripting language like Perl or Groovy without using any tools, but we would be developing a lot of code that already exists and has been tested, and that has nothing to do directly with what we are trying to achieve. RapidWatcher, a toolkit we have released as open source, provides all the common functionality needed, leaving us only the work of developing our monitoring logic.
Sending an SNMP trap
Smarts software includes a tool to send SNMP traps. We will use the sm_snmp utility (part of the Smarts client package) to send the SNMP trap to the Smarts server.
sm_snmp -d 192.168.1.10 trap 192.168.2.4 .1.3.6.1.4.1.5.5.5 6 6 0
Note that if RapidWatcher is installed on a server that is not on the same LAN as Smarts, network/firewall configuration problems can also be detected as an added bonus.
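If the trap is sent from a script, the command can be wrapped so that a missing binary or a hung call is reported as a failure instead of crashing the monitor. The sketch below reuses the exact command line shown above. Treating exit code 0 as success is an assumption about sm_snmp, and the `runner` parameter is a hypothetical hook added only to make the function testable.

```python
import shlex
import subprocess

# Command line copied verbatim from the example above.
TRAP_COMMAND = "sm_snmp -d 192.168.1.10 trap 192.168.2.4 .1.3.6.1.4.1.5.5.5 6 6 0"


def send_trap(command: str = TRAP_COMMAND, runner=subprocess.run, timeout_s: float = 10.0) -> bool:
    """Run the trap command; return True if it appears to have succeeded.

    Assumption: sm_snmp exits with code 0 on success.
    """
    argv = shlex.split(command)
    try:
        result = runner(argv, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # Binary not found, not executable, or the call hung.
        return False
    return result.returncode == 0
```

A `False` result here only says the trap could not be sent from this host. It says nothing yet about whether Smarts processed it.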
A specific SNMP OID needs to be chosen for the SNMP trap that will be sent to Smarts, and the Smarts Trap Adapter needs to be configured to process traps with that OID. The following entry in the trap_mgr.conf file would do what we need:
[code]
BEGIN_TRAP .1.3.6.1.4.1.5.5.5 6 6
ClassName: Monitor
InstanceName: $SYS$
EventName: MonitorHeartbeat
Severity: 5
Expiration: 360
EventType: MOMENTARY
UnknownAgent:CREATE
LogFile: Monitor.log
END_TRAP
[/code]
As an optional step, if a hook script is being used to process notifications sent from the OI server to the SAM server, the hook script can be modified to process the notification created for our test trap as well. This would enable us to compare the notification properties from OI and SAM and determine whether the hook script processes the notification as expected.
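Comparing the two sets of notification properties could look like the sketch below. It assumes the properties have already been fetched from OI and SAM into plain dictionaries (how they are fetched is outside this example). The `hook_changes` argument is a hypothetical way of describing what the hook script is expected to rewrite.

```python
from typing import Any, Dict, Optional, Tuple


def unexpected_differences(
    oi_props: Dict[str, Any],
    sam_props: Dict[str, Any],
    hook_changes: Optional[Dict[str, Any]] = None,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {property: (expected_in_sam, actual_in_sam)} for every mismatch.

    A property is expected to carry the OI value into SAM unless
    hook_changes says the hook script should set it to something else.
    """
    hook_changes = hook_changes or {}
    problems: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(set(oi_props) | set(sam_props) | set(hook_changes)):
        expected = hook_changes.get(name, oi_props.get(name))
        actual = sam_props.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


if __name__ == "__main__":
    # Hypothetical values: a hook script that lowers Severity from 5 to 3.
    oi = {"EventName": "MonitorHeartbeat", "Severity": 5}
    sam = {"EventName": "MonitorHeartbeat", "Severity": 3}
    print(unexpected_differences(oi, sam, hook_changes={"Severity": 3}))
```

An empty result means SAM holds exactly what OI sent plus the hook script's intended changes.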
The trap configuration above specifies that a momentary notification will be created for the specified trap, meaning the notification will stay active on the server for a limited time, slightly longer than the interval between the traps that are sent. For example, if we send a trap every 5 minutes, the notification may stay active for 6 minutes (the Expiration value of 360 above). As long as new traps continue to come in more frequently than the specified time interval, the notification remains active. If no new trap is received within the time interval, the notification is cleared by Smarts and becomes inactive. Hence we can check whether there is an active notification in the Smarts server, and as long as the notification is active we can infer that the traps are being processed.
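The timing can be checked with a small model. This is a simplified sketch, not a description of Smarts internals: it assumes each received trap restarts a 360 second expiration window and that the notification counts as inactive once the window has fully elapsed.

```python
from typing import Iterable

EXPIRATION_S = 360     # notification lifetime from the configuration above
TRAP_INTERVAL_S = 300  # we send a trap every 5 minutes


def notification_active(
    trap_times: Iterable[float], now: float, expiration_s: float = EXPIRATION_S
) -> bool:
    """True if a trap was received less than expiration_s seconds before `now`."""
    received = [t for t in trap_times if t <= now]
    if not received:
        return False
    return now - max(received) < expiration_s


if __name__ == "__main__":
    # Traps arriving on schedule keep the notification alive.
    print(notification_active([0, 300, 600], now=900))
    # The trap due at t=300 never arrived, so the window opened at t=0 has run out.
    print(notification_active([0, 600], now=400))
```

With these numbers a single missed trap is detected within 60 seconds of the time it was due, which is the margin between the trap interval and the expiration.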
RapidWatcher utilizes a very simple mechanism. The resources being monitored by RapidWatcher are represented as objects and can be defined in a configuration file or dynamically. RapidWatcher periodically executes the “monitor” script, which determines and sets the state of each object that is monitored by RapidWatcher. If the state of an object changes, RapidWatcher executes the “action” script so that the appropriate action can be taken. This simple structure allows complete flexibility in monitoring and action execution while providing the necessary infrastructure: a configuration mechanism, password encryption, task scheduling, easy access to information through a web browser or RSS reader, etc. RapidWatcher allows us to focus on developing the monitoring logic while it handles the rest.
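The monitor/action mechanism can be illustrated with a few lines of code. This is a conceptual sketch only and does not use RapidWatcher's actual API. The `Watcher` class, the "unknown" initial state, and the callable signatures are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable


class Watcher:
    """Toy model: run a monitor per object, fire an action on state change."""

    def __init__(
        self,
        monitor: Callable[[str], str],
        action: Callable[[str, str, str], None],
    ) -> None:
        self.monitor = monitor   # object name -> new state
        self.action = action     # (object name, old state, new state)
        self.states: Dict[str, str] = {}

    def run_once(self, object_names: Iterable[str]) -> None:
        for name in object_names:
            old_state = self.states.get(name, "unknown")
            new_state = self.monitor(name)
            self.states[name] = new_state
            if new_state != old_state:
                # Only changes trigger the action, not every poll.
                self.action(name, old_state, new_state)


if __name__ == "__main__":
    results = iter(["up", "up", "down"])
    watcher = Watcher(
        monitor=lambda name: next(results),
        action=lambda name, old, new: print(f"{name}: {old} -> {new}"),
    )
    for _ in range(3):
        watcher.run_once(["smarts-trap-processing"])
```

In a real deployment the scheduler would call the equivalent of `run_once` at a fixed interval. Here it is called three times by hand.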
The Monitor Script
As stated above, the logic that determines whether trap processing is working as expected is implemented in the monitor script.
The monitor script sends an SNMP trap every time it is executed. This ensures that the notification stays active in the Smarts server as long as the trap is processed.
The script then checks for the notification created for the trap. If the notification is not active or does not exist, it indicates a problem with the SNMP trap processing.
The application level monitoring solution for Smarts SNMP trap processing was developed by Burak Bala on top of our open source monitoring tool RapidWatcher. The scripts and the configuration files used in this post, as well as RapidWatcher itself, are freely available for download. Let us know if you find it useful or have suggestions/requests for improvement.
Additional steps and future enhancements may be (look for them in future posts):
- The action script has not been utilized in this example. Typically, the action script is used to inform the interested parties via email, instant messages, SMS, etc. Another use in this example may be to run additional tests to further analyze where the problem may be. For example, additional information may be collected by running commands or interfacing with other tools to accelerate the problem resolution by providing supporting data along with the problem to the operators.
- The monitoring functionality we've implemented monitors the workings of the Smarts trap processing, from the Trap Adapter to the SAM server. The mechanism can be extended to include other components. For example, if Smarts Report Manager is used to archive Smarts notifications, running a report to retrieve the notifications for the traps that are sent would enable us to determine whether the SDI Adapter, the database, and the application server that serves the reports are working as expected.
- The application level monitoring concept can be applied to Smarts domain managers such as IP Availability Manager (AM). Monitoring of domain managers and of the communications between domain managers and the SAM server requires a different approach, as domain managers do not have notifications and simulating an event is somewhat different.

