Using Resilient Operations Sentinel Servers

Two Operations Sentinel servers can be configured for use in a resilient monitoring environment. To use this feature, each client program must establish a connection to the secondary server. All events generated by the client are normally sent to the primary server. If the client has registered for event callbacks, only events from the primary server are reported to the client while the primary server is available.

If a client loses its connection with the primary server, all events generated by the client are sent to the secondary server. In addition, the client callbacks begin receiving events from the secondary server. When the primary server is available again, all events are once more sent to the primary server, and the client callbacks again receive events only from the primary Operations Sentinel server.
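The routing rules above can be sketched as a small state model. This is only an illustration of the described behavior, not the Event Server API: the class `ResilientRouter` and its methods are hypothetical names, and returning `None` when no secondary server is configured is an assumption about how to represent "no server reachable".

```python
class ResilientRouter:
    """Toy model of the failover rules; not the real Event Server API."""

    def __init__(self, secondary_configured=True):
        # Whether the client has a connection to a secondary server.
        self.secondary_configured = secondary_configured
        # The primary server is assumed to be reachable at startup.
        self.primary_up = True

    def set_primary_available(self, available):
        """Record that the primary connection was lost or restored."""
        self.primary_up = available

    def active_server(self):
        """Return the server that receives events and feeds callbacks."""
        if self.primary_up:
            return "primary"
        if self.secondary_configured:
            return "secondary"
        return None  # assumption: no server is reachable


router = ResilientRouter()
targets = [router.active_server()]   # normal operation
router.set_primary_available(False)  # primary connection lost
targets.append(router.active_server())
router.set_primary_available(True)   # primary available again
targets.append(router.active_server())
print(targets)
```

The same value drives both directions in this model: it is the server that receives the client's events and the only server whose events reach the client callbacks.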

Specifying a Secondary Server

Before starting a client, set the environment variable SPO_EVENT_SECONDARY to the location of the secondary server. As the value of this variable, use the name or IP address of the secondary server (as specified in the /etc/hosts file, the Domain Name Server, or the Windows Internet Naming Service). If this environment variable is not set, the client does not connect to a secondary server.
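As a minimal sketch, a launcher script could prepare the environment for a client before starting it. Only the variable name SPO_EVENT_SECONDARY comes from this section; the helper `client_environment`, the host name `sentinel-backup.example.com`, and the client program `./my_event_client` are hypothetical.

```python
import os


def client_environment(secondary_host, base_env=None):
    """Return a copy of the environment with SPO_EVENT_SECONDARY set.

    Passing None or an empty string removes the variable, so the
    client does not connect to a secondary server.
    """
    env = dict(os.environ if base_env is None else base_env)
    if secondary_host:
        env["SPO_EVENT_SECONDARY"] = secondary_host
    else:
        env.pop("SPO_EVENT_SECONDARY", None)
    return env


# Hypothetical secondary server name; an IP address works the same way.
env = client_environment("sentinel-backup.example.com",
                         base_env={"PATH": "/usr/bin"})
print(env["SPO_EVENT_SECONDARY"])

# The client must be started with this environment, for example:
# subprocess.run(["./my_event_client"], env=env)  # hypothetical client
```

Because the variable has to be set before the client starts, the sketch builds the environment first and passes it to the new process rather than changing it afterward.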

Event Report for Switch of Servers

When a switch between servers occurs, the Event Server API sends an alert event report on behalf of the client, reporting whether the Event Server switched to the primary or to the secondary server. The API (linked with the client application) sends this event report to any Operations Sentinel server that is connected.

The following fields are included in this event report:

TYPE = AL
SEV = major
APPL = spues (Single Point Universal Event Server)
APPLQUAL = application-name (as specified for SPDInitClient)
ALARMID = _SwitchToPrimary (or _SwitchToSecondary, depending on which case applies)
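For illustration, the fields listed above can be collected in a dictionary, which may help when filtering logs or writing tests around a switch. The function `switch_alert_fields` is a hypothetical helper; it does not reproduce how the API formats or transmits the event report.

```python
def switch_alert_fields(application_name, switched_to):
    """Return the documented field values for a server-switch alert.

    switched_to must be "primary" or "secondary".
    """
    if switched_to not in ("primary", "secondary"):
        raise ValueError("switched_to must be 'primary' or 'secondary'")
    alarm_id = ("_SwitchToPrimary" if switched_to == "primary"
                else "_SwitchToSecondary")
    return {
        "TYPE": "AL",
        "SEV": "major",
        "APPL": "spues",
        "APPLQUAL": application_name,  # name given when the client initialized
        "ALARMID": alarm_id,
    }


# "payroll-monitor" is a made-up application name.
report = switch_alert_fields("payroll-monitor", "secondary")
print(report["ALARMID"])
```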

This alert must be manually cleared. Like any alert event report, an alert raised when servers are switched is logged by Operations Sentinel. This alert is recorded in the logs named SPO and SP-SPALS, but is not recorded in the log of each monitored system. If several client applications report the switch of servers, only one alert appears in the Alerts windows of Operations Sentinel Console, because subsequent alerts are treated as duplicates. Any subsequent alerts (before the first alert has been cleared) are recorded as discarded raises (type DR) in the logs.

If an operator clears this alert before all clients have raised it, the alert reappears when another client sends the event report raising it.
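The duplicate handling described above can be modeled in a few lines. This is a simplified sketch, not Operations Sentinel logic: the class `AlertBoard` is hypothetical, and it treats two raises as duplicates when their ALARMID values match, which is an assumption about the matching rule.

```python
class AlertBoard:
    """Toy model of one alert display plus its log."""

    def __init__(self):
        self.active = set()  # alarm IDs currently shown to operators
        self.log = []        # (record type, alarm ID, client) entries

    def raise_alert(self, alarm_id, client):
        """Raise an alert; return True if it is newly displayed."""
        if alarm_id in self.active:
            # Already displayed: logged as a discarded raise.
            self.log.append(("DR", alarm_id, client))
            return False
        self.active.add(alarm_id)
        self.log.append(("AL", alarm_id, client))
        return True

    def clear(self, alarm_id):
        """Operator clears the alert manually."""
        self.active.discard(alarm_id)


board = AlertBoard()
shown = [board.raise_alert("_SwitchToSecondary", c)
         for c in ("clientA", "clientB")]
board.clear("_SwitchToSecondary")  # cleared before clientC reports
shown.append(board.raise_alert("_SwitchToSecondary", "clientC"))
print(shown)
print([entry[0] for entry in board.log])
```

In this model the second client's raise is only logged, while the third client's raise redisplays the alert because the operator cleared it first.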

You can examine the SPO and SP-SPALS logs using the Operations Sentinel Log Viewer. For details, refer to the Operations Sentinel Console User Guide.