7. Advanced usage and scheduling notes

upsmon can call out to a helper script or program when the device changes state. The example upsmon.conf has a full list of which state changes are available - ONLINE, ONBATT, LOWBATT, and more.

There are two options, that will be presented in details:

7.1. The simple approach, using your own script

How it works relative to upsmon

Your command will be called with the full text of the message as one argument.

For the default values, refer to the sample upsmon.conf file.

The environment string NOTIFYTYPE will contain the type string of whatever caused this event to happen - ONLINE, ONBATT, LOWBATT, …

Making this some sort of shell script might be a good idea, but the helper can be in any programming or scripting language.

Note

Remember that your helper must be executable. If you are using a script, make sure the execution flags are set.

For more information, refer to upsmon(8) and upsmon.conf(5) manual pages.

Setting up everything

  • Set EXEC flags on various things in upsmon.conf(5):

    NOTIFYFLAG ONBATT EXEC
    NOTIFYFLAG ONLINE EXEC

    If you want other things like WALL or SYSLOG to happen, just add them:

    NOTIFYFLAG ONBATT EXEC+WALL+SYSLOG

    You get the idea.

  • Tell upsmon where your script is

    NOTIFYCMD /path/to/my/script
  • Make a simple script like this at that location:

    #! /bin/bash
    echo "$*" | sendmail -F"ups@mybox" bofh@pager.example.com
  • Restart upsmon, pull the plug, and see what happens.

That approach is bare-bones, but you should get the text content of the alert in the body of the message, since upsmon passes the alert text (from NOTIFYMSG) as an argument.

Using more advanced features

Your helper script will be run with a few environment variables set.

  • UPSNAME: the name of the system that generated the change.

    This will be one of your identifiers from the MONITOR lines in upsmon.conf.

  • NOTIFYTYPE: this will be ONLINE, ONBATT, or whatever event took place which made upsmon call your script.

You can use these to do different things based on which system has changed state. You could have it only send pages for an important system while totally ignoring a known trouble spot, for example.

Suppressing notify storms

upsmon will call your script every time an event happens that has the EXEC flag set. This means a quick power failure that lasts mere seconds might generate a notification storm. To suppress this sort of annoyance, use upssched as your NOTIFYCMD program, and configure it to call your command after a timer has elapsed.

7.2. The advanced approach, using upssched

upssched is a helper for upsmon that will invoke commands for you at some interval relative to a UPS event. It can be used to send pages, mail out notices about things, or even shut down the box early.

There will be examples scattered throughout. Change them to suit your pathnames, UPS locations, and so forth.

How upssched works relative to upsmon

When an event occurs, upsmon will call whatever you specify as a NOTIFYCMD in your upsmon.conf, if you also enable the EXEC in your NOTIFYFLAGS. In this case, we want upsmon to call upssched as the notifier, since it will be doing all the work for us. So, in the upsmon.conf:

NOTIFYCMD /usr/local/ups/sbin/upssched

Then we want upsmon to actually use it for the notify events, so again in the upsmon.conf we set the flags:

NOTIFYFLAG ONLINE SYSLOG+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
... and so on.

For the purposes of this document I will only use those three, but you can set the flags for any of the valid notify types.

Setting up your upssched.conf

Once upsmon has been configured with the NOTIFYCMD and EXEC flags, you’re ready to deal with the upssched.conf details. In this file, you specify just what will happen when a given event occurs on a particular UPS.

First you need to define the name of the script or program that will handle timers that trigger. This is your CMDSCRIPT, and needs to be above any AT defines. There’s an example provided with the program, so we’ll use that here:

CMDSCRIPT /usr/local/ups/bin/upssched-cmd

Then you have to define the variables PIPEFN and LOCKFN; the former sets the file name of the FIFO that will pass communications between processes to start and stop timers, while the latter sets the file name for a temporary file created by upssched in order to avoid a race condition under some circumstances. Please see the relevant comments in upssched.conf for additional information and advice about these variables.

Now you can tell your CMDSCRIPT what to do when it is called by upsmon.

The big picture

The design in a nutshell is:

upsmon ---> calls upssched ---> calls your CMDSCRIPT

Ultimately, the CMDSCRIPT does the actual useful work, whether that’s initiating an early shutdown with upsmon -c fsd, sending a page by calling sendmail, or opening a subspace channel to V’ger.

Establishing timers

Let’s say that you want to receive a page when any UPS has been running on battery for 30 seconds. Create a handler that starts a 30 second timer for an ONBATT condition.

AT ONBATT * START-TIMER onbattwarn 30

This means "when any UPS (the *) goes on battery, start a timer called onbattwarn that will trigger in 30 seconds". We’ll come back to the onbattwarn part in a moment. Right now we need to make sure that we don’t trigger that timer if the UPS happens to come back before the time is up. In essence, if it goes back on line, we need to cancel it. So, let’s tell upssched that.

AT ONLINE * CANCEL-TIMER onbattwarn
Executing commands immediately

As an example, consider the scenario where a UPS goes onto battery power. However, the users are not informed until 60 seconds later - using a timer as described above. Whilst this may let the logged in users know that the UPS is on battery power, it does not inform any users subsequently logging in. To enable this we could, at the same time, create a file which is read and displayed to any user trying to login whilst the UPS is on battery power. If the UPS comes back onto utility power within 60 seconds, then we can cancel the timer and remove the file, as described above. However, if the UPS comes back onto utility power say 5 minutes later then we do not want to use any timers but we still want to remove the file. To do this we could use:

AT ONLINE * EXECUTE ups-back-on-power

This means that when upsmon detects that the UPS is back on utility power it will signal upssched. Upssched will see the above command and simply pass ups-back-on-power as an argument directly to CMDSCRIPT. This occurs immediately, there are no timers involved.

Writing the command script handler

OK, now that upssched knows how the timers are supposed to work, let’s give it something to do when one actually triggers. The name of the example timer is onbattwarn, so that’s the argument that will be passed into your CMDSCRIPT when it triggers. This means we need to do some shell script writing to deal with that input.

        #! /bin/sh

        case $1 in
                onbattwarn)
                        echo "The UPS has been on battery for awhile" \
                        | mail -s"UPS monitor" bofh@pager.example.com
                        ;;
                ups-back-on-power)
                        /bin/rm -f /some/path/ups-on-battery
                        ;;
                *)
                        logger -t upssched-cmd "Unrecognized command: $1"
                        ;;
        esac

This is a very simple script example, but it shows how you can test for the presence of a given trigger. With multiple ATs creating various timer names, you will need to test for each possibility and handle it according to your desires.

Note

You can invoke just about anything from inside the CMDSCRIPT. It doesn’t need to be a shell script, either - that’s just an example. If you want to write a program that will parse argv[1] and deal with the possibilities, that will work too.

Early Shutdowns

One thing that gets requested a lot is early shutdowns in upsmon. With upssched, you can now have this functionality. Just set a timer for some length of time at ONBATT which will invoke a shutdown command if it elapses. Just be sure to cancel this timer if you go back ONLINE before then.

The best way to do this is to use the upsmon callback feature. You can make upsmon set the "forced shutdown" (FSD) flag on the upsd so your slave systems shut down early too. Just do something like this in your CMDSCRIPT:

/usr/local/ups/sbin/upsmon -c fsd

It’s not a good idea to call your system’s shutdown routine directly from the CMDSCRIPT, since there’s no synchronization with the slave systems hooked to the same UPS. FSD is the master’s way of saying "we’re shutting down now like it or not, so you’d better get ready".

Background

This program was written primarily to fulfill the requests of users for the early shutdown scenario. The "outboard" design of the program (relative to upsmon) was intended to reduce the load on the average system. Most people don’t have the requirement of shutting down after n seconds on battery, since the usual OB+LB testing is sufficient.

This program was created separately so those people don’t have to spend CPU time and RAM on something that will never be used in their environments.

The design of the timer handler is also geared towards minimizing impact. It will come and go from the process list as necessary. When a new timer is started, a process will be forked to actually watch the clock and eventually start the CMDSCRIPT. When a timer triggers, it is removed from the queue. Canceling a timer will also remove it from the queue. When no timers are present in the queue, the background process exits.

This means that you will only see upssched running when one of two things is happening:

  1. There’s a timer of some sort currently running
  2. upsmon just called it, and you managed to catch the brief instance

The final optimization handles the possibility of trying to cancel a timer when there’s none running. If there’s no process already running, there are no timers to cancel, and furthermore there is no need to start a clock-watcher. As a result, it skips that step and exits sooner.