| Last reviewed: 08/20/2018 |
| |
| HPE iLO NMI Watchdog Driver |
| for iLO based ProLiant Servers |
| |
| The HPE iLO NMI Watchdog driver is a kernel module that provides basic |
| watchdog functionality and a handler for the iLO "Generate NMI to System" |
| virtual button. |
| |
| All references to iLO in this document also apply to iLO2 and all |
| subsequent generations. |
| |
| Watchdog functionality is enabled like any other common watchdog driver. That |
| is, an application needs to be started that kicks off the watchdog timer. A |
| basic application exists in tools/testing/selftests/watchdog/ named |
| watchdog-test.c. Simply compile the C file and kick it off. If the system |
| gets into a bad state and hangs, the HPE ProLiant iLO timer register will |
| not be updated in a timely fashion and a hardware system reset (also known as |
| an Automatic Server Recovery (ASR)) event will occur. |
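The keepalive pattern described above can be sketched in C as follows. This is a minimal illustration, not the watchdog-test.c selftest itself: the helper names (wdt_open, wdt_keepalive, wdt_pet_loop) are hypothetical, and the device path is whatever the caller passes in (typically /dev/watchdog). Only the WDIOC_KEEPALIVE ioctl from <linux/watchdog.h> is part of the generic watchdog interface.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: open the watchdog device. Opening the device
 * starts the timer. Returns a file descriptor, or -1 on failure. */
int wdt_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Hypothetical helper: reset ("pet") the timer so that it does not expire.
 * Returns 0 on success, -1 on failure. */
int wdt_keepalive(int fd)
{
    int dummy = 0;

    return ioctl(fd, WDIOC_KEEPALIVE, &dummy);
}

/* Hypothetical helper: pet the watchdog `count` times, sleeping
 * `interval_s` seconds after each keepalive. Returns `count` if every
 * keepalive succeeded, or -1 as soon as one fails. */
int wdt_pet_loop(int fd, unsigned int interval_s, int count)
{
    for (int i = 0; i < count; i++) {
        if (wdt_keepalive(fd) < 0)
            return -1;
        sleep(interval_s);
    }
    return count;
}
```

A caller would open the device once with wdt_open("/dev/watchdog") and then call wdt_pet_loop() with an interval comfortably shorter than the configured timeout. If the process hangs and stops calling wdt_keepalive(), the timer is no longer updated and the reset described above follows.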
| |
| The hpwdt driver also has the following module parameters: |
| |
| soft_margin - allows the user to set the watchdog timer value. |
|               Default value is 30 seconds. |
| timeout     - an alias of soft_margin. |
| pretimeout  - allows the user to set the watchdog pretimeout value. |
|               This is the number of seconds before timeout when an |
|               NMI is delivered to the system. Setting the value to |
|               zero disables the pretimeout NMI. |
|               Default value is 9 seconds. |
| nowayout    - basic watchdog parameter that does not allow the timer to |
|               be restarted or an impending ASR to be escaped. |
|               Default value is set when compiling the kernel. If it is set |
|               to "Y", then there is no way of disabling the watchdog once |
|               it has been started. |
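To make the relationship between the timer value and pretimeout concrete, the small C sketch below computes how long after the last keepalive the NMI would be delivered. The function name nmi_delay_seconds is hypothetical, and treating a pretimeout that is not smaller than the timeout as "no NMI" is a simplifying assumption of this sketch rather than a statement about how the driver handles such values.

```c
/* Hypothetical helper: seconds after the last keepalive at which the
 * pretimeout NMI is delivered. Returns -1 when no NMI is expected:
 * pretimeout is zero (disabled), or, as an assumption of this sketch,
 * pretimeout is not smaller than timeout. */
int nmi_delay_seconds(int timeout, int pretimeout)
{
    if (pretimeout <= 0 || pretimeout >= timeout)
        return -1;
    return timeout - pretimeout;
}
```

With the documented defaults (30 second timer, 9 second pretimeout), nmi_delay_seconds(30, 9) gives 21: the NMI arrives 21 seconds after the last keepalive, and the reset follows 9 seconds later if the timer is still not updated.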
| |
| NOTE: More information about watchdog drivers in general, including the ioctl |
| interface to /dev/watchdog, can be found in |
| Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt. |
| |
| Due to limitations in the iLO hardware, the NMI pretimeout, if enabled, |
| can only be set to 9 seconds. Attempts to set pretimeout to other |
| non-zero values will be rounded, possibly to zero. Users should verify |
| the pretimeout value after attempting to set pretimeout or timeout. |
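One way to perform that verification is to read the pretimeout back through the generic watchdog ioctl interface immediately after each change. The sketch below assumes an already opened watchdog file descriptor; the helper names are hypothetical, while WDIOC_SETPRETIMEOUT, WDIOC_GETPRETIMEOUT and WDIOC_SETTIMEOUT come from <linux/watchdog.h>.

```c
#include <linux/watchdog.h>
#include <sys/ioctl.h>

/* Hypothetical helper: request a pretimeout, then read back the value
 * that is actually in effect. Returns 0 and stores the effective value
 * in *actual, or returns -1 if either ioctl fails. */
int wdt_set_and_read_pretimeout(int fd, int requested, int *actual)
{
    int value = requested;

    if (ioctl(fd, WDIOC_SETPRETIMEOUT, &value) < 0)
        return -1;
    if (ioctl(fd, WDIOC_GETPRETIMEOUT, actual) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: set the timeout, then read back the pretimeout,
 * since changing the timeout may also change the effective pretimeout.
 * Returns 0 and stores the pretimeout in *pretimeout, or -1 on failure. */
int wdt_set_timeout_read_pretimeout(int fd, int timeout, int *pretimeout)
{
    int value = timeout;

    if (ioctl(fd, WDIOC_SETTIMEOUT, &value) < 0)
        return -1;
    if (ioctl(fd, WDIOC_GETPRETIMEOUT, pretimeout) < 0)
        return -1;
    return 0;
}
```

A caller would compare the value stored in *actual with the value it requested; a result of 0 means the pretimeout NMI is disabled.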
| |
| Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a |
| panic. This is to allow for a crash dump to be collected. It is incumbent |
| upon the user to have properly configured the system for kdump. |
| |
| The default Linux kernel behavior upon panic is to print a kernel tombstone |
| and loop forever. This is generally not what a watchdog user wants. |
| |
| For those wishing to learn more, please see: |
| Documentation/kdump/kdump.txt |
| Documentation/admin-guide/kernel-parameters.txt (panic=) |
| Your Linux distribution-specific documentation. |
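Whether a given system would loop forever after a panic can be checked at run time by reading the panic timeout, which the kernel exposes as the kernel.panic sysctl (/proc/sys/kernel/panic). The following C sketch is illustrative only; the helper names are hypothetical and the path is supplied by the caller. A value of 0 means the kernel waits forever after a panic, while a non-zero value means it reboots.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs-style file.
 * Returns 0 and stores the value in *out, or -1 on failure. */
int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the panic timeout read from `path`
 * is 0 (the kernel would loop forever after a panic), 0 if it is
 * non-zero (the kernel would reboot), or -1 if it cannot be read. */
int panic_loops_forever(const char *path)
{
    int timeout;

    if (read_int_file(path, &timeout) < 0)
        return -1;
    return timeout == 0;
}
```

Calling panic_loops_forever("/proc/sys/kernel/panic") on a watchdog-protected system and getting 1 suggests that the panic= parameter or the kernel.panic sysctl should be reviewed.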
| |
| If the hpwdt driver does not receive the NMI associated with an expiring |
| timer, the iLO will proceed to reset the system at timeout if the timer |
| hasn't been updated. |
| |
| -- |
| |
| The HPE iLO NMI Watchdog Driver and documentation were originally developed |
| by Tom Mingarelli. |
| |