Tool to restart erroneously downed virtual machines.
This program and set of classes implement a watchdog to restart virtual machines in a Ganeti cluster that have crashed or been killed by a node reboot. Run from cron or similar.
Module | nodemaint |
Module doing node maintenance for Ganeti watcher. |
Module | state |
Module keeping state for Ganeti watcher. |
From __init__.py:
Class |
|
Abstraction for a Virtual Machine instance. |
Class |
|
Data container representing cluster node. |
Exception |
|
Exception raised when this host is not the master. |
Function |
|
Tries to connect to the luxi daemon. |
Function |
|
Connects to RAPI port and does a simple test. |
Function |
|
Probes an echo RPC to WConfD. |
Function |
|
Main function. |
Function |
|
Parse the command line options. |
Function |
|
Run the watcher hooks. |
Function |
|
Check whether we should pause. |
Function |
|
Start all the daemons that should be running on all nodes. |
Constant | BAD |
Undocumented |
Constant | CHILD |
Undocumented |
Constant | ERROR |
Undocumented |
Constant | HELPLESS |
Undocumented |
Constant | INSTANCE |
Undocumented |
Constant | MAXTRIES |
Undocumented |
Constant | NOTICE |
Undocumented |
Function | _ |
Archives old jobs. |
Function | _ |
Check all nodes for restarted ones. |
Function | _ |
Checks whether the given instance has any secondary node in offline status.
Function | _ |
Make a pass over the list of instances, restarting downed ones. |
Function | _ |
Ensures current host is master node. |
Function | _ |
Undocumented |
Function | _ |
Retrieves instances and nodes per node group. |
Function | _ |
Checks if there are any currently running or pending group verify jobs and if so, returns their id. |
Function | _ |
Main function for global watcher. |
Function | _ |
Main function for per-group watcher process. |
Function | _ |
Returns a list of all node groups known by ssconf.
Function | _ |
Merges all per-group instance status files into a global one. |
Function | _ |
Reads an instance status file. |
Function | _ |
Starts a new instance of the watcher for every node group. |
Function | _ |
Writes an instance status file from Instance objects. |
Function | _ |
Run a per-group "gnt-cluster verify-disks". |
Function | _ |
Writes the per-group instance status file. |
Checks whether the given instance has any secondary node in offline status.
Parameters | |
nodes | Undocumented |
instance | The instance object |
Returns | |
True if any of the secondary nodes is offline, False otherwise |
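The check described above can be sketched as follows. This is a minimal illustration, not the watcher's actual code: the `Node` and `Instance` dataclasses, the `snodes` attribute, and the assumption that `nodes` is a mapping from node name to node object are all hypothetical, since the `nodes` parameter is undocumented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical stand-in for the cluster node data container.
    name: str
    offline: bool = False


@dataclass
class Instance:
    # Hypothetical stand-in for the instance abstraction;
    # "snodes" (secondary node names) is an assumed attribute.
    name: str
    snodes: List[str] = field(default_factory=list)


def has_offline_secondary(nodes: Dict[str, Node], instance: Instance) -> bool:
    """Return True if any secondary node of the instance is offline.

    Assumes "nodes" maps node names to Node objects. Secondaries that
    are missing from the mapping are ignored in this sketch.
    """
    return any(
        nodes[name].offline for name in instance.snodes if name in nodes
    )
```

An instance without secondary nodes yields False, because `any()` over an empty sequence is False.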
Connects to RAPI port and does a simple test.
Connects to the RAPI port of hostname and runs a simple test. At this time, the test is GetVersion.
If RAPI responds with error code "401 Unauthorized", the test counts as successful, because the aim of this function is to assess whether RAPI is responding, not whether it is accessible.
Parameters | |
hostname:string | hostname of the node to connect to. |
Returns | |
bool | Whether RAPI is working properly |
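The "401 counts as success" rule can be illustrated with a plain HTTP probe. This is a sketch under stated assumptions, not the watcher's implementation: the function name, the `opener` parameter, the port 5080 default, and the `/version` path are assumptions made for the example, and the sketch treats any successful response as healthy without inspecting the returned version.

```python
import urllib.error
import urllib.request


def is_rapi_responding(hostname, port=5080,
                       opener=urllib.request.urlopen, timeout=10):
    """Return True if something answers HTTP on the RAPI port.

    "opener" is a hypothetical injection point (used for testing and
    for supplying custom TLS handling); by default it is urlopen, which
    verifies certificates and therefore reports a self-signed RAPI
    certificate as a failure.
    """
    url = "https://%s:%d/version" % (hostname, port)
    try:
        opener(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        # The daemon answered. 401 means "alive but credentials needed",
        # which is good enough for a liveness check.
        return err.code == 401
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, DNS failure, ...
        return False
    return True
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would hide the status code.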
Writes the per-group instance status file.
The entries are sorted.
Parameters | |
filename:string | Path to instance status file |
data:list of tuple; (instance name as string, status as string) | Instance name and status |
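A minimal sketch of writing such a file is shown below. The on-disk format (one "name status" pair per line, separated by a space) and the write-to-temporary-then-rename step are assumptions for illustration; only the sorting and the parameter shapes come from the description above.

```python
import os
import tempfile


def write_instance_status(filename, data):
    """Write (instance name, status) pairs to filename, sorted by name.

    Assumed format: one "name status" line per instance.
    """
    lines = ["%s %s\n" % (name, status) for name, status in sorted(data)]

    # Write to a temporary file in the same directory and rename it into
    # place, so readers never observe a partially written file.
    directory = os.path.dirname(filename) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(lines)
    os.replace(tmp_path, filename)
```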
Reads an instance status file.
Parameters | |
filename:string | Path to status file |
Returns | |
tuple; (None or number, list of lists containing instance name and status) | File's mtime and instance status contained in the file; mtime is None if file can't be read |
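The return shape can be illustrated with a small reader. This sketch assumes the same hypothetical "name status" line format and returns an empty list alongside the None mtime when the file is unreadable; the source only specifies the None mtime.

```python
import os


def read_instance_status(filename):
    """Return (mtime, [[name, status], ...]) for a status file.

    mtime is None (and the list empty, by assumption) if the file
    cannot be read.
    """
    try:
        with open(filename) as handle:
            # Take the mtime from the open descriptor so it matches
            # the content that is actually read.
            mtime = os.fstat(handle.fileno()).st_mtime
            content = handle.read()
    except OSError:
        return (None, [])

    rows = [
        line.strip().split(None, 1)
        for line in content.splitlines()
        if line.strip()
    ]
    return (mtime, rows)
```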
Merges all per-group instance status files into a global one.
Parameters | |
filename:string | Path to global instance status file |
pergroup | Path template for per-group status files; must contain "%s", which is replaced with the group UUID |
groups:sequence | UUIDs of known groups |
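A merge along these lines could look like the sketch below. The conflict rule (when an instance appears in several per-group files, the entry from the most recently modified file wins) and the "name status" line format are assumptions made for the example; the description above does not state them. The helper `_read_status` is hypothetical.

```python
import os


def _read_status(path):
    """Hypothetical helper: return (mtime, rows), or (None, []) on error."""
    try:
        with open(path) as handle:
            mtime = os.fstat(handle.fileno()).st_mtime
            rows = [line.strip().split(None, 1)
                    for line in handle if line.strip()]
    except OSError:
        return (None, [])
    return (mtime, rows)


def merge_instance_status(filename, pergroup, groups):
    """Merge per-group status files into one global file.

    "pergroup" is a path template containing "%s" for the group UUID.
    """
    newest = {}  # instance name -> (mtime of source file, status)
    for uuid in groups:
        mtime, rows = _read_status(pergroup % uuid)
        if mtime is None:
            continue  # Group has no readable status file yet.
        for row in rows:
            if len(row) != 2:
                continue  # Skip malformed lines.
            name, status = row
            # Assumed rule: the newest file's entry wins.
            if name not in newest or mtime > newest[name][0]:
                newest[name] = (mtime, status)

    with open(filename, "w") as handle:
        for name in sorted(newest):
            handle.write("%s %s\n" % (name, newest[name][1]))
```

Preferring the newest file is one reasonable way to handle an instance that has moved between node groups, since the older group's file may still list a stale status.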
Tries to connect to the luxi daemon.
Parameters | |
try | Whether to attempt to restart the master daemon |