hroller - Cluster rolling maintenance scheduler for Ganeti
hroller {backend options...} [algorithm options...] [reporting options...]
hroller --version
Backend options:
{ -m cluster | -L[ path ] | -t data-file | -I path }
[ --force ]
Algorithm options:
[ -G *name* ] [ -O *name...* ] [ --node-tags tag,.. ] [ --skip-non-redundant ]
[ --offline-maintenance ] [ --ignore-non-redundant ]
Reporting options:
[ -v... | -q ] [ -S *file* ] [ --one-step-only ] [ --print-moves ]
hroller is a cluster maintenance reboot scheduler. It calculates which sets of nodes can be rebooted at the same time without rebooting the primary and the secondary node of any instance simultaneously.
For backends that support identifying the master node (currently RAPI and LUXI), the master node is scheduled as the last node in the last reboot group. Apart from this restriction, larger reboot groups are put first.
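The ordering rule can be illustrated with a short Python sketch. `order_reboot_groups` is a hypothetical helper, not part of Ganeti; it assumes the reboot groups are already computed as plain lists of node names and that the master's name is known (as it would be with the RAPI or LUXI backends).

```python
from typing import List, Optional


def order_reboot_groups(groups: List[List[str]],
                        master: Optional[str] = None) -> List[List[str]]:
    """Hypothetical sketch: larger groups first, master node last overall."""
    # sorted() stays stable with reverse=True, so equal-sized groups
    # keep their original relative order.
    ordered = sorted((list(g) for g in groups), key=len, reverse=True)
    if master is None:
        return ordered  # backend cannot identify the master
    for i, group in enumerate(ordered):
        if master in group:
            # Move the master's group to the end and the master to the
            # end of that group, making it the very last node rebooted.
            master_group = ordered.pop(i)
            master_group.remove(master)
            master_group.append(master)
            ordered.append(master_group)
            break
    return ordered
```

For example, `order_reboot_groups([["n1"], ["n2", "n3", "n4"], ["n5", "n6"]], master="n2")` returns `[["n5", "n6"], ["n1"], ["n3", "n4", "n2"]]`: the master's group is pushed to the end even though it is the largest.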
hroller views the nodes as vertices of an undirected graph with two kinds of edges. First, there is an edge from the primary to the secondary node of every instance. Second, two nodes are connected by an edge if they are the primary nodes of two instances that have the same secondary node. hroller then colors the graph using a few different heuristics and returns the coloring with the fewest colors found. Nodes with the same color can simultaneously migrate all their instances off to the respective secondary nodes, so it is safe to reboot them at the same time.
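The graph construction and coloring can be sketched in Python. This is not hroller's implementation (htools are written in Haskell); the `(primary, secondary)` instance model and all function names are assumptions for illustration, and greedy coloring under three vertex orderings stands in for whatever heuristics hroller actually uses. Non-redundant instances, which have no secondary node, are simply left out of this model.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical data model: each redundant instance is a
# (primary_node, secondary_node) pair; every node it names must be in `nodes`.
Instance = Tuple[str, str]


def build_conflict_graph(nodes: List[str],
                         instances: List[Instance]) -> Dict[str, Set[str]]:
    """Adjacency sets; an edge means the two nodes must not reboot together."""
    graph: Dict[str, Set[str]] = {n: set() for n in nodes}

    def connect(a: str, b: str) -> None:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)

    primaries_by_secondary: Dict[str, Set[str]] = defaultdict(set)
    for primary, secondary in instances:
        connect(primary, secondary)          # first kind of edge
        primaries_by_secondary[secondary].add(primary)
    for primaries in primaries_by_secondary.values():
        for a, b in combinations(sorted(primaries), 2):
            connect(a, b)                    # second kind: shared secondary
    return graph


def greedy_coloring(graph: Dict[str, Set[str]],
                    order: List[str]) -> Dict[str, int]:
    """Give each node the lowest color unused by its already-colored neighbors."""
    colors: Dict[str, int] = {}
    for node in order:
        used = {colors[n] for n in graph[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def reboot_groups(nodes: List[str],
                  instances: List[Instance]) -> List[List[str]]:
    """Color with a few vertex orderings and keep the coloring with fewest colors."""
    graph = build_conflict_graph(nodes, instances)
    orderings = [
        sorted(nodes),                                  # alphabetical
        sorted(nodes, key=lambda n: -len(graph[n])),    # highest degree first
        sorted(nodes, key=lambda n: len(graph[n])),     # lowest degree first
    ]
    best = min((greedy_coloring(graph, order) for order in orderings),
               key=lambda coloring: len(set(coloring.values())))
    groups: Dict[int, List[str]] = defaultdict(list)
    for node in sorted(best):
        groups[best[node]].append(node)
    return [groups[color] for color in sorted(groups)]
```

With two instances `("n1", "n2")` and `("n3", "n2")`, nodes n1, n2 and n3 form a triangle (n1 and n3 share the secondary n2), so `reboot_groups(["n1", "n2", "n3", "n4"], [("n1", "n2"), ("n3", "n2")])` yields `[["n1", "n4"], ["n2"], ["n3"]]`. Greedy coloring is only a heuristic; trying several orderings and keeping the best result is a cheap way to get closer to the minimum without solving the NP-hard problem exactly.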
For a description of the standard options check htools(1) and hbal(1).
If instances are online, the tool should refuse to do offline rolling maintenance unless explicitly requested.
End-to-end shelltests should be provided.
Selecting nodes by tags and printing only the first step can be used to plan the next maintenance step.
$ hroller --node-tags needsreboot --one-step-only -L
'First Reboot Group'
node1.example.com
node3.example.com
Typically these nodes would be drained and migrated.
$ GROUP=$(hroller --node-tags needsreboot --one-step-only --no-headers -L)
$ for node in $GROUP; do gnt-node modify -D yes "$node"; done
$ for node in $GROUP; do gnt-node migrate -f --submit "$node"; done
After maintenance, the tags would be removed and the nodes undrained.
If all instances are shut down, larger reboot groups can usually be found.
$ hroller --offline-maintenance -L
'Node Reboot Groups'
node1.example.com,node3.example.com,node5.example.com
node8.example.com,node6.example.com,node2.example.com
node7.example.com,node4.example.com
By default, hroller plans capacity to move the non-redundant instances off the nodes to be rebooted. If requested, appropriate target nodes for the non-redundant instances can be shown. The assumption is that instances are moved back to their original node after each reboot; these moves back are not part of the output.
$ hroller --print-moves -L
'Node Reboot Groups'
node-01-002,node-01-003
  inst-20 node-01-001
  inst-21 node-01-000
  inst-30 node-01-005
  inst-31 node-01-004
node-01-004,node-01-005
  inst-40 node-01-001
  inst-41 node-01-000
  inst-50 node-01-003
  inst-51 node-01-002
node-01-001,node-01-000
  inst-00 node-01-002
  inst-01 node-01-003
  inst-10 node-01-005
  inst-11 node-01-004
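A plan printed with --print-moves can be sanity-checked mechanically. The Python sketch below is a hypothetical helper, not part of Ganeti. It assumes the output layout inferred from the example above (a quoted header line, each group as a comma-separated node list, then one "instance target-node" line per move), which hroller does not necessarily guarantee. It checks that no instance is moved onto a node that is rebooted in the same step.

```python
from typing import Dict, List, Tuple

Plan = List[Tuple[List[str], Dict[str, str]]]


def parse_print_moves(output: str) -> Plan:
    """Split assumed `--print-moves` output into (group, {instance: target}) pairs."""
    plan: Plan = []
    for raw in output.splitlines():
        line = raw.strip()  # move lines may or may not be indented
        if not line or line.startswith("'"):
            continue  # skip blank lines and the quoted header
        fields = line.split()
        if len(fields) == 1:
            plan.append((fields[0].split(","), {}))  # new reboot group
        elif len(fields) == 2 and plan:
            instance, target = fields
            plan[-1][1][instance] = target  # move belongs to latest group
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return plan


def moves_are_safe(plan: Plan) -> bool:
    """True if no instance is moved onto a node rebooted in the same step."""
    return all(target not in group
               for group, moves in plan
               for target in moves.values())
```

Feeding it the example output above yields three groups, the first being `["node-01-002", "node-01-003"]` with `inst-20` moved to `node-01-001`, and `moves_are_safe` returns `True` for that plan.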