This document describes an enhancement of Ganeti’s instance placement by taking into account that some nodes are vulnerable to common failures.
Currently, Ganeti considers all nodes in a single node group as equal. However, this is not true in some setups. Nodes might share common causes of failure or even be located in different places, with spatial redundancy being a desired feature.
A similar problem for instances, namely that instances providing the same external service should not be placed on the same nodes, is solved by means of exclusion tags. However, there is no mechanism for a good choice of node pairs for a single instance. Moreover, while instances providing the same service run on different nodes, they are not spread out location-wise.
We propose to extend the cluster metric (as used, e.g., by hbal and hail) to honor additional node tags indicating nodes that might have a common cause of failure.
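For illustration, a node's failure locations could be read off its tag list as in the following Haskell sketch; the tag prefix htools:nlocation: used here is an assumption of this sketch, not something fixed by the proposal above::

  import Data.List (stripPrefix)
  import Data.Maybe (mapMaybe)

  -- Assumed prefix for common-failure node tags (illustrative only).
  failurePrefix :: String
  failurePrefix = "htools:nlocation:"

  -- Collect the failure locations a node belongs to from its tags,
  -- e.g. nodeFailureLocations ["htools:nlocation:rack1", "foo"]
  -- yields ["rack1"].
  nodeFailureLocations :: [String] -> [String]
  nodeFailureLocations = mapMaybe (stripPrefix failurePrefix)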
Additional common-failure components will be added to the cluster metric, weighed appropriately.

The weights for these components might have to be tuned as experience with such setups grows, but as a starting point, each component will have a weight of 1.0. In this way, any common-failure violation is less important than any missed hard constraint (like an instance on an offline node), so that hard constraints will be restored first when balancing a cluster. Nevertheless, with weight 1.0 the new common-failure components will still be significantly more important than all the balancedness components (cpu, disk, memory), as the latter are standard deviations of fractions. They will also dominate the disk load component which, when only taking static information into account, essentially amounts to counting disks. In this way, Ganeti will be willing to sacrifice having an equal number of disks on every node in order to fulfill location requirements.
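As a minimal sketch of how such a component could be computed, the following Haskell fragment counts, for each instance, the common-failure tags shared by its primary and secondary node; the record types and function names are illustrative stand-ins, not htools' actual data structures::

  import qualified Data.Set as Set

  -- Illustrative stand-ins for htools' node and instance records.
  data Node = Node { nodeName :: String, failureTags :: Set.Set String }
  data Instance = Instance { primaryNode :: Node, secondaryNode :: Node }

  -- One violation per (instance, tag) pair where primary and
  -- secondary node share the common-failure tag.
  commonFailureViolations :: [Instance] -> Int
  commonFailureViolations insts =
    sum [ Set.size (failureTags (primaryNode i)
                    `Set.intersection` failureTags (secondaryNode i))
        | i <- insts ]

  -- The component enters the cluster metric with weight 1.0, so a
  -- single violation outweighs the standard-deviation-based
  -- balancedness components.
  commonFailureComponent :: [Instance] -> Double
  commonFailureComponent = (1.0 *) . fromIntegral . commonFailureViolations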
Apart from changing the balancedness metric, common-failure tags will not have any other effect. In particular, as opposed to exclusion tags, no hard guarantees are made: hail will try to allocate an instance in a way that avoids common failures if possible, but will still allocate the instance if not.
Inequality between nodes can also restrict the set of possible instance migrations. The most prominent example is a hypervisor upgrade, where migration from the new to the old hypervisor version is usually not possible.
An instance migration will only be considered by htools if, for every migration tag y present on the node migrated from, either the tag is also present on the node migrated to, or there is a cluster tag htools:allowmigration:y::z and the target node is tagged z (or both).
For the simple hypervisor upgrade, where migration from the old to the new version is possible but not the other way round, it suffices to tag all nodes that have already been upgraded.
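A Haskell sketch of this migration check might look as follows; the allowed-migration pairs are assumed to have been parsed from the htools:allowmigration:y::z cluster tags beforehand, and all names are illustrative::

  import qualified Data.Set as Set

  -- Pairs (y, z) parsed from htools:allowmigration:y::z cluster tags.
  type AllowedMigrations = Set.Set (String, String)

  -- Migration is considered only if every migration tag y on the
  -- source node is also present on the target node, or is explicitly
  -- allowed towards some tag z carried by the target node.
  migrationAllowed :: AllowedMigrations
                   -> Set.Set String  -- migration tags on the source node
                   -> Set.Set String  -- migration tags on the target node
                   -> Bool
  migrationAllowed allowed srcTags tgtTags =
    all permitted (Set.toList srcTags)
    where
      permitted y = y `Set.member` tgtTags
                    || any (\z -> (y, z) `Set.member` allowed)
                           (Set.toList tgtTags)

In the hypervisor-upgrade example, tagging every upgraded node with a single migration tag makes migrations from upgraded to not-yet-upgraded nodes fail this check, while migrations in the other direction remain unrestricted, as an untagged source node imposes no conditions.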