This document describes how N+1 redundancy is achieved for instances using shared storage.
For instances using DRBD as their disk template, if the primary node fails there is only one node where the instance can be restarted immediately (the secondary). Therefore, htools reserves enough memory on that node to cope with the failure of a single node. Instances using shared storage, however, can be restarted on any node, which implies that memory does not have to be reserved on any particular node. This motivated the current state, in which no memory is reserved for them at all, so even a large cluster can run out of capacity.
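The DRBD reservation described above can be illustrated with a minimal sketch. It is not the htools implementation; the function name `drbd_reserved_memory` and the tuple layout are hypothetical, and it assumes that only memory matters and that exactly one node fails at a time.

```python
from collections import defaultdict


def drbd_reserved_memory(instances):
    """Return {node: memory to keep free} for DRBD instances.

    `instances` is an iterable of (primary, secondary, memory_mb) tuples.
    This layout is an assumption made for illustration only.
    """
    # per_peer[secondary][primary] is the memory that would have to start
    # on `secondary` if `primary` failed.
    per_peer = defaultdict(lambda: defaultdict(int))
    for primary, secondary, memory_mb in instances:
        per_peer[secondary][primary] += memory_mb

    # Only a single node failure is covered, so the most demanding
    # peer decides how much each node must keep free.
    return {node: max(peers.values()) for node, peers in per_peer.items()}


example = [
    ("node-a", "node-b", 1024),
    ("node-a", "node-b", 2048),
    ("node-c", "node-b", 512),
    ("node-b", "node-a", 4096),
]
reserved = drbd_reserved_memory(example)
```

In this example `node-b` has to keep 3072 MB free, because a failure of `node-a` is more demanding than a failure of `node-c`.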
A cluster is considered N+1 redundant if, for every node, all DRBD instances can be migrated out and then all shared-storage instances can be relocated to a different node without moving instances on other nodes. This is precisely the operation performed after a node breaks. Obviously, simulating failure and evacuation for every single node is an expensive operation.
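The definition above can be sketched as a brute-force check that simulates the failure of each node in turn. This is a simplified illustration, not the htools algorithm: all names are hypothetical, only memory is modelled, and shared-storage instances are placed with a greedy heuristic (largest instance first, onto the node with the most free memory). The heuristic is conservative, so it may reject a cluster for which a cleverer placement exists.

```python
def survives_failure(failed, free_mem, drbd, shared):
    """Check whether the instances of `failed` fit on the remaining nodes.

    free_mem: {node: free memory in MB}
    drbd:     [(primary, secondary, memory_mb)]
    shared:   [(node, memory_mb)]
    """
    # Free memory on the surviving nodes; instances there are not moved.
    free = {node: mem for node, mem in free_mem.items() if node != failed}

    # Step 1: DRBD instances can only be restarted on their secondary.
    for primary, secondary, mem in drbd:
        if primary == failed:
            if free.get(secondary, 0) < mem:
                return False
            free[secondary] -= mem

    # Step 2: shared-storage instances may go to any surviving node.
    evacuees = sorted((mem for node, mem in shared if node == failed),
                      reverse=True)
    for mem in evacuees:
        target = max(free, key=free.get, default=None)
        if target is None or free[target] < mem:
            return False
        free[target] -= mem
    return True


def is_n_plus_one_redundant(free_mem, drbd, shared):
    """Simulate the failure of every node; expensive, as noted above."""
    return all(survives_failure(node, free_mem, drbd, shared)
               for node in free_mem)


free_mem = {"node-a": 0, "node-b": 4096, "node-c": 2048}
drbd = [("node-a", "node-b", 2048)]
shared_ok = [("node-a", 2048), ("node-a", 1024)]
shared_full = shared_ok + [("node-a", 2048)]
```

With `shared_ok` every single-node failure can be absorbed; with `shared_full` the failure of `node-a` leaves one instance without a home, so the check fails.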
For DRBD, N+1 redundancy is affected by moving instances and by balancing the cluster. Moreover, taking it into account during balancing can help improve allocation efficiency by considering the total reserved memory. Hence, N+1 redundancy for DRBD is to be taken into account for all choices affecting instance location, including instance allocation and balancing.
Shared-storage instances can move anywhere within the node group. So, in practice, this is mainly a question of capacity planning, especially if most instances have the same size. Nevertheless, the offcuts that arise when instances do not fill a node entirely must not be ignored.
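The effect of offcuts on capacity planning can be shown with a small calculation. The sketch below assumes identical nodes and identical instance sizes, which is a simplification; the function name and the numbers are purely illustrative.

```python
def n_plus_one_capacity(node_count, node_mem, instance_mem):
    """Return (usable, naive, offcut) for equally sized nodes and instances.

    usable: instances the group can run while still absorbing the loss
            of any single node
    naive:  the same figure if memory were pooled across nodes
    offcut: memory per node that no instance of this size can use
    """
    per_node = node_mem // instance_mem
    offcut = node_mem % instance_mem
    # One node's worth of slots must stay free to absorb a failure.
    usable = (node_count - 1) * per_node
    naive = ((node_count - 1) * node_mem) // instance_mem
    return usable, naive, offcut


# Ten nodes with 64 GiB each, instances of 24 GiB.
usable, naive, offcut = n_plus_one_capacity(10, 64, 24)
```

Here each node holds two instances and wastes 16 GiB, so the group can run 18 instances redundantly rather than the 24 that a pooled-memory estimate would suggest.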