Checking for N+M redundancy

This document describes how the level of redundancy is estimated in Ganeti.

Current state and shortcomings

Ganeti keeps the cluster N+1 redundant, also taking into account N+1 redundancy for shared storage. In other words, Ganeti tries to keep the cluster in a state where, after the failure of any single node, all instances can be started immediately. However, it is sometimes desirable, e.g. when planning maintenance, to know how many node losses the cluster can recover from. This information is also useful when operating big clusters where hardware repairs are expected to take a long time.

Proposed changes

Higher redundancy as a sequential concept

The intuitive meaning of an N+M redundant cluster is that M nodes can fail without instances being lost. However, when DRBD is used, the failure of just 2 nodes (an instance's primary and secondary) can already cause the complete loss of that instance. Therefore, the best we can hope for is to be able to recover from M sequential failures. The next definition formalizes this intuition: a cluster is N+M redundant if M nodes can fail one by one, with enough time for a rebalance in between, without instances being lost.

Definition of N+M redundancy

We keep the definition of N+1 redundancy for shared storage. Moreover, for M a non-negative integer, we define a cluster to be N+(M+2) redundant if, after draining any node, the standard rebalancing procedure (as provided, e.g., by hbal) fully evacuates that node and results in an N+(M+1) redundant cluster.
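The recursive shape of this definition can be illustrated with a deliberately simplified toy model. In the sketch below, a node group is reduced to a list of node capacities and a single scalar load, and "rebalancing succeeds" simply means that the remaining capacity covers the load. These simplifications, and the names fits, is_n_plus_1 and is_n_plus_m, are assumptions made for illustration only; they are not part of Ganeti, and real balancing with hbal takes memory, disk, CPU and instance placement into account.

```python
from typing import Sequence


def fits(capacities: Sequence[int], load: int) -> bool:
    """Toy stand-in for 'all instances can be placed': total capacity covers the load."""
    return sum(capacities) >= load


def is_n_plus_1(capacities: Sequence[int], load: int) -> bool:
    """Base case: after losing any single node, the load still fits."""
    nodes = list(capacities)
    return all(fits(nodes[:i] + nodes[i + 1:], load) for i in range(len(nodes)))


def is_n_plus_m(capacities: Sequence[int], load: int, m: int) -> bool:
    """Inductive case: draining any node must succeed and leave an N+(m-1) cluster."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        return is_n_plus_1(capacities, load)
    nodes = list(capacities)
    for i in range(len(nodes)):
        rest = nodes[:i] + nodes[i + 1:]
        # The drained node must be fully evacuated ...
        if not fits(rest, load):
            return False
        # ... and the resulting cluster must be one level less redundant.
        if not is_n_plus_m(rest, load, m - 1):
            return False
    return True
```

In this toy model, three nodes of capacity 4 carrying a load of 4 are N+2 redundant but not N+3 redundant. Note that the recursion visits every order of node failures, which is what makes the exact check expensive.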

Independence of Groups

It follows immediately from the definition that the redundancy level, i.e., the maximal M such that the cluster is N+M redundant, can be computed group by group, because the standard balancing algorithm never moves instances between node groups. The redundancy level of the cluster is then the minimum of the redundancy levels of the individual groups.
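As a minimal sketch of this combination step, assume the per-group levels have already been computed and are available as a mapping from group name to level. The function name cluster_redundancy_level and the group names in the usage note are hypothetical.

```python
from typing import Mapping


def cluster_redundancy_level(group_levels: Mapping[str, int]) -> int:
    """Return the cluster-wide level: the weakest node group determines it."""
    if not group_levels:
        raise ValueError("at least one node group is required")
    return min(group_levels.values())
```

For example, a cluster whose groups "default" and "rack-b" have levels 3 and 1 has an overall redundancy level of 1.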

Estimation of the redundancy level

The definition of N+M redundancy requires considering M failures in arbitrary order, and hence super-exponentially many cases for large M. However, since balancing moves instances anyway, the redundancy level mainly depends on the amount of node resources available to the instances in a node group. So we can get a good approximation of the redundancy level of a node group by only considering draining one largest node in that group. This is how Ganeti will estimate the redundancy level.
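The estimation idea can be sketched in a toy model where a node group is a list of node capacities with a single scalar load, and a drain succeeds whenever the remaining capacity covers the load. Applying the largest-node drain repeatedly until it fails is one reading of the estimate, assumed here for illustration. The model and the name estimate_redundancy_level are also assumptions; this is not the hcheck implementation, which relies on the actual rebalancing procedure.

```python
from typing import Sequence


def estimate_redundancy_level(capacities: Sequence[int], load: int) -> int:
    """Count how many largest-node drains the group survives in the toy model."""
    remaining = sorted(capacities)  # ascending, so the largest node is last
    level = 0
    while remaining:
        remaining.pop()  # drain one largest node
        if sum(remaining) < load:
            break  # the load no longer fits; stop counting
        level += 1
    return level
```

Instead of exploring every order of failures, this follows a single chain of drains, so the cost grows with the number of nodes rather than with the number of failure sequences. For three nodes of capacity 4 and a load of 4, the estimate is 2.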

Modifications to existing tools

As redundancy levels higher than N+1 are mainly about planning capacity, the level of redundancy only needs to be computed on demand. Hence, we keep the tool changes minimal.

  • hcheck will report the level of redundancy for each node group as a new output parameter

The rest of Ganeti will not be changed.