This design document describes changes to hbal and related components (first and foremost LUXI) that will allow hbal to handle nodes that cannot be considered monolithic with regard to disk layout, for example because they have multiple different storage units available.
Currently the htools assume that there is one storage unit per node and that it can be arbitrarily split among instances. This leads to problems in clusters where multiple storage units are present: There might be 10GB DRBD and 10GB plain storage available on a node, for a total of 20GB. If an instance that uses 15GB of a single type of storage is requested, it can’t actually fit on the node, but the current implementation of hail doesn’t notice this.
This behaviour is clearly wrong, but the problem doesn’t arise often in current setups, because instances currently have only a single storage type and users typically use node groups to separate different node storage layouts.
For the node show action, RAPI only returns
which is insufficient for the same reasons.
The LUXI and IAllocator protocols are extended to include in the node:
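storage: a list of objects (storage units), each with a storage unit descriptor (sunit in the example below) containing, in order:
- storage type
- storage key (e.g. volume group name)
- extra parameters (e.g. flag for exclusive storage) as a list.
Each object also carries the amount of free and total space of the unit (free and total in the example below).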
{
"storage": [
{ "sunit": ["drbd8", "xenvg", []]
, "free": 2000
, "total": 4000
},
{ "sunit": ["file", "/path/to/storage1", []]
, "free": 5000
, "total": 10000
},
{ "sunit": ["file", "/path/to/storage2", []]
, "free": 1000
, "total": 20000
},
{ "sunit": ["lvm-vg", "xenssdvg", [false]]
, "free": 1024
, "total": 1024
}
]
}
This describes a node with an LVM volume group mirrored over DRBD, two file storage directories (one half full, one mostly full), and a non-mirrored volume group.
The storage type drbd8 needs to be added in order to differentiate between mirrored storage and non-mirrored storage. The storage key signals the volume group used and the storage unit takes no additional parameters.
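The structure of the storage field can be illustrated with a short Python sketch that reads the example node into plain records. The class name StorageUnit and the function parse_storage are hypothetical names chosen for this illustration; they are not part of LUXI, IAllocator or htools.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageUnit:
    """One entry of a node's "storage" list (illustrative name, not a Ganeti type)."""
    stype: str     # storage type, e.g. "drbd8" or "lvm-vg"
    key: str       # storage key, e.g. a volume group name or a directory
    params: tuple  # extra parameters, e.g. (False,) for the exclusive storage flag
    free: int      # free space as reported by the protocol
    total: int     # total space, in the same unit as free


def parse_storage(node: dict) -> List[StorageUnit]:
    """Turn the "storage" field of a node object into StorageUnit records.

    A node without the field yields an empty list.
    """
    units = []
    for entry in node.get("storage", []):
        stype, key, params = entry["sunit"]
        units.append(StorageUnit(stype, key, tuple(params),
                                 entry["free"], entry["total"]))
    return units


# The example node from the text, written as strict JSON.
NODE_JSON = """
{ "storage": [
    { "sunit": ["drbd8", "xenvg", []], "free": 2000, "total": 4000 },
    { "sunit": ["file", "/path/to/storage1", []], "free": 5000, "total": 10000 },
    { "sunit": ["file", "/path/to/storage2", []], "free": 1000, "total": 20000 },
    { "sunit": ["lvm-vg", "xenssdvg", [false]], "free": 1024, "total": 1024 }
] }
"""

units = parse_storage(json.loads(NODE_JSON))
for unit in units:
    print(unit.stype, unit.key, f"{100 * unit.free / unit.total:.0f}% free")
```

Keeping the storage type, key and parameters together mirrors the sunit triple, so two units of the same type but with different keys (the two file directories here) stay distinct.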
The same field is optionally present in the HTools text protocol: a new “storage” column is added to the node section. It is a semicolon-separated list of entries, each made of comma-separated fields in the order free, total, sunit, where sunit itself consists of
- the storage type
- the storage key
- extra arguments
For example:
2000,4000,drbd,xenvg;5000,10000,file,/path/to/storage1;1000,20000; [...]
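As a rough sketch of how such a column could be read, the following Python function splits it into entries and fields. The names StorageField and parse_storage_column are hypothetical, and two assumptions are made that the text does not state: storage keys contain neither commas nor semicolons, and extra arguments simply follow the key as further comma-separated strings.

```python
from typing import List, NamedTuple, Tuple


class StorageField(NamedTuple):
    """One entry of the "storage" column (illustrative name)."""
    free: int
    total: int
    stype: str
    key: str
    extra: Tuple[str, ...]  # assumption: extra arguments kept as raw strings


def parse_storage_column(column: str) -> List[StorageField]:
    """Split a "storage" column into its entries.

    Entries are separated by ';' and fields by ','. Entries with fewer
    than four fields are rejected instead of being guessed at.
    """
    fields = []
    for entry in column.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        parts = entry.split(",")
        if len(parts) < 4:
            raise ValueError(f"incomplete storage entry: {entry!r}")
        free, total, stype, key = parts[:4]
        fields.append(StorageField(int(free), int(total), stype, key,
                                   tuple(parts[4:])))
    return fields


# Only the two complete entries of the example above are used here.
parsed = parse_storage_column(
    "2000,4000,drbd,xenvg;5000,10000,file,/path/to/storage1")
for field in parsed:
    print(field)
```

Rejecting short entries is a deliberate choice in this sketch: a silently defaulted storage type would make a unit look compatible with disks it cannot hold.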
hbal and hail will use this information only if it is available; if the data file doesn’t contain the storage field, the old algorithm is used.
If the node information contains the storage field, hbal and hail will assume that only the space compatible with the disk’s requirements is available. For an instance to fit a node, all its disks need to fit there separately. For a disk to fit a node, a storage unit of the type of the disk needs to have enough free space to contain it. The total free storage is not taken into consideration.
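The fitting rule, together with the fallback to the old algorithm, can be sketched in Python as follows. The function names and the tuple representation are assumptions made for this illustration, and the storage type names are borrowed from the JSON example. The check follows the wording literally and tests each disk separately, so it does not account for several disks competing for the same storage unit.

```python
from typing import List, Optional, Tuple

Unit = Tuple[str, int]  # (storage type, free space)
Disk = Tuple[str, int]  # (storage type, requested size)


def disk_fits(disk: Disk, units: List[Unit]) -> bool:
    """A disk fits if some unit of its type has enough free space."""
    dtype, size = disk
    return any(utype == dtype and free >= size for utype, free in units)


def instance_fits(disks: List[Disk],
                  units: Optional[List[Unit]],
                  total_free: int) -> bool:
    """Decide whether an instance fits on a node.

    units is None when the node data carries no storage field; in that
    case the old check against the node's total free space is used.
    """
    if units is None:
        return sum(size for _, size in disks) <= total_free
    # New rule: every disk must fit separately; total_free is ignored.
    return all(disk_fits(disk, units) for disk in disks)


# The scenario from the introduction: 10GB DRBD plus 10GB plain storage.
node_units = [("drbd8", 10), ("lvm-vg", 10)]
request = [("lvm-vg", 15)]

print(instance_fits(request, None, total_free=20))        # True: old check
print(instance_fits(request, node_units, total_free=20))  # False: new check
```

The two calls show the difference: the old check accepts the 15GB request because 20GB are free in total, while the per-unit check rejects it because no single plain unit has 15GB free.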
Ignoring the old information will in theory introduce a backwards incompatibility: if the total free storage is smaller than the sum of the free storage reported in the storage field, a previously illegal move will become legal.
In order to determine a storage location for an instance, we collect metrics analogous to the current total node free space metric, namely the standard deviation statistic of the free space per storage unit.
The standard deviation metric of a desired storage unit is the sample standard deviation of the percentage of free space of the compatible storage units.
The full storage metric is the average of the standard deviation metrics of the desired storage units.
This is backwards compatible in-so-far as that
Furthermore, this retains the property that scarce resources with a low total will tend to have a bigger impact on the metric than those with large totals, because in the latter case the relative differences will not make for a large standard deviation.
Ignoring nodes that do not contain the desired storage unit additionally boosts the importance of scarce desired storage units, because having more storage units of a desired type will tend to make the standard deviation metric smaller.
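The metric described above can be sketched in Python. Several points are assumptions of this illustration rather than statements of the design: "compatible" is read as "of the same storage type", a storage type with fewer than two matching units contributes 0.0 (the sample standard deviation is undefined there), and the names stddev_metric and storage_metric are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Unit = Tuple[str, int, int]     # (storage type, free, total)
Cluster = Dict[str, List[Unit]]  # node name -> storage units


def stddev_metric(stype: str, cluster: Cluster) -> float:
    """Sample standard deviation of the percentage of free space over
    all units of the given type; nodes without such a unit are ignored."""
    percents = [100.0 * free / total
                for units in cluster.values()
                for utype, free, total in units
                if utype == stype and total > 0]
    # Assumption: fewer than two data points give a metric of 0.0.
    return stdev(percents) if len(percents) >= 2 else 0.0


def storage_metric(desired: Iterable[str], cluster: Cluster) -> float:
    """Average of the per-type standard deviation metrics."""
    metrics = [stddev_metric(stype, cluster) for stype in desired]
    return mean(metrics) if metrics else 0.0


cluster = {
    "node1": [("drbd8", 2000, 4000), ("lvm-vg", 1024, 1024)],
    "node2": [("drbd8", 1000, 4000), ("lvm-vg", 512, 1024)],
    "node3": [("file", 5000, 10000)],  # has neither desired type
}

print(stddev_metric("drbd8", cluster))
print(stddev_metric("lvm-vg", cluster))
print(storage_metric(["drbd8", "lvm-vg"], cluster))
```

In this example the small lvm-vg units differ by 512 and the large drbd8 units by 1000, yet the lvm-vg metric is the larger one, because the metric works on percentages of each unit's total.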