This document describes the changes in Ganeti 2.3+ compared to Ganeti 2.3 storage model.
Contents
The aim is to introduce support for externally mirrored, shared storage. This includes two distinct disk templates:
DRBD is currently the only shared storage backend supported by Ganeti. DRBD offers the advantages of high availability while running on commodity hardware at the cost of high network I/O for block-level synchronization between hosts. DRBD’s master-slave model has greatly influenced Ganeti’s design, primarily by introducing the concept of primary and secondary nodes and thus defining an instance’s “mobility domain”.
Although DRBD has many advantages, many sites choose to use networked storage appliances for Virtual Machine hosting, such as SAN and/or NAS, which provide shared storage without the administrative overhead of DRBD nor the limitation of a 1:1 master-slave setup. Furthermore, new distributed filesystems such as Ceph are becoming viable alternatives to expensive storage appliances. Support for both modes of operation, i.e. shared block storage and shared file storage backend would make Ganeti a robust choice for high-availability virtualization clusters.
Throughout this document, the term “externally mirrored storage” will refer to both modes of shared storage, suggesting that Ganeti does not need to take care about the mirroring process from one host to another.
We consider the following use cases:
The design addresses the following procedures:
Additionally, mid- to long-term goals include:
Currently, all storage-related decision-making depends on a number of frozensets in lib/constants.py, typically constants.DTS_NET_MIRROR. However, constants.DTS_NET_MIRROR is used to signify two different attributes:
We propose the introduction of two new frozensets to ease decision-making:
Additionally, DTS_NET_MIRROR will be renamed to DTS_INT_MIRROR to reflect the status of the storage as internally mirrored by Ganeti.
Thus, checks could be grouped into the following categories:
The primary-secondary node concept has primarily evolved through the use of DRBD. In a globally shared storage framework without need for external sync (e.g. SAN, NAS, etc.), such a notion does not apply for the following reasons:
The proposed solution is using the iallocator framework for run-time decision making during migration and failover, for nodes with disk templates in constants.DTS_EXT_MIRROR. Modifications to gnt-instance and gnt-node will be required to accept target node and/or iallocator specification for these operations. Modifications of the iallocator protocol will be required to address at least the following needs:
Basic shared file storage support can be implemented by creating a new disk template based on the existing FileStorage class, with only minor modifications in lib/bdev.py. The shared file disk template relies on a shared filesystem (e.g. NFS, AFS, Ceph, OCFS2 over SAN or DRBD) being mounted on all nodes under the same path, where instance images will be saved.
A new cluster initialization option is added to specify the mountpoint of the shared filesystem.
The remainder of this document deals with shared block storage.
Basic shared block device support will be implemented with an additional disk template. This disk template will not feature any kind of storage control (provisioning, removal, resizing, etc.), but will instead rely on the adoption of already-existing block devices (e.g. SAN LUNs, NBD devices, remote iSCSI targets, etc.).
The shared block device template will make the following assumptions:
A new cluster configuration attribute will be introduced, named “storage_pools”, modeled as a dictionary mapping storage pools to external storage drivers (see below), e.g.:
{
"nas1": "foostore",
"nas2": "foostore",
"cloud1": "barcloud",
}
Ganeti will not interpret the contents of this dictionary, although it will provide methods for manipulating them under some basic constraints (pool identifier uniqueness, driver existence). The manipulation of storage pools will be performed by implementing new options to the gnt-cluster command:
gnt-cluster modify --add-pool nas1 foostore
gnt-cluster modify --remove-pool nas1 # There may be no instances using
# the pool to remove it
Furthermore, the storage pools will be used to indicate the availability of storage pools to different node groups, thus specifying the instances’ “mobility domain”.
New disk templates will also be necessary to facilitate the use of external storage. The proposed addition is a whole template namespace created by prefixing the pool names with a fixed string, e.g. “ext:”, forming names like “ext:nas1”, “ext:foo”.
In addition to external storage pools, a new interface will be introduced to allow external scripts to provision and manipulate shared storage.
In order to provide storage provisioning and manipulation (e.g. growing, renaming) capabilities, each instance’s disk template can possibly be associated with an external “storage driver” which, based on the instance’s configuration and tags, will perform all supported storage operations using auxiliary means (e.g. XML-RPC, ssh, etc.).
A “storage driver” will have to provide the following methods:
The proposed storage driver architecture borrows heavily from the OS interface and follows a one-script-per-function approach. A storage driver is expected to provide the following scripts:
These executables will be called once for each disk with no arguments and all required information will be passed through environment variables. The following environment variables will always be present on each invocation:
Additional variables may be available in a per-script context (see below).
Of particular importance is the disk’s logical ID, which will act as glue between Ganeti and the external storage drivers; there are two possible ways of using a disk’s logical ID in a storage driver:
All scripts should return 0 on success and non-zero on error accompanied by an appropriate error message on stderr. Furthermore, the following special cases are defined:
create In case of success, a string representing the disk’s logical id must be returned on stdout, which will be saved in the instance’s configuration and can be later used by the other scripts of the same storage driver. The logical id may be based on instance name, instance uuid and/or disk index.
resize In case of success, output the new disk size.
rename On success, a new logical id should be returned, which will replace the old one. This script is meant to rename the instance’s backing store and update the disk’s logical ID in case one of them is bound to the instance name.
Moving instances accross node groups
Enter search terms or a module, class or function name.