This is a design document detailing the proposed changes to the upgrade process, in order to allow it to be more automatic.
Ganeti requires the same version of Ganeti to run on all nodes of a cluster, and this requirement is unlikely to go away in the foreseeable future. Also, the configuration may change between minor versions (and in the past has proven to do so). This requires a quite involved manual upgrade process: draining the queue, stopping Ganeti, changing the binaries, upgrading the configuration, starting Ganeti, distributing the configuration, and undraining the queue.
While we will not remove the requirement of the same Ganeti version running on all nodes, the transition from one version to the other will be made more automatic. It will be possible to install new binaries ahead of time, and the actual switch between versions will be a single command.
While changing the file layout anyway, we will install the Python code, which is architecture independent, under ${prefix}/share, in a way that properly separates the Ganeti libraries of the various versions.
Currently, Ganeti installs to ${PREFIX}/bin, ${PREFIX}/sbin, and so on, as well as to ${pythondir}/ganeti.
These paths will be changed in the following way.
The set of links for Ganeti binaries might change between versions. However, as the file structure under ${libdir}/ganeti/${VERSION} reflects that of /, two links of different versions will never conflict. Similarly, the symbolic links for the Python executables will never conflict, as they always point to a file with the same basename directly under ${PREFIX}/share/ganeti/default. Therefore, each version will make sure that enough symbolic links are present in ${PREFIX}/bin, ${PREFIX}/sbin, and so on, even though some might be dangling if a different version of Ganeti is currently active.
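The per-version link maintenance described above could be sketched as follows; the helper name and the example paths are illustrative assumptions, not the actual Ganeti implementation.

```python
import os

def ensure_links(link_map):
    """Create the given symbolic links, replacing stale ones.

    link_map maps link locations (e.g. /usr/sbin/gnt-cluster) to
    their targets (e.g. /usr/share/ganeti/default/gnt-cluster).
    Links may end up dangling if a different Ganeti version is
    currently active; that is acceptable, as each version only
    guarantees that its own links are present.
    """
    for link, target in link_map.items():
        if os.path.islink(link):
            if os.readlink(link) == target:
                continue  # already points to the right place
            os.unlink(link)  # points elsewhere: replace it
        os.symlink(target, link)

# Hypothetical invocation for version 2.10:
# ensure_links({
#     "/usr/sbin/gnt-cluster": "/usr/share/ganeti/default/gnt-cluster",
#     "/usr/bin/harep": "/usr/lib/ganeti/default/usr/bin/harep",
# })
```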
The extra indirection through ${sysconfdir} allows installations that choose to have ${sysconfdir} and ${localstatedir} outside ${PREFIX} to mount ${PREFIX} read-only. The latter is important for systems that choose /usr as ${PREFIX} and are following the Filesystem Hierarchy Standard. For example, choosing /usr as ${PREFIX} and /etc as ${sysconfdir}, the layout for version 2.10 will look as follows.
/
|
+-- etc
| |
| +-- ganeti
| |
| +-- lib -> /usr/lib/ganeti/2.10
| |
| +-- share -> /usr/share/ganeti/2.10
+-- usr
|
+-- bin
| |
| +-- harep -> /usr/lib/ganeti/default/usr/bin/harep
| |
| ...
|
+-- sbin
| |
| +-- gnt-cluster -> /usr/share/ganeti/default/gnt-cluster
| |
| ...
|
+-- ...
|
+-- lib
| |
| +-- ganeti
| |
| +-- default -> /etc/ganeti/lib
| |
| +-- 2.10
| |
| +-- usr
| |
| +-- bin
| | |
| | +-- htools
| | |
| | +-- harep -> htools
| | |
| | ...
| ...
|
+-- share
|
+-- ganeti
|
+-- default -> /etc/ganeti/share
|
+-- 2.10
|
+-- gnt-cluster
|
+-- gnt-node
|
+-- ...
|
+-- ganeti
|
+-- backend.py
|
+-- ...
|
+-- cmdlib
| |
| ...
...
The actual upgrade process will be done by a new command, upgrade, of gnt-cluster. It is called with the option --to, which takes precisely one argument: the version to upgrade (or downgrade) to, given as a full string with major, minor, revision, and suffix. To be compatible with current configuration upgrade and downgrade procedures, the new version must have the same major version and either an equal or higher minor version, or precisely the previous minor version.
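The version constraint could be checked as follows; the function name and the tuple-based version representation are illustrative assumptions (the suffix is ignored for this check).

```python
def upgrade_allowed(current, target):
    """Check whether an upgrade (or downgrade) from current to target
    is supported: the major versions must match, and the target minor
    version must be equal, higher, or precisely one below the current
    minor version.

    Versions are given as (major, minor, revision) tuples.
    """
    cur_major, cur_minor, _ = current
    tgt_major, tgt_minor, _ = target
    if tgt_major != cur_major:
        return False
    return tgt_minor >= cur_minor or tgt_minor == cur_minor - 1

# From 2.10.0, upgrading to 2.11.x or downgrading to 2.9.x would be
# allowed, but not going to 2.8.x or 3.0.x.
```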
When executed, gnt-cluster upgrade --to=<version> will perform the following actions.
During the upgrade procedure, the only Ganeti process still running is the one instance of gnt-cluster upgrade. This process is also responsible for eventually removing the queue drain. Therefore, we have to provide means to resume this process if it dies unintentionally. The process itself will handle SIGTERM gracefully, either by undoing all changes done so far, or by ignoring the signal altogether and continuing to the end; the choice between these behaviors depends on whether the change of the configuration has already started (in which case it goes through to the end) or not (in which case the actions done so far are rolled back).
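The SIGTERM policy just described could be sketched like this; the class, the flag, and the rollback placeholder are illustrative assumptions, not the actual implementation.

```python
import signal

class UpgradeProcess:
    """Sketch of the SIGTERM policy: roll back if the configuration
    change has not started yet, otherwise ignore the signal and run
    through to the end."""

    def __init__(self):
        self.config_change_started = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        if self.config_change_started:
            # Past the point of no return: ignore the signal and
            # continue to the end of the upgrade.
            return
        self.rollback()
        raise SystemExit(1)

    def rollback(self):
        # Undo everything done so far: restore the old binaries,
        # undrain the queue, remove the state file (illustrative).
        pass
```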
To achieve this, gnt-cluster upgrade will support a --resume option. It is recommended to have gnt-cluster upgrade --resume as an at-reboot task in the crontab. The gnt-cluster upgrade --resume command first verifies that it is running on the master node, using the same requirement as for starting the master daemon, i.e., confirmed by a majority of all nodes. If it is not running on the master node, it will remove any possibly existing intend-to-upgrade file and exit. If it is running on the master node, it will check for the existence of an intend-to-upgrade file. If no such file is found, it will simply exit. If found, it will resume at the appropriate stage.
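The --resume decision logic could be sketched as follows; the function, the return values, and the state-file path are illustrative assumptions, and the master check is taken as an already-computed boolean.

```python
import os

def resume_action(is_master, statefile):
    """Decide what gnt-cluster upgrade --resume should do.

    is_master: whether a majority of all nodes confirmed this node as
    master (the same requirement as for starting the master daemon).
    statefile: path of the intend-to-upgrade file.
    Returns "exit", "cleanup" (stale state file removed, then exit),
    or "resume" (continue the interrupted upgrade).
    """
    if not is_master:
        if os.path.exists(statefile):
            os.unlink(statefile)  # remove a possibly stale state file
            return "cleanup"
        return "exit"
    if not os.path.exists(statefile):
        return "exit"  # no upgrade was in progress
    return "resume"
```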
Since gnt-cluster upgrade drains the queue and undrains it later, any information about a previous drain gets lost. This problem will disappear once Filtering of jobs for the Ganeti job queue is implemented, as then the undrain will be restricted to the filters added by gnt-cluster upgrade.
Since for upgrades we only pause jobs and do not fully drain the queue, we need to be able to transform the job queue into a queue for the new version. The preferred way to obtain this is to keep the serialization format backwards compatible, i.e., only adding new opcodes and new optional fields.
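The preferred backwards-compatible serialization could look like this: a new version tolerates opcodes written by the previous version by supplying defaults for its newly added optional fields. The field names below are illustrative assumptions, not actual Ganeti opcode fields.

```python
import json

# An opcode as serialized by the previous version, which does not yet
# know about the optional field added in the new version (field names
# are illustrative):
OLD_JSON = '{"OP_ID": "OP_NODE_ADD", "node_name": "node1"}'

def load_opcode(data):
    """Deserialize an opcode, supplying defaults for optional fields
    that older versions did not write, so the old queue remains
    readable by the new version."""
    op = json.loads(data)
    op.setdefault("new_optional_field", None)  # added in the new version
    return op

op = load_opcode(OLD_JSON)
```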
However, even with a soft drain, no job is running at the moment cfgupgrade runs. So, if we change the queue representation, including the representation of individual opcodes, in any way, cfgupgrade will also modify the queue accordingly. In a jobs-as-processes world, pausing a job will be implemented in such a way that the corresponding process stops after finishing the current opcode, and a new process is created if and when the job is unpaused again.