Design for virtual clusters support¶

Created:	2011-Oct-14
Status:	Partial Implementation
Ganeti-Version:	2.7.0

Introduction¶

Currently there are two ways to test the Ganeti (including HTools) code base:

unittests, which run using mocks as a normal user and test small bits of the code
QA/burnin/live-test, which require actual hardware (either physical or virtual) and will build an actual cluster, with a one-to-one correspondence between machines and nodes

The difference in time between these two is significant:

the unittests run in about 1-2 minutes
a so-called ‘quick’ QA (without burnin) runs in about an hour, and a full QA could take twice as long

On the one hand, the unittests have clear advantages: they are quick to run and do not require many machines. On the other hand, QA is actually able to run end-to-end tests (including HTools, for example).

Ideally, we would have an intermediate step between these two extremes: be able to test most, if not all, of Ganeti’s functionality but without requiring actual hardware, full machine ownership or root access.

Current situation¶

Ganeti¶

It is possible, given a manually built config.data and _autoconf.py, to run the masterd under the current user as a single-node cluster master. However, the node daemon and related functionality (cluster initialisation, master failover, etc.) are not directly runnable in this model.

Also, masterd only works as the master of a single-node cluster, due to our current “hostname” method of identifying nodes, which results in a limit of at most one node daemon per machine, unless we use multiple name and IP aliases.

HTools¶

In HTools the situation is better, since it doesn’t have to deal with actual machine management. All tools can use a custom LUXI path and can even load RAPI data from the filesystem (so the RAPI backend can be tested). Both the ‘text’ backend for hbal/hspace and the input files for hail are text-based and loaded from the filesystem.

Proposed changes¶

The end goal is to have full support for “virtual clusters”, i.e. to be able to run a “big” cluster (hundreds of virtual nodes and towards thousands of virtual instances) on a single, reasonably powerful machine, under a single user account and without any special privileges.

This would have significant advantages:

being able to test certain changes end-to-end, without requiring a complicated setup
being better able to estimate Ganeti’s behaviour and performance as the cluster size grows; this is something that we haven’t been able to test reliably yet, and as a result scaling problems remain undiagnosed
easier integration with external tools (and even with HTools)

`masterd`¶

As described above, masterd already works reasonably well in a virtual setup, as it won’t execute external programs and it shouldn’t directly read files from the local filesystem (or at least not virtualisation-related ones, as the master node can be a non-vm_capable node).

`noded`¶

The node daemon executes many privileged operations, but they can be split into a few general categories:

Category	Description	Solution
disk operations	Disk creation and removal	Use only diskless or file-based instances
disk query	Node disk total/free, used in node listing and htools	Not supported currently, could use file-based
hypervisor operations	Instance start, stop and query	Use the fake hypervisor
instance networking	Bridge existence query	Unprivileged operation, can be used with an existing bridge at system level or use NIC-less instances
instance OS operations	OS add, OS rename, export and import	Only used with non-diskless instances; could work with custom OS scripts that just `dd` without mounting filesystems
node networking	IP address management (master ip), IP query, etc.	Not supported; Ganeti will need to work without a master IP; for the IP query operations the test machine would need externally-configured IPs
node add		SSH command must be adjusted
node setup	ssh, /etc/hosts, and so on	Can already be disabled from the cluster config
master failover	start/stop the master daemon	Doable (as long as we use a single user), might get tricky w.r.t. paths to executables
file upload	Uploading of system files, job queue files and Ganeti config	The only issue could be with system files, which are not owned by the current user; internal Ganeti files should work fine
node oob	Out-of-band commands	Since these are user-defined, we can mock them easily
node OS discovery	List the existing OSes and their properties	No special privileges needed, so works fine as-is
hooks	Running hooks for given operations	No special privileges needed
iallocator	Calling an iallocator script	No special privileges needed
export/import	Exporting and importing instances	When exporting/importing file-based instances, this should work, as the listening ports are dynamically chosen
hypervisor validation	The validation of hypervisor parameters	As long as the hypervisors don’t call privileged commands, it should work
node powercycle	The ability to power cycle a node remotely	Privileged, so not supported; in any case not very interesting for testing

It seems that much of the functionality works as is, or could work with small adjustments, even in a non-privileged setup. The bigger problem is the actual use of multiple node daemons per machine.

Multiple `noded` per machine¶

Currently Ganeti identifies nodes simply by their hostname. Since changing this method would imply significant changes to how nodes are tracked, the proposal is to give the (single) test machine as many IPs as there are virtual nodes, with each IP corresponding to a different hostname; this way no changes are needed to the core RPC library. Unfortunately, this has the downside of requiring root rights for setting up the extra IPs and hostnames.
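As a rough illustration of the one-IP-per-virtual-node idea, the following sketch generates a hostname/IP plan and formats it as /etc/hosts-style lines. It is not part of Ganeti: the function names, the `nodeN` naming scheme, the `example.com` domain and the documentation address range `192.0.2.0/24` are all assumptions chosen for the example. Actually assigning the addresses to an interface is the step that needs root and is deliberately left out.

```python
import ipaddress


def plan_virtual_nodes(count, first_ip="192.0.2.1", domain="example.com"):
    """Return (hostname, ip) pairs, one per virtual node.

    The address range, naming scheme and domain are illustrative
    assumptions, not values taken from Ganeti.
    """
    base = ipaddress.ip_address(first_ip)
    return [
        ("node%d.%s" % (i + 1, domain), str(base + i))
        for i in range(count)
    ]


def hosts_lines(plan):
    """Format the plan as /etc/hosts-style lines (IP first, then hostname)."""
    return ["%s\t%s" % (ip, name) for name, ip in plan]


if __name__ == "__main__":
    for line in hosts_lines(plan_virtual_nodes(3)):
        print(line)
```

Keeping the plan as plain data means the same list could feed both the hosts file and whatever privileged step configures the extra addresses.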

An alternative option is to implement per-node IP/port support in Ganeti (especially in the RPC layer), which would eliminate the need for root rights. We expect that this will be implemented as a second step of this design, but since the port is currently static, it will require changes in many places.

The only remaining problem is with sharing the localstatedir structure (lib, run, log) amongst the daemons, for which we propose to introduce an environment variable (GANETI_ROOTDIR) acting as a prefix for essentially all paths. An environment variable is easier to transport through several levels of programs (shell scripts, Python, etc.) than a command line parameter. In Python code this prefix will be applied to all paths in constants.py. Every virtual node will get its own root directory. The rationale for this is two-fold:

having two or more node daemons writing to the same directory might introduce artificial scenarios that do not exist in real life; currently noded either owns the entire /var/lib/ganeti directory or shares it with masterd, but never with another noded
having separate directories allows cluster verify to correctly check the consistency of file upload operations; otherwise, as long as one node daemon wrote a file successfully, the results from all others are “lost”

In case the use of an environment variable turns out to be too difficult, a compile-time prefix path could be used. This would then require one Ganeti installation per virtual node, but it might be good enough.
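The environment-variable approach could look roughly like the sketch below. Only the variable name GANETI_ROOTDIR comes from this design; the helper name `add_root_prefix`, the constant name and the validation rules are assumptions made for the example, not the actual implementation.

```python
import os

# Name of the environment variable proposed in this design.
ROOTDIR_ENVNAME = "GANETI_ROOTDIR"


def add_root_prefix(path, environ=None):
    """Prefix an absolute path with the virtual node's root directory.

    If the variable is unset or empty, the path is returned unchanged,
    which corresponds to a normal (non-virtual) installation.
    This helper is hypothetical and only illustrates the idea.
    """
    if environ is None:
        environ = os.environ
    root = environ.get(ROOTDIR_ENVNAME, "")
    if not root:
        return path
    if not os.path.isabs(root):
        raise ValueError("%s must be an absolute path" % ROOTDIR_ENVNAME)
    if not os.path.isabs(path):
        raise ValueError("Only absolute paths can be prefixed: %r" % path)
    # os.path.join discards the root when its second argument is
    # absolute, so strip the leading separator first.
    return os.path.join(root, path.lstrip(os.sep))
```

With the variable set to a per-node directory, a path such as /var/lib/ganeti would resolve inside that directory, so each virtual node daemon sees its own private tree while the code keeps referring to the usual paths.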

`rapi`¶

The RAPI daemon is not privileged and furthermore we only need one per cluster, so it presents no issues.

`confd`¶

confd has somewhat the same issues as the node daemon regarding multiple daemons per machine, but the per-address binding still works.

`ganeti-watcher`¶

Since the startup of daemons will be customised with per-IP binds, the watcher either has to be modified to not activate the daemons, or the start-stop tool has to take this into account. Due to watcher’s use of the hostname, it’s recommended that the master node is set to the machine hostname (also a requirement for the master daemon).

CLI scripts¶

As long as the master node is set to the machine hostname, these should work fine.

Cluster initialisation¶

The cluster initialisation procedure may turn out to be a bit more involved (this has not been tried yet). A script will be used to set up all necessary IP addresses and hostnames, as well as to create the initial directory structure. Building config.data manually should not be necessary.
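The directory part of such a script might be sketched as follows. This is a stand-in for illustration only: the function name is hypothetical, and the layout assumes that localstatedir is /var with Ganeti's lib, run and log directories below it, which in a real installation is decided at build time.

```python
import os
import tempfile

# Assumed layout: localstatedir is /var, with Ganeti's lib, run and log
# directories below it. The real layout is decided by the build.
STATE_SUBDIRS = ("var/lib/ganeti", "var/run/ganeti", "var/log/ganeti")


def create_node_roots(base_dir, node_names):
    """Create one root directory per virtual node.

    Returns a dict mapping each node name to its root directory.
    """
    roots = {}
    for name in node_names:
        root = os.path.join(base_dir, name)
        for subdir in STATE_SUBDIRS:
            # exist_ok makes the script safe to re-run.
            os.makedirs(os.path.join(root, subdir), exist_ok=True)
        roots[name] = root
    return roots


if __name__ == "__main__":
    base = tempfile.mkdtemp(prefix="vcluster-")
    for name, root in sorted(create_node_roots(base, ["node1", "node2"]).items()):
        print(name, root)
```

Everything here runs as an unprivileged user; only the IP and hostname setup would need elevated rights.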

Needed tools¶

With the above investigation results in mind, the only things we need are:

a tool to set up the per-virtual-node tree structure of localstatedir (with the help of ensure-dirs) and to correctly set up the extra IPs/hostnames
changes to the daemon startup tools to correctly launch the daemons per virtual node
changes to constants.py to override the localstatedir path
documentation for running such a virtual cluster
possibly small fixes to the node daemon backend functionality, to better separate privileged and non-privileged code
