Moving instances across node groups

Created: 2011-Mar-28
Status: Implemented
Ganeti-Version: 2.5.0

This design document explains the changes needed in Ganeti to perform instance moves across node groups. Familiarity with the existing Ganeti design documents, in particular the IAllocator interface specification, is advised.

Motivation and design proposal

At the moment, moving instances away from their primary or secondary nodes with the relocate and multi-evacuate IAllocator calls restricts target nodes to those on the same node group. This ensures a mobility domain is never crossed, and allows normal operation of each node group to be confined within itself.

It is desirable, however, to have a way of moving instances across node groups so that, for example, it is possible to move a set of instances to another group for policy reasons, or completely empty a given group to perform maintenance operations.

To implement this, we propose the addition of new IAllocator calls to compute inter-group instance moves and group-aware node evacuation, taking into account mobility domains as appropriate. The interface proposed below should be enough to cover the use cases mentioned above.

With the implementation of this design proposal, the previous multi-evacuate mode will be deprecated.

Detailed design

All requests honor the groups’ alloc_policy attribute (preferred, last_resort or unallocable).

Changing instances’ groups

Takes a list of instances and a list of node group UUIDs; the instances will be moved away from their current group, to any of the groups in the target list. All instances need to have their primary node in the same group, which must not be one of the target groups. If the target group list is empty, the request simply means “change group”, and the instances are placed in any group but their original one.
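
As a sketch, the request portion of the IAllocator input for this mode could look as follows (the instance names and group UUID are placeholders):

{
  "type": "change-group",
  # Instances to move; their primary nodes must all be in the same group
  "instances": ["inst1.example.com", "inst2.example.com"],
  # Candidate target groups; an empty list means "any group but the
  # current one"
  "target_groups": ["c742f7a6-6800-4bfc-9236-36a7ab6bdcb7"],
}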

Node evacuation

Evacuates instances off their current nodes. The evacuation mode can be given as primary-only, secondary-only or all, selecting whether the instances’ primary nodes, secondary nodes, or both are to be evacuated. The call is given a list of instances whose primary nodes need to be in the same node group. The returned nodes need to be in the same group as the original primary node.
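
Similarly, a sketch of the request portion for this mode (names are placeholders):

{
  "type": "node-evacuate",
  # Instances to evacuate; their primary nodes must all be in the same group
  "instances": ["inst8.example.com"],
  # One of "primary-only", "secondary-only" or "all"
  "evac_mode": "primary-only",
}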

Result

In all storage models, an inter-group move can be modeled as a sequence of replace secondary, migration and failover operations (when shared storage is used, they will all be failover or migration operations within the corresponding mobility domain).
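
For illustration, assuming a DRBD instance inst3.example.com is moved to a target group containing node5.example.com and node6.example.com (all names are placeholders), the move could decompose into the following sequence: gain a foothold in the target group by replacing the secondary, fail over to it, then pull the remaining old node out of the source group as well:

[
  # Put the secondary on a node in the target group
  { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
    "instance_name": "inst3.example.com",
    "mode": "replace_new_secondary",
    "remote_node": "node5.example.com",
  },
  # Make the target-group node the primary
  { "OP_ID": "OP_INSTANCE_FAILOVER",
    "instance_name": "inst3.example.com",
  },
  # Replace the old primary (now secondary) with a second node in the
  # target group
  { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
    "instance_name": "inst3.example.com",
    "mode": "replace_new_secondary",
    "remote_node": "node6.example.com",
  },
]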

The result of the operations described above must contain two lists of instances and a list of jobs (each of which is a list of serialized opcodes) to actually execute the operation. Job dependencies can be used to force jobs to run in a certain order while still making use of parallelism.

The two lists of instances describe which instances could be moved/migrated and which couldn’t for some reason (“unsuccessful”). The union of the instances in the two lists must be equal to the set of instances given in the original request. The successful list of instances contains elements as follows:

(instance name, target group name, [chosen node names])

The choice of names is simply for readability (for example, Ganeti could log the computed solution in the job information) and to allow manual consistency checks that the generated opcodes match the intended target groups/nodes. Note that for the node-evacuate operation the group does not change, but it should still be returned as such, since it is easier to have the same return type for both operations.

The unsuccessful list of instances contains elements as follows:

(instance name, explanation)

where explanation is a string describing why the plugin was not able to relocate the instance.
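
Putting the pieces together, a complete result could look roughly like this (the three-element envelope and all names are illustrative; only the element shapes are mandated by the definitions above):

[
  # Successful instances
  [
    ["inst1.example.com", "group2", ["node5.example.com"]],
  ],
  # Unsuccessful instances
  [
    ["inst7.example.com", "not enough memory in the target groups"],
  ],
  # Jobs, in the format of the example job list below
  [ ... ],
]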

The client is given a list of job IDs (see the design for LU-generated jobs) which it can watch. Failures should be reported to the user.

Example job list:

[
  # First job
  [
    { "OP_ID": "OP_INSTANCE_MIGRATE",
      "instance_name": "inst1.example.com",
    },
    { "OP_ID": "OP_INSTANCE_MIGRATE",
      "instance_name": "inst2.example.com",
    },
  ],
  # Second job
  [
    { "OP_ID": "OP_INSTANCE_REPLACE_DISKS",
      "depends": [
        [-1, ["success"]],
        ],
      "instance_name": "inst2.example.com",
      "mode": "replace_new_secondary",
      "remote_node": "node4.example.com",
    },
  ],
  # Third job
  [
    { "OP_ID": "OP_INSTANCE_FAILOVER",
      "depends": [
        [-2, []],
        ],
      "instance_name": "inst8.example.com",
    },
  ],
]

Accepted opcodes:

  • OP_INSTANCE_FAILOVER
  • OP_INSTANCE_MIGRATE
  • OP_INSTANCE_REPLACE_DISKS