Network Management (revised)

This is a design document detailing how to extend the existing network management and make it more flexible and able to deal with more generic use cases.

Current state and shortcomings

Currently in Ganeti, networks are tightly connected with IP pools, since creation of a network implies the existence of one subnet and the corresponding IP pool. This design does not allow common scenarios like:

  • L2 only networks
  • IPv6 only networks
  • Networks without an IP pool
  • Networks with an IPv6 pool
  • Networks with multiple IP pools (alternative to externally reserving IPs)

Additionally one cannot have multiple IP pools inside one network. Finally, from the instance perspective, a NIC cannot get more than one IP (for example, one IPv4 and one IPv6 address).

Proposed changes

In order to deal with the above shortcomings, we propose to extend the existing networks in Ganeti and support:

  1. Networks with multiple subnets
  2. Subnets with multiple IP pools
  3. NICs with multiple IPs from various subnets of a single network

These changes bring up some design and implementation issues that we discuss in the following sections.

Semantics

Quoting the initial network management design doc “an IP pool consists of two bitarrays. Specifically the reservations bitarray which holds all IP addresses reserved by Ganeti instances and the external reservations bitarray with all IPs that are excluded from the IP pool and cannot be assigned automatically by Ganeti to instances (via ip=pool)”.

Without violating those semantics, here, we clarify the following definitions.

network: A cluster level taggable configuration object with a user-provided name (e.g. network1, network2), UUID and MAC prefix.

L2: The mode and link with which we connect a network to a nodegroup. A NIC attached to a network will inherit this info, just like connecting an Ethernet cable to a physical NIC. In this sense we only have one L2 info per NIC.

L3: A CIDR and a gateway related to the network. Since a NIC can have multiple IPs on the same cable, each network can have multiple L3 info entries, with the restriction that they do not overlap with each other. The gateway is optional (just like in the current implementation); it can be omitted for private networks that do not have a default route.

subnet: A subnet is the above L3 info plus some additional information (see below).

ip: A valid IP should reside in a network’s subnet, and should not be used by more than one instance. An IP can be either obtained dynamically from a pool or requested explicitly from a subnet (or a pool).

range: Sequential IPs inside one subnet calculated either from the first IP and a size (e.g. start=192.0.2.10, size=10) or the first IP and the last IP (e.g. start=192.0.2.10, end=192.0.2.19). A single IP can also be thought of as an IP range with size=1 (see configuration changes).
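
The two ways of describing a range can be sketched as follows. This is only an illustration under assumptions: the class name IPRange matches the config object proposed later in this document, but the method names (from_size, from_bounds) and the use of Python's standard ipaddress module are hypothetical choices, not part of the design.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


@dataclass(frozen=True)
class IPRange:
    """Hypothetical range object with inclusive bounds."""
    start: Address
    end: Address

    @classmethod
    def from_size(cls, start, size):
        """Build a range from its first IP and the number of IPs."""
        if size < 1:
            raise ValueError("size must be at least 1")
        first = ipaddress.ip_address(start)
        return cls(first, first + (size - 1))

    @classmethod
    def from_bounds(cls, start, end):
        """Build a range from its first and last IP."""
        first = ipaddress.ip_address(start)
        last = ipaddress.ip_address(end)
        if last < first:
            raise ValueError("end precedes start")
        return cls(first, last)

    @property
    def size(self):
        return int(self.end) - int(self.start) + 1

    def __contains__(self, ip):
        return self.start <= ipaddress.ip_address(ip) <= self.end
```

With this sketch, IPRange.from_size("192.0.2.10", 10) and IPRange.from_bounds("192.0.2.10", "192.0.2.19") compare equal, and a single IP is simply from_size(ip, 1).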

reservations: All IPs that are used by instances in the cluster at any time.

external reservations: All IPs that are supposed to be reserved by the admin for either some external component or specific instances. If one instance requests an external IP explicitly (ip=192.0.2.100), Ganeti will allow the operation only if --force is given. Still, the admin can externally reserve an IP that is already in use by an instance, as happens now. This helps to reserve an IP for future use and at the same time prevent any possible race between the instance that releases this IP and another that tries to retrieve it.

pool: A (range, reservations, name) tuple from which instances can dynamically obtain an IP. Reservations is a bitarray with length the size of the range, and is needed so that we know which IPs are available at any time without querying all instances. The use of name is explained below. A subnet can have multiple pools.
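
A minimal sketch of such a pool is shown below. The class and method names are hypothetical, and a plain list of booleans stands in for the bitarray; the point is only that the reservations map is as long as the range and answers "which IP is free" without looking at instances.

```python
import ipaddress


class Pool:
    """Hypothetical pool: an inclusive IP range, a reservation map, a name."""

    def __init__(self, start, end, name=None):
        self.start = ipaddress.ip_address(start)
        self.end = ipaddress.ip_address(end)
        self.name = name
        # One flag per address in the range; False means "free".
        self.reservations = [False] * (int(self.end) - int(self.start) + 1)

    def _index(self, ip):
        offset = int(ipaddress.ip_address(ip)) - int(self.start)
        if not 0 <= offset < len(self.reservations):
            raise ValueError("%s is outside the pool" % ip)
        return offset

    def reserve(self, ip):
        idx = self._index(ip)
        if self.reservations[idx]:
            raise ValueError("%s is already reserved" % ip)
        self.reservations[idx] = True

    def release(self, ip):
        self.reservations[self._index(ip)] = False

    def first_free(self):
        """Return the first unreserved address, or None if the pool is full."""
        for offset, used in enumerate(self.reservations):
            if not used:
                return self.start + offset
        return None
```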

Split L2 from L3

Currently networks in Ganeti do not separate L2 from L3. This means that one cannot use L2 only networks. The reason is that the CIDR (currently passed with the --network option) and the derived IP pool are mandatory. This design makes L3 info optional. This way we can have an L2 only network just by connecting a Ganeti network to a nodegroup with the desired mode and link. Then one could add one or more subnets to the existing network.

Multiple Subnets per Network

Currently the IPv4 CIDR is mandatory for a network. Also a network can obtain at most one IPv4 CIDR and one IPv6 CIDR. These restrictions will be lifted.

This design doc introduces support for multiple subnets per network. The L3 info will be moved inside the subnet. A subnet will have a name and a uuid just like NIC and Disk config objects. Additionally it will contain the dhcp flag which is explained below, and the pools and external fields which are mentioned in the next section. Only the cidr will be mandatory.

Any subnet related actions will be done via the new --subnet option. Its syntax will be similar to --net.

The network’s subnets must not overlap with each other. Logic will validate any operations related to reserving/releasing of IPs and check whether a requested IP is included inside one of the network’s subnets. Just like currently, the L3 info will be exported to NIC configuration hooks and scripts as environment variables. The example below adds subnets to a network:

gnt-network modify --subnet add:cidr=10.0.0.0/24,gateway=10.0.0.1,dhcp=true net1
gnt-network modify --subnet add:cidr=2001::/64,gateway=2001::1,dhcp=true net1

To remove a subnet from a network one should use:

gnt-network modify --subnet some-ident:remove network1

where some-ident can be either a CIDR, a name or a UUID. Ganeti will allow this operation only if no instances use IPs from this subnet.

Since DHCP is allowed only for a single CIDR on the same cable, the subnet must have a dhcp flag. Logic must not allow more than one subnet of the same version (4 or 6) in the same network to have DHCP enabled. To modify a subnet’s name or the dhcp flag one could use:

gnt-network modify --subnet some-ident:modify,dhcp=false,name=foo network1

This would search for a registered subnet that matches the identifier, disable DHCP on it and change its name. The dhcp parameter is used only for validation purposes and does not make Ganeti start a DHCP service. It will just be exported to external scripts (ifup and hooks) and handled accordingly.
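
The DHCP rule stated above (at most one DHCP-enabled subnet per IP version inside a network) could be validated roughly as follows. The function name and the (cidr, dhcp) input shape are assumptions made for this sketch.

```python
import ipaddress


def check_dhcp(subnets):
    """Raise ValueError if two subnets of one IP version both enable DHCP.

    subnets: list of (cidr, dhcp) pairs; this shape is an assumption.
    """
    seen = {}
    for cidr, dhcp in subnets:
        if not dhcp:
            continue
        version = ipaddress.ip_network(cidr).version
        if version in seen:
            raise ValueError("DHCP already enabled on %s (IPv%d)"
                             % (seen[version], version))
        seen[version] = cidr
```

One IPv4 and one IPv6 subnet may both enable DHCP; a second DHCP-enabled IPv4 subnet is rejected.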

Changing the CIDR or the gateway of a subnet should also be supported.

gnt-network modify --subnet some-ident:modify,cidr=192.0.2.0/22 net1
gnt-network modify --subnet some-ident:modify,cidr=192.0.2.32/28 net1
gnt-network modify --subnet some-ident:modify,gateway=192.0.2.40 net1

Before expanding a subnet, logic should check for overlapping subnets. Shrinking the subnet should be allowed only if the ranges that are about to be trimmed are not included either in pool reservations or external ranges.
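
A sketch of both checks is given below. It is deliberately conservative and is an assumption of this example, not the design's rule: it rejects a new CIDR if any pool or external range of the subnet would no longer fit entirely inside it, even when the trimmed part holds no reservations. The function name and argument shapes are hypothetical.

```python
import ipaddress


def can_resize(new_cidr, other_cidrs, used_ranges):
    """Return True if a subnet may take new_cidr as its CIDR.

    other_cidrs: CIDRs of the network's other subnets.
    used_ranges: inclusive (first, last) pairs covering the pools and
                 external reservations of the subnet being resized.
    """
    new = ipaddress.ip_network(new_cidr)
    # Expanding: the new CIDR must not overlap a sibling subnet.
    for cidr in other_cidrs:
        other = ipaddress.ip_network(cidr)
        if other.version == new.version and new.overlaps(other):
            return False
    # Shrinking: every pool or external range must still fit inside.
    # Ranges are contiguous, so checking both ends is sufficient.
    for first, last in used_ranges:
        if ipaddress.ip_address(first) not in new:
            return False
        if ipaddress.ip_address(last) not in new:
            return False
    return True
```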

Multiple IP pools per Subnet

Currently IP pools are automatically created during network creation and include the whole subnet. Some IPs can be excluded from the pool by passing them explicitly with --add-reserved-ips option.

Still, for IPv6 subnets or even big IPv4 ones this might be insufficient. Keeping two bitarrays for a /64 prefix is infeasible, since each would need 2^64 bits. Even for IPv4 networks, a /20 subnet currently requires two 4K-bit bitarrays (8K bits in total), and the second 4K is practically useless since external reservations are far fewer than actual reservations.
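
The sizes involved can be checked with a few lines of Python. The helper name is hypothetical and 2001:db8::/64 is just a documentation prefix chosen for the example; the arithmetic assumes one bit per address, as in the current bitarrays.

```python
import ipaddress


def bitarray_bits(cidr):
    """Number of bits one full-subnet bitarray needs (one bit per address)."""
    return ipaddress.ip_network(cidr).num_addresses


v4_bits = bitarray_bits("10.0.0.0/20")      # 4096 bits per bitarray
v6_bits = bitarray_bits("2001:db8::/64")    # 2**64 bits per bitarray
print("two /20 bitarrays: %d bits" % (2 * v4_bits))
print("one /64 bitarray: about %d EiB" % (v6_bits // 8 // 2**60))
```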

This design extracts IP pool management from the network logic, and pools will become optional. Currently the pool is created based on the network’s CIDR. With multiple subnets per network, we should be able to create and add IP pools to a network (and eventually to the corresponding subnet). Each pool will have an optional user friendly name so that the end user can refer to it (see instance related operations).

The user will be able to obtain dynamically an IP only if we have already defined a pool for a network’s subnet. One would use ip=pool for the first available IP of the first available pool, or ip=some-pool-name for the first available IP of a specific pool.

Any pool related actions will be done via the new --pool option.

In order to add a pool a relevant subnet should pre-exist. Overlapping pools won’t be allowed. For example:

gnt-network modify --pool add:192.0.2.10-192.0.2.100,name=pool1 net1
gnt-network modify --pool add:10.0.0.7-10.0.0.20 net1
gnt-network modify --pool add:10.0.0.100 net1

will first parse and find the ranges. Then for each range, Ganeti will try to find a matching subnet meaning that a pool must be a subrange of the subnet. If found, the range with empty reservations will be appended to the list of the subnet’s pools. Moreover, logic must be added to reserve the IPs that are currently in use by instances of this network.
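
The steps described above (match the range to a subnet, reject overlaps, start with empty reservations, then mark IPs already used by instances) might look like this. The function name and the dict-based data layout are assumptions made for the sketch.

```python
import ipaddress


def add_pool(subnets, start, end, in_use, name=None):
    """Attach the pool [start, end] to the subnet that contains it.

    subnets: dict mapping a CIDR string to that subnet's list of pools.
    in_use:  IPs currently held by instances of the network.
    Returns the new pool (a dict) or raises ValueError.
    """
    first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
    if last < first:
        raise ValueError("empty range")
    for cidr, pools in subnets.items():
        net = ipaddress.ip_network(cidr)
        if first not in net or last not in net:
            continue  # the pool must be a subrange of the subnet
        for other in pools:
            if first <= other["end"] and other["start"] <= last:
                raise ValueError("overlaps pool %s" % other["name"])
        reservations = [False] * (int(last) - int(first) + 1)
        # Mark addresses that instances already use as reserved.
        for ip in map(ipaddress.ip_address, in_use):
            if ip.version == first.version and first <= ip <= last:
                reservations[int(ip) - int(first)] = True
        pool = {"start": first, "end": last, "name": name,
                "reservations": reservations}
        pools.append(pool)
        return pool
    raise ValueError("no subnet contains %s-%s" % (start, end))
```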

Adding a pool can be easier if we associate it directly with a subnet. For example one could use the following shortcuts:

gnt-network modify --subnet add:cidr=10.0.0.0/27,pool net1
gnt-network modify --pool add:subnet=some-ident net1
gnt-network modify --pool add:10.0.0.0/27 net1

During pool removal, logic should be added to split pools if ranges given overlap existing ones. For example:

gnt-network modify --pool remove:192.0.2.20-192.0.2.50 net1

will split the pool previously added (10-100) into two new ones: 10-19 and 51-100. The corresponding bitarrays will be trimmed accordingly. The name will be preserved.
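
The split can be sketched with integer offsets standing in for addresses (for instance the last octet, or int(ip) in general). The function name and the tuple layout are hypothetical. The sketch does not decide what should happen to reservations inside the removed range; that policy is left open here.

```python
def remove_range(pool, cut_start, cut_end):
    """Remove the inclusive range [cut_start, cut_end] from a pool.

    pool: (start, end, reservations, name) with integer bounds.
    Returns the 0, 1 or 2 pools that remain.
    """
    start, end, bits, name = pool
    if cut_end < start or cut_start > end:
        return [pool]  # nothing to remove
    remaining = []
    if cut_start > start:
        # Left piece keeps the leading slice of the bitarray.
        remaining.append((start, cut_start - 1,
                          bits[:cut_start - start], name))
    if cut_end < end:
        # Right piece keeps the trailing slice of the bitarray.
        remaining.append((cut_end + 1, end,
                          bits[cut_end + 1 - start:], name))
    return remaining
```

Removing 20-50 from a pool covering 10-100 yields the pieces 10-19 and 51-100, each carrying its slice of the original reservations and the original name.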

The same things apply to external reservations. Just like now, modifications will take place via the --add|remove-reserved-ips option. Logic must be added to support IP ranges.

gnt-network modify --add-reserved-ips 192.0.2.20-192.0.2.50 net1

Based on the aforementioned we propose the following changes:

  1. Change the IP pool representation in config data.

Existing reservations and external_reservations bitarrays will be removed. Instead, for each subnet we will have:

  • pools: List of (IP range, reservations bitarray) tuples.
  • external: List of IP ranges

For external ranges the reservations bitarray is not needed since this will be all 1’s.

A configuration example could be:

net1 {
  subnets [
    uuid1 {
        name: subnet1
        cidr: 192.0.2.0/24
        pools: [
          {range:Range(192.0.2.10, 192.0.2.15), reservations: 000000, name:pool1}
          ]
        reserved: [192.0.2.15]
        }
    uuid2  {
        name: subnet2
        cidr: 10.0.0.0/24
        pools: [
          {range:10.0.0.8/29, reservations: 00000000, name:pool2}
          {range:10.0.0.40-10.0.0.45, reservations: 000000, name:pool3}
          ]
        reserved: [Range(10.0.0.8, 10.0.0.15), 10.0.0.45]
        }
    ]
}

Range(start, end) will be some json representation of an IPRange(). We decide not to store external reservations as pools (and in the same list) since we get the following advantages:

  • Keep the existing semantics for pools and external reservations.
  • Each list has similar entries: one has pools the other has ranges. The pool must have a bitarray, and has an optional name. It is meaningless to add a name and a bitarray to external ranges.
  • Each list must not have overlapping ranges. Still external reservations can overlap with pools.
  • The --pool option supports the add|remove|modify commands just like --net and --disk, and operates on single entities (a restriction that is not needed for external reservations).
  • Another thing, and probably the most important, is that in order to get the first available IP, only the reserved list must be checked for conflicts. The ipaddr.summarize_address_range(first, last) could be very helpful.
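
As an illustration of the helper mentioned in the last point: the text refers to the ipaddr library, and Python's standard ipaddress module (used here as an assumption, since it is the stdlib successor) provides a function with the same name. It turns an arbitrary first-last range into the minimal list of CIDR blocks, which makes range membership and conflict checks straightforward.

```python
import ipaddress

first = ipaddress.ip_address("192.0.2.10")
last = ipaddress.ip_address("192.0.2.19")

# Collapse the inclusive range into the smallest set of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))
for block in blocks:
    print(block)
# Prints 192.0.2.10/31, 192.0.2.12/30 and 192.0.2.16/30, one per line.
```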

  2. Change the network module logic.

The above changes should be done in the network module and be transparent to the rest of the Ganeti code. If a random IP from the networks is requested, Ganeti searches for an available IP from the first pool of the first subnet. If it is full it gets to the next pool. Then to the next subnet and so on. Of course the external IP ranges will be excluded. If an IP is explicitly requested, Ganeti will try to find a matching subnet. Its pools and external will be checked for availability. All this logic will be extracted in a separate class with helper methods for easier manipulation of IP ranges and bitarrays.
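
The search order described above can be sketched as follows. The function name and the dict-based layout are assumptions; a list of booleans stands in for each pool's bitarray.

```python
import ipaddress


def first_available(subnets):
    """Return the first free, non-externally-reserved IP, or None.

    subnets: ordered list of dicts with keys "pools" and "external".
    A pool is a dict with "start" (address string) and "reservations"
    (list of bools); an external entry is an inclusive (first, last) pair.
    """
    for subnet in subnets:
        external = [(int(ipaddress.ip_address(a)),
                     int(ipaddress.ip_address(b)))
                    for a, b in subnet["external"]]
        for pool in subnet["pools"]:
            start = ipaddress.ip_address(pool["start"])
            for offset, used in enumerate(pool["reservations"]):
                if used:
                    continue
                value = int(start) + offset
                if any(lo <= value <= hi for lo, hi in external):
                    continue  # excluded by an external reservation
                return start + offset
    return None
```

Pools are tried in order within a subnet, and subnets in order within the network; an IP is skipped if it is either reserved in the pool or covered by an external range.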

Bitarray processing can be optimized too. The usage of bitarrays will be reduced since (a) we no longer have external_reservations and (b) pools will have shorter bitarrays (i.e. will not have to cover the whole subnet). Besides that, we could keep the bitarrays in memory, so that in most cases (e.g. adding/removing reservations, querying), we don’t keep converting strings to bitarrays and vice versa. Also, the Haskell code could as well keep this in memory as a bitarray, and validate it on load.

  3. Change the config module.

We should not have instances with the same IP inside the same network. We introduce the _AllIPs() helper config method, which will return all existing (IP, network) tuples. Config logic will check this list as well before passing it to TemporaryReservationManager.
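
A rough idea of such a helper is sketched below. It is not the actual config method: the function names and the dict-based instance layout are assumptions made only to show the (IP, network) uniqueness check.

```python
def all_ips(instances):
    """Collect every (ip, network) pair used by the given instances.

    instances: iterable of dicts with a "nics" list; each NIC has a
    "network" name and an "ips" list. This shape is an assumption.
    """
    return {(ip, nic["network"])
            for inst in instances
            for nic in inst["nics"]
            for ip in nic["ips"]}


def is_ip_taken(instances, ip, network):
    """True if some instance already uses `ip` inside `network`."""
    return (ip, network) in all_ips(instances)
```

The same IP may appear in two different networks, but not twice inside one network.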

  4. Change the query mechanism.

Since we have more than one subnet, the new subnets field will include a list of:

  • cidr: IPv4 or IPv6 CIDR
  • gateway: IPv4 or IPv6 address
  • dhcp: True or False
  • name: The user friendly name for the subnet

Since we want to support small pools inside big subnets, current query results are not practical as far as the map field is concerned. It should be replaced with the new pools field for each subnet, which will contain:

  • start: The first IP of the pool
  • end: The last IP of the pool
  • map: A string with ‘X’ for reserved IPs (either external or not) and with ‘.’ for all available ones inside the pool
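
The map string for a single pool could be produced along these lines. The function name is hypothetical, and integers stand in for IP addresses to keep the sketch short.

```python
def pool_map(start, reservations, external):
    """Render a pool as a string: 'X' for reserved IPs, '.' for free ones.

    start:        integer value of the pool's first IP
    reservations: list of bools, one per IP in the pool
    external:     inclusive (first, last) integer pairs
    """
    chars = []
    for offset, used in enumerate(reservations):
        value = start + offset
        blocked = used or any(lo <= value <= hi for lo, hi in external)
        chars.append("X" if blocked else ".")
    return "".join(chars)
```

For a six-address pool starting at 10 with the second address reserved by an instance and address 15 reserved externally, the result is ".X...X".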

Multiple IPs per NIC

Currently IP is a simple string inside the NIC object and there is a one-to-one mapping between the ip and the network slots. The whole logic behind this is that a NIC belongs to a network (cable) and inherits its mode and link. This rationale will not change.

Since this design adds support for multiple subnets per network, a NIC must be able to obtain multiple IPs from various subnets of the same network. Thus we change the ip slot into a list.

We introduce a new ipX attribute. For backwards compatibility ip will denote ip0. During instance related operations one could use something like:

gnt-instance add --net 0:ip0=192.0.2.4,ip1=pool,ip2=some-pool-name,network=network1 inst1
gnt-instance add --net 0:ip=pool,network=network1 inst1

This will be parsed, converted to a proper list (e.g. ip = [192.0.2.4, “pool”, “some-pool-name”]) and finally passed to the corresponding opcode. Based on the previous example, the first IP will match subnet1, the second IP will be retrieved from the first available pool of the first available subnet, and the third from the pool named some-pool-name.
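
The conversion of ip/ipX parameters into a list might be sketched as follows. The function name is hypothetical, and rejecting gaps in the numbering is an assumption of this example; the design does not say how gaps should be handled.

```python
import re


def collect_ips(params):
    """Turn ip/ipX keys of a parsed --net option into an ordered list.

    params: dict such as {"ip0": "192.0.2.4", "ip1": "pool"}; a bare
    "ip" key is treated as "ip0". Gaps in the numbering are rejected.
    """
    by_index = {}
    for key, value in params.items():
        match = re.fullmatch(r"ip(\d*)", key)
        if not match:
            continue  # not an IP parameter (e.g. "network")
        index = int(match.group(1) or 0)
        if index in by_index:
            raise ValueError("IP %d given twice" % index)
        by_index[index] = value
    if sorted(by_index) != list(range(len(by_index))):
        raise ValueError("IP indices must be contiguous from 0")
    return [by_index[i] for i in range(len(by_index))]
```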

During instance modification, the ip option will refer to the first IP of the NIC, whereas the ipX will refer to the X’th IP. As with NICs we start counting from 0 so ip1 will refer to the second IP. For example one should pass:

--net 0:modify,ip1=1.2.3.10

to change the second IP of the first NIC to 1.2.3.10,

--net -1:add,ip0=pool,ip1=1.2.3.4,network=test

to add a new NIC with two IPs, and

--net 1:modify,ip1=none

to remove the second IP of the second NIC.

Configuration changes

IPRange config object:
Introduce a new config object that will hold the ranges needed by pools and reservations. It will be either a tuple of (start, size, end) or a simple string. The end is redundant and can be derived from start and size at runtime, but will appear in the representation for readability and debugging reasons.
Pool config object:
Introduce new config object to represent a single subnet’s pool. It will have the range, reservations, name slots. The range slot will be an IPRange config object, the reservations a bitarray and the name a simple string.
Subnet config object:
Introduce new config object with slots: name, uuid, cidr, gateway, dhcp, pools, external. Pools is a list of Pool config objects. External is a list of IPRange config objects. All ranges must reside inside the subnet’s CIDR. Only cidr will be mandatory. The dhcp attribute will be False by default.
Network config objects:
The L3 and the IP pool representation will change. Specifically all slots besides name, mac_prefix, and tag will be removed. Instead the slot subnets with a list of Subnet config objects will be added.
NIC config objects:
NIC’s network slot will be removed and the ip slot will be modified to a list of strings.
KVM runtime files:
Any change done in config data must be done also in KVM runtime files. For this purpose the existing _UpgradeSerializedRuntime() can be used.

Exported variables

The variables exported during instance related operations will follow the same convention that Linux uses for interface aliases. Specifically:

IP:i for the ith IP.

NETWORK_*:i for the ith subnet. * is SUBNET, GATEWAY, DHCP.

In case of hooks those variables will be prefixed with INSTANCE_NICn for the nth NIC.
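
The naming scheme can be illustrated with a small sketch. Several details are assumptions and not part of the design: the function name, the underscore joining the hook prefix to the variable name, and the string format of the values. The sketch also ignores the backwards compatible form used for a single IP per NIC.

```python
def nic_env(nic_index, ips, subnets, for_hooks=False):
    """Build the environment of one NIC using the IP:i / NETWORK_*:i names.

    subnets: list of dicts with "cidr", "gateway" and "dhcp" keys.
    The underscore after the INSTANCE_NICn prefix is an assumption.
    """
    prefix = "INSTANCE_NIC%d_" % nic_index if for_hooks else ""
    env = {}
    for i, ip in enumerate(ips):
        env["%sIP:%d" % (prefix, i)] = ip
    for i, subnet in enumerate(subnets):
        env["%sNETWORK_SUBNET:%d" % (prefix, i)] = subnet["cidr"]
        env["%sNETWORK_GATEWAY:%d" % (prefix, i)] = subnet.get("gateway") or ""
        env["%sNETWORK_DHCP:%d" % (prefix, i)] = str(bool(subnet.get("dhcp")))
    return env
```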

Backwards Compatibility

The existing networks representation will be internally modified. They will obtain one subnet, and one pool with range the whole subnet.

During gnt-network add, if the deprecated --network option is passed, Ganeti will still create a network with one subnet and one IP pool with the size of the subnet. Otherwise the --subnet and --pool options will be needed.

The query mechanism will also include the deprecated map field. For the newly created network this will contain only the mapping of the first pool. The deprecated network, gateway, network6, gateway6 fields will point to the first IPv4 and IPv6 subnet respectively.

During instance related operation the ip argument of the --net option will refer to the first IP of the NIC.

Hooks and scripts will still have the same environment exported in case of single IP per NIC.

This design allows more fine-grained configurations which in turn yields more flexibility and a wider coverage of use cases. Still basic cases (the ones that are currently available) should be easy to set up. Documentation will be enriched with examples for both typical and advanced use cases of gnt-network.