MacVTap support

This is a design document detailing the implementation of MacVTap support in Ganeti. The initial implementation targets the KVM hypervisor, but it is intended to be ported to the XEN hypervisor as well.

Current state and shortcomings

Currently, Ganeti provides a number of options for networking a virtual machine, that are the bridged, routed, and openvswitch modes. MacVTap, is another virtual network interface in Linux, that is not supported by Ganeti and that could be added to the currently supported solutions. It is an interface that acts as a regular TUN/TAP device, and thus it is transparently supported by QEMU. Because of its design, it can greatly simplify Ganeti setups using bridged instances.

In brief, the MacVTap interface is based on the MacVLan Linux driver, which basically allows a single physical interface to be associated with multiple IPs and MAC addresses. It is meant to replace the combination of the TUN/TAP and bridge drivers with a more lightweight setup that doesn’t require any extra configuration on the host. MacVTap driver is supposed to be more efficient than using a regular bridge. Unlike bridges, it doesn’t need to do STP or to discover/learn MAC addresses of other connected devices on a given domain, as it it knows every MAC address it can receive. In fact, it introduces a bridge-like behavior for virtual machines but without the need to have a real bridge setup on the host. Instead, each virtual interface extends an existing network device by attaching directly to it, having its own MAC address, and providing a separate virtual interface to be used by the userspace processes. The MacVTap MAC address is used on the external network and the guest OS cannot spoof or change that address.

Background

This section provides some extra information for the MacVTap interface, that we took into account for the rest of this design document.

MacVTap modes of operation

A MacVTap device can operate in one of four modes, just like the MacVLan driver does. These modes determine how the tap endpoints communicate between each other providing various levels of isolation between them. Those modes are the following:

  • VEPA (Virtual Ethernet Port Aggregator) mode: The default mode that is compatible with virtualization-enabled switches. The communication between endpoints on the same lower device, happens through the external switch.
  • Bridge mode: It works almost like a traditional bridge, connecting all endpoints directly to each other.
  • Private mode: An endpoint in this mode can never communicate to any other endpoint on the same lower device.
  • Passthru mode: This mode was added later to work on some limitations on MacVLans (more details here).

MacVTap internals

The creation of a MacVTap device is not done by opening the /dev/net/tun device and issuing a corresponding ioctl() to register a network device as happens in tap devices. Instead, there are two ways to create a MacVTap device. The first one is using the rtnetlink(7) interface directly, just like the libvirt or the iproute2 utilities do, and the second one is to use the high-level ip-link command. Since creating a MacVTap interface programmatically using the netlink protocol is a bit more complicated than creating a normal TUN/TAP device, we propose using the ip-link tool for the MacVTap handling, which it is much simpler and straightforward in use, and also fulfills all our needs. Additionally, since Ganeti already depends on iproute2 being installed in the system, this does not introduces an extra dependency.

The following example, creates a MacVTap device using the ip-link tool, named macvtap0, operating in bridge mode, and which is using eth0 as its lower device:

ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge

Once a MacVTap interface is created, an actual character device appears under /dev, called /dev/tapXX, where XX is the interface index of the device.

Proposed changes

In order to be able to create instances using the MacVTap device driver, we propose some modifications that affect the nicparams slot of the Ganeti’s configuration NIC object, and also the code part regarding to the KVM hypervisor, as detailed in the following sections.

Configuration changes

The nicparams mode attribute will be extended to support the macvtap mode. When using the MacVTap mode, the link attribute will specify the network device where the MacVTap interfaces will be attached to, the lower device. Note that the lower device should exists, otherwise the operation will fail. If no link is specified, the cluster-wide default NIC link param will be used instead.

We propose the MacVTap mode to be configurable, and so the nicparams object will be extended with an extra slot named mvtap_mode. This parameter will only be used if the network mode is set to MacVTap since it does not make sense in other modes, similarly to the vlan slot of the openvswitch mode.

Below there is a snippet of some of the gnt-network commands’ output:

Network connection

gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group

Network listing

gnt-network list

Network  Subnet           Gateway       MacPrefix GroupList
br-net   10.48.1.0/24     10.48.1.254   -         default (bridged, br0, , )
vtap-net 192.168.100.0/24 192.168.100.1 -         vtap_group (macvtap, eth0, , bridge)

Network information

gnt-network info

Network name: vtap-net
UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf
Serial number: 1
Subnet: 192.168.100.0/28
Gateway: 192.168.100.1
IPv6 Subnet: 2001:db8:2ffc::/64
IPv6 Gateway: 2001:db8:2ffc::1
Mac Prefix: None
size: 16
free: 10 (62.50%)
usage map:
      0 XXXXX..........X                                   63
       (X) used    (.) free
externally reserved IPs:
  192.168.100.0, 192.168.100.1, 192.168.100.15
connected to node groups:
  vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge)
used by 2 instances:
  inst1.example.com: 0:192.168.100.2
  inst2.example.com: 0:192.168.100.3

Hypervisor changes

A new method will be introduced in the KVM’s netdev.py module, named OpenVTap, similar to the OpenTap method, that will be responsible for creating a MacVTap device using the ip-link command, and returning its file descriptor. The OpenVtap method will receive as arguments the network’s link, the mode of the MacVTap device (mvtap_mode), and also the interface name of the device to be created, otherwise we will not be able to retrieve it, and so opening the created device.

Since we want the names among the MacVTap devices to be unique on the same node, we will make use of the existing _GenerateKvmTapName method to generate device names but with some modifications, to be adapted to our needs. This method is actually a wrapper over the GenerateTapName method which currently is being used to generate TAP interface names for NICs meant to be used in instance communication using the gnt.com prefix. We propose extending this method to generate names for the MacVTap interface too, using the vtap prefix. To do so, we could add an extra boolean argument in that method, named inst_comm, to differentiate the two cases, so that the method will return the appropriate name depending on its usage. This argument will be optional and defaulted to True, to not affect the existing API.

Currently, the OpenTap method handles the vhost-net, mq, and the vnet_hdr features. The vhost-net feature will be normally supported for the MacVTap devices too, and so is the multiqueue feature, which can be enabled using the numrxqueues and numtxqueues parameters of the ip-link command. The only drawback seems to be the vnet_hdr feature modification. For a MacVTap device this flag is enabled by default, and it can not be disabled if a user requests to.

A last hypervisor change will be the introduction of a new method named _RemoveStaleMacvtapDevs that will remove any remaining MacVTap devices, and which is detailed in the following section.

Tools changes

Some of the Ganeti tools should also be extended to support MacVTap devices. Those are the kvm-ifup and net-common scripts. These modifications will include a new method named setup_macvtap that will simply change the device status to UP just before and instance is started:

ip link set $INTERFACE up

As mentioned in the Background section, MacVTap devices are persistent. So, we have to manually delete the MacVTap device after an instance shutdown. To do so, we propose creating a kvm-ifdown script, that will be invoked after an instance shutdown in order to remove the relevant MacVTap devices. The kvm-ifdown script should explicitly call the following commands and currently will be functional for MacVTap NICs only:

ip link set $INTERFACE down
ip link delete $INTERFACE

To be able to call the kvm-ifdown script we should extend the KVM’s _ConfigureNIC method with an extra argument that is the name of the script to be invoked, instead of calling by default the kvm-ifup script, as it currently happens.

The invocation of the kvm-ifdown script will be made through a separate method that we will create, named _RemoveStaleMacvtapDevs. This method will read the NIC runtime files of an instance and will remove any devices using the MacVTap interface. This method will be included in the CleanupInstance method in order to cover all the cases where an instance using MacVTap NICs needs to be cleaned up.

Besides the instance shutdown, there are a couple of cases where the MacVTap NICs will need to be cleaned up too. In case of an internal instance shutdown, where the kvmd is not enabled, the instance will be in ERROR_DOWN state. In that case, when the instance is started either by the ganeti-watcher or by the admin, the CleanupInstance method, and consequently the kvm-ifdown script, will not be called and so the MacVTap NICs will have to manually be deleted. Otherwise starting the instance will result in more than one MacVTap devices using the same MAC address. An instance migration is another case where deleting an instance will keep stale MacVTap devices on the source node. In order to solve those potential issues, we will explicitly call the _RemoveStaleMacvtapDevs method after a successful instance migration on the source node, and also before creating a new device for a NIC that is using the MacVTap interface to remove any stale devices.