=============== MacVTap support =============== .. contents:: :depth: 3 This is a design document detailing the implementation of `MacVTap` support in Ganeti. The initial implementation targets the KVM hypervisor, but it is intended to be ported to the XEN hypervisor as well. Current state and shortcomings ============================== Currently, Ganeti provides a number of options for networking a virtual machine, that are the ``bridged``, ``routed``, and ``openvswitch`` modes. ``MacVTap``, is another virtual network interface in Linux, that is not supported by Ganeti and that could be added to the currently supported solutions. It is an interface that acts as a regular TUN/TAP device, and thus it is transparently supported by QEMU. Because of its design, it can greatly simplify Ganeti setups using bridged instances. In brief, the MacVTap interface is based on the ``MacVLan`` Linux driver, which basically allows a single physical interface to be associated with multiple IPs and MAC addresses. It is meant to replace the combination of the TUN/TAP and bridge drivers with a more lightweight setup that doesn't require any extra configuration on the host. MacVTap driver is supposed to be more efficient than using a regular bridge. Unlike bridges, it doesn't need to do STP or to discover/learn MAC addresses of other connected devices on a given domain, as it it knows every MAC address it can receive. In fact, it introduces a bridge-like behavior for virtual machines but without the need to have a real bridge setup on the host. Instead, each virtual interface extends an existing network device by attaching directly to it, having its own MAC address, and providing a separate virtual interface to be used by the userspace processes. The MacVTap MAC address is used on the external network and the guest OS cannot spoof or change that address. Background ========== This section provides some extra information for the MacVTap interface, that we took into account for the rest of this design document. MacVTap modes of operation -------------------------- A MacVTap device can operate in one of four modes, just like the MacVLan driver does. These modes determine how the tap endpoints communicate between each other providing various levels of isolation between them. Those modes are the following: * `VEPA (Virtual Ethernet Port Aggregator) mode`: The default mode that is compatible with virtualization-enabled switches. The communication between endpoints on the same lower device, happens through the external switch. * `Bridge mode`: It works almost like a traditional bridge, connecting all endpoints directly to each other. * `Private mode`: An endpoint in this mode can never communicate to any other endpoint on the same lower device. * `Passthru mode`: This mode was added later to work on some limitations on MacVLans (more details here_). MacVTap internals ----------------- The creation of a MacVTap device is *not* done by opening the `/dev/net/tun` device and issuing a corresponding `ioctl()` to register a network device as happens in tap devices. Instead, there are two ways to create a MacVTap device. The first one is using the `rtnetlink(7)` interface directly, just like the `libvirt` or the `iproute2` utilities do, and the second one is to use the high-level `ip-link` command. Since creating a MacVTap interface programmatically using the netlink protocol is a bit more complicated than creating a normal TUN/TAP device, we propose using the ip-link tool for the MacVTap handling, which it is much simpler and straightforward in use, and also fulfills all our needs. Additionally, since Ganeti already depends on `iproute2` being installed in the system, this does not introduces an extra dependency. The following example, creates a MacVTap device using the `ip-link` tool, named `macvtap0`, operating in `bridge` mode, and which is using `eth0` as its lower device: :: ip link add link eth0 name macvtap0 address 1a:36:1b:aa:b3:77 type macvtap mode bridge Once a MacVTap interface is created, an actual character device appears under `/dev`, called ``/dev/tapXX``, where ``XX`` is the interface index of the device. Proposed changes ================ In order to be able to create instances using the MacVTap device driver, we propose some modifications that affect the ``nicparams`` slot of the Ganeti's configuration ``NIC`` object, and also the code part regarding to the KVM hypervisor, as detailed in the following sections. Configuration changes --------------------- The nicparams ``mode`` attribute will be extended to support the ``macvtap`` mode. When using the MacVTap mode, the ``link`` attribute will specify the network device where the MacVTap interfaces will be attached to, the *lower device*. Note that the lower device should exists, otherwise the operation will fail. If no link is specified, the cluster-wide default NIC `link` param will be used instead. We propose the MacVTap mode to be configurable, and so the nicparams object will be extended with an extra slot named ``mvtap_mode``. This parameter will only be used if the network mode is set to MacVTap since it does not make sense in other modes, similarly to the `vlan` slot of the `openvswitch` mode. Below there is a snippet of some of the ``gnt-network`` commands' output: Network connection ~~~~~~~~~~~~~~~~~~ :: gnt-network connect -N mode=macvtap,link=eth0,mvtap_mode=bridge vtap-net vtap_group Network listing ~~~~~~~~~~~~~~~ :: gnt-network list Network Subnet Gateway MacPrefix GroupList br-net 10.48.1.0/24 10.48.1.254 - default (bridged, br0, , ) vtap-net 192.168.100.0/24 192.168.100.1 - vtap_group (macvtap, eth0, , bridge) Network information ~~~~~~~~~~~~~~~~~~~ :: gnt-network info Network name: vtap-net UUID: 4f139b48-3f08-46b1-911f-d37de7e12dcf Serial number: 1 Subnet: 192.168.100.0/28 Gateway: 192.168.100.1 IPv6 Subnet: 2001:db8:2ffc::/64 IPv6 Gateway: 2001:db8:2ffc::1 Mac Prefix: None size: 16 free: 10 (62.50%) usage map: 0 XXXXX..........X 63 (X) used (.) free externally reserved IPs: 192.168.100.0, 192.168.100.1, 192.168.100.15 connected to node groups: vtap_group (mode:macvtap link:eth0 vlan: mvtap_mode:bridge) used by 2 instances: inst1.example.com: 0:192.168.100.2 inst2.example.com: 0:192.168.100.3 Hypervisor changes ------------------ A new method will be introduced in the KVM's `netdev.py` module, named ``OpenVTap``, similar to the ``OpenTap`` method, that will be responsible for creating a MacVTap device using the `ip-link` command, and returning its file descriptor. The ``OpenVtap`` method will receive as arguments the network's `link`, the mode of the MacVTap device (``mvtap_mode``), and also the ``interface name`` of the device to be created, otherwise we will not be able to retrieve it, and so opening the created device. Since we want the names among the MacVTap devices to be unique on the same node, we will make use of the existing ``_GenerateKvmTapName`` method to generate device names but with some modifications, to be adapted to our needs. This method is actually a wrapper over the ``GenerateTapName`` method which currently is being used to generate TAP interface names for NICs meant to be used in instance communication using the ``gnt.com`` prefix. We propose extending this method to generate names for the MacVTap interface too, using the ``vtap`` prefix. To do so, we could add an extra boolean argument in that method, named `inst_comm`, to differentiate the two cases, so that the method will return the appropriate name depending on its usage. This argument will be optional and defaulted to `True`, to not affect the existing API. Currently, the `OpenTap` method handles the `vhost-net`, `mq`, and the `vnet_hdr` features. The `vhost-net` feature will be normally supported for the MacVTap devices too, and so is the `multiqueue` feature, which can be enabled using the `numrxqueues` and `numtxqueues` parameters of the `ip-link` command. The only drawback seems to be the `vnet_hdr` feature modification. For a MacVTap device this flag is enabled by default, and it can not be disabled if a user requests to. A last hypervisor change will be the introduction of a new method named ``_RemoveStaleMacvtapDevs`` that will remove any remaining MacVTap devices, and which is detailed in the following section. Tools changes ------------- Some of the Ganeti tools should also be extended to support MacVTap devices. Those are the ``kvm-ifup`` and ``net-common`` scripts. These modifications will include a new method named ``setup_macvtap`` that will simply change the device status to `UP` just before and instance is started: :: ip link set $INTERFACE up As mentioned in the `Background` section, MacVTap devices are persistent. So, we have to manually delete the MacVTap device after an instance shutdown. To do so, we propose creating a ``kvm-ifdown`` script, that will be invoked after an instance shutdown in order to remove the relevant MacVTap devices. The ``kvm-ifdown`` script should explicitly call the following commands and currently will be functional for MacVTap NICs only: :: ip link set $INTERFACE down ip link delete $INTERFACE To be able to call the `kvm-ifdown` script we should extend the KVM's ``_ConfigureNIC`` method with an extra argument that is the name of the script to be invoked, instead of calling by default the `kvm-ifup` script, as it currently happens. The invocation of the `kvm-ifdown` script will be made through a separate method that we will create, named ``_RemoveStaleMacvtapDevs``. This method will read the NIC runtime files of an instance and will remove any devices using the MacVTap interface. This method will be included in the ``CleanupInstance`` method in order to cover all the cases where an instance using MacVTap NICs needs to be cleaned up. Besides the instance shutdown, there are a couple of cases where the MacVTap NICs will need to be cleaned up too. In case of an internal instance shutdown, where the ``kvmd`` is not enabled, the instance will be in ``ERROR_DOWN`` state. In that case, when the instance is started either by the `ganeti-watcher` or by the admin, the ``CleanupInstance`` method, and consequently the `kvm-ifdown` script, will not be called and so the MacVTap NICs will have to manually be deleted. Otherwise starting the instance will result in more than one MacVTap devices using the same MAC address. An instance migration is another case where deleting an instance will keep stale MacVTap devices on the source node. In order to solve those potential issues, we will explicitly call the ``_RemoveStaleMacvtapDevs`` method after a successful instance migration on the source node, and also before creating a new device for a NIC that is using the MacVTap interface to remove any stale devices. .. _here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61824/) .. vim: set textwidth=72 : .. Local Variables: .. mode: rst .. fill-column: 72 .. End: