This is a design document about adding support for an ifdown script responsible for deconfiguring network devices and cleaning up changes made by the ifup script. The first implementation will target KVM, but it could be ported to Xen as well, especially when hotplug gets implemented.
Currently, before instance startup, instance migration and NIC hotplug, KVM creates a tap and explicitly invokes the kvm-ifup script with the relevant environment (INTERFACE, MAC, IP, MODE, LINK, TAGS, and all the network info, if any: NETWORK_SUBNET, NETWORK_TAGS, etc.).
For Xen we have the vif-ganeti script (associated with the vif-script hypervisor parameter). The main difference is that Xen calls it by itself, since the script is passed as an extra option in the configuration file.
This ifup script can do several things: bridge a tap to a bridge, add IP rules, update an external DNS or DHCP server, enable proxy ARP or proxy NDP, issue openvswitch commands, etc. In general we can divide those actions into two categories:
Currently those changes do not get cleaned up or modified upon instance shutdown, removal, migration, or NIC hot-unplug. Thus we have stale entries on hosts and, more importantly, might have stale or invalid configuration on external components such as routers, which could affect connectivity.
A workaround could be hooks but:
1) During migration hooks, the environment is the one held in config data and not in runtime files. The NIC configuration might have changed on the master but not on the running KVM process (unless hotplug is used). In addition, the NIC order in config data might not match the one in the KVM process.
2) On instance modification, the changes are not available to hooks. In other words, we do not know the configuration before and after the modification.
Since Ganeti is the orchestrator and is the one that explicitly configures host devices (tap, vif), it should be the one responsible for cleanup and deconfiguration. Especially in an SDN approach, this kind of script might be useful to clean up flows in the cluster in order to ensure correct paths without ping-pongs between hosts or connectivity loss for the instance.
We add a new script, kvm-ifdown, that is explicitly invoked after:
If an administrator’s custom ifdown script exists (e.g. kvm-ifdown-custom), the kvm-ifdown script executes that script, as happens with kvm-ifup.
Along with that change, we should rename the custom ifup script from kvm-vif-bridge (which does not make any sense) to kvm-ifup-custom.
In contrast to kvm-ifup, one cannot rely on the kvm-ifdown script being called. A node might die just after a successful migration or after an instance shutdown. In that case, none of the “undo” operations will be invoked. Thus, this script should work “on a best effort basis” and the network should not rely on the script being called or being successful. Additionally, it should modify only the node-local dynamic configuration (routes, ARP entries, SDN, firewalls, etc.), whereas static configuration (DNS, DHCP, etc.) should be modified via hooks.
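The best-effort requirement can be illustrated with a minimal Python sketch of a caller that never lets an ifdown failure propagate. The function name `run_ifdown_best_effort` and the 30-second timeout are assumptions made for illustration; they are not taken from the Ganeti code base.

```python
import logging
import subprocess
from typing import Mapping, Optional, Sequence


def run_ifdown_best_effort(argv: Sequence[str],
                           env: Optional[Mapping[str, str]] = None,
                           timeout: float = 30.0) -> bool:
    """Run an ifdown command, logging failures instead of raising them.

    Hypothetical helper: returns True if the command exited with status 0
    and False on any failure (missing script, timeout, non-zero exit).
    """
    try:
        result = subprocess.run(
            list(argv),
            env=dict(env) if env is not None else None,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.SubprocessError) as err:
        # A missing or non-executable script, or a timeout, ends up here.
        logging.warning("ifdown could not be run: %s", err)
        return False
    if result.returncode != 0:
        logging.warning("ifdown exited with status %d", result.returncode)
        return False
    return True
```

A caller would log the returned value at most; the surrounding operation (shutdown, migration, hot-remove) would proceed either way.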
We cannot count on config data, since it might have changed. The only place where valid data is kept is the runtime file. During instance modifications (NIC hot-remove, hot-modify) we have the NIC object from the RPC. We take its UUID and search for the corresponding entry in the runtime file to get further info. After instance shutdown and migration, we just take all NICs from the runtime file and invoke the ifdown script for each one.
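The selection logic described above could look roughly like the following sketch. It assumes, for illustration only, that the NICs read from the runtime file are available as a list of dictionaries with a `"uuid"` key; the actual runtime file format and the function names are not specified here.

```python
from typing import Any, Dict, List, Optional

RuntimeNic = Dict[str, Any]  # hypothetical in-memory form of a runtime NIC


def find_runtime_nic(runtime_nics: List[RuntimeNic],
                     uuid: str) -> Optional[RuntimeNic]:
    """Return the runtime NIC entry with the given UUID, or None."""
    for nic in runtime_nics:
        if nic.get("uuid") == uuid:
            return nic
    return None


def nics_to_deconfigure(runtime_nics: List[RuntimeNic],
                        rpc_nic_uuid: Optional[str] = None) -> List[RuntimeNic]:
    """Pick the runtime NICs for which ifdown should be invoked.

    With a UUID (hot-remove, hot-modify) only the matching runtime entry
    is returned; without one (shutdown, migration) all NICs are returned.
    """
    if rpc_nic_uuid is None:
        return list(runtime_nics)
    nic = find_runtime_nic(runtime_nics, rpc_nic_uuid)
    return [nic] if nic is not None else []
```

Note that the lookup never consults the NIC index, which is exactly the field that cannot be trusted after earlier modifications.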
Currently TAP names are kept under /var/run/ganeti/kvm-hypervisor/nics/<instance>/<nic_index>. This is not enough. As noted above, a NIC’s index might change during the instance’s life. An example will make things clear:
The index that arrives with the RPC will be 1, and if we read the relevant NIC file we will get the tap of the NIC that was removed in the second step but still exists in the KVM process.
So upon TAP creation we write another file with the same info but named after the NIC’s UUID. The one named after its index can be left for compatibility (Ganeti does not use it; external tools might). Obviously this info will not be available for old instances in the cluster. The ifdown script should be aware of this corner case.
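A minimal sketch of this bookkeeping follows. It assumes that the UUID-named file lives in the same per-instance directory as the index-named one, which is not stated explicitly above; the function names are hypothetical, and the base directory is a parameter so the sketch can be tried outside /var/run.

```python
import os
from typing import Optional


def write_tap_files(nics_dir: str, instance: str, nic_index: int,
                    nic_uuid: str, tap: str) -> None:
    """Record the TAP name under both the NIC index and the NIC UUID."""
    inst_dir = os.path.join(nics_dir, instance)
    os.makedirs(inst_dir, exist_ok=True)
    # The index-named file is kept only for compatibility with external tools.
    for name in (str(nic_index), nic_uuid):
        with open(os.path.join(inst_dir, name), "w") as tap_file:
            tap_file.write(tap)


def read_tap_by_uuid(nics_dir: str, instance: str,
                     nic_uuid: str) -> Optional[str]:
    """Return the TAP name stored for a NIC UUID, or None if it is missing.

    None covers the corner case of old instances that were started before
    UUID-named files existed.
    """
    try:
        with open(os.path.join(nics_dir, instance, nic_uuid)) as tap_file:
            return tap_file.read().strip()
    except FileNotFoundError:
        return None
```

Returning None rather than raising lets the caller skip the ifdown invocation for such old instances instead of failing the whole operation.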
Upon NIC hot-remove we obviously want to wipe everything. But on instance migration we don’t want to reset external configuration like DNS. So we choose to pass an extra positional argument to the ifdown script (it already receives the TAP name) that reflects the context in which it was invoked. Please note that deconfiguration of external components is not encouraged and should be done via hooks. Still, we could easily support it via this extra argument.
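As an illustration, the command line could be assembled as in the sketch below. The context strings (`hot-remove`, `migrate`, `shutdown`) and the function name are hypothetical; the design does not fix the actual values, only that the context follows the TAP name as a positional argument.

```python
from typing import List

# Hypothetical context names; the real values are not defined by this design.
CONTEXT_HOT_REMOVE = "hot-remove"
CONTEXT_MIGRATE = "migrate"
CONTEXT_SHUTDOWN = "shutdown"
KNOWN_CONTEXTS = frozenset(
    [CONTEXT_HOT_REMOVE, CONTEXT_MIGRATE, CONTEXT_SHUTDOWN])


def build_ifdown_argv(script: str, tap: str, context: str) -> List[str]:
    """Build the ifdown command line: script, TAP name, then the context."""
    if context not in KNOWN_CONTEXTS:
        raise ValueError("unknown ifdown context: %r" % (context,))
    return [script, tap, context]
```

The script itself can then branch on its second positional argument, for example to skip resetting external configuration when invoked in a migration context.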
In general, the environment is the same as the one passed to the ifup script, except for the instance’s tags. Those are the only info not kept in the runtime file, and they can change between ifup and ifdown script execution. The ifdown script must be aware of this and should clean up everything that the ifup script might have set up depending on instance tags (e.g. firewalls, etc.).
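A sketch of how such an environment could be derived from a runtime NIC entry is shown below. The variable names come from the ifup environment listed earlier; the dictionary keys (`mac`, `ip`, `mode`, `link`), the function name, and the assumption that INTERFACE carries the TAP name are illustrative only. The network variables (NETWORK_SUBNET, etc.) are left out for brevity.

```python
from typing import Any, Dict, Mapping

# Environment variable -> hypothetical key in the runtime NIC entry.
_NIC_ENV_FIELDS = {
    "MAC": "mac",
    "IP": "ip",
    "MODE": "mode",
    "LINK": "link",
}


def build_ifdown_env(runtime_nic: Mapping[str, Any], tap: str) -> Dict[str, str]:
    """Build the ifdown environment from runtime data only.

    TAGS is deliberately never exported: instance tags are not stored in
    the runtime file and may have changed since ifup ran.
    """
    env = {"INTERFACE": tap}  # assumption: INTERFACE holds the TAP name
    for var, key in _NIC_ENV_FIELDS.items():
        value = runtime_nic.get(key)
        if value is not None:
            env[var] = str(value)
    return env
```

Because TAGS is absent, an ifdown script has to undo every tag-dependent action unconditionally rather than checking which tags are currently set.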