This is a design document about implementing support for huge pages in Ganeti. (Please note that Ganeti works with Transparent Huge Pages, i.e. THP; any reference in this document to huge pages refers to explicit huge pages.)
The Linux kernel allows using pages of a larger size by setting aside a portion of memory. A larger page size may improve the performance of applications that use a lot of memory by reducing translation lookaside buffer (TLB) misses. To use huge pages, memory has to be reserved beforehand; this portion of memory is subtracted from free memory and is considered in use. Currently Ganeti cannot take proper advantage of huge pages. If huge pages are reserved on a node and are available to fulfill a VM request, Ganeti does not recognize them and instead counts the memory reserved for huge pages as used memory. As a result, launching a VM fails on a node where memory is available in the form of huge pages rather than normal pages.
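To make the problem concrete, the following minimal sketch parses text in the format of the Linux /proc/meminfo file and compares a check based only on MemFree with one that also counts unused huge pages. The sample values are invented, and the helpers parse_meminfo, free_kib and free_hugepage_kib are hypothetical, not part of Ganeti.

```python
# Minimal sketch: why reserved huge pages look like "used" memory.
# The meminfo sample values below are invented for illustration.

SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:         2048000 kB
HugePages_Total:    6144
HugePages_Free:     6144
Hugepagesize:       2048 kB
"""


def parse_meminfo(text):
    """Map each /proc/meminfo field to its integer value.

    Sized fields are in kB; HugePages_* fields are page counts.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def free_kib(info):
    """Free memory as seen by a check that ignores huge pages."""
    return info["MemFree"]


def free_hugepage_kib(info):
    """Memory still available in the form of unused huge pages."""
    return info["HugePages_Free"] * info["Hugepagesize"]


info = parse_meminfo(SAMPLE_MEMINFO)
request_kib = 4 * 1024 * 1024  # a VM asking for 4 GiB

print(free_kib(info) >= request_kib)           # False: under 2 GiB of normal pages free
print(free_hugepage_kib(info) >= request_kib)  # True: 12 GiB of huge pages unused
```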
The following components will be changed so that Ganeti can take advantage of huge pages.
Currently, it is possible to set or modify the huge pages mount point at the cluster level via the hypervisor parameter mem_path:
$ gnt-cluster init \
> --enabled-hypervisors=kvm --nic-parameters link=br100 \
> -H kvm:mem_path=/mount/point/for/hugepages
All instances inherit this hypervisor parameter as their default, although it can be overridden at the instance level.
The following changes will be made to the inheritance behaviour.
The hypervisor parameter mem_path and all other hypervisor parameters will be made available at the node group level (in addition to the cluster level), so that users can set defaults for the node group:
$ gnt-group add/modify \
> -H hv:parameter=value
This changes the hypervisor parameter inheritance order to:
cluster -> group -> OS -> instance
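A minimal sketch of this lookup order, assuming each level stores its hypervisor parameters as a plain dict and that more specific levels override less specific ones. The function fill_hvparams and the sample values are hypothetical, not Ganeti's actual code.

```python
# Minimal sketch of the proposed inheritance order for hypervisor
# parameters: cluster -> group -> OS -> instance (later levels win).
# Function name and sample values are hypothetical.


def fill_hvparams(cluster, group, os_params, instance):
    """Merge the per-level dicts, letting more specific levels override."""
    merged = {}
    for level in (cluster, group, os_params, instance):
        merged.update(level)
    return merged


cluster = {"mem_path": "", "kernel_path": "/boot/vmlinuz"}
group = {"mem_path": "/mount/point/for/hugepages"}  # new: group-level default
os_params = {"kernel_path": "/boot/vmlinuz-custom"}
instance = {}

print(fill_hvparams(cluster, group, os_params, instance))
# {'mem_path': '/mount/point/for/hugepages', 'kernel_path': '/boot/vmlinuz-custom'}
```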
Furthermore, the hypervisor parameter mem_path will be changeable only at the cluster or node group level, and users must not be able to override it at the OS or instance level. The following command must produce an error message stating that mem_path may only be set at either the cluster or the node group level:
$ gnt-instance add -H kvm:mem_path=/mount/point/for/hugepages
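The restriction could be enforced with a check along the lines of the sketch below. The function, exception class and level names are hypothetical; only the parameter name mem_path and the error wording come from this design.

```python
# Minimal sketch: reject mem_path outside the cluster and node group levels.
# Names below are hypothetical; only mem_path comes from the design.

CLUSTER_OR_GROUP_ONLY = frozenset(["mem_path"])


class ParameterLevelError(ValueError):
    """Raised when a parameter is set at a level where it is not allowed."""


def check_hvparams_level(level, hvparams):
    """Raise ParameterLevelError if a restricted parameter is misplaced."""
    if level in ("cluster", "group"):
        return
    forbidden = CLUSTER_OR_GROUP_ONLY.intersection(hvparams)
    if forbidden:
        raise ParameterLevelError(
            "%s may only be set at either the cluster or the node group level"
            % ", ".join(sorted(forbidden)))


check_hvparams_level("group", {"mem_path": "/mount/point/for/hugepages"})  # accepted

try:
    check_hvparams_level("instance", {"mem_path": "/mount/point/for/hugepages"})
except ParameterLevelError as err:
    print(err)  # mem_path may only be set at either the cluster or the node group level
```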
Memory management of Ganeti will be improved by creating separate pools for the memory used by the node itself, the memory used by the hypervisor, and the memory reserved for huge pages:
- mtotal/xen (Xen memory)
- mfree/xen (Xen unused memory)
- mtotal/hp (Memory reserved for huge pages)
- mfree/hp (Memory available from unused huge pages)
- mpgsize/hp (Size of a huge page)
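The following sketch derives the huge page pools from /proc/meminfo-style values (HugePages_Total and HugePages_Free are page counts, Hugepagesize is in kB) and reports them in MiB. The function name and sample numbers are hypothetical, and the Xen figures are simply taken as inputs, since how they are obtained from the hypervisor is outside the scope of this sketch.

```python
# Minimal sketch: derive the huge page pools (in MiB) from meminfo-style
# values. Function name and sample numbers are hypothetical.
# Assumes page sizes that are whole MiB (e.g. 2 MiB or 1 GiB).


def memory_pools(meminfo, xen_total_mib=None, xen_free_mib=None):
    """Return the hp pools, plus the Xen pools when Xen figures are given."""
    page_kib = meminfo["Hugepagesize"]  # reported in kB
    pools = {
        "mtotal/hp": meminfo["HugePages_Total"] * page_kib // 1024,
        "mfree/hp": meminfo["HugePages_Free"] * page_kib // 1024,
        "mpgsize/hp": page_kib // 1024,
    }
    if xen_total_mib is not None:
        pools["mtotal/xen"] = xen_total_mib
        pools["mfree/xen"] = xen_free_mib
    return pools


meminfo = {"HugePages_Total": 6144, "HugePages_Free": 4096, "Hugepagesize": 2048}
print(memory_pools(meminfo))
# {'mtotal/hp': 12288, 'mfree/hp': 8192, 'mpgsize/hp': 2}
```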
mfree and mtotal will be changed to mean “the total and free memory for the default method in this cluster/node group”. Note that the default method depends both on the default hypervisor and on its parameters.
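How mfree and mtotal could be resolved to a single pool is sketched below. The selection rule (Xen hypervisors use the Xen pool, KVM with a non-empty mem_path uses the huge page pool, everything else uses normal pages) and the mtotal/node and mfree/node keys are assumptions made for illustration; the design itself only states that the choice depends on the default hypervisor and its parameters.

```python
# Minimal sketch: resolve mfree/mtotal to the default pool of a
# cluster/node group. The selection rule and the "node" pool keys are
# assumptions for illustration.

POOL_KEYS = {
    "xen": ("mtotal/xen", "mfree/xen"),
    "hp": ("mtotal/hp", "mfree/hp"),
    "node": ("mtotal/node", "mfree/node"),  # hypothetical: normal pages
}


def default_pool(hypervisor, hvparams):
    """Choose the pool from the default hypervisor and its parameters."""
    if hypervisor.startswith("xen"):
        return "xen"
    if hypervisor == "kvm" and hvparams.get("mem_path"):
        return "hp"
    return "node"


def default_memory(pools, hypervisor, hvparams):
    """Return (mtotal, mfree) for the default memory method."""
    total_key, free_key = POOL_KEYS[default_pool(hypervisor, hvparams)]
    return pools[total_key], pools[free_key]


pools = {"mtotal/node": 16000, "mfree/node": 1900,
         "mtotal/hp": 12288, "mfree/hp": 8192}
print(default_memory(pools, "kvm", {"mem_path": "/mount/point/for/hugepages"}))
# (12288, 8192)
print(default_memory(pools, "kvm", {}))
# (16000, 1900)
```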
If huge pages are set as the default for a cluster or node group, the iallocator must take the huge page memory on the nodes into account when trying to find the best node for the VM. Note that the iallocator will also be changed to use the correct memory parameter depending on the cluster or node group.
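A simplified sketch of a huge-page-aware placement check follows. It assumes that an instance backed by huge pages consumes whole pages, so the request is rounded up to a multiple of mpgsize/hp, and it picks the fitting node with the most free memory in the chosen pool. Real iallocators such as hail use a more elaborate scoring, so pick_node and needed_mib are hypothetical illustrations only.

```python
# Simplified sketch of huge-page-aware placement; not hail's actual
# algorithm. Function names and node data are hypothetical.


def needed_mib(request_mib, node, pool):
    """Memory the request consumes in `pool` on `node`.

    Assumes huge-page-backed memory is allocated in whole pages.
    """
    if pool == "hp":
        page = node["mpgsize/hp"]
        return -(-request_mib // page) * page  # round up to whole pages
    return request_mib


def pick_node(nodes, request_mib, pool):
    """Return the fitting node with the most free memory in `pool`, or None."""
    free_key = "mfree/" + pool
    fitting = [n for n in nodes
               if n[free_key] >= needed_mib(request_mib, n, pool)]
    if not fitting:
        return None
    return max(fitting, key=lambda n: n[free_key])["name"]


nodes = [
    {"name": "node1", "mfree/hp": 0, "mpgsize/hp": 2},
    {"name": "node2", "mfree/hp": 8192, "mpgsize/hp": 2},
    {"name": "node3", "mfree/hp": 4096, "mpgsize/hp": 1024},
]
print(pick_node(nodes, 4000, "hp"))  # node2
print(pick_node(nodes, 9000, "hp"))  # None
```

Rounding matters with large pages: on node3 a 4100 MiB request needs five 1 GiB pages (5120 MiB), so it would not fit there even though 4096 MiB is free.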
The cluster balancer (hbal) will be changed to use the default memory pool and recognize memory reserved for huge pages when trying to rebalance the cluster.
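As an illustration, one of the components of hbal's cluster score is the spread (standard deviation) of the percentage of free memory across nodes. The sketch below computes that single component, as a fraction rather than a percentage, against a chosen memory pool. The function name and node dictionaries are hypothetical, and the real score combines many more metrics.

```python
# Sketch of one balance component (spread of free memory) computed
# against a chosen memory pool. Names and data are hypothetical.
from statistics import pstdev


def memory_imbalance(nodes, pool):
    """Population std. dev. of the free fraction of `pool` across nodes.

    Nodes without memory in the pool are skipped.
    """
    fractions = [n["mfree/" + pool] / n["mtotal/" + pool]
                 for n in nodes if n["mtotal/" + pool] > 0]
    return pstdev(fractions) if fractions else 0.0


before = [{"mtotal/hp": 8192, "mfree/hp": 8192},
          {"mtotal/hp": 8192, "mfree/hp": 0}]
after = [{"mtotal/hp": 8192, "mfree/hp": 4096},
         {"mtotal/hp": 8192, "mfree/hp": 4096}]
print(memory_imbalance(before, "hp"))  # 0.5
print(memory_imbalance(after, "hp"))   # 0.0
```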