File-based Storage

Created:2014-Jan-27
Status:Implemented
Ganeti-Version:2.0.0

This page describes the proposed file-based storage for the 2.0 version of Ganeti. The project consists in extending Ganeti in order to support a filesystem image as Virtual Block Device (VBD) in Dom0 as the primary storage for a VM.

Objective

Goals:

  • file-based storage for virtual machines running in a Xen-based Ganeti cluster
  • failover of file-based virtual machines between cluster-nodes
  • export/import file-based virtual machines
  • reuse existing image files
  • allow Ganeti to initialize the cluster without checking for a volume group (e.g. xenvg)

Non Goals:

  • any kind of data mirroring between clusters for file-based instances (this should be achieved by using shared storage)
  • special support for live-migration
  • encryption of VBDs
  • compression of VBDs

Background

Ganeti is a virtual server management software tool built on top of Xen VM monitor and other Open Source software.

Since Ganeti currently supports only block devices as storage backend for virtual machines, the wish came up to provide a file-based backend. Using this file-based option provides the possibility to store the VBDs on basically every filesystem and therefore allows to deploy external data storages (e.g. SAN, NAS, etc.) in clusters.

Overview

Introduction

Xen (and other hypervisors) provide(s) the possibility to use a file as the primary storage for a VM. One file represents one VBD.

Advantages/Disadvantages

Advantages of file-backed VBD:

  • support of sparse allocation
  • easy from a management/backup point of view (e.g. you can just copy the files around)
  • external storage (e.g. SAN, NAS) can be used to store VMs

Disadvantages of file-backed VBD: * possible performance loss for I/O-intensive workloads

  • using sparse files requires care to ensure the sparseness is preserved when copying, and there is no header in which metadata relating back to the VM can be stored

Detailed Design

Terminology

  • VBD (Virtual Block Device): Persistent storage available to a virtual machine, providing the abstraction of an actual block storage device. VBDs may be actual block devices, filesystem images, or remote/network storage.
  • Dom0 (Domain 0): The first domain to be started on a Xen machine. Domain 0 is responsible for managing the system.
  • VM (Virtual Machine): The environment in which a hosted operating system runs, providing the abstraction of a dedicated machine. A VM may be identical to the underlying hardware (as in full virtualization, or it may differ, as in paravirtualization). In the case of Xen the domU (unprivileged domain) instance is meant.
  • QCOW: QEMU (a processor emulator) image format.

Implementation

Managing file-based instances

The option for file-based storage will be added to the ‘gnt-instance’ utility.

Add Instance

Example:

gnt-instance add -t file:[path=[,driver=loop[,reuse[,...]]]] –disk 0:size=5G –disk 1:size=10G -n node -o debian-etch instance2

This will create a file-based instance with e.g. the following files: * /sda -> 5GB * /sdb -> 10GB

The default directory where files will be stored is /srv/ganeti/file-storage/. This can be changed by setting the <path> option. This option denotes the full path to the directory where the files are stored. The filetype will be “raw” for the first release of Ganeti 2.0. However, the code will be extensible to more file types, since Ganeti will store information about the file type of each image file. Internally Ganeti will keep track of the used driver, the file-type and the full path to the file for every VBD. Example: “logical_id” : [FD_LOOP, FT_RAW, "/instance1/sda"] If the --reuse flag is set, Ganeti checks for existing files in the corresponding directory (e.g. /xen/instance2/). If one or more files in this directory are present and correctly named (the naming conventions will be defined in Ganeti version 2.0) Ganeti will set a VM up with these. If no file can be found or the names or invalid the operation will be aborted.

Remove instance

The instance removal will just differ from the actual one by deleting the VBD-files instead of the corresponding block device (e.g. a logical volume).

Starting/Stopping Instance

Here nothing has to be changed, as the xen tools don’t differentiate between file-based or blockdevice-based instances in this case.

Export/Import instance

Provided “dump/restore” is used in the “export” and “import” guest-os scripts, there are no modifications needed when file-based instances are exported/imported. If any other backup-tool (which requires access to the mounted file-system) is used then the image file can be temporarily mounted. This can be done in different ways:

Mount a raw image file via loopback driver:

mount -o loop /srv/ganeti/file-storage/instance1/sda1 /mnt/disk\

Mount a raw image file via blkfront driver (Dom0 kernel needs this module to do the following operation):

xm block-attach 0 tap:aio:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0\

mount /dev/xvda1 /mnt/disk

Mount a qcow image file via blkfront driver (Dom0 kernel needs this module to do the following operation)

xm block-attach 0 tap:qcow:/srv/ganeti/file-storage/instance1/sda1 /dev/xvda1 w 0

mount /dev/xvda1 /mnt/disk

High availability features with file-based instances

Failing over an instance

Failover is done in the same way as with block device backends. The instance gets stopped on the primary node and started on the secondary. The roles of primary and secondary get swapped. Note: If a failover is done, Ganeti will assume that the corresponding VBD(s) location (i.e. directory) is the same on the source and destination node. In case one or more corresponding file(s) are not present on the destination node, Ganeti will abort the operation.

Replacing an instance disks

Since there is no data mirroring for file-backed VM there is no such operation.

Evacuation of a node

Since there is no data mirroring for file-backed VMs there is no such operation.

Live migration

Live migration is possible using file-backed VBDs. However, the administrator has to make sure that the corresponding files are exactly the same on the source and destination node.

Xen Setup

File creation

Creation of a raw file is simple. Example of creating a sparse file of 2 Gigabytes. The option “seek” instructs “dd” to create a sparse file:

dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1

Creation of QCOW image files can be done with the “qemu-img” utility (in debian it comes with the “qemu” package).

Config file

The Xen config file will have the following modification if one chooses the file-based disk-template.

1) loopback driver and raw file
disk = ['file:</path/to/file>,sda1,w']
2) blktap driver and raw file
disk = ['tap:aio:,sda1,w']
3) blktap driver and qcow file
disk = ['tap:qcow:,sda1,w']

Other hypervisors

Other hypervisors have mostly different ways to make storage available to their virtual instances/machines. This is beyond the scope of this document.