Design for replacing Ganeti’s HTTP server

Current state and shortcomings

The new design for import/export depends on an HTTP server. Ganeti includes a home-grown HTTP server based on Python’s BaseHTTPServer. While it has served us well so far, it implements only the very basics of the HTTP protocol. It is, for example, not structured well enough to support chunked transfers (RFC 2616, section 3.6.1), which would have some advantages. In addition, it was not designed for sending large responses.

In the case of the node daemon, the HTTP server cannot easily be separated from the actual backend code and therefore must run as “root”. The RAPI daemon parses requests in the same process that talks to the master daemon via LUXI.

Proposed changes

The proposal is to start using a full-fledged HTTP server in Ganeti and to run Ganeti’s code as FastCGI applications. Reasons:

  • Simplify Ganeti’s code by delegating the details of HTTP and SSL to another piece of software
  • Run HTTP frontend and handler backend as separate processes and users (esp. useful for node daemon, but also import/export and Remote API)
  • Allow implementation of RPC feedback

Software choice

Theoretically, any server capable of speaking FastCGI to a backend process could be used. However, to keep the number of steps required for setting up a new cluster at roughly the same level, the implementation will initially be geared towards one specific HTTP server. Support for other HTTP servers can still be implemented later.

After a rough selection of available HTTP servers, lighttpd and nginx were the most likely candidates. Both are widely used and tested.

Nginx’s original documentation is in Russian; translations are available in a wiki. Nginx does not support old-style CGI programs.

The author found lighttpd’s documentation easier to understand and was able to configure a test server quickly. This, together with the support for more technologies, made deciding easier.

With its use as a public-facing web server on a large number of websites (and possibly more behind proxies), lighttpd should be a safe choice. Unlike other webservers, such as the Apache HTTP Server, lighttpd’s codebase is of manageable size.

Initially the HTTP server would only be used for import/export transfers, but its use can be expanded to the Remote API and node daemon (see RPC feedback).

To reduce the attack surface, an option will be provided to configure services (e.g. import/export) to only listen on certain network interfaces.

RPC feedback

HTTP/1.1 supports chunked transfers (RFC 2616, section 3.6.1). They could be used to provide feedback from node daemons to the master, similar to the feedback from jobs. A good use would be to provide feedback to the user during long-running operations, e.g. downloading an instance’s data from another cluster.

WSGI 1.0 (PEP 333) includes the following requirement:

WSGI servers, gateways, and middleware must not delay the transmission of any block; they must either fully transmit the block to the client, or guarantee that they will continue transmission even while the application is producing its next block

This behaviour was confirmed to work with lighttpd and the flup library. FastCGI by itself has no such guarantee; webservers with buffering might require artificial padding to force the message to be transmitted.
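The streaming behaviour described above can be illustrated with a minimal WSGI sketch. It is not Ganeti code: the name `feedback_app`, the message fields, and the `SEPARATOR` byte are assumptions made for illustration only. The application yields one block per progress update and a final block carrying the result; a conforming server is expected to transmit each block without waiting for the next one.

```python
import json

# Assumed message separator, chosen for illustration only.
SEPARATOR = b"\x03"


def feedback_app(environ, start_response):
    """Hypothetical WSGI application streaming progress, then a result."""
    # No Content-Length header is set, so an HTTP/1.1 server is free to
    # use chunked transfer encoding for the response body.
    start_response("200 OK", [("Content-Type", "application/json")])

    def generate():
        for percent in (25, 50, 75, 100):
            # Each yielded block is one feedback message. Per PEP 333 the
            # server must not delay its transmission.
            message = {"feedback": "%d%% transferred" % percent}
            yield json.dumps(message).encode("utf-8") + SEPARATOR
        # The final message contains the method's result.
        yield json.dumps({"result": True}).encode("utf-8") + SEPARATOR

    return generate()
```

Under FastCGI, such an application would be handed to a WSGI-to-FastCGI gateway such as flup; whether the blocks reach the client unbuffered still depends on the web server in front of it.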

The node daemon can send JSON-encoded messages back to the master daemon by separating them using a predefined character (see LUXI). The final message contains the method’s result. pycURL passes each received chunk to the callback set as CURLOPT_WRITEFUNCTION. Once a message is complete, the master daemon can pass it to a callback function inside the job, which then decides what to do with it (e.g. forward it as job feedback to the user).
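A rough sketch of the receiving side follows. The class name `MessageSplitter`, the `on_message` callback, and the separator byte `EOM` are hypothetical and only serve to show how chunks arriving at arbitrary boundaries could be reassembled into complete JSON messages. The sketch does not import pycURL; it only assumes that the `write` method would be registered as the write callback.

```python
import json

# Assumed end-of-message byte; the actual separator would be whatever
# the final design specifies.
EOM = b"\x03"


class MessageSplitter:
    """Hypothetical helper reassembling separator-delimited JSON messages."""

    def __init__(self, on_message):
        self._buffer = b""
        self._on_message = on_message

    def write(self, chunk):
        """Accept one received chunk; suitable as a pycURL write callback.

        Chunk boundaries need not match message boundaries, so data is
        buffered until a separator is seen.
        """
        self._buffer += chunk
        while True:
            message, sep, rest = self._buffer.partition(EOM)
            if not sep:
                # No complete message yet; wait for more data.
                break
            self._buffer = rest
            self._on_message(json.loads(message.decode("utf-8")))
        # Returning None signals to pycURL that the whole chunk was consumed.
```

With pycURL, an instance's `write` method could be registered via `curl.setopt(pycurl.WRITEFUNCTION, splitter.write)`, and `on_message` would be the function inside the job that decides whether to forward a message as feedback or treat it as the final result.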

A more detailed design may have to be written before deciding whether to implement RPC feedback.

Software requirements

Lighttpd SSL configuration

The following sample shows how to configure SSL with client certificates in Lighttpd:

$SERVER["socket"] == ":443" {
  ssl.engine = "enable"
  ssl.pemfile = "server.pem"
  ssl.ca-file = "ca.pem"
  ssl.use-sslv2 = "disable"
  ssl.cipher-list = "HIGH:-DES:-3DES:-EXPORT:-ADH"
  ssl.verifyclient.activate = "enable"
  ssl.verifyclient.enforce = "enable"
  ssl.verifyclient.exportcert = "enable"
  ssl.verifyclient.username = "SSL_CLIENT_S_DN_CN"