This is a design document detailing the semantics of fine-grained control of jobs in Ganeti. The implementation will be described in a separate design document, which also lays out the vision for the Ganeti daemon structure.
Control of the Ganeti job queue is currently quite limited. There is a single status bit, the “drained flag”; if it is set, no new jobs are accepted into the queue. This is too coarse for some use cases.
We propose to add filters on the job queue. These will be part of the configuration and, as such, are persisted with it. Conceptually, the filters are always processed when a job enters the queue and while it is still in the queue. Of course, the implementation only carries out reevaluation if something could change the result, e.g., a new job enters the queue or the filter rules are changed. There is no distinction between filter processing when a job is about to enter the queue and while it is in the queue, as this can be expressed by the filter rules themselves (see predicates below).
Filter rules are given by the following data.
A UUID. This allows different filter rules to coexist even if all their other parameters are equal; in this way, multiple drains for different reasons are possible. The UUID is used to address the filter rule, in particular for deletion.
If no UUID is provided when the rule is added, Ganeti will create one.
The watermark. This is the highest job id ever used, as of the moment the filter was added. It is set automatically when the filter is added.
A priority. This is a non-negative integer. Filters are processed in order of increasing priority until a rule applies. While there is a well-defined order in which rules of the same priority are evaluated (ties are broken by increasing watermark, then by UUID), it is not recommended to have overlapping rules of the same priority with different actions.
A list of predicates. The rule fires if all of them hold true for the job.
An action. For the time being, this is one of the following; more actions might be added in the future (in particular, future implementations might add an action that makes filtering continue with a different filter chain).
A reason trail, in the same format as reason trails for opcodes. This makes it possible to find out which maintenance (or other reason) caused this filter rule to be added.
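The rule data listed above can be modelled as a small record. The following Python sketch is hypothetical: the class `FilterRule`, the helper `add_rule` and all field names are illustrative and not taken from Ganeti's code. It shows the automatically generated UUID, the watermark stamped at addition time, and the processing order (priority, then watermark, then UUID).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
from uuid import uuid4


@dataclass
class FilterRule:
    """Hypothetical in-memory model of one job queue filter rule."""
    priority: int                 # non-negative; lower values are processed first
    predicates: List[list]        # the rule fires only if all of them hold
    action: str                   # e.g. "REJECT" or "PAUSE"
    watermark: int = 0            # highest job id in use when the rule was added
    reason_trail: List[Tuple] = field(default_factory=list)
    uuid: str = field(default_factory=lambda: str(uuid4()))

    def __post_init__(self) -> None:
        if self.priority < 0:
            raise ValueError("priority must be a non-negative integer")

    def sort_key(self) -> Tuple[int, int, str]:
        # Increasing priority; ties broken by increasing watermark, then UUID.
        return (self.priority, self.watermark, self.uuid)


def add_rule(rules: List[FilterRule], highest_job_id: int, priority: int,
             predicates: List[list], action: str,
             rule_uuid: Optional[str] = None,
             reason_trail: Optional[List[Tuple]] = None) -> FilterRule:
    """Append a rule, stamping the watermark and generating a UUID if none is given."""
    extra = {"uuid": rule_uuid} if rule_uuid is not None else {}
    rule = FilterRule(priority=priority, predicates=predicates, action=action,
                      watermark=highest_job_id,
                      reason_trail=list(reason_trail or []), **extra)
    rules.append(rule)
    return rule
```

Sorting a rule list with `sorted(rules, key=FilterRule.sort_key)` then yields the order in which the rules would be tried.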
A predicate is a list, with the first element being the name of the predicate and the rest being parameters suitable for that predicate. In most cases, the name of the predicate will be a field of a job, and there will be a single parameter, which is a boolean expression (filter) in the sense of the Ganeti query language. However, no assumption should be made that all predicates are of this shape. More predicates may be added in the future.
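To illustrate how predicates of this shape might be evaluated, the following Python sketch handles the two predicate kinds `jobid` and `opcode` used in the examples below, over a small subset of query-language operators. The job representation (a dict with an `id` and a list of `opcodes`), the function names, the substitution of the string `watermark` by the rule's watermark, and the "true for at least one opcode" semantics of `opcode` are all assumptions of this sketch.

```python
import operator
import re
from typing import Any, Dict, List

# Assumed subset of the query-language comparison operators.
COMPARISONS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def eval_filter(expr: list, fields: Dict[str, Any], subst: Dict[str, Any]) -> bool:
    """Evaluate a boolean filter expression against a dict of fields.

    `subst` maps special strings in value positions (e.g. "watermark")
    to their actual values.
    """
    op, *args = expr
    if op == "&":
        return all(eval_filter(e, fields, subst) for e in args)
    if op == "|":
        return any(eval_filter(e, fields, subst) for e in args)
    if op == "!":
        return not eval_filter(args[0], fields, subst)
    name, value = args
    if isinstance(value, str) and value in subst:
        value = subst[value]
    if op == "=~":  # regular-expression match
        return re.search(value, str(fields[name])) is not None
    return COMPARISONS[op](fields[name], value)


def predicate_holds(predicate: list, job: dict, watermark: int) -> bool:
    """Evaluate one [name, parameter] predicate against a hypothetical job dict."""
    name, *params = predicate
    if name == "jobid":
        # Only the job id is visible; "watermark" is replaced by the rule's watermark.
        return eval_filter(params[0], {"id": job["id"]}, {"watermark": watermark})
    if name == "opcode":
        # Assumed: holds if the expression is true for at least one opcode of the job.
        return any(eval_filter(params[0], op, {}) for op in job["opcodes"])
    raise ValueError("unsupported predicate: %s" % name)


def rule_applies(predicates: List[list], job: dict, watermark: int) -> bool:
    """A rule fires if all of its predicates hold (an empty list always fires)."""
    return all(predicate_holds(p, job, watermark) for p in predicates)
```

Unknown predicate names raise an error here rather than being silently ignored, in line with the caveat that not all predicates need share this shape.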
Draining the queue.
    {'priority': 0,
     'predicates': [['jobid', ['>', 'id', 'watermark']]],
     'action': 'REJECT'}
Soft draining can be achieved by replacing REJECT with PAUSE in the above example, so that new jobs are still accepted into the queue but are paused instead of rejected.
Pausing all new jobs not belonging to a specific maintenance.
    {'priority': 1,
     'predicates': [['jobid', ['>', 'id', 'watermark']],
                    ['reason', ['!', ['=~', 'reason', 'maintenance pink bunny']]]],
     'action': 'PAUSE'}
Canceling all queued instance creations and disallowing any new ones.
    {'priority': 1,
     'predicates': [['opcode', ['=', 'OP_ID', 'OP_INSTANCE_CREATE']]],
     'action': 'REJECT'}
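The processing rule (try filters in order of increasing priority until one applies) can be sketched as a small Python function. This self-contained sketch is hypothetical: rules are plain dicts shaped like the examples above plus the automatically added `watermark` and `uuid`, each predicate is assumed to carry exactly one parameter, only the `jobid` and `opcode` predicates are supported, and returning `None` when no rule applies (the job proceeds unfiltered) is an assumption, not something this document specifies.

```python
import operator
import re

# Minimal operator subset needed by the examples (assumed semantics).
_CMP = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def _holds(expr, fields, subst):
    op, *args = expr
    if op == "!":
        return not _holds(args[0], fields, subst)
    name, value = args
    value = subst.get(value, value) if isinstance(value, str) else value
    if op == "=~":
        return re.search(value, str(fields[name])) is not None
    return _CMP[op](fields[name], value)


def _rule_fires(rule, job):
    # A rule fires only if every [name, expression] predicate holds.
    for name, expr in rule["predicates"]:
        if name == "jobid":
            ok = _holds(expr, {"id": job["id"]}, {"watermark": rule["watermark"]})
        elif name == "opcode":
            ok = any(_holds(expr, op, {}) for op in job["opcodes"])
        else:
            raise ValueError("unsupported predicate: " + name)
        if not ok:
            return False
    return True


def decide(rules, job):
    """Return the action of the first rule that fires, or None if none does.

    Rules are tried in order of increasing priority; ties are broken by
    increasing watermark, then by UUID.
    """
    ordered = sorted(rules, key=lambda r: (r["priority"], r["watermark"], r["uuid"]))
    for rule in ordered:
        if _rule_fires(rule, job):
            return rule["action"]
    return None
```

For instance, with the draining rule stored at watermark 100, a job with id 101 gets REJECT, while an older job is only affected if a later rule, such as the instance-creation rule, fires for it. Replacing the draining rule's action with PAUSE turns it into soft draining.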
Since queue control is also intended to be used by external maintenance-handling tools, the primary interface for manipulating queue filters is the Ganeti remote API. For convenience, a command-line interface will be added too.
The following resources will be added.
Filtering of jobs is not a security feature. It merely serves to coordinate efforts and to avoid accidentally conflicting jobs. Anyone with appropriate credentials can modify the filter rules, not just the originator of a rule. To avoid accidental lock-out, requests modifying the queue are executed directly rather than going through the queue themselves.