Removal of the Config Lock Overhead¶
Created: | 2013-Jul-22 |
---|---|
Status: | Partially Implemented |
Ganeti-Version: | 2.14.0, 2.15.0 |
This is a design document detailing how the adverse effect of the config lock can be removed in an incremental way.
Current state and shortcomings¶
As a result of the Ganeti daemons refactoring, the configuration is held in a process different from the processes carrying out the Ganeti jobs. Therefore, job processes have to contact WConfD in order to change the configuration. Of course, these modifications of the configuration need to be synchronised.
The current form of synchronisation is via ConfigLock. Exclusive
possession of this lock guarantees that no one else modifies the
configuration. In other words, the current procedure for a job to
update the configuration is to
- acquire the ConfigLock from WConfD,
- read the configuration,
- write the modified configuration, and
- release the ConfigLock.
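The four steps above can be sketched as follows. This is a minimal illustration, not Ganeti's actual RPC interface: the class `FakeWConfD`, its method names, and the `disk_writes` counter are assumptions made for the example. The counter only marks the points at which this document says a write to disk has to finish before the job gets its confirmation.

```python
import copy
import threading


class FakeWConfD:
    """Hypothetical in-memory stand-in for WConfD (not Ganeti's real API)."""

    def __init__(self, config):
        self._config = config
        self._owner = None              # job currently holding the ConfigLock
        self._mutex = threading.Lock()  # protects the fields of this object
        self.disk_writes = 0            # simulated synchronous writes to disk

    def lock_config(self, job_id):
        with self._mutex:
            if self._owner is not None:
                return False
            self._owner = job_id
            self.disk_writes += 1       # lock status is persisted before confirming
            return True

    def read_config(self):
        # The whole configuration is transferred to the job.
        return copy.deepcopy(self._config)

    def write_config(self, job_id, config):
        with self._mutex:
            if self._owner != job_id:
                raise RuntimeError("ConfigLock not held by caller")
            self._config = copy.deepcopy(config)
            self.disk_writes += 1       # configuration is persisted before confirming

    def unlock_config(self, job_id):
        with self._mutex:
            if self._owner == job_id:
                self._owner = None
                self.disk_writes += 1   # assumed: the new lock status is persisted too


def update_config(wconfd, job_id, modify):
    """Acquire the lock, read, modify locally, write back, release."""
    if not wconfd.lock_config(job_id):
        raise RuntimeError("ConfigLock is busy")
    try:
        config = wconfd.read_config()
        modify(config)
        wconfd.write_config(job_id, config)
    finally:
        wconfd.unlock_config(job_id)


wconfd = FakeWConfD({"instances": {}})
update_config(wconfd, "job-1",
              lambda c: c["instances"].update(inst1={"disks": 1}))
```

In this sketch a single small change costs three simulated disk writes and two full transfers of the configuration, which is the overhead the drawbacks listed next refer to.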
The current procedure has some drawbacks. These also affect the overall throughput of jobs in a Ganeti cluster.
- At each configuration update, the whole configuration is transferred between the job and WConfD.
- More importantly, however, jobs can only release the ConfigLock after the write; the write, in turn, is only confirmed once the configuration is written to disk. In particular, we can only have one update per configuration write. Also, possession of the ConfigLock is only confirmed to the job once the new lock status is written to disk.
Additional overhead is caused by the fact that reads are synchronised over a shared config lock. This used to make sense when the configuration was modifiable in the same process, as it ensured consistent reads. With the new structure, all accesses to the configuration via WConfD are consistent anyway, and local modifications by other jobs do not happen.
Proposed changes for an incremental improvement¶
Ideally, jobs would just send patches for the configuration to WConfD
that are applied by means of atomically updating the respective IORef.
This, however, would require changing all of Ganeti’s logical units in
one big change. Therefore, we propose to keep the ConfigLock and,
step by step, reduce its impact until it is eventually used only
internally in the WConfD process.
Unlocked Reads¶
In a first step, all configuration operations that are synchronised over
a shared config lock, and therefore necessarily read-only, will instead
use WConfD’s readConfig to obtain a snapshot of the configuration.
This will be done without modifying the locks. It is sound, as reads of
a Haskell IORef always yield a consistent value. From that snapshot
the required view is computed locally. This saves two lock-configuration
write cycles per read and, additionally, does not block any concurrent
modifications.
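The reasoning behind unlocked reads can be illustrated with a small sketch. It is written in Python rather than Haskell, and `ConfigRef`, `replace`, and `instance_names` are hypothetical names; only `readConfig` is named in this document. The assumption modelled here is that the configuration is an immutable value and writers swap in a complete new value, so a reader always sees one whole version.

```python
from types import MappingProxyType


class ConfigRef:
    """Hypothetical analogue of an IORef holding an immutable configuration."""

    def __init__(self, config):
        self._value = MappingProxyType(dict(config))

    def read_config(self):
        # Returns whichever complete value is current; no lock is taken.
        return self._value

    def replace(self, new_config):
        # A writer swaps in a whole new value; existing snapshots are untouched.
        self._value = MappingProxyType(dict(new_config))


def instance_names(snapshot):
    # The required view is computed locally from the snapshot.
    return sorted(snapshot["instances"])


ref = ConfigRef({"instances": ("inst1", "inst2")})
snapshot = ref.read_config()
ref.replace({"instances": ("inst1", "inst2", "inst3")})  # concurrent modification

old_view = instance_names(snapshot)
new_view = instance_names(ref.read_config())
```

The reader holding `snapshot` keeps a consistent, if older, view while the writer proceeds without being blocked.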
In a second step, more specialised read functions will be added to WConfD.
This will reduce the traffic for reads.
Cached Reads¶
As jobs synchronize with each other by means of regular locks, the parts
of the configuration relevant for a job can only change while a job waits
for new locks. So, if a job has a copy of the configuration and has not asked
for locks afterwards, all read-only access can be done from that copy. While
this will not affect the ConfigLock, it saves traffic.
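A job-side cache following this rule might look like the sketch below. `JobConfigCache` and its methods are hypothetical, and `acquire_locks` is only a placeholder for the real lock request. The one point taken from the text is that the cached copy is dropped whenever the job asks for new locks.

```python
class JobConfigCache:
    """Hypothetical job-side cache of a configuration snapshot."""

    def __init__(self, fetch_snapshot):
        self._fetch = fetch_snapshot  # callable that asks WConfD for a snapshot
        self._snapshot = None
        self.fetches = 0              # counts round trips, for illustration

    def acquire_locks(self, names):
        # While waiting for new locks, relevant parts of the configuration
        # may change, so the cached copy must not be used afterwards.
        self._snapshot = None
        return list(names)            # placeholder for the real lock request

    def config(self):
        if self._snapshot is None:
            self._snapshot = self._fetch()
            self.fetches += 1
        return self._snapshot


store = {"version": 1}                       # stands in for WConfD's state
cache = JobConfigCache(lambda: dict(store))

cache.acquire_locks(["instance/inst1"])
first = cache.config()                       # fetched from "WConfD"
second = cache.config()                      # served from the local copy

store["version"] = 2                         # another job changes the config
cache.acquire_locks(["instance/inst2"])      # invalidates the copy
third = cache.config()                       # fetched again
```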
Set-and-release action¶
A typical pattern is to change the configuration and afterwards release
the ConfigLock. To avoid unnecessary RPC call overhead, WConfD will offer
a combined call. To make that call retryable, it will do nothing if the
ConfigLock is not held by the caller; in the return value, it will indicate
whether the config lock was held when the call was made.
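The retry behaviour of the combined call can be sketched as follows. The name `write_config_and_unlock` and the class `FakeWConfD` are assumptions for illustration and are not taken from this document.

```python
class FakeWConfD:
    """Hypothetical in-memory stand-in for WConfD."""

    def __init__(self, config):
        self.config = config
        self.lock_owner = None

    def lock_config(self, job_id):
        if self.lock_owner is None:
            self.lock_owner = job_id
            return True
        return False

    def write_config_and_unlock(self, job_id, new_config):
        """Set the configuration and release the ConfigLock in one call.

        Returns True if the caller held the ConfigLock when the call was
        made; otherwise does nothing and returns False.
        """
        if self.lock_owner != job_id:
            return False
        self.config = new_config
        self.lock_owner = None
        return True


wconfd = FakeWConfD({"serial": 1})
wconfd.lock_config("job-1")

first = wconfd.write_config_and_unlock("job-1", {"serial": 2})
# Suppose the answer got lost and the job retries the same call:
retry = wconfd.write_config_and_unlock("job-1", {"serial": 2})
```

Because the retry finds the lock no longer held, it cannot overwrite changes that other jobs may have made in the meantime, and the `False` return value tells the caller that nothing was done by this particular call.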
Short-lived ConfigLock¶
For a lot of operations, the regular locks already ensure that only
one job can modify a certain part of the configuration. For example,
only jobs with an exclusive lock on an instance will modify that
instance. Therefore, such a job can update that entity atomically,
without relying on the configuration lock to achieve consistency.
WConfD will provide such operations. To
avoid interference with non-atomic operations that still take the
config lock and write the configuration as a whole, this operation
will only be carried out at times the config lock is not taken. To
ensure this, the thread handling the request will take the config lock
itself (hence no one else has it, if that succeeds) before the change
and release it afterwards; both operations will be done without
triggering a writeout of the lock status.
Note that the thread handling the request has to take the lock in its
own name and not in that of the requesting job. A writeout of the lock
status can still happen, triggered by other requests. If the lock were
taken in the name of the job and WConfD got restarted after the lock
acquisition, the job would own a lock without knowing about it,
and hence that lock would never get released.
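A rough sketch of such an atomic per-entity operation is given below. All names (`FakeWConfD`, `update_instance`, the `persist` flag, the `lock_status_writes` counter) are hypothetical. The sketch models two points from the text: the handling thread takes the config lock under its own identity, and neither that acquisition nor the release triggers a writeout of the lock status.

```python
import threading


class FakeWConfD:
    """Hypothetical in-memory stand-in for WConfD."""

    def __init__(self, config):
        self.config = config
        self.lock_owner = None
        self.lock_status_writes = 0      # simulated writeouts of the lock status
        self._mutex = threading.Lock()

    def _try_lock(self, owner, persist):
        with self._mutex:
            if self.lock_owner is not None:
                return False
            self.lock_owner = owner
            if persist:
                self.lock_status_writes += 1
            return True

    def _unlock(self, owner, persist):
        with self._mutex:
            if self.lock_owner == owner:
                self.lock_owner = None
                if persist:
                    self.lock_status_writes += 1

    def lock_config(self, job_id):
        # Regular acquisition by a job: the lock status is written out.
        return self._try_lock(job_id, persist=True)

    def update_instance(self, name, fields):
        """Atomically update one instance; returns False if the lock is busy."""
        # The lock is taken in the name of the handling thread, not the job.
        owner = ("wconfd-thread", threading.get_ident())
        if not self._try_lock(owner, persist=False):
            return False
        try:
            self.config["instances"][name].update(fields)
            return True
        finally:
            self._unlock(owner, persist=False)


wconfd = FakeWConfD({"instances": {"inst1": {"admin_state": "down"}}})
ok = wconfd.update_instance("inst1", {"admin_state": "up"})

# A job doing a whole-configuration write holds the lock; the atomic
# operation is then refused instead of interfering.
wconfd.lock_config("job-7")
blocked = wconfd.update_instance("inst1", {"admin_state": "down"})
```

In this sketch a refused call simply returns `False`; whether a real implementation would wait, retry, or report an error is left open here.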
Approaches considered, but not working¶
Set-and-release action with asynchronous writes¶
Approach¶
A typical pattern is to change the configuration and afterwards release
the ConfigLock. To avoid unnecessary delay in this operation (the next
modification of the configuration can already happen while the last change
is written out), WConfD will offer a combined command that will
- set the configuration to the specified value,
- release the config lock,
- and only then wait for the configuration write to finish; it will not wait for confirmation of the lock-release write.
If jobs use this combined command instead of the sequential set followed by release, new configuration changes can come in during writeout of the current change; in particular, a writeout can contain more than one change.
Problem¶
This approach works fine as long as either WConfD can always do an ordered
shutdown or the calling process dies as well. If, however, we allow random kill
signals to be sent to individual daemons (e.g., by an out-of-memory killer),
the following race occurs. A process can ask for a combined write-and-unlock
operation; while the configuration is still being written out, the writeout of
the updated lock status already finishes. Now, if WConfD gets forcefully killed
at that very moment, a restarted WConfD will read the old configuration but
the new lock status. This will make the calling process believe that its call,
while it didn’t get an answer, succeeded nevertheless, thus resulting in a
wrong configuration state.
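The race can be replayed deterministically in a few lines. This is a simplified model with hypothetical names: `disk` stands for the persisted state, `state` for WConfD's in-memory state, and the two writeouts are represented as callables listed in the order in which they happen to finish.

```python
def set_and_release(state, disk, new_config):
    """Apply the combined command in memory and return the pending writeouts."""
    state["config"] = new_config      # set the configuration
    state["lock_owner"] = None        # release the config lock
    # Pending writeouts, in the order in which they finish in this scenario.
    return [
        lambda: disk.update(lock_owner=state["lock_owner"]),
        lambda: disk.update(config=state["config"]),
    ]


disk = {"config": {"serial": 1}, "lock_owner": "job-1"}
state = dict(disk)                    # WConfD's in-memory view

pending = set_and_release(state, disk, {"serial": 2})
pending[0]()                          # the lock status writeout finishes
# WConfD is killed here; the configuration writeout never runs.

restarted = dict(disk)                # what a restarted WConfD loads

# The job got no answer, sees that it no longer holds the lock,
# and concludes that its call went through.
job_assumes_success = restarted["lock_owner"] != "job-1"
```

After the restart the lock is recorded as released while the configuration still has its old value, which is exactly the inconsistent combination described above.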