harep - Ganeti auto-repair tool
harep [ [-L | --luxi ] = socket ] [ --job-delay = seconds ]
harep --version
Harep is the Ganeti auto-repair tool. It is able to detect that an instance is broken and to generate a sequence of jobs that will fix it, in accordance to the policies set by the administrator.
Harep is able to recognize what state an instance is in (healthy, suspended, needs repair, repair disallowed, pending repair, repair disallowed, repair failed) and to lead it through a sequence of steps that will bring the instance back to the healthy state. Therefore, harep is mainly meant to be run regularly and frequently using a cron job, so that is can actually follow the instance along all the process. At every run, harep will update the tags it adds to instances that describe its repair status, and will submit jobs that actually perform the required repair operations.
By default, harep only reports on the health status of instances, but doesn't perform any action, as they might be potentially dangerous. Therefore, harep will only touch instances that it has been explicitly authorized to work on.
The tags enabling harep, can be associated to single instances, or to a nodegroup or to the whole cluster, therefore affecting all the instances they contain. The possible tags share the common structure:
ganeti:watcher:autorepair:<type>
where <type>
can have the following values:
fix-storage
: allow disk replacement or fix the backend without affecting the instance itself (broken DRBD secondary)migrate
: allow instance migrationfailover
: allow instance reboot on the secondaryreinstall
: allow disks to be recreated and the instance to be reinstalledEach element in the list of tags, includes all the authorizations of the previous one, with fix-storage
being the least powerful and reinstall
being the most powerful.
In case multiple autorepair tags act on the same instance, only one can actually be active. The conflict is solved according to the following rules:
if multiple tags are in the same object, the least destructive takes precedence.
if the tags are across objects, the nearest tag wins.
Example: A cluster has instances I1 and I2, where I1 has the failover
tag, and the cluster has both fix-storage
and reinstall
. The I1 instance will be allowed to failover
, the I2 instance only to fix-storage
.
Harep doesn't do any hardware failure detection on its own, it relies on nodes being marked as offline by the administrator.
Also harep currently works only for instances with the drbd
and plain
disk templates.
Both these issues will be addressed by a new maintenance daemon in future Ganeti versions, which will supersede harep.
The options that can be passed to the program are as follows:
collect data via Luxi, optionally using the given socket path.
insert this much delay before the execution of repair jobs to allow the tool to continue processing instances.
Report bugs to project website or contact the developers using the Ganeti mailing list.
Ganeti overview and specifications: ganeti(7) (general overview), ganeti-os-interface(7) (guest OS definitions), ganeti-extstorage-interface(7) (external storage providers).
Ganeti commands: gnt-cluster(8) (cluster-wide commands), gnt-job(8) (job-related commands), gnt-node(8) (node-related commands), gnt-instance(8) (instance commands), gnt-os(8) (guest OS commands), gnt-storage(8) (storage commands), gnt-group(8) (node group commands), gnt-backup(8) (instance import/export commands), gnt-debug(8) (debug commands).
Ganeti daemons: ganeti-watcher(8) (automatic instance restarter), ganeti-cleaner(8) (job queue cleaner), ganeti-noded(8) (node daemon), ganeti-masterd(8) (master daemon), ganeti-rapi(8) (remote API daemon).
Ganeti htools: htools(1) (generic binary), hbal(1) (cluster balancer), hspace(1) (capacity calculation), hail(1) (IAllocator plugin), hscan(1) (data gatherer from remote clusters), hinfo(1) (cluster information printer), mon-collector(7) (data collectors interface).
Copyright (C) 2006-2014 Google Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.