Procedure: Signer failure

Written by Rick van Rein in category: Procedures, Resilience

When?
If either the master or slave instance of OpenDNSSEC fails, follow the procedures below. It is assumed that an empty instance is available as a cloneable virtual machine.

What?

  • The failed signer is fenced and/or destroyed
  • At most one master is active (at any time)
  • One master is active after completing the procedure below
  • Eventually, one slave signer is back in sync with the master signer
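
The goals above can be read as conditions on the set of signers. The following is a minimal sketch of them, not part of OpenDNSSEC; the `Signer` record and all its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signer:
    # Hypothetical record describing one signer instance.
    name: str       # host label, illustrative only
    role: str       # "master" or "slave"
    active: bool    # True if OpenDNSSEC is running on it
    fenced: bool    # True if the instance is isolated
    in_sync: bool   # for slaves: True if it mirrors the master


def active_masters(signers: List[Signer]) -> List[Signer]:
    """Masters that are running and not fenced off."""
    return [s for s in signers
            if s.role == "master" and s.active and not s.fenced]


def invariant_holds(signers: List[Signer]) -> bool:
    """At most one master is active (must hold at any time)."""
    return len(active_masters(signers)) <= 1


def procedure_complete(signers: List[Signer]) -> bool:
    """Exactly one active master and at least one unfenced, in-sync slave."""
    standby = [s for s in signers
               if s.role == "slave" and not s.fenced and s.in_sync]
    return len(active_masters(signers)) == 1 and len(standby) >= 1
```

`invariant_holds` is meant to be true at every intermediate step, while `procedure_complete` only needs to become true at the end.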

Why?
Since OpenDNSSEC is currently unsuitable for multi-master mode, there must never be more than one actively signing master instance of OpenDNSSEC. We do set up slave instances that copy in the work done on the master, and having one of those on active standby is the end state that this procedure re-establishes.
What to do depends on whether the master or slave fails. If the master fails, the best step to take is to turn the slave into a master; if the slave fails, it must be replaced.
Fencing a failed signer is useful because it avoids erratic behaviour of the signing service as a whole. Before doing this, it is generally useful to establish that the problem is not of a passing nature, such as a temporary connectivity problem.
Having a standby node is probably the best balance between maintenance overhead and emergency recovery work. Setting up a new instance of OpenDNSSEC from scratch, or from another active node, may not be practical during an emergency.
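
One way to establish that a problem is not of a passing nature is to probe the signer several times over a period and only treat it as failed if every probe fails. The sketch below assumes a plain TCP connection attempt is a meaningful health signal for your setup; the function names, the number of attempts, and the delay are all illustrative assumptions, not recommendations from this procedure.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_persistently_down(probe: Callable[[], bool],
                         attempts: int = 5,
                         delay: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if every one of `attempts` probes fails.

    A single successful probe means the problem may be passing,
    so the signer should not be fenced yet.
    """
    for i in range(attempts):
        if probe():
            return False
        if i < attempts - 1:
            sleep(delay)  # wait before trying again
    return True
```

A call could look like `is_persistently_down(lambda: tcp_probe("signer.example", 22))`, where the host name and port are placeholders.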

How?

Handling slave failure:

  1. Establish that the slave has really failed, and that the problem is not of a passing nature.
  2. Fence the slave.
  3. Destroy the slave.
  4. Undo fencing of the slave.
  5. Clone the running master instance of OpenDNSSEC.
  6. Update its hostname and IP setup and bring it up, but do not start OpenDNSSEC.
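
The order of the steps above matters, so if they are scripted it helps to stop at the first step that fails instead of carrying on. This is a minimal sketch of such a runner; the actions are placeholders that you would replace with whatever your virtualisation platform offers, and none of the names come from OpenDNSSEC.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]

# Labels mirror the slave failure steps listed above.
SLAVE_FAILURE_LABELS = [
    "establish that the slave has really failed",
    "fence the slave",
    "destroy the slave",
    "undo fencing of the slave",
    "clone the running master",
    "update hostname and IP, bring up without starting OpenDNSSEC",
]


def run_procedure(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first one that returns False.

    Returns the labels that completed and the label that failed (or None).
    """
    completed: List[str] = []
    for label, action in steps:
        if not action():
            return completed, label
        completed.append(label)
    return completed, None


def dry_run_steps(labels: List[str]) -> List[Step]:
    """Placeholder actions that always succeed, for rehearsing the order."""
    return [(label, lambda: True) for label in labels]
```

Running `run_procedure(dry_run_steps(SLAVE_FAILURE_LABELS))` rehearses the sequence without touching any machine.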

Handling master failure:

  1. Establish that the master has really failed, and that the problem is not of a passing nature.
  2. Fence the master.
  3. Destroy the master.
  4. Clone the current slave into the place of the just-destroyed master.
  5. Set up the IP address and hostname on the new master, and bring it up. Configure it with the current list of zone/policy pairs. Then start OpenDNSSEC.
  6. Undo fencing of the master.
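
The ordering of the master failure steps can be checked against the rule that at most one master is active at any time. The toy model below assumes that fencing hides whatever sits in the master's place, both the failed instance and its replacement, until the fence is lifted; that reading, and every name in the code, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MasterSlot:
    # Hypothetical model of the master's place in the network.
    fenced: bool = False
    old_instance_exists: bool = True    # failed master, may still try to sign
    new_instance_signing: bool = False  # clone of the slave, once started

    def visible_masters(self) -> int:
        """Number of masters the rest of the system can see."""
        if self.fenced:
            return 0
        return int(self.old_instance_exists) + int(self.new_instance_signing)


def master_failover_trace() -> List[Tuple[str, int]]:
    """Apply the steps in order and record the visible master count."""
    slot = MasterSlot()
    trace = [("start", slot.visible_masters())]
    slot.fenced = True
    trace.append(("fence the master", slot.visible_masters()))
    slot.old_instance_exists = False
    trace.append(("destroy the master", slot.visible_masters()))
    slot.new_instance_signing = True
    trace.append(("clone slave, configure, start", slot.visible_masters()))
    slot.fenced = False
    trace.append(("undo fencing", slot.visible_masters()))
    return trace
```

In this model the count never exceeds one, and lifting the fence last is what prevents the old and new master from being visible together.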

2 Comments to “Procedure: Signer failure”

  1. nov1ce says:

    Thanks!

    What files (apart from *.xml and kasp.db) do you replicate to the slave? Do you replicate tmp/ and signconf/ directories as well?

  2. admin says:

    Apologies for the late response; we do not replicate tmp/ or signconf/ as these can be re-generated by the OpenDNSSEC enforcer daemon.
