Master/slave replication with OpenDNSSEC

Written by Rick van Rein in category: Architecture, Resilience

In a previous article we discussed the idea of a high-availability Hardware Security Module (or HSM) service. To make the entire DNSSEC signing service act in high-availability mode there is one more part to replicate, namely the OpenDNSSEC signer machines. These manage the procedures and are aware of the DNS and timing intricacies of DNSSEC. If all signing is done on one machine, its failure would introduce a risk of missing the signing deadline of some domains, which would lead to those domains being rejected by secure resolvers. Domain downtime is generally bad, and may even block DNSSEC adoption, so we will set up redundant signer machines.
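To make the "signing deadline" concrete, here is a minimal sketch of how one might estimate how long a signer outage can last before the first signature in a zone expires. It assumes the expiration timestamps have already been extracted from the zone's RRSIG records in their usual YYYYMMDDHHmmSS presentation format; the function names and the sample values are made up for illustration and are not part of OpenDNSSEC.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(text: str) -> datetime:
    """Parse an RRSIG timestamp in YYYYMMDDHHmmSS form (always UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def outage_headroom(expirations, now: datetime) -> timedelta:
    """Return the time left until the earliest signature expires.

    A negative result means at least one signature has already expired.
    """
    earliest = min(parse_rrsig_time(e) for e in expirations)
    return earliest - now


# Hypothetical expiration fields, as they might appear in a signed zone.
now = datetime(2011, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
expirations = ["20110308120000", "20110303060000", "20110310000000"]

headroom = outage_headroom(expirations, now)
print("A signer outage must be resolved within:", headroom)
```

The shorter this headroom is compared with the time needed to notice a failure and bring up a replacement, the stronger the case for a standby signer.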

OpenDNSSEC is not really designed (at least in version 1.1) to function in redundant mode, but there are ways of getting it to function redundantly. If one signer is an actively signing master and the other is a slave that clones the master’s output, then it ought to be possible to switch them from master to slave, and from slave to master. This is possible because OpenDNSSEC, from version 1.1 on, does not re-sign a full zone on every run but instead tries to reuse the signatures it already has. If we feed the master’s signatures back into the slave, we should be able to have it pick up where a (failing) master left off. As a result, the signing service would continue without noticeable interruption.
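The idea of signature reuse can be illustrated with a simplified model. The sketch below is not OpenDNSSEC's actual code: it assumes each signature is known by a name and an expiration time, and that a signature is kept as long as it stays valid for longer than a chosen refresh interval. The function name, the record names and the interval are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_signatures(expirations, now, refresh):
    """Split signatures into those that can be reused and those to re-sign.

    expirations maps a record name to the expiry time of its signature.
    A signature is reused only if it stays valid for longer than `refresh`.
    """
    reusable, to_resign = [], []
    for name, expires in sorted(expirations.items()):
        if expires - now > refresh:
            reusable.append(name)
        else:
            to_resign.append(name)
    return reusable, to_resign


# Hypothetical signatures inherited from the failed master.
now = datetime(2011, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
inherited = {
    "www.example.": now + timedelta(days=6),
    "mail.example.": now + timedelta(days=1),
    "example.": now + timedelta(days=4),
}

reusable, to_resign = split_signatures(inherited, now, refresh=timedelta(days=3))
print("reuse:", reusable)
print("re-sign:", to_resign)
```

Under this model, a slave that has received the master's signatures only needs to produce the few that are close to expiry, which is why a switchover need not trigger a full re-signing of the zone.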

This makes our network diagram a little more complex:

One of the identical pair of HSMs will be backed up regularly

Furthermore, the serving OpenDNSSEC master will push its signed zones to the authoritative name servers that publish the zone. In fact, the only part of the infrastructure that is not necessarily redundant is the web interface that runs the end-user application “SURFdomeinen”; this interface could go down without domains dropping off the Internet. It would only prevent end users from changing their domains, which is not a concern to address in this exploration of a DNSSEC service.

Needless to say, the HSMs and the signer machines are spread over two locations and managed by two independent parties. The cross-over in the diagram actually covers quite a few kilometers, but since it won’t carry much data, that slight inefficiency is hardly a burden. And, because we run the OpenDNSSEC instances as virtual machines, we will actually keep a third one ready to clone so it can quickly replace a failing instance. As long as data is replicated over both active instances, it does no harm to simply destroy a failed one and start a fresh instance from scratch.

Note that an HSM generally does not mind being addressed by multiple clients at the same time; this is a normal mode of operation for PKCS #11 implementations.

What remains to be done is to define master/slave handover procedures. Since these will usually be emergency-time actions, it is best if these are spelled out in detail in a separate place. Our general approach will be to ensure that the original signer is really down, then to verify that the database and HSM are in a proper state, and only after that to start the signing service on the replacement signer.
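The ordering of that approach can be captured in a small sketch. This is not a finished procedure: the checks are passed in as plain functions, and the stubs used in the example (`old_signer_is_down`, `database_is_consistent`, `hsm_is_reachable`, `start_signer`) are hypothetical placeholders for whatever an operator would really run. The one point it is meant to show is that the replacement signer is started only after every precondition has passed, and that the run stops at the first failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HandoverStep:
    description: str
    check: Callable[[], bool]  # returns True when the precondition holds


def run_handover(
    preconditions: List[HandoverStep],
    start_signer: Callable[[], None],
) -> Tuple[bool, List[str]]:
    """Run the preconditions in order; start the signer only if all pass."""
    log = []
    for step in preconditions:
        ok = step.check()
        log.append(f"{'OK' if ok else 'FAILED'}: {step.description}")
        if not ok:
            # Stop immediately: never start a second signer on a doubt.
            return False, log
    start_signer()
    log.append("OK: signing service started on replacement signer")
    return True, log


# Hypothetical stubs standing in for real operational checks.
def old_signer_is_down() -> bool:
    return True


def database_is_consistent() -> bool:
    return True


def hsm_is_reachable() -> bool:
    return True


def start_signer() -> None:
    print("starting signer (placeholder)")


steps = [
    HandoverStep("original signer is really down", old_signer_is_down),
    HandoverStep("database is in a proper state", database_is_consistent),
    HandoverStep("HSM is in a proper state", hsm_is_reachable),
]

started, log = run_handover(steps, start_signer)
for line in log:
    print(line)
```

Putting the "is the original really down" check first matters most, since two signers that are active at the same time could publish diverging versions of the same zone.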

1 Comment to “Master/slave replication with OpenDNSSEC”

  1. Antti Ristimäki says:

    We have also set up a redundant signing system with OpenDNSSEC. It would be interesting to hear what information you replicate between the master and the backup signer. We ended up replicating only the OpenDNSSEC database and the unsigned zone files. We are not replicating the temporary files generated by OpenDNSSEC, which means that when we do a switchover, the backup signer will perform a full zone signing, using the same keys as the primary server, naturally.

    Do you have an automatic failover mechanism or does it require human intervention to activate the backup signer?