www.madboa.com/geek/dhcp-failover/
Failover with ISC DHCP
Paul Heinlein
Initial publication: November 6, 2005
Most recent revision: April 11, 2008
How to set up the ISC DHCP daemon with load sharing and failover capabilities.
Table of Contents
Introduction
A simple starting point
The problem
Configuring the primary
Configuring the secondary
SELinux notes
What the logs will show
Introduction
Small- and medium-sized networks often have a single DHCP server, which can become
a single point of failure for a large number of hosts on the network. When the DHCP
server goes off-line, DHCP client hosts lose their addresses as their leases expire, and
with them the ability to communicate with the rest of the network. Since most desktop
computers, and even some servers, get
their networking configuration via DHCP, such an outage can result in a lot of
downtime.
If the network has a Unix infrastructure, there’s a good chance that it’s using the
Internet Systems Consortium (ISC) DHCP server, which is widely available on Linux
and BSD systems.
Since version 3.0, the ISC DHCP server has offered failover capabilities that let
network administrators provide a more robust DHCP service. A failover setup requires a
little care, but it’s fairly straightforward to implement.
A simple starting point
#
# /etc/dhcpd.conf for simple network
#
authoritative;
ddns-update-style none;
With this configuration, our server will act as the authoritative DHCP server on the
192.168.200.0 subnet, handing out addresses from 192.168.200.100 to
192.168.200.254 to any host that asks for one.
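The listing above shows only the global statements. A fuller sketch of a configuration matching this description might look like the following. The subnet and address range come from the text, and the 30-minute lease time is the one discussed in the next section. The netmask, the router address, and the use of the same value for default-lease-time are assumptions made for illustration.

```
#
# /etc/dhcpd.conf for simple network (illustrative sketch)
#
authoritative;
ddns-update-style none;

# Netmask and router address are assumptions, not taken from the article.
subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1;

  # 30-minute leases (1800 seconds).
  default-lease-time 1800;
  max-lease-time 1800;

  # Dynamic addresses .100 through .254.
  range 192.168.200.100 192.168.200.254;
}
```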
The problem
Our configuration will work fine until the DHCP server goes off-line. The cause of its
demise might be a hardware failure, a power outage, or even an OS upgrade; it doesn’t
matter. Once it’s gone, all DHCP client hosts will lose their network configurations
within 30 minutes (our maximum lease time).
We could just bring another DHCP server online in its place, but the information about
leases will be lost, possibly forcing clients to acquire new addresses. In that situation,
clients would have to break any existing network connections. In some cases, local X
sessions would also break. (If you’re bored sometime, try changing the hostname of your
machine when running a live X desktop. The recovery process can be amusing.)
Alternatively, we could plan for downtime by increasing lease times from 30 minutes
to the better part of a day. That would reduce, but not completely remove, the risk of
any given client having its lease expire while the server is off-line. Even so, any newly
arriving client would not get an address.
Configuring the primary
A failover setup adds two pieces to the primary’s dhcpd.conf.
The first is a failover peer section that identifies the primary and secondary servers. In
the example below, it’s called “dhcp-failover,” but it can be any string that works
for you. The example identifies the two DHCP servers by address, but you can use
DNS names as well. In the past couple of years, TCP ports 647 (primary) and 847
(peer) have emerged as the standard bindings for DHCP failover. It’s worth
noting that as recently as 2005, the dhcpd.conf(5) man page used ports 519 and
520 in its failover example, but 647 and 847 look like good choices as of 2008.
The dhcpd.conf(5) man page says that the primary port and the peer port may
be the same number. That’s the configuration I deploy, using port 647 for both
the primary and the peer.
The second is a reference to the failover peer in each pool section for which the
failover pair is active. In our example, we have only one pool section, so we add a
single reference to our failover peer set.
#
# /etc/dhcpd.conf for primary DHCP server
#
authoritative;
ddns-update-style none;
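Only the opening statements of the primary’s configuration appear above. The following is a minimal sketch of how the rest might look, assuming the peer name and port 647 from the text. The two server addresses (192.168.200.2 and 192.168.200.3), the router address, and every timing and tuning value are assumptions chosen for illustration, not the author’s settings.

```
#
# /etc/dhcpd.conf for primary DHCP server (illustrative sketch)
#
authoritative;
ddns-update-style none;

failover peer "dhcp-failover" {
  primary;                      # this server is the primary
  address 192.168.200.2;        # assumed address of this server
  port 647;
  peer address 192.168.200.3;   # assumed address of the secondary
  peer port 647;
  max-response-delay 30;        # assumed value
  max-unacked-updates 10;       # assumed value
  load balance max seconds 3;   # assumed value
  mclt 1800;                    # assumed value; declared on the primary only
  split 128;                    # assumed value; declared on the primary only
}

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1; # assumed router address
  pool {
    failover peer "dhcp-failover";
    max-lease-time 1800;        # 30 minutes
    range 192.168.200.100 192.168.200.254;
  }
}
```

A split of 128 asks the two servers to share the load roughly evenly. The mclt and split statements belong only in the primary’s failover peer section.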
Configuring the secondary
#
# /etc/dhcpd.conf for secondary DHCP server
#
authoritative;
ddns-update-style none;
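As with the primary, only the opening statements survive above. A sketch of a matching secondary configuration follows, under the same assumptions: the server addresses, router address, and tuning values are illustrative. The secondary declares itself with the secondary keyword, swaps the address and peer address values, and omits mclt and split.

```
#
# /etc/dhcpd.conf for secondary DHCP server (illustrative sketch)
#
authoritative;
ddns-update-style none;

failover peer "dhcp-failover" {
  secondary;                    # this server is the secondary
  address 192.168.200.3;        # assumed address of this server
  port 647;
  peer address 192.168.200.2;   # assumed address of the primary
  peer port 647;
  max-response-delay 30;        # assumed value
  max-unacked-updates 10;       # assumed value
  load balance max seconds 3;   # assumed value
}

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1; # assumed router address
  pool {
    failover peer "dhcp-failover";
    max-lease-time 1800;        # 30 minutes
    range 192.168.200.100 192.168.200.254;
  }
}
```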
The folks at ISC note that the DHCP failover protocol is still under development, which
makes it sort of a moving target. As a result, they strongly suggest that the primary and
secondary servers both be running the same version of dhcpd.
SELinux notes
As noted, running dhcpd in failover mode involves opening a TCP port for
communication with the peer server. The SELinux policy distributed with CentOS 4 and
5 allows dhcpd to send packets over ports 647 and 847, but you’ll need to tweak the
policy if you want to use different ports.
The instructions below apply specifically to CentOS 4 (and, by extension, to Red Hat
Enterprise Linux 4), though I suspect that they would also work on Fedora Core 3 and
4.
What the logs will show
When the primary comes back, the log will say (among other things)
The other main difference in the logs will be the presence of pool reports. In failover
mode, dhcpd will try to ensure that the primary and secondary servers each have a
similar number of free dynamic leases for each pool declared in the configuration file.
As the servers work to keep that balance, they’ll occasionally log their status.
In this case, 75 of the 155 addresses we declared eligible for dynamic assignment
are still available. The primary holds 38 in reserve, the secondary 37. As long as the
values for free and backup differ by no more than one, things are good. Should they
vary by two or more (with a resulting non-zero lts), the pool addresses will be juggled
until balance is restored.
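The arithmetic behind that rule can be sketched as follows. This assumes that dhcpd of this vintage computes lts (“leases to send”) as half the difference between free and backup, discarding the remainder. That is an assumption about the implementation rather than something the logs state, though it is consistent with the figures above.

```latex
\[
\mathrm{lts} = \left\lfloor \frac{\mathrm{free} - \mathrm{backup}}{2} \right\rfloor
             = \left\lfloor \frac{38 - 37}{2} \right\rfloor = 0
\qquad (\mathrm{free} \ge \mathrm{backup})
\]
```

Under the same assumption, a difference of two (say, free 39 and backup 37) would give an lts of 1, which is the point at which the servers start moving addresses.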
Now, the single point of failure is gone. So go hog wild: install those security patches on
your DHCP server that you’d put off because you didn’t want to lose leases!
This article is licensed under a Creative Commons License.