BIND 9 is a mature piece of software, and compared with its predecessors BIND 4 and BIND 8 it is noticeably more stable and secure. One reason for this is the "Design by contract" programming style used by the BIND 9 team; as a result, BIND 9 is very particular about the data it consumes, and about its own internal data structures. Once BIND 9 encounters an unexpected state in its data structures, it terminates the DNS server process rather than continue running with bad data (and thus potentially compromise security).
While this behavior has clear advantages in terms of security, it can adversely affect service uptime: in past years there have been several incidents in which BIND 9 terminated because of issues in its code or data structures, such as "BIND 9 Resolver crashes after logging an error in query.c". And for all its security benefits, an end user unable to reach Facebook may not be terribly understanding in the event of an outage.
The real issue, however, is not that BIND terminates when it comes across bad data, but rather that the process cannot automatically restart after the fact; there is no "supervisor" process in BIND 9.
Some operating systems have a built-in solution: Mac OS X has launchd, and the BIND 9 version Apple delivers with the OS is automatically restarted should it terminate unexpectedly. Solaris has SMF (Service Management Facility), and BIND 9 can be integrated into SMF. Recent versions of Ubuntu, Red Hat Enterprise Linux, SUSE Linux Enterprise, and Fedora all use systemd, which can also monitor processes and restart them if needed.
But for Unix and Linux operating systems that do not ship with a process supervisor solution, supervisord is a strong alternative, with the added benefit of being relatively easy to install and configure. Supervisord comes as a package with many Linux distributions, and also works on BSD distributions.
The configuration below is intended for Red Hat Enterprise Linux 6, but should require only minor tweaks to run on other Unix systems as well.
Installation
Supervisord is written in Python (versions 2.4 to 2.7) and can be installed either from source, in which case we have to download and install all dependencies ourselves, or with the help of setuptools, which takes care of downloading and installing the dependencies (Meld3 and ElementTree).
Full installation instructions can be found at [http://supervisord.org/installing.html]
Automatic Installation
‣ download "setuptools" from [https://pypi.python.org/packages/source/s/setuptools/setuptools-9.1.tar.gz]
shell> tar xfz setuptools-9.1.tar.gz
shell> cd setuptools-9.1
root-shell> python setup.py install
Once setuptools has been installed, run the following command to install Supervisor and all required dependencies:
root-shell> easy_install supervisor
Manual Installation
Supervisor and its dependencies can also be installed manually.
‣ download "setuptools" from [https://pypi.python.org/packages/source/s/setuptools/setuptools-9.1.tar.gz]
shell> tar xfz setuptools-9.1.tar.gz
shell> cd setuptools-9.1
root-shell> python setup.py install
‣ download "Meld3" from [http://www.plope.com/software/meld3/meld3-0.6.5.tar.gz]
shell> tar xfz meld3-0.6.5.tar.gz
shell> cd meld3-0.6.5
root-shell> python setup.py install
‣ download "ElementTree" from [http://effbot.org/media/downloads/elementtree-1.2.6-20050316.tar.gz]
shell> tar xfz elementtree-1.2.6-20050316.tar.gz
shell> cd elementtree-1.2.6-20050316
root-shell> python setup.py install
‣ download "Supervisor" from [https://pypi.python.org/packages/source/s/supervisor/supervisor-3.1.3.tar.gz]
shell> tar xfz supervisor-3.1.3.tar.gz
shell> cd supervisor-3.1.3
root-shell> python setup.py install
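Whichever installation route was used, it can be worth confirming that the Supervisor commands ended up on the PATH before continuing. The following is a minimal sketch; check_cmds is a hypothetical helper name, not something that ships with Supervisor.

```shell
#!/bin/sh
# check_cmds: hypothetical helper that reports which of the given
# commands can be found on the PATH. Returns non-zero if any is missing.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example, to be run after the installation:
#   check_cmds supervisord supervisorctl echo_supervisord_conf
```

If one of the commands is reported as missing, the Python scripts directory is probably not on the PATH of the current shell.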
Installing startscript and sysconfig
‣ download the startscript from [https://raw.githubusercontent.com/Supervisor/initscripts/master/redhat-init-jkoppe] and place it in /etc/init.d/supervisord
root-shell> cp redhat-init-jkoppe /etc/init.d/supervisord
root-shell> chmod +x /etc/init.d/supervisord
‣ download the 'sysconfig' file from [https://raw.githubusercontent.com/Supervisor/initscripts/master/redhat-sysconfig-jkoppe] and place it in /etc/sysconfig/supervisord
root-shell> cp redhat-sysconfig-jkoppe /etc/sysconfig/supervisord
Installing BIND 9 from Men & Mice repositories
‣ download the BIND 9 RPM from [http://support.menandmice.com/download/bind/linux/redhat/6.x/]<arch>/<version>/
root-shell> yum install ISCBIND-<version>-<flavor>RHL<arch>.rpm
root-shell> mkdir /var/named
root-shell> useradd -d /var/named -r named
root-shell> chown -R named: /var/named
‣ create a BIND 9 configuration file '/etc/named.conf'
options { directory "/var/named"; dnssec-validation auto; };
‣ create an 'rndc' configuration
root-shell> rndc-confgen -a
‣ verify the configuration
root-shell> named-checkconf -z
A basic configuration file for BIND 9 "named"
Below is my basic /etc/supervisord.conf configuration file for one service, the BIND 9 DNS Server:
[unix_http_server]
file = /tmp/supervisor.sock
chmod = 0777
chown= nobody:nobody
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[supervisord]
logfile = /var/log/supervisord.log
logfile_maxbytes = 10MB
logfile_backups=10
loglevel = info
pidfile = /var/run/supervisord.pid
identifier = supervisor
directory = /tmp
[program:named]
command=/usr/sbin/named -u named -f
process_name=%(program_name)s
numprocs=1
directory=/var/named
priority=100
autostart=true
autorestart=unexpected
startsecs=5
startretries=3
exitcodes=0,2
stopsignal=TERM
stopwaitsecs=10
redirect_stderr=false
stdout_logfile=/var/log/named_supervisord.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=10
stdout_capture_maxbytes=1MB
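The interplay of autorestart=unexpected and exitcodes=0,2 in the [program:named] section can be illustrated with a small model: an exit code that appears in the exitcodes list counts as expected and does not trigger a restart, while anything else does. The sketch below only mirrors that rule for illustration; should_restart is a hypothetical function and not part of supervisord, and it ignores the startsecs and startretries handling.

```shell
#!/bin/sh
# should_restart: hypothetical model of the autorestart=unexpected rule.
#   $1 = exit code reported for the process
#   $2 = comma-separated list, as in the exitcodes= setting
# Prints "no" if the exit code is listed (expected exit), else "yes".
should_restart() {
    code=$1
    expected=$2
    old_ifs=$IFS
    IFS=,
    for e in $expected; do
        if [ "$e" = "$code" ]; then
            IFS=$old_ifs
            echo "no"
            return 0
        fi
    done
    IFS=$old_ifs
    echo "yes"
}

# Example:
#   should_restart 0 "0,2"   # exit code listed in exitcodes
#   should_restart 1 "0,2"   # exit code not listed
```

A named process that dies from a failed assertion or from a kill signal does not end with one of the listed exit codes, which is why supervisord starts it again.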
Starting supervisord
With the configuration file in place, we can start supervisord. Make sure that BIND 9 is not already running, or you will end up with two instances of the BIND 9 server, which is not recommended. Also make sure that supervisord itself is started when the server reboots, either through a startscript or by other means. Note that the supervisord packages bundled with Linux distributions install a startscript.
root-shell> /etc/init.d/supervisord start
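To guard against the two-instances problem mentioned above, the number of named processes can be counted before running the start command (it should be 0) and again afterwards (it should be 1). This is only a sketch: count_procs is a hypothetical helper that reads one process name per line from standard input.

```shell
#!/bin/sh
# count_procs: hypothetical helper that counts how many input lines are
# exactly equal to the given process name ($1).
count_procs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c -x -- "$1" || true
}

# Example, using the command-name column of ps:
#   ps -e -o comm= | count_procs named
```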
status
root-shell> rndc status
version: 9.9.6-P1 <id:3612d8fb>
number of zones: 98
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
root-shell> ps aux
[...]
root 10906 0.0 2.5 209096 12988 ? Ss 19:55 0:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
named 10908 0.7 1.6 44292 8112 ? S 19:55 0:00 /usr/sbin/named -u named -f
root 10910 0.0 0.2 110228 1156 pts/0 R+ 19:55 0:00 ps aux
root-shell> supervisorctl
named RUNNING pid 10908, uptime 0:03:19
root-shell> chkconfig --add supervisord
root-shell> chkconfig supervisord on
Great, supervisord has started, and it also started the BIND 9 process "named". DNS is working now.
Simulating a BIND 9 crash
To simulate a BIND 9 crash, we "kill" the BIND 9 named process:
root-shell> killall -9 named
Supervisord should detect that the running BIND 9 process has terminated, and start a new one. DNS is still up and running.
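One way to confirm that a restart really happened is to compare the process ID that supervisorctl reports before and after the kill. The helpers below are a sketch with hypothetical names (status_pid, pid_changed); they assume the status line format shown earlier in this article ("named RUNNING pid 10908, uptime 0:03:19").

```shell
#!/bin/sh
# status_pid: hypothetical helper that extracts the pid from one line of
# "supervisorctl status" output. Prints nothing if the line has no pid.
status_pid() {
    printf '%s\n' "$1" | sed -n 's/.* pid \([0-9][0-9]*\),.*/\1/p'
}

# pid_changed: succeeds only if both pids are non-empty and different.
pid_changed() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example, run as root on the DNS server. The sleep is an assumption
# chosen to be longer than startsecs=5 from the configuration above:
#   before=$(status_pid "$(supervisorctl status named)")
#   killall -9 named
#   sleep 10
#   after=$(status_pid "$(supervisorctl status named)")
#   pid_changed "$before" "$after" && echo "named was restarted"
```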
Controlling supervisord
Supervisord can be controlled from the command line using the supervisorctl command. A list of all control commands can be found with "help", and a description of each command with "help command":
shell> supervisorctl help
default commands (type help <topic>):
=====================================
add clear fg open quit remove restart start stop update
avail exit maintail pid reload reread shutdown status tail version
shell> supervisorctl help status
status                  Get all process status info.
status <name>           Get status on a single process by name.
status <name> <name>    Get status on multiple named processes.
shell> supervisorctl status named
named RUNNING pid 25770, uptime 0:00:12
shell> supervisorctl stop named
named: stopped
shell> supervisorctl start named
named: started
Now, whenever an assertion is triggered in the code, BIND 9 will still terminate, but supervisord will bring it back from the dead. Your DNS service stays up, and your users and customers stay happy.
Read the supervisord documentation on how to set up event notifications, so that you receive an e-mail should BIND 9 restart. If the outage turns out to be caused by a security vulnerability, you might want to report it to bind9-bugs@isc.org as well.
Of course supervisord can be used to restart other processes as well, including other types of DNS Servers (NSD, Unbound, dnsmasq ...).