British Airways is still reeling after a weekend IT system outage that affected more than 1,000 flights and stranded approximately 75,000 passengers at Heathrow and Gatwick airports. Some sources speculate that the compensation costs could match, if not considerably exceed, the $100 million that last year’s crippling IT failure cost Delta Air Lines.
Statements from British Airways blamed the IT meltdown on a power supply issue at a data center, while ruling out any possibility of a cyber attack. Though it’s far too early to speculate on exactly how a power supply problem could knock a thousand flights off schedule, one thing is certain: British Airways’ Disaster Recovery Plan failed spectacularly - where system redundancy should’ve kicked in, there was none.
British Airways’ woes serve as an unpleasant, but urgent, reminder that the way we back up our systems is sometimes even more critical than how we run them day-to-day. As it goes with life insurance or a last will and testament, there’s no point in waiting until your plane goes down (or fails to go up) before you start getting your house in order.
The most effective way of providing ‘life insurance’ for your network is to make sure that exact copies of critical services, such as DNS, are replicated to locations away from your own data centers, thereby providing system redundancy. That way, if your data centers are knocked out by power failure, human error or malicious cyber activity, the critical service stays active, ensuring service continuity and retaining critical operational data. It also keeps your passengers happy in the air, instead of sleeping on yoga mats in conference centers.
So how do you make your DNS redundant?
In a traditional DNS setup, a DNS master-slave deployment is used to maintain network availability, with one DNS server, the master, as the single writable source (see Diagram 1). Other DNS servers, or slaves, serve as back-ups, but rely on the availability of the master for new data. If the master becomes unavailable, critical DNS zones cannot be changed, and as ‘inferior’ entities, slaves can only serve their copies of the zones temporarily in the absence of their master.
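The "temporarily" above can be made concrete. In standard DNS, each zone's SOA record carries an EXPIRE value: once a slave has gone that long without a successful refresh from its master, it stops treating its copy as authoritative. The following is a minimal sketch of that rule only, not of any particular DNS server; the `SecondaryZone` class, the `can_still_serve` function and the seven-day expire value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SecondaryZone:
    """State a slave (secondary) server keeps for one zone. Hypothetical model."""
    last_refresh: float  # time in seconds of the last successful sync with the master
    expire: float        # SOA EXPIRE value in seconds


def can_still_serve(zone: SecondaryZone, now: float) -> bool:
    """Return True while the slave's copy is still inside its expire window."""
    return (now - zone.last_refresh) < zone.expire


DAY = 24 * 3600

# Assumed example values: last sync at t=0, zone expires after 7 days.
zone = SecondaryZone(last_refresh=0.0, expire=7 * DAY)

print(can_still_serve(zone, now=3 * DAY))  # master down for 3 days -> True
print(can_still_serve(zone, now=8 * DAY))  # master down for 8 days -> False
```

Once the expire window has passed, the slave in this model no longer answers for the zone at all, which is why an unreachable master eventually becomes a full outage rather than just a freeze on changes.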
Depending exclusively on a master-slave deployment poses a significant risk to a company in the event of any DNS outage. The risk is compounded when automation has been built on top of the DNS infrastructure, as the automation piece will halt until the master has been restored, or a slave has been manually promoted to the status of master. However, manual change, especially on networks serving hundreds of thousands of internal and external customers, is not only very complicated, but carries a huge potential for error. When combined with time pressure and the complexities of siloed teams and applications, reverting to manual change can too easily lead to disaster.
DNS redundancy is the process of expanding the choice of available DNS nameservers and distributing them between separate networks - in short, keeping your DNS data replicated in many places and pointing your zones at all of them.
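One rough way to check whether a set of nameservers really is spread across separate networks is to group their addresses by network prefix. The sketch below does this for IPv4 addresses using Python's standard `ipaddress` module. The function names, the /24 prefix length and the sample addresses (taken from the reserved documentation ranges) are assumptions for illustration; a /24 is only a crude proxy, and a real audit would also look at providers, autonomous systems and physical locations.

```python
import ipaddress


def distinct_networks(nameserver_ips, prefix_len=24):
    """Group IPv4 nameserver addresses by the network they fall into."""
    networks = set()
    for ip in nameserver_ips:
        # strict=False lets us pass a host address and get its enclosing network.
        networks.add(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
    return networks


def is_network_redundant(nameserver_ips, minimum=2):
    """True if the nameservers sit in at least `minimum` separate networks."""
    return len(distinct_networks(nameserver_ips)) >= minimum


# Hypothetical nameserver addresses (documentation ranges, not real servers).
same_site = ["192.0.2.10", "192.0.2.11"]
spread_out = ["192.0.2.10", "198.51.100.5", "203.0.113.7"]

print(is_network_redundant(same_site))   # False: both servers share one network
print(is_network_redundant(spread_out))  # True: three separate networks
```

Two nameservers in the same network count as one here, which mirrors the point of the definition: extra servers only add redundancy if they do not share the same point of failure.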
To further limit risk, companies are increasingly turning to storing their critical external DNS zones on-premises, as well as with more than one specialized DNS or cloud provider that possesses the security, equipment and expertise to handle large amounts of DNS traffic from a variety of sources successfully. Ideally, the most effective redundant DNS architecture will have multiple masters, each possessing the advanced functionality to act as a primary server responding to DNS queries (see Diagram 2). Keeping the DNS records on multiple masters up to date and in sync can prove a challenge, but one that is far outweighed by the ultimate benefits of continuous high availability.
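The sync challenge can be illustrated with a small drift check. The sketch below assumes the records of a zone have already been exported from each location as simple (name, type, value) tuples; it fingerprints each copy and reports the locations whose copy differs from a chosen reference. The function names, the tuple format and the provider labels are hypothetical, and this shows only how drift might be detected, not how any specific product keeps masters in sync.

```python
import hashlib


def zone_fingerprint(records):
    """Hash a zone's records so that ordering and name case do not matter."""
    normalized = sorted(
        (name.lower().rstrip("."), rtype.upper(), value)
        for name, rtype, value in records
    )
    digest = hashlib.sha256()
    for name, rtype, value in normalized:
        digest.update(f"{name}\t{rtype}\t{value}\n".encode("utf-8"))
    return digest.hexdigest()


def out_of_sync(zones_by_provider, reference):
    """Return the providers whose copy differs from the reference copy."""
    expected = zone_fingerprint(zones_by_provider[reference])
    return sorted(
        provider
        for provider, records in zones_by_provider.items()
        if zone_fingerprint(records) != expected
    )


# Hypothetical zone copies held on-premises and at two external providers.
zones = {
    "on_prem": [("www.example.com.", "A", "192.0.2.10"),
                ("mail.example.com.", "A", "192.0.2.20")],
    "provider_a": [("MAIL.example.com", "a", "192.0.2.20"),
                   ("www.example.com", "A", "192.0.2.10")],
    "provider_b": [("www.example.com.", "A", "192.0.2.10")],  # missing a record
}

print(out_of_sync(zones, reference="on_prem"))  # ['provider_b']
```

A check like this could run after every change, so that a copy that has fallen behind is noticed before it is the only one left answering queries.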
Why make your DNS redundant?
Sensible as it may seem, maintaining DNS redundancy is an IT expense that many enterprises try to avoid in order to keep operational costs down, a bit like putting off getting life insurance because it feels like such a waste to spend on the what-ifs of tomorrow when all systems seem to be running just fine today. Yet these kinds of short-term savings can too easily turn into a “save a million, lose a billion” scenario, as (quite possibly) several airline bosses have recently discovered the hard way.
Keeping your DNS diverse and distributed is an essential backup mechanism for any company wishing to stay connected, provide services and generate income 24/7/365.
Watch this space in the weeks to come for more information on how to manage redundant DNS complexity from one point of access, gain secure versatility and keep down unexpected expenses.