When disaster recovery plans fail in real disasters
Do you feel like you’re ready for a disaster? If the answer is no, you’re not alone—not by a long shot.
According to Gartner, only 11% of IT leaders with disaster recovery plans feel fully prepared for a disaster. Asked why, they cite manual recovery techniques (47%), lack of testing (46%), and juggling multiple recovery tools (37%) as their top barriers.
If you’re already being asked to do more with less, it’s fair to wonder how you’re supposed to keep critical systems running while also carving out time to plan for disasters that may never come. Forrester has found that many businesses are falling behind in their overall DR preparedness.
You’re up against real challenges, but you’re not without options for addressing them. Here’s why DR plans often aren’t as bulletproof as they should be, why DR solutions fail during crises, and what you can do about it.
The testing gap that undermines DR planning
As Gartner found, the largest share of IT leaders (33%) revisit their disaster recovery plans only on a quarterly basis. And since nearly half cite a lack of testing as a barrier, “addressing” those plans may mean little more than confirming the backups are running and updating contact lists.
If you’re in their shoes, you might find out what’s really going on at the worst possible moment. Your backup may be intact, but the service account password could have expired. The data may restore perfectly, but critical dependencies might not have been backed up. Or, crucial firewall rules simply might not exist in your DR site, making it impossible to cut over.
Nobody wants any of these outcomes, but comprehensive testing probably doesn’t feel realistic, either. You can’t take production offline for a weekend, and your three-person team is already pulling long hours just keeping up with patches and after-hours tickets. So you make sure your backups complete, restore a few files to prove the system works, and hope that’s enough when the time comes.
Common challenges with cloud and on-prem DR
When IDC asked organizations what mattered most for their cloud investments in 2024, disaster recovery and backup topped the list. Cloud providers promise geographic redundancy, automated backups, and freedom from hardware failures.
But when ransomware actors compromise your cloud credentials, they can simultaneously encrypt everything in multiple locations. That “backup” your cloud provider offers might turn out to be versioning with 30-day retention, which helps with accidental deletions but won’t save you from attackers who’ve been camping out in your environment for months.
On-prem has its own drawbacks. IT teams may feel like they have control because the servers sit in their own building or colo site, but if only one person understands your backup infrastructure, you have a single point of failure in human form, and that person can go on vacation at the worst possible time.
What disaster recovery can actually look like
Vendors may measure Recovery Time Objectives (RTOs) under perfect conditions, while your actual recovery scenario could look radically different: multiple systems are compromised at 3 AM on Christmas, you’re working with a skeleton crew, executives are demanding hourly updates, and you’re trying to figure out if Tuesday’s backup or Thursday’s is the last clean one.
Your DR runbook assumes everything works the same way in your recovery site, but your DR hardware is two generations older and that application that barely runs on current servers won’t start at all. The network bandwidth that looked fine for nightly replication grinds to a halt when you try to run full production traffic through it.
Every hour of delay compounds the pressure when you’re in firefighting mode. Email might need to come back today, while that archive of marketing materials from 2019 can wait weeks, or may never get restored at all. You might even briefly consider paying that ransom just to make it stop. Recovery will be messy, with data loss and systems that need rebuilding. If the business survives and customers stay, that’s a win.
Making DR work better with what you have
The gap between DR marketing and DR reality won’t close soon, but you can narrow it. Here’s how to build better disaster recovery with the resources you actually have.
- Start with a DR card for each critical system
This isn’t a 50-page disaster recovery plan nobody reads—it’s a two-page cheat sheet someone can follow at 3 AM. Document what actually needs to happen to restore each system, not what the manual says. Include granular details like the weird workarounds and the specific settings that matter. Note which systems have to come up first and what breaks if they don’t.
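If it helps to picture the level of detail, here’s a rough sketch of the fields such a card might capture, kept as structured data next to your runbooks. Every name, path, and step below is a made-up placeholder, not a recommendation for your environment:

```python
# A hypothetical DR card kept as structured data; every value below is a
# placeholder you would replace with your own environment's details.
dr_card = {
    "system": "orders-db",
    "priority": 1,                          # restore order relative to other systems
    "depends_on": ["active-directory", "core-switch-vlan-20"],
    "backup_location": r"\\backup01\sql\orders",   # where the latest backup actually lives
    "service_account": "svc-orders (password in the offline vault, drawer 2)",
    "restore_steps": [
        "Restore full backup, then the latest differential",
        "Re-run the post-restore grants script",
        "Restart the app pool on web01, or logins fail silently",
    ],
    "known_gotchas": [
        "App refuses to start if the SQL collation differs from production",
        "Firewall rule for port 1433 does not exist in the DR site by default",
    ],
    "last_tested": "2024-11-03",
    "verified_restore_time_minutes": 95,
}
```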
- Set up backup validation that actually works
Rather than simply confirming the backup job succeeded, automate weekly restores of random files and verify they’re readable. Make sure your backup includes everything needed for recovery, including system state, certificates, and those easily missed application dependencies. If you’re backing up databases, periodically restore one to a test server and verify the application can actually connect to it.
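As a minimal sketch of what automated spot-checking could look like, the script below assumes a restore job has already dropped a sample of files into a staging directory and that you keep a manifest of SHA-256 hashes recorded at backup time. The paths and manifest format are assumptions, not any particular backup product’s API:

```python
"""Spot-check restored files against a hash manifest.

Assumes a restore job has already placed files under RESTORE_DIR and that
MANIFEST maps relative paths to SHA-256 hashes recorded at backup time.
Both paths and the manifest format are illustrative assumptions.
"""
import hashlib
import json
import random
from pathlib import Path

RESTORE_DIR = Path("/mnt/restore-staging")              # assumed restore target
MANIFEST = Path("/mnt/restore-staging/manifest.json")   # assumed manifest location
SAMPLE_SIZE = 25

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def main() -> None:
    expected = json.loads(MANIFEST.read_text())   # {"relative/path": "hash", ...}
    sample = random.sample(sorted(expected), min(SAMPLE_SIZE, len(expected)))
    failures = []
    for rel_path in sample:
        restored = RESTORE_DIR / rel_path
        if not restored.exists():
            failures.append(f"{rel_path}: missing from restore")
        elif sha256(restored) != expected[rel_path]:
            failures.append(f"{rel_path}: hash mismatch (possible corruption)")
    if failures:
        # In practice you'd page someone or open a ticket here.
        raise SystemExit("Backup validation FAILED:\n" + "\n".join(failures))
    print(f"Validated {len(sample)} restored files against the manifest.")

if __name__ == "__main__":
    main()
```

Run from cron or a scheduled task, a failure here tells you about a broken backup weeks before a disaster does.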
- Build muscle memory with monthly micro-tests
You can’t take production down for a weekend, but you can restore one critical database to a test server during lunch. Next month, restore a different system. Document what broke and what took longer than expected. These small tests reveal the gaps that would grind everything to a halt during a real disaster.
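One low-effort way to capture what took longer than expected is to wrap each micro-test in a timer and append the result to a running log you keep month over month. The sketch below is one possible shape for that; the restore script it times is hypothetical:

```python
"""Append micro-test results to a CSV so monthly restore tests build a record.

The command being timed is a stand-in: swap in whatever script or job
actually performs this month's test restore.
"""
import csv
import subprocess
import time
from datetime import date
from pathlib import Path

LOG = Path("dr_micro_tests.csv")  # assumed location for the running log

def run_micro_test(name: str, command: list[str]) -> None:
    start = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed_min = (time.monotonic() - start) / 60
    row = [date.today().isoformat(), name, f"{elapsed_min:.1f}",
           "ok" if result.returncode == 0 else f"failed ({result.returncode})"]
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "system", "minutes", "outcome"])
        writer.writerow(row)

if __name__ == "__main__":
    # Example: time a database restore; the script name is hypothetical.
    run_micro_test("orders-db", ["./restore_orders_db_to_test.sh"])
```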
- Focus your limited testing on what kills the business if it fails
Your Active Directory (especially if you’re hybrid with Azure), line-of-business applications, and SQL databases probably matter more than that SharePoint site nobody visits. Test recovering these systems you actually control first. Figure out how long each one takes to restore and what has to happen in what order. These are the systems that will get priority when everything’s on fire.
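If it helps to make that order explicit, you can write the dependencies down as a small graph and let a topological sort produce a legal restore sequence. The systems and dependencies below are placeholders for your own:

```python
"""Compute a restore order from declared dependencies (placeholder systems).

Each system comes up only after everything it depends on; graphlib's
TopologicalSorter (Python 3.9+) produces one valid ordering.
"""
from graphlib import TopologicalSorter

# system -> set of systems it depends on (all names are examples)
depends_on = {
    "active-directory": set(),
    "dns": {"active-directory"},
    "sql-cluster": {"active-directory", "dns"},
    "line-of-business-app": {"sql-cluster"},
    "file-shares": {"active-directory"},
    "sharepoint-archive": {"sql-cluster"},   # deliberately last in line
}

restore_order = list(TopologicalSorter(depends_on).static_order())
print("Restore in this order:", " -> ".join(restore_order))
```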
- Create runbooks that assume your expert isn’t available
Write them for the junior tech who’s on call, not the senior admin who built the system. Include screenshots, specific commands, and common error messages with their fixes. Test these runbooks by having someone unfamiliar with the system follow them. If they get stuck, tweak the documentation accordingly.
- Know your cloud provider’s shared responsibility model
Calculate the real cost of recovery, including egress fees and compute time for running in DR mode. Set up billing alerts so you know when disaster recovery is eating into your annual budget. Consider keeping a local copy of critical data even if you’re cloud-first, because downloading 40TB over your internet connection could take far longer than your company can tolerate.
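To put rough numbers on that last point, the back-of-the-envelope math looks like this; the link speed and per-GB egress rate are illustrative stand-ins for your own bandwidth and your provider’s actual pricing:

```python
# Back-of-the-envelope estimate for pulling a full copy of your data out of
# the cloud. Bandwidth and egress pricing below are illustrative assumptions.
data_tb = 40
link_gbps = 1.0                 # usable internet bandwidth, in gigabits/sec
egress_per_gb = 0.09            # example per-GB egress rate; check your provider

data_gb = data_tb * 1000                               # decimal TB -> GB
transfer_hours = (data_gb * 8) / (link_gbps * 3600)    # GB -> gigabits, then hours
print(f"Transfer time at {link_gbps} Gbps: ~{transfer_hours / 24:.1f} days")
print(f"Egress cost at ${egress_per_gb}/GB: ~${data_gb * egress_per_gb:,.0f}")
```

At those assumed numbers, the pull alone takes close to four days and a few thousand dollars in egress, before anyone has restored a single system.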
- Have the awkward conversation with leadership about realistic recovery
Show your executives what recovering actually looks like with your current resources. Walk through real-world scenarios and timelines—for example, what happens when you’re down to two people and a critical system needs rebuilding from scratch.
An ounce of prevention is worth a pound of cure
Perfect disaster recovery is a luxury most businesses can’t afford. Your next disaster won’t wait for immaculate preparation, but if you know which systems you can actually restore, how long it really takes, and what will stay broken no matter what, you’re already ahead of most organizations. That knowledge will determine whether and how swiftly your business recovers after a disaster strikes.