Cloud & Infrastructure · Security
Disaster Recovery
Also known as: DR, Business Continuity, BCP, BCDR
Disaster recovery (DR) is the set of policies, procedures, and technologies that enable an organization to restore IT systems and data after a major failure — a ransomware attack, hardware failure, natural disaster, or facility loss.
Disaster recovery planning answers the question: if we lost everything in our primary environment right now, how would we recover, how long would it take, and how much data would we lose?
Two metrics define a DR posture:
RTO (Recovery Time Objective) — how long the organization can tolerate being down: an hour, a day, a week? The answer drives the investment required. A one-hour RTO demands near-real-time failover; a 24-hour RTO can be met with a more economical restore from backup.
RPO (Recovery Point Objective) — how much data loss is acceptable. If RPO is 4 hours, you need backups at least every 4 hours. If RPO is near-zero, you need real-time replication.
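The RPO arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names are hypothetical, and it assumes a backup captures the state at the moment it starts and only becomes usable once it finishes, so a long-running backup adds to the worst-case loss.

```python
from datetime import timedelta


def worst_case_data_loss(
    backup_interval: timedelta,
    backup_duration: timedelta = timedelta(0),
) -> timedelta:
    """Upper bound on data lost if a failure hits just before the next backup completes.

    Assumption: a backup reflects the state at its start time and is not
    restorable until it has finished writing.
    """
    return backup_interval + backup_duration


def meets_rpo(
    backup_interval: timedelta,
    rpo: timedelta,
    backup_duration: timedelta = timedelta(0),
) -> bool:
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    # Backups every 4 hours, assumed instantaneous: exactly meets a 4-hour RPO.
    print(meets_rpo(timedelta(hours=4), rpo))
    # Same schedule, but each backup takes 30 minutes to complete.
    print(meets_rpo(timedelta(hours=4), rpo, backup_duration=timedelta(minutes=30)))
```

Under these assumptions, a schedule that looks compliant on paper can miss the RPO once backup duration is counted, which is one reason to shorten the interval below the RPO rather than match it exactly.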
DR tiers
Backup and restore — lowest cost, longest recovery time. Restore from backup after an incident. Recovery time measured in hours to days.
Warm standby — a secondary environment that runs scaled-down infrastructure, continuously updated via replication. Can be brought to full capacity in minutes to hours.
Hot standby / active-active — full parallel environment, always running. Failover is nearly instantaneous. Highest cost.
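One way to reason about the tiers is to pick the cheapest one whose typical recovery time still fits the RTO. The sketch below assumes illustrative recovery-time ceilings (24 hours, 4 hours, 1 minute) and a relative cost ranking; these numbers are placeholders chosen to match the ranges described above, not benchmarks, and real values depend on the environment.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    typical_recovery: timedelta  # assumed ceiling, for illustration only
    relative_cost: int           # 1 = cheapest, 3 = most expensive


# Hypothetical figures: adjust to measured recovery times from real DR tests.
TIERS = [
    Tier("backup and restore", timedelta(hours=24), 1),
    Tier("warm standby", timedelta(hours=4), 2),
    Tier("hot standby / active-active", timedelta(minutes=1), 3),
]


def cheapest_tier_for(rto: timedelta) -> Tier:
    """Return the lowest-cost tier whose assumed recovery time fits within the RTO."""
    candidates = [t for t in TIERS if t.typical_recovery <= rto]
    if not candidates:
        raise ValueError(f"no tier in this model recovers within {rto}")
    return min(candidates, key=lambda t: t.relative_cost)


if __name__ == "__main__":
    for rto in (timedelta(hours=24), timedelta(hours=6), timedelta(hours=1)):
        print(rto, "->", cheapest_tier_for(rto).name)
```

The point of the model is the trade-off, not the thresholds: tightening the RTO removes the cheaper options from the candidate list.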
Backup fundamentals
Regardless of tier, backup hygiene is the foundation:
3-2-1 rule — three copies of data, on two different media types, with one copy offsite.
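Immutable backups — backups that cannot be altered or deleted until a retention period expires, so ransomware can't encrypt or remove them. Typically implemented with write-once (WORM) storage or object locking, and often combined with an offsite or air-gapped copy.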
Test restores — an untested backup is not a backup. Regular restore tests confirm that backup data is actually recoverable.
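Two of these checks are easy to automate. The sketch below is a hedged example: `BackupCopy`, `satisfies_3_2_1`, and `restore_matches` are hypothetical names, the inventory is assumed to list every copy of the data (including the primary), and the restore test assumes a SHA-256 checksum was recorded when the backup was taken.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BackupCopy:
    location: str    # e.g. "primary-dc", "cloud-bucket" (illustrative labels)
    media_type: str  # e.g. "disk", "tape", "object-storage"
    offsite: bool


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: >= 3 copies, >= 2 media types, >= 1 offsite copy."""
    return (
        len(copies) >= 3
        and len({c.media_type for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(restored_file: Path, expected_sha256: str) -> bool:
    """Compare a restored file against the checksum recorded at backup time."""
    return sha256_of(restored_file) == expected_sha256


if __name__ == "__main__":
    inventory = [
        BackupCopy("primary-dc", "disk", offsite=False),
        BackupCopy("local-nas", "disk", offsite=False),
        BackupCopy("cloud-bucket", "object-storage", offsite=True),
    ]
    print(satisfies_3_2_1(inventory))
```

A checksum match only shows that the bytes came back intact. A fuller restore test would also start the application against the restored data and time the whole process against the RTO.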
DR is not the same as high availability (HA). HA handles component failures within a running system — a server failing in a cluster. DR handles the loss of the entire environment.