RSystems


Disaster Recovery

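Also known as: DR. Related terms: business continuity planning (BCP), which covers the wider organization and not just IT, and BCDR (business continuity and disaster recovery), which combines the two.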

Disaster recovery (DR) is the set of policies, procedures, and technologies that enable an organization to restore IT systems and data after a major failure — a ransomware attack, hardware failure, natural disaster, or facility loss.

Disaster recovery planning answers the question: if we lost everything in our primary environment right now, how would we recover, how long would it take, and how much data would we lose?

Two metrics define a DR posture:

RTO (Recovery Time Objective) — how long the organization can tolerate being down. An hour? A day? A week? The RTO drives the required investment: a one-hour RTO demands a standby environment that is already running and can take over quickly, while a 24-hour RTO leaves room for the more economical option of restoring from backups.

RPO (Recovery Point Objective) — how much data loss is acceptable. If RPO is 4 hours, you need backups at least every 4 hours. If RPO is near-zero, you need real-time replication.
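Both metrics reduce to simple arithmetic checks. The Python sketch below assumes periodic backups that always succeed and a restore limited only by transfer throughput; the function names and the 2 TB and 200 MB/s figures are hypothetical illustrations, not measured values.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # A failure just before the next scheduled backup loses (almost) a full
    # interval of changes. Ignores backup duration and failed backup jobs.
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval) <= rpo


def estimated_restore_time(data_bytes: int, throughput_bytes_per_s: float) -> timedelta:
    # Raw transfer time only.
    return timedelta(seconds=data_bytes / throughput_bytes_per_s)


def meets_rto(
    data_bytes: int,
    throughput_bytes_per_s: float,
    rto: timedelta,
    overhead: timedelta = timedelta(0),
) -> bool:
    # `overhead` is a stand-in for provisioning, verification, and app start-up.
    return estimated_restore_time(data_bytes, throughput_bytes_per_s) + overhead <= rto


TWO_TB = 2 * 10**12          # hypothetical dataset size in bytes
RESTORE_RATE = 200 * 10**6   # hypothetical restore throughput: 200 MB/s

print(meets_rpo(timedelta(hours=4), timedelta(hours=4)))     # True
print(meets_rpo(timedelta(hours=6), timedelta(hours=4)))     # False
print(estimated_restore_time(TWO_TB, RESTORE_RATE))          # 2:46:40
print(meets_rto(TWO_TB, RESTORE_RATE, timedelta(hours=1)))   # False
print(meets_rto(TWO_TB, RESTORE_RATE, timedelta(hours=24)))  # True
```

Under these assumptions, a 4-hour backup interval just meets a 4-hour RPO, and restoring 2 TB at 200 MB/s takes about 2 hours 47 minutes of transfer alone: far too slow for a one-hour RTO, comfortably inside a 24-hour one.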

DR tiers

Backup and restore — lowest cost, longest recovery time. Restore from backup after an incident. Recovery time measured in hours to days.

Warm standby — a secondary environment that runs scaled-down infrastructure, continuously updated via replication. Can be brought to full capacity in minutes to hours.

Hot standby / active-active — full parallel environment, always running. Failover is nearly instantaneous. Highest cost.
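Choosing a tier amounts to picking the cheapest option whose realistic recovery time still fits inside the RTO. The Python sketch below encodes that rule; the recovery times in `ASSUMED_RECOVERY_TIME` are hypothetical placeholders chosen to sit within the ranges above, and in practice they should come from timed recovery drills.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Tier(Enum):
    BACKUP_AND_RESTORE = "backup and restore"
    WARM_STANDBY = "warm standby"
    HOT_STANDBY = "hot standby / active-active"


# Hypothetical recovery times, not benchmarks.
ASSUMED_RECOVERY_TIME = {
    Tier.BACKUP_AND_RESTORE: timedelta(hours=8),
    Tier.WARM_STANDBY: timedelta(minutes=30),
    Tier.HOT_STANDBY: timedelta(minutes=1),
}

# Cheapest first, matching the cost ordering of the tiers.
TIERS_BY_COST = [Tier.BACKUP_AND_RESTORE, Tier.WARM_STANDBY, Tier.HOT_STANDBY]


def cheapest_tier(rto: timedelta) -> Optional[Tier]:
    """Return the cheapest tier whose assumed recovery time fits the RTO."""
    for tier in TIERS_BY_COST:
        if ASSUMED_RECOVERY_TIME[tier] <= rto:
            return tier
    return None  # no tier meets this RTO under these assumptions


print(cheapest_tier(timedelta(hours=24)).value)  # backup and restore
print(cheapest_tier(timedelta(hours=1)).value)   # warm standby
```

This considers RTO alone. The RPO can force a higher tier on its own: backup and restore cannot deliver a near-zero RPO, which needs the continuous replication that warm and hot standby provide.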

Backup fundamentals

Regardless of tier, backup hygiene is the foundation:

3-2-1 rule — three copies of data, on two different media types, with one copy offsite.

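Immutable backups — backups that cannot be modified or deleted until their retention period expires, so ransomware that reaches the production environment cannot encrypt or delete them. Immutability is typically enforced with write-once (WORM) storage or object locking; offline, air-gapped copies serve the same purpose.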

Test restores — an untested backup is not a backup. Regular restore tests confirm that backup data is actually recoverable.
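The 3-2-1 rule and restore testing can both be checked mechanically. In this Python sketch, `BackupCopy` and its fields are a hypothetical inventory format, and a restore test is reduced to comparing the SHA-256 digest of a restored file against a digest assumed to have been recorded at backup time.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    # Hypothetical inventory record for one copy of the data.
    location: str  # e.g. "primary-db", "nas-01", "cloud-bucket"
    media: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool  # stored outside the primary facility?


def satisfies_3_2_1(copies: List[BackupCopy]) -> bool:
    """Three copies, on at least two media types, with at least one offsite.

    Whether the production copy counts toward the three is a policy choice;
    include it in `copies` only if your policy counts it.
    """
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restores need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(recorded_digest: str, restored_path: Path) -> bool:
    """True if the restored file is byte-for-byte what was backed up."""
    return sha256_of(restored_path) == recorded_digest
```

A matching digest shows the bytes came back intact; a full restore test should also confirm that the application actually starts and serves the restored data.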

DR is not the same as high availability (HA). HA handles component failures within a running system — a server failing in a cluster. DR handles the loss of the entire environment.