Backup Integrity and Restore Test Verifier

aka “Backup Prober”

Find out your backups are empty before you need them, not during the incident

Verifies your backups are actually restorable — checks completeness, freshness, and integrity. Runs automated restore tests to catch the silent failures that make backup confidence an illusion.

House Recipe · Work · 5 min

INGREDIENTS

💬 Slack · ✈️ Telegram

PROMPT

Create a skill called "Backup Prober". Verify backup health across my infrastructure:

1. Inventory all backup systems:
   - AWS RDS snapshots: `aws rds describe-db-snapshots`
   - AWS Backup: `aws backup list-backup-jobs`
   - Kubernetes Velero: `velero backup get`
   - Any custom backup scripts (check cron logs)
2. For each backup system, verify:
   - Last successful backup time (is it within the expected RPO?)
   - Backup size (is it reasonable? Has it changed dramatically?)
   - Backup completeness (are all expected databases/volumes included?)
3. If possible, run a restore test:
   - Restore to a test instance
   - Run basic data verification (row counts, key table spot checks)
   - Clean up test resources after verification
4. Generate a DR readiness report:
   - Actual RPO (time since last good backup)
   - Estimated RTO (based on backup size and restore process)
   - Gaps: what's not being backed up that should be

Flag any backup system outside its expected RPO as CRITICAL. If I don't provide an RPO target, default to 24 hours.

How It Works

During its 2017 database outage, GitLab found out that 5 out of 5 of its backup methods had failed. This skill helps you avoid the same surprise by verifying backup health on a regular schedule, before an incident forces the question.
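
The freshness rule from the prompt (flag anything outside its RPO as CRITICAL, default to 24 hours) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `check_freshness`, the returned dictionary shape, and the choice to treat "no successful backup on record" as CRITICAL are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default taken from the prompt: 24 hours when no RPO target is provided.
DEFAULT_RPO = timedelta(hours=24)


def check_freshness(
    last_success: Optional[datetime],
    now: datetime,
    rpo_target: timedelta = DEFAULT_RPO,
) -> dict:
    """Return the actual RPO and a severity for one backup system (hypothetical helper)."""
    if last_success is None:
        # Assumption: no successful backup on record counts as CRITICAL.
        return {"actual_rpo": None, "severity": "CRITICAL"}
    # Actual RPO = time elapsed since the last good backup.
    actual_rpo = now - last_success
    severity = "CRITICAL" if actual_rpo > rpo_target else "OK"
    return {"actual_rpo": actual_rpo, "severity": severity}


# Illustrative timestamps only; real values would come from the backup inventory.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = check_freshness(datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc), now)
stale = check_freshness(datetime(2023, 12, 31, 3, 0, tzinfo=timezone.utc), now)
print(fresh["severity"], stale["severity"])
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test against recorded backup timestamps.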

What You Get

  • Backup inventory: all backup jobs, schedules, and last run status
  • Freshness check: when was the last successful backup?
  • Completeness check: is the backup size reasonable vs. data size?
  • Integrity check: checksums, consistency verification
  • Restore test: automated restore to a test instance with data verification
  • DR readiness report: RPO/RTO assessment based on actual backup state
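
As one possible shape for the data verification step of the restore test, the sketch below compares per-table row counts between the source and the restored instance. The helper name `compare_row_counts`, the table names, and the 1% tolerance are assumptions for illustration; the tolerance exists because the source usually keeps changing after the backup was taken, so exact equality is rarely a fair test.

```python
from typing import Dict, List


def compare_row_counts(
    source: Dict[str, int],
    restored: Dict[str, int],
    tolerance: float = 0.01,  # assumed: allow 1% drift since the backup was taken
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the spot check passed."""
    problems = []
    for table, expected in source.items():
        if table not in restored:
            problems.append(f"{table}: missing from restore")
            continue
        actual = restored[table]
        # Flag tables whose restored row count differs by more than the tolerance.
        if abs(actual - expected) > expected * tolerance:
            problems.append(f"{table}: expected ~{expected} rows, got {actual}")
    return problems


# Hypothetical counts; in practice these would come from COUNT(*) queries
# against production and the restored test instance.
source_counts = {"users": 1000, "orders": 5000, "audit_log": 200}
restored_counts = {"users": 998, "orders": 0}
problems = compare_row_counts(source_counts, restored_counts)
for line in problems:
    print(line)
```

Row counts only catch gross failures such as empty or missing tables, which is why the prompt pairs them with spot checks on key tables.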

Setup Steps

  1. Tell your Claw about your backup systems (AWS Backup, RDS snapshots, Velero, custom scripts)
  2. Run the verification to get a baseline health report
  3. Set up a weekly or monthly schedule for ongoing verification
  4. Address any failures found in the report

Tips

  • A backup that hasn't been restore-tested is a hope, not a backup
  • Check backup sizes over time — a sudden size drop often means something broke
  • Verify cross-region or cross-account backups separately
  • Include both database and infrastructure state (Terraform, K8s) in your backup scope
  • Run the restore test in an isolated environment to avoid impacting production
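
The size-over-time tip above could be approximated with a comparison against the median of recent backups. This is a sketch under stated assumptions: the function names, the byte values, and the 50% drop threshold are illustrative, and a real threshold should be tuned to how much your data normally fluctuates.

```python
from statistics import median
from typing import List, Optional


def size_drop_ratio(history_bytes: List[int], latest_bytes: int) -> Optional[float]:
    """Fractional drop of the latest backup relative to the median of earlier ones."""
    if not history_bytes:
        return None  # no baseline yet, nothing to compare against
    baseline = median(history_bytes)
    if baseline <= 0:
        return None  # earlier backups were empty too; a ratio is meaningless
    return 1 - latest_bytes / baseline


def is_suspicious(history_bytes: List[int], latest_bytes: int, max_drop: float = 0.5) -> bool:
    """True when the latest backup shrank by more than max_drop (assumed 50%)."""
    drop = size_drop_ratio(history_bytes, latest_bytes)
    return drop is not None and drop > max_drop


# Hypothetical sizes in bytes for the previous four backups.
history = [100_000, 102_000, 104_000, 103_000]
print(is_suspicious(history, 101_000))  # ordinary fluctuation
print(is_suspicious(history, 20_000))   # sharp drop worth investigating
print(is_suspicious(history, 0))        # empty backup
```

Using the median rather than the most recent size keeps a single earlier bad backup from becoming the new baseline.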
Tags: #backups #disaster-recovery #reliability #devops