Modern PostgreSQL Superstack Series
    Part 7 of 7

    Backups and Disaster Recovery with pgBackRest

    The operational maturity checkpoint. Continuous archiving, encrypted offsite backups, and a written DR runbook.

    Prerequisites

    Working Postgres from Part 1; ideally Part 6's cluster

    Time to Complete

    55–80 minutes

    External

    S3-compatible bucket (B2/Wasabi/RamNode CO Storage)

    Why pg_dump Is Not a Backup Strategy for Production

    • No point-in-time recovery: you restore to the moment pg_dump ran, never to one minute before the bad UPDATE
    • Restore is slow on a database larger than a few GB: every row is reloaded and every index rebuilt, and a plain-format dump restores single-threaded
    • Long-running dumps hold transaction snapshots that prevent vacuum from cleaning up dead tuples
    • Schema drift between dump and database is invisible until you try to restore

    pg_dump is a fine logical export tool. It is not a backup.

    The Continuous Archiving Model

    Take a base backup of the data directory once. Stream WAL segments to the repository continuously after that. To restore to any point in time, lay down the base backup and replay WAL up to your chosen target. This is what pgBackRest, Barman, and WAL-G all implement; they differ in the operational ergonomics.
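
    The restore rule above (lay down a base backup, then replay WAL forward to the target) implies a simple selection step: use the newest base backup that finished at or before the target. A minimal sketch of that selection, assuming backup stop times are available as Unix epoch seconds; pick_base_backup is a hypothetical helper for illustration, not a pgBackRest command, and pgBackRest performs this selection itself during a time-based restore.

```shell
# pick_base_backup TARGET STOP_TIME...
# Prints the newest stop time that is <= TARGET (all values in epoch seconds).
# Returns 1 when every backup is newer than the target, i.e. PITR cannot
# reach that point because WAL can only be replayed forward.
pick_base_backup() {
  local target=$1; shift
  local best="" stop
  for stop in "$@"; do
    if [ "$stop" -le "$target" ] && { [ -z "$best" ] || [ "$stop" -gt "$best" ]; }; then
      best=$stop
    fi
  done
  [ -n "$best" ] || return 1
  echo "$best"
}
```

    For example, pick_base_backup 1500 1000 1400 1600 prints 1400: the backup that stopped at 1600 is too new, and 1400 needs less WAL replay than 1000.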

    pgBackRest vs Barman vs WAL-G

    All three work. pgBackRest gets the spotlight here because it has the strongest combination of features for a self-hosted Postgres on a VPS:

    • Block-level incremental backups (enabled with repo-bundle and repo-block in recent 2.x releases): only the changed blocks inside a file are stored, not the whole file
    • Built-in encryption (AES-256-CBC) and parallel compression (zstd/gzip/lz4)
    • First-class S3, Azure Blob, and GCS support
    • Easy parallel restore and PITR with a single pgbackrest restore

    Install pgBackRest 2.x from PGDG

    sudo apt install -y pgbackrest

    Choosing the Repository Location

    Two practical options on a RamNode deployment:

    • A small dedicated RamNode VPS as a backup host — fast restore, predictable bandwidth, no per-GB egress fees within the same datacenter
    • An S3-compatible bucket — Backblaze B2 ($6/TB/month) and Wasabi ($6.99/TB/month) are the cost-effective defaults; RamNode Cloud Object Storage works equally well

    The 3-2-1 Rule for Self-Hosters

    Three copies of the data, on two different media, with one offsite. For a one-VPS Postgres that means: production data + a local fast-restore repo on a second VPS in the same region + an encrypted S3-compatible offsite repo. pgBackRest supports multiple repositories (repo1, repo2, and so on) in one configuration, so you can satisfy the rule from a single file.

    pgbackrest.conf Walkthrough

    /etc/pgbackrest/pgbackrest.conf
    [global]
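    # Fast-restore repo. As written, this path is local to the database host;
    # a repo on the second VPS additionally needs repo1-host (SSH or TLS between hosts).
    repo1-path=/var/lib/pgbackrest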
    repo1-retention-full=2
    repo1-retention-diff=7
    
    # Offsite encrypted S3-compatible repo:
    repo2-type=s3
    repo2-s3-endpoint=s3.us-east-005.backblazeb2.com
    repo2-s3-bucket=acme-pg-backups
    repo2-s3-region=us-east-005
    repo2-s3-key=AKIA...
    repo2-s3-key-secret=SECRET...
    repo2-path=/postgres/pg-cluster
    repo2-cipher-type=aes-256-cbc
    repo2-cipher-pass=GENERATE-A-LONG-RANDOM-PASSPHRASE-HERE
    repo2-retention-full=4
    repo2-retention-diff=14
    
    process-max=4
    log-level-console=info
    log-level-file=detail
    compress-type=zst
    compress-level=3
    
    [pg-cluster]
    pg1-path=/var/lib/postgresql/17/main
    pg1-port=5432
    pg1-user=postgres
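
    Because the offsite repo is the copy that leaves your control, it is worth checking that every S3 repo in the file really has encryption switched on. A minimal sketch, assuming the plain key=value layout shown above; unencrypted_s3_repos is a hypothetical helper name, and it only inspects the file text rather than asking pgBackRest.

```shell
# unencrypted_s3_repos CONF_FILE
# Prints "repoN" for every repo declared with repoN-type=s3 that has no
# matching repoN-cipher-type=aes-256-cbc line. Prints nothing when all
# S3 repos are encrypted.
unencrypted_s3_repos() {
  local conf=$1 n
  for n in $(sed -n 's/^[[:space:]]*repo\([0-9][0-9]*\)-type=s3[[:space:]]*$/\1/p' "$conf"); do
    grep -Eq "^[[:space:]]*repo${n}-cipher-type=aes-256-cbc" "$conf" || echo "repo${n}"
  done
}
```

    Run against the walkthrough file (unencrypted_s3_repos /etc/pgbackrest/pgbackrest.conf), it should print nothing, since repo2 sets repo2-cipher-type. The file also holds the S3 secret and the passphrase, so keep it readable only by the postgres user.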

    archive_command Setup

    postgresql.conf
    archive_mode = on
    archive_command = 'pgbackrest --stanza=pg-cluster archive-push %p'
    max_wal_senders = 10
    wal_level = replica

    sudo systemctl restart postgresql@17-main
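
    Once archiving is on, PostgreSQL records every success and failure of archive_command in the pg_stat_archiver view. A minimal sketch of a status check built on that view; archiver_status is a hypothetical helper, and the PSQL variable is an assumed override hook so the query runner can be swapped (for instance to add connection options).

```shell
# archiver_status
# Prints "failing" when the most recent archive attempt failed (a failure
# exists and is newer than the last success), otherwise "ok".
# Returns 2 if the query itself could not be run.
archiver_status() {
  local failing
  failing=$("${PSQL:-psql}" -XAtc "
    SELECT last_failed_time IS NOT NULL
       AND (last_archived_time IS NULL OR last_failed_time > last_archived_time)
    FROM pg_stat_archiver") || return 2
  if [ "$failing" = "t" ]; then echo "failing"; else echo "ok"; fi
}
```

    Run it as the postgres user. A "failing" result usually means WAL is piling up in pg_wal, so treat it as urgent: the disk fills long before the backups look stale.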

    Create the Stanza, First Backup, Verify

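    sudo -u postgres pgbackrest --stanza=pg-cluster --log-level-console=info stanza-create
    # With two repos configured, backup writes to repo1 unless --repo is given:
    sudo -u postgres pgbackrest --stanza=pg-cluster --type=full --log-level-console=info backup
    sudo -u postgres pgbackrest --stanza=pg-cluster --repo=2 --type=full --log-level-console=info backup
    sudo -u postgres pgbackrest --stanza=pg-cluster check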

    Backup Types and a Sensible Schedule

    Common rhythm:

    • Full — weekly (Sunday 02:00). Slow, complete.
    • Differential — daily (02:00 Mon–Sat). Changes since the last full.
    • Incremental — hourly. Changes since the last backup of any type.
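
    /etc/cron.d/pgbackrest
    # m h dom mon dow user command
    0  2 * * 0 postgres pgbackrest --stanza=pg-cluster --type=full backup
    0  2 * * 1-6 postgres pgbackrest --stanza=pg-cluster --type=diff backup
    # Hourly, skipping 02:00 so it does not collide with the full/diff run:
    0  0-1,3-23 * * * postgres pgbackrest --stanza=pg-cluster --type=incr backup
    # The offsite repo2 only receives backups when --repo=2 is passed (03:30 is an arbitrary slot):
    30 3 * * 0 postgres pgbackrest --stanza=pg-cluster --repo=2 --type=full backup
    30 3 * * 1-6 postgres pgbackrest --stanza=pg-cluster --repo=2 --type=diff backup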
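
    The weekly/daily/hourly rhythm can also be expressed as one function and a single hourly cron entry, which makes overlapping runs impossible by construction. A minimal sketch, assuming the 02:00 slot and Sunday full from the schedule above; backup_type_for is a hypothetical helper name.

```shell
# backup_type_for DOW HOUR
# DOW is 0 (Sunday) to 6 (Saturday), HOUR is 0 to 23.
# Prints the pgBackRest backup type for that slot: full on Sunday 02:00,
# diff at 02:00 on other days, incr for every other hour.
backup_type_for() {
  local dow=$1 hour=$2
  if [ "$hour" -ne 2 ]; then
    echo incr
  elif [ "$dow" -eq 0 ]; then
    echo full
  else
    echo diff
  fi
}
```

    A wrapper script would then call pgbackrest --stanza=pg-cluster --type="$(backup_type_for "$(date +%w)" "$((10#$(date +%H)))")" backup once per hour; the 10# prefix stops bash reading hours such as 08 as octal.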

    Retention Policies

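    pgBackRest retention is expressed in terms of full and differential backups. With repo1-retention-full=2, two full backups are kept, plus every diff and incremental that depends on them, plus the WAL needed to restore from the oldest retained backup onward. repo1-retention-diff=7 additionally caps the number of differentials kept. Expiry runs automatically after each successful backup. Set repoN-retention-archive (for example repo1-retention-archive) only if you want WAL pruned on a different schedule from the backups.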
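
    Count-based retention translates into a reach-back window that breathes between two bounds: it is shortest just after a new full backup expires the oldest one, and longest just before the next full runs. A small arithmetic sketch, assuming count-based retention, evenly spaced full backups, and expiry after every full; restore_window_days is a hypothetical helper.

```shell
# restore_window_days RETENTION_FULL FULL_INTERVAL_DAYS
# Prints "MIN MAX": the shortest and longest distance, in days, between now
# and the oldest retained full backup.
#   MIN = (RETENTION_FULL - 1) * interval   (right after a new full + expire)
#   MAX = RETENTION_FULL * interval         (just before the next full)
restore_window_days() {
  local keep=$1 interval=$2
  echo "$(( (keep - 1) * interval )) $(( keep * interval ))"
}
```

    With the walkthrough values, restore_window_days 2 7 prints "7 14" for repo1 and restore_window_days 4 7 prints "21 28" for repo2. If your recovery requirements say "any point in the last 14 days", plan against the first number, not the second.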

    Encryption

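    AES-256-CBC at rest, applied in this configuration to the offsite repo2 only. Generate a long random passphrase and store it in a password manager plus an out-of-band copy that does not depend on the server: losing the passphrase means losing the offsite backup. The cipher settings are fixed when the repository is created, so key rotation means creating a fresh repository under the new passphrase and taking a new full backup; do not try to re-encrypt in place.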
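
    One way to produce a suitably long random passphrase uses only the kernel's random source and base64. This is a sketch, and gen_repo_passphrase is a hypothetical helper name; openssl rand -base64 48 is an equivalent alternative if OpenSSL is installed.

```shell
# gen_repo_passphrase
# Prints a 64-character passphrase: 48 random bytes from /dev/urandom,
# base64-encoded (48 bytes encode to exactly 64 characters, no padding).
gen_repo_passphrase() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}
```

    Paste the output into repo2-cipher-pass before running stanza-create, and record it in the password manager at the same moment, not afterwards.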

    Compression

    Default is gzip. zstd at level 3 typically matches or beats gzip's default ratio while using noticeably less CPU time, so use it by default; if the database host is CPU-starved, lz4 trades ratio for an even lower CPU cost.

    Verifying Restores in a Fire Drill

    The only backup that works is the one you have tested. Spin up a throwaway VPS, install Postgres, install pgBackRest with the same config, and:

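    sudo systemctl stop postgresql@17-main
    # Use find rather than rm -rf .../*: the glob is expanded by your own shell,
    # which (unless you are root) cannot read the data directory, so nothing is deleted.
    sudo -u postgres find /var/lib/postgresql/17/main -mindepth 1 -delete
    # --archive-mode=off keeps the drill host from pushing WAL into the production repo:
    sudo -u postgres pgbackrest --stanza=pg-cluster --archive-mode=off --log-level-console=info restore
    sudo systemctl start postgresql@17-main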

    Schedule this monthly. Document how long the restore takes — that is your real RTO.
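
    Measuring the restore is easier if the drill records its own duration instead of relying on someone watching the clock. A minimal sketch of a timing wrapper; timed_run is a hypothetical helper, and it measures wall-clock seconds only.

```shell
# timed_run LABEL CMD [ARGS...]
# Runs CMD, then prints "LABEL elapsed_s=N exit=CODE" and returns CMD's
# exit status, so a failed restore is still timed and still reported.
timed_run() {
  local label=$1; shift
  local start end rc=0
  start=$(date +%s)
  "$@" || rc=$?
  end=$(date +%s)
  echo "$label elapsed_s=$((end - start)) exit=$rc"
  return "$rc"
}
```

    In the drill, prefix the restore line with it, for example timed_run restore sudo -u postgres pgbackrest --stanza=pg-cluster --archive-mode=off restore, and keep the printed line with the drill notes. Remember that the real RTO also includes provisioning the VPS and WAL replay after the server starts.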

    Point-in-Time Recovery

    sudo systemctl stop postgresql@17-main
    # --delta lets restore reuse the existing, non-empty data directory:
    sudo -u postgres pgbackrest --stanza=pg-cluster --delta \
      --type=time --target='2026-05-13 14:32:00' \
      --target-action=promote restore
    sudo systemctl start postgresql@17-main

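    The cluster replays WAL up to your target time and then promotes. A target without a UTC offset is ambiguous, so append one (for example +00) when the exact instant matters. Verify the data, then take a fresh full backup before resuming normal operation, because promotion starts a new timeline.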
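
    During an incident the bad statement's time usually arrives in someone's local time zone. A minimal sketch for turning such a timestamp into an unambiguous UTC target string; pitr_target_utc is a hypothetical helper, and it assumes GNU date (the -d option) as found on Debian and Ubuntu.

```shell
# pitr_target_utc TIMESTAMP
# Prints TIMESTAMP converted to UTC as "YYYY-MM-DD HH:MM:SS+00".
# The input should carry its own zone or offset; with -u, GNU date treats
# an input that has none as already being UTC.
pitr_target_utc() {
  date -u -d "$1" '+%Y-%m-%d %H:%M:%S+00'
}
```

    For example, pitr_target_utc '2026-05-13 10:32:00 -0400' prints 2026-05-13 14:32:00+00, which can be passed straight to --target.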

    Restoring to a New Node (Dead Patroni Leader)

    When a Patroni leader dies for good, Patroni promotes a replica automatically. Replacing the dead node is then "build a new VPS, restore from pgBackRest, let Patroni reattach it as a replica":

    # On the new node, after installing Postgres and Patroni:
    sudo -u postgres pgbackrest --stanza=pg-cluster --type=standby restore
    sudo systemctl start patroni
    patronictl -c /etc/patroni/patroni.yml list  # new node should appear as Replica

    Single-Table Restores

    pgBackRest does not restore individual tables — it restores the cluster. The fallbacks:

    • Restore the cluster to a side instance using PITR, then pg_dump --table=schema.table from the side instance and reload into production
    • For ongoing protection of a small set of tables, set up logical replication to a sidecar instance; then a "single-table restore" is just truncating production and copying from the sidecar

    Monitoring Backup Success

    sudo -u postgres pgbackrest --stanza=pg-cluster info

    Wrap that in a check that posts to a webhook (Slack/ntfy) when the most recent backup is older than your tolerance:

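    /usr/local/bin/check-backup-age.sh
    #!/bin/bash
    # Alert when the newest pgBackRest backup is older than THRESHOLD.
    # Run as the postgres user so pgbackrest can read its config and repos.
    # Newest stop time (epoch seconds) across all listed backups; empty if none.
    LATEST=$(pgbackrest --stanza=pg-cluster info --output=json \
      | jq -r '[.[0].backup[]?.timestamp.stop] | max // empty')
    NOW=$(date +%s)
    # No backups, or a failed info call, counts as infinitely old so it also alerts.
    AGE=$((NOW - ${LATEST:-0}))
    THRESHOLD=$((26 * 3600))   # 26 hours
    if [ "$AGE" -gt "$THRESHOLD" ]; then
      curl -fsS -X POST -d "{\"text\":\":warning: pgBackRest backup is $((AGE/3600))h old\"}" \
        -H 'Content-Type: application/json' \
        https://hooks.slack.com/services/T/B/X
    fi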

    Disaster Recovery Runbook Template

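    RPO target: 5 minutes (continuous WAL archiving; set archive_timeout = 300 or lower in postgresql.conf so a quiet server still ships a segment within the target).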
    RTO target: 30 minutes for most-recent backup; 60 minutes for PITR.
    On-call roles: Incident Lead (decisions), Restore Operator (executes commands), Comms Lead (status updates).

    Decision tree:

    Is the primary alive?
      YES + corruption?     → PITR to before the corruption (this part)
      NO  + replica healthy?→ Patroni already failed over (Part 6); replace the dead node
      NO  + cluster lost?   → Provision new VPS, restore from pgBackRest, repoint apps
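
    The decision tree is small enough to encode directly, which is handy if the runbook lives next to scripts that the Restore Operator runs. A minimal sketch; dr_action and its output labels are hypothetical names, and the "no-restore-needed" branch (primary alive, no corruption) is an assumption added to make the function total, since the tree above does not cover it.

```shell
# dr_action PRIMARY_ALIVE CORRUPTION REPLICA_HEALTHY
# Each argument is "yes" or "no". Prints the runbook branch to follow.
dr_action() {
  local primary=$1 corruption=$2 replica=$3
  if [ "$primary" = yes ]; then
    if [ "$corruption" = yes ]; then
      echo "pitr-before-corruption"   # PITR to before the corruption
    else
      echo "no-restore-needed"        # assumption: not part of the original tree
    fi
  elif [ "$replica" = yes ]; then
    echo "replace-dead-node"          # Patroni already failed over
  else
    echo "restore-to-new-vps"         # provision, restore, repoint apps
  fi
}
```

    For example, dr_action no no yes prints replace-dead-node, matching the second branch of the tree.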

    Verified commands: linked above in this part. Comms cadence: internal status every 15 min, external status every 30 min until resolved.

    Series Wrap-Up

    You now have an opinionated production PostgreSQL stack you can stand up on RamNode in an afternoon: tuned single-node, pooled, with pgvector + ParadeDB + pg_duckdb covering RAG, search, and analytics, sitting behind a Patroni HA cluster with verified offsite backups.

    For where to go next, the series landing page lists follow-up topics: logical replication and zero-downtime major upgrades, time-series partitioning with pg_partman, and horizontal sharding with Citus. Build something with what you have here first.