PostgreSQL Point-in-Time Recovery with Juju

Databases should not lose data, but we still have to plan for recovery. We make logical backups when we can, exporting the database to SQL with pg_dump. Even terabyte-sized databases can be dumped daily, but as databases grow, logical backups become more of a luxury: they require long-running transactions and can take an impractical amount of time on multi-terabyte databases.
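A logical backup is just a pg_dump invocation. For example (the database name here is a placeholder):

```shell
# Export one database in PostgreSQL's compressed custom format,
# suitable for selective restore later with pg_restore.
pg_dump --format=custom --file=mydb.dump mydb
```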

Binary backups are becoming standard. An inconsistent copy of the filesystem is made, along with the Write-Ahead Log (WAL) files needed to repair that inconsistent copy. Binary backups are generally larger, but the backup process is less resource intensive on the database. And if you keep archiving WAL files as they are produced, you gain the option of Point-in-Time Recovery (PITR).
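In a hand-rolled setup, WAL archiving is enabled in postgresql.conf. This is a minimal sketch using a local archive directory (the /archive path is a placeholder; the charm configures its own archive command for you):

```
# postgresql.conf: ship each completed WAL segment as it is produced
wal_level = replica          # "hot_standby" on 9.5 and earlier
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
```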

Deploying PostgreSQL with Juju simplifies the process of making and restoring PITR backups. The PostgreSQL charm uses the WAL-E tool for binary backups, WAL archiving and PITR. It stores the backups in cloud storage, supporting OpenStack Swift, Amazon S3, Azure WABS and Google Cloud Storage. I’m using it in production with OpenStack Swift, and I’m interested in hearing about experiences from others using other cloud storage options.
Once you have Juju set up and a model created, it’s easy to deploy replicated PostgreSQL:

juju deploy -n 3 cs:postgresql --storage pgdata=1G
watch -c juju status --color

Juju spins up three default instances, and the PostgreSQL charm installs PostgreSQL and configures one unit as the master and the other two as hot standbys. Normally, our deployments are to bare metal and we would be deploying to servers provisioned by MAAS and added into our model using the manual provider, but here I’m just using default containers and default Juju storage for the data:

Model  Controller  Cloud/Region         Version  SLA
walex  lxd         localhost/localhost  2.2.2    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.8    active      3  postgresql  jujucharms  158  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        10.0.4.30       5432/tcp  Live master (9.5.8)
postgresql/1   active    idle   1        10.0.4.71       5432/tcp  Live secondary (9.5.8)
postgresql/2   active    idle   2        10.0.4.198      5432/tcp  Live secondary (9.5.8)

Machine  State    DNS         Inst id        Series  AZ  Message
0        started  10.0.4.30   juju-a823cc-0  xenial      Running
1        started  10.0.4.71   juju-a823cc-1  xenial      Running
2        started  10.0.4.198  juju-a823cc-2  xenial      Running

Relation     Provides    Consumes    Type
replication  postgresql  postgresql  peer

Next up, I need to configure WAL-E. I’m going to be using OpenStack Swift for storage, so it will need those credentials:

$ cat swift.yaml
postgresql:
  os_username: osme
  os_tenant_name: osme_project
  os_password: secret
  os_auth_url: https://keystone.example.com:443/v2.0/
$ juju config postgresql --file swift.yaml

And I need to tell the charm which container or bucket to use (the URI follows WAL-E’s documented format). This also triggers installation of the WAL-E snap package:

$ juju config postgresql wal_e_storage_uri=swift://walex_1

N.B. While the charm creates the Swift container automatically, with other storage backends you may need to create it manually using the cloud-native tools.
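For example, with the plain object storage CLIs (container and bucket names here are placeholders):

```shell
# OpenStack Swift: create the container (the charm does this for you)
swift post walex_1
# Amazon S3 equivalent, if you were using an s3:// storage URI instead
aws s3 mb s3://walex-1
```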

WAL files by themselves are not terribly useful; we need a filesystem-level backup of the database to apply them to. We could wait for cron to kick off the regular backup job, but it is better to run one right now. There is a Juju action to do just that. It must be run on the master unit, as reported by juju status; the master will not always be the first unit. The action runs WAL-E, which streams a copy of the database and the necessary WAL files directly to cloud storage. Unlike the charm’s logical backup support, you do not need enough free disk space on the unit to hold a copy of the database.

$ juju run-action --wait postgresql/0 wal-e-backup
action-id: 29dc70c3-a3c5-4ecb-8dc9-c060d485a574
results:
  backup-return-code: "0"
  wal-e-backup-cmd: /snap/bin/wal-e.envdir /etc/postgresql/9.5/main/wal-e.env /snap/bin/wal-e backup-push /var/lib/postgresql/9.5/main
  wal-e-prune-cmd: None
status: completed
timing:
  completed: 2017-08-29 13:07:18 +0000 UTC
  enqueued: 2017-08-29 13:06:26 +0000 UTC
  started: 2017-08-29 13:06:33 +0000 UTC
$ juju run-action --wait postgresql/0 wal-e-list-backups
action-id: b699bbd6-9738-4d7e-8286-dc94874b8c12
results:
  base-00000001000000000000000a-00000096:
    expanded-size-bytes: "21710137"
    last-modified: 2017-08-29T13:07:17.113900
    name: base_00000001000000000000000A_00000096
    wal-segment-backup-start: 00000001000000000000000A
    wal-segment-backup-stop: 00000001000000000000000A
    wal-segment-offset-backup-start: 00000096
    wal-segment-offset-backup-stop: "00000360"
status: completed
timing:
  completed: 2017-08-29 13:07:31 +0000 UTC
  enqueued: 2017-08-29 13:07:23 +0000 UTC
  started: 2017-08-29 13:07:25 +0000 UTC

I’m going to create an example table with some data so we can test PITR. I don’t have a client connected, so I will do this directly on the master server. I then need to force PostgreSQL to cycle the current WAL file so these changes are archived off to Swift; there is no other traffic to fill it, and I have not set any PostgreSQL options to force regular WAL cycling:

postgres@juju-a823cc-0:/home/ubuntu$ psql
psql (9.5.8)
Type "help" for help.

postgres=# CREATE TABLE ts (t TIMESTAMP WITH TIME ZONE);
CREATE TABLE
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# SELECT * FROM ts ORDER BY t;
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 13:32:46.835794+00
(3 rows)

postgres=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
0/B0502B0
(1 row)
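Note that pg_switch_xlog() is the PostgreSQL 9.x name; from PostgreSQL 10 onward the function was renamed:

```shell
# On PostgreSQL 10 or later, force a WAL segment switch with:
juju ssh postgresql/0 'sudo -u postgres psql -c "SELECT pg_switch_wal()"'
```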

Now let’s do a PITR onto a fresh unit in a new Juju model. This is another default deployment with the same configuration, except for the wal_e_storage_uri setting: every service must use a unique bucket, or they will collide and your backups are at risk.

Here we are not shutting down the original three units, so this approach can also be used to test recovery, build staging servers or even to migrate to new hardware:
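Standing up the recovery target looks much like the original deployment; a sketch, where the model name and new bucket name are my own choices:

```shell
juju add-model walex-restore              # hypothetical model name
juju deploy cs:postgresql --storage pgdata=1G
juju config postgresql --file swift.yaml  # same Swift credentials as before
juju config postgresql wal_e_storage_uri=swift://walex_2  # must be a unique bucket
```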

$ juju run-action --wait postgresql/0 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
action-id: bf0f24e4-6bc9-493c-80c0-103e7a2c87f5
status: completed
timing:
  completed: 2017-08-29 15:24:07 +0000 UTC
  enqueued: 2017-08-29 15:22:09 +0000 UTC
  started: 2017-08-29 15:22:11 +0000 UTC

And check that this really was recovered to the desired point in time; the timestamps from before the target time exist, but none from after:

$ juju ssh postgresql/0
[...]
$ sudo -u postgres -s -H
postgres@juju-268cdf-0:/home/ubuntu$ psql
psql (9.5.8)
Type "help" for help.

postgres=# SELECT * FROM ts ORDER BY t;
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
(2 rows)

We can also recover our original deployment, for example if we need to roll back after massive data loss. Ideally you would deploy to new units and cut over, but you don’t always have the necessary hardware available. The first thing we need to do is change the wal_e_storage_uri setting to a new, unique value. This is a deliberate limitation of the charm, to avoid collisions when multiple recovery attempts are made; the Juju actions exist to make recovery easy, and to make corrupting your backups hard. Here, we recover the master database. The standby servers are left alone, and remain live and able to serve queries on the original timeline throughout the recovery process:

$ juju config postgresql wal_e_storage_uri=swift://walex_3
$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 13:32:46.835794+00
(3 rows)

$ juju run-action --wait postgresql/0 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
action-id: 5ae3026e-db80-4a07-89e8-c31a28e05c80
status: completed
timing:
  completed: 2017-08-29 16:17:22 +0000 UTC
  enqueued: 2017-08-29 16:14:44 +0000 UTC
  started: 2017-08-29 16:14:48 +0000 UTC

$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
(2 rows)

And recover one of the standbys, confirming replication still works:

$ juju run-action postgresql/1 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
Action queued with id: 2b094006-f508-46a6-801a-d863ea4ac33a
$ juju show-action-output 2b094006-f508-46a6-801a-d863ea4ac33a
status: completed
timing:
  completed: 2017-08-29 16:19:57 +0000 UTC
  enqueued: 2017-08-29 16:18:02 +0000 UTC
  started: 2017-08-29 16:18:04 +0000 UTC

$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "INSERT INTO ts VALUES (CURRENT_TIMESTAMP)" '
INSERT 0 1
$ juju ssh postgresql/1 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 16:20:46.796298+00
(3 rows)

As you can see, making and restoring PITR backups with Juju is very straightforward. Juju actions abstract away the complexity involved and let us encapsulate operational knowledge in the charm itself, making your operations playbooks simpler. We just hope you never actually need to use it.
