PostgreSQL Redeploys with Juju Storage

New features and improvements in Juju 2.3’s storage support provide new mechanisms for server migrations. A PostgreSQL deployment can be made with its data stored on an attached volume, such as a Ceph mount or an Amazon EBS volume. To migrate to a new instance, we can bring up new units in the same or a new Juju model, or even with a new Juju controller, and reuse the storage volume to bring our data across. While this has been possible for a while, it was an ad hoc process that needed to be performed manually or with non-standard tools like the (now deprecated) BlockStorageBroker charms. With Juju 2.3 the process becomes smooth and can be managed entirely with Juju. Charms like PostgreSQL now have standard mechanisms they can use to support these sorts of processes, and this creates opportunities for new features such as major version upgrades.

Starting with a configured Juju controller and a fresh model, PostgreSQL can easily be deployed using Juju, with Juju managing storage. Using Amazon for this example, this will deploy a PostgreSQL instance with 50GB of attached EBS storage:

juju deploy cs:postgresql --storage pgdata=ebs,50G

While it is possible to attach storage after the initial deploy, for PostgreSQL it is best to specify storage at deployment time. This way, new units will also be deployed with the same attached storage and have enough space to replicate the database from the primary. So to add a hot standby unit:

juju add-unit postgresql
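If you ever do need to attach storage to an existing unit after the fact, Juju’s add-storage command handles it. A sketch, with the unit name and storage constraints purely illustrative:

juju add-storage postgresql/1 pgdata=ebs,50G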

After things settle, you end up with a deployment like this:

$ juju status

Model          Controller          Cloud/Region        Version  SLA
rightsaidfred  aws-ap-southeast-2  aws/ap-southeast-2  2.3.1    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.10   active      2  postgresql  jujucharms  164  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        13.211.42.219   5432/tcp  Live master (9.5.10)
postgresql/1   active    idle   1        52.65.20.140    5432/tcp  Live secondary (9.5.10)

Machine  State    DNS            Inst id              Series  AZ               Message
0        started  13.211.42.219  i-0fc0f0a21290ff909  xenial  ap-southeast-2a  running
1        started  52.65.20.140   i-0be95ac0e1a048e6f  xenial  ap-southeast-2b  running

Relation provider       Requirer                Interface    Type  Message
postgresql:coordinator  postgresql:coordinator  coordinator  peer
postgresql:replication  postgresql:replication  pgpeer       peer

$ juju list-storage

[Storage]
Unit          Id        Type        Pool  Provider id            Size   Status    Message
postgresql/0  pgdata/0  filesystem  ebs   vol-0bbe053869187f9c6  50GiB  attached
postgresql/1  pgdata/1  filesystem  ebs   vol-0a95d56991e1dff1b  50GiB  attached
$ juju ssh postgresql/0
 Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1047-aws x86_64)
 [...]
 ubuntu@ip-172-31-15-43:~$ sudo -u postgres psql
 psql (9.5.10)
 Type "help" for help.

postgres=# create table data(d text);
 CREATE TABLE
 postgres=# insert into data values ('hello');
 INSERT 0 1
 postgres=# \q
 ubuntu@ip-172-31-15-43:~$ exit
 logout
 Connection to 13.211.42.219 closed.

We can tear down this deployment whilst preserving our data:

$ juju destroy-model rightsaidfred
 WARNING! This command will destroy the "rightsaidfred" model.
 This includes all machines, applications, data and other resources.

Continue [y/N]? y
 Destroying model
 ERROR cannot destroy model "rightsaidfred"

The model has persistent storage remaining:
 2 volumes and 2 filesystems

To destroy the storage, run the destroy-model
 command again with the "--destroy-storage" flag.

To release the storage from Juju's management
 without destroying it, use the "--release-storage"
 flag instead. The storage can then be imported
 into another Juju model.

$ juju destroy-model rightsaidfred --release-storage
 [...]
 Model destroyed.

For the record, I normally wouldn’t be so brash as to destroy the old model before bringing up the replacement. A better approach when dealing with production data is to put the PostgreSQL database into backup mode and duplicate the storage volume (exactly how depends on your cloud provider or bare metal setup). You would then proceed to bring up the new deployment with the duplicated filesystem, while leaving the original deployment in place in case you need to back out the migration.
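On AWS, for example, that duplication might look roughly like the following sketch; the backup label and snapshot description are illustrative, and pg_start_backup/pg_stop_backup bracket the copy so the duplicated filesystem is usable:

$ juju ssh postgresql/0
ubuntu@ip-172-31-15-43:~$ sudo -u postgres psql
postgres=# SELECT pg_start_backup('migration', true);
postgres=# \q
ubuntu@ip-172-31-15-43:~$ exit
$ aws ec2 create-snapshot --volume-id vol-0bbe053869187f9c6 --description "pgdata/0 migration copy"
[... wait for the snapshot to complete, then create a new volume from it ...]
$ juju ssh postgresql/0 'sudo -u postgres psql -c "SELECT pg_stop_backup()"'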

Continuing, build the new deployment in a new model. First, deploy the master database, reusing the destroyed master’s storage (pgdata/0, vol-0bbe053869187f9c6).

$ juju add-model knowwhatimean
 Uploading credential 'aws/admin/aws' to controller
 Added 'knowwhatimean' model on aws/ap-southeast-2 with credential 'aws' for user 'admin'

$ juju import-filesystem ebs vol-0bbe053869187f9c6 pgdata
 importing "vol-0bbe053869187f9c6" from storage pool "ebs" as storage "pgdata"
 imported storage pgdata/0

$ juju deploy cs:postgresql --attach-storage pgdata/0
 Located charm "cs:postgresql-164".
 Deploying charm "cs:postgresql-164".

At this point, it is important to wait for setup to complete. If we attempted to bring up a second unit right now, there is a chance the second unit would be anointed the master; it would depend on which AWS VM happened to spin up and complete initial setup first. And that would be bad, as a new, empty database would be replicated instead of the one on the attached storage.
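One simple way to wait is to keep an eye on the model until postgresql/0 reports itself as the live master, for example:

watch -c juju status --color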

$ juju status
Model          Controller          Cloud/Region        Version  SLA
knowwhatimean  aws-ap-southeast-2  aws/ap-southeast-2  2.3.1    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.10   active      1  postgresql  jujucharms  164  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        13.55.208.91    5432/tcp  Live master (9.5.10)

Machine  State    DNS           Inst id              Series  AZ               Message
0        started  13.55.208.91  i-09c740e176da8f90f  xenial  ap-southeast-2a  running

Relation provider       Requirer                Interface    Type  Message
postgresql:coordinator  postgresql:coordinator  coordinator  peer
postgresql:replication  postgresql:replication  pgpeer       peer

$ juju ssh postgresql/0
 [...]
 ubuntu@ip-172-31-11-134:~$ sudo -u postgres psql
 psql (9.5.10)
 Type "help" for help.

postgres=# \d data
 Table "public.data"
 Column | Type | Modifiers
 --------+------+-----------
 d | text |

postgres=# \q
 ubuntu@ip-172-31-11-134:~$ exit
 logout
 Connection to 13.55.208.91 closed.

Now, it is safe to add a new unit.

$ juju import-filesystem ebs vol-0a95d56991e1dff1b pgdata
 importing "vol-0a95d56991e1dff1b" from storage pool "ebs" as storage "pgdata"
 imported storage pgdata/1

$ juju add-unit postgresql --attach-storage pgdata/1

[... wait ...]

$ juju status
Model          Controller          Cloud/Region        Version  SLA
knowwhatimean  aws-ap-southeast-2  aws/ap-southeast-2  2.3.1    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.10   active      2  postgresql  jujucharms  164  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        13.55.208.91    5432/tcp  Live master (9.5.10)
postgresql/1   active    idle   1        52.65.239.125   5432/tcp  Live secondary (9.5.10)

Machine  State    DNS            Inst id              Series  AZ               Message
0        started  13.55.208.91   i-09c740e176da8f90f  xenial  ap-southeast-2a  running
1        started  52.65.239.125  i-0c60aea0716cf8320  xenial  ap-southeast-2b  running

Relation provider       Requirer                Interface    Type  Message
postgresql:coordinator  postgresql:coordinator  coordinator  peer
postgresql:replication  postgresql:replication  pgpeer       peer

$ juju list-storage
[Storage]
Unit          Id        Type        Pool  Provider id            Size   Status    Message
postgresql/0  pgdata/0  filesystem  ebs   vol-0bbe053869187f9c6  50GiB  attached
postgresql/1  pgdata/1  filesystem  ebs   vol-0a95d56991e1dff1b  50GiB  attached

Future charm work is expected to make migration stories even easier. With Ubuntu 18.04 (Bionic) support and the latest version of PostgreSQL, pg_rewind will avoid unnecessary database cloning, and logical replication should allow major version upgrades: bringing up a new PostgreSQL deployment in parallel with the live deployment and cutting over.

PostgreSQL Point-in-Time Recovery with Juju

Databases should not lose data, but we still have to plan for recovery. We make logical backups when we can, exporting the database to SQL (pg_dump). Even terabyte-sized databases can be dumped daily. But as databases grow, logical backups are becoming more of a luxury; they require long-running transactions, and can take an impractical amount of time with multi-terabyte databases.
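A logical backup is essentially just a pg_dump run against the database, along these lines; the database name and output path are illustrative:

$ sudo -u postgres pg_dump --format=custom --file=/var/backups/mydb.dump mydb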

Binary backups are becoming standard. An inconsistent backup of the filesystem is made, along with the Write Ahead Log (WAL) files needed to make that backup consistent again. While the resulting backups are generally larger, the backup process is less resource intensive on the database. And if you keep archiving the WAL files as they are produced, you gain the option of Point In Time Recovery (PITR).

Deploying PostgreSQL with Juju simplifies the process of making and restoring PITR backups. The PostgreSQL charm uses the WAL-E tool for binary backups, WAL archiving and PITR. It stores the backups in cloud storage, supporting OpenStack Swift, Amazon S3, Azure WABS and Google Cloud Storage. I’m using it in production with OpenStack Swift, and I’m interested in hearing about experiences from others using other cloud storage options.
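Under the hood this is standard PostgreSQL WAL archiving, with archive_command pointed at WAL-E. Roughly speaking, the charm ends up managing settings along these lines (an illustrative sketch, not the charm’s exact configuration):

# postgresql.conf excerpt (managed by the charm; shown for illustration only)
wal_level = hot_standby
archive_mode = on
archive_command = '/snap/bin/wal-e.envdir /etc/postgresql/9.5/main/wal-e.env /snap/bin/wal-e wal-push %p'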
Once you have Juju set up and a model created, it’s easy to deploy replicated PostgreSQL:

juju deploy -n 3 cs:postgresql --storage pgdata=1G
watch -c juju status --color

Juju spins up three default instances, and the PostgreSQL charm installs PostgreSQL and configures one unit as the master and the other two as hot standbys. Normally our deployments are to bare metal, deploying to servers provisioned by MAAS and added into our model using the manual provider, but here I’m just using default containers and default Juju storage for the data:

Model  Controller  Cloud/Region         Version  SLA
walex  lxd         localhost/localhost  2.2.2    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.8    active      3  postgresql  jujucharms  158  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        10.0.4.30       5432/tcp  Live master (9.5.8)
postgresql/1   active    idle   1        10.0.4.71       5432/tcp  Live secondary (9.5.8)
postgresql/2   active    idle   2        10.0.4.198      5432/tcp  Live secondary (9.5.8)

Machine  State    DNS         Inst id        Series  AZ  Message
0        started  10.0.4.30   juju-a823cc-0  xenial      Running
1        started  10.0.4.71   juju-a823cc-1  xenial      Running
2        started  10.0.4.198  juju-a823cc-2  xenial      Running

Relation     Provides    Consumes    Type
replication  postgresql  postgresql  peer

Next up, I need to configure WAL-E. I’m going to be using OpenStack Swift for storage, so it will need those credentials:

$ cat swift.yaml
postgresql:
  os_username: osme
  os_tenant_name: osme_project
  os_password: secret
  os_auth_url: https://keystone.example.com:443/v2.0/
$ juju config postgresql --file swift.yaml

And I need to tell the charm which container or bucket to use (the URI follows WAL-E’s documented format). This also triggers installation of the WAL-E snap package:

$ juju config postgresql wal_e_storage_uri=swift://walex_1

N.B. While the charm creates the Swift container automatically, with other storage backends you may need to create the container or bucket manually using the cloud’s native tools.
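For example, pointing the charm at Amazon S3 instead would use WAL-E’s s3:// URI scheme; the bucket name here is illustrative, and the AWS credentials would be supplied through the charm’s configuration in the same spirit as the Swift credentials above:

$ juju config postgresql wal_e_storage_uri=s3://my-backup-bucket/walex_1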

WAL files by themselves are not terribly useful. We need a filesystem-level backup of the database to apply them to. We could wait for cron to kick off the regular job, but it is even better to run one right now. There is a Juju action to do just that, which must be run on the master unit (as reported by juju status; it will not always be the first unit). The action runs WAL-E, which streams a copy of the database and the necessary WAL files directly to cloud storage; unlike the charm’s logical backup support, you do not need to worry about having enough disk space on the unit to store a copy of the database.

$ juju run-action --wait postgresql/0 wal-e-backup
action-id: 29dc70c3-a3c5-4ecb-8dc9-c060d485a574
results:
  backup-return-code: "0"
  wal-e-backup-cmd: /snap/bin/wal-e.envdir /etc/postgresql/9.5/main/wal-e.env /snap/bin/wal-e
  backup-push /var/lib/postgresql/9.5/main
  wal-e-prune-cmd: None
status: completed
timing:
  completed: 2017-08-29 13:07:18 +0000 UTC
  enqueued: 2017-08-29 13:06:26 +0000 UTC
  started: 2017-08-29 13:06:33 +0000 UTC
$ juju run-action --wait postgresql/0 wal-e-list-backups
action-id: b699bbd6-9738-4d7e-8286-dc94874b8c12
results:
  base-00000001000000000000000a-00000096:
    expanded-size-bytes: "21710137"
    last-modified: 2017-08-29T13:07:17.113900
    name: base_00000001000000000000000A_00000096
    wal-segment-backup-start: 00000001000000000000000A
    wal-segment-backup-stop: 00000001000000000000000A
    wal-segment-offset-backup-start: 00000096
    wal-segment-offset-backup-stop: "00000360"
status: completed
timing:
  completed: 2017-08-29 13:07:31 +0000 UTC
  enqueued: 2017-08-29 13:07:23 +0000 UTC
  started: 2017-08-29 13:07:25 +0000 UTC

I’m going to create an example table with some data so we can test PITR. I have not got a client connected, so I will do this directly on the master server. I then need to force PostgreSQL to cycle the current WAL file so these changes get archived off to Swift; there is no other traffic to fill the current segment, and I have not set any PostgreSQL options to force regular WAL cycling:

postgres@juju-a823cc-0:/home/ubuntu$ psql
psql (9.5.8)
Type "help" for help.

postgres=# CREATE TABLE ts (t TIMESTAMP WITH TIME ZONE);
CREATE TABLE
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# INSERT INTO ts VALUES (CURRENT_TIMESTAMP);
INSERT 0 1
postgres=# SELECT * FROM ts ORDER BY t;
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 13:32:46.835794+00
(3 rows)

postgres=# SELECT pg_switch_xlog();
pg_switch_xlog
----------------
0/B0502B0
(1 row)

Now let’s do a PITR on a fresh unit in a new Juju model. This is another default deployment, with the same configuration except for the wal_e_storage_uri setting. Every service must use a unique bucket or container, or they will collide and put your backups at risk.

Here we are not shutting down the original three units, so this approach can also be used to test recovery, build staging servers or even to migrate to new hardware.
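Bringing up that fresh deployment looks much the same as before. A sketch, with the model name and the new storage URI purely illustrative:

$ juju add-model walex-restore
$ juju deploy cs:postgresql --storage pgdata=1G
$ juju config postgresql --file swift.yaml
$ juju config postgresql wal_e_storage_uri=swift://walex_2

Once the new unit has settled, run the wal-e-restore action against the backups in the original bucket: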

$ juju run-action --wait postgresql/0 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
action-id: bf0f24e4-6bc9-493c-80c0-103e7a2c87f5
status: completed
timing:
  completed: 2017-08-29 15:24:07 +0000 UTC
  enqueued: 2017-08-29 15:22:09 +0000 UTC
  started: 2017-08-29 15:22:11 +0000 UTC

And check that this really was recovered to the desired point in time; the timestamps from before the target time exist, but none from after:

$ juju ssh postgresql/0
[...]
$ sudo -u postgres -s -H
postgres@juju-268cdf-0:/home/ubuntu$ psql
psql (9.5.8)
Type "help" for help.

postgres=# SELECT * FROM ts ORDER BY t;
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
(2 rows)

We can also recover our original deployment, for example if we need to roll back following massive data loss. Ideally you would deploy to new units and cut over, but you don’t always have the necessary hardware available. The first thing we need to do is change the wal_e_storage_uri setting to a new, unique value. This is a deliberate limitation of the charm, to avoid collisions when multiple recovery attempts are made; the Juju actions exist to make recovery easy, not to make it easy to corrupt your backups. Here, we recover the master database. The standby servers are left alone, and remain live and able to serve queries on the original timeline throughout the recovery process:

$ juju config postgresql wal_e_storage_uri=swift://walex_3
$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 13:32:46.835794+00
(3 rows)

$ juju run-action --wait postgresql/0 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
action-id: 5ae3026e-db80-4a07-89e8-c31a28e05c80
status: completed
timing:
  completed: 2017-08-29 16:17:22 +0000 UTC
  enqueued: 2017-08-29 16:14:44 +0000 UTC
  started: 2017-08-29 16:14:48 +0000 UTC

$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
(2 rows)

And recover one of the standbys, confirming replication still works:

$ juju run-action postgresql/1 wal-e-restore storage-uri=swift://walex_1 target-time='2017-08-29 13:32:41.000000+00' confirm=true
Action queued with id: 2b094006-f508-46a6-801a-d863ea4ac33a
$ juju show-action-output 2b094006-f508-46a6-801a-d863ea4ac33a
status: completed
timing:
  completed: 2017-08-29 16:19:57 +0000 UTC
  enqueued: 2017-08-29 16:18:02 +0000 UTC
  started: 2017-08-29 16:18:04 +0000 UTC

$ juju ssh postgresql/0 'sudo -u postgres -H psql -c "INSERT INTO ts VALUES (CURRENT_TIMESTAMP)" '
INSERT 0 1
$ juju ssh postgresql/1 'sudo -u postgres -H psql -c "SELECT * FROM ts ORDER BY t"'
t
-------------------------------
2017-08-29 13:32:35.532336+00
2017-08-29 13:32:40.963069+00
2017-08-29 16:20:46.796298+00
(3 rows)

As you can see, making and restoring PITR backups with Juju is very straightforward. Juju actions abstract away the complexity involved and let us encapsulate operational knowledge in the charm itself, making your operations playbooks simpler. We just hope you never actually need to use it.