Replication

Setting up replication for PipelineDB is identical to setting up replication on regular PostgreSQL. If you’ve already done so in the past, all of this will sound extremely familiar. If not, it’s worth a read, because replication on PostgreSQL has a lot of quirks, mostly as a result of how it has evolved over time. The history of replication in PostgreSQL by Peter Eisentraut is a fun read if you want to learn more about this evolution.

We’re not going to look at older replication methods such as Log-Shipping Standby Servers, since they’re overly complex and not very robust. The only reason to use them would be if you were running an old PostgreSQL version, but since we’re built into the PostgreSQL 9.5 core, we can leverage all the latest and greatest features PostgreSQL has to offer.

Streaming Replication

PipelineDB supports PostgreSQL’s streaming replication (both asynchronous and synchronous variants) out of the box. Using streaming replication, we can create a hot-standby node which keeps up to date with the primary by tailing the write-ahead log (WAL) and can serve read-only requests. In case the primary fails, the hot standby can be promoted to be the new primary.

Let’s say we already have a PipelineDB instance running on localhost:5432. The first thing we need to do is create a role on the primary with REPLICATION privileges. This role will be used by the standby to connect to the primary and stream the WAL.

$ psql -h localhost -p 5432 -d postgres -c \
"CREATE ROLE replicator WITH LOGIN REPLICATION;"

CREATE ROLE

You can also create the role with a PASSWORD option, in case your primary is exposed to the open network (free tip: it never should be).
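
For example, creating the role with a password would look like this (the password here is just a placeholder, pick your own):

$ psql -h localhost -p 5432 -d postgres -c \
"CREATE ROLE replicator WITH LOGIN REPLICATION PASSWORD 'sup3rs3cret';"

If you go this route, remember to use an authentication method such as md5 instead of trust in the pg_hba.conf entry below, and to include the password in the standby’s connection settings.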

Next we need to add an entry for the standby to the pg_hba.conf file. You can find it in the data directory of the primary. The pg_hba.conf file handles all client authentication details for PipelineDB. For our example, we’ll append the following entry to it, but for any real-world setup it will almost always be different.

host replication replicator 0.0.0.0/0 trust
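
A more realistic entry would restrict connections to the subnet the standby lives on and require password authentication, along these lines (the subnet here is made up for illustration):

host replication replicator 10.0.0.0/24 md5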

Next, we need to set a few configuration parameters on the primary, either by updating the postgresql.conf file or by passing them as -c flags when starting PostgreSQL. Set wal_level to hot_standby, hot_standby to on, max_replication_slots to 1, and max_wal_senders to 2. Even though a single standby node needs only one sender connection, two are needed while bootstrapping (not strictly necessary, but at least with the method documented below). You will need to restart the primary after updating these parameters.
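
Concretely, the relevant postgresql.conf lines on the primary are:

wal_level = hot_standby
hot_standby = on
max_replication_slots = 1
max_wal_senders = 2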

Last, we will create a replication slot for the standby. Replication slots are a means for the standby to register with the primary, so that the primary is always aware of which WAL segments need to be kept around. Once a standby has consumed a WAL segment, it updates the restart_lsn column in the pg_replication_slots catalog so that the primary knows it can now garbage collect that segment.

$ psql -h localhost -p 5432 -d postgres -c \
"SELECT * FROM pg_create_physical_replication_slot('replicator_slot');"

    slot_name    | xlog_position
-----------------+---------------
 replicator_slot |
(1 row)
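
You can inspect the slot at any time, and later on watch its restart_lsn advance as the standby consumes WAL, via the pg_replication_slots catalog:

$ psql -h localhost -p 5432 -d postgres -c \
"SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"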

This is all the setup work we need to do on the primary. Let’s move on to the standby now. The first thing we need to do on the standby is take a base backup of the primary. You can think of this as rsync-ing the primary’s data directory, which the standby will use as its starting point. For this we use the pipeline-basebackup utility (analogous to pg_basebackup). You can also use rsync, which tends to be a little faster, but at the added complexity of dealing with authentication setup yourself. pipeline-basebackup uses a normal PostgreSQL connection to ship all the base files, so you don’t have to worry about auth details.

$ pipeline-basebackup -X stream -D /path/to/standby_datadir -h localhost -p 5432 -U replicator

The -X stream argument is what requires the second WAL sender connection while taking a base backup. Essentially, it streams the WAL for changes that take place while the base backup is happening, so we don’t need to manually configure the wal_keep_segments parameter.

The final thing we need to do is write a recovery.conf file in the standby’s data directory, which tells the PipelineDB instance that it needs to operate in standby mode and how to connect to the primary node. For us it will look like:

standby_mode = 'on'
primary_slot_name = 'replicator_slot'
primary_conninfo = 'user=replicator host=localhost port=5432'
recovery_target_timeline = 'latest'
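
This gives us asynchronous replication, which is the default. If you want the synchronous variant mentioned earlier, where a commit on the primary doesn’t return until the standby has acknowledged the WAL, a minimal sketch is to pick a name for the standby (‘standby1’ is an arbitrary choice here), list it on the primary, and have the standby announce itself under that name:

# postgresql.conf on the primary
synchronous_standby_names = 'standby1'

# primary_conninfo in the standby's recovery.conf
primary_conninfo = 'user=replicator host=localhost port=5432 application_name=standby1'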

We’re all set now. Let’s fire off the hot standby on port 6544.

$ pg_ctl start -D /path/to/standby_datadir -o "-p 6544"

You should see something like the following in the standby’s log file:

LOG:  entering standby mode
LOG:  redo starts at 0/5000028
LOG:  consistent recovery state reached at 0/50000F0
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 0/6000000 on timeline 1

Just to make sure, connect to the standby and confirm it’s in recovery mode.

$ psql -h localhost -p 6544 -d postgres -c \
"SELECT pg_is_in_recovery();"

 pg_is_in_recovery
-------------------
 t
(1 row)
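
Optionally, you can also confirm from the primary’s side that the standby is connected and streaming by querying the pg_stat_replication view:

$ psql -h localhost -p 5432 -d postgres -c \
"SELECT client_addr, state, sync_state FROM pg_stat_replication;"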

High Availability

PostgreSQL doesn’t come with high availability options out of the box. Most deployments rely on manually promoting the hot standby in case of a primary failure. Failover can be triggered by running pg_ctl promote, or by touching a trigger file if there is a trigger_file setting in the recovery.conf file. Compose.io has a good blog post about how they designed their HA solution. You could potentially reuse their Governor system; just make sure to change the PostgreSQL binaries referenced in the code to their PipelineDB equivalents.
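
For reference, manually promoting the standby from our example would look like this:

$ pg_ctl promote -D /path/to/standby_datadir

Alternatively, if you had added a line like trigger_file = '/tmp/promote_standby' to recovery.conf (the path is an arbitrary choice), touching that file triggers the same promotion:

$ touch /tmp/promote_standby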

Please get in touch if all of this seems inadequate and we’ll help you figure something out!