Configuring Binary Replication with pgBackRest

Introduction

PostgreSQL has two forms of native replication: logical replication and binary replication. Logical replication offers tuple-by-tuple changes streamed from a primary server to a secondary server. Binary replication, also known as physical replication, sends changes at a disk block level.

Binary replication allows for backing up an entire database and recovering it to a specific point in time, called point-in-time-recovery (PITR). PostgreSQL accomplishes this by using a write-ahead log (WAL), which details the transactions that occur. This tutorial provides a guide for implementing binary replication between a primary and secondary server.

Assumptions

This tutorial assumes that you are operating on a Linux OS, and the examples shown are specific to Ubuntu 22.04 LTS. It is also assumed that PostgreSQL 14 is already installed on your server, and data in the database already exists. The user performing commands should have sudo access to the postgres user. You will need ssh access to your machines for this tutorial, if you are unaware of how to enable that, please see this tutorial.

Variables

The primary server has an example IP address of 198.168.0.1, and the secondary server has an IP address of 198.168.0.2. The user, database, table, target directories/files, etc. are examples and will need modification specific to you and your system. These are highlighted green in the code examples.

____________________________________________________________________________

Primary Server Procedure

Step 1: PgBackRest Installation and Configuration

PgBackRest is a reliable backup and restore option for PostgreSQL. Since we are using pgBackRest to backup the data in our database, we’re installing it on both the primary and secondary servers. With pgBackRest on both machines, we can call it from the secondary, and it will execute on the primary.

Let us begin by installing it on our primary server. We update the repositories on the servers with:

sudo apt update

And now we install pgBackRest with:

sudo apt -y install pgbackrest

Now that we have pgBackRest installed, we need to configure it for binary replication. It is good practice when altering an important file to first back it up. We shall do so with:

sudo cp /etc/pgbackrest.conf /etc/pbackrest.conf.backup

Now we can edit the configuration without a care in the world (except probably that it works). Let’s edit the configuration file with:

sudo nano /etc/pgbackrest.conf

The following are settings we will alter for this tutorial:

The [global] section in the configuration file defines the location of backups and logging settings.

  • repo1-path defines the path to the location where backups are stored.
  • repo1-retention-full defines how many full backups are kept (2).
  • log-level-console set to info will display detailed information of processes to the console.
  • log-level-file set to debug allows enough information to be sent to the log file for us to troubleshoot if necessary.

The [tutorial] section is the name of the stanza that we are creating.

  • pg1-path defines the path to the data being backed up.

More detailed options can be found here.

Let’s edit the configuration to set the desired parameters:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
log-level-console=info
log-level-file=debug
[tutorial]
pg1-path=/var/lib/postgresql/14/main

Step 2: Create Replication Role

To securely access the data and replicate it, it is important for us to create a role specifically for that purpose.

Let’s launch psql with:

sudo -iu postgres psql

Create the replication role, replicator:

CREATE ROLE replicator WITH REPLICATION LOGIN;

Next, we want to securely set our password. You can set the password while establishing the role, but we do it in a separate command to ensure our password is not displayed on-screen and not recorded in terminal logs. We do so by using the password meta-command and telling psql which role we want to change:

\password replicator

This prompts you to enter and repeat your desired password. Now we have set up our replication role and pgBackRest. Next, it is time to configure PostgreSQL.

Step 3: Primary Server PostgreSQL Configuration

We need to configure postgresql.conf and pg_hba.conf to enable binary replication. Configuration files are found in different places on different systems, but we can locate our configuration file with:

sudo -iu postgres psql -U postgres -c 'SHOW config_file'

In this case, the return was:

config_file
-----------------------------------------
/etc/postgresql/14/main/postgresql.conf
(1 row)

So, let’s edit the contents with your favorite text editor:

sudo nano /etc/postgresql/14/main/postgresql.conf

The comment hash (#) must be removed from the beginning of each line we want to enable so the system will recognize it as an active setting. Find the line with listen_addresses and alter the text within quotes to be the IP address of the primary server, or a wildcard (*) to indicate that PostgreSQL will listen to all addresses it has access to. It will look something like this:

listen_addresses = '198.168.0.1'

or

listen_addresses = '*'

To keep the connection secure, we want the encryption on passwords set to scram_sha_256. Since scram_sha_256 is the default setting for PostgreSQL 14, we don’t need to change anything in the file. But, if you are working with PostgreSQL 13 or older, we can do so by uncommenting the password_encryption line and adding scram_sha_256 to look like this:

password_encryption = scram_sha_256

NOTE: If you choose to use a different encryption method, be sure to include the correct encryption method in the pg_hba.conf file.

The Write-Ahead Log (WAL) is the log of changes made to a database cluster to be used as part of the database recovery process or, as in this case, to replay changes in the database for replication. The wal_level setting determines how much information is written into the WAL file. We only need enough information in the WAL file to perform binary replication, and this is set to replica by default. If you are working with a version older than PostgreSQL 10, this will not be default, and we should set wal_level to replica, like so:

wal_level = replica

The max_wal_senders setting determines the maximum number of concurrent connections from secondary servers or streaming base backup clients. The default setting for max_wal_senders is 10, which is an appropriate setting for a production environment. However, since we will only have one physical replica connected, we can just set this to 3:

max_wal_senders = 3

To archive any data, we must set the archive_mode to on. Doing so allows completed WAL segments to be sent to archive storage:

archive_mode = on

With archiving enabled, we now need to tell PostgreSQL where we want to send the WAL segments. We do so with the archive_command setting, which tells the local shell to archive a completed WAL file segment. With this command, we are telling pgBackRest to point to the configuration settings of the tutorial stanza and push the archive to the designated path, %p (repo1-path):

archive_command = 'pgbackrest --stanza=tutorial archive-push %p'

Now that we have made the changes to this configuration file, we will save and exit the postgresql.conf file.

Step 4: Primary Server Host-Based Authentication Configuration

Next up is configuring the host-based authentication file, pg_hba.conf. This file should be located in the same directory as postgresql.conf, so we open it to edit with:

sudo nano /etc/postgresql/14/main/pg_hba.conf

The pg_hba.conf file has 5 fields that need to be filled out to authenticate a client to use PostgreSQL: the host type (TYPE), database name (DATABASE), user name (USER), IP address (ADDRESS), and encryption method (METHOD).

  • Set TYPE to host to match connection attempts using TCP/IP.
  • Enter replication as the record for DATABASE to have access to databases set up for replication.
  • Enter replicator under USER because the role we created will need access from the secondary.
  • Under ADDRESS, enter the IP address of the secondary server that will be accessing the database.
  • Under METHOD, enter the password encryption method we are using, which is scram-sha-256.

Add the following line to pg_hba.conf:

# TYPE DATABASE USER ADDRESS METHOD

host replication replicator 198.168.0.2/32 scram-sha-256

Now our PostgreSQL configuration is complete, but we still need to enable the changes.

Step 5: Restart PostgreSQL

To enable the configuration changes made to postgresql.conf and pg_hba.conf, we need to restart PostgreSQL. The systemd service unit name varies depending on the OS and software package source you are using. In Debian (Ubuntu) and most Linux systems, we can do so with the systemctl command:

sudo systemctl restart postgresql

The green highlighted portion is likely to vary between systems and versions of PostgreSQL. For example, it is possible that you need to use a command like this:

sudo systemctl restart postgresql-14

Or this:

sudo systemctl restart postgresql@14-main

In these examples, you replace the 14 with the version you are working with.

Secondary Server Procedure

Step 6: Edit Secondary pgBackRest Configuration

Let’s begin by installing pgBackRest on the secondary. First, we update the repositories:

sudo apt update

And now we install pgBackRest with:

sudo apt -y install pgbackrest

Now let’s configure pgBackRest on the secondary server. It is very similar to how we have pgBackRest configured on the primary, but we are adding extra settings to allow binary replication from the primary. Under the [global] section we include:

  • The repo1-host, which is the IP address to the primary server (where the repository is located).
  • We will also include repo1-host-user, which we set to postgres as the OS user who owns the repository. By default, this is the pgbackrest user on the primary server, which is not configured.
  • We also include delta set to y, which automatically uses the delta restore option.

Additionally, we include a few recovery-option settings in the [tutorial] stanza section.

  • The primary_conninfo setting details the connection information to the primary server, and we include the host IP address and the PostgreSQL user responsible for the replication.
  • The recovery_target_timeline set to latest tells PostgreSQL to use the latest WAL file for recovery.

Let’s configure /etc/pgbackrest.conf on the secondary to look like this:

[global]
repo1-path=/var/lib/pgbackrest
repo1-host=198.168.0.1
repo1-host-user=postgres
repo1-retention-full=2
log-level-console=info
log-level-file=debug
delta=y

[tutorial]
pg1-path=/var/lib/postgresql/14/main
recovery-option=primary_conninfo=host=198.168.0.1 user=replicator
recovery-option=recovery_target_timeline=latest

Data Transfer

Step 7: Backup on the Primary

We are ready to create the stanza on the primary server.

The --stanza option is the name of our stanza, which should match the section we added to pgbackrest.conf. The stanza-create command creates the stanza:

pgbackrest --stanza=tutorial stanza-create

Now let’s confirm we have information created with the previous command using the check command:

pgbackrest --stanza=tutorial check

And finally, we arrive at our backup. We use the backup command with the --type option set to full, indicating that we want a full backup:

pgbackrest --stanza=tutorial --type=full backup

Now we have a full backup of our database!

Step 8: Restore on the Secondary

Let’s restore the backup of our primary database on the secondary.

First things first, we need to stop the cluster from running before restoring. The simplest way to do this is to stop PostgreSQL on the primary with:

sudo systemctl stop postgresql

Now that the cluster is not running processes, we use the restore command on the secondary to recreate the data from the primary:

sudo -u postgres pgbackrest --stanza=tutorial --type=standby restore

Let’s start PostgreSQL to enable the changes in a functioning secondary database.

sudo systemctl start postgresql

Follow-Up Procedures

Step 9: Confirm Successful Replication

Let’s confirm that the replication was successful.

Enter the psql prompt and run the following query:

sudo -iu postgres psql
SELECT * FROM select_table;

If the data on the secondary matches the data from the primary, the replication has been successful. Let’s confirm that the data is streaming properly.

Return to primary and insert some data into our select_table (substitute data to match your schema):

INSERT INTO select_table VALUES ('some data', 101);

Make sure the data was inserted properly (substitute data to match what you inserted):

SELECT * FROM select_table WHERE select_column='some data';

For the changes to be reflected on the secondary, we must first backup the data from the primary. Let’s exit psql, and do so as before:

exit
sudo -u postgres pgbackrest --stanza=tutorial --type=full backup

Now, let’s return to the psql prompt on the secondary to make sure the changes have been made:

sudo -iu postgres psql
SELECT * FROM select_table WHERE select_column='some data';

If the data matches the data from the primary, you have successfully set up streaming binary replication!

Step 10: Automate Backups

Now that we have streaming binary replication set up, we want to automate backups to keep our secondary up-to-date with the latest changes. This can be done for a plethora of intervals, but let’s keep it simple and backup every night. We are using the cron service to automate this task. We want our cron job to be carried out by the postgres user, so we will become the postgres user:

sudo -iu postgres

Now we open a cron table (crontab) with:

crontab -e

The format of the crontab is laid out in the following format:

minute hour day-of-month month day-of-week command

To set up daily backups, we add a line to the end of the file.

  • The minute setting is 0 (zero; will occur on the hour)
  • The hour setting is 23 (will occur at 11 pm).
  • The day-of-month, month, and day-of-week settings are the wildcard (*), so it occurs for each of these.

The command is the same used for our full backup:

0 23 * * * pgbackrest --stanza=tutorial --type=full backup

Our primary will now backup each night, and our secondary will be updated daily.

Summary

There you have it! We installed and configured pgBackRest, configured PostgreSQL for binary replication, set up a passwordless SSH connection, transferred data between servers using binary replication, and automated the process to keep our backup current. From now on, you should have a functioning secondary for your primary database server.