Postgres Load Testing with pgreplay


Usually we want to test before deploying changes to a production Postgresql cluster. Commonly the test itself is executed in a context that is not very similar to the production environment. How can you run a test that is realistic so there are no horrible surprises when you deploy on the production system?  Read on! This document describes a procedure where changes can be tested on a system that is running a load that is nearly identical to a production system.

This procedure utilizes pgreplay, which reads postgres' server logs and executes the sql statements found there, with the same timing that they were executed on the production system.

This procedure also uses lvm and its volume snapshot feature for fast testing iteration.

Collect Production Logs

Normally only select sql statements (such as slow queries) are logged to the server log. We will temporarily change the cluster configuration so that ALL sql statements are logged, as well as some other things.

It's convenient to use the "alter system" statement for this, if the Postgresql version is recent enough; if it's not recent enough, the "include" directive in postgresql.conf can be used instead.

alter system set log_destination = 'stderr,csvlog';
alter system set log_filename = 'postgresql-%Y-%m-%d_%H%M%S_verbose.log';
alter system set log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '; --only change if not using csv
alter system set log_statement = 'all';
alter system set log_min_messages = error;
alter system set log_min_error_statement = log;
alter system set log_connections = on;
alter system set log_disconnections = on;

Reload postgres.

For a more realistic test, it helps to find a point in time with no open transactions. This point in time will be used later in the replay, and allows the replay to start in a consistent state. Run the following query interactively until it returns a timestamp, and record the timestamp for later:

select now()
from pg_stat_activity
    xact_start is not null
    and query not like 'autovacuum%'
having count(*) = 0;

Keep the temporary logging settings in effect during a time and for a duration that you judge to be representative of production activity. One hour is a starting point to consider. After that duration, reset the configs:

alter system reset all;

or, if you have other configs that you don't want to reset, then reset these individually:

alter system reset log_destination;
alter system reset log_filename;
alter system reset log_line_prefix;
alter system reset log_min_duration_statement;
alter system reset log_min_messages;
alter system reset log_min_error_statement;
alter system reset log_connections;
alter system reset log_disconnections;

Copy the log file (csv preferably) to the test system.

On the test system, edit the first log file, deleting all lines preceding the timestamp recorded above.

Restore the Test Cluster

You need to restore the test cluster to the exact timestamp determined above. Now write a file recovery.conf using that timestamp:

standby_mode = on
recovery_target_timeline = 'latest'
recovery_target_inclusive = false
recovery_target_time = '2017-07-18 15:06:19.348'
restore_command = 'cp /path/to/archive/%f %p'

Use your base backups to recover a new cluster to a lvm volume using this recovery.conf file. After recovering from WAL archives to the designated timestamp, recovery will pause. At this point you want it to stay paused -- do not "resume" recovery. Instead, simply stop the cluster.

Snapshot the Filesystem

You are going to run these tests multiple times. Recovering from a base backup after each test is time-consuming, and we can use filesystem snapshots as a shortcut.

Note: Running a cluster on a filesystem snapshot creates performance overhead that will differ from your production system. If you need the test system to have equal performance to production, then skip the filesystem snapshot and instead take the long slow route of recovering from base backups after each test.

More typically, it is more important to compare pre- and post- deployment performance to each other, not necessarily compare to production performance. The snapshot volume only needs to be large enough to hold the changed disk pages that your tests will create. Create the snapshot and mount it with something like this:

lvcreate --size 10G --snapshot --name pg_testing_snapshot /dev/vg1/existing_cluster_here
mount /dev/vg1/pg_testing_snapshot /mnt/pg_testing

Later, pgreplay is going to make lots of connections with usernames from production. You need to set up some loose authentication configurations for that in pg_hba.conf, such as:

local      all      all     trust

Start up the cluster whenever you are ready. It will recover to point it time configured before. Promote the cluster with "select pg_xlog_replay_resume()". Check the logs to confirm operation.

Run Workload Test with pgreplay

pgreplay is simple to use. It's also simple to install. If you are using the pgdg repos, just "yum install pgreplay" or "apt-get install pgreplay". pgreplay can read raw log files but you can first test the log file for pgreplay-readiness and simultaneously convert it to binary "replay" format that needs less parsing by pgreplay:

pgreplay -f -c -o biglog.pgreplay biglog.csv

Then start your test: execute pgreplay, passing it the "replay" file. Your cluster will start processing sql just like the production system did earlier:

pgreplay -r biglog.pgreplay

Rollback the Tests: Drop the lvm Snapshot

Perform any other tests you want on the cluster. When you are done with your test, you can drop the snapshot and start over.

Caution: all data in the snapshot will be lost, ie. you will have just the filesystem that it was snapshot from. Copy any notes, log files, scripts etcetera elsewhere if you want them saved.

Stop the cluster, then:

umount /dev/vg1/pg_testing_snapshot
lvremove /dev/vg1/pg_testing_snapshot

From here, to start with a new test, go back to step "Snapshot the Filesystem". Repeat until everything is just right!