Multi-AZ PostgreSQL COMMIT wait events: WALSync, SyncRep & XactSync

When deploying your PostgreSQL application on AWS, you can replicate the database to multiple Availability Zones so that it can fail over in case of failure. This is commonly referred to as “Multi-AZ”, but the term covers several deployment options, and the documentation doesn’t detail how each one works. To achieve failover with a Recovery Point Objective (RPO) of zero, the replication must be synchronous, so that all changes are replicated by the time your commit returns successfully. When PostgreSQL waits, the reason is exposed as a wait event, visible in pg_stat_activity or in Performance Insights on RDS. By examining the wait events for the COMMIT statement, you can determine which replication method your deployment option uses. Here are some examples.
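For example, you can sample the wait events yourself while the workload runs by querying pg_stat_activity (Performance Insights does this sampling for you, every second):

select pid, state, wait_event_type, wait_event, query
  from pg_stat_activity
 where state = 'active' and query ilike 'commit%';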

I have created three databases:

database-1 is an RDS PostgreSQL Multi-AZ DB instance, using block storage replication to another AZ
database-2 is an RDS PostgreSQL Multi-AZ DB cluster, using PostgreSQL WAL replication to two other AZs
database-3 is an Amazon Aurora PostgreSQL-Compatible cluster, using its proprietary WAL replication to storage servers in three AZs

I described those options in a previous blog post.

On each of them, I’ve run PgBench for one hour, inserting rows from five sessions, with a commit every thousand rows:

pgbench -nrf /dev/stdin -c 5 -P 5 -T 3600 <<'SQL'
begin transaction;
insert into demo select from generate_series(1,1000);
commit;
SQL
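The definition of the demo table is not shown here. Any table whose columns all have default values works with this zero-column insert; as an assumption, something like the following is enough:

-- hypothetical table definition (not from the original setup):
-- all columns have defaults, so the zero-column insert adds 1000 default rows per transaction
create table demo (
  id bigint generated always as identity primary key,
  created_at timestamptz not null default clock_timestamp()
);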

I’ll explain what happens on each of my three databases.

Multi-AZ DB instance: IO:WALSync when writing to replicated storage

The Multi-AZ DB instance was the first option available in Amazon RDS. It doesn’t use the database’s replication mechanism; replication is implemented at a lower level, on the block storage (EBS) where all database files are stored. For this reason, there is no specific wait event, but the synchronous replication impacts all writes. At COMMIT, the WAL must be synced to persistent storage to guarantee the Durability of ACID, and this shows up as the IO:WALSync event. Because multiple sessions write to the monolithic WAL stream, the COMMIT statement also waits on LWLock:WALWrite to synchronize their writes.
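In addition to sampling pg_stat_activity, you can read the cumulative WAL write and sync counters from the pg_stat_wal view (the timing columns are filled only when track_wal_io_timing is on, and column availability depends on the PostgreSQL version):

select wal_records, wal_bytes, wal_write, wal_sync,
       wal_write_time, wal_sync_time
  from pg_stat_wal;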

Here is the database load during the test:

Here is the output from PgBench:

query mode: simple
number of clients: 5
number of threads: 1
maximum number of tries: 1
duration: 3600 s
number of transactions actually processed: 168454
number of failed transactions: 0 (0.000%)
latency average = 106.818 ms
latency stddev = 38.720 ms
initial connection time = 1218.503 ms
tps = 46.807315 (without initial connection time)
statement latencies in milliseconds and failures:
33.384 0 begin transaction;
36.695 0 insert into demo select from generate_series(1,1000);
36.740 0 commit;

Replicating all writes is simple but not efficient, as it writes full blocks and waits for the acknowledgment. Performance is acceptable with only one replica, but availability is then limited: if one AZ is down, the database is no longer protected. This is one reason AWS introduced Aurora: to replicate to more replicas with less I/O transfer.

Multi-AZ DB cluster: IPC:SyncRep when sending WAL to standby instance

The Multi-AZ DB cluster deployment option was added more recently and uses PostgreSQL physical replication, shipping the WAL to standby instances that apply it to their own databases and can also serve as read replicas. At commit, in addition to syncing the WAL to the local file (IO:WALSync), the session waits for the standby to acknowledge it (IPC:SyncRep).
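You can confirm this from the primary with the standard replication settings and views (the standby names and the exact synchronous_commit level depend on the managed deployment):

show synchronous_commit;
show synchronous_standby_names;
select application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
  from pg_stat_replication;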

Here is the database load during the test:

Here is the output from PgBench:

query mode: simple
number of clients: 5
number of threads: 1
maximum number of tries: 1
duration: 3600 s
number of transactions actually processed: 164828
number of failed transactions: 0 (0.000%)
latency average = 109.138 ms
latency stddev = 29.712 ms
initial connection time = 2320.518 ms
tps = 45.811266 (without initial connection time)
statement latencies in milliseconds and failures:
32.957 0 begin transaction;
39.538 0 insert into demo select from generate_series(1,1000);
36.643 0 commit;

This approach offers the benefit of utilizing PostgreSQL native replication, which is widely recognized and efficient. However, it requires a standby database to receive and apply the Write-Ahead Log (WAL), resulting in increased costs. Nonetheless, it enables offloading some read operations, such as reporting, to the replicas, particularly those that can tolerate staleness. Having a replica ready to be activated also reduces the Recovery Time Objective (RTO).

Multi-AZ Aurora: IO:XactSync when sending WAL to storage servers

Aurora doesn’t write the WAL to local storage but sends it to the storage servers through its proprietary protocol between the compute and storage layers. This is instrumented with a wait event that doesn’t exist in community PostgreSQL: IO:XactSync.
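IO:XactSync appears in pg_stat_activity like any other wait event, so the same sampling applies; for example, a query counting the commit-related waits discussed in this post:

select backend_type, wait_event_type, wait_event, count(*)
  from pg_stat_activity
 where wait_event in ('XactSync', 'WALSync', 'SyncRep')
 group by backend_type, wait_event_type, wait_event
 order by count(*) desc;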

Here is the database load during the test:

Here is the output from PgBench:

query mode: simple
number of clients: 5
number of threads: 1
maximum number of tries: 1
duration: 3600 s
number of transactions actually processed: 130509
number of failed transactions: 0 (0.000%)
latency average = 137.876 ms
latency stddev = 44.699 ms
initial connection time = 1186.606 ms
tps = 36.263520 (without initial connection time)
statement latencies in milliseconds and failures:
32.617 0 begin transaction;
56.831 0 insert into demo select from generate_series(1,1000);
48.428 0 commit;

The higher latency and lower throughput come from writing to a storage layer with six replicas across three Availability Zones: four out of the six replicas must acknowledge the Write-Ahead Log (WAL), which is later applied to the blocks. The read replicas also need to receive the WAL asynchronously to invalidate their cache and read the newer version from the distributed storage. Contrary to popular belief, Aurora offers additional features such as higher availability and faster point-in-time recovery, but not necessarily better performance.

Summary

Protecting the database from failures impacts performance because synchronous replication is required for no-data-loss recovery. A single PostgreSQL instance is always faster, but it may lose data or suffer longer downtime. Multi-AZ allows access to the database from multiple Availability Zones, but it is an active-passive configuration where only one instance at a time can open the database for read/write operations. To take it a step further and connect your PostgreSQL application to a database that is active in all Availability Zones, you can get YugabyteDB from the marketplace. When you compare its performance, it is important to compare it with these Multi-AZ deployments rather than with a single instance.