Generally, one appreciates new features of PostgreSQL on the release date after anxious inspection of the release notes or having skimmed through the git logs. Every once in a while, particularly when dealing with an older version of PostgreSQL, you will discover a feature that didn't get the necessary fanfare in order to come to your attention, but when you discover it, you're all the happier for it.
I recently ran into an issue with some strange performance issues on a client's PostgreSQL cluster. They were self-hosted on AWS instances/hardware, using replicas cloned from the primary using pg_basebackup and were created with identical machine types, so it had the same IOPS, CPU count, and RAM. However performance on the replicas for specific queries was orders of magnitude worse compared to the primary. Additionally, there were large numbers of pegged backend processes on the replica.
The client then performed a manual VACUUM ANALYZE on the table in question, which resulted in the primary and the replica both returning to <1ms performance for the target query (compared to ~ 10ms on the primary before and 300ms on the replica) and load issues resolving.
Our task was now to explain the data and explain how to prevent this issue from occurring.
Since the machines have the same specs, configuration, and pg_statistics data (due to being part of the physically replicated cluster), this implies that plans for a given query would be the same on a primary or a replica. This further indicates that the likely culprit was that the replica had different data loaded into shared_buffers, so we were running into the typical cold cache sort of situation. Subsequent runs of the same queries should bear that out with similar performance characteristics.
So why was the performance previously bad and could we improve it? After the manual VACUUM was run the performance increased drastically on both primary and replica, indicating that an INDEX ONLY scan was likely being used. Therefore less data needed to be loaded on the replica, so we were no longer IO-limited. This was significant, because in addition to cleaning up dead tuples, VACUUM updates the relation's Visibility Map which is required to be relatively up-to-date for the planner to consider an INDEX ONLY scan.
This was also odd, since the client had regular autovacuum setup, so presumably should be having regular vacuums to keep this up-to-date.
The missing piece of the puzzle
The missing piece of the puzzle is that these tables were insert-only. In versions of PostgreSQL prior to 13, no auto*vacuum* would be kicked off with an insert-only workload, presumably because if there are no updates or deletes, there would be no free space that would need to be reclaimed. autoanalyze was kicked off for insert-only workloads, since there was new data and new statistics that could be gathered. Since autovacuum was not running on this table, the Visibility Map was not updated, and thus the query performance started to suffer over time. Once the Visibility Map was updated, performance returned.
From the PostgreSQL 13 Release notes:
Allow inserts, not only updates and deletes, to trigger vacuuming activity in autovacuum (Laurenz Albe, Darafei Praliaskouski)
Previously, insert-only activity would trigger auto-analyze but not auto-vacuum, on the grounds that there could not be any dead tuples to remove. However, a vacuum scan has other useful side-effects such as setting page-all-visible bits, which improves the efficiency of index-only scans. Also, allowing an insert-only table to receive periodic vacuuming helps to spread out the work of “freezing” old tuples, so that there is not suddenly a large amount of freezing work to do when the entire table reaches the anti-wraparound threshold all at once.
If necessary, this behavior can be adjusted with the new parameters autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor, or the equivalent table storage options.
So this both explains the issue that was seen, as well as shows how it has been solved in future versions of PostgreSQL 13+.
So what are the options for the client now?
- Upgrade to PostgreSQL 13+. While this is on the Roadmap, this is something that will need to be planned out and coordinated and is not an immediate solution.
- Scheduled external VACUUM. Since autovacuum isn't catching this explicitly for the given table, use cron and schedule a regular run of VACUUM ANALYZE on the table in question. The frequency here would be workload dependent, but a good guess would be a weekly run, and see if things need to be moved to more frequently.
- Table storage parameter tweaks. Now we did say that there was no way to force autovacuum to vacuum a specific table, but that's not strictly true. We can (fairly hackishly) tweak the autovacuum_max_freeze_age parameter for the table in question in order to kick off a wraparound-prevention vacuum, which has the side effect of running the type of vacuum we want. This parameter can be set as low as 100k, and determines how many transactions old the database will allow a table to get before forcing a vacuum. This threshold can be set on this table only (and probably should be, if you were to go this route). This does mean that these vacuums are based purely on the number of transactions in the database as a whole, and even if there were no changes to this table whatsoever, this autovacuum would still run at the frequency determined by this parameter.
In short, if you have insert-only tables in a version of PostgreSQL earlier that 13, you could benefit from running a regularly scheduled VACUUM.