Fun with pg_checksums

Greg Sabino Mullane
pgBackRest Database Performance

Data checksums are a great feature in PostgreSQL. They are used to detect any corruption of the data that Postgres stores on disk. Every system we develop at Crunchy Data has this feature enabled by default. It's not only Postgres itself that can make use of these checksums. Some third party tools such as the awesome pgBackRest program can also use them for enhanced data integrity.

Sadly, enabling data checksums is still not the default behavior when creating a new Postgres cluster. When you invoke the initdb program, add the ‑‑data‑checksums flag (or ‑k if you prefer obscure arguments), and your shiny new Postgres cluster will have data checksums enabled.

What's that, you say? You already created your database without checksums, and now you want to add them back in? Lucky for you, there is now an easy way to do that via the pg_checksums utility. In the past, the only way to enable checksums was to create a new cluster and dump your old database to it using the ol' pg_dump | psql method. (Or a more complicated migration using logical replication, Bucardo, etc.) Unfortunately, those methods are very, very slow and error prone. Version 12 of Postgres introduced a new built‑in program named pg_checksums which can enable checksums on your existing checksum‑less database.

Before we fire off some demonstrations, let's review what data checksums do. First, they only checksum the actual table and index data files used by Postgres, but there are other parts of the system that are not checked. Not the system catalogs, nor things like the free space map - only the data itself. Postgres stores data as a bunch of "pages" inside the files on disk. These pages are what are actually checksummed. When the page moves from memory to disk, a checksum of the page gets added to the page header. If the data changes outside of Postgres (i.e. gets corrupted), the checksum will no longer match, and Postgres (as well as pgBackRest!) will complain about the failed checksum.

Here's the help output for the pg_checksums program; as you can see, usage is simple:

$ pg_checksums --help
pg_checksums enables, disables, or verifies data checksums in a PostgreSQL database cluster.

Usage:
pg_checksums [OPTION]... [DATADIR]

Options:
[-D, --pgdata=]DATADIR data directory
-c, --check check data checksums (default)
-d, --disable disable data checksums
-e, --enable enable data checksums
-f, --filenode=FILENODE check only relation with specified filenode
-N, --no-sync do not wait for changes to be written safely to disk
-P, --progress show progress information
-v, --verbose output verbose messages
-V, --version output version information, then exit
-?, --help show this help, then exit

If no data directory (DATADIR) is specified, the environment variable PGDATA
is used.

Report bugs to <pgsql-bugs@lists.postgresql.org>.
PostgreSQL home page: <https://www.postgresql.org/>

On to the demo! For this, we will create a new Postgres cluster (version 14), but without checksums. Before we start, we are going to set the PGDATA environment variable, which allows us to stop telling all these Postgres utilities that "data" is the name of our data directory. It also avoids the confusing collision seen above, as pg_checksums has both -D and -d as arguments!

$ export PGDATA=data
$ initdb --no-instructions --auth=trust
The files belonging to this database system will be owned by user "greg".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Livingstone/Zambia
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Before we start up the database, let's enable better Postgres logging, and disable all TCP/IP listening, even on localhost:

$ echo -e "logging_collector=on\nlisten_addresses='' " >> data/postgresql.conf
$ pg_ctl start
waiting for server to start....
2020-09-05 00:00:47.914 EDT [8499] LOG: redirecting log output to logging collector process
2020-09-05 00:00:47.914 EDT [8499] HINT: Future log output will appear in directory "log".
done
server started

Time for another ENV shortcut. In this case, we want to always use a database named "testdb". After creating the database, we will use the pgbench utility to populate some sample data:

$ export PGDATABASE=testdb
$ createdb
$ pgbench --initialize --scale=300
dropping old tables...
creating tables...
generating data (client-side)...
30000000 of 30000000 tuples (100%) done (elapsed 22.60 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 39.61 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 22.89 s, vacuum 3.29 s, primary keys 13.21 s).

Before we enable checksums on this new Postgres cluster, let's confirm that data checksums are not currently enabled. There are two ways to do this: a direct SQL query, and the pg_controldata utility. Let's see both of them in action:

$ psql -Axtc 'show data_checksums'
data_checksums|off

$ pg_controldata | grep -E 'state|checksum'
Database cluster state: in production
Data page checksum version: 0

(Aside: While I generally prefer long options, I bend my own rules when it comes to common psql arguments. The ‑Axtc can also be written as ‑‑no‑align ‑‑expanded ‑‑tuples‑only ‑‑command).

Note that for the pg_checksums program to work, the database must shut down in a clean way (more on that later). So let's attempt to use pg_checksums on a running Postgres cluster, then shut the database down, and finally view the new cluster state:

$ pg_checksums --pgdata data --enable
pg_checksums: error: cluster must be shut down

$ pg_ctl stop --mode=fast
waiting for server to shut down.... done
server stopped

$ pg_controldata | grep -E 'state|checksum'
Database cluster state: shut down
Data page checksum version: 0

Time for the magic to happen - let's change this database from a database without checksums to a Postgres database with checksums! We will add the ‑‑progress option, which gives a once‑a‑second update on the percent completed, assuming you are not redirecting the output somewhere.

$ time bin/pg_checksums --enable --progress
4518/4518 MB (100%) computed
Checksum operation completed
Files scanned: 1239
Blocks scanned: 578376
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums enabled in cluster

real 0m4.719s
user 0m1.157s
sys 0m1.669s

Note that I specified the shutdown mode above when doing the pg_ctl stop, by using the --mode=fast option. This instructs Postgres to kick everyone out right away, and then cleanly shut down. I find it better to always explicitly set this mode, even if "fast" is the default, because the other two modes have some serious drawbacks.

Using smart mode (--mode=smart) only shuts the database down once all clients have disconnected first. If one is about to bring the whole database down anyway, it is better to stop any existing connections from the database side via --mode=fast. The other mode, immediate (--mode=immediate), should be called "emergency stop" as it brings the database down as quick as possible. This can lead to problems, as the database is now in a broken state, and the pg_checksums program will refuse to run:

$ pg_ctl stop --mode=immediate
waiting for server to shut down.... done
server stopped

$ pg_controldata | grep -E 'state|checksum'
Database cluster state: in production
Data page checksum version: 0

$ pg_checksums --pgdata data --enable
pg_checksums: error: cluster must be shut down

$ psql -Axtc 'show data_checksums'
psql: error: could not connect to server: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

As you can see above, pg_controldata thinks the server is in production, but it has been shut down, as proved by the failure of psql to connect. So if your pg_checksums attempts are failing even though you know the database is shut down, check your pg_controldata. To fix it, start the server up then shut it down again cleanly:

$ pg_ctl start; pg_ctl stop --mode=fast
waiting for server to start....
2020-09-05 00:38:26.866 EDT [7547] LOG: starting PostgreSQL 14.0
2020-09-05 00:38:26.866 EDT [7547] LOG: redirecting log output to logging collector process
2020-09-05 00:38:26.866 EDT [7547] HINT: Future log output will appear in directory "log".
done
server started
waiting for server to shut down.... done
server stopped

$ pg_controldata | grep state
Database cluster state: shut down

There are two other things that the pg_checksum program can do besides --enable: --check, and --disable. Let's see both of them in action. (This program used to be named "pg_verify_checksums", so of course it can handle basic verification).

$ pg_checksums --check
Checksum operation completed
Files scanned: 1239
Blocks scanned: 578376
Bad checksums: 0
Data checksum version: 1

$ time pg_checksums --disable
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums disabled in cluster

real 0m0.180s
user 0m0.013s
sys 0m0.025s

Note how fast the disable was - practically instantaneous. This is because disable does not bother to "zero out" the existing checksums, but only modifies the Postgres control file, and tells it to not use checksums, even though checksums are still set in the data files themselves! This is safe, as pg_checksums is the only known program to change the Postgres control file with regards to the checksums.

If we use ‑‑enable again, we will get the same timings, even though the pages already have checksums in them. The pg_checksums program generates a checksum for each page and rewrites the header with the new checksum, overwriting whatever was there, even if the same checksum. I've got a patch to remove this limitation in Postgres version 15 - it might be merged by the time you read this! :)

A drawback to pg_checksums is that is has no built‑in way to prevent Postgres from starting up while it is running, which would be a bad thing. One way around this is to sabotage Postgres and prevent it from starting up. While one might think that putting a syntax error into the postgresql.conf file would do the trick, it is possible to start Postgres with an alternate configuration file. We can completely prevent startup by temporarily renaming one of the core files that Postgres needs. I like to pick on the pg_twophase directory, as it is not needed by pg_checksums itself, but is key to starting Postgres up. So your complete process would look like this:

$ pg_ctl stop
$ mv data/pg_twophase data/pg_twophase.DO_NOT_START_THIS_DATABASE
$ pg_checksums --enable --progress
$ mv data/pg_twophase.DO_NOT_START_THIS_DATABASE data/pg_twophase
$ pg_ctl start

How long will it take to enable checksums? It depends on the speed of your system and the size of your database. The pg_checksums program is going to rewrite every data file, so you can get an estimate by running it on a similar system with a smaller database, then doing some multiplication. My laptop was able to do about 1.5 gigabytes per second - a production box should be faster than that.

The pg_checksums program rewrites the actual data files, but it does not remove database bloat? The only part that is rewritten is the page header - and this means every page for both normal and bloated tables. Your database size will be exactly the same afterwards.

How else to speed things up? The patch I've submitted means that files are only modified if the checksums have changed, which means you could perform the checksum enabling in stages. Run the program for a while, then stop it, bring the database back up, and try again later. Each iteration will get further than the last, as already‑checksummed pages will be skipped. Another future idea is to enable parallel enabling of checksums. This would have the greatest win if your data is spread across more than one file system, such as when using tablespaces.

What about the performance impact of checksums - do they slow everything down, due to the overhead of computing checksums? That's a topic for another day, but the short answer is that the impact is very minimal.

So, are checksums worth the temporary hassle? Absolutely. With the pg_checksums program you can enable checksums on an existing database with a small downtime, for a greater peace of mind.

Newsletter