PostgreSQL 14 on Kubernetes (with examples!)

Jonathan S. Katz
Kubernetes PostgreSQL Operator PGO

Last week PostgreSQL 14 became generally available to the public. The release contains a lot of cool stuff, including features that I can appreciate from my application development background like better JSON and multirange types. There are also a lot of little things that make PostgreSQL 14 a really cool release, and I encourage you to explore the release notes to find a feature that will make your life easier.

Crunchy Bridge, our managed service available on Amazon, Microsoft, and Google, made PostgreSQL 14 available only several hours after it was announced, and we wanted to bring the PostgreSQL 14 experience as quickly as possible to PGO, the open source Postgres Operator for Kubernetes, and we're pleased to announce that it is already generally available.

With some of the distributed computing improvements to PostgreSQL 14, I thought going through a quick demo of getting PostgreSQL 14 up and running on Kubernetes would be a great way to show off these new features!

Setting up PostgreSQL 14

After installing PGO (I do recommend the quickstart!), you can get up and running with PostgreSQL 14 on Kubernetes with the following manifest:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
name: hippo
spec:
image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
postgresVersion: 14
instances:
- dataVolumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
backups:
pgbackrest:
image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
repos:
- name: repo1
volume:
volumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi

However, let's create three separate Postgres clusters. You can use the manifests below, or build off of a manifest from the Postgres Operator examples repo:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
name: hippo
spec:
image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
postgresVersion: 14
instances:
- dataVolumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
backups:
pgbackrest:
image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
repos:
- name: repo1
volume:
volumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
name: rhino1
spec:
image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
postgresVersion: 14
instances:
- dataVolumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
backups:
pgbackrest:
image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
repos:
- name: repo1
volume:
volumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
users:
- name: postgres
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
name: rhino2
spec:
image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.0-0
postgresVersion: 14
instances:
- dataVolumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
backups:
pgbackrest:
image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
repos:
- name: repo1
volume:
volumeClaimSpec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: 5Gi
users:
- name: postgres

You should have your PostgreSQL 14 clusters up and running fairly quickly! This manifest does allow for exposing the postgres superuser from the rhino Postgres cluster. This is mainly for convenience to show off part of our example: in a production environment, you should be sure to set appropriate roles and privileges when using foreign data wrappers.

Anyway, we can confirm that we're running PostgreSQL 14 with this quick command (I'm assuming you've deployed your clusters to the postgres-operator namespace):

kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o name) -- \
psql -c 'SELECT version();'

which should yield something similar to:

                                                version                                                 
--------------------------------------------------------------------------------------------------------
PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

Cool.

There are many ways you can connect to your database, but for this example, we will use the kubectl exec pattern like above.

Example: PostgreSQL 14 Async Foreign Table Queries

One of the many really cool features of PostgreSQL 14 is the ability to perform queries on multiple [foreign tables] in parallel. This capability is enabled by setting the async_capable option to 'true' on either the foreign server or a foreign table definition. This is already implemented for the postgres_fdw.

We create two Postgres clusters named rhino1 and rhino2. These are going to contain data that we will query from the hippo Postgres cluster. First, we need to add some data to each of the rhino Postgres clusters.

Log into rhino1:

kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=rhino1,postgres-operator.crunchydata.com/role=master' -o name) -- psql

and execute the following SQL:

CREATE SCHEMA IF NOT EXISTS rhino1;

CREATE TABLE IF NOT EXISTS rhino1.stats (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO rhino1.stats (data, created_at)
SELECT
json_build_object('sample', x, 'node', 'rhino1', 'stat', random()),
clock_timestamp()
FROM generate_series(1,2000000) x;

This creates a schema with some randomly generated data. Let's do something similar on rhino2:

kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=rhino2,postgres-operator.crunchydata.com/role=master' -o name) -- psql

Execute the following SQL on rhino2:

CREATE SCHEMA IF NOT EXISTS rhino2;

CREATE TABLE IF NOT EXISTS rhino2.stats (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO rhino2.stats (data, created_at)
SELECT
json_build_object('sample', x, 'node', 'rhino2', 'stat', random()),
clock_timestamp()
FROM generate_series(1,2000000) x;

Great! Now before we set up hippo to be able to perform parallel foreign table queries, we need to capture the password for the postgres user on rhino1 and rhino2 (I will also add the obligatory "in production, you would not use the postgres user but rather set up a special purpose user for accessing the foreign tables"). You can grab the password and store it to a couple of environmental variables:

postgres_rhino1=$(kubectl -n postgres-operator get secrets rhino1-pguser-postgres -o jsonpath='{.data.password}' | base64 -d)
postgres_rhino2=$(kubectl -n postgres-operator get secrets rhino2-pguser-postgres -o jsonpath='{.data.password}' | base64 -d)

# if you want to display the passwords on your terminal, uncomment the lines below
# echo $postgres_rhino1
# echo $postgres_rhino2

Alright, let's log into the hippo Postgres cluster:

kubectl -n postgres-operator exec -it -c database \
$(kubectl get pods -n postgres-operator --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o name) -- psql

First, let's ensure that the postgres_fdw is enabled:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

Next, let's set up the definitions for how we can connect to the rhino clusters:

CREATE SERVER IF NOT EXISTS rhino1
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'rhino1-primary.postgres-operator.svc', dbname 'postgres', sslmode 'verify-full', sslrootcert '/pgconf/tls/ca.crt', async_capable 'true');

CREATE SERVER IF NOT EXISTS rhino2
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'rhino2-primary.postgres-operator.svc', dbname 'postgres', sslmode 'verify-full', sslrootcert '/pgconf/tls/ca.crt', async_capable 'true');

There are a couple of things to note here:

  • We know the name of our host based on how PGO sets up services.
  • This example also demonstrates how to use Postgres' sslmode verify-full when connecting between Postgres instances. verify-full provides the strongest level of protection for connecting to a Postgres instance, and PGO lets you do this out-of-the-box!
  • The other key bit is the async_capable 'true' option. This enables parallel scans on foreign tables across all foreign tables created for that server. This is both for convenience for the fact that this option is not enabled by default in PostgreSQL 14.

Next, create the "user mappings" that tell Postgres how hippo can authenticate into rhino1 and rhino2:

CREATE USER MAPPING IF NOT EXISTS FOR postgres
SERVER rhino1
OPTIONS (user 'postgres', password '$postgres_rhino1');

CREATE USER MAPPING IF NOT EXISTS FOR postgres
SERVER rhino2
OPTIONS (user 'postgres', password '$postgres_rhino2');

Note that $postgres_rhino1 and $postgres_rhino2 are placeholders for your actual credentials.

Now we can start creating foreign tables. For conveniences, let's use the IMPORT FOREIGN SCHEMA functionality:

CREATE SCHEMA IF NOT EXISTS rhino1;
IMPORT FOREIGN SCHEMA rhino1
FROM SERVER rhino1
INTO rhino1;

CREATE SCHEMA IF NOT EXISTS rhino2;
IMPORT FOREIGN SCHEMA rhino2
FROM SERVER rhino2
INTO rhino2;

Now, we can finally test the ability for executing queries against foreign tables in parallel! Here is a simple query to test the functionality (the example is a bit contrived):

EXPLAIN ANALYZE SELECT avg(x.stat) FROM (
SELECT data['stat']::float AS stat FROM rhino1.stats
UNION ALL
SELECT data['stat']::float AS stat FROM rhino2.stats
) x;

In my Kubernetes environment, this yield the following plan:

                                                                  QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=424.80..424.81 rows=1 width=8) (actual time=13700.998..13701.000 rows=1 loops=1)
-> Append (cost=100.00..412.00 rows=5120 width=8) (actual time=2.937..13410.193 rows=4000000 loops=1)
-> Async Foreign Scan on stats (cost=100.00..193.20 rows=2560 width=8) (actual time=1.394..5337.504 rows=2000000 loops=1)
-> Async Foreign Scan on stats stats_1 (cost=100.00..193.20 rows=2560 width=8) (actual time=1.760..4807.722 rows=2000000 loops=1)
Planning Time: 1.019 ms
Execution Time: 13704.199 ms

Cool -- you can see the PostgreSQL 14 feature on the lines that say "async foreign scan"!

Next Steps

As mentioned at the top of the blog post, there are a lot of features to unpack in PostgreSQL 14. You can easily explore Postgres 14 in Kubernetes with PGO while having Postgres clusters that are production ready. I definitely encourage you to try out Postgres 14 with PGO or Crunchy Bridge and see which features help make your data management easier!

Newsletter