Streaming Replication Across Kubernetes Postgres Clusters

Brian Pace
PostgreSQL Operator

A common need when running PostgreSQL in Kubernetes is to establish a standby database in a different Kubernetes cluster. In a typical configuration there is shared storage that both Kubernetes clusters can access. With the pgBackRest tool, this can be S3-compatible storage, GCS, Azure, or NFS. The standby is created from an initial backup taken to the shared storage. As WAL segments are archived, pgBackRest copies the segments to the shared storage. The standby instance then pulls the archived WAL segments from the shared storage and applies them.

The challenge in this configuration is the potential for lag, which depends on how frequently WAL segments are archived; that in turn is a function of database load and the archive_timeout parameter. A second challenge is that once a failover occurs, the source database (the old primary) must be recreated before a failback can occur.

A solution to the lag inherent in replicating solely from the WAL archive is to use streaming replication for the standby. For streaming replication to function, the PostgreSQL primary database must be exposed outside its Kubernetes cluster. In this example, that is accomplished by using the LoadBalancer service type.

Creating Environments

In this article an S3 bucket is configured as the shared storage. The S3 storage is in a separate data center from the production and standby Kubernetes clusters. This is the optimal configuration, providing the highest level of data protection by allowing for the loss of two complete environments with limited loss of data. When this is not possible, the next best option is to host the shared storage in the standby location.

To get started, create the production PostgreSQL cluster. This cluster will be configured with a pgBackRest repository on the shared S3 storage. A second repository on a local PVC could also be added; having a local set of backups helps save time, and potentially cost, for local recovery or cloning activities.

When the production cluster is running, a backup is executed to the S3 storage. When the backup is complete, the next step is to create the standby. The cluster will first be configured in a non-streaming fashion and then converted in a later step to streaming replication.

In order to allow the steps outlined here to be easily reproduced for your own testing, the Postgres Operator Examples repository will be used. To follow along, fork the repository and clone the forked repository to your workstation. The postgres-operator-examples/kustomize directory will serve as the working directory for the rest of the article.

Production

To get started, the production Postgres cluster will be created. This cluster will have one pgBackRest repository that leverages an S3 compatible platform. To create the necessary manifest and supporting files, the following steps are performed:

  • Copy the kustomize/s3 example to kustomize/xkube-prod and then edit kustomize/xkube-prod/postgres.yaml, changing all occurrences of hippo-s3 to xkube.

    cp -r s3 xkube-prod
  • Rename the s3.conf.example file to s3.conf.

    mv xkube-prod/s3.conf.example xkube-prod/s3.conf
  • Edit the s3.conf and enter the S3 key and key secret. Leave the values unquoted.

    [global]
    repo1-s3-key=<YOUR_AWS_S3_KEY>
    repo1-s3-key-secret=<YOUR_AWS_S3_KEY_SECRET>
    
  • Edit the postgres.yaml file and specify the correct values for the s3 storage under spec.backups.pgbackrest.repos.0.s3.

    repos:
      - name: repo1
        s3:
          bucket: "<YOUR_AWS_S3_BUCKET_NAME>"
          endpoint: "<YOUR_AWS_S3_ENDPOINT>"
          region: "<YOUR_AWS_S3_REGION>"  
  • In preparation for streaming replication, add the service section to the spec as shown below. The new spec.service section will expose the Postgres database outside the Kubernetes cluster.

    spec:
      service:
        type: LoadBalancer
  • Add the standby section to the spec. The spec.standby.enabled key is set to false since this is the production and current primary site.

    spec:
      standby:
        enabled: false
        repoName: repo1
  • The last edit to the postgres.yaml is to add the patroni section to the spec. In this section one Postgres parameter, archive_timeout, is set to ensure that during periods when activity is too low to force a WAL segment switch, WAL segments are still archived at least every 'n' seconds. In addition, a pg_hba rule is added to allow the replication user to connect from the standby.

    spec:
      patroni:
        dynamicConfiguration:
          postgresql:
            pg_hba:
              - "hostssl replication replicator all md5"
            parameters:
              archive_timeout: 120
  • Using kubectl, deploy the production Postgres cluster.

    kubectl apply -k xkube-prod
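Taken together, the edits above yield a spec along these lines. This is a sketch only: the remaining required fields (image, instances, storage, and the s3.conf secret wiring) come from the copied s3 example and are omitted here.

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: xkube
spec:
  service:
    type: LoadBalancer          # exposes the primary outside the cluster
  standby:
    enabled: false              # this is the current primary site
    repoName: repo1
  patroni:
    dynamicConfiguration:
      postgresql:
        pg_hba:
          - "hostssl replication replicator all md5"
        parameters:
          archive_timeout: 120  # archive WAL at least every 120 seconds
  backups:
    pgbackrest:
      repos:
        - name: repo1
          s3:
            bucket: "<YOUR_AWS_S3_BUCKET_NAME>"
            endpoint: "<YOUR_AWS_S3_ENDPOINT>"
            region: "<YOUR_AWS_S3_REGION>"
```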

Once the Postgres pod reaches the full ready state, the Operator will automatically perform a full backup. The end state should be one pod running Postgres and a completed backup pod, as seen below.

kubectl get pods

NAME                      READY   STATUS      RESTARTS   AGE
xkube-00-8fkl-0           2/2     Running     0          2m18s
xkube-backup-bfn4-zwc87   0/1     Completed   0          111s

Lastly, capture the external IP that has been assigned to the xkube-ha service, as it will be used to establish streaming replication later in the article.
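A quick way to capture it is to read it from the service status. A minimal sketch, assuming the service is named xkube-ha per the cluster name above:

```shell
# Read the LoadBalancer external IP assigned to the xkube-ha service.
# Note: some load balancer implementations populate .hostname rather than .ip.
EXTERNAL_IP=$(kubectl get svc xkube-ha \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "production external IP: ${EXTERNAL_IP}"
```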

Standby

Next, the standby cluster will be created. To bootstrap the standby Postgres instance, the Operator will use the pgBackRest repo in the S3 bucket. The following steps create the standby instance using the S3 bucket as the source for WAL segments to be replayed on the standby cluster.

  • Copy the xkube-prod directory to xkube-standby.

    cp -r xkube-prod xkube-standby
  • Set the spec.standby.enabled key to true in the xkube-standby/postgres.yaml.

    standby:
      enabled: true
      repoName: repo1
  • Using kubectl, deploy the standby Postgres cluster to the standby Kubernetes cluster.

    kubectl apply -k xkube-standby

Depending on network and storage capabilities, it could take several minutes for the standby instance to bootstrap and for the pod to reach the full ready state.
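Once the pod is ready, one way to confirm that the standby is actually in recovery is to ask Postgres directly. This is a sketch only, reusing the label-selector pattern from the verification commands later in this article:

```shell
# On the standby cluster: a healthy standby reports pg_is_in_recovery() = t.
IN_RECOVERY=$(kubectl exec -c database \
  $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) \
  -- psql -Atc "select pg_is_in_recovery()")
echo "standby in recovery: ${IN_RECOVERY}"
```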

Verify Replication

With both production and standby clusters running, verify replication by creating a table and populating it with some data. After two minutes (the value of archive_timeout) the new table and data should be visible on the standby. To force replication sooner, execute 'select pg_switch_wal();' on the production instance to force a WAL segment switch and archive.

Production

kubectl exec -c database $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) -- psql -c "create table test (chkdate timestamp); insert into test values (current_timestamp)"

Standby

kubectl exec -c database $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) -- psql -c "select * from test"

As WAL segments are archived, pgBackRest copies them to the S3 bucket. At regular intervals, the standby instance performs an archive get to pull down any available archived WAL segments and replays the WAL segments in the standby instance.
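Archiving on the production side can be observed through the standard pg_stat_archiver view. A sketch, using the same pod selector as the commands above:

```shell
# On the production cluster: count of archived WAL segments and any failures.
ARCHIVER=$(kubectl exec -c database \
  $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) \
  -- psql -Atc "select archived_count, failed_count from pg_stat_archiver")
echo "archived_count|failed_count: ${ARCHIVER}"
```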

Standby (Streaming Configuration)

The desired configuration for streaming replication is to leverage the default replication user _crunchyrepl with certificate authentication. This requires that both the production and standby certificates be signed by the same certificate authority. For details on how to integrate the Operator with a central certificate manager, see the "Using Cert Manager to Deploy TLS for Postgres on Kubernetes" blog post.

To keep this example simple, a new user will be created for streaming replication. The following steps enable streaming replication.

  • On the production cluster, create a replication user.

    kubectl exec -c database $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) -- psql -c "create role replicator with login replication password 'Welcome1'"
  • Establish a bash session in the database container on the standby leader pod. Once in, append the following to the /pgdata/pg13/postgresql.base.conf file. The IP address specified for the host parameter should be the external IP assigned to the xkube-ha service on the production Kubernetes cluster.

    primary_conninfo='host=10.1.1.100 user=replicator sslmode=require password=Welcome1'
  • Apply the updated manifest to the standby Kubernetes cluster.

    kubectl apply -k xkube-standby
  • After applying the new manifest, the Operator will automatically make the parameter change. However, the Operator must be instructed to restart the pods for the parameter change to take effect. On the standby side, execute the following to trigger a restart of the Postgres pods.

    kubectl patch postgrescluster/xkube --type merge --patch '{"spec":{"metadata":{"annotations":{"restarted":"'"$(date)"'"}}}}'
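After the pods restart, whether the manual edit took effect can be checked from the standby. A sketch; the value returned should match the primary_conninfo line appended earlier:

```shell
# On the standby cluster: confirm the primary_conninfo setting is active.
CONNINFO=$(kubectl exec -c database \
  $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) \
  -- psql -Atc "show primary_conninfo")
echo "primary_conninfo: ${CONNINFO}"
```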

Verify Streaming Replication

Once the restart is complete and the standby pods are in the full ready state, verify streaming replication using the commands below. If the streaming replication setup was successful, the new rows will show up on the standby within seconds of the insert.

Production

kubectl exec -c database $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) -- psql -c "insert into test values (current_timestamp)"

Standby

kubectl exec -c database $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) -- psql -c "select * from test"
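Beyond watching rows appear, streaming can be confirmed from the system views: pg_stat_replication on the production instance should show a connected walsender, and pg_stat_wal_receiver on the standby should report status 'streaming'. A sketch for the standby side:

```shell
# On the standby cluster: the WAL receiver status should be 'streaming'.
RECV_STATUS=$(kubectl exec -c database \
  $(kubectl get pod --selector="postgres-operator.crunchydata.com/cluster=xkube,postgres-operator.crunchydata.com/role=master" -o name) \
  -- psql -Atc "select status from pg_stat_wal_receiver")
echo "wal receiver status: ${RECV_STATUS:-not connected}"
```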

Conclusion

Setting up streaming replication can reduce lag between the primary and standby Postgres instances running on different Kubernetes clusters. Do note that with this approach you are manually manipulating postgresql.base.conf. At some point Patroni, the Operator, or another process could overwrite this file and the primary_conninfo setting would be lost. In addition, if there are multiple Postgres instances in the standby cluster, the change to postgresql.base.conf has to be made on all instances.
