Devious SQL: Dynamic DDL in PostgreSQL

David Christensen
SQL

Supporting PostgreSQL DBAs is an important part of daily life here at Crunchy Data. I’ve recently run across a few use cases where utility queries based on the current state of the database are needed. A simple example could be where you have a table that is the target of logical replication and the id column becomes out of sync with the sequence that generated the data. This would result in new rows having primary key conflicts. To correct this issue, you would need to set the sequence to generate values past the current max value in the table.

This example is part of a larger class of problems which are best solved with functionality that SQL by itself does not directly provide: Dynamic DDL. Data Definition Language (DDL) in SQL itself is notoriously non-dynamic, with strict parsing rules, predefined data types, table structures, and queries based on known and articulated columns.

So how can we bend SQL to our will and execute Dynamic DDL Postgres queries without having to manually write these queries each time? In this next installment of my Devious SQL series (see posts #1 and #2), I’ll show you some SQL approaches to get the job done.

Altering sequence restart values

Let us again consider a scenario where we want to explicitly provide the RESTART value for a sequence via a query. This is an easy thing to express in terms of what we would like to do: we want to reset a sequence to start after the current maximum value of the table it is associated with.

Trying the naïve approach, we get:

ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
ERROR:  syntax error at or near "(", at character 41
STATEMENT:  ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);

As we can see, this approach isn't supported by the PostgreSQL grammar, as it is expecting an actual value here, not a subquery (as nice as that would be).

So what are some approaches here?

Using psql variable substitution

If we are using psql, we have a few options on how to solve this problem. One approach is using psql variables and first selecting the value we want into a variable, then substituting this value into the expression we pass to psql:

-- use \gset to set a psql variable with the results of this query
SELECT max(id) + 1 as big_table_max from big_table \gset
-- substitute the variable in a new query
ALTER SEQUENCE big_table_id_seq RESTART :big_table_max ;
ALTER SEQUENCE

In this example, we are using the \gset command to capture the results of the first query and store it for use later in the psql session. We then interpolate this variable into our expression using the :big_table_max syntax, which will be passed directly to the PostgreSQL server.

Using psql's \gexec command

Another method of utilizing psql for dynamic SQL is constructing the query as a SELECT statement returning the statements you wish to run, then using the \gexec command to execute the underlying queries. First let's look at making ourselves a query that returns the statement we want, then we'll run this statement using \gexec:

SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table;
SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table \gexec
query
ALTER SEQUENCE big_table_id_seq RESTART 100001
ALTER SEQUENCE

A benefit of this approach compared to the variable substitution one is that this can work with more complex statements and multiple return values, so you could construct queries based on arbitrary conditions and generate more than one SQL query; the first implementation is limited to queries that return single rows at a time. This also gives you a preview of the underlying SQL statement that you will be running before you execute it against the server with \gexec, so provides some level of safety if you were doing some sort of destructive action in the query.

Dynamic SQL without psql

Not everyone uses psql as the interface to PostgreSQL, despite its obvious superiority :-), so are there ways to support dynamic SQL using only server-side tools? As it so happens there are several, using basically the same approach of writing a plpgsql snippet to generate the query, then EXECUTE to run the underlying utility statement. These roughly correlate to the approaches in the psql section above in that they work best for single or multiple dynamic statements.

DO blocks

To use server-side Dynamic SQL we will need to construct our queries using plpgsql and execute the underlying text as if we were issuing the underlying query ourselves.

DO $$
BEGIN
    EXECUTE format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table));
END
$$
LANGUAGE plpgsql;
DO

In this case we are using PostgreSQL's built-in format() function which substitutes arguments similar to printf() in C-based languages. This allows us to interpolate the subquery result we were wanting in this case, resulting in a string that PostgreSQL can EXECUTE and giving us the result we want.

Create an exec() function

Almost identical in function to the DO block, we can also create a simple plpgsql function that simply calls EXECUTE on it input parameter like so:

CREATE OR REPLACE FUNCTION exec(raw_query text) RETURNS text AS $$
BEGIN
  EXECUTE raw_query;
  RETURN raw_query;
END
$$
LANGUAGE plpgsql;

SELECT exec(format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)));
CREATE FUNCTION
exec
ALTER SEQUENCE big_table_id_seq RESTART 100001

This may seem like a fairly pointless change compared to the previous approach, as we have basically only moved our query into a parameter that we pass in, but what it buys us is the ability to call this function against a list of queries that we construct using normal SQL, giving us the option of running each in turn.

Restrictions

So what type of SQL can be run in each of these sorts of approaches, and are there any restrictions in what we can run via Dynamic SQL with these methods? The main consideration about the different approaches is related to commands that need to be run outside of an explicit transaction block.

Consider if we wanted to run a REINDEX CONCURRENTLY on all known indexes, so we used the exec() approach to construct a REINDEX CONCURRENTLY statement for all indexes in the public schema:

SELECT
    exec(format('REINDEX INDEX CONCURRENTLY %I', relname))
FROM
    pg_class
JOIN
    pg_namespace
ON pg_class.relnamespace = pg_namespace.oid
WHERE
    relkind = 'i' AND
    nspname = 'public'
ERROR:  REINDEX CONCURRENTLY cannot be executed from a function
CONTEXT:  SQL statement "REINDEX INDEX CONCURRENTLY big_table_pkey"
PL/pgSQL function exec(text) line 3 at EXECUTE

As you can see here, this won't work as a function due to REINDEX CONCURRENTLY needing to manage its own transaction state; in PostgreSQL, functions inherently run inside a transaction to allow the impact of a function to either completely succeed or completely fail. (Atomicity in ACID.)

Let's try this using \gexec:

SELECT
    format('REINDEX INDEX CONCURRENTLY %I', relname)
FROM
    pg_class
JOIN
    pg_namespace
ON pg_class.relnamespace = pg_namespace.oid
WHERE
    relkind = 'i' AND
    nspname = 'public'
\gexec
REINDEX

Since the \gexec handling is done by psql, the resulting statement is effectively run at the top-level as if it appeared literally in the SQL file.

More advanced usage

Look for a followup blog article where I go into more advanced techniques using Dynamic SQL, particularly using the \gexec function or exec() itself. Until next time, stay devious1!

Footnotes:

1 Devious: longer and less direct than the most straightforward way.

Newsletter