Supporting PostgreSQL DBAs is an important part of daily life here at Crunchy Data. I’ve recently run across a few use cases where utility queries based on the current state of the database are needed. A simple example could be where you have a table that is the target of logical replication and the
id column becomes out of sync with the sequence that generated the data. This would result in new rows having primary key conflicts. To correct this issue, you would need to set the sequence to generate values past the current max value in the table.
This example is part of a larger class of problems which are best solved with functionality that SQL by itself does not directly provide: Dynamic DDL. Data Definition Language (DDL) in SQL itself is notoriously non-dynamic, with strict parsing rules, predefined data types, table structures, and queries based on known and articulated columns.
So how can we bend SQL to our will and execute Dynamic DDL Postgres queries without having to manually write these queries each time? In this next installment of my Devious SQL series (see posts #1 and #2), I’ll show you some SQL approaches to get the job done.
Altering sequence restart values
Let us again consider a scenario where we want to explicitly provide the
RESTART value for a sequence via a query. This is an easy thing to express in terms of what we would like to do: we want to reset a sequence to start after the current maximum value of the table it is associated with.
Trying the naïve approach, we get:
ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
ERROR: syntax error at or near "(", at character 41 STATEMENT: ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
As we can see, this approach isn't supported by the PostgreSQL grammar, as it is expecting an actual value here, not a subquery (as nice as that would be).
So what are some approaches here?
psql variable substitution
If we are using
psql, we have a few options on how to solve this problem. One approach is using
psql variables and first selecting the value we want into a variable, then substituting this value into the expression we pass to psql:
-- use \gset to set a psql variable with the results of this query SELECT max(id) + 1 as big_table_max from big_table \gset -- substitute the variable in a new query ALTER SEQUENCE big_table_id_seq RESTART :big_table_max ;
In this example, we are using the
\gset command to capture the results of the first query and store it for use later in the
psql session. We then interpolate this variable into our expression using the
:big_table_max syntax, which will be passed directly to the PostgreSQL server.
Another method of utilizing
psql for dynamic SQL is constructing the query as a
SELECT statement returning the statements you wish to run, then using the
\gexec command to execute the underlying queries. First let's look at making ourselves a query that returns the statement we want, then we'll run this statement using
SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table; SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table \gexec
query ALTER SEQUENCE big_table_id_seq RESTART 100001 ALTER SEQUENCE
A benefit of this approach compared to the variable substitution one is that this can work with more complex statements and multiple return values, so you could construct queries based on arbitrary conditions and generate more than one SQL query; the first implementation is limited to queries that return single rows at a time. This also gives you a preview of the underlying SQL statement that you will be running before you execute it against the server with
\gexec, so provides some level of safety if you were doing some sort of destructive action in the query.
Dynamic SQL without
Not everyone uses
psql as the interface to PostgreSQL, despite its obvious superiority :-), so are there ways to support dynamic SQL using only server-side tools? As it so happens there are several, using basically the same approach of writing a
plpgsql snippet to generate the query, then
EXECUTE to run the underlying utility statement. These roughly correlate to the approaches in the
psql section above in that they work best for single or multiple dynamic statements.
To use server-side Dynamic SQL we will need to construct our queries using
plpgsql and execute the underlying text as if we were issuing the underlying query ourselves.
DO $$ BEGIN EXECUTE format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)); END $$ LANGUAGE plpgsql;
In this case we are using PostgreSQL's built-in
format() function which substitutes arguments similar to
printf() in C-based languages. This allows us to interpolate the subquery result we were wanting in this case, resulting in a string that PostgreSQL can
EXECUTE and giving us the result we want.
Almost identical in function to the
DO block, we can also create a simple
plpgsql function that simply calls
EXECUTE on it input parameter like so:
CREATE OR REPLACE FUNCTION exec(raw_query text) RETURNS text AS $$ BEGIN EXECUTE raw_query; RETURN raw_query; END $$ LANGUAGE plpgsql; SELECT exec(format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)));
CREATE FUNCTION exec ALTER SEQUENCE big_table_id_seq RESTART 100001
This may seem like a fairly pointless change compared to the previous approach, as we have basically only moved our query into a parameter that we pass in, but what it buys us is the ability to call this function against a list of queries that we construct using normal SQL, giving us the option of running each in turn.
So what type of SQL can be run in each of these sorts of approaches, and are there any restrictions in what we can run via Dynamic SQL with these methods? The main consideration about the different approaches is related to commands that need to be run outside of an explicit transaction block.
Consider if we wanted to run a
REINDEX CONCURRENTLY on all known indexes, so we used the
exec() approach to construct a
REINDEX CONCURRENTLY statement for all indexes in the
SELECT exec(format('REINDEX INDEX CONCURRENTLY %I', relname)) FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid WHERE relkind = 'i' AND nspname = 'public'
ERROR: REINDEX CONCURRENTLY cannot be executed from a function CONTEXT: SQL statement "REINDEX INDEX CONCURRENTLY big_table_pkey" PL/pgSQL function exec(text) line 3 at EXECUTE
As you can see here, this won't work as a function due to
REINDEX CONCURRENTLY needing to manage its own transaction state; in PostgreSQL, functions inherently run inside a transaction to allow the impact of a function to either completely succeed or completely fail. (Atomicity in ACID.)
Let's try this using
SELECT format('REINDEX INDEX CONCURRENTLY %I', relname) FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid WHERE relkind = 'i' AND nspname = 'public' \gexec
\gexec handling is done by
psql, the resulting statement is effectively run at the top-level as if it appeared literally in the SQL file.
More advanced usage
Look for a followup blog article where I go into more advanced techniques using Dynamic SQL, particularly using the
\gexec function or
exec() itself. Until next time, stay devious1!