Projecting Monthly Revenue Run Rate in Postgres

Jonathan S. Katz
SQL

Monthly recurring revenue (MRR) and annual recurring revenue (ARR) are important metrics for a subscription-based business model. It allows for the business to project its available capital to make important decisions around expansion, hiring and more.

In an on-demand subscription model, MRR can fluctuate on any given day. This is especially true in the cloud-services world, where people are constantly spinning up and down services. That's why it makes sense to try and capture what the subscription revenue "run rate" is: this is a projection of how much money you are earning over a period of time based upon current service usage.

This exercise recently came up with a friend who was looking for a little help with analyzing and reporting on their SaaS business which is running on Crunchy Bridge. Because sometimes SQL is just fun I decided to roll up my sleeves and see what business insights we could get with a single query.

The twist here from my history is that this SaaS business was more of a utility model that you can start/stop subscriptions vs. having some defined plan for each month. (Think metered billing, apparently something my colleague Craig said was a big topic for Add-on Marketplaces in his Heroku days). Before I started running Postgres on Kubernetes, I was an application developer / a SQL wonk of sorts and just so happened to work for a business that had a subscription model. Thus, I was no stranger to the MRR/ARR queries.

So let's dive in.

The Query to Project Monthly Revenue Run Rate

So here it is:

SELECT
  to_char(rates.day, 'Mon DD, YYYY') AS day,
  to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
  to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate,
  to_char(CASE
    WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
      (rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
        lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
    ELSE 0
  END * 100, '9,999,999D99%') AS mrr_run_rate_mom
FROM (
  SELECT 
    dates.day,
    SUM(
    CASE 
      WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
        subscription.rate
      ELSE
        0
    END) / 100::numeric AS mrr_run_rate
  FROM subscription,
    LATERAL (
      SELECT *
      FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
    ) dates
  WHERE subscription.org_id not in (
    SELECT id 
    FROM account
    WHERE billable
  )
  GROUP BY dates.day
) rates
ORDER BY rates.day DESC;

Upon first inspection, there is a lot going on here. What I have found through the years is that it's best to work through SQL queries inside-out. Let's start here:

FROM subscription

This is the heart of the query, pulling the data about all of the subscriptions. There are a few columns from this table that are required for this query, including:

  • rate, the value of the subscription, measured monthly and stored in cents.
  • period, the duration of the subscription from start to finished. This is stored in a tstzrange range type.
  • org_id, which we use to ensure the org is billable

Let's move to the next part, building up the query:

FROM subscription,
  LATERAL (
    SELECT *
    FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
  ) dates

There are a bunch of things going on in this query. First, let's go inside and look at:

SELECT *
FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day

This generates a bunch of dates going back to Jan 1, 2021 to present day. We are going to use this list of dates to inspect the state of a subscription on any given day and calculate the run rate. We accomplish that using a LATERAL join, which allows us to cross-reference the subscription to a date regardless if it was active on that particular day.

Let's continue working our way inside out. Let's look at the WHERE clause:

WHERE subscription.team_id not in (
  SELECT id 
  FROM account
  WHERE billable
)

This clause allows us to exclude any not billable subscriptions, this could be the case for demo accounts or trials a sales team is running.

We'll take a quick peek ahead at the GROUP BY:

GROUP BY dates.day

We will be performing an aggregate query by summing up the projected monthly revenue on a given day, so we will need to aggregate that information by day!

Let's now look at the SELECT list:

SELECT 
  dates.day,
  SUM(
    CASE 
      WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
        subscription.rate
      ELSE
        0
    END) / 100::numeric AS mrr_run_rate

Keeping the day that we are looking at the MRR run rate is important for display purposes, which is why it's included in the SELECT list. However, the key part of this query is this CASE statement:

CASE 
  WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
    subscription.rate
  ELSE
    0
END

What's happening here? First, we're checking if a subscription happened to be active on a given day. We can use a range type "overlap" query there: we construct a range consisting of the present day and see if it happens to overlap with the subscription. If it does, we can include the subscription in the MRR projection. Otherwise, we're going to return 0.

Given this is a projection, we don't necessarily need to look at how long a subscription was active for that given day, we just want a directional sense of what our MRR is. And now we have analyzed the heart of our query:

SELECT 
  dates.day,
  SUM(
  CASE 
    WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
      subscription.rate
    ELSE
      0
  END) / 100::numeric AS mrr_run_rate
FROM subscription,
  LATERAL (
    SELECT *
    FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
  ) dates
WHERE subscription.team_id not in (
  SELECT id 
  FROM account
  WHERE billable
)
GROUP BY dates.day

which gives us our MRR rate!

Formatting The Report + ARR

Our full query had an additional part, an outer layer that look liked:

SELECT
  to_char(rates.day, 'Mon DD, YYYY') AS day,
  to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
  to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate
FROM (
  -- ...
) rates
ORDER BY rates.day DESC;

What's happening here?

First, for convenience we're ordering our data from most recent to oldest: typically that is what we're most interested in when looking at the current MRR projection. This is handled with the ORDER BY clause:

ORDER BY rates.day DESC;

Now, we have a series of calls to the to_char formatting function:

SELECT
  to_char(rates.day, 'Mon DD, YYYY') AS day,
  to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
  to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate

All of these allow for some nicer display of data, e.g:

SELECT
  to_char(CURRENT_TIMESTAMP, 'Mon DD, YYYY') AS day,
  to_char(123456789, '$99,999,999,999D99') AS mrr_run_rate,
  to_char(12 * 123456789, '$99,999,999,999D99') AS arr_run_rate
  

yields:

day          |    mrr_run_rate     |    arr_run_rate     
--------------+---------------------+---------------------
Oct 15, 2021 | $    123,456,789.00 | $  1,481,481,468.00

Projecting the ARR run rate is fairly straightforward: take the MRR run rate and multiply by 12.

We could have done this in the inner part of the query, but for readability purposes it's a bit easier to write the query like this and you will not pay a penalty for it.

And voilà, this gives you a nice, presentable MRR/ARR run rate chart!

Bonus: Getting Month-over-Month MRR Growth

As part of this exercise, I was also tasked with finding the month-over-month (MoM) MRR growth. This yield the following query:

SELECT
  to_char(rates.day, 'Mon DD, YYYY') AS day,
  to_char(rates.mrr_run_rate, '$99,999,999,999D99') AS mrr_run_rate,
  to_char(12 * rates.mrr_run_rate, '$99,999,999,999D99') AS arr_run_rate,
  to_char(CASE
    WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
      (rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
        lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
    ELSE 0
  END * 100, '9,999,999D99%') AS mrr_run_rate_mom
FROM (
  SELECT 
    dates.day,
    SUM(
    CASE 
      WHEN tstzrange(dates.day, dates.day + '1 day'::interval) && subscription.period THEN
        subscription.rate
      ELSE
        0
    END) / 100::numeric AS mrr_run_rate
  FROM subscription,
    LATERAL (
      SELECT *
      FROM generate_series('2021-01-01'::date, CURRENT_TIMESTAMP, '1 day'::interval) AS day
    ) dates
  WHERE subscription.team_id not in (
    SELECT id 
    FROM account
    WHERE billable
  )
  GROUP BY dates.day
) rates
ORDER BY rates.day DESC;

The key portion is this:

to_char(CASE
  WHEN COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0) <> 0 THEN
    (rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
      lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)
  ELSE 0
END * 100, '9,999,999D99%') AS mrr_run_rate_mom

and really, the portion is the lead window function:

lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)

What this function call does is that it looks 30 rows back in the query to find what the MRR run rate was on that given day. Once we know that value, we can calculate the MoM growth:

(rates.mrr_run_rate - lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)) /
  lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC)

Why do we have a COALESCE statement?

COALESCE(lead(rates.mrr_run_rate, 30) OVER (ORDER BY rates.day DESC), 0)

The way this query is constructed, we will get to a point where 30 previous rows will not exist, which makes it impossible to find the MoM value. While this makes the data NULL, I use COALESCE to check for 0 values as well, since we can't divide by 0. If we wanted to determine the MoM growth for, say, January 2, 2021, we would need to also include the previous 30 dates in the query, and then ultimately exclude those from the display.

The Power of Postgres

Being able to get data insights like the above is one of the many reasons why I love working with Postgres so much. This example combines many of my favorite PostgreSQL features: date/time support, range types, conditional expressions, window functions, LATERAL joins, and more. Instead of having to pull my raw data into a spreadsheet or a different programming language, I can get the insights I need directly with a few lines of SQL.

Newsletter