In a grouped table, how to select an ID of a row that has the MAX(timestamp) within the group?

In a database for a financial app, there are tables such as:

  • account(account_id PK, account_name, …),
  • transaction(transaction_id PK, transaction_timestamp, account_debit_id FK, account_credit_id FK), and
  • balance_event(account_balance, account_id FK, transaction_id FK).

SQL definitions are below. Only relevant fields are listed.

The latter is used to keep track of the balance of every account; the naming might be poor, but the table records the events when the balance of an account is changed as a result of some transaction. This way getting the account balance doesn’t require traversing and summing all the thousands of transactions, and also the history of the account balance is kept for future reference.

I want to describe a query1 that gives me all rows from the account table, together with their corresponding account_balance taken from balance_event, — a classic problem. I’ve tried many different approaches, none of which work for different reasons (also, ChatGPT spitted a lot of variations, but I’m not including them here). Some of the queries don’t work because ONLY_FULL_GROUP_BY is on; if possible, I’d like to keep it that way.

This is approximately what I am trying to do:

-- I know this query doesn't make any sense, don't bite me
-- This is only a gist, a suggestion
SELECT
  account.*,
  balance_event.account_balance
FROM
  account
JOIN
  balance_event ON account.account_id = balance_event.account_id
WHERE
  balance_event.transaction_id = (
    SELECT
      `transaction`.transaction_id
    FROM
      `transaction`
    JOIN
      balance_event ON balance_event.transaction_id = `transaction`.transaction_id
    WHERE
      balance_event.account_id = account.account_id
      AND
      `transaction`.transaction_timestamp = MAX(`transaction`.transaction_timestamp)
    GROUP BY
      balance_event.account_id
  )
GROUP BY
  balance_event.account_id
;

How to do it properly, so that it actually works?


1 — I’m testing the code in a random online SQL editor that just happens to use MySQL, but I seek to have a universal query or at least one that’s supported by PostgreSQL.


For reference, this is the full definition of the fields:

CREATE TABLE account (
  account_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
  account_name VARCHAR(255) NOT NULL
);

CREATE TABLE `transaction` (
  transaction_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
  transaction_timestamp BIGINT NOT NULL,
  account_debit_id VARCHAR(36) NOT NULL,
  account_credit_id VARCHAR(36) NOT NULL,
  transaction_amount DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (account_debit_id) REFERENCES account(account_id),
  FOREIGN KEY (account_credit_id) REFERENCES account(account_id)
);

CREATE TABLE balance_event (
  balance_event_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
  transaction_id VARCHAR(36) NOT NULL,
  account_id VARCHAR(36) NOT NULL,
  account_balance DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (transaction_id) REFERENCES `transaction`(transaction_id),
  FOREIGN KEY (account_id) REFERENCES account(account_id)
);

>Solution :

Here’s a solution that uses Window Functions, which are supported by PostgreSQL for a long time, and MySQL 8.0, and most other popular SQL databases, both commercial and open source.

SELECT *
FROM (
  SELECT
    a.*,
    b.account_balance,
    ROW_NUMBER() OVER (PARTITION BY a.account_id ORDER BY t.transaction_timestamp DESC) AS rownum
  FROM account AS a
  JOIN balance_event AS b USING (account_id)
  JOIN transaction AS t USING (transaction_id)
) t
WHERE rownum = 1;

With this solution, you don’t need GROUP BY at all, so MySQL’s ONLY_FULL_GROUP_BY SQL mode doesn’t affect it.

Leave a Reply