In a database for a financial app, there are tables such as:
account(account_id PK, account_name, …)
,transaction(transaction_id PK, transaction_timestamp, account_debit_id FK, account_credit_id FK)
, andbalance_event(account_balance, account_id FK, transaction_id FK)
.
SQL definitions are below. Only relevant fields are listed.
The latter is used to keep track of the balance of every account; the naming might be poor, but the table records the events when the balance of an account is changed as a result of some transaction. This way getting the account balance doesn’t require traversing and summing all the thousands of transactions, and also the history of the account balance is kept for future reference.
I want to describe a query1 that gives me all rows from the account
table, together with their corresponding account_balance
taken from balance_event
, — a classic problem. I’ve tried many different approaches, none of which work for different reasons (also, ChatGPT spitted a lot of variations, but I’m not including them here). Some of the queries don’t work because ONLY_FULL_GROUP_BY
is on; if possible, I’d like to keep it that way.
This is approximately what I am trying to do:
-- I know this query doesn't make any sense, don't bite me
-- This is only a gist, a suggestion
SELECT
account.*,
balance_event.account_balance
FROM
account
JOIN
balance_event ON account.account_id = balance_event.account_id
WHERE
balance_event.transaction_id = (
SELECT
`transaction`.transaction_id
FROM
`transaction`
JOIN
balance_event ON balance_event.transaction_id = `transaction`.transaction_id
WHERE
balance_event.account_id = account.account_id
AND
`transaction`.transaction_timestamp = MAX(`transaction`.transaction_timestamp)
GROUP BY
balance_event.account_id
)
GROUP BY
balance_event.account_id
;
How to do it properly, so that it actually works?
1 — I’m testing the code in a random online SQL editor that just happens to use MySQL, but I seek to have a universal query or at least one that’s supported by PostgreSQL.
For reference, this is the full definition of the fields:
CREATE TABLE account (
account_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
account_name VARCHAR(255) NOT NULL
);
CREATE TABLE `transaction` (
transaction_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
transaction_timestamp BIGINT NOT NULL,
account_debit_id VARCHAR(36) NOT NULL,
account_credit_id VARCHAR(36) NOT NULL,
transaction_amount DECIMAL(10,2) NOT NULL,
FOREIGN KEY (account_debit_id) REFERENCES account(account_id),
FOREIGN KEY (account_credit_id) REFERENCES account(account_id)
);
CREATE TABLE balance_event (
balance_event_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid
transaction_id VARCHAR(36) NOT NULL,
account_id VARCHAR(36) NOT NULL,
account_balance DECIMAL(10,2) NOT NULL,
FOREIGN KEY (transaction_id) REFERENCES `transaction`(transaction_id),
FOREIGN KEY (account_id) REFERENCES account(account_id)
);
>Solution :
Here’s a solution that uses Window Functions, which are supported by PostgreSQL for a long time, and MySQL 8.0, and most other popular SQL databases, both commercial and open source.
SELECT *
FROM (
SELECT
a.*,
b.account_balance,
ROW_NUMBER() OVER (PARTITION BY a.account_id ORDER BY t.transaction_timestamp DESC) AS rownum
FROM account AS a
JOIN balance_event AS b USING (account_id)
JOIN transaction AS t USING (transaction_id)
) t
WHERE rownum = 1;
With this solution, you don’t need GROUP BY at all, so MySQL’s ONLY_FULL_GROUP_BY SQL mode doesn’t affect it.