In a database for a financial app, there are tables such as:
account(account_id PK, account_name, …),
transaction(transaction_id PK, transaction_timestamp, account_debit_id FK, account_credit_id FK), and
balance_event(account_balance, account_id FK, transaction_id FK).
SQL definitions are below. Only relevant fields are listed.
The latter is used to keep track of the balance of every account; the naming might be poor, but the table records the events when the balance of an account is changed as a result of some transaction. This way getting the account balance doesn’t require traversing and summing all the thousands of transactions, and also the history of the account balance is kept for future reference.
I want to describe a query1 that gives me all rows from the
account table, together with their corresponding
account_balance taken from
balance_event, — a classic problem. I’ve tried many different approaches, none of which work for different reasons (also, ChatGPT spitted a lot of variations, but I’m not including them here). Some of the queries don’t work because
ONLY_FULL_GROUP_BY is on; if possible, I’d like to keep it that way.
This is approximately what I am trying to do:
-- I know this query doesn't make any sense, don't bite me -- This is only a gist, a suggestion SELECT account.*, balance_event.account_balance FROM account JOIN balance_event ON account.account_id = balance_event.account_id WHERE balance_event.transaction_id = ( SELECT `transaction`.transaction_id FROM `transaction` JOIN balance_event ON balance_event.transaction_id = `transaction`.transaction_id WHERE balance_event.account_id = account.account_id AND `transaction`.transaction_timestamp = MAX(`transaction`.transaction_timestamp) GROUP BY balance_event.account_id ) GROUP BY balance_event.account_id ;
How to do it properly, so that it actually works?
1 — I’m testing the code in a random online SQL editor that just happens to use MySQL, but I seek to have a universal query or at least one that’s supported by PostgreSQL.
For reference, this is the full definition of the fields:
CREATE TABLE account ( account_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid account_name VARCHAR(255) NOT NULL ); CREATE TABLE `transaction` ( transaction_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid transaction_timestamp BIGINT NOT NULL, account_debit_id VARCHAR(36) NOT NULL, account_credit_id VARCHAR(36) NOT NULL, transaction_amount DECIMAL(10,2) NOT NULL, FOREIGN KEY (account_debit_id) REFERENCES account(account_id), FOREIGN KEY (account_credit_id) REFERENCES account(account_id) ); CREATE TABLE balance_event ( balance_event_id VARCHAR(36) NOT NULL PRIMARY KEY, -- uuid transaction_id VARCHAR(36) NOT NULL, account_id VARCHAR(36) NOT NULL, account_balance DECIMAL(10,2) NOT NULL, FOREIGN KEY (transaction_id) REFERENCES `transaction`(transaction_id), FOREIGN KEY (account_id) REFERENCES account(account_id) );
Here’s a solution that uses Window Functions, which are supported by PostgreSQL for a long time, and MySQL 8.0, and most other popular SQL databases, both commercial and open source.
SELECT * FROM ( SELECT a.*, b.account_balance, ROW_NUMBER() OVER (PARTITION BY a.account_id ORDER BY t.transaction_timestamp DESC) AS rownum FROM account AS a JOIN balance_event AS b USING (account_id) JOIN transaction AS t USING (transaction_id) ) t WHERE rownum = 1;
With this solution, you don’t need GROUP BY at all, so MySQL’s ONLY_FULL_GROUP_BY SQL mode doesn’t affect it.