Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

DISTINCT ON slow for 300000 rows

I have a table named assets. Here is the ddl:

create table assets
(
    id            bigint                                 not null
        constraint pk_assets
            primary key,
    name          varchar(255)                           not null,
    value         double precision                       not null,
    business_time timestamp with time zone,
    insert_time   timestamp with time zone default now() not null
);

alter table assets
    owner to java_role;

create index idx_assets_name
    on assets (name);

I need to extract the newest (based on insert_time) value for each asset name. This is the query that I initially used:

SELECT DISTINCT
    ON (a.name) *
FROM home.assets a
WHERE a.name IN (
                 'USD_RLS',
                 'EUR_RLS',
                 'SEKKEH_RLS',
                 'NIM_SEKKEH_RLS',
                 'ROB_SEKKEH_RLS',
                 'BAHAR_RLS',
                 'GOLD_18_RLS',
                 'GOLD_OUNCE_USD',
                 'SILVER_OUNCE_USD',
                 'PLATINUM_OUNCE_USD',
                 'GOLD_MESGHAL_RLS',
                 'GOLD_24_RLS',
                 'STOCK_IR',
                 'AED_RLS',
                 'GBP_RLS',
                 'CAD_RLS',
                 'CHF_RLS',
                 'TRY_RLS',
                 'AUD_RLS',
                 'JPY_RLS',
                 'CNY_RLS',
                 'RUB_RLS',
                 'BTC_USD'
    )
ORDER BY a.name,
         a.insert_time DESC;

I have around 300,000 rows in the assets table. On my VPS this query takes about 800 ms. this is causing a whole response time of about 1 second for a specific endpoint. This is a bit slow and considering the fact that the assets table is growing fast, this endpoint will be even slower in the near future. I also tried to avoid IN(...) using this query:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

SELECT DISTINCT
    ON (a.name) *
FROM home.assets a
ORDER BY a.name,
         a.insert_time DESC;

But I didn’t notice a significant difference. Any idea how I could optimize this query?

>Solution :

You may try adding the following index to your table:

CREATE INDEX idx ON assets (name, insert_time DESC);

If used, Postgres can simply scan this index to find the distinct record having the most recent insert_time for each name.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading