Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to aggregate a column based on values of other columns – PostgreSQL

Say I have a table:

  date  | fruit  | status | numberOfFruits
---------------------------------------
2022-01 | apple  | ripe   | 3
2022-01 | banana | mature | 5
2022-01 | pear   | ripe   | 10
2022-01 | grapes | mature | 9
2022-02 | apple  | ripe   | 3
2022-02 | banana | mature | 3
2022-02 | pear   | ripe   | 3
2022-02 | grapes | mature | 7

I want to be able to create a query that adds 3 aggregate columns (AVG, MIN, MAX) that aggregates the column numberOfFruits based on their status and date, ordered by fruit and date. The output of the table should be:

  date  | fruit  | status | numberOfFruits | AvgNumOfFruits | MaxNumOfFruits | MinNumOfFruits
---------------------------------------------------------------------------------------------
2022-01 | apple  | ripe   |       3        |       6.5      |      10        |      3
2022-01 | pear   | ripe   |       10       |       6.5      |      10        |      3
2022-01 | banana | mature |       5        |        7       |      9         |      5
2022-01 | grapes | mature |       9        |        7       |      9         |      5
2022-02 | apple  | ripe   |       3        |        3       |      3         |      3
2022-02 | pear   | ripe   |       3        |        3       |      3         |      3
2022-02 | banana | mature |       3        |        5       |      7         |      3
2022-02 | grapes | mature |       7        |        5       |      7         |      3

I’m at a loss and here’s what I have so far:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

SELECT 
    date, 
    fruit, 
    status,
    numberOfFruits,
    AVG(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as AvgNumOfFruits, 
    MIN(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as MingNumOfFruits,
    MAX(CASE WHEN "status" = 'ripe' THEN "numberOfFruits" ELSE "numberOfFruits" END) as MaxNumOfFruits
FROM fruitdata
GROUP BY 1, 2, 3, 4
ORDER BY date, status

The query just repeats the numberofFruits for those 3 aggregate columns. It should be like "take the average of the column numberOfFruits that has the status of ripe, as well as the date 2022-01, and put it in a new column called AvgNumOfFruits. Somehow I can’t translate this to SQL.

Any advice/tips/help would be appreciated. Thanks in advance!

>Solution :

Aggregation functions are employed to aggregate rows (combining rows to get aggregated values). In your case you should rather use window functions, that compute values over windows (partitions/groups of rows), though without aggregating the rows.

SELECT *,
       AVG(numberOfFruits) OVER(PARTITION BY date, status) AS AvgNumOfFruits, 
       MAX(numberOfFruits) OVER(PARTITION BY date, status) AS MaxNumOfFruits,
       MIN(numberOfFruits) OVER(PARTITION BY date, status) AS MinNumOfFruits 
FROM fruitdata
ORDER BY date,
         status DESC

The corresponding window functions need to be computed on groups of "date" and "status" fields. On the other hand, the ordering is completely optional and up to you (ORDER BY clause).

Check the demo here.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading