Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging similar SQL data for counts efficiently?

I am trying to understand the best way to merge a count from an INNER JOIN of two tables and grouping, but the best thing I can do is to query each table separately and then apply a union and further GROUP BY. This seems a bit convoluted, but with joins the counts get messed up, and I am not sure what other method I can use to query this efficiently.

Correct code, but probably inefficient:

SELECT
  x.id,
  sum(x.question_count) AS question_count,
  sum(x.card_count) AS card_count
FROM (
  SELECT
    c.id,
    count(*) AS question_count,
    0 AS card_count
  FROM
    concepts AS c
    INNER JOIN questions ON c.id = questions."conceptId"
  GROUP BY
    c.id
  UNION ALL
  SELECT
    c.id,
    0 AS question_count,
    count(*) AS card_count
  FROM
    concepts AS c
    INNER JOIN cards ON c.id = cards."conceptId"
  GROUP BY
    c.id) AS x
GROUP BY
  x.id
ORDER BY x.id;

output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id q_count c_count
1 1 2
2 7 9
3 1 1

My hopeful join code, that gives incorrect counts:

SELECT
  x."conceptId",
  q_count,
  c_count
FROM (
  SELECT
    q."conceptId",
    count(*) AS q_count
  FROM
    questions AS q
  GROUP BY
    q."conceptId") AS x
  INNER JOIN (
    SELECT
      c."conceptId",
      count(*) AS c_count
    FROM
      questions AS c
    GROUP BY
      c."conceptId") AS y ON x."conceptId" = y."conceptId";

output:

id q_count c_count
1 1 1
2 7 7
3 1 1

>Solution :

You would typically aggregate the related table first, then join them to the base table:

select co.id, qu.question_count, ca.card_count
from concepts co
inner join (
    select conceptId, count(*) question_count
    from questions
    group by conceptId
) qu on qu.conceptId = co.id
inner join (
    select conceptId, count(*) card_count
    from cards 
    group by conceptId
) ca on ca.conceptId = co.id

Depending on the availability of your data in the related tables, you might want left joins instead.

Depending on your data as well (typically, if you have few concepts and many questions and cards), it might be more efficient to use correlated subqueries or lateral joins, like so:

select co.id, qu.question_count, ca.card_count
from concepts co
cross join lateral (select count(*) question_count from questions qu where qu.conceptId = co.id) qu
cross join lateral (select count(*) card_count     from cards     ca where ca.conceptId = co.id) ca
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading