Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to show most common pairings of column values in SQL, specifically with an != statement

I’m working RWFD emergency room dataset, which includes patient’s race and what department they were referred to. I’m trying to show which department each race (7 total) is most commonly referred to, with the stipulation that department_referral is not ‘None’.

Dataset: https://data.world/markbradbourne/rwfd-real-world-fake-data/workspace/file?filename=Hospital+ER.csv

Problem with what I have tried: I can’t order by most_referrals and LIMIT 7 as this will show data for the more common races. For example, patient_race of White and African American show up twice within the top 7 rows.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

What I tried:

SELECT patient_race, department_referral, COUNT(department_referral) as most_referrals
FROM ed
WHERE department_referral != 'None'
GROUP BY 1, 2 
ORDER BY most_referrals DESC
LIMIT 7
;

>Solution :

  1. First, group the dataset by patient_race and department_referral and count the number of referrals for each pair (excluding referrals labeled ‘None’).
  2. Then, find the maximum referral count (referral_count) per race.
  3. Join these results on race and the maximum referral count to get only the most common referral departments for each race.
  4. Finally, order by race.
WITH RaceReferralCounts AS (
    SELECT 
        patient_race, 
        department_referral, 
        COUNT(department_referral) AS referral_count
    FROM ed
    WHERE department_referral != 'None'
    GROUP BY patient_race, department_referral
),
MaxReferralsPerRace AS (
    SELECT 
        patient_race, 
        MAX(referral_count) AS max_referral_count
    FROM RaceReferralCounts
    GROUP BY patient_race
)
SELECT 
    r.patient_race, 
    r.department_referral, 
    r.referral_count AS most_referrals
FROM RaceReferralCounts r
JOIN MaxReferralsPerRace m 
    ON r.patient_race = m.patient_race 
    AND r.referral_count = m.max_referral_count
ORDER BY patient_race;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading