Home How to show most common pairings of column values in SQL, specifically with an != statement

Questions

How to show most common pairings of column values in SQL, specifically with an != statement

October 29, 2024

I’m working RWFD emergency room dataset, which includes patient’s race and what department they were referred to. I’m trying to show which department each race (7 total) is most commonly referred to, with the stipulation that department_referral is not ‘None’.

Dataset: https://data.world/markbradbourne/rwfd-real-world-fake-data/workspace/file?filename=Hospital+ER.csv

Problem with what I have tried: I can’t order by most_referrals and LIMIT 7 as this will show data for the more common races. For example, patient_race of White and African American show up twice within the top 7 rows.

What I tried:

SELECT patient_race, department_referral, COUNT(department_referral) as most_referrals
FROM ed
WHERE department_referral != 'None'
GROUP BY 1, 2 
ORDER BY most_referrals DESC
LIMIT 7
;

>Solution :

First, group the dataset by patient_race and department_referral and count the number of referrals for each pair (excluding referrals labeled ‘None’).
Then, find the maximum referral count (referral_count) per race.
Join these results on race and the maximum referral count to get only the most common referral departments for each race.
Finally, order by race.

WITH RaceReferralCounts AS (
    SELECT 
        patient_race, 
        department_referral, 
        COUNT(department_referral) AS referral_count
    FROM ed
    WHERE department_referral != 'None'
    GROUP BY patient_race, department_referral
),
MaxReferralsPerRace AS (
    SELECT 
        patient_race, 
        MAX(referral_count) AS max_referral_count
    FROM RaceReferralCounts
    GROUP BY patient_race
)
SELECT 
    r.patient_race, 
    r.department_referral, 
    r.referral_count AS most_referrals
FROM RaceReferralCounts r
JOIN MaxReferralsPerRace m 
    ON r.patient_race = m.patient_race 
    AND r.referral_count = m.max_referral_count
ORDER BY patient_race;