I’m working RWFD emergency room dataset, which includes patient’s race and what department they were referred to. I’m trying to show which department each race (7 total) is most commonly referred to, with the stipulation that department_referral is not ‘None’.
Dataset: https://data.world/markbradbourne/rwfd-real-world-fake-data/workspace/file?filename=Hospital+ER.csv
Problem with what I have tried: I can’t order by most_referrals and LIMIT 7 as this will show data for the more common races. For example, patient_race of White and African American show up twice within the top 7 rows.
What I tried:
SELECT patient_race, department_referral, COUNT(department_referral) as most_referrals
FROM ed
WHERE department_referral != 'None'
GROUP BY 1, 2
ORDER BY most_referrals DESC
LIMIT 7
;
>Solution :
- First, group the dataset by patient_race and department_referral and count the number of referrals for each pair (excluding referrals labeled ‘None’).
- Then, find the maximum referral count (referral_count) per race.
- Join these results on race and the maximum referral count to get only the most common referral departments for each race.
- Finally, order by race.
WITH RaceReferralCounts AS (
SELECT
patient_race,
department_referral,
COUNT(department_referral) AS referral_count
FROM ed
WHERE department_referral != 'None'
GROUP BY patient_race, department_referral
),
MaxReferralsPerRace AS (
SELECT
patient_race,
MAX(referral_count) AS max_referral_count
FROM RaceReferralCounts
GROUP BY patient_race
)
SELECT
r.patient_race,
r.department_referral,
r.referral_count AS most_referrals
FROM RaceReferralCounts r
JOIN MaxReferralsPerRace m
ON r.patient_race = m.patient_race
AND r.referral_count = m.max_referral_count
ORDER BY patient_race;