Assume the following dataset:
| member_id | company | Year_started |
|---|---|---|
| 1 | Apple | 2001 |
| 1 | IBM | 2002 |
| 1 | Oracle | 2005 |
| 1 | Microsoft | 2010 |
| 2 | IBM | 2002 |
| 2 | Microsoft | 2004 |
| 2 | Oracle | 2008 |
Member 1, began work at IBM in 2002, moved to Oracle in 2005 and moved to Microsoft in 2010. Member 2, began workin gat IBM in 2002, moved to Microsoft in 2004 and then Moved to oracle in 2008. Assume that for each member in each year, there is only one company (cannot work at 2 different companies in the same year).
**Question: How many members ever worked at IBM prior to working at Oracle? **
How would you go about solving this? I tried a combination of CASE when’s but am lost as to where else to go. Thanks.
…
..
>Solution :
More verbose, but using CTEs:
- For each
member_id, get the first year they worked at IBM (if any), and get the first year they worked at Oracle (again, if any).- "For each
member_id" translates toGROUP BY member_id - "Get the first year…" translates to
MIN( CASE WHEN "Company" = 'etc' THEN "Year_Started" END )
- "For each
- Filter those rows to rows where the first-year-at-IBM is less-than their first-year-at-Oracle.
- Then simply get the
COUNT(*)of those rows.
WITH ibmOracleYears AS (
SELECT
member_id,
MIN( CASE WHEN "Company" = 'IBM' THEN "Year_Started" END ) AS JoinedIbm,
MIN( CASE WHEN "Company" = 'Oracle' THEN "Year_Started" END ) AS
JoinedOracle
FROM
yourTable
GROUP BY
member_id
),
workedAtIbmBeforeOracle AS (
SELECT
y.*
FROM
ibmOracleYears AS y
WHERE
y.JoinedIbm IS NOT NULL /* <-- This IS NOT NULL check isn't absolutely necessary, but I'm including it for clarity. */
AND
y.JoinedOracle IS NOT NULL
AND
y.JoinedIbm < y.JoinedOracle
)
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
workedAtIbmBeforeOracle
But that query can be reduced down to this (if you don’t mind anonymous expressions in HAVING clauses):
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
(
SELECT
member_id
FROM
yourTable
GROUP BY
member_id
HAVING
MIN( CASE WHEN "Company" = 'IBM' THEN "Year_Started" END ) < MIN( CASE WHEN "Company" = 'Oracle' THEN "Year_Started" END )
) AS q