I have a table with a user "id", the "country" they are from, and the "year" they signed up, for example:
| id | country | year |
|---|---|---|
| 1 | USA | 2010 |
| 2 | Mexico | 2010 |
| 3 | USA | 2011 |
| 4 | India | 2011 |
| 5 | Japan | 2011 |
I am trying to calculate the running count of distinct countries by year. So the final output for the example above should be:
| year | country_count |
|---|---|
| 2010 | 2 |
| 2011 | 4 |
I wrote something along these lines but obviously this is flawed logic since the 2nd half of the query isn’t a distinct count:
with t1 as (
select year, count(distinct country) country_count
from data
group by 1 order by 1
)
select *, sum(country_count) over (order by year) AS cumulative_country_count
from t1
>Solution :
select distinct year
,count(country) over(order by year) as cnt
from (
select *
,row_number() over(partition by country order by year) as rn
from t
) t
where rn = 1
| year | cnt |
|---|---|
| 2010 | 2 |
| 2011 | 4 |