How to count countries per unit of time in pandas

December 6, 2021

I have the following problem. I have this data:

import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia","Czechia","USA","Czechia"],
    }
)

I need to count how many times each country occurs in each five-minute interval, that is, from 2021-11-30 00:00:00 to 2021-11-30 00:04:59, from 2021-11-30 00:05:00 to 2021-11-30 00:09:59, and so on. Each country should become its own column. The desired output is:

output = pd.DataFrame(
    {
        "Czechia": [2,0,1],
        "USA": [1,0,0],
    }
)

How can I do this, please?

Solution:

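Use pd.Grouper with GroupBy.size to count the rows per five-minute bin and country, reshape with Series.unstack so that each country becomes a column, and add the empty bins with DataFrame.asfreq: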

data["time"] = pd.to_datetime(data["time"])

    
df = (data.groupby([pd.Grouper(freq='5Min', key='time'), 'country'])
          .size()
          .unstack(fill_value=0)
          .asfreq('5Min', fill_value=0))
print(df)
country              Czechia  USA
time                             
2021-11-30 00:00:00        2    1
2021-11-30 00:05:00        0    0
2021-11-30 00:10:00        1    0
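
The result above keeps the bin start times as the index and "country" as the name of the column axis, while the desired output in the question has a plain 0..n-1 index and unnamed columns. The sketch below is one possible way to get that exact shape. It is an alternative, not the method used in the answer: it floors the timestamps with Series.dt.floor and counts with pd.crosstab instead of pd.Grouper, and the names bins, counts and output are only illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)
data["time"] = pd.to_datetime(data["time"])

# Floor each timestamp to the start of its five-minute bin.
bins = data["time"].dt.floor("5min")

# Count rows per (bin, country); asfreq adds the bins that have no rows at all.
counts = pd.crosstab(bins, data["country"]).asfreq("5min", fill_value=0)

# Drop the timestamps and the "country" axis label to match the desired output.
output = counts.reset_index(drop=True).rename_axis(None, axis=1)
print(output)
```

If the bin start times are needed later, keep counts and skip the last step; reset_index(drop=True) throws that information away.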