I have the following problem. I have this data:
data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)
I need to count how many times each country occurs in each five-minute interval, that is, from 2021-11-30 00:00:00 to 2021-11-30 00:04:59, from 2021-11-30 00:05:00 to 2021-11-30 00:09:59, and so on. Each country should become its own column. The desired output is:
output = pd.DataFrame(
    {
        "Czechia": [2, 0, 1],
        "USA": [1, 0, 0],
    }
)
How can I do this, please?
>Solution :
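Use pd.Grouper with GroupBy.size to count rows per five-minute bin and country, reshape with Series.unstack so that countries become columns, and use DataFrame.asfreq to add the bins that contain no rows: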
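data["time"] = pd.to_datetime(data["time"])  # Grouper needs a datetime column
df = (data.groupby([pd.Grouper(freq='5Min', key='time'), 'country'])
          .size()                         # rows per (bin, country) pair
          .unstack(fill_value=0)          # countries become columns
          .asfreq('5Min', fill_value=0))  # insert bins that have no rows
print(df)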
country              Czechia  USA
time
2021-11-30 00:00:00        2    1
2021-11-30 00:05:00        0    0
2021-11-30 00:10:00        1    0
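If you prefer, roughly the same result can be obtained with pd.crosstab after flooring the timestamps to five-minute bins. This is only a sketch of an alternative, not part of the solution above: the names bins, counts and result are made up for illustration, and the last step (dropping the time index and the "country" columns label) assumes you want a frame shaped exactly like the desired output in the question.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)
data["time"] = pd.to_datetime(data["time"])

# Floor each timestamp to the start of its five-minute bin.
bins = data["time"].dt.floor("5min")

# Cross-tabulate bins against countries; only bins that contain rows appear.
counts = pd.crosstab(bins, data["country"])

# Add the missing five-minute bins and fill their counts with 0.
counts = counts.asfreq("5min", fill_value=0)

# Assumption: drop the time index and the "country" columns label so the
# frame has the same shape as the desired output in the question.
result = counts.reset_index(drop=True).rename_axis(None, axis=1)
print(result)
```

Keep counts instead of result if you still need to know which five-minute interval each row belongs to.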