Say you have the following data frame and you needed to know how many Assays were done per month.
type,"Date Tested"
Assay,2022/01/28
Assay,2022/01/31
Assay,2022/02/02
Assay,2022/03/31
Assay,2022/04/21
Assay,2022/05/12
Assay,2022/06/02
Assay,2022/02/03
Assay,2022/06/03
You can use value_counts() from Pandas to easily do this.
data['Date Tested']=pd.to_datetime(data['Date Tested'], format = "%Y/%m/%d")
months = data['Date Tested'].dt.month.value_counts(sort=False)
print(months)
Which outputs:
1 2
2 2
3 1
4 1
5 1
6 2
Name: Date Tested, dtype: int64
The ‘numbers’ in the first column are each month (i.e 01 – Jan, 02 – Feb etc..) but this isn’t great. What if the dataset started at March? Then March = 01.
Or what if I needed to do the same thing but by weeks? How could you workout what, say 12 was in terms of a week?
How can you modify the output of value_count to include the corresponding month/week? This information is present in the dataframe, shown by:
print(data['Date Tested'])
Which gives:
0 2022-01-28
1 2022-01-31
2 2022-02-02
3 2022-03-31
4 2022-04-21
5 2022-05-12
6 2022-06-02
7 2022-02-03
8 2022-06-03
Name: Date Tested, dtype: datetime64[ns]
Ideally, my count output would be something like this:
2022-01 2
2022-02 2
2022-03 1
2022-04 1
2022-05 1
2022-06 2
Name: Date Tested, dtype: datetime64[ns]
>Solution :
You can convert to_period
(M
= monthly) and count:
data.groupby(data['Date Tested'].dt.to_period('M'))['type'].count()
output:
Date Tested
2022-01 2
2022-02 2
2022-03 1
2022-04 1
2022-05 1
2022-06 2
Freq: M, Name: type, dtype: int64
By week:
data.groupby(data['Date Tested'].dt.to_period('W'))['type'].count()
output:
Date Tested
2022-01-24/2022-01-30 1
2022-01-31/2022-02-06 3
2022-03-28/2022-04-03 1
2022-04-18/2022-04-24 1
2022-05-09/2022-05-15 1
2022-05-30/2022-06-05 2
Freq: W-SUN, Name: type, dtype: int64