How to get relative frequencies from pandas groupby, with two grouping variables?

Suppose my data look as follows:

import datetime
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1), datetime.datetime(2024, 11, 28, 0),
                               datetime.datetime(2024, 11, 28, 1), datetime.datetime(2024, 11, 28, 2)],
                  'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana']})



    datetime            product
0   2024-11-27 00:00:00 Apple
1   2024-11-27 01:00:00 Banana
2   2024-11-28 00:00:00 Banana
3   2024-11-28 01:00:00 Apple
4   2024-11-28 02:00:00 Banana


I want to plot the relative frequencies of the products sold on each day. In this example, that is 1/2 (50%) apples and 1/2 bananas on 2024-11-27, and 1/3 apples and 2/3 bananas on 2024-11-28.


What I managed to do:

absolute_frequencies = df.groupby([pd.Grouper(key='datetime', freq='D'), 'product']).size().reset_index(name='count')
total_counts = absolute_frequencies.groupby('datetime')['count'].transform('sum')
absolute_frequencies['relative_frequency'] = absolute_frequencies['count'] / total_counts
absolute_frequencies.pivot(index='datetime', columns='product', values='relative_frequency').plot()

I am fairly confident there is a much simpler way, since for the absolute frequencies I can simply use:

df.groupby([pd.Grouper(key='datetime', freq='D'), 'product']).size().unstack('product').plot(kind='line')

Solution:

You can use pd.crosstab with normalize='index', which divides each row (here, each day) by its row total:

ct = pd.crosstab(df['datetime'].dt.normalize(), df['product'], normalize='index')

Output:

product        Apple    Banana
datetime                      
2024-11-27  0.500000  0.500000
2024-11-28  0.333333  0.666667
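
To make clear what normalize='index' computes, here is a minimal sketch that starts from the absolute-count one-liner in the question and divides each row by its own total. The variable names counts and manual are illustrative only, and fill_value=0 is an assumption added so that a product missing on some day shows up as 0 rather than NaN.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1),
                 datetime.datetime(2024, 11, 28, 0), datetime.datetime(2024, 11, 28, 1),
                 datetime.datetime(2024, 11, 28, 2)],
    'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana'],
})

# Absolute counts per day and product (same grouping as in the question).
counts = (
    df.groupby([pd.Grouper(key='datetime', freq='D'), 'product'])
      .size()
      .unstack('product', fill_value=0)
)

# Divide every row by its own total to turn counts into per-day shares.
manual = counts.div(counts.sum(axis=1), axis=0)

# The crosstab from the answer, for comparison.
ct = pd.crosstab(df['datetime'].dt.normalize(), df['product'], normalize='index')

print(manual)
print(ct)
```

Both tables should hold the same numbers, so the crosstab call can be read as shorthand for the count-then-divide steps.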

As a graph:

ct.plot.bar()
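
If you would rather stay with groupby, as in the question, one possible alternative is value_counts(normalize=True) on the grouped column. This is a sketch, not part of the original answer: the names day and rel are illustrative, and fill_value=0 is an assumption so that a product with no sales on a given day appears as 0.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'datetime': [datetime.datetime(2024, 11, 27, 0), datetime.datetime(2024, 11, 27, 1),
                 datetime.datetime(2024, 11, 28, 0), datetime.datetime(2024, 11, 28, 1),
                 datetime.datetime(2024, 11, 28, 2)],
    'product': ['Apple', 'Banana', 'Banana', 'Apple', 'Banana'],
})

# Truncate each timestamp to midnight so all rows of one day share a key.
day = df['datetime'].dt.normalize()

# Per-day share of each product; unstack() moves the product level to columns.
rel = (
    df.groupby(day)['product']
      .value_counts(normalize=True)
      .unstack(fill_value=0)
)

print(rel)
```

The resulting rel has one row per day and one column per product, so rel.plot() or rel.plot.bar() can be called on it in the same way as on ct.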

