Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get count_values(normalize=True) result for each value, in each group in pandas GroupByDataframe

Say I have a dataframe

import pandas as pd
df = pd.DataFrame({"id":[1,1,1,1,2,2,2],
                  "car":["Volvo","Audi","Volvo","Volvo","VW","Audi","Volvo"]})

and I want to get the count-ratio for each car within each id aswell as the count i.e the resulting dataframe would be


          ratio    count
    car
id 

1   Audi   0.25     1
    Volvo  0.75     3

2   Audi   0.33     1
    Volvo  0.33     1
    VW     0.33     1

I have tried

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


def get_tag_ratio(x):
    """
    Calculate the ratio for each car
    """
    ratio = x.value_counts(normalize=True)
    return ratio

car_info = df.groupby(["id"])["car"].agg(
    ratio=get_tag_ratio, count="count")

but the result is

          ratio    count
 
id 

1   [0.75,0.25]          4

2   [0.33,0.33,0.33]     3

I have also tried

car_info = df.groupby(["id","car"])["car"].agg(
    ratio=get_tag_ratio, count="count")

which gives

          ratio    count
    
id   car

1   Audi   1.0      1
    Volvo  1.0      3

2   Audi   1.0      1
    Volvo  1.0      1
    VW     1.0      1

which is almost there – unless the value_counts now is applied within each car group and not each id group.

Just parsing "value_counts" doens’t work (also, I don’t know how to get "normalize=True" as an argument to value_counts here)

df.groupby("id")["car"].agg(["value_counts","count"]) #`ValueError: Must pass non-zero number of levels/codes`

Is there a way to just to it in the .agg call, or do I have to join another dataframe with the value_counts? I have quite a lot of id and car so the groupby operations a rather timeconsuming, thus I prefer only having to do one "call" of groupby.agg

>Solution :

Use Series.value_counts without parameter and with parameter normalize with DataFrame cosntructor:

def get_tag_ratio(x):
    """
    Calculate the ratio for each car
    """
    ratio = pd.DataFrame({'ratio': x.value_counts(normalize=True), 
                          'count': x.value_counts()})
    return ratio

car_info = df.groupby(["id"])["car"].apply(get_tag_ratio)
print (car_info)
             ratio  count
id                       
1  Volvo  0.750000      3
   Audi   0.250000      1
2  Audi   0.333333      1
   VW     0.333333      1
   Volvo  0.333333      1

Or join 2 MultiIndex Series:

g = df.groupby("id")["car"]
car_info = pd.concat([g.value_counts(normalize=True),
                      g.value_counts()], axis=1, keys=['ratio','count'])
print (car_info)
             ratio  count
id car                   
1  Volvo  0.750000      3
   Audi   0.250000      1
2  Audi   0.333333      1
   VW     0.333333      1
   Volvo  0.333333      1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading