Say I have a dataframe
import pandas as pd
df = pd.DataFrame({"id":[1,1,1,1,2,2,2],
"car":["Volvo","Audi","Volvo","Volvo","VW","Audi","Volvo"]})
and I want to get the count-ratio for each car within each id aswell as the count i.e the resulting dataframe would be
ratio count
car
id
1 Audi 0.25 1
Volvo 0.75 3
2 Audi 0.33 1
Volvo 0.33 1
VW 0.33 1
I have tried
def get_tag_ratio(x):
"""
Calculate the ratio for each car
"""
ratio = x.value_counts(normalize=True)
return ratio
car_info = df.groupby(["id"])["car"].agg(
ratio=get_tag_ratio, count="count")
but the result is
ratio count
id
1 [0.75,0.25] 4
2 [0.33,0.33,0.33] 3
I have also tried
car_info = df.groupby(["id","car"])["car"].agg(
ratio=get_tag_ratio, count="count")
which gives
ratio count
id car
1 Audi 1.0 1
Volvo 1.0 3
2 Audi 1.0 1
Volvo 1.0 1
VW 1.0 1
which is almost there – unless the value_counts now is applied within each car group and not each id group.
Just parsing "value_counts" doens’t work (also, I don’t know how to get "normalize=True" as an argument to value_counts here)
df.groupby("id")["car"].agg(["value_counts","count"]) #`ValueError: Must pass non-zero number of levels/codes`
Is there a way to just to it in the .agg call, or do I have to join another dataframe with the value_counts? I have quite a lot of id and car so the groupby operations a rather timeconsuming, thus I prefer only having to do one "call" of groupby.agg
>Solution :
Use Series.value_counts without parameter and with parameter normalize with DataFrame cosntructor:
def get_tag_ratio(x):
"""
Calculate the ratio for each car
"""
ratio = pd.DataFrame({'ratio': x.value_counts(normalize=True),
'count': x.value_counts()})
return ratio
car_info = df.groupby(["id"])["car"].apply(get_tag_ratio)
print (car_info)
ratio count
id
1 Volvo 0.750000 3
Audi 0.250000 1
2 Audi 0.333333 1
VW 0.333333 1
Volvo 0.333333 1
Or join 2 MultiIndex Series:
g = df.groupby("id")["car"]
car_info = pd.concat([g.value_counts(normalize=True),
g.value_counts()], axis=1, keys=['ratio','count'])
print (car_info)
ratio count
id car
1 Volvo 0.750000 3
Audi 0.250000 1
2 Audi 0.333333 1
VW 0.333333 1
Volvo 0.333333 1