I have a dataframe like as below
name,id,AL
A,1,22
A,2,22
B,5,21
B,5,23
B,4,24
C,6,21
I would like to do the below
a) Groupby name
b) get the unique count (nunique) of id
and PL
for each name
With the help of this post, I managed to do the below
cols = ['id','PL']
for col in cols:
d = {i: x.value_counts(normalize=True).to_dict() for i, x in cf.groupby('name')[col]}
print(pd.Series(d))
But the above doesn’t work for multiple columns (purely due to my limitation)
Is there anyway to make it work for multiple columns? My real data has 5 million rows and I have store count values in a % format under dict (for around 5-6 columns). So, any elegant and efficient solution welcome
I expect my output to be like below
name id_list AL_list
A {1:0.5,2:0.5} {22:1}
B {5:0.66,4:0.33} {21:0.33,23:0.33,24:0.33}
C {6:1.0} {21:1}
UPDATE – incorrect output based on answer try
>Solution :
You can do groupby
with value_counts
then convert to dict
out = pd.concat([df.groupby('name')[x].value_counts(normalize=True).round(2).reset_index(level=0).groupby('name').agg(dict) for x in ['id','AL']],axis=1).add_suffix('_list').reset_index()
Out[716]:
name id_list AL_list
0 A {1: 0.5, 2: 0.5} {22: 1.0}
1 B {5: 0.67, 4: 0.33} {21: 0.33, 23: 0.33, 24: 0.33}
2 C {6: 1.0} {21: 1.0}