I have a DataFrame like this:
MV  id   NAME  ADDRESS  DOC   DOCTYPE  PHONE
1   100  Mark  Home     299   NI       {123,456}
2   100  John  Work     A123  Pass     {789,101}
3   100        Club
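The answers below assume the blank cells are NaN. The following is a minimal sketch that rebuilds this sample in pandas. It relies on that assumption and a few others: Club sits in ADDRESS (as the expected output implies), and DOC and PHONE hold plain strings, as in the answer's printed output. The real PHONE cells might be Python sets instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: blank cells of row 3 are NaN,
# DOC and PHONE are stored as strings)
df = pd.DataFrame({
    'MV': [1, 2, 3],
    'id': [100, 100, 100],
    'NAME': ['Mark', 'John', np.nan],
    'ADDRESS': ['Home', 'Work', 'Club'],
    'DOC': ['299', 'A123', np.nan],
    'DOCTYPE': ['NI', 'Pass', np.nan],
    'PHONE': ['{123,456}', '{789,101}', np.nan],
})
print(df)
```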
What I want is to merge the rows that have the same id into a single row. Each cell should hold a dictionary whose keys come from another column, like this:
id   NAME             ADDRESS                       DOC                     PHONE
100  {1:Mark,2:John}  {1:'Home',2:'Work',3:'Club'}  {NI:'299',Pass:'A123'}  {1:{123,456},2:{789,101}}
As you can see, I use two different key columns: MV is the key for NAME, ADDRESS and PHONE, and DOCTYPE is the key for DOC. How can I do something like that? I tried this:
agg={'id':'first','NAME':dict,'ADDRESS':dict,'PHONE':dict}
df_new=df.groupby(['CUSTOMER_CODE'],as_index=False).aggregate(agg)
return df_new
but it gave me this output, where the keys are the default row index (0, 1, 2) instead of MV:
id   Name             Address                 Phone
100  {0:Mark,1:John}  {0:Home,1:Work,2:Club}  {0:{123,456},1:{789,101}}
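The 0, 1, 2 keys come from the row index. When dict is used as the aggregation function, each group's column arrives as a Series that keeps the original row labels as its index. dict() then builds its keys from that index. Here is a small illustration on a plain Series (the variable names are only for illustration):

```python
import pandas as pd

# A group's column keeps the DataFrame's row labels as its index (0, 1 here)
names = pd.Series(['Mark', 'John'])
print(dict(names))        # {0: 'Mark', 1: 'John'}

# If MV is the index instead, the same conversion uses MV as the keys
names_by_mv = pd.Series(['Mark', 'John'], index=[1, 2])
print(dict(names_by_mv))  # {1: 'Mark', 2: 'John'}
```

This is why moving the key column into the index (set_index) before grouping changes the dictionary keys.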
Solution:
Assuming your empty cells are NaN, you can use a custom groupby.agg:
Let’s first see how to aggregate with a single reference column (MV):
df.set_index('MV').groupby('id', as_index=False).agg(lambda s: s.dropna().to_dict())
Output:
   id   NAME                    ADDRESS                            DOC                    DOCTYPE               PHONE
0  100  {1: 'Mark', 2: 'John'}  {1: 'Home', 2: 'Work', 3: 'Club'}  {1: '299', 2: 'A123'}  {1: 'NI', 2: 'Pass'}  {1: '{123,456}', 2: '{789,101}'}
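Below is a self-contained sketch of this single-reference step. It rebuilds the sample under the same assumptions (blank cells as NaN, PHONE as strings). It also shows the limitation that motivates the generic version: DOC ends up keyed by MV (1, 2) rather than by DOCTYPE.

```python
import numpy as np
import pandas as pd

# Sample data (assumption: blank cells are NaN, PHONE stored as strings)
df = pd.DataFrame({
    'MV': [1, 2, 3],
    'id': [100, 100, 100],
    'NAME': ['Mark', 'John', np.nan],
    'ADDRESS': ['Home', 'Work', 'Club'],
    'DOC': ['299', 'A123', np.nan],
    'DOCTYPE': ['NI', 'Pass', np.nan],
    'PHONE': ['{123,456}', '{789,101}', np.nan],
})

# MV becomes the index, so every column's dict is keyed by MV;
# dropna() skips the blank cells before to_dict()
single = (df.set_index('MV')
            .groupby('id', as_index=False)
            .agg(lambda s: s.dropna().to_dict()))

print(single.loc[0, 'NAME'])  # {1: 'Mark', 2: 'John'}
print(single.loc[0, 'DOC'])   # {1: '299', 2: 'A123'}, keyed by MV, not DOCTYPE
```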
Generic answer for different reference columns
Now we can adapt that to use different reference columns:
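import pandas as pd

def f(s):
    # drop the empty (NaN) cells; to_dict() then keys each value by the current index
    return s.dropna().to_dict()

# reference column -> columns whose dicts should be keyed by it
d = {'MV': ['NAME', 'ADDRESS', 'PHONE'],
     'DOCTYPE': ['DOC'],
    }

# one grouped frame per reference column, placed side by side on the id index
out = (pd.concat([df.set_index(k).groupby('id')[v].agg(f)
                  for k, v in d.items()], axis=1)
         .reset_index()
      )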
Output:
   id   NAME                    ADDRESS                            PHONE                             DOC
0  100  {1: 'Mark', 2: 'John'}  {1: 'Home', 2: 'Work', 3: 'Club'}  {1: '{123,456}', 2: '{789,101}'}  {'NI': '299', 'Pass': 'A123'}
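Note that dropna() only skips NaN. The blank cells might actually be empty strings; this is an assumption, since the question does not say. In that case they would end up in the dictionaries, so one option is to convert them first with replace. Below is an end-to-end sketch on the sample data under that assumption:

```python
import numpy as np
import pandas as pd

# Sample data; assumption: the blank cells arrive as empty strings
df = pd.DataFrame({
    'MV': [1, 2, 3],
    'id': [100, 100, 100],
    'NAME': ['Mark', 'John', ''],
    'ADDRESS': ['Home', 'Work', 'Club'],
    'DOC': ['299', 'A123', ''],
    'DOCTYPE': ['NI', 'Pass', ''],
    'PHONE': ['{123,456}', '{789,101}', ''],
})

# dropna() only skips NaN, so turn empty strings into NaN first
df = df.replace('', np.nan)

def f(s):
    return s.dropna().to_dict()

# reference column -> columns keyed by it
d = {'MV': ['NAME', 'ADDRESS', 'PHONE'],
     'DOCTYPE': ['DOC']}

out = (pd.concat([df.set_index(k).groupby('id')[v].agg(f)
                  for k, v in d.items()], axis=1)
         .reset_index())
print(out.loc[0, 'DOC'])  # {'NI': '299', 'Pass': 'A123'}
```

If the blanks are already NaN, the replace call finds nothing to change and can be dropped.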