Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to merge rows in dictionary python using pandas

i have a dataframe like that

MV id   NAME   ADDRESS  DOC  DOCTYPE PHONE
1  100  Mark   Home     299  NI      {123,456}
2  100  John   Work     A123 Pass    {789,101}
3  100         Club

what i want to do is to merge the columns that has the same id into one cell in dictionary like this and make the key of a value in dictionary from another column

id   NAME              ADDRESS                          DOC                    PHONE
100  {1:Mark,2:John}   {1:'Home',2:'Work',3:'Club'}     {NI:'299',Pass:'A123'} {1:{123,456},2:{789,101}}

as u can see i used 2 columns as a key column mv used as a key in (Name,Address,Phone)
and i used column DOCTYPE as key in DOC so how can i do something like that i tried this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

agg={'id':'first','NAME':dict,'ADDRESS':dict,'PHONE':dict}
    df_new=df.groupby(['CUSTOMER_CODE'],as_index=False).aggregate(agg)
    return df_new

but it gave me this output

id   Name             Address                 Phone
100  {0:Mark,1:John}  {0:Home,1:Work,2:Club}  {0:{123,456},1:{789,101}}

>Solution :

Assuming your empty cells are NaN, you can use a custom groupby.agg:

Let’s first see how we could aggregate with a single reference:

df.set_index('MV').groupby('id', as_index=False).agg(lambda s: s.dropna().to_dict())

Output:

    id                    NAME                            ADDRESS                    DOC               DOCTYPE                             PHONE
0  100  {1: 'Mark', 2: 'John'}  {1: 'Home', 2: 'Work', 3: 'Club'}  {1: '299', 2: 'A123'}  {1: 'NI', 2: 'Pass'}  {1: '{123,456}', 2: '{789,101}'}

generic answer for different references

Now we can adapt that to use different reference columns:

def f(s):
    return s.dropna().to_dict()

d = {'MV': ['NAME', 'ADDRESS', 'PHONE'],
     'DOCTYPE': ['DOC']
    }

out = (pd.concat([df.set_index(k).groupby('id')[v].agg(f)
                 for k, v in d.items()], axis=1)
         .reset_index()
      )

Output:

    id                    NAME                            ADDRESS                             PHONE                            DOC
0  100  {1: 'Mark', 2: 'John'}  {1: 'Home', 2: 'Work', 3: 'Club'}  {1: '{123,456}', 2: '{789,101}'}  {'NI': '299', 'Pass': 'A123'}
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading