Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas: combine values in a list of dataframes of different sizes

I have a dataframe as follows,(but much bigger):

import pandas as pd
panda_list = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_a.3': [0, 6, 0, 8],'sent_a.4': [1, 1, 8, 6],'ID':['id_1_2','id_1_3','id_1_4','id_1_4']}),
              pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2], 'sent_a.3': [0, 6],'sent_a.4': [1, 1],'ID':['id_2_1','id_2_2']}),
              pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_a.3': [0, 6, 0, 8], 'sent_a.4': [1, 1, 8, 6], 'ID': ['id_1_2', 'id_1_3', 'id_1_4', 'id_1_4']}),
              ]

I would like to combine the pandas on similar columns into one dataframe, so the output would look like this,

desired output:
print(new_df)

   sent_a.1  sent_a.2  sent_a.3   sent_a.4      ID
0   0,0,0     0,0,0       0,0,0     1,1,1    id_1_2,id_2_1,id_3_1
1   3,3,3     1,2,1       6,6,6     1,1,1    id_1_3,id_2_2,id_3_2
2   2,NaN,2   4,NaN,4     0,NaN,0   8,NaN,8  id_1_4,NaN,id_3_3
3   1,NaN,1   0,NaN,0     8,NaN,8   6,NaN,6  id_1_4,NaN,id_3_4

So I have tried the following for example for the first columns but since the sizes are not the same, it send an error,

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

new_df=pd.DataFrame()

new_df['sent_a.1']=panda_list[0]['sent_a.1'] + ',' + panda_list[1]['sent_a.1'] +','+ panda_list[2]['sent_a.1']

Note:

I do not want to concatenate or merge the columns though, i want to combine them. please look at the desired output.

>Solution :

IIUC, you can concat and aggregate (assuming here as list).

dropping NaNs:
out = pd.concat(panda_list).groupby(level=0).agg(list)

output:

    sent_a.1   sent_a.2   sent_a.3   sent_a.4                        ID
0  [0, 0, 0]  [0, 0, 0]  [0, 0, 0]  [1, 1, 1]  [id_1_2, id_2_1, id_1_2]
1  [3, 3, 3]  [1, 2, 1]  [6, 6, 6]  [1, 1, 1]  [id_1_3, id_2_2, id_1_3]
2     [2, 2]     [4, 4]     [0, 0]     [8, 8]          [id_1_4, id_1_4]
3     [1, 1]     [0, 0]     [8, 8]     [6, 6]          [id_1_4, id_1_4]

To aggregate as string, ensure to convert to string first:

out = pd.concat(panda_list).astype(str).groupby(level=0).agg(','.join)

output:

  sent_a.1 sent_a.2 sent_a.3 sent_a.4                    ID
0    0,0,0    0,0,0    0,0,0    1,1,1  id_1_2,id_2_1,id_1_2
1    3,3,3    1,2,1    6,6,6    1,1,1  id_1_3,id_2_2,id_1_3
2      2,2      4,4      0,0      8,8         id_1_4,id_1_4
3      1,1      0,0      8,8      6,6         id_1_4,id_1_4
ensuring not to lose NaNs

concatenate on the other axis and perform a double transpose

out = (
 pd.concat(panda_list, axis=1).astype(str)
   .T.groupby(level=0, sort=False)
   .agg(','.join).T
)

output:

  sent_a.1 sent_a.2 sent_a.3 sent_a.4                    ID
0  0,0.0,0  0,0.0,0  0,0.0,0  1,1.0,1  id_1_2,id_2_1,id_1_2
1  3,3.0,3  1,2.0,1  6,6.0,6  1,1.0,1  id_1_3,id_2_2,id_1_3
2  2,nan,2  4,nan,4  0,nan,0  8,nan,8     id_1_4,nan,id_1_4
3  1,nan,1  0,nan,0  8,nan,8  6,nan,6     id_1_4,nan,id_1_4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading