Merge two pandas dataframes, as lists in every cell

I want to merge 2 dataframes, with the resulting dataframe having a list in every single cell. I’m completely lost on how to do this.

My current solution uses the index of each dataframe to build a dict (e.g. dict[index[0]]['DEPTH'] = []) and then loops over the rows of both dataframes, appending each value to the matching key (e.g. dict[index[0]]['DEPTH'].append(cell_value)). I suspect that is very inefficient and slow.

Does a pandas solution exist that would get this done?

  • df1 would look like this: [image of df1 not available]

  • df2 would look like this: [image of df2 not available]

  • Resulting df would look something like this:
                    DEPTH        A
chr1~10007022~C    [1, 1]      [0, 0]
chr1~10007023~T    [1, 1]      [0, 0]
                  .
                  .
                  .
chr1~10076693~T    [1, 1]      [0, 0]

Keep in mind:

  • The indexes of the two dataframes will usually differ, but not always.
  • Each dataframe will probably contain more than 100M rows.

Solution:

You could concatenate the two dataframes, group the rows by their index (the item), and then aggregate each group with list.

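import pandas as pd

# First dataframe, indexed by item
df = pd.DataFrame({'item':['chr1-10007022-C', 'chr1-10007023-T'],
                  'DEPTH':[1,1],
                  'A':[0,0],
                  'C':[0,0]})
df = df.set_index('item')

# Second dataframe; only the first item also appears in df
df2 = pd.DataFrame({'item':['chr1-10007022-C', 'chr1-10007026-X'],
                  'DEPTH':[1,1],
                  'A':[0,0],
                  'C':[0,0]})
df2 = df2.set_index('item')

# Stack the rows, group rows that share an index label,
# and collect each column's values into a list per label
out = pd.concat([df,df2]).groupby(level=0).agg(list)
print(out)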

Output

                  DEPTH       A       C
item                                   
chr1-10007022-C  [1, 1]  [0, 0]  [0, 0]
chr1-10007023-T     [1]     [0]     [0]
chr1-10007026-X     [1]     [0]     [0]
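
Note that items present in only one dataframe end up with one-element lists, so you cannot tell which dataframe a value came from. If you need every list to have one slot per dataframe, one possible variation (a minimal sketch, not part of the original answer) is to reindex both frames to the union of their indexes before concatenating. The names df1, df2, full_index, aligned and out_aligned, and the sample values, are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; values differ between frames so the slots are distinguishable
df1 = pd.DataFrame({"DEPTH": [1, 1], "A": [0, 0]},
                   index=["chr1-10007022-C", "chr1-10007023-T"])
df2 = pd.DataFrame({"DEPTH": [2, 3], "A": [5, 6]},
                   index=["chr1-10007022-C", "chr1-10007026-X"])

# Union of both indexes, so every label is present in both frames
full_index = df1.index.union(df2.index)

# reindex inserts NaN rows for labels a frame does not have
aligned = [frame.reindex(full_index) for frame in (df1, df2)]

# Each cell becomes [value_from_df1, value_from_df2]
out_aligned = pd.concat(aligned).groupby(level=0).agg(list)
print(out_aligned)
```

The trade-off is that the NaN placeholders turn integer columns into floats, and the reindexed copies use extra memory, which may matter at >100M rows.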