Merging many to many Dask

Say I have the following datasets (suppose they are Dask DataFrames):

df A =

1
1
2
2
2
2
3
4
5
5
5
5
5
5

df B =

1
2
2
3
3
3
4
5
5
5

I would like to merge the two so that, for each observation, the resulting DataFrame keeps the rows from whichever DataFrame has more information (for instance, for observation 1 I would like to keep the rows from df A, for observation 3 the rows from df B, and so on).
In other words the resulting DataFrame should be like this:

df C =

1
1
2
2
2
2
3
3
3
4
5
5
5
5
5
5

Is there a way to do that in Dask?

Thank you

>Solution :

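It looks like you want dask.dataframe.merge (also available as the DataFrame.merge method). Import dask.dataframe, then pick the kind of merge with the how parameter. Keep in mind that a merge pairs every row of df A with every matching row of df B, so when a key is duplicated on both sides the row counts multiply: the outer merge below returns 32 rows, not the 16 rows of df C.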

import dask.dataframe as dd

# Outer merge on the shared key column; compute() materializes the lazy result
df_c = dd.merge(df_a, df_b, how='outer', on='sample_id')
df_c.compute()

[Out]:
    sample_id
0           1
1           1
2           2
3           2
4           2
5           2
6           2
7           2
8           2
9           2
10          3
11          3
12          3
13          4
14          5
15          5
16          5
17          5
18          5
19          5
20          5
21          5
22          5
23          5
24          5
25          5
26          5
27          5
28          5
29          5
30          5
31          5

Note:

  • This thread has really valuable information on merges. Even though its focus is on Pandas, it will allow one to understand left, right, outer, and inner merges.