Randomly merge two dataframes based on condition in Pandas

I have two dataframes of the same length that share a column called post_id. They look like this:

df1:

post_id text
001 some text 1
002 some text 2
003 some text 3
999 some text 999

df2:

post_id text
001 different text 1
002 different text 2
003 different text 3
999 different text 999

What I want is a new dataframe in which half of the rows are randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually picking rows with iloc?

Solution:

If both dataframes have the same columns and the same index, use DataFrame.update with DataFrame.sample. Note that update modifies df1 in place and returns None:

df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
   post_id                text
0      1.0    different text 1
1      2.0         some text 2
2      3.0         some text 3
3    999.0  different text 999
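
For anyone who wants to reproduce this without overwriting df1, here is a minimal sketch of the same update-plus-sample idea wrapped in a helper. The function name mix_half, its key and seed parameters, and the sample data are assumptions made for illustration and are not part of the original answer. It also assumes post_id values are unique in both frames. Moving post_id into the index makes update align rows on post_id rather than on the positional index, and it keeps post_id out of the updated columns, which should avoid the float conversion visible in the output above.

```python
import pandas as pd


def mix_half(df1, df2, key="post_id", seed=None):
    """Return a new frame with about half its rows taken from df2, the rest from df1.

    Hypothetical helper: assumes `key` is unique in both frames.
    """
    # Work on copies indexed by the shared key so the caller's frames stay untouched
    # and update() aligns on post_id instead of the positional index.
    left = df1.set_index(key).copy()
    right = df2.set_index(key)

    # Pick half of df2's rows without replacement; a seed makes the pick repeatable.
    picked = right.sample(frac=0.5, replace=False, random_state=seed)

    # update() works in place and returns None: rows present in `picked`
    # overwrite the rows of `left` that carry the same post_id.
    left.update(picked)

    # Turn post_id back into an ordinary column.
    return left.reset_index()


# Sample data modelled on the question; post_id is kept as a string
# so the leading zeros survive.
df1 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["some text 1", "some text 2", "some text 3", "some text 999"],
})
df2 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["different text 1", "different text 2",
             "different text 3", "different text 999"],
})

mixed = mix_half(df1, df2, seed=0)
print(mixed)
```

Which two rows come from df2 depends on the seed, so no specific output is shown here. Every post_id appears exactly once in the result because df1 supplies the full set of rows and the sample from df2 only overwrites existing ones.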