Randomly merge two dataframes based on condition in Pandas

December 29, 2021

I have two dataframes of the same length with a shared column called post_id. They look like this:

df1:

post_id	text
001	some text 1
002	some text 2
003	some text 3
…	…
999	some text 999

df2:

post_id	text
001	different text 1
002	different text 2
003	different text 3
…	…
999	different text 999

What I want is a new dataframe in which half of the rows are randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually selecting the rows with iloc?

>Solution :

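If both dataframes have the same columns and the same index, use DataFrame.update with DataFrame.sample. sample draws a random half of the rows of df2, and update overwrites the rows of df1 that have the same index labels: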

# update modifies df1 in place and returns None, so print df1 itself.
df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
   post_id                text
0      1.0    different text 1
1      2.0         some text 2
2      3.0         some text 3
3    999.0  different text 999
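
In the output above post_id is printed as a float (1.0 rather than 1), and df1 itself has been overwritten. If either is a problem, a possible variant is to sample only the index labels and assign just the text column on a copy. This is a minimal sketch, not part of the original answer: the four-row frames, the mixed name and random_state=0 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 from the question (same index, same columns).
df1 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["some text 1", "some text 2", "some text 3", "some text 999"],
})
df2 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["different text 1", "different text 2",
             "different text 3", "different text 999"],
})

# Randomly pick half of the row labels; random_state is only here for reproducibility.
picked = df2.sample(frac=0.5, replace=False, random_state=0).index

# Work on a copy so df1 stays intact, and overwrite only the text column
# for the picked rows, leaving post_id and its dtype untouched.
mixed = df1.copy()
mixed.loc[picked, "text"] = df2.loc[picked, "text"]

print(mixed)
```

Because the rows are replaced by label rather than appended, every post_id appears exactly once, with half of the text values coming from each dataframe.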