I have two dataframes of the same length that share a column called post_id. They look like this:
df1:
| post_id | text |
|---|---|
| 001 | some text 1 |
| 002 | some text 2 |
| 003 | some text 3 |
| … | … |
| 999 | some text 999 |
df2:
| post_id | text |
|---|---|
| 001 | different text 1 |
| 002 | different text 2 |
| 003 | different text 3 |
| … | … |
| 999 | different text 999 |
What I want is a new dataframe with half of its rows randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually selecting the rows with iloc?
>Solution :
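If both dataframes have the same columns and the same index, use DataFrame.update with DataFrame.sample. update aligns on the index and modifies df1 in place, so only the rows sampled from df2 are overwritten: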
df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
post_id text
0 1.0 different text 1
1 2.0 some text 2
2 3.0 some text 3
3 999.0 different text 999
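
If you would rather leave df1 untouched and keep post_id from being converted to floats as in the output above, a variant along the following lines may help. It is only a sketch, assuming both dataframes share the same index and row order; the four-row sample data, the `random_state` value, and the names `picked` and `mixed` are illustrative, not from the question.

```python
import pandas as pd

# Small stand-in data (assumption: both frames share the same index).
df1 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["some text 1", "some text 2", "some text 3", "some text 999"],
})
df2 = pd.DataFrame({
    "post_id": ["001", "002", "003", "999"],
    "text": ["different text 1", "different text 2",
             "different text 3", "different text 999"],
})

# Randomly pick half of df2's rows without replacement.
# random_state is set only to make the sketch reproducible.
picked = df2.sample(frac=0.5, replace=False, random_state=0)

# Work on a copy so df1 itself is not modified, and overwrite
# only the text column for the picked index labels.
mixed = df1.copy()
mixed.loc[picked.index, "text"] = picked["text"]

print(mixed)
```

Because only the text column is assigned, post_id keeps its original dtype, and every post_id appears exactly once in `mixed`.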