How to merge two DataFrames in pandas based on row order

I have two DataFrames that I want to merge, but they do not have a common column.

As a workaround, I created a temporary column called tmp on each DataFrame:

y_pred['tmp'] = 1
data['tmp'] = 1 

data looks like:

      mean  year  tmp
4600   2.3  2019    1
2601   5.3  2020    1

whereas y_pred looks like:

   pred_score  tmp
0           2    1
1         5.2    1

and I merge them:

new_df = pd.merge(data, y_pred, on=['tmp'], how='left')
new_df.drop('tmp', inplace=True, axis=1)

Suppose each dataset has 30 rows: the merge gives me 30 × 30 = 900 rows, where I need only 30.

What I need is for new_df to have 30 rows, with the pred_score column simply attached to data in the current row order.

The result should be:

new_df:

      mean  year  pred_score
4600   2.3  2019           2
2601   5.3  2020         5.2

Is there a way to achieve this without having a common column?
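
For reference, the row explosion described above is what a merge on a constant key produces: every row of one frame matches every row of the other, which amounts to a cross join. A minimal sketch that reproduces it, assuming two-row stand-ins for the real 30-row frames (the values are copied from the samples above, and the name crossed is illustrative only):

```python
import pandas as pd

# Two-row stand-ins for the frames in the question (the real ones have 30 rows).
data = pd.DataFrame({"mean": [2.3, 5.3], "year": [2019, 2020]}, index=[4600, 2601])
y_pred = pd.DataFrame({"pred_score": [2.0, 5.2]})

# With tmp == 1 everywhere, each row of data matches each row of y_pred.
crossed = pd.merge(data.assign(tmp=1), y_pred.assign(tmp=1), on="tmp", how="left")

# The result has len(data) * len(y_pred) rows instead of len(data).
print(len(crossed), len(data) * len(y_pred))
```

With 30 rows on each side, the same pattern yields the 900 rows reported.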

Solution:

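Skip the merge (and the tmp column) entirely and assign y_pred.values to a new column. A plain NumPy array carries no index, so pandas fills the column by position instead of aligning on index labels. Both DataFrames must have the same number of rows: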

>>> data
      mean  year
4600   2.3  2019
2601   5.3  2020

>>> y_pred
   pred_score
0         2.0
1         5.2

>>> data['pred_score'] = y_pred.values

>>> data
      mean  year  pred_score
4600   2.3  2019         2.0
2601   5.3  2020         5.2
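
The same idea as a self-contained script, for anyone who wants to run it outside the REPL. This is a sketch under a few assumptions: the frames are rebuilt from the sample values above, the length check is an added safeguard, and the names with_array, with_concat and misaligned are illustrative only. It also shows one alternative (relabelling y_pred with the index of data and concatenating), and what happens when the column is assigned as a Series instead of an array.

```python
import pandas as pd

# Frames rebuilt from the sample values shown above.
data = pd.DataFrame({"mean": [2.3, 5.3], "year": [2019, 2020]}, index=[4600, 2601])
y_pred = pd.DataFrame({"pred_score": [2.0, 5.2]})

# Positional assignment is only meaningful when the row counts match.
if len(data) != len(y_pred):
    raise ValueError("data and y_pred must have the same number of rows")

# Option 1: assign a 1-D NumPy array, which has no index to align on.
with_array = data.copy()
with_array["pred_score"] = y_pred["pred_score"].to_numpy()

# Option 2: relabel y_pred with the index of data, then concatenate column-wise.
with_concat = pd.concat([data, y_pred.set_axis(data.index, axis=0)], axis=1)

# For contrast: assigning the Series itself aligns on index labels.
# The labels 0, 1 do not occur in 4600, 2601, so every value becomes NaN.
misaligned = data.copy()
misaligned["pred_score"] = y_pred["pred_score"]

print(with_array)
print(with_concat)
print(misaligned)
```

Both options keep the original index of data. The array version modifies a frame in place, while the concat version returns a new frame and extends naturally if y_pred has several columns.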