Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to merge two dataframes on datetime, conditionally if date is the same?

I have two dataframes

df1:

       datetime                 value  date_only
324    2022-03-09 09:26:12.185  48     2022-03-09
626    2022-03-10 08:39:59.087  24     2022-03-10
923    2022-03-11 09:20:41.032  30     2022-03-11
1218   2022-03-12 09:54:11.748  44     2022-03-12

and df2:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

      datetime                      amount  date_only
22    2022-03-08 09:53:12.040358    5.0     2022-03-08
313   2022-03-09 08:34:55.160470    16.0    2022-03-09
339   2022-03-09 10:39:25.053984    3.0     2022-03-09
623   2022-03-10 08:30:14.754819    7.0     2022-03-10
916   2022-03-11 08:51:41.312365    14.0    2022-03-11
1217  2022-03-12 09:40:40.462450    13.0    2022-03-12
1472  2022-03-13 07:52:02.369454    8.0     2022-03-13
1774  2022-03-14 07:58:23.352999    8.0     2022-03-14
2095  2022-03-15 08:50:33.755545    7.0     2022-03-15

I am trying to merge the two dataframes so that everything from the df1 is included, and only rows where df2 occurs on the same date (using date_only) are included, but are interspersed chronologically based on datetime (or index), as follows:

       datetime                     value   date_only    amount
313    2022-03-09 08:34:55.160470   null    2022-03-09   16.0
324    2022-03-09 09:26:12.185      48      2022-03-09   null
339    2022-03-09 10:39:25.053984   null    2022-03-09   3.0
623    2022-03-10 08:30:14.754819   null    2022-03-10   7.0
626    2022-03-10 08:39:59.087      24      2022-03-10   null
916    2022-03-11 08:51:41.312365   null    2022-03-11   14.0
923    2022-03-11 09:20:41.032      30      2022-03-11   null
1217   2022-03-12 09:40:40.462450   null    2022-03-12   13.0
1218   2022-03-12 09:54:11.748      44      2022-03-12   null

Note that rows from df2 where date was NOT in df1 (March 8/13/14/15) are not included in the merged df.

I have tried following this question with a similar topic but they are using cumcount() to achieve something else. I have also tried implementing my own merges, joins, concats, etc. to no avail, e.g.

test_merge = pd.merge(df1, df2, how='left', on='date_only', left_index=True, right_index=True)

Can anyone help me figure out the proper syntax for this merge? Or do I need some other pre-processing step before the merge (new column, etc.) to make it work? Thank you so much.

>Solution :

IIUC, you do not need a merge. Just concat your data after filtering the rows of df2 by dates in df1 and sort_values:

out = (pd
 .concat([df1, df2[df2['date_only'].isin(df1['date_only'])]])
 .sort_values(by=['datetime'])
)

output:

                        datetime  value   date_only  amount
313   2022-03-09 08:34:55.160470    NaN  2022-03-09    16.0
324      2022-03-09 09:26:12.185   48.0  2022-03-09     NaN
339   2022-03-09 10:39:25.053984    NaN  2022-03-09     3.0
623   2022-03-10 08:30:14.754819    NaN  2022-03-10     7.0
626      2022-03-10 08:39:59.087   24.0  2022-03-10     NaN
916   2022-03-11 08:51:41.312365    NaN  2022-03-11    14.0
923      2022-03-11 09:20:41.032   30.0  2022-03-11     NaN
1217  2022-03-12 09:40:40.462450    NaN  2022-03-12    13.0
1218     2022-03-12 09:54:11.748   44.0  2022-03-12     NaN

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading