How to prevent data from being recycled when using pd.merge_asof in Python

December 7, 2021

I am looking to join two data frames using the pd.merge_asof function. This function allows me to match data on a unique id and/or a nearest key. In this example, I am matching on the id as well as the nearest date that is less than or equal to the date in df1.

Is there a way to prevent the data from df2 being recycled when joining, so that a single row of df2 is not matched to more than one row of df1?

This is the code that I currently have that recycles the values in df2.

import pandas as pd
import datetime as dt

df1 = pd.DataFrame({'date': [dt.datetime(2020, 1, 2), dt.datetime(2020, 2, 2), dt.datetime(2020, 3, 2)],
                    'id': ['a', 'a', 'a']})

df2 = pd.DataFrame({'date': [dt.datetime(2020, 1, 1)],
                    'id': ['a'],
                    'value': ['1']})

pd.merge_asof(df1,
              df2,
              on='date',
              by='id',
              direction='backward',
              allow_exact_matches=True)
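
For reference, a minimal sketch of what the call above returns with this sample data. The variable name `recycled` is introduced here for illustration only; everything else is taken from the snippet above.

```python
import datetime as dt

import pandas as pd

df1 = pd.DataFrame({'date': [dt.datetime(2020, 1, 2), dt.datetime(2020, 2, 2), dt.datetime(2020, 3, 2)],
                    'id': ['a', 'a', 'a']})

df2 = pd.DataFrame({'date': [dt.datetime(2020, 1, 1)],
                    'id': ['a'],
                    'value': ['1']})

# `recycled` is a hypothetical name for the result of the original call
recycled = pd.merge_asof(df1, df2, on='date', by='id',
                         direction='backward', allow_exact_matches=True)

# The single df2 row is the nearest earlier match for all three df1 rows,
# so its value is repeated on each of them.
print(recycled['value'].tolist())  # ['1', '1', '1']
```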

This is the output that I would like to see instead, where only the first match is successful and the later rows get no value.

>Solution :

Since your merge direction is backward, you can run merge_asof as usual and then mask value on every row whose combination of id and df2's date has already appeared in an earlier row:

out = pd.merge_asof(df1,
                    df2.rename(columns={'date': 'date1'}),  # rename df2's date so it survives the merge
                    left_on='date',
                    right_on='date1',
                    by='id',
                    direction='backward',
                    allow_exact_matches=True)

# blank out 'value' on every row that repeats an (id, date1) pair already seen above it
out['value'] = out['value'].mask(out.duplicated(['id', 'date1']))
# equivalently (needs `import numpy as np`)
# out.loc[out.duplicated(['id', 'date1']), 'value'] = np.nan

Output:

        date id      date1 value
0 2020-01-02  a 2020-01-01     1
1 2020-02-02  a 2020-01-01   NaN
2 2020-03-02  a 2020-01-01   NaN
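
The same idea can be wrapped in a small helper and tried on more than one id. This is only a sketch under stated assumptions: the function name `merge_asof_first_only`, its parameters, the `date_right` column name, and the sample data are all made up for illustration and are not part of pandas or of the question. It uses the `.loc` assignment form of the mask.

```python
import numpy as np
import pandas as pd


def merge_asof_first_only(left, right, on='date', by='id', value_cols=('value',)):
    """Backward as-of merge in which each right-hand row fills at most one left row.

    Hypothetical helper: the name and signature are assumptions for this example.
    """
    right_key = f'{on}_right'  # keep the right-hand key under its own name
    out = pd.merge_asof(
        left.sort_values(on),  # merge_asof needs both frames sorted on the key
        right.sort_values(on).rename(columns={on: right_key}),
        left_on=on,
        right_on=right_key,
        by=by,
        direction='backward',
        allow_exact_matches=True,
    )
    # A repeated (by, right_key) pair means the same right-hand row matched again.
    repeated = out.duplicated([by, right_key])
    out.loc[repeated, list(value_cols)] = np.nan
    return out


# Made-up sample data with two ids and a second df2 row for id 'a'
left = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-02', '2020-01-03', '2020-02-02',
                            '2020-03-02', '2020-03-05']),
    'id': ['a', 'b', 'a', 'a', 'b'],
})
right = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-02-15', '2020-01-01']),
    'id': ['a', 'a', 'b'],
    'value': ['1', '2', '3'],
})

out = merge_asof_first_only(left, right)
# 'value' is kept on rows 0, 1 and 3 and is NaN on rows 2 and 4
print(out)
```

Because `duplicated` keeps the first occurrence, the earliest left-hand row (after sorting by date) is the one that receives each df2 value. Left rows with no match at all already have NaN in `value`, so masking them again changes nothing.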

dataframe

byMR

Published December 07, 2021
