Conditionally append rows from one dataframe to another based on key-id and date range

I have two large dataframes with the same columns. Relevant example data is shown below.

import pandas as pd

df1 = [[52, '1', 'Mast', '1/1/2022'], [54, '1', 'Mast', '1/1/2023'],
       [55, '2', 'Mast', '25/5/2022'], [52, '1', 'Mast', '14/7/2022']]

df = pd.DataFrame(df1, columns=['Cow', 'Lact', 'Event', 'Date'])
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

df2 = [[60, '1', 'Other', '10/1/2022'], [54, '1', 'Other', '3/1/2023'],
       [55, '2', 'Other', '30/5/2022'], [52, '1', 'Other', '14/8/2022']]

df2 = pd.DataFrame(df2, columns=['Cow', 'Lact', 'Event', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

The structure of df is

    Cow Lact    Event   Date
0   52  1   Mast    2022-01-01
1   54  1   Mast    2023-01-01
2   55  2   Mast    2022-05-25
3   52  1   Mast    2022-07-14

The structure of df2 is

    Cow Lact    Event   Date
0   60  1   Other   2022-01-10
1   54  1   Other   2023-01-03
2   55  2   Other   2022-05-30
3   52  1   Other   2022-08-14

I would like to append a row from df2 to df when its "Cow" and "Lact" match a row in df and its "Date" falls between the "Date" of that df row and five days after it.
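
To spell the date condition out for a single pair of rows, here is a minimal sketch. The helper name `in_window` is made up for illustration, and both ends of the window are assumed to be inclusive (the desired output below includes a row exactly five days later, so at least the upper end has to be).

```python
import pandas as pd


def in_window(event_date, other_date, days=5):
    """Hypothetical helper: True if other_date lies within
    [event_date, event_date + days], both ends inclusive (an assumption)."""
    return event_date <= other_date <= event_date + pd.Timedelta(days=days)


mast_date = pd.Timestamp('2022-05-25')  # "Date" of one row in df

# made-up candidate dates for a df2 row with the same Cow and Lact
for candidate in ['2022-05-24', '2022-05-25', '2022-05-30', '2022-05-31']:
    print(candidate, in_window(mast_date, pd.Timestamp(candidate)))
```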

The desired output with this example data is

    Cow Lact    Event   Date
0   52  1   Mast    2022-01-01
1   54  1   Mast    2023-01-01
2   54  1   Other   2023-01-03
3   55  2   Mast    2022-05-25
4   55  2   Other   2022-05-30
5   52  1   Mast    2022-07-14

Any ideas appreciated.

Solution:

Not a clean solution, but it works. Each df2 row is compared with the df rows for the same Cow and Lact, not with the overall date range of df (that would also let Cow 52's 2022-08-14 row through):

# pair every df2 row with the df rows that share its Cow and Lact
pairs = df2.reset_index().merge(df, on=['Cow', 'Lact'], suffixes=('', '_df'))

# keep pairs where the df2 Date is within [df Date, df Date + 5 days]
in_range = pairs['Date'].between(pairs['Date_df'],
                                 pairs['Date_df'] + pd.Timedelta(days=5))

# select the matching rows from df2 (each row once, even if it matched twice)
df_append = df2.loc[pairs.loc[in_range, 'index'].unique()]

# concat
result = pd.concat([df, df_append], ignore_index=True)
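
A plain concat puts the appended rows at the end, whereas the desired output in the question lists each df2 row directly after the df row it matches. The following is a rough sketch of one way to get that ordering. The helper `append_in_window` and the temporary columns `base_pos`, `extra_pos` and `order` are names made up here, and the sketch assumes an inclusive window and that at least one df2 row qualifies.

```python
import pandas as pd

df = pd.DataFrame(
    [[52, '1', 'Mast', '1/1/2022'], [54, '1', 'Mast', '1/1/2023'],
     [55, '2', 'Mast', '25/5/2022'], [52, '1', 'Mast', '14/7/2022']],
    columns=['Cow', 'Lact', 'Event', 'Date'])
df2 = pd.DataFrame(
    [[60, '1', 'Other', '10/1/2022'], [54, '1', 'Other', '3/1/2023'],
     [55, '2', 'Other', '30/5/2022'], [52, '1', 'Other', '14/8/2022']],
    columns=['Cow', 'Lact', 'Event', 'Date'])
for frame in (df, df2):
    frame['Date'] = pd.to_datetime(frame['Date'], format='%d/%m/%Y')


def append_in_window(base, extra, keys=('Cow', 'Lact'), days=5):
    """Hypothetical helper: place each qualifying `extra` row right after
    the first `base` row it matches."""
    keys = list(keys)
    # work with positional indexes so row order can be restored later
    base = base.reset_index(drop=True)
    extra = extra.reset_index(drop=True)
    left = base.rename_axis('base_pos').reset_index()[keys + ['Date', 'base_pos']]
    right = extra.rename_axis('extra_pos').reset_index()

    # pair rows on the keys, then keep pairs inside the date window
    pairs = right.merge(left, on=keys, suffixes=('', '_base'))
    window = pd.Timedelta(days=days)
    hits = pairs[pairs['Date'].between(pairs['Date_base'],
                                       pairs['Date_base'] + window)]
    # one copy of each extra row, attached to the earliest base row it matched
    hits = hits.sort_values('base_pos').drop_duplicates('extra_pos')

    # tag both groups so base rows sort ahead of the rows appended to them
    tagged_base = base.assign(base_pos=base.index, order=0)
    tagged_hits = hits[list(extra.columns) + ['base_pos']].assign(order=1)
    out = pd.concat([tagged_base, tagged_hits])
    out = out.sort_values(['base_pos', 'order', 'Date'])
    return out.drop(columns=['base_pos', 'order']).reset_index(drop=True)


result = append_in_window(df, df2)
print(result)
```

On the example data this should give the six rows in the order shown in the question. For large frames the merge can get big when many rows share the same Cow and Lact, so it may be worth filtering df2 to the keys present in df first.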
