Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Finding whether there is any overlap between two date periods in DataFrame

I have the following pd.DataFrame

df = pd.DataFrame({'admission_timestamp': ['2021-01-17 17:45:00', '2020-03-31 23:32:00', '2020-03-27 18:20:00', '2020-04-17 18:12:00', '2020-03-19 19:12:00'], 'end_period': ['2021-01-18 17:45:00', '2020-04-01 23:32:00', '2020-03-28 18:20:00', '2020-04-18 18:12:00', '2020-03-20 19:12:00'], 'start_med': ['NaT', '2020-04-01 00:00:00', '2020-03-27 19:00:00', '2020-04-17 18:39:24', 'NaT'], 'end_med': ['NaT', '2020-04-14 21:00:00', '2020-04-05 00:00:00', '2020-05-06 22:07:29', 'NaT']})

that looks like:

   admission_timestamp  end_period           start_med            end_med 
1  2021-01-17 17:45:00  2021-01-18 17:45:00  NaT                  NaT   
2  2020-03-31 23:32:00  2020-04-01 23:32:00  2020-04-01 00:00:00  2020-04-14 21:00:00
3  2020-03-27 18:20:00  2020-03-28 18:20:00  2020-03-27 19:00:00  2020-04-05 00:00:00
4  2020-04-17 18:12:00  2020-04-18 18:12:00  2020-04-17 18:39:24  2020-05-06 22:07:29   
5  2020-03-19 19:12:00  2020-03-20 19:12:00  NaT                  NaT

I want to create a new column received_medidation that states whether or not (boolean) the patient received medication between admission_timestamp and end_period (even if it was for only one second). So, the boolean should state if there is any time between admission_timestamp and end_period that overlaps with the time between start_med and end_med. The dtypes are all datetime64[ns].

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I know that we can create boolean masks such as

condition = (df['date'] > start_date) & (df['date'] <= end_date)

… however I fail to understand how this could possibily solve the task above. Any help is appreciated.

>Solution :

You can apply a function that checks the condition (note that it suffices that start_med date is between admission_timestamp and end_period):

for col in df.columns:
    df[col] = pd.to_datetime(df[col])

df['received_medidation'] = (df['admission_timestamp'] < df['start_med']) & (df['start_med'] < df['end_period'])

Output:

  admission_timestamp          end_period           start_med  \
0 2021-01-17 17:45:00 2021-01-18 17:45:00                 NaT   
1 2020-03-31 23:32:00 2020-04-01 23:32:00 2020-04-01 00:00:00   
2 2020-03-27 18:20:00 2020-03-28 18:20:00 2020-03-27 19:00:00   
3 2020-04-17 18:12:00 2020-04-18 18:12:00 2020-04-17 18:39:24   
4 2020-03-19 19:12:00 2020-03-20 19:12:00                 NaT   

              end_med  received_medidation  
0                 NaT                False  
1 2020-04-14 21:00:00                 True  
2 2020-04-05 00:00:00                 True  
3 2020-05-06 22:07:29                 True  
4                 NaT                False  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading