How to find overlapping time start and end points?

For each ID, I would like to find the earliest measurement time before 12:00:00 and the latest measurement time after 12:00:00, so that I can choose the maximum overlapping start and end times. Here is the sample data:

import numpy as np
import pandas as pd
import random

df = pd.DataFrame({'DATE_TIME': pd.date_range('2022-11-01', '2022-11-06 23:00:00', freq='20min'),
                   'ID': [random.randrange(1, 20) for n in range(430)]})

df['VALUE1'] = [random.randrange(110, 140) for n in range(430)]
df['VALUE2'] = [random.randrange(50, 60) for n in range(430)]
df['VALUE3'] = [random.randrange(80, 100) for n in range(430)]
df['VALUE4'] = [random.randrange(30, 50) for n in range(430)]

df['MODEL'] = [random.randrange(1, 3) for n in range(430)]

df['SOLD'] = [random.randrange(0, 2) for n in range(430)]

df['INSPECTION'] = df['DATE_TIME'].dt.day

df['MODE'] = np.select([df['INSPECTION'] == 1, df['INSPECTION'].isin([2, 3])], ['A', 'B'], 'C')

df['TIME'] = df['DATE_TIME'].dt.time
# df['TIME'] = pd.to_timedelta(df['TIME'])
df['TIME'] = df['TIME'].astype('str')


# Create DAY Night columns only-------------------------------------------------------------------------
def cycle_day_period(dataframe: pd.DataFrame, midnight='00:00:00', start_of_morning='06:00:00',
                     start_of_afternoon='13:00:00',
                     start_of_evening='18:00:00', end_of_evening='23:00:00', start_of_night='24:00:00'):
    bins = [midnight, start_of_morning, start_of_afternoon, start_of_evening, end_of_evening, start_of_night]
    labels = ['Night', 'Morning', 'Morning', 'Night', 'Night']

    return pd.cut(
        pd.to_timedelta(dataframe),
        bins=list(map(pd.Timedelta, bins)),
        labels=labels, right=False, ordered=False
    )


df['CYCLE_PART'] = cycle_day_period(df['TIME'], '00:00:00', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00')

I expect to find T_start and T_end as shown in the picture (for a single day of 24-hour measurements). Please refer to the drawing, since my wording of the problem might be confusing:

[Image: the asker's drawing of T_start and T_end; not reproduced here]

Solution:

What you want is unclear, but assuming you want the time window that is covered by every ID, first use groupby.agg to get the min and max TIME per ID. Then aggregate again, this time taking the max of the minima (the latest start) and the min of the maxima (the earliest end):

df.groupby('ID')['TIME'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'})
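
To make the two-step aggregation easier to follow, here is a minimal sketch on a small hand-written frame. The `toy` data and the names `per_id` and `window` are made up for illustration and are not part of the question; the times are zero-padded HH:MM:SS strings, as in the question, so string comparison orders them correctly.

```python
import pandas as pd

# Hypothetical toy data: three IDs with known measurement times (HH:MM:SS strings).
toy = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 3, 3, 3],
    'TIME': ['01:00:00', '10:00:00', '20:00:00',
             '03:00:00', '22:00:00',
             '02:00:00', '12:20:00', '18:40:00'],
})

# Step 1: earliest and latest time per ID.
per_id = toy.groupby('ID')['TIME'].agg(['min', 'max'])

# Step 2: the common window starts at the latest start (max of the minima)
# and ends at the earliest end (min of the maxima).
window = per_id.agg({'min': 'max', 'max': 'min'})
print(window)
# window['min'] is '03:00:00' (from ID 2), window['max'] is '18:40:00' (from ID 3)
```

ID 2 starts last and ID 3 stops first, so every ID has measurements on both sides of the window 03:00:00 to 18:40:00.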

If you really need to restrict the minimum to values before 12:00:00 and the maximum to values after 12:00:00:

(df.groupby('ID')['TIME']
   .agg(min=lambda x: x[x.lt('12:00:00')].min(),
        max=lambda x: x[x.gt('12:00:00')].max())
   .agg({'min': 'max', 'max': 'min'})
)

Output (the sample data is random, so your values will differ):

min    07:00:00
max    19:40:00
dtype: object
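
The noon filter can leave an ID with no value on one side (for example, an ID that was only measured in the afternoon). The sketch below shows one possible way to handle that, under two assumptions that are not in the original answer: times are converted with `pd.to_timedelta` so that missing values become `NaT`, and IDs lacking either side are dropped before the window is computed. The `toy` data and all variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; ID 3 has no measurement before noon.
toy = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 3, 3],
    'TIME': ['09:00:00', '11:40:00', '17:00:00',
             '07:20:00', '21:00:00',
             '13:00:00', '19:40:00'],
})
toy['TD'] = pd.to_timedelta(toy['TIME'])
noon = pd.Timedelta(hours=12)

# Earliest pre-noon and latest post-noon time per ID, computed on filtered rows.
start = toy.loc[toy['TD'] < noon].groupby('ID')['TD'].min()
end = toy.loc[toy['TD'] > noon].groupby('ID')['TD'].max()

# Align on ID; an ID missing from one side gets NaT in that column.
per_id = pd.concat({'start': start, 'end': end}, axis=1)

# Assumption: only IDs with both a pre-noon and a post-noon value take part.
complete = per_id.dropna()
t_start = complete['start'].max()   # latest of the earliest pre-noon times
t_end = complete['end'].min()       # earliest of the latest post-noon times
print(t_start, t_end)
```

Whether such IDs should be dropped or should instead make the overlap empty depends on what the window is used for, so treat the `dropna()` step as a choice rather than a requirement.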

Intermediate:

df.groupby('ID')['TIME'].agg(['min', 'max'])
         min       max
ID                    
1   00:40:00  20:00:00
2   02:20:00  23:40:00
3   00:20:00  23:40:00
4   01:20:00  23:20:00
5   00:00:00  22:40:00
6   02:00:00  21:40:00
7   00:20:00  23:20:00
8   00:40:00  19:40:00  # min of maxima: 19:40:00
9   00:40:00  22:40:00
10  00:20:00  23:20:00
11  00:00:00  22:00:00
12  02:20:00  23:40:00
13  01:00:00  22:40:00
14  00:00:00  23:00:00
15  00:00:00  23:00:00
16  01:00:00  23:40:00
17  00:00:00  22:40:00
18  00:00:00  22:00:00
19  07:00:00  23:00:00  # max of minima: 07:00:00
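
The question mentions a same-day measurement, while the table above pools TIME across all dates. If a separate window per calendar day is wanted, one possible sketch is to group by day and ID first and then reduce per day. This is an assumption about the intent, not something the answer states; the `toy` frame and the `DAY`, `T_start`, and `T_end` column names are illustrative.

```python
import pandas as pd

# Hypothetical toy data: two IDs measured on two days.
toy = pd.DataFrame({
    'DATE_TIME': pd.to_datetime([
        '2022-11-01 01:00', '2022-11-01 20:00',   # ID 1, day 1
        '2022-11-01 03:00', '2022-11-01 22:00',   # ID 2, day 1
        '2022-11-02 05:00', '2022-11-02 18:00',   # ID 1, day 2
        '2022-11-02 02:00', '2022-11-02 23:00',   # ID 2, day 2
    ]),
    'ID': [1, 1, 2, 2, 1, 1, 2, 2],
})

# Calendar day of each measurement (timestamp truncated to midnight).
toy['DAY'] = toy['DATE_TIME'].dt.normalize()

# Step 1: first and last measurement per (day, ID).
per_day_id = toy.groupby(['DAY', 'ID'])['DATE_TIME'].agg(['min', 'max'])

# Step 2: per day, latest start and earliest end across IDs.
windows = per_day_id.groupby(level='DAY').agg(
    T_start=('min', 'max'),
    T_end=('max', 'min'),
)
print(windows)
```

Here the window is 03:00 to 20:00 on 2022-11-01 and 05:00 to 18:00 on 2022-11-02.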