Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extract time from string in dataframe's rows into new columns

I have a a dataframe where each lines contains text, and the start of this text is a time that is in the format of 00:00:00-00:00:01.
I would like to extract the starting time, the ending time but also the difference between those time into 3 columns of my df.

Here is a sample of the dataframe :

text
10:30:00-12:30:00 some random text(some special caracters [-‘(@ ) some numbers 456231 386
15:35:10-15:36:12 some other text some numbers 9875321651132

Here is the expected result

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

text start time end time delta
10:30:00-12:30:00 some random text(some special caracters [-‘(@ ) some numbers 456231 386 10:30:00 12:30:00 02:00:00
15:35:10-15:36:12 some other text some numbers 9875321651132 15:35:10 15:36:12 00:01:02

I think using str.extract like this would do the trick but can’t get the matching as I would like, where the pattern match a 00:00:00 format

df['start time'] = df['text'].str.extract(r'( ^Pattern)')  # ^ for the begining of the string where the start time is.
df['end time'] = df['text'].str.extract(r'( -Patern)')     
#the - because the end time is between a "-" so the pattern for the end time would have to start with "-" to get it right 

And for the delta, I was basically thinking about converting the start and end time strings to datetime and doing :

df['delta'] = df['end time']- df['start time']

>Solution :

For start_time and end_time, you can do:

df[['start_time','end_time']] = df['text'].str.extract('(\d{2}:\d{2}:\d{2})-(\d{2}:\d{2}:\d{2})')

Then for delta, you can convert the strings into Timedelta, get the difference, and convert to string:

df['delta'] = (pd.to_timedelta(df['end_time'])
    .sub(pd.to_timedelta(df['start_time']))
)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading