Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Finding Min and Max Datetime of row

I have a CSV that has several columns that contain Datetime data. After removing rows that are "Invalid", I want to be able to find the Min and Max Datetime value of each row and and place this result as a new columns. However I seem to be getting a Future Warning error when I attempt to add the code for these 2 new columns.

FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.

Here is my script:

import pandas as pd
import os
import glob

# Datetime columns
dt_columns = ['DT1','DT2','DT3']

df = pd.read_csv('full_list.csv', dtype=object)

# Remove "invalid" Result rows 
remove_invalid = df['Result'].str.contains('Invalid')
df.drop(index=df[remove_invalid].index, inplace=True)

for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol]).dt.strftime('%Y-%m-%d %H:%M:%S')

# Create new columns
df['MinDatetime'] = df[dt_columns].min(axis=1)
df['MaxDatetime'] = df[dt_columns].max(axis=1)

# Save as CSV
df.to_csv('test.csv', index=False)

For context, the CSV looks something like this (usually contains 100000+ rows):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Area    Serial  Route   DT1                 DT2                 DT3                 Result
Brazil  17763   4       13/08/2021 23:46:31 16/10/2021 14:04:27 28/10/2021 08:19:59 Confirmed
China   28345   2       15/09/2021 03:09:21 24/04/2021 09:56:34 04/05/2021 22:07:13 Confirmed
Malta   13630   5       21/03/2021 11:59:27 18/09/2021 11:03:25 02/07/2021 02:32:48 Invalid
Serbia  49478   2       12/04/2021 06:38:05 19/03/2021 03:16:47 29/06/2021 06:39:30 Confirmed
France  34732   1       29/04/2021 03:03:14 24/03/2021 01:49:48 26/04/2021 06:44:21 Invalid
Mexico  21840   3       23/11/2021 12:53:33 10/01/2022 02:42:48 29/04/2021 14:22:51 Invalid
Ukraine 20468   3       04/11/2021 18:40:44 13/11/2021 03:38:39 11/03/2021 09:09:14 Invalid
China   28830   1       07/02/2021 23:50:34 03/12/2021 14:04:32 14/07/2021 22:59:10 Confirmed
India   49641   4       02/06/2021 11:17:35 09/05/2021 13:51:55 19/01/2022 06:56:07 Confirmed
Greece  43163   3       30/11/2021 09:31:29 28/01/2021 08:52:50 12/05/2021 07:49:48 Invalid

I want the minimum and maximum value of each row as a new column but at the moment the code produces the 2 columns with no data in.
What am I doing wrong?

>Solution :

Better filtering without drop:

# Remove "invalid" Result rows 
remove_invalid = df['Result'].str.contains('Invalid')
df = df[~remove_invalid]

You need working with datetimes, so cannot after converting use Series.dt.strftime for strings repr of datetimes:

for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol], dayfirst=True)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading