Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to compare just the date or just date time ignoring seconds in a Python Pandas dataframe column of mixed data types?

In a pandas dataframe, I have a column of mixed data types, such as text, integers and datetimes. I need to find columns where datetimes match: (1) exact values in some cases, (2) only the date (ignoring time), or (3) only the date and time, but ignoring seconds.

In the following code example with a mixed data type dataframe column, there are three dates of varying imprecision. Mapping the conditions into a separate dataframe works for a precise value.

import pandas as pd
import numpy as np
# example data frame
inp = [{'Id': 0, 'mixCol': np.nan},
       {'Id': 1, 'mixCol': "text"},
       {'Id': 2, 'mixCol': 43831},
       {'Id': 3, 'mixCol': pd.to_datetime("2020-01-01 00:00:00")}, 
       {'Id': 4, 'mixCol': pd.to_datetime("2020-01-01 01:01:00")},
       {'Id': 5, 'mixCol': pd.to_datetime("2020-01-01 01:01:01")}
       ]
df = pd.DataFrame(inp)
print(df.dtypes)

myMap = pd.DataFrame()
myMap["Exact"] = df["mixCol"] == pd.to_datetime("2020-01-01 01:01:01")

0   False
1   False
2   False
3   False
4   False
5   True
6   False

The output I need should be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Id   Exact    DateOnly    NoSeconds
0    False    False       False
1    False    False       False
2    False    False       False
3    False    True        False
0    False    True        True 
5    True     True        True 
6    False    False       False 

BUT, mapping just the date, without time, maps as if the date had a time of 00:00:00.

myMap["DateOnly"] = df["mixCol"] == pd.to_datetime("2020-01-01")

Id   Exact    DateOnly
0    False    False   
1    False    False  
2    False    False  
3    False    True   
0    False    False  
5    True     False  
6    False    False  

Trying to convert values in the mixed column throws an AttributeError: ‘Series’ object has not attribute ‘date’; and trying to use ">" and "<" to define the relevant range throws a TypeError: ‘>=’ not supported between instances of ‘str’ and ‘Timestamp’

myMap["DateOnly"] = df["mixCol"].date == pd.to_datetime("2020-01-01")
myMap["NoSeconds"] = (df["mixCol"] >= pd.to_datetime("2020-01-01 01:01:00")) & (df["mixCol"] < pd.to_datetime("2020-01-01 01:02:00"))

If I try to follow the solution for mix columns in pandas proposed here, both the np.nan and text value map true as dates.

df["IsDate"] = df.apply(pd.to_datetime, errors='coerce',axis=1).nunique(1).eq(1).map({True:True ,False:False})

I’m not sure how to proceed in this situation?

>Solution :

Use Series.dt.normalize for compare datetimes with remove times (set them to 00:00:00) or with Series.dt.floor by days or minutes for remove seconds:

#convert column to all datetimes with NaT
d = pd.to_datetime(df["mixCol"], errors='coerce')
myMap["DateOnly"] = d.dt.normalize() == pd.to_datetime("2020-01-01")
myMap["DateOnly"] = d.dt.floor('D') == pd.to_datetime("2020-01-01")

#alternative with dates
myMap["DateOnly"] = d.dt.date == pd.to_datetime("2020-01-01").date()

myMap['NoSeconds'] = d.dt.floor('Min') == pd.to_datetime("2020-01-01 01:01:00")

print (myMap)
   Exact  DateOnly  NoSeconds
0  False     False      False
1  False     False      False
2  False     False      False
3  False      True      False
4  False      True       True
5   True      True       True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading