Suppose I have a df like so:
import pandas as pd

foo = pd.DataFrame(
    {
        'a': [1, 2, 3],
        'b': ['2021-01-05 05:15', '2021-01-06 11:10', '2021-03-01 09:00']
    }
)
And I want to convert column b to datetime and extract only the date part. I can do something like:
foo['date'] = pd.to_datetime(foo.b).dt.date
But .dt.date returns plain Python datetime.date objects, so pandas doesn’t treat the result as a datetime column and assigns it an object dtype:
foo.dtypes
Out:
a        int64
b       object
date    object
dtype: object
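A quick check of what is actually stored in the column can make the object dtype less surprising. This is a minimal sketch that rebuilds the same foo as above; the names dates, first and is_plain_date are only illustrative:

```python
import datetime

import pandas as pd

foo = pd.DataFrame(
    {
        'a': [1, 2, 3],
        'b': ['2021-01-05 05:15', '2021-01-06 11:10', '2021-03-01 09:00'],
    }
)

# .dt.date yields Python datetime.date objects, one per row
dates = pd.to_datetime(foo['b']).dt.date
first = dates.iloc[0]

# True when the element is a plain date, not a datetime/Timestamp
is_plain_date = isinstance(first, datetime.date) and not isinstance(first, datetime.datetime)

print(dates.dtype)
print(type(first), is_plain_date)
```

Since each element is a generic Python object rather than a datetime64 value, object is the only dtype pandas can use here.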
I can of course get it to be a datetime by casting it to datetime again:
foo['date'] = pd.to_datetime(pd.to_datetime(foo.b).dt.date)
I can also get it with string slicing:
foo['date2'] = pd.to_datetime(foo.b.str[:11])
But I feel like there must be a cleaner way of getting a date out of a datetime column.
>Solution :
You can use dt.normalize, which sets the time part to midnight while keeping the datetime dtype:
foo['date'] = pd.to_datetime(foo['b']).dt.normalize()
Output:
>>> foo
   a                 b       date
0  1  2021-01-05 05:15 2021-01-05
1  2  2021-01-06 11:10 2021-01-06
2  3  2021-03-01 09:00 2021-03-01
>>> foo.dtypes
a                int64
b               object
date    datetime64[ns]
dtype: object
That said, your last approach, pd.to_datetime(foo.b.str[:11]), is also a good solution.
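For comparison, here is a small sketch that puts dt.normalize next to dt.floor('D'), which is another way to drop the time part that is not mentioned above. It assumes the same foo as in the question; ts, normalized and floored are just illustrative names:

```python
import pandas as pd

foo = pd.DataFrame(
    {
        'a': [1, 2, 3],
        'b': ['2021-01-05 05:15', '2021-01-06 11:10', '2021-03-01 09:00'],
    }
)

# Parse the strings once, then strip the time part in two equivalent ways
ts = pd.to_datetime(foo['b'])
normalized = ts.dt.normalize()   # time set to 00:00:00
floored = ts.dt.floor('D')       # rounded down to the start of the day

# Both results keep a datetime64 dtype instead of falling back to object
print(normalized.dtype, floored.dtype)
print(normalized.equals(floored))
```

Either result can be assigned straight to foo['date'], and the column stays usable with the .dt accessor and date arithmetic.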