I have a column named "date" in a pandas DataFrame; these are the first 10 rows:
0 22-Oct-2022
1 3-Dec-2019
2 27-Jun-2022
3 2023
4 15-Jul-2017
5 2019
6 7-Sep-2022
7 2021
8 30-Sep-2022
9 17-Aug-2021
I want to convert all those dates to the YYYY-MM-DD format, for example:
0 2023-05-19
1 2023-01-20
2 ...
For the rows that only contain the YEAR, for example if the original df has:
5 2019
7 2021
I want to convert them to:
5 2019-01-01
7 2021-01-01
In other words, for these cases I want the first day of the year, keeping the original year rather than the current year.
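As a quick illustration of the requested behaviour: when a bare year is parsed with the format `"%Y"` alone, pandas fills the missing month and day with 1, so the year is kept and the date becomes January 1 of that year. A minimal sketch, assuming the year-only values are stored as strings (the row labels 5 and 7 mirror the example above):

```python
import pandas as pd

# Year-only values, keeping the row labels from the example above (assumed strings)
years = pd.Series(["2019", "2021"], index=[5, 7])

# With format="%Y" the missing month and day default to 1, i.e. January 1
parsed = pd.to_datetime(years, format="%Y")
print(parsed)
```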
I tried:
df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%d-%b-%Y')
However, it produces NaT values for the rows that only contain a year. Any idea how to fix this would be appreciated.
Thanks.
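To see where the NaT values come from, here is a minimal sketch that rebuilds the sample column (assuming the values are strings) and runs the attempted call. Every value that does not match `"%d-%b-%Y"` exactly, which here means the year-only rows, is coerced to NaT:

```python
import pandas as pd

# Sample column rebuilt from the first 10 rows shown above (assumed to be strings)
df = pd.DataFrame({"date": ["22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023",
                            "15-Jul-2017", "2019", "7-Sep-2022", "2021",
                            "30-Sep-2022", "17-Aug-2021"]})

parsed = pd.to_datetime(df["date"], errors="coerce", format="%d-%b-%Y")

# Original strings of the rows that failed to match "%d-%b-%Y" and became NaT
failed = df.loc[parsed.isna(), "date"].tolist()
print(failed)
```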
Solution:
You can set format="mixed" (new in pandas 2.0.0, see GH50972) when calling to_datetime:
format : str, default None
[...] "mixed", to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.
df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
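The documentation's advice to pair `"mixed"` with `dayfirst` matters when day and month are both numeric. The month names in the question's data are unambiguous, so the sketch below uses a hypothetical value `"03-12-2019"` to show the difference; it assumes pandas >= 2.0, where `format="mixed"` is available:

```python
import pandas as pd

# Hypothetical all-numeric date where day and month are ambiguous, plus a bare year
s = pd.Series(["03-12-2019", "2023"])

# Same mixed-format parse, once with dayfirst=True and once with the default (False)
day_first = pd.to_datetime(s, format="mixed", dayfirst=True)
month_first = pd.to_datetime(s, format="mixed")

print(day_first.tolist())
print(month_first.tolist())
```

The bare year parses to January 1 either way; only the ambiguous numeric date changes, which is why `dayfirst` should reflect how your data is actually written.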
Or use the classic approach of parsing twice and combining the results with fillna:
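df["date"] = (
    # Year-only values ("2019") parse as January 1; everything else becomes NaT
    pd.to_datetime(df["date"], errors="coerce", format="%Y")
    # Fill those NaT values by parsing the full dates, day first
    .fillna(pd.to_datetime(df["date"], errors="coerce", dayfirst=True))
)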
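If you would rather not rely on format inference at all, a different technique (not part of the answer above) is to split the rows with a regular expression and parse each group with its own explicit format. A minimal sketch, assuming the column holds strings and that every non-year value follows the day-abbreviated month-year layout of the sample:

```python
import pandas as pd

# Sample column rebuilt from the first 10 rows in the question (assumed to be strings)
df = pd.DataFrame({"date": ["22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023",
                            "15-Jul-2017", "2019", "7-Sep-2022", "2021",
                            "30-Sep-2022", "17-Aug-2021"]})

text = df["date"].astype(str)

# True for values that are exactly four digits, i.e. a bare year
is_year = text.str.fullmatch(r"\d{4}")

# Parse each group with an explicit format; a bare year becomes January 1
years = pd.to_datetime(text[is_year], format="%Y")
full_dates = pd.to_datetime(text[~is_year], format="%d-%b-%Y")

# Stitch the two parts back together in the original row order
df["date"] = pd.concat([years, full_dates]).reindex(df.index)
print(df)
```

Because no `errors="coerce"` is passed, a value that matches neither layout raises an error instead of silently becoming NaT, which makes unexpected formats easy to spot.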
Output:
print(df)
date
0 2022-10-22
1 2019-12-03
2 2022-06-27
3 2023-01-01
4 2017-07-15
5 2019-01-01
6 2022-09-07
7 2021-01-01
8 2022-09-30
9 2021-08-17
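The printed result shows only the date part because pandas omits the time when every value is at midnight; the column itself is still datetime64. If you need actual strings in the YYYY-MM-DD layout from the question (for export, say), a minimal sketch using `.dt.strftime` on a hypothetical already-parsed column:

```python
import pandas as pd

# Hypothetical already-parsed datetime64 column, e.g. the result above
dates = pd.Series(pd.to_datetime(["2022-10-22", "2019-01-01"]))

# Render each timestamp as a "YYYY-MM-DD" string (object dtype)
as_text = dates.dt.strftime("%Y-%m-%d")
print(as_text.tolist())
```

Keep the datetime64 column for date arithmetic and filtering, and convert to strings only at the point of output.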