Set the first date of the year when a value has only the year in a pandas DataFrame

May 25, 2023

I have a column called "date" in a pandas DataFrame. These are the first 10 rows:

0    22-Oct-2022
1     3-Dec-2019
2    27-Jun-2022
3           2023
4    15-Jul-2017
5           2019
6     7-Sep-2022
7           2021
8    30-Sep-2022
9    17-Aug-2021

I want to convert all of those dates to this format, for example:

0    2023-05-19 
1    2023-01-20 
2    ...

For the rows that contain only the year, I want to set the date to the first day of that year. For example, if the original df has:

5           2019
7           2021

I want:

5           2019-01-01
7           2021-01-01

In other words, for these cases I want the first date of the year, keeping the original year rather than the current year.

I tried:

df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%d-%b-%Y')

However, it generates NaT values. I hope the case is clear, and I would appreciate any ideas for fixing this problem.

thanks.

>Solution :

You can set the format to "mixed" (new in 2.0.0, see GH50972) when calling to_datetime:

format : str, default None

"mixed", to infer the format for each element individually. This is
risky, and you should probably use it along with dayfirst.

df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
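
To illustrate why the documentation suggests pairing "mixed" with dayfirst, here is a minimal sketch, assuming pandas 2.0 or later. The sample values are hypothetical and not taken from the question: an all-numeric date is ambiguous, and dayfirst decides how it is read, while a year-only value still resolves to 1 January.

```python
import pandas as pd

# Hypothetical sample: an ambiguous numeric date next to a year-only value.
s = pd.Series(["03-04-2022", "2023"])

# format="mixed" (pandas >= 2.0) infers the format of each element separately;
# dayfirst controls how ambiguous day/month orderings are resolved.
day_first = pd.to_datetime(s, format="mixed", dayfirst=True)
month_first = pd.to_datetime(s, format="mixed", dayfirst=False)

print(day_first[0].date())    # read as 3 April 2022
print(month_first[0].date())  # read as 4 March 2022
print(day_first[1].date())    # year-only value becomes 1 January 2023
```

Dates with a month name, such as the ones in the question, are not ambiguous in this way, so dayfirst mainly matters if numeric dates can also appear in the column.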

Or use a classic double date-parsing + fillna:

df["date"] = (
    pd.to_datetime(df["date"], errors="coerce", format="%Y")
        .fillna(pd.to_datetime(df["date"], errors="coerce", dayfirst=True))
)

Output :

print(df)

        date
0 2022-10-22
1 2019-12-03
2 2022-06-27
3 2023-01-01
4 2017-07-15
5 2019-01-01
6 2022-09-07
7 2021-01-01
8 2022-09-30
9 2021-08-17
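
If the column may also contain values that match neither pattern, a variant of the double-parsing approach with two explicit formats makes the leftovers easy to spot. This is only a sketch: parse_dates is a hypothetical helper name, the "n/a" value is an invented example of bad input, and it assumes English month abbreviations for %b.

```python
import pandas as pd

def parse_dates(col: pd.Series) -> pd.Series:
    """Parse 'D-Mon-YYYY' strings and bare 'YYYY' strings (hypothetical helper)."""
    text = col.astype(str).str.strip()
    # Full dates such as "22-Oct-2022"; anything else becomes NaT.
    full = pd.to_datetime(text, format="%d-%b-%Y", errors="coerce")
    # Bare years such as "2019" parse to 1 January of that year.
    year_only = pd.to_datetime(text, format="%Y", errors="coerce")
    # Keep the full date where it parsed, otherwise fall back to the year-only result.
    return full.fillna(year_only)

# A few values from the question plus one invented unparseable entry.
df = pd.DataFrame({"date": ["22-Oct-2022", "3-Dec-2019", "2023", "2019", "n/a"]})
df["parsed"] = parse_dates(df["date"])

# Rows that matched neither format stay NaT and can be inspected separately.
unparsed = df.loc[df["parsed"].isna(), "date"]
print(df)
print("unparsed:", unparsed.tolist())
```

Writing the result to a separate column keeps the original strings available, so any value that ends up as NaT can be traced back to its source text.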