I have a string column (object type):
Date
2020-06-15
2019-07-23
Data non available
How can I convert the string Data non available and/or any missing values to the format 0000-00-00, before extracting the month and the year?
I can convert the Date column to datetime, and extract information on year and month from the other rows, but then I get this error:
ParserError: Unknown string format: Data not available
My code:
df['Date'] = pd.to_datetime(df['Date'])
df['Date_Year'], df['Date_month'] = df['Date'].dt.year, df['Date'].dt.month
Expected output:
Date Date_Year Date_month
2020-06-15 2020 06
2019-07-23 2019 07
Data non available 0000 00
>Solution :
I’d replace Data non available with NaT (Not a Time, the NaN equivalent for dates/times) before you call to_datetime:
df['Date'] = df['Date'].replace({'Data non available': pd.NaT})
Then convert the column and extract the year and month as before:
df['Date'] = pd.to_datetime(df['Date'])
df['Date_Year'], df['Date_month'] = df['Date'].dt.year, df['Date'].dt.month
Output:
>>> df
Date Date_Year Date_month
0 2020-06-15 2020.0 6.0
1 2019-07-23 2019.0 7.0
2 NaT NaN NaN
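Note that the year and month come out as floats here because the missing row forces NaN, and that 0000-00-00 is not a valid datetime, so the zeros in the expected output can only be stored as strings (or as a fill value on an integer column). A minimal sketch of one way to get closer to the expected output, assuming the three sample rows from the question; the names parsed and year_int are only illustrative, and errors="coerce" is used instead of replace so that any unparseable string becomes NaT:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Date": ["2020-06-15", "2019-07-23", "Data non available"]})

# errors="coerce" turns every unparseable value into NaT instead of raising
parsed = pd.to_datetime(df["Date"], errors="coerce")

# Zero-padded string columns; NaT rows have no formatted value,
# so fill them with the placeholder zeros
df["Date_Year"] = parsed.dt.strftime("%Y").fillna("0000")
df["Date_month"] = parsed.dt.strftime("%m").fillna("00")

# Alternative: keep real integers and a proper missing marker (<NA>)
# by casting to the nullable Int64 dtype
year_int = parsed.dt.year.astype("Int64")

print(df)
```

The original Date column is left untouched, so the placeholder text stays visible next to the zero-filled year and month. If you need to do arithmetic on the year later, the nullable Int64 version is probably the safer choice, since "0000" is just a label.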