I am importing an excel file in which a column is a date in the format dd/mm/yyyy.
When I import it from the excel file, I think it is understood as a string. I need to sort the whole DataFrame by date, so I perform this code:
import pandas as pd
import pandas as pd
dictionary = {
"DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],
"CONCEPT" : ['Supermarket','Restaurant', 'Gas', 'Suscription'],
"EUR" : [-150,-50,-45,-95]
}
df = pd.DataFrame(dictionary)
df['DATE'] = pd.to_datetime(df['DATE']).dt.strftime('%d/%m/%Y')
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
If you perform this example, you will see it works perfectly fine, as the first-row date, 12/02/2023, is sorted in the last position. However, when I am using my real excel file, this date is interpreted as the 2nd of December 2023. Moreover, it sorts the date column as strings as not as dates; therefore, 31/01/2023 goes after 28/02/2023.
How could I solve this problem?
>Solution :
You need to use the dayfirst
argument to to_datetime
to make the conversion work as expected. Using your sample data:
pd.to_datetime(df['DATE'])
# 0 2023-12-02
# 1 2023-02-01
# 2 2023-02-01
# 3 2023-10-02
pd.to_datetime(df['DATE'], dayfirst=True)
# 0 2023-02-12
# 1 2023-01-02
# 2 2023-01-02
# 3 2023-02-10
Your DATE
column is then being sorted as strings because you have converted it to a string using strftime
. You should sort first and then convert to a string just for display purposes.
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
df['DATE'] = df['DATE'].dt.strftime('%d/%m/%Y')
Output
DATE CONCEPT EUR
1 02/01/2023 Restaurant -50
2 02/01/2023 Gas -45
3 10/02/2023 Suscription -95
0 12/02/2023 Supermarket -150