Sort dataframe by dates

Advertisements

I am importing an excel file in which a column is a date in the format dd/mm/yyyy.

When I import it from the excel file, I think it is understood as a string. I need to sort the whole DataFrame by date, so I perform this code:
import pandas as pd

import pandas as pd

dictionary = {
      "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
      "CONCEPT" : ['Supermarket','Restaurant', 'Gas', 'Suscription'],
      "EUR" : [-150,-50,-45,-95]
          }
df = pd.DataFrame(dictionary)

df['DATE'] =  pd.to_datetime(df['DATE']).dt.strftime('%d/%m/%Y')

df = df.sort_values(by=['DATE'],axis=0, ascending=True)

If you perform this example, you will see it works perfectly fine, as the first-row date, 12/02/2023, is sorted in the last position. However, when I am using my real excel file, this date is interpreted as the 2nd of December 2023. Moreover, it sorts the date column as strings as not as dates; therefore, 31/01/2023 goes after 28/02/2023.

How could I solve this problem?

>Solution :

You need to use the dayfirst argument to to_datetime to make the conversion work as expected. Using your sample data:

pd.to_datetime(df['DATE'])
# 0   2023-12-02
# 1   2023-02-01
# 2   2023-02-01
# 3   2023-10-02
pd.to_datetime(df['DATE'], dayfirst=True)
# 0   2023-02-12
# 1   2023-01-02
# 2   2023-01-02
# 3   2023-02-10

Your DATE column is then being sorted as strings because you have converted it to a string using strftime. You should sort first and then convert to a string just for display purposes.

df['DATE'] =  pd.to_datetime(df['DATE'], dayfirst=True)
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
df['DATE'] = df['DATE'].dt.strftime('%d/%m/%Y')

Output

         DATE      CONCEPT  EUR
1  02/01/2023   Restaurant  -50
2  02/01/2023          Gas  -45
3  10/02/2023  Suscription  -95
0  12/02/2023  Supermarket -150

Leave a ReplyCancel reply