Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sort dataframe by dates

I am importing an excel file in which a column is a date in the format dd/mm/yyyy.

When I import it from the excel file, I think it is understood as a string. I need to sort the whole DataFrame by date, so I perform this code:
import pandas as pd

import pandas as pd

dictionary = {
      "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
      "CONCEPT" : ['Supermarket','Restaurant', 'Gas', 'Suscription'],
      "EUR" : [-150,-50,-45,-95]
          }
df = pd.DataFrame(dictionary)

df['DATE'] =  pd.to_datetime(df['DATE']).dt.strftime('%d/%m/%Y')

df = df.sort_values(by=['DATE'],axis=0, ascending=True)

If you perform this example, you will see it works perfectly fine, as the first-row date, 12/02/2023, is sorted in the last position. However, when I am using my real excel file, this date is interpreted as the 2nd of December 2023. Moreover, it sorts the date column as strings as not as dates; therefore, 31/01/2023 goes after 28/02/2023.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How could I solve this problem?

>Solution :

You need to use the dayfirst argument to to_datetime to make the conversion work as expected. Using your sample data:

pd.to_datetime(df['DATE'])
# 0   2023-12-02
# 1   2023-02-01
# 2   2023-02-01
# 3   2023-10-02
pd.to_datetime(df['DATE'], dayfirst=True)
# 0   2023-02-12
# 1   2023-01-02
# 2   2023-01-02
# 3   2023-02-10

Your DATE column is then being sorted as strings because you have converted it to a string using strftime. You should sort first and then convert to a string just for display purposes.

df['DATE'] =  pd.to_datetime(df['DATE'], dayfirst=True)
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
df['DATE'] = df['DATE'].dt.strftime('%d/%m/%Y')

Output

         DATE      CONCEPT  EUR
1  02/01/2023   Restaurant  -50
2  02/01/2023          Gas  -45
3  10/02/2023  Suscription  -95
0  12/02/2023  Supermarket -150
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading