Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sort values of a pandas Dataframe based on date column but also take into consideration duplicated values of 3 other columns

I want to sort values of a pandas dataframe based on date column in descending order but also take into consideration duplicated values of name, product and release version columns such that the sorting contains the rows based on duplicated name, product and release version columns in consecutive rows. But I’m not able to get the desired result.

This is the sample code that I wrote:-

data = {
    'Name': ['John', 'Alice', 'John', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
    'Product': ['A', 'B', 'A', 'C', 'B', 'C', 'B', 'C'],
    'Release Version': ['1.6', '2.0', '1.5', '3.0', '2.5', '3.2', '2.6', '2.8'],
    'Date': ['2022-05-15', '2022-04-20', '2022-05-10', '2022-05-01', '2022-04-25', '2022-05-05', '2022-04-29', '2022-04-27']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Sort values based on 'Date' in descending order and 'Name', 'Product', 'Release Version' columns
df = df.sort_values(['Date', 'Name', 'Product', 'Release Version'], ascending=[False, True, True, False])

This gives me the below result:-

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    Name Product     Release Version    Date
    John    A            1.6     2022-05-15
    John    A            1.5     2022-05-10
    Bob     C            3.2     2022-05-05
    Bob     C            3.0     2022-05-01
    Alice   B            2.6     2022-04-29
    Bob     C            2.8     2022-04-27
    Alice   B            2.5     2022-04-25
    Alice   B            2.0     2022-04-20

The desired result is something like this:-

    Name Product     Release Version    Date
    John    A            1.6         2022-05-15
    John    A            1.5         2022-05-10
    Bob     C            3.2         2022-05-05
    Bob     C            3.0         2022-05-01
    Bob     C            2.8         2022-04-27
    Alice   B            2.6         2022-04-29
    Alice   B            2.5         2022-04-25
    Alice   B            2.0         2022-04-20

Would be great if someone can help me with this.

>Solution :

If I understand correctly, you first need to sort by the max date per group (obtained with groupby.transform('max')), and last by the date:

out = (df
   .assign(max_date=df.groupby(['Name', 'Product'])['Date'].transform('max'))
   .sort_values(['max_date', 'Name', 'Product', 'Release Version', 'Date'],
                ascending=[False, True, True, False, False])
   .drop(columns='max_date')
)

Output:

    Name Product Release Version       Date
0   John       A             1.6 2022-05-15
2   John       A             1.5 2022-05-10
5    Bob       C             3.2 2022-05-05
3    Bob       C             3.0 2022-05-01
7    Bob       C             2.8 2022-04-27
6  Alice       B             2.6 2022-04-29
4  Alice       B             2.5 2022-04-25
1  Alice       B             2.0 2022-04-20

Intermediate:

    Name Product Release Version       Date   max_date
0   John       A             1.6 2022-05-15 2022-05-15
2   John       A             1.5 2022-05-10 2022-05-15
5    Bob       C             3.2 2022-05-05 2022-05-05
3    Bob       C             3.0 2022-05-01 2022-05-05
7    Bob       C             2.8 2022-04-27 2022-05-05
6  Alice       B             2.6 2022-04-29 2022-04-29
4  Alice       B             2.5 2022-04-25 2022-04-29
1  Alice       B             2.0 2022-04-20 2022-04-29
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading