I want to sort values of a pandas dataframe based on date column in descending order but also take into consideration duplicated values of name, product and release version columns such that the sorting contains the rows based on duplicated name, product and release version columns in consecutive rows. But I’m not able to get the desired result.
This is the sample code that I wrote:-
data = {
'Name': ['John', 'Alice', 'John', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
'Product': ['A', 'B', 'A', 'C', 'B', 'C', 'B', 'C'],
'Release Version': ['1.6', '2.0', '1.5', '3.0', '2.5', '3.2', '2.6', '2.8'],
'Date': ['2022-05-15', '2022-04-20', '2022-05-10', '2022-05-01', '2022-04-25', '2022-05-05', '2022-04-29', '2022-04-27']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Sort values based on 'Date' in descending order and 'Name', 'Product', 'Release Version' columns
df = df.sort_values(['Date', 'Name', 'Product', 'Release Version'], ascending=[False, True, True, False])
This gives me the below result:-
Name Product Release Version Date
John A 1.6 2022-05-15
John A 1.5 2022-05-10
Bob C 3.2 2022-05-05
Bob C 3.0 2022-05-01
Alice B 2.6 2022-04-29
Bob C 2.8 2022-04-27
Alice B 2.5 2022-04-25
Alice B 2.0 2022-04-20
The desired result is something like this:-
Name Product Release Version Date
John A 1.6 2022-05-15
John A 1.5 2022-05-10
Bob C 3.2 2022-05-05
Bob C 3.0 2022-05-01
Bob C 2.8 2022-04-27
Alice B 2.6 2022-04-29
Alice B 2.5 2022-04-25
Alice B 2.0 2022-04-20
Would be great if someone can help me with this.
>Solution :
If I understand correctly, you first need to sort by the max date per group (obtained with groupby.transform('max')), and last by the date:
out = (df
.assign(max_date=df.groupby(['Name', 'Product'])['Date'].transform('max'))
.sort_values(['max_date', 'Name', 'Product', 'Release Version', 'Date'],
ascending=[False, True, True, False, False])
.drop(columns='max_date')
)
Output:
Name Product Release Version Date
0 John A 1.6 2022-05-15
2 John A 1.5 2022-05-10
5 Bob C 3.2 2022-05-05
3 Bob C 3.0 2022-05-01
7 Bob C 2.8 2022-04-27
6 Alice B 2.6 2022-04-29
4 Alice B 2.5 2022-04-25
1 Alice B 2.0 2022-04-20
Intermediate:
Name Product Release Version Date max_date
0 John A 1.6 2022-05-15 2022-05-15
2 John A 1.5 2022-05-10 2022-05-15
5 Bob C 3.2 2022-05-05 2022-05-05
3 Bob C 3.0 2022-05-01 2022-05-05
7 Bob C 2.8 2022-04-27 2022-05-05
6 Alice B 2.6 2022-04-29 2022-04-29
4 Alice B 2.5 2022-04-25 2022-04-29
1 Alice B 2.0 2022-04-20 2022-04-29