Goal: remove the biggest value (slowest time) from each subset (column Model) from Dataframe.
DataFrame:
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 11.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
6 albert-large-v1 9.727373
Desired Dataframe:
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 6.788779
3 albert-large-v1 5.785576
4 albert-large-v1 5.603203
Code:
subsets = df['Model']
for s in subsets:
df[s]
Please let me know if there’s anything else I can add to post.
>Solution :
Solution for remove all maximals values per groups by GroupBy.transform with max and compare for not equal by Series.ne, filter in boolean indexing:
print (df)
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 11.570979
3 bert-base-uncased 6.788779
2 bert-base-uncased 11.570979 <- added another maximum per bert-base-uncased
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
6 albert-large-v1 9.727373
df1 = df[df['Time'].ne(df.groupby('Model')['Time'].transform('max'))]
print (df1)
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
If need remove only first maximum add condition:
df2 = (df[df['Time'].ne(df.groupby('Model')['Time'].transform('max')) |
df['Time'].duplicated()])
print (df2)
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
2 bert-base-uncased 11.570979
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203