I’m trying to filter a dataframe, and I can’t figure out how to filter to the values I actually want. I don’t really know how to explain what I’m trying to do (which makes searching a real pain), so a sample snippet is below:
race driver team ...
1 Alonso Ferrari
1 Massa Ferrari
1 Hamilton McLaren
...
2 Alonso McLaren
2 Hamilton McLaren
2 Massa Ferrari
...
What I want is to be able to filter based on Alonso’s team for the specific race, in order to find his teammate – such that for race 1, I get Alonso and Massa, but for race 2 I get Alonso and Hamilton. So the intended result after filtering is:
race driver team ...
1 Alonso Ferrari
1 Massa Ferrari
2 Alonso McLaren
2 Hamilton McLaren
...
I’ve been able to filter to just Alonso’s rows with df[df.driver == "Alonso"], or to all Ferrari and McLaren rows (there are more teams than just those two in the data) with df.team.isin(teams) where teams lists all teams listed in rows where driver matches Alonso, but I can’t seem to combine the two filters in a way that filters conditionally based on which team matches Alonso that race.
This is a personal project that I’m using to sharpen back up on Python, which I’ve mostly self-taught, so apologies if there’s an obvious solution that I’ve missed.
Thank you for any help!
>Solution :
Code
use groupby + filter
df.groupby(['race', 'team']).filter(lambda x: x['driver'].eq('Alonso').sum() > 0)
output:
race driver team
0 1 Alonso Ferrari
1 1 Massa Ferrari
3 2 Alonso McLaren
4 2 Hamilton McLaren
Example Code
import pandas as pd
data1 = {'race': [1, 1, 1, 2, 2, 2],
'driver': ['Alonso', 'Massa', 'Hamilton', 'Alonso', 'Hamilton', 'Massa'],
'team': ['Ferrari', 'Ferrari', 'McLaren', 'McLaren', 'McLaren', 'Ferrari']}
df = pd.DataFrame(data1)
df
race driver team
0 1 Alonso Ferrari
1 1 Massa Ferrari
2 1 Hamilton McLaren
3 2 Alonso McLaren
4 2 Hamilton McLaren
5 2 Massa Ferrari