Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter a pandas dataframe conditionally

I’m trying to filter a dataframe, and I can’t figure out how to filter to the values I actually want. I don’t really know how to explain what I’m trying to do (which makes searching a real pain), so a sample snippet is below:

race    driver     team      ...
1       Alonso     Ferrari
1       Massa      Ferrari
1       Hamilton   McLaren
...
2       Alonso     McLaren
2       Hamilton   McLaren
2       Massa      Ferrari
...

What I want is to be able to filter based on Alonso’s team for the specific race, in order to find his teammate – such that for race 1, I get Alonso and Massa, but for race 2 I get Alonso and Hamilton. So the intended result after filtering is:

race    driver     team      ...
1       Alonso     Ferrari
1       Massa      Ferrari
2       Alonso     McLaren
2       Hamilton   McLaren
...

I’ve been able to filter to just Alonso’s rows with df[df.driver == "Alonso"], or to all Ferrari and McLaren rows (there are more teams than just those two in the data) with df.team.isin(teams) where teams lists all teams listed in rows where driver matches Alonso, but I can’t seem to combine the two filters in a way that filters conditionally based on which team matches Alonso that race.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This is a personal project that I’m using to sharpen back up on Python, which I’ve mostly self-taught, so apologies if there’s an obvious solution that I’ve missed.

Thank you for any help!

>Solution :

Code

use groupby + filter

df.groupby(['race', 'team']).filter(lambda x: x['driver'].eq('Alonso').sum() > 0)

output:

race    driver      team
0   1   Alonso      Ferrari
1   1   Massa       Ferrari
3   2   Alonso      McLaren
4   2   Hamilton    McLaren

Example Code

import pandas as pd
data1 = {'race': [1, 1, 1, 2, 2, 2], 
         'driver': ['Alonso', 'Massa', 'Hamilton', 'Alonso', 'Hamilton', 'Massa'], 
         'team': ['Ferrari', 'Ferrari', 'McLaren', 'McLaren', 'McLaren', 'Ferrari']}
df = pd.DataFrame(data1)

df

race    driver      team
0   1   Alonso      Ferrari
1   1   Massa       Ferrari
2   1   Hamilton    McLaren
3   2   Alonso      McLaren
4   2   Hamilton    McLaren
5   2   Massa       Ferrari
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading