How to filter out multiple rows in a pandas.DataFrame based on multiple conditions for the same column

I have an example pd.DataFrame containing codenames of software projects developed in different development studios:

import pandas as pd

df = pd.DataFrame({'project_id': [36423, 28564, 96648, 96648, 10042, 68277, 68277, 68277], 'codename': ['banana', 'apple', 'peach', 'peach', 'melon', 'pear', 'pear', 'pear'], 'studio': ['paris', 'amsterdam', 'frankfurt', 'paris', 'london', 'brussel', 'amsterdam', 'sofia']})
   project_id codename     studio
0       36423   banana      paris
1       28564    apple  amsterdam
2       96648    peach  frankfurt
3       96648    peach      paris
4       10042    melon     london
5       68277     pear    brussel
6       68277     pear  amsterdam
7       68277     pear      sofia

What would be the best way to select the rows that hold projects developed

  1. in at least two different studios?
  2. in two specific studios?

The results I am trying to achieve look as follows:

Which projects are being developed in at least two different studios?

   project_id codename     studio
0       96648    peach  frankfurt
1       96648    peach      paris
2       68277     pear    brussel
3       68277     pear  amsterdam
4       68277     pear      sofia

Which projects are being developed in frankfurt AND paris?

   project_id codename     studio
0       96648    peach  frankfurt
1       96648    peach      paris

Using df.loc[df['studio'].isin(['frankfurt', 'paris'])], for instance, does not work, as it keeps every row that contains either frankfurt or paris in the column studio, regardless of whether the project is developed in both. Is there a more elegant way than filtering the dataframe for frankfurt and for paris separately and intersecting the resulting project ids (e.g. with Index.intersection())? I am running out of ideas here.
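
To make the problem concrete, here is a minimal sketch of what the isin() attempt returns on the example data (the variable name `either` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'project_id': [36423, 28564, 96648, 96648, 10042, 68277, 68277, 68277],
    'codename': ['banana', 'apple', 'peach', 'peach', 'melon', 'pear', 'pear', 'pear'],
    'studio': ['paris', 'amsterdam', 'frankfurt', 'paris', 'london', 'brussel', 'amsterdam', 'sofia'],
})

# isin() tests each row on its own, so it behaves like a row-wise OR:
# any row whose studio is frankfurt or paris is kept.
either = df.loc[df['studio'].isin(['frankfurt', 'paris'])]
print(either)
```

The banana project (36423) shows up in the result even though it is only developed in paris, which is exactly what should not happen.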

Thanks in advance! 🙂

Solution:

For the first question, count the distinct studios per project and keep the rows where that count is at least 2:

df[df.groupby('project_id')['studio'].transform('nunique').ge(2)]

output:

   project_id codename     studio
2       96648    peach  frankfurt
3       96648    peach      paris
5       68277     pear    brussel
6       68277     pear  amsterdam
7       68277     pear      sofia
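
The one-liner above can be unpacked into its steps. The following is a sketch on the example data from the question; the names `n_studios`, `multi_studio` and `result` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'project_id': [36423, 28564, 96648, 96648, 10042, 68277, 68277, 68277],
    'codename': ['banana', 'apple', 'peach', 'peach', 'melon', 'pear', 'pear', 'pear'],
    'studio': ['paris', 'amsterdam', 'frankfurt', 'paris', 'london', 'brussel', 'amsterdam', 'sofia'],
})

# Number of distinct studios per project. transform() broadcasts the
# per-group value back to every row, so the result aligns with df.
n_studios = df.groupby('project_id')['studio'].transform('nunique')

# Boolean mask: keep rows whose project has at least two distinct studios.
multi_studio = df[n_studios.ge(2)]

# Optional: renumber the rows from 0, as in the desired output of the question.
result = multi_studio.reset_index(drop=True)
print(result)
```

Because nunique counts distinct values, a project listed twice with the same studio would still count as one studio.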

For the second, compare each project's set of studios with the set of studios you are looking for:

df[df.groupby('project_id')['studio']
     .transform(lambda x: set(x) == {'frankfurt', 'paris'})]
# to also match projects that have further studios besides frankfurt and paris,
# use set(x) >= {'frankfurt', 'paris'} in the lambda instead

output:

   project_id codename     studio
2       96648    peach  frankfurt
3       96648    peach      paris
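
If the check is needed for different studio combinations, the same idea could be wrapped in a small helper. This is only a sketch: the function `rows_for_studios` and its `exact` parameter are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'project_id': [36423, 28564, 96648, 96648, 10042, 68277, 68277, 68277],
    'codename': ['banana', 'apple', 'peach', 'peach', 'melon', 'pear', 'pear', 'pear'],
    'studio': ['paris', 'amsterdam', 'frankfurt', 'paris', 'london', 'brussel', 'amsterdam', 'sofia'],
})

def rows_for_studios(frame, studios, exact=True):
    """Hypothetical helper: rows of projects developed in the given studios.

    exact=True  -> the project's studios are exactly `studios`
    exact=False -> the project's studios include at least `studios`
    """
    wanted = set(studios)

    def matches(group):
        found = set(group)
        return found == wanted if exact else found >= wanted

    # transform() broadcasts the per-project True/False to every row;
    # astype(bool) guards against an object-dtype result.
    mask = frame.groupby('project_id')['studio'].transform(matches).astype(bool)
    return frame[mask]

print(rows_for_studios(df, ['frankfurt', 'paris']))                   # peach rows
print(rows_for_studios(df, ['amsterdam', 'brussel'], exact=False))    # pear rows
```

With exact=True the second call would return an empty frame, since pear is also developed in sofia.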