Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to filter rows based on the value in two columns and a different value in a third

The following code generates a dataframe to illustrate my question

import pandas as pd
data = [[1152, '1', '10'], [1154, '1', '4'],
       [1152, '1', '10'],  [1155, '2', '10'], 
       [1152, '1', '4'],  [1155, '2', '10']]
    
df = pd.DataFrame(data, columns =['Cow', 'Lact', 'Procedure'])

This generates the following

    Cow   Lact  Procedure
0   1152    1   10
1   1154    1   4
2   1152    1   10
3   1155    2   10
4   1152    1   4
5   1155    2   10

I want to identify the rows where Cow and Lact are the same but procedure is different. The output I am looking for is

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    Cow   Lact  Procedure
0   1152    1   10
1   1152    1   10
2   1152    1   4

I figure it will require a groupby and filter function but not sure how to put it together.
Thanks

>Solution :

Use groupby.transform('nunique') and boolean indexing:

df[df.groupby(['Cow', 'Lact'])['Procedure'].transform('nunique').gt(1)]

Output:

    Cow Lact Procedure
0  1152    1        10
2  1152    1        10
4  1152    1         4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading