Selecting rows with logic involving multiple variables across rows

I have a data frame like this:

import pandas as pd
df = pd.DataFrame({"product": [1, 2, 3, 4, 5], "company": ["A", "B", "B", "A", "B"], "state": ["CA", "NY", "CA", "CA", "NY"]})

company state   product
0   A   CA      1
1   B   NY      2
2   B   CA      3
3   A   CA      4
4   B   NY      5

I would like a boolean mask that picks out just the rows whose state contains only one company. In this case that is only NY, which has only company B, so the desired mask is [False, True, False, False, True].

Alternatively, I would like to know the set of states that have only one company in them. I guess I could get that from the boolean mask, e.g. using value_counts.

How do I do this?

Thanks!

>Solution :

You can use groupby with transform('nunique') to broadcast the number of distinct companies in each state back onto every row, then check whether that count equals 1:

df['flag'] = df.groupby('state')['company'].transform('nunique').eq(1)
print(df)

# Output
   product company state   flag
0        1       A    CA  False
1        2       B    NY   True
2        3       B    CA  False
3        4       A    CA  False
4        5       B    NY   True
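
For the alternative part of the question (the set of states with only one company), a minimal sketch is to skip transform and aggregate once per state. The names companies_per_state, single_company_states and mask are illustrative, not from the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "product": [1, 2, 3, 4, 5],
    "company": ["A", "B", "B", "A", "B"],
    "state": ["CA", "NY", "CA", "CA", "NY"],
})

# Number of distinct companies per state (one entry per state, not per row).
companies_per_state = df.groupby("state")["company"].nunique()

# Keep only the states whose distinct-company count is exactly 1.
single_company_states = set(companies_per_state[companies_per_state.eq(1)].index)
print(single_company_states)

# The row-level boolean mask can also be rebuilt from that set.
mask = df["state"].isin(single_company_states)
print(mask.tolist())
```

Without transform, nunique returns one value per state, which is the natural shape for listing states; transform is the better fit when the result has to line up with the original rows.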