Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Checking if a column has the same values if the other column has the same value

I have two string columns in a Pandas dataframe.
Column1 has thousands of different values, but column2 has one of five strings, say A, B, C, D, and E.

What I would like to check is if column2 has the same value if the values are same in column1 and identify the index of the row if they are diffent.

idx  col1  col2
1    X     A
2    Y     B
3    Y     B
4    X     A
5    Z     C
6    X     B

In the above, rows 2 and 3 has the same values in column1 and the values in column are the same. So it is okay.
For rows 1, 4, and 6, it is commonly X in column A but the values in column2 are different (A, A, and B).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I need to check the Pandas dataframe meets this requirement, and identify the rows 1, 4, and 6 or the value X if there is one.

>Solution :

Use GroupBy.transform with DataFrameGroupBy.nunique for get rows not number of unique values per groups equal 1:

df1 = df[df.groupby('col1')['col2'].transform('nunique').ne(1)]
print (df1)
   idx col1 col2
0    1    X    A
3    4    X    A
5    6    X    B

Or for get values of column col1 use DataFrameGroupBy.nunique with filter indices of Series:

s = df.groupby('col1')['col2'].nunique()
vals = s.index[s.ne(1)].tolist()
print (vals)
['X']
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading