Check whether the values in a column are unique and add "unique" or "not unique" to the end of each row

I have a DataFrame that looks like:

   c1   c2   c3
0  10  100  200
1  11  110  233
2  12  120  444
3  33  100  776

I need to go through each row of the DataFrame and check whether the value in c2 is unique within c2 (i.e. that value appears only once in the entire c2 column of that df). If it is not unique, add "not unique" to the end of the row; if it is unique, add "unique" to the end of the row. Expected output:

   c1   c2   c3     c4
0  10  100  200  not unique
1  11  110  233  unique
2  12  120  444  unique
3  33  100  776  not unique

I have tried a few things so far but have not been able to get the results that I want:

for x in dfs:
    if x["c2"].unique(): #i also tried x[x['c2']]
        dfs["duplicated"] = "unique"
    else:
        dfs["duplicated"] = "not_unique"

or

dfs["Duplicates"] = np.where(dfs.c2.duplicated(), "not_unique", "unique")

Solution:

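Use numpy.where with Series.duplicated, passing keep=False so that every occurrence of a repeated value is flagged (the default, keep='first', leaves the first occurrence unflagged):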

In [318]: import numpy as np

In [319]: df['c4'] = np.where(df['c2'].duplicated(keep=False), 'not unique', 'unique')

In [320]: df
Out[320]: 
   c1   c2   c3          c4
0  10  100  200  not unique
1  11  110  233      unique
2  12  120  444      unique
3  33  100  776  not unique
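
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch. It rebuilds the example DataFrame from the question and shows why the second attempt in the question only labelled one of the two rows with 100. The variable names first_only, all_dupes and counts, and the extra c4_alt column, are illustrative assumptions and not part of the original answer. The value_counts variant is just one possible alternative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "c1": [10, 11, 12, 33],
    "c2": [100, 110, 120, 100],
    "c3": [200, 233, 444, 776],
})

# Default (keep="first"): only the second and later occurrences are flagged,
# so the first 100 would wrongly be labelled "unique"
first_only = df["c2"].duplicated()

# keep=False: every occurrence of a repeated value is flagged
all_dupes = df["c2"].duplicated(keep=False)

df["c4"] = np.where(all_dupes, "not unique", "unique")

# Alternative sketch: count how often each c2 value occurs and
# label rows whose value occurs more than once
counts = df["c2"].map(df["c2"].value_counts())
df["c4_alt"] = np.where(counts > 1, "not unique", "unique")

print(df)
```

Both columns should agree on this data. The duplicated(keep=False) form is usually the shorter of the two, while the value_counts form is handy if the actual counts are also needed.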