I have a DataFrame that looks like:
   c1   c2   c3
0  10  100  200
1  11  110  233
2  12  120  444
3  33  100  776
I need to go through each row of the DataFrame and check whether the value in c2 is unique within c2 (i.e. that value appears only once in the entire c2 column of the df). If it is not unique, add "not unique" to the end of the row; if it is unique, add "unique" to the end of the row. Expected output:
   c1   c2   c3          c4
0  10  100  200  not unique
1  11  110  233      unique
2  12  120  444      unique
3  33  100  776  not unique
I have tried a few things so far but have not been able to get the results that I want:
for x in dfs:
    if x["c2"].unique():  # I also tried x[x['c2']]
        dfs["duplicated"] = "unique"
    else:
        dfs["duplicated"] = "not_unique"
or
dfs["Duplicates"] = np.where(dfs.c2.duplicated(), "not_unique", "unique")
Solution:
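Your second attempt is close. Series.duplicated() defaults to keep='first', which leaves the first occurrence of each repeated value unmarked, so row 0 would still be labelled "unique". Passing keep=False marks every occurrence of a repeated value.
Use numpy.where with Series.duplicated(keep=False):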
In [318]: import numpy as np
In [319]: df['c4'] = np.where(df['c2'].duplicated(keep=False), 'not unique', 'unique')
In [320]: df
Out[320]:
   c1   c2   c3          c4
0  10  100  200  not unique
1  11  110  233      unique
2  12  120  444      unique
3  33  100  776  not unique
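For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the variable names first_only and all_dupes are only illustrative. It also shows how the default keep='first' differs from keep=False.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "c1": [10, 11, 12, 33],
    "c2": [100, 110, 120, 100],
    "c3": [200, 233, 444, 776],
})

# Default keep="first": only the second and later occurrences are flagged,
# so the first row holding 100 is not marked as a duplicate.
first_only = df["c2"].duplicated()

# keep=False: every row whose c2 value occurs more than once is flagged.
all_dupes = df["c2"].duplicated(keep=False)

# Map the boolean mask to the two labels.
df["c4"] = np.where(all_dupes, "not unique", "unique")
print(df)
```

With the default mask, only row 3 would be labelled "not unique"; with keep=False, rows 0 and 3 both are, which matches the expected output.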