How can "unique" show duplicate values in a dataframe?

Background: I am very confused by my dataframe (df). When I run some simple analyses, it produces seemingly random rows for one specific value in my column named ‘ID’ (specifically, when ID == 42). As a result, I have started to do some troubleshooting.

When I try to list all the rows where ID = 42, I do:

data = df.loc[df['ID'] == 42]

And the rows look correct in this new variable called ‘data’. However, when I scroll manually through the original dataframe df (e.g., in the Variable Explorer on Spyder), I can see there are way more rows for ID=42 that are not being printed to ‘data’.

Then, to double check why the ‘ID’ values are showing this weird behavior, I did

print(df['ID'].unique())

And, weirdly, I get this:

[ 20. 31. 42. 42. 84. 142. 198. 248. 280. 288. 352. 378. 459. 498.]
Note that 42 is repeated!

My question is, how can there be two 42s when I use the .unique() function? I thought it was supposed to output all the unique values? If I could understand this better, I could start to understand the rest of the problems that ensue…

Am I missing something about how ‘unique’ works?

P.S. My files are big so I haven’t included them, but if I need to provide more (numerical) context please let me know.

Thanks!

Solution:

Moving my comment to an answer, as it solved the problem:

print(df['ID'].astype(int).unique())
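
The thread does not spell out why this helps. A likely explanation, given the trailing dots in the printed output, is that the ‘ID’ column is stored as floats and some values differ from 42 by a tiny amount that NumPy's default print precision hides. The sketch below reproduces that situation with made-up data (the values and the variable names `unique_ids`, `exact`, `close` and `ids_int` are assumptions for illustration, not taken from the original dataframe) and shows a tolerant comparison plus a rounding step before the integer conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second "42" differs only at the tenth decimal place.
df = pd.DataFrame({"ID": [20.0, 42.0, 42.0000000001, 84.0]})

unique_ids = df["ID"].unique()
print(unique_ids)           # the array display is limited to 8 decimal digits
print(unique_ids.tolist())  # plain Python floats show the full value

# An exact comparison only matches the value that is exactly 42.0 ...
exact = df.loc[df["ID"] == 42]

# ... while a tolerant comparison also catches the near-42 value.
close = df.loc[np.isclose(df["ID"], 42)]

# astype(int) truncates toward zero, so rounding first avoids turning
# a value such as 41.9999999999 into 41.
ids_int = df["ID"].round().astype(int)
print(ids_int.unique())
```

If the real data behaves like this sketch, it would also explain why df.loc[df['ID'] == 42] returned fewer rows than were visible when scrolling through the dataframe: the exact comparison skips the rows whose ID is only approximately 42.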