
Multiple instances of unique values in DataFrame column

I'm working with a pandas DataFrame that has a column of dependent variables called CLASS, which should contain three classes: Y, N, and P.
However, when I run:

df.CLASS.unique()

I get:

array(['N', 'N ', 'P', 'Y', 'Y '], dtype=object)
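The array above already hints at the cause: two of the labels carry a trailing space. A quick way to make hidden whitespace visible is to print each label with repr() and count rows per raw label. This is a minimal sketch on a small hypothetical sample, not the asker's actual dataset:

```python
import pandas as pd

# Hypothetical sample reproducing the problem: some labels carry a trailing space.
df = pd.DataFrame({"CLASS": ["N", "N ", "P", "Y", "Y ", "N"]})

# repr() wraps each label in quotes, so a trailing space ('N ') becomes visible.
labels = [repr(v) for v in df["CLASS"].unique()]
print(labels)

# Count how many rows fall under each raw (unstripped) label.
counts = df["CLASS"].value_counts()
print(counts)
```

If the quoted labels show extra spaces, the "extra classes" are just whitespace variants of the real ones.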

I opened the dataset in Excel and used the filter to see how many unique values were in the column; Excel says there are only 3.
I'm terribly confused here and would greatly appreciate some help. The dataset is available here if it helps.


Solution:

To pandas, "N " (an N followed by a trailing space) and "N" are two different strings, but I think Excel's filter treats them as the same value.
You need to clean the data first, for example:

df['CLASS'] = df['CLASS'].replace('N ', 'N')
df['CLASS'] = df['CLASS'].replace('Y ', 'Y')

After that, running df.CLASS.unique() again returns only three classes:

array(['N', 'P', 'Y'], dtype=object)
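Replacing each padded label one by one works, but it only covers the exact variants you list. A more general option is pandas' Series.str.strip(), which removes leading and trailing whitespace (spaces, tabs, newlines) from every value. A minimal sketch on hypothetical data, including a variant with a leading space and a tab that the two replace() calls above would miss:

```python
import pandas as pd

# Hypothetical sample: padded variants of each label, including a tab.
df = pd.DataFrame({"CLASS": ["N", "N ", "P", "Y", "Y ", " P\t"]})

# Strip leading/trailing whitespace from every label so all variants
# collapse onto the clean value.
df["CLASS"] = df["CLASS"].str.strip()

print(sorted(df["CLASS"].unique()))  # ['N', 'P', 'Y']
```

Note that the .str accessor returns NaN for non-string entries in an object column, so missing values stay missing rather than raising an error.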

Update: I ran =UNIQUE(N2:N1001) in Excel to list the unique values, and it returned 5 values. So I'm not sure why your Excel filter shows only 3.
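If you would rather keep the stray spaces out of the DataFrame entirely, you can strip them while the file is being read, using the converters argument of pandas.read_csv (pandas.read_excel accepts a converters argument too). This sketch assumes the data is a CSV; the inline CSV text and the ID column are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file (assumption: data is a CSV).
raw = io.StringIO("ID,CLASS\n1,N\n2,N \n3,P\n4,Y \n5,Y\n")

# converters applies str.strip to each CLASS cell as the file is parsed,
# so trailing spaces never reach the DataFrame.
df = pd.read_csv(raw, converters={"CLASS": str.strip})

print(df["CLASS"].value_counts().sort_index())
```

Cleaning at load time means every later step (groupby, value_counts, model training) sees only the three intended classes.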
