I’m working with a Pandas dataframe that has a column of dependent variables (called CLASS), which consists of three classes: Y, N, and P.
However, when I run –
df.CLASS.unique()
I get –
array(['N', 'N ', 'P', 'Y', 'Y '], dtype=object)
I opened the dataset in Excel and used the filter to see how many unique values were in the column; Excel says there are only 3.
I’m terribly confused and would greatly appreciate some help. The dataset is available here if it’s of any benefit.
>Solution :
'N ' (N followed by a space) and 'N' are different strings in Pandas, but I think Excel's filter treats them as the same value.
You have to preprocess the data to remove the trailing spaces:
df['CLASS'] = df['CLASS'].replace('N ', 'N')
df['CLASS'] = df['CLASS'].replace('Y ', 'Y')
After that, df.CLASS.unique() returns 3 classes:
array(['N', 'P', 'Y'], dtype=object)
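If you'd rather not list every padded value by hand, a more general option is to strip whitespace from the whole column with the .str.strip() accessor. This is a minimal sketch, assuming the column holds only strings; the small dataframe below is a made-up stand-in for your dataset, built from the values in your output.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, using the values from the question.
df = pd.DataFrame({"CLASS": ["N", "N ", "P", "Y", "Y "]})

# Before cleaning: the padded and unpadded labels count as separate values.
raw_classes = sorted(df["CLASS"].unique())

# Remove leading and trailing whitespace from every entry in the column.
df["CLASS"] = df["CLASS"].str.strip()

# After cleaning: only the three intended classes remain.
classes = sorted(df["CLASS"].unique())
print(classes)
```

This also catches leading spaces, tabs, or any other padded label that the two replace() calls would miss.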
Update: I ran =UNIQUE(N2:N1001) in Excel to list the unique values, and it returned 5 values, so I don't know why your Excel shows only 3.