
Change certain categorical variables to a unified entry

Let’s say I have a DataFrame with a column called animals. Its entries look as follows:

'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'E', 'F', 'G', 'H', 'I'.

I want to change the entries ‘E’, ‘F’, ‘G’, ‘H’ and ‘I’ to another unified entry called ‘D’. What is the best way to transform all these categorical entries into one category?


Solution:

Create a list of the entries you want to replace. Then use isin to build a boolean mask of the rows whose value is in that list, and use loc to assign ‘D’ to those rows:

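li = ['E', 'F', 'G', 'H', 'I']  # entries to merge into 'D'
# isin marks the matching rows; loc writes 'D' into the animals column of those rows
df.loc[df['animals'].isin(li), 'animals'] = 'D'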
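Below is a self-contained version of the loc approach. It rebuilds the question's entries as a small DataFrame so the snippet runs on its own. The sample DataFrame is only for illustration; the column name animals and the list li come from the question and answer.

```python
import pandas as pd

# Sample data reproducing the entries listed in the question
df = pd.DataFrame({"animals": ["A", "A", "A", "B", "B", "B", "C", "C",
                               "E", "F", "G", "H", "I"]})

# Entries that should all become the single category 'D'
li = ["E", "F", "G", "H", "I"]

# isin builds a boolean mask; loc assigns 'D' only where the mask is True
df.loc[df["animals"].isin(li), "animals"] = "D"

print(df["animals"].tolist())
# → ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D']
```

Rows whose value is not in li are left untouched, so A, B and C keep their original values.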

An alternative to loc is NumPy’s where:

import numpy as np
df['animals'] = np.where(df['animals'].isin(li), 'D', df['animals'])

This reads: for every row in the animals column, check whether the value is in the list li. If it is, return ‘D’; otherwise, keep the original value.
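Both snippets above assume animals holds plain strings. The column might instead use pandas’ category dtype, which the question does not say; this is an assumption here. In that case, pandas only accepts assignments of values that are already among the column’s categories. The following sketch of the loc approach therefore adds ‘D’ as a category first and then drops the categories that no longer occur. The sample data is illustrative only.

```python
import pandas as pd

# Illustrative data: the question's entries stored with the "category" dtype
df = pd.DataFrame({"animals": pd.Categorical(list("AAABBBCCEFGHI"))})
li = ["E", "F", "G", "H", "I"]

# 'D' is not one of the existing categories, so register it before assigning
df["animals"] = df["animals"].cat.add_categories("D")
df.loc[df["animals"].isin(li), "animals"] = "D"

# Optional: remove E..I, which are now empty categories
df["animals"] = df["animals"].cat.remove_unused_categories()

print(df["animals"].tolist())
print(list(df["animals"].cat.categories))
```

Without the remove_unused_categories call, E to I would stay in the column’s list of categories with zero occurrences. That can show up later, for example as empty groups when grouping on the column.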
