How to relevel a factor variable with over 500 levels efficiently in R

I haven’t been able to find any answers to this specific problem:

I have a factor variable with over 500 levels that I need to relevel to just 2 levels (1/0).

Many of the levels start with the same character string, e.g. "Woman's mother or sister:"

Is there a way to use starts_with to relevel all of these levels at the same time, instead of doing them one by one as I have been with this code:

   levels(DF1$MedicalCondition)[levels(DF1$MedicalCondition) == "Woman's mother or sister: sister"] <- "1"

Any help appreciated, thank you!

Solution:

tidyselect::starts_with is specifically written for use on column names within dplyr-type functions, but you can use the base R startsWith:

levels(DF1$MedicalCondition)[
  startsWith(levels(DF1$MedicalCondition), "Woman's mother or sister")
] <- "1"
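Note that the snippet above only renames the matching levels to "1"; every other level keeps its original name. To end up with exactly two levels, one option is to recode all levels in a single assignment. The following is a minimal sketch, assuming a toy DF1 whose level names are invented for illustration:

```r
# Toy data standing in for DF1; these level names are made up for illustration.
DF1 <- data.frame(
  MedicalCondition = factor(c(
    "Woman's mother or sister: sister",
    "Woman's mother or sister: mother",
    "Other relative: aunt",
    "None reported"
  ))
)

# TRUE for every level that begins with the shared prefix.
lv <- levels(DF1$MedicalCondition)
is_match <- startsWith(lv, "Woman's mother or sister")

# Assigning a vector containing duplicates merges the old levels,
# leaving only "0" and "1".
levels(DF1$MedicalCondition) <- ifelse(is_match, "1", "0")

table(DF1$MedicalCondition)
```

Because the work is done on the levels rather than on the individual values, this should stay fast no matter how many rows the data frame has.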

You can also use general regex patterns with grepl or stringr::str_detect, which helps when the levels you want to group do not all share one simple prefix.
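As a rough illustration of the regex route, here is a sketch using grepl on a small made-up factor (the vector x and its level names are assumptions, not taken from the question's data):

```r
# Hypothetical factor; the level names are invented for this example.
x <- factor(c(
  "Woman's mother or sister: sister",
  "Woman's aunt: maternal",
  "None reported"
))

# "^" anchors the pattern to the start of the string, so it behaves like
# startsWith() but can cover several alternative prefixes at once.
hit <- grepl("^Woman's (mother or sister|aunt)", levels(x))

# stringr::str_detect(levels(x), "^Woman's (mother or sister|aunt)")
# should give the same logical vector if stringr is installed.

# Recode all levels in one step so matches become "1" and the rest "0".
levels(x) <- ifelse(hit, "1", "0")

levels(x)
```

Recoding in one assignment avoids a subtle problem: after a partial assignment such as levels(x)[hit] <- "1", the levels are merged and the old logical index no longer lines up with them.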
