Collapse levels of a factor when the number of observations within a level is below a limit

I would like a way to collapse levels of a factor based on the number of observations for each level.

For example, if I have the tibble below with a factor column of animals (four levels: cat, dog, hamster, goldfish), can I collapse levels with fewer than 2 observations into a level called "other"?

# A tibble: 7 × 1
  animal  
  <fct>   
1 cat     
2 cat     
3 cat     
4 dog     
5 dog     
6 hamster 
7 goldfish

This should result in the following…

# A tibble: 7 × 2
  animal   animal2
  <fct>    <fct>  
1 cat      cat    
2 cat      cat    
3 cat      cat    
4 dog      dog    
5 dog      dog    
6 hamster  other  
7 goldfish other  

I would like to be able to adjust the cut-off (e.g. groups with fewer than 5 observations), and ideally this would be done using the tidyverse.

>Solution :

You’re looking for forcats::fct_lump_min, which collapses levels that appear fewer than min times into 'Other':

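library(forcats)
library(dplyr)

# Example data from the question
df <- tibble(animal = factor(c("cat", "cat", "cat", "dog", "dog", "hamster", "goldfish")))

df %>% 
  mutate(animal2 = fct_lump_min(animal, min = 2),  # lump levels with fewer than 2 rows
         animal3 = fct_lump_min(animal, min = 3))  # lump levels with fewer than 3 rows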

output

# A tibble: 7 × 3
  animal   animal2 animal3
  <fct>    <fct>   <fct>  
1 cat      cat     cat    
2 cat      cat     cat    
3 cat      cat     cat    
4 dog      dog     Other  
5 dog      dog     Other  
6 hamster  Other   Other  
7 goldfish Other   Other
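
The question asked for a lowercase "other" level and an adjustable cut-off. A minimal sketch of one way to get both, assuming the same example data: fct_lump_min takes an other_level argument for the label, and the cut-off can be kept in a variable. The name min_obs is only illustrative and is not part of forcats.

```r
library(forcats)
library(dplyr)

# Example data from the question
df <- tibble(
  animal = factor(c("cat", "cat", "cat", "dog", "dog", "hamster", "goldfish"))
)

# Hypothetical variable holding the cut-off; change it to adjust the threshold
min_obs <- 2

result <- df %>%
  # Levels with fewer than min_obs rows are relabelled "other" instead of "Other"
  mutate(animal2 = fct_lump_min(animal, min = min_obs, other_level = "other"))

# Tabulate the collapsed factor to check which levels survived
fct_count(result$animal2)
```

With min_obs set to 2, hamster and goldfish are collapsed into "other" while cat and dog are kept, which matches the result requested in the question.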