
How to avoid recycling while trying to replace values from a vector in a dataframe column

This question arose while I was working on another question, "Replace list names if they exist".

I have this modified iris dataset and two vectors:

library(dplyr)

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species))

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
         <dbl>       <dbl>        <dbl>       <dbl> <chr>     
1          5.1         3.5          1.4         0.2 setosa    
2          4.9         3            1.4         0.2 setosa    
3          7           3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6           2.5 virginica 
6          5.8         2.7          5.1         1.9 virginica

I would like to replace the Species values listed in one vector (to_select) with the corresponding values from another vector (new_name).


When I do:

new_name <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")
iris %>% 
  group_by(Species) %>% 
  slice(1:2) %>% 
  mutate(Species = as.character(Species)) %>% 
  mutate(Species = ifelse(Species %in% to_select, new_name, Species))

I get:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
         <dbl>       <dbl>        <dbl>       <dbl> <chr>        
1          5.1         3.5          1.4         0.2 new_setoas   
2          4.9         3            1.4         0.2 new_virginica   # should be new_setoas
3          7           3.2          4.7         1.4 versicolor   
4          6.4         3.2          4.5         1.5 versicolor   
5          6.3         3.3          6           2.5 new_setoas      # should be new_virginica
6          5.8         2.7          5.1         1.9 new_virginica 

I know this is happening because of recycling, but I don't know how to avoid it.
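The recycling can be reproduced on a plain character vector. In the minimal sketch below, `species` is a hypothetical vector standing in for the sliced Species column. It shows that ifelse() hands out the elements of new_name by row position, not by value, and that indexing new_name with match() instead pairs each value with its own replacement.

```r
species   <- c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica")
new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

# ifelse() recycles `new_name` to the length of the test vector, so each
# matching row receives new_name[1] or new_name[2] depending only on its position
recycled <- ifelse(species %in% to_select, new_name, species)

# match() gives each element's position in `to_select` (NA if absent),
# so new_name[idx] lines up element by element with `species`
idx   <- match(species, to_select)
fixed <- ifelse(is.na(idx), species, new_name[idx])

print(recycled)
print(fixed)
```

Here `recycled` reproduces the wrong pattern from the question, while `fixed` gives new_setoas, new_setoas, versicolor, versicolor, new_virginica, new_virginica.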

Solution:

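We may use recode. Instead of grouping and then modifying the grouping column afterwards, the replacement can be done in the group_by step itself. setNames(new_name, to_select) builds a named vector whose names are the old values and whose elements are the new ones, and !!! splices it into recode() as old = new pairs, so each value is matched by name rather than by position.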

library(dplyr)
iris %>% 
  group_by(Species =  recode(as.character(Species),
     !!!setNames(new_name, to_select))) %>% 
  slice(1:2) 

Output:

# A tibble: 6 × 5
# Groups:   Species [3]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species      
         <dbl>       <dbl>        <dbl>       <dbl> <chr>        
1          5.1         3.5          1.4         0.2 new_setoas   
2          4.9         3            1.4         0.2 new_setoas   
3          7           3.2          4.7         1.4 versicolor   
4          6.4         3.2          4.5         1.5 versicolor   
5          6.3         3.3          6           2.5 new_virginica
6          5.8         2.7          5.1         1.9 new_virginica
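If you prefer to keep the question's original order of steps (group, slice, then replace), a minimal sketch of an alternative, assuming dplyr is installed, is to look each value up with match() and fall back to the original value with coalesce(). This is a swapped-in technique, not part of the answer above; ungroup() is called first so that Species is no longer a grouping column when it is rewritten.

```r
library(dplyr)

new_name  <- c("new_setoas", "new_virginica")
to_select <- c("setosa", "virginica")

res <- iris %>%
  group_by(Species) %>%
  slice(1:2) %>%                 # first two rows of each species
  ungroup() %>%                  # drop grouping before rewriting Species
  mutate(Species = as.character(Species)) %>%
  # match() -> position in to_select (NA if absent); new_name[NA] is NA,
  # so coalesce() keeps the original value for non-selected species
  mutate(Species = coalesce(new_name[match(Species, to_select)], Species))

print(res)
```

Because the lookup is by value, versicolor is left untouched and no recycling can occur regardless of how many rows each species has.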