Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Removing columns which start with XXX and meet a criteria

Objective: Remove columns if their name starts with XXX and the rows meet a criteria. For example, in the dataset below, remove all "fillerX" columns which only contain zeros.

Data:

iris %>% 
    tibble() %>% 
    slice(1:5) %>% 
    mutate(
        fillerQ = rep(0,5),
        fillerW = rep(0,5),
        fillerE = rep(0,5),
        fillerR = c(0,0,1,0,0),
        fillerT = rep(0,5),
        fillerY = rep(0,5),
        fillerU = rep(0,5),
        fillerI = c(0,0,0,0,1),
        fillerO = rep(0,5),
        fillerP = rep(0,5),
    )
# A tibble: 5 × 15
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerQ fillerW fillerE fillerR fillerT fillerY fillerU fillerI fillerO fillerP
         <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1          5.1         3.5          1.4         0.2 setosa        0       0       0       0       0       0       0       0       0       0
2          4.9         3            1.4         0.2 setosa        0       0       0       0       0       0       0       0       0       0
3          4.7         3.2          1.3         0.2 setosa        0       0       0       1       0       0       0       0       0       0
4          4.6         3.1          1.5         0.2 setosa        0       0       0       0       0       0       0       0       0       0
5          5           3.6          1.4         0.2 setosa        0       0       0       0       0       0       0       1       0       0

Problem: We can use starts_with("filler") to reference the filler columns, and we can use select_if(~ sum(abs(.)) != 0) to keep non-zero columns, but we cannot put starts_with() inside of select_if(), since we will get the error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Error:
! `starts_with()` must be used within a *selecting* function.
ℹ See ?tidyselect::faq-selection-context for details.
Run `rlang::last_trace()` to see where the error occurred.

Question: How do you combine starts_with() and select_if()?

>Solution :

select_if() has been superseded. Use where() inside select() instead.

library(dplyr)

df %>%
  select(!(starts_with("filler") & where(~ all(.x == 0))))

# # A tibble: 5 × 7
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species fillerR fillerI
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>   <dbl>
# 1          5.1         3.5          1.4         0.2 setosa        0       0
# 2          4.9         3            1.4         0.2 setosa        0       0
# 3          4.7         3.2          1.3         0.2 setosa        1       0
# 4          4.6         3.1          1.5         0.2 setosa        0       0
# 5          5           3.6          1.4         0.2 setosa        0       1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading