Mutate one column by filtering others in dplyr

February 23, 2022

I have a dataset with several columns that are mutually exclusive. One of the columns contains the answer for that observation (e.g. "abc") while the rest of columns contains negative numbers that represent NA. As column A in the toy dataset below, a column always contains the same answer .

I want to create a new variable by merging the column with the answer for each observation. So far, I did it as below, but in the real dataset there are 50 columns for 2000 observations, so I’m looking for a more optimized (and more elegant) way to do it.

data <- tibble::tribble(
  ~id, ~A, ~B, ~C,
  "a", "abc", "-1", "-2",
  "b", "-1","-3", "def",
  "c", "-2", "ghi", "-1",
  "d", "abc", "-3", "-1"
)

data %>% 
  dplyr::mutate(across(A:C, ~ ifelse(str_starts(.,"-"), "", .))) %>%
  dplyr::mutate(regions = paste0(A, B, C))

Thank you.

>Solution :

We may use unite from tidyr and then remove the numeric/- characters with str_remove_all

library(dplyr)
library(tidyr)
library(stringr)
data %>%
  unite(regions, A:C, remove = FALSE) %>%
  mutate(regions = str_remove_all(regions, "[^a-z]+"))

-output

# A tibble: 4 × 5
  id    regions A     B     C    
  <chr> <chr>   <chr> <chr> <chr>
1 a     abc     abc   -1    -2   
2 b     def     -1    -3    def  
3 c     ghi     -2    ghi   -1   
4 d     abc     abc   -3    -1