Mutate tibble conditionally based on prefixes

January 4, 2023

I am trying to mutate a tibble based on the following conditions:

For each row, if the column named with only the prefix (i.e., a or b) has the value 1, the other columns starting with that prefix should be recoded to 1 as well.
However, for each row, if any of the other columns starting with the prefix already has the value 1, the values in all columns beginning with that prefix should remain unchanged.
The columns that are named with only the prefix should be deleted after the mutation.
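
Read for a single prefix, the first two conditions reduce to one test per row: overwrite only when the prefix column is 1 and none of the suffixed columns is. A minimal base R sketch of that reading, using made-up vectors `p`, `p.1` and `p.2` (hypothetical names, not real data) and assuming all values are 0 or 1:

```r
# Hypothetical columns for one prefix "p"; values assumed to be 0/1 only
p   <- c(1, 1, 0)
p.1 <- c(0, 1, 0)
p.2 <- c(0, 0, 1)

# A row is overwritten only if the prefix column is 1
# and no suffixed column already holds a 1
fill <- p == 1 & (p.1 + p.2) == 0

p.1[fill] <- 1
p.2[fill] <- 1

# The prefix column itself is left out of the result
result <- data.frame(p.1, p.2)
print(result)
```

Only the first row is filled here: the second row already has a 1 in `p.1`, and the third has 0 in the prefix column.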

A reproducible example is:

library(tibble)

tbl <- tibble(a = c(1, 1, 0, 0, 1),
              a.1 = c(0, 0, 1, 0, 1),
              a.2 = c(0, 0, 0, 1, 0),
              b = c(0, 0, 0, 0, 1),
              b.1 = c(0, 0, 0, 1, 0),
              b.2 = c(0, 0, 0, 0, 0))
tbl

# A tibble: 5 × 6
      a   a.1   a.2     b   b.1   b.2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     0     0     0     0
2     1     0     0     0     0     0
3     0     1     0     0     0     0
4     0     0     1     0     1     0
5     1     1     0     1     0     0

The end result should look like:

tibble(
       a.1 = c(1, 0, 1, 0, 1),
       a.2 = c(1, 0, 0, 1, 0),
       b.1 = c(0, 0, 0, 1, 1),
       b.2 = c(0, 0, 0, 0, 1))

# A tibble: 5 × 4
    a.1   a.2   b.1   b.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0     0
2     0     0     0     0
3     1     0     0     0
4     0     1     1     0
5     1     0     1     1

The number of variables per prefix is not constant in my real data, so I am trying to write a general function.

If anyone can help me out, it is greatly appreciated 🙂

>Solution :

A solution with split.default + map_dfc:

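library(dplyr)  # %>% and bind_cols(), which map_dfc() relies on
library(purrr)  # map_dfc()
tbl %>% 
  # one tibble per prefix (the part of each column name before the first ".")
  split.default(gsub("\\..*", "", colnames(.))) %>% 
  # rows where the prefix column is 1 and all suffixed columns are 0 become 1
  map_dfc(~ {.x[.x[[1]] == 1 & rowSums(.x[-1]) == 0, ] <- 1
         .x[-1]})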

Output:

# A tibble: 5 × 4
    a.1   a.2   b.1   b.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0     0
2     1     1     0     0
3     1     0     0     0
4     0     1     1     0
5     1     0     1     1
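
Since the real data has a varying number of columns per prefix, the same idea can also be wrapped in a function. The following is a sketch in base R only, not the code from the answer: `fill_by_prefix` is a name made up here, and it assumes that each prefix column is named without a dot, that its companions are named `prefix.something`, and that all values are 0 or 1.

```r
# Hypothetical helper; assumes 0/1 values and "prefix" / "prefix.suffix" names
fill_by_prefix <- function(df) {
  # Prefix of each column: the name up to (not including) the first "."
  prefix <- sub("\\..*", "", names(df))

  for (p in unique(prefix)) {
    # Suffixed columns of this prefix, however many there are
    others <- names(df)[prefix == p & names(df) != p]
    # Skip prefixes that lack a bare prefix column or any suffixed column
    if (!(p %in% names(df)) || length(others) == 0) next

    # Rows where the prefix column is 1 and every suffixed column is 0
    fill <- df[[p]] == 1 & rowSums(df[others]) == 0
    for (col in others) df[[col]][fill] <- 1
  }

  # Drop the bare prefix columns, i.e. those that have suffixed companions
  bare <- names(df) %in% prefix[names(df) != prefix]
  df[!bare]
}

# The example data from the question, here as a plain data frame
tbl <- data.frame(a   = c(1, 1, 0, 0, 1),
                  a.1 = c(0, 0, 1, 0, 1),
                  a.2 = c(0, 0, 0, 1, 0),
                  b   = c(0, 0, 0, 0, 1),
                  b.1 = c(0, 0, 0, 1, 0),
                  b.2 = c(0, 0, 0, 0, 0))

res <- fill_by_prefix(tbl)
print(res)
```

On the example data this returns the same four columns and values as the output above. Columns without a dot and without companions (an id column, say) are left untouched, which may or may not be what the real data needs.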