Combining two columns into one based on row value in R

October 18, 2022

I have a dataframe that looks like the following. I’m trying to combine two columns (a_on_off and b_on_off) into a single column (on_off), based on their ID. When ID is not equal to a1, a2, or b1, b2, I would like the on_off column to be empty (NA).

a_on_off	b_on_off	ID	Date
off	on	b1	2017-08-09
off	on	a2	2017-08-09
off	on	a1	2017-08-10
off	on	a1	2017-08-11
on	on	x1	2017-08-12
on	on	x2	2017-08-13
on	on	y1	2017-08-13
off	off	b1	2017-08-13
off	off	a2	2017-08-14
off	off	a2	2017-08-15
on	on	b2	2017-08-15
on	on	y1	2017-08-15
on	on	x1	2017-08-15
on	on	y3	2017-08-16

The code I have is almost there, but it not matching the correct row value of a_on_off and b_on_off to the ID number. Instead it is filling in the a_on_off and b_on_off values in order. How do I specify that I want the values to come from the same row?

df$on_off[df$ID == "a1" | df$ID == "a2)"] <- df$a_on_off
df$on_off[df$ID == "b1" | df$ID == "b2)"] <- df$b_on_off

Any help would be appreciated. Thank you!

>Solution :

We can detect the ‘ID’ where there are ‘a’ substring followed by some digits (\\d+) with str_detect and return the values of ‘a_on_off’, similarly do this for the ‘b’ substring in ‘ID’. By default, the remaining will return NA in case_when

library(dplyr)
library(stringr)
df <- df %>%
   mutate(on_off = case_when(str_detect(ID, '^a\\d+') ~ a_on_off, 
         str_detect(ID, '^b\\d+') ~ b_on_off))

-output

df
   a_on_off b_on_off ID       Date on_off
1       off       on b1 2017-08-09     on
2       off       on a2 2017-08-09    off
3       off       on a1 2017-08-10    off
4       off       on a1 2017-08-11    off
5        on       on x1 2017-08-12   <NA>
6        on       on x2 2017-08-13   <NA>
7        on       on y1 2017-08-13   <NA>
8       off      off b1 2017-08-13    off
9       off      off a2 2017-08-14    off
10      off      off a2 2017-08-15    off
11       on       on b2 2017-08-15     on
12       on       on y1 2017-08-15   <NA>
13       on       on x1 2017-08-15   <NA>
14       on       on y3 2017-08-16   <NA>

data

df <- structure(list(a_on_off = c("off", "off", "off", "off", "on", 
"on", "on", "off", "off", "off", "on", "on", "on", "on"), b_on_off = c("on", 
"on", "on", "on", "on", "on", "on", "off", "off", "off", "on", 
"on", "on", "on"), ID = c("b1", "a2", "a1", "a1", "x1", "x2", 
"y1", "b1", "a2", "a2", "b2", "y1", "x1", "y3"), Date = c("2017-08-09", 
"2017-08-09", "2017-08-10", "2017-08-11", "2017-08-12", "2017-08-13", 
"2017-08-13", "2017-08-13", "2017-08-14", "2017-08-15", "2017-08-15", 
"2017-08-15", "2017-08-15", "2017-08-16")), class = "data.frame", 
row.names = c(NA, 
-14L))