Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combining two columns into one based on row value in R

I have a dataframe that looks like the following. I’m trying to combine two columns (a_on_off and b_on_off) into a single column (on_off), based on their ID. When ID is not equal to a1, a2, or b1, b2, I would like the on_off column to be empty (NA).

a_on_off b_on_off ID Date
off on b1 2017-08-09
off on a2 2017-08-09
off on a1 2017-08-10
off on a1 2017-08-11
on on x1 2017-08-12
on on x2 2017-08-13
on on y1 2017-08-13
off off b1 2017-08-13
off off a2 2017-08-14
off off a2 2017-08-15
on on b2 2017-08-15
on on y1 2017-08-15
on on x1 2017-08-15
on on y3 2017-08-16

The code I have is almost there, but it not matching the correct row value of a_on_off and b_on_off to the ID number. Instead it is filling in the a_on_off and b_on_off values in order. How do I specify that I want the values to come from the same row?

df$on_off[df$ID == "a1" | df$ID == "a2)"] <- df$a_on_off
df$on_off[df$ID == "b1" | df$ID == "b2)"] <- df$b_on_off

Any help would be appreciated. Thank you!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We can detect the ‘ID’ where there are ‘a’ substring followed by some digits (\\d+) with str_detect and return the values of ‘a_on_off’, similarly do this for the ‘b’ substring in ‘ID’. By default, the remaining will return NA in case_when

library(dplyr)
library(stringr)
df <- df %>%
   mutate(on_off = case_when(str_detect(ID, '^a\\d+') ~ a_on_off, 
         str_detect(ID, '^b\\d+') ~ b_on_off))

-output

df
   a_on_off b_on_off ID       Date on_off
1       off       on b1 2017-08-09     on
2       off       on a2 2017-08-09    off
3       off       on a1 2017-08-10    off
4       off       on a1 2017-08-11    off
5        on       on x1 2017-08-12   <NA>
6        on       on x2 2017-08-13   <NA>
7        on       on y1 2017-08-13   <NA>
8       off      off b1 2017-08-13    off
9       off      off a2 2017-08-14    off
10      off      off a2 2017-08-15    off
11       on       on b2 2017-08-15     on
12       on       on y1 2017-08-15   <NA>
13       on       on x1 2017-08-15   <NA>
14       on       on y3 2017-08-16   <NA>

data

df <- structure(list(a_on_off = c("off", "off", "off", "off", "on", 
"on", "on", "off", "off", "off", "on", "on", "on", "on"), b_on_off = c("on", 
"on", "on", "on", "on", "on", "on", "off", "off", "off", "on", 
"on", "on", "on"), ID = c("b1", "a2", "a1", "a1", "x1", "x2", 
"y1", "b1", "a2", "a2", "b2", "y1", "x1", "y3"), Date = c("2017-08-09", 
"2017-08-09", "2017-08-10", "2017-08-11", "2017-08-12", "2017-08-13", 
"2017-08-13", "2017-08-13", "2017-08-14", "2017-08-15", "2017-08-15", 
"2017-08-15", "2017-08-15", "2017-08-16")), class = "data.frame", 
row.names = c(NA, 
-14L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading