I have a large df, simplified below, in which I would like to create two new columns, K_status and S_status, based on the values in the other columns, and I am struggling with how best to code this.
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)
To add the K_status and S_status columns to df, my current code is:
df <- df %>%
mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
TRUE ~ "WT")) %>%
mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
TRUE ~ "WT"))
This code is not working. My intended new df should look like this:
A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
df <- data.frame(A, B, C, K_status, S_status)
Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.
>Solution :
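We may use base R, which is likely to be more efficient here: compare every cell against the target value for its column, use rowSums to count the matches per row, and use the rows where all 3 columns match as a logical index for the assignment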
i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
df$K_status <- "WT"
df$K_status[i1] <- "Mut"
df$S_status <- "WT"
df$S_status[i2] <- "Mut"
-output
> df
A B C K_status S_status
1 K NA TT WT WT
2 K AA YY Mut WT
3 K AC YY WT WT
4 S NA TT WT WT
5 S AA YY WT WT
6 S AB Y WT Mut
7 NA LD TT WT WT
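To see why the rowSums approach works, it may help to look at the intermediate pieces. The sketch below rebuilds the sample data from the question and checks the result against an explicit column-by-column comparison. The names `target`, `expanded` and `direct` are illustrative only and do not appear in the answer above.

```r
# Sample data from the question ("NA" here is a literal string, not a missing value)
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT"),
  stringsAsFactors = FALSE
)

# One target value per column, in column order (illustrative name)
target <- c("K", "AA", "YY")

# col(df) holds the column index of every cell, so indexing target with it
# repeats each target value down its own column
expanded <- target[col(df)]

# Cell-by-cell comparison, then count the matches in each row;
# a row is a full match when all ncol(df) cells match
i1 <- rowSums(df == expanded) == ncol(df)

# Equivalent explicit comparison, written out for checking
direct <- df$A == "K" & df$B == "AA" & df$C == "YY"

which(i1)
```

With this data only row 2 satisfies all three conditions, and `i1` agrees with `direct` element by element. The explicit form is easier to read for three columns, while the `col()` form scales to more columns without extra typing.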
Or with tidyverse, in a vectorized way: create a key/value dataset (or a named list) of target values, loop over the columns in if_all, extract the corresponding value from keydat by column name with cur_column(), compare, and use case_when to create the new columns
library(dplyr)
# Key/value table: row 1 holds the K pattern, row 2 the S pattern
keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))
df %>%
  mutate(
    # A:C rather than everything(), so any extra columns in df are not compared
    K_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][1]) ~ "Mut",
                         TRUE ~ "WT"),
    S_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][2]) ~ "Mut",
                         TRUE ~ "WT"))
-output
A B C K_status S_status
1 K NA TT WT WT
2 K AA YY Mut WT
3 K AC YY WT WT
4 S NA TT WT WT
5 S AA YY WT WT
6 S AB Y WT Mut
7 NA LD TT WT WT
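With only two patterns, the conditions can also be written out directly inside case_when. This is a minimal sketch rather than part of the answer above, and `res` is an illustrative name. It also hints at why the original attempt fails: case_when expects a logical vector with one element per row, whereas a single all(...) call collapses to one value.

```r
library(dplyr)

# Sample data from the question ("NA" here is a literal string, not a missing value)
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    # Each condition is evaluated row by row, giving one TRUE/FALSE per row
    K_status = case_when(A == "K" & B == "AA" & C == "YY" ~ "Mut",
                         TRUE ~ "WT"),
    S_status = case_when(A == "S" & B == "AB" & C == "Y" ~ "Mut",
                         TRUE ~ "WT")
  )

res
```

This form is the easiest to read but has to be edited by hand for every new pattern, so the keydat approach is probably the better fit when there are many patterns or columns.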