Create new column based on non-numerical variables from several columns in the same dataframe in R

I have a large data frame, simplified below, and I would like to create two new columns, K_status and S_status, based on the values in the other columns. I am struggling with how best to code this.

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
df <- data.frame(A, B, C)

To generate the additional K_status and S_status columns in df, my current code is:

df <- df %>%
mutate(K_status = case_when(all("K", "AA", "YY") %in% df) ~ "Mut",
TRUE ~ "WT")) %>%
mutate(S_status = case_when(all("S", "AB", "Y") %in% df) ~ "Mut",
TRUE ~ "WT")) 

This code does not work. My intended new df should look like this:

A <- c("K", "K", "K", "S", "S", "S", "NA")
B <- c("NA", "AA", "AC", "NA", "AA", "AB", "LD")
C <- c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
K_status <- c("WT", "Mut", "WT", "WT", "WT", "WT", "WT")
S_status <- c("WT", "WT", "WT", "WT", "WT", "Mut", "WT")
df <- data.frame(A, B, C, K_status, S_status)

Any help in writing this code to generate K_status and S_status would be greatly appreciated. Thank you.

>Solution :

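We may use base R, which is efficient here: use rowSums to build a logical vector that marks the rows where all three columns match, then do the assignment based on it. Indexing the key with col(df) repeats each key value down its own column, so A is compared with "K", B with "AA" and C with "YY".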

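# TRUE for rows where all three columns match the key for that status
i1 <- rowSums(df == c("K", "AA", "YY")[col(df)]) == 3
i2 <- rowSums(df == c("S", "AB", "Y")[col(df)]) == 3
# Default every row to "WT", then overwrite the matching rows
df$K_status <- "WT"
df$K_status[i1] <- "Mut"
df$S_status <- "WT"
df$S_status[i2] <- "Mut"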

-output

> df
   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
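
The col(df) indexing can be hard to read at first glance, so here is a small sketch that breaks the K comparison into steps. It rebuilds the example data so that it runs on its own; the names key, expanded and hits are only illustrative and are not part of the answer above.

```r
# Example data from the question (note that "NA" is a string here, not a missing value)
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

key <- c("K", "AA", "YY")

# col(df) is an integer matrix holding the column index of every cell
col(df)[1:2, ]

# Indexing the key with it repeats "K" down column A, "AA" down B, "YY" down C
expanded <- matrix(key[col(df)], nrow = nrow(df))
head(expanded, 2)

# Cell-by-cell comparison gives a logical matrix; a row matches
# only when all of its cells are TRUE
hits <- df == key[col(df)]
i1 <- rowSums(hits) == length(key)

# Only row 2 (K, AA, YY) matches the key
which(i1)
```

Comparing against length(key) instead of a hard-coded 3 keeps the check valid if more key columns are added later.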

Or use the tidyverse in a vectorized way: create a key/value dataset (or a named list), loop over the columns in if_all, extract the value for the current column from the keydat dataset with cur_column(), compare, and use case_when to create the new columns.

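library(dplyr)
# One row per status: row 1 holds the K pattern, row 2 the S pattern
keydat <- tibble(A = c("K", "S"), B = c("AA", "AB"), C = c("YY", "Y"))

df %>%
  mutate(
    # A:C restricts the comparison to the key columns, so it still
    # works if df already contains other columns
    K_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][1]) ~ "Mut",
                         TRUE ~ "WT"),
    S_status = case_when(if_all(A:C, ~ .x == keydat[[cur_column()]][2]) ~ "Mut",
                         TRUE ~ "WT"))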

-output

   A  B  C K_status S_status
1  K NA TT       WT       WT
2  K AA YY      Mut       WT
3  K AC YY       WT       WT
4  S NA TT       WT       WT
5  S AA YY       WT       WT
6  S AB  Y       WT      Mut
7 NA LD TT       WT       WT
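
To make the if_all step concrete: for the first key row it amounts to the condition A == "K" & B == "AA" & C == "YY", evaluated row by row. The sketch below writes both conditions out by hand, which may be easier to read when there are only a few patterns. It assumes dplyr is installed; the name res is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  A = c("K", "K", "K", "S", "S", "S", "NA"),
  B = c("NA", "AA", "AC", "NA", "AA", "AB", "LD"),
  C = c("TT", "YY", "YY", "TT", "YY", "Y", "TT")
)

res <- df %>%
  mutate(
    # Each comparison is vectorized over rows; & combines them per row
    K_status = if_else(A == "K" & B == "AA" & C == "YY", "Mut", "WT"),
    S_status = if_else(A == "S" & B == "AB" & C == "Y", "Mut", "WT")
  )

res
```

The key/value approach scales better when there are many columns or patterns, because the patterns live in data instead of being repeated in the code.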