I am a student relatively new to R and have learnt a lot from browsing here. I have recently been stuck on a problem that I still cannot solve after hours of trying. Consider the following data set:
ID Y1 Y2 Y3 Y4
1 0 0 1 1
2 0 0 0 0
3 NA NA NA NA
I want to create a new column that is filled according to the following conditions:
- If the row contains 1, return 1 regardless of NA or 0
- If it contains a mix of 0 and NA but not 1, return 0
- If it only contains NA, return NA
So, using the example above, I want to get the following:
ID Y1 Y2 Y3 Y4 Outcome
1 0 0 1 1 1
2 0 0 0 0 0
3 NA NA NA NA NA
However, the code I tried:
Data2 <- Data %>%
  mutate(Outcome = case_when(
    Data$Y1 == "na" &
      Data$Y2 == "na" &
      Data$Y3 == "na" &
      Data$Y4 == "na" ~ "na")) %>%
  mutate(Outcome = case_when(Data$Y1 == 1 ~ "1",
                             Data$Y2 == 1 ~ "1",
                             Data$Y3 == 1 ~ "1",
                             Data$Y4 == 1 ~ "1",
                             TRUE ~ "No"))
returns:
ID Y1 Y2 Y3 Y4 Outcome
1 0 0 1 1 1
2 0 0 0 0 0
3 NA NA NA NA 0
which seems to ignore condition 3: if the row contains only NA, return NA.
Any pointers as to what I did wrong would be greatly appreciated.
Please forgive the formatting; I'm not sure how to make it prettier, as this is the first time I have asked a question here.
Many thanks in advance!
[Edit] Thanks to Shah, I noticed that there is potential for confusion, for which I apologise. To clarify, this is just a segment of the data set, shown to get the point across. I'm dealing with a big data set that contains more columns, some of which also hold numeric values.
>Solution :
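You can try this using dplyr's rowwise() function, which treats each row separately:
library(dplyr)
df |>
  rowwise() |>   # evaluate the conditions one row at a time
  mutate(Outcome = case_when(
    any(c_across(Y1:Y4) == 1, na.rm = TRUE) ~ "1",   # at least one 1 in the row
    all(is.na(c_across(Y1:Y4))) ~ NA_character_,     # nothing but NA in the row
    TRUE ~ "0"                                       # only 0s, possibly mixed with NA
  ))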
- output
# A tibble: 3 × 6
# Rowwise:
ID Y1 Y2 Y3 Y4 Outcome
<int> <int> <int> <int> <int> <chr>
1 1 0 0 1 1 1
2 2 0 0 0 0 0
3 3 NA NA NA NA NA
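
As for why the original attempt did not work: `Data$Y1 == "na"` compares against the literal string "na" rather than testing for missing values (that is what `is.na()` is for), and any comparison involving NA itself yields NA, which `case_when()` treats as no match. The second `mutate()` also overwrites the `Outcome` column created by the first one. Since the real data set is said to be large, a vectorised variant that avoids `rowwise()` may be worth trying. The following is only a sketch in base R: the data frame is rebuilt from the example in the question, `y_cols`, `has_one` and `all_na` are names made up for illustration, and `Outcome` is stored as an integer here rather than as a character.

```r
# Example data rebuilt from the question
df <- data.frame(
  ID = 1:3,
  Y1 = c(0L, 0L, NA),
  Y2 = c(0L, 0L, NA),
  Y3 = c(1L, 0L, NA),
  Y4 = c(1L, 0L, NA)
)

# Columns to inspect (assumed names; adjust to the real data set)
y_cols <- c("Y1", "Y2", "Y3", "Y4")

# TRUE where the row holds at least one 1; NAs are ignored when counting
has_one <- unname(rowSums(df[y_cols] == 1, na.rm = TRUE) > 0)

# TRUE where the row has no non-missing value at all
all_na <- unname(rowSums(!is.na(df[y_cols])) == 0)

# 1 if any 1 is present, NA if everything is missing, otherwise 0
df$Outcome <- ifelse(has_one, 1L, ifelse(all_na, NA_integer_, 0L))

print(df)
```

Because `rowSums()` works on the whole block of columns at once, this avoids evaluating the conditions separately for every row, which usually matters once the data set gets large.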