I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.
library(dplyr)
library(tibble)
data <- tribble(
  ~record_number, ~tool,     ~cytology,       ~pathology,
  114,            "forceps", "Indeterminate", NA,
  114,            "needle",  "Non-Malignant", "Malignant",
  114,            "lavage",  NA,              "Indeterminate",
  115,            "forceps", NA,              "Non-Malignant",
  115,            "needle",  NA,              "Malignant"
)
I’d like to create a Malignant indicator variable (0/1) that equals 1 if "Malignant" occurs in any of the samples (rows) for a given subject (record_number), in either of the columns (cytology, pathology).
Any ideas are appreciated!
desired <- tribble(
  ~record_number, ~tool,     ~cytology,       ~pathology,      ~Malignant,
  114,            "forceps", "Indeterminate", NA,              1,
  114,            "needle",  "Non-Malignant", "Malignant",     1,
  114,            "lavage",  NA,              "Indeterminate", 1,
  115,            "forceps", NA,              "Non-Malignant", 1,
  115,            "needle",  NA,              "Malignant",     1
)
I’m thinking it will start with group_by(record_number)…but then what?
desired<-data %>%
group_by(record_number) %>%
...?
>Solution :
We can use ifelse with any:
library(dplyr) # the .by argument requires dplyr >= 1.1.0
data %>%
  mutate(
    Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant"), 1, 0),
    .by = record_number
  )
  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate NA                    1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  NA            Indeterminate         1
4           115 forceps NA            Non-Malignant         1
5           115 needle  NA            Malignant             1
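
One caveat worth checking on real data: both subjects in the example have at least one "Malignant" result, so the NA handling never comes into play. If a subject has no "Malignant" value but does have missing entries, any() returns NA rather than FALSE, and the flag becomes NA instead of 0. Below is a minimal sketch of a variant that follows the group_by(record_number) idea from the question and passes na.rm = TRUE to any(). The data2 object and record 116 are made up here purely for illustration and are not part of the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical data: record 116 is invented to show a subject
# with no "Malignant" result and some missing values.
data2 <- tribble(
  ~record_number, ~tool,     ~cytology,       ~pathology,
  114,            "forceps", "Indeterminate", NA,
  114,            "needle",  "Non-Malignant", "Malignant",
  116,            "forceps", NA,              "Non-Malignant",
  116,            "needle",  "Indeterminate", NA
)

result <- data2 %>%
  group_by(record_number) %>%
  mutate(
    # na.rm = TRUE drops NA comparisons, so a subject with no
    # "Malignant" value gets 0 rather than NA.
    Malignant = as.integer(
      any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE)
    )
  ) %>%
  ungroup()

result
```

With group_by() the result stays grouped until ungroup() is called, which is why the sketch ends with it; the .by form shown in the answer does not leave the data grouped.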