How can I search across multiple columns within a group in R?

I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.

library(dplyr)
library(tibble)

data<-tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant"
)

I’d like to create a Malignant variable (0/1) that is 1 if "Malignant" occurs in any of the samples (rows) for a given subject (record_number), in either of the columns (cytology, pathology).

Any ideas are appreciated!

desired <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology, ~Malignant,
  114, "forceps", "Indeterminate", NA, 1,
  114, "needle", "Non-Malignant", "Malignant", 1,
  114, "lavage", NA, "Indeterminate", 1,
  115, "forceps", NA, "Non-Malignant", 1,
  115, "needle", NA, "Malignant", 1
)

I’m thinking it will start with group_by(record_number)…but then what?

desired<-data %>%
  group_by(record_number) %>%
  ...?

>Solution :

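We can use ifelse() with any(), computed per record_number:

library(dplyr) # the .by argument requires dplyr >= 1.1.0
data %>%
  mutate(
    # na.rm = TRUE so a group with NAs and no "Malignant" gives 0, not NA
    Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0),
    .by = record_number
  )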

  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate NA                    1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  NA            Indeterminate         1
4           115 forceps NA            Non-Malignant         1
5           115 needle  NA            Malignant             1
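
One edge case worth checking: comparisons with == return NA for missing values, so any() can return NA for a group that has no "Malignant" result unless na.rm = TRUE is set. Below is a minimal sketch of an alternative that follows the group_by(record_number) idea from the question and uses %in%, which returns FALSE rather than NA for missing values. Patient 116 and the object names example and result are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical data: patient 116 has missing values and no "Malignant" result
example <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  116, "forceps", NA, "Non-Malignant",
  116, "needle", "Indeterminate", NA
)

result <- example %>%
  group_by(record_number) %>%
  mutate(
    # %in% does an exact match and yields FALSE (never NA) for missing values,
    # so any() is always TRUE or FALSE within each group
    Malignant = as.integer(any(cytology %in% "Malignant" | pathology %in% "Malignant"))
  ) %>%
  ungroup()

result$Malignant
```

Here both rows for 114 should get 1 and both rows for 116 should get 0. The group_by()/ungroup() form also works on dplyr versions older than 1.1.0, where .by is not available.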
