Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I make a conditional composite score based off a categorical variable in R?

Dataset

I have a dataset similar to this:

stack_tib <- structure(list(ID = 1:50, Phase = c("Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", 
"Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2", "Wave_1", 
"Wave_2", "Wave_1", "Wave_2", "Wave_1", "Wave_2"), ABC1 = c(NA, 
45, NA, 51, NA, 84, NA, 25, NA, 41, NA, 57, NA, 52, NA, 86, NA, 
11, NA, 41, NA, 46, NA, 35, NA, 16, NA, 53, NA, 75, NA, 44, NA, 
68, NA, 64, NA, 49, NA, 42, NA, 46, NA, 93, NA, 28, NA, 41, NA, 
48), ABC2 = c(NA, 49, NA, 77, NA, 80, NA, 62, NA, 54, NA, 40, 
NA, 30, NA, 56, NA, 51, NA, 91, NA, 4, NA, 36, NA, 71, NA, 26, 
NA, 47, NA, 58, NA, 63, NA, 57, NA, 59, NA, 73, NA, 61, NA, 37, 
NA, 38, NA, 81, NA, 29), ABC3 = c(NA, 55, NA, 43, NA, 49, NA, 
17, NA, 68, NA, 62, NA, 49, NA, 56, NA, 37, NA, 30, NA, 31, NA, 
45, NA, 37, NA, 52, NA, 49, NA, 59, NA, 42, NA, 73, NA, 65, NA, 
21, NA, 45, NA, 20, NA, 39, NA, 64, NA, 24), RABC1 = c(66, NA, 
57, NA, 48, NA, 61, NA, 70, NA, 71, NA, 25, NA, 42, NA, 63, NA, 
60, NA, 46, NA, 49, NA, 35, NA, 51, NA, 59, NA, 29, NA, 43, NA, 
45, NA, 72, NA, 65, NA, 54, NA, 52, NA, 24, NA, 62, NA, 38, NA
), RABC2 = c(94, NA, 45, NA, 42, NA, 34, NA, 83, NA, 52, NA, 
75, NA, 30, NA, 41, NA, 25, NA, 39, NA, 72, NA, 43, NA, 36, NA, 
68, NA, 89, NA, 54, NA, 39, NA, 46, NA, 56, NA, 34, NA, 80, NA, 
46, NA, 48, NA, 37, NA), RABC3 = c(42, NA, 43, NA, 82, NA, 72, 
NA, 48, NA, 40, NA, 51, NA, 96, NA, 47, NA, 42, NA, 67, NA, 64, 
NA, 67, NA, 53, NA, 59, NA, 17, NA, 58, NA, 52, NA, 54, NA, 46, 
NA, 73, NA, 73, NA, 90, NA, 87, NA, 50, NA)), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

Problem

This is what my data looks like:

# A tibble: 50 × 8
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42
 2     2 Wave_2    45    49    55    NA    NA    NA
 3     3 Wave_1    NA    NA    NA    57    45    43
 4     4 Wave_2    51    77    43    NA    NA    NA
 5     5 Wave_1    NA    NA    NA    48    42    82
 6     6 Wave_2    84    80    49    NA    NA    NA
 7     7 Wave_1    NA    NA    NA    61    34    72
 8     8 Wave_2    25    62    17    NA    NA    NA
 9     9 Wave_1    NA    NA    NA    70    83    48
10    10 Wave_2    41    54    68    NA    NA    NA
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

You can see here that there are 3 items for ABC, 3 items for RABC. However you may also notice that the ABC values are NA for Wave 1 and the RABC values are NA for Wave 2. It is not this uniform, but basically those who were tested at different times were given slightly different scales. As such, I need to find a way to run a composite for the specific ID of each person based on which version it is (basically which Wave in this case). So ideally it would look something like this:

# A tibble: 50 × 9
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3 Sum_ABC
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42     149
 2     2 Wave_2    45    49    55    NA    NA    NA     138
 3     3 Wave_1    NA    NA    NA    57    45    43     144
 4     4 Wave_2    51    77    43    NA    NA    NA     150
 5     5 Wave_1    NA    NA    NA    48    42    82     157
 6     6 Wave_2    84    80    49    NA    NA    NA     133
 7     7 Wave_1    NA    NA    NA    61    34    72     147
 8     8 Wave_2    25    62    17    NA    NA    NA     158
 9     9 Wave_1    NA    NA    NA    70    83    48     145
10    10 Wave_2    41    54    68    NA    NA    NA     152

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

What about this?

stack_tib %>% 
  mutate(Sum_ABC = rowSums(.[-c(1, 2)], na.rm = T))

#More flexible:
stack_tib %>% 
  mutate(Sum_ABC = rowSums(select(., where(is.numeric)), na.rm = T))

#Select if contains 'ABC':
stack_tib %>% 
  mutate(Sum_ABC = rowSums(select(., contains("ABC")), na.rm = T))

output

# A tibble: 50 × 9
      ID Phase   ABC1  ABC2  ABC3 RABC1 RABC2 RABC3 Sum_ABC
   <int> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1     1 Wave_1    NA    NA    NA    66    94    42     202
 2     2 Wave_2    45    49    55    NA    NA    NA     149
 3     3 Wave_1    NA    NA    NA    57    45    43     145
 4     4 Wave_2    51    77    43    NA    NA    NA     171
 5     5 Wave_1    NA    NA    NA    48    42    82     172
 6     6 Wave_2    84    80    49    NA    NA    NA     213
 7     7 Wave_1    NA    NA    NA    61    34    72     167
 8     8 Wave_2    25    62    17    NA    NA    NA     104
 9     9 Wave_1    NA    NA    NA    70    83    48     201
10    10 Wave_2    41    54    68    NA    NA    NA     163
# … with 40 more rows
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading