Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to combine member of one column and then count other columns in R?

I have a data frame:

df <- structure(list(ID = c("x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3", "x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3"), col1=c("a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2"), col2 = c("a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f"), col3 = c(2,13,1,21,0,5,3,0,6,4,50,0,0,0,0,9,5,0,51,3,6,0,0,9,89,4,29,1,4,17,6,16,9,1,0,0)), 
class = "data.frame", row.names = c(NA,-36L))
ID col1 col2 col3
x1  a1  a   2
x1  a1  b   13
x1  a1  c   1
x1  a1  d   21
x1  a1  e   0
x1  a1  f   5
x2  a1  a   3
x2  a1  b   0
x2  a1  c   6
x2  a1  d   4
x2  a1  e   50
x2  a1  f   0
x3  a1  a   0
x3  a1  b   0
x3  a1  c   0
x3  a1  d   9
x3  a1  e   5
x3  a1  f   0
x1  a2  a   51
x1  a2  b   3
x1  a2  c   6
x1  a2  d   0
x1  a2  e   0
x1  a2  f   9
x2  a2  a   89
x2  a2  b   4
x2  a2  c   29
x2  a2  d   1
x2  a2  e   4
x2  a2  f   17
x3  a2  a   6
x3  a2  b   16
x3  a2  c   9
x3  a2  d   1
x3  a2  e   0
x3  a2  f   0

I want to count the unique IDs that have "a", "b" or "c" >0 (more than zero), then "d" or "e" >0, and finally "f">0.
Then get the sum of all (abc), (de) and (f) separately in a different column.
So the result would look like the following:

df2<- structure(list(col1=c("a1","a1","a1","a2","a2","a2"), col2 = c("abc", "de", "f", "abc", "de", "f"), count.ID = c(2,3,1,3,2,2), total=c(25,89,5,213,6,26)),
                class = "data.frame", row.names = c(NA,-6L))
col1 col2 count.ID total
a1  abc 2   25
a1  de  3   89
a1  f   1   5
a2  abc 3   213
a2  de  2   6
a2  f   2   26

How is this possible in R?
Thanks

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

One premise would be to create a frame of grouping variables mapping old col2 to the new combined col2, and then merge/join it to the original data.

dplyr

library(dplyr)
groups <- data.frame(col2=c("a","b","c","d","e","f"), col2b=c("abc","abc","abc","de","de","f"))
left_join(df, groups, by = "col2") %>%
  group_by(col1, col2 = col2b) %>%
  summarize(count.ID = n(), total = sum(col3)) %>%
  ungroup()
# # A tibble: 6 x 4
#   col1  col2  count.ID total
#   <chr> <chr>    <int> <dbl>
# 1 a1    abc          9    25
# 2 a1    de           6    89
# 3 a1    f            3     5
# 4 a2    abc          9   213
# 5 a2    de           6     6
# 6 a2    f            3    26
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading