Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group by an overlapping category (a category should be grouped in 2 different other categories)

I have a dataset where there are groups ‘gr1’, ‘gr2’, and ‘both’. Basically, I’d like to group the ‘gr’ column by c("gr1", "both") and c("gr2", "both").

Here I’m proposing a solution with a simple data frame, but I’d like to know if there is a way to make ‘complex’ grouping such as group_by(gr ''using c("gr1", "both") and c("gr2", "both") as groups'' ). Is there a way to specify what to group together in dplyr instead of doing the rind like shown below?

library(tidyverse)
set.seed(1234)

df = data.frame(x = 1:10, id = sample(LETTERS[1:3], size = 10, replace = TRUE),
           gr = c(rep("gr1",3), rep("gr2",4),rep("both",3)))

df
    x id   gr
1   1  B  gr1
2   2  B  gr1
3   3  A  gr1
4   4  C  gr2
5   5  A  gr2
6   6  A  gr2
7   7  B  gr2
8   8  B both
9   9  C both
10 10  B both

sum.gr1 = df %>% 
  filter(gr %in% c("gr1", "both")) %>% 
  group_by(id) %>% 
  summarize(x.sum = sum(x)) %>% 
  mutate(gr.filt = "gr1.both")

sum.gr2 = df %>% 
  filter(gr %in% c("gr2", "both")) %>% 
  group_by(id) %>% 
  summarize(x.sum = sum(x))%>% 
  mutate(gr.filt = "gr2.both")

df.gr = rbind(sum.gr1, sum.gr2)
df.gr

# A tibble: 6 Ă— 3
id    x.sum gr.filt 
<chr> <int> <chr>   
1 A         3 gr1.both
2 B        21 gr1.both
3 C         9 gr1.both
4 A        11 gr2.both
5 B        25 gr2.both
6 C        13 gr2.both

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here I combine the "non-both" data with a version of the "both" data where each row has been copied to each of the "non-both" groups.

library(dplyr)
bind_rows(
  df |> filter(gr != "both"),
  df |> filter(gr == "both") |> select(-gr) |>
    tidyr::crossing(gr = unique(df$gr[df$gr != "both"]))
) |>
  count(gr = paste0(gr, ".both"), id, wt = x)

Result

        gr id  n
1 gr1.both  A  3
2 gr1.both  B 21
3 gr1.both  C  9
4 gr2.both  A 11
5 gr2.both  B 25
6 gr2.both  C 13
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading