Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

group_by and filter removes too many rows

Below is my example, let me explain what I am trying to do, although it is not working as I want it to.

I need to find all instances where there are 2+ unique values in column z on the same date for the same person. However, I need to find where a specific list of values in column z are present.

library(tidyverse)

x <- c("Person A","Person A","Person A","Person A","Person A","Person A")
y <- c("2022-01-01","2022-01-01","2022-01-20","2022-02-01","2022-02-01","2022-02-01")
z <- c("A","D","A","A","C","B")

df <- data.frame(x,y,z)
df

df %>% 
  group_by(x,y) %>% 
  mutate(unique_z = n_distinct(z)) %>% 
  # ungroup() %>% 
  filter(unique_z > 1,
         z %in% c("C","B"))

Below is an image of what I want the output to be, but I cannot figure it out.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Row 1 and 2 should be removed because even though it is 2 unique values of z on the same date for the same person, it does not include "C" or "B".

Row 3 is removed because it is only one unique value for that person and date.

Rows 4, 5, and 6 all should stay because that person, date combination has three unique values of z. Also, "C" and/or "B" occur in these rows. For some reason, row 4 is being removed every time. I to see the other values of z on that person, date combination. I thought grouping and filtering would do this, but it does not seem to the way I am doing it.

>Solution :

You need to use any to check for the presence of c("B", "C") within each group and not at each row; see below:

library(dplyr)

df %>% 
  group_by(x,y) %>% 
  mutate(unique_z = n_distinct(z)) %>% 
  filter(unique_z > 1,
         any(z %in% c("B","C")))
## any(z %in% c("C")) & any(z %in% c("B"))) 
## use this one instead if you want B and C present at the same time ...
## ... and two B's or two C's are not desired

# # A tibble: 3 x 4
# # Groups:   x, y [1]
#   x        y          z     unique_z
#   <fct>    <fct>      <fct>    <int>
# 1 Person A 2022-02-01 A            3
# 2 Person A 2022-02-01 C            3
# 3 Person A 2022-02-01 B            3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading