Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter by group using column from another dataframe but keep grouping information in R

I have a large dataframe that I would like to filter by another dataframe column but keep the grouping information to inform the filtering. I want to filter by the ID column but use the year column to inform the grouping with the other dataframe. In my current example the group_by call does not work across dataframes since I am filtering by only one column. Keep in mind that I want this to be a more general expression that I can use on a dataset with hundreds of thousands of rows, so explicitly writing out all the potential filtering combinations of year and ID would not be practical. Here is my example:

df_filter<-data.frame(year=c(20,20,20,21,21,21),ID=seq(1:6))

df_all<-data.frame(year=c(20,20,20,21,21,21), ID=c(1,2,4,3,5,6))

filtered<-df_all %>% 
  group_by(year) %>% 
  filter(ID %in% df_filter$ID)

> filtered
   year    ID
1    20     1
2    20     2
3    20     4
4    21     3
5    21     5
6    21     6

***desired result***
   year    ID
1    20     1
2    20     2
3    21     5
4    21     6

Thanks!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use dplyr::semi_join():

library(dplyr)

semi_join(df_all, df_filter)
# Joining with `by = join_by(year, ID)`
  year ID
1   20  1
2   20  2
3   21  5
4   21  6
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading