I have a large dataframe that I would like to filter by another dataframe column but keep the grouping information to inform the filtering. I want to filter by the ID column but use the year column to inform the grouping with the other dataframe. In my current example the group_by call does not work across dataframes since I am filtering by only one column. Keep in mind that I want this to be a more general expression that I can use on a dataset with hundreds of thousands of rows, so explicitly writing out all the potential filtering combinations of year and ID would not be practical. Here is my example:
df_filter<-data.frame(year=c(20,20,20,21,21,21),ID=seq(1:6))
df_all<-data.frame(year=c(20,20,20,21,21,21), ID=c(1,2,4,3,5,6))
filtered<-df_all %>%
group_by(year) %>%
filter(ID %in% df_filter$ID)
> filtered
year ID
1 20 1
2 20 2
3 20 4
4 21 3
5 21 5
6 21 6
***desired result***
year ID
1 20 1
2 20 2
3 21 5
4 21 6
Thanks!
>Solution :
Use dplyr::semi_join():
library(dplyr)
semi_join(df_all, df_filter)
# Joining with `by = join_by(year, ID)`
year ID
1 20 1
2 20 2
3 21 5
4 21 6