I wish to filter a dataframe based on the values of 2 columns, so in the example below if we see more than one location under a number/code I wish to pull out all of that information.
mynewdf
code location
101 london
101 london
102 manchester
102 manchester
102 bolton
102 oldham
102 oldham
103 leeds
103 huddersfield
103 leeds
104 sheffield
104 sheffield
105 hull
So here im trying to pull out any codes where the code and location where the location is 2 of more seperate locations, what im trying to get is a new df looking like;
mynewdf
code location
102 manchester
102 manchester
102 bolton
102 oldham
102 oldham
103 leeds
103 huddersfield
103 leeds
I thought about trying to do this by uniquing my df and looking for any duplicates of the code column but wasn’t very successful as I think duplicate doesn’t return all the values?
>Solution :
base R
mynewdf[with(mynewdf, ave(location, code, FUN = function(z) length(unique(z)) > 1) == "TRUE"),]
# code location
# 3 102 manchester
# 4 102 manchester
# 5 102 bolton
# 6 102 oldham
# 7 102 oldham
# 8 103 leeds
# 9 103 huddersfield
# 10 103 leeds
dplyr
library(dplyr)
mynewdf |>
filter(.by = code, n_distinct(location) > 1)
# code location
# 1 102 manchester
# 2 102 manchester
# 3 102 bolton
# 4 102 oldham
# 5 102 oldham
# 6 103 leeds
# 7 103 huddersfield
# 8 103 leeds
This is using dplyr_1.1.0; if you have an older version, replace the .by=c(..) with group_by(..) before the verb.