Filtering a dataframe on values in 2 columns

February 21, 2024

I wish to filter a dataframe based on the values of 2 columns, so in the example below if we see more than one location under a number/code I wish to pull out all of that information.

mynewdf
code        location
101         london
101         london
102         manchester
102         manchester
102         bolton
102         oldham
102         oldham
103         leeds
103         huddersfield
103         leeds
104         sheffield
104         sheffield
105         hull

So here im trying to pull out any codes where the code and location where the location is 2 of more seperate locations, what im trying to get is a new df looking like;

mynewdf
code        location
102         manchester
102         manchester
102         bolton
102         oldham
102         oldham
103         leeds
103         huddersfield
103         leeds

I thought about trying to do this by uniquing my df and looking for any duplicates of the code column but wasn’t very successful as I think duplicate doesn’t return all the values?

>Solution :

base R

mynewdf[with(mynewdf, ave(location, code, FUN = function(z) length(unique(z)) > 1) == "TRUE"),]
#    code     location
# 3   102   manchester
# 4   102   manchester
# 5   102       bolton
# 6   102       oldham
# 7   102       oldham
# 8   103        leeds
# 9   103 huddersfield
# 10  103        leeds

dplyr

library(dplyr)
mynewdf |>
  filter(.by = code, n_distinct(location) > 1)
#   code     location
# 1  102   manchester
# 2  102   manchester
# 3  102       bolton
# 4  102       oldham
# 5  102       oldham
# 6  103        leeds
# 7  103 huddersfield
# 8  103        leeds

This is using dplyr_1.1.0; if you have an older version, replace the .by=c(..) with group_by(..) before the verb.