Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filtering a dataframe on values in 2 columns

I wish to filter a dataframe based on the values of 2 columns, so in the example below if we see more than one location under a number/code I wish to pull out all of that information.

mynewdf
code        location
101         london
101         london
102         manchester
102         manchester
102         bolton
102         oldham
102         oldham
103         leeds
103         huddersfield
103         leeds
104         sheffield
104         sheffield
105         hull

So here im trying to pull out any codes where the code and location where the location is 2 of more seperate locations, what im trying to get is a new df looking like;

mynewdf
code        location
102         manchester
102         manchester
102         bolton
102         oldham
102         oldham
103         leeds
103         huddersfield
103         leeds

I thought about trying to do this by uniquing my df and looking for any duplicates of the code column but wasn’t very successful as I think duplicate doesn’t return all the values?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

base R

mynewdf[with(mynewdf, ave(location, code, FUN = function(z) length(unique(z)) > 1) == "TRUE"),]
#    code     location
# 3   102   manchester
# 4   102   manchester
# 5   102       bolton
# 6   102       oldham
# 7   102       oldham
# 8   103        leeds
# 9   103 huddersfield
# 10  103        leeds

dplyr

library(dplyr)
mynewdf |>
  filter(.by = code, n_distinct(location) > 1)
#   code     location
# 1  102   manchester
# 2  102   manchester
# 3  102       bolton
# 4  102       oldham
# 5  102       oldham
# 6  103        leeds
# 7  103 huddersfield
# 8  103        leeds

This is using dplyr_1.1.0; if you have an older version, replace the .by=c(..) with group_by(..) before the verb.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading