Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter on combinations of collumns in dataframe (R)

Helllo,

I am trying to filter a dataframe in R using dplyr where I want to check on a combination of 2 columns. Lets say I have the following dataframe;

df <- data.frame(val1 = c("a","b","c","s","s","s"),
             val2 = c(10, 5, 2, 2, 10, 5))

resulting in the following df;

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  val1 val2
1    a   10
2    b    5
3    c    2
4    s    2
5    s   10
6    s    5

I want to filter out all rows where val1 == ‘s’ EXCEPT when val2 on the same row is 5 or 10. So the result should look like this;

  val1 val2
1    a   10
2    b    5
3    c    2
4    s   10
5    s    5

I want to use the filter function from dplyr because I am using it with other filters in a query on a oracle database. I tried the following lines, but they do not work for they either filter out all rows where val1 == "s" or all rows where val2 == 5 or val2 == 10 (so in this example, including the rows where val1 == "a" or val1 == "b";

filter(val1 == "s" & val2 == 10 | val2 == 5)
filter(val1 == "s" && val2 == 10 | val2 == 5)
filter(val1 == "s" & (val2 == 10 | val2 == 5))
filter(val1 == "s" && (val2 == 10 | val2 == 5))

I tried looking up similar questions but they do not seem to answer my problem.
I make it work by combining val1 and val2 as val3, and then I filter out all rows where val3 == "s5" or val3=="s10", but as I said, I intend to use the filter in a query to a database. Therefore, I would prefer to filter as much as possible in the query itself because my actual dataframe consists of hundreds of thousands of rows and 50 columns.

I do not think it would be hard, but I just can’t seem to get it. Any help will be greatly appreciated!

>Solution :

This works:

df %>% filter(!(val1 == 's' & !val2 %in% c(10, 5)))

Or equivalently:

df %>% filter((val1 != 's' | val2 %in% c(10, 5)))

output:

  val1 val2
1    a   10
2    b    5
3    c    2
4    s   10
5    s    5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading