Helllo,
I am trying to filter a dataframe in R using dplyr where I want to check on a combination of 2 columns. Lets say I have the following dataframe;
df <- data.frame(val1 = c("a","b","c","s","s","s"),
val2 = c(10, 5, 2, 2, 10, 5))
resulting in the following df;
val1 val2
1 a 10
2 b 5
3 c 2
4 s 2
5 s 10
6 s 5
I want to filter out all rows where val1 == ‘s’ EXCEPT when val2 on the same row is 5 or 10. So the result should look like this;
val1 val2
1 a 10
2 b 5
3 c 2
4 s 10
5 s 5
I want to use the filter function from dplyr because I am using it with other filters in a query on a oracle database. I tried the following lines, but they do not work for they either filter out all rows where val1 == "s" or all rows where val2 == 5 or val2 == 10 (so in this example, including the rows where val1 == "a" or val1 == "b";
filter(val1 == "s" & val2 == 10 | val2 == 5)
filter(val1 == "s" && val2 == 10 | val2 == 5)
filter(val1 == "s" & (val2 == 10 | val2 == 5))
filter(val1 == "s" && (val2 == 10 | val2 == 5))
I tried looking up similar questions but they do not seem to answer my problem.
I make it work by combining val1 and val2 as val3, and then I filter out all rows where val3 == "s5" or val3=="s10", but as I said, I intend to use the filter in a query to a database. Therefore, I would prefer to filter as much as possible in the query itself because my actual dataframe consists of hundreds of thousands of rows and 50 columns.
I do not think it would be hard, but I just can’t seem to get it. Any help will be greatly appreciated!
>Solution :
This works:
df %>% filter(!(val1 == 's' & !val2 %in% c(10, 5)))
Or equivalently:
df %>% filter((val1 != 's' | val2 %in% c(10, 5)))
output:
val1 val2
1 a 10
2 b 5
3 c 2
4 s 10
5 s 5