When I filter a data frame with the %in% operator and an inline pull() sub-pipeline on the right-hand side, it does not work.
However, when I store the result of the pull() sub-pipeline in a variable and then use %in% on that variable, it does work.
As an example, I use the well-known mtcars dataset.
library(tidyverse)
mydf <- tibble(mtcars)
Say I want all the observations that share their cyl + am + vs combination with at least one other observation.
The following code does not work:
mydf |> filter(mpg %in%
mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg)
)
Error:
Error in `filter()`:
ℹ In argument: `pull(...)`.
Caused by error in `UseMethod()`:
! no applicable method for 'filter' applied to an object of class "logical"
However, the same structure works when I use a variable:
mpg_as_var <- mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg)
mydf |> filter(mpg %in% mpg_as_var)
I don’t want only the duplicates, but also the first occurrence of each duplicated combination. Otherwise a simple filter(duplicated()) query would have been enough.
Got any ideas?
>Solution :
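Wrap the inner pipeline in parentheses. %in% and |> have the same operator precedence and are evaluated left to right, so without parentheses R evaluates mpg %in% mydf first and pipes that logical result into filter(), which is what the error message reports. For example: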
mydf |> filter(mpg %in%
(mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg))
)
# A tibble: 28 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
6 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
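
To see why the parentheses matter, here is a minimal base R sketch of how the parser groups the two expressions. It assumes R 4.1 or later (for the native pipe); x, y, f, vals and pool are placeholder names made up for illustration and are not part of the question.

```r
# Requires R >= 4.1 for the native pipe. `x`, `y` and `f` are placeholder
# names that are never evaluated; quote() only captures the parsed call.
without_parens <- quote(x %in% y |> f())
with_parens    <- quote(x %in% (y |> f()))

# `lhs |> f()` is rewritten to `f(lhs)` at parse time, so printing the
# captured calls shows what each pipe received as its left-hand side.
print(without_parens)
print(with_parens)

# The same effect with real values: rev() receives either the logical
# result of %in% or the lookup vector itself.
vals <- c(1, 2, 3)
pool <- c(2, 3, 4)
piped_result  <- vals %in% pool |> rev()    # parsed as rev(vals %in% pool)
nested_result <- vals %in% (pool |> rev())  # parsed as vals %in% rev(pool)
print(piped_result)
print(nested_result)
```

In the question's code the same grouping hands the logical vector mpg %in% mydf to filter(), hence the "no applicable method" error. Separately, matching on mpg keeps every row whose mpg value appears among the duplicated rows, which is not necessarily the same as sharing cyl, am and vs. If the key itself should decide, one possible alternative is to filter on duplicated(key) | duplicated(key, fromLast = TRUE), with key built as paste0(cyl, am, vs).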