Unexpected tidyverse filter behaviour (%in% doesn't work with pull())

July 19, 2023

When I try to filter a data frame with the %in% operator and a piped pull() subquery on its right-hand side, it does not work.
However, when I store the result of the pull() subquery in a variable and then use %in% on that variable, it does work.

As an example, I use the well-known mtcars dataset.

library(tidyverse)
mydf <- tibble(mtcars)

Say I want all the observations that share the same combination of cyl, am, and vs with at least one other observation.
The following code does not work:

mydf |> filter(mpg %in% 
mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg)
)

Error:

Error in `filter()`:
ℹ In argument: `pull(...)`.
Caused by error in `UseMethod()`:
! no applicable method for 'filter' applied to an object of class "logical"

However, the same structure works when I use a variable:

mpg_as_var <- mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg)
mydf |> filter(mpg %in% mpg_as_var)

I don’t want just the duplicates, but also the first occurrence of each duplicated combination. Otherwise it would have been a simple filter(duplicated()) query.

Got any ideas?

>Solution :

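Wrap the piped expression that creates the vector in parentheses. %in% and |> have the same operator precedence and are grouped left to right, so without parentheses R parses the call as (mpg %in% mydf) |> filter(...) |> pull(mpg), which is why filter() ends up being applied to a logical vector. For example: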

      mydf |> filter(mpg %in% 
                       (mydf |> filter(duplicated(paste0(cyl,am,vs))) |> pull(mpg))
      )

# A tibble: 28 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 6  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
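
To see why the parentheses matter, it can help to look at how R parses the two forms without evaluating them. The sketch below uses only base R (R >= 4.1 for the native pipe); a, b and f are hypothetical placeholder names, not objects from the question.

```r
# Base R only; requires R >= 4.1 for the native pipe |>.
# a, b and f are hypothetical placeholders: quote() captures the parsed
# expression and never evaluates it, so they do not need to exist.

# Without parentheses: %in% and |> share the same precedence and group
# left to right, so the pipe receives all of `a %in% b` as its input.
unparenthesised <- quote(a %in% b |> f())
print(unparenthesised)  # the call f() wrapped around a %in% b

# With parentheses: the pipe is resolved first, inside the parentheses,
# and %in% then compares `a` against its result.
parenthesised <- quote(a %in% (b |> f()))
print(parenthesised)    # a %in% applied to the result of f(b)
```

In the question's code, a corresponds to mpg, b to mydf, and f() to the filter(...) |> pull(mpg) chain, which matches the error about filter() being applied to an object of class "logical".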