I'm trying to create a new column that indicates whether the value in one column is a substring of the value in another. Using if_else() with grepl() works when the pattern is a constant, but not when I compare two columns to each other.
library(dplyr)
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "second st", "third st"))
df <- df %>% dplyr::mutate(test = if_else(grepl("st", col2, fixed = TRUE), 1, 0)) # WORKS
df <- df %>% dplyr::mutate(test2 = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)) # WARNING, wrong result
Warning message:
Problem with `mutate()` column `test`.
i `test = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)`.
i argument 'pattern' has length > 1 and only the first element will be used
> df
           col1            col2 test test2
1  first street first street #6    1     1
2     second st       second st    1     0  <--- should be 1
3 third st apt1        third st    1     0
Why can't I pass a column as the pattern in grepl()? Other functions handle two columns fine inside mutate(); for instance, test3 = paste(col1, col2) returns the expected result.
>Solution :
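grepl() is not vectorized over its pattern argument: as the warning says, it uses only the first element of col1 ("first street") and tests every row of col2 against it. You could use rowwise() before the mutate() so that grepl() receives one pattern per row, or you could use str_detect() from stringr, which is vectorized over both the string and the pattern: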
library(tidyverse)
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))
df <- df %>% rowwise() %>% dplyr::mutate(test2 = if_else(grepl(col1, col2, fixed = TRUE), 1, 0))
df
#> # A tibble: 3 × 3
#> # Rowwise:
#>   col1          col2            test2
#>   <chr>         <chr>           <dbl>
#> 1 first street  first street #6     1
#> 2 second st     2nd st              0
#> 3 third st apt1 third st            0
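The rowwise() version works because grepl() is then called once per row with a single pattern. The same idea can be sketched in base R by pairing the two columns with mapply(). This is only an illustration: the helper name is_substring is made up, and the data is the question's original df (with "second st" in both columns), not the modified one used in this answer.

```r
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "second st", "third st"))

# Hypothetical helper: call grepl() once per (pattern, text) pair so that
# every row is tested against its own pattern instead of only the first one.
is_substring <- function(pattern, text) {
  unname(mapply(function(p, x) grepl(p, x, fixed = TRUE), pattern, text))
}

df$test2 <- as.numeric(is_substring(df$col1, df$col2))
df
```

With the question's data, row 2 should now come out as 1, because "second st" is compared with "second st" rather than with "first street".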
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))
df <- df %>% dplyr::mutate(test2 = if_else(str_detect(col2, col1), 1, 0))
df
#>            col1            col2 test2
#> 1  first street first street #6     1
#> 2     second st          2nd st     0
#> 3 third st apt1        third st     0
Created on 2022-02-01 by the reprex package (v2.0.1)
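One caveat: str_detect() treats its pattern as a regular expression by default, whereas the question used fixed = TRUE. If col1 can contain characters such as "." or "(", wrapping it in stringr's fixed() should give a literal match instead. A small sketch, where the third row ("apt.1" vs "aptx1") is made-up data added only to show the difference:

```r
library(dplyr)
library(stringr)

# Hypothetical data: the third pattern contains a regex metacharacter (".").
df <- data.frame(col1 = c("first street", "second st", "apt.1"),
                 col2 = c("first street #6", "second st", "aptx1"))

out <- df %>%
  mutate(
    regex_match   = str_detect(col2, col1),         # "." matches any character
    literal_match = str_detect(col2, fixed(col1))   # "." must be a literal dot
  )
out
```

Here the regex version reports a match for the third row while the literal version does not, so fixed() is the closer equivalent of grepl(..., fixed = TRUE).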