I have a data.frame as follows:
df = data.frame(sp_name = c("Xylopia brasiliensis", "Xylosma tweediana", "Zanthoxylum fagara subsp. lentiscifolium", "Schinus terebinthifolia var. raddiana", "Eugenia"), value = c(1, 2, 3, 4, 5))
Here's the deal: I only want to keep the rows of df whose sp_name contains exactly two words (in my case, Xylopia brasiliensis and Xylosma tweediana). How can I do this? My attempts with the filter function from the tidyverse have all failed.
Thanks in advance.
>Solution :
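We can use str_count to count the words (runs of word characters, \\w+) in each name and compare that count to 2, which gives the logical vector that filter expects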
library(dplyr)
library(stringr)
df %>%
  filter(str_count(sp_name, "\\w+") == 2)
-output
               sp_name value
1 Xylopia brasiliensis     1
2    Xylosma tweediana     2
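One thing worth checking on real data: \\w+ matches runs of letters, digits and underscores, so a hyphen splits a name into extra "words". The sketch below uses hypothetical names that are not in the original df to show the difference between counting word-character runs (\\w+) and counting whitespace-separated tokens (\\S+). Whether a hyphenated epithet should count as one word is an assumption that depends on your data.

```r
library(stringr)

# Hypothetical names for illustration only (not from the original df)
x <- c("Xylopia brasiliensis", "Arctostaphylos uva-ursi", "Eugenia")

# Runs of word characters: the hyphen splits "uva-ursi" into two matches
word_runs <- str_count(x, "\\w+")

# Runs of non-whitespace characters: "uva-ursi" stays a single token
token_runs <- str_count(x, "\\S+")

print(data.frame(x, word_runs, token_runs))
```

If hyphenated names do occur, swapping "\\w+" for "\\S+" inside filter keeps them as two-word names.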
Or this can be done with str_detect and an anchored pattern: match a word (\\w+) at the start (^) of the string, followed by a single space and another word (\\w+) at the end ($) of the string
df %>%
  filter(str_detect(sp_name, "^\\w+ \\w+$"))
Or in base R with grepl and subset
subset(df, grepl("^\\w+ \\w+$", sp_name))
               sp_name value
1 Xylopia brasiliensis     1
2    Xylosma tweediana     2
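For comparison, here is a minimal base R sketch that avoids anchored patterns by splitting each name on whitespace and counting the pieces. The helper name n_words is introduced here for illustration, and the trimws call assumes that stray leading or trailing spaces should be ignored, which the original question does not state.

```r
df <- data.frame(
  sp_name = c("Xylopia brasiliensis", "Xylosma tweediana",
              "Zanthoxylum fagara subsp. lentiscifolium",
              "Schinus terebinthifolia var. raddiana", "Eugenia"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE  # keep sp_name as character on older R versions
)

# Split each name on runs of whitespace and count the resulting pieces
# (n_words is a hypothetical helper name, not part of the original answer)
n_words <- lengths(strsplit(trimws(df$sp_name), "\\s+"))

# Keep only the rows whose name has exactly two pieces
res <- df[n_words == 2, ]
print(res)
```

Because this counts whitespace-separated pieces rather than \\w+ matches, a name containing a hyphen or a period is still counted by its spaces alone.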