dplyr filter function not working to filter my dataframe in R

I have a dataframe in R with two columns. The class of the first column is "character". However, there are numbers embedded within it … but I presumed these are still technically characters, since running class(column_name) returns "character".

I am trying to filter the dataframe using the dplyr filter function. I want it to return the same dataframe, but without the rows where the value in the ‘doc_id’ column ends with "(2).txt".

I have been trying many things but none have worked.

I have tried:

constitutions <- constitutions %>% filter(!str_detect(doc_id, "(2).txt"))

constitutions <- constitutions[constitutions$doc_id %in% "(2).txt == FALSE]

constitutions %>% filter(!str_detect(doc_id, "(2).txt"))

*Note: This one ^ seems to have gotten rid of only a few of them, but not close to all.

constitutions <- subset(constitutions, !"(2).txt" %in% doc_id)

constitutions <- subset(constitutions, !("(2).txt" %in% consitutions$doc_id))

And MANY more iterations … what am I missing?

P.S. An example of a doc_id column value I am trying to remove from the constitutions dataframe is:

Brazil_1988_rev_2017 (2).txt

Would using a regex within one of the functions above work? I am lost, and running out of ideas.
Any help would be much appreciated.

>Solution :

Does escaping the parentheses and the period like this solve the problem?

constitutions <- constitutions %>% filter(!str_detect(doc_id, "\\(2\\)\\.txt"))

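Parentheses and periods (and a number of other symbols) are special characters in regular expressions: parentheses create a group, and a period matches any single character. The unescaped pattern "(2).txt" therefore means "a 2, then any one character, then txt". That matches a name ending in "2.txt" but not a literal "(2).txt", which is probably why your str_detect attempt removed only a few rows, and likely not the ones you intended. To match a literal parenthesis or period, you have to escape it with a backslash, which is written as a double backslash inside an R string. For example: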

This works:

> "document(2).txt" %>% str_detect("\\(2\\)\\.txt")
[1] TRUE

This doesn’t:

> "document(2).txt" %>% str_detect("(2).txt")
[1] FALSE
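To see which rows each pattern actually flags, here is a minimal sketch using base R's grepl, so it runs without extra packages. The doc_id values other than the Brazil one are made up for illustration, and the trailing `$` anchor is my addition (not part of the pattern above) to restrict the match to the end of the string.

```r
# Hypothetical doc_id values; only the first one comes from the question.
doc_id <- c("Brazil_1988_rev_2017 (2).txt",
            "Brazil_1988_rev_2017.txt",
            "Chile_1980_rev_2012.txt")

# Unescaped: "(2)" is a group containing "2", and "." matches any character,
# so this looks for "2", then any one character, then "txt".
unescaped <- grepl("(2).txt", doc_id)

# Escaped: literal "(", ")", and "."; the "$" anchors the match at the end.
escaped <- grepl("\\(2\\)\\.txt$", doc_id)

print(unescaped)  # FALSE FALSE  TRUE  (flags the wrong row)
print(escaped)    #  TRUE FALSE FALSE  (flags the duplicate)
```

The unescaped pattern flags the name ending in "2.txt" and misses the "(2).txt" duplicate entirely.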

For more about regular expressions, see the regexps chapter of R for Data Science. The whole chapter is useful, but here’s the section about escaping: https://r4ds.hadley.nz/regexps.html#sec-regexp-escaping
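If you would rather avoid escaping altogether, one option is to treat the pattern as a literal string. The sketch below assumes a small stand-in for the constitutions dataframe (the rows and the text column are invented) and shows three ways that should give the same result: stringr's fixed() wrapper, str_ends(), and base R's endsWith().

```r
library(dplyr)
library(stringr)

# Stand-in data; the real constitutions dataframe is not shown in the question.
constitutions <- data.frame(
  doc_id = c("Brazil_1988_rev_2017 (2).txt",
             "Brazil_1988_rev_2017.txt",
             "Chile_1980_rev_2012.txt"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# fixed() makes str_detect compare the pattern literally, so no escaping is needed.
kept_fixed <- constitutions %>%
  filter(!str_detect(doc_id, fixed("(2).txt")))

# str_ends() only matches at the end of the string.
kept_ends <- constitutions %>%
  filter(!str_ends(doc_id, fixed("(2).txt")))

# Base R equivalent: endsWith() always compares literally.
kept_base <- constitutions[!endsWith(constitutions$doc_id, "(2).txt"), ]

print(kept_fixed$doc_id)
```

As a side note, %in% tests whether whole values are equal rather than whether one string contains another, so the subset() attempts in the question could not match a suffix.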
