Remove accents from specific strings in R

I would like to remove the accent "é" in a large dataset, but only for the strings that appear in a given list.

Here is a small reproducible example:

library(tidyverse)  # attaches dplyr, stringr and tidyr

data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"))

# Strings for which I want to remove the accent
strings <- c("Abbécourt", "Achéres", "Belvezé")
# Anchored regex matching any of those strings exactly (longest first);
# stored under a separate name so that `strings` stays a plain character vector
pattern <- paste(paste0("^", strings[order(-nchar(strings))], "$"), collapse = "|")

What I currently do is:

data <- data %>% dplyr::mutate(territory = gsub("é", "e", territory))

But of course this replaces every "é" in the dataset, not only those in the listed strings.

I can't find a way to get the following output:

territory
1 Abbecourt
2   Acheres
3    Beaumé
4   Belveze
5     Marré

Thank you very much for your help,
Best Regards,

Solution:

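# `strings` must be the plain character vector of names here, not a collapsed regex,
# because %in% tests for exact matches against each element
data %>%
  mutate(territory = case_when(
    territory %in% strings ~ str_replace_all(territory, "é", "e"),  # listed names only
    TRUE ~ territory                                                # everything else unchanged
  ))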

Output:

  territory
1 Abbecourt
2   Acheres
3    Beaumé
4   Belveze
5     Marré
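
For comparison, the same selective replacement can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the names `targets`, `idx` and `idx_regex` are made up here, and the regex variant assumes the listed names contain no regex metacharacters (otherwise they would need escaping first).

```r
# Sample data from the question
data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"))

# Hypothetical name for the vector of strings whose accents should be removed
targets <- c("Abbécourt", "Achéres", "Belvezé")

# Option 1: exact matching with %in% gives a logical index of the rows to change
idx <- data$territory %in% targets

# Option 2: an anchored regex built from the same names selects the same rows
# (assumes the names contain no regex metacharacters)
anchored <- paste0("^(", paste(targets, collapse = "|"), ")$")
idx_regex <- grepl(anchored, data$territory)

# Replace "é" only in the selected rows; all other rows are left untouched
data$territory[idx] <- gsub("é", "e", data$territory[idx], fixed = TRUE)

print(data)
```

Subsetting with a logical index avoids evaluating the replacement on rows that should stay as they are, and `%in%` is usually the simpler choice when the list holds complete values rather than patterns.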