I need to extract a substring from each of the following elements of an R data frame column. Specifically, I need the part of each string that comes before the first number or the first opening bracket ('['). The trailing space should be removed as well.
[1] "Arturo Beniamo 29 10 2015.docx"
[2] "Arturo Beniamo [30 12 2015].docx"
[3] "Dominici Leonardo 02 06 2019.docx"
[4] "Didonna Marco 07 09 2023.docx"
This should be the result:
[1] "Arturo Beniamo"
[2] "Arturo Beniamo"
[3] "Dominici Leonardo"
[4] "Didonna Marco"
Solution:
You can use sub():
# Example input from the question
x <- c("Arturo Beniamo 29 10 2015.docx", "Arturo Beniamo [30 12 2015].docx",
       "Dominici Leonardo 02 06 2019.docx", "Didonna Marco 07 09 2023.docx")
# Delete everything from the first whitespace that is followed by a digit or "["
sub('\\s(\\d|\\[).*', '', x)
#[1] "Arturo Beniamo" "Arturo Beniamo" "Dominici Leonardo" "Didonna Marco"
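The pattern matches the first whitespace character (\\s) that is followed by either a digit (\\d) or an opening square bracket (\\[), together with everything after it (.*). sub() replaces that match with an empty string, so the trailing space before the date is removed too.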
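To check exactly what the pattern deletes, the matched part can be extracted with regexpr() and regmatches() before calling sub(). This is a minimal sketch; the data frame df and its columns file and name are hypothetical stand-ins for the data frame column mentioned in the question.

```r
# Input from the question
x <- c("Arturo Beniamo 29 10 2015.docx", "Arturo Beniamo [30 12 2015].docx",
       "Dominici Leonardo 02 06 2019.docx", "Didonna Marco 07 09 2023.docx")

pattern <- "\\s(\\d|\\[).*"

# regexpr() finds the first match in each string and regmatches() extracts it.
# This is exactly the text that sub() replaces with "".
removed <- regmatches(x, regexpr(pattern, x))
print(removed)

# Applying the same call to a data frame column.
# The data frame and its column names (file, name) are hypothetical.
df <- data.frame(file = x, stringsAsFactors = FALSE)
df$name <- sub(pattern, "", df$file)
print(df$name)
```

Here removed holds the leading space plus the date part of each file name (for example " [30 12 2015].docx"), which confirms that the space is consumed by the match rather than left behind.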