str_extract_all syntax

I need some help with stringr::str_extract_all.

x is the name of my data frame. It has a single column, V1:

V1
(A_K9B,A_K9one,A_K9two,B_U10J) 
x = x %>% 
  mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>% 
  mutate(N_.1 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[o][n][e]'), toString))
x = x %>% 
  mutate(N_.2 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[t][w][o]'), toString))

This is my current output:

V1                                N_alph  N_.1     N_.2
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9one  A_K9two 

I am fine with my column N_alph as it is, and I want to keep it separate from the other two. Ideally, though, I would like to avoid typing [o][n][e] and [t][w][o] for the variables that are followed by whole words rather than a single letter. If I use:

x = x %>% 
  mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>% 
  mutate(N_all.words = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[\\w+]'), toString))

Output is:

V1                                N_alph  N_all.words    
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9B,A_K9o,A_K9t 

The desired output would be:

V1                                N_alph  N_all.words    
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9one,A_K9two 

Solution:

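When you use metacharacters like \w, \b, \s, etc., you don’t need the square brackets. Inside square brackets, [\\w+] is a character class that matches exactly one character (either a word character or a literal +), which is why your matches stopped after a single letter (A_K9o, A_K9t). If you do use the square brackets, the + needs to go outside them, as in [\\w]+. Also, the number group should be [0-9], because a character class describes individual characters, not multi-digit numbers: [0-10] is read as the range 0-1 plus the character 0, so it only ever matches 0 or 1. To take numbers higher than 9 into account, repeat the digit class with a {} quantifier (for example [0-9]{1,2}) or simply with the + operator. The final result looks like this: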

library(dplyr)
library(stringr)
x %>% 
  mutate(N_all.words = str_extract_all(V1, 'A_([A-Z][0-9]{1,2})\\w+'))

Resulting in:

                              V1             N_all.words
1 (A_K9B,A_K9one,A_K9two,B_U10J) A_K9B, A_K9one, A_K9two
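The character-class points from the explanation can be checked one at a time on the sample value. This is a small sketch using stringr::str_extract_all on a standalone copy of the V1 string; the variable names (s, only_0_1, and so on) are just illustrative, and the expected values are noted in the comments.

```r
library(stringr)

s <- "(A_K9B,A_K9one,A_K9two,B_U10J)"

# [0-10] is the range 0-1 plus a literal 0, so it only matches "0" or "1"
only_0_1 <- str_extract_all(s, "[0-10]")[[1]]            # "1" "0"

# [0-9] matches any single digit; {1,2} also allows two-digit numbers like 10
one_or_two_digits <- str_extract_all(s, "[0-9]{1,2}")[[1]] # "9" "9" "9" "10"

# [\\w+] is a class that matches ONE character: a word character or a literal "+"
one_char <- str_extract_all(s, "A_K9[\\w+]")[[1]]         # "A_K9B" "A_K9o" "A_K9t"

# \\w+ (equivalently [\\w]+) repeats the word-character match as long as possible
whole_word <- str_extract_all(s, "A_K9\\w+")[[1]]         # "A_K9B" "A_K9one" "A_K9two"

print(only_0_1)
print(one_or_two_digits)
print(one_char)
print(whole_word)
```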

I also created a version that I find a little tidier (note that \w also matches digits and underscores, so it is slightly looser than [A-Z]):

x %>% 
  mutate(N_all.words = str_extract_all(V1, 'A_\\w\\d{1,2}\\w+'))
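Both patterns above still pick up A_K9B, and str_extract_all() returns a list column. If the goal is exactly the desired output (only the codes followed by a word, in a character column like the question's), one possible variant is sketched below. It assumes, as in the sample, that the word suffixes are always lowercase and the single-letter suffix is uppercase; wrapping the result in map_chr(..., toString) is the same approach the question already uses.

```r
library(dplyr)
library(purrr)
library(stringr)

# One-row copy of the sample data from the question
x <- data.frame(V1 = "(A_K9B,A_K9one,A_K9two,B_U10J)", stringsAsFactors = FALSE)

x <- x %>%
  mutate(
    # Capital letter, 1-2 digits, then a single capital letter: A_K9B
    N_alph = map_chr(str_extract_all(V1, "A_[A-Z][0-9]{1,2}[A-Z]"), toString),
    # Capital letter, 1-2 digits, then a lowercase word: A_K9one, A_K9two
    N_all.words = map_chr(str_extract_all(V1, "A_[A-Z][0-9]{1,2}[a-z]+"), toString)
  )

print(x)
```

toString() joins the matches with ", "; if you need them without the space (A_K9one,A_K9two), use paste(collapse = ",") instead.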