Extract uppercase words till the first lowercase letter

I need to extract the first part of a text, which is in uppercase up to the first lowercase letter.

For example, I have the text: "IV LONG TEXT HERE and now the Text End HERE"

I want to extract "IV LONG TEXT HERE".

I have been trying something like this:

text <- "IV LONG TEXT HERE and now the Text End HERE"

stringr::str_extract_all(text, "[A-Z]")

but I can't get the regex right: this returns each uppercase letter as a separate match.

Solution:

You could use str_extract, which returns only the first match, with a pattern that matches a single uppercase char and then optionally matches a run of spaces and uppercase chars ending with another uppercase char.

\b[A-Z](?:[A-Z ]*[A-Z])?\b

Explanation

  • \b[A-Z] A word boundary to prevent a partial word match, then match a single char A-Z
  • (?: Non-capturing group to match as a whole
    • [A-Z ]*[A-Z] Match zero or more chars A-Z or spaces, then a single char A-Z
  • )? Close the non-capturing group and make it optional
  • \b A word boundary
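If you are not using stringr, a base R sketch of the same pattern should behave the same way. It assumes perl = TRUE (the PCRE engine) so that the non-capturing group and \b are interpreted as described above; regexpr() returns only the first match, like str_extract.

```r
text <- "IV LONG TEXT HERE and now the Text End HERE"
pattern <- "\\b[A-Z](?:[A-Z ]*[A-Z])?\\b"

# regexpr() locates the first match only; regmatches() extracts its text.
# perl = TRUE selects PCRE so (?:...) and \b work as in the stringr version.
first_match <- regmatches(text, regexpr(pattern, text, perl = TRUE))
print(first_match)
# [1] "IV LONG TEXT HERE"
```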

Example

text <- "IV LONG TEXT HERE and now the Text End HERE"

stringr::str_extract(text, "\\b[A-Z](?:[A-Z ]*[A-Z])?\\b")

Output

[1] "IV LONG TEXT HERE"
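If you take the title literally ("till the first lowercase letter") and only care about the start of the string, a simpler anchored pattern is a possible alternative. This is a sketch, not the answer above: ^[A-Z ]+ takes the leading run of uppercase letters and spaces, and str_trim() removes the trailing space. Unlike the word-boundary pattern, it stops at the first lowercase letter even in the middle of a word (so "IV Text" gives "IV T"), and it returns NA when the text starts with a lowercase letter. Both functions are vectorised, so the sketch runs on several strings at once; the second input string is a made-up example.

```r
library(stringr)

texts <- c(
  "IV LONG TEXT HERE and now the Text End HERE",
  "ABC DEF then more"  # hypothetical extra input
)

# Leading run of uppercase letters and spaces, then drop the trailing space
leading_upper <- str_trim(str_extract(texts, "^[A-Z ]+"))
print(leading_upper)
# [1] "IV LONG TEXT HERE" "ABC DEF"
```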