Return only values if pattern is matched in gsub

I have a character vector containing bills in this form:

bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

I want to extract the weight of the products and I did so with:

gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$", "\\1", bills)
# [1] "250"       "0,5"       "3425 Milk"

This mostly works: it correctly returns 250 and 0,5 for the first two entries. But why does it return the whole third entry, "3425 Milk"? I thought that by using "\\1" I was telling gsub to extract the first capturing group, which here is (\\d*,*\\d+). So I would expect the last entry to be NA or an empty string. This is my expected output:

expected <- c("250", "0,5", NA) # OR
expected <- c("250", "0,5", "")
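A quick way to see what is going on is to check which elements the pattern matches at all. A minimal check, reusing the question's bills vector and pattern (stored in a variable called pattern only for convenience):

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
pattern <- ".*\\s(\\d*,*\\d+)\\s*(g|kg)$"  # the pattern from the question

# TRUE where the pattern matches somewhere in the element
grepl(pattern, bills)
# [1]  TRUE  TRUE FALSE
```

The third element never matches, and gsub() returns elements without a match unchanged.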

Solution:

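gsub only replaces the parts of a string that match the pattern. "\\1" is a replacement string, not an extraction, so an element with no match at all is returned unchanged. To avoid that, you can add an alternation that matches everything.

As long as your replacement string introduces no new characters (it only recombines captured groups, such as \\1 or \\1\\3\\2), any element that fails the first alternative is matched by .* instead. The groups are then empty, so the element is replaced with an empty string: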

gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$|.*", "\\1", bills)
# [1] "250"  "0,5" "" 
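If you prefer NA rather than an empty string for non-matching entries (the other expected output in the question), one option is to test for a match first with grepl() and substitute only where it succeeds. A minimal sketch, reusing the original pattern; the variable names pattern and weights are just for illustration:

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
pattern <- ".*\\s(\\d*,*\\d+)\\s*(g|kg)$"

# sub() is enough: the pattern consumes the whole string, so there is one match at most.
# Elements that don't match get NA instead of being passed through unchanged.
weights <- ifelse(grepl(pattern, bills), sub(pattern, "\\1", bills), NA)
weights
# [1] "250" "0,5" NA
```

This avoids relying on the alternation and makes the "no weight found" case explicit.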

I'd also change ,* to ,?, since I don't believe your input is valid if it contains something like 1,,,5g.
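To see the difference between the two quantifiers, here is a small check. The fourth bill, with the malformed weight 1,,,5g, is made up for illustration, and loose and strict are just illustrative variable names:

```r
# Hypothetical extra entry with a malformed weight, for illustration only
bills2 <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk",
            "9999 Flour 1,,,5g")

loose  <- ".*\\s(\\d*,*\\d+)\\s*(g|kg)$"  # any number of commas
strict <- ".*\\s(\\d*,?\\d+)\\s*(g|kg)$"  # at most one comma

grepl(loose, bills2)
# [1]  TRUE  TRUE FALSE  TRUE
grepl(strict, bills2)
# [1]  TRUE  TRUE FALSE FALSE
```

With ,? the malformed entry no longer matches the first alternative, so it falls through to the .* branch like "3425 Milk".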
