I have a character vector containing bills in this form:
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")
I want to extract the weight of the products and I did so with:
gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$", "\\1", bills)
"250" "0,5" "3425 Milk"
This kind of works, since it correctly returns 250 and 0,5 for the first two entries, but why does it return the whole third entry "3425 Milk"? I thought that by using "\\1" I would tell gsub to extract the first capture group, which here is (\\d*,*\\d+). I would therefore expect the last entry to be NA or an empty string. This is my expected output:
expected <- c("250", "0,5", NA) # OR
expected <- c("250", "0,5", "")
>Solution :
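You can add an alternation that matches everything else. gsub only replaces the part of a string that the pattern matches, so a string with no match at all (here "3425 Milk") is returned unchanged.
If your replacement string contains nothing but backreferences (for example \\1 or \\1\\3\\2), any input matched only by the .* branch is replaced with an empty string, because the capture groups are empty for that branch: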
gsub(".*\\s(\\d*,*\\d+)\\s*(g|kg)$|.*", "\\1", bills)
# [1] "250" "0,5" ""
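To see why the original call left the third entry alone, it can help to check which elements the pattern matches at all. This is only a small sketch reusing the `bills` vector and the pattern from the question; the variable name `pattern` is introduced here for illustration.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

# Original pattern from the question (name `pattern` is illustrative)
pattern <- ".*\\s(\\d*,*\\d+)\\s*(g|kg)$"

# Which elements does the pattern match?
matched <- grepl(pattern, bills)
matched
# [1]  TRUE  TRUE FALSE

# Elements without a match are passed through unchanged
result <- gsub(pattern, "\\1", bills)
result
# [1] "250"       "0,5"       "3425 Milk"
```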
I’d also change ,* to ,?, since input such as 1,,,5g is presumably not valid.
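If you would rather get NA than an empty string, one possible approach (not the only one) is to test for a match first and substitute only where there is one. The sketch below assumes the ,? variant of the pattern; the names `pattern` and `weights` are illustrative.

```r
bills <- c("2940 Green apples 250g", "5435 Bananas 0,5kg", "3425 Milk")

# Pattern from the question with ,* tightened to ,? (illustrative name)
pattern <- ".*\\s(\\d*,?\\d+)\\s*(g|kg)$"

# Extract group 1 where the pattern matches, otherwise return NA
weights <- ifelse(grepl(pattern, bills),
                  sub(pattern, "\\1", bills),
                  NA_character_)
weights
# [1] "250" "0,5" NA
```

Because the pattern is anchored with $ and starts with .*, it can match each string at most once, so sub is sufficient here and gsub is not needed.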