I want to build a tool that parses the atomic elements out of a chemical formula.
Say I start with Ba(Co3Ti)2 + 3BrH20. I would first split the formula into its compounds, which is easy enough with let regions = str.replace(/\s/g, '').split(/\+/g);
Now, for each compound, I want to identify each element and its numerical "amount".
So for the example above, for the first compound, I'd want an array like this:
[
"Ba",
[
"Co3",
"Ti"
],
"2"
]
And if finding sub-compounds within parentheses isn't possible, then I could work with this:
[
"Ba",
"(Co3",
"Ti)",
"2"
]
Is this possible with regex?
This is what I've come up with in a few minutes:
let compounds = str.replace(/\s/g, '').split(/\+/g);
for (let r = 0; r < compounds.length; ++r) {
  // TODO: split compounds[r] into its elements and amounts
  let elements = compounds[r];
}
>Solution :
You can use
str.match(/\(?(?:[A-Z][a-z]*\d*|\d+)\)?/g)
Details:
\(? – an optional (
(?:[A-Z][a-z]*\d*|\d+) – either of the two options:
    [A-Z][a-z]*\d* – an uppercase letter, then zero or more lowercase letters, and then zero or more digits
    | – or
    \d+ – one or more digits
\)? – an optional )
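To make the two alternatives concrete, here is a small sketch that runs the same pattern on two sample inputs. The variable names (tokenRe, withCount, coefficient) are only illustrative. Note that match() returns null rather than an empty array when nothing matches.

```javascript
// Same pattern as described above; the sample inputs are only for illustration.
const tokenRe = /\(?(?:[A-Z][a-z]*\d*|\d+)\)?/g;

// An element symbol with a count is matched by the first alternative.
const withCount = 'Co3'.match(tokenRe);

// A bare number (here a leading coefficient) is matched by the \d+ alternative.
const coefficient = '3BrH20'.match(tokenRe);

console.log(withCount);   // ["Co3"]
console.log(coefficient); // ["3", "Br", "H20"]
```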
See a JavaScript demo:
const str = 'Ba(Co3Ti)2';
const re = /\(?(?:[A-Z][a-z]*\d*|\d+)\)?/g;
let compounds = str.match(re);
console.log(compounds); // ["Ba", "(Co3", "Ti)", "2"]
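If the nested form from the question is preferred, the flat tokens can be folded into sub-arrays in a second pass. The following is a minimal sketch, not part of the regex itself: nestTokens is a hypothetical helper, and it assumes balanced parentheses with a single level of nesting, as in the example. The pattern only allows one optional ( per token, so directly adjacent opening parentheses such as "((" would not be captured.

```javascript
const tokenRe = /\(?(?:[A-Z][a-z]*\d*|\d+)\)?/g;

// Hypothetical helper: turns ["Ba", "(Co3", "Ti)", "2"] into
// ["Ba", ["Co3", "Ti"], "2"]. Assumes balanced parentheses.
function nestTokens(tokens) {
  const root = [];
  const stack = [root]; // the last entry is the group currently being filled

  for (const token of tokens) {
    const opens = token.startsWith('(');
    const closes = token.endsWith(')');
    // Strip the parenthesis characters to get the bare element or number.
    const core = token.slice(opens ? 1 : 0, closes ? -1 : undefined);

    if (opens) {
      const group = [];
      stack[stack.length - 1].push(group);
      stack.push(group);
    }
    if (core) stack[stack.length - 1].push(core);
    if (closes && stack.length > 1) stack.pop();
  }
  return root;
}

// Split the whole formula into compounds, then tokenize and nest each one.
const formula = 'Ba(Co3Ti)2 + 3BrH20';
const parsed = formula
  .replace(/\s/g, '')
  .split('+')
  .map((compound) => nestTokens(compound.match(tokenRe) || []));

console.log(JSON.stringify(parsed));
// [["Ba",["Co3","Ti"],"2"],["3","Br","H20"]]
```

The stack keeps track of which array new tokens belong to, so the regex stays simple and the grouping logic lives in ordinary code.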