Using re to grab all instances of values between parentheses

I’m using Python’s re module to grab all instances of values between
opening and closing parentheses.

For example,  (A)way(Of)testing(This)

would produce a list:

   ['A', 'Of', 'This']

I took a look at 1 and 2.

This is my code:

import re


sentence = "(A)way(Of)testing(This)is running (it)"

res = re.compile(r".*\(([a-zA-Z0-9|^)])\).*", re.S)
for s in re.findall(res, sentence):
    print(s)

What I get from this is:

it

Then I realized I was capturing only one character, so I used

res = re.compile(r".*\(([a-zA-Z0-9-|^)]*)\).*", re.S)

But I still get only it

I’ve always struggled with regex. My understanding of my search string
is as follows:

  • .* (any character)
  • \( (escapes the opening parenthesis)
  • ( (starts the grouping)
  • [a-zA-Z0-9-|^)]* (set of characters allowed: a-z, A-Z, 0-9, - *EXCEPT* the ")" )
  • ) (closes the grouping)
  • \) (escapes the closing parenthesis)
  • .* (anything else)

So in theory it should go through sentence and once it encounters a (,
it should copy the contents up until it encounters a ), at which point it should
store that into one group. It then proceeds through the sentence.

I even used the following:

  res = re.compile(r".*\(([a-z|A-Z|0-9|-|^)]*)\).*", re.S)

But it still returns only it.

Any help greatly appreciated,

Thanks

Solution:

You can shorten the pattern: drop the .* on both sides, drop the |^) inside the character class, and keep only the characters you actually want to match.

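The leading .* is greedy: it consumes as much of the string as it can and backtracks only as far as the last opening parenthesis. Together with the trailing .*, the pattern therefore matches the whole sentence in a single match. As the pattern contains only one capture group, re.findall returns just one value, the last parenthesized one: it.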
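A minimal sketch of that behaviour, reusing the sentence and the second pattern from the question. The names greedy and matches are only illustrative; finditer is used so that the full match can be inspected alongside the captured group.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

# The pattern from the question, including the leading and trailing .*
greedy = re.compile(r".*\(([a-zA-Z0-9-|^)]*)\).*", re.S)

matches = list(greedy.finditer(sentence))

print(len(matches))                     # number of separate matches found
print(matches[0].group(0) == sentence)  # does the match span the whole sentence?
print(matches[0].group(1))              # the only captured value
```

Running this should show a single match that covers the entire sentence, with it as the captured group.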

In your explanation of [a-zA-Z0-9-|^)]*, the |^) part does not exclude ) from the character class. Inside a character class these are ordinary literal characters, so the class additionally matches a |, a ^, or a ).
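This can be checked directly by testing single characters against the class from the question. The sketch below is only an illustration; char_class and results are hypothetical names.

```python
import re

# The character class from the question, without the * quantifier
char_class = re.compile(r"[a-zA-Z0-9-|^)]")

# fullmatch succeeds only if the single character belongs to the class
results = {ch: bool(char_class.fullmatch(ch)) for ch in ["|", "^", ")", "-", "(", " "]}

for ch, is_member in results.items():
    print(repr(ch), is_member)
```

The |, ^, ) and - characters are all accepted, while ( and a space are not.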

If you want to use a negated character class, the ^ has to be the first character of the class, like [^)]. That is not necessary here, as you can specify what you want to match instead of what you don’t want to match.
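For comparison, here is a sketch of the negated variant next to the explicit character class. The mixed string is a made-up input, chosen only to show where the two patterns differ.

```python
import re

sentence = "(A)way(Of)testing(This)is running (it)"

# Negated class: capture anything up to the next closing parenthesis
negated = re.compile(r"\(([^)]*)\)")
# Explicit class: capture only letters, digits and hyphens
explicit = re.compile(r"\(([a-zA-Z0-9-]*)\)")

print(negated.findall(sentence))
print(explicit.findall(sentence))

# Hypothetical input containing a space and an underscore
mixed = "(a b)(c_d)(e-f)"
print(negated.findall(mixed))   # keeps all three values
print(explicit.findall(mixed))  # keeps only the value made of listed characters
```

Both patterns give the same list for the sentence in the question. They differ only when the parentheses contain characters outside the listed ranges.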

\(([a-zA-Z0-9-]*)\)

The pattern matches:

  • \( Match (
  • ( Capture group 1
    • [a-zA-Z0-9-]* Match zero or more characters from the ranges a-z, A-Z, 0-9, or a literal -
  • ) Close group 1
  • \) Match )

You don’t need re.S, as the pattern contains no dot that would have to match a newline.
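A quick way to confirm this is to run the shortened pattern with and without the flag on input that contains a newline. The multiline string here is a made-up example.

```python
import re

# Hypothetical input with a newline between the parenthesized values
multiline = "(A)way\n(Of)testing"
pattern = r"\(([a-zA-Z0-9-]*)\)"

without_flag = re.findall(pattern, multiline)
with_flag = re.findall(pattern, multiline, re.S)

# re.S only changes what the dot matches, so the results are identical
print(without_flag == with_flag)
```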

import re

sentence = "(A)way(Of)testing(This)is running (it)"
res = re.compile(r"\(([a-zA-Z0-9-]*)\)")
print(res.findall(sentence))

Output

['A', 'Of', 'This', 'it']