Regex capture group extracting elements of a list in a sentence

I have a list of sentences, some of which contain a list of elements written out in sentence form:

index  sentence
0      You can get cars, trucks, planes, and boats.
1      You can get the car, truck, and plane.
2      You should ignore this sentence.

I only want to extract elements from sentences that start with "You can get" or "You can get the". I hope to do this with the pandas extractall method, so that each individual element of the list in a sentence becomes its own match.

Desired output:

index  match  object
0      0      car
       1      truck
       2      plane
       3      boat
1      0      car
       1      truck
       2      plane

I have three main questions:

  1. How to use a lookbehind such as (?<=[Yy]ou can get ) so that the word "the" is not captured
  2. How to include a lookahead such as \w+(?=s)? so that both plural and singular forms of the elements are captured
  3. Is it possible to write a capture group that also extracts each word as an individual element, or should I extract the list in the sentence first (e.g. cars, trucks, planes, and boats) and then run another regex?

Solution:

What about using:

# Keep only sentences starting with "You can get ", then extract each list element.
df.loc[df['sentence'].str.startswith('You can get '),
       'sentence'].str.extractall(r'(?P<object>\S+?)s?\b(?:,|\.$)')

Output:

        object
  match       
0 0        car
  1      truck
  2      plane
  3       boat
1 0        car
  1      truck
  2      plane
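
To check the pattern without a DataFrame, here is a minimal sketch using only Python's standard re module. The names sentences, PATTERN, and extract_objects are made up for this illustration and are not part of the original answer. The final period is escaped as \. so that it matches only a literal dot. The result is keyed by (index, match) to mirror the two index levels in the output above.

```python
import re

# Sample sentences from the question, keyed by their index.
sentences = {
    0: "You can get cars, trucks, planes, and boats.",
    1: "You can get the car, truck, and plane.",
    2: "You should ignore this sentence.",
}

# Lazy word, optional trailing "s" kept outside the group,
# then either a comma or the period that ends the sentence.
PATTERN = re.compile(r"(?P<object>\S+?)s?\b(?:,|\.$)")


def extract_objects(rows):
    """Return {(index, match): object} for sentences starting with 'You can get '."""
    found = {}
    for idx, sentence in rows.items():
        if not sentence.startswith("You can get "):
            continue  # same role as the str.startswith filter
        for match_no, m in enumerate(PATTERN.finditer(sentence)):
            found[(idx, match_no)] = m.group("object")
    return found


result = extract_objects(sentences)
for key, obj in result.items():
    print(key, obj)
```

One caveat of this design: the optional s? strips any trailing "s", so a singular noun that ends in "s" (for example "bus") would come back shortened to "bu". If such words can occur, a separate singularization step is probably safer than handling plurals in the regex.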