
Regular expression for capturing all text starting at one pattern and ending at another

June 1, 2023

I am scraping text data from a PDF using Python. The data I need appears in a recurring block that begins with a numeric pattern and ends with a fixed string. I need a regular expression that captures all of the text in that block, including the start and end patterns themselves.

I have a regular expression that works when I convert the PDF to a .txt file and read that text in. When I use PyPDF2 to extract the text from the PDF pages instead, the same regular expression fails.

The data stream looks like this:

Filed: 8/21/2022\nEntered:  10/21/2022\nDischarged:  01/23/2023\nClosed: 01/30/2023\n17-55018-   \nQRTbk 7 Windows PC\n OS:xxx\nRole: AdminHubertson

The start point is the 17-55018- string, for which I have a regex that works:

[0-9]{2}-[0-9]{5}-
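A quick way to confirm the start-marker regex in Python is sketched below. It assumes the \n sequences in the sample above are real newline characters, as they would appear in a Python repr of the extracted string; the variable names are illustrative only.

```python
import re

# Sample record from the question, assuming each "\n" is a real newline character.
text = (
    "Filed: 8/21/2022\nEntered:  10/21/2022\nDischarged:  01/23/2023\n"
    "Closed: 01/30/2023\n17-55018-   \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson"
)

# Start marker: two digits, a hyphen, five digits, a hyphen.
START = re.compile(r"[0-9]{2}-[0-9]{5}-")

m = START.search(text)
print(m.group())  # 17-55018-
```

The dates use slashes rather than hyphens, so they cannot produce a false start match here.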

The end point is Role: Admin, which is distinctive enough to match on.
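Because the end marker is a fixed string, one option that avoids a multi-line regex altogether is to find the start with the regex, find the end with str.find, and slice between them. This is a sketch of an alternative technique, not the approach used in the answer; the function name extract_record is hypothetical.

```python
import re
from typing import Optional

START = re.compile(r"[0-9]{2}-[0-9]{5}-")
END = "Role: Admin"


def extract_record(text: str) -> Optional[str]:
    """Hypothetical helper: return text from the start marker through END, inclusive."""
    m = START.search(text)
    if m is None:
        return None
    # Search for the end marker only after the start marker.
    end = text.find(END, m.start())
    if end == -1:
        return None
    return text[m.start():end + len(END)]


# Sample record from the question, assuming each "\n" is a real newline.
text = (
    "Filed: 8/21/2022\nEntered:  10/21/2022\nDischarged:  01/23/2023\n"
    "Closed: 01/30/2023\n17-55018-   \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson"
)
print(repr(extract_record(text)))
```

Since str.find does not care about line breaks, this works the same whether or not the extractor inserts newlines inside the block.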

I have tried a number of capture methods, including lookaheads, to get the text I need. I tested them on regex101, where they work, but I cannot get them to work in Python on the text extracted with PyPDF2.

Some patterns I have tried:

[0-9]{2}-[0-9]{5}-\s(\n(?!Role)(.*))*Role: Admin
[0-9]{2}-[0-9]{5}-\.(.*?)Role: Admin
[0-9]{2}-[0-9]{5}-.*(?=Role).*Role: Admin
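A likely reason these patterns fail on the extracted text is that, in Python's re module, . does not match a newline unless the re.DOTALL (re.S) flag is set, and the start and end markers sit on different lines. The sketch below shows this with the third pattern, assuming the \n sequences in the sample are real newlines.

```python
import re

# Sample record from the question, assuming each "\n" is a real newline.
text = (
    "Filed: 8/21/2022\nEntered:  10/21/2022\nDischarged:  01/23/2023\n"
    "Closed: 01/30/2023\n17-55018-   \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson"
)

pattern = r"[0-9]{2}-[0-9]{5}-.*(?=Role).*Role: Admin"

# Without re.DOTALL, "." stops at "\n", so the match cannot reach "Role" on a later line.
print(re.search(pattern, text))  # None

# With re.DOTALL, "." also matches "\n" and the whole block is captured.
print(repr(re.search(pattern, text, re.DOTALL).group()))
```

The other two patterns fail for separate reasons even with re.DOTALL: the first expects the newline right after a single whitespace character, while the sample has three spaces after 17-55018-, and the second requires a literal . after the final hyphen. The (?=Role) lookahead in the third pattern is also redundant, since Role: Admin is matched explicitly right after it.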

Solution:

Try this one, compiled with the re.DOTALL flag so that . also matches the newlines between the two markers:

\d{2}\-\d{5}.*?Role:\sAdmin
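In Python, the pattern above can be run with re.findall. The sketch below uses a hypothetical page that simply contains the question's sample record twice, to show that the lazy .*? stops at the nearest Role: Admin instead of running on to the last one.

```python
import re

# Sample record from the question, assuming each "\n" is a real newline.
record = (
    "Filed: 8/21/2022\nEntered:  10/21/2022\nDischarged:  01/23/2023\n"
    "Closed: 01/30/2023\n17-55018-   \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson"
)
# Hypothetical page text: the same record repeated, standing in for several records on one page.
page_text = record + "\n" + record

# The answer's pattern; re.DOTALL lets ".*?" cross the line breaks inside a record.
RECORD_RE = re.compile(r"\d{2}\-\d{5}.*?Role:\sAdmin", re.DOTALL)

matches = RECORD_RE.findall(page_text)
print(len(matches))  # 2
for block in matches:
    print(repr(block))
```

The \s in Role:\sAdmin also matches a newline, which may help if the extractor happens to break the line between Role: and Admin.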


by MR

Published June 01, 2023

