Questions

Capturing the last group: everything when the first character appears

February 2, 2023

I am trying to capture everything after and including the first non-digit character in the following text:

1         1,486,399.87    5              ORTIZ ASPHALT PAVING INC              909 386-1200  SB PREF CLAIMED
                                                                                                  00814766
                                                            P O BOX 883                       FAX 909 386-1288
                                                            COLTON CA  92324

For example, I want the regex to capture four groups: 1, 1,486,399.87, 5, and ORTIZ ASPHALT PAVING INC 909 386-1200 SB PREF CLAIMED 00814766 P O BOX 883 FAX 909 386-1288 COLTON CA 92324.

The code I have right now is:

# imports (only re is needed for this snippet)
import re

# text (triple-quoted so the string literal can span several lines)
t = """    1         1,486,399.87    5              ORTIZ ASPHALT PAVING INC              909 386-1200  SB PREF CLAIMED
                                                                                                  00814766
                                                            P O BOX 883                       FAX 909 386-1288
                                                            COLTON CA  92324"""

tt = re.search(r"(\d+)\s+(\$?[+-]?\d{1,3}(\,\d{3})*%?(\.\d+)?)\s+(\d+)\s+(\S*)", t)

# number of capturing groups, which is also the index of the last group
ttgroup = len(tt.groups())

print(tt[ttgroup])

It returns only ORTIZ. I suppose the (\S*) group at the end of the pattern needs to change. Is there a way to capture the entire ORTIZ ASPHALT PAVING INC 909 386-1200 SB PREF CLAIMED 00814766 P O BOX 883 FAX 909 386-1288 COLTON CA 92324 in the last group? Thank you so much!

>Solution :

I’d replace the last group, currently (\S*), with (\S.*), since you want to capture the rest of the string. Also add the re.DOTALL flag: by default . does not match a newline, and this is a multiline string:

tt = re.search(r"(\d+)\s+(\$?[+-]?\d{1,3}(\,\d{3})*%?(\.\d+)?)\s+(\d+)\s+(\S.*)", t, re.DOTALL)
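A minimal, self-contained sketch of that suggestion applied to the sample text from the question. The variable names (first, amount, third, rest) are made up for illustration, and the inner groups are written as non-capturing (?:...) so that only four groups come back. That is a variation on the pattern above, not part of the original answer. The last step collapses the runs of spaces and line breaks, on the assumption that the single-spaced form shown in the question is the desired result.

```python
import re

# Sample text from the question; a triple-quoted string keeps the line breaks.
text = """    1         1,486,399.87    5              ORTIZ ASPHALT PAVING INC              909 386-1200  SB PREF CLAIMED
                                                                                                  00814766
                                                            P O BOX 883                       FAX 909 386-1288
                                                            COLTON CA  92324"""

# Same pattern as in the answer, except the inner groups are non-capturing
# (an assumption made here so that groups() returns exactly four items).
pattern_text = r"(\d+)\s+(\$?[+-]?\d{1,3}(?:,\d{3})*%?(?:\.\d+)?)\s+(\d+)\s+(\S.*)"

# Without re.DOTALL, "." stops at the first newline.
first_line_only = re.search(pattern_text, text).group(4)

# With re.DOTALL, "." also matches newlines, so the last group runs to the end.
match = re.search(pattern_text, text, re.DOTALL)
if match is None:
    raise ValueError("pattern did not match the text")

first, amount, third, rest = match.groups()

# Collapse every run of whitespace (spaces and newlines) into a single space.
rest = " ".join(rest.split())

print(first, amount, third)
print(rest)
```

If the original spacing and line breaks matter, skip the " ".join(rest.split()) step and use match.group(4) as it is.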

regex

byMR

Published February 02, 2023
