How to remove numbers from a string but keep specific groups of numbers in Python

I want to use a Python regular expression to remove numbers from a string but keep the numbers 754 and 1231, because they refer to tax section code 754 and section code 1231. For example, I have the text data below:

test="""Dividends 9672
Dividends 9680
Interest Income
Ordinary Dividends
Royalties
Capital Gain Distributions
Income from Blackstone
Ordinary Income
Rental Income
Long Term Capital Gain
Short Term Capital Gain
1231 Gain
Section 754 Stock Basis Adjustment - 2015
M-1 Section 754 Stock Basis Adjustment - 2015
Section 754 Stock Basis Adjustment - 2018
M-1 Section 754 Stock basis adjustment - 2018
"""

and I want the output to be:

Dividends
Dividends
Interest Income
Ordinary Dividends
Royalties
Capital Gain Distributions
Income from Blackstone
Ordinary Income
Rental Income
Long Term Capital Gain
Short Term Capital Gain
1231 Gain
Section 754 Stock Basis Adjustment
M- Section 754 Stock Basis Adjustment
Section 754 Stock Basis Adjustment
M- Section 754 Stock basis adjustment

My solution is:

test=re.sub(r'[^(754)(1231)A-Za-z]','',test)
print(test)

but it doesn't treat 754 or 1231 as whole groups; it only removes the digits 6, 8, and 9.

I greatly appreciate any help. Thanks!

Solution:

You can use

re.sub(r'(754|1231)|[^A-Za-z\s]', r'\1', text)

Here, (754|1231) matches the digit sequence 754 or 1231 and captures it into Group 1, while the alternative [^A-Za-z\s] matches any character other than an ASCII letter or whitespace. Each match is replaced with the Group 1 value, so a captured 754 or 1231 stays in the string, and any other matched character is replaced with an empty string because Group 1 did not participate in that match.
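
As a quick check, here is a minimal sketch that runs the pattern on a shortened copy of the sample data from the question. It assumes Python 3.5 or later, where an unmatched group in the replacement is substituted with an empty string. The per-line `strip()` and the second, hyphen-preserving pattern are illustrative additions, not part of the original answer: the answer's pattern also deletes the hyphen in `M-1`, whereas the desired output in the question keeps `M-`.

```python
import re

# A shortened copy of the sample data from the question.
text = """Dividends 9672
Interest Income
1231 Gain
Section 754 Stock Basis Adjustment - 2015
M-1 Section 754 Stock Basis Adjustment - 2018
"""

# Group 1 keeps 754/1231; every other non-letter, non-whitespace char is dropped.
cleaned = re.sub(r'(754|1231)|[^A-Za-z\s]', r'\1', text)

# Removing " - 2015" leaves trailing spaces, so strip each line.
lines = [line.strip() for line in cleaned.splitlines()]

# Assumed variant (not from the answer): also allow "-" so that "M-" survives,
# then trim the dangling " - " left at the end of a line.
kept = re.sub(r'(754|1231)|[^A-Za-z\s\-]', r'\1', text)
lines_with_hyphen = [line.rstrip(' -') for line in kept.splitlines()]

for line in lines_with_hyphen:
    print(line)
```

With the answer's pattern the last line comes out as `M Section 754 Stock Basis Adjustment`; the variant yields `M- Section 754 Stock Basis Adjustment`, which matches the output requested in the question.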

Note: to keep 754 and 1231 only when they appear as whole numbers, and not as part of a longer digit run such as 17540, add digit boundaries:

re.sub(r'(?<!\d)(754|1231)(?!\d)|[^A-Za-z\s]', r'\1', text)
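
To illustrate the difference the digit boundaries make, here is a small sketch comparing both patterns. The `sample` string and the `normalize` helper are made up for this illustration and do not come from the question's data.

```python
import re

# Hypothetical input: 754 and 1231 appear both alone and inside longer numbers.
sample = "Section 754 basis 17540 and 1231 Gain 12310"

def normalize(s):
    # Collapse runs of whitespace left behind after characters are removed.
    return " ".join(s.split())

# Without boundaries, the 754 inside 17540 and the 1231 inside 12310 are kept.
loose = normalize(re.sub(r'(754|1231)|[^A-Za-z\s]', r'\1', sample))

# With lookarounds, a match is kept only when no digit precedes or follows it.
strict = normalize(re.sub(r'(?<!\d)(754|1231)(?!\d)|[^A-Za-z\s]', r'\1', sample))

print(loose)   # Section 754 basis 754 and 1231 Gain 1231
print(strict)  # Section 754 basis and 1231 Gain
```

The lookbehind `(?<!\d)` and lookahead `(?!\d)` consume no characters, so they only decide whether the capture is allowed; digits that fail the check fall through to the second alternative and are removed.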