Regular Expression to Pull Information from String in Python

November 1, 2022

I am trying to strip everything out of my current string except the actual software versions. This is the string I am working with:

print(CurrentVersion)

which gives the output:

2018, \\\\some\\directory\\is\\here, \\\\some\\directory\\is\\here,  2019, \\\\here\\is\\another\\directory, \\\\here\\is\\another\\directory,  2021, \\\\here\\is\\another\\path_2021,   2020, http://some.will/even/look/like/this,   2022r2,   2023

What I actually want as output is:

2018, 2019, 2020, 2021, 2022r2, 2023

I tried to come up with a regular expression to remove the excess data. It looks like '[0-9, ]' will pull out the digits, commas, and spaces, which gets me closer to my goal. So I came up with this code:

RegexVersion = re.compile(r'[0-9, ]')
CurrentVersion = RegexVersion.search(CurrentVersion)
print (CurrentVersion.group())

But this only prints "2". Based on a regex tester, I expected the result to be closer to my desired output. From there I was planning to use .replace to get rid of the extra commas and spaces, but I can't get that far.

So the question is: how do I strip the current value of "CurrentVersion" down to only the versions, preferably in numerical order?

Solution:
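
For context, the attempt in the question prints only "2" because the character class [0-9, ] matches a single character, and re.search returns just the first match. A minimal sketch of that behaviour, using a shortened stand-in for the input string (the sample value is an assumption for illustration):

```python
import re

# Shortened stand-in for the string in the question (illustrative only).
sample = "2018, \\\\some\\dir,  2019"

single_char = re.compile(r"[0-9, ]")

# search() stops at the first match, and the class matches one character.
first = single_char.search(sample).group()
print(first)  # 2

# findall() returns every match, but each one is still a single character.
chars = single_char.findall(sample)
print(chars[:6])  # ['2', '0', '1', '8', ',', ' ']
```

So the pattern needs to describe a whole version token, and a function that returns all matches is needed rather than re.search.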

You might use a capture group:

(?:^|,\s*)(\d{4}\w*)(?=,|$)

The pattern matches:

(?:^|,\s*) Match either the start of the string, or a comma followed by optional whitespace characters
(\d{4}\w*) Capture 4 digits followed by optional word characters (letters, digits, or underscores) in group 1
(?=,|$) Positive lookahead: assert either a comma or the end of the string directly to the right

Example

import re

# Group 1 captures the version; with one group, re.findall returns only that group.
pattern = r"(?:^|,\s*)(\d{4}\w*)(?=,|$)"

s = ("2018, \\\\\\\\some\\\\directory\\\\is\\\\here, \\\\\\\\some\\\\directory\\\\is\\\\here,  2019, \\\\\\\\here\\\\is\\\\another\\\\directory, \\\\\\\\here\\\\is\\\\another\\\\directory,  2021, \\\\\\\\here\\\\is\\\\another\\\\path_2021,   2020, http://some.will/even/look/like/this,   2022r2,   2023\n")

print(re.findall(pattern, s))

Output

['2018', '2019', '2021', '2020', '2022r2', '2023']
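
The list above is in order of appearance, while the question asks for numerical order. One possible way to get there is to sort on the leading 4-digit year and then join the result. This is only a sketch: the version_key helper and the shortened input string are assumptions for illustration, not part of the original answer.

```python
import re

pattern = r"(?:^|,\s*)(\d{4}\w*)(?=,|$)"

# Shortened version of the input string (illustrative only).
s = "2018, \\\\some\\dir,  2021, \\\\here\\path_2021,   2020, http://some.will/this,   2022r2,   2023\n"


def version_key(version):
    # Hypothetical helper: sort by the leading 4-digit year, then by any suffix.
    return int(version[:4]), version[4:]


versions = sorted(re.findall(pattern, s), key=version_key)
result = ", ".join(versions)
print(result)  # 2018, 2020, 2021, 2022r2, 2023
```

A plain sorted() on the strings would also work for this data, but an explicit key makes the intended ordering visible if the suffixes ever become more varied.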

Another option is to match only years that start with 20, optionally followed by r and 1 or more digits:

(?:^|,\s*)(20\d\d(?:r\d+)?)(?=,|$)
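
A quick sketch of how this stricter pattern behaves. The input values here are made up to show what is accepted and what is rejected:

```python
import re

pattern = r"(?:^|,\s*)(20\d\d(?:r\d+)?)(?=,|$)"

# Hypothetical input: 1999 does not start with 20, and 2021beta has a
# suffix other than r + digits, so neither is returned.
s = "2018, 1999, 2022r2, 2021beta, 2023"

found = re.findall(pattern, s)
print(found)  # ['2018', '2022r2', '2023']
```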

Or match 4 digits followed by any characters other than a comma:

(?:^|,\s*)(\d{4}[^,]*)
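
Because [^,]* also matches spaces and newlines, this variant can capture trailing whitespace along with the version. A small sketch with a made-up, shortened input that shows the effect and one way to clean it up:

```python
import re

pattern = r"(?:^|,\s*)(\d{4}[^,]*)"

# Shortened, hypothetical input with a space before a comma and a trailing newline.
s = "2018, \\\\some\\dir,  2022r2 ,   2023\n"

matches = re.findall(pattern, s)
print(matches)  # ['2018', '2022r2 ', '2023\n']

# Strip the whitespace that [^,]* picked up.
cleaned = [m.strip() for m in matches]
print(cleaned)  # ['2018', '2022r2', '2023']
```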

regex

by MR
