Regex: match strings that end with one or more + characters

I have a text file that lists packages (name, ID, current version, new version, source) extracted from the output of winget upgrade. I removed the first two lines and the last line of that output.

Content of the text file:

Brave                        Brave.Brave         111.1.49.120         111.1.49.128        winget
Git                          Git.Git             2.39.2               2.40.0              winget
Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget
Spotify                      Spotify.Spotify     1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget
Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget
PDFsam Basic                 PDFsam.PDFsam       5.0.3.0              5.1.1.0             winget

I am trying to use Python 3 to extract all package IDs, because the output of winget upgrade is plain text only.
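If a regex is not strictly required, the column layout can also be used directly. In the sample rows only the Name column contains spaces, so splitting each line from the right at most four times isolates the ID. This is a minimal sketch under that assumption (it breaks if an ID, version or source value ever contains a space); the function name package_ids is hypothetical.

```python
def package_ids(lines):
    """Return the ID column from winget upgrade rows (header and footer already removed)."""
    ids = []
    for line in lines:
        # Split from the right: the last four columns (ID, current version,
        # new version, source) are assumed to contain no spaces, so whatever
        # is left over on the left is the Name, spaces included.
        parts = line.rsplit(maxsplit=4)
        if len(parts) == 5:
            ids.append(parts[1])
    return ids


rows = [
    "Brave                        Brave.Brave         111.1.49.120         111.1.49.128        winget",
    "Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget",
    "Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget",
]
print(package_ids(rows))  # ['Brave.Brave', 'Notepad++.Notepad++', 'Microsoft.Teams']
```

Because rsplit() with no separator also discards surrounding whitespace, the same function works on lines read from a file, trailing newlines included.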

What I have tried so far:

import re

with open(r"C:\Users\Username\Desktop\winget_upgrade.txt", "r") as f:
    for line in f:
        match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)\b", line)
        if match:
            print(match.group(1))

The output is:

Brave.Brave
Git.Git
Notepad++.Notepad
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam

The problem is that the Notepad++ package ID loses its two trailing + characters.
How can I change my regex so that it displays:

Notepad++.Notepad++ instead of Notepad++.Notepad

I suspect I need to change something in the + part of the pattern: ()+\-.]*\+*)

But I am not sure what.
Can you help me?

Solution:

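The problem is caused by the final \b. A word boundary only matches between a word character (letter, digit or underscore) and a non-word character, or at the start or end of the string next to a word character. Both + and space are non-word characters, so there is no boundary after the last +. The regex engine therefore backtracks until it finds one, which is between the d of Notepad and the first trailing +.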
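To see the word-boundary behaviour in isolation, here is a small check on a fragment of the Notepad++ line (the variable name tail is only for illustration):

```python
import re

tail = "Notepad++ 8.5"

# '+' and ' ' are both non-word characters, so there is no \b after the last '+'.
print(re.search(r"\+\+\b", tail))  # None

# There IS a \b between 'd' (word character) and '+' (non-word character);
# this is where the original pattern ends its match.
print(re.search(r"d\b", tail).start())  # 6

# A lookahead for whitespace succeeds right after the last '+'.
print(re.search(r"\+\+(?=\s)", tail).end())  # 9
```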

Use the lookahead (?=\s) instead, which only requires that the match be followed by a whitespace character:

import re

lines = [
'Brave                        Brave.Brave         111.1.49.120         111.1.49.128        winget',
'Git                          Git.Git             2.39.2               2.40.0              winget',
'Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget',
'Spotify                      Spotify.Spotify     1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget',
'Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget',
'PDFsam Basic                 PDFsam.PDFsam       5.0.3.0              5.1.1.0             winget',
    ]

for line in lines:
    match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)(?=\s)", line)
    if match:
        print(match.group(1))

Output:

Brave.Brave
Git.Git
Notepad++.Notepad++
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam
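Applied back to a file-reading loop like the one in the question, a sketch could compile the pattern once and iterate over the file object. Two small variations on the answer are assumptions of this sketch, not requirements for the sample data: the trailing \+* is dropped because + is already in the preceding character class, and the negative lookahead (?!\S) ("not followed by a non-space character") replaces (?=\s), so a match at the very end of a line without trailing whitespace is also accepted. io.StringIO stands in for open(...) here; extract_ids and PACKAGE_ID are hypothetical names.

```python
import io
import re

# The answer's pattern, minus the redundant \+*, with (?!\S) instead of (?=\s).
PACKAGE_ID = re.compile(
    r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*)(?!\S)"
)


def extract_ids(lines):
    """Yield the first package-ID-like token found on each line."""
    for line in lines:
        match = PACKAGE_ID.search(line)
        if match:
            yield match.group(1)


# A real file object (open(path, "r")) is iterated the same way.
sample = io.StringIO(
    "Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget\n"
    "Brave                        Brave.Brave"  # no trailing whitespace after the ID
)
print(list(extract_ids(sample)))  # ['Notepad++.Notepad++', 'Brave.Brave']
```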