Remove version number (continuous integers and dots) from a string list

I have a list of strings containing program names and version numbers, for example:
['Adobe PDF Library 9.9', 'Adobe PDF Library 8.0', 'Adobe PDF Library 15.0', 'Adobe PDF Library 11.0', 'Mac OS X 10.13.3 Quartz PDFContext'].

I am doing statistical analysis and need to remove all the version numbers, retaining only the program name. A version number may have multiple dot-separated sections and may appear in any part of the string.

Is there an efficient way to do this with a regex that matches the pattern, without writing an explicit for-loop to examine each item manually?

Solution:

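You can use re.sub() with a regex that matches a whitespace character, then one or more digits, then zero or more groups of a dot followed by digits. Applying it in a list comprehension avoids an explicit for-loop: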

import re

l = [
    'Adobe PDF Library 9.9', 
    'Adobe PDF Library 8.0', 
    'Adobe PDF Library 15.0', 
    'Adobe PDF Library 11.0', 
    'Mac OS X 10.13.3 Quartz PDFContext',
    'Notes',
    '100Things 10.2',
    'Photoshop 1'
]

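# Whitespace, then digits, then any number of ".digits" groups.
# The raw string (r'...') keeps the backslashes intact for the regex engine.
rx = re.compile(r'\s\d+(\.\d+)*')

# Strip the version number from every string in the list.
[rx.sub('', s) for s in l]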

Which produces:

['Adobe PDF Library',
 'Adobe PDF Library',
 'Adobe PDF Library',
 'Adobe PDF Library',
 'Mac OS X Quartz PDFContext',
 'Notes',
 '100Things',
 'Photoshop']
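
The pattern above requires whitespace before the version, so a version at the very start of a string is left alone, and a number followed directly by letters (such as "2x") would be cut in the middle of the token. If that matters for your data, a slightly stricter variant could look like the sketch below. The names version_rx and strip_versions are made up for illustration, and the sketch assumes versions are always separated from the surrounding words by whitespace.

```python
import re

# Hypothetical variant of the pattern above (an assumption, not part of the
# original answer):
#   (?:^|\s)      start of the string or a whitespace character
#   \d+(?:\.\d+)* digits, then any number of ".digits" groups
#   (?=\s|$)      must be followed by whitespace or the end of the string
version_rx = re.compile(r'(?:^|\s)\d+(?:\.\d+)*(?=\s|$)')


def strip_versions(names):
    """Return a new list with standalone version numbers removed."""
    # strip() tidies the leading space left behind when the version
    # was the first token in the string.
    return [version_rx.sub('', s).strip() for s in names]


programs = [
    'Adobe PDF Library 9.9',
    'Mac OS X 10.13.3 Quartz PDFContext',
    '10.13.3 Quartz PDFContext',
    '100Things 10.2',
    'Zoom 2x',
]

cleaned = strip_versions(programs)
```

Here "100Things" and "2x" are kept because the digits are not a standalone token, while the leading "10.13.3" is removed.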