Remove version number (continuous integers and dots) from a string list

June 23, 2022

I have a list of strings containing program names and version numbers, for example:
['Adobe PDF Library 9.9', 'Adobe PDF Library 8.0', 'Adobe PDF Library 15.0', 'Adobe PDF Library 11.0', 'Mac OS X 10.13.3 Quartz PDFContext'].

I am doing statistical analysis and need to remove all the version numbers, retaining only the program name. The version number may have multiple dot-separated sections and may appear in any part of the string.

Is there an efficient way to achieve this using a regex to match the pattern, without writing a regular for-loop to examine each item manually?

>Solution :

You can use re.sub() with a regex that matches a whitespace character followed by digits, followed by zero or more groups of a dot and more digits:

import re

l = [
    'Adobe PDF Library 9.9', 
    'Adobe PDF Library 8.0', 
    'Adobe PDF Library 15.0', 
    'Adobe PDF Library 11.0', 
    'Mac OS X 10.13.3 Quartz PDFContext',
    'Notes',
    '100Things 10.2',
    'Photoshop 1'
]

# Raw string so that \s, \d and \. reach the regex engine unchanged
rx = re.compile(r'\s\d+(\.\d+)*')

[rx.sub('', s) for s in l]

Which produces:

['Adobe PDF Library',
 'Adobe PDF Library',
 'Adobe PDF Library',
 'Adobe PDF Library',
 'Mac OS X Quartz PDFContext',
 'Notes',
 '100Things',
 'Photoshop']
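
Note that the pattern above requires whitespace before the version, so a version at the very start of a string (e.g. '9.9 Adobe PDF Library') would be left in place. If that case matters for your data, here is a minimal sketch of one possible variant. It treats a version as a standalone whitespace-delimited token and then collapses the leftover spaces. The names version_rx and strip_versions are only illustrative, and the assumption is that versions are always separated from other text by whitespace:

```python
import re

# Assumption: a version is a whole whitespace-delimited token made of
# digits and dots. The lookarounds require whitespace (or the string
# boundary) on both sides, so '100Things' is left untouched.
version_rx = re.compile(r'(?<!\S)\d+(?:\.\d+)*(?!\S)')

def strip_versions(names):
    """Remove standalone version tokens, then collapse leftover whitespace."""
    return [' '.join(version_rx.sub('', s).split()) for s in names]

print(strip_versions([
    '9.9 Adobe PDF Library',
    'Mac OS X 10.13.3 Quartz PDFContext',
    '100Things 10.2',
]))
```

Unlike the original pattern, this one will not touch a number that is glued to other characters (for example a version followed directly by a comma), so check it against your real data before relying on it.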