How do I split a string to extract only an uppercase word, or an uppercase word followed by a version number?

I am using Selenium with Python to scrape some file information. I would like to extract only the file type and, where available, the version number, e.g. GML 3.1.1. I’m looking for a way to split the string to do so. My current output is a list that looks like this:

ESRI Shapefile, (50.7 kB)
GML 3.1.1, (124.9 kB)
Google Earth KML 2.1, (126.5 kB)
MapInfo MIF, (53.5 kB)

The script section is as follows:

for file in files:
    file_format = file.text
    print(file_format)

I’m looking for something like strip() that checks whether the word before the comma is uppercase, or uppercase followed by a version number such as 3.1.1. This is the output I’m looking for:

ESRI
GML 3.1.1
KML 2.1
MIF

Solution:

A regex that finds words made up entirely of uppercase letters, optionally followed by a space and a run of digits and dots, would work here:

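import re

s = '''ESRI Shapefile, (50.7 kB)
GML 3.1.1, (124.9 kB)
Google Earth KML 2.1, (126.5 kB)
MapInfo MIF, (53.5 kB)'''

# \b[A-Z]+\b     -> a whole word consisting only of uppercase letters
# (?:\s[\d\.]+)? -> optionally, whitespace followed by digits and dots
print(re.findall(r'\b[A-Z]+\b(?:\s[\d\.]+)?', s))
# ['ESRI', 'GML 3.1.1', 'KML 2.1', 'MIF']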
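
The answer runs the pattern over one multi-line string, whereas the question's loop handles one element's text at a time. A minimal sketch of how the same pattern might be applied per entry is shown below. The helper name extract_format and the entries list are hypothetical; entries stands in for the file.text values from the Selenium loop, which are not reproduced here.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
FORMAT_RE = re.compile(r'\b[A-Z]+\b(?:\s[\d.]+)?')


def extract_format(text):
    """Return the uppercase format name (plus version, if any) from one entry.

    Only the part before the first comma is searched, so the file size in
    parentheses is never considered. Returns None when nothing matches.
    """
    name = text.split(',', 1)[0]
    match = FORMAT_RE.search(name)
    return match.group(0) if match else None


# Hypothetical stand-in for the file.text values scraped in the question.
entries = [
    'ESRI Shapefile, (50.7 kB)',
    'GML 3.1.1, (124.9 kB)',
    'Google Earth KML 2.1, (126.5 kB)',
    'MapInfo MIF, (53.5 kB)',
]

formats = [extract_format(entry) for entry in entries]
print(formats)
```

Inside the original loop this would amount to calling extract_format(file.text) for each element. Splitting on the comma first is a design choice rather than a requirement: it keeps the match confined to the format name even if the size text ever contained an uppercase word.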