Use Regex to Separate Pairs of Title and URL

I copied a list of books and their URLs from a website. When pasted into a Word doc it becomes one long string, and I’d like to put each title and URL pair on its own line:

Copied list:

Elementary Algebra https://amzn.to/3S7yG0Y Pre-Algebra https://amzn.to/3TpW8HK Discrete Mathematical Structures https://amzn.to/3eBYogb Discrete Mathematics and its Applications https://amzn.to/3TvfThe Discrete and Combinatorial Mathematics https://amzn.to/3CELUfO …etc…

I figure the regex pattern can be something like:
any number of words (which might contain a hyphen or something), then http or https followed by ://, then a
mix of alphanumerics, forward slashes and periods, and then a final space after the
URL to mark the location to split on.

I have this awful-looking pattern that is not capturing all the pairs:

\w*\s\w*\s\w*://\w*.\w*/\w*\s

https://regexr.com/

Also, in Python it returns only the first match, and I can’t figure out how to group it with () and then get all of them with *
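For context, here is a minimal sketch of why only one match may come back. It assumes the call in question was `re.search` (the question does not say which function was used) and uses a deliberately simple URL-only pattern that is not the asker's. `search` returns a single match, while `findall` scans the whole string. The second half shows that putting `*` after a capture group does not collect every repetition: the group keeps only the last one.

```python
import re

# Two pairs from the question's sample string.
s = "Elementary Algebra https://amzn.to/3S7yG0Y Pre-Algebra https://amzn.to/3TpW8HK"

# Illustrative pattern (an assumption, not the asker's): match URLs only.
url = re.compile(r"https?://\S+")

first = url.search(s)      # one Match object for the first hit, or None
all_urls = url.findall(s)  # list of every non-overlapping match

print(first.group(0))
print(all_urls)

# A repeated capture group keeps only its last repetition.
m = re.match(r"(?:(\d),?)*", "1,2,3")
last_digit = m.group(1)
print(last_digit)
```

So the usual route is to write a pattern for one pair and let `findall` or `finditer` repeat it, rather than repeating the group inside the pattern.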

Solution:

Try (regex101):

import re

s = """Elementary Algebra https://amzn.to/3S7yG0Y Pre-Algebra https://amzn.to/3TpW8HK Discrete Mathematical Structures https://amzn.to/3eBYogb Discrete Mathematics and its Applications https://amzn.to/3TvfThe Discrete and Combinatorial Mathematics https://amzn.to/3CELUfO"""

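# Group 1: the title (lazy, so it stops at the first URL that follows).
# Group 2: the URL, i.e. http:// or https:// plus a run of non-whitespace.
pat = re.compile(r"\s*(.*?)\s+(https?://\S+)")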

print(pat.findall(s))

Prints:

[
    ("Elementary Algebra", "https://amzn.to/3S7yG0Y"),
    ("Pre-Algebra", "https://amzn.to/3TpW8HK"),
    ("Discrete Mathematical Structures", "https://amzn.to/3eBYogb"),
    ("Discrete Mathematics and its Applications", "https://amzn.to/3TvfThe"),
    ("Discrete and Combinatorial Mathematics", "https://amzn.to/3CELUfO"),
]
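
To get from that list of tuples to the layout asked for in the question, the pairs can be joined back into text. This is only a sketch: `format_pairs` and its `sep` parameter are hypothetical names introduced here, and the sample string is shortened to three pairs. By default it puts one pair per line; passing `sep="\n"` puts the title and the URL on separate lines instead.

```python
import re

# Shortened sample of the pasted string from the question.
s = (
    "Elementary Algebra https://amzn.to/3S7yG0Y "
    "Pre-Algebra https://amzn.to/3TpW8HK "
    "Discrete Mathematical Structures https://amzn.to/3eBYogb"
)

# Same pattern as above: lazy title, then a run of non-whitespace for the URL.
pat = re.compile(r"\s*(.*?)\s+(https?://\S+)")


def format_pairs(text, sep=" "):
    """Return the title/URL pairs as text, one pair per line.

    `sep` is placed between each title and its URL (hypothetical helper).
    """
    return "\n".join(f"{title}{sep}{url}" for title, url in pat.findall(text))


print(format_pairs(s))
```

Because the URL part is `\S+`, the last pair is matched even though no space follows it, which is where a pattern ending in `\s` would miss it.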