Regex for URL without path

I know there are many solutions, articles, and libraries for this, but I couldn’t find one that matches my case. I’m trying to write a regex to extract a URL (representing a website) from a piece of text (a person's email signature). The URL can take several forms:

  • It may or may not start with http(s)://
  • It may or may not contain www.
  • It may have a multi-part TLD such as "test.com.cn"

Here are some examples:

www.test.com
https://test.com.cn
http://www.test.com.cn
test.com
test.com.cn

I’ve come up with the following regex:

(https?://)?(www\.)?\w{2,}\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?$

But there are two main problems with this, because the signature can contain an email address:

  1. It (wrongly) captures the domain part of an email address such as name.surname@test2.com
  2. It doesn’t capture URLs in the middle of a line, and if I remove the $ at the end, it captures the name.surname part of the example above

For (1), I tried a negative lookbehind, adding (?<!@) to the beginning. The problem is that it now captures est2.com instead of not matching at all.
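
The behaviour described above can be reproduced with a short Python sketch. It assumes Python's `re` module (the question does not name a regex flavor), and `first_match` is a hypothetical helper introduced only for this illustration. The lookbehind variant returns est2.com because the engine, once rejected at the position right after @, simply retries one character later, where the preceding character is t rather than @.

```python
import re

# The question's original pattern, anchored at the end of the line.
ORIGINAL = r"(https?://)?(www\.)?\w{2,}\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?$"
# Same pattern with the trailing "$" removed.
UNANCHORED = ORIGINAL[:-1]
# The attempted fix: a negative lookbehind for "@" at the start.
LOOKBEHIND = r"(?<!@)" + ORIGINAL


def first_match(pattern, text):
    """Return the first matched substring, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


email = "name.surname@test2.com"
print(first_match(ORIGINAL, email))    # test2.com  (problem 1)
print(first_match(UNANCHORED, email))  # name.surname  (problem 2, "$" removed)
print(first_match(LOOKBEHIND, email))  # est2.com  (lookbehind only shifts the match)
print(first_match(ORIGINAL, "Visit test.com today"))  # None  (problem 2)
```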

>Solution :

I think you could use \b (word boundary) instead of $, and at the beginning as well, and exclude @ with a negative lookbehind and a negative lookahead:

(?<!@|\.|-)\b(https?://)?(www\.)?\w{2,}\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,})?\b(?!@|\.|-)

Edit: exclude the dot (and any other non-alphanumeric characters likely to occur in a URL or email address) in your lookarounds, to avoid matching name.middlename in name.middlename.surname@test2.com or com.cn in name.surname@test2.com.cn. See this answer for the list of characters.
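
As a rough check, here is a minimal Python sketch of the pattern above applied to a sample signature. It assumes Python's `re` module. Two things differ from the answer's pattern, without changing what it matches: the lookarounds are written as the character class `[@.\-]`, and the groups are non-capturing. The name `extract_urls` and the sample signature are made up for illustration.

```python
import re

URL_RE = re.compile(
    r"(?<![@.\-])\b"          # not preceded by "@", "." or "-"
    r"(?:https?://)?"         # optional scheme
    r"(?:www\.)?"             # optional "www."
    r"\w{2,}\.[a-zA-Z]{2,}"   # name plus first TLD label
    r"(?:\.[a-zA-Z]{2,})?"    # optional second TLD label, e.g. ".cn"
    r"\b(?![@.\-])"           # not followed by "@", "." or "-"
)


def extract_urls(text):
    """Return every substring of text matched by URL_RE (hypothetical helper)."""
    return [m.group(0) for m in URL_RE.finditer(text)]


# Hypothetical signature mixing an email address with URLs mid-line.
signature = """John Doe
name.surname@test2.com.cn
Visit www.test.com or https://test.com.cn today"""

print(extract_urls(signature))  # ['www.test.com', 'https://test.com.cn']
```

One caveat follows from excluding the dot in the lookahead: a URL immediately followed by a sentence-ending period, as in "Visit test.com.", is not matched at all.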
