Regex for URL without path

I know there are many solutions, articles and libraries for this case, but I couldn’t find one that matches my case. I’m trying to write a regex to extract a URL (representing the person’s website) from a text (a person’s email signature), and there are multiple cases:

  • Could contain http(s):// , or not
  • Could contain www. , or not
  • Could have multiple TLDs, such as ""
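As a rough illustration of these three requirements, here is a minimal sketch using Python’s re module. The pattern is hypothetical, not the regex from the question, and it assumes a website is one domain label followed by one or more alphabetic labels of at least two letters (so .co.uk is treated as a multi-part TLD).

```python
import re

# Hypothetical pattern (not the question's own regex):
# optional scheme, optional "www.", one domain label, then one or more
# dot-separated alphabetic labels so that multi-part TLDs like ".co.uk" match.
URL_RE = re.compile(
    r"(?:https?://)?"     # optional http:// or https://
    r"(?:www\.)?"         # optional www.
    r"[a-z0-9-]+"         # domain label
    r"(?:\.[a-z]{2,})+",  # one or more TLD labels, e.g. ".com" or ".co.uk"
    re.IGNORECASE,
)

for text in ["example.com", "www.example.co.uk", "https://example.com"]:
    print(text, URL_RE.fullmatch(text) is not None)  # True for all three

print(URL_RE.findall("Visit https://www.example.co.uk today"))
```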

Here are some examples:

I’ve come up with the following regex:


There are two main problems with this, because the signature can also contain an email address:

  1. It (wrongly) captures the TLDs of emails like this one:
  2. It doesn’t capture URLs in the middle of a line, and if I remove the $ at the end, it captures the name.surname part of the last example.
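Both problems can be reproduced with a hypothetical stand-in pattern. The sketch below assumes Python’s re module, and the signature string is made up for illustration:

```python
import re

# Hypothetical stand-in for the question's regex.
BASE = r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z]{2,})+"

anchored = re.compile(BASE + r"$", re.IGNORECASE | re.MULTILINE)
unanchored = re.compile(BASE, re.IGNORECASE)

signature = "John Smith | example.com | john.smith@mail.com"

# With "$", only a match that ends the line is accepted: the website in the
# middle is missed and the email's domain is returned instead.
print(anchored.findall(signature))    # ['mail.com']

# Without "$", the website is found, but so are pieces of the email address.
print(unanchored.findall(signature))  # ['example.com', 'john.smith', 'mail.com']
```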

For (1), I tried a negative lookbehind by adding (?<!@) at the beginning, but the problem is that it now captures part of the domain instead of not matching it at all.
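This happens because (?<!@) only stops a match from starting immediately after the @; the regex engine simply starts one character later and still matches most of the domain. A small Python sketch with a hypothetical pattern and address:

```python
import re

# Hypothetical base pattern: optional scheme, optional www., label, TLD labels.
BASE = r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z]{2,})+"

# The lookbehind only forbids a match that *starts* right after "@".
lookbehind_only = re.compile(r"(?<!@)" + BASE, re.IGNORECASE)

# Starting at "m" is rejected, so the engine starts at "a" instead.
print(lookbehind_only.findall("jane@mail.com"))  # ['ail.com']
```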

Solution:

I think you could use \b (word boundary) instead of $, at the beginning as well as at the end, and exclude @ with a negative lookbehind and a negative lookahead:
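A minimal Python sketch of this approach, assuming a hypothetical base pattern BASE (optional scheme, optional www., a domain label, then one or more TLD labels) and a made-up signature:

```python
import re

# Hypothetical base pattern: optional scheme, optional www., label, TLD labels.
BASE = r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z]{2,})+"

# \b on both sides stops a match from starting or ending mid-word, and the
# lookarounds reject a match that touches an "@" on either side.
URL_RE = re.compile(r"\b(?<!@)" + BASE + r"\b(?!@)", re.IGNORECASE)

signature = "John Smith | example.com | john.smith@mail.com"
print(URL_RE.findall(signature))  # ['example.com']

# Remaining weakness: with a dotted local part, the engine backtracks to a
# shorter match that ends before a dot instead of before the "@".
print(URL_RE.findall("john.middle.smith@mail.com"))  # ['john.middle']
```

The second call shows why excluding only @ is not enough when the local part of the email address contains several dots.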


Edit: also exclude the dot (and any other non-alphanumeric characters likely to occur in a URL or email address) in your lookarounds, to avoid matching the name.middlename part of an email address. See this answer for the list of characters.
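One way to apply this, sketched in Python: the lookarounds reject any match touched by a character from a hypothetical set EXCLUDED ([@.+%-] here, chosen for illustration rather than taken from a complete list of URL or email characters). The base pattern is a hypothetical stand-in as well.

```python
import re

# Hypothetical base pattern: optional scheme, optional www., label, TLD labels.
BASE = r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z]{2,})+"

# Hypothetical set of characters common in email addresses and host names;
# a match preceded or followed by any of them is rejected.
EXCLUDED = r"[@.+%-]"

URL_RE = re.compile(
    r"\b(?<!" + EXCLUDED + r")" + BASE + r"\b(?!" + EXCLUDED + r")",
    re.IGNORECASE,
)

signature = "Jane Doe | https://www.example.co.uk | jane.middle.doe@mail.com"
print(URL_RE.findall(signature))  # ['https://www.example.co.uk']
print(URL_RE.findall("jane.middle.doe@mail.com"))  # []
```

Excluding the dot has a cost: a website at the end of a sentence, as in "Visit example.com.", is no longer matched, because the trailing period triggers the negative lookahead.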
