Extract joined urls but not if redirect exists

I’m looking for a regex that extracts URLs when they are not separated by a space or any other delimiter, but keeps the "redirect" ones as a single complete URL.

Let me show you an example. This string:

http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz

should result in the following array:

['http://foo.bar', 'https://foo.baz', 'http://foo.bar?url=http://foo.baz']

I am able to separate the joined URLs thanks to this regex:


from this answer: Extract urls from string without spaces between

But I struggle to also keep the redirect URLs (the ones containing =http) in one piece.


>Solution :

EDIT: for Python

Use re.split and regex (?<!=)(?<!^)(?=https?://).

It splits at the beginning of each new URL, unless that URL is preceded by = or is the first one in the string (this avoids a redundant empty split at the start of the string).

>>> import re
>>> re.split(r'(?<!=)(?<!^)(?=https?://)', 'http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz')
['http://foo.bar', 'https://foo.baz', 'http://foo.bar?url=http://foo.baz']

Demo and explanation at regex101.
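
If you need this in more than one place, the same split can be wrapped in a small helper. This is only a sketch: the names URL_BOUNDARY and split_joined_urls are made up for illustration, and it assumes Python 3.7 or later, where re.split can split on zero-width matches.

```python
import re

# Zero-width split point: right before "http://" or "https://",
# but not at the very start of the string and not directly after "=".
# URL_BOUNDARY is a hypothetical name chosen for this sketch.
URL_BOUNDARY = re.compile(r'(?<!=)(?<!^)(?=https?://)')


def split_joined_urls(text):
    """Split concatenated URLs, keeping '=http...' redirect parameters intact."""
    return URL_BOUNDARY.split(text)


joined = 'http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz'
urls = split_joined_urls(joined)
# urls == ['http://foo.bar', 'https://foo.baz', 'http://foo.bar?url=http://foo.baz']
```

A string containing a single URL comes back as a one-element list, because the (?<!^) lookbehind suppresses the split at position 0.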

Assuming (based on regex provided in question) you are using PHP:

Use preg_split with a lookahead for https?:// and a negative lookbehind for =|^. The lookbehind prevents a split before a URL that is preceded by =, and avoids a redundant split at the beginning of the line.

$keywords = preg_split("~(?<!=|^)(?=https?://)~", "http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz");
print_r($keywords);


    [0] => http://foo.bar
    [1] => https://foo.baz
    [2] => http://foo.bar?url=http://foo.baz

Online demo here.

Demo and explanation at regex101.
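
A note on why the Python version above uses two separate lookbehinds instead of the single (?<!=|^) used for PHP: as far as I know, Python's re module requires every alternative inside a lookbehind to have the same width, and here = is one character wide while ^ is zero characters wide. The sketch below checks this; the variable names are only illustrative.

```python
import re

# PHP/PCRE-style pattern: one lookbehind with alternatives of different widths.
try:
    re.compile(r'(?<!=|^)(?=https?://)')
    php_style_ok = True
except re.error:
    # Expected with Python's built-in re module: look-behind must be fixed-width.
    php_style_ok = False

# Same condition written as two separate fixed-width lookbehinds.
python_style = re.compile(r'(?<!=)(?<!^)(?=https?://)')
parts = python_style.split('http://foo.barhttps://foo.baz')
```

So when porting the pattern between the two languages, only the lookbehind part needs rewriting; the lookahead stays the same.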
