Extract joined urls but not if redirect exists

I’m looking for a regex that extracts URLs when they are not separated by a space or any other delimiter, but keeps the "redirect" ones (a URL passed as a query parameter of another URL) as one complete URL.

Let me show you an example:

http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz

should result in the following array:

['http://foo.bar', 'https://foo.baz', 'http://foo.bar?url=http://foo.baz']

I am able to separate the joined URLs thanks to this regex:

'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'

from this answer: Extract urls from string without spaces between

But I can't work out how to also keep the redirect URLs (the ones containing =http) in one piece.

Thanks,

>Solution :

EDIT: for python

Use re.split with the regex (?<!=)(?<!^)(?=https?://).

It splits at the beginning of each new URL, unless that URL is preceded by = or sits at the very start of the string (this avoids a redundant empty first element). Splitting on a zero-width match like this requires Python 3.7 or later.

>>> re.split(r'(?<!=)(?<!^)(?=https?://)', 'http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz')
['http://foo.bar', 'https://foo.baz', 'http://foo.bar?url=http://foo.baz']
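For reuse, the same split can be wrapped in a small helper. This is only a sketch: the names split_joined_urls and URL_BOUNDARY are made up for illustration, and filtering out empty pieces is an added assumption so that an empty input yields an empty list. Python's re module only accepts fixed-width lookbehinds, which is why the two conditions are written as separate lookbehinds here rather than combined as (?<!=|^).

```python
import re

# Zero-width boundary: a position followed by "http://" or "https://"
# that is neither the start of the string nor directly preceded by "=".
# (URL_BOUNDARY and split_joined_urls are hypothetical names.)
URL_BOUNDARY = re.compile(r'(?<!=)(?<!^)(?=https?://)')


def split_joined_urls(text):
    """Split concatenated URLs, keeping "=http..." redirect targets attached."""
    # Requires Python 3.7+, where re.split accepts patterns that match
    # the empty string. Empty pieces are dropped (an added assumption).
    return [part for part in URL_BOUNDARY.split(text) if part]


joined = 'http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz'
print(split_joined_urls(joined))
```

Note that this only treats a nested URL as a redirect when it comes directly after =. A nested URL introduced by any other character would still be split off.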

Demo and explanation at regex101.


Assuming (based on regex provided in question) you are using PHP:

Use preg_split with a lookahead for https?:// and a negative lookbehind on =|^. The lookbehind prevents a split before a URL that is preceded by =, and it also avoids a redundant split at the very beginning of the string.

<?php
$urls = preg_split("~(?<!=|^)(?=https?://)~", "http://foo.barhttps://foo.bazhttp://foo.bar?url=http://foo.baz");
print_r($urls);
?>

Outputs:

Array
(
    [0] => http://foo.bar
    [1] => https://foo.baz
    [2] => http://foo.bar?url=http://foo.baz
)

Online demo here.

Demo and explanation at regex101.
