Golang – extract links using regex

I need to extract all links that belong to a specific domain, example.de, from a text using a regex in Go.

Below are all possible links that should be extracted:

https://example.de 
https://example.de/
https://example.de/home
https://example.de/home/
https://example.de/home some text that should not be extracted
https://abc.example.de
https://abc.example.de/
https://abc.example.de/home
https://abc.example.de/home/
https://abc.example.de/home some text that should not be extracted

What I already tried

I used regex101 to check whether my regexes are correct: https://regex101.com/r/ohxUcG/2
Here are the combinations that failed:

  • https?://*.+example.de*.+ fails on the input https://abc.example.de/a1b2c3 dsadsa: it matches the whole text up to the \n instead of only https://abc.example.de/a1b2c3 (without dsadsa).
  • https?://*.+example.de*.+\s(\w+)$ only matches links that are terminated by a space, but links can also be terminated by \n, \t, etc.
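
The first failure can be reproduced in Go with a minimal sketch like the one below. The pattern is copied unchanged from the question; the names greedyRe and firstGreedyMatch, and the extra "next line" in the sample input, are made up for illustration. The trailing .+ is greedy and . matches anything except a newline, so the match runs to the end of the line.

```go
package main

import (
	"fmt"
	"regexp"
)

// greedyRe is the first failing pattern from the question, unchanged.
var greedyRe = regexp.MustCompile(`https?://*.+example.de*.+`)

// firstGreedyMatch (hypothetical helper) returns the first match of
// greedyRe in text, or "" if there is none.
func firstGreedyMatch(text string) string {
	return greedyRe.FindString(text)
}

func main() {
	// The trailing ".+" swallows " dsadsa" and stops only at the newline.
	fmt.Printf("%q\n", firstGreedyMatch("https://abc.example.de/a1b2c3 dsadsa\nnext line"))
	// prints "https://abc.example.de/a1b2c3 dsadsa"
}
```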

Solution:

You can use

(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?

Details:

  • (?:https?://)? – an optional http:// or https:// prefix
  • (?:[^/.]+\.)* – zero or more subdomain labels, each being one or more characters other than / and ., followed by a .
  • \bexample\.de\b – the literal text example.de, delimited by word boundaries
  • (?:/[^/\s]+)* – zero or more path segments, each being a / followed by one or more characters other than whitespace and /
  • /? – an optional trailing / character.
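
As a rough sketch of how this pattern could be used from Go, the snippet below compiles it once with regexp.MustCompile and collects every match with FindAllString. The names linkRe and extractLinks and the sample text are assumptions made for illustration; only the pattern itself comes from the answer above.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRe is the pattern from the answer, written as a raw string so the
// backslashes need no extra escaping.
var linkRe = regexp.MustCompile(`(?:https?://)?(?:[^/.]+\.)*\bexample\.de\b(?:/[^/\s]+)*/?`)

// extractLinks (hypothetical helper) returns all non-overlapping matches
// of linkRe in text, in order of appearance.
func extractLinks(text string) []string {
	return linkRe.FindAllString(text, -1)
}

func main() {
	// Sample input: one link followed by a space, one followed by a tab.
	text := "see https://example.de/home some text\nand https://abc.example.de/a1b2c3\tdsadsa"
	for _, link := range extractLinks(text) {
		fmt.Println(link)
	}
	// prints:
	// https://example.de/home
	// https://abc.example.de/a1b2c3
}
```

Because each path segment uses [^/\s]+, a match stops at any whitespace, which covers the space, \t, and \n terminators from the question. One caveat: [^/.]+ in the subdomain part also matches whitespace, so when the scheme is absent (for example "visit abc.example.de") the match can start at an earlier word. Replacing it with [^/.\s]+ is one possible adjustment, though that variant is not part of the original answer.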
