Extracting text with a regex under two conditions

My code doesn’t work:

 regexp_substr('Lorem ipsum dolor sit amet. consectetur', '([^(.|()]+)|((.){0,9})')

The result should end at the first dot; if the text has no dot, it should be cut to a maximum of 10 characters. Is it even possible to do this?

Two example texts:

  1. Lorem ipsum dolor sit amet. consectetur
  2. Donec quis turpis sed sapien ullamcorper viverra sodales a est

This is what the result should look like:

  1. Lorem ipsum dolor sit amet
  2. Donec quis

>Solution :

You can use a replacing approach here:

REGEXP_REPLACE('Lorem ipsum dolor sit amet. consectetur',
               '^([^.]+)\..*|^(.{10}).*',
               '\1\2')

Details:

  • ^([^.]+)\..* – if the string contains a dot, capture the text before the first dot, then match the dot and the rest of the string:
    • ^ – start of string
    • ([^.]+) – Group 1 (\1): one or more characters other than .
    • \. – a literal dot
    • .* – zero or more characters, as many as possible
  • | – or
  • ^(.{10}).* – match and capture (into Group 2, \2) the first 10 characters ((.{10})) at the beginning of the string (^), then match the rest of the string.

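The replacement, \1\2, is two backreferences to the captured values. Only one alternative matches at a time, so the other group is empty and the result is just the captured text.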
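
As a minimal sketch, the same expression can be run against both sample strings from the question in a single query. The inline `samples` data set, the `txt` and `shortened` column names, and the extra 'Short' row are made up for illustration. That third row is there to show one edge case: a string with no dot and fewer than 10 characters matches neither alternative, so REGEXP_REPLACE returns it unchanged.

```sql
-- Hypothetical inline test data: the first two rows are the examples
-- from the question, the third is an added edge case.
WITH samples AS (
  SELECT 'Lorem ipsum dolor sit amet. consectetur' AS txt FROM dual
  UNION ALL
  SELECT 'Donec quis turpis sed sapien ullamcorper viverra sodales a est' FROM dual
  UNION ALL
  SELECT 'Short' FROM dual  -- no dot, under 10 chars: no match, returned as-is
)
SELECT txt,
       -- Branch 1 keeps the text before the first dot (\1);
       -- branch 2 keeps the first 10 characters (\2).
       REGEXP_REPLACE(txt, '^([^.]+)\..*|^(.{10}).*', '\1\2') AS shortened
FROM samples;
```

For the two strings from the question, the intended results are the ones listed there: `Lorem ipsum dolor sit amet` and `Donec quis`.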
