Regex to match words on specific lines

Given the following example from a class definition in ObjectScript:

Include %sySystem
Include (%sySystem, %soap, %Net.WebSocket)

Class HS.Local.zimpl.fhirpro.UI.FileViewer Extends (HS.Local.zimpl.fhirpro.UI.Super, %CSP.Page)

I need to match the individual words after "Include", and the pattern must not match anything on any other line. The matches must not include any punctuation.

The regex will be used in Javascript.

My best efforts produced this:

(?<=^Include \(?)([%A-Za-z0-9.]+)|((?<=, )[%A-Za-z0-9.]+)

The positive lookbehind requires the line to start with "Include ", optionally followed by an opening parenthesis. The character class then matches words that may contain a percent sign or a period.

To match the remaining words, I added an alternation with a second capturing group and another lookbehind. However, this also matches words on many other lines: essentially any word preceded by a comma and a space.
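To see the problem concretely, the sketch below runs the question's pattern over the sample lines in Node.js or a browser console. It assumes the pattern is used with the g and m flags, which the question does not state; the variable names are illustrative.

```javascript
// Sample input from the question (the blank line is omitted; it does not affect matching).
const source = [
  "Include %sySystem",
  "Include (%sySystem, %soap, %Net.WebSocket)",
  "Class HS.Local.zimpl.fhirpro.UI.FileViewer Extends (HS.Local.zimpl.fhirpro.UI.Super, %CSP.Page)",
].join("\n");

// The question's pattern, assuming the g and m flags.
// The second alternative (?<=, ) is not tied to "Include", so it fires on any line.
const attempt = /(?<=^Include \(?)([%A-Za-z0-9.]+)|((?<=, )[%A-Za-z0-9.]+)/gm;

// With the g flag, String.prototype.match returns an array of all full matches.
const found = source.match(attempt);
console.log(found.join(", "));
// %sySystem, %sySystem, %soap, %Net.WebSocket, %CSP.Page
```

The last match, %CSP.Page, comes from the Class line, which is exactly the unwanted behavior described above.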

Solution:

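The (?<=^Include \(?) lookbehind applies only to the first alternative in your pattern. The second alternative, (?<=, )[%A-Za-z0-9.]+, does not check for Include at all, so it matches any word preceded by a comma and a space on any line.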

To make the lookbehind apply to every "word" matched by your main [%A-Za-z0-9.]+ pattern, drop the alternation and add .* inside the lookbehind:

/(?<=^Include .*)[%A-Za-z0-9.]+/gm
or, with the i flag (which also makes Include match case-insensitively):
/(?<=^Include .*)[%a-z0-9.]+/gmi

I removed \(? from the lookbehind because .* already matches the ( character, as well as any earlier words and separators on the line.
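A minimal sketch applying the answer's pattern to the same sample lines, assuming a JavaScript engine with ES2018 lookbehind support (the .* inside the lookbehind makes it variable-length). The case-insensitive variant is run as well; variable names are illustrative.

```javascript
const source = [
  "Include %sySystem",
  "Include (%sySystem, %soap, %Net.WebSocket)",
  "Class HS.Local.zimpl.fhirpro.UI.FileViewer Extends (HS.Local.zimpl.fhirpro.UI.Super, %CSP.Page)",
].join("\n");

// The lookbehind now guards every match: the word must sit on a line starting with "Include ".
const fixed = /(?<=^Include .*)[%A-Za-z0-9.]+/gm;
// Case-insensitive variant: [%a-z0-9.] also covers uppercase letters when i is set.
const fixedI = /(?<=^Include .*)[%a-z0-9.]+/gim;

const words = source.match(fixed);
console.log(words.join(", "));
// %sySystem, %sySystem, %soap, %Net.WebSocket
console.log(source.match(fixedI).join(", "));
// %sySystem, %sySystem, %soap, %Net.WebSocket
```

The Class line yields nothing, and %sySystem appears twice only because both Include lines list it.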

Note that if the "words" always start with %, you may use %[A-Za-z0-9.]+ instead of [%A-Za-z0-9.]+.
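The difference between the two character-class choices shows up only when an Include line names something without a leading %. The sketch below uses a hypothetical include name, MyApp.Macros, that is not in the question.

```javascript
// "MyApp.Macros" is a hypothetical include name without a leading %.
const source = "Include (%soap, MyApp.Macros)";

const anyWord = /(?<=^Include .*)[%A-Za-z0-9.]+/gm;      // % allowed anywhere in the word
const percentOnly = /(?<=^Include .*)%[A-Za-z0-9.]+/gm;  // word must start with %

console.log(source.match(anyWord).join(", "));     // %soap, MyApp.Macros
console.log(source.match(percentOnly).join(", ")); // %soap
```

So the %-prefixed form is only a safe substitute if every included name really does start with %.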

More details:

  • (?<=^Include .*): a positive lookbehind that matches a position immediately preceded by Include at the start of a line (without the m flag, only at the start of the whole string), followed by a space and zero or more characters other than line break characters
  • [%A-Za-z0-9.]+: one or more ASCII letters, digits, periods, or percent signs.
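The remark about the m flag in the first bullet can be checked with a short sketch on the sample lines (variable names are illustrative). Without m, ^ anchors only at the start of the whole string, and because . does not match line breaks, only words on the first line qualify.

```javascript
const source = [
  "Include %sySystem",
  "Include (%sySystem, %soap, %Net.WebSocket)",
  "Class HS.Local.zimpl.fhirpro.UI.FileViewer Extends (HS.Local.zimpl.fhirpro.UI.Super, %CSP.Page)",
].join("\n");

const withM = /(?<=^Include .*)[%A-Za-z0-9.]+/gm;  // ^ matches at every line start
const withoutM = /(?<=^Include .*)[%A-Za-z0-9.]+/g; // ^ matches only at the string start

console.log(source.match(withM).length);        // 4
console.log(source.match(withoutM).join(", ")); // %sySystem
```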