How can I extract ISO and ASTM standards from a text using regex?

I would like to extract ISO and ASTM standard designations from a text. The literals ISO and ASTM, together with the numbers that follow them, have to be found.

Rules:

  • Match starts with ISO or ASTM
  • ASTM is followed by a D
  • This is followed by a number, optionally preceded by a space or hyphen, that can itself contain spaces and hyphens
  • As soon as the number sequence ends, the match ends

Possible pattern for the first two rules:

(?:ISO|ASTM\s*D)

Example:

ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. ISO 31 33, ISO 334 9 are specific to static bending, but ASTM D 149-3 includes various other 9.

https://regex101.com/r/IFlqT2/1

What would a corresponding regex look like?

Solution:

You can use

(?:ISO|ASTM\s*D)(?:[\s-]*\d)+

Details:

  • (?:ISO|ASTM\s*D) – either ISO, or ASTM + zero or more whitespaces + D
  • (?:[\s-]*\d)+ – one or more repetitions of
    • [\s-]* – zero or more whitespaces or hyphens
    • \d – a digit.

See the regex demo.
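
To try the pattern outside of a regex tester, here is a minimal Python sketch that applies it to the example sentence from the question. The names STANDARD_RE, find_standards and EXAMPLE are made up for this illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: the prefix (ISO, or ASTM + optional whitespace + D),
# then one or more digits, each optionally preceded by whitespace or hyphens.
STANDARD_RE = re.compile(r"(?:ISO|ASTM\s*D)(?:[\s-]*\d)+")


def find_standards(text: str) -> list[str]:
    """Return every ISO / ASTM D designation found in text (hypothetical helper)."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return STANDARD_RE.findall(text)


# Example sentence taken from the question.
EXAMPLE = (
    "ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. "
    "ISO 31 33, ISO 334 9 are specific to static bending, "
    "but ASTM D 149-3 includes various other 9."
)

if __name__ == "__main__":
    for match in find_standards(EXAMPLE):
        print(match)
```

Each match ends on its last digit, so trailing commas and the space before "are" are not included, and the lone 9 at the end of the sentence is skipped because it has no ISO or ASTM D prefix. The pattern has no word boundary, so it would also match inside a longer word such as "MISO 9". If that matters for your text, putting \b in front of the group is one possible tweak.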
