Do ECMAScript regular expressions match their own syntax characters?

I’m referring to ECMAScript regular expression syntax defined in https://tc39.es/ecma262/#sec-regexp-regular-expression-objects.

I checked with several online sources how the following regular expression pattern is matched.

Pattern: /[[]]/

They all gave the following breakdown:

[ -> Match any character in the character set
  [ -> Matches a `[` character 
]

] -> Matches a `]` character
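The breakdown above can be checked in any JavaScript console. This is only a minimal sketch; the sample strings (`"x[]y"` and so on) are made up for illustration:

```javascript
const pattern = /[[]]/;

// The class `[[]` matches one `[`; the trailing `]` then matches a literal `]`.
const match = pattern.exec("x[]y");
console.log(match[0]);    // "[]"
console.log(match.index); // 1

// A single bracket on its own is not enough: both characters must appear in sequence.
console.log(pattern.test("[")); // false
console.log(pattern.test("]")); // false
```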

I get how the character set is matched, but I don't understand why the last closing bracket (`]`) is matched. Isn't it a syntax error in the regular expression? According to the syntax defined in the ECMAScript specification (https://tc39.es/ecma262/#prod-PatternCharacter), a PatternCharacter can only be a SourceCharacter that is not a SyntaxCharacter, and the closing bracket (`]`) is a SyntaxCharacter.

PatternCharacter ::
       SourceCharacter but not SyntaxCharacter

>Solution :

Annex B of the same specification, "Additional ECMAScript Features for Web Browsers", says in section B.1.2, "Regular Expressions Patterns":

The syntax of 22.2.1 is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

And there we find that a Term can be an ExtendedAtom, which in turn can be an ExtendedPatternCharacter, which is defined as:

SourceCharacter but not one of ^ $ \ . * + ? ( ) [ |

Since `]` is not in that exclusion list, it is allowed here as a literal character.
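To see which syntax characters this relaxation affects, the two character lists can be compared. The sketch below only computes the difference between the SyntaxCharacter list from 22.2.1 and the exclusion list quoted above; the variable names are made up, and it ignores the other Annex B productions and their ordering rules:

```javascript
// SyntaxCharacter, as listed in the main grammar (22.2.1).
const syntaxCharacters = [..."^$\\.*+?()[]{}|"];

// Characters that ExtendedPatternCharacter excludes, as quoted above.
const extendedExcluded = new Set([..."^$\\.*+?()[|"]);

// Syntax characters that are not excluded by ExtendedPatternCharacter.
const allowedByAnnexB = syntaxCharacters.filter((ch) => !extendedExcluded.has(ch));
console.log(allowedByAnnexB.join(" ")); // ] { }
```

So by this comparison `]`, `{` and `}` are the syntax characters that can appear as ordinary pattern characters under the Annex B grammar.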

Annex B of the specification is introduced with:

The ECMAScript language syntax and semantics defined in this annex are required when the ECMAScript host is a web browser. The content of this annex is normative but optional if the ECMAScript host is not a web browser.

Interestingly, Node.js also provides this "extra" behaviour, even though the specification does not require it of a host that is not a web browser.
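Note that the Annex B pattern grammar only applies when the `u` flag is absent. With `u`, the stricter grammar from 22.2.1 is used and the lone `]` is rejected, which matches the reading of PatternCharacter in the question. A minimal sketch, where `compiles` is a hypothetical helper written for this example:

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected, so let it surface
  }
}

console.log(compiles("[[]]", ""));  // true in engines that implement Annex B, such as Node.js
console.log(compiles("[[]]", "u")); // false: the lone `]` is a syntax error in Unicode mode
```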
