Why does the fountain character match the anchor character inside a character class in a regular expression?

In PCRE2, I have a problem with the regular expression /[⚓️]/: it matches the string "⛲️".

Demo

The unexpected behaviour occurs only when the emoji is inside a character class ([...]).

Can somebody explain why this is happening?

By the way, PCRE1 works just fine: no matches.

>Solution :

The ⚓️ emoji is a sequence of two Unicode code points, \x{2693}\x{FE0F}: the anchor character U+2693 followed by the variation selector U+FE0F. You can verify this by checking that the regex \x{2693}\x{FE0F} matches ⚓️.
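To see the two code points for yourself, here is a minimal Python sketch. It is not PCRE2, and the helper name `describe` is made up for illustration; it only lists what each emoji string is made of.

```python
import unicodedata

anchor = "\u2693\ufe0f"    # the anchor emoji as written in the pattern
fountain = "\u26f2\ufe0f"  # the fountain emoji from the subject string

def describe(text):
    """Return (code point, Unicode name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

anchor_points = describe(anchor)
fountain_points = describe(fountain)

print(anchor_points)
print(fountain_points)
```

Both lists end with the same second entry, U+FE0F, which is the code point the two emojis share.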

A character class matches any single code point it lists, so [⚓️] is equivalent to [\x{2693}\x{FE0F}]. It therefore finds a match in both ⛲️ (= \x{26F2}\x{FE0F}) and ⚓️, because both contain at least one of those code points: the shared \x{FE0F}.
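The same effect can be reproduced outside PCRE2. The sketch below uses Python's built-in `re` module as a stand-in, on the assumption that it treats a character class as a set of individual code points in the same way; the variable names are illustrative only.

```python
import re

anchor = "\u2693\ufe0f"    # U+2693 ANCHOR + U+FE0F variation selector
fountain = "\u26f2\ufe0f"  # U+26F2 FOUNTAIN + U+FE0F variation selector

# The class contains two separate code points: U+2693 and U+FE0F.
char_class = re.compile("[" + anchor + "]")

match = char_class.search(fountain)

# Record where the match landed and which code point was matched.
match_start = match.start() if match else None
matched_code_point = f"U+{ord(match.group(0)):04X}" if match else None

print(match_start, matched_code_point)
```

The match starts at the second code point of the fountain string, i.e. the class matched the variation selector rather than the fountain itself.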

As a workaround, put the emoji into an alternation inside a non-capturing group rather than a character class, e.g. (?:⚓️|[a-z0-9]) matches a ⚓️ or a lowercase ASCII letter/digit.
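A quick way to check the workaround is sketched below, again using Python's `re` module as a stand-in for PCRE2 (an assumption; the sample strings are made up for the test). In the alternation, the emoji is matched as a two-code-point sequence instead of as two independent class members.

```python
import re

anchor = "\u2693\ufe0f"    # U+2693 ANCHOR + U+FE0F variation selector
fountain = "\u26f2\ufe0f"  # U+26F2 FOUNTAIN + U+FE0F variation selector

# Alternation: either the whole anchor sequence, or one ASCII letter/digit.
pattern = re.compile("(?:" + anchor + "|[a-z0-9])")

fountain_match = pattern.search(fountain)   # no alternative applies here
anchor_match = pattern.search(anchor)       # matches the full sequence
all_matches = pattern.findall("a" + anchor + fountain + "7")

print(fountain_match)
print(anchor_match.group(0) == anchor)
print(all_matches)
```

The fountain no longer matches, because U+FE0F on its own is not one of the alternatives.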
