Matching multiple Unicode characters in Golang Regexp

As a simplified example, I want the pattern ^⬛+$ to match the string ⬛⬛⬛ and return ⬛⬛⬛ as the match.

    r := regexp.MustCompile("^⬛+$")
    matches := r.FindString("⬛️⬛️⬛️")
    fmt.Println(matches)

But it doesn’t match, even though the equivalent pattern works with regular ASCII characters.

I’m guessing there’s something I don’t know about Unicode matching, but I haven’t found any decent explanation in documentation yet.

Can someone explain the problem?

Solution:

The regular expression matches a string consisting entirely of one or more ⬛ (U+2B1B, black large square).

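The subject string is three pairs of black large square (U+2B1B) followed by variation selector-16 (U+FE0F). The variation selectors are invisible (on my terminal), but each one is a separate code point that the pattern does not account for, so the match fails.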

Fix this by either removing the variation selectors from the subject string or adding the variation selector to the pattern.

Here’s the first fix: https://go.dev/play/p/oKIVnkC7TZ1
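Both fixes can be sketched as follows. This is an illustration under assumptions, not necessarily the code behind the playground link: `stripVS16`, `plain`, and `tolerant` are hypothetical names, and the second pattern treats the selector as optional after each square.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const vs16 = "\ufe0f" // VARIATION SELECTOR-16

var (
	// The original pattern from the question.
	plain = regexp.MustCompile("^\u2b1b+$")
	// Assumed variant: each square may be followed by one selector.
	tolerant = regexp.MustCompile("^(?:\u2b1b\ufe0f?)+$")
)

// stripVS16 (hypothetical helper) removes every variation
// selector-16 from s.
func stripVS16(s string) string {
	return strings.ReplaceAll(s, vs16, "")
}

func main() {
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"

	// Unmodified subject against the original pattern.
	fmt.Println(plain.MatchString(subject)) // false

	// Fix 1: remove the selectors from the subject.
	fmt.Println(plain.MatchString(stripVS16(subject))) // true

	// Fix 2: allow the selector in the pattern.
	fmt.Println(tolerant.MatchString(subject)) // true
}
```

The first fix changes the data and keeps the pattern simple. The second keeps the data intact, so `FindString` would return the match with its selectors still attached.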
