import re
pat = re.compile(r"[\u20000-\u2A6D6]+")
pat.match("Hello World!")
This gives the following result:
<re.Match object; span=(0, 5), match='Hello'>
But the input string is entirely ASCII, so none of its characters fall within the intended Unicode range.
Is this expected? If so, how should such a Unicode range be written in a pattern in practice?
>Solution :
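The \u escape consumes exactly four hex digits, so the pattern currently describes a character class consisting of \u2000, any character in the range 0 to \u2A6D, and the literal character 6. That range includes the ASCII letters but not the space (U+0020), which is why the match stops after "Hello".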
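A quick way to confirm this reading of the class is to probe it with a few characters. This is only a small sketch; the name `broken` is made up for illustration.

```python
import re

# The original pattern: \u takes exactly four hex digits, so the class is
# U+2000, the range "0"..U+2A6D, and a literal "6".
broken = re.compile(r"[\u20000-\u2A6D6]+")

print(broken.match("Hello World!"))  # matches "Hello" and stops at the space
print(broken.match(" "))             # None: U+0020 is below "0" (U+0030)
print(broken.match("\U00020000"))    # None: the intended code point is not in the class
```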
For code points above U+FFFF, which do not fit in four hex digits, you need to use the escape sequence \U with exactly 8 hex digits:
pat = re.compile(r"[\U00020000-\U0002A6D6]+")
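As a sanity check, the corrected pattern can be tried against both the ASCII input from the question and the two endpoints of the range. The name `fixed` is just an illustrative label.

```python
import re

# Corrected pattern: \U takes exactly eight hex digits.
fixed = re.compile(r"[\U00020000-\U0002A6D6]+")

print(fixed.match("Hello World!"))          # None: ASCII is outside the range
m = fixed.match("\U00020000\U0002A6D6abc")  # both endpoints, then ASCII
print(m.span())                             # (0, 2): only the two endpoint characters match
```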