Built-in isidentifier() function inconsistent results

I’m using the built-in isidentifier() function to find the Unicode characters allowed in variable names (I know about the XID_Start and XID_Continue characters, so I don’t need an explanation of those). The following program gives inconsistent results on different systems. I’m confused and would like to understand why.

chars = []

for code_point in range(0x110000):
    char = chr(code_point)
    # Keep the character if it can start or continue an identifier.
    if char.isidentifier() or ('a' + char).isidentifier():
        chars.append(char)

print(len(chars))

Running the program in PyCharm gives me 134415, but running it on repl.it gives me 128770. My Python version is 3.9.7, while repl.it’s is 3.8.12. All I was able to find was the isidentifier() documentation, which hints at PEP 3131, the standard used in Python 3. But repl.it and I are both using the same major Python version; only the minor version differs. Searching for a changelog for the function also turned up nothing. I hope you can help me resolve this!

Solution:

They’re using different versions of the Unicode data.

Try adding this to your script:

import unicodedata

print(unicodedata.unidata_version)

For me, repl.it was using version 12.1.0, while my Python 3.9.9 on macOS 12.3 was using version 13.0.0.

The PEP you link to says that the allowed characters depend on the DerivedCoreProperties.txt file that’s in the Unicode version used by Python.

Version 12.1.0
Version 13.0.0

The exact specification of what characters have the XID_Start or XID_Continue properties can be found in the DerivedCoreProperties file of the Unicode data in use by Python
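To make the PEP's pointer more concrete, here is a minimal sketch of how the entries in a DerivedCoreProperties.txt file could be counted. SAMPLE is a small hand-copied excerpt in the file's format, not the full file, and count_property is a hypothetical helper name; to compare real versions you would pass in the text of each downloaded file instead.

```python
def count_property(text, prop):
    """Count the code points tagged with `prop` in DerivedCoreProperties-style text."""
    total = 0
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_points, name = (field.strip() for field in data.split(";")[:2])
        if name != prop:
            continue
        # An entry is either a single code point or an inclusive "first..last" range.
        first, _, last = code_points.partition("..")
        total += int(last or first, 16) - int(first, 16) + 1
    return total


# Illustrative excerpt only; the real file has thousands of lines.
SAMPLE = """\
0041..005A    ; XID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; XID_Start # Lo       FEMININE ORDINAL INDICATOR
0030..0039    ; XID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE
005F          ; XID_Continue # Pc       LOW LINE
"""

print(count_property(SAMPLE, "XID_Start"))     # 27
print(count_property(SAMPLE, "XID_Continue"))  # 11
```

Running the same function over the 12.1.0 and 13.0.0 files should show how many code points each version tags with XID_Start and XID_Continue.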


This matches what the unicodedata module says in its docs.

For Python 3.8:

The data contained in this database is compiled from the UCD version 12.1.0.

For Python 3.9:

The data contained in this database is compiled from the UCD version 13.0.0.
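To find exactly which characters account for the difference, one option is to save the identifier characters on each system and compare the two files. This is a rough sketch, assuming you can copy a file from one environment to the other; identifier_chars, save_chars and load_chars are hypothetical helper names, and the output file name is arbitrary.

```python
import unicodedata


def identifier_chars():
    """Return the code points that can start or continue an identifier."""
    return {
        cp for cp in range(0x110000)
        if chr(cp).isidentifier() or ("a" + chr(cp)).isidentifier()
    }


def save_chars(path, chars):
    """Write one hexadecimal code point per line."""
    with open(path, "w", encoding="ascii") as f:
        for cp in sorted(chars):
            f.write(f"{cp:04X}\n")


def load_chars(path):
    """Read a file written by save_chars back into a set of ints."""
    with open(path, encoding="ascii") as f:
        return {int(line, 16) for line in f if line.strip()}


if __name__ == "__main__":
    # Run this on each system; the file name records the Unicode data version.
    version = unicodedata.unidata_version
    chars = identifier_chars()
    save_chars(f"identifier_chars_{version}.txt", chars)
    print(version, len(chars))
```

With both files on one machine, subtracting the set loaded from the older file from the set loaded from the newer one gives the code points that only the newer Unicode data accepts.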
