My dataset contains the character ‘³’, which I need to process.
The general idea is to detect whether a character is a digit, convert it to an integer, and continue processing with that value.
>>> x = '³'
>>> x.isdigit() # Returns True
True
Python reports this character as a digit, but raises the following error when I try to convert it:
>>> int(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '³'
I would like such characters to be convertible to integers as well, to simplify my further processing.
Not sure if this helps, but here is my locale info:
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>Solution :
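str.isdigit() returns True for superscript digits, but int() only accepts decimal digits. You can use unicodedata.normalize with the NFKC form to turn the superscript into a regular digit before converting it.
Here is a more detailed version with some error handling: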
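import unicodedata

x = '³'
try:
    # NFKC replaces compatibility characters such as '³' with plain '3'
    regular_digit = unicodedata.normalize('NFKC', x)
    integer_value = int(regular_digit)
    print(integer_value)
except ValueError:
    # Raised when the normalized text is still not a valid base-10 literal
    print(f"'{x}' is not a convertible superscript digit.")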
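If you would rather test for convertibility up front than catch the exception, something along these lines might work. This is only a sketch: `to_int` is a hypothetical helper name, not part of any library, and it assumes that returning None for unconvertible input suits your pipeline. It relies on str.isdecimal(), which is stricter than str.isdigit() and matches what int() accepts.

```python
import unicodedata

def to_int(text):
    """Hypothetical helper: convert digit-like text to int, or return None."""
    # NFKC folds compatibility characters such as '³' into plain '3'.
    normalized = unicodedata.normalize('NFKC', text)
    # isdecimal() is True only for decimal digits, the ones int() can parse;
    # isdigit() would also accept un-normalized superscripts.
    if normalized.isdecimal():
        return int(normalized)
    return None

print('³'.isdigit(), '³'.isdecimal())  # True False
print(to_int('³'))                     # 3
print(to_int('x²'))                    # None
print(unicodedata.digit('³'))          # 3
```

For a single character, unicodedata.digit() (last line) gives the digit value directly without normalizing; it raises ValueError for characters that have no digit value unless you pass a default as the second argument.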