Reading and Writing special characters in Python

Python ver. 3.11.5 on Windows 10

I have a directory filled with .gz text archives. To scan these archives, I use the following Python code:

    with gzip.open(logDir+"\\"+fileName, mode="rb") as archive:
        for filename in archive:
            print(filename.decode().strip())

It all used to work; however, the new system adds lines similar to this:

:§f Press [§bJ§f]

Python gives me this error:

    File "C:\Users\Me\Documents\Python\ConvertLog.py", line 16, in readZIP
        print(filename.decode().strip())
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 49: invalid start byte

Anyone know a way of dealing with strange characters that pop up? I can’t just ignore the line. This happens to be one of the few lines I need to strip out and write to a condensed report.

I tried other modes, besides "rb". I really have no idea what else to try.

>Solution :

decode() accepts an errors argument that controls what happens when it meets bytes it cannot decode. You can read more about the options in the Python documentation for bytes.decode().

In decode(), you can specify errors='strict', errors='ignore', or errors='replace'. If unspecified, strict is the default, and it raises an error in a situation like yours. ignore simply drops the invalid bytes. replace substitutes a "suitable replacement character" (U+FFFD when decoding) for each of them.
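To make the difference between the three handlers concrete, here is a small sketch. The sample bytes are an assumption: they mimic the line from the question with each § stored as the single byte 0xa7, which is what the error message suggests.

```python
# Sample bytes are an assumption modeled on the line in the question:
# each "§" is stored as the single byte 0xA7, which is not valid UTF-8.
raw = b":\xa7f Press [\xa7bJ\xa7f]"

# errors="strict" (the default) raises on the first invalid byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops the bytes that cannot be decoded.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# ascii() keeps the output printable on consoles that cannot show U+FFFD.
print(strict_failed)
print(ascii(ignored))
print(ascii(replaced))
```

Note that both ignore and replace lose the original § characters; only the surrounding text survives.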

So, one way this might be implemented could be:

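    import gzip

    # logDir and fileName are the variables from the question.
    with gzip.open(logDir + "\\" + fileName, mode="rb") as archive:
        for line in archive:
            # Drop any bytes that are not valid UTF-8 instead of raising.
            decoded_line = line.decode('utf-8', errors='ignore').strip()
            print(decoded_line)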
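
If the § characters themselves need to survive, another option might be to fall back to a single-byte encoding. Byte 0xa7 is § in Latin-1 (and in cp1252), so the sketch below tries UTF-8 first and falls back to Latin-1. This assumes the new system really writes Latin-1-compatible bytes, which the question does not confirm. decode_line and read_lines are hypothetical helper names, and the demo archive is created only so the example runs on its own.

```python
import gzip
import os
import tempfile


def decode_line(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the writer used Latin-1 or a compatible code page.
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


def read_lines(path: str) -> list[str]:
    """Return the stripped, decoded lines of one .gz archive."""
    with gzip.open(path, mode="rb") as archive:
        return [decode_line(line).strip() for line in archive]


# Self-contained demo: write a tiny archive, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.log.gz")
    with gzip.open(demo_path, mode="wb") as f:
        f.write(b"plain ascii line\n")
        f.write(b":\xa7f Press [\xa7bJ\xa7f]\n")
    lines = read_lines(demo_path)

print([ascii(s) for s in lines])
```

If every archive is known to use one encoding, gzip.open also accepts mode="rt" together with encoding and errors arguments, which removes the need to call decode() on each line.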