Does UTF-8 use the ESC byte?

Is there any Unicode code point whose UTF-8 representation contains the ESC byte (0x1b) as one of its bytes?

Context: The ESC byte is used in ANSI escape codes (in terminals), and I'd like to know whether that byte can appear as part of a UTF-8 byte sequence.

Solution:

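No. Every byte of a UTF-8 multibyte sequence, lead byte and continuation bytes alike, has bit 7 set, so its value is 0x80 or higher. Only the single-byte ASCII range 0-127 has bit 7 clear, and that range includes 0x1B. The ESC byte therefore appears in valid UTF-8 only as the one-byte encoding of U+001B (ESC) itself, never as part of another code point's encoding.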

Bit patterns:
1-byte: 0xxxxxxx
2-byte: 110xxxxx 10xxxxxx
3-byte: 1110xxxx 10xxxxxx 10xxxxxx
4-byte: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
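
The bit patterns above can also be checked by brute force. The following is a minimal Python sketch, not part of the original answer: it encodes every Unicode scalar value and collects those whose UTF-8 bytes include 0x1B. The function name `code_points_containing_esc` and the constant `ESC` are made up for this illustration.

```python
ESC = 0x1B  # the ESC byte used by ANSI escape codes


def code_points_containing_esc() -> list:
    """Return every code point whose UTF-8 encoding contains the byte 0x1B."""
    hits = []
    for cp in range(0x110000):
        # Surrogates (U+D800..U+DFFF) are not valid scalar values and
        # cannot be encoded as UTF-8, so skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        if ESC in chr(cp).encode("utf-8"):
            hits.append(cp)
    return hits


if __name__ == "__main__":
    # Only U+001B itself is expected to show up.
    print([hex(cp) for cp in code_points_containing_esc()])  # ['0x1b']
```

In practice this means that, for valid UTF-8 input, scanning the raw bytes for 0x1B finds only genuine ESC characters, so a terminal or parser should not need to decode the text first to locate escape sequences.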
