Does UTF-8 use the ESC byte?


Is there any Unicode code point whose UTF-8 representation contains the ESC byte (0x1b)?

Context: The ESC byte is used in ANSI escape codes (in terminals) and I’d like to know whether that byte can appear as part of a utf-8 byte sequence.

>Solution :

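No, apart from the ESC character itself (U+001B), which is encoded as the single byte 0x1B. Every byte in a UTF-8 multibyte sequence has bit 7 set, so its value is 0x80 or higher. Only the single-byte ASCII range 0-127 has bit 7 clear, and that is where 0x1B falls. A 0x1B byte in valid UTF-8 is therefore always a real ESC character, never a fragment of another code point.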

Bit patterns
1-byte: 0xxxxxxx
2-byte: 110xxxxx 10xxxxxx
3-byte: 1110xxxx 10xxxxxx 10xxxxxx
4-byte: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
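
The bit patterns above can also be checked by brute force. The following is a minimal Python sketch, not part of the original answer: it encodes every code point above U+007F and looks for the byte 0x1B. The function name `esc_in_multibyte_utf8` is made up for illustration, and surrogates (U+D800 to U+DFFF) are skipped because they cannot be encoded in UTF-8.

```python
ESC = 0x1B

def esc_in_multibyte_utf8() -> list:
    """Return every code point above U+007F whose UTF-8 encoding contains ESC."""
    hits = []
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        if ESC in chr(cp).encode("utf-8"):
            hits.append(cp)
    return hits

if __name__ == "__main__":
    # An empty list means no multibyte sequence contains 0x1B.
    print(esc_in_multibyte_utf8())
    # The ESC character itself is the single byte 0x1B.
    print(chr(ESC).encode("utf-8"))
```

In practice this means a terminal parser can scan a valid UTF-8 byte stream for 0x1B without decoding it first.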
