Annex B UTF-8


UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. Its clear advantage is that the character addresses U+0000 to U+007F, which correspond to ASCII (and ISO 646:1991) values 00hex to 7Fhex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
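As a minimal illustration, the following Python sketch checks the two properties that make UTF-8 convenient on 8-bit paths. First, every address from U+0000 to U+007F encodes to a single octet of the same value. Second, every octet belonging to a multi-byte sequence is 80hex or above, so it can never be mistaken for an ASCII character. It relies only on the built-in `str.encode`; the function names are hypothetical and chosen for illustration only.

```python
def ascii_is_transparent() -> bool:
    """True if every address U+0000..U+007F encodes to one octet of the same value."""
    return all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))


def multibyte_octets_have_high_bit(text: str) -> bool:
    """True if every octet of every multi-byte sequence in `text` is 0x80 or above."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    print(ascii_is_transparent())                                # True
    print("ABC123".encode("utf-8"))                              # b'ABC123'
    print(multibyte_octets_have_high_bit("\u00e9\u20ac\U0001D11E"))  # True
```

Because ASCII octets pass through unchanged, plain ASCII data is already valid UTF-8 and needs no conversion.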


Handling the up-to-21-bit Unicode 3 character addresses for all 1 114 112 code positions on the 17 code planes (0 through 16) can be cumbersome in ordinary byte-oriented data systems. As Table B.1 shows, the length of the binary representation of the character address (ignoring leading zero bits) determines how many UTF-8 bytes are required.
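The byte count can be computed directly from the number of significant bits in the address. The following Python sketch assumes the standard UTF-8 ranges. The 12 to 16 bit and 17 to 21 bit ranges come from the general UTF-8 definition (RFC 3629), not from rows reproduced in this Annex. `utf8_length` and `TOTAL_POSITIONS` are hypothetical names used for illustration only.

```python
def utf8_length(code_point: int) -> int:
    """Number of UTF-8 bytes for a Unicode address, from its significant bit length."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"address out of range: {code_point:#x}")
    bits = code_point.bit_length()  # leading zero bits are ignored
    if bits <= 7:
        return 1  # up to 7 bits
    if bits <= 11:
        return 2  # 8 to 11 bits
    if bits <= 16:
        return 3  # 12 to 16 bits
    return 4      # 17 to 21 bits


# 17 code planes (0 through 16) of 65 536 positions each
TOTAL_POSITIONS = 17 * 0x10000


if __name__ == "__main__":
    print([utf8_length(cp) for cp in (0x41, 0xE9, 0x20AC, 0x1D11E)])  # [1, 2, 3, 4]
    print(TOTAL_POSITIONS)                                             # 1114112
    print((0x10FFFF).bit_length())                                     # 21
```

The highest address, 10FFFFhex, needs exactly 21 bits. This is why four UTF-8 bytes are sufficient for all 1 114 112 positions.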


Table B.1: UTF-8 byte sequences for Unicode character addresses


Data type and length        Unicode address      1st byte   2nd byte   3rd byte   4th byte
                            (binary format)

Up to 7 bits, encoded as    00000000 0xxxxxxx    0xxxxxxx
7-bit ASCII or ISO 646

8 to 11 bits                00000yyy yyxxxxxx    110yyyyy   10xxxxxx
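The bit patterns in Table B.1 can be applied directly with shifts and masks. The following Python sketch uses hypothetical helpers that are not defined by this Annex. The 3-byte and 4-byte branches follow the general UTF-8 definition (RFC 3629) rather than rows reproduced here. For brevity, it does not reject surrogate addresses (D800hex to DFFFhex) or overlong sequences, which a conforming encoder or decoder must handle.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode address using the UTF-8 bit patterns (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"address out of range: {cp:#x}")
    if cp <= 0x7F:    # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:   # 8 to 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 12 to 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 17 to 21 bits: 11110www 10zzzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def decode_sequence(seq: bytes) -> int:
    """Decode one complete UTF-8 sequence to its address (no overlong or surrogate checks)."""
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    if lead < 0x80:
        length, cp = 1, lead
    elif lead >> 5 == 0b110:
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:
        length, cp = 4, lead & 0x07
    else:
        raise ValueError(f"invalid lead byte: {lead:#04x}")
    if len(seq) != length:
        raise ValueError("sequence length does not match lead byte")
    for b in seq[1:]:
        if b >> 6 != 0b10:  # continuation bytes must have the form 10xxxxxx
            raise ValueError(f"invalid continuation byte: {b:#04x}")
        cp = (cp << 6) | (b & 0x3F)
    return cp


if __name__ == "__main__":
    print(encode_code_point(0xE9).hex())         # c3a9
    print(hex(decode_sequence(b"\xe2\x82\xac")))  # 0x20ac
```

The lead byte alone tells a parser how many continuation bytes follow. This is what makes UTF-8 straightforward to parse from a byte stream.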
