Annex B UTF-8
UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000hex to U+007Fhex, corresponding to ASCII (and ISO 646:1991) values 00hex to 7Fhex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
Input and output of up to 21-bit Unicode 3 character addresses for all 1 114 112 characters on the 17 Code Planes 0 through 16 can be cumbersome in normal byte-oriented data systems. As shown in Table B.1, the length of the binary representation of the character to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.
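The relationship between significant bit length and byte count can be illustrated with a minimal Python sketch. It assumes the standard UTF-8 thresholds (up to 7, 11, 16 and 21 significant bits for 1, 2, 3 and 4 bytes respectively), which Table B.1 is expected to reflect. The function names `utf8_byte_count` and `utf8_encode` are hypothetical and are not defined by this annex.

```python
def utf8_byte_count(code_point: int) -> int:
    """Return how many UTF-8 bytes a character address needs."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("character address outside Code Planes 0 through 16")
    # bit_length() counts significant bits, i.e. leading zero bits are ignored.
    bits = code_point.bit_length()
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    return 4  # 17 to 21 significant bits


def utf8_encode(code_point: int) -> bytes:
    """Encode one character address as UTF-8 (illustrative only).

    Assumption: surrogate values D800hex to DFFFhex are not rejected here,
    although a conforming encoder would refuse them.
    """
    count = utf8_byte_count(code_point)
    if count == 1:
        # Addresses 00hex to 7Fhex map to a single octet of the same value.
        return bytes([code_point])
    # Lead byte markers: 110xxxxx, 1110xxxx, 11110xxx.
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[count]
    octets = []
    value = code_point
    for _ in range(count - 1):
        # Each continuation byte is 10xxxxxx and carries the low 6 bits.
        octets.append(0x80 | (value & 0x3F))
        value >>= 6
    octets.append(lead_marker | value)
    return bytes(reversed(octets))


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X}: {utf8_byte_count(cp)} byte(s), {utf8_encode(cp).hex()}")
```

The sketch builds the continuation bytes from the least significant end and reverses them, which keeps the loop identical for the 2-, 3- and 4-byte forms. For well-formed addresses its output should agree with Python's built-in `chr(cp).encode("utf-8")`.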




