UTF8ToCharCode and UTF8ToLength for 5 and 6 byte characters
|Reported by:||marcusoverhagen||Owned by:||axeld|
|Has a Patch:||no||Platform:||All|
I think UTF8ToCharCode and UTF8ToLength should be changed to support 5- and 6-byte character sequences.
RFC 3629 states:
"In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets."
It does not limit UTF-8 to that range:
"[...] the ISO/IEC 10646 description of UTF-8 allows encoding character numbers up to U+7FFFFFFF, yielding sequences of up to 6 bytes."
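To illustrate what handling the longer forms could look like, here is a minimal sketch. It is not the existing Haiku code: the names SequenceLength, DecodeCodePoint, and kInvalidCodePoint are hypothetical, and the signatures are assumptions chosen for the example. It only follows the lead-byte bit patterns of the ISO/IEC 10646 scheme (up to 6 bytes, up to U+7FFFFFFF) and does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel for malformed input. A 6-byte sequence carries at
// most 31 payload bits, so this value can never be a decoded result.
static const uint32_t kInvalidCodePoint = 0xFFFFFFFFu;

// Hypothetical helper (not the Haiku API): number of bytes announced by the
// lead byte of a sequence, or 0 if the byte cannot start a sequence.
static inline size_t
SequenceLength(uint8_t lead)
{
	if (lead < 0x80)
		return 1;	// 0xxxxxxx
	if ((lead & 0xE0) == 0xC0)
		return 2;	// 110xxxxx
	if ((lead & 0xF0) == 0xE0)
		return 3;	// 1110xxxx
	if ((lead & 0xF8) == 0xF0)
		return 4;	// 11110xxx
	if ((lead & 0xFC) == 0xF8)
		return 5;	// 111110xx
	if ((lead & 0xFE) == 0xFC)
		return 6;	// 1111110x
	return 0;		// continuation byte (10xxxxxx), 0xFE or 0xFF
}

// Hypothetical helper: decodes one sequence of 1 to 6 bytes from [*bytes, end).
// On success it advances *bytes past the sequence and returns the code point.
// On truncated or malformed input it returns kInvalidCodePoint and leaves
// *bytes unchanged. Overlong forms are NOT detected in this sketch.
static inline uint32_t
DecodeCodePoint(const char** bytes, const char* end)
{
	const uint8_t* p = reinterpret_cast<const uint8_t*>(*bytes);
	size_t available = static_cast<size_t>(end - *bytes);
	if (available == 0)
		return kInvalidCodePoint;

	size_t length = SequenceLength(p[0]);
	if (length == 0 || length > available)
		return kInvalidCodePoint;

	// Payload bits carried by the lead byte, indexed by sequence length.
	static const uint8_t kLeadMask[7] = {0, 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01};

	uint32_t code = p[0] & kLeadMask[length];
	for (size_t i = 1; i < length; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return kInvalidCodePoint;	// not a continuation byte
		code = (code << 6) | (p[i] & 0x3F);
	}

	*bytes += length;
	return code;
}
```

With this sketch, the bytes FD BF BF BF BF BF decode to 0x7FFFFFFF, and F8 88 80 80 80 decode to 0x200000, the first value that needs a 5-byte sequence.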