Ticket #838 (closed bug: invalid)
UTF8ToCharCode and UTF8ToLength for 5 and 6 byte characters
| Reported by: | marcusoverhagen | Owned by: | axeld |
|---|---|---|---|
| Priority: | normal | Milestone: | R1 |
| Component: | Kits/Interface Kit | Version: | |
| Keywords: | | Cc: | |
| Blocked By: | | Platform: | All |
| Blocking: | | | |
Description
I think UTF8ToCharCode and UTF8ToLength should be changed to support 5- and 6-byte character sequences.
RFC 3629 states:
In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets.
It does not limit UTF-8 to that range.
It also says:
[...] the ISO/IEC 10646 description of UTF-8 allows encoding character numbers up to U+7FFFFFFF, yielding sequences of up to 6 bytes.
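To make the request concrete, here is a minimal sketch of what length detection and decoding look like when the 5- and 6-byte forms from the original ISO/IEC 10646 scheme are accepted. The names SequenceLengthFromLeadByte and DecodeSequence are hypothetical and chosen for illustration only; they are not the existing UTF8ToLength or UTF8ToCharCode, whose signatures and behavior are not shown in this ticket. The sketch does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the existing UTF8ToLength): returns the sequence
// length implied by a lead byte, allowing the 5- and 6-byte forms.
// Returns 0 for a continuation byte or an invalid lead byte.
static int SequenceLengthFromLeadByte(unsigned char lead)
{
	if (lead < 0x80) return 1;	// 0xxxxxxx
	if (lead < 0xC0) return 0;	// 10xxxxxx, continuation byte
	if (lead < 0xE0) return 2;	// 110xxxxx
	if (lead < 0xF0) return 3;	// 1110xxxx
	if (lead < 0xF8) return 4;	// 11110xxx
	if (lead < 0xFC) return 5;	// 111110xx
	if (lead < 0xFE) return 6;	// 1111110x
	return 0;			// 0xFE and 0xFF never start a sequence
}

// Hypothetical helper (not the existing UTF8ToCharCode): decodes one sequence
// of 1 to 6 bytes. Returns false if the input is truncated or malformed.
// Assumption: overlong forms and surrogates are not checked in this sketch.
static bool DecodeSequence(const unsigned char* bytes, size_t available,
	uint32_t& codePoint, int& length)
{
	if (available == 0)
		return false;

	length = SequenceLengthFromLeadByte(bytes[0]);
	if (length == 0 || static_cast<size_t>(length) > available)
		return false;

	if (length == 1) {
		codePoint = bytes[0];
		return true;
	}

	// The lead byte carries (7 - length) payload bits in its low end.
	codePoint = bytes[0] & (0xFF >> (length + 1));

	for (int i = 1; i < length; i++) {
		if ((bytes[i] & 0xC0) != 0x80)
			return false;	// not a continuation byte
		codePoint = (codePoint << 6) | (bytes[i] & 0x3F);
	}
	return true;
}
```

With this sketch, the bytes FD BF BF BF BF BF decode to 0x7FFFFFFF, the upper limit quoted above. A decoder limited to the range RFC 3629 describes would instead treat lead bytes 0xF8 and above as invalid and reject values above U+10FFFF.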
