Opened 4 months ago
Closed 4 months ago
#19035 closed bug (fixed)
mbrtowc does not work on 4-bytes characters
Reported by: | bhaible | Owned by: | nobody |
---|---|---|---|
Priority: | high | Milestone: | R1/beta5 |
Component: | System/POSIX | Version: | R1/beta5 |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
Unicode characters in the range U+10000 .. U+10FFFF are encoded in UTF-8 as sequences of 4 bytes. In a UTF-8 locale, mbrtowc should be able to combine these bytes into a single wide character. POSIX specification: https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbrtowc.html
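To make the byte layout concrete, here is a minimal sketch of the 4-byte UTF-8 encoding. The helper name `utf8_encode_4byte` is hypothetical and not part of any library. For U+1F403, the character used in the expected output of this ticket, it produces the bytes F0 9F 90 83.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a code point in U+10000..U+10FFFF
   as 4 UTF-8 bytes. Returns 4 on success, 0 if cp is out of range. */
size_t utf8_encode_4byte(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    out[0] = (unsigned char) (0xF0 | (cp >> 18));          /* 11110xxx */
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 4;
}
```

mbrtowc has to perform the inverse of this mapping, possibly across several calls when the bytes arrive one at a time.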
This does not work in Haiku hrev57823 (from 2024-07-15).
How to reproduce: Compile and run the attached test program.
gcc -Wall foo.c
./a.out
Expected output:
ret = 1, wc = 0x0001F403
Actual output:
ret = -1, wc = 0x0BADFACE
a.out: foo.c:92:main: ret == 1
In Gnulib, most of the Unicode support is based on mbrtowc. Due to this bug, 31 tests of the test suite fail.
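The attached foo.c is not reproduced in the ticket text. As a rough sketch of the kind of check involved, the hypothetical helper below feeds a multibyte sequence to mbrtowc one byte at a time. It is only an assumption, suggested by the expected `ret = 1`, that the attached test works in a similar way. The sketch also assumes a 32-bit wchar_t and that the caller has already selected a UTF-8 locale with setlocale; the locale name to use (for example `en_US.UTF-8`) depends on the system.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: feed the n bytes at s to mbrtowc one byte at a time.
   Returns the result of the last mbrtowc call. *wc is preset to a sentinel
   so that an untouched output is easy to spot. */
size_t decode_bytewise(const char *s, size_t n, wchar_t *wc)
{
    mbstate_t state;
    size_t ret = (size_t) -2;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    *wc = (wchar_t) 0x0BADFACE;        /* sentinel, assumes 32-bit wchar_t */

    for (size_t i = 0; i < n; i++) {
        ret = mbrtowc(wc, s + i, 1, &state);
        if (ret != (size_t) -2)        /* -2 means "incomplete, need more bytes" */
            break;
    }
    return ret;
}
```

After a successful `setlocale(LC_ALL, ...)` to a UTF-8 locale, calling `decode_bytewise("\xF0\x9F\x90\x83", 4, &wc)` on a conforming implementation should return 1 with `wc == 0x1F403`: the first three calls return `(size_t) -2`, and the fourth consumes one byte and completes the character. A return value of `(size_t) -1` with the sentinel still in `wc` corresponds to the failure reported above.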
Attachments (1)
Change History (2)
by , 4 months ago
comment:1 by , 4 months ago
Milestone: | Unscheduled → R1/beta5 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in hrev58063 +beta5.
test case foo.c