Opened 2 months ago

Closed 2 months ago

#19035 closed bug (fixed)

mbrtowc does not work on 4-byte characters

Reported by: bhaible
Owned by: nobody
Priority: high
Milestone: R1/beta5
Component: System/POSIX
Version: R1/beta5
Keywords: (none)
Cc: (none)
Blocked By: (none)
Blocking: (none)
Platform: All

Description

In UTF-8, Unicode characters in the range U+10000 .. U+10FFFF are encoded as sequences of 4 bytes. In a UTF-8 locale, mbrtowc should combine these 4 bytes into a single wide character. POSIX specification: https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbrtowc.html

This does not work in Haiku hrev57823 (from 2024-07-15).

How to reproduce: Compile and run the attached test program.

gcc -Wall foo.c
./a.out

Expected output:

ret = 1, wc = 0x0001F403

Actual output:

ret = -1, wc = 0x0BADFACE
a.out: foo.c:92:main: ret == 1
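The attached foo.c is not reproduced in this ticket. The sketch below shows the kind of check it appears to perform, though that is an assumption based on the expected ret = 1 and the 0x0BADFACE sentinel. The hypothetical helper feeds the bytes F0 9F 90 83 (U+1F403) to mbrtowc one at a time. Per POSIX, the first three calls should return (size_t)-2 (incomplete character), and the fourth should return 1, the number of bytes that completed the character.

```c
#include <assert.h>
#include <locale.h>   /* setlocale, needed by callers before using the helper */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: feeds the UTF-8 encoding of U+1F403 to mbrtowc
   one byte at a time. Assumes LC_CTYPE is already a UTF-8 locale.
   Returns the result of the last mbrtowc call made and stores the
   (possibly untouched) wide character in *out. */
size_t feed_u1f403_bytewise(wchar_t *out)
{
    static const char bytes[] = "\xF0\x9F\x90\x83";
    mbstate_t state;
    wchar_t wc = (wchar_t) 0xBADFACE;   /* sentinel, as in the report */
    size_t ret = 0;

    assert(out != NULL);
    memset(&state, 0, sizeof state);    /* initial conversion state */
    for (size_t i = 0; i < 4; i++) {
        ret = mbrtowc(&wc, &bytes[i], 1, &state);
        /* Bytes 1..3 are an incomplete character: anything other than
           (size_t)-2 here means the conversion already went wrong. */
        if (i < 3 && ret != (size_t) -2)
            break;
    }
    *out = wc;
    return ret;
}
```

On a conforming implementation, after a successful setlocale(LC_ALL, ...) with a UTF-8 locale, this helper should return 1 and set wc to 0x1F403. The locale name varies by system (for example en_US.UTF-8 or C.UTF-8). The output above shows ret = -1 on hrev57823.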

In Gnulib, most of the Unicode support is based on mbrtowc. Because of this bug, 31 tests in the Gnulib test suite fail.

Attachments (1)

foo.c (2.4 KB) - added by bhaible 2 months ago.
test case foo.c


Change History (2)

by bhaible, 2 months ago

Attachment: foo.c added

test case foo.c

comment:1 by waddlesplash, 2 months ago

Milestone: Unscheduled → R1/beta5
Resolution: fixed
Status: new → closed

Fixed in hrev58063 +beta5.
