Opened 17 years ago
Closed 16 years ago
#2041 closed bug (fixed)
In some cases, __mbrtowc can enter in an infinite loop
Reported by: | oco | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/libroot.so | Version: | R1/pre-alpha1 |
Keywords: | | Cc: | mjw |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
When compiling the current ruby source tree, miniruby (a reduced version of ruby used in the build process) eats all the CPU when initializing its regex engine.
The kernel debugger shows this call stack:
mbrtowc mblen ruby_re_mbcinit ruby_incpush init_load_path ruby_process_options ruby_options main
I will investigate further to identify the special case in which this infinite loop occurs.
Here are the current steps needed to reproduce the problem:
- get latest ruby source tree : svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8
- apply the patch ruby_on_haiku_06_04_2008.diff (see attached files)
- run autoconf
- ./configure
- make
The second file is an evil "workaround" based on the previous "dummy" implementation. It enables a complete build of ruby under Haiku. If you are not sure about the "evilness", look at the file size...
Attachments (3)
Change History (13)
by , 17 years ago
Attachment: | ruby_on_haiku_06_04_2008.diff added |
---|
by , 17 years ago
Attachment: | ugly_patch_to_compile_ruby.diff added |
---|
comment:1 by , 17 years ago
follow-up: 3 comment:2 by , 17 years ago
Maybe the BSDs have an implementation that is less involved with other code? One of the biggest problems in glibc code is its poor modularity IMO.
comment:3 by , 17 years ago
Replying to axeld:
Maybe the BSDs have an implementation that is less involved with other code?
I don't think we can use the BSD code just for the wide char stuff, since in glibc it's all integrated (libio, locale, stdlib, etc.). For example, BSD's fwread() won't be able to operate on glibc's FILE structs. So either we drop glibc completely or we take it as a whole.
One of the biggest problems in glibc code is its poor modularity IMO.
Yeah, I noticed that while trying to fix this issue.
comment:4 by , 17 years ago
I also found this bug when trying to port file, and I think I have found out what is causing it. I'm compiling the program under Haiku in qemu.
glibc expects wchar_t to be 4 bytes wide. However, the current build of GCC ends up doing "typedef short unsigned int wchar_t", meaning that wchar_t is only 2 bytes wide.
The line in question is in: /boot/develop/tools/gnupro/lib/gcc-lib/i586-pc-haiku-080323/include/stddef.h, line 251. I _think_ that WCHAR_TYPE is defined somewhere inside the compiler - there is an interesting note about it on line 88 of stddef.h.
I have attached a noddy test program. The test will fail by default. If you uncomment the 4 lines from line 37, then it will work (albeit with a warning).
Hope that helps.
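For readers without the attachment, here is a minimal sketch of the kind of mbrtowc() decoding loop such a test exercises. It is not the attached mbrtowc_test.c; the function name count_mb_chars is hypothetical. The loop itself always advances by the number of bytes mbrtowc() reports, so it cannot spin on its own, but it obviously cannot protect against a loop inside mbrtowc() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in a multibyte string with mbrtowc().
 * Returns the character count, or -1 on an invalid or truncated sequence.
 * The name count_mb_chars is hypothetical; it is not from the attached test.
 */
long count_mb_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof(state));   /* start in the initial shift state */

    size_t remaining = strlen(s);
    long count = 0;

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* invalid or incomplete sequence */
        if (n == 0)
            break;                      /* decoded a terminating null */

        /* n is between 1 and remaining here, so the loop always advances */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

Calling count_mb_chars("hello") in the default "C" locale should return 5; printing sizeof(wchar_t) next to it shows whether the compiler and the C library agree on the width.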
by , 17 years ago
Attachment: | mbrtowc_test.c added |
---|
A test case that can be made to fail or succeed.
comment:5 by , 17 years ago
Cc: | added |
---|
BTW: Proof that glibc expects wchar_t to be 4 bytes:
- When the test app runs on Linux, it reports that sizeof(wchar_t) is 4 bytes; when run on Haiku, it reports that sizeof(wchar_t) is 2 bytes.
- src/system/libroot/posix/glibc/iconv/gconv_simple.c, line 802, sets MIN_NEEDED_TO to 4 (which, I'm assuming, means 4 bytes).
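To illustrate why the width matters, the sketch below shows what happens to a 32-bit code point when it is stored in a 16-bit type, which is what a 2-byte wchar_t amounts to. Both helper names are hypothetical, and this only demonstrates the general mismatch; it does not claim to reproduce the exact path that makes mbrtowc() loop.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: what a 2-byte wchar_t keeps of a 32-bit code point. */
uint16_t store_in_16_bits(uint32_t code_point)
{
    return (uint16_t)code_point;   /* the upper 16 bits are discarded */
}

/* Hypothetical helper: returns 1 if wchar_t is at least 4 bytes wide here. */
int wchar_is_4_bytes(void)
{
    return sizeof(wchar_t) >= 4;
}
```

For example, store_in_16_bits(0x1F600) yields 0xF600, so any code that writes 4-byte values through a pointer to a 2-byte wchar_t either truncates them or writes past the object.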
comment:7 by , 16 years ago
Using the wchar fix to build sed-4.2.1 I now get this when I run make check:
~/sed-4.2.1> make check
Making check in lib
make[1]: Entering directory `/boot/home/sed-4.2.1/lib'
make check-recursive
make[2]: Entering directory `/boot/home/sed-4.2.1/lib'
make[3]: Entering directory `/boot/home/sed-4.2.1/lib'
make[3]: Nothing to be done for `check-am'.
make[3]: Leaving directory `/boot/home/sed-4.2.1/lib'
make[2]: Leaving directory `/boot/home/sed-4.2.1/lib'
make[1]: Leaving directory `/boot/home/sed-4.2.1/lib'
Making check in po
make[1]: Entering directory `/boot/home/sed-4.2.1/po'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/po'
Making check in sed
make[1]: Entering directory `/boot/home/sed-4.2.1/sed'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/sed'
Making check in doc
make[1]: Entering directory `/boot/home/sed-4.2.1/doc'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/doc'
Making check in testsuite
make[1]: Entering directory `/boot/home/sed-4.2.1/testsuite'
make
make[2]: Entering directory `/boot/home/sed-4.2.1/testsuite'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make check-TESTS
make[2]: Entering directory `/boot/home/sed-4.2.1/testsuite'
PASS: appquit
PASS: enable
PASS: sep
PASS: inclib
PASS: 8bit
PASS: newjis
PASS: xabcx
PASS: dollar
PASS: noeol
PASS: noeolw
PASS: modulo
PASS: numsub
PASS: numsub2
PASS: numsub3
PASS: numsub4
PASS: numsub5
PASS: 0range
PASS: bkslashes
PASS: head
PASS: madding
PASS: mac-mf
PASS: empty
PASS: xbxcx
PASS: xbxcx3
PASS: recall
PASS: recall2
PASS: xemacs
PASS: fasts
PASS: uniq
PASS: manis
PASS: khadafy
PASS: linecnt
PASS: eval
PASS: distrib
PASS: 8to7
PASS: y-bracket
PASS: y-newline
PASS: allsub
PASS: cv-vars
PASS: classes
PASS: middle
PASS: bsd
PASS: stdin
PASS: flipcase
PASS: insens
PASS: subwrite
PASS: writeout
PASS: readin
PASS: insert
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
" ../sed/sed -f ./utf8-1.sed" \
"< ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out"; \
LANG=ru_RU.UTF-8 \
../sed/sed -f ./utf8-1.sed \
< ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out; \
cmp ./utf8-1.good utf8-1.out && exit 0; \
cmp ./utf8-1.inp utf8-1.out || exit 1; \
locale > utf8-1.info 2>/dev/null || { rm utf8-1.info; :>utf8-1.skip; exit 77; }; \
. utf8-1.info; rm utf8-1.info; \
case "$LC_CTYPE" in \
*UTF-8 | *UTF8 | *utf8 | *utf-8) \
echo " ../sed/sed -f ./utf8-1.sed" \
" < ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out"; \
../sed/sed -f ./utf8-1.sed \
< ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out; \
cmp ./utf8-1.good utf8-1.out && exit 0; \
cmp ./utf8-1.inp utf8-1.out || exit 1 ;; \
*) ;; \
esac; \
:>utf8-1.skip; exit 77
LANG=ru_RU.UTF-8 ../sed/sed -f ./utf8-1.sed < ./utf8-1.inp | LC_ALL=C tr -d \r > utf8-1.out
./utf8-1.good utf8-1.out differ: char 1, line 1
./utf8-1.inp utf8-1.out differ: char 1, line 1
make[3]: *** [utf8-1] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-1
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
" ../sed/sed -f ./utf8-2.sed" \
"< ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out"; \
LANG=ru_RU.UTF-8 \
../sed/sed -f ./utf8-2.sed \
< ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out; \
cmp ./utf8-2.good utf8-2.out && exit 0; \
cmp ./utf8-2.inp utf8-2.out || exit 1; \
locale > utf8-2.info 2>/dev/null || { rm utf8-2.info; :>utf8-2.skip; exit 77; }; \
. utf8-2.info; rm utf8-2.info; \
case "$LC_CTYPE" in \
*UTF-8 | *UTF8 | *utf8 | *utf-8) \
echo " ../sed/sed -f ./utf8-2.sed" \
" < ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out"; \
../sed/sed -f ./utf8-2.sed \
< ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out; \
cmp ./utf8-2.good utf8-2.out && exit 0; \
cmp ./utf8-2.inp utf8-2.out || exit 1 ;; \
*) ;; \
esac; \
:>utf8-2.skip; exit 77
LANG=ru_RU.UTF-8 ../sed/sed -f ./utf8-2.sed < ./utf8-2.inp | LC_ALL=C tr -d \r > utf8-2.out
./utf8-2.good utf8-2.out differ: char 1, line 1
./utf8-2.inp utf8-2.out differ: char 1, line 1
make[3]: *** [utf8-2] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-2
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
" ../sed/sed -f ./utf8-3.sed" \
"< ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out"; \
LANG=ru_RU.UTF-8 \
../sed/sed -f ./utf8-3.sed \
< ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out; \
cmp ./utf8-3.good utf8-3.out && exit 0; \
cmp ./utf8-3.inp utf8-3.out || exit 1; \
locale > utf8-3.info 2>/dev/null || { rm utf8-3.info; :>utf8-3.skip; exit 77; }; \
. utf8-3.info; rm utf8-3.info; \
case "$LC_CTYPE" in \
*UTF-8 | *UTF8 | *utf8 | *utf-8) \
echo " ../sed/sed -f ./utf8-3.sed" \
" < ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out"; \
../sed/sed -f ./utf8-3.sed \
< ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out; \
cmp ./utf8-3.good utf8-3.out && exit 0; \
cmp ./utf8-3.inp utf8-3.out || exit 1 ;; \
*) ;; \
esac; \
:>utf8-3.skip; exit 77
LANG=ru_RU.UTF-8 ../sed/sed -f ./utf8-3.sed < ./utf8-3.inp | LC_ALL=C tr -d \r > utf8-3.out
./utf8-3.good utf8-3.out differ: char 1, line 1
./utf8-3.inp utf8-3.out differ: char 1, line 1
make[3]: *** [utf8-3] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-3
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
" ../sed/sed -f ./utf8-4.sed" \
"< ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out"; \
LANG=ru_RU.UTF-8 \
../sed/sed -f ./utf8-4.sed \
< ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out; \
cmp ./utf8-4.good utf8-4.out && exit 0; \
cmp ./utf8-4.inp utf8-4.out || exit 1; \
locale > utf8-4.info 2>/dev/null || { rm utf8-4.info; :>utf8-4.skip; exit 77; }; \
. utf8-4.info; rm utf8-4.info; \
case "$LC_CTYPE" in \
*UTF-8 | *UTF8 | *utf8 | *utf-8) \
echo " ../sed/sed -f ./utf8-4.sed" \
" < ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out"; \
../sed/sed -f ./utf8-4.sed \
< ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out; \
cmp ./utf8-4.good utf8-4.out && exit 0; \
cmp ./utf8-4.inp utf8-4.out || exit 1 ;; \
*) ;; \
esac; \
:>utf8-4.skip; exit 77
LANG=ru_RU.UTF-8 ../sed/sed -f ./utf8-4.sed < ./utf8-4.inp | LC_ALL=C tr -d \r > utf8-4.out
./utf8-4.good utf8-4.out differ: char 1, line 1
./utf8-4.inp utf8-4.out differ: char 1, line 1
make[3]: *** [utf8-4] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-4
PASS: badenc
PASS: inplace-hold
PASS: brackets
PASS: help
PASS: version
PASS: file
PASS: quiet
PASS: factor
PASS: binary3
PASS: binary2
PASS: binary
PASS: dc
======================================================
All 65 tests behaved as expected (4 expected failures)
======================================================
make[2]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make[1]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make[1]: Entering directory `/boot/home/sed-4.2.1'
make[1]: Leaving directory `/boot/home/sed-4.2.1'
~/sed-4.2.1>
So it appears that the wchar fix will take care of the reported failures with sed.
comment:8 by , 16 years ago
The 32bit-wchar_t branch (hrev31395) also fixes the problem when building ruby 1.8.
comment:9 by , 16 years ago
This ticket can be closed now that the wchar32 fix has been moved to trunk. The attached test case gives this output:
~> mbrtowc_test
4, 4
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
9
~>
comment:10 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Closing, since the problem has been fixed by the wchar_t changes.
oco and scottmc: thanks for the feedback!
The wide char support is broken in our glibc (see #1855). Looks like we'll also need glibc's locale support (see #1881). I tried to have a look but couldn't achieve anything useful, especially because we're using mixed versions of glibc. Maybe I should revert the preliminary changes I've done to mbrtowc(), since they're probably causing this bug.