Opened 17 years ago

Closed 16 years ago

#2041 closed bug (fixed)

In some cases, __mbrtowc can enter an infinite loop

Reported by: oco
Owned by: axeld
Priority: normal
Milestone: R1
Component: System/libroot.so
Version: R1/pre-alpha1
Keywords:
Cc: mjw
Blocked By:
Blocking:
Platform: All

Description

When compiling the current Ruby source tree, miniruby (a reduced version of Ruby used in the build process) eats all the CPU while initializing its regex engine.

The kernel debugger shows this call stack:

mbrtowc mblen ruby_re_mbcinit ruby_incpush init_load_path ruby_process_options ruby_options main

I will investigate further to identify the special case in which this infinite loop occurs.

Here are the current steps needed to reproduce the problem:

The second file is an evil "workaround" based on the previous "dummy" implementation. It enables a complete build of Ruby under Haiku. If you are not sure about the "evilness", look at the file size...

Attachments (3)

ruby_on_haiku_06_04_2008.diff (61.6 KB ) - added by oco 17 years ago.
ugly_patch_to_compile_ruby.diff (666 bytes ) - added by oco 17 years ago.
mbrtowc_test.c (2.7 KB ) - added by mjw 17 years ago.
A test case that can be made to fail or succeed.


Change History (13)

comment:1 by jackburton, 17 years ago

The wide char support is broken in our glibc (see #1855). Looks like we'll also need glibc's locale support (see #1881). I tried to have a look but couldn't achieve anything useful, especially because we're using mixed versions of glibc. Maybe I should revert the preliminary changes I've made to mbrtowc(), since they're probably causing this bug.

comment:2 by axeld, 17 years ago

Maybe the BSDs have an implementation that is less involved with other code? One of the biggest problems in glibc code is its poor modularity IMO.

comment:3 by jackburton, 17 years ago (in reply to: comment:2)

Replying to axeld:

> Maybe the BSDs have an implementation that is less involved with other code?

I don't think we can use the BSD code just for the wide-char stuff, since in glibc it's all integrated (libio, locale, stdlib, etc.). For example, BSD's fwread() won't be able to operate on glibc's FILE structs. So either we drop glibc completely or we take it as a whole.

> One of the biggest problems in glibc code is its poor modularity IMO.

Yeah, I noticed that while trying to fix this issue.

comment:4 by mjw, 17 years ago

I also found this bug when trying to port file, and I think I have found out what is causing it. I'm compiling the program under Haiku in qemu.

glibc expects wchar_t to be 4 bytes wide. However, the current build of GCC ends up doing "typedef short unsigned int wchar_t", meaning that wchar_t is only 2 bytes wide.

The line in question is in: /boot/develop/tools/gnupro/lib/gcc-lib/i586-pc-haiku-080323/include/stddef.h, line 251. I _think_ that WCHAR_TYPE is defined somewhere inside the compiler - there is an interesting note about it on line 88 of stddef.h.

I have attached a noddy test program. The test will fail by default. If you uncomment the 4 lines from line 37, then it will work (albeit with a warning).

Hope that helps.

by mjw, 17 years ago

Attachment: mbrtowc_test.c added

A test case that can be made to fail or succeed.

comment:5 by mjw, 17 years ago

Cc: mjw added

BTW: Proof that glibc expects wchar_t to be 4 bytes:

  1. When the test app runs on Linux, it reports that sizeof(wchar_t) is 4 bytes; when run on Haiku, it reports that sizeof(wchar_t) is 2 bytes.
  2. src/system/libroot/posix/glibc/iconv/gconv_simple.c, line 802, sets MIN_NEEDED_TO to 4 (which, I'm assuming, means 4 bytes).

comment:6 by kaliber, 17 years ago

Due to bug(s) in mbrtowc() 16 tests from sed-4.1.5 are failing.

comment:7 by scottmc, 16 years ago

Using the wchar fix to build sed-4.2.1 I now get this when I run make check:

~/sed-4.2.1> make check
Making check in lib
make[1]: Entering directory `/boot/home/sed-4.2.1/lib'
make  check-recursive
make[2]: Entering directory `/boot/home/sed-4.2.1/lib'
make[3]: Entering directory `/boot/home/sed-4.2.1/lib'
make[3]: Nothing to be done for `check-am'.
make[3]: Leaving directory `/boot/home/sed-4.2.1/lib'
make[2]: Leaving directory `/boot/home/sed-4.2.1/lib'
make[1]: Leaving directory `/boot/home/sed-4.2.1/lib'
Making check in po
make[1]: Entering directory `/boot/home/sed-4.2.1/po'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/po'
Making check in sed
make[1]: Entering directory `/boot/home/sed-4.2.1/sed'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/sed'
Making check in doc
make[1]: Entering directory `/boot/home/sed-4.2.1/doc'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/boot/home/sed-4.2.1/doc'
Making check in testsuite
make[1]: Entering directory `/boot/home/sed-4.2.1/testsuite'
make  
make[2]: Entering directory `/boot/home/sed-4.2.1/testsuite'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make  check-TESTS
make[2]: Entering directory `/boot/home/sed-4.2.1/testsuite'
PASS: appquit
PASS: enable
PASS: sep
PASS: inclib
PASS: 8bit
PASS: newjis
PASS: xabcx
PASS: dollar
PASS: noeol
PASS: noeolw
PASS: modulo
PASS: numsub
PASS: numsub2
PASS: numsub3
PASS: numsub4
PASS: numsub5
PASS: 0range
PASS: bkslashes
PASS: head
PASS: madding
PASS: mac-mf
PASS: empty
PASS: xbxcx
PASS: xbxcx3
PASS: recall
PASS: recall2
PASS: xemacs
PASS: fasts
PASS: uniq
PASS: manis
PASS: khadafy
PASS: linecnt
PASS: eval
PASS: distrib
PASS: 8to7
PASS: y-bracket
PASS: y-newline
PASS: allsub
PASS: cv-vars
PASS: classes
PASS: middle
PASS: bsd
PASS: stdin
PASS: flipcase
PASS: insens
PASS: subwrite
PASS: writeout
PASS: readin
PASS: insert
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
          " ../sed/sed -f ./utf8-1.sed" \
          "< ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out"; \
        LANG=ru_RU.UTF-8 \
           ../sed/sed -f ./utf8-1.sed \
                < ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out; \
        cmp ./utf8-1.good utf8-1.out && exit 0; \
        cmp ./utf8-1.inp utf8-1.out || exit 1; \
        locale > utf8-1.info 2>/dev/null || { rm utf8-1.info; :>utf8-1.skip; exit 77; }; \
        . utf8-1.info; rm utf8-1.info; \
        case "$LC_CTYPE" in \
          *UTF-8 | *UTF8 | *utf8 | *utf-8) \
            echo " ../sed/sed -f ./utf8-1.sed" \
              " < ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out"; \
             ../sed/sed -f ./utf8-1.sed \
                < ./utf8-1.inp | LC_ALL=C tr -d \\r > utf8-1.out; \
            cmp ./utf8-1.good utf8-1.out && exit 0; \
            cmp ./utf8-1.inp utf8-1.out || exit 1 ;; \
          *) ;; \
        esac; \
        :>utf8-1.skip; exit 77
LANG=ru_RU.UTF-8  ../sed/sed -f ./utf8-1.sed < ./utf8-1.inp | LC_ALL=C tr -d \r > utf8-1.out
./utf8-1.good utf8-1.out differ: char 1, line 1
./utf8-1.inp utf8-1.out differ: char 1, line 1
make[3]: *** [utf8-1] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-1
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
          " ../sed/sed -f ./utf8-2.sed" \
          "< ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out"; \
        LANG=ru_RU.UTF-8 \
           ../sed/sed -f ./utf8-2.sed \
                < ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out; \
        cmp ./utf8-2.good utf8-2.out && exit 0; \
        cmp ./utf8-2.inp utf8-2.out || exit 1; \
        locale > utf8-2.info 2>/dev/null || { rm utf8-2.info; :>utf8-2.skip; exit 77; }; \
        . utf8-2.info; rm utf8-2.info; \
        case "$LC_CTYPE" in \
          *UTF-8 | *UTF8 | *utf8 | *utf-8) \
            echo " ../sed/sed -f ./utf8-2.sed" \
              " < ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out"; \
             ../sed/sed -f ./utf8-2.sed \
                < ./utf8-2.inp | LC_ALL=C tr -d \\r > utf8-2.out; \
            cmp ./utf8-2.good utf8-2.out && exit 0; \
            cmp ./utf8-2.inp utf8-2.out || exit 1 ;; \
          *) ;; \
        esac; \
        :>utf8-2.skip; exit 77
LANG=ru_RU.UTF-8  ../sed/sed -f ./utf8-2.sed < ./utf8-2.inp | LC_ALL=C tr -d \r > utf8-2.out
./utf8-2.good utf8-2.out differ: char 1, line 1
./utf8-2.inp utf8-2.out differ: char 1, line 1
make[3]: *** [utf8-2] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-2
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
          " ../sed/sed -f ./utf8-3.sed" \
          "< ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out"; \
        LANG=ru_RU.UTF-8 \
           ../sed/sed -f ./utf8-3.sed \
                < ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out; \
        cmp ./utf8-3.good utf8-3.out && exit 0; \
        cmp ./utf8-3.inp utf8-3.out || exit 1; \
        locale > utf8-3.info 2>/dev/null || { rm utf8-3.info; :>utf8-3.skip; exit 77; }; \
        . utf8-3.info; rm utf8-3.info; \
        case "$LC_CTYPE" in \
          *UTF-8 | *UTF8 | *utf8 | *utf-8) \
            echo " ../sed/sed -f ./utf8-3.sed" \
              " < ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out"; \
             ../sed/sed -f ./utf8-3.sed \
                < ./utf8-3.inp | LC_ALL=C tr -d \\r > utf8-3.out; \
            cmp ./utf8-3.good utf8-3.out && exit 0; \
            cmp ./utf8-3.inp utf8-3.out || exit 1 ;; \
          *) ;; \
        esac; \
        :>utf8-3.skip; exit 77
LANG=ru_RU.UTF-8  ../sed/sed -f ./utf8-3.sed < ./utf8-3.inp | LC_ALL=C tr -d \r > utf8-3.out
./utf8-3.good utf8-3.out differ: char 1, line 1
./utf8-3.inp utf8-3.out differ: char 1, line 1
make[3]: *** [utf8-3] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-3
make[3]: Entering directory `/boot/home/sed-4.2.1/testsuite'
echo "LANG=ru_RU.UTF-8" \
          " ../sed/sed -f ./utf8-4.sed" \
          "< ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out"; \
        LANG=ru_RU.UTF-8 \
           ../sed/sed -f ./utf8-4.sed \
                < ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out; \
        cmp ./utf8-4.good utf8-4.out && exit 0; \
        cmp ./utf8-4.inp utf8-4.out || exit 1; \
        locale > utf8-4.info 2>/dev/null || { rm utf8-4.info; :>utf8-4.skip; exit 77; }; \
        . utf8-4.info; rm utf8-4.info; \
        case "$LC_CTYPE" in \
          *UTF-8 | *UTF8 | *utf8 | *utf-8) \
            echo " ../sed/sed -f ./utf8-4.sed" \
              " < ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out"; \
             ../sed/sed -f ./utf8-4.sed \
                < ./utf8-4.inp | LC_ALL=C tr -d \\r > utf8-4.out; \
            cmp ./utf8-4.good utf8-4.out && exit 0; \
            cmp ./utf8-4.inp utf8-4.out || exit 1 ;; \
          *) ;; \
        esac; \
        :>utf8-4.skip; exit 77
LANG=ru_RU.UTF-8  ../sed/sed -f ./utf8-4.sed < ./utf8-4.inp | LC_ALL=C tr -d \r > utf8-4.out
./utf8-4.good utf8-4.out differ: char 1, line 1
./utf8-4.inp utf8-4.out differ: char 1, line 1
make[3]: *** [utf8-4] Error 1
make[3]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
XFAIL: utf8-4
PASS: badenc
PASS: inplace-hold
PASS: brackets
PASS: help
PASS: version
PASS: file
PASS: quiet
PASS: factor
PASS: binary3
PASS: binary2
PASS: binary
PASS: dc
======================================================
All 65 tests behaved as expected (4 expected failures)
======================================================
make[2]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make[1]: Leaving directory `/boot/home/sed-4.2.1/testsuite'
make[1]: Entering directory `/boot/home/sed-4.2.1'
make[1]: Leaving directory `/boot/home/sed-4.2.1'
~/sed-4.2.1> 

So it appears that the wchar fix will take care of the reported failures with sed.

comment:8 by oco, 16 years ago

The 32bit-wchar_t branch (hrev31395) also fixes the problem when building ruby 1.8.

comment:9 by scottmc, 16 years ago

This ticket can be closed now that the wchar32 fix has been moved to trunk. The attached test case gives this output:

~> mbrtowc_test
4, 4
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
mjw... in while 1
9
~> 

comment:10 by zooey, 16 years ago

Resolution: fixed
Status: new → closed

Closing, since the problem has been fixed by the wchar_t changes.

oco and scottmc: thanks for the feedback!
