Opened 9 years ago

Closed 9 years ago

#5321 closed bug (invalid)

vm_soft_fault while compiling ruby, reproducible.

Reported by: kallisti5
Owned by: axeld
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords:
Cc: Jens.Arm@…
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev35267

KERN: vm_soft_fault: va 0x0 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x0, write 0, user 1, thread 0xe7
KERN: vm_page_fault: thread "miniruby" (231) in team "miniruby" (231) tried to read address 0x0, ip 0x0 ("???" +0x0)

Easily reproduced by checking out the Ruby main branch via svn, then running configure and make.

Attachments (1)

strace.out (45.9 KB) - added by kallisti5 9 years ago.
strace -T of miniruby seg


Change History (25)

comment:1 Changed 9 years ago by kallisti5

KERN: vm_soft_fault: va 0x0 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x0, write 0, user 1, thread 0xe7
KERN: vm_page_fault: thread "miniruby" (231) in team "miniruby" (231) tried to read address 0x0, ip 0x0 ("???" +0x0)

also..

http://redmine.ruby-lang.org/issues/show/2641

comment:2 Changed 9 years ago by jahaiku

Cc: Jens.Arm@… added

comment:3 Changed 9 years ago by bonefish

As always, please attach at least a stack trace.

comment:4 Changed 9 years ago by kallisti5

no can do. GDB never gets called and there is no kernel panic.

Is there a preferred way to create a stack trace in this scenario? I think I already tried attaching gdb but my attempts were unsuccessful.

comment:5 Changed 9 years ago by bonefish

The easiest way is probably to find the disable_debugger() call in the miniruby sources and comment it out.

comment:6 Changed 9 years ago by kallisti5

From an initial quick search, I don't see where Ruby is disabling Haiku's debugger.

I am going to try building this on R1A1 this afternoon to see whether the same thing occurs. That should help rule out the recent vm* changes in the kernel.

comment:7 Changed 9 years ago by kallisti5

ok.. weird but good news. compiling ruby (same code) on R1A1 brings up the gdb crash window at the same step that was failing in hrev35267.

GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-pc-haiku"...
[tcsetpgrp failed in terminal_inferior: Invalid Argument]
Thread 8823 caused an exception: Segment violation
Reading symbols from /boot/system/runtime_loader...done.
Loaded symbols for /boot/system/runtime_loader
Reading symbols from /boot/system/lib/gcc4/libroot.so...done.
Loaded symbols for /boot/system/lib/gcc4/libroot.so
[tcsetpgrp failed in terminal_inferior: Invalid Argument]
[Switching to team ./miniruby -I./lib -I.ext/common -I./- -r./ext/purelib.rb ./too (8822) thread pthread func (8823)]
0x002471ee in report_bug (file=0xffffffff <Address 0xffffffff out of bounds>, 
    line=-1, fmt=0x365d05 "%s: %s (%s)", args=0x70004f38 "S\2068")
    at error.c:216
216     {
(gdb) bt
#0  0x002471ee in report_bug (
    file=0xffffffff <Address 0xffffffff out of bounds>, line=-1, 
    fmt=0x365d05 "%s: %s (%s)", args=0x70004f38 "S\2068") at error.c:216
#1  0x00247392 in rb_bug (fmt=0x365d05 "%s: %s (%s)") at error.c:245
#2  0x0024797b in rb_bug_errno (mesg=0x388653 "thread_timer/timedwait", 
    errno_arg=3704403) at error.c:259
#3  0x0035ed6f in thread_timer (dummy=0x0) at thread_pthread.c:769
#4  0x004020fb in pthread_thread_entry () from /boot/system/lib/gcc4/libroot.so
#5  0x70004fec in ?? ()

comment:8 Changed 9 years ago by bonefish

So far that doesn't look like a Haiku problem. Unless you've anything that suggests otherwise, I would close the ticket.

Regarding the missing debugger invocation, another (I guess even more likely) reason would be that a SIGSEGV handler is installed. Normally the handler would do something useful (like handling the error or notifying the user), though.

comment:9 Changed 9 years ago by kallisti5

http://dev.haiku-os.org/browser/haiku/trunk/src/system/libroot/posix/pthread/pthread_cond.c

Haiku's pthread_cond_timedwait returns 3704403, which is neither ETIMEDOUT nor EINTR.

http://www.opengroup.org/onlinepubs/000095399/functions/pthread_cond_timedwait.html states that the function returns zero on success; otherwise, an error number shall be returned to indicate the error.

Any ideas what error 3704403 points to?
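For anyone wanting to check the return value outside of miniruby, here is a minimal sketch, assuming nothing about Ruby's own thread_timer code. The helper names timedwait_once and is_expected_timedwait_result are hypothetical. It waits on a condition variable that nobody signals and reports the raw result, which should be ETIMEDOUT on a conforming implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not taken from the Ruby sources): waits on a
 * condition variable that is never signalled, with a deadline timeout_ms
 * from now, and returns the raw result of pthread_cond_timedwait(). */
int timedwait_once(long timeout_ms)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    struct timespec deadline;
    int result;

    /* The deadline is absolute; the default condition clock is CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    /* A result of 0 here can only be a spurious wakeup, so wait again. */
    do {
        result = pthread_cond_timedwait(&cond, &lock, &deadline);
    } while (result == 0);
    pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return result;
}

/* Returns 1 for the values mentioned in this comment (success, ETIMEDOUT,
 * EINTR) and 0 for anything else, such as the 3704403 seen above. */
int is_expected_timedwait_result(int result)
{
    return result == 0 || result == ETIMEDOUT || result == EINTR;
}
```

Calling timedwait_once(20) and passing the result to is_expected_timedwait_result() gives a quick yes/no answer without involving the Ruby build at all.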

comment:10 in reply to:  9 Changed 9 years ago by bonefish

Replying to kallisti5:

Any ideas what error 3704403 points to?

A bug, I would say. Haiku's error codes are negative numbers, so this is definitely not a valid return code. Save for the result of an acquire_sem_etc() call, pthread_cond_timedwait() returns only constants. Please verify by running with strace.
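One detail from the backtrace in comment:7 may be worth checking against this: the errno_arg value gdb prints in frame #2, 3704403, is numerically the same as the mesg pointer 0x388653 in that frame. The sketch below only does that arithmetic and applies the "negative or zero" rule described above. The helper names are hypothetical, and the ticket does not establish whether gdb misread the argument or the value really came back from pthread_cond_timedwait().

```c
#include <stdio.h>

/* Hypothetical helper: per the comment above, Haiku error codes are
 * negative and 0 means success, so a positive value cannot be valid. */
int is_plausible_haiku_status(long value)
{
    return value <= 0;
}

/* Returns 1 when the reported error value equals the given pointer value,
 * and prints both in hex so they can be compared with the backtrace. */
int matches_pointer(long reported, unsigned long pointer_value)
{
    printf("reported = %ld = 0x%lx, pointer = 0x%lx\n",
           reported, (unsigned long)reported, pointer_value);
    return (unsigned long)reported == pointer_value;
}
```

Calling matches_pointer(3704403L, 0x388653UL) compares the two values taken from frame #2 of the backtrace.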

comment:11 Changed 9 years ago by kallisti5

attaching output of strace -o strace.out -T ./miniruby -I./lib -I.ext/common -I./- -r./ext/purelib.rb ./enc/make_encmake.rb --builtin-encs="ascii.o us_ascii.o unicode.o utf_8.o" --builtin-transes="newline.o" enc.mk

Changed 9 years ago by kallisti5

Attachment: strace.out added

strace -T of miniruby seg

comment:12 Changed 9 years ago by bonefish

From the strace it looks like the main thread causes a segment violation, its SIGSEGV signal handler is called, which in turn spills quite a bit of stuff to stdout, and finally aborts. Why the timer thread would segfault, I cannot say. Might just be collateral damage due to the main thread leaving something in an invalid state. Or it might as well have the same cause as the first segfault.

Anyway, I would disable the sigaction() call for SIGSEGV, so that the main thread enters the debugger and one can see where it segfaults.
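A minimal sketch of what disabling the handler could look like, assuming a build-time switch. The macro LEAVE_SIGSEGV_DEFAULT and all function names are hypothetical and not taken from the Ruby sources. With the default disposition in place, a segfault is no longer intercepted by the program itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Stand-in for the application's own crash handler (hypothetical).
 * It resets the disposition and re-raises so the process still dies. */
static void crash_handler(int signum)
{
    signal(signum, SIG_DFL);
    raise(signum);
}

/* Installs the SIGSEGV handler unless LEAVE_SIGSEGV_DEFAULT (a
 * hypothetical macro) is defined at build time. Returns 0 on success. */
int install_segv_handler(void)
{
#ifdef LEAVE_SIGSEGV_DEFAULT
    (void)crash_handler;
    return 0;
#else
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = crash_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, NULL);
#endif
}

/* Puts SIGSEGV back to its default disposition at run time. */
int restore_default_segv(void)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = SIG_DFL;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, NULL);
}

/* Returns 1 if SIGSEGV has the default disposition, 0 if a handler is
 * installed, and -1 if the query itself failed. */
int segv_handler_is_default(void)
{
    struct sigaction current;
    if (sigaction(SIGSEGV, NULL, &current) != 0)
        return -1;
    return current.sa_handler == SIG_DFL;
}
```

Building with -DLEAVE_SIGSEGV_DEFAULT skips the installation entirely, which is the quicker route when the goal is only to get a stack trace from the main thread.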

comment:13 in reply to:  7 ; Changed 9 years ago by augiedoggie

Replying to kallisti5:

ok.. weird but good news. compiling ruby (same code) on R1A1 brings up the gdb crash window at the same step that was failing in hrev35267.

Probably not related to the compile problem in this ticket, just wanted to point out that ruby doesn't run on R1A1 because of pthread problems that were fixed in hrev33779

There is also a ruby package available for download at HaikuPorts

And the portlog

comment:14 in reply to:  13 ; Changed 9 years ago by kallisti5

Replying to augiedoggie:

Replying to kallisti5:

ok.. weird but good news. compiling ruby (same code) on R1A1 brings up the gdb crash window at the same step that was failing in hrev35267.

Probably not related to the compile problem in this ticket, just wanted to point out that ruby doesn't run on R1A1 because of pthread problems that were fixed in hrev33779

As stated this is hrev35267.

There is also a ruby package available for download at HaikuPorts

And the portlog

Ha... i wish i knew about these.. several of the things in that patch I did myself and submitted upstream to Ruby. (they were all added)

  • Missing Haiku SA_SIGINFO -- fixed in mainline ruby, don't have the revision number off hand though.
  • The following probably still needs to be committed (based on the port log)

diff -ur ruby-1.9.1-p243/configure.in ruby-1.9.1-p243-haiku/configure.in
- DLDFLAGS="$DLDFLAGS -L/boot/develop/lib/x86 -lbe -lroot"
+ DLDFLAGS="$DLDFLAGS -L/boot/develop/lib/x86 -lbe -lroot -L/boot

diff -ur ruby-1.9.1-p243/ext/nkf/nkf-utf8/nkf.h ruby-1.9.1-p243-haiku/ext/nkf/nkf-utf8/nkf.h
+#elif defined(HAIKU)
+# undef HAVE_LANGINFO_H
+# ifndef HAVE_LOCALE_H
+# define HAVE_LOCALE_H
+# endif

comment:15 Changed 9 years ago by kallisti5

Anyway, these things should be fixed, as that will cause fewer problems when porting other POSIX software.

I'll work on disabling the sigaction() call tomorrow and let you know the results.

comment:16 in reply to:  14 ; Changed 9 years ago by augiedoggie

Replying to kallisti5:

Replying to augiedoggie:

Replying to kallisti5:

ok.. weird but good news. compiling ruby (same code) on R1A1 brings up the gdb crash window at the same step that was failing in hrev35267.

Probably not related to the compile problem in this ticket, just wanted to point out that ruby doesn't run on R1A1 because of pthread problems that were fixed in hrev33779

As stated this is hrev35267.

Yes, but the Version field on this ticket states R1/alpha1 and your comment said you were attempting to compile on R1A1 in addition to hrev35267

There is also a ruby package available for download at HaikuPorts

And the portlog

Ha... i wish i knew about these.. several of the things in that patch I did myself and submitted upstream to Ruby. (they were all added)

Yes I saw that. I'm still debating whether to submit a follow up patch to fix some of the things you had them change. You enabled old BeOS code paths that were not necessary on Haiku. They probably won't hurt except that we now have to maintain code that is different from other operating systems(like library/symbol loading, chown/chgrp stuff, and possibly the file IO stuff). As you can see from the test results in the portlog I didn't need to use any of the old BeOS code that you've enabled.

comment:17 Changed 9 years ago by mmadia

Version: R1/alpha1 → R1/Development

comment:18 in reply to:  16 Changed 9 years ago by stippi

Replying to augiedoggie:

Yes I saw that. I'm still debating whether to submit a follow up patch to fix some of the things you had them change. You enabled old BeOS code paths that were not necessary on Haiku. They probably won't hurt except that we now have to maintain code that is different from other operating systems(like library/symbol loading, chown/chgrp stuff, and possibly the file IO stuff). As you can see from the test results in the portlog I didn't need to use any of the old BeOS code that you've enabled.

Ouch, yes please submit a follow up patch. Haiku specifically doesn't #define BEOS for this very reason.

comment:19 Changed 9 years ago by kallisti5

I was under the assumption that Haiku did define BEOS; I know for a fact that it used to. (Thus my preparation for Haiku by adding ifdef BEOS HAIKU defines. If a few of the BeOS patches or workarounds are no longer needed, the BEOS HAIKU defines could easily be changed to BEOS && !HAIKU.)

comment:20 Changed 9 years ago by kallisti5

sigh trac,' ifdef BEOS HAIKU ' is supposed to be ' ifdef BEOS <PIPE><PIPE> HAIKU '
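The two guard forms discussed here can be sketched as follows. This is an illustration only: it uses the compiler-predefined macros __BEOS__ and __HAIKU__ as an assumption, whereas the thread writes BEOS and HAIKU, and the enum and function names are hypothetical.

```c
/* Which code path a source file would take (hypothetical names). */
enum code_path {
    PATH_GENERIC_POSIX,
    PATH_HAIKU,
    PATH_BEOS_WORKAROUNDS
};

/* Form 1: "BEOS || HAIKU". Both platforms share the BeOS code path. */
#if defined(__BEOS__) || defined(__HAIKU__)
#define SHARES_BEOS_PATH 1
#else
#define SHARES_BEOS_PATH 0
#endif

/* Form 2: "BEOS && !HAIKU". The workarounds stay limited to real BeOS,
 * and Haiku gets its own branch (or simply falls through to POSIX). */
enum code_path select_code_path(void)
{
#if defined(__BEOS__) && !defined(__HAIKU__)
    return PATH_BEOS_WORKAROUNDS;
#elif defined(__HAIKU__)
    return PATH_HAIKU;
#else
    return PATH_GENERIC_POSIX;
#endif
}
```

On a system that defines neither macro, SHARES_BEOS_PATH is 0 and select_code_path() takes the generic branch, so the guards have no effect on other platforms.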

comment:21 Changed 9 years ago by kallisti5

SIGH... hrev30536 changed to no longer define BEOS thus defeating my awesome and powerful logic.

This is all still off topic, even removing a few of the added / unneeded Haiku ifdef's the vm_soft_fault's don't stop.

comment:22 Changed 9 years ago by kallisti5

...hrev30536 should be hrev29068, this ain't MIPS

comment:23 in reply to:  21 Changed 9 years ago by bonefish

Replying to kallisti5:

This is all still off topic, even removing a few of the added / unneeded Haiku ifdef's the vm_soft_fault's don't stop.

Indeed, the ticket is only still open because I wouldn't rule out that there's an underlying Haiku problem. Other porting issues should rather be discussed/tracked at HaikuPorts.

comment:24 Changed 9 years ago by bonefish

Resolution: invalid
Status: new → closed

The pthread condition variable implementation has been rewritten in hrev36323, so at least the issue with the incorrect return value of pthread_cond_timedwait() should be resolved. Closing the ticket as there hasn't been any update that would allow us to decide whether this is a problem with the port or whether there's an underlying Haiku bug. Feel free to reopen, if you find any indication for the latter.
