Opened 7 years ago

Last modified 10 hours ago

#8430 new bug

[kernel] mplayer segfaults in a loop

Reported by: diver Owned by: bonefish
Priority: normal Milestone: Unscheduled
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All


This is hrev43901 gcc2hybrid. mplayer binary (8MB)

After running mplayer smb://share/video.avi (it doesn't matter if video.avi actually exists or not!) mplayer crashes. After that GUI and cursor becomes extremely sluggish and unresponsive.

ProcessController shows that mplayer uses 50% of cpu, the other 50% is used by syslog_daemon.
I had changed it's priority from normal ![10] to lowest active ![1] and GUI with cursor became normal again.

I can see in Tracker that syslog gets constantly overwritten. The following message repeats zillion times in syslog:

vm_soft_fault: va 0x0 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x0, write 0, user 1, thread 0x313
vm_page_fault: thread "mplayer" (787) in team "mplayer" (787) tried to read address 0x0, ip 0x0 ("???" +0x0)

Then I debugged mplayer thread from ProcessController:

[Switching to team /boot/apps/Mplayer/mplayer smb://share/video.avi (299) thread mplayer (299)]
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x03cd09ee in ucnv_close () from /boot/system/lib/gcc4/
#2  0x03de1e67 in BPrivate::Libroot::ICUThreadLocalStorageValue::~ICUThreadLocalStorageValue () from /boot/system/lib/gcc4/
#3  0x03ddf0e9 in BPrivate::Libroot::ICULocaleBackend::_DestroyThreadLocalStorageValue () from /boot/system/lib/gcc4/
#4  0x018bb892 in __pthread_key_call_destructors () from /boot/system/lib/gcc4/
#5  0x018bad18 in __pthread_destroy_thread () from /boot/system/lib/gcc4/
#6  0x018ae222 in _thread_do_exit_work () from /boot/system/lib/gcc4/
#7  0x0190f0b0 in exit () from /boot/system/lib/gcc4/
#8  0x0030ab39 in rm_osd_msg ()
#14 0x00b81e4f in _tevent_req_notify_callback ()
#30 0x01000000 in ff_codec_caf_tags ()
#43 0x00eabcd1 in tdb_allocate ()
#44 0x03de1e67 in BPrivate::Libroot::ICUThreadLocalStorageValue::~ICUThreadLocalStorageValue () from /boot/system/lib/gcc4/
#45 0x03ddf0e9 in BPrivate::Libroot::ICULocaleBackend::_DestroyThreadLocalStorageValue () from /boot/system/lib/gcc4/
#46 0x018bb892 in __pthread_key_call_destructors () from /boot/system/lib/gcc4/
#47 0x018bad18 in __pthread_destroy_thread () from /boot/system/lib/gcc4/
#48 0x018ae222 in _thread_do_exit_work () from /boot/system/lib/gcc4/
#49 0x0190f0b0 in exit () from /boot/system/lib/gcc4/
#50 0x0030c0a2 in exit_player_with_rc ()
#51 0x012be558 in ?? ()

mplayer binary is from 20.05.2011.
Some pointers from DeadYak:

DeadYak	looks like it's trapping segfaults
DeadYak	that might explain the issue, if it's doing that in a loop it's going to be nailing the VM like crazy
DeadYak	file that as a kernel bug and assign to bonefish please...including screenshots
DeadYak	looks like mplayer installs a segfault handler and then something else goes wrong
DeadYak	possibly during thread cleanup
DeadYak	but right now it looks more like a VM bug
DeadYak	or a combination of a design issue in the VM + bad behavior somewhere
DeadYak	also, if that behavior doesn't happen in older revs, it'd be interesting to know when it started
DeadYak	I suspect this isn't a new problem though

Attachments (1)

mplayer.png (101.4 KB) - added by diver 7 years ago.

Download all attachments as: .zip

Change History (14)

Changed 7 years ago by diver

Attachment: mplayer.png added

comment:1 Changed 7 years ago by diver

Using binary search I've found out that this behavior started from hrev43507.

comment:2 Changed 7 years ago by anevilyak

That begs a question: how old is that mplayer binary? If it was built on Haiku, it would have needed to be rebuilt post-43507 because this changes breaks compatibility.

comment:3 Changed 7 years ago by diver

Sure, mplayer binary is from 20.05.2011. But shouldn't Haiku behave more gracefully?

comment:4 Changed 7 years ago by anevilyak

That might be difficult depending on what kind of signal handlers mplayer puts in place. Under normal circumstances this would simply crash on exit, but it could be that something else strange is going on here. I'm not sure what order the images get unmapped in again, so it could be some strange interaction like the image the signal handler is in already being unmapped at this point in at-exit cleanups. Ingo might have a better clue.

comment:5 Changed 7 years ago by bonefish

I don't really have anything insightful to add. I think the analysis so far is correct. Due to some problem (the incompatible compiler changes?) mplayer crashes on exit. It probably has installed a SIGSEGV handler which is why it isn't caught by the debug server, but enters its signal handler. No clue what this handler does. Normally they just print some error message and call _exit(). Due to the stack trace being incomplete and probably not fully correct anyway (IIRC I never implemented signal handler support in gdb), it's hard to say what happens exactly.

Maybe the SIGSEGV handler actually (directly or indirectly) calls exit() which crashes again, thus causing an infinite recursion. This should eventually overflow the stack, causing the setup of the signal handler environment to fail. ATM we just ignore that error (which should be fixed). So I guess at that point the kernel would just return to the previous point of execution and immediately trigger the page fault again. The syslog message speaks against this theory, though. On the other hand, the stack might not have overflown at this point yet. If serial debug output is enabled things might be slow enough for that to take quite a while. A kernel stack trace might provide more insights.

comment:6 Changed 7 years ago by anevilyak

I'm unable to reproduce this on my end, it does indeed crash, but then exits normally. Trapping it in the debugger instead of letting it terminate gives a more or less expected backtrace:

#0  0xffff0114 in ?? ()
#1  0x018aa7f0 in debugger () from /boot/system/lib/gcc4/
#2  0x018b4b3e in __assert_fail () from /boot/system/lib/gcc4/
#3  0x01920fd2 in BPrivate::hoardHeap::freeBlock ()
   from /boot/system/lib/gcc4/
#4  0x01921d1d in BPrivate::processHeap::free ()
   from /boot/system/lib/gcc4/
#5  0x01922a9e in free () from /boot/system/lib/gcc4/
#6  0x03cf18c9 in uprv_free () from /boot/system/lib/gcc4/
#7  0x03cfba9a in ucnv_close () from /boot/system/lib/gcc4/
#8  0x03e0ce67 in BPrivate::Libroot::ICUThreadLocalStorageValue::~ICUThreadLocalStorageValue () from /boot/system/lib/gcc4/
#9  0x03e0a0e9 in BPrivate::Libroot::ICULocaleBackend::_DestroyThreadLocalStorageValue () from /boot/system/lib/gcc4/
#10 0x018bb892 in __pthread_key_call_destructors ()
   from /boot/system/lib/gcc4/
#11 0x018bad18 in __pthread_destroy_thread ()
   from /boot/system/lib/gcc4/
#12 0x018ae222 in _thread_do_exit_work () from /boot/system/lib/gcc4/
#13 0x0190f0b0 in exit () from /boot/system/lib/gcc4/
#14 0x0030c0a2 in exit_player_with_rc ()
#15 0x012be558 in ?? ()

Perhaps it's in some way triggered by memory usage constraints? I do have 8GiB here.

comment:7 Changed 7 years ago by anevilyak

I stand corrected, feeding it different SMB URI which actually led to a valid machine reproduced the issue here. KDL-based backtrace is as follows:

stack trace for thread 453 "mplayer"
   kernel stack: 0x82666000 to 0x8266a000
     user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 82669c48 (+  48) 8006c31a   <kernel_x86> process_pending_ici(int32: 3) + 0x0106
 1 82669c78 (+  48) 8006c3d5   <kernel_x86>:smp_intercpu_int_handler + 0x0015
 2 82669ca8 (+  32) 801277ac   <kernel_x86>:i386_ici_interrupt(NULL) + 0x0014
 3 82669cc8 (+  48) 8005880a   <kernel_x86>:int_io_interrupt_handler + 0x005e
 4 82669cf8 (+  64) 80125bcd   <kernel_x86> hardware_interrupt(iframe*: 0x82669d44) + 0x006d
 5 82669d38 (+  12) 8012a88d   <kernel_x86>:int_bottom + 0x003d
kernel iframe at 0x82669d44 (end = 0x8669d94)
 eax 0x1            ebx 0x8019cf08      ecx 0x0          edx 0x8019cf08
 esi 0x82669df8     edi 0x80183dc0      ebp 0x82669dcc   esp 0x82669d78
 eip 0x8009298c  eflags 0x213202
 vector: 0xfd, error code: 0x0
 6 82669d44 (+ 136) 8009298c   <kernel_x86> dprintf_args(0x80183dc0 "vm_soft_fault: va 0x%lx not covered by area in address space", 0x82669df8, "", true) + 0x00fc
 7 82669dcc (+  32) 8009346f   <kernel_x86>:dprintf + 0x0037
 8 82669dec (+ 192) 80107ddf   <kernel_x86> vm_soft_fault(VMAddressSpace*: 0xcdd17a88, uint32: 0x0 (0), false, true, vm_page*: NULL, VMAreaWiredRange*: NULL) + 0x00fb
 9 82669eac (+ 160) 801073b7   <kernel_x86>:vm_page_fault + 0x009f
10 82669f4c (+  80) 80125b4a   <kernel_x86> page_fault_exception(iframe*: 0x82669fa8) + 0x0176
11 82669f9c (+  12) 8012a90b   <kernel_x86>:int_bottom_user + 0x006f
user iframe at 0x82669fa8 (end = 0x8266a000)
 eax 0x1803d050     ebx 0x3dfe58c       ecx 0x7ffed1a4   edx 0x0
 esi 0x0            edi 0x7ffed1c0      ebp 0x7ffed1dc   esp 0x82669fdc
 eip 0x0         eflags 0x213283   user esp 0x7ffed150
 vector: 0xe, error code: 0x4
12 82669fa8 (+   0) 00000000
13 7ffed1dc (+  32) 03e0ce67   <> BPrivate::libroot::ICUThreadLocalStorageValue::~ICUThreadLocalStorageValue() + 0x0025
14 7ffed1fc (+  32) 03e0a0e9   <> BPrivate::libroot::ICULocaleBackend<0x180a1820>::_DestroyThreadLocalStorageValue(void*: 0x7ffed248) + 0x0021
15 7ffed21c (+  48) 018bb892   <>:__pthread_key_call_destructor + 0x005e
16 7ffed24c (+  32) 018bad18   <>:__pthread_destroy_thread + 0x0038
17 7ffed26c (+  32) 018ae222   <>:_thread_do_exit_work + 0x005d
18 7ffed28c (+  16) 0190f0b0   <>:exit + 0x0018
19 7ffed29c (+ 624) 0030ab39   <mplayer>:rm_osd_msg (nearest) + 0x1319

Also note, confirmed that mplayer indeed installs a signal handler, as it displays a console error message indicating that it was interrupted by signal 11 in module: exit_player.

Last edited 7 years ago by anevilyak (previous) (diff)

comment:8 Changed 7 years ago by anevilyak

If it matters, the thread in question can be terminated via ProcessController when it's trapped in this state and exits normally then.

comment:9 Changed 7 years ago by anevilyak

Custom signal handlers don't seem to be coming into play for the infinite loop at least ; a kernel with signal tracing enabled isn't showing any extra activity to go with the page fault/soft fault messages.

comment:10 Changed 7 years ago by zooey

I have no idea about the infinite loop, but the trigger of the segfault has been fixed in hrev43933.

Leaving this open for anyone wanting to look at why the loop occurs ...

comment:11 Changed 6 years ago by diver

What about syslog_daemon? Does it really need to run at normal priority or we can lower it to lowest active?

comment:12 Changed 11 months ago by waddlesplash

Milestone: R1Unscheduled

comment:13 Changed 10 hours ago by korli

diver, could you check with a hrev newer than hrev53150? hrev53150 should definitely help.

Note: See TracTickets for help on using tickets.