Opened 8 years ago

Last modified 3 years ago

#6673 new bug

[registrar] can be crashed by running multiple instances of mimeset

Reported by: samui Owned by: bonefish
Priority: critical Milestone: R1
Component: Servers/registrar Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #11063
Has a Patch: no Platform: All

Description

reproducable bug. click on a video a gazillion times and see what happens.

sometimes you get and out of memory error, sometimes a bad port id error. both of these errors randomly crash deskbar and or tracker. the mouse freezes.

Sometimes you are dropped to kdl with the enclosed info.

256mb of ram, no virtual memory enabled, build 38817

Attachments (10)

test2LauchManyVideosTrackerCrash.png (54.1 KB) - added by samui 8 years ago.
outOf256mbMem.png (79.0 KB) - added by samui 8 years ago.
create_app_meta_mime.jpg (1.6 MB) - added by diver 8 years ago.
MIMEManager.jpg (1.5 MB) - added by diver 8 years ago.
kdl_mimeset.png (29.7 KB) - added by diver 8 years ago.
syslog (797.1 KB) - added by diver 7 years ago.
listport (55.9 KB) - added by diver 7 years ago.
debug_session (46.4 KB) - added by diver 7 years ago.
listport_form_debug_session (56.0 KB) - added by diver 7 years ago.
registrar-189-debug-03-06-2014-13-39-03.report (11.6 KB) - added by diver 5 years ago.

Change History (28)

Changed 8 years ago by samui

Changed 8 years ago by samui

Attachment: outOf256mbMem.png added

comment:1 Changed 8 years ago by stippi

When you say "click on a video a gazillion times", you mean double click it in Tracker, right? Do you leave each instance open, or do you close it each time before you open it again? Since one of my changes to the media node based playback in MediaPlayer, the video producer node and the video consumer node share the same video buffers (to avoid having to copy a video frame when it is transfered from the producer to the consumer node). Upon destruction of the connection, the media_server failes to deallocate these buffers. I have not investigated this issue closer for the time being. Just a hint in case I don't get around to fixing that, for whoever wants to work on it later.

comment:2 Changed 8 years ago by stargatefan

I was wondering if maybe that was the issue with gnash. Buffer deallocation.

comment:3 Changed 8 years ago by diver

Component: Audio & VideoServers/registrar
Owner: changed from nobody to bonefish
Version: R1/alpha2R1/Development

I can reproduce this bug easily on real hardware by opening /boot/common/boot/post_install, selecting mime_update.sh and holding down the return key for about 30 seconds. Most of the time registar crashes in create_app_meta_mime thread or in main_mime thread (MIMEManager::MessageReceived). Sometimes it KDLs (see attached images). I tried to find a shorter way to reproduce this bug, but behavior has changed:

for i in `seq 1 10000`; do mimeset -apps -f /system/apps/*; done

After running this command (in vbox, so take with a grain of salt) simultaneously in several tabs registrar just dies silently and launching new apps throws Bad port ID error (sounds a lot like #381).

Last edited 3 years ago by diver (previous) (diff)

Changed 8 years ago by diver

Attachment: create_app_meta_mime.jpg added

Changed 8 years ago by diver

Attachment: MIMEManager.jpg added

Changed 8 years ago by diver

Attachment: kdl_mimeset.png added

comment:4 Changed 8 years ago by diver

After getting Bad port ID error Tracker crashed. I tried to find it in syslog after reboot and found something, but I'm not sure if it's really the tracker crash traces, but anyway:

KERN: 16665: DEBUGGER: The clipboard must be locked before proceeding.
KERN: debug_server: Thread 16665 entered the debugger: Debugger call: `The clipboard must be locked before proceeding.'
KERN: stack trace, current PC 0xffff0114  :
KERN:   (0x782068bc)  0x2c5821  _ZNK10BClipboard13_AssertLockedEv + 0x39
KERN:   (0x782068dc)  0x2c5849  _ZNK10BClipboard4DataEv + 0x1d
KERN:   (0x782068fc)  0x518b64  _Z23FSClipboardFindNodeModePN8BPrivate5ModelEbb + 0x6c
KERN:   (0x782069dc)  0x57d3e0  _ZN8BPrivate9BPoseView11CreatePosesEPPNS_5ModelEPNS_8PoseInfoElPPNS_5BPoseEbPlP5BRectb + 0x814
KERN:   (0x78206b3c)  0x5891a6  _ZN8BPrivate9BPoseView15MessageReceivedEP8BMessage + 0x386
KERN:   (0x78206c3c)  0x2caa38  _ZN7BLooper15DispatchMessageEP8BMessageP8BHandler + 0x50
KERN:   (0x78206c5c)  0x38e404  _ZN7BWindow15DispatchMessageEP8BMessageP8BHandler + 0x19da
KERN:   (0x78206f1c)  0x389d09  _ZN7BWindow11task_looperEv + 0x22b
KERN:   (0x78206f9c)  0x2cc29a  _ZN7BLooper7_task0_EPv + 0x30
KERN:   (0x78206fbc)  0x66518d  thread_entry + 0x2d

comment:5 Changed 8 years ago by diver

Summary: Deskbar freezes under heavy load with mediaplayer[registrar] can be crashed by running multiple instances of mimeset

comment:6 Changed 8 years ago by diver

Could it be that MediaPlayer registers mime types on every start? Then it could explain observed behavior from the description with regard to registrar.

comment:7 Changed 8 years ago by axeld

Priority: normalcritical

comment:8 Changed 8 years ago by bonefish

The KDL is tracked in #5474.

Regarding the Tracker crash, that's due to BPoseView::CreatePoses() not checking the return value of be_clipboard->Lock(), just assuming it worked. Since BClipboard::Lock() does synchronous messenging, which might require a port to be created on the fly, running out of ports can cause BClipboard::Lock() to fail.

Why the system is running out of ports is an interesting question, though. The registrar seems to limit the MIME update threads to 12, denying any further requests after the limit is reached. So, at least in theory, there shouldn't be any resources piling up in the registrar due to MIME update requests. The output of a listport (or ports in KDL -- long, so skip and fetch from syslog) in such a situation would at least help to identify who is leaking/hogging what kind of ports.

Why the registrar crashes is another interesting question. One stack trace is an hoard assert, so the heap had been corrupted. The other is an outright crash in MIMEManager::MessageReceived(), though I haven't spotted any suspicious code (the syslog would have been nice). Might also be a side effect of a heap corruption.

What looks a bit suspicious is RegistrarThreadManager. It uses atomic operations to change fThreadCount, but doesn't do any locking to protect the thread list (fThreads). I haven't investigated this further, but so far I've seen the RegistrarThreadManager only being used in the MIMEManagers thread. So the atomic operations might simply be superfluous.

comment:9 in reply to:  8 ; Changed 8 years ago by diver

Replying to bonefish:

The output of a listport (or ports in KDL -- long, so skip and fetch from syslog) in such a situation would at least help to identify who is leaking/hogging what kind of ports.

My keyboard doesn't work in KDL, I do have auto_stack_trace installed, though. Is there a way to modify it to auto execute listport/ports. Also, how should I enter KDL? Alt-Sysreq-D just prints a small backtrace for keyboard thread and hangs.

comment:10 in reply to:  9 Changed 8 years ago by bonefish

Replying to diver:

Replying to bonefish:

My keyboard doesn't work in KDL, I do have auto_stack_trace installed, though. Is there a way to modify it to auto execute listport/ports. Also, how should I enter KDL? Alt-Sysreq-D just prints a small backtrace for keyboard thread and hangs.

listport is a command line program. I mentioned the KDL alternative only in case listport cannot be executed/doesn't work correctly in this situation.

With a USB keyboard you have to at least once enter KDL via Alt-SysReq-D, since otherwise it won't work. If it doesn't work anyway (e.g. because it's OHCI) you're out of luck. I'm not aware of any method to easily execute other KDL commands automatically. I think auto_stack_trace is mostly superfluous nowadays, as any panic prints a stack trace anyway and when voluntarily entering KDL one usually can (and wants to) do other/more things. auto_stack_trace cannot be configured. If you build your images yourself, you could hack it to execute another command instead of sc, though.

comment:11 Changed 7 years ago by diver

I tried to reproduce it several times in hrev42467, but I don't get these crashes anymore. I got new one #7864, though.

comment:12 Changed 7 years ago by diver

Tried to reproduce it again (by starting mime_update.sh many times) and got fullscreen GDB once again. Unfortunately, my keyboard didn't work there. I tried to reproduce it one more time after reboot but didn't manage to crash it, instead after some time running process started by mime_update.sh stopped quitting. CPU usage was always at max, there was a lot of free memory, though.

Changed 7 years ago by diver

Attachment: syslog added

comment:13 Changed 7 years ago by diver

Managed to crash it again, this time with ssh access:

KERN: 60: DEBUGGER: getNumAvailable() == 0
KERN: debug_server: Thread 60 entered the debugger: Debugger call: `getNumAvailable() == 0'
KERN: stack trace, current PC 0xffff0114  :
KERN:   (0x700c1b7c)  0x5f1062  __assert_fail + 0x46
KERN:   (0x700c1b9c)  0x665504  _ZN8BPrivate9hoardHeap23findAvailableSuperblockEiRPNS_5blockEPNS_11processHeapE + 0x34c
KERN:   (0x700c1bec)  0x663431  _ZN8BPrivate10threadHeap6mallocEm + 0xa3
KERN:   (0x700c1c3c)  0x663949  malloc + 0x1b5
KERN:   (0x700c1c5c)  0x5ed4bf  _ZN8BPrivate8KMessage14_AllocateSpaceElbbPPvPl + 0x95
KERN:   (0x700c1c9c)  0x5ed64d  _ZN8BPrivate8KMessage9_AddFieldEPKcmlPNS_13KMessageFieldE + 0x3b
KERN:   (0x700c1cec)  0x5ee261  _ZN8BPrivate8KMessage7AddDataEPKcmPKvlb + 0xa5
KERN:   (0x700c1d4c)  0x22567d  _ZN21AuthenticationManager14_RequestThreadEv + 0x11c3
KERN:   (0x700c1f9c)  0x2272db  _ZN21AuthenticationManager19_RequestThreadEntryEPv + 0x1b
KERN:   (0x700c1fbc)  0x5ea957  thread_entry + 0x1d 
KERN: ps2: possibly a hot plugin of input/keyboard/at/0
KERN: ps2: devfs_publish_device input/keyboard/at/0, status = 0x00000000
KERN: ps2: keyboard found

After that I connected ps2 keyboard which didn't work anyway.

Changed 7 years ago by diver

Attachment: listport added

comment:14 Changed 7 years ago by diver

Running kernel_debugger from ssh made my ps2 keyboard work in KDL, see debug_session attachment.

Changed 7 years ago by diver

Attachment: debug_session added

Changed 7 years ago by diver

Attachment: listport_form_debug_session added

comment:15 Changed 7 years ago by diver

This behavior seems to be related to gcc4 as I'm unable to reproduce these issues under gcc2.

comment:16 Changed 7 years ago by diver

registrar still silently crashes in hrev44065 gcc4 while running:

for i in `seq 1 100000`; do mimeset -apps -f /system/apps/*; done

in 6 Terminals simultaneously. debug_server doesn't seem to catch it. Every open terminal starts printing:

FATAL: be_roster is not valid. Is the registrar running?

registrar is easily restartable using on_exit /system/servers/registrar KDL command.

Last edited 7 years ago by diver (previous) (diff)

comment:17 Changed 5 years ago by diver

hrev47261 gcc2hybrid.

Out of curiosity I tried the above command with a freshly unpacked archive of BeOS apps archive just to see how things are now two years later. Registrar crashed, but this time this was with gcc2 image and I managed to save debug report.

comment:18 Changed 4 years ago by diver

Blocking: 11063 added

(In #11063) Seems to be a dupe of #6673.

Note: See TracTickets for help on using tickets.