#6673 closed bug (fixed)
[registrar] can be crashed by running multiple instances of mimeset
Reported by: | samui | Owned by: | bonefish |
---|---|---|---|
Priority: | critical | Milestone: | R1/beta5 |
Component: | Servers/registrar | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #11063 | |
Platform: | All |
Description
reproducable bug. click on a video a gazillion times and see what happens.
sometimes you get and out of memory error, sometimes a bad port id error. both of these errors randomly crash deskbar and or tracker. the mouse freezes.
Sometimes you are dropped to kdl with the enclosed info.
256mb of ram, no virtual memory enabled, build 38817
Attachments (12)
Change History (49)
by , 14 years ago
Attachment: | test2LauchManyVideosTrackerCrash.png added |
---|
by , 14 years ago
Attachment: | outOf256mbMem.png added |
---|
comment:1 by , 14 years ago
comment:2 by , 14 years ago
I was wondering if maybe that was the issue with gnash. Buffer deallocation.
comment:3 by , 14 years ago
Component: | Audio & Video → Servers/registrar |
---|---|
Owner: | changed from | to
Version: | R1/alpha2 → R1/Development |
I can reproduce this bug easily on real hardware by opening /boot/common/boot/post_install, selecting mime_update.sh and holding down the return key for about 30 seconds. Most of the time registar crashes in create_app_meta_mime thread or in main_mime thread (MIMEManager::MessageReceived). Sometimes it KDLs (see attached images). I tried to find a shorter way to reproduce this bug, but behavior has changed:
for i in `seq 1 10000`; do mimeset -apps -f /system/apps/*; done
After running this command (in vbox, so take with a grain of salt) simultaneously in several tabs registrar just dies silently and launching new apps throws Bad port ID error (sounds a lot like #381).
by , 14 years ago
Attachment: | create_app_meta_mime.jpg added |
---|
by , 14 years ago
Attachment: | MIMEManager.jpg added |
---|
by , 14 years ago
Attachment: | kdl_mimeset.png added |
---|
comment:4 by , 14 years ago
After getting Bad port ID error Tracker crashed. I tried to find it in syslog after reboot and found something, but I'm not sure if it's really the tracker crash traces, but anyway:
KERN: 16665: DEBUGGER: The clipboard must be locked before proceeding. KERN: debug_server: Thread 16665 entered the debugger: Debugger call: `The clipboard must be locked before proceeding.' KERN: stack trace, current PC 0xffff0114 : KERN: (0x782068bc) 0x2c5821 _ZNK10BClipboard13_AssertLockedEv + 0x39 KERN: (0x782068dc) 0x2c5849 _ZNK10BClipboard4DataEv + 0x1d KERN: (0x782068fc) 0x518b64 _Z23FSClipboardFindNodeModePN8BPrivate5ModelEbb + 0x6c KERN: (0x782069dc) 0x57d3e0 _ZN8BPrivate9BPoseView11CreatePosesEPPNS_5ModelEPNS_8PoseInfoElPPNS_5BPoseEbPlP5BRectb + 0x814 KERN: (0x78206b3c) 0x5891a6 _ZN8BPrivate9BPoseView15MessageReceivedEP8BMessage + 0x386 KERN: (0x78206c3c) 0x2caa38 _ZN7BLooper15DispatchMessageEP8BMessageP8BHandler + 0x50 KERN: (0x78206c5c) 0x38e404 _ZN7BWindow15DispatchMessageEP8BMessageP8BHandler + 0x19da KERN: (0x78206f1c) 0x389d09 _ZN7BWindow11task_looperEv + 0x22b KERN: (0x78206f9c) 0x2cc29a _ZN7BLooper7_task0_EPv + 0x30 KERN: (0x78206fbc) 0x66518d thread_entry + 0x2d
comment:5 by , 14 years ago
Summary: | Deskbar freezes under heavy load with mediaplayer → [registrar] can be crashed by running multiple instances of mimeset |
---|
comment:6 by , 14 years ago
Could it be that MediaPlayer registers mime types on every start? Then it could explain observed behavior from the description with regard to registrar.
comment:7 by , 14 years ago
Priority: | normal → critical |
---|
follow-up: 9 comment:8 by , 14 years ago
The KDL is tracked in #5474.
Regarding the Tracker crash, that's due to BPoseView::CreatePoses()
not checking the return value of be_clipboard->Lock()
, just assuming it worked. Since BClipboard::Lock()
does synchronous messenging, which might require a port to be created on the fly, running out of ports can cause BClipboard::Lock()
to fail.
Why the system is running out of ports is an interesting question, though. The registrar seems to limit the MIME update threads to 12, denying any further requests after the limit is reached. So, at least in theory, there shouldn't be any resources piling up in the registrar due to MIME update requests. The output of a listport
(or ports
in KDL -- long, so skip and fetch from syslog) in such a situation would at least help to identify who is leaking/hogging what kind of ports.
Why the registrar crashes is another interesting question. One stack trace is an hoard assert, so the heap had been corrupted. The other is an outright crash in MIMEManager::MessageReceived()
, though I haven't spotted any suspicious code (the syslog would have been nice). Might also be a side effect of a heap corruption.
What looks a bit suspicious is RegistrarThreadManager
. It uses atomic operations to change fThreadCount
, but doesn't do any locking to protect the thread list (fThreads
). I haven't investigated this further, but so far I've seen the RegistrarThreadManager
only being used in the MIMEManager
s thread. So the atomic operations might simply be superfluous.
follow-up: 10 comment:9 by , 14 years ago
Replying to bonefish:
The output of a
listport
(orports
in KDL -- long, so skip and fetch from syslog) in such a situation would at least help to identify who is leaking/hogging what kind of ports.
My keyboard doesn't work in KDL, I do have auto_stack_trace installed, though. Is there a way to modify it to auto execute listport/ports. Also, how should I enter KDL? Alt-Sysreq-D just prints a small backtrace for keyboard thread and hangs.
comment:10 by , 14 years ago
Replying to diver:
Replying to bonefish:
My keyboard doesn't work in KDL, I do have auto_stack_trace installed, though. Is there a way to modify it to auto execute listport/ports. Also, how should I enter KDL? Alt-Sysreq-D just prints a small backtrace for keyboard thread and hangs.
listport
is a command line program. I mentioned the KDL alternative only in case listport
cannot be executed/doesn't work correctly in this situation.
With a USB keyboard you have to at least once enter KDL via Alt-SysReq-D, since otherwise it won't work. If it doesn't work anyway (e.g. because it's OHCI) you're out of luck. I'm not aware of any method to easily execute other KDL commands automatically. I think auto_stack_trace
is mostly superfluous nowadays, as any panic
prints a stack trace anyway and when voluntarily entering KDL one usually can (and wants to) do other/more things. auto_stack_trace
cannot be configured. If you build your images yourself, you could hack it to execute another command instead of sc
, though.
comment:11 by , 14 years ago
comment:12 by , 13 years ago
Tried to reproduce it again (by starting mime_update.sh many times) and got fullscreen GDB once again. Unfortunately, my keyboard didn't work there. I tried to reproduce it one more time after reboot but didn't manage to crash it, instead after some time running process started by mime_update.sh stopped quitting. CPU usage was always at max, there was a lot of free memory, though.
by , 13 years ago
comment:13 by , 13 years ago
Managed to crash it again, this time with ssh access:
KERN: 60: DEBUGGER: getNumAvailable() == 0 KERN: debug_server: Thread 60 entered the debugger: Debugger call: `getNumAvailable() == 0' KERN: stack trace, current PC 0xffff0114 : KERN: (0x700c1b7c) 0x5f1062 __assert_fail + 0x46 KERN: (0x700c1b9c) 0x665504 _ZN8BPrivate9hoardHeap23findAvailableSuperblockEiRPNS_5blockEPNS_11processHeapE + 0x34c KERN: (0x700c1bec) 0x663431 _ZN8BPrivate10threadHeap6mallocEm + 0xa3 KERN: (0x700c1c3c) 0x663949 malloc + 0x1b5 KERN: (0x700c1c5c) 0x5ed4bf _ZN8BPrivate8KMessage14_AllocateSpaceElbbPPvPl + 0x95 KERN: (0x700c1c9c) 0x5ed64d _ZN8BPrivate8KMessage9_AddFieldEPKcmlPNS_13KMessageFieldE + 0x3b KERN: (0x700c1cec) 0x5ee261 _ZN8BPrivate8KMessage7AddDataEPKcmPKvlb + 0xa5 KERN: (0x700c1d4c) 0x22567d _ZN21AuthenticationManager14_RequestThreadEv + 0x11c3 KERN: (0x700c1f9c) 0x2272db _ZN21AuthenticationManager19_RequestThreadEntryEPv + 0x1b KERN: (0x700c1fbc) 0x5ea957 thread_entry + 0x1d KERN: ps2: possibly a hot plugin of input/keyboard/at/0 KERN: ps2: devfs_publish_device input/keyboard/at/0, status = 0x00000000 KERN: ps2: keyboard found
After that I connected ps2 keyboard which didn't work anyway.
by , 13 years ago
comment:14 by , 13 years ago
Running kernel_debugger from ssh made my ps2 keyboard work in KDL, see debug_session attachment.
by , 13 years ago
Attachment: | debug_session added |
---|
by , 13 years ago
Attachment: | listport_form_debug_session added |
---|
comment:15 by , 13 years ago
This behavior seems to be related to gcc4 as I'm unable to reproduce these issues under gcc2.
comment:16 by , 13 years ago
registrar still silently crashes in hrev44065 gcc4 while running:
for i in `seq 1 100000`; do mimeset -apps -f /system/apps/*; done
in 6 Terminals simultaneously. debug_server doesn't seem to catch it. Every open terminal starts printing:
FATAL: be_roster is not valid. Is the registrar running?
registrar is easily restartable using on_exit /system/servers/registrar KDL command.
comment:17 by , 11 years ago
hrev47261 gcc2hybrid.
Out of curiosity I tried the above command with a freshly unpacked archive of BeOS apps archive just to see how things are now two years later. Registrar crashed, but this time this was with gcc2 image and I managed to save debug report.
by , 11 years ago
Attachment: | registrar-189-debug-03-06-2014-13-39-03.report added |
---|
comment:19 by , 4 years ago
As of hrev54154+119 on a x86_gcc2 Haiku system, I can't reproduce the registrar crash anymore after stress testing the registrar with up to 30 instances of
"for i in seq 1 100000
; do mimeset -apps -f /system/apps/*; done"
running concurrently.
The only error I notice comes after 8-10 instances of mimeset running concurrently, in which case the Terminal will start printing "mimeset: no more threads", which is less troubling than a crash. After closing the Terminal (which automatically kills all mimeset jobs), everything continues to work as expected: Apps launch, mimetyping works, and no errors are even reported to the syslog. I can even try the stress test again!
Could someone double check on a newer version of the x86_64 version of Haiku? If all is well, I think this commit a while back may have fixed what was going on: https://git.haiku-os.org/haiku/commit/?id=462bfeede0cd123afe2e79d465876289e925ca53
follow-up: 21 comment:20 by , 4 years ago
Upon further analysis of the registrar codebase, the “no more threads” error is an expected behavior/safety mechanism to make sure the registrar isn’t doing too much work at once (the current limit is hard-wired into the registrar at 12 concurrent tasks).
Unless anyone else has any problems, I believe this problem was fixed in hrev49570. Diver’s last test was in hrev47261, so I believe the test results may not longer be a problem.
Thus, I believe this critical ticket can be closed! :)
comment:21 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Replying to CodeforEvolution:
Upon further analysis of the registrar codebase, the “no more threads” error is an expected behavior/safety mechanism to make sure the registrar isn’t doing too much work at once (the current limit is hard-wired into the registrar at 12 concurrent tasks).
Unless anyone else has any problems, I believe this problem was fixed in hrev49570. Diver’s last test was in hrev47261, so I believe the test results may not longer be a problem.
Thus, I believe this critical ticket can be closed! :)
Thanks for testing! Indeed, let's close it.
by , 4 years ago
Attachment: | registrar-521-debug-24-08-2020-08-52-28.report added |
---|
comment:22 by , 4 years ago
Sorry to comment just after the ticket has been closed.
I've attached a crash report for hrev54447 x86_64. I guess after so many years it's not the same bug, but it certainly is the same reproducer with the same effects.
Maybe I was just lucky, but I got the crash on real hardware when I was still starting those "mimeset bomb loops", and I couldn't reproduce it in three 15-minute runs in qemu-kvm with different smp settings.
comment:23 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
No worries, thanks for posting the stack traces, this will allow us to further investigate the issue.
comment:24 by , 4 years ago
Thank you for the stack trace! I think two threads must of collided...I see a "General protection fault" on one thread, and a "Call (_numAvailable <= _numBlocks)" debugger call on the other thread. They also were doing the exact same thing, using BIconUtils::GetVectorIcon().
I wonder if these two threads collided because they were working on the same file...
comment:26 by , 4 years ago
I tried again after updating in case hrev54529 helped, but still crashed. The stack trace is quite similar this time on getting memory instead of releasing, if I understand it correctly.
I can't debug this myself, but I can apply patches, compile and follow directions. If there's anything I can do to help track this down, just tell me.
by , 4 years ago
Attachment: | registrar-520-debug-24-08-2020-11-49-12.report added |
---|
comment:27 by , 4 years ago
That really is strange...as waddlesplash mentioned, that really does look like memory corruption of sorts...and once again, it was in BIconUtils::GetVectorIcon...
Anyway, thank you for these logs! These sort of crashes requires all the info we can get! :)
comment:28 by , 4 years ago
cl-amp and http://pulkomandy.tk/~beosarchive/unsorted/BDifferent/Emulators/BeSmsPlus0.9.3.zip also trigger this crash.
comment:29 by , 4 years ago
On RISC-V port I see that on each application start and Tracker icon update registrar
mime-update
thread is created. It may be related.
comment:30 by , 12 months ago
Milestone: | R1 → R1/beta5 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
Fixed somewhere between hrev57193 and hrev57197.
comment:31 by , 12 months ago
That seems surprising to me. If it was a double-lock race, that could have been fixed by some of the earlier commits, but for it to be fixed by one of those changes seems a bit unexpected. But if you've tested thoroughly and it seems to really be the case, well, I guess that's that...
comment:32 by , 12 months ago
comment:33 by , 12 months ago
Hmm. Could you perhaps try and narrow down precisely which commit in that series fixed the problem? It might be good to get a better understanding of it. If it's just the locking commits that's one thing, but if it's the Queue changes, that might be indicative of something else going on.
comment:34 by , 12 months ago
6d30655 kernel/condition_variable: Adjust notified count and wait-status returning rids me of crashes and deadlocks. I've noticed the syslog is spammed with libroot __mutex_unlock: mutex was not actually locked! messages while running the mimeset bombs, though.
comment:35 by , 12 months ago
That's strange. If it had been the commit prior to that one, that would make sense, but for it to be that one is odd.
Could you perhaps enable that debugger() invocation and see where it triggers? If it's inside hoard2 that's one thing, but if it's triggering inside registrar itself that may be the real problem.
comment:36 by , 12 months ago
Edit: forgot to say these are enabling the debugger on current master
There's a variety of stack traces, all from registrar threads (usually "create_app_meta_mime (s)", but I got one from "messaging command processor"), always with free
at the top.
stack trace, current PC 0x1c2360ab099 </boot/system/lib/libroot.so> _kern_debugger + 0x9: (0x7f4324e20c10) 0x1c236133169 </boot/system/lib/libroot.so> free + 0x49 (0x7f4324e20c40) 0x1dd98b00136 </boot/system/lib/libbe.so> _ZN3agg11pod_bvectorINS_11vertex_distELj6EED2Ev + 0x36 (0x7f4324e20c60) 0x1dd98b01451 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon17StrokeTransformerD0Ev + 0x11 (0x7f4324e20c80) 0x1dd98afa5d8 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon9ContainerINS0_11TransformerEE9MakeEmptyEv + 0xd8 (0x7f4324e20d10) 0x1dd98afa140 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon5ShapeD2Ev + 0xb0 (0x7f4324e20d40) 0x1dd98af94d1 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon15PathSourceShapeD0Ev + 0x11 (0x7f4324e20d60) 0x1dd98b025f7 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon9ContainerINS0_5ShapeEE9MakeEmptyEv + 0xb7 (0x7f4324e20df0) 0x1dd98b022d4 </boot/system/lib/libbe.so> _ZN8BPrivate4Icon4IconD2Ev + 0x34 (0x7f4324e20e20) 0x1dd98af5e3a </boot/system/lib/libbe.so> _ZN10BIconUtils13GetVectorIconEPKhmP7BBitmap + 0x29a (0x7f4324e21280) 0x1dd98aacd60 </boot/system/lib/libbe.so> _ZNK12BAppFileInfo14GetIconForTypeEPKcP7BBitmap9icon_size + 0x340 (0x7f4324e21330) 0xad7223f5e7 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime18AppMetaMimeCreator2DoERK9entry_refPb + 0x4b7 (0x7f4324e21640) 0xad7223e674 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread11UpdateEntryEPK9entry_ref.localalias + 0x54 (0x7f4324e21710) 0xad7223e82d </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread14ThreadFunctionEv + 0xad (0x7f4324e217b0) 0x1c2360a9dc9 </boot/system/lib/libroot.so> thread_entry + 0x19
stack trace, current PC 0x441a005099 </boot/system/lib/libroot.so> _kern_debugger + 0x9: (0x7f6eec37d750) 0x441a08d169 </boot/system/lib/libroot.so> free + 0x49 (0x7f6eec37d780) 0x1b4c6ec2c0c </boot/system/lib/libbe.so> _ZN8BPrivate7Storage12ResourceFile13_InitELFXFileI10Elf64_Ehdr10Elf64_Phdr10Elf64_ShdrEEvR5BFilem + 0x3bc (0x7f6eec37d850) 0x1b4c6eac166 </boot/system/lib/libbe.so> _ZN8BPrivate7Storage12ResourceFile12_InitELFFileER5BFile + 0x86 (0x7f6eec37d8a0) 0x1b4c6ead072 </boot/system/lib/libbe.so> _ZN8BPrivate7Storage12ResourceFile9_InitFileER5BFileb + 0x232 (0x7f6eec37d930) 0x1b4c6ead0dd </boot/system/lib/libbe.so> _ZN8BPrivate7Storage12ResourceFile5SetToEP5BFileb + 0x2d (0x7f6eec37d970) 0x1b4c6ead8a6 </boot/system/lib/libbe.so> _ZN10BResources5SetToEPK5BFileb + 0xd6 (0x7f6eec37d9b0) 0x1b4c6e940e6 </boot/system/lib/libbe.so> _ZN12BAppFileInfo5SetToEP5BFile + 0xe6 (0x7f6eec37d9e0) 0x1b4c6e941ba </boot/system/lib/libbe.so> _ZN12BAppFileInfoC1EP5BFile + 0x3a (0x7f6eec37da00) 0x4629e281ce </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime18AppMetaMimeCreator2DoERK9entry_refPb + 0x9e (0x7f6eec37dd10) 0x4629e27674 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread11UpdateEntryEPK9entry_ref.localalias + 0x54 (0x7f6eec37dde0) 0x4629e276dd </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread11UpdateEntryEPK9entry_ref.localalias + 0xbd (0x7f6eec37deb0) 0x4629e276dd </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread11UpdateEntryEPK9entry_ref.localalias + 0xbd (0x7f6eec37df80) 0x4629e2782d </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread14ThreadFunctionEv + 0xad (0x7f6eec37e020) 0x441a003dc9 </boot/system/lib/libroot.so> thread_entry + 0x19
stack trace, current PC 0x20825b12099 </boot/system/lib/libroot.so> _kern_debugger + 0x9: (0x7ffe15231920) 0x20825b9a169 </boot/system/lib/libroot.so> free + 0x49 (0x7ffe15231950) 0xbf173b5f11 </boot/system/servers/registrar> _ZN16MessageDeliverer14DeliverMessageEPKviR18MessagingTargetSetl + 0x3a1 (0x7ffe152319d0) 0xbf173b83c6 </boot/system/servers/registrar> _ZN16MessagingService17_CommandProcessorEv + 0xd6 (0x7ffe15231a30) 0x20825b10dc9 </boot/system/lib/libroot.so> thread_entry + 0x19
stack trace, current PC 0xab414e5099 </boot/system/lib/libroot.so> _kern_debugger + 0x9: (0x7fee041ffeb0) 0xab4156d169 </boot/system/lib/libroot.so> free + 0x49 (0x7fee041ffee0) 0xf1100356d3 </boot/system/lib/libbe.so> _ZN9BMallocIOD2Ev + 0x23 (0x7fee041fff00) 0x107a56ce280 </boot/system/servers/registrar> _ZN16MessageDeliverer14DeliverMessageEP8BMessageR18MessagingTargetSetl + 0x70 (0x7fee041fff90) 0x107a56ce2f1 </boot/system/servers/registrar> _ZN16MessageDeliverer14DeliverMessageEP8BMessage10BMessengerl + 0x51 (0x7fee04200010) 0x107a56d06c4 </boot/system/servers/registrar> _ZN11MIMEManager6NotifyEP8BMessageRK10BMessenger + 0x44 (0x7fee04200060) 0x107a56e21a2 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime8Database18_SendMonitorUpdateEiPKcbi + 0x112 (0x7fee04200120) 0x107a56e303b </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime8Database14SetIconForTypeEPKcS4_PKvm + 0x2fb (0x7fee042001e0) 0x107a56e0ce1 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime18AppMetaMimeCreator2DoERK9entry_refPb + 0xbb1 (0x7fee042004f0) 0x107a56df674 </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread11UpdateEntryEPK9entry_ref.localalias + 0x54 (0x7fee042005c0) 0x107a56df82d </boot/system/servers/registrar> _ZN8BPrivate7Storage4Mime16MimeUpdateThread14ThreadFunctionEv + 0xad (0x7fee04200660) 0xab414e3dc9 </boot/system/lib/libroot.so> thread_entry + 0x19
comment:37 by , 12 months ago
I guess this is an instance of #18451, then. It looks like a more reliable reproducer than anything else we have so far.
When you say "click on a video a gazillion times", you mean double click it in Tracker, right? Do you leave each instance open, or do you close it each time before you open it again? Since one of my changes to the media node based playback in MediaPlayer, the video producer node and the video consumer node share the same video buffers (to avoid having to copy a video frame when it is transfered from the producer to the consumer node). Upon destruction of the connection, the media_server failes to deallocate these buffers. I have not investigated this issue closer for the time being. Just a hint in case I don't get around to fixing that, for whoever wants to work on it later.