#17455 closed bug (fixed)
KDL in VirtualBox in ConditionVariableEntry::_RemoveFromVariable
Reported by: | jmairboeck | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta4 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | #17444, #17686 | Blocking: | |
Platform: | All |
Description
I just updated to hrev55706 and got this KDL when closing AboutSystem.
KERN: PANIC: variable pointer was not unset for a long time!
KERN: Welcome to Kernel Debugging Land...
KERN: Thread 621 "w:617:Über dieses System" running on CPU 1
KERN: stack trace for thread 621 "w:617:Über dieses System"
KERN: kernel stack: 0xffffffff8070c000 to 0xffffffff80711000
KERN: user stack: 0x00007f917f1f1000 to 0x00007f917f231000
KERN: frame caller <image>:function + offset
KERN: 0 ffffffff80710ba8 (+ 24) ffffffff8014444c <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
KERN: 1 ffffffff80710bc0 (+ 80) ffffffff800adb08 <kernel_x86_64> debug_call_with_fault_handler + 0x78
KERN: 2 ffffffff80710c10 (+ 96) ffffffff800af123 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
KERN: 3 ffffffff80710c70 (+ 80) ffffffff800af4be <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
KERN: 4 ffffffff80710cc0 (+ 240) ffffffff800af817 <kernel_x86_64> panic + 0xb7
KERN: 5 ffffffff80710db0 (+ 80) ffffffff800562f7 <kernel_x86_64> ConditionVariableEntry::_RemoveFromVariable() + 0x107
KERN: 6 ffffffff80710e00 (+ 64) ffffffff8005652e <kernel_x86_64> ConditionVariableEntry::Wait(unsigned int, long) + 0xae
KERN: 7 ffffffff80710e40 (+ 144) ffffffff8006d81c <kernel_x86_64> _get_port_message_info_etc + 0x11c
KERN: 8 ffffffff80710ed0 (+ 80) ffffffff8006e92b <kernel_x86_64> _user_port_buffer_size_etc + 0x4b
KERN: 9 ffffffff80710f20 (+ 16) ffffffff80145f3f <kernel_x86_64> x86_64_syscall_entry + 0xfb
KERN: user iframe at 0xffffffff80710f30 (end = 0xffffffff80710ff8)
KERN: rax 0xdb rbx 0x7fffffffffffffff rcx 0x4a3d1eb59c
KERN: rdx 0x7fffffffffffffff rsi 0x0 rdi 0x123
KERN: rbp 0x7f917f2304a0 r8 0x112351a3e7a0 r9 0x0
KERN: r10 0x0 r11 0x246 r12 0x0
KERN: r13 0x1123526eba90 r14 0x1123526eba90 r15 0x7fffffffffffffff
KERN: rip 0x4a3d1eb59c rsp 0x7f917f230478 rflags 0x246
KERN: vector: 0x63, error code: 0x0
KERN: 10 ffffffff80710f30 (+140265020061040) 0000004a3d1eb59c <libroot.so> _kern_port_buffer_size_etc + 0x0c
KERN: 11 00007f917f2304a0 (+ 64) 000001fcdde2bed6 <libbe.so> BPrivate::LinkReceiver::ReadFromPort(long) + 0x26
KERN: 12 00007f917f2304e0 (+ 32) 000001fcdde2be89 <libbe.so> BPrivate::LinkReceiver::GetNextMessage(int&, long) + 0x69
KERN: 13 00007f917f230500 (+ 112) 00000203136d948e <_APP_> ServerWindow::_MessageLooper() + 0x12e
KERN: 14 00007f917f230570 (+ 16) 00000203136bccea <_APP_> MessageLooper::_message_thread(void*) + 0x0a
KERN: 15 00007f917f230580 (+ 32) 0000004a3d1ea3a9 <libroot.so> _thread_do_exit_work (nearest) + 0x89
KERN: 16 00007f917f2305a0 (+ 0) 00007fdc25053260 <commpage> commpage_thread_exit + 0x00
KERN: kdebug> co
The KDL was continuable and it seems to run fine now.
Is this related to #17444?
Attachments (3)
Change History (25)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
My laptop's CPU doesn't run at full frequency (it fluctuates quite a bit) because of a partly broken fan, and I suppose virtualization doesn't make it any better. I'm using Linux as the host OS.
comment:3 by , 3 years ago
To be honest, it is somewhat concerning that thread_unblock apparently takes significantly longer than 10000 atomic_pointer_get calls, even after it has already woken up the other thread.
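The check that fired here can be pictured as a bounded spin-wait: the woken thread polls its entry's variable pointer until the notifying thread clears it, and gives up after a fixed number of polls. The following is a minimal user-space sketch of that pattern, not Haiku's actual kernel code. `Variable`, `wait_until_unset()` and `notify()` are hypothetical names, `std::atomic` stands in for `atomic_pointer_get`, and a sleep stands in for the cost of `thread_unblock`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical placeholder for the condition variable an entry waits on.
struct Variable {};

// Waiter side (hypothetical): poll `slot` until the notifier clears it, at
// most `maxTries` times. Returning false corresponds to the point where the
// kernel panicked with "variable pointer was not unset for a long time!".
bool wait_until_unset(const std::atomic<Variable*>& slot, int maxTries)
{
    for (int tries = 0; tries < maxTries; tries++) {
        if (slot.load(std::memory_order_acquire) == nullptr)
            return true;
        std::this_thread::yield();  // user-space stand-in for a pause hint
    }
    return false;
}

// Notifier side (hypothetical): the expensive unblock work runs before the
// pointer is cleared, so a waiter that is already running keeps seeing a
// stale pointer for the whole duration of that work.
void notify(std::atomic<Variable*>& slot, std::chrono::microseconds unblockCost)
{
    std::this_thread::sleep_for(unblockCost);  // stand-in for thread_unblock
    slot.store(nullptr, std::memory_order_release);
}
```

With `maxTries` set to 10000 and an `unblockCost` longer than those 10000 polls take, `wait_until_unset()` returns false even though the notifier is still making progress. The limit measures how quickly the waiting CPU loops, not whether the notifier is actually stuck.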
comment:5 by , 3 years ago
I finished building with the patch you sent yesterday and just got one such KDL again, though it seems to happen less often than before. I will now update to the official build and test again. Hopefully the increased timeout fixes this completely.
comment:6 by , 3 years ago
Unfortunately, it didn't seem to fix this completely. I just got one directly after booting hrev55709. :(
comment:7 by , 3 years ago
On the other hand, apart from the initial ones directly after booting, it now runs perfectly fine, even while playing audio and with Gerrit open in Web+. Before, most of the KDLs came from the "Audio mixer" and "Websocket" threads.
comment:8 by , 3 years ago
And just as I pressed "submit" on the previous comment, I immediately got 3 more, all from the "Audio mixer" thread, which shouldn't be doing anything right now...
comment:10 by , 3 years ago
Blocked By: | 17444 added |
---|
comment:12 by , 3 years ago
Using only a single virtual CPU seems to avoid the issue reliably: I have not gotten a single KDL in the last few hours. So, if I understand correctly, the two issues really do seem to be related.
comment:13 by , 3 years ago
The KDL cannot occur at all when there is only one CPU, so, that would make sense that you don't see it. :)
comment:14 by , 3 years ago
Can you try enabling X2APIC for your guest VM, using VBoxManage modifyvm "name" --x2apic on
(and then re-enabling more CPU cores) and see what happens? (Please check your syslog after doing this to make sure X2APIC is actually enabled.)
comment:15 by , 3 years ago
Still no luck with that option enabled. It does show up in the syslog a few times, but I still got a KDL just now.
comment:16 by , 3 years ago
I have pushed a patch that may help here: https://review.haiku-os.org/c/haiku/+/4819. Test builds are available; could you install the appropriate one (it should be https://haiku.movingborders.es/testbuild/I6e00286508c069705e07c9a0b59af2cf5e15e427/1/hrev55735/x86_64/haiku.hpkg) and see if that fixes things?
comment:19 by , 3 years ago
How frequently does it occur now? More or less often than in your last comments about it?
comment:20 by , 3 years ago
Considering that you have also reduced the timeout again, it is definitely less frequent than it was with the old timeout. But I think how often this occurs depends quite heavily on the host CPU frequency.
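Since the number of polls a spinning CPU completes in a given interval varies with its clock speed and, in a VM, with how the host schedules virtual CPUs, one alternative is to bound the wait by elapsed time instead of by a poll count. The sketch below only illustrates that alternative; it is not the change made in Haiku. `Variable` and `wait_until_unset_for()` are hypothetical names, and `std::chrono::steady_clock` stands in for a kernel time source such as `system_time()`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical placeholder for the condition variable an entry waits on.
struct Variable {};

// Hypothetical: poll `slot` until it is cleared or `budget` of wall-clock
// time has passed. The limit is tied to elapsed time rather than to how many
// loop iterations the waiting CPU happens to execute in that time.
bool wait_until_unset_for(const std::atomic<Variable*>& slot,
                          std::chrono::microseconds budget)
{
    const auto deadline = std::chrono::steady_clock::now() + budget;
    while (slot.load(std::memory_order_acquire) != nullptr) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::yield();
    }
    return true;
}
```

The trade-off is a clock read on every iteration. It also does not help when the notifying virtual CPU is itself descheduled by the host, so a time limit can still be exceeded on a heavily loaded host.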
comment:21 by , 3 years ago
Milestone: | Unscheduled → R1/beta4 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Panic downgraded into a syslog print in hrev55804.
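One plausible shape for "downgraded into a syslog print" is to report periodically and keep waiting, instead of panicking once a limit is reached. The sketch below assumes exactly that behavior, which the ticket does not spell out; `Variable`, `wait_until_unset_logging()` and `warnEvery` are hypothetical, and `fprintf` to stderr stands in for the kernel's `dprintf`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <thread>

// Hypothetical placeholder for the condition variable an entry waits on.
struct Variable {};

// Hypothetical: spin until `slot` is cleared, printing a warning every
// `warnEvery` polls instead of panicking. Returns how many warnings were
// printed; on return the pointer is guaranteed to be unset.
std::uint64_t wait_until_unset_logging(const std::atomic<Variable*>& slot,
                                       std::uint64_t warnEvery)
{
    std::uint64_t tries = 0;
    std::uint64_t warnings = 0;
    while (slot.load(std::memory_order_acquire) != nullptr) {
        if (++tries % warnEvery == 0) {
            std::fprintf(stderr,
                "variable pointer was not unset for a long time!\n");
            warnings++;
        }
        std::this_thread::yield();  // user-space stand-in for a pause hint
    }
    return warnings;
}
```

Compared with a hard limit, this keeps the diagnostic visible in the log while no longer treating a slow notifier as fatal.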
comment:22 by , 3 years ago
Blocked By: | 17686 added |
---|
Yes.
It seems thread_unblock is much more expensive than I'd hoped, and that it is still running when the other thread wakes up and tries to proceed, leading to this assertion failure. I could increase the timeout, or perhaps there is still something else I can do...