Opened 15 years ago
Closed 15 years ago
#4360 closed bug (fixed)
KDL during bootscript on waitfor on SMP x86 machine
Reported by: | phoudoin | Owned by: | axeld |
---|---|---|---|
Priority: | blocker | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Platform: | x86 | | |
Description
For a few days now, every Haiku x86gcc4hybrid build (ATA stack) from trunk drops into KDL while the boot script runs: the default desktop screen shows up (app_server has started), and immediately afterwards this KDL message appears:
```
PANIC: page fault, but interrupts were disabled. Touching address 0x00020036 from eip 0x800d6d79
Welcome to Kernel Debugging Land...
Thread 65 "waitfor", on CPU 0.
kdebug>
```
Seen again this morning with hrev32762, but it has been happening for a few days already. Trying every safe-mode option other than "disable SMP" changes nothing. With only the "disable SMP" option, everything works fine. Except SMP ;-).
Haiku is installed natively. The machine is a Core 2 Quad Q6600 @ 2.4 GHz with a Radeon HD 4870 (-> vesa driver), 2 SATA disks (controller in SATA mode), a USB keyboard and mouse, a USB Volito tablet and a USB fingerprint reader.
Unfortunately, my Logitech Illuminated USB keyboard, connected to the onboard EHCI/UHCI controller, doesn't work while in KDL, so I can't investigate further from KDL :-\
I'll check with an r1a1 raw image tonight.
Change History (13)
follow-up: 2 comment:1 by , 15 years ago
comment:2 by , 15 years ago
Replying to axeld:
It would be very interesting to know which revision brought that regression.
I dunno exactly, but it happened within the last two weeks, no earlier.
Tonight I'll test with a clean r1a1 image written to my boot partition. Maybe I'm just doing something nuts in my UserBuildConfig or build config, or, worse, I simply didn't notice that svn update merged some forgotten but evil changes I keep secret (for good reason, as this would prove!) deep in my working copy.
follow-up: 4 comment:3 by , 15 years ago
That's interesting. Marcus reported having to disable SMP on his quad as well while he was testing ATA yesterday. I have a Q6600 here as well; it works fine, but it's only at hrev32438 right now. I'll update my second partition to a current revision and see whether I can reproduce it.
The keyboard not working in KDL is unfortunate, but it is caused by the fact that usb_hid needs to be active to register the USB keyboard. This is only the case after the input_server has been started and usb_hid is in use. Besides that, there's a bug somewhere that causes USB keyboards not to work unless you have entered KDL manually first. That one is solvable, though, and on my TODO list.
comment:4 by , 15 years ago
I'll plug in my old PS/2 keyboard in order to report more (hopefully) useful info on that KDL.
follow-up: 6 comment:5 by , 15 years ago
comment:6 by , 15 years ago
comment:7 by , 15 years ago
A PS/2 keyboard, a second computer and a USB-serial adapter gave me more info:
```
allocate MTRR slot 0, base = 7ff00000, length = 100000, type=0x0
allocate MTRR slot 1, base = 0, length = 80000000, type=0x6
kernel debugger extension "debugger/disasm/v1": loaded
kernel debugger extension "debugger/hangman/v1": loaded
kernel debugger extension "debugger/invalidate_on_exit/v1": loaded
kernel debugger extension "debugger/run_on_exit/v1": loaded
kernel debugger extension "debugger/usb_keyboard/v1": loaded
allocate MTRR slot 2, base = e0000000, length = 800000, type=0x1
acpi: ACPI disabled
ahci: ahci_supports_device
PANIC: page fault, but interrupts were disabled. Touching address 0x00020036 from eip 0x800d6d79
Welcome to Kernel Debugging Land...
Thread 65 "waitfor" running on CPU 0
kdebug> sc
stack trace for thread 65 "waitfor"
    kernel stack: 0x8020e000 to 0x80212000
      user stack: 0x7efef000 to 0x7ffef000
frame caller <image>:function + offset
 0 80211ab0 (+ 32) 80065155 <kernel_x86> invoke_command_trampoline(void*: 0x80211b30) + 0x0015
 1 80211ad0 (+ 12) 800caec3 <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
 2 80211adc (+ 48) 8006349b <kernel_x86>:debug_call_with_fault_handler + 0x004c
 3 80211b0c (+ 64) 80065523 <kernel_x86>:invoke_debugger_command + 0x00bb
 4 80211b4c (+ 48) 80065640 <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x8011b5c2, int32: 0, char*: NULL) + 0x0083
 5 80211b7c (+ 32) 80065708 <kernel_x86>:invoke_debugger_command_pipe + 0x008b
 6 80211b9c (+ 128) 800695e7 <kernel_x86> ExpressionParser<0x80211c6c>::_ParseCommandPipe(int&: 0x80211c68) + 0x0aa3
 7 80211c1c (+ 48) 8006bd87 <kernel_x86> ExpressionParser<0x80211c6c>::EvaluateCommand(char const*: 0x8011b5c0 "sc", int&: 0x80211c68) + 0x06d5
 8 80211c4c (+ 192) 8006bf00 <kernel_x86>:evaluate_debug_command + 0x0084
 9 80211d0c (+ 96) 80064495 <kernel_x86> kernel_debugger_internal(char const*: 0x819f4800 "", int32: -2145313176) + 0x0395
10 80211d6c (+ 16) 8006461c <kernel_x86>:kernel_debugger + 0x003f
11 80211d7c (+ 160) 800646d9 <kernel_x86>:panic + 0x002a
12 80211e1c (+ 64) 800c7ebe <kernel_x86> page_fault_exception(iframe*: 0x80211e68) + 0x011e
13 80211e5c (+ 12) 800cb26d <kernel_x86>:int_bottom + 0x003d
kernel iframe at 0x80211e68 (end = 0x80211eb8)
 eax 0x20036 ebx 0x20002 ecx 0x0 edx 0x80120060
 esi 0x80211f24 edi 0x20036 ebp 0x80211ec4 esp 0x80211e9c
 eip 0x800d6d79 eflags 0x10086
 vector: 0xe, error code: 0x0
14 80211e68 (+ 92) 800d6d79 <kernel_x86>:strcmp + 0x0011
15 80211ec4 (+ 64) 80058cb7 <kernel_x86>:find_thread + 0x006c
16 80211f04 (+ 64) 80058d7f <kernel_x86>:_user_find_thread + 0x0049
17 80211f44 (+ 100) 800cb4a2 <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x80211fa8 (end = 0x80212000)
 eax 0x2d ebx 0x2c5e48 ecx 0x7ffeef1c edx 0xffff0114
 esi 0x7ffef538 edi 0x7ffef544 ebp 0x7ffeef38 esp 0x80211fdc
 eip 0xffff0114 eflags 0x216 user esp 0x7ffeef1c
 vector: 0x63, error code: 0x0
18 80211fa8 (+ 0) ffff0114 <commpage>:commpage_syscall + 0x0004
19 7ffeef38 (+ 48) 00200813 <_APP_>:main + 0x0057
20 7ffeef68 (+ 52) 0020069d <_APP_>:_start + 0x0051
21 7ffeef9c (+ 64) 0010525b </boot/system/runtime_loader@0x00100000>:unknown + 0x525b
22 7ffeefdc (+ 0) 7ffeefec 1095:waitfor_main_stack@0x7efef000 + 0xffffec
kdebug> teams
team id parent name
0x811b7000 1 0x00000000 kernel_team
0x811b7330 64 0x811b7198 registrar
0x811b74c8 65 0x811b7198 waitfor
0x811b7198 55 0x811b7000 sh
kdebug> threads
thread id state wait for object cpu pri stack team name
0x819f8000 31 waiting sem 115 - 20 0x809c3000 1 uhci finish thread
0x819f8800 32 waiting sem 116 - 10 0x809c7000 1 uhci cleanup thread
0x801241a0 1 ready - - 0 0x80201000 1 idle thread 1
0x848fe000 64 ready - - 10 0x80189000 64 registrar
0x819f9000 33 waiting sem 123 - 20 0x809cb000 1 uhci isochronous finish thread
0x80124780 2 running - 2 0 0x80980000 1 idle thread 2
0x819f4800 65 running - 0 10 0x8020e000 65 waitfor
0x81a0a000 34 waiting sem 128 - 20 0x809d0000 1 uhci finish thread
0x80120060 -1073430524 UNKNOWN - [*** READ FAULT at 0xd508d508, pc: 0x800594ca ***]
kdebug>
```
Something corrupts the thread list.
follow-ups: 9 11 comment:8 by , 15 years ago
I've seen the exact same panic happen here when using a kernel revision that was incompatible with libroot concerning the size of the DIR cookie, due to the recent addition of seekdir/telldir support fields. Is it possible that your kernel is out of sync somehow? Did you update the kernel separately (like I do often)?
comment:9 by , 15 years ago
Replying to mmlr:
Is it possible that your kernel is out of sync somehow? Did you update the kernel separately (like I do often)?
No, only when I tried to revert the scheduler changes. But the KDL output above came from an hrev32798 gcc4 build, done after a full svn update of my working copy (no changes pending) and a jam -qa @disk installing directly to the target partition. Kernel and libroot should be in sync in that case, right?
BTW, I still have to test with a nightly r1a1 gcc2 raw image.
comment:10 by , 15 years ago
Yeah, you should be fine there. You could try reverting just hrev32679, just in case.
comment:11 by , 15 years ago
Replying to mmlr:
I've seen the exact same panic happen here when using a kernel revision that was incompatible with libroot concerning the size of the DIR cookie, due to the recent addition of seekdir/telldir support fields. Is it possible that your kernel is out of sync somehow? Did you update the kernel separately (like I do often)?
DIR cookies never cross the libroot-kernel boundary, so regarding this change it really shouldn't matter if kernel and userland are not in sync.
The stack trace suggests that a thread structure, or the thread table itself, is corrupt. So a scheduler-related problem seems more likely, particularly since things work fine with SMP disabled. hrev32503 also having the problem speaks against the theory. Are you sure you have correctly updated to that revision (that is, the complete system, so that you don't miss any of the kernel, boot loader, runtime loader, or any library that makes syscalls)?
comment:13 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |