#1056 closed bug (fixed)
Kernel Space Page Fault during Late Boot Process
Reported by: | bonefish | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
Every few boots I get a KDL during the late boot process under VMWare with hrev20262. The sc doesn't help at all. I don't really know how to best extract helpful infos from the kernel debugger in this case. I guess you'll know.
The virtual machine is configured with 256 MB RAM and one processor, the standard Haiku image as HD, and the usual peripherals.
Attachments (2)
Change History (11)
by , 18 years ago
comment:1 by , 18 years ago
The problem occurred three or four times in a rather short period of time. Since reporting it, it didn't happen even once more (probably 20+ runs since then). Fascinating.
comment:2 by , 18 years ago
comment:3 by , 18 years ago
As I wrote, I couldn't even reproduce it with hrev20262 anymore. :-) I think I hit the same (or similar) bug twice or thrice with earlier revisions, not very often, though. It looked like I really could reproduced quite frequently now, but apparently this isn't the case after all.
comment:4 by , 18 years ago
Reproduced with hrev20288. I add the serial debug output. The sc of the thread hitting KDL is as inconclusive as before. We've got a nice sc from the first userland crash though. Translating the libroot addresses:
0x21b18: atomic_test_and_set() (frameless) BPrivate::hoardLock(long &) 0x29c9a: BPrivate::threadHeap::malloc(unsigned long) 0x2a9e9: malloc() 0x77a81: add_fork_hook() 0x77bdc: __register_atfork() 0x6a2dc: atfork() 0x25be1: __init_heap() 0x170a3: initialize_before() 0x16f3e: _init_before() [runtime_loader...]
The accessed lock variable is part of the threadHeap object, which is part of the processHeap object. So, it looks like the pointer getAllocator() returns was invalid. Don't see how that would be possible though. It also doesn't explain why shortly after another process misbehaves in the kernel. Likely both are caused by a VM bug.
comment:5 by , 18 years ago
This bug is becoming my favorite... :-) Still hrev20288:
PANIC: cacheRef 0x908ad620 ref count too low! Welcome to Kernel Debugging Land... Running on CPU 0 kdebug> sc stack trace for thread 0x32 "sh" kernel stack: 0x803fd000 to 0x80400000 user stack: 0x7efe7000 to 0x7ffe7000 frame caller <image>:function + offset 803ff744 (+ 48) 8007a44d <kernel>:kernel_debugger_loop + 0x00dd 803ff774 (+ 32) 8007aec2 <kernel>:kernel_debugger + 0x009a 803ff794 (+ 176) 8007ae1e <kernel>:panic + 0x0036 803ff844 (+ 64) 80052dc9 <kernel>:vm_cache_remove_consumer + 0x0225 803ff884 (+ 48) 8005288c <kernel>:vm_cache_release_ref + 0x0128 803ff8b4 (+ 64) 8004dd67 <kernel>:_vm_put_area__FP7vm_areab + 0x011f 803ff8f4 (+ 48) 8004f186 <kernel>:vm_delete_areas + 0x008a 803ff924 (+ 64) 800325a1 <kernel>:exec_team + 0x0131 803ff964 (+1088) 80033aca <kernel>:_user_exec + 0x006a 803ffda4 (+ 352) 8002fd0b <kernel>:syscall_dispatcher + 0x043b 803fff04 (+ 160) 80080984 <kernel>:i386_handle_trap + 0x0238 iframe at 0x803fffac (end = 0x80400000) eax 0x18 ebx 0x3250cc ecx 0xffffff00 edx 0x2 esi 0x7ffe7008 edi 0x15 ebp 0x7ffe6b70 esp 0x803fffdc eip 0xffff0102 eflags 0x207 vector: 0x63, error code: 0x0 803fffa4 (+ 0) ffff0102 7ffe6b70 (+ 192) 00227148 533:sh_seg0ro@0x00200000 + 0x27148 7ffe6c30 (+ 76) 00226fdf 533:sh_seg0ro@0x00200000 + 0x26fdf 7ffe6c7c (+ 96) 002264b9 533:sh_seg0ro@0x00200000 + 0x264b9 7ffe6cdc (+ 96) 002237a6 533:sh_seg0ro@0x00200000 + 0x237a6 7ffe6d3c (+ 64) 00224897 533:sh_seg0ro@0x00200000 + 0x24897 7ffe6d7c (+ 96) 00223a30 533:sh_seg0ro@0x00200000 + 0x23a30 7ffe6ddc (+ 80) 0022315d 533:sh_seg0ro@0x00200000 + 0x2315d 7ffe6e2c (+ 48) 00225a0e 533:sh_seg0ro@0x00200000 + 0x25a0e 7ffe6e5c (+ 80) 002239a4 533:sh_seg0ro@0x00200000 + 0x239a4 7ffe6eac (+ 80) 0022315d 533:sh_seg0ro@0x00200000 + 0x2315d 7ffe6efc (+ 48) 0021ee19 533:sh_seg0ro@0x00200000 + 0x1ee19 7ffe6f2c (+ 64) 0021d0b2 533:sh_seg0ro@0x00200000 + 0x1d0b2 7ffe6f6c (+ 64) 00215d9f 533:sh_seg0ro@0x00200000 + 0x15d9f 7ffe6fac (+ 48) 001007ae 7ffe6fdc (+ 0) 7ffe6fec 540:/bin/sh_main_stack@0x7efe7000 + 0xffffec
comment:6 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Spotted the bug in lock_memory(). Fixed in 20362.
I think get_memory_map() needs reviews, though. Since vm_translation_map_ops::query() never returns an error (x86), but a NULL address in what I would consider an error case (Huh, what's the point of a return value, if it is constant?), get_memory_map() fills the respective physical_entry() with NULL in such a case.
How about a get_memory_map_etc() with additional parameters returning the number of physical entries set and the total amount of they cover?
comment:7 by , 18 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Darn, commented and closed wrong bug. :-/
Although it might indeed have been the cause too. I reopen it until verified.
comment:9 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Should be solved in hrev20474.
serial debug output