Opened 17 years ago
Closed 15 years ago
#1925 closed bug (fixed)
Reboot during startup on Toshiba Satellite 2615DVD/6.0
Reported by: | umccullough | Owned by: | bonefish |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | x86 |
Description
Booting Haiku natively on this laptop causes it to reboot shortly after showing the HAIKU logo at the bottom of the screen. Tried disabling all options in boot menu, and it won't boot into Safe Mode either.
Possibly something I've done wrong, or possibly the funky hardware on this laptop (some Linux LiveCDs also hang at boot on this machine, but usually after the X server is started).
Laptop is an older Toshiba Satellite 2615DVD/6.0:
PII Celeron running at ~430 MHz, 192 MB RAM, 6 GB disk with Linux on it (the first partition on the drive is ~256 MB, for Haiku)
Attaching serial debug log. lspci output will be attached shortly.
Note: this was built and installed directly to the partition from within Xubuntu 7.10 which generally doesn't give me any problems on other machines.
Attachments (18)
Change History (67)
by , 17 years ago
Attachment: | tosh_sat_r24408.txt added |
---|
comment:1 by , 17 years ago
Should also add that it was hrev24408 built with gcc2.95.3 and configured with --include-gpl-addons and --include-3rdparty
comment:2 by , 17 years ago
I have seen a similar issue on an HP DC 7600 (Pentium 4), booting from CD. I haven't filed a bug yet because I don't have the pc at hand to do tests (it's not my pc). I was able to boot further (it then panics by not finding a boot volume) by disabling the "use APIC I/O" (or something similar) option in the bios.
comment:3 by , 17 years ago
Good idea. I will try messing with some of the more obscure BIOS settings and see if that changes my results.
Will update in a day or two with any results I find.
comment:4 by , 17 years ago
I was unable to find any BIOS options that would change the behavior. (The Toshiba BIOS on this laptop is amazingly stupid).
I suppose it's time for me to start adding some trace code to find where it fails - any recommendations on where to start based on the attached serial log?
by , 17 years ago
Attachment: | tosh_sat_r24494_with_trace.txt added |
---|
New serial log with TRACE_VM and TRACE_FAULTS enabled
comment:5 by , 17 years ago
I attached a new serial log from Haiku hrev24494 and I enabled TRACE_VM and TRACE_FAULTS in vm.cpp (had to comment out a trace line that didn't compile, btw).
It looks like it may be failing in find_and_insert_area_slot()
Anything I can try or provide to further identify the cause here?
comment:6 by , 17 years ago
The line "PANIC: looking up page failed for pa 0x41433000" can only be caused by vm_create_anonymous_area() when it tries to create areas for already bound memory. There seems to be something seriously wrong there, which is also underpinned by the fact that it doesn't find its way into the debugger.
Could you also provide the output of TRACE_MMU in boot/platform/bios_ia32/mmu.cpp?
Would it be possible that the memory is corrupt on this machine? Does it work flawlessly with other operating systems?
comment:7 by , 17 years ago
It does work fine with Xubuntu on it (in fact, I used Xubuntu on this very machine to build Haiku onto the other partition) and previously had Windows XP on it with no issues that I ever recall.
I believe I ran memtest86+ the other day to verify that very notion - but I only let it run through the first couple tests (as these usually find bad ram quickly in my experience). However, if it's a subtle memory issue I can let memtest86+ run through an ENTIRE pass of tests one night and see what it throws out.
I will also turn on the TRACE_MMU and rebuild soon and report back my results.
comment:8 by , 17 years ago
Don't worry about that; if other systems run that stable, there is something wrong with Haiku, I'd guess. It will probably corrupt some memory at some point of the boot process.
by , 17 years ago
Attachment: | tosh_sat_r24636_trace_vm_and_trace_mmu.txt added |
---|
New trace with TRACE_MMU and TRACE_VM enabled
comment:9 by , 17 years ago
Added another serial log with TRACE_MMU and TRACE_VM enabled.
LOTS of info in this one :/
comment:10 by , 16 years ago
I have the exact same problem with booting Haiku hrev25852 on a Toshiba Satellite Pro 4270. The boot process fails at the same place with the same PANIC messages.
I've had Win98, Ubuntu 8.04 and BeOS R5 running fine on the laptop.
Feel free to contact me if you'd like me to do any testing on this machine.
comment:11 by , 16 years ago
comment:12 by , 16 years ago
I've been testing revisions every few weeks on this same laptop with no change in result until recently.
I just finally got around to testing hrev27470 and noticed that instead of an immediate reboot, it now hangs at the boot screen with *no* icons lit.
Unfortunately I haven't had a chance to dig any deeper yet, I'll try to get a serial log and try going back some revs to find out what changed.
I will do so shortly.
by , 16 years ago
Attachment: | Haiku_r27641_tosh_sat.txt added |
---|
new failure during boot with hrev27641
comment:13 by , 16 years ago
I attached dmesg output from ubuntu (currently gOS 3.0 beta) in hopes that it might yield something of interest that is specific to this machine in the early boot process.
comment:14 by , 15 years ago
Just in case anyone was wondering, this laptop still reboots shortly after displaying the splash screen.
I was hoping some of the recent triple-fault fixes might have fixed it, but really wasn't expecting much since they seemed unrelated.
Tested with hrev32451 IIRC.
If there's any amount of debug info I can provide to further track this down, please don't hesitate to ask.
follow-up: 16 comment:15 by , 15 years ago
Have you tried using the on-screen debug output option in the boot loader?
comment:16 by , 15 years ago
Replying to axeld:
Have you tried using the on-screen debug output option in the boot loader?
Pretty certain this has been tried in the past, but I'm not sure what you're asking for here. If you want a serial debug I can provide it (and already have attached an older one to this ticket)...
Is there some additional debugging I can enable beyond the TRACE_VM and TRACE_MMU that I have already provided?
comment:17 by , 15 years ago
My poor Toshiba laptop still fails to boot, but now it fails with an ASSERT at least!
PANIC: ASSERT FAILED (/home/umccullough/haiku/haiku/trunk/src/system/kernel/arch/x86/arch_vm_translation_map.cpp:1411): (sPageHole[va / 4096] & 0x00000001) == 0
Welcome to Kernel Debugging Land...
This is a new serial output result, but the machine still reboots before the first icon lights up on the splash screen. (will attach complete serial log)
I can enable more debugging output if that will help.
comment:18 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Let's first track down the kernel debugger problem. Otherwise it's mighty annoying to get more info.
PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fc68f
Please run objdump -D --demangle path/to/kernel_x86
in Linux and attach the disassembled function that contains the mentioned eip address (0x800fc68f -- might have changed, if you've recompiled the kernel in the meantime).
follow-up: 20 comment:19 by , 15 years ago
Ok, I apparently had already recompiled it, so I used this opportunity to upgrade to hrev35633 before getting the info you needed.
The new failure happens here:
PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fcde7
And this maps back to this function in the kernel_x86 dump:
800fcdd8 <LargeMemoryPhysicalPageMapper::GetPageDebug(unsigned long, unsigned long *, void **)>:
(hand typed, sorry if there's an error)
I can upload the entire dump if you want, but it's 28mb
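The lookup described above (finding the function whose disassembly contains the faulting eip) can be sketched in Python. This is only an illustration: it assumes symbol header lines of the form `800fcdd8 <Name>:` as in the excerpt above, `find_function` is a hypothetical helper rather than any Haiku tool, and the instruction line and second symbol in the sample are made up.

```python
import re

# Matches objdump symbol headers such as "800fcdd8 <Some::Function(args)>:"
SYMBOL_HEADER = re.compile(r"^([0-9a-fA-F]+) <(.+)>:\s*$")


def find_function(objdump_lines, address):
    """Return (start, name) of the nearest symbol header at or below address.

    Returns None if no symbol starts at or below the address. The function's
    end is not checked, so this only picks the closest preceding symbol.
    """
    best = None
    for line in objdump_lines:
        match = SYMBOL_HEADER.match(line)
        if not match:
            continue  # instruction lines and blank lines are skipped
        start = int(match.group(1), 16)
        if start <= address and (best is None or start > best[0]):
            best = (start, match.group(2))
    return best


sample = [
    "800fcdd8 <LargeMemoryPhysicalPageMapper::GetPageDebug(unsigned long, unsigned long *, void **)>:",
    " 800fcde7:\t8b 40 04\tmov 0x4(%eax),%eax",  # illustrative line, not from the real dump
    "800fce40 <hypothetical_next_function>:",      # made-up symbol for the example
]

start, name = find_function(sample, 0x800FCDE7)
print(hex(start), name)
```

Feeding it the lines of a real `objdump -D --demangle` dump (for example via `open(path).read().splitlines()`) should work the same way, as long as the header format matches.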
comment:20 by , 15 years ago
Replying to umccullough:
I can upload the entire dump if you want, but it's 28mb
Just the section for LargeMemoryPhysicalPageMapper::GetPageDebug() would be perfect. Thanks!
by , 15 years ago
Attachment: | GetPageDebug_r35633.txt added |
---|
dump of LargeMemoryPhysicalPageMapper::GetPageDebug() from hrev35633
by , 15 years ago
Attachment: | 1925-page-mapper-debug-output.diff added |
---|
patch: additional debug output in LargeMemoryPhysicalPageMapper
comment:21 by , 15 years ago
Apparently the page mapper's fDebugPool is NULL. Please try the attached patch. It would produce a bit more debug output, that hopefully helps to understand the reason. Updating only the kernel will suffice.
follow-up: 24 comment:22 by , 15 years ago
I'm afraid the debug output appears to be corrupt on the serial line, see the snippet below:
load kernel... video mode: 800x600x32 kernel entry at 8004b058-¡Ukë«« ½É¹±ÕɽÕÑÁÕÑ 5)! µ½ÙÁÑåÁÁ½¹éÍÕÙÍÍ5) ÑÁÍÍÙµµááÍÉ5)ÁááÁÅÙÍÙÁµù1 ±Aµ½ÉåA¡åÍ¥¹}µÍÉÁÍÑÁÁ¥¹ÅÁ±ÍÑÉ¥¹¹Õ¥¹%¹Ñ±AUÁé 5ÕÉÍéÁÕÙµÁÍÑÍ ÁÁÉéé%¹¥Ñ¡¥5)ÉÍÉÙÁ éÁááÁÙáÉÁÁÁ5) ±¥ Welcome to Kernel Debugging Land... PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fce73
Any ideas on what is causing that?
comment:23 by , 15 years ago
Slightly more readable (set my receiving terminal to UTF-8 and rebooted again):
load kernel... video mode: 800x600x32 kernel entry at 8004b058-▒Uk뫫 AU▒▒遙▒▒▒▒ɕ▒遙▒Ձٵ▒▒▒▒▒▒͕▒▒͍▒▒▒Ɂ▒▒▒▒▒▒▒▒▒▒͕▒▒▒▒Ɂ▒▒▒▒▒▒▒▒▒▒ف▒с▒͕▒ف▒▒ၙ▒▒Ɂ5)▒▒▒▒▒▒͙▒▒▒1▒ɝ▒5▒▒▒▒▒A▒▒ͥ▒▒▒A▒▒▒5▒▒▒▒▒▒%▒▒ѡ▒5)▒▒ɕ͕▒ٕ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒5)▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒5)▒PANIC: ASSERT FAILED (/home/umccullough/haiku/haiku/trunk/src/system/kernel/arch/x86/arch_vm_translation_map.cpp:1411): (sPageHole[va / 4096] & 0x00000001) == 0 Welcome to Kernel Debugging Land... PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fce73
But still nothing usable I think.
comment:24 by , 15 years ago
Replying to umccullough:
I'm afraid the debug output appears to be corrupt on the serial line, see the snippet below:
[...]
Any ideas on what is causing that?
Maybe a hardware problem (cable, port)? You could try a lower bit rate (kernel settings).
comment:25 by , 15 years ago
I had a flash of inspiration. hrev35644 should fix the recursive panic(). I.e. the system should no longer reboot and at least spit out a stack trace. The stack trace will be numeric only, though, so the symbols will have to be resolved manually. Please attach the kernel, if you find a zip tool that makes it small enough. Otherwise feel free to send it via email.
comment:26 by , 15 years ago
Interestingly, hrev35644 seems to at least fix the corrupted serial output, but it doesn't fix the panic problem.
I'll try to add your dprintf's again and see if I can get better info next.
New output:
load kernel...
video mode: 800x600x32
kernel entry at 8004b058
Welcome to kernel debugger output!
Haiku revision: 35643
CPU 0: type 0 family 6 extended_family 0 model 6 extended_model 0 stepping 10, string 'GenuineIntel'
CPU 0: features: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr
PANIC: ASSERT FAILED (/home/umccullough/haiku/haiku/trunk/src/system/kernel/arch/x86/arch_vm_translation_map.cpp:1411): (sPageHole[va / 4096] & 0x00000001) == 0
Welcome to Kernel Debugging Land...
PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fced7
Welcome to Kernel Debugging Land...
PANIC: page fault in debugger without fault handler! Touching address 0x00000004 from eip 0x800fced7
Note: it says rev 35643, but I'm pretty certain I updated it fully. I'll check again.
by , 15 years ago
Attachment: | tosh_sat_r35646.txt added |
---|
Toshiba serial log for hrev35646 with stack trace
follow-up: 29 comment:28 by , 15 years ago
Note to self: remember to "svn switch" all old working copies to the new svn repo to avoid delayed updates.
New serial log attached here, I am emailing the kernel_x86 as it was still > 488kb bz2'd.
It seems keyboard input even works in kdebug, but it doesn't show on the laptop display, only via serial. It also doesn't accept any of the commands I tried, so I guess the debugger is in a barely-functional state at this point?
On the bright side, something seems to have fixed the corrupt serial output as well, maybe i jiggled the cable just right :)
follow-up: 30 comment:29 by , 15 years ago
Replying to umccullough:
New serial log attached here, I am emailing the kernel_x86 as it was still > 488kb bz2'd.
The stack trace translates to:
arch_debug_stack_trace
kernel_debugger_loop
kernel_debugger_internal
panic
arch_vm_translation_map_early_map
vm_allocate_early
LargeMemoryPhysicalPageMapper::Init
large_memory_physical_page_ops_init
arch_vm_translation_map_init
vm_init
_start
As expected we're still in LargeMemoryPhysicalPageMapper::Init(), the called vm_allocate_early() in this case is the second one (for page table and data).
It seems keyboard input even works in kdebug, but it doesn't show on the laptop display, only via serial. It also doesn't accept any of the commands I tried, so I guess the debugger is in a barely-functional state at this point?
Yes, it's too early in the boot process. There's no kernel heap yet and the kernel debugger commands haven't been registered.
On the bright side, something seems to have fixed the corrupt serial output as well, maybe i jiggled the cable just right :)
Good. :-)
I've added more debug output in hrev35658. I'm also attaching patch 1925-early-allocation-debug-output.diff which enables it and adds a bit more.
by , 15 years ago
Attachment: | 1925-early-allocation-debug-output.diff added |
---|
patch: additional debug output in vm_allocate_early and arch_vm_translation_map.cpp
follow-up: 31 comment:30 by , 15 years ago
comment:31 by , 15 years ago
Replying to umccullough:
Just to make sure I understand, do you want me to also apply the previous patch as well? (1925-page-mapper-debug-output.diff)
Nope, that won't be necessary. If it is still applied, you don't need to bother to remove it either, though.
by , 15 years ago
Attachment: | tosh_sat_r35662_TRACE_VM_TMAP.txt added |
---|
Toshiba hrev35662 serial output with TRACE_VM_TMAP enabled
comment:32 by , 15 years ago
It looks like the broken page table entry stems from the boot loader. Please update to hrev35686 and enable TRACE_MMU in src/system/boot/platform/bios_ia32/mmu.cpp. The previous patch can stay enabled.
comment:34 by , 15 years ago
by , 15 years ago
Attachment: | tosh_sat_r35688_TRACE_MMU.txt added |
---|
Toshiba 35688 serial log with TRACE_MMU
comment:35 by , 15 years ago
Attached is the new serial log.
Unfortunately, I'm getting junk serial output again after kernel entry, I hope the info you need is prior to that?
comment:36 by , 15 years ago
Still inconclusive. We're missing some earlier debug output. Please retry with hrev35699.
by , 15 years ago
Attachment: | tosh_sat_r35700_TRACE_MMU.txt added |
---|
Toshiba hrev35700 serial output with TRACE_MMU (take 2)
comment:37 by , 15 years ago
I'm beginning to see the problems. Please disable TRACE_MMU again and enable TRACE_MEMORY_MAP in the same file (a few lines later).
by , 15 years ago
Attachment: | tosh_sat_r35700_TRACE_MEMORY_MAP.txt added |
---|
Toshiba hrev35700 serial output with TRACE_MEMORY_MAP
comment:38 by , 15 years ago
Doesn't seem to be related to the problem I found. The extended memory says:
base 0x000e8000, len 0x00004000, type 1 (memory)
So the page at 0xe9000 should point to regular RAM. From tosh_sat_r35662_TRACE_VM_TMAP.txt:
early_tmap: entry pa 0xe8000 va 0x80e81000
early_map: asked for free page for pgtable. 0xe9000
PANIC: ASSERT FAILED (/home/umccullough/haiku/haiku/trunk/src/system/kernel/arch/x86/arch_vm_translation_map.cpp:1439): (sPageHole[va / 4096] & 0x00000001) == 0; existing pte: 0xffffffff
Page 0xe9000 is allocated for a page table. That means the page is completely cleared. Yet the immediately following assert fails, finding the value 0xffffffff in it. So obviously that isn't usable memory. Either the memory is defective or the memory map is wrong. I'd recommend running memtest.
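To spell out the reasoning above, here is a minimal Python sketch of the check the assert performs. It assumes only that bit 0 of an x86 page table entry is the "present" flag; the helper names are made up for illustration and are not Haiku kernel functions.

```python
PAGE_SIZE = 4096
PTE_PRESENT = 0x00000001  # bit 0 of an x86 page table entry: "present"


def is_present(pte):
    """True if the entry has the present bit set (the condition the assert rejects)."""
    return (pte & PTE_PRESENT) != 0


def in_range(address, base, length):
    """True if address lies inside the memory map entry [base, base + length)."""
    return base <= address < base + length


cleared_entry = 0x00000000   # what a freshly cleared page table should contain
observed_entry = 0xFFFFFFFF  # what the assert actually reported

print(is_present(cleared_entry))   # False: the assert would pass
print(is_present(observed_entry))  # True: the assert fails

# The page handed out for the page table lies inside the range the
# memory map reported as regular memory (base 0xe8000, len 0x4000).
print(in_range(0xE9000, 0x000E8000, 0x00004000))  # True
print(0x00004000 // PAGE_SIZE)                     # 4 pages in that range
```

A page that reads back as all ones right after being cleared is consistent with either explanation given above: defective memory or a range that is not really RAM.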
comment:39 by , 15 years ago
I've run memtest on this laptop half a dozen times since this problem started to occur :/
In any case, if you look at the ubuntu dmesg output attached to this ticket:
http://dev.haiku-os.org/attachment/ticket/1925/ubuntu_dmesg_toshiba.txt
It seems to have a different map here... Perhaps it knows something about that range that Haiku doesn't?
comment:40 by , 15 years ago
Scratch that, Debian dmesg currently reports something similar (will attach in a few mins).
by , 15 years ago
Attachment: | deb_dmesg_output.txt added |
---|
Toshiba debian dmesg output as of 2010-03-02
follow-up: 42 comment:41 by , 15 years ago
I'll run memtest86+ all day today to see if it finds anything. I have it set to "e820-All" to hopefully force it to test something that wouldn't otherwise be tested.
comment:42 by , 15 years ago
Replying to umccullough:
I'll run memtest86+ all day today to see if it finds anything. I have it set to "e820-All" to hopefully force it to test something that wouldn't otherwise be tested.
I suspect this is a BIOS issue (incorrect entry for the physical range) rather than defective memory. The question is why Linux doesn't have a problem with it. Maybe they treat certain ranges specially. I suppose we could ignore the ranges below 1 MB just as well. We already reserve 0x0 - 0xa0000 (dma_region) and 0xe0000 - 0x100000 (pc bios), which leaves only the 256 KB between 0xa0000 and 0xe0000. Neither in your example nor when I use qemu is there any usable memory in this range, anyway. I don't know how far http://wiki.osdev.org/Memory_Map_%28x86%29 can be trusted, but it essentially claims that 0xa0000 through 0x100000 is never usable.
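The 256 KB figure can be checked with a few lines of Python. This is just the arithmetic from the paragraph above; `uncovered` is a hypothetical helper and assumes the reserved ranges lie within the first megabyte.

```python
ONE_MB = 0x100000


def uncovered(reserved, limit):
    """Return the sub-ranges of [0, limit) not covered by the reserved ranges.

    Assumes every reserved (start, end) pair lies within [0, limit].
    """
    gaps = []
    cursor = 0
    for start, end in sorted(reserved):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < limit:
        gaps.append((cursor, limit))
    return gaps


# The two ranges the comment says are already reserved.
reserved = [
    (0x0, 0xA0000),       # dma_region
    (0xE0000, 0x100000),  # pc bios
]

for start, end in uncovered(reserved, ONE_MB):
    print(hex(start), hex(end), (end - start) // 1024, "KB")
```

With the two ranges from the comment, the only gap is 0xa0000 to 0xe0000, which is 0x40000 bytes, i.e. 256 KB.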
comment:43 by , 15 years ago
Before I left the house, I did have to switch memtest back to e820-Std mode, as the "all" mode kept freezing up. Upon reading the documentation for memtest, it was evident that this is considered an "unstable option" as it will scan reserved ranges as well.
Your diagnosis may well be correct, perhaps Linux just avoids this region for compatibility's sake. I wouldn't know where to look offhand personally, but maybe a glance at the code for Linux or BSD might yield some interesting comments there.
by , 15 years ago
Attachment: | tosh_sat_r35731_kdl.txt added |
---|
Toshiba hrev35731 serial output with GPE KDL
follow-up: 45 comment:44 by , 15 years ago
So, wow!
After updating to hrev35731, it boots a lot further (I had to disable all the tracing you had me enable previously since it was clogging the serial output).
As you can see from the newly attached serial output, we've gotten to a reasonable state of affairs, with a new KDL.
Perhaps we can close this ticket now and I can open a new one with this new problem?
comment:45 by , 15 years ago
Replying to umccullough:
Perhaps we can close this ticket now and I can open a new one with this new problem?
I think it's still the same problem. Looking at the objdump of the locked_pool module the function currently dereferences %edi, which comes from a memory allocation and currently has the value 0xffffffff. So that's probably one of those unusable four pages, which is just used at a different place now.
comment:46 by , 15 years ago
Version: | R1/pre-alpha1 → R1/Development |
---|
I went ahead and changed the boot loader to ignore all memory < 1 MB in hrev35736. This should fix the issue.
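As a rough illustration of the idea (not the actual change in hrev35736, which is not shown in this ticket), ignoring memory below 1 MB could look like the following Python sketch. The entry layout `(base, length, type)` and the name `ignore_below_1mb` are assumptions; only the first sample entry comes from this ticket, the other two are made up.

```python
ONE_MB = 0x100000


def ignore_below_1mb(entries):
    """Clip (base, length, type) entries so that nothing below 1 MB remains."""
    result = []
    for base, length, entry_type in entries:
        end = base + length
        if end <= ONE_MB:
            continue  # entirely below 1 MB: drop it
        if base < ONE_MB:
            base = ONE_MB  # straddles the boundary: keep only the upper part
        result.append((base, end - base, entry_type))
    return result


memory_map = [
    (0x000E8000, 0x00004000, 1),  # the suspicious entry quoted in this ticket
    (0x000FF000, 0x00002000, 1),  # made-up entry straddling the 1 MB boundary
    (0x00100000, 0x0BF00000, 1),  # made-up entry entirely above 1 MB
]

for base, length, entry_type in ignore_below_1mb(memory_map):
    print(hex(base), hex(length), entry_type)
```

In this sketch the 0xe8000 entry disappears completely, so the four pages it describes can never be handed out by the early allocator.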
by , 15 years ago
Attachment: | tosh_sat_r35737_hang.txt added |
---|
Toshiba hrev35737 serial output hang after rocket icon
comment:47 by , 15 years ago
Definitely gets further again.
All boot icons are lit now, but it's currently hung after the rocket icon.
No indication or movement in the serial output as to the problem that I can see. I'll have to mess with it some more after work tonight.
follow-up: 49 comment:48 by , 15 years ago
Note: I can get into KDL still, so if there's any specific output that would be relevant, let me know. Also, would this be the right time to close this and open a new ticket?
comment:49 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Replying to umccullough:
Note: I can get into KDL still, so if there's any specific output that would be relevant, let me know.
In the log there is:
Finding best mode failed
Which is a message from the app server (Screen.cpp). I'm not really familiar with the app server. This may be fatal and the reason for the boot process not continuing, or it may be harmless. Axel or Stippi will know.
You could add output to the Bootscript to see how far it gets (redirect to "/dev/dprintf"). Or, maybe easier, a "teams" in the kernel debugger will let you see what is already running. If a "waitfor" is among those a "team <id>" for that team will list the command line argument (among other things).
Also, would this be the right time to close this and open a new ticket?
Absolutely.
Toshiba Satellite boot failure log