#1711 closed bug (fixed)
Haiku stops booting at allocate_commpage_entry(4, 34)
Reported by: | euan | Owned by: | korli |
---|---|---|---|
Priority: | critical | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
This only started happening in the last week or so. The hard disk light remains on.
Hardware: AMD Athlon X2 3800+, ASUS A8R-MVP, AMD / ALI chipset, 2GB RAM, PowerColor Radeon HD 2600XD
Last lines of syslog:
KDiskSystem::Unload(): file_systems/iso9660/v1 -> 0
KDiskSystem::Load(): file_systems/nfs/v1 -> 1
trying: file_systems/nfs/v1 returned: -1000/1000
KDiskSystem::Unload(): file_systems/nfs/v1 -> 0
allocate_commpage_entry(4, 34) -> 0xffff0118
Attachments (1)
Change History (54)
by , 17 years ago
Attachment: | syslog.txt added |
---|
comment:1 by , 17 years ago
It would be nice to know where exactly the kernel hangs. If you can enter the kernel debugger (F12, PS/2 keyboard needed), please print a stack trace of the "main2" thread. You can look up its ID in the listing produced by the command "threads" (probably 7) and pass it as an argument to "sc".
follow-up: 4 comment:2 by , 17 years ago
Hi,
Unfortunately the debugger is initialised on what would be the next line, so isn't ready (or doesn't work).
When I first booted with the PS/2 keyboard connected, it got past the line at fault with no problem; the next line reads: "kernel debugger extention "debugger/hangman/v1": loaded". However, it then locked up on the line after "starting EHCI Host Controller", and again the debugger could not be invoked. Disconnecting the USB devices still results in the same hang. Trying other combinations as we speak.
Perhaps we can move the debugger init earlier just to test?
comment:4 by , 17 years ago
Owner: | changed from | to
---|
Replying to euan:
Unfortunately the debugger is initialised on what would be the next line, so isn't ready (or doesn't work).
[...]
Perhaps we can move the debugger init earlier just to test?
Not sure what you mean by that. The kernel debugger is fully initialized shortly after the VM. The output "allocate_commpage_entry(4, 34)..." originates from a call from cpu_init_post_modules(), which is rather late in the boot process. There are only a few more initializations before the kernel is fully initialized and the boot script is started.
Replying to euan:
Update: Disabling USB in the BIOS allows the OS to boot.
That suggests it is USB related. Assigning to mmlr. Maybe he has an idea.
comment:5 by , 17 years ago
I added some additional logging:
Adding this to the syslog:
INIT: CPU init
allocate_commpage_entry(4, 34) -> 0xffff0118
INIT: VM init
INIT: debug init
register kernel daemon locking
register kernel daemon locked
register kernel daemon unlocking
register kernel daemon unlocked
syslog init post threads
syslog init post threads create sem
syslog init post threads create sem ok
syslog init post threads spawn thread
syslog init post threads spawn thread ok
open module list
The last line is from /kernel/debug.cpp:
status_t
debug_init_post_modules(struct kernel_args *args)
{
	void *cookie;

	// check for dupped lines every 10/10 second
	register_kernel_daemon(check_pending_repeats, NULL, 10);

	dprintf("syslog init post threads\n");
	syslog_init_post_threads();

	// load kernel debugger addons
	dprintf("open module list\n");
	cookie = open_module_list("debugger");
comment:8 by , 17 years ago
You can easily tell whether it's SMP related by disabling SMP from the boot menu and testing whether it works then. The open_module_list() function looks pretty uneventful to me, so I wouldn't know what should go wrong there, especially in relation to USB. You could also try to enter the debugger again (with F12); maybe it works now with SMP enabled after the changes in hrev23751. Then you can do a backtrace to find the exact location.
comment:9 by , 17 years ago
Disabling SMP makes no difference. I still have to try the new changes. If they don't work, I'll try reverting to older versions. I suspect it stopped functioning just around BeGeistert.
comment:10 by , 17 years ago
I tried revisions all the way back to December and still haven't found the cause. Even if I delete the USB input drivers and all the USB bus manager and bus files, it still won't start. Yet if I disable USB in the BIOS, it boots just fine.
comment:11 by , 17 years ago
Then it is most probably related to USB legacy emulation. In case you remove all USB files, the stack and host controller driver obviously will not get loaded. In this case the controllers will stay in legacy emulation mode. Maybe the PS/2 bus manager has a problem in that case? You could try removing the PS/2 bus manager too and see if this makes any difference.
comment:13 by , 17 years ago
It actually fails in vm_init_post_modules(), just after a call to x86_set_mtrr().
set_memory_type called with : id = -1, base = 0, length = bfee0000, type = 0x50000000
allocate MTRR slot 0, base = 0, length = 100000000
on Linux: cat /proc/mtrr
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xbff00000 (3071MB), size= 1MB: uncachable, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
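The Haiku log lines above are consistent with the requested length being rounded up to the next power of two, which would cover the full 4 GB instead of only the installed RAM. A minimal sketch of that arithmetic, assuming plain rounding (the helper names are hypothetical and this is not the kernel's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= length.
// Assumes 0 < length <= 2^63 so the shift cannot overflow.
uint64_t round_up_to_power_of_two(uint64_t length)
{
	assert(length > 0 && length <= (UINT64_C(1) << 63));
	uint64_t size = 1;
	while (size < length)
		size <<= 1;
	return size;
}

// Bytes covered by the rounded-up range beyond the requested length.
uint64_t over_coverage(uint64_t length)
{
	return round_up_to_power_of_two(length) - length;
}
```

For the logged length 0xbfee0000 this gives 0x100000000, so 0x40120000 bytes above the requested range would receive the same memory type. The Linux listing instead stops at 3 GB and carves out the last megabyte as uncachable.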
comment:14 by , 17 years ago
Owner: | changed from | to
---|
comment:16 by , 17 years ago
Here are my Linux MTRRs, for what it's worth.
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
Doing a build at the moment; will update in 15 mins.
comment:18 by , 17 years ago
Hmm, it seems that it now hangs in the part of the boot that loads the file systems. Can't be sure if it's another issue, or the same one...
comment:19 by , 17 years ago
Haiku fails to boot when I have 3GB of RAM in my system (128 MB is shared video RAM), but works with 1 or 2 GB. Linux MTRRs:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
reg02: base=0xa0000000 (2560MB), size= 128MB: write-back, count=1
reg03: base=0xc0000000 (3072MB), size= 128MB: write-combining, count=1
Haiku MTRRs:
0: base=0x00000000, size = 2048 MB.
1: base=0x80000000, size = 1024 MB.
2: base=0xc0000000, size = 1 MB.
(08:41:35 PM) DHowett: geist: Haiku's adds up to 3gb.. linux up to less than that but i take that to be related to my 128 mb shared video ram
(08:43:33 PM) geist: could be
(08:43:45 PM) geist: yeah, bet haiku is marking over the video ram as regular mem
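The Linux listing in this comment describes the write-back region with naturally aligned power-of-two blocks rather than one rounded-up range. A sketch of such a greedy split follows; it is an illustration under that assumption, not code taken from Linux or Haiku, and `Range` and `split_into_aligned_blocks` are made-up names:

```cpp
#include <cstdint>
#include <vector>

struct Range {
	uint64_t base;
	uint64_t size;
};

// Split [base, base + length) into power-of-two sized blocks, each aligned
// to its own size, always taking the largest block that still fits.
std::vector<Range> split_into_aligned_blocks(uint64_t base, uint64_t length)
{
	std::vector<Range> blocks;
	while (length > 0) {
		// Largest power of two not exceeding the remaining length.
		uint64_t size = 1;
		while (size <= length / 2)
			size <<= 1;
		// Shrink until the current base is aligned to the block size.
		while (base % size != 0)
			size >>= 1;
		blocks.push_back({base, size});
		base += size;
		length -= size;
	}
	return blocks;
}
```

Splitting 2688 MB (0xA8000000) starting at base 0 yields blocks of 2048 MB, 512 MB and 128 MB, which matches the three write-back entries Linux reports here.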
comment:20 by , 17 years ago
OK, so I managed to hook up my laptop and get a debug trace. Unfortunately the only difference is that it hangs one line before the original log. So the allocate_commpage_entry line is never printed now.
So not fixed as yet. :(
comment:21 by , 17 years ago
I would need additional information, if possible: on Linux, you should find something about "BIOS-provided physical RAM map" in /var/log/messages, especially lines beginning with "BIOS-e820". Thanks.
comment:22 by , 17 years ago
Darn I had that saved off to a file too for uploading. I won't be back home until Wednesday sorry.
comment:23 by , 17 years ago
At least for me, with 3 GB RAM -- assuming I'm having the same class of issue as euan...
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000a7f00000 (usable)
BIOS-e820: 00000000a7f00000 - 00000000a7f15000 (ACPI data)
BIOS-e820: 00000000a7f15000 - 00000000a7f80000 (ACPI NVS)
BIOS-e820: 00000000a7f80000 - 00000000b0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000110000000 (usable)
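A map like this can be reduced to the two numbers that matter for MTRR setup: how much usable RAM lies below 4 GB and where it ends. A minimal sketch, with a subset of the entries above hard-coded (the struct and function names are hypothetical, not Haiku or Linux APIs):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct E820Entry {
	uint64_t start;
	uint64_t end;	// exclusive
	bool usable;
};

// Subset of the map posted above; non-usable entries show the filtering.
std::vector<E820Entry> sample_map()
{
	return {
		{0x0, 0x9dc00, true},
		{0x9dc00, 0xa0000, false},
		{0x100000, 0xa7f00000ULL, true},
		{0xa7f00000ULL, 0xa7f15000ULL, false},
		{0x100000000ULL, 0x110000000ULL, true},
	};
}

// Sum of usable bytes that lie below the 4 GB boundary.
uint64_t usable_bytes_below_4gb(const std::vector<E820Entry>& map)
{
	const uint64_t limit = 0x100000000ULL;
	uint64_t total = 0;
	for (const E820Entry& e : map) {
		assert(e.start <= e.end);
		if (!e.usable || e.start >= limit)
			continue;
		total += (e.end < limit ? e.end : limit) - e.start;
	}
	return total;
}

// End address of the highest usable region below 4 GB (0 if none).
uint64_t highest_usable_end_below_4gb(const std::vector<E820Entry>& map)
{
	const uint64_t limit = 0x100000000ULL;
	uint64_t highest = 0;
	for (const E820Entry& e : map) {
		if (!e.usable || e.start >= limit)
			continue;
		uint64_t end = e.end < limit ? e.end : limit;
		if (end > highest)
			highest = end;
	}
	return highest;
}
```

For this map the usable RAM below 4 GB ends at 0xa7f00000 (2687 MB). If this is the same machine as in comment 19, that is in line with the 2688 MB of write-back coverage in its Linux MTRR listing.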
comment:24 by , 17 years ago
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000009ffb0000 (usable)
BIOS-e820: 000000009ffb0000 - 000000009ffbe000 (ACPI data)
BIOS-e820: 000000009ffbe000 - 000000009ffe0000 (ACPI NVS)
BIOS-e820: 000000009ffe0000 - 00000000a0000000 (reserved)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
comment:25 by , 17 years ago
I also have that issue with 5GB RAM. Posting my MTRR and RAM map:
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-provided physical RAM map:
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Mar 14 17:17:53 workstation kernel: [ 0.000000] BIOS-e820: 0000000100000000 - 0000000160000000 (usable)

reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg03: base=0x160000000 (5632MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size= 1MB: uncachable, count=1
comment:28 by , 17 years ago
No, hold that thought. I see differences between the builds on my laptop and desktop, even though they are at the same revision. Something must be wrong with my desktop svn repos. I'll wipe it and rebuild from scratch tomorrow...
comment:29 by , 17 years ago
Everything works now. It was probably fixed for me on the first mtrr change. Sadly my svn repos was a bit knackered. I only noticed when the new boot splash logos didn't appear...
Thanks very much for the fix.
comment:30 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks to both of you for your feedback. Fixed in hrev24424.
comment:31 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Sadly the fix didn't work for me on 5GB, it gets further in the boot process but still refuses to boot properly.
follow-ups: 33 39 comment:32 by , 17 years ago
Here are the relevant lines from the debug log. I had to type them in manually from a photo taken of the screen, so I've omitted the lines showing what's being loaded. The last loaded debugger extension is hangman.
set_memory_write_back base 0 length dfee0000
find_nearest dfee0000 0
find_nearest 5fee0000 1
find_nearest 1fee0000 2
find_nearest fee0000 3
find_nearest 7ee000 4
find_nearest 120000 4
find_nearest 120000 3
find_nearest 20120000 2
find_nearest 120000 3
find_nearest 1fee0000 3
find_nearest 20120000 1
find_nearest 120000 2
find_nearest 1fee0000 2
sols: 0xffffffff00000000 0x20000000 0x100000
allocate MTRR slot 0, base = 0, length = 1000
allocate MTRR slot 1, base = e00000000, length = 20000000
allocate MTRR slot 2, base = dff00000, length = 100000
allocate MTRR slot 3, base = e0000000, length = 800000
ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25
The last two lines just keep repeating until, after a few minutes, the screen goes black and it hangs at the point where the desktop is normally drawn.
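One possible reading of the "sols:" line is a signed decomposition in which the first value has the opposite sign of the other two (0xffffffff00000000 is -0x100000000 as a signed 64-bit value): one 4 GB write-back range with a 512 MB and a 1 MB uncached range carved out of it. That interpretation is an assumption; the log does not state it. Under it, the arithmetic can be checked with the below-4-GB entries of the Linux listing from comment 25 (all names here are hypothetical):

```cpp
#include <cstdint>
#include <vector>

struct MtrrRange {
	uint64_t base;
	uint64_t length;
	bool uncached;	// false means write-back
};

// Below-4-GB entries from the Linux /proc/mtrr listing in comment 25.
std::vector<MtrrRange> layout_from_comment_25()
{
	return {
		{0x0, 0x100000000ULL, false},
		{0xe0000000ULL, 0x20000000, true},
		{0xdff00000ULL, 0x100000, true},
	};
}

// Net write-back bytes, assuming every uncached range lies entirely inside
// a write-back range and the uncached ranges do not overlap each other.
uint64_t net_write_back_bytes(const std::vector<MtrrRange>& ranges)
{
	uint64_t writeBack = 0;
	uint64_t uncached = 0;
	for (const MtrrRange& r : ranges) {
		if (r.uncached)
			uncached += r.length;
		else
			writeBack += r.length;
	}
	return writeBack - uncached;
}
```

The result is 0xdff00000, which is 0x20000 bytes more than the requested dfee0000. That difference matches the ACPI and reserved entries between dfee0000 and dff00000 in the RAM map from comment 25.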
follow-up: 34 comment:33 by , 17 years ago
Replying to ddew:
ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25
This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.
follow-up: 35 comment:34 by , 17 years ago
Replying to jackburton:
ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25
This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.
No, I had the same behavior when setting the memory to write back, I/O requests are then stalled because the PCI bus is not refreshed. Though I'm wondering if Haiku is supposed to work with 5GB.
follow-up: 38 comment:35 by , 17 years ago
Replying to korli:
Replying to jackburton:
ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25
This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.
No, I had the same behavior when setting the memory to write back, I/O requests are then stalled because the PCI bus is not refreshed. Though I'm wondering if Haiku is supposed to work with 5GB.
I'm not picky about having all the RAM seen and used by the OS; having only 3.5 GB show up is acceptable. Although PAE support would be nice, I just want to be able to boot into Haiku without pulling RAM out of the machine. :)
follow-up: 37 comment:36 by , 17 years ago
What do the MTRRs look like when booting Linux with those 5GB?
comment:37 by , 17 years ago
Replying to axeld:
What do the MTRRs look like when booting Linux with those 5GB?
See comment 25, dated 03/14/08 11:38:13
comment:38 by , 17 years ago
Replying to ddew:
Though I'm wondering if Haiku is supposed to work with 5GB.
Looks like it works with 4GB at least...
comment:39 by , 17 years ago
Replying to ddew:
allocate MTRR slot 0, base = 0, length = 1000
allocate MTRR slot 1, base = e00000000, length = 20000000
allocate MTRR slot 2, base = dff00000, length = 100000
allocate MTRR slot 3, base = e0000000, length = 800000
The first line should have a length of 0x100000000. Are you sure about it?
comment:41 by , 17 years ago
Tried hrev24479 and it just reboots the box when it starts setting up the MTRRs.
comment:42 by , 17 years ago
Maybe it's not a good idea to have an uncached MTRR covering a write-combining MTRR (the fourth one)?
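For reference, Intel's documented precedence for overlapping variable-range MTRRs is: identical types keep that type, uncacheable wins over any other type, write-through wins over write-back, and any other mix is undefined. A small sketch of that rule follows (the enum and function are illustrative names; whether this rule has anything to do with the reset reported here is not established in this ticket):

```cpp
enum class MemType {
	UC,			// uncacheable
	WC,			// write-combining
	WT,			// write-through
	WP,			// write-protected
	WB,			// write-back
	Undefined	// not a real type: marks an architecturally undefined overlap
};

// Effective type where two variable-range MTRRs overlap.
MemType combine_overlapping(MemType a, MemType b)
{
	if (a == b)
		return a;
	if (a == MemType::UC || b == MemType::UC)
		return MemType::UC;
	if ((a == MemType::WT && b == MemType::WB)
		|| (a == MemType::WB && b == MemType::WT))
		return MemType::WT;
	return MemType::Undefined;
}
```

Under this rule an uncached range overlapping a write-combining one resolves to uncached, while a write-back range overlapping a write-combining one would fall into the undefined case.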
comment:44 by , 17 years ago
Still won't boot. Now it resets right after allocating slot 0, judging from what I had time to see. The system resets during the allocation, making it hard to catch a glimpse of exactly what's being done, and I don't have serial access to the machine.
comment:46 by , 16 years ago
As the original reporter, I believe this bug is generally resolved for the majority of systems.
comment:48 by , 16 years ago
I understood a system with 5GB wasn't working. Maybe a separate bug report would fit better to follow the issue.
comment:49 by , 16 years ago
<luroh> ddew|bofh: still got that 5 GB system mentioned in http://dev.haiku-os.org/ticket/1711 ?
<ddew|bofh> luroh: nope, running with 8 now
<ddew|bofh> luroh: works fine
<ddew|bofh> as did 5 after the mtrr fixes
Guess it can be closed.
comment:50 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Cool, thanks for the update.
comment:51 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I still have this issue in hrev30890. My system has Intel DG33FB motherboard based on G33 chipset, Core2 Duo E6750 CPU and 2GB of RAM.
Haiku hangs at the 5th icon, halfway through printing the message "allocate_commpa" or "allocate_commpage_entr" or other variations on serial debug. After disabling SMP (either in BIOS or via safe mode options) I finally get
<...>
ahci: AHCIPort::ResetPort port 4, deviceBusy 0, forceDeviceReset 0
ahci: AHCIPort::PostReset port 4
ahci: device signature 0xeb140101 (ATAPI)
allocate_commpage_entry(4, 34) -> 0xffff0118
PANIC: Fatal exception "Machine-Check Exception" occurred! Error code: 0x0
Welcome to Kernel Debugging Land...
Thread 12 "main2" running on CPU 0
kdebug>
The KDL prompt appears only on serial debug but not on the screen.
comment:52 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Can you please open a new ticket? The original problem in this ticket has been reported fixed, and a "Machine-Check Exception" had not been mentioned in the first place. Also, a stack trace would be very helpful.
Euan's X2 Syslog file (several boots with various boot options set)