Opened 12 years ago

Closed 10 years ago

Last modified 10 years ago

#1711 closed bug (fixed)

Haiku stops booting at allocate_commpage_entry(4, 34)

Reported by: euan
Owned by: korli
Priority: critical
Milestone: R1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

This only started happening in the last week or so. The hard disk light remains on.

Hardware: AMD Athlon X2 3800+, ASUS A8R-MVP (AMD / ALI chipset), 2GB RAM, PowerColor Radeon HD 2600XD

Last lines of syslog:

KDiskSystem::Unload(): file_systems/iso9660/v1 -> 0
KDiskSystem::Load(): file_systems/nfs/v1 -> 1
trying: file_systems/nfs/v1 returned: -1000/1000
KDiskSystem::Unload(): file_systems/nfs/v1 -> 0
allocate_commpage_entry(4, 34) -> 0xffff0118

Attachments (1)

syslog.txt (332.8 KB ) - added by euan 12 years ago.
Euan's X2 Syslog file (several boots with various boot options set)


Change History (54)

by euan, 12 years ago

Attachment: syslog.txt added

Euan's X2 Syslog file (several boots with various boot options set)

comment:1 by bonefish, 12 years ago

It would be nice to know where exactly the kernel hangs. If you can enter the kernel debugger (F12, PS/2 keyboard needed), please print a stack trace of the "main2" thread. You can look up its ID in the listing produced by the command "threads" (probably 7) and pass it as an argument to "sc".

comment:2 by euan, 12 years ago

Hi,

Unfortunately the debugger is initialised on what would be the next line, so isn't ready (or doesn't work).

When I first booted with the PS/2 keyboard connected, it got past the line at fault with no problem; the next line reads: "kernel debugger extension "debugger/hangman/v1": loaded". However, it then locked up on the line after "starting EHCI Host Controller"; again, the debugger could not be invoked. Disconnecting the USB devices still results in the same hang. Trying other combinations as we speak.

Perhaps we can move the debugger init earlier just to test?

comment:3 by euan, 12 years ago

Update: Disabling USB in the BIOS allows the OS to boot.

in reply to:  2 comment:4 by bonefish, 12 years ago

Owner: changed from bonefish to mmlr

Replying to euan:

Unfortunately the debugger is initialised on what would be the next line, so isn't ready (or doesn't work).

[...]

Perhaps we can move the debugger init earlier just to test?

Not sure what you mean by that. The kernel debugger is fully initialized shortly after the VM. The output "allocate_commpage_entry(4, 34)..." originates from a call from cpu_init_post_modules(), which is rather late in the boot process. There are only a few more initializations before the kernel is fully initialized and the boot script is started.

Replying to euan:

Update: Disabling USB in the BIOS allows the OS to boot.

That suggests it is USB related. Assigning to mmlr. Maybe he has an idea.

comment:5 by euan, 12 years ago

I added some additional logging, which adds this to the syslog:

INIT: CPU init

allocate_commpage_entry(4, 34) -> 0xffff0118

INIT: VM init

INIT: debug init

register kernel daemon locking

register kernel daemon locked

register kernel daemon unlocking

register kernel daemon unlocked

syslog init post threads

syslog init post threads create sem

syslog init post threads create sem ok

syslog init post threads spawn thread

syslog init post threads spawn thread ok

open module list

The last line is from /kernel/debug.cpp:

status_t
debug_init_post_modules(struct kernel_args *args)
{
	void *cookie;

	// check for dupped lines every 10/10 second
	register_kernel_daemon(check_pending_repeats, NULL, 10);

	dprintf("syslog init post threads\n");
	syslog_init_post_threads();

	// load kernel debugger addons
	dprintf("open module list\n");
	cookie = open_module_list("debugger");

comment:6 by euan, 12 years ago

Hopefully fixed in #1723; will know tomorrow.

comment:7 by euan, 12 years ago

No change; tested with mmlr's patches up to today's [23742] :(

comment:8 by mmlr, 12 years ago

You can easily tell whether it's SMP related by disabling SMP from the boot menu and testing whether it works then. The open_module_list() function looks pretty uneventful to me, so I wouldn't know what should go wrong there, especially in relation to USB. You could also try to enter the debugger again (with F12); maybe it works now with SMP enabled after the changes in hrev23751. Then you can do a backtrace to find the exact location.

comment:9 by euan, 12 years ago

Disabling SMP makes no difference. I still have to try the new changes. If they don't work, I'll try reverting back to older versions. I suspect it was just around BeGeistert when it stopped functioning.

comment:10 by euan, 12 years ago

I tried revisions all the way back to December and still haven't found the cause. Even if I delete the USB input drivers and all the USB bus manager and bus files, it still won't start. Yet if I disable USB in the BIOS it boots just fine.

comment:11 by mmlr, 12 years ago

Then it is most probably related to USB legacy emulation. In case you remove all USB files, the stack and host controller driver obviously will not get loaded. In this case the controllers will stay in legacy emulation mode. Maybe the PS/2 bus manager has a problem in that case? You could try removing the PS/2 bus manager too and see if this makes any difference.

comment:12 by korli, 12 years ago

Reproduced on a Core2Duo 3GB.

comment:13 by korli, 12 years ago

It actually fails in vm_init_post_modules(), just after a call to x86_set_mtrr().

set_memory_type called with : id = -1, base = 0, length = bfee0000, type = 0x50000000
allocate MTRR slot 0, base = 0, length = 100000000

On Linux, cat /proc/mtrr shows:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xbff00000 (3071MB), size=   1MB: uncachable, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
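One way to read the two listings: a variable-range MTRR has to cover a power-of-two length at a base aligned to that length, so 0xbfee0000 bytes of RAM cannot be described by a single register without rounding. The following is a minimal sketch, not Haiku's actual code; `MtrrBlock`, `round_up_to_power_of_two` and `split_into_aligned_blocks` are hypothetical names. It contrasts rounding the length up (which gives the 0x100000000 seen in the Haiku log) with splitting a range into aligned blocks (which, for a 3 GB range, gives the 2048MB + 1024MB pair that Linux reports).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type for this sketch: one variable-range MTRR.
struct MtrrBlock {
    uint64_t base;
    uint64_t length;
};

// Smallest power of two that is >= value.
uint64_t round_up_to_power_of_two(uint64_t value)
{
    assert(value != 0 && value <= (uint64_t(1) << 63));
    uint64_t result = 1;
    while (result < value)
        result <<= 1;
    return result;
}

// Split [base, base + length) into power-of-two blocks whose base is
// aligned to their own length.
std::vector<MtrrBlock> split_into_aligned_blocks(uint64_t base, uint64_t length)
{
    std::vector<MtrrBlock> blocks;
    while (length > 0) {
        // Largest power of two that still fits into the remainder...
        uint64_t size = 1;
        while (size <= length / 2)
            size <<= 1;
        // ...shrunk until the current base is aligned to it.
        while ((base & (size - 1)) != 0)
            size >>= 1;
        blocks.push_back({base, size});
        base += size;
        length -= size;
    }
    return blocks;
}
```

Rounding 0xbfee0000 up yields 0x100000000, a block that also spans the address space between the end of RAM and 4 GB. Splitting the exact length instead needs 13 blocks, more than the handful of variable-range MTRRs a CPU offers, which is presumably why Linux covers a rounded 3 GB and marks the 1MB at 0xbff00000 as uncachable.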

comment:14 by korli, 12 years ago

Owner: changed from mmlr to korli

comment:15 by korli, 12 years ago

This could be fixed in hrev24244.

comment:16 by euan, 12 years ago

Here's my linux mtrrs for what it's worth.

reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1

Doing a build at the moment will update in 15 mins.

comment:17 by euan, 12 years ago

Sorry no change for me. :( Will try to get an updated syslog.

comment:18 by euan, 12 years ago

Hmm seems that it now hangs in a part of the boot loading filesystems stuff. Can't be sure if it's another issue, or the same one...

comment:19 by dustin howett, 12 years ago

Haiku fails to boot when I have 3GB of RAM in my system (128 MB is shared video RAM), but works with 1 or 2 GB. Linux MTRRs:

reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size= 512MB: write-back, count=1
reg02: base=0xa0000000 (2560MB), size= 128MB: write-back, count=1
reg03: base=0xc0000000 (3072MB), size= 128MB: write-combining, count=1

Haiku MTRRs:

0: base=0x00000000, size = 2048 MB.
1: base=0x80000000, size = 1024 MB.
2: base=0xc0000000, size = 1 MB.
(08:41:35 PM) DHowett: geist: Haiku's adds up to 3gb.. linux up to less than that but i take that to be related to my 128 mb shared video ram
(08:43:33 PM) geist: could be
(08:43:45 PM) geist: yeah, bet haiku is marking over the video ram as regular mem

comment:20 by euan, 12 years ago

OK, so I managed to hook up my laptop and get a debug trace. Unfortunately the only difference is that it hangs one line before the original log, so the allocate_commpage_entry line is never printed now.

So not fixed as yet. :(

comment:21 by korli, 12 years ago

I would need additional information if it's possible : on Linux, you should find something about "BIOS-provided physical RAM map" in /var/log/messages, especially lines beginning with "BIOS-e820". Thanks.

comment:22 by euan, 12 years ago

Darn I had that saved off to a file too for uploading. I won't be back home until Wednesday sorry.

comment:23 by dustin howett, 12 years ago

At least for me, with 3 GB RAM (assuming I'm having the same class of issue as euan):

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
 BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000a7f00000 (usable)
 BIOS-e820: 00000000a7f00000 - 00000000a7f15000 (ACPI data)
 BIOS-e820: 00000000a7f15000 - 00000000a7f80000 (ACPI NVS)
 BIOS-e820: 00000000a7f80000 - 00000000b0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000110000000 (usable)
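The map above gives the figure the MTRR setup has to respect: the end of usable RAM below 4 GB. A minimal sketch of extracting it follows; `E820Entry`, `top_of_usable_below_4gb` and `example_map` are hypothetical names, and the table is a hand-transcribed subset of the entries above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type for this sketch: one BIOS-e820 range.
struct E820Entry {
    uint64_t start;
    uint64_t end;     // exclusive
    bool usable;
};

// Highest end address of any usable range that lies below 4 GB.
uint64_t top_of_usable_below_4gb(const std::vector<E820Entry>& map)
{
    const uint64_t kFourGB = uint64_t(1) << 32;
    uint64_t top = 0;
    for (const E820Entry& entry : map) {
        assert(entry.start <= entry.end);
        if (!entry.usable || entry.start >= kFourGB)
            continue;
        uint64_t end = entry.end < kFourGB ? entry.end : kFourGB;
        if (end > top)
            top = end;
    }
    return top;
}

// A subset of the entries from the map above, transcribed by hand.
std::vector<E820Entry> example_map()
{
    return {
        {0x0000000000000000ULL, 0x000000000009dc00ULL, true},
        {0x000000000009dc00ULL, 0x00000000000a0000ULL, false},
        {0x0000000000100000ULL, 0x00000000a7f00000ULL, true},
        {0x00000000a7f00000ULL, 0x00000000a7f15000ULL, false},
        {0x00000000e0000000ULL, 0x00000000f0000000ULL, false},
        {0x0000000100000000ULL, 0x0000000110000000ULL, true},
    };
}
```

For this map the function returns 0xa7f00000 (2687 MB). That lines up with the 2048 + 512 + 128 MB of write-back MTRRs that Linux set up in comment 19, whereas Haiku's 2048 + 1024 MB extends write-back up to 0xc0000000.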

comment:24 by euan, 12 years ago

BIOS-provided physical RAM map:

 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000009ffb0000 (usable)
 BIOS-e820: 000000009ffb0000 - 000000009ffbe000 (ACPI data)
 BIOS-e820: 000000009ffbe000 - 000000009ffe0000 (ACPI NVS)
 BIOS-e820: 000000009ffe0000 - 00000000a0000000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)

comment:25 by ddew, 12 years ago

I also have that issue with 5GB RAM. Posting my MTRR and RAM map:

Mar 14 17:17:53 workstation kernel: [    0.000000] BIOS-provided physical RAM map:
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Mar 14 17:17:53 workstation kernel: [    0.000000]  BIOS-e820: 0000000100000000 - 0000000160000000 (usable)

reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg03: base=0x160000000 (5632MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: uncachable, count=1
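The Linux layout above relies on overlapping ranges: reg00 marks the whole first 4 GB write-back, and reg01 and reg04 carve uncachable holes out of it. That works because, on x86, an uncachable variable-range MTRR takes precedence over any other type it overlaps. The following is a minimal sketch of that lookup, using hypothetical names (`Mtrr`, `MemoryType`, `effective_type`, `linux_layout_from_comment_25`) and modelling only the uncachable-wins rule; other overlap combinations are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class MemoryType { Uncachable, WriteCombining, WriteBack };

// Hypothetical type for this sketch: one variable-range MTRR with its type.
struct Mtrr {
    uint64_t base;
    uint64_t length;
    MemoryType type;
};

// Returns true and sets `type` if at least one MTRR covers `address`.
// An uncachable match overrides any other match; otherwise the first
// matching register decides.
bool effective_type(const std::vector<Mtrr>& mtrrs, uint64_t address,
    MemoryType& type)
{
    bool found = false;
    for (const Mtrr& mtrr : mtrrs) {
        assert(mtrr.length != 0);
        if (address < mtrr.base || address - mtrr.base >= mtrr.length)
            continue;
        if (!found || mtrr.type == MemoryType::Uncachable)
            type = mtrr.type;
        found = true;
    }
    return found;
}

// reg00..reg04 from the /proc/mtrr listing above, sizes converted to bytes.
std::vector<Mtrr> linux_layout_from_comment_25()
{
    return {
        {0x000000000ULL, 0x100000000ULL, MemoryType::WriteBack},
        {0x0e0000000ULL, 0x020000000ULL, MemoryType::Uncachable},
        {0x100000000ULL, 0x080000000ULL, MemoryType::WriteBack},
        {0x160000000ULL, 0x020000000ULL, MemoryType::Uncachable},
        {0x0dff00000ULL, 0x000100000ULL, MemoryType::Uncachable},
    };
}
```

With this layout, addresses below 0xdff00000 and from 0x100000000 up to 0x160000000 resolve to write-back, while the hole between 0xdff00000 and 4 GB resolves to uncachable.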

comment:26 by dustin howett, 12 years ago

hrev24424 fixed it for me.

comment:27 by euan, 12 years ago

Still no change for me (as of hrev24424)

comment:28 by euan, 12 years ago

No, hold that thought. I see differences between the builds on my laptop and desktop, even though they are at the same revision. Something must be wrong with my desktop svn repos. I'll wipe it and rebuild from scratch tomorrow...

comment:29 by euan, 12 years ago

Everything works now. It was probably fixed for me on the first mtrr change. Sadly my svn repos was a bit knackered. I only noticed when the new boot splash logos didn't appear...

Thanks very much for the fix.

comment:30 by korli, 12 years ago

Resolution: fixed
Status: new → closed

Thank you both for your feedback. Fixed in hrev24424.

comment:31 by ddew, 12 years ago

Resolution: fixed (removed)
Status: closed → reopened

Sadly the fix didn't work for me on 5GB, it gets further in the boot process but still refuses to boot properly.

comment:32 by ddew, 12 years ago

Here are the relevant lines from the debug log. I had to type them in manually from a photo taken of the screen, so I've omitted the lines showing what's being loaded. The last loaded debugger extension is hangman.

set_memory_write_back base 0 length dfee0000
find_nearest dfee0000 0
find_nearest 5fee0000 1
find_nearest 1fee0000 2
find_nearest fee0000 3
find_nearest 7ee000 4
find_nearest 120000 4
find_nearest 120000 3
find_nearest 20120000 2
find_nearest 120000 3
find_nearest 1fee0000 3
find_nearest 20120000 1
find_nearest 120000 2
find_nearest 1fee0000 2
sols: 0xffffffff00000000 0x20000000 0x100000
allocate MTRR slot 0, base = 0, length = 1000
allocate MTRR slot 1, base = e00000000, length = 20000000
allocate MTRR slot 2, base = dff00000, length = 100000
allocate MTRR slot 3, base = e0000000, length = 800000
ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25

The last two lines keep repeating until, after a few minutes, the screen goes black and it hangs at the point where the desktop is normally drawn.
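The `sols` line in the trace can be read as a signed sum of power-of-two lengths. This is an interpretation, not something the ticket states: assuming the first value is 0x100000000 printed in negated (two's complement) form, and the other two are carved back out of it, the write-back range ends at 0xdff00000, which is 0xdfee0000 rounded up to the next 1 MB boundary and matches the bases of slots 1 and 2. The helper names `write_back_end` and `round_up` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// End of the write-back range when `cover` bytes starting at address 0 are
// marked write-back and the given lengths are carved back out from the top.
uint64_t write_back_end(uint64_t cover,
    std::initializer_list<uint64_t> carveOuts)
{
    uint64_t end = cover;
    for (uint64_t length : carveOuts) {
        assert(length <= end);
        end -= length;
    }
    return end;
}

// Round `value` up to the next multiple of `granularity` (a power of two).
uint64_t round_up(uint64_t value, uint64_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (value + granularity - 1) & ~(granularity - 1);
}
```

Under that reading, `write_back_end(0x100000000, {0x20000000, 0x100000})` and `round_up(0xdfee0000, 0x100000)` both give 0xdff00000.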

in reply to:  32 ; comment:33 by jackburton, 12 years ago

Replying to ddew:

ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25

This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.

in reply to:  33 ; comment:34 by korli, 12 years ago

Replying to jackburton:

ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25

This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.

No, I had the same behavior when setting the memory to write back, I/O requests are then stalled because the PCI bus is not refreshed. Though I'm wondering if Haiku is supposed to work with 5GB.

in reply to:  34 ; comment:35 by ddew, 12 years ago

Replying to korli:

Replying to jackburton:

ahci: ExecuteAtaRequest port 0: device transfer timeout
ahci: sata_request::abort called for command 0x25

This isn't related at all, looks like a problem with the IDE stack/SATA driver. This shows that the original bug is fixed, though, since the boot process goes on.

No, I had the same behavior when setting the memory to write back, I/O requests are then stalled because the PCI bus is not refreshed. Though I'm wondering if Haiku is supposed to work with 5GB.

I'm not picky about having all the RAM seen and used by the OS; having only 3.5 GB showing up is acceptable. Although PAE support would be nice, I just want to be able to boot into Haiku without pulling RAM out of the machine. :)

comment:36 by axeld, 12 years ago

What do the MTRRs look like when booting Linux with those 5GB?

in reply to:  36 comment:37 by korli, 12 years ago

Replying to axeld:

What do the MTRRs look like when booting Linux with those 5GB?

See comment 25, dated 03/14/08 11:38:13.

in reply to:  35 comment:38 by jackburton, 12 years ago

Replying to ddew:

Though I'm wondering if Haiku is supposed to work with 5GB.

Looks like it works with 4GB at least...

http://www.biffuz.it/misc/haiku4gb.jpg

in reply to:  32 comment:39 by korli, 12 years ago

Replying to ddew:

allocate MTRR slot 0, base = 0, length = 1000
allocate MTRR slot 1, base = e00000000, length = 20000000
allocate MTRR slot 2, base = dff00000, length = 100000
allocate MTRR slot 3, base = e0000000, length = 800000

The first line should have a length of 0x100000000. Are you sure about it?

comment:40 by korli, 12 years ago

Could you test again with hrev24476, please?

comment:41 by ddew, 12 years ago

Tried hrev24479 and it just reboots the box when it starts setting up the MTRRs.

comment:42 by korli, 12 years ago

Maybe it's not a good idea to have an uncached MTRR covering a write-combining MTRR (the fourth one)?

comment:43 by korli, 12 years ago

Could you please check again with hrev24494?

comment:44 by ddew, 12 years ago

Still won't boot. Now it resets right after allocating slot 0, judging from what I had time to see. The system resets during the allocation, making it hard to catch a glimpse of exactly what's being done, and I don't have serial access to the machine.

comment:45 by korli, 11 years ago

Is this bug still valid? (just asking)

comment:46 by euan, 11 years ago

As the original reporter, I believe this bug is generally resolved for the majority of systems.

comment:47 by scottmc, 11 years ago

Looks like this one should be closed?

comment:48 by korli, 11 years ago

I understood a system with 5GB wasn't working. Maybe a separate bug report would fit better to follow the issue.

comment:49 by luroh, 11 years ago

<luroh> ddew|bofh: still got that 5 GB system mentioned in http://dev.haiku-os.org/ticket/1711 ?
<ddew|bofh> luroh: nope, running with 8 now
<ddew|bofh> luroh: works fine
<ddew|bofh> as did 5 after the mtrr fixes

Guess it can be closed.

comment:50 by stippi, 11 years ago

Resolution: fixed
Status: reopened → closed

Cool, thanks for the update.

comment:51 by e_barsukowski, 10 years ago

Resolution: fixed (removed)
Status: closed → reopened

I still have this issue in hrev30890. My system has Intel DG33FB motherboard based on G33 chipset, Core2 Duo E6750 CPU and 2GB of RAM.

Haiku hangs at the 5th icon, partway through printing the message "allocate_commpa" or "allocate_commpage_entr" (or other variations) on serial debug. After disabling SMP (either in BIOS or via safe mode options) I finally get

<...>
ahci: AHCIPort::ResetPort port 4, deviceBusy 0, forceDeviceReset 0
ahci: AHCIPort::PostReset port 4
ahci: device signature 0xeb140101 (ATAPI)
allocate_commpage_entry(4, 34) -> 0xffff0118
PANIC: Fatal exception "Machine-Check Exception" occurred! Error code: 0x0

Welcome to Kernel Debugging Land...
Thread 12 "main2" running on CPU 0
kdebug> 

The KDL prompt appears only on serial debug but not on the screen.

comment:52 by bonefish, 10 years ago

Resolution: fixed
Status: reopened → closed

Can you please open a new ticket? The original problem in this ticket has been reported fixed, and a "Machine-Check Exception" was not mentioned in the first place. Also, a stack trace would be very helpful.

comment:53 by e_barsukowski, 10 years ago

Sorry. I'll open a new ticket.
