Opened 10 years ago

Closed 10 years ago

#5822 closed bug (fixed)

Boot failure

Reported by: andreasf
Owned by: bonefish
Priority: normal
Milestone: R1
Component: System/Boot Loader
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

On my old Gericom Overdoze II XXL MSW notebook (Pentium III, 128 MB RAM, ATI Mach64 8 MB), Haiku reaches the splash screen, but after the rocket icon it stalls and shows the splash screen in black and white, repeated four times side by side (hrev36100, hrev36339, hrev36474). Serial output follows:

ati: Interrupt assigned:  yes
ati: device_open() returning 0x0,  open count: 1
ati: Enter InitAccelerant()
ati: Vendor ID: 0x1002,  Device ID: 0x4C42
ati: Mach64_Init()
ati: Video Memory size: 8 MB  frameBufferOffset: 0x0  cursorOffset: 0x7ff000
ati: Memory type: 5
add_memory_type_range(1186, 0x0, 0x1000, 0)
set MTRRs to:
  mtrr:  0: base:        0x0, size:     0x1000, type: 0
  mtrr:  1: base: 0xd9000000, size:  0x1000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
remove_memory_type_range(1186, 0x0, 0x1000, 0)
set MTRRs to:
  mtrr:  0: base: 0xd9000000, size:  0x1000000, type: 0
  mtrr:  1: base: 0x80000000, size: 0x80000000, type: 1
add_memory_type_range(1187, 0xa0000, 0x60000, 0)
set MTRRs to:
  mtrr:  0: base:    0xa0000, size:    0x20000, type: 0
  mtrr:  1: base:    0xc0000, size:    0x40000, type: 0
  mtrr:  2: base: 0xd9000000, size:  0x1000000, type: 0
  mtrr:  3: base: 0x80000000, size: 0x80000000, type: 1
remove_memory_type_range(1187, 0xa0000, 0x60000, 0)
set MTRRs to:
  mtrr:  0: base: 0xd9000000, size:  0x1000000, type: 0
  mtrr:  1: base: 0x80000000, size: 0x80000000, type: 1
ati: CreateModeList(); Unable to get EDID info
ati: Leave InitAccelerant(), result: 0x0
ati: SetDisplayMode() begin
ati: ProposeDisplayMode()  1024x768, pixel clock: 65000 kHz, space: 0x8
ati: Set display mode: 1024x768  virtual size: 1024x768  color depth: 32 bits/pixel
ati:    mode timing: 65000  1024 1048 1184 1344  768 771 777 806
ati:    mode hFreq: 48.4 kHz  vFreq: 60.0 Hz  -hSync -vSync
ati: SetDisplayMode() done

In QEMU for comparison Haiku boots up fine with such low amounts of RAM. I tried removing the ati driver and accelerant from the image. It then prints that it has loaded the vesa accelerant and stalls with a black screen instead. If I enter the boot menu at all, things get worse.

Once or twice I got to the blue screen but the cursor was not movable and nothing else happened, i.e. no language/keymap selection and no new serial output.

Attachments (3)

haiku_overdozesplash.jpg (150.2 KB ) - added by andreasf 10 years ago.
quadrupled splash screen w/ati
boot-memory.diff (1.8 KB ) - added by andreasf 10 years ago.
proposed patch
boot-memory.2.diff (2.0 KB ) - added by andreasf 10 years ago.
updated patch

Change History (30)

by andreasf, 10 years ago

Attachment: haiku_overdozesplash.jpg added

quadrupled splash screen w/ati

comment:1 by anevilyak, 10 years ago

Component: - General → Drivers/Graphics
Owner: changed from nobody to gerald.zajac
Status: new → assigned

comment:2 by andreasf, 10 years ago

By "worse" I mean that it stalls at the splash screen with no icon lighting up, after the three heap_add_areas. So I doubt this is a graphics driver issue.

SystemSoft BIOS mentions Intel 440BX/440ZX chipset. Further device information can be obtained from Linux (or BeOS R5) if necessary.

comment:3 by anevilyak, 10 years ago

Component: Drivers/Graphics → System/Kernel

Ah, missed that part, sorry. Assigning to kernel until a more appropriate component can be found.

comment:4 by anevilyak, 10 years ago

Owner: changed from gerald.zajac to bonefish

comment:5 by andreasf, 10 years ago

Removed the IDE harddrive and connected an external PS/2 mouse. Boot got to blue screen with new serial messages:

ati: SetDisplayMode() done
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/pegasus
usb_asix:02.11.467:init_driver::ver.0.8.3
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/usb_asix
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/usb_ecm
0x838bc5d8->VMAnonymousCache::_Commit(13295616): Failed to reserve 13295616 bytes of RAM
runtime_loader: /boot/system/lib/libicu-data.so.4.2: Could not map image: Out of memory
ps2_hid: init_hardware
ps2_hid: init_driver
ps2_hid: publish_devices
ps2_hid: uninit_driver
loaded driver /boot/system/add-ons/kernel/drivers/dev/input/ps2_hid
ps2: probe_mouse Extended PS/2 mouse found
ps2: devfs_publish_device input/mouse/ps2/intelli_0, status = 0x00000000
loaded driver /boot/system/add-ons/kernel/drivers/dev/input/usb_hid
ps2: devfs_publish_device input/keyboard/at/0, status = 0x00000000
loaded driver /boot/system/add-ons/kernel/drivers/dev/input/wacom
0x838bc220->VMAnonymousCache::_Commit(212992): Failed to reserve 212992 bytes of RAM
runtime_loader: /boot/system/lib/libroot.so: Could not map image: Out of memory
ps2: keyboard found
load addon (null) failed
Last message repeated 3 times.
0x8387c330->VMAnonymousCache::_Commit(856064): Failed to reserve 856064 bytes of RAM
runtime_loader: /boot/system/lib/libroot.so: Could not map image: Out of memory
0x83893770->VMAnonymousCache::_Commit(729088): Failed to reserve 729088 bytes of RAM
0x838b2220->VMAnonymousCache::_Commit(2371584): Failed to reserve 2371584 bytes of RAM
runtime_loader: /boot/system/lib/libbe.so: Could not map image: Out of memory
0x838b6b28->VMAnonymousCache::_Commit(856064): Failed to reserve 856064 bytes of RAM
runtime_loader: /boot/system/lib/libroot.so: Could not map image: Out of memory
0x83880550->VMAnonymousCache::_Commit(2371584): Failed to reserve 2371584 bytes of RAM
runtime_loader: /boot/system/lib/libbe.so: Could not map image: Out of memory
0x83870550->VMAnonymousCache::_Commit(13295616): Failed to reserve 13295616 bytes of RAM
runtime_loader: /boot/system/lib/libicu-data.so.4.2: Could not map image: Out of memory

So for some reason it is indeed a memory issue.

Checked that QEMU boots slowly but okay with 96 MB. With 64 MB it booted directly to the desktop once and failed otherwise.

QEMU does show some ugly black-green-blue-whitish intermediate screen between splash screen and desktop screen, too.

comment:6 by bonefish, 10 years ago

More info please:

  • Boot from HD or CD?
  • Swap enabled?
  • KDL: "avail", "page_stats", "swap", maybe "areas" (longish)

In case you boot without swap enabled or, even worse, from CD, I'm afraid things don't look so good anymore. Obviously ICU is a bitch. Since we don't over-commit memory other than stack, 13 MB need to be committed while relocating libicu-data.so. When a few programs/servers are launched concurrently, this can easily exhaust the available memory.

in reply to:  6 ; comment:7 by andreasf, 10 years ago

CD boot, therefore expecting the language/keymap dialog.

Do my findings also indicate that the boot menu leaks memory then? Possibly the partition/filesystem code since both removing the HDD and not entering the boot menu improves things?

Alt+SysRq+d at the blue desktop screen results in:

Page fault in double fault debugger without fault handler! Touching address 0x81005010 from eip 0x800dacb7. Entering infinite loop...

in reply to:  7 ; comment:8 by bonefish, 10 years ago

Replying to andreasf:

CD boot, therefore expecting the language/keymap dialog.

Do my findings also indicate that the boot menu leaks memory then? Possibly the partition/filesystem code since both removing the HDD and not entering the boot menu improves things?

After the early VM setup, all memory allocated by the boot loader that has not been claimed explicitly is freed. So even if the boot loader did leak memory, that wouldn't matter.

Alt+SysRq+d at the blue desktop screen results in:

Page fault in double fault debugger without fault handler! Touching address 0x81005010 from eip 0x800dacb7. Entering infinite loop...

Don't know what that is about, I don't think it's related to the boot problem, though. You could check what function 0x800dacb7 translates to (objdump).

in reply to:  8 ; comment:9 by andreasf, 10 years ago

Replying to bonefish:

Don't know what that is about, I don't think it's related to the boot problem, though.

I would've thought it means there's not enough memory to run the kernel debugger...

You could check what function 0x800dacb7 translates to (objdump).

Here's the output from objdump -t objects/haiku/x86/release/system/kernel/kernel_x86 | grep 800dacb:

800dacb0 g     F .text  0000000b              ring_buffer_readable

If you meant something else, please be more specific.
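The objdump line in this comment answers the question because 0x800dacb7 lies inside that symbol's address range: the entry starts at 0x800dacb0 and has a size of 0x0000000b. A minimal sketch of that containment check follows; the `Symbol` struct and `find_symbol` helper are illustrative names, not anything from the Haiku tree.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one line of `objdump -t` output.
struct Symbol {
    std::uint32_t start;  // symbol address
    std::uint32_t size;   // symbol size in bytes
    std::string name;
};

// Returns the symbol whose [start, start + size) range contains addr,
// or nullptr if no symbol in the table covers it.
const Symbol* find_symbol(const std::vector<Symbol>& table, std::uint32_t addr)
{
    for (const Symbol& s : table) {
        if (addr >= s.start && addr - s.start < s.size)
            return &s;
    }
    return nullptr;
}
```

With a table holding the single entry {0x800dacb0, 0x0b, "ring_buffer_readable"}, looking up 0x800dacb7 returns that entry, since the offset 7 is smaller than the size 11.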

in reply to:  9 ; comment:10 by bonefish, 10 years ago

Replying to andreasf:

Replying to bonefish:

Don't know what that is about, I don't think it's related to the boot problem, though.

I would've thought it means there's not enough memory to run the kernel debugger...

There's always enough memory to run the kernel debugger. :-) All memory used is pre-allocated, static, stack, or dispensable.

You could check what function 0x800dacb7 translates to (objdump).

Here's the output from objdump -t objects/haiku/x86/release/system/kernel/kernel_x86 | grep 800dacb:

800dacb0 g     F .text  0000000b              ring_buffer_readable

If you meant something else, please be more specific.

Thanks, that was what I was looking for. The double/triple fault should be fixed in hrev36497. The kernel debugger was accessing the debug syslog buffer unguardedly, although there's no guarantee that it exists at that point (could be disabled or the allocation could have failed (your case)).

in reply to:  10 ; comment:11 by andreasf, 10 years ago

Replying to bonefish:

The double/triple fault should be fixed in hrev36497. The kernel debugger was accessing the debug syslog buffer unguardedly, although there's no guarantee that it exists at that point (could be disabled or the allocation could have failed (your case)).

I actually do have it disabled in the kernel settings to save some more space. ;)

At hrev36501 Alt+SysRq+d just shows me three empty white lines instead.

in reply to:  11 ; comment:12 by bonefish, 10 years ago

Replying to andreasf:

I actually do have it disabled in the kernel settings to save some more space. ;)

Well, even better. :-)

At hrev36501 Alt+SysRq+d just shows me three empty white lines instead.

Anything on the serial line?

in reply to:  12 ; comment:13 by andreasf, 10 years ago

Replying to bonefish:

Anything on the serial line?

Nope, nothing KDLly. Last thing was that it couldn't load libbe.so.

Tried an hrev36501 anyboot image with some usb and wacom drivers dumped:

ati: SetDisplayMode() done
etherpci: init_driver init_driver: etherpci not found
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/pegasus
0x838b3440->VMAnonymousCache::_Commit(1789952): Failed to reserve 1789952 bytes of RAM
runtime_loader: /boot/system/lib/libicu-i18n.so.4.2: Could not map image: Out of memory
0x838b0550->VMAnonymousCache::_Commit(856064): Failed to reserve 856064 bytes of RAM
runtime_loader: /boot/system/lib/libroot.so: Could not map image: Out of memory
ps2_hid: init_hardware
ps2_hid: init_driver
ps2_hid: publish_devices
ps2_hid: uninit_driver
loaded driver /boot/system/add-ons/kernel/drivers/dev/input/ps2_hid
ps2: probe_mouse Standard PS/2 mouse found
ps2: devfs_publish_device input/mouse/ps2/standard_0, status = 0x00000000
bfs: inode at 0 requested!
load addon (null) failed
ps2: devfs_publish_device input/keyboard/at/0, status = 0x00000000
virtual void AddOnManager::MessageReceived(BMessage*) what: NMP_
bfs: inode at 0 requested!
load addon (null) failed
bfs: inode at 0 requested!
load addon (null) failed
ps2: keyboard found

KDL is unchanged.

Looks hopeless, but at least we found one bug. ;-)

in reply to:  13 ; comment:14 by bonefish, 10 years ago

Replying to andreasf:

Looks hopeless, but at least we found one bug. ;-)

"Never give up! Never surrender!" :-)

Fixed another bug with syslog explicitly disabled in hrev36528. You should be able to enter KDL, now.

comment:15 by bonefish, 10 years ago

hrev36552 might fix the out-of-memory issues.

in reply to:  14 comment:16 by andreasf, 10 years ago

Replying to bonefish:

You should be able to enter KDL, now.

Confirmed, KDL worked at hrev36537 (standard anyboot CD), after:

runtime_loader: /boot/system/lib/libstdc++.so: Could not map image: Out of memory

"Never give up! Never surrender!" :-)

By Grabthar's hammer, by the Sons of Warvan your fixes shall be tested! ;-)

in reply to:  15 comment:17 by andreasf, 10 years ago

At hrev36573 I get this during the distorted screen:

ati: SetDisplayMode() done
Keyboard Requested Halt.
Welcome to Kernel Debugging Land...
Thread 6 "page writer" running on CPU 0
kdebug> sc
stack trace for thread 6 "page writer"
    kernel stack: 0x82c0c000 to 0x82c10000
frame               caller     <image>:function + offset
 0 82c0fb68 (+  32) 80075e77   <kernel_x86> invoke_command_trampoline(void*: 0x82c0fbe8) + 0x0017
 1 82c0fb88 (+  12) 800fd24e   <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
 2 82c0fb94 (+  48) 800736d8   <kernel_x86>:debug_call_with_fault_handler + 0x0058
 3 82c0fbc4 (+  64) 8007621c   <kernel_x86>:invoke_debugger_command + 0x008c
 4 82c0fc04 (+  48) 80076376   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: NULL, int32: 0, char*: 0xc0fc64) + 0x0086
 5 82c0fc34 (+  32) 8007644c   <kernel_x86>:invoke_debugger_command_pipe + 0x008c
 6 82c0fc54 (+  48) 80079cdb   <kernel_x86> ExpressionParser<0x82c0fce0>::_ParseCommandPipe(int&: 0x82c0fd7c) + 0x017b
 7 82c0fc84 (+  64) 8007d1cc   <kernel_x86> ExpressionParser<0x82c0fce0>::EvaluateCommand(char const*: 0x801535e0 "sc", int&: 0x82c0fd7c) + 0x07cc
 8 82c0fcc4 (+ 192) 8007dff4   <kernel_x86>:evaluate_debug_command + 0x0114
 9 82c0fd84 (+  64) 8007479d   <kernel_x86> kernel_debugger_loop(char const*: 0x0 "<NULL>", char const*: 0x801486a1 "Keyboard Requested Halt.", char*: 0x82c0fdf4,d
10 82c0fdc4 (+  48) 800749b2   <kernel_x86> kernel_debugger_internal(char const*: 0x0 "<NULL>", char const*: 0x83c63800 "", char*: 0x82c0fe14, int32: -2147005649)2
11 82c0fdf4 (+  32) 80074b43   <kernel_x86>:kernel_debugger + 0x0023
12 82c0fe14 (+  32) 80074c3d   <kernel_x86>:debug_emergency_key_pressed + 0x005d
13 82c0fe34 (+  32) 800f4dc2   <kernel_x86> debug_keyboard_interrupt(void*: NULL) + 0x0052
14 82c0fe54 (+  48) 8004e558   <kernel_x86>:int_io_interrupt_handler + 0x0058
15 82c0fe84 (+  32) 800f84c4   <kernel_x86> hardware_interrupt(iframe*: 0x82c0feb0) + 0x0114
16 82c0fea4 (+  12) 800fe1fd   <kernel_x86>:int_bottom + 0x003d
kernel iframe at 0x82c0feb0 (end = 0x82c0ff00)
 eax 0x1            ebx 0x840332a4      ecx 0x0          edx 0x0
 esi 0x801684c8     edi 0x801684b4      ebp 0x82c0ff28   esp 0x82c0fee4
 eip 0x800e1cbf  eflags 0x210206   
 vector: 0x21, error code: 0x0
17 82c0feb0 (+ 120) 800e1cbf   <kernel_x86> set_page_state(vm_page*: NULL, int32: 0) + 0x017f
18 82c0ff28 (+ 176) 800e5649   <kernel_x86> page_writer(void*: NULL) + 0x0679
19 82c0ffd8 (+  32) 800651a6   <kernel_x86> _create_kernel_thread_kentry() + 0x0016
20 82c0fff8 (+2101280776) 80069b70   <kernel_x86> thread_kthread_exit() + 0x0000
kdebug> avail
Available memory: 10997760/68157440 bytes
kdebug> page_stats
page stats:
total: 16640
active: 92 (busy: 0)
inactive: 1573 (busy: 0)
cached: 0 (busy: 0)
unused: 4390 (busy: 0)
wired: 9882 (busy: 0)
modified: 189 (busy: 0)
free: 514
clear: 0
unreserved free pages: 512
unsatisfied page reservations: 1
mapped pages: 9926
longest free pages run: 7 pages (at 14872)
longest free/cached pages run: 7 pages (at 14872)
waiting threads:
      52: missing:      1, don't touch:    512

free queue: 0x801684f0, count = 514
clear queue: 0x80168504, count = 0
modified queue: 0x801684c8, count = 189 (188 temporary, 189 swappable, inactive: 189)
active queue: 0x801684a0, count = 92
inactive queue: 0x801684b4, count = 1573
cached queue: 0x801684dc, count = 0
kdebug> 

comment:18 by bonefish, 10 years ago

Available memory: 10997760/68157440 bytes

page stats:
total: 16640

Haiku is not aware of the machine having 128 MB RAM and I'm afraid 68 MB is really not enough, if swap is disabled (which I assume is the case). Please enable TRACE_MEMORY_MAP in src/system/boot/platform/bios_ia32/mmu.cpp. That will print the memory map the boot loader gets from the BIOS.
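The two figures quoted above agree with each other: assuming the 4 KiB page size used on 32-bit x86, 16640 pages is exactly the 68157440 bytes reported by "avail", i.e. 65 MiB rather than the 128 MB installed. A small sketch of the arithmetic (the constant and function names are illustrative):

```cpp
#include <cstdint>

// Assumption: 4 KiB pages, as on 32-bit x86.
constexpr std::uint64_t kPageSize = 4096;

// Converts a page count as printed by page_stats into bytes.
constexpr std::uint64_t pages_to_bytes(std::uint64_t pages)
{
    return pages * kPageSize;
}

// The page_stats total matches the byte total printed by avail.
static_assert(pages_to_bytes(16640) == 68157440, "page total matches avail");
```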

in reply to:  18 comment:19 by andreasf, 10 years ago

phys memory ranges:

    base 0x00000000, length 0x0009f000

    base 0x00100000, length 0x04000000

allocated phys memory ranges:

    base 0x00100000, length 0x01409000

    base 0x03f00000, length 0x00100000

allocated virt memory ranges:

    base 0x80000000, length 0x025be000

in reply to:  18 ; comment:20 by andreasf, 10 years ago

Ah, could it be related to this first line?

No extended memory block - using 64 MB (fix me!)

in reply to:  20 ; comment:21 by andreasf, 10 years ago

Setting memSize to 128 * 1024 * 1024 in mmu.cpp does get us to the blue desktop screen, but now I get:

ati: SetDisplayMode() done

etherpci: init_driver init_driver: etherpci not found

ata 1 error: timeout waiting for interrupt
ata 1 error: Recover
LostInterrupt: device busy, status 0x58
ata 1 error: device selection timeout
atapi 1-0 error: failed to send packet request

ata 1 error: device selection timeout
atapi 1-0 error: failed to send packet request

etc.

I'm wondering, since the comment in mmu.cpp says "contiguously mapped at 0x0", shouldn't the size of the second memory range be memSize - 0x1000000?

in reply to:  21 ; comment:22 by bonefish, 10 years ago

Component: System/Kernel → Drivers/Disk

Replying to andreasf:

Setting memSize to 128 * 1024 * 1024 in mmu.cpp does get us to the blue desktop screen, but now I get:

ati: SetDisplayMode() done

etherpci: init_driver init_driver: etherpci not found

ata 1 error: timeout waiting for interrupt
ata 1 error: Recover
LostInterrupt: device busy, status 0x58
ata 1 error: device selection timeout
atapi 1-0 error: failed to send packet request

ata 1 error: device selection timeout
atapi 1-0 error: failed to send packet request

etc.

Mmh, no idea. Changing component.

I'm wondering, since the comment in mmu.cpp says "contiguously mapped at 0x0", shouldn't the size of the second memory range be memSize - 0x1000000?

Indeed. Fixed in hrev36581. This is still a kludge, though. When function e820 isn't supported, we should better first fall back to e801. Just in case you're interested in implementing it: http://www.uruk.org/orig-grub/mem64mb.html
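One way to read the problem: if memSize is the total amount of RAM assumed to be mapped contiguously from 0x0, then a range that starts at some base can be at most memSize minus that base long. In the memory map from comment 19 the second range starts at 0x00100000 with length 0x04000000, so it ends at 0x04100000, past the assumed 64 MB. The sketch below is not the code from hrev36581; `PhysRange` and `fallback_ranges` are hypothetical names, and the 0x9f000 and 0x100000 values are taken from the printed map.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for one physical memory range entry.
struct PhysRange {
    std::uint64_t base;
    std::uint64_t length;
};

constexpr std::uint64_t kLowEnd = 0x9f000;     // end of low memory, from the printed map
constexpr std::uint64_t kHighBase = 0x100000;  // 1 MB, from the printed map

// Builds the two fallback ranges for memSize bytes of RAM mapped
// contiguously from 0x0, skipping the BIOS area just below 1 MB.
// Assumes memSize is larger than kHighBase.
inline std::array<PhysRange, 2> fallback_ranges(std::uint64_t memSize)
{
    return {{
        {0, kLowEnd},
        // The range has to end at memSize, so its length excludes the base.
        {kHighBase, memSize - kHighBase},
    }};
}
```

For memSize = 64 MB this gives a second range of length 0x03f00000 that ends exactly at 0x04000000.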

in reply to:  22 comment:23 by andreasf, 10 years ago

Component: Drivers/Disk → System/Boot Loader

I've tried it locally, and I am now able to boot the full desktop and am installing to partition.

I assume the ATA errors were a symptom of something going beyond 128 MB of physical memory; if they reoccur I'd prefer to open a new ticket. Changing component to boot loader.

by andreasf, 10 years ago

Attachment: boot-memory.diff added

proposed patch

in reply to:  22 comment:24 by andreasf, 10 years ago

Summary: Boot failure → [PATCH] Boot failure due to 64 MB default memory size

Replying to bonefish:

When function e820 isn't supported, we should better first fall back to e801.

Both fallbacks are implemented in the attached patch. e801 works for me, 88 is untested.
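For reference, both fallbacks report extended memory as register values that still need converting to a byte count. The sketch below shows only that arithmetic and is not the attached patch. It assumes the commonly documented conventions: INT 15h AX=E801h returns the number of KB between 1 MB and 16 MB plus the number of 64 KB blocks above 16 MB, and INT 15h AH=88h returns the number of KB above 1 MB. It also ignores any memory hole below 16 MB. All names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kMiB = 1024 * kKiB;

// Total RAM implied by an E801 result.
// below16M_kb:  KB of memory between 1 MB and 16 MB
// above16M_64k: number of 64 KB blocks above 16 MB
constexpr std::uint64_t mem_size_from_e801(std::uint16_t below16M_kb,
    std::uint16_t above16M_64k)
{
    return kMiB + below16M_kb * kKiB + above16M_64k * 64 * kKiB;
}

// Total RAM implied by an AH=88h result (KB of memory above 1 MB).
constexpr std::uint64_t mem_size_from_88(std::uint16_t extended_kb)
{
    return kMiB + extended_kb * kKiB;
}
```

Under these assumptions a 128 MB machine would report 0x3c00 KB below 16 MB and 0x700 blocks above it, which adds up to 128 MB once the first megabyte is included.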

comment:25 by axeld, 10 years ago

Looks great, thanks! Just one suggestion: instead of writing: if (a & b) write: if ((a & b) != 0) as the former does not behave as you want when adding another term to the clause due to operator precedence, and is a very common source of errors (I've already fixed a ton of those in our repository).
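A short illustration of the pitfall, using made-up names: in C and C++ the comparison operators bind more tightly than `&`, so extending `flags & mask` with `== expected` silently changes what is tested. The first function spells out how the compiler groups `flags & mask == expected`; the second is the explicitly parenthesized form.

```cpp
#include <cstdint>

// How `flags & mask == expected` is actually grouped: the comparison is
// evaluated first and yields 0 or 1, which is then ANDed with flags.
constexpr bool has_bits_as_parsed(std::uint32_t flags, std::uint32_t mask,
    std::uint32_t expected)
{
    return (flags & (mask == expected ? 1u : 0u)) != 0;
}

// What was meant: mask the flags first, then compare the result.
constexpr bool has_bits_intended(std::uint32_t flags, std::uint32_t mask,
    std::uint32_t expected)
{
    return (flags & mask) == expected;
}
```

With flags = 0x2, mask = 0x2 and expected = 0x2 the intended test is true, while the grouping the compiler applies tests bit 0 of flags and is false.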

by andreasf, 10 years ago

Attachment: boot-memory.2.diff added

updated patch

in reply to:  25 comment:26 by andreasf, 10 years ago

There you go. I just copied the e820 code. ;-)

comment:27 by bonefish, 10 years ago

Resolution: fixed
Status: assigned → closed
Summary: [PATCH] Boot failure due to 64 MB default memory size → Boot failure

Thanks, applied in hrev36595.
