Opened 3 years ago

Closed 3 years ago

#17511 closed bug (fixed)

qemu riscv64 no longer booting

Reported by: kallisti5
Owned by:
Priority: normal
Milestone: R1/beta4
Component: System/Boot Loader/EFI
Version: R1/beta3
Keywords:
Cc:
Blocked By:
Blocking:
Platform: riscv64

Description

The gcc 11 upgrade introduced major problems booting Haiku on both the Unmatched board and qemu.

qemu-system-riscv64 -M virt -m 1G -device ati-vga -kernel u-boot.bin \
	-drive file=haiku-mmc.image,format=raw,if=virtio \
	-usb -device usb-ehci,id=echi -device usb-kbd -device usb-tablet
.
.
Booting /EFI\BOOT\BOOTRISCV64.EFI

PLIC contexts
  context 1: 2
    cpu id: 0
GOP protocol not found
Welcome to the Haiku boot loader!
add_partitions_for(0x00000000bc6c7258, mountFS = no)
add_partitions_for(fd = 0, mountFS = no)
0x00000000bc6c72b0 Partition::Partition
0x00000000bc6c72b0 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
  priority: 810
check for partitioning_system: Intel Extended Partition
0x00000000bc6c74a0 Partition::Partition
0x00000000bc6c72b0 Partition::AddChild 0x00000000bc6c74a0
0x00000000bc6c74a0 Partition::SetParent 0x00000000bc6c72b0
new child partition!
0x00000000bc6c75b8 Partition::Partition
0x00000000bc6c72b0 Partition::AddChild 0x00000000bc6c75b8
0x00000000bc6c75b8 Partition::SetParent 0x00000000bc6c72b0
new child partition!
0x00000000bc6c72b0 Partition::Scan(): scan child 0x00000000bc6c74a0 (start = 2048, size = 33554432, parent = 0x00000000bc6c72b0)!
0x00000000bc6c74a0 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
check for partitioning_system: Intel Extended Partition
0x00000000bc6c72b0 Partition::Scan(): scan child 0x00000000bc6c75b8 (start = 33556480, size = 314572800, parent = 0x00000000bc6c72b0)!
0x00000000bc6c75b8 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
check for partitioning_system: Intel Extended Partition
0x00000000bc6c72b0 Partition::~Partition
0x00000000bc6c74a0 Partition::SetParent 0x0000000000000000
0x00000000bc6c75b8 Partition::SetParent 0x0000000000000000
0x00000000bc6c74a0 Partition::_Mount check for file_system: BFS Filesystem
0x00000000bc6c74a0 Partition::_Mount check for file_system: FAT32 Filesystem
0x00000000bc6c74a0 Partition::_Mount check for file_system: TAR Filesystem
0x00000000bc6c74a0 Partition::~Partition
0x00000000bc6c75b8 Partition::_Mount check for file_system: BFS Filesystem
PackageVolumeInfo::SetTo()
PackageVolumeInfo::_InitState(): failed to parse activated-packages: No such file or directory
load kernel kernel_riscv64...
Unhandled exception: Load access fault
EPC: 00000000be6dfd9e RA: 00000000be6e07bc TVAL: af9f6b284653f724
EPC: 000000007e980d9e RA: 000000007e9817bc reloc adjusted

Code: f4a6 ecce e0da fc5e f862 f466 f06a ec6e (4783 0a95)
UEFI image [0x00000000be6c7000:0x00000000be7237cf] pc=0x18d9e '/EFI\BOOT\BOOTRISCV64.EFI'


resetting ...
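
For anyone reading the crash dump above: the `pc=0x18d9e` that u-boot prints is just the faulting EPC minus the load base of the UEFI image. A minimal sketch of that arithmetic follows, using only the values from the log. The helper name `ImageOffset` is hypothetical and is not part of Haiku or u-boot.

```cpp
#include <cstdint>

// Hypothetical helper (not from Haiku or u-boot): offset of an address
// inside a loaded image, given the address the image was loaded at.
constexpr uint64_t
ImageOffset(uint64_t address, uint64_t imageBase)
{
	return address - imageBase;
}

// Values copied from the crash report above.
constexpr uint64_t kImageBase = 0x00000000be6c7000; // start of the UEFI image
constexpr uint64_t kEpc = 0x00000000be6dfd9e;       // faulting instruction
constexpr uint64_t kRa = 0x00000000be6e07bc;        // return address

// The EPC offset matches the "pc=0x18d9e" printed in the log.
static_assert(ImageOffset(kEpc, kImageBase) == 0x18d9e,
	"EPC offset should match pc= in the log");
static_assert(ImageOffset(kRa, kImageBase) == 0x197bc,
	"RA offset within the same image");
```

Both offsets fall inside BOOTRISCV64.EFI, so the fault is in the Haiku loader itself rather than in u-boot code. Assuming a matching unstripped loader build is at hand, those offsets are the values to look up in its disassembly.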

Change History (5)

comment:1 by kallisti5, 3 years ago

I tried a debug build of our kernel, and the -O0 it sets seems to solve the early boot issue we see on qemu after the gcc 11 upgrade.


comment:2 by kallisti5, 3 years ago

With the trace statements above (and debug enabled for the kernel and bootloader), I ran across this:

PCI: pci_module_init
pci_controller_init()
sizeof(PciDbi): 0x1000
hostCtrlType: ecam
  reg[0]: (0x30000000, 0x10000000)
  configRegs: (0x30000000, 0x10000000)
  interrupt-map:
    bus: 0, dev: 0, fn: 0, childIrq: 1, parentIrq: (3, 32)
    bus: 0, dev: 0, fn: 0, childIrq: 2, parentIrq: (3, 33)
    bus: 0, dev: 0, fn: 0, childIrq: 3, parentIrq: (3, 34)
    bus: 0, dev: 0, fn: 0, childIrq: 4, parentIrq: (3, 35)

    bus: 0, dev: 1, fn: 0, childIrq: 1, parentIrq: (3, 33)
    bus: 0, dev: 1, fn: 0, childIrq: 2, parentIrq: (3, 34)
    bus: 0, dev: 1, fn: 0, childIrq: 3, parentIrq: (3, 35)
    bus: 0, dev: 1, fn: 0, childIrq: 4, parentIrq: (3, 32)

    bus: 0, dev: 2, fn: 0, childIrq: 1, parentIrq: (3, 34)
    bus: 0, dev: 2, fn: 0, childIrq: 2, parentIrq: (3, 35)
    bus: 0, dev: 2, fn: 0, childIrq: 3, parentIrq: (3, 32)
    bus: 0, dev: 2, fn: 0, childIrq: 4, parentIrq: (3, 33)

    bus: 0, dev: 3, fn: 0, childIrq: 1, parentIrq: (3, 35)
    bus: 0, dev: 3, fn: 0, childIrq: 2, parentIrq: (3, 32)
    bus: 0, dev: 3, fn: 0, childIrq: 3, parentIrq: (3, 33)
    bus: 0, dev: 3, fn: 0, childIrq: 4, parentIrq: (3, 34)
  ranges:
    IOPORT (0x01000000): child: 00000000, parent: 03000000, len: 10000
    MMIO32 (0x02000000): child: 40000000, parent: 40000000, len: 40000000
    MMIO64 (0x03000000): child: 400000000, parent: 400000000, len: 400000000
AllocRegs()
PANIC: Unexpected exception occurred in kernel mode!
Welcome to Kernel Debugging Land...
Thread 14 "main2" running on CPU 0
Stack:
FP: 0xffffffc0029c9e70
FP: 0xffffffc0029c9e90, PC: 0xffffffc00217204a <kernel_riscv64> _ZL22stack_trace_trampolinePv + 16
FP: 0xffffffc0029c9ec0, PC: 0xffffffc002230774 <kernel_riscv64> arch_debug_call_with_fault_handler + 58
FP: 0xffffffc0029c9f10, PC: 0xffffffc002174636 <kernel_riscv64> debug_call_with_fault_handler.localalias + 118
FP: 0xffffffc0029c9fa0, PC: 0xffffffc0021722d2 <kernel_riscv64> _ZL20kernel_debugger_loopPKcS0_Pvi + 638
FP: 0xffffffc0029c9fe0, PC: 0xffffffc0021727a6 <kernel_riscv64> _ZL24kernel_debugger_internalPKcS0_Pvi + 144
FP: 0xffffffc0029ca020, PC: 0xffffffc002174c00 <kernel_riscv64> panic + 104
FP: 0xffffffc0029ca140, PC: 0xffffffc002231cd2 <kernel_riscv64> _ZL10SendSignal20debug_exception_typejimi + 302
FP: 0xffffffc0029ca2e0, PC: 0xffffffc00223220e <kernel_riscv64> STrap + 460
FP: 0xffffffc0029ca400, PC: 0xffffffc00222ee44 <kernel_riscv64> SVec + 100
STrap(exception loadAccessFault)
  sstatus: (ie: {}, pie: {s}, spp: s, fs: dirty, xs: off, sum: 0, mxr: 0, uxl: 0, sd: 1)
  stval: 0xffffffc006000000
   ra: 0xffffffc0025383b6   t6: 0x00000000fbf4fd11   sp: 0xffffffc0029ca400   gp: 0x0000000000000000
   tp: 0xffffffc0032bb580   t0: 0x80000000000fbf24   t1: 0xffffffc002532acc   t2: 0x0000000000000000
   t5: 0x0000000000000000   s1: 0xffffffc00253c8b8   a0: 0xffffffc006000000   a1: 0x0000000000000000
   a2: 0x0000000000000000   a3: 0x0000000000000000   a4: 0x0000000010000000   a5: 0xffffffc016000000
   a6: 0xffffffc0029ca1c0   a7: 0x0000000000000000   s2: 0x0000000000000000   s3: 0x0000000000000000
   s4: 0x0000000000000000   s5: 0x0000000000000008   s6: 0x0000000000000020   s7: 0x000000000000ffff
   s8: 0x0000000000000000   s9: 0x0000000002000000  s10: 0x0000000003000000  s11: 0xffffffc00253baa8
   t3: 0xffffffc00253791e   t4: 0xffffffffffffffc8   fp: 0xffffffc00253c8b8  epc: 0xffffffc0025383b8
FP: 0xffffffc00253c8b8, PC: 0xffffffc0025383b8 <pci> _ZN17ArchPCIController9AllocRegsEv + 96
FP: 0xffffffc0022971e0, PC: 0xffffffc0033720a0 <slab area> 0x3720a0
FP: 0xffffffc00217b6a6, PC: 0x1 <commpage> 0x1
FP: 0x853e4781807ff0ef, PC: 0x80826105644260e2 0x80826105644260e2
kdebug> 

comment:3 by kallisti5, 3 years ago

ahci: failed to get pci x86 module
PCI: pci_module_init
pci_controller_init()
sizeof(PciDbi): 0x1000
hostCtrlType: ecam
  reg[0]: (0x30000000, 0x10000000)
  configRegs: (0x30000000, 0x10000000)
  interrupt-map:
    bus: 0, dev: 0, fn: 0, childIrq: 1, parentIrq: (3, 32)
    bus: 0, dev: 0, fn: 0, childIrq: 2, parentIrq: (3, 33)
    bus: 0, dev: 0, fn: 0, childIrq: 3, parentIrq: (3, 34)
    bus: 0, dev: 0, fn: 0, childIrq: 4, parentIrq: (3, 35)

    bus: 0, dev: 1, fn: 0, childIrq: 1, parentIrq: (3, 33)
    bus: 0, dev: 1, fn: 0, childIrq: 2, parentIrq: (3, 34)
    bus: 0, dev: 1, fn: 0, childIrq: 3, parentIrq: (3, 35)
    bus: 0, dev: 1, fn: 0, childIrq: 4, parentIrq: (3, 32)

    bus: 0, dev: 2, fn: 0, childIrq: 1, parentIrq: (3, 34)
    bus: 0, dev: 2, fn: 0, childIrq: 2, parentIrq: (3, 35)
    bus: 0, dev: 2, fn: 0, childIrq: 3, parentIrq: (3, 32)
    bus: 0, dev: 2, fn: 0, childIrq: 4, parentIrq: (3, 33)

    bus: 0, dev: 3, fn: 0, childIrq: 1, parentIrq: (3, 35)
    bus: 0, dev: 3, fn: 0, childIrq: 2, parentIrq: (3, 32)
    bus: 0, dev: 3, fn: 0, childIrq: 3, parentIrq: (3, 33)
    bus: 0, dev: 3, fn: 0, childIrq: 4, parentIrq: (3, 34)
  ranges:
    IOPORT (0x01000000): child: 00000000, parent: 03000000, len: 10000
    MMIO32 (0x02000000): child: 40000000, parent: 40000000, len: 40000000
    MMIO64 (0x03000000): child: 400000000, parent: 400000000, len: 400000000
AllocRegs()
j: 0
i: 0
readConfig address: 0xFFFFFFC006000000 bus: 0  device: 0  func: 0 offset: 0 size: 2PANIC: Unexpected exception occurred in kernel mode!
Welcome to Kernel Debugging Land...
Thread 14 "main2" running on CPU 0
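
To make the `readConfig address` line above easier to interpret: with an ECAM host controller, each bus/device/function gets a fixed 4 KiB slot inside the config window, so the address of a config register is the window base plus a bit-packed offset. The sketch below uses the standard PCIe ECAM layout. The function name `EcamOffset` is hypothetical, this is not Haiku's actual implementation, and treating 0xffffffc006000000 as the mapped virtual base of the window is an assumption drawn from the log.

```cpp
#include <cstdint>

// Standard PCIe ECAM layout: 1 MiB per bus, 32 KiB per device,
// 4 KiB per function. Hypothetical helper, not Haiku's actual code.
constexpr uint64_t
EcamOffset(uint8_t bus, uint8_t device, uint8_t function, uint16_t offset)
{
	return (uint64_t(bus) << 20)
		| (uint64_t(device & 0x1f) << 15)
		| (uint64_t(function & 0x7) << 12)
		| uint64_t(offset & 0xfff);
}

// Assumed from the log: virtual address used for the first config read,
// and the configRegs length reported by pci_controller_init().
constexpr uint64_t kConfigBase = 0xffffffc006000000;
constexpr uint64_t kConfigSize = 0x10000000;

// bus 0, device 0, function 0, offset 0 is the very first byte of the window,
// which is the address that faults in the log.
static_assert(kConfigBase + EcamOffset(0, 0, 0, 0) == 0xffffffc006000000,
	"first config read hits the window base");

// The last addressable byte (bus 255, device 31, function 7) ends exactly
// at the 0x10000000 window length.
static_assert(EcamOffset(255, 31, 7, 0xfff) == kConfigSize - 1,
	"256 buses fill the whole config window");
```

Since the fault happens at offset 0 of the window, on the very first read, the per-device offset arithmetic is probably not the culprit. The mapping of the config window, or the access to it, looks like the more likely suspect.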

comment:4 by kallisti5, 3 years ago

To follow up on this one: we have narrowed the problem down a bit.

There are two known major issues with the riscv64 port.

1) Only seen in qemu (this issue)

Adding a single dprintf in src/system/boot/loader/loader.cpp makes this crash go away:

    status_t status = elf_load_image(modules, name);
    dprintf("DJ QUACKY QUACK\n");

The cause is unknown. Speculation so far includes a memory alignment issue or some other MMU issue.

2) Userspace hang (#17468), seen in qemu and on the Unmatched board

This is seen on the Unmatched hardware, and in qemu once problem 1 above is worked around. x512 has mentioned that the hang is caused by the ICU67 package, and that downgrading to the previous ICU 5x solves the problem.

The most likely root cause is the GCC 11.2 upgrade, which happened at roughly the same time as the ICU version bump. Building ICU70 might solve this problem, but a lingering bug in binaries generated by gcc 11.2 is more likely.


comment:5 by kallisti5, 3 years ago

Milestone: Unscheduled → R1/beta4
Resolution: fixed
Status: new → closed
  • Issue 1 above was solved by a u-boot binary upgrade, so it was presumably a bug in u-boot's allocations.
  • Issue 2 was worked around by downgrading to icu57 compiled with gcc 8.x.

I'm calling this one resolved since the scope is too large now for a single ticket.
