#17468 closed bug (fixed)

riscv64 images built with icu compiled under gcc 11.x lockup at boot

Reported by:	kallisti5	Owned by:	nobody
Priority:	normal	Milestone:	Unscheduled
Component:	System/Kernel	Version:	R1/beta3
Keywords:		Cc:
Blocked By:		Blocking:
Platform:	riscv64

Description

The SiFive Unmatched no longer boots with recent commits...

ahci: failed to get pci x86 module
module: Search for bus_managers/pci/x86/v1 failed.
ahci: failed to get pci x86 module
module: Search for bus_managers/pci/x86/v1 failed.
ahci: failed to get pci x86 module
module: Search for bus_managers/pci/x86/v1 failed.
ahci: failed to get pci x86 module
vm_soft_fault: va 0x34f5de1000 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x34f5de1ff0, ip 0x257ee83c4c, write 1, user 1, exec 0, thread 0x2ce
thread_hit_serious_debug_event(): Failed to install debugger: thread: 718: Bad port ID
error starting "/boot/system/servers/launch_daemon" error = -1

Attachments (1)

boot_logs.txt (87.4 KB ) - added by kallisti5 3 years ago.

Change History (24)

by kallisti5, 3 years ago

Attachment:	boot_logs.txt added

comment:1 by kallisti5, 3 years ago

It might be related to the recent gcc11 merge. We are still using gcc8 syslibs. Going to unbootstrap and see if new build-packages solves it.

comment:2 by X512, 3 years ago

It seems to be a crash in launch_daemon, so a userland problem.

comment:3 by kallisti5, 3 years ago

I've confirmed that gcc11 is to blame by building the latest Haiku code with the gcc8 buildtools repo. Haiku boots as usual when compiled with gcc8 buildtools.

The first thing to test is likely getting a gcc11 syslibs package and updating the build-package repo with it.

comment:4 by kallisti5, 3 years ago

I've bootstrapped haiku for riscv64 and updated our build-packages. The sifive unmatched still isn't booting with the same error.

gcc 11 introduced a major problem.

comment:5 by X512, 3 years ago

It is an infinite recursion in runtime_loader.

FP: 0x3dd877aaa0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877aac0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877aae0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab00, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab20, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab40, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab60, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab80, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877aba0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877abc0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
kdebug>
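The pattern in the trace above (identical PC, frame pointer advancing by a fixed amount) can be checked mechanically. The following is a minimal sketch, not part of any Haiku tooling; the helper names `parse_frames` and `recursion_stride` are made up for illustration, and it assumes the kdebug line format shown in this ticket.

```python
import re

# Matches kdebug-style lines such as "FP: 0x3dd877aaa0, PC: 0x33b765164c ...".
FRAME_RE = re.compile(r"FP:\s*(0x[0-9a-fA-F]+),\s*PC:\s*(0x[0-9a-fA-F]+)")


def parse_frames(text):
    """Return a list of (fp, pc) integer pairs found in the trace text."""
    return [(int(fp, 16), int(pc, 16)) for fp, pc in FRAME_RE.findall(text)]


def recursion_stride(frames):
    """Return the constant FP step if every frame shares one PC, else None."""
    if len(frames) < 3:
        return None
    pcs = {pc for _, pc in frames}
    steps = {b[0] - a[0] for a, b in zip(frames, frames[1:])}
    if len(pcs) == 1 and len(steps) == 1:
        return steps.pop()
    return None


# First four frames copied from the trace in this comment.
TRACE = """\
FP: 0x3dd877aaa0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877aac0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877aae0, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
FP: 0x3dd877ab00, PC: 0x33b765164c </boot/system/runtime_loader> 0x1464c
"""

print(recursion_stride(parse_frames(TRACE)))  # prints 32
```

A stride of 32 bytes means each recursive call pushes a 32-byte stack frame.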

comment:6 by X512, 3 years ago

0000000000114632 <memset>:
  114632: 01 11        	addi	sp, sp, -32
  114634: 22 e8        	sd	s0, 16(sp)
  114636: 26 e4        	sd	s1, 8(sp)
  114638: 06 ec        	sd	ra, 24(sp)
  11463a: 00 10        	addi	s0, sp, 32
  11463c: aa 84        	mv	s1, a0
  11463e: 19 c6        	beqz	a2, 0x11464c <memset+0x1a>
  114640: 93 f5 f5 0f  	andi	a1, a1, 255
  114644: 97 40 ff ff  	auipc	ra, 1048564
  114648: e7 80 c0 00  	jalr	12(ra)
  11464c: e2 60        	ld	ra, 24(sp) // <-- HERE
  11464e: 42 64        	ld	s0, 16(sp)
  114650: 26 85        	mv	a0, s1
  114652: a2 64        	ld	s1, 8(sp)
  114654: 05 61        	addi	sp, sp, 32
  114656: 82 80        	ret
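To see where the `auipc`/`jalr` pair at 0x114644 actually jumps, the target can be computed by hand. The sketch below assumes the disassembler printed the 20-bit `auipc` immediate as an unsigned number (1048564), so it has to be sign-extended first; the function names are illustrative only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value & (mask - 1)) - (value & mask)


def auipc_jalr_target(pc, upper_imm20, jalr_imm12):
    """Target of 'auipc ra, upper_imm20' at pc followed by 'jalr jalr_imm12(ra)'."""
    base = pc + (sign_extend(upper_imm20, 20) << 12)
    # jalr clears the lowest bit of the computed address.
    return (base + sign_extend(jalr_imm12, 12)) & ~1


# Values taken from the memset listing above.
target = auipc_jalr_target(0x114644, 1048564, 12)
print(hex(target))  # prints 0x108650
```

The result, 0x108650, lies outside the memset body itself. The later listing in comment 16 shows a `jal` to the same address labelled `.plt+0x590`; that listing is from a different build, so reading this call as a PLT stub that resolves back to memset is an inference, not something the listing proves.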

follow-up: 9 comment:7 by waddlesplash, 3 years ago

That problem should have already been resolved by hrev55661.

comment:8 by X512, 3 years ago

After fixing the infinite recursion by globally applying -fno-builtin, it now crashes in libicu:

PANIC: thread_hit_serious_debug_event
Welcome to Kernel Debugging Land...
Thread 282 "launch_daemon" running on CPU 0
Stack:
FP: 0xffffffc000004a80
FP: 0xffffffc000004aa0, PC: 0xffffffc000152d8a <kernel_riscv64> arch_debug_call_with_fault_handler + 32
FP: 0xffffffc000004af0, PC: 0xffffffc0000d3b88 <kernel_riscv64> debug_call_with_fault_handler.localalias + 128
FP: 0xffffffc000004b80, PC: 0xffffffc0000d4ee8 <kernel_riscv64> _ZL20kernel_debugger_loopPKcS0_Pvi + 324
FP: 0xffffffc000004bf0, PC: 0xffffffc0000d5290 <kernel_riscv64> _ZL24kernel_debugger_internalPKcS0_Pvi + 284
FP: 0xffffffc000004c30, PC: 0xffffffc0000d5524 <kernel_riscv64> panic + 92
FP: 0xffffffc000004ca0, PC: 0xffffffc0000e1e36 <kernel_riscv64> _ZL30thread_hit_serious_debug_event22debug_debugger_messagePKvi + 38
FP: 0xffffffc000004d00, PC: 0xffffffc0000e21c4 <kernel_riscv64> user_debug_exception_occurred + 78
FP: 0xffffffc000004de0, PC: 0xffffffc00013f91a <kernel_riscv64> vm_page_fault + 460
FP: 0xffffffc000004ed0, PC: 0xffffffc0001541fc <kernel_riscv64> STrap + 800
FP: 0xffffffc000004ff0, PC: 0xffffffc000151d38 <kernel_riscv64> SVecU + 120
STrap(exception execPageFault)
  sstatus: (ie: {u}, pie: {s}, spp: u, fs: dirty, xs: off, sum: 0, mxr: 0, uxl: 2, sd: 1)
  stval: 0x3205051300134516
   ra: 0x0000003efcd2c292   t6: 0x000000000000000e   sp: 0x0000003f0182ca00   gp: 0x0000000000000000
   tp: 0x0000003f0182d000   t0: 0x000000000000000c   t1: 0x0000003acc8b11dc   t2: 0x0000000000000000
   t5: 0x0000000000000040   s1: 0xffffffffffffffff   a0: 0x00000039f9fd8100   a1: 0x0000000000000001
   a2: 0x0000000000000020   a3: 0x00000039fadc3000   a4: 0x0000000100000000   a5: 0x3205051300134517
   a6: 0x0000000000000016   a7: 0x00000006b4d3a62c   s2: 0x0000002e990323e8   s3: 0xfffffffffffffffe
   s4: 0x00000027cdf81c12   s5: 0xfffffffffffffffe   s6: 0x0000003efcd9bf18   s7: 0x0000003acc8bf140
   s8: 0xfffffffffffffffd   s9: 0xfffffffffffffffd  s10: 0xffffffffffffffff  s11: 0xffffffffffffffff
   t3: 0x0000003acc8b58bc   t4: 0x0000000000000000   fp: 0x0000003f0182ca90  epc: 0x3205051300134516
FP: 0x3f0182ca90, PC: 0x3205051300134516 0x3205051300134516
FP: 0x0, PC: 0x2e98f01270 <libicuuc.so.67> _ZN6icu_676UMutex8getMutexEv + 122
kdebug>
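The faulting `epc` in the dump above does not look like a valid pointer at all. A quick way to check is to test whether the address is canonical for the paging mode in use. The sketch below assumes Sv39 (39-bit virtual addresses), which the ticket does not state but which fits the 0xffffffc0... kernel addresses in the trace; `is_canonical` is a hypothetical helper.

```python
def is_canonical(addr, va_bits=39):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Addresses copied from the register dump and stack trace above.
for name, addr in [
    ("epc", 0x3205051300134516),
    ("ra", 0x0000003EFCD2C292),
    ("kernel pc", 0xFFFFFFC000152D8A),
]:
    print(name, hex(addr), is_canonical(addr))
```

Under that assumption `ra` and the kernel PC are canonical while `epc` is not, which suggests the code jumped through a corrupted or uninitialised function pointer rather than to an unmapped but otherwise sane address.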

in reply to: 7 comment:9 by X512, 3 years ago

Replying to waddlesplash:

That problem should have already been resolved by hrev55661.

Now it occurs in runtime_loader, not libroot.so. Jamfile is here: https://git.haiku-os.org/haiku/tree/src/system/runtime_loader/arch/riscv64/Jamfile.

follow-up: 11 comment:10 by waddlesplash, 3 years ago

The jamfile reuses the already-built .o from libroot.

in reply to: 10 comment:11 by X512, 3 years ago

Replying to waddlesplash:

The jamfile reuses the already-built .o from libroot.

But not for riscv64, see the jamfile above. It uses the generic C memset, not assembly code.

comment:12 by X512, 3 years ago

Anyway, jam -dx shows no -fno-builtin when memset.c is compiled.
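A check like this can be scripted against captured build commands. The following is a rough sketch only: the function name and the sample command lines are hypothetical, and it assumes each compile command has been collected as a single string. It compares whole arguments, so `-fno-builtin-fork` is not mistaken for `-fno-builtin`.

```python
import shlex


def compiles_without_flag(commands, source_suffix, flag):
    """Return the compile commands for source_suffix that lack the exact flag."""
    missing = []
    for cmd in commands:
        args = shlex.split(cmd)
        compiles = "-c" in args and any(a.endswith(source_suffix) for a in args)
        if compiles and flag not in args:
            missing.append(cmd)
    return missing


# Hypothetical command lines, loosely modelled on a cross-compiler invocation.
SAMPLE = [
    'riscv64-unknown-haiku-gcc -O2 -fno-builtin-fork -c "generic/memset.c" -o "a/memset.o"',
    'riscv64-unknown-haiku-gcc -O2 -fno-builtin -c "generic/memset.c" -o "b/memset.o"',
]

for cmd in compiles_without_flag(SAMPLE, "memset.c", "-fno-builtin"):
    print("missing -fno-builtin:", cmd)
```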

comment:13 by X512, 3 years ago

The libicu crash may be caused by an incorrectly built gcc_syslibs[_devel] package. develop/headers/c++/riscv64-unknown-haiku/bits/gthr-default.h contains a stub instead of the pthread implementation.

comment:14 by kallisti5, 3 years ago

odd. I have https://github.com/haikuports/haikuports.cross/blob/master/sys-devel/gcc_bootstrap/gcc_bootstrap-11.2.0_2021_07_28.recipe#L185 configured for pthread.

--enable-threads=posix

comment:15 by X512, 3 years ago

Note that the same problem was present in gcc8 bootstrap, but older ICU was used.

comment:16 by kallisti5, 3 years ago

ack. ok that makes sense. I'll dig into the ICU issue.

As a note here, we added the same no-builtin fix for arch_string in hrev55753 across all architectures. That adjusted the generated code for memset:

0000000000112c4a <memset>:
  112c4a: 41 11         addi    sp, sp, -16
  112c4c: 22 e0         sd      s0, 0(sp)
  112c4e: 06 e4         sd      ra, 8(sp)
  112c50: 2a 84         mv      s0, a0
  112c52: 09 c6         beqz    a2, 0x112c5c <memset+0x12>
  112c54: 93 f5 f5 0f   andi    a1, a1, 255
  112c58: ef 50 9f 9f   jal     0x108650 <.plt+0x590>
  112c5c: a2 60         ld      ra, 8(sp)
  112c5e: 22 85         mv      a0, s0
  112c60: 02 64         ld      s0, 0(sp)
  112c62: 41 01         addi    sp, sp, 16
  112c64: 82 80         ret

I can confirm here my unmatched desktop is booting again after hrev55753

comment:17 by kallisti5, 3 years ago

I can confirm here my unmatched desktop is booting again after hrev55753

Cancel that. I had an extra flash drive plugged in with an older version of Haiku on it.

Looks like we completely missed the point in hrev55753. I'm fixing it and will submit another fix after testing

comment:18 by kallisti5, 3 years ago

so.. still seeing the original crash on my unmatched even after hrev55754

SiFive unmatched:

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xffffff9300134516, ip 0xffffff9300134516, write
thread_hit_serious_debug_event(): Failed to install debugger: thread: 702: Bad port ID

Compiling kernel memset.o

.../riscv64-unknown-haiku-gcc
...
-fno-builtin-fork -fno-builtin-vfork -march=rv64gc -nostdinc -finline -fno-builtin -Wno-main
...
-c "../src/system/libroot/posix/string/arch/generic/memset.c"
...
"objects/haiku/riscv64/release/system/kernel/lib/arch/riscv64/memset.o"

Linking kernel memset:

.../riscv64-unknown-haiku-ld -Bstatic -Bsymbolic -nostdlib -znocombreloc -no-undefined -r
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/arch_elf.o
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/arch_uart_sifive.o
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/sbi_syscalls.o
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/debug_uart.o
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/debug_uart_8250.o
objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/arch_cpu.o
objects/haiku/riscv64/release/system/kernel/lib/arch/riscv64/byteorder.o
objects/haiku/riscv64/release/system/kernel/lib/arch/riscv64/memcpy.o
objects/haiku/riscv64/release/system/kernel/lib/arch/riscv64/memset.o
-o objects/haiku/riscv64/release/system/boot/arch/riscv64/efi/boot_arch_riscv64.o

compiling libroot memset:

.../riscv64-unknown-haiku-gcc  -O2 -Wall -Wno-multichar -Wpointer-arith -Wsign-compare
-Wmissing-prototypes -fno-strict-aliasing -fno-delete-null-pointer-checks -fno-builtin-fork
-fno-builtin-vfork -march=rv64gc -nostdinc -fno-builtin
-c "../src/system/libroot/posix/string/arch/riscv64/../generic/memset.c"
...
-o "objects/haiku/riscv64/release/system/libroot/posix/string/arch/riscv64/memset.o"

Linking libroot memset:

.../riscv64-unknown-haiku-ld  -r objects/haiku/riscv64/release/system/libroot/posix/string/arch/riscv64/memcpy.o
objects/haiku/riscv64/release/system/libroot/posix/string/arch/riscv64/memset.o
-o objects/haiku/riscv64/release/system/libroot/posix/string/arch/riscv64/posix_string_arch_riscv64.o

memset from bootloader post hrev55754:

0000000000009b9a <memset>:
    9b9a:       0ff5f593                zext.b  a1,a1
    9b9e:       00c50733                add     a4,a0,a2
    9ba2:       87aa                    mv      a5,a0
    9ba4:       c611                    beqz    a2,9bb0 <memset+0x16>
    9ba6:       0785                    addi    a5,a5,1
    9ba8:       feb78fa3                sb      a1,-1(a5)
    9bac:       fee79de3                bne     a5,a4,9ba6 <memset+0xc>
    9bb0:       8082                    ret

to me, everything looks correct :-|

comment:19 by X512, 3 years ago

Haiku riscv64 hrev55862 successfully boots in TinyEMU when ICU is downgraded to icu-57.2-2.

comment:20 by kallisti5, 3 years ago

Summary:	unmatched hang - bad address → riscv64 images built with icu compiled under gcc 11.x lockup at boot

indeed. I've pushed that as a temporary workaround in hrev56122.

The important bit seems to be that icu-57.2-2 was compiled with gcc 8.x. The non-functional icu 66 and icu 70 were built with gcc 11.2.0 or 11.3.0.

It seems less tied to ICU version, and more tied to gcc toolchain used.

comment:21 by kallisti5, 3 years ago

On a whim I tried reproducing this one again after updating to 324b6a / hrev56131 (kernel/vm: Remove default kernel read/write flags)

Issue is still reproduced on hrev56131 + existing icu70 package

comment:22 by kallisti5, 2 years ago

We downgraded ICU which solved this one.

comment:23 by kallisti5, 2 years ago

Resolution:	→ fixed
Status:	new → closed
