Opened 7 months ago

Last modified 7 months ago

#18603 new bug

hoard2: random malloc() failures on non-x86 hardware

Reported by: X512
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: System/libroot.so
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: riscv64

Description

This is hrev57157.

On real SMP RISC-V hardware (HiFive Unmatched, VisionFive 2), malloc() sometimes returns NULL even though plenty of memory is still available. This causes serious problems when building large software (GCC, LLVM, WebKit, etc.) for RISC-V, because the compiler randomly fails with an out-of-memory error. I suspect that the Hoard 2 allocator assumes that word-size memory reads and writes are atomic, which holds on x86 but not on RISC-V or ARM. I tried switching the default allocator to Mimalloc and have not observed any random malloc() failures so far.
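A stress test along the following lines might help to catch the failure outside of a full compiler build. This is only a sketch: `stress_malloc`, `worker`, the slot count and the size range are made up for illustration, and nothing here is confirmed to reproduce the problem on the boards listed above. Each thread allocates and frees blocks of pseudo-random size and counts every NULL returned by malloc().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits chosen for illustration only. */
enum { MAX_THREADS = 64, SLOTS = 256, MAX_BLOCK = 4096 };

struct job {
    unsigned int seed;  /* per-thread pseudo-random state */
    long rounds;        /* number of allocate/free rounds */
};

/* Counts malloc() calls that returned NULL, across all threads. */
static atomic_long failed_allocations;

/* Small linear congruential generator, so no shared state is needed. */
static unsigned int next_random(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fff;
}

static void *worker(void *arg)
{
    struct job *job = arg;
    void *slots[SLOTS] = { NULL };

    for (long i = 0; i < job->rounds; i++) {
        size_t index = next_random(&job->seed) % SLOTS;
        size_t size = 1 + next_random(&job->seed) % MAX_BLOCK;

        free(slots[index]);  /* free(NULL) is a no-op */
        slots[index] = malloc(size);
        if (slots[index] == NULL)
            atomic_fetch_add(&failed_allocations, 1);
        else
            memset(slots[index], 0xa5, size);  /* touch the memory */
    }
    for (size_t i = 0; i < SLOTS; i++)
        free(slots[i]);
    return NULL;
}

/* Runs the workers and returns how many allocations failed. */
long stress_malloc(int thread_count, long rounds)
{
    pthread_t threads[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0;

    assert(thread_count > 0 && thread_count <= MAX_THREADS);
    atomic_store(&failed_allocations, 0);

    for (int i = 0; i < thread_count; i++) {
        jobs[i].seed = (unsigned int)i + 1;
        jobs[i].rounds = rounds;
        if (pthread_create(&threads[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&failed_allocations);
}
```

With these limits each thread holds at most about 1 MiB, so on a machine with free memory any non-zero result from, say, `stress_malloc(4, 1000000)` would point at the allocator rather than at real memory exhaustion.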

Change History (4)

comment:1 by pulkomandy, 7 months ago

I suspect that the Hoard 2 allocator assumes that word-size memory reads and writes are atomic, which holds on x86 but not on RISC-V or ARM.

Do you have a more detailed investigation of this? For example, a pointer to where this would happen in the code?

I don't understand how word-size accesses can be non-atomic unless they are misaligned (otherwise, there is physically only one access to the RAM). But there could be other problems (values remaining in the CPU cache or registers and being out of sync between CPU cores, for example). I guess we don't mean the same thing by "atomic"?
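To make the two meanings concrete, here is a minimal C11 sketch of ordering, as opposed to atomicity of a single access. It is not taken from Hoard; `payload`, `ready`, `producer` and `publish_and_read` are hypothetical names. Both stores are word-sized and individually indivisible, but on a weakly ordered CPU (RISC-V, ARM) the reader is only guaranteed to see `payload` once the flag is published with release semantics and read with acquire semantics. With plain accesses the language gives no such guarantee, even though such code often appears to work on x86.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for illustration. */
static int payload;       /* ordinary word-sized data */
static atomic_int ready;  /* flag that publishes the data */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload is visible before the flag is. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Starts the producer, waits for the flag, returns the payload seen. */
int publish_and_read(void)
{
    pthread_t thread;
    int seen;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    if (pthread_create(&thread, NULL, producer, NULL) != 0)
        return -1;

    /* Acquire: pairs with the release store in producer(). */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    seen = payload;

    pthread_join(thread, NULL);
    assert(atomic_load(&ready) == 1);
    return seen;
}
```

If both accesses to `ready` were plain or `memory_order_relaxed`, each one would still be a single memory access, but the reader could legally observe the flag set while `payload` still held its old value.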

comment:2 by X512, 7 months ago (in reply to comment:1)

Replying to pulkomandy:

I suspect that the Hoard 2 allocator assumes that word-size memory reads and writes are atomic, which holds on x86 but not on RISC-V or ARM.

Do you have a more detailed investigation of this? For example, a pointer to where this would happen in the code?

I tried to investigate before, but without success. Identifying the exact code location that causes the problem is not trivial: it requires understanding Hoard 2's internal algorithms and waiting for the bug to trigger. I think investigating an obsolete, unmaintained version of Hoard is a waste of time.

Switching to a current allocator such as Mimalloc solves the problem: I successfully built GCC 13.2.0 on a 4-core RISC-V machine without a single malloc() failure.

Maybe an acceptable solution is to keep Hoard 2 on 32-bit and use a modern allocator on 64-bit. Modern allocators do not seem to care about working well within the restrictions of a 32-bit virtual address space.

comment:3 by waddlesplash, 7 months ago

The problem isn't usually 32-bit virtual address space restrictions, but assumptions that overcommitting is the default, which it isn't for us.

Hoard3 looks like it may be better in this area, but I haven't tested it yet.

comment:4 by X512, 7 months ago

Mimalloc supports both overcommit and non-overcommit modes. It first reserves large parts of the virtual address space and then commits and decommits memory within the reserved range. Explicit commit requests are supported, so Mimalloc can handle out-of-memory conditions gracefully.
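The reserve/commit pattern described above can be sketched with plain POSIX calls. This is not Mimalloc's code and not the Haiku area API; `region_reserve`, `region_commit` and `region_decommit` are hypothetical helpers, and whether a refused commit is reported at the `mprotect()` call depends on the operating system's accounting.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0  /* not available everywhere; treat as optional */
#endif

/* Reserve address space only: PROT_NONE, so no usable memory is promised. */
void *region_reserve(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Commit a page-aligned sub-range. Returns 0 on success, -1 if the
 * system refuses, which the caller can report as out of memory. */
int region_commit(void *addr, size_t size)
{
    assert(addr != NULL && size > 0);
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}

/* Decommit: map fresh inaccessible pages over the range, which drops
 * the old contents but keeps the address range reserved. */
int region_decommit(void *addr, size_t size)
{
    void *again;

    assert(addr != NULL && size > 0);
    again = mmap(addr, size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                 -1, 0);
    return again == MAP_FAILED ? -1 : 0;
}
```

The point of the split is that failure surfaces at an explicit commit call, where the allocator can return NULL, rather than later as a fault when the memory is first touched.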
