Opened 9 years ago

Closed 8 years ago

#6232 closed bug (fixed)

PANIC: heap: kernel heap has run out of memory

Reported by: mmadia
Owned by: mmlr
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords: heap
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev37221-2h boots; hrev37236-2h drops into KDL during boot. The system is an AMD x2 64 with two 2 GB memory sticks. hrev36983 reports 3007 MiB total. The boot-menu option to ignore memory above 4 GB doesn't help. listdev output and serial log are attached.

I'll check if remote serial debugging works at that point.

Attachments (2)

mmadia.listdev (2.7 KB) - added by mmadia 9 years ago.
serial.log (50.7 KB) - added by mmadia 9 years ago.


Change History (9)

by mmadia, 9 years ago

Attachment: mmadia.listdev added

by mmadia, 9 years ago

Attachment: serial.log added

comment:1 by mmadia, 9 years ago

serial_input does indeed work, so a remote debug session over ssh+minicom is possible. Also, while serial.log is from hrev37245-2h, the lowest failing revision that I tested was hrev37236-2.

comment:2 by axeld, 9 years ago

Version: changed from R1/alpha2 to R1/Development

I'm probably dumb, but what do the -2 and -2h mean? :-)

in reply to:  2 comment:3 by michael.weirauch, 9 years ago

Replying to axeld:

I'm probably dumb, but what do the -2 and -2h mean? :-)

Probably gcc2 and gcc2-hybrid ;)

comment:4 by bonefish, 9 years ago

Keywords: heap added
Owner: changed from bonefish to mmlr
Status: changed from new to assigned

Looks like a heap bug:

heap_add_area: area 171 added to large heap 0xcd6a5999 - usable range 0xcdc01000 - 0xce000000
PANIC: heap: kernel heap has run out of memory

At a quick glance, I see:

  • memalign() issues a grow request and waits only once.
  • heap_grow_thread() always notifies all waiting memalign()s.

There's a race condition. For example, with two memalign() calls (which don't even need to be concurrent, because sHeapGrowSem may be released even when the allocation succeeded), the second one could be notified before its heap has actually been grown.

Not sure whether this is the bug we see here, but I strongly suspect a heap problem to be the cause at any rate.
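To illustrate the first two points, here is a minimal C++ sketch of the usual remedy: a waiter re-checks its own condition in a loop after every wake-up instead of waiting only once. It is not Haiku's heap.cpp. GrowableHeap, Allocate(), GrowThread() and RunAllocations() are hypothetical names, std::condition_variable stands in for sHeapGrowSem, and "allocating" is reduced to bumping a counter.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of a heap that a dedicated thread grows on request.
// Not Haiku's heap.cpp; std::condition_variable stands in for sHeapGrowSem.
class GrowableHeap {
public:
    GrowableHeap(std::size_t initialCapacity, std::size_t growStep)
        : capacity_(initialCapacity), growStep_(growStep) {}

    // Reserve `size` bytes, asking the grow thread for more space as needed.
    void Allocate(std::size_t size) {
        std::unique_lock<std::mutex> lock(mutex_);
        // Re-check after every wake-up instead of waiting only once: the
        // notify_all() in GrowThread() means "the heap grew", not "the heap
        // grew enough for this request" (and wake-ups may be spurious).
        while (used_ + size > capacity_) {
            growRequested_ = true;
            growRequest_.notify_one();
            grown_.wait(lock);
        }
        used_ += size;
    }

    // Body of the grow thread: add growStep_ bytes per request until stopped.
    void GrowThread() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            growRequest_.wait(lock, [this] { return growRequested_ || stop_; });
            if (stop_)
                return;
            growRequested_ = false;
            capacity_ += growStep_;
            grown_.notify_all();  // wakes every waiter; each re-checks its own size
        }
    }

    void Stop() {
        std::lock_guard<std::mutex> lock(mutex_);
        stop_ = true;
        growRequest_.notify_one();
    }

    std::size_t Used() {
        std::lock_guard<std::mutex> lock(mutex_);
        return used_;
    }

private:
    std::mutex mutex_;
    std::condition_variable growRequest_;
    std::condition_variable grown_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    const std::size_t growStep_;
    bool growRequested_ = false;
    bool stop_ = false;
};

// Hypothetical driver: `threads` allocators each request `size` bytes from a
// heap that starts at 64 bytes and grows by 64 bytes per grow request.
std::size_t RunAllocations(int threads, std::size_t size) {
    GrowableHeap heap(64, 64);
    std::thread grower(&GrowableHeap::GrowThread, &heap);
    std::vector<std::thread> allocators;
    for (int i = 0; i < threads; ++i)
        allocators.emplace_back([&heap, size] { heap.Allocate(size); });
    for (std::thread& t : allocators)
        t.join();
    heap.Stop();
    grower.join();
    return heap.Used();
}
```

If the `while` in Allocate() were an `if` (wait once, then proceed), a waiter woken by a grow that was triggered for a different request could continue with too little space. In a real heap, that is the point where an out-of-memory failure would surface.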

There are at least two possible reasons why the phys_addr_t width change could trigger the problem now: the timing of allocations might have changed, and the sizes of some allocations have changed (e.g. the one in the stack crawl, in DMABuffer::Create(), actually depends on the size of phys_addr_t and is now larger than 8 KB).

@Matt: Two tests would be interesting:

  • Disable SMP in the boot loader. That changes the number of heaps and could change whether the problem appears.
  • Enable USE_SLAB_ALLOCATOR_FOR_MALLOC in build/user_config_headers/kernel_debug_config.h. Since the heap is then no longer used, that should prevent the issue completely.
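For the second test, the change should amount to a single preprocessor define. A sketch, assuming the option is a plain 0/1 macro checked with `#if`, like the other switches in that header (the rest of the file is not shown):

```cpp
// build/user_config_headers/kernel_debug_config.h (excerpt; assumed form)
// Route the kernel's malloc() through the slab allocator instead of heap.cpp.
#define USE_SLAB_ALLOCATOR_FOR_MALLOC 1
```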

in reply to:  4 comment:5 by mmadia, 9 years ago

Replying to bonefish:

@Matt: Two tests would be interesting:

  • Disable SMP in the boot loader. That changes the number of heaps and could change whether the problem appears.
  • Enable USE_SLAB_ALLOCATOR_FOR_MALLOC in build/user_config_headers/kernel_debug_config.h. Since the heap is then no longer used, that should prevent the issue completely.

Using either of these allows hrev37236 to boot.

comment:6 by axeld, 9 years ago

In hrev37327 I've enabled the slab allocator for malloc() by default (at least temporarily) to work around this problem. In the end, there is no reason to keep heap.cpp besides its extra debugging capabilities.

comment:7 by mmlr, 8 years ago

Resolution: fixed
Status: changed from assigned to closed

Since the slab allocator is now used as the heap and the old heap has been moved to the debug facilities, this can be closed.
