Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#5506 closed bug (fixed)

Kernel panic "heap configuration invalid - max bin count reached"

Reported by: drcouzelis Owned by: mmlr
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: x86

Description

This development build failed to boot from a live CD on my computer:

haiku-nightly-hrev35650-x86gcc4hybrid-cd.zip 11:28PM 27th February, 2010

I attached a "screenshot" of the output from the "bt" command.

Here is information about all of my computer hardware: hardware

Other notes: I decided to try the latest nightly build because the alpha 1 build has a graphical glitch on my computer (the colors are messed up). In other words, the alpha 1 build does boot and run on my computer. I have not done an install yet; I have only run from a live CD. I haven't reported the graphical glitch from the alpha 1 build because I want to see if it still exists in the latest nightly build.

Please let me know if I can provide any further information. Thank you.

Attachments (3)

panic.jpg (111.2 KB ) - added by drcouzelis 14 years ago.
r34934-symbol.jpg (277.9 KB ) - added by drcouzelis 14 years ago.
r35718-call.jpg (98.6 KB ) - added by drcouzelis 14 years ago.


Change History (16)

by drcouzelis, 14 years ago

Attachment: panic.jpg added

comment:1 by bonefish, 14 years ago

Component: - General → System/Kernel
Owner: changed from nobody to mmlr
Status: new → assigned

comment:2 by mmlr, 14 years ago

That's an extremely curious one! The allocator being created here is the grow heap, which uses a fixed configuration (the small heap one). It even uses a fixed size, though that doesn't actually matter, as the size can't influence the configuration. The configuration is kept in the static sHeapClasses and is never modified. It must have been valid at some point, because heap_init had already used it to set up the small heap before this. I can only guess that this is memory corruption. Is this always reproducible? FWIW, this exact image boots fine on QEMU here.
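To illustrate why a fixed heap class should always produce the same bins, whatever the heap size, here is a minimal C sketch of deriving bin sizes from a class. It is only loosely modeled on the kind of loop heap_create_allocator() runs and is not copied from Haiku's heap.cpp: the field names, the bin-sizing rule and the MAX_BIN_COUNT value of 31 are assumptions for illustration. Note that the area size never enters the computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative heap class layout (field names are assumptions). */
typedef struct {
    const char *name;
    uint32_t initial_percentage;
    size_t max_allocation_size;
    size_t page_size;
    size_t min_bin_size;
    size_t bin_alignment;
    uint32_t min_count_per_page;
    size_t max_waste_per_page;
} heap_class;

/* Assumed limit on the number of bins, for illustration only. */
#define MAX_BIN_COUNT 31

/* Fills bin_sizes and returns the number of bins, or -1 where a kernel
   using this rule would panic with "max bin count reached".
   Only the class is consulted; no heap size is involved. */
static int compute_bins(const heap_class *cls, size_t bin_sizes[MAX_BIN_COUNT])
{
    /* The alignment mask below only works for powers of two. */
    assert(cls->bin_alignment != 0
        && (cls->bin_alignment & (cls->bin_alignment - 1)) == 0);

    int bin_count = 0;
    size_t last_size = 0;
    for (size_t count = cls->page_size / cls->min_bin_size;
            count > 0 && count >= cls->min_count_per_page; count--) {
        /* Largest aligned element size that fits `count` times in a page. */
        size_t bin_size = (cls->page_size / count) & ~(cls->bin_alignment - 1);
        if (bin_size == last_size)
            continue; /* same size as the previous candidate */
        last_size = bin_size;

        /* Skip sizes that would leave too much of the page unused. */
        if (cls->page_size - count * bin_size > cls->max_waste_per_page)
            continue;

        if (bin_count >= MAX_BIN_COUNT)
            return -1;
        bin_sizes[bin_count++] = bin_size;
    }
    return bin_count;
}
```

With the small-class values quoted later in this ticket (page size 0x1000, min bin size 8, alignment 4, min count 8, max waste 16), this sketch yields bins from 8 up to 512 bytes and stays under the assumed limit, so a correct configuration on its own should not trigger the panic.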

comment:3 by drcouzelis, 14 years ago

Thank you very much for looking into it. Yes, it is always reproducible. I also get the same error on today's nightly "hrev35693".

Since alpha 1 boots, how about I download old nightly images and find out when it stopped being able to boot on my computer? In theory, that should allow you to pinpoint what the change was that causes the bug. Shall I do that?

Is there a test I can do to see if I have memory corruption? I use Arch Linux as my primary OS, by the way.

in reply to:  3 comment:4 by mmlr, 14 years ago

Replying to drcouzelis:

Since alpha 1 boots, how about I download old nightly images and find out when it stopped being able to boot on my computer? In theory, that should allow you to pinpoint what the change was that causes the bug. Shall I do that?

If you want to invest that amount of time then that'd certainly be very much appreciated.

Is there a test I can do to see if I have memory corruption? I use Arch Linux as my primary OS, by the way.

I meant memory corruption caused by Haiku bugs rather than bad hardware. That could theoretically be the case as well, but since the data in question really is static const and should be loaded once and stay at a single location, it would be strange for it to work at first and then be gone later. Memory corruption due to some bug in the boot process is more likely.

What you can do is to dump the memory that supposedly contains the configuration to see what it ends up being. That way we might get an idea as to who's overwriting it. To do that execute these commands in the kernel debugger:

symbol sHeapClasses
dw _ 8

That should give a pointer to a string (you can dump it with string <pointer> if you're curious; it should read "small"), followed by:

- 0x32 -> 50, the initial percentage (unused in this case)
- 0x200 -> B_PAGE_SIZE / 8, the max allocation size
- 0x1000 -> B_PAGE_SIZE, the heap page size
- 0x8, the min bin size
- 0x4, the alignment
- 0x8, the min count per page
- 0x10, the max waste per page

If any of that doesn't match, then something overwrote the config. If so, please execute dw _ 64 to get a bit more context, take another screenshot and attach it here.
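The check described in this comment can be written down as a small C helper: transcribe the eight words printed by dw _ 8 and compare words 1 to 7 against the expected small-class values. This is a hypothetical convenience, not a Haiku tool, and it assumes a 32-bit x86 build where every field of the class occupies exactly one 32-bit word, so the eight words map one-to-one onto the fields listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected words of the small heap class, per the list above.
   Word 0 is the name pointer, which varies between builds. */
static const uint32_t kExpectedSmallClass[8] = {
    0,      /* name pointer: not compared */
    0x32,   /* initial percentage (50) */
    0x200,  /* max allocation size (B_PAGE_SIZE / 8) */
    0x1000, /* heap page size (B_PAGE_SIZE) */
    0x8,    /* min bin size */
    0x4,    /* alignment */
    0x8,    /* min count per page */
    0x10,   /* max waste per page */
};

/* Returns the index of the first mismatching word in a hand-transcribed
   `dw _ 8` dump, or -1 if words 1..7 all match the expected values. */
static int first_mismatch(const uint32_t dump[8])
{
    assert(dump != NULL);
    for (int i = 1; i < 8; i++) {
        if (dump[i] != kExpectedSmallClass[i])
            return i;
    }
    return -1;
}
```

A result of -1 corresponds to "everything matches"; any other index points at the first field that something overwrote.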

comment:5 by drcouzelis, 14 years ago

I've narrowed it down to somewhere between hrev34837 (1 Jan) and hrev34934 (7 Jan). There were four builds in between them. I will download them while I sleep and check them out tomorrow.

I will attach a screenshot from the "dw" command. Everything seems to match up with what you said it should be.

by drcouzelis, 14 years ago

Attachment: r34934-symbol.jpg added

in reply to:  5 ; comment:6 by mmlr, 14 years ago

Replying to drcouzelis:

I've narrowed it down to somewhere between hrev34837 (1 Jan) and hrev34934 (7 Jan). There were four builds in between them. I will download them while I sleep and check them out tomorrow.

Cool, thanks for that!

I will attach a screenshot from the "dw" command. Everything seems to match up with what you said it should be.

OK, that's interesting. It does indeed look just as it should. Then I can only imagine that the class passed to the function is incorrect, as the calculations are entirely fixed and don't change from machine to machine.

Can you please run the following command:

call 12 -5

This should give the arguments passed to heap_create_allocator(). To make sure I'm not completely off, please run the first argument through the string command; it should output "grow". The second argument is the allocation base, which can vary. Then comes the size, which should be 0x100000 (1MB), then the heap_class pointer, which is supposed to be equal to the pointer you get from symbol sHeapClasses, and finally 0 (false) for not allocating the allocator structure on the heap.

in reply to:  6 comment:7 by mmlr, 14 years ago

Replying to mmlr:

then the size which should be 0x100000 (1MB)

At that point the base and size have already been adjusted, though, so these values will likely be off by 80 (0x50). Can you please also get a dump of the created allocator by running:

dw <second argument found above minus the 0x50 offset> 64

In case the area somehow ended up as unreadable/unwritable memory (device memory, for example), it would show here.
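The argument checks from the last two comments can be summarized in a short C sketch: the five values printed by call 12 -5, the expected size and class pointer, and the address to hand to dw for the allocator dump. The grow_heap_args struct and both helpers are hypothetical, and two points are assumptions: that 0x50 is the size of an allocator structure placed at the start of the area, and that the size was reduced by that same amount (the comment only says the values are off by 0x50).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the five values printed by `call 12 -5`
   on a 32-bit x86 build. */
typedef struct {
    uint32_t name;             /* char pointer, should dump as "grow" */
    uint32_t base;             /* already advanced past the allocator structure */
    uint32_t size;             /* assumed already reduced by the same amount */
    uint32_t heap_class;       /* should equal the address of sHeapClasses */
    uint32_t allocate_on_heap; /* expected 0 (false) */
} grow_heap_args;

/* Offset mentioned in the ticket (assumed to be the allocator header size). */
#define ALLOCATOR_HEADER_SIZE 0x50u

/* Address to pass to `dw <address> 64` to dump the allocator structure. */
static uint32_t allocator_address(const grow_heap_args *args)
{
    assert(args->base >= ALLOCATOR_HEADER_SIZE);
    return args->base - ALLOCATOR_HEADER_SIZE;
}

/* Checks the arguments against what the ticket says to expect:
   a 1MB area, the static class pointer and no on-heap allocation. */
static int args_look_sane(const grow_heap_args *args, uint32_t heap_classes_addr)
{
    return args->size + ALLOCATOR_HEADER_SIZE == 0x100000u
        && args->heap_class == heap_classes_addr
        && args->allocate_on_heap == 0;
}
```

For example, an observed base of 0x80200050 would mean running dw 80200000 64 to dump the allocator (the addresses here are made up).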

comment:8 by drcouzelis, 14 years ago

I've narrowed it down: hrev34909 (5 Jan) will boot but hrev34934 (7 Jan) fails to boot and has a kernel panic. There was no release on 6 Jan.

I will attach a screenshot with the output from the commands you requested.

Also, I recently read "Welcome to Kernel Debugging Land", so I can be ready to help more. :)

by drcouzelis, 14 years ago

Attachment: r35718-call.jpg added

comment:9 by drcouzelis, 14 years ago

The nightly image will now boot on my computer: hrev35718 (1 Mar) has the kernel panic but hrev35731 (2 Mar) will boot.

I guess you could mark this as "solved", but I don't know how it got solved. I'd still be happy to give you more information if needed. Thank you!

comment:10 by anevilyak, 14 years ago

hrev35726 might have solved it then, which would point to a buggy BIOS.

comment:11 by anevilyak, 14 years ago

I take that back, hrev35726 was actually a software bug, hrev35736 was the BIOS workaround.

comment:12 by mmlr, 14 years ago

Resolution: fixed
Status: assigned → closed

Yeah, the dump was pretty obviously some kind of unwritable/unreadable memory. I suspected it would be fixed by hrev35726, so thanks for checking and verifying that. It was caused by hrev34933, by the way.
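One plausible way unwritable memory could produce this particular panic, even though sHeapClasses was intact, is sketched below in C. If the allocator structure sits in memory where writes are lost and reads return all ones, a bin counter stored there would read back as 0xFFFFFFFF and fail a "max bin count" check on the first bin. This mechanism is an inference, not something stated in the ticket. mem_word, add_bins and the limit of 31 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit on the number of bins, for illustration only. */
#define MAX_BIN_COUNT 31

/* Hypothetical memory word: on normal RAM a write sticks; on memory that
   ignores writes (e.g. unbacked or device memory) reads return all ones. */
typedef struct {
    bool writable;
    uint32_t value;
} mem_word;

static void mem_write(mem_word *w, uint32_t v)
{
    if (w->writable)
        w->value = v;
}

static uint32_t mem_read(const mem_word *w)
{
    return w->writable ? w->value : 0xFFFFFFFFu;
}

/* Resets a bin counter kept in the allocator structure, then adds `bins`
   bins. Returns false where a kernel with this check would panic with
   "heap configuration invalid - max bin count reached". */
static bool add_bins(mem_word *bin_count, int bins)
{
    assert(bins >= 0);
    mem_write(bin_count, 0);
    for (int i = 0; i < bins; i++) {
        if (mem_read(bin_count) >= MAX_BIN_COUNT)
            return false;
        mem_write(bin_count, mem_read(bin_count) + 1);
    }
    return true;
}
```

On normal RAM the counter simply climbs to the number of bins; on the simulated dead memory the very first check already fails, which would match a panic that no configuration value explains.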

comment:13 by drcouzelis, 14 years ago

Thank you very much! I now have Haiku installed on my computer. I will continue using it and try to submit helpful bug reports.
