Opened 13 years ago

Last modified 10 years ago

#8345 closed bug

PANIC: ASSERT FAILED ... x86/paging/pae/x86VMTranslationMapPAE.cpp:231 — at Version 8

Reported by: kallisti5 Owned by: bonefish
Priority: high Milestone: R1/beta1
Component: System/Kernel Version: R1/Development
Keywords: vm pae Cc: smc.collins@…, mdisreali@…, umccullough, degea@…
Blocked By: Blocking:
Platform: x86

Description (last modified by bonefish)

I've seen this a few times now randomly after doing lots of compiling under Haiku GCC4...

PANIC: ASSERT FAILED ... x86/paging/pae/x86VMTranslationMapPAE.cpp:231
(*entry & 0x0000000000000001LL) == 0; virtual address: 0x7ffef000, existing pte: 0xffffffffffffffff

http://cgit.haiku-os.org/haiku/tree/src/system/kernel/arch/x86/paging/pae/X86VMTranslationMapPAE.cpp?id=hrev43765#n229

Change History (11)

comment:1 by kallisti5, 13 years ago

Description: modified (diff)

by kallisti5, 13 years ago

Attachment: IMG_20120220_131211.jpg added

by kallisti5, 13 years ago

Attachment: IMG_20120220_131211.2.jpg added

by kallisti5, 13 years ago

Attachment: IMG_20120220_131220.jpg added

comment:2 by kallisti5, 13 years ago

Haiku: hrev43717 gcc4 hybrid not dirty.

comment:3 by kallisti5, 13 years ago

Milestone: R1 → R1/alpha4
Priority: normal → critical

I spoke to mmlr, who mentioned also seeing this issue:

13:20 <@mmlr> kallisti5: I've seen that translationmap error as well
13:20 <@kallisti5> mmlr: thats good to hear :)
13:20 <@kallisti5> i've seen it quite a few times as of late
13:20 <@kallisti5> mmlr: you don't use radeon_hd do you?
13:20 <@mmlr> I don't
13:21 <@kallisti5> i'm just asking because it came from screen_blanker
13:21 <@kallisti5> ok
13:21 <@mmlr> I think it's some kind of edge case when nearly exhausting physical pages
13:21 <@kallisti5> mmlr: think it may be a blocker for a4?
13:21 <@mmlr> I've seen it last while stress testing things with the guard heap
13:21 <@mmlr> I originally thought it might be related to me manipulating the entries directly
13:22 <@mmlr> but that confirms it happens without guard heap as well
13:22 <@mmlr> well, technically it is kind of critical
13:23 <@mmlr> not sure if someone actually is going to have the time to look into it

Marking this as a critical bug for alpha4; the random system crash bugs are pretty annoying :)

comment:4 by anevilyak, 13 years ago

Owner: changed from axeld to bonefish
Status: new → assigned

comment:5 by SeanCollins, 13 years ago

Kallisti, I get this crash a lot too. I can get around it by disabling all but one CPU core on my machine. Could you try that on your end as well? I use Pulse to disable all but one CPU core.

comment:6 by SeanCollins, 13 years ago

Cc: smc.collins@… added

comment:7 by kallisti5, 12 years ago

Still happens on hrev44242

I disabled hyper-threading, which drops the core count from two to one. Will report back after some testing.

comment:8 by bonefish, 12 years ago

Description: modified (diff)

I just noticed I hadn't commented on this one yet. While the issue itself isn't all that critical, the underlying cause possibly is. The Map() method of the translation map maps a virtual address to a physical page. It does so by writing a suitably composed value (the physical page address combined with flags for the access permissions and other attributes) into the corresponding page table entry (PTE). When Map() is invoked, the virtual address is expected not to be mapped to a physical page yet. That's what the assert checks (the "page present" flag of the PTE).
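The check described above can be sketched in a few lines. This is only an illustration, not the actual Haiku code: the `MapPage` helper, the `kPresent`, `kWritable` and `kAddressMask` constants, and the use of a plain `uint64_t` as the entry are assumptions made for the example. Only the present-bit mask `0x1` is taken from the assert message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names. Bit 0 is the "page present" flag tested by the assert.
constexpr uint64_t kPresent     = 0x0000000000000001ULL;
constexpr uint64_t kWritable    = 0x0000000000000002ULL;
// Assumed layout: bits 12..51 of a 64-bit PAE entry hold the physical address.
constexpr uint64_t kAddressMask = 0x000ffffffffff000ULL;

// Writes a new entry for the given physical page. Returns false if the old
// entry already had the present flag set, which is the situation where the
// kernel assert fires. The new value is written in either case, mirroring
// the fact that the panic is continuable.
bool MapPage(uint64_t* entry, uint64_t physicalAddress, uint64_t flags)
{
    const bool wasUnmapped = (*entry & kPresent) == 0;
    *entry = (physicalAddress & kAddressMask) | flags | kPresent;
    return wasUnmapped;
}
```

Calling `MapPage` on an entry that holds 0 succeeds, while calling it on an entry that holds `0xffffffffffffffff` reports the conflict and still leaves a valid entry behind.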

Since the new value is written to the PTE, it will be OK afterwards -- the panic is continuable -- but that the previous value got there is worrisome. Even more so since in this case the value is ~0, something never written by the paging code. So someone else wrote into that page table. Not a comforting thought.
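To make the reported value easier to read, here is a small decoding sketch. The `PteInfo` struct, the `DescribePte` function and the address mask are hypothetical and assume the usual 64-bit PAE entry layout with the physical address in bits 12..51. It is not part of the kernel.

```cpp
#include <cstdint>

// Hypothetical summary of a 64-bit page table entry.
struct PteInfo {
    bool     present;          // bit 0, the flag checked by the assert
    bool     allOnes;          // true if every bit is set (~0)
    uint64_t physicalAddress;  // assumed address field, bits 12..51
};

PteInfo DescribePte(uint64_t pte)
{
    PteInfo info;
    info.present = (pte & 0x1ULL) != 0;
    info.allOnes = pte == ~0ULL;
    info.physicalAddress = pte & 0x000ffffffffff000ULL;
    return info;
}
```

For the value from the panic message, `0xffffffffffffffff`, this reports the entry as present and all ones, with every bit of the assumed address field set, which does not look like an entry composed from a real page address and a handful of flags.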

Michael mentioned that he thinks this happens when physical pages are nearly exhausted. When it happens next time, please run page_stats so we can verify the theory. Also, it would be interesting to know in what situation the assert was triggered, e.g. how long the machine had been running and what you did before. Obviously a reproducible test case would be perfect.
