Opened 16 years ago

Closed 16 years ago

#1727 closed bug (fixed)

Haiku panics with vm_page fault/double fault after 20-30s usage

Reported by: tqh Owned by: axeld
Priority: blocker Milestone: R1
Component: System/Kernel Version: R1/pre-alpha1
Keywords: Cc: fredrik.holmqvist@…, andreasf, anevilyak@…
Blocked By: Blocking:
Platform: All

Description

Haiku boots fine and the desktop loads, but after 20-30s the system drops into KDL with a vm_page fault or double fault panic.

This is on an AMD X64 X2 (AMD690G, SB600, Radeon X1250 (shared mem)) with a SATA disk in SATA mode, running Haiku hrev23739.

Image of KDL coming soon.

This happens on every boot so far. I thought this was related to the reopened bug 1578, but now I think it deserves its own bug.

Attachments (6)

kdl.gif (180.2 KB ) - added by tqh 16 years ago.
Image of KDL
kdl2.gif (95.6 KB ) - added by tqh 16 years ago.
c2d kdl1.JPG (78.9 KB ) - added by pieterpan 16 years ago.
Core2Duo KDL1
c2d kdl2.JPG (84.4 KB ) - added by pieterpan 16 years ago.
Core2Duo KDL2
c2d kdl3.JPG (101.7 KB ) - added by pieterpan 16 years ago.
Core2Duo KDL3
20080427002.jpg (205.5 KB ) - added by tqh 16 years ago.
bt Revision: 25209 with firewire


Change History (44)

by tqh, 16 years ago

Attachment: kdl.gif added

Image of KDL

comment:1 by axeld, 16 years ago

Priority: normal → blocker

Since when does this happen for you?

comment:2 by tqh, 16 years ago

I just got this box, so it's been so since yesterday.

comment:3 by tqh, 16 years ago

I see I forgot to mention that this is a gcc4 build.

by tqh, 16 years ago

Attachment: kdl2.gif added

comment:4 by tqh, 16 years ago

Still there with fixes in hrev23749.

comment:5 by jackburton, 16 years ago

Happens sometimes for me too, more or less since last BeGeistert's changes.

comment:6 by tqh, 16 years ago

Tried the revision that turned off depots in slab (or something). It ran a lot longer and with a lot of the demo apps running, but finally had a 'page-fault with interrupts disabled'. Unfortunately there is no trace of that one.

comment:7 by pieterpan, 16 years ago

I also have the same problem. Very stable with the old GCC, but with gcc4, I get crashes within 2 minutes. I have 3 bt's. C2D laptop, 2 processors, but I also tried by disabling SMP, same crash. I use fail safe video because geforce 8xxx is not supported.

by pieterpan, 16 years ago

Attachment: c2d kdl1.JPG added

Core2Duo KDL1

by pieterpan, 16 years ago

Attachment: c2d kdl2.JPG added

Core2Duo KDL2

by pieterpan, 16 years ago

Attachment: c2d kdl3.JPG added

Core2Duo KDL3

comment:8 by pieterpan, 16 years ago

By the way, see http://www.freelists.org/archives/haiku-development/02-2008/msg00022.html for more cases, and 1 case where the problem does not show.

I'm running hrev23833.

comment:9 by anevilyak, 16 years ago

I can confirm that this happens to me as well, as mentioned on the mailing list. Except unlike most of the other posters, I'm on a single core (Athlon64 Venice 3200+). GCC2 build works perfectly on the same hardware, while I encounter both the double fault problem within 30 seconds as mentioned by tqh, and also the Radeon driver not being able to correctly initialize when built with gcc4.

comment:10 by tqh, 16 years ago

I compared the gcc4 cross compiler's -dumpspecs output to hrev5, and to my untrained eye it looks like it's missing things it should have. The lack of '-Ddeclspec(x)=...' stood out to me, as Firefox has a lot of those, and a quick look in OpenGrok says it's used in Haiku.

Is this possibly a problem?

comment:11 by andreasf, 16 years ago

Cc: andreasf added

comment:12 by tqh, 16 years ago

hrev23938 seems to run fine for me. It's either fixed or it went away when I increased the disk image size to 300 MB.

comment:13 by tqh, 16 years ago

It's still crashing a lot, but the KDLs seem a lot more varied. On the latest boot, the input in gdb froze and the thread for gdb was stuck in system/kernel/sem.cpp:switch_sem_etc. On a side note, anevilyak has the same issues, but trac doesn't seem to work for him right now.

comment:14 by tqh, 16 years ago

One of the page faults seems reproducible by using QEMU with the -kernel-kqemu flag. For me that is

qemu-system-x86_64 -kernel-kqemu haiku.image

That is with kqemu properly installed; without that flag it runs fine.

comment:15 by mmlr, 16 years ago

Actually Haiku shouldn't run with kernel kqemu at all -> #748 probably related to how we store our "current thread". Sadly I can't test that here as I don't have a x86-64 platform.

comment:16 by tqh, 16 years ago

That command is only for those running 64-bit Linux. As most people in this bug seem to do that, I added the info. I guess on 32-bit platforms you can use qemu -kernel-kqemu haiku.image directly.

So all you need is a gcc4 built image, and qemu with kqemu.

comment:17 by mmlr, 16 years ago

No, that's not what I meant. Haiku should not boot at all within QEMU when you use -kernel-kqemu. Not the other way around. Using QEMU under Haiku with -kernel-kqemu is not an issue. Crashing is "expected" per bug #748 if you used -kernel-kqemu to run Haiku. If the crash is different than the one from #748 you should add that info into that bug so we can try to find the difference.

comment:18 by tqh, 16 years ago

Understood. The same panic as #748 occurs when running natively, but often after minutes rather than at boot, and there are other panics as well. I will try to add info whenever I can. Hopefully I might be able to use serial debugging to give more detailed info later.

Oh, and thanks for all the hard work in the smp-fixes.

comment:19 by tqh, 16 years ago

I built a version where I added HAIKU_GCC_BASE_FLAGS = $(HAIKU_GCC_BASE_FLAGS) -m32 ; to BuildSetup, and it stayed up with many demo apps for 3 and 7 minutes on the first two boots. Not sure if that is set elsewhere, but I got the feeling it was behaving better. The first time it just hung, the second time it panicked, but I missed the log. I'll test more.

comment:20 by mmlr, 16 years ago

There is actually a configure argument "--use-32bit" that is supposed to enable exactly that. I guess it could even be required when using a 64bit host and gcc4.

comment:21 by tqh, 16 years ago

Yes, it might be so. I'm still curious though, because now gdb loads all libs for firefox very fast (1 second max), while it was very slow before. I'll look at what commands jam actually runs.

comment:22 by tqh, 16 years ago

It was already set. I couldn't find any reference to it in the xref, which is why I added it myself.

comment:23 by tqh, 16 years ago

I'm curious: does this only happen on 64-bit processors, or on any processor as long as it's gcc4? I'm using AMD64 myself.

comment:24 by tqh, 16 years ago

Happens on P4 as well. So the answer is no.

comment:25 by anevilyak, 16 years ago

Cc: anevilyak@… added

comment:26 by tqh, 16 years ago

Removing the firewire busmanager seemed to work for me. Just did a quick 15 min test though.

in reply to:  26 comment:27 by andreasf, 16 years ago

Replying to tqh:

Removing the firewire busmanager seemed to work for me.

For the record, that was what made it work for me, as reported on the haiku-development list.

Two GCC4-specific flags had also been added in hrev25016 and hrev25054 by mmlr.

comment:28 by tqh, 16 years ago

The thread andreasf talks about is: [haiku-development] Disabling Strict Aliasing for GCC4 Builds
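For readers unfamiliar with the term in that thread title, here is a minimal C sketch of what a strict-aliasing violation looks like and one well-defined alternative. It is only an illustration: the function names are hypothetical, nothing here is taken from the Haiku sources, and this ticket does not say which code (if any) was affected. GCC's -fno-strict-aliasing option turns off the optimizer assumption that the first function breaks.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float is 32 bits wide on the target. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Hypothetical example, shown for contrast only.
 * Reading a float object through a uint32_t lvalue violates the
 * strict-aliasing rule, so the behaviour is undefined and an
 * optimizing compiler may reorder or drop accesses around it. */
uint32_t float_bits_punned(const float *value)
{
    return *(const uint32_t *)value;
}

/* Well-defined alternative: copy the object representation into a
 * uint32_t with memcpy instead of reinterpreting the pointer. */
uint32_t float_bits_copied(const float *value)
{
    uint32_t bits;
    memcpy(&bits, value, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits_copied returns the raw bit pattern of its argument regardless of optimization level, whereas the result of float_bits_punned is not guaranteed once strict-aliasing optimizations are enabled.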

comment:29 by tqh, 16 years ago

Running over 35 mins with full cpu-activity (GLTeapot, Chart(2 threads, max settings)) with firewire busmanager removed.

I guess that's also why builds work better in VMware, as firewire isn't enabled there.

New bug for firewire troubles in gcc4 or handle it here?

comment:30 by tqh, 16 years ago

Been running hrev25182 with firewire for 1 hour 13 mins now..

comment:31 by tqh, 16 years ago

.. 1 h 55 min. I'll shut it down soon.

comment:32 by anevilyak, 16 years ago

Running for ~3h without firewire and radeon here now, hrev25186.

by tqh, 16 years ago

Attachment: 20080427002.jpg added

bt Revision: 25209 with firewire

comment:33 by tqh, 16 years ago

As you can see from the pic, today it went back to crashing after a few minutes.

in reply to:  33 comment:34 by absabs, 16 years ago

I came across the same problem on a 1394 card with a VIA chip while trying to solve the parity error bug (ticket 2243). I can dig into this problem and try to fix it.

As you can see from the pic, today it went back to crashing after a few minutes.

comment:35 by laplace, 16 years ago

Applied patch by JiSheng Zhang in hrev26272. Please test.

comment:36 by tqh, 16 years ago

I have confirmed that firewire is running, and it hasn't caused any problems here yet. I have only tested for a very short time, so more testing is needed.

(As I understood the commits this was a workaround for another problem. Is there a bug for that problem already?)

comment:37 by tqh, 16 years ago

This seems to be fixed here.

comment:38 by absabs, 16 years ago

Resolution: fixed
Status: new → closed

It seems that this bug has been fixed in hrev26272, so I am closing it.
