Opened 11 years ago

Closed 11 years ago

#1727 closed bug (fixed)

Haiku panics with vm_page fault/double fault after 20-30s usage

Reported by: tqh Owned by: axeld
Priority: blocker Milestone: R1
Component: System/Kernel Version: R1/pre-alpha1
Keywords: Cc: fredrik.holmqvist@…, andreasf, anevilyak@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Haiku boots fine and the desktop loads, but after 20-30s the system drops into KDL with a vm_page fault or double fault panic.

This is on an AMD X64 X2 (AMD690G, SB600, Radeon X1250 with shared memory) with a SATA disk in SATA mode, running Haiku hrev23739.

Image of KDL coming soon.

This has happened on every boot so far. I thought this was related to the reopened bug 1578, but now I think it deserves its own bug.

Attachments (6)

kdl.gif (180.2 KB) - added by tqh 11 years ago.
Image of KDL
kdl2.gif (95.6 KB) - added by tqh 11 years ago.
c2d kdl1.JPG (78.9 KB) - added by pieterpan 11 years ago.
Core2Duo KDL1
c2d kdl2.JPG (84.4 KB) - added by pieterpan 11 years ago.
Core2Duo KDL2
c2d kdl3.JPG (101.7 KB) - added by pieterpan 11 years ago.
Core2Duo KDL3
20080427002.jpg (205.5 KB) - added by tqh 11 years ago.
bt Revision: 25209 with firewire


Change History (44)

Changed 11 years ago by tqh

Attachment: kdl.gif added

Image of KDL

comment:1 Changed 11 years ago by axeld

Priority: normal → blocker

Since when does this happen for you?

comment:2 Changed 11 years ago by tqh

I just got this box, so it has been happening since yesterday.

comment:3 Changed 11 years ago by tqh

I see I forgot to mention that this is a gcc4 build.

Changed 11 years ago by tqh

Attachment: kdl2.gif added

comment:4 Changed 11 years ago by tqh

Still there with fixes in hrev23749.

comment:5 Changed 11 years ago by jackburton

Happens sometimes for me too, more or less since the changes made at the last BeGeistert.

comment:6 Changed 11 years ago by tqh

Tried the revision that turned off depots in slab (or something). It ran a lot longer, with a lot of the demo apps running, but finally hit a 'page-fault with interrupts disabled'. Unfortunately there is no trace of that one.

comment:7 Changed 11 years ago by pieterpan

I also have the same problem. It is very stable with the old GCC, but with gcc4 I get crashes within 2 minutes. I have 3 backtraces. This is a C2D laptop with 2 processors, but I also tried disabling SMP and got the same crash. I use fail-safe video because the GeForce 8xxx is not supported.

Changed 11 years ago by pieterpan

Attachment: c2d kdl1.JPG added

Core2Duo KDL1

Changed 11 years ago by pieterpan

Attachment: c2d kdl2.JPG added

Core2Duo KDL2

Changed 11 years ago by pieterpan

Attachment: c2d kdl3.JPG added

Core2Duo KDL3

comment:8 Changed 11 years ago by pieterpan

By the way, see http://www.freelists.org/archives/haiku-development/02-2008/msg00022.html for more cases, and 1 case where the problem does not show.

I'm running hrev23833.

comment:9 Changed 11 years ago by anevilyak

I can confirm that this happens to me as well, as mentioned on the mailing list. Except unlike most of the other posters, I'm on a single core (Athlon64 Venice 3200+). GCC2 build works perfectly on the same hardware, while I encounter both the double fault problem within 30 seconds as mentioned by tqh, and also the Radeon driver not being able to correctly initialize when built with gcc4.

comment:10 Changed 11 years ago by tqh

I compared the gcc4 cross compiler's -dumpspecs output to hrev5, and to my untrained eye it looks like it's missing things it should have. The lack of '-Ddeclspec(x)=...' stood out to me, as Firefox has a lot of those, and a quick look in OpenGrok says it's used in Haiku.

Is this possibly a problem?

comment:11 Changed 11 years ago by andreasf

Cc: andreasf added

comment:12 Changed 11 years ago by tqh

hrev23938 seems to run fine for me. It's either fixed or it went away when I increased the disk image size to 300 MB.

comment:13 Changed 11 years ago by tqh

It's still crashing a lot, but the KDLs seem a lot more varied. On the latest boot, the input in gdb froze and the thread for gdb was stuck in system/kernel/sem.cpp:switch_sem_etc. On a side note, anevilyak has the same issues, but trac doesn't seem to work for him right now.

comment:14 Changed 11 years ago by tqh

One of the page faults seems reproducible by using QEMU with the -kernel-kqemu flag. For me that is

qemu-system-x86_64 -kernel-kqemu haiku.image

This is with kqemu properly installed; without that flag it runs fine.

comment:15 Changed 11 years ago by mmlr

Actually Haiku shouldn't run with kernel kqemu at all: see #748, which is probably related to how we store our "current thread". Sadly I can't test that here as I don't have an x86-64 platform.

comment:16 Changed 11 years ago by tqh

That command is only for those running 64-bit Linux. As most people in this bug seem to do that, I added the info. I guess on 32-bit platforms you can use qemu -kernel-kqemu haiku.image directly.

So all you need is a gcc4 built image, and qemu with kqemu.

comment:17 Changed 11 years ago by mmlr

No, that's not what I meant. Haiku should not boot at all within QEMU when you use -kernel-kqemu. Not the other way around. Using QEMU under Haiku with -kernel-kqemu is not an issue. Crashing is "expected" per bug #748 if you used -kernel-kqemu to run Haiku. If the crash is different than the one from #748 you should add that info into that bug so we can try to find the difference.

comment:18 Changed 11 years ago by tqh

Understood. The same panic as #748 occurs when running natively, but often after minutes rather than at boot, and there are other panics as well. I will try to add info whenever I can. Hopefully I will be able to use serial debugging to give more detailed info later.

Oh, and thanks for all the hard work in the smp-fixes.

comment:19 Changed 11 years ago by tqh

I built a version where I added HAIKU_GCC_BASE_FLAGS = $(HAIKU_GCC_BASE_FLAGS) -m32 ; to BuildSetup, and it stayed up with many demo apps for 3 and 7 minutes on the first two boots. I'm not sure if that is set elsewhere, but I got the feeling it was behaving better. The first time it just hung; the second time it panicked, but I missed the log. I'll test more.

comment:20 Changed 11 years ago by mmlr

There is actually a configure argument "--use-32bit" that is supposed to enable exactly that. I guess it could even be required when using a 64bit host and gcc4.

comment:21 Changed 11 years ago by tqh

Yes, it might be so. I'm still curious though, because now gdb loads all libs for firefox very fast (1 second max), while it was very slow before. I'll look at what commands jam actually runs.

comment:22 Changed 11 years ago by tqh

It was set already, couldn't find any ref to it in the xref, so that's why I added it myself.

comment:23 Changed 11 years ago by tqh

I'm curious: does this only happen on 64-bit processors, or on any processor as long as it's a gcc4 build? I'm using AMD64 myself.

comment:24 Changed 11 years ago by tqh

Happens on P4 as well. So no, it is not limited to 64-bit processors.

comment:25 Changed 11 years ago by anevilyak

Cc: anevilyak@… added

comment:26 Changed 11 years ago by tqh

Removing the firewire busmanager seemed to work for me. Just did a quick 15 min test though.

comment:27 in reply to:  26 Changed 11 years ago by andreasf

Replying to tqh:

Removing the firewire busmanager seemed to work for me.

For the record, that was what made it work for me, as reported on the haiku-development list.

Two GCC4-specific flags had also been added in hrev25016 and hrev25054 by mmlr.

comment:28 Changed 11 years ago by tqh

The thread andreasf talks about is: [haiku-development] Disabling Strict Aliasing for GCC4 Builds
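
For readers unfamiliar with the term in that thread title, here is a minimal sketch of what "strict aliasing" refers to. It is only an illustration: the function name is hypothetical, the code is not taken from the Haiku sources, and this ticket does not name the two flags that were added. With optimization enabled, GCC 4 assumes that pointers to unrelated types never refer to the same object, so code that reads an object through a cast pointer of another type can be miscompiled. GCC's -fno-strict-aliasing option turns that assumption off; copying the bytes with memcpy avoids the problem in the code itself.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide, as it is on x86. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/*
 * Hypothetical helper, for illustration only.
 *
 * The pattern that violates strict aliasing would be:
 *     return *(uint32_t *)&value;
 * which reads a float object through a uint32_t lvalue (undefined
 * behaviour that an optimizing compiler is free to mishandle).
 *
 * Copying the bytes with memcpy is well defined and yields the raw bit
 * pattern of the float.
 */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits(1.0f) is 0x3f800000.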

comment:29 Changed 11 years ago by tqh

Running over 35 mins with full cpu-activity (GLTeapot, Chart(2 threads, max settings)) with firewire busmanager removed.

I guess that's why builds work better in vmware also as firewire isn't enabled there.

New bug for firewire troubles in gcc4 or handle it here?

comment:30 Changed 11 years ago by tqh

Been running hrev25182 with firewire for 1 hour 13 mins now.

comment:31 Changed 11 years ago by tqh

.. 1 h 55 min. I'll shut it down soon.

comment:32 Changed 11 years ago by anevilyak

Running for ~3h without firewire and radeon here now, hrev25186.

Changed 11 years ago by tqh

Attachment: 20080427002.jpg added

bt Revision: 25209 with firewire

comment:33 Changed 11 years ago by tqh

As you can see from the pic, today it went back to crashing after a few minutes.

comment:34 in reply to:  33 Changed 11 years ago by absabs

I came across the same problem on a 1394 card with a VIA chip while trying to solve the parity error bug (ticket 2243). I can dig into this problem and try to fix it.

As you can see from the pic, today it went back to crashing after a few minutes.

comment:35 Changed 11 years ago by laplace

Applied patch by JiSheng Zhang in hrev26272. Please test.

comment:36 Changed 11 years ago by tqh

I have confirmed that firewire is running, and it hasn't caused any problems here yet. I have only tested for a very short time, so more testing is needed.

(As I understood the commits this was a workaround for another problem. Is there a bug for that problem already?)

comment:37 Changed 11 years ago by tqh

This seems to be fixed here.

comment:38 Changed 11 years ago by absabs

Resolution: fixed
Status: new → closed

It seems that this bug has been fixed in hrev26272, so I am closing it.
