Opened 10 years ago

Last modified 9 months ago

#2900 assigned bug

panic: double fault, always thread 4 kernel_daemon

Reported by: stippi
Owned by: nobody
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords:
Cc: bergep@…
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description (last modified by stippi)

When I run Haiku on my desktop machine, I get this panic after a few minutes, usually after the machine has been idle for a little while. I searched for "kernel daemon" and "double fault" and found one double fault bug, which has apparently been fixed. Since I only have USB devices attached, I cannot type bt in KDL.

I have never seen this on my IBM/Lenovo T60 laptop. The desktop machine should have a P35 motherboard and uses an nVidia 7300 with the native driver. As far as I can tell, the ide bus_manager is used. Both systems use a Core 2 Duo CPU. Otherwise, I don't know what differs between the two systems.

Change History (13)

comment:1 Changed 10 years ago by stippi

Description: modified (diff)

comment:2 Changed 10 years ago by axeld

You could install the auto_stack_trace module, and see if that helps with "bt". Photos welcome :-)

comment:3 Changed 10 years ago by stippi

I hooked up a PS/2 keyboard and here is what I get (can't find a camera so I am typing it up):

<kernel_x86>:double_fault_exception

801236f8 <kernel_x86>:int_bottom_vm86 + 0x004d
kernel iframe at 0x80123704 (end = 0x80123754)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d00c  esp 0x8015d00c
 eip 0x800c9cd1 eflags 0x210097
 vector: 0x8, error code: 0x0

80123704 <kernel_86>:int_bottom + 0x0021
kernel iframe at 0x8015d00c (end = 0x8015d05c)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d05c  esp 0x8015d040
 eip 0x800c9cd1 eflags 0x210097
 vector: 0xd, error code: 0x0

8015d00c <kernel_86>:int_bottom + 0x0021
kernel iframe at 0x8015d05c (end = 0x8015d0ac)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d0ac  esp 0x8015d090
 eip 0x800c9cd1 eflags 0x210097
 vector: 0xd, error code: 0x0

8015d05c <kernel_86>:int_bottom + 0x0021
kernel iframe at 0x8015d0ac (end = 0x8015d0fc)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d0fc  esp 0x8015d0e0
 eip 0x800c9cd1 eflags 0x210097
 vector: 0xd, error code: 0x0

8015d0ac <kernel_86>:int_bottom + 0x0021
kernel iframe at 0x8015d0fc (end = 0x8015d14c)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d14c  esp 0x8015d130
 eip 0x800c9cd1 eflags 0x210097
 vector: 0xd, error code: 0x0

8015d0fc <kernel_86>:int_bottom + 0x0021
kernel iframe at 0x8015d14c (end = 0x8015d19c)
 eax 0x5394      ebx 0x0         ecx 0x0         edx 0x8015ff20
 esi 0x8010f160  edi 0x817b0800  ebp 0x8015d19c  esp 0x8015d180
 eip 0x800c9cd1 eflags 0x210097
 vector: 0xd, error code: 0x0

[...]

I hope I typed enough for you to see the pattern in this. (Hint: certain numbers shift by 5)
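The pattern in the hint is a 0x50-byte (80-byte) spacing: apart from the first iframe at 0x80123704, each listed iframe starts exactly where the previous one ends (0x8015d00c, 0x8015d05c, 0x8015d0ac, ...), so the nested frames are packed back to back on the stack. A small C++ sketch that checks this from the transcribed addresses; the helper name `iframe_stride` is purely illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Iframe addresses copied from the repeating part of the KDL output.
const std::vector<uint32_t> kIframes = {
    0x8015d00c, 0x8015d05c, 0x8015d0ac, 0x8015d0fc, 0x8015d14c};

// Returns the constant distance between consecutive iframes,
// or 0 if there are fewer than two frames or the spacing varies.
uint32_t iframe_stride(const std::vector<uint32_t>& frames) {
    if (frames.size() < 2)
        return 0;
    uint32_t stride = frames[1] - frames[0];
    for (std::size_t i = 2; i < frames.size(); i++) {
        if (frames[i] - frames[i - 1] != stride)
            return 0;
    }
    return stride;
}
```

For the addresses above, `iframe_stride(kIframes)` yields 0x50, which is also each iframe's size (end minus start), consistent with one iframe being pushed per nested fault.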

This goes on for several pages until there is a "ff20040b -- read fault".

I am going to reinitialize the partition and do a clean build, because I have had that partition around for ages.

comment:4 in reply to:  3 Changed 10 years ago by bonefish

Replying to stippi:

I hope I typed enough for you to see the pattern in this. (Hint: certain numbers shift by 5)

Obviously someone scrambled something in the kernel. The entry point for exception handling (int_bottom) itself causes a general protection fault (vector 0xd), which recurses until the stack overflows and the double fault handler takes over.
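The recursion described above can be illustrated with a toy model: every time the handler entry faults again, one more iframe is pushed, until the stack has no room left and the CPU escalates to a double fault. This is a simplified sketch, not Haiku code; the 12 KiB stack size used in the example is a made-up value, not Haiku's actual kernel stack size. The vector names are the standard x86 exception numbers seen in the KDL output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard x86 exception vectors relevant to this trace.
std::string vector_name(uint32_t vector) {
    switch (vector) {
        case 0x8: return "double fault";
        case 0xd: return "general protection fault";
        case 0xe: return "page fault";
        default: return "other";
    }
}

// Toy model: a fault handler that faults again on entry pushes one
// iframe per nesting level. Returns how many iframes fit on a stack of
// stack_size bytes before the next push would overflow it (at which
// point the real CPU raises a double fault instead).
uint32_t nested_faults_before_double_fault(uint32_t stack_size,
                                           uint32_t iframe_size) {
    if (iframe_size == 0)
        return 0;  // avoid an endless loop for a degenerate frame size
    uint32_t used = 0;
    uint32_t depth = 0;
    while (used + iframe_size <= stack_size) {
        used += iframe_size;  // one more iframe pushed for the nested fault
        depth++;              // the handler touches the bad pointer again
    }
    return depth;
}
```

With the observed 0x50-byte iframes and the hypothetical 12 KiB stack, `nested_faults_before_double_fault(12288, 0x50)` gives 153 nested frames, which is in line with the output running on for several pages.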

The faulting instruction in int_bottom accesses the thread structure (pointed to by edi). Please verify this with the "thread" command (the first thing it prints is the thread structure pointer), although this might cause a triple fault. The output of "area $edi" would be interesting, too.
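Conceptually, "area" answers the question of which kernel area, if any, contains a given address; a pointer that lands outside any area, or inside an unexpected one, hints at corruption. A minimal C++ sketch of that containment lookup, assuming a simple hypothetical `Area` record rather than Haiku's real area structures:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a kernel area descriptor (not Haiku's type).
struct Area {
    uint32_t base;
    uint32_t size;
    std::string name;
};

// Returns the name of the area containing addr, or an empty string if
// no area contains it (the "could not find area" case).
std::string area_for_address(const std::vector<Area>& areas, uint32_t addr) {
    for (const Area& area : areas) {
        // Subtracting first avoids overflow when base + size wraps.
        if (addr >= area.base && addr - area.base < area.size)
            return area.name;
    }
    return "";
}
```

Given a table with a hypothetical "kernel heap" area covering 0x81000000 to 0x81ffffff, an address such as 0x81012f60 resolves to "kernel heap", while 0 resolves to nothing.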

Since the kernel daemon thread can run all kinds of code, it would help to know which code it executed last. If the stack trace doesn't tell (the part below the bottommost iframe would point there), you could enable kernel tracing and add ktrace_printf()s in src/system/kernel/kernel_daemon.cpp: KernelDaemon::_DaemonThread(), before and after invoking the daemon function (also printing the daemon function itself, of course). A "traced 0 thread 4" will then produce the info, and the "ls" command can resolve the function address to a symbol name.
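The instrumentation suggested above can be sketched in user-space C++. This is not the real KernelDaemon::_DaemonThread(): `sketch_ktrace_printf()` is a hypothetical stand-in that writes to an in-memory buffer instead of calling the kernel's ktrace_printf(), and `DaemonEntry` and `run_daemons_once()` are invented for illustration. The hook signature (void*, int) is an assumption based on the "FPvi" suffix of the mangled daemon names reported in this ticket.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdarg>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// In-memory trace buffer standing in for the kernel tracing buffer.
std::vector<std::string> gTraceBuffer;

// Hypothetical stand-in for ktrace_printf(): formats one trace entry.
void sketch_ktrace_printf(const char* format, ...) {
    char line[128];
    va_list args;
    va_start(args, format);
    vsnprintf(line, sizeof(line), format, args);
    va_end(args);
    gTraceBuffer.push_back(line);
}

// Assumed daemon hook signature: function taking (void*, int).
typedef void (*daemon_hook)(void* arg, int iteration);

struct DaemonEntry {  // hypothetical, not the real kernel structure
    daemon_hook function;
    void* arg;
};

// One pass over the registered daemons. Each call is bracketed by trace
// entries carrying the function address, so if a hook never returns,
// the buffer ends with a "calling" entry that has no matching "done".
void run_daemons_once(const std::vector<DaemonEntry>& daemons,
                      int iteration) {
    for (const DaemonEntry& daemon : daemons) {
        uintptr_t address = reinterpret_cast<uintptr_t>(daemon.function);
        sketch_ktrace_printf("daemon: calling %#" PRIxPTR, address);
        daemon.function(daemon.arg, iteration);
        sketch_ktrace_printf("daemon: done %#" PRIxPTR, address);
    }
}
```

Printing the raw address rather than a name keeps the instrumented loop trivial; the address can be mapped back to a symbol afterwards in the debugger.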

If you have KDEBUG disabled, it might be a good idea to enable it. Maybe some ASSERT will trigger earlier.
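The point of enabling debug checks is that a scrambled structure gets reported where it is first detected, instead of surfacing later as a cascade of nested faults. A minimal sketch of a compile-time-gated assertion, assuming a hypothetical `KDEBUG_SKETCH` switch and `SKETCH_ASSERT` macro (this is not Haiku's actual KDEBUG/ASSERT implementation); it throws instead of panicking so it can run in user space, and the `Thread` record with its magic value is likewise invented for illustration:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical build switch: 1 enables the checks, 0 compiles them out.
#ifndef KDEBUG_SKETCH
#define KDEBUG_SKETCH 1
#endif

#if KDEBUG_SKETCH
// Enabled: evaluate the condition and "panic" (throw) on failure.
#define SKETCH_ASSERT(condition) \
    do { \
        if (!(condition)) \
            throw std::logic_error("ASSERT failed: " #condition); \
    } while (false)
#else
// Disabled: the condition is not evaluated at all.
#define SKETCH_ASSERT(condition) do {} while (false)
#endif

struct Thread {  // hypothetical, minimal thread record
    unsigned magic;
};

const unsigned kThreadMagic = 0x54485244;  // arbitrary marker value

// Validates a thread pointer before it is used, so corruption is caught
// at the point of detection.
bool check_thread(const Thread* thread) {
    SKETCH_ASSERT(thread != nullptr);
    SKETCH_ASSERT(thread->magic == kThreadMagic);
    return true;
}
```

The trade-off is the usual one: with the switch off, the checks cost nothing but also catch nothing.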

comment:5 Changed 10 years ago by stippi

I just got the crash again. Yesterday I built a completely fresh installation and didn't get the crash at first; apparently it may take a while until it crashes. This morning, I had one app_server crash and rebooted; during the second session, I got the double fault again. So unless the app_server crash corrupted something, I would say the problem cannot be attributed to a corrupt image.

So far, I was only able to retrieve the area info. "area $edi" gave "could not find area $edi (0)", but I scrolled down to the end of the stack trace and ran "area 0x81012f60", which was the last content of edi. That gave 'kernel heap'. I will now enable kernel debugging and follow the rest of your instructions. Thanks!

comment:6 Changed 10 years ago by stippi

Now I got the crash with tracing turned on and my output in place. The last function the thread tried to execute is "apm_daemon_FPvi"; the trace output after invoking that function is never printed. The function executed immediately before that was "swap_hash_resizerFPvi". I had kernel heap, signal and syscall tracing enabled, and there are corresponding entries earlier in the listing, but there are no entries between "swap_hash_resizer" reporting "done" and the attempt to invoke "apm_daemon". Is that helpful? I will run the system again and see whether the combination is the same the next time it crashes.
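Reading the trace as described boils down to two steps: find the last daemon that was entered but never reported "done", then map its address to the nearest preceding symbol, roughly what a symbol lookup does. A C++ sketch under the assumption that the trace has already been reduced to (call/done, address) pairs; `TraceEntry`, `find_unfinished_call()`, and `resolve_symbol()` are hypothetical names, and the symbol table contents are supplied by the caller:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One parsed trace entry around a daemon invocation.
struct TraceEntry {
    bool is_call;        // true for "calling", false for "done"
    uintptr_t function;  // daemon function address
};

// Returns the address of the last daemon that was entered but never
// reported "done", or 0 if every call completed.
uintptr_t find_unfinished_call(const std::vector<TraceEntry>& entries) {
    uintptr_t pending = 0;
    for (const TraceEntry& entry : entries) {
        if (entry.is_call)
            pending = entry.function;
        else if (entry.function == pending)
            pending = 0;
    }
    return pending;
}

// Resolves an address to the nearest symbol starting at or below it.
std::string resolve_symbol(const std::map<uintptr_t, std::string>& symbols,
                           uintptr_t address) {
    auto it = symbols.upper_bound(address);  // first symbol above address
    if (it == symbols.begin())
        return "<unknown>";
    --it;  // step back to the symbol at or below address
    return it->second;
}
```

For a trace of "calling swap_hash_resizer", "done swap_hash_resizer", "calling apm_daemon", the unfinished call resolves to apm_daemon, matching the situation reported here.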

comment:7 Changed 10 years ago by stippi

Milestone: changed from R1/alpha1 to R1
Priority: changed from blocker to normal

Got the crash again when trying to execute "apm_daemon". This time, there was no other function immediately before it, just a couple of syscalls, two of them after the output printed before executing the function (_kern_snooze_etc() post/pre). I am going to disable APM, which does me no good anyway, and remove this bug from the R1/alpha target.

comment:8 Changed 10 years ago by axeld

So this seems to be the known APM problem. I had a similar problem on my T40p, but only when I used the PowerStatus application (random crashes).

comment:9 Changed 10 years ago by kvdman

This can be reproduced on VMware by enabling APM in the kernel settings file. It's somewhat inconsistent, though (sometimes it double faults at boot, other times after running for a short while).

comment:10 Changed 9 years ago by thetick

Any news on this? I have a ThinkPad A22m, and with APM enabled I sometimes get the double fault at boot and other times a little while afterwards.

comment:11 Changed 9 years ago by thetick

Cc: bergep@… added

comment:12 Changed 2 years ago by axeld

Owner: changed from axeld to nobody
Status: changed from new to assigned

comment:13 Changed 9 months ago by cocobean

hrev51986-52017 reviewed. I currently test on an Intel-based desktop and an IBM/Lenovo T-series laptop, and both show Haiku to be stable after hours of use.

I can look into whether "enabling APM in the kernel settings file" on VMware still causes an issue.

I think we can close this ticket. Use hrev52017 as a baseline.
