Opened 10 years ago

Closed 9 years ago

#5206 closed bug (fixed)

NMI Interrupt introduced between r34760 and r34915

Reported by: adamk Owned by: axeld
Priority: high Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: x86

Description (last modified by anevilyak)

This afternoon I updated my local svn repo to hrev34915 and installed an updated build to a spare partition I use for testing.

Upon booting, I'm greeted with a KDL: PANIC: Fatal exception "NMI Interrupt" occurred! Error code: 0x0. I've noticed other reports on here, and someone on #haiku mentioned the same problem, but it seems that others are able to 'cont' and the boot proceeds normally. Unfortunately, continuing is not an option for me, as I end up with a vm_page_fault.

I can still boot up my hrev34760 installation on another partition on the same drive.

I have logging set up via a serial cable, and grabbed a backtrace, which I am attaching to this ticket.

While it is certainly possible for me to start reverting to previous commits, this will undoubtedly take a while, so I'm hoping someone can read something in my backtrace that indicates the source of the problem and figure out which commit broke Haiku for me :-)

Attachments (3)

haiku-nmi-crash.txt (52.0 KB) - added by adamk 10 years ago.
NMI Interrupt backtrace.
haiku-boot.txt (132.0 KB) - added by adamk 10 years ago.
Successful boot of hrev34760
cpu1-bt.txt (3.9 KB) - added by adamk 10 years ago.
cpu1 backtrace, requested by DeadYak :-)


Change History (18)

by adamk, 10 years ago

Attachment: haiku-nmi-crash.txt added

NMI Interrupt backtrace.

by adamk, 10 years ago

Attachment: haiku-boot.txt added

Successful boot of hrev34760

comment:1 by adamk, 10 years ago

I've attached a log from a successful boot of the older version. The crash in the newer version seems to happen at about this point:

acpi: ACPI disabled
ahci: ahci_supports_device

I get the "ACPI disabled" line in the new, buggy version, and then the crash occurs before the ahci line.

comment:2 by anevilyak, 10 years ago

Component: - General → System/Kernel
Description: modified (diff)
Owner: changed from nobody to bonefish
Status: new → assigned
Version: R1/alpha1 → R1/Development

From the backtrace, looks possibly related to some of Ingo's recent area management changes...if that backtrace is consistent, the output of "call 15 -3" after the NMI itself would be interesting (i.e. without trying to continue).

comment:3 by adamk, 10 years ago

The backtrace is consistent. The output of "call 15 -3" is simply:

PANIC: Fatal exception "NMI Interrupt" occurred! Error code: 0x0

Welcome to Kernel Debugging Land...
Thread 2 "idle thread 2" running on CPU 0
kdebug> call 15 -3
thread 2, idle thread 2
809a9f64 800e4530 <kernel_x86>:arch_cpu_idle(0x0 (0), 0x0 (0), 0x0 (0))
kdebug>

by adamk, 10 years ago

Attachment: cpu1-bt.txt added

cpu1 backtrace, requested by DeadYak :-)

comment:4 by adamk, 10 years ago

And here's the output of 'call 18 -3' after the attached cpu1 backtrace:

kdebug> call 18 -3
thread 15, main2
81107854 800fa3ac <kernel_x86>:memset_generic(0x80c45000, 0x0 (0), 0x1000 (4096))

comment:5 by anevilyak, 10 years ago

That unfortunately looks quite correct :( Thanks for checking though. Perhaps something's wrong with the area it just tried to create, though it'd take a bit more digging to get useful information about that one.

comment:6 by bonefish, 10 years ago

Owner: changed from bonefish to axeld

I don't know in which situations an NMI is triggered (IIRC the IA32 specs were somewhat vague), but unlike, e.g., Linux, Haiku doesn't handle those yet.

Might be a duplicate of #2680 and/or #3113. Passing on this hot potato... :-)

comment:7 by anevilyak, 10 years ago

Thanks for looking, Ingo :) In that case, I guess it'd be more helpful if adamk could binary search for the exact revision where this starts, since it at least occurs 100% reliably.

comment:8 by adamk, 10 years ago

hrev34837 does not NMI crash (but has separate app_server issues that I will detail in another ticket once I track down the bad commit here). I'll keep bisecting this bug.

comment:9 by adamk, 10 years ago

hrev34876 is fine.

comment:10 by adamk, 10 years ago

hrev34901 is fine.

comment:11 by adamk, 10 years ago

34908 and 34911 are both good.

comment:12 by adamk, 10 years ago

34912 was the first revision to cause the NMI interrupt crash.

comment:13 by axeld, 10 years ago

When you write it as hrev34912 it is linked automatically to the corresponding changeset.

comment:14 by adamk, 10 years ago

Thanks. FYI, it's working again as of hrev34990 :-)

Adam

comment:15 by jackburton, 9 years ago

Resolution: fixed
Status: assigned → closed

Closing, since it's fixed.
