Opened 16 years ago
Closed 10 years ago
#3113 closed bug (invalid)
Fatal Exception "NMI Interrupt" on kernel boot
Reported by: | graham | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | | Cc: | marcusoverhagen |
Blocked By: | | Blocking: | |
Platform: | x86 | | |
Description
Recently I tried to boot Haiku off a USB hard drive using my laptop.
Upon loading the kernel, the splash screen remains grey (no color in icons) for 5 seconds (even the virtual machine boots faster), whereupon it jumps to the 3rd icon and the kernel debugger starts.
As the kernel isn't fully loaded yet I cannot save the output to a file so I had to manually write down the text and reproduce the output.
As it looks, the error seems to be with the PCI module. I tried to use this bug as an opportunity to learn the internals of the kernel, and I have learned more about programming a kernel in the past week than I did in a year of reading about it in books.
Unfortunately I got lost at the device tree code (where are the preloaded modules listed?), and since I don't have experience in programming hardware beyond basic assembly code, I suspect I jumped in at the deep end. So I will leave this to you.
I (hopefully) have uploaded a typed version of the kernel debugger output and a basic profile of my hardware, including the controllers and devices on the PCI bus.
This was done using a Windows program, but if you require more information I can boot Arch Linux off another USB drive and give you another profile.
Attachments (6)
Change History (30)
by , 16 years ago
Attachment: | hdebug.txt added |
---|
by , 16 years ago
Attachment: | hardware.html added |
---|
comment:1 by , 16 years ago
comment:3 by , 16 years ago
diver: Yes it does look similar after the driver is removed from his build. I can't really tell if it's related though.
I noticed someone else had used the 'pcistatus' command but it hasn't been registered at the point where I get dropped into the kernel debugger.
comment:4 by , 16 years ago
Cc: | added |
---|
follow-ups: 6 12 comment:5 by , 16 years ago
Hint: Parity error reporting gets enabled at PCI::_ConfigureBridges() in pci.cpp
comment:6 by , 16 years ago
Replying to marcusoverhagen:
Hint: Parity error reporting gets enabled at PCI::_ConfigureBridges() in pci.cpp
I've learned a bit more now and found where the modules get loaded.
I looked over my attached list of PCI devices and all of the PCI to PCI bridges are configured judging by the syslog.
So the error is probably in PCI::ClearDeviceStatus()
I wonder if it's worthwhile recompiling it with dumpstatus = true set?
comment:7 by , 16 years ago
After a fiasco with updating Arch Linux I finally managed to build an image in Ubuntu with dumpstatus = true.
No difference in the syslog.
I'm going to try it with some dprintf's to trace how far it gets.
comment:8 by , 16 years ago
Sorry for the delay.
As far as I can make out, the error occurs just after it outputs the configuration of the last PCI to PCI bridge. It doesn't get as far as the part where the children of the bus are checked.
Any advice on how I should proceed next?
comment:9 by , 16 years ago
Looked up what that bridge actually is and it is the South bridge.
The plot thickens!
comment:10 by , 16 years ago
If I'm not mistaken the status of the PCI to PCI bridges looks rather suspicious. Take a look at the attached PDF and note the red values.
Is this the correct behaviour?
comment:11 by , 16 years ago
Looks like the file uploader is broken. =(
PDF is now uploaded here:
http://binxtrone.110mb.com/tmp/PCI%20Bridge%20Control%20Status.pdf
comment:12 by , 16 years ago
Replying to marcusoverhagen:
Hint: Parity error reporting gets enabled at PCI::_ConfigureBridges() in pci.cpp
Hi. I managed to get Haiku to boot perfectly by disabling the SERR# Enable bit in the control register.
Sure enough checking the PCI status after boot yields:
domain 0, bus 0, dev 30, func 0, PCI bridge secondary status 0x4280
Recieved System Error
domain 0, bus 0, dev 30, func 0, PCI bridge control 0x0825
I haven't a clue why it's signalling a system error. =(
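For readers following along, the two register values quoted above can be decoded bit by bit. The following is a minimal sketch, assuming the standard PCI-to-PCI bridge layout for the secondary status and bridge control registers. The table and function names are hypothetical and are not taken from Haiku's pci.cpp, and only a selection of bits is listed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One named flag in a 16-bit PCI configuration register.
struct BitName {
    unsigned bit;
    const char* name;
};

// Selected secondary status bits of a PCI-to-PCI bridge (illustrative names).
static const BitName kSecondaryStatusBits[] = {
    {7, "fast back-to-back capable"}, {8, "master data parity error"},
    {11, "signaled target abort"},    {12, "received target abort"},
    {13, "received master abort"},    {14, "received system error"},
    {15, "detected parity error"},
};

// Selected bridge control bits of a PCI-to-PCI bridge (illustrative names).
static const BitName kBridgeControlBits[] = {
    {0, "parity error response enable"}, {1, "SERR# enable"},
    {2, "ISA enable"},                   {3, "VGA enable"},
    {5, "master abort mode"},            {6, "secondary bus reset"},
    {11, "discard timer SERR# enable"},
};

// Return the names of all table entries whose bit is set in value.
template <std::size_t N>
std::vector<std::string> decodeBits(uint16_t value, const BitName (&table)[N]) {
    std::vector<std::string> names;
    for (const BitName& entry : table) {
        assert(entry.bit < 16);  // the registers are 16 bits wide
        if (value & (1u << entry.bit))
            names.push_back(entry.name);
    }
    return names;
}

std::vector<std::string> decodeSecondaryStatus(uint16_t value) {
    return decodeBits(value, kSecondaryStatusBits);
}

std::vector<std::string> decodeBridgeControl(uint16_t value) {
    return decodeBits(value, kBridgeControlBits);
}

// DEVSEL# timing is a two-bit field (bits 9-10), not a single flag.
unsigned devselTiming(uint16_t secondaryStatus) {
    return (secondaryStatus >> 9) & 0x3;
}
```

Under these assumptions, 0x4280 has bit 14 (received system error) and bit 7 set, with a DEVSEL# timing field of 1, and 0x0825 has bits 0, 2, 5 and 11 set. Bit 1 (SERR# enable) is clear in 0x0825, which matches the modification described in this comment.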
You can download the specification for the ICH8 here:
http://www.intel.com/design/chipsets/datashts/313056.htm
According to the device list I originally attached I have the Base Mobile Version.
Do check out the strange configuration behaviour shown in the PDF I uploaded yesterday though.
comment:13 by , 16 years ago
Please attach a complete syslog, I'm interested in the PCI configuration. Please also attach the complete PCI status (you have to enter KDL once for that)
Especially what is the secondary bus number of bridge on bus 0, dev 30, func 0, and what devices are on that bus. Has any of the devices a pending error status?
follow-up: 19 comment:14 by , 16 years ago
Replying to marcusoverhagen:
Please attach a complete syslog, I'm interested in the PCI configuration. Please also attach the complete PCI status (you have to enter KDL once for that)
Especially what is the secondary bus number of bridge on bus 0, dev 30, func 0, and what devices are on that bus. Has any of the devices a pending error status?
Whew! I had to take a picture of the pcistatus output and type it but I managed to extract the syslog files using the skyfs/befs viewer. This is the output with the SERR# Enable bit disabled. Files are zipped because the syslog file is bigger than the maximum upload limit.
It's probably also worth mentioning that the mouse seems to lock up a minute or so after the boot completes. This didn't happen when I modified the code to skip bus 0, dev 30.
comment:15 by , 16 years ago
Ignore the last character of "KERN: PCI: dom 0, bus 0, dev 28, func 5, changed PCI bridge control from 0x0004 to 0x0007_"
That was a conversion error when I changed the line breaks. =)
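The change from 0x0004 to 0x0007 in that log line is consistent with setting bits 0 and 1 of the bridge control register, which in the standard PCI-to-PCI bridge layout are the parity error response and SERR# enable bits. A minimal sketch of that bit arithmetic follows. The constant and function names are hypothetical and this is not the code from pci.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Bridge control bits in the standard PCI-to-PCI bridge layout.
constexpr uint16_t kParityErrorResponse = 1u << 0;
constexpr uint16_t kSerrEnable = 1u << 1;

// Turn on parity error response and SERR# forwarding, leaving other bits alone.
uint16_t enableErrorReporting(uint16_t control) {
    return static_cast<uint16_t>(control | kParityErrorResponse | kSerrEnable);
}

// Turn off SERR# forwarding only, as in the workaround tried in this ticket.
uint16_t disableSerr(uint16_t control) {
    return static_cast<uint16_t>(control & ~kSerrEnable);
}

// Small self-check using the values quoted in the ticket.
void checkBridgeControlExamples() {
    assert(enableErrorReporting(0x0004) == 0x0007);
    assert(disableSerr(0x0007) == 0x0005);
}
```

On real hardware the new value would be written back to the bridge's configuration space. That step is left out here because it depends on the kernel's config access functions.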
by , 16 years ago
Attachment: | syslog_pcistatus.zip added |
---|
comment:16 by , 16 years ago
Never mind. I noticed a mistake in the other file so I just reuploaded the corrected version.
comment:17 by , 16 years ago
No wonder the syslog is so huge. It has the output of the last couple of boots I did.
Ignore the zip file. I'll post the smaller versions. Grrr.
by , 16 years ago
Attachment: | pcistatus.txt added |
---|
by , 16 years ago
Attachment: | syslog.txt added |
---|
comment:18 by , 16 years ago
Ok I re-copied my image onto the usb disk and my mouse problem and lockups disappeared so ignore what I said about those.
Again this is the output with the SERR# Enable bit disabled.
by , 16 years ago
Attachment: | pcistatus boot.txt added |
---|
comment:19 by , 16 years ago
Replying to marcusoverhagen:
Please attach a complete syslog, I'm interested in the PCI configuration. Please also attach the complete PCI status (you have to enter KDL once for that)
Especially what is the secondary bus number of bridge on bus 0, dev 30, func 0, and what devices are on that bus. Has any of the devices a pending error status?
I changed the ClearDeviceStatus() code to dprintf's so that it output the status during the PCI module initialization, and extracted this from the resulting syslog, since the status is cleared during initialization.
(bus 7, dev 9) looks the same as the after-boot status but the other bridges have devices that signalled system errors. How come they don't generate an NMI since they still have the SERR# Enable bit set?
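The reason the status has to be logged before it is cleared is that the error flags in the PCI status registers are write-1-to-clear: writing a 1 to a flag resets it, so once the clear has run the evidence is gone. The sketch below simulates that behaviour on a plain variable. FakeStatusRegister, peekErrors and clearErrors are hypothetical names for illustration, not Haiku functions, and no real hardware is touched.

```cpp
#include <cassert>
#include <cstdint>

// Error flags of the 16-bit PCI status register: bit 8 and bits 11-15.
constexpr uint16_t kStatusErrorMask = 0xF900;

// Simulated status register with write-1-to-clear error flags.
struct FakeStatusRegister {
    uint16_t value;

    uint16_t read() const { return value; }

    // Each 1 written to an error flag clears it; other bits are left unchanged.
    void write(uint16_t written) {
        value = static_cast<uint16_t>(value & ~(written & kStatusErrorMask));
    }
};

// Report the pending error flags without modifying the register.
uint16_t peekErrors(const FakeStatusRegister& reg) {
    return static_cast<uint16_t>(reg.read() & kStatusErrorMask);
}

// Clear the pending error flags and return the flags that were set.
uint16_t clearErrors(FakeStatusRegister& reg) {
    uint16_t pending = peekErrors(reg);
    if (pending != 0)
        reg.write(pending);
    return pending;
}

// Small self-check using the secondary status value quoted earlier.
void checkStatusExamples() {
    FakeStatusRegister reg{0x4280};
    assert(peekErrors(reg) == 0x4000);   // peeking leaves the flag set
    assert(clearErrors(reg) == 0x4000);  // clearing reports it once
    assert(reg.read() == 0x0280);        // capability bits survive
}
```

Swapping the clear for a print, as described above, corresponds to calling peekErrors instead of clearErrors.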
comment:20 by , 16 years ago
Tried setting the SERR# Response bit to 0 in the command register of devices on bus 7 but no effect.
comment:22 by , 13 years ago
Can you recheck this with a recent Haiku build? It may have been fixed recently.
comment:23 by , 13 years ago
The GPU of that machine no longer works thanks to poor quality control and poor customer support.
I have a machine with that same bridge in the same position that might be of some use so I'll check it with haiku-pre-alpha-hrev28555-raw.zip to see if I encounter the original bug.
If I get that error I'll test it with a more recent build.
comment:24 by , 10 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Closing this as it is six years old, with no activity in three years.
Forgot to mention that the image I used was: haiku-pre-alpha-hrev28555-raw.zip
This bug isn't version specific however.