Opened 16 years ago
Closed 10 years ago
#2243 closed bug (fixed)
Firewire driver provokes PCI parity error and KDL upon boot (before Tracker loads)
Reported by: | koki | Owned by: | modeenf |
---|---|---|---|
Priority: | high | Milestone: | R1 |
Component: | Drivers/FireWire | Version: | R1/pre-alpha1 |
Keywords: | | Cc: | marcusoverhagen, anevilyak |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
Haiku hrev25566 on an HP Pavilion zv5400us laptop.
Haiku KDLs upon boot, after the Deskbar & Terminal are loaded, and before Tracker is run. 100% reproducible.
After entering the "continue" command in KDL, Haiku finishes loading Tracker and runs OK.
pcistatus/backtraces/listdev outputs and syslog attached.
Attachments (7)
Change History (34)
by , 16 years ago
Attachment: | pcistatus.jpg added |
---|
comment:1 by , 16 years ago
Cc: | added |
---|
Forgot to mention: Haiku does not KDL if the fw_raw driver is removed. FWIW.
comment:2 by , 16 years ago
As the firewire driver seems to cause instability for GCC4 builds too, I'd vote for removing it from the image until the issues have been sorted out.
comment:3 by , 16 years ago
Priority: | normal → high |
---|
+1! I have a machine that also only boots okay when I remove the firewire driver. I've removed it in hrev25572 for now.
comment:4 by , 16 years ago
The firewire controller seems to generate a parity error when DMA is enabled in fwohci_rx_enable().
follow-up: 6 comment:5 by , 16 years ago
First, there's no problem with my 1394 card on my box.
This problem may be due to PCI subsystem changeset 25550 from two days ago. The PCI bus now enables Parity Error and SERR reporting by default. Some broken PCI-1394 cards do not clear all of their on-chip memory during boot (hardware reset?), which leads to PCI bus parity errors and then an NMI. The broken_hardware_patch may fix the bug; if so, please close this ticket. Would someone help test it?
PS: can we just enable the bus master bit by default?
Regards, JiSheng
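For readers following along, the bits discussed in this comment live in the standard PCI command register (config space offset 0x04). The sketch below only illustrates the idea of keeping bus mastering on while leaving parity/SERR reporting off for a device known to be broken. The helper name `pci_command_for_device` and the `device_is_broken` flag are hypothetical and are not taken from the broken_hardware_patch or from the Haiku PCI bus manager.

```c
#include <stdint.h>

/* Standard PCI command register bits (config space offset 0x04). */
#define PCI_CMD_BUS_MASTER      0x0004u /* bit 2: device may initiate DMA */
#define PCI_CMD_PARITY_RESPONSE 0x0040u /* bit 6: respond to parity errors */
#define PCI_CMD_SERR_ENABLE     0x0100u /* bit 8: device may assert SERR# */

/*
 * Hypothetical helper: compute the command register value to write.
 * Bus mastering is always enabled; parity/SERR reporting is enabled
 * only for devices that are not flagged as broken.
 */
uint16_t
pci_command_for_device(uint16_t current, int device_is_broken)
{
	uint16_t value = (uint16_t)(current | PCI_CMD_BUS_MASTER);

	if (device_is_broken) {
		value = (uint16_t)(value
			& ~(PCI_CMD_PARITY_RESPONSE | PCI_CMD_SERR_ENABLE));
	} else {
		value = (uint16_t)(value
			| PCI_CMD_PARITY_RESPONSE | PCI_CMD_SERR_ENABLE);
	}
	return value;
}
```

With this sketch, a healthy device starting from 0x0000 ends up with 0x0144, while a device flagged as broken only gains the bus master bit.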
by , 16 years ago
Attachment: | broken_hardware_patch added |
---|
comment:6 by , 16 years ago
If you hit a bug, please report it so I can dig into it and fix it.
PS: thanks to mmu_man for reminding me what may have caused the bug.
comment:7 by , 16 years ago
Some more info.
With hrev25550 I enabled the PCI-PCI bridge reporting of parity errors on its secondary side. That means, when a PCI device attached to the bridge generates a parity error, the bridge will report it as SERR, usually generating an NMI.
PCI bridge configuration happens here:
97 KERN: PCI: dom 0, bus 0, dev 10, func 0, changed PCI bridge control from 0x0200 to 0x0823
98 KERN: PCI: dom 0, bus 0, dev 11, func 0, changed PCI bridge control from 0x000f to 0x082f
The bridge is:
243 KERN: PCI: [dom 0, bus 0] bus 0, device 10, function 0: vendor 10de, device 00dd, revision a2
and the secondary bus is number 2:
250 KERN: PCI: primary_bus 00, secondary_bus 02, subordinate_bus 02, secondary_latency 80
Where the firewire controller is located:
262 KERN: PCI: [dom 0, bus 2] bus 2, device 0, function 0: vendor 104c, device 8026, revision 00
264 KERN: PCI: vendor 104c: Texas Instruments
265 KERN: PCI: device 8026: TSB43AB21 IEEE-1394a-2000 Controller (PHY/Link)
While it might be possible to not enable parity error reporting at all, or to disable it for a blacklist of broken devices, I'm not sure that it isn't the firewire driver that is at fault here. Masking errors usually only leads to undetected data corruption.
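To make the two "changed PCI bridge control" log lines easier to read, here is a small sketch that works out which bits were set and cleared. The bit names follow the standard PCI-to-PCI bridge control register layout; the macro and function names themselves are made up for this illustration and are not Haiku identifiers.

```c
#include <stdint.h>
#include <stdio.h>

/* PCI-to-PCI bridge control register bits relevant to the log above. */
#define BRIDGE_CTL_PARITY_RESPONSE   0x0001u /* bit 0: parity error response */
#define BRIDGE_CTL_SERR_ENABLE       0x0002u /* bit 1: forward SERR# */
#define BRIDGE_CTL_MASTER_ABORT_MODE 0x0020u /* bit 5: report master aborts */
#define BRIDGE_CTL_SEC_DISCARD_TIMER 0x0200u /* bit 9: secondary discard timer */
#define BRIDGE_CTL_DISCARD_SERR      0x0800u /* bit 11: discard timer SERR# */

/* Bits that are 1 in 'after' but were 0 in 'before'. */
uint16_t
bridge_bits_set(uint16_t before, uint16_t after)
{
	return (uint16_t)(after & ~before);
}

/* Bits that were 1 in 'before' but are 0 in 'after'. */
uint16_t
bridge_bits_cleared(uint16_t before, uint16_t after)
{
	return (uint16_t)(before & ~after);
}

/* Print a one-line summary of a bridge control change. */
void
bridge_describe_change(uint16_t before, uint16_t after)
{
	printf("0x%04x -> 0x%04x: set 0x%04x, cleared 0x%04x\n",
		(unsigned)before, (unsigned)after,
		(unsigned)bridge_bits_set(before, after),
		(unsigned)bridge_bits_cleared(before, after));
}
```

Applied to the values in the log, the bridge at dev 10 (0x0200 to 0x0823) gained bits 0, 1, 5 and 11 and lost bit 9, while the bridge at dev 11 (0x000f to 0x082f) already had bits 0 and 1 set and only gained bits 5 and 11.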
comment:8 by , 16 years ago
I found a similar problem here: http://lists.freebsd.org/pipermail/freebsd-current/2004-November/042438.html
follow-up: 11 comment:9 by , 16 years ago
comment:10 by , 16 years ago
Cc: | added |
---|
comment:11 by , 16 years ago
Yep, FreeBSD's stack has the same bug. Perhaps FreeBSD now disables parity errors (I'm not sure), so the line was removed.
comment:12 by , 16 years ago
Just for the record, while I didn't have a KDL, when the FW driver is installed, the system hangs completely. I haven't yet tested again with the parity check enabled.
comment:13 by , 16 years ago
Summary: | KDL upon boot (before Tracker loads) → Firewire driver provokes PCI parity error and KDL upon boot (before Tracker loads) |
---|
Regarding this issue in general, why is the firewire device transmitting that data using DMA to the system RAM?
As I understand it, the data still has bad parity because its RAM has never been written to before (assuming the above idea is correct). That seems to happen when receiving is enabled in fwohci_rx_enable().
comment:14 by , 16 years ago
Because after a bus reset, all self-ID packets (including the controller's own) are received, and DMA is used to transfer these packets.
IMO, it is still a parity problem, because the firewire stack was OK before on koki's box. I need to find a PC with the same problem to test on, since there's no problem on my box. Any suggestions?
axeld, what's the serial debug information when the system hangs?
follow-up: 22 comment:15 by , 16 years ago
IIRC it didn't dump anything helpful, and I couldn't even enter KDL. If you have any idea on how I can dig into this more, let me know.
Is it possible to gracefully handle a parity problem by turning the check off and dumping a warning to syslog? If that is not possible, I think the only solution would be to turn parity checking off by default, and make it available via a config setting only (that defaults to off).
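A rough sketch of the behavior proposed in this comment, purely as an illustration: parity checking is a setting that defaults to off, and if it is on when an error occurs, the handler turns it off and logs a warning instead of halting. All names here (`parity_policy`, `parity_policy_default`, `handle_parity_error`) are hypothetical, and `fprintf` stands in for whatever syslog facility the kernel would actually use.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy state; not an existing Haiku structure. */
typedef struct {
	bool parity_check_enabled; /* config setting, proposed default: off */
	unsigned errors_seen;      /* parity errors handled so far */
} parity_policy;

/* The proposed default: parity checking disabled. */
parity_policy
parity_policy_default(void)
{
	parity_policy policy = { false, 0 };
	return policy;
}

/*
 * Handle one reported parity error gracefully: count it, switch the
 * check off if it was on, and warn. Returns true to signal that the
 * system may continue instead of entering KDL.
 */
bool
handle_parity_error(parity_policy *policy)
{
	policy->errors_seen++;
	if (policy->parity_check_enabled) {
		policy->parity_check_enabled = false;
		fprintf(stderr,
			"PCI: parity error detected, disabling parity checking\n");
	}
	return true;
}
```

Whether the hardware allows recovering this way after an NMI has already fired is exactly the open question in this comment, so the sketch only captures the intended policy, not a working kernel path.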
follow-up: 17 comment:16 by , 16 years ago
Because the firewire stack was OK before on koki's box.
FWIW, I actually don't know if the FW stack was OK before, as I never used it for anything. What I can say is that, if there was a problem with FW before hrev25566, it did not manifest itself the way it does now.
follow-up: 18 comment:17 by , 16 years ago
It means that the firewire stack initialized OK.
FWIW, I actually don't know if the FW stack was OK before, as I never used it for anything. What I can say is that, if there was a problem with FW before hrev25566, it did not manifest itself the way it does now.
Is it possible to gracefully handle a parity problem by turning the check off and dumping a warning to syslog? If that is not possible, I think the only solution would be to turn parity checking off by default, and make it available via a config setting only (that defaults to off).
The broken_hardware_patch should turn off the parity check for the PCI-1394 card. But the result is that there is a parity error with the PCI bridge.
Would someone help test this new patch?
by , 16 years ago
comment:18 by , 16 years ago
would someone help test this new patch
I mean only patch2. It fixes a few lines and enables the postedWriteEnable bit of the HCControl register.
Thanks to DeadYak for explaining what "posted" means ;)
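For context, in the 1394 OHCI register layout HCControl is accessed through a set/clear register pair, and postedWriteEnable is bit 18. With posted writes enabled, the controller may acknowledge an incoming physical write before the data has actually reached system memory. The sketch below models the set/clear semantics on a plain struct; `fake_ohci` and the helper names are invented for illustration and are not the identifiers used in patch2 or in the firewire driver.

```c
#include <stdint.h>

/* HCControl bit 18 in the 1394 OHCI register layout. */
#define OHCI_HCCONTROL_POSTED_WRITE_ENABLE (1u << 18)

/* Stand-in for the memory-mapped controller; real code would use MMIO. */
typedef struct {
	uint32_t hc_control;
} fake_ohci;

/* Models a write to HCControlSet: every 1 bit written sets that bit. */
void
hc_control_set(fake_ohci *controller, uint32_t bits)
{
	controller->hc_control |= bits;
}

/* Models a write to HCControlClear: every 1 bit written clears that bit. */
void
hc_control_clear(fake_ohci *controller, uint32_t bits)
{
	controller->hc_control &= ~bits;
}
```

Enabling posted writes then amounts to `hc_control_set(&controller, OHCI_HCCONTROL_POSTED_WRITE_ENABLE)`, which leaves all other HCControl bits untouched.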
follow-up: 20 comment:19 by , 16 years ago
I tried a build with the patched fw_raw (thanks Deadyak!), and it still KDLs.
comment:20 by , 16 years ago
Replying to koki:
I tried a build with the patched fw_raw (thanks Deadyak!), and it still KDLs.
Hmmm. The patched module is firewire, not fw_raw. Was the wrong file replaced?
comment:21 by , 16 years ago
The correct file was patched. I sent him a new image with your patch applied; he didn't just replace one file.
comment:22 by , 16 years ago
I examined about 10 boxes and found a Linux PC with a similar problem. But the Linux kernel's default NMI behavior is to emit a message and continue, so there is usually no problem. I sent a patch to lkml: http://lkml.org/lkml/2008/5/21/163. But after a lot of testing today, I found that the problem is the same even with the patch applied.
As for FreeBSD, its NMI ISR just panics, but its PCI code only enables the bus master bit by default (not sure), so there is usually no problem there either.
IMHO, could we just enable only PCI bus mastering, or emit a message when an NMI happens?
comment:23 by , 16 years ago
Hmmm, there is no such problem with the 1394 card on my box, so I think it is a problem with the Linux PC's PCI slot. I can't install another OS on that box because it is very important. I need to find another box.
comment:24 by , 16 years ago
I submitted this bug report, but unfortunately the laptop that was showing the problem has died and I don't have it anymore, so I will not be able to provide any feedback. Sorry.
comment:25 by , 12 years ago
This patch was added to trunk long ago.
So no one can test this?
comment:26 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:27 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Not tested, but it seems it was fixed two or more years ago.
pcistatus output