Opened 4 years ago

Closed 4 years ago

#12357 closed bug (fixed)

AHCI unsupported request opcode

Reported by: kallisti5 Owned by: nobody
Priority: critical Milestone: R1/beta1
Component: Drivers/Disk Version: R1/Development
Keywords: ahci Cc:
Blocked By: Blocking:
Has a Patch: yes Platform: All

Description

It looks like AHCI was broken in one of the recent changes. Testing on a Dell optiplex sata machine. Boots fail, when boots succeed I get AHCI faults.

I changed some code to begin emitting VPD data when requested through SCSI, however the code isn't called so I don't think it is the cause.

x86_64, hrev49616

~> grep ahci /var/log/syslog
KERN: ahci: generic AHCI controller found! vendor 0x8086, device 0x1c02
KERN: ahci: ahci_register_device
KERN: ahci: ahci_init_driver
KERN: ahci: ahci_sim_init_bus
KERN: ahci: ahci_sim_init_bus: pciDevice 0xffffffff8245f050
KERN: ahci: AHCIController::Init 0:31:2 vendor 8086, device 1c02
KERN: ahci: PCI SATA capability found at offset 0xa8
KERN: ahci: satacr0 = 0x0010b012, satacr1 = 0x00000048
KERN: ahci: pcicmd old 0x0007
KERN: ahci: pcicmd new 0x0006
KERN: ahci: using MSI vector 24
KERN: ahci: registers at 0xf7c06000, size 0x800
KERN: ahci: mapping physical address 0xf7c06000 with 2048 bytes for AHCI HBA regs
KERN: ahci: physical = 0xf7c06000, virtual = 0xffffffff81a06000, offset = 0, phyadr = 0xf7c06000, mapadr = 0xffffffff81a06000, size = 4096, area = 0x000002f8
KERN: ahci: cap: Interface Speed Support: generation 2
KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f)
KERN: ahci: cap: Number of Ports: 4 (raw 0x3)
KERN: ahci: cap: Supports Port Multiplier: no
KERN: ahci: cap: Supports External SATA: no
KERN: ahci: cap: Enclosure Management Supported: yes
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports Staggered Spin-up: yes
KERN: ahci: cap: Supports Mechanical Presence Switch: no
KERN: ahci: cap: Supports 64-bit Addressing: yes
KERN: ahci: cap: Supports Native Command Queuing: yes
KERN: ahci: cap: Supports SNotification Register: no
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports AHCI mode only: no
KERN: ahci: cap2: DevSleep Entrance from Slumber Only: no
KERN: ahci: cap2: Supports Aggressive Device Sleep Management: no
KERN: ahci: cap2: Supports Device Sleep: no
KERN: ahci: cap2: Automatic Partial to Slumber Transitions: yes
KERN: ahci: cap2: NVMHCI Present: no
KERN: ahci: cap2: BIOS/OS Handoff: no
KERN: ahci: ghc: AHCI Enable: yes
KERN: ahci: Ports Implemented Mask: 0x000003 Number of Available Ports: 2
KERN: ahci: AHCI Version 0001.03.00 Interrupt 24
KERN: ahci: AHCIPort::Init1 port 0
KERN: ahci: allocating 4096 bytes for AHCI port 0
KERN: ahci: area = 761, size = 4096, virt = 0xffffffff81a07000, phy = 0x1389a000
KERN: ahci: PRD table is at 0xffffffff81a07580
KERN: ahci: AHCIPort::Init1 port 1
KERN: ahci: allocating 4096 bytes for AHCI port 1
KERN: ahci: area = 7KERN: 62, size = 4096, virt = 0xffffffff81a08000, phy = 0x13899000
KERN: ahci: PRD table is at 0xffffffff81a08580
KERN: ahci: AHCIPort::Init2 port 0
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400040, ci 0x00000000
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 1
KERN: ahci: ssts 0x00000123
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: sctl.ipm 0x03
KERN: ahci: sctl.spd 0x00
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: sctl.det 0x00
KERN: ahci: ie   0x7dc0007f
KERN: ahci: serr 0x00000000
KERN: ahci: is   0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: cmd  0x0000c017
KERN: ahci: PhyReady Change
KERN: ahci: Port Connect Change
KERN: ahci: ssts 0x00000123
KERN: ahci: AHCIPort::_HardReset() PORT_CMD_ST set, behaviour undefined
KERN: ahci: sctl.ipm 0x03
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400040, ci 0x00000000
KERN: ahci: sctl.spd 0x00
KERN: ahci: ssts 0x00000123
KERN: ahci: sctl.det 0x00
KERN: ahci: sctl.ipm 0x03
KERN: ahci: serr 0x04050000
KERN: ahci: sctl.spd 0x00
KERN: ahci: sact 0x00000000
KERN: ahci: sctl.det 0x00
KERN: ahci: tfd  0x00000080
KERN: ahci: serr 0x04050000
KERN: ahci: AHCIPort::Init2 port 1
KERN: ahci: sact 0x00000000
KERN: ahci: AHCIPort::ResetPort port 1
KERN: ahci: PhyReady Change
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::ResetPort port 1, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::_HardReset() PORT_CMD_ST set, behaviour undefined
KERN: ahci: AHCIPort::PostReset port 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: device signature 0xeb140101 (ATAPI)
KERN: ahci: ssts 0x00000123
KERN: ahci: ie   0x7dc0007f
KERN: ahci: sctl.ipm 0x03
KERN: ahci: is   0x00000000
KERN: ahci: sctl.spd 0x00
KERN: ahci: cmd  0x0100c017
KERN: ahci: sctl.det 0x00
KERN: ahci: ssts 0x00000113
KERN: ahci: serr 0x00050000
KERN: ahci: sctl.ipm 0x03
KERN: ahci: sact 0x00000000
KERN: ahci: sctl.spd 0x00
KERN: ahci: PhyReady Change
KERN: ahci: sctl.det 0x00
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: tfd  0x00000100
KERN: ahci: cookie = 0xffffffff8228c180
KERN: ahci: ahci_path_inquiry, cookie 0xffffffff8228c180
KERN: ahci: ahci_scan_bus, cookie 0xffffffff8228c180
KERN: ahci: AHCIPort::ScsiTestUnitReady port 0
KERN: ahci: AHCIPort::ScsiInquiry port 0
KERN: ahci: lba 1, lba48 1, fUse48BitCommands 1, sectors 268435455, sectors48 1953525168, size 1000204886016
KERN: ahci: model number: WDC WD10EADS-00L5B1                     
KERN: ahci: serial number:      WD-WCAU45958290
KERN: ahci: firmware rev.: 01.01A01
KERN: ahci: sg_memcpy phyAddr 0x12b321a0, size 96
KERN: ahci: ahci_get_restrictions, cookie 0xffffffff8228c180
KERN: ahci: AHCIPort::ScsiGetRestrictions port 0: isATAPI 0, noAutoSense 0, maxBlocks 65536
KERN: ahci: ahci_get_restrictions, cookie 0xffffffff8228c180
KERN: ahci: AHCIPort::ScsiGetRestrictions port 1: isATAPI 1, noAutoSense 1, maxBlocks 256
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x12b32260, size 8
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x12b32290, size 8
KERN: ahci: ahci_ioctl, cookie 0xffffffff8228c180
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000001, is 0x40000001, ci 0x00000001
KERN: ahci: ssts 0x00000113
KERN: ahci: sctl.ipm 0x03
KERN: ahci: sctl.spd 0x00
KERN: ahci: sctl.det 0x00
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: Task File Error
KERN: ahci: AHCIPort::ResetPort port 1
KERN: ahci: AHCIPort::ResetPort port 1, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 1
KERN: ahci: device signature 0xeb140101 (ATAPI)
KERN: ahci: sata_request::finish ATA command 0xa0 failed: status 0x51, error 0x24
KERN: ahci: sata_request::finish ATAPI packet 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 (len 10)
KERN: ahci: ahci_ioctl, cookie 0xffffffff8228c180
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x12b324f0, size 8
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x12b32480, size 8
KERN: ahci: AHCIPort::ScsiExecuteRequest port 0 unsupported request opcode 0x43
KERN: ahci: AHCIPort::ScsiExecuteRequest port 0 unsupported request opcode 0x43
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x1c8d97a0, size 8
KERN: ahci: AHCIPort::ScsiReadCapacity port 0
KERN: ahci: SectorSize 512, SectorCount 0x74706db0
KERN: ahci: sg_memcpy phyAddr 0x1c8d9730, size 8

Attachments (12)

ahci-rework-v1.diff (5.2 KB) - added by kallisti5 4 years ago.
My initial rework of the AHCI Init / Reset code. RFC. Needs testing
ahci_rework_v1-01.log (6.0 KB) - added by kallisti5 4 years ago.
hrev49616.log (7.5 KB) - added by kallisti5 4 years ago.
ahci-syslog.txt (142.3 KB) - added by jessicah 4 years ago.
ahci-rework-v2.diff (9.6 KB) - added by kallisti5 4 years ago.
version 2. Rebased. Doesn't solve the ATAPI issues yet.
ahci-rework-v2b.diff (9.2 KB) - added by kallisti5 4 years ago.
TRACE changes removed to improve readibility.
ahci-rework-v2c.diff (7.5 KB) - added by kallisti5 4 years ago.
Undoing more TRACE changes to improve readibiltiy.
ahci-rework-v3.diff (13.2 KB) - added by kallisti5 4 years ago.
rework v3
ahci-rework-v4.diff (16.1 KB) - added by kallisti5 4 years ago.
ahci-rework-v5.diff (16.5 KB) - added by kallisti5 4 years ago.
ahci-rework-v6.diff (16.8 KB) - added by kallisti5 4 years ago.
ahci-rework-v7.diff (18.1 KB) - added by kallisti5 4 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 4 years ago by axeld

The unsupported request code stands for SCSI_OP_READ_TOC -- so unless you changed something there, this shouldn't be problematic. IOW the syslog snippet doesn't really help a lot.

However, booting fails on my laptop, too, so I can look into it (as it's probably my fault). For reference, the last thing in the syslog on a failed boot here is a port reset, with the message that the ST bit is set in the command register -- according to the specification, this must not happen at this point.

comment:2 Changed 4 years ago by kallisti5

Recent AHCI changes (latest to oldest. I omitted my VPD changes)

comment:3 Changed 4 years ago by kallisti5

"AHCIPort::_HardReset() PORT_CMD_ST set, behaviour undefined" I saw that was added, and i'm now seeing it... does that logic need review?

comment:4 Changed 4 years ago by mmlr

As far as I see it, the logic in 5584c22 is simple insufficient. It causes a hard reset for any COMINIT, not just for unsolicited ones. Device detection (i.e. setting sctl.det to 1) causes a COMRESET, so if a COMINIT is received because of that, no hard reset should be done as it is perfectly expected. Skimming through the AHCI specs there are also other situations that may solicit a COMINIT, so these should be reviewed as well.

Generally you could just revert the patches up to the point where you can't reproduce the problem anymore, which would make things less theoretical.

comment:5 Changed 4 years ago by kallisti5

Do we want to revert the top 4 commits above and push for them to be reworked?

comment:6 Changed 4 years ago by pulkomandy

First, please actually git bisect the issue if you can reproduce it, so we can be sure the problem is in 5584c22. If it is, instead of reverting, we can try to improve on it as mmlr suggests. You can ask for help from Anarchos on IRC, he wrote the initial patch.

comment:7 Changed 4 years ago by jessicah

As per ML [1], my issues were introduced in 5584c22f. Michael's changes in cccf804d generally fixed my boot issues; but on the rare occasion, it still fails to boot, hanging at the disks icon as in 5584c22f.

[1] https://www.freelists.org/post/haiku-commits/haiku-hrev49590-srcaddonskernelbussesscsiahci,3

Last edited 4 years ago by jessicah (previous) (diff)

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v1.diff added

My initial rework of the AHCI Init / Reset code. RFC. Needs testing

comment:8 Changed 4 years ago by kallisti5

Has a Patch: set

Changed 4 years ago by kallisti5

Attachment: ahci_rework_v1-01.log added

Changed 4 years ago by kallisti5

Attachment: hrev49616.log added

comment:9 Changed 4 years ago by axeld

Since Enable(true) and Enable(false) only share a single line of code, I would divide that into two functions (Enable(), and Disable()). In any case, I would prefer SetEnabled(bool) over Enable(bool).

Anyway, this doesn't seem to fix any actual issue, it's just some refactoring, right?

You slightly change the way the port is initialized, but I don't know the spec well enough to be able to say if that's better or not :-)

Changed 4 years ago by jessicah

Attachment: ahci-syslog.txt added

comment:10 Changed 4 years ago by jessicah

With hrev49629, had one of those AHCI "lockups", see https://dev.haiku-os.org/attachment/ticket/12357/ahci-syslog.txt

Notice that it actually ends up in an out of control loop.

comment:11 Changed 4 years ago by kallisti5

Just noticed that as of right now AHCI cdroms no longer work (tested in virtualbox) Seems like our current AHCI code has a lot of issues.

Last edited 4 years ago by kallisti5 (previous) (diff)

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v2.diff added

version 2. Rebased. Doesn't solve the ATAPI issues yet.

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v2b.diff added

TRACE changes removed to improve readibility.

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v2c.diff added

Undoing more TRACE changes to improve readibiltiy.

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v3.diff added

rework v3

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v4.diff added

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v5.diff added

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v6.diff added

Changed 4 years ago by kallisti5

Attachment: ahci-rework-v7.diff added

comment:12 Changed 4 years ago by waddlesplash

@kallisti5: Was (part) of this patch merged, or what?

comment:13 Changed 4 years ago by kallisti5

Resolution: fixed
Status: newclosed

My patch was merged in hrev49665 to make the issue better. No additional unpushed patches at this time.

The bug still exists though.. but in a different form. Going to close this one as the goals likely need rebased.

Note: See TracTickets for help on using tickets.