Attachments (17)
Change History (65)
by , 18 months ago
Attachment: | listdev.txt added |
---|
by , 18 months ago
Attachment: | syslog.txt added |
---|
comment:1 by , 18 months ago
comment:2 by , 18 months ago
vidrep, can you enable on-screen logs and post the few last screenshots?
comment:3 by , 18 months ago
by , 18 months ago
Attachment: | IMG_0245.JPG added |
---|
by , 18 months ago
Attachment: | IMG_0246.JPG added |
---|
by , 18 months ago
Attachment: | IMG_0247.JPG added |
---|
comment:5 by , 18 months ago
Component: | Partitioning Systems/GPT → Drivers/Disk/AHCI |
---|
<Vidrep_64> I'm booting from an internal hard drive
comment:6 by , 14 months ago
Presumably hrev57034 (note multiple commits in that hrev) is the cause here. So this would then be a problem with the new PCIe support code.
I've asked Vidrep to try booting with ACPI disabled, as that will use the older configuration mechanism and may reveal differences.
comment:7 by , 14 months ago
I was able to successfully install a current nightly build only if "disable ACPI" is first selected in the boot menu. Otherwise the system will KDL on the fourth icon during boot. It will KDL during shutdown regardless.
comment:8 by , 14 months ago
Component: | Drivers/Disk/AHCI → Drivers/ACPI |
---|
The problem is likely in the new ECAM PCI mechanism and the ACPI bus, then, based on the commit range.
comment:9 by , 14 months ago
Milestone: | Unscheduled → R1/beta5 |
---|
by , 14 months ago
by , 14 months ago
Attachment: | IMG_0127.JPG added |
---|
comment:12 by , 12 months ago
Blocked By: | 18454 added |
---|
Since the system boots with ACPI disabled, maybe related to #18454.
comment:14 by , 10 months ago
Tested with hrev57645 x86_64
Fresh install booting from USB stick
Image on USB stick boots to installation screen, but the hard disk and its Haiku partitions are not seen unless ACPI is disabled first. After disabling ACPI and installing to HDD the system will KDL on the fourth icon upon reboot. Diable ACPI allows it to boot to the desktop. The system will KDL again when shutting down or rebooting.
comment:15 by , 10 months ago
Unfortunately I can't find any boot in the non-working case in your attached files.
I'm interested in the PCI initialization which starts like this (with all the "PCI: ..." following that):
From the hrev57033 log:
238 KERN: PCI: mechanism addr: e0000000, seg: 0, start: 0, end: ff 239 KERN: PCI: mechanism pcie controller found 240 KERN: pci_init_deferred()
From the other log:
4556 KERN: PCI: pci_module_init 4557 KERN: acpi: ACPI disabled 4558 KERN: PCI: mechanism 1 controller found
(I find only cases with ACPI disabled).
I think you can extract the syslog while booting from USB with ACPI enabled.
I'm hopping to see something like this in current nightlies:
KERN: initialize PCI controller from ACPI KERN: PCI: range from ACPI [0(1),fe(1)] with length ff KERN: PCI: range from ACPI [0(1),cf7(1)] with length cf8 KERN: PCI: range from ACPI [d00(1),ffff(1)] with length f300 KERN: PCI: range from ACPI [a0000(1),bffff(1)] with length 20000 KERN: PCI: range from ACPI [c0000(1),c3fff(1)] with length 0 KERN: PCI: range from ACPI [c4000(1),c7fff(1)] with length 0 KERN: PCI: range from ACPI [c8000(1),cbfff(1)] with length 0 KERN: PCI: range from ACPI [cc000(1),cffff(1)] with length 0 KERN: PCI: range from ACPI [d0000(1),d3fff(1)] with length 0 KERN: PCI: range from ACPI [d4000(1),d7fff(1)] with length 0 KERN: PCI: range from ACPI [d8000(1),dbfff(1)] with length 0 KERN: PCI: range from ACPI [dc000(1),dffff(1)] with length 0 KERN: PCI: range from ACPI [e0000(1),e3fff(1)] with length 0 KERN: PCI: range from ACPI [e4000(1),e7fff(1)] with length 0 KERN: PCI: range from ACPI [e8000(1),ebfff(1)] with length 0 KERN: PCI: range from ACPI [ec000(1),effff(1)] with length 0 KERN: PCI: range from ACPI [f0000(1),fffff(1)] with length 0 KERN: PCI: range from ACPI [80800000(1),bfffffff(1)] with length 3f800000 KERN: PCI: range from ACPI [4000000000(1),7fffffffff(1)] with length 4000000000 KERN: PCI: mechanism addr: c0000000, seg: 0, start: 0, end: ff
(we have added quite a bit of debug info in here)
I have also noticed this log:
io-apic 0 entry count exceeds max supported, only using the first 64 entries io-apic 0 has range 0-63, 64 entries, version 0x00720020, apic-id 1
maybe we should do something about that?
comment:16 by , 10 months ago
Would enabling any of the debug options in the boot menu be of any help?
comment:17 by , 10 months ago
I don't know, I need a log where the problem actually happens to investigate it, and I don't think I found one in the attached logs (maybe I missed it). They are either #rom before the problem, or with acpi disabled.
comment:18 by , 10 months ago
hrev57666 x86_64 ACPI enabled
Enable on screen debug output - IMG.0126.JPG
Display current boot loader log - IMG.0128.JPG, IMG.0129.JPG
Hopefully this helps.
by , 10 months ago
Attachment: | IMG_0126.JPG added |
---|
by , 10 months ago
Attachment: | IMG_0128.JPG added |
---|
by , 10 months ago
Attachment: | IMG_0129.JPG added |
---|
comment:19 by , 10 months ago
Ok, there are some strange values in the ACPI range list similar to #18454. So I guess we need to fix that first and see if it helps here. Waddleplash had already guessed it, but now I think it's confirmed.
comment:20 by , 10 months ago
I have attached a couple more photos of debug output, but this time with on screen paging disabled. I was able to capture the last few lines before it was overwritten by the KDL.
IMG_0126.2.JPG
IMG_0127.2.JPG
by , 10 months ago
Attachment: | IMG_0126.2.JPG added |
---|
by , 10 months ago
Attachment: | IMG_0127.2.JPG added |
---|
comment:22 by , 10 months ago
Updated to hrev57683. Same problem as before - cannot boot without first disabling ACPI.
comment:23 by , 6 months ago
Please retest after hrev57812. If it still fails, please capture a new syslog and upload it.
by , 6 months ago
comment:25 by , 6 months ago
This boot has ACPI disabled, can you capture a failed boot with ACPI enabled, or at least the tail end of it via taking a picture of debug output?
by , 6 months ago
Attachment: | IMG_0041.JPG added |
---|
by , 6 months ago
Attachment: | IMG_0045.JPG added |
---|
comment:31 by , 6 months ago
“Press key to continue” has no effect. The debug output is frozen at this point.
comment:32 by , 6 months ago
In the bootloader options you can select "disable paging", that will let the system boot as far as possible without having to press a key between each page.
There is a known issue that the keyboard in these boot screens stops working when usb starts being initialized, so the page-by-page output is only possible for the steps before that.
comment:33 by , 6 months ago
I've already provided screen shots with paging disabled. It scrolls by very quickly then KDL's (see IMG_0045.JPG). Otherwise it stalls after one page and I cannot proceed further.
comment:34 by , 6 months ago
Ok, so, to clarify the current situation:
- Problems with PCI enumeration are now solved
- The boot fails with "cannot find any boot partitions" after several errors from SATA / AHCI ("unable to hard reset device") and a DMA transfer that doesn't seem to stop.
Maybe the easiest way to investigate this is booting from USB with ACPI enabled (if I understand correctly, this should work fine, but you won't have access to the hard disk) and getting a complete syslog from there?
We can try to compare the AHCI initialization with the one in syslog.2 and see if we can understand why it's not working when ACPI is enabled.
comment:35 by , 6 months ago
I wrote hrev57909_x86_64 to a USB thumb drive and booted from it with ACPI enabled. Attached is the syslog.
by , 6 months ago
comment:36 by , 6 months ago
The first difference I see is in the MSI interrupts handling:
490 KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 17 491 KERN: msi_allocate_vectors: allocated 1 vectors starting from 17 492 KERN: msi enabled: 0x0001 493 KERN: ahci: using MSI vector 17
now becomes:
498 KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 65 499 KERN: msi_allocate_vectors: allocated 1 vectors starting from 65 500 KERN: msi enabled: 0x0001 501 KERN: ahci: using MSI vector 65
but then it goes completely wrong, here is a normal access in syslog.2:
497 KERN: ahci: cap: Interface Speed Support: generation 3 498 KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f) 499 KERN: ahci: cap: Number of Ports: 2 (raw 0x1) 500 KERN: ahci: cap: Supports Port Multiplier: no 501 KERN: ahci: cap: Supports External SATA: no 502 KERN: ahci: cap: Enclosure Management Supported: no 503 KERN: ahci: cap: FIS-based Switching Control: no 504 KERN: ahci: cap: Supports Command List Override: yes 505 KERN: ahci: cap: Supports Staggered Spin-up: no 506 KERN: ahci: cap: Supports Mechanical Presence Switch: yes 507 KERN: ahci: cap: Supports 64-bit Addressing: yes 508 KERN: ahci: cap: Supports Native Command Queuing: yes 509 KERN: ahci: cap: Supports SNotification Register: no 510 KERN: ahci: cap: Supports Command List Override: yes 511 KERN: ahci: cap: Supports AHCI mode only: yes 512 KERN: ahci: cap2: DevSleep Entrance from Slumber Only: yes 513 KERN: ahci: cap2: Supports Aggressive Device Sleep Management: yes 514 KERN: ahci: cap2: Supports Device Sleep: yes 515 KERN: ahci: cap2: Automatic Partial to Slumber Transitions: yes 516 KERN: ahci: cap2: NVMHCI Present: no 517 KERN: ahci: cap2: BIOS/OS Handoff: no 518 KERN: ahci: ghc: AHCI Enable: yes 519 KERN: ahci: Ports Implemented Mask: 0x000003 Number of Available Ports: 2 520 KERN: ahci: AHCI Version 0001.03.01 Interrupt 17
In syslog.3 we are reading only ff bytes from the device it seems, everything is set:
505 KERN: ahci: cap: Interface Speed Support: generation 15 506 KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f) 507 KERN: ahci: cap: Number of Ports: 32 (raw 0x1f) 508 KERN: ahci: cap: Supports Port Multiplier: yes 509 KERN: ahci: cap: Supports External SATA: yes 510 KERN: ahci: cap: Enclosure Management Supported: yes 511 KERN: ahci: cap: FIS-based Switching Control: yes 512 KERN: ahci: cap: Supports Command List Override: yes 513 KERN: ahci: cap: Supports Staggered Spin-up: yes 514 KERN: ahci: cap: Supports Mechanical Presence Switch: yes 515 KERN: ahci: cap: Supports 64-bit Addressing: yes 516 KERN: ahci: cap: Supports Native Command Queuing: yes 517 KERN: ahci: cap: Supports SNotification Register: yes 518 KERN: ahci: cap: Supports Command List Override: yes 519 KERN: ahci: cap: Supports AHCI mode only: yes 520 KERN: ahci: cap2: DevSleep Entrance from Slumber Only: yes 521 KERN: ahci: cap2: Supports Aggressive Device Sleep Management: yes 522 KERN: ahci: cap2: Supports Device Sleep: yes 523 KERN: ahci: cap2: Automatic Partial to Slumber Transitions: yes 524 KERN: ahci: cap2: NVMHCI Present: yes 525 KERN: ahci: cap2: BIOS/OS Handoff: yes 526 KERN: ahci: ghc: AHCI Enable: yes 527 KERN: ahci: Ports Implemented Mask: 0xffffffff Number of Available Ports: 32 528 KERN: ahci: AHCI Version ffff.ff.ff Interrupt 65
From there on, the AHCI driver is of course completely confused, trying to create 32 ports, and finding out that all of them have DMA transfers active (since the bit indicating that is also stuck to 1).
comment:37 by , 6 months ago
I found a possible issue in the revision range and made a fix for it: https://review.haiku-os.org/c/haiku/+/7929
I don't know if that will fix the problem, however.
comment:39 by , 6 months ago
I updated from hrev57910 to hrev57921 and there is no change. The PC still KDL's as before. Also, my idualwifi 7260 is not functioning after the update. Perhaps related to https://dev.haiku-os.org/ticket/18966
comment:40 by , 5 months ago
Blocking: | 18792 added |
---|
comment:41 by , 5 months ago
I noticed something strange.
We are applying a quirk to all Intel devices to configure the ports. This is done in AHCIController::ResetController.
This code reads the port count and should log a message if the port count is greater than 8. It doesn't log anything ("don't know how to enable SATA ports 9 to %d").
In AHCIController::Init, a few lines after calling ResetController, we again read the two variables used to compute the number of ports. This time, we find out that there are 32 ports.
ahci: cap: Number of Ports: 32 (raw 0x1f) Ports Implemented Mask: 0xffffffff Number of Available Ports: 32
This means, at the time ResetController was called, the device was operating fine. But during or after applying the quirk, it stopped working?
I also verified that Linux applies the same quirk to this device. https://github.com/torvalds/linux/blob/master/drivers/ata/ahci.c#L1770
I guess the problem then is that the new ECAM register writing code is not correct. Either there is a problem with doing a 32 bit access instead of a 16 bit one, or there is a problem with endianness and we end up accessing to the wrong register.
I made two changes:
I appreciate if you can test them (together or separately, as you prefer)
comment:42 by , 5 months ago
I only know how to test it from iso nightly build. So I downloaded hrev57980 revision (there are the fixes, right?) and I still cannot see my hard drive. Only my memory stick is visible. I'm attaching the syslog file.
comment:43 by , 5 months ago
No, the patch is not merged yet in the nightlies. That's why I linked to the patch, otherwise I would give a hrev number to test with.
From the patch, you can find the comment by the commit checker robot, and from there a link to the test build: https://haiku.movingborders.es/testbuild/I8854350b70793160548fe96e15d95874deed4f7b/2/hrev57979/x86_64/
There you can find an iso image with that change.
comment:44 by , 5 months ago
When booting the test image from a USB thumb drive, the installer now "sees" my hard drive with its Haiku partitions again. So, this looks to be the correct fix for this issue.
comment:45 by , 5 months ago
Yes, it also works for me. I can install Haiku. After installation it does not boot up, but (probably) this is another problem.
comment:46 by , 5 months ago
I’ll do an install to the HDD later today and report back any issues, if any.
comment:47 by , 5 months ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fix merged in hrev57981. Thanks for testing!
comment:48 by , 5 months ago
Test image successfully installed to my existing HDD partition and rebooted without a KDL. Thank you!
I was puzzled by numbers and checked the joined sylog. Of course, it's between hrev57033 and hrev57038 but a lot happened during this period (Changes to XHCI, PCIe, usb-disk driver transition to new API..)... Could you precise where it hangs?