Opened 3 years ago

Closed 3 years ago

#17352 closed bug (fixed)

nvme_disk: Timed out waiting for interrupt [regression]

Reported by: DFergFLA Owned by: waddlesplash
Priority: normal Milestone: R1/beta4
Component: Drivers/Disk/NVMe Version: R1/beta3
Keywords: Cc:
Blocked By: #17334 Blocking:
Platform: All

Description

As of 10/21/21 my HP Envy will no longer boot Haiku. It stops at the HD icon. This laptop has been running Haiku for years with no booting issues. If I go into the boot options and select a system state from 10/20/21, the system boots fine. Anything newer than 10/20 doesn't boot.

Also, as of today there are two entries listed for "Haiku" in the EFI boot loader. They both have the exact same system state entries and I can use either one for booting. However, this laptop has only one drive, with only Haiku installed on one partition. Rebooting no longer works, as the boot loader complains that the boot drive is invalid. However, if I manually select the system state from 10/20 on either of the two entries, it boots fine.

Attachments (11)

syslog (475.1 KB ) - added by DFergFLA 3 years ago.
syslog
syslog.2 (158.0 KB ) - added by DFergFLA 3 years ago.
Syslog 2
20211024_150842.jpg (778.6 KB ) - added by DFergFLA 3 years ago.
boot
boot 2.jpg (500.7 KB ) - added by DFergFLA 3 years ago.
Boot 2
Log 1.jpg (4.4 MB ) - added by DFergFLA 3 years ago.
Log 1
Log 2.jpg (4.8 MB ) - added by DFergFLA 3 years ago.
Log 2
Log 3.jpg (768.6 KB ) - added by DFergFLA 3 years ago.
Log 3
1.jpg (56.3 KB ) - added by DFergFLA 3 years ago.
New Log 1
2.jpg (42.3 KB ) - added by DFergFLA 3 years ago.
New Log 2
3.jpg (40.0 KB ) - added by DFergFLA 3 years ago.
New Log 3
4.jpg (36.8 KB ) - added by DFergFLA 3 years ago.
New Log 4

Change History (32)

by DFergFLA, 3 years ago

Attachment: syslog added

syslog

comment:1 by waddlesplash, 3 years ago

In the syslog, I see two hrevs mentioned: hrev55507 and hrev55580. Which of these is the one you can boot successfully with?

comment:2 by waddlesplash, 3 years ago

Actually, unless I am misreading your syslog, there may be two haiku.hpkg active at once. That may be related here...

Can you enable the "on screen debug output" option in the bootloader (and choose "disable on-screen paging"), and take a picture of wherever output stops, on the unsuccessful boot?

comment:3 by DFergFLA, 3 years ago

@waddlesplash

OK, so by the power of the "Magic Reboot" I am now back to just my original problem. I no longer have two entries for Haiku.

I will upload a new syslog for you.

by DFergFLA, 3 years ago

Attachment: syslog.2 added

Syslog 2

by DFergFLA, 3 years ago

Attachment: 20211024_150842.jpg added

boot

comment:4 by DFergFLA, 3 years ago

Sorry for the quality of the picture. It's as good as I could get it. Just FYI, the messages on screen don't really stop. Every few seconds it just keeps going; it seems to be cycling through the same ones over and over.

comment:5 by waddlesplash, 3 years ago

Component: - General → Drivers/Disk/NVMe
Owner: changed from nobody to waddlesplash
Summary: Haiku won't boot → nvme_disk: Timed out waiting for interrupt [regression]

The problem is "nvme_disk: timed out waiting for interrupt".

From the successful boot:

KERN: PCI: [dom 0, bus  3] bus   3, device  0, function  0: vendor 144d, device a802, revision 01
KERN: PCI:   class_base 01, class_function 08, class_api 02
KERN: PCI:   vendor 144d: Samsung Electronics Co Ltd
KERN: PCI:   device a802: NVMe SSD Controller SM951/PM951 (PM963 2.5" NVMe PCIe SSD)
KERN: PCI:   info: Mass storage controller (Non-Volatile memory controller, NVM Ex
KERN: PCI:   line_size 10, latency 00, header_type 00, BIST 00
KERN: PCI:   ROM base host 00000000, pci 00000000, size 00000000
KERN: PCI:   cardbus_CIS 00000000, subsystem_id a801, subsystem_vendor_id 144d
KERN: PCI:   interrupt_line 0b, interrupt_pin 01, min_grant 00, max_latency 00
KERN: PCI:   base reg 0: host a1000000, pci a1000000, size 00004000, flags 04
KERN: PCI:   base reg 1: host 00000000, pci 00000000, size 00000000, flags 00
KERN: PCI:   base reg 2: host 00003000, pci 00003000, size 00000100, flags 01
KERN: PCI:   base reg 3: host 00000000, pci 00000000, size 00000000, flags 00
KERN: PCI:   base reg 4: host 00000000, pci 00000000, size 00000000, flags 00
KERN: PCI:   base reg 5: host 00000000, pci 00000000, size 00000000, flags 00
KERN: PCI:   Capabilities: PM, MSI, PCIe, MSI-X
KERN: PCI:   Extended capabilities: (empty list)
KERN: nvme_disk: attached to NVMe device "SAMSUNG MZVLV256HCHP-000H1 (S2CSNX0HA06325)"
KERN: nvme_disk:    maximum transfer size: 131072
KERN: nvme_disk:    qpair count: 8
KERN: slab memory manager: created area 0xffffffff90801000 (1874)
KERN: DMAResource@0xffffffff82c412f0: low/high 0/ffffffffffffffff, max segment count 126, align 4096, boundary 0, max transfer 131072, max segment size 18446744073709551615
KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 66
KERN: msi_allocate_vectors: allocated 1 vectors starting from 66
KERN: msix configured for 1 vectors
KERN: msi-x enabled: 0x8008
KERN: nvme_disk: using MSI-X
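
When comparing a good and a bad boot, it can help to pull just the nvme_disk lines out of a syslog. The following is a minimal Python sketch, not part of Haiku: the function name `summarize_nvme_log` is hypothetical, and it assumes only the message formats quoted in this ticket (a "nvme_disk: using ..." line and a timeout message containing "waiting for interrupt").

```python
import re

def summarize_nvme_log(text):
    """Return (interrupt_mode, timeout_count) for the nvme_disk lines in a syslog."""
    mode = None
    timeouts = 0
    for line in text.splitlines():
        if "nvme_disk:" not in line:
            continue
        # Keep only the text after the driver prefix.
        message = line.split("nvme_disk:", 1)[1].strip()
        match = re.match(r"using (\S+)", message)
        if match:
            mode = match.group(1)  # e.g. "MSI-X"
        elif "waiting for interrupt" in message.lower():
            timeouts += 1
    return mode, timeouts
```

Run over the successful-boot excerpt above, this reports the mode "MSI-X" and zero timeouts; a failing boot would instead show a nonzero timeout count.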

comment:6 by waddlesplash, 3 years ago

So this would have likely regressed after hrev55520 which disabled MSI-X support, as it was broken on QEMU (#17334) and also some real hardware (#15874). Your hardware seems to somehow require it, though; or at least, it claims to support MSIs, but maybe those do not work for some reason.
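
To illustrate why disabling MSI-X changes behavior on this device, here is a toy Python model, not the driver's actual code. The function `pick_interrupt_mode` and the preference order (MSI-X, then MSI, then legacy interrupts) are assumptions made for illustration; only the capability list is taken from the PCI log above.

```python
def pick_interrupt_mode(capabilities, enabled=("MSI-X", "MSI")):
    """Pick the first enabled mode that the device advertises, else legacy."""
    advertised = {cap.strip() for cap in capabilities.split(",")}
    for mode in enabled:
        if mode in advertised:
            return mode
    return "legacy"

# Capability list as printed in the PCI log for this controller.
caps = "PM, MSI, PCIe, MSI-X"
print(pick_interrupt_mode(caps))                    # MSI-X
print(pick_interrupt_mode(caps, enabled=("MSI",)))  # MSI
```

Under these assumptions, a device that advertises both capabilities moves from MSI-X to MSI once MSI-X is disabled, so any fault specific to the MSI path only becomes visible after that change.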

by DFergFLA, 3 years ago

Attachment: boot 2.jpg added

Boot 2

comment:7 by DFergFLA, 3 years ago

It eventually did stop. I have taken a picture of that also and added it. Just in case it contains more information for you.

comment:8 by DFergFLA, 3 years ago

@waddlesplash

OK, ok so MSI-X. That actually might explain why it doesn't boot on my HP Envy, but does on my ASRock J5005-ITX. So, I'll just have to wait for that problem to be solved.

If there is anything I can do on my end to help, please let me know.


comment:9 by waddlesplash, 3 years ago

It does help, actually; it indicates that the interrupts we (probably) want are getting triggered but on an unhandled vector. That is very strange.

Please see if you can boot with on-screen paging *enabled*, SMP *disabled*, and then page through the boot log output until you see a screenful that contains "nvme_disk:" in yellow like it is in your picture here, and then take a picture of that screen, and post it here.

comment:10 by waddlesplash, 3 years ago

Blocked By: 17334 added

comment:11 by DFergFLA, 3 years ago

OK, here are the settings I used:

  1. Under "Safe Mode" I enabled "Disable SMP"
  2. Under "Debug" I enabled "Enable On Screen Debug Output" and "Enable Debug Syslog"

There was a lot of stuff. I have taken 3 screenshots of all I could get.

P.S. Is there a way to write the entire debug output to a USB drive?

comment:12 by waddlesplash, 3 years ago

Only from the BIOS bootloader, not the EFI one, unfortunately.

by DFergFLA, 3 years ago

Attachment: Log 1.jpg added

Log 1

by DFergFLA, 3 years ago

Attachment: Log 2.jpg added

Log 2

by DFergFLA, 3 years ago

Attachment: Log 3.jpg added

Log 3

comment:13 by waddlesplash, 3 years ago

Unfortunately this does not have the most critical part, which is the section beginning "nvme_disk: attached to NVMe device ..." You probably need to use the on-screen paging method and look at each screen before going to the next one.

comment:14 by DFergFLA, 3 years ago

In all cases where I see "nvme_disk:" in yellow, it is followed by "Timeout waiting for interrupt!" It looks like this:

"nvme_disk: Timeout waiting for interrupt!"

I never see "nvme_disk: attached to NVMe device..."

comment:15 by waddlesplash, 3 years ago

You have to leave on-screen paging enabled, and then go through the syslog one page at a time.

comment:16 by DFergFLA, 3 years ago

From the very first time I see "nvme_disk:" through the next two to three dozen times, it always says the same thing: "timeout waiting for interrupt."


by DFergFLA, 3 years ago

Attachment: 1.jpg added

New Log 1

by DFergFLA, 3 years ago

Attachment: 2.jpg added

New Log 2

by DFergFLA, 3 years ago

Attachment: 3.jpg added

New Log 3

by DFergFLA, 3 years ago

Attachment: 4.jpg added

New Log 4

comment:17 by waddlesplash, 3 years ago

The "New Log 1" contains the relevant bit of data; for some reason it is not in yellow. So, it seems, we are indeed using MSI here, for which we get interrupts but on the wrong vector, it appears.

I will try to look into this more in the coming days, but I'm a little baffled as to what could be going wrong. I suspect it is very closely related to #17334, but how, I don't know.

comment:18 by DFergFLA, 3 years ago

I am just so happy we managed to get the bits of info you needed. My eyes were starting to go cross from reading that log file. Anyway, just let me know if there is anything else I can do.

Thanks

comment:19 by waddlesplash, 3 years ago

Please retest after hrev55586.

comment:20 by DFergFLA, 3 years ago

Thank you,

I am able to confirm that it is working again.

Thanks

comment:21 by waddlesplash, 3 years ago

Milestone: Unscheduled → R1/beta4
Resolution: fixed
Status: new → closed