Opened 10 years ago
Last modified 4 years ago
#11019 assigned bug
Boot fail with SATA card and drive attached but not used by Haiku
Reported by: | jstressman | Owned by: | bonefish |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/Boot Loader/BIOS | Version: | R1/Development |
Keywords: | boot-failure | Cc: | |
Blocked By: | | Blocking: | #13528 |
Platform: | All |
Description (last modified by )
I have an IO Crest SI-PEX40057 SATA III PCI-e card in my machine, and if I hook any hard drive to it, Haiku completely fails to boot and instead just reboots the machine as soon as it shows "Loading system" on the screen, before the graphical boot loader or anything else appears.
"IO Crest SATA III 4-Port PCI-e 2.0 x 2 Card with Marvell HyperDuo RAID Mode Support and Low Profile Brackets SI-PEX40057"
http://www.amazon.com/gp/product/B00AZ9T264/
This appeared after hrev46284 (which works). hrev46287 is affected and does not boot.
Merely unhooking the drive from the card "fixes" the problem and everything works fine.
I tried hooking up two different hard drives to the card, and both cause this problem, even though neither drive is the Haiku drive. These are just extra storage drives. (And everything works fine in Windows and Linux.)
Also, even with hrev46284 and earlier, if you try to enter the safe mode boot menu by holding down shift or space, this will cause a reboot before you load anything.
I went back to hrev45284 (100 revisions back) and still had the same problem, so I'm not sure how far back that issue goes (reboot when trying to enter safe mode menu with shift/space).
And again, just unhooking the drive from the card restores expected functionality and the system boots properly.
Attachments (9)
Change History (34)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
- There is no kernel crash. It simply reboots.
- See attached listdev.txt
- See attached syslog.txt and syslog.old
by , 10 years ago
Attachment: | listdev.txt added |
---|
comment:3 by , 10 years ago
Component: | System/Boot Loader → Drivers/Disk |
---|---|
Owner: | changed from | to
It'd be nice if this could be fixed by R1alpha5, but I'm not going to make it a blocker because chances are it won't be...
comment:4 by , 10 years ago
Component: | Drivers/Disk → System/Boot Loader |
---|---|
Owner: | changed from | to
Status: | new → assigned |
So hrev46287 seems to be the problematic commit. Assigning to Ingo.
comment:5 by , 10 years ago
Given that there was a preexisting triple fault I assume hrev46287 just shifted it due to a different memory use pattern (uses less heap and more raw memory regions).
The syslog from the successful boot doesn't show anything off AFAICT. It would be helpful to get the output of a failed boot.
Other than that, the only options I see ATM for narrowing down the issue are: 1. bisecting old revisions to find the one that introduced the original issue (assuming it hasn't been there forever), and 2. bisecting the boot loader code with an infinite loop to narrow down the location of the triple fault.
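The second option amounts to a binary search over checkpoints in the loader: place an infinite loop at a midpoint, boot, and see whether the machine hangs there (the fault lies later) or reboots (the fault lies earlier). A minimal sketch of that search, with the real boot replaced by a hypothetical callback; none of these names exist in Haiku, and it assumes the fault is deterministic:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical oracle: returns true if the machine reboots when an infinite
// loop (e.g. `for (;;) {}`) is placed at checkpoint `index`, meaning the
// triple fault happens before execution reaches that checkpoint.
using RebootsBefore = std::function<bool(std::size_t index)>;

// Binary search for the first checkpoint that is never reached.
// Assumes checkpoints are numbered in execution order.
// Returns `count` if every checkpoint is reached.
std::size_t
FirstUnreachedCheckpoint(std::size_t count, const RebootsBefore& rebootsBefore)
{
	assert(rebootsBefore);     // an empty callback cannot be queried

	std::size_t low = 0;       // every checkpoint below `low` is reached
	std::size_t high = count;  // every checkpoint from `high` on is not
	while (low < high) {
		std::size_t middle = low + (high - low) / 2;
		if (rebootsBefore(middle))
			high = middle;     // rebooted: the fault is before `middle`
		else
			low = middle + 1;  // hung at `middle`: the fault is later
	}
	return low;
}
```

Each probe costs one rebuild and one boot, so about log2(count) boots narrow the fault down to the code between two adjacent checkpoints.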
To get the boot output in cases where serial output isn't accessible, we could add a build time option to print it on screen instead.
comment:6 by , 10 years ago
That was part of the problem I ran into on this. I don't have any serial output on this machine.
If there were a way to output some debugging info to the screen that might help, as it reboots so quickly that I'm not sure how far it even gets into the boot process. It says "Loading system" in the upper left corner and reboots immediately, before you can get into the safe mode menu, or see anything else come up on screen.
I'm definitely open to work with someone on getting it figured out.
I could try to dig back and figure out where the shift/space safe mode menu reboot came in, but I'm not sure we have official nightly revisions from far enough back to find out. So I might potentially be doing a lot of custom builds. ;)
I'm trying to put together a machine with both PCI-e and a serial port on it right now so that I can stick that SATA card in it and see if I can reproduce the problem. If so, then I can try running a null modem cable to another machine with a serial port I have to try collecting any potential debug info.
I'll report back if/when I have any luck and we can potentially go from there.
comment:7 by , 10 years ago
I purchased a null modem cable and built a test machine for Haiku that has both a serial out on the board, and PCI-e for the SATA card. I purchased an ATEN UC232A USB to PDA/Serial (DB9) Adapter and used that on this machine to receive the debugging output.
After swapping the SATA card into that machine, I see the same almost instant reboot behavior with hrev46677.
I'm attaching the output of the serial debugging both with a drive attached, and without.
For the 2nd log without the drive attached, the crash at the end is unrelated and is documented in ticket #11040
by , 10 years ago
Attachment: | serial-log-SATAbug1.txt added |
---|
log with SATA card in with a drive attached to it.
by , 10 years ago
Attachment: | serial-log-SATAbug2.txt added |
---|
log with SATA card in WITHOUT a drive attached to it. crash at the end is separate, see ticket #11040
comment:8 by , 10 years ago
Alpha 4.1, Alpha 3, Alpha 2 and Alpha 1 all reboot immediately when trying to enter the safe mode menu (like hrev46284 and earlier).
There is no output to the log before the reboot happens on Alpha 2 through 4.1, but Alpha 1 actually does output to the log before it reboots. I'm attaching that.
(I used shift to enter the safe mode menu on Alpha 2 through 4.1, but had to use space on Alpha 1. I assume shift hadn't been added as a usable key yet at that point?)
by , 10 years ago
Attachment: | serial-log-alpha1-1.txt added |
---|
log with SATA card in with a drive attached to it, attempting to enter the safe mode menu and triggering an immediate reboot on R1Alpha1.
comment:9 by , 10 years ago
Related to bug1 (with a drive attached), could you build with TRACE_DEVICES enabled at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n17 ?
comment:10 by , 10 years ago
korli: enabled TRACE_DEVICES in said file, installed the package...
jam haiku_loader.hpkg
pkgman install [dir to file]/haiku_loader.hpkg
Rebooted, but unless I did something wrong it looks like it reboots before it even gets a chance to output anything meaningful to the log?
by , 10 years ago
Attachment: | serial-log-pkgupdate3.txt added |
---|
reboot after enabling TRACE_DEVICES in said file and recompiling and installing haiku_loader package.
comment:11 by , 10 years ago
TRACE_DEVICES didn't seem to help, so on the advice of Diver and jessicah I added a bunch of dprintfs to help narrow down the problem point.
from src/system/boot/platform/bios_ia32/devices.cpp - line 521
static status_t
add_block_devices(NodeList *devicesList, bool identifierMissing)
{
	if (sBlockDevicesAdded)
		return B_OK;

	uint8 driveCount;
	if (get_number_of_drives(&driveCount) != B_OK)
		return B_ERROR;

	dprintf("## debug 0\n");
	dprintf("number of drives: %d\n", driveCount);
	dprintf("## debug 1\n");

	for (int32 i = 0; i < driveCount; i++) {
		dprintf("## debug 2\n");
		uint8 driveID = i + 0x80;
		dprintf("## debug 3\n");
		if (driveID == gBootDriveID) {
			dprintf("## debug 4\n");
			continue;
			dprintf("## debug 5\n");
		}

		BIOSDrive *drive = new(nothrow) BIOSDrive(driveID);
		dprintf("## debug 6\n");
		if (drive->InitCheck() != B_OK) {
			dprintf("## debug 7\n");
			dprintf("could not add drive %u\n", driveID);
			dprintf("## debug 8\n");
			delete drive;
			dprintf("## debug 9\n");
			continue;
			dprintf("## debug 10\n");
		}
		dprintf("## debug 11\n");

		// Only add usable drives
		if (is_drive_readable(drive)) {
			dprintf("## debug 12\n");
			devicesList->Add(drive);
			dprintf("## debug 13\n");
		} else {
			dprintf("## debug 14\n");
			dprintf("could not read from drive %" B_PRIu8 ", not adding\n", driveID);
			delete drive;
			continue;
		}
		dprintf("## debug 15\n");

		if (drive->FillIdentifier() != B_OK) {
			dprintf("## debug 16\n");
			identifierMissing = true;
		}
	}

	dprintf("before devicesList if statement\n");
	if (identifierMissing) {
		// we cannot distinguish between all drives by identifier, we need
		// compute checksums for them
		dprintf("inside devicesList if statement - before find_unique_check_sums\n");
		find_unique_check_sums(devicesList);
		dprintf("inside devicesList if statement - after find_unique_check_sums\n");
	}
	dprintf("after devicesList if statement\n");

	sBlockDevicesAdded = true;
	return B_OK;
}
section of the log where it works fine and keeps booting
Using mode 0x118
VESA compatible graphics!
EDID1: 4f
EDID2: ebx 0
Welcome to the Haiku boot loader!
## debug 0
number of drives: 1
## debug 1
## debug 2
## debug 3
## debug 4
before devicesList if statement
inside devicesList if statement - before find_unique_check_sums
inside devicesList if statement - after find_unique_check_sums
after devicesList if statement
add_partitions_for(0x001053b8, mountFS = no)
add_partitions_for(fd = 0, mountFS = no)
0x00105520 Partition::Partition
0x00105520 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
priority: 810
check for partitioning_system: Intel Extended Partition
0x00105698 Partition::Partition
0x00105520 Partition::AddChild 0x00105698
0x00105698 Partition::SetParent 0x00105520
new child partition!
0x00105770 Partition::Partition
... keeps booting fine ...
end of log where it reboots
Using mode 0x118
VESA compatible graphics!
EDID1: 4f
EDID2: ebx 0
Welcome to the Haiku boot loader!
## debug 0
number of drives: 2
## debug 1
## debug 2
## debug 3
## debug 4
## debug 2
## debug 3
## debug 6
## debug 11
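Reading the failing log against the instrumented function: the run prints marker 11 for the second drive and then neither 12 nor 14, so execution is lost inside the is_drive_readable(drive) call. A small sketch of that log check, in case it helps when comparing several captures; LastDebugMarker is a hypothetical helper and not part of Haiku:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the number of the last "## debug N" marker found in `log`,
// or -1 if the log contains no such marker.
int
LastDebugMarker(const std::string& log)
{
	static const std::string kPrefix = "## debug ";

	std::size_t position = log.rfind(kPrefix);
	if (position == std::string::npos)
		return -1;

	position += kPrefix.size();
	assert(position <= log.size());  // the prefix was found inside the log

	// Parse the decimal digits that follow the prefix.
	int marker = 0;
	bool sawDigit = false;
	while (position < log.size()
		&& log[position] >= '0' && log[position] <= '9') {
		marker = marker * 10 + (log[position] - '0');
		sawDigit = true;
		position++;
	}
	return sawDigit ? marker : -1;
}
```

Fed the tail of the failing log, this returns 11, which matches the last dprintf before the is_drive_readable() check.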
comment:13 by , 10 years ago
My mistake. I mixed this ticket up with another (USB bug where I'd enabled it).
Enabled now, and uncommented the trace at BIOSDrive::ReadAt
Building now.
comment:14 by , 10 years ago
Ok, attaching a log of a normal boot without the drive attached, and one with the drive attached with the reboot problem.
Same changes as above, but with TRACE_DEVICES enabled at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n17 and the 2 lines for TRACE specifically for BIOSDrive::ReadAt uncommented at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n636
by , 10 years ago
Attachment: | serial-log-dprintf-extrainfo-trace1.txt added |
---|
normal boot without drive attached.
by , 10 years ago
Attachment: | serial-log-dprintf-extrainfo-trace2.txt added |
---|
boot with the drive attached where it reboots itself.
comment:15 by , 10 years ago
Description: | modified (diff) |
---|
We debugged this over IRC earlier today, and we found that the crash occurs in the int13 call to read a sector from the disk. All previous int13 operations (detecting the availability of the "extended read" feature, and reading the drive capacity) work fine, but trying to read a sector from the disk reboots the machine. Working with other drives in the system which are not attached to the SATA card works fine. So it could be a bug in the card's BIOS, or a problem with the parameters we give to it.
comment:16 by , 10 years ago
Description: | modified (diff) |
---|
A bug with Linux on the same chipset: https://bugzilla.kernel.org/show_bug.cgi?id=42679
Does disabling VT-d in the BIOS help? (if your computer can do that)
Also, another user reported this on IRC today with a SATA card using the same chipset. This points towards a hardware issue with that chipset/BIOS, as we have had no reports of this happening with other hardware.
comment:17 by , 10 years ago
Blocking: | 7665 added |
---|
comment:18 by , 7 years ago
Blocking: | 13528 added |
---|
comment:19 by , 7 years ago
This one raised its ugly head in #13528.
Intel Skylake Celeron 3855U
I added some debugging around these BIOS calls:
Additional Video Mode (1920x1080@60Hz):
clock=148.5 MHz
h: (1920, 2008, 2052, 2200)
v: (1080, 1084, 1089, 1125)
size: 51 cm x 28.699 cm
border: 0 cm x 0 cm
Horizontal frequency range = 30..80 kHz
Vertical frequency range = 50..75 Hz
Maximum pixel clock = 160 MHz
Serial Number: LNZ080024237
Monitor Name: Acer S231HL
crtc: h 2008/2052/2200, v 1084/1089/1125, pixel clock 148500000, refresh 6026
Welcome to the Haiku boot loader!
boot drive ID: 80
drive ID 128
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x4100, ebx: 0x55aa, ecx: 0x13fca0, edx: 0x80, esi: 0x0, edi: 0x0, es: 0x0, flags: 0x0
BIOS(13h): Set debug BIOS IDT
checking extensions: carry: 0; ebx: 0x0000aa55; ecx: 0x00130005
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x4800, ebx: 0xaa55, ecx: 0x130005, edx: 0x80, esi: 0x20, edi: 0x0, es: 0x0, flags: 0x286
BIOS(13h): Set debug BIOS IDT
size: 1e
drive_path_signature: 0
host bus: "", interface: ""
cylinders: 942, heads: 255, sectors: 63, bytes_per_sector: 512
total sectors: 15133248
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x800, ebx: 0x0, ecx: 0x2960000, edx: 0x80, esi: 0x5daec, edi: 0x0, es: 0x0, flags: 0x10
BIOS(13h): Set debug BIOS IDT
number of drives: 2
drive ID 129
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x4100, ebx: 0x55aa, ecx: 0x13fbe0, edx: 0x81, esi: 0xfc79, edi: 0x0, es: 0x0, flags: 0x0
BIOS(13h): Set debug BIOS IDT
checking extensions: carry: 0; ebx: 0x0000aa55; ecx: 0x00130001
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x4800, ebx: 0xaa55, ecx: 0x130001, edx: 0x81, esi: 0x20, edi: 0x0, es: 0x0, flags: 0x246
BIOS(13h): Set debug BIOS IDT
size: 1e
drive_path_signature: 0
host bus: "", interface: ""
cylinders: 16383, heads: 16, sectors: 63, bytes_per_sector: 512
total sectors: 117231408
BIOS reads 512 bytes from 0 (offset = 0), drive 129
BIOS(13h): Restore BIOS IDT
BIOS(13h): eax: 0x4200, ebx: 0x105441, ecx: 0xfe34, edx: 0x81, esi: 0x20, edi: 0xaa55, es: 0x1, flags: 0x13
<rebootsky>
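For anyone reading the register dumps: the INT 13h function number is the AH byte (bits 8 to 15 of eax), and edx carries the BIOS drive number, where 0x80 is the first hard disk and 0x81 the second. Functions 0x41 and 0x48 return for both drives and 0x08 returns the drive count; the log ends right after the 0x42 call for drive 0x81. A small sketch for decoding the dump; the helper names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Extracts AH (bits 8..15 of EAX), which selects the INT 13h function.
constexpr uint8_t
Int13Function(uint32_t eax)
{
	return static_cast<uint8_t>((eax >> 8) & 0xff);
}

// Names for the INT 13h functions that appear in the log.
const char*
Int13FunctionName(uint8_t function)
{
	switch (function) {
		case 0x08: return "get drive parameters";
		case 0x41: return "check extensions present";
		case 0x42: return "extended read sectors";
		case 0x48: return "extended get drive parameters";
		default:   return "unknown";
	}
}

// BIOS hard disk numbers start at 0x80; returns the zero-based disk index.
uint8_t
HardDiskIndex(uint8_t driveID)
{
	assert(driveID >= 0x80);
	return static_cast<uint8_t>(driveID - 0x80);
}

static_assert(Int13Function(0x4200) == 0x42, "AH is the second byte of EAX");
```

So the last line before the reboot, eax 0x4200 with edx 0x81, is an extended read on the second hard disk.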
comment:23 by , 7 years ago
An idea to try: set "packet->size" to 16 (instead of sizeof(disk_address_packet)) at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n651
disk_address_packet is defined with an additional field flat_buffer, which is unused anyway.
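For context on this suggestion: the EDD disk address packet passed to INT 13h function 0x42 is 16 bytes in its basic form, and its first byte tells the BIOS how large the packet is. EDD 3.0 added an optional 64-bit flat buffer address, which grows the packet to 24 bytes. A sketch of the two layouts follows; the struct, field and function names are illustrative and not copied from Haiku's devices.cpp:

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)
// Basic 16-byte disk address packet.
struct BasicAddressPacket {
	uint8_t  size;           // size of this packet in bytes
	uint8_t  reserved;       // must be 0
	uint16_t blockCount;     // number of sectors to transfer
	uint16_t bufferOffset;   // real-mode offset of the transfer buffer
	uint16_t bufferSegment;  // real-mode segment of the transfer buffer
	uint64_t lba;            // starting logical block address
};

// Packet with the optional EDD 3.0 64-bit flat buffer address appended.
struct ExtendedAddressPacket {
	BasicAddressPacket basic;
	uint64_t flatBuffer;
};
#pragma pack(pop)

static_assert(sizeof(BasicAddressPacket) == 16, "basic packet is 16 bytes");
static_assert(sizeof(ExtendedAddressPacket) == 24, "extended packet is 24 bytes");

// Builds a read packet that advertises only the basic 16-byte form.
BasicAddressPacket
MakeReadPacket(uint16_t segment, uint16_t offset, uint64_t lba,
	uint16_t blockCount)
{
	assert(blockCount > 0);

	BasicAddressPacket packet = {};
	packet.size = sizeof(BasicAddressPacket);
	packet.blockCount = blockCount;
	packet.bufferOffset = offset;
	packet.bufferSegment = segment;
	packet.lba = lba;
	return packet;
}
```

If sizeof() of the packet structure includes the flat buffer field, the size byte reports 24 rather than 16. Whether this particular option ROM mishandles that is only a guess here; the ticket does not confirm it.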
comment:24 by , 6 years ago
Keywords: | boot-failure added |
---|
comment:25 by , 6 years ago
Blocking: | 7665 removed |
---|
comment:26 by , 6 years ago
Reproduced this on a Gigabyte motherboard from ~2011 or so with hardware RAID enabled. Removing the "flat_buffer" field doesn't seem to affect anything, but I also can't get the serial log at present...
comment:27 by , 6 years ago
Posted the removal of flat_buffer as https://review.haiku-os.org/c/haiku/+/1145. It doesn't seem to break anything, anyway.
comment:28 by , 4 years ago
Component: | System/Boot Loader → System/Boot Loader/BIOS |
---|
Please attach: