Opened 10 years ago

Last modified 4 years ago

#11019 assigned bug

Boot fail with SATA card and drive attached but not used by Haiku — at Version 16

Reported by: jstressman Owned by: bonefish
Priority: normal Milestone: R1
Component: System/Boot Loader/BIOS Version: R1/Development
Keywords: boot-failure Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by pulkomandy)

I have an IO Crest SI-PEX40057 SATA III PCI-e card in my machine, and if I hook any hard drive to it, Haiku completely fails to boot and instead just reboots the machine as soon as it shows "Loading system" on the screen, before the graphical boot loader or anything else appears.

"IO Crest SATA III 4-Port PCI-e 2.0 x 2 Card with Marvell HyperDuo RAID Mode Support and Low Profile Brackets SI-PEX40057"

http://www.amazon.com/gp/product/B00AZ9T264/

This appeared after hrev46284 (which works). hrev46287 is affected and does not boot.

Merely unhooking the drive from the card "fixes" the problem and everything works fine.

I tried hooking up two different hard drives to the card, and both cause this problem, even though neither drive is the Haiku drive. These are just extra storage drives. (And everything works fine in Windows and Linux.)

Also, even with hrev46284 and earlier, if you try to enter the safe mode boot menu by holding down shift or space, this will cause a reboot before you load anything.

I went back to hrev45284 (100 revisions back) and still had the same problem, so I'm not sure how far back that issue goes (reboot when trying to enter safe mode menu with shift/space).

And again, just unhooking the drive from the card restores expected functionality and the system boots properly.

Change History (25)

comment:1 by waddlesplash, 10 years ago

Please attach:

  1. A screenshot of the kernel crash (or photo...)
  2. The output of "listdev" on the working revision
  3. The syslog from the working revision

comment:2 by jstressman, 10 years ago

  1. There is no kernel crash. It simply reboots.
  2. See attached listdev.txt
  3. See attached syslog.txt and syslog.old

by jstressman, 10 years ago

Attachment: syslog.old added

first "half" of log

by jstressman, 10 years ago

Attachment: syslog.txt added

end of the log.

by jstressman, 10 years ago

Attachment: listdev.txt added

comment:3 by waddlesplash, 10 years ago

Component: changed from System/Boot Loader to Drivers/Disk
Owner: changed from axeld to nobody

It'd be nice if this could be fixed by R1alpha5, but I'm not going to make it a blocker because chances are it won't be...

comment:4 by diver, 10 years ago

Component: changed from Drivers/Disk to System/Boot Loader
Owner: changed from nobody to bonefish
Status: changed from new to assigned

So hrev46287 seems to be the problematic commit. Assigning to Ingo.

comment:5 by bonefish, 10 years ago

Given that there was a preexisting triple fault I assume hrev46287 just shifted it due to a different memory use pattern (uses less heap and more raw memory regions).

The syslog from the successful boot doesn't show anything off AFAICT. It would be helpful to get the output of a failed boot.

Other than that, the only options I see ATM for narrowing the issue are 1. bisecting old revisions to find the one that introduced the original issue (assuming it hasn't been there forever) and 2. bisecting the boot loader code with an infinite loop to narrow down the location of the triple fault.

To get the boot output in cases where serial output isn't accessible, we could add a build time option to print it on screen instead.
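
Option 1 above amounts to a binary search over revision numbers. A minimal sketch, assuming failures are monotonic (every revision after the culprit also fails); `first_bad_revision` and the `fails` callback are hypothetical names, with the callback standing in for building and booting a given hrev by hand:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the first failing revision in (good, bad].
// Assumes `good` is known to boot, `bad` is known to fail, and that every
// revision after the culprit fails as well.
int32_t
first_bad_revision(int32_t good, int32_t bad,
	const std::function<bool(int32_t)>& fails)
{
	assert(good < bad);

	while (bad - good > 1) {
		// Test the midpoint and keep the half that still contains the
		// transition from "boots" to "fails".
		int32_t middle = good + (bad - good) / 2;
		if (fails(middle))
			bad = middle;
		else
			good = middle;
	}

	return bad;
}
```

The same halving applies to option 2: place the infinite loop halfway through the suspect code, and a hang instead of a reboot means the fault lies after that point.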

comment:6 by jstressman, 10 years ago

That was part of the problem I ran into on this. I don't have any serial output on this machine.

If there were a way to output some debugging info to the screen that might help, as it reboots so quickly that I'm not sure how far it even gets into the boot process. It says "Loading system" in the upper left corner and reboots immediately, before you can get into the safe mode menu, or see anything else come up on screen.

I'm definitely open to work with someone on getting it figured out.

I could try to dig back and figure out where the shift/space for safe mode menu reboot came in, but I'm not sure we have official nightly revisions far back enough to find out. So I might potentially be doing a lot of custom builds. ;)

I'm trying to put together a machine with both PCI-e and a serial port on it right now so that I can stick that SATA card in it and see if I can reproduce the problem. If so, then I can try running a null modem cable to another machine with a serial port I have to try collecting any potential debug info.

I'll report back if/when I have any luck and we can potentially go from there.

comment:7 by jstressman, 10 years ago

I purchased a null modem cable and built a test machine for Haiku that has both a serial out on the board, and PCI-e for the SATA card. I purchased an ATEN UC232A USB to PDA/Serial (DB9) Adapter and used that on this machine to receive the debugging output.

After swapping the SATA card into that machine, I see the same almost instant reboot behavior with hrev46677.

I'm attaching the output of the serial debugging both with a drive attached, and without.

For the 2nd log without the drive attached, the crash at the end is unrelated and is documented in ticket #11040

by jstressman, 10 years ago

Attachment: serial-log-SATAbug1.txt added

log with SATA card in with a drive attached to it.

by jstressman, 10 years ago

Attachment: serial-log-SATAbug2.txt added

log with SATA card in WITHOUT a drive attached to it. crash at the end is separate, see ticket #11040

comment:8 by jstressman, 10 years ago

Alpha 4.1 reboots immediately when trying to enter safe mode menu (like hrev46284 and earlier). Alpha 3 reboots immediately when trying to enter safe mode menu. Alpha 2 reboots immediately when trying to enter safe mode menu. Alpha 1 reboots immediately when trying to enter safe mode menu.

There is no output to the log before the reboot happens on Alpha2 through 4.1, but Alpha1 actually does output to the log before it reboots. I'm attaching that.

(I used shift to enter the safe mode menu on Alpha 2 through 4.1, but had to use space on Alpha 1. I assume shift hadn't been added as a usable key yet at that point?)

by jstressman, 10 years ago

Attachment: serial-log-alpha1-1.txt added

log with SATA card in with a drive attached to it, attempting to enter the safe mode menu and triggering an immediate reboot on R1Alpha1.

comment:9 by korli, 10 years ago

Related to bug1 (with a drive attached), could you build with TRACE_DEVICES enabled at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n17 ?

comment:10 by jstressman, 10 years ago

korli: enabled TRACE_DEVICES in said file, installed the package...

jam haiku_loader.hpkg
pkgman install [dir to file]/haiku_loader.hpkg

Rebooted, but unless I did something wrong it looks like it reboots before it even gets a chance to output anything meaningful to the log?

by jstressman, 10 years ago

Attachment: serial-log-pkgupdate3.txt added

reboot after enabling TRACE_DEVICES in said file and recompiling and installing haiku_loader package.

comment:11 by jstressman, 10 years ago

TRACE_DEVICES didn't seem to help, so on the advice of Diver and jessicah I added a bunch of dprintfs to help narrow down the problem point.

from src/system/boot/platform/bios_ia32/devices.cpp - line 521

static status_t
add_block_devices(NodeList *devicesList, bool identifierMissing)
{
	if (sBlockDevicesAdded)
		return B_OK;

	uint8 driveCount;
	if (get_number_of_drives(&driveCount) != B_OK)
		return B_ERROR;

dprintf("## debug 0\n");

	dprintf("number of drives: %d\n", driveCount);

dprintf("## debug 1\n");

	for (int32 i = 0; i < driveCount; i++) {
dprintf("## debug 2\n");
		uint8 driveID = i + 0x80;
dprintf("## debug 3\n");
		if (driveID == gBootDriveID) {
dprintf("## debug 4\n");
			continue;
dprintf("## debug 5\n");
		}
		BIOSDrive *drive = new(nothrow) BIOSDrive(driveID);
dprintf("## debug 6\n");
		if (drive->InitCheck() != B_OK) {
dprintf("## debug 7\n");
			dprintf("could not add drive %u\n", driveID);
dprintf("## debug 8\n");
			delete drive;
dprintf("## debug 9\n");
			continue;
dprintf("## debug 10\n");
		}
dprintf("## debug 11\n");

		// Only add usable drives
		if (is_drive_readable(drive)) {
dprintf("## debug 12\n");
			devicesList->Add(drive);
dprintf("## debug 13\n");
		} else {
dprintf("## debug 14\n");
			dprintf("could not read from drive %" B_PRIu8 ", not adding\n", driveID);
			delete drive;
			continue;
		}
dprintf("## debug 15\n");

		if (drive->FillIdentifier() != B_OK) {
			// braces keep the assignment conditional despite the added dprintf
dprintf("## debug 16\n");
			identifierMissing = true;
		}
	}

dprintf("before devicesList if statement\n");

	if (identifierMissing) {
		// we cannot distinguish between all drives by identifier, we need
		// compute checksums for them
dprintf("inside devicesList if statement - before find_unique_check_sums\n");
		find_unique_check_sums(devicesList);
dprintf("inside devicesList if statement - after find_unique_check_sums\n");
	}

dprintf("after devicesList if statement\n");

	sBlockDevicesAdded = true;
	return B_OK;
}

section of the log where it works fine and keeps booting

Using mode 0x118
VESA compatible graphics!
EDID1: 4f
EDID2: ebx 0
Welcome to the Haiku boot loader!
## debug 0
number of drives: 1
## debug 1
## debug 2
## debug 3
## debug 4
before devicesList if statement
inside devicesList if statement - before find_unique_check_sums
inside devicesList if statement - after find_unique_check_sums
after devicesList if statement
add_partitions_for(0x001053b8, mountFS = no)
add_partitions_for(fd = 0, mountFS = no)
0x00105520 Partition::Partition
0x00105520 Partition::Scan()
check for partitioning_system: GUID Partition Map
check for partitioning_system: Intel Partition Map
  priority: 810
check for partitioning_system: Intel Extended Partition
0x00105698 Partition::Partition
0x00105520 Partition::AddChild 0x00105698
0x00105698 Partition::SetParent 0x00105520
new child partition!
0x00105770 Partition::Partition

... keeps booting fine ...

end of log where it reboots

Using mode 0x118
VESA compatible graphics!
EDID1: 4f
EDID2: ebx 0
Welcome to the Haiku boot loader!
## debug 0
number of drives: 2
## debug 1
## debug 2
## debug 3
## debug 4
## debug 2
## debug 3
## debug 6
## debug 11

comment:12 by korli, 10 years ago

It seems like you should do the same in BIOSDrive::ReadAt()...

comment:13 by jstressman, 10 years ago

My mistake. I mixed this ticket up with another (USB bug where I'd enabled it).

Enabled now, and uncommented the trace at BIOSDrive::ReadAt

Building now.

comment:14 by jstressman, 10 years ago

Ok, attaching a log of a normal boot without the drive attached, and one with the drive attached with the reboot problem.

Same changes as above, but with TRACE_DEVICES enabled at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n17 and the 2 lines for TRACE specifically for BIOSDrive::ReadAt uncommented at http://cgit.haiku-os.org/haiku/tree/src/system/boot/platform/bios_ia32/devices.cpp#n636

by jstressman, 10 years ago

normal boot without drive attached.

by jstressman, 10 years ago

boot with the drive attached where it reboots itself.

comment:15 by pulkomandy, 10 years ago

Description: modified (diff)

We debugged this over IRC earlier today, and we found that the crash occurs in the int13 call to read a sector from the disk. All previous operations with int13 (detecting the availability of the "extended read" feature, and reading the drive capacity) work fine, but trying to read a sector from the disk will reboot the machine. Reading from other drives in the system that are not attached to the SATA card works fine. So it could be a bug in the card's BIOS, or a problem with the parameters we give to it.
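
For reference, the parameters of an int13 extended read (AH=42h) are passed in a 16-byte disk address packet. The sketch below shows that standard layout only; the struct, its field names and `make_read_packet` are illustrative assumptions and are not taken from devices.cpp:

```cpp
#include <cassert>
#include <cstdint>

// Disk address packet for INT 13h, AH=42h (extended read), following the
// BIOS Enhanced Disk Drive layout. Names are illustrative, not Haiku's.
struct DiskAddressPacket {
	uint8_t		size;			// size of this packet in bytes (16)
	uint8_t		reserved;		// must be 0
	uint16_t	sectorCount;	// number of sectors to transfer
	uint16_t	bufferOffset;	// real-mode offset of the transfer buffer
	uint16_t	bufferSegment;	// real-mode segment of the transfer buffer
	uint64_t	lba;			// first sector to read
};

static_assert(sizeof(DiskAddressPacket) == 16, "packet must be 16 bytes");

// Hypothetical helper: builds a packet that reads `count` sectors starting
// at `lba` into a buffer at `linearBuffer`, which must lie below 1 MiB so
// that it can be expressed as a real-mode segment:offset pair.
DiskAddressPacket
make_read_packet(uint64_t lba, uint16_t count, uint32_t linearBuffer)
{
	assert(linearBuffer < 0x100000);

	DiskAddressPacket packet = {};
	packet.size = sizeof(DiskAddressPacket);
	packet.sectorCount = count;
	packet.bufferOffset = static_cast<uint16_t>(linearBuffer & 0xf);
	packet.bufferSegment = static_cast<uint16_t>(linearBuffer >> 4);
	packet.lba = lba;
	return packet;
}
```

Comparing the values actually passed for the failing drive against this layout (packet size, sector count, buffer address, LBA) is one way to rule out the "parameters we give to it" half of the question.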

comment:16 by pulkomandy, 10 years ago

Description: modified (diff)

A bug with Linux on the same chipset: https://bugzilla.kernel.org/show_bug.cgi?id=42679

Does disabling VT-d in the BIOS help? (if your computer can do that)

Also, we got another user reporting this on IRC today with a SATA card using the same chipset. This points towards a hardware issue with that chipset/BIOS, as we have had no reports of this happening with other hardware.
