#10880 closed bug (fixed)
haiku_loader can hang for minutes when one or more BIOS reported drives are unreadable
Reported by: | MasterM | Owned by: | pdziepak |
---|---|---|---|
Priority: | high | Milestone: | R1/beta1 |
Component: | System/Boot Loader | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | x86 |
Description
The problem is caused by find_unique_check_sums() that tries to avoid checksum clash for a maximum of 200 iterations. If one or more drives report read errors during that loop it can take a very long time to complete.
Setting maximum tries to a low value would have a side effect of potentially increased change of collision. Better solution is to check if we can read from a drive before adding it to the block device list.
I marked this issue as high priority because the boot loader is a critical part of an operating system. Feel free to change the priority to something lower if necessary.
Attachments (1)
Change History (15)
comment:1 by , 10 years ago
patch: | 0 → 1 |
---|
comment:2 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 10 years ago
Owner: | changed from | to
---|
comment:4 by , 10 years ago
Milestone: | Unscheduled → R1/alpha5 |
---|
follow-up: 7 comment:5 by , 10 years ago
comment:6 by , 10 years ago
I say we use GitHub, but manually pull from repositories and don't use their automerge feature.
follow-up: 8 comment:7 by , 10 years ago
Replying to pdziepak:
- Why try to read 512 bytes? Wouldn't 1 byte be enough?
BIOSDrive::ReadAt uses BIOS interrupt 13h for disk access. You can only read a multiple of sector size that way and since there are virtually no devices with sector size less than 512 bytes, there is really no performance difference when reading anything <=512 bytes.
Since find_unique_check_sums() uses chunks of 512b anyway I figured it would be good heuristic to check if one could read at least chunk that big without any explicit read errors.
- Format string,
driveID
isuint8
notunsigned
.
On most compilers driveID should be promoted to unsigned int automatically. Should I add explicit cast there?
- That's probably out of scope of this patch but if I understand correctly (basing on th description of this ticket) we should try some perfect hashing here (without any bound on number of attempts) that guarantees generation of collision-free hash table in O(n). I admit though that I haven't fully read and understood the code in question yet, so this suggestion may be wrong.
Well there are really two seperate problems with that piece of code. One of them is indeed a simple sum used as a poor's man hashing algorithm (a good candidate for replacement would be crc32 for example) and the other one is that it tries to read random sectors from a drive in an attempt to uniquely identify it. So if two drives contain exactly the same data, then it will fail anyway after max tries. This scenario is of course highly unlikely but nevertheless possible.
It is worth noting that this algorithm only runs if haiku_loader is unable to get drive identification data from BIOS (which happens quite often on real hardware but I haven't looked at that code so can't say why).
comment:8 by , 10 years ago
Replying to MasterM:
Replying to pdziepak:
- Why try to read 512 bytes? Wouldn't 1 byte be enough?
BIOSDrive::ReadAt uses BIOS interrupt 13h for disk access. You can only read a multiple of sector size that way and since there are virtually no devices with sector size less than 512 bytes, there is really no performance difference when reading anything <=512 bytes.
is_drive_readable
doesn't have to care about the implementation of BIOSDrive::ReadAt
. All we need to do is to check whether read from the device are successful - 1 byte is enough for that purpose.
- Format string,
driveID
isuint8
notunsigned
.On most compilers driveID should be promoted to unsigned int automatically. Should I add explicit cast there?
Actually, it is promoted to int
(assuming sizeof(int) > 1
), but that doesn't really matter. It is a good practice to use PRI*
macros whenever trying to print [u]int*_t
(well, in this case it would be B_PRI*
and [u]int*
.
follow-up: 10 comment:9 by , 10 years ago
I should have mentioned earlier that this patch really needs a proper description. Something similar to what you wrote in this bug report should be good (reference "fixes #...." is not enough).
Also, because of C++11 user-defined literals there has to be space between "
and B_PRI
. Obviously, we like symmetry very much so you should also add space after B_PRI
.
The comments seem a bit redundant, but it is not very important so it can stay the way it is if you want.
comment:10 by , 10 years ago
Replying to pdziepak:
I should have mentioned earlier that this patch really needs a proper description. Something similar to what you wrote in this bug report should be good (reference "fixes #...." is not enough).
It already does :) That is the second line, not the first.
by , 10 years ago
Attachment: | 0001-haiku_loader-now-ignores-unusable-drives-reported-by.patch added |
---|
Ok, let's try again. :)
comment:11 by , 10 years ago
Owner: | changed from | to
---|
In newly added code we prefer BIOSDrive* drive
over BIOSDrive *drive
. Please stick to that in future, otherwise you risk that your committer will embarrass themselves by failing to use git. ;)
Anyway, merged in hrev47279. Thanks!
comment:12 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:13 by , 10 years ago
Some general remarks (just got aware of this ticket):
- The hash value is only used to identify disks later on in the kernel, if there is no other mapping available (most BIOS implementations do not provide that info, unfortunately). It doesn't really need to be a strong hashing mechanism, as collisions are still not very likely.
- If the disks are identical, this algorithm will indeed fail, but then it does not really matter, anyway ;-)
- The real problem in the randomness, however, is that it may fail for non-identical drives, too. That may be an actual problem, however unlikely. The only way to improve that would be to smartly choose the blocks, but that would require intimidate knowledge of the underlying file system. And may also fail unless you read in the complete disk.
- As discussed on the ML a short time ago, we use Trac for tickets to be able to not lose patches. Whether or not this is a good idea might be debatable, but IMO for simple bug fixes it's not really important to have the committer to stay around to push the same patch over and over until someone finally accepts it. Of course, there are tons of ways to improve this workflow, like using Gerrit instead, or having a "Commit" button in Trac directly.
I think removing drives from the list is a good solution, although I'm not sure it should happen at that point (it could also happen at the initial disk detection, as that would save the extra read).
comment:14 by , 10 years ago
Milestone: | R1/alpha5 → R1/beta1 |
---|
driveID
isuint8
notunsigned
.