Opened 4 years ago

Closed 4 years ago

#6036 closed bug (fixed)

Read from cd failure (ISO9660 haiku-r1-alpha2.iso)

Reported by: aldeck Owned by: axeld
Priority: normal Milestone: R1
Component: Drivers/Disk Version: R1/alpha2
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Read from cd failure (ISO9660 haiku-hrev1-alpha2.iso)

Reading fails on the file apps/WebPositive/libwebcore.so always at 19,922,944 (0x1300000) bytes in length.
The correct length of libwebcore.so is 20,993,020.

tail -f /var/log/syslog

KERN: ata 0 error: command failed, error bit is set. status 0x51, error 0x54
KERN: atapi 0-0 error: device indicates transfer error after dma
KERN: check_sense: Illegal request
KERN: file_cache: read pages failed: General system error

The cd checksum is valid on linux, dd if=/dev/sr0 | md5sum

Another test: with that same cd, extracting libwebcore.so from the cd succeeds on linux (with correct size and checksum) although it fails on haiku (hrev36829 from trunk) (same machine, same cd).

This is the ISO9660 image, anyboot has no problem. Reproducible running on alpha2 (live cd installer) and hrev36829 from trunk.

Machine is a recent and reliable Q6600 / 4GB.

Attached is the result of the instructions from Ingo quoted below
(cf http://www.freelists.org/post/haiku/trouble-installing-R1A2,19 )

First of all install a 
breakpoint where the first error message is logged 
(ATAChannel::FinishRequest()):

  breakpoints ([symbol FinishRequest__10ATAChannel] + 0x129)

Then trigger the error (I guess a "cat .../libwebcore.so > /dev/null" should 
do). I'm not sure, if ATAChannel::FinishRequest() is executed in the same 
thread that requests the I/O operation. If not, find the thread ("calling 
periph_io") and get a stack trace. The second parameter of periph_io() is an 
io_operation* -- pass it to the "io_operation" command. The operation has a 
parent -- pass it to "io_request". This one might also have parent -> 
"io_request". And so on. The stack trace should also yield an IOCache or 
IOSchedulerSimple pointer (depending on which is used). Please pass it to 
"io_scheduler". Also use the address for a "dw <address> 32".

Attachments (1)

255619.txt (8.9 KB) - added by aldeck 4 years ago.
debugging session

Download all attachments as: .zip

Change History (12)

Changed 4 years ago by aldeck

debugging session

comment:1 Changed 4 years ago by aldeck

note that i've got everything setup for a remote kdl session if needed. (email me)

comment:2 follow-up: Changed 4 years ago by bonefish

  • Component changed from - General to Drivers/Disk
  • Owner changed from nobody to mmlr
  • Status changed from new to assigned

That a dd if=/dev/sr0 | md5sum (without any other parameters to dd) returns the correct checksum, while a direct md5sum does not, is very suspicious. Apparently something is wrong with the CD or the drive or both.

The disk size the IOCache gets from its owner (scsi_cd) is 487581696 and the current, failing I/O operation is a read at offset 487063552 with length 518144 (i.e. up to 487581696). The ISO image size is 487577600, i.e. 4 KB smaller, and I guess that's also the size of the actually readable content of the disk. That the request fails (with "Illegal Request") is therefore not really surprising.

I'm not familiar with the involved busses and protocols, so I cannot say whether it is possible to fix or work around the problem at that level. Assigning to Michael for that purpose.

A possible work-around at the IOCache level would be to test-read at the end of the medium to find out its actual size. Quite ugly, though.

comment:3 in reply to: ↑ 2 Changed 4 years ago by aldeck

Replying to bonefish:

That a dd if=/dev/sr0 | md5sum (without any other parameters to dd) returns the correct checksum, while a direct md5sum does not, is very suspicious.

Sorry for the confusion, i initially thought i would provide the generic instructions for other to test, then forgot about it. I personally had to use bs=2048 count=238075 (ie: 487577600 / 2048)

comment:4 Changed 4 years ago by axeld

I just looked into the issue: according to the linux sources, the SCSI spec allows the device to report up to 75 blocks beyond its actual size. While I did not find any mention of this in the specs (only looked at MMC-5/READ_CAPACITY), what they do is to check whether an access covered the last part of a device, and adjust the disk size on the fly then by evaluating the exact SCSI reply (see drivers/scsi/sr.c::sr_done() line 257ff in 2.6.34). FreeBSD ignores the issue, so it should have the same issues we have.

As a possible work-around, I would suggest issuing an additional read in cd_set_capacity(), and handling the error case in either periph_io(), or adding a dedicated function for this in scsi_cd (I would prefer the latter).

comment:5 follow-up: Changed 4 years ago by tangobravo

Interesting. I think it's worth updating the R1A2 iso image with a work-around. Lots of people seem to be hitting this issue, and it would be good to fix it if R1A2 is going to be the "official" release for another 6 months or so.

comment:6 in reply to: ↑ 5 Changed 4 years ago by jackburton

Replying to tangobravo:

Interesting. I think it's worth updating the R1A2 iso image with a work-around. Lots of people seem to be hitting this issue, and it would be good to fix it if R1A2 is going to be the "official" release for another 6 months or so.

Agreed.

comment:7 Changed 4 years ago by axeld

  • Owner changed from mmlr to axeld
  • Status changed from assigned to in-progress

Implemented in hrev36989 & hrev36990 - works for my test CD, please report back, so that we can close this bug.

comment:8 Changed 4 years ago by aldeck

Trying to test on hrev36990 but it seems that it doesn't detect media changes anymore, ie: no automounting of any cd. And nothing shows up in the syslog.

Trying to mount manually worked, on the second attempt though. And copying libwebcore.so worked!

The first mount attempt errored out with (hand typed):

KERN: dma buffer restrictions not yet implemented: boundary 65536
KERN: last message repeated 31 times
KERN: ata 0 error: command failed, error bit is set. status 0x51, error 0x54
KERN: atapi 0-0 error: device indicates transfer error after dma
KERN: check_sense: Illegal request
KERN: scsi_cd: adjusted capacity from 238077 to 238076 blocks.

So it seems the original problem has been fixed but a regression popped in (or a mysterious issue on my side).

comment:9 Changed 4 years ago by aldeck

Ah, unfortunately, i just tried booting and installing from the cd itself (instead of booting a harddrive install of hrev36990 and testing things from there) and it failed on copying libwebcore.so like originally reported.

comment:10 Changed 4 years ago by axeld

Interesting. FWIW there were no changes with the media change - I had problems with this as well, but only if I had the tray open upon boot. In any case, this would deserve a different bug report.

Can you compile scsi_cd with debug output turned on (ie. define TRACE_CD_DISK), and see if the adjusted capacity message was in the syslog? Maybe it does not work for the boot device for some reason.

comment:11 Changed 4 years ago by aldeck

  • Resolution set to fixed
  • Status changed from in-progress to closed

*cough* of course i was booting the alpha2 cd, not hrev36990's... Which worked fine! I guess we can close this one :)

Will investigate that other issue, i indeed might have booted with the tray open when doing those tests.

Closing, thanks!

Note: See TracTickets for help on using tickets.