Opened 17 years ago

Closed 16 years ago

#2359 closed bug (fixed)

weird pauses during disk access

Reported by: stippi Owned by: stippi
Priority: normal Milestone: R1
Component: Drivers/Disk Version: R1/pre-alpha1
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

It started some time ago, possibly around when Marcus last worked on the AHCI driver to add ATAPI support and handle error conditions. The problem could of course be elsewhere; I should revert that particular patch to make sure.

Especially shortly after booting, I get these strange "lock-ups". For example, when I move the Terminal window and the icons need to redraw, the screen freezes (I can still move the mouse, though), and after several seconds the icons are drawn and I can move the window again. Some applications also take a long time to start. The system always continues to work fine after these pauses; they don't seem to have any side effects. Another way to trigger them is to right-click the Desktop shortly after booting to mount another partition: the menu does not appear for several seconds, and then everything works normally again.

When I find some time, I will try to roll back the latest changes to the AHCI driver to see whether these lock-ups disappear, and will report back here. The hardware is an IBM/Lenovo T60, and the AHCI driver is used for disk access. The revision is hrev25859, but the problems first appeared somewhat earlier.

Change History (9)

comment:1 by marcusoverhagen, 16 years ago

Status: new → assigned

Is this problem still present? It might have been fixed by the Tracker MIME update change.

comment:2 by stippi, 16 years ago

This is definitely still a problem, and as far as we have analyzed it, it seems to be in the disk subsystem, so most likely in AHCI. At the moment, we are trying to find the change that introduced these pauses.

comment:3 by mmlr, 16 years ago

Not sure if this is the same, but I'm doing an svn checkout of the Haiku sources to an AHCI-attached SATA disk here, and from time to time it stalls for a few seconds and then continues normally (maybe every 30 seconds or so). When these stalls happen, the syslog reads:

KERN: ahci: ExecuteAtaRequest port 0: device timeout
KERN: ahci: sata_request::abort called for command 0x25
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)

comment:4 by stippi, 16 years ago

Yes, this is the very same issue, and I get the same output when it happens. During the pauses, icons are not refreshed on the Desktop; when they finally refresh, the above output is printed to the syslog. I suspect that these pauses started around hrev25647 to hrev25649, but I am not sure, and we have not yet tried reverting just the changes from those revisions, since later changes make this a bit more difficult. It may also be that the problem was there all along but went undetected before these changes. Marcus, what do you think?

comment:5 by axeld, 16 years ago

As I pointed out to Ingo on the phone, this might very well be a problem with the interrupt handling, i.e. the edge vs. level detection or selection might no longer work properly on newer machines, and therefore interrupts could get lost. At least that's my theory, and I've seen quite a few problems that could be caused by this.

I intend to look into it in the next few days.

comment:6 by marcusoverhagen, 16 years ago

The error handling in ahci has been improved: starting with hrev25647, it can continue after any error, and since hrev25649 it resets the controller first.

What takes so long is waiting for the timeout (because no interrupt is received). The timeout was 5 seconds in hrev25597 and was increased to 20 seconds in hrev25649 (because a cache sync is slow).

The ahci driver design is pretty simple: inside ExecuteSataRequest(), it executes the command and waits for the interrupt. fCommandsActive is spinlock-protected (except in the abort case), and fRegs->ci is changed by the hardware from 1 to 0 when the command execution is complete.
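To illustrate the completion check described above, here is a minimal sketch, assuming AHCI's bit-per-slot layout in which the driver sets a slot's bit in the PxCI register (fRegs->ci) to issue a command and the controller clears it on completion. FinishedSlots() and RetireFinishedSlots() are hypothetical helpers, not code from the Haiku driver, and the spinlock that protects fCommandsActive is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Haiku's code): given the driver's record of issued
// command slots (like fCommandsActive) and the current PxCI value (fRegs->ci),
// return the slots the hardware has finished. A slot is finished when the
// driver still considers it active but the hardware has cleared its bit.
uint32_t FinishedSlots(uint32_t commandsActive, uint32_t ci)
{
	return commandsActive & ~ci;
}

// Hypothetical interrupt-side bookkeeping: drop the finished slots from the
// active mask (the real driver would hold its spinlock here) and return how
// many completions must be signalled to waiting threads.
int RetireFinishedSlots(uint32_t& commandsActive, uint32_t ci)
{
	uint32_t finished = FinishedSlots(commandsActive, ci);
	commandsActive &= ~finished;

	// Count set bits: each iteration clears the lowest set bit.
	int count = 0;
	for (uint32_t bits = finished; bits != 0; bits &= bits - 1)
		count++;

	assert((commandsActive & finished) == 0);
	return count;
}
```

With a single outstanding command in slot 0, a PxCI value of 1 means "still running" and 0 means "done", matching the 1-to-0 transition mentioned in the comment.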

(This could be optimized by starting a timer and processing the command completion inside the interrupt handler, without a semaphore. Also, the sata_request could be made a member of AHCIPort; I want to do so later.)
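The timer-based variant suggested above needs exactly one party, either the interrupt handler or the timeout timer, to finish each request. A minimal sketch of that hand-off, assuming an atomic state word per request; Request, Complete() and TimeOut() are hypothetical names, and no real timer or interrupt is involved.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-request state for the timer-based design: whoever moves
// the state out of kPending first owns the completion of the request.
enum RequestState { kPending, kCompleted, kTimedOut };

struct Request {
	std::atomic<int> state{kPending};

	// Would be called from the interrupt handler. Returns false if the
	// timer already timed the request out.
	bool Complete()
	{
		int expected = kPending;
		return state.compare_exchange_strong(expected, kCompleted);
	}

	// Would be called from the timeout timer. Returns false if the
	// interrupt already completed the request.
	bool TimeOut()
	{
		int expected = kPending;
		return state.compare_exchange_strong(expected, kTimedOut);
	}
};
```

The compare-and-swap makes the two paths mutually exclusive, so a late interrupt cannot complete a request that the timer has already aborted, and vice versa.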

I'm not working on ahci right now, feel free to debug this Axel :)

comment:7 by axeld, 16 years ago

I don't really intend to touch AHCI; I think it's a general Haiku problem.

comment:8 by stippi, 16 years ago

Owner: changed from marcusoverhagen to stippi
Status: assigned → new

We think we have found the problem... going to confirm soon. Hang on! :-)

comment:9 by stippi, 16 years ago

Resolution: fixed
Status: new → closed

Fixed in hrev26223. It was a race condition: the interrupt could be issued before the thread that initiated the transfer started waiting on the semaphore, and the semaphore was only released if someone was already waiting for it.
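The lost wakeup can be reproduced deterministically with a reduced model of a semaphore. This is a sketch only: ModelSemaphore and CompletionSeen() are hypothetical, the comment does not say exactly how hrev26223 changed the code, and the fixed variant here simply assumes that a release is always recorded so that a later wait can consume it.

```cpp
#include <cassert>

// Deterministic model of the race (not the actual driver or kernel code):
// the semaphore is reduced to a counter plus a count of blocked threads.
struct ModelSemaphore {
	int count = 0;    // releases that a later acquire can consume
	int waiters = 0;  // threads currently blocked on the semaphore

	// Buggy variant: the release is dropped when nobody is waiting yet.
	void ReleaseIfWaitingOnly()
	{
		if (waiters > 0)
			count++;
	}

	// Assumed fix: the release is always recorded and cannot be lost.
	void Release() { count++; }

	// True if a release is available; false means a real thread would
	// block until its timeout expires.
	bool TryAcquire()
	{
		if (count > 0) {
			count--;
			return true;
		}
		return false;
	}
};

// Simulates one transfer. interruptFirst selects the problematic ordering in
// which the interrupt handler releases before the issuing thread waits.
bool CompletionSeen(bool interruptFirst, bool fixed)
{
	ModelSemaphore sem;
	if (!interruptFirst)
		sem.waiters++;  // thread is already blocked when the interrupt fires

	if (fixed)
		sem.Release();
	else
		sem.ReleaseIfWaitingOnly();

	if (interruptFirst)
		sem.waiters++;  // thread only now starts waiting

	bool acquired = sem.TryAcquire();
	sem.waiters--;
	return acquired;
}
```

With the buggy release, CompletionSeen(true, false) returns false: the completion is lost, and the real driver would wait until the timeout expired, then abort the command and reset the port, which matches the syslog output quoted earlier.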
