Opened 6 years ago
Last modified 6 years ago
#14738 assigned bug
[pc_serial] Sometimes gets stuck in Write() (even in async mode)
Reported by: | ttcoder | Owned by: | mmu_man |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | Drivers/TTY | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
I have a report from KEZM (they have an RS 232 switcher piloted by CC on Haiku) : CC tends to get stuck on the serial write a few times per week -- the app freezes.
My code configures the BSerialPort like this:
serialInput.SetBlocking( false );
Yet it still seems to end up in a 'blocked' state?
Change History (7)
follow-up: 3 comment:2 by , 6 years ago
Component: | Kits/Device Kit → Drivers/TTY |
---|---|
Owner: | changed from | to
It seems the pc_serial driver handles neither non-blocking writes nor timeouts. So I don't see a way to fix this from the application side.
I don't see where that write callback is called, however I see the semaphore is also released in the interrupt handler.
I suspect a race condition, where we manage to call write twice, quickly enough, and the interrupt triggers only once, resulting in the second write call blocking for no reason.
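To make the suspected race concrete, here is a toy model of it. This is only a sketch: ToySemaphore and blocked_writers are hypothetical names, not the pc_serial code or the Haiku kernel semaphore API. It just counts how many write calls would be left waiting if each one acquires a "write done" token but the interrupt handler releases fewer tokens than there were writes.

```cpp
// Toy counting semaphore for illustration only; NOT the Haiku kernel API.
struct ToySemaphore {
    int count = 0;

    void release() { ++count; }

    // Returns true if a token was taken, false where a real
    // acquire call would have blocked.
    bool try_acquire() {
        if (count == 0)
            return false;
        --count;
        return true;
    }
};

// Hypothetical scenario: `writes` write calls each wait for one
// "write done" token, but the interrupt handler fired only
// `interrupts` times. Returns how many writes would stay blocked.
int blocked_writers(int writes, int interrupts) {
    ToySemaphore doneWrite;
    for (int i = 0; i < interrupts; ++i)
        doneWrite.release();

    int blocked = 0;
    for (int i = 0; i < writes; ++i) {
        if (!doneWrite.try_acquire())
            ++blocked;
    }
    return blocked;
}
```

With two writes and a single interrupt, blocked_writers(2, 1) reports one writer left waiting, which is the "second write call blocking for no reason" case described above. Whether the real driver can actually reach that state is still a guess.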
Reassigning to mmu_man, as he wrote this driver.
comment:3 by , 6 years ago
Replying to pulkomandy:
I suspect a race condition, where we manage to call write twice, quickly enough, and the interrupt triggers only once, resulting in the second write call blocking for no reason.
Very interesting.. If that is indeed the cause, I guess it would be mitigated by doing a snooze(9000) (or whatever the delay is between interrupts) after each write? Remember, a perfect fix later does not have to exclude an immediate work-around -- the station would be happy if I could provide a timely improvement :-)
comment:4 by , 6 years ago
WriteCallbackFunction seems to be unused; it's a leftover from the usb_serial driver code, which was used as a model.
Looking back at the code it's amazing it actually works :)
comment:5 by , 6 years ago
Sent the station a build of CC with a snooze( 100000 ) call before each write(); I don't think he's seeing lock-ups/freezes any more, but it seems to have made things much worse for some reason, regarding the other bug -- the one with the serial port getting "out of sync" and confusing the switcher box ID...
Just realized there is this in his syslogs...
KERN: pc_serial: write: failed to get write done sem 0x8000000a
... and that 0x8000000a resolves to 0x8000000a: Interrupted system call
Let's take a bit of a logical leap here, and assume that the above is related to either (or both) bugs he is seeing, and needs to be addressed...
I seem to recall that when a piece of code is vulnerable to that, it needs to wrap its call in a while() loop, something like "while( write(blah) == B_INTERRUPTED ) try_again()"; maybe that is true of the pc_serial driver as well, and it would need such a while loop?
comment:6 by , 6 years ago
A write() syscall is allowed to fail with EINTR (see http://pubs.opengroup.org/onlinepubs/9699919799//functions/write.html : [EINTR] The write operation was terminated due to the receipt of a signal, and no data was transferred.)
The loop could happen on the client side. Would doing the loop in pc_serial not cause problems when using for instance Ctrl+Break?
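A client-side retry loop could look roughly like the following. It is a minimal sketch using the plain POSIX write() call rather than BSerialPort; write_retrying is a hypothetical helper name, and the sketch assumes a blocking file descriptor (on a non-blocking one, EAGAIN would be returned to the caller as an error).

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: writes the whole buffer, retrying whenever
// write() fails with EINTR. Returns the number of bytes written,
// or -1 with errno set for any other error.
ssize_t write_retrying(int fd, const void* buffer, size_t length) {
    const char* bytes = static_cast<const char*>(buffer);
    size_t done = 0;

    while (done < length) {
        ssize_t n = write(fd, bytes + done, length - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   // interrupted by a signal: try again
            return -1;      // real error: let the caller decide
        }
        done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```

Keeping the loop in the application means the signal handler still runs before each retry, so the app can set a flag in the handler and bail out of the loop if it wants to honour something like Ctrl+Break.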
comment:7 by , 6 years ago
The driver cannot do much about this.
The idea of this is that an application should reply immediately to signals (for example when you press control + C). So, if it is blocked in a system call (such as read() or write()), that will be interrupted and return EINTR to signal it to the application.
The application can then handle the signal, and restart its system call.
Alternatively, you can set SA_RESTART on the signal settings to let the syscall be retried automatically.
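For reference, setting SA_RESTART with the standard POSIX sigaction() call looks roughly like this. The names on_signal and install_restarting_handler are hypothetical, and the empty handler is an assumption about what the app needs. Whether a write to pc_serial is actually restarted this way on Haiku is not confirmed anywhere in this ticket.

```cpp
#include <signal.h>

// Hypothetical do-nothing handler; a real app would likely set a flag.
extern "C" void on_signal(int) {}

// Installs on_signal for `signum` with SA_RESTART, so that
// restartable system calls interrupted by this signal are retried
// automatically instead of failing with EINTR.
// Returns true on success.
bool install_restarting_handler(int signum) {
    struct sigaction action {};
    action.sa_handler = on_signal;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART;
    return sigaction(signum, &action, nullptr) == 0;
}
```

The app would call install_restarting_handler() once at startup for each signal it expects to receive while blocked in read() or write().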
Running ps, he gets this:
Looking for "pc_serial:done_write" yields this: http://xref.plausible.coop/source/xref/haiku/src/add-ons/kernel/drivers/ports/pc_serial/SerialDevice.cpp
From my limited understanding, it seems the semaphore is acquired in Write() and released in WriteCallbackFunction() ? So the fact my thread remains stuck means the callback was not called, does that sound correct? How do I deal with that in my app, can I "unblock" my thread from another thread ?
EDIT: oddly, looking for other references to the callback yields nothing: http://xref.plausible.coop/source/search?q=WriteCallbackFunction&defs=&refs=&path=&hist=&type=&project=haiku Is it not called by anyone?