Opened 5 years ago
Closed 5 years ago
#15853 closed bug (fixed)
[pkgman] Failed to download package cmake: Operation would block
Reported by: | diver | Owned by: | pulkomandy |
---|---|---|---|
Priority: | high | Milestone: | R1/beta2 |
Component: | Kits/Package Kit | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
hrev54023 x86_64.
Resizing Terminal window while pkgman is downloading packages results in Operation would block
error.
Change History (32)
comment:1 by , 5 years ago
comment:3 by , 5 years ago
I now started getting "Operation would block" even without resizing Terminal.
pkgman list
showed this url which was 404:
https://eu.hpkg.haiku-os.org/haikuports/master/x86_64/current
Fixed it with:
pkgman add https://eu.hpkg.haiku-os.org/haikuports/master/repository/x86_64/current/
Has some redirect stopped working or something? Could be that Operation would block
would thrown if pkgman can't connect due to 404? Sounds like a wrong error code.
comment:4 by , 5 years ago
There was an error code change in the version of OpenSSL 1.1 we are on that broke some applications which may be related. They reverted it in the newer patches of OpenSSL, so we should update.
comment:5 by , 5 years ago
A few users on Telegram started seeing this error. Should we move it to beta2?
comment:6 by , 5 years ago
Milestone: | Unscheduled → R1/beta2 |
---|---|
Priority: | normal → high |
Yes, definitely.
comment:7 by , 5 years ago
I have added a recipe for OpenSSL 1.1.1g. Let's see if it helps. If it does, we need to re-run the hardlink packages script, I guess, so that it ships with the beta?
comment:8 by , 5 years ago
I'm going to do that anyway after Humdinger pushes new recipes for applications following the translations freeze.
comment:9 by , 5 years ago
Beta 2 status meeting: this issue has a potential patch that can be applied to the beta2 branch.
comment:10 by , 5 years ago
Can we do this update soon? It would be nice to make sure OpenSSL fixes indeed solve this, or if further changes on our side would also be needed.
comment:11 by , 5 years ago
Why do we need to do the update to test? OpenSSL is not statically linked, just update your system package and test it.
comment:12 by , 5 years ago
Indeed. Let me rephrase: can anyone reproduce this with an updated install? (not a stock one that would still include the unpatched openssl)
comment:13 by , 5 years ago
Just happened to me with haiku-master-hrev54185-x86_gcc2h-anyboot.iso
with openssl_1.1.1e-1_x86_gcc2.hpkg
on Terminal window resize during pkgman install qbittorrent
.
comment:14 by , 5 years ago
Yes, which is why I asked to test with an updated install. openssl_1.1.1g is what we want to test, but the default image is currently not shipped with it.
comment:15 by , 5 years ago
So far I can't reproduce it. However Just now in hrev54215 I got this:
~> pkgman full 100% repochecksum-1 [65 bytes] Validating checksum for Haiku...done. Refreshing repository "Haiku" failed 0%: Interrupted system call 100% repochecksum-1 [64 bytes]
I didn't resize the terminal and subsequent pkgman full
succeeded.
comment:17 by , 5 years ago
So it looks like hrev53812 did address this, but then the (necessary) followup hrev53853 broke it again. When I reproduce this locally, sending SIGWINCH or any other signal interrupts the system call of SSL_read(), but it weirdly returns SSL_ERROR_WANT_READ. I expected it to return a more specific error in this case, something mapped to EAGAIN. But I guess that isn't what we get.
So after hrev53853 we map SSL_ERROR_WANT_READ to B_WOULD_BLOCK. But the caller, BHttpRequest::_MakeRequest(), is not a non-blocking I/O context and doesn't expect B_WOULD_BLOCK.
A bandaid fix would be to just add a special branch to the only call of fSocket->Read() like so
diff --git a/src/kits/network/libnetapi/HttpRequest.cpp b/src/kits/network/libnetapi/HttpRequest.cpp index 2638e66888af..9b5b550e2451 100644 --- a/src/kits/network/libnetapi/HttpRequest.cpp +++ b/src/kits/network/libnetapi/HttpRequest.cpp @@ -588,8 +588,12 @@ BHttpRequest::_MakeRequest() bytesRead = fSocket->Read(chunk, kHttpBufferSize); if (bytesRead < 0) { - readError = bytesRead; - break; + if (bytesRead == B_WOULD_BLOCK) + fSocket->WaitForReadable(); + else { + readError = bytesRead; + break; + } } else if (bytesRead == 0) receiveEnd = true;
This works for me but it isn't a complete fix. For one, I think this could happen on the write path too, but there isn't any error checking happening for calls to fSocket->Write().
The other potential problem is that it seems SSL_read() could return SSL_ERROR_WANT_WRITE if the state machine of the TLS session is expecting you to write because no data will ever be available for read.
It also seems that SecureSocket should know whether the caller expects Read() or Write() to block. If it had a blocking I/O mode then we wouldn't need to touch all of the Read and Write call sites to make this work everywhere.
comment:18 by , 5 years ago
Is there no way to detect a purely interrupted syscall, and just retry that, and return EWOULDBLOCK syscalls to the caller? This seems like a common problem, what do other libraries that use OpenSSL do or expect?
comment:19 by , 5 years ago
It looks like "errno" may still contain, according to the OpenSSL manpages, the underlying syscall error. Maybe we can just test that here, if we get ERROR_WANT_...?
comment:20 by , 5 years ago
The ERROR_WANT_WRITE should not happen in our case because we configure OpenSSL in "auto retry" mode, if I understand things correctly (but it's quite possible that I don't).
waddlesplash, do you have a reference for this in the manpages? If this is indeed now documented, the correct thing to do would be to restart the syscall specifically in the case we get EINTR, and not in other error cases. But when I wrote this code I could not find any documentation that would guarantee this. I may have simply missed it.
comment:21 by , 5 years ago
Actually, it appears AUTO_RETRY makes BIO ... not retry: https://www.openssl.org/docs/man1.0.2/man3/BIO_get_ssl.html (see NOTES)
I found some answers on StackOverflow and elsewhere that OpenSSL does not taint "errno", but maybe that is no longer the case. Seems worth checking.
comment:22 by , 5 years ago
I don't fully understand what AUTO_RETRY mode is. You're right waddlesplash that page seems to imply that it does the opposite of what we want.
I tested commenting out the line that sets SSL_MODE_AUTO_RETRY on BSecureSocket::Private::sContext, and I also tried setting SSL_MODE_AUTO_RETRY on BSecureSocket::Private::fSSL. In every case I still get SSL_ERRRO_WANT_READ back when interrupting a read.
The good news is that errno is being set to EINTR.
comment:23 by , 5 years ago
I take it back, something else was setting errno. I made sure to clobber errno before calling SSL_read and now I can see that it isn't being set. EDIT: Revisiting this today, I see that this comment was incorrect because my test code was just wrong. errno does indeed get set to EINTR.
comment:24 by , 5 years ago
https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_mode.html is a bit more clear about auto_retry and also mention that it is the default in OpenSSL 1.1.1, so we can remove this from the context setup.
According to https://stackoverflow.com/questions/24188013/openssl-and-signals?rq=1 , we should be able to get the errno from get_last_socket_error(), and use BIO_should_retry() to know if we need to retry an operation.
comment:25 by , 5 years ago
I've been reading through examples in the openssl repo to get a better feel for the API. It looks like SSL_waiting_for_async() is a better signal for whether we should return B_WOULD_BLOCK.
https://www.openssl.org/docs/man1.1.0/man3/SSL_waiting_for_async.html
So we can check for EINTR in errno, but in my experiments relying on SSL_waiting_for_async seems to do the right thing.
comment:26 by , 5 years ago
Looking through the openssl code, it looks like the place where it defines whether or not BIO_should_retry is just this.
switch (SSL_get_error(ssl, (int)ret)) { case SSL_ERROR_WANT_READ: BIO_set_flags(b, BIO_FLAGS_READ | BIO_FLAGS_SHOULD_RETRY); break; case SSL_ERROR_WANT_WRITE: BIO_set_flags(b, BIO_FLAGS_WRITE | BIO_FLAGS_SHOULD_RETRY); break; case SSL_ERROR_WANT_CONNECT: BIO_set_flags(b, BIO_FLAGS_IO_SPECIAL | BIO_FLAGS_SHOULD_RETRY); BIO_set_retry_reason(b, BIO_get_retry_reason(next)); break; case SSL_ERROR_WANT_X509_LOOKUP: BIO_set_retry_special(b); BIO_set_retry_reason(b, BIO_RR_SSL_X509_LOOKUP); break; default: break; } break;
It doesn't appear that we are using SSL_MODE_ASYNC. If we did, then I think we'd want to use SSL_waiting_for_async() to determine whether to return B_WOULD_BLOCK instead of retrying.
I think just reverting hrev53853 gets us back into a working state, and if we decide to support SSL_MODE_ASYNC then we would just change this retry loop to use SSL_waiting_for_async and BIO_should_retry together.
comment:27 by , 5 years ago
Doesn't that break Vision, which does use non-blocking sockets here?
I can't remember why I made it use non-blocking sockets (or maybe it already did) but at least that needs to be preserved (or fixed).
If EINTR is set in errno, then we may just want to rely on that instead of SSL_waiting_for_async.
comment:28 by , 5 years ago
Oh I didn't know about the Vision problem. I'll look at that code to try to understand what's going on.
comment:29 by , 5 years ago
Vision requiring asynchronous communications probably occurred here: https://github.com/HaikuArchives/Vision/commit/2343f4eadb453bfd5bf7e978e456b586f1fb4edb
That commit may just be wrong, I have not revisited it since I wrote it; but at least it's worked until now.
comment:30 by , 5 years ago
OK cool, thanks for the heads up. I've been reading through the code. This is a nice test case.
At least this seems to work as a hotfix. https://github.com/ambroff/haiku/commit/bf182a76186aab0df4f7959f2f1bc7a6749aa077
It will take me some time to fully understand how openssl is meant to be used. I've never used it directly before this.
comment:31 by , 5 years ago
I have only minimal experience with OpenSSL, but at least that fix looks good to me.
This seems strangely familiar. I thought we had a similar ticket in the last couple of months that was fixed, but couldn't find it.