Opened 9 years ago
Closed 5 years ago
#12699 closed bug (no change required)
Various crashes in BHttpRequest's crypto
Reported by: | ttcoder | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Kits/Network Kit | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
This sometimes occurs in our Updater GUI (fortunately when it does happen, it is during tear-down, after the update has been performed). We get various backtraces, all involving "crypto" something, sometimes hinting at BHttpRequest. Will post a few backtraces as examples.
Seems to have started after the switch from libresolv(?). This is a bit of a "heisenbug" as this occurs mainly for Dane, not really for me. There might be a possibility that this is due to apps being linked against an older hrev (with a diff. libresolv?) being run in a more recent hrev, as the builds I send to Dane are compiled in 49662 whereas he runs hrev49856
Attachments (5)
Change History (12)
by , 9 years ago
Attachment: | cc-crypto-crash.report added |
---|
comment:1 by , 9 years ago
I think you mean netresolv, which is part of libnetwork and used for DNS resolution.
The "crypto" things are from OpenSSL, and not directly related to netresolv (in fact, the crypto functions are not related to network at all).
It is hard to extract meaningful information from your "backtrace" which shows only one entry. If you get the issue only on teardown, it could be that some BUrlRequests are not closed properly (you can't just destroy then, you need to Stop() them first).
comment:2 by , 9 years ago
Digging for more reports, I only found that one -- luckily it's the one I wanted, as it's most representative: from memory, most of Dane's crashes are in the BCertificate dtor indeed.
This is really Dane's ticket not mine, I don't have much of a dog in this fight as I'm mostly unable to reproduce this crash :-j .. But with the information you gave me pulkomandy I have renewed motivation to try out new things in the next build I'll send him:
- I'll take your advice and review my code for usage of Stop() in BUrlRequest/BHttpRequest, I didn't know it was necessary to call it before delete'ing, gotta fix that
- I'll check out the sources for the B* classes (especially BUrlContext) for ownership transfer and the like. Currently I assume I remain owner for all objects and delete them all myself in TtsUpdaterWin
- I might also use leak_checker.sh/malloc-debug, IIRC it also reports double-delete cases and lots more
comment:3 by , 9 years ago
I tried to be consistent in the design of the network API, so methods called Set* don't take ownership, while methods called Adopt* do.
In the case of BCertificate, do you manipulate those diretly? I think the only use is the CertificateVerificationFailed in HTTPS requests?
by , 9 years ago
Attachment: | redux.1 (open-and-close at once).txt added |
---|
leak_analyser.sh redux.1* crypto RegExp > red1 : a few hundred bytes of leaks, probably not significant
by , 9 years ago
Attachment: | redux.2 (wait for completion of update).txt added |
---|
leak_analyser.sh redux.2* crypto RegExp > red2: remarkably low-key "increment", given that several HTTP requests and some packaging code have been executed since redux.1
comment:4 by , 9 years ago
A bit of 'closure' (AFAIConcerned) on this..
Not much to report from my leak_check.sh run (attached them anyway in case there's interest). The net code does not seem to leak much -- in fact the hundreds of bytes mentionned might even not be a recurring leak but a "static" one, since some of them mention "_res", which IIRC is a static
The relevant info though is this: if my code had done a double-free or some such, it would have popped up the debugger at runtime, but my program always runs "clean", so no heap corruption I guess
*
A small request while we're on the subject -- UrlRequest.h:31 triggers an "unused variable 'timeout' " gcc warning (yes I enable hardline error reporting :-)
The thing is... In much of my source files this is the /only/ warning there is, and thus it pops up the Pe warning window unnecessarily.. Would be nice if the method signature was changed from "bigtime_t timeout" to just "bigtime_t" :-) The rest of the Haiku headers never trigger this warning, so makes sense to be consistent :-)
*
Regarding Stop(),
Checked the source of the class I use and Stop() is in fact automatically called in the dtor:
http://code.metager.de/source/xref/haiku/src/kits/network/libnetapi/HttpRequest.cpp#145
So I assume it would not help if I call Stop() by hand just before deleting the object
*
I do not deal with BCertificate directly at all.. Just the top level classes:
A quick outline of my code just in case:
class RetrieverFSM { ........ BUrlContext context; BHttpAuthentication authentication; BUrl url; BHttpRequest httpReq; }; void RetrieverFSM::SetFromSettings() { authentication.SetUserName( .... ); authentication.SetPassword( .... ); ... } void RetrieverFSM::DownloadNextFile() { ... url.SetUrlString( fullpath ); httpReq.SetUrl( url ); listener.Reset(); httpReq.SetListener( &listener ); context.AddAuthentication( url, authentication ); httpReq.SetContext( &context ); httpReq.Run(); }
*
Anyway..
I'm leaning toward a "race condition" of some sort, maybe in my code, so Dane will have to provide more info or reproducible steps, so that I have a chance to extract a reproducible case from my 2'000 lines updater :-)
Ball's in his court now
by , 9 years ago
Attachment: | TuneTracker_REDUX-2845-debug-30-03-2016-19-12-22.report added |
---|
Actually reproducible with an hrev49600+ build transferred to 50095, and guarded heap
comment:5 by , 9 years ago
Seems this is actually reproducible for me in fact, by going through a few hoops (didn't yet try to simplify the process and compile directly on this machine).
The Debugger still does not signal any double-free or the such though, it just mentions an abnormal memory allocation in the BUrlContext/Certificate's BList.
The syslog is interesting:
KERN: malloc() of 2684354560 bytes asked KERN: bfs: bfs_create_symlink:1020: File or Directory already exists KERN: bfs: bfs_open_dir:1647: Not a directory KERN: malloc() of 2684354560 bytes asked <----- note: that's hex value 0xa0000000 it seems KERN: bfs: bfs_create_symlink:1020: File or Directory already exists KERN: bfs: bfs_open_dir:1647: Not a directory KERN: malloc() of 2684354560 bytes asked KERN: bfs: bfs_create_symlink:1020: File or Directory already exists KERN: bfs: bfs_open_dir:1647: Not a directory KERN: malloc() of 2684354560 bytes asked KERN: bfs: bfs_create_symlink:1020: File or Directory already exists KERN: slab memory manager: created area 0xf1001000 (11527) (...) KERN: low resource memory: normal -> note KERN: low resource memory: note -> critical KERN: 0xdf8ff870->VMAnonymousCache::_Commit(1342185472): Failed to reserve 1342074880 bytes of RAM KERN: low resource memory: critical -> note KERN: 3101: DEBUGGER: failed to create area for allocation of 327681 pages KERN: debug_server: Thread 3101 entered the debugger: Debugger call: `failed to create area for allocation of 327681 pages' KERN: low resource memory: note -> warning KERN: stack trace, current PC 0x6090e114 commpage_syscall + 0x4: KERN: (0x7121abd8) 0x70ca25 panic__FPCce + 0x45 KERN: (0x7121b008) 0x70d363 guarded_heap_allocate_with_area__FUlUl + 0x93
EDIT:
Results: when running a build that's been compiled on that same hrev (rather than compiled on an older hrev) I'm unable to get the crash, the updater is rock solid. That's trying with both libroot.so and libroot_debug.so
That is not a proof in itself that the crash is due to an ABI change/issue, I guess the inability to reproduce the crash in this case could be due to a number of things; but that does prevent me from building a minimal reproducible case
Here's hoping we're able to switch all our hrevs to newer ones (pending commit of the two-liner in #12373 which seems to be excruciatingly difficult to do for some reason :-) so as to work around this issue.
Well d'uh ! Turns out the sizeof() of BUrlContext grew by 32 bytes in hrev49790, straddling the hrev I use to build and the hrev that Dane is now using.
So of course, code compiled against the older header would allocate the next object 32 bytes 'earlier' than it should, clobbering BUrlContext's fCertificates
member to kingdom come
Will install a heisenrev on my main machine and build there, and send the build to Dane, should fix it; will report back
P.S. Is it relevant/possible to add FBC padding to BUrlContext ..etc classes? Their headers are public in headers/os/net so application writers (beside me) might start using them and rely on their sizeof() remaining constant
EDIT: was about to post a message suggesting to add FBC to all concerned classes, but realized no amount of (reasonable) padding would have balanced out for a full 32 bytes size growth... Working on another way to protect ourselves from future FBC occurences
comment:6 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 5 years ago
Resolution: | → no change required |
---|---|
Status: | assigned → closed |
That was indeed resolved after the build machine "hopped" to the newer hrev at that time, which would entail the extra 32 bytes sizeof being allocated correctly (though then becoming backward incompatible, but don't care).
Incomplete backtrace, mentions "CRYPTO_add_lock"