Opened 4 years ago

Last modified 3 years ago

#12699 assigned bug

Various crashes in BHttpRequest's crypto

Reported by: ttcoder Owned by: nobody
Priority: normal Milestone: R1
Component: Kits/Network Kit Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

This sometimes occurs in our Updater GUI (fortunately when it does happen, it is during tear-down, after the update has been performed). We get various backtraces, all involving "crypto" something, sometimes hinting at BHttpRequest. Will post a few backtraces as examples.

Seems to have started after the switch from libresolv(?). This is a bit of a "heisenbug" as this occurs mainly for Dane, not really for me. There might be a possibility that this is due to apps being linked against an older hrev (with a diff. libresolv?) being run in a more recent hrev, as the builds I send to Dane are compiled in 49662 whereas he runs hrev49856

Attachments (5)

cc-crypto-crash.report (29.0 KB ) - added by ttcoder 4 years ago.
Incomplete backtrace, mentions "CRYPTO_add_lock"
ak-BCertificate-crash.report (27.3 KB ) - added by ttcoder 4 years ago.
BCertificate dtor crash
redux.1 (open-and-close at once).txt (13.5 KB ) - added by ttcoder 4 years ago.
leak_analyser.sh redux.1* crypto RegExp > red1 : a few hundred bytes of leaks, probably not significant
redux.2 (wait for completion of update).txt (23.1 KB ) - added by ttcoder 4 years ago.
leak_analyser.sh redux.2* crypto RegExp > red2: remarkably low-key "increment", given that several HTTP requests and some packaging code have been executed since redux.1
TuneTracker_REDUX-2845-debug-30-03-2016-19-12-22.report (82.3 KB ) - added by ttcoder 4 years ago.
Actually reproducible with an hrev49600+ build transferred to 50095, and guarded heap

Download all attachments as: .zip

Change History (11)

by ttcoder, 4 years ago

Attachment: cc-crypto-crash.report added

Incomplete backtrace, mentions "CRYPTO_add_lock"

comment:1 by pulkomandy, 4 years ago

I think you mean netresolv, which is part of libnetwork and used for DNS resolution.

The "crypto" things are from OpenSSL, and not directly related to netresolv (in fact, the crypto functions are not related to network at all).

It is hard to extract meaningful information from your "backtrace" which shows only one entry. If you get the issue only on teardown, it could be that some BUrlRequests are not closed properly (you can't just destroy then, you need to Stop() them first).

by ttcoder, 4 years ago

BCertificate dtor crash

comment:2 by ttcoder, 4 years ago

Digging for more reports, I only found that one -- luckily it's the one I wanted, as it's most representative: from memory, most of Dane's crashes are in the BCertificate dtor indeed.

This is really Dane's ticket not mine, I don't have much of a dog in this fight as I'm mostly unable to reproduce this crash :-j .. But with the information you gave me pulkomandy I have renewed motivation to try out new things in the next build I'll send him:

  • I'll take your advice and review my code for usage of Stop() in BUrlRequest/BHttpRequest, I didn't know it was necessary to call it before delete'ing, gotta fix that
  • I'll check out the sources for the B* classes (especially BUrlContext) for ownership transfer and the like. Currently I assume I remain owner for all objects and delete them all myself in TtsUpdaterWin
  • I might also use leak_checker.sh/malloc-debug, IIRC it also reports double-delete cases and lots more
Last edited 4 years ago by ttcoder (previous) (diff)

comment:3 by pulkomandy, 4 years ago

I tried to be consistent in the design of the network API, so methods called Set* don't take ownership, while methods called Adopt* do.

In the case of BCertificate, do you manipulate those diretly? I think the only use is the CertificateVerificationFailed in HTTPS requests?

by ttcoder, 4 years ago

leak_analyser.sh redux.1* crypto RegExp > red1 : a few hundred bytes of leaks, probably not significant

by ttcoder, 4 years ago

leak_analyser.sh redux.2* crypto RegExp > red2: remarkably low-key "increment", given that several HTTP requests and some packaging code have been executed since redux.1

comment:4 by ttcoder, 4 years ago

A bit of 'closure' (AFAIConcerned) on this..

Not much to report from my leak_check.sh run (attached them anyway in case there's interest). The net code does not seem to leak much -- in fact the hundreds of bytes mentionned might even not be a recurring leak but a "static" one, since some of them mention "_res", which IIRC is a static

The relevant info though is this: if my code had done a double-free or some such, it would have popped up the debugger at runtime, but my program always runs "clean", so no heap corruption I guess

*

A small request while we're on the subject -- UrlRequest.h:31 triggers an "unused variable 'timeout' " gcc warning (yes I enable hardline error reporting :-)

The thing is... In much of my source files this is the /only/ warning there is, and thus it pops up the Pe warning window unnecessarily.. Would be nice if the method signature was changed from "bigtime_t timeout" to just "bigtime_t" :-) The rest of the Haiku headers never trigger this warning, so makes sense to be consistent :-)

*

Regarding Stop(),

Checked the source of the class I use and Stop() is in fact automatically called in the dtor:

http://code.metager.de/source/xref/haiku/src/kits/network/libnetapi/HttpRequest.cpp#145

So I assume it would not help if I call Stop() by hand just before deleting the object

*

I do not deal with BCertificate directly at all.. Just the top level classes:

A quick outline of my code just in case:

	class RetrieverFSM
	{
		........
		
		BUrlContext context;
		BHttpAuthentication authentication;
		
		BUrl url;
		BHttpRequest httpReq;
	};

void RetrieverFSM::SetFromSettings()
{
	authentication.SetUserName( .... );
	authentication.SetPassword( .... );
	...
}

void RetrieverFSM::DownloadNextFile()
{
	...
	url.SetUrlString( fullpath );
	httpReq.SetUrl( url );
	
	listener.Reset();
	httpReq.SetListener( &listener );
	
	context.AddAuthentication( url, authentication );
	httpReq.SetContext( &context );
	
	httpReq.Run();
}

*

Anyway..

I'm leaning toward a "race condition" of some sort, maybe in my code, so Dane will have to provide more info or reproducible steps, so that I have a chance to extract a reproducible case from my 2'000 lines updater :-)

Ball's in his court now

by ttcoder, 4 years ago

Actually reproducible with an hrev49600+ build transferred to 50095, and guarded heap

comment:5 by ttcoder, 4 years ago

Seems this is actually reproducible for me in fact, by going through a few hoops (didn't yet try to simplify the process and compile directly on this machine).

The Debugger still does not signal any double-free or the such though, it just mentions an abnormal memory allocation in the BUrlContext/Certificate's BList.

The syslog is interesting:

KERN: malloc() of 2684354560 bytes asked
KERN: bfs: bfs_create_symlink:1020: File or Directory already exists
KERN: bfs: bfs_open_dir:1647: Not a directory
KERN: malloc() of 2684354560 bytes asked <----- note: that's hex value 0xa0000000 it seems
KERN: bfs: bfs_create_symlink:1020: File or Directory already exists
KERN: bfs: bfs_open_dir:1647: Not a directory
KERN: malloc() of 2684354560 bytes asked
KERN: bfs: bfs_create_symlink:1020: File or Directory already exists
KERN: bfs: bfs_open_dir:1647: Not a directory
KERN: malloc() of 2684354560 bytes asked
KERN: bfs: bfs_create_symlink:1020: File or Directory already exists
KERN: slab memory manager: created area 0xf1001000 (11527)
(...)
KERN: low resource memory: normal -> note
KERN: low resource memory: note -> critical
KERN: 0xdf8ff870->VMAnonymousCache::_Commit(1342185472): Failed to reserve 1342074880 bytes of RAM
KERN: low resource memory: critical -> note
KERN: 3101: DEBUGGER: failed to create area for allocation of 327681 pages
KERN: debug_server: Thread 3101 entered the debugger: Debugger call: `failed to create area for allocation of 327681 pages'
KERN: low resource memory: note -> warning
KERN: stack trace, current PC 0x6090e114  commpage_syscall + 0x4:
KERN:   (0x7121abd8)  0x70ca25  panic__FPCce + 0x45
KERN:   (0x7121b008)  0x70d363  guarded_heap_allocate_with_area__FUlUl + 0x93

EDIT:

Results: when running a build that's been compiled on that same hrev (rather than compiled on an older hrev) I'm unable to get the crash, the updater is rock solid. That's trying with both libroot.so and libroot_debug.so

That is not a proof in itself that the crash is due to an ABI change/issue, I guess the inability to reproduce the crash in this case could be due to a number of things; but that does prevent me from building a minimal reproducible case

Here's hoping we're able to switch all our hrevs to newer ones (pending commit of the two-liner in #12373 which seems to be excruciatingly difficult to do for some reason :-) so as to work around this issue.


Well d'uh ! Turns out the sizeof() of BUrlContext grew by 32 bytes in hrev49790, straddling the hrev I use to build and the hrev that Dane is now using.

So of course, code compiled against the older header would allocate the next object 32 bytes 'earlier' than it should, clobbering BUrlContext's fCertificates member to kingdom come

Will install a heisenrev on my main machine and build there, and send the build to Dane, should fix it; will report back

P.S. Is it relevant/possible to add FBC padding to BUrlContext ..etc classes? Their headers are public in headers/os/net so application writers (beside me) might start using them and rely on their sizeof() remaining constant

EDIT: was about to post a message suggesting to add FBC to all concerned classes, but realized no amount of (reasonable) padding would have balanced out for a full 32 bytes size growth... Working on another way to protect ourselves from future FBC occurences

Last edited 4 years ago by ttcoder (previous) (diff)

comment:6 by axeld, 3 years ago

Owner: changed from axeld to nobody
Status: newassigned
Note: See TracTickets for help on using tickets.