Opened 12 years ago

Last modified 15 months ago

#9458 assigned bug

Memory leak in BMediaFile -> BMediaTrack

Reported by: Robert S. Owned by: leavengood
Priority: normal Milestone: R1
Component: Kits/Media Kit Version: R1/Development
Keywords: BMediaFile BMediaTrack memory leak Cc: ttcoder
Blocked By: Blocking: #18160
Platform: All

Description

I found a memory leak in BMediaFile -> MediaTrack. If you create a MediaTrack from BMediaFile, than release it again, close the media file and delete the media file object it is still some memory alloceted. I wrote a short test app (paladin project) to demonstrate the memory leak/bug, have a look to the method "DebugClass::TestMethod()" in the test app.

Attachments (1)

DebugApp.zip (13.8 KB ) - added by Robert S. 12 years ago.

Download all attachments as: .zip

Change History (23)

by Robert S., 12 years ago

Attachment: DebugApp.zip added

comment:1 by ttcoder, 11 years ago

Just noticed your ticket after filing #9945 Robert.. Wondering, what metric do you use to know that memory is being leaked (not released) ? And does the leak still occur if you upgrade to hrev46028 or newer ? Seems that fix is in the "Reader" code rather than "Writer" code, but might as well try.

comment:2 by pulkomandy, 10 years ago

Still leaking in hrev48547.

comment:3 by waddlesplash, 10 years ago

Platform: x86All
Priority: highnormal

comment:4 by axeld, 8 years ago

Owner: changed from axeld to nobody
Status: newassigned

comment:5 by jackburton, 6 years ago

Could be related to the use of realloc() in BMediaFile::CreateTrack() (see #14329 ) or, in general, on how we allocate the BMediaTrack array in BMediaFile. ?

comment:6 by leavengood, 5 years ago

Owner: changed from nobody to leavengood

There is definitely something terrible going on here.

comment:7 by ttcoder, 5 years ago

Cc: ttcoder added

comment:8 by leavengood, 5 years ago

To provide some update, the TestMethod in this DebugApp is the following:

void
DebugClass::TestMethod() {
	do {
		fData = new BMallocIO();
		fMediaFile = new BMediaFile(fData, &fMediaFileFormat);
		fMediaTrack = fMediaFile->CreateTrack(&fInputFormat, &fCodec);
		fMediaFile->CommitHeader();
		fMediaTrack->SetQuality(1.0);
		
		fMediaFile->ReleaseAllTracks();
		fMediaFile->CloseFile();
		delete fMediaFile;
		delete fData;
		snooze(1000);
	}while(true);
}

snooze take microseconds, so this only "pauses" for 1 millisecond, and if you run this code as is, it will quickly crash Haiku, at least it did for me in a VM with 2 GB of RAM.

Switching to 1000000 (1 second) for the snooze helps, but each iteration adds to memory usage by quite a bit.

fMediaFileFormat and fInputFormat are set up to be raw video in avi files with a mpeg4 codec.

At the suggestion of waddlesplash I added code to libroot to log memory allocations including the image and symbol of the caller, and I ran that with this DebugApp for a few seconds. The amount of data is hard to describe, but it is a lot (many MBs of text.) I think I ran this against the version with the short sleep and that is probably why I get so much data. I just did a quick look and I think there are 109 iterations of this loop in my test run, and the resulting "log" of the output for memory usage is a 14 MB text file.

I wrote a tool to analyze this data but there is so much memory left "hanging" when you Ctrl-C this app that it is hard to tell what the leaks are. I will try running my memory tracker again with a version of this which only does a few loops and then exits, so I get less memory usage left which is not truly leaks. Now that I say this it sounds obvious but I was getting tired on Saturday night when I was looking at this :)

A lot of stuff happens behind the scenes when you set up a BMediaFile and a BMediaTrack, including loading the needed plugins, which in this case is ffmpeg. It is also unloaded when those are deleted, so each iteration of the above loop loads, then unloads ffmpeg. Every time you load ffmpeg, it loads the gnutls library, and that seems to have a lot of set up, and I see many, many calls to asn1_perror and asn1_find_node from the libtasn1 library, which I assume is due to some bugs in our Haiku port of this library. Or maybe not, it is hard to tell, but asn1_perror is supposed to log an error to stderr but I don't see anything logged. And it might be fine that these do a lot of small allocations when gnutls is set up. Overall this should be looked into, but is certainly not the core problem here.

Running this code in a loop like this is certainly a pathological case, but I am still shocked how much happens behind the scenes.

What also might be important here is that this code does not use the BMediaRoster, and any leaking it does might be from the assumption that BMediaRoster will always be there to clean up when a program is done using the Media Kit. Again, I will know more when I find the source of leaks here, which hopefully I will. But I am not exaggerating when I say each iteration of this loop probably has many 1000s of allocations, so it is tricky to sort through.

As one final concern, the leak I find here, if any, might not be the one generally plaguing us for normal media playback, though I sure hope it is the same...

comment:9 by pulkomandy, 5 years ago

As far as I know, the decoding/encoding part of the media quit is largely independant from the other things, so BMediaRoster shouldn't be needed (nor do you need the media_server) for this.

To rule out ffmpeg, blacklist the ffmpeg plugin and make tests with a simpler one, however I don't know if we have a plugin that supports writing without using ffmpeg yet. Something writing raw audio to an au, wav or aiff file shouldn't be too hard. Then we will know if we need to closely review the BMediaFile code, or rather dig into the ffmpeg plugin and library.

comment:10 by leavengood, 5 years ago

Yes, I have been considering that approach to reduce what I have to look at. Thanks for the pointer on potential file formats which don't use ffmpeg. It should not be hard to adjust this code for that.

in reply to:  8 comment:11 by ttcoder, 5 years ago

Cc: mmlr added

Replying to leavengood:

I wrote a tool to analyze this data but there is so much memory left "hanging" when you Ctrl-C

Ah, Ctrl-C.... This might be the reason that leak_analyser.sh has not worked for you so far; @mmlr would know for sure (Cc'ing him), but interrupting the program this way might behave the same as if it crashed, as far as the guarded heap's output is concerned. And so it will appear as if all resources have been freed "correctly" (as discussed off ticket), with no leak analysis performed at all.

this app that it is hard to tell what the leaks are. I will try running my memory tracker again with a version of this which only does a few loops and then exits, so I get less memory usage left which is not truly leaks.

Right, replacing the infinite loop with a simple for() loop on a few iterations might allow the guarded heap's "epilogue" functions to be called properly, and output results so that leak_analyser.sh can be brought to bear and pinpoint leaks directly. This might end up being a significant shortcut from the other methods (patching libroot and so on).

A lot of stuff happens behind the scenes when you set up a BMediaFile and a BMediaTrack, including loading the needed plugins, which in this case is ffmpeg. It is also unloaded when those are deleted, so each iteration of the above loop loads, then unloads ffmpeg.

Which is probably going to construct/destruct lots of heavy objects (might even include the AddOnManager ..etc).

Running this code in a loop like this is certainly a pathological case, but I am still shocked how much happens behind the scenes.

Well, MediaPlayer (or SoundPlay) by design is going to create a BMediaFile/Track for each mp3 it wants to play (or keep the object around but call its SetTo() method for each file, which amounts to the same thing) so this looks good to me, seems you found a shortcut test for simulating a recording (or playback) test in an accelerated fashion :-)

What also might be important here is that this code does not use the BMediaRoster, and any leaking it does might be from the assumption that BMediaRoster will always be there to clean up when a program is done using the Media Kit.

IIRC the roster handles only the BMediaNode side of things (cleans up time sources related to BSoundPlayer etc), but BMediaFile is on the other side of the "fence" so to speak, compared to BSoundPlayer.

Again, I will know more when I find the source of leaks here, which hopefully I will. But I am not exaggerating when I say each iteration of this loop probably has many 1000s of allocations, so it is tricky to sort through.

If you end up using leak_analyser.sh (and maybe a short for() loop is all it will take to allow for it) you'll see things are much easier :-) though it still will give you a headache or ten going through the (reduced) output. Back when I had done that the results hinted at AddonManager and stuff, but YMMV, we'll see.

comment:12 by ttcoder, 5 years ago

Cc: mmlr removed

comment:13 by pulkomandy, 4 years ago

So, for reference here is the leak analyzer output:

allocation: base: 0xd62dc80; size: 892; thread: 1685; alignment: 64
        <libroot.so> panic(char const *,...) + 0x4206 (nearest)
        <libroot.so> panic(char const *,...) + 0x46d9 (nearest)
        <libroot.so> panic(char const *,...) + 0x5bab (nearest)
        <libroot.so> memalign + 0x29
        <libroot.so> posix_memalign + 0x5e
        0x172636e (lookup failed: Invalid Argument)
        0x1b259d4 (lookup failed: Invalid Argument)
        <libmedia.so> BPrivate::media::MediaWriter::CreateEncoder(BPrivate::media::Encoder **, media_codec_info const *, media_format *, unsigned long) + 0x9d
        <libmedia.so> BMediaTrack::BMediaTrack(BPrivate::media::MediaWriter *, long, media_format *, media_codec_info const *) + 0xbd
        <libmedia.so> BMediaFile::CreateTrack(media_format *, media_codec_info const *, unsigned long) + 0xb5
        <main> DebugClass::TestMethod(void) + 0x102
        <main> main + 0x6c
        <main> _start + 0x5b
        0x21037f0 (lookup failed: Invalid Argument)
        0x655e8258 (lookup failed: Invalid Argument)

Unfortunately, the interesting part of it is missing, because by the time the leak analysis is done, the ffmpeg plugin is already unloaded. And so we get the two "lookup failed" entries in the middle of the backtrace...

The posix_memalign is probably called from inside av_malloc. But I didn't spot any obvious place where we forgot to free something.

comment:14 by ttcoder, 4 years ago

Thanks for explaining the reason for the "panic/nearest" lines, it always gnawed at me not understanding why those were missing, makes sense now :-). Looks like hrev54495 helps with this. One more (last ?) leak fixed.

comment:15 by pulkomandy, 4 years ago

Well this change only helps in an error case, I don't think it will make the situation actually better.

For the test program to actually work properly you need to set a colorspace. It's trying to get one from BScreen but it can't because there is no BApplication. I just changed the constructor for fColor_space = B_RGB32;

My explanation was for these two lines:

        0x172636e (lookup failed: Invalid Argument)
        0x1b259d4 (lookup failed: Invalid Argument)

I don't know about the "nearest" ones, but it doesn't really matter, these are inside the memory allocator and it's not really important which path is used there.

I tried to debug with Debugger (setting a breakpoint in posix_memalign) as well but it seems even there I can't reliably get a backtrace inside ffmpeg, making it harder to track where allocations happen.

comment:16 by jackburton, 2 years ago

Blocking: 18160 added

comment:17 by pulkomandy, 2 years ago

I am on a 64 bit install now so ffmpeg backtraces seem to be working. And the leak reproduces here as well.

I have made a simple test consisting of one single loop of the provided test program, and I have added manual dumps of the heap before and after, using the heap_debug_dump_allocations function.

This way it's easy to see the leaked memory, there are 8 differences:

  • Two allocations of 104 bytes
  • Two allocations of 40 bytes
  • Three allocations of 1064 bytes
  • One allocation of 10792 bytes

After each repeptition, the allocations of 104 and 40 bytes do not reappear, but the other ones do:

  • Three allocations of 1064 bytes
  • One allocation of 10792 bytes

So, this is our leak. Now to identify which structure or buffer these sizes correspond to.

I have set a breakpoint in mosix_memalign and I am checking the rdx register to compare it to the size of these leaked allocations. I found the 10792 bytes one is from _Setup in AVCodecEncoder, which always re-inits a new ffmpeg context without releasing the previous one. But it can be called multiple times, for example in SetQuality. So I have simply added code to free that context at the start of the function and that leak is gone.

I found also a possible set of smaller leaks caused by dynamically loadd libraries, but I am not sure if it is a big problem or not. For example, libgomp has a pthread_attr_init and I'm not sure if the corresponding pthread_attr_destroy is being called. I'll have to check more closely after I have fixed the larger leaks.

comment:18 by jackburton, 2 years ago

That's great work!

comment:19 by pulkomandy, 2 years ago

According to https://stackoverflow.com/questions/43389411/how-does-one-correctly-avoid-the-avcodec-alloc-context3-leak-from-avformat-new-s it is somewhat normal to have leaks in our situation.

We need to build ffmpeg with a few defines to disable deprecated fields, after we make sure we don't use them. The way this works in ffmpeg is:

  • By default, the deprecated fields are enabled, with a compiler warning (our ffmpeg add-on triggers several of these). The code does its best to populate these fields to keep existing apps working
  • Once the app code has been adjusted, ffmpeg can be recompiled with defines to completely disable the deprecated fields and not populate them (and not define them, so instead of a warning, we get a compiler error becase the field simply does not exist)

Currently our code in the ffmpeg plug-in triggers several of the deprecation warnings. I will solve these. In particular the 1064 bytes one might be exactly the one referred to in that StackOverflow discussion (at least, the size matches).

Once the warnings are solved we can see about adjusting our ffmpeg compilation to remove all these deprecated fields and hopefully it will not break too many other things. Or we can just wait until the next ffmpeg version (already released, just need to package it at haikudepot and upgrade our decoder), which will remove all the deprecated things from previous versions (and add new ones).

comment:20 by jackburton, 2 years ago

I guess we'll need to review very carefully any ffmpeg upgrade, then. At least we should never upgrade it before a release.

comment:21 by pulkomandy, 2 years ago

This bug is pre-existing and has been there for a while. But some of the leaks were (and maybe still are) in ffmpeg with not much we can do about it.

I put a change on Gerrit to enable compiler errors instead of warnings for the deprecated function, let's see if this can work with gcc2 as well, then at least we can make sure to not use deprecated functions.

in reply to:  21 comment:22 by jackburton, 2 years ago

Replying to pulkomandy:

This bug is pre-existing and has been there for a while. But some of the leaks were (and maybe still are) in ffmpeg with not much we can do about it.

I know, I meant from now on.

I put a change on Gerrit to enable compiler errors instead of warnings for the deprecated function, let's see if this can work with gcc2 as well, then at least we can make sure to not use deprecated functions.

Yeah, I guess that's a good start. Nice work, btw

Note: See TracTickets for help on using tickets.