#4954 closed bug (fixed)
Proliferation of BBuffer instances when playing sound
Reported by: | Adek336 | Owned by: | leavengood |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta2 |
Component: | Kits/Media Kit | Version: | R1/alpha1 |
Keywords: | | Cc: | ttcoder |
Blocked By: | | Blocking: | #14755 |
Platform: | All | | |
Description
I've made `_shared_buffer_list::RecycleBuffer` print `info[i].id` in each iteration of the loop and started media_server in the Terminal to watch the output. I found out that each time I executed the sample program from #4920, the number of buffer IDs printed grew by three, which most likely should not be the case. Restarting media_server reset the number of buffers.
Also, after changing `sleep(10)` to `usleep(50000)` and running `while true; do ./the-sample-program; done`, AboutSystem showed the memory usage increase over time. I also got a media_server crash after a while, and another time a KDL.
Although running `while true; do ls; done` also makes the memory usage increase over time, it doesn't induce crashes and doesn't make the number of alive BBuffer instances grow.
Change History (32)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
The ID is system-wide and will always increase with every buffer, so that part is normal.
However, the memory should be freed in any case.
comment:3 by , 15 years ago
Looking into it a bit more, it looks like the whole buffer management in the media server is very inefficient, and currently doesn't really allow deleting buffers when they are no longer needed.
I'm afraid this will need a complete rewrite of that management, and will take some time.
comment:4 by , 15 years ago
On the plus side, the inefficiencies could also very well be causing occasional sound problems (so fixing this would eliminate another one of those causes).
comment:5 by , 10 years ago
Can an admin "Cc" me so that this ticket appears on my query/dashboard? Thanks.
comment:6 by , 10 years ago
Cc: | added |
---|
comment:8 by , 9 years ago
I've continued my work on BBufferGroup and BBuffer in a branch:
https://github.com/Barrett17/haiku/commits/bbuffergroup_refactor
This is an initial cleanup; I've tried to improve efficiency, and it solves the possible problem of the same buffer ID being added to a group multiple times by checking for duplicates. While the final aim is to solve the BBuffer proliferation (by making the reference count system work), I think it's better to submit this first part now rather than accumulate commits.
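For illustration, here is a minimal sketch of that duplicate-ID check against a simplified buffer list. This is not the actual branch code; `BufferEntry` and `AddBufferChecked` are hypothetical names, while `media_buffer_id` and the error constants are real Haiku definitions.

```cpp
#include <MediaDefs.h>
#include <SupportDefs.h>

#include <vector>

struct BufferEntry {
	media_buffer_id id;
	// ownership and reclaim info elided
};

static status_t
AddBufferChecked(std::vector<BufferEntry>& list, media_buffer_id id)
{
	// Adding the same buffer ID twice would leave the group holding two
	// entries for one buffer, so refuse it up front.
	for (size_t i = 0; i < list.size(); i++) {
		if (list[i].id == id)
			return B_BAD_VALUE;
	}
	BufferEntry entry = { id };
	list.push_back(entry);
	return B_OK;
}
```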
comment:9 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | in-progress → assigned |
comment:10 by , 9 years ago
I think this issue is just a duplicate of #4021, so I'm going to close it. Actually the behavior is OK; the problem relates not only to BBuffers but also to other things that aren't freed by the media_server. There's probably a mixture of two different issues, but since it was in-progress maybe Axel has done some investigation before, and in that case I would like to hear about it :-)
comment:11 by , 8 years ago
Resolution: | → not reproducible |
---|---|
Status: | assigned → closed |
Not reproducible in any way for me; a lot has changed in the meantime. If someone can reproduce it, just let me know.
comment:12 by , 6 years ago
Resolution: | not reproducible |
---|---|
Status: | closed → reopened |
comment:13 by , 6 years ago
Blocked By: | 14755 added |
---|
comment:14 by , 6 years ago
I need to test, but I wonder if the media_server just doesn't get notified by the roster when the sample app (and similar command-line apps) quits, and therefore never cleans up resources from that app. Though the test code from #4920 has a BApplication, so it should interact with the roster. The only way to know for sure is to add some logging to the media_server.
But nonetheless I think, for crashes at least, we need a cleanup thread which periodically goes through the resources in the media_server and removes things for teams which are no longer running.
I do not know if there is a cleaner method. If anyone has suggestions, let me know.
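To make the idea concrete, here is a minimal sketch of such a cleanup thread, assuming a hypothetical `gTeamsWithResources` registry and `free_team_resources()` helper; `get_team_info()`, `snooze()`, `spawn_thread()` and `resume_thread()` are real Haiku kernel APIs.

```cpp
#include <OS.h>

#include <set>

static std::set<team_id> gTeamsWithResources;
	// assumed registry of teams we still hold resources for

static void
free_team_resources(team_id team)
{
	// hypothetical: release buffers/areas still held for this team
}

static int32
cleanup_loop(void*)
{
	for (;;) {
		snooze(5 * 1000000LL); // check every 5 seconds
		for (std::set<team_id>::iterator it = gTeamsWithResources.begin();
				it != gTeamsWithResources.end();) {
			team_info info;
			if (get_team_info(*it, &info) != B_OK) {
				// The team no longer exists: reclaim what it left behind.
				free_team_resources(*it);
				gTeamsWithResources.erase(it++);
			} else
				++it;
		}
	}
	return 0;
}

// Started once at media_server launch:
// thread_id thread = spawn_thread(cleanup_loop, "cleanup", B_LOW_PRIORITY, NULL);
// resume_thread(thread);
```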
I may also run BeOS and see what threads are in their media_server to see if there is an obvious cleanup one.
I think this is also the cause of #14755.
Also, as mentioned, buffer management needs work.
comment:15 by , 6 years ago
I just booted up a BeOS R5 VM. For what it is worth, the BeOS media_server has a thread called "dirty_work", which is a clever thread name, as usual, and sure sounds like a resource cleanup thread to me.
That is the only thread besides the media_server one in the actual media_server. The media_addon_server seems to have everything else.
I suppose I am doing research that others already did many years ago.
Yet I wonder why no one has worked on a cleanup thread. I guess lack of time or interest.
comment:16 by , 6 years ago
OK, it seems we did have a cleanup thread before, cleverly called "big brother is watching you". But it was removed by Dario in January 2016. Why?
I think you broke the behavior with this, Dario:
I don't think you can count on the roster if apps crash.
Maybe ping/pong is annoying, but I think we still need something which cleans up teams when they stop, without counting on the roster to tell us.
comment:17 by , 6 years ago
I disagree; I think this is the right thing to do. Show that this isn't working before naming and shaming. It would be nice to leave that behind already.
Second, having timer-based things is not good for laptops. We are trying to remove them everywhere, and if the Roster does its job correctly we should not need it. I just removed another place where we relied on a timer thread that kept the CPU from going to sleep. And the network stack was fixed to not poll at 1000Hz. It's better to make the current solution robust and be able to use our laptops a full day.
comment:18 by , 6 years ago
Ryan, I told you in private that I'd have helped you fix the media kit bugs. I told you that even though you behaved so badly with the contract, I'd have left everything behind. I'm behaving like a perfect Christian even though I've been an atheist for as long as I can remember being on this planet. Either you stop treating me like shit, or I'm done.
comment:19 by , 6 years ago
It seems to be a regression. I will prove it, one way or the other. Not to prove anyone right or wrong, but just so we have some solid understanding of what is going wrong. I do not think it is the only source of buffer management issues.
If the Roster cannot be relied upon, then we can use _start_watching_system. Though obviously, making sure the Roster works right is the best approach.
I wonder if maybe no one has retested #4021, which is an old bug and so cannot be blamed on a recent change. But then we also have #14755, which is only 3 months old.
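For reference, here is a minimal sketch of the roster-based watching being debated in these comments. `BRoster::StartWatching()`, `B_REQUEST_QUIT`, `B_SOME_APP_QUIT` and the `be:team` message field are public BeOS/Haiku API; the cleanup call is a hypothetical placeholder.

```cpp
#include <Looper.h>
#include <Message.h>
#include <Messenger.h>
#include <Roster.h>

class TeamWatcher : public BLooper {
public:
	TeamWatcher()
		:
		BLooper("team watcher")
	{
		// Ask the registrar to notify us whenever any app quits.
		be_roster->StartWatching(BMessenger(this), B_REQUEST_QUIT);
	}

	virtual void MessageReceived(BMessage* message)
	{
		if (message->what == B_SOME_APP_QUIT) {
			team_id team;
			if (message->FindInt32("be:team", &team) == B_OK) {
				// Hypothetical: drop everything this team still holds.
				// free_team_resources(team);
			}
		} else
			BLooper::MessageReceived(message);
	}
};
```

Note that this only works if the registrar actually sends the quit notification, which is exactly what is in question when an app crashes instead of quitting cleanly.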
comment:20 by , 6 years ago
Now, politely, let's reply on the technical part.
First of all, I don't know how you can believe that a ticket opened 9 (!?!?) years ago can be caused by a change made 3 years ago.
Second, investigating the media_server is not going to give you any clue, because we have a completely different implementation in about all aspects except the public API.
Third, regressions can always happen. But you don't have to blame the people who did the work; you are not god, and you are not always going to make perfect changes.
comment:21 by , 6 years ago
Let's spend the energy on finding what is causing this, and see how we can fix it.
follow-up: 24 comment:22 by , 6 years ago
Many, many times in my career I have asked coworkers why they made a change, and that it may have broken something. That is totally reasonable, and most people don't freak out about it. I think you both are overreacting.
Did I call Dario names? Did I claim he was stupid? No, I asked about a change and said that I think it might have broken something. Why? To understand why the change was made.
Here I am, willing to spend time to work on Media Kit, when no one but Dario will, and you both attack me. Please stop the negativity.
Also I think we need fresh eyes on this, so let me look with those fresh eyes. I may see something other people don't. And that may include investigating the BeOS media_server a bit.
comment:23 by , 6 years ago
Ryan, no one is overreacting. It's disrespectful to behave this way; that has already been said.
_start_watching_system is already used by the registrar, so you are not going to solve the issue. And I think this syscall is going to be quite heavy, so not a good idea anyway.
Have fun looking at how the BeOS media_server works ;). I gave you a piece of advice; you can choose whether to accept it or not. The question is, do you want to accept my advice on where the issue may be, or not?
comment:24 by , 6 years ago
Replying to leavengood:

> Many, many times in my career I have asked coworkers why they made a change, and that it may have broken something. That is totally reasonable, and most people don't freak out about it. I think you both are overreacting.

I require some due diligence when developing. Maybe it's too high a level for you, but checking the ticket date is a minimum.
However, now I realize that you are continuing to trust "someone" else's position on that.
follow-up: 26 comment:25 by , 6 years ago
> _start_watching_system is already used by the registrar, so you are not going to solve the issue.

Yes, it was added by tqh rather recently in hrev52796. A cursory look reveals that it should likely be sending notifications for this, but that code is behind quite a lot of branches and I didn't test it just now.

> And I think this syscall is going to be quite heavy, so not a good idea anyway.

This only shows you do not know what you are talking about here. The syscall implementation simply calls SystemNotificationService::StartListening, which is not very "heavy" at all; it just creates a listener and inserts it into a doubly-linked list. In fact:

```
[ 1908] _kern_start_watching_system(0xffffffff, 0x1f, 0xf7, 0x0) = 0x00000000 No error (3 us)
```

That's right, 3 microseconds! This is even faster than a write() to stdout (5-6 us, depending on length).

And of course one only needs to call it once; after that you simply receive messages about whatever events occurred on the port specified, same as with a BRoster watcher.

Actually it is extremely funny that you write this, for in your very next comment:

> I require some due diligence when developing. Maybe it's too high a level for you, but

And of course you did none of this due diligence yourself when speaking of _start_watching_system.

> However, now I realize that you are continuing to trust "someone" else's position on that.

And there you go with the conspiracy theories again...
comment:26 by , 6 years ago
Replying to waddlesplash:

> > _start_watching_system is already used by the registrar, so you are not going to solve the issue.
>
> Yes, it was added by tqh rather recently in hrev52796. A cursory look reveals that it should likely be sending notifications for this, but that code is behind quite a lot of branches and I didn't test it just now.

Yeah, please lose time on stuff that I already checked! This is teamwork! :-D

> Actually it is extremely funny that you write this, for in your very next comment:

Actually it is very funny seeing you red hot because I'm still around. Yesterday I was in bed when replying, and for some reason I thought that syscall would have subscribed me automatically to a set of events. That's why I said that. Of course it doesn't, but that doesn't change the outcome of the argument at all.
If anything, are you implying that the Haiku design is overall bad, and so the registrar could be replaced by this syscall?

> And there you go with the conspiracy theories again...

Can you please stop annoying me? :)
comment:27 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | reopened → assigned |
comment:28 by , 5 years ago
Owner: | changed from | to
---|
I have finally begun investigating this. I am slowly getting a handle on it but obviously it is tricky. We still get 3 new buffers in the media_addon_server every time MediaPlayer is opened, and I assume the same would happen for the sample program (I will try that too.)
I also see an "orphaned" buffer in the media_server each time MediaPlayer starts. I wrote a small script which calls listarea for media_server and media_addon_server and greps for the cloned buffers they each hold, and the 1 and 3 get added every time.
I believe a BBufferProducer subclass within the media_addon_server team is creating these buffers whenever a program needs to play sound. The buffers are all part of the same area, which indicates they are probably part of the same buffer group. SoundPlayNode::AllocateBuffers actually creates a BBufferGroup of this size, so it may be the culprit, but I need to do more debugging.
I am not yet sure about the extra buffers in media_server.
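For illustration, here is a sketch of the kind of allocation `SoundPlayNode::AllocateBuffers` performs; the `BBufferGroup` constructor and `InitCheck()` are real public API, but the logic here is a simplified stand-in, not the actual node code.

```cpp
#include <BufferGroup.h>
#include <MediaDefs.h>

// Illustrative: create a group of three buffers of the negotiated size,
// matching the "3 new buffers per player" observed above. The group
// backs all of its buffers with a single shared area.
static BBufferGroup*
AllocateBuffers(const media_format& format)
{
	size_t size = format.u.raw_audio.buffer_size;
	BBufferGroup* group = new BBufferGroup(size, 3);
	if (group->InitCheck() != B_OK) {
		// Creating the shared area failed; don't hand out a broken group.
		delete group;
		return NULL;
	}
	return group;
}
```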
comment:29 by , 5 years ago
I have found the source of this bug and have a fix up at https://review.haiku-os.org/c/haiku/+/1717. In retrospect it seems obvious, but this was the bug no one could track down for a decade. It took me about 8 hours to debug it, but only because I didn't know the Media Kit code at all when I started.
comment:30 by , 5 years ago
Congrats! You got the right motivation. But next time you could have used the time machine to prevent it ;)
comment:31 by , 5 years ago
Blocked By: | 14755 removed |
---|---|
Blocking: | 14755 added |
Resolution: | → fixed |
Status: | assigned → closed |
Fix merged in hrev53379.
comment:32 by , 5 years ago
Milestone: | R1 → R1/beta2 |
---|
Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone