Opened 10 years ago

Last modified 6 weeks ago

#4954 assigned bug

Proliferation of BBuffer instances when playing sound

Reported by: Adek336 Owned by: nobody
Priority: normal Milestone: R1
Component: Kits/Media Kit Version: R1/alpha1
Keywords: Cc: ttcoder
Blocked By: #14755 Blocking:
Has a Patch: no Platform: All

Description

I've made _shared_buffer_list::RecycleBuffer print info[i].id in each iteration of the loop and started media_server in the Terminal to watch the output. I found out that each time I executed the sample program from #4920, the number of buffer ids printed grew by three, which most likely should not be the case. Restarting media_server reset the number of buffers.

Also, after changing sleep(10) to usleep(50000) and running

while true; do ./the-sample-program; done

AboutSystem showed the memory usage increase over time. I also got a media_server crash after a while, and another time a KDL.

Although running

while true; do ls; done

also makes the memory usage increase over time, it doesn't induce crashes and doesn't make the number of alive BBuffer instances grow.

Change History (27)

comment:1 Changed 10 years ago by Adek336

Memory is not recovered after media_server crash.

comment:2 Changed 9 years ago by axeld

Owner: changed from marcusoverhagen to axeld
Status: newassigned

The ID is system wide, and will always increase with every buffer - so that part is normal.

However, the memory should be freed in any case.

comment:3 Changed 9 years ago by axeld

Looking into it a bit more, and it looks like the whole buffer management in the media server is very inefficient, and currently doesn't really allow to delete buffers when they are no longer needed.

I'm afraid this will need a complete rewrite of that management, and will take some time.

comment:4 Changed 9 years ago by axeld

On the plus side, the inefficiencies could also very well causing occasional sound problems (so another one of those causes would be down).

comment:5 Changed 4 years ago by ttcoder

Can an admin "Cc" me so that this ticket appears on my query/dashboard thanks

comment:6 Changed 4 years ago by waddlesplash

Cc: ttcoder added

comment:7 Changed 4 years ago by pulkomandy

Possibly fixed in hrev49035. Please test and report.

comment:8 Changed 4 years ago by Barrett

I've continued my work on BBufferGroup and BBuffer in a branch :

https://github.com/Barrett17/haiku/commits/bbuffergroup_refactor

This is an initial cleanup, i've tried to improve efficiency and it's solving the possible problem of adding the same buffer id multiple times in a group by checking it. While the final aim would be to solve the BBuffer proliferation (by making the reference count system to work) i think it's better to submit this first part instead to cumulate commits.

comment:9 Changed 4 years ago by Barrett

Owner: changed from axeld to Barrett
Status: in-progressassigned

comment:10 Changed 4 years ago by Barrett

I think this issue is just a duplicate of #4021, so i'm going to close it. Actually the behavior is OK, the problem relates not only BBuffers but also other things that aren't freed by the media_server. There's probably a mixture of two different issues, but since it was in-progress maybe axel has done some investigation before and in that case i would like to hear :-)

comment:11 Changed 3 years ago by Barrett

Resolution: not reproducible
Status: assignedclosed

Not reproducible in any way for me, a lot changed in the meantime. In case someone can reproduce it just let me know.

comment:12 Changed 5 months ago by Barrett

Resolution: not reproducible
Status: closedreopened

comment:13 Changed 5 months ago by Barrett

Blocked By: 14755 added

comment:14 Changed 2 months ago by leavengood

I need to test but wonder if the media_server just doesn't get notified by the roster when the sample app (and similar command-line apps) quit? Therefore it never cleans up resources from that app. Though the test code from #4920 has a BApplication so should interact with the roster. The only way to know for sure is add some logging to the media_server.

But nonetheless I think for crashes at least we need a cleanup thread which periodically goes through the resources in media_server and removes things for teams which are no longer running.

I do not know if there is a cleaner method. If anyone has suggestions let me know.

I may also run BeOS and see what threads are in their media_server to see if there is an obvious cleanup one.

I think this is also the cause of #14755.

Also as mentioned buffer management needs work.

comment:15 Changed 2 months ago by leavengood

I just booted up a BeOS R5 VM. For what it is worth, the BeOS media_server has a thread called "dirty_work", which is a clever thread name, as usual, and sure sounds like a resource cleanup thread to me.

That is the only thread besides the media_server one in the actual media_server. The media_addon_server seems to have everything else.

I suppose I am doing research that others already did many years ago.

Yet I wonder why no one worked on a cleanup thread yet. I guess lack of time or interest.

comment:16 Changed 2 months ago by leavengood

OK, it seems we did have a cleanup thread before, cleverly called "big brother is watching you". But it was removed by Dario in January 2016. Why?

I think you broke the behavior with this Dario:

https://github.com/haiku/haiku/commit/ec02769a58bb58b4bb20394d6f8107a3c53de15f#diff-04c52546aa8723768d7946189d7660b2

I don't think you can count on the roster if apps crash.

Maybe ping/pong is annoying, but I think we still need something which cleans up teams when they stop, without counting on the roster to tell us.

comment:17 Changed 2 months ago by tqh

I disagree, I think this is the right thing to do. Show that this isn't working before naming and shaming. It would be nice to leave that behind already.

Second, having timer based things are not good for laptops. We are trying to remove them everywhere, and if Roster does it job correctly we should not need it. I just removed another place were we relied on a timer thread keep the CPU from going into sleep. And the network stack was fixed to not poll at 1000Hz. It's better to make the current solution robust and be able to use our laptops a full day..

comment:18 Changed 2 months ago by Barrett

Ryan, I told you in private that I'd have helped you fixing the media kit bugs. I told you that even if you behaved so badly with the contract, I'd have leaved everything behind. I'm behaving like a perfect christian even if I'm atheist since I remember being in this planet. Either you stop treating me like shit, or I'm done.

comment:19 Changed 2 months ago by leavengood

It seems to be a regression. I will prove it, one way or the other. Not to prove anyone right or wrong, just so we have some solid understand of what is going wrong. I do not think it is the only source of buffer management issues.

If the Roster cannot be relied upon then we can use _start_watching_system. Though obviously making sure the Roster works right is the best approach.

I would wonder if maybe no one has retested #4021, which is an old bug so cannot be blamed on a recent change. But then we also have #14755, which is only 3 months old.

Last edited 2 months ago by leavengood (previous) (diff)

comment:20 Changed 2 months ago by Barrett

Now, politely let's reply on the technical part.

Before all, I don't know how you can believe that a ticket opened 9 (!?!?) years ago can be caused by a change done 3 years ago.

Second, investigating the media_server is not going to give you any clue, because we have a completely different implementation in about all aspects except public API.

Third, regressions can always happen. But you don't have to blame the people who did the work, you are not god, you are not going to make always perfect changes.

comment:21 Changed 2 months ago by tqh

Lets spend the energy on finding what is causing this, and see how we can fix this.

comment:22 Changed 2 months ago by leavengood

Many, many times in my career I have asked coworkers why they made a change, and that it may have broken something. That is totally reasonable, and most people don't freak out about it. I think you both are overreacting.

Did I call Dario names? Did I claim he was stupid? No, I asked about a change, and that I think it might have broken something. Why? To understand why the change was made.

Here I am, willing to spend time to work on Media Kit, when no one but Dario will, and you both attack me. Please stop the negativity.

Also I think we need fresh eyes on this, so let me look with those fresh eyes. I may see something other people don't. And that may include investigating the BeOS media_server a bit.

comment:23 Changed 2 months ago by Barrett

Ryan, no one is overreacting. It's unrespectful to behave this way, it has been already said.

_start_watching_system is already used by the registrar so you are not going to solve the issue. And I think this syscall is going to be quite heavy, so not a good idea anyway.

Have fun looking at how the BeOS media_server works ;), I gave you an advice, you can choose if to accept it or not. The question is, do you want to accept my advices on where the issue may be or not?

comment:24 in reply to:  22 Changed 2 months ago by Barrett

Replying to leavengood:

Many, many times in my career I have asked coworkers why they made a change, and that it may have broken something. That is totally reasonable, and most people don't freak out about it. I think you both are overreacting.

I require some due diligence when developing. Maybe it's a too high level for you, but checking the ticket date is a minimum.

However now I realize that you are continuing to trust "someone" else position on that.

comment:25 Changed 2 months ago by waddlesplash

_start_watching_system is already used by the registrar so you are not going to solve the issue.

Yes, it was added by tqh rather recently in hrev52796. A cursory look reveals that it should likely be sending notifications for this, but that code is behind quite a lot of branches and I didn't test it just now.

And I think this syscall is going to be quite heavy, so not a good idea anyway.

This only shows you do not know what you are talking about here. The syscall implementation simply calls SystemNotificationService::StartListening, which is not very "heavy" at all; it just creates a listener and inserts it into a doubly-linked list. In fact:

[  1908] _kern_start_watching_system(0xffffffff, 0x1f, 0xf7, 0x0) = 0x00000000 No error (3 us)

That's right, 3 microseconds! This is even faster than a write() to stdout (5-6 us, depending on length.)

And of course one only need to call it once; after that you simply receive messages about whatever events occurred to the port specified, same as with a BRoster watcher.

Actually it is extremely funny that you write this, for in your very next comment:

I require some due diligence when developing. Maybe it's a too high level for you, but

And of course you did none of this due diligence yourself when speaking of _start_watching_system.

However now I realize that you are continuing to trust "someone" else position on that.

And there you go with the conspiracy theories again...

comment:26 in reply to:  25 Changed 2 months ago by Barrett

Replying to waddlesplash:

_start_watching_system is already used by the registrar so you are not going to solve the issue.

Yes, it was added by tqh rather recently in hrev52796. A cursory look reveals that it should likely be sending notifications for this, but that code is behind quite a lot of branches and I didn't test it just now.

Yeah please lose time on stuff that I already checked! This is teamwork! :-D

Actually it is extremely funny that you write this, for in your very next comment:

Actually it is very funny seeing you red hot because I'm still there. Yesterday I've been in bed when replying...very asleep, and for some reason I thought that syscall would have subscribed automatically to a set of events. That's why I said that. Of course it doesn't, but this doesn't change the outcome of the argument at all.

Except are you implying that the Haiku design is overall bad, and so the registrar could be replaced by this syscall?

And there you go with the conspiracy theories again...

Can you please stop annoying me? :)

Last edited 2 months ago by Barrett (previous) (diff)

comment:27 Changed 6 weeks ago by korli

Owner: changed from Barrett to nobody
Status: reopenedassigned
Note: See TracTickets for help on using tickets.