#12286 closed bug (fixed)
Menus seem to be leaking ports
Reported by: | ttcoder | Owned by: | waddlesplash |
---|---|---|---|
Priority: | blocker | Milestone: | R1/beta1 |
Component: | Kits/Application Kit | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
As a follow up to #11342:comment:6 I ended up watching a terminal looping on sysinfo -ports
while doing various things in Haiku. While doing so, I noticed the leak is not specific to W+ or qupzilla, it occurs even when opening menus from the deskbar, terminal ..etc.
The leaked ports seem to be returned to the system once the offending application is quit and relaunched (e.g. hey Deskbar quit
then the three finger salute and then "restart desktop").
Anyone can confirm it's not a quirk happening just on this setup (hrev49480), does it happen for others too ? Is the growing number of used ports a bug ?
Change History (32)
comment:1 by , 9 years ago
comment:3 by , 9 years ago
Running strace
on the testcase from #9576 doesn't show any ports being created when a menu is opened (or closed), and I don't see any ports that look suspicious / unfreed. So this is not a bug in the Interface Kit.
I did a bit of poking around in app_server (the next likely culprit) but found nothing...
comment:4 by , 9 years ago
Hmmm.. Opening a menu/submenu should create a BWindow and its inherited BLooper and its port, does the fMsgPort
member show up in strace's output ?
Anyway just in case, I looked at this recent change in Looper.cpp and Application.cpp but could not find a/the problem
comment:5 by , 9 years ago
BMenu caches (some of) its windows, so if you open a BMenu it won't necessarily create a new BWindow.
comment:6 by , 9 years ago
BMenu caches (some of) its windows, so if you open a BMenu it won't always create a new BWindow.
Right, but in this case, one port is leaked every time you open a menu, even if it's the same menu (from the same menu bar, etc.)
comment:7 by , 9 years ago
It would be useful to look more closely at what is leaked. Is there a thread running for the menu? Which hrev did this appear in first?
comment:8 by , 9 years ago
Well playing with listports
is fairly interesting: it seems the faulty ports are not listed. Maybe there is actually no port "leak" as such, and the kernel is just getting confused, i.e. there is a variable which cashes for efficiency reasons the "size" (length, whatever) of the port list, and that variable gets out of sync with the actual list contents in some circumstances, leading into an "artificial" exhaustion in the end?
To wit:
~/Desktop> sysinfo -ports 3415 ports free (used/max 681 / 4096) ~/Desktop> listport | wc 426 1514 20563
I don't feel like counting how many lines there in are the listports output, but we can say confidently that 426 lines are NOT going to list 681 ports :-) (and indeed a cursory exmination of the output shows nothing out of the ordinary: the app_server has a short listing of ports, Terminal's listing of ports is short and does NOT change much or at all ..etc)
Especially since the listport output does not change (!) even after the ports has been incremented by 5 (in this test I have Qupzilla launched, hence the bigger figures):
~/Desktop> listport | wc 396 1407 19175 ~/Desktop> sysinfo -ports 2641 ports free (used/max 1455 / 4096) ~/Desktop> listport | wc 396 1407 19175 ~/Desktop> sysinfo -ports 2636 ports free (used/max 1460 / 4096)
Or maybe it's the other way around and it's listport
which is broken ?
....
By the way, first saw that when installing a nightly (fresh install) of 49480 in place of a 49172, which could run Qupzilla for a full afternoon without problem, never ran out of ports. Pretty wide range still..
comment:9 by , 9 years ago
sysinfo
just prints a member of a struct retrieved here: http://cgit.haiku-os.org/haiku/tree/src/bin/sysinfo.cpp#n820
listport
is fairly straightforward too: http://cgit.haiku-os.org/haiku/tree/src/bin/listport.c#n80
EDIT: the relative absence of creation/destruction of ports reported by listport
seems consistent with comment:3 . Thus it would be sysinfo
which is at odds with both listport and strace no ?
follow-up: 13 comment:10 by , 9 years ago
Not sure whether this is a good idea (and/or BeOS compatible), but listport
(i.e. get_next_port_info()
) omits closed ports. So, you may get:
$ listport 532 port: total: 4096, used: 176, left: 3920 TEAM 532 (/boot/system/apps/Terminal): ID name capacity queued ---------------------------------------------------- 352 rAppLooperPort 100 0 354 tmp_rport1 1 0 355 tmp_rport2 1 0 357 a<app_server 100 0 358 a<532:x-vnd.Haiku-Terminal 50 0 359 200 0 360 w<app_server 200 1 353 tmp_rport0 1 0
but the kernel debugger command will show them all:
kdebug> ports 532 port id cap read-cnt write-cnt total team name 0xd1223e60 352 100 0 100 4 532 rAppLooperPort 0xd12235a0 353 1 0 1 2247 532 tmp_rport0 0xd1223460 354 1 0 1 0 532 tmp_rport1 0xd12233c0 355 1 0 1 0 532 tmp_rport2 0xd1223280 357 100 0 100 584 532 a<app_server 0xd12231e0 358 50 0 50 594 532 a<532:x-vnd.Haiku-Terminal 0xd12230a0 359 200 0 200 13465 532 0xd1223000 360 200 0 200 3283 532 w<app_server 0xce4d3ab0 385 0 2 198 7 532 mbar cached menu 0xce4d38d0 388 0 1 199 7 532 mbar cached menu 0xce4d3830 391 0 2 198 11 532 mbar cached menu 0xce4d36f0 394 0 2 198 30 532 mbar cached menu 0xce4d3290 397 0 2 198 7 532 mbar cached menu 0xce6883c8 400 0 2 198 6 532 mbar cached menu 0xce4d3470 403 0 2 198 7 532 mbar cached menu 0xce688148 406 0 2 198 7 532 mbar cached menu 0xd1256dd8 409 0 2 198 7 532 mbar cached menu 0xd1256e78 412 0 2 198 6 532 mbar cached menu
Capacity 0 means closed.
So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.
follow-up: 12 comment:11 by , 9 years ago
Great collaborative work :-) Puts me back on track to continue the hunt, even surprised myself that I did find a few things.
The mbar cached menu
port name might come from this (not certain): http://cgit.haiku-os.org/haiku/tree/src/kits/interface/Menu.cpp#n2537
If so, the handling of fCachedMenuWindow
seems to be okay so that oriented me to the theory of a higher level leak, i.e. the whole BMenu being leaked (and thus also its fCachedMenuWindow being leaked).
So I ran Michael's leak_analyser.sh against prefs/Mouse (Terminal takes too long to run with the guarded heap) and it seems there are tons of leaks. One can navigate through them by "diff'ing" between two sessions though. If I open the top menufield in the second session (but not the first) I see 3 new leaks appear. None of them are a BMenu, but the first one seems worth addressing immediately, I will leave it to you folks to confirm:
allocation: base: 0x45c37e8; size: 2072; thread: 2308; alignment: 1 <libroot.so> __wmemset + 0x64e (nearest) <libroot.so> __wmemset + 0xa95 (nearest) <libroot.so> memalign + 0x33 <libroot.so> malloc + 0x26 <libroot.so> __allocate_pthread + 0x21 <libroot.so> spawn_thread + 0x23 <libbe.so> BPrivate::MouseDownThread<BMenuField>::Go(void) + 0x2d <libbe.so> BPrivate::MouseDownThread<BMenuField>::TrackMouse(BMenuField *, void (BMenuField::*)(BMenuField *, BPoint), void (BMenuField::*)(BMenuField *, BPoint, unsigned long), long long) + 0x53 <libbe.so> BMenuField::MouseDown(BPoint) + 0x10b <libbe.so> BWindow::DispatchMessage(BMessage *, BHandler *) + 0xd47 <libbe.so> BWindow::task_looper(void) + 0x290
Is it worth filing another ticket for that leaked thread ?
EDIT: for cinfirming the leak/or not, might be worth mentionning more data: the second leak item is this
allocation: base: 0x45e1fc8; size: 56; thread: 2308; alignment: 1 (..) <libroot.so> __builtin_new + 0x29 <libbe.so> BPrivate::MouseDownThread<BMenuField>::TrackMouse(BMenuField *, void (BMenuField::*)(BMenuField *, BPoint), void (BMenuField::*)(BMenuField *, BPoint, unsigned long), long long) + 0x1e
And looking at the code, there is indeed an operator-new here but the corresponding delete in Track() is never reached because of the kill_thread() here ;
EDIT2: Errr ok, in fact it seems Track() does terminate, and does not crash in ASSERT() simply because libbe.so is built in release mode (?), and (get this), the dtor is called but interrupts itself before actually freeing the resources, because kill_thread() actually kills itself. Am I making sense at all? Dunno.
So we do leak the MouseDownThread object whenever the menu is opened and closed, right ?
Once the above is dealt with I'll return to the supposed BMenu leak.. Or maybe before embarking on that, it would be interesting compile libbe.so with this define commented out and see if the port leak disappears?
comment:12 by , 9 years ago
Replying to ttcoder:
EDIT2: Errr ok, in fact it seems Track() does terminate, and does not crash in ASSERT() simply because libbe.so is built in release mode (?), and (get this), the dtor is called but interrupts itself before actually freeing the resources, because kill_thread() actually kills itself. Am I making sense at all? Dunno.
So we do leak the MouseDownThread object whenever the menu is opened and closed, right ?
Yes. Moreover the kill_thread()
is causing the first leak you noticed. kill_thread()
explicitly skips clean-ups associated with the termination of the thread (unlike exit_thread()
), among those the deallocation of the pthread object. So MouseDownThread
is not a well-behaved citizen at all.
Once the above is dealt with I'll return to the supposed BMenu leak.. Or maybe before embarking on that, it would be interesting compile libbe.so with this define commented out and see if the port leak disappears?
Doesn't hurt to try, I guess.
follow-up: 16 comment:13 by , 9 years ago
Replying to bonefish:
So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.
I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.
comment:14 by , 9 years ago
MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.
@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?
follow-up: 18 comment:15 by , 9 years ago
The change includes a creation of a MouseDownThread indeed. But maybe that class can be fixed.
Is it correct to say the class would be fully fixed by removing the kill_thread() call in dtor, the ASSERT() in Track(), and the TRESPASS() in the dtor ?
Removing the kill_thread would let the dtor complete cleanly, return control to Track(), and exit the thread ?
PS - maybe it's time to track that issue in another ticket, as it's unrelated to the port leak (unless I missed something :-)
follow-up: 17 comment:16 by , 9 years ago
Replying to jackburton:
Replying to bonefish:
So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.
I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.
I don't see the "instead". The use of MouseDownThread
just seems to be an addition.
Replying to jackburton:
MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.
wait_for_thread()
is not correct either, since in that case the thread itself invokes the destructor. With kill_thread()
it commits suicide, with wait_for_thread()
it would wait for itself (not sure off the top of my head, if that would cause it to hang indefinitely or return an error). exit_thread()
would be more correct (when invoked from the thread itself), but it would still leak the allocated memory.
As is the class is simply badly designed. And yes, its other uses should be checked as well before changing anything.
@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?
No. kill_thread()
is even documented in the BeBook as being problematic with respect to resource leaks. It should be used only as a last resort.
follow-up: 19 comment:17 by , 9 years ago
Replying to bonefish:
Replying to jackburton:
I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.
I don't see the "instead". The use of
MouseDownThread
just seems to be an addition.
Yeah, sorry, I misread the commit message. By the way, maybe John could explain that change, because I don't understand it completely. It seems that now two threads are used instead of one.
Replying to jackburton:
MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.
wait_for_thread()
is not correct either, since in that case the thread itself invokes the destructor. Withkill_thread()
it commits suicide, withwait_for_thread()
it would wait for itself (not sure off the top of my head, if that would cause it to hang indefinitely or return an error).exit_thread()
would be more correct (when invoked from the thread itself), but it would still leak the allocated memory.As is the class is simply badly designed. And yes, its other uses should be checked as well before changing anything.
@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?
No.
kill_thread()
is even documented in the BeBook as being problematic with respect to resource leaks. It should be used only as a last resort.
Ok, thank you.
comment:18 by , 9 years ago
Replying to ttcoder:
The change includes a creation of a MouseDownThread indeed. But maybe that class can be fixed.
Is it correct to say the class would be fully fixed by removing the kill_thread() call in dtor, the ASSERT() in Track(), and the TRESPASS() in the dtor ?
Removing the kill_thread would let the dtor complete cleanly, return control to Track(), and exit the thread ?
It depends on the other uses. If it is expected that any other thread can destroy the object then it wouldn't work. Without ref-counting that would be unsafe anyway, though.
PS - maybe it's time to track that issue in another ticket, as it's unrelated to the port leak (unless I missed something :-)
I also see no immediate connection to the leaked ports.
comment:19 by , 9 years ago
Replying to jackburton:
By the way, maybe John could explain that change, because I don't understand it completely. It seems that now two threads are used instead of one.
After a quick look, I think I made a good decision in the past not to touch the menu code. Yeah, the new thread polls GetMouse()
until all mouse buttons are released and then uninstalls the message filter. Why the message filter can't do that itself, I don't known.
The old thread's activities are to invalidate the view directly after it is started and once more just before it terminates. In between it polls fMenuBar->fTracking
. To me it doesn't sound like a thread would be needed for that purpose either.
comment:20 by , 9 years ago
Filed #12293 , we'll probably want to comment there about that separate heap leak issue; I'll now focus on this one as it's more severe, a leak of ports rather than heap..
comment:21 by , 9 years ago
git bisect result:
# bad: [798ad3db303ec486d507f32915193f3456bfca86] BLooper: don't delete a launch_daemon port. git bisect bad 798ad3db303ec486d507f32915193f3456bfca86 # good: [1480e5da6fc4a2926c3da1a973e39927ac2171ae] The beginnings of a launch_daemon for Haiku. git bisect good 1480e5da6fc4a2926c3da1a973e39927ac2171ae
both commits are inside the Launch Daemon branch and rather close. You can feed this into "git bisect replay" if you want to continue bisecting, for me it's sleep time. Note that some revisions in the launch daemon branch aren't booting, which complicates bisecting further.
comment:22 by , 9 years ago
Component: | Kits/Interface Kit → Servers/launch_daemon |
---|---|
Priority: | high → blocker |
This is probably the cause of the "out of ports" KDLs then. Elevating to blocker (ugh, another launch_daemon issue; maybe a few rounds of testing are in order before we merge whatever the next major branch is...)
follow-up: 24 comment:23 by , 9 years ago
I may have taken a wrong turn in comment:11 by focusing on class BMenu; the KDL ports dump shows which port is leaked, it's the one named mbar cached menu
: the BMenuWindow ctor passes that name string to BWindow, which passes it here to BLooper. It would take some tracing/digging to check for sure that the close_port() call is indeed not executed, even though the code path looks like it should be. Will take me a few days to get to this, so kinda hoping someone will beat me to it :-)
comment:24 by , 9 years ago
Replying to ttcoder:
close_port() call is indeed not executed, even though the code path looks like it should be.
It seems strange at least that delete_port would be called for fOwnsPort=false. Maybe the check should be inverted?
follow-up: 26 comment:25 by , 9 years ago
@jackburton the if (fMsgPort >= 0 && fOwnsPort)
only triggers the close_port() if fOwnsPort
==true, that should be the right logic?
Anyway I fessed up and actually built a libbe.so right away, with some tracing added, and that allows to pinpoint the problem at close_port(), if I'm not mistaken: it seems it is called, but it does not succeed in closing the port, even though it returns B_OK.
- if it's commented out, opening a menu allocates 3 ports, closing it frees only 2 ports.
- if it's commented in as normal, opening a menu allocates 3 ports, closing it frees only 2 ports -- no change whatsoever !
I looked at the kernelland create_port() code out of curiosity to see how it could fail yet still return 0x0 but I don't "see" it.. And out of my pool depth of course.
To double-check what I did, here's a sample session showing the be_app looper, the BWindow looper, and the BMenu opened and closed twice:
/boot/system/cache/tmp/New folder> ./Mouse BLooper ctor 0x72e86400 BLooper ctor 0x18859698 BLooper ctor 0x188c6430 BLooper DTOR 0x188c6430 with fOwnsPort=1 dtor: close_port(6091) returns 0 BLooper ctor 0x188e2430 BLooper DTOR 0x188e2430 with fOwnsPort=1 dtor: close_port(6094) returns 0 BLooper DTOR 0x18859698 with fOwnsPort=1 dtor: close_port(6088) returns 0 BLooper DTOR 0x72e86400 with fOwnsPort=1 dtor: close_port(6085) returns 0 /boot/system/cache/tmp/New folder> ls -R .: lib Mouse ./lib: libbe.so
comment:26 by , 9 years ago
Replying to ttcoder:
@jackburton the
if (fMsgPort >= 0 && fOwnsPort)
only triggers the close_port() iffOwnsPort
==true, that should be the right logic?
Nope, I meant the
if (!fOwnsPort) {
at line 146
comment:27 by , 9 years ago
Well d'oh, should have checked the port API to know what to look for, the opposite of create_port is delete_port! Sorry for the noise
Looks like you can commit that change (unless axel wants to review first), you fixed this ticket :-)
/boot/system/cache/tmp/New folder> while true; do sysinfo -ports; sleep 1; done 2995 ports free (used/max 1101 / 4096) 2992 ports free (used/max 1104 / 4096) 2992 ports free (used/max 1104 / 4096) 2995 ports free (used/max 1101 / 4096) 2992 ports free (used/max 1104 / 4096) 2995 ports free (used/max 1101 / 4096) 2992 ports free (used/max 1104 / 4096) 2995 ports free (used/max 1101 / 4096) 2992 ports free (used/max 1104 / 4096) 2995 ports free (used/max 1101 / 4096)
comment:28 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → in-progress |
Odd that launch_daemon caused this issue, then.
comment:29 by , 9 years ago
Ah, my bad, I thought you were talking about a problem in BMenu. I see the issue in BLooper now.
comment:31 by , 9 years ago
Component: | Servers/launch_daemon → Kits/Application Kit |
---|
Wrong component. BTW we do not have a "Linux Next". We merge completed stuff to give it the additional testing to let it mature; our master branch isn't supposed to be stable.
comment:32 by , 9 years ago
Thanks waddlesplash, all back to normal with that commit!
/me incidentally happy to try out tunetracker as a launch_daemon 'service' some day, could be an elegant continuity-of-service system for us
Comments on the below: the 3 replicants in the deskbar seem to leak between 1 and 3 ports when clicked; the next "paragraph" was with Terminal, seems each of its three menus leak 1-2 ports when opened and closed, for me.