Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#12286 closed bug (fixed)

Menus seem to be leaking ports

Reported by: ttcoder Owned by: waddlesplash
Priority: blocker Milestone: R1/beta1
Component: Kits/Application Kit Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

As a follow up to #11342:comment:6 I ended up watching a terminal looping on sysinfo -ports while doing various things in Haiku. While doing so, I noticed the leak is not specific to W+ or qupzilla, it occurs even when opening menus from the deskbar, terminal ..etc.

The leaked ports seem to be returned to the system once the offending application is quit and relaunched (e.g. hey Deskbar quit then the three finger salute and then "restart desktop").

Anyone can confirm it's not a quirk happening just on this setup (hrev49480), does it happen for others too ? Is the growing number of used ports a bug ?

Change History (32)

comment:1 by ttcoder, 4 years ago

Comments on the below: the 3 replicants in the deskbar seem to leak between 1 and 3 ports when clicked; the next "paragraph" was with Terminal, seems each of its three menus leak 1-2 ports when opened and closed, for me.

~/Desktop> while true; do sysinfo -ports; sleep 3; done
      3855 ports free      (used/max        241 /       4096)
      3855 ports free      (used/max        241 /       4096)
      3854 ports free      (used/max        242 /       4096)
      3850 ports free      (used/max        246 /       4096)
      3851 ports free      (used/max        245 /       4096)
      3851 ports free      (used/max        245 /       4096)
      3851 ports free      (used/max        245 /       4096)
      3850 ports free      (used/max        246 /       4096)
      3850 ports free      (used/max        246 /       4096)
      3846 ports free      (used/max        250 /       4096)
      3847 ports free      (used/max        249 /       4096)
      3847 ports free      (used/max        249 /       4096)
      3847 ports free      (used/max        249 /       4096)

      3879 ports free      (used/max        217 /       4096)
      3862 ports free      (used/max        234 /       4096)
      3859 ports free      (used/max        237 /       4096)
      3859 ports free      (used/max        237 /       4096)
      3858 ports free      (used/max        238 /       4096)
      3856 ports free      (used/max        240 /       4096)
      3855 ports free      (used/max        241 /       4096)
      3854 ports free      (used/max        242 /       4096)
      3848 ports free      (used/max        248 /       4096)
[...]
      3826 ports free      (used/max        270 /       4096)

comment:2 by waddlesplash, 4 years ago

Confirmed I see the same thing happening here.

comment:3 by waddlesplash, 4 years ago

Running strace on the testcase from #9576 doesn't show any ports being created when a menu is opened (or closed), and I don't see any ports that look suspicious / unfreed. So this is not a bug in the Interface Kit.

I did a bit of poking around in app_server (the next likely culprit) but found nothing...

comment:4 by ttcoder, 4 years ago

Hmmm.. Opening a menu/submenu should create a BWindow and its inherited BLooper and its port, does the fMsgPort member show up in strace's output ? Anyway just in case, I looked at this recent change in Looper.cpp and Application.cpp but could not find a/the problem

comment:5 by jackburton, 4 years ago

BMenu caches (some of) its windows, so if you open a BMenu it won't always create a new BWindow.

Last edited 4 years ago by jackburton (previous) (diff)

comment:6 by waddlesplash, 4 years ago

BMenu caches (some of) its windows, so if you open a BMenu it won't always create a new BWindow.

Right, but in this case, one port is leaked every time you open a menu, even if it's the same menu (from the same menu bar, etc.)

comment:7 by pulkomandy, 4 years ago

It would be useful to look more closely at what is leaked. Is there a thread running for the menu? Which hrev did this appear in first?

comment:8 by ttcoder, 4 years ago

Well playing with listports is fairly interesting: it seems the faulty ports are not listed. Maybe there is actually no port "leak" as such, and the kernel is just getting confused, i.e. there is a variable which cashes for efficiency reasons the "size" (length, whatever) of the port list, and that variable gets out of sync with the actual list contents in some circumstances, leading into an "artificial" exhaustion in the end?

To wit:

~/Desktop> sysinfo -ports
      3415 ports free      (used/max        681 /       4096)
~/Desktop> listport | wc
    426    1514   20563

I don't feel like counting how many lines there in are the listports output, but we can say confidently that 426 lines are NOT going to list 681 ports :-) (and indeed a cursory exmination of the output shows nothing out of the ordinary: the app_server has a short listing of ports, Terminal's listing of ports is short and does NOT change much or at all ..etc)

Especially since the listport output does not change (!) even after the ports has been incremented by 5 (in this test I have Qupzilla launched, hence the bigger figures):

~/Desktop> listport | wc
    396    1407   19175
~/Desktop> sysinfo -ports
      2641 ports free      (used/max       1455 /       4096)
~/Desktop> listport | wc
    396    1407   19175
~/Desktop> sysinfo -ports
      2636 ports free      (used/max       1460 /       4096)

Or maybe it's the other way around and it's listport which is broken ?

....

By the way, first saw that when installing a nightly (fresh install) of 49480 in place of a 49172, which could run Qupzilla for a full afternoon without problem, never ran out of ports. Pretty wide range still..

comment:9 by ttcoder, 4 years ago

sysinfo just prints a member of a struct retrieved here: http://cgit.haiku-os.org/haiku/tree/src/bin/sysinfo.cpp#n820

listport is fairly straightforward too: http://cgit.haiku-os.org/haiku/tree/src/bin/listport.c#n80

EDIT: the relative absence of creation/destruction of ports reported by listport seems consistent with comment:3 . Thus it would be sysinfo which is at odds with both listport and strace no ?

Last edited 4 years ago by ttcoder (previous) (diff)

comment:10 by bonefish, 4 years ago

Not sure whether this is a good idea (and/or BeOS compatible), but listport (i.e. get_next_port_info()) omits closed ports. So, you may get:

$ listport 532
port: total:  4096, used:   176, left:  3920

TEAM  532 (/boot/system/apps/Terminal):
   ID                         name  capacity  queued
----------------------------------------------------
  352               rAppLooperPort       100       0
  354                   tmp_rport1         1       0
  355                   tmp_rport2         1       0
  357                 a<app_server       100       0
  358   a<532:x-vnd.Haiku-Terminal        50       0
  359                                    200       0
  360                 w<app_server       200       1
  353                   tmp_rport0         1       0

but the kernel debugger command will show them all:

kdebug> ports 532
port             id  cap  read-cnt  write-cnt   total   team  name
0xd1223e60      352  100         0       100        4    532  rAppLooperPort
0xd12235a0      353    1         0         1     2247    532  tmp_rport0
0xd1223460      354    1         0         1        0    532  tmp_rport1
0xd12233c0      355    1         0         1        0    532  tmp_rport2
0xd1223280      357  100         0       100      584    532  a<app_server
0xd12231e0      358   50         0        50      594    532  a<532:x-vnd.Haiku-Terminal
0xd12230a0      359  200         0       200    13465    532  
0xd1223000      360  200         0       200     3283    532  w<app_server
0xce4d3ab0      385    0         2       198        7    532  mbar cached menu
0xce4d38d0      388    0         1       199        7    532  mbar cached menu
0xce4d3830      391    0         2       198       11    532  mbar cached menu
0xce4d36f0      394    0         2       198       30    532  mbar cached menu
0xce4d3290      397    0         2       198        7    532  mbar cached menu
0xce6883c8      400    0         2       198        6    532  mbar cached menu
0xce4d3470      403    0         2       198        7    532  mbar cached menu
0xce688148      406    0         2       198        7    532  mbar cached menu
0xd1256dd8      409    0         2       198        7    532  mbar cached menu
0xd1256e78      412    0         2       198        6    532  mbar cached menu

Capacity 0 means closed.

So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.

comment:11 by ttcoder, 4 years ago

Great collaborative work :-) Puts me back on track to continue the hunt, even surprised myself that I did find a few things. The mbar cached menu port name might come from this (not certain): http://cgit.haiku-os.org/haiku/tree/src/kits/interface/Menu.cpp#n2537

If so, the handling of fCachedMenuWindow seems to be okay so that oriented me to the theory of a higher level leak, i.e. the whole BMenu being leaked (and thus also its fCachedMenuWindow being leaked).

So I ran Michael's leak_analyser.sh against prefs/Mouse (Terminal takes too long to run with the guarded heap) and it seems there are tons of leaks. One can navigate through them by "diff'ing" between two sessions though. If I open the top menufield in the second session (but not the first) I see 3 new leaks appear. None of them are a BMenu, but the first one seems worth addressing immediately, I will leave it to you folks to confirm:

allocation: base: 0x45c37e8; size: 2072; thread: 2308; alignment: 1
        <libroot.so> __wmemset + 0x64e (nearest)
        <libroot.so> __wmemset + 0xa95 (nearest)
        <libroot.so> memalign + 0x33
        <libroot.so> malloc + 0x26
        <libroot.so> __allocate_pthread + 0x21
        <libroot.so> spawn_thread + 0x23
        <libbe.so> BPrivate::MouseDownThread<BMenuField>::Go(void) + 0x2d
        <libbe.so> BPrivate::MouseDownThread<BMenuField>::TrackMouse(BMenuField *, void (BMenuField::*)(BMenuField *, BPoint), void (BMenuField::*)(BMenuField *, BPoint, unsigned long), long long) + 0x53
        <libbe.so> BMenuField::MouseDown(BPoint) + 0x10b
        <libbe.so> BWindow::DispatchMessage(BMessage *, BHandler *) + 0xd47
        <libbe.so> BWindow::task_looper(void) + 0x290

Is it worth filing another ticket for that leaked thread ?

EDIT: for cinfirming the leak/or not, might be worth mentionning more data: the second leak item is this

allocation: base: 0x45e1fc8; size: 56; thread: 2308; alignment: 1
  (..)
        <libroot.so> __builtin_new + 0x29
        <libbe.so> BPrivate::MouseDownThread<BMenuField>::TrackMouse(BMenuField *, void (BMenuField::*)(BMenuField *, BPoint), void (BMenuField::*)(BMenuField *, BPoint, unsigned long), long long) + 0x1e

And looking at the code, there is indeed an operator-new here but the corresponding delete in Track() is never reached because of the kill_thread() here ;

EDIT2: Errr ok, in fact it seems Track() does terminate, and does not crash in ASSERT() simply because libbe.so is built in release mode (?), and (get this), the dtor is called but interrupts itself before actually freeing the resources, because kill_thread() actually kills itself. Am I making sense at all? Dunno.

So we do leak the MouseDownThread object whenever the menu is opened and closed, right ?

Once the above is dealt with I'll return to the supposed BMenu leak.. Or maybe before embarking on that, it would be interesting compile libbe.so with this define commented out and see if the port leak disappears?

Last edited 4 years ago by ttcoder (previous) (diff)

in reply to:  11 comment:12 by bonefish, 4 years ago

Replying to ttcoder:

EDIT2: Errr ok, in fact it seems Track() does terminate, and does not crash in ASSERT() simply because libbe.so is built in release mode (?), and (get this), the dtor is called but interrupts itself before actually freeing the resources, because kill_thread() actually kills itself. Am I making sense at all? Dunno.

So we do leak the MouseDownThread object whenever the menu is opened and closed, right ?

Yes. Moreover the kill_thread() is causing the first leak you noticed. kill_thread() explicitly skips clean-ups associated with the termination of the thread (unlike exit_thread()), among those the deallocation of the pthread object. So MouseDownThread is not a well-behaved citizen at all.

Once the above is dealt with I'll return to the supposed BMenu leak.. Or maybe before embarking on that, it would be interesting compile libbe.so with this define commented out and see if the port leak disappears?

Doesn't hurt to try, I guess.

in reply to:  10 ; comment:13 by jackburton, 4 years ago

Replying to bonefish:

So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.

I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.

comment:14 by jackburton, 4 years ago

MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.

@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?

comment:15 by ttcoder, 4 years ago

The change includes a creation of a MouseDownThread indeed. But maybe that class can be fixed.

Is it correct to say the class would be fully fixed by removing the kill_thread() call in dtor, the ASSERT() in Track(), and the TRESPASS() in the dtor ?

Removing the kill_thread would let the dtor complete cleanly, return control to Track(), and exit the thread ?

PS - maybe it's time to track that issue in another ticket, as it's unrelated to the port leak (unless I missed something :-)

in reply to:  13 ; comment:16 by bonefish, 4 years ago

Replying to jackburton:

Replying to bonefish:

So, indeed it appears like the menu code leaks (closed) ports. FWIW, I have an old hrev48150 gcc4-only image lying around, which doesn't show the issue.

I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.

I don't see the "instead". The use of MouseDownThread just seems to be an addition.

Replying to jackburton:

MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.

wait_for_thread() is not correct either, since in that case the thread itself invokes the destructor. With kill_thread() it commits suicide, with wait_for_thread() it would wait for itself (not sure off the top of my head, if that would cause it to hang indefinitely or return an error). exit_thread() would be more correct (when invoked from the thread itself), but it would still leak the allocated memory.

As is the class is simply badly designed. And yes, its other uses should be checked as well before changing anything.

@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?

No. kill_thread() is even documented in the BeBook as being problematic with respect to resource leaks. It should be used only as a last resort.

in reply to:  16 ; comment:17 by jackburton, 4 years ago

Replying to bonefish:

Replying to jackburton:

I guess commit 285b7163ad41e00841d0dc28b48fe10902a350bc is the culprit then, since it changed the code to use MouseDownThread(), which calls kill_thread instead of wait_for_thread.

I don't see the "instead". The use of MouseDownThread just seems to be an addition.

Yeah, sorry, I misread the commit message. By the way, maybe John could explain that change, because I don't understand it completely. It seems that now two threads are used instead of one.

Replying to jackburton:

MouseDownThread is also used in Tracker (DialogPane) and Deskbar (ExpandoMenuBar), so before changing the kill_thread() to wait_for_thread() we should review its usage there, too.

wait_for_thread() is not correct either, since in that case the thread itself invokes the destructor. With kill_thread() it commits suicide, with wait_for_thread() it would wait for itself (not sure off the top of my head, if that would cause it to hang indefinitely or return an error). exit_thread() would be more correct (when invoked from the thread itself), but it would still leak the allocated memory.

As is the class is simply badly designed. And yes, its other uses should be checked as well before changing anything.

@Bonefish: Is there no way to force kill_thread() to kill the used ports, too ?

No. kill_thread() is even documented in the BeBook as being problematic with respect to resource leaks. It should be used only as a last resort.

Ok, thank you.

in reply to:  15 comment:18 by bonefish, 4 years ago

Replying to ttcoder:

The change includes a creation of a MouseDownThread indeed. But maybe that class can be fixed.

Is it correct to say the class would be fully fixed by removing the kill_thread() call in dtor, the ASSERT() in Track(), and the TRESPASS() in the dtor ?

Removing the kill_thread would let the dtor complete cleanly, return control to Track(), and exit the thread ?

It depends on the other uses. If it is expected that any other thread can destroy the object then it wouldn't work. Without ref-counting that would be unsafe anyway, though.

PS - maybe it's time to track that issue in another ticket, as it's unrelated to the port leak (unless I missed something :-)

I also see no immediate connection to the leaked ports.

in reply to:  17 comment:19 by bonefish, 4 years ago

Replying to jackburton:

By the way, maybe John could explain that change, because I don't understand it completely. It seems that now two threads are used instead of one.

After a quick look, I think I made a good decision in the past not to touch the menu code. Yeah, the new thread polls GetMouse() until all mouse buttons are released and then uninstalls the message filter. Why the message filter can't do that itself, I don't known.

The old thread's activities are to invalidate the view directly after it is started and once more just before it terminates. In between it polls fMenuBar->fTracking. To me it doesn't sound like a thread would be needed for that purpose either.

comment:20 by ttcoder, 4 years ago

Filed #12293 , we'll probably want to comment there about that separate heap leak issue; I'll now focus on this one as it's more severe, a leak of ports rather than heap..

comment:21 by pulkomandy, 4 years ago

git bisect result:

# bad: [798ad3db303ec486d507f32915193f3456bfca86] BLooper: don't delete a launch_daemon port.
git bisect bad 798ad3db303ec486d507f32915193f3456bfca86
# good: [1480e5da6fc4a2926c3da1a973e39927ac2171ae] The beginnings of a launch_daemon for Haiku.
git bisect good 1480e5da6fc4a2926c3da1a973e39927ac2171ae

both commits are inside the Launch Daemon branch and rather close. You can feed this into "git bisect replay" if you want to continue bisecting, for me it's sleep time. Note that some revisions in the launch daemon branch aren't booting, which complicates bisecting further.

comment:22 by waddlesplash, 4 years ago

Component: Kits/Interface KitServers/launch_daemon
Priority: highblocker

This is probably the cause of the "out of ports" KDLs then. Elevating to blocker (ugh, another launch_daemon issue; maybe a few rounds of testing are in order before we merge whatever the next major branch is...)

comment:23 by ttcoder, 4 years ago

I may have taken a wrong turn in comment:11 by focusing on class BMenu; the KDL ports dump shows which port is leaked, it's the one named mbar cached menu : the BMenuWindow ctor passes that name string to BWindow, which passes it here to BLooper. It would take some tracing/digging to check for sure that the close_port() call is indeed not executed, even though the code path looks like it should be. Will take me a few days to get to this, so kinda hoping someone will beat me to it :-)

in reply to:  23 comment:24 by jackburton, 4 years ago

Replying to ttcoder:

close_port() call is indeed not executed, even though the code path looks like it should be.

It seems strange at least that delete_port would be called for fOwnsPort=false. Maybe the check should be inverted?

comment:25 by ttcoder, 4 years ago

@jackburton the if (fMsgPort >= 0 && fOwnsPort) only triggers the close_port() if fOwnsPort==true, that should be the right logic?

Anyway I fessed up and actually built a libbe.so right away, with some tracing added, and that allows to pinpoint the problem at close_port(), if I'm not mistaken: it seems it is called, but it does not succeed in closing the port, even though it returns B_OK.

  • if it's commented out, opening a menu allocates 3 ports, closing it frees only 2 ports.
  • if it's commented in as normal, opening a menu allocates 3 ports, closing it frees only 2 ports -- no change whatsoever !

I looked at the kernelland create_port() code out of curiosity to see how it could fail yet still return 0x0 but I don't "see" it.. And out of my pool depth of course.

To double-check what I did, here's a sample session showing the be_app looper, the BWindow looper, and the BMenu opened and closed twice:

/boot/system/cache/tmp/New folder> ./Mouse 
BLooper ctor 0x72e86400
BLooper ctor 0x18859698
BLooper ctor 0x188c6430
BLooper DTOR 0x188c6430 with fOwnsPort=1
  dtor: close_port(6091) returns 0
BLooper ctor 0x188e2430
BLooper DTOR 0x188e2430 with fOwnsPort=1
  dtor: close_port(6094) returns 0
BLooper DTOR 0x18859698 with fOwnsPort=1
  dtor: close_port(6088) returns 0
BLooper DTOR 0x72e86400 with fOwnsPort=1
  dtor: close_port(6085) returns 0
/boot/system/cache/tmp/New folder> ls -R
.:
lib  Mouse

./lib:
libbe.so
Last edited 4 years ago by ttcoder (previous) (diff)

in reply to:  25 comment:26 by jackburton, 4 years ago

Replying to ttcoder:

@jackburton the if (fMsgPort >= 0 && fOwnsPort) only triggers the close_port() if fOwnsPort==true, that should be the right logic?

Nope, I meant the

if (!fOwnsPort) {

at line 146

comment:27 by ttcoder, 4 years ago

Well d'oh, should have checked the port API to know what to look for, the opposite of create_port is delete_port! Sorry for the noise

Looks like you can commit that change (unless axel wants to review first), you fixed this ticket :-)

/boot/system/cache/tmp/New folder> while true; do sysinfo -ports; sleep 1; done
      2995 ports free      (used/max       1101 /       4096)
      2992 ports free      (used/max       1104 /       4096)
      2992 ports free      (used/max       1104 /       4096)
      2995 ports free      (used/max       1101 /       4096)
      2992 ports free      (used/max       1104 /       4096)
      2995 ports free      (used/max       1101 /       4096)
      2992 ports free      (used/max       1104 /       4096)
      2995 ports free      (used/max       1101 /       4096)
      2992 ports free      (used/max       1104 /       4096)
      2995 ports free      (used/max       1101 /       4096)

comment:28 by waddlesplash, 4 years ago

Owner: changed from axeld to waddlesplash
Status: newin-progress

Odd that launch_daemon caused this issue, then.

comment:29 by waddlesplash, 4 years ago

Ah, my bad, I thought you were talking about a problem in BMenu. I see the issue in BLooper now.

comment:30 by waddlesplash, 4 years ago

Resolution: fixed
Status: in-progressclosed

Fixed in hrev49520.

comment:31 by axeld, 4 years ago

Component: Servers/launch_daemonKits/Application Kit

Wrong component. BTW we do not have a "Linux Next". We merge completed stuff to give it the additional testing to let it mature; our master branch isn't supposed to be stable.

comment:32 by ttcoder, 4 years ago

Thanks waddlesplash, all back to normal with that commit!

/me incidentally happy to try out tunetracker as a launch_daemon 'service' some day, could be an elegant continuity-of-service system for us

Note: See TracTickets for help on using tickets.