Opened 12 months ago

Last modified 5 months ago

#17208 new bug

Memory usage constantly increases on network with lots of devices

Reported by: pakyr
Owned by: nobody
Priority: critical
Milestone: R1/beta4
Component: Network & Internet
Version: R1/beta3
Keywords:
Cc: korli
Blocked By:
Blocking:
Platform: All

Description

On a network with many (thousands of) devices, memory usage constantly increases at a rate of about one megabyte every 3-10 seconds, until the system begins swapping to disk and eventually crashes. This seems to be caused by the massive number of mDNS packets on the network (even when the system is totally idle, it receives ~100 kbps). Attached is a Wireshark capture showing the mDNS packets being received. The issue doesn't appear on networks without lots of devices, and the only difference I could find was all the mDNS packets, which is why I assume they are somehow the cause.

Attachments (2)

capture.pcapng (1.2 MB) - added by pakyr 12 months ago.
Wireshark Capture
syslog (173.3 KB) - added by pakyr 12 months ago.
syslog


Change History (22)

by pakyr, 12 months ago

Attachment: capture.pcapng added

Wireshark Capture

comment:1 by pulkomandy, 12 months ago

"the memory usage" is a bit vague. Can you at least identify which team/process is leaking memory? Is it kernel? Is it net_server? Is it something else?

comment:2 by pakyr, 12 months ago

Kernel Team.

comment:3 by Coldfirex, 12 months ago

Would there be a way to simulate this?

comment:4 by pakyr, 12 months ago

No clue. All I can say is that it's happened to me on two networks, one at a large business, and the other at a large university. I could upload a syslog, or a video of the behavior, though I doubt either would help much.

comment:5 by Coldfirex, 12 months ago

Let's try a syslog at least. I also wonder whether this happens on an install that isn't behind a router/firewall?

by pakyr, 12 months ago

Attachment: syslog added

syslog

comment:6 by pakyr, 12 months ago

Here you go. I booted the machine, let it sit idle for about a minute with memory usage stable at 376 MB, then connected to the network and let it sit idle for another minute, during which memory usage increased by ~15 MB. That's when I grabbed the syslog. Also, I'm not sure what you mean by an "install not behind a router/firewall".

Last edited 12 months ago by pakyr

comment:7 by Coldfirex, 12 months ago

Maybe we could test with iperf or something similar to replicate this easily? https://taosecurity.blogspot.com/2006/09/generating-multicast-traffic.html
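A minimal sketch of such a generator, assuming Python on another machine on the same segment as the Haiku box: it floods the standard mDNS group/port (224.0.0.251:5353) with dummy UDP packets. The rate and payload size below are arbitrary, and the payload is not a valid DNS message; the point is only to generate sustained multicast traffic toward the group the Haiku box is evidently already receiving.

    # Flood the mDNS multicast group with dummy UDP packets to approximate
    # a very chatty network segment. Rate and payload size are guesses.
    import socket
    import time

    MDNS_GROUP = "224.0.0.251"   # standard mDNS multicast group
    MDNS_PORT = 5353             # standard mDNS port
    RATE = 200                   # packets per second (assumption, tune as needed)
    PAYLOAD = b"\x00" * 200      # dummy payload, not a real DNS message

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the packets on the local segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)

    while True:
        sock.sendto(PAYLOAD, (MDNS_GROUP, MDNS_PORT))
        time.sleep(1.0 / RATE)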

comment:8 by waddlesplash, 12 months ago

Please try these things (one after the other, if none initially produces any result):

  1. Disabling the WiFi altogether, and see if the memory usage goes back down.
  2. Connecting via ethernet instead of WiFi, and seeing if the memory usage still goes up the same way.
  3. Booting Haiku in a virtual machine with a virtio network adapter, and seeing if the memory usage still goes up the same way.

comment:9 by pakyr, 12 months ago

  1. The increase stopped, but the usage did not go back down at all, even after waiting 20 minutes.
  2. Unable to test this right now.
  3. Booted in VMWare on a different PC, and had the same issue, but only when the network adapter was in 'bridged' mode instead of 'NAT' mode.

I also thought it may have been due to something I installed on my main Haiku installation, so I booted my laptop using a beta 3 installer, and had the same issue.

comment:10 by pakyr, 12 months ago

Update: Was able to test with an ethernet cable on the original laptop; no change, the issue manifested in the same way.

comment:11 by waddlesplash, 12 months ago

Component: General → Network & Internet

This sounds like it should be relatively easy to reproduce and track down where the memory is really going, then. I'll see if I can take a look before too long.

comment:12 by waddlesplash, 8 months ago

Milestone: Unscheduled → R1/beta4
Priority: normal → critical

comment:13 by waddlesplash, 8 months ago

Cc: korli added

CC korli: mDNS packets are UDP multicast, which you re-enabled last year.

comment:14 by pulkomandy, 8 months ago

Just to clarify, is the fix in https://review.haiku-os.org/c/haiku/+/4791 related to this?

comment:15 by waddlesplash, 8 months ago

I don't think it is, but I could be mistaken. I didn't test it with mDNS anyway.

comment:16 by waddlesplash, 8 months ago

Actually, the commit that patch fixed was made only in November, and this ticket was opened in August, so this problem clearly predates that one and isn't related.

comment:17 by korli, 8 months ago

I tried to replay the capture dump locally with tcpreplay; the Haiku host sees the packets slowly coming in (checked with tcpdump). It's difficult to notice anything happening to the used memory during the replay. Maybe the replay is too slow to reproduce the problem.

comment:18 by pulkomandy, 8 months ago

It should be possible to change the replay speed with tcpreplay options -p, -x or -t

https://linux.die.net/man/1/tcpreplay
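For example, something along these lines (run on the machine sending the replayed traffic toward the Haiku box; the interface name here is just a placeholder) should loop the attached capture continuously at 50x its original speed:

    tcpreplay -i eth0 --multiplier=50 --loop=0 capture.pcapng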

comment:19 by waddlesplash, 5 months ago

I managed to reproduce this, though the rate is much slower for me (0.1 MB every 3-10 seconds). When it is occurring, more and more slab areas are created, e.g.

slab memory manager: created area 0xffffffffb0801000 (240115)
slab memory manager: created area 0xffffffffb1001000 (240117)

So, I dropped into KDL and dumped all object caches (includes net_buffers and the default malloc heap), exited, waited a while (without doing anything), then did it again. Here are just the differences:

            address                   name  objsize    align    usage  empty  usedobj    total    flags
-0xffffffff82006570    block allocator: 48       48        8  6066176      0   124381   124404 80000000
+0xffffffff82006570    block allocator: 48       48        8  6881280      0   141051   141120 80000000
-0xffffffff82006720    block allocator: 64       64       64  2445312      0    37593    37611 80000000
+0xffffffff82006720    block allocator: 64       64       64  2527232      0    36003    38871 80000000
-0xffffffff82006de0   block allocator: 128      128      128 14479360      0   109577   109585 80000000
+0xffffffff82006de0   block allocator: 128      128      128 17960960      0   135930   135935 80000000
-0xffffffff82008510   block allocator: 256      256      256 16846848      0    61686    61695 80000000
+0xffffffff82008510   block allocator: 256      256      256 20885504      0    76482    76485 80000000
-0xffffffff82008a80   block allocator: 448      448        8 21102592      0    46366    46368 80000000
+0xffffffff82008a80   block allocator: 448      448        8 27103232      0    59551    59553 80000000
-0xffffffff8200b800  block allocator: 4096     4096     4096   458752      1       86      112 88000000
+0xffffffff8200b800  block allocator: 4096     4096     4096   458752      1       85      112 88000000
-0xffffffff8200c800  block allocator: 8192     8192     8192   655360      0       76       80 88000000
+0xffffffff8200c800  block allocator: 8192     8192     8192   655360      0       77       80 88000000
-0xffffffff8200de00             cache refs       16        8   794624      0    48815    48888        0
+0xffffffff8200de00             cache refs       16        8  1011712      0    61994    62244        0
-0xffffffff8200d8c0           vnode caches      224        8 10526720      0    46251    46260        0
+0xffffffff8200d8c0           vnode caches      224        8 13529088      0    59451    59454        0
-0xffffffff8200d540            null caches      192        8    81920      0      420      420        0
+0xffffffff8200d540            null caches      192        8    86016      0      429      441        0
-0xffffffff823adc48          cached blocks      104        8  5242880      0    49705    50400 20000000
+0xffffffff823adc48          cached blocks      104        8  6815744      0    63165    65520 20000000
-0xffffffff823bca00    block cache buffers     2048        8 203948032      0    99412    99584 20000000
+0xffffffff823bca00    block cache buffers     2048        8 258998272      0   126330   126464 20000000

comment:20 by waddlesplash, 5 months ago

And of course, now that I've created some testing images, I can't seem to reproduce it.

Has anyone else managed to replicate this repeatedly?
