Opened 12 months ago
Last modified 5 months ago
#17208 new bug
Memory usage constantly increases on network with lots of devices
Reported by: | pakyr | Owned by: | nobody |
---|---|---|---|
Priority: | critical | Milestone: | R1/beta4 |
Component: | Network & Internet | Version: | R1/beta3 |
Keywords: | Cc: | korli | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
On a network with many (thousands of) devices, memory usage constantly increases at a rate of one megabyte every 3-10 seconds, until the system begins swapping to disk and eventually crashes. This seems to be caused by a massive number of MDNS packets (even when totally idle, the system receives ~100kbps). Included is a Wireshark capture showing the MDNS packets being received. This issue doesn't appear on networks without lots of devices, and the only observable difference was all the MDNS packets, which is why I assume they are somehow the cause.
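A quick back-of-the-envelope check of the reported numbers (a sketch, using assumed midpoint values from the description above) suggests the leak is substantially larger than the received traffic itself:

```python
# Assumed values taken from the report above.
mdns_rate_bps = 100_000           # ~100 kbps of idle mDNS traffic
leak_bytes_per_s = 1_000_000 / 6  # ~1 MB every 3-10 s; take ~6 s as a midpoint

bytes_received_per_s = mdns_rate_bps / 8  # ~12.5 KB/s on the wire
amplification = leak_bytes_per_s / bytes_received_per_s

print(f"received ~{bytes_received_per_s:.0f} B/s, leaked ~{leak_bytes_per_s:.0f} B/s")
print(f"leak is ~{amplification:.0f}x the received byte rate")
```

With these assumptions, the leak is roughly 13x the incoming byte rate, which would point at fixed-size per-packet allocations not being freed rather than the payload bytes themselves being retained.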
Attachments (2)
Change History (22)
by , 12 months ago
Attachment: | capture.pcapng added |
---|
comment:1 by , 12 months ago
"the memory usage" is a bit vague. Can you at least identify which team/process is leaking memory? Is it kernel? Is it net_server? Is it something else?
comment:4 by , 12 months ago
No clue. All I can say is that it's happened to me on two networks, one at a large business, and the other at a large university. I could upload a syslog, or a video of the behavior, though I doubt either would help much.
comment:5 by , 12 months ago
Let's try a syslog at least. I also wonder about an install not behind a router/firewall?
comment:6 by , 12 months ago
Here you go. I booted the machine, let it sit idle for about a minute with memory usage stable at 376 MB, then connected to the network and let it sit idle for another minute, during which memory usage increased by ~15 MB. That's when I grabbed the syslog. Also, I'm not sure what you mean by "install not behind a router/firewall".
comment:7 by , 12 months ago
Maybe we could test with iperf or something similar to replicate easily? https://taosecurity.blogspot.com/2006/09/generating-multicast-traffic.html
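As an alternative to iperf, a minimal multicast flood can be generated with a short script. This is a sketch (not part of the ticket's tooling): it builds a minimal mDNS PTR query in standard DNS wire format and sends it repeatedly to the well-known mDNS group 224.0.0.251:5353. The service name used is just an example.

```python
import socket
import struct

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353  # well-known mDNS multicast group

def build_mdns_query(name: str = "_services._dns-sd._udp.local") -> bytes:
    """Build a minimal mDNS PTR query (standard DNS wire format)."""
    header = struct.pack(">6H", 0, 0, 1, 0, 0, 0)  # id=0, flags=0, QDCOUNT=1
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\0"
    question = qname + struct.pack(">2H", 12, 1)   # QTYPE=PTR(12), QCLASS=IN(1)
    return header + question

def flood(count: int = 10_000) -> None:
    """Send `count` copies of the query to the local network segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    packet = build_mdns_query()
    for _ in range(count):
        sock.sendto(packet, (MDNS_GROUP, MDNS_PORT))
```

Running `flood()` from another machine on the same segment while watching the Haiku host's memory usage might reproduce the growth without needing a busy network.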
comment:8 by , 12 months ago
Please try these things (one after the other, if none initially produces any result):
- Disabling the WiFi altogether, and see if the memory usage goes back down.
- Connecting via ethernet instead of WiFi, and seeing if the memory usage still goes up the same way.
- Booting Haiku in a virtual machine with a virtio network adapter, and seeing if the memory usage still goes up the same way.
comment:9 by , 12 months ago
- The increase stopped, but the usage did not go back down at all, even after waiting 20 minutes.
- Unable to test this right now.
- Booted in VMWare on a different PC, and had the same issue, but only when the network adapter was in 'bridged' mode instead of 'NAT' mode.
I also thought it may have been due to something I installed on my main Haiku installation, so I booted my laptop using a beta 3 installer, and had the same issue.
comment:10 by , 12 months ago
Update: Was able to test with an ethernet cable on the original laptop; no change, the issue manifested in the same way.
comment:11 by , 12 months ago
Component: | - General → Network & Internet |
---|
This sounds like it should be relatively easy to reproduce and track down where the memory is really going, then. I'll see if I can take a look before too long.
comment:12 by , 8 months ago
Milestone: | Unscheduled → R1/beta4 |
---|---|
Priority: | normal → critical |
comment:13 by , 8 months ago
Cc: | added |
---|
CC korli: MDNS packets are UDP multicast, which you reenabled last year.
comment:14 by , 8 months ago
Just to clarify, is the fix in https://review.haiku-os.org/c/haiku/+/4791 related to this?
comment:15 by , 8 months ago
I don't think it is, but I could be mistaken. I didn't test it with mDNS anyway.
comment:16 by , 8 months ago
Actually, the commit that patch fixed was only made in November, and this ticket was opened in August. So this problem clearly predates that commit, and the patch isn't related.
comment:17 by , 8 months ago
I tried to replay the capture dump locally with tcpreplay; the Haiku host sees the packets slowly coming in (checked with tcpdump). It's difficult to notice any change in used memory during the replay. Maybe the replay is too slow to reproduce the issue.
comment:18 by , 8 months ago
It should be possible to change the replay speed with tcpreplay options -p, -x or -t
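For reference, a sketch of those tcpreplay invocations (interface name and capture filename are placeholders; untested here):

```shell
tcpreplay --intf1=eth0 --topspeed capture.pcapng       # -t: ignore timestamps, send as fast as possible
tcpreplay --intf1=eth0 --multiplier=50 capture.pcapng  # -x: replay at 50x the recorded speed
tcpreplay --intf1=eth0 --pps=5000 capture.pcapng       # -p: fixed packets per second
```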
comment:19 by , 5 months ago
I managed to reproduce this, though the rate is much slower for me (0.1 MB every 3-10 seconds). When it is occurring, more and more slab areas are created, e.g.
slab memory manager: created area 0xffffffffb0801000 (240115)
slab memory manager: created area 0xffffffffb1001000 (240117)
So, I dropped into KDL and dumped all object caches (includes net_buffers and the default malloc heap), exited, waited a while (without doing anything), then did it again. Here are just the differences:
address            name                    objsize align usage     empty usedobj total  flags
-0xffffffff82006570 block allocator: 48    48      8     6066176   0     124381  124404 80000000
+0xffffffff82006570 block allocator: 48    48      8     6881280   0     141051  141120 80000000
-0xffffffff82006720 block allocator: 64    64      64    2445312   0     37593   37611  80000000
+0xffffffff82006720 block allocator: 64    64      64    2527232   0     36003   38871  80000000
-0xffffffff82006de0 block allocator: 128   128     128   14479360  0     109577  109585 80000000
+0xffffffff82006de0 block allocator: 128   128     128   17960960  0     135930  135935 80000000
-0xffffffff82008510 block allocator: 256   256     256   16846848  0     61686   61695  80000000
+0xffffffff82008510 block allocator: 256   256     256   20885504  0     76482   76485  80000000
-0xffffffff82008a80 block allocator: 448   448     8     21102592  0     46366   46368  80000000
+0xffffffff82008a80 block allocator: 448   448     8     27103232  0     59551   59553  80000000
-0xffffffff8200b800 block allocator: 4096  4096    4096  458752    1     86      112    88000000
+0xffffffff8200b800 block allocator: 4096  4096    4096  458752    1     85      112    88000000
-0xffffffff8200c800 block allocator: 8192  8192    8192  655360    0     76      80     88000000
+0xffffffff8200c800 block allocator: 8192  8192    8192  655360    0     77      80     88000000
-0xffffffff8200de00 cache refs             16      8     794624    0     48815   48888  0
+0xffffffff8200de00 cache refs             16      8     1011712   0     61994   62244  0
-0xffffffff8200d8c0 vnode caches           224     8     10526720  0     46251   46260  0
+0xffffffff8200d8c0 vnode caches           224     8     13529088  0     59451   59454  0
-0xffffffff8200d540 null caches            192     8     81920     0     420     420    0
+0xffffffff8200d540 null caches            192     8     86016     0     429     441    0
-0xffffffff823adc48 cached blocks          104     8     5242880   0     49705   50400  20000000
+0xffffffff823adc48 cached blocks          104     8     6815744   0     63165   65520  20000000
-0xffffffff823bca00 block cache buffers    2048    8     203948032 0     99412   99584  20000000
+0xffffffff823bca00 block cache buffers    2048    8     258998272 0     126330  126464 20000000
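To make diffs like the one above easier to compare, here is a hypothetical helper (not part of Haiku; field layout assumed from the dump header: address, name, objsize, align, usage, empty, usedobj, total, flags) that pairs the -/+ lines per cache address and reports the growth in usage bytes and used objects:

```python
def parse_line(line: str):
    """Split one -/+ dump line into (sign, address, name, usage, usedobj)."""
    sign, rest = line[0], line[1:].split()
    addr = rest[0]
    # Last 7 whitespace-separated fields are objsize..flags; drop flags,
    # everything between the address and those fields is the cache name.
    objsize, align, usage, empty, usedobj, total = (int(x) for x in rest[-7:-1])
    name = " ".join(rest[1:-7])
    return sign, addr, name, usage, usedobj

def growth(diff: str):
    """Return {cache name: (usage delta in bytes, used-object delta)}."""
    before, after = {}, {}
    for line in diff.strip().splitlines():
        if line[:1] not in "+-":
            continue  # skip the header line
        sign, addr, name, usage, usedobj = parse_line(line)
        (before if sign == "-" else after)[addr] = (name, usage, usedobj)
    return {
        before[a][0]: (after[a][1] - before[a][1], after[a][2] - before[a][2])
        for a in before if a in after
    }
```

For example, feeding in just the two "block allocator: 48" lines above yields a growth of 815104 bytes and 16670 used objects for that cache.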
comment:20 by , 5 months ago
And of course, now that I've created some testing images, I can't seem to reproduce it.
Anyone else that has managed to replicate this repeatedly?