Opened 6 years ago
Last modified 3 years ago
#13088 assigned enhancement
Performance: Investigate network I/O bottlenecks
Reported by: | kallisti5 | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | Network & Internet/TCP | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
There seems to be some pretty serious performance bottlenecks around the network stack or the disk drivers, or the BFS filesystem.
Change History (14)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Haiku test - wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.9-rc6.tar.xz
Haiku x86_64, hrev50707, HaikuPorts wget 1.18-1 x86_64
device Network controller (Ethernet controller) [2|0|0] vendor 10ec: Realtek Semiconductor Co., Ltd. device 8168: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
- Haiku - x86_64 - GPT - BFS - SATA - 7.8MB/s
- Haiku - x86_64 - GPT - BFS - RAM Disk (200M) - 8.22MB/s
That seems to show the issue is somewhere either in the network stack, NIC driver, or BFS driver.
comment:3 by , 6 years ago
Looks like definitely network stack or network card driver.
Haiku x86_64, hrev50707
~> dd if=linux-4.9-rc6.tar.xz of=/RAM/linux-4.9-rc6.tar.xz 181945+1 records in 181945+1 records out 93156312 bytes (93 MB) copied, 0.764035 s, 122 MB/s
comment:5 by , 6 years ago
This is definitely caused by #13089. If I disable TCP window scaling on Linux, I get ~8MB/s downloads.
sudo sysctl -w net.ipv4.tcp_window_scaling=0
linux-4.9-rc6.tar.xz.1 42%[===> ] 37.76M 8.95MB/s eta 6s
comment:6 by , 6 years ago
Summary: | Performance: Investigate bottlenecks → Performance: Investigate network I/O bottlenecks |
---|
comment:7 by , 6 years ago
we actually do implement basic rfc1323 from what I can tell.. and wireshark shows tcp window scaling multipliers to be in use.
I did another transfer, wget from my local machine of a *larger* file and have some new behavior. The speed ramps up and down.
8MB/s, 7MB/s <10 or so seconds> 60MB/s <5 or so seconds> 2MB/s <10 or so seconds> 100MB/s <5 or so seconds> 2MB/s
So it seems like more of a congestion issue? No traffic on the local lan.
comment:8 by , 5 years ago
Component: | - General → Network & Internet/TCP |
---|---|
Owner: | changed from | to
comment:9 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:10 by , 4 years ago
We have iperf in our repositories, assuming there is nothing wrong with our port, I ran a few tests:
iperf -s iperf -c 127.0.0.1
- Linux x86_64 4.18.10-200.fc28.x86_64: (Ryzen 1800X)
- TCP window size: 2.50 MByte (default)
- [ 3] 0.0-10.0 sec 57.3 GBytes 49.3 Gbits/sec
- [ 3] 0.0-10.0 sec 57.0 GBytes 48.9 Gbits/sec
- TCP window size: 2.50 MByte (default)
- Linux x86_64 4.14.16-300.fc27.x86_64: (VM @ Vultr)
- TCP window size: 2.50 MByte (default)
- [ 3] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec
- [ 3] 0.0-10.0 sec 16.6 GBytes 14.3 Gbits/sec
- TCP window size: 2.50 MByte (default)
- Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
- TCP window size: 32.0 KByte (default)
- Uptime 15 days, 22 hours, 46 minutes - exposed to the internet.
- [ 3] 0.0-10.0 sec 71.2 MBytes 59.7 Mbits/sec
- [ 3] 0.0-10.0 sec 44.7 MBytes 37.5 Mbits/sec
- Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
- TCP window size: 32.0 KByte (default)
- Uptime 1 minute
- [ 3] 0.0-10.0 sec 62.2 MBytes 52.2 Mbits/sec
- [ 3] 0.0-10.0 sec 58.6 MBytes 49.2 Mbits/sec
- Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
- TCP window size: 2.50 MByte
- Uptime 7 minutes
- [ 3] 0.0-10.0 sec 65.8 MBytes 55.2 Mbits/sec
- [ 3] 0.0-10.0 sec 33.6 MBytes 28.2 Mbits/sec
- Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
- TCP window size: 32Kbyte (default)
- [ 3] 0.0-10.0 sec 9.35 GBytes 8.03 Gbits/sec
- [ 3] 0.0-10.0 sec 6.48 GBytes 5.56 Gbits/sec
- TCP window size: 32Kbyte (default)
- Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
- TCP window size: 2.50 Mbyte
- [ 3] 0.0-10.0 sec 10.7 GBytes 9.23 Gbits/sec
- [ 3] 0.0-10.0 sec 18.0 MBytes 15.1 Mbits/sec
- [ 3] 0.0-10.0 sec 10.5 GBytes 9.01 Gbits/sec
- [ 3] 0.0-10.0 sec 505 MBytes 424 Mbits/sec
- [ 3] 0.0-10.0 sec 16.1 MBytes 13.5 Mbits/sec
- [ 3] 0.0-60.0 sec 30.8 GBytes 4.41 Gbits/sec
- TCP window size: 2.50 Mbyte
comment:11 by , 4 years ago
One interesting output is strace -c from iperf:
~> strace -c iperf -c 127.0.0.1 ------------------------------------------------------------ Client connecting to 127.0.0.1, TCP port 5001 TCP window size: 32.0 KByte (default) ------------------------------------------------------------ [ 3] local 127.0.0.1 port 40108 connected with 127.0.0.1 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 58.9 MBytes 49.4 Mbits/sec Time % Usecs Calls Usecs/call Syscall ------ ---------- ------- ---------- -------------------- 100.00 10079874 3 3359958 _kern_mutex_switch_lock 0.00 128 1 128 _kern_image_relocated 0.00 23 2 11 _kern_spawn_thread 0.00 19 6 3 _kern_set_area_protection 0.00 4 1 4 _kern_create_area 0.00 3 1 3 _kern_exit_team 0.00 3 2 1 _kern_get_system_info 0.00 2 1 2 _kern_reserve_address_range 0.00 2 2 1 _kern_resume_thread 0.00 1 1 1 _kern_resize_area 0.00 0 1 0 _kern_get_next_image_info 0.00 0 4 0 _kern_sigaction
comment:12 by , 4 years ago
https://github.com/google/benchmark seems like it might be helpful to see where we're at vs Linux.
Linux 4.18.10-200.fc28.x86_64:
2018-10-05 09:42:27 Running ./test/basic_test Run on (8 X 3600 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 256K (x4) L3 Unified 8192K (x1) ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. ***WARNING*** Library was built as DEBUG. Timings may be affected. ------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------- BM_empty 3 ns 3 ns 233438098 BM_empty/threads:8 1 ns 6 ns 122371744 BM_spin_empty/8 46 ns 46 ns 15205421 BM_spin_empty/512 2695 ns 2692 ns 257997 BM_spin_empty/8192 43966 ns 43901 ns 16100 BM_spin_empty/8/threads:8 12 ns 95 ns 7261392 BM_spin_empty/512/threads:8 698 ns 5488 ns 130448 BM_spin_empty/8192/threads:8 11046 ns 86815 ns 7984 BM_spin_pause_before/8 46 ns 46 ns 15192276 BM_spin_pause_before/512 2693 ns 2689 ns 267952 BM_spin_pause_before/8192 42290 ns 42232 ns 16134 BM_spin_pause_before/8/threads:8 12 ns 97 ns 7226928 BM_spin_pause_before/512/threads:8 701 ns 5562 ns 128960 BM_spin_pause_before/8192/threads:8 11219 ns 88271 ns 7840 BM_spin_pause_during/8 458 ns 459 ns 1525514 BM_spin_pause_during/512 3326 ns 3325 ns 201580 BM_spin_pause_during/8192 45514 ns 45407 ns 15909 BM_spin_pause_during/8/threads:8 75 ns 594 ns 1175000 BM_spin_pause_during/512/threads:8 771 ns 6036 ns 116296 BM_spin_pause_during/8192/threads:8 11327 ns 89091 ns 7984 BM_pause_during 437 ns 437 ns 1636400 BM_pause_during/threads:8 66 ns 517 ns 1353320 BM_pause_during/real_time 425 ns 427 ns 1620308 BM_pause_during/real_time/threads:8 66 ns 517 ns 10697656 BM_spin_pause_after/8 54 ns 54 ns 10148460 BM_spin_pause_after/512 3067 ns 3051 ns 234550 BM_spin_pause_after/8192 47197 ns 47013 ns 13512 BM_spin_pause_after/8/threads:8 13 ns 97 ns 8500864 BM_spin_pause_after/512/threads:8 714 ns 5651 ns 127944 BM_spin_pause_after/8192/threads:8 11470 ns 88168 ns 8456 BM_spin_pause_before_and_after/8 54 ns 54 ns 13917278 BM_spin_pause_before_and_after/512 3030 ns 3016 ns 213215 BM_spin_pause_before_and_after/8192 43801 ns 43707 ns 16402 BM_spin_pause_before_and_after/8/threads:8 12 ns 95 ns 7654360 BM_spin_pause_before_and_after/512/threads:8 719 ns 5040 ns 138824 BM_spin_pause_before_and_after/8192/threads:8 11009 ns 84438 ns 8088 BM_empty_stop_start 2 ns 2 ns 327797067 BM_empty_stop_start/threads:8 0 ns 3 ns 275301568 BM_KeepRunning 3 ns 3 ns 244697531 BM_KeepRunningBatch 0 ns 0 ns 1000000091 BM_RangedFor 2 ns 2 ns 373357250
comment:13 by , 4 years ago
- Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
- rtl81xx NIC
- Linux x86_64 4.18.10-200.fc28.x86_64: (Ryzen 1800X)
- Intel 1Gbit NIC
- Cat5e, 1 Gbit switch
- [ 3] 0.0-10.0 sec 935 MBytes 784 Mbits/sec
- [ 3] 0.0-10.0 sec 935 MBytes 785 Mbits/sec
- [ 3] 0.0-10.0 sec 840 MBytes 704 Mbits/sec
- [ 3] 0.0-10.0 sec 935 MBytes 784 Mbits/sec
- [ 3] 0.0-10.0 sec 840 MBytes 705 Mbits/sec
- [ 3] 0.0-60.0 sec 5.37 GBytes 768 Mbits/sec
All of these seem normal over a local network :-|
Linux test - wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.9-rc6.tar.xz
Linux - x86_64 - GPT - btrfs - SATA - 60MB/s