Opened 3 years ago

Last modified 4 months ago

#13088 assigned enhancement

Performance: Investigate network I/O bottlenecks

Reported by: kallisti5
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: Network & Internet/TCP
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

There seem to be some pretty serious performance bottlenecks in the network stack, the disk drivers, or the BFS filesystem.

Change History (14)

comment:1 by kallisti5, 3 years ago

Linux test - wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.9-rc6.tar.xz

07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)

Linux - x86_64 - GPT - btrfs - SATA - 60MB/s

comment:2 by kallisti5, 3 years ago

Haiku test - wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.9-rc6.tar.xz

Haiku x86_64, hrev50707, HaikuPorts wget 1.18-1 x86_64

device Network controller (Ethernet controller) [2|0|0]
  vendor 10ec: Realtek Semiconductor Co., Ltd.
  device 8168: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
  • Haiku - x86_64 - GPT - BFS - SATA - 7.8MB/s
  • Haiku - x86_64 - GPT - BFS - RAM Disk (200M) - 8.22MB/s

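That suggests the issue is somewhere in the network stack, the NIC driver, or the BFS driver: writing to a RAM disk is barely faster than writing to SATA, so the disk driver itself is not the limit.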

comment:3 by kallisti5, 3 years ago

It definitely looks like the network stack or the network card driver; a local copy onto the RAM disk is fast.

Haiku x86_64, hrev50707

~> dd if=linux-4.9-rc6.tar.xz of=/RAM/linux-4.9-rc6.tar.xz
181945+1 records in
181945+1 records out
93156312 bytes (93 MB) copied, 0.764035 s, 122 MB/s

comment:4 by kallisti5, 3 years ago

#13089 might be one cause.

comment:5 by kallisti5, 3 years ago

This is definitely caused by #13089. If I disable TCP window scaling on Linux, I get ~8MB/s downloads.

sudo sysctl -w net.ipv4.tcp_window_scaling=0

linux-4.9-rc6.tar.xz.1  42%[===>     ]  37.76M  8.95MB/s    eta 6s
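
The ~8MB/s figure is consistent with a window-limited connection. Without window scaling the TCP window field tops out at 65,535 bytes, and a sender can have at most one window in flight per round trip, so throughput is bounded by window / RTT. A minimal sketch of that arithmetic follows; the RTT values are assumptions for illustration, not measurements from this ticket, and wget's MB is treated as 10^6 bytes for simplicity.

```python
# Upper bound on TCP throughput when limited by the receive window.
MAX_UNSCALED_WINDOW = 65535  # bytes; largest window without RFC 1323 scaling


def max_throughput(window_bytes, rtt_seconds):
    """Return the window-limited throughput bound in bytes per second."""
    return window_bytes / rtt_seconds


def rtt_for_throughput(window_bytes, bytes_per_second):
    """Return the RTT (seconds) at which the window bound equals the given rate."""
    return window_bytes / bytes_per_second


if __name__ == "__main__":
    for rtt_ms in (1, 8, 20, 50):  # hypothetical round-trip times
        bound = max_throughput(MAX_UNSCALED_WINDOW, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>2} ms -> at most {bound / 1e6:.2f} MB/s")
    # RTT at which an unscaled window would cap out near the observed 8.95MB/s
    # (assumption: MB taken as 10**6 bytes)
    rtt = rtt_for_throughput(MAX_UNSCALED_WINDOW, 8.95e6)
    print(f"8.95 MB/s matches an RTT of about {rtt * 1000:.1f} ms")
```

If the real RTT to cdn.kernel.org is in the single-digit millisecond range, an unscaled window alone would explain the cap; a much lower RTT would point elsewhere.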

comment:6 by kallisti5, 3 years ago

Summary: Performance: Investigate bottlenecks → Performance: Investigate network I/O bottlenecks

comment:7 by kallisti5, 3 years ago

We actually do implement basic RFC 1323 from what I can tell, and Wireshark shows TCP window scaling multipliers in use.

I did another transfer, a wget of a *larger* file from my local machine, and saw some new behavior: the speed ramps up and down.

8MB/s, 7MB/s <10 or so seconds> 60MB/s <5 or so seconds> 2MB/s <10 or so seconds> 100MB/s <5 or so seconds> 2MB/s

So it seems like more of a congestion issue? There is no traffic on the local LAN.

comment:8 by diver, 3 years ago

Component: - General → Network & Internet/TCP
Owner: changed from nobody to axeld

comment:9 by axeld, 3 years ago

Owner: changed from axeld to nobody
Status: new → assigned

comment:10 by kallisti5, 14 months ago

We have iperf in our repositories. Assuming there is nothing wrong with our port, I ran a few tests over loopback:

iperf -s
iperf -c 127.0.0.1
  • Linux x86_64 4.18.10-200.fc28.x86_64: (Ryzen 1800X)
    • TCP window size: 2.50 MByte (default)
      • [ 3] 0.0-10.0 sec 57.3 GBytes 49.3 Gbits/sec
      • [ 3] 0.0-10.0 sec 57.0 GBytes 48.9 Gbits/sec
  • Linux x86_64 4.14.16-300.fc27.x86_64: (VM @ Vultr)
    • TCP window size: 2.50 MByte (default)
      • [ 3] 0.0-10.0 sec 17.8 GBytes 15.3 Gbits/sec
      • [ 3] 0.0-10.0 sec 16.6 GBytes 14.3 Gbits/sec
  • Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
    • TCP window size: 32.0 KByte (default)
    • Uptime 15 days, 22 hours, 46 minutes - exposed to the internet.
      • [ 3] 0.0-10.0 sec 71.2 MBytes 59.7 Mbits/sec
      • [ 3] 0.0-10.0 sec 44.7 MBytes 37.5 Mbits/sec
  • Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
    • TCP window size: 32.0 KByte (default)
    • Uptime 1 minute
      • [ 3] 0.0-10.0 sec 62.2 MBytes 52.2 Mbits/sec
      • [ 3] 0.0-10.0 sec 58.6 MBytes 49.2 Mbits/sec
  • Haiku, x86_64 hrev52295-16-g039fdd9ffe (early beta1 branch) (VM @ Vultr)
    • TCP window size: 2.50 MByte
    • Uptime 7 minutes
      • [ 3] 0.0-10.0 sec 65.8 MBytes 55.2 Mbits/sec
      • [ 3] 0.0-10.0 sec 33.6 MBytes 28.2 Mbits/sec
  • Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
    • TCP window size: 32.0 KByte (default)
      • [ 3] 0.0-10.0 sec 9.35 GBytes 8.03 Gbits/sec
      • [ 3] 0.0-10.0 sec 6.48 GBytes 5.56 Gbits/sec
  • Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
    • TCP window size: 2.50 MByte
      • [ 3] 0.0-10.0 sec 10.7 GBytes 9.23 Gbits/sec
      • [ 3] 0.0-10.0 sec 18.0 MBytes 15.1 Mbits/sec
      • [ 3] 0.0-10.0 sec 10.5 GBytes 9.01 Gbits/sec
      • [ 3] 0.0-10.0 sec 505 MBytes 424 Mbits/sec
      • [ 3] 0.0-10.0 sec 16.1 MBytes 13.5 Mbits/sec
      • [ 3] 0.0-60.0 sec 30.8 GBytes 4.41 Gbits/sec
Last edited 14 months ago by kallisti5
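
When comparing these rows it helps to confirm that the two columns agree, since iperf mixes unit systems. The sketch below recomputes the bandwidth column from the transfer column, assuming iperf 2's convention of binary units for bytes (1 MByte = 2**20 bytes) and decimal units for bits (1 Mbit = 10**6 bits), and assuming the nominal interval length is exact.

```python
# Recompute iperf's bandwidth column from its transfer column.
# Assumption: transfers use binary units (1 MByte = 2**20 bytes) and
# bandwidth uses decimal units (1 Mbit = 10**6 bits), as in iperf 2.
BYTE_UNITS = {"KBytes": 2**10, "MBytes": 2**20, "GBytes": 2**30}


def bandwidth_mbits(amount, unit, seconds):
    """Convert a transfer size and duration into Mbits/sec."""
    total_bits = amount * BYTE_UNITS[unit] * 8
    return total_bits / seconds / 1e6


if __name__ == "__main__":
    samples = [
        (71.2, "MBytes", 10.0),  # Haiku VM, reported 59.7 Mbits/sec
        (9.35, "GBytes", 10.0),  # Haiku desktop, reported 8.03 Gbits/sec
        (30.8, "GBytes", 60.0),  # Haiku desktop 60 s run, reported 4.41 Gbits/sec
    ]
    for amount, unit, seconds in samples:
        rate = bandwidth_mbits(amount, unit, seconds)
        print(f"{amount} {unit} in {seconds:.0f} s = {rate:.1f} Mbits/sec")
```

The recomputed values line up with the reported ones, so the large swings between runs on the same machine (for example 10.7 GBytes versus 18.0 MBytes in 10 seconds) are real and not a unit artifact.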

comment:11 by kallisti5, 14 months ago

One interesting output is strace -c from iperf:

~> strace -c iperf -c 127.0.0.1
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 40108 connected with 127.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  58.9 MBytes  49.4 Mbits/sec

Time % Usecs      Calls   Usecs/call Syscall
------ ---------- ------- ---------- --------------------
100.00   10079874       3    3359958 _kern_mutex_switch_lock
  0.00        128       1        128 _kern_image_relocated
  0.00         23       2         11 _kern_spawn_thread
  0.00         19       6          3 _kern_set_area_protection
  0.00          4       1          4 _kern_create_area
  0.00          3       1          3 _kern_exit_team
  0.00          3       2          1 _kern_get_system_info
  0.00          2       1          2 _kern_reserve_address_range
  0.00          2       2          1 _kern_resume_thread
  0.00          1       1          1 _kern_resize_area
  0.00          0       1          0 _kern_get_next_image_info
  0.00          0       4          0 _kern_sigaction
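
A quick way to read the table is to total the time column. The sketch below parses a few of the rows above (copied verbatim) and sums them. The total comes to roughly the 10 second length of the iperf run, almost all of it in `_kern_mutex_switch_lock`. One possible reading, and it is only an assumption, is that the traced thread spent the run blocked on a mutex while the transfer happened in other threads, in which case this table says little about the socket calls themselves.

```python
# Summarize a few rows of the `strace -c` table above.
# Columns: time %, total usecs, calls, usecs/call, syscall name.
ROWS = """\
100.00   10079874       3    3359958 _kern_mutex_switch_lock
  0.00        128       1        128 _kern_image_relocated
  0.00         23       2         11 _kern_spawn_thread
"""


def parse_rows(text):
    """Parse whitespace-separated strace -c rows into dictionaries."""
    parsed = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip headers, separators, and blank lines
        _percent, usecs, calls, _per_call, name = fields
        parsed.append({"name": name, "usecs": int(usecs), "calls": int(calls)})
    return parsed


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    total = sum(row["usecs"] for row in rows)
    for row in rows:
        print(f'{row["name"]}: {row["usecs"] / 1e6:.3f} s in {row["calls"]} calls')
    print(f"total syscall time: {total / 1e6:.3f} s")
```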

comment:12 by kallisti5, 14 months ago

https://github.com/google/benchmark seems like it might be helpful to see where we're at vs Linux.

Linux 4.18.10-200.fc28.x86_64:

2018-10-05 09:42:27
Running ./test/basic_test
Run on (8 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------
Benchmark                                              Time           CPU Iterations
-------------------------------------------------------------------------------------
BM_empty                                               3 ns          3 ns  233438098
BM_empty/threads:8                                     1 ns          6 ns  122371744
BM_spin_empty/8                                       46 ns         46 ns   15205421
BM_spin_empty/512                                   2695 ns       2692 ns     257997
BM_spin_empty/8192                                 43966 ns      43901 ns      16100
BM_spin_empty/8/threads:8                             12 ns         95 ns    7261392
BM_spin_empty/512/threads:8                          698 ns       5488 ns     130448
BM_spin_empty/8192/threads:8                       11046 ns      86815 ns       7984
BM_spin_pause_before/8                                46 ns         46 ns   15192276
BM_spin_pause_before/512                            2693 ns       2689 ns     267952
BM_spin_pause_before/8192                          42290 ns      42232 ns      16134
BM_spin_pause_before/8/threads:8                      12 ns         97 ns    7226928
BM_spin_pause_before/512/threads:8                   701 ns       5562 ns     128960
BM_spin_pause_before/8192/threads:8                11219 ns      88271 ns       7840
BM_spin_pause_during/8                               458 ns        459 ns    1525514
BM_spin_pause_during/512                            3326 ns       3325 ns     201580
BM_spin_pause_during/8192                          45514 ns      45407 ns      15909
BM_spin_pause_during/8/threads:8                      75 ns        594 ns    1175000
BM_spin_pause_during/512/threads:8                   771 ns       6036 ns     116296
BM_spin_pause_during/8192/threads:8                11327 ns      89091 ns       7984
BM_pause_during                                      437 ns        437 ns    1636400
BM_pause_during/threads:8                             66 ns        517 ns    1353320
BM_pause_during/real_time                            425 ns        427 ns    1620308
BM_pause_during/real_time/threads:8                   66 ns        517 ns   10697656
BM_spin_pause_after/8                                 54 ns         54 ns   10148460
BM_spin_pause_after/512                             3067 ns       3051 ns     234550
BM_spin_pause_after/8192                           47197 ns      47013 ns      13512
BM_spin_pause_after/8/threads:8                       13 ns         97 ns    8500864
BM_spin_pause_after/512/threads:8                    714 ns       5651 ns     127944
BM_spin_pause_after/8192/threads:8                 11470 ns      88168 ns       8456
BM_spin_pause_before_and_after/8                      54 ns         54 ns   13917278
BM_spin_pause_before_and_after/512                  3030 ns       3016 ns     213215
BM_spin_pause_before_and_after/8192                43801 ns      43707 ns      16402
BM_spin_pause_before_and_after/8/threads:8            12 ns         95 ns    7654360
BM_spin_pause_before_and_after/512/threads:8         719 ns       5040 ns     138824
BM_spin_pause_before_and_after/8192/threads:8      11009 ns      84438 ns       8088
BM_empty_stop_start                                    2 ns          2 ns  327797067
BM_empty_stop_start/threads:8                          0 ns          3 ns  275301568
BM_KeepRunning                                         3 ns          3 ns  244697531
BM_KeepRunningBatch                                    0 ns          0 ns 1000000091
BM_RangedFor                                           2 ns          2 ns  373357250

comment:13 by kallisti5, 14 months ago

  • Haiku x86_64 hrev52295+100: (beta1 + updates) (Dell i5 Optiplex 3010 desktop)
    • rtl81xx NIC
  • Linux x86_64 4.18.10-200.fc28.x86_64: (Ryzen 1800X)
    • Intel 1Gbit NIC
  • Cat5e, 1 Gbit switch
  • [ 3] 0.0-10.0 sec 935 MBytes 784 Mbits/sec
  • [ 3] 0.0-10.0 sec 935 MBytes 785 Mbits/sec
  • [ 3] 0.0-10.0 sec 840 MBytes 704 Mbits/sec
  • [ 3] 0.0-10.0 sec 935 MBytes 784 Mbits/sec
  • [ 3] 0.0-10.0 sec 840 MBytes 705 Mbits/sec
  • [ 3] 0.0-60.0 sec 5.37 GBytes 768 Mbits/sec

All of these seem normal over a local network :-|
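
For context on "normal", the sketch below estimates the TCP goodput ceiling on gigabit Ethernet. It assumes IPv4, a standard 1500-byte MTU, no VLAN tag, and TCP timestamps enabled; none of these were confirmed in the ticket.

```python
# Theoretical TCP goodput on gigabit Ethernet with a standard 1500-byte MTU.
# Assumptions: IPv4, no VLAN tag, no jumbo frames.
LINE_RATE_MBITS = 1000.0
MTU = 1500
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12  # padded RFC 1323 timestamp option


def goodput_mbits(with_timestamps):
    """Return the payload rate in Mbits/sec for full-size TCP segments."""
    tcp_header = TCP_HEADER + (TCP_TIMESTAMP_OPTION if with_timestamps else 0)
    payload = MTU - IP_HEADER - tcp_header
    wire_bytes = MTU + ETH_OVERHEAD
    return LINE_RATE_MBITS * payload / wire_bytes


if __name__ == "__main__":
    ceiling = goodput_mbits(with_timestamps=True)
    print(f"ceiling with timestamps: {ceiling:.0f} Mbits/sec")
    for measured in (784, 704, 768):  # values from the runs above
        share = 100 * measured / ceiling
        print(f"{measured} Mbits/sec is {share:.0f}% of the ceiling")
```

Under those assumptions the measured runs sit somewhat below the ceiling but well within the range expected from a real gigabit link, which supports the conclusion that LAN throughput is not the problem here.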

comment:14 by waddlesplash, 4 months ago

No change required?
