Gluon: Bad VXLAN Performance

Created on 21 Jan 2018  ·  5 Comments  ·  Source: freifunk-gluon/gluon

While testing the current master, I noticed a significant performance drop when using VXLAN for wired mesh instead of the old-fashioned way that binds the wired mesh interface directly to bat0:

Test setup: TP-Link TL-WDR4300 v1 vs. Siemens Futro S550, both connected to the same GbE-capable switch, VPN turned off on the WDR4300:

  1. Test with legacy Wiremesh:

Initiated from WDR4300:

root@ffnoc12:~# batctl tp -t 10000 a6:07:f9:5a:71:db
Test duration 10010ms.
Sent 382825692 Bytes.
Throughput: 36.47 MB/s (305.95 Mbps)

Initiated from Offloader

root@ffnocoffloader:~# batctl tp -t 10000 46:3d:e1:c3:7f:63
Test duration 10020ms.
Sent 457679556 Bytes.
Throughput: 43.56 MB/s (365.41 Mbps)

  2. Same test with VXLAN:

Initiated on WDR:

root@ffnoc12:~# uci set network.mesh_wan.legacy='0'
root@ffnoc12:~# uci commit network
root@ffnoc12:~# /etc/init.d/network restart
root@ffnoc12:~# batctl tp -t 10000 a6:07:f9:5a:71:db
Test duration 10020ms.
Sent 30967956 Bytes.
Throughput: 2.95 MB/s (24.72 Mbps)

Initiated on Offloader:

root@ffnocoffloader:~# batctl tp -t 10000 46:3d:e1:c3:7f:63
Test duration 10110ms.
Sent 47189196 Bytes.
Throughput: 4.45 MB/s (37.34 Mbps)

To prevent domain shortcuts, VXLAN (or any option to authenticate the wired mesh) is a good idea, but for RF backbone purposes this is too slow.
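As a cross-check, the throughput lines in the batctl output above can be recomputed from the reported byte counts and durations. This is a minimal sketch: the function name `throughput` and the `runs` table are illustrative, and the unit convention (MB/s as 2^20 bytes per second, Mbps as 10^6 bits per second) is inferred from the numbers in this report, not taken from batctl's documentation.

```python
MIB = 1024 * 1024  # assumption: batctl's "MB/s" is 2**20 bytes per second


def throughput(sent_bytes: int, duration_ms: int) -> tuple[float, float]:
    """Return (MB/s, Mbps) for a transfer of sent_bytes in duration_ms."""
    bytes_per_second = sent_bytes / (duration_ms / 1000)
    return bytes_per_second / MIB, bytes_per_second * 8 / 1e6


# (bytes sent, test duration in ms), copied from the output above
runs = {
    "legacy, from WDR4300": (382825692, 10010),
    "legacy, from offloader": (457679556, 10020),
    "VXLAN, from WDR4300": (30967956, 10020),
    "VXLAN, from offloader": (47189196, 10110),
}

for label, (sent, duration_ms) in runs.items():
    mb_s, mbps = throughput(sent, duration_ms)
    print(f"{label}: {mb_s:.2f} MB/s ({mbps:.2f} Mbps)")
```

Under that assumption, the recomputed values match the reported lines to two decimal places, so in this test the VXLAN runs reach roughly a tenth of the legacy throughput.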

bug regression

All 5 comments

I have tested this with both iperf and batctl tp; I could reproduce the extreme performance drop with batctl, but not with iperf (or rather, with iperf the performance was rather bad even in legacy mode). The reason is fragmentation: the packet size used by batctl's throughput meter is chosen so that it goes through a 1500 byte link without fragmentation, but it needs to be fragmented over a 1430 byte link.
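The 1430 byte figure is consistent with the usual VXLAN encapsulation overhead on a 1500 byte Ethernet link. The following is a minimal sketch of that arithmetic, assuming the tunnel runs over IPv6 (outer IPv6, UDP, and VXLAN headers plus the encapsulated inner Ethernet header). The header sizes are the standard protocol values; the constant and function names are illustrative. The thread does not state the exact packet size batctl uses, so it is left out.

```python
# Header sizes in bytes (standard protocol values).
IPV6_HEADER = 40            # outer IPv6 header (assumption: VXLAN over IPv6)
UDP_HEADER = 8              # outer UDP header
VXLAN_HEADER = 8            # VXLAN header
INNER_ETHERNET_HEADER = 14  # Ethernet header of the encapsulated frame


def vxlan_inner_mtu(link_mtu: int) -> int:
    """MTU left for the encapsulated payload (here: batman-adv packets)."""
    overhead = IPV6_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
    return link_mtu - overhead


print(vxlan_inner_mtu(1500))  # 1430
```

Any batman-adv packet sized to fit a 1500 byte link exactly is therefore 70 bytes too large for the VXLAN interface and has to be fragmented.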

Therefore, the numbers given by iperf are more accurate for real-life scenarios, as fragmentation is usually necessary both in VXLAN and legacy mode (or in neither, with proper MSS clamping). I have pushed a few optimizations for wired meshing (2950cc3f596d5565390aaa1188cdb67d2401840b affects both legacy and VXLAN mode, a9edd43693a02e0829d04a83a13ebbf0f7eef3ee and e54b37d835624059d005bdb771442bb3f1dd4605 slightly improve VXLAN performance). There will also be a follow-up to 7ae8a511267e7f280862fcd57f8ae394b947b799 in a few days.

With all these patches applied (including the follow-up), I have measured the following numbers with iperf on a WDR3600 (using my notebook on the other side) without MSS clamping:

  • legacy RX: 96.9 Mbits/sec
  • legacy TX: 107 Mbits/sec
  • VXLAN RX: 56.5 Mbits/sec
  • VXLAN TX: 46.2 Mbits/sec

Reducing the MSS to avoid fragmentation, I get the following numbers:

  • legacy RX: 164 Mbits/sec
  • legacy TX: 149 Mbits/sec
  • VXLAN RX: 88.5 Mbits/sec
  • VXLAN TX: 74.3 Mbits/sec

So VXLAN does cost performance without doubt, but it is by no means as bad as batctl tp might suggest. I will look into further optimization options (e.g. ip6tables, which is responsible for a considerable part of the performance drop).
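To put the iperf numbers above in relative terms, here is a small sketch that expresses VXLAN throughput as a percentage of legacy throughput. The data is copied from the two lists above; the `results` structure and the `vxlan_share` helper are illustrative names only.

```python
# iperf results in Mbit/s, copied from the measurements above
results = {
    "without MSS clamping": {
        "RX": {"legacy": 96.9, "vxlan": 56.5},
        "TX": {"legacy": 107.0, "vxlan": 46.2},
    },
    "with reduced MSS": {
        "RX": {"legacy": 164.0, "vxlan": 88.5},
        "TX": {"legacy": 149.0, "vxlan": 74.3},
    },
}


def vxlan_share(legacy: float, vxlan: float) -> float:
    """VXLAN throughput as a percentage of legacy throughput."""
    return 100 * vxlan / legacy


for scenario, directions in results.items():
    for direction, rates in directions.items():
        share = vxlan_share(rates["legacy"], rates["vxlan"])
        print(f"{scenario}, {direction}: {share:.1f}% of legacy")
```

By this measure, VXLAN reaches roughly 43% to 58% of legacy throughput in these tests, compared with roughly 10% in the batctl tp runs.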

Did you run iperf ON the WDR3600? If so, wasn't this test limited by the CPU time consumed by iperf itself?
We have always run iperf with real x86 machines on both ends only.
If I'm on the wrong track, please ignore :-D

This check was with iperf on the WDR3600. This obviously harmed test performance, but I only wanted to test relative performance with and without VXLAN, and not the maximum achievable throughput.

With d87a798ac3d1e8cef3d83c22c4482afd21886c34, all throughput optimizations that are easily possible have been made. Firewall performance will be revisited after the next release.

Maybe we could document how much the performance has improved compared to the measurements @lephisto and @NeoRaider did in January?
Or, the other way around, how much the performance still suffers compared to legacy meshing.
