Since 1.9.0, dnsdist can use AF_XDP for high-performance UDP packet processing on recent Linux kernels (4.18+). It requires dnsdist to have the CAP_NET_ADMIN, CAP_SYS_ADMIN and CAP_NET_RAW capabilities at startup, and to have been compiled with the --with-xsk configure option.
AF_XDP works as follows: dnsdist allocates a number of frames in a memory area called a UMEM, which is accessible both by the program, in userspace, and by the kernel. Using four in-memory ring buffers, the receive (RX), transmit (TX), completion (CQ) and fill (FQ) rings, the kernel can very efficiently pass raw incoming packets to dnsdist, which can in turn pass raw outgoing packets to the kernel.
In addition, an eBPF XDP program needs to be loaded to decide which packets to distribute via an AF_XDP socket, and to which one, as there is usually more than one. This program uses a BPF map of type XSKMAP (located at /sys/fs/bpf/dnsdist/xskmap by default), populated by dnsdist at startup, to locate the AF_XDP socket to use. dnsdist also sets up two additional BPF maps (located at /sys/fs/bpf/dnsdist/xsk-destinations-v4 and /sys/fs/bpf/dnsdist/xsk-destinations-v6) to let the XDP program know which IP destinations are to be routed to the AF_XDP sockets and which are to be passed to the regular network stack (health-check queries and responses, for example). A ready-to-use XDP program can be found in the contrib directory of the PowerDNS Git repository.
Then dnsdist needs to be configured to use AF_XDP, first by creating an XskSocket object that is tied to a specific queue of a specific network interface:
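A minimal configuration sketch of this step. The newXsk() constructor and its option names (ifName, NIC_queue_id, frameNums, xskMapPath) are written from memory of the dnsdist 1.9 reference and should be treated as assumptions to check against the documentation of the installed version:

```lua
-- Sketch only: function and option names are assumptions, verify them
-- against the dnsdist reference for your version.
xsk = newXsk({
  ifName = "enp1s0",                          -- network interface to bind to
  NIC_queue_id = 0,                           -- first receive queue of that interface
  frameNums = 65536,                          -- number of frames allocated in the UMEM
  xskMapPath = "/sys/fs/bpf/dnsdist/xskmap"   -- XSKMAP populated by dnsdist
})
```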
This ties the new object to the first receive queue on enp1s0, allocating 65536 frames and populating the map located at /sys/fs/bpf/dnsdist/xskmap.
Then we can tell dnsdist to listen for AF_XDP packets sent to 192.0.2.1:53, in addition to packets coming via the regular network stack:
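As a sketch, the socket can be attached to a listener through an option of addLocal(). The xskSocket option name and the newXsk() parameters are assumptions taken from the dnsdist 1.9 reference; the socket creation is repeated so that the fragment is complete on its own:

```lua
-- Sketch only: option names are assumptions, verify them for your version.
local xsk = newXsk({
  ifName = "enp1s0",
  NIC_queue_id = 0,
  frameNums = 65536,
  xskMapPath = "/sys/fs/bpf/dnsdist/xskmap"
})

-- Accept queries to 192.0.2.1:53 both via AF_XDP and via the regular stack.
addLocal("192.0.2.1:53", {xskSocket = xsk})
```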
In practice most high-speed (>= 10 Gbps) network interfaces support multiple queues to offer better performance, so we need to allocate one XskSocket per queue. The number of queues for a given interface can be retrieved with ethtool -l, for example sudo ethtool -l enp1s0.
The Combined lines of the ethtool -l output give the number of queues. In this example the interface supports 8 queues, so we can do something like this:
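One possible way to write this is a loop creating one XskSocket and one listener per queue. The option names, and the use of reusePort so that several listeners can share the same address, are assumptions based on the dnsdist 1.9 reference:

```lua
-- Sketch only: one XskSocket and one addLocal() per NIC queue.
-- Queue IDs start at 0, so Lua index i maps to queue i - 1.
for i = 1, 8 do
  local xsk = newXsk({
    ifName = "enp1s0",
    NIC_queue_id = i - 1,
    frameNums = 65536,
    xskMapPath = "/sys/fs/bpf/dnsdist/xskmap"
  })
  -- reusePort lets the 8 listeners bind to the same address and port.
  addLocal("192.0.2.1:53", {xskSocket = xsk, reusePort = true})
end
```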
This will start one router thread per XskSocket object, plus one worker thread per addLocal() using that XskSocket object.
We can also instruct dnsdist to use AF_XDP to send and receive UDP packets to and from a backend, in addition to packets from clients:
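A sketch of such a setup, collecting the sockets in a table and handing them to newServer(). The backend address 192.0.2.2:53 is purely illustrative, and the xskSockets option name (plural, taking an array) is an assumption to verify against the newServer() reference:

```lua
-- Sketch only: keep every XskSocket so the backend can use all of them.
local sockets = {}
for i = 1, 8 do
  local xsk = newXsk({
    ifName = "enp1s0",
    NIC_queue_id = i - 1,
    frameNums = 65536,
    xskMapPath = "/sys/fs/bpf/dnsdist/xskmap"
  })
  table.insert(sockets, xsk)
  addLocal("192.0.2.1:53", {xskSocket = xsk, reusePort = true})
end

-- Hypothetical backend address; xskSockets is assumed to accept an array.
newServer("192.0.2.2:53", {xskSockets = sockets})
```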
This will start one router thread per XskSocket object, plus one worker thread per addLocal()/newServer() using that XskSocket object.
We are not passing the MAC address of the backend (or of the gateway used to reach it) directly, so dnsdist will try to fetch it from the system MAC address cache. This may not work, in which case we might need to pass it explicitly:
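A sketch of passing the address by hand. The MACAddr and xskSockets option names are assumptions based on the dnsdist 1.9 newServer() reference, and both the backend address and the MAC address are placeholders:

```lua
-- Sketch only: a single socket is created here to keep the fragment complete.
local sockets = {}
table.insert(sockets, newXsk({
  ifName = "enp1s0",
  NIC_queue_id = 0,
  frameNums = 65536,
  xskMapPath = "/sys/fs/bpf/dnsdist/xskmap"
}))

-- Placeholder MAC address of the backend, or of the gateway used to reach it.
newServer("192.0.2.2:53", {xskSockets = sockets, MACAddr = "00:11:22:33:44:55"})
```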
Using kxdpgun, we can compare the performance of dnsdist using the regular network stack and AF_XDP.
This test was carried out using two Intel E3-1270 machines with 4 cores (8 threads) running at 3.8 GHz, using Intel 82599 10 Gbps network cards. On both the injector running kxdpgun and the box running dnsdist there was no firewall, the CPU frequency governor was set to performance, the UDP buffers were raised to 16777216 and the receive queue hash policy was set to use the IP addresses and ports (see Performance Tuning).
dnsdist was configured to immediately respond to incoming queries with REFUSED:
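One way to express such a rule in the dnsdist configuration, shown as a sketch and not necessarily the exact rule used for the test:

```lua
-- Answer every incoming query with REFUSED without contacting a backend.
addAction(AllRule(), RCodeAction(DNSRCode.REFUSED))
```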
On the injector box we ran kxdpgun against dnsdist twice, first without AF_XDP and then with it.
The first run handled roughly 1 million QPS and the second 2.5 million, with CPU usage being much lower in the AF_XDP case.
dnsdist needs several additional permissions to use AF_XDP:
to access the BPF maps directory, it needs to be able to traverse the /sys/fs/bpf directory: one option is to chmod o+x /sys/fs/bpf; a safer one is to restrict that to the dnsdist user via chgrp dnsdist /sys/fs/bpf && chmod g+x /sys/fs/bpf
to read the BPF maps themselves, they need to be readable by the dnsdist user: chown -R dnsdist:dnsdist /sys/fs/bpf/dnsdist/
to create AF_XDP sockets: add AF_XDP to RestrictAddressFamilies in the systemd unit file
to load a BPF program: add CAP_SYS_ADMIN to CapabilityBoundingSet and AmbientCapabilities in the systemd unit file
to create raw network sockets: add CAP_NET_RAW to CapabilityBoundingSet and AmbientCapabilities in the systemd unit file
and finally to lock enough memory: ensure that LimitMEMLOCK=infinity is set in the systemd unit file