Learning kernel-bypass networking in a simpler way

Jackie Dinh
2 min read · Jan 3, 2022

In the previous article, we showed that a modern Linux server can hold one million TCP connections, but handling millions of requests per second (the C10M problem) is much harder with the current Linux network stack. The reason is the stack's complexity: data from the NIC has to travel through many layers before it reaches the application.

The limits of the Linux network stack are addressed by kernel-bypass networking. Examples are Netmap and DPDK. These frameworks can handle millions of raw packets per second at line rate (about 1.488 Mpps on a 1 Gb card and 14.88 Mpps on a 10 Gb card, counting minimum-size 64-byte frames).
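The line-rate figures follow from simple arithmetic, which the small sketch below reproduces. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap); the function name line_rate_pps is ours, chosen only for illustration.

```c
#include <stdint.h>

/* Bytes each frame occupies on the wire beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Packets per second at line rate for a given link speed and frame size.
 * frame_bytes includes the 4-byte FCS, so the Ethernet minimum is 64. */
uint64_t line_rate_pps(uint64_t link_bits_per_sec, uint32_t frame_bytes)
{
    uint64_t bits_per_frame = (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u;
    return link_bits_per_sec / bits_per_frame;
}
```

With 64-byte frames, each packet takes 84 bytes (672 bits) on the wire, which is where the roughly 1.488 Mpps and 14.88 Mpps figures come from.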

Ixy is a good starting point for learning more about kernel-bypass networking. An even simpler option is the driver at this repo for the e1000 (Intel 8254x) card, which is readily available in virtual machines such as QEMU and VirtualBox.

The basic idea of kernel-bypass networking is to interact with the network device directly, instead of using Linux system calls to send and receive data through the kernel.

A userspace driver consists of the following steps:
1. Read pci config, enable DMA and get base address of device:

- pci_addr is the PCI bus address in the format <domain:bus:device.function>.
- Setting bit 2 (bus master enable) at offset 4 (the command register) of the PCI config space to 1 enables DMA.
- Calling mmap() maps the device into memory, so that we can access device registers and memory directly through the base address.
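Step 1 can be sketched roughly as follows. This is a minimal sketch, not the repo's actual code: it assumes the device is reachable through the Linux sysfs files /sys/bus/pci/devices/<pci_addr>/config and resource0, that the process runs as root on a little-endian machine, and the helper names enable_dma, map_resource and pci_open are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define PCI_COMMAND_OFFSET     4          /* command register in config space */
#define PCI_COMMAND_BUS_MASTER (1u << 2)  /* bit 2: bus master (DMA) enable */

/* Set the bus-master bit in a PCI config file. Returns 0 on success, -1 on error.
 * Config space is little-endian; we assume the host is too. */
int enable_dma(const char *config_path)
{
    int fd = open(config_path, O_RDWR);
    if (fd < 0)
        return -1;
    uint16_t cmd = 0;
    int ok = pread(fd, &cmd, sizeof cmd, PCI_COMMAND_OFFSET) == (ssize_t)sizeof cmd;
    if (ok) {
        cmd |= PCI_COMMAND_BUS_MASTER;
        ok = pwrite(fd, &cmd, sizeof cmd, PCI_COMMAND_OFFSET) == (ssize_t)sizeof cmd;
    }
    close(fd);
    return ok ? 0 : -1;
}

/* Map a whole resource file (for example BAR0, exposed as resource0) into memory.
 * Returns the base address, or NULL on error; *len receives the mapping size. */
volatile uint8_t *map_resource(const char *resource_path, size_t *len)
{
    int fd = open(resource_path, O_RDWR);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return (volatile uint8_t *)p;
}

/* Enable DMA and return the register base address for a device such as "0000:00:03.0". */
volatile uint8_t *pci_open(const char *pci_addr, size_t *len)
{
    char path[128];
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/config", pci_addr);
    if (enable_dma(path) != 0)
        return NULL;
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/resource0", pci_addr);
    return map_resource(path, len);
}
```

Before running something like this on a real card, the kernel's own e1000 driver normally has to be unbound from the device, otherwise both drivers would fight over the same hardware.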

2. Initialize driver:

Note that with the base address, we can access device registers by their offset from it. For example, base_addr[E1000_IMS] accesses the interrupt mask set register at base_addr + E1000_IMS.
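A small pair of helpers makes this register access explicit. In this sketch we assume base is a byte pointer to the mapped device memory and that E1000_IMS is a byte offset (0x00D0 as we read the 8254x manual); some drivers instead index an array of 32-bit words, in which case the constant would be divided by 4. The names reg_write and reg_read are ours.

```c
#include <stdint.h>

#define E1000_IMS 0x00D0 /* Interrupt Mask Set register, byte offset (assumed from the manual) */

/* volatile forces a real 32-bit load or store on every access, so the
 * compiler cannot cache or reorder away reads and writes to the device. */
static inline void reg_write(volatile uint8_t *base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(base + offset) = value;
}

static inline uint32_t reg_read(volatile uint8_t *base, uint32_t offset)
{
    return *(volatile uint32_t *)(base + offset);
}
```

For example, reg_write(base, E1000_IMS, mask) asks the device to unmask the interrupts whose bits are set in mask.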

Steps to initialize a device:
- reset the device by setting the reset bit of the control register
- initialize the transmit queue by marking all buffers as empty
- initialize the receive queue by allocating memory for receive buffers. Note that the device only accesses physical memory addresses.
- set up the transmit and receive control bits
- enable receive interrupts
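The initialization steps above might look roughly like this in code. It is a simplified sketch rather than the repo's driver: register offsets and bit positions are quoted from our reading of the 8254x manual and should be checked against it, the struct e1000 layout is hypothetical, the caller is assumed to supply DMA-capable memory together with its physical addresses, and the wait a real device needs after reset is left out.

```c
#include <stdint.h>
#include <string.h>

/* Register byte offsets and bits: assumptions to verify against the manual. */
#define E1000_CTRL  0x0000
#define E1000_IMS   0x00D0
#define E1000_RCTL  0x0100
#define E1000_TCTL  0x0400
#define E1000_RDBAL 0x2800
#define E1000_RDBAH 0x2804
#define E1000_RDLEN 0x2808
#define E1000_RDH   0x2810
#define E1000_RDT   0x2818
#define E1000_TDBAL 0x3800
#define E1000_TDBAH 0x3804
#define E1000_TDLEN 0x3808
#define E1000_TDH   0x3810
#define E1000_TDT   0x3818
#define E1000_CTRL_RST    (1u << 26) /* device reset */
#define E1000_RCTL_EN     (1u << 1)  /* receiver enable */
#define E1000_TCTL_EN     (1u << 1)  /* transmitter enable */
#define E1000_IMS_RXT0    (1u << 7)  /* receive timer interrupt */
#define E1000_TXD_STAT_DD (1u << 0)  /* descriptor done */
#define RING_SIZE 16u
#define BUF_SIZE  2048u /* assumed to match the default receive buffer size */

/* Legacy 16-byte descriptors shared between driver and device. */
struct rx_desc { uint64_t addr; uint16_t length, checksum; uint8_t status, errors; uint16_t special; };
struct tx_desc { uint64_t addr; uint16_t length; uint8_t cso, cmd, status, css; uint16_t special; };

struct e1000 {                 /* hypothetical driver state */
    volatile uint8_t *base;    /* mapped register base address */
    struct rx_desc *rx_ring;   /* virtual addresses of the rings */
    struct tx_desc *tx_ring;
    uint64_t rx_ring_phys;     /* physical addresses the device will use */
    uint64_t tx_ring_phys;
    uint64_t rx_bufs_phys;     /* RING_SIZE buffers of BUF_SIZE bytes each */
};

static inline void wr(struct e1000 *d, uint32_t off, uint32_t v) { *(volatile uint32_t *)(d->base + off) = v; }
static inline uint32_t rd(struct e1000 *d, uint32_t off) { return *(volatile uint32_t *)(d->base + off); }

void e1000_init(struct e1000 *d)
{
    /* 1. Reset the device (real hardware needs a short wait afterwards). */
    wr(d, E1000_CTRL, rd(d, E1000_CTRL) | E1000_CTRL_RST);

    /* 2. Transmit ring: mark every descriptor as done, i.e. free to reuse. */
    memset(d->tx_ring, 0, RING_SIZE * sizeof *d->tx_ring);
    for (uint32_t i = 0; i < RING_SIZE; i++)
        d->tx_ring[i].status = E1000_TXD_STAT_DD;
    wr(d, E1000_TDBAL, (uint32_t)d->tx_ring_phys);
    wr(d, E1000_TDBAH, (uint32_t)(d->tx_ring_phys >> 32));
    wr(d, E1000_TDLEN, RING_SIZE * sizeof *d->tx_ring);
    wr(d, E1000_TDH, 0);
    wr(d, E1000_TDT, 0);

    /* 3. Receive ring: give each descriptor the physical address of a buffer. */
    memset(d->rx_ring, 0, RING_SIZE * sizeof *d->rx_ring);
    for (uint32_t i = 0; i < RING_SIZE; i++)
        d->rx_ring[i].addr = d->rx_bufs_phys + (uint64_t)i * BUF_SIZE;
    wr(d, E1000_RDBAL, (uint32_t)d->rx_ring_phys);
    wr(d, E1000_RDBAH, (uint32_t)(d->rx_ring_phys >> 32));
    wr(d, E1000_RDLEN, RING_SIZE * sizeof *d->rx_ring);
    wr(d, E1000_RDH, 0);
    wr(d, E1000_RDT, RING_SIZE - 1);

    /* 4. Transmit and receive control bits. */
    wr(d, E1000_TCTL, E1000_TCTL_EN);
    wr(d, E1000_RCTL, E1000_RCTL_EN);

    /* 5. Enable receive interrupts. */
    wr(d, E1000_IMS, E1000_IMS_RXT0);
}
```

Passing physical addresses in from the caller keeps the sketch independent of how the memory was obtained; frameworks such as Ixy allocate it from hugepages and translate virtual addresses to physical ones themselves.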

For more details, see the e1000 specs here. With this setup, the driver and the device share the same memory for receiving packets, so we can build applications that access raw packets from the NIC without copying data (zero-copy).
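To illustrate the zero-copy idea, here is a hedged sketch of a receive path that hands the application a pointer straight into the shared buffer. The struct rx_queue layout and the names rx_poll and rx_release are hypothetical; the descriptor format and the "descriptor done" status bit are assumptions based on the legacy e1000 receive descriptor.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u
#define BUF_SIZE  2048u
#define E1000_RXD_STAT_DD (1u << 0) /* set by the device once a packet is written */

struct rx_desc { uint64_t addr; uint16_t length, checksum; uint8_t status, errors; uint16_t special; };

struct rx_queue {                  /* hypothetical driver state */
    volatile struct rx_desc *ring; /* descriptors shared with the device */
    uint8_t *bufs;                 /* virtual address of the receive buffers */
    uint32_t next;                 /* next descriptor we expect to be filled */
};

/* Return a pointer to the next received frame inside the shared buffer,
 * or NULL if the device has not filled the descriptor yet. Nothing is copied. */
const uint8_t *rx_poll(struct rx_queue *q, uint16_t *len)
{
    volatile struct rx_desc *d = &q->ring[q->next];
    if (!(d->status & E1000_RXD_STAT_DD))
        return NULL;
    *len = d->length;
    return q->bufs + (size_t)q->next * BUF_SIZE;
}

/* Give the current descriptor back to the device once the application is done.
 * Returns the index the caller should write to the receive tail register. */
uint32_t rx_release(struct rx_queue *q)
{
    uint32_t idx = q->next;
    q->ring[idx].status = 0;
    q->next = (idx + 1) % RING_SIZE;
    return idx;
}
```

The application must finish with the frame before calling rx_release, because the device is free to overwrite the buffer afterwards.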

Hopefully, with this basic understanding of kernel-bypass networking in a simple userspace driver (source code here), interested readers can explore more complex frameworks like DPDK, which employ further techniques such as TLB handling and poll mode drivers to gain more performance.

The remaining challenge is how the application can keep up with millions of packets per second.
