
Hands on vhost-user: A warm welcome to DPDK

In this post we will set up an environment and run a DPDK-based
application in a virtual machine. We will go over all the steps required to
set up a simple virtual switch in the host system which connects to
the application in a VM. This includes a description of how to
create, install and run a VM and how to install the application in it. You will
learn how to create a simple setup where you send packets via the
application in the guest to a virtual switch in the host system and
back. Based on this setup you will learn how to tune settings to
achieve optimal performance.

For readers interested in playing with DPDK but not in configuring
and installing the required setup, we have Ansible playbooks in a
GitHub repository that can be used to automate everything. Let’s
start with the basic setup.

This downloads a preinstalled version of CentOS 7, ready to run in
an OpenStack environment. Since we’re not running OpenStack,
we have to clean the image. To do that, first we will make a copy of
the image so we can reuse it in the future:
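A minimal sketch of these two steps, assuming the CentOS 7 GenericCloud image and the VM name used later in this post (the exact image URL and file names may differ on your system):

user@host $ wget https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
user@host $ cp CentOS-7-x86_64-GenericCloud.qcow2 vhuser-test1.qcow2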

The libvirt commands to do this can be executed with an
unprivileged user (recommended) if we export the following
variable:
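The variable in question is most likely LIBVIRT_DEFAULT_URI, pointing the libvirt client tools at the system instance:

user@host $ export LIBVIRT_DEFAULT_URI="qemu:///system"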

This command mounts the filesystem and applies some basic
configuration automatically so that the image is ready to boot
afresh.
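The command referred to here is typically virt-sysprep from the libguestfs tools; a minimal sketch, assuming the image copy made above and a throwaway root password:

user@host $ sudo virt-sysprep --root-password password:changeme -a vhuser-test1.qcow2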

We need a network to connect our VM as well. Libvirt handles
networks in a similar way to how it manages VMs: you can define a
network using an XML file and start or stop it through the
command line.

For this example, we will use a network called ‘default’ whose
definition is shipped inside libvirt for convenience. The following
commands define the ‘default’ network, start it and check that it’s
running.
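A sketch of those commands, assuming the default network definition that libvirt ships under /usr/share/libvirt/networks/:

user@host $ virsh net-define /usr/share/libvirt/networks/default.xml
user@host $ virsh net-start default
user@host $ virsh net-list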

Finally, we can use virt-install to create the VM. This command-line
utility creates the needed definitions for a set of well-known
operating systems. This will give us the base definitions that we can
then customize:
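One possible invocation, with illustrative names, sizes and paths chosen to match the rest of this post (3 vCPUs, 3 GiB of RAM, the disk copied earlier and the ‘default’ network):

user@host $ virt-install --import --name vhuser-test1 \
      --vcpus=3 --ram=3072 \
      --disk path=/var/lib/libvirt/images/vhuser-test1.qcow2,bus=virtio \
      --network network=default,model=virtio \
      --os-variant=centos7.0 --graphics none --noautoconsole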

The options used for this command specify the number of vCPUs,
the amount of RAM of our VM as well as the disk path and the
network we want the VM to be connected to.

Apart from defining the VM according to the options that we
specified, the virt-install command should have also started the VM
for us, so we should be able to list it:
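For instance, virsh list should now show the VM in the running state:

user@host $ virsh list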

Voilà! Our VM is running. We need to make some changes to its
definition soon, so we will shut it down now:
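For example:

user@host $ virsh shutdown vhuser-test1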

DPDK helps with optimally allocating and managing memory
buffers. On Linux this requires hugepage support, which must
be enabled in the running kernel. Using pages of a size bigger than
the usual 4K improves performance by using fewer pages and
therefore fewer TLB (Translation Lookaside Buffer) lookups.
These lookups are required to translate virtual to physical
addresses. To allocate hugepages during boot we add the following
to the kernel parameters in the bootloader configuration.
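Based on the parameter descriptions below and the IOMMU options discussed afterwards, the added parameters look roughly like the following; default_hugepagesz is an assumption mirroring the guest configuration later in this post, and grubby is one way to append them:

user@host $ sudo grubby --args "default_hugepagesz=1G hugepagesz=1G hugepages=6 iommu=pt intel_iommu=on" --update-kernel /boot/<your kernel image file>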

hugepagesz=1G: for the hugepages created during startup, set the
size to 1G as well.

hugepages=6: create 6 hugepages (of size 1G) from the start.
These should be seen after booting in /proc/meminfo
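For example, after booting:

user@host $ grep Huge /proc/meminfo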

Note that in addition to the hugepage settings we also added two
IOMMU-related kernel parameters, iommu=pt and intel_iommu=on.
These initialize Intel VT-d and the IOMMU pass-through
mode that we will need for handling IO in Linux userspace. As
we changed kernel parameters, now is a good time to reboot the
host.

After it comes up we can check that our changes to the kernel
parameters were effective by running user@host $ cat
/proc/cmdline.


The virt-install command created and started a VM using libvirt. To
connect our DPDK-based vswitch, testpmd, to QEMU, we need to
add the definition of the vhost-user interfaces (backed by UNIX
sockets) to the devices section of the XML:

<interface type='vhostuser'>
  <mac address='56:48:4f:53:54:01'/>
  <source type='unix' path='/tmp/vhost-user1' mode='client'/>
  <model type='virtio'/>
  <driver name='vhost' rx_queue_size='256'/>
</interface>
<interface type='vhostuser'>
  <mac address='56:48:4f:53:54:02'/>
  <source type='unix' path='/tmp/vhost-user2' mode='client'/>
  <model type='virtio'/>
  <driver name='vhost' rx_queue_size='256'/>
</interface>

Another difference in the guest config compared to the one used for
vhost-net is the use of hugepages. For that we add the following to the
guest definition:

<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB' nodeset='0'/>
  </hugepages>
  <locked/>
</memoryBacking>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>

And so that the memory can be accessed we need an additional setting in
the guest configuration. This is an important setting; without it we
won’t see any packets being transmitted:

<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' cores='3' threads='1'/>
  <numa>
    <cell id='0' cpus='0-2' memory='3145728' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>

Now we need to start our guest. Because we configured it to
connect to the vhost-user UNIX sockets, we need to be sure they
are available when the guest is started. This is achieved by starting
testpmd, which will open the sockets for us:
user@host $ sudo testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2' -- \
    --portmask=f -i --rxq=1 --txq=1 \
    --nb-cores=4 --forward-mode=io

One last thing: because we connect to the vhost-user UNIX sockets,
we need to make QEMU run as root for this experiment. For this, set
user = root in /etc/libvirt/qemu.conf. This is required
for our special use case but not recommended in general. In fact,
readers should revert this setting after following this hands-on
article by commenting out the user = root setting.

Now we can start the VM with user@host $ virsh start
vhuser-test1.

Log in as root. The first thing we do in the guest is to bind the virtio
devices to the vfio-pci driver. To be able to do this we need to load
the required kernel modules first.
root@guest $ modprobe vfio enable_unsafe_noiommu_mode=1
root@guest $ cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
root@guest $ modprobe vfio-pci

Let’s find out the PCI addresses of our virtio-net devices first.

root@guest $ dpdk-devbind --status net



Network devices using kernel driver
===================================
0000:01:00.0 'Virtio network device 1041' if=enp1s0 drv=virtio-pci unused= *Active*
0000:0a:00.0 'Virtio network device 1041' if=enp1s1 drv=virtio-pci unused=
0000:0b:00.0 'Virtio network device 1041' if=enp1s2 drv=virtio-pci unused=

In the output of dpdk-devbind, look for the virtio devices in the
section above that are not marked active. We can use these for our
experiment. Note: addresses may be different on the reader's
system. When we first boot, the devices will be automatically bound
to the virtio-pci driver. Because we want to use them not with the
kernel driver but with the vfio-pci kernel module, we first unbind
them from virtio-pci and then bind them to the vfio-pci driver.

root@guest $ dpdk-devbind.py -b vfio-pci 0000:0a:00.0 0000:0b:00.0

Now the guest is prepared to run our DPDK-based application. To
make this binding permanent we could also use the driverctl
utility:

root@guest $ driverctl -v set-override 0000:00:10.0 vfio-pci

Do the same for the second virtio device, with address
0000:00:11.0. Then list all overrides to check it worked:

user@guest $ sudo driverctl list-overrides
0000:00:10.0 vfio-pci
0000:00:11.0 vfio-pci

Generating traffic

We installed and configured everything to finally run networking
traffic over our interfaces. Let’s start: in the host we first need to
start the testpmd instance which acts as a virtual switch. We will
just make it forward all packets it receives on interface net_vhost0
to net_vhost1. It needs to be started before we start the VM
because it creates the UNIX sockets that the vhost-user devices in
QEMU will connect to during initialization.

root@host $ testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2' -- \
    --portmask=f -i --rxq=1 --txq=1 \
    --nb-cores=4 --forward-mode=io

Now we can launch the VM we prepared previously:

user@host $ virsh start vhuser-test1

Notice how we can see output in the testpmd window showing
the vhost-user messages it receives.

Once the guest has booted we can start the testpmd instance. This
one will initialize the ports and the virtio-net driver that DPDK
implements. Among other things this is where the virtio feature
negotiation takes place and the set of common features is agreed
upon.

Before we start testpmd we make sure that the vfio kernel module
is loaded and bind the virtio-net devices to the vfio-pci driver:

root@guest $ dpdk-devbind.py -b vfio-pci 0000:00:10.0 0000:00:11.0

Start testpmd:

root@guest $ testpmd -l 0,1,2 --socket-mem 1024 -n 4 \
    --proc-type auto --file-prefix pg -- \
    --portmask=3 --forward-mode=macswap --port-topology=chained \
    --disable-rss -i --rxq=1 --txq=1 \
    --rxd=256 --txd=256 --nb-cores=2 --auto-start

Now we can check how many packets our testpmd instances are
processing. On the testpmd prompt we enter the command ‘show
port stats all’ and see the number of packets forwarded in each
direction (RX/TX).

An example:

testpmd> show port stats all

  ######################## NIC statistics for port 0 ########################
  RX-packets: 75525952   RX-missed: 0   RX-bytes: 4833660928
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 75525984   TX-errors: 0   TX-bytes: 4833662976

  Throughput (since last show)
  Rx-pps: 4684120
  Tx-pps: 4684120
  ############################################################################

  ######################## NIC statistics for port 1 ########################
  RX-packets: 75525984   RX-missed: 0   RX-bytes: 4833662976
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 75526016   TX-errors: 0   TX-bytes: 4833665024

  Throughput (since last show)
  Rx-pps: 4681229
  Tx-pps: 4681229
  ############################################################################

There are different forwarding modes in testpmd. In this example
we used --forward-mode=macswap, which swaps the destination
and source MAC addresses. Other forwarding modes like ‘io’ don’t
touch the packets at all and will give much higher, but also even more
unrealistic, numbers. Another forwarding mode is ‘noisy’. It can be
fine-tuned to simulate packet buffering and memory lookups.

Extra: Optimizing the configuration for maximum throughput and low latency


So far we have mostly stayed with the default settings. This helped to
keep the tutorial simple and easy to follow. But for those readers
interested in tuning all components for the best performance, we will
explain what is needed to achieve this.

Optimizing host settings

We start with optimizing our host system.

There are a few settings we need to change in the host system to
achieve optimal performance. Note that you don’t necessarily have
to do all these manual steps. With tuned you get a set of available
tuned profiles that you can choose from. Applying the
cpu-partitioning profile of tuned will take care of all the steps we will
execute manually here.

Before we start explaining the tunings in detail, this is how you use
the tuned cpu-partitioning profile so you don’t have to bother with all the
details:

user@host $ sudo dnf install tuned-profiles-cpu-partitioning

Then edit /etc/tuned/cpu-partitioning-variables.conf
and set isolated_cores and no_balance_cores both to 2-7.
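A sketch of the resulting file (variable names can differ slightly between tuned versions):

# /etc/tuned/cpu-partitioning-variables.conf
isolated_cores=2-7
no_balance_cores=2-7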

Now we can apply the tuned profile with the tuned-adm command:

user@host $ sudo tuned-adm profile cpu-partitioning

Rebooting is necessary to apply the changes, because they add
kernel parameters.

For those readers who want to know more details of what the cpu-
partitioning profile does, let’s do these steps manually. If you’re not
interested in this you can just skip to the next section.

Let’s assume we have eight cores in the system and we want to
isolate six of them on the same NUMA node. We use two cores to
run the guest virtual CPUs and the remaining four to run the data
path of the application.

The most basic change is done even outside of Linux, in the BIOS
settings of the system. There we have to disable turbo-boost and
hyper-threads. If the BIOS is for some reason not accessible,
disable hyper-threads with this command:


cat /sys/devices/system/cpu/cpu*[0-9]/topology/thread_siblings_list \
    | sort | uniq \
    | awk -F, '{system("echo 0 > /sys/devices/system/cpu/cpu"$2"/online")}'

After that we attach the following to the kernel command line:

intel_pstate=disable isolcpus=2-7 rcu_nocbs=2-7 nohz_full=2-7

What these parameters mean is:

intel_pstate=disable: avoid switching power states

Attach these parameters to the kernel command line by running:

user@host $ grubby --args "intel_pstate=disable mce=ignore_ce isolcpus=2-7 rcu_nocbs=2-7 nohz_full=2-7" --update-kernel /boot/<your kernel image file>

Non-maskable interrupts can reduce performance because they
steal valuable cycles in which the core could handle packets instead,
so we disable the NMI watchdog with:

user@host $ echo 0 > /proc/sys/kernel/nmi_watchdog

To a similar effect we exclude the cores we want to isolate from the
writeback cpumask:

user@host $ echo ffffff03 > /sys/bus/workqueue/devices/writeback/cpumask

Optimizing guest settings

Similar to what we did in the host, we also change the kernel
parameters for the guest. Again using grubby, we add the
following parameters to the configuration:

default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_iommu=on iommu=pt isolcpus=1,2 rcu_nocbs=1,2 nohz_full=1,2

The meaning of the first three parameters is known from the host
configuration. The others are:

intel_iommu=on: Make use of the IOMMU.

iommu=pt: Operate the IOMMU in pass-through mode. More about
what this means later.

isolcpus=1,2: Ask kernel to isolate these cores.

rcu_nocbs=1,2: Don’t do RCU callbacks on these CPUs; offload them to
other threads to avoid RCU callbacks running as softirqs.

nohz_full=1,2: Avoid scheduling clock ticks.

And because we want the same for the guest cores handling the
packets as for the cores in the host, we repeat the same
steps there: disable NMIs, and exclude the cores from block device
writeback flusher threads and from IRQs:

user@guest $ echo 0 > /proc/sys/kernel/nmi_watchdog
user@guest $ echo 1 > /sys/bus/workqueue/devices/writeback/cpumask
user@guest $ clear_mask=0x6   # Isolate CPU1 and CPU2 from IRQs
for i in /proc/irq/*/smp_affinity
do
  echo "obase=16;$(( 0x$(cat $i) & ~$clear_mask ))" | bc > $i
done

Pinning virtual CPUs to physical cores in the host will make sure
the vCPUs are not scheduled to different cores:
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='6'/>
  <vcpupin vcpu='2' cpuset='7'/>
  <emulatorpin cpuset='0'/>
</cputune>

Analyzing the performance

After going through all the performance tuning steps, let’s run our
testpmd instances again to see how the number of packets per
second changed.
testpmd> show port stats all

  ######################## NIC statistics for port 0 ########################
  RX-packets: 24828768   RX-missed: 0   RX-bytes: 1589041152
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 24828800   TX-errors: 0   TX-bytes: 1589043200

  Throughput (since last show)
  Rx-pps: 5207755
  Tx-pps: 5207755
  ############################################################################

  ######################## NIC statistics for port 1 ########################
  RX-packets: 24852096   RX-missed: 0   RX-bytes: 1590534144
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 24852128   TX-errors: 0   TX-bytes: 1590536192

  Throughput (since last show)
  Rx-pps: 5207927
  Tx-pps: 5207927
  ############################################################################

Compared to the numbers of the untuned setup, the packets-per-second
rate increased by roughly 12%. This is a very simple
setup and we have no other workloads running on the host and guest.
In a more complex scenario the performance improvement might
be even more significant.

After building a simple setup before, in this section we concentrated
on tuning the performance of the individual components. The key
here is to deconfigure and disable everything that distracts cores
(physical or virtual) from doing what they are supposed to do:
handling packets.

We did this manually in what seems like a complicated set of
commands so we could learn what is behind it all. But the truth is: all
of this can be achieved by installing the tuned and tuned-profiles-cpu-partitioning
packages plus a simple one-line configuration
file change. What's more, the single biggest impact is achieved by
pinning the vCPUs to host cores.

Ansible scripts available

Setting this environment up and running is the first and fundamental
step towards understanding, debugging and testing this architecture. To
make it as quick and easy as possible, Red Hat’s virtio-net
team has developed a set of Ansible scripts for everyone to use.

Just follow the instructions in the README and Ansible should take
care of the rest.

Conclusion

We have set up and configured a host system to run a DPDK-based
application and created a virtual machine that is connected to it via
vhost-user interfaces. Inside the VM we ran testpmd, also built on
DPDK, and used it to generate, send and receive packets in a loop
between the testpmd vswitch instance in the host and the instance
in the VM. The setup we looked at is a very simple one. A next step
for the interested reader could be deploying and using OVS-DPDK,
which is Open vSwitch built against DPDK. It’s a far more advanced
virtual switch used in production scenarios.

This is the last post on the "Virtio-networking and DPDK" topic,
which started with "How vhost-user came into being" and
was followed by "A journey to the vhost-users realm."

Prior posts / Resources

Introducing virtio-networking: Combining virtualization and networking for modern IT


Introduction to virtio-networking and vhost-net

Deep dive into Virtio-networking and vhost-net

Hands on vhost-net: Do. Or do not. There is no try

How vhost-user came into being: Virtio-networking and DPDK

A journey to the vhost-users realm
