2016 - Linux Networking Explained - 0
$ ip link
[...]
$ ip link show enp1s0f1
4: enp1s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state [...]
link/ether 90:e2:ba:61:e7:45 brd ff:ff:ff:ff:ff:ff
Addresses
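Do we need to consider a packet for local sockets?
[Diagram: the "Local?" check sends packets for this host to sockets via ip_local_deliver() and everything else to ip_forward(); packets from local sockets are routed and leave via ip_output()]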
net.ipv4.conf.all.forwarding = 1
H4x0r Tip: You can also modify this table after the generated
local routes have been inserted.
[Diagram: routing between sockets and devices using route tables; devices shown include tap0 and eth0]
Packet Headers:
Ethernet VLAN IP
$ cp /usr/share/doc/teamd-*/example_configs/activebackup_ethtool_1.conf .
$ teamd -g -f activebackup_ethtool_1.conf -d
[...]
$ teamdctl team0 state
[...]
Veth
Virtual Ethernet Cable
● Bidirectional FIFO
● Often used to cross namespaces
[Diagram: veth0 in Namespace 1 connected to veth1 in Namespace 2]
[Diagram: host namespace with br0, veth0, veth1 and team0 (over eth0 and eth1); Container A and Container B namespaces each with eth0]
MACVLAN
Simplified bridging for guests
● NOT 802.1Q VLANs
● Multiple MAC addresses on single interface
● KISS - no learning, no STP
● Modes:
– VEPA (default): Guest to guest done on ToR, L3 fallback possible
– Bridge: Guest to guest in software
– Private: Isolated, no guest to guest
– Passthrough: Attaches VF (SR-IOV)
[Diagram: slaves macvlan0 (MAC1) and macvlan1 (MAC2) on top of the master physical device]
$ ip link add link em1 name macvlan0 type macvlan mode bridge
$ ip -d link show macvlan0
23: macvlan0@em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN [...]
link/ether f2:d8:91:54:d0:69 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvlan mode bridge addrgenmode eui64
$ ip link set macvlan0 netns blue
Example
Team + MACVLAN
[Diagram: host namespace with team0 over eth0 and eth1; Container A and Container B namespaces each with a macvlan device named eth0]
TUN/TAP
A gate to user space
● Character Device in user space
● Network device in kernel space
● L2 (TAP) or L3 (TUN)
● Uses: encryption, VPN, tunneling, virtual machines, ...
[Diagram: file descriptors in user space backed by the kernel devices tun0 and tap0]
$ ip tuntap add tun0 mode tun
$ ip link set tun0 up
$ ip link show tun0
18: tun0: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel [...]
link/none
$ ip route add 10.1.1.0/24 dev tun0
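user.c:
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));                /* clear name and flags first */
fd = open("/dev/net/tun", O_RDWR);
ifr.ifr_flags = IFF_TAP;                     /* L2 device; use IFF_TUN for L3 */
strncpy(ifr.ifr_name, "tap0", IFNAMSIZ - 1); /* keep the NUL terminator */
ioctl(fd, TUNSETIFF, (void *) &ifr);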
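A slightly fuller sketch of the user.c fragment, with includes and error handling. The helper names `tuntap_request` and `tuntap_open` are made up for this example, and the choice of IFF_NO_PI (no packet-info header in front of each frame) is an assumption rather than something the slide specifies. Creating the device normally needs root or CAP_NET_ADMIN.

```c
#include <fcntl.h>
#include <linux/if.h>       /* struct ifreq, IFNAMSIZ */
#include <linux/if_tun.h>   /* IFF_TUN, IFF_TAP, IFF_NO_PI, TUNSETIFF */
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: prepare the request passed to TUNSETIFF. */
void tuntap_request(struct ifreq *ifr, const char *name, short flags)
{
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = flags;
    /* Copy at most IFNAMSIZ - 1 bytes so the name stays NUL-terminated. */
    strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/*
 * Hypothetical helper: attach to (or create) a TUN/TAP device.
 * Returns a file descriptor on success, -1 on failure (errno is set).
 */
int tuntap_open(const char *name, short flags)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;

    tuntap_request(&ifr, name, flags);
    if (ioctl(fd, TUNSETIFF, (void *) &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With `int fd = tuntap_open("tap0", IFF_TAP | IFF_NO_PI);` each successful `read(fd, buf, len)` returns one Ethernet frame sent to tap0, and each `write()` injects one frame as if it had arrived on the device.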
MACVTAP
Bridge + TAP = MACVTAP
● A TAP with an integrated bridge
● Connects VM/container via L2
● Same modes as MACVLAN
[Diagram: character devices /dev/tap2 and /dev/tap3 in user space backed by the kernel devices macvtap2 (MAC1) and macvtap3 (MAC2) on top of the physical device]
$ ip link add link em1 name macvtap0 type macvtap mode vepa
$ ip -d link show macvtap0
20: macvtap0@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP [...]
link/ether 3e:cb:79:61:8c:4b brd ff:ff:ff:ff:ff:ff
macvtap mode vepa addrgenmode eui64
$ ls -l /dev/tap20
crw-------. 1 root root 241, 1 Aug 8 21:08 /dev/tap20
IPVLAN
MACVLAN for Layer 3 (L3)
● Can hide many containers behind a single MAC address.
● Shared L2 among slaves
● Mode:
– L2: Like MACVLAN w/ single MAC
– L3: L2 deferred to master namespace, no multicast/broadcast
[Diagram: slaves ipvlan0 (IP1) and ipvlan1 (IP2) on top of the master physical device]
Underlay Overlay
$ ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev em1 dstport 4789
$ ip link set vxlan42 up
$ ip link show vxlan42
31: vxlan42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN [...]
link/ether e6:fc:c8:7e:07:83 brd ff:ff:ff:ff:ff:ff
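The mtu 1450 in the output follows from the encapsulation overhead. A minimal sketch of the arithmetic, assuming an IPv4 underlay and that em1 has an MTU of 1500 (the slide does not show it); the function names are made up for this example:

```c
/* Per-packet headers added in front of the inner payload (IPv4 underlay). */
enum {
    OUTER_IPV4_HDR = 20,  /* outer IPv4 header without options */
    OUTER_UDP_HDR  = 8,   /* outer UDP header (dstport 4789 above) */
    VXLAN_HDR      = 8,   /* VXLAN header carrying the 24-bit VNI */
    INNER_ETH_HDR  = 14   /* encapsulated Ethernet header */
};

/* Total bytes of encapsulation overhead per packet. */
int vxlan_overhead_ipv4(void)
{
    return OUTER_IPV4_HDR + OUTER_UDP_HDR + VXLAN_HDR + INNER_ETH_HDR;
}

/* Largest inner MTU that still fits in one underlay packet. */
int vxlan_inner_mtu(int underlay_mtu)
{
    return underlay_mtu - vxlan_overhead_ipv4();
}
```

`vxlan_inner_mtu(1500)` gives 1450. An IPv6 underlay has a 40-byte outer IP header, so the overhead there would be 20 bytes larger.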
IPSec
Authenticated & Encrypted
● AH: Authentication
● ESP: Authentication + encryption
Transport Mode: Ethernet | IP | ESP | TCP
Tunnel Mode: Ethernet | IP | ESP | IP | TCP
[Diagram: sockets on two hosts, connected at L3 between their netdevices]
[Diagram: a BPF program passes the verifier and JIT in the kernel (example JIT output: add eax,edx / shl eax,2) and attaches at TC ingress and TC egress, between the netdevices and the network stack, or at sockets]
● Maps
– Arrays (per CPU), hashtables (per CPU)
● Packet mangling
● Redirect to other device
● Tunnel metadata (encapsulation)
● Cgroups integration
● Event notifications via perf ring buffer
XDP – Express Data Path
[Diagram: LLVM/clang compiles source code to byte code in user space; in the kernel the byte code passes the verifier and JIT and runs in the netdevice driver, before the network stack]
Q&A
Learn more about networking with BPF:
Fast IPv6-only Networking for Containers Based on BPF and XDP
Wednesday August 24, 2016 4:35pm – 5:35pm, Queen's Quay
Contact:
● Twitter: @tgraf__
Image Sources:
● Cover (Toronto)
Rick Harris (https://www.flickr.com/photos/rickharris/)
● The Invisible Man
Dr. Azzacov (https://www.flickr.com/photos/drazzacov/)
● Chicken
JOHN LLOYD (https://www.flickr.com/photos/hugo90/)