Kubernetes Networking Made Easy With Open Vswitch and OpenFlow Péter Megyesi LeanNet Ltd.

Open vSwitch is a virtual switch that is programmable and used as the basis for networking in many cloud platforms. Kubernetes is a container orchestration system that automates deployment and management of containerized applications. When using Open vSwitch with Kubernetes, a CNI plugin allows Open vSwitch to provide networking between Kubernetes pods by creating a veth pair connecting each pod to the Open vSwitch bridge, allowing pods to communicate without NAT.

Kubernetes Networking Made Easy

with Open vSwitch and OpenFlow

Péter Megyesi
Co-founder @ LeanNet Ltd.
Who Am I?
PhD in Telecommunications @ Budapest University of Technology
§ Measurement and monitoring in Software Defined Networks
§ Participating in 5G-PPP EU projects
§ Graduated in the EIT Digital Doctoral School
Co-founder & CTO @ LeanNet Ltd.
§ Evangelist of open networking solutions
§ Currently focusing on SDN in cloud native environments

twitter.com/M3gy0

linkedin.com/in/M3gy0

Péter Megyesi: Kubernetes Networking Made Easy with Open vSwitch and OpenFlow www.leannet.eu
What is Open vSwitch?

The de facto production quality, multilayer virtual switch


§ Originally developed by Nicira (the inventors of SDN and OpenFlow)
§ Now it’s developed under the Linux Foundation
§ Designed to be programmable by OVSDB and OpenFlow
§ Compatible with standard management interfaces (NetFlow, sFlow, IPFIX, RSPAN, LACP)
§ The basis of VMware NSX-T, OpenStack and many other public clouds…
§ Able to run in user-space mode via DPDK, thus can provide speed up to ~80 Gbps

[Figure: a server running VM 1 and VM 2 on Open vSwitch, which attaches to the NIC; an SDN controller programs Open vSwitch through OVSDB and OpenFlow]

What is Kubernetes?

The de facto production quality, container-orchestration framework


§ Originally developed by Google (Borg project)
§ Now maintained by the Cloud Native Computing Foundation
§ Automating deployment, scaling, and management of containerized applications

Basic Kubernetes Terminology

Kubernetes Master
§ Controller of a Kubernetes cluster
Kubernetes Node (Worker / Minion)
§ Hosts (server or VM) that run Kubernetes applications
Container
§ Unit of packaging
Pod
§ Unit of deployment
Labels and Selectors
§ Key-Value pairs for identification
Replication Set
§ Ensures availability and scalability
Services
§ Collection of pods exposed as an endpoint
Node Port
§ Expose services internally
Load Balancer
§ The way for external access

The Docker Model

[Figure: one Docker host. eth0 and the docker0 bridge (172.17.0.1/24) sit in the root namespace; Container 1 (eth0, 172.17.0.2) and Container 2 (eth0, 172.17.0.3) each attach to docker0 through a veth pair (vethxx, vethyy)]

[Figure: Docker Hosts 1, 2 and 3. Every host reuses the same private container addresses (172.17.0.2, 172.17.0.3), so all traffic between containers on different hosts passes through NAT]

Docker Host Ports

[Figure: Host 1 (10.0.0.10) and Host 2 (10.0.0.20) run containers with overlapping addresses (172.17.0.2, 172.17.0.3); container ports are mapped to host ports (the slide shows 80, 9898, 5001, 17472 and 26432) and traffic between hosts is SNATed]

This is unfeasible in a very large cluster!

Networking in Kubernetes

Pod-to-Pod communication
§ Each Pod in a Kubernetes cluster is assigned an IP in a flat shared networking namespace
§ All PODs can communicate with all other PODs without NAT
§ The IP that a POD sees itself as is the same IP that others see it as
Pod-to-Service communication
§ Requests to the Service IPs are intercepted by a Kube-proxy process running on all hosts
§ Kube-proxy is then responsible for routing to the correct POD
External-to-Internal communication
§ All nodes can communicate with all PODs (and vice-versa) without NAT
§ Node ports can be assigned to a service on every Kubernetes host
§ Public IPs can be implemented by configuring external Load Balancers which target all nodes in the cluster
§ Once traffic arrives at a node, it is routed to the correct Service backends by Kube-proxy

The Container Network Interface

CNI in Kubernetes

Script / binary placed on every host
§ Kubelet calls it with the right environment variables and STDIN parameters
Example for configuration
- /etc/cni/net.d/01-dunlin.conf
Example environment variables
§ CNI_COMMAND: ADD or DEL
§ CNI_NETNS: /proc/<PID>/ns/net
§ CNI_IFNAME: eth0
§ CNI_PATH: /opt/bin/cni
§ CNI_CONTAINERID
§ K8S_POD_NAME (passed inside CNI_ARGS)
§ K8S_POD_NAMESPACE (passed inside CNI_ARGS)

[Figure: Kubernetes node with eth0 and the bridge br0 (10.244.1.1) in the root namespace]

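The call convention above can be sketched in a few lines. This is only an illustration, not the plugin from the talk: the environment variable names follow the CNI specification, while `read_invocation`, `parse_cni_args` and the sample values are hypothetical.

```python
import json


def parse_cni_args(raw):
    """Turn 'K1=V1;K2=V2' (the CNI_ARGS format) into a dict."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def read_invocation(environ, stdin_text):
    """Collect the parameters of one CNI call from env vars and STDIN."""
    args = parse_cni_args(environ.get("CNI_ARGS", ""))
    return {
        "command": environ["CNI_COMMAND"],        # ADD or DEL
        "container_id": environ.get("CNI_CONTAINERID", ""),
        "netns": environ.get("CNI_NETNS", ""),    # e.g. /proc/<PID>/ns/net
        "ifname": environ.get("CNI_IFNAME", "eth0"),
        "pod_name": args.get("K8S_POD_NAME", ""),
        "pod_namespace": args.get("K8S_POD_NAMESPACE", ""),
        "config": json.loads(stdin_text),         # network configuration JSON
    }


# Hypothetical sample call; a real plugin would pass os.environ and sys.stdin.read().
sample_env = {
    "CNI_COMMAND": "ADD",
    "CNI_NETNS": "/proc/1234/ns/net",
    "CNI_IFNAME": "eth0",
    "CNI_ARGS": "IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=web-1",
}
call = read_invocation(sample_env, '{"cniVersion": "0.3.1", "name": "example-net"}')
print(call["command"], call["pod_namespace"], call["pod_name"])
```

Keeping the parsing separate from the network setup makes it easy to test the plugin without touching any interfaces.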
CNI With Open vSwitch

Create virtual ethernet port pair
§ ip link add veth0 type veth peer name veth1
Add interface to OVS bridge
§ ovs-vsctl add-port br0 veth0
Add the other interface to namespace
§ ip link set veth1 netns $CNI_NETNS
Rename and set up interface (commands run inside the container namespace, e.g. with ip netns exec)
§ ip link set dev veth1 name eth0
§ ip addr add 10.244.1.2/24 dev eth0
§ ip link set dev eth0 mtu 1450
§ ip link set dev eth0 up
§ ip route add default via 10.244.1.1

[Figure: Kubernetes node with eth0 and br0 (10.244.1.1) in the root namespace; veth0 is attached to br0 and its peer veth1 is moved into the container namespace, where it is renamed eth0]

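The steps above can be collected into one dry-run helper that only builds the command strings and executes nothing. This is a sketch under assumptions: `attach_pod_commands` is a hypothetical name, the namespace is assumed to be a named one (as `ip netns exec` expects), and the /24 prefix is taken from the per-node subnets shown in the diagrams.

```python
def attach_pod_commands(bridge, netns, host_if, pod_ip, prefix_len, gateway,
                        mtu=1450, pod_if="eth0", tmp_if="veth1"):
    """Return the shell commands that wire one pod to an OVS bridge."""
    in_ns = f"ip netns exec {netns} "  # assumes `netns` is a named namespace
    return [
        f"ip link add {host_if} type veth peer name {tmp_if}",
        f"ovs-vsctl add-port {bridge} {host_if}",
        f"ip link set {tmp_if} netns {netns}",
        in_ns + f"ip link set dev {tmp_if} name {pod_if}",
        in_ns + f"ip addr add {pod_ip}/{prefix_len} dev {pod_if}",
        in_ns + f"ip link set dev {pod_if} mtu {mtu}",
        in_ns + f"ip link set dev {pod_if} up",
        in_ns + f"ip route add default via {gateway}",
    ]


commands = attach_pod_commands("br0", "pod1", "veth0",
                               "10.244.1.2", 24, "10.244.1.1")
for command in commands:
    print(command)
```

A real plugin would run these with root privileges and undo them on DEL; printing them first is a cheap way to review the plan.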
The Kubernetes Model – The IP per POD Model

[Figure: three nodes, each with its own pod subnet behind br0. Node 1 (Host 1: 10.0.0.10) owns 10.244.1.0/24 with PODs 10.244.1.2 and 10.244.1.3; Node 2 (Host 2: 10.0.0.20) owns 10.244.2.0/24 with POD 10.244.2.2; Node 3 (Host 3: 10.0.0.30) owns 10.244.3.0/24 with POD 10.244.3.2. How traffic gets between the nodes is left open ("?")]

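The addressing in this model can be reproduced with Python's standard `ipaddress` module. The sketch assumes a cluster-wide pod range of 10.244.0.0/16 split into one /24 per node, with the first host address used as the bridge gateway; `node_plan` and `owning_node` are hypothetical helper names.

```python
import ipaddress

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
node_subnets = list(cluster_cidr.subnets(new_prefix=24))  # one /24 per node


def node_plan(index):
    """Return (subnet, bridge gateway, first pod address) for node `index`."""
    subnet = node_subnets[index]
    hosts = iter(subnet.hosts())
    gateway = next(hosts)    # assumed to be assigned to br0
    first_pod = next(hosts)
    return subnet, gateway, first_pod


def owning_node(pod_ip):
    """Return the index of the node subnet that contains `pod_ip`."""
    address = ipaddress.ip_address(pod_ip)
    for index, subnet in enumerate(node_subnets):
        if address in subnet:
            return index
    raise ValueError(f"{pod_ip} is outside {cluster_cidr}")


subnet, gateway, first_pod = node_plan(1)
print(subnet, gateway, first_pod)
print(owning_node("10.244.3.2"))
```

Because every node owns a distinct subnet, the destination address alone tells a node which peer must receive the packet.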
Cluster Networking in Kubernetes

Life of a Packet: POD-to-POD, Same Node

Linux Bridge
§ MAC learning
Open vSwitch
§ MAC learning: action=normal
§ L2 rule: dl_dst=pod2,action=output:2
§ L3 rule: ip,nw_dst=pod2,action=output:2

[Figure: pod 1 and pod 2 on one Kubernetes node, each with an eth0 connected through a veth pair (vethxx, vethyy) to br0 in the root namespace. The packet carries L2 src: pod1, L2 dst: pod2, L3 src: pod1, L3 dst: pod2]

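To contrast the two approaches, the toy model below shows what MAC learning does and how the static L2 and L3 rules could be generated instead. Everything here is illustrative: the class, the function and the MAC, IP and port values are made up, and a real bridge also ages out entries.

```python
class LearningBridge:
    """Toy MAC-learning switch: remember where each source MAC was seen."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port           # learn the sender
        return self.mac_table.get(dst_mac, "flood")  # unknown destination


def static_rules(pods):
    """Build explicit L2 and L3 rules; `pods` holds dicts with mac, ip, ofport."""
    l2 = [f"dl_dst={p['mac']},actions=output:{p['ofport']}" for p in pods]
    l3 = [f"ip,nw_dst={p['ip']},actions=output:{p['ofport']}" for p in pods]
    return l2, l3


bridge = LearningBridge()
first = bridge.receive(1, "0a:00:00:00:00:01", "0a:00:00:00:00:02")
second = bridge.receive(2, "0a:00:00:00:00:02", "0a:00:00:00:00:01")
third = bridge.receive(1, "0a:00:00:00:00:01", "0a:00:00:00:00:02")

pods = [
    {"mac": "0a:00:00:00:00:01", "ip": "10.244.1.2", "ofport": 1},
    {"mac": "0a:00:00:00:00:02", "ip": "10.244.1.3", "ofport": 2},
]
l2_rules, l3_rules = static_rules(pods)
print(first, second, third)
print(l3_rules[1])
```

With static rules nothing is ever flooded, but something has to install a rule whenever a pod appears or disappears.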
Life of a Packet: POD-to-POD, Between Nodes

[Figure: Kubernetes Node 1 (pod 1, pod 2) and Kubernetes Node 4 (pod 3, pod 4), each with br0 and eth0 in the root namespace, connected by the network fabric. pod 1 sends a packet with L2 src: pod1, L2 dst: br0 (gw), L3 src: pod1, L3 dst: pod4. Where to go from here?]

Using an overlay network
• An overlay network obscures the underlying network architecture from the pod network through traffic encapsulation (for example VxLAN, GRE)
• Encapsulation reduces performance, though exactly how much depends on your solution
Without an overlay network
• Configure the underlying network fabric (switches, routers, etc.) to be aware of pod IP addresses
• This does not require the encapsulation provided by an overlay, and so can achieve better performance

Kubernetes Cluster Networking Plugins

Public clouds that support Kubernetes program this into the fabric
§ E.g. in Google Container Engine: “everything to 10.1.1.0/24, send to this VM”
In other cases we need to use an external plugin
§ Flannel
§ Calico
§ Canal
§ Romana
§ Weave
§ Cisco Contiv
§ Huawei CNI-Genie
§ Nuage Networks VCS (by Nokia)
§ Open Virtual Network
Life of a Packet: POD-to-POD, Between Nodes

Calico defines BGP agents and advertises the POD subnets to the fabric. It uses IP-IP encapsulation.

Flannel and Weave create VxLAN tunnels between nodes using a kernel implementation.

[Figure: the same two-node topology; the packet from pod 1 (L2 src: pod1, L2 dst: br0 (gw), L3 src: pod1, L3 dst: pod4) crosses the network fabric to Kubernetes Node 4]

Life of a Packet: POD-to-POD, Between Nodes

Set up VxLAN ports to every other node
§ ovs-vsctl add-port br0 vxlan4 -- set interface vxlan4 type=vxlan option:remote_ip={node4_ip}
Add rule for their subnet
§ ip,nw_dst={node4_subnet},action=output:vxlan4

Or set up a single VxLAN port
§ ovs-vsctl add-port br0 vxlan0 -- set interface vxlan0 type=vxlan option:key=flow option:remote_ip=flow
Add rule including the tunnel destination
§ ip,nw_dst={node4_subnet},actions=set_field:{node4_ip}->tun_dst,output:vxlan0

[Figure: packet headers along the path from pod 1 on Node 1 to pod 4 on Node 4. Leaving pod 1: L2 src: pod1, L2 dst: br0 (gw). Across the network fabric: L2 src: node1 br0, L2 dst: node4 br0. Delivered on Node 4: L2 src: br0 (node4), L2 dst: pod4. L3 src: pod1 and L3 dst: pod4 stay unchanged throughout]

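With a single flow-based VxLAN port, the per-node work reduces to one rule per remote subnet. The sketch below generates those rule strings from a node-to-subnet map; the function names are hypothetical and the addresses are the example hosts and subnets used in the diagrams.

```python
def tunnel_setup(bridge="br0", port="vxlan0"):
    """Command that creates one VxLAN port whose remote IP is set per flow."""
    return (f"ovs-vsctl add-port {bridge} {port} -- set interface {port} "
            "type=vxlan option:key=flow option:remote_ip=flow")


def tunnel_flows(remote_nodes, port="vxlan0"):
    """remote_nodes maps a node IP to the pod subnet owned by that node."""
    return [
        f"ip,nw_dst={subnet},actions=set_field:{node_ip}->tun_dst,output:{port}"
        for node_ip, subnet in sorted(remote_nodes.items())
    ]


remote_nodes = {
    "10.0.0.20": "10.244.2.0/24",
    "10.0.0.30": "10.244.3.0/24",
}
setup = tunnel_setup()
flows = tunnel_flows(remote_nodes)
print(setup)
for flow in flows:
    print(flow)
```

The bridge keeps one tunnel port no matter how many nodes join; only the rule list grows.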
For This, You Will Need a Control Plane

§ State information comes from the Kubernetes API
§ The control plane software installs rules via OpenFlow and OVSDB

[Figure: control plane software sitting between the Kubernetes API and the OVS bridges (br0) on Kubernetes Node 1 and Kubernetes Node 4, which are connected by the network fabric]

Pod to Service Communication
in Kubernetes

Services in Kubernetes

Definition:
§ Service is an abstraction that defines a logical set of Pods and a policy by which to access them
§ Defined by labels and selectors
§ Supports TCP and UDP
§ Interfaces with Kube-Proxy to manipulate IPtables
§ Service can be exposed internally by cluster/service IP

Remember: PODs are Mortal!!!

Services in Kubernetes

[Figure: a Service manifest with the following fields annotated]
§ This will be the DNS name of the service
§ Selector for PODs
§ This will be the IP of the service
§ This will be the port of the service
§ This is the POD port

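Since the manifest itself did not survive, here is a stand-in showing where each annotated field lives in a Service object, written as a Python dict that mirrors the manifest structure. The field names are standard Kubernetes Service fields; every value (name, label, addresses, ports) is a made-up example.

```python
import json

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "frontend"},          # DNS name of the service
    "spec": {
        "selector": {"app": "frontend"},       # selector for PODs
        "clusterIP": "10.96.0.15",             # IP of the service
        "ports": [{
            "protocol": "TCP",
            "port": 80,                        # port of the service
            "targetPort": 8080,                # POD port
        }],
    },
}

manifest_text = json.dumps(service, indent=2)
print(manifest_text)
```

JSON is accepted wherever a YAML manifest is, so the printed text could be saved to a file and applied as it stands.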
Life of a Packet: POD-to-Service

§ pod 1 sends a packet to the service: L2 src: pod1, L2 dst: br0, L3 src: pod1, L3 dst: svc1
§ IPtables in the root namespace rewrites the destination (DNAT, conntrack): L3 dst: svc1 becomes L3 dst: pod88
§ The packet travels via the tunnel network: L3 src: pod1, L3 dst: pod88
§ The reply comes back via the tunnel network: L3 src: pod88, L3 dst: pod1
§ IPtables reverses the translation (un-DNAT): L3 src: pod88 becomes L3 src: svc1
§ pod 1 receives the reply: L3 src: svc1, L3 dst: pod1

Remember:
§ Every node should reach every POD in the cluster
§ ip route add {global_pod_cidr} dev br0, e.g. 10.244.0.0/16

Example for IPtables Ruleset
[Figure: IPtables ruleset listing, not recoverable from the extraction]

Unfortunately, you can’t do the same with OVS

https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

[Figure: Kubernetes Node 1 with pod 1 and pod 2 attached to br0; IPtables sits in the root namespace between br0 and eth0]

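The DNAT and un-DNAT steps can be mimicked with a small in-memory model. This is a deliberately simplified sketch: the real connection tracker keys on the full 5-tuple and expires entries, while the hypothetical `ServiceNat` class below tracks only the client and backend addresses.

```python
import random


class ServiceNat:
    """Toy model of service DNAT with a connection-tracking table."""

    def __init__(self, services, rng=None):
        self.services = services   # service IP -> list of backend pod IPs
        self.conntrack = {}        # (client, backend) -> service IP
        self.rng = rng or random.Random()

    def outbound(self, src, dst):
        """DNAT: swap a service IP for one backend and remember the choice."""
        if dst not in self.services:
            return src, dst
        backend = self.rng.choice(self.services[dst])
        self.conntrack[(src, backend)] = dst
        return src, backend

    def inbound(self, src, dst):
        """un-DNAT: restore the service IP as the source of a tracked reply."""
        service_ip = self.conntrack.get((dst, src))
        return (service_ip, dst) if service_ip else (src, dst)


nat = ServiceNat({"svc1": ["pod88"]})
request = nat.outbound("pod1", "svc1")
reply = nat.inbound("pod88", "pod1")
print(request, reply)
```

The client only ever sees svc1 as its peer, which is why the translation has to be reversed on the node that performed it.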
Handling Service Communication with OVS: Option 1

Use ct* flow rules:
§ It uses the same conntrack kernel module as IPtables
§ You can specify NAT rules similar to those you would use in IPtables
§ For load balancing between POD backends, you can use group rules

table=0,ip,nw_src={pod_cidr},nw_dst={service_cidr},ct_state=-trk,action=ct(table=2)
table=0,ip,nw_src={pod_cidr},nw_dst={pod_cidr},ct_state=-trk,action=ct(table=4)

table=2,ip,nw_dst={svc1_ip},tp_dst={svc1_port},ct_state=+trk+new,action=group:1
table=2,ip,nw_dst={svc2_ip},tp_dst={svc2_port},ct_state=+trk+new,action=group:2
table=2,ct_state=+trk-new,action=resubmit(,4)

table=4 contains the original switching / routing rules

group_id=1,type=select,bucket=ct(commit,nat(dst={pod1_ip}:{pod_port}),table=4),
                       bucket=ct(commit,nat(dst={pod2_ip}:{pod_port}),table=4),
                       bucket=ct(commit,nat(dst={pod3_ip}:{pod_port}),table=4)

* ct rules are an Open vSwitch extension and not part of standard OpenFlow

Handling Service Communication with OVS: Option 2

Use stateless NAT rules:
§ If we see a Service IP we switch the destination IP to a POD backend
§ But at the same time we modify the source IP to a shifted domain (e.g. 10.244.x.y → 172.24.x.y)
§ This way we don’t use any kernel-specific rules, which allows integration into user space (e.g. DPDK)

table=0,ip,nw_src={pod_cidr},nw_dst={service_cidr},action=resubmit(,2)
table=0,ip,nw_src={pod_cidr},nw_dst={shifted_pod_cidr},action=resubmit(,3)
table=0,ip,nw_src={pod_cidr},nw_dst={pod_cidr},action=resubmit(,4)

table=2,ip,nw_dst={svc1_ip},tp_dst={svc1_port},actions=load:44056->NXM_OF_IP_SRC[16..31],group:1

table=3,ip,nw_src={pod1_ip},tp_src={pod_port},actions=mod_nw_src:{svc1_ip},mod_tp_src:{svc1_port},load:2804->NXM_OF_IP_DST[16..31],resubmit(,4)

table=4 contains the original switching / routing rules

group_id=1,type=select,bucket=mod_nw_dst:{pod1_ip},mod_tp_dst:{pod_port},resubmit(,4),
                       bucket=mod_nw_dst:{pod2_ip},mod_tp_dst:{pod_port},resubmit(,4),
                       bucket=mod_nw_dst:{pod3_ip},mod_tp_dst:{pod_port},resubmit(,4)

* NXM stands for Nicira Extensible Match; these fields are also not part of standard OpenFlow

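The constants 44056 and 2804 in the rules above are the upper 16 bits of the two address domains, which a short calculation confirms. The helper names are hypothetical; the arithmetic only assumes that bits 16..31 of the field hold the first two octets of the IPv4 address.

```python
import ipaddress


def upper_16_bits(first_octet, second_octet):
    """Value that bits 16..31 of an IPv4 address take for 'a.b.x.y'."""
    return (first_octet << 8) | second_octet


def shift_domain(address, first_octet, second_octet):
    """Replace the first two octets of `address`, keeping the last two."""
    value = int(ipaddress.ip_address(address))
    prefix = upper_16_bits(first_octet, second_octet) << 16
    return str(ipaddress.ip_address(prefix | (value & 0xFFFF)))


print(upper_16_bits(172, 24))                # 44056
print(upper_16_bits(10, 244))                # 2804
print(shift_domain("10.244.1.2", 172, 24))   # 172.24.1.2
print(shift_domain("172.24.1.2", 10, 244))   # 10.244.1.2
```

Because the last two octets are untouched, the mapping is reversible without storing any per-connection state.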
Finally, it’s demo time :)

Performance Comparison: Google Cloud

Performance Comparison: Amazon Cloud

Performance Comparison: Packet

Kubernetes Networking with Open vSwitch

Pure OVS solution


§ CNI binary attaching PODs to an OVS bridge
§ POD-to-POD and POD-to-Service communication with OpenFlow rules
§ Enhanced monitoring using Prometheus and OVS-exporter
§ Speed and latency are comparable with leading plugins (Flannel, Calico, Weave)
§ DPDK integration possibility
§ 100% open source: https://github.com/dunlinplugin

dunlin.io
Backup Slides

IPtables Latency by Google

Pod to External Communication
in Kubernetes

Life of a packet: pod-to-external

§ pod 1 sends a packet with src: pod1, dst: 8.8.8.8
POD IP address is private
§ Needs NAT to communicate with external networks: IPtables MASQUERADE rewrites src: pod1 to src: NodeIP
Node IPs are usually also private
§ Needs second NAT by the fabric: src: NodeIP becomes src: PublicIP
§ The packet leaves the fabric with src: PublicIP, dst: 8.8.8.8

[Figure: Kubernetes node with pod 1 and pod 2 attached to cbr0; IPtables in the root namespace sits between cbr0 and eth0, which leads to the network fabric]

The Hairpin Problem

§ pod 1 sends a packet to a service it is itself a backend of: src: pod1, dst: svc1
§ IPtables (DNAT, conntrack) may select pod 1 as the backend: src: pod1, dst: pod1
§ The reply for this packet would not leave this POD at all!
§ Only SNAT in IPtables can solve this problem: src: pod1 is rewritten to src: cbr0 while dst: svc1 is rewritten to dst: pod1

[Figure: Kubernetes Node 1 with pod 1 and pod 2 attached to cbr0; IPtables sits in the root namespace between cbr0 and eth0]

External to Internal Communication
in Kubernetes

External-to-Internal Traffic

Services can be exposed to the outside by
§ Node port
§ Load Balancer
Example: frontend
§ pod 11
§ pod 31
§ pod 32

[Figure: Kubernetes Nodes 1, 2 and 3 behind the network fabric, each with eth0, IPtables and cbr0 in the root namespace. Of pods 11, 12, 21, 22, 31 and 32, only the frontend pods 11, 31 and 32 are kept in the following diagrams]

External-to-Internal Traffic

Node port
§ One port on every node gets rerouted to a certain service
§ Typically port number > 30000
§ ∀NodeIP:30001 → 10.9.8.15:8080
§ Node IPs are usually not public!

§ A packet arrives with src: xxx:yyy, dst: Node2:30001
§ IPtables translates it to one of the pod IPs randomly (1/3 each for pod 11, pod 31 and pod 32)
§ If pod 11 is chosen, the packet is forwarded with src: Node2:xxxx, dst: pod11:8080
§ The same applies to a packet sent to Node3:30001, which may be forwarded with src: Node3:xxxx, dst: pod11:8080

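One way to get the uniform 1/3 choice out of sequentially evaluated rules is to give each rule a decreasing chance of being skipped, which is how random matching in IPtables is typically chained. The calculation below is a sketch with hypothetical function names, not output taken from a real ruleset.

```python
from fractions import Fraction


def chain_probabilities(backend_count):
    """Match probability of each rule so that all backends are equally likely."""
    return [Fraction(1, backend_count - i) for i in range(backend_count)]


def overall_share(rule_probabilities):
    """Share of traffic each rule receives when rules are tried in order."""
    remaining = Fraction(1)
    shares = []
    for probability in rule_probabilities:
        shares.append(remaining * probability)
        remaining *= 1 - probability
    return shares


rules = chain_probabilities(3)
shares = overall_share(rules)
print([str(p) for p in rules])    # ['1/3', '1/2', '1']
print([str(s) for s in shares])   # ['1/3', '1/3', '1/3']
```

The last rule always matches, so a packet that reaches it can never fall through unbalanced.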
External-to-Internal Traffic

Load Balancer
§ One public IP that maps to a certain service
§ Fabric has to manage it!
§ GCE
§ AWS
§ OpenStack

§ Packet arrives at the Load Balancer’s public IP (src: xxx:yyy, dst: 95.67.12.3)
§ Packet has to be forwarded to one of the nodes (dst: Node Port)
§ If the LB is smart it will only forward to nodes with a pod
§ If IPtables is smart it won’t reroute to another node (dst: Node1)
§ Even then load balance might not be perfect! With 50% of the traffic sent to each of the two nodes that host pods, pod 11 receives 50% while pod 31 and pod 32 receive 25% each

[Figure: a Load Balancer in the network fabric in front of Kubernetes Nodes 1, 2 and 3, each with eth0, IPtables and cbr0 in the root namespace, hosting pods 11, 31 and 32]
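The 50% / 25% / 25% split follows directly from dividing traffic per node first and per pod second. The sketch below reproduces it, assuming that pod 31 and pod 32 share one node and that nodes without pods receive nothing; `pod_shares` and the node names are hypothetical.

```python
from fractions import Fraction


def pod_shares(pods_per_node):
    """Even split across nodes that host pods, then across each node's pods."""
    active = {node: pods for node, pods in pods_per_node.items() if pods}
    per_node = Fraction(1, len(active))
    return {
        pod: per_node / len(pods)
        for pods in active.values()
        for pod in pods
    }


shares = pod_shares({
    "node1": ["pod11"],
    "node2": [],
    "node3": ["pod31", "pod32"],
})
for pod, share in shares.items():
    print(pod, f"{float(share):.0%}")
```

The imbalance disappears only if the load balancer weights nodes by the number of local pods.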
