Kubernetes Networking Made Easy With Open vSwitch and OpenFlow
Péter Megyesi, LeanNet Ltd.
Péter Megyesi
Co-founder @ LeanNet Ltd.
Who Am I?
PhD in Telecommunications @ Budapest University of Technology
§ Measurement and monitoring in Software Defined Networks
§ Participating in 5G-PPP EU projects
§ Graduated from the EIT Digital Doctoral School
Co-founder & CTO @ LeanNet Ltd.
§ Evangelist of open networking solutions
§ Currently focusing on SDN in cloud native environments
twitter.com/M3gy0
linkedin.com/in/M3gy0
Péter Megyesi: Kubernetes Networking Made Easy with Open vSwitch and OpenFlow www.leannet.eu
What is Open vSwitch?
[Figure: an SDN controller manages Open vSwitch on a server through OVSDB and OpenFlow; VM 1 and VM 2 are attached to the switch]
What is Kubernetes?
Basic Kubernetes Terminology
The Docker Model
[Figure: on a Docker host, the docker0 bridge (172.17.0.1/24) in the root namespace connects Container 1 (eth0, 172.17.0.2) through vethxx and Container 2 (eth0, 172.17.0.3) through vethyy; the host itself uplinks through its own eth0]
The Docker Model
[Figure: Docker Host 1 and Docker Host 2 each place their containers on a private bridge network, so containers on different hosts can get the same address (e.g. 172.17.0.2); all traffic between hosts passes through NAT]
Docker Host Ports
[Figure: Host 1 (10.0.0.10) and Host 2 (10.0.0.20) expose container ports (80, 9898, 5001 on containers such as 172.17.0.2 and 172.17.0.3) through mapped host ports (e.g. 17472, 26432), with SNAT applied]
This is infeasible in a very large cluster!
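The scaling problem can be made concrete with a small model of the port-mapping table each Docker host has to keep. This is a minimal sketch, assuming a hypothetical `HostPortTable` class and a deliberately tiny host-port range (10000-10002); it is not how Docker allocates ports, only an illustration that every published container port consumes a distinct host port and that peers must address `host_ip:host_port` instead of the container IP.

```python
from dataclasses import dataclass, field


@dataclass
class HostPortTable:
    """Hypothetical per-host table of published container ports (illustration only)."""
    host_ip: str
    first_port: int = 10000
    last_port: int = 10002  # tiny range on purpose, to make exhaustion visible
    mappings: dict = field(default_factory=dict)  # host_port -> (container_ip, container_port)

    def publish(self, container_ip: str, container_port: int) -> int:
        """Reserve the lowest free host port and map it to container_ip:container_port."""
        for port in range(self.first_port, self.last_port + 1):
            if port not in self.mappings:
                self.mappings[port] = (container_ip, container_port)
                return port
        raise RuntimeError(f"no free host port left on {self.host_ip}")

    def endpoint(self, host_port: int) -> str:
        """The address a peer on another host has to use: host IP and host port."""
        return f"{self.host_ip}:{host_port}"
```

Two containers that both listen on port 80 end up behind different host ports, so every client has to learn the mapping; across many hosts and services, this bookkeeping is what does not scale.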
Networking in Kubernetes
Pod-to-Pod communication
§ Each Pod in a Kubernetes cluster is assigned an IP in a flat shared networking namespace
§ All PODs can communicate with all other PODs without NAT
§ The IP that a POD sees itself as is the same IP that others see it as
Pod-to-Service communication
§ Requests to the Service IPs are intercepted by a Kube-proxy process running on all hosts
§ Kube-proxy is then responsible for routing to the correct POD
External-to-Internal communication
§ All nodes can communicate with all PODs (and vice-versa) without NAT
§ Node ports can be assigned to a Service on every Kubernetes host
§ Public IPs can be implemented by configuring external Load Balancers which target all nodes in the cluster
§ Once traffic arrives at a node, it is routed to the correct Service backends by Kube-proxy
The Container Network Interface
CNI in Kubernetes
[Figure: a veth pair (veth0, veth1) connects the root namespace to the container network namespace (Container NS)]
CNI With Open vSwitch
[Figure, shown in several build steps: the same veth pair, with veth1 in the container network namespace and the root-namespace side attached to Open vSwitch]
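The plumbing in these figures can be sketched as the host commands a CNI plugin for Open vSwitch might run for one POD. This is a minimal sketch under stated assumptions: the helper name `ovs_pod_plumbing`, the bridge name `br0`, the namespace name `pod1-ns`, and the gateway address `10.244.1.1` on the bridge are all hypothetical; the namespace is assumed to be addressable by name with `ip netns exec` (a real CNI plugin receives a namespace path in `CNI_NETNS`); and the commands are only built, not executed.

```python
import shlex


def ovs_pod_plumbing(netns: str, host_if: str, pod_ip_cidr: str, gateway: str,
                     bridge: str = "br0") -> list[list[str]]:
    """Hypothetical helper: host commands that wire one POD into an OVS bridge.

    The commands are only built, never executed. Interface names may be at most
    15 characters long.
    """
    tmp_if = host_if + "p"  # temporary name for the POD end until it is inside the netns
    in_ns = ["ip", "netns", "exec", netns]
    return [
        # 1. Create the veth pair in the root namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", tmp_if],
        # 2. Move the POD end into the container's network namespace.
        ["ip", "link", "set", tmp_if, "netns", netns],
        # 3. Attach the root-namespace end to the Open vSwitch bridge and bring it up.
        ["ovs-vsctl", "add-port", bridge, host_if],
        ["ip", "link", "set", host_if, "up"],
        # 4. Inside the namespace: rename to eth0, add the POD IP, bring it up, route via the gateway.
        in_ns + ["ip", "link", "set", tmp_if, "name", "eth0"],
        in_ns + ["ip", "addr", "add", pod_ip_cidr, "dev", "eth0"],
        in_ns + ["ip", "link", "set", "eth0", "up"],
        in_ns + ["ip", "route", "add", "default", "via", gateway],
    ]


if __name__ == "__main__":
    for cmd in ovs_pod_plumbing("pod1-ns", "vethxx", "10.244.1.2/24", "10.244.1.1"):
        print(shlex.join(cmd))
```

The veth end is renamed only after it has moved into the namespace, because the host's own uplink is already called eth0 in the root namespace.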
The Kubernetes Model – The IP per POD Model
[Figure: Kubernetes Node 1 owns the POD subnet 10.244.1.0/24 (e.g. a POD at 10.244.1.2); Kubernetes Node 2 (Host 2: 10.0.0.20) owns 10.244.2.0/24 (e.g. a POD at 10.244.2.2); a "?" marks the open question of how POD traffic gets from one node to the other]
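The per-node address plan can be sketched with Python's `ipaddress` module. This is a minimal sketch, assuming the 10.244.0.0/16 cluster POD CIDR used elsewhere in these slides and a /24 per node; the node names are hypothetical, with a first node called `master` taking 10.244.0.0/24 so that `node1` and `node2` line up with the figure. The `node_for_pod` lookup is the "?" in the figure: given a destination POD IP, which node has to receive the packet.

```python
import ipaddress


def node_pod_subnets(cluster_cidr: str, node_names: list[str], prefix: int = 24) -> dict:
    """Give each node, in order, its own POD subnet carved out of the cluster POD CIDR.

    zip() stops early if there are more nodes than subnets; a real allocator would report that.
    """
    cluster = ipaddress.ip_network(cluster_cidr)
    return dict(zip(node_names, cluster.subnets(new_prefix=prefix)))


def node_for_pod(pod_ip: str, allocation: dict) -> str:
    """Find the node that owns a POD IP; inter-node routing has to answer exactly this."""
    ip = ipaddress.ip_address(pod_ip)
    for node, subnet in allocation.items():
        if ip in subnet:
            return node
    raise LookupError(f"{pod_ip} is not in any node's POD subnet")
```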
Life of a Packet: POD-to-POD, Same Node
Linux Bridge
§ MAC learning
Open vSwitch
§ MAC learning: actions=NORMAL
§ L2 rule: dl_dst={pod2_mac},actions=output:2
§ L3 rule: ip,nw_dst={pod2_ip},actions=output:2
[Figure: pod 1 and pod 2 on the same Kubernetes Node, attached to br0 in the root namespace through vethxx and vethyy; the packet carries L2 src pod1, L2 dst pod2, L3 src pod1, L3 dst pod2]
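The difference between MAC learning and explicit rules can be modeled as a priority-ordered flow table. This is a toy sketch, not the Open vSwitch classifier: the `Flow` class, the `lookup` function, pod2's MAC and IP, and port 2 as pod2's veth port are all assumptions, and a table miss is simply treated as a drop. The priority-0 `NORMAL` flow plays the Linux-bridge role, while the more specific L2 and L3 rules win whenever they match.

```python
from dataclasses import dataclass

# Hypothetical addresses for pod2; port 2 is assumed to be pod2's veth port on br0.
POD2_MAC = "0a:58:0a:f4:01:03"
POD2_IP = "10.244.1.3"


@dataclass
class Flow:
    priority: int
    match: dict    # field -> required value; an empty dict matches every packet
    actions: str   # kept as ovs-ofctl-style text


def lookup(table: list, packet: dict) -> str:
    """Return the actions of the highest-priority flow whose match fields all equal the packet's."""
    hits = [f for f in table if all(packet.get(k) == v for k, v in f.match.items())]
    if not hits:
        return "drop"  # simplification: treat a table miss as a drop
    return max(hits, key=lambda f: f.priority).actions


TABLE = [
    Flow(0, {}, "NORMAL"),                                          # MAC learning, like a Linux bridge
    Flow(100, {"dl_dst": POD2_MAC}, "output:2"),                    # L2 rule
    Flow(200, {"dl_type": 0x0800, "nw_dst": POD2_IP}, "output:2"),  # L3 rule ("ip" means dl_type 0x0800)
]
```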
Life of a Packet: POD-to-POD, Between Nodes
[Figure: the packet leaves pod1 with L2 src pod1 and L2 dst br0 (gw) and has to cross the network fabric to the other node]
Public clouds that support Kubernetes program this into the fabric
§ E.g. in Google Container Engine: “everything to 10.1.1.0/24, send to this VM”
In other cases, we need to use an external plugin:
§ Flannel
§ Calico
§ Canal
§ Romana
§ Weave
§ Cisco Contiv
§ Huawei CNI-Genie
§ Nuage Networks VCS (by Nokia)
§ Open Virtual Network
Life of a Packet: POD-to-POD, Between Nodes
[Figure, Calico: the packet (L2 src pod1, L2 dst br0 (gw); L3 src pod1, L3 dst pod4) crosses the network fabric]
Calico runs BGP agents that advertise the POD subnets to the network fabric, and it can use IP-in-IP encapsulation
[Figure, Flannel and Weave: the same packet crosses the network fabric inside a tunnel]
Flannel and Weave create VXLAN tunnels between nodes using the kernel implementation
[Figure, Open vSwitch: control plane software takes state information from the Kubernetes API and installs rules into the switches via OpenFlow and OVSDB]
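What the control plane software installs can be sketched as a generator of tunnel flows, one per remote node's POD subnet. This is a minimal sketch, assuming a single VXLAN port created with `options:remote_ip=flow` (so flows choose the remote endpoint by setting `tun_dst`), a hypothetical OpenFlow port number for that port, and the node addresses used in the earlier figures; the function name is hypothetical and MAC rewriting on the receiving node is left out.

```python
def tunnel_flows(local_node: str, nodes: dict, tunnel_port: int, table: int = 0) -> list:
    """Hypothetical control-plane step: one ovs-ofctl flow per remote node's POD subnet.

    nodes maps node name -> (node_ip, pod_cidr). Traffic for a remote POD subnet
    gets that node's IP as the tunnel destination and is sent out the tunnel port.
    """
    flows = []
    for name, (node_ip, pod_cidr) in sorted(nodes.items()):
        if name == local_node:
            continue  # local PODs are handled by the same-node rules
        flows.append(
            f"table={table},priority=100,ip,nw_dst={pod_cidr},"
            f"actions=set_field:{node_ip}->tun_dst,output:{tunnel_port}"
        )
    return flows
```

Each string could be passed to `ovs-ofctl add-flow br0 "<flow>"`; because the node map lives in the control plane, the flows can be recomputed whenever the Kubernetes API reports a node joining or leaving.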
Services in Kubernetes
Definition:
§ A Service is an abstraction that defines a logical set of Pods and a policy by which to access them
§ Defined by labels and selectors
§ Supports TCP and UDP
§ Interfaces with Kube-proxy to manipulate IPtables
§ A Service can be exposed internally by a cluster/service IP
Remember: PODs are mortal!
[Figure: pod 1 and pod 2 on Kubernetes Node 1, attached to br0 in the root namespace through vethxx and vethyy]
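How labels and selectors define the logical set of Pods can be sketched in a few lines. This is a minimal sketch, assuming equality-based matching of a Service selector against Pod labels; the Pod names and labels are hypothetical.

```python
# Hypothetical Pods and their labels, for illustration only.
PODS = {
    "pod1": {"app": "web", "tier": "frontend"},
    "pod2": {"app": "web"},
    "pod3": {"app": "db"},
}


def select_pods(selector: dict, pods: dict) -> list:
    """Names of the Pods whose labels contain every key/value pair of the selector."""
    return sorted(name for name, labels in pods.items()
                  if all(labels.get(k) == v for k, v in selector.items()))
```

Because the set is recomputed from labels, a Pod that dies drops out and a replacement with the same labels is picked up automatically, which is why a Service never has to hard-code Pod IPs.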
Services in Kubernetes
[Figure: a Service definition for pod 1 and pod 2 on Kubernetes Node 1, with two callouts: "This will be the port of the service" and "This is the POD port"]
Life of a Packet: POD-to-Service
[Figure: pod 1 sends a packet with L2 src pod1, L2 dst br0, L3 src pod1, L3 dst svc1 through vethxx to br0 on Kubernetes Node 1]
Life of a Packet: POD-to-Service
[Figure: the same packet (L2 src pod1, L2 dst br0, L3 src pod1, L3 dst svc1) is passed from br0 to IPtables in the root namespace of Kubernetes Node 1]
Life of a Packet: POD-to-Service
L3 src: pod1, L3 dst: svc1, rewritten by IPtables (DNAT, conntrack) to L3 dst: pod88
[Figure: IPtables in the root namespace of Kubernetes Node 1, between br0 and eth0]
Remember:
§ Every node should reach every POD in the cluster
§ ip route add {global_pod_cidr} dev br0, e.g. {global_pod_cidr} = 10.244.0.0/16
Example for IPtables Ruleset
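kube-proxy's iptables mode picks a Service backend at random with a chain of `statistic` matches. The sketch below assumes that pattern, with placeholder chain names (real kube-proxy chains carry hashed suffixes) and with the actual `DNAT --to-destination` rules living in the per-endpoint chains, which are not shown; the function names are hypothetical. Rule i fires with probability 1/(n - i) among the packets that reach it, which gives every endpoint the same 1/n share overall; `overall_share` checks that with exact fractions.

```python
from fractions import Fraction


def service_chain_rules(svc_chain: str, endpoint_chains: list) -> list:
    """Build iptables-style rules that send each packet to one endpoint chain uniformly at random."""
    n = len(endpoint_chains)
    rules = []
    for i, ep_chain in enumerate(endpoint_chains):
        remaining = n - i
        if remaining > 1:
            rules.append(f"-A {svc_chain} -m statistic --mode random "
                         f"--probability {1 / remaining:.5f} -j {ep_chain}")
        else:
            rules.append(f"-A {svc_chain} -j {ep_chain}")  # last rule catches everything left
    return rules


def overall_share(n: int) -> list:
    """Exact fraction of all packets each rule receives under the 1/(n - i) scheme."""
    shares, reach = [], Fraction(1)
    for i in range(n):
        p = Fraction(1, n - i)  # for the last rule this is 1, i.e. unconditional
        shares.append(reach * p)
        reach *= 1 - p
    return shares
```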
Life of a Packet: POD-to-Service
L3 src: pod1, L3 dst: pod88, sent via the tunnel network
[Figure: the DNATed packet leaves Kubernetes Node 1 toward the node that hosts pod88]
Life of a Packet: POD-to-Service
L3 src: pod88, L3 dst: pod1, the reply arriving via the tunnel network
[Figure: the reply enters Kubernetes Node 1 and reaches IPtables in the root namespace]
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
Life of a Packet: POD-to-Service
L3 src: pod88, rewritten by IPtables (un-DNAT) to L3 src: svc1; L3 dst: pod1
[Figure: IPtables in the root namespace of Kubernetes Node 1 restores the Service address before the reply is delivered to pod 1]
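The DNAT and un-DNAT steps rely on connection tracking remembering the original destination. This is a toy sketch, not the kernel's conntrack: the `ConntrackNat` class, the `Tuple5` type, and the addresses are hypothetical (svc1 as 10.96.0.10:80, pod1 as 10.244.1.2, pod88 as 10.244.2.88:8080). The forward direction records the reply tuple it expects; a reply matching that tuple gets the Service address back as its source.

```python
from typing import NamedTuple


class Tuple5(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


class ConntrackNat:
    """Toy connection tracker: remembers each DNAT so that replies can be un-DNATed."""

    def __init__(self):
        self.reply_map = {}  # expected reply tuple -> original (pre-DNAT) destination

    def dnat(self, pkt: Tuple5, backend: str, backend_port: int) -> Tuple5:
        """Rewrite a new connection's destination and record the reply tuple to expect."""
        reply = Tuple5(pkt.proto, backend, backend_port, pkt.src, pkt.sport)
        self.reply_map[reply] = (pkt.dst, pkt.dport)
        return pkt._replace(dst=backend, dport=backend_port)

    def un_dnat(self, pkt: Tuple5) -> Tuple5:
        """Restore the Service address as the source of a tracked reply; pass others through."""
        orig = self.reply_map.get(pkt)
        if orig is None:
            return pkt
        return pkt._replace(src=orig[0], sport=orig[1])
```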
Life of a Packet: POD-to-Service
L3 src: svc1, L3 dst: pod1
[Figure: the reply reaches pod 1 through br0 and vethxx, and pod 1 sees it as coming from svc1]
Unfortunately, you can’t do the same with OVS
Handling Service Communication with OVS: Option 1
table=1,ip,actions=ct(table=2,nat)
table=2,tcp,nw_dst={svc1_ip},tp_dst={svc1_port},ct_state=+trk+new,actions=group:1
table=2,tcp,nw_dst={svc2_ip},tp_dst={svc2_port},ct_state=+trk+new,actions=group:2
table=2,ct_state=+trk-new,actions=resubmit(,4)
group_id=1,type=select,bucket=ct(commit,nat(dst={pod1_ip}:{pod_port}),table=4),
bucket=ct(commit,nat(dst={pod2_ip}:{pod_port}),table=4),
bucket=ct(commit,nat(dst={pod3_ip}:{pod_port}),table=4)
Handling Service Communication with OVS: Option 2
table=2,tcp,nw_dst={svc1_ip},tp_dst={svc1_port},actions=load:44056->NXM_OF_IP_SRC[16..31],group:1
table=3,tcp,nw_src={pod1_ip},tp_src={pod_port},actions=mod_nw_src:{svc1_ip},mod_tp_src:{svc1_port},load:2804->NXM_OF_IP_DST[16..31],resubmit(,4)
group_id=1,type=select,bucket=mod_nw_dst:{pod1_ip},mod_tp_dst:{pod_port},resubmit(,4),
bucket=mod_nw_dst:{pod2_ip},mod_tp_dst:{pod_port},resubmit(,4),
bucket=mod_nw_dst:{pod3_ip},mod_tp_dst:{pod_port},resubmit(,4)
* NXM stands for Nicira eXtended Match; these rules are also not OpenFlow compatible
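The two constants in Option 2 are easier to read as addresses. In OVS bit numbering, bit 0 is the least significant, so `[16..31]` covers the first two octets of an IPv4 address; 44056 is 0xAC18 (172.24) and 2804 is 0x0AF4 (10.244). One reading, which is our interpretation rather than something stated on the slide, is that the forward flow moves the client address from 10.244.x.y to 172.24.x.y so that table 3 can recognize and fix up the reply without conntrack. The helper name `load_upper16` is hypothetical.

```python
import ipaddress


def load_upper16(ip: str, value: int) -> str:
    """Mimic load:<value>->NXM_OF_IP_SRC[16..31]: overwrite the upper 16 bits of an IPv4 address."""
    addr = int(ipaddress.IPv4Address(ip))
    addr = (addr & 0x0000FFFF) | ((value & 0xFFFF) << 16)  # keep the low 16 bits, replace the rest
    return str(ipaddress.IPv4Address(addr))
```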
Finally, it’s demo time :)
Performance Comparison: Google Cloud
Performance Comparison: Amazon Cloud
Performance Comparison: Packet
Kubernetes Networking with Open vSwitch
dunlin.io
Backup Slides
IPtables Latency by Google
Pod to External Communication in Kubernetes
Life of a packet: pod-to-external
src: pod1, dst: 8.8.8.8
POD IP address is private
§ Needs NAT to communicate with external hosts
[Figure: pod 1 sends the packet through cbr0 to IPtables in the root namespace of the Kubernetes Node]
Life of a packet: pod-to-external
src: pod1, rewritten by IPtables (MASQUERADE) to src: NodeIP; dst: 8.8.8.8
[Figure: the masqueraded packet leaves the Kubernetes Node through eth0]
Life of a packet: pod-to-external
src: NodeIP, rewritten in the network fabric to src: PublicIP; dst: 8.8.8.8
[Figure: the packet leaves the Kubernetes Node through eth0 and is translated again by the network fabric]
Life of a packet: pod-to-external
src: PublicIP, dst: 8.8.8.8
[Figure: the packet leaves the network fabric toward 8.8.8.8 with the public source address]
The Hairpin Problem
src: pod1, dst: svc1, rewritten by IPtables (DNAT, conntrack) to dst: pod1
[Figure: pod 1 on Kubernetes Node 1 sends to svc1 through cbr0, and IPtables picks pod 1 itself as the backend]
The Hairpin Problem
src: pod1, dst: pod1
The reply for this packet would not leave this POD at all: pod 1 sees a packet from its own IP, answers it locally, and the reply is never un-DNATed back to svc1!
Only SNAT in IPtables can solve this problem
[Figure: the hairpinned packet on Kubernetes Node 1, between cbr0 and pod 1]
The Hairpin Problem
src: pod1, rewritten by SNAT to src: cbr0; dst: svc1, rewritten by DNAT (conntrack) to dst: pod1
[Figure: IPtables in the root namespace of Kubernetes Node 1 applies both rewrites, so the reply from pod 1 goes back to cbr0]
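The hairpin decision itself is a single comparison after DNAT. This is a minimal sketch, assuming hypothetical function names and a hypothetical cbr0 address of 10.244.1.1; it shows only the source and destination rewriting, not how IPtables implements it.

```python
def needs_hairpin_snat(src_ip: str, dnat_dst_ip: str) -> bool:
    """After DNAT, a packet that is now addressed to its own sender must also be SNATed."""
    return src_ip == dnat_dst_ip


def rewrite(src_ip: str, backend_ip: str, bridge_ip: str) -> tuple:
    """Return (src, dst) after DNAT to backend_ip and, in the hairpin case, SNAT to bridge_ip."""
    dst_ip = backend_ip                # DNAT: svc1 -> chosen backend
    if needs_hairpin_snat(src_ip, dst_ip):
        src_ip = bridge_ip             # SNAT: the reply now goes back to cbr0, where it can be un-NATed
    return src_ip, dst_ip
```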
External to Internal Communication in Kubernetes
External-to-Internal Traffic
Node port
§ One port on every node gets rerouted to a certain service
§ Typically port number > 30000 (the default NodePort range is 30000-32767)
§ ∀NodeIP:30001 → 10.9.8.15:8080
§ Node IPs are usually not public!
Load Balancer
§ One public IP that maps to a certain service
§ Fabric has to manage it!
§ GCE
§ AWS
§ OpenStack
[Figure: external traffic reaches the nodes through the network fabric, via node ports or a load balancer]
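The node port behaviour can be sketched as one table shared by every node. This is a toy model, assuming a hypothetical `NodePortTable` class and the node and Service addresses from the slides; the range check uses the Kubernetes default NodePort range, which kube-apiserver lets you change with `--service-node-port-range`.

```python
# Default NodePort range (30000-32767); kube-apiserver can change it with --service-node-port-range.
NODE_PORT_RANGE = range(30000, 32768)


class NodePortTable:
    """Toy model: a node port is opened on every node and forwards to the same Service."""

    def __init__(self, node_ips):
        self.node_ips = set(node_ips)
        self.ports = {}  # node port -> (service_ip, service_port)

    def expose(self, node_port: int, service_ip: str, service_port: int) -> None:
        """Reserve node_port on all nodes for one Service."""
        if node_port not in NODE_PORT_RANGE:
            raise ValueError(f"{node_port} is outside the NodePort range")
        if node_port in self.ports:
            raise ValueError(f"{node_port} is already allocated")
        self.ports[node_port] = (service_ip, service_port)

    def resolve(self, node_ip: str, node_port: int) -> tuple:
        """Any node IP gives the same answer: NodeIP:node_port maps to one Service address."""
        if node_ip not in self.node_ips:
            raise KeyError(node_ip)
        return self.ports[node_port]
```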