Learn Kubernetes
Kubernetes
Why Kubernetes
• The salaries of DevOps-related jobs are sky high, up to $146,207 on average in
San Francisco
• It gives you all the flexibility and cost savings that you always wanted within
one framework
• I’m a big advocate of Agile and DevOps techniques in all the projects I
work on
• Cluster Setup lectures using minikube or the Docker client (for desktop usage)
and production cluster setup on AWS using kops
• Lectures and demos on Kubeadm are also available for on-prem setups
• I update all the scripts and yaml definitions on Github to make sure
they work with the latest version available
• If not, you can send me a message and I’ll update the script on Github
Course topics: Building Containers, Services, External DNS, User Management, Building & Deploying, Running your first app, Labels, Volumes, RBAC, Node Affinity
• Setting up your Kubernetes cluster for the first time can be hard, but
once you're past the initial lectures, it will get easier, and you'll deepen
your knowledge by learning all the details of Kubernetes
• When you’re finished with this course, you can continue with my 2 other
(Advanced) Kubernetes courses, you’ll get a coupon code in the last
lecture (bonus lecture)
• You can scan the following barcode or use the link in the next document
after this introduction movie
• Kubernetes clusters can start with one node and scale up to thousands of nodes
• Docker Swarm
• Mesos
• Highly modular
• Open source
• Great community
• Backed by Google
[Diagram: virtual machines vs. containers: with VMs, each app (app A, app B, app C) ships its own OS, libs and bins on top of a Hypervisor and Host OS; with containers, the apps share the Host OS through the Docker Engine]
Containers on Cloud Providers
[Diagram: containers on a cloud provider: apps (app A, app B, app C) with their libs and bins run in the Docker Engine inside a Virtual Machine (Guest OS), on top of the provider's Hypervisor, Host OS and Server]
• Docker Engine
• Docker Hub
• You can run the same docker image, unchanged, on laptops, data center
VMs, and Cloud providers.
• But, there are more integrations for certain Cloud Providers, like AWS & GCE
• Things like Volumes and External Load Balancers work only with
supported Cloud Providers
• I will first use minikube to quickly spin up a local single machine with a
Kubernetes cluster
• I’ll then show you how to spin up a cluster on AWS using kops
• Using the AWS Free tier (gives you 750 hours of t2.micro’s / month)
• http://aws.amazon.com
• Using DigitalOcean
• It's aimed at users who want to just test it out or use it for development
• To launch your cluster you just need to enter (in a shell / terminal /
powershell):
$ minikube start
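• Once minikube reports that the cluster is running, you can verify it from the same shell (assuming kubectl is installed; minikube normally configures it for you):
$ minikube status
$ kubectl get nodes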
• Minikube and docker client are great for local setups, but not for real
clusters
• For other installs, or if you can’t get kops to work, you can use kubeadm
• The kubeadm lectures can be found at the end of this course, and let
you spin up a cluster on DigitalOcean
• Windows: https://docs.docker.com/engine/installation/windows/
• MacOS: https://docs.docker.com/engine/installation/mac/
• Linux: https://docs.docker.com/engine/installation/linux/
• Or you can use my vagrant devops-box which comes with docker installed
• In the demos I will always use an ubuntu-xenial box, set up with vagrant
$ cd docker-demo
$ ls
Dockerfile index.js package.json
$ docker build .
[…]
$
• After the docker build process, you have built an image that can run the
nodejs app
• Then you can push any locally built images to the Docker Registry
(where docker images are stored)
$ docker login
$ docker tag imageid your-login/docker-demo
$ docker push your-login/docker-demo
$ cd docker-demo
$ ls
Dockerfile index.js package.json
$ docker build -t your-login/docker-demo .
[…]
$ docker push your-login/docker-demo
[…]
$
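• For reference, a minimal Dockerfile for a Node.js app like this one typically looks as follows (a sketch; the actual Dockerfile in the docker-demo repository may differ):
FROM node:8
WORKDIR /app
COPY package.json index.js /app/
RUN npm install
EXPOSE 3000
CMD ["node", "index.js"]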
• Don't try to create one giant docker image for your app, but split it up if
necessary
• Data in a container is not preserved: when a container stops, all the
changes made within the container are lost
• You can preserve data, using volumes, which is covered later in this course
• For more tips, check out the 12-factor app methodology at 12factor.net
• https://hub.docker.com/_/nginx/ - webserver
• https://hub.docker.com/_/php/ - PHP
• https://hub.docker.com/_/node - NodeJS
• https://hub.docker.com/_/ruby/ - Ruby
• https://hub.docker.com/_/python/ - Python
• https://hub.docker.com/_/openjdk/ - Java
Demo: Pushing a docker image to Docker Hub
Running your first app
• Let’s run our newly built application on the new Kubernetes cluster
• A pod can contain one or more tightly coupled containers, that make up
the app
• Those apps can easily communicate with each other using their local
port numbers
• This AWS Load Balancer will route the traffic to the correct pod in Kubernetes
• There are other solutions for other cloud providers that don’t have a Load
Balancer
[Diagram: a worker node running Docker, the kubelet and kube-proxy (iptables), hosting multiple replicas of the k8s-example pod; a pod wraps one or more container images (e.g. busybox)]

apiVersion: v1
kind: Pod
metadata:
  name: nodehelloworld.example.com
  labels:
    app: helloworld
spec:
  containers:
  - name: k8s-demo
    image: wardviaene/k8s-demo
    ports:
    - containerPort: 3000
• Any files that need to be saved can’t be saved locally on the container
Scaling
• Our example app is stateless: if the same app runs multiple times, it
doesn't change state
• Later in this course I'll explain how to use volumes to still run stateful apps
• Those stateful apps can't scale horizontally, but you can run them in a
single container and scale vertically (allocate more CPU / Memory /
Disk)
• Scaling in Kubernetes can be done using the Replication Controller
• The replication controller will ensure a specified number of pod replicas
run at all times
• Pods created with the replication controller will automatically be replaced if they
fail, get deleted, or are terminated
• Using the replication controller is also recommended if you just want to make
sure 1 pod is always running, even after reboots
• The Replica Set, rather than the Replication Controller, is used by the
Deployment object
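• A minimal ReplicationController definition along those lines could look as follows (a sketch; the name and replica count are illustrative, and the definitions in the course repository may differ):
apiVersion: v1
kind: ReplicationController
metadata:
  name: helloworld-controller
spec:
  replicas: 2
  selector:
    app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
      - name: k8s-demo
        image: wardviaene/k8s-demo
        ports:
        - containerPort: 3000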
Deployments
• A deployment declaration in Kubernetes allows you to do app
deployments and updates
• When using the deployment object, you define the state of your application
• Kubernetes will then make sure the cluster matches your desired state
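• As a sketch (the real definitions are in the course repository and may differ), a Deployment for the demo app could look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: helloworld
  template:
    metadata:
      labels:
        app: helloworld
    spec:
      containers:
      - name: k8s-demo
        image: wardviaene/k8s-demo
        ports:
        - containerPort: 3000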
kubectl get pods --show-labels                                           Get pods, and also show the labels attached to those pods
kubectl rollout status deployment/helloworld-deployment                  Get the deployment status
kubectl set image deployment/helloworld-deployment k8s-demo=k8s-demo:2   Run k8s-demo with the image labeled version 2
kubectl edit deployment/helloworld-deployment                            Edit the deployment object
• That’s why Pods should never be accessed directly, but always through a
Service
• A service is the logical bridge between the “mortal” pods and other
services or end-users
Services
• When using the “kubectl expose” command earlier, you created a new
Service for your pod, so it could be accessed externally
• Note: by default, NodePort services can only use ports in the 30000-32767 range, but you
can change this behavior by adding the --service-node-port-range= argument to the
kube-apiserver (in the init scripts)
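• For reference, a NodePort Service like the one "kubectl expose" generates could be written out roughly like this (the port numbers are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: helloworld-service
spec:
  type: NodePort
  selector:
    app: helloworld
  ports:
  - port: 31001
    nodePort: 31001
    targetPort: 3000
    protocol: TCP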
• Labels are like tags in AWS or other cloud providers, used to tag resources
• You can label your objects, for instance your pod, following an organizational structure
• In our previous examples I already have been using labels to tag pods:
metadata:
  name: nodehelloworld.example.com
  labels:
    app: helloworld
• Once labels are attached to an object, you can use filters to narrow down
results
• Using Label Selectors, you can use matching expressions to match labels
• For instance, a particular pod can only run on a node labeled with
“environment” equals “development”
• Once nodes are tagged, you can use label selectors to let pods only run
on specific nodes
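• As an illustration of that workflow (the node name and label are hypothetical), you could label a node and then add a nodeSelector to the pod spec:
$ kubectl label nodes node1 environment=development

spec:
  containers:
  - name: k8s-demo
    image: wardviaene/k8s-demo
  nodeSelector:
    environment: development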
• To detect and resolve problems with your application, you can run health
checks
• The typical production application behind a load balancer should always have
health checks implemented in some way to ensure availability and resiliency
of the app
Health checks
• This is what a health check looks like on our example container:
apiVersion: v1
kind: Pod
metadata:
  name: nodehelloworld.example.com
  labels:
    app: helloworld
spec:
  containers:
  - name: k8s-demo
    image: wardviaene/k8s-demo
    ports:
    - containerPort: 3000
    livenessProbe:
      httpGet:
        path: /
        port: 3000
      initialDelaySeconds: 15
      timeoutSeconds: 30
• If the readiness check fails, the container will not be restarted, but the Pod's IP
address will be removed from the Service, so it'll not serve any
requests anymore
Readiness Probe
• The readiness test will make sure that at startup, the pod will only receive
traffic when the test succeeds
• You can use these probes in conjunction, and you can configure different
tests for them
• If your container always exits when something goes wrong, you don’t need
a livenessProbe
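• A readinessProbe is declared the same way as the livenessProbe shown earlier; a sketch for our container could be (the delay and timeout values are illustrative):
readinessProbe:
  httpGet:
    path: /
    port: 3000
  initialDelaySeconds: 5
  timeoutSeconds: 30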
• I’ll then show you the lifecycle of a pod in the next lecture
• Failed: All containers within this pod have been Terminated, and at
least one container returned a failure code
• The failure code is the exit code of the process when a container
terminates
• A network error might have occurred (for example the node
where the pod is running is down)
[Diagram: pod lifecycle over time: the post start hook runs, then the liveness probe, and finally the pre stop hook; the pod conditions (Initialized, Ready, PodScheduled) change as the pod starts, e.g. Initialized goes from False to True]
Demo: Pod lifecycle
Secrets
• Secrets provides a way in Kubernetes to distribute credentials, keys,
passwords or “secret” data to the pods
• You can also use the same mechanism to provide secrets to your
application
• There are still other ways your container can get its secrets if you don't
want to use Secrets (e.g. using an external vault service in your app)
• Secrets can be used in the following ways:
• Can be used for instance for dotenv files or your app can just read this
file
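• For instance, a secrets-db-secret.yml file could look roughly like this (the values are base64-encoded; this is a sketch, and the actual file used in the course may differ):
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  username: cm9vdA==
  password: cGFzc3dvcmQ=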
• After creating the yml file, you can use kubectl create:
$ kubectl create -f secrets-db-secret.yml
secret "db-secret" created
$
$ minikube dashboard
• The DNS service can be used within pods to find other services running on the
same cluster
• Multiple containers within 1 pod don’t need this service, as they can contact each
other directly
• A container in the same pod can connect to the port of the other container
directly using localhost:port
[Diagram: Pod 1 (Service: app1, IP 10.0.0.1) and Pod 2 (Service: app2, IP 10.0.0.2) resolve each other through kube-dns, which runs in the kube-system namespace]

$ host app1-service
app1-service has address 10.0.0.1
$ host app2-service
app2-service has address 10.0.0.2
$ host app2-service.default
app2-service.default has address 10.0.0.2
$ host app2-service.default.svc.cluster.local
app2-service.default.svc.cluster.local has address 10.0.0.2

• "default" stands for the default namespace; pods and services can be launched in different namespaces (to logically separate your cluster)
• The ConfigMap key-value pairs can then be read by the app using:
• Environment variables
• Using volumes
• This file can then be mounted using volumes where the application
expects its config file
• This way you can “inject” configuration settings into containers without
changing the container itself
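• As an illustration (the key and value are hypothetical), the app-config ConfigMap referenced below could be created from literals or files:
$ kubectl create configmap app-config --from-literal=driver=mysql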
apiVersion: v1
kind: Pod
metadata:
  name: nodehelloworld.example.com
  labels:
    app: helloworld
spec:
  containers:
  - name: k8s-demo
    image: wardviaene/k8s-demo
    ports:
    - containerPort: 3000
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config

• The config values will be stored in files: /etc/config/driver and /etc/config/param/with/hierarchy
• With ingress you can run your own ingress controller (basically a
loadbalancer) within the Kubernetes cluster
• There are default ingress controllers available, or you can write your
own ingress controller
Ingress
[Diagram: traffic from the internet on port 80 (http) and port 443 (https) reaches the ingress pod, which applies the ingress rules (host-x.example.com => pod 1, host-y.example.com => pod 2, host-x.example.com/api/v2 => pod n) and forwards requests to Pod 1 (Application 1) and Pod 2 (Application 2)]
Ingress rules
• You can create ingress rules using the ingress object
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: helloworld-rules
spec:
  rules:
  - host: helloworld-v1.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: helloworld-v1
          servicePort: 80
  - host: helloworld-v2.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: helloworld-v2
          servicePort: 80
• On public cloud providers, you can use the ingress controller to reduce
the cost of your LoadBalancers
• You can use 1 LoadBalancer that captures all the external traffic and
sends it to the ingress controller
• This tool will automatically create the necessary DNS records in your
external DNS server (like route53)
• For every hostname that you use in ingress, it’ll create a new record to
send traffic to your loadbalancer
• Other setups are also possible without ingress controllers (for example
directly on hostPort - nodePort is still WIP, but will be out soon)
Ingress with LB and External-DNS
[Diagram: internet traffic hits an AWS LoadBalancer, which forwards it to the nginx ingress controller pod (Service: ingress); the controller applies the ingress rules (host-x.example.com => pod 1, host-y.example.com => pod 2, host-x.example.com/api/v2 => pod n); an External DNS pod creates the matching records in AWS Route53]
• That’s why up until now I’ve been using stateless apps: apps that don’t
keep a local state, but store their state in an external service
[Diagram: a worker node running Docker, the kubelet and kube-proxy (iptables), where the pods (Pod 1 .. Pod N, myapp) attach EBS Storage volumes]
• Tip: the node where your pod is going to run also needs to be in the same
availability zone as the EBS volume
Volumes
• To use volumes, you need to create a pod with a volume definition
[…]
spec:
  containers:
  - name: k8s-demo
    image: wardviaene/k8s-demo
    volumeMounts:
    - mountPath: /myvol
      name: myvolume
    ports:
    - containerPort: 3000
  volumes:
  - name: myvolume
    awsElasticBlockStore:
      volumeID: vol-055681138509322ee
• The AWS Plugin can for instance provision storage for you by creating
the volumes in AWS before attaching them to a node
• This is still in beta when writing this course, but will be stable soon
• I’ll also keep my github repository up to date with the latest definitions
• To use auto provisioned volumes, you can create the following yaml file:
storage.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1
• This will allow you to create volume claims using the aws-ebs provisioner
• Kubernetes will provision volumes of the type gp2 for you (General
Purpose - SSD)
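• A PersistentVolumeClaim using this storage class could then be sketched as follows (the claim name and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 8Gi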
• Imagine you have 20 applications you want to deploy, and they all need to get a
specific credential
• You can use presets to create 1 Preset object, which will inject an environment
variable or config file to all matching pods
• When injecting Environment variables and VolumeMounts, the Pod Preset will apply
the changes to all containers within the pod
Pod Presets
• This is an example of a Pod Preset
apiVersion: settings.k8s.io/v1alpha1 # you might have to change this after PodPresets become stable
kind: PodPreset
metadata:
  name: share-credential
spec:
  selector:
    matchLabels:
      app: myapp
  env:
  - name: MY_SECRET
    value: "123456"
  volumeMounts:
  - mountPath: /share
    name: share-volume
  volumes:
  - name: share-volume
    emptyDir: {}
• It’s possible that no pods are currently matching, but that matching
pods will be launched at a later time
• Your podname will have a sticky identity, using an index, e.g. podname-0,
podname-1 and podname-2 (and when a pod gets rescheduled, it'll keep that
identity)
• StatefulSets allow stateful apps stable storage with volumes based on their ordinal
number (podname-x)
• Deleting and/or scaling a StatefulSet down will not delete the volumes
associated with the StatefulSet (preserving data)
StatefulSets
• A StatefulSet will allow your stateful app to use DNS to find other peers
• If you didn't use a StatefulSet, you would get a dynamic hostname, which
you wouldn't be able to use in your configuration files, as the name can
always change
• A StatefulSet will also allow your stateful app to order the startup and
teardown:
• This is useful if you first need to drain the data from a node before it
can be shut down
• Logging aggregators
• Monitoring
• Running a daemon that only needs one instance per physical machine
• I’ll use InfluxDB, but others like Google Cloud Monitoring/Logging and
Kafka are also possible
Resource Usage Monitoring
• Visualizations (graphs) can be shown using Grafana
• All these technologies (Heapster, InfluxDB, and Grafana) can be started in pods
• https://github.com/kubernetes/heapster/tree/master/deploy/kube-config/influxdb
• After downloading the repository the whole platform can be deployed using
the addon system or by using kubectl create -f directory-with-yaml-files/
[Diagram: cAdvisor and the kubelet on every node expose metrics, which a Heapster pod collects and stores in an InfluxDB pod; a Grafana pod visualizes the data]
• In Kubernetes 1.3 scaling based on CPU usage is possible out of the box
• With alpha support, application based metrics are also available (like
queries per second or average request latency)
• To enable this, the cluster has to be started with the env var
ENABLE_CUSTOM_METRICS to true
Autoscaling
• Autoscaling will periodically query the utilization for the targeted pods
• Autoscaling will use heapster, the monitoring tool, to gather its metrics
and make scaling decisions
• You run a deployment with a pod with a CPU resource request of 200m
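• Building on that example, a HorizontalPodAutoscaler could be sketched like this (the names and thresholds are illustrative):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50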
• You can create rules that are not hard requirements, but rather a
preferred rule, meaning that the scheduler will still be able to schedule
your pod, even if the rules cannot be met
• You can create rules that take other pod labels into account
• For example, a rule that makes sure 2 different pods will never be
on the same node
Affinity and anti-affinity
• Kubernetes can do node affinity and pod affinity/anti-affinity
• I’ll first cover node affinity and will then cover pod affinity/anti-affinity
• 1) requiredDuringSchedulingIgnoredDuringExecution
• 2) preferredDuringSchedulingIgnoredDuringExecution
• The second type will try to enforce the rule, but it will not guarantee it
• Even if the rule is not met, the pod can still be scheduled, it’s a soft
requirement, a preference
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: env
            operator: In
            values:
            - dev
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: team
            operator: In
            values:
            - engineering-project1
  containers:
  […]
• The higher this weighting, the more weight is given to that rule
• When scheduling, Kubernetes will score every node by summing the weightings
per node
• If only the rule with weight 1 matches, then the score for that node will only be 1
• The pod will be scheduled on the node with the highest total score
Built-in node labels
• In addition to the labels that you can add yourself to nodes, there are pre-
populated labels that you can use:
• kubernetes.io/hostname
• failure-domain.beta.kubernetes.io/zone
• failure-domain.beta.kubernetes.io/region
• beta.kubernetes.io/instance-type
• beta.kubernetes.io/os
• beta.kubernetes.io/arch
Demo: Affinity and anti-affinity
Interpod Affinity and anti-affinity
• This mechanism allows you to influence scheduling based on the labels of other
pods that are already running on the cluster
• requiredDuringSchedulingIgnoredDuringExecution
• preferredDuringSchedulingIgnoredDuringExecution
• The required type creates rules that must be met for the pod to be scheduled; the
preferred type is a "soft" type, and the rules may or may not be met
• A good use case for pod affinity is co-located pods:
• You might want one pod to always be co-located on the same node as
another pod
• For example you have an app that uses redis as a cache, and you want
the redis pod to run on the same node as the app itself
• If the affinity rule matches, the new pod will only be scheduled on nodes
that have the same topologyKey value as the current running pod
[Diagram: a new pod labeled app=redis with the first rule below must land on Node1, where the pod labeled app=myapp runs (topologyKey kubernetes.io/hostname); a new pod labeled app=db with the second rule can land on any node in the same zone, e.g. Node3 in eu-west-1b (topologyKey failure-domain.beta.kubernetes.io/zone)]

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: "app"
        operator: In
        values:
        - myapp
    topologyKey: "kubernetes.io/hostname"

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: "app"
        operator: In
        values:
        - myapp
    topologyKey: "failure-domain.beta.kubernetes.io/zone"
• You can use anti-affinity to make sure a pod is only scheduled once on a
node
• For example you have 3 nodes and you want to schedule 2 pods, but
they shouldn’t be scheduled on the same node
• Pod anti-affinity allows you to create a rule that says to not schedule
on the same host if a pod label matches
[Diagram: with the podAntiAffinity rule below, a new pod labeled app=db will not be scheduled on the node where a pod labeled app=myapp is already running]

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: "app"
        operator: In
        values:
        - myapp
    topologyKey: "kubernetes.io/hostname"
• You might have to take this into account if you have a lot of rules and a
larger cluster (e.g. 100+ nodes)
• This will make sure that no pods will be scheduled on node1, as long as
they don’t have a matching toleration
• If the taint is applied while there are already running pods, these will not
be evicted, unless the following taint type is used:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
  tolerationSeconds: 3600
• If you don’t specify the tolerationSeconds, the toleration will match and the pod
will keep running on the node
• In this example, the toleration will only match for 1 hour (3600 seconds), after
that the pod will be evicted from the node
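• For reference, taints like the ones described above are typically applied with kubectl (the node name, key and value are placeholders):
$ kubectl taint nodes node1 key=value:NoSchedule
$ kubectl taint nodes node1 key=value:NoExecute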
Taints and tolerations
• Example use cases are:
• If you have a few nodes with specific hardware (for example GPUs), you
can taint them to avoid running non-specific applications on those nodes
• This will automatically taint nodes that have node problems, allowing
you to add tolerations to time the eviction of pods from nodes
spec:
  kubelet:
    featureGates:
      TaintNodesByCondition: "true"
• In the next slide I'll show you a few taints that can possibly be added.
• Resources are the endpoints in the Kubernetes API that store collections
of API Objects
• For example, there is the built-in Deployment resource, that you can use
to deploy applications
• In the yaml files you describe the object, using the Deployment
resource type
• Operators, explained in the next lecture, use these CRDs to extend the
Kubernetes API with their own functionality
• In the demo, I’ll show you how to start using an Operator for PostgreSQL
• If you’d just deploy a PostgreSQL container, it’d only start the database
• If you’re going to use this PostgreSQL operator, it’ll allow you to also create
replicas, initiate a failover, create backups, scale
apiVersion: cr.client-go.k8s.io/v1
kind: Pgcluster
metadata:
  labels:
    archive: "false"
    archive-timeout: "60"
    crunchy_collect: "false"
    name: mycluster
    pg-cluster: mycluster
    primary: "true"
  name: mycluster
  namespace: default
• You don’t want one person or team taking up all the resources (e.g.
CPU/Memory) of the cluster
• You can divide your cluster in namespaces (explained in next lecture) and
enable resource quotas on it
• You run a deployment with a pod with a CPU resource request of 200m
• The administrator can specify default request values for pods that don’t
specify any values for capacity
• If a resource is requested beyond the allowed capacity, the API server
will return a 403 FORBIDDEN error, and kubectl will show an error
Resource                  Description
requests.cpu              The sum of CPU requests of all pods cannot exceed this value
requests.memory           The sum of memory requests of all pods cannot exceed this value
requests.storage          The sum of storage requests of all persistent volume claims cannot exceed this value
limits.cpu                The sum of CPU limits of all pods cannot exceed this value
limits.memory             The sum of memory limits of all pods cannot exceed this value
persistentvolumeclaims    The total number of persistent volume claims that can exist in a namespace
• e.g. the marketing team can only use a maximum of 10 GiB of memory,
2 loadbalancers, 2 CPU cores
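• A ResourceQuota implementing limits like these could be sketched as follows (the namespace and values are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: marketing
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 10Gi
    limits.cpu: "4"
    limits.memory: 10Gi
    persistentvolumeclaims: "5"
    services.loadbalancers: "2"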
• Client Certificates
• Bearer Tokens
• Authentication Proxy
• OpenID
• Webhooks
User Management
• Service Users are using Service Account Tokens
• a UID
• Groups
• AlwaysAllow / AlwaysDeny
• When an API request comes in (e.g. when you enter kubectl get nodes),
it will be checked to see whether you have access to execute this
command
• You can parse the incoming payload (which is JSON) and reply with
access granted or access denied
RBAC
• To enable an authorization mode, you need to pass --authorization-mode= to the API server at startup
• Most tools now provision a cluster with RBAC enabled by default (like
kops and kubeadm)
• You first describe them in yaml format, then apply them to the cluster
• First you define a role, then you can assign users/groups to that role
• You can create roles limited to a namespace or you can create roles
where the access applies to all namespaces
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "secrets"]
  verbs: ["get", "watch", "list"]

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader-clusterwide
rules:
- apiGroups: [""]
  resources: ["pods", "secrets"]
  verbs: ["get", "watch", "list"]
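• To grant that ClusterRole to a user, you bind it with a ClusterRoleBinding; a sketch (not shown on the slides) could look like this:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods-clusterwide
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: pod-reader-clusterwide
  apiGroup: rbac.authorization.k8s.io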
• Pod-To-Service communication
• External-To-Service
• Every pod can get an IP that is routable using the AWS Virtual Private Network
(VPC)
• The kubernetes master allocates a /24 subnet to each node (254 IP addresses)
• There is a limit of 50 entries, which means you can’t have more than 50 nodes
in a single AWS cluster
• AWS can raise this limit to 100, but it might have a performance
impact
• An Overlay Network
[Diagram: Flannel overlay network: traffic from a container (10.3.1.2) goes through veth0, docker0 (10.3.1.1) and flannel0 (10.3.x.x), is encapsulated in a UDP packet with the host IPs as source and destination (src 192.168.0.2, dst 192.168.0.3), and is decapsulated on node 2 and delivered to the target pod (10.3.2.2); Flannel acts as a network gateway between nodes]
• It allows you to easily add more nodes to the cluster without making API
changes yourself
• You drain a node before you shut it down or take it out of the cluster
• If the node runs pods not managed by a controller, but is just a single pod:
$ kubectl drain nodename --force
• Only one of them will be the leader, the other ones are on stand-by
[Diagram: multiple masters (Master 1, Master 2, …), each running the authorization, API, scheduling, REST and actuator components]
• If you’re going to use a production cluster on AWS, kops can do the heavy lifting for
you
• If you're running on another cloud platform, have a look at the kube deployment
tools for that platform
• kubeadm is a tool that is in alpha that can set up a cluster for you
• In the next demo I’ll show you how to modify the kops setup to run multiple master
nodes
Demo: HA setup
Federation
• Federation allows you to manage multiple Kubernetes clusters
• etcd cluster
• federation-apiserver
• federation-controller-manager
apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:xx-xxxx-x:xxxxxxxxx:xxxxxxx/xxxxx-xxxx-xxxx-xxxx-xxxxxxxxx
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
• In this lecture I’ll go over the possible annotations for the AWS Elastic Load
Balancer (ELB)
TLS on AWS ELB
Annotation                                                                         Description
service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval             Used to enable access logs on the load balancer
service.beta.kubernetes.io/aws-load-balancer-access-log-enabled                   Used to enable access logs on the load balancer
service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name            Used to enable access logs on the load balancer
service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix          Used to enable access logs on the load balancer
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled    Cross-AZ loadbalancing
• You need to run “helm init” to initialize helm on the Kubernetes cluster
• After this, helm is ready for use, and you can start installing charts
• You can think of templates as dynamic yaml files, which can contain logic
and variables
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  myvalue: "Hello World"
  drink: {{ .Values.favoriteDrink }}
• The favoriteDrink value can then be overridden by the user when running
helm install
mychart/
  values.yaml        # contains key: value pairs
  templates/
    deployment.yaml
    service.yaml
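• For example, overriding the favoriteDrink value at install time could look like this (Helm 2 style; the chart path and value are illustrative):
$ helm install ./mychart --set favoriteDrink=coffee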
• Azure Functions
• AWS Lambda
• With these products, you don’t need to manage the underlying infrastructure
• The functions are also not "always-on", unlike containers and instances, which
can greatly reduce the cost of serverless if the function doesn't need to be
executed a lot
What is Serverless
• Serverless in public cloud can reduce the complexity, operational costs, and
engineering time to get code running
• A developer can "just push" the code and does not need to worry about many
operational aspects
• Although “cold-starts”, the time for a function to start after it has not been
invoked for some time, can be an operational issue that needs to be taken
care of
• This is an example of an AWS Lambda Function:
• For example, in AWS you would use the API Gateway to set up a URL that
will invoke this function when visited
• OpenFaas
• Kubeless
• Fission
• OpenWhisk
• You can install and use any of the projects to let developers launch functions on your Kubernetes
cluster
• As an administrator, you’ll still need to manage the underlying infrastructure, but from a
developer standpoint, he/she will be able to quickly and easily deploy functions on Kubernetes
• In this course, I’ll demo Kubeless, which is easy to setup and use
• Python
• NodeJS
• Ruby
• PHP
• .NET
• Golang
• Others
Kubeless
• Once you've deployed your function, you'll need to determine how it'll be
triggered
• HTTP functions
• Scheduled function
• AWS Kinesis
[Diagram: a Client reaches App 1 and App 2 (each with its own database) through a Load Balancer and an Ingress]
• Without a service mesh, the traffic between the apps (App 1 .. App 4) behind the ingress has:
• No encryption
• No retries, no failover
• No intelligent loadbalancing
• No routing decisions
• No metrics/logs/traces
• No access control
• …
[Diagram: every app (App 1 .. App 4) gets a Sidecar (an Envoy proxy), configured through a management interface; the sidecars handle the traffic between the apps]
• With the sidecars in place you get:
• Routing decisions
• Metrics/logs/traces
• Access control
[Diagram: the istio-ingress and the Envoy sidecar proxies routing traffic between the hello and world pods, for example sending a 10% traffic weight to one version or injecting a latency of 5s into a pod]
• Attacks like impersonation by rerouting DNS records will fail, because a fake
application can’t prove its identity using the certificate mechanism
[Diagram: with mutual TLS between the Envoy proxies, a rogue service cannot impersonate the hello or world services]
• Based on those identities, we can start to doing Role Based Access Control
(RBAC)
• RBAC allows us to limit access between our services, and from end-user to
services
• Istio is able to verify the identity of a service by checking the identity of the x.509
certificate (which comes with enabling mutual TLS)
• Good to know: The identities capability in istio is built using the SPIFFE standard
(Secure Production Identity Framework For Everyone, another CNCF project)
RBAC
• RBAC in istio (source: https://istio.io/docs/concepts/security/)
• For example, in the demo, we'll only enable it for the "default" namespace
• We can then create a ServiceRole that specifies the rules and a ServiceRoleBinding to link a ServiceRole to a subject (for example a Kubernetes ServiceAccount)

apiVersion: "rbac.istio.io/v1alpha1"
kind: RbacConfig
metadata:
  name: default
spec:
  mode: 'ON_WITH_INCLUSION'
  inclusion:
    namespaces: ["default"]
• After having strong identities using the x.509 certificates that mutual
TLS provides, I showed you how to use role based access control
(RBAC)
• It's an open standard for representing claims securely between two parties (see
https://jwt.io/ for more information)
• In our implementation, we’ll receive a JWT token from an authentication server after
logging in (still our hello world app)
• The app will provide us with a token that is signed with a key
• The data is not encrypted, but the token contains a signature, which can be
verified to see whether it was really created by the server
• Only the server has the (private) key, so we can’t recreate or tamper with the
token
End-user authentication
• This is an example of a token:
• eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODk
wIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJ
SMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
• In the JWT payload, data can be stored, like the username, groups, etc
• This can then be used later by the app, when the user sends new requests
• If the signature in the token is valid, then the JWT is valid, and the
data within the token can be used
• Once validated the service would need to check whether the user has access to
this service (authorization)
• With istio, this can be taken away from the app code and managed centrally
• You can configure the jwt token signature/properties you expect in istio, and create
policies to allow/disallow access to a service
• For example: the “hello” app can only be accessed if the user is authenticated
• The sidecar will verify the validity of the signature, to make sure the token is valid
[Diagram: the client logs in via /login on the hello-auth pod and receives a jwt token; subsequent requests to the hello pod go through the istio-ingress, and the Envoy proxy retrieves jwks.json to verify the token before passing the request on]

Policy (excerpt):
- jwt:
    issuer: "[email protected]"
    jwksUri: "http://auth.kubernetes.newtech.academy/.well-known/jwks.json"
principalBinding: USE_ORIGIN
[Diagram: a request to the hello service without a valid jwt token is rejected by the Envoy proxy]
• For example, when you create a new pod, a request will be sent to the
kubernetes API server, and this can be intercepted by an admission
controller
kube-apiserver --enable-admission-plugins=NamespaceLifecycle,...
LimitRanger: Using the "LimitRange" object type, you set the default and limit cpu/memory resources within a namespace. The LimitRanger admission controller will ensure these defaults and limits are applied.
NodeRestriction: Makes sure that kubelets (which run on every node) can only modify their own Node/Pod objects (objects that run on that specific node).
MutatingAdmissionWebhook: You can set up a webhook that can modify the object being sent to the kube-apiserver. The MutatingAdmissionWebhook ensures that matching objects will be sent to this webhook for modification.
ValidatingAdmissionWebhook: You can set up a webhook that can validate the objects being sent to the kube-apiserver. If the ValidatingAdmissionWebhook rejects the request, the request fails.
[Diagram: request flow through the API server: API HTTP handler -> authentication/authorization -> mutating admission (webhooks) -> object schema validation -> validating admission (webhooks) -> persisted to etcd]
• For example:
• Make sure containers only run within a UID / GID range, or make sure
that containers can’t run as root
• It’ll determine whether the pod meets the pod security policy based on the
security context defined within the pod specification
• In the next demo I’ll show you how to enable it and create a PodSecurityPolicy
to implement some extra security controls for new pods that are created
• One for the system processes, because some of them need to run
privileged / as root
• One for the pods users want to schedule which should be tighter than the
system policy (for example deny privileged pods)
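• A restrictive PodSecurityPolicy for user pods could be sketched as follows (a minimal illustration; real policies typically set more fields):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - configMap
  - secret
  - emptyDir
  - persistentVolumeClaim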
• Skaffold can also be used as a tool that can be incorporated into your CI/CD pipeline, as it has the build/push/deploy workflow built-in
• That way you can have the workflow locally to test your app, and have
it handled by your CI/CD in the same way
• Other builds next to docker are also possible, like Bazel, build packs
or custom builds
[Diagram: the Skaffold workflow, including file sync and the cleanup of images & resources]
• etcd is a distributed and reliable key-value store for the most critical
data of a distributed system
Source: https://github.com/etcd-io/etcd
etcd
• All Kubernetes objects that you create are persisted to the etcd backend
(typically running inside your cluster)
• If you have a 1-master Kops cluster or a minikube setup, you’ll typically have a
1-node etcd cluster
• The latency between your nodes should be low, as heartbeats are sent
between nodes
• If you have a cluster spanning multiple DCs you’ll need to tune your
heartbeat timeout in the etcd cluster
• A write to etcd can only happen by the leader, which is elected by an
election algorithm (as part of the raft algorithm)
• If a write goes to one of the other etcd nodes, the write will be routed
through the leader (each node also knows who the leader node is)
• etcd will only persist the write if a quorum agrees on the write
• etcd supports snapshots to take a backup of your etcd cluster, which can
store all data into a snapshot file
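• A snapshot is taken with etcdctl, roughly like this (the endpoint and file name are placeholders; on a real cluster you'll usually also need the TLS cert/key flags):
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 snapshot save backup.db
$ ETCDCTL_API=3 etcdctl snapshot status backup.db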
• Unlike Kops, EKS will fully manage your master nodes (which includes the
apiserver and etcd)
• You pay a fee for every cluster you spin up (to pay for the master nodes) and
then you pay per EC2 worker that you attach to your cluster
• It’s a great alternative for kops if you want to have a fully managed cluster and
not deal with the master nodes yourself
• Depending on your cluster setup, EKS might be more expensive than running
a kops cluster - so you might still opt to use Kops for cost reasons
AWS EKS
• EKS is a popular AWS service and supports lots of handy features:
• AWS created its own VPC CNI (Container networking interface) for EKS
• AWS can even manage your workers to ensure updates are applied to
your workers
• Service Accounts can be tied to IAM roles to use IAM roles on a pod-level
• Integrates with many other AWS services (like CloudWatch for logging)
• You can find the documentation at eksctl.io, where you will also find the
download instructions
• You can also pass a yaml based configuration file if you want to set your
own configuration, like VPC subnets (otherwise it'll create a VPC and
subnets for you)
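• For example, creating a small cluster could be as simple as this (the cluster name, region and node count are illustrative):
$ eksctl create cluster --name my-cluster --region us-east-1 --nodes 2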
• With this feature you can specify IAM policies at a pod level
• For example: one specific pod in the cluster can access an s3 bucket, but
others cannot
• Previously, IAM policies would have to be set up on the worker level (using EC2
instance roles)
• With IAM Roles for Service Accounts, it lets you hand out permissions on a more
granular level
• One major caveat, the app running in the container that uses the AWS SDK must
have a recent SDK to be able to work with these credentials
IRSA
• IAM Roles for Service Accounts uses the IAM OpenID Connect provider (OIDC) that
EKS exposes
• To link an IAM Role with a Service Account, you need to add an annotation to the
Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::AWS_ACCOUNT_ID:role/IAM_ROLE_NAME
• The EKS Pod Identity Webhook will then automatically inject environment variables
into the pod that have this ServiceAccount assigned (AWS_ROLE_ARN &
AWS_WEB_IDENTITY_TOKEN_FILE)
• These environment variables will be picked up by the AWS SDK during authentication
[Diagram: the AWS SDK inside the Pod assumes the IAM role and calls an AWS service, e.g. S3]
• It can synchronise your version control (git) and your Kubernetes cluster
• With flux, you can put manifest files (your kubernetes yaml files) within
your git repository
• Flux will monitor this repository and make sure that what’s in the manifest
files is deployed to the cluster
• Flux also has interesting features where it can automatically upgrade your
containers to the latest version available within your docker repository (it
uses semantic versioning for that - e.g. “~1.0.0”)
• You declaratively describe the entire desired state of your system in git