Kubernetes + Docker Cheatsheet

The document provides a comprehensive guide on Docker and Kubernetes, detailing their architectures, installation processes, and command usage. It covers Docker components such as the client, daemon, and container lifecycle, as well as Kubernetes objects, features, and security practices. Additionally, it includes troubleshooting tips and recommendations for best practices in using Docker and Kubernetes.

Table of Contents

Docker Architecture
Installing docker components
docker client and daemon
docker-compose
docker-machine
Docker container lifecycle
docker subcommands
Getting help
docker run
docker build
docker rm
docker network
docker volume
docker bulk commands
Dockerfile
Dockerfile syntax
Dockerfile instructions
docker-compose
docker-machine
Swarm (multinode)
docker node
docker service
Docker Security
Recommendations
Images
Containers
Docker Swarm

Kubernetes Architecture

Skew support policy

kubectl command
Integrated Development Environment
Dashboard

Kubernetes Objects and Features

Basics
Namespaces
Nodes
Pods
Labels and Selectors
Replication Controllers
ReplicaSets
Deployments
Jobs
Cronjobs
DaemonSets
StatefulSets
Annotations
Authentication and Authorization
Admissions Controllers
Service Accounts
Certificate-based Authentication
kubeconfig
Role-Based Access Control
Networking
Services
CNI
Network Policies
What you can't do with network policies
DNS in Kubernetes
Ingress
Storage
Volumes
ConfigMaps
Secrets
Persistent Volume Claims
Features
NodeSelector
Affinity and anti-affinity rules
Taints and Tolerations
Liveness, readiness and startup probes
Managing limits and quotas for CPU and Memory
LimitRange
QoS
initContainers
Pod priorities and preemptions
Logging

Security
Security basics
Pentests
SecurityContexts

Troubleshooting
Troubleshooting commands
Kubernetes deployment tools
Useful tools and apps around Kubernetes

Helm (highly recommended!!)

Notes

Docker Architecture

https://docs.docker.com/engine/images/architecture.svg

daemon: process that runs on a host machine (server)

client: primary Docker interface

image: read-only template (build component)

registry: public or private image repositories (distribution, ship component)

container: created from an image, holds everything that is needed for an application to run (run component)

Underlying technologies

namespaces: pid, net, mnt, ipc, uts
cgroups: memory, CPU, block I/O, network
union file systems: UnionFS, AUFS, btrfs, vfs, DeviceMapper
container formats: libcontainer, LXC

● namespaces
o pid namespace: used for process isolation (Process ID)
o net namespace: used for managing network interfaces
o mnt namespace: used for managing mount-points
o ipc namespace: used for managing access to IPC resources (Inter-Process Communication)
o uts namespace: used for isolating kernel and version identifiers (Unix Timesharing System)

● control groups (cgroups)

o used for sharing available hardware resources and for setting up limits and constraints (see the docker run example below)
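A minimal sketch of how Docker exposes cgroup limits through docker run flags (the values are illustrative):

docker run -d --memory=512m --cpus=1.5 nginx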

● union file system (UnionFS)


o a file system that operates by creating layers
o many layers are merged and visible as one consistent file system
o many available implementations: AUFS, btrfs, vfs, DeviceMapper

https://devopscube.com/wp-content/uploads/2015/02/docker-filesystems-busyboxrw.png

https://docs.docker.com/storage/storagedriver/images/container-layers.jpg

● container format
o two supported container formats: libcontainer, LXC

https://i.stack.imgur.com/QVNR6.png

Installing docker components

docker client and daemon


Ubuntu from official package (older version):
apt-get update
apt-get install docker.io -y

The newest version:


curl -fsSL get.docker.com -o get-docker.sh
sudo sh get-docker.sh
docker --version

docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/$dockerComposeVersion/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

Change $dockerComposeVersion to the newest docker-compose version.

docker-machine
curl -L https://github.com/docker/machine/releases/download/v0.12.2/docker-machine-`uname -s`-`uname -m` > /tmp/docker-machine && \
chmod +x /tmp/docker-machine && \
sudo cp /tmp/docker-machine /usr/local/bin/docker-machine

Docker container lifecycle

docker subcommands
docker attach Attach local standard input, output, and error streams to a running container

docker build Build an image from a Dockerfile

docker commit Create a new image from a container’s changes

docker cp Copy files/folders between a container and the local filesystem

docker deploy Deploy a new stack or update an existing stack

docker diff Inspect changes to files or directories on a container’s filesystem

docker events Get real time events from the server

docker exec Run a command in a running container

docker export Export a container’s filesystem as a tar archive

docker history Show the history of an image

docker images List images

docker import Import the contents from a tarball to create a filesystem image

docker info Display system-wide information

docker inspect Return low-level information on Docker objects

docker kill Kill one or more running containers

docker load Load an image from a tar archive or STDIN

docker login Log in to a Docker registry

docker logout Log out from a Docker registry

docker logs Fetch the logs of a container

docker network Manage networks

docker node Manage Swarm nodes

docker port List port mappings or a specific mapping for the container

docker ps List containers

docker pull Pull an image or a repository from a registry

docker push Push an image or a repository to a registry

docker rename Rename a container

docker rm Remove one or more containers

docker rmi Remove one or more images

docker run Run a command in a new container

docker save Save one or more images to a tar archive (streamed to STDOUT by default)

docker search Search the Docker Hub for images

docker service Manage services

docker stack Manage Docker stacks

docker start Start one or more stopped containers

docker stats Display a live stream of container(s) resource usage statistics

docker stop Stop one or more running containers

docker swarm Manage Swarm

docker system Manage Docker

docker tag Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE

docker top Display the running processes of a container

docker update Update configuration of one or more containers

docker version Show the Docker version information

docker volume Manage volumes

docker wait Block until one or more containers stop, then print their exit codes

Getting help
docker --help list all docker commands

docker command --help list subcommand options

docker run
foreground mode (default) - stdout and stderr are redirected to the terminal, docker run
propagates the exit code of the main process

-d the container is run in detached mode

-t allocate pseudo terminal for the container (stdin closed immediately, terminal signals are not
forwarded)

-i stdin open, terminal signals forwarded to the container

-itd open stdin, allocate terminal and run process in the background (required for docker attach)

-u user used to run the container

-w working directory

-e setting environment variables

-h container hostname

-l sets labels

-p host_port:container_port publish container port on the host

-P Publish all exposed ports to random high ports from the host's ephemeral port range

-v /container_path creates a randomly named volume and attaches it to the container path

-v volume_name:/container_path creates a named volume and attaches it to the container path

-v host_path:/container_path mounts a host directory at the container path - host_path has to be an absolute path

--entrypoint overwrite entrypoint defined in Dockerfile

--log-driver logging driver for the container

--log-opt log driver options

--name assign name to container (by default a random name is generated → adjective name)

--network connect a container to a network (by default containers are connected to the bridge network)

--rm automatically removes the container after exit
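A hedged example combining several of these flags (image, names and paths are illustrative):

docker run -d --name web -h web01 -p 8080:80 -e APP_ENV=prod -v web_data:/usr/share/nginx/html nginx:1.15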

docker build
-f used for custom dockerfile names

-t tags image after build

--build-arg set build-time variables

--rm deletes intermediate containers after successful build
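For example (file name, tag and build argument are illustrative):

docker build -f Dockerfile.prod -t myapp:1.0 --build-arg VERSION=1.0 .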

docker rm
-f force removal of a running container

-v remove the volumes associated with the container

docker network
connect connect a container to a network

create create a network

disconnect disconnect a container from a network

inspect display detailed information about the network

ls list networks

prune remove all unused networks

rm remove one or more networks

docker volume
create Create a volume

inspect Display detailed information on one or more volumes

ls List volumes

prune Remove all unused volumes

rm Remove one or more volumes

docker bulk commands
docker rm -f $(docker ps -q) - delete all running containers

docker rm -f $(docker ps -qa) - delete all containers

docker start/stop $(docker ps -qa) - starts/stops all containers

docker kill $(docker ps -q) - kill all running containers

docker rmi $(docker images -q) - delete all docker images

Dockerfile
Dockerfile syntax

Dockerfile instructions
FROM – sets the base image (required)

FROM ubuntu

FROM ubuntu:18.04

FROM ubuntu@sha256:34471448724419596ca4e890496d375801de21b0e67b81a77fd6155ce001edad

RUN – executes any command in the container and creates a new fs layer. RUN uses 2 forms:

shell - RUN COMMAND

By default commands are run with /bin/sh -c. Long commands can be split with the backslash mark "\" and commands can be launched in sequence using "&&". You can also use other shell operators in this form.

exec - RUN ["executable", "param1", "param2"]

This form is recommended when you are configuring CMD or ENTRYPOINT. The exec form can receive Linux signals, as opposed to the shell form.

RUN apt-get update && \

apt-get install vim -y

COPY – copies files into containers / ADD has more functionality: it can unpack tar archives and copy remote file URLs into containers

COPY/ADD $HOST_PATH $CONTAINER_PATH

COPY test.sh /

ADD file.tar.xz /

- Both instructions copy files into Docker images.
- COPY is simpler
- ADD provides additional functionality, such as:
  - automatic extraction of local tar archives into the image
  - fetching packages from remote URLs just by adding them as a source

- COPY and ADD are using the same instruction forms:

ADD [--chown=<user>:<group>] <src1> <src2> ... <dest>
COPY [--chown=<user>:<group>] <src1> <src2> ... <dest>

ADD [--chown=<user>:<group>] ["<src1>", "<src2>", ... "<dest>"]
COPY [--chown=<user>:<group>] ["<src1>", "<src2>", ... "<dest>"]

- The --chown feature is only supported on Dockerfiles used to build Linux containers, and
will not work on Windows containers.

SHELL - sets the default shell that will be used for other instructions

- Each SHELL instruction overrides all previous SHELL instructions (it affects all subsequent instructions)
- This instruction is commonly used on Windows to switch between cmd and PowerShell
- The SHELL instruction must be written in JSON form in a Dockerfile.

SHELL ["executable", "parameters"]

SHELL ["/bin/bash", "-c"]

CMD – provides defaults for running containers / ENTRYPOINT - configures a container that will run as an executable

- Both of them specify the startup command for an image and both of them can be easily overwritten with the docker run command

- CMD is the default one
- ENTRYPOINT runs the container as an executable
- CMD and ENTRYPOINT are using the same instruction forms:

CMD executable param1 param2 ...
ENTRYPOINT executable param1 param2 ...

CMD ["executable", "param1", "param2", ...]
ENTRYPOINT ["executable", "param1", "param2", ...]

- ALWAYS when possible use exec form!


- CMD and ENTRYPOINT can be used together; in that case CMD is appended to the ENTRYPOINT instruction as its default arguments (CMD can be easily overwritten with docker run) - see the sketch below
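A minimal sketch of combining them (assuming an image with ping installed):

ENTRYPOINT ["ping"]
CMD ["-c", "4", "localhost"]

# docker run <image>              -> runs: ping -c 4 localhost
# docker run <image> -c 2 8.8.8.8 -> runs: ping -c 2 8.8.8.8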

ENV – sets environment variables

- Sets image environment variable


- Variables declared in a Dockerfile can be easily overwritten by the -e option (docker run)
- Environment vars can be declared in 2 ways:

ENV <key> <value>

This form creates single cache layer per one ENV instruction

ENV <key>=<value> <key>=<value> ...

This declaration requires "" or \ for escaping spaces; one ENV instruction means the creation of one Docker image layer.

Declared env vars can be used in Dockerfile using the following 2 syntaxes:

$KEY

${KEY}
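For example (names and values are illustrative):

ENV APP_HOME=/opt/app APP_USER=app
WORKDIR $APP_HOME
RUN echo "running as ${APP_USER}"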

EXPOSE – sets port on which containers will be listening

- it is just for informational purposes (it does not publish your container port on the host)
- publishing ports on hosts requires adding the -p or -P flag to the docker run command
- ports can be published externally on different port numbers

EXPOSE <port> <port2> …

EXPOSE 80 443

USER – sets the username

- sets the user name (or UID) and a group for a running container (also works for RUN, CMD
and ENTRYPOINT instructions)
- if user isn’t assigned to a primary group, root group will be used

USER <user>[:<group>]

USER <UID>[:<GID>]

VOLUME – declares a mount point for volumes from the host OS

- Creates a mount point and marks it as holding externally mounted volumes, from the host's and other containers' perspective
- If any kind of data is available at the VOLUME path in the image, it will be copied into the volume (on the host) when the container runs
- On Windows, the destination of a volume (inside a container) has to be a non-existing or empty directory, and has to be on a drive other than C:
- Order matters! If any build step changes data in a volume after the volume declaration, all newly created or edited data will be discarded
- There is no option to point to a host directory in a Dockerfile (it has to be done by using the docker run command)

VOLUME /mountpath

VOLUME /myvolume

VOLUME ["/myvolume"]

MAINTAINER – sets Author of the image (deprecated)

LABEL – adds metadata to the image

LABEL $KEY=$VALUE

LABEL MAINTAINER="[email protected]"

ARG - defines variables available at build-time

- Defines variables that can be passed at build-time to the docker daemon (--build-arg <var>=<value>)
- ENV variables override variables defined by ARG
- ARG vars are available from the place in which they are declared
- Do not use credentials with ARG instructions
- ARG variables can contain default values

ARG <var>

ARG <var>=<default_value>

ONBUILD - adds a trigger used when the image is used as a base image

- adds a trigger instruction that will be executed when the image is used as a base image (the instruction will be executed in the downstream build)
- any build instruction can be used (except FROM or MAINTAINER)

ONBUILD INSTRUCTION ARGUMENTS

ONBUILD RUN curl --data "hostname=$hostname" https://example.com/images/use

WORKDIR - sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions
that follow it in the Dockerfile

WORKDIR $CONTAINER_PATH

WORKDIR /usr/local/bin

STOPSIGNAL – sets the system call signal that will be sent to the container to exit

STOPSIGNAL integer

STOPSIGNAL signal_name

STOPSIGNAL SIGSTOP

Signal   Number  Description

SIGHUP   1       Hangup (POSIX)
SIGINT   2       Terminal interrupt (ANSI)
SIGQUIT  3       Terminal quit (POSIX)
SIGILL   4       Illegal instruction (ANSI)
SIGKILL  9       Kill (can't be caught or ignored) (POSIX)
SIGTERM  15      Termination (ANSI)
SIGSTOP  19      Stop executing (can't be caught or ignored) (POSIX)

HEALTHCHECK – sets container healthchecks that probe the app in the container

- Tells Docker how to check if the application is working properly
- Setting the HEALTHCHECK instruction for a container adds a new health status column (when using docker ps) that takes 3 container states: starting, healthy and unhealthy
- There can be only one HEALTHCHECK in a Dockerfile (only the last one takes effect)

HEALTHCHECK --interval=DURATION (default 30s) --timeout=DURATION (default 30s) --start-period=DURATION (default 0s) --retries=N (default 3) CMD command

HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1

Build context

The build context is the directory at a specified location PATH or URL. The PATH is a directory on your
local file system. The URL is a Git repository location.

The Docker client compresses the directory (and all subdirectories) and sends it to the Docker daemon.

The build context is pointed to by the last argument of the docker build command (usually by ".").

docker build -t test:image .

docker build -t ubuntu:test .

docker build -t ubuntu:test directory/

docker build -t ubuntu:test https://github.com/ubuntu/test.git

docker build -t ubuntu:test http://ubuntu.com/test.tar.gz

Building images

docker build -t $IMAGE_NAME:$IMAGE_TAG $DOCKER_CONTEXT

docker build -t image:tag .

Multi-stage builds

Multi-stage builds allow for storing only the result of the build in the final image, without development dependencies. They require at least 2 FROM instructions.

#STAGE 0

FROM golang:1.7.3

WORKDIR /go/src/github.com/alexellis/href-counter/

RUN go get -d -v golang.org/x/net/html

COPY app.go .

RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

#STAGE 1

FROM alpine:latest AS FINAL

RUN apk --no-cache add ca-certificates

WORKDIR /root/

COPY --from=0 /go/src/github.com/alexellis/href-counter/app .

CMD ["./app"]

The COPY --from=0 line copies just the built artifact from the previous stage into this new stage.
The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.

Building just 1 stage:

docker build --target FINAL -t alexellis2/href-counter:latest .

Use external image as stage:

COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf

Great article about writing Dockerfiles:
https://rock-it.pl/how-to-write-excellent-dockerfiles/

docker-compose
docker-compose - allows for managing many containers with one
command and one file

build Build or rebuild services

config Validate and view the compose file

create Create services

down Stop and remove containers, networks, images, and volumes

exec Execute a command in a running container

help Get help on a command

kill Kill containers

port Print the public port for a port binding

ps List containers

rm Remove stopped containers

run Run a one-off command

scale Set number of containers for a service

start Start services

stop Stop services

up Create and start containers

version Show the Docker-Compose version information

docker-compose.yaml - file structure

version: '$VERSION' # 2 or 3, 3 is recommended

services: # this section defines the microservices stack
  $SERVICE_NAME: # service name defines the container dns name (prefixed with the compose project name)
    container_name: $DNS_NAME # overwrites the container name - by default inherited from the service name
    image: $IMAGE_NAME # image used for the container
    deploy: # this section works only in Swarm mode; it defines the number of replicas, the way containers are updated, the restart policy and more
      replicas: $REPLICAS_NUMBER # define number of container replicas
      update_config: # define the way in which containers are updated
        parallelism: $CONTAINER_NUMBER # how many containers are updated at once
        delay: $NUMBER_OF_SECONDS # delay between update batches (e.g. 10s)
      restart_policy:
        condition: $RESTART_CONDITION # when dockerd will restart the container: none, on-failure, any (default)
    ports: # define container ports
      - "$CONTAINER_PORT" # container port for inter-container communication
      - "$HOST_PORT:$CONTAINER_PORT" # expose container on a host port
    volumes: # define volumes or mount points used by the container
      - $VOLUME_NAME:$CONTAINER_PATH # mount a named volume
      - $HOST_PATH:$CONTAINER_PATH # mount a host path
      - $CONTAINER_PATH # mount a randomly-named volume
    networks: # define networks to which the container will be connected
      - $NETWORK_NAME
    depends_on: # set dependencies between services
      - $SERVICE_NAME
    stdin_open: true # open stdin for the main process
    tty: true # reserve a tty for the main process
volumes: # volumes section is used for volume creation
  $VOLUME_NAME: # volume name; this equals the docker volume create command
networks: # networks section is used for network management
  $NETWORK_NAME: # network name; this equals the docker network create command
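A minimal concrete file following this structure (service, volume and network names are illustrative):

version: '3'
services:
  web:
    image: nginx:1.15
    ports:
      - "8080:80"
    volumes:
      - web_data:/usr/share/nginx/html
    networks:
      - frontend
volumes:
  web_data:
networks:
  frontend: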

docker-machine
docker-machine

active Print which machine is active

config Print the connection config for machine

create Create a machine

env Display the commands to set up the environment for the Docker client

inspect Inspect information about a machine

ip Get the IP address of a machine

kill Kill a machine

ls List machines

provision Re-provision existing machines

regenerate-certs Regenerate TLS Certificates for a machine

restart Restart a machine

rm Remove a machine

ssh Log into or run a command on a machine with SSH.

scp Copy files between machines

start Start a machine

url Get the URL of a machine

version Show the Docker Machine version or a machine docker version

help Shows a list of commands or help for one command

docker-machine create $MACHINE_NAME

eval $(docker-machine env machine_name) - activates the machine (points the docker client to the docker machine)

Swarm (multinode)
docker swarm

init Initialize a swarm

join Join a swarm as a node and/or manager

leave Leave the swarm

docker node
demote demote one or more nodes from manager in the swarm

inspect display detailed information about the node

ls list swarm nodes

promote promote nodes in the swarm

ps list tasks running on the nodes

rm remove node from the swarm

docker service
create create new service

inspect display detailed information about the service

logs fetch the logs of a service or task

ls list services

ps list tasks

rm remove service

scale scale one or multiple replicated services
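A short hedged sequence for standing up a replicated service on a fresh swarm (service name and image are illustrative):

docker swarm init
docker service create --name web --replicas 3 -p 80:80 nginx:1.15
docker service ls
docker service scale web=5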

Docker Security
- Docker is only as secure as the underlying host.
- Get familiar with Center for Internet Security Benchmarks! (www.cisecurity.org)

Linux:

CIS Distribution Independent Linux Benchmark v1.1.0 (409p.)

Docker:

CIS Docker Community Edition Benchmark v1.1.0 (230p.)

- All of those powerful documents can be downloaded for free!!

docker-bench-security is an automated checker based on the CIS benchmarks


(github.com/docker/docker-bench-security)

docker run \
-it \
--net host \
--pid host \
--userns host \
--cap-add audit_control \
-e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
-v /var/lib:/var/lib \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /usr/lib/systemd:/usr/lib/systemd \
-v /etc:/etc --label docker_bench_security \
docker/docker-bench-security

Tools used for static code analysis:


anchore (github.com/anchore/anchore-engine)

clair (github.com/coreos/clair)

Runtime security:
The host can also be secured from unwanted access by attaching apparmor, seccomp and selinux security profiles.

Recommendations
Images
- Keep images as small as possible
- Cut back number of layers
- Use appropriate - already built images (nginx:1.15 instead of ubuntu + nginx)
- Create your own base images if your images have a lot in common
- Always declare image tag (avoid using default :latest tag)
- Plan and apply consistent naming convention for image tags (staging, production, alpha,
beta)
- Create images that are as general as possible, mount config files using Configs or ConfigMaps
(as well as Secrets).
- Use multistage builds

Containers
- Deploy applications using manifest files (docker-compose or k8s manifests)
- Always set containers limits (especially for cpu and memory)
- Use one of the container orchestration engines
- Remove all volume bindings (store app code in the image)
- Set different env variables
- Specify restart policies and healthchecks
- In Swarm use compose files that will define the entire application (from many microservices).
- In Kubernetes use Helm package manager to maintain application configuration (Highly
recommended)

Docker Swarm
- Design your apps to be stateless - stateless apps are much easier to scale
- A service’s configuration is declarative, and Docker is always working to keep the desired and
actual state in sync.
- Use docker stack with Swarm (compose version ”3”)
- Use docker registry (do not build images using build option in docker-compose.yaml!)
- Consider using even 1 node in Swarm mode:
- use configs and secrets for storing configuration files and credentials
- scale up and down your containers
- built-in HA and Load Balancing

Kubernetes Architecture

https://lh3.googleusercontent.com/EU3DgtFKagWp5S0UpKj-wRgx8WK2nvQ2BG-4dGio57pGNj42A7Lip9IARBba34hIm84-_7zwWt6iImQE8beSqLxpzXm-2w_84M_X2IHQ7jvpWtIDMF81hmq6N4hGSxp6DQoFW5qX

Master components:

1. kube-apiserver - It's the main Kubernetes component, managing all Master and Worker components using a REST API. The Kubernetes API server validates and configures data for the api objects, which include pods, services, replicationcontrollers, and others.

2. etcd - Kubernetes datastore used for saving cluster state in distributed key-value store.
Etcd is the most crucial cluster component, and the only one that needs to be backed up.

3. kube-scheduler - It's a component that is responsible for scheduling pods on nodes.


Scheduler is making decisions taking into account: node health, resource utilization, node
selectors, affinity/anti-affinity rules, taints and tolerations etc.
4. kube-controller-manager - It's responsible for running Kubernetes controllers: replication
controller, endpoint controller, node controller, service account controller, token controller etc.
Controllers are running endless control loops that regulate the state of Kubernetes
components.

5. cloud-controller-manager (optional) - It's a component that integrates Kubernetes with cloud provider services. Using cloud-controller-manager Kubernetes can:
- dynamically request volumes and attach them to the pods
- create (and configure) load balancer and (using services) attach it to the pods
- and many more

Worker (minion) components:

1. kubelet - It's a daemon, installed on each worker node, that is responsible for managing containers through the container runtime.
2. kube-proxy - Provides service abstraction by passing traffic to pods (using iptables).
3. container runtime - It's a set of standards and technologies that together can run containerized applications.

Skew support policy


The Kubernetes project maintains release branches for the most recent three minor releases
(1.21, 1.20, 1.19). Kubernetes 1.19 and newer receive approximately 1 year of patch
support. Kubernetes 1.18 and older received approximately 9 months of patch support.

Current Kubernetes version: 1.21


Kubernetes skew support policy is different for different components.

kube-apiserver - X-1 -> (1.21, 1.20)
controller-manager - X-1 -> (1.21, 1.20)
kube-scheduler - X-1 -> (1.21, 1.20)
kubelet - X-2 -> (1.21, 1.20, 1.19)
kube-proxy - X-2 -> (1.21, 1.20, 1.19)
kubectl - X+1 ... X-1 -> (1.22, 1.21, 1.20)

The recommended way of updating Kubernetes is one minor version per upgrade.

Upgrade process is divided in 2 steps:


- Master upgrade
- Workers upgrade

kubectl command
kubectl is a binary used for Kubernetes objects management. Using this command you can
view, create, delete and edit K8s objects.

kubectl binary installation:


https://kubernetes.io/docs/tasks/tools/install-kubectl/

Syntax:

kubectl [command] [TYPE] [NAME] [flags]

command: Specifies the operation that you want to perform on one or more resources, for
example create, get, describe, delete.

TYPE: Specifies the resource type - pod, deployment, job, cronjob etc

NAME: Specifies the name of the resource. Names are case-sensitive. If the name is omitted,
details for all resources are displayed, for example kubectl get pods.

flags: Specifies optional flags.

Examples:

Command - Description

kubectl run nginx --image nginx - Create nginx deployment
kubectl expose deployment nginx --type=NodePort - Expose nginx deployment on host ports
kubectl get pods - Get list of running pods
kubectl describe pod $POD_NAME - Describe pod settings
kubectl port-forward $POD_NAME $LOCAL_PORT:$POD_PORT - Create tunnel to an app in a pod
kubectl exec -it $POD_NAME -- command - Execute command in a pod
kubectl label pods $POD_NAME LABEL=VALUE - Add label to a pod
kubectl create/apply -f $MANIFEST_FILE - Create object based on a manifest file
kubectl create/apply -f $MANIFESTS_DIR - Create objects based on manifest files in a directory

Official kubectl cheat sheet is available at:


https://kubernetes.io/docs/reference/kubectl/cheatsheet/

Getting help in kubectl command:

kubectl help / kubectl --help - shows available subcommands

kubectl [command] --help - shows available [command] flags

Integrated Development Environment

Visual Studio Code (https://code.visualstudio.com/)


With the Kubernetes, Kubernetes Support and Kubernetes Templates extensions, Visual Studio Code can automatically create entire example manifest files.

How to use templates?


1. Create an empty file with .yaml or .yml extension (save it).
2. Type "se" and you will get a popup with a list of available templates whose names start with the "se" string: Services, ServiceAccounts etc.

IntelliJ IDEA (https://www.jetbrains.com/idea/)


Extension: Kubernetes

Dashboard
Remember that the dashboard is a separate component that is not installed by default on production environments. It can be installed as an addon (minikube, kubespray) or as a separate application (helm).

Minikube:
Installing dashboard: minikube addons enable dashboard

Connecting to dashboard: minikube dashboard

Helm:
https://github.com/kubernetes/dashboard/tree/master/aio/deploy/helm-chart/kubernetes-dashboard

The most secure way of connecting to the Kubernetes Dashboard is setting up a tunnel to kube-api with the kubectl proxy command; after that the dashboard is available at:

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

Remember that the Dashboard does not support certificate-based authentication; you need to use a token instead.

Kubernetes Objects and Features


Children's guides (recommended for beginners):

The Illustrated Children's Guide to Kubernetes
https://www.youtube.com/watch?v=Q4W8Z-D-gcQ
A Kubernetes story: Phippy goes to the zoo
https://www.youtube.com/watch?v=R9-SOzep73w

Managing Kubernetes Objects

Imperative commands:
kubectl run nginx --image nginx
kubectl create deployment nginx --image nginx
Imperative object configuration:
kubectl create/apply -f file/url
Declarative object configuration (recommended!):
kubectl create/apply -f directory/

Required object fields:

apiVersion - Which version of the Kubernetes API you're using to create this object
kind - What kind of object you want to create (treat it as a programming class: https://simple.wikipedia.org/wiki/Class_(programming))
metadata - Data that helps uniquely identify the object, including a name string, UID, and optional namespace (treat it as object instantiation)
spec - The precise format of the object spec is different for every Kubernetes object, and contains nested fields specific to that object (treat it as the object specification)

Each Kubernetes object requires the above fields. In many cases, if you do not specify values for specification fields, cluster defaults will be applied.

API documentation: https://kubernetes.io/docs/reference/kubernetes-api/

Basics

Namespaces
Used for object separation; provides scope for names and splits a physical cluster into logical spaces. You can also limit the resource quota for each namespace. Most Kubernetes objects are scoped to a Namespace. You cannot have two objects of the same type (for example Pods) with the same name in the same namespace.

There are 3 default namespaces (depends on Kubernetes release):


default - default namespace for Kubernetes objects without a namespace
kube-system - namespace for objects created by Kubernetes
kube-public - namespace for public resources

Namespace and DNS

Each Service object created in Kubernetes lives in the <namespace-name>.svc.cluster.local domain. This means that when you want to contact a service from a different namespace you have to use its FQDN:
<service-name>.<namespace-name>.svc.cluster.local
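For example, assuming a Service named backend in the production namespace (hypothetical names):

curl http://backend.production.svc.cluster.local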

Example manifest:

apiVersion: v1
kind: Namespace
metadata:
  name: production

Nodes
From Kubernetes perspective Node is an object (machine) on which workloads can be
started. For K8s it doesn’t matter if Node is a physical or virtual machine. Remember that
nodes cannot be created by Kubernetes itself.

Interaction with node interface can be currently done using 3 components:


- node controller
- kubelet
- kubectl

Node status fields include the following main components (depending on the K8s distribution):

addresses - Hostname, InternalIP, ExternalIP
conditions - OutOfDisk, MemoryPressure, DiskPressure, NetworkUnavailable, Ready, ConfigOK, PIDPressure, KubeletReady
capacity - describes resources available on the node: CPU, memory and maximum Pod number
info (nodeInfo) - Kernel Version, OS Image, Operating System, Architecture, Container Runtime Version, Kubelet Version, Kube-Proxy Version
allocatable - amount of resources that can be used by Pods
daemonEndpoints - kubelet endpoint
images - images available on the node
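These fields can be inspected with:

kubectl get nodes -o wide
kubectl describe node $NODE_NAME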

Pods
Pod is the smallest schedulable unit in Kubernetes. A pod describes an application running on Kubernetes. A pod defines a group of containers that share: namespaces, volumes, ip address and port space. Each Pod has a unique IP address; containers within a Pod can communicate using localhost - this also means that port conflicts can occur between containers in a Pod. Kubernetes Pods are mortal. They are born and they die, and they are not resurrected.

Example manifest:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  # below section defines Pod specification (Pod v1 core)
  restartPolicy: Always
  hostname: nginx
  serviceAccountName: nginx
  containers:
  # below section defines container specification (Container v1 core)
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

PodSpec fields (Pod v1 core):
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#podspec-v1-core

affinity - define affinity rules
containers - container configuration (array)
dnsConfig - manage name resolution inside a Pod
dnsPolicy - configure one of the dns policies
hostname - define a hostname (in Pod runtime)
imagePullSecrets - provide credentials needed for secured docker registries
initContainers - configure containers (and actions) that are executed before the main Pod containers
nodeSelector - declare on which node the Pod should be running (using labels and selectors)
priorityClassName - define the Pod priority class name
restartPolicy - configure conditions according to which the Pod will be restarted
serviceAccountName - define the service account name that will be used to run the Pod
terminationGracePeriodSeconds - define the grace Pod termination time
tolerations - set tolerations
volumes - define volumes that can be mounted in containers

Container v1 core:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#container-v1-core

args - arguments passed to command (Dockerfile CMD instruction)
command - PID1 process in the container (Dockerfile ENTRYPOINT instruction)
env - define environment variables for a container
envFrom - import an array of variables from a ConfigMap or Secret
image - set container image
imagePullPolicy - define image pull policy
lifecycle - configure PostStart or PreStop lifecycle hooks
livenessProbe - configure livenessProbe healthcheck
readinessProbe - configure readinessProbe healthcheck
name - define container name
ports - define container ports
resources - set hard and soft quotas for memory and cpu
securityContext - manage advanced security features
stdin - allocate runtime buffer for stdin
tty - allocate a tty for the container itself
volumeMounts - volume mount point in a container
workingDir - container working directory

command and args fields
With the use of command and args you can overwrite the default main container process and its arguments that are defined in the Dockerfile.

Description - Dockerfile instruction - K8s field

Main command used to start the container - ENTRYPOINT - command
The arguments passed to the main process - CMD - args

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    app: ubuntu
spec:
  containers:
  - name: ubuntu-command-example
    image: ubuntu
    command: ["printenv"]
    args: ["HOSTNAME", "HOME"]
  restartPolicy: OnFailure

kubectl logs ubuntu


ubuntu
/root

Notice that the command and args fields expect lists to be passed - even a one-element list.

Pod lifecycle phases:


Pending - Accepted by K8s but containers not yet created (Pod scheduling decision + image download)
Running - Pod bound to a node and all containers have been created
Succeeded - All containers in the Pod have terminated in success
Failed - All containers in the Pod have terminated (at least one container has terminated in failure)
Unknown - State of the Pod could not be obtained (communication error with a node)
Terminating - State of the Pod between sending the terminate signal (SIGTERM) and the kill signal (SIGKILL) - by default 30 sec

Static Pods:
Static pods are a way of creating and managing pods using kubelet alone (without kube-api). Pods are created/deleted based on local directory content or a remote url. The pod manifest path and url are checked by kubelet every 20 seconds. When a static pod is created, kubelet tries to register it in the api (the pod will be visible in kube-api but cannot be managed from there).

kubelet static pod flags:


kubelet --pod-manifest-path=<the directory>
kubelet --manifest-url=<URL>
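A minimal hedged sketch: drop a manifest into the watched directory (the path below is the common kubeadm default; it depends on your kubelet configuration):

# /etc/kubernetes/manifests/static-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-nginx
spec:
  containers:
  - name: nginx
    image: nginx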

Labels and Selectors


Labels are key-value pairs that can be attached to different kubernetes objects. You can
assign labels to: Deployments, Pods, Jobs, Services etc. They help with organisation of K8s
objects according to different purposes: environment, client, or use-case. They can be
added to an object at creation time and can be added or modified at runtime.

Assigning labels:
metadata:
  labels:
    app: nginx

Label selectors are the core grouping primitive in Kubernetes. They are used by users to select a set of objects. Not all objects support selectors; some of the objects that are tightly related to selectors are: Deployments, ReplicaSets, Services.

Assigning selectors:
- equality-based selectors:

selector:
  app: nginx

- set-based selectors:

selector:
  matchLabels:
    app: nginx

or

selector:
  matchExpressions:
  - {key: app, operator: In, values: [nginx]}

Equality-based selectors:
- operators: =, == and !=
- examples: env = dev, env != prod
- kubectl: kubectl get pods -l env=prod
- manifest form:

selector:
  env: prod

- Matching objects must satisfy all of the specified label constraints, though they may have additional labels as well.

Set-based selectors:
- operators: in, notin and exists
- examples: env in (prod, dev), env notin (tests), env, !env
- kubectl: kubectl get pods -l 'env in (prod)'
- manifest form:

selector:
  matchLabels:
    env: prod
  # or
  matchExpressions:
  - {key: env, operator: In, values: [prod]}
  - {key: env, operator: NotIn, values: [dev]}

- Not every resource supports set-based selectors.

Using set-based selectors you can build expressions like:

env in (production, development)
env notin (tests)
env
!env

Replication Controllers
ReplicationController (rc or rcs) ensures that a specified number of Pods is running in a cluster. Replication Controller doesn't support set-based selectors - use ReplicaSet when it's possible.

Example RC manifest:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  # below section defines ReplicationController specification
  replicas: 2
  selector:
    app: nginx
  # below section defines Pod configuration
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

ReplicaSets
ReplicaSet (rs) is a newer version of the Replication Controller that supports set-based selectors. It ensures that a specified number of Pods is running in a cluster.

Example RS manifest:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  # below section defines ReplicaSet specification
  replicas: 2
  selector:
    matchExpressions:
    - {key: app, operator: In, values: [nginx]}
  # below section defines Pod configuration
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Deployments
Deployments control ReplicaSets and Pods configuration using one manifest file. After creation, the state of Pods and ReplicaSets is checked by the Deployment Controller. Deployments are used for long-running processes - daemons. If anything happens to a Pod that is connected to a Deployment (the Pod is deleted, the node is destroyed), the Pod will be recreated.

Example Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  # below section defines Deployment specification
  strategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  # below section defines Pod configuration
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

DeploymentSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#deploymentspec-v1-apps

replicas - number of desired Pod replicas (managed by ReplicaSets)
revisionHistoryLimit - specify how many old ReplicaSets for this Deployment you want to retain. The rest will be garbage-collected in the background. By default, it is 10.

strategy:
  type: Recreate - kills existing Pods before creating new ones

strategy:
  type: RollingUpdate - defines a rolling update strategy (no outage)
  rollingUpdate:
    maxUnavailable: 2 - specifies the maximum number of Pod replicas that can be unavailable during the update process
    maxSurge: 2 - specifies the maximum number of Pods that can be created over the desired number of Pod replicas

Example update process for 3 pod replicas:

kubectl get deployments


NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx 3 3 3 3 3d

Pod-template-hash label
This is an automatically created label that is added by the Deployment controller to every ReplicaSet that is created using the Deployment manifest. The value of this label is a hash created from the PodTemplate field (PodSpec). If you do not change anything in the PodSpec section and you try to update your app, Kubernetes will not trigger an application upgrade (deployment.apps/nginx unchanged).

labels:
  app: nginx
  pod-template-hash: f89759699

Updating a Deployment
A Deployment’s rollout is triggered if and only if the Deployment’s pod template (that is,
.spec.template) is changed, for example if the labels or container images of the template
are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.

In API version apps/v1, a Deployment’s label selector is immutable after it gets created.
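Typical rollout commands (deployment and container names are illustrative):

kubectl set image deployment/nginx nginx=nginx:1.16.1
kubectl rollout status deployment/nginx
kubectl rollout history deployment/nginx
kubectl rollout undo deployment/nginx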

Possible update strategy methods available in Kubernetes:


- Recreate
- Ramped
- Blue/Green Deployment
- Canary
- A/B testing (requires third-party resources)

https://blog.container-solutions.com/kubernetes-deployment-strategies

Jobs
Jobs are tasks that run to completion (they do not behave like daemons - long-running processes). Jobs are resources that once created cannot be updated (you need to delete and recreate them). After Job completion Pods are not deleted - logs, warnings and diagnostic output stay until you delete them.
Example manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  # below section defines Job specification
  completions: 3
  parallelism: 3
  # below section defines Pod configuration
  template:
    metadata:
      name: example-job
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl"]
        args: ["-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

JobsSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#jobspec-v1-batch

activeDeadlineSeconds - specify a time (relative to startTime) of a Job after which K8s will try to terminate all related Pods (the Job status will become type: Failed with reason: DeadlineExceeded)
backoffLimit - number of retries before marking the job as failed
completions - number of successfully finished pods
parallelism - the maximum desired number of pods the job should run at any given time
ttlSecondsAfterFinished - limits Job lifetime by deleting it after the specified number of seconds (currently this is an alpha feature and has to be enabled explicitly - TTLAfterFinished)

Remember that the Job object requires Pod restartPolicy set to Never or OnFailure
(default policy is Always).

Cronjobs
Runs jobs at specified point in time:
- Once at a specified point in time
- Repeatedly at a specified point in time

Kubernetes CronJobs are using the cron format to define when jobs will be executed.

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │               7 is also Sunday on some systems)
# │ │ │ │ │
# * * * * * command to execute

Example manifest:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  # below section defines CronJob specification
  schedule: "0 0 * * *"
  concurrencyPolicy: Forbid
  # below section defines Job specification
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: example-job
            image: ubuntu
            command: ["/bin/sh", "-c", "--"]
            args: ["for i in `seq 1 1 100`; do echo $i; done"]
          restartPolicy: Never

CronJobSpec v1beta1 batch:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#cronjobspec-v1beta1-batch

concurrencyPolicy - Specifies how to treat concurrent executions of a Job.
- Allow - allows concurrent jobs (default)
- Forbid - skips the next run if the previous one hasn't finished
- Replace - kills the currently running job and creates a new one
failedJobsHistoryLimit - The number of failed finished jobs to retain.
schedule - cron format schedule of a job
startingDeadlineSeconds - time buffer for missed job execution; after this deadline the job will be counted as failed
successfulJobsHistoryLimit - The number of successful finished jobs to retain.
suspend - suspends job execution; it will not suspend an already running job (it suspends execution of new Jobs, not currently running ones)
jobTemplate - specifies Job configuration

DaemonSets
Ensures that one instance of a Pod is running on each (or some) node. When new nodes
are added to the cluster K8s automatically runs Pod from a DaemonSets on a new node.
Mainly used for:
- Cluster storage daemons (ceph, glusterd)
- Logs collector daemons (fluentd, logstash)
- Monitoring daemons (Prometheus Node Exporter, Datadog agent...)

Why should you use DaemonSets?


Ability to monitor and manage logs for daemons in the same way as applications.
Same config language and tools (e.g. Pod templates, kubectl) for daemons and applications.

Example manifest file:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: daemonset-example
spec:
  # below section defines DaemonSet specification
  updateStrategy:
    type: OnDelete
  selector:
    matchLabels:
      app: daemonset-example
  # below section defines Pod configuration
  template:
    metadata:
      labels:
        app: daemonset-example
    spec:
      containers:
      - name: daemonset-example
        image: ubuntu
        command:
        - /bin/sh
        args:
        - -c
        - >-
          while [ true ]; do
            echo "DaemonSet running on $(hostname)" ;
            sleep 10 ;
          done

StatefulSets
StatefulSets are the way of managing Pods that persist their identity. Each Pod managed by StatefulSets has a persistent identifier that it maintains across any rescheduling. Remember that a StatefulSet requires a special type of Service object - a headless Service, without an ip address (in that case service discovery has to be implemented by a third-party tool).

Pods are using the following naming convention:

<statefulset name>-<ordinal index>

nginx-0, nginx-1, ...

StatefulSets Pods behaviour:

- each Pod has a unique network identifier (dns name)
- each Pod requires persistent storage (manually created before deployment or using a persistent volume claim)
- deployment and scaling up of Pods is done in an ordered manner (ascending ordinal order)
- deletion and termination of Pods is done in an ordered manner (descending ordinal order)
- rolling updates of Pods also proceed in descending ordinal order

If You want to reach a Pod managed by StatefulSet use one of the following approaches:
- in the same namespace: <pod_name>.<service_name>
- cluster wide FQDN:
<pod_name>.<service_name>.<namespace>.svc.cluster.local

StatefulSetSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#statefulsetspec-v1-apps

serviceName - name of the Service attached to the StatefulSet; the Service has to be created before the StatefulSet object
podManagementPolicy - controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling down
  OrderedReady (default) - Pods are scaled up/down one by one
  Parallel - Pods are scaled up/down all at once
replicas - number of Pod replicas
revisionHistoryLimit - the maximum number of revisions that will be maintained in the StatefulSet's revision history
updateStrategy - defines how Pods are updated
  rollingUpdate (default)
    partition - if a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated and, even if they are deleted, they will be recreated at the previous version.
  OnDelete - the Pod is deleted and a new one is created afterwards
volumeClaimTemplates - list of claims that pods are allowed to reference

Example manifest (Service & StatefulSet):

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  # below section defines StatefulSet specification
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 1
  # below section defines Pod configuration
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: nginx-html-volume
          mountPath: /usr/share/nginx/html
      volumes:
      - name: nginx-html-volume
        awsElasticBlockStore:
          volumeID: <volume-id>
          fsType: ext4

Annotations
Annotations are a special kind of metadata that are attached to objects. Labels can be used
to select objects and to find collections of objects that satisfy certain conditions. In contrast,
annotations are not used to identify and select objects. Different client tools and libraries can
retrieve this metadata and use them for different purposes. Annotations are not used
internally by K8s. The metadata in an annotation can be small or large, structured or
unstructured, and can include characters not permitted by labels.

Example Annotation snippet:

metadata:
  annotations:
    key1: "value"
    key2: "value"

Helm snippet for Pod template hash changes:

template:
  metadata:
    labels:
      app: {{ .Release.Name }}
    annotations:
      checksum/config: {{ now }}

Adding annotations with kubectl:


kubectl annotate <resource>/<name> key=value

Authentication and Authorization


Authentication - the process or action of proving or showing something to be true, genuine, or valid. Kubernetes distinguishes 2 types of users:
- users
- service accounts
Authorization - define a set of privileges that are assigned to authenticated entity. Currently
there are 4 authorization modules in k8s:
- Node
- ABAC
- RBAC (Recommended)
- Webhook

Users are manually managed by Kubernetes admins or by external identity providers. User
entity can be confirmed by one of the following authentication strategies:
- Certificate-based (X509 Client Certs, --client-ca-file=$CERT_FILE)
- Username/Password-based (--basic-auth-file=$CRED_FILE)
- Pre-Generated Token-Based (--token-auth-file=$TOKEN_FILE)
- Service Account Tokens
- OpenID Connect Tokens

Each user object can contain the following key: value pairs:
- Username: name that uniquely identifies the end user, examples: admin,
[email protected]
- UID: a string (number) which identifies the end user
- Groups: define a group name to which user belongs
- Extra fields: additional metadata that can be used by authorizers

Kubernetes API request flow

source: https://kubernetes.io/docs/concepts/security/controlling-access/

Admissions Controllers
Admission Controllers are pieces of code that intercept requests to the API (after authentication and authorization) before the objects are persisted. There are 2 kinds of Admission Controllers: mutating (they can modify the objects that they admit) and validating (they validate requests sent to the API). Admission Controllers are compiled into the API server.

Why do I need Admission Controllers?
https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/#why-do-i-need-admission-controllers

Admission Controllers that are available by default (in current K8s version):
CertificateApproval, CertificateSigning,
CertificateSubjectRestriction, DefaultIngressClass,
DefaultStorageClass, DefaultTolerationSeconds, LimitRanger,
MutatingAdmissionWebhook, NamespaceLifecycle,
PersistentVolumeClaimResize, Priority, ResourceQuota, RuntimeClass,
ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition,
ValidatingAdmissionWebhook

If you want to disable one of the default Admission plugins you need to use the --disable-admission-plugins flag.
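Admission plugins are toggled with kube-apiserver flags, for example (the plugin selection is illustrative; other required kube-apiserver flags are omitted):

kube-apiserver --enable-admission-plugins=NodeRestriction,LimitRanger --disable-admission-plugins=DefaultStorageClass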

Service Accounts
Service accounts are used for assigning Kubernetes API permissions to Pods.
- All Service Account Users are using Service Account Tokens
- They are stored as credentials using Secrets (Secrets are also mounted in pods to
allow communication between services)
- Service Accounts are specific to a namespace
- Can be created by API or manually using objects
- Any API call that is not authenticated is considered as an anonymous user

example manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-service-account

Creating service account with kubectl:


kubectl create serviceaccount admin-service-account

Assigning a service account to a Pod (PodSpec):

spec:
  serviceAccountName: admin-service-account
  containers:
  - name: nginx
    ...

automountServiceAccountToken - by default it is set to true, which means that a token related to the ServiceAccount is mounted inside of the container. If you don't need access to the K8s API from the container you should change this option to false. The option can be overwritten at the Pod.spec level. The token is mounted at the /run/secrets/kubernetes.io/serviceaccount/token path in your containers.

ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-service-account
automountServiceAccountToken: false

Pod level:

spec:
  serviceAccountName: admin-service-account
  automountServiceAccountToken: false
  containers:
  - name: nginx
    ...

Certificate-based Authentication
Certificates can be manually generated by: easyrsa, openssl, cfssl.
Client certificates need to be defined on API level by --client-ca-file=$CERT_FILE
flag. $CERT_FILE needs to consist of one or more certificate authorities. Username is
defined by CN (Common Name) of certificate, groups are defined by O (organization)
certificate field.

Provisioning a CA and Generating TLS Certificates:


1. Generate the CA configuration file, certificate and private key.
2. Generate the Admin Client Certificates.

You need to create a user certificate with a proper Common Name and Organization, that is
signed by the CA that was used for starting the Kubernetes API (--client-ca-file flag)
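
A minimal openssl sketch of generating such a client certificate (the file names and the tom/admins identity are example values):

# Generate a key and a CSR carrying the username (CN) and group (O)
openssl genrsa -out tom.key 2048
openssl req -new -key tom.key -out tom.csr -subj "/CN=tom/O=admins"
# Sign the CSR with the cluster CA (the one passed via --client-ca-file)
openssl x509 -req -in tom.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out tom.crt -days 365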

kubeconfig
Kubeconfig is a file used by the kubectl command to access the Kubernetes API - treat it as a
key to Your cluster. A kubeconfig file is divided into 3 sections:
- user (defines username and authentication method)
- cluster (API endpoint and optional certificate settings)
- context (connects user and cluster entries, optionally sets a default namespace)

Kubeconfig file priorities (order matters, from highest to lowest priority):


- kubectl commands with --kubeconfig $PATH_TO_KUBECONFIG_FILE flag
- KUBECONFIG environment variable
- Default kubeconfig location is: ~/.kube/config
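
A minimal sketch of a kubeconfig file showing all 3 sections (names, server address and credentials are example values, matching the commands below):

apiVersion: v1
kind: Config
clusters:
- name: local-server
  cluster:
    server: https://localhost:8080
    insecure-skip-tls-verify: true
users:
- name: admin
  user:
    username: admin
    password: secret
contexts:
- name: default-context
  context:
    cluster: local-server
    user: admin
    namespace: production
current-context: default-context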

kubectl config - sub-command group for managing kubeconfig file

Setting user section:


kubectl config set-credentials admin --username=admin
--password=secret
Setting cluster section:
kubectl config set-cluster local-server
--server=https://localhost:8080 --insecure-skip-tls-verify=true
Setting context:
kubectl config set-context default-context --cluster=local-server
--user=myself
Choosing current context:
kubectl config use-context default-context
Setting default namespace:
kubectl config set contexts.default-context.namespace production

Role-Based Access Control


RBAC defines sets of permissions that can be assigned to users, groups or service
accounts. RBAC needs 3 things to work: User/ServiceAccount/Group,
Role/ClusterRole and RoleBinding/ClusterRoleBinding. Remember that RBAC
uses PoLP (Principle of Least Privilege) - by default everything is forbidden, and you define
what is allowed. There are 2 kinds of policies (Roles):

Roles - define permissions restricted to 1 namespace


ClusterRoles - define permissions for entire cluster

Example Role/ClusterRole manifest:
kind: Role # or ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default # omitted in ClusterRole
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

Command line counterpart:


kubectl create role pod-reader --verb=get,watch,list --resource=pods

Possible configurations:

apiGroups: core (""), apps, batch, extensions, policy, networking.k8s.io, autoscaling, ... or *
resources: pods, configmaps, endpoints, persistentvolumeclaims, replicationcontrollers, secrets, serviceaccounts, services, ... or *
verbs: create, delete, deletecollection, get, list, patch, update, watch, impersonate or *

Asterisk ("*") means all possible.

apiGroup is the first part of the object apiVersion field (apiGroup/objectVersion).

apiVersion field                  apiGroup
apiVersion: v1                    ""
apiVersion: apps/v1               apps
apiVersion: batch/v1              batch
apiVersion: extensions/v1beta1    extensions

You can check an object's apiGroup, scope and abbreviation by executing:

kubectl api-resources

Example RoleBinding manifest:

kind: RoleBinding # ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader
  namespace: default
subjects:
- kind: User # Group, ServiceAccount
  name: tom
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role # ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Command line counterpart:


kubectl create rolebinding pod-reader --role pod-reader --user tom
--namespace default

When You are binding a User/Group/ServiceAccount to a Role/ClusterRole You need
to remember to use the proper apiGroup for the kind of identity being bound.

User binding:
subjects:
- kind: User
  name: tom
  apiGroup: rbac.authorization.k8s.io

ServiceAccount binding:
subjects:
- kind: ServiceAccount
  name: scripts
  apiGroup: ""

Group binding:
subjects:
- kind: Group
  name: admins
  apiGroup: rbac.authorization.k8s.io

If you want to check your permission you can use kubectl auth can-i command:
kubectl auth can-i <verb> <resource>
$ kubectl auth can-i get pods
yes

Notice that the rules field in Role/ClusterRole and the subjects field in
RoleBinding/ClusterRoleBinding are lists, which means that You can define many
rules in 1 Role, and you can attach a Role to many identities in 1 RoleBinding.

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-job-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "watch", "list"]
...

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader-binding
  namespace: default
subjects:
- kind: User
  name: tom
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  ...
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Assigning Role to a Pod

Assigning Role to a User/Group

In each Kubernetes cluster you can find predefined groups and service accounts. You can
check them by executing:

kubectl get clusterroles

kubectl get sa --all-namespaces

Notice that in each namespace you can find a default service account that is attached to
all Pods in the namespace.


Assigning permissions to default service accounts and different identities:

For the default service account in the kube-system namespace:

subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system

For all service accounts in the test namespace:

subjects:
- kind: Group
  name: system:serviceaccounts:test
  apiGroup: rbac.authorization.k8s.io

For all service accounts in any namespace:

subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io

For all authenticated users:

subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

Default roles and role bindings:

https://kubernetes.io/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings
Networking

Services
Services are REST objects that act as a persistent endpoint and load balancer for pod
communication. Each service name and IP is propagated by Kubernetes DNS to the entire
K8s cluster. A Service redirects traffic based on the relation between selectors assigned to
Services and labels assigned to Pods. Traffic that arrives at the Service name will be
redirected to Pods with the matching label. In the world of microservices, containers always
communicate with each other using Services - not directly!

Kubernetes supports 2 primary modes of service discovery - ENV variables and DNS.

ServiceSpec v1 core

type - determines how the Service is exposed. Defaults to ClusterIP. Valid options are
ExternalName, ClusterIP, NodePort, and LoadBalancer
clusterIP - defines the cluster IP related to the service (can be set automatically by the master
or manually)
externalIPs - a list of IP addresses for which nodes in the cluster will also accept traffic for
this service
externalName - the external reference that kubedns or an equivalent will return as a CNAME
record for this service
healthCheckNodePort - specifies the healthcheck nodePort for the service
loadBalancerIP - only applies to Service type: LoadBalancer. The LoadBalancer will get
created with the IP specified in this field (needs to be supported by the cloud provider)
loadBalancerSourceRanges - restricts traffic through the cloud-provider load-balancer to the
specified client IPs
ports - the list of ports that are exposed by this service
selector - route service traffic to pods with label keys and values matching this selector
sessionAffinity - supports "ClientIP" and "None". Used to maintain session affinity (enables
client-IP-based session affinity)
sessionAffinityConfig - contains the configuration of session affinity:
  clientIP:
    timeoutSeconds: 10 (default is 10800 - 3 hours, max value is 86400)

ServicePort v1 core

name - the name of this port within the service. This must be a DNS_LABEL
nodePort - the port on each node on which this service is exposed when type=NodePort or
LoadBalancer. Usually assigned by the system
port - The port that will be exposed by this service.
protocol - the IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is
TCP.
targetPort - number or name of the port to access on the pods targeted by the service.
Number must be in the range 1 to 65535

Service Types:

ClusterIP (default) - Creates a service that is accessible only inside the cluster.
NodePort - Publishes the service on nodes using one of the dynamically allocated high ports
(default range: 30000-32767); the port can also be set to a particular value (using the
nodePort field).
LoadBalancer - Provisions an external loadbalancer using the cloud provider's infrastructure.
ExternalName - Maps the service to an external DNS name.
ExternalIP - Maps an external IP to a Kubernetes service. External IPs need to route to one or
more cluster nodes.
Headless - Service without an IP address.

ClusterIP (default) - Service used for communication between applications inside of the
Kubernetes cluster. Traffic for Service endpoints (Pods attached to a Service) is load
balanced to the Pods pointed at by the selector. Services are sets of routing policies that are
injected into node configuration by kube-proxy. Each service can be reached by:
- in its namespace: $SERVICE_NAME
- in the cluster: $SERVICE_NAME.$NAMESPACE_NAME or
$SERVICE_NAME.$NAMESPACE_NAME.svc.cluster.local

Example manifest file:

kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  selector:
    app: nginx
  type: ClusterIP
  ports:
  # below section defines Service specification (ServicePort v1 core)
  - port: 80

NodePort Service is used for providing access from outside of the cluster; the application is
then accessible at <node-ip>:<port> (the port has to be from the 30000-32767 range).
NodePort redirects traffic to an automatically created ClusterIP Service. Services are sets
of routing policies that are injected into node configuration by kube-proxy.

Example manifest file:

apiVersion: v1
kind: Service
metadata:
  labels:
    service: nginx
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  type: NodePort
  selector:
    app: nginx
  ports:
  # below section defines Service specification (ServicePort v1 core)
  - name: "443"
    port: 443
    nodePort: 30443

LoadBalancer - This Service will provision an external loadbalancer using the cloud provider's
infrastructure.

Example manifest:

kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  selector:
    app: nginx
  type: LoadBalancer
  ports:
  # below section defines Service specification (ServicePort v1 core)
  - protocol: TCP
    port: 80

ExternalName - creates an alias (CNAME record) in Kubernetes DNS. In this case the
redirection is made at the DNS level - not at the routing level.

kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  type: ExternalName
  externalName: example.nginx.site

ExternalIP - Maps an external IP to a Kubernetes service. External IPs need to route to one or
more cluster nodes.

kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  selector:
    app: nginx
  externalIPs:
  - 192.168.1.1
  ports:
  # below section defines Service specification (ServicePort v1 core)
  - name: http
    protocol: TCP
    port: 80
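
Headless - a Service without a cluster IP; DNS then returns the IPs of the backing Pods directly. A minimal sketch (setting clusterIP: None is what makes a Service headless; the name and selector are example values):

kind: Service
apiVersion: v1
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80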

CNI

Network plugin requirements:

- all containers can communicate with all other containers without NAT
- all nodes can communicate with all containers (and vice-versa) without NAT
- the IP that a container sees itself as is the same IP that others see it as

Some of the CNI plugins and their features:

Provider    Network Model    Route Distribution    Network Policies    Mesh    External Datastore    Encryption    Ingress/Egress Policies    Support
Calico      Layer 3          Yes                   Yes                 Yes     Etcd                  No            Yes                        Yes
Canal       Layer 2 vxlan    N/A                   Yes                 No      Etcd                  No            Yes                        No
flannel     vxlan            No                    No                  No      Etcd                  No            No                         No
Weave Net   Layer 2 vxlan    N/A                   Yes                 Yes     No                    Yes           Yes                        Yes

Container Network Interface:

- Is an interface between the container runtime and the network implementation
- Configures the network interfaces and routes
- Handles only network connectivity

CNI specification:
https://github.com/containernetworking/cni/blob/master/SPEC.md

CNI components:
- CNI binary: configures network interface of the Pod
- Daemon: manages routing across the cluster (installed on every Node)

Calico routing modes (great explanation):


https://octetz.com/docs/2020/2020-10-01-calico-routing-modes/

Network Policies
NetworkPolicies are Kubernetes internal firewalls. A network policy is a specification of
how groups of pods are allowed to communicate with each other and other network
endpoints.

By default, pods are non-isolated; they accept traffic from any source.

Pods become isolated by having a NetworkPolicy that selects them. Once there is any
NetworkPolicy in a namespace selecting a particular pod, that pod will reject any
connections that are not allowed by any NetworkPolicy.

NetworkPolicies require 2 components to work: CNI provider (Calico, Canal, Weave Net, etc)
and NetworkPolicy manifest. CNI providers can be deployed using Kubernetes manifest,
installation instructions can be found at websites of CNI providers.

Difference between egress and ingress traffic:

Example NetworkPolicy manifests:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: api-allow
spec:
  podSelector:
    matchLabels:
      app: bookstore
      role: api
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: bookstore

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978

NetworkPolicy Spec:
podSelector: Each NetworkPolicy includes a podSelector which selects the grouping
of pods to which the policy applies (An empty podSelector selects all pods in the
namespace).

policyTypes: Can be set to Ingress, Egress, or both. The policyTypes field indicates the
traffic direction: into a Pod (Ingress) or out of a Pod (Egress). If no policyTypes are specified
on a NetworkPolicy, Ingress will be set by default.

ingress: Each NetworkPolicy may include a list of whitelist ingress rules. Each rule
allows traffic which matches both the from and ports sections. Ingress field can contain 3
source fields: ipBlock, namespaceSelector and a podSelector.

egress: Each NetworkPolicy may include a list of whitelist egress rules. Each rule allows
traffic which matches both the to and ports sections. The example policy contains a single
rule, which matches traffic on a single port to any destination in 10.0.0.0/24.

Deny all ingress traffic in namespace (empty policy attached to all pods):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Deny all egress in namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress

Allow all ingress in namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress

Deny all ingress and egress traffic in namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

For the most security demanding environments and clients like banks and financial services
you should start with the default-deny-all NetworkPolicy and then you should add
NetworkPolicies that will allow for the traffic between your apps.

This policy selects all pods in the namespace and blocks all of their traffic. This is an
implementation of the Principle of Least Privilege (PoLP) at the NetworkPolicy level.

If you create a deny-all (ingress & egress) policy and still want to allow traffic between your
pods, you need to define both ingress and egress policies and attach them to the proper
pods.

Egress policy attached to pod1 (allow traffic to pod2):

...
egress:
- to:
  - podSelector:
      matchLabels:
        app: "pod2"
policyTypes:
- Egress

Ingress policy attached to pod2 (allow traffic from pod1):

...
ingress:
- from:
  - podSelector:
      matchLabels:
        app: "pod1"

Deny from all other namespaces - use case: network connectivity isolation at namespace level.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: prod
  name: deny-from-other-ns
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}

Allow from all other namespaces - use case: allowing access for a service that has to be available to many namespaces, for example: a central database.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: prod
  name: allow-all-ns
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - namespaceSelector: {}

Allow from one namespace - use case: allowing access for a service from a particular namespace.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: prod
  name: allow-develop-ns
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          env: default

What You can't do with network policies:

https://kubernetes.io/docs/concepts/services-networking/network-policies/#what-you-can-t-do-with-network-policies-at-least-not-yet

DNS in Kubernetes
The main DNS function in Kubernetes is providing name resolution and service discovery for K8s
Services and Pods. The most popular DNS server used in Kubernetes is CoreDNS.

What objects get DNS records?


- Services (my-svc.my-namespace.svc.cluster-domain.example)
- Pods (pod-ip-address.my-namespace.pod.cluster-domain.example)

Any pods created by a Deployment or DaemonSet exposed by a Service have the following
DNS resolution available:
pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example.
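
A quick sketch of checking Service name resolution from inside the cluster (nginx/default are example names, and tutum/dnsutils is just one example debugging image):

kubectl run -it --rm dnsutils --image=tutum/dnsutils --restart=Never \
  -- nslookup nginx.default.svc.cluster.local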

A cluster can use one of the following dnsPolicies:
- Default - name resolution configuration is inherited from the node
- ClusterFirst - any queries for the *.cluster.local domain are sent to the
kube-dns Service, other queries are sent to the upstream server inherited from the
node.

Kubernetes DNS-Based Service Discovery:


https://github.com/kubernetes/dns/blob/master/docs/specification.md

Pods DNS policies:


- Default - DNS config inherited from Node
- ClusterFirst - any DNS query that do not match cluster domain suffix (by default
cluster.local) is forwarded to the upstream nameserver inherited from the node
- ClusterFirstWithHostNet (should be used with hostNetwork) - Pods are
running in hostNetwork
- None - ignore all DNS settings from the Kubernetes, DNS settings need to be
provided by dnsConfig field

spec:
  dnsPolicy: "None"

Overwriting DNS at Pod level:

nameservers - define name servers that will be used for name queries (max 3)
searches - define domains in which Pod will search hostname queries (max 6)
options - list of optional objects
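
A minimal sketch of such a dnsConfig section (the addresses, search domain and option values are example values):

spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 1.2.3.4
    searches:
    - ns1.svc.cluster-domain.example
    options:
    - name: ndots
      value: "2"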

CoreDNS

Default Corefile configuration:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns

CoreDNS has a pluggable architecture, plugins can be enabled using Corefile configuration
file.
https://coredns.io/plugins/

Description of CoreDNS plugins available in the default Corefile:

https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#coredns-configmap-options

How to set up stub domains and upstream servers in Pods?

https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configuration-of-stub-domain-and-upstream-nameserver-using-coredns
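
A minimal sketch of a stub-domain block that could be appended to the Corefile above (consul.local and 10.150.0.1 are example values following the pattern from the linked documentation):

consul.local:53 {
    errors
    cache 30
    forward . 10.150.0.1
}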

Ingress
An API object that manages external access to the services in a cluster, typically HTTP.
Ingress is a collection of load balancing, SSL termination and name-based virtual hosting
rules. Ingress requires 2 components: ingress controller and ingress object
manifests.

Nginx ingress controller can be installed using deployment manifest, instruction can be found
at: https://github.com/nginxinc/kubernetes-ingress/blob/master/build/README.md

Default ingress that redirects all incoming traffic to one service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  defaultBackend:
    service:
      name: web
      port:
        number: 80

The above example can be used, for example, with nginx, which will then act as a reverse proxy
passing traffic to your apps. In that case only nginx will be available from the outside of your
cluster.

Example manifest file (name-based):

K8s Ingress config:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
spec:
  rules:
  - host: warsaw.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: warsaw-web
            port:
              number: 80
  - host: gdansk.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: gdansk-web
            port:
              number: 80

nginx config counterpart:

server {
    server_name warsaw.example.com;
    listen 80;
    location / {
        proxy_pass http://warsaw-web;
    }
}
server {
    server_name gdansk.example.com;
    listen 80;
    location / {
        proxy_pass http://gdansk-web;
    }
}

Hostname wildcards:

Host        Host header        Match?
*.foo.com   bar.foo.com        Matches based on shared suffix
*.foo.com   baz.bar.foo.com    No match, wildcard only covers a single DNS label
*.foo.com   foo.com            No match, wildcard only covers a single DNS label

Example manifest file (reverse-proxy, redirection based on url):

K8s Ingress config:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: "example.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
      - path: /metrics
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 80

nginx config counterpart:

server {
    server_name example.com;
    listen 80;
    location / {
        proxy_pass http://web/;
    }
    location /metrics {
        proxy_pass http://grafana/;
    }
}

pathType configuration options:

ImplementationSpecific: With this path type, matching is up to the IngressClass.


Implementations can treat this as a separate pathType or treat it identically to Prefix or Exact
path types.

Exact: Matches the URL path exactly and with case sensitivity.

Prefix: Matches based on a URL path prefix split by /. Matching is case sensitive and done
on a path element by element basis. A path element refers to the list of labels in the path
split by the / separator. A request is a match for path p if every p is an element-wise prefix of
p of the request path.

Example ssl certificate manifest files:

K8s Ingress config:

apiVersion: v1
kind: Secret
metadata:
  name: secret-web-ssl
type: Opaque
data:
  tls.crt: base64 encoded cert
  tls.key: base64 encoded key
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-web-ssl
spec:
  tls:
  - hosts:
    - example.com
    secretName: secret-web-ssl
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80

nginx config counterpart:

server {
    server_name example.com;
    listen 80;
    ssl_certificate /ssl/tls.crt;
    ssl_certificate_key /ssl/tls.key;
    location / {
        proxy_pass http://nginx;
    }
}

cert-manager (https://cert-manager.io/docs/) is a great tool that can automate free Ingress


ssl certificates renewing with Lets Encrypt (https://letsencrypt.org/).

Accessing applications in Kubernetes

- Services - NodePort, LoadBalancer, ExternalIP
- Ingress - using name-based or reverse-proxy redirection
- kubectl port-forward - creates a proxy to a Pod, Deployment or Service
(applications are available behind a localhost tunnel), see the example below
- hostNetwork (PodSpec) - gives access to all host interfaces, used by network
plugins for network management (flanneld)
- hostPort (Pod Container v1 core array) - publishes a container application on a host
port (it works the same as the docker run -p option)
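
A quick port-forward sketch (the service name and ports are example values):

kubectl port-forward svc/nginx 8080:80
# the application is now reachable at http://localhost:8080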

Storage

Volumes
Volumes are external data sources that can be mounted inside of a pod. Kubernetes
supports many different types of volumes, and each of them can be used by containers.
Different volume types behave in different ways. Each container in the Pod must
independently specify where to mount each volume.

awsElasticBlockStore
azureDisk
azureFile
cephfs
configMap
csi
downwardAPI
emptyDir
fc (fibre channel)
flocker
gcePersistentDisk
gitRepo (deprecated)
glusterfs
hostPath
iscsi
local
nfs
persistentVolumeClaim
projected
portworxVolume
quobyte
rbd
scaleIO
secret
storageos
vsphereVolume

Each volume type is mounted in a different way, instructions about mounting particular
volumes can be found at: https://kubernetes.io/docs/concepts/storage/volumes/

emptyDir:
- emptyDir is a temporary directory that is used for sharing data between containers in a
Pod.
- emptyDir lives as long as Pod is running on a k8s node
- By default Kubernetes is using a storage medium that is attached to a node, but there is
also a possibility to mount tmpfs (memory) for high speed access.
- emptyDir on the host is stored at:
/var/lib/kubelet/pods/$PODUID/volumes/kubernetes.io~empty-dir/$VOLUMENAME

Example of emptyDir manifest:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  restartPolicy: Always
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Always
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
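
To back the emptyDir with memory (tmpfs) instead of node storage, set the medium field - a minimal variant of the volumes section above:

  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory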

hostPath:
- hostPath is useful for getting access from a Pod to different host internals, for example:
/var/lib/docker or /sys
- Pods on different nodes can act differently, depending on file content
- Files or directories are created on the node

Example of hostPath manifest:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  restartPolicy: Always
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Always
    volumeMounts:
    - mountPath: /var/lib/docker
      name: docker-internals-volume
  volumes:
  - name: docker-internals-volume
    hostPath:
      path: /var/lib/docker

awsElasticBlockStore
- Content of the volume is not deleted with Pod termination - the volume is just unmounted
- The EBS volume needs to be created before mount
- K8s nodes need to be EC2 instances located in the same region and availability zone as
the EBS volume
- An EBS volume can be mounted to only one EC2 instance

Example of awsElasticBlockStore manifest:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  restartPolicy: Always
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Always
    volumeMounts:
    - mountPath: /images
      name: images-volume
  volumes:
  - name: images-volume
    awsElasticBlockStore:
      volumeID: $VOLUME_ID
      fsType: ext4

gcePersistentDisk
- Content of the volume is not deleted with Pod termination - the volume is just unmounted
- The persistent disk volume needs to be created before mount
- K8s nodes need to be GCE VMs located in the same project and zone as the persistent
disk
- A persistent disk can be mounted in r/w mode by only one GCE VM at a time.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - mountPath: /images
      name: images-volume
  volumes:
  - name: images-volume
    gcePersistentDisk:
      pdName: images-volume
      fsType: ext4

azureDisk

Configuration options that you can add to azureDisk:

cachingMode - host caching mode: None, Read Only, Read Write.
diskName - the name of the data disk in the blob storage
diskURI - the URI of the data disk in the blob storage
fsType - filesystem type to mount. Must be a filesystem type supported by the host
operating system, e.g. "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
kind - expected values: Shared (multiple blob disks per storage account), Dedicated (single
blob disk per storage account), Managed (azure managed data disk, only in managed
availability set). Defaults to Shared.
readOnly - boolean, defaults to false (read/write). ReadOnly here will force the ReadOnly
setting in VolumeMounts.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  restartPolicy: Always
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Always
    volumeMounts:
    - mountPath: /images
      name: images-volume
  volumes:
  - name: images-volume
    azureDisk:
      diskName: test.vhd
      diskURI: https://<account>.blob.microsoft.net/vhds/test.vhd

Useful links:
https://github.com/kubernetes/examples/tree/master/staging/volumes/azure_disk
https://github.com/kubernetes/examples/tree/master/staging/volumes/azure_file

ConfigMaps
ConfigMaps in Kubernetes are used for storing different types of application configurations
(treat them as config files from /etc). ConfigMaps can be created using kubectl create
configmap <map-name> <data-source> command or based on ConfigMap manifest
file (recommended).
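
A quick sketch of the command-line variant (names, keys and the file path are example values):

kubectl create configmap example-configmap --from-literal=ENV1=value1
kubectl create configmap example-configmap --from-file=./config.properties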

ConfigMaps can be mounted as:


- files
- environment variables

Example manifest snippet that mounts a ConfigMap as a file in a container:

...
containers:
- name: nginx
  image: nginx
  volumeMounts:
  - name: example-configmap-volume
    mountPath: /example
volumes:
- name: example-configmap-volume
  configMap:
    name: example-configmap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-configmap
data:
  filename: "Content of example file"

Example manifest snippet that mounts a ConfigMap as env variables in a container:

containers:
- name: nginx
  image: nginx
  envFrom:
  - configMapRef:
      name: example-configmap

Example manifest snippet that mounts only one env variable from a ConfigMap in a container:
containers:
- name: nginx
  image: nginx
  env:
  - name: ENV_VAR_NAME
    valueFrom:
      configMapKeyRef:
        name: example-configmap
        key: ENV1
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-configmap
data:
  ENV1: "Content of ENV1 example variable"
  ENV2: "Content of ENV2 example variable"

Secrets
Secrets are designed to store sensitive data such as passwords, certificates or tokens: db
names, db passwords, API tokens etc. (remember that all values in Secret manifests are
base64 encoded). There are three types of secrets:
- docker-registry Create a secret for use with a Docker registry
- generic Create a secret from a local file, directory or literal value
- tls Create a TLS secret

Secrets can be created using kubectl create secret <secret-type>


<secret-name> <data-source> command or based on Secret manifest file
(recommended).
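
A quick sketch of creating a generic secret from the command line (the name and values are example values matching the manifest below):

kubectl create secret generic api-credentials \
  --from-literal=USER=admin --from-literal=PASSWORD=secret-password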

Generic Secrets:
Example manifest snippet that mounts a Secret as env variables in a container:
...
containers:
- name: nginx
  image: nginx
  envFrom:
  - secretRef:
      name: api-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials
data:
  PASSWORD: c2VjcmV0LXBhc3N3b3JkCg==
  USER: YWRtaW4K

Example manifest snippet that mounts a Secret as a file in a container:

Container:
...
containers:
- name: nginx
  image: nginx
  volumeMounts:
  - name: secrets-volume
    mountPath: /secrets
volumes:
- name: secrets-volume
  secret:
    secretName: index-html

Secret:
apiVersion: v1
kind: Secret
metadata:
  name: index-html
data:
  index.html: c29tZSBjb250ZW50

Encoding/decoding strings using base64 (all values in Secret manifests are encoded):

echo -n $STRING | base64
echo -n $ENCODED_STRING | base64 --decode

docker-registry Secrets

If You want to pull images from a private repository you need to authenticate to your docker
registry provider. You do so by using:
docker login

The login command creates or updates a config.json file that holds an authorization token.

cat ~/.docker/config.json
{
    "auths": {
        "asia.gcr.io": {},
        "https://asia.gcr.io": {},
        ...
    },
    "HttpHeaders": {
        "User-Agent": "Docker-Client/18.09.0 (darwin)"
    },
    "credsStore": "osxkeychain"
}

By default, Docker looks for the native binary on each of the platforms, i.e. “osxkeychain”
on macOS, “wincred” on windows, and “pass” on Linux. A special case is that on Linux,
Docker will fall back to the “secretservice” binary if it cannot find the “pass” binary. If
none of these binaries are present, it stores the credentials (i.e. password) in base64
encoding in the config files.

If you need access to multiple registries, you can create one secret for each registry.
Kubelet will merge any imagePullSecrets into a single virtual ~/.docker/config.json when
pulling images for your Pods.

How to create a docker-registry secret?


1. From command-line:
kubectl create secret docker-registry <secret-name>
--docker-server=<your-registry-server> --docker-username=<your-name>
--docker-password=<your-password> --docker-email=<your-email>

DockerHub registry server: https://index.docker.io/v1/

If You are using DockerHub you need to use 2 flags: --docker-username=<your-name>


--docker-password=<your-password>

2. From a manifest file:

kubectl apply -f secretmanifest.yaml

apiVersion: v1
kind: Secret
metadata:
  name: <NAME>
  namespace: <NAMESPACE>
data:
  .dockercfg: ak0yeDFNakEwZEdkbG...RHVndjVzQ1YUdZPSJ9fQ==
type: kubernetes.io/dockercfg

docker-registry secrets can be attached to Your applications at 2 levels:

Pod level:
spec:
  containers:
  - name: private-registry-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regcred

ServiceAccount level:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: regcred
imagePullSecrets:
- name: regcred

Persistent Volume Claims


Dynamic volume provisioning allows for dynamic storage volume creation and attachment.
For dynamic volume creation K8s needs storage class and Persistent Volume Claim
definition.

StorageClass defines a type of storage that will be used by a PersistentVolumeClaim.

PersistentVolumeClaim defines the amount of storage that will be requested from the
StorageClass and the access mode.

accessModes:
ReadWriteOnce – Mount a volume as read-write by a single pod
ReadOnlyMany – Mount the volume as read-only by many pods
ReadWriteMany – Mount the volume as read-write by many pods

PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 30Gi

StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

Deployment snippet:
...
containers:
- name: nginx
  image: nginx
  volumeMounts:
  - name: claim1-volume
    mountPath: /claim1
volumes:
- name: claim1-volume
  persistentVolumeClaim:
    claimName: claim1

Features

NodeSelector
Using node selectors you can pin Pods to nodes, by adding labels to nodes and settings
node selectors on Pods.

Adding labels:
kubectl label node minikube disk=ssd

Configuring nodeSelector:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        disk: ssd
      containers:
      ...

Affinity and anti-affinity rules

Affinity and anti-affinity rules allow you to express much more complex scheduling rules than
nodeSelector:
- The selection language is more expressive
- You can create rules that are not hard requirements but rather preferences, which means
that the scheduler can assign a Pod to a node even if the rules cannot be met
- You can create rules that take other pods' labels into account

There are 2 kinds of affinity and anti-affinity rules:


- node rules
- pod rules

There are 2 types of node affinity rules:


1. Hard requirement (works like the nodeSelector) - The rules must be met before the
pod will be scheduled.

- requiredDuringSchedulingIgnoredDuringExecution

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: env
          operator: In
          values:
          - production

2. Soft requirement - Even if the rules are not met, the Pod can still be scheduled; this is just a
preference.

- preferredDuringSchedulingIgnoredDuringExecution

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: env
          operator: In
          values:
          - staging

Interpod affinity rules can influence scheduling based on the labels of other pods that are
already running.
Remember that each pod is running inside a namespace so affinity rules apply to the pods in
a particular namespace (if the namespace is not defined, the affinity rule will apply to the
namespace of a pod).
There are 2 types of interpod affinity rules:

1. Hard requirement

- requiredDuringSchedulingIgnoredDuringExecution

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: disk
          operator: In
          values:
          - ssd
      topologyKey: kubernetes.io/hostname

2. Soft requirement

- preferredDuringSchedulingIgnoredDuringExecution

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: storage
            operator: In
            values:
            - ssd
        topologyKey: kubernetes.io/hostname

When to use interpod affinity/anti-affinity rules?

Affinity:
- Collocating 2 pods that are tightly connected to each other, where traffic between them
shouldn't be sent through the network (performance benefits). Example: apps that are using
a redis cache.
- Collocating pods in the same AZ (in case of federated clusters).

Anti-affinity:
- Ensuring a pod is always scheduled only once per node. Example: CPU or memory
demanding applications.

Taints and Tolerations

Taints are the opposite of node affinity: they allow nodes to repel a set of pods. Taints
mark nodes, and tolerations applied to pods influence the scheduling of the pods.
Comparison to affinity rules:

Affinity rules       Taints and Tolerations
Node label           Taint
Pod Affinity rule    Toleration

By default, in a cluster with at least 2 nodes, the master has the following taint assigned:

node-role.kubernetes.io/master:NoSchedule

which will not allow scheduling on master nodes (by default pods are not using
tolerations).

Adding taints to nodes:

kubectl taint nodes $NODE_NAME key=value:effect
kubectl taint nodes $NODE_NAME disk=ssd:NoSchedule

Adding tolerations to a pod:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

operators:
- Equal
- Exists

effects:
- NoSchedule
- PreferNoSchedule
- NoExecute

Lifecycle Hooks
Lifecycle Hooks allow for attaching different types of actions to pod lifecycle. Hooks define
when some action should happen: after the container starts or before its termination.
Handlers say what to do: exec a command or send http requests.

Hooks:
PostStart - executed right after container creation; in case of PostStart failure the
container is terminated and restarted. Until PostStart ends, the Pod stays in Pending state.
PreStop - executed before Pod termination. Until PreStop ends, the Pod stays in Terminating
state; PreStop needs to end before the Pod deletion signal is sent.

Handlers:
exec - runs a command; resources used by this handler are counted as container resources
http - executes an HTTP request against a specific endpoint on the container

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "script.sh"]

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "script.sh"]

Liveness, readiness and startup probes

The kubelet uses probes to know when to restart a container or when to detach Pods from a
Service. There are 3 types of probe actions:
- exec
- httpGet
- tcpSocket

There are 3 types of Kubernetes probes:

livenessProbe

Liveness probe checks pod health and restarts it when the probe fails.
If your application already exits with a non-zero code when it fails, the kubelet restarts it
anyway, so you may not need a livenessProbe.

readinessProbe

Readiness probe makes sure that the pod receives traffic only when the test succeeds. The
readiness probe keeps running during the entire Pod's life.

startupProbe

Startup probe (if defined) is the first probe that will be executed. The livenessProbe starts
only after the startupProbe succeeds (the startup probe stops probing your app after its first
success).

General probe options:


initialDelaySeconds: Number of seconds after the container has started before
liveness or readiness probes are initiated.
periodSeconds: How often (in seconds) to perform the probe. Default to 10 seconds.
Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1
second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered
successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: Number of consecutive failed probes after which the container is
considered unhealthy. Defaults to 3.

Http specific options:


host: Host name to connect to, defaults to the pod IP. You probably want to set “Host” in
httpHeaders instead.
scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
path: Path to access on the HTTP server.
httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
port: Name or number of the port to access on the container. Number must be in the range 1
to 65535.
Example snippet of livenessProbe:
containers:
- name: nginx
  image: nginx
  livenessProbe:
    exec:
      command:
      - cat
      - /tmp/healthy
    initialDelaySeconds: 5
    periodSeconds: 5

Example snippet of readinessProbe:

containers:
- name: nginx
  image: nginx
  readinessProbe:
    httpGet:
      path: /healthz
      port: 81
    initialDelaySeconds: 15
    timeoutSeconds: 1
    periodSeconds: 15
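
A minimal startupProbe sketch using the tcpSocket action (the port and thresholds are example values); the liveness probe starts only after this probe succeeds:

containers:
- name: nginx
  image: nginx
  startupProbe:
    tcpSocket:
      port: 80
    failureThreshold: 30
    periodSeconds: 10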

Node maintenance commands

Setting a Node as unschedulable/schedulable (new pods will not be scheduled):
kubectl cordon/uncordon $NODE_NAME
Setting a Node as unschedulable and evicting its pods:
kubectl drain $NODE_NAME --grace-period=$SECONDS
Checking node and pod metrics (metrics-server or heapster required):
kubectl top node/pod $NODE_NAME/$POD_NAME
Checking the state of cluster components:
kubectl cluster-info

Managing limits and quotas for CPU and Memory


Managing container limits and quotas for CPU and Memory

There are 2 types of resources that can be assigned to containers:

- CPU (1 vCPU = 1 Hyperthread on a bare-metal Intel processor with Hyperthreading) -
the smallest unit of CPU allocation is 1 millicore (1/1000 of 1 vCPU time), example:
100m or 0.1
- Memory (by default set as an amount of bytes) - can be modified by integer
suffixes: E, P, T, G, M, K or Ei, Pi, Ti, Gi, Mi, Ki (power-of-two equivalents), example:
128974848, 129e6, 129M, 123Mi.

There are 2 types of resource metadata:

- requests are guaranteed resources that will be provided to the Pod (they will always be
available for a container). If the node cannot satisfy the Pod's total requests with its free
capacity, the Pod will not be scheduled.
- limits are hard quotas; if exceeded, the kubelet may restart a Pod or even evict it
from a node.

containers:
- name: nginx
  image: nginx
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"

Managing Namespace limits and quotas for CPU and Memory


If your cluster resources are divided using namespaces, and each namespace is used for
a different purpose, for example: staging, testing and production, it's worth setting resource
quotas per namespace - you will be sure that resources from the staging namespace will not
throttle production applications. A ResourceQuota works in the namespace in which it was
created. Remember that ResourceQuota requires resource requests set at the container level.

There are 3 main ResourceQuotas types:


Compute Resource Quota
- limits.cpu
- limits.memory
- requests.cpu
- requests.memory

Storage Resource Quota


- requests.storage
- persistentvolumeclaims

Object Count Quota - count/<resource>.<group>


- count/services
- count/secrets
- count/configmaps
- count/deployments.extensions
- count/pods
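
A minimal sketch of an object-count quota (the limits are example values):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-quota
spec:
  hard:
    count/pods: 10
    count/services: 5
    count/configmaps: 10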

Container snippet:
containers:
- name: nginx
  image: nginx
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"

Example ResourceQuota manifest:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
spec:
  hard:
    requests.cpu: 800m
    requests.memory: 1800Mi
    limits.cpu: 2400m
    limits.memory: 1800Mi

LimitRange
LimitRange sets default requests and limits for all containers in a namespace that do not
have resource limits and requests defined.

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 512Mi
      cpu: 1
    defaultRequest:
      memory: 256Mi
      cpu: 1
    type: Container

QoS
Kubernetes distinguishes 3 classes of QoS; each class is tightly related to the resource
requests and resource limits. Guaranteed has the highest priority, then Burstable, then
Best-Effort.
- Guaranteed (limits = requests)
- Burstable (limits ≠ requests)
- Best-Effort (no limits and no requests)
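
A minimal sketch of a container that would get the Guaranteed class (requests equal to limits; the values are example values):

containers:
- name: nginx
  image: nginx
  resources:
    requests:
      memory: "128Mi"
      cpu: "500m"
    limits:
      memory: "128Mi"
      cpu: "500m"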

initContainers
Init Containers are a special kind of container - they are always started before app
containers. Init containers support all the fields and features of app Containers except:
- They are always run to completion
- Each one must complete successfully before the next one is started.
- Init Containers do not support readiness probes

Deployment snippet:
...
template:
  metadata:
    labels:
      app: nginx
  spec:
    initContainers:
    - name: clone-repo
      image: busybox
      command: ['sh', '-c', 'git clone ...']
    containers:
    ...

Pod priorities and preemptions


Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be
scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of
the pending Pod possible.

Setting Pods priorities requires 2 steps:


- Creation of PriorityClass
- Running Pods with attached priorityClassName

Versioning constraints:
In Kubernetes 1.9 and later, Priority also affects scheduling order of Pods and
out-of-resource eviction ordering on the Node.
Pod priority and preemption have been moved to stable since Kubernetes 1.14 and are
enabled by default in this release and later.

Preemption - ability to kill lower priority Pods to schedule higher priority Pods.

Example assignment of priorityClassName snippet (note that priorityClassName is a
PodSpec field, not a container field):
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Always
  priorityClassName: high-priority

Assigning priorityClassName to a Pod, also affects scheduling priority. Pods with higher
priority values can be taken from the scheduler queue before Pods with a lower priority.

Example PriorityClass manifest:

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Priority Class description"

value - defines the priority as a 32-bit integer; the higher the value, the higher the priority
globalDefault - if set to true, defines the default priority assigned to Pods without a
priorityClassName (only one class in a cluster can be set as the global default)
description - description of the PriorityClass
preemptionPolicy - defines whether Pods of this class can evict other Pods in order to be
scheduled

Logging
Every string inside of the container that is redirected to /dev/stdout or /dev/stderr is
by default written to a json file on the host (this is the default logging driver in Docker -
https://docs.docker.com/engine/admin/logging/overview).

Logging Architectures

Node-level logging

https://kubernetes.io/docs/concepts/cluster-administration/logging/

With that approach container logs are saved at
/var/log/containers/POD-NAME_NAMESPACE_CONTAINER-NAME.log on each node.
The kubectl logs command reads those files. By default, if a container restarts, the kubelet
keeps one terminated container with its logs. If a pod is evicted from the node, all
corresponding containers are also evicted, along with their logs.

Log rotation
Kubernetes by itself does not manage log rotation at the node level. Log rotation is configured
at the cluster deployment step; it can be configured using logrotate (log rotation is executed
every hour and when a log file exceeds 10MB) or via container runtime settings.

Useful flags for the kubectl logs command:

flag               description
--all-containers   Get all containers' logs in the pod(s).
-c                 Print the logs of this container.
-f                 Specify if the logs should be streamed.
-p                 If true, print the logs for the previous instance of the container in a pod if it exists.
--tail=5           Lines of recent log file to display. Defaults to -1 with no selector (showing all log lines), otherwise 10 if a selector is provided.
--timestamps       Include timestamps on each line in the log output.

kubectl logs --help


kubectl logs $PODS_NAME -c $CONTAINER_NAME -f

There are 2 cases in which the kubectl logs command will not show proper logs:
- Logging drivers are used (they redirect logs from STDOUT and STDERR to files,
external hosts, logging systems or databases)
- Process launched in a container sends output logs to files, the workaround to send
them to /dev/stdout or /dev/stderr is to create a soft link between a log file
and /dev/stdout or /dev/stderr.

Example from the official nginx image (nginx logs path is hardcoded):

RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log

https://github.com/nginxinc/docker-nginx/blob/5c15613519a26c6adc244c24f814a95c786cfbc3/stable/buster/Dockerfile#L95

Cluster-level logging
Cluster-level logging

https://kubernetes.io/docs/concepts/cluster-administration/logging/
This approach assumes installing a logging agent on every node (with DaemonSets) and a
central log collector that gathers logs streamed from all agents. Popular Kubernetes logging
agents: Fluentd, Logstash, GrayLog, Fluent Bit. The logging agent gathers logs from the
container log file and sends them to the logging backend before they are rotated by
logrotate.

Streaming sidecar container

https://kubernetes.io/docs/concepts/cluster-administration/logging/
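
A minimal sketch of the streaming-sidecar pattern, following the pattern from the linked docs (names and paths are example values): the app writes to a file on a shared emptyDir and the sidecar streams it to its own stdout, where kubectl logs can read it:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app
    image: busybox
    command: ['sh', '-c', 'while true; do date >> /var/log/app.log; sleep 1; done']
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-streamer
    image: busybox
    command: ['sh', '-c', 'tail -n+1 -f /var/log/app.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log
  volumes:
  - name: logs
    emptyDir: {}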

Sidecar container with logging agent

https://kubernetes.io/docs/concepts/cluster-administration/logging

Direct logs redirection from your app

https://kubernetes.io/docs/concepts/cluster-administration/logging/

Horizontal Pod Autoscalers with metrics-server


HPA can dynamically scale up and down Pods based on CPU or memory pressure.

HPA requires 2 components:


- metrics-server installed in your cluster
- resource requests assigned to containers

Example resource requests and limits snippet:


containers:
- name: nginx
  image: nginx
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"

Example HPA manifest V1 (CPU):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
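
A command line counterpart of the above V1 manifest:

kubectl autoscale deployment php --cpu-percent=80 --min=2 --max=4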

Example HPA manifest V2beta1 (memory):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80

Example HPA manifest V2beta2 (CPU and memory):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi

Security
Security basics

1. Use Secrets
2. Kubernetes API - Enable Role Based Access Control
3. kube-apiserver --authorization-mode=RBAC,...
4. Use apparmor and seccomp
5. Run malicious applications in Sandboxes (gvisor & katacontainers)
6. Containers hardening:
- reduce attack surface
- run as non-root user
- set read-only filesystem
7. Scan your images for CVE using Anchore, Clair or trivy
8. Pods - Configure Pod Security Policies
9. kube-apiserver --enable-admission-plugins=PodSecurityPolicy..
10. Enable mtls (ServiceMesh)
11. Check workloads during the runtime (Falco)
12. Secure Your Dashboard access (kubectl proxy or kubectl port-forward)
13. Set NetworkPolicies that allow only for needed communication
14. Use CIS Benchmarks (kube-bench)
15. Verify platform binaries (sha512sum)
16. Disable ServiceAccount token mount (automountServiceAccountToken: false)
17. API restrictions:
- Don’t allow for anonymous access
- Close insecure port
- Don’t expose API to the external world
- Use RBAC
- Restrict access from nodes (NodeRestriction admission controller)
18. Encrypt etcd at rest
19. Use OPA to validate users configuration.
20. Run immutable containers (remove shells, make filesystem read-only, run container
as non-root)
21. Enable Auditing.

Useful links:
https://kubernetes-security.info/
https://www.cisecurity.org/cis-benchmarks/

Pentests
- docker-bench-security (https://github.com/docker/docker-bench-security)

- kubernetes-security-benchmark
(https://github.com/mesosphere/kubernetes-security-benchmark)
- kubernetes-cis-benchmark (https://github.com/neuvector/kubernetes-cis-benchmark)
- kube-bench (https://github.com/aquasecurity/kube-bench)
- kube-hunter (https://github.com/aquasecurity/kube-hunter)

SecurityContexts
Security context defines privilege and access control settings for Pods and containers.
Using security contexts you can define settings such as the user or group IDs that will be
assigned to the main container process, or the supplemental group that owns the pod's
volumes.

apiVersion: v1
kind: Pod
metadata:
  name: security-context-pod
spec:
  securityContext:
    runAsUser: 1001
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: security-context-container
    image: ubuntu
    command: ["/bin/sh","-c","sleep infinity"]
    volumeMounts:
    - name: sec-vol
      mountPath: /data
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 1000
  volumes:
  - name: sec-vol
    emptyDir: {}

Notice that some options can be defined at pod and container level. In that case
configuration defined at container level takes precedence.

ubuntu@security-context-pod:/$ id
uid=1000 gid=3000 groups=3000,2000

Security context fields that are available only at pod level:


fsGroup - defines supplemental group that will be applied to all containers in a pod, for
some volume types kubelet will set this group as owner of the volume
fsGroupChangePolicy - defines behavior of changing ownership and permission of the
volume before being exposed inside Pod

supplementalGroups - a list of groups applied to the first process run in each container,
in addition to the container's primary GID
sysctls - holds a list of namespaced sysctls used for the pod

Security context fields that are available only at container level:
allowPrivilegeEscalation - controls whether a process can gain more privileges than
its parent process (set to true when container runs as Privileged or has CAP_SYS_ADMIN
capability)
capabilities - linux capabilities attached to container (add or drop)
privileged - defines container permissions, privileged equals to root on the host.
procMount - denotes the type of proc mount to use for the containers
readOnlyRootFilesystem - whether this container has a read-only root filesystem

Security context fields that are available at Pod and container level:
runAsGroup - defines group that will be assigned to all containers in the pod
runAsNonRoot - indicates that the containers must run as a non-root user
runAsUser - overwrites user ID defined at image level
seLinuxOptions - SeLinux context that will be applied (by default random SELinux context
will be applied)
seccompProfile - seccomp options used by container
windowsOptions - windows specific options

Troubleshooting
Troubleshooting Pod errors
ImagePullBackOff error - most commonly an issue with accessing the docker registry. There
are three primary culprits besides network connectivity issues:
- The image tag is incorrect
- The image doesn't exist (or is in a different registry)
- Kubernetes doesn't have permissions to pull that image
There is no observable difference in Pod status between a missing image and incorrect
registry permissions. In either case, Kubernetes will report an ErrImagePull status for the
Pods.

How to debug those errors?

- kubectl describe pod <podname>
- Log in to the node using SSH and check if you can download the Docker image manually using:
docker pull <imagename>
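
If you want to see this failure mode in a sandbox, a throwaway Pod with a deliberately
misspelled tag (assumed not to exist in the registry) reproduces it:

kubectl run bad-image --image=nginx:no-such-tag --restart=Never
kubectl get pod bad-image -w   # watch the status cycle through ErrImagePull / ImagePullBackOff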

CrashLoopBackOff tells us that Kubernetes is trying to launch this Pod, but one or more of
its containers keeps crashing or getting killed (check with kubectl logs).
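
When a container keeps restarting, the logs of the previous instance are usually the most
informative; both flags below are standard kubectl:

kubectl logs <podname> --previous   # logs from the last crashed container instance
kubectl describe pod <podname>      # check "Last State" and the exit code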

kubectl get pods -w

NAME            READY   STATUS                       RESTARTS   AGE
configmap-pod   0/1     CreateContainerConfigError   0          2m11s
secret-pod      0/1     ContainerCreating            0          40m

Missing ConfigMap (CreateContainerConfigError) - a ConfigMap is declared in the Pod
specification but does not exist in the API:

Events:
  Type     Reason  Age               From               Message
  ----     ------  ----              ----               -------
  Warning  Failed  2s (x2 over 20s)  kubelet, minikube  Error: configmaps "test-config" not found

Missing Secret (ContainerCreating) - a Secret is declared in the Pod specification but does
not exist in the API:

Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Warning  FailedMount  24m (x11 over 30m)  kubelet, minikube  MountVolume.SetUp failed for volume "test-secret" : secrets "test-secret" not found

If a Pod is stuck in Pending, it means that it cannot be scheduled onto a node. Generally this
is because there are insufficient resources of one type or another that prevent scheduling
(kubectl describe). You can also check the Dashboard views.
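
The scheduler's reasoning is recorded as Events on the Pod; both queries below use standard
kubectl (the Pod name is a placeholder):

kubectl describe pod <podname>                                      # the Events section explains why scheduling failed
kubectl get events --field-selector involvedObject.name=<podname>   # the same events, queried directly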

Troubleshooting Services
Issues with resolving Service names:
Could not resolve host: <servicename>
How to debug it?
kubectl get svc
kubectl get endpoints
kubectl exec -ti POD-UID nslookup <servicename>
kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://<servicename>:80
If the error persists, it means that there is an issue with kube-dns.
You can also check your Service configuration using the Kubernetes Dashboard.
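
Another frequent cause is a Service selector that doesn't match any Pod labels, which leaves
the Service without endpoints; a quick check (the Service name is a placeholder):

kubectl get svc <servicename> -o jsonpath='{.spec.selector}'   # what the Service selects
kubectl get pods --show-labels                                 # what the Pods actually carry
kubectl get endpoints <servicename>                            # empty ENDPOINTS means no match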

Troubleshooting services managed by Service Managers

Services that are managed by service managers (not by the Container Runtime):
- docker or rkt
- kubelet
- kube-proxy

Useful commands for debugging services managed by a service manager:
- systemctl and journalctl for systemd
- service and initctl for upstart

Example unit files for systemd and upstart:
https://github.com/kubernetes/contrib/tree/master/init

Useful systemctl commands:


systemctl status application
systemctl start application
systemctl stop application
systemctl restart application
systemctl enable application
systemctl disable application

Useful journalctl commands:


journalctl
journalctl -b # logs from boot
journalctl -u application.service # logs related to a specific unit
journalctl -f # tail logs

Here are the locations of the relevant log files (on systems without systemd):

Master
/var/log/kube-apiserver.log - API Server, responsible for serving the API
/var/log/kube-scheduler.log - Scheduler, responsible for making scheduling
decisions
/var/log/kube-controller-manager.log - Controller Manager, responsible for running
controllers such as replication controllers

Worker Nodes
/var/log/kubelet.log - Kubelet, responsible for running containers on the node
/var/log/kube-proxy.log - Kube Proxy, responsible for service load balancing

Troubleshooting commands

General:
kubectl get <resource> (--all-namespaces)
kubectl get <resource>/<resourcename> (-o=wide or -o=yaml or -o=json)
kubectl describe <resource>/<resourcename>

Cluster:
kubectl api-resources or kubectl api-versions
kubectl cluster-info (dump)
kubectl get componentstatuses
kubectl get nodes or pods
kubectl top nodes or pods
kubectl get events

kubectl client and server version:
kubectl version

Deployment:
kubectl describe deployment/<deployname>
kubectl describe replicaset/<rsname>

Pod:
kubectl get pods
kubectl describe pod/<podname>
kubectl logs <podname> (--previous) (-f)
kubectl exec -it <podname> -- command

RBAC:
kubectl auth can-i verb resource

Service:
kubectl get svc
kubectl get endpoints
kubectl exec -ti POD-UID nslookup <servicename>
kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://<servicename>:80

Kubernetes deployment tools


LAB:
minikube - installs a one-node K8s cluster using different virtualization providers
(https://github.com/kubernetes/minikube)
docker-compose - installs a Kubernetes cluster on the local machine

Production:
KOPS (Kubernetes Operations) is the best tool for deploying a K8s cluster hosted on AWS,
GCP or DigitalOcean. Recommended when possible. (https://github.com/kubernetes/kops)
kubeadm - one of the first tools designed for K8s cluster deployments; kubeadm should be
used when other tools cannot be.
kubespray - installs a K8s cluster using Ansible - preferred on bare metal
(https://github.com/kubernetes-sigs/kubespray)

Kubernetes as a Service (hosted Kubernetes) with Terraform:

- Elastic Kubernetes Service (EKS - check https://eksworkshop.com/)
- Google Kubernetes Engine (GKE - check
https://cloud.google.com/kubernetes-engine/docs/tutorials/)
- Azure Kubernetes Service (AKS -
https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-app)

AKS Product site:
https://azure.microsoft.com/en-us/services/kubernetes-service/
Cluster Deployment Tutorial:
https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-deploy-cluster
AKS step by step workshop:
https://docs.microsoft.com/en-us/learn/modules/aks-workshop/

Useful tools and apps around Kubernetes

Useful tools around Kubernetes:

kubectx - tool useful for switching contexts (when you are managing more than one cluster)
kubens - tool useful for switching the default namespace
helm - tool for managing Kubernetes applications as packages; treat it as a package manager
(for example: apt) for Kubernetes

Useful apps around Kubernetes:

kompose - will save hours of your time if you are migrating from docker-compose to
Kubernetes
Prometheus - open-source systems monitoring and alerting toolkit
Vitess - database clustering system for horizontal scaling of MySQL
NATS - simple, high-performance open source messaging system for cloud native
applications, IoT messaging, and microservices architectures
Rook - file, block, and object storage services for your cloud-native environments
Harbor - open source cloud native registry that stores, signs, and scans container images
for vulnerabilities
Istio - service mesh for Kubernetes networking, security, and observability
Kubeless/Knative - serverless on Kubernetes
Kubeapps - service catalog with applications that can be launched on Kubernetes

Helm (highly recommended!!)


Helm is a Kubernetes package manager that allows you to deploy Kubernetes applications
(Deployments, Services, ConfigMaps, Secrets etc.) and treat them as a single entity. One of
the most powerful features of Helm is its built-in Go templating language, which allows for
dynamic variable injection into Kubernetes manifests, conditionals, and much more.

Helm documentation is available at: https://docs.helm.sh/

Helm packages repository: https://github.com/helm/charts
Helm Hub: https://hub.helm.sh/

Helm package structure:

Chart.yaml          # A YAML file containing information about the chart
LICENSE             # OPTIONAL: A plain text file containing the license for the chart
README.md           # OPTIONAL: A human-readable README file
requirements.yaml   # OPTIONAL: A YAML file listing dependencies for the chart
values.yaml         # The default configuration values for this chart
charts/             # A directory containing any charts upon which this chart depends
templates/          # A directory of templates that, when combined with values, will
                    # generate valid Kubernetes manifest files
templates/NOTES.txt # OPTIONAL: A plain text file containing short usage notes
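
You don't have to create this layout by hand; a quick sketch using standard Helm commands
(the chart name is a placeholder):

helm create mychart   # scaffolds the chart layout shown above
helm lint mychart     # checks the chart for common issues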

Helm has two parts: a client (helm) and a server (tiller).

Helm supports the Go template language, which makes it a powerful template engine!
(http://masterminds.github.io/sprig/)

Installing tiller:
helm init

Getting values from the values file:

values.yaml:
nginx:
  image: nginx

In templates (manifests):
{{ .Values.nginx.image }}

Using the release name in template files:

{{ .Release.Name }}
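
Putting the two together, here is a minimal sketch of a chart template; the file name,
labels, and Deployment shape are illustrative assumptions, only .Values.nginx.image and
.Release.Name come from the examples above:

# templates/deployment.yaml - hypothetical minimal template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx    # release name keeps multiple installs distinct
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-nginx
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-nginx
    spec:
      containers:
      - name: nginx
        image: {{ .Values.nginx.image }}   # injected from values.yaml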

Dynamically adding files as a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: index-html
data:
  index.html: |
{{ .Files.Get "index.html" | indent 4 }}
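
Note that .Files.Get resolves paths relative to the chart root, so this sketch assumes
index.html sits next to Chart.yaml; the indent 4 pipe re-indents the file's contents so
they line up under the index.html: | block.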

Dynamically building key=value ConfigMap/Secret entries from a values-file dictionary:

values.yaml:
nginx:
  dockerImage: nginx
  pullPolicy: Always
  envVars:
    DB: mysql
    DB_PASS: password

ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-env-vars
data:
{{- range $key, $value := .Values.nginx.envVars }}
  {{ $key }}: {{ $value | quote }}
{{- end }}

values.yaml:
nginx:
  creds:
    DB: mysql
    DB_PASS: password

Secret (note that Secret values must go under data: and be base64-encoded, so b64enc runs
before quote):
apiVersion: v1
kind: Secret
metadata:
  name: nginx-credentials
data:
{{- range $key, $value := .Values.nginx.creds }}
  {{ $key }}: {{ $value | toString | b64enc | quote }}
{{- end }}
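
A hedged usage sketch for the tiller-based Helm 2 CLI this document describes (the chart
path and release name are hypothetical):

# Install the chart as release "my-nginx", overriding one envVars entry
helm install ./mychart --name my-nginx --set nginx.envVars.DB=postgres

# Render the templates locally without installing, to inspect the generated manifests
helm template ./mychart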

Notes

