Kubernetes + Docker Cheatsheet
Docker Architecture
Installing docker components
docker client and daemon
docker-compose
docker-machine
Docker container lifecycle
docker subcommands
Getting help
docker run
docker build
docker rm
docker network
docker volume
docker bulk commands
Dockerfile
Dockerfile syntax
Dockerfile instructions
docker-compose
docker-machine
Swarm (multinode)
docker node
docker service
Docker Security
Recommendations
Images
Containers
Docker Swarm
Kubernetes Architecture
kubectl command
Integrated Development Environment
Dashboard
Security
Security basics
Pentests
SecurityContexts
Troubleshooting
Troubleshooting commands
Kubernetes deployment tools
Useful tools and apps around Kubernetes
Docker Architecture
https://docs.docker.com/engine/images/architecture.svg
container: created from an image; holds everything that is needed for an application to run (run component)
Underlying technologies:
- namespaces: pid, net, mnt, ipc, uts
- cgroups: memory, CPU, block I/O, network
- UnionFS: AUFS, btrfs, vfs, DeviceMapper
- container formats: libcontainer, LXC
● namespaces
o pid namespace: used for process isolation (Process ID)
o net namespace: used for managing network interfaces
o mnt namespace: used for managing mount-points
o ipc namespace: used for managing access to IPC resources (Inter-Process Communication)
o uts namespace: used for isolating kernel and version identifiers (Unix Timesharing
System)
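These namespaces can be inspected on any Linux host via /proc; a minimal sketch (not Docker-specific):

```shell
# Each process exposes handles to its namespaces under /proc/<pid>/ns/.
# Two processes share a namespace exactly when the inode numbers match.
ls /proc/self/ns/
# Show the pid namespace identity of the current shell
readlink /proc/self/ns/pid
```

Comparing the readlink output for a process on the host and for one inside a container shows that Docker gave the container its own pid namespace.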
https://devopscube.com/wp-content/uploads/2015/02/docker-filesystems-busyboxrw.png
https://docs.docker.com/storage/storagedriver/images/container-layers.jpg
● container format
o two supported container formats: libcontainer, LXC
https://i.stack.imgur.com/QVNR6.png
docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/$dockerComposeVersion/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
docker-machine
curl -L https://github.com/docker/machine/releases/download/v0.12.2/docker-machine-`uname -s`-`uname -m` >/tmp/docker-machine &&
chmod +x /tmp/docker-machine &&
sudo cp /tmp/docker-machine /usr/local/bin/docker-machine
docker subcommands
docker attach Attach local standard input, output, and error streams to a running container
docker import Import the contents from a tarball to create a filesystem image
docker inspect Return low-level information on Docker objects
docker port List port mappings or a specific mapping for the container
docker save Save one or more images to a tar archive (streamed to STDOUT by default)
docker wait Block until one or more containers stop, then print their exit codes
Getting help
docker --help list all docker commands
docker run
foreground mode (default) - stdout and stderr are redirected to the terminal, docker run
propagates the exit code of the main process
-t allocate pseudo terminal for the container (stdin closed immediately, terminal signals are not
forwarded)
-itd open stdin, allocate terminal and run process in the background (required for docker attach)
-w working directory
-h container hostname
-l sets labels
-v /path creates a randomly named volume and attaches it to the container path
--name assign a name to the container (by default a random adjective_name is generated)
--network attach the container to the specified network (by default it connects the container to the bridge network)
docker build
-f used for custom dockerfile names
docker rm
-f force removal of a running container
docker network
connect connect a container to a network
ls list networks
docker volume
create Create a volume
ls List volumes
docker bulk commands
docker rm -f $(docker ps -q) - delete all running containers
Dockerfile
Dockerfile syntax
Dockerfile instructions
FROM – sets the base image (required)
FROM ubuntu
FROM ubuntu:18.04
FROM ubuntu@sha256:34471448724419596ca4e890496d375801de21b0e67b81a77fd6155ce001edad
RUN – executes any command in the container and creates a new fs layer. RUN has 2 forms:
- shell form (RUN command): by default runs commands with /bin/sh -c. Long commands can be split with a backslash "\" and chained with "&&"; shell operators are also available in this form.
- exec form (RUN ["executable", "param1"]): recommended when you are configuring CMD or ENTRYPOINT. The exec form can pass Linux signals, as opposed to the shell form.
RUN apt-get update && \
    apt-get install -y nginx
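A sketch of the two RUN forms side by side (package and command names are illustrative):

```dockerfile
# shell form: executed via /bin/sh -c, so "&&" and "\" continuations work
RUN apt-get update && \
    apt-get install -y curl

# exec form: JSON array, executed without a shell
RUN ["/bin/bash", "-c", "echo building"]
```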
COPY – copies files into containers / ADD has more functionality, such as unpacking tar archives and copying remote file URLs into containers
COPY test.sh /
ADD file.tar.xz /
- The --chown feature is only supported on Dockerfiles used to build Linux containers, and will not work on Windows containers.
SHELL - sets the default shell used by subsequent instructions
- Each SHELL instruction overrides all previous SHELL instructions (it affects all subsequent instructions)
- This instruction is commonly used on Windows to switch between cmd and PowerShell
- The SHELL instruction must be written in JSON form in a Dockerfile.
CMD – provides defaults for running containers / ENTRYPOINT - configures a container that will run as an executable
- Both of them specify the startup command for an image and both of them can be easily overridden with the docker run command
- CMD is the default one
- ENTRYPOINT runs the container as an executable
- CMD and ENTRYPOINT use the same instruction forms:
CMD executable param1 param2 ...
ENTRYPOINT executable param1 param2 ...
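A minimal sketch of how the two interact when both are set (the image name is hypothetical):

```dockerfile
FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["hello"]
```

docker run myimage prints "hello" (CMD supplies default arguments to ENTRYPOINT), while docker run myimage world prints "world": arguments given to docker run replace CMD but not ENTRYPOINT.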
ENV – sets environment variables, using 2 forms:
ENV KEY=VALUE ... - this form creates a single cache layer per one ENV instruction and requires "" or \ for escaping spaces
ENV KEY VALUE - everything after the first space becomes the value; 1 ENV instruction means creation of 1 Docker image layer
Declared env vars can be used in the Dockerfile using the following 2 syntaxes:
$KEY
${KEY}
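A short sketch of declaring and reusing an env var (the path and file name are illustrative):

```dockerfile
ENV APP_HOME=/opt/app
WORKDIR $APP_HOME
COPY app.sh ${APP_HOME}/
```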
EXPOSE – documents the ports on which the container listens:
- it is just for informational purposes (it does not publish your container port on the host)
- publishing ports on hosts requires adding the -p or -P flag to the docker run command
- ports can be published externally on different port numbers
EXPOSE 80 443
USER – sets the user for the container and for subsequent instructions:
- sets the user name (or UID) and a group for a running container (also applies to RUN, CMD and ENTRYPOINT instructions)
- if the user isn't assigned to a primary group, the root group will be used
USER <user>[:<group>]
USER <UID>[:<GID>]
VOLUME – creates a mount point:
- creates a mount point and marks it as holding externally mounted volumes from the host's and other containers' perspective
- if any data is available at the VOLUME path, it will be copied into the volume during container run
- on Windows, the destination of a volume (inside a container) has to be a non-existing or empty directory, and has to be on a drive other than C:
- order matters! If any build step changes data in a volume after the volume declaration, all newly created or edited data will be discarded
- there is no option to point to a host directory in a Dockerfile (it has to be done with the docker run command)
VOLUME /mountpath
VOLUME /myvolume
VOLUME ["/myvolume"]
LABEL – adds metadata to an image as key-value pairs:
LABEL $KEY=$VALUE
LABEL MAINTAINER="[email protected]"
ARG – defines build-time variables:
- defines variables that can be passed at build time to the docker daemon (--build-arg <var>=<value>)
- ENV variables overwrite vars defined by ARG
- ARG vars are available from the place in which they are declared
- do not use credentials with ARG instructions
- ARG variables can contain default values
ARG <var>
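A sketch of an ARG with a default value that can be overridden at build time (variable names are illustrative):

```dockerfile
ARG BASE_TAG=18.04
FROM ubuntu:${BASE_TAG}
ARG APP_VERSION
LABEL version=${APP_VERSION}
```

Built with docker build --build-arg APP_VERSION=1.0 . — note that an ARG declared before FROM is only in scope for the FROM line unless it is re-declared after FROM.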
ONBUILD - adds a trigger instruction that will be executed when the image is used as a base image (the instruction will be executed in the downstream build)
- any build instruction can be used (except FROM or MAINTAINER)
WORKDIR - sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions
that follow it in the Dockerfile
WORKDIR $CONTAINER_PATH
WORKDIR /usr/local/bin
STOPSIGNAL – sets the system call signal that will be sent to the container to exit
STOPSIGNAL integer
STOPSIGNAL signal_name
STOPSIGNAL SIGSTOP
Build context
The build context is the directory at a specified location PATH or URL. The PATH is a directory on your
local file system. The URL is a Git repository location.
The Docker client compresses the directory (and all subdirectories) and sends it to the Docker daemon.
The build context is pointed to by the last argument of the docker build command (usually ".").
Building images
Multi-stage builds
Multi-stage builds allow for storing only the build result in the final image, without development dependencies. They require at least 2 FROM instructions.
#STAGE 0
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
COPY app.go .
RUN go build -o app .
#STAGE 1
FROM alpine:latest
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
The COPY --from=0 line copies just the built artifact from the previous stage into this new stage.
The Go SDK and any intermediate artifacts are left behind, and not saved in the final image.
docker-compose
docker-compose - allows for managing many containers with one
command and one file
ps List containers
docker-compose.yaml - file structure
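A minimal docker-compose.yaml sketch (service names, images and ports are illustrative):

```yaml
version: "3"
services:
  web:
    image: nginx:1.15
    ports:
      - "8080:80"
  cache:
    image: redis:5
```

docker-compose up -d starts both containers; docker-compose ps lists them.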
docker-machine
docker-machine
env Display the commands to set up the environment for the Docker client
kill Kill a machine
ls List machines
rm Remove a machine
Swarm (multinode)
docker swarm
docker node
demote demote one or more nodes from manager in the swarm
docker service
create create new service
ls list services
ps list tasks
rm remove service
Docker Security
- Docker is only as secure as the underlying host.
- Get familiar with Center for Internet Security Benchmarks! (www.cisecurity.org)
CIS publishes benchmarks for both Linux distributions and Docker itself. The Docker benchmark can be checked automatically with docker-bench-security:
docker run \
-it \
--net host \
--pid host \
--userns host \
--cap-add audit_control \
-e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
-v /var/lib:/var/lib \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /usr/lib/systemd:/usr/lib/systemd \
-v /etc:/etc --label docker_bench_security \
docker/docker-bench-security
Image vulnerability scanning: clair (github.com/coreos/clair)
Runtime security:
The host can also be secured from unwanted access by attaching AppArmor, seccomp and SELinux security profiles.
Recommendations
Images
- Keep images as small as possible
- Cut back number of layers
- Use appropriate - already built images (nginx:1.15 instead of ubuntu + nginx)
- Create your own base images if your images have a lot in common
- Always declare image tag (avoid using default :latest tag)
- Plan and apply consistent naming convention for image tags (staging, production, alpha,
beta)
- Create images that are as general as possible, mount config files using Configs or ConfigMaps
(as well as Secrets).
- Use multistage builds
Containers
- Deploy applications using manifest files (docker-compose or k8s manifests)
- Always set containers limits (especially for cpu and memory)
- Use one of the container orchestration engines
- Remove all volume bindings (store app code in the image)
- Set different env variables
- Specify restart policies and healthchecks
- In Swarm use compose files that will define the entire application (from many microservices).
- In Kubernetes use Helm package manager to maintain application configuration (Highly
recommended)
Docker Swarm
- Design your apps to be stateless - stateless apps are much easier to scale
- A service’s configuration is declarative, and Docker is always working to keep the desired and
actual state in sync.
- Use docker stack with Swarm (compose file version "3")
- Use docker registry (do not build images using build option in docker-compose.yaml!)
- Consider using Swarm mode even with 1 node:
- use configs and secrets for storing configuration files and credentials
- scale up and down your containers
- built-in HA and Load Balancing
Kubernetes Architecture
https://lh3.googleusercontent.com/EU3DgtFKagWp5S0UpKj-wRgx8WK2nvQ2BG-4dGio57pGNj42A7Lip9IARBba34hIm84-_7zwWt6iImQE8beSqLxpzXm-2w_84M_X2IHQ7jvpWtIDMF81hmq6N4hGSxp6DQoFW5qX
Master components:
1. kube-apiserver - the main Kubernetes component; it manages all Master and Worker components using a REST API. The Kubernetes API server validates and configures data for the API objects, which include pods, services, replicationcontrollers, and others.
2. etcd - the Kubernetes datastore, used for saving cluster state in a distributed key-value store. Etcd is the most crucial cluster component, and the only one that needs to be backed up.
Worker node components:
1. kubelet - a daemon responsible for managing the container runtime that is installed on each worker node.
2. kube-proxy - provides the Service abstraction by passing traffic to pods (using iptables).
3. container runtime - a set of standards and technologies that together can run containerized applications.
Supported version skew relative to kube-apiserver at version X (example: X = 1.21):
kube-scheduler - X-1 -> (1.21, 1.20)
kubelet - X-2 -> (1.21, 1.20, 1.19)
kube-proxy - X-2 -> (1.21, 1.20, 1.19)
kubectl - X+1 > X-1 -> (1.22, 1.21, 1.20)
kubectl command
kubectl is a binary used for Kubernetes objects management. Using this command you can
view, create, delete and edit K8s objects.
Syntax: kubectl [command] [TYPE] [NAME] [flags]
command: Specifies the operation that you want to perform on one or more resources, for
example create, get, describe, delete.
TYPE: Specifies the resource type - pod, deployment, job, cronjob etc
NAME: Specifies the name of the resource. Names are case-sensitive. If the name is omitted,
details for all resources are displayed, for example kubectl get pods.
Examples:
Command Description
kubectl describe pod $POD_NAME Describe pod settings
Dashboard
Remember that the dashboard is a separate component that is not installed by default on production environments. It can be installed as an addon (minikube, kubespray) or as a separate application (helm).
Minikube:
Installing dashboard: minikube addons enable dashboard
Connecting to dashboard: minikube dashboard
Helm:
https://github.com/kubernetes/dashboard/tree/master/aio/deploy/helm-chart/kubernetes-dashboard
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
Remember that the Dashboard does not support certificate-based authentication; you need to use a token instead.
Imperative commands:
kubectl run nginx --image nginx
kubectl create deployment nginx --image nginx
Imperative object configuration:
kubectl create/apply -f file/url
Declarative object configuration (recommended!):
kubectl create/apply -f directory/
apiVersion - which version of the Kubernetes API you're using to create this object
kind - what kind of object you want to create (treat it as a programming class, https://simple.wikipedia.org/wiki/Class_(programming))
metadata - data that helps uniquely identify the object, including a name string, UID, and optional namespace (treat it as object instantiation)
spec - the precise format of the object spec is different for every Kubernetes object, and contains nested fields specific to that object (treat it as the object specification)
Each Kubernetes object requires the above fields. In many cases, if you do not specify values for specification fields, cluster defaults will be applied.
Basics
Namespaces
Used for object separation; provides a scope for names and splits a physical cluster into logical spaces. You can also limit the resource quota for each namespace. Most Kubernetes objects are scoped to a Namespace. You cannot have 2 objects of the same type (for example pods) with the same name in the same namespace.
Example manifest:
apiVersion: v1
kind: Namespace
metadata:
name: production
Nodes
From the Kubernetes perspective, a Node is an object (machine) on which workloads can be started. For K8s it doesn't matter whether a Node is a physical or virtual machine. Remember that Nodes cannot be created by Kubernetes itself.
Node status addresses: Hostname, InternalIP, ExternalIP
Pods
A Pod is the smallest schedulable unit in Kubernetes. A Pod describes an application running on Kubernetes and defines a group of containers that share: namespaces, volumes, IP address and port space. Each Pod has a unique IP address; containers within a Pod can communicate using localhost - this also means that port conflicts can occur between containers in a Pod. Kubernetes Pods are mortal. They are born and they die, and they are not resurrected.
Example manifest:
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
# below section defines Pod specification (Pod v1 core)
restartPolicy: Always
hostname: nginx
serviceAccountName: nginx
containers:
# below section defines container specification (Container v1 core)
- name: nginx
image: nginx
ports:
- containerPort: 80
PodSpec fields (Pod v1 core):
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#podspec-v1-core
Container v1 core:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#container-v1-core
command and args fields
With use of command and args you can overwrite the default main container process and its
arguments that are defined in Dockerfile.
apiVersion: v1
kind: Pod
metadata:
name: ubuntu
labels:
app: ubuntu
spec:
containers:
- name: ubuntu-command-example
image: ubuntu
command: ["printenv"]
args: ["HOSTNAME", "HOME"]
restartPolicy: OnFailure
Notice that the command and args fields expect lists to be passed - even a one-element list.
Static Pods:
Static pods are a way of creating and managing pods directly by the kubelet (without kube-api). Pods are created/deleted based on the content of a local directory or a remote URL. The manifest path and URL are checked by the kubelet every 20 seconds. When a static pod is created, the kubelet tries to register it in the API (the pod will be visible in kube-api but cannot be managed from there).
Assigning labels:
metadata:
labels:
app: nginx
Label selectors are the core grouping primitive in Kubernetes. They are used to select a set of objects. Not all objects support selectors; some objects tightly related to selectors are: Deployments, ReplicaSets, Services.
Assigning selectors:
- equality-based selectors:
selector:
app: nginx
- set-based selectors:
selector:
matchLabels:
app: nginx
or
selector:
matchExpressions:
- {key: app, operator: In, values: [nginx]}
Equality-based selectors:
selector:
  env: prod
Matching objects must satisfy all of the specified label constraints, though they may have additional labels as well.
Set-based selectors (not every resource supports them):
selector:
  matchLabels:
    env: prod
  #or
  matchExpressions:
  - {key: env, operator: In, values: [prod]}
  - {key: env, operator: NotIn, values: [dev]}
Replication Controllers
ReplicationController (rc or rcs) ensures that a specified number of Pods are running in a cluster. The Replication Controller doesn't support set-based selectors - use a ReplicaSet when possible.
Example RC manifest:
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
# below section defines ReplicationController specification
replicas: 2
selector:
app: nginx
# below section defines Pod configuration
template:
metadata:
name: nginx
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
ReplicaSets
ReplicaSet (rs) is a newer version of the Replication Controller that supports set-based selectors. It ensures that a specified number of Pods are running in a cluster.
Example RS manifest:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
name: nginx
labels:
app: nginx
spec:
# below section defines ReplicaSets specification
replicas: 2
selector:
matchExpressions:
- {key: app, operator: In, values: [nginx]}
# below section defines Pod configuration
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
Deployments
Deployments control ReplicaSets and Pods configuration using one manifest file. After creation, the state of Pods and ReplicaSets is checked by the Deployment Controller. Deployments are used for long-running processes (daemons). If anything happens to a Pod that is connected to a Deployment (the Pod is deleted, its node is destroyed), the Pod will be recreated.
Example Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
# below section defines Deployment specification
strategy:
type: RollingUpdate
replicas: 3
selector:
matchLabels:
app: nginx
# below section defines Pod configuration
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
DeploymentSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#deploymentspec-v1-apps
strategy:
type: RollingUpdate - define rolling update strategy (no outage)
rollingUpdate:
maxUnavailable: 2 - specifies the maximum number of Pod replicas that can be
unavailable during the update process.
maxSurge: 2 - specifies the maximum number of Pods that can be created over the
desired number of Pod replicas.
Example update process for 3 pod replicas:
Pod-template-hash label
This is an automatically created label that is added by the Deployment controller to every ReplicaSet created from a Deployment manifest. The value of this label is a hash of the PodTemplate field (PodSpec). If you do not change anything in the PodSpec section and you try to update your app, Kubernetes will not trigger an application upgrade (deployment.apps/nginx unchanged).
labels:
app: nginx
pod-template-hash: f89759699
Updating a Deployment
A Deployment’s rollout is triggered if and only if the Deployment’s pod template (that is,
.spec.template) is changed, for example if the labels or container images of the template
are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
In API version apps/v1, a Deployment’s label selector is immutable after it gets created.
https://blog.container-solutions.com/kubernetes-deployment-strategies
Jobs
Jobs are tasks that run to completion (they do not behave like daemons - long-running processes). Jobs are resources that once created cannot be updated (you need to delete and recreate them). After Job completion, Pods are not deleted - logs, warnings and diagnostic output stay until you delete them.
Example manifest:
apiVersion: batch/v1
kind: Job
metadata:
name: example-job
spec:
# below section defines Job specification
completions: 3
parallelism: 3
# below section defines Pod configuration
template:
metadata:
name: example-job
spec:
containers:
- name: pi
image: perl
command: ["perl"]
args: ["-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Never
JobsSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#jobspec-v1-batch
Remember that the Job object requires Pod restartPolicy set to Never or OnFailure
(default policy is Always).
Cronjobs
Runs jobs at specified points in time:
- once at a specified point in time
- repeatedly at specified points in time
Kubernetes CronJobs use the cron format to define when jobs will be executed.
Example manifest:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: example-cronjob
spec:
# below section defines CronJob specification
schedule: "0 0 * * *"
concurrencyPolicy: Forbid
# below section defines Job specification
jobTemplate:
spec:
backoffLimit: 0
template:
spec:
containers:
- name: example-job
image: ubuntu
command: ["/bin/sh", "-c", "--"]
args: ["for i in `seq 1 1 100`; do echo $i; done"]
restartPolicy: Never
concurrencyPolicy - Specifies how to treat concurrent executions of a Job.
- Allow - allows for concurrent jobs (default)
- Forbid - skips the next run if the previous one hasn't finished
- Replace - kills currently running job and create a new one
failedJobsHistoryLimit - The number of failed finished jobs to retain.
schedule - cron format schedule of a job
startingDeadlineSeconds - time buffer for missed job execution, after this deadline job
will be counted as failed
successfulJobsHistoryLimit - The number of successful finished jobs to retain.
suspend - suspends job execution; it will not suspend an already running Job (it suspends execution of new Jobs only)
jobTemplate - specifies Job configuration
DaemonSets
Ensures that one instance of a Pod is running on each (or some) node. When new nodes are added to the cluster, K8s automatically runs a Pod from the DaemonSet on the new node.
Mainly used for:
- Cluster storage daemons (ceph, glusterd)
- Logs collector daemons (fluentd, logstash)
- Monitoring daemons (Prometheus Node Exporter, Datadog agent...)
Example container command from a DaemonSet Pod template:
command:
- /bin/sh
args:
- -c
- >-
while [ true ]; do
echo "DaemonSet running on $(hostname)" ;
sleep 10 ;
done
StatefulSets
StatefulSets are a way of managing Pods that persist their identity. Each Pod managed by a StatefulSet has a persistent identifier that it maintains across any rescheduling. Remember that a StatefulSet requires a special type of Service object - a headless Service, without a cluster IP address (in that case service discovery has to be implemented by a third-party tool or by the clients themselves).
If you want to reach a Pod managed by a StatefulSet, use one of the following approaches:
- in the same namespace: <pod_name>.<service_name>
- cluster-wide FQDN: <pod_name>.<service_name>.<namespace>.svc.cluster.local
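A sketch of the headless Service that a StatefulSet expects (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None   # makes the Service headless (no cluster IP)
  selector:
    app: nginx
  ports:
  - port: 80
```

With clusterIP: None, cluster DNS returns the Pod addresses directly instead of a single virtual IP, which is what enables per-Pod DNS names.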
StatefulSetSpec:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#statefulsetspec-v1-apps
revisionHistoryLimit - the maximum number of revisions that will be maintained in the
StatefulSet's revision history
updateStrategy - defines how Pods are updated
rollingUpdate (default)
partition - If a partition is specified, all Pods with an ordinal that is greater than or equal
to the partition will be updated when the StatefulSet’s .spec.template is updated. All Pods
with an ordinal that is less than the partition will not be updated, and, even if they are
deleted, they will be recreated at the previous version.
OnDelete - Pod is deleted and new one is created after
volumeClaimTemplates - list of claims that pods are allowed to reference
Example Pod template fragment with a volume mount:
image: nginx
ports:
- containerPort: 80
name: web
volumeMounts:
- name: nginx-html-volume
mountPath: /usr/share/nginx/html
volumes:
- name: nginx-html-volume
awsElasticBlockStore:
volumeID: <volume-id>
fsType: ext4
Annotations
Annotations are a special kind of metadata attached to objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects. Different client tools and libraries can retrieve this metadata and use it for different purposes. Annotations are not used internally by K8s. The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels.
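A small sketch of annotations on a Pod (the keys and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: annotated-pod
  annotations:
    imageregistry: "https://hub.docker.com/"
    example.com/build-commit: "abc123"   # illustrative value
spec:
  containers:
  - name: nginx
    image: nginx
```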
Authorization - defines the set of privileges assigned to an authenticated entity. Currently there are 4 authorization modules in k8s:
- Node
- ABAC
- RBAC (Recommended)
- Webhook
Users are manually managed by Kubernetes admins or by external identity providers. User
entity can be confirmed by one of the following authentication strategies:
- Certificate-based (X509 Client Certs, --client-ca-file=$CERT_FILE)
- Username/Password-based (--basic-auth-file=$CRED_FILE)
- Pre-Generated Token-Based (--token-auth-file=$TOKEN_FILE)
- Service Account Tokens
- OpenID Connect Tokens
Each user object can contain the following key: value pairs:
- Username: name that uniquely identifies the end user, examples: admin,
[email protected]
- UID: a string (number) which identifies the end user
- Groups: define a group name to which user belongs
- Extra fields: additional metadata that can be used by authorizers
source: https://kubernetes.io/docs/concepts/security/controlling-access/
Admission Controllers
Admission Controllers are pieces of code that intercept requests to the Kubernetes API after authentication and authorization, but before object persistence. There are 2 kinds of Admission Controllers: mutating (they can modify objects that they admit) and validating (they validate requests sent to the API). Admission Controllers are compiled into the API server.
Why do I need Admission Controllers?
https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/#why-do-i
-need-admission-controllers
Admission Controllers that are available by default (in current K8s version):
CertificateApproval, CertificateSigning,
CertificateSubjectRestriction, DefaultIngressClass,
DefaultStorageClass, DefaultTolerationSeconds, LimitRanger,
MutatingAdmissionWebhook, NamespaceLifecycle,
PersistentVolumeClaimResize, Priority, ResourceQuota, RuntimeClass,
ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition,
ValidatingAdmissionWebhook
If you want to disable one of the default Admission plugins, use the --disable-admission-plugins flag.
Service Accounts
Service accounts are used for assigning Kubernetes API permissions to Pods.
- All Service Account users authenticate using Service Account Tokens
- Tokens are stored as credentials in Secrets (Secrets are also mounted in pods to allow communication between services)
- Service Accounts are specific to a namespace
- They can be created by the API or manually using objects
- Any API call that is not authenticated is treated as an anonymous user
example manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-service-account
ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-service-account
automountServiceAccountToken: false
Pod level:
spec:
serviceAccountName: admin-service-account
automountServiceAccountToken: false
containers:
- name: nginx
...
Certificate-based Authentication
Certificates can be manually generated by: easyrsa, openssl, cfssl.
Client certificates need to be defined at the API level with the --client-ca-file=$CERT_FILE flag. $CERT_FILE needs to consist of one or more certificate authorities. The username is defined by the CN (Common Name) of the certificate; groups are defined by the O (Organization) certificate field.
You need to create a user certificate with a proper Common Name and Organization that is signed by the CA that was used for starting the Kubernetes API (--client-ca-file flag).
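A sketch of generating such a user certificate with openssl; the CA here is a throwaway one created for demonstration (the real cluster CA is whatever --client-ca-file points at), and the user/group names are illustrative:

```shell
# Demonstration only: a throwaway CA standing in for the cluster CA
openssl genrsa -out ca.key 2048
openssl req -x509 -new -key ca.key -subj "/CN=kubernetes-ca" -days 1 -out ca.crt

# User key + CSR: CN becomes the Kubernetes username, O the group
openssl genrsa -out tom.key 2048
openssl req -new -key tom.key -subj "/CN=tom/O=admins" -out tom.csr

# Sign the user certificate with the CA
openssl x509 -req -in tom.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out tom.crt -days 1

# Confirm the chain and the identity fields
openssl verify -CAfile ca.crt tom.crt
openssl x509 -in tom.crt -noout -subject
```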
kubeconfig
Kubeconfig is a file used by the kubectl command to access the Kubernetes API - treat it as a key to your cluster. A kubeconfig file is divided into 3 sections:
- user (defines the username and authentication method)
- cluster (API endpoint and optional certificate settings)
- context (connects user and cluster entries, optionally with a default namespace)
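A sketch of the three sections in one kubeconfig file (the server address and file names are assumptions):

```yaml
apiVersion: v1
kind: Config
clusters:
- name: my-cluster
  cluster:
    server: https://k8s.example.com:6443
    certificate-authority: ca.crt
users:
- name: tom
  user:
    client-certificate: tom.crt
    client-key: tom.key
contexts:
- name: tom@my-cluster
  context:
    cluster: my-cluster
    user: tom
    namespace: default
current-context: tom@my-cluster
```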
Example Role/ClusterRole manifest:
kind: Role # or ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: default # omitted in ClusterRole
name: pod-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
Note: an empty string in apiGroups ("") refers to the core API group (apiVersion: v1).
Example RoleBinding manifest:
User binding:
subjects:
- kind: User
name: tom
apiGroup: rbac.authorization.k8s.io
ServiceAccount binding:
subjects:
- kind: ServiceAccount
name: scripts
apiGroup: ""
Group binding:
subjects:
- kind: Group
name: admins
apiGroup: rbac.authorization.k8s.io
If you want to check your permission you can use kubectl auth can-i command:
kubectl auth can-i <verb> <resource>
$ kubectl auth can-i get pods
yes
Notice that the rules field in Role/ClusterRole and the subjects field in RoleBinding/ClusterRoleBinding are lists, which means that you can define many rules in 1 Role, and you can attach a Role to many identities in 1 RoleBinding.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
namespace: default
name: pod-job-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "watch", "list"]
...
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: pod-reader-binding
namespace: default
subjects:
- kind: User
name: tom
apiGroup: rbac.authorization.k8s.io
- kind: Group
...
roleRef:
kind: Role
name: pod-reader
apiGroup: rbac.authorization.k8s.io
Assigning Role to a Pod
In each Kubernetes cluster you can find predefined groups and service accounts; you can
list them by executing kubectl get serviceaccounts --all-namespaces.
Notice that in each namespace you can find a default service account that is attached to
all Pods in the namespace.
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings
Networking
Services
Services are REST objects that act as a persistent endpoint and load balancer for pod
communication. Each Service name and IP is propagated by the Kubernetes DNS to the entire
K8s cluster. A Service redirects traffic based on the relation between the selectors assigned
to Services and the labels assigned to Pods: traffic that arrives at the Service name is
redirected to Pods with the matching label. In the world of microservices, containers always
communicate with each other using Services - not directly!
Kubernetes supports 2 primary modes of service discovery - ENV variables and DNS.
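For the ENV-variable mode, the kubelet injects variables following a fixed naming convention: a Service named redis-master exposing port 6379 yields REDIS_MASTER_SERVICE_HOST and REDIS_MASTER_SERVICE_PORT in each container. A minimal sketch of that convention (the helper name is ours, not a Kubernetes API):

```python
def service_env_vars(service_name: str, host: str, port: int) -> dict:
    """Mimic the env variables Kubernetes injects for a Service.

    Dashes become underscores and the name is upper-cased,
    e.g. 'redis-master' -> 'REDIS_MASTER_SERVICE_HOST'.
    """
    prefix = service_name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": host,
        f"{prefix}_SERVICE_PORT": str(port),
    }

print(service_env_vars("redis-master", "10.0.0.11", 6379))
```

Note that these variables are only injected for Services that already exist when the Pod starts, which is one reason DNS-based discovery is usually preferred.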
ServiceSpec v1 core
- type - determines how the Service is exposed. Defaults to ClusterIP. Valid options are
ExternalName, ClusterIP, NodePort, and LoadBalancer
- clusterIP - defines the cluster IP of the service (can be set automatically by the master
or manually)
- externalIPs - a list of IP addresses for which nodes in the cluster will also
accept traffic for this service
- externalName - the external reference that kube-dns or an equivalent
will return as a CNAME record for this service
- loadBalancer - creates an external load balancer (when type is LoadBalancer)
- healthCheckNodePort - specifies the healthcheck nodePort for the service
- loadBalancerIP - only applies to type: LoadBalancer; the load balancer will be
created with the IP specified in this field (needs to be supported by the cloud provider)
- loadBalancerSourceRanges - restricts traffic through the cloud-provider
load balancer to the specified client IPs
- ports - the list of ports that are exposed by this service
- selector - routes service traffic to pods with label keys and values matching this selector
- sessionAffinity - supports "ClientIP" and "None". Used to maintain session affinity
(enables client-IP-based session affinity)
- sessionAffinityConfig - contains the configuration of session affinity:
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10 # default is 10800 (3 hours), max value is 86400
ServicePort v1 core
name - the name of this port within the service. This must be a DNS_LABEL
nodePort - the port on each node on which this service is exposed when type=NodePort or
LoadBalancer. Usually assigned by the system
port - The port that will be exposed by this service.
protocol - the IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default is
TCP.
targetPort - number or name of the port to access on the pods targeted by the service.
Number must be in the range 1 to 65535
Service Types:
Example manifest file:
kind: Service
apiVersion: v1
metadata:
name: nginx
spec:
# below section defines Service specification (ServiceSpec v1 core)
selector:
app: nginx
type: ClusterIP
ports:
# below section defines Service specification (ServicePort v1 core)
- port: 80
NodePort Service is used for setting access from outside of the cluster, the application is
then accessible by <node-ip>:<port> (port has to be from 30000-32767 range).
NodePort redirects traffic to automatically created ClusterIP Service. Services are sets
of routing policies that are injected into node configuration by kube-proxy.
apiVersion: v1
kind: Service
metadata:
labels:
service: nginx
name: nginx
spec:
# below section defines Service specification (ServiceSpec v1 core)
type: NodePort
selector:
app: nginx
ports:
# below section defines Service specification (ServicePort v1 core)
- name: "443"
port: 443
nodePort: 30443
LoadBalancer - This Service will provision external loadbalancer using cloud providers
infrastructure.
Example manifest:
kind: Service
apiVersion: v1
metadata:
name: nginx
spec:
# below section defines Service specification (ServiceSpec v1 core)
selector:
app: nginx
type: LoadBalancer
ports:
# below section defines Service specification (ServicePort v1 core)
- protocol: TCP
port: 80
ExternalName - creates an alias (CNAME record) in Kubernetes DNS. In this case
redirection is made at DNS level - not at routing level.
kind: Service
apiVersion: v1
metadata:
name: nginx
spec:
# below section defines Service specification (ServiceSpec v1 core)
type: ExternalName
externalName: example.nginx.site
Example externalIPs manifest:
kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  # below section defines Service specification (ServiceSpec v1 core)
  selector:
    app: nginx
  externalIPs:
  - 192.168.1.1
  ports:
  # below section defines Service specification (ServicePort v1 core)
  - name: http
    protocol: TCP
    port: 80
CNI
CNI specification:
https://github.com/containernetworking/cni/blob/master/SPEC.md
CNI components:
- CNI binary: configures network interface of the Pod
- Daemon: manages routing across the cluster (installed on every Node)
Network Policies
NetworkPolicies are Kubernetes internal firewalls. A network policy is a specification of
how groups of pods are allowed to communicate with each other and other network
endpoints.
By default, pods are non-isolated; they accept traffic from any source.
Pods become isolated by having a NetworkPolicy that selects them. Once there is any
NetworkPolicy in a namespace selecting a particular pod, that pod will reject any
connections that are not allowed by any NetworkPolicy.
NetworkPolicies require 2 components to work: CNI provider (Calico, Canal, Weave Net, etc)
and NetworkPolicy manifest. CNI providers can be deployed using Kubernetes manifest,
installation instructions can be found at websites of CNI providers.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: api-allow
spec:
podSelector:
matchLabels:
app: bookstore
role: api
ingress:
- from:
- podSelector:
matchLabels:
app: bookstore
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: test-network-policy
namespace: default
spec:
podSelector:
matchLabels:
role: db
policyTypes:
- Ingress
- Egress
ingress:
- from:
- ipBlock:
cidr: 172.17.0.0/16
except:
- 172.17.1.0/24
- namespaceSelector:
matchLabels:
project: myproject
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 6379
egress:
- to:
- ipBlock:
cidr: 10.0.0.0/24
ports:
- protocol: TCP
port: 5978
NetworkPolicy Spec:
podSelector: Each NetworkPolicy includes a podSelector which selects the grouping
of pods to which the policy applies (An empty podSelector selects all pods in the
namespace).
policyTypes: Can be set to Ingress, Egress, or both; the field indicates the traffic
direction the policy applies to (to a Pod or from a Pod). If no policyTypes are specified,
Ingress is set by default, and Egress is added when the policy contains any egress rules.
ingress: Each NetworkPolicy may include a list of whitelist ingress rules. Each rule
allows traffic which matches both the from and ports sections. Ingress field can contain 3
source fields: ipBlock, namespaceSelector and a podSelector.
egress: Each NetworkPolicy may include a list of whitelist egress rules. Each rule allows
traffic which matches both the to and ports sections. The example policy contains a single
rule, which matches traffic on a single port to any destination in 10.0.0.0/24.
Deny all ingress traffic in namespace (empty policy attached to all pods):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
spec:
podSelector: {}
policyTypes:
- Ingress
Allow all ingress traffic in namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress
For the most security-demanding environments and clients, like banks and financial services,
you should start with a default-deny-all NetworkPolicy and then add NetworkPolicies
that allow the traffic between your apps.
The deny-all policy selects all pods in a namespace and blocks all of the traffic. This is an
implementation of the principle of least privilege (PoLP) at the NetworkPolicy level.
When you create a deny-all (ingress & egress) policy, remember that to allow traffic between
your pods you need to define both ingress and egress policies and attach them to the
proper pods.
...
egress:
- to:
  - podSelector:
      matchLabels:
        app: "pod2"
policyTypes:
- Egress

...
ingress:
- from:
  - podSelector:
      matchLabels:
        app: "pod1"
- Deny from all other namespaces - network connectivity isolation at namespace level.
- Allow from all other namespaces - allowing access for a service that has to be available
to many namespaces, for example a central database.
- Allow from one namespace - allowing access for a service from a particular namespace.
DNS in Kubernetes
The main DNS function in Kubernetes is providing name resolution and service discovery for
K8s Services and Pods. The most popular DNS tool used in Kubernetes is CoreDNS.
Any pods created by a Deployment or DaemonSet exposed by a Service have the following
DNS resolution available:
pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example.
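DNS-based discovery usually goes through the Service FQDN, which has the form <service>.<namespace>.svc.<cluster-domain>. A small sketch of building that name (assuming the default cluster.local domain):

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the DNS name the cluster DNS answers for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("nginx", "web"))  # nginx.web.svc.cluster.local
```

Inside the same namespace, the short name (just "nginx") resolves as well thanks to the search domains in the Pod's /etc/resolv.conf.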
A Pod can use one of the following dnsPolicies:
- Default - the name resolution configuration is inherited from the node
- ClusterFirst - queries for the *.cluster.local domain are sent to the
kube-dns Service; other queries are sent to the upstream server inherited from the
node
- None - DNS settings from the Kubernetes environment are ignored; all settings have to be
provided through the Pod's dnsConfig field
spec:
  dnsPolicy: "None"
dnsConfig fields:
- nameservers - defines the name servers that will be used for name queries (max 3)
- searches - defines the domains in which the Pod will search for hostname queries (max 6)
- options - a list of optional objects
CoreDNS
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
name: coredns
CoreDNS has a pluggable architecture, plugins can be enabled using Corefile configuration
file.
https://coredns.io/plugins/
Ingress
An API object that manages external access to the services in a cluster, typically HTTP.
Ingress is a collection of load balancing, SSL termination and name-based virtual hosting
rules. Ingress requires 2 components: ingress controller and ingress object
manifests.
Nginx ingress controller can be installed using deployment manifest, instruction can be found
at: https://github.com/nginxinc/kubernetes-ingress/blob/master/build/README.md
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-ingress
spec:
defaultBackend:
service:
name: web
port:
number: 80
The above example can be used, for instance, with nginx, which will then reverse-proxy
traffic to your apps. In that case only nginx is reachable from outside of your
cluster.
Hostname wildcards: a host in an Ingress rule may contain a single wildcard label
(e.g. *.foo.com), which matches exactly one DNS label of the request's Host header.
Example manifest file (reverse proxy, redirection based on URL) - K8s Ingress config and its
nginx config counterpart:
Exact: Matches the URL path exactly and with case sensitivity.
Prefix: Matches based on a URL path prefix split by /. Matching is case sensitive and done
on a path element by element basis (a path element is one of the labels in the path
split by the / separator). A request matches path p if every element of p matches the
corresponding element of the request path.
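The element-wise Prefix matching can be sketched as follows (hypothetical helper, simplified - it ignores pathType edge cases such as multiple consecutive slashes):

```python
def prefix_match(rule_path: str, request_path: str) -> bool:
    """Return True if rule_path is an element-wise prefix of request_path.

    '/foo' matches '/foo' and '/foo/bar', but NOT '/foobar',
    because matching is done per path element, not per character.
    """
    rule = [e for e in rule_path.split("/") if e]
    request = [e for e in request_path.split("/") if e]
    return request[:len(rule)] == rule

print(prefix_match("/foo", "/foo/bar"))  # True
print(prefix_match("/foo", "/foobar"))   # False
```

This is why a Prefix rule for /foo does not accidentally capture /foobar, which a naive string-prefix check would.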
Example SSL certificate manifest files:
apiVersion: v1
kind: Secret
metadata:
  name: secret-web-ssl
type: Opaque
data:
  tls.crt: base64 encoded cert
  tls.key: base64 encoded key

nginx config counterpart:
server {
  server_name example.com;
  listen 80;
  ssl_certificate /ssl/tls.crt;
  ssl_certificate_key /ssl/tls.key;
  location / {
    proxy_pass http://nginx;
  }
}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-web-ssl
spec:
tls:
- hosts:
- example.com
secretName: secret-web-ssl
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: nginx
port:
number: 80
Storage
Volumes
Volumes are external data sources that can be mounted inside a pod. Kubernetes
supports many different volume types, each of which can be used by containers, and
different volume types behave in different ways. Each container in the Pod must
independently specify where to mount each volume.
awsElasticBlockStore
azureDisk
azureFile
cephfs
configMap
csi
downwardAPI
emptyDir
fc (fibre channel)
flocker
gcePersistentDisk
gitRepo (deprecated)
glusterfs
hostPath
iscsi
local
nfs
persistentVolumeClaim
projected
portworxVolume
quobyte
rbd
scaleIO
secret
storageos
vsphereVolume
Each volume type is mounted in a different way, instructions about mounting particular
volumes can be found at: https://kubernetes.io/docs/concepts/storage/volumes/
emptyDir:
- emptyDir is a temporary directory that is used for sharing data between containers in a
Pod.
- emptyDir lives as long as Pod is running on a k8s node
- By default Kubernetes is using a storage medium that is attached to a node, but there is
also a possibility to mount tmpfs (memory) for high speed access.
- emptyDir on host is stored at:
/var/lib/kubelet/pods/$PODUID/volumes/kubernetes.io~empty-dir/$VOLUMENAME
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
restartPolicy: Always
containers:
- name: nginx
image: nginx
imagePullPolicy: Always
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
hostPath:
- hostPath is useful for giving a Pod access to host internals, for example:
/var/lib/docker or /sys
- Pods on different nodes can behave differently, depending on the host files' content
- Files or directories can be created on the node
awsElasticBlockStore
- Content of volume is not deleted with Pod termination - volume is just unmounted
- EBS volume needs to be created before mount
- K8s nodes need to be EC2 instances located in the same region and availability zone as the
EBS volume
- An EBS volume can be mounted to only one EC2 instance at a time
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
restartPolicy: Always
containers:
- name: nginx
image: nginx
imagePullPolicy: Always
volumeMounts:
- mountPath: /images
name: images-volume
volumes:
- name: images-volume
awsElasticBlockStore:
volumeID: $VOLUME_ID
fsType: ext4
gcePersistentDisk
- Content of volume is not deleted with Pod termination - volume is just unmounted
- Persistent disk volume needs to be created before mount
- K8s nodes need to be GCE VMs located in the same project and zone as the persistent
disk
- A persistent disk can be mounted in r/w mode to only one GCE VM at a time.
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
containers:
- name: nginx
image: nginx
volumeMounts:
- mountPath: /images
name: images-volume
volumes:
- name: images-volume
gcePersistentDisk:
pdName: images-volume
fsType: ext4
azureDisk
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
restartPolicy: Always
containers:
- name: nginx
image: nginx
imagePullPolicy: Always
volumeMounts:
- mountPath: /images
name: images-volume
volumes:
- name: images-volume
  azureDisk:
    diskName: test.vhd
    diskURI: https://<account>.blob.core.windows.net/vhds/test.vhd
Useful links:
https://github.com/kubernetes/examples/tree/master/staging/volumes/azure_disk
https://github.com/kubernetes/examples/tree/master/staging/volumes/azure_file
ConfigMaps
ConfigMaps in Kubernetes are used for storing different types of application configurations
(treat them as config files from /etc). ConfigMaps can be created using kubectl create
configmap <map-name> <data-source> command or based on ConfigMap manifest
file (recommended).
...
containers:
- name: nginx
image: nginx
volumeMounts:
- name: example-configmap-volume
mountPath: /example
volumes:
- name: example-configmap-volume
configMap:
name: example-configmap
---
apiVersion: v1
kind: ConfigMap
metadata:
name: example-configmap
data:
filename: "Content of example file"
Example manifest snippet that loads all ConfigMap entries as env variables in a container:
containers:
- name: nginx
  image: nginx
  envFrom:
  - configMapRef:
      name: example-configmap
Example manifest snippet that mounts only one env variable from ConfigMap in container:
containers:
- name: nginx
image: nginx
env:
- name: ENV_VAR_NAME
valueFrom:
configMapKeyRef:
name: example-configmap
key: ENV1
---
apiVersion: v1
kind: ConfigMap
metadata:
name: example-configmap
data:
ENV1: "Content of ENV1 example variable"
ENV2: "Content of ENV2 example variable"
Secrets
Secrets are designed to store sensitive security data such as passwords, certificates, or
tokens. The main purpose of Secrets is storing data such as DB names, DB passwords, and
tokens (remember that all values are base64-encoded). There are three types of secrets:
- docker-registry Create a secret for use with a Docker registry
- generic Create a secret from a local file, directory or literal value
- tls Create a TLS secret
Generic Secrets:
Example manifest snippet that mounts Secret as env variables in container:
...
containers:
- name: nginx
image: nginx
envFrom:
- secretRef:
name: api-credentials
---
apiVersion: v1
data:
PASSWORD: c2VjcmV0LXBhc3N3b3JkCg==
USER: YWRtaW4K
kind: Secret
metadata:
name: secrets
Container:
...
containers:
- name: nginx
image: nginx
envFrom:
- secretRef:
name: api-credentials
volumeMounts:
- name: secrets-volume
mountPath: /secrets
volumes:
- name: secrets-volume
secret:
secretName: index-html
Secret:
apiVersion: v1
data:
index.html: c29tZSBjb250ZW50
kind: Secret
metadata:
name: index-html
Encoding/decoding strings using base64 (all values in Secret manifests are base64-encoded):
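For example, the round trip a Secret value goes through (the same transformation the base64 CLI performs with `printf '%s' 'admin' | base64` and `base64 -d`; note the lack of a trailing newline, which would otherwise sneak into the secret):

```python
import base64

# encode a value for a Secret manifest
encoded = base64.b64encode(b"admin").decode()
print(encoded)                              # YWRtaW4=

# decode a value taken from a manifest
print(base64.b64decode(encoded).decode())   # admin
```

Remember that base64 is an encoding, not encryption - anyone who can read the manifest can read the value.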
docker-registry Secrets
If you want to pull images from a private repository, you need to authenticate to your Docker
registry provider. You do so by using:
docker login
The login command creates or updates a config.json file that holds an authorization token.
cat ~/.docker/config.json
{
"auths": {
"asia.gcr.io": {},
"https://asia.gcr.io": {},
...
},
"HttpHeaders": {
"User-Agent": "Docker-Client/18.09.0 (darwin)"
},
"credsStore": "osxkeychain"
}
By default, Docker looks for the native binary on each of the platforms, i.e. “osxkeychain”
on macOS, “wincred” on windows, and “pass” on Linux. A special case is that on Linux,
Docker will fall back to the “secretservice” binary if it cannot find the “pass” binary. If
none of these binaries are present, it stores the credentials (i.e. password) in base64
encoding in the config files.
If you need access to multiple registries, you can create one secret for each registry.
Kubelet will merge any imagePullSecrets into a single virtual ~/.docker/config.json when
pulling images for your Pods.
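When no credential helper is used, the auth value stored for a registry entry is simply base64("user:password"). A sketch of producing such an entry (registry name and credentials are made-up values):

```python
import base64
import json

def registry_auth_entry(registry: str, user: str, password: str) -> dict:
    """Build the 'auths' entry Docker stores in ~/.docker/config.json
    when no credential helper is configured."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"auths": {registry: {"auth": token}}}

print(json.dumps(registry_auth_entry("registry.example.com", "tom", "s3cret")))
```

This is also the content that ends up (base64-encoded once more) in the .dockerconfigjson field of a docker-registry Secret.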
docker-registry secrets can be attached to your applications at 2 levels:
Pod level:
containers:
- name: private-registry-container
image: <your-private-image>
imagePullSecrets:
- name: regcred
ServiceAccount level:
apiVersion: v1
kind: ServiceAccount
metadata:
name: regcred
imagePullSecrets:
- name: regcred
Persistent Volumes
accessModes:
- ReadWriteOnce - the volume can be mounted as read-write by a single node
- ReadOnlyMany - the volume can be mounted as read-only by many nodes
- ReadWriteMany - the volume can be mounted as read-write by many nodes
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: claim1
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast
resources:
requests:
storage: 30Gi
StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
Deployment snippet:
...
containers:
- name: nginx
image: nginx
volumeMounts:
- name: dynamic-claim1-volume
mountPath: /claim1
volumes:
- name: dynamic-claim1-volume
  persistentVolumeClaim:
    claimName: claim1
Features
NodeSelector
Using node selectors you can pin Pods to nodes, by adding labels to nodes and setting
nodeSelectors on Pods.
Adding labels:
kubectl label node minikube disk=ssd
Configuring nodeSelector:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
nodeSelector:
disk: ssd
containers:
…
Node affinity
There are 2 types of node affinity rules:
1. Hard requirement - if the rules are not met, the Pod will not be scheduled.
- requiredDuringSchedulingIgnoredDuringExecution
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: env
operator: In
values:
- production
2. Soft requirement - even if the rules are not met, the Pod can still be scheduled; this is
just a preference.
- preferredDuringSchedulingIgnoredDuringExecution
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: env
operator: In
values:
- staging
Interpod affinity rules can influence scheduling based on the labels of other pods that are
already running.
Remember that each pod is running inside a namespace so affinity rules apply to the pods in
a particular namespace (if the namespace is not defined, the affinity rule will apply to the
namespace of a pod).
There are 2 types of interpod affinity rules:
1. Hard requirement
- requiredDuringSchedulingIgnoredDuringExecution
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: disk
operator: In
values:
- ssd
topologyKey: kubernetes.io/hostname
2. Soft requirement
- preferredDuringSchedulingIgnoredDuringExecution
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: storage
operator: In
values:
- ssd
topologyKey: kubernetes.io/hostname
When to use interpod affinity/anti-affinity rules?
Affinity: collocating 2 pods that are tightly connected to each other, so that traffic
between them doesn't have to be sent through the network (performance benefits).
Example: apps that use a Redis cache.
Anti-affinity: ensuring that a pod is scheduled at most once per node.
Example: CPU- or memory-demanding applications.
By default, in a cluster with at least 2 nodes, the master has the following taint assigned:
node-role.kubernetes.io/master:NoSchedule
which disallows scheduling on master nodes (by default Pods carry no tolerations).
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
operators:
- Equal
- Exists
effects:
- NoSchedule
- PreferNoSchedule
- NoExecute
Lifecycle Hooks
Lifecycle Hooks allow for attaching different types of actions to the pod lifecycle. Hooks
define when an action should happen: after the container starts or before its termination.
Handlers define what to do: exec a command or send an HTTP request.
Hooks:
PostStart - executed right after container creation; in case of a PostStart failure the
container is terminated and restarted. Until PostStart ends, the Pod stays in Pending state.
PreStop - executed before Pod termination. Until PreStop ends the Pod stays in Terminating
state; PreStop needs to finish before the Pod deletion signal is sent.
Handlers:
exec - runs a command; resources used by this handler are counted as container
resources
http - executes an HTTP request against a specific endpoint on the container
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "script.sh"]

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "script.sh"]
livenessProbe
Liveness probe checks container health and restarts the container when the probe fails.
If your main process already exits with a non-zero code on failure, the kubelet restarts it
anyway, so you may not need a livenessProbe.
readinessProbe
Readiness probe makes sure the Pod receives traffic only when the probe succeeds. The
readiness probe keeps running during the Pod's entire life.
startupProbe
Startup probe (if defined) is the first probe that is executed. The livenessProbe starts
only after the startupProbe succeeds (the startup probe does not probe your app again after
a successful execution).
Container snippet:
containers:
- name: nginx
  image: nginx
  readinessProbe:
    httpGet:
      path: /healthz
      port: 81
    initialDelaySeconds: 15
    timeoutSeconds: 1
    periodSeconds: 15
Resources (requests & limits)
Container snippet:
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
Example ResourceQuota manifest:
apiVersion: v1
kind: ResourceQuota
metadata:
name: quota
spec:
hard:
requests.cpu: 800m
requests.memory: 1800Mi
limits.cpu: 2400m
limits.memory: 1800Mi
LimitRange
LimitRange sets default requests and limits for all containers in a namespace that do not
have resource limits and requests defined.
apiVersion: v1
kind: LimitRange
metadata:
name: mem-limit-range
spec:
limits:
- default:
memory: 512Mi
cpu: 1
defaultRequest:
memory: 256Mi
cpu: 1
type: Container
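The defaulting step can be sketched as follows (simplified; the real admission plugin also validates min/max constraints and handles cpu and memory separately):

```python
def apply_limit_range(container: dict,
                      default_limits: dict,
                      default_requests: dict) -> dict:
    """Fill missing resources the way a LimitRange admission step would:
    containers that already define limits/requests are left untouched."""
    resources = container.setdefault("resources", {})
    resources.setdefault("limits", dict(default_limits))
    resources.setdefault("requests", dict(default_requests))
    return container

c = apply_limit_range({"name": "nginx", "image": "nginx"},
                      {"memory": "512Mi", "cpu": "1"},
                      {"memory": "256Mi", "cpu": "1"})
print(c["resources"])
```

With the mem-limit-range manifest above, a container created without any resources section would therefore end up with 512Mi limits and 256Mi requests.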
QoS
Kubernetes distinguishes 3 classes of QoS, each tightly related to the resource
requests and resource limits. Guaranteed has the highest priority, then Burstable, then
Best-Effort.
- Guaranteed (limits=requests)
- Burstable (limits≠requests)
- Best-Effort (no limits and no requests)
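A simplified sketch of the classification (single container, requests and limits as plain dicts; the real kubelet logic also handles requests being defaulted from limits and per-resource comparison):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container Pod."""
    if not requests and not limits:
        return "BestEffort"
    if limits and requests == limits:
        return "Guaranteed"
    return "Burstable"

print(qos_class({"cpu": "500m", "memory": "128Mi"},
                {"cpu": "500m", "memory": "128Mi"}))  # Guaranteed
print(qos_class({"cpu": "250m"}, {"cpu": "500m"}))    # Burstable
print(qos_class({}, {}))                              # BestEffort
```

Under memory pressure, Best-Effort Pods are evicted first and Guaranteed Pods last.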
initContainers
Init Containers are a special kind of container - they are always started before app
containers. Init containers support all the fields and features of app Containers except:
- They are always run to completion
- Each one must complete successfully before the next one is started.
- Init Containers do not support readiness probes
Deployment snippet:
...
template:
metadata:
labels:
app: nginx
spec:
initContainers:
- name: clone-repo
image: busybox
command: ['sh', '-c', 'git clone ...']
containers:
...
Pod Priority
Versioning constraints:
In Kubernetes 1.9 and later, Priority also affects scheduling order of Pods and
out-of-resource eviction ordering on the Node.
Pod priority and preemption have been moved to stable since Kubernetes 1.14 and are
enabled by default in this release and later.
Preemption - ability to kill lower priority Pods to schedule higher priority Pods.
Example assignment of PriorityClassName snippet:
containers:
- name: nginx
image: nginx
imagePullPolicy: Always
priorityClassName: high-priority
Assigning priorityClassName to a Pod, also affects scheduling priority. Pods with higher
priority values can be taken from the scheduler queue before Pods with a lower priority.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "Priority Class description"
Logging
Every string inside of the container that is redirected to /dev/stdout or /dev/stderr is
by default written to a JSON file on the host (this is the default logging driver in Docker -
https://docs.docker.com/engine/admin/logging/overview).
Logging Architectures
Node-level logging
https://kubernetes.io/docs/concepts/cluster-administration/logging/
With that approach container logs are saved at
/var/log/containers/POD-NAME_NAMESPACE_CONTAINER-NAME.log on each node.
The kubectl logs command reads those files. By default, if a container restarts, the kubelet
keeps one terminated container with its logs. If a pod is evicted from the node, all
corresponding containers are also evicted, along with their logs.
Log rotation
Kubernetes by itself does not manage log rotation at node level. Log rotation is configured
at the cluster deployment step; it can be configured using logrotate (log rotation is
executed every hour and when a log file exceeds 10MB) or container runtime settings.
There are 2 cases in which the kubectl logs command will not show proper logs:
- Logging drivers are used (they redirect logs from STDOUT and STDERR to files,
external hosts, logging systems, or databases)
- The process launched in a container writes its logs to files; the workaround to send
them to /dev/stdout or /dev/stderr is to create a soft link between the log file
and /dev/stdout or /dev/stderr.
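The soft-link workaround can be demonstrated with plain files (in a real container you would link the log file to /dev/stdout, e.g. `ln -sf /dev/stdout /var/log/nginx/access.log`, as the official nginx image does):

```python
import os
import tempfile

# demonstrate the symlink trick with plain files instead of /dev/stdout
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "collector")   # stands in for /dev/stdout
link = os.path.join(workdir, "app.log")       # the path the app writes to
open(target, "w").close()
os.symlink(target, link)

with open(link, "a") as log:                  # the app appends to its log file...
    log.write("hello from the app\n")

with open(target) as collected:               # ...but the data lands in the link target
    print(collected.read(), end="")
```

The application keeps writing to the path it knows, while everything it writes actually flows to the symlink's target.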
https://kubernetes.io/docs/concepts/cluster-administration/logging/
This approach assumes installing a logging agent on every node (with DaemonSets) and a
central log collector that gathers the logs streamed from all agents. Popular Kubernetes
logging agents: Fluentd, Logstash, Graylog, Fluent Bit. The logging agent gathers logs from
the container log files and ships them to the logging backend before they are rotated by
logrotate.
Sidecar container with logging agent
https://kubernetes.io/docs/concepts/cluster-administration/logging/
Example HPA manifest V1 (CPU):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: php-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php
minReplicas: 2
maxReplicas: 4
targetCPUUtilizationPercentage: 80
Example HPA manifest V2beta1 (memory):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: php-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php
minReplicas: 1
maxReplicas: 4
metrics:
- type: Resource
resource:
name: memory
targetAverageUtilization: 80
Example HPA manifest V2beta2 (CPU and memory):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: php
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php
minReplicas: 1
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
- type: Resource
resource:
name: memory
target:
type: AverageValue
averageValue: 100Mi
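The autoscaler's core formula is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the minReplicas/maxReplicas bounds; a sketch (simplified - the real controller also applies a tolerance band and stabilization windows):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 4) -> int:
    """HPA scaling formula, clamped to the min/max bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, current_metric=120, target_metric=80))  # ceil(3.0) = 3
```

For example, 2 replicas at 120% average CPU against an 80% target scale out to 3 replicas.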
Security
Security basics
1. Use Secrets
2. Kubernetes API - enable Role-Based Access Control:
kube-apiserver --authorization-mode=RBAC,...
3. Use AppArmor and seccomp
4. Run potentially malicious applications in sandboxes (gVisor & Kata Containers)
5. Container hardening:
- reduce the attack surface
- run as a non-root user
- set a read-only filesystem
6. Scan your images for CVEs using Anchore, Clair or Trivy
7. Pods - configure Pod Security Policies:
kube-apiserver --enable-admission-plugins=PodSecurityPolicy,...
8. Enable mTLS (service mesh)
9. Check workloads during runtime (Falco)
10. Secure your Dashboard access (kubectl proxy or kubectl port-forward)
11. Set NetworkPolicies that allow only the needed communication
12. Use CIS Benchmarks (kube-bench)
13. Verify platform binaries (sha512sum)
14. Disable the ServiceAccount token mount (automountServiceAccountToken: false)
15. API restrictions:
- don't allow anonymous access
- close the insecure port
- don't expose the API to the external world
- use RBAC
- restrict access from nodes (NodeRestriction admission controller)
16. Encrypt etcd at rest
17. Use OPA to validate user configurations
18. Run immutable containers (remove shells, make the filesystem read-only, run the
container as non-root)
19. Enable auditing
Useful links:
https://kubernetes-security.info/
https://www.cisecurity.org/cis-benchmarks/
Pentests
- docker-bench-security (https://github.com/docker/docker-bench-security)
- kubernetes-security-benchmark
(https://github.com/mesosphere/kubernetes-security-benchmark)
- kubernetes-cis-benchmark (https://github.com/neuvector/kubernetes-cis-benchmark)
- kube-bench (https://github.com/aquasecurity/kube-bench)
- kube-hunter (https://github.com/aquasecurity/kube-hunter)
SecurityContexts
Security context defines privileges and access control settings for Pods and containers.
Using security contexts you can define settings such as the user or group IDs that are
assigned to the main container process, or the supplemental group that owns the pod's
volumes.
apiVersion: v1
kind: Pod
metadata:
name: security-context-pod
spec:
securityContext:
runAsUser: 1001
runAsGroup: 3000
fsGroup: 2000
containers:
- name: security-context-container
image: ubuntu
command: ["/bin/sh","-c","sleep infinity"]
volumeMounts:
- name: sec-vol
mountPath: /data
securityContext:
allowPrivilegeEscalation: false
runAsUser: 1000
volumes:
- name: sec-vol
emptyDir: {}
Notice that some options can be defined at both pod and container level. In that case the
configuration defined at container level takes precedence.
ubuntu@security-context-pod:/$ id
uid=1000 gid=3000 groups=3000,2000
Security context fields that are available only at Pod level:
supplementalGroups - a list of groups applied to the first process run in each container,
in addition to the container's primary GID
sysctls - a list of namespaced sysctls used for the pod
Security context fields that are available only at container level:
allowPrivilegeEscalation - controls whether a process can gain more privileges than
its parent process (set to true when container runs as Privileged or has CAP_SYS_ADMIN
capability)
capabilities - linux capabilities attached to container (add or drop)
privileged - defines container permissions; privileged is equivalent to root on the host.
procMount - denotes the type of proc mount to use for the containers
readOnlyRootFilesystem - whether this container has a read-only root filesystem
Security context fields that are available at Pod and container level:
runAsGroup - defines group that will be assigned to all containers in the pod
runAsNonRoot - indicates that the containers must run as a non-root user
runAsUser - overwrites user ID defined at image level
seLinuxOptions - the SELinux context applied to the container (by default a random SELinux
context is assigned)
seccompProfile - seccomp options used by container
windowsOptions - windows specific options
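As a sketch combining several of the container-level fields above (the capability choices are illustrative):

```yaml
securityContext:
  privileged: false
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]                 # drop every capability by default
    add: ["NET_BIND_SERVICE"]     # re-add only what the workload needs
```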
Troubleshooting
Troubleshooting Pod errors
ImagePullBackOff error - most commonly an issue with accessing the Docker registry. There
are three primary culprits besides network connectivity issues:
- The image tag is incorrect
- The image doesn't exist (or is in a different registry)
- Kubernetes doesn't have permissions to pull that image
There is no observable difference in Pod status between a missing image and incorrect
registry permissions. In either case, Kubernetes will report an ErrImagePull status for the
Pods.
CrashLoopBackOff tells us that Kubernetes is trying to launch this Pod, but one or more of
the containers is crashing or getting killed (kubectl logs).
Missing ConfigMap (ContainerCreating) - ConfigMap declared in Pod specification but not
existing in API:
secret-pod 0/1 ContainerCreating 0 40m
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 2s (x2 over 20s) kubelet, minikube Error:
configmaps "test-config" not found
Missing Secret (ContainerCreating) - Secret declared in Pod specification but not existing
in API:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 24m (x11 over 30m) kubelet, minikube
MountVolume.SetUp failed for volume "test-secret" : secrets "test-secret"
not found
If a Pod is stuck in Pending it means that it cannot be scheduled onto a node. Generally this
is because insufficient resources of one type or another prevent scheduling
(kubectl describe). You can also check the Dashboard views.
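A minimal sketch of a Pod that stays Pending (the resource requests are chosen to exceed any typical node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "64"        # more CPU than any node offers
        memory: 256Gi
```

kubectl describe pod/pending-demo would then show a FailedScheduling event with a reason such as Insufficient cpu.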
Troubleshooting Services
Issues with resolving Services names:
Could not resolve host: <servicename>
How to debug it?
kubectl get svc
kubectl get endpoints
kubectl exec -ti <podname> -- nslookup <servicename>
kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://<servicename>:80
If the error persists, there is likely an issue with kube-dns (or CoreDNS on newer clusters).
Or check your service configuration using Kubernetes dashboard
Node components run as system services; check their status and logs with:
- systemctl and journalctl for systemd
- service and initctl for upstart
Here are the locations of the relevant log files (on systems without systemd):
Master
/var/log/kube-apiserver.log - API Server, responsible for serving the API
/var/log/kube-scheduler.log - Scheduler, responsible for making scheduling
decisions
/var/log/kube-controller-manager.log - Controller that manages replication
controllers
Worker Nodes
/var/log/kubelet.log - Kubelet, responsible for running containers on the node
/var/log/kube-proxy.log - Kube Proxy, responsible for service load balancing
Troubleshooting commands
General:
kubectl get <resource> (--all-namespaces)
kubectl get <resource>/<resourcename> (-o=wide, -o=yaml or -o=json)
kubectl describe <resource>/<resourcename>
Cluster:
kubectl api-resources or api-versions
kubectl cluster-info (dump)
kubectl get componentstatuses
kubectl get nodes or pods
kubectl top nodes or pods
kubectl get events
kubectl client and server version:
kubectl version
deployment:
kubectl describe deployment/<deployname>
kubectl describe replicaset/<rsname>
Pod:
kubectl get pods
kubectl describe pod/<podname>
kubectl logs <podname> (--previous) (-f)
kubectl exec -it <podname> -- command
RBAC:
kubectl auth can-i <verb> <resource>
Service:
kubectl get svc
kubectl get endpoints
kubectl exec -ti <podname> -- nslookup <servicename>
kubectl run client --image=appropriate/curl --rm -ti --restart=Never --command -- curl http://<servicename>:80
- Google Kubernetes Engine (GKE, see
https://cloud.google.com/kubernetes-engine/docs/tutorials/)
- Azure Kubernetes Service (AKS, see
https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-app)
Helm packages repository: https://github.com/helm/charts
Helm Hub: https://hub.helm.sh/
Installing Tiller (Helm 2 only; Tiller was removed in Helm 3):
helm init
values.yaml:
nginx:
  image: nginx
In templates (manifests):
{{ .Values.nginx.image }}
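For example, a hypothetical templates/deployment.yaml fragment could consume that value:

```yaml
# hypothetical fragment of templates/deployment.yaml
spec:
  containers:
  - name: web
    image: {{ .Values.nginx.image }}   # renders to "nginx"
```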
Dynamically adding files as ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: index-html
data:
  index.html: |
{{ .Files.Get "index.html" | indent 4 }}
values.yaml:
nginx:
  dockerImage: nginx
  pullPolicy: Always
  envVars:
    DB: mysql
    DB_PASS: password
ConfigMap:
apiVersion: v1
data:
{{- range $key, $value := .Values.nginx.envVars }}
  {{ $key }}: {{ $value | quote }}
{{- end }}
kind: ConfigMap
metadata:
  name: nginx-env-vars
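Rendered with the values.yaml above, this template should produce roughly the following ConfigMap (a sketch of the expected output, not output captured from the source):

```yaml
apiVersion: v1
data:
  DB: "mysql"
  DB_PASS: "password"
kind: ConfigMap
metadata:
  name: nginx-env-vars
```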
values.yaml:
nginx:
  creds:
    DB: mysql
    DB_PASS: password
Secret:
apiVersion: v1
kind: Secret
metadata:
  name: nginx-credentials
data:
{{- range $key, $value := .Values.nginx.creds }}
  {{ $key }}: {{ $value | toString | b64enc | quote }}
{{- end }}
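Helm's b64enc is plain Base64 encoding, so the values written into the Secret can be reproduced and verified in a shell:

```shell
# encode a Secret value the way Helm's b64enc does
printf 'password' | base64            # prints cGFzc3dvcmQ=

# decode to verify the round trip
printf 'cGFzc3dvcmQ=' | base64 --decode   # prints password
```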
Notes