Kubernetes

pocket guide
Arye Afshari
Mohsen Shojaei Yegane

The Kubernetes Pocket Guide is a small and easy-to-use document that helps you
understand Kubernetes better. Inside this booklet, we have taken great care to gather
and explain all the important ideas and knowledge about Kubernetes in a simple way.
Whether you're just starting out or already have experience with it, this guide will be
your helpful companion. It provides clear explanations and makes it easier for you to
learn the basics of Kubernetes.

Sponsored by Learnk8s, this booklet is offered freely to the public.

Learnk8s, an esteemed educational platform, specializes in Kubernetes
training courses, workshops, and educational articles. Additionally, this
booklet is available in another standardized format that was produced for Learnk8s.

Note: The content of this booklet is written based on Kubernetes version 1.25.

Table of Contents

Core Concept
  Kubernetes
  Kubernetes Architecture
  Methods of building k8s cluster
  Kubectl
  Pod
  Workload
  Deployment
  Namespace
  Resource quota, Limit range
  Resource requirements & Limit
  Service
  Endpoint
  Dns
  Daemonset
  Static pod
  Autoscaling (HPA, VPA)
  Job & Cronjob
  Statefulset
  Headless service
  Statefulset & storage

Scheduling
  How scheduling works?
  Label & selector
  Annotations
  Node selector
  Affinity & anti-affinity
  Taint & toleration
  Taint/tolerate & node affinity
  Priority class & preemption
  Pod distribution budget
  Bin packing

Lifecycle Management
  Configmap, Secret
  Init Container
  Pod Lifecycle
  Sidecar Container
  Rollout & Rollback
  Probes
  Node Maintenance
  Cluster upgrade
  Backup & Restore

Security
  Security
  Authentication
  Authorization (RBAC)
  Admission control
  Service Account
  Api groups
  Kubeconfig
  Authentication with X509
  Auditing
  RuntimeClass
  Network Policy
  Security Context
  Image security
  Gatekeeper

Storage
  HostPath volume
  EmptyDir volume
  Persistent volume (pv) & pvc
  Static & Dynamic provisioning

Addons
  How to deploy an application in k8s?
  Kustomize
  Helm
  Operator
  Ingress
  Cert-Manager

Kubernetes (k8s)
Kubernetes, also known as K8s, is an open-source platform for managing containerized workloads and services. It provides a way to deploy, scale, and manage
containerized applications across a cluster of nodes. Kubernetes was originally developed by Google and is now maintained by the Cloud Native Computing
Foundation (CNCF).

Kubernetes provides a set of powerful abstractions and APIs for managing containerized applications and their dependencies in a standardized and consistent
way. It allows you to declaratively define your application's desired state in the form of a set of Kubernetes objects (such as pods, services, deployments, config
maps, and many others), and then Kubernetes takes care of actually running and managing those objects on a cluster of machines.

[Figure: Kubernetes cluster. You create YAML manifests that describe the Kubernetes objects that make up your application (for example StatefulSet, ConfigMap, Deployment, Service, Secret, Ingress, ServiceAccount, PVC, and StorageClass). The master node runs kube-api, etcd, the controller manager (C-M), and the scheduler; each worker node runs the kubelet and a CRI runtime.]

Why do you need Kubernetes and what can it do?


Simplify container management: Kubernetes provides a unified API for managing containers, making it easier to deploy and manage containerized applications across multiple hosts or cloud providers.

Improve scalability: Kubernetes makes it easy to scale containerized applications up or down based on demand, ensuring that applications can handle increased traffic or demand without downtime or disruption.

Enhance resiliency: Kubernetes provides built-in fault tolerance and self-healing capabilities, which can help keep applications running even in the face of hardware or software failures.

Increase automation: Kubernetes automates many of the tasks involved in deploying and managing containerized applications, such as rolling updates, scaling, and load balancing. This can help reduce the burden on operations teams and improve efficiency.

Simplify application deployment: Kubernetes provides a consistent way to deploy and manage containerized applications across different environments, such as on-premises data centers or public cloud providers.

Provide flexibility: Kubernetes is highly configurable and extensible, allowing developers and operations teams to customize it to meet their specific needs. This includes support for different container runtimes, storage systems, and networking plugins.

Kubernetes lets you choose the implementation behind each of three standard interfaces in your cluster: the Container Runtime Interface (CRI), the Container Network Interface (CNI), and the Container Storage Interface (CSI).

The CRI is a standardized interface between Kubernetes and the container runtime that is responsible for starting and stopping containers. The CRI abstracts away the details of the container runtime, allowing Kubernetes to work with any container runtime that implements the CRI. This makes it possible to use different container runtimes on different nodes in the same cluster, or to switch to a different container runtime without having to modify your applications or infrastructure.

The CNI is a standard for configuring network interfaces for Linux containers. Kubernetes uses a CNI plugin to configure the network interfaces for the containers running on your cluster. The CNI plugin is responsible for setting up the network namespace for the container, configuring the IP address and routing, and setting up any necessary network policies or security rules. By using a CNI plugin, Kubernetes makes it easy to switch between different networking solutions or to use multiple networking solutions in the same cluster.

[Figure: layers of a Kubernetes node, from top to bottom: WebApps & Services; Service Management, Orchestration, Scheduling, and Resource Management; Container Runtime, Container Network, and Container Storage; Machine & OS; Machine infrastructure.]

The CSI is a standard for exposing storage systems to container orchestrators like Kubernetes. Kubernetes uses a CSI driver to interact with the underlying storage system. The CSI driver is
responsible for managing the lifecycle of the storage volumes used by your applications, including creating, deleting, and resizing volumes. By using a CSI driver, Kubernetes makes it easy to
use a wide range of storage systems with your applications, including cloud-based storage solutions, on-premises storage systems, and specialized storage solutions for specific use cases.

Container orchestration is the process of managing, deploying, and scaling containers in a distributed environment. It involves automating the deployment and management of containerized
applications across a cluster of hosts, and ensuring that the containers are running as expected. Container orchestration systems typically provide features such as container scheduling, load
balancing, service discovery, health monitoring, and automated scaling based on demand. Today, Kubernetes is the most popular container orchestration platform used globally.

A k8s cluster is a set of nodes that work together to run containerized applications. The nodes can be virtual or physical machines, and
they typically run Linux as the operating system. The cluster consists of two main types of nodes:

Master Node(s): The master node is responsible for managing the overall state and control of the cluster.

Worker Node(s): The worker nodes, also known as minion nodes, are responsible for executing the actual workloads and hosting the containers.

[Figure: Kubernetes cluster. Two master nodes, each running etcd, the controller manager (C-M), the scheduler, and kube-api, manage three worker nodes, each running the kubelet and a CRI runtime.]
Kubernetes Architecture:

Kubernetes is built on a master-worker architecture. The master node is responsible for managing the overall state of the cluster, while the worker nodes
run the actual application workloads. The components of the Kubernetes master node include the API server, etcd, scheduler, and controller manager.
The worker nodes run the kubelet, kube-proxy, and the container runtime.

[Figure: the cluster control plane and worker nodes. On the master: kube-controller-manager (runs Kubernetes controllers), kube-scheduler (decides which node should be used for each Pod), etcd (key-value database used as backing store for all cluster configuration data), and kube-apiserver (allows interacting with the control plane). On each Linux worker node: components that maintain network rules on the node, manage containers on the node, and run containers on the node.]

The kube-apiserver is the control plane component that serves as the primary management entity for the cluster. It handles all communication and authentication, and every other component of the cluster talks to it rather than to each other. It is also the gateway through which the state of the cluster is read and changed, which lets the other components make sure that everything is running as expected.

Etcd is a distributed key-value database that is used by Kubernetes to store cluster state data. It is responsible for maintaining the configuration details of the Kubernetes cluster, and the kube-apiserver is the only component that interacts with it directly. Etcd provides a reliable and highly available data store for Kubernetes, ensuring that the cluster can recover quickly from failures and maintain consistency across all nodes.

The kube-scheduler is responsible for assigning newly created pods to nodes in the cluster. It obtains the list of unassigned pods through the kube-apiserver and, using a variety of algorithms and configurations, determines which node each pod should run on. Once it has made its decision, the kube-scheduler reports it to the kube-apiserver; the kubelet on the chosen node learns of the assignment from the kube-apiserver, starts the pod's containers, and begins running the workload.

The kube-controller-manager is a collection of controllers that manage various aspects of the Kubernetes cluster. These controllers include the node controller, which watches the state of nodes in the cluster and takes actions to ensure that nodes are stable and healthy. For example, if a node fails, the node controller will take actions to ensure that the workloads running on the failed node are rescheduled onto other nodes in the cluster. Other controllers in the kube-controller-manager include the replication controller, endpoint controller, and service account and token controllers, which manage other aspects of the cluster such as scaling, networking, and security.

The kubelet is the primary node agent that runs on each worker node in the Kubernetes cluster. It is responsible for managing and monitoring the state of containers running on the node, as well as ensuring that the containers are healthy and running as expected. The kubelet communicates with the kube-apiserver to receive instructions on which pods to run on the node, and reports back to the control plane with updates on the status of the containers and their health. Additionally, the kubelet manages the networking and storage configurations for the containers running on the node.

The kube-proxy is responsible for managing the networking and routing configurations for services within the cluster. In Kubernetes, a service functions as an abstraction layer that facilitates communication between pods in the cluster. When a service is created, kube-proxy (in its iptables mode) programs a set of iptables rules on each node within the cluster. These rules enable traffic to be accurately directed to the appropriate pods associated with the service, irrespective of the node they operate on. This ensures that communication between the pods and services is both reliable and efficient.

The container runtime is responsible for running containers on each node in the cluster. The container runtime is a software component that manages the lifecycle of containers, including pulling container images from a registry, creating and starting containers, monitoring their health, and stopping or deleting them when they are no longer needed.

Kubernetes components can be run in a Kubernetes cluster as containers or system-level services, depending on their requirements and the needs of the cluster.
In general, Kubernetes components that require access to system resources or need to run on the node itself (such as the kubelet and kube-proxy) are run as system-level
services on each node. Components that do not require direct access to system resources and can be run in a container (such as the API server, etcd, kube-scheduler, and
kube-controller-manager) are typically deployed as containers in pods.

Methods of building a Kubernetes cluster:

There are several ways to build a k8s cluster, depending on your requirements and the resources you have available. Here are some common approaches:

Self-hosted Kubernetes cluster: In this approach, you set up and manage your own Kubernetes cluster on your infrastructure. This requires expertise in Kubernetes and
infrastructure management, but gives you full control over the environment. You can use tools like kubeadm, kops, Rancher, or Kubespray to set up and manage the cluster.
This approach can be a good fit if you have specific security or compliance requirements, or if you need to customize the environment to your needs.

Cloud-hosted Kubernetes cluster: Most cloud providers offer managed Kubernetes services, such as Amazon EKS, Google Kubernetes Engine (GKE), or Microsoft Azure
Kubernetes Service (AKS). With this approach, the cloud provider manages the underlying infrastructure and Kubernetes control plane, while you manage the worker nodes
that run your applications. This approach can be more cost-effective and reduces the operational overhead of managing your own infrastructure. It's a good fit if you're
already using a cloud provider and want to leverage their managed Kubernetes service.

Cluster as a Service: Cluster as a Service (CaaS) is a cloud-based service that lets you create and manage Kubernetes clusters without worrying about the underlying
infrastructure. Providers like DigitalOcean, Linode, and Platform9 offer CaaS solutions that simplify the process of creating and managing Kubernetes clusters. With this
approach, you get the benefits of managed Kubernetes services without being tied to a specific cloud provider.

Containerized Kubernetes: You can run k8s as a containerized application on your infrastructure or in the cloud. This approach is useful for development and testing
environments, as it lets you spin up a Kubernetes cluster quickly and easily. You can use tools like Minikube or kind to create containerized Kubernetes clusters.

In summary, there are several ways to build a k8s cluster, each with its own benefits and trade-offs. The approach you choose will depend on your specific needs and constraints.

How to connect to a Kubernetes cluster


To connect to a Kubernetes cluster, you usually use kubectl. kubectl is a powerful and flexible command-line tool for managing Kubernetes clusters, providing a
simple and consistent interface for interacting with Kubernetes resources and performing operations on the cluster.

When a user runs a kubectl command, kubectl sends an HTTP request to the Kubernetes API server using the API endpoint specified in the kubectl configuration
file. The API server then processes the request, performs the requested operation, and returns a response to kubectl.
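
As a rough illustration of the request that kubectl builds, the sketch below maps a resource to the URL path used by the Kubernetes REST API. The helper name `resource_path` is hypothetical and is not part of kubectl; the path layout (`/api/<version>/...` for the core group, `/apis/<group>/<version>/...` for named groups) follows the public API conventions.

```python
from typing import Optional


def resource_path(group: str, version: str, resource: str,
                  namespace: Optional[str] = None,
                  name: Optional[str] = None) -> str:
    """Build the REST path for a Kubernetes resource (illustrative helper)."""
    # The core group has an empty name and lives under /api;
    # every named group lives under /apis/<group>.
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [prefix]
    if namespace is not None:
        # Namespaced resources are nested under their namespace.
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


if __name__ == "__main__":
    print(resource_path("", "v1", "nodes"))
    print(resource_path("", "v1", "pods", namespace="default"))
    print(resource_path("apps", "v1", "deployments", "dev", "nginx"))
```

Under these assumptions, listing nodes corresponds to a GET request on the first path, sent to the server address taken from the kubectl configuration file.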

The API server uses authentication and authorization mechanisms to ensure that only authorized users can access and modify resources in the cluster.
By default, kubectl uses the credentials and configuration information stored in the ~/.kube/config file to authenticate and authorize requests to the API server.

K8s uses a configuration file called "kubeconfig" to store information about how to connect to a Kubernetes cluster. This file contains information about clusters, users, and contexts.

If a configuration file is not present in the ~/.kube directory, we must pass it each time we run a command:

    sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get node

To avoid this inconvenience, we can follow these steps:

    mkdir -p $HOME/.kube
    sudo scp user@cluster-ip:/etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

An example kubeconfig file:

    apiVersion: v1
    kind: Config
    clusters:
    - name: k8s-st1
      cluster:
        certificate-authority-data: <certificate data>
        server: https://127.0.0.1:41285
    users:
    - name: arye
      user:
        client-certificate-data: <certificate data>
        client-key-data: <key data>
    contexts:
    - name: arye@k8s-st1
      context:
        cluster: k8s-st1
        user: arye
        namespace: dev
    current-context: arye@k8s-st1

clusters: provides information about a Kubernetes cluster. Each cluster configuration includes the cluster name, server URL, and any necessary authentication information such as a certificate authority.

users: provides information about a user that can authenticate to a k8s cluster. Each user configuration includes the user name and any necessary authentication information such as a client certificate and key.

contexts: specifies a cluster and a user to use when connecting to a k8s cluster. Each context configuration also includes an optional namespace that specifies the default namespace to use when executing commands against the cluster.

current-context: This field specifies the default context to use when executing "kubectl" commands.

kubectl config view: used to display the current kubeconfig file. It shows all of the clusters, users, and contexts defined in the file.

kubectx: a third-party utility that can be used to switch between contexts defined in the kubeconfig file.

[Figure: kubectl reads the kubeconfig and sends REST API calls (for example POST requests) to kube-api on the master node of the k8s cluster, which manages the worker nodes.]
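
The way clusters, users, contexts, and current-context refer to each other in a kubeconfig file can be sketched in a few lines of Python. This is only an illustration: the function `resolve_context` and the helper `_by_name` are hypothetical names, the data mirrors the example file after it has been parsed into plain Python objects, and falling back to the `default` namespace when a context names none is stated here as an assumption of the sketch.

```python
from typing import Dict, List, Optional

# A kubeconfig mirroring the example in the text, already parsed into
# plain Python data (parsing the YAML itself is left out on purpose).
KUBECONFIG = {
    "clusters": [
        {"name": "k8s-st1", "cluster": {"server": "https://127.0.0.1:41285"}},
    ],
    "users": [
        {"name": "arye", "user": {"client-certificate-data": "<certificate data>"}},
    ],
    "contexts": [
        {"name": "arye@k8s-st1",
         "context": {"cluster": "k8s-st1", "user": "arye", "namespace": "dev"}},
    ],
    "current-context": "arye@k8s-st1",
}


def _by_name(entries: List[dict], name: str) -> dict:
    """Return the entry whose "name" matches, or raise KeyError."""
    for entry in entries:
        if entry["name"] == name:
            return entry
    raise KeyError(name)


def resolve_context(config: dict, context_name: Optional[str] = None) -> Dict[str, str]:
    """Resolve a context into the server, user, and namespace to use."""
    name = context_name or config["current-context"]
    context = _by_name(config["contexts"], name)["context"]
    cluster = _by_name(config["clusters"], context["cluster"])["cluster"]
    _by_name(config["users"], context["user"])  # fail early on a missing user
    return {
        "server": cluster["server"],
        "user": context["user"],
        # The namespace is optional; this sketch falls back to "default".
        "namespace": context.get("namespace", "default"),
    }


if __name__ == "__main__":
    print(resolve_context(KUBECONFIG))
```

A context is therefore just a named pair of references (cluster and user) plus an optional namespace, which is why switching contexts is enough to point kubectl at a different cluster or identity.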
You can use autocompletion for kubectl in zsh and bash.

    kubectl completion zsh > ~/.oh-my-zsh/custom/plugins/kubectl.plugin.zsh

This script provides auto-completion support for kubectl commands and flags when using the zsh shell with the Oh My Zsh framework. Once the script is generated and saved in the appropriate directory, you can enable it by adding kubectl to the plugins array in your ~/.zshrc config file:

    plugins=(
      …
      git
      kubectl
    )

To generate a shell completion script for the bash shell, you can use the following command:

    kubectl completion bash > /etc/bash_completion.d/kubectl

Having access to the cluster configuration file can potentially allow an attacker to view, modify, or delete resources in the cluster, as well as perform other
malicious actions. Therefore, it is important to ensure that access to the cluster configuration file is tightly controlled and restricted to only those who need it.

Pod

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.

Pods encapsulate and manage application processes and are created using a pod specification, which describes the desired state of the pod, including the containers to run, the network configuration, and any storage volumes to use. Pods are scheduled to run on nodes in the cluster by the Kubernetes scheduler and can be managed using labels and selectors to group and organize them based on their attributes.

[Figure: Kubernetes objects: Pods, Services, Ingresses, ConfigMaps, Secrets, ReplicaSets, Deployments, Jobs, CronJobs, StatefulSets, DaemonSets, PersistentVolumes, PersistentVolumeClaims, StorageClass, HorizontalPodAutoscalers, ResourceQuota, LimitRange, Roles, RoleBindings, ClusterRoles, ClusterRoleBindings, NetworkPolicies, ...]

To run a container, it must be part of a pod. This means that containers cannot be directly brought up in the cluster without being part of a pod.

When creating a pod, you can specify various settings for the pod and the containers running in it. Here's an example YAML manifest that creates a pod:

pod-def.yml

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-nginx
      namespace: default
      labels:
        app: nginx
        type: front-end
    spec:
      containers:
      - name: nginx-container
        image: nginx:1.18

apiVersion: specifies which version of the Kubernetes API is used to create and manage the resource.
kind: specifies the type of Kubernetes resource.
metadata: This field contains metadata about the pod, such as its name, namespace, labels, annotations, ...
spec: The spec field contains the specification for the pod, including the containers to run, the network configuration, and any storage volumes to use, …

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#pod-v1-core

    kubectl create -f pod-def.yml

The kubectl create -f command is used to create a Kubernetes resource from a YAML file.

    k get pods -o wide -w
    NAME        READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
    pod-nginx   1/1     Running   0          7m53s   10.244.83.193   kubeworker-1   <none>           <none>

Here k is an alias for kubectl.

To check the status of a pod, you can use the kubectl describe pod command and check the Events section. This section shows a list of events related to the pod, including the time of occurrence, type of event, and a description of the event. This information can be useful for monitoring and troubleshooting issues with the pod.

    arye@arye-dev : kubectl describe pods pod-nginx
    …
    Events:
    Type     Reason     Age                   From               Message
    ----     ------     ----                  ----               -------
    Normal   Scheduled  8m4s                  default-scheduler  Successfully assigned default/pod-nginx to kubeworker-1
    Warning  Failed     8m1s                  kubelet            Error: ErrImagePull
    Normal   BackOff    8m1s                  kubelet            Back-off pulling image "nginx:1.18"
    Warning  Failed     8m1s                  kubelet            Error: ImagePullBackOff
    Normal   Pulling    7m46s (x2 over 8m3s)  kubelet            Pulling image "nginx:1.18"
    Normal   Pulled     3m58s                 kubelet            Successfully pulled image "nginx:1.18"
    Normal   Created    3m58s                 kubelet            Created container nginx-container
    Normal   Started    3m57s                 kubelet            Started container nginx-container

The kubectl explain command provides detailed information about Kubernetes API resources. It allows you to view the structure, properties, and possible values of any Kubernetes resource.

    kubectl explain pods
    kubectl explain pods.spec.containers

    kubectl delete -f pod-def.yml

This command is used to delete a k8s resource that was created using a YAML file.

In Kubernetes, if you update a YAML file and want to apply the changes to a running pod, only a few fields can be updated, and you cannot update all fields in the YAML file. If you make changes that affect fields outside the scope of updateable fields, you must delete the pod and then apply the new YAML file to create a new pod.

Process of creating a pod in Kubernetes:

0 Define Pod specification: This involves creating a pod manifest YAML file that defines the pod properties such as name, labels, containers, and volumes.

1 Authentication and Authorization: When a request to create a pod is sent to Kubernetes through kubectl or the Kubernetes API, kube-api first authenticates the request
and then checks that the caller has the necessary permissions to create the pod.

2 Manifest Syntax Check: If authentication and authorization succeed, kube-api checks the submitted manifest for syntax errors. This ensures
that the manifest is well-formed and adheres to the Kubernetes API schema.

3 Writing to etcd: If the syntax check is successful, kube-api writes the pod object to etcd.

4 Pod Scheduling: The scheduler is responsible for assigning pods to nodes in the cluster based on resource availability and other factors. It continuously watches the
cluster for new pods and nodes and attempts to schedule the pods onto the available nodes.

5 Reporting to API: The scheduler requests unassigned pods from the Kubernetes API and selects a node for each one. It then reports the selected node back to
the API, which updates etcd with this information.

6 Notifying the Kubelet: Once the API has updated etcd with the selected node, the kubelet on that node (the agent on each node that is responsible for running pods)
learns through kube-api that the pod has been assigned to it. The kubelet then starts creating the pod, pulling the necessary container images and starting the containers.

7 Pod Status Update: As the pod is being created, the kubelet reports the pod status to kube-api, which stores it in etcd. This includes information such as the pod's phase,
container statuses, and IP address.
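
The steps above can be walked through with a toy, in-memory simulation. Everything in it is a simplifying assumption made for illustration: the class `ApiServer` and the functions `run_scheduler` and `run_kubelet` are hypothetical names, authentication and authorization are skipped, "etcd" is a plain dictionary, and the scheduler simply prefers the node with the fewest pods.

```python
from typing import Dict, List


class ApiServer:
    """Toy stand-in for kube-api backed by an in-memory 'etcd' dict."""

    def __init__(self) -> None:
        self.etcd: Dict[str, dict] = {}

    def create_pod(self, manifest: dict) -> None:
        # Steps 1-3: authentication and authorization are skipped here;
        # only a minimal schema check is done before the write.
        if (manifest.get("kind") != "Pod" or "spec" not in manifest
                or "name" not in manifest.get("metadata", {})):
            raise ValueError("invalid Pod manifest")
        pod = {**manifest, "spec": dict(manifest["spec"]),
               "status": {"phase": "Pending"}}
        self.etcd[manifest["metadata"]["name"]] = pod

    def unassigned_pods(self) -> List[dict]:
        return [p for p in self.etcd.values() if "nodeName" not in p["spec"]]

    def bind(self, pod_name: str, node: str) -> None:
        # Step 5: record the scheduler's decision.
        self.etcd[pod_name]["spec"]["nodeName"] = node

    def pods_for_node(self, node: str) -> List[dict]:
        return [p for p in self.etcd.values() if p["spec"].get("nodeName") == node]

    def update_status(self, pod_name: str, phase: str) -> None:
        # Step 7: status reported by the kubelet is stored via the API.
        self.etcd[pod_name]["status"]["phase"] = phase


def run_scheduler(api: ApiServer, nodes: List[str]) -> None:
    # Step 4: place each unassigned pod on the node with the fewest pods.
    for pod in api.unassigned_pods():
        target = min(nodes, key=lambda n: len(api.pods_for_node(n)))
        api.bind(pod["metadata"]["name"], target)


def run_kubelet(api: ApiServer, node: str) -> None:
    # Step 6: "start" every pending pod assigned to this node.
    for pod in api.pods_for_node(node):
        if pod["status"]["phase"] == "Pending":
            api.update_status(pod["metadata"]["name"], "Running")


if __name__ == "__main__":
    api = ApiServer()
    api.create_pod({"apiVersion": "v1", "kind": "Pod",
                    "metadata": {"name": "pod-nginx"},
                    "spec": {"containers": [{"name": "nginx-container",
                                             "image": "nginx:1.18"}]}})
    run_scheduler(api, ["kubeworker-1"])
    run_kubelet(api, "kubeworker-1")
    print(api.etcd["pod-nginx"]["status"], api.etcd["pod-nginx"]["spec"]["nodeName"])
```

Note that in this sketch the scheduler and the kubelet never touch the store directly; each of them goes through the `ApiServer` object, which is the point the numbered steps make about kube-api.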

[Figure: pod creation flow. kubectl apply -f pod-def.yml sends the request to kube-api, which authenticates it, checks authorization, validates the manifest, and writes the pod to etcd (steps 0 to 3). The scheduler finds the new unassigned pod, selects a node, and reports it to kube-api, which updates etcd (steps 4 and 5). The kubelet on kubeworker-1 pulls the container image and starts the containers (step 6), and the pod status moves from Pending to Running as it is recorded in etcd (step 7). The assignment of an IP address to the Pod is handled by the CNI.]

Workloads
A workload object is a resource that defines how to run a containerized application or a set of containerized applications in a cluster. Workload objects are used to manage the
deployment, scaling, and management of containerized applications within a Kubernetes cluster.
The most basic workload object in Kubernetes is the Pod, which represents a single running instance of an application made up of one or more containers. However, managing Pods directly can be complex and
error-prone, which is why Kubernetes provides higher-level workload objects that abstract away the details of Pod management.

Kubernetes workload objects

Horizontal Pod Autoscaler: Scales the number of Pods based on various metrics.
Deployment: Creates a ReplicaSet and takes care of rollouts and rollbacks.
CronJob: Creates Jobs based on a time schedule.
StatefulSet: Creates Pods while handling the needs of stateful applications.
ReplicaSet: Creates the desired amount of Pod instances.
Job: Creates short-living Pods for one-time executions.
DaemonSet: Creates exactly one Pod per Node.
Pod: Smallest k8s compute resource, containing 1..n containers.

Workload objects in Kubernetes provide a declarative and automated way of managing containerized applications in a cluster. By defining the desired state of your application
using workload objects, Kubernetes can handle the details of creating, scaling, and updating the underlying pods that run your application.

If a standalone pod is deleted, the system does not automatically recreate it, because no controller in the kube-controller-manager manages bare pods. Therefore, even if you have only one pod, it is better
to place it under a higher-level object such as a ReplicaSet, so that it is managed by the ReplicaSet controller. The controller recreates pods that disappear and lets you scale the number of replicas.

ReplicaSet is a k8s object that ensures a specified number of replica Pods are running at all times. If a Pod managed by a ReplicaSet fails or is deleted, the ReplicaSet will automatically create a new replica to replace it.

The ReplicaSet controller continuously monitors the state of the cluster and compares it to the desired state specified in the ReplicaSet definition. If there are fewer replicas than the desired number, the controller will create new replicas to bring the cluster back to the desired state. If there are more replicas than the desired number, the controller will delete the excess replicas.

rs-def.yml

    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: replicaset-nginx
    spec:
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.17
      replicas: 3
      selector:
        matchLabels:
          app: nginx

template: This section is the same as defining a pod, and if a pod is deleted, it can be recreated based on this template.
selector: The pods that are subsets of this ReplicaSet must have a label with app: nginx.

ReplicaSet is designed to ensure that the current state of the cluster matches the desired state specified in its definition. The desired state is defined by the number of replicas of a specific Pod template that should be running at any given time.

It is recommended to use kubectl apply instead of kubectl create.

    kubectl apply -f rs-def.yml
    kubectl describe rs replicaset-nginx

    k get rs
    NAME               DESIRED   CURRENT   READY   AGE
    replicaset-nginx   3         3         3       13s

[Figure: the replication controller on the master node manages a ReplicaSet (replicas: 3, selector app: nginx) whose three pods, each labeled app: nginx, run on Node 1 and Node 2.]

    k get pod
    NAME                     READY   STATUS    RESTARTS   AGE     IP              NODE
    pod-nginx                1/1     Running   0          7h17m   10.244.83.193   kubeworker-1
    replicaset-nginx-bnrcs   1/1     Running   0          23s     10.244.83.195   kubeworker-1
    replicaset-nginx-hm26d   1/1     Running   0          23s     10.244.83.194   kubeworker-1

The selector section of the ReplicaSet definition specifies that the pods managed by this ReplicaSet should have a label with key app and value nginx. Since there is already a pod running with the label app: nginx, the ReplicaSet will select it as part of its subset and will only create the remaining two replicas to meet the desired number of 3 replicas.
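
The comparison between desired and current state described above can be sketched as a small reconcile function. This is a simplified illustration, not the actual controller code: `matches` and `reconcile` are hypothetical names, pods are plain dictionaries, and the choice of which excess pods to delete (simply the last ones in the list) is an assumption of the sketch.

```python
from typing import Dict, List


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(desired: int, selector: Dict[str, str],
              pods: List[dict]) -> Dict[str, object]:
    """Compare desired and current state and return the action to take."""
    owned = [p["name"] for p in pods if matches(selector, p["labels"])]
    diff = desired - len(owned)
    if diff > 0:
        # Too few matching pods: create the missing replicas.
        return {"create": diff, "delete": []}
    # Enough or too many matching pods: remove any excess.
    return {"create": 0, "delete": owned[desired:]}


if __name__ == "__main__":
    existing = [{"name": "pod-nginx",
                 "labels": {"app": "nginx", "type": "front-end"}}]
    print(reconcile(3, {"app": "nginx"}, existing))
```

With one pod already carrying the label app: nginx and three replicas desired, the function asks for two new pods, which matches the `k get pod` output shown above.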

You can scale the number of replicas of a ReplicaSet using the kubectl scale command, or by updating
the replicas field in the ReplicaSet manifest and applying the changes using kubectl apply:

    kubectl scale --replicas=6 -f rs-def.yml

Deployment is a powerful higher-level abstraction that enables you to manage the desired state of your application in Kubernetes. It ensures that a specified number of
replicas of your application are always running, by creating and managing other Kubernetes resources like ReplicaSets and Pods. With deployments, you can perform
rolling updates and rollbacks, making it easy to update your application without any downtime or quickly revert to a previous version in case of issues.

A Deployment definition is similar to a ReplicaSet definition in that both are used to manage a set of replicas of a pod template. However, the main difference is that a Deployment provides additional functionality for rolling updates and rollbacks of the replicas, whereas a ReplicaSet does not. It is recommended to use a Deployment to manage the replicas of a stateless application.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: nginx
    namespace: dev
  spec:
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - name: nginx
          image: nginx:1.17
    strategy:
      type: RollingUpdate
    replicas: 3
    selector:
      matchLabels:
        app: nginx

[Figure: a Deployment (updates & rollback) wraps a ReplicaSet (self-healing, scalable, desired state), which wraps the Pods and their containers]
Namespace

In Kubernetes, a Namespace is a way to organize and isolate resources within a cluster. A namespace provides a virtual cluster within a physical cluster, allowing multiple teams or applications to coexist within the same Kubernetes cluster.

To create a namespace, you can use the kubectl create namespace command:

kubectl create namespace dev

You can also create a YAML file that defines your namespace and use the kubectl apply command to create the namespace:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: dev

Each namespace has its own set of resources, such as pods, services, and storage volumes, that are isolated from resources in other namespaces. This helps to prevent naming conflicts between resources and allows different teams or applications to manage their own resources independently.

[Figure: Node1 and Node2 each run App1 in the Dev namespace and App2 in the Frontend namespace]

Namespaces provide a way to organize resources and apply resource quotas, network policies, and other settings at a namespace level. For example, you can limit the number of pods or services that can be created in a namespace, or restrict network traffic between pods in different namespaces.
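As one illustration of a namespace-level setting, the sketch below restricts network traffic so that pods in dev accept connections only from other pods in dev. The policy name is hypothetical, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: dev
spec:
  podSelector: {}          # applies to every pod in the dev namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}      # allow traffic only from pods in this same namespace
```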
— - ————————— —- —— —- —- ———-—- ———-—— —- —- ———-—- ———-—- ———-—— —- —- ———-— - ————————— —- —— —- —————-
Resource quotas
Resource quotas in k8s are a way to limit the amount of compute resources that can be consumed by a set of pods in a namespace. A resource quota is defined as a Kubernetes object that specifies the maximum amount of CPU, memory, and other resources that can be used by pods in a namespace.

  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: saas-team-quota
    namespace: dev
  spec:
    hard:
      pods: "10"
      requests.cpu: "2"
      requests.memory: 2Gi
      limits.cpu: "4"
      limits.memory: 4Gi
      configmaps: "5"
      persistentvolumeclaims: "5"
      replicationcontrollers: "5"
      secrets: "5"
      services: "5"
      services.loadbalancers: "2"
      services.nodeports: "3"
      count/deployments.apps: "4"

The ResourceQuota object specifies the maximum limits for the following resources:

  pods              The maximum number of pods that can be created in the namespace
  requests.cpu      The total amount of CPU that can be requested by all pods in the namespace
  requests.memory   The total amount of memory that can be requested by all pods in the namespace
  limits.cpu        The total amount of CPU that can be used by all pods in the namespace
  limits.memory     The total amount of memory that can be used by all pods in the namespace

kubectl describe resourcequota saas-team-quota

This command will display detailed information about the saas-team-quota ResourceQuota object, including the current usage and maximum limits for each resource:

  Name:       saas-team-quota
  Namespace:  dev
  Resource                  Used  Hard
  --------                  ----  ----
  configmaps                0     5
  limits.cpu                1     4
  limits.memory             2     4Gi
  persistentvolumeclaims    0     5
  pods                      5     10
  requests.cpu              1     2
  requests.memory           2     2Gi
  secrets                   0     5
  services                  1     5
  services.loadbalancers    0     2
  services.nodeports        0     3
  count/deployments.apps    1     4

If a ResourceQuota that constrains CPU or memory is applied to a namespace but no resource requests and limits are defined for the pods in the template section of a Deployment YAML file, then the Deployment and ReplicaSet will still be created. However, no pods will be created, because the quota causes pods without resource requests and limits to be rejected. The rejections are visible in the namespace events:

k -n dev get events

$ k describe ns dev
Name:         dev
Labels:       <none>
Annotations:  <none>
Status:       Active
Resource Quotas
  Name:                    comput-quota
  Resource                 Used  Hard
  --------                 ---   ---
  count/deployments.apps   1     2
  cpu                      6m    100m
  memory                   60M   100M
  pods                     6     10
No LimitRange resource.

LimitRange

LimitRange is a resource object that is used to specify default and maximum resource limits for a set of pods in a namespace.

When a LimitRange is applied to a namespace, it will only affect newly created pods. Existing pods will not have their resource limits automatically updated to match the LimitRange settings.

LimitRange is used to set default and maximum resource limits for individual pods or containers within a namespace, while ResourceQuota is used to set hard limits on the total amount of resources that can be used by all the pods in a namespace.

  apiVersion: v1
  kind: LimitRange
  metadata:
    name: dev-resource-limits
    namespace: dev
  spec:
    limits:
    - default:
        cpu: 100m
        memory: 128Mi
      defaultRequest:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: 500m
        memory: 512Mi
      min:
        cpu: 50m
        memory: 32Mi
      type: Container
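To make the defaulting behaviour concrete, here is a small sketch of a pod submitted to the dev namespace without any resources block. The pod name is hypothetical; the values in the comments are the default and defaultRequest entries of the dev-resource-limits LimitRange.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limitrange-demo    # hypothetical name
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17
    # No resources block is given here. With dev-resource-limits in place,
    # the container is admitted as if it had declared:
    #   resources:
    #     requests:
    #       cpu: 50m
    #       memory: 64Mi
    #     limits:
    #       cpu: 100m
    #       memory: 128Mi
```

Running kubectl -n dev get pod limitrange-demo -o yaml after creation should show the filled-in values in the container spec.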

———————————-————————-————————-————————-—————-————————-————————-—————-—————-————————-—
Resource Requirements & Limits

Resource requirements and limits are used to specify the amount of CPU and memory resources that a container requires in order to run properly.

Resource requirements are set in the pod specification and indicate the minimum amount of CPU and memory resources that a container needs to run. Kubernetes uses these requirements to determine which nodes in the cluster have the necessary resources to schedule the pod.

Resource limits specify the maximum amount of CPU and memory resources that a container is allowed to use. Kubernetes enforces CPU limits by throttling the container's CPU usage if it exceeds the specified limit; a container that exceeds its memory limit is handled by the OOM Killer, as described at the end of this section.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: nginx
    namespace: dev
  spec:
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - name: nginx
          image: nginx:1.17
          resources:
            requests:
              cpu: "100m"
              memory: "10M"
            limits:
              cpu: "200m"
              memory: "50M"
    replicas: 3
    selector:
      matchLabels:
        app: nginx

requests: the container requires at least 100 milliCPU (0.1 CPU) and 10 megabytes of memory to run.
limits: the container is limited to using no more than 200 milliCPU (0.2 CPU) and 50 megabytes of memory.

[Figure: CPU scale for the container. Up to the cpu request (100m) is guaranteed; the cpu limit (200m) is the maximum; in the area in between, k8s can throttle depending on other containers]

kubectl describe node kubeworker-1
  …
  Capacity:
    cpu:     4
    memory:  8192Mi
    pods:    110
  Allocatable:
    cpu:     3
    memory:  7168Mi
    pods:    110
  …

The Capacity section shows the maximum amount of resources (such as CPU and memory) that a node in the Kubernetes cluster has available. The Allocatable section shows the amount of resources that Kubernetes has allocated for use by containers and pods on the node.

If you do not specify the request and limits values for a container, the pod is assigned default values for CPU and memory only when the namespace has a LimitRange that defines them (for example, a default request of 0.5 CPU and 256Mi memory and a default limit of 1 CPU and 256Mi memory). Without a LimitRange, the container runs with no requests or limits.

Setting resource requirements and limits is important for ensuring that containers have the necessary resources to run effectively without overloading the system. By specifying resource limits, you can prevent containers from using too many resources and causing performance issues or crashes.

When a container reaches or exceeds its memory limit, the Linux kernel's Out of Memory Killer (OOM Killer) is invoked. The OOM Killer is responsible for selecting and terminating processes to free up memory when system memory becomes critically low. By default, Kubernetes lets the OOM Killer select and terminate the process within the container that triggered the OOM condition.
Service
Services are a core component in Kubernetes that are used to manage networking and traffic flow within a cluster. They provide a stable IP address and DNS name for a set of pods
and allow for communication between different components within and outside of the application. Services also enable load balancing , service discovery and traffic management,
making them a critical component for building scalable and resilient applications in Kubernetes.
When a service is created, it is assigned a virtual IP address (known as a ClusterIP), which is used to route traffic to the pods that are part
of the service. The service also has a DNS name, which can be used to access the service from within the cluster .

Services use labels and selectors to discover and route traffic to the pods that are part of the service. Each service has a unique IP address and DNS name that can be used to access the pods that provide the service.

1 If a pod fails or is removed from the service, the controller will automatically remove it from the list of endpoints for the service. This ensures that traffic is not sent to a non-existent pod.

2 When a pod managed by a deployment fails, the controller creates a new pod to replace the failed pod.

3 Once the new pod is running and ready, the service's endpoint controller will add it back to the list of endpoints for the service, allowing traffic to be routed to it (labels and selectors are used for discovery).

[Figure: a Service (10.102.156.115) with selector app: nginx spreads traffic across the pods labeled app: nginx that belong to a Deployment and its ReplicaSet; a failed pod is replaced by a new one, and a pod labeled app: apache is not selected]
Service types in k8s

ClusterIP: This is the default type of service, which exposes the service on a cluster-internal IP address that is only accessible from within the cluster.

NodePort: This type of service exposes the service on a static port on each node in the cluster, which can be accessed from outside the cluster using the node's IP address.

LoadBalancer: This type of service exposes the service using a cloud provider's load balancer, which distributes traffic to the different Pods.

ExternalName: This type of service maps the service to an external DNS name, allowing the service to be accessed from within the cluster using a consistent name.

ClusterIP example:

  apiVersion: v1
  kind: Service
  metadata:
    name: nginx-internal
    namespace: dev
  spec:
    type: ClusterIP
    ports:
    - targetPort: 80
      port: 80
      protocol: TCP
    selector:
      app: nginx

NodePort example:

  apiVersion: v1
  kind: Service
  metadata:
    name: nginx-ext
    namespace: dev
  spec:
    type: NodePort
    ports:
    - targetPort: 8080
      port: 80
      nodePort: 8080    # outside the default 30000-32767 range; accepted only if the API server's --service-node-port-range allows it
    selector:
      app: nginx

LoadBalancer example:

  apiVersion: v1
  kind: Service
  metadata:
    name: nginx-ext-lb
    namespace: dev
  spec:
    type: LoadBalancer
    ports:
    - targetPort: 8080
      port: 8080
      nodePort: 31090
    selector:
      app: nginx

selector: The selector field specifies the label selector that the service will use to find the Pods that it should load balance traffic to. Here the service will load balance traffic to any Pods that have the label app=nginx.

targetPort: The targetPort is used to specify the port number on the Pods that the service should forward traffic to.

port: The port is used to specify the port number that clients should use to access the service.

nodePort: The nodePort field is used to specify the high port number on each node in the cluster that can be used to access the service from outside the cluster. If you don't specify a nodePort value, Kubernetes will automatically allocate a random high port number (30000-32767) for the service.

[Figure: traffic enters a Node on the nodePort, reaches the Service on port, and is forwarded to the Pod on targetPort]
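The fourth type, ExternalName, has no manifest above, so here is a minimal sketch. The Service name and the external host db.example.com are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-db              # hypothetical name
  namespace: dev
spec:
  type: ExternalName
  externalName: db.example.com   # placeholder external DNS name
```

With this in place, a pod resolving external-db.dev.svc.cluster.local should receive a CNAME record pointing at db.example.com; no selector or endpoints are involved.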

To expose a service to the outside world in k8s, you can use one of the following methods:

LoadBalancer Service: This method requires that your cloud provider supports LoadBalancer services, and it can incur additional costs.

NodePort Service: This type of service exposes the service on a static port on each node in the cluster, which can be accessed from outside the cluster using the node's IP address.

Ingress: An Ingress is a Kubernetes resource that defines a set of rules for routing external HTTP(S) traffic to a service. Ingress resources require an Ingress controller to be deployed in the cluster, which is responsible for implementing the routing rules. Ingress controllers are available for many popular web servers, such as Nginx and Traefik.

When you create a LoadBalancer service, it creates a cloud provider-specific load balancer object, such as an Amazon Elastic Load Balancer (ELB) or Google Cloud Load Balancer, which is external to the Kubernetes cluster. When you create a Service of type LoadBalancer, k8s will automatically create a NodePort and ClusterIP for the Service.

While a load balancer service can provide a stable IP address and port for accessing the service, it still requires manual intervention to update the endpoints, which can be time-consuming and error-prone: a node might be added to or removed from the system, which means that we need to manually update the endpoints of the load balancer or recreate the service. Therefore, a better solution to this problem would be to use Kubernetes Ingress, which provides a more flexible and automated way of managing external access to the services in a k8s cluster.
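An Ingress rule for the nginx-internal ClusterIP Service could look roughly like this. It is only a sketch: the Ingress name, the host app.example.com, and the ingress class nginx are assumptions, and it presumes an Ingress controller is already deployed in the cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress            # hypothetical name
  namespace: dev
spec:
  ingressClassName: nginx        # assumes an installed controller registered under this class
  rules:
  - host: app.example.com        # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-internal # the ClusterIP Service that receives the traffic
            port:
              number: 80
```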
[Figure: external traffic can reach Pods through an Ingress, a NodePort Service, or a LoadBalancer Service, each of which sits in front of a ClusterIP Service; headless Services are used for direct access to pods (see page 36)]

For services that we do not want to expose to the outside world (such as database clusters like mysql), we set the service type to ClusterIP. If we do not specify the service type, it will be selected as ClusterIP by default.

  apiVersion: v1
  kind: Service
  metadata:
    name: mysql-service
  spec:
    selector:
      app: mysql
    ports:
    - name: mysql
      port: 3306
      targetPort: 3306
    - name: mysql-exporter
      port: 9104
      targetPort: 9104

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: mysql-deployment
    namespace: dev
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: mysql
    template:
      metadata:
        labels:
          app: mysql
      spec:
        containers:
        - name: mysql
          image: mysql:latest
          ports:
          - containerPort: 3306
          env:
          - name: MYSQL_ROOT_PASSWORD
            valueFrom:
              secretKeyRef:
                name: mysql-secret
                key: password
          - name: MYSQL_DATABASE
            value: mydb
        - name: mysql-exporter
          image: prom/mysqld-exporter:v0.12.1
          ports:
          - containerPort: 9104
          env:
          - name: DATA_SOURCE_NAME
            value: "root:$MYSQL_ROOT_PASSWORD@(localhost:3306)/$MYSQL_DATABASE?tls=false"

In the real world, StatefulSets are used instead of Deployments for stateful applications (see page 36).

Clients inside the cluster connect through the Service's DNS name, which has the form service-name.namespace.svc.domain:

  mysql.connect(host="mysql-service.dev.svc.cluster.local", port=3306, database="mydb", user="<username>", password="<password>")

[Figure: the ClusterIP Service (10.102.156.125, selector app: mysql) forwards port 3306 to the mysql-db containers and port 9104 to the mysql-exporter containers, which Prometheus scrapes; the Pods (10.244.83.210 on Node1, 10.244.83.211 on Node2) are reached through the iptables rules on each node]

When a service is created, the kube-proxy on each node in the cluster automatically creates iptables rules to forward traffic to the service endpoints.

  iptables -A KUBE-SERVICES -d 10.102.156.125/32 -p tcp -m comment --comment "dev/mysql-service: cluster IP" -m tcp --dport 3306 -j KUBE-SVC-<service-uid>
  iptables -A KUBE-SVC-<service-uid> -m comment --comment "dev/mysql-service:" -j KUBE-SEP-<endpoint-uid-1>
  iptables -A KUBE-SVC-<service-uid> -m comment --comment "dev/mysql-service:" -j KUBE-SEP-<endpoint-uid-2>

  NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE   SELECTOR    NODE-PORT   ENDPOINTS
  mysql-service   ClusterIP   10.102.156.125   <none>        3306/TCP,9104/TCP   1d    app=mysql   <none>      10.244.83.210:3306, 10.244.83.211:3306
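The mysql Deployment reads MYSQL_ROOT_PASSWORD from a Secret named mysql-secret with the key password, which is not shown. A minimal sketch of such a Secret, with a placeholder value, might be:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret       # name referenced by secretKeyRef in the Deployment
  namespace: dev
type: Opaque
stringData:
  password: "<password>"   # placeholder; replace with the real root password
```

The stringData field accepts plain text and is stored base64-encoded by the API server, so the value does not need to be encoded by hand.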
Process of creating a service in Kubernetes:

1 Define the service: The first step in creating a service is to define the service using a YAML file or through the Kubernetes API. The YAML file specifies details such as the name
of the service, the selector used to identify the pods that the service should route traffic to, and the type of service (ClusterIP, NodePort, or LoadBalancer).

2 Submit the service definition: Once the service definition is created, it can be submitted to the Kubernetes API server

3 API Server validates the service definition: The Kubernetes API server receives the service definition and validates it to ensure that it is well-formed and contains all the required
information.

4 Service is created: Once the service is created, Kubernetes creates an endpoint object that tracks the IP addresses and ports of the pods that the service should route traffic to.
This information is stored in etcd

5 iptables rules are created: Once the endpoint object is created, kube-proxy creates iptables rules on each node in the cluster to route traffic to the pods that are part of the service.
These iptables rules are used to ensure that traffic is routed to the correct pod, and that traffic is load balanced across multiple pods if more than one pod matches the selector.

6 Access the service: The service is now accessible within the cluster using its name or DNS name, and can be used to route traffic to the pods that are part of the service.

7 Monitor the service: Once the service is running, Kubernetes monitors its health and takes action if any issues arise. For example, if a pod fails, Kubernetes will automatically
remove it from the list of endpoints for the service.

The same flow, as shown in the diagram:

1 Request to create a service is sent to Kubernetes with kubectl:

  kubectl apply -f mysql-svc.yml

  apiVersion: v1
  kind: Service
  metadata:
    name: mysql-service
    namespace: dev
  spec:
    type: ClusterIP
    ports:
    - targetPort: 80
      port: 80
      protocol: TCP
    selector:
      app: nginx

2-3 Authentication and authorization, followed by the manifest syntax check

4 Writing to etcd: an endpoint object is created that tracks the IP addresses and ports of the pods

5 kube-proxy on each node in the cluster automatically creates iptables rules to forward traffic to the service endpoints.

kube-proxy watches the API server for changes to Services and their endpoints, and uses this information to keep the iptables rules on its node up to date and pointing at the appropriate pods.

The following is a simplified overview of the iptables rules created by kube-proxy:
1. A new iptables chain is created for the service (e.g. "my-service").
2. A rule is added to the PREROUTING chain to match incoming traffic that is destined for the service's cluster IP address (e.g. 10.0.0.1) and jump to the service chain.
3. In the service chain, a rule is added to select one of the service's endpoints (in iptables mode, the endpoint is picked at random).
4. The selected endpoint's IP address is rewritten as the destination IP address of the packet.
5. The packet is then forwarded to the selected endpoint.

  # Allow traffic from the Kubernetes Service IP address and port to the Kubernetes Endpoints
  iptables -A KUBE-SERVICES -d <ClusterIP>/32 -p tcp -m comment --comment "mysql-service/mysql: cluster IP" -m tcp --dport <Port> -j KUBE-MARK-MASQ
  iptables -A KUBE-SERVICES -d <ClusterIP>/32 -p tcp -m comment --comment "mysql-service/mysql" -m tcp --dport <Port> -j KUBE-SVC-<service-uid>

  # Allow traffic from the Kubernetes Endpoints to the Pods
  iptables -A KUBE-SEP-<endpoint-uid> -s <PodIP> -m comment --comment "mysql-service/mysql" -j KUBE-MARK-MASQ
  iptables -A KUBE-SEP-<endpoint-uid> -s <PodIP> -m comment --comment "mysql-service/mysql" -j DNAT --to-destination <PodIP>:<Port>

———-————————-————————-—————-————————-————————-—————-—————-————————-—————-————————-—————-

Endpoint

When a Service is created, the endpoints controller queries the Kubernetes API server to get a list of all Pods that match the Service's label selector. It then creates an Endpoint object that includes the IP addresses and ports of these Pods, and associates the Endpoint with the Service. The Kubernetes networking layer uses this Endpoint information to route traffic to the appropriate Pods that make up the Service.

  apiVersion: v1
  kind: Endpoints
  metadata:
    name: my-web-service
  subsets:
  - addresses:
    - ip: 10.244.1.194
    - ip: 10.244.1.195
    ports:
    - name: http
      port: 80
      protocol: TCP

Endpoints are automatically created and managed by Kubernetes when you create a Service, and they are updated dynamically as Pods are added or removed from the Service.

[Figure: a Service (10.102.156.115) forwards port 80 to two Pods labeled app: nginx (10.244.83.194 and 10.244.83.195)]

DNS

Kubernetes has a built-in DNS component that provides naming and discovery between pods running on the cluster. It assigns DNS records (A records, SRV records, etc.) for each pod/service automatically. The DNS name follows a specific format, such as <service-name>.<namespace>.svc.cluster.local for accessing a Service or <pod-name>.<service-name>.<namespace>.svc.cluster.local for accessing a specific Pod associated with a headless Service.

If a Pod located in the "default" namespace needs to communicate with a service named "nginx-service" residing in the "dev" namespace, it can do so by using the URL "http://nginx-service.dev.svc.cluster.local".

CoreDNS will have the following DNS records for the dev namespace:

  nginx-service.dev.svc.cluster.local   10.244.0.100
  10-244-0-55.dev.pod.cluster.local     10.244.0.55

  Hostname        NS    Type   Root            IP Address
  nginx-service   dev   svc    cluster.local   10.244.0.100
  10-244-0-55     dev   pod    cluster.local   10.244.0.55

[Figure: (1) a Pod in the default namespace reads nameserver 10.96.0.10 from /etc/resolv.conf and queries the kube-dns Service (10.96.0.10), which is backed by the CoreDNS Pod (10.244.0.12:53) in kube-system; (2) the Pod then reaches nginx-service (10.244.0.100) in the dev namespace, which fronts nginx-pod (10.244.0.55)]

In Kubernetes, FQDN stands for Fully Qualified Domain Name. It is a complete domain name that specifies the exact location of a resource within the DNS hierarchy. By using FQDNs, Kubernetes simplifies the process of resource discovery, network routing, and namespace isolation within the cluster.

The default DNS provider in Kubernetes is CoreDNS, which runs as pods/containers inside the cluster. CoreDNS retrieves pod/service information from the Kubernetes API to update its DNS records.
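One way to check the cross-namespace lookup described above is a throwaway pod that resolves the Service name once and exits. This is a sketch: the pod name and the busybox image tag are assumptions, and nginx-service is expected to exist in the dev namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-test               # hypothetical name
  namespace: default
spec:
  restartPolicy: Never         # run the lookup once, then stop
  containers:
  - name: dns-test
    image: busybox:1.36        # assumed tag; any image that ships nslookup works
    command: ["nslookup", "nginx-service.dev.svc.cluster.local"]
```

The result of the lookup can then be read with kubectl logs dns-test.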

Notice: Kubernetes does not automatically create DNS records for Pod names directly. This is because Pod IPs keep changing whenever Pods are recreated or rescheduled. Instead, stable DNS records are maintained at the Service level in Kubernetes. Services have unchanging virtual IPs that act as stable endpoints.

Pods get DNS resolution indirectly via records in the Pod DNS subdomain. Each Pod gets a DNS record in the format (with the dots of the IP address replaced by dashes):
<pod-ip-address>.<namespace>.pod.cluster.local
Pod dns policy
A Pod's DNS settings can be configured with the dnsPolicy field in the Pod specification. Three of the values this field accepts are:

ClusterFirst: Any DNS query that does not match the configured cluster domain suffix is forwarded to the upstream nameserver. This is the default policy if dnsPolicy is not specified.

Default: Use the DNS settings of the node that the Pod is running on. This means it will use the same DNS as the node that the Pod runs on.

None: Allows a Pod to ignore DNS settings from the Kubernetes environment. All DNS settings are supposed to be provided using the dnsConfig field in the Pod Spec.

Please note that the Pod's DNS config (dnsConfig) allows you to customize the DNS parameters of a Pod:

  apiVersion: v1
  kind: Pod
  metadata:
    name: mypod
  spec:
    containers:
    - name: mypod
      image: myimage
    dnsPolicy: "None"
    dnsConfig:
      nameservers:
      - 1.2.3.4
      searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
      options:
      - name: ndots
        value: "2"

In this example, the Pod mypod uses a custom DNS resolver (1.2.3.4) and a custom search list (ns1.svc.cluster-domain.example and my.dns.search.suffix). The option ndots:2 means that if a DNS query name contains fewer than 2 dots, then the search list mechanism will be used first. For example, a query for mypod will be first tried as mypod.ns1.svc.cluster-domain.example and, if that fails, as mypod.my.dns.search.suffix.
How scheduling works?
When a Pod is created, it is not assigned to any specific Node initially. Instead, the Pod is marked as "unscheduled" and is added to a scheduling queue. The scheduler continuously watches this queue and selects an appropriate Node for each unscheduled Pod. The scheduler uses a set of rules to determine which nodes are eligible for scheduling. These rules include:

Resource requirements: The scheduler looks at the CPU and memory requirements specified in the pod's configuration and ensures that the selected node has enough available resources to run the pod.

Node capacity: The scheduler considers the capacity of each node in the cluster, including the amount of available CPU, memory, and storage, and selects a node that has sufficient capacity to meet the pod's requirements.

Taints and tolerations: Nodes in a Kubernetes cluster can be tainted to indicate that they have specific restrictions on the pods that can be scheduled on them. Pods can specify tolerations for these taints, which allow them to be scheduled on the tainted nodes.

Node selectors: Users can also specify node selectors, which are labels that are applied to nodes in the cluster. The scheduler can use these selectors to filter out nodes that don't match the pod's requirements.

(Node, Pod) Affinity/anti-affinity: Kubernetes allows users to specify affinity and anti-affinity rules that control which nodes pods can be scheduled on. For example, a pod may be required to run on a node that has a specific label, or it may be prohibited from running on a node that already has a pod with a certain label.

Kubernetes also provides the ability to filter nodes based on various attributes before selecting them for scheduling. This allows users to specify additional constraints, such as selecting only nodes with specific labels or taints.

Once the scheduler has identified a set of eligible nodes, it evaluates each node's fitness and assigns a score based on these factors. The node with the highest score is selected, and the pod is scheduled to run on that node.

If the scheduler is unable to find a suitable node for the pod, the pod remains unscheduled and enters a pending state until a suitable node becomes available.

You can constrain a Pod to run on specific nodes or prefer to run on particular nodes. There are several recommended approaches to achieve this, including Node Selector, Affinity/Anti-affinity, and Taint.

[Figure: six nodes. Node1; Node2 (label app: Kafka); Node3 (label gpu: true); Node4 (runs a Redis pod labeled app: redis); Node5 (tainted); Node6 (label gpu: rtx4090, tainted). The pods below are placed on them as follows]

Pod1: You can manually set the nodeName field in the Pod's spec, essentially bypassing the scheduler and telling k8s exactly which Node to schedule the Pod on.

  nodeName: Node1

Pod2 will only be deployed on nodes that have the app: kafka label.

  nodeSelector:
    app: kafka

Pod3 should be scheduled on a node that has a GPU, selected by using the gpu label.

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: Exists

Pod5 should be scheduled on a node that already has at least one other pod with the label app=redis.

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - redis
        topologyKey: kubernetes.io/hostname

Pod7 with a toleration for a taint can be scheduled on a node that has the matching taint, as well as on any other node that doesn't have the taint.

  kubectl taint nodes node5 app=ssd:NoSchedule

  tolerations:
  - key: "app"
    operator: "Equal"
    value: "ssd"
    effect: "NoSchedule"

Pod8 with a toleration and a node affinity can only be scheduled on a node that meets both the toleration and affinity requirements.

  kubectl taint nodes node5 gpu=RTX4090:NoSchedule

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: In
            values:
            - rtx4090
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "RTX4090"
    effect: "NoSchedule"

Pod affinity is used to ensure that a pod is scheduled on a node that has other pods running with certain characteristics, while node affinity refers to the preference of a pod to be scheduled on a specific node based on its labels.

Taints allow you to mark a node as unsuitable for certain pods, preventing them from being scheduled on that node unless they have a matching toleration.

Tolerations, affinity, and node selectors are defined on pods, while labels and taints are defined on nodes.

———————————-————————-————————-————————-—————-————————-————————-—————-—————-————————-———
Labels & selector
Labels are a powerful mechanism for grouping and organizing related objects, such as Pods, Services, Deployments, and more. Labels are key-value pairs that can be attached to Kubernetes objects, and they can be used for a variety of purposes, such as grouping related objects for easy management, selecting objects for operations such as scaling or updating, and enabling fine-grained access control.
There are several ways to use labels to group objects in Kubernetes:
-> Grouping by object type: You can use labels to group objects based on their type, such as Pods, Services, Deployments, ConfigMaps
-> Grouping by application: You can use labels to group objects based on the application they belong to, such as a web application, a database, or a caching layer
-> Grouping by functionality: You can use labels to group objects based on their functionality, such as front-end components, back-end components, databases,
caches, authentication services, video processing services
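As a small illustration of grouping by application and by functionality, the sketch below labels a pod and lets a Service select it by those labels. All names and label values (web-frontend, webshop, frontend) are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend         # hypothetical name
  labels:
    app: webshop             # grouping by application
    tier: frontend           # grouping by functionality
spec:
  containers:
  - name: nginx
    image: nginx:1.17
---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  selector:                  # selects only pods that carry both labels
    app: webshop
    tier: frontend
  ports:
  - port: 80
    targetPort: 80
```

The same labels can be used on the command line, for example kubectl get pods -l app=webshop,tier=frontend.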

Annotations
Annotations are similar to labels, but they are designed to store additional information that is not used for grouping or selection. They can be used to store information such as version numbers, timestamps, configuration details, and other metadata that is useful for debugging, monitoring, or other purposes.

The annotations on an object can total up to 256 kilobytes in size, allowing you to store more complex metadata with Kubernetes objects (label values are limited to 63 characters).

You can use annotations to configure the Nginx ingress controller. However, for more complex configurations, it can be easier to maintain and manage your Nginx configuration by using a ConfigMap.

  annotations:
    nginx.ingress.kubernetes.io/proxy-cache: "on"
    nginx.ingress.kubernetes.io/proxy-cache-path: "/data/nginx/cache"
    nginx.ingress.kubernetes.io/proxy-cache-max-size: "100m"
———————————-————————-————————-————————-—————-————————-————————-—————-—————-————————-—
Node selector
NodeSelector is a feature in Kubernetes that allows you to specify a set of labels that a node must have in order for a pod to be scheduled on that node. When you create a pod, you can specify a NodeSelector in the pod spec that will be used to match against the labels of all the nodes in the cluster. If any node has labels that match the NodeSelector, then the pod can be scheduled on that node. For more complex or multiple constraints, such as deploying a Pod on two nodes with different labels, it's better to use Affinity or Anti-Affinity.

  apiVersion: v1
  kind: Pod
  metadata:
    name: myapp-pod
  spec:
    containers:
    - name: data-processor
      image: data-processor
    nodeSelector:
      disktype: ssd

This pod will only be deployed on nodes that have the disktype: ssd label.

If a pod's NodeSelector specifies labels that don't exist on any node, the pod won't be scheduled (it stays Pending) until a node is labeled appropriately.

This command adds the disktype=ssd label to the node named kubeworker-2:

  k label node kubeworker-2 disktype=ssd

This command removes the disktype label from the node named kubeworker-2:

  k label node kubeworker-2 disktype-

[Figure: two nodes, workernode-1 and workernode-2; the pod with nodeSelector disktype: ssd is placed on the node labeled disktype: ssd]
Affinity and anti-affinity
Affinity gives you more control over the scheduling process, allowing you to set rules based on the node's labels or pod’s labels. Anti-affinity prevents Pods from
being scheduled on the same node or group of nodes.
Affinity Type     | Description
Node Affinity     | Used to specify rules for which nodes a Pod can be scheduled on based on the labels of the nodes.
Pod Affinity      | Used to specify rules for which Pods should be co-located on the same node based on the labels of other Pods running on the node.
Pod Anti-Affinity | Used to specify rules for which Pods should not be co-located on the same node based on the labels of other Pods running on the node.

Each type of Affinity can be further broken down into two categories:

Affinity Category           | Description
Required During Scheduling  | Specifies that the rule must be satisfied for the Pod to be scheduled. If the rule is not satisfied, the Pod will not be scheduled.
Preferred During Scheduling | Specifies that the rule should be satisfied for the Pod to be scheduled, but is not required. If the rule is not satisfied, the Pod will still be scheduled.

apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  containers:
  - name: database-pod
    image: postgres:13.11
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: avail-zone
            operator: In
            values:
            - zone1
      - weight: 20
        preference:
          matchExpressions:
          - key: share
            operator: In
            values:
            - dedicated

We used preferred Node Affinity to specify that the Pod prefers to be scheduled on nodes with the labels avail-zone: zone1 and share: dedicated. We also assigned a weight to each label to indicate the preference of the Pod. The higher the weight, the higher the priority of the label during scheduling.

You can specify a weight between 1 and 100 for each instance of the preferredDuringSchedulingIgnoredDuringExecution affinity type.

Preferred labels: avail-zone: zone1 (weight 80), share: dedicated (weight 20)

Node labels                              | Priority
avail-zone: zone1, share: dedicated      | Top priority
avail-zone: zone1, share: shared         | Priority: 2
avail-zone: zone2, share: dedicated      | Priority: 3
avail-zone: zone2, share: shared         | Priority: 4
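The priority order in this example follows from adding up the weights of the preference terms that each node satisfies. The sketch below illustrates only that summation. The node names are hypothetical, and the real scheduler normalizes this score and combines it with other scoring plugins, so treat it as a simplified model rather than the scheduler's implementation.

```python
# Preference terms from the manifest: (weight, label key, accepted values).
PREFERENCES = [
    (80, "avail-zone", {"zone1"}),
    (20, "share", {"dedicated"}),
]

# Hypothetical node names; the labels mirror the four nodes in the example.
NODES = {
    "node-a": {"avail-zone": "zone1", "share": "dedicated"},
    "node-b": {"avail-zone": "zone1", "share": "shared"},
    "node-c": {"avail-zone": "zone2", "share": "dedicated"},
    "node-d": {"avail-zone": "zone2", "share": "shared"},
}


def affinity_score(labels):
    """Sum the weights of all preference terms the node's labels satisfy."""
    return sum(
        weight
        for weight, key, values in PREFERENCES
        if labels.get(key) in values
    )


scores = {name: affinity_score(labels) for name, labels in NODES.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)   # weight sum per node
print(ranking)  # most preferred node first
```

A node that matches neither term scores 0 but remains schedulable, which is the difference between a preferred and a required rule.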

apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
spec:
  containers:
  - name: frontend-container
    image: frontend-image
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - backend
        topologyKey: "rack"

We're using Pod Affinity to specify that the frontend-pod must be scheduled on a node that has a Pod with the label app=backend in the same rack (topologyKey: "rack"). If no node has a matching Pod in the same rack, the frontend-pod will not be scheduled.

You can use the In, NotIn, Exists and DoesNotExist values in the operator field for affinity and anti-affinity.

[Figure: Rack 1 (Node 1 ... Node 10, labeled rack: rack1) and Rack 2 (Node 11 ... Node 20, labeled rack: rack2). The backend pod carries the label app: backend. Pod affinity (required), label selector: app=backend, topology key: rack. Frontend pods will be scheduled to nodes in the same rack as the backend pod.]

————————————-————————-————————-————————-—————-————————-————————-—————-—————-————————-—

Taints & Tolerations


Node affinity is a property of pods that can either prefer or require certain nodes for scheduling. In contrast, taints allow nodes to reject certain pods. Tolerations are applied to pods and enable the scheduler to schedule them on nodes that have the corresponding taints.

Taints are defined using the kubectl taint command, and they consist of a key-value pair and an effect. The key-value pair is used to identify the type of taint.

kubectl taint nodes node-name key(=value):taint-effect

Taints are node-specific and applied to individual nodes. Tolerations are applied to pods. Tolerations allow the scheduler to schedule pods on nodes with matching taints.

[Figure: a Node carries Taint A. A Pod without a toleration is rejected; a Pod with "tolerations: A" is accepted.]

Taint-effect

-> NoSchedule: This effect means that no new Pods will be scheduled on the Node unless they have a corresponding toleration. Existing Pods on the Node will continue to run.

-> NoExecute: This effect means that any Pods that do not have a corresponding toleration will be evicted from the Node. This can be useful for situations where a Node needs to be drained of its Pods for maintenance or other reasons.

-> PreferNoSchedule: This effect is similar to NoSchedule, but it allows Pods to be scheduled on the Node if there are no other Nodes available that match the Pod's scheduling requirements. However, if there are other Nodes available that do not have the taint, the Pod will be scheduled on one of those Nodes instead.

We apply a taint on workernode-3:

kubectl taint nodes workernode-3 app=ssd:NoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx-container
    image: nginx:1.18

nginx-pod does not tolerate the taint on workernode-3, so it will not be deployed on it. It can only run on workernode-1 or workernode-2.

apiVersion: v1
kind: Pod
metadata:
  name: db-pod
spec:
  containers:
  - name: mysql-container
    image: mysql:latest
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "ssd"
    effect: "NoSchedule"

Pods that have this toleration can be scheduled on workernode-3. db-pod can still be scheduled on other nodes that do not have that taint.

When you want to deploy a Pod on a specific node, you need to use node affinity in addition to taints and tolerations. A taint only keeps Pods without a matching toleration away from a node; a toleration permits a Pod to run there but does not attract it, so the Pod may still land on any untainted node.
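The matching rule between tolerations and taints can be sketched in a few lines. This is a simplified model of the documented behavior (an Equal toleration must match key, value, and effect; an Exists toleration ignores the value; an empty effect matches any effect), not the scheduler's actual code. The function names are illustrative, and tolerationSeconds and eviction are left out.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # A toleration without an effect matches taints with any effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False

    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # Exists ignores the value; with no key it tolerates every taint.
        return "key" not in toleration or toleration["key"] == taint["key"]

    # Equal (the default): key and value must both match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(pod_tolerations, node_taints):
    """A Pod may be placed only if every hard taint on the node is tolerated."""
    for taint in node_taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference, never blocks scheduling
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True


# Taint from the example: kubectl taint nodes workernode-3 app=ssd:NoSchedule
workernode_3 = [{"key": "app", "value": "ssd", "effect": "NoSchedule"}]
untainted_node = []

nginx_pod = []  # no tolerations
db_pod = [{"key": "app", "operator": "Equal", "value": "ssd", "effect": "NoSchedule"}]

print(can_schedule(nginx_pod, workernode_3))   # False
print(can_schedule(db_pod, workernode_3))      # True
print(can_schedule(db_pod, untainted_node))    # True
```

The last call shows why a toleration alone does not pin a Pod to the tainted node: db-pod is equally acceptable on a node without taints.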
Notice: the node-role.kubernetes.io/master taint is automatically applied to the master node when the cluster is initialized (for example, by kubeadm; recent kubeadm releases name this taint node-role.kubernetes.io/control-plane). Its purpose is to reserve the master node for running control plane components and system Pods, ensuring they have dedicated resources and are not scheduled with regular user Pods.

kubectl describe node kubemaster | grep Taint
Taints: node-role.kubernetes.io/master:NoSchedule

To enable scheduling Pods on the master node in Kubernetes, there are two approaches: adding a toleration or removing the applied taint.

Adding a Toleration: By adding a toleration to the Pod's configuration that matches the taint on the master node, the Pod can be scheduled on the master node despite the taint. This allows specific Pods to run on the master node while preserving its dedicated role for control plane components and system Pods.

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"

Removing the Taint: Another way to allow Pods to be scheduled on the master node is by removing the taint altogether. This approach effectively opens up the master node for scheduling any type of Pod, including regular user Pods. However, removing the taint means that the master node may no longer be exclusively reserved for control plane components and system Pods, potentially affecting the stability and performance of the cluster.

kubectl taint nodes <node-name> node-role.kubernetes.io/master-
kubectl taint nodes kubemaster node-role.kubernetes.io/master-

Warning: the default master taint exists to protect the stability and reliability of the control plane. Removing it is not recommended as it can lead to overloading the master, reduced HA, and potentially cluster failures.

Notice: when a node becomes not ready, indicating that it is no longer available to run new workloads, two taints are automatically added to the node: "node.kubernetes.io/not-ready:NoSchedule" and "node.kubernetes.io/not-ready:NoExecute". These taints serve different purposes: the first keeps new pods from being scheduled on the node, and the second evicts pods already running on it.

You can remove the taints using the "kubectl taint" command by appending a minus sign (-) to the taint:

kubectl taint nodes <node-name> node.kubernetes.io/not-ready:NoSchedule-
kubectl taint nodes <node-name> node.kubernetes.io/not-ready:NoExecute-

Taint/Tolerations & Node Affinity


To achieve fine-grained control over pod scheduling and ensure pods are scheduled on specific nodes while those nodes only accept certain pods, you can use a combination of Node
Affinity and Taints and Tolerations.

First, we use Node Affinity to specify the rules for selecting nodes based on their labels.

==> Node Affinity: Node Affinity is used to specify rules that determine which nodes a pod can be scheduled on. You can define node affinity rules based on node labels, node fields, or node selectors. By applying node affinity to a pod, you can restrict its scheduling to specific nodes that meet the defined criteria.

kubectl label node workernode-1 drive=ssd
kubectl label node workernode-2 cpu=xeon
kubectl label node workernode-3 gpu=yes

Second, we use Taints and Tolerations to indicate which pods can tolerate which taints on nodes. We can apply a taint to nodes that should only accept certain pods, and then specify the corresponding tolerations in the pod specification.

==> Taints and Tolerations: Taints are applied to nodes to repel or prevent pod scheduling by default. However, you can configure tolerations in the pod's configuration to allow specific pods to tolerate specific taints on nodes. Tolerations enable pods to be scheduled on tainted nodes by matching the taint's key and value.

kubectl taint nodes workernode-1 drive=ssd:NoSchedule
kubectl taint nodes workernode-2 cpu=xeon:NoSchedule
kubectl taint nodes workernode-3 gpu=yes:NoSchedule

Define the toleration in the pod's configuration:

tolerations:
- key: "drive"
  operator: "Equal"
  value: "ssd"
  effect: "NoSchedule"

Define Node Affinity rules in the pod's configuration to match specific labels or fields on production nodes:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "drive"
          operator: In
          values:
          - ssd

[Figure: five nodes. workernode-1 (drive=ssd), workernode-2 (cpu=xeon) and workernode-3 (gpu=yes) are labeled and tainted, and each receives the Pod with the matching affinity and toleration. workernode-4 and workernode-5 have no taints and receive the remaining Pods.]

With this approach, only pods that have the appropriate tolerations and satisfy the Node Affinity rules will be scheduled on these nodes. Other nodes without the specific taint or lacking the required labels/fields won't receive these pods.

Two Pods do not have any tolerations specified in their PodSpec, while two nodes do not have any taints applied. Therefore, the scheduler can schedule these two Pods on either of the taintless nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-app1
  namespace: dev
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nfs
  template:
    metadata:
      labels:
        app: nfs
    spec:
      containers:
      - name: nfs
        image: nfs:latest
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "drive"
                operator: In
                values:
                - ssd
      tolerations:
      - key: "drive"
        operator: "Equal"
        value: "ssd"
        effect: "NoSchedule"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: dev
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "cpu"    # must match the label on the node this toleration targets
                operator: In
                values:
                - xeon
      tolerations:
      - key: "cpu"
        operator: "Equal"
        value: "xeon"
        effect: "NoSchedule"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-processor-app1
  namespace: dev
spec:
  replicas: 3
  selector:
    matchLabels:
      app: image-processor
  template:
    metadata:
      labels:
        app: image-processor
    spec:
      containers:
      - name: image-processor
        image: image-processor
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "gpu"    # must match the label on the node this toleration targets
                operator: In
                values:
                - "yes"       # quoted so YAML does not read it as a boolean
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "yes"
        effect: "NoSchedule"
priority class & Preemption
A priority class is a way to assign a priority value to a Pod, which determines its relative importance compared to other Pods. The priority value can be any 32-bit integer up to 1,000,000,000 (larger values are reserved for system-critical Pods), with higher values indicating higher priority.

Preemption policies determine whether a higher priority Pod can preempt (evict) a lower priority Pod to be scheduled on a node. The preemptionPolicy field accepts two values:

-> "PreemptLowerPriority" (default): Pods with this priority class are allowed to preempt lower priority Pods if there are no nodes with available resources to schedule them without preemption.

-> "Never": Pods with this priority class are never allowed to preempt lower priority Pods.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority

The globalDefault field indicates whether this PriorityClass should be used for pods without a PriorityClass.

Once you have defined the priority class, you can assign it to a Pod by specifying its spec.priorityClassName field.

apiVersion: v1
kind: Pod
metadata:
  name: db-1
spec:
  containers:
  - name: db-1
    image: mysql:latest
  priorityClassName: high-priority

[Figure: Pod2 (priorityClassName: high-priority, nodeSelector app: kafka, memory request 300) cannot fit on Node-1 (label app:kafka, available mem:200), which runs three Pods requesting mem:150 each. The scheduler evicts one lower priority Pod, which is rescheduled on Node-2, and places Pod2 on Node-1 (available mem:50 afterwards).]

Pod disruption budget
A Pod Disruption Budget (PDB) in Kubernetes is a way to ensure that a certain number or percentage of an application's pods are not voluntarily evicted at the same time. This can help to maintain high availability during voluntary disruptions like upgrades and maintenance.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

In this example, the Pod Disruption Budget named my-pdb specifies that at least two pods with the label app=my-app should be available at all times.

Bin packing
Bin packing in k8s refers to the process of efficiently utilizing resources by scheduling pods on nodes in a way that maximizes resource usage and minimizes wasted resources. Kubernetes achieves bin packing through its scheduler, which considers factors such as resource requests, limits, and available resources on nodes to make scheduling decisions. Two strategies are commonly used to decide the placement of Pods:

BestFit: In this approach, the scheduler places the incoming Pod in the node with the least amount of free resources after placement. This strategy fills up nodes as much as possible and leaves as much space free as possible on every other node. In the kube-scheduler it corresponds to the MostAllocated scoring strategy.
WorstFit: In this approach, the scheduler places the incoming Pod in the node with the most amount of free resources after placement. This strategy spreads Pods across nodes rather than packing them. In the kube-scheduler it corresponds to the LeastAllocated scoring strategy, which is the default.
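The difference between the two strategies can be seen in a small sketch. It assumes that every node has the same capacity and that "free resources" is measured as the sum of the free CPU and memory fractions left after placement; both are simplifying assumptions for illustration, and the node names and numbers are made up.

```python
# Assumed identical capacity for every node (millicores, MB).
CAPACITY = {"cpu": 1000, "mem": 2000}

# Hypothetical free resources per node before placing the new Pod.
FREE = {
    "node1": {"cpu": 800, "mem": 1500},
    "node2": {"cpu": 300, "mem": 700},
    "node3": {"cpu": 100, "mem": 200},
}


def leftover_fraction(free, request):
    """Free capacity after placement, as a sum of per-resource fractions.

    Returns None when the request does not fit on the node.
    """
    left = {res: free[res] - request[res] for res in request}
    if any(amount < 0 for amount in left.values()):
        return None
    return sum(left[res] / CAPACITY[res] for res in left)


def pick_node(request, strategy):
    """Choose a node using BestFit (least leftover) or WorstFit (most leftover)."""
    candidates = {}
    for name, free in FREE.items():
        score = leftover_fraction(free, request)
        if score is not None:
            candidates[name] = score
    if not candidates:
        return None  # placement failure: no node can host the Pod
    choose = min if strategy == "BestFit" else max
    return choose(candidates, key=candidates.get)


request = {"cpu": 200, "mem": 500}
print(pick_node(request, "BestFit"))   # node2: tightest node that still fits
print(pick_node(request, "WorstFit"))  # node1: emptiest node
```

node3 is skipped by both strategies because the request does not fit there at all.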

Placement failures can occur in bin packing scenarios in Kubernetes when the scheduler is unable to find a suitable node to schedule a pod due to resource constraints or other constraints defined in the cluster.

Scenario: there are three nodes in the cluster, each with 1000m of CPU and 2GB RAM. Currently, there are nine running pods (blue) with their allocated resource requests. However, a new pod (orange) with a request of 300m CPU and 600MB RAM cannot be scheduled, because no single node satisfies both the CPU and RAM requirements of the new pod. Surprisingly, even though the entire cluster has a total of 600m CPU and 1200MB RAM available, the scheduler is unable to find a suitable node.

Problem 1: placement failure (requests shown as vCPU m / Mem MB; each node has 1000 vCPU and 2000 Mem in total)

Node   | Pods                                        | Avail vCPU | Avail Mem
Node 1 | Pod A 200/400, Pod B 250/600, Pod C 350/500 | 200        | 500
Node 2 | Pod D 300/500, Pod E 200/300, Pod F 300/800 | 200        | 400
Node 3 | Pod G 100/250, Pod H 300/750, Pod I 400/700 | 200        | 300

Pending: Pod X 300/600 (priorityClassName: high-priority)
Total available capacity across the cluster: vCPU 600m, memory 1200 MB

Migrate and Place

You can consider moving Pod A from Node1 to Node2. Doing so consolidates the free resources (400m CPU, 900MB RAM) on Node1, which creates enough room on Node1 for the pending Pod X to be placed by the scheduler.

Node   | Pods                                                       | Avail vCPU | Avail Mem
Node 1 | Pod B 250/600, Pod C 350/500, Pod X 300/600                | 100        | 300
Node 2 | Pod D 300/500, Pod E 200/300, Pod F 300/800, Pod A 200/400 | 0          | 0
Node 3 | Pod G 100/250, Pod H 300/750, Pod I 400/700                | 200        | 300

Total available capacity across the cluster: vCPU 300m, memory 600 MB

The operation of moving Pod A from Node1 to Node2 can be performed manually by interacting directly with the Kubernetes API, for example from the command line. Additionally, adjusting the priorities of your pods can help in scenarios where resources are scarce. By assigning appropriate priorities to your pods, you can ensure that critical pods have higher priorities compared to less critical pods. When resources become limited, the Kubernetes scheduler can use these priorities to decide which pods to preempt in order to make room for higher priority pods. By preempting lower priority pods, Kubernetes ensures that critical pods get scheduled and receive the necessary resources. This helps in optimizing resource utilization and ensuring that important workloads are given priority even in resource-constrained environments.
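The placement failure and the effect of the migration can be checked with a short calculation. The sketch below uses the per-pod requests from the scenario's figure (vCPU in millicores, memory in MB); the data layout and helper names are illustrative, and the "migration" is just a dictionary update, not a Kubernetes operation.

```python
CAPACITY = (1000, 2000)  # (vCPU millicores, memory MB) per node

# Pod requests per node, as (vCPU, memory), before the migration.
placement = {
    "node1": {"A": (200, 400), "B": (250, 600), "C": (350, 500)},
    "node2": {"D": (300, 500), "E": (200, 300), "F": (300, 800)},
    "node3": {"G": (100, 250), "H": (300, 750), "I": (400, 700)},
}
pod_x = (300, 600)  # the pending pod


def available(node):
    """Free (vCPU, memory) on a node: capacity minus the sum of requests."""
    pods = placement[node].values()
    return (
        CAPACITY[0] - sum(cpu for cpu, _ in pods),
        CAPACITY[1] - sum(mem for _, mem in pods),
    )


def nodes_that_fit(request):
    """Nodes where both the CPU and the memory request fit."""
    return [
        node
        for node in placement
        if available(node)[0] >= request[0] and available(node)[1] >= request[1]
    ]


total_free = tuple(sum(available(n)[i] for n in placement) for i in range(2))
fits_before = nodes_that_fit(pod_x)
print(total_free, fits_before)  # cluster-wide capacity is enough, no node is

# Migrate Pod A from node1 to node2, then try again.
placement["node2"]["A"] = placement["node1"].pop("A")
fits_after = nodes_that_fit(pod_x)
print(available("node1"), fits_after)
```

The scheduler evaluates each node on its own, so spare capacity that is fragmented across nodes does not help until it is consolidated on one of them.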

Imbalanced placement in Kubernetes refers to a situation where the distribution of pods or workloads across the nodes in a Kubernetes cluster is uneven or skewed. This can lead to certain nodes being overloaded while others are underutilized, resulting in inefficient resource allocation and potential performance issues. There are a few common causes of imbalanced placement in Kubernetes:

Node labels and pod affinity/anti-affinity: Kubernetes provides mechanisms like node labels and pod affinity/anti-affinity rules to influence the placement of pods. If these rules are not properly configured or if there are inconsistencies in the labels, pods may not be distributed evenly across nodes.

Resource requests and limits: Kubernetes allows you to specify resource requests and limits for pods, indicating the minimum and maximum amount of resources (CPU, memory) they require. If these values are set incorrectly or if there is a wide variation in the resource requirements of pods, it can lead to imbalanced placement.

Node capacity and utilization: If the nodes in a Kubernetes cluster have different capacities in terms of CPU, memory, or other resources, it can result in imbalanced placement. Nodes with higher capacity may end up hosting more pods, while nodes with lower capacity may remain underutilized.

Scenario: Node1 has high CPU usage (90%) but relatively low memory usage (25%). On the other hand, Node3 has low CPU usage (20%) but high memory usage (85%). This imbalance in resource utilization across the nodes can have the following impacts:

Pod B on Node1: Since Pod B is a CPU-intensive process, the high CPU usage on Node1 indicates that there might be limited CPU resources available for Pod B during peak load situations. This can result in Pod B experiencing CPU starvation, leading to degraded performance or even failures if it requires more CPU resources than what is available.

Pod E on Node3: As Node3 has high memory usage (85%), Pod E, which is running on Node3, might face memory starvation during peak load scenarios. If Pod E requires additional memory resources that are not available due to high memory usage on Node3, it can lead to out-of-memory errors or performance degradation.

Problem 2: imbalanced placement (requests shown as vCPU m / Mem MB; each node has 1000 vCPU and 2000 Mem in total; Node 3 availability is derived from the pod requests)

Node   | Pods                          | Avail vCPU | Avail Mem
Node 1 | Pod A 500/300, Pod B 400/200  | 100        | 1500
Node 2 | Pod C 300/500, Pod D 400/1100 | 300        | 400
Node 3 | Pod E 100/950, Pod F 100/750  | 800        | 300

Swap and Balance

Node   | Pods                          | Avail vCPU | Avail Mem
Node 1 | Pod A 500/300, Pod F 100/750  | 400        | 950
Node 2 | Pod C 300/500, Pod D 400/1100 | 300        | 400
Node 3 | Pod E 100/950, Pod B 400/200  | 500        | 850

If we swap Pod B and Pod F between Node1 and Node3, utilization becomes balanced: Node1 has 60% CPU usage and about 53% memory usage, while Node3 has 50% CPU usage and about 58% memory usage. With these resource utilization levels, the pods on these two nodes should be able to handle peak load without experiencing resource starvation or performance degradation.
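The utilization figures before and after the swap can be recomputed from the pod requests. The sketch below assumes the requests shown in the figure for Pods A, B, E and F (vCPU in millicores, memory in MB) and a capacity of 1000m CPU and 2000 MB per node; the helper name is illustrative.

```python
CAPACITY = (1000, 2000)  # (vCPU millicores, memory MB) per node

# Requests as (vCPU, memory), taken from the figure.
node1 = {"A": (500, 300), "B": (400, 200)}
node3 = {"E": (100, 950), "F": (100, 750)}


def utilization(pods):
    """Return (CPU %, memory %) requested on a node."""
    cpu_used = sum(cpu for cpu, _ in pods.values())
    mem_used = sum(mem for _, mem in pods.values())
    return (100 * cpu_used / CAPACITY[0], 100 * mem_used / CAPACITY[1])


before = (utilization(node1), utilization(node3))

# Swap the CPU-heavy Pod B with the memory-heavy Pod F.
node1["F"] = node3.pop("F")
node3["B"] = node1.pop("B")

after = (utilization(node1), utilization(node3))

print(before)  # ((90.0, 25.0), (20.0, 85.0))
print(after)   # ((60.0, 52.5), (50.0, 57.5))
```

After the swap neither node is above 60% on either resource, which leaves headroom on both dimensions for peak load.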
Daemonset
A DaemonSet is a type of controller that ensures that all (or some) nodes in a cluster run a copy of a specific pod. It is often used for system-level tasks that should be run on every node, such as log collection, monitoring, or other types of background tasks.

When you create a DaemonSet, Kubernetes automatically creates a pod on each node that matches the specified node selector. If a new node is added to the cluster, Kubernetes automatically creates a new pod on that node as well.

By using labels and node selectors, you can specify which nodes in the Kubernetes cluster should run a particular DaemonSet. This allows you to restrict the execution of the DaemonSet to specific nodes.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon
  namespace: dev-drive-monitor
spec:
  template:
    metadata:
      labels:
        app: monitoring
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent
      nodeSelector:
        Drive: ssd       # Only on nodes that have this label, a pod of the DaemonSet is automatically created
  selector:
    matchLabels:
      app: monitoring    # The selector section specifies the label selector used to identify which pods are managed by the DaemonSet

What is the best way to test a DaemonSet on a limited number of nodes without consuming too many resources from the customer's service?
One approach could be to create a separate namespace with a ResourceQuota that limits the amount of resources that can be used by the DaemonSet. This will ensure that the DaemonSet does not consume too many resources from the customer's service.

[Figure: a cluster of four nodes, each running one daemon pod.]

Add a new node: the DaemonSet automatically creates a new pod on the node.
Remove a node: the DaemonSet automatically terminates the corresponding pod.

Static Pod
A static pod is a pod that is managed directly by the kubelet on a specific node, rather than by the Kubernetes API server. A static pod is defined by a YAML manifest file that is placed
in a specific directory on the node, and the kubelet monitors that directory for changes to the manifest file

The control plane containers come up statically, and their manifest files are located in the directory /etc/kubernetes/manifests. This means that the Kubernetes components, such as the API server, controller manager, and scheduler, are started as containers using pre-defined manifests located in the /etc/kubernetes/manifests directory.

[Figure: on a node, the kubelet reads the manifests and starts the kube-api, etcd, scheduler and kube-controller-manager pods directly on the container runtime engine.]

StaticPodPath is a directory path where static pod manifests are stored on a node. It is set by the staticPodPath field of the kubelet configuration file:

/var/lib/kubelet/<configuration>.yaml > Path: staticPodPath
staticPodPath: /etc/kubernetes/manifests

$ kubectl --namespace kube-system get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP               NODE
etcd-kubemaster-1                      1/1     Running   4          6d2h   192.168.100.11   kubemaster-1
kube-apiserver-kubemaster-1            1/1     Running   3          6d2h   192.168.100.11   kubemaster-1
kube-controller-manager-kubemaster-1   1/1     Running   3          6d2h   192.168.100.11   kubemaster-1
kube-scheduler-kubemaster-1            1/1     Running   3          6d2h   192.168.100.11   kubemaster-1

To delete a static pod in Kubernetes, you can either delete its corresponding manifest file from StaticPodPath or move the manifest file to another path. Use the following command to remove the manifest file, replacing <static-pod-manifest.yaml> with the filename of the manifest associated with the static pod you want to delete:

sudo rm /etc/kubernetes/manifests/<static-pod-manifest.yaml>

After either of these operations, the kubelet running on the node detects the change in the static pod directory. It stops managing the static pod associated with the deleted or moved YAML file and terminates that pod.

Static Pods                                    | DaemonSets
Created by the kubelet                         | Created by Kube-API server
Deploy Control Plane components as static pods | Deploy monitoring agents, logging agents on nodes
Ignored by the kube-scheduler                  |

Metrics Server
The Metrics Server is a component of Kubernetes that provides container resource metrics for built-in autoscaling pipelines. It collects resource metrics from Kubelets and exposes
them through the Metrics API in the Kubernetes API server. These metrics can be used by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler for autoscaling purposes

[Figure: metrics pipeline. 1. cAdvisor in the kubelet collects metrics from the pods on each node. 2. The kubelet exposes the metrics. 3. The Metrics Server collects and aggregates the metrics. 4. The API server exposes the metrics to the HPA and to kubectl top.]

cAdvisor (short for "Container Advisor") is a component of the kubelet that is responsible for collecting and monitoring performance metrics for containers and pods running on a node in a Kubernetes cluster.

The kubelet provides an API for discovering and retrieving per-node summarized stats, available through the /metrics/resource endpoint.

The Metrics Server aggregates metrics such as CPU and memory usage and stores them in memory.

$ kubectl top nodes
NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node1   50m          5%     983Mi           49%
node2   47m          4%     1043Mi          52%

$ kubectl top pod
NAME   CPU(cores)   MEMORY(bytes)
pod1   0m           10Mi
pod2   1m           100Mi
Autoscaling
Autoscaling refers to the ability of the Kubernetes cluster to automatically adjust the number of running instances of a specific workload or application based on the current
demand or load. Autoscaling helps to ensure that there are enough resources available to handle the workload while also optimizing resource utilization.
Kubernetes provides two types of autoscaling mechanisms:

Horizontal Pod Autoscaling (HPA) allows you to automatically scale out your application by adding or removing replicas based on resource utilization metrics such as CPU utilization or custom metrics. This ensures that you have the necessary resources to handle increased traffic or load without over-provisioning resources and incurring additional costs.

Scaling out, also known as horizontal scaling, is the process of adding more replicas of a Deployment or ReplicaSet to handle an increase in traffic or load.

[Figure: before HPA scaling, a single pod (Pod1); after HPA scaling, more pods (Pod1, Pod2, Pod3).]
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
Metrics Server < apiVersion: apps/v1

<————>

<————>

<————>
name: php-apache This specifies the target 2.calculate the Replicas
kind: Deployment
.

metadata:
Deployment for the HPA …
namespace: dev name: php-apache
spec: 15 secs
namespace: dev

>
scaleTargetRef: <—
Pod1 ———Pod2 1.Query for metrics
<—

Pod3
<—
spec:
>
——

—— —

——

Horizontal Pod Autoscaler


—— ——
kind: Deployment ——
——



—— selector:
name: php-apache Deployment matchLabels:
run: php-apache
Scall <
ReplicaSet
apiVersion: apps/v1 replicas: 1
Replication Controller
minReplicas: 3 minimum and maximum number of 3.scale the app to desired replicas
template:
StatefulSet
maxReplicas: 20 replicas for the Deployment metadata:
metrics: labels:
- type: Resource HPA uses the metrics server to collect the metrics data and then uses the scaling run: php-apache
resource: spec:
algorithm to calculate the new number of replicas needed based on the current load.

HPA manifest (continued):

      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 40
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 5
        periodSeconds: 30
      - type: Percent
        value: 100
        periodSeconds: 30
      selectPolicy: Max
      stabilizationWindowSeconds: 40
    scaleDown:
      policies:
      - type: Pods
        value: 4
        periodSeconds: 10
      - type: Percent
        value: 10
        periodSeconds: 10
      selectPolicy: Min
      stabilizationWindowSeconds: 5

The metrics section defines the metrics that the HPA uses to scale the Deployment. In this case, two metrics are specified: CPU utilization and memory utilization. For each metric, the HPA calculates the average utilization across all pods and compares it to the target utilization. If the actual utilization exceeds the target utilization, the HPA increases the number of replicas. If the actual utilization falls below the target utilization, the HPA decreases the number of replicas. By using multiple metrics, the HPA can make more informed scaling decisions.

The behavior section defines the scaling behavior for the HPA. In this case, the HPA uses two policies for scaling up and two policies for scaling down. The Pods policy specifies the number of replicas to add or remove, while the Percent policy specifies the percentage of replicas to add or remove. By using both policies, the HPA can scale up or down more quickly or slowly, depending on the workload. The selectPolicy field specifies how the HPA should choose between the Pods and Percent policies. Here scaleUp uses Max, so the policy that allows the largest change is applied when scaling up, and scaleDown uses Min, so the policy that allows the smallest change is applied when scaling down. The stabilizationWindowSeconds field specifies how many seconds of past recommendations the HPA takes into account before it acts. This helps to prevent rapid back-and-forth scaling, which can cause instability in the cluster.

scaleUp: The Pods policy specifies that the HPA may add 5 replicas every 30 seconds, while the Percent policy specifies that the HPA may add 100% of the current replicas every 30 seconds.

scaleDown: The Pods policy specifies that the HPA may remove 4 replicas every 10 seconds, while the Percent policy specifies that the HPA may remove 10% of the current replicas every 10 seconds.

Deployment manifest (continued), followed by the Service:

      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: dev
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

k get hpa -n dev
NAME         REFERENCE               TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   40%/40%, 20%/50%   3         20        3          1d

[Figure: scale-up from 3 replicas. The Pods policy allows 5 pods and the Percent policy allows 100% of 3 pods = 3 pods; Max(5, 3) = 5, so the Deployment grows from 3 to 8 replicas.]

NAME         REFERENCE               TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   30%/40%, 15%/50%   3         20        8          1d

HPA is designed to automatically scale the number of replicas of a deployment or a replica set based on observed CPU utilization, memory utilization, or custom metrics. This makes it well-suited for stateless workloads that can be easily scaled horizontally by adding more replicas.
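The replica calculation and the scale-up limit can be written out as a short worked example. The formula is the one given in the Kubernetes HPA documentation; the observed CPU utilization of 150% is an assumed input chosen for illustration, and the policy values (5 pods, 100%) are the scaleUp policies from the manifest.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Assumed input: 3 replicas, observed CPU utilization 150%, target 50%
\[
\text{desiredReplicas} = \left\lceil 3 \times \frac{150}{50} \right\rceil = 9
\]

% scaleUp policies with selectPolicy: Max, evaluated over one 30-second period
\[
\text{maxIncrease} = \max(5,\ 100\% \times 3) = \max(5, 3) = 5
\qquad\Rightarrow\qquad
\text{replicas after the first period} = \min(9,\ 3 + 5) = 8
\]
```

Under these assumptions the HPA would want 9 replicas but can only reach 8 in the first period; the remaining replica can be added in a later period if the load persists.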

Vertical Pod Autoscaling (VPA) allows you to automatically scale up or down the resource requests and limits of containers in a Pod based on actual resource usage. This ensures that each Pod has the necessary resources to handle the workload efficiently without wasting resources.

Scaling up, also known as vertical scaling, is the process of increasing the resources available to each replica of a Deployment or ReplicaSet to handle an increase in demand.

[Figure: Pod1 before scaling (cpu: 2, mem: 2G) -> VPA -> Pod1 after scaling (cpu: 4, mem: 6G). Here, the VPA is scaling the cpu and mem of pod1.]

There are two ways to trigger VPA in k8s: automatic and manual. You can set the updateMode field under updatePolicy to Auto for automatic scaling or Off for manual scaling.

1. Automatic scaling

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: php-apache
  namespace: dev
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      mode: "Auto"
      controlledValues: "RequestsAndLimits"
      minAllowed:
        cpu: 10m
        memory: 5Mi
      maxAllowed:
        cpu: 200m
        memory: 500Mi
      controlledResources: ["cpu", "memory"]

Because the updateMode field is set to "Auto", the VPA Updater component will automatically update the resource requests and limits of the containers in the pods.

targetRef: The reference to the target workload object that the VPA should adjust. In this case, it's a deployment with the name php-apache.
updatePolicy: Determines whether and how the pod resource requests and limits are updated. In this case, it's set to "Auto", which means the VPA will automatically update the resource requests and limits based on the pod's usage.
resourcePolicy: Defines the resource policy for the containers in the target workload. In this case, there is one container policy defined.
containerName: The name of the container to apply the policy to. In this case, it's set to *, which means the policy applies to all containers in the target workload.
mode: Determines how the resource requests and limits are set. In this case, it's set to "Auto", which means the VPA will automatically adjust the resource requests and limits based on the pod's usage.
controlledValues: The values that the VPA is allowed to set. In this case, it's set to "RequestsAndLimits", which means the VPA can adjust both the resource requests and limits.
minAllowed / maxAllowed: The minimum values allowed for the container are 10 milliCPU and 5 MiB of memory. The maximum values allowed for the container are 200 milliCPU and 500 MiB of memory.
controlledResources: The resources that the VPA is allowed to adjust. In this case, it's set to both CPU and memory.

The Deployment targeted by the VPA:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  namespace: dev
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "20m"
            memory: "200Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

2. Manual scaling

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: php-apache
  namespace: dev
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"

Because the updateMode field is set to "Off", the VPA Updater component will not automatically update the resource requests and limits of the containers in the pods. In this case, you will need to manually update the resource requests and limits of the pods when necessary.

To manually adjust the resource requests and limits, you can update the deployment or statefulset object that the VPA is targeting. The kubectl describe vpa command can provide recommendations for the resource requests and limits of containers based on the resource usage metrics collected by the VPA controller.

VPA is well-suited for stateful workloads.

kubectl -n dev get vpa
NAME         MODE   CPU    MEM       PROVIDED   AGE
php-apache   Off    163m   262144k   True       2m7s

kubectl describe vpa php-apache -n dev
…
Container Recommendations:
  Container Name: php-apache
  Lower Bound:
    Cpu: 25m
    Memory: 262144k
  Target:
    Cpu: 163m
    Memory: 262144k
  Upper Bound:
    Cpu: 10173m
    Memory: 2770366988

Lower Bound: The minimum amount of CPU and memory that the container should have to meet the resource utilization targets.
Target: The target amount of CPU and memory that the container should have to achieve the desired resource utilization levels.
Upper Bound: The maximum amount of CPU and memory that the container can use.

VPA components (VPA Recommender, VPA Updater, VPA Admission Controller) and how they interact with the Metrics Server, which monitors Pod resource utilization metrics:

1. The VPA is configured.
2. The VPA Recommender reads the configuration from the VPA and the Pod resource utilization metrics from the Metrics Server, and computes a recommendation.
3. The VPA Recommender provides the Pod resource recommendation.
4. The VPA Updater reads the Pod resource recommendations.
5. The VPA Updater terminates the Pod.
6. The Deployment recreates the Pod.
7. The VPA Admission Controller gets the Pod resource recommendation.
8. The VPA Admission Controller applies the recommendation to the Pod spec (the figure labels this step with cpu: "250m" and shows the resulting Pod with Cpu: "500m").
Configure application

Kubernetes provides several ways to configure applications, including using ConfigMaps, environment variables, and Secrets.
ConfigMaps are Kubernetes resources that can be used to store configuration data as key-value pairs. You can create a ConfigMap with the desired configuration data, and then
reference it in your Deployment or Pod specification using the 'configMapKeyRef' field or mount it directly to the pod.

You can create a ConfigMap using the 'kubectl create configmap' command, or by defining a YAML file.

k get cm
k describe cm db-config

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-configmap
data:
  DB_HOST: "mydbhost"
  DB_PORT: "5432"
  DB_NAME: "mydb"

apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  db-host: il-server2
  db-port: "5432"
  DB_NAME: "payments"

[Figure: a Pod consumes a ConfigMap and a Secret either as environment variables or as files in a mounted volume, for example under /etc/file.]

How you can use a ConfigMap to store configuration data

> Environment Variables: You can store environment variables in a ConfigMap and use them to configure your application.
> Configuration Files: You can store configuration files in a ConfigMap and mount them as volumes in your container.
> Command-Line Arguments: You can store command-line arguments in a ConfigMap and use them to configure your application.

Configuration Files

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-configmap
data:
  db-host: ir-server2
  db-port: "5432"
  DB_NAME: "payments"

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: myimage
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-configmap

[Figure: the ConfigMap 'app-configmap' is defined, connected to the Pod through the volume 'config-volume', and used by the container through a volumeMount at /etc/config.]

We define a volume named 'config-volume' that maps to the 'app-configmap' ConfigMap using the 'configMap' field. We then mount this volume into the container using the 'volumeMounts' field, which specifies that the volume should be mounted at the path '/etc/config' in the container.

Now, any configuration files that are stored in the 'app-configmap' ConfigMap can be accessed by the application running in the container at the '/etc/config' path.

Command-Line Arguments

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: myimage
    command: ["/bin/myapp"]
    args: ["--config", "/etc/myapp/config.yaml"]
    volumeMounts:
    - name: config-volume
      mountPath: /etc/myapp/
  volumes:
  - name: config-volume
    configMap:
      name: my-configmap

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap
data:
  config.yaml: |
    setting1: value1
    setting2: value2
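Command-line arguments can also be filled from individual ConfigMap keys instead of a mounted file. The sketch below assumes the 'db-config' ConfigMap with the keys db-host and db-port, plus a hypothetical binary /bin/myapp that accepts --db-host and --db-port flags (neither flag comes from the original). Kubernetes expands $(VAR_NAME) references in 'command' and 'args' from the container's environment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod-args               # hypothetical name for this sketch
spec:
  containers:
  - name: mycontainer
    image: myimage
    env:
    # Each variable is read from one key of the db-config ConfigMap
    - name: DB_HOST
      valueFrom:
        configMapKeyRef:
          name: db-config
          key: db-host
    - name: DB_PORT
      valueFrom:
        configMapKeyRef:
          name: db-config
          key: db-port
    command: ["/bin/myapp"]
    # --db-host and --db-port are assumed flags of the hypothetical myapp binary;
    # $(DB_HOST) and $(DB_PORT) are expanded by Kubernetes before the container starts
    args: ["--db-host=$(DB_HOST)", "--db-port=$(DB_PORT)"]
```

Because the values are resolved when the container starts, a change to the ConfigMap only takes effect after the Pod is recreated.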

Environment variables can be used to pass configuration information to the container, such as database connection strings or API keys. You can define environment variables
in the Deployment or Pod specification using the 'env' field, the 'envFrom' field, and the 'valueFrom' field.
> The 'env' field is used to define individual environment variables for a container. You can define the name and value of each environment variable using the 'name' and 'value' fields, respectively.

apiVersion: apps/v1
kind: Deployment
…
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: web
        image: myapp:latest
        ports:
        - containerPort: 80
        env:
        - name: DB_HOST
          value: "il-server2"
        - name: DB_PORT
          value: "5432"
        - name: DB_NAME
          value: "payments"

> The 'envFrom' field is used to define environment variables for a container based on a ConfigMap or Secret. You can specify the name of the ConfigMap or Secret using the 'configMapRef' or 'secretRef' fields, respectively.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: web
        image: myapp:latest
        ports:
        - containerPort: 80
        envFrom:
        - configMapRef:
            name: db-config
        - secretRef:
            name: db-secrets

apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  DB_HOST: "il-server2"
  DB_PORT: "5432"
  DB_NAME: "payments"

apiVersion: v1
kind: Secret
metadata:
  name: db-secrets
type: Opaque
data:
  DB_USER: dXNlcg==
  DB_PASSWORD: cGFzc3dvcmQ=

> The 'valueFrom' field is used to define environment variables for a container based on a field in another resource, such as a ConfigMap or Secret. You can specify the name of the resource and the field using the 'configMapKeyRef' or 'secretKeyRef' fields, respectively.

apiVersion: apps/v1
kind: Deployment
…
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: web
        image: myapp:latest
        ports:
        - containerPort: 80
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: db-config
              key: db-host
        - name: DB_PORT
          valueFrom:
            configMapKeyRef:
              name: db-config
              key: db-port

apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  db-host: il-server2
  db-port: "5432"
  DB_NAME: "payments"

The 'name' field in the 'configMapKeyRef' field specifies the name of the ConfigMap, and the 'key' field specifies the name of the key within the ConfigMap to use as the value for the environment variable.

Because we only want to add specific variables from a ConfigMap to a container, we use the 'valueFrom' field.

Secrets are similar to ConfigMaps, but are used to store sensitive information such as passwords, tokens or API keys. You can create a Secret with the desired sensitive information,
and then reference it in your Deployment or Pod specification using the 'secretKeyRef' field.
You can also create a secret by running the kubectl create secret command:

kubectl create secret generic db-secret --from-literal=username=myuser --from-literal=password=mypassword

This command will create a secret named db-secret with two key-value pairs: username and password.

arye@dev: kubectl get secrets
NAME                  TYPE                                  DATA   AGE
default-token-abc12   kubernetes.io/service-account-token   3      4d
db-secret             Opaque                                2      2h

The DATA column shows the number of data items (key-value pairs) in each secret.

arye@dev: kubectl describe secret db-secret
Name:         db-secret
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
password:  10 bytes
username:  6 bytes

> To use a secret in a pod, you can mount it as a volume or use it as an environment variable.

spec:
  containers:
  - name: my-container
    image: my-image
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/myapp/secret
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: db-secret

> To update a secret, you can use the kubectl edit secret command or edit the yaml file directly and apply the changes.

kubectl edit secret db-secret

By default, the values of the key-value pairs in a secret are base64-encoded to provide a basic level of obfuscation. To decode the values, you can use the base64 command:

kubectl get secret db-secret -o jsonpath='{.data.password}' | base64 --decode
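The second option mentioned above, using a secret as an environment variable, could look like the sketch below. It assumes the db-secret created with kubectl create secret (keys username and password); the Pod name and the variable names DB_USERNAME and DB_PASSWORD are illustrative choices, not part of the original.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                   # hypothetical Pod name
spec:
  containers:
  - name: my-container
    image: my-image
    env:
    # Each variable receives the decoded value of one key in db-secret
    - name: DB_USERNAME          # illustrative variable name
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: username
    - name: DB_PASSWORD          # illustrative variable name
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: password
```

Unlike a mounted volume, environment variables are set once at container start, so an updated secret is only picked up after the Pod restarts.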

There are several types of secrets in Kubernetes, including:

1 Opaque: This is the default secret type in Kubernetes. It can be used to store any arbitrary data, and its values are stored base64-encoded.
2 TLS (kubernetes.io/tls): This type of secret is used to store TLS certificates and keys. It contains two keys: tls.crt and tls.key.
3 Docker-registry (kubernetes.io/dockerconfigjson): This type of secret is used to authenticate with a container registry. It contains the registry credentials, such as the username and password.
4 SSH (kubernetes.io/ssh-auth): This type of secret is used to store SSH credentials. It contains the private key under the ssh-privatekey key.
5 Service account (kubernetes.io/service-account-token): This type of secret contains a token that can be used to authenticate as a service account. Before Kubernetes 1.24 it was created automatically for every service account; from 1.24 onward it is only created when you request it.
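As an illustration of a typed secret, a TLS secret manifest might look like the following sketch. The secret name is hypothetical and the two values are placeholders that must be replaced with real base64-encoded PEM data before the manifest can be applied.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-tls-secret            # hypothetical name
type: kubernetes.io/tls
data:
  # Placeholders only: substitute the base64-encoded certificate and private key
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```

The same result can be produced with kubectl create secret tls, which reads the certificate and key from files and encodes them for you.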

Warning
Kubernetes Secrets use base64 encoding to obfuscate the sensitive data. Base64 encoding is not a form of encryption and can be easily decoded.

Two solutions to this problem:

> Use external encryption tools or key management systems (HashiCorp Vault, Azure Key Vault, or AWS Key Management Service) to secure sensitive data before storing it in Kubernetes Secrets.
> Use access controls to limit who can access the sensitive data. K8s provides various mechanisms for controlling access, such as RBAC and network policies, that can be used to limit access to sensitive data to only authorized users and applications.
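The access-control option can be sketched with RBAC as follows. The Role allows reading only the db-secret Secret, and the RoleBinding grants that Role to a single service account. The names db-secret-reader, db-secret-reader-binding and my-app are hypothetical and chosen for this example.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: db-secret-reader         # hypothetical Role name
  namespace: default
rules:
- apiGroups: [""]                # "" is the core API group, where Secrets live
  resources: ["secrets"]
  resourceNames: ["db-secret"]   # restrict access to this one Secret
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: db-secret-reader-binding # hypothetical binding name
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app                   # hypothetical service account
  namespace: default
roleRef:
  kind: Role
  name: db-secret-reader
  apiGroup: rbac.authorization.k8s.io
```

Listing is deliberately left out of the verbs, since a list on secrets would return the contents of every Secret in the namespace.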

————————————————————————-—————————-————————-————————-————————-————————-————————-—————-
Application Lifecycle Management
initContainer
An init container is a special type of container that runs before the main container(s) in a pod. The purpose of an init container is to perform some initialization or setup tasks that are required before the main container(s) can start running. Init containers are defined in the same pod specification as the main container(s), under the 'initContainers' field. They can be used to perform tasks such as setting up a database schema, downloading necessary files, or waiting for a specific service to become available.
Init containers have their own lifecycle: they run one after another, and each is considered successful when it completes its task without error. If an init container fails, Kubernetes restarts it until it succeeds (unless the pod's restartPolicy is Never, in which case the pod is marked as failed). This ensures that the main container(s) in the pod are not started until the initialization tasks are complete.
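The "waiting for a specific service" use case can be sketched as below. The init container loops until a DNS lookup for the service succeeds, and only then does the main container start. The Pod name and the service name redis-service are assumptions for this example; busybox:1.28 is used because its image includes nslookup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wait-for-service-pod     # hypothetical Pod name
spec:
  initContainers:
  - name: wait-for-redis
    image: busybox:1.28
    # Retry every 2 seconds until the Service name resolves in cluster DNS
    command: ['sh', '-c', 'until nslookup redis-service; do echo waiting for redis-service; sleep 2; done']
  containers:
  - name: webapp
    image: my-webapp-image
    ports:
    - containerPort: 80
```

While the loop is running, kubectl get pod shows the Pod with a status such as Init:0/1.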

[Figure: inside a Pod, the init container runs first (restoring data if needed); then the main container runs, gated by its readiness check, alongside a sidecar container that performs backups.]

Example 1

apiVersion: v1
kind: Pod
metadata:
  name: my-webapp-pod
spec:
  initContainers:
  - name: redis-setup
    image: redis:latest
    command: ["sh", "-c"]
    args:
    - |
      redis-cli -h redis-service ping || exit 1
      redis-cli -h redis-service config set maxmemory 1gb
      redis-cli -h redis-service config set maxmemory-policy allkeys-lru
      redis-cli -h redis-service config set save ""
  containers:
  - name: webapp
    image: my-webapp-image
    ports:
    - containerPort: 80
    env:
    - name: REDIS_HOST
      value: redis-service
    - name: REDIS_PORT
      value: "6379"

The init container uses the Redis image and runs a shell command against the Redis server behind redis-service that performs the following tasks:
• Check if the Redis server is running by pinging it
• Set the maximum memory limit to 1 gigabyte
• Set the eviction policy to "allkeys-lru"
• Disable automatic snapshots by setting the save policy to an empty string

After completing its tasks, the init container exits and is terminated. The main container then starts running and serves the web application for the duration of the Pod's lifecycle.

Example 2

apiVersion: v1
kind: Pod
metadata:
  name: mysql-db
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: MYSQL_ROOT_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secrets
          key: password
  initContainers:
  - name: migrate-db
    image: mysql:5.7
    command: ['sh', '-c', 'mysql -h ${DB_HOST} -u root -p${DB_PASSWORD} ${DB_NAME} < /migrations/migrate.sql']
    env:
    - name: DB_HOST
      value: 127.0.0.1
    - name: DB_NAME
      value: mydb
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secrets
          key: password
    volumeMounts:
    - name: migrations
      mountPath: /migrations
  volumes:
  - name: migrations
    configMap:
      name: db-migrations

When the Pod is started, the migrate-db container runs first and performs the database migration. Once the migration is complete, the mysql container of the mysql-db Pod starts and runs the application, which now uses the migrated database.

Pod lifecycle

Here are the key phases in the lifecycle of a Pod in Kubernetes:

Pending -> Running -> Succeeded or Failed

Pending: When a pod is created, it enters the Pending phase. During this phase, the Kubernetes scheduler assigns the pod to a node and the container images are pulled from the container registry. The pod remains in the Pending phase until it has been scheduled to a node and its containers have been created.

Running: Once the pod is bound to a node and at least one of its containers is running (or starting or restarting), the pod enters the Running phase.

Succeeded: When all containers in a pod have completed their tasks successfully and will not be restarted, the pod enters the Succeeded phase and is considered to have completed its task.

Failed: When all containers in a pod have terminated and at least one of them failed or crashed, the pod enters the Failed phase.
MULTI-CONTAINER PODs: > Sidecar Container > Adapter > Ambassador

A sidecar container is a container that is deployed alongside a main container in a pod. The main container is typically an application that performs some specific function, while the sidecar container provides support or complementary functionality to the main container.

The idea behind the sidecar pattern is to keep the main container focused on a specific task or functionality, while delegating other tasks to the sidecar container. This allows for more modular and flexible deployment architectures, as the sidecar container image can be developed and updated independently of the main container image.

[Figure: a Pod with an app container and a sidecar container, each with its own lifecycle, sharing the Pod's storage and network.]

Although containers inside a pod share a common network and storage, each container has its own lifecycle: it is started, health-checked, and restarted individually.

Important use cases

> Logging and Monitoring: A sidecar container can be used to collect and forward logs and metrics from the main application container to a central monitoring system.
> Backup and Recovery: A sidecar container can be used to perform backup and recovery operations on the main application container.
> Service mesh: A sidecar container can be used to implement a service mesh such as Istio or Linkerd. A service mesh provides additional functionality for managing and securing communications between services running in Kubernetes.
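The logging use case can be sketched with two containers that share an emptyDir volume: the application writes a log file and the sidecar streams it to its own standard output, where a cluster-level log collector could pick it up. All names, the log path, and the busybox stand-in for a real application are assumptions made for this illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar     # hypothetical Pod name
spec:
  containers:
  - name: app
    image: busybox:1.28          # stand-in for a real application image
    # Appends one log line every 5 seconds to a file on the shared volume
    command: ['sh', '-c', 'while true; do echo "$(date) request handled" >> /var/log/app/app.log; sleep 5; done']
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-forwarder          # the sidecar
    image: busybox:1.28
    # Follows the log file and prints it to stdout; -F keeps retrying until the file exists
    command: ['sh', '-c', 'tail -n+1 -F /var/log/app/app.log']
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: app-logs
    emptyDir: {}
```

With this layout, kubectl logs app-with-log-sidecar -c log-forwarder shows the application's file-based log.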

One example of how a sidecar container can be used with a database service in a Kubernetes deployment:
The main container is running a database service and is exposing port 5432 for incoming database connections.
The sidecar container is configured to perform backups of the database.

[Figure: inside the Pod, db-container stores its data under /var/lib/postgresql/data; the sidecar container dumps the database to /backups and, every 24 hours, uploads the dump to Amazon S3. The sidecar is a process responsible for periodic backups of the database to an S3 bucket.]

apiVersion: v1
kind: Pod
metadata:
  name: db-pod
spec:
  containers:
  - name: db-container
    image: my-database-image
    env:
    - name: DATABASE_URL
      value: "postgresql://my-database-hostname:5432/my-database"
    ports:
    - containerPort: 5432
    volumeMounts:
    - name: db-data
      mountPath: /var/lib/postgresql/data
  - name: sidecar-container
    image: my-sidecar-image
    env:
    - name: BACKUP_LOCATION
      value: "s3://my-bucket/my-backups"
    - name: DATABASE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secrets
          key: database-password
    volumeMounts:
    - name: backup-data
      mountPath: /backups
    - name: db-secrets
      mountPath: /secrets
    command: ["/bin/sh", "-c"]
    args:
    - |
      while true; do
        pg_dump -U postgres -h localhost my-database | gzip > /backups/my-database-$(date +%Y-%m-%d-%H%M%S).sql.gz;
        s3cmd put /backups/my-database-*.sql.gz "$BACKUP_LOCATION";
        sleep 86400;
      done
  volumes:
  - name: db-data
    emptyDir: {}
  - name: backup-data
    emptyDir: {}
  - name: db-secrets
    secret:
      secretName: db-secrets

The sidecar container can periodically back up the database to a remote location to ensure data resiliency.

The sidecar container is running a script that periodically backs up the database and stores the backup files in the "/backups" directory. The script uses the "pg_dump" command to perform the backup and gzip to compress the backup file. The backup location is specified in the environment variable "BACKUP_LOCATION", which is set to an S3 bucket. The script runs in an infinite loop and sleeps for 24 hours between each backup.

The sidecar reaches the database over the Pod's shared network (localhost), and both containers are configured through volumes and environment variables. The main container is using a volume mount called "db-data" to store its data files, while the sidecar container is using a volume mount called "backup-data" to store its backup files.

Job & CronJobs

Job is a type of resource that allows you to create and manage a finite or batch process in your cluster. Jobs are commonly used for tasks that need to be run once or a few times, such as data processing, backups, or migrations.

A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (i.e., the Job) is complete.

apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing-job
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: data-processor
        image: data-processor:v1.4
        command: ["python", "process_data.py"]
      restartPolicy: Never

The backoffLimit specifies the number of times k8s should retry the Job if it fails before giving up.
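The "specified number of successful completions" is set with the 'completions' field, and 'parallelism' controls how many Pods may run at the same time. The sketch below reuses the data-processor container from the example; the Job name and the values 5 and 2 are arbitrary choices for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing-batch    # hypothetical Job name
spec:
  completions: 5                 # the Job is complete after 5 Pods finish successfully
  parallelism: 2                 # at most 2 Pods run at the same time
  backoffLimit: 3                # give up after 3 failed retries
  template:
    spec:
      containers:
      - name: data-processor
        image: data-processor:v1.4
        command: ["python", "process_data.py"]
      restartPolicy: Never
```

When 'completions' is omitted it defaults to 1, which is the run-once behaviour of the simpler manifest.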

CronJobs in Kubernetes are a way to schedule and automate the execution of Jobs on a recurring basis. A Job is a Kubernetes object that creates one or more Pods to perform
a specific task, and a CronJob is a higher-level abstraction that allows Jobs to be scheduled according to a specific time or interval, similar to the Unix cron utility.

Let's say you have a web application that periodically needs to generate reports based on user data. You could create a CronJob that runs a script to generate the report and then terminates when the report is complete.

kind: CronJob
apiVersion: batch/v1
metadata:
  name: report-generation-cronjob
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 100
      template:
        spec:
          containers:
          - name: report-generator
            image: my-django-app:v1
            env:
            - name: DJANGO_SETTINGS_MODULE
              value: myapp.settings
            command: ["python", "manage.py", "generate_report"]
          restartPolicy: Never

schedule: "0 * * * *" means the Job will run at the top of every hour.
ttlSecondsAfterFinished: 100 will delete the Job and its Pods 100 seconds after the Job finishes.

[Figure: activation time (for example schedule: "0 0 * * *") -> CronJob trigger -> jobTemplate -> Job -> Pod.]

CronJobs create Jobs which in turn create Pods to run the task.

Notice: By default, completed Jobs and Pods are retained after running. To automatically clean up completed Jobs, you can set `.spec.successfulJobsHistoryLimit` and `.spec.failedJobsHistoryLimit` on the CronJob.
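The two history-limit fields from the notice sit directly under the CronJob's 'spec'. The sketch below reuses the report-generation CronJob; the limits 2 and 1 are arbitrary values chosen for illustration.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generation-cronjob
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 2  # keep only the 2 most recent successful Jobs
  failedJobsHistoryLimit: 1      # keep only the most recent failed Job
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report-generator
            image: my-django-app:v1
            command: ["python", "manage.py", "generate_report"]
          restartPolicy: Never
```

Older Jobs beyond these limits are deleted together with their Pods, which keeps kubectl get jobs readable on clusters with frequent schedules.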
Rollout & Rollback
Rollout is the process of updating a Deployment or ReplicaSet to a new version of your application.
Rollback is performed by updating the container back to the previous version of the container image.

[Figure: creating the Deployment creates ReplicaSet-1 (Revision 1, three Pods with app: v1.17). A rollout to a new version creates ReplicaSet-2 (Revision 2, three Pods with app: v1.18). A rollback returns to Revision 1 (three Pods with app: v1.17).]

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-imp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80

The same Deployment with a 'strategy' section and the image updated to nginx:1.18:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-imp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.18
        ports:
        - containerPort: 80

After adding the 'strategy' section to the YAML file and applying it to the Kubernetes cluster using the 'kubectl apply' command, Kubernetes will start a RollingUpdate for the Deployment.

kubectl apply -f deployment.yaml

When you perform an upgrade to a deployment, Kubernetes creates a new replica set with the updated container image and configuration, and gradually replaces the pods managed by the old replica set with the pods managed by the new replica set.

During a rolling update, the 'maxUnavailable' and 'maxSurge' settings determine the rate at which replicas are replaced, ensuring that the application remains available and stable throughout the update process.

'maxUnavailable' specifies the maximum number of replicas that can be unavailable during the update process. This parameter ensures that the application always has a minimum number of replicas available, even during the update process. For example, with 3 replicas and 'maxUnavailable' set to 1, Kubernetes keeps at least 2 replicas available at every point of the update.

'maxSurge' specifies the maximum number of replicas that can be created above the desired replica count during the update process. This parameter ensures that the update process is efficient and does not overload the system with too many new replicas at once. For example, with 3 replicas and 'maxSurge' set to 1, Kubernetes never runs more than 4 replicas in total during the update.
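For the nginx-imp Deployment (replicas: 3, maxUnavailable: 1, maxSurge: 1), the bounds during a rolling update work out as shown below. The names minAvailable and maxTotal are labels introduced here for the arithmetic; they are not Kubernetes fields. The last two lines show the rounding Kubernetes applies when the settings are given as percentages (25% is the default for both).

```latex
\[
\text{minAvailable} = \text{replicas} - \text{maxUnavailable} = 3 - 1 = 2
\]
\[
\text{maxTotal} = \text{replicas} + \text{maxSurge} = 3 + 1 = 4
\]

% Percentage values: maxUnavailable rounds down, maxSurge rounds up
\[
\text{maxUnavailable} = \lfloor 25\% \times 3 \rfloor = \lfloor 0.75 \rfloor = 0
\qquad
\text{maxSurge} = \lceil 25\% \times 3 \rceil = \lceil 0.75 \rceil = 1
\]
```

So with the default percentages and 3 replicas, no existing Pod is taken down until one new Pod is ready.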

You can also run a rolling update in Kubernetes using the 'kubectl' command.

> To perform a rolling update using the 'kubectl' command, you need to have a Deployment defined in Kubernetes. The following command creates a Deployment with the 'nginx:1.17' image and three replicas:

kubectl create deployment nginx-imp --image=nginx:1.17 --replicas=3

> Use the 'set' command in 'kubectl' to update the image used by the Deployment. After executing the 'set' command, Kubernetes will start a rolling update for the 'nginx-imp' Deployment:

kubectl set image deployment/nginx-imp nginx=nginx:1.18

Pod names follow the pattern <deployment name>-<ReplicaSet id>-<pod id>.

kubectl get pod,rs
NAME                             READY   STATUS    RESTARTS   AGE
pod/nginx-imp-f32gt99mnj-ihd7t   1/1     Running   0          10m
…
pod/nginx-imp-f32gt99mnj-ki34f   1/1     Running   0          10m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-imp-f32gt99mnj   1         1         1       10m
replicaset.apps/nginx-imp-ht5g34kpz2   0         0         0       12m

> You can monitor the progress of the rolling update by running the following command:

kubectl rollout status deployment/nginx-imp

If you want to pause the rolling update at any time, you can use this command:

kubectl rollout pause deployment/nginx-imp

> If you want to undo the update and roll back to the previous version, you can use the following command:

kubectl rollout undo deployment/nginx-imp

After the rollback, the Pods are again managed by the other ReplicaSet:

kubectl get pod,rs
NAME                             READY   STATUS    RESTARTS   AGE
pod/nginx-imp-ht5g34kpz2-ihd7t   1/1     Running   0          10m
pod/nginx-imp-ht5g34kpz2-ki34f   1/1     Running   0          10m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-imp-f32gt99mnj   0         0         0       14m
replicaset.apps/nginx-imp-ht5g34kpz2   1         1         1       16m

To roll back to a specific revision instead of the previous one, add the --to-revision flag:

kubectl rollout undo deployment/nginx-imp --to-revision=2

> You can use the 'kubectl rollout history' command to view the revision history of a Deployment, including each revision number and its recorded change cause.

kubectl rollout history deployment nginx-imp
deployment.apps/nginx-imp
REVISION  CHANGE-CAUSE
1         kubectl create deployment nginx-imp --image=nginx:1.17 --replicas=3
2         kubectl set image deployment/nginx-imp nginx=nginx:1.18

The 'CHANGE-CAUSE' field in the 'kubectl rollout history' output is read from the 'kubernetes.io/change-cause' annotation on the Deployment. kubectl commands such as 'kubectl set' fill it in when they are run with the (deprecated) --record flag. This annotation can be useful for tracking changes and providing additional information about the update process.

To change the 'CHANGE-CAUSE' annotation for a Deployment in Kubernetes, you can use the 'kubectl annotate' command:

kubectl annotate deployment nginx-imp kubernetes.io/change-cause="updated to nginx 1.18" --overwrite

kubectl rollout history deployment nginx-imp
deployment.apps/nginx-imp
REVISION  CHANGE-CAUSE
1         kubectl create deployment nginx-imp --image=nginx:1.17 --replicas=3
2         updated to nginx 1.18

Some Deployment strategies to perform rollouts

> Rolling update: performed by gradually replacing instances of an old version of a container with instances of a new version (default deployment strategy).
> Recreate: deletes all the old Pods before creating new ones. This can result in some downtime for your application.
> Blue/Green: creates a new set of Pods running the updated version of your application alongside the old set of Pods running the previous version.
> Canary: updates a small percentage of Pods with the new version of your application, while the rest of the Pods continue to run the previous version.

[Figure: before and after deployment for each strategy. Rolling update and Recreate: V1.0 -> V2.0. Blue/Green: V1.0 live with V2.0 on standby -> V2.0 live with V1.0 on standby. Canary: V1.0 -> mostly V1.0 with a few V2.0 Pods.]
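Of the four strategies, only rolling update and Recreate are values of the Deployment's 'strategy.type' field; Blue/Green and Canary are patterns built from several Deployments and Services. Selecting Recreate for the nginx-imp Deployment could look like this sketch:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-imp
spec:
  replicas: 3
  strategy:
    type: Recreate               # all old Pods are deleted before new ones are created
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.18
        ports:
        - containerPort: 80
```

Recreate takes no 'rollingUpdate' parameters, so 'maxUnavailable' and 'maxSurge' must be left out when it is used.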
Self-Healing Application
Self-healing applications in Kubernetes are applications that can detect and recover from failures automatically without human intervention. Kubernetes provides several
mechanisms to enable self-healing, including probes, replica sets, and deployments. These components together ensure that the desired state of the application is maintained,
even in the face of failures, updates, or changes in the environment.

Probes play a vital role in ensuring the health and availability of pods and containers running in a Kubernetes cluster. By periodically checking the health of containers, Kubernetes can take appropriate actions such as restarting containers, marking pods as ready to receive traffic, or delaying traffic until an application inside a container has started successfully.

The main idea behind ReplicationControllers and Deployments in Kubernetes is to maintain a desired number of pod replicas running at any given time. In other words, they ensure that a particular pod (or set of pods) always remains up and running.

Kubernetes provides three main types of probes to check the health of Pods. The kubelet is responsible for running probes on containers to check their health.

Liveness probes: Kubernetes uses liveness probes to know when to restart a Container. For example, a liveness probe could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.

Readiness probes: Kubernetes uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

Startup probes: These probes let Kubernetes know when your application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making it useful for slow-starting containers.
[Figure: probe execution flow.
1. Pod scheduled: Pod condition is PodScheduled, Pod phase is Pending, container is Waiting.
2. Pod started: the kubelet waits initialDelaySeconds, then executes the startup probe every periodSeconds. A successful probe reply (successThreshold=1) starts the readiness and liveness probes. If the last allowed attempt fails, startup fails and the container is stopped.
3. Pod phase Running, container Running: the liveness and readiness probes each wait their own initialDelaySeconds and are then executed every periodSeconds. Until the readiness probe succeeds, the Pod condition is Not Ready (due to the readiness probe) and the Pod receives no traffic.
4. A liveness failure on the last allowed attempt restarts the container. A readiness failure on the last allowed attempt stops traffic to the Pod.
5. After enough consecutive successful replies (successThreshold), the Pod condition is Ready and the Pod is running and receiving traffic.]
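
The timing fields in the flow above (initialDelaySeconds, periodSeconds, successThreshold, failureThreshold) can be combined in one Pod. The manifest below is a minimal sketch, assuming a slow-starting application that serves /healthz on port 8080. The Pod name, the path, the port and all numbers are illustrative assumptions.

```yaml
# Sketch only: name, path, port and thresholds are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-pod
spec:
  containers:
  - name: my-container
    image: my-image
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30      # up to 30 x 10s = 300s allowed for startup
    livenessProbe:              # starts only after the startup probe succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3       # restart after 3 consecutive failures
    readinessProbe:             # starts only after the startup probe succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      successThreshold: 1
```

The startup probe gives the application a generous window to boot, while the liveness probe can keep a short failure budget for the steady state.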

The probes can be implemented in several ways



HTTP checks: Kubernetes sends an HTTP request to the specified path of your application. If the application responds with a success status code (200 - 399), the probe is successful. Otherwise, it's considered a failure.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

The Liveness Probe is configured to use an HTTP GET request to check the container's health. The request is sent to the path "/healthz" on port 8080, which is where the container exposes its health check endpoint. The "initialDelaySeconds" field indicates that k8s should wait 30 seconds before checking the container's health for the first time. The "periodSeconds" field indicates that Kubernetes should check the container's health every 10 seconds thereafter.

TCP checks: Kubernetes tries to establish a TCP connection to your application on the specified port. If it can establish a connection, the probe is successful. Otherwise, it's failed.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          failureThreshold: 3

We use the tcpSocket handler to check the container's health by trying to open a TCP connection to port 8080. If the connection is successful, the Liveness Probe is considered successful.

Exec checks: Kubernetes executes the specified command within your container. If the command returns an exit status of 0, the probe is successful. Otherwise, it's considered a failure.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - /usr/bin/custom-script.sh
          initialDelaySeconds: 30
          periodSeconds: 10

This probe runs a script inside the container. If the script terminates with 0 as its exit code, it means the container is running as expected.

Example: a Web Application with Readiness Probe

If a container fails the Readiness Probe check, it will be removed from the list of endpoints used by the service as a backend. This ensures that the service does not send requests to the container until it becomes ready to receive them again.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
          - name: web-container
            image: my-web-image
            ports:
            - containerPort: 8080
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 10
              periodSeconds: 5
              failureThreshold: 3

    apiVersion: v1
    kind: Service
    metadata:
      name: web-service
    spec:
      selector:
        app: web-app
      ports:
      - name: http
        port: 80
        targetPort: 8080
      type: ClusterIP

[Figure: the Service web-service (10.0.0.1) forwards traffic to three web-app Pods at 10.0.0.2, 10.0.0.3 and 10.0.0.4.]

    kubectl describe svc web-service
    Name:              web-service
    Namespace:         default
    Labels:            <none>
    Annotations:
    Selector:          app=web-app
    Type:              ClusterIP
    IP:                10.0.0.1
    Port:              http 80/TCP
    TargetPort:        8080/TCP
    Endpoints:         10.0.0.2:8080, 10.0.0.3:8080, 10.0.0.4:8080
    Session Affinity:  None
    Events:            <none>

kubectl describe pod … shows the conditions of a Pod that passes the readiness probe (left) and of one that fails it (right):

    Conditions:                    Conditions:
    Type              Status       Type              Status
    ----              ------       ----              ------
    Initialized       True         Initialized       True
    Ready             True         Ready             False
    ContainersReady   True         ContainersReady   True
    PodScheduled      True         PodScheduled      True

failureThreshold is a parameter that specifies how many consecutive failures are allowed before the container is considered to have failed the probe. If a container fails the readiness probe check three times in a row, the kubelet marks the Pod as not ready and it stops receiving traffic from the Service. The Pod is not restarted; only a failed liveness probe restarts a container. The failure is reported in the Pod's events:

    kubectl describe pod …
    Events
    readiness probe failed
Cluster maintenance
Node maintenance
Node maintenance in Kubernetes refers to the process of temporarily taking a node out of the cluster to perform maintenance tasks such as upgrading the operating system,
applying security patches, replacing hardware or performing other tasks that require the node to be offline. During this time, any workloads running on the node will be evicted
and rescheduled onto other nodes in the cluster to ensure high availability and minimal disruption to users.
Nodes need to be regularly updated and maintained to keep the cluster healthy.

[Figure: node maintenance on a cluster with one master and several worker nodes. (1) A new node is brought up. (2) kubeworker-1 is marked unschedulable. (3) Its pods are rescheduled onto the new node and the other nodes, and kubeworker-1 goes down. (4) Maintenance tasks are performed. (5) The node is made schedulable again.]

Steps to perform maintenance on a node in Kubernetes

Node maintenance steps: Add a Node -> Cordon -> Drain -> Perform maintenance tasks -> UnCordon -> Remove added Node [after updating all nodes]

    1 kubectl cordon kubeworker-1
    2 kubectl drain kubeworker-1
    3 Start node updating …..
    4 kubectl uncordon kubeworker-1

Add a Node (optional): To avoid any service disruptions during node maintenance, it's important to ensure that your Kubernetes cluster has sufficient resources and capacity to handle the workload of the evicted pods. If a node is added to the cluster, it increases the overall resources available for scheduling pods, reducing the chances of service disruptions. Not adding a replacement node may cause the cluster to become unready, especially when there are a large number of pods running on the node being taken down and insufficient resources available on the remaining nodes to allocate to those pods.

Cordon: When maintenance needs to be performed on a node, it should be cordoned as the first step. The cordon command marks a node as unschedulable. It prevents new pods from being scheduled on the node while allowing existing pods to continue running. By running kubectl cordon <node-name>, you indicate that the node is entering maintenance and should not receive any additional workload.

Drain: The drain command is used to gracefully evict pods from the node that is undergoing maintenance. It triggers the rescheduling of active pods onto other available nodes in the cluster. Running kubectl drain <node-name> initiates the process of moving pods off the node, ensuring that they are not abruptly terminated.

Perform maintenance tasks: Once the drain command completes and all the pods have been successfully rescheduled onto other nodes, you can perform the required maintenance tasks on the drained node. This may include updating the operating system, applying security patches, or any other necessary maintenance activities.

UnCordon: After the maintenance tasks are completed, the node can be brought back online and added back to the cluster. Kubernetes will automatically detect the node and begin scheduling pods on it again. It's important to note that when a node is added back to the cluster, Kubernetes will not automatically move all the evicted pods back to the node. Instead, the scheduler will treat the node like a new node and schedule new pods onto it based on the available resources and workload requirements.

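[Figure: node status through the maintenance steps. Before maintenance the node is Ready and runs pods. After Cordon it is Ready, Scheduling Disabled and still runs its pods. After Drain it is Ready, Scheduling Disabled and its pods have moved to other nodes. After UnCordon it is Ready again.]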
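
To make the "Scheduling Disabled" state concrete, the excerpt below sketches what a cordoned Node object looks like when read with kubectl get node kubeworker-1 -o yaml. Only the fields relevant to cordoning are shown; everything else a real Node object carries is omitted here.

```yaml
# Excerpt of a cordoned Node object; unrelated fields are omitted.
apiVersion: v1
kind: Node
metadata:
  name: kubeworker-1
spec:
  unschedulable: true                       # set by kubectl cordon, cleared by kubectl uncordon
  taints:
  - key: node.kubernetes.io/unschedulable   # added automatically while the node is unschedulable
    effect: NoSchedule
```

Existing pods keep running because the NoSchedule effect only affects new scheduling decisions; it is kubectl drain that evicts them.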

Reserving resources for the operating system and the kubelet in Kubernetes is crucial for maintaining stability

Kubernetes nodes can encounter resource starvation issues when pods consume all available capacity on a node, resulting in an insufficient allocation of resources for critical system daemons
and processes that drive the functioning of the operating system and Kubernetes infrastructure. This imbalance can subsequently lead to cluster instability and performance degradation.
Configuring kubelet resource reserves is a good way to prevent resource starvation issues on Kubernetes nodes.
Here are some ways kube and system resource reserves can help:

kube-reserved: This reserves resources for Kubernetes system daemons like kubelet, container runtime, node problem detector, etc. Prevents starvation of critical components.

system-reserved: Reserves resources for the underlying node's kernel and system services. Leaves room for OS processes.

eviction-hard: The kubelet will evict pods when available resources drop below this threshold to maintain reserves.

Allocatable: The amount of compute resources that are available for pods.

[Figure: node capacity divided into four parts: Eviction Thresholds, Allocatable (Pods), Kube Reserved (Kubelet) and System Reserved (Operating system).]

To configure these reserves, you can set flags on the kubelet service like:

    --kube-reserved=cpu=500m,memory=1Gi
    --system-reserved=cpu=1,memory=2Gi
    --eviction-hard=memory.available<500Mi
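
The same reserves can also be expressed in the kubelet configuration file instead of command-line flags. The fragment below is a minimal sketch using the values from the flags above; the 4 CPU / 16Gi node in the comments is a hypothetical example used only to show the arithmetic.

```yaml
# Sketch of a kubelet configuration file carrying the same reserves as the flags.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 500m
  memory: 1Gi
systemReserved:
  cpu: "1"
  memory: 2Gi
evictionHard:
  memory.available: "500Mi"
# Allocatable = node capacity - kubeReserved - systemReserved - evictionHard
# Hypothetical node with 4 CPU and 16Gi of memory:
#   CPU:    4 - 0.5 - 1               = 2.5 CPU for pods
#   Memory: 16Gi - 1Gi - 2Gi - 500Mi  = 12812Mi for pods
```

On kubeadm-provisioned nodes this file usually lives at /var/lib/kubelet/config.yaml, and the kubelet has to be restarted for changes to take effect.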

SPOF
Single Point of Failure (SPOF) refers to a component or resource that, if it fails, can cause a complete or partial outage of the entire system. This means that the failure of a single
component can result in the unavailability or degraded performance of the overall Kubernetes cluster. Identifying and mitigating SPOFs is crucial for ensuring high availability and
reliability in a Kubernetes environment. Here are some recommendations for ensuring the minimum amount of SPOFs for critical Kubernetes components:

Kubernetes Control Plane - Need at least 3 master nodes spread across availability zones. This ensures high availability of API server and controller manager.

etcd - For production, need at least 3 etcd instances, 5 for better redundancy. Should be co-located with control plane nodes.

Worker Nodes - No specific minimum, but have at least 3 nodes in a cluster and spread them across zones.

Load Balancers - Front load balancers with at least 2 instances or use external LB services.

Ingress Controllers - Need 2+ ingress controllers like Nginx for redundancy. Configure with a load balancer.

Data Storage - Use cluster-wide storage like GlusterFS, Rook, OpenEBS with replication.

Cluster Networking - Should have high availability at the network level - multiple switches, routers etc. Avoid SPOF in networking.
Cluster upgrade
It's important to keep k8s components up-to-date with the latest stable version to ensure that the cluster is secure and stable. Here are several methods for upgrading a k8s cluster:

Kubeadm: Kubeadm is a popular tool for bootstrapping and managing Kubernetes clusters, particularly for self-provisioned clusters. Kubeadm provides commands like `kubeadm upgrade plan` and `kubeadm upgrade apply` to systematically upgrade the control plane and worker nodes. It simplifies the process of upgrading kubeadm-provisioned clusters.

Kubernetes Tools: Various Kubernetes deployment tools such as Kops, Kubespray, Rancher, and others provide their own mechanisms for cluster upgrades. These tools typically offer automation and specific commands for upgrading the cluster. For example, Kops provides the `kops upgrade cluster` and `kops rolling-update` commands to handle the upgrade process.

Cloud Provider Upgrades: Managed Kubernetes services offered by cloud providers, such as Amazon EKS, Azure AKS, and Google GKE, often handle control plane upgrades transparently. The cloud provider automatically manages the upgrade process, including the control plane components. As a user, you only need to update the node machine images to the desired version.

Blue-Green Deployment: The blue-green deployment approach involves creating a parallel "green" cluster with the desired version while the existing "blue" cluster is still running. Once the green cluster is ready, you switch traffic over to it, ensuring minimal downtime. After verifying the green cluster's stability, you can delete the old blue cluster. This method allows for a smooth transition and rollback option if any issues arise.

Kubernetes releases its versions based on semantic versioning. In V 1.25.3, 1 is the MAJOR version, 25 is the MINOR version (features and functionalities) and 3 is the PATCH version (bug fixes).

The Kubernetes project maintains the most recent three minor versions and provides patches for security and bug fixes during that time. Since Kubernetes 1.19, each minor version receives patches for approximately 1 year; older versions received them for about 9 months.

[Figure: release timeline V 1.22 - V 1.23 - V 1.24 - V 1.25 - V 1.26, running from unsupported (oldest) through supported to latest (newest).]

When updating Kubernetes it is generally recommended to update only one minor version at a time. Minor version updates are meant to be backwards compatible, so going from 1.x to 1.x+1 should work smoothly.

For production Kubernetes clusters, the general recommendation is to stay within 1 minor version of the latest stable Kubernetes release.

The maximum amount of difference that can exist between k8s components:
• Control plane components: 0 versions (identical)
• kubelet: Up to 2 minor versions behind the API server
• kubectl: Within 1 minor version (older or newer) of the API server
• etcd: Up to 1 minor version behind API server

How to Upgrade Kubernetes Cluster Using Kubeadm?

Step 1: Prepare for the Upgrade

Before upgrading, it is important to review the release notes and documentation for the target Kubernetes version. Check for any specific requirements or considerations.

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

It is recommended to perform upgrades on a test cluster before upgrading a production cluster to ensure that the process goes smoothly and without any issues.

Back up any critical data and configurations, including etcd data, in case you need to roll back the upgrade.

    ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
      --endpoints=$ENDPOINTS \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key

Determine which version to upgrade to. My current version is 1.25.3 and we will be upgrading it to one higher minor version, i.e. 1.26.7.

    # Find the latest 1.26 version in the list.
    # It should look like 1.26.x-00, where x is the latest patch

Step 2: Upgrade Control Plane Nodes

Upgrade the control plane components (API server, controller manager, and scheduler) and etcd (if applicable) on each control plane node one by one. Typically, this involves running a series of commands with `kubeadm` to upgrade the control plane components.

A. Drain the control plane node

    kubectl drain <control-plane-node-name> --ignore-daemonsets

B. Upgrade kubeadm

    sudo apt-mark unhold kubeadm
    sudo apt-get update && sudo apt-get install -y kubeadm=1.26.7-00
    sudo apt-mark hold kubeadm

"Skip this step if you want to update to the latest patch in this minor version."

The `apt-mark hold` command pins a package, so that regular operating system updates can be applied without changing the version of that package.

C. Plan the upgrade

    sudo kubeadm upgrade plan

This analyzes the current state of the cluster and generates a plan for upgrading the control plane components to a newer version of Kubernetes.

    Upgrade to the latest version in the v1. series:
    COMPONENT                 CURRENT   TARGET
    kube-apiserver            v1.25.3   v1.26.7
    kube-controller-manager   v1.25.3   v1.26.7
    kube-scheduler            v1.25.3   v1.26.7
    kube-proxy                v1.25.3   v1.26.7
    CoreDNS                   v1.9.1    v1.9.3
    etcd                      3.5.4-0   3.5.6-0
    You can now apply the upgrade by executing the following command:
    kubeadm upgrade apply v1.26.7

D. Perform the upgrade

    sudo kubeadm upgrade apply v1.26.7

E. Upgrade kubelet and kubectl

    sudo apt-mark unhold kubelet kubectl
    sudo apt-get update && sudo apt-get install -y kubelet=1.26.7-00 kubectl=1.26.7-00
    sudo apt-mark hold kubelet kubectl

F. Restart kubelet and Uncordon the node

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
    kubectl uncordon <control-plane-node-name>

Step 3: Upgrade Worker Nodes

Upgrade the worker nodes one by one. This can be done by draining and cordoning each node, upgrading the necessary components, and then uncordoning the node. The upgrade process for worker nodes typically involves upgrading the kubelet, kube-proxy, and any other relevant components.

A. Drain the worker node

    kubectl drain <worker-node-name> --ignore-daemonsets

B. Upgrade kubeadm

    apt-mark unhold kubeadm && \
    apt-get update && apt-get install -y kubeadm=1.26.7-00 && \
    apt-mark hold kubeadm

C. Upgrade the k8s configuration

    sudo kubeadm upgrade node

D. Upgrade kubelet and kubectl

    apt-mark unhold kubelet kubectl && \
    apt-get update && \
    apt-get install -y kubelet=1.26.7-00 && \
    apt-get install -y kubectl=1.26.7-00 && \
    apt-mark hold kubelet kubectl

E. Restart kubelet and Uncordon the node

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
    kubectl uncordon <worker-node-name>

Worker nodes upgrade strategies

"All at once": In this strategy, all worker nodes are upgraded at the same time. This approach can be faster than other strategies, but it also carries the highest risk of causing downtime if something goes wrong during the upgrade.

"+1/-1": This strategy involves upgrading one worker node at a time, starting with adding a new node with the updated Kubernetes version, followed by removing an old node with the old version. This process is repeated until all worker nodes have been upgraded. This strategy minimizes the risk of downtime while still allowing for a relatively quick upgrade.

"-1/+1": This strategy is similar to "+1/-1", but it involves removing an old node first and then adding a new node with the updated Kubernetes version. This strategy carries a slightly higher risk of downtime because there may be fewer worker nodes available during the upgrade process, which could result in an overload on the remaining nodes and potentially cause them to become not ready.
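
Whichever strategy is used, every node is drained before it is upgraded, and kubectl drain evicts pods through the eviction API, which honours PodDisruptionBudgets. The manifest below is a minimal sketch of such a budget. The object name, the minAvailable value and the app: web-app label (borrowed from the readiness probe example) are assumptions for illustration.

```yaml
# Sketch only: name, label and minAvailable are illustrative assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2          # an eviction is refused if it would leave fewer than 2 ready pods
  selector:
    matchLabels:
      app: web-app
```

With this budget in place, a drain waits and retries instead of evicting a pod that would take the application below two available replicas, which reduces the overload risk described for the "-1/+1" strategy.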

Step 4: Verify Cluster Health

After upgrading all the control plane and worker nodes, you should verify the health of the cluster. Check the status of the control plane components using commands like `kubectl get nodes` and `kubectl get pods -n kube-system`.

Step 5: Update Kubernetes Objects

Some Kubernetes objects, such as Deployments or StatefulSets, may need to be updated to take advantage of new features or changes in the upgraded version.
Backup & Restore Methods
It's important to regularly back up to ensure that your k8s cluster can be easily restored in the event of a failure or data loss. Additionally, it's important to
test your backup and restore processes to ensure that they are working properly and that you can recover from any issues that may arise.
When designing a backup strategy for a Kubernetes cluster, it's crucial to back up both the application data and the cluster configuration.

Cluster configuration
Cluster configuration includes all the Kubernetes objects and resources that configure your cluster and applications.

etcd data: The cluster state and metadata in Kubernetes are stored in etcd. To ensure cluster recovery, it's crucial to back up the etcd data. This can be achieved either by taking periodic snapshots of the etcd database or by implementing a backup solution specifically designed for etcd, such as etcdctl or Velero.

Kubernetes manifests: includes all the Kubernetes objects and resources that configure your cluster and applications. This includes things like deployments, services, configmaps, etc. These resources are usually defined as code, for example in YAML or JSON files. Because they are code, a good practice is to store them in a version control system like Git. This gives you a history of changes and allows you to revert to a previous state if something goes wrong.

You can back up Kubernetes resources using the etcdctl command-line tool

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: etcd-backup
    spec:
      schedule: "0 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: etcd-backup
                image: quay.io/coreos/etcd:v3.5.0
                command:
                - /bin/sh
                - -c
                - |
                  ETCDCTL_API=3 etcdctl snapshot save /backup/k8s/etcd-snapshot.db \
                    --endpoints=<ETCD-endpoints> \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key
                volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                - name: backup
                  mountPath: /backup
              restartPolicy: OnFailure
              volumes:
              - name: etcd-certs
                secret:
                  secretName: etcd-certs
              - name: backup
                persistentVolumeClaim:
                  claimName: backup-pvc

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: backup-pvc
    spec:
      storageClassName: <storage-class>
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi

This manifest sets up a CronJob that runs an etcd backup job every hour, using the etcdctl command-line tool inside a container to create a snapshot of the etcd database and save it to the specified path. It mounts the etcd certificates and a PersistentVolumeClaim for storing the backup. The CronJob uses the batch/v1 API, because batch/v1beta1 is no longer served as of Kubernetes 1.25, and sets a restartPolicy, which Job pods require.

You can back up Kubernetes resources using Velero

Once Velero is installed, you can create a backup by running the following command:

    velero backup create <backup-name>

By default, Velero will back up all resources in all namespaces. If you want to back up only certain namespaces or resources, you can specify them with the --include-namespaces and --include-resources flags, respectively:

    velero backup create <backup-name> --include-namespaces my-namespace \
      --include-resources deployments,pods

[Figure: Velero backup workflow.
1. velero backup create: the Velero client makes a call to the k8s API server to create a Backup object (a custom resource).
2. The BackupController, which watches the API server, notices the new Backup object and performs validation.
3. The BackupController begins the backup process. It collects the data to back up by querying the API server for resources.
4. The BackupController makes a call to the object storage service to upload the backup file.]

To restore a backup, run the following command:

    velero restore create --from-backup <backup-name>

By default, Velero will restore all resources in the backup to their original namespaces. If you want to restore only certain namespaces or resources, you can specify them with the --include-namespaces and --include-resources flags, respectively:

    velero restore create --from-backup <backup-name> --include-namespaces my-namespace \
      --include-resources deployments,pods

You can also automate the backup process with Velero

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: velero-backup
    spec:
      schedule: "0 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: velero
                image: velero/velero:v1.7.0
                command:
                - /velero
                args:
                - backup
                - create
                - my-backup
                volumeMounts:
                - name: cloud-credentials
                  mountPath: /credentials
                env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: cloud-credentials
                      key: aws_access_key_id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: cloud-credentials
                      key: aws_secret_access_key
              restartPolicy: OnFailure
              volumes:
              - name: cloud-credentials
                secret:
                  secretName: cloud-credentials
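
Velero also ships its own Schedule resource, which may be simpler than wrapping the CLI in a CronJob because each run gets a unique, timestamped backup name. The manifest below is a minimal sketch under stated assumptions: Velero is installed in the velero namespace, and the schedule name, the namespace filter and the ttl are illustrative values.

```yaml
# Sketch only: name, included namespace/resources and ttl are illustrative assumptions.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: hourly-backup
  namespace: velero          # assumed: the namespace Velero is installed in
spec:
  schedule: "0 * * * *"      # hourly, the same cron expression used by the CronJobs above
  template:                  # a regular Backup spec, applied on every run
    includedNamespaces:
    - my-namespace
    includedResources:
    - deployments
    - pods
    ttl: 720h0m0s            # keep each backup for 30 days
```

The same object can be created from the command line with velero schedule create, passing the cron expression through the --schedule flag.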

To restore an etcd backup using etcdctl

1 Ensure that the Kubernetes API server is stopped

    sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml another-path

Or

    sudo systemctl stop kube-apiserver

2 Change the path of the etcd data directory to /var/lib/etcd-from-backup/. To do this, edit the manifest file /etc/kubernetes/manifests/etcd.yaml and update the relevant volume and hostPath specifications

    volumes:
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd-from-backup/

3 Use etcdctl to restore the backup

    etcdctl snapshot restore /backup/k8s/etcd-snapshot.db \
      --data-dir=/var/lib/etcd-from-backup \
      --initial-cluster=kubemaster-1=https://192.168.100.11:2380 \
      --initial-advertise-peer-urls=https://192.168.100.11:2380 \
      --initial-cluster-token=<ETCD-initial-cluster-token> \
      --name=kubemaster-1

For a cluster with several etcd members, --initial-cluster lists every member as name=peer-URL pairs separated by commas, for example etcd01=...,etcd02=http://<etcd02-ip>:2380

Application data
Refers to the actual data produced and managed by the applications running on your k8s cluster. This could include databases, user-generated content, logs, and anything else that your applications are producing or manipulating.

There are several strategies you can follow to back up this data:

Database Backups: If you're using a database in your application, it's likely that the database itself has backup functionality. For example, you can create a dump of a MySQL database or a snapshot of a MongoDB database.

Backup Sidecars: Another approach is to use a sidecar container in your pods specifically for managing backups. This container would be responsible for regularly creating backups and sending them to a remote location.

Volume Snapshots: Kubernetes volume snapshots provide a standardized way to create copies of the content of persistent volumes at a point in time, without creating new volumes.

[Figure: volume provisioning and snapshot flow.
Provisioning: a Pod mounts a Persistent Volume Claim, which requests a persistent volume through a Storage Class. The Storage Class requests a disk from the storage provider (CSI driver), a Persistent Volume is created, and the physical disk is associated with it.
Snapshot: a Volume Snapshot, whose source is the Persistent Volume Claim, requests a Volume Snapshot Content through a Volume Snapshot Class. The class requests a snapshot of the disk from the storage provider, a Volume Snapshot Content is created, and the snapshot of the physical disk is associated with it.]

To create a VolumeSnapshot in Kubernetes, follow the steps below

1. Ensure that you have the necessary prerequisites in place. The cluster must have the volume snapshot CRDs and the snapshot controller deployed on it for this to work, together with a CSI driver and storage class that support volume snapshots.

    kubectl get crds | grep snapshot.storage.k8s.io

2. Create a VolumeSnapshot of the desired PVC. Define a VolumeSnapshot object that references the PVC you want to snapshot.

3. Verify that the VolumeSnapshotContent was created. Check the status of the VolumeSnapshot to ensure it is created successfully.

    kubectl describe volumesnapshot <snapshot-name>

When you create a VolumeSnapshot object, it triggers the storage provider to create a snapshot of the underlying storage volume. The snapshot is represented by the VolumeSnapshotContent object.
When a VolumeSnapshot object is created, the VolumeSnapshotClass provisions a VolumeSnapshotContent to hold the actual snapshot data. What happens when the VolumeSnapshot object is deleted depends on the deletionPolicy of the class: with Delete, the VolumeSnapshotContent and the underlying snapshot are deleted as well; with Retain, they are kept, and you need to delete the corresponding VolumeSnapshotContent object yourself if you want to remove the snapshot data.

[Figure: PVC -> PV through the Storage Class; PVC -> VolumeSnapshot -> VolumeSnapshotContent through the VolumeSnapshotClass; restored PVC -> new PV provisioned from the snapshot data -> Deployment.]

The PVC to snapshot:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: csi-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: csi-hostpath-sc

The VolumeSnapshotClass:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-hostpath-sc
    driver: hostpath.csi.k8s.io
    deletionPolicy: Delete

The VolumeSnapshotClass defines the snapshotter/provisioner that will be used to take snapshots and parameters like the deletion policy, etc. It enables dynamic provisioning of snapshots, just like a StorageClass allows dynamic provisioning of volumes.

The VolumeSnapshot:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: new-snapshot-demo
    spec:
      volumeSnapshotClassName: csi-hostpath-sc
      source:
        persistentVolumeClaimName: csi-pvc

To restore a snapshot, create a new PVC that names the VolumeSnapshot as its data source. This results in a new PV with data populated from the snapshot.

    kind: PersistentVolumeClaim
    metadata:
      name: csi-pvc-restored
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: csi-hostpath-sc
      dataSource:
        name: new-snapshot-demo
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io

The dataSource field indicates that this PVC is populated from the specified snapshot.

Create a Pod that mounts the restored PVC to validate the backup (Deployment excerpt):

    kind: Deployment
    metadata:
      name: my-csi-app
    spec:
      ...
          volumes:
          - name: my-csi-volume
            persistentVolumeClaim:
              claimName: csi-pvc-restored
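
The deletionPolicy of a VolumeSnapshotClass decides what happens to the snapshot data when a VolumeSnapshot object is deleted: Delete removes the VolumeSnapshotContent and the underlying snapshot, while Retain keeps both. For backups that must outlive the VolumeSnapshot object, a class such as the following sketch could be used. The class name is a hypothetical choice; the driver is the hostpath CSI driver already used in this example.

```yaml
# Sketch only: the class name is a hypothetical choice.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass-retain
driver: hostpath.csi.k8s.io
deletionPolicy: Retain       # keep VolumeSnapshotContent and the snapshot when the VolumeSnapshot is deleted
```

A VolumeSnapshot selects this class through its volumeSnapshotClassName field; retained snapshot data then has to be cleaned up by deleting the VolumeSnapshotContent object explicitly.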
Security
Kubernetes uses a combination of secure network channels, authentication and authorization mechanisms, network policies, and container security features to ensure that all
communication within the cluster is authenticated, encrypted, and secure. These mechanisms help to protect the cluster against unauthorized access, data breaches, and other
security threats, and provide a reliable and secure platform for deploying and managing containerized applications.

Secure network channels: Kubernetes uses secure network channels to ensure that all communication within the cluster is encrypted and secure. These channels are established using
Transport Layer Security (TLS) certificates, which provide a secure way to authenticate the identity of different components and encrypt all data that is transmitted between them.

In Kubernetes, many of the components use mutual TLS (mTLS) authentication to communicate with each other. Each component has its own certificate (.crt) and private key (.key), which it uses to prove its identity and to encrypt traffic to other components.

The cluster's certificate authority (CA) issues the certificates used for authentication and encryption within the cluster. In a kubeadm cluster the CA is a root certificate and private key (ca.crt and ca.key) stored on the control plane node. This key pair signs the certificates of the different components and users, such as the API server, the kubelets, and cluster administrators.

/etc/kubernetes/ is a directory that contains Kubernetes configuration files, usually used for defining settings related to the k8s components.

arye@kubemaster-1:/etc/kubernetes$ ll
-rw------- 1 root root 5450 May 18 09:34 admin.conf
-rw------- 1 root root 5486 May 18 09:34 controller-manager.conf
-rw------- 1 root root 1886 May 18 09:35 kubelet.conf
drwxr-xr-x 2 root root 4096 May 18 10:30 manifests/
drwxr-xr-x 3 root root 4096 May 18 09:34 pki/
-rw------- 1 root root 5438 May 18 09:34 scheduler.conf

These configuration files are essential for the proper functioning of the various k8s components. They contain settings such as the API server address, authentication and authorization information, and other component-specific configurations.

admin.conf: This file contains the configuration for the Kubernetes cluster administrator, holding the necessary credentials and cluster information to interact with the cluster using the kubectl command-line tool.

The /etc/kubernetes/pki directory is used by Kubernetes to store the public key infrastructure (PKI) materials, such as certificates and keys, that are used to secure communication between the different components of the Kubernetes cluster.

arye@kubemaster-1:/etc/kubernetes/pki$ ll
-rw-r--r-- 1 root root 1090 May 18 09:34 apiserver-etcd-client.crt
-rw------- 1 root root 1679 May 18 09:34 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1099 May 18 09:34 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 May 18 09:34 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1229 May 18 09:34 apiserver.crt
-rw------- 1 root root 1679 May 18 09:34 apiserver.key
-rw-r--r-- 1 root root 1025 May 18 09:34 ca.crt
-rw------- 1 root root 1679 May 18 09:34 ca.key
drwxr-xr-x 2 root root 4096 May 18 09:34 etcd/
-rw-r--r-- 1 root root 1038 May 18 09:34 front-proxy-ca.crt
-rw------- 1 root root 1679 May 18 09:34 front-proxy-ca.key
-rw-r--r-- 1 root root 1058 May 18 09:34 front-proxy-client.crt
-rw------- 1 root root 1675 May 18 09:34 front-proxy-client.key
-rw------- 1 root root 1675 May 18 09:34 sa.key
-rw------- 1 root root 451 May 18 09:34 sa.pub

Certificate (public key): *.crt, *.pem
Private key: *.key, *.key.pem

All requests and responses between different components in the cluster are routed through the API server, which ensures that all communication is authenticated and secure.

CERTIFICATE AUTHORITY (CA): ca.crt, ca.key
These certificates are issued by the Certificate Authority (CA):
Kube-API Server: apiserver.crt, apiserver.key
Kube-API Server -> ETCD: apiserver-etcd-client.crt, apiserver-etcd-client.key
Kube-API Server -> Kubelet: apiserver-kubelet-client.crt, apiserver-kubelet-client.key
ETCD: etcdserver.crt, etcdserver.key
Kubectl: admin.crt, admin.key
Kube-proxy: kube-proxy.crt, kube-proxy.key
Kube-Controller Manager: controller-manager.crt, controller-manager.key
Kube-Scheduler: scheduler.crt, scheduler.key
Kubelet: kubelet.crt, kubelet.key, kubelet-client.crt, kubelet-client.key

The kubelet.crt and kubelet.key files are typically located in the /var/lib/kubelet/pki directory on the node where the kubelet is running.

The scheduler.crt and scheduler.key files are not found as separate files in the /etc/kubernetes/pki directory because kubeadm embeds the scheduler's client certificate and key in /etc/kubernetes/scheduler.conf. The scheduler uses this client certificate to authenticate to the API server, and verifies the API server through the cluster CA.

Certificates generated by kubeadm expire after 1 year and will need to be renewed. kubeadm provides a simple command to renew all certificates.

To renew all the certificates in a k8s cluster with kubeadm, you can use the kubeadm certs renew command with the all option.

It is advisable to back up your certificates and configuration files before executing the command:
/etc/kubernetes/pki/*.*   /etc/kubernetes/*.conf   ~/.kube/config

root@kubemaster-1 ( /etc/kubernetes): kubeadm certs renew all

This will renew the following certificates:
- etcd server and peer certificates
- API server certificate
- Front proxy client certificate
- Controller manager client certificate
- Scheduler client certificate
Note: kubelet.conf is not included in the list above

After running the command you should restart the control plane Pods. Static Pods are managed by the local kubelet and not by the API Server, thus kubectl cannot be used to delete and restart them. To restart a static Pod you can temporarily remove its manifest file from /etc/kubernetes/manifests/

kubeadm also renews all the certificates during a control plane upgrade. This feature is intended to address straightforward scenarios. If you don't have specific requirements regarding certificate renewal and regularly perform Kubernetes version upgrades (with less than a year between each upgrade), kubeadm will handle the process to ensure your cluster remains up to date and reasonably secure.

You can check the expiration dates of the certificates:

arye@kubemaster-1:~$ sudo kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 May 18, 2023 09:34 UTC   341d                                    no
apiserver                  May 18, 2023 09:34 UTC   341d            ca                      no
apiserver-etcd-client      May 18, 2023 09:34 UTC   341d            etcd-ca                 no
apiserver-kubelet-client   May 18, 2023 09:34 UTC   341d            ca                      no
controller-manager.conf    May 18, 2023 09:34 UTC   341d                                    no
etcd-healthcheck-client    May 18, 2023 09:34 UTC   341d            etcd-ca                 no
etcd-peer                  May 18, 2023 09:34 UTC   341d            etcd-ca                 no
etcd-server                May 18, 2023 09:34 UTC   341d            etcd-ca                 no
front-proxy-client         May 18, 2023 09:34 UTC   341d            front-proxy-ca          no
scheduler.conf             May 18, 2023 09:34 UTC   341d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      May 15, 2032 09:34 UTC   9y              no
etcd-ca                 May 15, 2032 09:34 UTC   9y              no
front-proxy-ca          May 15, 2032 09:34 UTC   9y              no

You can use the following command to display the details of a certificate file in a human-readable format:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
Authentication & Authorization
When a client (such as kubectl or a custom application) sends an API request to Kubernetes, the request goes through several steps before it is processed and a response is sent back to the client.

External request (kubectl, curl, through a LoadBalancer) or internal request (Pod, Svc, Kubelet)
  -> Kube-API: Authentication -> Authorization -> Admission control -> etcd (k8s objects)

Who can access? Authentication
What can they do? Authorization

Authentication plugins:
• Static Token file
• OpenID Connect
• X.509 certificates
• Authentication proxy
• Webhook

Role-Based Access Control (RBAC) is a common authorization method in Kubernetes. A RoleBinding in a namespace (Dev in the diagram) answers two questions:
Who? A user (Arye, Mohsen, Harold), a group (Admin), or a Service-Account.
Which Roles? A Role, which in turn defines:
  Which resources? StatefulSets, Pods, ConfigMaps, Services, PVCs, Deployments
  What actions (verbs)? list, get, watch, create, delete, patch, update

If one authorization plugin does not authorize the request, the API server tries the next plugin until it finds one that can authorize the request; if none does, the request is rejected.

Authorization can prevent unauthorized access to resources in the cluster, but it cannot prevent the creation or modification of resources that do not comply with the cluster's policies. Admission control helps ensure that only valid and compliant resources are created or updated in the cluster, which can prevent misconfigurations, security vulnerabilities, and other issues.

List of some built-in admission control plugins available in Kubernetes:
LimitRanger (enforces resource limits on pods and containers)
NamespaceExists (rejects requests in namespaces that don't exist)
ValidatingAdmissionWebhook (calls a webhook to validate the object)
MutatingAdmissionWebhook (calls a webhook to mutate the object)
Priority (enforces pod priority based on PriorityClass)
ResourceQuota (enforces resource quotas for a namespace)
PodPreset (injects configuration data into pods based on pod presets; removed in Kubernetes v1.20)
PodTolerationRestriction (enforces restrictions on tolerations for pods)
SecurityContextDeny (denies pods with certain security context settings)
RuntimeClass (enforces the use of a specific runtime class for pods)

You can also use Gatekeeper. It allows you to define and enforce custom policies that restrict the creation and modification of resources in the cluster.

Authentication: Kubernetes uses authentication mechanisms to verify the identity of users and components trying to access the cluster. This ensures that only users and components with a verified identity can access the cluster.

The most significant authentication methods in k8s:

Service Account authentication (for processes, SA credential): a method that uses Kubernetes Service Accounts to authenticate clients. Each Service Account in the cluster has its own token that is used to authenticate clients. This method is commonly used when a cluster has a large number of clients or when automated processes need to access the Kubernetes API.

X.509 client certificates authentication (for users, x509 credential): a method that uses digital certificates to authenticate clients. Each client has its own certificate and private key that are used to authenticate and encrypt communication with other components. This method is commonly used when a cluster has a small number of clients or when strong authentication is required.

Webhook authentication: a method that uses an external HTTP service to authenticate clients. This method is commonly used when a cluster needs to integrate with a custom authentication system. The external HTTP service receives authentication requests from the Kubernetes API server and returns a response indicating whether the client is authenticated or not.

Example request: kubectl get --raw /api/v1/namespaces/kube-system/pods | jq .
1 Authentication: Who are you? A request without a valid credential is rejected with { status: Failure, code: 401 }.
2 Authorization: What action on which resources can an authenticated user execute? A request with a valid credential but without the required permission is rejected with { status: Failure, code: 403 }.

Authorization: Kubernetes uses authorization mechanisms to determine what actions a user or component can perform within the cluster. This ensures that users and components have access only to the resources they are authorized to access.

Role-Based Access Control (RBAC): RBAC is a security mechanism in Kubernetes that allows you to control access to resources based on the user's role and permissions. In RBAC, you define roles and cluster roles that specify a set of permissions, such as get, list, or delete, for a particular set of resources. You then create role bindings and cluster role bindings that associate roles and cluster roles with users, groups, or service accounts.

Subject (Who?): a User (Arye, Mohsen, Harold), a group (Admin), or a Service-Account.
ClusterRoleBinding (cluster-wide): Who? Which ClusterRoles?
RoleBinding (inside one namespace, such as Webapp, dev, or Team-A): Who? Which Roles or ClusterRoles?
Role and ClusterRole: Which resources? What actions?
Resources: Pods, Services, Deployments, StatefulSet, ConfigMaps, PVCs, …
Verbs: list, get, watch, create, delete, patch, update. Verbs define the types of actions that a user or service account can perform on a specific resource.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-list-permission
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-list-permission-binding
  namespace: dev
subjects:
- kind: User
  name: john
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-list-permission
  apiGroup: rbac.authorization.k8s.io

Role: A Role is a set of permissions that define what actions are allowed on specific resources within a namespace.

RoleBinding: A RoleBinding is a mechanism for binding a role to a ServiceAccount, a user, or a group of users within a namespace. Role bindings are used to grant specific permissions to users or groups of users by assigning them to a particular role.

ClusterRole: A ClusterRole is similar to a Role, but it applies to the entire cluster instead of a single namespace. ClusterRoles can be used to grant permissions for cluster-scoped resources (e.g. Nodes) or for resources in all namespaces.

ClusterRoleBinding: A ClusterRoleBinding is cluster-scoped and applies to all namespaces.
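
A RoleBinding may also reference a ClusterRole, in which case the permissions of that ClusterRole apply only inside the namespace of the RoleBinding. A minimal sketch, assuming a hypothetical group named developers and a hypothetical binding name; view is one of the built-in read-only ClusterRoles:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-view        # hypothetical name
  namespace: dev               # permissions are limited to this namespace
subjects:
- kind: Group
  name: developers             # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole            # a RoleBinding may point at a ClusterRole
  name: view                   # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
```

This pattern lets one ClusterRole be defined once and reused in many namespaces without granting cluster-wide access.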

Roles and ClusterRoles can be either custom or built-in:
- Built-in Roles and ClusterRoles are predefined by k8s. These built-in roles are designed to provide a set of default permissions for managing Kubernetes resources.
- Custom Roles and ClusterRoles are created by users to define their own set of permissions for managing Kubernetes resources.

kubectl get clusterrole

Some built-in ClusterRoles:
- cluster-admin: This role is intended to be used by administrators who need full access to all resources in the cluster. It grants permissions to perform any action on any resource in any namespace.
- admin: This ClusterRole provides full access to manage resources in a specific namespace, including the ability to create, update, and delete resources.
- system:controller:*, system:node:*: These ClusterRoles provide permissions for k8s controllers and nodes to manage resources in the cluster.

kubectl describe clusterrole cluster-admin
Name:         cluster-admin
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  ---------  -----------------  --------------  -----
  *.*        []                 []              [*]
             [*]                []              [*]

The 'system:node' ClusterRole is used to define the set of permissions for nodes in the cluster. This ClusterRole is typically used to grant permissions to the kubelet to perform actions on various resources related to nodes, including nodes themselves, pods, and service accounts.

Admission control: Kubernetes uses an admission control mechanism for enforcing rules and policies on k8s resources before they are created or updated in the cluster. This can include validating the structure and content of resource manifests, applying default values, and enforcing constraints on resource usage. There are two types of admission control plugins:

Mutating admission plugins: These plugins can modify the request object, as well as validate it. They are executed before the validating admission plugins.

Validating admission plugins: These plugins validate the request object without modifying it. They can reject the request if it doesn't meet the required criteria.
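
As a hedged illustration of both behaviours, the LimitRanger plugin acts on LimitRange objects: it fills in defaults (a mutating step) and rejects containers that exceed the maximum (a validating step). The object name, namespace, and values below are hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits       # hypothetical name
  namespace: dev               # hypothetical namespace
spec:
  limits:
  - type: Container
    default:                   # limit applied when a container sets none (mutating step)
      cpu: 500m
      memory: 256Mi
    defaultRequest:            # request applied when a container sets none (mutating step)
      cpu: 100m
      memory: 128Mi
    max:                       # containers asking for more are rejected (validating step)
      cpu: "1"
      memory: 512Mi
```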
Service Account

Service accounts in Kubernetes are non-human accounts that provide a unique identity for system components and application pods. These accounts are namespaced objects managed within the Kubernetes API server. By default, each Kubernetes namespace includes a service account called "default", which has no special roles or privileges assigned to it. In versions of Kubernetes prior to 1.24, creating a service account also generated a long-lived token Secret for it automatically. From Kubernetes 1.24 onwards this automatic Secret generation has been discontinued: Pods receive short-lived tokens that the kubelet obtains through the TokenRequest API and mounts into the Pod, and a long-lived token is issued only if you create a Secret API object yourself and let the token controller populate it with a service account token.
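
A short-lived token can also be requested explicitly through a projected volume. The sketch below is illustrative only: the Pod name, image, mount path, and audience value are assumptions, not taken from the booklet.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-demo                    # hypothetical name
spec:
  serviceAccountName: default
  containers:
  - name: app
    image: busybox:1.36               # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: api-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
  volumes:
  - name: api-token
    projected:
      sources:
      - serviceAccountToken:
          path: api-token             # file name under mountPath
          expirationSeconds: 3600     # the kubelet rotates the token before it expires
          audience: api               # hypothetical audience value
```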
kubectl create namespace <namespace_name> -> automatically creates a default service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: <namespace_name>

Every Kubernetes namespace has a default service account named default once it has been created.

Diagram (namespace dev): one Pod runs with Service Account default (no serviceAccountName set), another Pod runs with serviceAccountName: monitoring-agent-sa.

If a pod is created without specifying a service account, it will use the default ServiceAccount. The default Service Account has limited permissions, but if you need to grant your pod more permissions, you can create a custom Service Account with the necessary roles and assign it to the pod.

Creating and Using a Service Account in a Kubernetes Pod

1 Create a service account

kubectl create serviceaccount monitoring-agent-sa

In Kubernetes v1.23 and earlier, when a Service Account is created, a token Secret is automatically generated and stored in the same namespace. This token Secret is used for authentication and authorization purposes. In Kubernetes v1.24 and later, this automatic token creation has been removed. Instead, there are alternative methods for token creation and management. Here are some options:

Secret API object: When you create a Secret of type kubernetes.io/service-account-token with the annotation kubernetes.io/service-account.name set to a ServiceAccount name, the token controller in k8s automatically populates the Secret with a service account token associated with the referenced ServiceAccount, so you don't need to manually generate or provide the token for the Secret.

apiVersion: v1
kind: Secret
metadata:
  name: monitoring-agent-sa-token
  annotations:
    kubernetes.io/service-account.name: monitoring-agent-sa
type: kubernetes.io/service-account-token

TokenRequest: TokenRequest is an API resource in k8s that allows you to request a token for a specific Service Account. It offers a way to dynamically generate short-lived tokens for authentication and authorization purposes. It is a subresource of the ServiceAccount (this is what kubectl create token calls), so the Service Account is identified by the request path rather than by a field in the object. The TokenRequest object has several important fields, with the audiences field being one of them. The audiences field specifies the intended recipient(s) of the requested token: a recipient must check that it is named in the token's audiences and reject the token otherwise. If no audience is given, the token is issued for the audience of the Kubernetes API server.

apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
  - <audience>

2 Grant permissions to the ServiceAccount

Create a ClusterRole that grants the necessary permissions for the SA, and create a ClusterRoleBinding that associates the Service Account with the ClusterRole. The code below creates a ClusterRole that grants permissions to retrieve and list information about pods and nodes in the k8s cluster using the "get", "list", and "watch" verbs.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-agent-role
rules:
- apiGroups: [""]
  resources: ["pods", "nodes"]
  verbs: ["get", "list", "watch"]

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-agent-role-binding
subjects:
- kind: ServiceAccount
  name: monitoring-agent-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: monitoring-agent-role
  apiGroup: rbac.authorization.k8s.io

To check the permissions of a service account in Kubernetes, execute the following command to list all the permissions granted to the monitoring-agent-sa Service Account in the default namespace:

kubectl auth can-i --list --as=system:serviceaccount:default:monitoring-agent-sa

The general form is kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<serviceaccount>. The --list flag is used to list all the actions and resources that the service account has access to, and the --as flag is used to specify the service account to check the permissions for.

3 Mount the service account token into a pod

When you specify the serviceAccountName field in the Pod spec, Kubernetes mounts the Service Account token as a volume in the Pod. The volume is mounted at /var/run/secrets/kubernetes.io/serviceaccount, and the Service Account token is stored in the token file inside this volume.

apiVersion: v1
kind: Pod
metadata:
  name: monitoring-agent
spec:
  serviceAccountName: monitoring-agent-sa
  containers:
  - name: monitoring-agent
    image: monitoring-agent-image
    args: ["--kubeconfig=/var/run/secrets/kubernetes.io/serviceaccount/token"]
    volumeMounts:
    - name: sa-token
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      readOnly: true
  volumes:
  - name: sa-token
    secret:
      secretName: monitoring-agent-sa-token

Diagram: ClusterRole (Which resources: pods, nodes. What action: get, list, watch) -> Bind -> Service Account monitoring-agent-sa -> Mount -> Pod (serviceAccountName: monitoring-agent-sa)

Using a Service Account to Access a Kubernetes Cluster with kubectl

1 Create & grant permissions to the service account

kubectl create serviceaccount sara
kubectl create role pod-list-permission --verb=get,list,watch --resource=pods --namespace=default
kubectl create rolebinding pod-list-permission-binding --role=pod-list-permission --serviceaccount=default:sara --namespace=default

The last command creates a role binding in the default namespace that binds the pod-list-permission role to the service account sara. The equivalent manifests:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-list-permission
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-list-permission-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: sara
  namespace: default
roleRef:
  kind: Role
  name: pod-list-permission
  apiGroup: rbac.authorization.k8s.io

Diagram: RoleBinding (Who? Service-Account sara. Which Roles? Role pod-list-permission) -> Role (Which resource? Pods. What actions? list, watch, get)

The "rules" section in a role specifies the permissions granted by the role. It is an array of rules, where each rule specifies the resources and operations that are allowed.

The "subjects" section specifies the user or group of users to which the role should be bound. A subject can be a user, a group, or a service account.

How to bind a role to multiple users? You can bind a role to multiple users by creating a role binding that specifies multiple users in the "subjects" section:

…
subjects:
- kind: User
  name: mohsen
  apiGroup: rbac.authorization.k8s.io
- kind: User
  name: arye
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
…

How to specify multiple rules in a Role? You can also specify multiple rules in the "rules" section of a role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-viewer
  namespace: my-namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get"]

2 Retrieve the Service Account token

You can retrieve the Service Account token or recreate it by running the following command:

kubectl -n default create token sara
eyJhbGciOiJSUzI1NiIsIm…Lz9APOb2rsWHr9HWA

3 Set the token as a credential in kubectl

To add the user sara to the .kube/config file, you would need to add the following configuration under the users section:

- name: sara
  user:
    token: eyJhbGciOiJSUzI1NiIsIm…Lz9APOb2rsWHr9HWA

After adding this configuration, you would then need to create a new context that uses the sara user and the k8s-cluster-1 cluster:

- context:
    cluster: k8s-cluster-1
    user: sara
  name: sara-k8s-cluster-1

Finally, set the current-context field to the newly created context name:

current-context: sara-k8s-cluster-1

The resulting .kube/config file:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBD…RVURS0tLS0tCg==
    server: https://127.0.0.1:42995
  name: k8s-cluster-1
contexts:
- context:
    cluster: k8s-cluster-1
    user: sara
  name: sara-k8s-cluster-1
- context:
    cluster: k8s-cluster-1
    user: arye
  name: arye@k8s-cluster-1
current-context: sara-k8s-cluster-1
kind: Config
preferences: {}
users:
- name: arye
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRQUR…S0tLS0tCg==
    client-key-data: LS0tLS1CRUdJTiBd0NibHFxS0t…LS0tCg==
- name: sara
  user:
    token: eyJhbGciOiJSUzI1NiIsIm…Lz9APOb2rsWHr9HWA

This configuration sets the authentication method for the user sara to use a bearer token (token field) instead of the client certificate and client key used by the arye user.

Using Service Accounts for authentication can be more secure than using user accounts because Service Accounts are objects managed by Kubernetes itself, and the tokens issued by kubectl create token are short-lived.
API Groups
In k8s, API groups are a way of organizing related resources and operations together. This allows for easier discovery and usage, and also helps to avoid naming conflicts between different resources. When k8s was first introduced, all the resources like Pod, Service, ReplicationController, etc., were part of a single group, the "core" group, and were accessed at the path /api/v1. As k8s evolved and more resources were added, it became clear that this single group was not scalable, so the concept of API groups was introduced.

Kubernetes uses a versioning scheme to facilitate the evolution of its API. There are three levels of API version:

Alpha: This is the first stage of the development of a new API. Alpha APIs may be unstable, change significantly after the initial release, and may not even be enabled in your clusters.

Beta: This is the second stage. Beta APIs are well-tested. However, they may still undergo changes, such as in the form of bug fixes or feature enhancements. Beta APIs introduced before Kubernetes v1.24 are enabled by default; new beta APIs added since v1.24 are disabled by default.

Stable: This is the final stage. Stable APIs appear in released software for many subsequent versions.

The version of an API group is represented by vXalphaY (e.g., v1alpha1), vXbetaY (e.g., v2beta2), and vX (e.g., v1) for alpha, beta, and stable versions, respectively.

Kubernetes API groups are divided into two categories:

Core API Group: The core API group, also referred to as the "v1" group, contains the essential resources that are fundamental to the functioning of a Kubernetes cluster. It includes resources such as Pods, Services, Namespaces, ConfigMaps, Secrets, and more. The core API group is accessed using the /api/v1 endpoint.
Resources: pods, services, configmaps, secrets, namespaces, PV, PVC, rc, nodes, endpoints, events, bindings, ...

Named API Groups: Named API groups are additional API groups introduced to extend the functionality of Kubernetes beyond the core resources. Each named API group focuses on specific features or functionalities and manages specialized resources related to those features. The named API groups are accessed using the /apis endpoint.

apps: This group contains resources related to running applications on Kubernetes. It includes Deployment, ReplicaSet, StatefulSet, and DaemonSet.
batch: This group includes resources for batch processing and job-like tasks. It includes Job and CronJob.
rbac.authorization.k8s.io: This group contains the Role, ClusterRole, RoleBinding, and ClusterRoleBinding resources for handling role-based access control (RBAC) in Kubernetes.
networking.k8s.io: This group contains resources related to networking in k8s, such as NetworkPolicy and Ingress.
storage.k8s.io: This group contains resources related to storage, such as StorageClass, VolumeAttachment, and CSINode.

New resources are accessed at the path /apis/{group}/{version}. For example, to access the Deployment resource, which is part of the apps group, you would use the path /apis/apps/v1/deployments.

You can list all available API groups and versions in your cluster by running kubectl api-versions
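
The group and version also show up in every manifest through the apiVersion field: core-group resources carry only the version, while named-group resources carry group/version. A minimal sketch with hypothetical object names and a placeholder image:

```yaml
# Core group: apiVersion is only the version, served under /api/v1
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config            # hypothetical name
data:
  greeting: hello
---
# Named group "batch": apiVersion is group/version, served under /apis/batch/v1
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job               # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36    # placeholder image
        command: ["echo", "hello"]
```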
Kubernetes APIs

/api (core group) -> /v1
/apis (named groups) -> /apps, /batch, /networking.k8s.io, /storage.k8s.io, /rbac.authorization.k8s.io, … -> version (/v1)
    Example: /apis/apps/v1 -> resources: /deployments, /replicasets, /statefulsets
    Verbs: list, get, watch, create, delete, patch, update
/healthz, /metrics, /logs, … (non-resource endpoints)

In addition to these API groups, k8s also provides several non-resource endpoints that are not part of any specific API group. These endpoints provide access to information and functionality that are not associated with any specific resource, such as the /healthz, /metrics, and /logs endpoints.

KUBECONFIG and KUBECONFIG file Verbs

The KUBECONFIG environment variable specifies the path to the Kubernetes configuration file, which holds the cluster, user, and context information used by kubectl and other Kubernetes command-line tools. A kubeconfig file can contain multiple contexts, each pairing a cluster with a user and a default namespace. On Unix-based systems the file is typically stored in the user's home directory at ~/.kube/config.

kubectl --kubeconfig=/path/to/my-kubeconfig/my-kubeconfig.yaml get pods

This command uses the specified kubeconfig file instead of the default ~/.kube/config file.

A kubeconfig file has four sections:

Users: which users? (for example Mohsen, Arye, Sarah)
Clusters: which clusters? (for example EKS, Arvan, GKE)
Contexts: the mapping between the Kubernetes cluster(s) and the users who can access them. A context includes the cluster and user information, a reference to a default namespace, and a name for the context.
Current-context: the context that kubectl uses by default.

new.kubeconfig (as written when certificates are referenced as files, i.e. embed-certs=false):

apiVersion: v1
kind: Config
clusters:
- name: GKE
  cluster:
    certificate-authority: ca.crt
    server: https://k8s-endpoint:6443
contexts:
- name: Arye@GKE
  context:
    cluster: GKE
    user: Arye
    namespace: dev    # the default namespace to use for this context
users:
- name: Arye
  user:
    client-certificate: arye.crt
    client-key: arye.key
current-context: Arye@GKE

To create a kubeconfig file using kubectl, follow these steps: set the cluster details, set the user credentials, set the context, and use the context.

Set the cluster details:

kubectl config set-cluster GKE --server=https://k8s-endpoint:6443 --certificate-authority=ca.crt --embed-certs --kubeconfig new.kubeconfig

set-cluster: sets a cluster entry in kubeconfig
GKE: cluster name
--server: server address
--certificate-authority: K8s CA certificate for TLS verification
--embed-certs: embeds the certificate data directly in the kubeconfig instead of linking to a file
--kubeconfig: the kubeconfig file that will be created with this new entry

Set the user credentials. If you use a client certificate:

kubectl config set-credentials Arye --client-key=/path/to/arye.key --client-certificate=/path/to/arye.crt --embed-certs --kubeconfig new.kubeconfig

Arye: user name
--client-key: user key file
--client-certificate: user certificate file

If you use a token:

kubectl config set-credentials Arye --token=eyJhbGciOiJSUzI1NiIsIm…Lz9APOb2rsWHr9HWA --kubeconfig new.kubeconfig

Set the context:

kubectl config set-context Arye@GKE --cluster=GKE --user=Arye --namespace=dev --kubeconfig new.kubeconfig

Arye@GKE: context name
--namespace: default namespace

Use the context. You can specify a different kubeconfig file by setting the KUBECONFIG environment variable:

export KUBECONFIG=new.kubeconfig
kubectl config use-context Arye@GKE
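As a rough sketch of the token variant, a kubeconfig whose user authenticates with a bearer token carries a token field in place of the client certificate and key. The token value below is a placeholder, not a real credential, and the rest of the file simply reuses the GKE example names.

```yaml
apiVersion: v1
kind: Config
clusters:
- name: GKE
  cluster:
    server: https://k8s-endpoint:6443
    certificate-authority: ca.crt
users:
- name: Arye
  user:
    # Bearer token sent to the API server; placeholder value for illustration only
    token: "REPLACE_WITH_TOKEN"
contexts:
- name: Arye@GKE
  context:
    cluster: GKE
    user: Arye
    namespace: dev
current-context: Arye@GKE
```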

Cluster Scope in Kubernetes

In Kubernetes, resources are divided into two categories based on their scope: namespaced and cluster-scoped.

Namespaced resources: these resources exist and operate within a namespace. They can have different configurations and states in different namespaces. Examples: ConfigMaps, Roles, RoleBindings, PVCs, Deployments, ReplicaSets, Pods, Services, Jobs.

kubectl api-resources --namespaced=true

Cluster-scoped resources: these resources exist and operate across the entire cluster. They are not confined to any particular namespace. Examples: Nodes, PVs, ClusterRoles, ClusterRoleBindings, Namespaces.

kubectl api-resources --namespaced=false

The NAMESPACED column of the kubectl api-resources output shows the scope of each resource:

kubectl api-resources
NAME          SHORTNAMES   APIVERSION             NAMESPACED   KIND
…
endpoints     ep           v1                     true         Endpoints
pods          po           v1                     true         Pod
services      svc          v1                     true         Service
deployments   deploy       apps/v1                true         Deployment
ingresses     ing          networking.k8s.io/v1   true         Ingress
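The difference in scope also shows up in the manifests themselves. The sketch below contrasts a cluster-scoped object with a namespaced one; the ConfigMap name and its data are hypothetical and only serve as an illustration.

```yaml
# Cluster-scoped: a Namespace has no metadata.namespace field
apiVersion: v1
kind: Namespace
metadata:
  name: dev
---
# Namespaced: a ConfigMap lives inside exactly one namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings     # hypothetical name
  namespace: dev
data:
  LOG_LEVEL: "info"      # hypothetical key and value
```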

How to create a new admin or developer user account for accessing a k8s cluster with X.509?

Workflow: Creating a new private key & CSR -> Creating a CSR YAML file -> Review by admin & approve it -> Creating a kubeconfig file -> Creating a Role and RoleBinding (or a ClusterRole and ClusterRoleBinding)

1 Creating a new private key & a CSR file by the new user

Generate a private key for the user using OpenSSL. The private key is used as part of the user's credentials to authenticate with the Kubernetes API server.

openssl genrsa -out mojtaba.key 2048

Create a CSR for the user using the private key. The CSR includes the user's identifying information and the public key associated with the private key.

openssl req -new -key mojtaba.key -subj "/CN=mojtaba" -out mojtaba.csr

2 Creating a new CSR YAML file and submitting it for signing by the Kubernetes CA

Create a CertificateSigningRequest object in Kubernetes that includes the user's CSR and submit the CSR to the Kubernetes cluster. The request field holds the base64-encoded CSR, which is the output of:

cat mojtaba.csr | base64 -w 0

csr-mojtaba.yml

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: mojtaba
spec:
  groups:
  - system:authenticated
  request: LS0tLS1CRUdJTjkKNUlEdC9BWT………0KLS0tLS1FTkQ…gQ0VSVElGSUNBVEUgUkVRVUVTVC0tLS0tCg==
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth

The signerName specifies the Kubernetes signer that will sign the certificate.
The usages field specifies that the certificate will be used for client authentication.
The available fields can be listed with: kubectl explain CertificateSigningRequest.spec

kubectl apply -f csr-mojtaba.yml
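A CertificateSigningRequest can also ask for a limited certificate lifetime through spec.expirationSeconds. The following is a minimal sketch; the one-day value is an assumption chosen for illustration, the request value is a placeholder, and the signer may issue a shorter lifetime than requested.

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: mojtaba
spec:
  groups:
  - system:authenticated
  # Placeholder: paste the output of `cat mojtaba.csr | base64 -w 0` here
  request: <base64-encoded CSR>
  signerName: kubernetes.io/kube-apiserver-client
  # Requested lifetime of the issued certificate: 86400 s = 1 day (assumed value)
  expirationSeconds: 86400
  usages:
  - client auth
```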


——————————————————————————————————————————————————————————————————————————————

3 Review the submitted CSR and approve it

Once the CSR is submitted, it needs to be approved by a cluster administrator.

k get csr
NAME      AGE   SIGNERNAME                            REQUESTOR          CONDITION
mojtaba   33m   kubernetes.io/kube-apiserver-client   kubernetes-admin   Pending

k describe csr mojtaba
Name:               mojtaba
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certificates.k8s.io/v1","kind":"CertificateSigningRequest","metadata":{"annotations":{},"name":"mojtaba"},"spec":{"groups":["system:authenticated"],"request":"LS0tLS1CRUdJTjkKNUlEdC9BWT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUgUkVRVUVTVC0tLS0tCg==","signerName":"kubernetes.io/kube-apiserver-client","usages":["client auth"]}}
CreationTimestamp:  Sat, 11 Jun 2022 17:51:33 +0000
Requesting User:    kubernetes-admin
Signer:             kubernetes.io/kube-apiserver-client
Status:             Pending
Subject:
  Common Name:          mojtaba
  Serial Number:
  Organization:         StarkWare
  Organizational Unit:  blockchain
  Country:              IL
  Locality:             haifa
  Province:             haifa
Events:  <none>

k certificate approve mojtaba

k get csr
NAME      AGE   SIGNERNAME                            REQUESTOR          CONDITION
mojtaba   39m   kubernetes.io/kube-apiserver-client   kubernetes-admin   Approved,Issued

The approve command notifies the Kubernetes CA that the CSR has been approved and requests a signed certificate for the user. The signed certificate is then stored in the status.certificate field of the CertificateSigningRequest object.

4 Export the issued certificate from the CertificateSigningRequest

Retrieve the signed certificate for the user:

kubectl get csr mojtaba -o jsonpath='{.status.certificate}' | base64 -d > mojtaba.crt

5 Create a kubeconfig file for the user

Create a kubeconfig file for the user that includes the cluster details, user credentials, and context.

Variant that references external files:

apiVersion: v1
kind: Config
current-context: mojtaba@cka
clusters:
- name: cka
  cluster:
    server: https://kubemaster-1:6443
    certificate-authority: ca.crt
users:
- name: mojtaba
  user:
    client-certificate: mojtaba.crt
    client-key: mojtaba.key
contexts:
- name: mojtaba@cka
  context:
    cluster: cka
    user: mojtaba
    namespace: dev

To become independent from external files, you can use the data fields directly within the configuration file. The certificate-authority-data field contains the base64-encoded CA certificate for the Kubernetes cluster.

apiVersion: v1
kind: Config
current-context: mojtaba@cka
clusters:
- name: cka
  cluster:
    server: https://kubemaster-1:6443
    certificate-authority-data: <base64-encoded CA certificate data>
users:
- name: mojtaba
  user:
    client-certificate-data: <base64-encoded client certificate data>
    client-key-data: <base64-encoded client key data>
contexts:
- name: mojtaba@cka
  context:
    cluster: cka
    user: mojtaba
    namespace: dev

The encoded values are produced as follows:

certificate-authority-data: cat /etc/kubernetes/pki/ca.crt | base64 -w 0
client-certificate-data: cat mojtaba.crt | base64 -w 0
client-key-data: cat mojtaba.key | base64 -w 0

5.1 If you don't want to create a kubeconfig manually, you can create a kubeconfig using kubectl

kubectl config set-cluster cka --server=https://kubemaster-1:6443 --certificate-authority=ca.crt --embed-certs --kubeconfig devuser.kubeconfig

kubectl config set-credentials mojtaba --client-key=/path/to/mojtaba.key --client-certificate=/path/to/mojtaba.crt --embed-certs --kubeconfig devuser.kubeconfig

kubectl config set-context mojtaba@cka --cluster=cka --user=mojtaba --namespace=dev --kubeconfig devuser.kubeconfig

kubectl config use-context mojtaba@cka --kubeconfig devuser.kubeconfig

6 Set Up Role-Based Access Control (RBAC) for the User

In this final step, you create a role and role binding to grant the user permissions in the Kubernetes cluster.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "get", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mojtaba-developer
  namespace: dev
roleRef:
  apiGroup: "rbac.authorization.k8s.io"
  kind: "Role"
  name: "developer"
subjects:
- apiGroup: "rbac.authorization.k8s.io"
  kind: "User"
  name: "mojtaba"

The reason for having two separate rules in the Role definition is that the two resources, "pods" and "configmaps", have different permission requirements.

Kubernetes RBAC Objects

Rule: contains resources (e.g., pods, services) and verb(s) (e.g., get, list).
Role: a collection of namespace-scoped rules.
(Aggregated) Cluster Role: a collection of cluster-global rules; an aggregated Cluster Role collects rules from other Cluster Roles.
Role Binding: attaches rules from one Role or Cluster Role to Users, Groups or SAs.
Cluster Role Binding: attaches rules from one Cluster Role to Users, Groups or SAs.
Pod: compute unit that can interact with the Kubernetes API server; it uses the credentials of a Service Account.
Service Account: namespaced, Kubernetes-managed user that is intended to be used by in-cluster processes.
User: a user that authenticates against the Kubernetes API server; can be part of 1..x Groups.
Groups: a collection of Users.

https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#normal-user
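For access that is not limited to one namespace, the same pattern applies with a ClusterRole and a ClusterRoleBinding. The sketch below is only an illustration: the object names and the chosen resources and verbs are assumptions, not part of the original walkthrough.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-viewer            # hypothetical name
rules:
# Read-only access to a few cluster-scoped resources
- apiGroups: [""]
  resources: ["nodes", "namespaces", "persistentvolumes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: mojtaba-cluster-viewer    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-viewer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: mojtaba
```

The result can be checked from an admin session with kubectl auth can-i list nodes --as mojtaba.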
Auditing
Auditing in Kubernetes refers to the process of recording and analyzing activities that occur on the cluster. This can include actions taken by users, API requests, and changes to objects in the cluster. Auditing provides visibility into the behavior of the cluster and can be used for security, compliance, and troubleshooting purposes.

Audit levels in Kubernetes define the verbosity of the recorded events. There are four audit levels:

None: Do not log any events.
Metadata: Log request metadata only (e.g., who, what, where, when).
Request: Log event metadata and request content (excluding the response).
RequestResponse: Log event metadata, request content, and response content.

Audit logging increases the memory consumption of the API server. Memory consumption depends on the audit logging policy.

ALL ROADS LEAD TO THE APISERVER

All requests to view or modify the state of the cluster pass through the apiserver. This central position makes the apiserver the appropriate source for auditing data.

[Figure: requests from Admin and Dev users all pass through the apiserver, which writes the audit log]

Audit log events are emitted as JSON objects:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "timestamp": "2022-06-01T14:23:00Z",
  "auditID": "1a2b3c4d-1234-5678-90ab-cdef01234567",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods",
  "verb": "POST",
  "user": {
    "username": "john.doe",
    "groups": ["developers"]
  },
  "sourceIPs": ["10.0.0.1"],
  "objectRef": {
    "apiVersion": "v1",
    "kind": "Pod",
    "namespace": "default",
    "name": "my-pod"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2022-06-01T14:22:59.999Z",
  "stageTimestamp": "2022-06-01T14:23:00.001Z",
  "annotations": {
    "kubectl.kubernetes.io/last-applied-configuration": "{...}"
  }
}

To enable audit logging in the Kubernetes API server, you need to configure the API server to use a specific audit policy and write audit logs to a file or other destination.
1 Edit the API server configuration file and add the following flags to the spec.containers.command section:

- --audit-policy-file=/etc/kubernetes/audit/policy.yaml
- --audit-log-path=/var/log/kubernetes/audit/audit.log
- --audit-log-format=json
- --audit-log-maxsize=500
- --audit-log-maxbackup=3

The apiserver has some audit logging options:
audit-policy-file: sets the policy file to use
audit-log-*: settings that configure log files
audit-webhook-*: settings that configure log network endpoints

2 Add the volumes to the spec section:

volumes:
- name: audit-config
  hostPath:
    path: /etc/kubernetes/audit/policy.yaml
    type: File
- name: audit-logs
  hostPath:
    path: /var/log/kubernetes/audit
    type: DirectoryOrCreate

3 Add the corresponding volume mounts to the spec.containers.volumeMounts section:

volumeMounts:
- name: audit-config
  mountPath: /etc/kubernetes/audit/policy.yaml
  readOnly: true
- name: audit-logs
  mountPath: /var/log/kubernetes/audit

4 After the API server restarts and applies the policy.yaml file, you can tail the logs to see the events being recorded:

tail -f /var/log/kubernetes/audit/audit.log | jq

An audit policy is a configuration that defines the rules for what events should be recorded and at what level.

/etc/kubernetes/audit/policy.yaml

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["pods", "services"]
- level: Request
  users: ["system:serviceaccount:my-namespace:my-serviceaccount"]
  resources:
  - group: ""
    resources: ["configmaps"]

This policy logs metadata for all pod and service operations and logs request content for configmap operations performed by the specified service account.

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
  resources:
  - group: ""
    resources: ["pods", "services"]
  verbs: ["create", "update", "delete"]
- level: Metadata
  resources:
  - group: ""
    resources: ["namespaces", "configmaps", "secrets"]
  verbs: ["create", "update", "delete"]
- level: None
  resources:
  - group: ""
    resources: ["persistentvolumes", "persistentvolumeclaims"]

This policy logs request-level events for pod and service creation, update, and deletion, and metadata-level events for namespace, configmap, and secret creation, update, and deletion. It does not log events related to persistent volume and persistent volume claim resources.

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: None
  verbs: ["get", "watch", "list"]
- level: None
  resources:
  - group: "" # core
    resources: ["events"]
- level: None
  users:
  - "system:kube-scheduler"
  - "system:kube-proxy"
  - "system:apiserver"
  - "system:kube-controller-manager"
  - "system:serviceaccount:gatekeeper-system:gatekeeper-admin"
- level: None
  userGroups: ["system:nodes"]
- level: RequestResponse

This policy has five rules, each specifying a different level of audit logging:

Rule 1 specifies that all "get", "watch", and "list" operations should not be audited at all.
Rule 2 specifies that events resources should not be audited at all.
Rule 3 specifies that these system users should not be audited.
Rule 4 specifies that all users belonging to the "system:nodes" group should not be audited.
Rule 5 specifies that all other operations, including requests and responses, should be audited at the RequestResponse level.
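Because the Request and RequestResponse levels record object bodies, sensitive resources such as Secrets are usually kept at the Metadata level. The sketch below illustrates that idea together with the omitStages and namespaces fields of the audit Policy API; the namespace name and the choice of resources are assumptions made for the example.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Do not generate events for the RequestReceived stage
omitStages:
- "RequestReceived"
rules:
# Log only metadata for Secrets and ConfigMaps so their contents never reach the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
# Log request bodies for changes to Deployments in one namespace (hypothetical namespace)
- level: Request
  verbs: ["create", "update", "patch", "delete"]
  namespaces: ["dev"]
  resources:
  - group: "apps"
    resources: ["deployments"]
# Everything else: metadata only
- level: Metadata
```

Rules are evaluated in order and the first matching rule decides the level, so the more specific rules are listed before the catch-all.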

RuntimeClass
RuntimeClass is a Kubernetes feature that allows users to specify different runtime configurations for their containers. One common use case for RuntimeClass is to run containers with different levels of isolation. For example, a user may want to run some containers with a higher level of isolation, while others may not require the same level of security. By defining multiple RuntimeClasses with different runtime configurations, the user can choose the appropriate class for each pod.

gVisor is a sandboxed container runtime developed by Google. It is a user-space kernel that provides an additional layer of isolation between containerized applications and the host kernel by intercepting and handling system calls, a technique called "sandboxing". It can be used with k8s to provide an extra layer of security for your pods. To restrict syscalls for a pod running in k8s, you can use gVisor as the runtime for that pod.

[Figure: the kubelet calls containerd, which starts containers with either runc or runsc (gVisor). Inside the sandbox, the app process runs in user space and its system calls are handled by gVisor before they reach the kernel in kernel space.]

How to use gVisor: install gVisor -> configure containerd -> create a RuntimeClass resource -> use the gVisor RuntimeClass in your pod

> First, you need to install gVisor on your Kubernetes nodes. You can do this using the runsc binary, which is the gVisor runtime. Download and install the runsc binary on each node:

wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
chmod +x runsc
sudo mv runsc /usr/local/bin

> To use gVisor with Kubernetes, you need to configure the container runtime (e.g., containerd) to use gVisor. Create a configuration file for containerd:

sudo mkdir -p /etc/containerd
sudo nano /etc/containerd/config.toml

Add this configuration to the `config.toml` file:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  BinaryName = "/usr/local/bin/runsc"
  Root = ""
  LogLevel = "info"
  Debug = false
  DebugLogFile = ""
  NoSandbox = false
```

Restart containerd to apply the new configuration:

sudo systemctl restart containerd

> Create a `RuntimeClass` resource in your Kubernetes cluster that specifies gVisor as the runtime. Save the following YAML file as `gvisor-runtime-class.yaml`:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc

kubectl apply -f gvisor-runtime-class.yaml

> To use gVisor for a specific pod, set the `runtimeClassName` field to `gvisor` in the pod spec. Here's an example of a simple Nginx pod that uses gVisor, saved as `nginx-gvisor-pod.yaml`:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-gvisor
spec:
  runtimeClassName: gvisor
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80

kubectl apply -f nginx-gvisor-pod.yaml

Executing the 'dmesg' command inside the container does not show the host kernel's messages, because system calls are intercepted by gVisor instead of reaching the host kernel directly.
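If only some nodes have runsc installed, a RuntimeClass can steer its pods to those nodes through its scheduling field. This is a minimal sketch; the node label key and value are hypothetical and would have to match whatever label marks the gVisor-capable nodes in your cluster.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  # Pods using this RuntimeClass are only scheduled onto nodes carrying this label.
  # The label below is a hypothetical example, not a built-in Kubernetes label.
  nodeSelector:
    sandbox-runtime: gvisor
```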
Network policy
Network Policy is a Kubernetes feature that allows you to define rules for ingress and egress traffic between pods inside a cluster. It's a way to implement security and access control at the network level by specifying which pods can communicate with each other, using labels to identify the target and source pods.

Features:
Policies are namespace scoped.
Policies are applied to pods using label selectors.
Policy rules can specify the traffic that is allowed to/from pods, namespaces, or CIDRs.
Policy rules can specify protocols (TCP, UDP, SCTP), named ports or port numbers.

Ingress refers to incoming traffic from other pods or external sources to the target pod. Egress refers to outgoing traffic from the target pod to other pods or external destinations.

[Figure: a user reaches the HA webserver pod (172.18.10.10, app: traefik) through a Service; requests continue to the Django app pod and from there to the db pod (role: db), which mounts a PV (size: 20Gi) and a Secret. In the monitoring namespace (ns: monitoring), Prometheus and the Mysql-exporter pod (role: ms-exporter) also connect to the db pod. Each hop is labeled with its request/response direction as Ingress or Egress.]

Network policies are applied to pods rather than services because pods are the network endpoints that actually receive the traffic. Services are not network endpoints and do not receive traffic directly. Instead, they route traffic to the appropriate pods based on their labels.

If no Kubernetes network policy applies to a pod, then all traffic to and from the pod is allowed (default-allow). If one or more network policies apply to a pod, then only the traffic explicitly allowed by those policies is permitted (default-deny).

Network policies are like firewall rules for your Kubernetes pods. By default, pods are non-isolated and can accept traffic from any source. When you apply a NetworkPolicy to a pod, that pod becomes isolated and only allows traffic that is permitted by the policy. There are several types of Network Policy rules that can be defined in Kubernetes:

PodSelector: This rule selects a specific set of pods to apply the policy to based on their labels.
NamespaceSelector: This rule selects all the pods in a specific namespace to apply the policy to.
ExternalEntities: This rule allows you to define specific IP addresses or IP ranges that are allowed to communicate with the selected pods.

Allow ingress traffic from pods in the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: django
    ports:
    - protocol: TCP
      port: 3306

podSelector selects the pods to which the NetworkPolicy applies. In this case, it matches all pods with the label role: db.
policyTypes specifies that the policy only applies to ingress traffic.
The "ingress" section specifies the traffic rules that govern inbound traffic to the selected pods. Specifically, it permits traffic from pods labeled with "role: django" to access the selected pods on TCP port 3306.
The ports field allows you to specify the ports and protocols that are allowed for incoming or outgoing traffic.

Allow ingress traffic from pods in a different namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy-namespace
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: ms-exporter
      namespaceSelector:
        matchLabels:
          ns: monitoring
    ports:
    - port: 3306

This rule only allows inbound traffic from pods labeled "role: ms-exporter" in the namespace labeled "ns: monitoring". Incoming traffic is limited to port number 3306.

Allow traffic from and to specific sources and destinations:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: traefik-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: traefik
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.18.0.0/24
  egress:
  - to:
    - podSelector:
        matchLabels:
          role: django

policyTypes specifies that the policy applies to both Ingress and Egress traffic. The rules specify that pods with the label "app: traefik" can only receive traffic from the IP block "172.18.0.0/24" and can only send traffic to pods with the label "role: django".

Please note that in order to use Network Policies, you must have a CNI (Container Network Interface) plugin that supports them, such as Calico or Weave Net.
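A common starting point is a policy that isolates every pod in a namespace, after which more specific policies open only the paths that are needed. The following is a minimal sketch of such a default-deny policy for ingress; the policy name is a hypothetical choice.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # hypothetical name
  namespace: default
spec:
  # An empty podSelector selects every pod in the namespace
  podSelector: {}
  policyTypes:
  - Ingress
  # No ingress rules are listed, so no inbound traffic is allowed
```

Since policies are additive, a pod selected by both this policy and a more specific allow policy accepts exactly the traffic that the allow policy permits.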

Security Context
SecurityContext is a configuration object that defines the security settings for a Pod or a specific container within a Pod. It allows you to set the access control and security-related properties for the containers, including their file system, users, and groups, as well as the capabilities and privileges of the processes running inside the containers. A SecurityContext can be defined at the Pod level or at the container level, using the securityContext field in the Pod or container specification.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 2000
    fsGroup: 3000
  containers:
  - name: my-container
    image: my-image
    securityContext:
      readOnlyRootFilesystem: true

The container will run as user ID 1000 and group ID 2000, and its mounted volumes will be owned by group ID 3000. The fsGroup field can only be set at the Pod level. Additionally, the container's root filesystem will be read-only, which can help to improve security by preventing changes to critical system files.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
    securityContext:
      privileged: true

The privileged field is set to true, which means that the container will run in privileged mode.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        drop:
        - CHOWN
      allowPrivilegeEscalation: false

The capabilities field is used to specify the Linux capabilities that the container is allowed to use. Here, the container is allowed to use the NET_ADMIN capability, but is not allowed to use the CHOWN capability. Additionally, the allowPrivilegeEscalation field is set to false, which means that the container is not allowed to escalate privileges beyond what is specified in the SecurityContext object.
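The individual settings can be combined into one restrictive baseline. The sketch below is one possible combination, not a recommendation from the original text; the pod name is hypothetical, and the runAsNonRoot and seccompProfile fields are standard securityContext fields that the examples in this section do not cover.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true            # refuse to start containers that would run as UID 0
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault        # use the container runtime's default seccomp filter
  containers:
  - name: my-container
    image: my-image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]             # start from no capabilities; add back only what is needed
```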

Image security
Trivy is a simple and comprehensive vulnerability scanner for containers. It's used to identify vulnerabilities in operating system packages (Alpine, Red Hat Universal Base Image, CentOS, etc.) and application dependencies (Bundler, Composer, npm, yarn, etc.). It's especially useful in the Kubernetes (k8s) environment for scanning container images and ensuring your workloads are secure.
Here's how Trivy can be integrated into different stages of Kubernetes deployment:

[Figure: development lifecycle. Develop: scan Git repository, third-party libraries, filesystems, container image. Test: scan base image, Dockerfile, Kubernetes manifest. Deploy and Observe: scan running in-cluster Kubernetes workloads. Iterate.]

Pre-deployment Scanning:
Before deploying your workloads, you can use Trivy to scan various resources for vulnerabilities and misconfigurations. Here are some common use cases:

Third-party Libraries: Scan your application's dependencies and libraries for known vulnerabilities.
Container Images: Scan container images for vulnerabilities in the underlying operating system packages and application dependencies.
Git Repositories: Analyze your code repositories for secrets, sensitive information, or other security issues.

You can use the Trivy CLI on your local machine or integrate Trivy into your CI/CD pipeline to perform these pre-deployment scans. Trivy will provide you with a list of vulnerabilities and misconfigurations to address before deploying your workloads.

Continuous Scanning of Running Workloads:
After deploying your workloads to Kubernetes, it's essential to set up automated and continuous scanning to detect vulnerabilities in your running workloads. Here are the recommended features for this stage:

Trivy K8s Command: Use the trivy kubernetes command to scan Kubernetes Deployments or Namespaces. Trivy will scan the container images used by the running Pods and provide vulnerability reports.

Trivy Operator: Deploy the Trivy Operator in your Kubernetes cluster. The Trivy Operator automates the scanning of running workloads by continuously monitoring and scanning container images within the cluster.

trivy k8s --namespace=kube-system --report=summary deploy

Summary Report for minikube

Workload Assessment
┌─────────────┬───────────────────────────┬───────────────────┬────────────────────┬───────────────────┐
│  Namespace  │         Resource          │  Vulnerabilities  │ Misconfigurations  │      Secrets      │
│             │                           ├───┬───┬───┬───┬───┼───┬───┬───┬────┬───┼───┬───┬───┬───┬───┤
│             │                           │ C │ H │ M │ L │ U │ C │ H │ M │ L  │ U │ C │ H │ M │ L │ U │
├─────────────┼───────────────────────────┼───┼───┼───┼───┼───┼───┼───┼───┼────┼───┼───┼───┼───┼───┼───┤
│ kube-system │ Deployment/metrics-server │   │   │   │   │   │   │   │ 2 │ 8  │   │   │   │   │   │   │
│ kube-system │ Deployment/coredns        │   │ 1 │   │   │   │   │ 1 │ 3 │ 5  │   │   │   │   │   │   │
│ kube-system │ Deployment/logviewer      │ 2 │   │   │   │   │   │   │ 4 │ 11 │   │   │   │   │   │   │
└─────────────┴───────────────────────────┴───┴───┴───┴───┴───┴───┴───┴───┴────┴───┴───┴───┴───┴───┴───┘
Severities: C=CRITICAL H=HIGH M=MEDIUM L=LOW U=UNKNOWN
Gatekeeper

Kubernetes provides admission controller webhooks as a mechanism to decouple policy decisions from the API server. These webhooks intercept admission requests before they are persisted as objects in k8s, allowing custom logic and policies to be enforced. Gatekeeper was specifically designed to facilitate customizable admission control through configuration, rather than requiring code changes. It brings awareness to the overall state of the cluster, going beyond evaluating a single object during admission. Gatekeeper integrates with Kubernetes as a customizable admission webhook. It leverages the Open Policy Agent (OPA), which is a policy engine hosted by the Cloud Native Computing Foundation (CNCF), to execute policies in cloud-native environments.

Constraint Templates are Kubernetes Custom Resource Definitions (CRDs) that define a set of constraints or policies that can be applied to Kubernetes objects. They act as a template or blueprint for creating individual Constraints. A Constraint Template defines the structure, parameters, and validation rules for a specific type of constraint that can be applied to Kubernetes resources. Constraint Templates allow you to define reusable policies that can be applied to multiple resources across your cluster. They provide a way to centralize and standardize the enforcement of constraints.

Constraints are instances of Constraint Templates. They are created based on the defined template and applied to specific Kubernetes resources. Constraints enforce policies by validating the resources against the defined rules and conditions in the Constraint Template. If a resource violates any of the defined constraints, it is considered non-compliant.

When a user tries to create or update a resource in the cluster, the request first goes to Gatekeeper (as an admission webhook). Gatekeeper checks if the resource satisfies all the defined constraints and rejects the request if any policy is violated.

[Figure: admission control queries Gatekeeper, which evaluates the Constraint Templates (CRD) and Constraints (CRD) with OPA. Gatekeeper watches and replicates Kubernetes objects such as Deployments, Services, Pods, and Ingresses.]

Enforcing Resource Limits and Requests for Pods using Gatekeeper

To enforce a policy where all Pods must have resource limits and requests set using Gatekeeper, you would create a ConstraintTemplate and then a Constraint using that template. Here's how you can do it:

1 Create a ConstraintTemplate, which defines the schema and the Rego logic for the policy.

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresources
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResources
      validation:
        openAPIV3Schema:
          properties:
            resources:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredresources
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits.memory
        msg := sprintf("container <%v> has no memory limit", [container.name])
      }
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits.cpu
        msg := sprintf("container <%v> has no CPU limit", [container.name])
      }

This ConstraintTemplate defines a constraint that requires all containers in Kubernetes resources to have resource limits defined. If any container violates this constraint, Gatekeeper will prevent the resource from being created or modified.
The first violation rule checks whether a container in the input resource's specification (spec) has defined memory resource limits. If there are no memory resource limits defined, it generates a violation with a message indicating that the container lacks memory resource limits. The second violation rule does the same for CPU limits. As written, the Rego checks limits only; enforcing requests as well would need analogous rules.

2 Create a Constraint based on the ConstraintTemplate you defined. The Constraint specifies the name and the kind of resources to which the policy applies.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: pod-must-have-limits
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]

3 After applying the Constraint, any new Pods that do not have the required resource limits will be rejected by the Gatekeeper admission webhook. Existing pods will not be affected by this policy.
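Before rejecting requests outright, a Constraint can be rolled out in a report-only mode and scoped away from system namespaces. The sketch below assumes the K8sRequiredResources template from this section is installed; the list of excluded namespaces is an assumption and should be adapted to the cluster.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: pod-must-have-limits
spec:
  # Record violations without rejecting requests; omit this field to enforce (default: deny)
  enforcementAction: dryrun
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    # Skip system namespaces so cluster components are not affected (assumed list)
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
```

In this mode the violations found by Gatekeeper's periodic audit are typically visible in the status of the Constraint object, for example via kubectl get k8srequiredresources pod-must-have-limits -o yaml.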
Storage
In Kubernetes, containers are typically considered to be ephemeral and immutable, meaning that they are designed to be short-lived and replaceable. This approach is well-suited for stateless applications that don't store or modify persistent data, but it can be challenging for stateful applications that require persistent storage.
To address this challenge, Kubernetes provides various ways to persist data, ranging from simple to complex solutions. Here are some of the approaches to persist data in k8s.

HostPath volumes:
HostPath volumes allow you to mount a directory from the host node's filesystem into a Pod. This approach is useful for testing and development purposes, but it is not recommended for production environments as it can create security risks.

One important thing to note about HostPath volumes is that they are only accessible from the node where the pod is running. This means that if the pod is rescheduled to a different node, it will not have access to the files on the original node's filesystem. Also, if multiple pods are scheduled on the same node and they use the same HostPath volume, they will be able to read and write to the same files on the host node's filesystem.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
  namespace: dev
spec:
  containers:
  - name: web-app
    image: my-web-app-image
    volumeMounts:
    - mountPath: /app/uploads
      name: user-uploads
  volumes:
  - name: user-uploads
    hostPath:
      path: /var/uploads
      type: DirectoryOrCreate

1 We are creating a HostPath volume named user-uploads that maps to the /var/uploads directory on the host node's filesystem. The type field specifies that the directory should be created if it doesn't already exist.

2 We then mount the user-uploads volume to the container's /app/uploads directory using the volumeMounts field in the container specification. This allows the web application to access the user-uploaded files stored in the /var/uploads directory on the host node's filesystem.

[Figure: the Pod's /app/uploads directory is backed by the user-uploads volume, which maps to /var/uploads on the host node's filesystem]

EmptyDir volumes:
EmptyDir volumes are a type of temporary storage volume that are created and attached to a Pod when the Pod is created. The data stored in an EmptyDir volume exists only for the lifetime of the Pod and is deleted when the Pod is deleted. These volumes are commonly used for storing temporary data that is needed by a Pod, such as cache files or temporary log files.

When you define an EmptyDir volume, you can specify a size limit for the volume. If you don't specify a limit, the application running in the pod can generate any amount of data, which can cause the disk to become full and potentially cause the node to become unavailable.

    apiVersion: v1
    kind: Pod
    metadata:
      name: monitoring-pod
    spec:
      containers:
      - name: monitoring-container
        image: monitoring-image
        volumeMounts:
        - name: logs-volume
          mountPath: /var/log/monitoring-app
      volumes:
      - name: logs-volume
        emptyDir:
          sizeLimit: 1Gi

An EmptyDir volume can also be configured to store its data in memory instead of on disk. This provides faster access to the data in the volume, which can make it useful for caching data that needs to be accessed frequently.

    apiVersion: v1
    kind: Pod
    metadata:
      name: ml-app
    spec:
      containers:
      - name: video-conv
        image: video-conv
        volumeMounts:
        - name: cache-volume
          mountPath: /var/cache/data
      volumes:
      - name: cache-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi

The medium field is used to indicate the underlying storage medium for a volume. By setting the medium to "Memory", the cache-volume volume will be created using the host node's RAM as the storage medium.

The /var/cache/data directory inside the container is mounted to an EmptyDir volume named cache-volume. The cache-volume volume is configured with a sizeLimit of 1Gi, which means that it can store up to 1 GiB of data in memory during the lifetime of the Pod.

[Diagram: the cache-volume EmptyDir lives in the node's RAM and is mounted at /var/cache/data inside the Pod.]

ConfigMaps and Secrets:


ConfigMaps and Secrets are Kubernetes objects that allow you to store configuration data and sensitive information such as
credentials and keys, respectively. They can be mounted as volumes in a Pod, allowing the Pod to access the data as files (see page 18).
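A minimal sketch of mounting both object types as volumes might look like the manifest below. It assumes a ConfigMap named app-config and a Secret named app-credentials already exist in the Pod's namespace; those names, the Pod name and the mount paths are hypothetical and only serve the illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo                   # hypothetical Pod name
spec:
  containers:
  - name: app
    image: my-web-app-image           # placeholder image
    volumeMounts:
    - name: config
      mountPath: /etc/app/config      # each ConfigMap key appears as a file here
      readOnly: true
    - name: credentials
      mountPath: /etc/app/credentials # each Secret key appears as a file here
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config                # assumed to exist already
  - name: credentials
    secret:
      secretName: app-credentials     # assumed to exist already
```

Note the asymmetry in field names: a configMap volume refers to its object with name, while a secret volume uses secretName.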

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs):


PVs are independent storage volumes that can be provisioned from different storage providers such as cloud storage or on-premise storage systems, and PVCs are used to request
storage resources from the PVs. The PVs and PVCs allow you to abstract the underlying storage infrastructure from your application, providing a layer of indirection. You can use
PVs and PVCs to store data persistently, even if a Pod is deleted or restarted. PVs and PVCs can be used with different storage backends like NFS, iSCSI, Ceph, etc
To connect PVs and PVCs to pods, you need to follow these steps:

Provision a Persistent Volume (PV): As an administrator, you'll define and create a PV object, specifying the storage capacity, access modes, and other properties. This involves interacting with your underlying storage infrastructure, whether it's local disks, network storage, or cloud storage.

Create a Persistent Volume Claim (PVC): A user or developer creates a PVC object, specifying their desired storage capacity, access modes, and any additional requirements. The PVC will be used by the pod to request storage.

Binding PVC to PV: Once the PVC is created, Kubernetes matches it with an available PV that meets the requested criteria. The binding process ensures that the PVC and PV are associated with each other.

Mounting the PV to a pod: In the pod's specification, you specify the PVC as a volume source. When the pod is scheduled and runs, Kubernetes mounts the PV associated with the PVC to a specified path within the pod's filesystem.

[Diagram: Pods created by a Deployment reference a PVC through persistentVolumeClaim; Pods created by a StatefulSet get their PVC from a volume claim template. Each PVC binds to a PV, which is either created by an admin or created through a StorageClass.]

> PVs can be provisioned by an administrator or created dynamically through a StorageClass.
> PVs have a lifecycle independent of any individual pod, meaning they can exist even when no pods are using them.
> A PVC is a request for storage by a pod. It is a way for pods to dynamically request a specific amount and type of storage without having to know the details of the underlying storage infrastructure.

    kind: PersistentVolume
    apiVersion: v1
    metadata:
      name: nfs-pv1-40g-rw
    spec:
      capacity:
        storage: 40Gi
      accessModes:
      - ReadWriteMany
      nfs:
        server: nfs_server_ip
        path: /mnt/nfs_share/pv1-40g-rw
      persistentVolumeReclaimPolicy: Retain

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-pvc-20g
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 15Gi

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
        volumeMounts:
        - name: my-volume
          mountPath: /data
      volumes:
      - name: my-volume
        persistentVolumeClaim:
          claimName: nfs-pvc-20g

accessModes is a field that is used to specify how the volume can be mounted and accessed by a pod.

> ReadOnlyMany (ROX): This access mode allows the volume to be mounted as read-only by multiple nodes in a cluster. This means that the volume can be mounted by multiple pods at the same time, but cannot be modified. This mode is typically used for shared read-only storage resources, such as configuration files or static data.
> ReadWriteMany (RWX): This access mode allows the volume to be mounted as read-write by multiple nodes in a cluster. This means that the volume can be mounted by multiple pods at the same time, and can be modified. This mode is typically used for shared read-write storage resources, such as file shares.
> ReadWriteOnce (RWO): This access mode allows the volume to be mounted as read-write by a single node in a cluster. Pods running on that same node can still share the volume, but pods on other nodes cannot mount it. It is typically used for storage resources that can only be attached to one node at a time, such as local storage or block storage.

The persistentVolumeReclaimPolicy determines what happens to a Persistent Volume (PV) and its contents when it is released, that is, when the PVC bound to it is deleted.

> Retain: The PV's contents are retained after the PV is released. The PV moves to the Released state and is not available to a new PVC until an administrator reclaims it manually, for example by cleaning up the data and recreating the PV object.
> Delete: The PV object and the underlying storage asset, including its contents, are deleted when the PV is released. The PV therefore cannot be reused by a new PVC.
> Recycle (deprecated): The PV's contents are deleted when the PV is released, and the PV is made available for reuse. This value is deprecated and should not be used in newer versions of Kubernetes.
PVs can be provisioned statically or dynamically

> Static provisioning involves manually creating PVs and configuring their properties, such as storage capacity, access modes, and reclaim policy (provisioned by an administrator).
> Dynamic provisioning allows Kubernetes to automatically create a PV when a PVC is created. Dynamic provisioning is implemented using StorageClasses.

[Diagram, static provisioning: an admin creates PVs (size: 40Gi and size: 20Gi, IOPS: 4) backed by file storage; a developer creates a PVC (size: 15Gi, IOPS: 4) that is bound to one of them.]
[Diagram, dynamic provisioning: a developer creates PVCs (size: 10Gi, IOPS: 4); StorageClasses (Sc) backed by the gce and ceph storage provisioners create matching PVs on demand.]

Static provisioning:
When you create a PV that uses an NFS volume, the PV connects to the NFS server and uses it as the backend storage for the PV.

1 To create a Persistent Volume (PV) in Kubernetes from an existing NFS volume, you first need to create an NFS export on the NFS server that will make the volume available to the Kubernetes cluster.

    exportfs -o rw,sync,no_subtree_check,no_root_squash,size=40g,fsid=0,sec=sys,anonuid=65534,anongid=65534 k8s-cluster-ip:/mnt/nfs_share/pv1-40g-rw
    exportfs -o rw,sync,no_subtree_check,no_root_squash,size=20g,fsid=0,sec=sys,anonuid=65534,anongid=65534 k8s-cluster-ip:/mnt/nfs_share/pv2-20g-rw

2 Persistent Volumes (PVs) that fit the NFS volumes are created by the cluster administrator on the Kubernetes cluster.

    kind: PersistentVolume
    apiVersion: v1
    metadata:
      name: nfs-pv1-40g-rw
    spec:
      capacity:
        storage: 40Gi
      accessModes:
      - ReadWriteMany
      nfs:
        server: nfs_server_ip
        path: /mnt/nfs_share/pv1-40g-rw
      persistentVolumeReclaimPolicy: Retain

    kind: PersistentVolume
    apiVersion: v1
    metadata:
      name: nfs-pv1-20g-rw
    spec:
      capacity:
        storage: 20Gi
      accessModes:
      - ReadWriteMany
      nfs:
        server: nfs_server_ip
        path: /mnt/nfs_share/pv2-20g-rw
      persistentVolumeReclaimPolicy: Retain

3 To use persistent storage in their Pod, the user can run the kubectl get pv command to view the available PVs.

    NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
    nfs-pv1-20g-rw   20Gi       RWX            Retain           Available           nfs                     1d
    nfs-pv1-40g-rw   40Gi       RWX            Retain           Available           nfs                     1d

In order to use one of these PVs for persistent storage in a Pod, we can create a Persistent Volume Claim (PVC) that requests storage from the desired PV.

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: nfs-pvc-20g
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 15Gi

> Each PV can be bound to only one PVC at a time. When a PVC is created, Kubernetes tries to find an available PV that matches the PVC's requirements based on capacity, access mode, and storage class. If a suitable PV is found, the PVC is bound to that PV, and the PV becomes unavailable to other PVCs.

4 Once the PVC is bound to the PV, we can mount the PV to the Pod by including it as a volume in the Pod definition file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
        volumeMounts:
        - name: my-volume
          mountPath: /data
      volumes:
      - name: my-volume
        persistentVolumeClaim:
          claimName: nfs-pvc-20g

When the Pod is created, Kubernetes will use the bound PVC named nfs-pvc-20g as a volume mount point to access the persistent storage associated with the bound PV. The volumeMounts section in the Pod specification specifies that the volume should be mounted at the path /data within the container, so any data written to that path will be stored persistently in the PV.

If a Pod is deleted and then rebuilt with the same PVC, it will be connected to the same persistent volume, and any data that was previously stored in the volume will still be accessible. However, if the PVC is deleted, the binding between the PVC and the PV is removed, the PV is released, and Pods that used the PVC can no longer reach the data through it. What happens to the data itself depends on the PV's reclaim policy: with Retain (as in this example) it stays on the volume until an administrator reclaims the PV, while with Delete the volume and its data are removed.

Dynamic provisioning:
Storage classes act as an abstraction layer on top of PVs, allowing you to define a set of default parameters and policies that are used when dynamically provisioning new PVs based on PVC requests.

1 When a PVC is created, it specifies a StorageClass to use, which dictates how the PV is provisioned. If no StorageClass is specified, the default StorageClass will be used (if one is defined).

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      storageClassName: gce-pd-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gce-pd-storage
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-standard
      replication-type: none
    reclaimPolicy: Retain
    allowVolumeExpansion: true
    volumeBindingMode: Immediate

> The provisioner field in a StorageClass specifies the name of the provisioner that should be used to provision the storage. There are many different provisioners available for different types of storage, including those for cloud providers like GCE, AWS, and Azure.
> The storageClassName field in the PVC specification is used to specify the name of the StorageClass that should be used to provision the requested storage.
> The ReadWriteOnce access mode used here allows the volume to be mounted read-write by a single node only, so pods on different nodes cannot read and write to the same PVC simultaneously.

2 Kubernetes first tries to find an existing PV that matches the criteria specified in the PVC. If no suitable PV is found, Kubernetes requests the provisioner mapped to the PVC's storage class to create a new volume. The storage provisioner can be a plugin or a driver that interfaces with the underlying storage system.

3 The storage provisioner creates a new PV that matches the PVC's requirements, such as size, access mode, and storage class.

4 Once the new PV is created, Kubernetes binds it to the PVC and the PVC is ready to be used by a Kubernetes pod. The pod can then mount the volume and use it to store and retrieve data.
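With static provisioning, Kubernetes normally picks any available PV that satisfies the claim. If the claim should land on one particular PV, one option is to name that PV in the claim. The sketch below assumes the 20Gi NFS PV from this section was created without a storageClassName; if the PV carries a class, the claim would need the same value instead of the empty string.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-20g
spec:
  volumeName: nfs-pv1-20g-rw   # bind to this specific PV rather than any match
  storageClassName: ""         # empty string: skip the default StorageClass,
                               # so no volume is provisioned dynamically
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 15Gi            # must not exceed the PV's 20Gi capacity
```

After applying it, kubectl get pvc nfs-pvc-20g should report the claim as Bound with nfs-pv1-20g-rw in the VOLUME column, provided the PV was still Available.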

Finalizers
Finalizers are markers attached to resources (such as pods, services, or deployments) to indicate that some additional cleanup or finalization steps need to be performed before the resource can be fully deleted. Finalizers are represented as strings and are stored in the metadata of the resource.

When you attempt to delete an object in Kubernetes that has a finalizer associated with it, the object will remain in the finalization phase until the controller responsible for managing that object removes the finalizer keys or until the finalizers are explicitly removed by a user.

Some common finalizers you've likely encountered are:
kubernetes.io/pv-protection
kubernetes.io/pvc-protection
The finalizers above are used on volumes to prevent accidental deletion.

If the deletion does not complete, we can edit the object and remove the finalizer.

[Diagram: object lifecycle. kubectl create: the object is live. kubectl delete: the object enters finalization. Once the finalizer key is removed and the finalizer list is empty, the object is deleted from the registry.]
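To make the finalization phase concrete, here is a trimmed sketch of what kubectl get pvc nfs-pvc-20g -o yaml could show for a claim that has been deleted while a Pod still uses it. The claim name is reused from the storage section and the timestamp is invented for illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-20g
  namespace: default
  deletionTimestamp: "2023-01-01T12:00:00Z"  # set by kubectl delete (value made up)
  finalizers:
  - kubernetes.io/pvc-protection             # keeps the object until no Pod uses it
status:
  phase: Bound
```

The object shows as Terminating until the finalizers list is empty. As a last resort the list can be cleared by hand, for instance with kubectl patch pvc nfs-pvc-20g -p '{"metadata":{"finalizers":null}}', but this bypasses the protection the finalizer exists to provide.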
StatefulSet
StatefulSets are a type of workload object in Kubernetes that are used to manage stateful applications. They are designed to handle applications, such as databases and message queues, that require unique identities, stable network addresses, persistent storage, ordered deployment and scaling, and graceful deletion. StatefulSets maintain a sticky identity for each pod, so even if a pod gets rescheduled, it still maintains the same identity/name. The pods are created from the same spec, but are not interchangeable: each has a unique persistent ID.

Important characteristics of StatefulSets

Predictable pod name:
In a StatefulSet, each Pod is assigned a predictable name based on the name of the StatefulSet and its index, in the form <statefulset-name>-<ordinal-index>. For example, if the StatefulSet is named "webapp" and has three replicas, the Pods will be named "webapp-0," "webapp-1," and "webapp-2." This allows for easy identification and reference to specific Pods within the set.

[Diagram: StatefulSet webapp with Pods webapp-0 (10.244.83.193), webapp-1 (10.244.83.194) and webapp-2 (10.244.83.195).]

Fixed individual DNS name:
StatefulSets also provide a fixed individual DNS name for each Pod, based on the predictable name assigned to it. This allows applications to refer to each Pod by a consistent DNS name, even if the Pod is rescheduled to a different node. For example, if the StatefulSet and its headless Service are both named "webapp," and the Pod is named "webapp-0," the DNS name for that Pod will be "webapp-0.webapp". Each Pod has a stable hostname based on its ordinal index.

[Diagram: Pods webapp-0 and webapp-1 of StatefulSet webapp with the DNS names webapp-0.webapp and webapp-1.webapp.]

Headless service:
StatefulSets are accompanied by a headless service, which allows for direct communication with individual Pods rather than the Service as a whole. This is useful for stateful applications that require direct communication between Pods, such as database clusters. The full DNS name of a Pod has the form:

    podname.headless-servicename.namespace.svc.cluster-domain

[Diagram: the headless Service hs-web selects Pods labelled app: webapp. webapp-0 (10.244.83.193) is reachable as webapp-0.hs-web.default.svc.cluster-domain and webapp-1 (10.244.83.194) as webapp-1.hs-web.default.svc.cluster-domain; an nslookup of the Service name returns the Pod IPs 10.244.83.193 and 10.244.83.194.]

Ordered Pod creation:
StatefulSets ensure that Pods are created in a specific order, with each Pod waiting for the previous one to be ready before starting. This is particularly important for stateful applications that require specific sequencing of events, such as database clusters. Pods are deployed in order from 0 to N-1, and terminated in reverse order from N-1 to 0.

[Diagram: StatefulSet webapp with Pods webapp-0 to webapp-3 (10.244.83.193 to 10.244.83.196) created in ordinal order.]
Why do we need StatefulSets?
Consider an example of a stateful application: a database. Databases are typically stateful, meaning they require persistent storage to store their data. They also require stable network identities to ensure that client applications can consistently connect to the same instance of the database. If you deploy a database using a regular Deployment or ReplicaSet, Kubernetes will create multiple replicas of the database, each with its own randomly assigned hostname and IP address. This can cause problems for the database, as the client applications may not be able to connect to the correct instance of the database, or data may be lost when pods are deleted or recreated. To solve these problems, you can use a StatefulSet to manage the deployment and scaling of the database.
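The characteristics described in this section can be sketched with a small manifest. The names webapp and hs-web follow the diagrams in this section; the image, port and replica count are assumptions made for the example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hs-web
spec:
  clusterIP: None          # headless: DNS resolves to the Pod IPs directly
  selector:
    app: webapp
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp
spec:
  serviceName: hs-web      # governing headless Service for the Pod DNS names
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: web
        image: nginx:1.25  # assumed image, stands in for the real application
        ports:
        - containerPort: 80
```

With this applied in the default namespace, the Pods are created one after another as webapp-0, webapp-1 and webapp-2, and each is addressable as, for example, webapp-0.hs-web.default.svc.<cluster-domain>, where the cluster domain is commonly cluster.local.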

Headless service
A Headless Service is a type of Kubernetes service that does not have a ClusterIP assigned to it. Instead, it manages the Domain Name System (DNS) records
directly. This means that when a client tries to connect to a Pod that is part of the Headless Service, it can use the DNS name associated with the Pod's IP
address to directly communicate with the Pod. When used with StatefulSets, it allows addressing each Pod individually using their stable hostnames.
    apiVersion: v1
    kind: Service
    metadata:
      name: app-service
    spec:
      ports:
      - port: 3306
      selector:
        app: mysql
      clusterIP: None

[Diagram, Headless-StatefulSet: the headless Service app-service selects the Pods labelled app: nginx. The Pods are named <statefulset-name>-<ordinal-index> (app-0, app-1, app-2) and have the IPs 10.244.83.193, 10.244.83.194 and 10.244.83.195. nslookup app-service returns those three Pod IPs.]
[Diagram, ClusterIP-Deployment: the ClusterIP Service app-service (10.102.156.115) selects the Pods labelled app: nginx. The Pods are named <deployment-name>-<replicaset random id>-<pod random id> (app-8k6ar7ye4p-ag7ha, app-8k6ar7ye4p-nik91, app-8k6ar7ye4p-a4nk2). nslookup app-service returns 10.102.156.115.]

A headless Service has no cluster IP of its own; its DNS name resolves directly to the IP addresses of its Pods. A ClusterIP Service has a unique IP address and a DNS name that resolves to that address.

When a client looks up a Headless Service, the cluster DNS returns the IP addresses of the Pods that are backing the service. By default only Pods that are ready are included; if the Service sets publishNotReadyAddresses: true, addresses of Pods that are not ready are returned as well. The client is then responsible for load-balancing the traffic across the individual Pod IP addresses that are returned.

When a client sends traffic to a regular service, Kubernetes chooses one of the Pods based on a load-balancing algorithm. Regular services use a ClusterIP address to load-balance traffic across the Pods that are backing the service.

> A regular service provides a single IP address that represents a group of Pods, while a Headless Service provides individual DNS names and IP addresses for each Pod in the service.
> Regular services are typically used for stateless applications that can handle traffic from multiple clients, while Headless Services are more commonly used for stateful applications that require direct access to individual Pods.
> Headless services can be used in combination with regular services to provide both direct access to individual pods and load-balanced access to the service as a whole. For example, you might use a headless service to allow database nodes to communicate directly with each other, while also exposing a regular service for client applications to connect to.
> A regular service has a virtual Service IP that exists as iptables or ipvs rules on each node. A new connection to this service IP is then routed with DNAT to one of the Pod endpoints, to support a form of load balancing across multiple pods. A headless service (that isn't an ExternalName) will create DNS A records for any endpoints with matching labels or name. Connections will go directly to a single pod/endpoint without traversing the service rules.

[Diagram, Headless-ClusterIP-StatefulSet: the same StatefulSet Pods app-0, app-1 and app-2 (10.244.83.193 to 10.244.83.195) are selected both by a headless Service and by a ClusterIP Service (10.102.156.115).]
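The combination of a headless and a regular Service over the same Pods could be declared roughly as follows. The Service names mysql-hs and mysql and the label app: mysql are assumptions for this sketch; the only difference that matters is clusterIP: None on the first one.

```yaml
# Headless Service: used by the database Pods to reach each other by name.
apiVersion: v1
kind: Service
metadata:
  name: mysql-hs
spec:
  clusterIP: None        # no virtual IP; DNS returns the Pod IPs
  selector:
    app: mysql
  ports:
  - port: 3306
---
# Regular Service: one stable virtual IP for client applications.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: ClusterIP        # default type, written out for clarity
  selector:
    app: mysql           # same Pods as the headless Service
  ports:
  - port: 3306
    targetPort: 3306
```

Clients would connect to mysql and be load-balanced across the Pods, while a replica that needs a specific peer would address it as <pod-name>.mysql-hs.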
StatefulSets can use two types of storage

Shared Storage
> All Pods in the StatefulSet share the same storage volume. Data is available to all Pods.
> Good for things like caches, tmp files etc.
> Don't specify volumeClaimTemplates. Reference one existing PersistentVolumeClaim in the volumes section of the Pod template; every Pod mounts that same PVC.
> Data can be corrupted if multiple Pods write to the same files.

Dedicated Storage
> Each Pod gets its own PersistentVolume. Data is isolated between Pods.
> Good for databases, unique files etc.
> Specify volumeClaimTemplates in the StatefulSet spec. The StatefulSet will create a separate PVC for each Pod from this template.
> Updating Pods is harder with dedicated storage; you may need to coordinate Pod termination to avoid data loss.

[Diagram: StatefulSet B (shared storage): the Pod template references one PVC B through persistentVolumeClaim, so Pods B-0, B-1 and B-2 all use the same PVC and PV. StatefulSet A (dedicated storage): the volume claim template produces PVC A-0, A-1 and A-2 for Pods A-0, A-1 and A-2, each bound to its own PV. The PersistentVolumeClaims (PVCs) will be created from this template.]

Shared storage example:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql-hs
      replicas: 3
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:latest
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: "yourpassword"
            ports:
            - containerPort: 3306
              name: mysql
            volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-persistent-storage
            persistentVolumeClaim:
              claimName: mysql-pvc

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-hs
    spec:
      ports:
      - port: 3306
      selector:
        app: mysql
      clusterIP: None

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: mysql-pv
      labels:
        type: local
    spec:
      capacity:
        storage: 1Gi
      accessModes:
      - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName: manual
      nfs:
        path: /srv/nfs/kubedata/pv3
        server: 192.168.49.1

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-pvc
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: manual
      resources:
        requests:
          storage: 1Gi

> The serviceName field specifies the name of the Headless Service that controls the network identity of the StatefulSet's Pods, and it is a mandatory field.
> The volumeMounts and volumes are defined in the pod template section of the StatefulSet manifest, which means that they will be shared by all pods created by the StatefulSet.
> The PVC must be created beforehand, either manually or through some automated process.
> The PV and PVC are using the "manual" StorageClass. The PVC has requested a capacity of 1Gi and has been bound to the PV.

    kubectl get pv,pvc
    NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   AGE
    persistentvolume/mysql-pv   1Gi        RWX            Retain           Bound    default/mysql-pvc   manual         5h27m

    NAME                              STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/mysql-pvc   Bound    mysql-pv   1Gi        RWX            manual         5h29m

Dedicated storage example:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql-hs
    spec:
      selector:
        matchLabels:
          app: mysql
      serviceName: mysql
      replicas: 3
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:latest
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            ports:
            - containerPort: 3306
              name: mysql
            volumeMounts:
            - name: mysql-shared-storage
              mountPath: /var/lib/mysql
      volumeClaimTemplates:
      - metadata:
          name: mysql-shared-storage
        spec:
          accessModes: [ "ReadWriteOnce" ]   # one volume per Pod; GCE persistent disks do not support ReadWriteMany
          storageClassName: google-storage
          resources:
            requests:
              storage: 10Gi

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: google-storage
    provisioner: kubernetes.io/gce-pd
    ….

> volumeClaimTemplates is specified at the StatefulSet level, not in the pod template.
> The "volumeClaimTemplates" field in a StatefulSet is used to define persistent volume claims (PVCs) that will be used by the pods in the set for their storage needs. When a pod is created, a PVC is created for it from the template; when the pod is rescheduled, it reclaims that same PVC and uses it for its persistent storage.
> If you set the storageClassName to the name of a StorageClass that is configured with a dynamic provisioner, Kubernetes will automatically create a new PV based on the specifications defined in the volumeClaimTemplates section.
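The claims produced by volumeClaimTemplates are named <template-name>-<statefulset-name>-<ordinal>. For the dedicated-storage StatefulSet in this section, the claim belonging to the first Pod would therefore look roughly like the trimmed sketch below; fields that the controller and the provisioner fill in, such as the bound volume name, are left out because their values cannot be known in advance.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # <template-name>-<statefulset-name>-<ordinal>
  name: mysql-shared-storage-mysql-hs-0
spec:
  storageClassName: google-storage   # copied from the template
  resources:
    requests:
      storage: 10Gi                  # copied from the template
```

Pods mysql-hs-1 and mysql-hs-2 get claims ending in -1 and -2. Because the name is derived from the Pod's ordinal, a Pod that is deleted and recreated finds its previous claim again.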
How to deploy an application in k8s?
An application in Kubernetes typically consists of YAML files that define the k8s resources needed to run the application, such as Deployments, Services, ConfigMaps, and
Secrets. You can deploy the application in Kubernetes manually by creating the YAML files and then using the `kubectl apply` command to create the Kubernetes resources on
the cluster. Alternatively, you can use deployment tools like Kustomize, Helm, or the Helm Operator to automate the deployment process and simplify the creation of the YAML
files. These tools provide a higher-level abstraction for managing Kubernetes resources and can make it easier to deploy and manage complex applications in Kubernetes.

>> Manual Deployment:


Manually deploying applications in Kubernetes involves creating YAML files that define the k8s resources needed to run the application, such as deployments, services, and
config maps. You would then use the kubectl apply command to create those resources on the Kubernetes cluster.

    kubectl apply -f deployment.yml
    kubectl apply -f service.yml
    kubectl apply -f statefulset.yml
    kubectl apply -f serviceaccount.yml

This approach can be useful for simple applications or for users who prefer a more hands-on approach, but it can be time-consuming and error-prone for more complex applications.

[Diagram: My-Application is made up of many separate resources: StatefulSet, ConfigMap, Deployment, Service, Secret, Ingress, ServiceAccount, PVC and StorageClass.]
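One small way to reduce the number of manual steps is to keep related resources in a single file, separated by ---, and apply them together. The sketch below is an assumption-laden example: the file name my-app.yaml, the resource names and the image are all hypothetical.

```yaml
# my-app.yaml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25      # placeholder image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app                # routes to the Deployment's Pods
  ports:
  - port: 80
    targetPort: 80
```

Running kubectl apply -f my-app.yaml creates or updates both objects in one call; kubectl apply -f also accepts a directory of manifest files.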

>> Kustomize
Kustomize is a tool for managing k8s manifest files using a declarative approach. It allows you to define a set of base manifests that define the desired state of your Kubernetes
resources, and then apply changes using composition and customization. The basic workflow of Kustomize consists of the following steps:

Create a base directory containing your Kubernetes manifests. This directory represents the desired state of your application or environment.

Define a kustomization.yaml file in the base directory. This file specifies the base resources to use, as well as any additional resources that should be added, modified, or removed.

Create overlay directories for each environment or application variant, if needed. These overlay directories contain additional resources or modifications to apply on top of the base resources.

    .
    ├── base
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   ├── statefulset.yaml
    │   ├── serviceaccount.yaml
    │   ├── configmap.yaml
    │   ├── secret.yaml
    │   └── kustomization.yaml
    ├── dev
    │   └── kustomization.yaml
    └── prod
        └── kustomization.yaml

[Diagram: 1 Base: YAMLs with common fields required for all environments (deployment.yaml, service.yaml, statefulset.yaml, serviceaccount.yaml, configmap.yaml, secret.yaml, kustomization.yaml). 2 Overlays: YAMLs with customization as per the environment (one kustomization.yaml each for Dev and Prod). 3 Kustomize builds customised manifests for each environment, which are then deployed to the dev cluster and the prod cluster.]

base/deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 1
      …

base/kustomization.yaml:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
    - deployment.yaml
    - service.yaml
    - statefulset.yaml
    - serviceaccount.yaml
    - configmap.yaml
    - secret.yaml

We create two overlay directories: dev and prod. Each overlay directory contains a kustomization.yaml file that specifies the base to use and any patches to apply.

dev/kustomization.yaml:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    bases:
    - ../base
    namePrefix: dev-
    patchesStrategicMerge:
    - |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment
      spec:
        replicas: 3

This kustomization.yaml file applies a patch to the nginx-deployment resource in the base directory, and adds a prefix "dev-" to the metadata name of all resources.

prod/kustomization.yaml:

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    bases:
    - ../base
    namePrefix: prod-
    patchesJson6902:
    - target:
        kind: Deployment
        name: nginx-deployment
      patch: |-
        - op: replace
          path: /spec/replicas
          value: 5

> bases specifies the base directory to use, in this case ../base.
> If the original name of the deployment resource is "nginx-deployment", it will be renamed to "prod-nginx-deployment" in the final set of manifests.
> The patch replaces the replicas field with a value of 5.
> The patchesJson6902 field is used to apply JSON patches, which are a more flexible and expressive way to modify resources compared to patchesStrategicMerge.
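Newer Kustomize releases favour a single patches field and list the base under resources. As a hedged sketch, a third overlay for a hypothetical staging environment (the directory name, prefix and replica count are assumptions) could be written like this:

```yaml
# staging/kustomization.yaml (hypothetical overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../base                  # the base directory is listed as a resource
namePrefix: staging-
patches:
- target:
    kind: Deployment
    name: nginx-deployment
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 2
```

kubectl kustomize staging prints the rendered manifests without touching the cluster, and kubectl apply -k staging applies them. Rendering first is a cheap way to check that the patch hit the intended resource.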

>> Helm
Helm is a widely-used package manager for Kubernetes that simplifies the deployment and management of applications on a k8s cluster. It enables developers to package their
applications as charts, which are reusable and shareable bundles that contain all the resources required to deploy an application on a Kubernetes cluster. With Helm, users can
easily search for charts, install and upgrade applications, rollback changes, and manage dependencies through a straightforward command-line interface. Additionally, Helm
supports versioning, which allows users to track changes to their applications over time and roll back to previous versions if necessary

Helm uses a packaging format called "charts". A chart is a collection of files that describe a related set of Kubernetes resources. For example, instead of manually creating deployments, services, and other K8s objects, you can package these into a Helm chart. Then, anyone can easily deploy your application by installing the chart.

[Diagram: the application's resources (StatefulSet, ConfigMap, Deployment, Service, Secret, Ingress, ServiceAccount, PVC, StorageClass) are packaged as a Helm chart made of templates, values.yaml and Chart.yaml.]

A Helm chart typically includes the following files:

Chart.yaml: This is the core file which includes the name, description, and version of the chart. This file is used by Helm to identify the chart and to provide information to the user when installing or upgrading the chart.

    apiVersion: v2
    name: wordpress
    description: A Helm chart for deploying WordPress on Kubernetes
    version: 1.0.0
    appVersion: 5.8.0
    maintainers:
    - name: Your Name
      email: [email protected]

values.yaml: This file contains the default values for the chart's parameters. These parameters are used in the templates to generate the Kubernetes YAML files. The user can override these values during installation or upgrade using the --set flag or a values file. This file is used to allow users to customize the behavior of the chart without modifying the templates directly. The values in this file are used by the templates in the templates/ directory to generate the k8s YAML files.

    wordpress:
      image: wordpress:5.8.0-php7.4-apache
      imagePullPolicy: IfNotPresent
      replicaCount: 1

To change a value, you can modify the values.yaml file and then run the helm upgrade command.

The template syntax, enclosed in double curly braces ({{ }}), is used to reference the values specified in values.yaml.

apiVersion: apps/v1
templates/ : This directory contains the Kubernetes YAML files that define the resources to be deployed. These files are usually written in a templating language kind: Deployment
metadata:

like Go templating or Helm's own template language. The templates can include placeholders for the values defined in the values.yaml file. The templates can also
name: {{ .Release.Name }}-wordpress
labels:
app: wordpress
include logic to conditionally include or exclude resources based on the values of the parameters spec:
replicas: {{ .Values.wordpress.replicaCount }}

spec:
containers:
- name: wordpress
image: {{ .Values.wordpress.image }}

helpers.tpl : This file contains reusable snippets of code that can be used in the templates. These snippets can be used to simplify the templates and make them more
imagePullPolicy: {{ .Values.wordpress.imagePullPolicy }}
ports:
- name: http
readable. For example, a helper function might generate a random password or generate a unique name for a resource. …
containerPort: 80
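The conditional logic mentioned for templates can be sketched as follows. This is a minimal illustration, assuming two hypothetical keys in values.yaml (ingress.enabled and ingress.host) and a Service named after the release; none of these names come from the chart shown above.

```yaml
# templates/ingress.yaml (hypothetical template file)
# The whole manifest is rendered only when ingress.enabled is true in values.yaml.
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}-wordpress
spec:
  rules:
  - host: {{ .Values.ingress.host | quote }}   # quote keeps the host a YAML string
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: {{ .Release.Name }}-wordpress   # assumed Service name
            port:
              number: 80
{{- end }}
```

With ingress.enabled set to false, the Ingress is simply left out of the rendered manifests, so the same chart can serve clusters with and without an Ingress controller.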
Do you want to deploy an application using Helm in Kubernetes? Here are the general steps to follow:

1 Install Helm: You need to install Helm on your local machine or on another machine that has access to the cluster where you will be deploying the application. You can follow the official Helm installation guide for your operating system to install Helm.
2 Add the Helm chart repository: Add the Helm chart repository that contains the application you want to deploy using the helm repo add command. You can specify a name for the repository and the URL of the repository.
3 Search for the Helm chart: Use the helm search command to search for the Helm chart that contains the application you want to deploy. You can specify the repository name or search all repositories.
4 Configure the Helm chart: Create a values.yaml file to configure the Helm chart. This file contains the values that will be used to replace the placeholders in the Kubernetes resource files.
5 Install the Helm chart: Use the helm install command to install the Helm chart to the Kubernetes cluster. You can specify the release name, namespace, and any other required parameters using the command line or a YAML file.
6 Verify the deployment: After the Helm chart has been installed, you can use kubectl commands to verify that the Kubernetes resources have been created and are running correctly.
7 Upgrade or roll back the Helm chart: If you need to make changes to the application, you can use the helm upgrade command to upgrade the Helm chart. If there are issues with the new version, you can use the helm rollback command to revert to a previous version.

How to deploy an application such as WordPress from a Helm repository using Helm?

Helm repositories -> Helm chart (WordPress) -> Kube-api -> Cluster / Namespace (Sts, Pod, Svc, Deploy, SA, CM, …)
> Helm repositories are collections of packaged Kubernetes resources, known as charts.
> Helm uses the Kubernetes configuration file (usually located at ~/.kube/config) to access the Kubernetes cluster.

1 Add the WordPress Helm chart repository:
You need to add the WordPress Helm chart repository to your local Helm installation. You can do this by running the following command:

    helm repo add bitnami https://charts.bitnami.com/bitnami

The Bitnami Helm repository contains a variety of charts for popular applications like WordPress and MySQL.

Update the repository: This step ensures that Helm has the latest versions of all the charts from the Bitnami repository.

    helm repo update

It is better to create a Kubernetes namespace for your WordPress installation in order to isolate the resources associated with your WordPress deployment from other resources running in the same Kubernetes cluster.

    kubectl create namespace wordpress
2 Customize the WordPress deployment:
Before deploying WordPress, let's customize some values. The default configuration can be obtained using the following command:

    helm show values bitnami/wordpress

You can customize the values in a Helm chart by using the --set flag when you install the chart or by creating a values.yaml file that overrides the default values.

> If you want to customize the installation, you can pass additional parameters to the Helm chart using the --set flag. For example, you can set a custom password for the WordPress administrator account by running the following command:

    helm install my-wordpress bitnami/wordpress --namespace wordpress --set [email protected]

> The WordPress Helm chart comes with a default values file (values.yaml) which contains all the configuration options. We'll create a custom values file (values.yaml) to override some of these defaults and customize the settings according to our needs.

values.yaml

    wordpressUsername: myusername
    wordpressPassword: mypassword
    wordpress:
      persistence:
        size: 20Gi
    #mariadb.auth.rootPassword= ROOT_PASSWORD
    mariadb:
      auth:
        rootPassword: ROOT_PASSWORD

3 Install the chart:
Once you have customized the values, you can install the chart on your Kubernetes cluster using the helm install command. To install the WordPress chart with a release name of my-wordpress, run the following command:

    helm install my-wordpress bitnami/wordpress --namespace wordpress --create-namespace -f values.yaml --set service.type=NodePort

The my-wordpress argument is the name of the release that Helm will use to track the installation, and the --namespace wordpress argument specifies the namespace in which to install WordPress. This command installs WordPress using the values.yaml file and sets the service.type value to NodePort.

Verify the chart:

    helm list -n wordpress

    NAME          NAMESPACE  REVISION  UPDATED                                  STATUS    CHART              APP VERSION
    my-wordpress  wordpress  1         2023-07-28 20:40:28.214200542 +03 +03    deployed  wordpress-16.1.33  6.2.2

> my-wordpress is the release name. A release is an instance of an application deployed by Helm from a chart.

Upgrade or roll back the chart:

    helm upgrade [RELEASE] [CHART] [flags]
    helm upgrade -f values2.yaml my-wordpress bitnami/wordpress -n wordpress

    helm list -n wordpress

    NAME          NAMESPACE  REVISION  UPDATED                                  STATUS    CHART             APP VERSION
    my-wordpress  wordpress  2         2023-09-26 11:42:00.841703412 +03 +03    deployed  wordpress-17.1.6  6.3.1

> A revision is a versioned change to the release. Each time a release is installed or upgraded, a new revision is created incrementally (rev 1, 2, 3 etc).

How to get custom values for a helm release?

    helm get values my-wordpress --revision=2 -n wordpress

    USER-SUPPLIED VALUES:
    wordpressPassword: "qazwsx"
    wordpressUsername: daniele

    helm rollback RELEASE_NAME REVISION_NUMBER
    helm rollback my-wordpress 1 -n wordpress
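The booklet does not show values2.yaml itself. A plausible sketch, inferred only from the USER-SUPPLIED VALUES that helm get values reports for revision 2, is given below; the real file may have contained additional keys.

```yaml
# values2.yaml (reconstructed from the helm get values output; an assumption,
# not the author's actual file)
wordpressUsername: daniele
wordpressPassword: "qazwsx"   # quoted so the value is always read as a string
```

Passing this file with -f during helm upgrade creates revision 2, and rolling back to revision 1 restores the values used at install time.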

>> Operator
An operator is a method of packaging, deploying, and managing a specific application or workload on a Kubernetes cluster. Operators are essentially Kubernetes controllers that
are designed to automate the deployment and management of complex applications or services.
An operator typically consists of custom resources, custom controllers, and a set of Kubernetes objects that are defined to manage the application or workload. The custom
resource is a Kubernetes object that represents the desired state of the application or workload, while the custom controller is responsible for ensuring that the actual state of
the application or workload matches the desired state.
Managing stateful applications in Kubernetes can be challenging, but operators are particularly well-suited for this task. For example, an operator for a database application might automate tasks such as provisioning new database instances, scaling the database up or down, performing backups and restores, and handling failovers.

Operators are often implemented using the Operator SDK, which provides a set of tools and libraries for building, testing, and deploying operators.

Operators can help to automate many routine tasks associated with managing complex applications in Kubernetes, and this can free up human operators to focus on more strategic tasks.

Without an operator: Dev -> Kind: StatefulSet -> StatefulSet -> Pods. The developer has to answer questions such as: How to scale up a StatefulSet? How to do leader election in Kubernetes? How to migrate databases? ..
With an operator: Dev -> Kind: myapp -> Operator (CRD, Controller, OLM) -> StatefulSet -> Pods.
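To make the "Kind: myapp" idea concrete, the sketch below defines a custom resource type and one instance of it. The API group example.com, the field names, and the instance name are all hypothetical; an operator's controller would watch such objects and create the StatefulSet and related resources on the user's behalf.

```yaml
# CustomResourceDefinition: teaches the API server about a new kind (hypothetical MyApp)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: MyApp
    plural: myapps
    singular: myapp
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
              backupSchedule:
                type: string
---
# Custom resource: the desired state that the operator's controller reconciles
apiVersion: example.com/v1alpha1
kind: MyApp
metadata:
  name: orders-db
spec:
  replicas: 3
  backupSchedule: "0 2 * * *"     # assumed cron-style schedule for backups
```

The developer only edits the MyApp object; how replicas are scaled or backups are taken is encoded once in the controller.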
Ingress
Ingress is an API object in Kubernetes that allows access to your Kubernetes services from outside the Kubernetes cluster. It provides load balancing, SSL termination and
name-based virtual hosting for your services. In other words, it's a way for your applications to expose URLs to the outside world.

Ingress provides externally reachable URLs, SSL termination and name-based virtual hosting to services in the cluster. This means you can route requests to different services based on the request host or path.

Ingress provides layer 7 load balancing. It acts as a reverse proxy and load balances traffic to different services in your Kubernetes cluster.

Ingress object allows you to expose multiple services through a single IP address.

If you use the LoadBalancer service type, the service is made available to clients outside the cluster through a load balancer. This approach is fine if you only need to expose a single service externally, but it becomes problematic with large numbers of services, since each service needs its own public IP address. Fortunately, by exposing these services through an Ingress object instead, you only need a single IP address.

Proxy Server or External LB -> Ingress -> Service (ClusterIP) app: wear -> Pods app: wear (Deployment wear-app)
                                       -> Service (ClusterIP) app: video -> Pods app: video (Deployment video-app)
Service (Headless) app: mysql -> Pods app: mysql, MySql-0 and MySql-1 (StatefulSet). The Service selector matches the Pod label.

Ingress consists of two main components: Ingress controller and Ingress resources.

> Ingress resource is a Kubernetes API object that defines the rules for how external traffic should be directed to services within a cluster. The ingress resource specifies the rules for routing traffic based on the host name, path, and other criteria. It also specifies the backend services that should receive the traffic.

> Ingress controller is responsible for implementing the rules defined in the ingress resource and handling external traffic based on these rules. Ingress controllers like Nginx use ConfigMaps to store the configuration for the ingress resources and dynamically generate Nginx configuration based on the rules defined in the ingress resource.

Kubernetes only provides the Ingress resource and needs a separate Ingress Controller to satisfy the Ingress. There are several options available, but for the purpose of this guide, we'll use the Nginx Ingress Controller.

Install the Nginx Ingress Controller

In order for the Ingress resource to work, the cluster must have an ingress controller running. Unlike other types of controllers which run as part of the kube-controller-manager binary, Ingress controllers are not started automatically with a cluster.
You have to select an Ingress Controller compatible with your setup and start it manually. (The actual implementation of Ingress is done by Ingress Controllers.)
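Once a controller is running, an Ingress is tied to it through an IngressClass. The sketch below assumes the community ingress-nginx controller, whose controller identifier is k8s.io/ingress-nginx; the class name nginx and the Ingress and Service names are placeholders chosen for illustration.

```yaml
# IngressClass: names the controller that should implement Ingresses of this class
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: k8s.io/ingress-nginx   # identifier used by the community ingress-nginx controller
---
# An Ingress selects the class (and therefore the controller) via ingressClassName
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress              # hypothetical name
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: example-service          # hypothetical Service
      port:
        number: 80
```

Installers for the controller usually create the IngressClass for you, so in practice only the ingressClassName field has to be set on each Ingress.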

How Does an Ingress Controller Work?


Here’s a simplified view of how an Ingress Controller works:
1 You define an Ingress Resource in your cluster, which has a set of routing rules associated with it.
2 The Ingress Controller continuously watches for updates to Ingress Resources, Service, and Endpoints or EndpointSlice objects. When one of these objects is created or modified, the
controller is notified and reads the information in it to understand what traffic routing changes it needs to make.
3 The Ingress Controller configures the load balancer to implement the desired traffic routing.

When an ingress resource is created or updated, the ingress controller reads the configuration information from the corresponding ConfigMap and generates the configuration for the load balancer based on the rules defined in the ingress resource. The load balancer configuration is then dynamically updated to reflect the changes in the ingress resource.

Client -> shop.com/video, shop.com/wear -> ingress controller (LB/Reverse Proxy), which watches Ingress resources, Service and EndpointSlices
    shop.com/video -> Service video-service -> Pods app: video (10.1.1.2, 10.1.1.3), Deployment: video-app
    shop.com/wear  -> Service wear-service  -> Pods app: wear (10.1.1.4, 10.1.1.5), Deployment: wear-app

1 Ingress resource is the YAML configuration that defines the rules for routing traffic.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-path
    spec:
      rules:
      - host: shop.com
        http:
          paths:
          - path: /video
            pathType: Prefix
            backend:
              service:
                name: video-service
                port:
                  name: http
          - path: /wear
            pathType: Prefix
            backend:
              service:
                name: wear-service
                port:
                  number: 80

Ingress resources define rules for routing HTTP/HTTPS traffic to services in a Kubernetes cluster. They specify path-based rules that map URLs to backend services.
There is one rule specified for the `shop.com` host, and under that rule, there are two paths (`/video` and `/wear`) defined for routing traffic to their respective backend services.

    apiVersion: v1
    kind: Service
    metadata:
      name: wear-service
    spec:
      selector:
        app: wear
      ports:
      - name: http
        port: 80
        targetPort: 8080

When configuring an Ingress resource, the "backend" field specifies the service that should receive the forwarded traffic. However, it's important to note that the traffic never directly reaches the service itself. Instead, the controller uses the service endpoints to route the traffic, not the service.

The ingress controller uses the service name specified in the ingress rules to look up the IP addresses of the pods backing that service. It then routes traffic to those pods according to the path matching rules defined in the ingress.

3 Nginx.conf generated by the controller:

    server {
        server_name shop.com ;
        listen 80 ;
        listen 443 ssl http2 ;
        location /video/ {
            set $namespace "default";
            set $ingress_name "shop-ingress";
            set $service_name "video-service";
            set $service_port "http";
            set $location_path "/video";
            …
        }
        location /wear/ {
            set $namespace "default";
            set $ingress_name "shop-ingress";
            set $service_name "wear-service";
            set $service_port "http";
            set $location_path "/wear";
        }
    }

    kubectl describe ingress ingress-path

    Name:             ingress-path
    Namespace:        default
    Address:
    Default backend:  <default>
    Rules:
      Host      Path    Backends
      ----      ----    --------
      shop.com
                /video  video-service:80 (10.1.1.2:5678, 10.1.1.3:5678)
                /wear   wear-service:80 (10.1.1.4:5678, 10.1.1.5:5678)
    Events:           <none>

Ingress controllers often include a default backend component that handles traffic that doesn't match any Ingress rules.
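The example above routes by path under a single host. Name-based virtual hosting, mentioned at the start of this section, routes by host name instead. The sketch below reuses video-service and wear-service; the subdomains video.shop.com and wear.shop.com and the Ingress name are assumptions made for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-host              # hypothetical name
spec:
  rules:
  - host: video.shop.com          # assumed subdomain, routed to video-service
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: video-service
            port:
              number: 80
  - host: wear.shop.com           # assumed subdomain, routed to wear-service
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: wear-service
            port:
              number: 80
```

Both hosts can resolve to the same external IP address; the controller picks the backend from the Host header of each request.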

How to customize Nginx Ingress Controller?

Helm Chart Values: If deploying the Ingress controller via Helm chart, you can customize settings by overriding chart values. The Helm chart exposes many config settings as values.
ConfigMap: Use a ConfigMap to set global configuration in NGINX. For example, you can specify custom log formats, change timeout values, enable features like GeoIP, etc.
Annotation: Use this if you want a specific configuration for a particular ingress rule.
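The difference between the ConfigMap and annotation approaches can be sketched as follows. The ConfigMap name and namespace are the usual defaults of the community ingress-nginx install and are assumptions here, as are the timeout and body-size values and the Ingress name.

```yaml
# Global settings: applied to every Ingress served by this controller
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed default name; check your installation
  namespace: ingress-nginx         # assumed default namespace
data:
  proxy-read-timeout: "120"        # ConfigMap values must be strings
  proxy-body-size: "20m"
---
# Per-rule setting: the annotation overrides the global value for this Ingress only
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-video                 # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  rules:
  - host: shop.com
    http:
      paths:
      - path: /video
        pathType: Prefix
        backend:
          service:
            name: video-service
            port:
              number: 80
```

Keeping defaults in the ConfigMap and exceptions in annotations avoids repeating the same setting on every Ingress.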

How to enable Basic Authentication for an ingress rule in Kubernetes?

This example shows how to add authentication to an Ingress rule using a secret that contains a file generated with htpasswd.

1 Generate the base64 encoded user/pass combo:

    htpasswd -nbm arye Heisenberg | base64

2 Convert htpasswd into a secret. kubectl base64-encodes literal values itself, so pass the plain htpasswd output here; the base64-encoded form from step 1 belongs in the data field of a Secret manifest:

    kubectl create secret generic shop-basic-auth --from-literal=auth=<htpasswd output>

Or

    apiVersion: v1
    kind: Secret
    metadata:
      name: shop-basic-auth
      namespace: default
    data:
      auth: YXJ5ZTokYXByMSQxbzAzWElTQiRJVFYudWh0dmcuVmV5d0t5a0s1cC4vCgo=

Or

    kubectl create secret generic shop-basic-auth --from-literal=username=arye --from-literal=password=Heisenberg

This last form stores username and password keys rather than an htpasswd entry under the auth key, so check that the secret type expected by your controller accepts this layout.

3 Configure the Ingress rule to use the basic authentication secret:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: shop.com-admin
      namespace: default
      annotations:
        kubernetes.io/ingress.class: "nginx"
        nginx.ingress.kubernetes.io/auth-type: basic
        nginx.ingress.kubernetes.io/auth-secret: shop-basic-auth
        nginx.ingress.kubernetes.io/auth-realm: Authentication Required
    spec:
      rules:
      - host: shop.com
        http:
          paths:
          - path: /admin
            pathType: Prefix
            backend:
              service:
                name: management-service
                port:
                  name: http

Rule >> shop.com
Path >> /admin (Authentication Required) -> management-service -> Pod
        /video -> video-service -> Pod
        /wear  -> wear-service -> Pod

You can find more examples in this link: kubernetes.github.io/ingress-nginx/examples/
How to enable TLS for an ingress rule in Kubernetes?

Method 1: Self-signed certificate

Generate a self-signed certificate and private key:

    openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout tls.key -out tls.crt -subj "/CN=arye.ir"

Create a secret containing the key and certificate:

    kubectl create secret tls tls-secret --key tls.key --cert tls.crt

Method 2: Use Certbot

Use Certbot to generate a TLS certificate for your domain.

    certbot --manual --preferred-challenges dns certonly -d arye.ir

Create a Kubernetes secret that contains the private key and the certificate:

    kubectl create secret tls tls-secret --key privkey.pem --cert cert.pem

Method 3: Use Cert-manager
…

Configure Ingress to Use the Certificate

Reference the tls-secret secret in your Ingress resource:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-path
      annotations:
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
    spec:
      tls:
      - hosts:
        - arye.ir
        secretName: tls-secret
      rules:
      - host: arye.ir
        http:
          paths:
          - path: /booklet
            pathType: Prefix
            backend:
              service:
                name: book-service
                port:
                  name: http

This YAML manifest describes an Ingress resource that enables TLS for the host "arye.ir", redirects HTTP traffic to HTTPS, and defines a routing rule for the path "/booklet" to the backend service named "book-service" on the specified port.

You can find more annotations in this link:
https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md

Cert-manager

Cert-manager is a certificate management controller for k8s. It helps with issuing and renewing certificates from various sources, such as Let's Encrypt, HashiCorp Vault, Venafi.
cert-manager ensures certificates are valid and up to date, and will attempt to renew certificates at a configured time before expiry.

Main features of cert-manager:
> Issuing and renewing certificates from a variety of sources
> DNS01 and HTTP01 ACME challenge solver support for Let's Encrypt certificates
> Issuing certificates for Certificate resources using CRDs
> Automated creation and updating of k8s Secrets with certificates
> Issuing certificates for Ingress resources with annotations

Cert-manager mainly uses two different custom Kubernetes resources (CRDs) to configure and control how it operates, as well as to store state. These resources are Issuers and Certificates.
Issuer is an object that represents a particular certificate authority or a specific method for issuing certificates. It defines the parameters and configurations required to request certificates.
An Issuer issues certificates within a single namespace, while a ClusterIssuer can be referenced from any namespace in the cluster. There are different types of issuers supported by cert-manager, such as:

> ACME Issuer: This type of issuer integrates with the Automated Certificate Management Environment (ACME) protocol, which is commonly used by Let's Encrypt and other CAs. ACME issuers automate the process of obtaining and renewing certificates.
> CA Issuer: This type of issuer is used when you have an existing certificate authority (CA) that you want to use for issuing certificates. It requires you to provide the CA's certificate and private key.
> Self-Signed Issuer: This type of issuer is used when you want to generate self-signed certificates within the Kubernetes cluster. It is typically used for testing or development purposes.
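As an illustration of the CA issuer type, the sketch below shows a namespaced Issuer backed by an existing CA key pair. The Secret name ca-key-pair and the Issuer name are placeholders; the Secret is assumed to be a TLS secret holding the CA certificate and its private key in the same namespace.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer                 # namespaced; use ClusterIssuer for cluster-wide scope
metadata:
  name: ca-issuer            # hypothetical name
  namespace: default
spec:
  ca:
    # Assumed Secret containing tls.crt (CA certificate) and tls.key (CA private key)
    secretName: ca-key-pair
```

Certificates that reference this Issuer are signed by the supplied CA, which is convenient for internal services that already trust that CA.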

Certificate resources allow you to specify the details of the certificate you want to request. They reference an issuer to define how they'll be issued.
A Certificate represents a desired state for a certificate and provides a way to request, issue, and renew certificates automatically.

What happens when you create a Certificate resource in cert-manager:

1. You create a Certificate resource with details like the domain name, secret name to store certificate, and reference to the Issuer
2. The cert-manager controller sees the new Certificate and kicks off the issuance process
3. cert-manager first checks if the referenced Issuer exists and is valid. The Issuer has the details for the certificate authority
4. cert-manager requests the certificate authority (CA) like Let's Encrypt to issue a certificate for the requested domain
5. The CA validates that you own/control the domain name by performing a challenge. For example, with HTTP challenge, you need to have a temporary file served on the domain
6. Once domain ownership is validated, the CA issues the signed certificate. The certificate is returned to cert-manager
7. cert-manager takes the certificate and creates or updates the Kubernetes secret defined in the Certificate. This secret will contain the certificate and private key.

In the accompanying diagram, the numbered arrows 1 to 7 connect the Certificate (domain: arye.ir), the Issuer (type: ACME, server: Let's Encrypt), the Cert-secret, and the Ingress and Service in the cluster.

How can I issue a certificate for the domain arye.ir using cert-manager from Let's Encrypt?

Configure Let's Encrypt Issuer -> Issue a Certificate -> Configure Ingress to Use the Certificate -> Validate the Setup

1 Configure Let's Encrypt Issuer
Cert-manager uses 'Issuer' or 'ClusterIssuer' resources to represent certificate authorities. We'll create a 'ClusterIssuer' for Let's Encrypt.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        # The ACME server URL
        server: https://acme-v02.api.letsencrypt.org/directory
        # Email address used for ACME registration
        email: [email protected]
        # Name of a secret used to store the ACME account private key
        privateKeySecretRef:
          name: letsencrypt-prod
        # Enable the HTTP-01 challenge provider
        solvers:
        - http01:
            ingress:
              class: nginx

> privateKeySecretRef specifies the name of the Kubernetes secret that will be used to store the ACME account private key.
> solvers specifies the method for verifying ownership of the domain. In this case, we are using the HTTP-01 challenge, which involves creating a temporary file in the web root of the domain and responding to an HTTP request to that file. The ingress field specifies that we will use an Ingress resource to serve the challenge.

A ClusterIssuer can be referenced by other resources like Certificate or Ingress. For example, the following ClusterIssuer can be used to automatically generate and manage self-signed certificates:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: selfsigned-issuer
    spec:
      selfSigned: {}

2 Issue a Certificate
Create a certificate resource to obtain the certificate from Let's Encrypt for the specified domain.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: arye-ir-cert
      namespace: default
    spec:
      secretName: arye-ir-tls
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      commonName: arye.ir
      dnsNames:
      - arye.ir
      - "*.arye.ir"
      duration: 2160h # 90 days

> secretName specifies the name of the Kubernetes secret that will be used to store the TLS certificate and private key.
> issuerRef specifies the name and kind of the Kubernetes resource that is used as the issuer for this certificate. In this case, we are using the previously defined letsencrypt-prod ClusterIssuer.
> commonName specifies the common name for the TLS certificate. In this case, it is set to arye.ir.
> dnsNames specifies the list of DNS names for which the TLS certificate should be issued. In this case, we are issuing the certificate for arye.ir and all subdomains of the specified domain. The wildcard entry is quoted because an unquoted * has a special meaning in YAML. Note that Let's Encrypt validates wildcard names only through the DNS-01 challenge, so the HTTP-01 solver above can cover arye.ir but not *.arye.ir.

After a few moments, cert-manager should issue a certificate for your domain and store it in the secret specified in the certificate resource. You can verify that the certificate has been issued by checking the contents of the secret:

    kubectl describe secret arye-ir-tls

    Data
    ====
    tls.crt: 2316 bytes
    tls.key: 1704 bytes

3 Configure Ingress to Use the Certificate
Your Ingress configuration should use the secret arye-ir-tls for its TLS configuration.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: arye.ir-ingress
    spec:
      tls:
      - hosts:
        - arye.ir
        - "*.arye.ir"
        secretName: arye-ir-tls
      rules:
      - host: arye.ir
        http:
          paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: your-service-name
                port:
                  number: 80

4 Validate the Setup
Check if the certificate has been issued successfully:

    kubectl describe certificate arye-ir-cert

    Name:         arye-ir-cert
    Namespace:    default
    Labels:       <none>
    Annotations:  <none>
    API Version:  cert-manager.io/v1
    Kind:         Certificate
    …
    Spec:
      Common Name:  arye.ir
      DNS Names:
        arye.ir
      Issuer Ref:
        Group:  cert-manager.io
        Kind:   ClusterIssuer
        Name:   letsencrypt-prod
      Secret Name:  tls-secret
    Status:
      Conditions:
        Last Transition Time:  2023-09-27T10:01:00Z
        Message:               Certificate is up to date and has not expired
        …
      Not After:  2024-09-27T10:00:00Z
    Events:  <none>

You can configure TLS for Ingress using annotations instead of Certificate resources:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: arye.ir-ingress
      annotations:
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
    spec:
      tls:
      - hosts:
        - arye.ir
        - "*.arye.ir"
        secretName: arye-ir-tls
      rules:
      - host: arye.ir

> cert-manager.io/cluster-issuer: References the ClusterIssuer resource in cert-manager that will be used to obtain the certificate.

• Use annotations for basic, single TLS certificate configuration. Simpler, but less flexible.
• Use Certificate resources for multiple certificates, automation, and advanced management. More complex, but more flexible and powerful.
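Because the Certificate in this walkthrough also lists the wildcard name *.arye.ir, and Let's Encrypt validates wildcard names only through DNS-01, a DNS-01 solver is needed for that entry. The sketch below assumes the domain's DNS is hosted at Cloudflare, which is one of the providers cert-manager supports; the issuer name, the Secret name and key, and the choice of provider are assumptions, not part of the booklet's setup.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns          # hypothetical name, distinct from letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-dns      # stores the ACME account private key
    solvers:
    - dns01:
        cloudflare:                   # assumed DNS provider
          apiTokenSecretRef:
            name: cloudflare-api-token   # assumed Secret holding an API token
            key: api-token
```

A Certificate that needs the wildcard name would then point its issuerRef at this ClusterIssuer instead of the HTTP-01 based one.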
Add-ons

Kubernetes has a rich ecosystem of add-ons and extensions that provide additional functionality and features to enhance and extend the capabilities of a Kubernetes cluster.
So far, we have covered a few of these add-ons in the booklet. Now, let's introduce some additional add-ons that can further enhance your Kubernetes experience:

Argo CD is a powerful open-source tool designed for Kubernetes, enabling GitOps continuous delivery. It simplifies application deployment and management by utilizing a declarative approach. With Argo CD,
you can define the desired state of your applications using Kubernetes manifests stored in a Git repository. It provides a user-friendly graphical interface to monitor application status, track changes, and roll
back if needed. By following GitOps principles, Argo CD ensures that your cluster's configuration matches the desired state defined in the repository, automatically deploying and updating applications.
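In Argo CD, the "desired state in a Git repository" is declared through an Application resource. The following is a minimal sketch; the repository URL, path, and namespaces are placeholders, and Argo CD is assumed to be installed in the argocd namespace.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: wordpress
  namespace: argocd                    # assumed Argo CD installation namespace
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-repo.git   # placeholder repository
    targetRevision: main
    path: manifests/wordpress          # placeholder directory with Kubernetes manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: wordpress
  syncPolicy:
    automated:
      prune: true                      # delete resources that were removed from Git
      selfHeal: true                   # revert manual changes made in the cluster
    syncOptions:
    - CreateNamespace=true
```

With automated sync enabled, a commit to the repository is enough to roll out a change, and reverting the commit rolls it back.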

Service mesh add-ons, like Istio and Linkerd, are powerful tools that enhance the networking and observability capabilities of microservices within a Kubernetes cluster. By integrating transparently with the
cluster, they offer advanced features for traffic management, security, and distributed tracing. These service mesh solutions enable fine-grained control over traffic routing, load balancing, and fault tolerance
mechanisms, ensuring efficient and reliable communication between microservices. With built-in security features like mutual TLS authentication and encryption, they provide robust protection for service-to-
service communication. Additionally, service mesh add-ons enable comprehensive observability with distributed tracing, metrics collection, and logging, allowing for deep insights into the behavior and
performance of microservices.
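As a small example of the mutual TLS feature, the sketch below uses Istio's PeerAuthentication resource to require mTLS for every workload in one namespace. The namespace name shop is an assumption, and the workloads are assumed to already have the Istio sidecar injected.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: shop          # hypothetical namespace; applies to all workloads in it
spec:
  mtls:
    mode: STRICT           # reject plain-text traffic between services
```

Linkerd takes a different approach and enables mutual TLS between meshed pods by default, so no comparable policy object is required there.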

Rook and Longhorn are two notable storage-related add-ons for Kubernetes. Rook is a cloud-native storage orchestrator that enables the deployment and management of storage solutions as native
Kubernetes resources. It automates the provisioning, scaling, and lifecycle management of distributed storage systems, most notably Ceph. On the other hand, Longhorn is a lightweight, open-source
distributed block storage system built for Kubernetes. It provides reliable, replicated block storage for stateful applications, offering features like snapshots, backups, and volume expansion. Together,
Rook and Longhorn empower Kubernetes users to easily deploy and manage resilient, scalable, and persistent storage solutions within their clusters, enhancing the availability and data management
capabilities of their applications.
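To show how such storage is consumed, the sketch below defines a StorageClass backed by Longhorn and a claim that uses it. It assumes Longhorn is already installed in the cluster; the class name, replica count, and volume size are illustrative values.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated        # hypothetical class name
provisioner: driver.longhorn.io    # Longhorn's CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"            # assumed: keep three copies of each volume
  staleReplicaTimeout: "2880"      # minutes before a failed replica is cleaned up
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data                 # hypothetical claim for a stateful workload
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn-replicated
  resources:
    requests:
      storage: 10Gi
```

A StatefulSet can reference the same class in its volumeClaimTemplates so that every replica gets its own replicated volume.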

Monitoring and logging add-ons, such as Prometheus, facilitate the collection and storage of time-series data and metrics from diverse Kubernetes components and applications, enabling comprehensive
analysis and alerting capabilities. Additionally, Fluentd serves as a dependable log aggregation tool, simplifying the gathering, parsing, and forwarding of logs from multiple sources to ensure centralized and
scalable log management. The ELK (Elasticsearch, Logstash, and Kibana) stack offers a comprehensive solution for monitoring and logging, utilizing Elasticsearch for efficient log indexing and searching,
Logstash for data processing and filtering, and Kibana for visualizing and analyzing log data. Together, these add-ons provide Kubernetes users with powerful tools for monitoring performance, detecting
issues, and gaining valuable insights to optimize their Kubernetes environments.

Additionally, the CNCF landscape is an excellent resource for discovering and exploring a vast array of add-ons and tools within the cloud-native ecosystem. It offers a visual representation of different
projects and categories, allowing users to navigate through various technologies that can enhance their Kubernetes deployments. Whether you're looking for monitoring and observability tools, networking
solutions, or storage options, the CNCF landscape provides a comprehensive overview of the available options. By exploring the CNCF landscape, you can expand your knowledge and make informed
decisions about incorporating the right add-ons and tools into your Kubernetes and cloud-native environments. It's a valuable resource for staying up-to-date with the latest innovations and finding the best
solutions to meet your specific needs.
