k8s Primer


Architecture

Master
Master is the controlling element of the cluster. Some people call it the “Brain” of the cluster. It is the
only endpoint that is exposed to the users of the cluster. For fault tolerance, one cluster may have
multiple masters.
The master has 4 parts:

1. API server: This is the front end that communicates with the user. It is a REST-based API designed
to consume JSON inputs. By default, it listens on port 443.
2. Scheduler: The scheduler watches the API server for new Pod requests and assigns them to nodes,
taking resource requirements and constraints into account.
3. Cluster store: The cluster store is persistent storage holding the cluster state and configuration
details. It uses etcd (an open-source distributed key-value store) to store this data.
4. Controllers: These include the Node controller, Endpoint controller, Namespace controller, etc.
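On a kubeadm-style cluster, these master components typically run as static pods in the kube-system
namespace, so you can usually inspect them with:

kubectl get pods -n kube-system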

Nodes (Slaves/Minions)
Nodes are the workers. They are the ones that do all the “Work” assigned to the cluster. Inside a Node,
there are 3 main components, apart from the “Pods” (I will talk about Pods later on). Those 3 parts are:
1. Kubelet: The kubelet does a lot of work inside a Node. It registers the node with the cluster, watches
for work assignments from the scheduler, instantiates new Pods, reports back to the master, etc.
2. Container Engine: The container engine is responsible for managing containers. It does all the
image pulling, container stopping, starting, etc. The most widely used container engine is Docker, but
you can also use rkt (Rocket).
3. Kube Proxy: kube-proxy handles Service networking on the node. It maintains the network rules that
forward and load-balance traffic sent to a Service to the pods behind it. Each pod still gets its own IP
address when it is created.

Apart from those components, Nodes run their own default pods for logging, health checking, DNS, etc.
Each node also exposes 3 read-only kubelet endpoints, usually on localhost:10255:
● /spec
● /healthz
● /pods
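If the kubelet's read-only port is enabled (it is disabled by default in many newer clusters), these
endpoints can be queried directly from the node, for example:

curl http://localhost:10255/healthz
curl http://localhost:10255/pods
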
Essential Components of Kubernetes
There are a few main components of the Kubernetes cluster architecture that anyone should know
before starting to work with Kubernetes. The first one is the Pod:

Pods

A pod is the atomic unit of deployment or scheduling in Kubernetes.

The Pod is a ring-fenced environment with its own network stack and kernel namespaces. It has
containers inside; no pod can exist without a container. There can be single-container pods or
multi-container pods depending on the application we deploy.

For example, if you have a tightly coupled application with an API and a log, you can use one container
for the API and another for the log, and deploy both of them in the same Pod. However, the industry
best practice is to go with single-container Pods.
Another small thing to note about Pods is that they are “mortal”. Confused? Let me explain. A pod’s
life-cycle has 3 stages:

Pending → Running → Succeeded/Failed

This is similar to Born → Living → Dead. There is no resurrection and no re-birth. If a Pod dies without
completing its task, a new Pod is created to replace it. Most importantly, the new pod’s IP and all other
identifying details will be different from those of the dead pod.

Deployment Controller

To manage Pods, Kubernetes provides numerous controllers. The controller used for deployments and
declarative updates is known as the Deployment Controller.

In the Deployment object (the most commonly used format is a YAML file, but in this tutorial I use the
command line) we can describe our “desired state”: which image to deploy, which ports to expose, how
many replicas to run, which labels to add, etc. The Deployment Controller checks this desired state
periodically and makes changes in the cluster to ensure the desired state is achieved.
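For example, with a recent kubectl a Deployment can be created and inspected from the command line
like this (the image and replica count are illustrative):

kubectl create deployment backend --image=nginx --replicas=3
kubectl get deployments
kubectl get pods -l app=backend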

Service

Another component I am going to use in this tutorial is the “Service”. Before explaining what a Service
is, I will describe why we need one.

As I mentioned earlier, Pods are mortal. When a pod dies, a new one is born to take its place. It doesn’t
have the same IP address as the dead one.

So think of a scenario where we have a system with both a frontend service and a backend service. For
the frontend to call the backend, we need an IP or URL. Let's assume we used the pod IP of the backend
service inside the frontend code. We face three issues:

1. We need to first deploy our backend and take its IP. Then we need to include it in the frontend
code before building the Docker image. This order must be followed.

2. What if we want to scale our backend? We need to update the frontend again with the
new pod IPs.

3. If the backend pod dies, a new pod will be created. Then we need to change the frontend
code with the new pod IP and build the Docker image again. We also have to swap the
image in the frontend. This becomes even more problematic if the backend has several
pods.

That is too much complicated work, and it is why we need a “Service”.
How Kubernetes Service works
A Service has its own stable IP address and DNS name, so the frontend is successfully decoupled from
the backend pods. A Service is therefore a high-level, stable abstraction over multiple pods.
For the discovery of Pods, a Service uses “labels”: Pods belong to a Service via labels. When initializing
the Service, we describe which labels the Service should look for via the “selector” field. If the Service
finds a Pod with all the labels mentioned in the selector, it adds that pod to its endpoint list. (A pod may
carry extra labels beyond those mentioned, but it must not be missing any label in the selector.)
When a request comes to the Service, it uses a method such as round-robin or random selection to pick
the pod to forward the request to.
Using a Service object gives us many advantages, such as forwarding requests only to healthy pods,
load balancing, rolling back versions, etc. But the most important advantage of a Service is the
successful decoupling of system components.
There are 4 types of Services available in Kubernetes, which we can choose according to our purpose
(source: kubernetes.io, 2019):
1. ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service
only reachable from within the cluster. This is the default ServiceType.
2. NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP
service, to which the NodePort service will route, is automatically created. You’ll be able to
contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
3. LoadBalancer: Exposes the service externally using a cloud provider’s load balancer. NodePort
and ClusterIP services, to which the external load balancer will route, are automatically created.
4. ExternalName: Maps the service to the contents of the externalName field (e.g.
foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set
up. This requires version 1.7 or higher of kube-dns
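For example, a minimal ClusterIP Service that selects pods labelled app=backend and forwards its port
80 to the pods' port 8080 could look like this (names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080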

Minikube

➜ ~ minikube start --nodes 3 -p multinode


➜ ~ minikube profile multinode <setting default profile to multinode>
✅ minikube profile was successfully set to multinode
➜ ~ minikube ip
192.168.59.100
https://faun.pub/metallb-configuration-in-minikube-to-enable-kubernetes-service-of-type-loadbalancer-9559739787df

Setup

kubeadm, kubelet and kubectl


1. kubeadm: the command to bootstrap the cluster.
2. kubelet: the component that runs on all of the machines in your cluster and does things like
starting pods and containers.
3. kubectl: the command line util to talk to your cluster.

Common

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt install -qq -y kubeadm=1.21.0-00 kubelet=1.21.0-00 kubectl=1.21.0-00
sudo apt-mark hold kubelet kubeadm kubectl

OPTIONAL

In this particular case Docker was using the cgroupfs cgroup driver, which I changed to systemd.
Create the file as:
[root@k8smaster ~]# vim /etc/docker/daemon.json
{ "exec-opts": ["native.cgroupdriver=systemd"] }
[root@k8smaster ~]# systemctl restart docker
[root@k8smaster ~]# systemctl status docker
Master Node

1. kubeadm init --apiserver-advertise-address=<<Master ServerIP>> --pod-network-cidr=192.168.0.0/16
2. mkdir -p $HOME/.kube
3. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
4. sudo chown $(id -u):$(id -g) $HOME/.kube/config
5. kubectl create -f https://docs.projectcalico.org/v3.18/manifests/calico.yaml

Dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.0/aio/deploy/recommended.yaml
kubectl get services --all-namespaces
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard
kubectl proxy --address='0.0.0.0' --disable-filter=true &
kubectl get services --all-namespaces
http://192.168.1.244:8001/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/#/workloads?namespace=default

Starting kube api server


minikube start --extra-config=apiserver.Features.EnableSwaggerUI=true

The easiest way to access the Kubernetes API when running minikube is to use
kubectl proxy --port=8080

You can then access the API with


curl http://localhost:8080/api/

Kubectl

Port Forwarding

Kubectl port-forward allows you to access and interact with internal Kubernetes cluster processes from
your localhost. You can use this method to investigate issues and adjust your services locally without the
need to expose them beforehand.
Even though Kubernetes is a highly automated orchestration system, the port forwarding process
requires direct and recurrent user input. A connection terminates once the pod instance fails, and it’s
necessary to establish a new forwarding by entering the same command manually.

1. The port-forward command specifies the cluster resource name and defines the port number to
port-forward to.
2. As a result, the Kubernetes API server establishes a single HTTP connection between your
localhost and the resource running on your cluster.
3. The user is now able to engage that specific pod directly, either to diagnose an issue or debug if
necessary.

Port forwarding is a work-intensive method. However, in some cases, it is the only way to access internal
cluster resources.
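For example, assuming a pod named nginx that listens on port 80, you could forward a local port to it
(or to the pods behind a Service or Deployment) like this:

kubectl port-forward pod/nginx 8080:80
kubectl port-forward svc/nginx 8080:80
kubectl port-forward deployment/nginx 8080:80

curl http://localhost:8080/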

K8S API

kubectl proxy --port=8080


$ curl http://localhost:8080/api/v1/nodes
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/nodes",
    "resourceVersion": "602451"
  },
  "items": [
    {
      "metadata": {
        "name": "master01",
        "selfLink": "/api/v1/nodes/master01",
        ...
  ]
}
Indexer

In Kubernetes, an indexer is a component that provides efficient indexing and querying capabilities for
resources in the Kubernetes API server. It is responsible for maintaining an index of the desired objects
based on specified fields.
The indexer is used by various Kubernetes controllers, clients, and other components to quickly retrieve
and filter objects based on specific criteria. It allows for efficient searching and retrieval of resources
without the need to iterate through all objects.
When an object is created or updated in the Kubernetes API server, the indexer updates its index
accordingly. This allows for fast lookup and retrieval of objects based on various fields, such as labels,
annotations, or custom-defined fields.
The indexer plays a crucial role in enabling efficient and performant operations within the Kubernetes
ecosystem, especially when dealing with large-scale deployments and managing numerous resources. It
enhances the overall responsiveness and scalability of the Kubernetes API server by providing optimized
access to resources based on different search criteria.
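A minimal sketch of an indexer built with client-go, indexing pods by the value of their app label (the
index name and the sample pod are illustrative assumptions):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// Index objects by the value of their "app" label.
	indexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{
		"byApp": func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			return []string{pod.Labels["app"]}, nil
		},
	})

	// Add an object to the thread-safe store; its default key becomes "default/backend-1".
	indexer.Add(&corev1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name:      "backend-1",
		Namespace: "default",
		Labels:    map[string]string{"app": "backend"},
	}})

	// Fast lookup via the index instead of scanning every object in the store.
	objs, _ := indexer.ByIndex("byApp", "backend")
	for _, o := range objs {
		pod := o.(*corev1.Pod)
		fmt.Println(pod.Namespace + "/" + pod.Name)
	}
}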

Cache

In Kubernetes, the cache object is a client-side cache mechanism that is utilized by various components
to store and retrieve information from the Kubernetes API server. It acts as an in-memory cache of the
Kubernetes API resources, allowing for faster access and reducing the need for repeated API calls.
The cache object is typically used by controllers, clients, and other Kubernetes components that require
frequent access to resource information. It helps improve the performance and efficiency of operations
by reducing the network latency and API server load.
When using the cache object, the client fetches the desired resources from the API server and stores
them in the cache. Subsequent requests for the same resources can then be served directly from the
cache, eliminating the need for additional API calls. The cache object automatically manages the
synchronization and refreshing of the stored resource information to ensure its accuracy and consistency.
Additionally, the cache object provides functionalities such as indexing and event handling. It can index
resources based on specific fields, allowing for efficient lookup and filtering operations. It also receives
and processes events from the API server, keeping the cached resources up to date with any changes
happening in the cluster.
By utilizing the cache object, Kubernetes components can optimize their interactions with the API server,
improve performance, and reduce the overall load on the cluster.

Informers

In Kubernetes, the Informer object is a client-side caching and event handling mechanism provided by
the client-go library. It enables efficient tracking and retrieval of resource changes from the Kubernetes
API server.
The Informer acts as a controller that watches and synchronizes a specific set of resources with the API
server. It maintains a local cache of the watched resources and keeps it up to date by handling incoming
events. These events can include creations, updates, deletions, or other changes to the watched
resources.
By using the Informer, client applications can avoid making frequent direct API calls to the server and
instead rely on the cached data. This improves performance and reduces unnecessary network traffic
and API server load.
In addition to caching, the Informer provides a convenient way to handle events related to the watched
resources. It allows developers to define event handlers that get executed when specific types of events
occur. This enables applications to react to changes in real-time and take appropriate actions, such as
updating local state, triggering additional processes, or sending notifications.
The Informer can be configured with various options, such as the resource type, namespace, and update
frequency, to tailor its behavior according to the application's needs.
Overall, the Informer object simplifies resource synchronization and event handling in Kubernetes client
applications, improving efficiency and responsiveness while reducing the reliance on direct API calls.
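A minimal sketch of a shared informer for Pods using client-go (kubeconfig handling is simplified and
assumes ~/.kube/config):

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config; inside a cluster you would use rest.InClusterConfig() instead.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// The factory maintains the local cache and resyncs it every 30 seconds.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			fmt.Println("pod added:", pod.Namespace+"/"+pod.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			fmt.Println("pod updated:", pod.Namespace+"/"+pod.Name)
		},
		DeleteFunc: func(obj interface{}) {
			fmt.Println("pod deleted")
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	// Wait for the initial LIST to be cached, then keep serving events from the cache.
	cache.WaitForCacheSync(stop, podInformer.HasSynced)
	select {}
}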

ClientSet

In Kubernetes, the clientset is a client library provided by client-go, which is the official Go client for
interacting with the Kubernetes API server. The clientset serves as a high-level interface that simplifies
the process of interacting with various Kubernetes resources and performing operations on them.
The clientset is generated from the Kubernetes API specification and provides a set of typed client
objects for each resource type in the API. These client objects offer methods and functions that abstract
away the complexities of directly interacting with the API server, making it easier for developers to
perform CRUD (Create, Read, Update, Delete) operations on Kubernetes resources.
With the clientset, developers can easily create, retrieve, update, and delete resources such as pods,
services, deployments, namespaces, and more. It handles the low-level details of authenticating with
the API server, constructing API requests, and processing responses.
The clientset also supports various optional configuration parameters that allow customization, such as
specifying the API server URL, authentication credentials, timeouts, and transport settings.
By utilizing the clientset, developers can write Kubernetes applications in Go and interact with the
Kubernetes API server in a more intuitive and efficient manner. It provides a convenient and idiomatic
way to work with Kubernetes resources, reducing the amount of boilerplate code needed and improving
productivity.
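A minimal sketch of using a clientset to list pods (the namespace and label selector are illustrative):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Typed call that translates to GET /api/v1/namespaces/default/pods?labelSelector=app%3Dbackend
	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=backend"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Println(p.Name, p.Status.Phase)
	}
}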

Controllers

Kubernetes controllers are components that track at least one Kubernetes resource type. Each resource
object has a spec field that represents the desired state. The controller(s) for that resource are
responsible for ensuring that the current state is as close as possible to the desired state. There are
many different types of controllers in Kubernetes, each with its own specific purpose.
What is the Controller Manager?

The Controller Manager is a key component of the Kubernetes control plane. It is responsible for running
various controllers that watch the state of Kubernetes resources and reconcile the actual state with the
desired state.
The Controller Manager runs as a set of processes on the Kubernetes master node. Examples of
controllers that ship with Kubernetes today are the replication controller, endpoints controller,
namespace controller, and serviceaccounts controller. Each of these controllers has a specific set of
responsibilities, such as managing the number of replicas for a given deployment or ensuring that the
endpoints for a service are up-to-date.

Controller vs Controller Manager

● Controller: Tracks at least one Kubernetes resource type and is responsible for making the
current state come closer to the desired state.
● Controller Manager: A Kubernetes control plane component that runs multiple controllers
and ensures that they are functioning correctly.

In simpler terms, a controller is responsible for managing a specific resource’s desired state, while the
controller manager manages multiple controllers and ensures they are working as intended.

Top 10 Important Kubernetes Controllers

As a beginner, it can be overwhelming to learn about all the controllers available in Kubernetes. But not
all controllers are created equal. In this post, we will identify the top 10 Kubernetes controllers that will
provide a focused learning plan to master them.

1. ReplicaSet Controller

The ReplicaSet controller is responsible for ensuring that the specified number of replicas of a pod is
running at all times. It is a successor to the deprecated Replication Controller and is widely used in
Kubernetes deployments.

2. Deployment Controller

The Deployment controller is used to manage the deployment of pods in a declarative way. It can scale
up or down the number of replicas, do a rolling update, and revert to an earlier version if necessary.

3. StatefulSet Controller

The StatefulSet controller is used to manage stateful applications that require unique network identifiers
and stable storage. It ensures that pods are deployed in a predictable order and that each pod has a
unique hostname.
4. DaemonSet Controller

The DaemonSet controller ensures that a copy of a pod runs on each node in the cluster. It is commonly
used for tasks such as logging and monitoring.

5. Job Controller

The Job controller manages batch tasks in the Kubernetes cluster. It ensures that the job is completed
successfully and terminates when the specified number of completions is reached.

6. CronJob Controller

The CronJob controller is used to create jobs that run on a schedule. It is commonly used for tasks such
as backups and cleanup.

7. Namespace Controller

The Namespace controller is used to create and manage namespaces in the Kubernetes cluster.
Namespaces are a way to divide cluster resources between multiple users.

8. ServiceAccount Controller

The ServiceAccount controller is used to manage service accounts in the Kubernetes cluster. Service
accounts are used to provide an identity to pods and control access to resources.

9. Service Controller

The Service controller is used to manage Kubernetes services. It ensures that requests are routed to the
appropriate pods based on labels and selectors.

10. Ingress Controller

The Ingress controller is used to manage the ingress resources in the Kubernetes cluster. It allows
external traffic to access the services in the cluster.

Writing custom controller


The following representation shows how the various components in the client-go library work and their
interaction points with the custom controller code that you will write.
The picture is divided into two parts: client-go and the custom controller.

1. Reflector: A reflector watches the Kubernetes API for the specified resource type (kind). This could be
an in-built resource or it could be a custom resource. When it receives notification about the existence of
a new resource instance through the watch API, it gets the newly created object using the corresponding
listing API. It then puts the object in a DeltaFIFO queue.
2. Informer: An informer pops objects from the DeltaFIFO queue. Its job is to save objects for later
retrieval, and invoke the controller code passing it the object.
3. Indexer: An indexer provides indexing functionality over objects. A typical indexing use-case is to
create an index based on object labels. Indexers can maintain indexes based on several indexing
functions. Indexer uses a thread-safe data store to store objects and their keys. There is a default
function that generates an object’s key as the <namespace>/<name> combination for that object.

Custom Controller components


1. Informer reference: This is the reference to the Informer instance that knows how to work with your
custom resource objects. Your custom controller code needs to create the appropriate Informer.
2. Indexer reference: This is the reference to the Indexer instance that knows how to work with your
custom resource objects. Your custom controller code needs to create this. You will be using this
reference for retrieving objects for later processing. client-go provides functions to create Informer and
Indexer according to your needs. In your code you can either directly invoke these functions or use
factory methods for creating an informer.
3. Resource Event Handlers: These are the callback functions which will be called by the Informer when
it wants to deliver an object to your controller. The typical pattern to write these functions is to obtain
the dispatched object’s key and enqueue that key in a work queue for further processing.
4. Work queue: This is the queue that you create in your controller code to decouple delivery of an object
from its processing. Resource event handler functions are written to extract the delivered object’s key
and add that to the work queue.
5. Process Item: This is the function that you create in your code which processes items from the work
queue. There can be one or more other functions that do the actual processing. These functions will
typically use the Indexer reference, or a Listing wrapper to retrieve the object corresponding to the key.
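A rough sketch (under the same client-go assumptions as the earlier examples) of how the resource
event handlers, work queue, and indexer fit together, without retries or real reconciliation logic:

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	informer := factory.Core().V1().Pods().Informer()
	indexer := informer.GetIndexer()
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// Resource event handlers only enqueue the object's key; processing happens later.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, informer.HasSynced)

	// Process Item: pop a key from the work queue and look the object up via the indexer.
	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		obj, exists, err := indexer.GetByKey(key.(string))
		if err == nil && exists {
			fmt.Println("reconciling", key, obj != nil)
		}
		queue.Done(key)
	}
}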

You can try out our Postgres custom resource to see how these components fit together in real code.
This custom resource has been developed following the sample-controller available in Kubernetes.

Ref
1.: Generating ClientSet/Informers/Lister and CRD for Custom Resources | Writing K8S Operator - …
2. Writing a Kubernetes custom controller (ekspose) from scratch to expose your deployment | Par…

Operators


Pods

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.

A Pod (as in a pod of whales or pea pod) is a group of one or more containers, with shared storage and
network resources, and a specification for how to run the containers. A Pod's contents are always
co-located and co-scheduled, and run in a shared context. A Pod models an application-specific "logical
host": it contains one or more application containers which are relatively tightly coupled. In non-cloud
contexts, applications executed on the same physical or virtual machine are analogous to cloud
applications executed on the same logical host.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80

You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods
are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or
indirectly by a controller), the new Pod is scheduled to run on a Node in your cluster. The Pod remains
on that node until the Pod finishes execution, the Pod object is deleted, the Pod is evicted for lack of
resources, or the node fails.

You can use workload resources to create and manage multiple Pods for you. A controller for the
resource handles replication and rollout and automatic healing in case of Pod failure. For example, if a
Node fails, a controller notices that Pods on that Node have stopped working and creates a replacement
Pod. The scheduler places the replacement Pod onto a healthy Node.

Here are some examples of workload resources that manage one or more Pods:

● Deployment
● StatefulSet
● DaemonSet

Pod Lifecycle

This page describes the lifecycle of a Pod. Pods follow a defined lifecycle, starting in the Pending phase,
moving through Running if at least one of its primary containers starts OK, and then through either the
Succeeded or Failed phases depending on whether any container in the Pod terminated in failure.
Pods are only scheduled once in their lifetime. Once a Pod is scheduled (assigned) to a Node, the Pod
runs on that Node until it stops or is terminated.

Like individual application containers, Pods are considered to be relatively ephemeral (rather than
durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they
remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to
that node are scheduled for deletion after a timeout period.

Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted;
likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes
uses a higher-level abstraction, called a controller, that handles the work of managing the relatively
disposable Pod instances.

A given Pod (as defined by a UID) is never "rescheduled" to a different node; instead, that Pod can be
replaced by a new, near-identical Pod, with even the same name if desired, but with a different UID.

When something is said to have the same lifetime as a Pod, such as a volume, that means that the thing
exists as long as that specific Pod (with that exact UID) exists. If that Pod is deleted for any reason, and
even if an identical replacement is created, the related thing (a volume, in this example) is also destroyed
and created anew.

Here are the possible values for phase:

Value Description

Pending The Pod has been accepted by the Kubernetes cluster, but one or more of the containers
has not been set up and made ready to run. This includes time a Pod spends waiting to
be scheduled as well as the time spent downloading container images over the network.

Running The Pod has been bound to a node, and all of the containers have been created. At least
one container is still running, or is in the process of starting or restarting.

Succeeded All containers in the Pod have terminated in success, and will not be restarted.
Failed All containers in the Pod have terminated, and at least one container has terminated in
failure. That is, the container either exited with non-zero status or was terminated by the
system.

Unknown For some reason the state of the Pod could not be obtained. This phase typically occurs
due to an error in communicating with the node where the Pod should be running.

Pod Conditions

A Pod has a PodStatus, which has an array of PodConditions through which the Pod has or has not
passed:

● PodScheduled: The Pod has been scheduled to a node.


● ContainersReady: all containers in the Pod are ready.
● Initialized: all init containers have started successfully.
● Ready: The Pod is able to serve requests and should be added to the load balancing pools of all
matching Services.

Field name Description

type Name of this Pod condition.

status Indicates whether that condition is applicable, with possible values "True",
"False", or "Unknown".

lastProbeTime Timestamp of when the Pod condition was last probed.

lastTransitionTime Timestamp for when the Pod last transitioned from one status to another.

reason Machine-readable, UpperCamelCase text indicating the reason for the condition's last
transition.

message Human-readable message indicating details about the last status transition.

Pod Readiness

FEATURE STATE: Kubernetes v1.14 [stable]


Your application can inject extra feedback or signals into PodStatus: Pod readiness. To use this, set
readinessGates in the Pod's spec to specify a list of additional conditions that the kubelet evaluates for
Pod readiness.

Readiness gates are determined by the current state of status.condition fields for the Pod. If Kubernetes
cannot find such a condition in the status.conditions field of a Pod, the status of the condition is
defaulted to "False".

Here is an example:

kind: Pod
...
spec:
  readinessGates:
  - conditionType: "www.example.com/feature-1"
status:
  conditions:
  - type: Ready                              # a built in PodCondition
    status: "False"
    lastProbeTime: null
    lastTransitionTime: 2018-01-01T00:00:00Z
  - type: "www.example.com/feature-1"        # an extra PodCondition
    status: "False"
    lastProbeTime: null
    lastTransitionTime: 2018-01-01T00:00:00Z
  containerStatuses:
  - containerID: docker://abcd...
    ready: true
...

Status for Pod readiness


The kubectl patch command does not support patching object status. To set these status.conditions for
the pod, applications and operators should use the PATCH action. You can use a Kubernetes client
library to write code that sets custom Pod conditions for Pod readiness.

For a Pod that uses custom conditions, that Pod is evaluated to be ready only when both the following
statements apply:

● All containers in the Pod are ready.


● All conditions specified in readinessGates are True.
When a Pod's containers are Ready but at least one custom condition is missing or False, the kubelet
sets the Pod's condition to ContainersReady.

Container Probes

A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic,
the kubelet calls a Handler implemented by the container. There are three types of handlers:

● ExecAction: Executes a specified command inside the container. The diagnostic is considered
successful if the command exits with a status code of 0.
● TCPSocketAction: Performs a TCP check against the Pod's IP address on a specified port. The
diagnostic is considered successful if the port is open.
● HTTPGetAction: Performs an HTTP GET request against the Pod's IP address on a specified port
and path. The diagnostic is considered successful if the response has a status code greater than
or equal to 200 and less than 400.

Each probe has one of three results:

● Success: The container passed the diagnostic.


● Failure: The container failed the diagnostic.
● Unknown: The diagnostic failed, so no action should be taken.

The kubelet can optionally perform and react to three kinds of probes on running containers:

● livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet
kills the container, and the container is subjected to its restart policy. If a Container does not
provide a liveness probe, the default state is Success.
● readinessProbe: Indicates whether the container is ready to respond to requests. If the readiness
probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all
Services that match the Pod. The default state of readiness before the initial delay is Failure. If a
Container does not provide a readiness probe, the default state is Success.
● startupProbe: Indicates whether the application within the container is started. All other probes
are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet
kills the container, and the container is subjected to its restart policy. If a Container does not
provide a startup probe, the default state is Success.

For more information about how to set up a liveness, readiness, or startup probe, see Configure
Liveness, Readiness and Startup Probes.
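As a brief illustration (the endpoint path, image, and timings are placeholder values, not taken from the
page above), a container with an HTTP liveness probe and a TCP readiness probe might be configured
like this:

apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: nginx:1.14.2
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5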

When should you use a liveness probe?

FEATURE STATE: Kubernetes v1.0 [stable]


If the process in your container is able to crash on its own whenever it encounters an issue or becomes
unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the
correct action in accordance with the Pod's restartPolicy.

If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and
specify a restartPolicy of Always or OnFailure.

When should you use a readiness probe?

FEATURE STATE: Kubernetes v1.0 [stable]

If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In
this case, the readiness probe might be the same as the liveness probe, but the existence of the
readiness probe in the spec means that the Pod will start without receiving any traffic and only start
receiving traffic after the probe starts succeeding.

If you want your container to be able to take itself down for maintenance, you can specify a readiness
probe that checks an endpoint specific to readiness that is different from the liveness probe.

If your app has a strict dependency on back-end services, you can implement both a liveness and a
readiness probe. The liveness probe passes when the app itself is healthy, but the readiness probe
additionally checks that each required back-end service is available. This helps you avoid directing traffic
to Pods that can only respond with error messages.

If your container needs to work on loading large data, configuration files, or migrations during startup,
you can use a startup probe. However, if you want to detect the difference between an app that has
failed and an app that is still processing its startup data, you might prefer a readiness probe.

Note: If you want to be able to drain requests when the Pod is deleted, you do not necessarily need a
readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of
whether the readiness probe exists. The Pod remains in the unready state while it waits for the
containers in the Pod to stop.

When should you use a startup probe?

FEATURE STATE: Kubernetes v1.20 [stable]

Startup probes are useful for Pods that have containers that take a long time to come into service.
Rather than set a long liveness interval, you can configure a separate configuration for probing the
container as it starts up, allowing a time longer than the liveness interval would allow.

If your container usually starts in more than initialDelaySeconds + failureThreshold × periodSeconds,
you should specify a startup probe that checks the same endpoint as the liveness probe. The default for
periodSeconds is 10s. You should then set its failureThreshold high enough to allow the container to
start, without changing the default values of the liveness probe. This helps to protect against deadlocks.
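For instance, a slow-starting container could be given a startup probe, added under its container spec,
that tolerates up to failureThreshold × periodSeconds = 30 × 10 = 300 seconds of startup time before
the liveness probe takes over (the endpoint and values are illustrative):

    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10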

Termination of Pods

Because Pods represent processes running on nodes in the cluster, it is important to allow those
processes to gracefully terminate when they are no longer needed (rather than being abruptly stopped
with a KILL signal and having no chance to clean up).
Typically, the container runtime sends a TERM signal to the main process in each container. Many
container runtimes respect the STOPSIGNAL value defined in the container image and send this instead
of TERM. Once the grace period has expired, the KILL signal is sent to any remaining processes, and the
Pod is then deleted from the API Server.

An example flow:

1. You use the kubectl tool to manually delete a specific Pod, with the default grace period (30
seconds).
2. The Pod in the API server is updated with the time beyond which the Pod is considered "dead"
along with the grace period. If you use “kubectl describe” to check on the Pod you're deleting,
that Pod shows up as "Terminating". On the node where the Pod is running: as soon as the
kubelet sees that a Pod has been marked as terminating (a graceful shutdown duration has been
set), the kubelet begins the local Pod shutdown process.
1. If one of the Pod's containers has defined a preStop hook, the kubelet runs that hook
inside of the container. If the preStop hook is still running after the grace period expires,
the kubelet requests a small, one-off grace period extension of 2 seconds.
Note: If the preStop hook needs longer to complete than the default grace period
allows, you must modify terminationGracePeriodSeconds to suit this.
2. The kubelet triggers the container runtime to send a TERM signal to process 1 inside
each container.
Note: The containers in the Pod receive the TERM signal at different times and in an
arbitrary order. If the order of shutdowns matters, consider using a preStop hook to
synchronize.
3. At the same time as the kubelet is starting graceful shutdown, the control plane removes that
shutting-down Pod from Endpoints (and, if enabled, EndpointSlice) objects where these
represent a Service with a configured selector. ReplicaSets and other workload resources no
longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly
cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from
the list of endpoints as soon as the termination grace period begins.
4. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime
sends SIGKILL to any processes still running in any container in the Pod. The kubelet also cleans
up a hidden pause container if that container runtime uses one.
5. The kubelet triggers forcible removal of Pod objects from the API server, by setting grace period
to 0 (immediate deletion).
6. The API server deletes the Pod's API object, which is then no longer visible from any client.
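As a minimal illustration of the knobs mentioned above, a Pod that needs extra time to shut down could
set a longer grace period and a preStop hook (the name, image, and sleep duration are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-demo
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: nginx:1.14.2
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]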

Forced Pod Termination

By default, all deletes are graceful within 30 seconds. The kubectl delete command supports the
--grace-period=<seconds> option which allows you to override the default and specify your own value.

Setting the grace period to 0 forcibly and immediately deletes the Pod from the API server. If the pod
was still running on a node, that forcible deletion triggers the kubelet to begin immediate cleanup.

Note: You must specify an additional flag --force along with --grace-period=0 in order to perform force
deletions.
When a force deletion is performed, the API server does not wait for confirmation from the kubelet that
the Pod has been terminated on the node it was running on. It removes the Pod in the API immediately
so a new Pod can be created with the same name. On the node, Pods that are set to terminate
immediately will still be given a small grace period before being force killed.
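For example, to force-delete a pod named nginx:

kubectl delete pod nginx --grace-period=0 --force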

If you need to force-delete Pods that are part of a StatefulSet, refer to the task documentation for
deleting Pods from a StatefulSet.

Garbage collection of failed Pods


For failed Pods, the API objects remain in the cluster's API until a human or controller process explicitly
removes them.

The control plane cleans up terminated Pods (with a phase of Succeeded or Failed), when the number of
Pods exceeds the configured threshold (determined by terminated-pod-gc-threshold in the
kube-controller-manager). This avoids a resource leak as Pods are created and terminated over time.

Labels and Selectors

Labels are key/value pairs that are attached to objects, such as pods. Labels are intended to be used to
specify identifying attributes of objects that are meaningful and relevant to users, but do not directly
imply semantics to the core system. Labels can be used to organize and to select subsets of objects.
Labels can be attached to objects at creation time and subsequently added and modified at any time.
Each object can have a set of key/value labels defined. Each Key must be unique for a given object.

"metadata": {
"labels": {
"key1" : "value1",
"key2" : "value2"
}
}

Labels allow for efficient queries and watches and are ideal for use in UIs and CLIs. Non-identifying
information should be recorded using annotations.
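For example, labels can be added, listed, and removed on a live object with kubectl (the pod name is
illustrative):

kubectl label pod nginx environment=production tier=frontend
kubectl get pods --show-labels
kubectl label pod nginx tier-    # the trailing dash removes the "tier" label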

Labels are key/value pairs. Valid label keys have two segments: an optional prefix and name, separated
by a slash (/). The name segment is required and must be 63 characters or less, beginning and ending
with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and
alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a
series of DNS labels separated by dots (.), not longer than 253 characters in total, followed by a slash (/).

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: mysql
    app.kubernetes.io/instance: mysql-abcxzy
    app.kubernetes.io/version: "5.7.21"
    app.kubernetes.io/component: database
    app.kubernetes.io/part-of: wordpress
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/created-by: controller-manager

The kubernetes.io/ and k8s.io/ prefixes are reserved for Kubernetes core components.
Valid label value:

● must be 63 characters or less (can be empty),


● unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
● could contain dashes (-), underscores (_), dots (.), and alphanumerics between.

Label selectors
Unlike names and UIDs, labels do not provide uniqueness. In general, we expect many objects to carry
the same label(s).

Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping
primitive in Kubernetes.

The API currently supports two types of selectors: equality-based and set-based. A label selector can
be made of multiple requirements which are comma-separated. In the case of multiple requirements, all
must be satisfied so the comma separator acts as a logical AND (&&) operator.
The semantics of empty or non-specified selectors are dependent on the context, and API types that use
selectors should document the validity and meaning of them.

Note: For some API types, such as ReplicaSets, the label selectors of two instances must not overlap
within a namespace, or the controller can see that as conflicting instructions and fail to determine how
many replicas should be present.
Caution: For both equality-based and set-based conditions there is no logical OR (||) operator. Ensure
your filter statements are structured accordingly.

Equality-based requirement

Equality- or inequality-based requirements allow filtering by label keys and values. Matching objects
must satisfy all of the specified label constraints, though they may have additional labels as well. Three
kinds of operators are admitted =,==,!=. The first two represent equality (and are synonyms), while the
latter represents inequality. For example:

environment = production
tier != frontend

One usage scenario for equality-based label requirement is for Pods to specify node selection criteria.
For example, the sample Pod below selects nodes with the label "accelerator=nvidia-tesla-p100".

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
  - name: cuda-test
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100

Set-based requirement

Set-based label requirements allow filtering keys according to a set of values. Three kinds of operators
are supported: in,notin and exists (only the key identifier).
For example:
environment in (production, qa)
tier notin (frontend, backend)
partition
!partition

● The first example selects all resources with key equal to environment and value equal to
production or qa.
● The second example selects all resources with key equal to tier and values other than frontend
and backend, and all resources with no labels with the tier key.
● The third example selects all resources including a label with key partition; no values are
checked.
● The fourth example selects all resources without a label with a key partition; no values are
checked.

Similarly the comma separator acts as an AND operator. So filtering resources with a partition key (no
matter the value) and with an environment different than qa can be achieved using
partition,environment notin (qa). The set-based label selector is a general form of equality since
environment=production is equivalent to environment in (production); similarly for != and notin.

Set-based requirements can be mixed with equality-based requirements. For example: partition in
(customerA, customerB),environment!=qa.

LIST and WATCH filtering

LIST and WATCH operations may specify label selectors to filter the sets of objects returned
using a query parameter. Both requirements are permitted (presented here as they would
appear in a URL query string):

● Equality-based requirements:
?labelSelector=environment%3Dproduction,tier%3Dfrontend
● Set-based requirements:
?labelSelector=environment+in+%28production%2Cqa%29%2Ctier+in+%28frontend
%29

Both label selector styles can be used to list or watch resources via a REST client. For example,
targeting apiserver with kubectl and using equality-based one may write:

kubectl get pods -l environment=production,tier=frontend

or using set-based requirements:

kubectl get pods -l 'environment in (production),tier in (frontend)'


As already mentioned, set-based requirements are more expressive. For instance, they can
implement the OR operator on values:

kubectl get pods -l 'environment in (production, qa)'

or restricting negative matching via exists operator:

kubectl get pods -l 'environment,environment notin (frontend)'

Service and ReplicationController


The set of pods that a service targets is defined with a label selector. Similarly, the population of pods
that a replicationcontroller should manage is also defined with a label selector.
Labels selectors for both objects are defined in json or yaml files using maps, and only equality-based
requirement selectors are supported:

"selector": {

"component" : "redis",

or

selector:

component: redis

this selector (respectively in json or yaml format) is equivalent to component=redis or component in


(redis)

Resources that support set-based requirements


Newer resources, such as Job, Deployment, ReplicaSet, and DaemonSet, support set-based
requirements as well.

selector:
  matchLabels:
    component: redis
  matchExpressions:
  - {key: tier, operator: In, values: [cache]}
  - {key: environment, operator: NotIn, values: [dev]}

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to
an element of matchExpressions, whose key field is "key", the operator is "In", and the values array
contains only "value". matchExpressions is a list of pod selector requirements. Valid operators include In,
NotIn, Exists, and DoesNotExist. The values set must be non-empty in the case of In and NotIn. All of
the requirements, from both matchLabels and matchExpressions are ANDed together -- they must all be
satisfied in order to match.

Replica Controller

Note: A Deployment that configures a ReplicaSet is now the recommended way to set up replication.

A ReplicationController ensures that a specified number of pod replicas are running at any one time.
In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is
always up and available.

If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the
ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a
ReplicationController are automatically replaced if they fail, are deleted, or are terminated. For example,
your pods are re-created on a node after disruptive maintenance such as a kernel upgrade. For this
reason, you should use a ReplicationController even if your application requires only a single pod.

This example ReplicationController config runs three copies of the nginx web server.

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

kubectl apply -f replication.yaml


kubectl describe replicationcontrollers/nginx
To list all the pods that belong to the ReplicationController in a machine readable form, you can use a
command like this:

pods=$(kubectl get pods --selector=app=nginx --output=jsonpath={.items..metadata.name})


echo $pods

Deleting a Replication Controller and its Pods

To delete a ReplicationController and all its pods, use kubectl delete. Kubectl will scale the
ReplicationController to zero and wait for it to delete each pod before deleting the ReplicationController
itself.
You can delete a ReplicationController without affecting any of its pods.
Using kubectl, specify the --cascade=orphan option to kubectl delete.
When using the REST API or Go client library, you can delete the ReplicationController object. Once the
original is deleted, you can create a new ReplicationController to replace it. As long as the old and new
.spec.selector are the same, then the new one will adopt the old pods.
Pods may be removed from a ReplicationController's target set by changing their labels. This technique
may be used to remove pods from service for debugging and data recovery. Pods that are removed in
this way will be replaced automatically (assuming that the number of replicas is not also changed).
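For example, with a recent kubectl:

kubectl delete replicationcontroller nginx --cascade=orphan   # delete only the controller, keep its pods
kubectl delete replicationcontroller nginx                    # delete the controller and its pods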

Common Usage Patterns

Rescheduling
As mentioned above, whether you have 1 pod you want to keep running, or 1000, a
ReplicationController will ensure that the specified number of pods exists, even in the event of node
failure or pod termination (for example, due to an action by another control agent).

Scaling
The ReplicationController enables scaling the number of replicas up or down, either manually or by an
auto-scaling control agent, by updating the replicas field.

Rolling updates
The ReplicationController is designed to facilitate rolling updates to a service by replacing pods
one-by-one.
The recommended approach is to create a new ReplicationController with 1 replica, scale the new (+1)
and old (-1) controllers one by one, and then delete the old controller after it reaches 0 replicas. This
predictably updates the set of pods regardless of unexpected failures.
Ideally, the rolling update controller would take application readiness into account, and would ensure
that a sufficient number of pods were productively serving at any given time.
The two ReplicationControllers would need to create pods with at least one differentiating label, such as
the image tag of the primary container of the pod, since it is typically image updates that motivate rolling
updates.

Multiple release tracks


In addition to running multiple releases of an application while a rolling update is in progress, it's
common to run multiple releases for an extended period of time, or even continuously, using multiple
release tracks. The tracks would be differentiated by labels.

For instance, a service might target all pods with tier in (frontend), environment in (prod). Now say you
have 10 replicated pods that make up this tier. But you want to be able to 'canary' a new version of this
component. You could set up a ReplicationController with replicas set to 9 for the bulk of the replicas,
with labels tier=frontend, environment=prod, track=stable, and another ReplicationController with
replicas set to 1 for the canary, with labels tier=frontend, environment=prod, track=canary. Now the
service is covering both the canary and non-canary pods. But you can mess with the
ReplicationControllers separately to test things out, monitor the results, etc.

Using ReplicationControllers with Services


Multiple ReplicationControllers can sit behind a single service, so that, for example, some traffic goes to
the old version, and some goes to the new version.

A ReplicationController will never terminate on its own, but it isn't expected to be as long-lived as
services. Services may be composed of pods controlled by multiple ReplicationControllers, and it is
expected that many ReplicationControllers may be created and destroyed over the lifetime of a service
(for instance, to perform an update of pods that run the service). Both services themselves and their
clients should remain oblivious to the ReplicationControllers that maintain the pods of the services.

Replica Set

A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such,
it is often used to guarantee the availability of a specified number of identical Pods.
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire,
a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying
the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills
its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet
needs to create new Pods, it uses its Pod template. A ReplicaSet is linked to its Pods via the Pods'
metadata.ownerReferences field.
A ReplicaSet ensures that a specified number of pod replicas are running at any given time. However, a
Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to
Pods along with a lot of other useful features. Therefore, we recommend using Deployments instead of
directly using ReplicaSets, unless you require custom update orchestration or don't require updates at
all.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # modify replicas according to your case
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google_samples/gb-frontend:v3

kubectl apply -f https://kubernetes.io/examples/controllers/frontend.yaml

kubectl get rs
NAME DESIRED CURRENT READY AGE
frontend 3 3 3 6s

kubectl describe rs/frontend


Name:         frontend
Namespace:    default
Selector:     tier=frontend
Labels:       app=guestbook
              tier=frontend
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apps/v1","kind":"ReplicaSet","metadata":{"annotations":{},"labels":{"app":"guestbook","tier":"frontend"},"name":"frontend",...
Replicas:     3 current / 3 desired
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  tier=frontend
  Containers:
   php-redis:
    Image:        gcr.io/google_samples/gb-frontend:v3
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  117s  replicaset-controller  Created pod: frontend-wtsmm
  Normal  SuccessfulCreate  116s  replicaset-controller  Created pod: frontend-b2zdv
  Normal  SuccessfulCreate  116s  replicaset-controller  Created pod: frontend-vcmts

kubectl get pods


NAME             READY   STATUS    RESTARTS   AGE
frontend-b2zdv   1/1     Running   0          6m36s
frontend-vcmts   1/1     Running   0          6m36s
frontend-wtsmm   1/1     Running   0          6m36s

You can also verify that the owner reference of these pods is set to the frontend ReplicaSet. To do this,
get the yaml of one of the Pods running:

kubectl get pods frontend-b2zdv -o yaml


apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-02-12T07:06:16Z"
  generateName: frontend-
  labels:
    tier: frontend
  name: frontend-b2zdv
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: frontend
    uid: f391f6db-bb9b-4c09-ae74-6a1f77f3d5cf
...
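
If you only want the owner information rather than the full manifest, a jsonpath query (using the same pod name from the output above) works as well:

kubectl get pod frontend-b2zdv -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
# ReplicaSet/frontend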

Isolating Pods from a ReplicaSet

You can remove Pods from a ReplicaSet by changing their labels. This technique may be used to remove
Pods from service for debugging, data recovery, etc. Pods that are removed in this way will be replaced
automatically (assuming that the number of replicas is not also changed).
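
For example, overwriting the tier label on one of the pods above takes it out of the ReplicaSet's selector (and out of any Service selecting tier=frontend), while the controller creates a replacement; a sketch:

# Relabel the pod so it no longer matches tier=frontend
kubectl label pod frontend-b2zdv tier=debug --overwrite

# The ReplicaSet immediately creates a new pod to restore the replica count,
# while frontend-b2zdv keeps running and can be inspected in isolation.
kubectl get pods -L tier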

Scaling a ReplicaSet
A ReplicaSet can be easily scaled up or down by simply updating the .spec.replicas field. The ReplicaSet
controller ensures that a desired number of Pods with a matching label selector are available and
operational.
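
You can edit the manifest and re-apply it, or use kubectl scale; for example:

# Scale the frontend ReplicaSet from 3 to 5 replicas
kubectl scale rs frontend --replicas=5

# Or patch .spec.replicas directly
kubectl patch rs frontend -p '{"spec":{"replicas":5}}'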

When scaling down, the ReplicaSet controller chooses which pods to delete by sorting the available
pods to prioritize scaling down pods based on the following general algorithm:

1. Pending (and unschedulable) pods are scaled down first


2. If controller.kubernetes.io/pod-deletion-cost annotation is set, then the pod with the lower value
will come first.
3. Pods on nodes with more replicas come before pods on nodes with fewer replicas.
4. If the pods' creation times differ, the pod that was created more recently comes before the older
pod (the creation times are bucketed on an integer log scale when the LogarithmicScaleDown
feature gate is enabled)

If all of the above match, then selection is random.

Pod deletion cost

FEATURE STATE: Kubernetes v1.22 [beta]

Using the controller.kubernetes.io/pod-deletion-cost annotation, users can set a preference regarding which pods to remove first when downscaling a ReplicaSet.
The annotation should be set on the pod, the range is [-2147483647, 2147483647]. It represents the
cost of deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower
deletion cost are preferred to be deleted before pods with higher deletion cost.
The implicit value for this annotation for pods that don't set it is 0; negative values are permitted. Invalid
values will be rejected by the API server.
This feature is beta and enabled by default. You can disable it using the feature gate PodDeletionCost in
both kube-apiserver and kube-controller-manager.

Note:
● This is honored on a best-effort basis, so it does not offer any guarantees on pod deletion order.
● Users should avoid updating the annotation frequently, such as updating it based on a metric
value, because doing so will generate a significant number of pod updates on the apiserver.

Example Use Case


The different pods of an application could have different utilization levels. On scale down, the
application may prefer to remove the pods with lower utilization. To avoid frequently updating the pods,
the application should update controller.kubernetes.io/pod-deletion-cost once before issuing a scale
down (setting the annotation to a value proportional to pod utilization level). This works if the
application itself controls the down scaling; for example, the driver pod of a Spark deployment.
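
As a sketch, an application could annotate a lightly used pod just before scaling down (the pod name and cost value here are illustrative):

# Mark this pod as cheaper to delete than its siblings (the default cost is 0)
kubectl annotate pod frontend-b2zdv controller.kubernetes.io/pod-deletion-cost=-100 --overwrite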
ReplicaSet as a Horizontal Pod Autoscaler Target

A ReplicaSet can also be a target for Horizontal Pod Autoscalers (HPA). That is, a ReplicaSet can be
auto-scaled by an HPA. Here is an example HPA targeting the ReplicaSet we created in the previous
example.

controllers/hpa-rs.yaml

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  scaleTargetRef:
    kind: ReplicaSet
    name: frontend
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

Saving this manifest into hpa-rs.yaml and submitting it to a Kubernetes cluster should create the defined
HPA that autoscales the target ReplicaSet depending on the CPU usage of the replicated Pods.

kubectl apply -f https://k8s.io/examples/controllers/hpa-rs.yaml

Alternatively, you can use the kubectl autoscale command to accomplish the same (and it's easier!)
kubectl autoscale rs frontend --max=10 --min=3 --cpu-percent=50

ReplicaSet is the next generation of ReplicationController. A ReplicationController is updated somewhat imperatively (the client drives rolling-update pod by pod), whereas ReplicaSets, particularly when managed by Deployments, are intended to be as declarative as possible.

Replica Set
1. Supports the new set-based selectors, which gives more flexibility. For example, environment in (production, qa) selects all resources whose environment key equals production or qa (see the selector sketch after this comparison).
2. The rollout command is used for updating a ReplicaSet. Even though a ReplicaSet can be used independently, it is best used together with Deployments, which make updates declarative.

Replication Controller
1. Only supports equality-based selectors. For example, environment = production selects all resources whose environment key equals production.
2. The rolling-update command is used for updating a ReplicationController. It replaces the specified ReplicationController with a new one by updating one pod at a time to use the new PodTemplate.
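
To make the difference concrete, a ReplicaSet selector can combine matchLabels with set-based matchExpressions, something a ReplicationController's flat label map cannot express. A selector fragment as a sketch:

selector:
  matchLabels:
    tier: frontend
  matchExpressions:
  - key: environment
    operator: In
    values:
    - production
    - qa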
Deployments

A Deployment provides declarative updates for Pods and ReplicaSets.

You describe a desired state in a Deployment, and the Deployment Controller changes the actual
state to the desired state at a controlled rate. You can define Deployments to create new
ReplicaSets, or to remove existing Deployments and adopt all their resources with new
Deployments.

The following is an example of a Deployment. It creates a ReplicaSet to bring up three nginx Pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

In this example:

● A Deployment named nginx-deployment is created, indicated by the .metadata.name field.


● The Deployment creates three replicated Pods, indicated by the .spec.replicas field.
● The .spec.selector field defines how the Deployment finds which Pods to manage. In this case,
you select a label that is defined in the Pod template (app: nginx). However, more sophisticated
selection rules are possible, as long as the Pod template itself satisfies the rule.
● The template field contains the following sub-fields:
○ The Pods are labeled app: nginx using the .metadata.labels field.
○ The Pod template's specification, or .template.spec field, indicates that the Pods run one
container, nginx, which runs the nginx Docker Hub image at version 1.14.2.
○ Create one container and name it nginx using the
.spec.template.spec.containers[0].name field.
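
To try this out, you could save the manifest locally (file name assumed) and watch the Deployment create its ReplicaSet and Pods:

kubectl apply -f nginx-deployment.yaml

# Follow the rollout and inspect the objects it creates
kubectl rollout status deployment/nginx-deployment
kubectl get deployments
kubectl get rs
kubectl get pods --show-labels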

Horizontal Pod Autoscaling

Config Map

A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume
ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume.

A ConfigMap allows you to decouple environment-specific configuration from your container images, so
that your applications are easily portable.
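
As a sketch (the names here are made up), a ConfigMap and a Pod consuming it both as an environment variable and as a mounted file might look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config            # hypothetical name
data:
  LOG_LEVEL: "debug"
  app.properties: |
    greeting=hello
---
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo $LOG_LEVEL && cat /etc/config/app.properties && sleep 3600"]
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config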
Examples

MongoDB

Standalone Instance

Label the Node


Label the node that will be used for MongoDB deployment. The label is used later to assign pods to a
specific node.

To do so:

1. List the nodes on your cluster:

kubectl get nodes


2. Choose the deployment node from the list in the command output.

3. Use kubectl to label the node with a key-value pair.

kubectl label nodes <node> <key>=<value>


The output confirms that the label was added successfully.
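
For example, to match the nodeAffinity rule used later in the PersistentVolume (key size, value large), you could run:

kubectl label nodes <node> size=large
# node/<node> labeled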

Create a StorageClass
A StorageClass defines how persistent volumes are provisioned for the claims that pods will make on the node. To create a StorageClass:

1. Use a text editor to create a YAML file to store the storage class configuration.

vim StorageClass.yaml

2. Specify your storage class configuration in the file. The example below defines the
mongodb-storageclass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mongodb-storageclass
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Create Persistent Storage


Provision storage for the MongoDB deployment by creating a persistent volume and a persistent volume
claim:

1. Create a YAML file for persistent volume configuration.

vim PersistentVolume.yaml

2. In the file, allocate storage that belongs to the storage class defined in the previous step. Specify the
node that will be used in pod deployment in the nodeAffinity section. The node is identified using the
label created in Step 1.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: mongodb-storageclass
  local:
    path: /mnt/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In
          values:
          - large

3. Create another YAML for the configuration of the persistent volume claim:
vim PersistentVolumeClaim.yaml

4. Define the claim named mongodb-pvc and instruct Kubernetes to claim volumes belonging to
mongodb-storageclass.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mongodb-pvc
spec:
  storageClassName: mongodb-storageclass
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
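
With the three files in place, a straightforward way to create the resources (file names as used above) is:

kubectl apply -f StorageClass.yaml
kubectl apply -f PersistentVolume.yaml
kubectl apply -f PersistentVolumeClaim.yaml

# The PVC stays Pending until a pod uses it, because the StorageClass
# sets volumeBindingMode: WaitForFirstConsumer.
kubectl get sc,pv,pvc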

Create a Config Map


The ConfigMap file stores non-encrypted configuration information used by pods.

1. Create a YAML file to store deployment configuration:

vim ConfigMap.yaml
2. Use the file to store information about system paths, users, and roles. The following is an example of
a ConfigMap file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mongodb-configmap
data:
  mongo.conf: |
    storage:
      dbPath: /data/db
  ensure-users.js: |
    const targetDbStr = 'test';
    const rootUser = cat('/etc/k8-test/admin/MONGO_ROOT_USERNAME');
    const rootPass = cat('/etc/k8-test/admin/MONGO_ROOT_PASSWORD');
    const usersStr = cat('/etc/k8-test/MONGO_USERS_LIST');

    const adminDb = db.getSiblingDB('admin');
    adminDb.auth(rootUser, rootPass);
    print('Successfully authenticated admin user');

    const targetDb = db.getSiblingDB(targetDbStr);

    const customRoles = adminDb
      .getRoles({rolesInfo: 1, showBuiltinRoles: false})
      .map(role => role.role)
      .filter(Boolean);

    usersStr
      .trim()
      .split(';')
      .map(s => s.split(':'))
      .forEach(user => {
        const username = user[0];
        const rolesStr = user[1];
        const password = user[2];

        if (!rolesStr || !password) {
          return;
        }

        const roles = rolesStr.split(',');

        const userDoc = {
          user: username,
          pwd: password,
        };

        userDoc.roles = roles.map(role => {
          if (!~customRoles.indexOf(role)) {
            return role;
          }
          return {role: role, db: 'admin'};
        });

        try {
          targetDb.createUser(userDoc);
        } catch (err) {
          if (!~err.message.toLowerCase().indexOf('duplicate')) {
            throw err;
          }
        }
      });
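
The ConfigMap can then be created the same way (file name as above):

kubectl apply -f ConfigMap.yaml
kubectl get configmap mongodb-configmap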

Create a StatefulSet
