Cluster Architecture

The architectural concepts behind Kubernetes:

- Nodes
- Communication between Nodes and the Control Plane
- Controllers
- Leases
- Cloud Controller Manager
- About cgroup v2
- Container Runtime Interface (CRI)
- Garbage Collection
- Mixed Version Proxy

1 - Nodes

Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. Each node is managed by the control plane and contains the services necessary to run Pods.

Typically you have several nodes in a cluster; in a learning or resource-limited environment, you might have only one node.

The components on a node include the kubelet, a container runtime, and the kube-proxy.

Management

There are two main ways to have Nodes added to the API server:

1. The kubelet on a node self-registers to the control plane
2. You (or another human user) manually add a Node object

After you create a Node object, or the kubelet on a node self-registers, the control plane checks whether the new Node object is valid. For example, if you try to create a Node from the following JSON manifest:

    {
      "kind": "Node",
      "apiVersion": "v1",
      "metadata": {
        "name": "10.240.79.157",
        "labels": {
          "name": "my-first-k8s-node"
        }
      }
    }

Kubernetes creates a Node object internally (the representation). Kubernetes checks that a kubelet has registered to the API server that matches the metadata.name field of the Node. If the node is healthy (i.e. all necessary services are running), then it is eligible to run a Pod. Otherwise, that node is ignored for any cluster activity until it becomes healthy.

Note: Kubernetes keeps the object for the invalid Node and continues checking to see whether it becomes healthy. You, or a controller, must explicitly delete the Node object to stop that health checking.

The name of a Node object must be a valid DNS subdomain name.

Node name uniqueness

The name identifies a Node. Two Nodes cannot have the same name at the same time. Kubernetes also assumes that a resource with the same name is the same object. In the case of a Node, it is implicitly assumed that an instance using the same name will have the same state (e.g. network settings, root disk contents) and attributes like node labels. This may lead to inconsistencies if an instance was modified without changing its name. If the Node needs to be replaced or updated significantly, the existing Node object needs to be removed from the API server first and re-added after the update.
Self-registration of Nodes

When the kubelet flag --register-node is true (the default), the kubelet will attempt to register itself with the API server. This is the preferred pattern, used by most distros.

For self-registration, the kubelet is started with the following options (a configuration-file sketch of some of these follows the list):

- --kubeconfig - Path to credentials to authenticate itself to the API server.
- --cloud-provider - How to talk to a cloud provider to read metadata about itself.
- --register-node - Automatically register with the API server.
- --register-with-taints - Register the node with the given list of taints (comma separated <key>=<value>:<effect>). No-op if register-node is false.
- --node-ip - Optional comma-separated list of the IP addresses for the node. You can only specify a single address for each address family. For example, in a single-stack IPv4 cluster, you set this value to be the IPv4 address that the kubelet should use for the node. See configure IPv4/IPv6 dual stack for details of running a dual-stack cluster. If you don't provide this argument, the kubelet uses the node's default IPv4 address, if any; if the node has no IPv4 addresses then the kubelet uses the node's default IPv6 address.
- --node-labels - Labels to add when registering the node in the cluster (see label restrictions enforced by the NodeRestriction admission plugin).
- --node-status-update-frequency - Specifies how often the kubelet posts its node status to the API server.

When the Node authorization mode and NodeRestriction admission plugin are enabled, kubelets are only authorized to create/modify their own Node resource.
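Some of these registration options are also exposed through the kubelet configuration file rather than command-line flags. A minimal sketch, assuming a KubeletConfiguration file is already in use; the taint and frequency values are illustrative:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Equivalent to --register-node (true is the default).
    registerNode: true
    # Equivalent to --register-with-taints; key, value, and effect below are examples only.
    registerWithTaints:
    - key: "example.com/dedicated"
      value: "gpu"
      effect: "NoSchedule"
    # Equivalent to --node-status-update-frequency.
    nodeStatusUpdateFrequency: "10s"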
Note: As mentioned in the Node name uniqueness section, when Node configuration needs to be updated, it is a good practice to re-register the node with the API server. For example, if the kubelet is being restarted with a new set of --node-labels but the same Node name is used, the change will not take effect, as labels are only set at Node registration. Pods already scheduled on the Node may misbehave or cause issues if the Node configuration is changed on kubelet restart. For example, an already running Pod may be tainted against the new labels assigned to the Node, while other Pods that are incompatible with that Pod will be scheduled based on the new label. Node re-registration ensures all Pods will be drained and properly re-scheduled.

Manual Node administration

You can create and modify Node objects using kubectl.

When you want to create Node objects manually, set the kubelet flag --register-node=false.

You can modify Node objects regardless of the setting of --register-node. For example, you can set labels on an existing Node or mark it unschedulable.

You can use labels on Nodes in conjunction with node selectors on Pods to control scheduling. For example, you can constrain a Pod to only be eligible to run on a subset of the available nodes, as sketched below.
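A minimal sketch of that pattern; the label key and value (disktype: ssd) are illustrative and assume the corresponding label has been applied to at least one Node:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
      # Schedule only onto Nodes carrying this (assumed) label.
      nodeSelector:
        disktype: ssd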
Marking a node as unschedulable prevents the scheduler from placing new pods onto that Node but does not affect existing Pods on the Node. This is useful as a preparatory step before a node reboot or other maintenance.

To mark a Node unschedulable, run:

    kubectl cordon $NODENAME

See Safely Drain a Node for more details.

Note: Pods that are part of a DaemonSet tolerate being run on an unschedulable Node. DaemonSets typically provide node-local services that should run on the Node even if it is being drained of workload applications.

Node status

A Node's status contains the following information:

- Addresses
- Conditions
- Capacity and Allocatable
- Info

You can use kubectl to view a Node's status and other details:

    kubectl describe node <node-name>

See Node Status for more details.

Node heartbeats

Heartbeats, sent by Kubernetes nodes, help your cluster determine the availability of each node, and to take action when failures are detected.

For nodes there are two forms of heartbeats:

- Updates to the .status of a Node.
- Lease objects within the kube-node-lease namespace. Each Node has an associated Lease object.

Node controller

The node controller is a Kubernetes control plane component that manages various aspects of nodes.

The node controller has multiple roles in a node's life. The first is assigning a CIDR block to the node when it is registered (if CIDR assignment is turned on).

The second is keeping the node controller's internal list of nodes up to date with the cloud provider's list of available machines. When running in a cloud environment and whenever a node is unhealthy, the node controller asks the cloud provider if the VM for that node is still available. If not, the node controller deletes the node from its list of nodes.

The third is monitoring the nodes' health. The node controller is responsible for:

- In the case that a node becomes unreachable, updating the Ready condition in the Node's .status field. In this case the node controller sets the Ready condition to Unknown.
- If a node remains unreachable: triggering API-initiated eviction for all of the Pods on the unreachable node. By default, the node controller waits 5 minutes between marking the node as Unknown and submitting the first eviction request.

By default, the node controller checks the state of each node every 5 seconds. This period can be configured using the --node-monitor-period flag on the kube-controller-manager component.
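For example, in a control plane where kube-controller-manager runs as a static Pod, that flag might appear roughly as follows; the manifest excerpt is hypothetical and the value shown is simply the default described above:

    # Excerpt from a hypothetical kube-controller-manager static Pod manifest.
    command:
    - kube-controller-manager
    - --node-monitor-period=5s   # how often the node controller checks node state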
Rate limits on eviction

In most cases, the node controller limits the eviction rate to --node-eviction-rate (default 0.1) per second, meaning it won't evict pods from more than 1 node per 10 seconds.

The node eviction behavior changes when a node in a given availability zone becomes unhealthy. The node controller checks what percentage of nodes in the zone are unhealthy (the Ready condition is Unknown or False) at the same time:

- If the fraction of unhealthy nodes is at least --unhealthy-zone-threshold (default 0.55), then the eviction rate is reduced.
- If the cluster is small (i.e. has less than or equal to --large-cluster-size-threshold nodes - default 50), then evictions are stopped.
- Otherwise, the eviction rate is reduced to --secondary-node-eviction-rate (default 0.01) per second.

The reason these policies are implemented per availability zone is because one availability zone might become partitioned from the control plane while the others remain connected. If your cluster does not span multiple cloud provider availability zones, then the eviction mechanism does not take per-zone unavailability into account.

A key reason for spreading your nodes across availability zones is so that the workload can be shifted to healthy zones when one entire zone goes down. Therefore, if all nodes in a zone are unhealthy, then the node controller evicts at the normal rate of --node-eviction-rate. The corner case is when all zones are completely unhealthy (none of the nodes in the cluster are healthy). In such a case, the node controller assumes that there is some problem with connectivity between the control plane and the nodes, and doesn't perform any evictions. (If there has been an outage and some nodes reappear, the node controller does evict pods from the remaining nodes that are unhealthy or unreachable.)

The node controller is also responsible for evicting pods running on nodes with NoExecute taints, unless those pods tolerate that taint. The node controller also adds taints corresponding to node problems like node unreachable or not ready. This means that the scheduler won't place Pods onto unhealthy nodes.

Resource capacity tracking

Node objects track information about the Node's resource capacity: for example, the amount of memory available and the number of CPUs. Nodes that self register report their capacity during registration. If you manually add a Node, then you need to set the node's capacity information when you add it.

The Kubernetes scheduler ensures that there are enough resources for all the Pods on a Node. The scheduler checks that the sum of the requests of containers on the node is no greater than the node's capacity. That sum of requests includes all containers managed by the kubelet, but excludes any containers started directly by the container runtime, and also excludes any processes running outside of the kubelet's control.

Note: If you want to explicitly reserve resources for non-Pod processes, see reserve resources for system daemons.

Node topology

FEATURE STATE: Kubernetes v1.27 [stable]

If you have enabled the TopologyManager feature gate, then the kubelet can use topology hints when making resource assignment decisions. See Control Topology Management Policies on a Node for more information.

Graceful node shutdown

FEATURE STATE: Kubernetes v1.21 [beta]

The kubelet attempts to detect node system shutdown and terminates pods running on the node.

Kubelet ensures that pods follow the normal pod termination process during the node shutdown. During node shutdown, the kubelet does not accept new Pods (even if those Pods are already bound to the node).

The Graceful node shutdown feature depends on systemd since it takes advantage of systemd inhibitor locks to delay the node shutdown with a given duration.

Graceful node shutdown is controlled with the GracefulNodeShutdown feature gate, which is enabled by default in 1.21.

Note that by default, both configuration options described below, shutdownGracePeriod and shutdownGracePeriodCriticalPods, are set to zero, thus not activating the graceful node shutdown functionality. To activate the feature, the two kubelet config settings should be configured appropriately and set to non-zero values.

Once systemd detects or is notified of node shutdown, the kubelet sets a NotReady condition on the Node, with the reason set to "node is shutting down". The kube-scheduler honors this condition and does not schedule any Pods onto the affected node; other third-party schedulers are expected to follow the same logic. This means that new Pods won't be scheduled onto that node and therefore none will start.

The kubelet also rejects Pods during the PodAdmission phase if an ongoing node shutdown has been detected, so that even Pods with a toleration for node.kubernetes.io/not-ready:NoSchedule do not start there.

At the same time when kubelet is setting that condition on its Node via the API, the kubelet also begins terminating any Pods that are running locally.
During a graceful shutdown, kubelet terminates pods in two phases:

1. Terminate regular pods running on the node.
2. Terminate critical pods running on the node.

The Graceful node shutdown feature is configured with two KubeletConfiguration options:

- shutdownGracePeriod: Specifies the total duration that the node should delay the shutdown by. This is the total grace period for pod termination for both regular and critical pods.
- shutdownGracePeriodCriticalPods: Specifies the duration used to terminate critical pods during a node shutdown. This value should be less than shutdownGracePeriod.

Note: There are cases when Node termination was cancelled by the system (or perhaps manually by an administrator). In either of those situations the Node will return to the Ready state. However, Pods which already started the process of termination will not be restored by kubelet and will need to be re-scheduled.

For example, if shutdownGracePeriod=30s and shutdownGracePeriodCriticalPods=10s, kubelet will delay the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved for gracefully terminating normal pods, and the last 10 seconds would be reserved for terminating critical pods.
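A sketch of that same example expressed as the two KubeletConfiguration fields; the 30s/10s values simply mirror the scenario above:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Total delay applied to the node shutdown.
    shutdownGracePeriod: "30s"
    # Portion of that delay reserved for critical pods; must be less than shutdownGracePeriod.
    shutdownGracePeriodCriticalPods: "10s"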
Note: When pods were evicted during the graceful node shutdown, they are marked as shutdown. Running kubectl get pods shows the status of the evicted pods as Terminated, and kubectl describe pod indicates that the pod was evicted because of node shutdown:

    Reason:   Terminated
    Message:  Pod was terminated in response to imminent node shutdown.

Pod Priority based graceful node shutdown

FEATURE STATE: Kubernetes v1.24 [beta]

To provide more flexibility during graceful node shutdown around the ordering of pods during shutdown, graceful node shutdown honors the PriorityClass for Pods, provided that you enabled this feature in your cluster. The feature allows cluster administrators to explicitly define the ordering of pods during graceful node shutdown based on priority classes.

The Graceful Node Shutdown feature, as described above, shuts down pods in two phases: non-critical pods, followed by critical pods. If additional flexibility is needed to explicitly define the ordering of pods during shutdown in a more granular way, pod priority based graceful shutdown can be used.

When graceful node shutdown honors pod priorities, this makes it possible to do graceful node shutdown in multiple phases, each phase shutting down a particular priority class of pods. The kubelet can be configured with the exact phases and shutdown time per phase.

Assuming the following custom pod priority classes in a cluster:

    Pod priority class name   Pod priority class value
    custom-class-a            100000
    custom-class-b            10000
    custom-class-c            1000
    regular/unset             0

Within the kubelet configuration, the settings for shutdownGracePeriodByPodPriority could look like:

    Pod priority class value   Shutdown period
    100000                     10 seconds
    10000                      180 seconds
    1000                       120 seconds
    0                          60 seconds

The corresponding kubelet config YAML configuration would be:

    shutdownGracePeriodByPodPriority:
    - priority: 100000
      shutdownGracePeriodSeconds: 10
    - priority: 10000
      shutdownGracePeriodSeconds: 180
    - priority: 1000
      shutdownGracePeriodSeconds: 120
    - priority: 0
      shutdownGracePeriodSeconds: 60

The above table implies that any pod with priority value >= 100000 will get just 10 seconds to stop, any pod with value >= 10000 and < 100000 will get 180 seconds to stop, and any pod with value >= 1000 and < 10000 will get 120 seconds to stop. Finally, all other pods will get 60 seconds to stop.

One doesn't have to specify values corresponding to all of the classes. For example, you could instead use these settings:

    Pod priority class value   Shutdown period
    100000                     300 seconds
    1000                       120 seconds
    0                          60 seconds

In the above case, the pods with custom-class-b will go into the same bucket as custom-class-c for shutdown.

If there are no pods in a particular range, then the kubelet does not wait for pods in that priority range. Instead, the kubelet immediately skips to the next priority class value range.

If this feature is enabled and no configuration is provided, then no ordering action will be taken.

Using this feature requires enabling the GracefulNodeShutdownBasedOnPodPriority feature gate, and setting ShutdownGracePeriodByPodPriority in the kubelet config to the desired configuration containing the pod priority class values and their respective shutdown periods.

Note: The ability to take Pod priority into account during graceful node shutdown was introduced as an Alpha feature in Kubernetes v1.23. In Kubernetes 1.28 the feature is Beta and is enabled by default.

Metrics graceful_shutdown_start_time_seconds and graceful_shutdown_end_time_seconds are emitted under the kubelet subsystem to monitor node shutdowns.

Non-graceful node shutdown handling

FEATURE STATE: Kubernetes v1.28 [stable]

A node shutdown action may not be detected by kubelet's Node Shutdown Manager, either because the command does not trigger the inhibitor locks mechanism used by kubelet or because of a user error, i.e., the ShutdownGracePeriod and ShutdownGracePeriodCriticalPods are not configured properly. Please refer to the section Graceful Node Shutdown above for more details.

When a node is shut down but not detected by kubelet's Node Shutdown Manager, the pods that are part of a StatefulSet will be stuck in terminating status on the shutdown node and cannot move to a new running node. This is because the kubelet on the shutdown node is not available to delete the pods, so the StatefulSet cannot create a new pod with the same name. If there are volumes used by the pods, the VolumeAttachments will not be deleted from the original shutdown node, so the volumes used by these pods cannot be attached to a new running node. As a result, the application running on the StatefulSet cannot function properly. If the original shutdown node comes up, the pods will be deleted by kubelet and new pods will be created on a different running node. If the original shutdown node does not come up, these pods will be stuck in terminating status on the shutdown node forever.

To mitigate the above situation, a user can manually add the taint node.kubernetes.io/out-of-service with a NoExecute or NoSchedule effect to a Node, marking it out-of-service. If the NodeOutOfServiceVolumeDetach feature gate is enabled on kube-controller-manager, and a Node is marked out-of-service with this taint, the pods on the node will be forcefully deleted if there are no matching tolerations on it, and volume detach operations for the pods terminating on the node will happen immediately. This allows the Pods on the out-of-service node to recover quickly on a different node.
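Expressed on the Node object itself, that taint would look roughly like the sketch below; the value is illustrative, and only the key plus a NoExecute or NoSchedule effect matter:

    # Excerpt of a Node object's spec with the out-of-service taint applied.
    spec:
      taints:
      - key: node.kubernetes.io/out-of-service
        value: nodeshutdown    # illustrative value
        effect: NoExecute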
During a non-graceful shutdown, Pods are terminated in two phases:

1. Force delete the Pods that do not have matching out-of-service tolerations.
2. Immediately perform detach volume operation for such pods.

Note:
- Before adding the taint node.kubernetes.io/out-of-service, it should be verified that the node is already in shutdown or power off state (not in the middle of restarting).
- The user is required to manually remove the out-of-service taint after the pods are moved to a new node and the user has checked that the shutdown node has been recovered, since the user was the one who originally added the taint.

Swap memory management

FEATURE STATE: Kubernetes v1.28 [beta]

To enable swap on a node, the NodeSwap feature gate must be enabled on the kubelet, and the --fail-swap-on command line flag or failSwapOn configuration setting must be set to false.

Warning: When the memory swap feature is turned on, Kubernetes data such as the content of Secret objects that were written to tmpfs now could be swapped to disk.

A user can also optionally configure memorySwap.swapBehavior in order to specify how a node will use swap memory. For example:

    memorySwap:
      swapBehavior: UnlimitedSwap

- UnlimitedSwap (default): Kubernetes workloads can use as much swap memory as they request, up to the system limit.
- LimitedSwap: The utilization of swap memory by Kubernetes workloads is subject to limitations. Only Pods of Burstable QoS are permitted to employ swap.

If configuration for memorySwap is not specified and the feature gate is enabled, by default the kubelet will apply the same behaviour as the UnlimitedSwap setting.

With LimitedSwap, Pods that do not fall under the Burstable QoS classification (i.e. BestEffort / Guaranteed QoS Pods) are prohibited from utilizing swap memory. To maintain the aforementioned security and node health guarantees, these Pods are not permitted to use swap memory when LimitedSwap is in effect.

Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:

- nodeTotalMemory: The total amount of physical memory available on the node.
- totalPodsSwapAvailable: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
- containerMemoryRequest: The container's memory request.

Swap limitation is configured as: (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable.

It is important to note that, for containers within Burstable QoS Pods, it is possible to opt out of swap usage by specifying memory requests that are equal to memory limits. Containers configured in this manner will not have access to swap memory.
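A worked example of that formula under LimitedSwap, using assumed numbers (a 32 GiB node, 16 GiB of swap available to Pods, and a container requesting 4 GiB):

    # All values below are assumptions for illustration only.
    resources:
      requests:
        memory: 4Gi    # containerMemoryRequest
      limits:
        memory: 8Gi    # request != limit, so this Burstable container may use swap
    # swap limit = (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable
    #            = (4Gi / 32Gi) * 16Gi = 2Gi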
Swap is supported only with cgroup v2; cgroup v1 is not supported.

For more information, and to assist with testing and provide feedback, please see the blog post about Kubernetes 1.28: NodeSwap graduates to Beta, KEP-2400 and its design proposal.

What's next

Learn more about the following:

- Components that make up a node.
- API definition for Node.
- Node section of the architecture design document.
- Taints and Tolerations.
- Node Resource Managers.
- Resource Management for Windows nodes.

2 - Communication between Nodes and the Control Plane

This document catalogs the communication paths between the API server and the Kubernetes cluster. The intent is to allow users to customize their installation to harden the network configuration such that the cluster can be run on an untrusted network (or on fully public IPs on a cloud provider).

Node to Control Plane

Kubernetes has a "hub-and-spoke" API pattern. All API usage from nodes (or the pods they run) terminates at the API server. None of the other control plane components are designed to expose remote services. The API server is configured to listen for remote connections on a secure HTTPS port (typically 443) with one or more forms of client authentication enabled. One or more forms of authorization should be enabled, especially if anonymous requests or service account tokens are allowed.

Nodes should be provisioned with the public root certificate for the cluster such that they can connect securely to the API server along with valid client credentials. A good approach is that the client credentials provided to the kubelet are in the form of a client certificate. See kubelet TLS bootstrapping for automated provisioning of kubelet client certificates.

Pods that wish to connect to the API server can do so securely by leveraging a service account so that Kubernetes will automatically inject the public root certificate and a valid bearer token into the pod when it is instantiated. The kubernetes service (in the default namespace) is configured with a virtual IP address that is redirected (via kube-proxy) to the HTTPS endpoint on the API server.

The control plane components also communicate with the API server over the secure port.

As a result, the default operating mode for connections from the nodes and the pods running on the nodes to the control plane is secured by default and can run over untrusted and/or public networks.

Control plane to node

There are two primary communication paths from the control plane (the API server) to the nodes. The first is from the API server to the kubelet process which runs on each node in the cluster. The second is from the API server to any node, pod, or service through the API server's proxy functionality.

API server to kubelet

The connections from the API server to the kubelet are used for:

- Fetching logs for pods.
- Attaching (usually through kubectl) to running pods.
- Providing the kubelet's port-forwarding functionality.

These connections terminate at the kubelet's HTTPS endpoint. By default, the API server does not verify the kubelet's serving certificate, which makes the connection subject to man-in-the-middle attacks and unsafe to run over untrusted and/or public networks.

To verify this connection, use the --kubelet-certificate-authority flag to provide the API server with a root certificate bundle to use to verify the kubelet's serving certificate.
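For example, in a cluster where the API server runs as a static Pod, that flag might be added roughly as follows; the manifest excerpt and the CA bundle path are assumptions for illustration:

    # Excerpt from a hypothetical kube-apiserver static Pod manifest.
    command:
    - kube-apiserver
    - --kubelet-certificate-authority=/etc/kubernetes/pki/kubelet-ca.crt   # assumed path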
If that is not possible, use SSH tunneling between the API server and kubelet if required to avoid connecting over an untrusted or public network.

Finally, kubelet authentication and/or authorization should be enabled to secure the kubelet API.

API server to nodes, pods, and services

The connections from the API server to a node, pod, or service default to plain HTTP connections and are therefore neither authenticated nor encrypted. They can be run over a secure HTTPS connection by prefixing https: to the node, pod, or service name in the API URL, but they will not validate the certificate provided by the HTTPS endpoint nor provide client credentials. So while the connection will be encrypted, it will not provide any guarantees of integrity. These connections are not currently safe to run over untrusted or public networks.

SSH tunnels

Kubernetes supports SSH tunnels to protect the control plane to nodes communication paths. In this configuration, the API server initiates an SSH tunnel to each node in the cluster (connecting to the SSH server listening on port 22) and passes all traffic destined for a kubelet, node, pod, or service through the tunnel. This tunnel ensures that the traffic is not exposed outside of the network in which the nodes are running.

Note: SSH tunnels are currently deprecated, so you shouldn't opt to use them unless you know what you are doing. The Konnectivity service is a replacement for this communication channel.

Konnectivity service

FEATURE STATE: Kubernetes v1.18 [beta]

As a replacement to the SSH tunnels, the Konnectivity service provides a TCP level proxy for the control plane to cluster communication. The Konnectivity service consists of two parts: the Konnectivity server in the control plane network and the Konnectivity agents in the nodes network. The Konnectivity agents initiate connections to the Konnectivity server and maintain the network connections. After enabling the Konnectivity service, all control plane to nodes traffic goes through these connections.

Follow the Konnectivity service task to set up the Konnectivity service in your cluster.

What's next

- Read about the Kubernetes control plane components
- Learn more about the Hubs and Spoke model
- Learn how to Secure a Cluster
- Learn more about the Kubernetes API
- Set up the Konnectivity service
- Use Port Forwarding to Access Applications in a Cluster
- Learn how to Fetch logs for Pods, use kubectl port-forward

3 - Controllers

In robotics and automation, a control loop is a non-terminating loop that regulates the state of a system.

Here is one example of a control loop: a thermostat in a room. When you set the temperature, that's telling the thermostat about your desired state. The actual room temperature is the current state. The thermostat acts to bring the current state closer to the desired state, by turning equipment on or off.

In Kubernetes, controllers are control loops that watch the state of your cluster, then make or request changes where needed. Each controller tries to move the current cluster state closer to the desired state.

Controller pattern

A controller tracks at least one Kubernetes resource type. These objects have a spec field that represents the desired state. The controller(s) for that resource are responsible for making the current state come closer to that desired state.

The controller might carry the action out itself; more commonly, in Kubernetes, a controller will send messages to the API server that have useful side effects. You'll see examples of this below.

Control via API server

The Job controller is an example of a Kubernetes built-in controller. Built-in controllers manage state by interacting with the cluster API server.

Job is a Kubernetes resource that runs a Pod, or perhaps several Pods, to carry out a task and then stop.

(Once scheduled, Pod objects become part of the desired state for a kubelet.)
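For reference, a minimal Job manifest might look like the sketch below; the name, image, and command are illustrative:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi              # illustrative name
    spec:
      template:
        spec:
          containers:
          - name: pi
            image: perl:5.34.0
            # Compute pi to 2000 places as a one-off task, then exit.
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never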
When the Job controller sees a new task it makes sure that, somewhere in your cluster, the kubelets on a set of Nodes are running the right number of Pods to get the work done. The Job controller does not run any Pods or containers itself. Instead, the Job controller tells the API server to create or remove Pods. Other components in the control plane act on the new information (there are new Pods to schedule and run), and eventually the work is done.

After you create a new Job, the desired state is for that Job to be completed. The Job controller makes the current state for that Job be nearer to your desired state: creating Pods that do the work you wanted for that Job, so that the Job is closer to completion.

Controllers also update the objects that configure them. For example: once the work is done for a Job, the Job controller updates that Job object to mark it Finished. (This is a bit like how some thermostats turn a light off to indicate that your room is now at the temperature you set.)

Direct control

In contrast with Job, some controllers need to make changes to things outside of your cluster.

For example, if you use a control loop to make sure there are enough Nodes in your cluster, then that controller needs something outside the current cluster to set up new Nodes when needed.

Controllers that interact with external state find their desired state from the API server, then communicate directly with an external system to bring the current state closer in line.

(There actually is a controller that horizontally scales the nodes in your cluster.)

The important point here is that the controller makes some changes to bring about your desired state, and then reports the current state back to your cluster's API server. Other control loops can observe that reported data and take their own actions.

In the thermostat example, if the room is very cold then a different controller might also turn on a frost protection heater. With Kubernetes clusters, the control plane indirectly works with IP address management tools, storage services, cloud provider APIs, and other services by extending Kubernetes to implement that.

Desired versus current state

Kubernetes takes a cloud-native view of systems, and is able to handle constant change.

Your cluster could be changing at any point as work happens and control loops automatically fix failures. This means that, potentially, your cluster never reaches a stable state.

As long as the controllers for your cluster are running and able to make useful changes, it doesn't matter if the overall state is stable or not.

Design

As a tenet of its design, Kubernetes uses lots of controllers that each manage a particular aspect of cluster state. Most commonly, a particular control loop (controller) uses one kind of resource as its desired state, and has a different kind of resource that it manages to make that desired state happen. For example, a controller for Jobs tracks Job objects (to discover new work) and Pod objects (to run the Jobs, and then to see when the work is finished). In this case something else creates the Jobs, whereas the Job controller creates Pods.

It is useful to have simple controllers rather than one monolithic set of control loops that are interlinked. Controllers can fail, so Kubernetes is designed to allow for that.

Note: There can be several controllers that create or update the same kind of object. Behind the scenes, Kubernetes controllers make sure that they only pay attention to the resources linked to their controlling resource.

For example, you can have Deployments and Jobs; these both create Pods. The Job controller does not delete the Pods that your Deployment created, because there is information (labels) the controllers can use to tell those Pods apart.
Or, if you want, you can write a new controller yourself. You ‘can run your own controller as a set of Pods, or externally to Kubernetes. What fies best will depend on what that particular controller does, What's next + Read about the Kubernetes control plane * Discover some of the basic Kubernetes objects + Learn more about the Kubernetes AP! + Ifyou want to write your own controller, see Extension Patterns in Extending Kubernetes.4-Leases Distributed systems often have a need for leases, which provide a mechanism to lock shared resources and coordinate activity between members of a set, In Kubernetes, the lease concept is represented by Lease objects in the coordination. k8s.io API Group, which are used for system-critical capabilities such as node heartbeats and component-level leader election. Node heartbeats Kubernetes uses the Lease API to communicate kubelet node heartbeats, to the Kubernetes API server. For every Node , there isa Lease object with amatching name in the kube-node-tease namespace. Under the hood, every kubelet heartbeat is an update request to this Lease object, updating the spec.renewrine eld for the Lease. The Kubernetes control plane uses the time stamp of this feld to determine the availabilty ofthis Node See Node Lease objects for more details, Leader election Kubernetes also uses Leases to ensure only one instance of a component is running at any given time, This is used by control plane components like kebe-controller-nanager and kube-scheeuler in HA configurations, where ‘only one instance af the component should be actively running while the ‘other instances are on stand-by. API server identity FEATURE STATE: coho Starting in Kubernetes v1.26, each uve-apiserver uses the Lease API to publish its identity to the rest of the system. While not particularly useful ‘on its own, this provides a mechanism for clients to discover how many Instances of kube-apiservar are operating the Kubernetes control plane, Existence of kube-apiserver leases enables future capabilities that may require coordination between each kube-apiserver. You can inspect Leases owned by each kube-apiserver by checking for lease objects in the kube-systen namespace with the name kube- aptserver-csha2se-hash> . Alternatively you can use the label selector kubactl -n kube-systen get lease -2 apiserver.ubernetes. o/iaentit apiserver-7oe90061¢59436802ddaF1376e aptserver-7oe840616590368 apiserver-idfer752bc0366376276363868 apiserver-1dFer752bcb36627¢ ‘The SHA256 hash used in the lease name is based on the OS hostname as, seen by that API server. Each kube-apiserver should be configured to use a hostname that is unique within the cluster, New instances of kube-apiserver that use the same hostname will take aver existing Leases using anew holder identity, as opposed to instantiating new Lease objects. You ‘can check the hostname used by kube-apisever by checking the value of the kubernetes.io/rostnane label: kubectl -n kube-systen get lease apiserver-o7aseadbsbe72ctas#3d1c3762 apiVersion: coordination.kis.40/vi kind: Lease eneationtinestanp: *2623-07-07713:16:487 abel apiserver.kubernetes.do/identity: kube-apiserver kubernetes.do/hostrane: master-1 rane: apiserver-07as¢a9b90072¢495#361¢3702 anespace: kube-systen resourceversion: 7334899 wid: 99879205-1839-4523-p215-eddeess2aco% holdertdentity: apiserver-a7aseadssbo72¢4a5#361¢3702. 
The SHA256 hash used in the lease name is based on the OS hostname as seen by that API server. Each kube-apiserver should be configured to use a hostname that is unique within the cluster. New instances of kube-apiserver that use the same hostname will take over existing Leases using a new holder identity, as opposed to instantiating new Lease objects. You can check the hostname used by kube-apiserver by checking the value of the kubernetes.io/hostname label:

    kubectl -n kube-system get lease apiserver-<sha256-hash> -o yaml

    apiVersion: coordination.k8s.io/v1
    kind: Lease
    metadata:
      creationTimestamp: "2023-07-02T13:16:48Z"
      labels:
        apiserver.kubernetes.io/identity: kube-apiserver
        kubernetes.io/hostname: master-1
      name: apiserver-<sha256-hash>
      namespace: kube-system
      resourceVersion: "7334899"
      uid: <uid>
    spec:
      holderIdentity: apiserver-<sha256-hash>_<uuid>
      leaseDurationSeconds: 3600
      renewTime: "2023-07-04T21:58:48.065888Z"

Expired leases from kube-apiservers that no longer exist are garbage collected by new kube-apiservers after 1 hour.

You can disable API server identity leases by disabling the APIServerIdentity feature gate.

Workloads

Your own workload can define its own use of Leases. For example, you might run a custom controller where a primary or leader member performs operations that its peers do not. You define a Lease so that the controller replicas can select or elect a leader, using the Kubernetes API for coordination. If you do use a Lease, it's a good practice to define a name for the Lease that is obviously linked to the product or component. For example, if you have a component named Example Foo, use a Lease named example-foo.

If a cluster operator or another end user could deploy multiple instances of a component, select a name prefix and pick a mechanism (such as a hash of the name of the Deployment) to avoid name collisions for the Leases.

You can use another approach so long as it achieves the same outcome: different software products do not conflict with one another.
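A minimal sketch of such a Lease, reusing the hypothetical example-foo name from above; the namespace, holder identity, and duration are illustrative:

    apiVersion: coordination.k8s.io/v1
    kind: Lease
    metadata:
      name: example-foo        # name tied to the (hypothetical) Example Foo component
      namespace: example       # illustrative namespace
    spec:
      holderIdentity: example-foo-replica-1   # set by whichever replica currently holds the lease
      leaseDurationSeconds: 15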
5 - Cloud Controller Manager

FEATURE STATE: Kubernetes v1.11 [beta]

Cloud infrastructure technologies let you run Kubernetes on public, private, and hybrid clouds. Kubernetes believes in automated, API-driven infrastructure without tight coupling between components.

The cloud-controller-manager is a Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.

By decoupling the interoperability logic between Kubernetes and the underlying cloud infrastructure, the cloud-controller-manager component enables cloud providers to release features at a different pace compared to the main Kubernetes project.

The cloud-controller-manager is structured using a plugin mechanism that allows different cloud providers to integrate their platforms with Kubernetes.

Design

The cloud controller manager runs in the control plane as a replicated set of processes (usually, these are containers in Pods). Each cloud-controller-manager implements multiple controllers in a single process.

Note: You can also run the cloud controller manager as a Kubernetes addon rather than as part of the control plane.

Cloud controller manager functions

The controllers inside the cloud controller manager include:

Node controller

The node controller is responsible for updating Node objects when new servers are created in your cloud infrastructure. The node controller obtains information about the hosts running inside your tenancy with the cloud provider. The node controller performs the following functions:

1. Update a Node object with the corresponding server's unique identifier obtained from the cloud provider API.
2. Annotating and labelling the Node object with cloud-specific information, such as the region the node is deployed into and the resources (CPU, memory, etc) that it has available.
3. Obtain the node's hostname and network addresses.
4. Verifying the node's health. In case a node becomes unresponsive, this controller checks with your cloud provider's API to see if the server has been deactivated / deleted / terminated. If the node has been deleted from the cloud, the controller deletes the Node object from your Kubernetes cluster.

Some cloud provider implementations split this into a node controller and a separate node lifecycle controller.

Route controller

The route controller is responsible for configuring routes in the cloud appropriately so that containers on different nodes in your Kubernetes cluster can communicate with each other.

Depending on the cloud provider, the route controller might also allocate blocks of IP addresses for the Pod network.

Service controller

Services integrate with cloud infrastructure components such as managed load balancers, IP addresses, network packet filtering, and target health checking. The service controller interacts with your cloud provider's APIs to set up load balancers and other infrastructure components when you declare a Service resource that requires them.

Authorization

This section breaks down the access that the cloud controller manager requires on various API objects, in order to perform its operations.

Node controller

The node controller only works with Node objects. It requires full access to read and modify Node objects.

v1/Node:
- get
- list
- create
- update
- patch
- watch
- delete

Route controller

The route controller listens to Node object creation and configures routes appropriately. It requires Get access to Node objects.

v1/Node:
- get

Service controller

The service controller watches for Service object create, update and delete events and then configures Endpoints for those Services appropriately (for EndpointSlices, the kube-controller-manager manages these on demand).

To access Services, it requires list and watch access. To update Services, it requires patch and update access.

To set up Endpoints resources for the Services, it requires access to create, list, get, watch, and update.

v1/Service:
- list
- get
- watch
- patch
- update

Others

The implementation of the core of the cloud controller manager requires access to create Event objects, and to ensure secure operation, it requires access to create ServiceAccounts.

v1/Event:
- create
- patch
- update

v1/ServiceAccount:
- create

The RBAC ClusterRole for the cloud controller manager looks like:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cloud-controller-manager
    rules:
    - apiGroups:
      - ""
      resources:
      - events
      verbs:
      - create
      - patch
      - update
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - '*'
    - apiGroups:
      - ""
      resources:
      - nodes/status
      verbs:
      - patch
    - apiGroups:
      - ""
      resources:
      - services
      verbs:
      - list
      - patch
      - update
      - watch
    - apiGroups:
      - ""
      resources:
      - serviceaccounts
      verbs:
      - create
    - apiGroups:
      - ""
      resources:
      - persistentvolumes
      verbs:
      - get
      - list
      - update
      - watch
    - apiGroups:
      - ""
      resources:
      - endpoints
      verbs:
      - create
      - get
      - list
      - watch
      - update

What's next

- Cloud Controller Manager Administration has instructions on running and managing the cloud controller manager.
- To upgrade a HA control plane to use the cloud controller manager, see Migrate Replicated Control Plane To Use Cloud Controller Manager.
- Want to know how to implement your own cloud controller manager, or extend an existing project?
  - The cloud controller manager uses Go interfaces, specifically, the CloudProvider interface defined in cloud.go from kubernetes/cloud-provider, to allow implementations from any cloud to be plugged in.
  - The implementation of the shared controllers highlighted in this document (Node, Route, and Service), and some scaffolding along with the shared cloudprovider interface, is part of the Kubernetes core. Implementations specific to cloud providers are outside the core of Kubernetes and implement the CloudProvider interface.
  - For more information about developing plugins, see Developing Cloud Controller Manager.
6 - About cgroup v2

On Linux, control groups constrain resources that are allocated to processes.

The kubelet and the underlying container runtime need to interface with cgroups to enforce resource management for pods and containers, which includes cpu/memory requests and limits for containerized workloads.

There are two versions of cgroups in Linux: cgroup v1 and cgroup v2. cgroup v2 is the new generation of the cgroup API.

What is cgroup v2?

FEATURE STATE: Kubernetes v1.25 [stable]

cgroup v2 is the next version of the Linux cgroup API. cgroup v2 provides a unified control system with enhanced resource management capabilities.

cgroup v2 offers several improvements over cgroup v1, such as the following:

- Single unified hierarchy design in API
- Safer sub-tree delegation to containers
- Newer features like Pressure Stall Information
- Enhanced resource allocation management and isolation across multiple resources
  - Unified accounting for different types of memory allocations (network memory, kernel memory, etc)
  - Accounting for non-immediate resource changes such as page cache write backs

Some Kubernetes features exclusively use cgroup v2 for enhanced resource management and isolation. For example, the MemoryQoS feature improves memory QoS and relies on cgroup v2 primitives.

Using cgroup v2

The recommended way to use cgroup v2 is to use a Linux distribution that enables and uses cgroup v2 by default.

To check if your distribution uses cgroup v2, refer to Identify cgroup version on Linux nodes.

Requirements

cgroup v2 has the following requirements:

- OS distribution enables cgroup v2
- Linux Kernel version is 5.8 or later
- Container runtime supports cgroup v2. For example:
  - containerd v1.4 and later
  - cri-o v1.20 and later
- The kubelet and the container runtime are configured to use the systemd cgroup driver (see the sketch below)
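On the kubelet side, that last requirement is typically a single configuration field; a minimal sketch (the container runtime has its own, runtime-specific setting for the same driver):

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Use the systemd cgroup driver; this must match the container runtime's configuration.
    cgroupDriver: systemd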
Linux Distribution cgroup v2 support

For a list of Linux distributions that use cgroup v2, refer to the cgroup v2 documentation:

- Container Optimized OS (since M97)
- Ubuntu (since 21.10, 22.04+ recommended)
- Debian GNU/Linux (since Debian 11 bullseye)
- Fedora (since 31)
- Arch Linux (since April 2021)
- RHEL and RHEL-like distributions (since 9)

To check if your distribution is using cgroup v2, refer to your distribution's documentation or follow the instructions in Identify the cgroup version on Linux nodes.

You can also enable cgroup v2 manually on your Linux distribution by modifying the kernel cmdline boot arguments. If your distribution uses GRUB, systemd.unified_cgroup_hierarchy=1 should be added in GRUB_CMDLINE_LINUX under /etc/default/grub, followed by sudo update-grub. However, the recommended approach is to use a distribution that already enables cgroup v2 by default.

Migrating to cgroup v2

To migrate to cgroup v2, ensure that you meet the requirements, then upgrade to a kernel version that enables cgroup v2 by default.

The kubelet automatically detects that the OS is running on cgroup v2 and performs accordingly with no additional configuration required.

There should not be any noticeable difference in the user experience when switching to cgroup v2, unless users are accessing the cgroup file system directly, either on the node or from within the containers.

cgroup v2 uses a different API than cgroup v1, so if there are any applications that directly access the cgroup file system, they need to be updated to newer versions that support cgroup v2. For example:

- Some third-party monitoring and security agents may depend on the cgroup filesystem. Update these agents to versions that support cgroup v2.
- If you run cAdvisor as a stand-alone DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
- If you deploy Java applications, prefer to use versions which fully support cgroup v2:
  - OpenJDK / HotSpot: jdk8u372, 11.0.16, 15 and later
  - IBM Semeru Runtimes: jdk8u345-b01, 11.0.16.0, 17.0.4.0, 18.0.2.0 and later
  - IBM Java: 8.0.7.15 and later
- If you are using the uber-go/automaxprocs package, make sure the version you use is v1.5.1 or higher.

Identify the cgroup version on Linux Nodes

The cgroup version depends on the Linux distribution being used and the default cgroup version configured on the OS. To check which cgroup version your distribution uses, run the stat -fc %T /sys/fs/cgroup/ command on the node:

    stat -fc %T /sys/fs/cgroup/

For cgroup v2, the output is cgroup2fs. For cgroup v1, the output is tmpfs.