
Chapter 4

Infrastructural Components for MLOps Applications

4.1 Infrastructure as Code (IaC)

Infrastructure as Code (IaC) is a modern approach to managing and provisioning infrastructure using code rather
than manual processes. It enables the automation of infrastructure setup and management by defining resources like
servers, networks, and storage in machine-readable configuration files or scripts. IaC helps ensure consistency,
scalability, and faster deployment of infrastructure.

Here’s a brief overview of its core concepts:

1. Definition: IaC allows teams to define and manage their infrastructure using configuration files or scripts.
This includes servers, networks, storage, and other components needed to run applications.
2. Version Control: Just like application code, infrastructure code can be versioned using tools like Git. This
allows teams to track changes, collaborate effectively, and roll back to previous configurations if needed.
3. Automation: IaC automates the setup and management of infrastructure, reducing the potential for human
error. Automated provisioning tools, such as Terraform, AWS CloudFormation, and Ansible, allow teams
to deploy infrastructure quickly and consistently.
4. Consistency: Since infrastructure is defined in code, it can be replicated across different environments
(development, testing, production) with consistent configurations. This reduces discrepancies and
configuration drift.
5. Scalability: IaC enables dynamic scaling of infrastructure resources based on demand. Automated scripts
can quickly spin up or down resources, ensuring optimal performance.
6. Testing: Infrastructure can be tested in the same way as application code, enabling teams to validate
configurations before deploying them in production.

Other Essential Concepts

1. Declarative vs. Imperative
o Declarative: You define the desired state of the infrastructure, and the IaC tool ensures it is achieved. Example: Terraform, AWS CloudFormation.
o Imperative: You define the exact steps to achieve the desired state (see the playbook sketch after this list). Example: Ansible playbooks.
2. Version Control
o Infrastructure definitions are stored as code in repositories like Git, enabling tracking of changes
and rollback capabilities.
3. Idempotency
o Applying the same IaC script multiple times results in the same infrastructure state, ensuring
predictable and consistent outcomes.
4. Automation and Orchestration
o IaC automates resource provisioning and manages dependencies between different components of
the infrastructure.
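
The declarative style is illustrated by the Terraform example in the next subsection. As a contrast, below is a minimal Ansible playbook sketch of the imperative style, where each task is an explicit step the tool executes toward the desired state. The webservers host group and the nginx package are illustrative assumptions, not values taken from this chapter.

# Minimal Ansible playbook sketch (imperative style); host group and package are illustrative.
- name: Configure a basic web server
  hosts: webservers          # assumed inventory group
  become: true               # run tasks with elevated privileges
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true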

How IaC Works


1. Define Infrastructure: Write configurations in a file (e.g., Terraform .tf files, YAML, or JSON).
Example of an AWS EC2 instance configuration in Terraform:

resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

2. Provision Infrastructure: Use an IaC tool to execute the configuration and deploy the defined resources.
3. Manage Changes: Modify the code and reapply it to update the infrastructure. The tool determines the
changes and applies only the necessary updates.
4. Monitor State: IaC tools maintain a state file to track the current state of the infrastructure.

Benefits of IaC

1. Speed and Agility
o Automates the provisioning process, significantly reducing setup time and enabling rapid deployment with minimal manual intervention.
o Enables rapid scaling of resources in response to demand.
2. Consistency and Reliability
o Eliminates manual configuration errors by defining infrastructure as code.
o Ensures consistent environments across development, staging, and production.
3. Versioning and Auditability
o Infrastructure changes are tracked in version control systems, making rollbacks and auditing easy.
4. Cost-Efficiency
o Reduces operational overhead and lowers costs by automating repetitive tasks and resource management, and by reducing errors.
o Optimizes resource allocation through automation.
5. Improved Collaboration
o Development and operations teams can collaborate on infrastructure definitions using tools like Git, enabling seamless integration with CI/CD pipelines.
6. Enhanced Security
o Implements security best practices through code, ensuring consistent configurations.

Challenges of IaC

1. Learning Curve
o Teams need to learn new tools, languages, and frameworks for managing infrastructure as code.
2. Complexity
o Large-scale deployments can lead to complex configurations and state management.
3. State Management
o Managing state files in shared environments requires careful handling to avoid conflicts or data
loss.
4. Security
o Sensitive information (e.g., API keys) in IaC scripts needs to be managed securely.

Popular IaC Tools

Tool | Description
Terraform | Open-source tool for multi-cloud environments; uses a declarative approach.
AWS CloudFormation | Native AWS IaC tool for managing AWS resources.
Ansible | Configuration management and IaC tool that uses an imperative approach.
Pulumi | Supports multiple general-purpose languages (e.g., Python, TypeScript) for defining infrastructure.
Chef | Configuration management tool, often used for infrastructure automation.
SaltStack | Tool for event-driven infrastructure automation and management.
Google Deployment Manager | Native IaC tool for Google Cloud Platform (GCP).

IaC in DevOps

IaC plays a critical role in DevOps by enabling:

1. Automation of Infrastructure Deployment: Reducing manual intervention in setting up environments.


2. Integration with CI/CD Pipelines: Automatically provisioning and configuring infrastructure for
applications.
3. Environment Consistency: Ensuring that developers, testers, and production teams work on identical
environments.

4.2 Azure DevOps

Azure DevOps is a cloud-based set of development tools and services provided by Microsoft that supports the entire software development lifecycle. It offers a comprehensive platform for planning, developing, testing, and deploying applications, making it easier for teams to collaborate and deliver high-quality software quickly. It supports DevOps practices by providing integrated solutions for Continuous Integration (CI), Continuous Delivery (CD), and infrastructure automation, enabling teams to improve collaboration and streamline software delivery.

Key Components of Azure DevOps

1. Azure Boards:
o Purpose: Project management and tracking.
o Features: Provides tools for planning and tracking work using Kanban boards, backlogs, and
sprint planning tools. It helps teams manage their tasks and visualize progress.
2. Azure Repos:
o Purpose: Source code management.
o Features: Offers Git repositories or Team Foundation Version Control (TFVC) for version
control. Teams can collaborate on code, conduct code reviews, and manage branches.
3. Azure Pipelines:
o Purpose: Continuous integration and continuous delivery (CI/CD).
o Features: Automates the building, testing, and deployment of applications. It supports multiple languages and platforms and can deploy to various environments, including Azure, AWS, and on-premises servers.
4. Azure Test Plans:
o Purpose: Testing and quality assurance.
o Features: Provides tools for manual and exploratory testing, automated test management, and
integration with CI/CD pipelines to ensure code quality.
5. Azure Artifacts:
o Purpose: Package management.
o Features: Allows teams to create, host, and share packages (e.g., NuGet, npm, Maven) from
public and private sources, facilitating dependency management.

Key Features of Azure DevOps

1. Azure Boards
o Provides agile planning, tracking, and project management tools.
o Features include Kanban boards, Scrum support, and customizable dashboards.
2. Azure Repos
o A Git-based source control system for managing code.
o Offers pull requests, code reviews, and branch policies to ensure high-quality code.
3. Azure Pipelines
o CI/CD pipelines for building, testing, and deploying applications.
o Supports multiple languages, platforms, and cloud environments, including on-premises and
multi-cloud.
4. Azure Test Plans
o Provides tools for automated and manual testing.
o Features include test case management, exploratory testing, and actionable insights.
5. Azure Artifacts
o A package management system for managing dependencies.
o Supports multiple formats like Maven, NuGet, and npm.
6. Integration with Other Tools
o Seamless integration with GitHub, Jenkins, Docker, Kubernetes, and other third-party tools.
7. Cloud and On-Premises Support
o Available as a cloud service (Azure DevOps Services) or as an on-premises solution (Azure
DevOps Server).

Azure DevOps Workflow

1. Plan:
o Use Azure Boards to define work items, prioritize tasks, and plan sprints.
2. Develop:
o Manage source code in Azure Repos or integrate with GitHub.
o Collaborate using pull requests and branch policies.
3. Build & Test:
o Automate builds and run tests using Azure Pipelines.
o Incorporate unit tests, integration tests, and security scans.
4. Release:
o Automate deployments to staging and production environments using Azure Pipelines.
o Enable rollback for failed deployments.
5. Monitor:
o Use monitoring tools like Azure Monitor or integrate with other observability tools for
performance tracking.
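
In practice, the Build & Test and Release steps above are described in an azure-pipelines.yml file stored in the repository. The following is a minimal sketch, assuming a Microsoft-hosted Ubuntu agent, a main branch trigger, and placeholder script steps; it is not a prescribed configuration.

# azure-pipelines.yml - minimal CI/CD sketch (branch, pool, and steps are illustrative)
trigger:
  branches:
    include:
      - main                       # assumed default branch

pool:
  vmImage: 'ubuntu-latest'         # Microsoft-hosted agent

stages:
  - stage: Build_and_Test
    jobs:
      - job: Build
        steps:
          - script: echo "Restore dependencies and build the application"
            displayName: 'Build'
          - script: echo "Run unit tests, integration tests, and security scans"
            displayName: 'Test'

  - stage: Release
    dependsOn: Build_and_Test
    jobs:
      - job: Deploy
        steps:
          - script: echo "Deploy the build artifact to the staging/production environment"
            displayName: 'Deploy'

A real pipeline would replace the echo placeholders with build, test, and deployment tasks for the chosen language and target environment.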

Benefits of Azure DevOps

1. End-to-End Solution:
o Covers the entire software development lifecycle, from planning to deployment.
2. Platform Agnostic:
o Supports development for any language, platform, or cloud (e.g., AWS, Google Cloud).
3. Scalable:
o Suitable for small teams as well as large enterprises.
4. Seamless Collaboration:
o Helps teams collaborate effectively with integrated tools.
5. Extensibility:
o Offers a vast marketplace of extensions for additional functionalities.
6. Secure and Reliable:
o Built with enterprise-grade security and reliability.

4.3 Containerization

Containerization is a software deployment approach that packages an application's code, together with all the files, libraries, and dependencies it needs to run, into a standardized unit called a container that can run on any infrastructure. This ensures that the application runs consistently across different environments, whether on a developer's machine, a testing server, or in production. Containers are lightweight and isolated, and because they share the host operating system's kernel they are more efficient than traditional virtual machines.

• Docker Container

A Docker container is a lightweight, portable, and isolated environment that packages an application along with its
dependencies, libraries, and configuration files. Containers enable developers to build, test, and deploy applications
consistently across different environments, such as development, staging, and production.

Key Concepts of Docker Containers

1. Isolation:
Containers isolate applications and their dependencies from the host system and other containers, ensuring
that one container does not interfere with another.
2. Portability:
Containers can run consistently across any environment that has Docker installed, whether it’s on a
developer's laptop, a test server, or in the cloud.
3. Lightweight:
Unlike virtual machines, containers share the host OS kernel, making them smaller and faster to start.
4. Immutability:
Containers are based on images, which are read-only templates. This ensures that the application inside a
container remains consistent across deployments.

Key Components of Docker Containers

1. Image:
A blueprint for a container, defining the application and its dependencies.
2. Container:
A running instance of an image.
3. Docker Engine:
The runtime environment that manages containers.
4. Volumes:
Storage mechanisms for persisting data outside the container lifecycle.
5. Network:
Docker allows containers to communicate with each other or with external systems through networking
configurations.

• Docker Containers vs Virtual Machines

Aspect | Docker Containers | Virtual Machines
Isolation | Process-level isolation; shares the host OS kernel | Full OS isolation; each VM includes its own OS kernel
Resource Usage | Lightweight; uses fewer resources | Heavyweight; higher resource usage
Startup Time | Almost instant | Can take minutes
Portability | Highly portable | Less portable; tied to specific hypervisors
Performance | Near-native performance | Overhead due to the hypervisor

• Common Docker Commands

Command | Description
docker run | Create and run a new container.
docker ps | List running containers.
docker stop <container_id> | Stop a running container.
docker rm <container_id> | Remove a stopped container.
docker images | List available Docker images.
docker build -t <name> . | Build a Docker image from a Dockerfile.
docker exec -it <container_id> <command> | Run a command inside a running container.
docker logs <container_id> | View the logs of a container.
docker-compose up | Start multiple containers defined in a docker-compose.yml file (see the sketch below).
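
As referenced by the docker-compose up command in the table, multiple containers can be described in a single docker-compose.yml file. Below is a minimal sketch with a web container and a database container; the images, port mapping, password, and volume name are illustrative assumptions.

# docker-compose.yml - minimal two-container sketch (images and values are illustrative)
services:
  web:
    image: nginx:latest            # assumed web/application image
    ports:
      - "8080:80"                  # host port 8080 -> container port 80
    depends_on:
      - db
  db:
    image: postgres:16             # assumed database image
    environment:
      POSTGRES_PASSWORD: example   # in practice, inject secrets from a secure store
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume persisting data beyond the container lifecycle

volumes:
  db-data:

Running docker-compose up in the directory containing this file creates and starts both containers; docker-compose down stops and removes them.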

• Container Lifecycle

The lifecycle of a container typically involves several stages, which can be managed using commands in containerization tools like Docker. Here are the key stages of the container lifecycle along with the relevant commands:

1. Image Creation:

o Command: docker build


o Description: This command creates a container image from a Dockerfile, which contains
instructions on how to build the image (e.g., base image, dependencies, configuration).

2. Image Listing:

o Command: docker images

o Description: This command lists all the images available on the local machine.

3. Container Creation:

o Command: docker create

o Description: This command creates a new container from an image but does not start it
immediately.

4. Container Starting:

o Command: docker run

o Description: This command creates and starts a container in one step (equivalent to docker create followed by docker start). It can also include options to allocate resources, set environment variables, or bind ports.

5. Container Management:

o Starting: docker start [container_id]

o Stopping: docker stop [container_id]

o Restarting: docker restart [container_id]

o Description: These commands manage the running state of a container, allowing you to start, stop,
or restart it as needed.

6. Container Monitoring:

o Command: docker logs [container_id]

o Description: This command retrieves logs from a running or stopped container, helping in
monitoring and debugging.

7. Container Interaction:

o Command: docker exec -it [container_id] /bin/bash

o Description: This command allows you to run commands inside a running container interactively.

8. Container Stopping:

o Command: docker stop [container_id]

o Description: This stops a running container gracefully.

9. Container Removal:

o Command: docker rm [container_id]

o Description: This command removes a stopped container from the system.


10. Image Removal:

o Command: docker rmi [image_id]

o Description: This removes an image from the local system.

11. Container Cleanup:

o Command: docker system prune

o Description: This command removes all stopped containers, unused networks, dangling images,
and build cache, helping to free up disk space.

There are several advantages to containerized machine learning applications, including:

• Increased portability and flexibility

Containers can be moved between different environments, making it easy to test and deploy machine learning applications in various settings.

• Improved resource utilization

Containers allow for more efficient use of resources, for example by running multiple applications on a single server or cluster of servers.

• Isolation and security

Containers isolate applications from each other and from the underlying operating system, providing an additional layer of security.

• Reduced development and deployment time

Containers are quick to create and deploy, making it possible to iterate rapidly on machine learning applications.

4.4 Kubernetes and Its Architecture

Kubernetes is an open-source platform that helps you manage, scale, and deploy containerized applications.
Originally developed by Google, it provides a framework to run distributed systems resiliently. Key features of
Kubernetes include:

1. Container Management: Kubernetes can manage containers, ensuring they are running, healthy, and
properly configured.
2. Scaling and Load Balancing: It can automatically scale applications up or down based on demand and
distribute network traffic to maintain performance.
3. Self-Healing: Kubernetes automatically replaces or restarts containers that fail or become unresponsive.
4. Service Discovery and Load Balancing: It can expose a container to the internet or other containers using
a single DNS name or IP address.
5. Automated Rollouts and Rollbacks: Kubernetes can manage the deployment of new versions of
applications, allowing for smooth updates and the ability to revert if issues arise.
6. Configuration Management: It allows for the management of configuration settings, secrets, and
environment variables separately from the application code.

Architecture of Kubernetes

Kubernetes follows a master-worker architecture.

• Master Node: The master node is the control plane of Kubernetes. It makes global decisions about the cluster (like scheduling), and it detects and responds to cluster events (like starting up a new pod when a deployment's replicas field is unsatisfied).

• Worker Nodes: Worker nodes are the machines where your applications run. Each worker node runs at least:
- The kubelet, a process responsible for communication between the Kubernetes master and the node; it manages the pods and the containers running on the machine.
- A container runtime (such as Docker or containerd), which is responsible for pulling the container image from a registry, unpacking the container, and running the application.

The master node communicates with the worker nodes and schedules pods to run on specific nodes.

Here are the main components of Kubernetes:

[1] Pods: A Pod is the smallest and simplest unit in the Kubernetes object model that you create or deploy. A Pod represents a single instance of a running process on your cluster and encapsulates one or more tightly coupled containers that share the same network namespace, storage, and configuration.

Key Features of Kubernetes Pods

a) Multi-Container Support:
o A pod can host multiple containers that work together as a single unit.
o Commonly used for scenarios like a primary application container and a sidecar container (e.g., logging or
monitoring).
b) Shared Resources:
o Network: Containers within a pod share the same IP address and network namespace, allowing them to
communicate via localhost.
o Storage: Pods can share volumes for persistent or ephemeral storage needs.
c) Ephemeral Nature:
o Pods are designed to be short-lived and are replaced if they fail. For long-term persistence, higher-level
resources like Deployments or StatefulSets manage pods.

Structure of a Pod

A pod consists of:

a) Containers:
o The main applications and supporting processes.
b) Storage Volumes:
o Shared storage for data persistence.
c) Networking:
o Shared network namespace for communication.
d) Configuration:
o Environment variables, secrets, and configuration maps injected into the pod.
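
A minimal Pod manifest sketch tying these parts together is shown below; the pod name, image, environment variable, and volume are illustrative assumptions.

# pod.yaml - minimal Pod sketch (names, image, and values are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app                    # main application container
      image: nginx:latest          # assumed container image
      ports:
        - containerPort: 80
      env:
        - name: APP_ENV            # configuration injected as an environment variable
          value: "production"
      volumeMounts:
        - name: shared-data        # shared storage mounted into the container
          mountPath: /usr/share/nginx/html
  volumes:
    - name: shared-data
      emptyDir: {}                 # ephemeral volume shared by all containers in the pod

It can be created with kubectl apply -f pod.yaml; in practice, Pods are usually managed indirectly through higher-level resources such as Deployments, as noted above.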

Pod Lifecycle

a) Pending:
o The pod has been created but not yet scheduled on a node.
b) Running:
o The pod has been scheduled and at least one container is running.
c) Succeeded:
o All containers in the pod have completed successfully.
d) Failed:
o At least one container in the pod has terminated with a failure.
e) Unknown:
o The pod's state cannot be determined, typically due to communication issues with the node.

[2] Services: A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to access them, sometimes called a micro-service (see the manifest sketches after this list).

[3] Volumes: A Volume is essentially a directory accessible to all containers running in a pod. It can be used
to store data and the state of applications.

[4] Namespaces: Namespaces are a way to divide cluster resources between multiple users or teams. They provide a scope for resource names within the cluster.

[5] Deployments: A Deployment controller provides declarative updates for Pods and ReplicaSets. You
describe a desired state in a Deployment, and the Deployment controller changes the actual state to the
desired state at a controlled rate.
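
Minimal manifest sketches for a Deployment (item [5]) and the Service (item [2]) that exposes its Pods are shown below; the names, labels, replica count, image, and ports are illustrative assumptions rather than values from the original text.

# deployment.yaml - minimal Deployment sketch (illustrative values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3                      # desired number of identical Pods
  selector:
    matchLabels:
      app: example
  template:                        # Pod template used to create the replicas
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: nginx:latest      # assumed container image
          ports:
            - containerPort: 80

# service.yaml - minimal Service sketch routing traffic to the Pods above
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example                   # matches the label on the Deployment's Pods
  ports:
    - port: 80                     # port exposed by the Service
      targetPort: 80               # port the container listens on
  type: ClusterIP                  # internal access; use NodePort or LoadBalancer to expose externally

Applying the Deployment with kubectl apply -f deployment.yaml and later changing the replicas field illustrates the declarative behaviour described above: the Deployment controller reconciles the actual state with the desired state.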

A) Master Components
In Kubernetes, the master components make global decisions about the cluster, and they detect and respond to
cluster events. Let’s discuss each of these components in detail.
API Server

The API Server is the front end of the Kubernetes control plane. It exposes the Kubernetes API, which is used by users and cluster components to perform operations on the cluster. The API Server processes REST operations, validates them, and updates the corresponding objects in etcd.

etcd

etcd is a consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. It is a database that stores the configuration information of the Kubernetes cluster, representing the state of the cluster at any given point in time. If any part of the cluster changes, etcd is updated with the new state.

Scheduler

The Scheduler is a component of the Kubernetes master that is responsible for selecting the best node for the pod to
run on. When a pod is created, the scheduler decides which node to run it on based on resource availability,
constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.

Controller Manager

The Controller Manager is a daemon that embeds the core control loops shipped with Kubernetes. In other words, it
regulates the state of the cluster and performs routine tasks to maintain the desired state. For example, if a pod goes
down, the Controller Manager will notice this and start a new pod to maintain the desired number of pods.

B) Node Components
Kubernetes worker nodes host the pods that are the components of the application workload. The key components of
a worker node include the Kubelet, the main Kubernetes agent on the node, the Kube-proxy, the network proxy, and
the container runtime, which runs the containers. Let’s discuss them in detail.

Kubelet

Kubelet is the primary "node agent" that runs on each node. Its main job is to ensure that containers are running in a Pod. It watches for instructions from the Kubernetes control plane (the master components) in the form of PodSpecs (YAML or JSON files describing a pod) and ensures that the containers described in those PodSpecs are running and healthy.

Kube-proxy

Kube-proxy is a network proxy that runs on each node in the cluster, implementing part of the Kubernetes Service
concept. It maintains network rules that allow network communication to your Pods from network sessions inside or
outside of your cluster.
Kube-proxy ensures that the networking environment (routing and forwarding) is predictable and accessible, but
isolated where necessary.

Container Runtime

Container runtime is the software responsible for running containers. Kubernetes supports several container
runtimes, including Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime
Interface). Each runtime offers different features, but all must be able to run containers according to a specification
provided by Kubernetes.

4.5 Microservices

Microservices is an architectural style where a large application is broken down into smaller, independent, and
loosely coupled services, each responsible for a specific business function or capability. These services
communicate with each other via APIs or messaging protocols and are typically developed, deployed, and
maintained independently.

Key Characteristics of Microservices

1. Independent Deployment: Each microservice can be developed, deployed, and scaled independently,
without affecting the rest of the system.
2. Domain-Driven Design (DDD): Microservices are often built around business domains, where each
service handles a specific business function (e.g., user management, payment processing).
3. Decentralized Data Management: Each service manages its own database or data storage, ensuring that
data is isolated and avoids cross-service dependencies.
4. Loose Coupling: Microservices communicate with each other through well-defined APIs, typically using
HTTP/REST or messaging queues, and can be changed independently without disrupting other services.
5. Technology Agnostic: Microservices can be written in different programming languages and use different
technologies that best suit their individual needs.
6. Fault Isolation: Since services are independent, failures in one service do not directly affect others, making
the system more resilient.
7. Scalability: Individual services can be scaled independently based on demand, leading to better resource optimization.

Advantages of Microservices

1. Flexibility and Agility:


o Teams can develop, test, and deploy microservices independently, promoting faster delivery and
iteration.
o Enables easier updates and modification of specific services without needing to redeploy the entire
system.
2. Scalability:
o Each microservice can be scaled independently, which means resources can be allocated based on
the specific needs of each service.
3. Resilience:
o Fault isolation means that failures in one service are less likely to impact others, leading to a more
robust application overall.
4. Technology Diversity:
o Microservices can use the best-suited technology stack for each service. For example, one service
might use Java, while another uses Python, and a third uses Node.js.
5. Easier Maintenance:
o Smaller, more focused codebases are easier to understand, maintain, and update.
o Helps development teams focus on specific areas of the system.
6. DevOps and Continuous Delivery:
o Microservices work well with modern DevOps practices, enabling continuous integration,
automated testing, and continuous deployment pipelines.

Challenges of Microservices

1. Complexity:
o Managing multiple microservices can introduce significant complexity in terms of deployment,
communication, and monitoring.
o Service discovery, inter-service communication, and data consistency are non-trivial problems.
2. Distributed Systems Issues:
o Network latency, message passing, and fault tolerance become critical issues in distributed
architectures.
o Microservices are prone to problems around network congestion, time synchronization, and data consistency.
3. Data Management:
o Each service typically manages its own database, which can lead to challenges with data
consistency, transactions, and inter-service queries.
4. Testing:
o Testing in a microservices environment can be complex because you need to test each individual
service as well as interactions between services.
5. Monitoring and Debugging:
o With multiple services running independently, logging, monitoring, and tracing become more
challenging, especially when trying to understand end-to-end workflows.
6. Overhead:
o Each service introduces its own overhead in terms of network communication, versioning, and
API maintenance.

4.6 Orchestration

Orchestration refers to the automated coordination, management, and arrangement of complex systems,
applications, or services. It ensures that various components in a system work together seamlessly to achieve
specific objectives. In the context of IT and software development, orchestration is often used to manage
containerized applications, workflows, or cloud resources.

Key Characteristics of Orchestration

1. Automation:
o Eliminates manual intervention by automating processes, reducing errors, and improving
efficiency.
2. Coordination:
o Ensures that interdependent tasks are executed in the correct sequence and at the right time.
3. Scalability:
o Allows systems to adapt to changing workloads by dynamically scaling resources up or down.
4. Visibility:
o Provides insights into system performance and workflows through monitoring and logging.

Types of Orchestration
1. Container Orchestration:
o Manages containerized applications across a cluster of machines.
o Example tools: Kubernetes, Docker Swarm, Apache Mesos.
2. Cloud Orchestration:
o Automates the management of cloud resources like VMs, storage, and networking.
o Example tools: AWS CloudFormation, Azure Resource Manager, Terraform.
3. Workflow Orchestration:
o Coordinates the execution of tasks in business or data workflows (see the sketch after this list).
o Example tools: Apache Airflow, Apache NiFi, Argo Workflows.
4. Service Orchestration:
o Coordinates multiple microservices to provide a unified application experience.
o Example tools: Istio, Consul, Linkerd.
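
To make workflow orchestration concrete, below is a minimal sketch of an Argo Workflows manifest (one of the tools listed above) that runs two steps in sequence; the workflow name, step names, messages, and busybox image are illustrative assumptions.

# workflow.yaml - minimal Argo Workflows sketch (names and image are illustrative)
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-pipeline-   # Argo appends a random suffix to the name
spec:
  entrypoint: main
  templates:
    - name: main
      steps:                        # each outer list item runs after the previous one completes
        - - name: extract
            template: run-step
            arguments:
              parameters:
                - name: message
                  value: "extract data"
        - - name: train
            template: run-step
            arguments:
              parameters:
                - name: message
                  value: "train model"
    - name: run-step
      inputs:
        parameters:
          - name: message
      container:
        image: busybox              # assumed placeholder image
        command: [echo]
        args: ["{{inputs.parameters.message}}"]

The orchestrator (here the Argo workflow controller) handles the sequencing, retries, and monitoring of the steps, which is the coordination role described above.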

Benefits of Orchestration

1. Efficiency:
o Automates repetitive tasks, saving time and reducing errors.
2. Scalability:
o Handles increased workloads without manual intervention.
3. Cost Optimization:
o Ensures optimal resource utilization, reducing costs.
4. Improved Collaboration:
o Provides a unified framework for teams to manage complex systems.
5. Resilience:
o Enhances system reliability through automated recovery and scaling.

Challenges of Orchestration

1. Complexity:
o Orchestrating multiple components can be challenging in large-scale systems.
2. Tooling:
o Choosing the right orchestration tool for specific use cases requires careful consideration.
3. Security:
o Ensuring secure communication and compliance in orchestrated systems is critical.
4. Monitoring and Debugging:
o Debugging orchestrated workflows and maintaining visibility can be difficult.

---------------------------------------------------------------------------------------------------------------------

Some Important Questions

• Explain Infrastructure as Code (IaC) in detail
• Write a short note on Azure DevOps
• What is Containerization? Explain the lifecycle of containers with its commands
• Write a short note on Docker containers
• What is Kubernetes? Explain its architectural components in brief with a diagram
• Write a short note on Kubernetes Pods
• Write a short note on Microservices
• Write a short note on Orchestration
