
CI-CD Pipeline With Project Deployment


CI/CD Pipeline

Trainer: Ms. Nidhi Grover Raheja


What is CI/CD?
• A CI/CD pipeline (Continuous Integration/Continuous Delivery or
Continuous Deployment) is a series of automated processes that help
software development teams deliver high-quality software more
efficiently.
• The pipeline automates steps in the software delivery process,
allowing for continuous code integration, testing, and deployment.
• The components of a CI/CD pipeline work together to automate the
software delivery process, from code integration to deployment and
monitoring.
• Each component is critical in ensuring that software is delivered
quickly, reliably, and securely, allowing development teams to focus
on building features and improving the product.
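The core idea of a pipeline can be sketched in a few lines of Python: an ordered series of automated steps in which a failure stops everything downstream. This is a hypothetical illustration, not part of the project that follows. `run_pipeline` and the stage names are made up for the example, and each stage is assumed to be a function that returns True on success.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        print(f"[pipeline] running {name}...")
        if not action():
            print(f"[pipeline] {name} FAILED; skipping remaining stages")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages: the test stage fails, so deploy never runs.
    demo: List[Stage] = [
        ("integrate", lambda: True),
        ("build", lambda: True),
        ("test", lambda: False),
        ("deploy", lambda: True),
    ]
    print(run_pipeline(demo))  # ['integrate', 'build']
```

Stopping at the first failed stage is what keeps a broken build from ever reaching the deployment stage.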
Continuous Integration (CI)
Continuous Delivery (CD)
Purpose:
• Continuous Delivery is an extension of CI. It ensures that the software
can be reliably released at any time.
• The key difference from CI is that Continuous Delivery automates the
release process up to the point of deployment, but the final deployment
to production still requires manual approval.
Continuous Delivery (CD) Process
Continuous Deployment (CD)
Purpose:
• Continuous Deployment goes one step further than Continuous
Delivery.
• It automates the entire process, from integration to deployment,
without requiring manual approval for production releases.
• Every change that passes automated tests is automatically deployed
to production.
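The difference between Continuous Delivery and Continuous Deployment comes down to a single decision point after the automated tests pass. The sketch below models that decision. The `Change` class, the `release` function, and the mode names are hypothetical and stand in for what a real CI/CD tool would do.

```python
from dataclasses import dataclass


@dataclass
class Change:
    commit: str          # hypothetical commit identifier
    tests_passed: bool   # outcome of the automated test stage


def release(change: Change, mode: str, approved: bool = False) -> str:
    """Return where a change ends up under the given release mode.

    mode="delivery":   a passing change is release-ready, but it reaches
                       production only once a person approves it.
    mode="deployment": every passing change goes straight to production.
    """
    if not change.tests_passed:
        return "rejected"  # failed tests are never released in either mode
    if mode == "deployment":
        return "production"
    if mode == "delivery":
        return "production" if approved else "awaiting approval"
    raise ValueError(f"unknown mode: {mode!r}")
```

For a passing change, `release(Change("abc123", True), "delivery")` returns "awaiting approval", while the same change under "deployment" returns "production".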
Continuous Deployment (CD) Process
Key Benefits of CI/CD Pipelines
Challenges
• Complexity: Setting up and maintaining a CI/CD pipeline can
be complex, especially for large projects.
• Infrastructure Costs: Continuous deployment may require
robust infrastructure and monitoring tools, increasing costs.
• Cultural Shift: Implementing CI/CD often requires a shift in
team culture towards more frequent integration and
deployment.
CI/CD Pipeline Components
• A CI/CD pipeline comprises several components that work together to
automate the software development process, from code integration to
deployment.
• The components are:
1. Source Control (Version Control System)
2. Build Automation
3. Automated Testing
4. Artifact Repository
5. Continuous Integration (CI) Server
6. Configuration Management
7. Deployment Automation
8. Environment Provisioning (Infrastructure as Code)
9. Continuous Delivery/Continuous Deployment (CD)
10. Monitoring and Logging
CI/CD Pipeline for Machine Learning
• A CI/CD pipeline for machine learning (ML) involves additional steps
compared to a traditional software development pipeline because it
must handle unique aspects like data management, model training,
evaluation, and deployment.
• A CI/CD pipeline for machine learning ensures that models are
developed, tested, and deployed in a systematic, reproducible, and
automated manner.
• It helps maintain high-quality models in production by automating
the complex processes of data handling, model training, evaluation,
and deployment.
CI/CD Pipeline Components for Machine Learning
• The components and stages of a CI/CD pipeline tailored specifically
for machine learning projects are:
1. Version Control for Code and Data
2. Data Validation
3. Feature Engineering and Data Preprocessing
4. Model Training
5. Model Evaluation
6. Hyperparameter Tuning
7. Model Versioning
8. Continuous Integration (CI) for ML
9. Model Deployment (Continuous Delivery/Continuous Deployment)
10. Model Retraining and Continuous Learning
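As an example of the Data Validation stage, the sketch below checks a CSV in the format of the data/iris.csv file generated in the practical example. The header must match the four Iris feature names plus species, every feature must be a positive number, and every label must be 0, 1, or 2. These particular rules are assumptions chosen for illustration, and `validate_iris_csv` is a hypothetical helper. It uses only the standard library, so it can run before any ML dependencies are installed.

```python
import csv
import io
from typing import List

# Column names from sklearn's load_iris().feature_names, plus the
# 'species' label column added in the practical example.
EXPECTED_COLUMNS = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
    "species",
]
VALID_LABELS = {"0", "1", "2"}  # iris.target encodes the three species as 0, 1, 2


def validate_iris_csv(text: str) -> List[str]:
    """Return a list of problems found in the CSV text (empty list = valid)."""
    problems: List[str] = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return [f"unexpected header: {header}"]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: expected 5 fields, got {len(row)}")
            continue
        *features, label = row
        try:
            values = [float(v) for v in features]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric feature value")
            continue
        if any(v <= 0 for v in values):
            problems.append(f"line {line_no}: feature values must be positive")
        if label not in VALID_LABELS:
            problems.append(f"line {line_no}: unknown label {label!r}")
    return problems
```

In a pipeline, a non-empty result would fail the build before training starts, for example `with open('data/iris.csv') as f: problems = validate_iris_csv(f.read())`.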
Practical Example
• A step-by-step project guide to creating a CI/CD pipeline for a
machine learning project using Python.
• We’ll use the Iris dataset as an example.
• This project will cover setting up the project structure, writing the
necessary scripts, and creating the CI/CD pipeline using GitHub
Actions.
Step 1: Set Up the Project Structure

1. Create the Project Directory:

Open your terminal, create a new directory for the project, and add the
data, model, tests, and .github\workflows subdirectories:

C:\Windows\system32>c:
C:\Windows\system32>cd\
C:\>mkdir ml-ci-cd-pipeline
C:\>cd ml-ci-cd-pipeline
C:\ml-ci-cd-pipeline>mkdir data model tests
C:\ml-ci-cd-pipeline>mkdir .github
C:\ml-ci-cd-pipeline>cd .github
C:\ml-ci-cd-pipeline\.github>mkdir workflows
C:\ml-ci-cd-pipeline\.github>cd..
C:\ml-ci-cd-pipeline>
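• Create requirements.txt in the project root, listing the packages the
scripts import:

pandas
scikit-learn
joblib

• Generate data/iris.csv by running the following script from the project
root:

# Generate iris.csv
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target  # integer labels 0, 1, 2

# Save to CSV (relative path, the same one train.py and evaluate.py read)
df.to_csv('data/iris.csv', index=False)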
Step 3: Write the Model Training Script

In the model/ directory, open the train.py file and add the following code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load the dataset
data = pd.read_csv('data/iris.csv')

# Preprocess the dataset: features are every column except the label
X = data.drop('species', axis=1)
y = data['species']

# Split the data into training and test sets (80/20, fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a RandomForest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save the model
joblib.dump(model, 'model/iris_model.pkl')

Note: This script loads the Iris dataset, splits it into training and testing
sets, trains a Random Forest model, and saves the trained model as
model/iris_model.pkl. Run it from the project root (python model/train.py),
as the workflow in Step 6 does, so that the relative paths resolve.
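train.py overwrites model/iris_model.pkl on every run, so earlier models are lost. One simple way to approach the Model Versioning component listed earlier is to store each artifact under a name derived from a hash of its contents. The `version_artifact` helper and the model/registry directory below are hypothetical; dedicated model registries offer far more, but the sketch shows the principle.

```python
import hashlib
import shutil
from pathlib import Path


def version_artifact(model_path: str, registry_dir: str) -> Path:
    """Copy a model file into registry_dir under a content-addressed name.

    The version tag is the first 12 hex characters of the file's SHA-256
    digest: an identical file maps to the same version, while any change
    to the model bytes yields a new one.
    """
    src = Path(model_path)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()[:12]
    dest_dir = Path(registry_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{src.stem}-{digest}{src.suffix}"
    if not dest.exists():  # never overwrite an existing version
        shutil.copy2(src, dest)
    return dest
```

Called after training as `version_artifact('model/iris_model.pkl', 'model/registry')`, it returns a path of the form model/registry/iris_model-<12 hex characters>.pkl. Two training runs share a version only if they produce byte-identical files.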
Step 4: Write the Model Evaluation Script

In the model/ directory, open the evaluate.py file and add the following code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

# Load the dataset
data = pd.read_csv('data/iris.csv')

# Preprocess the dataset
X = data.drop('species', axis=1)
y = data['species']

# Split the data into training and test sets; using the same test_size and
# random_state as train.py reproduces exactly the same test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Load the saved model
model = joblib.load('model/iris_model.pkl')

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy:.2f}')

Note: This script loads the saved model, evaluates it on the test set, and
prints the accuracy.
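evaluate.py prints the accuracy but always exits successfully, so the Evaluate Model step in the workflow passes no matter how poor the model is. GitHub Actions marks a step as failed when its command exits with a non-zero status, so a quality gate can turn a low score into a failed build. The sketch below assumes a hypothetical minimum accuracy of 0.90; the threshold and the `accuracy_gate` name are not part of the original project.

```python
import sys

# Hypothetical quality bar; the original project does not define one.
MIN_ACCURACY = 0.90


def accuracy_gate(accuracy: float, threshold: float = MIN_ACCURACY) -> int:
    """Return a process exit code: 0 if the model meets the threshold, else 1."""
    if accuracy >= threshold:
        print(f"PASS: accuracy {accuracy:.2f} >= {threshold:.2f}")
        return 0
    print(f"FAIL: accuracy {accuracy:.2f} < {threshold:.2f}", file=sys.stderr)
    return 1
```

At the end of evaluate.py, `sys.exit(accuracy_gate(accuracy))` would end the script with status 1 whenever the accuracy falls below the threshold, which stops the workflow before the unit tests run.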
Step 5: Write Unit Tests

In the tests/ directory, open the test_model.py file and add the following code:

import unittest
import joblib
from sklearn.ensemble import RandomForestClassifier

class TestModelTraining(unittest.TestCase):
    def test_model_training(self):
        # Load the model saved by model/train.py (path relative to the project root)
        model = joblib.load('model/iris_model.pkl')
        self.assertIsInstance(model, RandomForestClassifier)
        # Iris has four features, so the model reports exactly four importances
        self.assertEqual(len(model.feature_importances_), 4)

if __name__ == '__main__':
    unittest.main()

Note: This test ensures that the trained model is an instance of
RandomForestClassifier and that it has the correct number of features (four).
Step 6: Set Up the CI/CD Pipeline Using GitHub Actions

Open the Workflow File: In the .github/workflows/ directory, open the
ci_cd_pipeline.yml file in a text editor and add the following content:

name: CI/CD Pipeline

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.8'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Train Model
        run: |
          python model/train.py

      - name: Evaluate Model
        run: |
          python model/evaluate.py

      - name: Run Unit Tests
        run: |
          python -m unittest discover -s tests
Execute the following commands in CMD to create a .gitignore file:

echo __pycache__/ > .gitignore
echo .env >> .gitignore
Initialize the repository, stage the files, and commit them with a message:

git init
git add .
git commit -m "Initial commit"


Link the local repository to your GitHub repository (substitute your own
repository URL) and push to the main branch:

git remote add origin https://github.com/NidhiGRaheja/ml-ci-cd-pipeline.git
git branch -M main
git push -u origin main
