
Machine Learning Life Cycle

Machine learning gives computer systems the ability to learn automatically without being explicitly programmed. But how does a machine learning system work? Its workflow can be described by the machine learning life cycle, a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem at hand.
The machine learning life cycle involves seven major steps, given below:
o Gathering Data
o Data Preparation
o Data Wrangling
o Data Analysis
o Train Model
o Test Model
o Deployment
The most important thing in the whole process is to understand the problem and its purpose. Therefore, before starting the life cycle, we need to understand the problem well, because a good result depends on a good understanding of the problem.
In the complete life cycle, to solve a problem we create a machine learning system called a "model", and this model is created through "training". But to train a model we need data, so the life cycle starts with collecting data.

1. Gathering Data
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the data requirements and obtain the data related to the problem.
In this step, we need to identify the different data sources, as data can be collected from sources such as files, databases, the internet, or mobile devices. This is one of the most important steps of the life cycle, because the quantity and quality of the collected data determine the quality of the output: in general, the more relevant data we have, the more accurate the predictions will be.
This step includes the following tasks:
o Identify various data sources
o Collect data
o Integrate the data obtained from different sources
By performing these tasks, we get a coherent set of data, also called a dataset, which will be used in the further steps.
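As a minimal, hypothetical sketch of these tasks (the file names and the shared customer_id key are assumptions, not part of the original walkthrough):

import pandas as pd

# Collect data from two hypothetical sources
orders = pd.read_csv('orders.csv')      # e.g., exported from a database
profiles = pd.read_csv('profiles.csv')  # e.g., a file shared by another team

# Integrate the sources into one coherent dataset via a shared key
dataset = orders.merge(profiles, on='customer_id', how='inner')
print(dataset.shape)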

2. Data Preparation
After collecting the data, we need to prepare it for the further steps. Data preparation is the step where we organize the data and get it ready for use in machine learning training.
In this step, we first put all the data together and then randomize its ordering.
This step can be further divided into two processes:
o Data exploration:
It is used to understand the nature of the data we have to work with: its characteristics, format, and quality.
A better understanding of the data leads to a more effective outcome. Here we look for correlations, general trends, and outliers.
o Data pre-processing:
The next step is pre-processing the data to get it ready for analysis.
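As a brief sketch, assuming the hypothetical dataset from the previous step, randomizing the ordering and exploring the data could look like this:

# Randomize the row ordering (frac=1 keeps every row, just shuffled)
dataset = dataset.sample(frac=1, random_state=42).reset_index(drop=True)

# Data exploration: characteristics, format, and quality
print(dataset.dtypes)                    # format of each column
print(dataset.describe())                # general trends and spread (outliers show up here)
print(dataset.corr(numeric_only=True))   # correlations between numeric columns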

3. Data Wrangling
Data wrangling is the process of cleaning raw data and converting it into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process, as cleaning is required to address quality issues.
The data we collect is not always usable as-is, since some of it may not serve our purpose. In real-world applications, collected data may have various issues, including:
o Missing values
o Duplicate data
o Invalid data
o Noise
So we use various filtering techniques to clean the data.
It is essential to detect and resolve these issues, because they can negatively affect the quality of the outcome.
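A minimal illustration of such filtering, again assuming the hypothetical dataset from above (the 'age' column and its valid range are made-up examples):

# Remove duplicate rows
dataset = dataset.drop_duplicates()

# Handle missing values: drop rows missing the target, impute the rest
dataset = dataset.dropna(subset=['target_column'])
dataset = dataset.fillna(dataset.mean(numeric_only=True))

# Filter out invalid data, e.g., values outside a plausible range
dataset = dataset[(dataset['age'] >= 0) & (dataset['age'] <= 120)]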

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:
o Selecting analytical techniques
o Building models
o Reviewing the results
The aim of this step is to build a machine learning model that analyzes the data using various analytical techniques, and to review the outcome. It starts with determining the type of problem, based on which we select a machine learning technique such as Classification, Regression, Cluster analysis, or Association; we then build the model using the prepared data and evaluate it.
Hence, in this step, we take the data and use machine learning algorithms to build the model.
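As a rough, non-exhaustive sketch, the problem type maps to a family of scikit-learn estimators:

from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: the target is a discrete label
clf = LogisticRegression()

# Regression: the target is a continuous value
reg = LinearRegression()

# Cluster analysis: no target; group similar rows together
km = KMeans(n_clusters=3, n_init=10)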

5. Train Model
Now the next step is to train the model, in this step we train our model to improve its
performance for better outcome of the problem.
We use datasets to train the model using various machine learning algorithms. Training a model
is required so that it can understand the various patterns, rules, and, features.
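In scikit-learn terms, training reduces to a single fit call; here is a self-contained sketch using synthetic data in place of a real prepared dataset:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a prepared dataset
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=42)

# Training: the model learns patterns, rules, and features from the data
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)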

6. Test Model
Once our machine learning model has been trained on a given dataset, we test it. In this step, we check the accuracy of the model by providing a test dataset to it.
Testing the model gives us the percentage accuracy of the model, measured against the requirements of the project or problem.
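A self-contained sketch of testing, again on synthetic data, holds out part of the data and scores the trained model on it:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data, split so the test set is never seen during training
X, y = make_classification(n_samples=250, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Percentage accuracy on the held-out test set
print(f'Accuracy: {accuracy_score(y_test, model.predict(X_test)):.2%}')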
7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
If the prepared model produces accurate results at an acceptable speed as per our requirements, we deploy it in the real system. But before deploying the project, we check whether it still performs well on the available data. The deployment phase is similar to preparing the final report for a project.
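As an illustrative sketch only (not part of the original walkthrough), a model saved with joblib, as in step 9 below, could be served over HTTP with Flask; the file name, port, and JSON layout are all assumptions:

import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load('final_model.pkl')  # hypothetical path to the saved model

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON body like {"features": [0.1, 2.3, ...]}
    features = np.array(request.get_json()['features']).reshape(1, -1)
    return jsonify({'prediction': model.predict(features).tolist()})

if __name__ == '__main__':
    app.run(port=5000)

The numbered sections that follow walk through the full life cycle in Python code, from loading the data to saving the final model.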
1. Import Required Libraries

# Basic libraries
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score, roc_curve

2. Load and Inspect the Data

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Preview the first few rows
print(data.head())

# Check for missing values
print(data.isnull().sum())

# Check data types and basic statistics
print(data.info())
print(data.describe())

3. Data Preprocessing
Handle Missing Values

# Drop rows with missing target values (if any)
data.dropna(subset=['target_column'], inplace=True)

# Fill or impute missing values in the remaining numeric columns
data.fillna(data.mean(numeric_only=True), inplace=True)  # Example: fill with column mean

Encode Categorical Variables

# Convert categorical variables to numeric values
for col in data.select_dtypes(include=['object']).columns:
    data[col] = LabelEncoder().fit_transform(data[col])

Feature Scaling

# Separate features and target variable
X = data.drop('target_column', axis=1)
y = data['target_column']

# Standardize features (zero mean, unit variance)
scaler = StandardScaler()
X = scaler.fit_transform(X)

4. Split Data into Training and Testing Sets

# Split data (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

5. Model Selection and Training
Initialize and Train a Basic Model

# Choose a model (e.g., Random Forest)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

Evaluate Initial Model Performance

# Predict on test data
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Classification report and confusion matrix
print(classification_report(y_test, y_pred))
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.show()

ROC Curve and AUC Score

# Calculate ROC-AUC score (assumes a binary target)
y_pred_proba = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f'ROC-AUC Score: {roc_auc:.2f}')

# Plot ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

6. Model Improvement
Hyperparameter Tuning with Grid Search

# Set up parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize GridSearchCV with 5-fold cross-validation
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Best parameters and best score
print(f'Best parameters: {grid_search.best_params_}')
print(f'Best cross-validation accuracy: {grid_search.best_score_:.2f}')

Train Model with Best Parameters

# Use the best estimator found by grid search
# (it is already refit on the full training data by default)
best_model = grid_search.best_estimator_

# Evaluate the optimized model
y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(f'Optimized Model Accuracy: {accuracy_best:.2f}')

# Updated classification report and confusion matrix
print(classification_report(y_test, y_pred_best))
conf_matrix_best = confusion_matrix(y_test, y_pred_best)
sns.heatmap(conf_matrix_best, annot=True, fmt='d', cmap='Greens')
plt.show()

7. Feature Importance Analysis

# Feature importance: use the original feature names,
# since scaling converted X to a plain NumPy array
feature_names = data.drop('target_column', axis=1).columns
feature_importances = pd.Series(best_model.feature_importances_, index=feature_names)
feature_importances.sort_values().plot(kind='barh', title='Feature Importance')
plt.show()

8. Cross-Validation and Final Model Evaluation

# 10-fold cross-validation accuracy on the full dataset
cv_scores = cross_val_score(best_model, X, y, cv=10, scoring='accuracy')
print(f'Cross-validated Accuracy: {np.mean(cv_scores):.2f} ± {np.std(cv_scores):.2f}')

9. Save the Model for Future Use

import joblib

# Save the trained model
joblib.dump(best_model, 'final_model.pkl')

# Load the model later with:
# model = joblib.load('final_model.pkl')
