Session 4 Machine Learning Process

This document outlines the machine learning process, detailing the steps involved in developing a machine learning model, including problem definition, data gathering, preparation, analysis, feature engineering, model training, evaluation, and deployment. It emphasizes the importance of systematic development and best practices to enhance model performance. Additionally, it includes an assignment to describe various machine learning processes and compare them with data mining processes.

Uploaded by

owekesa361


Session 4

Machine Learning Process

Learning Outcomes
By the end of this lecture, you will be able to:
• Understand the process of developing a machine learning
model.
• Identify and explain each step in the machine learning
life cycle.
• Apply the machine learning life cycle to real-world
examples.
• Recognize common challenges and best practices in
each phase of the cycle.
Machine learning overview
• Machine learning is a subset of artificial intelligence (AI).
• It trains computers to mimic aspects of human thinking by
learning from data.
• It uses real-world data for training.
• Training follows a predefined sequence of steps.
• This sequence is known as the machine learning life cycle.
Steps in the Machine Learning Process
• The process guides the development and deployment of machine
learning models.
• It is a structured process with several steps.
• Understanding the life cycle:
• ensures systematic development and deployment,
• improves efficiency, and
• enhances model performance.
Steps in the Machine Learning Process
Problem Definition
• Before starting the process, clearly define the problem you
aim to solve.
• Example: Predicting customer churn for a telecom company.
• Key Considerations: Business objectives, success metrics,
feasibility.
Step 1: Gathering Data
• Identify Data Sources
• Recognize where data can be collected from.
• Examples: Files, databases, internet, mobile devices.
• Collect Data
• Gather data from identified sources.
• Ensure data is relevant and comprehensive.
• Integrate Data
• Combine data from different sources.
• Create a coherent and unified dataset.
• Outcome
• Ready-to-use dataset for further processing.
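The "Integrate Data" step can be sketched as follows. This is a minimal illustration that assumes pandas is available; the two sources (`crm`, `usage`), their columns, and their values are hypothetical and not part of the lecture.

```python
import pandas as pd

# Hypothetical records from two sources (names and values are invented).
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["basic", "premium", "basic"],
})
usage = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "monthly_minutes": [120, 340, 95],
})

# A left join keeps every CRM customer; customers without a usage
# record get a missing value (NaN) in monthly_minutes.
dataset = crm.merge(usage, on="customer_id", how="left")
print(dataset)
```

A left join is only one option; an inner or outer join may suit other integration goals better.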
Step 2: Data Preparation
• Raw data is often messy and unstructured.
• Data cleaning involves addressing issues such as missing
values, outliers, and inconsistencies that could compromise the
accuracy and reliability of the machine learning model.
Objective
• Refine raw data for meaningful analysis.
• Lay the foundation for robust model development.

• The basic features of data cleaning and preprocessing are
discussed next:
Step 2: Data Preparation
Data Cleaning
• Address missing values.
• Handle outliers.
• Resolve inconsistencies.
Data Preprocessing
• Standardize formats.
• Scale values.
• Encode categorical variables.
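The cleaning and preprocessing tasks listed above could look like this in pandas. It is a sketch under stated assumptions: the data, the plausible age range of 0 to 120, median imputation, and min-max scaling are all illustrative choices, not the lecture's prescribed method.

```python
import pandas as pd

# Hypothetical raw data with a missing value, an outlier, and inconsistent labels.
raw = pd.DataFrame({
    "age": [25.0, 32.0, None, 41.0, 230.0],
    "city": ["Nairobi", "nairobi ", "Mombasa", "Kisumu", "Mombasa"],
})
clean = raw.copy()

# Resolve inconsistencies / standardize formats: trim spaces, unify capitalization.
clean["city"] = clean["city"].str.strip().str.title()

# Handle outliers: treat ages outside an assumed plausible range as missing.
clean.loc[~clean["age"].between(0, 120), "age"] = float("nan")

# Address missing values: fill with the median of the remaining ages.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Scale values: min-max scaling to the range [0, 1].
age_min, age_max = clean["age"].min(), clean["age"].max()
clean["age_scaled"] = (clean["age"] - age_min) / (age_max - age_min)

# Encode categorical variables: one-hot encode the city column.
encoded = pd.get_dummies(clean, columns=["city"])
print(encoded)
```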
Step 2: Data Preparation
Data Quality
• Ensure well-organized data.
• Prepare for meaningful analysis.
Data Integrity
• Maintain dataset integrity.
• Effective cleaning and preprocessing.
Step 3: Data Wrangling
• Data wrangling is the process of cleaning raw data and
converting it into a usable format.
• It involves cleaning the data, selecting the variables to
use, and transforming the data into a format suitable for
analysis in the next step.
• Cleaning is required to address data quality issues.
Step 3: Data Wrangling
• In real-world applications, collected data may have
various issues, including:
• Missing values
• Duplicate data
• Invalid data
• Noise (irrelevant or meaningless data)
• We use various filtering techniques to clean the data.
• These issues must be detected and removed because they can
negatively affect the quality of the outcome.
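One way to filter out the four issues above is sketched here with pandas. The table, the rule that a negative charge is invalid, and the choice to treat `free_text_note` as noise are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical collected data showing each issue from the list above.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "monthly_charge": [29.5, 45.0, 45.0, -10.0, None],
    "free_text_note": ["ok", "n/a", "n/a", "??", "ok"],
})

df = df.drop_duplicates()                  # duplicate data
df = df.dropna(subset=["monthly_charge"])  # missing values
df = df[df["monthly_charge"] >= 0]         # invalid data (negative charge)
df = df.drop(columns=["free_text_note"])   # noise: column assumed irrelevant
print(df)
```

Dropping rows is the simplest filter; imputing or correcting values may be preferable when data is scarce.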
Step 4: Analyze Data
• Also called “Exploratory Data Analysis (EDA)”.
• Understanding the underlying patterns and characteristics
of collected data.
• Leveraging statistical and visual tools to gain insights into
the dataset’s structure.
• Visualizations, summary statistics, and correlation
analyses play a crucial role.
• Examples of data visualizations: histogram, scatter plot.
Step 4: Analyze Data
• Exploration: Use statistical and visual tools to explore the
structure and patterns in the data.
• Patterns and Trends: Identify underlying patterns, trends,
and potential challenges within the dataset.
• Insights: Gain valuable insights to inform decisions in later
stages of the machine learning process.
• Decision Making: Use exploratory data analysis to make
informed decisions about feature engineering and model
selection.
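The exploration described above can be sketched with summary statistics, correlations, and histogram-style bin counts. The house data below is invented for illustration, and pandas is assumed to be installed; plotting is left out so the sketch runs without a display.

```python
import pandas as pd

# Hypothetical house data (values invented for illustration).
df = pd.DataFrame({
    "size_in_sqft": [850, 1200, 1500, 2000, 2400],
    "num_bedrooms": [2, 3, 3, 4, 4],
    "price": [100000, 150000, 180000, 250000, 300000],
})

summary = df.describe()   # count, mean, std, min, quartiles, max
correlations = df.corr()  # pairwise Pearson correlations
print(summary)
print(correlations["price"].sort_values(ascending=False))

# Histogram counts without plotting: three equal-width price bins.
price_bins = pd.cut(df["price"], bins=3).value_counts().sort_index()
print(price_bins)
```

A strong correlation between a feature and the target is a hint for feature selection, not proof of a causal relationship.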
Step 5: Feature Engineering and
Selection
• Feature Selection: Identify the subset of features that most
significantly impact the model’s performance.
• Feature Engineering: Create new features or transform
existing ones to better capture patterns and relationships.
• Requires domain expertise and a deep understanding of
the problem.
• The aim is to engineer features that contribute meaningfully
to predictive power.
• Optimization: Balance feature set for predictive accuracy
while minimizing computational complexity.
Step 5: Feature Engineering and
Selection – Example using Python
Problem: Predict the `price` of houses using the available
features.
Dataset: Assume we have a dataset `house_data.csv` with the
following columns:
• house_id
• size_in_sqft
• num_bedrooms
• num_bathrooms
• location
• year_built
• price
Step 5: Feature Engineering and
Selection – Example using Python
Loading the Data:
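The code for this slide is not present in the text, so the following is a sketch of one possible way to load the data with pandas. An in-memory string stands in for `house_data.csv` so that the example runs on its own; the rows are invented.

```python
import io

import pandas as pd

# Stand-in for house_data.csv (rows are invented for illustration).
# With the real file, the call would be pd.read_csv("house_data.csv").
csv_text = """house_id,size_in_sqft,num_bedrooms,num_bathrooms,location,year_built,price
1,850,2,1,Suburb,1995,100000
2,1200,3,2,City,2005,150000
3,1500,3,2,City,2010,180000
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```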
Step 5: Feature Engineering and
Selection – Example using Python
Exploring the Data:
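The original code is not present in the text. A first look at the data might resemble this sketch, which assumes pandas and uses a small invented table in place of `house_data.csv`.

```python
import pandas as pd

# Small invented stand-in for the house dataset.
df = pd.DataFrame({
    "size_in_sqft": [850, 1200, 1500, 2000],
    "num_bathrooms": [1.0, 2.0, None, 3.0],
    "location": ["Suburb", "City", "City", "Rural"],
    "price": [100000, 150000, 180000, 250000],
})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # data type of each column

missing_per_column = df.isna().sum()  # missing values per column
print(missing_per_column)

location_counts = df["location"].value_counts()  # frequency of each category
print(location_counts)
```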
Step 5: Feature Engineering and
Selection – Example using Python
Handling Missing Values:
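The original code is not present in the text. One common approach, sketched here under assumptions, is to drop rows that lack the target and to fill numeric gaps with the median and categorical gaps with the most frequent value. The data is invented, and other imputation strategies may fit better in practice.

```python
import pandas as pd

# Invented stand-in for the house dataset, with gaps in three columns.
df = pd.DataFrame({
    "size_in_sqft": [850.0, None, 1500.0, 2000.0],
    "location": ["City", "City", None, "Suburb"],
    "price": [100000, 150000, 180000, None],
})

# Rows without the target value cannot be used for supervised training.
df = df.dropna(subset=["price"])

# Numeric feature: fill with the median. Categorical feature: fill with the mode.
df["size_in_sqft"] = df["size_in_sqft"].fillna(df["size_in_sqft"].median())
df["location"] = df["location"].fillna(df["location"].mode()[0])
print(df)
```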
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Total Rooms: Create a new feature by adding the number
of bedrooms and bathrooms:
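The original code is not present in the text. A minimal sketch with pandas follows; the column name `total_rooms` is a hypothetical choice and the rows are invented.

```python
import pandas as pd

df = pd.DataFrame({"num_bedrooms": [2, 3, 4], "num_bathrooms": [1, 2, 3]})

# total_rooms is a hypothetical name for the new feature.
df["total_rooms"] = df["num_bedrooms"] + df["num_bathrooms"]
print(df)
```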
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Age of House: Create a new feature representing the age
of the house:
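The original code is not present in the text. In this sketch the age is the difference between a reference year and `year_built`; the reference year of 2025 and the column name `house_age` are assumptions.

```python
import pandas as pd

# Assumption: ages are measured relative to 2025. In practice the sale
# year or the current year may be the better reference.
REFERENCE_YEAR = 2025

df = pd.DataFrame({"year_built": [1995, 2005, 2010]})

# house_age is a hypothetical name for the new feature.
df["house_age"] = REFERENCE_YEAR - df["year_built"]
print(df)
```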
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Location Encoding: Convert categorical data into
numerical data:
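The original code is not present in the text. One-hot encoding is one possible technique, sketched here with `pandas.get_dummies`; the location values and the `loc` prefix are invented. Label or ordinal encoding would be an alternative.

```python
import pandas as pd

df = pd.DataFrame({"location": ["City", "Suburb", "Rural", "City"]})

# One column per category, holding 1 where the row belongs to that category.
encoded = pd.get_dummies(df, columns=["location"], prefix="loc", dtype=int)
print(encoded)
```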
Step 5: Feature Engineering and
Selection – Example using Python
Feature Selection
• Drop less relevant or redundant features:
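The original code is not present in the text. This sketch assumes that `house_id` is only an identifier and that `year_built` is redundant once a hypothetical `house_age` feature exists. Which columns to drop is a judgment call that depends on the data.

```python
import pandas as pd

# Invented table that already contains the engineered features.
df = pd.DataFrame({
    "house_id": [1, 2, 3],
    "size_in_sqft": [850, 1200, 1500],
    "total_rooms": [3, 5, 5],
    "year_built": [1995, 2005, 2010],
    "house_age": [30, 20, 15],
    "price": [100000, 150000, 180000],
})

# Separate the inputs (features) from the value to predict (target).
features = df.drop(columns=["house_id", "year_built", "price"])
target = df["price"]
print(features.columns.tolist())
```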
Step 6: Train Model
• Split the dataset into training and testing sets.
• Training Set: Used to train the model.
• Testing Set: Used to evaluate the model.
• Select an appropriate machine learning algorithm.
• Regression: Linear Regression, Ridge, Lasso, etc.
• Classification: Logistic Regression, Decision Trees, Random Forest,
SVM, etc.
• Clustering: K-Means, Hierarchical Clustering, etc.
• Train the model on the training set.
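The three steps above can be sketched with scikit-learn, assuming it is installed. The data is synthetic (price generated as a linear function of size plus noise), and the 80/20 split and Linear Regression are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: price depends linearly on size plus noise (invented numbers).
rng = np.random.default_rng(0)
size = rng.uniform(500, 3000, size=100)
price = 50_000 + 100 * size + rng.normal(0, 5_000, size=100)

X = size.reshape(-1, 1)  # scikit-learn expects a 2-D feature array

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

# Regression problem, so a regression algorithm is selected.
model = LinearRegression()
model.fit(X_train, y_train)

print("Learned slope:", model.coef_[0])
print("R^2 on the test set:", model.score(X_test, y_test))
```

Fixing `random_state` makes the split reproducible, which helps when comparing algorithms on the same data.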
Step 7: Model Evaluation
• Test the model on data it did not see during training to
measure its performance, for example its percentage accuracy.
• This involves rigorous testing against validation datasets.
• Evaluation metrics such as accuracy, precision, recall, and
F1 score are computed to gauge its effectiveness.
• Evaluation provides insights into the model’s strengths and
weaknesses.
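The four metrics named above can be computed by hand for a binary classifier, as in this plain-Python sketch. The function name `classification_metrics` and the label lists are hypothetical; libraries such as scikit-learn provide equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and F1 are reported alongside it.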
Step 8: Model Deployment
• We deploy the model in a real-world system.
• The deployment phase is similar to making the final report
for a project.
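A very small part of deployment is saving a trained model so that another program can load it and make predictions. The sketch below assumes scikit-learn and uses Python's `pickle` module; the toy data and the file name `model.pkl` are hypothetical, and real deployments involve much more (serving, monitoring, versioning).

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model: learns y = 2 * x from four invented points.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"            # hypothetical file name
    model_path.write_bytes(pickle.dumps(model))     # save the trained model
    loaded = pickle.loads(model_path.read_bytes())  # load it as a service would

prediction = loaded.predict(np.array([[5.0]]))[0]
print(prediction)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.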
Next Steps
1. Install a Python-compatible IDE (Integrated Development
Environment).
2. Install the Weka machine learning environment.
Assignment:
1. Describe the following machine learning processes:
a. CRISP-DM
b. SEMMA
c. KDD
(6 marks)
2. Identify the key differences and similarities between the
data mining (KDD) and machine learning (CRISP-DM,
SEMMA) processes. (4 marks)
Submit by: 19/05/2025 (hard copy)
