Session 4 Machine Learning Process

This document outlines the machine learning process, detailing the steps involved in developing a machine learning model, including problem definition, data gathering, preparation, analysis, feature engineering, model training, evaluation, and deployment. It emphasizes the importance of systematic development and best practices to enhance model performance. Additionally, it includes an assignment to describe various machine learning processes and compare them with data mining processes.

Uploaded by

owekesa361

Session 4

Machine Learning Process


Learning Outcomes
• By the end of this lecture, you will be able to:
• Understand the process of developing a machine
learning model.
• Identify and explain each step in the machine learning
life cycle.
• Apply the machine learning life cycle to real-world
examples.
• Recognize common challenges and best practices in
each phase of the cycle.
Machine learning overview
• Machine learning is a subset of artificial intelligence (AI).
• Trains computers to mimic human thinking.
• Utilizes real-world data for training.
• It follows predefined steps to train the computer.
• This process is known as a machine learning lifecycle.
Steps in the Machine Learning Process
• Guides the development and deployment of machine
learning models.
• It is a structured process with several steps.
• Understanding the life cycle:
• ensures systematic development and deployment,
• improves efficiency, and
• enhances model performance.
Steps in the Machine Learning Process
Problem Definition
• Prior to starting the process, you need to clearly define the
problem you aim to solve.
• Example: Predicting customer churn for a telecom company.
• Key Considerations: Business objectives, success metrics,
feasibility.
Step 1: Gathering Data
• Identify Data Sources
• Recognize where data can be collected from.
• Examples: Files, databases, internet, mobile devices.
• Collect Data
• Gather data from identified sources.
• Ensure data is relevant and comprehensive.
• Integrate Data
• Combine data from different sources.
• Create a coherent and unified dataset.
• Outcome
• Ready-to-use dataset for further processing.
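The integration step can be sketched with pandas. This is a minimal illustration only: the two sources, their column names, and all values are invented, and `customer_id` is assumed to be the shared key.

```python
import pandas as pd

# Hypothetical source 1: customer records (e.g. exported from a database)
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["basic", "premium", "basic"],
})

# Hypothetical source 2: usage records (e.g. collected from mobile devices)
usage = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "monthly_minutes": [120, 560, 75],
})

# Integrate: a left join keeps every customer, even those without usage data
dataset = customers.merge(usage, on="customer_id", how="left")
print(dataset)
```

A left join is one possible choice here; customers without usage records end up with a missing value, which the data preparation step then has to handle.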
Step 2: Data Preparation
• Raw data is often messy and unstructured.
• Data cleaning involves addressing issues such as missing
values, outliers, and inconsistencies that could compromise the
accuracy and reliability of the machine learning model.
Objective
• Refine raw data for meaningful analysis.
• Lay the foundation for robust model development.

• The basic features of data cleaning and preprocessing are
discussed next:
Step 2: Data Preparation
Data Cleaning
• Address missing values.
• Handle outliers.
• Resolve inconsistencies.
Data Preprocessing
• Standardize formats.
• Scale values.
• Encode categorical variables.
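A small sketch of these cleaning and preprocessing operations with pandas. The data is invented, and median filling, min-max scaling, and one-hot encoding are just one reasonable choice for each operation.

```python
import pandas as pd

# Invented sample data with a missing value and a categorical column
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 35.0],
    "income": [30000.0, 45000.0, 60000.0, 75000.0],
    "city": ["Nairobi", "Kisumu", "Nairobi", "Mombasa"],
})

# Data cleaning: fill the missing age with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Data preprocessing: scale income to the range [0, 1] (min-max scaling)
income_min, income_max = df["income"].min(), df["income"].max()
df["income_scaled"] = (df["income"] - income_min) / (income_max - income_min)

# Data preprocessing: one-hot encode the categorical column
df = pd.get_dummies(df, columns=["city"])
print(df)
```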
Step 2: Data Preparation
Data Quality
• Ensure well-organized data.
• Prepare for meaningful analysis.
Data Integrity
• Maintain dataset integrity.
• Effective cleaning and preprocessing.
Step 3: Data Wrangling
• The process of cleaning and converting raw data into a
usable format.
• It is the process of cleaning the data, selecting the
variables to use, and transforming the data into a proper
format to make it more suitable for analysis in the next
step.
• Cleaning of data is required to address the quality issues.
Step 3: Data Wrangling
• In real-world applications, collected data may have
various issues, including:
Missing Values
Duplicate data
Invalid data
Noise (irrelevant or meaningless data)
• So, we use various filtering techniques to clean the data.
• It is mandatory to detect and remove the above issues
because they can negatively affect the quality of the
outcome.
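The issues listed above can be filtered out with a few pandas calls. This is a sketch on invented data; the valid age range of 0 to 120 and the choice to drop the `notes` column are assumptions made for the example.

```python
import pandas as pd

# Invented raw data containing the issues listed above
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, 51, 51, -5, None],       # -5 is invalid, None is missing
    "notes": ["a", "b", "b", "c", "d"],  # irrelevant column (noise)
})

clean = raw.drop_duplicates()                # remove duplicate rows
clean = clean.dropna(subset=["age"])         # remove rows with a missing age
clean = clean[clean["age"].between(0, 120)]  # remove invalid ages (assumed valid range)
clean = clean.drop(columns=["notes"])        # drop the noisy column
print(clean)
```

Dropping rows is the simplest treatment; in practice missing values are often filled in rather than removed, depending on how much data would be lost.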
Step 4: Analyze Data
• Also called “Exploratory Data Analysis (EDA)”.
• Understanding the underlying patterns and characteristics
of collected data.
• Leveraging statistical and visual tools to gain insights into
the dataset’s structure.
• Visualizations, summary statistics, and correlation
analyses play a crucial role.
• Examples of data visualizations: histogram, scatter
plot.
Step 4: Analyze Data
• Exploration: Use statistical and visual tools to explore the
structure and patterns in the data.
• Patterns and Trends: Identify underlying patterns, trends,
and potential challenges within the dataset.
• Insights: Gain valuable insights to inform decisions in later
stages of the machine learning process.
• Decision Making: Use exploratory data analysis to make
informed decisions about feature engineering and model
selection.
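Summary statistics and a correlation analysis can be produced in a few lines with pandas. The values below are invented, and the column names are borrowed from the house-price example used later in this session.

```python
import pandas as pd

# Invented sample data
df = pd.DataFrame({
    "size_in_sqft": [850, 1200, 1500, 2000, 2400],
    "num_bedrooms": [2, 3, 3, 4, 4],
    "price": [100000, 150000, 180000, 240000, 290000],
})

summary = df.describe()   # count, mean, std, min, quartiles, max per column
correlations = df.corr()  # pairwise Pearson correlation between columns

print(summary)
# Which features move most closely with the target?
print(correlations["price"].sort_values(ascending=False))
```

If matplotlib is installed, `df["price"].plot.hist()` and `df.plot.scatter(x="size_in_sqft", y="price")` would give the histogram and scatter plot mentioned above.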
Step 5: Feature Engineering and
Selection
• Feature Selection: Identify the subset of features that most
significantly impact the model’s performance.
• Feature Engineering: Create new features or transform
existing ones to better capture patterns and relationships.
• Requires domain expertise and a deep understanding of
the problem.
• The aim is to engineer features that contribute meaningfully to
predictive power.
• Optimization: Balance feature set for predictive accuracy
while minimizing computational complexity.
Step 5: Feature Engineering and
Selection - Example using Python
Problem: Predict the `price` of houses using the available
features.
Dataset: Assume we have a dataset `house_data.csv` with the
following columns:
• house_id
• size_in_sqft
• num_bedrooms
• num_bathrooms
• location
• year_built
• price
Step 5: Feature Engineering and
Selection – Example using Python
Loading the Data:
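A possible way to load the dataset with pandas. Since `house_data.csv` is not available here, the sketch reads an in-memory stand-in with the same columns; the three rows are invented.

```python
import io

import pandas as pd

# Stand-in for house_data.csv so the sketch runs on its own; rows are invented
csv_text = """house_id,size_in_sqft,num_bedrooms,num_bathrooms,location,year_built,price
1,1500,3,2,Urban,2005,250000
2,900,2,1,Rural,1990,120000
3,2000,4,3,Suburban,2015,340000
"""

# With the real file this would be: df = pd.read_csv("house_data.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```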
Step 5: Feature Engineering and
Selection – Example using Python
Exploring the Data:
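A first look at the data might use the calls below. The DataFrame is an invented stand-in for the contents of `house_data.csv`, with two missing values added on purpose.

```python
import pandas as pd

# Invented stand-in for the contents of house_data.csv
df = pd.DataFrame({
    "house_id": [1, 2, 3, 4],
    "size_in_sqft": [1500, 900, None, 1200],
    "num_bedrooms": [3, 2, 4, 3],
    "num_bathrooms": [2, 1, 3, 2],
    "location": ["Urban", "Rural", "Urban", None],
    "year_built": [2005, 1990, 2015, 2000],
    "price": [250000, 120000, 340000, 200000],
})

print(df.head())      # first rows
df.info()             # column types and non-null counts
print(df.describe())  # summary statistics for numeric columns

missing_per_column = df.isnull().sum()  # number of missing values per column
print(missing_per_column)
```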
Step 5: Feature Engineering and
Selection – Example using Python
Handling Missing Values:
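One common approach, shown here as an assumption rather than the only option, is to fill numeric gaps with the median and categorical gaps with the most frequent value. The data is invented.

```python
import pandas as pd

# Invented sample with one missing size and one missing location
df = pd.DataFrame({
    "size_in_sqft": [1500, 900, None, 1200],
    "location": ["Urban", "Rural", "Urban", None],
    "price": [250000, 120000, 340000, 200000],
})

# Numeric column: fill with the median of the observed values
df["size_in_sqft"] = df["size_in_sqft"].fillna(df["size_in_sqft"].median())

# Categorical column: fill with the most frequent value (the mode)
df["location"] = df["location"].fillna(df["location"].mode()[0])

print(df)
```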
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Total Rooms: Create a new feature by adding the number
of bedrooms and bathrooms:
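A sketch of this feature, using invented rows and the hypothetical column name `total_rooms`:

```python
import pandas as pd

# Invented sample rows
df = pd.DataFrame({
    "num_bedrooms": [3, 2, 4],
    "num_bathrooms": [2, 1, 3],
})

# New feature: total number of rooms (bedrooms + bathrooms)
df["total_rooms"] = df["num_bedrooms"] + df["num_bathrooms"]
print(df)
```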
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Age of House: Create a new feature representing the age
of the house:
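A sketch of this feature, assuming age is measured relative to the current calendar year. The rows are invented and `house_age` is a hypothetical column name.

```python
from datetime import date

import pandas as pd

# Invented sample rows
df = pd.DataFrame({"year_built": [2005, 1990, 2015]})

# New feature: age of the house in years, relative to the current year
current_year = date.today().year
df["house_age"] = current_year - df["year_built"]
print(df)
```

Using a fixed reference year instead (for example the year the data was collected) keeps the feature stable when the code is rerun later.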
Step 5: Feature Engineering and
Selection – Example using Python
Feature Creation
• Location Encoding: Convert categorical data into
numerical data:
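One-hot encoding is one way to do this conversion; the original slide may have used a different scheme such as label encoding. The rows below are invented.

```python
import pandas as pd

# Invented sample rows
df = pd.DataFrame({
    "location": ["Urban", "Rural", "Suburban"],
    "price": [250000, 120000, 340000],
})

# One-hot encoding: one 0/1 column per category, replacing the original column
encoded = pd.get_dummies(df, columns=["location"], dtype=int)
print(encoded)
```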
Step 5: Feature Engineering and
Selection – Example using Python
Feature Selection
• Drop less relevant or redundant features:
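A sketch of dropping columns, on invented data. Which features count as redundant is a judgment call: here `house_id` is assumed to carry no predictive information, and `year_built` is assumed to be covered by the hypothetical derived feature `house_age`.

```python
import pandas as pd

# Invented sample after feature creation
df = pd.DataFrame({
    "house_id": [1, 2, 3],
    "size_in_sqft": [1500, 900, 2000],
    "year_built": [2005, 1990, 2015],
    "house_age": [20, 35, 10],
    "price": [250000, 120000, 340000],
})

# Drop the identifier and the column already represented by house_age
selected = df.drop(columns=["house_id", "year_built"])
print(selected.columns.tolist())
```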
Step 6: Train Model
• Split the dataset into training and testing sets.
Training Set: Used to train the model.
Testing Set: Used to evaluate the model.
• Select an appropriate machine learning algorithm
Regression: Linear Regression, Ridge, Lasso, etc.
Classification: Logistic Regression, Decision Trees, Random Forest,
SVM, etc.
Clustering: K-Means, Hierarchical Clustering, etc.
• Train the model
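The three steps above can be sketched with scikit-learn. The data is synthetic (a linear relationship with noise, invented for this example), and linear regression with an 80/20 split is only one possible choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: price depends linearly on size plus some noise (invented)
rng = np.random.default_rng(42)
size_in_sqft = rng.uniform(500, 3000, size=100)
price = 150 * size_in_sqft + 20000 + rng.normal(0, 5000, size=100)

X = size_in_sqft.reshape(-1, 1)  # feature matrix, shape (100, 1)
y = price                        # target vector

# Split: 80% of the rows for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Select an algorithm (regression problem) and train it on the training set
model = LinearRegression()
model.fit(X_train, y_train)

print("Learned slope:", model.coef_[0])
print("R^2 on test set:", model.score(X_test, y_test))
```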
Step 7: Model Evaluation
• Test the model to determine the percentage accuracy of
the model.
• Involves rigorous testing against validation datasets.
• Evaluation metrics such as accuracy, precision, recall, and
F1 score are computed to gauge its effectiveness.
• Provides insights into the model’s strengths and
weaknesses.
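For a binary classifier, these metrics follow from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below computes them by hand on invented labels and predictions.

```python
# Invented true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)                   # share of correct predictions
precision = tp / (tp + fp)                          # predicted positives that are correct
recall = tp / (tp + fn)                             # actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

This sketch does not guard against division by zero, which occurs when the model predicts no positives or the data contains none.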
Step 8: Model Deployment
• We deploy the model in the real-world system.
• The deployment phase is similar to making the final report
for a project.
Next Steps
1. Install a Python-compatible IDE (Integrated Development
Environment).
2. Install the Weka Machine Learning Environment.
Assignment:
1. Describe the following machine learning processes:
a. CRISP-DM
b. SEMMA
c. KDD
(6 marks)
2. Identify the key differences and similarities among the
data mining (KDD) and machine learning (CRISP-DM,
SEMMA) processes. (4 marks)
Submit by: 19/05/2025 (hard copy)
