Unit-4 Part 1 Preparing Model

The document outlines the preparation and exploration of data for machine learning, focusing on different data types, data quality issues, and remediation techniques. It emphasizes the importance of data preprocessing, including handling missing values and outliers, to enhance model accuracy and efficiency. Key steps in data preprocessing are also detailed, such as dataset acquisition, data cleaning, and feature scaling.

Unit-2 Preparing Model
PROF. ATMIYA PATEL
Understanding Data
Different Types of Data
Exploring Structure of Data
Two basic data types:
1. Numerical
2. Categorical

Standard datasets come with a data dictionary, like those in the UCI Machine Learning Repository (University of California, Irvine).
Link: https://archive.ics.uci.edu/ml/index.php
Exploring Numerical Data
Steps:
1. Understanding central tendency (Ex. Mean, Median)
2. Understanding data spread
a. Dispersion of data
b. Position of different data values

3. Plotting and exploring numerical data

◦ Two plots for numerical data:
a. Box plot
b. Histogram
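The exploration steps above can be sketched in Python with only the standard `statistics` module. The sample values are hypothetical (not from the slides), and quartiles use the default method of `statistics.quantiles`; other tools interpolate slightly differently, so their quartiles may not match exactly.

```python
import statistics

# Hypothetical sample of one numerical attribute (e.g., ages); illustrative only.
values = [23, 25, 25, 28, 30, 31, 35, 38, 40, 95]

def summarize(data):
    """Central tendency, dispersion and position measures of a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (Python 3.8+, default 'exclusive' method)
    return {
        "mean": statistics.mean(data),           # central tendency
        "median": statistics.median(data),
        "range": max(data) - min(data),          # dispersion
        "variance": statistics.variance(data),   # sample variance
        "std_dev": statistics.stdev(data),
        "min": min(data),                        # position: the five-number summary
        "q1": q1,
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,
    }

summary = summarize(values)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

The single large value (95) pulls the mean (37) above the median (30.5), a first hint of skew or an outlier. The minimum, quartiles, median and maximum are what a box plot draws; with matplotlib installed, `plt.boxplot(values)` and `plt.hist(values)` would draw the two plots named above.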
Exploring Categorical Data
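For categorical data, a common first look is the frequency of each category, its proportion, and the mode (most frequent category). A minimal sketch with `collections.Counter`; the attribute and its values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical attribute (e.g., car body type); illustrative only.
body_types = ["sedan", "suv", "sedan", "hatchback", "suv", "sedan", "coupe", "sedan"]

counts = Counter(body_types)                          # frequency of each category
total = sum(counts.values())
proportions = {cat: n / total for cat, n in counts.items()}
mode, mode_count = counts.most_common(1)[0]           # most frequent category

for cat, n in counts.most_common():
    print(f"{cat:<10}{n:>3}  {proportions[cat]:.1%}")
print("mode:", mode)
```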
Exploring Relationship between Variables
Two-way cross tabulations
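A two-way cross tabulation counts how often each combination of values of two categorical variables occurs. The sketch below builds one with the standard library; the two variables (gender as rows, whether a product was purchased as columns) and the records are hypothetical. With pandas available, `pandas.crosstab(rows, columns, margins=True)` produces a similar table.

```python
from collections import Counter

# Hypothetical (gender, purchased) observations; illustrative only.
records = [
    ("male", "yes"), ("female", "no"), ("female", "yes"), ("male", "no"),
    ("male", "yes"), ("female", "yes"), ("male", "yes"), ("female", "no"),
]

def cross_tab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = cross_tab(records)
print("".ljust(8) + "".join(c.rjust(6) for c in cols) + "total".rjust(7))
for r in rows:
    row_total = sum(table[r].values())
    print(r.ljust(8) + "".join(str(table[r][c]).rjust(6) for c in cols) + str(row_total).rjust(7))
```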
Data Quality and Remediation
Data Quality: Data quality is a major factor in deciding the success of machine learning. However, it is not realistic to expect the data to be flawless.
Two types of problems:
1. Certain data elements have no value (missing values).
2. Certain data elements have values very different from those of the other elements (outliers).
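Both problems can be detected before choosing a remedy. A minimal sketch, assuming missing values are stored as `None` and using the 1.5 × IQR limits (the same limits used for capping below) to flag outliers; z-scores are another common choice. The heights are hypothetical, and 1655.0 imitates a value recorded with a misplaced decimal point.

```python
import statistics

# Hypothetical heights in cm: None marks a missing value, 1655.0 a recording error.
heights_cm = [160.5, 172.0, None, 168.3, 1655.0, 175.2, None, 158.9, 165.4, 170.1, 162.7, 169.8]

# 1. Missing values: data elements without a value.
missing_idx = [i for i, v in enumerate(heights_cm) if v is None]

# 2. Outliers: values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR of the values that are present.
present = [v for v in heights_cm if v is not None]
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_idx = [i for i, v in enumerate(heights_cm) if v is not None and not lower <= v <= upper]

print("missing at positions:", missing_idx)
print("outliers at positions:", outlier_idx)
```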

Factors behind data quality issues
Incorrect sample set selection: the selected data may not reflect normal or regular conditions.
◦ Ex. Using festival sale data to predict future sales.

Errors in data collection: these result in outliers and missing values.

◦ Ex. When a person or group of persons records the data manually, values may be recorded wrongly (outliers).
◦ Ex. Data that is not recorded at all (missing values).
Data remediation
Data quality issues need to be remediated if the right amount of efficiency is to be achieved in the learning activity.
Remediation actions:
Incorrect sample set selection can be remedied by a proper sampling technique. However, human error in data collection cannot be eliminated 100 percent.
Outliers and missing values can be handled by following the approaches below.
Handling Outliers
Outliers are data elements with an abnormally high or low value, which may impact prediction accuracy.
Approaches to handle outliers:
1.) Remove outliers: If the number of outlier records is small, simply remove them.
2.) Imputation: Impute the outlier value with the mean, median, or mode.
3.) Capping: For values that lie outside the 1.5 × IQR limits (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, where IQR = Q3 − Q1), we can cap them by replacing the observations below the lower limit with the value of the 5th percentile and those above the upper limit with the value of the 95th percentile.
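The three approaches can be sketched as follows. This is an illustration under stated assumptions, not the slides' own code: the 21 readings are hypothetical, quartiles and percentiles come from `statistics.quantiles(..., method="inclusive")` (linear interpolation), and imputation uses the median of the non-outlier values.

```python
import statistics

# Hypothetical attribute values (21 readings) with one low (5) and one high (140) outlier.
data = [55, 60, 5, 62, 58, 140, 64, 50, 57, 66, 61, 52, 63, 70, 59, 53, 68, 56, 65, 72, 60]

def iqr_limits(values):
    """Lower/upper limits of the 1.5 x IQR rule, using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def remove_outliers(values):
    """Approach 1: drop every value outside the limits."""
    lower, upper = iqr_limits(values)
    return [v for v in values if lower <= v <= upper]

def impute_outliers(values, stat=statistics.median):
    """Approach 2: replace outliers with a statistic (median by default) of the remaining values."""
    lower, upper = iqr_limits(values)
    fill = stat([v for v in values if lower <= v <= upper])
    return [v if lower <= v <= upper else fill for v in values]

def cap_outliers(values):
    """Approach 3: below the lower limit -> 5th percentile, above the upper limit -> 95th percentile."""
    lower, upper = iqr_limits(values)
    cuts = statistics.quantiles(values, n=20, method="inclusive")  # 5th, 10th, ..., 95th percentiles
    p5, p95 = cuts[0], cuts[-1]
    return [p5 if v < lower else p95 if v > upper else v for v in values]

print(iqr_limits(data))            # (42.5, 78.5)
print(remove_outliers(data))
print(impute_outliers(data)[:6])   # [55, 60, 60, 62, 58, 60]
print(cap_outliers(data)[:6])      # [55, 60, 50.0, 62, 58, 72.0]
```

Percentile capping works best on larger samples; in very small samples the 5th and 95th percentiles can themselves lie beyond the 1.5 × IQR limits.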
Handling Missing Values
1.) Eliminate records having missing values
If the number of such records is within a tolerable limit, this is an effective approach.
2.) Impute missing values
Assign a value to the missing data elements. The mean, median, or mode is most frequently used.
3.) Estimate missing values
If there are data points similar to the ones with missing attribute values, the attribute values from those similar data points can be substituted for the missing values.
Ex. The weight of a student aged 12 years with a height of 5 ft is missing. The weight of any other student aged close to 12 years with a height close to 5 ft can be assigned.
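The three approaches can be sketched on the student example above. The records, the use of `None` for a missing value, and the "most similar record" rule (smallest Euclidean distance on age and height, via `math.dist`) are assumptions for illustration; because age and height are on different scales, a real implementation would usually scale them first.

```python
import math
import statistics

# Hypothetical student records: (age in years, height in ft, weight in kg); None = missing weight.
students = [
    (12.0, 5.0, None),
    (11.8, 5.1, 41.0),
    (15.0, 5.6, 55.0),
    (12.5, 4.8, 39.5),
    (9.0, 4.3, 30.0),
]

def eliminate(records):
    """Approach 1: keep only complete records."""
    return [r for r in records if None not in r]

def impute_mean(records):
    """Approach 2: fill each missing weight with the mean of the known weights."""
    mean_w = statistics.mean(r[2] for r in records if r[2] is not None)
    return [(a, h, mean_w if w is None else w) for a, h, w in records]

def estimate_from_similar(records):
    """Approach 3: copy the weight of the most similar complete record (nearest in age and height)."""
    complete = eliminate(records)
    filled = []
    for a, h, w in records:
        if w is None:
            nearest = min(complete, key=lambda r: math.dist((a, h), (r[0], r[1])))
            w = nearest[2]
        filled.append((a, h, w))
    return filled

print(eliminate(students))
print(impute_mean(students)[0])            # (12.0, 5.0, 41.375)
print(estimate_from_similar(students)[0])  # (12.0, 5.0, 41.0)
```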
Data Preprocessing
Data preprocessing is a process of preparing the raw data and making it suitable for a machine
learning model.
It is the first and a crucial step in creating a machine learning model.
◦ When creating a machine learning project, we do not always come across clean and formatted data. Before performing any operation on the data, it must be cleaned and put into a formatted way. Data preprocessing is the task used for this.
Why do we need Data Preprocessing?
Real-world data generally contains noise and missing values, and may be in an unusable format that cannot be used directly by machine learning models.
Data preprocessing is required to clean the data and make it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.
Data Preprocessing steps:
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding Missing Data
5. Encoding Categorical Data
6. Splitting dataset into training and test set
7. Feature scaling
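The seven steps can be sketched end to end on a tiny dataset. This is a standard-library illustration under assumptions: the CSV content is hypothetical and read from memory instead of a file, missing numeric values are imputed with the column mean, the split is 75/25 with a fixed random seed, and scaling is standardization. In practice, pandas (`read_csv`) and scikit-learn (`train_test_split`, `OneHotEncoder`, `StandardScaler`) cover these steps.

```python
import csv
import io
import random
import statistics

# Steps 1-3: a tiny hypothetical dataset, read from an in-memory CSV instead of a real file.
RAW = """country,age,salary,purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 4: find missing data (empty strings) and impute numeric columns with the column mean.
for col in ("age", "salary"):
    mean = statistics.mean(float(r[col]) for r in rows if r[col] != "")
    for r in rows:
        r[col] = float(r[col]) if r[col] != "" else mean

# Step 5: encode categorical data (one-hot for the feature, 0/1 for the label).
categories = sorted({r["country"] for r in rows})
X = [[1.0 if r["country"] == c else 0.0 for c in categories] + [r["age"], r["salary"]] for r in rows]
y = [1 if r["purchased"] == "Yes" else 0 for r in rows]

# Step 6: split into training and test sets (75/25, shuffled with a fixed seed).
idx = list(range(len(rows)))
random.Random(42).shuffle(idx)
cut = int(0.75 * len(idx))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, X_test = [X[i] for i in train_idx], [X[i] for i in test_idx]
y_train, y_test = [y[i] for i in train_idx], [y[i] for i in test_idx]

# Step 7: feature scaling (standardization) of the numeric columns, fitted on the training set only.
for j in range(len(categories), len(categories) + 2):
    mu = statistics.mean(row[j] for row in X_train)
    sd = statistics.pstdev([row[j] for row in X_train])
    for row in X_train + X_test:  # same row objects, so scaling updates them in place
        row[j] = (row[j] - mu) / sd

print("train:", len(X_train), "test:", len(X_test))
```

Scaling parameters come from the training set only, so no information from the test set leaks into training. For brevity the mean imputation in step 4 uses the whole dataset; a stricter pipeline would fit it on the training data as well.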
Thank you…
