Data Preprocessing

Uploaded by

asiyashaik7867

Data Formatting
1. Data is usually collected from different places by different people, so it may
be stored in different formats.
2. Data formatting means bringing data into a common standard of expression
that allows users to make meaningful comparisons.
3. As a part of dataset cleaning, data formatting ensures that data is consistent
and easily understandable.
For example, people may use different expressions to represent New York City,
such as N.Y., Ny, NY, and New York.
Sometimes this unclean data is a good thing to see.
For example, if you're looking at the different ways people tend to write New York,
then this is exactly the data that you want.
Or if you're looking for ways to spot fraud, perhaps writing N.Y. is more likely to
predict an anomaly than if someone wrote out New York in full.
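A minimal sketch of the formatting step described above, assuming the data sits in a pandas DataFrame. The column name `city`, the sample values, and the `canonical` mapping are hypothetical and chosen only for illustration. The raw column is kept next to the cleaned one, so the original spellings remain available for cases where they matter, such as fraud detection.

```python
import pandas as pd

# Hypothetical sample: the same city written four different ways.
df = pd.DataFrame({"city": ["N.Y.", "Ny", "NY", "New York"]})

# Assumed mapping from a simplified spelling to one canonical label.
canonical = {"ny": "New York", "new york": "New York"}

# Lower-case and drop periods first, so fewer variants need a mapping entry.
key = df["city"].str.lower().str.replace(".", "", regex=False).str.strip()

# Values with no mapping entry fall back to the original spelling.
df["city_clean"] = key.map(canonical).fillna(df["city"])

print(df)
```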
Data Normalization
1. Normalization enables a fairer comparison between the different features,
making sure they have the same impact.
2. It is also important for computational reasons.
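The text does not say which normalization method it has in mind. As an illustration, the sketch below shows two common choices, min-max scaling and z-score standardization, applied to a hypothetical DataFrame whose columns (`age`, `income`) are on very different scales.

```python
import pandas as pd

# Hypothetical features on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40],
    "income": [20000, 60000, 100000],
})

# Min-max scaling: (x - min) / (max - min) maps every column onto [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score: (x - mean) / std gives every column mean 0 and unit
# sample standard deviation (pandas uses ddof=1 by default).
z_score = (df - df.mean()) / df.std()

print(min_max)
print(z_score)
```

After either transformation, `age` and `income` span the same range, so neither dominates a distance or a regression coefficient simply because of its units.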
Box plots:
1. Box plots are a great way to visualize numeric data, since they show the
distribution of the data at a glance.
2. The main features a box plot shows are the median, which marks the middle
data point; the upper quartile, which marks the 75th percentile; and the
lower quartile, which marks the 25th percentile.
3. The data between the upper and lower quartile represents the
inter-quartile range (IQR).
4. Next, you have the lower and upper extremes.
5. These are calculated as 1.5 times the inter-quartile range above the 75th
percentile and 1.5 times the inter-quartile range below the 25th percentile.
6. Finally, box plots also display outliers as individual dots that occur
outside the upper and lower extremes.
7. With box plots, you can easily spot outliers and also see the distribution
and skewness of the data.
8. Box plots make it easy to compare between groups.
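The quantities a box plot draws can be computed directly. The sketch below uses a made-up sample and NumPy's default (linear interpolation) percentile rule. Other percentile conventions give slightly different quartiles, so treat the numbers as illustrative.

```python
import numpy as np

# Hypothetical numeric sample; 40 sits far away from the rest.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

# Lower quartile, median, and upper quartile.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Limits for the extremes: 1.5 * IQR beyond each quartile.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Points outside the limits are the dots a box plot draws as outliers.
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(q1, median, q3, iqr)
print(lower_limit, upper_limit, outliers)
```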
Pivot Table and Heatmap
1. A pivot table has one variable displayed along the columns and the other variable
displayed along the rows. With just one line of code, using the Pandas pivot
method, we can pivot the body style variable so that it is displayed along the
columns, with the drive wheels displayed along the rows.
2. The price data now becomes a rectangular grid, which is easier to visualize. This is
similar to what is usually done in Excel spreadsheets.
3. Another way to represent the pivot table is a heatmap plot. A heatmap takes a
rectangular grid of data and assigns a color intensity based on the data value at the
grid points.
4. It is a great way to plot the target variable over multiple variables, and through this,
get visual clues about the relationship between these variables and the target.
5. In this example, we use pyplot's pcolor method to plot the heatmap and convert the
previous pivot table into a graphical form.
6. We specified the red-blue color scheme.
7. In the output plot, each type of body style is numbered along the x-axis, and each
type of drive wheels is numbered along the y-axis.
8. The average prices are plotted with varying colors based on their values according to
the color bar.
9. We see that the top section of the heatmap seems to have higher prices than the
bottom section.
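The steps above can be sketched as follows. The dataset the slides use is not included here, so the DataFrame below is a small made-up stand-in. The column names `drive-wheels`, `body-style`, and `price` are assumptions based on the variables the text mentions. Averaging with `groupby` before pivoting is also an assumption, made because `pivot` requires one value per row/column pair.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical car data; names and prices are invented for illustration.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "fwd", "rwd"],
    "body-style": ["sedan", "hatchback", "sedan", "hatchback", "sedan", "sedan"],
    "price": [10000, 8000, 20000, 15000, 12000, 24000],
})

# Average price for each (drive-wheels, body-style) combination.
grouped = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()

# Pivot: drive wheels along the rows, body style along the columns.
pivot = grouped.pivot(index="drive-wheels", columns="body-style", values="price")
print(pivot)

# Heatmap of the grid with the red-blue color map and a color bar.
plt.pcolor(pivot.to_numpy(), cmap="RdBu")
plt.colorbar()
plt.xlabel("body style (column number)")
plt.ylabel("drive wheels (row number)")
```

In an interactive session, `plt.show()` displays the figure. `pcolor` draws the first row of the grid at the bottom of the plot, so the top of the heatmap corresponds to the last row of the pivot table.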
