Data Visualization

The document discusses key concepts in data exploration including data visualization techniques like histograms, box plots, and scatter plots. It also covers descriptive statistics and correlation. Histograms, box plots, and scatter plots are useful for exploring patterns in data. Descriptive statistics and correlation help analyze relationships between variables.

Uploaded by

yasmine hussein

Principles and Practices of Data Science
Data Science Process Stages:
Stage 4. Data Exploration
● Data exploration means exploring, visualizing, describing, and analysing the
characteristics of a dataset, such as its size, quantity, and accuracy, in order to
better understand the nature of the data and identify areas or patterns to
dig into further.

● It uses statistical techniques and data visualizations to examine the
data at a high level.

● Businesses determine which data is most important and which data may
distort the analysis and should therefore be removed.

● It reduces the time spent on less valuable analysis by
selecting the right path forward from the start.
Data Visualization
Data visualization is the graphical representation of information and data.

By using visual elements such as charts, graphs, and maps, data visualization tools
provide an accessible way to explore and understand the nature, spread, trends,
outliers, and patterns in data.
1. Histogram
A histogram is a plot that shows the frequency distribution of a single numerical variable.

The histogram gives insight into the underlying distribution of the variable, including outliers and skewness.

In Python, to draw a histogram, call the hist() function of the matplotlib library (matplotlib.pyplot.hist()).
Example 1. Draw a histogram of the following data (a list of prices of commonly
sold items at AllElectronics).

The numbers have been sorted: 1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14,
14, 15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20,
21, 21, 21, 21, 25, 25, 25, 25, 25, 28, 28, 30, 30, 30.
The frequency table shows the prices in the first column and the
frequency of each price in the second column.
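
As a minimal sketch, a frequency table for the AllElectronics prices can be built with Python's collections.Counter. The helper names print_frequency_table and plot_histogram and the bin count of 6 are assumptions made for illustration. plot_histogram needs matplotlib installed, so it is defined here but not called.

```python
from collections import Counter

# Prices of commonly sold items at AllElectronics, copied from the example.
prices = [
    1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14,
    14, 15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18,
    20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
    25, 25, 25, 25, 25, 28, 28, 30, 30, 30,
]

# Frequency table: price -> number of occurrences.
frequency = Counter(prices)


def print_frequency_table(freq):
    """Print one row per distinct price, in ascending price order."""
    print(f"{'Price':>5}  {'Frequency':>9}")
    for price in sorted(freq):
        print(f"{price:>5}  {freq[price]:>9}")


def plot_histogram(values, bins=6):
    """Draw a histogram with matplotlib.

    matplotlib is imported inside the function so the frequency table
    code runs without it. The bin count is an arbitrary illustrative choice.
    """
    import matplotlib.pyplot as plt

    plt.hist(values, bins=bins, edgecolor="black")
    plt.xlabel("Price")
    plt.ylabel("Frequency")
    plt.show()


print_frequency_table(frequency)
```

Calling plot_histogram(prices) would then draw the histogram from the same list.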
Example 2. The dataset shows the grades of 25 math students. Draw a histogram for
this dataset.
Solution: [figure: histogram of the 25 grades]
Box and Whisker Plot
● A box plot is a graphical representation of numerical data that can be used to
understand the variability of the data and the existence of outliers.

● A box plot is built from the following descriptive statistics:

1. Lower quartile, median, and upper quartile

2. Lowest and highest values

- The box plot is constructed using the IQR and the minimum and maximum values. The IQR (interquartile range) is the distance between the 3rd
quartile and the 1st quartile. The length of the box is equal to the IQR.

In Python, to draw a box plot, call the boxplot() function of the seaborn library.
● For example, the scores of 15 students in the first data science exam were
as follows: {77, 45, 66, 80, 90, 34, 89, 95, 66, 67, 45, 88, 80, 72, 70}
● To draw the box and whisker plot for this dataset, 5 values are required.
Start with the median:
1. Order the values in ascending order.

34, 45, 45, 66, 66, 67, 70, 72, 77, 80, 80, 88, 89, 90, 95

2. In this case the count of values is odd, so the median is the middle value, 72.
3. Find the lower quartile and the upper quartile.

- Lower quartile: the median of the values below the median. Here it is 66.

- Upper quartile: the median of the values above the median. Here it is 88.

4. Find the extreme values (the range), that is, the minimum and the maximum:
- The minimum value is 34.

- The maximum value is 95.
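
The steps above can be sketched in Python using only the standard library. The function name five_number_summary is an assumption for illustration, and the quartiles use the median-of-halves rule described in step 3. Plotting libraries may use a different quartile convention (for example, linear interpolation), so their box edges can differ slightly from these values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 is the median of the values below the median and Q3 is the median
    of the values above it. For an odd count, the median itself is left
    out of both halves. Assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Scores of the 15 students from the example.
scores = [77, 45, 66, 80, 90, 34, 89, 95, 66, 67, 45, 88, 80, 72, 70]

minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # length of the box

print(minimum, q1, med, q3, maximum, iqr)  # prints: 34 66 72 88 95 22
```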
The solution:

[Figure: box and whisker plot of the scores, with each of the four sections marked 25%]

The box and whisker plot shows that 50% of the students have scores between 66 and 88 points.

In addition, roughly 75% scored below 88 points, and roughly 75% scored above 66 points. So, if your
test result falls somewhere in the lower whisker, you may need to study more.

Question: A clothing shop has two stores. The following datasets show the sales recorded in each
store each month. Interpret the results of the box and whisker plot for both stores.

Store 1:

250, 360, 90, 189, 580, 350, 200, 180, 266, 410, 190, 170.

Store 2:

520, 320, 450, 500, 120, 500, 630, 420, 210, 600, 230, 140.
Scatter plots
A scatter plot shows the relationship between two continuous variables by plotting
each observation as an individual data point in a 2D graph.

For example, from a scatter plot of engine size against price we can infer that there is
a linear relationship between engine size and price: cars with bigger engines
tend to cost more than cars with smaller engines.
Descriptive Statistics

● Descriptive statistics summarize or describe features of a dataset, such as
measures of central tendency or measures of spread.

- Univariate data: the analysis involves only one variable and is used to
identify the characteristics of a single trait, without analyzing any relationships
or causes. For example, the mean age of university students.

- Bivariate data: the analysis examines the relationship between two variables
and attempts to link them through correlation. For example, testing whether there is a
relationship between the age of a student and the test score.
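
As a small univariate illustration, the sketch below summarizes the 15 exam scores from the box plot example with Python's statistics module. The choice of the population standard deviation (pstdev) as the spread measure is an assumption. The sample standard deviation (stdev) would be equally reasonable if the scores are treated as a sample.

```python
from statistics import mean, median, pstdev

# Exam scores of the 15 students from the box plot example.
scores = [77, 45, 66, 80, 90, 34, 89, 95, 66, 67, 45, 88, 80, 72, 70]

# Measures of central tendency.
score_mean = mean(scores)
score_median = median(scores)

# Measures of spread.
score_range = max(scores) - min(scores)
score_pstdev = pstdev(scores)  # population standard deviation

print(
    f"mean={score_mean:.2f}, median={score_median}, "
    f"range={score_range}, std={score_pstdev:.2f}"
)
```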
Correlation
● When we look at two variables over time, if one variable changes, how does this affect
the other variable?

● For example, smoking is known to be correlated with lung cancer, since smoking
increases the chances of lung cancer.

● Correlation means a mutual connection between two or more sets of
data. In statistics, bivariate data (two random variables) are used to find the
correlation between them.

● The correlation coefficient measures the correlation in
bivariate data: it denotes how strongly two random
variables are correlated with each other.
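
For reference, a common form of the sample (Pearson) correlation coefficient is given below. It is assumed to be the form used in the worked example, since it matches the steps there: a numerator of summed products of deviations, divided by the square root of the product of two sums of squared deviations.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the sample means of X and Y, and r always lies between -1 and 1.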
- Example: find the correlation coefficient
between X and Y.

● Start by finding the sample means.

● Calculate the distance of each data point from its mean.

● Complete the top (numerator) of the coefficient equation.

● Complete the bottom (denominator) of the coefficient equation.

● Multiply the results of the two expressions together:

18 × 50 = 900

● Then take the square root of the product:

√900 = 30

● Plug in the numbers for the numerator and denominator:

r = 30/30 = 1
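
The same steps can be sketched in Python. The function name pearson_r is an assumption for illustration, and the x and y lists are hypothetical values chosen so that the points fall exactly on a line. They are not the X and Y values of the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    # Step 1: sample means.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Step 2: distance of each data point from its mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Step 3: numerator, the sum of the products of the deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Step 4: denominator, the square root of the product of the two
    # sums of squared deviations.
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical data (not taken from the example): y is exactly 2 * x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(pearson_r(x, y))  # prints: 1.0
```

Reversing y gives a perfectly decreasing relationship, for which the coefficient is -1.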
Causation
● Causation means that changes in one variable bring about changes in the
other; there is a cause-and-effect relationship between the variables. The two
variables are correlated with each other, and there is also a causal link
between them.
Correlation Vs Causation
A classic example:

● During the summer, the sale of ice cream at a beach increases.

● At the same time, drowning accidents also increase.

Does this mean that an increase in ice cream sales is a direct cause of
increased drowning accidents?

In other words: can we use ice cream sales to predict drowning accidents?

The answer is: probably not.

It is likely that these two variables are correlated only because both increase during the summer, not because one causes the other.

What causes drowning then?

● Unskilled swimmers
● Waves
● Lack of supervision
● Alcohol (mis)use