Data Analysis and Visualization EDA

Exploratory Data Analysis (EDA) is a method for summarizing and understanding data characteristics, relationships between variables, and identifying key factors affecting outcomes, such as car prices. The lecture covers data types, descriptive analysis, measures of central tendency, variability, and the implementation of these concepts using Python. Key topics include the importance of variance and standard deviation in data analysis, as well as the use of z-scores for comparing distributions.

Uploaded by

usairashahbaz152

EXPLORATORY DATA ANALYSIS
Lecture 2

EXPLORATORY DATA ANALYSIS
• EDA is an approach to analyzing data in order to summarize its main characteristics, gain a better understanding of the data set, uncover relationships between different variables, and extract the variables that matter most for the problem we're trying to solve.

• Question: which characteristics have the most impact on the car price?
TOPICS COVERED
Types and Forms of Data

Understanding Descriptive Data Analysis

Types of Measures

• Central Tendency
• Variability
• Relative Standing

TYPES AND FORMS OF DATA
Data: Encoded Knowledge (numbers, text, sound, colors, images)

Dataset: Table (Rows, Columns, Cell (Data Items))

Data Types:
• Independent – given (Input)
• Dependent – observed (Output)
Data Forms:
• Discrete – finite possible values (yes/no; republican/democrat; satisfied/not-satisfied, etc.)
• Continuous – infinite possible values (height, weight, length, time, etc.)

FORMS OF DATA

DIMENSIONALITY OF DATA SETS
• Univariate: f(x)
• Bivariate: f(x1, x2)
• Multivariate: f(x1, x2, x3, …, xn)

[Figure: data table with columns X1, X2, X3, X4, …, Xn]

UNDERSTANDING DESCRIPTIVE DATA ANALYSIS
• Describing and summarizing data.
• Uses two main approaches:
• Quantitative: numerically.
• Visual: charts and other graphs.
• It does not infer or estimate properties of a larger population; it only describes the data at hand.

TYPES OF MEASURES
• Central Tendency. To find the center, i.e., mean, median, and mode.

• Variability. To find the “data spread” or distance from the center, i.e., variance and standard deviation (std).

• Relative Standing. To find the relative position of specific data items, i.e., covariance and correlation coefficient.

CENTRAL TENDENCY
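CENTRAL LIMIT THEOREM
• The mean of a random sample more closely resembles the mean of the whole population as the sample size increases (strictly speaking, this part is the law of large numbers).
• The central limit theorem adds that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution (assuming it has finite variance).

• Mean of Sample ~ Mean of whole population
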
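A small simulation can make this idea concrete. The sketch below is illustrative only: the exponential population, the fixed seed, the sample sizes, and the helper name `sample_means` are assumptions chosen for the demonstration and are not part of the lecture. It shows that the sample means cluster more tightly around the population mean as the sample size grows, even though the population itself is strongly skewed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumed) so the run is repeatable

# Assumed population: a right-skewed exponential distribution with mean 1.0
population_mean = 1.0


def sample_means(sample_size, repetitions=2000):
    """Draw `repetitions` random samples and return the mean of each one."""
    samples = rng.exponential(scale=population_mean,
                              size=(repetitions, sample_size))
    return samples.mean(axis=1)


# Spread (standard deviation) of the sample means for growing sample sizes
spread = {n: float(sample_means(n).std()) for n in (5, 50, 500)}

for n, s in spread.items():
    print(f"n = {n:>3}: std of the sample means = {s:.3f}")
```

The printed spread should shrink as n grows, which is the behavior summarized by "Mean of Sample ~ Mean of whole population".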
NORMAL DISTRIBUTION

Symmetrical (50% above and below mean)

[Figure: normal curve with nested bands marked 68%, 95%, and 99% around the mean]

THE BELL CURVE

Histograms (Frequency of items)

[Figure: bell-shaped histogram with Mean = 70; the .01 tail on each side is labeled "Significant"]

MEASURES OF CENTRAL TENDENCY
• Mean - arithmetic average
  – $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $i = 1, 2, \ldots, n$.

• Median - midpoint of the distribution (data items need to be sorted in ascending order first)
• Mode - the value that occurs most often

ARITHMETIC MEAN EXAMPLE

Scores: 98, 88, 81, 74, 72, 72, 70, 69, 65, 52
Sum of scores = 741
Mean = 741 / 10 = 74.1

MODE EXAMPLE

Find the score that occurs most frequently

Scores: 98, 88, 81, 74, 72, 72, 70, 69, 65, 52
Mode = 72

MEDIAN EXAMPLE

Arrange in descending order and find the midpoint

Odd Number (N = 9): 98, 88, 81, 74, 72, 70, 69, 65, 52
Midpoint = 72

Even Number (N = 10): 98, 88, 81, 74, 72, 71, 70, 69, 65, 52
Midpoint = (72 + 71) / 2 = 71.5

The two most important steps of this implementation are as follows:
• Sorting the elements of the dataset
• Finding the middle element(s) in the sorted dataset

IMPLEMENTATION OF CONCEPTS USING PYTHON
# Import libraries
import math
import statistics
import numpy as np
# Define a list
x = [8.0, 1, 2.5, 4, 28.0]  # Sample data
# Print values
print(x)
# Calculate the mean in pure Python
mean_ = sum(x) / len(x)  # 8.7
# Using the statistics library
mean_ = statistics.mean(x)  # 8.7
# If you are using NumPy
y = np.array(x)
mean_ = np.mean(y)  # 8.7

IMPLEMENTATION OF CONCEPTS USING PYTHON

# Calculate the median in pure Python

n = len(x)
if n % 2:  # Odd number of data items
    median_ = sorted(x)[round(0.5 * (n - 1))]
else:  # Even number of data items
    x_ord, index = sorted(x), round(0.5 * n)
    median_ = 0.5 * (x_ord[index - 1] + x_ord[index])

print(median_)  # 4

# Using the statistics library

median_ = statistics.median(x)  # 4

# If you are using NumPy

y = np.array(x)
median_ = np.median(y)  # 4.0

IMPLEMENTATION OF CONCEPTS USING PYTHON

u = [2, 3, 2, 8, 12]  # Let's change the data to this one

# Calculate the mode in pure Python

mode_ = max((u.count(item), item) for item in set(u))[1]
print(mode_)  # 2

# Using the statistics library

mode_ = statistics.mode(u)  # 2

# NumPy itself has no mode function; with a NumPy array, use SciPy

import scipy.stats
w = np.array(u)
mode_ = scipy.stats.mode(w).mode  # 2 (older SciPy versions return array([2]))

VARIABILITY
VARIABILITY
• The measures of central tendency aren’t sufficient to describe data.
• We also need measures of variability that quantify the spread of data points.

• Variability measures:
• Range
• Variance
• Standard deviation
• Skewness
• Percentiles

THE RANGE AS A MEASURE OF SPREAD
• Range = largest value – smallest value

Group 1: 100, 100, 99, 98, 88, 77, 72, 68, 67, 52, 43, 42
Group 2: 91, 85, 81, 79, 78, 77, 73, 75, 72, 70, 65, 60

Range G1: 100 – 42 = 58
Range G2: 91 – 60 = 31
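
The two ranges can be checked with a few lines of Python. This is a minimal sketch using the group values from the slide; the helper name `value_range` is an assumption introduced only for illustration.

```python
# Scores of the two groups as listed on the slide
group_1 = [100, 100, 99, 98, 88, 77, 72, 68, 67, 52, 43, 42]
group_2 = [91, 85, 81, 79, 78, 77, 73, 75, 72, 70, 65, 60]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


range_g1 = value_range(group_1)
range_g2 = value_range(group_2)
print(range_g1, range_g2)  # 58 31
```

Group 1 is spread over a wider interval than Group 2, although the range uses only the two extreme values and says nothing about how the remaining scores are distributed.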

VARIANCE

• The sample variance quantifies the spread of the data.

• It shows numerically how far the data points are from the mean.

( X i − X ) 2 ( X i − X ) 2
S =
2
s =
2

N n −1
Population Variance Sample Variance

23
VARIANCE EXAMPLE

X     Mean    X - Mean    (X - Mean)²
98    74.1      23.90       571.21
88    74.1      13.90       193.21
81    74.1       6.90        47.61
74    74.1      -0.10         0.01
72    74.1      -2.10         4.41
72    74.1      -2.10         4.41
70    74.1      -4.10        16.81
69    74.1      -5.10        26.01
65    74.1      -9.10        82.81
52    74.1     -22.10       488.41
Mean = 74.1           Sum = 1,434.90

Population Variance (N): 1,434.90 / 10 = 143.49
Sample Variance (n-1): 1,434.90 / 9 = 159.43

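The worked example can be reproduced in Python. The sketch below follows the table step by step and then cross-checks the result against the standard library; the variable names are assumptions made for illustration.

```python
import statistics

scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

mean_ = statistics.mean(scores)                          # 74.1
sum_sq_dev = sum((s - mean_) ** 2 for s in scores)       # sum of squared deviations

pop_var = sum_sq_dev / len(scores)                       # divide by N
sample_var = sum_sq_dev / (len(scores) - 1)              # divide by n - 1

print(round(sum_sq_dev, 2), round(pop_var, 2), round(sample_var, 2))

# Cross-check with the statistics module
print(statistics.pvariance(scores), statistics.variance(scores))
```

Dividing by n - 1 instead of N always gives the slightly larger value, which is why the sample variance exceeds the population variance for the same ten scores.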
WHY FIND VARIANCE AND STD?
• The variance is used in many higher-order calculations including:
• T-test (inferential, mean, two samples)
• Analysis of Variance (ANOVA) (inferential, variance, two samples)
• Regression (inferential, cause-effect correlation)
• Variance = zero: all values within the set are identical.
• All other variances are positive numbers. Why? Because the variance is an average of squared deviations, and a square can never be negative.
• A large variance indicates that the numbers in the set are far from the mean and from each other, while a small variance indicates the opposite.

STANDARD DEVIATION

• Once you get the variance, you can calculate the standard deviation by taking
its square root.
• The higher the standard deviation, the greater the variability and/or spread
of scores
• Why std instead of variance?
• The std is more convenient because it has the same unit as the data points, i.e. s instead of s².
• Helps to localize the data item (z-score)

$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

STANDARD DEVIATION EXAMPLE

X     Mean    X - Mean    (X - Mean)²
98    74.1      23.90       571.21
88    74.1      13.90       193.21
81    74.1       6.90        47.61
74    74.1      -0.10         0.01
72    74.1      -2.10         4.41
72    74.1      -2.10         4.41
70    74.1      -4.10        16.81
69    74.1      -5.10        26.01
65    74.1      -9.10        82.81
52    74.1     -22.10       488.41
Mean = 74.1           Sum = 1,434.90

Population STD: 1,434.90 / 10 = 143.49; √143.49 = 11.98
Sample STD: 1,434.90 / 9 = 159.43; √159.43 = 12.63

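As a quick check, the same two standard deviations can be obtained directly from Python's standard library. This is a minimal sketch; only the variable names are assumed.

```python
import statistics

scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

pop_std = statistics.pstdev(scores)    # square root of the population variance
sample_std = statistics.stdev(scores)  # square root of the sample variance

print(round(pop_std, 2), round(sample_std, 2))  # 11.98 12.63
```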
Z-SCORE FORMULA
The z-score is simply a way of telling how far a score is from the mean in standard deviation units.

Item localization in terms of std from the mean in either direction.

$z = \frac{X - \bar{X}}{S}$

Z-scores with positive values are above the mean, while z-scores with negative values are below the mean.

COMPARING Z-SCORES

• Z-scores allow the researcher to make comparisons between different distributions.

      Mathematics   English
µ     75            52
σ     6             4
X     78            57

Mathematics: $z = \frac{X - \mu}{\sigma} = \frac{78 - 75}{6} = \frac{3}{6} = 0.5$
English: $z = \frac{X - \mu}{\sigma} = \frac{57 - 52}{4} = \frac{5}{4} = 1.25$

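The comparison can be sketched in Python as follows. The function name `z_score` is an assumption introduced for illustration; the means, standard deviations, and scores are the ones given on the slide.

```python
def z_score(x, mean, std):
    """Distance of x from the mean, expressed in standard deviation units."""
    return (x - mean) / std


z_math = z_score(78, mean=75, std=6)
z_english = z_score(57, mean=52, std=4)

print(z_math, z_english)  # 0.5 1.25
```

Although the raw Mathematics score (78) is higher than the English score (57), the English score lies further above its own mean in standard deviation units, so it is the stronger result relative to its distribution.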
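AREA UNDER THE NORMAL CURVE

[Figure: normal curve with 50% of the area on each side of the mean. Each side holds about 34.1% between the mean and 1 std, 13.6% between 1 and 2 std, and 2.1% between 2 and 3 std, which gives about 68.3% within 1 std, 95.4% within 2 std, and 99.7% within 3 std.]
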
AREA UNDER THE NORMAL CURVE
• For many lists of observations, especially if their histogram is bell-shaped:
• Roughly 68% of the observations in the list lie within 1 std of the mean
• About 95% of the observations lie within 2 std of the mean
• About 99.7% of the observations lie within 3 std of the mean

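For an exactly normal distribution these shares can be computed rather than memorized. The sketch below uses `statistics.NormalDist` from the Python standard library (available since Python 3.8); the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Share of a normal distribution lying within k std of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} std: {share:.2%}")
```

The results are roughly 68.3%, 95.4%, and 99.7%. Real data sets only approximate these figures, and only when their histogram is close to bell-shaped.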
SKEWNESS

• It measures the asymmetry of a data sample.

• Asymmetry means the scores are concentrated toward one end of the x-axis, with a longer tail on the other side.

• Negative skewness: the bulk of the distribution leans toward the right side (higher numbers), with the longer tail on the left.
• Positive skewness: the bulk of the distribution leans toward the left side (lower numbers), with the longer tail on the right.

SYMMETRIC VS. SKEWED DATA

• Median, mean and mode of symmetric, positively skewed and negatively skewed data

[Figure: three distribution curves labeled positively skewed, negatively skewed, and symmetric]

WHEN THE DISTRIBUTION MAY NOT BE NORMAL

[Figure: histogram titled "Salary Sample Data"; x-axis: Annual Salary in Thousands of Dollars (25 to 175); y-axis: Frequency; annotations: Average = 62K, Mode = 45K, Median (value not legible)]

QUARTILES
• Each dataset has three quartiles, which are the percentiles that divide the dataset into four parts:
• Q1, a value for which 25% of the observations are smaller and 75% are larger
• Q2, same as the median (50% are smaller, 50% are larger)
• Q3, a value for which 75% of the observations are smaller and 25% are larger

PERCENTILES

• In general, the nth percentile is a value such that n% of the observations fall at or below it.

IMPLEMENTATION OF CONCEPTS USING PYTHON
# Import libraries
import math
import statistics
import numpy as np
# Define a list
x = [8.0, 1, 2.5, 4, 28.0]  # Sample data

# Calculate the sample variance in pure Python
n = len(x)
mean_ = sum(x) / n  # 8.7
var_ = sum((item - mean_)**2 for item in x) / (n - 1)  # 123.2
# Using the statistics library
var_ = statistics.variance(x)  # 123.2
# If you are using NumPy
y = np.array(x)
var_ = np.var(y, ddof=1)  # 123.2

It’s very important to specify the parameter ddof=1. That’s how you set the delta degrees of freedom to 1. This parameter allows the proper calculation of s², with (n − 1) in the denominator instead of n.

For whole population variance:
• Replace (n - 1) with n in the pure Python implementation.
• Use statistics.pvariance() instead of statistics.variance().
• Specify the parameter ddof=0 if you use NumPy or Pandas. In NumPy, you can omit ddof because its default value is 0.

IMPLEMENTATION OF CONCEPTS USING PYTHON

# Calculate the sample std in pure Python

std_ = var_ ** 0.5

# Using the statistics library

std_ = statistics.stdev(x)

# If you are using NumPy

y = np.array(x)
std_ = np.std(y, ddof=1)

IMPLEMENTATION OF CONCEPTS USING PYTHON

# Calculate the sample skewness in pure Python

x = [8.0, 1, 2.5, 4, 28.0]
n = len(x)
mean_ = sum(x) / n
var_ = sum((item - mean_)**2 for item in x) / (n - 1)
std_ = var_ ** 0.5
skew_ = (sum((item - mean_)**3 for item in x)
         * n / ((n - 1) * (n - 2) * std_**3))
print(skew_)  # 1.9470432273905929
# The skewness is positive, so x has a right-side tail.

# Using the SciPy library

import scipy.stats
y = np.array(x)
skew_ = scipy.stats.skew(y, bias=False)

IMPLEMENTATION OF CONCEPTS USING PYTHON

# Calculate quartiles (25th, 50th and 75th percentiles) with the statistics library

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
perc = statistics.quantiles(x, n=4, method='inclusive')
print(perc)  # [0.1, 8.0, 21.0]
# In this example, 8.0 is the median of x,
# while 0.1 and 21.0 are the sample 25th and 75th percentiles, respectively.

# Using the NumPy library

y = np.array(x)
perc = np.percentile(y, [25, 50, 75])
print(perc)  # 0.1, 8.0 and 21.0 (returned as a NumPy array)

IMPLEMENTATION OF CONCEPTS USING PYTHON

# Calculate the range in pure Python

x = [-5.0, -1.1, 0.1, 2.0, 8.0, 12.8, 21.0, 25.8, 41.0]
rng = max(x) - min(x)
print(rng)  # 46.0

# Using the NumPy library

y = np.array(x)
rng = np.ptp(y)
print(rng)  # 46.0
print(np.max(y) - np.min(y))  # 46.0

RELATIVE STANDING
COVARIANCE

• Signifies the direction of the linear relationship between the two variables.
• Direction means whether the variables tend to move together (positive covariance) or in opposite directions (negative covariance).
• The value of covariance can be any number (it is not scaled).
• It only measures how two variables change together, not the dependency of one variable on the other.

COVARIANCE EXAMPLE

• Mean ABC = (1.1 + 1.7 + 2.1 + 1.4 + 0.2) / 5 = 1.30

• Mean XYZ = (3 + 4.2 + 4.9 + 4.1 + 2.5) / 5 = 3.74
• Sum of products = [(1.1 - 1.30) × (3 - 3.74)] + [(1.7 - 1.30) × (4.2 - 3.74)] + [(2.1 - 1.30) × (4.9 - 3.74)] + … = 2.66
• Cov = 2.66 / (5 - 1) = 0.665

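The covariance example can be reproduced with NumPy. The sketch below follows the hand calculation first and then cross-checks it with `np.cov`, which normalizes by n - 1 by default; the variable names are assumptions for illustration.

```python
import numpy as np

abc = np.array([1.1, 1.7, 2.1, 1.4, 0.2])
xyz = np.array([3.0, 4.2, 4.9, 4.1, 2.5])

# Hand calculation: sum the products of the deviations, then divide by n - 1
sum_of_products = float(np.sum((abc - abc.mean()) * (xyz - xyz.mean())))
cov_manual = sum_of_products / (len(abc) - 1)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(abc, xyz)
cov_numpy = float(np.cov(abc, xyz)[0, 1])

print(round(sum_of_products, 2), round(cov_manual, 3), round(cov_numpy, 3))
```

The positive value indicates that ABC and XYZ tend to move in the same direction, but its magnitude depends on the units of both variables.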
CORRELATION

• It is used to study the strength of a relationship between two numerically measured, continuous variables.
• To determine whether the covariance of the two variables is large or small, we need to assess it relative to the standard deviations of the two variables.
• To do so, we normalize the covariance by dividing it by the product of the standard deviations of the two variables, which yields the correlation between the two variables.
• The main result of a correlation is called the correlation coefficient.
• The correlation coefficient is a dimensionless metric and its value ranges from -1 to +1.

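Written out, the normalization described above gives the Pearson correlation coefficient. The notation here is an assumption for illustration: $s_x$ and $s_y$ denote the sample standard deviations, and the $(n - 1)$ factors of the sample covariance and the two standard deviations cancel.

```latex
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because the numerator and denominator carry the same units, the ratio is dimensionless.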
CORRELATION VS COVARIANCE

Covariance:
• Covariance is nothing but a measure of correlation.
• Covariance indicates the direction of the linear relationship between variables.
• Covariance can vary between -∞ and +∞.
• Covariance is affected by a change in scale: if all the values of one variable are multiplied by a constant and all the values of the other variable are multiplied by the same or a different constant, the covariance changes.

Correlation:
• Correlation refers to the scaled form of covariance.
• Correlation, on the other hand, measures both the strength and direction of the linear relationship between two variables.
• Correlation ranges between -1 and +1.
• Correlation is not influenced by a change in scale.

CORRELATION EXAMPLE

SUBJECT   AGE X   GLUCOSE LEVEL Y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

From our table:
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size, in our case = 6

$R = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = 0.5298$

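The sums and the resulting coefficient can be verified in Python. The sketch below recomputes each sum from the table, applies the same formula, and cross-checks the result with `np.corrcoef`; the variable names are assumptions for illustration.

```python
import math

import numpy as np

age = [43, 21, 25, 42, 57, 59]        # x
glucose = [99, 65, 79, 75, 87, 81]    # y
n = len(age)

sum_x, sum_y = sum(age), sum(glucose)
sum_xy = sum(x * y for x, y in zip(age, glucose))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in glucose)

# Computational formula used on the slide
r_manual = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(x, y)
r_numpy = float(np.corrcoef(age, glucose)[0, 1])

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r_manual, 4), round(r_numpy, 4))
```

A coefficient of about 0.53 points to a moderate positive linear relationship between age and glucose level in this small sample.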
EXAMPLE
• A survey was given to students to find out how many hours per week they listen to a particular radio station.
• The data collected was split by gender.
• Determine the mean, range, variance and standard deviation of each group.
• Find the correlation coefficient R.

Group A (Female)   Group B (Male)
15                 30
25                 15
12                 21
7                  12
3                  25
33                 20
18                 5
16                 24
9                  17
24                 11

SOLUTION - EXAMPLE

Female Group Mean = 16.2 h/w
Male Group Mean = 18 h/w
Female Group Range = 30
Male Group Range = 25
Female Group Sample Variance / Std = 83.73 / 9.15
Female Group Population Variance / Std = 75.36 / 8.68
Male Group Sample Variance / Std = 56.22 / 7.5
Male Group Population Variance / Std = 50.6 / 7.11
Correlation Coefficient = -0.2089 (pairing the two columns row by row)
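
The solution can be reproduced with a short script. This is a sketch under two assumptions: the helper name `describe` is introduced only for illustration, and the correlation pairs the female and male values row by row as they appear in the table.

```python
import statistics

import numpy as np

female = [15, 25, 12, 7, 3, 33, 18, 16, 9, 24]
male = [30, 15, 21, 12, 25, 20, 5, 24, 17, 11]


def describe(hours):
    """Mean, range, and sample/population variance and std of one group."""
    return {
        "mean": statistics.mean(hours),
        "range": max(hours) - min(hours),
        "sample_var": statistics.variance(hours),
        "sample_std": statistics.stdev(hours),
        "pop_var": statistics.pvariance(hours),
        "pop_std": statistics.pstdev(hours),
    }


summary_female = describe(female)
summary_male = describe(male)

# Pearson correlation of the two columns, paired row by row (assumption)
r = float(np.corrcoef(female, male)[0, 1])

for name, summary in (("Female", summary_female), ("Male", summary_male)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
print("Correlation coefficient:", round(r, 4))
```

The weak negative coefficient suggests there is essentially no linear relationship between the two columns, which is expected because the rows pair unrelated respondents.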
