DOM503 Session 1

The document discusses the differences between categorical and numerical data, including their displays and measures of central tendency such as mean, median, and mode. It also covers concepts like percentiles, quartiles, variance, and standard deviation, along with methods for measuring skewness and kurtosis. Additionally, it addresses common errors in data visualization and emphasizes the importance of ethical considerations in data analysis and interpretation.

Uploaded by

Mihir Ritti

DOM503 2021

Session 1
Categorical and Numerical Data
Categorical data is data that is separated into various groupings or categories for display.
Takes the form of tables, bar charts, pie charts, etc.
Numerical data consists of numbers that have not been separated into categories.
Displays of numerical data include arrays, frequency distributions, scatter plots, etc.
Both types of data can be displayed using some types of tables such as Pivot Tables.
Grouped data mean example
Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Frequency   Relative Frequency   Percentage
10 but under 20    3           .15                  15
20 but under 30    6           .30                  30
30 but under 40    5           .25                  25
40 but under 50    4           .20                  20
50 but under 60    2           .10                  10
Total              20          1                    100

Each class is represented by its midpoint (15, 25, 35, 45, 55), weighted by the class frequency:
Mean = (15*3+25*6+35*5+45*4+55*2)/20
= 33
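The grouped-mean arithmetic can be checked with a short Python sketch. It assumes each class is represented by its midpoint, and it also computes the mean of the 20 raw values for comparison; the variable names are illustrative only.

```python
# Class midpoints and frequencies taken from the frequency table.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Raw data from the slide, for comparison with the grouped estimate.
raw = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
       32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
raw_mean = sum(raw) / len(raw)

print(grouped_mean)  # 33.0
print(raw_mean)      # 32.4
```

The grouped mean is an approximation: it differs slightly from the raw mean because every value in a class is treated as if it sat at the class midpoint.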
Measures of Central Tendency
Most sets of data show a tendency to group around a central value; this is the ‘central tendency’.
The most common measures of central tendency are mean, median, and mode.
The mean, also called the arithmetic mean, is the average of all values in the sample.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

n is the size of the sample. Excel command:

Mode
A measure of central tendency
It is the value that occurs most often in the sample.
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode if all values have the same frequency
There may be several modes if more than one value is tied for the highest frequency.
Excel command: MODE
Note: Not suitable for small data sets
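A minimal Python sketch of these rules, assuming we want "no mode" reported as an empty list and ties reported as several modes. The function name `modes` and the sample values are hypothetical, chosen only for illustration.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; empty if every distinct value is equally frequent."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # All distinct values tied (and more than one of them): treat as "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == highest]

print(modes([12, 13, 17, 21, 24, 24, 26, 27, 27]))  # [24, 27]  (two modes)
print(modes(["red", "blue", "red"]))                # ['red']   (categorical data)
print(modes([1, 2, 3]))                             # []        (no mode)
```

Note that Python's built-in `statistics.multimode` follows a different convention: when all values are tied it returns all of them rather than an empty list.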
Median
Robust measure of central tendency
Not affected by extreme values
In an ordered array, the median is the “middle” number
Median: the value at rank (n+1)/2 in the ordered array.
If n is odd, the median is the middle number.
If n is even, the median is the average of the two middle numbers.
EXCEL command: MEDIAN
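The odd/even rule can be sketched in Python as follows. The example reuses the nine-value ordered array from the interquartile range slide; the outlier value 210 is made up purely to show that the median does not move while the mean does.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # rank (n+1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of ranks n/2 and n/2 + 1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
with_outlier = data[:-1] + [210]   # replace the largest value with an extreme one

print(median(data), median(with_outlier))   # 16 16
print(sum(data) / len(data), sum(with_outlier) / len(with_outlier))
```

The second line of output shows the mean jumping from about 15.7 to about 36.7, while the median stays at 16.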
Percentile
To find the xth percentile, we use the same method as for quartiles.
List data in ascending order
xth percentile = Data in rank (n+1)x/100, where n is number of data points.
In case of a fractional rank, interpolate linearly between the two neighbouring data points (unitary method). E.g.: the 80th percentile out of 30 data points would be at rank 31*0.8 = 24.8. The value would be 24th data point * 0.2 + 25th data point * 0.8
EXCEL command for raw data: PERCENTILE.EXC
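The rank-and-interpolate procedure can be written as a small Python function. This is a sketch of the (n+1)x/100 method described above; the name `percentile_exc` and the 30 sample values are hypothetical, and the function simply refuses ranks that fall outside 1..n.

```python
def percentile_exc(values, x):
    """xth percentile (0 < x < 100) at rank (n+1)x/100, with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) * x / 100
    if rank < 1 or rank > n:
        raise ValueError("rank falls outside the data; percentile not defined by this method")
    lower = int(rank)            # whole part of the rank (1-based)
    fraction = rank - lower      # fractional part used as the interpolation weight
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] * (1 - fraction) + ordered[lower] * fraction

# 30 illustrative data points: 10, 20, ..., 300.
data = list(range(10, 310, 10))
p80 = percentile_exc(data, 80)   # rank 24.8 -> 0.2 * 240 + 0.8 * 250
print(round(p80, 6))             # 248.0
```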
Quartiles
Quartiles split data into 4 parts.
1st Quartile splits the lowest 25% of the values from the rest. (25th percentile)
3rd Quartile splits the lowest 75% of the values from the rest. (75th percentile)
Q2 is the median.
Interquartile range: Q3 – Q1 is a measure of how the data points are distributed around the central value Q2
EXCEL command: We can use either PERCENTILE.EXC or QUARTILE.EXC (the older QUARTILE function uses the inclusive method and can give slightly different values)
Interquartile Range
Measure of variation
Also known as midspread
Spread in the middle 50%
Difference between the first and third quartiles
Not affected by extreme values
Data in Ordered Array: 11 12 13 16 16 17 17 18 21

Interquartile Range = Q3 − Q1 = 17.5 − 12.5 = 5
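The quartiles in this example can be reproduced with Python's standard library. The sketch below assumes the exclusive (n+1) ranking method, which is what `statistics.quantiles` implements with `method="exclusive"`.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 12.5 16.0 17.5 5.0
```

Switching to `method="inclusive"` gives different quartiles for the same data, so the method should be stated whenever an interquartile range is reported.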

5-number summary and the Box Plot
The 5 numbers: smallest X, Q1, Q2, Q3, largest X
Boxplot
Graphical display of data using 5-number summary
[Figure: box plot on an axis from 4 to 12, marking Xsmallest, Q1, Median (Q2), Q3 and Xlargest]
Distribution Shape and the Boxplot
[Figure: three box plots labelled Left-Skewed, Symmetric and Right-Skewed, each marking Q1, Q2 and Q3]
Variation and shape of data: Range
Measure of variation
Difference between the largest and the smallest observations:

$$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$

Ignores the way in which data are distributed
Does not consider how the values cluster between extremes.
Variance
Important measure of variation
Shows variation about the mean
Is the average of the square of the difference between a data point and the mean
Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

where $\bar{X}$ is the sample mean.
Excel command: VAR.S
Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where µ is the population mean.
Excel command: VAR.P
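Both variance formulas translate directly into Python. The sketch below is written from the definitions rather than from any library; the function names and the eight data values are illustrative assumptions.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, squared deviations sum to 32

print(population_variance(data))   # 4.0  (32 / 8)
print(sample_variance(data))       # 32 / 7, about 4.571
```

The two results differ only in the divisor, which is the point discussed under the (n-1) slide.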
Standard Deviation
Most important measure of variation
Shows variation about the mean
Has the same units as the original data
Is the square root of the variance
Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Excel command STDEV.S
Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Excel command STDEV.P
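In Python, the standard library's `statistics` module offers counterparts to these commands: `stdev` divides by n - 1 and `pstdev` divides by N. The short sketch below, on made-up data, also confirms that each is the square root of the matching variance.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative values

s = stdev(data)        # sample standard deviation (divisor n - 1)
sigma = pstdev(data)   # population standard deviation (divisor N)

print(sigma)                                        # 2.0
print(math.isclose(s, math.sqrt(variance(data))))   # True
print(math.isclose(sigma, math.sqrt(pvariance(data))))  # True
```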
Why do we divide by (n-1) for sample variance?
For a sample variance to be unbiased, the average variance over all possible samples from a given population has to be equal to the population variance.
It can be shown mathematically that if the sample variance is calculated using n instead of n-1, the average variance over all possible samples is smaller than the population variance.
Using the denominator (n-1) when calculating sample variance corrects this; it is called Bessel’s correction.
As the sample size n becomes larger, dividing by n or by (n-1) gives nearly the same result.
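The unbiasedness claim can be checked by brute force on a tiny example. The sketch below assumes a made-up population of three values and samples of size 2 drawn with replacement, enumerates every possible sample, and averages the variance under each divisor. Exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]                     # hypothetical population
big_n = len(population)
mu = Fraction(sum(population), big_n)
pop_var = sum((x - mu) ** 2 for x in population) / big_n

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over the given divisor."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all 9 ordered samples, with replacement

avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)

print(pop_var, avg_with_n_minus_1, avg_with_n)  # 8/3 8/3 4/3
```

Dividing by n - 1 recovers the population variance on average, while dividing by n gives only half of it here, which is the factor (n-1)/n for n = 2.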
Measuring skewness
Skewness is the measure of asymmetry in a data distribution. One method of calculating it is the adjusted Fisher-Pearson coefficient, as follows:

$$G = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{S}\right)^3$$

A symmetrical distribution like a normal distribution will have G = 0. Negative value indicates left-skewed data, positive indicates right-skewed.
Presence of extreme outliers can distort value of G, giving erroneous results.
Excel command: SKEW
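A minimal Python sketch of the adjusted Fisher-Pearson coefficient, assuming the usual form G = n / ((n-1)(n-2)) times the sum of cubed standardized deviations, with S the sample standard deviation. The function name and test data are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """Adjusted Fisher-Pearson skewness; needs at least 3 non-identical values."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 data points")
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation (divisor n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 20]    # long tail to the right
left_skewed = [1, 17, 18, 19, 20]  # long tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

The three results come out as zero (up to rounding), positive and negative respectively. A single extreme value dominates the cubed terms, which is why outliers distort G so easily.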
Measuring kurtosis
Kurtosis is a measure of how ‘heavy’ the tails of a data set are, i.e., how many outliers are present, relative to a normal distribution.
Measured as excess kurtosis (which is what Excel’s KURT returns), a normal distribution has kurtosis = 0. Higher kurtosis indicates a large number of outliers, lower means few outliers.
Like skewness, extreme outliers can distort kurtosis values.
Excel command: KURT
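The slides do not show a kurtosis formula. The sketch below assumes the bias-adjusted sample excess kurtosis, which is the form documented for Excel's KURT: n(n+1) / ((n-1)(n-2)(n-3)) times the sum of fourth-power standardized deviations, minus 3(n-1)^2 / ((n-2)(n-3)). The function name and data are illustrative.

```python
from statistics import mean, stdev

def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis; needs at least 4 non-identical values."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 data points")
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation (divisor n - 1)
    fourth = sum(((x - x_bar) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

flat = [1, 2, 3, 4, 5]                             # evenly spread, light tails
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]    # one extreme value

print(round(excess_kurtosis(flat), 6))   # -1.2
print(excess_kurtosis(with_outlier) > excess_kurtosis(flat))  # True
```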
Errors in visualizing data
Using “chart junk”, visual effects that distort or distract from the data to be presented, e.g. garish graphics, irrelevant visuals.
Failing to provide a relative basis in comparing data between groups. For example, two separate pie charts showing the operations of two companies do not help if we’re trying to compare the two.
Compressing the vertical axis – using an axis going up to 100 when the highest value is 30.
Providing no zero point on the vertical axis
Pitfalls and Ethical Considerations
Data analysis is objective
Should report the summary measures that best meet the assumptions about the data set
Data interpretation is subjective
Should be done in a fair, neutral and clear manner
Numerical descriptive measures:
Should document both good and bad results
Should be presented in a fair, objective and neutral manner
Should not use inappropriate summary measures to distort facts
