MANG6513 2023 Lecture 3

Descriptive Statistics
MANG6513: Foundation of BAMS
Fangsheng Ge
Understand the feature of dataset
If given a dataset as depicted, how to describe the data? Essentially, this
means we would like to extract the observable feature/pattern of the data.

Raw data:
Age   Height   Sex
0     50       Boy
4     96       Boy
4     90       Girl
0     55       Girl
8     120      Boy
8     110      Boy
8     112      Girl
…     …        …

How? Summary by age and sex:
Age   Height(boy)   Height(girl)
0     50            50
4     96            92
8     114           110
12    129           133
…     …             …
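One way to get from the raw records to a per-group summary is to average the heights within each age and sex group. The following is a minimal R sketch that uses only the seven rows visible in the table; the slide elides the remaining rows, so the results will not match the summary table exactly. The names `kids` and `height_by_group` are illustrative choices, not part of the lecture.

```r
# Only the rows visible on the slide; the remaining rows are elided there.
kids <- data.frame(
  Age    = c(0, 4, 4, 0, 8, 8, 8),
  Height = c(50, 96, 90, 55, 120, 110, 112),
  Sex    = c("Boy", "Boy", "Girl", "Girl", "Boy", "Boy", "Girl")
)

# Mean height for every Age x Sex combination (rows = Age, columns = Sex).
height_by_group <- tapply(kids$Height, list(kids$Age, kids$Sex), mean)
print(height_by_group)
```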

2 Top 50 in the world for Statistics and Operational Research (QS 2021 – 2019 )
Describe data
Given the data, how do we describe it? Essentially, we want to extract the
observable features and patterns of the data.
• Univariate/bivariate case
e.g., What is the maximum/minimum/average height in the class?
How is height associated with weight/age/sex?
• Measurements/graphs
e.g., Which statistics can be used to measure or quantify a feature?
Can graphs help the illustration? -> We will cover this part next week

Measures of Centrality
• Centrality refers to the central location in the data
• Summarize the data into one number
• Key measures:
• Mean
• Median
• Mode
• Midrange

Mean
• Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ is the number of observations
• Sensitive to outliers
Consider two data sets: (1,3,5,6,8) and (1,3,5,6,20).
The mean values would be 4.6 and 7, respectively.

Median
• The median specifies the middle value when the data are arranged
from least to greatest.
• Half the data are below the median, and half the data are above it.
• For an odd number of observations, the median is the middle of the sorted
numbers.
• For an even number of observations, the median is the mean of the two
middle numbers.
• Not sensitive to outliers
Consider two data sets: (1,3,5,6,8) and (1,3,5,6,20).
The mean values would be 4.6 and 7, respectively.
The median values would be 5 for both.
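The comparison can be checked directly in R with the built-in `mean()` and `median()` functions. This is a small sketch on the two data sets from the slide; the variable names `a` and `b` are arbitrary.

```r
a <- c(1, 3, 5, 6, 8)
b <- c(1, 3, 5, 6, 20)   # same data, but the largest value is replaced by 20

mean_a <- mean(a)        # 4.6
mean_b <- mean(b)        # 7: pulled upwards by the single large value
median_a <- median(a)    # 5
median_b <- median(b)    # 5: unchanged by the large value
```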
Mode
• The mode is the observation that occurs most frequently
• You can easily identify the mode from a frequency distribution by
identifying the value or group having the largest frequency or from a
histogram by identifying the highest bar
Think: what about the two data sets: (1,3,5,6,8) and (1,3,5,6,20)?
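Base R has no function that returns the statistical mode (`mode()` reports the storage type of an object instead), so one option is to count frequencies with `table()`. The helper `modes()` below is a hypothetical name introduced for this sketch; it returns every value that attains the highest frequency.

```r
# Hypothetical helper: all values that occur with the highest frequency.
modes <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep the most frequent ones
}

modes(c(1, 3, 3, 5))       # 3
modes(c(1, 3, 5, 6, 8))    # every value occurs once, so there is no unique mode
modes(c(1, 3, 5, 6, 20))   # same situation
```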

Midrange
• The midrange is the average of the greatest and least values in the
data set
• Caution must be exercised when using the midrange because extreme
values easily distort the result
Consider two data sets: (1,3,5,6,8) and (1,3,5,6,20).
The mean values would be 4.6 and 7, respectively.
The median values would be 5 for both.
The midrange values would be 4.5 and 10.5, respectively.

• It provides a much rougher estimate than the mean and is often used
for only small sample sizes
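The midrange is not built into base R, but it follows directly from `range()`, which returns the minimum and the maximum. The function name `midrange` is an illustrative choice.

```r
# Average of the smallest and largest observation.
midrange <- function(x) mean(range(x))

midrange(c(1, 3, 5, 6, 8))    # (1 + 8) / 2  = 4.5
midrange(c(1, 3, 5, 6, 20))   # (1 + 20) / 2 = 10.5
```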
If we want to analyse a dataset of the personal income of all students at the
university, which measure would you suggest using to identify the central
location of the data?
1. Mean
✓ 2. Median
3. Mode

Measures of Dispersion
• Dispersion refers to the degree of variation in the data: its
numerical spread (or compactness), that is, how far the data
deviate from the central location.

• Key measures:
• Range
• Interquartile range
• Variance
• Standard deviation

Range
• The range is the simplest measure: the difference between the maximum
value and the minimum value in the data set

• The range is affected by outliers, and is often used only for very small
data sets
Consider two data sets: (1,3,5,6,8) and (1,3,5,6,20).
The ranges would be 8 − 1 = 7 and 20 − 1 = 19, respectively.
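A note on R naming: the built-in `range()` returns the minimum and the maximum as a pair, not their difference. To obtain the range as defined on this slide, take the difference of the two, for example with `diff()`.

```r
a <- c(1, 3, 5, 6, 8)
b <- c(1, 3, 5, 6, 20)

range(a)                    # 1 8: the minimum and the maximum
range_a <- diff(range(a))   # 8 - 1  = 7
range_b <- diff(range(b))   # 20 - 1 = 19
```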

Percentiles
• The kth percentile is a value at or below which at least k percent of
the observations lie. The most common way to compute the kth
percentile is to order the data values from smallest to largest and
calculate the rank of the kth percentile using the formula:
$\frac{nk}{100} + 0.5$
where n is the number of observations; round the result to the nearest
integer and take the value at that rank.

• If k = 50, we are looking for the point below which half of the
data lie; essentially, we are calculating the median.
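In R, percentiles are available through `quantile()`, which takes probabilities between 0 and 1 rather than percentages. By default it interpolates between observations (`type = 7`, one of nine documented definitions), so for small samples its results can differ from a hand calculation based on ranks. The sketch below only checks the k = 50 case.

```r
a <- c(1, 3, 5, 6, 8)

p50 <- unname(quantile(a, probs = 0.50))   # 50th percentile (default type = 7)
p50 == median(a)                           # TRUE: the 50th percentile is the median
```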

Quartiles
• Quartiles break the data into four parts.
• The 25th percentile is called the first quartile, Q1;
• the 50th percentile is called the second quartile, Q2;
• the 75th percentile is called the third quartile, Q3; and
• the 100th percentile is the fourth quartile, Q4.
• One-fourth of the data fall below the first quartile, one-half are below
the second quartile, and three-fourths are below the third quartile.

Interquartile Range
• The interquartile range (IQR), or the midspread, is the difference
between the third and first quartiles, Q3 – Q1.

• This includes only the middle 50% of the data and, therefore, is less
influenced by extreme values
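Quartiles and the IQR can be sketched in R with `quantile()` and `IQR()`. The values below rely on R's default quantile definition (`type = 7`); other definitions may give slightly different quartiles on such a small sample. Replacing 8 by 20 changes neither quartile, which illustrates why the IQR is less influenced by extreme values.

```r
a <- c(1, 3, 5, 6, 8)
b <- c(1, 3, 5, 6, 20)

q_a <- quantile(a, probs = c(0.25, 0.50, 0.75))   # Q1 = 3, Q2 = 5, Q3 = 6
q_b <- quantile(b, probs = c(0.25, 0.50, 0.75))   # identical quartiles

iqr_a <- IQR(a)   # Q3 - Q1 = 3
iqr_b <- IQR(b)   # still 3, although the maximum is now 20
```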

Identifying Outliers
• There is no standard definition of what constitutes an outlier.
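• A boxplot can be used for this; observations beyond the whisker limits
are flagged as potential outliers:
Upper limit (Max) = Q3 + 1.5 ∗ IQR
Lower limit (Min) = Q1 – 1.5 ∗ IQR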

• Some typical rules of thumb:

• z-scores greater than +3 or less than -3
• Extreme outliers are more than 3*IQR to the left of Q1 or right of Q3
• Mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
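The rules of thumb can disagree with each other, which is one reason there is no standard definition. A minimal R sketch on the data set (1,3,5,6,20), assuming R's default quartiles; all variable names are illustrative.

```r
b <- c(1, 3, 5, 6, 20)

q1  <- unname(quantile(b, 0.25))   # 3
q3  <- unname(quantile(b, 0.75))   # 6
iqr <- q3 - q1                     # 3

# IQR-based rules: beyond 1.5 * IQR (mild or worse) and beyond 3 * IQR (extreme).
beyond_1.5_iqr <- b[b < q1 - 1.5 * iqr | b > q3 + 1.5 * iqr]   # 20
beyond_3_iqr   <- b[b < q1 - 3 * iqr   | b > q3 + 3 * iqr]     # 20

# z-score rule: with only five observations no |z| can reach 3.
z <- (b - mean(b)) / sd(b)
beyond_3_sd <- b[abs(z) > 3]       # empty
```

Here the IQR rules classify 20 as an extreme outlier, while the z-score rule flags nothing.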
Variance and Standard Deviation
• The variance is the “average” of the squared deviations from the mean.
For a sample of n observations, the sum is divided by n − 1:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
• The standard deviation s is the square root of the variance:
$s = \sqrt{s^2}$

Consider two data sets: (1,3,5,6,8) and (1,3,5,6,20).
The variances are 7.3 and 56.5, respectively
The SDs are 2.7 and 7.5, approximately.
• The dimension/scale of the variance is the square of the dimension/scale
of the observations, whereas the dimension/scale of the standard
deviation is the same as the data.
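R's `var()` and `sd()` use the sample versions (dividing by n − 1), which reproduces the figures quoted for the two data sets:

```r
a <- c(1, 3, 5, 6, 8)
b <- c(1, 3, 5, 6, 20)

var_a <- var(a)   # 29.2 / 4 = 7.3
var_b <- var(b)   # 226 / 4  = 56.5
sd_a  <- sd(a)    # sqrt(7.3),  about 2.7
sd_b  <- sd(b)    # sqrt(56.5), about 7.5
```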
• For many data sets encountered in practice (those with a roughly bell-shaped
distribution):
• Approximately 68% of the observations fall within one standard deviation of
the mean
• Approximately 95% fall within two standard deviations of the mean
• Approximately 99.7% fall within three standard deviations of the mean
• These rules are commonly used to characterize the natural variation
in manufacturing processes and other business phenomena.
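The three percentages are the probabilities that a normally distributed value lies within one, two and three standard deviations of its mean. Assuming normality, they can be reproduced in R with the normal distribution function `pnorm()`:

```r
k <- 1:3                              # number of standard deviations
within_k_sd <- pnorm(k) - pnorm(-k)   # P(-k < Z < k) for a standard normal Z

round(100 * within_k_sd, 1)           # 68.3 95.4 99.7
```

For data that are far from bell-shaped, the actual proportions can differ noticeably from these values.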

Standardized Values
• A standardized value, commonly called a z-score, provides a normalised
measure of the distance of an observation from the mean, which is
independent of the units of measurement.
e.g., for data on different scales, say (1,3,5,6,9) and (10,30,50,60,90), can we
say they have different or the same dispersion?

• The z-score for the ith observation in a data set is calculated as follows:
$z_i = \frac{x_i - \bar{x}}{s}$


• The numerator represents the distance that xi is from the sample
mean; a negative value indicates that xi lies to the left of the mean,
and a positive value indicates that it lies to the right of the mean.

• By dividing by the standard deviation, s, we scale the distance from
the mean to express it in units of standard deviations. Thus,
• a z-score of 1.0 means that the observation is one standard deviation to the
right of the mean;
• a z-score of -1.5 means that the observation is 1.5 standard deviations to the
left of the mean.
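The question about (1,3,5,6,9) and (10,30,50,60,90) can be explored in R. The helper name `z_score` is introduced here for illustration; it uses the sample standard deviation from `sd()`.

```r
# Hypothetical helper: distance from the mean in units of standard deviations.
z_score <- function(x) (x - mean(x)) / sd(x)

x1 <- c(1, 3, 5, 6, 9)
x2 <- c(10, 30, 50, 60, 90)   # the same data multiplied by 10

sd(x2) / sd(x1)               # 10: in raw units the second set is ten times as spread out
z1 <- z_score(x1)
z2 <- z_score(x2)
all.equal(z1, z2)             # TRUE: the standardized values are the same
```

In raw units the two data sets differ in dispersion by a factor of ten, but after standardization they are indistinguishable.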

Measures of Shape
Skewness
• Skewness describes the degree of asymmetry of data
• Coefficient of Skewness (CS): $CS = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^3}{\sigma^3}$
• Distributions that tail off to the right are called positively skewed; those
that tail off to the left are said to be negatively skewed

[Figure: a positively skewed distribution and a symmetrical distribution]
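Base R does not ship a skewness function, so the sketch below computes a population-style coefficient by hand: the mean cubed deviation divided by the cubed standard deviation, with both averages dividing by the number of observations. The function name `coef_skew` is an assumption made for this example.

```r
# Hypothetical helper: mean cubed deviation divided by the cubed population SD.
coef_skew <- function(x) {
  mu    <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))   # population SD (divides by N, not N - 1)
  mean((x - mu)^3) / sigma^3
}

coef_skew(c(1, 2, 3, 4, 5))    # 0: perfectly symmetrical
coef_skew(c(1, 3, 5, 6, 20))   # positive: the value 20 creates a long right tail
```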

Kurtosis
• Kurtosis refers to the peakedness (i.e., high, narrow) or flatness (i.e.,
short, flat-topped) of a histogram.
• The coefficient of kurtosis (CK): $CK = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^4}{\sigma^4}$
• CK measures the degree of kurtosis of a population
• CK < 3 indicates the data is somewhat flat with a wide degree of dispersion.
• CK > 3 indicates the data is somewhat peaked with less dispersion.
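As with skewness, base R has no kurtosis function. The sketch below assumes the population form (mean fourth-power deviation divided by the squared population variance), which is the version whose benchmark value is 3. The function name `coef_kurt` and the two small data sets are made up for illustration.

```r
# Hypothetical helper: mean fourth-power deviation over squared population variance.
coef_kurt <- function(x) {
  mu <- mean(x)
  mean((x - mu)^4) / mean((x - mu)^2)^2
}

flat   <- 1:5                    # evenly spread values
peaked <- c(0, rep(5, 8), 10)    # most values sit exactly at the centre

coef_kurt(flat)     # 1.7, below 3
coef_kurt(peaked)   # 5, above 3
```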
Shape and Measures of Location
• Comparing measures of location can sometimes reveal information
about the shape of the distribution of observations.
• For example:
• If the distribution were perfectly symmetrical and unimodal, the mean,
median, and mode would all be the same.
• If it were negatively skewed, we would generally find that
mean < median < mode
• Positive skewness would suggest that mode < median < mean

Measures of Association
Covariance
• Covariance is a measure of the linear association between two
variables, X and Y.
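• The covariance between X and Y is the average of the product of the
deviations of each pair of observations from their respective means.
For a sample of n pairs:
$\mathrm{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
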
Covariance = 263.37

Correlation
• Correlation is a normalised measure of the linear relationship
between two variables, X and Y, which does not depend on the units of
measurement: the covariance divided by the product of the two standard
deviations, $r = \frac{\mathrm{cov}(X, Y)}{s_X s_Y}$.
• The correlation coefficient is scaled between -1 and 1.

Correlation = 0.56
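The data behind the two figures quoted on these slides are not reproduced here, so the sketch below instead uses the four age and height(boy) pairs from the summary table at the start of the lecture. It uses R's `cov()` and `cor()` (sample versions) and checks that a change of units alters the covariance but not the correlation; the factor 1/100 is an arbitrary rescaling chosen for the example.

```r
age    <- c(0, 4, 8, 12)
height <- c(50, 96, 114, 129)   # height(boy) column of the opening example

cov_xy   <- cov(age, height)                      # 510 / 3 = 170
r        <- cor(age, height)                      # about 0.96
r_manual <- cov_xy / (sd(age) * sd(height))       # same value from the definition

# Rescaling one variable changes the covariance but leaves the correlation alone.
cov_rescaled <- cov(age, height / 100)            # 1.7
r_rescaled   <- cor(age, height / 100)            # unchanged
```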

Examples of Correlation
[Figure: examples of correlation]

Measures of Association
• Two variables have a strong statistical relationship with one another if
they appear to move together.
• When two variables appear to be related, you might suspect a
cause-and-effect relationship.
• However, statistical relationships may exist even though a change in
one variable is not caused by a change in the other.

Introduction to R
The R Environment
• R is an integrated suite of software facilities for data manipulation,
calculation and graphical display. Among other things it has
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data
analysis,
• graphical facilities for data analysis and display either directly at the computer
or on hardcopy,
• a well developed, simple and effective programming language (a dialect of ‘S’)
• It has been extended by a large collection of packages
• One of the sought-after business analytics skills, as perceived by industry
• Online tutorial: https://education.rstudio.com/learn/beginner/
RStudio
[Screenshot: the RStudio window with its four panes: Script, Data, Console, and Files/plots/packages/help]

RStudio
• Changing working directory
• Getting help with functions and features
• Exiting RStudio
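These three tasks can be done through the RStudio menus, and they also have console equivalents in base R. A minimal sketch; `tempdir()` is used only because it is a folder that is guaranteed to exist, and in practice you would pass the path of your own project folder.

```r
old_dir <- getwd()    # show / remember the current working directory
setwd(tempdir())      # change the working directory
new_dir <- getwd()    # confirm the change
setwd(old_dir)        # switch back

# Getting help (opens the help viewer in an interactive session):
# ?mean
# help("read.csv")

# Exiting (left commented out so that running this script does not end the session):
# q()
```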

Reading data file
df <- read.table("mydata.csv", header = TRUE, sep = ",")
df <- read.csv("mydata.csv", header = TRUE)
df <- read.csv2("mydata.csv", header = TRUE)

sep = the field separator symbol;

header is set to TRUE if the first line of the file being read contains the
variable names;

read.csv() treats the comma as the separator symbol;

read.csv2() treats the semicolon as the separator symbol (and the comma as the decimal mark).
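To try these functions without an existing `mydata.csv`, the sketch below first writes a tiny comma-separated file to a temporary location and then reads it back in two equivalent ways. The file contents are three rows borrowed from the opening example, and `csv_path` is an illustrative name.

```r
# Create a small comma-separated file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Age,Height,Sex",
             "0,50,Boy",
             "4,96,Boy",
             "4,90,Girl"),
           csv_path)

df      <- read.csv(csv_path, header = TRUE)                # comma-separated by default
df_same <- read.table(csv_path, header = TRUE, sep = ",")   # the same call spelled out

str(df)           # three observations of Age, Height and Sex
mean(df$Height)   # descriptive statistics can now be applied to the columns
```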

Online resource
LinkedIn Learning resource:
https://www.linkedin.com/learning/paths/master-r-for-data-science?u=35146660