Unit 3 - Descriptive Statistics

Uploaded by

Rajdeep Singh


EU Business School Munich

Quantitative Business Methods

Lecturer: Hashem Zarafat
E-Mail: [email protected]

UNIT 3 - FUNDAMENTALS OF STATISTICS

Some terms first…

What Is Statistics?

1. Collecting Data
e.g., surveys, databases
2. Presenting Data
e.g., plots, charts, tables, data visualization
3. Characterizing Data
e.g., means, correlations…

Why? Data Analysis → Decision-Making

Statistical Methods
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
1. Involves
• Collecting Data
• Presenting Data
• Characterizing Data
2. Purpose
• Describe Data
Numerical measures that describe a distribution by providing:
• Information on the central tendency of the distribution
• The width of the distribution
• The shape of the distribution
Measure of central tendency: a number that characterizes the “middleness” of an entire distribution
[Figure: bar chart of quarterly values in $ (Q1–Q4), annotated with mean X̄ = 30.5 and variance S² = 113]
Inferential Statistics
1. Involves
• Estimation
• Hypothesis Testing
2. Purpose
• Make decisions about population characteristics
Fundamental elements:
1. Statistical Inference
• Estimation or prediction or generalization about a
population based on information contained in a
sample
2. Measure of Reliability
• Statement (usually qualified) about the degree of
uncertainty associated with a statistical inference
Populations vs Samples

• A population (census) includes all of the entities of interest,
whether they be people, households, machines, or whatever.
The following are three typical populations:
– All potential voters in a presidential election
– All subscribers to cable television
– All invoices submitted for Medicare reimbursement by
nursing homes
• A sample is a subset of the population, often randomly chosen
and preferably representative of the population as a whole
A Puzzle is a Sample Until It Is Done! The Sample Allows One to Guess at the Picture.
SCALES OF MEASUREMENT
Nominal Scale (Categorical)

• Objects or individuals are assigned to categories that have no numerical properties
• Characteristic of identity
• Categorical variables: variables measured on a nominal scale
• Examples: names, ethnicity, gender, favorite color, citizenship
• Dummy variables (experimental = 1; control = 2)

• R: “character” object
Ordinal Scale (Categorical)

• Objects or individuals are categorized, and the categories form a rank order along a continuum
• Properties of identity and magnitude
• Ordinal data: referred to as ranked data
• Examples: income and age brackets, educational level, grades (A, B, C), scale of preferences (e.g. 0 to 10)

• R: “factor” object
Interval Scale (Numerical, Continuous)

• Intervals between the numbers on the scale are all equal in size
• Criteria of identity, magnitude, and equal unit size are met
• Example: Fahrenheit temperature scale

• R: “numeric” object
Ratio Scale (Numerical, Continuous)

• A scale in which, in addition to order and equal units of measurement, an absolute zero indicates an absence of the variable being measured
• Ratio data have all properties of measurement
• Examples: number of children, times you check FB per day…

• R: also “numeric” object (SPSS – “scale” variable, no differentiation from the interval scale)
Scale of Measurement and Mathematical Operations

Scale of Measurement and Statistical Tests

                          Nominal          Ordinal                Interval              Ratio
Descriptive Statistics    Mode             Mode                   Mode                  Mode
                                           Median                 Median                Median
                                           Range Statistics       Mean                  Mean
                                                                  Range Statistics      Range Statistics
                                                                  Variance              Variance
                                                                  Standard deviation    Standard deviation
Inferential Statistics    Non-parametric   Non-parametric         Parametric            Parametric
                          Chi-Square       Mann-Whitney U         T test                T test
                                           Kruskal-Wallis H       ANOVA                 ANOVA
                                           Friedman ANOVA         Pearson Correlation   Pearson Correlation
                                           Spearman Correlation   Not normal:           Not normal:
                                                                  Non-parametric        Non-parametric
What about Measures of Central Tendency?
Data, data, data…
Probability Distributions

DISCRETE AND CONTINUOUS VARIABLES
Discrete Variables
• Consist of whole number units or categories
• Made up of chunks or units that are detached and distinct
from one another
• Main questions:
– Are the variable’s values exhaustive (finite) or not?
– Do we know/ can we count all the possible values?

• Most nominal and ordinal data are discrete

– Examples: gender, political party, ethnicity
• Some interval or ratio data can be discrete
– Example: number of children in a family (one cannot have
5.34 children)
Continuous Variables

• Usually fall along a continuum and allow for fractional amounts
• There is no way we can count all the possible values!
• Examples: age (22.7 years), height (64.5 inches),
weight (113.25 pounds)
• Most interval and ratio data are continuous in nature
Data Sets, Variables, Observations

• A data set is usually a rectangular array of data, with
variables in columns and observations in rows.
– R: “Data Frames”
• A variable (or field or attribute) is a characteristic of
members of a population, such as height, gender, or
salary.
• An observation (or case or record) is a list of all
variable values for a single member of a population.
➢ Logic of a Database
➢ R/Excel: you sometimes need two files (data and description)
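To make the row/column vocabulary concrete, here is a minimal sketch in plain Python. The records and field names are hypothetical and chosen only to mirror the variables named above (height, gender, salary); a real analysis would more likely use a data frame.

```python
# Hypothetical data set: each dict is one observation (a row),
# each key is one variable (a column).
data_set = [
    {"gender": "f", "height_cm": 168.0, "salary": 52000},
    {"gender": "m", "height_cm": 181.5, "salary": 47000},
    {"gender": "f", "height_cm": 175.2, "salary": 61000},
]

# An observation: all variable values for a single member.
first_observation = data_set[0]

# A variable: the same characteristic collected across all members.
salaries = [row["salary"] for row in data_set]

print(first_observation)
print(salaries)
```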
STATISTICAL MEASURES
Measures of Central Tendency: The Mean

• The mean is the average of all values of a variable.

• If the data set represents a sample from some larger
population, we call this measure the sample mean and denote
it by $\bar{X}$.
• If the data set represents the entire population, we call it the
population mean and denote it by μ.
Measures of Central Tendency: The Median

• The median is the middle observation when the
data is arranged from smallest to largest.
• If the number of observations is odd, the median is
literally the middle observation.
• If the number of observations is even, the median
is usually defined as the average of the two middle
observations.
Measures of Central Tendency: The Mode

• The mode is the value that appears most often.
• In most cases where a variable is essentially
continuous, the mode is not very interesting
because it is often the result of a few lucky
ties.
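The three measures can be sketched with Python's standard `statistics` module. The data are the 25 test scores that this unit uses for its percentile calculation; the variable names are illustrative only.

```python
import statistics

# The 25 test scores from this unit's percentile calculation.
scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

mean = statistics.mean(scores)        # sum of all values / number of values
median = statistics.median(scores)    # middle (13th) of the 25 ordered values
modes = statistics.multimode(scores)  # every value tied for most frequent

print(mean, median, modes)
```

Two values are tied as the mode here, each appearing only twice, which illustrates why the mode says little about an essentially continuous variable.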
Minimum, Maximum, Percentiles

• The minimum and maximum are self-explanatory.

• For any percentage p, the pth percentile is the value such that a
percentage p of all values are less than it.
– It splits the data into two pieces: the lower piece contains p percent of
the data, and the upper piece contains the rest of the data.

• Calculating: For example, suppose you have 25 test scores, and in order from
lowest to highest they look like this: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72,
77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. To find the 90th percentile for
these (ordered) scores, start by multiplying 90% times the total number of scores,
which gives 90% ∗ 25 = 0.90 ∗ 25 = 22.5 (the index). Rounding up to the nearest
whole number, you get 23. Counting from left to right (from the smallest to the
largest value in the data set), you go until you find the 23rd value in the data set.
That value is 98, and it’s the 90th percentile for this data set.
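The round-up rule described above can be sketched as follows. The function name is hypothetical, and the handling of a whole-number index (averaging that value and the next) is an assumption based on a common textbook convention; the text itself only covers the fractional case.

```python
import math

def percentile_round_up(values, p):
    """p-th percentile (p in whole percent, 0 < p < 100) via the round-up index rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    index = p * len(ordered) / 100          # e.g. 90 * 25 / 100 = 22.5
    if index != int(index):
        position = math.ceil(index)         # fractional index: round up
        return ordered[position - 1]        # positions count from 1
    # Assumption (not in the text): for a whole-number index,
    # average the value at that position and the next one.
    position = int(index)
    return (ordered[position - 1] + ordered[position]) / 2

scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

p90 = percentile_round_up(scores, 90)
print(p90)  # the 23rd ordered score
```

Statistical software often interpolates between neighbouring values instead, so its percentiles may differ slightly from this rule.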
Measures of Variability:
Quartiles, Range, and Interquartile Range

• The quartiles divide the data into four groups, each with
(approximately) a quarter of all observations.
• Naturally, the first, second and third quartiles are the percentiles
corresponding to p = 25%, p = 50%, and p = 75%.
• By definition, the second quartile (p = 50%) is equal to the median.

• The range is defined as the maximum value minus the minimum value.
• The range is a fairly crude measure of variability.
• The interquartile range (IQR) is defined as the third quartile minus
the first quartile.
• Thus, the IQR is the range of the middle 50% of the data.
• It is less sensitive to extreme values than the range.
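As a rough illustration, the quartiles, range, and IQR of the 25 test scores from the percentile example can be computed with the same round-up index rule. The helper below is a simplified, hypothetical version that assumes the index is never a whole number, which holds for these data.

```python
import math

def percentile(values, p):
    """Round-up index rule; assumes p * n / 100 is not a whole number."""
    ordered = sorted(values)
    position = math.ceil(p * len(ordered) / 100)
    return ordered[position - 1]

scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

q1 = percentile(scores, 25)   # index 6.25  -> 7th value
q2 = percentile(scores, 50)   # index 12.5  -> 13th value (the median)
q3 = percentile(scores, 75)   # index 18.75 -> 19th value

data_range = max(scores) - min(scores)
iqr = q3 - q1                 # spread of the middle 50% of the data

print(q1, q2, q3, data_range, iqr)
```

Quartile conventions differ between software packages, so other tools may report slightly different values for Q1 and Q3.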
Measures of Variability: Variance and
Standard Deviation
• The variance is essentially the average of the squared
deviations from the mean.
• If $X_i$ is a typical observation, its squared deviation from
the mean is $(X_i - \bar{X})^2$.
• The sample variance is denoted by $s^2$, and the population
variance by $\sigma^2$.
• It is hard to interpret the variance numerically because it
is in squared units (e.g. dollars become squared dollars).
Formula for the Variance
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

• If all of the observations are close to the mean, then their squared
deviations from the mean will be relatively small, and the variance
will be relatively small.
• If at least a few of the observations are far from the mean, then
their squared deviations from the mean will be large, and this will
cause the variance to be large.
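A minimal from-scratch sketch of the two variance formulas, applied to the six exam scores from this unit's worked example. The function name and the `sample` flag are illustrative choices, not part of any library.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (sample variance),
    sample=False divides by n (population variance).
    """
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = len(values) - 1 if sample else len(values)
    return sum(squared_deviations) / divisor

exam_scores = [57, 99, 78, 73, 84, 95]

s2 = variance(exam_scores)                     # 1178 / 5
sigma2 = variance(exam_scores, sample=False)   # 1178 / 6
s = math.sqrt(s2)                              # back in the original units

print(s2, sigma2, s)
```

The standard library offers the same calculations as `statistics.variance` and `statistics.pvariance`.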
Empirical Rules for Interpreting Standard
Deviation
• A more natural measure is the standard deviation which is the square
root of variance (denoted as just s or σ).
• The interpretation of the standard deviation can be stated as three
empirical rules.
• If the values of a variable are approximately normally distributed
(symmetric and bell-shaped), then the following rules hold:
(1) Approximately 68% of the observations are within one standard deviation of
the mean.
(2) Approximately 95% of the observations are within two standard deviations
of the mean.
(3) Approximately 99.7% of the observations are within three standard deviations
of the mean.
• Fortunately, many variables in real-world data are indeed approximately
normally distributed.
Normal Distribution: Empirical Rules
• Standard normal distribution: a normal distribution with a mean of
0 and a standard deviation of 1 (→ “standardization”)
• Probability: the expected relative frequency of a particular outcome
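The three empirical rules can be checked numerically against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

def z_score(x, mean, sd):
    """Standardization: distance from the mean in units of standard deviations."""
    return (x - mean) / sd

for k, share in shares.items():
    print(f"within {k} sd: {share:.4f}")
```

The printed shares round to roughly 68%, 95%, and 99.7%. They apply to real data only insofar as the variable is approximately normal.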
Mean Absolute Deviation

• The mean absolute deviation (MAD) is another measure of variability.
• For many variables, the standard deviation is approximately 25%
larger than the MAD: $s \approx 1.25 \times \text{MAD}$
• Formula for Mean Absolute Deviation: $\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$
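A short sketch of the MAD for the six exam scores used in this unit's worked example. It assumes the usual definition, the average absolute deviation from the mean divided by n.

```python
import statistics

exam_scores = [57, 99, 78, 73, 84, 95]

mean = statistics.mean(exam_scores)
# Average of the absolute deviations from the mean.
mad = sum(abs(x - mean) for x in exam_scores) / len(exam_scores)

sd = statistics.stdev(exam_scores)   # sample standard deviation
ratio = sd / mad                     # compare with the "about 1.25" rule

print(mad, sd, ratio)
```

For this small sample the ratio comes out somewhat above 1.25. The 25% rule is only an approximation and fits best when the data are close to normally distributed.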

Dataset (exam scores; random sample): 57, 99, 78, 73, 84, 95

Compute:

➢ Mean
➢ (57+99+78+73+84+95)/6 = 81

➢ Median
➢ 57, 73, 78, 84, 95, 99
➢ Even: (78+84)/2 = 81

➢ 90th percentile
➢ 0.9*6=5.4; round-up=6; 6th score in ranked dataset: 99.

➢ Range
➢ 99-57=42

➢ Variance
➢ ((57-81)^2+…)/(6-1) = 235.6

➢ Standard Deviation
➢ √235.6 ≈ 15.35
Measures of Shape: Skewness
• Skewness occurs because of a lack of symmetry.
▪ A variable can be skewed to the right (or positively
skewed) because of some really large values (e.g.
comparing baseball players’ salaries).
▪ Or it can be skewed to the left (or negatively skewed)
because of some really small values (e.g. examining
temperature lows in Antarctica).
Measures of Shape: Kurtosis
▪ Kurtosis has to do with the “fatness” of the tails of the
distribution relative to the tails of a normal distribution.
▪ A distribution with high kurtosis has many extreme
observations.
Skewness and Kurtosis – What are acceptable values?

– Statistical methods include diagnostic hypothesis tests for
normality, and a rule of thumb that says a variable is
reasonably close to normal if its skewness and kurtosis
have values between –1.0 and +1.0.
– None of the methods is absolutely definitive.
– We will use the criterion that the skewness and kurtosis of
the distribution both fall between –1.0 and +1.0.
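The slides do not give formulas for skewness and kurtosis, so the sketch below assumes one common moment-based definition: skewness as the third central moment divided by the 1.5th power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3 (so that a normal distribution scores 0). Excel, SPSS, and R packages apply sample-size adjustments and will report somewhat different numbers. The function name is hypothetical; the data are the six exam scores from the worked example.

```python
def shape_measures(values):
    """Moment-based skewness and excess kurtosis (assumed definitions, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

exam_scores = [57, 99, 78, 73, 84, 95]
skewness, excess_kurtosis = shape_measures(exam_scores)

# Rule of thumb from the text: both values between -1.0 and +1.0.
close_to_normal = -1.0 < skewness < 1.0 and -1.0 < excess_kurtosis < 1.0

print(skewness, excess_kurtosis, close_to_normal)
```

With only six observations these estimates are very unstable, so the check is illustrative rather than a real normality diagnosis.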
Outliers (Z Score)
• An outlier is literally a value or an entire observation that lies well outside
of the norm.
• You might define an outlier as any value more than three standard
deviations from the mean, but this is only a rule of thumb.
• The boxplot is widely used to identify outliers.
• Probably the best advice for dealing with outliers is to run the analyses two
ways: With the outliers and without them.
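A minimal sketch of the three-standard-deviation rule of thumb and of running an analysis both ways. The measurements are hypothetical: 19 values near 50 plus one extreme value.

```python
import statistics

# Hypothetical measurements: 19 values near 50 plus one extreme value.
values = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 52, 48, 150]

mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample standard deviation

def z(x):
    """Z score: distance from the mean in standard deviations."""
    return (x - mean) / sd

# Rule of thumb: flag any value more than three standard deviations from the mean.
outliers = [x for x in values if abs(z(x)) > 3]
kept = [x for x in values if abs(z(x)) <= 3]

# Run the analysis both ways: with the outliers and without them.
mean_with = statistics.mean(values)
mean_without = statistics.mean(kept)

print(outliers, mean_with, mean_without)
```

One caveat: the extreme value inflates the mean and standard deviation used to judge it, and in very small samples (fewer than about 11 observations) no value can reach a z score above 3 at all. That is one reason the rule is only a rule of thumb.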
Missing Values
• What are missing values?
– What to do about them?
• Never use “0” or a “blank” for a missing value (or any
“possible” value)!
• Missing data are coded in a variety of ways.
– Excel: empty cells/ codes.
– SPSS: 999 or 999.999 for a missing value
– R: NA for a missing value
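The sketch below illustrates why a missing value should never be stored as 0. The data and the sentinel code 999 are hypothetical, in the style of the coding conventions listed above; `None` stands in for an empty cell.

```python
import statistics

MISSING_CODE = 999   # hypothetical sentinel for "missing"

# Hypothetical raw data: None is an empty cell, 999 is a coded missing value.
raw = [12, 15, None, 14, MISSING_CODE, 13]

def is_missing(x):
    return x is None or x == MISSING_CODE

# Correct handling: drop missing values before computing statistics.
observed = [x for x in raw if not is_missing(x)]
mean_observed = statistics.mean(observed)

# Incorrect handling: treating missing values as 0 drags the mean down.
zero_filled = [0 if is_missing(x) else x for x in raw]
mean_zero_filled = statistics.mean(zero_filled)

print(mean_observed, mean_zero_filled)
```

A sentinel such as 999 is only safe when it cannot occur as a real value of the variable; otherwise genuine observations would be discarded.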
