1 Introduction
Slide 1
Acknowledgement to Dr. William Lau
Statistics
Slide 2
Applications in
Business and Economics
Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.
Finance
Financial advisors use price-earnings ratios and
dividend yields to guide their investment advice.
Slide 3
Applications in
Business and Economics
Marketing
Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
Production
A variety of statistical quality control charts are used
to monitor the output of a production process.
Slide 4
Data and Data Sets
Slide 5
Elements, Variables, and Observations
Slide 6
Data, Data Sets,
Elements, Variables, and Observations
Element names: Company
Variables: Stock Exchange, Annual Sales ($M), Earnings per Share ($)
Data set: the full table of elements and their values for each variable
Slide 7
Types of Data
• Categorical
• Numerical
Observed values are integer, real or complex numbers
Examples: IQ scores of GT students (integer values)
Lifetime of a computer chip (real values)
Slide 8
Quick check
Slide 9
Scales of Measurement
Scales of measurement include:
Nominal: Categorical with no order
Ordinal: Categorical with order
Interval: Numerical values with meaningful differences
Ratio: Numerical values with a true zero, so ratios are meaningful
Slide 10
Scales of Measurement
Nominal
Slide 11
Scales of Measurement
Nominal
Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Slide 12
Scales of Measurement
Ordinal
Slide 13
Scales of Measurement
Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
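The numeric coding described above can be sketched in Python. This is a minimal illustration, not part of the original slides: the dictionary `STANDING_CODE` and the list `students` are hypothetical names and made-up values. It shows that ordinal codes support sorting and order comparisons, even though arithmetic on the codes is not meaningful.

```python
# Hypothetical coding of class standing (ordinal scale): the order matters.
STANDING_CODE = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

# Made-up sample of students, for illustration only.
students = ["Junior", "Freshman", "Senior", "Sophomore", "Junior"]

# Sorting by code respects the natural order of the categories.
ordered = sorted(students, key=STANDING_CODE.get)
print(ordered)

# Order comparisons between codes are meaningful for ordinal data.
senior_after_junior = STANDING_CODE["Senior"] > STANDING_CODE["Junior"]
print(senior_after_junior)
```

For a nominal variable such as school, the same kind of dictionary would work as a label, but sorting or comparing the codes would carry no meaning.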
Slide 14
Scales of Measurement
Interval
Slide 15
Scales of Measurement
Interval
Example:
Melissa has an SAT score of 1885, while Kevin
has an SAT score of 1780. Melissa scored 105
points more than Kevin.
Slide 16
Scales of Measurement
Ratio
Slide 17
Scales of Measurement
Ratio
Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
Slide 18
Scales of Measurement
Data
Categorical Numerical
Slide 19
Cross-Sectional Data
Slide 20
Time Series Data
Slide 21
Time Series Data
Slide 22
Data Sources
Existing Sources
Slide 23
Data Sources
Slide 24
Data Sources
Slide 25
Data Sources
Slide 26
Data Acquisition Considerations
Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it
is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Slide 27
Descriptive Statistics
Slide 28
Example: William Auto Repair
Slide 29
Example: William Auto Repair
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Slide 30
Tabular Summary:
Frequency and Percent Frequency
Example: William Auto Repair
Parts Cost ($)   Frequency   Percent Frequency
50-59                2              4
60-69               13             26
70-79               16             32
80-89                7             14
90-99                7             14
100-109              5             10
Total               50            100
Percent frequency for the 50-59 class: (2/50)100 = 4
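The frequency and percent frequency columns can be reproduced from the 50 parts costs listed in the example. The following is a minimal sketch; the function name `frequency_table` and the variable names are assumptions made for illustration, and the class limits are taken from the table.

```python
# Parts costs ($) for the 50 tune-ups listed in the example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Class limits from the table: 50-59, 60-69, ..., 100-109.
classes = [(lo, lo + 9) for lo in range(50, 110, 10)]

def frequency_table(values, classes):
    """Return (label, frequency, percent frequency) for each class."""
    n = len(values)
    rows = []
    for lo, hi in classes:
        freq = sum(1 for v in values if lo <= v <= hi)
        rows.append((f"{lo}-{hi}", freq, 100 * freq / n))
    return rows

table = frequency_table(costs, classes)
for label, freq, pct in table:
    print(f"{label:>8} {freq:>4} {pct:>6.0f}")
```

Each percent frequency is the class frequency divided by the number of observations, times 100, which is the same calculation as (2/50)100 for the first class.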
Slide 31
Graphical Summary: Histogram
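[Histogram of the frequency distribution: frequency on the vertical axis; parts cost ($) classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109 on the horizontal axis]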
Slide 32
Numerical Descriptive Statistics
Slide 33
Variability in Data
Slide 34
Probability vs Statistics
Probability: reasoning from the population to a sample
Inferential statistics: reasoning from a sample to the population
Slide 35
Statistical Inference
Population: a finite well-defined group of ALL objects which,
although possibly large, can be enumerated in theory
(e.g. investigating ALL the bearings manufactured today).
Population
Sample
Observation
Slide 36
Estimation
Hospital waiting time:
$\hat{\beta} = 5.00$
Slide 37
Confidence Interval
How confident are we, given the variability of the data?
5.07
$\beta \in$
Slide 38
Hypothesis Test
Null hypothesis: $\beta \le 5$
Data
Decision
Slide 39
Example: Comparing Two Populations
Which drug is more effective?
Group 1 Group 2
Slide 40
Example: Comparing Two Populations
A/B testing
Slide 41
Linear Regression
• Predict a response variable based on one or more predictor
variables
• Identify important factors influencing a response variable
What are the important variables affecting the waiting time in a hospital
waiting room?
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the random error.
Slide 42
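As an illustration of the simple linear regression model with an intercept, a slope, and a random error, the sketch below estimates the intercept and slope by ordinary least squares. The slides do not state which fitting method is used, so least squares is an assumption here; the function name `fit_simple_linear_regression` and the data (queue length versus waiting time) are made up for the example.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Made-up data: patients ahead in the queue (x), waiting time in hours (y).
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

The estimated slope is read as the change in the predicted response for a one-unit increase in the predictor, under the assumed linear model.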
Statistical Inference
Slide 43
Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
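One way to continue from the sample of 50 tune-ups is to use its average parts cost as a point estimate of the unknown population average, and to attach an interval that reflects the variability in the data. The sketch below is hedged: the 95% level and the normal approximation are assumptions made for illustration, not something the slides specify, and the variable names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Parts costs ($) for the 50 sampled tune-ups from the example data.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

n = len(costs)
x_bar = mean(costs)   # point estimate of the population average cost
s = stdev(costs)      # sample standard deviation

# Assumed 95% confidence level, using a normal approximation for n = 50.
z = NormalDist().inv_cdf(0.975)
margin = z * s / n ** 0.5
ci = (x_bar - margin, x_bar + margin)

print(f"sample mean = {x_bar:.2f}")
print(f"approximate 95% interval = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A t-based interval would be slightly wider; the normal approximation is used here only because it needs nothing beyond the Python standard library.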
Slide 44
Computers and Statistical Analysis
Slide 45
Data Warehousing
Slide 46
Data Mining
Slide 47
Data Mining Applications
Slide 48
Data Mining Requirements
Slide 49
Data Mining Model Reliability
Slide 50
Ethical Guidelines for Statistical Practice
Slide 51