1 Introduction To Statistics - Handouts
Introduction to Statistics
By
Dr. Vishal Singh Patyal
Learning Objectives
In this chapter, you will learn:
What is Statistics
Why Statistics
Basic vocabulary used in Statistics
How statistics is used in Business
The sources of data and its types used in Business
Types of Variables
Level of Measurement
Tabular and Graphical Presentation of Data
7/20/2015
What is Statistics?
The science of collecting, describing, and interpreting data.
Statistics is a way to get information from data
Data -> Statistics -> Information
Information: Knowledge communicated concerning some particular fact.
Types of Statistics
Statistics: The branch of mathematics that transforms data into useful information for decision makers.
Descriptive Statistics: Collecting, summarizing, and describing data
Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
Collect data
ex. Survey
Present data
ex. Tables and graphs
Characterize data
ex. Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Collect
Organize
Summarize
Display
Analyze
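The sample mean used to characterize data can be sketched in a few lines of Python. The `sample_mean` helper and the five weights are hypothetical, chosen only to illustrate the formula (sum of the observations divided by n).

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("sample mean is undefined for an empty sample")
    return sum(values) / n


# Hypothetical sample of five weights (illustrative values only).
weights = [118, 122, 125, 119, 121]
print(sample_mean(weights))
```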
Inferential Statistics
Estimation
ex. Estimate the population
mean weight using the sample
mean weight
Hypothesis testing
ex. Test the claim that the
population mean weight is 120
pounds
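As a rough sketch of both ideas, the snippet below estimates a population mean weight from a sample and computes a one-sample t statistic for the claim that the population mean is 120 pounds. The sample values are hypothetical, and the choice of a t statistic is an assumption made for illustration; the handout does not name a specific test.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (illustrative values only).
sample = [118, 122, 125, 119, 121]
claimed_mean = 120  # the claim from the hypothesis-testing example

n = len(sample)
x_bar = statistics.mean(sample)      # estimation: point estimate of the population mean
s = statistics.stdev(sample)         # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)

# Hypothesis testing: how many standard errors the sample mean
# lies from the claimed population mean.
t_statistic = (x_bar - claimed_mean) / standard_error

print(f"estimate of population mean: {x_bar}")
print(f"t statistic: {t_statistic:.4f}")
```

The t statistic would then be compared with a critical value from the t distribution with n - 1 degrees of freedom to decide whether the claim is rejected.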
Example
A recent study examined the QUANT and VERBAL CAT scores of students across the country. Which of the following statements are descriptive (D) in nature and which are inferential (I)?
The mean QUANT CAT score was 492. (D)
The mean VERBAL CAT score was 475. (D)
Students in the Northeast scored higher in QUANT but lower in VERBAL. (I)
80% of all students taking the exam were headed for IIMs. (I)
32% of the students scored above 610 on the VERBAL CAT. (D)
The QUANT CAT scores are higher than they were 10 years ago. (I)
A sample is a subset of the population.
Parameter: Measures used to describe the population are called parameters.
Statistic: Measures computed from sample data are called statistics.
Example
The dean of NICMAR Institute is interested in learning about the average
age of the faculty. Identify the basic terms in this situation.
The population is the age of all faculty members at the Institute.
A sample is any subset of that population. For example, we might
select 10 faculty members and determine their age.
The variable is the age of each faculty member.
The data would be the set of values in the sample.
The parameter of interest is the average age of all faculty at the
Institute.
The statistic is the average age for all faculty in the sample.
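The distinction between parameter and statistic in this example can be sketched as follows. The faculty ages are made up (generated with a fixed random seed), and the population size of 40 is an assumption for illustration only.

```python
import random
import statistics

# Hypothetical ages of all 40 faculty members (the population); made-up values.
rng = random.Random(0)
population = [rng.randint(28, 65) for _ in range(40)]

# A sample is any subset of the population; here, 10 members chosen at random.
sample = rng.sample(population, 10)

parameter = statistics.mean(population)  # describes the whole population
statistic = statistics.mean(sample)      # computed from the sample only

print(f"parameter (population mean age): {parameter:.1f}")
print(f"statistic (sample mean age):     {statistic:.1f}")
```

In practice the parameter is usually unknown, and the statistic is what the dean would actually compute.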
Sources of Data
Primary Sources:
The data collector is the one using the data for analysis
Data from a political survey
Data collected from an experiment
Observed data
Secondary Sources
The person performing data analysis is not the data collector
Analyzing census data
Examining data from print journals or data published on
the internet.
Types of Variables
Data
  Categorical (Defined categories)
    Examples: Marital Status, Political Party, Eye Color
  Numerical
    Discrete (Counted items)
      Examples: Number of Children, Defects per hour
    Continuous (Measured characteristics)
      Examples: Weight, Voltage
Types of Variables
Categorical
Qualitative variables have values that can only be placed into
categories, such as yes and no.
A variable that categorizes or describes an element of a
population.
Note: Arithmetic operations, such as addition and averaging, are not
meaningful for data resulting from a qualitative variable
Numerical
Quantitative variables have values that represent quantities.
A variable that quantifies an element of a population.
Note: Arithmetic operations such as addition and averaging, are
meaningful for data resulting from a quantitative variable.
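A short sketch of the two notes above: for a qualitative variable the sensible summary is a count per category, while for a quantitative variable sums and averages are meaningful. All values below are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical observations (illustrative values only).
cap_colors = ["red", "blue", "blue", "green", "blue", "red"]   # qualitative
homework_minutes = [35, 42, 28, 50, 45]                        # quantitative

# Qualitative data: counting how often each category occurs is meaningful.
color_counts = Counter(cap_colors)
most_common_color, count = color_counts.most_common(1)[0]

# Quantitative data: addition and averaging are meaningful.
total_minutes = sum(homework_minutes)
mean_minutes = statistics.mean(homework_minutes)

print(most_common_color, count)
print(total_minutes, mean_minutes)
```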
Example
Identify each of the following examples as attribute (qualitative) or
numerical (quantitative) variables.
The amount of CNG pumped by the next 10 customers at the local
HP pump. (Numerical)
The amount of radon in the basement of each of 25 homes in a
new development. (Numerical)
The color of the baseball cap worn by each of 20 students.
(Attribute)
The length of time to complete a mathematics homework
assignment. (Numerical)
The state in which each truck is registered when stopped and
inspected at a weigh station. (Attribute)
Question?
Identify each of the following as examples of qualitative or
quantitative variables:
The temperature in Barrow, Alaska at 12:00 pm on any
given day.
The make of automobile driven by each faculty member.
Whether or not a 6 volt lantern battery is defective.
The weight of a lead pencil.
The length of time billed for a long distance telephone call.
The brand of cereal children eat for breakfast.
The type of book taken out of the library by an adult.
Level of Measurement
NOIR: Nominal, Ordinal, Interval, Ratio
Nominal scale
A nominal scale classifies data into distinct categories in
which no ranking is implied.
Categorical Variables: Categories
Personal Computer Ownership: Yes / No
Type of Stocks Owned: Growth, Value, Other
Internet Provider: Microsoft Network / AOL
Ordinal scale
An ordinal scale classifies data into distinct
categories in which ranking is implied
Categorical Variable: Ordered Categories
Product satisfaction
Faculty rank
Student Grades: A, B, C, D, F
Interval scale
An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true zero
point.
Example:
the difference between 1 and 2 years of age is the
same amount as the difference between 21 and 22
years of age, or 50 and 51, or 65 and 66.
the difference between a height of 60 inches and a
height of 55 inches is the same amount of difference as
a height of 72 inches and a height of 67 inches.
NOTE: For interval level variables, it is mathematically legitimate to add and subtract values and to average them,
as well as to count the values and sort or rank them. Ratios of two values are meaningful only at the ratio level, which has a true zero.
Levels of Measurement
A ratio scale is an ordered scale in which the difference
between the measurements is a meaningful quantity.
Ratio level variables have the additional property of
having a true zero value so that ratios between values are
meaningful, but practically speaking, ratio level data is
treated the same as interval level.
Example
number of clients in past six months
It is meaningful to say that ...we had twice as many
clients in this period as we did in the previous six months.
Levels of measurement, from lowest to highest:
Nominal
Ordinal
Interval: Distance is meaningful
Ratio: Absolute zero
Level of Measurement: Decision Tree
Level of Measurement: Characteristics
Level of Measurement: Statistical Tests
Example
Identify each of the following as examples of (1) nominal, (2)
ordinal, (3) discrete, or (4) continuous variables:
The length of time until a pain reliever begins to work.
The number of chocolate chips in a cookie.
The number of colors used in a statistics textbook.
The brand of refrigerator in a home.
The overall satisfaction rating of a new car.
The number of files on a computer's hard disk.
The pH level of the water in a swimming pool.
The number of staples in a stapler.
Class Exercise
Q 1: Determine whether the variable is categorical
or numerical. If numerical, determine whether the
variable is discrete or continuous. Determine the
level of measurement.
Amount of money spent on clothing in past
month?
Favorite department store?
Most likely time period during which shopping for
clothing takes place?
Number of pairs of shoes owned?
Class Exercise
Q 2: A manufacturer of dog food was planning to
survey households in India to determine the purchasing
habits of dog owners. Among the variables to be
collected are:
The primary place of purchase of dog food?
Whether dry or moist food can be purchased?
Number of dogs living in the household?
Whether the dog is pedigreed?
Class Exercise
Q 3: Suppose the following information is collected from
Mr. X on his application for a home loan at the HDFC
Bank loan department:
a. Monthly payment: Rs 25100
b. Annual family income:
c. Marital status: Married
d. Number of jobs changed in past 10 years: 2
Classify each response by type of data and level of
measurement.
One Categorical Variable: Summary Table
Two Categorical Variables: Contingency Table
Summary Table
Category                 Percent
At home with ...         45%
Travel to visit ...      38%
Vacation                 5%
Catching up on work      5%
Other                    7%
Contingency Table
Used to study patterns that may exist
between the responses of two or more
categorical variables
Cross tabulates or tallies jointly the responses
of the categorical variables
For two variables the tallies for one variable
are located in the rows and the tallies for the
second variable are located in the columns
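Cross-tabulation can be sketched with a simple tally of joint responses. The six records below are hypothetical, and the category names are used only for illustration; each (row, column) pair counts toward one cell of the table.

```python
from collections import Counter

# Hypothetical (amount, outcome) responses, one pair per record.
records = [
    ("Small", "No Errors"), ("Small", "Errors"), ("Medium", "No Errors"),
    ("Large", "No Errors"), ("Small", "No Errors"), ("Medium", "Errors"),
]

# Tally the joint responses: each (row, column) pair is one cell.
cells = Counter(records)

rows = ["Small", "Medium", "Large"]      # first variable: rows
cols = ["No Errors", "Errors"]           # second variable: columns

print(f"{'':8}" + "".join(f"{c:>11}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:8}" + "".join(f"{n:>11}" for n in counts) + f"{sum(counts):>8}")
```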
Contingency table (counts):
                 No Errors   Errors    Total
Small Amount        170        20       190
Medium Amount       100        40       140
Large Amount         65         5        70
Total               335        65       400

Percentages of the overall total:
                 No Errors   Errors    Total
Small Amount       42.50%     5.00%    47.50%
Medium Amount      25.00%    10.00%    35.00%
Large Amount       16.25%     1.25%    17.50%
Total              83.75%    16.25%   100.0%
Percentages of row totals:
                 No Errors   Errors    Total
Small Amount       89.47%    10.53%   100.0%
Medium Amount      71.43%    28.57%   100.0%
Large Amount       92.86%     7.14%   100.0%
Total              83.75%    16.25%   100.0%
Percentages of column totals:
                 No Errors   Errors    Total
Small Amount       50.75%    30.77%    47.50%
Medium Amount      29.85%    61.54%    35.00%
Large Amount       19.40%     7.69%    17.50%
Total             100.0%    100.0%    100.0%
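The three percentage views of the contingency table (overall total, row totals, column totals) can be reproduced from the cell counts. This is a minimal sketch; the `pct` helper and variable names are illustrative, not part of the handout.

```python
# Cell counts from the contingency table: rows are amounts,
# columns are (No Errors, Errors).
counts = {
    "Small Amount":  (170, 20),
    "Medium Amount": (100, 40),
    "Large Amount":  (65, 5),
}

grand_total = sum(sum(row) for row in counts.values())       # 400
col_totals = [sum(col) for col in zip(*counts.values())]     # [335, 65]


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 places."""
    return round(100 * part / whole, 2)


# Each cell divided by the grand total, its row total, or its column total.
overall = {k: [pct(c, grand_total) for c in row] for k, row in counts.items()}
by_row = {k: [pct(c, sum(row)) for c in row] for k, row in counts.items()}
by_col = {k: [pct(c, t) for c, t in zip(row, col_totals)]
          for k, row in counts.items()}

for name, table in (("overall", overall), ("row", by_row), ("column", by_col)):
    print(name, table)
```

The same counts give three different pictures: which one is appropriate depends on whether the comparison is across the whole data set, within each amount category, or within each error category.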
Ordered Array
Frequency Distributions
Cumulative Distributions
Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42
Night Students
18 18 19 19 20 21
23 28 32 33 41 45
STEPS
1. Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find range: 58 - 12 = 46
3. Select number of classes: 5 (usually between 5 and 15)
4. Compute class interval (width): 10 (46/5 then round up)
5. Determine class boundaries (limits):
1. Class 1: 10 to less than 20
2. Class 2: 20 to less than 30
3. Class 3: 30 to less than 40
4. Class 4: 40 to less than 50
5. Class 5: 50 to less than 60
6. Compute class midpoints: 15, 25, 35, 45, 55
7. Count observations & assign to classes
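The steps above can be sketched in Python using the 20 values from step 1. Rounding the width up to a multiple of 10 and starting the first class at a multiple of the width are assumptions chosen to match the class boundaries shown; other conventions are possible.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

ordered = sorted(data)                        # step 1
data_range = ordered[-1] - ordered[0]         # step 2: 58 - 12 = 46
num_classes = 5                               # step 3

# Step 4: 46 / 5 = 9.2, rounded up to a multiple of 10 (assumed convention).
width = math.ceil(data_range / num_classes / 10) * 10

# Step 5: start at the multiple of the width at or below the minimum.
start = (ordered[0] // width) * width
boundaries = [(start + i * width, start + (i + 1) * width)
              for i in range(num_classes)]

# Step 6: the midpoint of each class.
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]

# Step 7: a value belongs to the class with lo <= value < hi.
frequencies = [sum(lo <= x < hi for x in ordered) for lo, hi in boundaries]

for (lo, hi), mid, freq in zip(boundaries, midpoints, frequencies):
    print(f"{lo} to less than {hi}: midpoint {mid}, frequency {freq}")
```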
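Class                  Midpoint   Frequency
10 to less than 20        15          3
20 to less than 30        25          6
30 to less than 40        35          5
40 to less than 50        45          4
50 to less than 60        55          2
Total                                20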
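Class                  Frequency   Relative Frequency   Percentage
10 to less than 20         3             .15                15
20 to less than 30         6             .30                30
30 to less than 40         5             .25                25
40 to less than 50         4             .20                20
50 to less than 60         2             .10                10
Total                     20            1.00               100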
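Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 to less than 20         3          15%               3                     15%
20 to less than 30         6          30%               9                     45%
30 to less than 40         5          25%              14                     70%
40 to less than 50         4          20%              18                     90%
50 to less than 60         2          10%              20                    100%
Total                     20         100%              20                    100%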
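Relative, percentage and cumulative columns all follow from the class frequencies. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

frequencies = [3, 6, 5, 4, 2]      # class frequencies from the distribution
total = sum(frequencies)           # 20 observations

relative = [f / total for f in frequencies]
percentage = [100 * f / total for f in frequencies]

# Running totals give the cumulative columns.
cumulative_frequency = list(accumulate(frequencies))
cumulative_percentage = [100 * c / total for c in cumulative_frequency]

print(relative)
print(percentage)
print(cumulative_frequency)
print(cumulative_percentage)
```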
Frequency Distributions:
Some Tips
Different class boundaries may provide different
pictures for the same data (especially for smaller
data sets)
Shifts in data concentration may show up when
different class boundaries are chosen
As the size of the data set increases, the impact of
alterations in the selection of class boundaries is
greatly reduced
When comparing two or more groups with different
sample sizes, you must use either a relative
frequency or a percentage distribution
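The last tip can be illustrated with the day and night student values listed in this handout, which have different sample sizes (18 and 12). The classes of width 10 and the `percentage_distribution` helper are assumptions made for this sketch.

```python
day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

# Assumed classes of width 10, chosen for illustration.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]


def percentage_distribution(values, classes):
    """Percentage of the values falling in each class (lo <= value < hi)."""
    n = len(values)
    return [round(100 * sum(lo <= v < hi for v in values) / n, 1)
            for lo, hi in classes]


day_pct = percentage_distribution(day, classes)
night_pct = percentage_distribution(night, classes)

for (lo, hi), d, nt in zip(classes, day_pct, night_pct):
    print(f"{lo} to less than {hi}: day {d}%, night {nt}%")
```

Raw counts would make the day group look larger in every class simply because it has more students; percentages put both groups on the same basis.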
Summary Table For One Variable: Bar Chart, Pareto Chart, Pie Chart
Bar Chart (percent axis from 0% to 50%): At home with ... 45%; Travel to visit ... 38%; Vacation 5%; Catching up on work 5%; Other 7%
Pie Chart of the same categories: 45%, 38%, 5% (Catching up on work), 5%, 7% (Other)
Pareto Chart: investment categories Stocks, Bonds, Savings, CD; left axis from 0% to 45%, right axis from 0% to 100% for cumulative % invested (line graph)
Chart: column percentages of No Errors, Errors, and Total for Small, Medium, and Large Amount
Ordered Array
Stem-and-Leaf Display
Histogram
Polygon
Ogive
Day Students
Stem   Leaf
1      67788899
2      0012257
3      28
4      2
Night Students
Stem   Leaf
1      8899
2      0138
3      23
4      15
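A stem-and-leaf display splits each value into a stem (here the tens digit) and a leaf (the units digit). The sketch below assumes two-digit values, as in the day and night student data; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each tens digit (stem) to a string of units digits (leaves)."""
    display = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem] += str(leaf)
    return dict(display)


day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for name, ages in (("Day Students", day), ("Night Students", night)):
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"{stem} | {leaves}")
```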
Histogram: Frequency (vertical axis, 0 to 10) by class midpoint (horizontal axis: 15, 25, 35, 45, 55)
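Class                  Lower Boundary   % Less Than Lower Boundary
10 to less than 20          10                    0
20 to less than 30          20                   15
30 to less than 40          30                   45
40 to less than 50          40                   70
50 to less than 60          50                   90
                            60                  100
Ogive: Cumulative Percentage (vertical axis, 0 to 100) against lower boundary (horizontal axis, 10 to 60)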
Scatter Plots
Scatter plots are used for numerical data
consisting of paired observations taken from
two numerical variables
One variable is measured on the vertical axis
and the other variable is measured on the
horizontal axis
Scatter plots are used to examine possible
relationships between two numerical
variables
          Cost per day
23        125
26        140
29        146
33        160
38        167
42        170
50        188
55        195
60        200
Scatter plot: Cost per day (vertical axis, 0 to 250) against the first column (horizontal axis, 20 to 70)
Time Series
A Time Series Plot is used to study patterns in the
values of a numeric variable over time
The Time Series Plot:
Numeric variable is measured on the vertical axis and
the time period is measured on the horizontal axis
Attendance (in millions) at USA amusement/theme parks from 2000-2005
Year    Attendance
2000    317
2001    319
2002    324
2003    322
2004    328
2005    335
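One simple way to look for patterns over time is to compute the change from each year to the next. This sketch uses the attendance figures from the table; the variable names are illustrative.

```python
# Attendance (in millions) by year, from the table above.
attendance = {2000: 317, 2001: 319, 2002: 324, 2003: 322, 2004: 328, 2005: 335}

years = sorted(attendance)

# Change from the previous year, keyed by the later year.
changes = {year: attendance[year] - attendance[prev]
           for prev, year in zip(years, years[1:])}

for year, change in changes.items():
    print(f"{year}: {change:+d} million")
```

The only decline in this period is from 2002 to 2003; every other year shows an increase.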
Time Series Plot: Attendance (vertical axis, 316 to 336) by year
Bad Presentation and Good Presentation: Minimum Wage
1960: $1.00
1970: $1.60
1980: $3.10
1990: $3.80
Graphical Errors:
No Relative Basis
Bad Presentation: A's received by students, plotted as frequency (Freq., axis ticks 100, 200, 300) for FR, SO, JR, SR
Good Presentation: A's received by students, plotted as percentage (%, axis from 0% to 30%) for FR, SO, JR, SR
Graphical Errors:
Compressing the Vertical Axis
Bad Presentation: Quarterly Sales for Q1 to Q4 on a vertical axis from 0 to 200
Good Presentation: Quarterly Sales for Q1 to Q4 on a vertical axis from 0 to 50
Class Exercise 1
The owner of a restaurant wanted to study the demand for
dessert. He decided that, in addition to studying whether dessert
was ordered, he would also study the gender of the individual. Data
were collected from 600 customers and organized in the following
contingency table.
                       Gender
Dessert Ordered    Male    Female    Total
Yes                  40      96       136
No                  240     224       464
Total               280     320       600
Class Exercise 2
The following table represents estimated green power sales
by renewable energy source in 2008.
Source                       Percentage
Geothermal                   2.8
Hydro                        11.3
Landfill mass and biomass    28.1
Solar                        0.2
Unreported                   2.5
Wind                         55.1
Class Exercise 3
THANKS