Lecture_4
What is Statistics?
Statistics: Science of variability?
• Virtually everything varies
• Variation occurs among individuals
• Variation occurs within any one individual
as time passes
Population Versus Sample
• Population — the whole
– a collection of persons, objects, or items under study
– The entire group of individuals in a statistical study we
want information about.
1. Descriptive statistics
2. Inferential statistics
Descriptive Statistics
Collect data
ex. Survey
Present data
ex. Tables and graphs
Characterize data
ex. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
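As a minimal sketch of the sample mean calculation, assuming a small hypothetical sample (the values are illustrative and not taken from the lecture):

```python
from statistics import mean

# Hypothetical sample of n = 5 observations (illustrative values only).
sample = [4, 8, 15, 16, 23]

# Sample mean: sum of the observations divided by the sample size n.
x_bar = sum(sample) / len(sample)

# The standard library's mean() applies the same definition.
library_mean = mean(sample)

print(x_bar, library_mean)
```

Either form can be used; the explicit sum mirrors the formula, while `statistics.mean` is the more idiomatic call.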
Descriptive statistics
• Encompasses the following:
– Graphical or pictorial display
– Condensation of large masses of data into a
form such as tables
– Preparation of summary measures to give a
concise description of complex information (e.g.
an average figure)
– Exhibition of patterns that may be found in sets
of information
Inferential Statistics
Estimation
ex. Estimate the population
mean weight using the
sample mean weight
Hypothesis testing
ex. Test the claim that the
population mean weight is
120 pounds
[Figure: select a random sample from the population; calculate the sample mean $\bar{x}$ (statistic) to estimate the population mean $\mu$ (parameter)]
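The estimation idea can be sketched in code. This is only an illustration: the population below is simulated (hypothetical weights in pounds), and the names `population`, `mu`, and `x_bar` are assumptions made for the example.

```python
import random
from statistics import mean

# Hypothetical population of 1,000 weights in pounds (simulated, not lecture data).
rng = random.Random(0)
population = [rng.gauss(120, 15) for _ in range(1000)]

# Parameter: the population mean (usually unknown in practice).
mu = mean(population)

# Select a simple random sample without replacement and compute the statistic.
sample = rng.sample(population, 50)
x_bar = mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

The sample mean will generally differ somewhat from the population mean; inferential statistics deals with how large that difference is likely to be.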
Population vs. Sample
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
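A minimal sketch of how the population variance and population standard deviation could be computed, assuming a small hypothetical population (the values are illustrative only):

```python
from statistics import pstdev, pvariance

# Hypothetical population of N = 8 values (illustrative, not lecture data).
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: mean squared deviation from the population mean.
N = len(population)
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N

# Population standard deviation: square root of the population variance.
sigma = sigma_sq ** 0.5

# The standard library offers the same quantities directly.
print(sigma_sq, pvariance(population))
print(sigma, pstdev(population))
```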
Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Data Measurement
Levels of Measurement
Nominal
Data Level, Operations,
and Statistical Methods
Data Level    Meaningful Operations    Statistical Methods
Methods of visual presentation
of data
• Multiple bar chart
[Figure: multiple bar chart of values for East, West, and North by quarter (1st Qtr to 4th Qtr); horizontal axis 0 to 100]
Methods of visual presentation
of data
• Simple pictogram
[Figure: chart of values for East, West, and North by quarter (1st Qtr to 4th Qtr); vertical axis 0 to 100]
Frequency distributions
• Frequency tables
Observation Table
Class Interval    Frequency    Cumulative Frequency
< 20              13           13
< 40              18           31
< 60              25           56
< 80              15           71
< 100             9            80
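The cumulative frequency column in the observation table is a running total of the frequency column. A minimal sketch, using the frequencies from the table (the variable names are assumptions for illustration):

```python
from itertools import accumulate

# Upper class limits and frequencies from the observation table.
upper_limits = [20, 40, 60, 80, 100]
frequencies = [13, 18, 25, 15, 9]

# Cumulative frequency: running sum of the frequencies.
cumulative = list(accumulate(frequencies))

for limit, f, cf in zip(upper_limits, frequencies, cumulative):
    print(f"< {limit:<5}{f:>5}{cf:>6}")
```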
Frequency diagrams
[Figure: frequency and cumulative frequency plotted by class interval (< 20, < 40, < 60, < 80, < 100)]
[Figure: frequency plotted by class interval (< 20, < 40, < 60, < 80, < 100)]
Ungrouped Versus Grouped
Data
• Ungrouped data
• have not been summarized in any way
• are also called raw data
• Grouped data
• have been organized into a frequency
distribution
Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution of
Ages
Data Range
Smallest = 23
Largest = 74
Range = Largest - Smallest = 74 - 23 = 51
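The range calculation can be checked directly against the 50 ungrouped ages. A minimal sketch (the variable names are assumptions for illustration):

```python
# Ages of the sample of managers, as listed in the ungrouped data example.
ages = [
    42, 26, 32, 34, 57,
    30, 58, 37, 50, 30,
    53, 40, 30, 47, 49,
    50, 40, 32, 31, 40,
    52, 28, 23, 35, 25,
    30, 36, 32, 26, 50,
    55, 30, 58, 64, 52,
    49, 33, 43, 46, 32,
    61, 31, 30, 40, 60,
    74, 37, 29, 43, 54,
]

# Range = largest observation minus smallest observation.
smallest = min(ages)
largest = max(ages)
data_range = largest - smallest

print(smallest, largest, data_range)
```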
Number of Classes and Class
Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
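The class width steps can be sketched as follows. What counts as a "convenient number" is a judgment call; this sketch assumes rounding up to the next multiple of 10, which reproduces the width used here.

```python
import math

data_range = 51    # 74 - 23, from the ages example
num_classes = 6    # chosen within the recommended 5 to 15 classes

# Approximate class width: range divided by the number of classes.
approx_width = data_range / num_classes  # 8.5

# Assumption: "convenient" means a multiple of 10; round up to the next one.
step = 10
class_width = math.ceil(approx_width / step) * step

print(approx_width, class_width)
```

A different step (for example 5) would be equally defensible for other data sets; the point is only that the width is rounded up, never down, so the classes cover the whole range.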
Class Midpoint
Class Midpoint = class beginning point + (1/2)(class width)
= 30 + (1/2)(10)
= 35
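A minimal sketch of the midpoint formula applied to every class of width 10 starting at 20 (the function name `class_midpoint` is an assumption for illustration):

```python
def class_midpoint(class_start, class_width):
    """Midpoint = class beginning point + half the class width."""
    return class_start + class_width / 2

# Classes 20-under 30, 30-under 40, ..., 70-under 80.
class_starts = range(20, 80, 10)
midpoints = [class_midpoint(start, 10) for start in class_starts]

print(midpoints)
```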
Relative Frequency
Class Interval    Frequency    Relative Frequency
20-under 30       6            .12    (6/50)
30-under 40       18           .36    (18/50)
40-under 50       11           .22
50-under 60       11           .22
60-under 70       3            .06
70-under 80       1            .02
Total             50           1.00
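Each relative frequency is the class frequency divided by the total number of observations. A minimal sketch using the frequencies from the table (the variable names are assumptions for illustration):

```python
# Class frequencies from the relative frequency table.
frequencies = {
    "20-under 30": 6,
    "30-under 40": 18,
    "40-under 50": 11,
    "50-under 60": 11,
    "60-under 70": 3,
    "70-under 80": 1,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency.
relative = {label: f / total for label, f in frequencies.items()}

for label, rel in relative.items():
    print(f"{label:<14}{frequencies[label]:>4}{rel:>7.2f}")
```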
Cumulative Frequency
Class Interval    Frequency    Cumulative Frequency
20-under 30       6            6
30-under 40       18           24    (18 + 6)
40-under 50       11           35    (11 + 24)
50-under 60       11           46
60-under 70       3            49
70-under 80       1            50
Total             50
Class Midpoints, Relative Frequencies,
and Cumulative Frequencies
Class Interval    Frequency    Midpoint    Relative Frequency    Cumulative Frequency
20-under 30       6            25          .12                   6
30-under 40       18           35          .36                   24
40-under 50       11           45          .22                   35
50-under 60       11           55          .22                   46
60-under 70       3            65          .06                   49
70-under 80       1            75          .02                   50
Total             50                       1.00
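The whole table can be rebuilt from the ungrouped ages. The following is a sketch under the assumptions used in this example: six classes of width 10 starting at 20, each including its lower endpoint and excluding its upper one. The variable names are illustrative.

```python
from itertools import accumulate

# Ages of the sample of managers (the ungrouped data).
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]

start, width, num_classes = 20, 10, 6

# Tally each age into its class: index 0 is 20-under 30, index 1 is 30-under 40, ...
frequencies = [0] * num_classes
for age in ages:
    frequencies[(age - start) // width] += 1

total = len(ages)
rows = []
for i, (f, cf) in enumerate(zip(frequencies, accumulate(frequencies))):
    low = start + i * width
    label = f"{low}-under {low + width}"
    # (class interval, frequency, midpoint, relative frequency, cumulative frequency)
    rows.append((label, f, low + width / 2, f / total, cf))

for label, f, mid, rel, cf in rows:
    print(f"{label:<13}{f:>4}{mid:>6.0f}{rel:>7.2f}{cf:>5}")
```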
Cumulative Relative Frequencies
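Cumulative relative frequencies can be obtained by dividing each cumulative frequency by the total number of observations. A minimal sketch using the age class frequencies (the variable names are assumptions for illustration):

```python
from itertools import accumulate

# Frequencies for the classes 20-under 30 through 70-under 80.
frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)

# Divide each running total by the overall total; this avoids
# accumulating floating-point rounding error from summing fractions.
cumulative_relative = [cf / total for cf in accumulate(frequencies)]

print([round(value, 2) for value in cumulative_relative])
```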
Common Statistical Graphs
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for
categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
Histogram
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
[Figure: histogram of Frequency (0 to 20) against Years (0 to 80)]
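As a rough illustration of what the histogram encodes, the sketch below prints a text version with one `#` per observation. It is drawn sideways (bars run horizontally) purely for convenience in plain text; a plotting library would normally be used for a real chart.

```python
# Class intervals and frequencies from the age distribution.
classes = ["20-under 30", "30-under 40", "40-under 50",
           "50-under 60", "60-under 70", "70-under 80"]
frequencies = [6, 18, 11, 11, 3, 1]

# One bar per class; bar length equals the class frequency.
bars = ["#" * f for f in frequencies]

for label, bar, f in zip(classes, bars, frequencies):
    print(f"{label:<12} {bar} {f}")
```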
Histogram Construction
[Figure: histogram construction for the age classes; Frequency (0 to 20) against Years (0 to 80)]
Frequency Polygon
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
[Figure: frequency polygon of Frequency (0 to 20) against Years (0 to 80)]
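A frequency polygon is a line graph of frequencies; a common convention, assumed in this sketch, is to plot each frequency at its class midpoint and join the points. The code only computes the points to be connected.

```python
# Class beginning points and frequencies from the age distribution.
class_starts = [20, 30, 40, 50, 60, 70]
class_width = 10
frequencies = [6, 18, 11, 11, 3, 1]

# Assumed convention: one point per class at (midpoint, frequency).
polygon_points = [
    (start + class_width / 2, f)
    for start, f in zip(class_starts, frequencies)
]

print(polygon_points)
```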
Ogive
Class Interval    Cumulative Frequency
20-under 30       6
30-under 40       24
40-under 50       35
50-under 60       46
60-under 70       49
70-under 80       50
[Figure: ogive of cumulative Frequency (0 to 60) against Years (0 to 80)]
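An ogive is a line graph of cumulative frequencies. The sketch below computes the points under a common convention, assumed here: each cumulative frequency is plotted at the upper endpoint of its class, with a starting point of zero at the beginning of the first class.

```python
from itertools import accumulate

# Class endpoints 20, 30, ..., 80 and the class frequencies.
endpoints = [20, 30, 40, 50, 60, 70, 80]
frequencies = [6, 18, 11, 11, 3, 1]

# Zero observations lie below 20; then the running totals follow.
cumulative = [0] + list(accumulate(frequencies))

ogive_points = list(zip(endpoints, cumulative))
print(ogive_points)
```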
Relative Frequency Ogive
[Figure: ogive with Cumulative Relative Frequency on the vertical axis]
Complaints by Passengers
[Figure: pie chart of complaints by passengers: Stations, Etc. 40%; Train Performance 21%; Equipment 15%; Personnel 14%; Schedules, Etc. 10%]
Second Quarter Truck Production
Company    2d Quarter Truck Production
A          357,411
B          354,936
C          160,997
E          12,747
Totals     920,190
Pie Chart Calculations for
Company A
Company    2d Quarter Truck Production    Proportion    Degrees
E          12,747                         .014          5
Totals     920,190                        1.000         360
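The pie chart calculation converts each company's production to a proportion of the total and then to degrees of the 360-degree circle. In the sketch below, the figure for company D is not listed in the table; it is derived by subtraction on the assumption that the stated total covers exactly companies A to E.

```python
# Second quarter truck production as listed in the table.
production = {"A": 357_411, "B": 354_936, "C": 160_997, "E": 12_747}
total = 920_190

# Assumption: D is the remainder of the stated total (not given in the table).
production["D"] = total - sum(production.values())

pie = {}
for company in sorted(production):
    proportion = production[company] / total
    degrees = proportion * 360  # share of the full circle
    pie[company] = (round(proportion, 3), round(degrees))

for company, (proportion, degrees) in pie.items():
    print(f"{company}  {production[company]:>9,}  {proportion:.3f}  {degrees:>3}")
```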
Second Quarter
Truck Production
[Figure: pie chart for companies A, B, C, D, and E; slices labeled 39%, 39%, 17%, 4%, and 1%]
Pareto Chart
[Figure: Pareto chart; left axis Frequency (0 to 100), right axis cumulative percentage (0% to 100%); categories: Poor Wiring, Short in Coil, Defective Plug, Other]
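A Pareto chart orders categories from most to least frequent and overlays the cumulative percentage. The sketch below shows that calculation; the defect counts are hypothetical, chosen only for illustration, since the chart's underlying values are not given here.

```python
# Hypothetical defect counts (illustrative only, not lecture data).
defects = {
    "Defective Plug": 25,
    "Other": 5,
    "Poor Wiring": 40,
    "Short in Coil": 30,
}

# Sort categories by frequency, largest first.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

total = sum(defects.values())
running = 0
pareto = []
for cause, count in ordered:
    running += count
    # (category, frequency, cumulative percentage of all defects)
    pareto.append((cause, count, 100 * running / total))

for cause, count, cum_pct in pareto:
    print(f"{cause:<15}{count:>4}{cum_pct:>7.0f}%")
```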
Scatter Plot
Registered Vehicles    Gasoline Sales
5                      60
15                     120
9                      90
15                     140
7                      60
[Figure: scatter plot of Gasoline Sales against Registered Vehicles (0 to 20)]
Principles of Excellent Graphs
Graphical Errors: No Zero Point
on the Vertical Axis
[Figure: quarterly data (Q1 to Q4) shown as a "Bad Presentation" and as "Good Presentations"]