Chapters 1-3: Statistics
What is Statistics?
Statistics is a way to get information
from data
Population vs. Sample
A Population is the entire set of observations
under study
A descriptive measure of a population is called a
parameter
A Sample is a subset of a population
A descriptive measure of a sample is called a
statistic
The descriptive techniques we learn in this
course can be used on this data
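A minimal sketch of the parameter/statistic distinction. The exam scores, the sample size of 4, and the fixed random seed are all assumptions made up for illustration; they do not come from the course data.

```python
import random
import statistics

# Hypothetical population: exam scores for every student in a small course.
population = [62, 71, 75, 78, 80, 83, 85, 88, 91, 97]

# Parameter: a descriptive measure of the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population (here, 4 observations drawn at random).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)

# Statistic: the same descriptive measure, computed on the sample only.
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean, which is why the two measures get different names.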
Introduction
Descriptive statistics involves the
arrangement, summary, and presentation of
data, to enable meaningful interpretation, and to
support decision making.
Descriptive statistics methods make use of
graphical techniques
Nominal Data
• The values of nominal data are categories.
For example, marital status responses can be
coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
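A short sketch of working with coded nominal data. The coding scheme is the one on the slide; the list of responses is hypothetical. Because the codes are only labels, the sketch counts observations per category and does not average the codes.

```python
from collections import Counter

# Coding scheme from the slide: the numbers are arbitrary labels.
codes = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical coded responses (illustrative, not from the course data).
responses = [1, 2, 2, 4, 1, 3, 2, 1, 2, 3]

# Count how many observations fall in each category.
counts = Counter(codes[r] for r in responses)

for category in codes.values():
    print(category, counts[category])
```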
Ordinal Data
Ordinal data appear to be categorical in nature, but
their values have an order (a ranking).
Calculations for Types of Data
As mentioned above,
• All calculations are permitted on interval
data.
• Only calculations involving a ranking
process are allowed for ordinal data.
• No calculations are allowed for nominal
data, save counting the number of
observations in each category.
This lends itself to the following “hierarchy
of data”…
Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence
are valid.
Data may not be treated as ordinal or interval.
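The hierarchy can be sketched with one calculation per data type. All three data sets below are hypothetical, and the choice of mean, median, and counts as representative calculations is an assumption made for illustration.

```python
import statistics
from collections import Counter

# Hypothetical data, one list per data type.
interval_bills = [42.19, 38.45, 29.23, 89.35, 118.04]    # real numbers
ordinal_ratings = [1, 2, 2, 3, 4, 4, 5]                   # 1 = poor ... 5 = excellent
nominal_status = ["Single", "Married", "Married", "Widowed"]

# Interval: all calculations are valid, e.g. the mean.
mean_bill = statistics.mean(interval_bills)

# Ordinal: only order-based calculations are valid, e.g. the median.
median_rating = statistics.median(ordinal_ratings)

# Nominal: only frequencies of occurrence are valid.
status_counts = Counter(nominal_status)

# Interval data may also be treated as ordinal: ranking it is allowed.
ranked_bills = sorted(interval_bills)

print(mean_bill, median_rating, status_counts, ranked_bills)
```

Going the other way is not allowed: a mean of the ordinal ratings would treat the gaps between ranks as equal, which ordinal data does not guarantee.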
Types of data – analysis
Knowing the type of data is necessary to properly
select the technique to be used when analyzing data.
Cross-Sectional/Time-Series Data
Cross-sectional data is collected at a certain
point in time
Marketing survey (observe preferences by gender,
age)
Test scores in a statistics course
Starting salaries of an MBA program's graduates
Social Media Sales
Pie slice angle for a 31% share: (31/100)(360°) = 111.6°
[Figure: pie chart of sales share by social channel: Twitter, Facebook, LinkedIn, YouTube, Instagram, Flickr, Other]
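The slice-angle arithmetic for a pie chart can be sketched as a one-line function. The function name `slice_angle` is a hypothetical helper introduced here; the 31% input is the share used in the calculation on the slide.

```python
def slice_angle(percent_share: float) -> float:
    """Return the central angle, in degrees, of a pie slice for a given share."""
    return percent_share / 100 * 360


angle = slice_angle(31)
print(round(angle, 1))  # 111.6
```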
The Bar Chart – Social Media Sales
Each rectangle represents one category (social channel).
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary.
The Bar Chart
[Figure: bar chart with one bar per year, 2013 to 2018; vertical axis from 0 to 15,000]
Interval Data - Frequency Distribution
[Figure: histogram; x-axis: bins from 15 to 120, y-axis: Frequency (0 to 60)]
Draw a Histogram
Bin    Frequency
 15       71
 30       37
 45       13
 60        9
 75       10
 90       18
105       28
120       14
[Figure: histogram of the table above; x-axis: Bills (15 to 120), y-axis: Frequency (0 to 80)]
Cell Phone Bill Example: Providing information
[Figure: histogram of cell phone bills; x-axis: Bills (15 to 120), vertical axis from 0 to 60]
Class width
It is generally best to use equal class widths,
but sometimes unequal class widths are called
for.
Negatively skewed
Positively skewed
Modal classes
A modal class is the one with the largest
number of observations.
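As a small sketch, the modal class of the cell phone bill example can be found by picking the class with the largest frequency. The bin values and frequencies are those of the example; reading each bin value as the upper limit of a class of width 15 is an assumption.

```python
# Bin values (read as upper class limits) and frequencies from the
# cell phone bill example.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

# The modal class is the class with the largest frequency.
modal_index = max(range(len(frequencies)), key=frequencies.__getitem__)
modal_upper = upper_limits[modal_index]
modal_lower = modal_upper - 15  # assumes equal class widths of 15

print(f"modal class: {modal_lower} to {modal_upper}, "
      f"frequency {frequencies[modal_index]}")
```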
A unimodal histogram
A bimodal histogram
Class relative frequency = (Class frequency) / (Total number of observations)
54% of the bills were less than or equal to $30, and 79%
were less than or equal to $90.
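The two percentages can be checked against the frequency table of the cell phone bill example. This is a sketch that assumes each bin value is the upper limit of its class; the variable names are illustrative.

```python
from itertools import accumulate

# Bin values (read as upper class limits) and frequencies from the
# cell phone bill example.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

total = sum(frequencies)  # total number of observations

# Class relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: share of bills at or below each upper limit.
cumulative = [c / total for c in accumulate(frequencies)]

for limit, rel, cum in zip(upper_limits, relative, cumulative):
    print(f"class up to {limit:>3}: relative {rel:.3f}, cumulative {cum:.2f}")
```

The cumulative values at the 30 and 90 limits are 108/200 and 158/200, which match the 54% and 79% quoted above.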
Interpreting histograms
Custom Example: Selecting an
investment
An investor is considering investing in one
of two investments.
The returns on these investments were
recorded.
From the two histograms, how can the
investor interpret:
The expected returns
The spread of the returns (the risk involved with
each investment)
Custom Investment Example - Histograms
[Figure: two histograms, "Return on investment A" and "Return on investment B"; x-axis from -15 to 75, frequency axis from 0 to 18]
Conclusion
It seems that investment A is better, because:
Its expected return is only slightly below that of
investment B
The risk from investing in A is smaller.
The possibility of having a high rate of return exists
for both investments.
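The same comparison can be sketched numerically. The returns below are invented for illustration and are not the recorded returns behind the histograms. Using the sample mean as a stand-in for expected return and the sample standard deviation as a stand-in for risk is an assumption of this sketch; the slides read both off the histograms.

```python
import statistics

# Hypothetical returns (%), invented for illustration only.
returns_a = [5, 8, 10, 12, 15, 9, 11, 7, 13, 10]
returns_b = [-10, 30, 2, 40, -5, 20, 12, 35, -12, 3]

summary = {}
for name, returns in (("A", returns_a), ("B", returns_b)):
    center = statistics.mean(returns)    # stand-in for expected return
    spread = statistics.stdev(returns)   # stand-in for risk
    summary[name] = (center, spread)
    print(f"Investment {name}: mean {center:.1f}%, std dev {spread:.1f}%")
```

With these made-up numbers, A has a slightly lower mean than B but a much smaller spread, the same pattern the conclusion describes.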
Describing the Relationship Between Two Variables
Solution
The size (independent variable, X) affects
the price (dependent variable, Y)
We use Excel to create a scatter diagram
[Figure: scatter diagram, Size vs. Price; x-axis: size (0 to 3,500), y-axis: price (1,180,000 to 1,380,000)]
Typical Patterns of Scatter Diagrams
[Figure: three scatter diagrams showing a positive linear relationship, no relationship, and a negative linear relationship]
Frequency Table:
                      Newspaper
Occupation      G&M   Post   Star   Sun   Total
Blue collar      27     18     38    37     120
White collar     29     43     21    15     108
Professional     33     51     22    20     126
Total            89    112     81    72     354
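A sketch of how the table's totals can be recomputed and how each row can be converted to shares, which makes occupations of different sizes easier to compare. The counts are those in the frequency table; computing row relative frequencies is an illustrative choice, not a step shown on the slides.

```python
newspapers = ["G&M", "Post", "Star", "Sun"]
table = {
    "Blue collar":  [27, 18, 38, 37],
    "White collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}

# Marginal totals of the table.
row_totals = {occ: sum(counts) for occ, counts in table.items()}
column_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Row relative frequencies: share of each newspaper within an occupation.
row_shares = {
    occ: [count / row_totals[occ] for count in counts]
    for occ, counts in table.items()
}

for occ, shares in row_shares.items():
    formatted = ", ".join(f"{n} {s:.2f}" for n, s in zip(newspapers, shares))
    print(f"{occ}: {formatted}")
```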
Contingency Table:
Occupation      G&M   Post   Star   Sun   Grand Total
[Figure: bar chart of newspaper readership (G&M, Post, Star, Sun) grouped by occupation (Blue collar, White collar, Professional); vertical axis from 0 to 60]
Graphing the Relationship between 2 Nominal Variables
The graphs tell us the same story as did the table. The
shapes of the bar charts for occupations 2 and 3
(White-collar and Professional) are very similar. Both
differ considerably from the bar chart for occupation 1
(Blue-collar).
Describing Time-Series Data
Gasoline Example
The average monthly retail price of
gasoline since 1976 was provided
Draw a graph of this data and describe the
information produced
Line Chart
Time (minutes)    1    2    3    4    5
Frequency        25   40   50   35   30
a. 5 minutes
b. 3 minutes
c. 30 minutes
d. 50 minutes
Sample Question 6
Twenty-five voters participating in a recent election exit
poll in Minnesota were asked to state their political party affiliation.
Coding the data as R for Republican, D for Democrat, and I for
Independent, the data collected were as follows:
I, R, D, I, R, I, I, D, R, I, I, D, R, R, I, D, I, R, I, D, I, D, R, R, and I.
b) What does the bar chart tell you about the political
affiliations of those in this sample?
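One way to tally the responses before drawing the bar chart is sketched below. The data string is the list from the question; the text bar chart is only an illustrative stand-in for a drawn chart.

```python
from collections import Counter

# Responses from the question, in the order given.
data = "I, R, D, I, R, I, I, D, R, I, I, D, R, R, I, D, I, R, I, D, I, D, R, R, I"
affiliations = data.split(", ")

counts = Counter(affiliations)

# A text bar chart: one '#' per voter in each category.
for party in ("R", "D", "I"):
    print(f"{party} {'#' * counts[party]} ({counts[party]})")
```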
Sample Question 7
The graph below is an example of a histogram. True or False?
Sample Question 8
Which of the following statements about
histograms is true?
a. A histogram is a summary of interval data.
b. A histogram is made of a series of intervals, called
classes.
c. The classes in a histogram cover the complete range of
observations.
d. All of these choices are true.
Sample Question 9
Compare the two histograms below.
Which statement is true?