Chapter1-3 Statistic

The document provides an overview of descriptive statistics, emphasizing the distinction between population and sample data, as well as the types of data (interval, nominal, and ordinal) and their respective graphical representation techniques. It explains how to analyze data using graphical methods such as bar charts and pie charts for nominal data, and histograms for interval data. Additionally, it discusses the importance of understanding data types for selecting appropriate analysis techniques and interpreting relationships between variables.

Uploaded by

Shakiba Akhbari
Copyright
© All Rights Reserved
Graphical Descriptive Techniques

What is Statistics?
Statistics is a way to get information
from data
Population VS Sample
A population is the entire set of observations under study.
• A descriptive measure of a population is called a parameter.
A sample is a subset of a population.
• A descriptive measure of a sample is called a statistic.
The descriptive techniques we learn in this course can be used on both population and sample data.
Introduction
Descriptive statistics involves the
arrangement, summary, and presentation of
data, to enable meaningful interpretation, and to
support decision making.
Descriptive statistics methods make use of
• graphical techniques
• numerical descriptive measures.
The methods presented apply to both
• the entire population (whose descriptive measures are parameters)
• a sample from the population (whose descriptive measures are statistics).

Types of data and information
A variable is a characteristic of a population or sample that is of interest to us.
• Cereal choice
• Capital expenditure
• The waiting time for medical services
Data are the actual values of the variables.
• Interval data are numerical observations
• Nominal data are categorical observations
• Ordinal data are ordered categorical observations
Interval Data…
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.

Nominal Data…
• The values of nominal data are categories.
• E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
These data are categorical in nature; arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!).
Nominal data are also called qualitative or categorical.

Ordinal Data…
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, to them.
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it is still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category, as long as the codes increase with the ranking.
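The point about ordinal codes can be checked with a small sketch. The first coding is the one given for the course rating system; the second coding is a hypothetical alternative, invented here only because it preserves the same ranking.

```python
ratings = ["poor", "fair", "good", "very good", "excellent"]

# Coding from the rating example, plus a hypothetical recoding with the same order.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 3, "fair": 7, "good": 10, "very good": 40, "excellent": 100}

def ranked(coding):
    """Return the categories sorted by their numeric codes."""
    return sorted(coding, key=coding.get)

# Order comparisons give the same answer under both codings.
same_order = ranked(coding_a) == ranked(coding_b) == ratings

# Arithmetic does not: "2*fair = very good" depends entirely on the coding used.
holds_in_a = 2 * coding_a["fair"] == coding_a["very good"]
holds_in_b = 2 * coding_b["fair"] == coding_b["very good"]

print(same_order, holds_in_a, holds_in_b)
```

The ranking survives the recoding, while the arithmetic statement is true under one coding and false under the other, which is why arithmetic on ordinal codes carries no meaning.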

Calculations for Types of Data
As mentioned above,
• All calculations are permitted on interval
data.
• Only calculations involving a ranking
process are allowed for ordinal data.
• No calculations are allowed for nominal
data, save counting the number of
observations in each category.
This lends itself to the following “hierarchy
of data”…

Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.

Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.

Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence
are valid.
Data may not be treated as ordinal or interval.
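As a rough illustration of the hierarchy, the sketch below applies one permitted calculation to each data type using Python's standard library. The sample values are hypothetical, chosen only for illustration; the codes follow the marital status and course rating examples.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical coded samples (illustrative values only).
marital_status = [1, 2, 2, 4, 3, 2, 1]   # nominal: Single=1, Married=2, Divorced=3, Widowed=4
course_rating = [3, 5, 4, 2, 4, 1, 4]    # ordinal: poor=1 ... excellent=5
bill_amounts = [42.19, 38.45, 29.23]     # interval: dollars

# Nominal: only counting the observations in each category is valid.
status_counts = Counter(marital_status)

# Ordinal: calculations based on an ordering process, such as the median, are valid.
median_rating = median(course_rating)

# Interval: all calculations, including the mean, are valid.
mean_bill = mean(bill_amounts)

print(status_counts[2], median_rating, round(mean_bill, 2))
```

Taking the mean of `marital_status` would run without error, but the result would have no meaning; the restriction comes from the data type, not from the software.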
Types of data – analysis
Knowing the type of data is necessary to properly
select the technique to be used when analyzing data.
Cross-Sectional/Time-Series Data
Cross-sectional data are collected at a certain point in time.
• Marketing survey (observe preferences by gender, age)
• Test scores in a statistics course
• Starting salaries of an MBA program's graduates
Time-series data are collected over successive points in time.
• Weekly closing price of gold
• Amount of crude oil imported monthly
Computer Software Note
Ensure you load Data Analysis and Data Analysis Plus.
• For Data Analysis: Tools >> Add-Ins >> click on Analysis ToolPak, or refer to the instructions on the course website.
• For Data Analysis Plus: go to the publisher's site.
Graphical Techniques for Nominal data

The only allowable calculation on nominal data is to count the frequency of each value of a variable.
When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by
• Bar charts – emphasize the frequency of occurrence of the different categories. Frequencies are used.
• Pie charts – emphasize the proportion of occurrences of each category. Relative frequencies are used.
The Pie Chart
The pie chart is a circle, subdivided into
a number of slices that represent the
various categories.
The size of each slice is proportional to
the percentage corresponding to the
category it represents.
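A minimal sketch of the proportionality rule: each slice's central angle is its share of the total multiplied by 360 degrees. The function name `slice_angles` and the category counts are hypothetical, chosen so that one category holds 31% of the total.

```python
def slice_angles(frequencies):
    """Return the central angle, in degrees, of each category's pie slice."""
    total = sum(frequencies.values())
    return {category: 360 * count / total for category, count in frequencies.items()}

# Hypothetical frequencies; category "A" holds 31 of 100 observations.
example = {"A": 31, "B": 45, "C": 24}
angles = slice_angles(example)

print(round(angles["A"], 1))
```

A category with 31% of the observations gets a slice of about 111.6 degrees, and the angles of all slices add up to 360 degrees.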
The Pie Chart – Social Media Sales
• The 2018 social media sales of a local clothing company are under investigation.
• We want to analyze the social media channels that are contributing to our sales.
• A sample of 285 sales transactions was pulled from our ecommerce store, and the referring social medium was recorded.
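The Pie Chart – Social Media Sales
[Figure: pie chart titled "Social Sales", with slices for Twitter, Facebook, LinkedIn, YouTube, Instagram, Flickr, and Other. Labeled shares include Twitter 31%, Flickr 9%, and Other 6%; the remaining labels read 21%, 7%, 4%, and 22%.]
Slice angle for a 31% share: (31/100)(360°) = 111.6°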
The Bar Chart – Social Media Sales
Rectangles represent each category (social channel).
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary
The Bar Chart

Use bar charts also when the order in which nominal data are presented is meaningful.
[Figure: bar chart of the total number of new products introduced in Canada in the years 2013 to 2018; vertical axis from 0 to 20,000.]
Interval Data - Frequency Distribution

A frequency distribution counts the number of observations that fall into each of a series of intervals, called classes, which cover the complete range of observations.
Graphical Techniques for Interval Data
Histogram
• Is a graphical representation of a frequency distribution.
• Rectangles are used, whose bases are the intervals and whose heights are the frequencies.
Draw a Histogram
[Figure: histogram of Bills (cost of phone bill); horizontal axis: Bills, with class limits 15, 30, 45, 60, 75, 90, 105, 120; vertical axis: Frequency, 0 to 80.]
Graphical Techniques for Interval Data

Custom Example: We are given information concerning the monthly bills of new subscribers in the first month after signing on with a cellular company.
• Collect data
• Prepare a frequency distribution
• Draw a histogram
• See the Excel file posted with the presentation - Histogram
Cell Phone Bill Example: Providing information
Collect data, then prepare a frequency distribution (pg. 50).
Bills: 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, ...
There are 200 data points.

How many classes to use?
Number of observations    Number of classes
Less than 50              5-7
50 to 200                 7-9
201 - 500                 9-10
501 - 1,000               10-11
1,001 - 5,000             11-13
5,001 - 50,000            13-17
More than 50,000          17-20

Class width = [Range] / [# of classes]
            = [Largest observation - Smallest observation] / [# of classes]
            = [119.63 - 0] / [8] = 14.95 ≈ 15
Alternatively, we could use Sturges’ formula: Number of class intervals = 1 + 3.3 log (n)
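The two calculations above can be sketched in a few lines. The function names are hypothetical, and the logarithm is taken to be base 10, which is the usual reading of Sturges' formula when the multiplier is 3.3.

```python
import math

def sturges_classes(n):
    """Number of class intervals suggested by Sturges' formula, 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def class_width(largest, smallest, classes):
    """Class width = [Range] / [# of classes]."""
    return (largest - smallest) / classes

suggested = sturges_classes(200)      # 200 bills in the example
width = class_width(119.63, 0, 8)     # largest and smallest bill, 8 classes

print(round(suggested, 2), round(width, 2))
```

For 200 observations the formula suggests between 8 and 9 classes, consistent with the 7-9 range in the table, and 8 classes give a width of about 14.95, rounded up to 15.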
Cell Phone Bill Example: Providing information
Draw a Histogram
Bin    Frequency
15     71
30     37
45     13
60     9
75     10
90     18
105    28
120    14
[Figure: histogram of the frequencies above; horizontal axis: Bills; vertical axis: Frequency, 0 to 80.]
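A frequency distribution like this one can be built by assigning each observation to the first class whose upper limit it does not exceed. The sketch below assumes each class includes its upper limit (the convention suggested by the Bin column) and uses only the nine bills listed in the example, not the full 200; the function name is hypothetical.

```python
from bisect import bisect_left

def frequency_distribution(observations, upper_limits):
    """Count observations per class; each class includes its upper limit."""
    counts = [0] * len(upper_limits)
    for value in observations:
        # Index of the first upper limit that is >= value.
        index = bisect_left(upper_limits, value)
        if index == len(upper_limits):
            raise ValueError(f"{value} exceeds the last class limit")
        counts[index] += 1
    return counts

bins = [15, 30, 45, 60, 75, 90, 105, 120]
first_bills = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05]

print(frequency_distribution(first_bills, bins))
```

Applied to all 200 bills, the same procedure would yield the Bin and Frequency columns shown with the histogram.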
Cell Phone Bill Example: Providing information
What information can we extract from this histogram?
• About half of all the bills are small: 71+37=108
• A few bills are in the middle range: 13+9+10=32
• A relatively large number of bills are large: 18+28+14=60
[Figure: the histogram of Bills with these three groups of classes marked; horizontal axis: Bills, 15 to 120; vertical axis: Frequency, 0 to 80.]
Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for.
Unequal class widths are used when the frequency associated with some classes is too low. Then,
• several classes are combined together to form a wider and “more populated” class.
• It is possible to form an open-ended class at the higher end or lower end of the histogram.
Shapes of histograms
There are four typical shape characteristics
Shapes of histograms

Negatively skewed

Positively skewed
Modal classes
A modal class is the one with the largest
number of observations.
[Figure: a unimodal histogram, with its modal class marked.]
Modal classes
[Figure: a bimodal histogram, with a modal class marked at each of its two peaks.]
Bell shaped histograms
• Many statistical techniques require that the population be bell shaped.
• Drawing the histogram helps verify the shape of the population in question.
Relative frequency

It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
Class relative frequency = [Class frequency] / [Total number of observations]
Relative frequencies should be used when
• the population relative frequencies are studied
• two or more histograms are compared
• the numbers of observations in the samples studied are different
Relative and Cumulative Relative Frequency Distribution

Cumulative relative frequency distributions.
Cell phone bill example:
Class      Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
0-15       71          0.355                71                     71/200=.355
15-30      37          0.185                108                    108/200=.540
30-45      13          0.065                121                    121/200=.605
45-60      9           0.045                130                    130/200=.650
60-75      10          0.050                140                    140/200=.700
75-90      18          0.090                158                    158/200=.790
90-105     28          0.140                186                    186/200=.930
105-120    14          0.070                200                    200/200=1.000
Sample Size = 200
54% of the bills were less than or equal to $30 and 79% were less than or equal to $90.
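The columns of this table follow mechanically from the class frequencies. A short sketch, using the frequencies from the cell phone bill example, reproduces them:

```python
from itertools import accumulate

frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / total for c in cumulative]

for f, r, c, cr in zip(frequencies, relative, cumulative, cumulative_relative):
    print(f"{f:>3} {r:.3f} {c:>3} {cr:.3f}")
```

The second and sixth cumulative relative frequencies (.540 and .790) are the figures behind the statement about bills of at most $30 and at most $90.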
Interpreting histograms
Custom Example: Selecting an investment
• An investor is considering investing in one of two investments.
• The returns on these investments were recorded.
• From the two histograms, how can the investor interpret
  • the expected returns
  • the spread of the returns (the risk involved with each investment)
Custom Investment Example - Histograms
[Figure: side-by-side histograms of the return on investment A and the return on investment B, with the center of each marked; horizontal axes from -15 to 75, vertical axes from 0 to 18.]
Interpretation: The center of the returns of Investment A is slightly lower than that for Investment B.
Custom Investment Example - Histograms
[Figure: the same two histograms, each with sample size = 50, annotated with the values 17, 34, 46 for investment A and 16, 26, 43 for investment B; horizontal axes from -15 to 75, vertical axes from 0 to 18.]
Interpretation: The spread of returns for Investment A is less than that for Investment B.
Custom Investment Example - Histograms
[Figure: the same two histograms of the return on investment A and the return on investment B; horizontal axes from -15 to 75, vertical axes from 0 to 18.]
Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.
Custom Investment Example - Histograms

Conclusion
• It seems that investment A is better, because:
  • Its expected return is only slightly below that of investment B.
  • The risk from investing in A is smaller.
  • The possibility of having a high rate of return exists for both investments.
Describing the Relationship Between Two Variables

We are interested in the relationship between two interval variables.
Example 3.7 (pg.66)
• A real estate agent wants to study the relationship between house price and house size.
• Twelve houses recently sold are sampled and their size and price recorded.
• Use a graphical technique to describe the relationship between size and price.
Size    Price
2300    1315000
1800    1229000
2600    1355000
2000    1261000
2200    1234000
1400    1216000
3300    1308000
2800    1306000
2300    1289000
2000    1204000
2700    1265000
1800    1195000
Describing the Relationship Between Two Variables

Solution
• The size (independent variable, X) affects the price (dependent variable, Y).
• We use Excel to create a scatter diagram.
[Figure: scatter diagram "Size Vs. Price"; horizontal axis (size) from 0 to 3500, vertical axis (price) from 1180000 to 1380000.]
Typical Patterns of Scatter Diagrams
[Figure: example scatter diagrams showing a positive linear relationship, no relationship, a negative linear relationship, a negative nonlinear relationship, and a nonlinear (concave) relationship.]
[Figure: a scatter diagram captioned "This is a weak linear relationship. A nonlinear relationship seems to fit the data better."]
Graphing the Relationship Between Two Nominal Variables

We create a contingency table. This table lists the frequency for each combination of values of the two variables.
We can create a bar chart that represents the frequency of occurrence of each combination of values.
Describing the Relationship between Two Nominal Variables

To describe the relationship between two nominal variables, we must remember that we are permitted only to determine the frequency of the values.
As a first step we need to produce a cross-classification table, which lists the frequency of each combination of the values of the two variables.
We can achieve this in Excel using a Pivot Table.
Example 2.4 Newspaper Readership Survey – pg.35

In a major North American city there are four competing newspapers: the Globe and Mail (G&M), Post, Sun, and Star. To help design advertising campaigns, the advertising managers of the newspapers need to know which segments of the newspaper market are reading their papers.
A survey was conducted to analyze the relationship between newspapers read and occupation. A sample of newspaper readers was asked to report which newspaper they read: Globe and Mail (1), Post (2), Star (3), Sun (4), and to indicate whether they were a blue-collar worker (1), white-collar worker (2), or professional (3). The responses are stored in Chapter2_Pivot Table.xls using the codes. Some of the data are listed here.
Example 2.4, pg.35
Reader    Occupation    Newspaper
1         2             2
2         1             4
3         2             1
.         .             .
.         .             .
352       3             2
353       1             3
354       2             3
Where:
Occupation
1 Blue Collar
2 White Collar
3 Professionals
Newspaper
1 Globe and Mail
2 Post
3 Star
4 Sun
Cross-Classification Table of Frequencies

Frequency Table:
Newspaper
Occupation G&M Post Star Sun Total
Blue collar 27 18 38 37 120
White collar 29 43 21 15 108
Professional 33 51 22 20 126
Total 89 112 81 72 354
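Outside Excel, a cross-classification table can be produced by counting each (occupation, newspaper) pair. The sketch below is one possible approach using Python's `Counter`; it uses only the six readers listed in the example, so its counts are far smaller than those in the full table of 354 readers.

```python
from collections import Counter

OCCUPATIONS = {1: "Blue collar", 2: "White collar", 3: "Professional"}
NEWSPAPERS = {1: "G&M", 2: "Post", 3: "Star", 4: "Sun"}

# (occupation code, newspaper code) for readers 1, 2, 3, 352, 353, 354.
responses = [(2, 2), (1, 4), (2, 1), (3, 2), (1, 3), (2, 3)]

# Frequency of each combination of values of the two variables.
counts = Counter(responses)

for occ_code, occ_name in OCCUPATIONS.items():
    row = [counts[(occ_code, paper_code)] for paper_code in NEWSPAPERS]
    print(f"{occ_name:<13}", *row, sum(row))
```

Each printed row lists the frequencies for G&M, Post, Star, and Sun, followed by the row total, in the same layout as the frequency table.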
Contingency Table

Contingency Table:
Occupation      G&M       Post      Star      Sun       Grand Total
Blue            7.63%     5.08%     10.73%    10.45%    33.90%
White           8.19%     12.15%    5.93%     4.24%     30.51%
Professional    9.32%     14.41%    6.21%     5.65%     35.59%
Grand Total     25.14%    31.64%    22.88%    20.34%    100.00%
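The percentages in this table are each cell's frequency divided by the grand total of 354. A brief sketch, using the frequencies from the cross-classification table, shows the calculation:

```python
table = {
    "Blue collar":  {"G&M": 27, "Post": 18, "Star": 38, "Sun": 37},
    "White collar": {"G&M": 29, "Post": 43, "Star": 21, "Sun": 15},
    "Professional": {"G&M": 33, "Post": 51, "Star": 22, "Sun": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Each cell expressed as a percentage of the grand total.
percent = {
    occupation: {paper: 100 * n / grand_total for paper, n in row.items()}
    for occupation, row in table.items()
}

for occupation, row in percent.items():
    cells = " ".join(f"{p:6.2f}%" for p in row.values())
    print(f"{occupation:<13} {cells} {sum(row.values()):6.2f}%")
```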


Graphing the Relationship between 2 Nominal Variables
[Figure: bar charts of newspaper readership (G&M, Post, Star, Sun) for each occupation (Blue collar, White collar, Professional); horizontal axis: Occupation; vertical axis: frequency, 0 to 60.]
Graphing the Relationship between 2 Nominal Variables

If the two variables are unrelated, the patterns exhibited in the bar charts should be approximately the same. If some relationship exists, then some bar charts will differ from others.
The graphs tell us the same story as did the table. The shapes of the bar charts for occupations 2 and 3 (White-collar and Professional) are very similar. Both differ considerably from the bar chart for occupation 1 (Blue-collar).
Describing Time-Series Data

Data can be classified according to the time they are collected.
• Cross-sectional data are all collected at the same time.
• Time-series data are collected at successive points in time.
Time-series data are often depicted on a line chart (a plot of the variable over time).
Line Chart

Gasoline Example
• The average monthly retail price of gasoline since 1976 was provided.
• Draw a graph of these data and describe the information produced.
Line Chart

Using constant 1982-1984 dollars, we can see that the average price of a gallon of gasoline hit its peak in the middle of 2008 (month 390). From there it dropped rapidly, and in late 2009 it slowly started to come back up, continuing through 2015 to 2018.
Graphical Excellence
Graphical excellence is a term we apply to techniques that are informative and concise and that impart information clearly to their viewers.
When we do not achieve the above we run into issues like “chartjunk”, where the designer focuses too much on how the chart looks and fails on substance.
Graphical Deception
• Graphs without a scale on one axis
• Influence from the caption of a chart
• Stretching the horizontal or vertical axis (makes the data's impact look less or more dramatic than it really is)
• Playing with size distortions (making the bars of a bar chart wider for some categories)
The above deceptions mislead the reader and break the ethical responsibilities of the designer.
The more aware you are of these, the less chance you have of falling into statistical traps and the more informed a reader/user you will be.
Summary
Single data set
• Interval: Histogram (for cross-sectional interval data); Ogive (for cross-sectional interval data); Line chart (for time-series interval data)
• Nominal: Pie chart (displays %, shows the proportional difference in sample size); Bar chart (displays frequency, great for time-series nominal data)
• Ordinal: Bar chart is ideal for this type of data
Relationship between two variables
• Interval: Scatter diagram
• Nominal: Cross-classification tables, multiple bar charts
• Ordinal: NA
THE END
Sample Questions
Remember these are just sample questions
and you should be doing as many questions
from the textbook as possible.
Sample Question 1
A councilman who is running for the office of senator of a
state with 3.5 million registered voters commissions a survey.
In the survey, 46% of the 8,000 registered voters interviewed
say they plan to vote for him. The population of interest is:
a. the 3.5 million registered voters in the state.
b. the 8,000 registered voters interviewed.
c. the 46% who plan to vote for him.
d. all the residents of the state.
Sample Question 2
Each of the following is a form of doing
____________________ statistics:
1) presenting your data using a graph;
2) calculating the mean of your sample; and
3) organizing your data into a table.

Fill in the blank.


Sample Question 3
At Cedar Rapids Community College, administrators want to
determine the average commuting distance for their students who
commute to school. They randomly select 250 students who commute and
ask them the distance of their commute to campus. From this group a mean
of 19.3 miles is computed.
a. Describe/find the parameter.
b. Describe/find the statistic.
c. Describe the population.
d. Describe the sample.
Sample Question 4
All calculations are permitted on what type of
data?

a. Interval data
b. Nominal data
c. Ordinal data
d. All of these choices are true.
Sample Question 5
Suppose you measure the number of minutes it takes an employee
to complete a task, where the maximum allowed time is 5 minutes,
and each time is rounded to the nearest minute. Data from 130 employees
is summarized below. How long did it take most employees to complete the task?

Time (minutes)    1     2     3     4     5
Frequency         25    40    50    35    30

a. 5 minutes
b. 3 minutes
c. 30 minutes
d. 50 minutes
Sample Question 6
Twenty-five voters participating in a recent election exit
poll in Minnesota were asked to state their political party affiliation.
Coding the data as R for Republican, D for Democrat, and I for
Independent, the data collected were as follows:
I, R, D, I, R, I, I, D, R, I, I, D, R, R, I, D, I, R, I, D, I, D, R, R, and I.

a) Construct a frequency bar chart from this data.

b) What does the bar chart tell you about the political
affiliations of those in this sample?
Sample Question 7
The graph below is an example of a histogram. True or False?
Sample Question 8
Which of the following statements about
histograms is true?
a. A histogram is a summary of interval data.
b. A histogram is made of a series of intervals, called
classes.
c. The classes in a histogram cover the complete range of
observations.
d. All of these choices are true.
Sample Question 9
Compare the two histograms below.
Which statement is true?

a. The spread of histogram A is smaller than the spread of histogram B.
b. The spread of histogram A is larger than the spread of histogram B.
c. The spread of histogram A is the same as the spread of histogram B.
d. You cannot compare the spreads of these two histograms without the original data.
Sample Question 10
What type of information is displayed on
the graph below:
Answers to Sample Question
1: A
2: Descriptive
3:

A) The mean commute distance for all commuting students at the college.

B) 19.3 miles.

C) All commuting students enrolled at the college.

D) The 250 randomly selected commuting students.
4: A
5: B
6:
a) [Figure: frequency bar chart of political party affiliation.]
b) The bar graph shows most of the people surveyed were Independents (11 out of 25 = 44.0%); Republicans followed with 8/25 = 32.0% and Democrats made up 6 of the 25, or 24.0%.
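The frequencies in the answer to Question 6 can be verified by counting the coded responses. The sketch below does so and prints a rough text version of the bar chart; the text rendering is only an illustration, not the chart from the slide.

```python
from collections import Counter

# The 25 coded responses from Sample Question 6.
responses = (
    "I, R, D, I, R, I, I, D, R, I, I, D, R, "
    "R, I, D, I, R, I, D, I, D, R, R, I"
).split(", ")

counts = Counter(responses)

# One row per party, most frequent first; each '#' is one voter.
for party, count in counts.most_common():
    print(f"{party}: {'#' * count} {count} ({count / len(responses):.1%})")
```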
Answers to Sample Question
7: False (There are spaces between the bars)
8: D
9: C
10: Cumulative Relative Frequency
