1 Introduction To Statistics - Handouts

This document provides an introduction to statistics. It discusses key concepts including what statistics is, why it is studied, and basic vocabulary. Statistics is defined as the science of collecting, describing, and interpreting data. It can be used to present and describe business data, draw conclusions about populations using samples, and make forecasts. Descriptive statistics involves collecting and summarizing data, while inferential statistics draws conclusions about populations based on samples. The document also covers levels of measurement, sources of data, types of variables, and how statistics is used at different levels of management.

Uploaded by

Easwar Kumar
7/20/2015

Introduction to Statistics
By
Dr. Vishal Singh Patyal

Learning Objectives
In this chapter, you will learn:

What is Statistics
Why Statistics
Basic vocabulary used in Statistics
How statistics is used in Business
The sources of data and its types used in Business
Types of Variables
Level of Management
Tabular and Graphical Presentation of Data

What is Statistics?
The science of collecting, describing, and interpreting data.
Statistics is a way to get information from data.
Data -> Statistics -> Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.

Why Study Statistics?


Decision Makers Use Statistics To:
Present and describe business data and information properly
Draw conclusions about large populations, using information
collected from samples
Make reliable forecasts about a business activity
Improve business processes

Types of Statistics
Statistics: The branch of mathematics that transforms data into useful information for decision makers.
Descriptive Statistics: Collecting, summarizing, and describing data.
Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data.

Descriptive Statistics
Collect data
ex. Survey
Present data
ex. Tables and graphs
Characterize data
ex. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$

Collect
Organize
Summarize
Display
Analyze
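
As a small illustration of characterizing data, the sample mean can be computed by summing the observations and dividing by their count. This is only a sketch: the values are the student marks used as example data elsewhere in this handout, and the helper name `sample_mean` is an assumption made for illustration.

```python
# Sample mean: add up the observations and divide by how many there are.
# The marks are the example values {67, 74, 71, 83, 93, 55, 48} from this handout.
marks = [67, 74, 71, 83, 93, 55, 48]


def sample_mean(values):
    """Return sum(X_i) / n for a non-empty list of numbers."""
    return sum(values) / len(values)


x_bar = sample_mean(marks)
print(f"n = {len(marks)}, sample mean = {x_bar:.2f}")
```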

Inferential Statistics
Estimation
ex. Estimate the population mean weight using the sample mean weight
Hypothesis testing
ex. Test the claim that the population mean weight is 120 pounds

Predict and forecast values of population parameters
Test hypotheses about values of population parameters
Make decisions

Drawing conclusions and/or making decisions concerning a population based on sample results.

Example
A recent study examined the QUANT and VERBAL CAT scores of students across the country. Which of the following statements are descriptive in nature and which are inferential? (D = descriptive, I = inferential)
The mean QUANT CAT score was 492. (D)
The mean VERBAL CAT score was 475. (D)
Students in the Northeast scored higher in QUANT but lower in VERBAL. (I)
80% of all students taking the exam were headed for IIMs. (I)
32% of the students scored above 610 on the VERBAL CAT. (D)
The QUANT CAT scores are higher than they were 10 years ago. (I)

Basic Vocabulary of Statistics


Population
A population consists of all the items or individuals about
which you want to draw a conclusion.
A population is the group of all items of interest to a
statistics practitioner.
Frequently very large; sometimes infinite.
E.g. the entire population of India (about 1.252 billion people), i.e. census data.
Sample
A subset of the population.
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.

Basic Vocabulary of Statistics


Population
Parameter: Measures used to describe the population are called parameters.
Sample (a subset of the population)
Statistic: Measures computed from sample data are called statistics.

Basic Vocabulary of Statistics


Variable
A variable is some characteristic of a population or sample.
E.g. student grades. Typically denoted with a capital letter: A,
B, C
The values of the variable are the range of possible values for
a variable.
E.g. student marks (0..100)
Data
Data are the observed values of a variable.
Data are the different values associated with a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

Example
The dean of NICMAR Institute is interested in learning about the average age of faculty. Identify the basic terms in this situation.
The population is the age of all faculty members at the Institute.
A sample is any subset of that population. For example, we might
select 10 faculty members and determine their age.
The variable is the age of each faculty member.
The data would be the set of values in the sample.
The parameter of interest is the average age of all faculty at the
Institute.
The statistic is the average age for all faculty in the sample.
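
The distinction between the parameter and the statistic in this example can be sketched in a few lines of code. Everything numeric here is an assumption: the 60 faculty ages are made-up values generated with a fixed seed, and the sample size of 10 simply mirrors the example above.

```python
import random

# Hypothetical population: ages of 60 faculty members (made-up values).
random.seed(1)
population_ages = [random.randint(28, 65) for _ in range(60)]

# Parameter: a measure computed from the whole population.
parameter = sum(population_ages) / len(population_ages)

# Sample: 10 faculty members drawn without replacement.
sample_ages = random.sample(population_ages, 10)

# Statistic: the same measure computed from the sample only.
statistic = sum(sample_ages) / len(sample_ages)

print(f"Population mean age (parameter): {parameter:.1f}")
print(f"Sample mean age (statistic):     {statistic:.1f}")
```

The statistic will usually differ somewhat from the parameter, which is why inferential statistics is needed to reason from one to the other.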

Why Collect Data?


A marketing research analyst needs to assess the
effectiveness of a new television advertisement.
A pharmaceutical manufacturer needs to determine whether
a new drug is more effective than those currently in use.
An operations manager wants to monitor a manufacturing
process to find out whether the quality of product being
manufactured is conforming to company standards.
An auditor wants to review the financial transactions of a
company in order to determine whether the company is in
compliance with generally accepted accounting principles.


Sources of Data
Primary Sources:
The data collector is the one using the data for analysis
Data from a political survey
Data collected from an experiment
Observed data
Secondary Sources
The person performing data analysis is not the data collector
Analyzing census data
Examining data from print journals or data published on
the internet.

Types of Variables
Data
Categorical (defined categories)
Examples: Marital Status, Political Party, Eye Color
Numerical
Discrete (counted items)
Examples: Number of Children, Defects per hour
Continuous (measured characteristics)
Examples: Weight, Voltage

Types of Variables
Categorical
Qualitative variables have values that can only be placed into
categories, such as yes and no.
A variable that categorizes or describes an element of a
population.
Note: Arithmetic operations, such as addition and averaging, are not
meaningful for data resulting from a qualitative variable
Numerical
Quantitative variables have values that represent quantities.
A variable that quantifies an element of a population.
Note: Arithmetic operations such as addition and averaging, are
meaningful for data resulting from a quantitative variable.

Example
Identify each of the following examples as attribute (qualitative) or
numerical (quantitative) variables.
The amount of CNG pumped by the next 10 customers at the local HP pump. (Numerical)
The amount of radon in the basement of each of 25 homes in a
new development. (Numerical)
The color of the baseball cap worn by each of 20 students.
(Attribute)
The length of time to complete a mathematics homework
assignment. (Numerical)
The state in which each truck is registered when stopped and
inspected at a weigh station. (Attribute)

Question?
Identify each of the following as examples of qualitative or
quantitative variables:
The temperature in Barrow, Alaska at 12:00 pm on any
given day.
The make of automobile driven by each faculty member.
Whether or not a 6 volt lantern battery is defective.
The weight of a lead pencil.
The length of time billed for a long distance telephone call.
The brand of cereal children eat for breakfast.
The type of book taken out of the library by an adult.


Level of Measurement

NOIR: Nominal, Ordinal, Interval, Ratio

Nominal scale
A nominal scale classifies data into distinct categories in
which no ranking is implied.

Categorical Variables          Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth, Value, Other
Internet Provider              Microsoft Network / AOL

Ordinal scale
An ordinal scale classifies data into distinct
categories in which ranking is implied

Categorical Variable             Ordered Categories
Student class designation        Freshman, Junior, Senior
Product satisfaction             Satisfied, Neutral, Unsatisfied
Faculty rank                     Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor's bond ratings   AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                   A, B, C, D, F

Interval scale
An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true zero
point.
Example:
the difference between 1 and 2 years of age is the
same amount as the difference between 21 and 22
years of age, or 50 and 51, or 65 and 66.
the difference between a height of 60 inches and a
height of 55 inches is the same amount of difference as
a height of 72 inches and a height of 67 inches.
NOTE: For interval level variables, it is mathematically legitimate to add and subtract values, as well as to count the values and to sort or rank them. Ratios of values (multiplying and dividing) are meaningful only when there is a true zero point.

Levels of Measurement
A ratio scale is an ordered scale in which the difference
between the measurements is a meaningful quantity.
Ratio level variables have the additional property of
having a true zero value so that ratios between values are
meaningful, but practically speaking, ratio level data is
treated the same as interval level.
Example
number of clients in past six months
It is meaningful to say that ...we had twice as many
clients in this period as we did in the previous six months.


The Hierarchy of Levels

Ratio: Absolute zero
Interval: Distance is meaningful
Ordinal: Attributes can be ordered
Nominal: Attributes are only named; weakest

[Figure: Level of Measurement: Decision Tree]

[Figure: Level of Measurement: Characteristics]

[Figure: Level of Measurement: Statistical Tests]

Example
Identify each of the following as examples of (1) nominal, (2)
ordinal, (3) discrete, or (4) continuous variables:
The length of time until a pain reliever begins to work.
The number of chocolate chips in a cookie.
The number of colors used in a statistics textbook.
The brand of refrigerator in a home.
The overall satisfaction rating of a new car.
The number of files on a computer's hard disk.
The pH level of the water in a swimming pool.
The number of staples in a stapler.

Class Exercise
Q 1: Determine whether the variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Determine the level of measurement.
Amount of money spent on clothing in past
month?
Favorite department store?
Most likely time period during which shopping for
clothing takes place?
Number of pairs of shoes owned?

Class Exercise
Q 2: A manufacturer of dog food was planning to survey households in India to determine the purchasing habits of dog owners. Among the variables to be collected are:
The primary place of purchase of dog food?
Whether dry or moist food is purchased?
Number of dogs living in the household?
Whether the dog is pedigreed?

Class Exercise
Q 3: Suppose the following information is collected from Mr X on his application for a home loan at the HDFC Bank loan department:
a. Monthly payment: Rs 25100
b. Annual family income:
c. Marital status: Married
d. Number of jobs changed in past 10 years: 2
Classify each of the responses by type of data and level of measurement.

Organizing and Visualizing Categorical and Numerical Data

Categorical Data Are Organized By Utilizing Tables
Categorical Data: Tallying Data
One Categorical Variable: Summary Table
Two Categorical Variables: Contingency Table

Organizing Categorical Data: Summary Table
A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.

How do you spend the holidays?   Percent
At home with family              45%
Travel to visit family           38%
Vacation                         5%
Catching up on work              5%
Other                            7%

Contingency Table
Used to study patterns that may exist
between the responses of two or more
categorical variables
Cross tabulates or tallies jointly the responses
of the categorical variables
For two variables the tallies for one variable
are located in the rows and the tallies for the
second variable are located in the columns

Contingency Table - Example
A random sample of 400 invoices is drawn.
Each invoice is categorized as a small, medium, or large amount.
Each invoice is also examined to identify if there are any errors.
These data are then organized in the contingency table below.

Contingency Table Showing Frequency of Invoices Categorized By Size and The Presence Of Errors

                 No Errors   Errors   Total
Small Amount     170         20       190
Medium Amount    100         40       140
Large Amount     65          5        70
Total            335         65       400

Contingency Table Based on % of Overall Total

                 No Errors   Errors   Total
Small Amount     170         20       190
Medium Amount    100         40       140
Large Amount     65          5        70
Total            335         65       400

42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400

                 No Errors   Errors   Total
Small Amount     42.50%      5.00%    47.50%
Medium Amount    25.00%      10.00%   35.00%
Large Amount     16.25%      1.25%    17.50%
Total            83.75%      16.25%   100.0%

83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.

Contingency Table Based on % of Row Totals

                 No Errors   Errors   Total
Small Amount     170         20       190
Medium Amount    100         40       140
Large Amount     65          5        70
Total            335         65       400

89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70

                 No Errors   Errors   Total
Small Amount     89.47%      10.53%   100.0%
Medium Amount    71.43%      28.57%   100.0%
Large Amount     92.86%      7.14%    100.0%
Total            83.75%      16.25%   100.0%

Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.

Contingency Table Based on Percentage of Column Total

                 No Errors   Errors   Total
Small Amount     170         20       190
Medium Amount    100         40       140
Large Amount     65          5        70
Total            335         65       400

50.75% = 170 / 335
30.77% = 20 / 65

                 No Errors   Errors   Total
Small Amount     50.75%      30.77%   47.50%
Medium Amount    29.85%      61.54%   35.00%
Large Amount     19.40%      7.69%    17.50%
Total            100.0%      100.0%   100.0%

There is a 61.54% chance that invoices with errors are of medium size.
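
The three kinds of percentages (overall, row, and column) differ only in the denominator. The sketch below recomputes them from the invoice counts in the contingency table example; the dictionary layout and the helper name `pct` are assumptions chosen for illustration, not part of the handout.

```python
# Invoice counts from the contingency table example (400 sampled invoices).
counts = {
    "Small": {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large": {"No Errors": 65, "Errors": 5},
}
columns = ["No Errors", "Errors"]

row_totals = {size: sum(row.values()) for size, row in counts.items()}
col_totals = {c: sum(counts[s][c] for s in counts) for c in columns}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Same counts, three different denominators.
overall_pct = {s: {c: pct(counts[s][c], grand_total) for c in columns} for s in counts}
row_pct = {s: {c: pct(counts[s][c], row_totals[s]) for c in columns} for s in counts}
col_pct = {s: {c: pct(counts[s][c], col_totals[c]) for c in columns} for s in counts}

for size in counts:
    print(size, "overall:", overall_pct[size], "row:", row_pct[size], "column:", col_pct[size])
```

Row percentages answer "of the invoices of this size, what share has errors?", while column percentages answer "of the invoices with errors, what share is this size?".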

Tables Used For Organizing Numerical Data
Numerical Data: Ordered Array, Frequency Distributions, Cumulative Distributions

Organizing Numerical Data: Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.

Age of Surveyed College Students
Day Students: 16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Organizing Numerical Data:


Frequency Distribution
The frequency distribution is a summary table in which the
data are arranged into numerically ordered class groupings.
You must give attention to selecting the appropriate number
of class groupings for the table, determining a suitable width
of a class grouping, and establishing the boundaries of each
class grouping to avoid overlapping.
The number of classes depends on the number of values in
the data. With a larger number of values, typically there are
more classes. In general, a frequency distribution should
have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (highest value - lowest value) of the data by the number of class groupings desired.

Organizing Numerical Data:


Frequency Distribution Example
Example: A manufacturer of insulation randomly
selects 20 winter days and records the daily
high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41,
43, 44, 27, 53, 27


STEPS
1. Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find range: 58 - 12 = 46
3. Select number of classes: 5 (usually between 5 and 15)
4. Compute class interval (width): 10 (46/5 then round up)
5. Determine class boundaries (limits):
   Class 1: 10 to less than 20
   Class 2: 20 to less than 30
   Class 3: 30 to less than 40
   Class 4: 40 to less than 50
   Class 5: 50 to less than 60
6. Compute class midpoints: 15, 25, 35, 45, 55
7. Count observations & assign to classes

Organizing Numerical Data: Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Midpoint   Frequency
10 but less than 20   15         3
20 but less than 30   25         6
30 but less than 40   35         5
40 but less than 50   45         4
50 but less than 60   55         2
Total                            20
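
The steps for building a frequency distribution can be sketched in code using the 20 daily high temperatures from the example. Two choices here are assumptions rather than rules from the handout: the width is rounded up with `math.ceil`, and the first lower boundary is taken as the smallest value rounded down to a multiple of the width.

```python
import math

# Daily high temperatures from the insulation manufacturer example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(temps)                            # step 1: ordered array
data_range = ordered[-1] - ordered[0]              # step 2: 58 - 12
num_classes = 5                                    # step 3: chosen by the analyst
width = math.ceil(data_range / num_classes)        # step 4: 46 / 5, rounded up
start = (ordered[0] // width) * width              # assumed rule for the first boundary

# Step 5: class boundaries as (lower, upper) pairs; upper boundary is excluded.
classes = [(start + i * width, start + (i + 1) * width) for i in range(num_classes)]
# Step 6: class midpoints.
midpoints = [(lo + hi) // 2 for lo, hi in classes]
# Step 7: count the observations that fall in each class.
frequencies = [sum(lo <= t < hi for t in ordered) for lo, hi in classes]

for (lo, hi), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lo} but less than {hi}: midpoint {mid}, frequency {freq}")
```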

Organizing Numerical Data: Relative & Percent Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100

Organizing Numerical Data: Cumulative Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15%          3                      15%
20 but less than 30   6           30%          9                      45%
30 but less than 40   5           25%          14                     70%
40 but less than 50   4           20%          18                     90%
50 but less than 60   2           10%          20                     100%
Total                 20          100          20                     100%
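
Relative, percentage, and cumulative columns all follow from the class frequencies. The sketch below derives them for the temperature example; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies from the daily high temperature example.
labels = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
          "40 but less than 50", "50 but less than 60"]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

relative = [f / total for f in frequencies]            # share of observations per class
percentage = [100 * f / total for f in frequencies]    # the same share, as a percentage
cum_frequency = list(accumulate(frequencies))          # running total of frequencies
cum_percentage = [100 * c / total for c in cum_frequency]

for row in zip(labels, frequencies, relative, percentage, cum_frequency, cum_percentage):
    print(row)
```

Because these columns are scaled by the total, they are the ones to use when comparing groups with different sample sizes.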

Why Use a Frequency Distribution?


It condenses the raw data into a more useful form
It allows for a quick visual interpretation of the data
It enables the determination of the major
characteristics of the data set including where the data
are concentrated / clustered


Frequency Distributions:
Some Tips
Different class boundaries may provide different
pictures for the same data (especially for smaller
data sets)
Shifts in data concentration may show up when
different class boundaries are chosen
As the size of the data set increases, the impact of
alterations in the selection of class boundaries is
greatly reduced
When comparing two or more groups with different
sample sizes, you must use either a relative
frequency or a percentage distribution

Visualizing Categorical Data Through Graphical Displays
Categorical Data: Visualizing Data
Summary Table For One Variable: Bar Chart, Pareto Chart, Pie Chart
Contingency Table For Two Variables: Side By Side Bar Chart

Organizing Categorical Data: Bar Chart
In a bar chart, a bar shows each category, the length of which represents the amount, frequency or percentage of values falling into a category.

[Bar chart: How Do You Spend the Holidays? Horizontal axis from 0% to 50%. At home with family 45%; Travel to visit family 38%; Vacation 5%; Catching up on work 5%; Other 7%.]

Organizing Categorical Data: Pie Chart
The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

[Pie chart: How Do You Spend the Holidays? At home with family 45%; Travel to visit family 38%; Vacation 5%; Catching up on work 5%; Other 7%.]

Organizing Categorical Data:


Pareto Diagram
Used to portray categorical data
A bar chart, where categories are shown in
descending order of frequency
A cumulative polygon is shown in the same graph
Used to separate the vital few from the trivial
many
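
The ordering and cumulative line behind a Pareto diagram can be sketched as follows. The data are the holiday percentages from the summary table earlier in this handout, used here only as a convenient example; the variable names are assumptions.

```python
from itertools import accumulate

# Percentages from the "How do you spend the holidays?" summary table.
holiday_pct = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}

# Bars: categories in descending order of frequency (ties keep their original order).
ordered = sorted(holiday_pct.items(), key=lambda item: item[1], reverse=True)

# Cumulative polygon: running total of the sorted percentages.
cumulative = list(accumulate(pct for _, pct in ordered))

for (category, pct), cum in zip(ordered, cumulative):
    print(f"{category:<24}{pct:>4}%   cumulative {cum:>3}%")
```

Reading down the cumulative column shows how few categories account for most of the responses, which is the "vital few" idea.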

Organizing Categorical Data: Pareto Diagram

[Pareto diagram: Current Investment Portfolio. Categories: Stocks, Bonds, Savings, CD. Left axis: % invested in each category (bar graph), 0% to 45%. Right axis: cumulative % invested (line graph), 0% to 100%.]

Visualizing Categorical Data: Side By Side Bar Charts
The side by side bar chart represents the data from a contingency table.

                 No Errors   Errors   Total
Small Amount     50.75%      30.77%   47.50%
Medium Amount    29.85%      61.54%   35.00%
Large Amount     19.40%      7.69%    17.50%
Total            100.0%      100.0%   100.0%

[Side by side bar chart: Invoice Size Split Out By Errors & No Errors. Bars for Errors and No Errors within Large, Medium, Small, and Total; horizontal axis from 0.0% to 70.0%.]

Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%).

Visualizing Numerical Data By Using Graphical Displays
Numerical Data
Ordered Array: Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive

Organizing Numerical Data: Stem and Leaf Display
A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Age of Surveyed College Students
Day Students: 16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Day Students        Night Students
Stem  Leaf          Stem  Leaf
1     67788899      1     8899
2     0012257       2     0138
3     28            3     23
4     2             4     15
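
A stem-and-leaf display for two-digit values can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below does this for the college student ages; the function name `stem_and_leaf` is an assumption for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}


day_students = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night_students = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for name, ages in [("Day Students", day_students), ("Night Students", night_students)]:
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```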

Visualizing Numerical Data:


The Histogram
A graph of the data in a frequency distribution is
called a histogram.
In a histogram there are no gaps between adjacent
bars.
The class boundaries (or class midpoints) are shown
on the horizontal axis.
The vertical axis is either frequency, relative
frequency, or percentage.
Bars of the appropriate heights are used to represent
the number of observations within each class.

Visualizing Numerical Data: The Histogram

Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100

[Histogram: Daily High Temperature. Vertical axis: Frequency; horizontal axis: class midpoints 15, 25, 35, 45, 55.]

Visualizing Numerical Data:


The Polygon
A percentage polygon is formed by having the
midpoint of each class represent the data in that class
and then connecting the sequence of midpoints at
their respective class percentages.
The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
Useful when there are two or more groups to compare.

Visualizing Numerical Data: The Frequency Polygon

Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100

[Frequency polygon: Daily High Temperature. Vertical axis: Frequency; horizontal axis: class midpoints 15, 25, 35, 45, 55.]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)

Organizing Numerical Data: The Cumulative Percentage Polygon

Class     Lower Boundary   % Less Than Lower Boundary
10<20     10               0
20<30     20               15
30<40     30               45
40<50     40               70
50<60     50               90
          60               100

[Ogive: Daily High Temperature. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis: class boundaries 10 to 60.]
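
The points plotted on an ogive are the percentages of observations that fall below each class boundary. As a sketch, they can be computed directly from the raw temperature data with a binary search on the ordered array; the names `boundaries` and `ogive_points` are assumptions for illustration.

```python
from bisect import bisect_left

# Daily high temperatures from the frequency distribution example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
ordered = sorted(temps)
boundaries = [10, 20, 30, 40, 50, 60]

# bisect_left returns how many ordered values are strictly less than the boundary.
ogive_points = [(b, 100 * bisect_left(ordered, b) / len(ordered)) for b in boundaries]

for boundary, pct_below in ogive_points:
    print(f"{pct_below:5.1f}% of days had a high below {boundary}")
```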

Scatter Plots
Scatter plots are used for numerical data
consisting of paired observations taken from
two numerical variables
One variable is measured on the vertical axis
and the other variable is measured on the
horizontal axis
Scatter plots are used to examine possible
relationships between two numerical
variables

Scatter Plot Example

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Scatter plot: Cost per Day vs. Production Volume. Vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (20 to 70).]
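
A scatter plot suggests a relationship visually; one common numerical companion, which the slides themselves do not cover, is the Pearson correlation coefficient. The sketch below computes it for the volume and cost data, assuming the nine volume values pair with the nine cost values in the order listed. The function name `pearson_r` is illustrative.

```python
import math

# Paired observations from the scatter plot example (pairing by listed order is assumed).
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(volume, cost)
print(f"Correlation between volume and cost: {r:.3f}")
```

A value close to +1 is consistent with the upward pattern in the plot: days with higher production volume tend to have higher cost.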

Time Series
A Time Series Plot is used to study patterns in the
values of a numeric variable over time
The Time Series Plot:
Numeric variable is measured on the vertical axis and
the time period is measured on the horizontal axis
Attendance (in millions) at USA amusement/theme parks from 2000-2005

Year   Year Number   Attendance
2000   0             317
2001   1             319
2002   2             324
2003   3             322
2004   4             328
2005   5             335
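
One simple way to read the pattern in a time series before plotting it is to look at the change from each period to the next. The sketch below does this for the attendance figures; it is an illustrative aid, not a method prescribed by the handout.

```python
# Attendance (in millions) at USA amusement/theme parks, 2000-2005.
years = [2000, 2001, 2002, 2003, 2004, 2005]
attendance = [317, 319, 324, 322, 328, 335]

# Year-over-year change: each value minus the one before it.
changes = [later - earlier for earlier, later in zip(attendance, attendance[1:])]

for year, change in zip(years[1:], changes):
    print(f"{year}: {change:+d} million vs. previous year")
```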

Time Series Example

[Time series plot: Attendance (in millions) at US Theme Parks. Vertical axis: Attendance (316 to 336); horizontal axis: Year (Since 2000).]

Principles of Excellent Graphs


The graph should not distort the data.
The graph should not contain unnecessary
adornments (sometimes referred to as chart junk).
The scale on the vertical axis should begin at zero.
All axes should be properly labeled.
The graph should contain a title.
The simplest possible graph should be used for a
given set of data.

Graphical Errors: Chart Junk

[Figure: Minimum Wage. Bad Presentation: chart labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good Presentation: chart of the minimum wage for 1960, 1970, 1980, and 1990.]

Graphical Errors: No Relative Basis

[Figure: A's received by students. Bad Presentation: frequencies (Freq., 0 to 300) for FR, SO, JR, SR. Good Presentation: percentages (0% to 30%) for FR, SO, JR, SR.]
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior

Graphical Errors: Compressing the Vertical Axis

[Figure: Quarterly Sales for Q1 to Q4. Bad Presentation: vertical axis from 0 to 200. Good Presentation: vertical axis from 0 to 50.]

Class Exercise 1
The owner of a restaurant wanted to study the demand for dessert. He decided that, in addition to studying whether dessert was ordered, he would also study the gender of the individual. Data were collected from 600 customers and organized in the following contingency table.

                   Gender
Dessert Ordered    Male   Female   Total
Yes                40     96       136
No                 240    224      464
Total              280    320      600

a. Construct contingency tables based on row, column, and total percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for each gender?
c. What conclusions concerning the pattern of dessert ordering can the restaurant owner reach?

Class Exercise 2
The following table represents estimated green power sales by renewable energy source in 2008.

Source                      Percentage
Geothermal                  2.8
Hydro                       11.3
Landfill mass and biomass   28.1
Solar                       0.2
Unreported                  2.5
Wind                        55.1

a. Construct a bar chart, a pie chart, and a Pareto chart.
b. What conclusions can you reach about the sources of green power?
Source: National Renewable Energy Laboratory, 2008

Class Exercise 3

Calculate the following:
a. Divide the data into classes
b. Absolute frequency
c. Relative frequency
d. Percentages
e. Cumulative frequency
f. Cumulative percentage
g. Midpoints
h. Draw histogram and relative frequency polygon

THANKS