Lecture 2


OPRE 6301-SYSM 6303: Chapter 2

Graphical Descriptive Techniques I

Types of Data and Information


The objective of statistics is to extract information from data. There are different types of
data and information. To help explain this principle, we need to define some terms:

• Data: facts and figures from which conclusions can be drawn. For example, your
  income, your age, and your education level are all data.

• Variable: a characteristic of interest that we define before collecting data. For
  example, the mark on a statistics exam is a characteristic of statistics exams that
  is certainly of interest to students of this class. Not all students achieve the
  same mark; the marks vary from student to student, hence the name variable. We
  usually denote a variable by a capital letter such as X, Y, or Z.

• Data set: for each element of the study, we collect data items for the variables
  of interest; the result is the data set.

Data fall into two main groups:

1. Quantitative (numerical or interval) data

2. Qualitative (categorical) data:

   • Nominal

   • Ordinal

Quantitative Data: The values are real numbers; such data are also called interval or
numerical data. For example: heights, weights, prices, etc. All arithmetic calculations
are meaningful. These data represent quantities measured with a fixed or standard unit of
measure. For instance, age, income, and temperature are all numbers that represent a
quantity and have a unit: income could be in dollars and temperature in degrees Fahrenheit.

Qualitative Data: The values represent categories. For example: marital status, coded as
1 = Single; 2 = Married; 3 = Divorced; 4 = Widowed. Arithmetic calculations on this type of
data are meaningless, even when the categories are coded as numbers. Qualitative data are
also called categorical.
This type can be:

• Either non-numeric or numeric (coded)

• Nominal: no meaningful order exists. Example: gender

• Ordinal: the categories can be ordered. Example: customer level of satisfaction

NOTE:

• All calculations are permitted on interval data.

• Only calculations involving a ranking process are allowed for ordinal data.

• No calculations are allowed for nominal data, other than counting the number of
  observations in each category.
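The three rules above can be illustrated with a minimal Python sketch. The marital status coding comes from these notes; the sample values, the satisfaction ranks, and the ages are invented purely for illustration, and the variable names are our own.

```python
from collections import Counter
from statistics import mean, median_low

# Nominal: marital status codes as defined in the notes
# (1 = Single, 2 = Married, 3 = Divorced, 4 = Widowed).
# The sample values themselves are hypothetical.
marital_status = [1, 2, 2, 4, 1, 2, 3, 2]
status_counts = Counter(marital_status)  # counting per category is valid
# mean(marital_status) would run, but the result would have no meaning.

# Ordinal: hypothetical satisfaction ranks, 1 = poor ... 5 = excellent.
satisfaction = [3, 5, 4, 2, 4, 5, 3]
middle_rank = median_low(satisfaction)   # uses only the ordering of the values

# Interval: hypothetical ages in years; arithmetic is meaningful.
ages = [23, 31, 27, 45, 38]
average_age = mean(ages)

print(status_counts[2], middle_rank, average_age)
```

The point of the sketch is that the software will happily average category codes; it is up to the analyst to know which calculations the data type supports.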

What type of data is each of the following?

• Postal zip code

• Number of siblings

• Highest degree earned

• Annual salary

• Hair color

Describing a set of nominal or ordinal data:


The only allowable calculation on nominal data is to count the frequency of each value of
the variable or to compute the percentage that each value represents. We can summarize the
data in a table that lists the categories and their counts, called a frequency distribution.
A relative frequency distribution lists the categories and the proportion of the data
belonging to each category.

Frequency distribution: a tabular summary of data showing the frequency (or number)
of observations in each of several non-overlapping categories or classes.

Frequency = number of times a value was observed in the data set
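Counting frequencies is easy to sketch in Python with `collections.Counter`. The ratings below are hypothetical values made up for this illustration; they are NOT the Marada Inn sample, and the variable names are our own.

```python
from collections import Counter

# Hypothetical ratings for illustration only; NOT the Marada Inn sample.
ratings = ["Average", "Excellent", "Above average", "Average", "Poor",
           "Above average", "Above average", "Excellent", "Average",
           "Above average"]

# List the categories in their natural order so the table reads sensibly.
categories = ["Poor", "Below average", "Average", "Above average", "Excellent"]

counts = Counter(ratings)
# A Counter returns 0 for categories that never occur in the data.
frequency = {category: counts[category] for category in categories}

for category, freq in frequency.items():
    print(f"{category:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the categories explicitly, rather than taking them from the data, keeps a row in the table for any category that was never observed.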

Example (Marada Inn): Guests staying at Marada Inn were asked to rate the quality of
their accommodations as being excellent, above average, average, below average, or poor.
The ratings provided by a sample of 20 guests are:

Rating Frequency
Poor
Below average
Average
Above average
Excellent
Total

Table 1: Frequency distribution of ratings



Relative frequency distribution: represents the fraction or proportion of the total number
of data items belonging to each class.

Relative frequency of a class = (frequency of the class) / (total number of observations)

Percent relative frequency distribution (%) represents the relative frequency multiplied
by 100 (percent frequency of the data for each class).

Percent relative frequency = Relative frequency of a class × 100
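As a worked example of the two formulas, the sketch below uses the newspaper totals that appear in Table 3 later in these notes (89, 112, 81, and 72 readers out of 354). The variable names are our own.

```python
# Newspaper totals taken from Table 3 of these notes (354 readers in all).
frequency = {"G&M": 89, "Post": 112, "Star": 81, "Sun": 72}

total = sum(frequency.values())

# Relative frequency = frequency of the class / total number of observations.
relative = {paper: count / total for paper, count in frequency.items()}

# Percent relative frequency = relative frequency x 100.
percent = {paper: 100 * rel for paper, rel in relative.items()}

for paper in frequency:
    print(f"{paper:<5}{frequency[paper]:>5}{relative[paper]:>8.2f}{percent[paper]:>8.1f}%")
```

The relative frequencies always sum to 1 before rounding, which is a quick check on the arithmetic.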

Rating           Relative Frequency   Percent Frequency
Poor
Below average
Average
Above average
Excellent
Total

Table 2: Relative and percent frequency distributions of ratings



Bar chart: a graphical display for depicting categorical data summarized in a frequency,
relative frequency, or percent frequency distribution.

• Horizontal axis: the labels that are used for the classes (categories)

• Vertical axis: frequency, relative frequency, or percent frequency scale

The bars are separated to emphasize the fact that each class is a separate category.

Pie chart: a commonly used graphical device for presenting relative frequency and percent
frequency distributions for categorical data. The graph is constructed as a circle that
is subdivided into sectors; the area of each sector represents the relative (or percent)
frequency of the corresponding class.
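Because a sector's area is proportional to its central angle, the sectors of a pie chart can be sized by multiplying each relative frequency by 360 degrees. A minimal sketch, again using the newspaper totals from Table 3 of these notes (the variable names are our own):

```python
# Newspaper totals taken from Table 3 of these notes.
frequency = {"G&M": 89, "Post": 112, "Star": 81, "Sun": 72}
total = sum(frequency.values())

# Central angle of each sector = relative frequency x 360 degrees.
angles = {paper: 360 * count / total for paper, count in frequency.items()}

for paper, angle in angles.items():
    print(f"{paper:<5}{angle:6.1f} degrees")
```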

There are two ways to describe the relationship between two nominal variables:

• Tabular method: a cross-classification table that lists the frequency of each
  combination of the classes.

• Graphical method: bar charts.

Example 2.4: A major North American city has four competing newspapers: the Globe and
Mail (G & M), Post, Star, and Sun. To help design advertising campaigns, the advertising
managers of the newspapers need to know which segments of the newspaper market are
reading their papers. A survey was conducted to analyze the relationship between newspapers
read and occupation. A sample of newspaper readers was asked to report which newspaper
they read - Globe and Mail (1), Post (2), Star (3), Sun (4) - and indicate whether they were
blue-collar workers (1), white-collar workers (2), or professionals (3). Some of the data are
listed here (full data set is given in Xm02-04).

Reader   Occupation   Newspaper

1        2            2
2        1            4
3        2            1
.        .            .
.        .            .
.        .            .
352      3            2
353      1            3
354      2            3

Determine whether the two nominal variables are related.



A cross-classification table of frequencies describes the relationship between two nominal
variables.

Table 3: Cross-classification table of frequencies

Occupation      G&M   Post   Star   Sun   Total

Blue Collar      27     18     38    37     120
White Collar     29     43     21    15     108
Professional     33     51     22    20     126
Total            89    112     81    72     354
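A table like Table 3 is built by counting how often each (occupation, newspaper) pair of codes occurs. The sketch below does this for only the six readers listed in the data excerpt (readers 1, 2, 3, 352, 353, and 354), so its counts are not those of Table 3; the full 354-row file Xm02-04 is not reproduced here. The variable names are our own.

```python
from collections import Counter

OCCUPATIONS = {1: "Blue Collar", 2: "White Collar", 3: "Professional"}
NEWSPAPERS = {1: "G&M", 2: "Post", 3: "Star", 4: "Sun"}

# (occupation code, newspaper code) for the six readers shown in the
# excerpt only: readers 1, 2, 3, 352, 353 and 354.
records = [(2, 2), (1, 4), (2, 1), (3, 2), (1, 3), (2, 3)]

pair_counts = Counter(records)

# One row per occupation, one column per newspaper.
table = {
    occ_name: {paper_name: pair_counts[(occ, paper)]
               for paper, paper_name in NEWSPAPERS.items()}
    for occ, occ_name in OCCUPATIONS.items()
}

for occ_name, row in table.items():
    cells = (f"{n:>5}" for n in row.values())
    print(f"{occ_name:<13}", *cells, f"{sum(row.values()):>6}")
```

Applying the same counting to all 354 records would be expected to reproduce the frequencies in Table 3.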

If occupation and newspaper are related, there will be differences in the newspapers read
among the occupations. An easy way to see this is to convert the frequencies in each row
(or column) to relative frequencies in each row (or column). That is, compute the row (or
column) totals and divide each frequency by its row (or column) total, as shown in the
following table.
Note that, because of rounding, the entries in a row may not sum to exactly 1.

Table 4: Table of row relative frequencies

Occupation      G&M    Post   Star   Sun    Total

Blue Collar     0.23   0.15   0.32   0.31   1.00
White Collar    0.27   0.40   0.19   0.14   1.00
Professional    0.26   0.40   0.17   0.16   1.00
Total           0.25   0.32   0.23   0.20   1.00
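The row relative frequencies can be reproduced from the counts in Table 3 by dividing each frequency by its row total. A minimal sketch, with variable names of our own choosing:

```python
papers = ["G&M", "Post", "Star", "Sun"]

# Frequencies from Table 3.
counts = {
    "Blue Collar":  [27, 18, 38, 37],
    "White Collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}
# Column sums give the "Total" row of the table.
counts["Total"] = [sum(column) for column in zip(*counts.values())]

# Divide each frequency by its row total.
row_relative = {occ: [n / sum(row) for n in row] for occ, row in counts.items()}

print(f"{'Occupation':<13}", *(f"{p:>6}" for p in papers))
for occ, row in row_relative.items():
    print(f"{occ:<13}", *(f"{p:>6.2f}" for p in row))
```

Column relative frequencies would be computed the same way, dividing by column totals instead of row totals.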

Graphing the relationship


There are several ways to graphically display the relationship between two nominal variables.
Here we use a two-dimensional bar chart for each of the three occupations.

Interpret:
If the two variables are unrelated, then the patterns exhibited in the bar charts should be
approximately the same. If some relationship exists, then some bar charts will differ from
others.
The graphs tell us the same story as did the table. The shapes of the bar charts for
occupations 2 and 3 (white collar and professional) are very similar. Both differ
considerably from the bar chart for occupation 1 (blue collar).
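To compare the shapes without plotting software, one option is a rough text-based bar chart per occupation, drawn from the Table 3 counts. This is only a sketch: the `bar_chart` helper and its `width` parameter are our own, not part of the course materials.

```python
papers = ["G&M", "Post", "Star", "Sun"]

# Frequencies from Table 3.
counts = {
    "Blue Collar":  [27, 18, 38, 37],
    "White Collar": [29, 43, 21, 15],
    "Professional": [33, 51, 22, 20],
}

def bar_chart(row, width=40):
    """Return one text bar per newspaper, scaled by row relative frequency."""
    total = sum(row)
    return [f"{paper:<5}|{'#' * round(width * n / total)} {n / total:.2f}"
            for paper, n in zip(papers, row)]

charts = {occ: bar_chart(row) for occ, row in counts.items()}

for occ, lines in charts.items():
    print(occ)
    print("\n".join(lines))
    print()
```

Using relative rather than raw frequencies puts the three charts on a common scale, so differences in shape are not confused with differences in sample size.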
