Lecture 2
Data: facts and figures from which conclusions can be drawn. For example, your
income, your age, and your education level are all examples of data.
Variable: To collect data, we first have to define the particular characteristics that
we are interested in. These are the variables. For example, the mark on a statistics
exam is a characteristic of statistics exams that is certainly of interest to students of
this class. Not all students achieve the same mark. The marks vary from
student to student, hence the name variable. We usually denote a
variable by a capital letter such as X, Y, or Z.
Data Set: For each element of the study, we collect data items for the variables of
interest.
Nominal
Ordinal
Quantitative Data: Real numbers, also referred to as interval or numerical data. For
example: heights, weights, prices, etc. All arithmetic calculations are meaningful.
These data represent quantities measured with a fixed or standard unit of measure. For
instance, age, income, and temperature are all numbers that represent a quantity and have a
unit. Money, for example, could be in dollars, and temperature in degrees Fahrenheit.
Chapter 2 - OPRE 6301 2
Qualitative Data: The values represent categories. For example: marital status, coded as
1 = Single; 2 = Married; 3 = Divorced; 4 = Widowed. Calculations on this type of data are
meaningless. Nominal data are also called qualitative or categorical.
This type can be:
NOTE:
Only calculations involving a ranking process are allowed for ordinal data.
No calculations are allowed for nominal data, other than counting the number of
observations in each category.
Number of siblings
Annual salary
Hair color
Frequency distribution: a tabular summary of data showing the frequency (or number)
of observations in each of several non-overlapping categories or classes.
Example (Marada Inn): Guests staying at Marada Inn were asked to rate the quality of
their accommodations as excellent, above average, average, below average, or poor.
The ratings provided by a sample of 20 guests are:
Rating Frequency
Poor
Below average
Average
Above average
Excellent
Total
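A frequency distribution like the one above can be tallied with a few lines of code. The
sketch below is only an illustration: the list of 20 ratings is invented for the example
and is NOT the Marada Inn sample, and the helper name frequency_distribution is
hypothetical.

```python
from collections import Counter

# Classes listed in their natural (ordinal) order.
CATEGORIES = ["Poor", "Below average", "Average", "Above average", "Excellent"]

# Hypothetical ratings, invented for illustration only.
# They are NOT the actual Marada Inn sample.
ratings = (
    ["Poor"] * 2
    + ["Below average"] * 3
    + ["Average"] * 5
    + ["Above average"] * 7
    + ["Excellent"] * 3
)


def frequency_distribution(observations, categories):
    """Count the observations in each class, keeping the given class order."""
    counts = Counter(observations)
    # Counter returns 0 for a class that never occurs, so empty classes still appear.
    return {category: counts[category] for category in categories}


freq = frequency_distribution(ratings, CATEGORIES)

# Print a two-column table: Rating, Frequency.
print(f"{'Rating':<15}{'Frequency':>10}")
for category, count in freq.items():
    print(f"{category:<15}{count:>10}")
print(f"{'Total':<15}{sum(freq.values()):>10}")
```

Because the classes do not overlap, every observation is counted exactly once and the
frequencies add up to the sample size.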
Relative frequency distribution: represents the fraction or proportion of the total number
of data items belonging to each class.
Percent relative frequency distribution (%): represents the relative frequency multiplied
by 100 (the percent frequency of the data for each class).
Poor
Below average
Average
Above average
Excellent
Total
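The two definitions translate directly into arithmetic: divide each class frequency by the
total, then multiply by 100 for the percent version. The sketch below assumes a made-up set
of class frequencies (not the Marada Inn results), and the function names are hypothetical.

```python
# Hypothetical class frequencies, invented for illustration only.
freq = {
    "Poor": 2,
    "Below average": 3,
    "Average": 5,
    "Above average": 7,
    "Excellent": 3,
}


def relative_frequencies(frequencies):
    """Relative frequency of a class = class frequency / total number of items."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("cannot compute relative frequencies of an empty data set")
    return {label: count / total for label, count in frequencies.items()}


def percent_frequencies(frequencies):
    """Percent frequency of a class = relative frequency * 100."""
    return {label: 100 * rel for label, rel in relative_frequencies(frequencies).items()}


rel = relative_frequencies(freq)
pct = percent_frequencies(freq)

print(f"{'Rating':<15}{'Relative':>10}{'Percent':>10}")
for label in freq:
    print(f"{label:<15}{rel[label]:>10.2f}{pct[label]:>10.1f}")
print(f"{'Total':<15}{sum(rel.values()):>10.2f}{sum(pct.values()):>10.1f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, up to rounding.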
Bar chart: a graphical display for depicting categorical data summarized in a frequency,
relative frequency, or percent frequency distribution.
Horizontal axis: the labels that are used for the classes (categories)
The bars are separated to emphasize the fact that each class is a separate category.
Pie chart: A commonly used graphical device for presenting relative frequency and percent
frequency distributions for categorical data. The graph is constructed with a circle that
is subdivided into sectors, and the area of each sector represents the relative (or percent)
frequency for each class.
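Since the area of a sector is proportional to its central angle, the angle for each class
is its relative frequency times 360 degrees. A minimal sketch of that calculation follows;
the frequencies are invented for illustration and the function name sector_angles is
hypothetical.

```python
# Hypothetical class frequencies, invented for illustration only.
freq = {
    "Poor": 2,
    "Below average": 3,
    "Average": 5,
    "Above average": 7,
    "Excellent": 3,
}


def sector_angles(frequencies):
    """Central angle (in degrees) of each pie sector: relative frequency * 360."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("cannot draw a pie chart for an empty data set")
    return {label: 360 * count / total for label, count in frequencies.items()}


angles = sector_angles(freq)
for label, angle in angles.items():
    print(f"{label:<15}{angle:>7.1f} degrees")
```

A class holding one quarter of the observations therefore gets a 90-degree sector.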
There are two ways to describe the relationship between two nominal variables:
Tabular method: A cross-classification table that lists the frequency of each
combination of the values of the two variables.
Example 2.4: A major North American city has four competing newspapers: the Globe and
Mail (G & M), Post, Star, and Sun. To help design advertising campaigns, the advertising
managers of the newspapers need to know which segments of the newspaper market are reading
their papers. A survey was conducted to analyze the relationship between newspapers
read and occupation. A sample of newspaper readers was asked to report which newspaper
they read - Globe and Mail (1), Post (2), Star (3), Sun (4) - and indicate whether they were
blue-collar workers (1), white-collar workers (2), or professionals (3). Some of the data are
listed here (full data set is given in Xm02-04).
If occupation and newspaper are related, there will be differences in the newspapers read
among the occupations. An easy way to see this is to convert the frequencies in each row
(or column) to relative frequencies in each row (or column). That is, compute the row (or
column) totals and divide each frequency by its row (or column) total, as shown in the
following table.
Note that the totals may not equal 1 because of rounding.
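The row-wise conversion described above can be sketched in code. The example below uses
the same coding scheme as Example 2.4 (occupation 1 to 3, newspaper 1 to 4), but the
observations are invented for illustration and are NOT the Xm02-04 data; the function
names are hypothetical.

```python
from collections import Counter

OCCUPATIONS = [1, 2, 3]    # 1 = blue collar, 2 = white collar, 3 = professional
NEWSPAPERS = [1, 2, 3, 4]  # 1 = Globe and Mail, 2 = Post, 3 = Star, 4 = Sun

# Hypothetical (occupation, newspaper) pairs, invented for illustration only.
observations = [
    (1, 4), (1, 4), (1, 3), (1, 2),
    (2, 1), (2, 1), (2, 2), (2, 3),
    (3, 1), (3, 1), (3, 2), (3, 2), (3, 4),
]


def cross_tabulate(pairs, rows, cols):
    """Cross-classification table: frequency of each (row, column) combination."""
    counts = Counter(pairs)
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def row_relative_frequencies(table):
    """Divide each frequency by its row total."""
    result = {}
    for r, row in table.items():
        total = sum(row.values())
        # An empty row has no defined proportions; report zeros for it.
        result[r] = {c: (n / total if total else 0.0) for c, n in row.items()}
    return result


table = cross_tabulate(observations, OCCUPATIONS, NEWSPAPERS)
row_rel = row_relative_frequencies(table)

print("occupation  " + "  ".join(f"paper {p}" for p in NEWSPAPERS))
for occ in OCCUPATIONS:
    cells = "  ".join(f"{row_rel[occ][p]:>7.2f}" for p in NEWSPAPERS)
    print(f"{occ:<12}{cells}")
```

If the rows of relative frequencies look alike, the two variables appear unrelated; rows
that differ markedly suggest a relationship. Column relative frequencies work the same way
with the roles of rows and columns swapped.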
Interpret:
If the two variables are unrelated, then the patterns exhibited in the bar charts should be
approximately the same. If some relationship exists, then some bar charts will differ from
others.
The graphs tell us the same story as did the table. The shapes of the bar charts for
occupations 2 and 3 (white collar and professional) are very similar. Both differ
considerably from the bar chart for occupation 1 (blue collar).