STAE Lecture Notes - LU2

Uploaded by

aneenzenda06
Copyright
© All Rights Reserved

Learning Unit 2: TABLES AND GRAPHS

LEARNING OBJECTIVES
• Construct a frequency distribution
• Compute and interpret various components of a frequency distribution
• Construct and interpret a pie chart, bar graph, stem-and-leaf plot, histogram, frequency polygon and
ogive

Textbook reference: Chapter 1

2.1. Frequency Tables


Data can be represented graphically in a number of different ways. The appropriate graphical method to use
depends on the variable type. Some graphical methods use the raw data as input, while others use the frequency
table/distribution of the variable as input. Frequency tables are either ungrouped or grouped.

2.1.1. Ungrouped frequency table


An ungrouped frequency table, or ungrouped frequency distribution, lists each unique value of the variable
and records the frequency of occurrence of each value. None of the variable values are merged with other
values, so the values remain “ungrouped”. Categorical and discrete numerical data consisting of a
relatively small number of possible values are summarised in ungrouped frequency tables. The values of
nominal variables are typically sorted alphabetically, while ordinal variable values are listed according to
their inherent ordering. Discrete numerical variable values are ordered numerically in the table. Tables 2, 3
and 4 show the ungrouped frequency tables for three different variables from Table 1 in Learning Unit 1.

Coffee type   Frequency
Filter        7
Instant       13
Total         20
Table 2: Coffee type preference

Choice of brand rating   Frequency
Not important            5
2                        4
3                        5
4                        4
Very important           2
Total                    20
Table 3: Choice of brand rating

Household size   Frequency
1                4
2                5
3                4
4                3
5                3
6                1
Total            20
Table 4: Household size
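As a minimal sketch of how an ungrouped frequency table can be tallied, the Python snippet below counts each unique value in a small made-up sample of household sizes. The sample and the function name `ungrouped_frequency_table` are illustrative assumptions; they are not the Table 1 data.

```python
from collections import Counter

def ungrouped_frequency_table(values):
    """Return (value, frequency) pairs, sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical sample of household sizes (not the Table 1 data).
sample = [2, 1, 3, 2, 2, 4, 1, 3]
table = ungrouped_frequency_table(sample)

for value, frequency in table:
    print(value, frequency)
print("Total", sum(frequency for _, frequency in table))
```

Each value keeps its own row, and the frequencies add up to the number of observations in the sample.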

2.1.2. Grouped frequency table
An ungrouped frequency table of a variable consisting of a large number of possible and different values will
be impractical in summarising the information. Grouped frequency tables are only constructed for numerical
data. For continuous data and discrete data consisting of many possible values, the values of the variable are
organised into classes or intervals. The grouped frequency table, or grouped frequency distribution, then lists
the intervals and records the number of observations in each interval. The purpose of grouping data is to
highlight the main features of the data and present the information more effectively. Grouping must be done
in such a way that important information is not lost.

The following points should be considered when constructing a grouped frequency table:
• The number of class intervals
o Too few intervals would group together too many values and lead to information loss
o Too many intervals would not give much more information than the original raw data values
• The size (width) of the class intervals
o Depends on the number of class intervals
o All classes must be of equal width
o Width = Upper limit − Lower limit
o The smallest value must be recorded in the first interval and the largest value in the last interval
• All class intervals must be non-overlapping
o The following notation can be used to denote the class intervals for age groups:
▪ [20, 30) and [30, 40)
▪ 20 ≤ x < 30 and 30 ≤ x < 40

There are many algorithms that can be used to find the optimal solution for grouping data together. That is
beyond the scope of this course. Table 5 shows the grouped frequency table for the age variable in Table 1.
Class intervals are structured as decades (note the inclusion and exclusion of the class limits/boundaries).
Age group Frequency
[10, 20) 3
[20, 30) 10
[30, 40) 7
Total 20
Table 5: Age categories
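The grouping rules above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `grouped_frequency_table` and its parameters are hypothetical, the classes are equal-width intervals of the form [lower, upper), and the input is a small illustrative sample rather than the Table 1 ages.

```python
def grouped_frequency_table(values, start, width, n_classes):
    """Count values in equal-width, non-overlapping classes [lower, upper)."""
    classes = [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]
    counts = [0] * n_classes
    for x in values:
        index = int((x - start) // width)  # which class the value falls into
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[index] += 1
    return list(zip(classes, counts))

# Small illustrative sample; note that 20 lands in [20, 30), not [10, 20).
values = [5, 11, 13, 13, 20, 22]
grouped = grouped_frequency_table(values, start=0, width=10, n_classes=3)

for (lower, upper), count in grouped:
    print(f"[{lower}, {upper})  {count}")
```

Because the upper limit is excluded, a value that sits exactly on a boundary is counted once, in the class that starts at that boundary.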

2.1.3. Components of a frequency table
A complete frequency table consists of many different components.

Values/Class intervals
List of all possible values of the variable (categorical/discrete) or class intervals (discrete/continuous).

Frequency
The number of observations recorded for each possible outcome (individual value or class interval) of the
variable. Frequency is denoted by f.

Sample size
Total number of observations in the dataset. For frequency tables Σf = n.

Relative frequency (RF)


The proportion of the total number of observations corresponding to each value or class interval of the variable,
where the sum of all proportions is equal to 1.
$$\text{Relative frequency} = \frac{\text{frequency}}{n}$$

Relative frequency percentage (RF%)


The relative frequency expressed as a percentage, where the sum of all percentages is equal to 100.
$$\text{Relative frequency percentage} = \frac{\text{frequency}}{n} \times 100$$

Cumulative frequency (CF)


The total number of observations up to and including a given value or class interval. The last cumulative
frequency corresponds to the last value or class interval in the frequency table and is equal to n. In grouped
frequency tables the cumulative frequency is associated with the upper limit of the class interval.

Cumulative relative frequency (CRF)


CF expressed as a proportion of n, where the last CRF is equal to 1.

Cumulative relative frequency percentage (CRF%)


CRF expressed as a percentage, where the last CRF% is equal to 100.

Class width
The class width of class intervals in a grouped frequency table is the difference between the upper and lower
limits of the interval, i.e., Upper limit – Lower limit, irrespective of whether the limits are inclusive or
exclusive.

Class midpoint (MP)


In grouped frequency tables the value of the class midpoint is often used to represent the whole class interval.
$$\text{Midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2}$$

The complete frequency table for the grouped frequency age variable given in Table 5 is as follows:
Age group   Frequency   RF     RF%   CF   CRF    CRF%   MP
[10, 20)    3           0.15   15    3    0.15   15     15
[20, 30)    10          0.50   50    13   0.65   65     25
[30, 40)    7           0.35   35    20   1      100    35
Total       20          1      100
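The components defined above can be computed directly from the class limits and frequencies. The following Python sketch does so for the Table 5 frequencies; the function name `complete_frequency_table` and the dictionary layout are illustrative choices, not part of the course material.

```python
from itertools import accumulate

def complete_frequency_table(classes, frequencies):
    """Compute RF, RF%, CF, CRF, CRF% and MP for each class interval."""
    n = sum(frequencies)                        # sample size: sum of f
    cumulative = list(accumulate(frequencies))  # running totals give CF
    rows = []
    for (lower, upper), f, cf in zip(classes, frequencies, cumulative):
        rows.append({
            "class": (lower, upper),
            "f": f,
            "RF": f / n,
            "RF%": 100 * f / n,
            "CF": cf,
            "CRF": cf / n,
            "CRF%": 100 * cf / n,
            "MP": (lower + upper) / 2,
        })
    return rows

# Class limits and frequencies taken from Table 5.
classes = [(10, 20), (20, 30), (30, 40)]
frequencies = [3, 10, 7]
rows = complete_frequency_table(classes, frequencies)

for row in rows:
    print(row)
```

The last row has CF = 20 = n, so its CRF is 1 and its CRF% is 100, as the definitions require.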

Steps to enter frequency data for numerical variables into calculator


1) SETUP →3:STAT → 1:ON
2) MODE → 2:STAT → 1: 1 – VAR
3) Enter variable values (ungrouped) or midpoints (grouped) in the column labelled X
4) Enter frequencies in the column labelled FREQ
5) AC

Exercise 2.1
1) Construct a complete frequency table for the gender variable in Table 1, listed in order as follows:
Female Female Female Female Female Female Female Female Female Male
Male Male Male Male Male Male Male Male Male Male

Gender Frequency RF RF%


Male
Female
Total

Note: CF, CRF, CRF% and MP are not applicable here.

2) Construct a complete frequency table for the daily coffee consumption variable in Table 1, listed in order
as follows:
1 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 5 5 7 8

Coffee consumption f RF RF% CF CRF CRF%


1
2
3
4
5
6
7
8
Total

3) Construct a complete frequency table for the coffee affinity score variable in Table 1, where the first class
is (0, 1], listed in order as follows:
0.1 0.2 0.4 0.4 0.6 0.8 1.0 1.4 1.8 1.9 1.9 2.3 2.4 3.1 3.1 3.4 3.6 4.4 4.6 4.9

Coffee affinity score f RF RF% CF CRF CRF% MP
(0, 1]
(1, 2]
(2, 3]
(3, 4]
(4, 5]
Total

2.2. Contingency Tables
A contingency table, also known as a cross-tabulation, is a special type of frequency table in a matrix format
that is used to examine the relationship between two or more variables (categorical or numerical discrete)
simultaneously. If two variables are summarised in a contingency table, the values of one variable are listed
in the rows of the table and the values of the second variable are listed in the columns of the table. The
intersection of a row and a column is referred to as a cell of the table. The frequencies in the cells of the table
are referred to as observed (or cell or joint) counts (or frequencies). All the cell frequencies add up to n. Cell
frequencies can be expressed as percentages of the sample (table %), percentages within a single column
(column %) or percentages within a single row (row %), depending on the objective of the analysis.

Table 6 shows the contingency table of gender by coffee preference from the data in Table 1. Since gender
has two possible outcomes and coffee preference has two possible outcomes, this yields a 2×2 contingency
table. The additional row and column in the table do not denote possible outcomes of the variable, but show
the total counts for each separate variable, called the marginal counts/totals/frequencies per variable.

Observed frequencies   Coffee type preference
                       Filter   Instant   Total
Gender   Male          3        8
         Female        4        5
         Total
Table 6: Contingency table of gender by coffee type preference
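To illustrate the difference between table %, row % and column %, the Python sketch below computes all three for a single cell, using the observed counts from Table 6. The function name `cell_percentages` and the nested-dictionary layout are illustrative assumptions.

```python
def cell_percentages(table, row, col):
    """Return (table %, row %, column %) for one cell of a contingency table."""
    n = sum(sum(cells.values()) for cells in table.values())  # all cells add up to n
    row_total = sum(table[row].values())                      # marginal total of the row
    col_total = sum(cells[col] for cells in table.values())   # marginal total of the column
    count = table[row][col]
    return (100 * count / n, 100 * count / row_total, 100 * count / col_total)

# Observed cell counts from Table 6 (rows: gender, columns: coffee type).
observed = {
    "Male": {"Filter": 3, "Instant": 8},
    "Female": {"Filter": 4, "Instant": 5},
}

table_pct, row_pct, col_pct = cell_percentages(observed, "Male", "Instant")
print(round(table_pct, 1), round(row_pct, 1), round(col_pct, 1))
```

The same cell count gives three different percentages because the denominator changes: the whole sample for table %, the row total for row %, and the column total for column %.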

Exercise 2.2
1) Complete Table 6
2) What percentage of consumers drink filter coffee? ___________
This is an example of __________________ %
3) Among those who only drink filter coffee, what percentage is male? ___________
This is an example of __________________ %
4) What percentage of females only drink instant coffee? ___________
This is an example of __________________ %
5) What percentage of consumers are females who drink instant coffee? ___________
This is an example of __________________ %

2.3. Shape Of A Distribution
Frequency distributions and graphs summarise data so that important features in the data and the distribution
of the variable across the scale are presented in an effective way. The shape of a distribution is described in
terms of its symmetry and modality and determines the appropriate analyses that can be performed on the data.

2.3.1. Symmetry
A distribution is symmetric if the left side of the distribution mirrors the right side of the distribution. In a
symmetric unimodal distribution the mean, median and mode coincide (measures are discussed in detail in Section 3.1).

[Figure: symmetric distribution, labelled with the mean, median and mode]

A distribution is asymmetric or skewed if the left side of the distribution is different from the right side of the
distribution and is either left-skewed or right-skewed. A left-skewed distribution, also termed negatively
skewed, has a longer tail to the left of the distribution. A right-skewed distribution, also termed positively
skewed, has a longer tail to the right of the distribution.

The following two graphs show the shape of the left-skewed and right-skewed distributions and the
relationship between the mean, median and mode for both.

[Figure: two panels, titled "Positively skewed" and "Negatively skewed"]
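A rough numerical check of skewness compares the mean with the median. The rule of thumb used below is an addition to these notes and can fail for unusual distributions: the mean tends to be pulled towards the longer tail, so a mean above the median suggests a right skew and a mean below the median suggests a left skew. The function name `skew_direction` is hypothetical; the data are the daily coffee consumption values listed in Exercise 2.1.

```python
import statistics

def skew_direction(values):
    """Suggest a skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed (positively skewed)"
    if mean < median:
        return "left-skewed (negatively skewed)"
    return "no skew indicated"

# Daily coffee consumption values from Exercise 2.1.
coffee = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 7, 8]

print(statistics.mean(coffee), statistics.median(coffee), statistics.mode(coffee))
print(skew_direction(coffee))
```

Here the few large values (7 and 8) form a tail to the right and lift the mean above the median.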

2.3.2. Modality
The modality of a distribution refers to the number of significant peaks in the shape of the distribution, i.e.,
high frequencies associated with a value or a class interval. A unimodal distribution has one peak, a bimodal
distribution has two peaks, and a multimodal distribution has more than two peaks.

[Figure: two panels, titled "Unimodal distribution" and "Bimodal distribution"]
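Counting peaks can be sketched in Python as below. This is a simplification under stated assumptions: every local maximum is treated as a "significant" peak, and flat tops (equal neighbouring frequencies) are not counted. The function names `count_peaks` and `modality` are hypothetical, and the second list of frequencies is made up for illustration.

```python
def count_peaks(frequencies):
    """Count local maxima: frequencies higher than both neighbours."""
    padded = [0] + list(frequencies) + [0]  # frequency is zero outside the table
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

def modality(frequencies):
    """Name the modality implied by the number of peaks."""
    peaks = count_peaks(frequencies)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    if peaks > 2:
        return "multimodal"
    return "no peak"

print(modality([3, 10, 7]))        # frequencies from Table 5
print(modality([2, 8, 3, 9, 1]))   # hypothetical frequencies
```

In practice, deciding whether a small bump counts as a peak is a judgement call that this sketch does not attempt to make.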

2.4. Graphical Methods


Data can be represented graphically in a number of different ways. The appropriate graphical method to use
depends on the variable type.

2.4.1. Pie chart


Pie charts are used to display the parts of the whole for categorical and discrete numerical variables. The slices
of the pie must be clearly identified and can be labelled with frequency counts, RF or RF%. The figure below
shows the pie chart (with frequencies) for coffee type preference, using the ungrouped frequency table data in
Table 2 as input.

Frequency
Filter 7
Instant 13
Total 20
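When drawing a pie chart by hand, each slice takes up the same share of the 360° circle as its relative frequency. The Python sketch below computes RF% and the slice angle for the Table 2 frequencies; the angle calculation is an addition to these notes and the function name `pie_slices` is hypothetical.

```python
def pie_slices(frequencies):
    """Return RF% and slice angle (in degrees) for each category."""
    n = sum(frequencies.values())
    return {
        label: {"RF%": 100 * f / n, "angle": 360 * f / n}
        for label, f in frequencies.items()
    }

# Frequencies from Table 2.
coffee_type = {"Filter": 7, "Instant": 13}
slices = pie_slices(coffee_type)

for label, slice_info in slices.items():
    print(label, slice_info["RF%"], slice_info["angle"])
```

The angles add up to 360, just as the RF% values add up to 100.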

2.4.2. Bar graph
A bar graph uses vertical or horizontal bars to show a comparison between possible outcomes of a categorical
or a discrete numerical variable. The sizes of the bars represent the frequencies, RF or RF%. There are gaps
between the bars as the outcomes of the variable are separated and not measured on a continuum. The
following figure shows the bar graph (with frequencies) of the choice of brand rating, using the ungrouped
frequency table data in Table 3 as input.

Frequency
Not important 5
2 4
3 5
4 4
Very important 2
Total 20
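As a quick way to see the idea of a bar graph without plotting software, the Python sketch below prints a horizontal text bar for each category of Table 3, with the bar length equal to the frequency. The function name `text_bar_graph` and the `#` symbol are arbitrary illustrative choices.

```python
def text_bar_graph(frequencies, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    width = max(len(str(label)) for label in frequencies)  # align the labels
    return [
        f"{str(label):<{width}} | {symbol * f} {f}"
        for label, f in frequencies.items()
    ]

# Frequencies from Table 3, in their ordinal order.
brand_rating = {"Not important": 5, "2": 4, "3": 5, "4": 4, "Very important": 2}
lines = text_bar_graph(brand_rating)

for line in lines:
    print(line)
```

Each category gets its own separate bar, which mirrors the gaps between bars in a drawn bar graph.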

2.4.3. Stem-and-leaf plot


The stem-and-leaf plot is a graph used for numerical data where the data values are split into a stem and a leaf.
The stem values appear on the vertical axis and the leaves are shown on the horizontal axis. The stems can be
seen as class intervals, and the leaves denote the individual observations within each interval. The plot itself
is unit-free but must always have a key to indicate the unit of measure for either the stem or the leaf or both.
For example, the values 5, 11, 13, 13, 20 and 22, can be split into units of tens (stem) and units of ones (leaf),
producing the following stem-and-leaf plot (leaf unit = 1):

Stem Leaf
0 5
1 1 3 3
2 0 2

However, if the original values were 0.5, 1.1, 1.3, 1.3, 2.0 and 2.2, the resulting stem-and-leaf plot will be the
same as above, but the unit of measure will change, i.e., leaf unit = 0.1.
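The splitting of values into stems and leaves can be sketched in Python as follows. The function name `stem_and_leaf` and the `leaf_unit` parameter are illustrative assumptions; the sketch assumes non-negative values that are whole multiples of the leaf unit, and it omits stems that have no leaves.

```python
def stem_and_leaf(values, leaf_unit=1):
    """Group values into {stem: [leaves]}; the final digit, in leaf units, is the leaf."""
    plot = {}
    for value in sorted(values):
        scaled = round(value / leaf_unit)  # express the value in whole leaf units
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf, the rest is the stem
        plot.setdefault(stem, []).append(leaf)
    return plot

whole = stem_and_leaf([5, 11, 13, 13, 20, 22], leaf_unit=1)
tenths = stem_and_leaf([0.5, 1.1, 1.3, 1.3, 2.0, 2.2], leaf_unit=0.1)

for stem, leaves in whole.items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
print(whole == tenths)
```

Both calls produce the same stems and leaves; only the leaf unit stated in the key differs.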

The following graph shows the stem-and-leaf plot of age, using the raw data from Table 1 as input. The data
range from 19 to 40. The first digit is used as the stem and the second digit as the leaf. For this data the leaf
unit = 1. The data are listed in order as follows:

19 19 19 21 24 24 25 26 26 28 29 29 30 32 34 35 35 36 37 40

Stem Leaf
1 9 9 9
2 1 4 4 5 6 6 8 9 9
3 0 2 4 5 5 6 7
4 0

2.4.4. Histogram
The histogram is used to graph a grouped frequency distribution. The class intervals are represented on the
horizontal axis and the frequencies, RF or RF% are represented on the vertical axis. A vertical bar is
constructed over each class interval, spanning from its lower limit to its upper limit. Since the class limits
are on a continuum there are no gaps between the bars. The following graph shows the histogram of age (with
frequencies) using the grouped frequency table data in Table 5 as input.

Age group Frequency


[10, 20) 3
[20, 30) 10
[30, 40) 7
Total 20
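The geometry of a histogram can be sketched in Python by describing each bar with its left edge, width and height. The function names `histogram_bars` and `has_gaps` are hypothetical; the input is the Table 5 grouped frequency table.

```python
def histogram_bars(classes, frequencies):
    """Return (left edge, width, height) for each bar of a histogram."""
    return [(lower, upper - lower, f) for (lower, upper), f in zip(classes, frequencies)]

def has_gaps(bars):
    """True if any bar does not start exactly where the previous bar ends."""
    return any(
        left != prev_left + prev_width
        for (prev_left, prev_width, _), (left, _, _) in zip(bars, bars[1:])
    )

# Class limits and frequencies from Table 5.
classes = [(10, 20), (20, 30), (30, 40)]
frequencies = [3, 10, 7]
bars = histogram_bars(classes, frequencies)

print(bars)
print(has_gaps(bars))
```

Each bar starts at the upper limit of the previous class, which is why a histogram has no gaps between its bars.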

2.4.5. Frequency polygon


The frequency polygon is used to graph a grouped frequency distribution. The midpoints of each interval are
represented on the horizontal axis and the frequencies, RF or RF% are represented on the vertical axis and are
shown as points on the graph. To better show the shape of the distribution, additional class intervals are added
before the first class interval and after the last class interval, both with a frequency of zero. Then the first point
of the graph lies on the x-axis one interval width below the MP of the first class, and the last point lies on the
x-axis one interval width above the MP of the last class. The points are joined to form a line graph.

The following graph shows the frequency polygon of age, using the grouped frequency table data in Table 5,
together with the MP, as input.

Age group Frequency MP


[0, 10) 0 5
[10, 20) 3 15
[20, 30) 10 25
[30, 40) 7 35
[40, 50) 0 45
Total 20
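The points of a frequency polygon, including the two added zero-frequency classes, can be generated with a short Python sketch. The function name `frequency_polygon_points` is hypothetical, and equal class widths are assumed.

```python
def frequency_polygon_points(midpoints, frequencies, width):
    """Return (x, y) points, padded with a zero-frequency class on each side."""
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Midpoints and frequencies from Table 5; the class width is 10.
points = frequency_polygon_points([15, 25, 35], [3, 10, 7], width=10)
print(points)
```

The first and last points lie on the x-axis at 5 and 45, matching the midpoints of the added classes [0, 10) and [40, 50) in the table above.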

The frequency polygon is an alternative to the histogram as both graphs convey the same information, as can
be seen when the two graphs are shown together in a single graph as follows:

2.4.6. Ogive
The ogive is a graphical representation of the CF, CRF or CRF%. The cumulative frequencies are represented on
the vertical axis and points are plotted above the upper class limits indicated on the horizontal axis. The first
point coincides with the lower class limit of the first class on the x-axis, with a CF of zero. The following
graph shows the ogive of age, using the grouped frequency table data in Table 5, together with the CF, as input.

Age group Frequency CF


[10, 20) 3 3
[20, 30) 10 13
[30, 40) 7 20
Total 20
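The points of an ogive follow directly from the class limits and frequencies. The Python sketch below builds them for Table 5; the function name `ogive_points` is hypothetical.

```python
from itertools import accumulate

def ogive_points(classes, frequencies):
    """Return (x, CF) points: start at the first lower limit, then each upper limit."""
    first_lower = classes[0][0]
    upper_limits = [upper for _, upper in classes]
    cumulative = list(accumulate(frequencies))  # running totals give CF
    return [(first_lower, 0)] + list(zip(upper_limits, cumulative))

# Class limits and frequencies from Table 5.
classes = [(10, 20), (20, 30), (30, 40)]
frequencies = [3, 10, 7]
points = ogive_points(classes, frequencies)
print(points)
```

The y-values never decrease, and the last point has a CF equal to n = 20.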

Exercise 2.3
Use the frequency tables constructed in Exercise 2.1 to draw the following graphs:
1) Pie chart for gender (use RF%)

Gender RF%
Male 55
Female 45
Total 100

2) Bar graph for coffee consumption (use frequency)


Coffee consumption f
1 5
2 6
3 3
4 2
5 2
6 0
7 1
8 1
Total 20

3) Histogram for coffee affinity (use RF)


Coffee affinity score RF
(0, 1] 0.35
(1, 2] 0.20
(2, 3] 0.10
(3, 4] 0.20
(4, 5] 0.15
Total 1

4) Frequency polygon for coffee affinity (use RF%)

Coffee affinity score RF% MP
(−1, 0] 0 −0.5
(0, 1] 35 0.5
(1, 2] 20 1.5
(2, 3] 10 2.5
(3, 4] 20 3.5
(4, 5] 15 4.5
(5, 6] 0 5.5
Total 100

5) Stem-and-leaf plot for coffee affinity (use raw data)


0.1 0.2 0.4 0.4 0.6 0.8 1.0 1.4 1.8 1.9 1.9 2.3 2.4 3.1 3.1 3.4 3.6 4.4 4.6 4.9

Stem Leaf (leaf unit = )

6) Compare (3), (4) and (5), and comment on the symmetry and modality

Stem Leaf (leaf unit = 0.1)
0 1 2 4 4 6 8
1 0 4 8 9 9
2 3 4
3 1 1 4 6
4 4 6 9

7) Ogive for coffee affinity (use CF) and interpret

Coffee affinity score CF x-axis

(0, 1] 7
(1, 2] 11
(2, 3] 13
(3, 4] 17
(4, 5] 20
Total
