Chap 2 Notes
graphically
Raw data, or data that have not been summarized in any way, are sometimes referred to as
ungrouped data. Table 2.1 contains 60 years of raw data of the unemployment rates for Canada.
Data that have been organized into a frequency distribution are called grouped data. Table 2.2
presents a frequency distribution for the data displayed in Table 2.1.
Frequency Distribution
One particularly useful tool for grouping data is the frequency distribution, which is a summary of
data presented in the form of class intervals and frequencies. When constructing a frequency
distribution, the business researcher should first determine the range of the raw data. The range
is often defined as the difference between the largest and smallest numbers. The range for the data
in Table 2.1 is 9.7 (12.0 - 2.3). Next, determine the class intervals and the width of each interval,
as in Table 2.2.
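The steps above (find the range, choose class intervals, count the values in each) can be sketched in Python. This is only an illustration: the values in rates are hypothetical, not the Table 2.1 data, and the starting point 1 and width 2 are assumptions that mirror the "3–under 5" style of interval. The function name frequency_distribution is made up for this sketch.

```python
import math

# Hypothetical raw data (NOT the actual values from Table 2.1).
rates = [2.3, 3.4, 4.1, 5.6, 5.9, 6.2, 7.0, 7.5, 7.8, 8.1, 9.4, 12.0]

def frequency_distribution(data, start, width):
    """Count values in classes [start, start+width), [start+width, ...), and so on."""
    num_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * num_classes
    for x in data:
        # A value equal to a class endpoint falls in the upper class ("a-under b").
        counts[int((x - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(num_classes)]

# Range = largest value minus smallest value.
data_range = max(rates) - min(rates)
print(f"range = {data_range:.1f}")

# Assumed class layout: start at 1, width 2.
table = frequency_distribution(rates, start=1, width=2)
for lower, upper, freq in table:
    print(f"{lower}-under {upper}: {freq}")
```

The half-open intervals keep every value in exactly one class, which is what the "under" wording implies.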
Class Midpoint
The midpoint of each class interval is called the class midpoint or class mark. For example, in
Table 2.2 the midpoint of the class interval 3–under 5 is 4, or (3 + 5)/2.
Relative Frequency
Relative frequency is the proportion of the total frequency that falls in any given class interval of
a frequency distribution. It is the individual class frequency divided by the total frequency. For
example, from Table 2.3, the relative frequency for the class interval 5–under 7 is 13/60 = .2167.
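A minimal sketch of the relative frequency calculation follows. The counts 4, 12, 13 and 19 and the total of 60 are taken from these notes; the class labels for the first, fifth and sixth intervals and the counts 7 and 5 are assumptions, chosen only so that the frequencies sum to 60.

```python
# Class frequencies. 4, 12, 13, 19 and the total 60 appear in the notes;
# the entries marked "assumed" are placeholders that make the total 60.
frequencies = {
    "1-under 3": 4,     # label assumed
    "3-under 5": 12,
    "5-under 7": 13,
    "7-under 9": 19,
    "9-under 11": 7,    # assumed
    "11-under 13": 5,   # assumed
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency.
relative = {label: freq / total for label, freq in frequencies.items()}

for label, value in relative.items():
    print(f"{label}: {value:.4f}")
```

The relative frequencies always sum to 1, which is a quick check on the arithmetic.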
Cumulative Frequency
The cumulative frequency for each class interval is the frequency for that class interval added to
the preceding cumulative total. In Table 2.3 the cumulative frequency for the first class is the same
as the class frequency: 4. The cumulative frequency for the second class interval is the frequency of
that interval (12) plus the frequency of the first interval (4), which yields a new cumulative
frequency of 16. This process continues through the last interval, where the cumulative frequency
equals the total of all the frequencies (60).
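The running total described above can be sketched with itertools.accumulate. As before, the last two class frequencies (7 and 5) are assumed placeholders that bring the total to 60; the first two (4 and 12) and the total come from the notes.

```python
from itertools import accumulate

# Class frequencies in order; the last two values (7, 5) are assumed.
frequencies = [4, 12, 13, 19, 7, 5]

# Each cumulative frequency is the class frequency plus the preceding cumulative total.
cumulative = list(accumulate(frequencies))

for freq, cum in zip(frequencies, cumulative):
    print(f"frequency {freq:2d}  cumulative {cum:2d}")
```

The final cumulative value should always equal the total number of observations.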
Histogram
A histogram is a tool for differentiating the frequencies of class intervals. A quick glance at a
histogram reveals which class intervals produce the highest frequency totals. Figure 2.1, a
histogram of the frequency distribution in Table 2.2, shows that the class interval 7–under 9 yields
by far the highest frequency count (19). Examination of the histogram reveals where large increases
or decreases occur between classes.
Histograms are useful for revealing the shape of the distribution of a large database, the
variability of the data, the central location of the data, and outliers.
Frequency Polygons
A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead
of using bars or rectangles like a histogram, each class frequency is plotted as a dot at the class
midpoint, and the dots are connected by a series of line segments. Construction of a frequency
polygon begins by scaling class midpoints along the horizontal axis and the frequency scale along the
vertical axis.
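The points of a frequency polygon can be computed before any plotting is done. The sketch below pairs each class midpoint with its class frequency; the class endpoints 1 through 13 and the last two frequencies (7 and 5) are assumptions consistent with the figures quoted in these notes.

```python
# Assumed class endpoints (width 2) and frequencies; the last two counts are placeholders.
endpoints = [1, 3, 5, 7, 9, 11, 13]
frequencies = [4, 12, 13, 19, 7, 5]

# Class midpoint = (lower endpoint + upper endpoint) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in zip(endpoints, endpoints[1:])]

# One dot per class: x = class midpoint, y = class frequency.
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(f"midpoint {x:4.1f}  frequency {y}")
```

Connecting these points in order with line segments gives the polygon.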
Ogives
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with
the class endpoints and the y-axis with the frequencies. Ogives are most useful when the decision
maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an
ogive could depict cumulative costs over a fiscal year. Steep slopes in an ogive can be used to
identify sharp increases in frequencies.
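A sketch of the ogive points and their slopes is given below. It assumes a common convention in which a zero is plotted at the lower endpoint of the first class and each cumulative value is plotted at the upper endpoint of its class. The endpoints and the last two frequencies (7 and 5) are assumed, as in the earlier examples.

```python
from itertools import accumulate

# Assumed class endpoints and frequencies; the last two counts are placeholders.
endpoints = [1, 3, 5, 7, 9, 11, 13]
frequencies = [4, 12, 13, 19, 7, 5]

# Start at zero, then plot each cumulative total at the class's upper endpoint.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(endpoints, cumulative))

# Slope of each line segment: rise in cumulative frequency per unit of x.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(ogive_points, ogive_points[1:])]

# The steepest segment marks the sharpest increase in frequency.
steepest = slopes.index(max(slopes))
print(f"steepest segment: {endpoints[steepest]}-under {endpoints[steepest + 1]}")
```

With these numbers the steepest segment is the 7–under 9 class, the same class with the highest frequency count.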
Dot Plots
In a dot plot, each data value is plotted along the horizontal axis and is represented on the chart by a
dot. If multiple data points have the same values, the dots will stack up vertically. Dot plots can be
especially useful for observing the overall shape of the distribution of data points along with
identifying data values or intervals for which there are groupings and gaps in the data.
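A rough text version of a dot plot can show the stacking and the gaps. The data values below are hypothetical, and the three-character column layout is just one way to draw it.

```python
from collections import Counter

# Hypothetical data values (integers, for a simple text layout).
values = [3, 5, 5, 6, 6, 6, 9]

counts = Counter(values)
axis = range(min(values), max(values) + 1)
height = max(counts.values())

# Build the plot from the top row down; repeated values stack vertically.
rows = []
for level in range(height, 0, -1):
    row = "".join(" * " if counts[v] >= level else "   " for v in axis)
    rows.append(row.rstrip())
rows.append("".join(f"{v:^3}" for v in axis).rstrip())
print("\n".join(rows))

# Axis positions with no dots are gaps in the data.
gaps = [v for v in axis if counts[v] == 0]
print("gaps:", gaps)
```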
Stem-and-leaf plots
Another way to organize raw data into groups besides using a frequency distribution is a stem-and-
leaf plot. It is constructed by separating the digits for each number of the data into two groups, a
stem and a leaf. The leftmost digits are the stem and consist of the higher valued digits. The
rightmost digits are the leaves and contain the lower values. If the numbers in a data set have only
two digits, the stem is the digit on the left and the leaf is the digit on the right. For example, if
34 is one of the
numbers, the stem is 3 and the leaf is 4.
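For two-digit numbers, the split into stem and leaf is the same as integer division by 10. A minimal sketch, using hypothetical data values:

```python
from collections import defaultdict

# Hypothetical two-digit data values.
values = [34, 47, 31, 52, 38, 45, 59, 34]

# divmod(34, 10) gives (3, 4): stem 3, leaf 4.
stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

# One row per stem, with its leaves in ascending order.
for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

Unlike a frequency distribution, this display keeps every original value recoverable from the plot.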
Qualitative data graphs
There are three types of qualitative data graphs: (1) pie charts, (2) bar charts, and (3) Pareto charts.
Pie Charts
A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the
data and slices of the pie represent a percentage breakdown of the sublevels. They are widely used
in business, particularly to depict such things as budget categories, market share, and time/resource
allocations.
Bar graphs
A bar graph or chart contains two or more categories along one axis and a series of bars, one for
each category, along the other axis. Typically, the length of the bar represents the magnitude of the
measure (amount, frequency, money, percentage, etc.) for each category. In Excel, horizontal bar
graphs are referred to as bar charts, and vertical bar graphs are referred to as column charts.
Pareto Charts
Pareto analysis is a quantitative tallying of the number and types of defects that occur with a
product or service. Analysts use this tally to produce a vertical bar chart that displays the most
common types of defects, ranked in order of occurrence from left to right. The bar chart is called a
Pareto chart. A Pareto chart enables quality management decision makers to separate the most
important defects from trivial defects, which helps them to set priorities for needed quality
improvement work.
A Pareto chart often includes a cumulative percentage line graph. Observe the slopes on the line
graph. The steepest slopes represent the more frequently occurring problems.
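The ranking and the cumulative percentage line can be sketched as follows. The defect names and counts are hypothetical and serve only to show the calculation.

```python
# Hypothetical defect tallies (names and counts are made up for illustration).
defects = {
    "scratch": 12,
    "misalignment": 45,
    "dent": 8,
    "wrong label": 25,
    "other": 10,
}

total = sum(defects.values())

# Rank defect types from most to least frequent (left to right on the chart).
ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Cumulative percentage after each bar, for the line graph.
pareto = []
running = 0
for name, count in ranked:
    running += count
    pareto.append((name, count, 100 * running / total))

for name, count, cum_pct in pareto:
    print(f"{name:<13} {count:3d}  {cum_pct:5.1f}%")
```

In this made-up tally the first two defect types account for 70% of all defects, which is the kind of priority signal a Pareto chart is meant to give.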