BUSINESS STATISTICS - Unit-2
STATISTICS
UNIT-2
ORGANISING AND PRESENTING DATA
BY:
DR. MADHAVI KAPOOR
CODING OF DATA
Coding of data involves assigning numerical symbols to each response to a question. The
purpose of giving numerical symbols is to translate raw data into numerical data, which may be
counted and tabulated.
The researcher's task is to assign numbers/codes (1, 2, 3, ...) to the responses carefully. Because
questions come in various types (such as open-ended), as discussed in the previous block, the
coding scheme varies accordingly.
For example, a closed-ended question may already be coded, so it only has to be included in
the code book, whereas coding open-ended questions involves operations such as classifying
the major responses and developing an 'other' response category for responses that were not
given frequently.
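As a rough illustration of a code book, the sketch below maps the responses to one question onto numeric codes. The labels, the codes, and the use of 9 for 'other' are all hypothetical choices made for this example.

```python
# Hypothetical code book for one question (labels and codes are illustrative).
CODE_BOOK = {"yes": 1, "no": 2, "undecided": 3}
OTHER_CODE = 9  # assumed catch-all code for infrequent responses


def code_response(response: str) -> int:
    """Return the numeric code for a response, or the 'other' code if unlisted."""
    return CODE_BOOK.get(response.strip().lower(), OTHER_CODE)


raw_responses = ["Yes", "no", "YES", "maybe later", "Undecided"]
coded = [code_response(r) for r in raw_responses]
print(coded)  # [1, 2, 1, 9, 3]
```

Once the responses are numeric, they can be counted and tabulated directly.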
TABULATION AND ORGANIZATION
OF QUANTITATIVE DATA
Tables and graphs are commonly used to tabulate the data on a large number of subjects in a
condensed and summarized form.
We may use a frequency table, a cumulative frequency distribution or a contingency table.
Frequency Distribution
Data collected from a test or with other data-gathering or measuring tools are raw and may have
little meaning to the researcher until they are tabulated and organized in a systematic order.
One of the ways of doing so is to prepare a frequency table or a frequency distribution which
depicts the number of subjects distributed among the various groups or categories of
characteristics.
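A minimal sketch of building a frequency distribution is given below. The scores are hypothetical (they are not taken from any table in this unit), and the class width of 10 is an assumption.

```python
from collections import Counter

# Hypothetical raw test scores (illustrative only).
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 79, 82, 85, 88, 91, 94]
CLASS_WIDTH = 10  # assumed width of each class interval


def class_interval(score, width=CLASS_WIDTH):
    """Return the (lower, upper) limits of the class that contains the score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)


# Count how many scores fall in each class interval.
frequency_table = Counter(class_interval(s) for s in scores)

for (lower, upper), count in sorted(frequency_table.items()):
    print(f"{lower}-{upper}: {count}")
```

The frequencies always add up to the number of observations, which is a quick check on the tabulation.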
Cumulative Frequency/Relative
Frequency Distribution
In some cases, we may not be concerned with the frequencies within the class intervals, but
rather with the number or the percentage of values greater than or less than a specified value.
The main purpose of computing a percentage is to be able to compare groups or class intervals
in a frequency table.
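The calculation can be sketched as follows. The class intervals and frequencies are hypothetical, and the variable names are chosen only for this example.

```python
from itertools import accumulate

# Hypothetical frequency table (illustrative only).
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 4, 5, 3, 2]
n = sum(frequencies)

# Relative frequency: share of all observations in each class.
relative = [f / n for f in frequencies]

# "Less than" cumulative frequency: observations up to and including each class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequency: observations in each class or any higher class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Cumulative percentage of the total.
cumulative_pct = [100 * c / n for c in less_than]

for row in zip(classes, frequencies, relative, less_than, more_than, cumulative_pct):
    print(row)
```

Because percentages do not depend on the total number of observations, they allow groups of different sizes to be compared.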
Graphical representations of a frequency distribution include:
i) Histogram or column diagram
ii) Frequency polygon
iii) Cumulative percentage curve or ogive
Histogram or column diagram
A histogram or column diagram is a graph in which class-intervals are represented along the
horizontal axis and their corresponding frequencies are represented by areas in the form of
rectangular vertical bars drawn on the intervals. A histogram is used to represent the frequency
distribution of interval or ratio scale data.
The following steps are followed in preparing a histogram.
Step 1: A horizontal line is drawn at the bottom of a graph paper. Units representing class-intervals
are marked along this line.
Step 2: A vertical line is drawn at the left hand extreme of the horizontal axis. Along this vertical
axis, units representing individual frequencies of the class-intervals are marked.
Step 3: Taking class units as bases, rectangles are drawn, such that the areas of rectangles are
proportional to the frequencies of the corresponding classes.
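Step 3 matters most when the class-intervals have unequal widths. In the sketch below, the height of each bar is taken as frequency divided by class width, so that the area of the bar equals the frequency. The classes are hypothetical.

```python
# Hypothetical classes with unequal widths: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 8)]

bars = []
for lower, upper, frequency in classes:
    width = upper - lower
    height = frequency / width  # frequency per unit of class width
    bars.append((lower, upper, height))

# The area of each rectangle (width x height) recovers the class frequency.
areas = [(upper - lower) * height for lower, upper, height in bars]

for (lower, upper, height), area in zip(bars, areas):
    print(f"{lower}-{upper}: height {height:.2f}, area {area:.1f}")
```

The last two classes have the same frequency, but the wider class gets a bar half as tall. When all classes have the same width, the heights are simply proportional to the frequencies.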
Frequency Polygon
A frequency polygon is also used to represent interval or ratio scale data. As the name suggests,
it is a shape enclosed by straight lines.
A frequency polygon is drawn by plotting a point above the mid-point of each class-interval (i.e.
of each bar in the histogram) at a height proportional to its frequency and then joining the points by
straight lines, including the points with zero frequency at the two ends (refer to Figure 10.2).
The first two steps are identical to those used in the construction of a histogram.
The next step is as follows:
Step 3: Directly above the mid-point of each class-interval along the horizontal axis, plot the
points at a height proportional to the respective frequencies.
Join these points by straight lines. The frequency polygon for the distribution of Table 10.7 is
shown in Figure 10.2.
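The points of a frequency polygon can be computed as in the sketch below. The class boundaries and frequencies are hypothetical (they are not those of Table 10.7), and equal class widths are assumed.

```python
# Hypothetical class boundaries and frequencies (illustrative only).
boundaries = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [2, 4, 5, 3, 2]

# Mid-point of each class-interval.
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
width = boundaries[0][1] - boundaries[0][0]  # equal class width assumed

# One point per class, plus a zero-frequency point one class width
# beyond each end so that the polygon closes on the horizontal axis.
points = list(zip(midpoints, frequencies))
polygon = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

print(polygon)
```

Joining consecutive points in `polygon` by straight lines gives the frequency polygon.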
Cumulative Percentage Curve or Ogive
When the frequencies are expressed as cumulative percentage of N on the vertical axis, the graphic
representation is known as a cumulative percentage curve or ogive.
After finding the cumulative percentage frequencies, the points are plotted against the exact upper limits
of the class-intervals.
A curve joining the points thus obtained is called the cumulative percentage curve or ogive.
The cumulative percentage curve or ogive of the distribution represented in Table 10.9 is illustrated in
Figure 10.3.
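A minimal sketch of the ogive points is given below. The frequencies are hypothetical (they are not those of Table 10.9). Starting the curve at 0% on the lower limit of the first class, and reading values off the curve by linear interpolation, are assumptions made for this example.

```python
from itertools import accumulate

# Hypothetical exact class limits and frequencies (illustrative only).
lower_limit = 50  # exact lower limit of the first class
upper_limits = [60, 70, 80, 90, 100]
frequencies = [2, 4, 5, 3, 2]
n = sum(frequencies)

# Cumulative percentage of N at each exact upper limit.
cumulative_pct = [100 * c / n for c in accumulate(frequencies)]

# Points of the ogive, starting at 0% on the lower limit of the first class.
ogive = [(lower_limit, 0.0)] + list(zip(upper_limits, cumulative_pct))


def percent_below(x):
    """Read the ogive at x by linear interpolation between plotted points."""
    for (x0, y0), (x1, y1) in zip(ogive, ogive[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the range of the class-intervals")


print(ogive)
print(percent_below(75))  # 53.125
```

Reading the curve in this way gives the approximate percentage of values lying below a specified value.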
Stem-and-Leaf Plot
As already mentioned above, the stem-and-leaf plot is a histogram-style tabulation of data.
Consider the data set presented in Table 10.3. Sort the data in ascending order (i.e. starting
from 57, 60, 61, 63, ..., 95, 97, 98).
A stem-and-leaf plot of this data can be constructed by writing the first digits in the first
column (the stem), then writing the second digits of all the numbers in that range to
the right (the leaves), as shown in Figure 10.4.
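The construction can be sketched as follows for two-digit numbers. The scores used here are hypothetical; they are not the data of Table 10.3.

```python
from collections import defaultdict

# Hypothetical two-digit scores (illustrative only).
data = [75, 52, 68, 91, 61, 79, 85, 55, 70, 64, 94, 73, 88, 67, 82, 78]

stems = defaultdict(list)
for value in sorted(data):  # sort in ascending order first
    stem, leaf = divmod(value, 10)  # first digit is the stem, second the leaf
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Each row lists one stem followed by its leaves, so the length of a row shows the frequency of that range, much like a bar of a histogram turned on its side.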
Box-and-Whiskers Plot
A box-and-whiskers plot, sometimes referred to as a box plot, is also a histogram-like method of
displaying data. It is used to help understand the distribution of the data in terms of
percentiles.
The median (middle value of the data) is called the 50th percentile, which means that 50% of
the data are below the median and 50% are above the median. In the same way the 25th
percentile is that number where 25% of the data are below that number and 75% are above.
The 75th percentile is similar. Fifty per cent of the data lie between the 25th and 75th
percentiles. Another statistical measure of position is the quartile.
A quartile divides the distribution into quarters. The quartiles denoted by Q1, Q2 and Q3 in
Figure 10.5 are the three numbers that occupy the 25th, 50th, and 75th percentiles,
respectively.
Let us understand this with the help of an example. Suppose we have the following numbers
(already ranked): (1, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 10, 10, 11 and 19)
Using this data, let us now develop the box plot. The steps to draw a box plot are given below.
1. First identify the median and the 25th and 75th percentile value from the data.
2. Draw a box from the 25th to the 75th percentile.
3. Split the box with a line at the median.
4. Draw a thin line (whisker) from the 75th percentile up to the maximum value.
5. Draw another thin line from the 25th percentile down to the minimum value.
Now compare your plot with the box-and-whiskers display of the data given here.
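The five values needed for the plot can be computed as in the sketch below, using the ranked numbers from the example. It relies on the default ("exclusive") method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values for Q1 and Q3.

```python
import statistics

# The ranked numbers from the example above.
data = [1, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 10, 10, 11, 19]

# Quartiles (25th, 50th and 75th percentiles), default "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)

summary = {
    "minimum": min(data),  # end of the lower whisker
    "Q1": q1,              # lower edge of the box
    "median": q2,          # line that splits the box
    "Q3": q3,              # upper edge of the box
    "maximum": max(data),  # end of the upper whisker
}
print(summary)
```

Under this convention the box runs from 5 to 10 with the median line at 7, and the whiskers extend down to 1 and up to 19.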