Statistics Ch-02


Graphs, Charts, and Tables

– Describing Your Data

Week – 2
Construction of a Frequency Distribution

[Figure: flow diagram. Steps: Question to be addressed → Collect data → Organize data → Present data → Draw conclusion. Stage labels: Raw data → Frequency distribution → Graph.]
Frequency Distributions

What is a Frequency Distribution?

• A frequency distribution is a list or a table ...
• containing the values of a variable (or a set of ranges within which the data fall) ...
• and the corresponding frequencies with which each value occurs (or the frequencies with which data fall within each range)
Why Use Frequency Distributions?

• A frequency distribution is a way to summarize data
• The distribution condenses the raw data into a more useful form ...
• and allows for a quick visual interpretation of the data
Summarization of Quantitative Data

This section explains how to group data into frequency tables. Relative frequency and relative cumulative frequency are defined and calculated, and their uses are discussed.
Frequency Table and Frequency Distribution

A frequency table is a two-column tabular presentation of the data. The first column shows the different values of the variable and the second column shows the corresponding frequencies. To illustrate, suppose we take 120 students from King Faisal University and record their weights to the nearest kg.
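As a minimal sketch of such a two-column table, the snippet below counts how often each value occurs. The ten weights are hypothetical and are not the 120 recorded values, which are not reproduced in this text.

```python
from collections import Counter

# Hypothetical weights in kg (NOT the 120 values referred to in the text).
weights = [52, 60, 60, 71, 52, 60, 85, 71, 60, 52]

# Count how often each distinct value occurs.
counts = Counter(weights)

# Two-column frequency table: (value, frequency), sorted by value.
frequency_table = sorted(counts.items())

print("Weight (kg)  Frequency")
for value, freq in frequency_table:
    print(f"{value:>11}  {freq:>9}")
```

The frequencies always add up to the number of observations, which is a quick check on any frequency table.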

Data recorded in this form are known as raw or ungrouped data. Presented this way, it is difficult to see how the weights of the students are distributed. Only after some searching can we find that the minimum value is 45 and the maximum value is 98, so the weights of the 120 students vary from 45 kg to 98 kg. For a better understanding, the raw data need further processing.

To get a clear picture of the data, they are presented in a condensed form, which is only possible if the data are grouped into a number of classes. Someone working with a statistical package such as SPSS can condense the data directly into a sufficient number of groups or classes. How many groups should there be, and how should the groupings be made? These two questions are very common among medical scientists. Let us deal with them one by one.

Before grouping the data, it is important to decide on the number of groups to be made. As a general rule, the number of groups should be neither so small that information is lost nor so large that no useful summarization is obtained. Usually the number of groups is taken from 5 to 15, and preferably from 5 to 10. Regarding the second question, let K be the number of groups to be made and d the width of each group. The number K may be obtained by using Sturges' rule:

K = 1 + 3.322 log10(n),

where n is the total number of observations. The class width is d = R/K, where R = maximum - minimum value of the data. The smallest value in the data set may be taken as the lower limit of the first group. If, however, it is not an integer, the next higher integer value is selected. Note that this formula provides a guideline only; the value of K obtained can be increased or decreased for better presentation. In the above data, the maximum value is 98 and the minimum value is 45, thus
R = 98 - 45 = 53, n = 120.

Using Sturges' rule,

K = 1 + 3.322 log10(120) = 1 + 3.322 (2.079) = 7.906 ≈ 8
R = 53, so d (width) = 53/8 = 6.6 ≈ 7

Most statisticians prefer to start the grouping at a multiple of 2, 5, or 10, as the case may be. Here, select 45 as the lower limit of the first class and make the following groupings, called class intervals.
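The calculation above can be sketched in a few lines. The values n = 120, minimum 45 and maximum 98 come from the text; rounding the width up (rather than to the nearest integer) is a choice made here so that the classes are sure to cover the maximum, and the variable names are illustrative.

```python
import math

n = 120                      # number of observations (from the text)
minimum, maximum = 45, 98    # smallest and largest weights in kg

data_range = maximum - minimum              # R = 53
k_exact = 1 + 3.322 * math.log10(n)         # about 7.91
k = round(k_exact)                          # 8 classes
width = math.ceil(data_range / k)           # 53 / 8 = 6.625, taken as 7

# Discrete class limits, starting at the minimum value.
class_limits = [(minimum + i * width, minimum + (i + 1) * width - 1)
                for i in range(k)]

print(k, width)
print(class_limits)
```

With these values the first class is 45-51 and the last class is 94-100, so the maximum of 98 is covered.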

Data arranged in this way are known as grouped data, and the table is known as a frequency table or frequency distribution. Note that the class intervals given in Table 1.6 are called discrete class intervals. Because there are gaps between consecutive groups, the data cannot be presented in an appropriate diagram as they stand; continuous groups are required. To make the groups continuous, take the upper limit of the first group and the lower limit of the second group, find their difference, and divide it by 2. Add this number to the upper limit of each group and subtract it from the lower limit of each group, e.g. 45 - 0.5 = 44.5 and 51 + 0.5 = 51.5. The resulting limits are called class boundaries. The class limits of Table 1.6 are rewritten as class boundaries in Table 1.7 (column 1).
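The conversion from class limits to class boundaries can be sketched as follows. The list of class limits is an assumption: it is what a first class of 45-51 and a width of 7 would give, since Table 1.6 itself is not reproduced here.

```python
# Discrete class limits in kg, assumed from a first class of 45-51 and width 7.
class_limits = [(45, 51), (52, 58), (59, 65), (66, 72),
                (73, 79), (80, 86), (87, 93), (94, 100)]

# Half the gap between the upper limit of the first class
# and the lower limit of the second class: (52 - 51) / 2 = 0.5
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Subtract from every lower limit, add to every upper limit.
class_boundaries = [(lower - half_gap, upper + half_gap)
                    for lower, upper in class_limits]

print(class_boundaries[0])   # (44.5, 51.5)
```

After the adjustment the upper boundary of each class equals the lower boundary of the next, so the groups are continuous.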
Relative Frequency
The relative frequency of a class interval is the proportion of the class frequency relative to the total frequency. Relative frequencies are given in column (3) of Table 1.7. The purpose of calculating relative frequencies is to convey the idea of proportion and percentage, which is useful for understanding the basic concepts of different types of rates and ratios and, consequently, the idea of probability. From Table 1.7, we can immediately say that about 27.5% of the students have weights in the group 58.5 - 65.5 kg.
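A short sketch of the relative frequency calculation follows. The frequencies are hypothetical: they are not the entries of Table 1.7, which is not reproduced here, but were chosen to total 120 and to agree with the figures quoted in the text.

```python
# Class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
boundaries = [44.5, 51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7), consistent with the quoted figures.
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

total = sum(frequencies)                        # 120 students
relative = [f / total for f in frequencies]     # proportions, summing to 1
percent = [round(100 * r, 1) for r in relative]

for lower, upper, f, p in zip(boundaries[:-1], boundaries[1:], frequencies, percent):
    print(f"{lower:5.1f} - {upper:5.1f}  {f:3d}  {p:5.1f}%")
```

For the class 58.5 - 65.5 the assumed frequency is 33, and 33/120 gives the 27.5% mentioned above.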
Cumulative Frequency
The cumulative frequency of a class interval is the total number of observations with values less than or equal to the upper boundary of that class interval. One advantage of constructing a cumulative frequency table is that one can see immediately how many students have a weight less than or equal to a certain point. For example, there are 117 students whose weights are less than or equal to 86.5 kg. The cumulative frequencies are given in column (5) of Table 1.7.
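Cumulative frequencies are running totals of the class frequencies, as the sketch below shows. The frequencies are hypothetical (they are not taken from Table 1.7) and were chosen only to be consistent with the figures quoted in the text.

```python
from itertools import accumulate

# Upper class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
upper_boundaries = [51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7), consistent with the quoted figures.
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

# Running total: number of students at or below each upper boundary.
cumulative = list(accumulate(frequencies))
at_or_below = dict(zip(upper_boundaries, cumulative))

print(at_or_below[86.5])   # 117
```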
Relative Cumulative Frequency
The cumulative frequency of a class interval divided by the total frequency is called the relative cumulative frequency. It is generally expressed as a percentage and is then known as the percentage cumulative frequency. One of its advantages is that one can immediately get an idea of the percentage of students whose weight is less than or equal to a certain point. For example, 69.2% of the students have a weight less than or equal to 72.5 kg. In other words, about 31% of the students have a weight above 72.5 kg. The relative cumulative frequencies are given in column (6) of Table 1.7.
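The percentage cumulative frequency, and the complementary "above" percentage, can be sketched as below. As before, the frequencies are hypothetical stand-ins for Table 1.7, chosen to agree with the percentages quoted in the text.

```python
from itertools import accumulate

# Upper class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
upper_boundaries = [51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7), consistent with the quoted figures.
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

total = sum(frequencies)
cumulative_percent = [round(100 * c / total, 1) for c in accumulate(frequencies)]

# Percentage at or below 72.5 kg, and the remainder above it.
below_72_5 = cumulative_percent[upper_boundaries.index(72.5)]
above_72_5 = round(100 - below_72_5, 1)

print(below_72_5, above_72_5)   # 69.2 30.8
```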
Graphical Presentation of Quantitative Data

Grouped data involving a quantitative variable may be presented by various graphs. Commonly used graphs are the histogram, frequency polygon, frequency curve, and cumulative frequency curve.
Histogram
A histogram is a graphical display of a frequency distribution, obtained by plotting the class intervals along the X-axis and the frequencies along the Y-axis. On each class interval (taken as the width), we draw adjacent vertical bars with heights equal to the corresponding frequencies. A histogram constructed from the data in Table 1.7 is shown in Figure 1.5.
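As a rough text-only illustration, the sketch below prints a histogram turned on its side: each class interval gets one row, and the length of the bar equals the class frequency. The frequencies are hypothetical and do not reproduce Table 1.7 or Figure 1.5.

```python
# Class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
boundaries = [44.5, 51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7).
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

# One row per class interval; the bar length equals the frequency.
lines = []
for lower, upper, freq in zip(boundaries[:-1], boundaries[1:], frequencies):
    lines.append(f"{lower:5.1f}-{upper:5.1f} | {'#' * freq} {freq}")

print("\n".join(lines))
```

Because consecutive classes share a boundary, the rows are adjacent with no gaps, which is what distinguishes a histogram from a bar chart.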
Frequency Polygon and Frequency Curve

A frequency polygon is a graph obtained by joining, with straight lines, the midpoints of the tops of the bars of the histogram. A frequency curve is a smoothed curve that, unlike the frequency polygon, does not necessarily pass through these midpoints. The ends of a graph drawn in this way do not meet the X-axis but remain open-ended. This curve is very important because the analysis of the data depends on its shape. A frequency curve plotted from the data in Table 1.7 is shown in Fig. 1.5.
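The points joined by a frequency polygon are the class midpoints paired with the class frequencies. The sketch below computes them, using hypothetical frequencies that are not those of Table 1.7.

```python
# Class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
boundaries = [44.5, 51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7).
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

# Midpoint of each class: the middle of the top of each histogram bar.
midpoints = [(lower + upper) / 2
             for lower, upper in zip(boundaries[:-1], boundaries[1:])]

# Joining these (midpoint, frequency) points with straight lines gives the polygon.
polygon_points = list(zip(midpoints, frequencies))

print(polygon_points)
```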
Types of Frequency Curve
Frequency curves are generally of two types: (i) symmetrical and (ii) asymmetrical, or skewed. A skewed curve is either positively or negatively skewed. In a symmetrical curve, the observations are distributed equally on either side of the central maximum. The normal curve (to be discussed later) is an important example of this type. In an asymmetrical curve, the tail is longer on one side than on the other. If the longer tail is to the right, the curve is said to be positively skewed. If the longer tail is to the left, the curve is said to be negatively skewed.
Cumulative Frequency Curve
A cumulative frequency curve is a graph obtained by plotting the upper class boundaries along the X-axis and the corresponding cumulative frequencies along the Y-axis, and joining the points freehand. The cumulative frequency graph for the data in Table 1.7 is shown in Figure 1.7.
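The sketch below lists the points of such a curve and reads off an approximate cumulative frequency at any weight. Two things are assumptions made here: the frequencies are hypothetical (not Table 1.7), and the points are joined by straight lines as a stand-in for the freehand curve, starting from zero at the lowest boundary.

```python
from itertools import accumulate

# Upper class boundaries in kg, assuming 8 classes of width 7 starting at 44.5.
upper_boundaries = [51.5, 58.5, 65.5, 72.5, 79.5, 86.5, 93.5, 100.5]

# Hypothetical frequencies (NOT Table 1.7).
frequencies = [9, 18, 33, 23, 21, 13, 2, 1]

# Curve points; the curve starts at the lowest boundary with cumulative frequency 0.
xs = [44.5] + upper_boundaries
ys = [0] + list(accumulate(frequencies))


def cumulative_at(weight):
    """Approximate cumulative frequency at `weight` by linear interpolation."""
    if weight <= xs[0]:
        return 0.0
    if weight >= xs[-1]:
        return float(ys[-1])
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= weight <= x1:
            return y0 + (y1 - y0) * (weight - x0) / (x1 - x0)


print(cumulative_at(72.5))   # 83.0
print(cumulative_at(69.0))   # 71.5
```

Values read between the plotted points are estimates only, since the grouped table does not record where observations fall within a class.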
Historigram: Graphical Presentation of Data Relating to
Time

Sometimes the data relate to time. People often draw bar diagrams or pie charts for such data without considering the nature of the data, but these charts are not appropriate. Data relating to time are presented by a line diagram, known as a historigram. From it one can see the trend of the data and judge which type of analysis is suitable.

Below are data on the number of students (males and females) admitted to the medical college of King Faisal University from 1975-1976 to 1993-1994. We want to present these data in an appropriate diagram.
Example
Table 1.8 shows data on the admission of students to King Faisal University. Draw a suitable graph for these data.
Example:

Data sorted from low to high:

12, 13, 17, 21, 24, 24, 26, 27, 28, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Here, use the 10's digit for the stem unit:

                  Stem | Leaf
• 12 is shown as     1 | 2
• 35 is shown as     3 | 5
Example:

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 28, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Completed stem-and-leaf diagram:

Stem | Leaves
   1 | 2 3 7
   2 | 1 4 4 6 7 8
   3 | 0 2 5 7 8
   4 | 1 3 4 6
   5 | 3 8
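The diagram can be built mechanically from the ordered array. The sketch below is one way to do it, splitting each value into its tens digit (stem) and units digit (leaf); the variable names are illustrative.

```python
from collections import defaultdict

# Ordered array from the example above.
data = [12, 13, 17, 21, 24, 24, 26, 27, 28, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Group the units digits (leaves) under their tens digit (stem).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One row per stem, leaves in ascending order.
rows = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())]

print("\n".join(rows))
```

Unlike a grouped frequency table, the stem-and-leaf diagram keeps every original value, so the raw data can be recovered from it.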
Descriptive Statistics
After the graphical presentation and summarization of statistical data, the next step is to apply different measures for statistical analysis. The methods of statistical analysis differ for qualitative and quantitative data. Proportions, percentages, ratios, indices, ranks, association, tests of independence, etc. are possible methods of analysis for qualitative data, whereas percentages, indices, averages, variation, correlation, regression, analysis of variance, etc. are possible methods of analysis for quantitative data. For qualitative data, we shall describe the methods wherever necessary, but we begin with quantitative data analysis.
Graphical Presentation of Qualitative Data

Medical scientists writing papers or reports usually present their information in the form of diagrams and graphs, because these summarize the data and serve as a guide to further analysis. Graphs are also used to compare two or more sets of data. Every graph or chart should have a title that clearly describes it. A suitable scale should be used, and the horizontal and vertical axes should be labelled so that the graph or chart is self-explanatory.

There are many ways to present data by charts and diagrams; we discuss only the commonly used ones. Data involving a categorical variable measured on a nominal or ordinal scale can be displayed by (i) simple bar charts, (ii) subdivided and multiple bar charts, and (iii) pie charts.
Bar Charts
A bar chart is mainly used for the graphical presentation of categorical data. It is obtained by plotting the categories (with a constant width) along the X-axis and erecting bars with heights equal to the corresponding numbers along the Y-axis. Usually a fixed gap is left between adjacent bars. Some non-statisticians draw bar diagrams for data relating to time, for which the bar chart is not an appropriate chart.
Example
The table shows the blood groups of 230 patients who visited the Blood Bank of King Fahd Teaching Hospital of King Faisal University at Al-Khobar in January 1994. Prepare a suitable chart.
Solution
Since the data given in the table are categorical, the most appropriate diagram is a bar chart. There are 230 patients falling into 7 categories of blood group, and each category is represented by a bar of height equal to the number of patients in that category, as shown in Figure 1.1.
Subdivided and Multiple Bar Charts

If the data are grouped on the basis of two categorical variables, a subdivided bar chart displays the categories of one variable by bars whose heights correspond to the values of these categories, and displays the categories of the second variable by dividing each bar into parts whose sizes equal the values of the sub-categories. In a multiple bar chart, by contrast, two bars for each category are drawn side by side.
Example
Table 1.2 shows the types of investigation conducted on patients with breast disease in two studies at New Bury Hospital of Berkshire: study 1, from October 1 to December 31, 1989, and study 2, from April 16 to July 19, 1990. Prepare a suitable chart.
Solution (a)
Subdivided Bar Chart - The numbers in each category are added and a bar is drawn for each category total. Each bar is then divided according to the two studies, as shown in Fig.
Solution (b)
Multiple Bar Chart - In this diagram the same data are used, and the two bars for each type of investigation, one per study, are placed side by side as shown in Figure 1.3. The advantage of the multiple bar chart is that comparisons can be made easily. If there are more than two studies, more than two bars are drawn side by side.
Pie Chart
A pie chart is a pictorial presentation of the data. If a set of observations has K categories, it is represented by K sectors ("pies") of a circle. The angle of the ith sector at the center of the circle, denoted by Ai, is proportional to the number in that category. It is given by:

Ai = (ni / n) × 360°,

where ni is the number of observations in the ith category and n is the total number of observations.
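As a rough illustration of this calculation, the sketch below converts category counts to sector angles, assuming the full circle of 360 degrees is shared among the categories in proportion to their counts. The category names and counts are hypothetical and are not taken from any table in this text.

```python
# Hypothetical category counts (illustrative only).
counts = {"Category A": 50, "Category B": 30, "Category C": 20}

total = sum(counts.values())

# Angle of each sector: its share of the total, applied to 360 degrees.
angles = {name: count * 360 / total for name, count in counts.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Whatever the counts, the sector angles must add up to 360 degrees, which is a useful check before drawing the chart.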
Example
Table 1.3 shows the reported cases of AIDS in the 5 continents as of 17 Jan. 1992 (WHO). Prepare a suitable chart for the given data.
Solution
One might suggest that these data be represented by a bar chart. The answer is no: the difference between the minimum and maximum values is so large (a ratio of more than 1:10) that a bar chart for these data cannot be drawn on normal paper. Besides, we may be more interested in the proportional share of each continent than in the actual numbers. The appropriate chart for this type of data is therefore the pie chart, shown in Fig. 1.4.
Rates
Suppose that, in a specified population, n events occur during a fixed period of time. If n(A) of these events possess some characteristic, say A, then the rate of events having the characteristic A is given by

R(A) = [n(A) / n] × K per K units,

where the base K is usually taken as 1, 100, 1000, or 100000, etc.
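A minimal sketch of the rate calculation follows. The event counts and the base of 1000 are hypothetical values chosen for illustration; the function name `rate` is not from the text.

```python
def rate(n_a, n, base=1000):
    """Rate of events with characteristic A, expressed per `base` events."""
    return n_a * base / n


# Hypothetical example: 2500 events in the period, 75 of them with characteristic A.
n_events = 2500
n_with_a = 75

print(rate(n_with_a, n_events, base=1000))   # 30.0 per 1000
print(rate(n_with_a, n_events, base=100))    # 3.0 per 100
```

Changing the base only changes the unit in which the rate is reported, not the underlying proportion n(A)/n.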
