Basic Stat CH-1 and 2


CHAPTER ONE

1. Introduction
1.1 Definitions and classification of Statistics
Statistics has been defined differently by different authors over time. In the olden days, statistics was confined to state affairs, but today it embraces almost every sphere of human activity. As a result, many old definitions, which were confined to a narrow field of enquiry, have been replaced by more comprehensive and exhaustive ones.
We can define statistics in two senses:
• In the plural sense, statistics are the raw data themselves (numerical facts), such as statistics of births, deaths, students, or imports and exports.
• In the singular sense, statistics is the science of conducting studies to collect, organize, summarize and analyze data, as well as deriving valid conclusions and making reasonable decisions on the basis of data.
Classifications:
Depending on how data are used, statistics is commonly divided into two main branches.
1. Descriptive Statistics:
• Is concerned with summary calculations, graphs, charts and tables.
• In descriptive statistics our objective is to describe a group of data that we have 'in hand', i.e. data that are accessible to us.
• We are not interested in other data that we have not gathered.
• Generally characterizes or describes a set of data elements by graphically displaying the
information or describing its central tendencies and how it is distributed.
Example: the following data refer to the number of malaria patients treated in Debre Berhan Referral Hospital from 1986 to 1990 (Ethiopian Calendar).
3645; 4568; 5432; 6751; 7369
If we calculate the average number of malaria patients per year from 1986 to 1990 as
$\text{Average} = \frac{1}{5}(3645 + 4568 + 5432 + 6751 + 7369) = \frac{27765}{5} = 5553$,
then our work belongs to the domain of descriptive statistics.
If we say that there was an increase of 3724 patients from 1986 to 1990 (7369 − 3645 = 3724), then again this belongs to the domain of descriptive statistics.
2. Inferential Statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions. Statistical techniques based on probability theory are required.
Example 1.1: In the above example if we predict the number of malaria patients in the year
1995 to be 9917, then our work belongs to the domain of inferential statistics.
Example 1.2: Suppose we want to have an idea about the percentage of illiterates in our
country. We take a sample from the population and find the proportion of illiterates in the
sample. This sample proportion with the help of probability enables us to make some
inferences about the population proportion. This study belongs to inferential statistics.
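The two descriptive calculations in the malaria example (the five-year average and the change from 1986 to 1990) can be checked with a short Python sketch. It simply re-enters the five counts listed above, assuming they belong to the years 1986 through 1990 in the order given.

```python
# Malaria patients treated at Debre Berhan Referral Hospital, 1986-1990 (E.C.),
# assuming the five listed counts correspond to these years in order.
patients = {1986: 3645, 1987: 4568, 1988: 5432, 1989: 6751, 1990: 7369}

# Descriptive summary 1: the arithmetic mean of the yearly counts.
average = sum(patients.values()) / len(patients)

# Descriptive summary 2: the change between the first and the last year.
increase = patients[1990] - patients[1986]

print(f"Average patients per year: {average:.0f}")   # 5553
print(f"Increase from 1986 to 1990: {increase}")     # 3724
```

Both numbers only describe the data in hand; predicting a future year, as in Example 1.1, would require an inferential method.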

1.2 Stages in Statistical Investigation


Before we deal with statistical investigation, let us see what statistical data mean. Not every set of numerical data can be considered statistical data; the data must satisfy the following criteria:
• The data must be an aggregate of facts.
• They must be affected to a marked extent by a multiplicity of causes.
• They must be estimated according to reasonable standards of accuracy.
• The data must be collected in a systematic manner for a predefined purpose.
• The data should be placed in relation to each other.
A statistician should be involved at all the different stages of a statistical investigation. These include formulating the problem, and then collecting, organizing and classifying, presenting, analyzing and interpreting the statistical data. Let's see each stage in detail.
I. Formulating the problem: research starts only when there is a problem. At this stage the investigator must be sure to understand the problem and then formulate it in statistical terms. Clarify the objectives very carefully. Ask as many questions as necessary, because "An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question." (The first golden rule of applied mathematics)

Therefore, the first stage in any statistical investigation should be to:
• Get a clear understanding of the physical background to the situation under study;
• Clarify the objectives;
• Formulate the objectives in statistical terms.
II. Proper collection of data: in order to draw valid conclusions, it is important to have 'good' data. Data are gathered with the aim of meeting predetermined objectives; in other words, the data must provide answers to the problem. The data form the foundation of statistical analysis, and hence they must be carefully and accurately collected. In Section 2.1 we will see the methods of data collection.
III. Organization and classification of data: in this stage the collected data are organized in a systematic manner; that is, the data are placed in relation to each other. The classification or sorting of data is, by itself, a kind of organization of data.
IV. Presentation of data: The purpose of putting the organized data in graphs, charts and
tables is two-fold. First, it is a visual way to look at the data and see what happened and
make interpretations. Second, it is usually the best way to show the data to others. Reading
lots of numbers in the text puts people to sleep and does little to convey information.
V. Analysis of data: the process of looking at and summarizing data with the intent to extract useful information and develop conclusions. Data analysis is closely related to data mining, but data mining tends to focus on larger data sets, with less emphasis on making inference, and often uses data that were originally collected for a different purpose. In this stage different types of inferential statistical methods are applied, for instance hypothesis tests such as the $\chi^2$ test of association.
VI. Interpretation of data: interpretation means drawing valid conclusions from data which
form the basis of decision making. Correct interpretation requires a high degree of skill
and experience.
Note that: Analysis and interpretation of data are two sides of the same coin.

1.3 Definition of Some Terms


In this section, we define the terms that will be used most frequently.
Data: the values (measurements or observations) that the variables can assume; more broadly, facts or figures from which conclusions can be drawn.
Data set: the facts or figures collected for a particular study. Each value in the data set is called a data value or datum.
Raw Data: Data sheets are where the data are originally recorded. Original data are called raw
data. Data sheets are often hand drawn, but they can also be printouts from database programs
like Microsoft Excel.
Population: The totality of all subjects with certain common characteristics that are
being studied in a specified time and place.
Sample: a portion of a population selected using some sampling technique. A sample must be representative of the population, so it should be selected by one of the established sampling techniques.
Sampling: Is the process of selecting units (e.g., people, households, organizations) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen.
Sample size: the number of elements or observations to be included in the sample.
Parameter: any measure computed from the data of a population. Example: the population mean ($\mu$) and population standard deviation ($\sigma$).
Statistic: any measure computed from a sample. Example: the sample mean ($\bar{x}$) and sample standard deviation ($s$).
Survey: A collection of quantitative information about members of a population when no special
control is exercised over any of the factors influencing the variable of interest.
Sample survey: a survey that includes only a portion of the population.
Census: a collection of information about every member of a population.
A sample survey has the following advantages over a census:
• It saves time and cost.
• It can be more accurate, because a smaller number of units can be handled more carefully.
• It avoids wastage of material.
Variable: a characteristic or attribute that can assume different values. Variables whose values are determined by chance are called random variables. Variables are often specified according to their type and intended use; accordingly, variables can be classified into two types, namely qualitative and quantitative variables.

• A quantitative variable is naturally measured as a number for which meaningful
arithmetic operations make sense. Examples: Height, age, crop yield, GPA, salary,
temperature, area, air pollution index (measured in parts per million), etc.
• Qualitative variable: Any variable that is not quantitative is qualitative. Qualitative
variables take a value that is one of several possible categories. As naturally measured,
qualitative variables have no numerical meaning. Examples: Hair color, gender, field of
study, marital status, political affiliation, status of disease infection.
Quantitative variables can be classified as discrete and continuous variable.
1. Discrete variables can assume only certain numerical values, such as 0, 1, 2, ...; that is, there are gaps between the possible values. The set of possible values may be finite or countably infinite. For example, the number of students in a classroom or the number of children in a family.
2. Continuous variables can take any value within a specified interval, given a fine enough measuring device; there are no gaps between possible values. They are obtained by measuring. For example, height is continuous: no matter how close the heights of two people are, we can find another person whose height falls somewhere between the two.

1.4 Applications, Uses and Limitations of Statistics


I. Applications of Statistics
• Apart from helping elicit an intelligent assessment from a body of figures and facts, statistics is an indispensable tool for any scientific enquiry, from the stage of planning the enquiry to the stage of drawing conclusions. It applies to almost all sciences: pure and applied, physical, natural, biological, medical, agricultural and engineering. It also finds applications in the social and management sciences, and in commerce, business and industry.
• It is used in almost all fields of human endeavor.
• Almost all human beings deal with numerical facts in their daily lives.
• It is applied in specific processes, e.g. the development of certain drugs and the assessment of the extent of environmental pollution.
• It is used in industry, especially in quality control.
II. Uses of Statistics
• Statistics presents facts in the form of numerical data.
• It condenses and summarizes a mass of data into a few presentable and precise figures.
• It facilitates comparison of data.
• It helps in formulating and testing hypotheses.
• It helps in predicting future trends.
• It helps in formulating policies.
III. Limitations of Statistics
Despite its wide application in every sphere of human activity, statistics has its own limitations. Some of them are given below.
• Statistics is not suitable for the study of qualitative phenomena: since statistics is basically a science that deals with numerical data, it is applicable only to those subjects of enquiry which can be expressed in terms of quantitative measurements. Qualitative phenomena like honesty, poverty, beauty and intelligence cannot be expressed numerically, so statistical analysis cannot be applied to them directly. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to accurate quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.
• Statistics does not study individuals: statistics does not give any specific importance to individual items; it deals with an aggregate of objects. Individual items, taken individually, do not constitute statistical data and do not serve any purpose in a statistical enquiry.
• Statistical laws are not exact: mathematical and physical sciences are exact, but statistical laws are only approximations. Statistical conclusions are not universally true; they are true only on average.
• Statistics may be misused: statistics must be used only by experts; otherwise, statistical methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions. Statistics can also be easily misused by quoting wrong figures. As King aptly says, 'statistics are like clay of which one can make a God or Devil as one pleases.'
• Statistics is only one of the methods of studying a problem: statistical methods do not provide a complete solution to problems, because problems must be studied taking the background of the country's culture, philosophy or religion into consideration. Thus a statistical study should be supplemented by other evidence.
1.5 Scales of Measurement
Normally, when people hear the term measurement, they think of measuring the length of something (e.g. the length of a piece of wood) or measuring a quantity of something (e.g. a cup of flour). This represents a limited use of the term. In statistics, the term measurement is used more broadly and is more appropriately termed scales of measurement.
Scales of measurement refer to ways in which variables or numbers are defined and categorized.
Each scale of measurement has certain properties which in turn determine the appropriateness for
use of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval,
and ratio.
1. Nominal Scales
Nominal scales possess the following properties:
• Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
• No arithmetic or relational operations can be applied.
• No quantitative information is conveyed.
• Thus it only gives names or labels to various categories.
Examples:
• Political party preference (Republican, Democrat, or Other)
• Sex (Male or Female)
• Marital status (married, single, widowed, divorced)
• Country code
• Regional differentiation of Ethiopia
2. Ordinal Scales
Ordinal scales are measurement systems that possess the following properties:
• Level of measurement which classifies data into categories that can be ranked; however, the differences between the ranks are not meaningful.
• Arithmetic operations are not applicable, but relational operations are applicable.
• Ordering is the sole property of an ordinal scale.
Examples:
• Letter grades (A, B, C, D, F)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Military rank
3. Interval Scales
Interval scales are measurement systems that possess the following properties:
• Level of measurement which classifies data that can be ranked and whose differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
• Addition and subtraction are applicable, but ratios (division) are not.
• Relational operations are also possible.
Examples:
• IQ, temperature in °F.
4. Ratio Scales
Ratio scales possess the following properties:
• Level of measurement which classifies data that can be ranked, whose differences are meaningful, and which have a true zero. True ratios exist between the different units of measure.
• All arithmetic and relational operations are applicable.
Examples:
• Weight
• Height
• Number of students
• Age
Uses of levels of measurement
• They help you decide how to interpret the data from a variable.
• They help you decide what statistical analysis is appropriate for the values that were assigned. For example, if a measurement is nominal, you would never average the data values.
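The difference between interval and ratio scales can be seen numerically. The sketch below converts two temperatures from °F (an interval scale) to kelvin, a temperature scale whose zero is absolute zero and therefore a ratio scale. The two temperatures are arbitrary illustrative values, not data from this text, and the function name is our own.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert a Fahrenheit reading (interval scale) to kelvin (ratio scale)."""
    return (temp_f - 32) * 5 / 9 + 273.15


# Two illustrative temperatures (arbitrary values, not data from this text).
cool_f, warm_f = 40.0, 80.0

ratio_f = warm_f / cool_f                                              # "twice as hot"?
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)  # measured from a true zero
diff_k = fahrenheit_to_kelvin(warm_f) - fahrenheit_to_kelvin(cool_f)   # the 40 °F gap in kelvin

print(f"Ratio on the °F scale: {ratio_f:.2f}")   # 2.00
print(f"Ratio on the K scale:  {ratio_k:.2f}")   # 1.08
print(f"Difference in K:       {diff_k:.2f}")    # 22.22
```

Because the Fahrenheit zero is arbitrary, the apparent ratio of 2 depends only on where zero happens to sit; the difference, however, converts to the same kelvin interval (40 × 5/9 ≈ 22.22 K) wherever the two readings lie on the scale.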

CHAPTER TWO
2. Methods of Data Collection and Presentation
2.1 Methods of Data Collection
Once it is decided what type of study is to be made, it becomes necessary to collect information about the study, mostly in the form of data. In order to draw valid conclusions from data, information has to be collected in a systematic manner. Whatever the quality of the sampling and analysis methods, a haphazardly collected data set is less likely to produce valuable and generalizable information.
2.1.1 Sources of Data
There are two sources of data: primary and secondary. Depending on its source, data can also be classified into two types:
(1) Primary data (2) Secondary data
1) Primary data
• Primary data are first-hand information collected, compiled and published by an organization for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment.
• They are collected by conducting a survey to meet the specific needs of the problem at hand.
Example: Population census reports are primary data because they are collected, compiled and published by the population census organization.
2) Secondary data
• Secondary data are second-hand information which has already been collected by someone (an organization) for some purpose and is available for the present study. Secondary data are not pure in character and have undergone some treatment at least once.
• They are taken from already available published or unpublished sources.
There are three major methods of data collection:
1. Self-administered questionnaire
2. Direct investigation: measurement (observation) of the subject and interviewing (face-to-face, telephone, …)
3. The use of documentary sources
1. Self-administered questionnaire
The questionnaire is the main data collection instrument in a formal sample survey. Before examining the steps in designing a questionnaire, we need to review the types of questions used in questionnaires. Depending on the amount of freedom given to the respondent in offering responses, there are two basic types of questions that can be used: open-ended questions and closed-ended questions.
The type of questions for use will be determined by the form of responses wanted, the nature of
the respondents and their ability to answer the questions.
Open-ended questions: allow the respondent to answer freely in his or her own words.
Example: What do you think are the reasons for a high drop-out rate of village health committee members?
Closed-ended questions: a predetermined list of alternative responses is presented to the respondent, who checks the appropriate one(s). This means the respondent's answers are restricted to a limited range of alternatives.
Advantages
• It is the cheapest method and can be conducted by a single researcher.
• Questionnaires can be sent to a wide geographical area.
• There is no interviewer variability.
Disadvantages
• Low response rate
• No assurance that the questionnaire was answered by the right person
• Mail questionnaires are not suitable for an illiterate community
2. Direct investigation
i. Measurement and/or observation
• Data can be obtained through direct observation or measurement.
• This provides accurate information, but it is expensive and inconvenient.
e.g. land area measurement, animal weight gain, physical examination, direct observation of work.
ii. Interview
a) Face-to-face interview
Advantages:
• Interviewers can observe the surroundings and can use nonverbal communication and visual aids.
• The interviewer can help the respondent if he/she has difficulty in understanding the questions.
• The respondent is likely to answer all the questions personally.
Disadvantages:
• Cost is high.
• Interviewer bias is also high.
• An untrained interviewer may distort the meaning of the questions.
b) Telephone interview
Advantages:
• It is less expensive in time and money compared to face-to-face interviews.
• Relatively high response rate
• It can reach people who would not open their doors to an interviewer but might be willing to talk on the telephone.
Disadvantages:
• Unrepresentative of groups which do not have telephones
• Unlisted telephone numbers are excluded from the study.
• The respondent may be substituted by another person.
3. The use of documentary sources
• Extracting information from existing records.
• Much less expensive than the other two methods.
• It is difficult to get the information needed when records are compiled in an unstandardized manner.
Examples: hospital records, records of professional institutes, official statistics, etc.
Editing of Data:
After collecting the data from either primary or secondary sources, the next step is editing. Editing means examining the collected data to discover any errors and mistakes before presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent of errors can be tolerated in the inquiry. Editing secondary data is simpler than editing primary data.
2.2 Methods of Data Presentation
2.2.1 Introduction
This topic introduces tabular and graphical methods commonly used to summarize both qualitative and quantitative data. Tabular and graphical summaries of data can be found in annual reports, newspaper articles and research studies. Everyone is exposed to these types of presentations, so it is important to understand how they are prepared and how they should be interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data and preparing graphical presentations. MINITAB, SPSS, STATA and R are four packages that are widely available.
Tabulation of Data: the process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements, whereas columns are vertical arrangements.
2.2.2 Frequency Distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies. There are three basic types of frequency distributions, and there are specific procedures for constructing each type. The three types are categorical, ungrouped and grouped frequency distributions.
The reasons for constructing a frequency distribution are as follows
• To organize the data in a meaningful, intelligible way.
• To enable the reader to determine the nature or shape of the distribution
• To facilitate computational procedures for measures of average and spread
• To enable the researcher to draw charts and graphs for the presentation of data
• To enable the reader to make comparisons between different data sets
Some basic terms that are most frequently used when dealing with frequency distributions are the following:
• Lower class limits are the smallest numbers that can belong to the different classes.
• Upper class limits are the largest numbers that can belong to the different classes.
• Class boundaries are the numbers used to separate classes, but without the gaps created by class limits.
• Class midpoints are the midpoints of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.
• Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
2.2.2.1 Categorical Frequency Distribution
The categorical frequency distribution is used for data which can be placed in specific categories, such as nominal or ordinal level data. For example, data such as political affiliation, religious affiliation, or major field of study would use a categorical frequency distribution.
The major components of a categorical frequency distribution are class, tally and frequency. Moreover, although percentage is not normally part of a frequency distribution, it is added here since it is used in certain types of graphical presentations, such as the pie graph.
Steps for constructing a categorical frequency distribution
1. Identify that the data are on a nominal or ordinal scale of measurement.
2. Make a table as shown below.
A B C D
Class Tally Frequency Percent
3. Put the distinct values of the data set in column A.
4. Tally the data and place the results in column B.
5. Count the tallies and place the results in column C.
6. Find the percentage of values in each class by using the formula
$\% = \frac{f}{n} \times 100$
where $f$ is the frequency of the class and $n$ is the total number of values.

Example 2.1: Twenty-five army inductees were given a blood test to determine their blood type. The data
set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Solution:
A B C D
class Tally Frequency Percent
A //// 5 20
B //// // 7 28
O //// //// 9 36
AB //// 4 16

2.2.2.2 Ungrouped Frequency Distribution


When the data are numerical instead of categorical, the range of the data is small, and each class is only one unit wide, the distribution is called an ungrouped frequency distribution.
The major components of this type of frequency distribution are class, tally, frequency and cumulative frequency. The steps are similar to those of the categorical frequency distribution.
Cumulative frequencies are used to show how many values are accumulated up to and including a specific class.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50 workers
of a company over the last 6 weeks.
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
A. Construct ungrouped frequency distribution
B. How many workers had at least 1 day of sick leave?
Solution:
A. Since this data set contains only a relatively small number of distinct or different values, it is
convenient to represent it in a frequency table which presents each distinct value along with its
frequency of occurrence.

Class Frequency Cumulative Frequency
0 12 12
1 8 20
2 5 25
3 4 29
4 5 34
5 8 42
7 5 47
8 2 49
9 1 50

B. Since 12 of the 50 workers had no days of sick leave, the number who had at least 1 day is 50 − 12 = 38.
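A similar sketch for Example 2.2 builds the ungrouped distribution and its cumulative frequencies, using `itertools.accumulate` for the running total. Each distinct value is treated as its own class, as in the table above.

```python
from collections import Counter
from itertools import accumulate

# Days of sick leave for the 50 workers in Example 2.2.
sick_days = [
    2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1,
    7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
    1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5,
    1, 3, 0, 2, 4, 5, 0, 5, 7, 5, 1, 1,
    0, 2,
]

counts = Counter(sick_days)
classes = sorted(counts)                 # each distinct value is one class
freqs = [counts[c] for c in classes]
cum_freqs = list(accumulate(freqs))      # running total up to and including each class

for c, f, cf in zip(classes, freqs, cum_freqs):
    print(f"{c:>5}{f:>5}{cf:>5}")

# Part B: workers with at least 1 day = total minus those with 0 days.
at_least_one = len(sick_days) - counts[0]
print("At least 1 day of sick leave:", at_least_one)   # 38
```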

2.2.2.3 Grouped Frequency Distribution


When the range of the data is large, the data must be grouped into classes that are more than one unit wide. To construct this frequency distribution, follow these steps.
1. Find the highest value ($H$) and the lowest value ($L$).
2. Find the range: $R = H - L$.
3. Select the number of classes desired. Here, we have two choices to get the desired number of classes:
a) Use Sturges' rule, $k = 1 + 3.322 \log_{10} n$, where $k$ is the number of classes and $n$ is the number of observations.
b) Select the number of classes arbitrarily between 5 and 20. This is the conventional way, and it is appropriate if Sturges' rule is not used.
When we choose the classes, we have to consider the following criteria:
• The classes must be mutually exclusive. Mutually exclusive classes have non-overlapping class limits, so that a value can't be placed into two classes.
• The classes must be continuous. Even if there are no values in a class, the class must be included in the frequency distribution; there should be no gaps. The only exception occurs when the class with zero frequency is the first or last: a class with zero frequency at either end can be omitted without affecting the distribution.
• The classes must be equal in width, so that the data are not given a distorted view. One exception occurs when a distribution is open-ended, i.e. it has no specific beginning or end value.
4. Find the class width by dividing the range by the number of classes: $w = \frac{R}{k}$.
Note that: round the answer up to the nearest whole number if there is a remainder. For instance, $w = 33/6 = 5.5$ is rounded up to 6.
5. Select the starting point as the lowest class limit. This is usually the lowest score (observation). Add the width to that score to get the lower class limit of the next class. Keep adding until you have the number of classes chosen in step 3.
6. Find the upper class limits: subtract the unit of measurement from the lower class limit of the second class to get the upper limit of the first class. Then add the width to each upper class limit to get all the upper class limits.
Unit of measurement ($U$): the difference between a recorded value and the next possible value. For instance, if the data are 28, 23, 52, ..., the unit of measurement is one: take one datum arbitrarily, say 23; the next possible value is 24, so $U = 24 - 23 = 1$. If the data are 24.12, 30, 21.2, ..., give priority to the datum with more decimal places. Take 24.12 and find the next possible value, 24.13. Therefore, $U = 24.13 - 24.12 = 0.01$.
Note that: $U = 1$ is the maximum value of the unit of measurement, and it is the value to use when we don't have a clue about the data.
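7. Find the class boundaries:
$\text{LCB} = \text{LCL} - \frac{U}{2}$ and $\text{UCB} = \text{UCL} + \frac{U}{2}$,
where LCB and UCB are the lower and upper class boundaries, and LCL and UCL are the lower and upper class limits.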
8. Tally the data and write the numerical values of the tallies in the frequency column.
9. Find the cumulative frequencies. There are two types of cumulative frequency, namely less than cumulative frequency and more than cumulative frequency. The less than cumulative frequency of a class is obtained by successively adding the frequencies of all the previous classes, including the class against which it is written; the accumulation runs from the lowest class to the highest. The more than cumulative frequency is obtained by accumulating the frequencies starting from the highest class down to the lowest.
For example, the following frequency distribution table gives the marks obtained by 40 students:

The above table shows how to find less than cumulative frequency and the table shown below
shows how to find more than cumulative frequency.
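Steps 6 and 7 can be expressed as two small helper functions. This is a sketch under our own assumptions: the names `unit_of_measurement` and `class_boundaries` are hypothetical, and the data are passed as strings so that the number of recorded decimal places is preserved (a float such as 21.20 would lose its trailing zero). `Decimal` is used so that values like 0.01 are represented exactly.

```python
from decimal import Decimal


def unit_of_measurement(values: list[str]) -> Decimal:
    """Hypothetical helper: U from the datum with the most decimal places.

    Values are strings so the recorded decimal places are kept exactly.
    """
    # Decimal("24.12").as_tuple().exponent is -2, i.e. two decimal places.
    places = max(-Decimal(v).as_tuple().exponent for v in values)
    places = max(places, 0)          # whole numbers have no decimal places
    return Decimal(10) ** -places    # 0 places -> 1, 2 places -> 0.01


def class_boundaries(lcl: Decimal, ucl: Decimal, unit: Decimal) -> tuple[Decimal, Decimal]:
    """Hypothetical helper: LCB = LCL - U/2 and UCB = UCL + U/2."""
    half = unit / 2
    return lcl - half, ucl + half


print(unit_of_measurement(["28", "23", "52"]))        # 1
print(unit_of_measurement(["24.12", "30", "21.2"]))   # 0.01
print(class_boundaries(Decimal("6"), Decimal("11"), Decimal("1")))
```

The last call gives the boundaries 5.5 and 11.5 for a class with limits 6 and 11 when $U = 1$.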

Example 2.3: Consider the following set of data and construct the grouped frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
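Steps
1. $H = 39$ and $L = 6$.
2. $R = H - L = 39 - 6 = 33$.
3. $k = 1 + 3.322 \log_{10} 20 \approx 5.32$, which is rounded up to 6 classes.
4. $w = \frac{R}{k} = \frac{33}{6} = 5.5$, which is rounded up to 6.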
5. Select the starting point. Take the minimum, 6, then add the width 6 to it to get the LCL of the next class, and so on. The lower class limits are:
6 12 18 24 30 36
6. Upper class limits. Since the unit of measurement is 1, the UCL of the first class is $12 - 1 = 11$. Therefore, 6-11 is the first class.
Class limits: 6-11 12-17 18-23 24-29 30-35 36-41
7. Find the class boundaries using the formulas in step 7. For the first class, $\text{LCB} = 6 - \frac{1}{2} = 5.5$ and $\text{UCB} = 11 + \frac{1}{2} = 11.5$.
Class boundaries: 5.5-11.5 11.5-17.5 17.5-23.5 23.5-29.5 29.5-35.5 35.5-41.5

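8. Steps 8 and 9: tally the data and find the cumulative frequencies.
Class limits Class boundaries Frequency Less than cf More than cf
6-11 5.5-11.5 2 2 20
12-17 11.5-17.5 2 4 18
18-23 17.5-23.5 7 11 16
24-29 23.5-29.5 4 15 9
30-35 29.5-35.5 3 18 5
36-41 35.5-41.5 2 20 2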
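The whole procedure for Example 2.3 can be sketched in Python as follows. One assumption goes beyond the text: the Sturges' value $k$ is rounded up to a whole number (5.32 becomes 6), which reproduces the six classes used above. The variable names are illustrative.

```python
import math
from itertools import accumulate

# The 20 observations of Example 2.3.
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
unit = 1  # whole-number data, so the unit of measurement U is 1

n = len(data)
high, low = max(data), min(data)               # step 1
data_range = high - low                        # step 2: R = H - L
k = math.ceil(1 + 3.322 * math.log10(n))       # step 3: Sturges' rule, rounded up (assumption)
width = math.ceil(data_range / k)              # step 4: w = R / k, rounded up

# Step 5: lower class limits start at the minimum and step by the width.
lcls = [low + i * width for i in range(k)]
# Step 6: each upper limit is the next lower limit minus U.
ucls = [lcl + width - unit for lcl in lcls]
# Step 7: boundaries lie half a unit outside the limits.
boundaries = [(lcl - unit / 2, ucl + unit / 2) for lcl, ucl in zip(lcls, ucls)]
midpoints = [(lcl + ucl) / 2 for lcl, ucl in zip(lcls, ucls)]

# Step 8: count the values falling within each pair of class limits.
freqs = [sum(lcl <= x <= ucl for x in data) for lcl, ucl in zip(lcls, ucls)]

# Step 9: less-than (lowest to highest) and more-than (highest to lowest) cumulative frequencies.
less_than = list(accumulate(freqs))
more_than = list(accumulate(reversed(freqs)))[::-1]

print("Limits  Boundaries  Midpoint  f  <cf  >cf")
for lcl, ucl, (lcb, ucb), mid, f, lt, mt in zip(
        lcls, ucls, boundaries, midpoints, freqs, less_than, more_than):
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  {mid}  {f}  {lt}  {mt}")
```

The more-than column is computed by accumulating the reversed frequencies and reversing the result back, which is the "highest to lowest" running total described in step 9.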

2.2.2.4 Relative Frequency Distribution


An important variation of the basic frequency distribution uses relative frequencies, which are easily found by dividing each class frequency by the total of all frequencies:
$\text{Relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}} = \frac{f}{n}$
A relative frequency distribution includes the same class limits as a frequency distribution, but relative frequencies are used instead of actual frequencies. Relative frequencies are sometimes expressed as percentages.
A relative frequency distribution enables us to understand the distribution of the data and to compare different sets of data.
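As an illustration, take the class frequencies of Example 2.3 (2, 2, 7, 4, 3, 2, obtained by tallying its 20 values); each relative frequency is the class frequency divided by 20. A minimal sketch:

```python
# Classes and frequencies obtained by tallying the data of Example 2.3.
classes = ["6-11", "12-17", "18-23", "24-29", "30-35", "36-41"]
freqs = [2, 2, 7, 4, 3, 2]

n = sum(freqs)                             # total of all frequencies (20)
rel_freqs = [f / n for f in freqs]         # relative frequency = f / n
percents = [100 * f / n for f in freqs]    # the same, expressed as a percent

for cls, rf, pct in zip(classes, rel_freqs, percents):
    print(f"{cls:>6}  {rf:.2f}  {pct:.0f}%")
```

Because the relative frequencies always sum to 1 (or 100%), distributions based on different numbers of observations can be compared directly.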

2.2.3 Diagrammatic and Graphical Presentation of Data
We have discussed the techniques of classification and tabulation that help us organize the collected data in a meaningful fashion. However, this way of presenting statistical data does not always prove interesting to a layman. Too many figures are often confusing and fail to convey the message effectively.
One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in which statistical data may be displayed pictorially, such as different types of graphs and diagrams.
General steps in constructing graphs
1. Draw and label the $x$ and $y$ axes.
2. Choose a suitable scale for the frequencies or cumulative frequencies and label it on the $y$ axis.
3. Represent the class boundaries for the histogram or ogive, or the midpoints for the frequency polygon, on the $x$ axis.
4. Plot the points.
5. Draw the bars or lines.
2.2.3.1 Diagrammatic display of data: Bar charts, Pie-chart, Cartograms
I. Pie chart
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a
circular diagram in which the area of each sector of the circle represents a component. Circles are drawn
with radii proportional to the square root of the quantities, because the area of a circle is $\pi r^2$.
To construct a pie chart (sector diagram), we draw a circle with radius proportional to the square root of
the total. The total angle of the circle is $360^\circ$. The angle of each component is calculated by the formula

$$\text{Angle of component} = \frac{\text{Value of the component}}{\text{Total value}} \times 360^\circ$$

These angles are marked off in the circle by means of a protractor to show the different components. The
sectors are usually arranged anticlockwise.
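The sector-angle rule (each component's share of the total, multiplied by 360°) can be sketched in Python as follows. The helper `pie_angles` is a hypothetical name, and the budget amounts are invented for illustration; they are not the figures of Example 2.4.

```python
def pie_angles(components):
    """Return the sector angle in degrees of each component:
    angle = component value / total value * 360."""
    total = sum(components.values())
    return {name: value * 360 / total for name, value in components.items()}


# Hypothetical monthly budget in arbitrary currency units (illustration only).
budget = {"Food": 500, "Clothing": 100, "House Rent": 250,
          "Fuel and Light": 50, "Miscellaneous": 100}
for name, angle in pie_angles(budget).items():
    print(f"{name:<15} {angle:6.1f} degrees")
```

Whatever the amounts, the angles always add up to 360°, which is a quick check on the arithmetic before drawing the sectors with a protractor.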
Example 2.4: The following table gives the details of the monthly budget of a family. Represent these
figures by a suitable diagram.

Solution: The necessary computations are given below:

[Figure: Pie chart of the family's monthly budget: Food 40%, House Rent 27%, Miscellaneous 20%, Clothing 6.67%, Fuel and Light 6.67%]

II. Bar Charts


A bar graph (simple bar chart, multiple bar chart, and stratified or stacked bar chart) uses vertical or
horizontal bars to represent the frequencies of a distribution. When drawing a bar chart, we have to
consider the following two points:
• Make the bars the same width.
• Make the units on the frequency axis equal in size.

a) A simple bar chart is used to represent data involving only one variable classified on a
spatial, quantitative or temporal basis. In a simple bar chart, we make bars of equal width but
variable length, i.e. the magnitude of a quantity is represented by the height or length of the
bars. The following steps are undertaken in drawing a simple bar diagram:
• Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate
place on the paper.
• Take the basis of classification along the horizontal line (X-axis) and the observed variable
along the vertical line (Y-axis), or vice versa.
• Mark bases of equal breadth for each class and leave a gap between two classes that is
equal to this breadth or not less than half of it.
• Finally, mark the values of the given variable to prepare the required bars.
Example 2.5: Draw a simple bar diagram to represent the profits of a bank for 5 years.

Years                1989   1990   1991   1992   1993
Profit (million $)     10     12     18     25     42
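A simple bar chart can be sketched in plain text with Python, using the profits from Example 2.5. The helper `text_bar_chart` is hypothetical; it draws horizontal bars of equal thickness (one line each) whose lengths are proportional to the values. A plotting library would normally be used for a finished diagram.

```python
def text_bar_chart(labels, values, width=40):
    """Return a horizontal simple bar chart as text.

    Every bar has the same thickness (one line); its length is proportional
    to the value, scaled so the largest value gets `width` characters.
    """
    largest = max(values)
    lines = []
    for label, value in zip(labels, values):
        length = round(value / largest * width)
        lines.append(f"{label} | {'#' * length} {value}")
    return "\n".join(lines)


years = [1989, 1990, 1991, 1992, 1993]
profit = [10, 12, 18, 25, 42]  # million $, from Example 2.5
print(text_bar_chart(years, profit))
```

Using one common scale for all bars is what keeps their lengths comparable, in line with the rule that the units on the frequency axis must be equal in size.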

b) Multiple bar charts are used when two or more sets of inter-related data are represented
(a multiple bar diagram facilitates comparison between more than one phenomenon). The
technique of the simple bar chart is used to draw this diagram, but the difference is that we use
different shades, colors, or dots to distinguish between the different phenomena.
Example 2.6: Draw a multiple bar chart to represent the imports and exports of Canada (values in $) for the
years 1991 to 1995.

Years 1991 1992 1993 1994 1995
Imports 7930 8850 9780 11720 12150
Exports 4260 5225 6150 7340 8145
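A multiple bar chart can be approximated in the same text-only way, using the import and export values from Example 2.6. In this hypothetical sketch (`multiple_bar_chart`), each series gets its own symbol in place of different shades or colours, and all bars share one scale so that the two series stay comparable.

```python
def multiple_bar_chart(categories, series, width=40):
    """Return grouped horizontal bars as text: one bar per series for each
    category. Each series uses its own symbol (supports up to four series)."""
    largest = max(max(values) for values in series.values())  # one common scale
    symbols = "#*=+"
    lines = []
    for i, category in enumerate(categories):
        for (name, values), symbol in zip(series.items(), symbols):
            length = round(values[i] / largest * width)
            lines.append(f"{category} {name:<7} | {symbol * length} {values[i]}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


years = [1991, 1992, 1993, 1994, 1995]
trade = {
    "Imports": [7930, 8850, 9780, 11720, 12150],  # from Example 2.6
    "Exports": [4260, 5225, 6150, 7340, 8145],
}
print(multiple_bar_chart(years, trade))
```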

c) A stratified (stacked or component) bar chart is used to represent data in which the total
magnitude is divided into different parts or components. In this diagram, we first make simple bars for
each class, taking the total magnitude in that class, and then divide these simple bars into parts in the
ratio of the various components. This type of diagram shows the variation in the different components
within each class as well as between different classes. A sub-divided bar diagram is also known as a
component bar chart or stacked chart.
Example 2.7: The table below shows the quantity (in hundreds of kg) of wheat, barley and oats produced on
a certain farm during the years 1991 to 1994. Draw a stratified bar chart.

Years    1991   1992   1993   1994
Wheat      34     43     43     45
Barley     18     14     16     13
Oats       27     24     27     34

Solution: To make the component bar chart, we first compute the total production for each year.

Years    1991   1992   1993   1994
Wheat      34     43     43     45
Barley     18     14     16     13
Oats       27     24     27     34
Total      79     81     86     92
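The stacking step behind a component bar chart can be sketched in Python with the crop data from Example 2.7: each component's segment starts where the previous one ends, so the top of the last segment equals that year's total. The function `stacked_segments` is a hypothetical name used only for illustration.

```python
def stacked_segments(categories, components):
    """For each category, return ([(component, start, end), ...], total).

    Segments are stacked from 0 upward in the order the components are given.
    """
    result = {}
    for i, category in enumerate(categories):
        start = 0
        segments = []
        for name, values in components.items():
            end = start + values[i]
            segments.append((name, start, end))
            start = end
        result[category] = (segments, start)  # start now equals the total
    return result


years = [1991, 1992, 1993, 1994]
crops = {  # hundreds of kg, from Example 2.7
    "Wheat": [34, 43, 43, 45],
    "Barley": [18, 14, 16, 13],
    "Oats": [27, 24, 27, 34],
}
for year, (segments, total) in stacked_segments(years, crops).items():
    print(year, total, segments)
```

The totals it produces (79, 81, 86, 92) are the heights of the full bars in the table above.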

The required diagram is given below:

2.2.3.2. Graphical presentation of data: Histogram, Frequency Polygon, Ogive Curves
Statistical graphs can be used to describe the data set or to analyze it. Graphs are also useful in getting the
audience's attention in a publication or a speaking presentation.
They can be used to discuss an issue, reinforce a critical point, or summarize a data set. They can also be
used to discover a trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
i. The histogram.
ii. The frequency polygon.
iii. The cumulative frequency graph, or ogive (pronounced o-jive).
i. Histogram
A histogram is a special type of bar graph in which the horizontal scale represents classes of data values and
the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and
the bars are drawn adjacent to each other (without gaps).
We can construct a histogram after we have first completed a frequency distribution table for a data set.
The horizontal (x) axis is reserved for the class boundaries.
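Because a histogram's bars run from one class boundary to the next, adjacent bars share an edge and no gaps appear. The hypothetical Python helper `histogram_bars` returns each bar's left edge, width and height; the boundaries are those of the worked example above, but the frequencies are assumed values for illustration, not the counts of Example 2.3.

```python
def histogram_bars(boundaries, freqs):
    """Return (left edge, width, height) for each histogram bar.

    Each bar spans one class, from its lower to its upper class boundary,
    so consecutive bars share an edge and the histogram has no gaps.
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than classes")
    return [(boundaries[i], boundaries[i + 1] - boundaries[i], f)
            for i, f in enumerate(freqs)]


boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]  # from the worked example
freqs = [2, 4, 6, 4, 3, 1]  # hypothetical frequencies
for left, width, height in histogram_bars(boundaries, freqs):
    print(f"{left:5.1f}-{left + width:5.1f} | {'#' * height}")
```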
Example 2.8: Take the data in example 2.3.

[Figure: Histogram of the data in Example 2.3. Horizontal axis: class boundaries 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5; vertical axis: frequency (0 to 7).]

A relative frequency histogram has the same shape and horizontal (x) scale as a histogram, but the
vertical (y) scale is marked with relative frequencies instead of actual frequencies.
ii. Frequency Polygon
A frequency polygon uses line segments connected to points located directly above the class midpoint values.
The heights of the points correspond to the class frequencies, and the line segments are extended to the
left and right so that the graph begins and ends on the horizontal axis, at the positions where the
previous and next midpoints would be located.
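The points of a frequency polygon can be computed as a short Python sketch: one point above each class midpoint, plus a zero-frequency point one class width before the first midpoint and one after the last. The sketch assumes equal class widths; `frequency_polygon_points` and the frequencies are hypothetical, while the class limits come from the worked example.

```python
def frequency_polygon_points(limits, freqs):
    """Return (x, y) points of a frequency polygon.

    x is the class midpoint and y the class frequency. One extra point with
    frequency 0 is added a class width before the first midpoint and after
    the last, so the polygon starts and ends on the horizontal axis.
    Assumes all classes have the same width.
    """
    mids = [(lower + upper) / 2 for lower, upper in limits]
    width = mids[1] - mids[0]
    return [(mids[0] - width, 0)] + list(zip(mids, freqs)) + [(mids[-1] + width, 0)]


limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]  # worked example
freqs = [2, 4, 6, 4, 3, 1]  # hypothetical frequencies
points = frequency_polygon_points(limits, freqs)
print([x for x, _ in points])
# [2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5]
```

These x values match the midpoints marked on the horizontal axis of Example 2.9.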
Example 2.9: Take the data in example 2.3.

[Figure: Frequency polygon of the data in Example 2.3. Horizontal axis: midpoints 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5; vertical axis: frequency.]

iii. Ogive Graph


An Ogive (pronounced “oh-jive”) is a line graph that depicts cumulative frequencies, just as the cumulative
frequency distribution lists cumulative frequencies. Note that the Ogive uses class boundaries along the
horizontal scale, and the graph begins with the lower boundary of the first class and ends with the upper
boundary of the last class. An Ogive is useful for determining the number of values below some particular
value. There are two types of Ogive, namely the less than Ogive and the more than Ogive. The difference is
that the less than Ogive uses less than cumulative frequencies and the more than Ogive uses more than
cumulative frequencies on the y axis.
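Less than and more than cumulative frequencies for an Ogive can be sketched in Python as below. The helper `ogive_points` and the class frequencies are hypothetical; the class boundaries are those of the worked example. A less than point at a boundary counts the values below it, and a more than point counts the values above it.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Return the less than and more than Ogive points as two lists of
    (class boundary, cumulative frequency) pairs."""
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than classes")
    total = sum(freqs)
    below = [0] + list(accumulate(freqs))  # number of values below each boundary
    less_than = list(zip(boundaries, below))
    more_than = [(b, total - c) for b, c in zip(boundaries, below)]  # values above
    return less_than, more_than


boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]  # from the worked example
freqs = [2, 4, 6, 4, 3, 1]  # hypothetical frequencies
less_than, more_than = ogive_points(boundaries, freqs)
print(less_than)  # [(5.5, 0), (11.5, 2), (17.5, 6), (23.5, 12), (29.5, 16), (35.5, 19), (41.5, 20)]
print(more_than)  # [(5.5, 20), (11.5, 18), (17.5, 14), (23.5, 8), (29.5, 4), (35.5, 1), (41.5, 0)]
```

Plotting both lists on the same axes gives the two Ogives; they cross where the cumulative frequency equals half the total, that is, at the median.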
Example 2.10: Take the data in Example 2.3 and draw the less than and more than Ogives.

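[Figure: Less than Ogive and more than Ogive of the data in Example 2.3. Horizontal axis: class boundaries 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5; vertical axis: cumulative frequency (0 to 20).]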
