
CHAPTER ONE

1. Introduction

Statistical thinking has nowadays become essential for many fields of study. Its
usefulness has spread to such diverse fields as agriculture, business, accounting, marketing,
economics, management, medicine, political science, psychology, sociology, engineering,
journalism, meteorology, tourism, etc. For this reason, statistics is now included in the curriculum of
many professional and academic study programs.

This chapter introduces the subject matter of statistics, the art of learning from data. It describes
the two branches of statistics, descriptive and inferential, and introduces the idea of learning
about a population by sampling and studying some of its members.

• Definition and Classification of Statistics

Definition of Statistics: The word “statistics” has different meanings to different persons. When
most people hear the word, they think of tables of figures giving births, deaths, marriages,
divorces, accidents, etc. Some people think of statistics as information about an activity (like
production, population, national income, etc.) expressed in numbers. Still others think of the
term statistics as a subject or a body of knowledge like other sciences.

Even though the word statistics has different meanings for different individuals, in common usage
it has two senses. In one sense, “statistics” is the plural of “statistic” and refers to the numerical
facts and figures collected for a certain purpose. In the other sense, “statistics” refers to a field of
study, a body of knowledge, or a subject concerned with the systematic collection and
interpretation of numerical data in order to make decisions. In this sense the word statistics is
singular. Thus,

Statistics as a subject (field of study): the science of collecting, organizing, presenting, analyzing
and interpreting numerical data in order to make decisions on the basis of such analysis. (In the
singular sense.)

Statistics as numerical data: aggregates of numerically expressed facts (figures) collected in a
systematic manner for a predetermined purpose. (In the plural sense.)

In this course, we shall be mainly concerned with statistics as a subject, that is, as a field of
study.

Classification of Statistics:

Statistical techniques can be applied to virtually every branch of science and art. Because these
techniques are so diverse, statisticians commonly classify them into two broad categories:
descriptive statistics and inferential statistics.

Descriptive Statistics:

It is an area of statistics mainly concerned with the methods and techniques used in the
collection, organization, presentation, and analysis of a set of data, without drawing any
conclusions or inferences.

According to the above definition, the activities in the area of “Descriptive Statistics” include:

• Gathering data

• Editing and classifying data

• Presenting data in tables

• Drawing diagrams and graphs for the data

• Calculating averages and measures of dispersion from the data.

NB: Descriptive statistics doesn’t go beyond describing the data.

Some examples of activities in descriptive statistics:

• Recording a student’s grades throughout the semester and then finding the average of
these grades.

• Reporting that 40% of the employees in a sample have a positive attitude toward the
management of the organization.

• Drawing graphs that show the difference in the scores of males and females.

All the above examples simply summarize and describe given data. Nothing is inferred
or concluded on the basis of these descriptions.
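
To make this concrete, the following short Python sketch computes such descriptive summaries (mean and standard deviation) for a small set of hypothetical semester grades; the numbers are invented purely for illustration.

# Descriptive statistics sketch: summarizing hypothetical semester grades.
grades = [85, 72, 90, 66, 78, 88, 95, 70]   # assumed (invented) data

n = len(grades)
mean = sum(grades) / n
sample_variance = sum((x - mean) ** 2 for x in grades) / (n - 1)
std_dev = sample_variance ** 0.5

# The output only describes this data set; nothing is inferred beyond it.
print(f"n = {n}, mean = {mean:.2f}, standard deviation = {std_dev:.2f}")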

Inferential Statistics:
Inferential statistics is an area of statistics which deals with the method of inferring or
drawing conclusion about the characteristics of the population based upon the results of a
sample.

Statistics is concerned not only with the collection, organization, presentation and analysis of
data but also with the inferences which can be made after the analysis is completed. When
collecting data concerning the characteristics of a set of elements, the set may be very large or
even infinite. Instead of observing the entire set of objects, called the population, one observes a
subset of the population called a sample. Hence, inferential statistics utilizes sample data to make
decisions about the entire population.

Examples:

(a) Of 50 randomly selected students in the statistics department of Arba Minch University, 28
are female. An example of inferential statistics is the following statement: “56% of the students
in the statistics department of Arba Minch University are female.” We have no information about
all students in the department, just about the 50 sampled. We have taken that information and
generalized it to talk about all students in the statistics department of Arba Minch University.

(b) “There is a definitive relationship between smoking and lung cancer”. This statement is the
result of continuous research of many samples taken and studied. Therefore, it is an
inference made from sample results.

(c) As a result of the recent reduction in oil production by oil-producing nations, we can
expect the price of gasoline to double in the next year. (This is also an example of
inference from a sample survey.)
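
As a minimal sketch of how example (a) works, the sample proportion below is a statistic computed from the 50 sampled students and then generalized to the whole department; the Python code is illustrative only.

# Inferential statistics sketch: a sample proportion used as an estimate
# of the (unknown) population proportion of female students.
females_in_sample = 28
sample_size = 50

p_hat = females_in_sample / sample_size      # the statistic (sample proportion)
print(f"Estimated proportion of female students: {p_hat:.0%}")   # prints 56%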

• Definition of Some Basic Statistical Terms

The following basic statistical terms are used frequently in this study.

Population: A population is the totality of things, objects, people, etc. about which information is
being collected. It is the totality of observations with which the researcher is concerned.

Census survey: It is the process of examining the entire population. It is the total count of the
population.
Sample: A sample is a subset or part of a population selected to draw conclusions about the
population.

Sampling: It is the process of selecting a sample from the population.

Sample size: The number of elements or observations to be included in the sample.

Sampling Frame: It is a list of people, items or units from which the sample is taken.

Parameter: It is a descriptive measure (value) computed from the population. It is the population
measurement used to describe the population.

Example: population mean and population standard deviation

Statistic: It is a measure used to describe the sample. It is a value computed from the sample.

Data: refers to a collection of related facts and figures from which conclusions may be
drawn.

Variable: A certain characteristic which changes from object to object and time to time.

Note: A census survey (studying the whole population without considering samples) requires a
great deal of time, money and energy. Trying to study the entire population is in most cases
technically and economically not feasible. Hence, we usually take a representative sample from
the population, on the basis of which we draw conclusions about the entire population; this is
called a sample survey. A sample survey has the following merits:

• Helps to estimate the parameter of a large population.

• Is cheaper, more practical, and more convenient.

• Saves time and energy.

• Is easier to handle and analyze.

• Applications, Use and Limitations of Statistics

The scope of statistics is indeed very vast. Apart from helping elicit an intelligent assessment
from a body of figures and facts, statistics is an indispensable tool for any scientific enquiry, right
from the stage of planning the enquiry to the stage of drawing conclusions. It applies to almost all
sciences: pure and applied, physical, natural, biological, medical, agricultural and engineering. It
also finds applications in the social and management sciences, in commerce, business and industry.
Of the social sciences, economics leans most heavily on statistical methods for the analysis of data
relating to micro- as well as macroeconomics, from demand analysis up to national income
analysis.

Today the field of statistics is recognized as a highly useful tool in the decision-making process of
managers in modern business and industry, where technology changes frequently. It has many
functions in everyday activities. The following are some of the most important uses of statistics.

• Statistics condenses and summarizes complex data. The original set of data (raw data) is
normally voluminous and disorganized unless it is summarized and expressed in few
numerical values.

• Statistics facilitates comparison of data. Measures obtained from different set of data
can be compared to draw conclusion about those sets. Statistical values such as averages,
percentages, ratios, etc, are the tools that can be used for the purpose of comparing sets of
data.

• Statistics helps in predicting future trends. Statistics is extremely useful for analyzing
the past and present data and predicting some future trends.

• Statistics influences the policies of government. Statistical study results in areas such as
taxation, unemployment, the performance of military equipment, family planning, etc.,
may convince a government to review its policies and plans with a view to meeting
national needs and aspirations.

• Statistical methods are very helpful in formulating and testing hypotheses and in developing
new theories.

Although statistics is widely used in various fields of natural and social science and engineering
that are closely related to human life, it has its own limitations as far as its application is
concerned. Some of these limitations are the following:

• Statistics doesn’t deal with single (individual) values. Statistics deals only with
aggregate values. But in some cases a single individual is highly important to consider:
for example, the sun, the driver of a bus, a president, etc.
• Statistics can’t deal with qualitative characteristics. It only deals with data which can be
quantified. For example, it does not deal with marital status (married, single, divorced,
widowed) directly, but it deals with the number of married, the number of single, and the
number of divorced persons.

• Statistical conclusions are not universally true. Statistical conclusions are true only
under certain condition or true only on average. The conclusions drawn from the analysis
of the sample may, perhaps, differ from the conclusions that would be drawn from the
entire population. For this reason, statistics is not an exact science.

Example: Assume that there are 50 students in your class. Take the CGPA of all 50 students
and compute the mean CGPA; assume it is 3.00. This value holds only on average, because
not every individual has a CGPA of 3.00: some students have scored above 3.00 and others
below it.

• Statistical interpretations require a high degree of skill and understanding of the
subject. It requires extensive training to read and interpret statistics in its proper context.
It may lead to wrong conclusions if inexperienced people try to interpret statistical
results.

• Statistics can be misused. Sometimes statistical figures can be misleading unless they are
carefully interpreted.

• Stages in Statistical Investigation

Before we deal with statistical investigation, let us see what statistical data mean. Not all
numerical data can be considered statistical data; to qualify, the data must possess the following
criteria:

• The data must be aggregate of facts

• They must be affected to a marked extent by a multiplicity of causes

• They must be estimated according to reasonable standards of accuracy

• The data must be collected in a systematic manner for predefined purpose

• The data should be placed in relation to each other


A statistician should be involved at all the different stages of statistical investigation when
planning to conduct scientific research. This includes formulating the problem, and then
collecting, organizing (classifying), presenting, analyzing and interpreting of statistical data.

Formulating the problem: Research must start from a problem. At this stage the
investigator must be sure to understand the problem and then formulate it in statistical terms.
Clarify the objectives very carefully. Ask as many questions as necessary, because “an
approximate answer to the right question is worth a great deal more than a precise answer to the
wrong question.”

Therefore,

• Get a clear understanding of the physical background to the situation under study.

• Clarify the objectives.

• Formulate the objective in statistical terms.

Data Collection: This is a stage where we gather information for the intended purpose

• If data is not readily available, it should be collected.

• Data may be collected by the investigator directly using methods like interview,
questionnaire, observation or it may be taken from published or unpublished sources.

• Data gathering is the basis (foundation) of any statistical work.

• Valid conclusions can only result from properly collected data.

Data Organization: This is the stage where we edit our data. A large mass of figures collected
from surveys frequently needs organization. The collected data may involve irrelevant
figures, incorrect facts, omissions and mistakes. Errors that may have been introduced during
collection have to be edited. After editing, we may classify (arrange) the data according to their
common characteristics. Classification or arrangement of data in some suitable order makes the
information easy to present.

Data Presentation: The organized data can now be presented in the form of tables and diagram.
At this stage, large data will be presented in tables in a very summarized and condensed manner.
The main purpose of data presentation is to facilitate statistical analysis. Graphs and diagrams
may also be used to give the data a vivid meaning and make the presentation attractive.
Data Analysis: This is the stage where we critically study the data to draw conclusions about the
population parameters. The purpose of data analysis is to dig out information useful for decision
making. Analysis may involve highly complex and sophisticated mathematical techniques.
However, in this course only the most commonly used methods of statistical analysis are included,
such as averages, the main measures of dispersion, and regression and correlation analysis.

Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. Interpretation means drawing conclusions from the data, which form the
basis for decision making. The interpretation of data is a difficult task and necessitates a high
degree of skill and experience. If data that have been analyzed are not properly interpreted, the
whole purpose of the investigation may be defeated and fallacious conclusions drawn. Therefore,
great care is needed when making interpretations.

• Types of Variables and Measurement Scales

Variables and Attributes: A variable in statistics is any characteristic, which can take on
different values when data are collected. Conventionally, the quantitative variables are termed
as variables and qualitative variables are termed as attributes.

Types of variables: Variables can be classified as qualitative or quantitative variables.

Qualitative Variables: are variables that can be placed into distinct categories according to some
characteristics or attribute.

Example: Sex, religious belief, marital status and so on.

Quantitative Variables: are variables that are numeric in nature and can be ordered or ranked.

Example: Age, weight, height, temperature and so on.

Quantitative Variables can be further classified as discrete and continuous:

Discrete variables are variables which assume values that can be counted (they always take
whole-number values). The values are obtained by counting.

Example: Variables such as the number of students, the number of errors per page, the number of
accidents on a traffic line, and the number of defective or non-defective items produced on a production line.
Continuous variables are quantitative variables which can assume any value between two
specific values (including decimal values). They are obtained by measuring.

Example: age, time, height, income, price, temperature, length, volume, rate, amount of
rainfall, etc. are continuous variables.

Measurement scales:
Normally, when one hears the term measurement, one may think in terms of measuring the
length of something (e.g. the length of a piece of wood) or measuring a quantity of something
(e.g. a cup of flour). This represents a limited use of the term measurement. In statistics, the term
measurement is used more broadly and is more appropriately termed as scales of measurement
(Measurement scales). Scales of measurement refer to ways in which variables or numbers are
defined and categorized. Each scale of measurement has certain properties which in turn
determine the appropriateness for use of certain statistical analyses. The four scales of
measurement are nominal, ordinal, interval, and ratio.
The various measurement scales result from the fact that measurement may be carried out
under different sets of rules.
Nominal Scale:- Consists of ‘naming’ observations or classifying them into various mutually
exclusive categories. Sometimes the variable under study is classified by some quality it
possesses rather than by an amount or quantity. In such cases, the variable is called attribute.
Example: Religion (Christianity, Islam, Hinduism, etc); Sex (Male, Female)
Eye color (brown, black), Blood type (A, B, AB and O) etc.
Ordinal Scale: used whenever observations are not only different from category to category but
can also be ranked according to some criterion. The variable deals with relative differences
rather than with quantitative differences.
Ordinal data are data which can have meaningful inequalities. The inequality signs < or > may
take on meanings like ‘stronger’, ‘softer’, ‘weaker’, ‘better than’, etc.
Examples:
•Patients may be characterized as unimproved, improved & much improved.
•Letter grading system, authority, career, etc

• Individuals may be classified according to socio-economic status as low, medium & high.


Interval Scale: with this scale it is not only possible to order measurements; the distance between
any two measurements is also known, but quotients are not meaningful. There is no true zero
point, only an arbitrary zero point. Interval data are the type of information in which an increase
from one level to the next always reflects the same increase. It is possible to add or subtract
interval data, but they may not be multiplied or divided.
Example: A temperature of zero degrees does not indicate a lack of heat. Consider the two common
temperature scales, Celsius (°C) and Fahrenheit (°F). The same difference exists between
10°C (50°F) and 20°C (68°F) as between 25°C (77°F) and 35°C (95°F), i.e. the
measurement scale is composed of equal-sized intervals. But we cannot say that a temperature of
20°C is twice as hot as a temperature of 10°C, because the zero point is arbitrary.
Ratio Scale:- Characterized by the fact that equality of ratios as well as equality of intervals may
be determined. Fundamental to ratio scales is a true zero point. Typical examples of ratio scales
are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not
only can we say that a temperature of 200 degrees is higher than one of 100 degrees; we can
correctly state that it is twice as high. Interval scales do not have the ratio property. Most
statistical data analysis procedures do not distinguish between the interval and ratio properties of
the measurement scales.
Example: Variables such as age, height, length, volume, rate, time, amount of rainfall, etc.
require a ratio scale.
Note: The permissible arithmetic operations for each scale of measurement are given below.
Scale       Permissible operations
Nominal     Counting
Ordinal     “Greater than” or “less than” comparisons of scale values
Interval    Addition and subtraction of scale values
Ratio       Multiplication and division of scale values

CHAPTER TWO

• Methods of Data Collection and Presentation


• Methods of Data Collection
Any numerical or quantitative information can be termed data. That is, numerical facts or
measurements obtained in the course of enquiry into a phenomenon, marked by uncertainty,
constitute statistical data. These data can be obtained from various sources and collected in
different ways, as will be discussed in the coming sections.
2.1.1 Sources of Data
The data are generally classified in the following two groups depending on their source:
2.1.1.1. Internal Data
Internal data comes from internal sources related with the functioning of an organization or firm
where records regarding purchase, production, sales are kept on a regular basis. Since internal
data originate within the business, collecting the desired information does not usually offer
much difficulty. The particular procedure depends largely upon the nature of facts being
collected and the form in which they exist. The problem with internal data is that they can be
either insufficient or inappropriate for the statistical enquiry into a phenomenon.
2.1.1.2. External Data
The external data are collected and published by external agencies. It can be further classified as
• Primary Data
Primary data are measurements observed and recorded as part of an original study. When the
data required for a particular study can be found neither in the internal records of the enterprise,
nor in published sources, it may become necessary to collect original data, i.e., to conduct first
hand investigation. The work of collecting original data is usually limited by time, money and
manpower available for the study. When the data to be collected are very large in volume, it is
possible to draw reasonably accurate conclusions from the study of a small portion of the group
called a sample.
• Secondary Data
In statistics the investigator need not begin from the very beginning; he may use, and must take
into account, what has already been discovered by others. When an investigator uses data
which have already been collected by others, such data are called secondary data. Secondary data
can be obtained from journals, reports, government publications, publications of research
organizations, etc.
However, secondary data must be used with utmost care. The reason is that such data may be
full of errors because of bias, inadequate size of the sample, substitution, errors of definition,
arithmetical errors, etc. Even if there is no error, secondary data may not be suitable and
adequate for the purpose of inquiry.
Before using secondary data the investigator should examine the following aspects:
• Whether the data are suitable for the purpose of investigation
• Whether the data are adequate for the purpose of the investigation. For example, if our
object is to study the wage rates of workers in the cotton industry in Ethiopia and the
available data cover only a single factory, they would not serve the purpose.
• Whether the data are reliable: to determine the reliability of secondary data is perhaps the
most important and at the same time most difficult job.
• Methods of Collection
The collection of data is the first step in any statistical investigation of the phenomenon. These
data will be obtained from various sources. Depending on the source or appropriateness of the
method we use different methods to collect the data.
2.1.2.1 Method of Primary Data Collection
In primary data collection, the investigator collects the data himself using methods such as
interviews, observations, laboratory experiments and questionnaires. The key point here is that
the data collected are unique to the investigator and his/her research and, until the data are
published, no one else has access to them. There are many methods of collecting primary data;
the main methods include:
Questionnaire: It is a popular means of collecting data, but is difficult to design and often
requires many rewrites before an acceptable questionnaire is produced.

Advantages:
• Can be used as a method in its own right or as a basis for interviewing or a telephone
survey.
• Can be posted, e-mailed or faxed.
• Can cover a large number of people or organizations.
• Wide geographic coverage.
• Relatively cheap.
• No prior arrangements are needed.
• Avoids embarrassment on the part of the respondent.
• Respondent can consider responses.
• Possible anonymity of respondent.
• No interviewer bias.

Disadvantages:
• Historically low response rate (although inducements may help).
• Time delay whilst waiting for responses to be returned
• Require a return deadline.
• Several reminders may be required.
• Assumes no literacy problems.
• No control over who completes it.
• Not possible to give assistance if required.
• Replies not spontaneous and independent of each other.
• Respondent can read all questions beforehand and then decide whether to complete or
not. For example, perhaps because it is too long, too complex, uninteresting, or too
personal.

Mailed Questionnaire Method: In the mailed questionnaire method a questionnaire, in the form
of a set of questions, is sent by mail to the informants. They are expected to answer the questions,
supply any additional information needed, and mail the questionnaire back to the
investigator. This method should be used in the following situations:
i. When the area under investigation is wide.
ii. When the informants are educated.
iii. When the informants are expected to leave for faraway places.
Schedule through Enumerators: Initially, let us make a distinction between a questionnaire and
a schedule. A questionnaire is a set of questions the answers to which are recorded by the
informant himself, whereas in a schedule the answers are recorded by the investigator or an enumerator
on his behalf.
In this method the investigators or enumerators approach the informants with a prepared
questionnaire and get the replies to the questions. This method is generally used in censuses and
large-scale surveys. In the case of a census, investigators visit every member of the source of
information in their zones, while in the case of a sample survey, they collect information only from
those members who have been selected in the sample.
Interviewing:
It is a technique that is primarily used to gain an understanding of the underlying reasons and
motivations for people’s attitudes, preferences or behavior. Interviews can be undertaken on a
personal one-to-one basis or in a group. They can be conducted at work, at home, in the street or
in a shopping center, or some other agreed location.

Advantages:
• Serious approach by respondent resulting in accurate information.
• Good response rate.
• Completed and immediate.
• Possible in-depth questions.
• Interviewer in control and can give help if there is a problem.
• Can investigate motives and feelings.
• Can use recording equipment.
• Characteristics of respondent assessed – tone of voice, facial expression, hesitation, etc.
• If one interviewer used, uniformity of approach.
• Used to pilot other methods.
Disadvantages:
• Need to set up interviews.
• Time consuming.
• Geographic limitations.
• Can be expensive.
• Normally need a set of questions.
• Respondent bias – tendency to please or impress, create false personal image, or end
interview quickly.
• Embarrassment possible if personal questions.
• Transcription and analysis can present problems– subjectivity.
• If many interviewers, training required.

Indirect Personal Interview:


In some cases the informants cannot be contacted directly. In this situation an indirect personal
inquiry is conducted to get the desired information. The indirect personal investigation is done
through some agencies that have some knowledge of the phenomenon under enquiry. This
method is useful in the following cases:
• When the area of investigation is large.
• When the information cannot be obtained directly from the informants.
Observation: It involves recording the behavioral patterns of people, objects and events in a
systematic manner.
Diaries: A diary is a way of gathering information about the way individuals spend their time on
professional activities. They are not about records of engagements or personal journals of
thought! Diaries can record either quantitative or qualitative data, and in management research
can provide information about work patterns and activities.
Laboratory experiment: Conducting laboratory experiments in fields such as the chemical and
biological sciences.
2.1.2.2 Methods of Secondary Data Collection
Secondary data analysis can be literally defined as second-hand analysis and is the analysis of
data or information that was either gathered by someone else (e.g., researchers, institutions, other
NGOs, etc.) or for some other purpose than the one currently being considered, or often a
combination of the two.
Some of the sources of secondary data are government document, official statistics, technical
report, scholarly journals, trade journals, review articles, reference books, research institutes,
universities, hospitals, libraries, library search engines, computerized data base and world wide
web (WWW).

Advantage of secondary data


• Secondary data may help to clarify or redefine the definition of the problem as part of
the exploratory research process.
• Saves time.
• Does not involve data collection and provides a larger database compared to primary
data.
Disadvantage of secondary data
• Lack of availability
• Lack of relevance
• Inaccurate data
• Insufficient data

Sources of secondary data: Two sources


• Published Sources: Some of the published sources which provide secondary data are
government publications, International Publications, Semi-official Publications, Reports
of Committees and Commissions, Private Publications.
• Unpublished Sources: In some cases data are collected but these are not put in published
form. For example research scholars in the institutes and universities, trade associations
and labour bureaus do collect data but they never put it in the published form. Still, the
data from these sources may be used when needed.
2.2 Methods of Data Presentation
So far we have dealt with how to collect data. Having collected and edited the data, the next
important step is to organize them. The raw data, or the recorded information in its original
collected form, is usually unorganized and needs to be organized and presented in a meaningful
and readily understandable form in order to facilitate further statistical analysis.
• Motivating examples
Tabular and graphical methods are commonly used to summarize both qualitative and quantitative
data. Tabular and graphical summaries of data can be found in annual reports, newspaper
articles and research studies. It is important to understand how they are prepared and how they
should be interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data and
preparing graphical presentations. MINITAB, SPSS and STATA are three packages that are
widely available.
The data can be presented in different ways, such as:
• Tabular form
• Diagrammatic form or
• Graphical form
Note: The process of arranging data into classes or categories according to similarities is
technically called classification. Classification is a preliminary step that prepares the ground
for the proper presentation of data.
• Tabular presentation of data
Tabulation is the process of summarizing classified or grouped data in the form of a table so that
it is easily understood and an investigator is quickly able to locate the desired information. A
table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table
makes it possible for the investigator to present a huge mass of data in a detailed and orderly
form. It facilitates comparison and often reveals certain patterns in the data which are otherwise
not obvious. Classification and tabulation, as a matter of fact, are not two distinct processes;
they go together. Before tabulation, data are classified and then displayed under
different columns and rows of a table.
Advantages of Tabulation
Statistical data arranged in a tabular form serve the following objectives:
• It simplifies complex data and the data presented are easily understood
• It facilitates comparison of related facts.
• It facilitates computation of various statistical measures like averages, dispersion, correlation
• It presents facts in minimum possible space and unnecessary repetitions and explanations are
avoided. Moreover, the needed information can be easily located.
• Tabulated data are good for references and they make it easier to present the information in
the form of graphs and diagrams.
Components of Table
The making of a compact table is itself an art. A table should contain all the information needed
within the smallest possible space. The purpose of the tabulation and how the tabulated
information is to be used are the main points to be kept in mind while preparing a statistical
table. An ideal table should consist of the following main parts:
• Table Number: A table should be numbered for easy reference and identification.
This number, if possible, should be written in the center at the top of the table.
Sometimes it is also written just before the title of the table.
• Title: A good table should have a clearly worded, brief but unambiguous title
explaining the nature of data contained in the table. It should also state arrangement
of data and the period covered. The title should be placed centrally on the top of a
table just below the table number (or just after table number in the same line).
• Captions or Column Headings: Captions in a table stand for brief and self-
explanatory headings of vertical columns. Captions may involve headings and
subheadings as well. The unit of data contained should also be given for each column.
Usually, a relatively less important and shorter classification should be tabulated in
the columns.
• Stubs or Row Designations: Stubs stand for brief and self-explanatory headings of
horizontal rows. Normally, a relatively more important classification is given in rows.
Also a variable with a large number of classes is usually represented in rows. For
example, rows may stand for score classes and columns for data related to sex of
students. In the process, there will be many rows for score classes but only two
columns for male and female students.
• Body: The body of the table contains the numerical information of frequency of
observations in the different cells. This arrangement of data is according to the
description of captions and stubs.
• Footnotes: Footnotes are given at the foot of the table for explanation of any fact or
information included in the table, which needs some explanation. Thus, they are meant for
explaining or providing further details about the data that have not been covered in title,
captions and stubs.
• Sources of Data: Lastly, one should also mention the source of information from which
data are taken. This may preferably include the name of the author, volume, page and the
year of publication. This should also state whether the data contained in the table is of
primary or secondary nature. A model structure of a table is given below;
Model Structure of a Table

Table Number: Title of the Table

                         Caption Heading
Stub Headings            Caption Sub-Headings            Total

Stub Sub-Headings                 BODY

Total

Footnotes: 1-  2-
Source Note: 1-  2-

Type of Tables
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of
characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
1. Simple or one-way table
2. Two-way table or contingency table
3. Manifold table or higher-order table
Simple or One-Way Table
A simple or one-way table is the simplest table, which contains data on one characteristic only. A
simple table is easy to construct and simple to follow. For example, the adjacent blank table may
be used to show the number of adults in different occupations in a locality.
The number of adults in different occupations in a locality
Occupations No. of adults
Employee
Farmer
Total
Two-Way Table

A table which contains data on two characteristics is called a two-way table. In such a case,
therefore, either the stub or the caption is divided into two coordinate parts. In the table given
above, as an example, the caption may be further divided in respect of sex.
This sub-division is shown in the adjacent two-way table, which now contains two characteristics,
namely occupation and sex.
Table: The number of adults in a locality in respect of occupation and sex
No. of Adults
Occupation Male Female Total
Employee
Farmer
Total
Manifold (higher-order) Table
Including other characteristics can form more and more complex tables. For example, we
may further classify the caption sub-headings in the above table in respect of ‘marital status’,
‘socio-economic status’, etc. A table in which more than two characteristics of data are
considered is called a manifold table. For instance, the table below shows three characteristics,
namely occupation, sex and marital status.
Table: The number of adults in a locality in respect of occupation, sex and marital status
                              No. of Adults
Occupations     Male                   Female                 Total
                M    U    Total        M    U    Total
Employee
Farmer
Total
Footnote: M stands for Married and U for Unmarried.
Manifold tables, though complex, are good in practice as they enable full information to be
incorporated and facilitate an analysis of all related facts. Still, as a normal practice, not more
than four characteristics should be represented in one table, to avoid confusion. Other related
tables may be formed to show the remaining characteristics.
• Frequency Distributions: Qualitative, Quantitative: Absolute, Relative and Percentage.
The Frequency Distribution
Frequency: the number of times a certain value, group of values, or category is repeated in a
given set of data. A frequency distribution is the organization of raw data in table form, using
classes and frequencies.

Main types of frequency distributions:


• Categorical Frequency Distribution
In this frequency distribution the data are usually qualitative and the scales of measurements for
the data are usually nominal or ordinal. For instance data on blood types of people, political
affiliation, economic status (low, medium and high), religious affiliation are presented by
categorical frequency distributions.

Example 2.1: Last year, blood tests were taken from thirty persons and the following blood groups
were obtained. Construct an appropriate frequency distribution for these data.

B B AB B A AB
O AB AB AB B B
B A B AB O AB
A O B O AB A
B AB AB A AB O
There are four kinds of blood groups: A, B, AB, and O, which may be used as the classes for
constructing the distribution. The procedure for constructing a frequency distribution for
categorical data is given below.
Frequency distribution of blood type
Blood Type    Tally             Frequency
A             ////              5
B             //// ////         9
AB            //// //// /       11
O             ////              5
Total                           30
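
A categorical frequency distribution like the one above can also be produced directly from the raw observations. The following Python sketch uses the blood-group data of Example 2.1; the same approach works for any qualitative or discrete variable.

# Sketch: categorical frequency distribution of the Example 2.1 blood groups.
from collections import Counter

blood_types = ["B", "B", "AB", "B", "A", "AB",
               "O", "AB", "AB", "AB", "B", "B",
               "B", "A", "B", "AB", "O", "AB",
               "A", "O", "B", "O", "AB", "A",
               "B", "AB", "AB", "A", "AB", "O"]

counts = Counter(blood_types)
total = len(blood_types)
for group in ["A", "B", "AB", "O"]:
    f = counts[group]
    print(f"{group:>2}: frequency = {f:2d}, relative frequency = {f / total:.3f}")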

• Ungrouped Frequency Distribution


An ungrouped frequency distribution is a table of all potential raw score values that could possibly
occur in the data, along with their corresponding frequencies. An ungrouped frequency distribution is
often constructed for a small set of data or for a discrete variable.
Constructing an ungrouped frequency distribution
To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores
in the collected data. Then make a columnar table of all potential raw score values arranged in
order of magnitude, with the number of times a particular value is repeated, i.e., the frequency of
that value. To facilitate counting, tally marks can be used.

Example 2.3: The following data are the serum triglyceride levels (mg/dl) of 20 male Biology
students whose blood was measured last year: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42,
30, 35, 37, 32, 30, and 41. Construct an appropriate frequency distribution for these data.
Example 2.4.

A demographer is interested in the number of children a family may have and took a sample of
30 families and obtained the following observations.

4 , 2,4,3,2, 8,3,4,4 ,2, 2, 8,5, 3, 4, 5, 4,5,4,3,5, 2,7, 3, 3, 6,7, 3, 8, 4

Construct a frequency distribution for this data.

Solution: These individual observations can be arranged in ascending or descending order of
magnitude, in which case the series is called an array. The array of the number of children in the
30 families is:
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8.
Since the variable “number of children in a family” can assume only the values 0, 1, 2, 3, …, it is
a discrete variable. Therefore its frequency distribution is a discrete (ungrouped) frequency
distribution, as shown below.

No of children 2 3 4 5 6 7 8 Total
No of family (frequency) 5 7 8 4 1 2 3 30

• Grouped Frequency Distribution


When the data are continuous (like height, weight, or income of households in a certain city) or
when the range of the data is large (for discrete data), the data must be grouped into classes that
are more than one unit in width. A grouped frequency distribution is a frequency distribution in
which several data values are grouped into one class. Before we deal with grouped frequency
distributions, the following terms should be clear.
• Class: different, non-overlapping groups of data.
• Class Frequency: The number of observations belonging to a particular class is known as
the frequency of that class, or the class frequency. Suppose 20 students have
obtained marks ranging from 30-40 and 44 students have obtained marks ranging from 50-
60. In the first case, for the class interval 30-40, the class frequency is 20, while in the second
case, for the class interval 50-60, the class frequency is 44.
• Class limits: separate one class in a grouped frequency distribution from another. The limits
could actually appear in the collected data, and there are gaps between the upper limit of one class
and the lower limit of the next class. Class limits may be exclusive or inclusive.
• Lower Class Limits are the smallest numbers that can belong to the different classes.

• Upper Class Limits are the largest numbers that can belong to the different classes.

• Unit of measurement (U): The distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001, etc.
• Class boundaries (true class limits): Separate one class in a grouped frequency distribution
from another. The boundaries have one more decimal place than the raw data and therefore
do not appear in the collected data. There is no gap between the upper boundary of one class
and the lower boundary of the next class. The lower class boundary (LCB) is found by
subtracting U/2 from the corresponding lower class limit (LCL) and the upper class boundary
(UCB) is found by adding U/2 to the corresponding upper class limit (UCL).
• Class width (W): the difference between the upper and lower boundaries of any class or the
lower limits of two consecutive classes, or the upper limits of two consecutive classes.
N.B. Class width is not equal to the difference between UCL and LCL of the same class.

• Class Mid-point (Class mark or Mid points):When we add up the lower and the upper
class limits of a class interval, we get a certain value. This value is divided by two, which
gives us the class mid-point. Thus, the mid-point of class interval 40-60 is (40+60)/2 = 50.
The formula for obtaining class mid-point is as follows:

Mid-point (mi) = (Lower class limit + Upper class limit) / 2
As we shall see subsequently, the mid-point of each class interval is taken to represent it for the
purpose of statistical calculations.
• Cumulative frequency (C. f) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
• Cumulative frequency (C f) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class. A tabular arrangement of
class intervals together with their corresponding cumulative frequency (either less than or
more than type; as defined above) is called a cumulative frequency distribution.
• Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all
frequencies); if multiplied by 100, it gives the percentage of values falling in that class.

Note: 1. The relative frequency shows what fractional part or proportion of the total frequency
belongs to the corresponding class.
2. The sum of all the relative frequencies in the frequency distribution is always 1.
• Relative cumulative frequency (less than type / more than type): the total of the relative frequencies
below/above a class, inclusively; equivalently, the cumulative frequency (less than type / more than
type) divided by the total frequency. This gives the proportion of values which are less than/more
than the upper/lower class boundary.

Guidelines for constructing Classes

• There should be between 5 and 20 classes.


• The classes must be mutually exclusive. This means that no data value can fall into two
different classes
• The classes must be all inclusive or exhaustive. This means that all data values must be
included.
• The classes must be continuous. There are no gaps in a frequency distribution.
• The classes must be equal in width. The exception here is the first or last class. It is possible
to have a "below ..." or "... and above" class. This is often used with ages.
Constructing a grouped frequency distribution

• Find the maximum (Max) and the minimum (Min) observations, and then compute their
range, R = Max - Min.
• Fix the number of classes desired (k). There are two ways to fix k:
• Fix k arbitrarily between 5 and 20, or

• Use Sturges’ formula: k = 1 + 3.322 × log10(n), where n is the total frequency (total number
of observations), and round this value of k up to get an integer number.
• Find the class width (W) by dividing the range by the number of classes, W = R / k, and
round the result up to get an integer value.


• Pick a suitable starting point less than the minimum value. This starting point is the lower
limit of the first class. Continue to add the class width to this lower limit to get the rest of
the lower limits.
• Find the upper class limits. To find the upper class limit of the first class, subtract one unit
of measurement from the lower limit of the second class. Then continue to add the class
width to this upper limit so as to get the rest of the upper limits.
• Compute the class boundaries: LCB = LCL - U/2 and UCB = UCL + U/2, where LCL = lower
class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper class
boundary. The class boundaries are also halfway between the upper limit of one class and the
lower limit of the next class.
• Tally the data.
• Find the frequencies.
• (If necessary) Find the cumulative frequencies (more than and less than types).

Example 2.5: A sample of 20 fish was taken at random from a fish pond. The oxygen
consumption of each fish was measured and recorded as follows. Construct a frequency
distribution for these data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
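
The construction steps above can be sketched in Python for the Example 2.5 data; the variable names and rounding choices follow the steps listed earlier, but the code is an illustrative sketch rather than a prescribed implementation.

# Sketch: grouped frequency distribution for the Example 2.5 oxygen data.
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
u = 1                                   # unit of measurement (whole numbers)

r = max(data) - min(data)               # range R = Max - Min
k = math.ceil(1 + 3.322 * math.log10(len(data)))   # Sturges' formula, rounded up
w = math.ceil(r / k)                    # class width W = R / k, rounded up

lcl = min(data)                         # starting point: lower limit of first class
for _ in range(k):
    ucl = lcl + w - u                   # upper class limit
    lcb, ucb = lcl - u / 2, ucl + u / 2 # class boundaries
    freq = sum(lcl <= x <= ucl for x in data)
    print(f"{lcl:2d} - {ucl:2d}  ({lcb:4.1f} - {ucb:4.1f}): frequency = {freq}")
    lcl = ucl + u                       # lower limit of the next class

With these 20 observations the sketch gives k = 6 classes of width 6 (6-11, 12-17, ..., 36-41, with boundaries 5.5-11.5 up to 35.5-41.5) and frequencies 2, 2, 7, 4, 3 and 2.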
Example: The following data are the ages of 20 women who attended health education in a certain
hospital. Construct a frequency distribution (with relative frequency and percentage distributions)
using Sturges’ rule: 30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41, and 29.
Example: Construct both “less than” and “more than” cumulative frequency distributions for
the following data on the weekly wage distribution of 201 workers.
Weekly wage     0-20   20-40   40-60   60-80   80-100
No of workers    41     51      64      38       7
Table: Cumulative frequency distributions
Weekly wage (more than)   No of workers     Weekly wage (less than)   No of workers
0                         201               20                        41
20                        160               40                        92
40                        109               60                        156
60                        45                80                        194
80                        7                 100                       201
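
A short sketch of the computation behind this table: starting from the class frequencies, the less-than column accumulates frequencies up to each upper boundary, and the more-than column counts everything at or above each lower boundary.

# Sketch: less-than and more-than cumulative frequencies for the wage data.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [41, 51, 64, 38, 7]

less_than = []                          # cumulative frequency below each upper boundary
running = 0
for f in freqs:
    running += f
    less_than.append(running)

total = sum(freqs)
more_than = [total - c + f for c, f in zip(less_than, freqs)]   # at or above each lower boundary

for (lo, hi), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {hi:3d}: {lt:3d}    more than {lo:3d}: {mt:3d}")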
2.2.4 Diagrammatic Presentation of Data
Grouped data or too many figures in a table do not always appeal to the common reader, as too
many figures are generally confusing and fail to convey the pattern or trend in the
figures. Diagrammatic presentation of data refers to techniques for presenting data in visual
displays using geometry and pictures.
They are important because
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.

Diagrams are appropriate for presenting discrete data, and the three most commonly used
diagrammatic presentations for discrete as well as qualitative data are pie charts, pictograms and
bar charts.
Pie chart
A pie chart is a circle that is divided into sections or sectors according to the percentage of
frequencies in each category of the distribution.
The angle of each component (sector) is calculated by the formula:

Angle of sector = (Value of the component / Total value) × 360°

These angles are marked in the circle by means of a protractor to show the different components. The
arrangement of the sectors is usually anti-clockwise.

Example 2.6: The following table gives the details of the monthly budget of a family. Represent
these figures by a suitable diagram.
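
The budget figures of Example 2.6 are not reproduced in this text, so the following Python sketch uses hypothetical figures purely to illustrate how the sector angles are computed from the formula above.

# Sketch: sector angles for a pie chart (hypothetical monthly budget figures).
budget = {"Food": 600, "House rent": 400, "Clothing": 200,
          "Education": 150, "Miscellaneous": 150}          # assumed values in birr

total = sum(budget.values())
for item, value in budget.items():
    angle = value / total * 360          # angle of the sector in degrees
    print(f"{item:13s}: {value:4d} birr -> {angle:5.1f} degrees")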
Pictogram
In these diagrams, we represent data by means of some picture symbols. We decide about a
suitable picture to represent a definite number of units in which the variable is measured. The
following table shows the orange production in a plantation from production year 1990-1993.
Table: Orange productions from 1990 to 1993
Year Amount in Kg
1990 3000
1991 3850
1992 3500
1993 5000

Figure: Pictogram of the data on orange production from 1990 to 1993.

Bar Charts:
A set of bars (thick lines or narrow rectangles) representing some magnitude over time space.
They are useful for comparing aggregate over time space. Bars can be drawn either vertically or
horizontally. Usually horizontal bar-diagrams are used for qualitatively classified data whereas
vertical bar-diagrams are used for quantitatively classified data.
Example: data for a horizontal bar-diagram of the blood groups of a group of persons
Blood type frequency
A 9
B 14
AB 10
O 17

The most common Bar charts are


• Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single variable (one-way data).
Example: The following frequency distribution shows the sales (in million birr) of four products
for the 2004 production year.
Product Sale(in millions)
A 14
B 21
C 9
D 17
Figure: Bar-diagram presentation of the sales data.
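
Since the figure itself is not reproduced here, the following matplotlib sketch (assuming the matplotlib package is available) shows how the simple bar-diagram for these sales data could be drawn.

# Sketch: simple vertical bar-diagram of the sales data.
import matplotlib.pyplot as plt

products = ["A", "B", "C", "D"]
sales = [14, 21, 9, 17]                  # sales in million birr

plt.bar(products, sales, color="steelblue")
plt.xlabel("Product")
plt.ylabel("Sales (in million birr)")
plt.title("Sales of four products, 2004")
plt.show()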

• Deviation Bar-diagrams
When the data take both positive and negative values (for instance data on profit, net export,
percent change, etc) deviation bar-diagrams are appropriate.
Example: Present the following data using a suitable bar-diagram.
Data: Net profit (in thousands birr) in oil sales for five years
Year Profit (in thousands)
1997 12
1998 -5
1999 14
2000 9
2001 -6

• Component Bar-diagrams
When it is desired to show how a total (an aggregate) is divided into its component parts, we use
a component bar-diagram. In this type of bar-diagram, each bar represents the aggregate value of a
variable, broken into its component parts, and different colours or designs are used for
identification.

Example: Represent the following data using bar charts


Crop      1990 E.C.   1991 E.C.   1992 E.C.   1993 E.C.
Barley       14          15          26          19
Wheat        10          15          14          25
Maize         2           6          10           3
Total        26          36          50          47

Data: Yields of production of farmers in Southern Ethiopia.

• Multiple bar-diagrams

Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.

Example: The data given in the above example can be presented using multiple bar-diagrams
as below.
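
The missing figures can be sketched with matplotlib (assumed available): the left panel stacks the crops to give a component bar-diagram, and the right panel places them side by side to give a multiple bar-diagram.

# Sketch: component (stacked) and multiple (grouped) bar-diagrams of the crop data.
import matplotlib.pyplot as plt
import numpy as np

years = ["1990", "1991", "1992", "1993"]      # E.C.
barley = np.array([14, 15, 26, 19])
wheat = np.array([10, 15, 14, 25])
maize = np.array([2, 6, 10, 3])

x = np.arange(len(years))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Component bar-diagram: each bar is the total, broken into its parts.
ax1.bar(x, barley, label="Barley")
ax1.bar(x, wheat, bottom=barley, label="Wheat")
ax1.bar(x, maize, bottom=barley + wheat, label="Maize")
ax1.set_title("Component bar-diagram")

# Multiple bar-diagram: one bar per crop, side by side for each year.
width = 0.25
ax2.bar(x - width, barley, width, label="Barley")
ax2.bar(x, wheat, width, label="Wheat")
ax2.bar(x + width, maize, width, label="Maize")
ax2.set_title("Multiple bar-diagram")

for ax in (ax1, ax2):
    ax.set_xticks(x)
    ax.set_xticklabels(years)
    ax.legend()
plt.show()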

2.2.5 Graphical Presentation of Data

Like diagrammatic presentation, graphical presentation also gives a visual effect. Diagrammatic
presentation is used to present data classified according to categories and geographical aspects.
On the other hand, graphical presentation is used when we observe some functional
relationship between the values of two variables. There are many forms of graphs; the most
commonly used type of graph is the frequency graph.

Frequency Graphs
The types of frequency graphs normally used are the histogram, the frequency polygon, and the ogive.
Histogram: A histogram is a special type of bar graph in which the horizontal scale represents classes of
data values and the vertical scale represents frequencies. The heights of the bars correspond to the
frequency values, and the bars are drawn adjacent to each other (without gaps). A histogram can be
constructed after we have first completed a frequency distribution table for a data set. The x-axis is
reserved for the class boundaries.
Example: Construct a histogram for the frequency distribution of the time spent by the
automobile workers.
Table: Time in minutes spent by automobile workers
Time (in minutes)   Class mark   Number of workers
15.5-21.5              18.5             3
21.5-27.5              24.5             6
27.5-33.5              30.5             8
33.5-39.5              36.5             4
39.5-45.5              42.5             3
45.5-51.5              48.5             1
Figure: Histogram representing the time in minutes spent by automobile workers.
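
A sketch of how this histogram could be drawn from the grouped table with matplotlib (assumed available); each bar is centred on its class mark and given the class width, so adjacent bars meet at the class boundaries.

# Sketch: histogram of the grouped "time spent" data.
import matplotlib.pyplot as plt

class_marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
frequencies = [3, 6, 8, 4, 3, 1]
width = 6                                # class width, e.g. 21.5 - 15.5

plt.bar(class_marks, frequencies, width=width, edgecolor="black")
plt.xlabel("Time (in minutes)")
plt.ylabel("Number of workers")
plt.title("Time spent by automobile workers")
plt.show()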

A relative frequency histogram has the same shape and horizontal (x) scale as a histogram, but the
vertical (y) scale is marked with relative frequencies instead of actual frequencies.
Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical
axis and their respective class marks (midpoints) along the horizontal axis, and joining the points with
line segments. The heights of the points correspond to the class frequencies, and the line segments are
extended to the left and right so that the graph begins and ends on the horizontal axis, at the positions
where the previous and next class marks would be located.
Example: Draw a frequency polygon presenting the following data.
Class boundaries   Frequency   c.f. (less than type)   c.f. (more than type)
5.5 – 11.5             2               2                      20
11.5 – 17.5            2               4                      18
17.5 – 23.5            7              11                      16
23.5 – 29.5            4              15                       9
29.5 – 35.5            3              18                       5
35.5 – 41.5            2              20                       2
Cumulative Frequency Polygon (Ogive)
A cumulative frequency polygon can be traced on a less-than or more-than cumulative frequency basis.
Place the class boundaries along the horizontal axis and the corresponding cumulative frequencies
(either less than or more than type) along the vertical axis, and then join the points with a free-hand
curve.

Example: The data in the previous example can be presented using either a less-than or a more-than
cumulative frequency polygon, as given in (i) and (ii) below respectively.

(i) Less-than type cumulative frequency polygon

(ii) More-than type cumulative frequency polygon
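
Since the figures themselves are not reproduced here, the following matplotlib sketch (assumed available) draws both the frequency polygon and the two ogives for the distribution above.

# Sketch: frequency polygon and ogives for the distribution above.
import matplotlib.pyplot as plt

boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
freqs = [2, 2, 7, 4, 3, 2]
marks = [(lo + hi) / 2 for lo, hi in zip(boundaries[:-1], boundaries[1:])]

# Frequency polygon: frequencies against class marks, closed to the axis
# at one extra class mark (width 6) on each side.
poly_x = [marks[0] - 6] + marks + [marks[-1] + 6]
poly_y = [0] + freqs + [0]

# Less-than ogive: cumulative frequency at each upper boundary (0 at the lowest).
less_than = [0]
for f in freqs:
    less_than.append(less_than[-1] + f)

# More-than ogive: cumulative frequency at each lower boundary (0 at the highest).
total = sum(freqs)
more_than = [total - c for c in less_than]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(poly_x, poly_y, marker="o")
ax1.set_title("Frequency polygon")
ax2.plot(boundaries, less_than, marker="o", label="Less than type")
ax2.plot(boundaries, more_than, marker="s", label="More than type")
ax2.set_title("Ogives")
ax2.legend()
plt.show()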
