Statistics I

The document discusses the significance of statistics across various fields, emphasizing its applications in business, agriculture, economics, and medicine. It defines statistics, categorizes it into descriptive and inferential types, and outlines the importance of systematic data collection and analysis. Additionally, it highlights the limitations of statistics and the necessity of accurate data for valid conclusions.

Uploaded by

Sultan Kubsa

CHAPTER ONE

1. INTRODUCTION

In the modern world of computers and information technology, the importance of statistics is well
recognized by all disciplines. Statistics originated as a science of statehood and slowly and steadily found
applications in agriculture, economics, commerce, management, biology, medicine, industry,
planning, education and so on. Today there is hardly any walk of human life where statistics cannot be
applied. The words "statistics" and "statistical" are both derived from the Latin word status, meaning a political
state.

1.1. Definition of Statistics

Statistics has been defined differently by different authors over time. In the olden days statistics was
confined to state affairs, but in modern days it embraces almost every sphere of human activity.
Therefore, a number of old definitions, which were confined to a narrow field of enquiry, have been replaced by
newer definitions that are much more comprehensive and exhaustive. Moreover, statistics has been
defined in two different ways: as statistical data and as statistical methods. The following are some of
the definitions of statistics as numerical data.

Statistics are the classified facts representing the conditions of people in a state. In particular, they are the
facts which can be stated in numbers, in tables of numbers, or in any tabular or classified arrangement.

Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically
arranged, analyzed and presented so as to exhibit important interrelationships among them.

Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and
analyzing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this
analysis. Statistics is concerned with the systematic collection of numerical data and its interpretation.

The word "statistics" is used to refer to:

- Numerical facts, such as the number of people living in a particular area.
- The study of ways of collecting, analyzing and interpreting the facts.

Definition by A.L. Bowley: Statistics are numerical statements of facts in any department of enquiry, placed
in relation to each other.

Definition by Croxton and Cowden: Statistics may be defined as the science of collection, presentation,
analysis and interpretation of numerical data from the logical analysis.

Definition by Horace Secrist: Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes,
numerically expressed, enumerated or estimated according to a reasonable standard of accuracy,
collected in a systematic manner for a predetermined purpose, and placed in relation to each other. This
definition highlights a few major characteristics of statistics, which are given
below.

- Statistics are aggregates of facts: A single figure is not statistics. For example, the national income
of a country for a single year is not statistics.

- Statistics are affected by a number of factors: For example, the sale of a product depends on a number of
factors such as its price, quality, competition, the income of consumers and so on.

- Statistics must be reasonably accurate: If wrong figures are analyzed, they will lead to erroneous conclusions.
Hence, conclusions must be based on accurate figures.

- Statistics must be collected in a systematic manner: If data are collected in a disorganized manner, they
will not be reliable and will lead to misleading conclusions.

- Statistics should be placed in relation to each other: If one collects data unrelated to each other, such data
will be confusing and will not lead to any logical conclusions. Data should be comparable over time and space.

Definition by Lovett: Statistics is a science that deals with the collection, classification and tabulation of
numerical facts as a basis for the explanation, description and comparison of phenomena.

1.2. Importance of Statistics in Business

There is an increasing realization of the importance of statistics in various quarters. This is reflected in the
increasing use of statistics in the government, industry, business, agriculture, mining, transport, education,
medicine and so on. As we are concerned here with the use of statistics in business and industry, the
description given below is confined to these areas only. There are three major functions in any business
enterprise in which statistical methods are useful.

The planning function: This may relate either to special projects or to the recurring activities of the firm
over a specified period.

The setting of standards: This may relate to the size of employment, volume of sales, fixation of quality
norms for the manufactured products, norms for daily output, and so forth.

The function of control: This involves comparison of actual production achieved against the norm or target
set earlier. In case production has fallen short of the target, control identifies remedial measures so that such a
deficiency does not occur again.

1.3. Types of Statistics

Statisticians commonly classify the subject into two broad categories: descriptive statistics and
inferential statistics.

Descriptive statistics: As the name suggests, descriptive statistics includes any treatment designed to
describe or summarize the given data, bringing out their important features. It does not go
beyond this: no attempt is made to infer anything that pertains to more than the data
themselves. Descriptive statistics describes the data set being analyzed, but does not allow us to draw
conclusions or make inferences beyond the data. Example: Arba Minch University graduated
4000 students in 2009, 4500 in 2010 and 5200 in 2011. These figures
belong to the domain of descriptive statistics.
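
As a minimal sketch of what "describing or summarizing the given data" can look like in practice, the Python snippet below summarizes the three graduate counts quoted in the example. The variable names are arbitrary, and the choice of summaries (total, mean, year-to-year percentage change) is only one reasonable option.

```python
# Graduate counts quoted in the example above.
graduates = {2009: 4000, 2010: 4500, 2011: 5200}

total = sum(graduates.values())
mean_graduates = total / len(graduates)

# Percentage change from each year to the next.
years = sorted(graduates)
growth = {
    later: 100 * (graduates[later] - graduates[earlier]) / graduates[earlier]
    for earlier, later in zip(years, years[1:])
}

print(f"total = {total}, mean per year = {mean_graduates:.1f}")
for year, pct in growth.items():
    print(f"{year}: {pct:.1f}% more graduates than the year before")
```

Nothing here goes beyond the three observed figures, which is what keeps it within descriptive statistics.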

Inferential statistics: This is a set of methods used to generalize from a sample to a population, that is,
to draw conclusions or inferences about characteristics of populations
based on data from a sample. Example: The average per capita income of the whole Ethiopian population can be
estimated from figures obtained from a few hundred people (the sample); if the sample average is $1000, this
value serves as an estimate of the population average. A statistical
population is the collection of all possible observations of a specified characteristic of interest.
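
The idea of generalizing from a sample can be sketched in Python as follows. The population here is simulated with made-up incomes (the range 200 to 1800, the seed and the sample size of 300 are assumptions for illustration, not real data), and the 1.96 multiplier assumes a normal approximation for a large sample.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up incomes (illustrative values only).
random.seed(0)
population = [random.uniform(200, 1800) for _ in range(10_000)]

# Draw a simple random sample of a few hundred units.
sample = random.sample(population, 300)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inference: use the sample to make a statement about the population mean.
standard_error = sample_sd / len(sample) ** 0.5
interval = (sample_mean - 1.96 * standard_error,
            sample_mean + 1.96 * standard_error)

print(f"sample mean = {sample_mean:.1f}")
print(f"approximate 95% interval = ({interval[0]:.1f}, {interval[1]:.1f})")
print(f"population mean (normally unknown) = {statistics.mean(population):.1f}")
```

In a real survey the last line would not be available; the interval is the statement one makes about the unknown population value.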

1.4. Types of Variables or Data

Variable: An item of interest that can take many different values. Variables can be
categorized as continuous or discrete, and also as quantitative or qualitative.

A continuous variable is the result of a measuring process and is measured along a continuum. Continuous
variables can therefore be measured to any number of decimal places.

Examples: age, money, time, height, weight. Consider, for example, that Olympic sprinters are timed to the
nearest hundredth of a second, but if the Olympic judges wanted to clock them to the nearest
millionth, they could.

A discrete variable, on the other hand, is the result of a counting process and is measured in whole units or
categories. Discrete variables are not measured along a continuum.

For example, the number of brothers and sisters you have.

A Quantitative Variable varies by amount. The variables are measured in numeric units, and so both
continuous and discrete variables can be quantitative.

For example, we can measure food intake in calories (a continuous variable) or we can count the number
of pieces of food consumed (a discrete variable). In both cases, the variables are measured by amount (in
numeric units).

A qualitative variable, on the other hand, varies by class. Such variables are often labels for the behaviors
we observe, so only discrete variables can fall into this category.

For example, socioeconomic class (working class, middle class, upper class) is discrete and qualitative; so
are many mental disorders such as depression (unipolar, bipolar) or drug use (none, experimental,
abusive).

Qualitative variables are non-numeric and cannot be measured by amount. Examples include gender, religious
affiliation and state of birth.

SCOPE OF STATISTICS

Apart from the methods comprising the scope of descriptive and inferential branches of statistics, statistics
also consists of methods of dealing with a few other issues of specific nature. Since these methods are
essentially descriptive in nature, they have been discussed here as part of the descriptive statistics. These
are mainly concerned with the following:

(i) It often becomes necessary to examine how two paired data sets are related. For example, we may
have data on the sales of a product and the expenditure incurred on its advertisement for a specified
number of years. Given that sales and advertisement expenditure are related to each other, it is useful to
examine the nature of the relationship between the two and quantify its degree. As this
requires the use of appropriate statistical methods, it falls under the purview of what we call regression
and correlation analysis.

(ii) Situations occur quite often when we require averaging (or totaling) of data on prices and/or quantities
expressed in different units of measurement. For example, price of cloth may be quoted per meter of
length and that of wheat per kilogram of weight. Since ordinary methods of totaling and averaging do not
apply to such price/quantity data, special techniques needed for the purpose are developed under index
numbers.

(iii) Many a time, it becomes necessary to examine the past performance of an activity with a view to
determining its future behaviour. For example, when engaged in the production of a commodity, monthly
product sales are an important measure of evaluating performance. This requires compilation and analysis
of relevant sales data over time. The more complex the activity, the more varied the data requirements.
For profit maximizing and future sales planning, forecast of likely sales growth rate is crucial. This needs
careful collection and analysis of past sales data. All such concerns are taken care of under time series
analysis.

(iv) Obtaining the most likely future estimates of any aspect of a business or economic activity
has long engaged the minds of all concerned. This is particularly important when it relates to
product sales and demand, which serve as the necessary basis of production scheduling and planning. The
regression, correlation, and time series analyses together help develop the basic methodology for this
purpose. Thus, the study of methods and techniques of obtaining the likely estimates of
business/economic variables comprises the scope of what we call business forecasting.

Keeping in view the importance of inferential statistics, the scope of statistics may finally be restated as consisting of
statistical methods which facilitate decision-making under conditions of uncertainty. While the term
"statistical methods" is often used to cover the subject of statistics as a whole, in particular it refers to methods by which
statistical data are analyzed and interpreted, and inferences are drawn for decision making. Though generic in
nature and versatile in their applications, statistical methods have come to be widely used especially in
matters concerning business and economics. They are also being increasingly used in biology, medicine,
agriculture, psychology, and education. The scope of application of these methods has started opening and
expanding in a number of social science disciplines as well. Even a political scientist finds them of
increasing relevance for examining political behaviour, and it is, of course, no surprise to find even
historians using statistical data, for history is essentially past data presented in a certain format.
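
The correlation and regression analysis mentioned in item (i) can be sketched in Python as follows. The advertising and sales figures are made-up illustrative values, not data from the text, and the formulas used are the standard Pearson correlation coefficient and least-squares line.

```python
import math

# Hypothetical paired data (illustrative values only): yearly advertising
# expenditure and sales, both in thousands of birr.
advertising = [10, 12, 15, 17, 20]
sales = [40, 44, 52, 55, 62]

n = len(advertising)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
sxx = sum((x - mean_x) ** 2 for x in advertising)
syy = sum((y - mean_y) ** 2 for y in sales)

r = sxy / math.sqrt(sxx * syy)        # degree of linear relationship
slope = sxy / sxx                     # change in sales per unit of advertising
intercept = mean_y - slope * mean_x   # fitted sales when advertising is zero

print(f"r = {r:.3f}, fitted line: sales = {intercept:.2f} + {slope:.2f} * advertising")
```

The coefficient r quantifies the degree of the relationship, while the fitted line describes its nature; both are only as trustworthy as the paired data behind them.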

1.5. Application of Statistics

Statistics is not a mere device for collecting numerical data, but a means of developing sound
techniques for handling and analyzing them and drawing valid inferences from them. Statistics is applied in
every sphere of human activity. Let us discuss some of these briefly.

Statistics and Industry:

Statistics is widely used in many industries. In industries, control charts are widely used to maintain a
certain quality level. In production engineering, to find whether the product is conforming to specifications
or not, statistical tools, namely inspection plans, control charts, etc., are of extreme importance.

Statistics and Commerce:

Statistics are the lifeblood of successful commerce. No businessman can afford either to understock
or to overstock his goods. In the beginning he estimates the demand for his goods and then takes
steps to adjust his output or purchases accordingly.

Statistics and Agriculture:

Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a
prominent role in agricultural experiments. In tests of significance based on small samples, the t-statistic
is adequate to test the significance of the difference between two sample means. In analysis of
variance, we are concerned with testing the equality of several population means. For example, suppose five
fertilizers are applied to five plots each of wheat and the yields of wheat on each of the plots are given. In
such a situation, we are interested in finding out whether the effects of these fertilizers on the yield are
significantly different or not. The answer to this problem is provided by the technique of ANOVA, which is
used to test the homogeneity of several population means.
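
A rough sketch of the computation behind one-way ANOVA is given below in Python. The yields are invented for illustration, and the layout uses three fertilizers with four plots each (smaller than the five by five layout described above) so that the arithmetic is easy to follow by hand.

```python
# Hypothetical wheat yields (made-up numbers): three fertilizers, four plots each.
yields = {
    "A": [20, 21, 19, 20],
    "B": [25, 24, 26, 25],
    "C": [20, 22, 21, 21],
}
groups = list(yields.values())
k = len(groups)                           # number of fertilizers
n_total = sum(len(g) for g in groups)     # number of plots
grand_mean = sum(sum(g) for g in groups) / n_total

# Variation between fertilizer means and variation within each fertilizer.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_statistic = ms_between / ms_within

print(f"F = {f_statistic:.2f} on ({k - 1}, {n_total - k}) degrees of freedom")
```

With these made-up numbers F works out to 42 on (2, 9) degrees of freedom. In practice the value would be compared with a critical value from an F table, or a statistics package would report the corresponding p-value.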

Statistics and Economics:

Nowadays statistics is used abundantly in any economic study. Alfred Marshall said that
statistical data and statistical techniques are immensely useful in solving many economic problems
such as wages, prices, production, distribution of income and wealth and so on. Statistical tools like index
numbers, time series analysis, estimation theory and the testing of statistical hypotheses are extensively used in
economics.

Statistics and Education:

Statistics is widely used in education. Research has become a common feature in all branches of
activity. Statistics is necessary for the formulation of policies on starting new courses, the consideration of
facilities available for new courses, etc.

Statistics and Planning:

Statistics is crucial in planning. In the modern world, which can be termed the "world of planning", almost
all governmental and non-governmental organizations are seeking the help of planning for efficient
working, for the formulation of policy decisions and for their execution.

To achieve these goals, statistical data relating to production, consumption,
demand, supply, prices, investments, income, expenditure etc., together with various advanced statistical
techniques for processing, analyzing and interpreting such complex data, are of great importance.

Statistics and Medicine:

In medical sciences, statistical tools are widely used. To test the efficacy of a new drug
or medicine, a t-test is used; to compare the efficacy of two drugs or medicines, the two-sample
t-test is used. More and more applications of statistics are now being used in clinical
investigation.
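
A minimal sketch of the two-sample t statistic is shown below in Python. The recovery times are made-up numbers, and the pooled-variance form used here assumes that both groups share the same population variance, which is an assumption of this illustration rather than something stated in the text.

```python
import math
import statistics

# Hypothetical recovery times in days (made-up numbers) under two drugs.
drug_a = [8, 9, 7, 8, 8]
drug_b = [10, 11, 9, 10, 10]

n_a, n_b = len(drug_a), len(drug_b)
mean_a, mean_b = statistics.mean(drug_a), statistics.mean(drug_b)
var_a, var_b = statistics.variance(drug_a), statistics.variance(drug_b)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# t statistic for the difference between the two sample means.
t_statistic = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
degrees_of_freedom = n_a + n_b - 2

print(f"t = {t_statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```

The computed t would then be compared with a tabulated critical value for the same degrees of freedom to judge whether the two drugs differ significantly.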

Statistics and Modern Applications:

Recent developments in computer technology and information technology have
enabled statisticians to integrate their models and thus make statistics a part of the decision-making
procedures of many organizations. Many software packages are available for solving
design of experiments, forecasting, simulation problems etc.

1.6. Limitations of Statistics

The preceding discussion of the importance of statistics in business should not lead
anyone to conclude that statistics is free from limitations. Statistics has a number of
limitations.

1. Statistics has no place in cases where quantification is not possible. For example, beauty,
intelligence and courage cannot be quantified.

2. Statistics reveal the average behavior, the normal or general trend. Applying the "average" concept
to an individual or a particular situation may lead to a wrong conclusion and may sometimes be
disastrous.

3. Since statistics are collected for a particular purpose, such data may not be relevant or useful in other
situations.

4. Statistics is not 100% precise, as mathematics or accountancy is.

5. In statistical surveys, sampling is generally used as it is not physically possible to cover all the units
comprising the universe. The results may therefore not hold exactly for the universe as a whole.

Chapter Two

Data Collection and Presentation

1. Introduction

Statistical investigation is comprehensive and requires systematic collection of data about some
group of people or objects, describing and organizing the data, analyzing the data with the help
of different statistical methods, summarizing the analysis and using the results for making
judgments, decisions and predictions. When we talk of collection of data, we should be clear about what
the word "data" means. The word datum is a Latin word meaning "something given". It refers to a piece
of information, which can be either quantitative or qualitative. The term data is the plural of datum and
means facts and statistics collected together for reference or analysis.

1.7. Nature of Data

Different types of data can be collected for different purposes. Data can
be collected in connection with time, with geographical location, or with both time and
location. Accordingly, there are three types of data:

1. Time series data,
2. Spatial data, and
3. Spatio-temporal data.

i. Time series data

A time series is a set of numerical values collected over a period of time. The data may
have been collected at regular or irregular intervals of time. Example:
The following table gives three types of expenditure, in birr, for a family over the four
years 2001 to 2004.

Year    Food    Education    Others    Total
2001    2000    1000         1000      4000
2002    2500    1500         1500      5500
2003    3000    2000         1500      6500
2004    3500    1500         2500      7500
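
The table above can be held in a small Python structure to recompute the Total column and look at how total expenditure changes from year to year. This is only an illustrative sketch; the dictionary layout and variable names are arbitrary choices.

```python
# Family expenditure in birr, copied from the table above.
expenditure = {
    2001: {"Food": 2000, "Education": 1000, "Others": 1000},
    2002: {"Food": 2500, "Education": 1500, "Others": 1500},
    2003: {"Food": 3000, "Education": 2000, "Others": 1500},
    2004: {"Food": 3500, "Education": 1500, "Others": 2500},
}

# Recompute the Total column from its components.
totals = {year: sum(items.values()) for year, items in expenditure.items()}

# Year-over-year change in total expenditure, a basic time series summary.
years = sorted(totals)
changes = {
    later: totals[later] - totals[earlier]
    for earlier, later in zip(years, years[1:])
}

for year in years:
    print(year, totals[year], changes.get(year, "-"))
```

The recomputed totals agree with the Total column, and the changes show the largest rise (1500 birr) between 2001 and 2002.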

ii. Spatial data

If the data collected are connected with a place, they are termed spatial data. Example:
Assume the following populations of cities and towns in the Southern Nations and Nationalities region of
Ethiopia in 2006.

City/Town        Population
Arba Minch       1,000,000
Wolayita Sodo    1,586,000
Hawassa          2,000,000
Butajira         1,250,000

iii. Spatio-temporal data

If the data collected are connected with time as well as place, they are known as spatio-temporal
data. Example: Assume the following populations of the same cities and towns in 2006
and 2007.

                 Population
City/Town        2006         2007
Arba Minch       1,000,000    1,150,000
Wolayita Sodo    1,586,000    1,690,000
Hawassa          2,000,000    2,200,000
Butajira         1,250,000    1,320,000

1.7.1. Levels of Data (Scales of Measurement)

Data are the values that variables can take, which are either numerical or categorical.
Levels of data can be classified into two groups:

- Categorical data, measured on nominal and ordinal scales
- Numerical data, measured on interval and ratio scales

1. Nominal scale: A nominal scale is simply a system of assigning number symbols to events in order
to label them. The usual example is the assignment of numbers to
basketball players in order to identify them. Such numbers cannot be considered to be
associated with an ordered scale, for their order is of no consequence; the numbers are just
convenient labels for the particular class of events and as such have no quantitative value.
One cannot do much with the numbers involved. For example, one cannot usefully average the numbers on
the backs of a group of football players and come up with a meaningful value. Neither can one usefully
compare the numbers assigned to one group with the numbers assigned to another. Accordingly, we are
restricted to using the mode as the measure of central tendency. There is no generally used measure of
dispersion for nominal scales. The chi-square test is the most common test of statistical significance.
The nominal scale is the least powerful level of measurement. It indicates no order or distance relationship and
has no arithmetic origin.

2. Ordinal scale: The ordinal scale places events in order, but there is no attempt to make the intervals
of the scale equal in terms of some rule. Rank orders represent ordinal scales
and are frequently used in research relating to qualitative phenomena. A student's rank in
his graduating class involves the use of an ordinal scale. One has to be very careful in making statements
about scores based on ordinal scales. For instance, if Ram's position in his class is 10 and Mohan's position
is 40, it cannot be said that Ram's position is four times as good as that of Mohan. The statement would
make no sense at all. Ordinal measures have no absolute values, and the real differences between adjacent
ranks may not be equal. All that can be said is that one person is higher or lower on the scale than another,
but more precise comparisons cannot be made. Thus, the use of an ordinal scale implies a statement of
"greater than" or "less than" without our being able to state how much greater or less. The real difference
between ranks 1 and 2 may be more or less than the difference between ranks 5 and 6. Since the numbers of
this scale have only a rank meaning, the appropriate measure of central tendency is the median. Measures
of statistical significance are restricted to the non-parametric methods.
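
The measures of central tendency named for the nominal and ordinal scales can be illustrated with Python's standard `statistics` module. Both data sets below are hypothetical and chosen only to show which summary suits which scale.

```python
import statistics

# Hypothetical nominal data: labels only, so the mode is the usable "average".
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
most_common = statistics.mode(blood_groups)

# Hypothetical ordinal data: class ranks (1 = best). Order is meaningful but
# distances between ranks are not, so the median is used rather than the mean.
ranks = [3, 10, 12, 25, 40]
median_rank = statistics.median(ranks)

print(f"mode of blood groups = {most_common}")
print(f"median rank = {median_rank}")
```

Averaging the labels would be impossible, and averaging the ranks would treat the gap between ranks 3 and 10 as comparable to the gap between 25 and 40, which an ordinal scale does not justify.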

3. Interval scale: In the case of the interval scale, the intervals are adjusted in terms of some rule that has
been established as a basis for making the units equal. The units are equal only in so far as one accepts the
assumptions on which the rule is based. Interval scales can have an arbitrary zero, but it is not possible
to determine for them what may be called an absolute zero or unique origin. The primary limitation of
the interval scale is the lack of a true zero; it does not have the capacity to measure the complete absence of
a trait or characteristic. The Fahrenheit scale is an example of an interval scale and illustrates
what one can and cannot do with it. One can say that an increase in temperature from 30° to 40° involves
the same increase in temperature as an increase from 60° to 70°, but one cannot say that a temperature of
60° is twice as warm as a temperature of 30°, because both numbers depend on the fact that the zero
on the scale is set arbitrarily. Interval scales provide more
powerful measurement than ordinal scales, for the interval scale also incorporates the concept of equality of
interval. The mean is the appropriate measure of central tendency, while the standard deviation is the
most widely used measure of dispersion. For statistical significance, the t-test and F-test are
widely applied.
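
The temperature example can be checked numerically. The sketch below, in Python, converts the Fahrenheit readings to Celsius and Kelvin using the standard conversion formulas; the function names are arbitrary. Equal differences stay equal after conversion, but the ratio "60 is twice 30" does not survive a change of zero point.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to Kelvin (zero at absolute zero)."""
    return fahrenheit_to_celsius(f) + 273.15


# Differences: both 10-degree Fahrenheit steps are the same size in Celsius.
diff_low = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_high = fahrenheit_to_celsius(70) - fahrenheit_to_celsius(60)

# Ratios: "twice as warm" depends on where zero was placed.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
ratio_k = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(f"differences in Celsius: {diff_low:.2f} and {diff_high:.2f}")
print(f"ratio in Fahrenheit = {ratio_f:.2f}, Celsius = {ratio_c:.2f}, Kelvin = {ratio_k:.3f}")
```

The ratio is 2 in Fahrenheit, negative in Celsius and only slightly above 1 in Kelvin, which is why ratio statements are not meaningful on an interval scale.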

4. Ratio scale: Ratio scales have an absolute or true zero of measurement. The term "absolute zero" is
not as precise as it was once believed to be. We can conceive of an absolute zero of length, and similarly we
can conceive of an absolute zero of time. For example, the zero point on a centimeter scale indicates the
complete absence of length or height. But an absolute zero of temperature is theoretically unobtainable, and it
remains a concept existing only in the scientist's mind. The number of minor traffic-rule violations and the
number of incorrect letters in a page of typescript represent scores on ratio scales. Both these scales have
absolute zeros, and as such all minor traffic violations and all typing errors can be assumed to be equal in
significance. With ratio scales one can make statements like "Abie's typing performance was
twice as good as that of Kebie." The ratio involved does have significance and facilitates a kind of
comparison which is not possible in the case of an interval scale. The ratio scale represents the actual amounts of
variables. Measures of physical dimensions such as weight, height and distance are examples. Generally,
all statistical techniques are usable with ratio scales, and all manipulations that one can carry out with real
numbers can also be carried out with ratio scale values. Multiplication and division can be used with this
scale but not with the other scales mentioned above. Geometric and harmonic means can be used
as measures of central tendency, and coefficients of variation may also be calculated.
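
The measures mentioned for ratio scales can be computed with Python's standard `statistics` module, as in the sketch below. The three lengths are hypothetical values picked so that the results are easy to verify by hand.

```python
import statistics

# Hypothetical ratio-scale data: lengths in centimeters (true zero exists).
lengths = [2, 4, 8]

arithmetic_mean = statistics.mean(lengths)
geometric_mean = statistics.geometric_mean(lengths)   # cube root of 2 * 4 * 8
harmonic_mean = statistics.harmonic_mean(lengths)     # 3 / (1/2 + 1/4 + 1/8)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = statistics.stdev(lengths) / arithmetic_mean

print(f"arithmetic = {arithmetic_mean:.3f}, geometric = {geometric_mean:.3f}, "
      f"harmonic = {harmonic_mean:.3f}")
print(f"coefficient of variation = {coefficient_of_variation:.3f}")
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean. All three, and the coefficient of variation, rely on ratios and therefore on the scale having a true zero.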

1.7.2. Source of Data

Any statistical data can be classified under two categories depending upon the sources utilized.
These categories are:

- Primary data
- Secondary data

1.7.2.1. Primary data

Primary data are data collected by the investigator himself for the purpose of a
specific inquiry or study. Such data are original in character and are generated by surveys conducted by
individuals, research institutions or other organizations. Primary data can be collected through the
following methods.

I. Direct personal interviews: The persons from whom information is collected are
known as informants. The investigator personally meets them and asks questions to gather
the necessary information. This method is suitable for intensive rather than extensive field
surveys; it suits best the intensive study of a limited field.

Merits:

- People willingly supply information because they are approached personally.
- The collected information is likely to be uniform and accurate.
- Information on character and environment may help later to interpret some of the results.
- Answers to questions about which the informant is likely to be sensitive can be gathered by this
  method.
- The wording of one or more questions can be altered to suit any informant. Inconvenience and
  misinterpretations are thereby avoided.

Limitations:

- It is very costly and time consuming.
- It is very difficult when the number of persons to be interviewed is large and the persons are spread
  over a wide area.
- Personal prejudice and bias are greater under this method.

II. Indirect oral interviews: Under this method the investigator contacts witnesses,
neighbors, friends or other third parties who are capable of supplying the necessary
information. This method is preferred if the required information concerns addiction or the cause of a
fire, theft, murder, etc. If a fire has broken out at a certain place, the persons living in the
neighborhood and witnesses are likely to give information on the cause of the fire. This method
is suitable whenever direct sources do not exist, cannot be relied upon, or would be
unwilling to part with the information.

III. Information from correspondents: The investigator appoints local agents or
correspondents in different places and compiles the information sent by them. Information reaches
newspapers and some government departments by this method. The advantage of
this method is that it is cheap and appropriate for extensive investigations. However, it may not
ensure accurate results because the correspondents may be negligent, prejudiced or
biased. This method is adopted in cases where information is to be collected
periodically from a wide area over a long time.

IV. Mailed questionnaire method: Under this method a list of questions is prepared
and sent to all the informants by post. The list of questions is technically called a
questionnaire. A covering letter accompanying the questionnaire explains the purpose of the
investigation and the importance of correct information, and requests the informants to fill in
the blank spaces provided and to return the form within a specified time.

Merits of the mailed questionnaire:

- It is relatively cheap.
- It is preferable when the informants are spread over a wide area.

Limitations of the mailed questionnaire:

- The informants must be literate and able to understand and reply to the questions.
- Some of the persons who receive the questionnaire may not return it.
- It is difficult to verify the correctness of the information furnished by the respondents.

V. Schedules sent through enumerators: Under this method enumerators or
interviewers take the schedules, meet the informants and fill in their replies. A
distinction is often made between a schedule and a questionnaire. A schedule is filled in by the
interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the
informant, who receives and returns it by post. The schedule method is suitable for extensive surveys.

Merits:

- It can be adopted even if the informants are illiterate.
- Answers to questions of a personal and pecuniary nature can be collected.
- Non-response is minimal, as enumerators go personally and contact the
  informants.
- The information collected is reliable, as the enumerators can be properly trained
  for the task.

Limitations:

- It is the costliest method.
- Extensive training must be given to the enumerators to collect correct
  and uniform information.
- Interviewing requires experience. Unskilled investigators are likely to fail in
  their work.

Characteristics of a good questionnaire

- The number of questions should be kept to a minimum, and questions should be in logical order, moving from
  easy to more difficult questions.
- Questions should be short and simple. Technical terms and vague expressions capable of different
  interpretations should be avoided.
- Questions should be carefully framed so as to cover the entire scope of the survey.
- The wording of the questions should be proper, without hurting feelings or arousing resentment.
- The physical appearance should be attractive, and sufficient space should be provided for answering each
  question.

1.7.2.2. Secondary data

Secondary data are data which have already been collected and analyzed by some earlier agency for
its own use, and which are later used by a different agency.

Sources of secondary data

The sources of secondary data can broadly be classified under two heads:

A. Published sources, and
B. Unpublished sources.

A. Published sources: The various sources of published data are:

- Reports and official publications of international bodies such as the International Monetary
  Fund, International Finance Corporation and United Nations Organization.
- Semi-official publications of various local bodies such as Municipal Corporations and District
  Boards.
- Private publications, such as the publications of trade and professional bodies, financial and
  economic journals, and publications brought out by research agencies, research scholars, etc.

B. Unpublished sources: Not all statistical material is published. There are various sources
of unpublished data, such as records maintained by various government and private offices and studies made by
research institutions, scholars, etc.
1.8. Tabular Methods of Data Presentation
Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily
understood and an investigator is quickly able to locate the desired information. A table is a systematic
arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the
investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often
reveals certain patterns in data which are otherwise not obvious. ‘Classification’ and ‘Tabulation’, as a
matter of fact, are not two distinct processes; they go together. Before tabulation, data are classified
and then displayed under different columns and rows of a table.
Advantages of Tabulation
▪ It simplifies complex data and the data presented are easily understood.
▪ It facilitates comparison of related facts and computation of various statistical measures like averages, dispersion, correlation, etc.
▪ It presents facts in the minimum possible space, and unnecessary repetitions and explanations are avoided. Moreover, the needed information can be easily located.
▪ Tabulated data are good for reference, and they make it easier to present the information in the form of graphs and diagrams.
Preparing a Table
The making of a compact table is itself an art. A table should contain all the information needed
within the smallest possible space. The purpose of the tabulation and the way the tabulated
information is to be used are the main points to be kept in mind while preparing a statistical
table. An ideal table should consist of the following main parts:
i. Table Number: A table should be numbered for easy reference and identification.
ii. Title of the Table: A good table should have a clearly worded, brief but unambiguous title
explaining the nature of data contained in the table. It should also state arrangement of data and
the period covered.
iii. Captions or Column Headings: Captions in a table stand for brief and self-explanatory
headings of vertical columns. Captions may involve headings and sub-headings as
well. The unit of the data contained should also be given for each column.
iv. Stubs or Row Designations: Stubs stand for brief and self-explanatory headings of
horizontal rows. A variable with a large number of classes is usually represented in rows. For
example, rows may stand for score classes and columns for data related to the sex of students. In
that case, there will be many rows for score classes but only two columns, for male and
female students.
v. Body: The body of the table contains the numerical information, that is, the frequency of
observations in the different cells. The data are arranged according to the descriptions given in
the captions and stubs.
vi. Footnotes: Footnotes are given at the foot of the table for explanation of any fact or
information included in the table which needs some explanation. Thus, they are meant for
explaining or providing further details about the data that have not been covered in title, captions
and stubs.
vii. Sources of data: Lastly one should also mention the source of information from which
data are taken. This may preferably include the name of the author, volume, page and the year of
publication.
Types of Tables:
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of
characteristics used. On the basis of the number of characteristics, tables may be classified as
follows: Simple or One-Way Table, Two-Way Table, and Manifold Table.
1. Simple or one-way Table
A simple or one-way table is the simplest table, which contains data on one characteristic only. A
simple table is easy to construct and simple to follow. For example, the table given below
shows the number of adults in different occupations in a locality.
The number of adults in different occupations in a locality

Occupation    Number of adults
Farmer        230
Student       150
Total         380
2. Two-way Table:
A table which contains data on two characteristics is called a two-way table. In such a case,
either the stub or the caption is divided into two co-ordinate parts. In the table given above, for
example, the caption may be further divided in respect of ‘sex’. This subdivision is shown in the
two-way table below, which now contains two characteristics, namely occupation and sex.
The number of adults in a locality in respect of occupation and sex

Occupation    Number of adults    Total
              Male    Female
Farmer        200     30          230
Students      100     50          150
Total         300     80          380
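The totals in the two-way table above can be checked with a short calculation. The following is only a minimal sketch in Python; the variable names (counts, row_totals, column_totals) are illustrative assumptions and are not part of the original notes.

```python
# Counts from the two-way table above: occupation (stubs) by sex (captions).
counts = {
    "Farmer": {"Male": 200, "Female": 30},
    "Students": {"Male": 100, "Female": 50},
}

# Row totals: one total per occupation.
row_totals = {occupation: sum(by_sex.values()) for occupation, by_sex in counts.items()}

# Column totals: one total per sex.
column_totals = {}
for by_sex in counts.values():
    for sex, number in by_sex.items():
        column_totals[sex] = column_totals.get(sex, 0) + number

# The grand total must be the same whether rows or columns are added.
grand_total = sum(row_totals.values())

print(row_totals)      # {'Farmer': 230, 'Students': 150}
print(column_totals)   # {'Male': 300, 'Female': 80}
print(grand_total)     # 380
```

Adding the row totals and adding the column totals give the same grand total, which is a useful check on any two-way table.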
3. Manifold Table:
More and more complex tables can thus be formed by including other characteristics. For
example, we may further classify the caption sub-headings in the above table in respect of
“marital status”, “religion”, “socio-economic status”, etc. A table which has more than two
characteristics of data is considered a manifold table. For instance, the table below shows
three characteristics, namely occupation, sex and marital status.
Number of adults in a locality in respect of occupation, sex and marital status

Occupation    Male                            Female                          Total
              Married  Unmarried  Total       Married  Unmarried  Total
Farmer        150      50         200         20       10         30          230
Students      10       90         100         5        45         50          150
Total         160      140        300         25       55         80          380

Manifold tables, though complex, are good in practice, as they enable full information to be incorporated and
facilitate analysis of all related facts. Still, as a normal practice, not more than four characteristics should be
represented in one table, to avoid confusion. Other related tables may be formed to show the remaining
characteristics.
1.8.1. Frequency Distributions
A frequency distribution is a series in which observations with similar or closely related values are
put into separate groups, the groups being arranged in order of magnitude. It is simply a table in
which the data are grouped into classes and the number of cases falling in each class is recorded. It
shows the frequency of occurrence of different values of a single phenomenon.
A frequency distribution is constructed for three main reasons:
✓ To facilitate the analysis of data
✓ To estimate frequencies of the unknown population distribution from the distribution of sample data, and
✓ To facilitate the computation of various statistical measures
1.8.2. Raw data
The statistical data collected are generally raw data or ungrouped data. Let us consider the daily wages (in
birr) of 30 laborers in a factory.
80 70 55 50 60 65 40 30 80 90
75 45 35 65 70 80 82 55 65 80
60 55 38 65 75 85 90 65 45 75
The above figures are raw or ungrouped data: they are recorded as they occur, without any
prior arrangement. This representation of the data does not furnish any useful information and is rather
confusing to the mind. A better way is to express the figures in ascending or descending order of magnitude,
which is commonly known as an array. But this does not reduce the bulk of the data. The above data, when
formed into an array, are in the following form:
30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 70 70 75
75 75 80 80 80 80 82 85 90 90
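Forming an array is simply sorting the raw data. As a minimal sketch (in Python, which the original notes do not use; the names wages and array are illustrative assumptions), the 30 daily wages can be arranged in ascending order as follows:

```python
# Daily wages (in birr) of the 30 laborers, in the order they were recorded.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

# An array is the same data arranged in ascending order of magnitude.
array = sorted(wages)

# In an array the minimum is the first item and the maximum is the last.
lowest, highest = array[0], array[-1]
value_range = highest - lowest

# Print the array ten items per row, as in the text.
for start in range(0, len(array), 10):
    print(" ".join(str(w) for w in array[start:start + 10]))

print(lowest, highest, value_range)   # 30 90 60
```

For descending order, sorted(wages, reverse=True) could be used instead.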
The array helps us to see at once the maximum and minimum values. It also gives a rough idea of the
distribution of the items over the range. When we have a large number of items, however, the formation of
an array is difficult, tedious and cumbersome. The data should therefore be condensed for better
understanding, which may be done in two ways, depending on the nature of the data.
A. Ungrouped Frequency Distribution (For Discrete Variables):
In this form of distribution, the frequency refers to discrete values. Here the data are presented in such a way
that the exact measurements of the units are clearly indicated. There are definite differences between the
values of different groups of items, and each class is distinct and separate from the other classes. Examples are
data such as the number of rooms in a house, the number of companies registered in a country, and the number
of children in a family. The process of preparing this type of distribution is very simple. We have just to count
the number of times a particular value is repeated, which is called the frequency of that class. In order to
facilitate counting, place all possible values of the variable, from the lowest to the highest, in one column, and
prepare a column of tallies next to it. Then put a bar (vertical line) opposite the particular value to which each
observation relates. To facilitate counting, the bars are grouped in blocks of five, and some space is left
between blocks. Finally, we count the number of bars to get the frequency.

Example 1: In a survey of 40 families in a village, the number of children per family was recorded and
the following data obtained.

1 0 3 2 1 5 6 2 2 1
0 3 4 2 1 6 3 2 1 5
3 3 2 4 2 2 3 0 2 1
4 5 3 3 4 4 1 2 4 5
Solution:
Frequency distribution of the number of children

Number of children    Tally marks      Frequency
0                     |||              3
1                     ||||| ||         7
2                     ||||| |||||      10
3                     ||||| |||        8
4                     ||||| |          6
5                     ||||             4
6                     ||               2
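The counting in Example 1 can also be done mechanically. The sketch below uses Python's collections.Counter to count how often each value occurs; the helper tally() is a hypothetical function written for this illustration, and it draws the bars in blocks of five as described in the text.

```python
from collections import Counter

# Number of children in each of the 40 families surveyed (Example 1).
children = [
    1, 0, 3, 2, 1, 5, 6, 2, 2, 1,
    0, 3, 4, 2, 1, 6, 3, 2, 1, 5,
    3, 3, 2, 4, 2, 2, 3, 0, 2, 1,
    4, 5, 3, 3, 4, 4, 1, 2, 4, 5,
]

# Counter counts the number of times each value is repeated.
frequency = Counter(children)


def tally(n):
    """Return n bars grouped in blocks of five, with a space between blocks."""
    full_blocks, remainder = divmod(n, 5)
    blocks = ["|" * 5] * full_blocks
    if remainder:
        blocks.append("|" * remainder)
    return " ".join(blocks)


# List every possible value from the lowest to the highest.
print("Children  Tally marks   Frequency")
for value in range(min(children), max(children) + 1):
    print(f"{value:<9} {tally(frequency[value]):<13} {frequency[value]}")
```

The frequencies add up to 40, the number of families surveyed, which confirms that no observation was missed.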

B. Grouped Frequency Distribution (For Continuous Variables):
In this form of distribution, the frequency refers to groups of values. This becomes necessary in the case of
variables which can take any fractional value and for which an exact measurement is not possible. A
discrete variable can also be presented in the form of a continuous frequency distribution.
➢ Nature of class
The following are some basic technical terms when a continuous frequency distribution is formed or data
are classified according to class intervals.
1) Class limits
The class limits are the lowest and the highest values that can be included in the class. For
example, take the class 30-40. The lowest value of the class is 30 and the highest value is 40. The
two boundaries of a class are known as the lower limit and the upper limit of the class. The lower
limit of a class is the value below which there can be no item in the class. The upper limit of a
class is the value above which there can be no item in that class. The way in which class limits
are stated depends upon the nature of the data. In statistical calculations, the lower class limit is
denoted by L and the upper class limit by U.
2) Class Interval:
The class interval may be defined as the size of each grouping of data. For example, 50-75,
75-100, 100-125… are class intervals. Each grouping begins with the lower limit of a class interval
and ends at the lower limit of the next succeeding class interval.
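As an illustration, the 30 daily wages from section 1.8.2 can be grouped into class intervals. The sketch below is written in Python and rests on two assumptions that are not stated in the notes: a class width of 10 birr, and the convention that a value equal to a shared limit (for example 40 in 30-40 and 40-50) is counted in the higher class.

```python
# Daily wages (in birr) of the 30 laborers from section 1.8.2.
wages = [
    80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
    75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
    60, 55, 38, 65, 75, 85, 90, 65, 45, 75,
]

width = 10            # assumed class width, chosen for illustration
lower = min(wages)    # the first class starts at the lowest value, 30

# Each class begins at its lower limit and ends at the lower limit
# of the next class; the upper limit itself is not counted (assumption).
grouped = {}
start = lower
while start <= max(wages):
    end = start + width
    grouped[(start, end)] = sum(1 for w in wages if start <= w < end)
    start = end

for (low, high), count in grouped.items():
    print(f"{low}-{high}: {count}")
```

With these assumptions the data fall into seven classes, from 30-40 to 90-100, and the class frequencies add up to 30.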
Number of class intervals: The number of class intervals in a frequency distribution is a matter of
importance. The number of class intervals should not be too large. For an ideal frequency distribution,
the number of class intervals can vary from 5 to 15. To decide the number of class intervals for the
frequency distribution, we take the lowest and the highest of the values in the whole data; the
difference between them will enable us to decide the class intervals. Thus, the number of class
intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be
decided with the help of Sturges' rule. According to this rule, the number of classes can be
determined by the formula
$K = 1 + 3.322 \log_{10} N$
where N = total number of observations,
$\log_{10}$ = logarithm to base 10,
K = number of class intervals.
Thus, if the number of observations is 10, then the number of class intervals is
$K = 1 + 3.322 \log_{10} 10 = 4.322 \approx 4$
If 100 observations are being studied, the number of class intervals is
$K = 1 + 3.322 \log_{10} 100 = 7.644 \approx 8$
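The two calculations above can be reproduced with a few lines of Python. This is only a sketch: the function name sturges_classes is a hypothetical helper, and rounding to the nearest whole number is an assumption inferred from the two worked results (4.322 becomes 4 and 7.644 becomes 8).

```python
import math


def sturges_classes(n):
    """Number of class intervals K = 1 + 3.322 log10(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


print(sturges_classes(10))    # 4
print(sturges_classes(100))   # 8
print(sturges_classes(30))    # 6, e.g. for the 30 daily wages in section 1.8.2
```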

3) Width or size of the class interval:
The difference between the lower and upper class limits is called the width or size of the class interval
and is denoted by ‘C’.
Size of the class interval: The size of the class interval is inversely proportional to the
number of class intervals in a given distribution. The approximate value of the size (or width or
magnitude) of the class interval ‘C’ is obtained by using Sturges' rule as
