Objectives of Statistics
1. FUNCTIONS OF STATISTICS
2. USES OF STATISTICS
3. LIMITATIONS OF STATISTICS
4. CLASSIFICATION OF DATA
5. PARTS OF A TABLE
6. AVERAGES
7. POSITIONAL AVERAGE
8. MERITS & DEMERITS OF MEDIAN
9. MERITS & DEMERITS OF ARITHMETIC MEAN
10. MEASURES OF DISPERSIONS (explain, need & types)
11. CORRELATION (explain & types)
12. REGRESSION (explain)
FUNCTIONS OF STATISTICS
(1) Statistics presents the data in numerical form: Numbers give the exact idea about any phenomenon. We know that India is
overpopulated. But only when we see the census figure, 548 million, do we get a real idea of the population problem. If we
want to compare the speed of two workmen working in the same factory, with the same type of machine, we have to see the number
of units they turn out every day. Facts are convincing only when they are expressed with the help of numbers.
(2) It simplifies the complex data: The data collected are complex in nature. Just by looking at the figures no person can know the
real nature of the problem under consideration. Statistical methods make the data easy to understand. When we have data about
the students making use of the college library, we can divide the students according to the number of hours spent in the library. We
can also see how many are studying and how many are sitting there for general reading.
(3) It facilitates comparison: We can compare the wage conditions in two factories by comparing the average wages in the two
factories. We can compare the increase in wages and corresponding increase in price level during that period. Such comparisons
are very useful in many social sciences.
(4) It studies relationship between two factors: The relationship between two factors, such as height and weight, food habits and health, or
smoking and the occurrence of cancer, can be studied using statistical techniques. We can estimate one factor given the other when
there is some relationship established between two factors.
(5) It is useful for forecasting: We are interested in forecasting using the past data. A shopkeeper may forecast the demand for the
goods and store them when they are easily available at a reasonable price. He can store only the required amount and there will not
be any problem of goods being wasted. A baker estimates the daily
demand for bread, and bakes only that amount so that there will be no problem of leftovers.
(6) It helps the formulation of policies: By analysing the effect of the policies employed so far with statistical methods, future
policies can be formulated. The requirements can be studied and policies can be determined accordingly. For example, the
import policy for food can be determined by studying the population figures, food habits etc.
USES OF STATISTICS
Statistical techniques can be used in all fields for which numeric data can be collected. These are used in social sciences like
Economics, Psychology, Education, Sociology as well as in exact sciences like Biology and Physics.
In any business or industry, there are various departments like production, sales, inventory, accounts. From every department,
numerical information can be obtained and then statistical methods are applied. For production, quality control techniques can be
applied to get better quality of finished products. In inventory, proper storage of products can be made using inventory techniques.
An idea about changing tastes and preferences of consumers can be obtained with the help of market surveys and accordingly
sales promotion techniques can be improved.
In Agriculture, statistics is used to decide the combination of fertilizers and insecticides to obtain maximum crop. In Medicine, it is
used to test the effectiveness or the side effects of drugs.
In Sociology and Education, statistical techniques are used to study the relationship between individuals and groups. A new branch,
known as psychometrics, is developed with special statistical techniques for the measurement of aspects of individuals' performance
in learning situations.
Using modern techniques of Operations Research, new methods to improve industrial organisations are introduced. In Economics,
construction of index numbers like wholesale price index number or cost of living index number, study of monopolistic markets,
international trade and finance, analysis of wages, interest rates etc., is done with statistical methods. New techniques like Linear
Programming and Input Output Analysis are used to optimise a function subject to certain constraints.
Thus, statistical methods form a useful and indispensable tool for a research worker in any field. The methods are useful in an
ever-widening range of human activities, in any field where numerical data are available.
LIMITATIONS OF STATISTICS
Despite its advantages, statistics has some limitations as well. A few of them are as follows.
i) Statistical methods can be applied only to quantitative data. Qualitative data have to be expressed in numerical form for
the application of these methods.
ii) The methods can be applied to a group of observations and not to a single observation. Likewise, the inferences drawn from the data
are true collectively and not individually. When we say that the average number of pages with misprints in a set of books is five, we
cannot say which particular book has five pages with misprints.
iii) Someone has said, "if a person is standing with one foot on a burning stove and the other on a block of ice, then a statistician
would say that on an average the person is in a comfortable position".
Apart from the humorous part of it, we have to remember that the results derived using statistical methods are not one hundred percent
accurate. They are true only on an average and in the long run, and there may not be a single observation assuming the value of the
average. For example, the average marks of some students in a subject may be 58.4 while there may not be a single student getting
58.4 marks.
iv) Mark Twain once remarked, "there are three types of lies - ordinary lies, damned lies and statistics". In a way he is right because
almost anything can be made to look proven using statistical techniques. A statement given with facts and figures carries more weight. People are
more convinced by anything written in black and white.
v) The personal bias (conscious or unconscious) of an investigator collecting the information may discredit the entire work, or a
statistician may draw inaccurate inferences which may lead to wrong interpretations.
Thus, though statistics is a powerful science, it has to be used with extreme care so as to avoid improper conclusions and incorrect
decisions.
CLASSIFICATION OF DATA
1) Geographical or Spatial Classification: If the data is classified according to
geographical areas or locations, it is called Geographical classification. For example, the
figures of sales of a certain product, for a particular time period, can be arranged
according to states like Maharashtra, Uttar Pradesh, Kerala etc. The figures of export of
tea to different countries like France, England, Italy etc. can be represented.
For instance, the total number of commercial feature films produced in India in 1991
according to states may be represented as follows:
The items may be listed alphabetically or according to magnitude. In the above table, the
items are listed in decreasing order of the magnitude.
2) Chronological or Historical Classification: When the data is classified with reference to
time it is called chronological classification. Here, the values are arranged over a period
of time, so it is also known as Time Series. The figures in a time series are arranged in
chronological order, that is, beginning with the earliest period.
This type of classification is widely used to represent data about production, sales,
exports etc. over a period of time. Consider the following classification of the total prize
money, expressed in pounds at Wimbledon, during 1985-1992.
3) Qualitative Classification: When classification is made according to some attributes like
colour, skill, intelligence etc., it is Qualitative Classification. We have already seen that
attributes can not be measured. The units can only be divided into groups of those who
possess the attribute and those who do not. Here, the groups are formed on the basis of
qualitative differences. Sales classified by type of products, population classified according
to sex or religion, sexwise classification of students at a certain university, classification of
workers according to the type of work done are some examples of Qualitative
Classification. Consider the following illustration of Qualitative Classification. It gives
classification of students in a college according to class.
4) Quantitative Classification: If the classification is made according to a characteristic which is measurable, it is Quantitative
Classification. Here, the data is classified according to magnitude of the characteristic. Here the data are expressed in quantitative
terms.
For example,
i) classification of employees according to incomes
ii) classification of students according to their heights, weights, marks etc.
iii) classification of factories according to the number of workers
iv) classification of persons with respect to age
v) classification of salesmen according to their yearly sales
The measurable characteristic is called a variate or a variable. In the above examples, the variables are income, height, weight,
marks, number of workers, age and sales.
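As a rough illustration of quantitative classification, the sketch below groups a set of marks into class intervals of equal width and counts the frequency in each interval. The marks, the interval width of 10 and the function name classify_by_interval are assumptions made for this example only; they do not come from the text.

```python
from collections import Counter


def classify_by_interval(values, width):
    """Count how many values fall in each class interval [lower, lower + width)."""
    # Integer division maps every value to the lower limit of its interval.
    counts = Counter((v // width) * width for v in values)
    return {(lower, lower + width): counts[lower] for lower in sorted(counts)}


# Hypothetical marks of ten students (illustrative data only).
marks = [35, 42, 58, 61, 47, 73, 55, 49, 66, 58]

table = classify_by_interval(marks, 10)
for (lower, upper), frequency in table.items():
    print(f"{lower} to under {upper}: {frequency}")
```

Here the variable is "marks" and each row of the printed table is one class with its frequency.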
PARTS OF A TABLE
1. Table Number: If there are many tables to be presented they must be serially numbered. The table
number should be written in the centre at the top.
2. Title: The title should be written in the centre at the top of the table, below the table number. The title must be self-explanatory and
brief. It must contain the names of the characteristics presented in the table. Also the geographical or physical area and the specific
time period covered by it should be mentioned in the title. The title should be prominently written in bold letters.
3. Caption: The caption is the column heading and is written at the centre of the column. A column heading may, if
necessary, be divided into subheadings. The units, if required, should be mentioned with the captions, like Age in years or
Height in cm etc.
4. Stub: A stub is a heading used for a row. It is written at the extreme left. It can be accompanied by the units in which the
characteristic is expressed.
5. Body of the Table: The numerical data expressed in the table constitutes the body of the table. The data may be arranged
chronologically or serially in ascending or descending order. Important figures may be prominently presented by underlining them or
writing them in bold numbers. The row totals are mentioned in the extreme right hand column. The column totals are placed at the
bottom of the columns and the grand total is written as the entry in the last column and last row.
6. Head Note: It is a short statement about all or major parts of the table. It is written below the title in brackets.
7. Footnotes: If any clarification is needed about the parts of a table like title, captions or stubs, it is given as a Footnote. The
Footnotes are written at the bottom of the table. Any heterogeneity in the data presented or any exception may be explained in a
Footnote. Some special circumstances like strikes, fire affecting the data may be mentioned in a footnote. It may also be used to
refer to the source of the secondary data. The footnotes may be serially numbered as 1, 2, 3, --- or special characters such as an
asterisk (*) or a dagger (†) may be used to identify the Footnotes.
The following specimen shows the placements of various parts of a table.
AVERAGES
One of the objectives of the analysis of data is to get one single value which can describe characteristics of the entire mass of the
data, which can be considered as representative of the entire distribution. A value satisfying this criterion is a central value or an
"average".
In practice, the word "average" is used with different meanings. For instance, we speak of an average student, average height of boys, average
Hindi film, average actor, average income, etc. In some cases, we use the term "average" to denote a mediocre type e.g. average
student, average actor, average film, etc. In some other cases by the expression "average" we mean "typical" or "usual" e.g.
average Indian, average housewife etc. In statistical terms the average refers to a value obtained by a specific process, like
average height or average income.
In Statistics, the average is a representative or typical value of the data. It usually lies somewhere near the centre of the group and
that is why the averages are termed as measures of Central Tendency or Central Value. It depicts the main characteristics of the
data. Large volumes of data cannot be easily understood or remembered. So a single value, summarising prominent features of the
data is needed.
If two or more sets of data are to be compared, it is not possible to compare each and every item. So, we require one figure,
representing the entire data in condensed form. For example, average salaries of employees of two companies of same type can be
compared. Suppose these are 2,500 and 2,150. The employees of the second company can demand a raise in salary based on
these results, or suppose, average marks at the terminal examination of students of two divisions of F.Y.B.Com. are 65.2 and 45.8
respectively. Then, some arrangement of special coaching can be made for students of second division. Thus, averages can
facilitate inter-comparison of different characteristics.
While drawing conclusions, care has to be taken to study the number of forces affecting the data. For instance, in the previous
example of students of two divisions, the divisions might have been formed according to marks at H.S.C. examination and the first
division may have students with higher percentage, which explains the average marks of 65.2 at terminal examination. Another point
to be noted is that the same type of measure must be used to compare two or more sets of data.
Requisites of a Good Average:
1. It should be easy to understand and easy to calculate.
2. It should be based on all observations.
3. It should be capable of further algebraic treatment.
4. It should not be affected by extreme values.
5. It should not be affected much by sampling fluctuations.
TYPES OF AVERAGES: The averages can be classified into two groups - mathematical averages and positional averages. The
mathematical averages are based on all observations and they are calculated using mathematical formulae. These averages are:
(i) Arithmetic Mean (ii) Geometric Mean (iii) Harmonic Mean
However, we will study only the first two averages.
The positional averages are based on only some of the observations and are located at a specific place in the sets. They are also
called "measures of location". They are:
(i) Median. (ii) Mode
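To make the three mathematical averages concrete, here is a minimal sketch using Python's standard statistics module. The three observations are hypothetical values chosen only so that the results are easy to check by hand.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical observations (illustrative data only).
values = [2, 4, 8]

arithmetic = mean(values)           # sum of the values divided by their number
geometric = geometric_mean(values)  # n-th root of the product of the values
harmonic = harmonic_mean(values)    # n divided by the sum of the reciprocals

print(f"Arithmetic mean: {arithmetic:.4f}")
print(f"Geometric mean:  {geometric:.4f}")
print(f"Harmonic mean:   {harmonic:.4f}")
```

For positive values that are not all equal, the arithmetic mean is the largest of the three and the harmonic mean the smallest, which the printed figures reflect.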
POSITIONAL AVERAGE
1. MEDIAN: Median is an important measure of location. When the raw or ungrouped data are arranged in ascending or descending
order, the middle observation or the arithmetic mean of two middle observations is the median.
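The definition can be sketched in a few lines of Python. The function name median_of and the two data sets are assumptions made for illustration; the logic simply follows the definition for ungrouped data.

```python
def median_of(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    if not values:
        raise ValueError("median_of() needs at least one observation")
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle observation
        return ordered[middle]
    # even count: arithmetic mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical observations (illustrative data only).
odd_set = [46, 34, 38, 35, 37]
even_set = [10, 90, 45, 12]

print(median_of(odd_set))
print(median_of(even_set))
```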
2. MODE: The mode is defined as the value of a variable which occurs most frequently. It is the value which is repeated the maximum
number of times, that is, with the highest frequency. So, mode is considered as the most typical average. Graphically, it is the value on the x-axis
corresponding to the peak of the frequency curve.
If the data are ungrouped, mode can be obtained from inspection as the value with the maximum frequency. If we want to calculate
the most common height for a group of students or the most common size of ready made shirts we have to consider the mode as
the average. In market surveys, to know consumers' preference, mode is considered as the most suitable average.
For small sets of ungrouped data, the mode can be found by inspection. For grouped data, the mode is calculated with the help of
an interpolation formula. If a distribution has two or more values of maximum frequency, then the distribution is known as bimodal,
trimodal or multimodal.
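For ungrouped data, finding the mode "by inspection" amounts to counting frequencies. A small sketch using multimode from Python's standard statistics module follows; the shirt sizes and marks are hypothetical data used only for illustration.

```python
from statistics import multimode

# Hypothetical sizes of ready-made shirts sold in a day (illustrative data only).
shirt_sizes = [38, 40, 40, 42, 40, 38, 44]

# Hypothetical marks in which two values share the highest frequency.
marks = [1, 1, 2, 2, 3]

# multimode returns every value that occurs with the maximum frequency.
single_mode = multimode(shirt_sizes)
two_modes = multimode(marks)

print(single_mode)
print(two_modes)
```

A result with one value indicates a unimodal set; two values indicate a bimodal set, and so on.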
MERITS & DEMERITS OF MEDIAN
Merits of Median:
(i) It is easily understood and the calculations are also simple. In some cases, it can be obtained by mere inspection.
(ii) The sum of the absolute deviations of all the values from the median is minimum.
(iii) Median can be located graphically with the help of ogives.
(iv) If some extreme values are not known and the total number of observations is known, median can be obtained.
(v) When the distribution of the data is not symmetrical, median is an appropriate average.
(vi) It is a value which exists in the data in many cases.
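The property that the sum of absolute deviations is smallest when taken from the median can be checked numerically. The sketch below tries every whole number between the smallest and largest observation as a possible centre; the data and the function name total_absolute_deviation are assumptions made for this example.

```python
def total_absolute_deviation(values, centre):
    """Sum of the absolute deviations of all values from the given centre."""
    return sum(abs(v - centre) for v in values)


# Illustrative observations; sorted, so the median (37) is the middle value.
values = [34, 35, 37, 38, 46]

# Try every whole number in the range of the data as a candidate centre.
candidates = range(min(values), max(values) + 1)
best = min(candidates, key=lambda c: total_absolute_deviation(values, c))

print("Centre with the smallest total:", best)
print("Total about the median (37):", total_absolute_deviation(values, 37))
print("Total about the mean (38):", total_absolute_deviation(values, 38))
```

This is only a check on one data set, not a proof, but it shows the median giving a smaller total than the arithmetic mean.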
Demerits of Median:
(i) It is not based on all observations so it may not be a good representative of the data in some situations.
(ii) Its calculation requires prior arrangement of the data in ascending or descending order.
(iii) The median is not capable of further mathematical treatment.
(iv) It is affected by sampling fluctuations.
(v) For continuous variate case, the formula is obtained on the assumption of uniform distribution of frequencies over the class
intervals. This assumption may not be true.
MEASURES OF DISPERSIONS
We have studied various measures of central tendency such as mean, median and mode in the previous chapter. But they are not
adequate to describe the distribution. For instance, consider the following sets of observations:
Set A: 35, 37, 34, 38 and 46 with mean 38
Set B: 10, 90, 45, 12 and 33 with mean 38
Set C: 38, 38, 38, 38 and 38 with mean 38
All the sets have the same mean 38, but if the values in the sets are observed carefully it can be seen that in set C, the average 38
completely represents the distribution; in set A, only one value is represented by the average; and in set B, the average 38 represents
none of the values. Also, the variation of the items is nil in set C and is maximum in set B.
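The comparison of sets A, B and C can be reproduced with a short sketch. The range and the standard deviation are used here as two common measures of spread; choosing these two measures is an assumption of this example, since the passage above does not name a specific measure.

```python
from statistics import mean, pstdev

# The three sets of observations given in the text.
sets = {
    "A": [35, 37, 34, 38, 46],
    "B": [10, 90, 45, 12, 33],
    "C": [38, 38, 38, 38, 38],
}

summary = {}
for name, values in sets.items():
    value_range = max(values) - min(values)   # largest value minus smallest value
    spread = pstdev(values)                   # standard deviation of the whole set
    summary[name] = (mean(values), value_range, spread)
    print(f"Set {name}: mean={mean(values)}, range={value_range}, sd={spread:.2f}")
```

All three sets report the same mean, while both measures of spread are zero for set C and largest for set B.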
Thus, it is quite clear that in addition to averages, some additional information about the variation of items is required, to know the
extent to which the values vary from one another and from the central value. A measure of spread or scatter of the data is called a
measure of variation or dispersion. The measures of dispersion can give us an idea about the reliability of the averages. When the
dispersion is less, the average is more reliable so that it is a better estimate of the population average, and if the dispersion is
more, the average is not a good representative of the data. The measures of dispersion determine the extent of variation in the
data, by which, some steps can be taken to control the variability. For instance, in factories quality control techniques can be applied
to control the variation. The measures of dispersion can be used to compare two or more distributions. The one with less dispersion
is more consistent or homogeneous and the one with more dispersion is less consistent. The study of dispersion leads to further
advanced techniques in analysis such as Statistical Quality Control, Cost Control, Inventory Control, etc.
Requisites of a Good Measure of Dispersion
(i) It should be easy to understand and easy to calculate.
(ii) It should be rigidly defined.
(iii) It should be based on all the observations.
(iv) It should be capable of further algebraic treatment.
(v) It should not be affected much by extreme values.
(vi) It should have sampling stability.
There are two types of measures of dispersion.
(a) Absolute Measures giving actual extent of scatter of the data and
(b) Relative Measures expressed as pure numbers, independent of the unit of measurement.
Corresponding to each absolute measure of dispersion, a relative measure can be defined which can be used to compare two or
more distributions. Now, let us study these measures.
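The difference between an absolute and a relative measure can be seen in a small sketch. It uses the standard deviation as the absolute measure and the coefficient of variation (standard deviation as a percentage of the mean) as the corresponding relative measure; the heights and the function name coefficient_of_variation are assumptions made for this example.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Relative dispersion: standard deviation expressed as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100


# Hypothetical heights of five students, in centimetres and in metres.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

sd_cm = pstdev(heights_cm)   # absolute measure, expressed in cm
sd_m = pstdev(heights_m)     # absolute measure, expressed in m
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(f"Standard deviation: {sd_cm:.3f} cm, {sd_m:.5f} m")
print(f"Coefficient of variation: {cv_cm:.3f}% and {cv_m:.3f}%")
```

The standard deviation changes with the unit of measurement, while the coefficient of variation is a pure number and stays the same, which is why relative measures suit comparisons between distributions.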
CORRELATION
REGRESSION
Regression analysis is a method of predicting or estimating one variable knowing the value of the other variable. Estimation is
required in different fields in everyday life. A businessman wants to know the effect of increase in advertising expenditure on sales
or a doctor wishes to observe the effect of a new drug on patients. An economist is interested in finding the effect of change in
demand pattern of some commodities on prices.
We observe different pairs of variables related to each other: saving depends upon income, cost of production depends upon the
number of units produced, the production depends on the number of workers present on a particular day etc. The relationship
between two variables can be established with the help of any measure of correlation, as we have seen in the previous chapter.
When it is observed that two variables are highly correlated, it suggests interdependence of the variables. We can study the cause
and effect relationship between them and then we can apply the regression analysis. The analysis helps in finding a mathematical
model of the relationship. In this chapter we will discuss linear regression only.
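As a minimal sketch of what a linear regression model looks like, the code below fits a straight line y = intercept + slope * x by the method of least squares. The choice of least squares, the function name fit_line and the advertising and sales figures are all assumptions made for this example; they are not taken from the text.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical advertising expenditure and sales figures (illustrative data only).
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]

slope, intercept = fit_line(advertising, sales)
predicted = intercept + slope * 6   # estimated sales for an expenditure of 6

print(f"sales = {intercept:.2f} + {slope:.2f} * advertising")
print(f"Estimated sales for advertising = 6: {predicted:.2f}")
```

Once the line is fitted, the value of one variable can be estimated from a known value of the other, as in the last line of the sketch.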