Sheet - 1 - EEE - Introduction of Statistics - Not in Syllabus
Statistics:
‘When we think of statistics, in practice we think of a set of data that has been collected.’
Statistics is the science of data.
Statistics is the art of learning from data. (Ross, 2010)
Statistics is the study of data.
Statistics is the use of data to help the decision maker reach better decisions.
Statistics is the study of the collection, organization, presentation, analysis, and interpretation, of numerical data.
Initially it was regarded as the ‘science of statecraft’.
Statistics is now considered an indispensable part of our everyday life.
Statistics: It may be defined as the branch of science, which deals with the collection, organization, presentation,
analysis and interpretation of numerical data in any field of enquiry to assist in making more effective decisions.
Dr. A. L. Bowley defined statistics thus: “Statistics are numerical statements of facts in any department of enquiry
placed in relation to each other.”
According to Croxton and Cowden, “Statistics may be defined as the science of collection, presentation, analysis, and
interpretation of numerical data.”
Agricultural statistics: A branch of statistics in which statistical methods and techniques are used to analyse the data
collected from various fields of agriculture.
Business statistics, biostatistics, psychometry, education statistics, etc. are branches of statistics.
Origin and development of Statistics: The word ‘Statistics’ seems to have been derived from the Latin word ‘status’,
the Italian word ‘statista’, the German word ‘statistik’ or the French word ‘statistique’, each of which means a ‘political
state’. Thus it was regarded as the ‘science of statecraft’.
Major Areas of Statistics:
1. Descriptive Statistics
2. Inferential statistics
Descriptive statistics:
It involves methods of organizing, picturing and summarizing information from data.
Inferential Statistics:
Inferential statistics involves methods of using information from a sample to draw conclusions about the population
characteristics.
Characteristics of Statistics:
a. Statistics deal with aggregates of facts rather than with individual facts alone.
b. Statistics are generally not the outcome of a single cause but are affected by multiple causes.
c. Statistics are numerically expressed.
d. Statistics are collected in a systematic manner.
e. Statistics are collected for a predetermined purpose.
f. Statistics are enumerated or estimated according to a reasonable standard of accuracy.
g. Statistics are comparable and homogeneous.
Functions of Statistics:
1. It condenses and summarizes voluminous data into a few presentable, understandable and precise figures. In other
words, it simplifies mass of figures.
2. It facilitates classification and comparison of data.
3. It helps in determining the relationship between two or more phenomena (correlation).
4. It helps in predicting future trends.
5. It helps in formulating and testing suitable hypotheses.
6. It helps the central management and the government in formulating suitable policies.
Limitations of Statistics:
1. Statistics deals with aggregates of items and not with an isolated or individual item or measurement.
2. Statistics deals only with quantitative characteristics.
3. Statistical laws hold good only for the averages.
4. It plays only an auxiliary role in summarizing a fact.
5. Statistics can be misused.
Statistics are liable to be misused and misinterpreted. As it is well known “there are three kinds of lies- lies,
damned lies and statistics”.
Importance:
* Statistics of wealth and manpower are important for development and planning.
* Statistics are invaluable in business and commerce.
* Statistics helps the planner to estimate the revenue, income and expenditure of the country.
* Agricultural statistics may play a key role in agricultural development.
* In industry, statistics is widely used for quality control.
* Statistics is widely used in education, life science, social science and psychology.
Infinite population: Number of fish in a river, number of insects in a large agricultural field, etc.
Sample: A representative part of a population that is considered for study and analysis is called a sample.
Example: Some employees of a firm
Population | Sample
The totality of all elements under study or enquiry is called the population. | A representative part of the population is called a sample.
A population includes each element from the set of observations that can be made. | A sample consists only of observations drawn from the population. It is a subset of the population.
A population may be finite or infinite. | A sample must be finite.
Collecting data from every element of a population is not easy. | Collecting data is relatively easy.
Example: all students of statistics. | Example: a few students drawn from the students of statistics.
Parameter: Any numerical measure or value that describes an unknown characteristic of a population is called
parameter.
µ, σ², etc. are parameters.
Statistic: Any numerical value that describes a characteristic of a sample is called a statistic.
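The distinction between a parameter and a statistic can be illustrated with a small sketch. The population values below are hypothetical (not from these notes), and the sample is taken as every second unit purely for illustration; a real study would draw it at random.

```python
import statistics

# Hypothetical population: every unit of interest is listed, so values
# computed from it describe the population and are parameters.
population = [4, 8, 6, 5, 3, 7, 9, 6]
mu = statistics.mean(population)             # population mean (parameter)
sigma_sq = statistics.pvariance(population)  # population variance (parameter)

# A sample is a subset of the population. Taking every second unit is an
# illustrative shortcut, not a recommended sampling scheme.
sample = population[::2]
x_bar = statistics.mean(sample)              # sample mean (statistic)

print("mu =", mu, "sigma^2 =", sigma_sq, "x-bar =", x_bar)
```

The sample mean generally differs from µ; it is a statistic used to estimate the unknown parameter.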
Data:
A set of observations obtained from a particular enquiry is called data or a data set. A single observation is known as
a datum.
Variable: A measurable characteristic or phenomenon, which varies from unit to unit under consideration, is called
a variable. Example: Weight, height, gender.
Types of variable: a variable is either qualitative or quantitative; a quantitative variable is either discrete or continuous.
Quantitative (also known as numerical variable): A variable is called quantitative when it measures a numerical
quantity or amount on each experimental unit. Example: length and diameter of trees.
Qualitative: A qualitative variable is one for which numerical measurement is not possible but whose values can be
categorized under some qualitative characteristic. It is also known as a categorical variable or attribute.
For example: Gender of patients of a clinic, teaching performance of a professor, opinion of the economists regarding
the economic conditions in the country, etc.
Scales of measurement:
Organization of Data
Introduction: Data that have just been collected are in raw, disorganized form. It is very important that the numerical
findings of any study be presented clearly and concisely, in a manner that enables one to quickly obtain a feel for
the essential characteristics of the data, which are difficult to see in the raw data. Hence organization or
summarization of data is required for presentation and analysis.
Classification: Classification is the process of arranging data into different groups or classes according to their
common characteristics.
Objectives/Purposes:
(i) To condense the mass of data.
(ii) To bring out clearly the points of similarity and dissimilarity.
(iii) To prepare the data for tabulation.
(iv) To facilitate comparison.
(v) To pinpoint the most significant features of the data at a glance.
Types/Basis of classification:
(a) Geographical --- Area wise, Cities, district, divisions, rural, urban
(b) Chronological --- On the basis of time
(c) Qualitative --- According to some attribute (quality), e.g. population: urban and rural
(d) Quantitative --- in terms of magnitudes:
Tabulation: Tabulation is a logical and systematic organization/ arrangement of statistical data in rows and columns.
Frequency: The number of observations that fall into a given group or class is termed the class frequency, or
simply the frequency.
Frequency Distribution: The arrangement of observational data into different groups according to the frequencies of the
observations is called a frequency distribution. In other words, a frequency distribution is the organization
of raw data in the form of a table using classes and frequencies.
7.4 2.0 3.7 9.4 4.4 4.8 7.6 9.9 8.3 6.5 7.3 4.1 5.2 8.1 3.4 10.0 7.1 4.1 7.2 9.4
8.2 3.1 10.9 7.1 5.2 6.9 7.3 7.3 8.2 5.7 7.0 8.0 5.7 4.0 7.2 9.4 8.2 4.4 5.2 8.9
6.7 4.3 11.7 8.1 6.4 6.0 5.1 8.2 8.9 7.3 7.8 7.4 6.8 6.7 5.3 11.2 7.0 7.5 7.5 7.6
7.1 7.7 8.8 8.0 7.1 6.8 7.4 10.0 7.2 8.8 8.8 7.2 10.1 9.3 8.3 6.7 4.1 5.1 7.4 7.4
7.3 9.8 8.2 7.7 7.6 9.8 6.0 9.7 5.1 6.0 7.1 9.3 9.1 7.6 7.5 9.3 8.5 7.3 9.4 9.4
1. Find the range: The largest value is 11.7 and the smallest value is 2.0.
Thus range = largest value - smallest value = 11.7 - 2.0 = 9.7.
2. Number of classes: Find the number of classes to be made. Here n = 100, so the number of classes
should be $K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 100 = 7.644 \approx 8$.
3. Class interval: The length (size) of each class should be around $\frac{\text{Range}}{\text{No. of classes}} = \frac{9.7}{8} \approx 1.21$,
taken here as 1.3 so that the 8 classes cover the whole range.
4. Now make the classes with interval 1.3: 2.0 - 3.3, 3.3 - 4.6, …, 11.1 - 12.4.
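The four steps can be sketched in Python as follows. This is a minimal illustration, not the tabulation used in the notes: the function names are hypothetical, only the first 20 observations of the data set are counted, and values are converted to whole tenths so that an observation lying exactly on a class boundary (such as 7.2) is placed in the upper class, as the exclusive method requires.

```python
import math

def number_of_classes(n):
    """Number of classes from K = 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def frequency_table(values, lower, width, k):
    """Count values in k classes [lower, lower + width), ... (exclusive method).

    Values and boundaries are converted to whole tenths (data have one
    decimal place) to avoid floating-point trouble at class boundaries.
    """
    lo, w = round(lower * 10), round(width * 10)
    counts = [0] * k
    for v in values:
        index = (round(v * 10) - lo) // w
        counts[index] += 1
    return counts

# First 20 observations of the data set above (an illustrative subset).
sample = [7.4, 2.0, 3.7, 9.4, 4.4, 4.8, 7.6, 9.9, 8.3, 6.5,
          7.3, 4.1, 5.2, 8.1, 3.4, 10.0, 7.1, 4.1, 7.2, 9.4]

k = number_of_classes(100)  # n = 100 observations in the full data set
counts = frequency_table(sample, lower=2.0, width=1.3, k=k)
for i, f in enumerate(counts):
    print(f"{2.0 + i * 1.3:.1f} - {2.0 + (i + 1) * 1.3:.1f}: {f}")
```

Passing all 100 observations instead of the subset would give the full frequency distribution.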
Bivariate frequency distribution for quantitative variables (also known as a correlation table):
Problem: The following data represent the temperature (in °C) and humidity (in %) on different days of the year:
Temp 33.0 33.5 32.6 32.4 32.8 32.2 33.4 33.4 32.2 33.7 33.8
Humidity 82 81 85 84 81 78 81 82 84 80 78
Temp 25.2 27.9 30.2 31.9 33.8 31.3 31.2 32.9 33.8 321.5 29.0
Humidity 81 76 71 81 82 83 89 89 84 82 82
Temp 21.3 27.6 30.7 34.0 34.9 35.7 32.8 32.8 32.6 29.8 26.7
Humidity 84 75 69 74 74 76 82 90 89 88 86
Temp 22.5 27.3 28.8 30.9 32.2 32.7 30.5 30.8 31.6 32.4 30.7
Humidity 78 71 72 81 82 86 90 90 86 85 80
Construct a bivariate frequency table to show the temperature and humidity (left as an exercise for the participants).
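As a starting point for the exercise, the sketch below cross-classifies the first 11 (temperature, humidity) pairs. The class widths (1 °C for temperature, 5 percentage points for humidity) and the helper names are assumptions made for illustration; the notes do not prescribe them.

```python
import math
from collections import Counter

# First 11 (temperature, humidity) pairs from the data above.
temps = [33.0, 33.5, 32.6, 32.4, 32.8, 32.2, 33.4, 33.4, 32.2, 33.7, 33.8]
humidity = [82, 81, 85, 84, 81, 78, 81, 82, 84, 80, 78]

def temp_class(t):
    """Lower limit of an assumed 1-degree class: 32 means [32, 33)."""
    return math.floor(t)

def humidity_class(h):
    """Lower limit of an assumed 5-point class: 80 means [80, 85)."""
    return 5 * (h // 5)

# Each cell of the bivariate table counts the pairs that fall in one
# (temperature class, humidity class) combination.
table = Counter((temp_class(t), humidity_class(h))
                for t, h in zip(temps, humidity))

for (t_lo, h_lo), f in sorted(table.items()):
    print(f"temp [{t_lo}, {t_lo + 1})  humidity [{h_lo}, {h_lo + 5}): {f}")
```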
Graphical representation: Statistical data may be presented through visual aids, referred to as graphs
and diagrams.
Bar diagram: The simple bar diagram is the most popular diagrammatic representation of qualitative data.
Represent the data by a simple bar diagram, a horizontal bar diagram, a component bar diagram and a multiple bar diagram.
[Figures: simple bar diagram, horizontal bar diagram, component bar diagram and multiple bar diagram. Axes: Census Year (1 to 5) and Population / Total Population; bar labels 60, 75, 100, 120, 130.]
Pie diagram: This type of diagram enables us to show the partitioning of a total into its component parts.
Sector | Expenditure | Relative expenditure | Angle of sector (degrees)
Agriculture | 80 | 0.30 | 108.00
Industry | 70 | 0.26 | 93.60
Education | 40 | 0.15 | 54.00
Transport | 25 | 0.09 | 32.40
Other | 55 | 0.20 | 72.00
Total | 270 | 1.00 | 360.00
The pie chart of the expenditure of different sectors is exhibited in Figure 4.4.2.
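The sector angles can be computed as below. This sketch uses unrounded proportions, so its angles differ slightly from the table, which multiplies proportions already rounded to two decimals by 360 (for example 0.30 × 360 = 108.00 for Agriculture, against about 106.67 here).

```python
# Expenditure by sector, as given in the table.
expenditure = {"Agriculture": 80, "Industry": 70, "Education": 40,
               "Transport": 25, "Other": 55}

total = sum(expenditure.values())

# Angle of each sector = (component / total) * 360 degrees.
angles = {sector: value / total * 360
          for sector, value in expenditure.items()}

for sector, angle in angles.items():
    share = expenditure[sector] / total
    print(f"{sector:<12} {share:.2f} {angle:7.2f}")
```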
Histogram: The most common form of graphical presentation of a frequency distribution is the histogram. A
histogram is a bar diagram which is suitable for frequency distribution with continuous classes.
Table-1: Frequency distribution of birth weight of 100 newborn babies (exclusive method):
Class | Frequency | Cumulative frequency
9.8 -- 11.1 | 7 | 98
11.1 -- 12.4 | 2 | 100
Total | 100 |
[Figure: histogram of the frequency distribution in Table-1 (y-axis: Frequency)]
Frequency polygon:
b) Ogive curve: The cumulative frequency curve, or ogive, is the graphic representation of a cumulative frequency
distribution. Ogives are of two types: i) less than ogive and ii) greater than ogive.
i) Less than ogive: In this case the less-than cumulative frequencies are plotted against the upper boundaries of their
respective class intervals.
ii) Greater than ogive: In this case the greater-than cumulative frequencies are plotted against the lower boundaries of their
respective class intervals.
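The points of both ogives can be prepared as in the sketch below, which follows the usual convention that a less-than cumulative frequency is paired with the upper boundary of its class and a greater-than cumulative frequency with the lower boundary. The frequencies 19, 36, 15, 7 and 2 are those used in the worked examples of these notes; the first three (2, 9, 10) are not visible in the surviving part of Table-1 and are an assumption, counted from the raw data and consistent with the cumulative frequency 40 used for the median.

```python
from itertools import accumulate

boundaries = [2.0, 3.3, 4.6, 5.9, 7.2, 8.5, 9.8, 11.1, 12.4]
# Class frequencies; the first three values are an assumption (see above).
frequencies = [2, 9, 10, 19, 36, 15, 7, 2]

# Less-than type: running total from the first class upwards.
less_than = list(accumulate(frequencies))
# Greater-than type: running total from the last class downwards.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

# (x, y) points to plot for each ogive.
less_than_points = list(zip(boundaries[1:], less_than))       # upper boundaries
greater_than_points = list(zip(boundaries[:-1], greater_than))  # lower boundaries

print(less_than_points)
print(greater_than_points)
```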
[Figure: ogive curve (x-axis: class boundaries 2.0, 3.3, 4.6, 5.9, 7.2, 8.5, 9.8, 11.1, 12.4; y-axis: Cumulative frequency)]
Objective of averaging:
There are two main objectives of the study of averages:
1. To get one single value that describes the characteristics of the entire data.
2. To facilitate comparison, for example:
a. comparing the results of two colleges: which one is better?
b. comparing the results of the same college over 2 years: are the results improving?
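Geometric mean (GM):
$GM = (x_1 x_2 \cdots x_n)^{1/n}$, or $GM = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right)$.
For a frequency distribution, $GM = \text{Antilog}\left(\frac{\sum f_i \log x_i}{n}\right)$.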
Problem: Suppose that we have a family of seven members whose ages in years are 12, 7, 21, 34, 17, 21
and 2. Compute the AM, GM, HM, median and mode.
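Solution: $AM = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{x_1 + x_2 + \cdots + x_7}{7} = \frac{12 + 7 + 21 + 34 + 17 + 21 + 2}{7} = 16.29$
Here $HM = \frac{7}{\sum \frac{1}{x_i}} = \frac{7}{\frac{1}{12} + \frac{1}{7} + \frac{1}{21} + \frac{1}{34} + \frac{1}{17} + \frac{1}{21} + \frac{1}{2}} = \frac{7}{0.91} = 7.69$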
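Median: Arrange the data in order; the middle value is the median. Ordered data: 2, 7, 12, 17, 21, 21, 34, so the median is 17 (the 4th of 7 values).
Mode: The most frequent value is 21 (it occurs twice), so the mode is 21.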
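The five measures for this problem can be checked with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). This is only a verification sketch. The unrounded harmonic mean is about 7.70; the 7.69 above comes from rounding the denominator to 0.91.

```python
import statistics

ages = [12, 7, 21, 34, 17, 21, 2]

am = statistics.mean(ages)            # arithmetic mean
gm = statistics.geometric_mean(ages)  # n-th root of the product
hm = statistics.harmonic_mean(ages)   # n / sum(1/x)
median = statistics.median(ages)      # middle value of the sorted data
mode = statistics.mode(ages)          # most frequent value

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}, "
      f"Median = {median}, Mode = {mode}")
```

For positive data that are not all equal, the results satisfy AM > GM > HM.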
Arithmetic mean, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{734.30}{100} = 7.34$
Arithmetic mean (by indirect method):
Let $u_i = \frac{x_i - a}{h}$, so that $x_i = a + h u_i$. Summing over all observations and dividing by $n$ gives $\bar{x} = a + h\bar{u}$.
AM by indirect method, $\bar{x} = a + h\bar{u} = a + h \frac{\sum f_i u_i}{n} = 6.55 + 1.30 \times \frac{61}{100} = 6.55 + 0.79 = 7.34$, which is equal to the result of the direct
method.
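Both routes to the grouped arithmetic mean can be compared in a short sketch. The mid-values are those of the classes 2.0 - 3.3, …, 11.1 - 12.4. The frequencies 19, 36, 15, 7, 2 come from the worked examples in these notes, while the first three (2, 9, 10) are an assumption counted from the raw data, consistent with the cumulative frequency 40 and with $\sum f_i u_i = 61$.

```python
# Class mid-values and frequencies (first three frequencies are assumed).
midpoints = [2.65, 3.95, 5.25, 6.55, 7.85, 9.15, 10.45, 11.75]
frequencies = [2, 9, 10, 19, 36, 15, 7, 2]
n = sum(frequencies)

# Direct method: x-bar = sum(f * x) / n.
mean_direct = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Indirect method: u = (x - a) / h with assumed mean a and class width h.
a, h = 6.55, 1.30
# Mid-values are equally spaced, so each u is a whole number; round()
# only removes floating-point noise.
u = [round((x - a) / h) for x in midpoints]
sum_fu = sum(f * ui for f, ui in zip(frequencies, u))
mean_indirect = a + h * sum_fu / n

print(f"direct = {mean_direct:.2f}, indirect = {mean_indirect:.2f}")
```

The indirect method is mainly a hand-calculation aid: the coded values u are small integers, which keeps the products f·u easy to compute.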
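Geometric mean:
Here $GM = \text{Antilog}\left(\frac{\sum f_i \log x_i}{n}\right) = \text{Antilog}\left(\frac{84.74}{100}\right) = \text{Antilog}(0.85) = 7.08$
Harmonic mean, $HM = \frac{n}{\sum \frac{f_i}{x_i}} = \frac{100}{14.90} = 6.71$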
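The grouped geometric and harmonic means can be reproduced as sketched below, under the same assumption about the first three class frequencies (2, 9, 10) as in the rest of these examples. Because no intermediate rounding is applied, the geometric mean comes out slightly below the 7.08 quoted above, where the mean logarithm is rounded to 0.85 before the antilog is taken.

```python
import math

# Class mid-values and frequencies (first three frequencies are assumed).
midpoints = [2.65, 3.95, 5.25, 6.55, 7.85, 9.15, 10.45, 11.75]
frequencies = [2, 9, 10, 19, 36, 15, 7, 2]
n = sum(frequencies)

# GM = Antilog( sum(f * log10(x)) / n )
sum_f_log_x = sum(f * math.log10(x) for f, x in zip(frequencies, midpoints))
gm = 10 ** (sum_f_log_x / n)

# HM = n / sum(f / x)
sum_f_over_x = sum(f / x for f, x in zip(frequencies, midpoints))
hm = n / sum_f_over_x

print(f"GM = {gm:.2f}, HM = {hm:.2f}")
```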
Median:
Median, $Me = L_m + \frac{\frac{n}{2} - F_m}{f_m} \times C$,
where $L_m$ = lower limit of the median class; $n$ = total frequency;
$f_m$ = frequency of the median class; $F_m$ = cumulative frequency of the pre-median class;
$C$ = width of the median class.
Here $Me = L_m + \frac{\frac{n}{2} - F_m}{f_m} \times C = 7.2 + \frac{50 - 40}{36} \times 1.30 = 7.56$
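Similarly,
3rd quartile: $Q_3 = L + \frac{\frac{3n}{4} - F_m}{f_m} \times C = 7.2 + \frac{\frac{3 \times 100}{4} - 40}{36} \times 1.30 = 7.2 + \frac{75 - 40}{36} \times 1.3 = 8.46$
Interpretation: 75 per cent of the newborn babies weigh 8.46 lb or less.
7th decile: $D_7 = L + \frac{\frac{7n}{10} - F_m}{f_m} \times C = 7.2 + \frac{\frac{7 \times 100}{10} - 40}{36} \times 1.30 = 7.2 + \frac{70 - 40}{36} \times 1.3 = 8.28$
80th percentile: $P_{80} = L + \frac{\frac{80n}{100} - F_m}{f_m} \times C = 8.5 + \frac{\frac{80 \times 100}{100} - 76}{15} \times 1.30 = 8.5 + \frac{4}{15} \times 1.3 = 8.85$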
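The median, quartile, decile and percentile formulas share one pattern, which the sketch below captures in a single hypothetical helper, `grouped_quantile(k, parts, ...)`, returning the value below which k/parts of the observations lie. The first three class frequencies (2, 9, 10) are an assumption consistent with the cumulative frequency 40 used above.

```python
def grouped_quantile(k, parts, boundaries, frequencies):
    """Quantile of a grouped distribution: L + (k*n/parts - F) / f * C.

    L, f and C are the lower limit, frequency and width of the class that
    contains the quantile; F is the cumulative frequency before that class.
    """
    n = sum(frequencies)
    target = k * n / parts
    cumulative = 0  # F: cumulative frequency of the preceding classes
    for i, f in enumerate(frequencies):
        if cumulative + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k must not exceed parts")

boundaries = [2.0, 3.3, 4.6, 5.9, 7.2, 8.5, 9.8, 11.1, 12.4]
frequencies = [2, 9, 10, 19, 36, 15, 7, 2]  # first three values assumed

median = grouped_quantile(1, 2, boundaries, frequencies)
q3 = grouped_quantile(3, 4, boundaries, frequencies)
d7 = grouped_quantile(7, 10, boundaries, frequencies)
p80 = grouped_quantile(80, 100, boundaries, frequencies)

print(f"Me = {median:.2f}, Q3 = {q3:.2f}, D7 = {d7:.2f}, P80 = {p80:.2f}")
```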
Mode: $Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C = 7.2 + \frac{36 - 19}{(36 - 19) + (36 - 15)} \times 1.30 = 7.2 + \frac{17}{17 + 21} \times 1.3 = 7.2 + \frac{17}{38} \times 1.3 = 7.78$
where
$L$ = lower limit of the modal class (the class which corresponds to the maximum frequency);
$\Delta_1$ = the difference between the frequency of the modal class and that of the pre-modal class;
$\Delta_2$ = the difference between the frequency of the modal class and that of the post-modal class;
$C$ = width of the modal class.
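A sketch of the mode formula in Python follows. The helper name is hypothetical, a missing neighbouring class is treated as having frequency 0 (an assumption that only matters when the modal class is the first or last one), and the first three frequencies (2, 9, 10) are assumed as in the other examples; they do not affect the result here.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of a grouped distribution: L + d1 / (d1 + d2) * C."""
    # Modal class = class with the largest frequency (first one if tied).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    # Assumption: a class with no neighbour is compared against 0.
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - f_prev  # modal minus pre-modal frequency
    d2 = frequencies[i] - f_next  # modal minus post-modal frequency
    width = boundaries[i + 1] - boundaries[i]
    return boundaries[i] + d1 / (d1 + d2) * width

boundaries = [2.0, 3.3, 4.6, 5.9, 7.2, 8.5, 9.8, 11.1, 12.4]
frequencies = [2, 9, 10, 19, 36, 15, 7, 2]  # first three values assumed

mode = grouped_mode(boundaries, frequencies)
print(f"Mo = {mode:.2f}")
```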
Graphical location of the median, 3rd quartile, 7th decile and 80th percentile:
[Figure: less-than ogive (x-axis: Upper limit of the class boundary, 2.0 to 12.4; y-axis: Cumulative frequency)]
Values read from the graph: Median = 7.6, Q3 = 8.45, D7 = 8.20, P80 = 8.90.
d) Mode:
[Figure: histogram (x-axis: Class boundary, 2.0 to 12.4; y-axis: Frequency)]
Value read from the graph: Mode = 7.8.
Criteria | AM | GM | HM | Me | Mo
Definition | Rigidly defined | Rigidly defined | Rigidly defined | Rigidly defined | Not rigidly defined
Data restriction | No restriction | Values must be nonzero & positive | Values must be nonzero | No restriction | No restriction
Computation | Easy | Slightly difficult | Slightly difficult | Easy | Easy
Based upon all observations | Yes | Yes | Yes | No | No
Effect of extreme values | Less affected | Not affected | Less affected | Not affected | Not affected
Sampling fluctuation | Little | Little | Little | Much | Much
Graphical location | Not possible | Not possible | Not possible | Possible | Possible
Further algebraic treatment | Possible | Possible | Not possible | Not possible | Not possible
Advantages | Disadvantages
Easy to calculate and easy to understand | As compared with mean, affected to a great extent by sampling fluctuation
Not affected by extreme values | Not capable of further mathematical calculation
Include open-ended class interval | -
--- x --