BASIC STATISTICS (Definitions)

The document provides an overview of basic statistics concepts, including definitions of information, data, primary and secondary data, variables, and various types of data handling. It explains measures of central tendency such as mean, median, and mode, as well as measures of dispersion like range, variance, and standard deviation. Additionally, it covers methods for presenting data, including tabulation, frequency distribution, and graphical representations like histograms and frequency polygons.

Uploaded by

sabasalman32164

BASIC STATISTICS

INFORMATION:
Knowledge about something is called ‘Information’.
INFORMATION HANDLING:
Presenting information in a manageable way, so that useful conclusions can be drawn, is called
‘Information handling’.
DATA:
The numerical figures obtained from any field of study are known as ‘Data’.
Data can be obtained from existing sources such as office records and published papers, or collected
directly from the field as needed.
PRIMARY DATA:
The data directly collected from its source is called ‘Primary data’.
SECONDARY DATA:
Data that have passed through some statistical treatment at least once are called
‘Secondary data’, e.g. raw data, once put in some order, become secondary data.
CONSTANT:
Any quantity that has a single value is called a ‘Constant’.
VARIABLE:
Any characteristic whose value can differ from one individual to another is called a
‘Variable’.
DISCRETE VARIABLE:
It can take only specific, separate values. It is typically a whole number obtained by counting and
cannot be a fraction.
CONTINUOUS VARIABLE:
It can take every possible value in a given interval (say a to b). It can be a whole figure or a fraction.
UNGROUPED DATA:
Numerical figures which are obtained first-hand and recorded as they stand are known as
‘Ungrouped data’.
TABULATION:
‘Tabulation’ of the data means to present the data in a classified form or into rows and columns.
CLASSIFICATION:
‘Classification’ is a process of arranging the data into certain groups or classes of similar characteristic.
The main function of classification is to condense a large number of observations into easy and
understandable form.
FREQUENCY DISTRIBUTION:
A ‘Frequency distribution’ is a tabular arrangement that classifies data into different groups and
shows, against each group, the number of observations falling in it.
GROUPED DATA:
The data in the form of ‘Frequency distribution’ is called ‘Grouped data’.
CLASS LIMITS:
Each class or group is defined by two values, one small and one large, called the ‘Class limits’.
The smaller one is called ‘Lower class limit’ and the larger one is called ‘Upper class limit’.
SIZE OF CLASS INTERVAL:
The ‘size’ or ‘width’ of a class interval is defined as the difference between two consecutive lower
limits or two consecutive upper limits of a group. It is denoted by ℎ.
\[ h = \frac{\text{Range}}{\text{No. of groups}} = \frac{x_{\max} - x_{\min}}{k} \]
CLASS FREQUENCY:
The number of occurrences of items corresponding to a class interval is called ‘Class frequency’.
CLASS MARK:
‘Class mark’ is defined as the midpoint or average of a class.
It is obtained by dividing sum of lower and upper limits of a class by 2.
CLASS BOUNDARIES:
The real class limits of a class or group are called ‘Class boundaries’.

Class boundaries can be obtained from midpoints as (𝑥 ± ℎ⁄2).

CUMULATIVE FREQUENCY:
The total of frequency up to an upper class limit or boundary is called ‘Cumulative frequency’.
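The definitions above can be tied together in a short Python sketch. The marks, the class size ℎ = 10 and the starting limit 10 are hypothetical values chosen only for illustration; they are not part of the original notes.

```python
# Hypothetical marks of 20 students (illustrative data only).
marks = [12, 15, 21, 24, 25, 28, 31, 33, 34, 35,
         36, 38, 41, 42, 44, 45, 47, 52, 55, 58]

h = 10                               # assumed size of class interval
lower_limits = range(10, 60, h)      # lower class limits 10, 20, ..., 50

table = []
cumulative = 0
for lower in lower_limits:
    upper = lower + h - 1                          # upper class limit, e.g. 10-19
    f = sum(lower <= x <= upper for x in marks)    # class frequency
    cumulative += f                                # cumulative frequency
    mark = (lower + upper) / 2                     # class mark (midpoint)
    table.append({
        "limits": (lower, upper),
        "mark": mark,
        "boundaries": (mark - h / 2, mark + h / 2),  # x ± h/2
        "f": f,
        "cf": cumulative,
    })

for row in table:
    print(row)
```

Each row holds the class limits, class mark, class boundaries, class frequency and cumulative frequency of one group, so the list as a whole is the grouped form of the ungrouped marks.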
HISTOGRAM:
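A ‘Histogram’ is a graph of adjacent rectangles.
In a histogram, the width of each rectangle corresponds to the size of the class interval and, when the class
intervals are equal, the height corresponds to the class frequency. When the class intervals are unequal, the
heights are adjusted (frequency divided by class size) so that the area of each rectangle is proportional to
its class frequency.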
FREQUENCY POLYGON:
A ‘Frequency polygon’ is a many sided closed figure in which class marks are taken along 𝑥 − axis and
frequencies along 𝑦 − axis.
CUMULATIVE FREQUENCY DISTRIBUTION:
A table showing cumulative frequencies against upper class boundaries is called ‘Cumulative frequency
distribution’. It is also called a ‘Less than cumulative frequency distribution’.
CUMULATIVE FREQUENCY POLYGON/OGIVE:
The graph of a Less than cumulative frequency distribution is called ‘Cumulative frequency polygon or
Ogive’ in which cumulative frequencies are plotted against upper class boundaries.
CENTRAL TENDENCY:
‘Central tendency’ is more or less a central value around which the data appear to be crowded.
It is a single representative value which shows tendency or behavior of the distribution of the variable under
study.
MEASURES OF CENTRAL TENDENCY:
The measures or techniques that are used to determine the central value are called ‘Measures of
central tendency’.
The following measures of central tendency will be discussed.
1. Arithmetic mean 2. Median 3. Mode
4. Geometric mean 5. Harmonic mean 6. Quartiles
ARITHMETIC MEAN:
The ‘Arithmetic mean’, or simply the ‘Mean’, is a single value obtained by dividing the sum of all
observations by their total number. It is denoted by 𝑋̅.
FORMULAE OF ARITHMETIC MEAN:
1. DIRECT FORMULA FOR UNGROUPED DATA:
\[ \bar{X} = \frac{\sum x}{n} = \frac{\text{Sum of all observations}}{\text{No. of observations}} \]
2. SHORT FORMULA FOR UNGROUPED DATA:
‘Deviation’ is defined as the difference between an observed value and a constant:
𝑫 = 𝒙 − 𝑨, where 𝐴 is a constant called the ‘Assumed or Provisional mean’.
\[ \bar{X} = A + \frac{\sum D}{n} \]
3. CODING FORMULA FOR UNGROUPED DATA:
\[ \bar{X} = A + \frac{\sum U}{n} \times h \]
Where 𝑈 = (𝑥 − 𝐴)/ℎ, 𝐴 is the assumed mean and ℎ is a constant, usually the common factor of the
deviations 𝑥 − 𝐴.
4. DIRECT FORMULA FOR GROUPED DATA:
\[ \bar{X} = \frac{\sum f x}{\sum f} \]
5. SHORT FORMULA FOR GROUPED DATA:
\[ \bar{X} = A + \frac{\sum f D}{\sum f} \]
6. CODING FORMULA FOR GROUPED DATA:
\[ \bar{X} = A + \frac{\sum f U}{\sum f} \times h \]
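As a quick check that the direct, short and coding formulae agree, here is a minimal Python sketch for grouped data. The class marks, frequencies, assumed mean 𝐴 = 34.5 and ℎ = 10 are hypothetical values chosen for illustration.

```python
from math import isclose

# Hypothetical grouped data: class marks x and class frequencies f.
x = [14.5, 24.5, 34.5, 44.5, 54.5]
f = [2, 4, 6, 5, 3]
n = sum(f)                      # Σf

A = 34.5                        # assumed (provisional) mean
h = 10                          # class interval size used for coding

# Direct formula: Σfx / Σf
mean_direct = sum(fi * xi for xi, fi in zip(x, f)) / n

# Short formula: A + ΣfD / Σf, with D = x - A
D = [xi - A for xi in x]
mean_short = A + sum(fi * di for di, fi in zip(D, f)) / n

# Coding formula: A + (ΣfU / Σf) * h, with U = (x - A) / h
U = [(xi - A) / h for xi in x]
mean_coded = A + sum(fi * ui for ui, fi in zip(U, f)) / n * h

print(mean_direct, mean_short, mean_coded)
assert isclose(mean_direct, mean_short) and isclose(mean_direct, mean_coded)
```

Setting every frequency to 1 reduces the three grouped formulae to the corresponding ungrouped ones.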
MEDIAN:
The middle most observation in an arranged data set is called ‘Median’. It divides the data into two
equal parts, i.e. 50% data lies before median and 50% after it. It is denoted by 𝑋̃.
FORMULAE FOR MEDIAN:
1. MEDIAN FOR UNGROUPED DATA:
CASE 1: When number of observations ‘n’ is odd.
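\[ \tilde{X} = \left(\frac{n+1}{2}\right)\text{th observation} \]
CASE 2: When number of observations ‘n’ is even.
\[ \tilde{X} = \frac{1}{2}\left[\frac{n}{2}\text{th observation} + \left(\frac{n+2}{2}\right)\text{th observation}\right] \]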
2. MEDIAN FOR GROUPED DATA:
\[ \tilde{X} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) \]
Where, 𝑙 : lower class boundary of median class.
ℎ : class interval size of median class.
𝑓 : frequency of the median class.
𝑐 : cumulative frequency of the class preceding the median class.
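The median formulae can be sketched in Python as follows. The function names and the sample data are hypothetical; the grouped version assumes equal class intervals of size ℎ and takes lower class boundaries as input.

```python
def median_ungrouped(values):
    """Median of raw data using the odd / even cases."""
    s = sorted(values)                       # data must be arranged first
    n = len(s)
    if n % 2 == 1:                           # CASE 1: n is odd
        return s[(n + 1) // 2 - 1]           # (n+1)/2 th observation
    # CASE 2: n is even, average of n/2 th and (n+2)/2 th observations
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2


def median_grouped(lower_boundaries, freqs, h):
    """Median of grouped data: l + (h / f) * (n/2 - c)."""
    n = sum(freqs)
    c = 0                                    # cumulative frequency before current class
    for l, f in zip(lower_boundaries, freqs):
        if f > 0 and c + f >= n / 2:         # first class whose cf reaches n/2
            return l + (h / f) * (n / 2 - c)
        c += f
    raise ValueError("empty frequency table")


# Hypothetical data for illustration.
odd_median = median_ungrouped([7, 3, 9, 1, 5])
even_median = median_ungrouped([7, 3, 9, 1, 5, 11])
grouped_median = median_grouped([9.5, 19.5, 29.5, 39.5, 49.5], [2, 4, 6, 5, 3], 10)
print(odd_median, even_median, grouped_median)
```

In the grouped example 𝑛/2 = 10, so the median class is the third one (cumulative frequencies 2, 6, 12), giving 𝑙 = 29.5, 𝑓 = 6 and 𝑐 = 6.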
QUARTILES:
The values which divide an arranged data set into four equal parts are called ‘Quartiles’.
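FIRST QUARTILE is \[ Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) \]
THIRD QUARTILE is \[ Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) \]
Where 𝑙, ℎ, 𝑓 and 𝑐 are defined as for the median, but refer to the class containing the respective quartile.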
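A minimal Python sketch of the grouped quartile formulae follows. The function name `quartile_grouped` and the frequency table are hypothetical, and equal class intervals of size ℎ are assumed.

```python
def quartile_grouped(lower_boundaries, freqs, h, k):
    """k-th quartile (k = 1, 2, 3) of grouped data: l + (h / f) * (k*n/4 - c)."""
    n = sum(freqs)
    target = k * n / 4                       # n/4 for Q1, 3n/4 for Q3
    c = 0                                    # cumulative frequency before current class
    for l, f in zip(lower_boundaries, freqs):
        if f > 0 and c + f >= target:        # class containing the quartile
            return l + (h / f) * (target - c)
        c += f
    raise ValueError("empty frequency table")


# Hypothetical frequency table for illustration.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]   # lower class boundaries
freqs = [2, 4, 6, 5, 3]

q1 = quartile_grouped(boundaries, freqs, 10, 1)
q3 = quartile_grouped(boundaries, freqs, 10, 3)
print(q1, q3)
```

Calling the same function with `k = 2` reproduces the grouped median, since the second quartile divides the data into two equal parts.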

MODE:
The most frequently occurring observation in a data set is called ‘Mode’.
1. MODE FOR UNGROUPED DATA:
Mode = the most frequent observation
2. MODE FOR GROUPED DATA:
\[ \hat{X} = l + \left[\frac{f_m - f_1}{2f_m - f_1 - f_2}\right] \times h \]
Where, 𝑙 : lower class boundary of modal class.
ℎ : class interval size of modal class.
𝑓𝑚 : frequency of modal class or maximum frequency.
𝑓1 : frequency of the class preceding the modal class.
𝑓2 : frequency of the class succeeding the modal class.
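Both mode formulae can be illustrated with a short Python sketch. The function names and data are hypothetical. The grouped version treats a missing neighbouring class as having frequency 0, which is an assumption of this sketch rather than something stated in the notes.

```python
from collections import Counter


def mode_ungrouped(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


def mode_grouped(lower_boundaries, freqs, h):
    """Mode of grouped data: l + (fm - f1) / (2*fm - f1 - f2) * h."""
    m = freqs.index(max(freqs))                      # position of the modal class
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                # class preceding the modal class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0   # class succeeding the modal class
    # Note: the denominator is zero when fm = f1 = f2 (no distinct modal class).
    return lower_boundaries[m] + (fm - f1) / (2 * fm - f1 - f2) * h


# Hypothetical data for illustration.
raw_mode = mode_ungrouped([2, 3, 3, 5, 7, 3, 5])
grouped_mode = mode_grouped([9.5, 19.5, 29.5, 39.5, 49.5], [2, 4, 6, 5, 3], 10)
print(raw_mode, grouped_mode)
```

`mode_ungrouped` returns a list because a data set can have more than one mode.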
The empirical relation between Mean, Median and Mode, which holds approximately for moderately skewed distributions, is 𝐌𝐨𝐝𝐞 = 𝟑𝐌𝐞𝐝𝐢𝐚𝐧 − 𝟐𝐌𝐞𝐚𝐧
GEOMETRIC MEAN:
The positive nth root of the product of ‘n’ positive observations is called ‘Geometric mean’.
1. Basic formula of Geometric mean for Ungrouped data:

\[ G.M. = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} \]
2. Logarithmic formula of Geometric mean for Ungrouped data:
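\[ G.M. = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} \Rightarrow \log G.M. = \log (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} \]
\[ \log G.M. = \frac{1}{n} \log (x_1 \cdot x_2 \cdot x_3 \cdots x_n) \Rightarrow \log G.M. = \frac{1}{n} (\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n) \]
\[ \log G.M. = \frac{1}{n} \sum \log x \Rightarrow G.M. = \text{antilog}\left(\frac{\sum \log x}{n}\right) \]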
3. Logarithmic formula of Geometric mean for Grouped data:
\[ G.M. = \text{antilog}\left(\frac{\sum f \log x}{\sum f}\right) \]
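The basic and logarithmic formulae give the same result, as the following Python sketch suggests. The observations and frequencies are hypothetical, and base-10 logarithms are assumed, so the antilog is a power of 10.

```python
from math import prod, log10, isclose

# Hypothetical positive observations.
x = [2, 4, 8]
n = len(x)

# Basic formula: nth root of the product of the observations.
gm_root = prod(x) ** (1 / n)

# Logarithmic formula: antilog(Σ log x / n), using base-10 logs.
gm_log = 10 ** (sum(log10(v) for v in x) / n)

# Grouped data: antilog(Σ f log x / Σ f), with hypothetical frequencies f.
f = [1, 2, 1]
gm_grouped = 10 ** (sum(fi * log10(xi) for xi, fi in zip(x, f)) / sum(f))

print(gm_root, gm_log, gm_grouped)
assert isclose(gm_root, gm_log)
```

The logarithmic form is useful for hand calculation and also avoids very large products when there are many observations.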
HARMONIC MEAN:
The reciprocal of the arithmetic mean of the reciprocals of the observations is called ‘Harmonic
mean’.
1. Harmonic mean for Ungrouped data:
\[ H.M. = \frac{n}{\sum (1/x)} \]
2. Harmonic mean for Grouped data:
\[ H.M. = \frac{\sum f}{\sum (f/x)} \]
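A minimal Python sketch of both harmonic mean formulae is given below. The observations and frequencies are hypothetical, and all values are assumed to be non-zero.

```python
from math import isclose

# Ungrouped data: n / Σ(1/x), with hypothetical non-zero observations.
x = [2, 4, 4]
hm_ungrouped = len(x) / sum(1 / v for v in x)

# Grouped data: Σf / Σ(f/x), the same observations written as a frequency table.
values = [2, 4]
f = [1, 2]
hm_grouped = sum(f) / sum(fi / vi for vi, fi in zip(values, f))

print(hm_ungrouped, hm_grouped)
assert isclose(hm_ungrouped, hm_grouped)
```

Both calls describe the same three observations, so the two formulae return the same value.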
PROPERTIES OF ARITHMETIC MEAN:
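1. The mean of identical observations, each equal to a constant ‘k’, is the constant ‘k’ itself.
2. Mean is affected by change in origin: adding a constant to every observation adds the same constant to the mean.
3. Mean is affected by change in scale: multiplying every observation by a constant multiplies the mean by that constant.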
4. Sum of deviations of observations from arithmetic mean is always zero.
Σ(𝑥 − 𝑥̅ ) = 0 (Ungrouped data)
Σf(𝑥 − 𝑥̅ ) = 0 (Grouped data)
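These properties can be verified numerically. The sketch below uses hypothetical observations, with a shift of 10 for the change of origin and a factor of 2 for the change of scale.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


x = [4, 8, 15, 16, 23, 42]          # hypothetical observations
x_bar = mean(x)

same = mean([7, 7, 7, 7])           # property 1: mean of a constant is the constant
shifted = mean([v + 10 for v in x]) # property 2: change of origin
scaled = mean([2 * v for v in x])   # property 3: change of scale
dev_sum = sum(v - x_bar for v in x) # property 4: Σ(x - x̄)

print(x_bar, same, shifted, scaled, dev_sum)
```

With floating-point data the sum of deviations may come out as a very small non-zero number because of rounding, so it is safer to compare it with zero using a tolerance.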
WEIGHTED ARITHMETIC MEAN:
The relative importance of a number is called its weight.
When all the observations 𝑥1, 𝑥2 , 𝑥3 … … , 𝑥𝑛 are not equally important, certain weights 𝑤1 , 𝑤2 , 𝑤3 … … , 𝑤𝑛 are
associated with them depending on the importance or significance.
So, the ‘Weighted Arithmetic mean’ is defined as
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum wx}{\sum w} \]
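For instance, the weighted mean of three marks with unequal weights can be computed as in this Python sketch. The marks and weights are hypothetical.

```python
from math import isclose

# Hypothetical marks in three subjects and their relative importance.
x = [70, 80, 90]
w = [1, 2, 3]

# Weighted arithmetic mean: Σwx / Σw
weighted_mean = sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

# Ordinary arithmetic mean for comparison (all weights equal).
simple_mean = sum(x) / len(x)

print(weighted_mean, simple_mean)
```

The weighted mean is pulled towards 90 because that observation carries the largest weight.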
MOVING AVERAGES:
‘Moving averages’ are defined as the successive arithmetic means which are computed over a fixed number
of consecutive days/months/years etc., moving forward one period at a time.
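A minimal Python sketch of a 3-period moving average follows. The function name `moving_averages` and the monthly figures are hypothetical.

```python
def moving_averages(values, k):
    """Arithmetic means of every run of k consecutive values."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


# Hypothetical monthly figures.
monthly = [10, 20, 30, 40, 50]
three_month = moving_averages(monthly, 3)
print(three_month)
```

Each mean drops the oldest value and takes in the next one, so 𝑛 values give 𝑛 − 𝑘 + 1 moving averages of length 𝑘.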
DISPERSION:
Statistically, ‘Dispersion’ means the spread or scatter of observations in a data set.
The purpose of finding Dispersion is to study the behavior of each unit of population around the average
value. It also helps in comparison of two sets of data in more detail.
The spread or scatter in a data set can be seen in two ways:
1. The spread of observations between two extreme observations in a data set.
2. The spread of observations around an average value, say their arithmetic mean.
MEASURES OF DISPERSION:
The measures that are used to determine the degree or extent of variation in a data set are called
‘Measures of dispersion’.
The following measures of dispersion will be discussed:
1. Range 2. Variance 3. Standard Deviation
Dispersion is not affected by change in origin but it is affected by change in scale.
RANGE:
The extent of variation between two extreme observations in a data set is called ‘Range’.
1. Range for Ungrouped data:
Range = 𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛 = 𝑥𝑚 − 𝑥0
2. Range for Grouped data:
Range = (Upper class boundary of last class) − (Lower class boundary of first class)
OR
Range = Maximum midpoint − Minimum midpoint
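The three range formulae can be compared on one data set. The raw marks and the class layout (classes 10-19 to 50-59) in this Python sketch are hypothetical.

```python
# Hypothetical raw marks (ungrouped data).
marks = [12, 15, 21, 24, 25, 28, 31, 33, 34, 35,
         36, 38, 41, 42, 44, 45, 47, 52, 55, 58]
range_ungrouped = max(marks) - min(marks)          # x_max - x_min

# The same marks grouped into classes 10-19, 20-29, ..., 50-59.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
upper_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5]
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5]

range_boundaries = upper_boundaries[-1] - lower_boundaries[0]
range_midpoints = max(midpoints) - min(midpoints)

print(range_ungrouped, range_boundaries, range_midpoints)
```

The two grouped formulae generally give different values, and neither has to equal the range of the raw data, because grouping hides the exact extreme observations.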
VARIANCE:
The mean of squared deviations of observations from their arithmetic mean is called ‘Variance’.
1. Proper mean or Definitional formula for Ungrouped data:
\[ S^2 = \frac{\sum (x - \bar{x})^2}{n} \]
2. Direct or Computational formula for Ungrouped data:
\[ S^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 \]

3. Proper mean or Definitional formula for Grouped data:


\[ S^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} \]
4. Direct or Computational formula for Grouped data:
\[ S^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2 \]
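The definitional and computational formulae should agree. The following Python sketch checks this for hypothetical ungrouped and grouped data. Note that these formulae divide by 𝑛 (or Σf), not by 𝑛 − 1.

```python
from math import isclose

# --- Ungrouped data (hypothetical observations) ---
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
x_bar = sum(x) / n

var_def = sum((v - x_bar) ** 2 for v in x) / n            # Σ(x - x̄)² / n
var_comp = sum(v ** 2 for v in x) / n - (sum(x) / n) ** 2  # Σx²/n - (Σx/n)²

# --- Grouped data (hypothetical class marks and frequencies) ---
marks = [14.5, 24.5, 34.5, 44.5, 54.5]
f = [2, 4, 6, 5, 3]
N = sum(f)
g_bar = sum(fi * xi for xi, fi in zip(marks, f)) / N

gvar_def = sum(fi * (xi - g_bar) ** 2 for xi, fi in zip(marks, f)) / N
gvar_comp = (sum(fi * xi ** 2 for xi, fi in zip(marks, f)) / N
             - (sum(fi * xi for xi, fi in zip(marks, f)) / N) ** 2)

print(var_def, var_comp, gvar_def, gvar_comp)
assert isclose(var_def, var_comp) and isclose(gvar_def, gvar_comp)
```

The computational form needs only Σ𝑥 and Σ𝑥², which is why it is convenient for hand calculation.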
STANDARD DEVIATION:
The positive square root of mean of squared deviations of observations from their arithmetic mean is
called ‘Standard deviation’.
1. Proper mean or Definitional formula for Ungrouped data:

\[ S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]

2. Direct or Computational formula for Ungrouped data:

\[ S = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} \]

3. Proper mean or Definitional formula for Grouped data:

\[ S = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} \]

4. Direct or Computational formula for Grouped data:

\[ S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} \]
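As a closing illustration, the Python sketch below computes the standard deviation with the definitional formula for ungrouped data. It also checks the earlier statement that dispersion is unaffected by a change of origin but affected by a change of scale. The function name `std_dev` and the data are hypothetical.

```python
from math import sqrt, isclose


def std_dev(values):
    """Standard deviation: positive square root of Σ(x - x̄)² / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((v - x_bar) ** 2 for v in values) / n)


x = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical observations

s = std_dev(x)
s_shifted = std_dev([v + 100 for v in x])    # change of origin
s_scaled = std_dev([3 * v for v in x])       # change of scale

print(s, s_shifted, s_scaled)
```

Adding 100 to every observation leaves the standard deviation unchanged, while multiplying every observation by 3 multiplies it by 3.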
