1 Descriptive Part

Descriptive probability

Uploaded by

hafizyt2014
Copyright
© All Rights Reserved

Chapter One

Introduction

Definition and classification of statistics


Definition of statistics
The word statistics means different things to different people, depending on how they use it, but all of its meanings can
be grouped into two definitions.
Statistics in Plural sense
✓ It refers to any information about any activity expressed in numbers.
Statistics in singular sense
✓ When statistics is used in its singular sense, it has its modern meaning: it is defined as a branch of
mathematics or applied research concerned with the development and application of methods
and techniques for collecting, organizing, presenting, analyzing, and interpreting quantitative data
in such a way that the reliability of conclusions based on the data may be evaluated objectively in terms
of probability statements. This meaning of statistics refers to the study of statistics as a science.
✓ It is the science that deals with methods of collecting, organizing, analyzing, and interpreting data.
Classification of Statistics
Statistics can be divided into two broad categories:
Descriptive statistics:
✓ It is a branch of statistics that deals with methods or procedures used to organize and summarize
masses of numerical data into a meaningful form using statistical techniques such as tables,
charts, graphs, and averages.
✓ This part of statistics comprises the first four steps of a statistical investigation: collection, organization,
presentation, and analysis of numerical data.
Inferential statistics:
✓ It is a branch of statistics concerned with interpreting data and drawing conclusions.
✓ It is the last step in a statistical investigation and is concerned with drawing conclusions about the
population (the source of the data) from a sample.
✓ It can be defined as the science of using probability to make decisions.

THE NATURE OF THIS DISCIPLINE

(Figure: Descriptive Statistics, Probability, Inferential Statistics)

Applications, Uses, and Limitations of Statistics

Uses of statistics
Statistics is used in almost all fields of human activity and is used by government bodies, private business firms,
and research agencies as a major tool. Some of the uses are:
✓ It helps in formulating and testing hypotheses and in developing new theories.
✓ It condenses and summarizes complex data.
✓ It helps to predict future trends.
Application areas of statistics
✓ In research work: statistics is indispensable in research work
✓ In engineering and the physical sciences
✓ In economics and the biological sciences
✓ In the social sciences, politics, etc.
Limitation of statistics
Although statistical methods are very useful, there are also many potential errors and limitations in carrying
out and interpreting statistical studies.
✓ Complete accuracy in statistics is often impossible.
✓ It does not deal with a single value; it deals with a set of data.
✓ It cannot deal directly with qualitative data; it deals only with data that can be quantified. Ex: it does not deal
with marital status (married, single) itself, but with the number of married and the number of single people.
✓ Statistical results are true only on average. The conclusions drawn from the analysis of a sample may
differ from those that would be drawn from the entire population. For this reason,
statistics is not an exact science.

Some Basic Terminologies in Statistics

Population:
✓ It is the totality of things, objects, people, etc., with which the researcher is concerned.
✓ It can be qualitative or quantitative, finite or infinite.
Sample:
✓ It is a portion or part of the population of interest.
Parameter:
✓ It is a numerical characteristic of an entire population (denoted by Greek letters, e.g., $\mu$).
Statistic:
✓ It is a numerical characteristic of a sample (denoted by Latin letters, e.g., $\bar{x}$).
Variable:
✓ It is a characteristic that differs from object to object.
Examples: Weight, stock prices, height, price of gasoline
Types of variables
1. Quantitative variables:
✓ They are variables that can be expressed numerically.
✓ They are variables that assume values of the measurable quantity.
✓ It can be classified as:
a. Discrete variables:
o They are variables whose values are obtained by counting.
o The possible values for such variables are 0, 1, 2, …. Ex: number of children in a family,
number of trees in a forest.
b. Continuous variables:
o They are variables that can take any value between two numbers.
o Their values are obtained by measuring. Ex: weight, height, rainfall records.
2. Qualitative variables:
✓ They are variables that cannot be expressed numerically.
✓ They are also known as categorical variables.
Note:
✓ For a quantitative variable, operations such as addition or averaging make sense; for a qualitative
variable, they do not.
✓ A categorical variable is also known as an attribute, whereas a quantitative variable is often referred
to simply as a variable.
✓ If the variable can assume only one value, it is called a constant.
✓ In general, measurements give rise to continuous data, while enumerations, or counts, give rise to
discrete data.
Data:
✓ It is the set of values collected for the variable for each of the elements of a population or sample
✓ Data are a numerical representation of a phenomenon.
✓ It is information expressed in quantitative form.

Types of Data

➢ Depending on the level (scale) of measurement

1. Nominal data - Categorical data where the categories are not ordered (e.g., ethnic group): data
classified into categories that cannot be arranged in any particular order.
2. Ordinal data - Categorical data that can be ordered, but the increment between specific values is
arbitrary: data arranged in some order, where the differences between data values cannot be determined or
are meaningless.
3. Cardinal data - Data on a scale where addition is meaningful (e.g., a change of 3 inches in height).
There are two types of cardinal data:
a) Ratio-scale data - Cardinal data on a scale where ratios between values are meaningful (e.g.,
serum-cholesterol levels).
b) Interval-scale data - Cardinal data where the zero point is arbitrary. For such data, ratios are
not meaningful (e.g., Julian dates: we can calculate the number of days between two dates, but we
can't say that one date is twice as large as another).

Note:
✓ For ratio-scale data, the origin (i.e., the value zero) is meaningful, but for interval-scale data it has no meaning.
Consequently, interval-scale values can be added and subtracted but not meaningfully multiplied or divided,
whereas ratio-scale data allow all four operations (addition, subtraction, multiplication, and division).
✓ Nominal and ordinal scales belong to qualitative variables, whereas interval and ratio scales are
quantitative.

➢ Depending on time reference


1. Cross-sectional data:
✓ Data collected at one point in time.
✓ This is data collected at the same (one particular) point in time on different elements. Such data are
snapshots that show how things are at one particular time, e.g., sales made at the same point in time
but at different market places.
2. Time series (longitudinal) data:
✓ Data collected over a period of time.
✓ This is data collected at several points in time from the same study objects or units.
3. Panel data: a combination of these two, in which the same units are observed at several points in time.
➢ Depending on the source of data
1. Primary data: the data that are collected for the first time for the problem under consideration.
2. Secondary data:
• The data collected previously by others for their own purposes.
• The data which are obtained from archives of organizations, bulletins, journals, or websites.

Basic Steps in Statistical Study


For any statistical study, there are some basic steps to be followed once we draw a sample.
Step 1. Gather first-hand information from the sample; this is called the raw data.
Step 2. Tabular representation of the raw data, i.e., represent the raw data in a table.
Step 3. Pictorial representation of the data, i.e., draw a diagram from the data organized in the table.
Step 4. Numerically summarize the data, i.e., describe the entire data set with a few key numbers.
Step 5. Analyze the data using mathematical formulae.
Step 6. Draw the final inference or conclusion about the population under study.

Chapter Two
Methods of Data Collection and Presentation

Data can be collected in a variety of ways. One of the most common methods is the survey.
Question: What is a survey?
Survey:
✓ It is the process of obtaining data from individuals, directly or indirectly.
✓ It can be conducted through the mail, by telephone, by personal interview, etc.
✓ There are two kinds of survey:
1. Census survey (complete enumeration survey):
• It is a survey that includes every element in the population.
2. Sample survey:
• It is a survey that includes only a subset of the population.
Note:
✓ If your data represent only a portion of the population, you have a sample.
✓ If your data represent the entire population, you have a census.
✓ A sample survey is often better than a census because it reduces cost and effort and allows more detailed
information to be collected.
✓ A census is better than a sample survey when the population is small or when the population is
heterogeneous.

Organizing a Raw Data Set


Once a sample is drawn, we observe the value of the variable (categorical or quantitative) for the sampled objects
or individuals. Each value thus obtained is called an entry, a data point, or simply an observation, and the collection
of all the entries or observations is called a data set, often abbreviated as data. The most convenient method
of organizing raw data is to construct a frequency distribution (frequency table).

Definition: frequency distribution (f.d)
✓ It is the organization of data in table form, using classes and frequencies.
✓ It shows how many observations fall in various categories.
✓ It can be classified as:
A. Categorical (qualitative) f.d
B. Numerical (quantitative) f.d

A. Categorical (qualitative) f.d:


✓ It is used for the data that can be placed in specific categories, such as nominal or ordinal.
✓ Data are classified according to non-numerical categories.

Ex: 25 army inductees were given a blood test to determine their blood type. The data set is

A B AB B O
O O AB B B
B B A O O
A O O O AB
AB A B O A

Construct a frequency distribution for this data.


Solution:
Step 1. Determine the classes: A, B, O, and AB.
Step2. Determine the frequency for each class
Therefore the f.d is as follows
Class A B O AB
Frequency (f) 5 7 9 4
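The counting in Step 2 can be sketched in Python with the standard library's `collections.Counter`. This is only a minimal illustration, assuming the 25 blood types are typed in row by row from the data set above; it is not part of the original procedure.

```python
from collections import Counter

# Blood types of the 25 army inductees, read row by row from the data set.
blood_types = (
    "A B AB B O "
    "O O AB B B "
    "B B A O O "
    "A O O O AB "
    "AB A B O A"
).split()

# Count how many observations fall in each category.
freq = Counter(blood_types)

# Report the classes in the order used in the text.
for cls in ["A", "B", "O", "AB"]:
    print(cls, freq[cls])
```

The printed counts should match the frequency row of the table (5, 7, 9, 4).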

Note:
We can transform the frequency distribution into a relative frequency distribution, percentage frequency
distributions & cumulative frequency distribution.
✓ To transform an f.d into a relative f.d, we use the following formula:
Relative frequency $= \dfrac{f}{n}$, where f = the actual frequency of the class and n = the total frequency.
✓ To transform an f.d into a percentage distribution, we multiply the relative frequency by 100%,
i.e., percentage frequency $= \dfrac{f}{n} \times 100\%$.
✓ To transform an f.d into a cumulative f.d, we first have to define the cumulative f.d.

Definition: The cumulative frequency of a class is the sum of the frequencies of all classes preceding (or succeeding)
that class, including the frequency of that class itself. There are two types of cumulative frequency distributions, namely
the “less than” and “more than” cumulative frequency distributions.
I. The “less than” cumulative frequency distribution (LCF) of a class is obtained by adding the frequency
of the preceding classes including the frequency of that class.
II. The “more than” cumulative frequency distribution (MCF)of a class is obtained by adding the
frequency of the succeeding classes including the frequency of that class.
• From the above example, let us construct all forms of the f.d.
Class  Frequency  Relative frequency  Percentage frequency  Cumulative frequency (LCF)  Cumulative frequency (MCF)
A 5 0.2 20% 5 25
B 7 0.28 28% 12 20
O 9 0.36 36% 21 13
AB 4 0.16 16% 25 4
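The conversions in this table can be sketched in a few lines of Python. This is a minimal illustration assuming the class order A, B, O, AB used above; `itertools.accumulate` produces the running totals for the "less than" and "more than" columns.

```python
from itertools import accumulate

# Frequencies from the blood-type example, in the order A, B, O, AB.
classes = ["A", "B", "O", "AB"]
freqs = [5, 7, 9, 4]
n = sum(freqs)  # total frequency (25)

relative = [f / n for f in freqs]              # relative frequency f / n
percentage = [100 * r for r in relative]       # (f / n) * 100%
lcf = list(accumulate(freqs))                  # "less than": running total from the first class
mcf = list(accumulate(reversed(freqs)))[::-1]  # "more than": running total from the last class

print(f"{'Class':<6}{'f':>4}{'Rel.':>7}{'Pct.':>7}{'LCF':>5}{'MCF':>5}")
for c, f, r, p, lo, mo in zip(classes, freqs, relative, percentage, lcf, mcf):
    print(f"{c:<6}{f:>4}{r:>7.2f}{p:>6.0f}%{lo:>5}{mo:>5}")
```

Because blood type is a nominal variable, the order of the classes is arbitrary, so the LCF and MCF columns depend on the order in which the classes are listed.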

Note: from the above table we can construct the
• relative f.d as follows
Class Relative frequency
A 0.2
B 0.28
O 0.36
AB 0.16
• percentage f.d as follows
Class Percentage frequency
A 20%
B 28%
O 36%
AB 16%
• cumulative f.d as follows
Class Cumulative frequency (LCF) Cumulative frequency (MCF)
A 5 25
B 12 20
O 21 13
AB 25 4

B. Numerical (quantitative) f.d:


✓ It is used to display numerical data type.
✓ Data are classified according to numerical size. This is used to summarize data collected by interval
and ratio level of measurement.
✓ It can be classified as
a) Ungrouped f.d: an f.d in which we count the number of times each value of the variable occurs. It is used
when the range of the data is small.
Ex: The data shown here represent the number of miles per gallon that 30 selected four-wheel-drive sports utility
vehicles obtained in city driving. Construct a frequency distribution.

12 17 12 14 16 18 16 18 12 16
17 15 15 16 12 15 16 16 12 14
15 12 15 15 19 13 16 18 16 14

Solution:
Step 1. Determine the classes (i.e., the classes are 12, 13, 14, 15, 16, 17, 18, and 19).
Step2. Determine the frequency for each class
Therefore the f.d is as follows
Class 12 13 14 15 16 17 18 19
Frequency 6 1 3 6 8 2 3 1
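For an ungrouped f.d of integer data, one class per value from the minimum to the maximum can be generated directly. The sketch below is an illustration under that assumption (integer-valued data); it also lists any value in the range that never occurs, with frequency 0.

```python
from collections import Counter

# Miles per gallon for the 30 sports utility vehicles, read row by row.
mpg = [
    12, 17, 12, 14, 16, 18, 16, 18, 12, 16,
    17, 15, 15, 16, 12, 15, 16, 16, 12, 14,
    15, 12, 15, 15, 19, 13, 16, 18, 16, 14,
]

counts = Counter(mpg)

# One class per integer value from the minimum to the maximum, so a value
# that never occurs would still be listed with frequency 0.
ungrouped_fd = {value: counts[value] for value in range(min(mpg), max(mpg) + 1)}

for value, f in ungrouped_fd.items():
    print(value, f)
```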

b) Grouped f.d: when the range of the data is large, the data must be grouped into classes that are more than one unit
in width, in what is called a grouped (continuous) f.d.
Ex: A machine produces the following numbers of rejects in successive five-minute periods. Construct an f.d.

16 21 26 24 11 17 26 25 13 27
24 26 3 27 23 24 15 22 22 12
22 29 18 22 28 25 7 17 22 28
19 23 23 22 3 19 13 31 23 28
24 9 20 33 30 23 20 8 21 24

Solution:
Step 1. Determine the classes.
For a grouped f.d, a class can be expressed in two ways: by its class limits (CL) or by its class boundaries (CB). To
construct the classes, use the following procedure:

I. Determine the number of classes (K). It can be calculated with Sturges' rule, $K = 1 + 3.322\log_{10} n$, where K = the number
of classes required (if the value is a decimal, round up to the next whole number) and n = the number of observations in the sample.
Alternatively, K can be found from $K = 2.5\sqrt[4]{n}$.
II. Determine the class width (interval, size) (W). It can be calculated as $W = \text{Range}/K$, where Range = max − min (if the value
is a decimal, round up to the next whole number).
III. Select a starting point, the lowest class limit (LCL); this can be the smallest data value. Add the width
to the starting point to get the lower limit of the next class, and keep adding W until
there are K classes.
IV. Subtract one unit from the lower limit of the 2nd class to get the upper limit of the 1st class. Then add the
class width to each upper limit to get all the upper limits.
V. Find the class boundaries by subtracting 0.5 from each lower class limit (LCL) and adding 0.5 to each upper class
limit (UCL).
For the data above, n = 50 gives K = 1 + 3.322 log10 50 ≈ 6.64, rounded up to 7, and W = (33 − 3)/7 ≈ 4.29, rounded up to 5;
starting at 3 gives the lower limits 3, 8, 13, …, 33.
Step2. Determine the frequency for each class
The completed grouped f.d is as follows:
Class limit Class boundary Frequency
3-7 2.5-7.5 3
8-12 7.5-12.5 4
13-17 12.5-17.5 6
18-22 17.5-22.5 13
23-27 22.5-27.5 17
28-32 27.5-32.5 6
33-37 32.5-37.5 1
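Steps I to V and Step 2 can be sketched in Python as follows. This is a minimal illustration that follows the rules above literally and assumes integer data, so "one unit" is 1 and the boundaries sit 0.5 beyond the limits.

```python
import math

# Rejects per five-minute period (50 observations), read row by row.
rejects = [
    16, 21, 26, 24, 11, 17, 26, 25, 13, 27,
    24, 26, 3, 27, 23, 24, 15, 22, 22, 12,
    22, 29, 18, 22, 28, 25, 7, 17, 22, 28,
    19, 23, 23, 22, 3, 19, 13, 31, 23, 28,
    24, 9, 20, 33, 30, 23, 20, 8, 21, 24,
]

n = len(rejects)
k = math.ceil(1 + 3.322 * math.log10(n))          # Step I: number of classes, rounded up
w = math.ceil((max(rejects) - min(rejects)) / k)  # Step II: class width, rounded up

# Steps III and IV: lower limits start at the minimum; each upper limit is
# one unit below the next lower limit (assumes integer data).
lower_limits = [min(rejects) + i * w for i in range(k)]
upper_limits = [lcl + w - 1 for lcl in lower_limits]

# Step V: boundaries are 0.5 below each lower limit and 0.5 above each upper limit.
boundaries = [(lcl - 0.5, ucl + 0.5) for lcl, ucl in zip(lower_limits, upper_limits)]

# Step 2: count the observations that fall inside each pair of class limits.
freqs = [sum(lcl <= x <= ucl for x in rejects)
         for lcl, ucl in zip(lower_limits, upper_limits)]

for (lcl, ucl), (lcb, ucb), f in zip(zip(lower_limits, upper_limits), boundaries, freqs):
    print(f"{lcl}-{ucl}\t{lcb}-{ucb}\t{f}")
```

Because the code applies the rounding rule exactly as stated, it is worth checking that the last class still contains the maximum when Range/K happens to be a whole number.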

Note: consider the following points and table.


✓ Class boundaries (CB) are used to separate the classes so that there are no gaps in the f.d. For example, with class
limits there is a gap between 7 and 8 and between 12 and 13; the boundaries 7.5 and 12.5 close these gaps.
✓ Usually 5 to 15 classes are used. If you use fewer than 5 classes, you risk losing too much information; if you use more
than 15 classes, the data may not be sufficiently summarized.
✓ CL and CB are the same when the diagonal values of consecutive classes are the same, i.e., when the upper limit of one
class equals the lower limit of the next.
✓ When all the classes have the same (uniform) class width (W), W is the difference between the lower class limits
(or the upper class limits) of two consecutive classes.
✓ We can find the midpoint (class mark) ($X_{mi}$) of each class from the frequency distribution. It can be computed as
$X_{mi} = \dfrac{LCL_i + UCL_i}{2}$ or $X_{mi} = \dfrac{LCB_i + UCB_i}{2}$
Class limit Class boundary Frequency Xmi LCF MCF
3-7 2.5-7.5 3 5 3 50
8-12 7.5-12.5 4 10 7 47
13-17 12.5-17.5 6 15 13 43
18-22 17.5-22.5 13 20 26 37
23-27 22.5-27.5 17 25 43 24
28-32 27.5-32.5 6 30 49 7
33-37 32.5-37.5 1 35 50 1
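The midpoint and cumulative columns of this table can be reproduced with a short sketch. It assumes the class limits and frequencies are typed in from the grouped table above; the midpoint uses the sum of the limits, as in the class-mark formula.

```python
from itertools import accumulate

# Class limits and frequencies of the grouped rejects distribution.
limits = [(3, 7), (8, 12), (13, 17), (18, 22), (23, 27), (28, 32), (33, 37)]
freqs = [3, 4, 6, 13, 17, 6, 1]

# Class mark: (LCL + UCL) / 2, which equals (LCB + UCB) / 2.
midpoints = [(lcl + ucl) / 2 for lcl, ucl in limits]
lcf = list(accumulate(freqs))              # "less than" cumulative frequency
mcf = list(accumulate(freqs[::-1]))[::-1]  # "more than" cumulative frequency

for (lcl, ucl), f, m, lo, mo in zip(limits, freqs, midpoints, lcf, mcf):
    print(f"{lcl}-{ucl}\t{f}\t{m:g}\t{lo}\t{mo}")
```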

Pictorial representation of the data


Practically everyone encounters graphs at one time or another.

Definition of graph:
✓ The word graph comes from the Greek word meaning “to draw or write.”
✓ We define a graph as a pictorial representation of a set of data.
✓ Many types of graphs are employed in statistics, depending on the nature of the data involved and the
purpose for which the graph is intended.

The step of pictorial representation comes after the raw data set has been pruned and organized.
The most common and simple forms of pictorial representation of data are
✓ Bar chart
✓ Pie chart
✓ Histogram

Bar chart/bar diagram/bar graph
✓ It is used to display distributions of categorical variables.
✓ One bar per category – height is determined by frequency or relative frequency
✓ Order of categories is arbitrary.
✓ Does NOT let you talk about the shape of a distribution.
Features of a bar chart
✓ Bars can be horizontal or vertical
✓ Bars are of uniform width and uniformly spaced (leave a space between bars to indicate that the categories are distinct).
✓ The length of the bar represents values of the variable being displayed, the frequency of occurrence, or
the percentage of occurrence. The same measurement scale is used for the length of each bar.
✓ The graph is well annotated with title, labels for each bar, & vertical scale or actual value for the length
of each bar.
✓ It can be classified as:
• Simple bar chart
• Component bar chart
• Multiple bar chart

Simple bar chart: used to represent only one variable.

Example: construct a bar chart to show the religious affiliation of the American population.
Religion Number of population(million)
Protestant 79
Roman Catholic 31
Jewish 4
Others 2

Figure of Simple Bar Diagrams (vertical bars for Protestant, Roman Catholic, Jewish, and Others; y-axis: Number of population (million), 0 to 100)

Note:
✓ The above graph shows that each bar has an equal width but unequal length.
✓ The length indicates the population size (in millions).
✓ Its limitation is that a diagram can display only one classification or one category of data.
✓ The simple bars shown in the above figure are drawn vertically, so they are known as vertical bars.
The same bars can also be drawn horizontally, as shown in the figure below.

Figure Horizontal Simple Bar Diagram (horizontal bars for Protestant, Roman Catholic, Jewish, and Others; x-axis: Number of population (million), 0 to 100)
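A horizontal simple bar chart can be sketched without any plotting library by printing one row of symbols per category. This is only an illustration: the function name `text_bar_chart` and the `scale` parameter are hypothetical choices made here, and in practice a plotting library such as matplotlib (`pyplot.bar` or `pyplot.barh`) would normally be used.

```python
def text_bar_chart(data, scale=1.0, symbol="#"):
    """Return one text line per category: label, a horizontal bar, and the value.

    Every bar uses the same scale: value * scale symbols (rounded).
    """
    width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(value * scale)
        lines.append(f"{label:<{width}} | {bar} {value}")
    return lines


religion = {"Protestant": 79, "Roman Catholic": 31, "Jewish": 4, "Others": 2}

# Half a symbol per million people keeps the longest bar at 40 characters.
for line in text_bar_chart(religion, scale=0.5):
    print(line)
```

Using one common scale for all bars is what makes the bar lengths comparable, as required by the features of a bar chart listed above.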

Component Bar Diagram: As the name of this diagram implies, it shows subdivisions of components in a
single bar. When it is desired to show how a total is divided into its components, we use a component bar chart.
In this type of bar chart, different colors are used to identify the components.

Example: display the following yields of farmers in SNNPR using a suitable chart.
CROP/YEAR 1990 1991 1992 1993
PEAS 14 15 26 19
WHEAT 10 15 14 25
MAIZE 2 6 10 3
TOTAL 26 36 50 47

Fig of Component Bar Diagram (one bar per year, 1990 to 1993, each divided into Peas, Wheat, and Maize; y-axis 0 to 60)

Multiple Bars: When two or more interrelated series of data are depicted by a bar diagram, the diagram
is known as a multiple-bar diagram. Suppose we have the birth rates and death rates of several countries. We
can display them with two bars close to each other, one representing the birth rate and the other the death rate.
The figure below shows such a diagram based on hypothetical data.

Example: the following table gives the birth rates and death rates of different countries during 1998.
Country Birth Rate Death Rate
A 33 24
B 16 11
C 20 14
D 40 18

Figure Multiple Bar charts (birth-rate and death-rate bars side by side for countries A to D; y-axis 0 to 60)

Pie chart/pie diagram/circle graph


It is a circle used to display the percentage of the total number of measurements falling into each of the categories.
Since the total angle at the center of a circle is 360 degrees (360°), we convert the relative frequencies into the
corresponding degrees using the formula: Degree of a category or class = relative frequency × 360°.

Example: Draw a pie diagram for the following data on the public-sector outlay of a five-year plan.
Agriculture and rural Development 12.9%
Irrigation etc 12.5%
Energy 27.2%
Industry and minerals 15.4%
Transport communication 15.9%
Social services and others 16.1%

Solution: the angle at the center is given by $\dfrac{\text{percentage outlay}}{100} \times 360^\circ = \text{percentage outlay} \times 3.6^\circ$.

Percentage outlays Angle at the center
Agriculture and rural Development 12.9% 12.9 × 3.6 = 46.44 ≈ 46°
Irrigation etc 12.5% 12.5 × 3.6 = 45°
Energy 27.2% 27.2 × 3.6 = 97.92 ≈ 98°
Industry and minerals 15.4% 15.4 × 3.6 = 55.44 ≈ 56°*
Transport communication 15.9% 15.9 × 3.6 = 57.24 ≈ 57°
Social services and others 16.1% 16.1 × 3.6 = 57.96 ≈ 58°
* 55.44° is rounded up to 56° so that the six rounded angles total 360°.

Figure: Pie diagram of the percentage outlay (Agriculture and rural Development 13%, Irrigation etc 13%, Energy 27%, Industry and minerals 15%, Transport communication 16%, Social services and others 16%)
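The angle calculation can be sketched in Python. This is a minimal illustration assuming the percentage outlays are typed in from the example; the function name `central_angles` is a hypothetical helper, not part of the original.

```python
# Percentage outlays from the five-year-plan example.
outlay = {
    "Agriculture and rural Development": 12.9,
    "Irrigation etc": 12.5,
    "Energy": 27.2,
    "Industry and minerals": 15.4,
    "Transport communication": 15.9,
    "Social services and others": 16.1,
}


def central_angles(percentages):
    """Angle = percentage / 100 * 360 = percentage * 3.6 degrees."""
    return {name: pct / 100 * 360 for name, pct in percentages.items()}


angles = central_angles(outlay)
for name, angle in angles.items():
    print(f"{name:<35}{angle:7.2f}")
print("total", round(sum(angles.values()), 6))
```

The exact angles sum to 360°. Rounding each one to the nearest degree gives 46, 45, 98, 55, 57, and 58, which total 359°; the table above uses 56° for Industry and minerals, which brings the total to 360°.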

HISTOGRAMS AND FREQUENCY POLYGONS


Histograms and frequency polygons are two graphic representations of frequency distributions.
1. A histogram or frequency histogram, consists of a set of rectangles having
a. bases on a horizontal axis (the X axis), with centers at the class marks and lengths
equal to the class interval sizes, and
b. Areas proportional to the class frequencies.
2. A frequency polygon
✓ is a line graph of the class frequencies plotted against the class marks. It can be obtained by
connecting the midpoints of the tops of the rectangles in the histogram.
• The first end point is joined to the x-axis at a point of zero frequency just before the first class
interval, and the last end point is joined to the x-axis at a point of zero frequency just after the last class
interval.

Ogive curves
So far we have discussed graphic devices that show frequencies as they are given to us, i.e., non-cumulative
frequencies. We now take up another type of graph, based on cumulative frequencies: a
graph that represents the cumulative frequencies for the classes in an f.d.

The cumulative frequency curve (ogive)


The cumulative frequency curve (or ogive) is the graphic representation of a cumulative frequency distribution.
There are two types of ogives. These are
I) Less than ogive
II) Greater than ogive.
I) Less than ogive
The less than cumulative frequencies are plotted against upper boundaries of their respective class intervals
II) Greater than ogive.
The greater than cumulative frequencies are plotted against the lower boundaries of their respective class
intervals.

Chapter Three
Numerical representation of a data set

Two basic ways to summarize numerical data are covered here. These are
1. Measure of Central Tendency(MCT)
2. Measure of Variation (Dispersion)
Measure of Central Tendency (MCT):
✓ Quantitative variables contained in raw data or in frequency tables can be summarized by means of a few
numerical values. A key element of this summary is called the MCT. It is also called a measure of average.

Types of measures of central tendency


There are several different measures of central tendency; each has its advantage and disadvantage.

Three measures of the center of a distribution are commonly used: mean, median, and mode. Any of them can
be used with normally distributed data; however, with ordinal data, the mean of the raw scores is usually not
appropriate, although when certain statistics are being computed, the mean of the ranks of ordinal data
can provide useful information. With nominal data, the mode is the only appropriate measure.

Mean
✓ It is a measure of location or central value for a continuous variable.
✓ Most useful when the data have a symmetric distribution and do not contain outliers.
✓ It is the most popular and best understood MCT for a quantitative data set. Thus, it is usually the statistic
of choice, assuming that the data are normally distributed.

Properties of the summation notation
1. $\sum_{i=1}^{n} i = 1 + 2 + 3 + \dots + n$
2. $\sum_{i=1}^{n} 1 = n$
3. $\sum_{i=1}^{n} c = n c$, where c is a constant number
4. $\sum_{i=1}^{n} (x_i + c) = \sum_{i=1}^{n} x_i + n c$
5. $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$, where c is a constant number
6. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$

The following table indicates the formulas for the mean.

For individual (raw) data:
✓ Population: $AM = E(x) = \mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
✓ Sample: $AM = M(x) = \bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
For a frequency distribution, ungrouped data:
✓ Population: $AM = E(x) = \mu = \dfrac{\sum_{i=1}^{k} f_i x_i}{N}$
✓ Sample: $AM = M(x) = \bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{n}$
For a frequency distribution, grouped data:
✓ Population: $AM = E(x) = \mu = \dfrac{\sum_{i=1}^{k} f_i m_i}{N}$
✓ Sample: $AM = M(x) = \bar{x} = \dfrac{\sum_{i=1}^{k} f_i m_i}{n}$
Where:
✓ $x_i$ = the i-th observation (for an ungrouped f.d, the value of the i-th class)
✓ $f_i$ = the frequency of the i-th class
✓ $m_i$ = the midpoint of the i-th class
✓ $k$ = the number of classes
✓ $n = \sum_{i=1}^{k} f_i$ = the total number of observations in the sample data
✓ $N = \sum_{i=1}^{k} f_i$ = the total number of observations in the population data
✓ $M(x) = \bar{x}$ = the notation for sample data
✓ $E(x) = \mu$ = the notation for population data
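The three sample-mean formulas can be sketched as two small Python functions, applied to the examples from Chapter Two. This is an illustration only; the helper names `mean_raw` and `mean_freq` are hypothetical, and the data are typed in from the earlier tables.

```python
# Raw rejects data (50 observations), read row by row.
rejects = [
    16, 21, 26, 24, 11, 17, 26, 25, 13, 27,
    24, 26, 3, 27, 23, 24, 15, 22, 22, 12,
    22, 29, 18, 22, 28, 25, 7, 17, 22, 28,
    19, 23, 23, 22, 3, 19, 13, 31, 23, 28,
    24, 9, 20, 33, 30, 23, 20, 8, 21, 24,
]


def mean_raw(xs):
    """Mean of individual observations: sum(x_i) / n."""
    return sum(xs) / len(xs)


def mean_freq(values, freqs):
    """Mean of a frequency distribution: sum(f_i * v_i) / sum(f_i).

    Pass the class values x_i for an ungrouped f.d, or the class
    midpoints m_i for a grouped f.d.
    """
    return sum(f * v for f, v in zip(freqs, values)) / sum(freqs)


# Ungrouped f.d of the miles-per-gallon example.
mpg_values = [12, 13, 14, 15, 16, 17, 18, 19]
mpg_freqs = [6, 1, 3, 6, 8, 2, 3, 1]

# Grouped f.d of the rejects example: class midpoints and frequencies.
midpoints = [5, 10, 15, 20, 25, 30, 35]
reject_freqs = [3, 4, 6, 13, 17, 6, 1]

print("mpg, ungrouped f.d:  ", mean_freq(mpg_values, mpg_freqs))
print("rejects, raw data:   ", mean_raw(rejects))
print("rejects, grouped f.d:", mean_freq(midpoints, reject_freqs))
```

This gives about 15.07 miles per gallon, 20.76 rejects from the raw data, and 20.9 rejects from the grouped table. The grouped value is only an approximation, because every observation in a class is replaced by the class midpoint.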

Alternative to the Arithmetic Mean-Median


Median:
✓ It is the middle number when the measurements are arranged in ascending (or descending) order.
✓ It is the appropriate measure of central tendency for ordinal level raw data.
✓ It is a better measure of central tendency than the mean when the frequency distribution is skewed.
Note:
✓ In symmetric distributions, the mean and median are equal.
✓ In skewed distributions, the median is more appropriate.
✓ It provides a measure of location of a sample that is suitable for asymmetric distributions and is also
relatively insensitive to the presence of outliers.
Mode:
✓ It is the value of the item which appears most frequently.
✓ It is the most common category. The mode can be used with any kind of data, but it generally provides the
least precise information about central tendency.
✓ It is the only measure of central tendency that can be used with nominal data.

Note:
✓ There can be more than one mode, or there may be no mode when all observations in the data set have
equal frequency.
✓ When all the values occur the same number of times, we usually say that there is no unique mode.

The following table indicates the formulas for the median and mode (for sample data).

For ungrouped data (observations arranged in order):
✓ If n is odd: Median = the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ observation
✓ If n is even: Median = $\dfrac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$
✓ Mode = the value that has the highest frequency
For grouped data:
✓ Median = $LCB + \left(\dfrac{\frac{n}{2} - Cf}{f_{mi}}\right) w$
Where:
✓ LCB = the lower class boundary of the median class,
✓ Cf = the LCF of the class above (preceding) the median class,
✓ $f_{mi}$ = the frequency of the median class, and
✓ w = the width of the median class.
✓ Mode = $l_o + \left(\dfrac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) w$
Where:
✓ $l_o$ = the lower class boundary of the modal class,
✓ $f_1$ = the frequency of the modal class,
✓ $f_0$ = the frequency of the class preceding the modal class,
✓ $f_2$ = the frequency of the class succeeding the modal class,
✓ w = the class width.

Note: (in case of grouped data)
✓ The median class is the first class whose LCF is greater than or equal to n/2.
✓ The modal class is the class with the largest frequency.
✓ If the value of the median or mode is known, we can identify the median class or modal class, respectively, because
the median (or mode) lies in that class. This is most often used to find a missing frequency.
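The median and mode formulas can be sketched in Python for the mpg (ungrouped) and rejects (grouped) examples. For ungrouped data the standard library's `statistics.median` and `statistics.multimode` are used; `grouped_median` and `grouped_mode` are hypothetical helper names written here. As an added assumption, a missing neighbouring class (modal class at either end) is treated as having frequency 0, and ties for the largest frequency pick the first such class.

```python
from statistics import median, multimode

# Miles per gallon for the 30 vehicles (ungrouped data).
mpg = [
    12, 17, 12, 14, 16, 18, 16, 18, 12, 16,
    17, 15, 15, 16, 12, 15, 16, 16, 12, 14,
    15, 12, 15, 15, 19, 13, 16, 18, 16, 14,
]
print("median:", median(mpg))    # averages the two middle values when n is even
print("mode:", multimode(mpg))   # list of every most frequent value


def grouped_median(lcb, freqs, w):
    """LCB + ((n/2 - Cf) / f) * w, for the first class whose LCF reaches n/2."""
    n = sum(freqs)
    cf = 0  # LCF of the classes before the current one
    for lower, f in zip(lcb, freqs):
        if cf + f >= n / 2:
            return lower + (n / 2 - cf) / f * w
        cf += f
    raise ValueError("empty distribution")


def grouped_mode(lcb, freqs, w):
    """l_o + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) * w for the modal class."""
    j = max(range(len(freqs)), key=lambda i: freqs[i])  # first class with the largest frequency
    f1 = freqs[j]
    f0 = freqs[j - 1] if j > 0 else 0                 # assumption: 0 if there is no preceding class
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0    # assumption: 0 if there is no succeeding class
    return lcb[j] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * w


# Grouped rejects distribution: lower class boundaries, frequencies, width 5.
lcb = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
freqs = [3, 4, 6, 13, 17, 6, 1]
print("grouped median:", round(grouped_median(lcb, freqs, 5), 2))
print("grouped mode:", round(grouped_mode(lcb, freqs, 5), 2))
```

For the rejects data, n/2 = 25 falls in the class 18-22 (Cf = 13, f = 13), giving a median of about 22.12, and the modal class 23-27 (f1 = 17, f0 = 13, f2 = 6) gives a mode of about 23.83.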

Note:

The value of central tendency, however, does not completely describe the data. Therefore, some additional
characteristics of the data must be used to provide for a more complete summary and description of the data and
to distinguish between dissimilar data sets. The next section deals with this additional characteristic, the
variability of the data.
Example: Consider the following two sets of data.
i. 6, 18, 30 and
ii. 17, 18, 19
$\bar{x}_{i} = \dfrac{6 + 18 + 30}{3} = \dfrac{54}{3} = 18$ and $\bar{x}_{ii} = \dfrac{17 + 18 + 19}{3} = \dfrac{54}{3} = 18$
Observation: Even though the two sets of data have the same arithmetic mean, the values in i are more scattered
or dispersed than those in ii.

Measure of Variation (dispersion):

When comparing sets of data, it is useful to have a way of measuring the scatter or spread of the data.

✓ Variation or dispersion is the degree to which numerical data is scattered or spread about some measure
of central tendency (usually the mean).

Variance (V): Variance also indicates a relationship between the mean of a distribution and the data points; it is
determined by averaging the squared deviations from the mean. Squaring the differences instead of taking their absolute
values makes further algebraic manipulation of the data easier. Another measure of
variation is the standard deviation, the square root of the variance.

The following table indicates the formulas for variance and standard deviation.

For individual (raw) data:
✓ Population variance: $\sigma^2 = E(x-\mu)^2 = \dfrac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$
✓ Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
For a frequency distribution, ungrouped data:
✓ Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i(x_i-\mu)^2}{N}$
✓ Sample variance: $S^2 = \dfrac{\sum_{i=1}^{k} f_i(x_i-\bar{x})^2}{n-1}$
For a frequency distribution, grouped data:
✓ Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i(m_i-\mu)^2}{N}$
✓ Sample variance: $S^2 = \dfrac{\sum_{i=1}^{k} f_i(m_i-\bar{x})^2}{n-1}$
Standard deviation (all three cases):
✓ Population: $\sigma = \sqrt{\sigma^2}$
✓ Sample: $S = \sqrt{S^2}$
Where:

✓ xmax is the maximum observation


✓ xmin is the minimum observation
✓ mi = the midpoint of the class
✓ A is
• either population mean or median in case of population data or
• either sample mean or sample median in case of sample data

Note:
✓ The denominator in the sample variance formula is n − 1. This is because, on average, the sample variance
underestimates the population variance when n is used as the denominator.
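The variance formulas can be checked on the two data sets from the example above and on the grouped rejects distribution. This sketch uses the standard library's `statistics.variance` and `statistics.stdev` (which divide by n − 1) and `statistics.pvariance` (which divides by N); `grouped_sample_variance` is a hypothetical helper written here for the grouped formula.

```python
import math
from statistics import pvariance, stdev, variance

# The two data sets from the example: same mean (18), different spread.
set_i = [6, 18, 30]
set_ii = [17, 18, 19]

for data in (set_i, set_ii):
    # variance/stdev divide by n - 1 (sample); pvariance divides by N (population).
    print(data, "S^2 =", variance(data), "S =", stdev(data), "sigma^2 =", pvariance(data))


def grouped_sample_variance(midpoints, freqs):
    """S^2 = sum(f_i * (m_i - xbar)^2) / (n - 1) for a grouped f.d."""
    n = sum(freqs)
    xbar = sum(f * m for f, m in zip(freqs, midpoints)) / n
    ss = sum(f * (m - xbar) ** 2 for f, m in zip(freqs, midpoints))
    return ss / (n - 1)


# Grouped rejects distribution: class midpoints and frequencies.
s2 = grouped_sample_variance([5, 10, 15, 20, 25, 30, 35], [3, 4, 6, 13, 17, 6, 1])
print("grouped rejects: S^2 =", round(s2, 2), "S =", round(math.sqrt(s2), 2))
```

Set i gives S² = 144 and S = 12, while set ii gives S² = 1 and S = 1, which quantifies the observation that set i is more dispersed. For the grouped rejects data, S² is about 49.68 and S about 7.05; dividing the same sum of squares by N instead of n − 1 would give the population version.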
