Statistics
1. Introduction:
‘The fundamental gospel of statistics is to push back the domain of ignorance, prejudice, rule of
thumb, arbitrary or premature decision, tradition and dogmatism, and to increase the domain in
which decisions are made and principles are formulated on the basis of analyzed quantitative
facts’ (Robert W. Burgess).
The word statistics is used in two different senses. In its most popular sense, statistics means
quantitative figures: a numerical description of the quantitative aspect of things, for instance
the number of children born in a year or the number of schools and colleges in a state. In its
second sense, the word refers to a body of scientific principles and techniques.
[Figure: stages of a statistical study, including presentation, analysis and interpretation of data]
Business StatisticsPage 1
Statistics is also a collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing and interpreting the data and drawing conclusions
from them. In short, statistics is a branch of science that deals with data analysis.
Data Collection: This is the stage where we gather information for our purpose.
o If data are needed and are not readily available, they have to be collected.
o Data may be collected by the investigator directly, using methods like interviews,
questionnaires and observation, or may be available from published or unpublished sources.
Data Organization: This is the stage where we edit our data. A large mass of figures collected
from surveys frequently needs organization. The collected data might involve irrelevant
figures, incorrect facts, omissions and mistakes. Errors introduced during collection have to be
edited out. After editing, we may classify (arrange) the data according to their common
characteristics. Classification or arrangement of data in some suitable order makes the
information easy to present.
Data Presentation: The organized data can now be presented in the form of tables and diagrams.
At this stage, large data sets are presented in tables in a summarized and condensed manner.
The main purpose of data presentation is to facilitate statistical analysis. Graphs and
diagrams may also be used to give the data a vivid meaning and make the presentation attractive.
Data Analysis: This is the stage where we critically study the data to draw conclusions about the
population parameter. The purpose of data analysis is to dig out information useful for decision
making. Analysis usually involves highly complex and sophisticated mathematical techniques.
However, this material includes only the most commonly used methods of statistical analysis,
such as the calculation of averages, the computation of measures of dispersion, and
probabilities and probability distributions.
Data Interpretation: This is the stage where we draw valid conclusions from the results
obtained through data analysis. Interpretation means drawing conclusions from the data which
form the basis for decision making. The interpretation of data is a difficult task and requires
a high degree of skill and experience. If data that have been analyzed are not properly
interpreted, the whole purpose of the investigation may be defeated and fallacious conclusions
may be drawn. Great care is therefore needed when interpreting data.
1.2 Classification (types) of Statistics: The study of statistics is usually divided into two major
categories: descriptive statistics and inferential statistics.
I. The definition of Statistics given earlier referred to “organizing, presenting, and analyzing…
data.” This facet of statistics is usually referred to as Descriptive statistics.
Descriptive statistics: methods of organizing, summarizing and presenting data in an informative
way.
It seeks only to describe and analyze a sample without drawing any conclusion about a
population. It employs tools such as graphs, charts, tables and averages (mean, mode, etc.) to
describe the given data set.
Example 1: Out of 50 electric light bulbs produced by a company weekly, 12 are defective.
II. Inferential statistics: Another facet of statistics is inferential statistics, also called
statistical inference or inductive statistics. Our main concern regarding inferential statistics is
finding out something about a population based on a sample taken from the population.
Inferential statistics: the methods used to find out something about a population based on a
sample. Put differently, inferential statistics refers to generalizing from samples to populations
using probabilities, performing hypothesis testing, determining relationships between or among
variables, and making predictions. Inferences (predictions, decisions) about certain
characteristics of a population are based on information contained in a sample. That is, it deals
with drawing important conclusions or generalizations about a population based on the analysis
of a sample.
Example: In order to estimate the voltage required to cause an electrical device to fail, a sample
of such devices can be subjected to increasingly higher voltages until each device fails. Based on
these sample results, the probability of failure at various voltage levels for the other devices in
the sampled population can be estimated.
6. Statistics and Medical Science
In medical science, the statistical tools for the collection, presentation and analysis of observed
facts relating to the causes and incidence of diseases and the results obtained from the use of
various drugs and medicines are of great importance.
7. Statistics and Psychology and Education
In education and psychology, too, statistics has found wide applications, e.g. in determining the
reliability and validity of a test, in ‘Factor Analysis’, etc.
8. Statistics and War
In war, the theory of ‘Decision Functions’ can be of great assistance to military and technical
personnel in planning ‘maximum destruction with minimum effort.’
Generally, the Functions of statistics are:
Statistics has several major functions in all the economic sectors. Some of the major functions of
statistics include the following:
It helps to predict future values of a given variable of interest based on the past and
present values observed;
It helps to condense and present data in an easily understandable manner;
Statistical figures on existing situations and on (predicted) future situations help in
designing and adopting appropriate policies, strategies and actions;
Statistical figures also help to compare the situation before and after the introduction of
some policies, strategies or projects.
Statistics is used in the day-to-day activities of human beings (it is used in almost all
fields of human endeavor, i.e. statistics can be used in various occupations), for example:
Limitations of Statistics
Despite its functions and its wide applications in almost every sphere of human activity,
statistics is not without limitations. The major limitations of statistics include the following:
Individual items, taken separately, do not constitute statistical data and are
meaningless for statistical enquiry. Hence, statistical analysis is suited only to
those problems where group characteristics are to be studied.
To sum up, statistics deals with aggregates of facts and attaches no importance to
individual items; it is suited only to problems in which group characteristics are to
be studied.
On the basis of statistical analysis, we can talk only in terms of probability and chance and not in
terms of certainty. Statistical conclusions are not universally true; they are true only on
average.
Statistical methods are the most dangerous tools in the hands of the inexpert. The use of
statistical tools by inexperienced and untrained persons might lead to very fallacious
conclusions.
Unless interpreted properly, statistical results may be misused (they can be misleading), so
their interpretation requires well skilled and well trained personnel.
4. Statistics is not suited to the study of qualitative phenomenon
It deals with only those subjects of inquiry that are capable of being quantitatively
measured and numerically expressed.
Statistics deals only with numerical facts, while there are a lot of qualitative facts that
need to be collected and analyzed.
Statistics, being a science dealing with a set of numerical data, is applicable to the study
of only those subjects of enquiry which are capable of quantitative measurement. As
such, qualitative phenomena like honesty, poverty, culture etc. which cannot be
expressed numerically, are not capable of direct statistical analysis.
5. Statistical data are only approximately and not mathematically correct.
UNIT 2 -DATA COLLECTION AND PRESENTATION
2. Introduction: Data constitute the foundation for statistical analysis. Governments, business
firms and individuals collect the statistical data required to carry out their activities
efficiently and effectively. As we discussed in chapter one, data are the real facts and figures
seen or observed that are collected, organized, presented, summarized, analyzed and
interpreted. From this definition, we can say that statistics as a field of study exists only if
there are data, since by definition it deals with data collection, organization, presentation,
analysis and interpretation through scientific (systematic) ways in order to arrive at valid
generalizations about the elements under study.
Data are facts and figures that are used to describe individuals (entities) of interest with regard
to certain variable(s) of interest (data variables). Individuals are the objects described by a set
of data [the entities on which data are collected]. A variable is a characteristic of interest for
the individuals of a population under study. The concepts of data, individuals and variables can
be best understood by looking at a hypothetical example.
III. Qualitative, i.e. according to some attributes
IV. Quantitative, i.e. in terms of magnitudes.
V. level (scale) of measurement (types of scales)
i. Geographical Classification
When data are observed over a period of time, the type of classification is known as
chronological classification. For example, the sales figures of a company are given below:
Year Sales
2001 18810
2002 23601
2003 23816
2004 32435
2005 39343
In qualitative classification, data are classified on the basis of some attribute or quality such as
sex, color of hair, literacy, religion etc. The point to note in this type of classification is that the
attribute under study cannot be measured; we can only find out whether it is present or absent.
For example, if the attribute under study is blindness, we may find out how many persons are
blind in a given population.
Quantitative classification refers to the classification of data according to some characteristic
that can be measured, such as height, weight, income, sales etc. For example, the workers of a
factory may be classified according to wages as follows:
Wages        No. of workers
2500-2600    50
2600-2700    200
2700-2800    260
When the data are qualitative, we are usually interested in how many or what proportion fall in
each category. For example, what percent of the population has blue eyes? How many Catholics
and how many Protestants are there in the United States? So, when the characteristic being
studied is nonnumeric, it is called a qualitative variable or an attribute.
Quantitative variable: When the variable studied can be reported numerically, the variable is
called a quantitative variable. A quantitative variable can be measured and expressed
numerically, and it can be of two types: discrete or continuous. The values of a
discrete variable are usually whole numbers, such as the number of episodes of diarrhea in the
first five years of life.
A continuous variable is a measurement on a continuous scale. Examples include weight, height,
blood pressure, age, etc.
Examples of quantitative variables are the balance in your checking account, the ages of company
presidents, the life of an automobile battery (such as 42 months) and the number of children in a
family.
So, quantitative data consist of numerical measurements or counts. They are numerical in nature
and can be ordered or ranked. E.g. the variable “age” is numerical, and people can be ranked in
order according to the value of their ages. Other examples include heights, weights, and body
temperatures.
Although variables can be broadly divided into categorical (qualitative) and quantitative types,
it is common practice to distinguish four basic scales of measurement.
S. S. Stevens (1951) proposed four scale types: Nominal, Ordinal, Interval, and Ratio. Each
possesses different properties of measurement systems.
1. Nominal scales / Data: - Data that represent categories or names. There is no implied order to
the categories of nominal data. In these types of data, individuals are simply placed in the proper
category or group, and the number in each category is counted. Each item must fit into exactly
one category. The simplest data consist of unordered, dichotomous, or "either - or" types of
observations, i.e., either the patient lives or the patient dies, either he has some particular
attribute or he does not. Examples:
Religion: Christianity, Islam, Hinduism, etc.
Sex: Male, Female
Eye color: brown, black, etc.
2. Ordinal scales: - have order among the response classifications (categories). The spaces or
intervals between the categories are not necessarily equal. The ordinal scale is also used for
telling the number of observations falling in different categories, but in an ordinal scale one
group is related to the other groups in terms of ordinal value. For example, it is common in
schools that teachers rate their students, based on their grade achievements, as excellent, very
good, good and fair. In this type of categorization, the rating “excellent” is obviously greater
than the rating “very good” in terms of ordinal value. It should be noted that we cannot tell by
how much the rating “excellent” exceeds the rating “very good”; we can only tell that it is
greater. The ratings are also categories into which the students are grouped based on their
achievements. As in the nominal scale, the categories of an ordinal scale are mutually exclusive.
3. Interval scales / Data: - In interval data, the intervals between values are the same. For
example, in the Fahrenheit temperature scale, the difference between 70 degrees and 71 degrees
is the same as the difference between 32 and 33 degrees. But the scale is not a RATIO Scale. 40
degrees Fahrenheit is not twice as much as 20 degrees Fahrenheit.
Interval variables are true quantitative measures because in addition to marking difference and
rank, the differences or distances between any two numbers on the scale are meaningful. This
means that the difference between two scores is an accurate reflection of the difference in the
amount of an attribute that the two objects have. Temperature, measured in degrees Celsius, is
measured on the interval scale, and a difference between 18 degrees and 20 degrees will be
exactly the same as the difference between 25 degrees and 27 degrees. Most measures in the
behavioral sciences (e.g. IQ scores, scores on attitude scales, and knowledge tests) are
considered interval measures. In addition to performing mathematical relations (<, >, =), we may
also legitimately perform the mathematical operations of addition and subtraction (+, –) with
these numbers. Therefore, interval scales are measurement systems that possess the properties of
magnitude and intervals, but not the property of rational zero.
4. Ratio scales / Data: - The data values in ratio data have meaningful ratios. For example, age
is ratio data: someone who is 40 is twice as old as someone who is 20. Both interval and ratio
data involve measurement. Most data analysis techniques that apply to ratio data also apply to
interval data. Therefore, in most practical respects, these two types of data (interval and ratio)
are grouped together as metric data.
Ratio scales are measurement systems that possess all three properties: magnitude, intervals, and
rational zero. The added power of a rational zero allows ratios of numbers to be meaningfully
interpreted; e.g. we can say that the ratio of John's height to Mary's height is 1.32, whereas
such a statement is not possible with interval scales.
According to their source, data are classified as (i) primary data and (ii) secondary data.
(i) Primary Data: Primary data are measurements observed and recorded as part of an original
study. When the data required for a particular study can be found neither in the internal records
of the enterprise, nor in published sources, it may become necessary to collect original data, i.e.,
to conduct first hand investigation. The work of collecting original data is usually limited by
time, money and manpower available for the study. When the data to be collected are very large
in volume, it is possible to draw reasonably accurate conclusions from the study of a small
portion of the group called a sample.
(ii) Secondary data: In statistics the investigator need not begin from the very beginning; he
may use, and must take into account, what has already been discovered by others. When an
investigator uses data which have already been collected by others, such data are called
secondary data. Secondary data can be obtained from journals, reports, government
publications, publications of research organizations, etc.
Frequency distribution is grouping of data into categories showing the number of
observations in each mutually exclusive category.
A frequency distribution is a table in which possible values for a variable are grouped into
classes, and the number of observed values which fall into each class is recorded. Data
organized in a frequency distribution are called grouped data. In contrast, for ungrouped data
every observed value of the random variable is listed.
Rating Frequency
Coca Cola 6
Mirinda 8
Pepsi 2
Sprite 4
Total 20
- Is it on population or sample?
Example 2:
The number of refrigerators sold on 22 working days by a leading agency house:
23 30 20 26 30 20 23 40 40 26 20 30
23 40 28 26 23 40 28 28 30 30
The table below clearly shows that on 3 days 20 refrigerators were sold each day, on 4 days 23
refrigerators were sold each day, etc.
This method of classification helps in condensing the data only where values are largely
repeated; otherwise there will be hardly any condensation. In order to make the series more
compact so that its characteristics can be easily studied, data may be classified according to
class intervals.
Refrigerators sold    Tally    Number of days
20                    |||      3
23                    ||||     4
26                    |||      3
28                    |||      3
30                    |||||    5
40                    ||||     4
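The tally for Example 2 can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are chosen here, and `collections.Counter` is one convenient way among several to count repeated values.

```python
from collections import Counter

# Daily refrigerator sales for the 22 working days in Example 2.
sales = [23, 30, 20, 26, 30, 20, 23, 40, 40, 26, 20, 30,
         23, 40, 28, 26, 23, 40, 28, 28, 30, 30]

# Count on how many days each sales figure occurred.
frequency = Counter(sales)

# Print one row per distinct value: value, tally marks, frequency.
for value in sorted(frequency):
    print(f"{value:>3}  {'|' * frequency[value]:<6} {frequency[value]}")
print("Total", sum(frequency.values()))
```

The total of the frequency column equals the number of working days, which is a quick check that no observation was lost while tallying.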
Example 3:
Consider the problem of a social scientist who wants to study the age of persons arrested in a
country. In connection with large sets of data, a good overall picture and sufficient information
can often be conveyed by grouping the data into a number of class intervals as shown below.
distribution cannot tell how many of the arrested persons are 19 years old, or how many are over
62.
The construction of a grouped frequency distribution consists essentially of five steps:
(1) choosing the classes, (2) determining the class intervals, (3) sorting (or tallying) the data
into these classes, (4) counting the number of items in each class, and (5) displaying the results
in the form of a chart or table.
Methods of classifying the grouped data according to class interval
There are two methods of classifying the grouped data according to class intervals namely
Exclusive method
Inclusive method
a. Exclusive Method
When the class intervals are so fixed that the upper limit of one class is the lower limit of the
next class, it is known as the ‘Exclusive’ method of classification. The following data are
classified on this basis:
Income in $ No of Employees
1800-1900 50
1900-2000 100
2000-2200 200
It is clear that the ‘Exclusive method’ ensures continuity of data inasmuch as the upper limit of
one class is the lower limit of the next class. Thus in the above example, there are 50 persons
whose income is between $1800 and $1899.99. A person who is getting exactly $1900 would be
included in the class 1900-2000.
Whenever this method is used, it is necessary to give clear instructions in the questionnaire.
However, the reader should note that if class intervals are given like 0-10, 10-20, ..., it is
always presumed that the upper limit is exclusive, i.e. an observation exactly equal to the upper
limit is not included in that class.
b. Inclusive method
Under the ‘Inclusive method’ of classification, the upper limit of one class is included in that
class itself.
Income in $ No of Employees
800-899 50
900-999 100
1000-1099 200
In the class 800-899 we include persons whose income is between $800 and $899. A person whose
income is exactly $900 is included in the next class.
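The difference between the two methods comes down to whether the upper limit is compared with `<` or `<=`. The sketch below illustrates this with the class limits from the two income tables; the function names are hypothetical and chosen only for this illustration.

```python
def classify_exclusive(value, limits):
    """Return the class (lower, upper) with lower <= value < upper, or None."""
    for lower, upper in limits:
        if lower <= value < upper:
            return (lower, upper)
    return None


def classify_inclusive(value, limits):
    """Return the class (lower, upper) with lower <= value <= upper, or None."""
    for lower, upper in limits:
        if lower <= value <= upper:
            return (lower, upper)
    return None


# Class limits taken from the two income tables above.
exclusive_classes = [(1800, 1900), (1900, 2000), (2000, 2200)]
inclusive_classes = [(800, 899), (900, 999), (1000, 1099)]

print(classify_exclusive(1899.99, exclusive_classes))  # falls in 1800-1900
print(classify_exclusive(1900, exclusive_classes))     # falls in 1900-2000
print(classify_inclusive(899, inclusive_classes))      # falls in 800-899
print(classify_inclusive(900, inclusive_classes))      # falls in 900-999
```

Note that with inclusive limits a fractional value such as 899.5 falls in no class at all, which is one reason class boundaries are introduced for continuous data.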
A. Absolute Distribution
Definition of Absolute Frequency: A statistical term describing the total number of trials or
observations within a given interval or frequency bin. The frequency bins can be of any size, but
they must be mutually exclusive and exhaustive, and the data must be grouped. So, the absolute
frequency is simply the total number of observations or trials within a given range, or the
number of occurrences of a particular phenomenon. It shows how many scores have that
particular value.
Absolute frequency represents the number of times each score or observation has occurred in a
set of observation. Computing the absolute frequency of a score is simply a matter of counting
the number of times that score appears in the set of data. It is necessary to include scores with
zero frequency in order to draw the frequency polygons correctly.
Relative Frequency Distribution: the relative frequency of a class is its frequency expressed as
a proportion of the total number of observations. The relative frequency distribution is useful
in comparing two or more frequency distributions in which the number of cases of each
distribution is not equal.
$\text{Relative frequency of the } i\text{th class} = \dfrac{\text{Frequency of the } i\text{th class}}{\text{Total number of observations}}$
It may be noted that at times the use of relative frequencies is more appropriate than absolute
frequencies. Whenever two or more sets of data contain different numbers of observations, a
comparison with absolute frequencies will be incorrect. In such cases, it is necessary to use
relative frequencies.
C. Cumulative Frequency
In some situations, we may be interested, not in the frequencies in various classes, but rather in
the frequencies or proportions of observation which are “less than” or “greater than” a given
value. This leads to a cumulative frequency distribution. This is derived from a frequency
distribution by forming a cumulative frequency column. This column is computed by adding the
successive class frequencies from top to bottom. The entry corresponding to the top interval is
the frequency of that class, the entry opposite the second interval is the sum of the frequencies
in the first and second class intervals, and so on.
If we divide frequency by N, the total number of observations, we get the relative frequencies.
Also, if we divide cumulative frequency by N, the total number of observations, we get the
relative cumulative frequencies, which are often expressed in percentage.
Cumulative frequency refers to the frequency of all data items with a value less than or equal to a
specified score.
Cumulative frequency (cf): the frequency of all scores at or below a particular score.
To compute a score’s cumulative frequency, we add the simple frequencies for all scores below
the score with the frequency for the score.
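As a small illustration of how the four kinds of frequency relate to one another, the following sketch builds them for the refrigerator sales figures of Example 2 (the absolute frequencies are those of the tally table). The variable names are assumptions made for this example.

```python
from itertools import accumulate

# Distinct sales values and their absolute frequencies (Example 2).
values = [20, 23, 26, 28, 30, 40]
absolute = [3, 4, 3, 3, 5, 4]

n = sum(absolute)                                    # total number of observations
relative = [f / n for f in absolute]                 # frequency / N
percentage = [100 * r for r in relative]             # relative frequency in percent
cumulative = list(accumulate(absolute))              # running total, top to bottom
relative_cumulative = [cf / n for cf in cumulative]  # cumulative frequency / N

print("Value  f  Rel.f   %      cf  Rel.cf")
for row in zip(values, absolute, relative, percentage, cumulative, relative_cumulative):
    v, f, r, p, cf, rcf = row
    print(f"{v:>5} {f:>2}  {r:.3f}  {p:5.1f}  {cf:>3}  {rcf:.3f}")
```

The last cumulative frequency always equals N, and the relative frequencies always add up to 1; both are useful checks on a hand-built table.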
Example, an Absolute, Relative, Percentage & Cumulative Frequencies Table
Class interval Frequency Relative Frequency Percentage Cumulative Frequency
Class limits (CL),
Class frequency (F),
Class boundaries (CB),
Class midpoint (class mark) (mi),
Class width (or class size or class interval) (W).
Each of these terms is defined below.
Class Limits: include the lower class limit (LCL) and the upper class limit (UCL). For
example, take the class 40-60. Here, the lowest limit is 40 and the highest limit is 60. When we
categorize individual observations within this class, it is clear that none of the included
observations is below 40 or above 60. Take another example: a class 60-79 indicates that no
value below 60 can be included here and, likewise, no value above 79 can be included.
Class Frequency: The number of observations belonging to a particular class is known as the
frequency of that class or class frequency. Suppose there are 20 students who have obtained
marks ranging from 30-40 and 44 students who have obtained marks ranging from 50-60. In the
first case, for the class interval 30-40, the class frequency is 20, while in the second case, for
the class interval 50-60, the class frequency is 44.
Class Boundaries (CB): each class has a lower class boundary (LCB) and an upper class boundary
(UCB), which are obtained from the class limits. First we find the unit of measurement (d), which
is the gap between the UCL of a class and the LCL of the next higher class (two successive
classes); most of the time d is one unit:
$d = LCL_{i+1} - UCL_i$
1. The LCB is obtained by subtracting half the unit of measurement from the LCL of the class.
$LCB_i = LCL_i - \dfrac{LCL_{i+1} - UCL_i}{2}$
2. The UCB is obtained by adding half the unit of measurement to the UCL of the class.
$UCB_i = UCL_i + \dfrac{LCL_{i+1} - UCL_i}{2}$
For example, when the first class ends at 9 and the second class starts at 10:
Unit of measurement (d) = $LCL_2 - UCL_1 = 10 - 9 = 1$
Half of the unit of measurement = $\dfrac{LCL_2 - UCL_1}{2} = \dfrac{1}{2} = 0.5$
Class width (or class size or class interval) (W): the difference between the upper class
boundary and the lower class boundary of a class. For the above example,
W = UCB - LCB = 9.5 - 4.5 = 5 (we can take any class because all classes have equal class width).
Note: When all the classes have the same (uniform) class width, the class width of the
distribution is also the difference between the lower class limits (or between the upper class
limits) of two consecutive classes.
Class midpoint (class mark) (mi): When we add up the lower and the upper class limits of a
class interval, we get a certain value. This value is divided by two, which gives us the class mid-
point. Thus, the mid-point of class interval 40 - 60 is (40+60) / 2 = 50. The formula for obtaining
class mid-point is as follows:
$m_i = \dfrac{LCL_i + UCL_i}{2}$ or $m_i = \dfrac{LCB_i + UCB_i}{2}$
As we shall see subsequently, the mid-point of each class interval is taken to represent it for the
purpose of statistical calculations.
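The definitions of class boundaries, class width and class midpoint can be checked numerically. The sketch below uses hypothetical inclusive classes 5-9, 10-14 and 15-19 (chosen so that the first class has boundaries 4.5 and 9.5); the function name `describe_classes` is likewise an assumption made for illustration.

```python
def describe_classes(limits):
    """Return boundaries, width and midpoint for each (LCL, UCL) pair.

    Assumes at least two classes and the same gap d between every
    pair of successive classes.
    """
    # Unit of measurement: gap between the first UCL and the next LCL.
    d = limits[1][0] - limits[0][1]
    half = d / 2
    rows = []
    for lcl, ucl in limits:
        lcb = lcl - half              # lower class boundary
        ucb = ucl + half              # upper class boundary
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcb, ucb),
            "width": ucb - lcb,       # W = UCB - LCB
            "midpoint": (lcl + ucl) / 2,
        })
    return rows


# Hypothetical class limits used only for this illustration.
classes = [(5, 9), (10, 14), (15, 19)]
summary = describe_classes(classes)
for row in summary:
    print(row)
```

The midpoint comes out the same whether it is computed from the class limits or from the class boundaries, since half a unit is subtracted at one end and added at the other.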
Steps to construct frequency distribution
It is difficult to lay down any hard and fast rules for constructing a frequency distribution.
However, raw or ungrouped data can be organized into a frequency distribution through the
following steps.
Step1: Decide on the number of classes. The goal is to use just enough groups or classes to
reveal the shape of the distribution. Too many classes or too few classes might not reveal the
basic shape of the data set. A useful recipe to determine the number of classes (k) is the “2 to the
k rule.” This guide suggests you select the smallest number (k) for the number of classes such
that $2^k \ge n$, where n is the number of observations.
Step2: Determine the class interval or Width. Generally the class interval or Width should be
the same for all classes. The classes all taken together must cover at least the distance from the
lowest value in the raw data up to the highest value. Expressing these words in a formula:
$i \ge \dfrac{H - L}{K}$
Where i = the class interval
H =Highest observed value
L = Lowest observed value
K = the number of classes
Step3: Set the Individual Class Limits. State clear class limits so you can put each observation
into only one category. This means you must avoid overlapping or unclear class limits.
Step4: Tally the number or amount of items in each class.
Step5: Count the number or amount of items in each class. The number of observations in
each class is called the class frequency.
Example
The following are the marks of the 30 students in statistics. Prepare a frequency distribution
taking a suitable class interval.
12 33 23 25 18 35 37 49 54 51 37 15
27 33 42 45 47 55 69 65 63 46 29 18
37 45 46 59 29 55
Required:
1. Determine the number of classes
2. Classify the above data taking a suitable class interval.
3. Determine the class limit
4. Tally the number of items in each class
5. Count the number or amount of items in each class
Solution
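1. Given the number of observations N = 30, we need the smallest k with $2^k \ge 30$. Since
$2^4 = 16 < 30$ and $2^5 = 32 \ge 30$, k = 5; therefore, the number of classes is 5.
2. $i \ge \dfrac{\text{Range}}{K} = \dfrac{69 - 12}{5} = \dfrac{57}{5} = 11.4 \approx 12$
3. Lower class limit = 10
4 and 5: see the table.
Class    Tally     Frequency
22-34    |||| |    6
58-70    ||||      4
Total              30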
Tally means compute, count or mark.
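Steps 1 to 3 of the solution can be sketched in code as follows. The starting lower limit of 10 is taken from the solution above; the variable names are hypothetical. The class width is rounded up so that the classes cover the whole range.

```python
import math

# Marks of the 30 students from the example.
marks = [12, 33, 23, 25, 18, 35, 37, 49, 54, 51, 37, 15,
         27, 33, 42, 45, 47, 55, 69, 65, 63, 46, 29, 18,
         37, 45, 46, 59, 29, 55]

n = len(marks)

# Step 1: smallest k such that 2**k >= n (the "2 to the k" rule).
k = 1
while 2 ** k < n:
    k += 1

# Step 2: class width i >= (H - L) / k, rounded up to a whole number.
data_range = max(marks) - min(marks)
width = math.ceil(data_range / k)

# Step 3: class limits, starting from the lower class limit 10 (exclusive method).
first_lower_limit = 10
class_limits = [(first_lower_limit + j * width, first_lower_limit + (j + 1) * width)
                for j in range(k)]

print("n =", n, " k =", k, " range =", data_range, " width =", width)
print(class_limits)
```

With these choices the classes run from 10 to 70, so both the lowest mark (12) and the highest mark (69) are covered.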
2.3. Graphic Methods of Data Presentation (Histograms, Polygons, Ogive, Pie-Charts, Bar
and Line Graphs)
There are many graphs, diagrams and charts used to present data. Histograms, Polygons, Ogive,
Pie-Charts, Bar and Line Graphs are some of them. Here let us see two of them (Histograms and
Polygon) only using the above example of frequency distribution.
One of the most common ways to portray a frequency distribution is a histogram. A histogram is a
graph in which the classes are marked on the horizontal axis and the class frequencies on the
vertical axis.
[Figure: Histogram Data Presentation. Horizontal axis: Class Interval (10, 22, 34, 46, 58, 70); vertical axis: Class Frequency.]
[Figure: Polygon. Horizontal axis: 4, 16, 28, 40, 52, 64, 76.]
Introduction: We need a single representative value that describes the entire mass of data
given in the frequency distribution. This single representative value is called the central value,
measure of location or an average around which individual values of a series cluster. This
central value or an average enables us to get a gist of the entire mass of data, and its value lies
somewhere in the middle of the two extremes of the given observations. For this reason such a
central value or an average is frequently called a measure of central tendency.
It should be clear to you that the concept of a measure of central tendency is concerned only with
quantitative variables and is undefined for qualitative variables as these are immeasurable on a
scale.
In contrast, measures of dispersion, or variability, are concerned with describing the variability
among the values. Several techniques are available for measuring the extent of variability in data
sets.
3.1. The Use of Summation Notation
The most important objective of calculating and measuring central tendency is to determine a
“single figure” which may be used to represent a whole series involving magnitudes of the same
variable. In that sense it is an even more compact description of the statistical data than the
frequency distribution.
The uppercase Greek letter ∑ (sigma) is the mathematical symbol for summation. If f(i)
denotes some quantity whose value depends on the value of i, the expression
$\sum_{i=1}^{n} f(i)$
is read as “sigma i, i going from 1 to n” and means to insert 1 for i, then 2 for i, then 3 for i,
and so on up to n, and sum the results. For example:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$
$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$
$\sum_{i=1}^{n} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + \dots + (x_n + y_n)$
$= x_1 + x_2 + \dots + x_n + y_1 + y_2 + \dots + y_n$
$= \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
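Summation notation maps directly onto a loop over the index i. The sketch below uses two small hypothetical lists x and y (not taken from the text) to show a plain sum, a sum of products, and the rule that the sum of (x_i + y_i) equals the sum of the x_i plus the sum of the y_i.

```python
# Hypothetical values chosen only to illustrate the notation.
x = [2, 4, 6]
y = [1, 3, 5]
n = len(x)

# Sum of x_i for i = 1..n (Python indexes from 0, so i runs over 0..n-1).
sum_x = sum(x[i] for i in range(n))

# Sum of y_i for i = 1..n.
sum_y = sum(y[i] for i in range(n))

# Sum of the products x_i * y_i.
sum_xy = sum(x[i] * y[i] for i in range(n))

# Sum of (x_i + y_i), which splits into sum_x + sum_y.
sum_x_plus_y = sum(x[i] + y[i] for i in range(n))

print(sum_x, sum_y, sum_xy, sum_x_plus_y)
```

Note that the sum of products is in general not the product of the sums: here the former is 44 while the latter is 12 × 9 = 108.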
3.2.1 Mean: There are four types of mean, each suitable for a particular type of data.
These are
I. Arithmetic mean
II. Weighted mean
III. Geometric mean
IV. Harmonic mean
In classification and tabulation of data, we observed that the values of the variable or
observations could be put in the form of any of the following statistical series, namely:
1. Individual series or ungrouped data: Let X be a variable which takes values $x_1, x_2, x_3, \dots, x_n$
in a sample of size n from a population of size N, with n < N. The arithmetic mean (A.M.) of a set of
observations is the sum of all values in the series divided by the number of items in the series.
That is, if $x_1, x_2, x_3, \dots, x_n$ are n sample observations, their arithmetic mean is
$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example 1: Suppose the scores of a student on seven examinations were 5, 10, 20, 7, 33, 60
and 68. Find the arithmetic mean of the scores of the student.
These are seven observations. Symbolically, the arithmetic mean, also called simply the mean, is
$\bar{X} = \frac{\sum x}{n} = \frac{5 + 10 + 20 + 7 + 33 + 60 + 68}{7} = \frac{203}{7} = 29$
For data in which each observation $x_i$ occurs with frequency $f_i$, the arithmetic mean is
$\bar{X} = \frac{\sum x_i f_i}{\sum f_i}$
Here, $\sum x_i f_i$ = the sum of the products of observations with their respective frequencies.
We will solve one example to understand it.
Example: The following table gives the wages paid to 125 workers in a factory. Calculate the
arithmetic mean of the wages.
Wages (in birr):   200   210   220   230   240   250   260
No. of workers:      5    15    32    42    15    12     4
Solution: $\bar{X} = \frac{\sum x_i f_i}{\sum f_i} = \frac{28490}{125} = 227.92$ birr
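The frequency-weighted mean of the wage example can be reproduced with a short Python sketch. The variable names are illustrative; the figures are the ones given in the wage table.

```python
wages = [200, 210, 220, 230, 240, 250, 260]   # x_i, in birr
workers = [5, 15, 32, 42, 15, 12, 4]          # f_i

total_workers = sum(workers)                               # sum of f_i
total_wages = sum(x * f for x, f in zip(wages, workers))   # sum of x_i * f_i
mean_wage = total_wages / total_workers

print(total_workers, total_wages, mean_wage)   # 125 28490 227.92
```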
60-70        2
Solution
Marks     Mid-point (m_i)   No. of students (f_i)   m_i f_i
0-10            5                    4                  20
10-20          15                    8                 120
20-30          25                   11                 275
30-40          35                   15                 525
40-50          45                   12                 540
50-60          55                    6                 330
60-70          65                    2                 130
Total                          ∑ f_i = 58        ∑ m_i f_i = 1940
So, the arithmetic mean will be
$\bar{X} = \frac{\sum m_i f_i}{\sum f_i} = \frac{1940}{58} = 33.45$ marks
It may be noted that the mid-point of each class is taken as a good approximation of the true
mean of the class. This is based on the assumption that the values are distributed fairly evenly
throughout the interval. When the number of observations is large, this assumption is usually
accepted.
Example: The data 5, 9, 13, 12 and 16 have mean 11, but if we have 100 instead of 5, i.e. 100, 9,
13, 12, 16, then the mean will be 30. A single extreme value can therefore change the mean considerably.
3.2.2 Median: Median is defined as the value of the middle item (or the mean of the values of
the two middle items) when the data are arranged in an ascending or descending order of
magnitude.
Median = the $\left(\frac{n+1}{2}\right)$th element, if n is odd.
Median = the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th elements, if n is even.
Suppose we have the following series: 15, 19, 21, 7, 33, 25, 18, 10 and 5
We have to first arrange it in either ascending or descending order. These figures are arranged in
an ascending order as follows: 5, 7, 10, 15, 18, 19, 21, 25, 33
Now as the series consists of an odd number of items (n = 9), to find the value of the middle item, we
use the formula
Median = the $\left(\frac{n+1}{2}\right)$th element, if n is odd.
That is, the size of the 5th item is the median. This happens to be 18.
Suppose the series consists of one more item, 23. We may, therefore, have to include 23 in the
above series at an appropriate place, that is, between 21 and 25. Thus, the series is now 5, 7, 10,
15, 18, 19, 21, 23, 25, and 33. Applying the above formula, the median is the size of the 5.5th item.
Here, we have to take the average of the values of the 5th and 6th items. This means an average of 18
and 19, which gives the median as 18.5.
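The two cases (odd and even n) can be sketched in Python as follows. The function name `median` is an illustrative choice; the two series are the ones used in the example above.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd n: the ((n + 1) / 2)th item
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle items

odd_series = [15, 19, 21, 7, 33, 25, 18, 10, 5]
even_series = odd_series + [23]

print(median(odd_series))    # 18
print(median(even_series))   # 18.5
```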
In the case of a continuous frequency distribution, we first locate the median class by cumulating
the frequencies until the $\left(\frac{N}{2}\right)$th point is reached. Finally, the median is calculated with the help
of the following formula:
$\text{Median} = LC_b + \left(\frac{\frac{N}{2} - C_f}{f}\right) w$
Where $C_f$ = less than cumulative frequency of the class preceding (one before) the median
class, f is the frequency of the median class, $LC_b$ is the lower class boundary of the median class,
w is the size of the class width and $N = \sum_{i=1}^{k} f_i$.
Let us take an example of a frequency distribution for which the median is to be calculated.
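Class interval (birr)   Frequency
800-1,000                  18
1,000-1,200                25
1,200-1,400                30
1,400-1,600                34
1,600-1,800                26
1,800-2,000                10
Total                     143
Solution: In order to calculate the median in this case, we first have to add the cumulative frequency
to the table. Thus, the table with the cumulative frequency is written as:
Class interval (birr)   Frequency   Cumulative frequency
800-1,000                  18               18
1,000-1,200                25               43
1,200-1,400                30               73
1,400-1,600                34              107
1,600-1,800                26              133
1,800-2,000                10              143
Here N/2 = 143/2 = 71.5, so the median class is 1,200-1,400, with $LC_b$ = 1200, $C_f$ = 43, f = 30 and w = 200.
$\text{Median} = 1200 + \left(\frac{71.5 - 43}{30}\right) \times 200 = 1390$ birr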
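The grouped-data median formula can be sketched in Python as below. The function name `grouped_median` and its arguments are illustrative, and the sketch assumes equal class widths and that the lower class limits can be used as class boundaries, as in the worked example.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median = LCb + ((N/2 - Cf) / f) * w for a grouped frequency distribution."""
    half = sum(freqs) / 2            # N / 2
    cum = 0                          # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cum + f >= half:          # first class whose cumulative frequency reaches N/2
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

lower_bounds = [800, 1000, 1200, 1400, 1600, 1800]
freqs = [18, 25, 30, 34, 26, 10]

print(grouped_median(lower_bounds, freqs, 200))   # approximately 1390
```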
3.2.3 Mode ($\hat{X}$)
The mode is another measure of central tendency. It is the value at the point around which the
items are most heavily concentrated.
A given set of data may have
One mode (unimodal), e.g. A = 3, 3, 7, 6, 2, 1: $\hat{X} = 3$
Two modes (bimodal), e.g. 10, 10, 9, 9, 6, 3, 2, 1: $\hat{X} = 10$ and 9
More than two modes (multimodal), e.g. 5, 5, 5, 6, 6, 6, 8, 8, 8, 2, 3, 2: $\hat{X} = 5, 6, 8$
No mode at all, e.g. 1, 3, 2, 4, 5, 6, 7, 8: no modal value
As an example, consider the following series: 8, 9, 11, 15, 16, 12, 15, 3, 7, 15
There are ten observations in the series, in which the figure 15 occurs the maximum number of times,
three. The mode is therefore 15.
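Counting how often each value occurs can be sketched in Python with `collections.Counter`. The function name `modes` is illustrative, and treating a series in which every value occurs exactly once as having no mode is a convention assumed here to match the cases listed above.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency (empty list if no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                    # every value occurs once: no modal value
    return sorted(v for v, c in counts.items() if c == top)

print(modes([8, 9, 11, 15, 16, 12, 15, 3, 7, 15]))   # [15]
print(modes([10, 10, 9, 9, 6, 3, 2, 1]))             # [9, 10]
print(modes([1, 3, 2, 4, 5, 6, 7, 8]))               # []
```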
Note that
In case of discrete frequency distribution, mode is the value of the variable corresponding
to the maximum frequency. This method can be used conveniently if there is only one
value with the highest concentration of observation.
Example: Consider the following distribution, then determine modal value of the distribution.
X 1 2 3 4 5 6 7 8 9
F 3 1 18 25 40 30 22 10 6
Solution: The maximum frequency is 40 and therefore the corresponding value of X=5 is the
value of mode. In the case of grouped data, mode is determined by the following formula:
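$\text{Mode} = \hat{X} = l_o + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) w$
Where $l_o$ is the lower value of the class in which the mode lies, $f_1$ is the frequency of the modal
class, $f_0$ and $f_2$ are the frequencies of the classes immediately before and after the modal class,
and w is the class width.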
While applying the above formula, we should ensure that the class-intervals are uniform
throughout. If the class-intervals are not uniform, then they should be made uniform on the
assumption that the frequencies are evenly distributed throughout the class. In the case of
unequal class-intervals, the application of the above formula will give misleading results.
90-100 4
We have to calculate the mode in respect of this series.
Solution: We can see from Column (2) of the table that the maximum frequency of 12 lies in the
class-interval of 60-70. This suggests that the mode lies in this class-interval. Applying the
formula given earlier, we get:
$\text{Mode} = 60 + \left(\frac{12 - 8}{(12 - 8) + (12 - 9)}\right) \times 10 = 60 + \frac{4}{4 + 3} \times 10 = 65.7$ approx.
In several cases, just by inspection one can identify the class interval in which the mode lies. One
should see which the highest frequency is and then identify to which class-interval this frequency
belongs. Having done this, the formula given for calculating the mode in a grouped frequency
distribution can be applied.
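A minimal Python sketch of the grouped-mode calculation follows. The function name `grouped_mode` and its parameter names are illustrative; the figures are those of the worked solution (modal class 60-70 with frequency 12, neighbouring frequencies 8 and 9, class width 10).

```python
def grouped_mode(lower, f1, f0, f2, width):
    """Mode = lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width.

    f1 is the modal-class frequency; f0 and f2 are the frequencies of the
    classes just before and just after it. Assumes uniform class intervals.
    """
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

mode = grouped_mode(lower=60, f1=12, f0=8, f2=9, width=10)
print(round(mode, 1))   # 65.7
```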
3.3. Measures of dispersion
In the preceding section we have seen the measures of central tendency. But these alone are not
enough to understand the characteristics of the data we have collected. It is also important
to know the extent of variation among the data. To describe a data set, we therefore use measures
of variation in addition to measures of central tendency.
Literal meaning of dispersion is scatter or spread. Dispersion is the degree of the scatter or
variation of the variables about a central value. A measure of variation is designed to state the
extent to which the individual measures differ on an average from the mean. This section
discusses the methods used in measuring the extent of variations in the data we have collected.
          Values                          Mean
Set 1:   60  40  30  50  60  40  70        50
Set 2:   50  49  49  51  48  50  53        50
The two data sets given above have a mean of 50, but obviously set 1 is more “spread out” than
set 2. How do we express this numerically? The object of measuring this scatter or dispersion is
to obtain a single summary figure which adequately exhibits whether the distribution is compact
or spread out. Some of the commonly used measures of dispersion (variation) are: Range,
variance, and standard deviation.
1. Range
The simplest measure of spread/variation is the range. It is the crudest measure of dispersion.
The range is a measure of absolute dispersion and as such cannot be usefully employed for
comparing the variability of two distributions expressed in different units. The range does not
use all the available observations. It uses only two extreme values. The range is the difference
between the highest and lowest scores. I.e. it is simply the highest value minus the lowest value.
In our example the highest score is a mark of 90 and the lowest is 0. The range is therefore 90.
This measure is a little crude; it sets the boundaries to the scores but does not tell us anything
about their general spread. Indeed, even if our marks were evenly spread between 0 and 90 rather
than clustered in the 50s, our range would still be 90.
The problem is that the sum of the deviations from the mean is always zero. So, the average deviation
will always be zero. That is why the average deviation is never used.
Population Variance
So, to keep it from being zero, the deviation from the mean is squared and called the "squared
deviation from the mean". This "average squared deviation from the mean" is called the
variance.
Variance is the average of the squares of the distance each value is from the mean.
Population variance ($\sigma^2$) of N measurements is the sum of the squared deviations from
the mean divided by N.
The symbol for the population variance is $\sigma^2$ (read as sigma squared).
The formula for the population variance is:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of statistics is to estimate
the corresponding parameter, and that formula has the problem that, on average, the estimated
value is not the same as the parameter. To counteract this, the sum of the squares of the deviations
is divided by one less than the sample size. Sample variance ($S^2$) of n measurements is the sum of the
squared deviations from the mean divided by (n-1):
$S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$
2. Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square
root must be taken.
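The measures described in this section can be sketched with Python's standard `statistics` module. The sketch assumes that Set 1 and Set 2 given earlier each consist of seven values, with the final figure 50 in each row being the mean.

```python
import statistics

set1 = [60, 40, 30, 50, 60, 40, 70]
set2 = [50, 49, 49, 51, 48, 50, 53]

for name, data in (("Set 1", set1), ("Set 2", set2)):
    mean = statistics.mean(data)
    deviations = [x - mean for x in data]
    print(name, "mean:", mean, "sum of deviations:", sum(deviations))
    print("  range:", max(data) - min(data))
    print("  population variance (divide by N):", statistics.pvariance(data))
    print("  sample variance (divide by n - 1):", statistics.variance(data))
    print("  sample standard deviation:", statistics.stdev(data))
```

Both sets have mean 50 and deviations that sum to zero, but the variance and standard deviation of Set 1 are much larger, which expresses numerically that it is more spread out.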
Steps to compute the sample variance and standard deviation:
Example: compute the Range, Variance and Standard deviation for the following Sample data
5-1 4
S = 1.1402
Range = 6-3 = 3.
Unit 4 - Probability and probability distribution
4.1 Probability Theory: No doubt you are familiar with terms such as probability, chance and
likelihood. They are often used interchangeably. A manufacturer cannot be certain of
the future demand for his product. As everybody knows, our world is full of
uncertainty: no one knows exactly what will happen after a minute or an hour, but we
can estimate the chance that something will happen. The word probability or chance is very
commonly used in day-to-day conversation, and generally people have some idea of what it means.
Terms like possible, probable, or likely all have similar meanings.
4.1.1 Basic definitions: Probability can be defined as a measure of the likelihood that a
particular event will occur, or as a science of decision making with calculated risk in the face of
uncertainty. It is a numerical measure of such likelihood, with a value between 0 and 1. A
probability of zero indicates that the given event cannot occur, and a probability of one
indicates that the event is certain to occur.
Probability is a value between zero and one, inclusive, describing the relative possibility
(chance or likelihood) an event will occur.
Experiment: It is a process that leads to the occurrence of one and only one of several possible observations.
Example: Throwing a die and tossing a coin are examples of an experiment or trial.
Example
If we throw a die the outcomes are 1, 2, 3, 4, 5 and 6. Then S = {1, 2, 3, 4, 5, 6} is a sample space;
If we toss a coin then the outcomes are head (H) and tail (T). Then S = {H, T} is a sample space;
Event: An event is the collection of one or more outcomes of an experiment. Example
If we throw a die the outcomes are 1, 2, 3, 4, 5, and 6. The even-numbered outcomes are
2, 4, 6. Then A = {2, 4, 6} is called the event of even numbers;
Mutually exclusive events: Events are called mutually exclusive when the occurrence of one of them
means that none of the others can occur at the same time.
Example
If we toss a coin, the two outcomes head (H) and tail (T) are mutually exclusive events, because
either head (H) or tail (T) appears, not both at the same time.
Equally likely means that each outcome of an experiment has the same chance of
happening as any other.
Limitations:
The classical probability fails to define probability when the total number of possible
outcomes is infinite.
It is not always possible to enumerate all the equally likely outcomes.
$p = P(E) = \lim_{n \to \infty} \frac{m}{n}$, where m is the number of times the event E occurs in n trials.
Example: Suppose that an insurance company knows from past actual data that of all males 40
years old, about 60 out of every 100,000 will die within a one-year period.
Using this method, the company estimates the probability of death for that age group as:
$\frac{60}{100{,}000} = 0.0006$
Subjective probability
The probability that a person assigns to an event which is the possible outcomes of some
processes on the basis of his own judgment, beliefs and information about the processes is
known as subjective probability.
For example, one fine morning Mr. X may well be prepared for rain, but his friend Mr. Y may
not.
Properties of probability
Let E be an experiment. Also let S be a sample space associated with E. With each event A we
associate a real number, denoted by $P(A)$ and called the probability of A, satisfying the
following properties:
$0 \le P(A) \le 1$.
$P(S) = 1$.
If A and B are mutually exclusive events, $P(A \cup B) = P(A) + P(B)$.
Rules for computing probabilities
Probabilities of two or more events are computed by applying the rules of addition and multiplication.
Addition: $P(A \cup B) = P(A) + P(B)$ is the special rule of addition. It applies when the events are
mutually exclusive. Mutually exclusive means that when one event occurs, none of the other events
can occur at the same time, e.g. flipping a coin.
Example
If we toss a coin then what is the probability of head or tail?
Solution
Here there are two events, namely event A = H and event B = T, so that for a fair coin
$P(A \text{ or } B) = P(A) + P(B) = \frac{1}{2} + \frac{1}{2} = 1$
Joint probability: a probability that measures the likelihood that two or more events will happen
concurrently.
Addition rule: $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where the joint probability
$P(A \cap B)$ is also written P(AB).
Example
Mr. X feels that the probability that he will pass Mathematics is 2/3 (about 0.6667) and Statistics is
5/6 (about 0.8333). If the probability that he will pass both courses is 0.6, what is the probability that he
will pass at least one of the courses?
Solution
Let M and S be the events that he will pass the courses Mathematics and Statistics respectively.
The event $M \cup S$ (M or S) means that at least one of M or S occurs. Therefore,
P(he passes at least one of the courses) $= P(M \cup S)$
$= P(M) + P(S) - P(M \cap S)$
$= \frac{2}{3} + \frac{5}{6} - \frac{3}{5} = \frac{9}{10}$.
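The addition rule for events that are not mutually exclusive can be checked with exact fractions. This is a minimal Python sketch; the variable names are illustrative and the probabilities are those of the Mr. X example.

```python
from fractions import Fraction

p_math = Fraction(2, 3)    # P(M)
p_stat = Fraction(5, 6)    # P(S)
p_both = Fraction(3, 5)    # P(M and S) = 0.6

# P(M or S) = P(M) + P(S) - P(M and S)
p_at_least_one = p_math + p_stat - p_both
print(p_at_least_one, float(p_at_least_one))   # 9/10 0.9
```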
Complement rule
The complement rule is used to determine the probability of an event occurring by subtracting
the probability of the event not occurring from 1, i.e. $P(A) = 1 - P(\bar{A})$.
Example
Find $P(B)$.
Solution
We know that $P(B) = 1 - P(\bar{B}) = 1 - (0.025 + 0.075) = 0.90$.
Rule of multiplication: there are two rules of multiplications, Special rule of multiplication and
the general rule of multiplication.
Special rule of multiplication: the Special rule of multiplication requires that two events A and
B are independent. Two events are independent if the occurrence of one event does not alter the
probability of the occurrence of the other event.
Independence: the occurrence of one event has no effect on the probability of the occurrence
of the other event.
Special rule of multiplication: $P(A \text{ and } B) = P(A \cap B) = P(A) \times P(B)$
For three independent events A, B and C, the special rule of multiplication used to determine the
probability that all three events will occur is: $P(A \text{ and } B \text{ and } C) = P(A) \times P(B) \times P(C)$
Example
A company has two large computers. The probability that the newer one will break down in any
particular month is 0.05, and the probability that the older one will break down in any particular
month is 0.1. What is the probability that they will both break down in a particular month?
Solution
Let event A be that the newer one will break down and event B be that the older one will break down, so
that $P(A) = 0.05$ and $P(B) = 0.1$. Treating the two breakdowns as independent,
$P(A \text{ and } B) = P(A) \times P(B) = 0.05 \times 0.1 = 0.005$.
General rule of multiplication: if two events are not independent, they are referred to as
dependent. We use the general rule of multiplication to find the joint probability of two events
when the events are not independent.
Remark: if A and B have nonzero probabilities and the events are mutually exclusive, they are
necessarily dependent; that is, they cannot be independent. On the other hand, if A and B are not
mutually exclusive they may be independent or dependent.
Example
There are 10 rolls of film in a box, 3 of which are defective. Two rolls are to be selected one
after another. What is the probability of selecting a defective roll followed by another defective
roll?
Solution
The first roll of film selected from the box being found defective is event $D_1$, so $P(D_1) = \frac{3}{10}$.
The second roll selected being found defective is event $D_2$. Therefore, $P(D_2 \mid D_1) = \frac{2}{9}$, since,
after the first selection was found to be defective, only 2 defective rolls of film remained in the
box containing 9 rolls.
So the probability of two defectives is
$P(D_1 \text{ and } D_2) = P(D_1) \times P(D_2 \mid D_1) = \frac{3}{10} \times \frac{2}{9} = \frac{6}{90} \approx 0.07$.
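The two multiplication rules can be contrasted in a short Python sketch. The variable names are illustrative; the figures come from the two-computer example (independent events) and the film-roll example (dependent events, selection without replacement).

```python
from fractions import Fraction

# Special rule: P(A and B) = P(A) * P(B), valid when A and B are independent
p_both_break_down = 0.05 * 0.1
print(round(p_both_break_down, 3))     # 0.005

# General rule: P(D1 and D2) = P(D1) * P(D2 given D1), for dependent events
p_d1 = Fraction(3, 10)                 # 3 defective rolls among 10
p_d2_given_d1 = Fraction(2, 9)         # 2 defective rolls left among 9
p_two_defective = p_d1 * p_d2_given_d1
print(p_two_defective, round(float(p_two_defective), 2))   # 1/15 0.07
```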
Bayes’ Theorem: Bayes’ Theorem is a method of revising probability given that additional
information is obtained. For mutually exclusive and collectively exhaustive events:
Bayes’ Theorem: $P(A \mid B) = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(A') P(B \mid A')}$
P(A) is known as the prior probability. Prior probability is the initial probability based on the
present level of information. $P(A \mid B)$ is called the posterior probability. Posterior probability is
a revised probability based on additional information.
Example: One night, a speeding taxi struck a man as he crossed the street. An
eyewitness has testified that she thought the taxi (which did not stop) was blue. The man sued
the Blue Cab Company for his medical expenses. The city where the accident occurred has only
two taxi companies: Blue Cab and Green Cab. Green Cab has 85 percent of the taxis in the city.
At the trial, the man’s lawyer shows that the eyewitness is 80 percent reliable in identifying the
color of taxis. That is, she was able to identify correctly the color of taxis 80 percent of the time,
under conditions like those of the night of the accident. The lawyer concludes that it is extremely
likely that a Blue Cab hit the man.
Solution:
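Given: B = the taxi was Blue, G = the taxi was Green, E = the eyewitness thought that the taxi was blue.
$P(E \mid B) = 0.8$, $P(E \mid G) = 0.2$
$P(B) = 0.15$, $P(G) = 0.85$. Required: $P(B \mid E)$ = ?
$P(B \mid E) = \frac{P(B) P(E \mid B)}{P(B) P(E \mid B) + P(G) P(E \mid G)} = \frac{0.8 \times 0.15}{0.8 \times 0.15 + 0.2 \times 0.85} = 0.41$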
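The revision of a prior probability can be sketched in Python for the two-event case. The function name `posterior` and its parameter names are illustrative; the figures are those of the taxi example.

```python
def posterior(prior, likelihood, likelihood_complement):
    """Bayes' theorem for an event A and its complement A'.

    prior                 = P(A)
    likelihood            = P(B given A)
    likelihood_complement = P(B given A')
    """
    numerator = prior * likelihood
    evidence = numerator + (1 - prior) * likelihood_complement
    return numerator / evidence

p_blue_given_witness = posterior(prior=0.15, likelihood=0.8, likelihood_complement=0.2)
print(round(p_blue_given_witness, 2))   # 0.41
```

Even with an 80 percent reliable witness, the posterior probability stays below one half because Blue Cab taxis make up only 15 percent of the taxis in the city.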
i. Multiplication formula: if there are m ways of doing something and n ways of doing another
thing, there are m * n ways of both.
Example: Pioneer manufactures 3 models of stereo receivers, 2 cassette decks, 4 speakers and 3
CD carousels. When the 4 types of components are sold together, they form a “system”. How
many different systems can the electronics firm offer?
ii. Permutation formula: is applied to find the possible number of arrangements when there is
only one group of objects. A permutation is an arrangement in which the order of the objects
selected from a specific pool of objects is important.
Permutation formula: ${}_{n}P_{r} = \frac{n!}{(n - r)!}$, where n is the total number of objects and r is the
number of objects selected.
Example: A machine operator must make 4 safety checks before starting to machine a part. It
does not matter in which order the checks are made. How many different ways can the operator
make the checks?
Solution: ${}_{n}P_{r} = {}_{4}P_{4} = \frac{4!}{(4 - 4)!} = \frac{4!}{0!} = \frac{4 \times 3 \times 2 \times 1}{1} = 24$
iii. Combination formula: if the order of the selected objects is not important, any selection is
called a combination. The formula to count the number of r-object combinations from a set of n
objects is:
Combination formula: ${}_{n}C_{r} = \frac{n!}{r!(n - r)!}$
Example: A pollster selected 4 of 10 available people. How many different groups of 4 are
possible?
Solution: ${}_{10}C_{4} = \frac{10!}{4!(10 - 4)!} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{4!\,6!} = 210$
Permutation and Combination use a notation called n factorial. It is written n! and means the
product of n(n-1) (n-2) …(1). Example 5! = 5x4x3x2x1 = 120. Zero factorial, written 0! is 1.
That is, 0! = 1
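The three counting rules can be checked with Python's `math` module (`math.perm` and `math.comb` are available from Python 3.8). The variable names are illustrative; the figures are those of the stereo-system, safety-check and pollster examples.

```python
import math

# i. Multiplication formula: 3 receivers x 2 cassette decks x 4 speakers x 3 CD carousels
systems = 3 * 2 * 4 * 3

# ii. Permutation: orderings of 4 safety checks taken 4 at a time
check_orders = math.perm(4, 4)

# iii. Combination: groups of 4 people chosen from 10
groups = math.comb(10, 4)

print(systems, check_orders, groups)          # 72 24 210
print(math.factorial(5), math.factorial(0))   # 120 1
```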
A probability distribution is a listing of all the outcomes of an experiment and the probability
associated with each outcome.
Example
To begin our study of probability distributions, let’s go back to the idea of a fair coin. Suppose we
toss a fair coin twice. The possible outcomes are:
First toss   Second toss   Number of heads   Probability of the
                           on two tosses     four possible outcomes
T            T             0                 0.5*0.5 = 0.25
T            H             1                 0.5*0.5 = 0.25
H            T             1                 0.5*0.5 = 0.25
H            H             2                 0.5*0.5 = 0.25
Total                                        1.0
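The probability distribution of the number of heads in two tosses can be built by listing the outcomes. This is a minimal Python sketch assuming a fair coin, so that each of the four outcomes has probability 1/4; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # ('H','H'), ('H','T'), ('T','H'), ('T','T')
p_outcome = Fraction(1, 2) ** 2            # 0.5 * 0.5 for every outcome of a fair coin

distribution = Counter()
for outcome in outcomes:
    distribution[outcome.count("H")] += p_outcome   # accumulate by number of heads

for heads in sorted(distribution):
    print(heads, distribution[heads])
# 0 1/4
# 1 1/2
# 2 1/4
```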
Probability Distribution
Cauchy distribution
Example
Suppose we were examining the level of effluent in a variety of streams and we measured the
level of effluent in parts of effluent per million parts of water. We would expect quite a
continuous range of parts per million (ppm), all the way from very low levels in clear mountain
streams to extremely high levels in polluted streams. We would call the distribution of this
variable (ppm) a continuous distribution.
Discrete Random Variable: a random variable that can assume only certain clearly separated
values. It is usually the result of counting something or separated fractional or decimal values.
The expected value, mean or mathematical expectation of a random variable is the central
tendency measure of a random variable. Expected Value of a Random Variable: The
expected value of a discrete random variable x, denoted by E(x) or μ, is the weighted mean of the
possible values that the random variable can assume, where the weight attached to each
value is the probability that the random variable will assume this value. In other words,
$E(x) = \mu = \sum x \, P(x)$
$\sigma = \sqrt{\sigma^2}$
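The expected value as a probability-weighted mean can be sketched in Python. The distribution used is the number of heads in two tosses of a fair coin; the variance line uses the standard definition $\sigma^2 = \sum (x - \mu)^2 P(x)$, which is an assumption here because it is not stated in the text above.

```python
import math

# Number of heads in two tosses of a fair coin: value -> probability
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in distribution.items())                     # E(x)
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())   # assumed standard definition
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 4))   # 1.0 0.5 0.7071
```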
The binomial distribution is one of the most widely used probability distributions of a discrete
random variable. It describes discrete, not continuous, data resulting from an experiment
known as a Bernoulli process (or experiment). This distribution was first developed by the 17th
century Swiss mathematician Jacob Bernoulli.
Properties of Binomial Experiment
2. In each trial there are only two possible outcomes. We refer to one outcome as ‘success’ and
the other as ‘failure’.
3. The probability of a success on one trial is denoted by p and does not change from
one trial to another. The probability of a failure, denoted by q, which is equal to 1-p,
also does not change from trial to trial (stationary assumption);
4. Statistically, the trials are independent. This means the outcome of one trial does not affect the
outcome of other trials.
The probability of exactly r successes in n trials is
$P(r) = \binom{n}{r} p^r q^{n-r} = {}_{n}C_{r} \, p^r q^{n-r} = \frac{n!}{r!(n - r)!} p^r q^{n-r}$
(c) Exactly one boy
Solution
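Let us consider the event that a newly born child is a boy as a success in a Bernoulli trial with
probability of success $\frac{2}{5}$. Let the number of boys be a random variable X. Then X can take
values 0, 1, 2, 3, and 4.
$f\left(x; 4, \frac{2}{5}\right) = \binom{4}{x} \left(\frac{2}{5}\right)^{x} \left(\frac{3}{5}\right)^{4-x}$ for x = 0, 1, 2, 3, 4.
a) $P(\text{all boys}) = P(x = 4) = \binom{4}{4} \left(\frac{2}{5}\right)^{4} \left(\frac{3}{5}\right)^{4-4} = 0.0256$.
b) $P(\text{no boys}) = P(x = 0) = \binom{4}{0} \left(\frac{2}{5}\right)^{0} \left(\frac{3}{5}\right)^{4-0} = 0.1296$.
c) $P(\text{exactly one boy}) = P(x = 1) = \binom{4}{1} \left(\frac{2}{5}\right)^{1} \left(\frac{3}{5}\right)^{4-1} = 0.3456$.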
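The three binomial probabilities above can be reproduced with a short Python sketch. The function name `binomial_pmf` is illustrative; n = 4 and p = 2/5 are the values used in the solution.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r) = nCr * p^r * q^(n-r), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 4, 2 / 5
for r in (4, 0, 1):                      # all boys, no boys, exactly one boy
    print(r, round(binomial_pmf(r, n, p), 4))
# 4 0.0256
# 0 0.1296
# 1 0.3456
```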
Mean (Expected Value) and Variance of a Binomial Distribution
Mean (expected value) of a binomial distribution: $\mu = np$
Variance of a binomial distribution: $\sigma^2 = npq = np(1 - p)$
Standard deviation: $\sigma = \sqrt{npq}$
Example 1: Take the case of a packaging machine that produces 20 percent defective
packages. If we take a random sample of 10 packages, what are the mean (expected value) and
the standard deviation of the binomial distribution of that process?
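One way to work this example is to substitute n = 10 and p = 0.2 into the binomial mean and variance formulas. The following Python sketch does so; the variable names are illustrative.

```python
import math

n = 10      # sample size
p = 0.2     # probability that a package is defective
q = 1 - p

mean = n * p                   # mu = np
variance = n * p * q           # sigma^2 = npq
std_dev = math.sqrt(variance)  # sigma = sqrt(npq)

print(round(mean, 2), round(variance, 2), round(std_dev, 3))   # 2.0 1.6 1.265
```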
The hypergeometric distribution is closely related to the binomial probability distribution. But in
the hypergeometric probability distribution, the trials are not independent, so the probability of
success changes from trial to trial. The objective is to choose a random sample of n items out of a
population of N under the condition that once an item has been selected, it is not returned to the
population (sampling without replacement).
Properties of Hypergeometric Probability Distribution (Conditions for Hypergeometric
Probability Distribution)
a) The events are of two kinds only
b) The probability of success changes in each trial
c) Events are dependent (because selection is without replacement but the manner of
dependence is one of kind only)
d) The trials are done a fixed number of times
$P(x) = \frac{{}_{S}C_{x} \; {}_{N-S}C_{n-x}}{{}_{N}C_{n}}$
Where: N = Population size
S = Number of successes in population
n = Sample size
x = Number of successes in a sample
C = the symbol for a combination
Example 1: Of 10 new technologies, 4 are classified as inappropriate to
local conditions. What is the probability that 3 technologies randomly selected without
replacement will contain 2 inappropriate technologies?
The hypergeometric probability formula is $P(x) = \frac{\binom{S}{x} \binom{N-S}{n-x}}{\binom{N}{n}}$, with N = 10, S = 4, n = 3 and x = 2.
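The substitution for this example can be sketched in Python. The function name `hypergeom_pmf` is illustrative; its parameters follow the symbols N, S, n and x defined above.

```python
from math import comb

def hypergeom_pmf(x, N, S, n):
    """P(x) = (S C x) * (N-S C n-x) / (N C n)."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# 10 technologies, 4 inappropriate, sample of 3, exactly 2 inappropriate
print(hypergeom_pmf(x=2, N=10, S=4, n=3))   # 0.3
```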
3 The Poisson Probability Distribution
The Poisson distribution is named for its originator, Simeon Denis Poisson (1781 – 1840), a
Frenchman who developed the distribution from studies during the latter part of his lifetime.
This distribution is used to analyze the probability of small or improbable events
within a given time, like the number of accidents on a given road and the number of radiation
leakages within a given time. Other examples of the Poisson distribution include the distribution of
telephone calls going through a switchboard system in a given time, the demand of patients for
service at a health institution in a given time period, the arrivals of trucks and cars at a toll booth
in a given time period, and the number of accidents at an intersection within a given time period, etc.
1. The probability of an occurrence of the event is the same for any two intervals of
equal length.
2. The occurrence or non-occurrence of the event in any interval is independent of the
occurrence or nonoccurrence in any other interval.
The Poisson probability function is given by: $P(x) = \frac{\mu^{x} e^{-\mu}}{x!}$
Where: P(x) = Probability of x occurrences in an interval or in a group
μ = the expected value or the average number of occurrences
e = constant equal to 2.71828 …
Example: A certain restaurant has a reputation for good food. The restaurant management boasts
that on a Saturday night, groups of customers arrive at a rate of 15 groups every half an hour, on
average.
a) What is the probability that 5 minutes will pass with no groups of customers arriving?
b) What is the probability that 8 groups of customers will arrive in 10 minutes?
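One way to work both parts is to rescale the arrival rate to the interval in question and apply the Poisson formula. The following Python sketch assumes the rate of 15 groups per 30 minutes scales proportionally with the length of the interval; the function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

rate_per_minute = 15 / 30                              # 15 groups per half hour

p_none_in_5 = poisson_pmf(0, rate_per_minute * 5)      # a) mu = 2.5, x = 0
p_eight_in_10 = poisson_pmf(8, rate_per_minute * 10)   # b) mu = 5, x = 8

print(round(p_none_in_5, 4), round(p_eight_in_10, 4))  # 0.0821 0.0653
```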
A continuous random variable X is said to have a normal distribution if its density function is
given by
$f(x; \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}; \quad -\infty < x < \infty \qquad (1)$
The variable X whose density function is given in (1) is called a normal variate with parameters
$\mu$ and $\sigma^2$ and is denoted by $N(\mu, \sigma^2)$. The parameters $\mu$ and $\sigma^2$ are actually the mean and
variance of the distribution.
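If X is a normal variable with parameters $\mu$ and $\sigma^2$, then
$Z = \frac{X - \mu}{\sigma}$
is a standard normal variable with mean zero and variance unity (standard deviation 1). The density
function of Z is
$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}; \quad -\infty < z < \infty$
$P(a \le x \le b) = F(b) - F(a)$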
Example
A company produces light bulbs whose life times follows a normal distribution with mean 1200
hours and standard deviation 250 hours. If a light bulb is chosen randomly from the company’s
output, what is the probability that its life time will be between 900 and 1300 hours?
Solution
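$P(900 \le x \le 1300) = P\left(\frac{900 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{1300 - \mu}{\sigma}\right)$
$= P\left(\frac{900 - 1200}{250} \le z \le \frac{1300 - 1200}{250}\right)$
$= P(-1.2 \le z \le 0.4)$
$= P(z \le 0.4) - P(z \le -1.2)$
$= 0.65542 - 0.11507$
$= 0.54035$ (By using Normal table)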
Hence, the probability is approximately 0.54 that a light bulb will last between 900 and 1300
hours.
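The table lookup can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a minimal sketch; the variable names are illustrative and the parameters are those of the light bulb example.

```python
from statistics import NormalDist

lifetime = NormalDist(mu=1200, sigma=250)   # life time in hours

# P(900 <= x <= 1300) = F(1300) - F(900)
p = lifetime.cdf(1300) - lifetime.cdf(900)
print(round(p, 4))   # 0.5404
```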
Example
A very large group of students obtains test scores that are normally distributed with mean 60 and
standard deviation 15. What proportion of students obtained scores?
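a) $P(x < 85) = P\left(\frac{X - \mu}{\sigma} < \frac{85 - 60}{15}\right) = P(z < 1.67) = P(z \le 1.67)$
b) $P(x > 90) = P\left(\frac{X - \mu}{\sigma} > \frac{90 - 60}{15}\right) = P(z > 2) = 1 - P(z < 2) = 1 - P(z \le 2)$
c) $P(85 \le x \le 95) = P\left(\frac{85 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{95 - \mu}{\sigma}\right)$
$= P\left(\frac{85 - 60}{15} \le z \le \frac{95 - 60}{15}\right)$
$= P(1.67 \le z \le 2.33)$
$= P(z \le 2.33) - P(z \le 1.67)$
$= 0.99010 - 0.95254$
$= 0.03756$ (By using Normal table)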
Example
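The average daily sales of 500 branch offices were Tk. 150 thousand and the standard deviation was
Tk. 15 thousand. Assuming the distribution to be normal, indicate how many branches have
sales between a) Tk. 120 thousand and Tk. 145 thousand, b) Tk. 140 thousand and Tk. 165 thousand.
a) $P(120 \le x \le 145) = P\left(\frac{120 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{145 - \mu}{\sigma}\right)$
$= P\left(\frac{120 - 150}{15} \le z \le \frac{145 - 150}{15}\right)$
$= P(-2 \le z \le -0.33)$
$= P(z \le -0.33) - P(z \le -2)$
Hence, the expected number of branches having sales between Tk. 120 thousand and Tk. 145
thousand is $500 \times P(120 \le x \le 145)$.
b) $P(140 \le x \le 165) = P\left(\frac{140 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{165 - \mu}{\sigma}\right)$
$= P\left(\frac{140 - 150}{15} \le z \le \frac{165 - 150}{15}\right)$
$= P(-0.67 \le z \le 1)$
$= P(z \le 1) - P(z \le -0.67)$
Hence, the expected number of branches having sales between Tk. 140 thousand and Tk. 165
thousand is $500 \times P(140 \le x \le 165)$.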
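The expected number of branches in each range can be estimated numerically. This Python sketch uses `statistics.NormalDist` instead of a printed table, so its results may differ slightly from table-based answers that round z to two decimals; the function and variable names are illustrative.

```python
from statistics import NormalDist

sales = NormalDist(mu=150, sigma=15)   # daily sales in Tk. thousand
branches = 500

def expected_branches(low, high):
    """Number of branches times P(low <= x <= high)."""
    return branches * (sales.cdf(high) - sales.cdf(low))

print(expected_branches(120, 145))   # part a)
print(expected_branches(140, 165))   # part b)
```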
General rule: When the table gives the area between the mean and Z, if both Zs are on the same
side of the mean, then the area between them can be obtained by subtracting. And if the two Zs
are on opposite sides of the mean, then the area between them can be obtained by summing the
two values.
Inverse Use of the Standard Normal probability Table
This means to find the value of Z, which corresponds to a given probability (P) in the table.
Example
1. Z (p) = Z (0.4864) = 2.21
2. Z (p) = Z (0.4922) = 2.42
What you have to do is reverse the earlier procedure. First find the closest value to the given
probability in the body of the table. Reading across to the left of that row gives the first digit and
first decimal of Z, and reading up to the top of that column gives the second decimal of the Z value.
Given a probability we can find the value of Z, then change Z to an X value using the formula
$X = \mu + Z\sigma$.
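The inverse lookup can be sketched with `statistics.NormalDist.inv_cdf`. The sketch assumes, as the two examples suggest, that the table gives the area between the mean (Z = 0) and Z, so 0.5 is added to obtain the cumulative probability; the function name is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def z_from_table_area(area):
    """Z whose area between 0 and Z equals `area` (assumed table convention)."""
    return std_normal.inv_cdf(0.5 + area)

print(round(z_from_table_area(0.4864), 2))   # 2.21
print(round(z_from_table_area(0.4922), 2))   # 2.42
```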
UNIT 5: SAMPLING AND SAMPLING DISTRIBUTIONS
5.1 SAMPLING THEORY
5.1.1 Basic definitions
1. Population/Universe (N): It refers to the aggregate of statistical information in which all
members are covered by an investigation or enquiry, that is, the total number of elements under
investigation. For example: marks obtained by students in class 12 in Ethiopia.
2. Sampling frame: the list or procedure for defining the population.
3. Sample (n): It refers to the selection of a part of the population with a view that it represents
the whole population, that is, a subset of the whole population.
4. Statistic: a number that describes some attribute of the sample. Ex. The average income of the
residents. (You can then use this to get an estimate of the population parameter.)
5.1.2 Need for sample
a. To get the maximum information about the population with minimum effort
b. By using sampling it saves our time and money because we do not collect data from whole
population.
c. Destruction of test units: if we want to know the quality of chocolates and we check all the
chocolates one by one, all of them may be wasted or destroyed. This is
one of the reasons we use samples.
d. The physical impossibility of checking all items in the population.
e. Give accurate and reliable results.
c. For quality control during production.
1) Random or Probability Sampling: Every subject in the population has the same chance of
getting selected. Therefore, the sample group tends to possess the same characteristics as the larger
population.
a. Simple Random Sample: (random selection) procedure that generates numbers or cases
strictly on the basis of chance. Ex: Lottery method. In this method each item has the equal
chance of being selected.
b. Systematic Sampling: under this method items are selected at a fixed
distance/interval. Example: every 5th item from the population is selected.
c. Stratified Sampling: under this method the population is divided into different groups and
then from the groups sample will be identified randomly. Example population is divided into two
groups i.e. male and female and then from male and female sample will be selected randomly.
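The three probability sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered items, the male/female split and the function names are hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Lottery method: every item has an equal chance of being drawn."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th item, beginning at index `start` (0 <= start < k)."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each group (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: items numbered 1..100.
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
every_5th = systematic_sample(population, k=5, start=4)   # items 5, 10, 15, ...

# Hypothetical strata: the first 60 items are "male", the rest "female".
strata = {"male": population[:60], "female": population[60:]}
by_group = stratified_sample(strata, n_per_stratum=5, seed=1)
```

The seed is fixed only so that the sketch gives the same draw each time it is run; in a real survey the draw would not be seeded.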
2) Non-random or Non Probability Sampling: used when probability sampling is too
expensive, or when exact representation of population is not important to study, or when the
population cannot be defined.
a. Judgment/Purposive sampling: under this method selection of sample is based on individual
judgment.
b. Convenience sampling: the sample consists of whoever is easiest to reach, e.g. standing on a
street corner and surveying passers-by.
c. Snowball sampling (uses referrals): one member of the sample is identified, who then
identifies another person who could take part in the study, and so on.
d. Quota sampling: Under this method a fixed quota is assigned. Ex: 50 salaried persons in the
age group of 25-30 years. Within this quota, the selection of sample items depends entirely on
personal judgment.
Non-sampling error: a statistical error caused by human error to which a specific statistical
analysis is exposed. These errors can include, but are not limited to, data entry errors, biased
questions in a questionnaire, biased processing/decision making, inappropriate analysis
conclusions and false information provided by respondents.
For example, a survey of high school students to measure teenage use of illegal drugs will be a
biased sample because it does not include home schooled students or dropouts.
A sample is also biased if certain members are underrepresented or overrepresented relative
to others in the population.
For example, distributing a questionnaire at the end of a 3-day conference is likely to include
more people who are committed to the conference, so their views would be overrepresented.
Selecting a sample using a telephone book will underrepresent people who cannot afford a
telephone, do not have a telephone, or do not list their telephone numbers.
*Sampling bias can occur any time your sample is not a random sample*
Why does it matter?
Sampling bias means that the data you collect may not be accurate or represent the group.
How can we know if the sample is biased?
Sometimes you can identify sampling bias just by being very thoughtful and comparing the
characteristics of respondents in your sample to what you know about the population in general.
Think about the demographic characteristics that might have an important relationship to their
answers. For example, if you know that gender is an important variable, and you know that the
population includes 50% males and 50% females, then the sample needs to include the same
proportions. If the sample includes only 20% males, your results are likely to be biased because you
don’t have enough responses from men.
Parameter: A parameter is a statistical measure based on each and every item of the population.
SYMBOLS USED
Population Sample
Usually parameters are unknown, and sample statistics are used to estimate them.
Example: suppose a university class has 16 students and the professor wants to know the average age
of the students. Suppose the population mean µ (mu) is unknown to the professor, who knows
the ages of only 3 students (n = 3): 20, 35 and 40.
(20 + 35 + 40)/3 = 31.67
The mean of this sample of 3 is 31.67, but we want to know the population mean µ (mu).
Sample means vary from sample to sample in repeated sampling. For large samples, the sampling
distribution of the sample mean is approximately normal.
- The sampling distribution of the sample mean is a probability distribution consisting of a list of all possible
sample means of a given sample size selected from a population, and the probability of occurrence associated
with each sample mean. It is also called the distribution of sample means.
Consider a population with a mean of µ and a standard deviation of σ. If we draw samples of size n from the
total population (N), the number of possible samples of the same size that can be drawn from this population
is given by $^{N}C_{n} = \binom{N}{n}$. Hence, the sample means that we can obtain differ according to the elements that
make up each sample.
Where: N= population size, n = sample size, X = the number of elements in the population or
sample that possess a specific characteristic.
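As a small illustration of the idea above, the sketch below lists every possible sample of size n from a tiny population, computes each sample mean, and attaches a probability to each distinct mean. The population of four ages is hypothetical and was chosen only to keep the list short.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population (N = 4) and sample size (n = 2).
population = [20, 35, 40, 25]
n = 2

# All possible samples of size n drawn without replacement: NCn of them.
samples = list(combinations(population, n))
number_of_samples = comb(len(population), n)

# The mean of each sample, kept as exact fractions.
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct mean with its probability of occurrence.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(samples)) for m, c in counts.items()}

# The mean of all sample means equals the population mean.
mean_of_means = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(population), len(population))

for m, p in sorted(distribution.items()):
    print(float(m), p)
```

With this population the six possible samples give five distinct means, and the mean of the sample means coincides with the population mean.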
In statistics and probability theory, the standard deviation (represented by the Greek letter
sigma, σ) shows how much variation or dispersion from the average exists. A low standard
deviation indicates that the data points tend to be very close to the mean (also called expected
value); a high standard deviation indicates that the data points are spread out over a large range
of values.
What is standard error?
Standard error of the given statistic is the standard deviation of sampling distribution of that
statistic.
Sample size Standard error
Increase Decrease
Decrease Increase
Standard error of mean (when the population standard deviation is known, whether the sample size is large or small):
S.E.($\bar{x}$) = $\sigma / \sqrt{n}$
where σ = population standard deviation, n = sample size
Standard error of proportion (when the population proportion is known):
S.E.(P) = $\sqrt{PQ / n}$
where P = population proportion, Q = 1 − P, n = sample size
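The two standard error formulas can be checked numerically. The sketch below is a minimal illustration; the values of σ, P and n are hypothetical and serve only to show that the standard error falls as the sample size rises.

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_proportion(P, n):
    """Standard error of a proportion: sqrt(P * Q / n), with Q = 1 - P."""
    return sqrt(P * (1 - P) / n)

# Hypothetical inputs: sigma = 10 and P = 0.5, for growing sample sizes.
for n in (25, 100, 400):
    print(n, se_mean(10, n), se_proportion(0.5, n))
```

Each fourfold increase in n halves both standard errors, which is the "sample size increases, standard error decreases" relation in the small table above.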
Introduction:
As its name suggests, the objective of estimation is to determine the approximate value of
a population parameter on the basis of a sample statistic. For example, the sample mean
is employed to estimate the population mean.
Statistical estimation is also the process of estimating the value of a parameter from
information obtained from a sample.
We refer to the sample mean as the estimator of the population mean. Once the sample mean has
been computed, its value is called the estimate/point estimate. Parameters are estimated with
sample statistic values: $\bar{x} \rightarrow \mu$.
6.1 Basic Concepts of Statistical Estimation:
1. Estimation: is the process of estimating various unknown population parameters from sample
statistics.
3. Estimate/point estimate: the numerical value of an estimator. It is the value taken by the
estimator as an estimate of the population parameter.
Example: sample mean $\bar{x}$ = 100, sample S.D = 8 minutes, sample proportion = 5%.
a) Point estimation
b) Interval estimation
A) Point Estimation:
- is the process of using a single value to estimate a population parameter. A point estimate is a single
number computed from a sample.
- Example: the elements in a random sample are: 1, 2, 4, 5, 7, and 11. Then, compute the
following:
Solution:
a) $\bar{x} = \sum X / n = 30/6 = 5$, which is the required point estimate of the population mean.
b) $\hat{\sigma} = s = \sqrt{\dfrac{\sum (X - \bar{x})^2}{n - 1}}$
X     (X − x̄)    (X − x̄)²
1     −4         16
2     −3         9
4     −1         1
5     0          0
7     2          4
11    6          36
Total            66
so $s = \sqrt{66/5} \approx 3.63$.
c) $\hat{\sigma}_{\bar{x}} = s / \sqrt{n} = 3.63/\sqrt{6} \approx 1.48$
$S_p = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{pq}{n}}$
where $S_p$ = sample standard error of the proportion
p = sample proportion of success
q = 1 − p = sample proportion of failure
n = sample size
Example: let an even number be a success and suppose a sample of 200 numbers selected randomly
from a population contains 120 even numbers. Calculate the point estimate of the standard error of
the proportion.
Solution: what is required is the sample standard error of the sample proportion ($S_p$), with p = 120/200 = 0.6 and q = 0.4:
Therefore, $S_p = \sqrt{pq/n} = \sqrt{0.6 \times 0.4 / 200} = 0.0346$ is a point estimate of the standard error of the
sample proportion.
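The point estimates worked out above can be reproduced with Python's standard library. This is a sketch for checking the arithmetic; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Random sample from the point estimation example.
sample = [1, 2, 4, 5, 7, 11]
n = len(sample)

x_bar = mean(sample)        # point estimate of the population mean
s = stdev(sample)           # sample standard deviation (divisor n - 1)
se_mean = s / sqrt(n)       # point estimate of the standard error of the mean

# Proportion example: 120 even numbers in a sample of 200.
p = 120 / 200
q = 1 - p
se_prop = sqrt(p * q / 200)  # point estimate of the standard error of the proportion

print(x_bar, round(s, 4), round(se_mean, 4), round(se_prop, 4))
```

Note that `statistics.stdev` divides by n − 1, which matches the formula used in part b) of the example.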
Point Estimates
We can estimate a population parameter with a sample statistic (a point estimate):
Quantity       Parameter     Statistic
Mean           µ             $\bar{x}$
Proportion     P             $\hat{p}$
Variance       σ²            S²
Differences    µ1 − µ2       $\bar{x}_1 - \bar{x}_2$
               P1 − P2       $\hat{p}_1 - \hat{p}_2$
Point and Interval Estimates:
A point estimate is a single number.
A confidence interval provides additional information about variability.
An interval estimate describes a range of values within which a parameter might lie. Suppose that, based on the
sample information, an investigator predicted that the mean of a given population is between 6 and 7; this is what we
call an interval estimate.
Example: the sample mean $\bar{x}$ = 50 is a point estimate.
"I am 95% confident that µ is between 40 and 60" is an example of an interval estimate.
Confidence Intervals:
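Confidence Interval Estimation for the population Mean (µ) when σ is known and n ≥ 30
$\bar{x} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$
2) The probability that the population mean is within this range is not 100% (it is possible,
though unlikely, to draw a sample whose mean lies far from µ).
General Formula
$\mu - z\,\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z\,\frac{\sigma}{\sqrt{n}} \;\Longrightarrow\; \bar{x} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$
or, more compactly,
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$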
Margin of Error:
Margin of Error (e) is the amount added and subtracted to the point estimate to form the
confidence interval.
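$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
The margin of error is the term added to and subtracted from the sample mean:
$e = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Data variation, σ: e decreases as σ decreases
Sample size, n: e decreases as n increases
Level of confidence, 1 − α: e decreases if 1 − α decreases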
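The effect of each of these three factors on the margin of error can be seen by changing one input at a time. The sketch below uses hypothetical baseline values (σ = 10, n = 100, 95% confidence) and takes the z value from Python's `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """e = z_{alpha/2} * sigma / sqrt(n) for a known population sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)

# Hypothetical baseline, then one factor changed at a time.
base = margin_of_error(sigma=10, n=100, confidence=0.95)
less_variation = margin_of_error(sigma=5, n=100, confidence=0.95)
bigger_sample = margin_of_error(sigma=10, n=400, confidence=0.95)
lower_confidence = margin_of_error(sigma=10, n=100, confidence=0.90)

print(base, less_variation, bigger_sample, lower_confidence)
```

All three changes give a smaller margin of error than the baseline, in line with the three relations listed above.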
Example1: A sample of 11 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is .35 ohms. Determine
a 95% confidence interval for the true mean resistance of the population.
Solution: $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
= 2.20 ± 1.96 (0.35/$\sqrt{11}$)
= 2.20 ± 0.2068
1.9932 ≤ µ ≤ 2.4068
Interpretation: we are 95% confident that the true mean resistance is between 1.9932
and 2.4068 ohms.
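The calculation in Example 1 can be wrapped in a small function and reused for other z-based intervals. This is a sketch, not the notes' own procedure: the function name is illustrative, and the z value comes from `statistics.NormalDist` (about 1.95996 for 95%) instead of the rounded table value 1.96, so the last decimal place may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sigma / sqrt(n)          # margin of error
    return x_bar - e, x_bar + e

# Example 1: 11 circuits, mean 2.20 ohms, sigma 0.35 ohms, 95% confidence.
circuits = z_interval(2.20, 0.35, 11, 0.95)

# A second case: mean 171.2, sigma 10, n = 100, 95% confidence.
heights = z_interval(171.2, 10, 100, 0.95)

print(circuits)
print(heights)
```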
Example 2: To estimate the mean height of Korean men, 100 men are chosen at random.
The sample mean is 171.2. Estimate the mean height of Korean men with 95% confidence.
(The population standard deviation is assumed known and equal to 10.)
$\bar{x} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$, i.e. 171.2 ± 1.96 (10/$\sqrt{100}$) = 171.2 ± 1.96
Interpretation: The mean belongs to the range [169.24, 173.16] with 95% confidence.
Example 3: A credit union wants to estimate the mean amount of outstanding loans. Past
experience reveals that the standard deviation is 250 birr. Determine a 98% confidence interval
estimate for the mean of all outstanding loans (population mean) if a random sample of 100
outstanding loans has a sample mean of 1,950 birr.
σ = 250
n = 100
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 250/\sqrt{100} = 25$, and $z_{\alpha/2}$ = 2.33 for 98% confidence
Therefore, the interval is = $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$
= 1950 ± 2.33 (25)
= 1950 ± 58.25
Interpretation: the credit union can say with 98% confidence that the mean amount of
outstanding loans is between birr 1891.75 and 2008.25.
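Confidence level: the confidence with which the interval will contain the unknown population parameter.
It is often described as the probability that the population parameter falls somewhere within the interval.
- Denoted by (1 − α)100%
$\bar{x} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$
Confidence level    Confidence coefficient (1 − α)    z value
99%                 .99                               2.58
99.8%               .998                              3.09
99.9%               .999                              3.29
Width of interval:
LCL = $\bar{x} - z\,\frac{\sigma}{\sqrt{n}}$, UCL = $\bar{x} + z\,\frac{\sigma}{\sqrt{n}}$
Lower limit ≤ population mean ≤ upper limit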
Example 4: Advertisement sponsors want to know the average number of hours children spend watching
TV. A survey of 10 kids keeps track of the number of hours per week; the sample mean is 29, and we know
from past experience that the population s.d. = 8.0 hours. The data follow a normal distribution.
Find an estimate of the average number of hours kids watch TV, with a 95% confidence interval.
A: parameter to be estimated = µ
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 29 \pm z_{.025}\,\frac{8}{\sqrt{10}} = 29 \pm 1.96\,\frac{8}{\sqrt{10}} = 29 \pm 4.958 = [24.04, 33.96]$
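Expected range of the sample mean:
$\mu - z\,\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z\,\frac{\sigma}{\sqrt{n}}$
→ When we instead build an interval centred on the sample mean, µ falls within the new range
with 95% confidence:
$\bar{x} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$
This formula works only when $\bar{x}$ follows a normal distribution and σ is known.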
Assumptions:
When the original variable is normally distributed and σ is known, the standard normal
distribution can be used to find confidence intervals regardless of the size of the sample.
When n ≥ 30, the distribution of means will be approximately normal even if the original variable's
distribution departs from normality.
Also, if n ≥ 30 and σ is unknown, S can be substituted for σ in the formula for confidence
intervals, and the standard normal distribution can be used to find confidence intervals for means.
Example: A sample of 50 days showed that a fast-food restaurant served an average of 182
customers during lunch time (11:00 AM to 2:00 PM). The standard deviation of the sample was
8. Find the 90% confidence interval for the population mean.
Step 2: Find $z_{\alpha/2}$, the Z value.
α = 1 − 0.90 = 0.10
α/2 = 0.05
Subtract 0.05 from 0.5000 to get 0.4500; the Z value corresponding to an area of 0.4500 is Z = 1.65.
$\bar{x} - z_{\alpha/2}(S/\sqrt{n}) < \mu < \bar{x} + z_{\alpha/2}(S/\sqrt{n})$ (here S is used in place of σ, since σ is unknown and
n ≥ 30)
$182 - 1.65\,(8/\sqrt{50}) < \mu < 182 + 1.65\,(8/\sqrt{50})$
= 180.1 < µ < 183.9
Hence, one can be 90% confident that the true population mean is between 180 and 184, or 182 ± 2.
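With σ known:
$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
With σ unknown:
$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Two things changed:
1) σ → s (the sample standard deviation)
2) z → t (from the t-distribution)
Assumptions:
$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$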
6.3 Student’s t Distribution
When σ is known and n ≥ 30, or when σ is unknown and n ≥ 30, the standard normal
distribution should be used to find confidence intervals for the population mean.
The t distribution must be used when σ is unknown, the sample size is less than 30 and the variable is
normally or approximately normally distributed.
- t-distributions are bell-shaped and symmetrical about the mean, but have ‘fatter’ tails
than the normal.
- Note: t → z as n increases
Degrees of Freedom (df): the number of observations that are free to vary after the sample mean has been
calculated; d.f. = n − 1.
Formula for finding a confidence interval for the population mean when σ is unknown and
n < 30:
$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Example 1:
d.f. = n − 1 = 24, so $t_{\alpha/2,\,n-1} = t_{.025,\,24} = 2.0639$
$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 50 \pm (2.0639)\,\frac{8}{\sqrt{25}}$
46.698 ≤ µ ≤ 53.302
Example 2:
Advertisement sponsors want to know the number of hours children spend watching TV. A survey of 10
kids keeps track of the number of hours per week; the mean is 29 and the standard deviation is 8.2. The
data follow a normal distribution. Find an estimate of the average number of hours N.A. kids watch
TV, with a 95% confidence interval.
A: parameter to be estimated = µ
$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 29 \pm t_{0.025,\,9}\,\frac{8.2}{\sqrt{10}} = 29 \pm 2.262\,\frac{8.2}{\sqrt{10}} = 29 \pm 5.866 = [23.13, 34.87]$
vs. [24.04, 33.96] in Example 4 (σ known)
This is a slightly wider interval than the z-based estimate (it reflects the uncertainty of the unknown σ).
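The t-based interval can be sketched in the same way as the z-based one. Python's standard library has no t-distribution quantile function, so in this sketch the critical value is read from a t table and passed in by hand; the function name and the `t_crit` parameter are illustrative assumptions.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    e = t_crit * s / sqrt(n)         # margin of error
    return x_bar - e, x_bar + e

# Example 1: x_bar = 50, s = 8, n = 25, t_{.025, 24} = 2.0639.
example_1 = t_interval(50, 8, 25, t_crit=2.0639)

# Example 2: x_bar = 29, s = 8.2, n = 10, t_{.025, 9} = 2.262.
example_2 = t_interval(29, 8.2, 10, t_crit=2.262)

print(example_1)
print(example_2)
```

If a statistics package with a t distribution is available, the table lookup can be replaced by its quantile function; that substitution is left out here to keep the sketch dependency-free.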
t-distribution table
t Value:
If σ is known: Use normal distribution.
The required sample size can be found to reach a desired margin of error (e) and level of
confidence (1 − α) using the margin of error formula, $e = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.
Required sample size, σ known:
$n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{e^2} = \left(\dfrac{z_{\alpha/2}\,\sigma}{e}\right)^2$
Example: If σ = 45, what sample size is needed to be 90% confident of being correct within ± 5?
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\dfrac{1.645\,(45)}{5}\right)^2 = 219.19$
So the required sample size is n = 220 (always round up).
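The sample size formula, including the "always round up" rule, can be written as a short function. This is a sketch; the function name is illustrative, and the z value is taken from `statistics.NormalDist` (about 1.6449 for 90% confidence) instead of the table value 1.645, which does not change the rounded-up answer here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / e) ** 2)   # always round up

# Example from the text: sigma = 45, margin of error 5, 90% confidence.
n_required = sample_size_for_mean(sigma=45, e=5, confidence=0.90)
print(n_required)
```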
An interval estimate for the population proportion (p) can be calculated by adding an
allowance for uncertainty to the sample proportion ($\hat{p}$).
Recall that the distribution of the sample proportion is approximately normal if the
sample size is large, with standard deviation:
$\sigma_p = \sqrt{\dfrac{p(1 - p)}{n}}$
We will estimate this with sample data: $s_p = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
(When n > 20, np > 5, and n(1 − p) > 5)
Confidence interval endpoints: Upper and lower confidence limits for the population
proportion are calculated with the formula
$\hat{p} \pm z_{\alpha/2}\,\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Where: $z_{\alpha/2}$ is the standard normal value for the desired level of confidence, $\hat{p}$ is the sample proportion
and n is the sample size.
Example: A survey is conducted to find what percentage of CEOs of mid-size companies have an
MBA degree. Of the 344 CEOs surveyed, 97 have an MBA degree. Estimate the proportion of
MBA degree holders among the CEOs of all mid-size companies. (Confidence level 95%)
Answer:
Example: A random sample of 100 people shows that 25 are left-handed. Form a 95%
confidence interval for the true proportion of left-handers.
Solution:
$\hat{p}$ = 25/100 = .25
$s_p = \sqrt{\hat{p}(1 - \hat{p})/n} = \sqrt{.25(.75)/100} = .0433$
.25 ± 1.96 (.0433), i.e. .1651 ≤ p ≤ .3349
Increases in the sample size reduce the width of the confidence interval.
Example:
If the sample size in the above example is doubled to 200, and if 50 are left-handed in the
sample, then the interval is still centered at .25, but it narrows to .19 ≤ p ≤ .31.
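The proportion interval can be computed with a few lines of Python, which also makes it easy to see the interval narrow as n grows. This is a sketch with an illustrative function name; the z value comes from `statistics.NormalDist`, so results may differ from a table-based hand calculation in the last decimal place. The same function is applied to the CEO survey counts (97 out of 344) from the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - e, p_hat + e

left_100 = proportion_interval(25, 100)     # 25 left-handers out of 100
left_200 = proportion_interval(50, 200)     # same proportion, doubled sample
ceo = proportion_interval(97, 344)          # CEO survey counts

for name, (low, high) in [("n=100", left_100), ("n=200", left_200), ("CEO", ceo)]:
    print(name, round(low, 4), round(high, 4))
```

Doubling the sample while keeping the same sample proportion shrinks the width by a factor of √2, not 2, because n enters the formula under a square root.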
Define the margin of error: $e = z_{\alpha/2}\,\sqrt{\dfrac{p(1 - p)}{n}}$
Solve for n:
$n = \dfrac{z_{\alpha/2}^2\; p(1 - p)}{e^2}$
p can be estimated with a pilot sample, if necessary (or conservatively use p = .50).
Example:
E = .03
So use n = 451
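The formula for n can be sketched as a function. The inputs below are assumptions, not values taken from the example: the notes give only E = .03 and the answer n = 451, so the sketch first uses the conservative planning value p = .50 at 95% confidence, and then a hypothetical pilot estimate p = .12, which is simply one value that happens to give 451 at 95% confidence.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(e, p=0.5, confidence=0.95):
    """n = z^2 * p * (1 - p) / e^2, rounded up to the next whole unit."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# Conservative choice: p = .50 maximises p * (1 - p), hence the sample size.
n_conservative = sample_size_for_proportion(e=0.03, p=0.5, confidence=0.95)

# Hypothetical pilot estimate (an assumption, not stated in the notes).
n_pilot = sample_size_for_proportion(e=0.03, p=0.12, confidence=0.95)

print(n_conservative, n_pilot)
```

The gap between the two results shows why a pilot estimate of p, when one is available, can reduce the required sample size considerably.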
Introduction
Meaning of hypothesis: Hypothesis is an assumption or an informed guess made
about a population characteristic. It can also be defined as an unproven statement or
proposition about something under investigation.
Hypothesis: is statement about population parameter. Or it is an assumption/ guess
about population parameters.
Hypothesis testing starts with formulation of a hypothesis and ends with a decision to
accept or reject the hypothesis.
There is a five-step procedure that systematizes hypothesis testing:
Step 1: state the null hypothesis (H0) and alternate hypothesis (HA)
The first step is to state the hypothesis being tested, that is, the null hypothesis. It is
designated H0, where H stands for hypothesis and the subscript zero implies “no
difference.”
Null hypothesis:
- It is a statement that is not rejected if our sample data fail to provide convincing
evidence that it is false.
Alternate hypothesis: a statement that is accepted if the sample data provide enough
evidence that the null hypothesis is false. It is written HA. It describes what you will
conclude if you reject the null hypothesis.
In summary:
The first step of statistical testing is to convert the research question into null and
alternative forms.
H0 is a statement of “no difference”, i.e. H0: μ1 = μ2 (null hypothesis)
H1: μ1 ≠ μ2 (in case of a two-tail test), alternate hypothesis
H2: μ1 > μ2 (in case of a right-tail test), alternate hypothesis
H3: μ1 < μ2 (in case of a left-tail test), alternate hypothesis
Step 2: selecting the level of significance
Level of significance is a measure of the degree of risk that a researcher might reject the null
hypothesis when the null hypothesis is in fact true.
The choice of the level of significance should be made before we collect the data. The most
common level is .05 or 5%, although .01 or 1% is also widely used. A 5% level of significance
implies that there is a 5% probability that we may wrongly conclude that there is a difference
between the sample statistic and the hypothesized population parameter, when there is no
difference between them.
Critical region/rejection region: If the value of the test statistic falls in the critical region, the null
hypothesis is rejected. If the value of the test statistic does not fall in the critical region, the null
hypothesis is not rejected.
Type I error: A Type I error occurs when the researcher rejects a null hypothesis when
it is true. The probability of committing a Type I error is called the significance level.
This probability is also called alpha, and is often denoted by α.
Type II error: A Type II error occurs when the researcher fails to reject a null
hypothesis when it is false. The probability of committing a Type II error is called Beta,
and is often denoted by β. The probability of not committing a Type II error is called the
Power of the test.
There are many test statistics. In this chapter we will use only Z and t.
Test statistic: a value determined from sample information and used to determine whether to
reject the null hypothesis.
The z value associated with a sample mean is
$z = \dfrac{\text{sample mean} - \text{population mean}}{\text{standard error}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
In this stage the test statistic is used to test whether we may accept or reject the null hypothesis.
There are different test statistics; some of them are the z-test, t-test and chi-square test.
The test statistic is used as a guide in decision making regarding acceptance or rejection of H0.
Decision rule: a statement of the conditions under which the null hypothesis is rejected
and the conditions under which it is not rejected.
Critical value: the dividing point between the region where the null hypothesis is
rejected and the region where it is not rejected.
In general,
If Calculated value > Table value (H0 rejected)
If Calculated value < Table value (H0 not rejected)
If p-value < 0.05, a significant relation exists between the dependent & independent
variable, i.e. the result is due to some assignable cause (H0 rejected)
If p-value > 0.05, no significant relation exists between the dependent &
independent variable, i.e. the result is due to chance only (H0 not rejected).
Or
A simple rule to remember: when the p-value is low, i.e. less than .05, the null hypothesis is rejected (when p
is low, H0 must go).
Based on sample information, compute the test statistic (i.e. z or t) and check it against the decision
rule. Finally, make the decision.
The null hypothesis is always about a population parameter, not about a sample statistic:
a null hypothesis stated in terms of µ (for example, a statement comparing µ with 3) is correct, while one stated in terms of $\bar{x}$ is not.
The alternative hypothesis:
◦ e.g.: The average number of TV sets in U.S. homes is less than 3 (HA: µ < 3)
◦ Is generally the hypothesis that is believed (or needs to be supported) by the researcher
Level of Significance, α
Types of error: These are errors that are committed in making decisions.
In hypothesis testing, sample evidence is used to test the null hypothesis. If the sample evidence convinces us that
the null hypothesis has a very small chance of being correct, the hypothesis is unreasonable and hence should
be rejected. But if it has a greater chance of being true, then it should be accepted. Either way, we cannot be
100% sure about our conclusion, as it is based on sample evidence. That means it is possible to accept a
false hypothesis or reject a true hypothesis. Our conclusion is erroneous if a true hypothesis is rejected or if
a false one is accepted. There are therefore two possible errors: the first is to reject a
true hypothesis, which is called a type one error (α), and the second is to accept a false hypothesis, which is called a type
two error (β). A type one error is committed only if a true hypothesis is rejected, and a type
two error only when a false hypothesis is accepted.
Neither type of error is desirable as far as the reliability of the conclusion is concerned. Ideally,
statisticians would avoid both errors entirely. However, it is not possible to rule out the possibility of
making these errors as long as our conclusion is based on sample information. Hence, the objective should be to set
the chance of making errors at a low value. There is, however, one thing that must be taken into consideration
in doing so. Setting both kinds of error at a low value at the same time is generally not possible, as there is a tradeoff between
the two types. That means when we try to set the possibility of committing a type 1 error at a low value, the
possibility of committing type two errors will be higher. This by no means that 1 rather it means the
Type I Error: The researcher rejects a null hypothesis that actually is true. This error is
shown by alpha (α).
Type II Error: The researcher accepts a null hypothesis that is actually not true. It is
shown by beta (β).
Decision              State of Nature
                      H0 true              H0 false
Do not reject H0      Correct decision     Type II error (β)
Reject H0             Type I error (α)     Correct decision
Example:
Under the modern justice system, a Type I error is considered more serious.
• A person is accused of a crime. The jury does not know which hypothesis is really true and makes a
decision on the basis of the evidence found (choose H0 or H1). However, the two hypotheses are not
treated equally.
• When there is not enough evidence, the jury does not reject H0 (innocent) and does not accept H1
(guilty). (But we do not say that we accept H0.)
◦ β increases when the difference between the hypothesized parameter and its true value decreases
◦ β increases when α decreases
◦ β increases when σ increases
◦ β increases when n decreases
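These four relations can be checked numerically for one simple case. The sketch below assumes an upper-tail z test of H0: µ = µ0 against HA: µ > µ0 with known σ, for which β = Φ(z_α − (µ_true − µ0)√n/σ); the test set-up, the function name and all numbers are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """beta for an upper-tail z test of H0: mu = mu0 vs HA: mu > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # critical value
    shift = (mu_true - mu0) * sqrt(n) / sigma      # true difference in SE units
    return std_normal.cdf(z_alpha - shift)

# Hypothetical baseline: mu0 = 100, true mean 105, sigma = 15, n = 36.
base = type_ii_error(100, 105, 15, 36, alpha=0.05)

smaller_difference = type_ii_error(100, 102, 15, 36, alpha=0.05)
smaller_alpha = type_ii_error(100, 105, 15, 36, alpha=0.01)
larger_sigma = type_ii_error(100, 105, 20, 36, alpha=0.05)
smaller_n = type_ii_error(100, 105, 15, 16, alpha=0.05)

print(base, smaller_difference, smaller_alpha, larger_sigma, smaller_n)
```

Each of the four changes raises β above the baseline, and the power of the test in each case is 1 − β.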
Step 2: select the z test if the sample size is more than 30 or the population standard deviation is known.
Step 3: calculate the standard error of the mean by using the following formula
S.E.($\bar{X}$) = $\sigma / \sqrt{n}$
Where σ = population standard deviation, n = sample size
Step 4: calculate the value of Z as follows:
Z = $\dfrac{\bar{X} - \mu}{\text{S.E.}(\bar{X})}$
Where $\bar{X}$ = sample mean, μ = population mean
Step 5: find the table value of Z at the 5% level of significance from the normal distribution table.
If the calculated value (ignoring its sign) is less than the table value, we accept the null hypothesis and conclude that there is
no significant difference.
But if the calculated value (ignoring its sign) is more than the table value, we reject the null hypothesis and conclude that
there is a significant difference.
How clear should the evidence be (in order to reject H0)?
- If the judge returns a guilty verdict when the evidence points to guilt with 70% certainty, there is a 30% chance
of a misjudgment (too risky!)
- If the judge returns a guilty verdict only when 97% sure that the accused is guilty, there is a 3% chance of a
misjudgment (very small)
In a democratic culture, we try to lower the probability of a mistake. But if we allow only a 0.1% mistake
probability, it becomes hard to punish any criminals.
In statistics:
The smaller α is, the safer the decision (the possibility of a mistake is smaller), but the harder it is to prove something.
Practical example
Question: Philips Company claims that the length of life of its electric bulb is 2000 hours with
standard deviation of 30 hours. A random sample of 25 showed an average life of 1940 hours
with a standard deviation of 25 hours. At 5% level of significance can we conclude that the
sample has come from a population mean of 2000 hours?
Solution:
Step 1: set up a null hypothesis:
Step 5: Decision
Since the calculated value of Z (ignoring its sign) is more than the table value, we reject the null hypothesis
and conclude that the sample has not come from a population with a mean of 2000 hours.
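The decision in this example can be reproduced with a short two-tail z test. The sketch follows the steps given earlier and assumes the claimed population standard deviation of 30 hours is the σ used for the standard error; the function name is illustrative, and the critical value comes from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tail z test of H0: mu = mu0 with known population sigma."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                          # standard error of the mean
    z = (x_bar - mu0) / se                        # calculated value
    z_table = std_normal.inv_cdf(1 - alpha / 2)   # table (critical) value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_h0 = abs(z) > z_table
    return z, z_table, p_value, reject_h0

# Bulb example: claimed mean 2000 h, sigma 30 h, sample of 25 with mean 1940 h.
z, z_table, p_value, reject_h0 = z_test_mean(1940, 2000, 30, 25, alpha=0.05)
print(z, round(z_table, 2), p_value, reject_h0)
```

The calculated Z is −10, far beyond the table value of about 1.96 in absolute terms, so the p-value rule and the critical-value rule both lead to rejecting H0.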
If an increase (decrease) in one variable results in a corresponding increase (decrease) in the
other, i.e. if the changes are in the same direction, the variables are positively correlated. For
example, the heights and weights of a group of persons are positively correlated, as are advertising and
sales.
If an increase (decrease) in one variable results in a corresponding decrease (increase) in the
other, i.e. if the changes are in opposite directions, the variables are negatively correlated.
For example, T.V registrations and cinema attendance are negatively correlated.
Correlation thus expresses the relationship through a relative measure of change, and it has
nothing to do with the units in which the variables are expressed.
Scatter Diagram
A scatter diagram (also called a dot diagram or scattergram) is a simple and attractive method of diagrammatic
representation of a bivariate distribution for ascertaining the nature of correlation between the variables.
A scatter plot of two variables shows the values of one variable on the Y-axis
and the values of the other variable on the X-axis. Scatter plots are well suited for revealing the
relationship between two variables.
[Figure: Scatter Diagram (y against x)]
Types of Correlation
Correlation is described or classified in several different ways. Three of the most important are:
If two variables change in the same direction (i.e. if one increases the other also increases, or if
one decreases the other also decreases), then this is called a positive correlation. For example:
X Y X Y
10 15 80 50
12 20 70 45
14 22 60 30
18 25 40 20
20 37 30 10
If two variables change in the opposite direction (i.e. if one increases, the other decreases and
vice versa); then the correlation is called a negative correlation. For example: T.V registrations
and cinema attendance.
Negative Correlation    Negative Correlation
X Y X Y
20 40 100 10
30 30 90 20
40 22 60 30
60 15 40 40
80 12 30 50
In partial correlation we recognize more than two variables, but consider only two variables to
be influencing each other, the effect of the other influencing variables being kept constant. For example, in the rice problem taken above, if we
limit our correlation analysis of yield and rainfall to periods when a certain average daily
temperature existed, it becomes a problem of partial correlation.
If the graph is not a straight line, the correlation is non-linear or curvilinear.
The distinction between linear and non-linear correlation is based upon the constancy of the ratio
of change between the variables. If the amount of change in one variable tends to bear a constant
ratio to the amount of change in the other variable then the correlation is said to be linear. For
example, observe the following two variables X and Y:
X: 10 20 30 40 50
Y: 70 140 210 280 350
It is clear that the ratio of change between the two variables is the same. If such variables are
plotted on a graph paper all the plotted points would fall on a straight line.
[Figure: Scatter Diagram (y against x)]
Correlation would be called non-linear or curvilinear if the amount of change in one variable
doesn’t bear a constant ratio to the amount of change in the other variable. For example, if we
double the amount of rainfall, the production of rice or wheat etc. would not necessarily be
doubled.
[Figure: Scatter Diagram (y against x)]
Degrees of Correlation
Through the coefficient of correlation, we can measure the degree or extent of the correlation
between two variables. On the basis of the coefficient of correlation we can also determine
whether the correlation is positive or negative and also its degree or extent.
Perfect correlation: If two variables change in the same direction and in the same
proportion, the correlation between the two is perfect positive. According to Karl
Pearson the coefficient of correlation in this case is +1. On the other hand, if the variables
change in the opposite direction and in the same proportion, the correlation is perfect
negative. Its coefficient of correlation is −1. In practice we rarely come across these types
of correlations.
Absence of correlation: If two series of two variables exhibit no relation between them,
or a change in one variable does not lead to a change in the other variable, then we can firmly
say that there is no correlation between the two variables. In such
a case the coefficient of correlation is 0.
Limited degrees of correlation: If two variables are not perfectly correlated, nor is there a
perfect absence of correlation, then we term the correlation limited correlation. It may
be positive or negative, and lies within the limits ±1.
High degree, moderate degree and low degree are the three categories of this kind of correlation.
The following table reveals the effect (or degree) of the coefficient of correlation.
Degree                    Positive    Negative
Absence of correlation    Zero        0
Perfect correlation       +1          −1
Scatter Plot.
Karl Pearson’s coefficient of correlation.
Spearman’s Rank-correlation coefficient.
Method of Least Squares.
In this method the values of the two variables are plotted on graph paper. One is taken along
the horizontal (X-axis) and the other along the vertical (Y-axis). By plotting the data, we get
points (dots) on the graph which are generally scattered, hence the name ‘Scatter Plot’.
The manner in which these points are scattered suggests the degree and the direction of
correlation. The degree of correlation is denoted by ‘r’ and its direction is given by the signs
positive and negative.
If all points lie on a rising straight line the correlation is perfectly positive and r = +1.
[Figure: Scatter Diagram (y against x)]
If all points lie on a falling straight line the correlation is perfectly negative and r = −1.
[Figure: Scatter Diagram (y against x)]
If the points lie in a narrow strip, rising upwards, the correlation is of a high
degree and positive.
If the points lie in a narrow strip, falling downwards, the correlation is of a
high degree and negative.
If the points are spread widely over a broad strip, rising upwards, the
correlation is of a low degree and positive.
If the points are spread widely over a broad strip, falling downwards, the
correlation is of a low degree and negative.
If the points are spread (scattered) without any specific pattern, the
correlation is absent, i.e. r = 0.
[Figure: Scatter Diagram (y against x)]
Though this method is simple and gives a rough idea about the existence and
the degree of correlation, it is not reliable. As it is not a mathematical
method, it cannot measure the degree of correlation.
[Figure: Scatter Diagram of Profit (Lakhs of Rs.) against Capital Employed]
By looking at the scatter diagram we can say that the variables profit and capital employed are
correlated. Further, the correlation is positive because the trend of the points is upward, rising from
the lower left-hand corner to the upper right-hand corner of the diagram.
Of the several mathematical methods of measuring correlation, Karl Pearson’s method,
popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The
coefficient of correlation is denoted by the symbol r. If the two variables under study are X and
Y, the following formula suggested by Karl Pearson can be used for measuring the degree of
relationship.
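\[
r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}}
\]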
The value of the coefficient of correlation as obtained by the above formula shall always lie
between -1 and +1.
When r = +1, it means there is perfect positive correlation between the variables.
Example 1: Calculate the coefficient of correlation between the heights of father and his son for
the following data.
Height of father (cm):  165  166  167  168  167  169  170  172
Height of son (cm):     167  168  165  172  168  172  169  171
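Solution:
\[
N = 8, \quad \sum X = 1344, \quad \sum Y = 1352, \quad \sum X^2 = 225828, \quad \sum Y^2 = 228532, \quad \sum XY = 227160
\]
\[
r = \frac{227160 - \dfrac{1344 \times 1352}{8}}{\sqrt{\left(225828 - \dfrac{1344^2}{8}\right)\left(228532 - \dfrac{1352^2}{8}\right)}} = 0.603022689 \approx 0.603
\]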
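The arithmetic of Example 1 can be reproduced with a short Python sketch. It is only an illustration: the function name pearson_r and the variable names are hypothetical, and the eight height pairs are those of Example 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from the sums used in the formula (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Heights in cm, taken from Example 1
father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]

r = pearson_r(father, son)
print(round(r, 3))  # 0.603
```

Working from the sums, rather than from deviations about the means, mirrors the hand calculation, so each intermediate total can be compared with the solution.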
Example2: The following data consist of observations for the weights of 10 different
automobiles (in 1000 pounds) and the corresponding fuel consumptions (gallons per 100 miles).
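Solution:
\[
N = 10, \quad \sum X = 29, \quad \sum Y = 43.9, \quad \sum X^2 = 89.28, \quad \sum Y^2 = 207.31, \quad \sum XY = 135.8
\]
\[
r = \frac{135.8 - \dfrac{29 \times 43.9}{10}}{\sqrt{\left(89.28 - \dfrac{29^2}{10}\right)\left(207.31 - \dfrac{43.9^2}{10}\right)}} = 0.976629971 \approx 0.977
\]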
\[
\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
\]
where d_i stands for the difference between the pair of ranks and n is the number of paired
observations.
The value of Spearman’s rank correlation coefficient ranges between -1 and +1. When the coefficient
is +1, the concordance between rankings is perfect and the ranks are in the same direction. When it
is -1, there is also perfect concordance between rankings, but the ranks are in opposite directions.
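The two extreme cases can be illustrated with a minimal Python sketch of the rank correlation formula. The function name spearman_rho and the five-item ranking are hypothetical, chosen only for illustration, and the function assumes its inputs are already ranks with no ties.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman's coefficient from two lists of ranks (no ties assumed)."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

ranks = [1, 2, 3, 4, 5]                       # hypothetical ranking of five items
same_direction = spearman_rho(ranks, ranks)   # identical rankings
opposite = spearman_rho(ranks, ranks[::-1])   # exactly reversed rankings

print(same_direction, opposite)  # 1.0 -1.0
```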
Where actual ranks are given, the steps required for computing the rank correlation are:
Apply the formula:
\[
\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n}
\]
Example 1: Two managers are asked to rank a group of employees in order of potential for
eventually becoming top managers. The rankings are as follows:
Solution:
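Employee   Rank by first manager   Rank by second manager
A          10                      9
B          2                       4
C          1                       2
D          4                       3
E          3                       1
F          6                       5
By using a calculator, we know that
\[
\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 14}{10^3 - 10} = 0.915
\]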
Thus we find that there is a high degree of positive correlation in the ranks assigned by the two
managers.
When we are given the actual data and not the ranks, it is necessary to assign the
ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1. But
whether we start with the lowest value or the highest value, we must follow the same method for
all the variables.
Example 1:
Calculate the rank correlation coefficient for the following data of marks of 2 tests given to
candidates for a clerical job:
Preliminary test:  92  89  87  86  83  77  71  63  53  50
Final test:        86  83  91  77  68  85  52  82  37  57
Solution:
Preliminary test   Rank   Final test   Rank
92                 10     86           9
89                 9      83           7
87                 8      91           10
86                 7      77           5
83                 6      68           4
77                 5      85           8
71                 4      52           2
63                 3      82           6
53                 2      37           1
50                 1      57           3
By using a calculator, we know that
\[
\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 44}{10^3 - 10} = 1 - 0.267 = 0.733
\]
Thus there is a high degree of positive correlation between preliminary and final test.
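The ranking step and the formula can be sketched together in Python. This is a minimal illustration, not a general routine: the helper names ranks_lowest_first and rank_correlation are hypothetical, rank 1 goes to the lowest mark as in the solution, and tied marks are assumed not to occur.

```python
def ranks_lowest_first(values):
    """Assign rank 1 to the lowest value (assumes no tied values)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def rank_correlation(xs, ys):
    """Return the sum of squared rank differences and Spearman's coefficient."""
    rx, ry = ranks_lowest_first(xs), ranks_lowest_first(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n ** 3 - n)

# Marks from the clerical job example
preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]

sum_d2, rho = rank_correlation(preliminary, final)
print(sum_d2, round(rho, 3))  # 44 0.733
```

Ranking both variables from the lowest value keeps the method the same for the two tests, as the text requires; ranking both from the highest value instead would give the same coefficient.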
Simple linear regression refers to the linear relationship between two variables. We usually
denote the dependent variable by Y and the independent variable by X. A simple regression line
is the line fitted to the points plotted in the scatter diagram which would describe the average
relationship between the two variables. Therefore, to see the type of relationship, it is advisable
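to prepare a scatter plot before fitting the model. The linear model is
\[
Y = a + bX
\]
The regression coefficient of Y on X is
\[
b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}},
\]
and the intercept is
\[
a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n},
\]
where b = slope and a = constant.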
Sales (X):     91  97  108  121  67  124  51  73  111  57
Purchase (Y):  71  75  69   97   70  91   39  61  80   47
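Solution:
Totals of the working columns Sales (X), Purchase (Y), X^2, Y^2 and XY:
\[
\sum X = 900, \quad \sum Y = 700, \quad \sum X^2 = 87360, \quad \sum Y^2 = 51868, \quad \sum XY = 66900, \quad n = 10
\]
\[
b_{yx} = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}} = \frac{66900 - \dfrac{900 \times 700}{10}}{87360 - \dfrac{900^2}{10}} = 0.613207547 \approx 0.613
\]
\[
a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{700}{10} - 0.613207547 \times \frac{900}{10} \approx 14.81
\]
The fitted regression line is therefore Y = 14.81 + 0.613 X.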
The least squares principle determines a regression equation by minimizing the sum of the squares of
the vertical distances between the actual y values and the predicted values of y.
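The worked example and the least squares principle can be checked together with a short Python sketch. The function names fit_line and sum_squared_errors are hypothetical, and the shifted lines tried at the end are arbitrary choices made only to illustrate that the fitted line gives the smallest sum of squares.

```python
sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]    # X
purchase = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]     # Y

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between actual and predicted Y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

a, b = fit_line(sales, purchase)
print(round(a, 2), round(b, 3))  # 14.81 0.613

best = sum_squared_errors(a, b, sales, purchase)
# Shifting the intercept or the slope away from the fitted values
# always increases the sum of squared vertical distances.
for da, db in [(1, 0), (-1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_squared_errors(a + da, b + db, sales, purchase) > best
```

The intercept is computed from the unrounded slope, which is why it comes out as 14.81 rather than the 14.83 obtained by substituting the rounded value 0.613.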
Coefficient of Determination
If the value of r = 0.9, r² will be 0.81, and this would mean that 81% of the variation in the