
Statistics

The document provides a comprehensive overview of statistics, defining it as the science of collecting, organizing, presenting, analyzing, and interpreting numerical data for effective decision-making. It distinguishes between descriptive and inferential statistics, outlines the stages of data handling, and discusses the applications and limitations of statistics across various fields. The importance of proper data collection and interpretation is emphasized, along with the classification of data types and scales of measurement.

Uploaded by naserdurri202
Copyright © All Rights Reserved

Chapter One

1. Introduction:
‘The fundamental gospel of statistics is to push back the domain of ignorance, prejudice, rule of
thumb, arbitrary or premature decision, tradition and dogmatism, and to increase the domain in
which decisions are made and principles are formulated on the basis of analyzed quantitative
facts’ (Robert W. Burgess).

The word statistics is used in two different senses. In its most popular sense, statistics means quantitative figures: numerical descriptions of the quantitative aspects of things, for instance the number of children born in a year or the number of schools and colleges in a state. In its second sense, the word refers to a body of scientific principles and techniques.

1.1 Definition of statistics:

[Figure: the stages of statistical work (collection of data, organization and classification of data, presentation of data, analysis of data, interpretation of data)]

Statistics is defined as the science of collecting, organizing, presenting, analyzing and interpreting numerical data to assist in making more effective decisions.

It is the science of conducting studies to collect, organize, summarize, analyze and draw conclusions from data.

Business StatisticsPage 1
Statistics is also a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing and interpreting the data and drawing conclusions from them.
Statistics is a branch of science that deals with data analysis.

The primary objective of statistical analysis is to determine certain characteristics of a group from a representative sample. To be valid, this generalization must take account of certain important concepts.

Data Collection: This is the stage where we gather the information needed for our purpose.

o If data are needed and are not readily available, then they have to be collected.

o Data may be collected by the investigator directly using methods like interview,
questionnaire, and observation or may be available from published or unpublished sources.

o Data gathering is the basis (foundation) of any statistical work.

o Valid conclusions can only result from properly collected data.

Data Organization: This is the stage where we edit our data. A large mass of figures collected from surveys frequently needs organization. The collected data might contain irrelevant figures, incorrect facts, omissions and mistakes. Errors that may have crept in during collection have to be corrected by editing. After editing, we may classify (arrange) the data according to their common characteristics. Classifying, or arranging, the data in some suitable order makes the information easy to present.

Data Presentation: The organized data can now be presented in the form of tables and diagrams. At this stage, large amounts of data are presented in tables in a summarized and condensed manner. The main purpose of data presentation is to facilitate statistical analysis. Graphs and diagrams may also be used to give the data a vivid meaning and make the presentation attractive.

Data Analysis: This is the stage where we critically study the data to draw conclusions about population parameters. The purpose of data analysis is to dig out information useful for decision making. Analysis usually involves highly complex and sophisticated mathematical techniques. However, this material covers only the most commonly used methods of statistical analysis, such as the calculation of averages, the computation of measures of dispersion, and probabilities and probability distributions.
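The short Python sketch below illustrates the kind of calculation named above. It computes an average (the arithmetic mean) and two measures of dispersion (the range and the sample standard deviation) using the standard-library statistics module. The five observations are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical observations, used only for illustration.
data = [12, 15, 11, 18, 14]

mean = statistics.mean(data)          # average: the arithmetic mean
data_range = max(data) - min(data)    # simplest measure of dispersion
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

print("Mean:", mean)
print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 3))
```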

Data Interpretation: This is the stage where we draw valid conclusions from the results obtained through data analysis. Interpretation means drawing conclusions from the data that form the basis for decision making. Interpreting data is a difficult task that requires a high degree of skill and experience. If the analyzed data are not properly interpreted, the whole purpose of the investigation may be defeated and fallacious conclusions may be drawn. Great care is therefore needed when interpreting data.

1.2 Classification (types) of Statistics: The study of statistics is usually divided into two major categories: descriptive statistics and inferential statistics.
I. The definition of statistics given earlier referred to “organizing, presenting, and analyzing… data.” This facet of statistics is usually referred to as descriptive statistics.
Descriptive statistics: methods of organizing, summarizing and presenting data in an informative way.
It seeks only to describe and analyze a given set of data, such as a sample, without drawing any conclusion about a larger population. It uses tools such as graphs, charts, tables and averages (for example, the mean and the mode) to describe the given data set.

Example 1: Out of 50 electric light bulbs produced by a company in a week, 12 are defective.
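The summary in Example 1 can be computed directly, as the short Python sketch below shows. It describes only the 50 bulbs that were actually observed and makes no claim about other weeks or other bulbs. That restriction is what keeps it descriptive rather than inferential.

```python
# Figures from Example 1: 50 bulbs produced in a week, 12 of them defective.
produced = 50
defective = 12

proportion_defective = defective / produced        # share of defective bulbs
percent_defective = defective * 100 / produced     # same share as a percentage

print(f"Proportion defective: {proportion_defective}")
print(f"Percentage defective: {percent_defective}%")
```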

II. Inferential statistics: Another facet of statistics is inferential statistics, also called statistical inference or inductive statistics. Our main concern in inferential statistics is finding out something about a population based on a sample taken from that population.

Inferential statistics: the methods used to find out something about a population based on a sample. Alternatively, inferential statistics refers to generalizing from samples to populations using probabilities, performing hypothesis tests, determining relationships between or among variables, and making predictions. Inferences (predictions, decisions) about certain characteristics of a population are based on the information contained in a sample. In other words, inferential statistics deals with drawing important conclusions or generalizations about a population from the analysis of a sample.

* Here, the statistician tries to make inferences from samples to populations.

* Inferential statistics uses probability, the chance of an event occurring.

Example: In order to estimate the voltage required to cause an electrical device to fail, a sample
of such devices can be subjected to increasingly higher voltages until each device fails. Based on
these sample results, the probability of failure at various voltage levels for the other devices in
the sampled population can be estimated.
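The Python sketch below illustrates the idea, assuming the test has already been run. The failure voltages are hypothetical values for a sample of 10 devices. The share of sampled devices that failed at or below a given voltage (the empirical cumulative proportion) serves as an estimate of the probability of failure at that voltage for the other devices in the population. The function name estimated_failure_probability is illustrative only.

```python
# Hypothetical failure voltages (in volts) for a sample of 10 devices.
failure_voltages = [210, 225, 230, 238, 240, 245, 250, 262, 270, 285]


def estimated_failure_probability(voltage, sample):
    """Share of sampled devices that failed at or below `voltage`.

    Used as an estimate of the probability that a device from the same
    population fails at this voltage level.
    """
    failed = sum(1 for v in sample if v <= voltage)
    return failed / len(sample)


for level in (220, 240, 260, 280):
    p = estimated_failure_probability(level, failure_voltages)
    print(f"{level} V: estimated probability of failure = {p:.1f}")
```

The estimate is only as good as the sample is representative of the population, and a larger sample generally gives a more reliable estimate.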

Population: a collection of all possible individuals, objects, or measurements of interest.

Sample: a portion, or part, of the population of interest.

1.3 Applications of Statistics


In modern times, Statistics is viewed not as a mere device for collecting numerical data but as a
means of developing sound techniques for their handling and analysis and drawing valid
inferences from them.
1. Statistics and Planning
Statistics is indispensable to planning. In the modern age, which has been termed ‘the age of planning’, governments almost all over the world are resorting to planning for economic development.
2. Statistics and Economics
Statistical data and techniques of statistical analysis have proved immensely useful in solving a variety of economic problems, such as those concerning wages and prices, the analysis of time series, and demand analysis.
3. Statistics and Business
Statistics is also an indispensable tool of production control. Business executives rely more and more on statistical techniques to study the needs and desires of consumers and for many other purposes. The success of a businessman depends to a large extent on the accuracy and precision of his statistical forecasting.
4. Statistics and Industry
In industry, statistics is very widely used in ‘Quality Control’. In production engineering, statistical tools such as inspection plans and control charts are extremely important for determining whether a product conforms to specifications.
5. Statistics and Mathematics
Statistics and mathematics are very intimately related. Recent advancements in statistical
techniques are the outcomes of wide applications of advanced mathematics.

6. Statistics and Medical Science
In medical science, the statistical tools for the collection, presentation and analysis of observed
facts relating to the causes and incidence of diseases and the results obtained from the use of
various drugs and medicines are of great importance.
7. Statistics and Psychology and Education
In education and psychology, too, statistics has found wide applications, e.g., in determining the reliability and validity of a test and in ‘Factor Analysis’.
8. Statistics and War
In war, the theory of ‘Decision Functions’ can be of great assistance to military and technical personnel in planning ‘maximum destruction with minimum effort.’
Functions of Statistics

Statistics has several major functions in all economic sectors. Some of the major functions of statistics include the following:
o It helps predict future values of a variable of interest based on its past and present observed values;
o It helps condense data and present them in an easily understandable manner;
o Statistical figures on existing situations and on (predicted) future situations help in designing and adopting appropriate policies, strategies and actions;
o Statistical figures also allow the situation before and after the introduction of a policy, strategy or project to be compared;
o Statistics is used in everyday human activities (it is used in almost all fields of human endeavor, i.e. statistics can be used in various occupations), for example:
Limitations of Statistics
Despite its wide applications in almost every sphere of human activity, statistics is not without limitations. The major limitations of statistics include the following:

1. Statistics does not study individuals.

o Statistics deals with aggregates and does not deal with individual entities. As a result, statistical results computed for a group of individuals may not be true for some individuals in the group.
o Individual items, taken separately, do not constitute statistical data and are meaningless for statistical enquiry. Hence, statistical analysis is suited only to those problems where group characteristics are to be studied.
To sum up, statistics deals with aggregates of facts and attaches no importance to individual items. It is suitable only when group characteristics are to be studied.

2. Statistical laws are not exact

On the basis of statistical analysis, we can speak only in terms of probability and chance, not certainty. Statistical conclusions are not universally true; they are true only on average.

3. Statistics is liable to be misused

o Statistical methods are the most dangerous tools in the hands of the inexpert. Inexperienced and untrained persons who use statistical tools may reach very fallacious conclusions.
Unless properly interpreted, statistical results can be misused or misleading, so their use requires well-skilled and well-trained personnel.
4. Statistics is not suited to the study of qualitative phenomena

o It deals only with those subjects of inquiry that can be quantitatively measured and numerically expressed.
o Statistics deals only with numerical facts, while many qualitative facts also need to be collected and analyzed.
o Because statistics is a science dealing with sets of numerical data, it applies only to subjects of enquiry that can be measured quantitatively. As such, qualitative phenomena like honesty, poverty and culture, which cannot be expressed numerically, cannot be analyzed statistically in a direct way.
5. Statistical data are only approximately and not mathematically correct.

UNIT 2 - DATA COLLECTION AND PRESENTATION

2. Introduction: Data constitute the foundation for statistical analysis. Governments, business firms and individuals collect the statistical data they need to carry out their activities efficiently and effectively. As discussed in chapter one, data are the actual facts and figures, seen or observed, that are collected, organized, presented, summarized, analyzed and interpreted. It follows that statistics as a field of study exists only if there are data. By definition, statistics deals with the collection, organization, presentation, analysis and interpretation of data through scientific, systematic methods in order to arrive at valid generalizations about the elements under study.

Data are facts and figures used to describe individuals (entities) of interest with regard to certain variables of interest (data variables). Individuals are the objects described by a set of data, that is, the entities on which data are collected. A variable is a characteristic of interest for the individuals of a population under study. The concepts of data, individuals and variables can best be understood by looking at a hypothetical example.

2.1. Data Collection


Data collection is the process of gathering information on the variables of interest about the individuals (entities) one wants to study. It involves several activities, discussed in detail below: defining the variable(s) of interest, identifying relevant data sources, designing appropriate data-gathering methods, and designing appropriate data-gathering tools (such as questionnaires).

2.1.1 Classification of Data


After data have been collected and edited, an important step in processing them is classification.
Data can broadly be classified on the following bases:
I. Geographical, i.e. area-wise, e.g. cities, districts, etc.
II. Chronological, i.e. on the basis of time
III. Qualitative, i.e. according to some attributes
IV. Quantitative, i.e. in terms of magnitudes
V. Level (scale) of measurement (types of scales)

i. Geographical Classification

In geographical classification, data are classified on the basis of geographical or locational differences between the various items. For example, presenting the production of sugarcane, wheat, rice, etc. for various states is a geographical classification. Geographical classifications are usually listed in alphabetical order for easy reference. Items may also be listed by size to emphasize the important areas, as when states are ranked by population.

ii. Chronological Classification

When data are observed over a period of time, the classification is known as chronological classification. For example, the sales figures of a company are given below:

Year   Sales
2001   18810
2002   23601
2003   23816
2004   32435
2005   39343

iii. Qualitative Classification

In qualitative classification, data are classified on the basis of some attribute or quality such as sex, hair color, literacy, religion, etc. The point to note in this type of classification is that the attribute cannot be measured. We can only note whether it is present or absent. For example, if the attribute under study is blindness, we may find out how many persons are blind in a given population.

iv. Quantitative Classification

Quantitative classification refers to the classification of data according to some characteristic that can be measured, such as height, weight, income or sales. For example, the workers of a factory may be classified according to wages as follows:

Monthly wages   No. of workers
2500-2600       50
2600-2700       200
2700-2800       260
v. Level (Scale) of Measurement (Types of Scales)

Any characteristic of an individual that is measured (like blood pressure) or recorded (like age or sex), and that can take different values for different individuals or cases, is called a variable. It is helpful to divide variables into different types, because different statistical methods apply to each. The main division is into qualitative (categorical) and quantitative (numerical) variables.

Qualitative variable: a variable or characteristic that cannot be measured in quantitative form but can only be identified by name or category, for instance place of birth, ethnic group, type of drug, stage of breast cancer, or degree of pain (minimal, moderate, severe or unbearable). So, qualitative data consist of attributes, labels or non-numerical entries, e.g. gender, religious preference, geographic location, affiliation, type of automobile owned, state of birth and eye color.

When the data are qualitative, we are usually interested in how many or what proportion fall in
each category. For example, what percent of the population has blue eyes? How many Catholics
and how many Protestants are there in the United States? So, when the characteristic being
studied is nonnumeric, it is called a qualitative variable or an attribute.

Quantitative variable: When the variable studied can be reported numerically, it is called a quantitative variable. A quantitative variable can be measured and expressed numerically, and it is of one of two types: discrete or continuous. The values of a discrete variable are usually whole numbers, such as the number of episodes of diarrhea in the first five years of life.

A continuous variable is a measurement on a continuous scale. Examples include weight, height, blood pressure and age.

Other examples of quantitative variables are the balance in your checking account, the ages of company presidents, the life of an automobile battery (such as 42 months) and the number of children in a family.

So, quantitative data consists of numerical measurements or counts. They are numerical in nature
and can be ordered or ranked. E.g. the variable “age” is numerical, and people can be ranked in
order according to the value of their ages. Other examples include heights, weights, and body
temperatures.

Although variables can be broadly divided into categorical (qualitative) and quantitative types, it is common practice to distinguish four basic scales of measurement.

S. S. Stevens (1951) proposed four scale types. These scale types were Nominal, Ordinal,
Interval, and Ratio, and each possessed different properties of measurement systems.

1. Nominal scales / data: Data that represent categories or names. There is no implied order among the categories of nominal data. Individuals are simply placed in the proper category or group, and the number in each category is counted. Each item must fit into exactly one category. The simplest data consist of unordered, dichotomous, or "either-or" observations: either the patient lives or the patient dies, either he has some particular attribute or he does not. Examples:
Religion: Christianity, Islam, Hinduism, etc.
Sex: Male, Female
Eye color: brown, black, etc.
2. Ordinal scales: The response classifications (categories) have an order, but the intervals between the categories are not necessarily equal. Like the nominal scale, the ordinal scale is used to count the number of observations falling in different categories. In an ordinal scale, however, each group is related to the others in terms of ordinal value. For example, teachers in schools commonly rate their students, based on their grade achievements, as excellent, very good, good or fair. In this categorization, the rating “excellent” is obviously higher than the rating “very good” in terms of ordinal value. Note, however, that we cannot tell by how much “excellent” exceeds “very good”. We can only tell that “excellent” is higher than “very good”. The ratings are also categories into which the students are grouped based on their achievements, and, as in the nominal scale, the categories are mutually exclusive.
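The difference between an ordinal scale and a nominal scale can be illustrated in a few lines of Python. In this sketch the order of the ratings is stored explicitly in a list. Ratings can therefore be compared and counted per category. The code deliberately offers no way to subtract one rating from another, because the size of the gap between categories is unknown. The grades list is hypothetical, and the helper name is_higher is illustrative.

```python
from collections import Counter

# Ordinal categories from the example, listed from lowest to highest.
RATINGS = ["fair", "good", "very good", "excellent"]


def is_higher(a, b):
    """True if rating `a` ranks above rating `b`; only the order is meaningful."""
    return RATINGS.index(a) > RATINGS.index(b)


# Hypothetical ratings given to six students.
grades = ["good", "excellent", "fair", "very good", "good", "excellent"]
counts = Counter(grades)

# Report the number of students in each category, in ordinal order.
for rating in RATINGS:
    print(f"{rating:<10} {counts[rating]}")

print(is_higher("excellent", "very good"))  # True
```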
3. Interval scales / data: In interval data, the intervals between values are equal. For example, on the Fahrenheit temperature scale, the difference between 70 and 71 degrees is the same as the difference between 32 and 33 degrees. But the scale is not a ratio scale: 40 degrees Fahrenheit is not twice as hot as 20 degrees Fahrenheit.
Interval variables are true quantitative measures. In addition to marking difference and rank, the differences or distances between any two numbers on the scale are meaningful. This means that the difference between two scores accurately reflects the difference in the amount of the attribute that the two objects have. Temperature measured in degrees Celsius is on an interval scale, and the difference between 18 and 20 degrees is exactly the same as the difference between 25 and 27 degrees. Most measures in the behavioral sciences (e.g. IQ scores, scores on attitude scales and knowledge tests) are considered interval measures. Besides the relations <, > and =, we may also legitimately perform addition and subtraction (+, –) with these numbers. Interval scales are therefore measurement systems that possess the properties of magnitude and equal intervals, but not the property of a rational (true) zero.

4. Ratio scales / data: Ratio data have meaningful ratios. For example, age is ratio data: someone who is 40 is twice as old as someone who is 20. Both interval and ratio data involve measurement, and most data analysis techniques that apply to ratio data also apply to interval data. For most practical purposes, therefore, these two types of data are grouped together as metric data.

Ratio scales are measurement systems that possess all three properties: magnitude, equal intervals and a rational zero. The rational zero allows ratios of numbers to be meaningfully interpreted. For example, we can say that the ratio of John's height to Mary's height is 1.32, whereas this is not possible with interval scales.
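The contrast between interval and ratio scales can be checked numerically. The Python sketch below converts Fahrenheit readings to kelvins, a temperature scale with a true zero. The conversion formula is standard, but the function name is illustrative. Equal Fahrenheit differences stay equal after conversion. The ratio 40/20 = 2 computed on the Fahrenheit scale does not survive the conversion, which is why such a ratio has no physical meaning. Age, measured from a true zero, keeps its ratio.

```python
import math


def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with an absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15


# Interval property: equal differences remain equal after conversion.
diff_70_71 = fahrenheit_to_kelvin(71) - fahrenheit_to_kelvin(70)
diff_32_33 = fahrenheit_to_kelvin(33) - fahrenheit_to_kelvin(32)
equal_intervals = math.isclose(diff_70_71, diff_32_33)

# No ratio property: 40 F / 20 F looks like 2, but the physical ratio is not.
ratio_fahrenheit = 40 / 20
ratio_kelvin = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)

# Ratio scale: age starts from a true zero, so the ratio is meaningful.
ratio_age = 40 / 20

print(equal_intervals)          # True
print(ratio_fahrenheit)         # 2.0
print(round(ratio_kelvin, 3))   # about 1.04, not 2
print(ratio_age)                # 2.0
```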

2.1.2. Sources of Data

According to their source, data are classified as (i) primary data and (ii) secondary data.

(i) Primary Data: Primary data are measurements observed and recorded as part of an original
study. When the data required for a particular study can be found neither in the internal records
of the enterprise, nor in published sources, it may become necessary to collect original data, i.e.,
to conduct first hand investigation. The work of collecting original data is usually limited by
time, money and manpower available for the study. When the data to be collected are very large
in volume, it is possible to draw reasonably accurate conclusions from the study of a small
portion of the group called a sample.

(ii) Secondary data: In statistics, the investigator need not start from the very beginning. He may use, and must take into account, what others have already discovered. When an investigator uses data that others have already collected, such data are called secondary data. Secondary data can be obtained from journals, reports, government publications, publications of research organizations, etc.

2.1.3 Methods of Data Collection


There are several methods of collecting primary data, particularly in surveys and descriptive research. The important ones are:
1. Observation method
2. Interview method/ Face-to-face and self-administered interviews and telephone
interviews
3. Through questionnaires
4. Through Schedules
5. Focus group discussions (FGD)
6. Other data collection techniques – Rapid appraisal techniques, Delphi techniques, life
histories, case studies, etc.

2.2. Tabular Methods of Data Presentation


The organized data can now be presented in the form of tables and diagrams. At this stage, large amounts of data are presented in tables in a summarized and condensed manner. The main purpose of data presentation is to facilitate statistical analysis. Graphs and diagrams may also be used to give the data a vivid meaning and make the presentation attractive.

2.2.1. Frequency Distributions (Absolute, Relative and Cumulative Distributions)

A frequency distribution is a grouping of data into categories that shows the number of observations in each mutually exclusive category.

A frequency distribution is a table in which possible values for a variable are grouped into
classes, and the number of observed values which fall into each class is recorded. Data
organized in a frequency distribution are called grouped data. In contrast, for ungrouped data
every observed value of the random variable is listed.

Types of Frequency Distributions

o A categorical frequency distribution is used when the data are nominal.
o An ungrouped frequency distribution is used for numerical data when the range of the data is small.
o A grouped frequency distribution is used when the range is large and classes several units wide are needed.
Example 1:

Frequency Distribution (Table):

Soft drink preferences of 20 people

Soft drink   Frequency
Coca Cola    6
Mirinda      8
Pepsi        2
Sprite       4
Total        20

- Does this table describe a population or a sample?
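A categorical frequency distribution like the one in Example 1 can be built by counting labels, as in the Python sketch below. The original answers are not given, so the raw responses are reconstructed from the counts in the table. The relative frequency of each category is shown alongside its count.

```python
from collections import Counter

# Raw answers reconstructed from the table's counts (20 people asked).
responses = (["Coca Cola"] * 6 + ["Mirinda"] * 8
             + ["Pepsi"] * 2 + ["Sprite"] * 4)

freq = Counter(responses)          # count of each soft drink
total = sum(freq.values())         # total number of people asked

for drink in sorted(freq):
    share = freq[drink] / total    # relative frequency
    print(f"{drink:<10} {freq[drink]:>3}  {share:.2f}")
print(f"{'Total':<10} {total:>3}")
```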

Example 2:
The number of refrigerators sold on 22 working days by a leading agency house:

23 30 20 26 30 20 23 40 40 26 20 30

23 40 28 26 23 40 28 28 30 30

Frequency distribution of the number of refrigerators sold

No. of refrigerators   Tally   Frequency (no. of days)
20                     lll     3
23                     llll    4
26                     lll     3
28                     lll     3
30                     lllll   5
40                     llll    4

The table above shows that 20 refrigerators were sold on each of 3 days, 23 refrigerators on each of 4 days, and so on.

This method of classification condenses the data only where values are largely repeated. Otherwise there will be hardly any condensation. To make the series more compact, so that its characteristics can be easily studied, data may be classified according to class intervals.
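The ungrouped frequency distribution in Example 2 can be reproduced from the 22 daily figures. The Python sketch below counts each distinct value and prints a tally next to the frequency, with one stroke per day. The strokes are a plain-text stand-in for the usual hand-drawn tally marks.

```python
from collections import Counter

# Number of refrigerators sold on each of the 22 working days (Example 2).
sales = [23, 30, 20, 26, 30, 20, 23, 40, 40, 26, 20, 30,
         23, 40, 28, 26, 23, 40, 28, 28, 30, 30]

freq = Counter(sales)  # value -> number of days with that value

for value in sorted(freq):
    tally = "l" * freq[value]
    print(f"{value:>3}  {tally:<6} {freq[value]}")
print("Total days:", sum(freq.values()))
```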

Example 3:
Consider the problem of a social scientist who wants to study the age of persons arrested in a
country. In connection with large sets of data, a good overall picture and sufficient information
can often be conveyed by grouping the data into a number of class intervals as shown below.

Age (years) Number of persons


Under 14 1,748
15 – 24 3,325
25 – 34 3,149
35 – 44 1,323
45 – 54 512
55 and over 335
Total 10,392

This kind of frequency distribution is called grouped frequency distribution.


Frequency distributions present data in a relatively compact form, give a good overall picture, and contain information that is adequate for many purposes. However, some things can usually be determined only from the original data. For instance, the above grouped frequency distribution cannot tell how many of the arrested persons are 19 years old, or how many are over 62.
The construction of a grouped frequency distribution consists essentially of five steps: (1) choosing the classes, (2) determining the class intervals, (3) sorting (or tallying) the data into these classes, (4) counting the number of items in each class, and (5) displaying the results in the form of a chart or table.
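The Python sketch below walks through the five steps using the refrigerator data from Example 2. The choice of classes, starting at 20 with a width of 5, is an assumption made for illustration. Each class excludes its upper limit, as in the exclusive method described below. The empty class 35-40 is kept in the table with a frequency of zero.

```python
from collections import Counter

sales = [23, 30, 20, 26, 30, 20, 23, 40, 40, 26, 20, 30,
         23, 40, 28, 26, 23, 40, 28, 28, 30, 30]

# Steps 1-2: choose the classes and the class interval (assumed values).
LOWER = 20       # lower limit of the first class
WIDTH = 5        # class width
N_CLASSES = 5    # classes 20-25, 25-30, ..., 40-45

# Steps 3-4: sort each value into its class and count the items per class.
# Integer division puts a value equal to an upper limit into the next class.
counts = Counter((x - LOWER) // WIDTH for x in sales)

# Step 5: display the results as a table.
table = []
for i in range(N_CLASSES):
    low = LOWER + i * WIDTH
    table.append((f"{low}-{low + WIDTH}", counts[i]))

for label, f in table:
    print(f"{label:<7} {f}")

# Every observation must fall into exactly one class.
assert sum(f for _, f in table) == len(sales)
```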
Methods of classifying the grouped data according to class interval

There are two methods of classifying grouped data according to class intervals, namely:

Exclusive method
Inclusive method

a. Exclusive Method

When the class intervals are fixed so that the upper limit of one class is the lower limit of the next, the method is known as the ‘Exclusive’ method of classification. The following data are classified on this basis:

Income in $ No of Employees

1800-1900 50

1900-2000 100

2000-2200 200

The ‘Exclusive’ method clearly ensures continuity of the data, since the upper limit of one class is the lower limit of the next. Thus, in the above example, there are 50 persons whose income is between $1800 and $1899.99. A person whose income is exactly $1900 is included in the class 1900-2000.

Whenever this method is used, it is necessary to give clear instructions in the questionnaire. The reader should also note that if class intervals are given as 0-10, 10-20, etc., the upper limit is always presumed to be exclusive, i.e. an observation exactly equal to the upper limit is not included in that class.
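The exclusive rule can be expressed as a lookup on the class boundaries. In the Python sketch below, the function name exclusive_class is illustrative, and bisect_right from the standard library finds the class for an income. An income exactly equal to an upper limit, such as $1900, falls into the next class, while $1899.99 stays in 1800-1900. Incomes outside the tabulated range raise an error rather than being silently misclassified.

```python
from bisect import bisect_right

# Class boundaries from the exclusive-method table above:
# 1800-1900, 1900-2000 and 2000-2200.
BOUNDARIES = [1800, 1900, 2000, 2200]


def exclusive_class(income):
    """Return the class label for `income`; each upper limit is excluded."""
    if not BOUNDARIES[0] <= income < BOUNDARIES[-1]:
        raise ValueError(f"income {income} is outside the tabulated classes")
    # bisect_right counts the boundaries <= income, so an income equal to a
    # boundary is placed in the class that starts at that boundary.
    i = bisect_right(BOUNDARIES, income) - 1
    return f"{BOUNDARIES[i]}-{BOUNDARIES[i + 1]}"


for income in (1850, 1899.99, 1900, 2150):
    print(income, "->", exclusive_class(income))
```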

b. Inclusive method

Under the ‘Inclusive’ method of classification, the upper limit of one class is included in that class itself.

Income in $ No of Employees

800-899 50

900-999 100

1000-1099 200

In the class 800-899 we include persons whose income is between $800 and $899. A person whose income is exactly $900 is included in the next class.

A. Absolute Distribution
Definition of Absolute Frequency: A statistical term describing the total number of trials or observations within a given interval or frequency bin. The frequency bins can be of any size, but they must be mutually exclusive and exhaustive, and the data must be grouped. So, the absolute frequency is simply the total number of observations or trials within a given range, or the number of occurrences of a particular phenomenon. It shows how many scores have that particular value.
Absolute frequency represents the number of times each score or observation has occurred in a set of observations. Computing the absolute frequency of a score is simply a matter of counting the number of times that score appears in the data set. Scores with zero frequency must be included in order to draw frequency polygons correctly.

B. Relative and Percentage frequency distribution

Relative frequency distribution: the relative frequency of a class is its frequency as a proportion of the total number of observations. The relative frequency distribution is useful for comparing two or more frequency distributions in which the number of cases is not equal.

$$\text{Relative frequency of the } i\text{th class} = \frac{\text{Frequency of the } i\text{th class}}{\text{Total number of observations}}$$

Multiplying the relative frequency by 100 expresses it as a percentage of the total number of observations.

At times, relative frequencies are more appropriate than absolute frequencies. Whenever two or more sets of data contain different numbers of observations, a comparison of absolute frequencies will be misleading. In such cases, relative frequencies must be used.

C. Cumulative Frequency

In some situations, we may be interested not in the frequencies of the various classes but in the frequencies or proportions of observations that are “less than” or “greater than” a given value. This leads to a cumulative frequency distribution, which is derived from a frequency distribution by forming a cumulative frequency column. This column is computed by adding the successive class frequencies from top to bottom. The entry for the top interval is the frequency of that class, the entry for the second interval is the sum of the frequencies of the first and second classes, and so on.

If we divide frequency by N, the total number of observations, we get the relative frequencies.
Also, if we divide cumulative frequency by N, the total number of observations, we get the
relative cumulative frequencies, which are often expressed in percentage.

Cumulative frequency (cf) refers to the frequency of all data items with a value less than or equal to a specified score, that is, the frequency of all scores at or below that score.

To compute a score's cumulative frequency, we add the frequency of that score to the frequencies of all scores below it.
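The top-to-bottom running total described above maps directly onto `itertools.accumulate`. This is a minimal sketch; the helper names and the class frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical helpers (not from the text).
def less_than_cumulative(freqs):
    """Running totals of class frequencies, from the top class down."""
    return list(accumulate(freqs))

def relative_cumulative(freqs):
    """Cumulative frequencies divided by N, the total number of observations."""
    n = sum(freqs)
    return [cf / n for cf in less_than_cumulative(freqs)]

freqs = [2, 5, 3]  # hypothetical class frequencies
print(less_than_cumulative(freqs))  # [2, 7, 10]
print(relative_cumulative(freqs))   # [0.2, 0.7, 1.0]
```

The last cumulative frequency always equals N, so the last relative cumulative frequency is always 1.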

Example: Absolute, Relative, Percentage and Cumulative Frequency Table

Class interval         Frequency   Relative frequency   Percentage              Cumulative frequency
10 but less than 20    3           3/55 = 0.055         0.055 × 100 = 5.5       3
20 but less than 30    5           5/55 = 0.091         0.091 × 100 = 9.1       8
30 but less than 40    9           9/55 = 0.164         0.164 × 100 = 16.4      17
40 but less than 50    18          18/55 = 0.327        0.327 × 100 = 32.7      35
50 but less than 60    10          10/55 = 0.182        0.182 × 100 = 18.2      45
60 but less than 70    8           8/55 = 0.145         0.145 × 100 = 14.5      53
70 but less than 80    2           2/55 = 0.036         0.036 × 100 = 3.6       55
Total                  55          1.000                100                     55
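The table above can be reproduced from its frequency column alone. The following Python is a minimal sketch of that computation; the variable names are illustrative, not from the text.

```python
from itertools import accumulate

# Lower limits of the classes "10 but less than 20", ..., "70 but less than 80".
lower_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [3, 5, 9, 18, 10, 8, 2]

n = sum(frequencies)                        # N = 55
cumulative = list(accumulate(frequencies))  # running totals, top to bottom

print(f"{'Class interval':<22}{'f':>4}{'Rel. f':>9}{'%':>7}{'CF':>5}")
for low, f, cf in zip(lower_limits, frequencies, cumulative):
    label = f"{low} but less than {low + 10}"
    rel = f / n
    print(f"{label:<22}{f:>4}{rel:>9.3f}{rel * 100:>7.1f}{cf:>5}")
print(f"{'Total':<22}{n:>4}{1:>9.3f}{100:>7.1f}")
```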

Classification of data according to class intervals

When data are classified by class intervals, the following terms are used; we define each term below:
• Class limits (CL)
• Class frequency (F)
• Class boundaries (CB)
• Class midpoint or class mark (mi)
• Class width, also called class size or class interval (W)

Class Limits: each class has a lower class limit (LCL) and an upper class limit (UCL). For example, in the class 40-60 the lower limit is 40 and the upper limit is 60. When we categorize individual observations into this class, none of the included observations is below 40 or above 60. As another example, the class 60-79 indicates that no value below 60 and no value above 79 can be included.

Class Frequency: The number of observations belonging to a particular class is known as the frequency of that class, or class frequency. Suppose 20 students obtained marks in the range 30-40 and 44 students obtained marks in the range 50-60. Then the class frequency of the class interval 30-40 is 20, and the class frequency of the class interval 50-60 is 44.

Class Boundaries (CB): each class has a lower class boundary (LCB) and an upper class boundary (UCB), which are obtained from the class limits. First we find the unit of measurement (d), the gap between one class and the next, which is usually one unit of the data:

$$d = LCL_{i+1} - UCL_i$$

Then, for a given class:

1. The LCB is obtained by subtracting half the unit of measurement from the LCL of the class:
$$LCB_i = LCL_i - \frac{d}{2} = LCL_i - \frac{LCL_{i+1} - UCL_i}{2}$$
2. The UCB is obtained by adding half the unit of measurement to the UCL of the class:
$$UCB_i = UCL_i + \frac{d}{2} = UCL_i + \frac{LCL_{i+1} - UCL_i}{2}$$

The unit of measurement (d) is the gap between the UCL of a class and the LCL of the next higher class (two successive classes). Class boundaries are generally used so that there are no gaps between successive classes.

Example: Convert the following class limits into class boundaries.

Class limits (LCL - UCL)
5 - 9
10 - 14
15 - 19

Solution: We have three classes.

Unit of measurement: d = LCL2 - UCL1 = 10 - 9 = 1

Half of the unit of measurement: d/2 = (LCL2 - UCL1)/2 = 1/2 = 0.5

LCBi = LCLi - 0.5 and UCBi = UCLi + 0.5


LCB1 = 5 - 0.5 = 4.5
UCB1 = UCL1 + 0.5 = 9 + 0.5 = 9.5
LCB2 = LCL2 - 0.5 = 10 - 0.5 = 9.5
UCB2 = UCL2 + 0.5 = 14 + 0.5 = 14.5
LCB3 = LCL3 - 0.5 = 15 - 0.5 = 14.5
UCB3 = UCL3 + 0.5 = 19 + 0.5 = 19.5
Therefore the class boundaries are:

Class limits (LCL - UCL)    Class boundaries (LCB - UCB)
5 - 9                       4.5 - 9.5
10 - 14                     9.5 - 14.5
15 - 19                     14.5 - 19.5
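The conversion from limits to boundaries can be sketched in Python as below. It assumes equally spaced classes (so d can be read from the first two classes) and at least two classes; the helper name `class_boundaries` is hypothetical.

```python
# Hypothetical helper (not from the text).
def class_boundaries(limits):
    """Convert (LCL, UCL) class limits into (LCB, UCB) class boundaries.

    Assumes equally spaced classes and at least two of them, so the unit
    of measurement d = LCL_2 - UCL_1 applies to every class.
    """
    d = limits[1][0] - limits[0][1]   # d = LCL_2 - UCL_1
    half = d / 2
    return [(lcl - half, ucl + half) for lcl, ucl in limits]

limits = [(5, 9), (10, 14), (15, 19)]
print(class_boundaries(limits))  # [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5)]
```

Each UCB coincides with the next class's LCB, which is exactly the "no gap" property the boundaries are meant to provide.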

Class width (or class size or class interval) (W): the difference between the upper class boundary and the lower class boundary of a class. For the example above, W = UCB - LCB = 9.5 - 4.5 = 5 (any class can be used, because all classes have the same width).

Note: When all the classes have the same (uniform) width, the class width of the distribution can also be found as the difference between the lower class limits, or between the upper class limits, of two consecutive classes.
Class midpoint (class mark) (mi): Adding the lower and upper class limits of a class interval and dividing the sum by two gives the class midpoint. Thus, the midpoint of the class interval 40-60 is (40 + 60)/2 = 50. The formula for the class midpoint is:

$$m_i = \frac{LCL_i + UCL_i}{2} \quad\text{or}\quad m_i = \frac{LCB_i + UCB_i}{2}$$

As we shall see subsequently, the mid-point of each class interval is taken to represent it for the
purpose of statistical calculations.
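A minimal Python sketch of class width and class midpoint, applied to the 5-9, 10-14, 15-19 example. It also checks that the midpoint is the same whether computed from limits or from boundaries. The helper names are hypothetical.

```python
# Hypothetical helpers (not from the text).
def class_width(lcb, ucb):
    """W = UCB - LCB."""
    return ucb - lcb

def midpoint(lower, upper):
    """Class mark: average of the two class limits (or the two boundaries)."""
    return (lower + upper) / 2

limits = [(5, 9), (10, 14), (15, 19)]
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5)]

widths = [class_width(lcb, ucb) for lcb, ucb in boundaries]
marks_from_limits = [midpoint(l, u) for l, u in limits]
marks_from_boundaries = [midpoint(l, u) for l, u in boundaries]
print(widths)                 # [5.0, 5.0, 5.0]
print(marks_from_limits)      # [7.0, 12.0, 17.0]
print(marks_from_boundaries)  # [7.0, 12.0, 17.0]
```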
Steps to construct a frequency distribution
It is difficult to lay down hard and fast rules for constructing a frequency distribution. The following steps organize raw (ungrouped) data into a frequency distribution.

Step 1: Decide on the number of classes. The goal is to use just enough classes to reveal the shape of the distribution; too many or too few classes may hide the basic shape of the data set. A useful recipe for the number of classes (k) is the "2 to the k rule": select the smallest k such that 2^k > N. An alternative is Sturges' rule:

$$k = 1 + 3.322 \log_{10} N$$

where N = the total number of observations, k = the approximate number of classes, and log = the common logarithm (base 10).

Step 2: Determine the class interval (width). Generally, the class interval should be the same for all classes. Taken together, the classes must cover at least the distance from the lowest value in the raw data up to the highest value. Expressed as a formula:

$$i > \frac{H - L}{k}$$

where i = the class interval, H = the highest observed value, L = the lowest observed value, and k = the number of classes.

Step 3: Set the individual class limits. State clear class limits so that each observation can be put into only one class; avoid overlapping or unclear class limits.
Step 4: Tally the items in each class.
Step 5: Count the number of items in each class. The number of observations in each class is called the class frequency.
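Steps 1 and 2 can be sketched in Python as follows, using the N = 30, H = 69, L = 12 values of the example that follows. The helper names are hypothetical, and rounding i up to the next whole number above (H − L)/k is an assumption made for this sketch; any convenient value that satisfies the inequality would do.

```python
import math

# Hypothetical helpers (not from the text).
def classes_2_to_k(n):
    """Smallest k such that 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def classes_sturges(n):
    """Approximate number of classes, k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def interval_width(highest, lowest, k):
    """Smallest whole number i with i > (H - L) / k."""
    return math.floor((highest - lowest) / k) + 1

n = 30
k = classes_2_to_k(n)                 # 2**5 = 32 > 30, so k = 5
print(k)                              # 5
print(round(classes_sturges(n), 2))   # 5.91
print(interval_width(69, 12, k))      # 57 / 5 = 11.4, so i = 12
```

The two rules need not agree exactly (5 versus about 5.9 here); both are only guides.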
Example
The following are the marks of 30 students in statistics. Prepare a frequency distribution, taking a suitable class interval.

12 33 23 25 18 35 37 49 54 51 37 15
27 33 42 45 47 55 69 65 63 46 29 18
37 45 46 59 29 55

Required:
1. Determine the number of classes.
2. Classify the above data, taking a suitable class interval.
3. Determine the class limits.
4. Tally the number of items in each class.
5. Count the number of items in each class.
Solution
1. The number of observations is N = 30. The smallest k with 2^k > 30 is k = 5 (2^5 = 32), so the number of classes is 5.
2. i > (H − L)/k = (69 − 12)/5 = 57/5 = 11.4, so we take i = 12.
3. The lower limit of the first class is 10.
4 and 5: see the table below.


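Frequency distribution of the students' marks in Statistics (each class includes its lower limit and excludes its upper limit)

Marks                  Tally        Frequency
10 but less than 22    ||||         4
22 but less than 34    ||||| ||     7
34 but less than 46    ||||| ||     7
46 but less than 58    ||||| |||    8
58 but less than 70    ||||         4
Total                               30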

Tally means to count by making a mark for each item.
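Steps 3 to 5 can be checked mechanically. The Python below is a minimal sketch that classifies the 30 marks into five classes of width 12 starting at 10. It assumes, as in the table, that each class includes its lower limit and excludes its upper limit, and it prints one stroke per item as a tally. The helper name `frequency_distribution` is hypothetical.

```python
# Marks of the 30 students from the example.
marks = [12, 33, 23, 25, 18, 35, 37, 49, 54, 51, 37, 15,
         27, 33, 42, 45, 47, 55, 69, 65, 63, 46, 29, 18,
         37, 45, 46, 59, 29, 55]

# Hypothetical helper (not from the text).
def frequency_distribution(data, start, width, k):
    """Count observations in k classes [start, start + width), ...

    Each class includes its lower limit and excludes its upper limit,
    so every value falls into exactly one class.
    """
    counts = [0] * k
    for x in data:
        index = (x - start) // width
        if not 0 <= index < k:
            raise ValueError(f"{x} lies outside the classes")
        counts[index] += 1
    return counts

freqs = frequency_distribution(marks, start=10, width=12, k=5)
for j, f in enumerate(freqs):
    low = 10 + 12 * j
    print(f"{low} but less than {low + 12}: {'|' * f} {f}")
print("Total", sum(freqs))
```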

2.3. Graphic Methods of Data Presentation (Histograms, Polygons, Ogive, Pie-Charts, Bar
and Line Graphs)
There are many graphs, diagrams and charts used to present data; histograms, polygons, ogives, pie charts, bar graphs and line graphs are some of them. Here we look at only two of them, the histogram and the frequency polygon, using the frequency distribution of the example above.
One of the most common ways to portray a frequency distribution is the histogram. A histogram is a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

[Figure: Histogram Data Presentation; horizontal axis: Class Interval (10 to 70); vertical axis: Class Frequency]

A frequency polygon is similar to a histogram. It consists of line segments connecting the points formed by the intersection of the class midpoints and the class frequencies.

[Figure: Frequency polygon; horizontal axis: class midpoints (4 to 76); vertical axis: Class Frequency]
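The points of a frequency polygon can be sketched in Python as (midpoint, frequency) pairs. This sketch assumes the usual convention of adding an empty class before the first class and after the last class, so the polygon starts and ends on the horizontal axis; that is where the outer midpoints 4 and 76 on the axis come from. It uses the class layout of the example (start 10, width 12) and the class frequencies obtained by counting the 30 raw marks. The helper name `polygon_points` is hypothetical.

```python
# Hypothetical helper (not from the text).
def polygon_points(start, width, freqs):
    """(midpoint, frequency) vertices of a frequency polygon.

    Adds an empty class (frequency 0) before the first class and after
    the last one, so the polygon starts and ends on the horizontal axis.
    """
    k = len(freqs)
    mids = [start + width / 2 + width * j for j in range(-1, k + 1)]
    heights = [0] + list(freqs) + [0]
    return list(zip(mids, heights))

# Class frequencies counted from the 30 marks (classes of width 12 from 10).
for mid, f in polygon_points(10, 12, [4, 7, 7, 8, 4]):
    print(mid, f)
```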

Unit 3 - Measures of Central Tendency and Dispersion

Introduction: We need a single representative value that describes the entire mass of data given in a frequency distribution. This single representative value is called the central value, a measure of location, or an average, around which the individual values of a series cluster. It gives us the gist of the entire mass of data, and its value lies somewhere between the two extremes of the given observations. For this reason, such a central value or average is frequently called a measure of central tendency.

It should be clear to you that the concept of a measure of central tendency is concerned only with
quantitative variables and is undefined for qualitative variables as these are immeasurable on a
scale.

In contrast, measures of dispersion, or variability, are concerned with describing the variability
among the values. Several techniques are available for measuring the extent of variability in data
sets.
3.1. The Use of Summation Notation
The most important objective of calculating a measure of central tendency is to determine a “single figure” that may be used to represent a whole series of magnitudes of the same variable. In that sense, it is an even more compact description of the statistical data than the frequency distribution.

The capital letter ∑ (sigma) is the mathematical symbol for summation. If f(i) denotes some quantity whose value depends on the value of i, the expression

$$\sum_{i=1}^{n} f(i)$$

is read as “sigma f(i), i going from 1 to n” and means: insert 1 for i, then 2 for i, then 3 for i, and so on up to n, and sum the results. For example:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$

$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

$$\sum_{i=1}^{n} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n) = (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
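The three summation identities can be checked numerically on small hypothetical lists. This is a minimal Python sketch. It also shows that the sum of products is not, in general, the product of the sums.

```python
x = [2, 4, 6]   # hypothetical x_1..x_3
y = [1, 3, 5]   # hypothetical y_1..y_3

sum_x = sum(x)                                       # x1 + x2 + x3
sum_xy = sum(xi * yi for xi, yi in zip(x, y))        # x1*y1 + x2*y2 + x3*y3
sum_x_plus_y = sum(xi + yi for xi, yi in zip(x, y))  # (x1+y1) + ... + (x3+y3)

print(sum_x, sum_xy, sum_x_plus_y)  # 12 44 21
# The sum of (x_i + y_i) splits into the two separate sums ...
assert sum_x_plus_y == sum(x) + sum(y)
# ... but the sum of products is not the product of the sums (44 != 12 * 9).
assert sum_xy != sum(x) * sum(y)
```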

3.2. Central tendency measures

In statistics, there are various measures of central tendency (MCT). The most commonly used include the mean, median, mode, quartiles and percentiles. We now discuss the first three in detail.

3.2.1 Mean: There are four types of mean, each suitable for a particular type of data:
I. Arithmetic mean
II. Weighted mean
III. Geometric mean
IV. Harmonic mean

Here we discuss only the first one.


Arithmetic mean: The arithmetic mean of a sample is designated by the symbol x̄ (read “x bar”), and the mean of a population is designated by the Greek letter µ, pronounced “mu”. The symbol ∑ is the capital Greek letter sigma and is used in mathematics to denote the sum of values.

In classification and tabulation of data, we observed that the values of the variable or
observations could be put in the form of any of the following statistical series, namely:

1. Individual series or ungrouped data.


2. Discrete frequency distribution.
3. Continuous frequency distribution

1. Individual series or ungrouped data: Let X be a variable that takes the values x1, x2, x3, ..., xn in a sample of size n from a population of size N (n < N). The arithmetic mean (A.M.) of a set of observations is the sum of all the values in the series divided by the number of items in the series. That is, if x1, x2, x3, ..., xn are n sample observations, their arithmetic mean is

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n} \quad \text{(for sample data)}$$

Example 1: Suppose the scores of a student on seven examinations were 5, 10, 20, 7, 33, 60 and 68. Find the arithmetic mean of the student's scores.

There are seven observations. Symbolically, the arithmetic mean, also called simply the mean, is

$$\bar{x} = \frac{\sum x}{n} = \frac{5 + 10 + 20 + 7 + 33 + 60 + 68}{7} = \frac{203}{7} = 29$$
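A minimal Python sketch of the arithmetic mean for an individual series, checked against Example 1. The helper name `arithmetic_mean` is hypothetical; Python's standard `statistics.mean` gives the same result.

```python
# Hypothetical helper (not from the text).
def arithmetic_mean(values):
    """Sample mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of empty data is undefined")
    return sum(values) / len(values)

scores = [5, 10, 20, 7, 33, 60, 68]
print(arithmetic_mean(scores))  # 29.0
```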

2. Discrete frequency distribution: In a discrete frequency distribution, we multiply the values of the variable (X) by their respective frequencies (f) and obtain the sum of the products (∑Xf). This sum is then divided by the total of the frequencies, ∑f = N. Thus, the formula for the arithmetic mean becomes:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

Here, ∑xi fi = the sum of the products of the observations and their respective frequencies, and ∑fi = N = the sum of the frequencies.


That is, the calculation of the A.M. for a simple (discrete) frequency distribution is:

Value (xi)   Frequency (fi)   xi fi
x1           f1               x1 f1
x2           f2               x2 f2
...          ...              ...
xk           fk               xk fk
Total        ∑fi              ∑xi fi

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$
We will solve one example to understand it.

Example: The following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic mean of the wages.

Wages (in birr):   200   210   220   230   240   250   260
No. of workers:    5     15    32    42    15    12    4

Wages (x)   No. of workers (f)   fx
200         5                    1000
210         15                   3150
220         32                   7040
230         42                   9660
240         15                   3600
250         12                   3000
260         4                    1040
Total       N = ∑fi = 125        ∑xi fi = 28490

Solution:
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{28490}{125} = 227.92 \text{ birr}$$
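The ∑xf / ∑f calculation for the wages example can be sketched in Python as below. The helper name `mean_from_frequencies` is hypothetical.

```python
# Hypothetical helper (not from the text).
def mean_from_frequencies(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(x*f) / sum(f)."""
    total_f = sum(freqs)
    total_xf = sum(x * f for x, f in zip(values, freqs))
    return total_xf / total_f

wages = [200, 210, 220, 230, 240, 250, 260]
workers = [5, 15, 32, 42, 15, 12, 4]
print(sum(workers))                                     # 125
print(round(mean_from_frequencies(wages, workers), 2))  # 227.92
```

Rounding the result to two decimals also serves as a check on the hand calculation of 28490 / 125.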

3. Continuous frequency distribution (the Case of Grouped Data)


Example: The following table gives the marks of 58 students in Introduction to Statistics. Calculate the average mark of this group.

Marks    No. of students
0-10     4
10-20    8
20-30    11
30-40    15
40-50    12
50-60    6
60-70    2

Solution
Marks    Midpoint (mi)   No. of students (fi)   mi fi
0-10     5               4                      20
10-20    15              8                      120
20-30    25              11                     275
30-40    35              15                     525
40-50    45              12                     540
50-60    55              6                      330
60-70    65              2                      130
Total                    ∑fi = 58               ∑mi fi = 1940

So the arithmetic mean is

$$\bar{x} = \frac{\sum m_i f_i}{\sum f_i} = \frac{1940}{58} = 33.45 \text{ marks}$$
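For grouped data, each class is represented by its midpoint before applying ∑mf / ∑f. The Python below is a minimal sketch under the same assumption the text makes: values are spread fairly evenly within each class. The helper name `grouped_mean` is hypothetical.

```python
# Hypothetical helper (not from the text).
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data using class midpoints m_i.

    Assumes values are spread fairly evenly within each class, so each
    class can be represented by its midpoint.
    """
    mids = [(low + high) / 2 for low, high in classes]
    return sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
students = [4, 8, 11, 15, 12, 6, 2]
print(round(grouped_mean(classes, students), 2))  # 33.45
```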

Note that the midpoint of each class is taken as a good approximation of the true mean of the class. This rests on the assumption that the values are distributed fairly evenly throughout the interval; when the number of observations is large, this assumption is usually accepted.
The arithmetic mean is, however, strongly affected by extreme values. Example: the data 5, 9, 13, 12 and 16 have a mean of 11, but if 5 is replaced by 100, i.e. 100, 9, 13, 12, 16, the mean becomes 30.

3.2.2 Median: Median is defined as the value of the middle item (or the mean of the values of
the two middle items) when the data are arranged in an ascending or descending order of
magnitude.

Thus, for ungrouped data, if the n values are arranged in ascending or descending order of magnitude, the median is the middle value when n is odd. When n is even, the median is the mean of the two middle values:

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item if } n \text{ is odd}$$

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ item}}{2} \text{ if } n \text{ is even}$$

Suppose we have the following series: 15, 19, 21, 7, 33, 25, 18, 10 and 5

We have to first arrange it in either ascending or descending order. These figures are arranged in
an ascending order as follows:

5, 7, 10, 15, 18, 19, 21, 25, 33

As the series consists of an odd number of items (n = 9), we use the formula

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{9+1}{2}\right)^{\text{th}} = 5^{\text{th}} \text{ item}$$

That is, the size of the 5th item is the median. This happens to be 18.

Suppose the series has one more item, 23. We include 23 at the appropriate place, between 21 and 25, so the series is now 5, 7, 10, 15, 18, 19, 21, 23, 25, 33 (n = 10). Applying the formula, the median is the size of the (10 + 1)/2 = 5.5th item, so we take the average of the 5th and 6th items, 18 and 19, which gives a median of 18.5.
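The odd and even cases can be sketched in Python as below, reproducing both results. Python lists are indexed from 0, so the 5th item is `data[4]`. The helper name `median` is hypothetical; the standard `statistics.median` behaves the same way.

```python
# Hypothetical helper (not from the text).
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # the ((n + 1) / 2)th item
    return (data[mid - 1] + data[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th items

series = [15, 19, 21, 7, 33, 25, 18, 10, 5]
print(median(series))          # 18
print(median(series + [23]))   # 18.5
```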

Determination of Median in a Continuous Frequency Distribution

In the case of a continuous frequency distribution, we first locate the median class by cumulating the frequencies until the (N/2)th point is reached. The median is then calculated with the following formula:

$$\text{Median} = LCB + \frac{\frac{N}{2} - Cf}{f} \times w$$

where Cf = the “less than” cumulative frequency of the class preceding (one before) the median class, f = the frequency of the median class, LCB = the lower class boundary of the median class, w = the class width, and $N = \sum_{i=1}^{k} f_i$.

Let us take an example of a frequency distribution for which the median is to be calculated.

Monthly wages (in birr)   No. of workers
800-1,000                 18
1,000-1,200               25
1,200-1,400               30
1,400-1,600               34
1,600-1,800               26
1,800-2,000               10
Total                     143
Solution: To calculate the median, we first add a cumulative frequency column to the table:

Monthly wages    Frequency    Cumulative frequency
800-1,000        18           18
1,000-1,200      25           43
1,200-1,400      30           73
1,400-1,600      34           107
1,600-1,800      26           133
1,800-2,000      10           143

The median is the value of the N/2 = 143/2 = 71.5th item, which lies in the class 1,200-1,400 (the first class whose cumulative frequency, 73, reaches 71.5). Thus 1,200-1,400 is the median class. To determine the median within this class, we use the interpolation formula:

$$\text{Median} = LCB + \frac{\frac{N}{2} - Cf}{f} \times w = 1200 + \frac{71.5 - 43}{30} \times 200 = 1200 + 190 = 1390 \text{ birr}$$
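The search for the median class and the interpolation step can be sketched in Python as below. This sketch assumes continuous classes of equal width, so each lower limit is also the lower class boundary. The helper name `grouped_median` is hypothetical.

```python
# Hypothetical helper (not from the text).
def grouped_median(lower_boundaries, width, freqs):
    """Median = LCB + (N/2 - Cf) * w / f for the class containing the (N/2)th item."""
    n = sum(freqs)
    half = n / 2
    cf = 0  # "less than" cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches N/2
            return lcb + (half - cf) * width / f
        cf += f
    raise ValueError("frequencies do not reach N/2")

lower = [800, 1000, 1200, 1400, 1600, 1800]
workers = [18, 25, 30, 34, 26, 10]
print(grouped_median(lower, 200, workers))  # 1390.0
```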

3.2.3 Mode (X̂)
The mode is another measure of central tendency. It is the value around which the items are most heavily concentrated, i.e. the value that occurs most often.
A given set of data may have:
• One mode (unimodal), e.g. 3, 3, 7, 6, 2, 1: X̂ = 3
• Two modes (bimodal), e.g. 10, 10, 9, 9, 6, 3, 2, 1: X̂ = 10 and 9
• More than two modes (multimodal), e.g. 5, 5, 5, 6, 6, 6, 8, 8, 8, 2, 3, 2: X̂ = 5, 6, 8
• No mode at all, e.g. 1, 3, 2, 4, 5, 6, 7, 8: no modal value

As an example, consider the following series: 8, 9, 11, 15, 16, 12, 15, 3, 7, 15

There are ten observations in the series, in which the value 15 occurs the maximum number of times (three). The mode is therefore 15.
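A minimal Python sketch that finds all modes of a data set, following the text's convention that a data set in which every value occurs only once has no mode. The helper name `modes` is hypothetical. Note that Python's standard `statistics.multimode` differs on that last case: it returns every value when all values occur once.

```python
from collections import Counter

# Hypothetical helper (not from the text).
def modes(values):
    """All values with the highest frequency; [] if every value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no modal value
    # Counter keeps first-occurrence order, so modes are listed in that order.
    return [v for v, c in counts.items() if c == top]

print(modes([8, 9, 11, 15, 16, 12, 15, 3, 7, 15]))  # [15]
print(modes([10, 10, 9, 9, 6, 3, 2, 1]))            # [10, 9]
print(modes([1, 3, 2, 4, 5, 6, 7, 8]))              # []
```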
Note that in a discrete frequency distribution, the mode is the value of the variable corresponding to the maximum frequency. This method is convenient when only one value has the highest concentration of observations.

Example: Determine the modal value of the following distribution.

X   1   2   3    4    5    6    7    8    9
F   3   1   18   25   40   30   22   10   6

Solution: The maximum frequency is 40, so the corresponding value X = 5 is the mode.

In the case of grouped data, the mode is determined by the following formula:

$$\text{Mode} = \hat{X} = l_o + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times w$$

where
l_o = the lower value (boundary) of the class in which the mode lies (the modal class),
f1 = the frequency of the modal class,
f0 = the frequency of the class preceding the modal class,
f2 = the frequency of the class succeeding the modal class,
w = the class width of the modal class.

While applying the above formula, we should ensure that the class-intervals are uniform
throughout. If the class-intervals are not uniform, then they should be made uniform on the
assumption that the frequencies are evenly distributed throughout the class. In the case of
unequal class-intervals, the application of the above formula will give misleading results.

Example: Let us take the following frequency distribution:

Class intervals   Frequency
30-40             4
40-50             6
50-60             8
60-70             12
70-80             9
80-90             7
90-100            4

We have to calculate the mode of this series.

Solution: Column (2) of the table shows that the maximum frequency, 12, lies in the class interval 60-70, so the mode lies in this class. Applying the formula given earlier with l_o = 60, f1 = 12, f0 = 8, f2 = 9 and w = 10, we get:

$$\text{Mode} = 60 + \frac{12 - 8}{(12 - 8) + (12 - 9)} \times 10 = 60 + \frac{4}{7} \times 10 = 65.7 \text{ (approx.)}$$

In many cases, the class interval in which the mode lies can be identified just by inspection: find the highest frequency and identify the class interval to which it belongs. Having done this, apply the formula for the mode of a grouped frequency distribution.
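The grouped-data mode formula can be sketched in Python as below. The sketch assumes equal class widths, as the text requires, and treats a missing neighbour of the first or last class as having frequency 0 (an assumption of this sketch, not of the text). The helper name `grouped_mode` is hypothetical.

```python
# Hypothetical helper (not from the text).
def grouped_mode(lower_boundaries, width, freqs):
    """Mode = l0 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * w, for equal class widths."""
    j = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class (first if tied)
    f1 = freqs[j]
    f0 = freqs[j - 1] if j > 0 else 0                   # assumed 0 before the first class
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0      # assumed 0 after the last class
    denominator = (f1 - f0) + (f1 - f2)
    if denominator == 0:
        raise ValueError("formula undefined: modal class and neighbours are equal")
    return lower_boundaries[j] + (f1 - f0) / denominator * width

lower = [30, 40, 50, 60, 70, 80, 90]
freqs = [4, 6, 8, 12, 9, 7, 4]
print(round(grouped_mode(lower, 10, freqs), 1))  # 65.7
```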

3.3. Measures of dispersion
In the preceding section we saw the measures of central tendency. A measure of central tendency alone, however, is not enough to understand the characteristics of the data we have collected; it is also important to know the extent of variation among the data. To describe a data set, we therefore use measures of variation in addition to measures of central tendency.

Literal meaning of dispersion is scatter or spread. Dispersion is the degree of the scatter or
variation of the variables about a central value. A measure of variation is designed to state the
extent to which the individual measures differ on an average from the mean. This section
discusses the methods used in measuring the extent of variations in the data we have collected.

Consider the following data sets, each of which has a mean of 50:

Set 1: 60 40 30 50 60 40 70 50
Set 2: 50 49 49 51 48 50 53 50

The two data sets given above have a mean of 50, but obviously set 1 is more “spread out” than
set 2. How do we express this numerically? The object of measuring this scatter or dispersion is
to obtain a single summary figure which adequately exhibits whether the distribution is compact
or spread out. Some of the commonly used measures of dispersion (variation) are: Range,
variance, and standard deviation.

1. Range
The simplest (and crudest) measure of spread is the range: the difference between the highest and lowest values, i.e. simply the highest value minus the lowest value.

Range = Maximum Value – Minimum Value

The range is a measure of absolute dispersion and therefore cannot be usefully employed to compare the variability of two distributions expressed in different units. It does not use all the available observations, only the two extreme values. Because it uses only the largest and smallest values, it is greatly affected by extreme values; that is, it is not a resistant measure.

In our example the highest score is a mark of 90 and the lowest is 0. The range is therefore 90.
This measure is a little crude; it sets the boundaries to the scores but does not tell us anything
about their general spread. Indeed, even if our marks were evenly spread between 0 and 90 rather
than clustered in the 50s, our range would still be 90.

In our example given above (the two data sets)

* The range of data in set 1 is 70-30 =40


* The range of data in set 2 is 53-48 =5
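A minimal Python sketch of the range for the two data sets. Both sets have the same mean, yet their ranges differ sharply. The helper name `value_range` is hypothetical.

```python
# Hypothetical helper (not from the text).
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

set1 = [60, 40, 30, 50, 60, 40, 70, 50]
set2 = [50, 49, 49, 51, 48, 50, 53, 50]
print(sum(set1) / len(set1), sum(set2) / len(set2))  # 50.0 50.0
print(value_range(set1), value_range(set2))          # 40 5
```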
Conclusions on the range:
1. Since it is based on the two extreme cases in the distribution, the range may change considerably if either extreme case drops out, while the removal of any other case would not affect it at all.
2. It wastes information, because it takes no account of the rest of the data.
3. The extreme values may be unreliable; that is, they are the most likely to be faulty.
4. It is not suitable for the mathematical treatment required in deriving the techniques of statistical inference.
2. Variance
The range involves only the smallest and largest numbers, and it would be desirable to have a statistic that involves all of the data values. A first attempt might be the average deviation from the mean, defined as:

$$\frac{\sum (X - \bar{X})}{n}$$

The problem is that this summation is always zero, because the positive and negative deviations cancel out. This signed average deviation is therefore always zero, which is why it is never used. Taking the absolute value of each deviation instead gives the mean absolute deviation (MAD), sometimes called the “average deviation”:

$$\text{Sample MAD} = \frac{\sum \lvert X - \bar{X} \rvert}{n}$$
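The two facts above, that signed deviations from the mean always sum to zero and that absolute deviations do not, can be sketched in Python on the two data sets used earlier. The helper names are hypothetical.

```python
# Hypothetical helpers (not from the text).
def deviations(values):
    """Deviations X - mean for each value."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def mean_absolute_deviation(values):
    """MAD = sum(|X - mean|) / n."""
    return sum(abs(d) for d in deviations(values)) / len(values)

set1 = [60, 40, 30, 50, 60, 40, 70, 50]
set2 = [50, 49, 49, 51, 48, 50, 53, 50]
print(sum(deviations(set1)))           # 0.0: signed deviations cancel out
print(mean_absolute_deviation(set1))   # 10.0
print(mean_absolute_deviation(set2))   # 1.0
```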

Population Variance

To keep the sum from being zero, each deviation from the mean is squared, giving the “squared deviation from the mean”. The average squared deviation from the mean is called the variance.
• Variance is the average of the squared distances of the values from the mean.
• The population variance (σ²) of N measurements is the sum of the squared deviations from the mean divided by N.
• The symbol for the population variance is σ² (read as “sigma squared”).
• The formula for the population variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where X = an individual value, µ = the population mean, and N = the population size.

One would expect the sample variance to be simply the population variance with the population mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the corresponding parameter, and dividing by n gives an estimate that is, on average, smaller than the population variance. To counteract this, the sum of the squared deviations is divided by one less than the sample size. The sample variance (S²) of n measurements is the sum of the squared deviations from the mean divided by (n − 1):

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

3. Standard Deviation

There is a problem with variances: because the deviations are squared, their units are squared too. To get back to the units of the original data, we take the square root.
• The standard deviation (SD) is the positive square root of the variance.
• The symbol for the population standard deviation is σ:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
• The corresponding formula for the sample standard deviation, S, is:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Steps to compute the sample variance and standard deviation:
1. Total the data values.
2. Divide by the number of values to get the mean.
3. Subtract the mean from each value to get the deviations (the second column of the table below).
4. Square each deviation to get the values in the third column.
5. Total the numbers in the third column.
6. Divide this total by one less than the sample size to get the variance.
7. Take the square root of the variance to get the sample standard deviation.

Example: Compute the range, variance and standard deviation of the following sample data: 4, 5, 3, 6, 5. The mean is x̄ = 23/5 = 4.6.

X      X − x̄               (X − x̄)²
4      4 − 4.6 = −0.6       (−0.6)² = 0.36
5      5 − 4.6 = 0.4        (0.4)² = 0.16
3      3 − 4.6 = −1.6       (−1.6)² = 2.56
6      6 − 4.6 = 1.4        (1.4)² = 1.96
5      5 − 4.6 = 0.4        (0.4)² = 0.16
23     0.00 (always)        5.2

$$S^2 = \frac{5.2}{5 - 1} = \frac{5.2}{4} = 1.3$$

S = √1.3 = 1.1402

Range = 6 − 3 = 3.
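The seven steps, and the population versions of the formulas, can be sketched in Python and checked against the example. The helper names are hypothetical. The standard `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by N) are printed alongside as a cross-check.

```python
import math
import statistics

# Hypothetical helpers (not from the text).
def sample_variance(values):
    """S^2 = sum((X - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [4, 5, 3, 6, 5]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4), max(data) - min(data))  # 1.3 1.1402 3
# Cross-check with the standard library.
print(statistics.variance(data), statistics.pvariance(data))
```

Dividing by N instead of n − 1 gives 5.2 / 5 = 1.04 for the same data, which illustrates why the sample formula uses n − 1.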

Conclusion about range, variance, & standard deviation


The Range: to find the range, you first need to find the lowest and highest values in the data. The
range is found by subtracting the lowest value from the highest value.
Population standard deviation: This measure of variation is universally used to show the scatter
of the individual measurements around the mean of all the measurements in a given distribution.
Note that the sum of the deviations of the individual observations of a sample about the sample
mean is always 0.
The square of the standard deviation is called the variance. The variance is a very useful measure of variability because it uses the information provided by every observation in the sample, and it is easy to handle mathematically. Its main disadvantage is that its units are the squares of the units of the original observations. Thus, if the original observations were, for example, heights in cm, the units of the variance of the heights are cm². The easiest way around this difficulty is to use the square root of the variance (i.e., the standard deviation) as the measure of variability.

Unit 4 - Probability and probability distribution

4.1 Probability Theory: No doubt you are familiar with terms such as probability, chance and likelihood; they are often used interchangeably. A manufacturer cannot be sure of the future demand for his product. Our world is full of uncertainty: no one knows exactly what will happen in the next minute or the next hour, but we can estimate the chance that something will happen. The word probability or chance is very commonly used in day-to-day conversation, and people generally have some idea what it means. Terms like possible, probable and likely have similar meanings.

4.1.1 Basic definitions: Probability can be defined as a measure of the likelihood that a particular event will occur, or as the science of decision making with calculated risk in the face of uncertainty. It is a numerical measure of such likelihood, with a value between 0 and 1. A probability of zero indicates that the event cannot occur, and a probability of one indicates that it is certain to occur.

Probability is a value between zero and one, inclusive, describing the relative possibility
(chance or likelihood) an event will occur.

4.1.2 Fundamental concepts

Experiment: An experiment or trial is an act that can be repeated under identical conditions. It is a process that leads to the occurrence of one, and only one, of several possible observations; in other words, it is a planned process of observing or measuring something whose outcome is uncertain.

Example: Throwing a die, tossing a coin are the examples of experiment or trial.

Outcome: An outcome is a result of an experiment. Examples:
If we throw a die, we get 1, 2, 3, 4, 5 or 6; individually, 1 is an outcome, 2 is an outcome, and so on.
If we toss a coin, we get a head or a tail; individually, head and tail are two outcomes.
Sample space: A sample space is the set of all possible outcomes.

Example
If we throw a die, the outcomes are 1, 2, 3, 4, 5 and 6, so S = {1, 2, 3, 4, 5, 6} is the sample space.
If we toss a coin, the outcomes are head (H) and tail (T), so S = {H, T} is the sample space.
Event: An event is a collection of one or more outcomes of an experiment. Example:
If we throw a die, the outcomes are 1, 2, 3, 4, 5 and 6, and the even-numbered outcomes are 2, 4 and 6. Then A = {2, 4, 6} is the event “an even number”.

Mutually exclusive events: Events are mutually exclusive when the occurrence of one of them means that none of the others can occur at the same time.
Example
If we toss a coin, the two outcomes head (H) and tail (T) are mutually exclusive events, because the coin shows either a head or a tail, never both at the same time.

Formally, two events A and B are mutually exclusive if and only if A ∩ B = ∅.
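Because a sample space is a set and events are its subsets, Python's built-in `set` type is a natural way to sketch these definitions. Events B and C below are hypothetical examples added for illustration, as is the helper name `mutually_exclusive`.

```python
# Sample space for one throw of a die, and some events (subsets of S).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # even numbers (from the text)
B = {1, 3, 5}      # odd numbers (hypothetical second event)
C = {4, 5, 6}      # "greater than 3" (hypothetical third event)

# Hypothetical helper (not from the text).
def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive when their intersection is empty."""
    return not (e1 & e2)

print(mutually_exclusive(A, B))  # True: no outcome is both even and odd
print(mutually_exclusive(A, C))  # False: 4 and 6 belong to both
print(A <= S, B <= S, C <= S)    # True True True: every event is a subset of S
```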


Exhaustive events
The total number of possible outcomes in any trial is known as exhaustive events or exhaustive
cases.
Example
In tossing of a coin there are two exhaustive cases viz, head and tail.
In throwing of a die, there are six exhaustive cases since any one of the 6 faces 1, 2, 3, 4, 5, 6
may come upper most.
Approaches to Assigning probabilities
Two approaches to assigning probabilities will be discussed, namely the objective and the subjective viewpoints. Objective probability is further subdivided into two: (1) the mathematical, classical or a priori definition of probability and (2) the statistical, empirical or a posteriori definition of probability.
(1) Mathematical, classical or a priori probability
Classical probability is based on the assumption that the outcomes of an experiment are equally likely. Using the classical viewpoint, the probability of an event is computed by dividing the number of favorable outcomes by the number of possible outcomes:

$$\text{Probability of an event} = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$$

Business StatisticsPage 40
Equally likely means that each outcome of an experiment has the same chance of
happening as only other.
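A small sketch of the classical formula, assuming a finite sample space of equally likely outcomes (exactly the assumption the classical definition needs). The helper name `classical_probability` is hypothetical; `Fraction` keeps the answer as an exact ratio.

```python
from fractions import Fraction


def classical_probability(event: set, sample_space: set) -> Fraction:
    """P(event) = favorable outcomes / total outcomes.

    Assumes sample_space is finite and its outcomes are equally likely.
    """
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    favorable = event & sample_space   # only outcomes that can actually occur
    return Fraction(len(favorable), len(sample_space))


die = {1, 2, 3, 4, 5, 6}
print(classical_probability({2, 4, 6}, die))       # 1/2 (even number)
print(classical_probability({"H"}, {"H", "T"}))    # 1/2 (head on a coin)
```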
Limitations:
- The classical definition fails when the total number of possible outcomes is infinite.
- It is not always possible to enumerate the possible outcomes.

(2) Statistical or empirical probability

Another way to define probability is based on relative frequencies. If a trial is repeated a number of times under essentially homogeneous and identical conditions, then the limiting value of the ratio of the number of times the event happens to the number of trials, as the number of trials becomes indefinitely large, is called the probability of the event happening. It is assumed that the limit is finite and unique.

Symbolically, if in n trials an event E happens m times, then the probability p of the happening of E is given by

$$p = P(E) = \lim_{n \to \infty} \frac{m}{n}$$

In practice, this probability is estimated from past data as

$$\text{Probability of event happening} = \frac{\text{Number of times the event occurred in the past}}{\text{Total number of observations}}$$

Example: Suppose that an insurance company knows from past data that, of all males 40 years old, about 60 out of every 100,000 will die within a one-year period.

Using this method, the company estimates the probability of death for that age group as:
$$\frac{60}{100{,}000} = 0.0006$$
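The relative-frequency idea can be sketched in a few lines. The insurance figure comes straight from the example; the die simulation is our own illustration (a fixed random seed, not data from the text) of how m/n settles near a fixed value as n grows. Function names are hypothetical.

```python
import random


def relative_frequency(times_occurred: int, observations: int) -> float:
    """Empirical probability m / n estimated from past observations."""
    return times_occurred / observations


def simulated_six_frequency(n: int, seed: int = 42) -> float:
    """Relative frequency of a six in n simulated throws of a fair die."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return relative_frequency(sixes, n)


# Insurance example: 60 deaths per 100,000 males aged 40
print(relative_frequency(60, 100_000))   # 0.0006

# As n grows, m / n settles near 1/6 ≈ 0.1667
for n in (100, 10_000, 200_000):
    print(n, round(simulated_six_frequency(n), 4))
```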

Subjective probability

The probability that a person assigns to an event (a possible outcome of some process) on the basis of his or her own judgment, beliefs and information about the process is known as subjective probability.

For example, one fine morning Mr. X may well be prepared for rain, while his friend Mr. Y may not, because each assigns a different subjective probability to rain.

Properties of probability
Let E be an experiment, and let S be a sample space associated with E. With each event A we associate a real number, denoted by P(A) and called the probability of A, satisfying the following properties:
- $0 \le P(A) \le 1$.
- $P(S) = 1$.
- If A and B are mutually exclusive events, $P(A \cup B) = P(A) + P(B)$.
Rules for computing probabilities
Probabilities of two or more events are computed by applying the rules of addition and multiplication.
Special rule of addition: if A and B are mutually exclusive, $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$. Mutually exclusive means that when one event occurs, none of the other events can occur at the same time, e.g. flipping a coin.
Example
If we toss a coin, what is the probability of a head or a tail?
Solution
Here there are two events, namely A = H and B = T. Since they are mutually exclusive,
$$P(A \text{ or } B) = P(A) + P(B) = 0.5 + 0.5 = 1.$$


General rule of addition: The outcomes of an experiment may not be mutually exclusive. The probability that two events both occur is called a joint probability.

Joint probability: a probability that measures the likelihood that two or more events will happen concurrently. The general rule of addition is
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
where the joint probability $P(A \cap B)$ is also written P(AB).
Example

Mr. X feels that the probability that he will pass Mathematics is 0.66666 (2/3) and that he will pass Statistics is 0.83333 (5/6). If the probability that he will pass both courses is 0.6, what is the probability that he will pass at least one of the courses?

Solution

Let M and S be the events that he will pass Mathematics and Statistics respectively. The event $M \cup S$ means that at least one of M or S occurs. Therefore,
$$P(\text{he passes at least one course}) = P(M \cup S) = P(M) + P(S) - P(M \cap S) = \frac{2}{3} + \frac{5}{6} - \frac{3}{5} = \frac{9}{10}.$$
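A minimal sketch of the addition rules using exact fractions. It assumes, as the solution does, that 0.66666 and 0.83333 stand for 2/3 and 5/6. The helper `p_union` is hypothetical; passing no joint probability reduces it to the special rule for mutually exclusive events.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_a_and_b=0):
    """General rule of addition: P(A or B) = P(A) + P(B) - P(A and B).

    With p_a_and_b = 0 this is the special rule for mutually exclusive events.
    """
    return p_a + p_b - p_a_and_b


# Coin: head or tail (mutually exclusive, so no joint term)
print(p_union(Fraction(1, 2), Fraction(1, 2)))                 # 1

# Mr. X: P(M) = 2/3, P(S) = 5/6, P(M and S) = 3/5
print(p_union(Fraction(2, 3), Fraction(5, 6), Fraction(3, 5)))  # 9/10
```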
Complement rule
The complement rule is used to determine the probability of an event occurring by subtracting the probability of the event not occurring from 1, i.e. $P(A) = 1 - P(\bar{A})$, where $\bar{A}$ denotes the event that A does not occur.

Example

Weight         Event   Probability
Underweight    A       0.025
Satisfactory   B       ?
Overweight     C       0.075

Find P(B).
Solution
Since "not B" means A or C, and A and C are mutually exclusive,
$$P(B) = 1 - P(\bar{B}) = 1 - [P(A) + P(C)] = 1 - 0.025 - 0.075 = 0.90.$$
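The complement rule in code, as a short sketch. The function name `complement` is our own; the weight figures come from the table, and the range check simply guards against values that are not probabilities.

```python
def complement(p: float) -> float:
    """Complement rule: P(not A) = 1 - P(A)."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie in [0, 1]")
    return 1 - p


# Weight example: "not B" is "A or C", and A, C are mutually exclusive
p_not_b = 0.025 + 0.075
p_b = complement(p_not_b)
print(round(p_b, 2))   # 0.9
```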

Rule of multiplication: there are two rules of multiplication, the special rule of multiplication and the general rule of multiplication.

Special rule of multiplication: the special rule of multiplication requires that two events A and B be independent. Two events are independent if the occurrence of one event does not alter the probability of the occurrence of the other event; that is, the occurrence of one event has no effect on the probability of the occurrence of the other.

Special rule of multiplication: $P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B)$

For three independent events A, B and C, the special rule of multiplication gives the probability that all three events will occur as $P(A \text{ and } B \text{ and } C) = P(A)\,P(B)\,P(C)$.
Example
A company has two large computers. The probability that the newer one will break down in any particular month is 0.05, and the probability that the older one will break down in any particular month is 0.1. Assuming the breakdowns are independent, what is the probability that both will break down in a particular month?
Solution
Let A be the event that the newer one breaks down and B the event that the older one breaks down, so that P(A) = 0.05 and P(B) = 0.1. Then
$$P(A \text{ and } B) = P(A)\,P(B) = 0.05 \times 0.1 = 0.005.$$
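A sketch of the special rule of multiplication for any number of independent events, using `math.prod` (Python 3.8+). The helper name is hypothetical, and the result is only valid under the independence assumption stated in the rule.

```python
from math import prod


def p_all_independent(*probs: float) -> float:
    """Special rule of multiplication: P(A and B and ...) = P(A) * P(B) * ...

    Valid only when the events are independent.
    """
    return prod(probs)


# Computers example: P(newer breaks down) = 0.05, P(older breaks down) = 0.1
print(round(p_all_independent(0.05, 0.1), 6))   # 0.005
```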
General rule of multiplication: if two events are not independent, they are referred to as dependent. We use the general rule of multiplication to find the joint probability of two events when the events are not independent.

Remark: if A and B each have a nonzero probability and are mutually exclusive, they are necessarily dependent; that is, they cannot be independent. On the other hand, if A and B are not mutually exclusive, they may be either independent or dependent.

General rule of multiplication: $P(A \text{ and } B) = P(A)\,P(B \mid A)$

Conditional probabilities are computed using the general rule of multiplication. A conditional probability is the likelihood that an event will happen, given that another event has already happened.

Example
There are 10 rolls of film in a box, 3 of which are defective. Two rolls are selected one after another, without replacement. What is the probability of selecting a defective roll followed by another defective roll?
Solution
Let $D_1$ be the event that the first roll selected from the box is defective. Then $P(D_1) = 3/10$.

Let $D_2$ be the event that the second roll selected is defective. After the first selection was found to be defective, only 2 defective rolls of film remained in the box containing 9 rolls, so $P(D_2 \mid D_1) = 2/9$.
So the probability of two defectives is
$$P(D_1 \text{ and } D_2) = P(D_1)\,P(D_2 \mid D_1) = \frac{3}{10} \times \frac{2}{9} = \frac{6}{90} \approx 0.07.$$
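The general rule of multiplication can be checked by brute force. This sketch computes P(D1) × P(D2 | D1) exactly and then, as a cross-check we added, lists every ordered pair of two different rolls; numbering the defective rolls 0 to 2 is an arbitrary labelling choice.

```python
from fractions import Fraction
from itertools import permutations

# General rule of multiplication: P(D1 and D2) = P(D1) * P(D2 | D1)
p_d1 = Fraction(3, 10)           # 3 defective rolls out of 10
p_d2_given_d1 = Fraction(2, 9)   # 2 defective rolls left out of 9
p_both = p_d1 * p_d2_given_d1
print(p_both, float(p_both))     # 1/15 0.06666666666666667

# Cross-check: every ordered pair of two different rolls (no replacement).
# Rolls 0, 1 and 2 are taken to be the defective ones.
pairs = list(permutations(range(10), 2))   # 10 * 9 = 90 ordered pairs
both_defective = [p for p in pairs if p[0] < 3 and p[1] < 3]
print(Fraction(len(both_defective), len(pairs)))   # 1/15
```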

Bayes' Theorem: Bayes' theorem is a method of revising a probability when additional information is obtained. For two mutually exclusive and collectively exhaustive events A and A' (not A), given new information B:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A')\,P(B \mid A')}$$

P(A) is known as the prior probability: the initial probability based on the present level of information. $P(A \mid B)$ is called the posterior probability: a revised probability based on the additional information B.

Example: One night, a speeding taxi struck a man as he crossed the street. An eyewitness testified that she thought the taxi (which did not stop) was blue. The man sued the Blue Cab Company for his medical expenses. The city where the accident occurred has only two taxi companies: Blue Cab and Green Cab. Green Cab has 85 percent of the taxis in the city. At the trial, the man's lawyer shows that the eyewitness is 80 percent reliable in identifying the color of taxis; that is, she was able to identify the color of taxis correctly 80 percent of the time under conditions like those of the night of the accident. The lawyer concludes that it is extremely likely that a Blue Cab taxi hit the man.

Do you agree? Why or why not?

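Solution:
Let B = the taxi was blue, G = the taxi was green, and E = the eyewitness thought the taxi was blue.
Given: $P(E \mid B) = 0.8$, $P(E \mid G) = 0.2$, $P(B) = 0.15$, $P(G) = 0.85$. Required: $P(B \mid E)$.
$$P(B \mid E) = \frac{P(B)\,P(E \mid B)}{P(B)\,P(E \mid B) + P(G)\,P(E \mid G)} = \frac{0.15 \times 0.8}{0.15 \times 0.8 + 0.85 \times 0.2} = \frac{0.12}{0.29} \approx 0.41$$
So, even with the eyewitness testimony, the probability that the taxi was blue is only about 0.41; the lawyer's conclusion that it is extremely likely is not justified.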
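A short sketch of Bayes' theorem for a hypothesis and its complement, applied to the taxi example. The function name and argument names are hypothetical; the denominator is the total probability of the evidence, P(E).

```python
def bayes_posterior(prior: float, p_evidence_given_h: float,
                    p_evidence_given_not_h: float) -> float:
    """Posterior P(H | E) for a hypothesis H and its complement H'.

    H and H' are assumed mutually exclusive and collectively exhaustive.
    """
    numerator = prior * p_evidence_given_h
    p_evidence = numerator + (1 - prior) * p_evidence_given_not_h   # P(E)
    return numerator / p_evidence


# Taxi example: P(B) = 0.15, P(E | B) = 0.8, P(E | G) = 0.2
posterior = bayes_posterior(0.15, 0.8, 0.2)
print(round(posterior, 4))   # 0.4138
```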

Principles of counting: if the number of possible outcomes in an experiment is small, it is relatively easy to count them. There are six possible outcomes, for example, resulting from the roll of a die. However, when the number of possible outcomes is large, such as the number of head-and-tail sequences in an experiment of 10 tosses, it would be tedious to count them. Therefore, to facilitate counting, three counting formulas will be examined: the multiplication formula, the permutation formula and the combination formula.

i. Multiplication formula: if there are m ways of doing one thing and n ways of doing another thing, there are m × n ways of doing both.

Multiplication formula: total number of arrangements = (m)(n), or (m)(n)(o) for three things, and so on.

Example: Pioneer manufactures 3 models of stereo receivers, 2 cassette decks, 4 speakers and 3 CD carousels. When the 4 types of components are sold together, they form a "system". How many different systems can the electronics firm offer?

Solution: Arrangements = 3 × 2 × 4 × 3 = 72
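The multiplication formula can be verified by listing every system explicitly. In this sketch the component counts come from the Pioneer example; `itertools.product` builds one choice per component type, and the dictionary keys are our own labels.

```python
from itertools import product
from math import prod

# Number of models available for each component type (Pioneer example)
choices = {"receiver": 3, "cassette deck": 2, "speaker": 4, "CD carousel": 3}

# Multiplication formula: (m)(n)(o)...
print(prod(choices.values()))   # 72

# Brute-force check: one model index chosen from each component type
systems = list(product(*(range(k) for k in choices.values())))
print(len(systems))             # 72
```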

ii. Permutation formula: is applied to find the possible number of arrangements when there is only one group of objects. A permutation is an arrangement in which the order of the objects selected from a specific pool of objects is important.

Permutation formula:
$${}_nP_r = \frac{n!}{(n-r)!}$$
where:

n is the total number of objects.

r is the number of objects selected.

Example: a machine operator must make 4 safety checks before starting to machine a part. The checks may be made in any order. How many different orders are possible for making the checks?
Solution:
$${}_4P_4 = \frac{4!}{(4-4)!} = \frac{4!}{0!} = \frac{4 \times 3 \times 2 \times 1}{1} = 24$$
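A sketch of the permutation formula, checked against Python's `math.perm` (3.8+) and against an explicit listing. Labelling the four safety checks "A" to "D" is a hypothetical choice for illustration.

```python
from itertools import permutations
from math import factorial, perm


def n_p_r(n: int, r: int) -> int:
    """nPr = n! / (n - r)!: ordered selections of r objects out of n."""
    return factorial(n) // factorial(n - r)


print(n_p_r(4, 4), perm(4, 4))          # 24 24
# Listing every order of the checks A, B, C, D gives the same count
print(len(list(permutations("ABCD"))))  # 24
```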

iii. Combination formula: if the order of the selected objects is not important, any selection is called a combination. The formula to count the number of combinations of r objects from a set of n objects is:

Combination formula:
$${}_nC_r = \frac{n!}{r!\,(n-r)!}$$
Example: a pollster selected 4 of 10 available people. How many different groups of 4 are possible?
Solution:
$${}_{10}C_4 = \frac{10!}{4!\,(10-4)!} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{4!\,6!} = \frac{5040}{24} = 210$$
Permutations and combinations use a notation called n factorial. It is written n! and means the product n(n - 1)(n - 2)…(1). For example, 5! = 5 × 4 × 3 × 2 × 1 = 120. Zero factorial, written 0!, is defined to be 1; that is, 0! = 1.
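A sketch of the combination formula and the factorial convention, using `math.comb` and `math.factorial` from the standard library. The hand-written `n_c_r` mirrors the formula, and listing the groups with `itertools.combinations` confirms the count for the pollster example.

```python
from itertools import combinations
from math import comb, factorial


def n_c_r(n: int, r: int) -> int:
    """nCr = n! / (r! (n - r)!): unordered selections of r objects out of n."""
    return factorial(n) // (factorial(r) * factorial(n - r))


print(n_c_r(10, 4), comb(10, 4))               # 210 210
print(len(list(combinations(range(10), 4))))   # 210 groups of 4 people
print(factorial(5), factorial(0))              # 120 1
```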

4.2. Probability Distribution


4.2.1 A probability distribution shows the possible outcomes of an experiment and the
probability of each of these outcomes.

Or it is a listing of all the outcomes of an experiment and the probability associated with each
outcome.

Example

To begin our study of probability distributions, let's go back to the idea of a fair coin. Suppose we toss a fair coin twice. The possible outcomes are:

Possible outcomes from two tosses of a fair coin

First toss   Second toss   Number of heads   Probability of the four
                           on two tosses     possible outcomes
T            T             0                 0.5 × 0.5 = 0.25
T            H             1                 0.5 × 0.5 = 0.25
H            T             1                 0.5 × 0.5 = 0.25
H            H             2                 0.5 × 0.5 = 0.25
                                             Total = 1.00

Grouping the outcomes by the number of heads gives the probability distribution: P(0 heads) = 0.25, P(1 head) = 0.25 + 0.25 = 0.50 and P(2 heads) = 0.25.
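The two-toss table can be generated rather than written by hand. This sketch enumerates the four outcomes with `itertools.product`, assumes a fair coin (probability 0.5 per toss, as in the table), and accumulates the probability of each number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # the 4 outcomes of two tosses
p_each = Fraction(1, 2) * Fraction(1, 2)   # 0.5 * 0.5 for a fair coin

# Probability distribution of the number of heads
distribution = Counter()
for outcome in outcomes:
    distribution[outcome.count("H")] += p_each

for heads in sorted(distribution):
    print(heads, distribution[heads])
# 0 1/4
# 1 1/2
# 2 1/4
print(sum(distribution.values()))   # 1
```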

4.2.2 Types of probability distribution

Probability distributions are of two main types:

- Discrete probability distributions: Bernoulli, binomial, Poisson, hypergeometric and negative binomial distributions.
- Continuous probability distributions: uniform, exponential, normal, gamma, lognormal and Cauchy distributions.

4.2.2.1 Discrete probability distribution


A discrete probability distribution is one in which the variable can take on only a limited number of values, which can be listed.
Example
The probability distribution of the month in which you were born is discrete, because there are only 12 possible values.
4.2.2.2 Continuous probability distribution
In a continuous probability distribution, the variable under consideration is allowed to take on any value within a given range, so we cannot list all the possible values.

Example
Suppose we were examining the level of effluent in a variety of streams, measured in parts of effluent per million parts of water. We would expect quite a continuous range of parts per million (ppm), all the way from very low levels in clear mountain streams to extremely high levels in polluted streams. We would call the distribution of this variable (ppm) a continuous distribution.

4.2.3 Random variables

In an experiment of chance, the outcomes occur randomly, so a variable whose value is determined by the outcome of such an experiment is called a random variable. A random variable can be quantitative or qualitative. A quantitative random variable is a quantity resulting from an experiment that, by chance, can assume different values.

Discrete random variable: a random variable that can assume only certain clearly separated values. It is usually the result of counting something, although its values may also be clearly separated fractional or decimal values.

The Mean, Variance and Standard Deviation of a Probability Distribution

The expected value, mean or mathematical expectation of a random variable is its measure of central tendency. Expected value of a random variable: the expected value of a discrete random variable x, denoted by E(x) or μ, is the weighted mean of the possible values that the random variable can assume, where the weight attached to each value is the probability that the random variable will assume this value. In other words,

$$E(x) = \mu = \sum x\,P(x)$$

The variance and standard deviation of the distribution are

$$\sigma^2 = \sum (x - \mu)^2\,P(x)$$

$$\sigma = \sqrt{\sigma^2}$$

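A sketch of the mean, variance and standard deviation formulas for a discrete distribution given as a value-to-probability mapping. The function name is hypothetical, and the example reuses the number-of-heads distribution from the two-coin example (0.25, 0.50, 0.25).

```python
from math import isclose, sqrt


def mean_variance_sd(dist):
    """dist maps each value x to P(x); returns (mu, sigma^2, sigma)."""
    if not isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in dist.items())                 # E(x) = Σ x P(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())    # Σ (x - μ)² P(x)
    return mu, var, sqrt(var)


# Number of heads in two tosses of a fair coin
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(mean_variance_sd(heads))   # (1.0, 0.5, 0.7071067811865476)
```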
Discrete Probability Distribution includes the following basic distributions:


1. The binomial probability distribution
2. The hypergeometric probability distribution
3. The Poisson probability distribution

1. The Binomial Probability Distribution

This distribution is one of the most widely used probability distributions of a discrete random variable. It describes discrete, not continuous, data resulting from an experiment known as a Bernoulli process (or experiment). This distribution was first developed by the 17th-century Swiss mathematician Jacob Bernoulli.
Properties of Binomial Experiment

1. The experiment consists of a series of n identical trials.

2. In each trial there are only two possible outcomes. We refer to one outcome as ‘success’ and the other as ‘failure’.

3. The probability of a success on one trial, denoted by p, does not change from one trial to another, and the probability of a failure, denoted by q = 1 - p, does not change from trial to trial either (stationarity assumption).

4. The trials are statistically independent: the outcome of one trial does not affect the outcome of other trials.

- If properties 2, 3 and 4 are present, we say that the trials are generated by a Bernoulli process.
- If property 1 is also present in addition to these three, we say that we have a binomial experiment.

$$P(r) = \binom{n}{r} p^r (1-p)^{n-r} = \binom{n}{r} p^r q^{n-r} = {}_nC_r\, p^r q^{n-r} = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$$

Where: p = probability of success
q = 1 - p = probability of failure
n = number of trials undertaken
r = number of successes desired.
Example
In a community, the probability that a newly born child will be a boy is 2/5. Among 4 newly born children in that community, what is the probability that

(a) all four are boys
(b) there are no boys
(c) there is exactly one boy?
Solution

Let us consider the event that a newly born child is a boy as a success in a Bernoulli trial with probability of success p = 2/5. Let the number of boys be the random variable X. Then X can take the values 0, 1, 2, 3 and 4.

According to the binomial law, the probability function of X is
$$f\left(x; 4, \tfrac{2}{5}\right) = \binom{4}{x} \left(\frac{2}{5}\right)^{x} \left(\frac{3}{5}\right)^{4-x}, \quad x = 0, 1, 2, 3, 4.$$

a) $$P(\text{all boys}) = P(X = 4) = \binom{4}{4} \left(\frac{2}{5}\right)^{4} \left(\frac{3}{5}\right)^{0} = 0.0256$$
b) $$P(\text{no boys}) = P(X = 0) = \binom{4}{0} \left(\frac{2}{5}\right)^{0} \left(\frac{3}{5}\right)^{4} = 0.1296$$
c) $$P(\text{exactly one boy}) = P(X = 1) = \binom{4}{1} \left(\frac{2}{5}\right)^{1} \left(\frac{3}{5}\right)^{3} = 0.3456$$
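A sketch of the binomial probability function, evaluated with exact fractions so the three answers above can be checked digit for digit. `binomial_pmf` is a hypothetical helper built on `math.comb`.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(r) = nCr * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


p_boy = Fraction(2, 5)
for label, r in (("all four boys", 4), ("no boys", 0), ("exactly one boy", 1)):
    prob = binomial_pmf(r, 4, p_boy)
    print(label, prob, float(prob))
# all four boys 16/625 0.0256
# no boys 81/625 0.1296
# exactly one boy 216/625 0.3456
```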
Mean (Expected Value) and Variance of a Binomial Distribution
- Mean (expected value) of a binomial distribution: $\mu = np$
- Variance of a binomial distribution: $\sigma^2 = npq = np(1-p)$
- Standard deviation: $\sigma = \sqrt{npq}$
Example 1: Take the case of a packaging machine that produces 20 percent defective packages. If we take a random sample of 10 packages, compute the mean (expected value) and the standard deviation of the binomial distribution of this process.
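The packaging question can be answered with the formulas just listed: with n = 10 and p = 0.2, μ = np = 2 defective packages and σ = √(npq) = √1.6 ≈ 1.26. The sketch below shows that arithmetic; `binomial_summary` is a hypothetical helper name.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    mean = n * p                  # mu = np
    variance = n * p * (1 - p)    # sigma^2 = npq
    return mean, variance, sqrt(variance)


mean, variance, sd = binomial_summary(10, 0.2)
print(mean, round(variance, 4), round(sd, 4))   # 2.0 1.6 1.2649
```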

2. Hyper geometric Probability Distribution

This distribution is closely related to the binomial probability distribution, but in the hypergeometric probability distribution the trials are not independent. Thus, the probability of success changes from trial to trial. The objective is to choose a random sample of n items out of a population of N items, under the condition that once an item has been selected, it is not returned to the population (sampling without replacement).

Properties of the Hypergeometric Probability Distribution (Conditions for a Hypergeometric Probability Distribution)
a) The outcomes are of two kinds only (success or failure).
b) The probability of success changes on each trial.
c) The trials are dependent, because selection is without replacement.
d) The trials are done a fixed number of times.
$$P(x) = \frac{{}_SC_x \; {}_{N-S}C_{n-x}}{{}_NC_n}$$
Where: N = population size
S = number of successes in the population
n = sample size
x = number of successes in the sample
C = the symbol for a combination
Example 1: Of 10 new technologies, 4 are classified as inappropriate to local conditions. What is the probability that a random sample of 3 technologies, selected without replacement, will contain 2 inappropriate technologies?
In binomial-coefficient notation, the hypergeometric probability formula is
$$P(x) = \frac{\binom{S}{x}\binom{N-S}{n-x}}{\binom{N}{n}}$$
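A sketch of the hypergeometric formula applied to the technology example (N = 10, S = 4, n = 3, x = 2), which gives C(4,2)·C(6,1)/C(10,3) = 36/120 = 0.3. The function name `hypergeometric_pmf` is our own.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, N, S, n):
    """P(x) = C(S, x) * C(N - S, n - x) / C(N, n), sampling without replacement."""
    return Fraction(comb(S, x) * comb(N - S, n - x), comb(N, n))


# 10 technologies, 4 inappropriate, random sample of 3 without replacement
p = hypergeometric_pmf(2, N=10, S=4, n=3)
print(p, float(p))   # 3/10 0.3
```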
3. The Poisson Probability Distribution
The Poisson distribution is named for its originator, Simeon Denis Poisson (1781-1840), a Frenchman who developed the distribution from studies during the latter part of his lifetime. This distribution is used to analyze the probability of rare (improbable) events within a given time, such as the number of accidents on a given road or the number of radiation leakages within a given time. Other examples of the Poisson distribution include the number of telephone calls going through a switchboard system in a given time, the demand of patients for service at a health institution in a given time period, the arrivals of trucks and cars at a toll booth in a given time period, and the number of accidents at an intersection within a given time period.

Properties of the Poisson Distribution (Conditions Leading to a Poisson Distribution)

1. The probability of an occurrence of the event is the same for any two intervals of equal length.
2. The occurrence or non-occurrence of the event in any interval is independent of the occurrence or non-occurrence in any other interval.
The Poisson probability function is given by
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}$$
Where: P(x) = probability of x occurrences in an interval or in a group
μ = the expected value, or average number of occurrences, in that interval
e = constant equal to 2.71828…

Example: A certain restaurant has a reputation for good food. The restaurant management boasts
that on a Saturday night, groups of customers arrive at a rate of 15 groups every half an hour, on
average.
a) What is the probability that 5 minutes will pass with no groups of customers arriving?
b) What is the probability that 8 groups of customers will arrive in 10 minutes?
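A sketch of the restaurant example, assuming the arrival rate is constant over the evening so that 15 groups per 30 minutes scales to μ = 2.5 for 5 minutes and μ = 5 for 10 minutes. Under that assumption, P(0) = e^(-2.5) ≈ 0.0821 and P(8) ≈ 0.0653. `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)


rate_per_minute = 15 / 30          # 15 groups per half hour (assumed constant)
mu_5 = rate_per_minute * 5         # 2.5 groups expected in 5 minutes
mu_10 = rate_per_minute * 10       # 5 groups expected in 10 minutes

print(round(poisson_pmf(0, mu_5), 4))    # a) 0.0821
print(round(poisson_pmf(8, mu_10), 4))   # b) 0.0653
```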

4.2.4 Continuous Probability Distribution


Introduction: In this section, we turn to the case in which the random variable can take any value within a given range and the probability distribution is continuous. Several continuous probability distributions are used in statistical work; in this course we treat only the normal probability distribution.
The normal distribution is the most important probability distribution in statistics. It was first developed in 1733 by the French-born mathematician Abraham de Moivre, who worked in England.
Definition

A continuous random variable X is said to have a normal distribution if its density function is given by

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \qquad (1)$$

where the parameters μ and σ² satisfy $-\infty < \mu < \infty$ and $\sigma^2 > 0$.

The variable X whose density function is given in (1) is called a normal variate with parameters μ and σ², and is denoted by $X \sim N(\mu, \sigma^2)$. The parameters μ and σ² are the mean and variance of the normal variable X.

Characteristics (Properties) of the Normal Probability Distribution


1. The normal curve is bell-shaped and symmetrical about its mean (µ). If the curve were folded along the vertical axis through the mean, the two halves would coincide. The tails of the curve extend to infinity in both directions and theoretically never touch the horizontal axis.
2. The highest point on the normal curve occurs at the mean, which is also the median and the mode of the distribution. The height of the curve declines as we go in either direction from the mean.
3. The standard deviation determines the width of the curve: larger values of the standard deviation result in wider and flatter curves that show more dispersion in the data.
4. The total area under the curve is equal to 1.
5. Areas under the curve give probabilities for the normally distributed variable. The area under the normal curve is distributed as follows:
i) µ ± σ covers 68.27% of the area (34.13% on each side of the mean)
ii) µ ± 1.96σ covers 95% (47.5% on each side of the mean)
iii) µ ± 2σ covers 95.45%
iv) µ ± 3σ covers 99.73%
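A sketch checking properties 4 and 5 numerically. `normal_pdf` writes out density (1) directly; the crude Riemann sum over μ ± 10σ is our own approximation of the total area, and the tail areas use `statistics.NormalDist` from the Python standard library (3.8+).

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma^2) written out from formula (1)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Property 4: total area is 1 (Riemann sum over mu +/- 10 sigma, mu=0, sigma=1)
step = 0.001
total_area = sum(normal_pdf(-10 + i * step, 0, 1) * step for i in range(20_001))
print(round(total_area, 6))   # 1.0

# Property 5: area within k standard deviations of the mean, in percent
z = NormalDist()   # standard normal: mu = 0, sigma = 1
for k in (1, 1.96, 2, 3):
    print(k, round(100 * (z.cdf(k) - z.cdf(-k)), 2))
# 1 68.27
# 1.96 95.0
# 2 95.45
# 3 99.73
```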

Standard Normal Probability Distribution

[Figure: graph of the normal curve]

Standard normal variable

If X is a normal variable with parameters μ and σ², then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal variable, with mean zero and variance unity (standard deviation 1). The density function of Z is
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$

Importance of Normal Distribution

- Many distributions occurring in practice can be approximated by the normal distribution. Moreover, many sampling distributions, e.g. Student's t, Snedecor's F and the chi-square distribution, tend to the normal distribution for large samples.
- The normal distribution has wide application in statistical quality control in industry, for setting control limits.
- Note: Let X be a continuous random variable with cumulative distribution function F(x), and let a and b be two possible values of X, with a < b. The probability that X lies between a and b is

$$P(a < X < b) = F(b) - F(a)$$
Example

A company produces light bulbs whose lifetimes follow a normal distribution with mean 1200 hours and standard deviation 250 hours. If a light bulb is chosen randomly from the company's output, what is the probability that its lifetime will be between 900 and 1300 hours?

Solution

Let X represent lifetime in hours. Then

$$P(900 < X < 1300) = P\left(\frac{900 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{1300 - \mu}{\sigma}\right) = P\left(\frac{900 - 1200}{250} < Z < \frac{1300 - 1200}{250}\right)$$

$$= P(-1.2 < Z < 0.4) = P(Z < 0.4) - P(Z < -1.2) = 0.65542 - 0.11507 = 0.54035 \quad \text{(by using the normal table)}$$

Hence, the probability is approximately 0.54 that a light bulb will last between 900 and 1300 hours.
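The same calculation with `statistics.NormalDist` (Python 3.8+), as a sketch: the z values mirror the hand standardization, and `cdf(b) - cdf(a)` is F(b) - F(a) from the note above. The variable names are our own.

```python
from statistics import NormalDist

lifetime = NormalDist(mu=1200, sigma=250)   # hours

# Standardize the limits, as in the hand calculation
z_low = (900 - lifetime.mean) / lifetime.stdev     # -1.2
z_high = (1300 - lifetime.mean) / lifetime.stdev   # 0.4

p = lifetime.cdf(1300) - lifetime.cdf(900)         # F(b) - F(a)
print(z_low, z_high, round(p, 5))                  # -1.2 0.4 0.54035
```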

Example

A very large group of students obtains test scores that are normally distributed with mean 60 and standard deviation 15. What proportion of students obtained scores:

a) Less than 85?
b) More than 90?
c) Between 85 and 95?
Solution

Let X denote the test score. Then

a) $$P(X < 85) = P\left(\frac{X - \mu}{\sigma} < \frac{85 - 60}{15}\right) = P(Z < 1.67) = 0.9525 \quad \text{(by using the normal table)}$$

That is, 95.25% of the students obtained scores less than 85.

b) $$P(X > 90) = P\left(Z > \frac{90 - 60}{15}\right) = P(Z > 2) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228 \quad \text{(by using the normal table)}$$

That is, 2.28% of the students obtained scores more than 90.

c) $$P(85 < X < 95) = P\left(\frac{85 - 60}{15} < Z < \frac{95 - 60}{15}\right) = P(1.67 < Z < 2.33) = P(Z < 2.33) - P(Z < 1.67) = 0.9901 - 0.9525 = 0.0376 \quad \text{(by using the normal table)}$$

That is, 3.76% of the students obtained scores in the range 85 to 95.
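A sketch of the three test-score proportions with `statistics.NormalDist`. Because it uses the exact z values (25/15 and 35/15) rather than the table's 1.67 and 2.33, expect small differences from the hand answers: about 0.9522 for (a), 0.0228 for (b) and about 0.038 for (c).

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=15)

less_than_85 = scores.cdf(85)                       # a) P(X < 85)
more_than_90 = 1 - scores.cdf(90)                   # b) P(X > 90)
between_85_95 = scores.cdf(95) - scores.cdf(85)     # c) P(85 < X < 95)

for name, p in (("a", less_than_85), ("b", more_than_90), ("c", between_85_95)):
    print(name, round(p, 4))
```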

Example

The average daily sales of 500 branch offices were Tk. 150 thousand, and the standard deviation was Tk. 15 thousand. Assuming the distribution to be normal, indicate how many branches have sales between

a) Tk. 120 thousand and Tk. 145 thousand.
b) Tk. 140 thousand and Tk. 165 thousand.
Solution

Let X be the average daily sales of a branch office.

a) $$P(120 < X < 145) = P\left(\frac{120 - 150}{15} < Z < \frac{145 - 150}{15}\right) = P(-2 < Z < -0.33) = P(Z < -0.33) - P(Z < -2) = 0.3707 - 0.02275 = 0.34795 \quad \text{(by using the normal table)}$$

Hence, the expected number of branches having sales between Tk. 120 thousand and Tk. 145 thousand is
$$0.34795 \times 500 = 173.975 \approx 174.$$

b) $$P(140 < X < 165) = P\left(\frac{140 - 150}{15} < Z < \frac{165 - 150}{15}\right) = P(-0.67 < Z < 1) = P(Z < 1) - P(Z < -0.67) = 0.84134 - 0.25143 = 0.58991 \quad \text{(by using the normal table)}$$

Hence, the expected number of branches having sales between Tk. 140 thousand and Tk. 165 thousand is
$$0.58991 \times 500 = 294.955 \approx 295.$$
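A sketch of the branch-office counts. The optional `table_rounding` flag (our own device) rounds z to two decimals to mimic reading a printed table, which reproduces the 174 and 295 above; with unrounded z the expected counts come out to about 173 and 294. Names and structure are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA, BRANCHES = 150, 15, 500   # Tk. thousand; 500 branch offices
std = NormalDist()                    # standard normal Z


def expected_branches(low, high, table_rounding=False):
    """Return (probability, expected number of branches) for low < X < high."""
    z_low, z_high = (low - MU) / SIGMA, (high - MU) / SIGMA
    if table_rounding:
        # Mimic a printed normal table, which is read at z to 2 decimals
        z_low, z_high = round(z_low, 2), round(z_high, 2)
    p = std.cdf(z_high) - std.cdf(z_low)
    return p, p * BRANCHES


for low, high in ((120, 145), (140, 165)):
    p_exact, n_exact = expected_branches(low, high)
    p_table, n_table = expected_branches(low, high, table_rounding=True)
    print(f"{low}-{high}: exact {p_exact:.4f} -> {round(n_exact)}, "
          f"table z {p_table:.4f} -> {round(n_table)}")
```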

General rule (when using a table that gives the area between the mean and Z): if both Z values are on the same side of the mean, the area between them is obtained by subtracting the smaller area from the larger. If the two Z values are on opposite sides of the mean, the area between them is obtained by adding the two areas.
Inverse Use of the Standard Normal probability Table
This means finding the value of Z that corresponds to a given probability (P) in the table, where P is the area between the mean and Z.
Example
1. Z(P) = Z(0.4864) = 2.21
2. Z(P) = Z(0.4922) = 2.42
To do this, reverse the earlier procedure. First find the probability in the body of the table closest to the given one. Reading across to the row label gives the units digit and the first decimal of Z, and reading up to the column heading gives the second decimal of Z.
Given a probability, we can find the value of Z and then convert Z to an X value using $X = \mu + Z\sigma$, which follows from $Z = (X - \mu)/\sigma$.
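A sketch of the inverse lookup with `NormalDist.inv_cdf`. Since the table gives the area between the mean and Z, the cumulative probability passed to `inv_cdf` is 0.5 + P. The conversion back to X reuses the test-score example (μ = 60, σ = 15) purely for illustration; the helper name is our own.

```python
from statistics import NormalDist

std = NormalDist()   # standard normal


def z_for_area_from_mean(area):
    """Z such that the area between the mean (0) and Z equals `area`."""
    return std.inv_cdf(0.5 + area)


print(round(z_for_area_from_mean(0.4864), 2))   # 2.21
print(round(z_for_area_from_mean(0.4922), 2))   # 2.42

# Convert Z to X with X = mu + Z * sigma (illustrated with mu = 60, sigma = 15)
z = z_for_area_from_mean(0.4864)
print(round(60 + z * 15, 1))
```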

UNIT 5: SAMPLING AND SAMPLING DISTRIBUTIONS
5.1 SAMPLING THEORY
5.1.1 Basic definitions
1. Population/Universe (N): It refers to the aggregate of statistical information in which all members are covered by an investigation or enquiry. For example: marks obtained by students in class 12 in Ethiopia.
- The total number of elements under investigation.

2. Sampling frame: the list or procedure for defining the population.

- The list of the population.

- The sample is drawn from the sampling frame.

3. Sample (n): It refers to the selection of a part of the population with a view that it represents the whole population; that is, a subset of the whole population.

- A sub-unit (part) of the total population selected for detailed investigation (inquiry).

4. Statistic: a number that describes some attribute of the sample, e.g. the average income of the residents in the sample. A statistic can then be used to estimate the corresponding population parameter.
5.1.2 Need for sample
a. To get the maximum information about the population with minimum effort.
b. Sampling saves time and money, because we do not collect data from the whole population.
c. Destruction of test units: if we want to know the quality of chocolates and we test every chocolate one by one, all the chocolates may be wasted or destroyed. This is one of the reasons we use samples.
d. It may be physically impossible to check all items in the population.
e. A properly designed sample can give accurate and reliable results.

Use of sampling in various areas

a. To learn about the behavior of customers.
b. In business, for inspecting lots of material from a supplier.
c. For quality control during production.

5.1.3 Designing and conducting a sampling study


1. Define target population
2. Select a sampling frame
3. Determine the type of sampling
4. Determine sample size
5. Conduct field work

Step 1: Define the target population


First of all, we should identify what we want to survey; on that basis the population will be selected. For example, if the research/survey is conducted by a company selling lipsticks, the target population is females. That is why the first stage is to define the target population.

Step 2: Select a sampling frame

The second stage is to build the frame: among females, which age groups should we consider? We cannot take females aged 1-10, because they do not use this product much. That is why, among females, we have to specify the characteristics/attributes on the basis of which the population will be identified.
Step 3: Determine the type of sampling

The sampling method may be either probability sampling or non-probability sampling.

1) Random or probability sampling: every member of the population has a chance of being selected that is known in advance (in simple random sampling, the same chance). Therefore, the sample group tends to possess the same characteristics as the larger population.

a. Simple random sample: a (random selection) procedure that generates numbers or cases strictly on the basis of chance, e.g. the lottery method. In this method each item has an equal chance of being selected.

b. Systematic sampling: under this method, items are selected at a fixed distance/interval. Example: every 5th item from the population is selected.

c. Stratified sampling: under this method, the population is divided into different groups (strata), and then a sample is selected randomly from each group. Example: the population is divided into two groups, male and female, and then a sample is selected randomly from the males and from the females.
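A sketch of the three probability sampling methods using Python's `random` module. The population frame (100 hypothetical people, half labelled male and half female) is illustrative data we made up, not data from the text, and the random starting point for systematic sampling is a common convention we assume here.

```python
import random

# Hypothetical sampling frame: IDs 0-99, first half "male", second half "female"
population = [{"id": i, "sex": "male" if i < 50 else "female"} for i in range(100)]
rng = random.Random(7)   # fixed seed for reproducibility


def simple_random_sample(frame, n):
    """Every item has the same chance of selection (lottery method)."""
    return rng.sample(frame, n)


def systematic_sample(frame, step):
    """Random start within the first interval, then every `step`-th item."""
    start = rng.randrange(step)
    return frame[start::step]


def stratified_sample(frame, key, n_per_stratum):
    """Split the frame into groups by `key`, then sample randomly within each."""
    strata = {}
    for item in frame:
        strata.setdefault(item[key], []).append(item)
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]


srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 5)     # every 5th item
strat = stratified_sample(population, "sex", 5)   # 5 males and 5 females
print(len(srs), len(sys_sample), len(strat))      # 10 20 10
```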
2) Non-random or Non Probability Sampling: used when probability sampling is too
expensive, or when exact representation of population is not important to study, or when the
population cannot be defined.
a. Judgment/Purposive sampling: under this method selection of sample is based on individual
judgment.

b. Convenience sampling: the sample consists of whoever happens to be around, e.g. standing on a street corner and surveying passers-by.

c. Snowball sampling (uses referrals): one member of the sample is identified, and that person identifies another person who could take part in the study, and so on.

d. Quota sampling: Under this method a fixed quota is assigned. Ex: 50 salaried persons in the
age group of 25-30 years. Within this quota, the selection of sample items depends entirely on
personal judgment.

Step 4: Determine sample size


After choosing a probability or non-probability sampling method, the sample size is determined. The size also depends on whether you want to generalize to the larger population. According to the law of large numbers, the larger the sample, the more accurate the estimate tends to be, because a larger sample better captures the diversity or similarity of the population (heterogeneous or homogeneous).

Step 5: Conduct field work

After deciding the sample size, that is, how many units will be surveyed, data collection from the sample begins.
5.1.4 SAMPLING AND NON SAMPLING ERRORS
Sampling error: The difference between a sample value and the corresponding population value is called sampling error. It arises because the sample does not include all members of the population. Example: if one measures the height of a thousand individuals from a country of one million, the average height of the thousand is typically not the same as the average height of all one million people in the country.

Non-sampling error: a statistical error, caused by human error, to which a specific statistical analysis is exposed. These errors can include, but are not limited to, data entry errors, biased questions in a questionnaire, biased processing/decision making, inappropriate analysis conclusions and false information provided by respondents.

What is sampling bias?


Bias is a systematic error that can partiality your evaluation findings in some way. So, sampling
bias is consistent error that arises due to the sample selection.

For example, a survey of high school students to measure teenage use of illegal drugs will use a biased sample because it excludes home-schooled students and dropouts.

A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population.

For example, distributing a questionnaire at the end of a 3-day conference is likely to reach more of the people who are committed to the conference, so their views would be overrepresented.

Selecting a sample from a telephone book will underrepresent people who cannot afford a telephone, do not have one, or do not list their number.

*Sampling bias can occur any time your sample is not a random sample*
Why does it matter?
Sampling bias means that the data you collect may be inaccurate or may not represent the population.
How can we know if the sample is biased?
Sometimes you can identify sampling bias simply by thinking carefully and comparing the characteristics of the respondents in your sample with what you know about the population in general. Think about the demographic characteristics that might be strongly related to respondents' answers. For example, if you know that gender is an important variable and that the population is 50% male and 50% female, then the sample should contain approximately the same proportions. If the sample is only 20% male, your results are likely to be biased because you do not have enough responses from men.
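The comparison described above can be written as a simple check. This is a minimal sketch: the sample size of 100 and the 5-percentage-point tolerance are hypothetical choices for illustration, not rules from the text.

```python
def share_gap(sample_count, sample_size, population_share):
    """Difference between a group's share in the sample and its known share in the population."""
    sample_share = sample_count / sample_size
    return sample_share - population_share


def looks_biased(sample_count, sample_size, population_share, tolerance=0.05):
    """Flag the sample if the group's share is off by more than `tolerance` (hypothetical cut-off)."""
    return abs(share_gap(sample_count, sample_size, population_share)) > tolerance


# Text example: the population is 50% male; assume 100 respondents, of whom 20 are men.
print(share_gap(20, 100, 0.50))     # negative gap: men are underrepresented
print(looks_biased(20, 100, 0.50))  # far outside the tolerance
print(looks_biased(49, 100, 0.50))  # within the tolerance
```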

5.2 Sampling distribution

Parameter: A parameter is a statistical measure based on each and every item of the population.

Statistic: a statistical measure based on a sample.

SYMBOLS USED

Measure                     Population (parameter)     Sample (statistic)
Mean                        µ (mu)                     x̄ (x-bar)
Standard deviation          σ (sigma)                  S
Proportion of defectives    P (capital letter P)       p (small letter p)

Parameters are usually unknown, so sample statistics are used to estimate them.

5.2.1 Sampling distribution

It is the probability distribution of a given statistic based on random samples. In other words, it is the distribution the statistic would have if we were to repeatedly draw samples of the same size from the population.

Example: suppose a university class has 16 students and the professor wants to know their average age. The population mean µ is unknown to the professor, who knows the ages of only n = 3 students: 20, 35, and 40.

(20 + 35 + 40)/3 = 95/3 = 31.67

The mean of this sample of 3 is 31.67, but what we want to know is the population mean µ.

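In repeated sampling, the value of the sample mean varies from sample to sample. The distribution of these sample means is the sampling distribution of the mean; it is exactly normal when the population is normal and approximately normal when the sample size is large.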

5.2.2 Sampling Distribution of the Sample Mean

- The sampling distribution of the sample mean is a probability distribution consisting of a list of all possible sample means of a given sample size selected from a population, together with the probability of occurrence of each sample mean. It is also called the sampling distribution of the mean, i.e., the probability distribution of a sample mean.

Consider a population of size N with mean µ and standard deviation σ. If we draw samples of size n from this population (without replacement), the number of different samples of that size that can be drawn is $\binom{N}{n} = {}^{N}C_{n}$. The sample means we obtain differ according to the elements that make up each sample.
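The idea of listing every possible sample can be checked directly on a tiny population. The sketch below assumes a hypothetical population of N = 5 ages (not from the text) and samples of size n = 2 drawn without replacement. It enumerates all ⁵C₂ = 10 samples, tabulates the probability of each sample mean, and confirms that the mean of all sample means equals µ.

```python
import itertools
import math
import statistics
from collections import Counter

# Hypothetical population of N = 5 ages (illustrative values only).
population = [18, 20, 22, 24, 26]
n = 2

samples = list(itertools.combinations(population, n))   # every possible sample of size n
sample_means = [statistics.mean(s) for s in samples]     # one sample mean per sample

print(len(samples), math.comb(len(population), n))       # both equal NCn

# The sampling distribution: each possible sample mean and its probability.
for m, count in sorted(Counter(sample_means).items()):
    print(m, count / len(samples))

print(statistics.mean(population), statistics.mean(sample_means))  # mu and mean of the x-bars
```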

Sampling Distribution of the Sample Proportion


A proportion is a part or fraction of a whole.
The population proportion is the fraction of the population that has a given characteristic.
The sample proportion is the fraction of the sample that has that characteristic.
Population proportion = P, sample proportion = p:

$P = X/N$ = number of successes in the population / population size, and $p = X/n$ = number of successes in the sample / sample size,

where N = population size, n = sample size, and X = the number of elements in the population or sample that possess the specific characteristic.

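Properties of the sampling distribution of the mean


1. Its mean (the mean of all possible sample means, $\mu_{\bar{x}}$) equals the population mean µ.
2. It is approximately normally distributed provided the sample size is sufficiently large (n ≥ 30); it is exactly normal when the population is normal.
3. It is used in hypothesis testing.
4. Because it is (approximately) normal, its mean, median, and mode coincide.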
What is standard deviation?

In statistics and probability theory, the standard deviation (represented by the Greek letter
sigma, σ) shows how much variation or dispersion from the average exists. A low standard
deviation indicates that the data points tend to be very close to the mean (also called expected
value); a high standard deviation indicates that the data points are spread out over a large range
of values.
What is standard error?
The standard error of a statistic is the standard deviation of the sampling distribution of that statistic. It moves in the opposite direction to the sample size:

Sample size     Standard error
Increases       Decreases
Decreases       Increases

Formulas for computing the standard error

Standard error of the mean (when the population standard deviation is known, whether the sample size is large or small):

$S.E._{\bar{x}} = \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

where σ = population standard deviation and n = sample size.

Standard error of the proportion (when the population proportion is known):

$S.E._{p} = \sqrt{\dfrac{PQ}{n}}$

where P = population proportion, Q = 1 − P, and n = sample size.
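The two formulas above translate directly into code. This is a small sketch; the values σ = 10 and P = 0.5 in the loop are illustrative choices, not from the text. Printing the results for growing n shows the inverse relationship between sample size and standard error from the table above: quadrupling n halves the standard error.

```python
import math


def se_mean(sigma, n):
    """Standard error of the mean when the population s.d. sigma is known."""
    return sigma / math.sqrt(n)


def se_proportion(P, n):
    """Standard error of the proportion when the population proportion P is known (Q = 1 - P)."""
    Q = 1 - P
    return math.sqrt(P * Q / n)


# Larger samples give smaller standard errors (sigma = 10 and P = 0.5 are illustrative).
for n in (25, 100, 400):
    print(n, se_mean(10, n), round(se_proportion(0.5, n), 4))
```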

UNIT 6 - STATISTICAL ESTIMATION

Introduction:

• As its name suggests, the objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. For example, the sample mean is used to estimate the population mean.

• Statistical estimation means estimating population parameters on the basis of sample statistics.

• Statistical estimation is the procedure of using a sample statistic to estimate a population parameter: population parameters such as the mean, proportion, standard deviation, and variance are estimated by the corresponding sample statistics (sample mean, sample proportion, sample standard deviation, and sample variance).

• Statistical estimation is also the process of estimating the value of a parameter from information obtained from a sample.

We refer to the sample mean x̄ as the estimator of the population mean µ. Once the sample mean has been computed, its value is called the estimate (point estimate). Parameters are thus estimated by the values of sample statistics.
6.1 Basic Concepts of Statistical Estimation:

1. Estimation: the process of estimating unknown population parameters from sample statistics.

2. Estimator: a sample statistic that is used to estimate an unknown population parameter.

Example: $\bar{x} = \sum X / n$ is an estimator of the population mean.

3. Estimate (point estimate): the numerical value of an estimator, i.e., the value taken by the estimator as an estimate of the population parameter.

Example: sample mean x̄ = 100, sample standard deviation S = 8 minutes, sample proportion p = 5%.

6.2 Methods of Statistical Estimation

a) Point estimation

b) Interval estimation

A) Point Estimation:

- It is the process of using a single value, computed from a sample, to estimate a population parameter.

- An estimator is sample information (a statistic) used to predict population information (unknown parameters). A point estimate is a single value of an estimator; for example, the sample mean can be used to estimate the population mean, and the sample standard deviation can be used to estimate the population standard deviation.

- Example: the elements of a random sample are 1, 2, 4, 5, 7, and 11. Compute:

a) the estimate of the population mean;

b) the estimate of the population standard deviation;

c) the estimate of the standard error of the mean.

Solution:

a) The population mean is $\mu = \sum X / N$; its estimator is the sample mean $\bar{x} = \sum X / n$.

$\bar{x} = \sum X / n = 30/6 = 5$, which is a point estimate of the population mean; x̄ is an estimator of µ.

b) The population standard deviation is $\sigma = \sqrt{\sum (X-\mu)^2 / N}$; its estimator is the sample standard deviation $S = \sqrt{\sum (X-\bar{x})^2 / (n-1)}$.

X       X − x̄      (X − x̄)²
1        −4          16
2        −3           9
4        −1           1
5         0           0
7         2           4
11        6          36
Total                66

Therefore, $S = \sqrt{\dfrac{\sum (X-\bar{x})^2}{n-1}} = \sqrt{66/(6-1)} = \sqrt{66/5} = 3.6332$, which is a point estimate of the population standard deviation; S is an estimator of σ.

c) The standard error of the mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$; its estimator is $S_{\bar{x}} = S/\sqrt{n}$.

So $S_{\bar{x}} = 3.6332/\sqrt{6} = 1.483$, which is a point estimate of $\sigma_{\bar{x}}$, the standard deviation of the sample mean; $S_{\bar{x}}$ is an estimator of the standard deviation of the sample mean.
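The three point estimates in this example can be reproduced with Python's standard `statistics` module. Note that `statistics.stdev` divides by n − 1, matching the formula for S used above (whereas `statistics.pstdev` would divide by N).

```python
import math
import statistics

data = [1, 2, 4, 5, 7, 11]        # the random sample from the example
n = len(data)

x_bar = statistics.mean(data)     # point estimate of mu
s = statistics.stdev(data)        # sample s.d. with divisor n - 1, estimates sigma
se = s / math.sqrt(n)             # estimated standard error of the mean, S_xbar

print(x_bar, round(s, 4), round(se, 4))
```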

Three properties of a good estimator:


1. Unbiasedness: the expected value of the estimator (the mean of its values over all samples of a given size) must equal the parameter being estimated.
2. Consistency: as the sample size increases, the value of the estimator approaches the value of the parameter being estimated.
3. Relative efficiency: among unbiased estimators, it has the smallest variance.
In summary, a good estimator is 1) unbiased, 2) consistent, and 3) relatively efficient.
Point Estimation of the Population Proportion
The sample standard error of the proportion, $S_p$, estimates the standard error of the proportion, $\sigma_p$. It is computed as:

$S_p = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{pq}{n}}$

where
$S_p$ = sample standard error of the proportion (the sample standard deviation of the sample proportion),
p = sample proportion of successes,
q = 1 − p = sample proportion of failures,
n = sample size,
and $\sigma_p$ = standard error of the proportion (the population standard deviation of the sample proportion).

Example: let an even number be a success, and suppose a random sample of 200 numbers from a population contains 120 even numbers. Calculate a point estimate of the standard error of the proportion.

Solution: we need the sample standard error of the proportion, $S_p$.

p = number of successes in the sample / sample size = 120/200 = 0.6
q = 1 − p = 1 − 0.6 = 0.4

Therefore, $S_p = \sqrt{pq/n} = \sqrt{0.6 \times 0.4 / 200} = 0.0346$, which is a point estimate of the population standard deviation of the sample proportion, $\sigma_p$; $S_p$ is an estimator of $\sigma_p$.
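The same calculation as a small function, a sketch that takes the number of successes and the sample size and returns $S_p = \sqrt{pq/n}$:

```python
import math


def proportion_standard_error(successes, n):
    """Estimated standard error of the sample proportion, S_p = sqrt(p * q / n)."""
    p = successes / n   # sample proportion of successes
    q = 1 - p           # sample proportion of failures
    return math.sqrt(p * q / n)


# Even numbers example: 120 successes in a sample of 200.
print(round(proportion_standard_error(120, 200), 4))
```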

Point Estimates

We can estimate a population parameter with a sample statistic (a point estimate):

                            Parameter      Statistic
Mean                        µ              x̄
Proportion                  P              p̂
Variance                    σ²             S²
Difference of means         µ1 − µ2        x̄1 − x̄2
Difference of proportions   P1 − P2        p̂1 − p̂2
Point and Interval Estimates:
• A point estimate is a single number.
• A confidence interval provides additional information about variability.

[Figure: a confidence interval centred on the point estimate, running from the Lower Confidence Limit to the Upper Confidence Limit; the distance between the two limits is the width of the confidence interval.]
B. Interval Estimation

• Interval estimation uses an interval, or range of values, to estimate a population parameter. Such an interval is called a confidence interval.

- The parameter is expected to lie within that range.

- A confidence interval for a parameter is determined from the sample data and the chosen confidence level.

- The confidence level of an interval estimate of a parameter is the probability that the interval estimate will contain the parameter.

• A confidence interval is a range of values constructed from sample data so that the parameter lies within that range with a preselected probability. This preselected probability is called the "level of confidence".

• Point estimation produces a single value as an estimate of a population parameter. The estimate may or may not be close to the actual parameter value.

Interval estimation, by contrast, describes a range of values within which the parameter might lie, so interval estimates are often preferred to point estimates.

An interval estimate describes a range of values within which a parameter might lie. Suppose that, based on sample information, an investigator predicts that the mean of a given population is between 6 and 7; this is an interval estimate.
Example: a sample mean of x̄ = 50 is a point estimate.

"I am 95% confident that µ is between 40 and 60" is an interval estimate.

Confidence Intervals:

• How much uncertainty is associated with a point estimate of a population parameter?

• An interval estimate provides more information about a population characteristic than a point estimate does.

• Such interval estimates are called confidence intervals.

Confidence interval estimate:

• An interval gives a range of values that:

- takes into consideration the variation in sample statistics from sample to sample;

- is based on observations from a single sample;

- gives information about closeness to the unknown population parameter;

- is stated in terms of a level of confidence (never 100% sure).

Confidence Interval Estimation for the Population Mean (µ) when σ is Known and n ≥ 30

- Interval estimates of a population mean

- How do we find a confidence interval for the population mean?
- When σ is known, the confidence interval estimate of µ is given by:

$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

1) The interval changes from sample to sample.

2) The probability that the population mean lies within this range is not 100% (a very unlikely sample mean can occur).

General Formula

• The general formula for all confidence intervals is:

Point Estimate ± (Critical Value) × (Standard Error)

For the population mean with σ known, this gives

$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, or equivalently $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

• The z value is determined by the desired level of confidence.

• The standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is computed from σ and the sample size n.

Margin of Error:

• The margin of error (e) is the amount added to and subtracted from the point estimate to form the confidence interval.

• The margin of error is also called the maximum error of estimate, $z_{\alpha/2}\,\sigma_{\bar{x}}$.

• Note that the maximum error of estimate is the maximum difference between the point estimate of a parameter and the actual value of the parameter, at the stated level of confidence.

• Example: margin of error for estimating µ when σ is known:

$e = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, so the interval is $\bar{x} \pm e = \bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

Factors Affecting the Margin of Error:

$e = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

• Data variation, σ: e increases as σ increases.

• Sample size, n: e decreases as n increases.

• Level of confidence, 1 − α: e increases as 1 − α increases.
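The three factors can be checked numerically. This sketch computes $z_{\alpha/2}$ exactly with `statistics.NormalDist` instead of reading it from a table. The baseline values σ = 10 and n = 100 are borrowed from the height example only for illustration.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(sigma, n, confidence):
    """e = z_{alpha/2} * sigma / sqrt(n)."""
    return z_critical(confidence) * sigma / math.sqrt(n)


base = margin_of_error(sigma=10, n=100, confidence=0.95)
print(round(base, 3))
print(margin_of_error(20, 100, 0.95) > base)   # larger sigma -> larger e
print(margin_of_error(10, 400, 0.95) < base)   # larger n -> smaller e (4x n halves e)
print(margin_of_error(10, 100, 0.99) > base)   # higher confidence -> larger e
```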

For the following questions, estimate the population means:

Example 1: A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.

• Solution:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$
1.9932 ≤ µ ≤ 2.4068

• Interpretation: we are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
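A minimal sketch of the σ-known interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, assuming the exact normal quantile from `statistics.NormalDist` rather than a rounded table value:

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # exact z, e.g. 1.95996 for 95%
    e = z * sigma / math.sqrt(n)                          # margin of error
    return x_bar - e, x_bar + e


# Example 1: circuits, n = 11, x_bar = 2.20 ohms, sigma = 0.35 ohms.
low, high = z_interval(2.20, 0.35, 11, 0.95)
print(round(low, 4), round(high, 4))
```

The same function handles Example 3 with `z_interval(1950, 250, 100, 0.98)`. Because it uses the exact z (about 2.326) instead of the table value 2.33, that interval comes out as roughly 1891.84 to 2008.16 rather than 1891.75 to 2008.25.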

Example 2: To estimate the mean height of Korean men, 100 men are chosen at random. The sample mean is 171.2. Estimate the mean height of Korean men with 95% confidence (the population standard deviation is assumed known to be 10).

Sample mean = 171.2, n = 100, σ = 10.

$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

171.2 ± 1.96 × 10/√100 = 171.2 ± 1.96 = [169.24, 173.16]

Interpretation: we are 95% confident that the population mean lies in the range [169.24, 173.16].

Example 3: A credit union wants to estimate the mean amount of outstanding loans. Past experience reveals that the standard deviation is 250 birr. Determine a 98% confidence interval estimate for the mean of all outstanding loans (the population mean) if a random sample of 100 outstanding loans has a sample mean of 1,950 birr.

Solution: given x̄ = 1950, σ = 250, n = 100.

$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 250/\sqrt{100} = 25$

Confidence level = 98% = 0.98, so the area between 0 and z is 0.98/2 = 0.49, which gives $z_{\alpha/2} = 2.33$.

Therefore, the interval is $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ = 1950 ± 2.33(25) = 1950 ± 58.25.

The interval is from 1891.75 to 2008.25.

Interpretation: the credit union can say with 98% confidence that the mean amount of outstanding loans is between 1891.75 and 2008.25 birr.

Level of confidence (confidence level), 1 − α:

• The confidence that the interval will contain the unknown population parameter.

• A percentage (less than 100%).

• A probability that the population parameter falls somewhere within the interval.

- Denoted by (1 − α), usually expressed as a percentage.

- Typical values are 99%, 95%, and 90%.

$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

• Confidence level 1 − α = 90% → z = 1.645

• 1 − α = 95% → z = 1.96

• 1 − α = 99% → z = 2.575

• The higher the confidence level, the wider the interval.

Common Levels of Confidence:

• Commonly used confidence levels are 90%, 95%, and 99%.

Confidence level    Confidence coefficient, 1 − α    z value, $z_{\alpha/2}$
80%                 .80                              1.28
90%                 .90                              1.645
95%                 .95                              1.96
98%                 .98                              2.33
99%                 .99                              2.575
99.8%               .998                             3.09
99.9%               .999                             3.29
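The z column of the table can be regenerated from the standard normal distribution. The sketch below uses `statistics.NormalDist().inv_cdf`, which returns the value with area 1 − α/2 to its left. It prints three decimals, so small rounding differences from the table (for example 2.326 versus 2.33, or 2.576 versus 2.575) are expected.

```python
from statistics import NormalDist


def z_alpha_over_2(confidence):
    """Critical value with area alpha/2 in the upper tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {z_alpha_over_2(level):.3f}")
```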

Width of interval:

$\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \left(\bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$

• Lower limit ≤ population mean ≤ upper limit

• The left side is called the LCL (Lower Confidence Limit).

• The right side is called the UCL (Upper Confidence Limit).

→ Factors that determine the width of the interval are the confidence level 1 − α, the standard deviation σ, and the sample size n:

① Higher 1 − α → wider interval

② Larger σ → wider interval

③ Larger n → narrower interval

If we repeated the sampling and computed the interval 100 times, we would see that:

1) Different samples give different interval estimates → the LCL and UCL are random variables.
2) At 95% confidence, about 95 of the 100 intervals would include the population mean and about 5 would not.
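The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: a hypothetical normal population with µ = 170 and σ = 10, samples of size 100, and 1,000 repetitions (all illustrative choices). The printed share of intervals that contain µ should be close to 0.95, but it varies with the random seed.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=170.0, sigma=10.0, n=100, confidence=0.95, trials=1000, seed=7):
    """Share of simulated z-intervals x_bar +/- e that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sigma / math.sqrt(n)          # sigma is known, so e is the same for every sample
    hits = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # mean of one new sample
        if x_bar - e <= mu <= x_bar + e:                       # LCL <= mu <= UCL?
            hits += 1
    return hits / trials


print(coverage_rate())
```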

Example 4: Advertisement sponsors want to know the average number of hours children spend watching TV. A survey of 10 children tracks the number of hours per week; the sample mean is 29, and from past experience the population standard deviation is σ = 8.0 hours. The data follow a normal distribution. Find a 95% confidence interval estimate of the average number of hours children watch TV.

A: parameter to be estimated = µ

$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 29 \pm z_{0.025}\dfrac{8}{\sqrt{10}} = 29 \pm 1.96\dfrac{8}{\sqrt{10}} = 29 \pm 4.958 = [24.04,\ 33.96]$

Predicting the sample mean:
Example: predict the range of the sample mean when µ = 170, σ = 10, and n = 100.

$\mu \pm 1.96\dfrac{\sigma}{\sqrt{n}}$ → 170 ± 1.96 × 10/√100 = [168.04, 171.96]

The sample mean falls in this range with probability 95%.

→ When we instead build an interval centred on the sample mean, µ lies in the new range with 95% confidence.

Estimating the population mean:

Example: estimating µ from the sample mean:

→ 171.2 ± 1.96 × 10/√100 = [169.24, 173.16]

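Estimation intervals for µ and x̄:

$\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

LHS: µ is known; predict the sample mean x̄.

RHS: µ is unknown; estimate µ using x̄.

These formulas work only when x̄ follows a normal distribution and σ is known.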

Assumptions:

• The population standard deviation σ is known.

• The population is normally distributed.

• If the population is not normal, use a large sample.

Confidence Intervals for the Population Mean, µ:

• when the population standard deviation σ is known

• when the population standard deviation σ is unknown
Confidence Interval Estimation for the Population Mean µ when σ is Unknown and n ≥ 30:

• When the original variable is normally distributed and σ is known, the standard normal distribution can be used to find confidence intervals regardless of the size of the sample.

• When n ≥ 30, the distribution of sample means will be approximately normal even if the distribution of the original variable departs from normality.

• Also, if n ≥ 30 and σ is unknown, S can be substituted for σ in the confidence interval formula, and the standard normal distribution can be used to find confidence intervals for means.

Example: A sample of 50 days showed that a fast-food restaurant served an average of 182 customers during lunch time (11:00 AM to 2:00 PM). The standard deviation of the sample was 8. Find the 90% confidence interval for the population mean.

Solution:

Step 1: Find α/2. Since a 90% confidence interval is to be used,

α = 1 − 0.90 = 0.10 (both tails), so α/2 = 0.10/2 = 0.05 (one tail).

Step 2: Find the z value, $z_{\alpha/2}$. Subtract 0.05 from 0.5000 to get 0.4500; the corresponding z value from Table E is 1.65 (more precisely 1.645).

Equivalently: given n = 50, S = 8, x̄ = 182, and confidence level = 90% = 0.90, we have α = 0.10, α/2 = 0.05, and the area between 0 and z is 0.5 − 0.05 = 0.45, so z = 1.65.

Step 3: Substitute in the formula (S is used in place of σ because σ is unknown and n ≥ 30):

$\bar{x} - z_{\alpha/2}\dfrac{S}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{S}{\sqrt{n}}$

182 − 1.65(8/√50) < µ < 182 + 1.65(8/√50)

180.1 < µ < 183.9, or approximately 180 < µ < 184.

Hence, one can be 90% confident that the true population mean is between 180 and 184, i.e., 182 ± 2.

Knowing σ is not realistic! → Handle the case where σ is unknown.

• With σ known:
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

• With σ unknown:
$\bar{x} - t_{\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}$

• Two things changed:

1) σ → s (sample standard deviation)

If the population standard deviation σ is unknown, we can substitute the sample standard deviation, s. This introduces extra uncertainty, since s varies from sample to sample, so we use the t distribution instead of the normal distribution.

2) z → t (from the t distribution)

• Assumptions:

a. The population standard deviation is unknown.

b. The population is normally distributed.

c. If the population is not normal, use a large sample.

d. Use Student's t distribution.

Confidence interval estimate of µ using the t distribution (σ unknown):

$\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$
6.3 Student’s t Distribution

• When σ is known and n ≥ 30, or when σ is unknown and n ≥ 30, the standard normal distribution should be used to find confidence intervals for the population mean. Here, $z_{\alpha/2}$ values are used.

• When σ is unknown and n < 30, the t distribution must be used to find the confidence interval for the population mean. In such situations, S (the sample standard deviation) is used in place of σ (the population standard deviation), and $t_{\alpha/2}$ values are used. The standard error is then estimated by $S_{\bar{x}} = S/\sqrt{n}$ rather than $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

• The t distribution must be used when the sample size is less than 30 and the variable is normally or approximately normally distributed.

Student’s t distribution is:

• Similar to the standard normal distribution.

• Symmetric, bell-shaped, with mean = 0.

- t distributions are bell-shaped and symmetric about the mean, but have "fatter" tails than the normal distribution.

• Lower (flatter) at the centre than the standard normal distribution.

• The curve never touches the x-axis.

• A family of distributions → each depends on the degrees of freedom (n − 1).

• As the degrees of freedom increase, it gets closer to the standard normal distribution.

- Note: t → z as n increases.

- The standard normal is the t distribution with df = ∞.

• The t is a family of distributions.

• The t value depends on the degrees of freedom (d.f.).

- The number of observations that are free to vary after the sample mean has been calculated: d.f. = n − 1.

Degrees of freedom (df): the number of values that are free to vary after a sample statistic has been computed.

Idea: the number of observations that are free to vary after the sample mean has been calculated.
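The "free to vary" idea can be made concrete. Once the sample mean and n − 1 of the values are fixed, the last value is forced, because the values must add up to n × x̄. The function name below is hypothetical, and the numbers reuse the sample 1, 2, 4, 5, 7, 11 (mean 5, n = 6) from the point-estimation example.

```python
def last_value_forced(mean, known_values, n):
    """Given the sample mean and n - 1 of the values, the remaining value is fixed."""
    if len(known_values) != n - 1:
        raise ValueError("supply exactly n - 1 values")
    return mean * n - sum(known_values)   # the values must sum to n * mean


# Mean 5 with n = 6: five values may be chosen freely, the sixth is determined.
print(last_value_forced(5, [1, 2, 4, 5, 7], 6))
```

With six observations only five are free, which is why the sample standard deviation and the t distribution use d.f. = n − 1 = 5.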

Formula for finding a confidence interval for the population mean when σ is unknown and n < 30:

$\bar{x} - t_{\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}$

Example 1:

A random sample of n = 25 has x̄ = 50 and s = 8. Form a 95% confidence interval for µ.

d.f. = n − 1 = 24, so $t_{\alpha/2,\,n-1} = t_{0.025,\,24} = 2.0639$.

The confidence interval is

$\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 50 \pm 2.0639\dfrac{8}{\sqrt{25}}$

46.698 ≤ µ ≤ 53.302

Example 2:

Advertisement sponsors want to know the number of hours children spend watching TV. A survey of 10 children tracks the number of hours per week; the sample mean is 29 and the sample standard deviation is 8.2. The data follow a normal distribution. Find a 95% confidence interval estimate of the average number of hours N.A. children spend watching TV.

A: parameter to be estimated = µ

$\bar{x} \pm t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}} = 29 \pm t_{0.025,\,9}\dfrac{8.2}{\sqrt{10}} = 29 \pm 2.262\dfrac{8.2}{\sqrt{10}} = 29 \pm 5.866 = [23.13,\ 34.87]$

Compare with [24.04, 33.96] from Example 4 of the σ-known case: the t interval is slightly wider, reflecting the extra uncertainty of an unknown σ.
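A minimal sketch of the t interval $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$. Python's standard library has no t quantile function, so this version assumes the critical value is read from a t table and passed in as `t_crit`. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` is one way to compute it instead.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown: x_bar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit is the table value for d.f. = n - 1 at the chosen confidence level.
    """
    e = t_crit * s / math.sqrt(n)   # margin of error using the sample s.d.
    return x_bar - e, x_bar + e


# Example 1: n = 25, x_bar = 50, s = 8, t_{0.025, 24} = 2.0639
print([round(v, 3) for v in t_interval(50, 8, 25, 2.0639)])
# Example 2: n = 10, x_bar = 29, s = 8.2, t_{0.025, 9} = 2.262
print([round(v, 2) for v in t_interval(29, 8.2, 10, 2.262)])
```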

t-distribution table

• t value:

- In the t distribution table, we see that $t_{0.025,\,9} = 2.262$.

Degrees of       Area in upper tail
freedom          .10      .05      .025     .01      .005
.                .        .        .        .        .
7                1.415    1.895    2.365    2.998    3.499
8                1.397    1.860    2.306    2.896    3.355
9                1.383    1.833    2.262    2.821    3.250
10               1.372    1.812    2.228    2.764    3.169
.                .        .        .        .        .
Pre-condition of Estimation

• If σ is known: use the normal distribution.

• If σ is unknown: use the t distribution.

• Pre-condition for both estimations → x̄ follows a normal distribution.

• If the population is not normal and the sample size is less than 30:

→ we do not know whether x̄ follows a normal distribution

→ this estimation is not possible

(In this case, use a non-parametric test.)

6.4 Determining Sample Size:

• The required sample size can be found to reach a desired margin of error (e) and level of confidence (1 − α) using the margin of error formula $e = z_{\alpha/2}\,\sigma/\sqrt{n}$.

• Required sample size, σ known:

$n = \dfrac{z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}} = \left(\dfrac{z_{\alpha/2}\,\sigma}{e}\right)^{2}$

Example: if σ = 45, what sample size is needed to be 90% confident of being correct within ± 5?

$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{e}\right)^{2} = \left(\dfrac{1.645 \times 45}{5}\right)^{2} = 219.19$

So the required sample size is n = 220 (always round up).
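The sample-size formula, including the "always round up" rule, is a single line once `math.ceil` is used. This is a sketch in which z is passed in directly (here the 90% table value 1.645 from the example):

```python
import math


def sample_size_mean(z, sigma, e):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. n = (z * sigma / e)^2 rounded up."""
    return math.ceil((z * sigma / e) ** 2)


# sigma = 45, margin of error 5, 90% confidence (z = 1.645).
print(sample_size_mean(1.645, 45, 5))
```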

Confidence Intervals for the Population Proportion, p

• An interval estimate for the population proportion (p) can be calculated by adding an allowance for uncertainty to the sample proportion (p̂).

• Sample proportion p̂ → estimate p.

• Find the distribution of the sample proportion.

• Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation

$\sigma_p = \sqrt{\dfrac{p(1-p)}{n}}$

• We estimate this with sample data (when n > 20, np > 5, and n(1 − p) > 5):

$s_p = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

• We do not know p → use p̂:

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

• There is no need to use the t distribution here; the z distribution is used.

• Confidence interval endpoints: the upper and lower confidence limits for the population proportion are calculated with the formula

$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

where:

• z is the standard normal value for the desired level of confidence

• p̂ is the sample proportion

• n is the sample size

Example: A survey is conducted to find what percentage of CEOs of mid-size companies hold an MBA degree. Of the 344 CEOs surveyed, 97 have an MBA degree. Estimate the proportion of MBA degree holders among CEOs of all mid-size companies (confidence level 95%).

Answer:

n = 344, x = 97 → p̂ = 97/344 = 0.282

Standard deviation = $\sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.282(1-0.282)/344} = 0.024$

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ = 0.282 ± 1.96 × 0.024 = 0.282 ± 0.047 = [0.235, 0.329]

Example: A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.

Solution:

p̂ = 25/100 = 0.25

$S_p = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.25(0.75)/100} = 0.0433$

0.25 ± 1.96(0.0433)

0.1651 ≤ p ≤ 0.3349

Interpretation: we are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%.
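A minimal sketch of the large-sample proportion interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, with z = 1.96 as the default for 95% confidence. It does not check the large-sample conditions (np > 5 and n(1 − p) > 5); the caller is assumed to have verified them.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error, s_p
    return p_hat - z * se, p_hat + z * se


print([round(v, 4) for v in proportion_interval(25, 100)])   # left-handers
print([round(v, 3) for v in proportion_interval(97, 344)])   # CEOs with an MBA
```

Without the intermediate rounding used in the text, the CEO interval is about 0.234 to 0.330, which agrees with [0.235, 0.329] to within rounding.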

Changing the sample size:

• Increasing the sample size reduces the width of the confidence interval.

Example:

If the sample size in the above example is doubled to 200 and 50 people in the sample are left-handed, the interval is still centred at 0.25, but it shrinks to 0.19 … 0.31.

Finding the Required Sample Size for Proportion Problems

Define the margin of error: $e = z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$

Solve for n: $n = \dfrac{z_{\alpha/2}^{2}\,p(1-p)}{e^{2}}$

p can be estimated with a pilot sample if necessary (or, conservatively, use p = 0.50).

Example:

For 95% confidence, use z = 1.96.

e = 0.03.

p̂ = 0.12, so use this to estimate p.

$n = \dfrac{z_{\alpha/2}^{2}\,p(1-p)}{e^{2}} = \dfrac{(1.96)^2(0.12)(1-0.12)}{(0.03)^2} = 450.74$

So use n = 451.
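The same calculation as a function, rounding up as in the example. The second call shows the conservative choice p = 0.5 mentioned above, which always gives the largest required sample for a given e.

```python
import math


def sample_size_proportion(z, p, e):
    """n = z^2 * p * (1 - p) / e^2, rounded up; use p = 0.5 if no estimate is available."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


print(sample_size_proportion(1.96, 0.12, 0.03))   # pilot estimate p = 0.12
print(sample_size_proportion(1.96, 0.50, 0.03))   # conservative choice p = 0.5
```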

UNIT 7 - HYPOTHESIS TESTING

Introduction

• Meaning of hypothesis: a hypothesis is an assumption or an informed guess about a population characteristic. It can also be defined as an unproven statement or proposition about something under investigation.
• Hypothesis: a statement about a population parameter, i.e., an assumption or guess about population parameters.

• This assumption may or may not be true.

• Hypothesis testing: the process of deciding whether to accept or reject an assumption about a population parameter.
• Hypothesis testing: a procedure, based on sample evidence and probability theory, for determining whether a hypothesis is a reasonable statement.
• Hypothesis testing is a statistical inference approach by means of which assumptions about the unknown values of population parameters are tested against sample evidence.
• The generally applicable convention is to take the hypothesis containing the =, ≥, or ≤ sign as H0 and the hypothesis with the ≠ sign or a strict inequality (< or >) as Ha.
• There are two kinds of hypothesis in each test: the null hypothesis and the alternative hypothesis.
• A hypothesis is a tentative statement about a population parameter.

Procedure for testing a hypothesis

• Hypothesis testing starts with the formulation of a hypothesis and ends with a decision to accept or reject the hypothesis.
• A five-step procedure systematizes hypothesis testing:

Step 1: State the null hypothesis (H0) and the alternative hypothesis (HA)

• The first step is to state the hypothesis being tested, that is, the null hypothesis. It is designated H0, where H stands for hypothesis and the subscript zero implies "no difference."

• Null hypothesis:

- is a statement about the value of a population parameter;

- is a statement that is not rejected unless the sample data provide convincing evidence that it is false.

• Alternative hypothesis: a statement that is accepted if the sample data provide enough evidence that the null hypothesis is false. It is written HA and describes what you will conclude if you reject the null hypothesis.

• In summary:

• The first step of statistical testing is to convert the research question into null and alternative forms.
• H0 is a statement of "no difference", e.g., H0: µ1 = µ2 (null hypothesis)
• HA: µ1 ≠ µ2 (two-tailed test), alternative hypothesis
• HA: µ1 > µ2 (right-tailed test), alternative hypothesis
• HA: µ1 < µ2 (left-tailed test), alternative hypothesis
Step 2: Select the level of significance

• The level of significance is the probability of rejecting the null hypothesis when it is true. It is designated α, the Greek letter alpha, and is also called the level of risk.

• The level of significance measures the risk that a researcher rejects the null hypothesis when the null hypothesis is actually true.

The level of significance should be chosen before the data are collected. The most common level is 0.05 (5%), although 0.01 (1%) is also widely used. A 5% level of significance implies a 5% probability of wrongly concluding that there is a difference between the sample statistic and the hypothesized population parameter when in fact there is no difference.

Critical region (rejection region): if the value of the test statistic falls in the critical region, the null hypothesis is rejected; if it does not fall in the critical region, the null hypothesis is not rejected.

Two types of errors can result from a hypothesis test:

• Type I error: occurs when the researcher rejects a null hypothesis that is true. The probability of committing a Type I error is the significance level, also called alpha and denoted by α.

• Type II error: occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is denoted by β. The probability of not committing a Type II error (1 − β) is called the power of the test.

Step 3: compute the test statistic

• There are many test statistics. In this chapter we will use only Z and t.

• A test statistic is a value, determined from sample information, that is used to decide whether to reject the null hypothesis.

• The z value associated with a sample mean is

$$z = \frac{\text{sample mean} - \text{population mean}}{\text{standard error}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

In this stage the test statistic is used to decide whether or not to reject the null hypothesis. There are different test statistics; some of them are the z-test, the t-test and the chi-square test. The test statistic serves as a guide in deciding whether to reject H0.

Z-test: large sample, i.e. more than 30 observations
t-test: small sample, i.e. less than 30 observations
Chi-square test: measures the discrepancy between observed and expected frequencies

Step 4: Formulate the Decision Rule

• A decision rule is a statement of the conditions under which the null hypothesis is rejected and the conditions under which it is not rejected.

• Critical value: the dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.

• The decision rule is therefore stated in terms of the critical value.

• In general,

• If the calculated value (in absolute terms) is less than the table value, H0 is not rejected; if it is greater than the table value, H0 is rejected.
• If p-value < 0.05, a significant relation exists between the dependent and independent variables, i.e. the result is due to some assignable cause (H0 rejected).
• If p-value > 0.05, no significant relation exists between the dependent and independent variables, i.e. the result is due to chance only (H0 not rejected).
Or
A simple way to remember it: when the p-value is low, i.e. less than .05, reject the null hypothesis ("when p is low, H0 must go").
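The two forms of the decision rule above can be sketched in Python. This is a minimal illustration assuming a two-tailed z-test; the function name `decide` and the z values used are hypothetical, and the table value is taken from the standard library's `statistics.NormalDist` rather than from a printed normal table.

```python
from math import erfc, sqrt
from statistics import NormalDist


def decide(z_calculated: float, alpha: float = 0.05) -> dict:
    """Apply both forms of the two-tailed decision rule to a calculated z value."""
    z_table = NormalDist().inv_cdf(1 - alpha / 2)   # critical (table) value, about 1.96 for alpha = 0.05
    p_value = erfc(abs(z_calculated) / sqrt(2))     # two-tailed p-value, P(|Z| >= |z|)
    return {
        "reject_by_table_value": abs(z_calculated) > z_table,
        "reject_by_p_value": p_value < alpha,
        "p_value": p_value,
    }


print(decide(1.2))   # both rules: H0 not rejected
print(decide(2.8))   # both rules: H0 rejected
```

For a two-tailed test the two rules always agree, because |z| exceeds the table value exactly when the p-value falls below α.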

Step 5: Make a decision

Based on the sample information, compute the test statistic (z or t), check it against the decision rule, and then make the decision.

The Null hypothesis (H0):

• There is no significant difference between the sample statistic and the population parameter.
• It is a statement about one or more population parameters that is believed to be true until the researcher provides enough statistical evidence to conclude otherwise.
• It is rejected only when it becomes evidently false, that is, when the researcher is confident that the data do not support it. A null hypothesis is usually denoted by H0.
For example, if we want to compare the test scores of two random samples of boys and girls, a null hypothesis would be that the mean score of the boy population is the same as the mean score of the girl population:
• H0: μ1 = μ2 (there is no significant difference between the mean scores of boys and girls)
Where H0 = the null hypothesis, μ1 = the mean of the boy population, μ2 = the mean of the girl population.
The symbol μ (mu) represents the population mean. The Greek letter μ refers to a population parameter because we are attempting to make an inference about a population.

• States the assumption (numerical) to be tested.

Example: The average number of TV sets in U.S. homes is at least three (H0: μ ≥ 3).

• Is always about a population parameter, not about a sample statistic:

Correct: H0: μ = 3        Not correct: H0: x̄ = 3

• Begin with the assumption that the null hypothesis is true.

◦ Similar to the idea of innocent until proven guilty

• Refers to the status quo.

• Always contains the “=”, “≤” or “≥” sign.

• May or may not be rejected.

The Alternative Hypothesis, HA


• Alternative hypothesis (Ha): a statement believed to be true if the null hypothesis is false.
• An alternative hypothesis is usually denoted by Ha. Depending on the research, there can be several alternative hypotheses.
• In the example above, we may state the null and alternative hypotheses as follows:
H0: μ1 = μ2 (Null Hypothesis)
Ha: μ1 ≠ μ2 (Alternative Hypothesis)
• In this example, if the null hypothesis is false, then one of several alternative hypotheses could be true:
Ha1: μ1 ≠ μ2 (there is a significant difference between the mean scores of boys and girls)
Ha2: μ1 > μ2 (the mean score of boys is greater than that of girls)
Ha3: μ1 < μ2 (the mean score of boys is less than that of girls)

• Is the opposite of the null hypothesis

◦ e.g.: The average number of TV sets in U.S. homes is less than 3 (HA: μ < 3)

• Challenges the status quo

• Never contains the “=”, “≤” or “≥” sign

• May or may not be accepted

• Is generally the hypothesis that is believed (or needs to be supported) by the researcher

Level of Significance, α

• Defines unlikely values of the sample statistic if the null hypothesis is true

◦ Defines the rejection region of the sampling distribution

• Is designated by α (level of significance)

◦ Typical values are .01, .05, or .10

• Is selected by the researcher at the beginning

• Provides the critical value(s) of the test

Types of error: These are errors that are committed in making decisions.

In hypothesis testing, sample evidence is used to test the null hypothesis. If the sample evidence convinces us that the null hypothesis has a very small chance of being correct, the hypothesis is unreasonable and should be rejected. If it has a greater chance of being true, it should not be rejected. However, we can never be 100% sure about our conclusion, because it is based on sample evidence. That means there is a possibility of accepting a false hypothesis or rejecting a true one. Hence there are two possible errors: rejecting a true hypothesis, which is called a type one error (α), and accepting a false hypothesis, which is called a type two error (β). A type one error is committed only if a true hypothesis is rejected, and a type two error only if a false hypothesis is accepted.

Neither type of error is desirable as far as the reliability of the conclusion is concerned, and statisticians would like to avoid both. However, it is not possible to eliminate these errors as long as our conclusion is based on sample information. The objective should therefore be to keep the chance of making errors low. One thing must be taken into consideration in doing so: for a fixed sample size, setting both kinds of error at a low value at the same time is not possible, because there is a tradeoff between the two types. When we set the probability of committing a type 1 error at a low value, the probability of committing a type 2 error becomes higher. This does not mean that α + β = 1; rather, it means that the higher α is, the lower β is, and vice versa.
• Type I Error: The researcher rejects a null hypothesis that is actually true. Its probability is denoted by alpha (α).

◦ Reject a true null hypothesis

◦ Considered a serious type of error

The probability of a Type I Error is α

• Called the level of significance of the test

• Set by the researcher in advance

• Type II Error: The researcher accepts (fails to reject) a null hypothesis that is actually not true. Its probability is denoted by beta (β).

◦ Fail to reject a false null hypothesis

The probability of a Type II Error is β

Outcomes and Probabilities

                      State of Nature
Decision            | H0 True            | H0 False
Do not reject H0    | No error (1 - α)   | Type II error (β)
Reject H0           | Type I error (α)   | No error (1 - β)

Example:

Errors in Decision Making

                           Truth: Innocent     Truth: Guilty
Decision: Innocent (H0)    Correct             Error (Type II)
Decision: Guilty (H1)      Error (Type I)      Correct (goal)

Under the modern justice system, a Type I error is considered more serious.

• When we make a Type I error, it is a serious mistake!

• Hypothesis testing is analogous to a criminal trial.

H0: The defendant is innocent

H1: The defendant is guilty

• A person is accused of a crime. The jury does not know which hypothesis is really true and must make a decision (choose H0 or H1) on the basis of the evidence found. However, the two hypotheses are not treated equally:

• If there is enough evidence, the jury rejects H0 (innocent) and accepts H1 (guilty).

• If there is not enough evidence, the jury does not reject H0 (innocent) and does not accept H1 (guilty). (But we do not say that we accept H0.)

Type I & II Error Relationship

• Type I and Type II errors cannot happen at the same time

• A Type I error can only occur if H0 is true

• A Type II error can only occur if H0 is false

If the Type I error probability (α) increases, the Type II error probability (β) decreases, and vice versa.

Factors Affecting Type II Error

• All else equal,

◦ β increases when the difference between the hypothesized parameter and its true value decreases

◦ β increases when α decreases

◦ β increases when σ increases

◦ β increases when n decreases
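The four factors can be checked numerically for a two-tailed z-test of a mean. This sketch assumes a normal population with known σ; the function name `type_ii_error` and all the figures (μ0 = 100, true mean 105, σ = 15, n = 36) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """beta for a two-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_table = std_normal.inv_cdf(1 - alpha / 2)
    # Distance of the true mean from mu0, measured in standard errors
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Under the true mean the z statistic is normal with mean `shift` and sd 1;
    # beta is the chance it still lands in the non-rejection region (-z_table, z_table).
    return std_normal.cdf(z_table - shift) - std_normal.cdf(-z_table - shift)


beta = type_ii_error(mu0=100, mu_true=105, sigma=15, n=36)    # hypothetical figures
print(round(beta, 3))
print(type_ii_error(100, 102, 15, 36) > beta)                # smaller difference: True
print(type_ii_error(100, 105, 15, 36, alpha=0.01) > beta)    # smaller alpha: True
print(type_ii_error(100, 105, 25, 36) > beta)                # larger sigma: True
print(type_ii_error(100, 105, 15, 16) > beta)                # smaller n: True
```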

Steps of calculation for hypothesis testing

Step 1: Set up the null and alternative hypotheses.

Step 2: Select the z-test if the sample size is more than 30 or the population standard deviation is known.
Step 3: Calculate the standard error of the mean using the following formula:
$$S.E._{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
where σ = population standard deviation, n = sample size.
Step 4: Calculate the value of Z as follows:
$$Z = \frac{\bar{X} - \mu}{S.E._{\bar{X}}}$$
where X̄ = sample mean, μ = population mean.
Step 5: Find the table value of z at the 5% level of significance from the normal distribution table (1.96 for a two-tailed test).
If the calculated value (in absolute terms) is less than the table value, we do not reject the null hypothesis and conclude that there is no significant difference.
But if the calculated value (in absolute terms) is more than the table value, we reject the null hypothesis and conclude that there is a significant difference.
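Steps 3 to 5 can be collected into one small function. This is a sketch, not a library routine: the name `z_test_mean` and the usage figures are hypothetical, the test is assumed to be two-tailed, and the population standard deviation is assumed known.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test for a mean, following Steps 3-5.

    Assumes the population standard deviation pop_sd is known (or n > 30).
    Returns (standard error, calculated Z, table value, reject H0?).
    """
    se = pop_sd / sqrt(n)                            # Step 3: S.E. = sigma / sqrt(n)
    z = (sample_mean - pop_mean) / se                # Step 4: Z = (X-bar - mu) / S.E.
    z_table = NormalDist().inv_cdf(1 - alpha / 2)    # Step 5: table value (about 1.96 at 5%)
    return se, z, z_table, abs(z) > z_table


se, z, z_table, reject = z_test_mean(sample_mean=52, pop_mean=50, pop_sd=8, n=64)
print(se, z, round(z_table, 2), reject)  # 1.0 2.0 1.96 True
```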
How clear should the evidence be (in order to reject H0)?

- When the judge convicts with 70% evidence of guilt, a guilty verdict carries a 30% chance of misjudgment! (too risky!)

→ Set the defendant free (H0 not rejected; H1 not accepted)

→ This does not guarantee that the defendant is innocent!

- When the judge is 97% sure that the defendant is guilty, a guilty verdict carries only a 3% chance of misjudgment (very small)

→ Guilty verdict (reject H0 and accept H1)

What is the guideline for a guilty verdict?

→ Level of significance α (mistake probability)

→ The maximum mistake probability the jury allows

In a democratic culture, we try to lower the mistake probability. But if we allow only a 0.1% mistake probability, it becomes hard to punish any criminal.

In statistics →

The smaller α is, the safer the decision (smaller chance of a mistake), but the harder it is to prove something.

P-value > α → cannot reject H0 (too much risk)

P-value < α → reject H0

Significance level: the maximum allowed Type 1 error probability.

• P-value (= observed significance level)

• The Type 1 error probability incurred if H0 is rejected on the basis of the observed sample

Practical example
Question: Philips Company claims that the life of its electric bulbs is 2000 hours, with a standard deviation of 30 hours. A random sample of 25 bulbs showed an average life of 1940 hours with a standard deviation of 25 hours. At the 5% level of significance, can we conclude that the sample has come from a population with a mean of 2000 hours?

Solution:
Step 1: Set up the null and alternative hypotheses:

H0: μ = 2000 hours

H1: μ ≠ 2000 hours

Step 2: Calculate the standard error (using the population standard deviation σ = 30)

$$S.E._{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{25}} = 6$$

Step 3: Calculate the value of Z

$$Z = \frac{\bar{X} - \mu}{S.E._{\bar{X}}} = \frac{1940 - 2000}{6} = -10$$

Step 4: At the 5% level (two-tailed test), the table value of z is 1.96.

Step 5: Decision
Since the absolute value of the calculated Z (10) is more than the table value (1.96), we reject the null hypothesis and conclude that the sample has not come from a population with a mean of 2000 hours.
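The arithmetic of the Philips example can be reproduced as a quick check. As in the solution, this sketch uses the population standard deviation (30 hours) and ignores the sample standard deviation (25 hours); the two-tailed p-value is an extra quantity not computed in the original solution.

```python
from math import erfc, sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 2000, 30, 25, 1940      # figures from the Philips question

se = sigma / sqrt(n)                           # Step 2: 30 / 5 = 6.0
z = (x_bar - mu0) / se                         # Step 3: (1940 - 2000) / 6 = -10.0
z_table = NormalDist().inv_cdf(1 - 0.05 / 2)   # Step 4: two-tailed 5% table value
p_value = erfc(abs(z) / sqrt(2))               # two-tailed p-value, P(|Z| >= 10)

print(se, z, round(z_table, 2))                # 6.0 -10.0 1.96
print(abs(z) > z_table, p_value < 0.05)        # True True: reject H0
```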

UNIT 8 - REGRESSION AND CORRELATION

8.1 Linear correlation

Correlation is a statistical technique used to determine whether a relationship exists between variables. Correlation thus denotes the interdependence among variables. Its degree is expressed by a coefficient that ranges between -1 and +1, and the direction of change is indicated by the + or - sign.

If an increase (decrease) in one variable results in a corresponding increase (decrease) in the other, i.e. if the changes are in the same direction, the variables are positively correlated. For example, the heights and weights of a group of persons are positively correlated, as are advertising and sales.

If an increase (decrease) in one variable results in a corresponding decrease (increase) in the other, i.e. if the changes are in opposite directions, the variables are negatively correlated. For example, T.V. registrations and cinema attendance are negatively correlated.

An absence of correlation is indicated by a coefficient of zero.

Correlation thus expresses the relationship through a relative measure of change, and it has nothing to do with the units in which the variables are expressed.

Scatter Diagram

A scatter diagram (or dotgram, or scattergram) is a simple and attractive method of diagrammatic representation of a bivariate distribution, used to ascertain the nature of the correlation between the variables.

Thus, for the bivariate distribution $(x_i, y_i),\ i = 1, 2, \ldots, n$, if the values of the variables X and Y are plotted along the X-axis and Y-axis respectively in the XY plane, the diagram of dots so obtained is known as a scatter diagram.

In other words, a scatter plot of two variables shows the values of one variable on the Y-axis and the values of the other variable on the X-axis. Scatter plots are well suited for revealing the relationship between two variables.

[Figure: Scatter diagram, y plotted against x]

Types of Correlation

Correlation is described or classified in several different ways. Three of the most important are:

1. Positive and negative correlation
2. Simple, partial and multiple correlation
3. Linear and non-linear correlation

1. Positive and negative correlation

If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases the other also decreases), this is called a positive correlation. For example:

Positive Correlation (1)      Positive Correlation (2)
X     Y                       X     Y
10    15                      80    50
12    20                      70    45
14    22                      60    30
18    25                      40    20
20    37                      30    10

If two variables change in opposite directions (i.e. if one increases, the other decreases and vice versa), the correlation is called a negative correlation. For example: T.V. registrations and cinema attendance.

Negative Correlation (1)      Negative Correlation (2)
X     Y                       X     Y
20    40                      100   10
30    30                      90    20
40    22                      60    30
60    15                      40    40
80    12                      30    50

2. Simple, Partial and Multiple Correlation

• When only two variables are studied, it is a problem of simple correlation.

• When three or more variables are studied, it is a problem of either multiple or partial correlation.
In multiple correlation, three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizer used, it is a problem of multiple correlation. Similarly, the relationship among plastic hardness, temperature and pressure is multivariate.

In partial correlation, we recognize more than two variables but consider only two of them, keeping the other influencing variables constant. For example, in the rice problem above, if we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a problem of partial correlation.

3. Linear and non-linear correlation

The nature of the graph gives us an idea of the type of correlation between two variables. If the graph is a straight line, the correlation is called a “linear correlation”; if the graph is not a straight line, the correlation is non-linear or curvilinear.

The distinction between linear and non-linear correlation is based on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. For example, observe the following two variables X and Y:

X: 10 20 30 40 50
Y: 70 140 210 280 350

It is clear that the ratio of change between the two variables is the same. If such variables are plotted on graph paper, all the plotted points fall on a straight line.

[Figure: Scatter diagram of the X and Y values above]

Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if we double the amount of rainfall, the production of rice or wheat would not necessarily be doubled.

[Figure: Scatter diagram illustrating a non-linear relationship between x and y]

8.1.1 The Coefficient of Correlation

Properties of the Coefficient of Correlation

The following are the important properties of the coefficient of correlation, r:

• The coefficient of correlation lies between -1 and +1: $-1 \le r \le 1$.

• The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically: $r = \pm\sqrt{b_{xy}\, b_{yx}}$, where r takes the common sign of the two regression coefficients.

• If X and Y are independent variables, the coefficient of correlation is zero. However, the converse is not true.

Degrees of Correlation

Through the coefficient of correlation, we can measure the degree or extent of the correlation between two variables, and also determine whether the correlation is positive or negative.

• Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfect negative, and its coefficient of correlation is -1. In practice we rarely come across these types of correlation.

• Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other, then we can firmly say that there is no correlation between the two variables. In such a case the coefficient of correlation is 0.

• Limited degrees of correlation: If two variables are neither perfectly correlated nor completely uncorrelated, we term the correlation limited correlation. It may be positive or negative and lies within the limits ±1.

High degree, moderate degree and low degree are the three categories of this kind of correlation. The following table shows the degree of correlation corresponding to values of the coefficient of correlation.

Degrees                    Positive          Negative
Absence of correlation     Zero (0)          Zero (0)
Perfect correlation        +1                -1
High degree                +0.75 to +1       -0.75 to -1
Moderate degree            +0.25 to +0.75    -0.25 to -0.75
Low degree                 0 to +0.25        0 to -0.25
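The table can be turned into a small lookup function. This is a sketch with a hypothetical name, `degree_of_correlation`; because the table's bands share their end points (0.25 and 0.75), it assumes a boundary value belongs to the higher band.

```python
def degree_of_correlation(r: float) -> str:
    """Label r using the bands in the table of degrees of correlation.

    Assumption: a boundary value (0.25 or 0.75) is put in the higher band.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        degree = "perfect"
    elif size >= 0.75:
        degree = "high degree"
    elif size >= 0.25:
        degree = "moderate degree"
    else:
        degree = "low degree"
    return f"{degree} {direction}"


print(degree_of_correlation(0.603))   # moderate degree positive
print(degree_of_correlation(-0.915))  # high degree negative
```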

Methods of Determining Correlation

We shall consider the following most commonly used methods:

• Scatter plot
• Karl Pearson’s coefficient of correlation
• Spearman’s rank correlation coefficient
• Method of least squares

Scatter Plot (Scatter diagram or dot diagram)

In this method the values of the two variables are plotted on graph paper, one along the horizontal (X-axis) and the other along the vertical (Y-axis). Plotting the data gives points (dots) on the graph which are generally scattered, hence the name ‘Scatter Plot’.

The manner in which these points are scattered suggests the degree and the direction of correlation. The degree of correlation is denoted by ‘r’ and its direction is given by the positive or negative sign.

• If all points lie on a rising straight line, the correlation is perfectly positive and r = +1.

[Figure: Scatter diagram, points on a rising straight line (r = +1)]

• If all points lie on a falling straight line, the correlation is perfectly negative and r = -1.

[Figure: Scatter diagram, points on a falling straight line (r = -1)]

• If the points lie in a narrow strip rising upwards, there is a high degree of positive correlation.
• If the points lie in a narrow strip falling downwards, there is a high degree of negative correlation.
• If the points are spread widely over a broad strip rising upwards, there is a low degree of positive correlation.
• If the points are spread widely over a broad strip falling downwards, there is a low degree of negative correlation.
• If the points are scattered without any specific pattern, correlation is absent, i.e. r = 0.


[Figure: Scatter diagram]

Though this method is simple and gives a rough idea about the existence and the degree of correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree of correlation.

Example 1: Given the following pairs of values:

Capital employed (crores of Rs.): 1  2  3  4  5  7  8  9  11  12
Profit (lakhs of Rs.):            3  5  4  7  9  8  10 11 12  14

1) Make a scatter diagram.
2) Do you think that there is any correlation between profits and capital employed? Is it positive? Is it high or low?
[Figure: Correlation between profits and capital employed. Scatter diagram of Profit (lakhs of Rs.) against Capital Employed (crores of Rs.)]

By looking at the scatter diagram we can say that the variables profits and capital employed are correlated. Further, the correlation is positive, because the trend of the points rises from the lower left-hand corner to the upper right-hand corner of the diagram.

The diagram also indicates that the degree of relationship is high, because the plotted points lie in a narrow band, which shows that it is a case of a high degree of positive correlation.

Karl Pearson’s Coefficient of Correlation

Of the several mathematical methods of measuring correlation, Karl Pearson’s method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The coefficient of correlation is denoted by the symbol r. If the two variables under study are X and Y, the following formula suggested by Karl Pearson can be used to measure the degree of relationship:

$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$

The value of the coefficient of correlation obtained by the above formula always lies between -1 and +1.

When r = +1, there is perfect positive correlation between the variables.

When r = -1, there is perfect negative correlation between the variables.

When r = 0, there is no (linear) relationship between the variables.
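Pearson's formula translates directly into code built from the raw sums ΣX, ΣY, ΣX², ΣY² and ΣXY. This is a minimal sketch (the name `pearson_r` is hypothetical); the usage line applies it to the father/son heights of Example 1 that follows.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r computed from raw sums, mirroring the formula above."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least two values")
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = sxy - sx * sy / n
    denominator = sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return numerator / denominator


father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]
print(round(pearson_r(father, son), 3))  # 0.603
```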

Example 1: Calculate the coefficient of correlation between the heights of fathers and their sons for the following data.

Height of father (cm): 165  166  167  168  167  169  170  172
Height of son (cm):    167  168  165  172  168  172  169  171

Solution:

We know that the coefficient of correlation is

$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$

Let the height of the father be X and the height of the son be Y.

Using a calculator we get

ΣX² = 225828, ΣX = 1344, N = 8

ΣY² = 228532, ΣY = 1352, ΣXY = 227160

$$r = \frac{227160 - \frac{1344 \times 1352}{8}}{\sqrt{\left(225828 - \frac{1344^2}{8}\right)\left(228532 - \frac{1352^2}{8}\right)}} = \frac{24}{\sqrt{36 \times 44}} = 0.603022689 \approx 0.603$$

Example 2: The following data consist of observations of the weights of 10 different automobiles (in 1000 pounds) and the corresponding fuel consumptions (gallons per 100 miles).

Weight (x)    Fuel Consumption (y)
3.4           5.5
3.8           5.9
4.1           6.5
2.2           3.3
2.6           3.6
2.9           4.6
2.0           2.9
2.7           3.6
1.9           3.1
3.4           4.9

We would like to find out how y is correlated with x.

Solution: We know that the coefficient of correlation is

$$r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$

Using a calculator we get

ΣX² = 89.28, ΣX = 29, N = 10

ΣY² = 207.31, ΣY = 43.9, ΣXY = 135.8

$$r = \frac{135.8 - \frac{29 \times 43.9}{10}}{\sqrt{\left(89.28 - \frac{29^2}{10}\right)\left(207.31 - \frac{43.9^2}{10}\right)}} = 0.976629971 \approx 0.977$$
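The calculator sums for Example 2 can be re-derived from the table as a check before applying the formula. This sketch uses only built-in arithmetic; it confirms N = 10 and ΣX² = 89.28 for the ten weights listed.

```python
from math import sqrt

weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]   # x, in 1000 lb
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # y, gallons per 100 miles
n = len(weight)                                               # 10 observations

sum_x, sum_y = sum(weight), sum(fuel)
sum_x2 = sum(v * v for v in weight)
sum_y2 = sum(v * v for v in fuel)
sum_xy = sum(a * b for a, b in zip(weight, fuel))
print(round(sum_x, 2), round(sum_y, 2), round(sum_x2, 2), round(sum_y2, 2), round(sum_xy, 2))
# 29.0 43.9 89.28 207.31 135.8

r = (sum_xy - sum_x * sum_y / n) / sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
print(round(r, 3))  # 0.977
```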

8.1.2 Rank Correlation Coefficient

(3) Spearman’s Rank Correlation
The association between two series of ranks is called rank correlation. The method of obtaining the coefficient of correlation from ranks was devised by Charles Edward Spearman in 1904. This method is especially useful when the actual magnitudes or item values are not given and only their ranks in the series are known. Spearman’s rank correlation coefficient, usually denoted by ρ (rho), is given by the formula:

$$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where d stands for the difference between each pair of ranks and n is the number of paired observations.

The value of Spearman’s rank correlation coefficient ranges between -1 and +1. When ρ is +1, the agreement between the rankings is perfect and the ranks are in the same direction. When ρ is -1, there is also perfect agreement between the rankings, but the ranks are in opposite directions.

In rank correlation we may have two types of problems:

A. Where actual ranks are given.

B. Where ranks are not given.

A. Where Actual Ranks are given

Where actual ranks are given, the steps required for computing rank correlation are:

• Take the difference of the two ranks, i.e. (R1 - R2), and denote these differences by d.

• Square these differences and obtain the total Σd_i².

• Apply the formula: $\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n}$
Example 1: Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows:

Employee   Ranked by Manager I   Ranked by Manager II
A          10                    9
B          2                     4
C          1                     2
D          4                     3
E          3                     1
F          6                     5
G          5                     6
H          8                     8
I          7                     7
J          9                     10

Compute the coefficient of rank correlation and comment on the value.

Solution:

Calculation of Rank Correlation Coefficient

Employee   R1 (Manager I)   R2 (Manager II)   d² = (R1 - R2)²
A          10               9                 1
B          2                4                 4
C          1                2                 1
D          4                3                 1
E          3                1                 4
F          6                5                 1
G          5                6                 1
H          8                8                 0
I          7                7                 0
J          9                10                1
Total                                         Σd_i² = 14

We know that

$$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 14}{10^3 - 10} = 1 - \frac{84}{990} = 0.915$$

Thus we find that there is a high degree of positive correlation in the ranks assigned by the two managers.
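The steps for given ranks (difference, square, total, formula) can be sketched as one function. The name `spearman_rho` is hypothetical, and the sketch assumes the rankings contain no ties, as in the example; the usage line reproduces the two managers' result.

```python
def spearman_rho(rank1, rank2):
    """Spearman's rank correlation: rho = 1 - 6*sum(d^2) / (n^3 - n).

    Assumes two complete rankings without ties.
    """
    n = len(rank1)
    if n != len(rank2) or n < 2:
        raise ValueError("need two rankings of the same length (n >= 2)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))   # total of d squared
    return 1 - 6 * sum_d2 / (n ** 3 - n)


manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]   # employees A-J, Manager I
manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]   # employees A-J, Manager II
print(round(spearman_rho(manager_1, manager_2), 3))  # 0.915
```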

B. Where Ranks are not given

When we are given the actual data and not the ranks, it is necessary to assign the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1, but whichever we start with, we must follow the same method for all the variables.

Example 1:

Calculate the rank correlation coefficient for the following data of marks of 2 tests given to candidates for a clerical job:

Preliminary test: 92  89  87  86  83  77  71  63  53  50
Final test:       86  83  91  77  68  85  52  82  37  57

Solution:

Calculation of Rank Correlation Coefficient

Preliminary test   R1   Final test   R2   d² = (R1 - R2)²
92                 10   86           9    1
89                 9    83           7    4
87                 8    91           10   4
86                 7    77           5    4
83                 6    68           4    4
77                 5    85           8    9
71                 4    52           2    4
63                 3    82           6    9
53                 2    37           1    1
50                 1    57           3    4
Total                                     Σd_i² = 44

We know that

$$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 44}{10^3 - 10} = 1 - 0.267 = 0.733$$

Thus there is a fairly high degree of positive correlation between the preliminary and final tests (0.733 lies just below the 0.75 boundary of the "high degree" band in the table of degrees of correlation).
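Assigning ranks and applying the formula can be sketched together. The helper name `ranks_lowest_first` is hypothetical; it gives rank 1 to the lowest value (the convention used in the solution) and deliberately rejects tied values, since ties would need average ranks, which this sketch omits.

```python
def ranks_lowest_first(values):
    """Rank 1 for the smallest value, n for the largest (assumes no tied values)."""
    order = sorted(values)
    if len(set(order)) != len(order):
        raise ValueError("tied values need average ranks, which this sketch omits")
    return [order.index(v) + 1 for v in values]


preliminary = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r1, r2 = ranks_lowest_first(preliminary), ranks_lowest_first(final)

n = len(r1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
rho = 1 - 6 * sum_d2 / (n ** 3 - n)
print(r2)                        # [9, 7, 10, 5, 4, 8, 2, 6, 1, 3]
print(sum_d2, round(rho, 3))     # 44 0.733
```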

8.2.1 Simple Linear Regression

Simple linear regression refers to the linear relationship between two variables. We usually denote the dependent variable by Y and the independent variable by X. A simple regression line is the line fitted to the points plotted in the scatter diagram that describes the average relationship between the two variables. Therefore, to see the type of relationship, it is advisable to prepare a scatter plot before fitting the model. The linear model (regression of Y on X) is

$$Y = a + bX$$

where the slope is

$$b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

and the intercept is

$$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

where b = slope and a = constant (intercept).

Example 1: From the following data obtain the regression equation of Y on X:

Sales (X):      91  97  108  121  67  124  51  73  111  57
Purchase (Y):   71  75  69   97   70  91   39  61  80   47

Solution:

Sales (X)   Purchase (Y)   X²      Y²     XY
91          71             8281    5041   6461
97          75             9409    5625   7275
108         69             11664   4761   7452
121         97             14641   9409   11737
67          70             4489    4900   4690
124         91             15376   8281   11284
51          39             2601    1521   1989
73          61             5329    3721   4453
111         80             12321   6400   8880
57          47             3249    2209   2679

ΣX = 900   ΣY = 700   ΣX² = 87360   ΣY² = 51868   ΣXY = 66900

$$b_{yx} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{66900 - \frac{900 \times 700}{10}}{87360 - \frac{900^2}{10}} = \frac{3900}{6360} = 0.613207547 \approx 0.613$$

$$a = \bar{Y} - b\bar{X} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{700}{10} - 0.6132 \times \frac{900}{10} \approx 14.81$$

Then, the regression equation of Y on X is Y = 14.81 + 0.613X
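The slope and intercept formulas can be sketched as a small function and checked against this example. The name `fit_y_on_x` is hypothetical. Note that the intercept 14.81 comes from the unrounded slope (0.61321...); plugging in the rounded 0.613 would give about 14.83.

```python
def fit_y_on_x(x, y):
    """Least-squares line Y = a + bX using the raw-sum formulas of 8.2.1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)   # slope
    a = sy / n - b * sx / n                          # intercept: Y-bar minus b times X-bar
    return a, b


sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
purchase = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]
a, b = fit_y_on_x(sales, purchase)
print(round(b, 3), round(a, 2))  # 0.613 14.81
```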

8.2.2 Curve Fitting, the Method of Least Squares, r²

The least squares principle determines a regression equation by minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
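The least squares principle can be seen numerically: with the sales/purchase data from the example above, moving either coefficient away from its least-squares value increases the sum of squared vertical distances. This is an illustrative sketch; the helper `sse` and the perturbation sizes are arbitrary choices.

```python
def sse(a, b, x, y):
    """Sum of squared vertical distances between the observed y and a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
purchase = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]
n = len(sales)
x_bar, y_bar = sum(sales) / n, sum(purchase) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(sales, purchase))
     / sum((xi - x_bar) ** 2 for xi in sales))
a = y_bar - b * x_bar

best = sse(a, b, sales, purchase)
# Moving either coefficient away from its least-squares value increases the SSE
worse = [sse(a + da, b + db, sales, purchase) > best
         for da, db in [(1, 0), (-1, 0), (0, 0.01), (0, -0.01)]]
print(round(a, 2), round(b, 3), worse)  # 14.81 0.613 [True, True, True, True]
```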

Coefficient of Determination

One very convenient and useful way of interpreting the value of the coefficient of correlation between two variables is to use its square, which is called the coefficient of determination. The coefficient of determination thus equals r².

If r = 0.9, r² will be 0.81, which means that 81% of the variation in the dependent variable has been explained by the independent variable.
