
DS Full

This document outlines the key concepts and formulas for several probability distributions that are important in quantitative analysis, including the binomial, Poisson, and normal distributions. It provides examples of how to calculate probabilities and parameter values for each distribution. The document also discusses key characteristics of the normal distribution and how standard normal tables can be used to find probabilities for the standard normal distribution.

MSU 5509 - Quantitative Techniques for Management II


Department of Organizational Studies
Faculty of Management Studies
The Open University of Sri Lanka
Outline
1. Probability Distributions
a. Binomial Distribution
b. Poisson Distribution
c. Normal Distribution
d. Normal Approximation to Binomial Distribution
e. Normal Approximation to Poisson Distribution
2. Sampling Applications - Confidence Intervals
3. Hypothesis Testing
a. One Sample Tests
b. Two Sample Tests
c. Chi Square Test
4. Correlation Analysis
5. Regression Analysis
6. Time Series Analysis

Probability Distributions


A Probability Distribution
● Consider the experiment of flipping a coin twice.
● The 4 possible outcomes are HH, HT, TH, TT
● Let the variable X represent the number of heads.
○ X can take the values 0, 1 or 2.
○ A variable that takes different values, each with an associated probability, is called a random variable.
● Listing the values of the random variable with their associated probabilities gives a probability distribution.

X Probability
0 1/4 = 0.25
1 2/4 = 0.5
2 1/4 = 0.25

A Probability Distribution Ctd.


● In precise terms, a probability distribution is a total listing of the
various values that a random variable can take along with the
corresponding probability of each value.
● A cumulative probability is the probability that the value of a random variable falls within a specified range, most commonly at or below a given value, P(X ≤ x).
● A real-life example is the distribution of the time customers wait before they can use an ATM for their transactions. A study of a random sample of 500 customers reveals the following probability distribution.
Example

X (Waiting time per customer in minutes)   P(X)
0                                          0.20
1                                          0.18
2                                          0.16
3                                          0.12
4                                          0.10
5                                          0.09
6                                          0.08
7                                          0.04
8                                          0.03
Total                                      1.00

a) What is the probability that a customer will wait more than 5 minutes?
b) What is the probability that a customer will not wait?
c) What is the probability that a customer will wait less than 4 minutes?
d) What is the probability that a customer will wait at least 1 minute?
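
The four questions can be checked by summing the tabulated probabilities over the values that satisfy each condition. The following is a minimal Python sketch; the names `waiting` and `prob` are illustrative and not part of the course material.

```python
# Waiting-time distribution from the table: minutes -> probability.
waiting = {0: 0.20, 1: 0.18, 2: 0.16, 3: 0.12, 4: 0.10,
           5: 0.09, 6: 0.08, 7: 0.04, 8: 0.03}

def prob(condition):
    """Sum P(X = x) over every value x that satisfies the condition."""
    return sum(p for x, p in waiting.items() if condition(x))

p_more_than_5 = prob(lambda x: x > 5)   # (a) P(X > 5)
p_no_wait = prob(lambda x: x == 0)      # (b) P(X = 0)
p_less_than_4 = prob(lambda x: x < 4)   # (c) P(X < 4)
p_at_least_1 = prob(lambda x: x >= 1)   # (d) P(X >= 1) = 1 - P(X = 0)

for label, value in [("a", p_more_than_5), ("b", p_no_wait),
                     ("c", p_less_than_4), ("d", p_at_least_1)]:
    print(label, round(value, 2))
```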

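Expected Value of a Random Variable – E(X)

● The expected value (mean) of a discrete random variable is the probability-weighted average of its values.
● $E(X) = \sum_i x_i \, P(x_i)$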

Variability of a Random Variable – V(X)


● The standard deviation of a discrete random variable (σ) is the square root of its variance (σ²).
● The variance of a random variable is computed as:
● $\sigma^2 = \sum_i [x_i - E(x)]^2 \, P(x_i)$
● where x_i is the value of the random variable for outcome i, P(x_i) is the probability of outcome i, and E(x) is the expected value of the discrete random variable x.
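
As an illustration, the sketch below applies the variance formula to the ATM waiting-time distribution from the earlier example. It assumes the usual definition of the expected value, E(X) = Σ x_i P(x_i); the variable names are illustrative.

```python
import math

# Waiting-time distribution from the ATM example: minutes -> probability.
waiting = {0: 0.20, 1: 0.18, 2: 0.16, 3: 0.12, 4: 0.10,
           5: 0.09, 6: 0.08, 7: 0.04, 8: 0.03}

# E(X): probability-weighted average of the values.
expected = sum(x * p for x, p in waiting.items())

# V(X): probability-weighted average of squared deviations from E(X).
variance = sum((x - expected) ** 2 * p for x, p in waiting.items())
std_dev = math.sqrt(variance)

print(f"E(X) = {expected:.2f}, V(X) = {variance:.4f}, sd = {std_dev:.4f}")
```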

Discrete Probability Distributions


● The probability distribution of a discrete random variable is called a discrete probability distribution.
● The variable can assume only a restricted number of distinct values, typically whole numbers.
● Ex: The number of units of a product demanded per day, the number
of cars passing through a street per hour during peak traffic.
● In this course, we will deal with two discrete probability distributions;
■ Binomial Distribution
■ Poisson Distribution

Continuous Probability Distributions


● The probability distribution of a continuous random variable is called a continuous probability distribution.
● The variable can take any value within some interval of real numbers.
● Ex: Measurements of the heights and weights of the respondents.
● In this course, we will deal with one important continuous probability distribution, namely the Normal Distribution.

Binomial Distribution
● This is a widely used discrete probability distribution that plays a major role in quality control and quality assurance functions.
● Examples:
○ Manufacturing units use the binomial distribution for defect analysis.
○ Service organizations such as banks and insurance corporations use it to estimate the proportion of customers who are satisfied with the service quality.
○ It is used to decide whether to accept or reject a lot of components or finished products based on a statistically designed sampling plan.

Conditions for Applying Binomial Distribution


● A fixed number (n) of trials.
● Trials are independent.
● Two outcomes: success (S) or failure (F).
● The same probability of success for each trial.

Binomial Probability Function


$P(r) = {}^{n}C_{r} \, p^{r} q^{n-r}$

○ n = number of trials
○ p = probability of success on a single trial
○ q = 1 - p
○ r = number of successes in n trials (the random variable)

● When a random variable X follows a Binomial distribution, we denote it as X ~ Bin(n, p)
● Mean = E(X) = µ = np
● Variance = σ² = npq

Examples
● If X ~ Bin(5,0.3), find
a) P(X=5) b) P(X≤4)

● A bank issues credit cards to customers under the Master Card scheme. Based on past data, the bank has found that 60% of all accounts pay on time following the bill. If a sample of 7 accounts is selected at random from the current database, construct the Binomial Probability Distribution of accounts paying on time.
a) What is the probability that no account pays on time?
b) What is the probability that at least one account pays on time?
c) Calculate the mean and standard deviation of the number of accounts paying on time.
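
Both examples can be worked with a direct implementation of the binomial probability function P(r) = nCr p^r q^(n-r). The sketch below is one possible way to do it; `binom_pmf` is an illustrative helper name, not a function from the course.

```python
from math import comb, sqrt

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Example 1: X ~ Bin(5, 0.3)
p_eq_5 = binom_pmf(5, 5, 0.3)   # (a) P(X = 5)
p_le_4 = 1 - p_eq_5             # (b) P(X <= 4) = 1 - P(X = 5)

# Example 2: 7 accounts, each paying on time with probability 0.6
n, p = 7, 0.6
distribution = [binom_pmf(r, n, p) for r in range(n + 1)]
p_none = distribution[0]          # (a) no account pays on time
p_at_least_one = 1 - p_none       # (b) at least one pays on time
mean = n * p                      # (c) np
std_dev = sqrt(n * p * (1 - p))   # (c) sqrt(npq)

print(round(p_eq_5, 5), round(p_le_4, 5))
print([round(q, 4) for q in distribution])
print(round(p_none, 4), round(p_at_least_one, 4), mean, round(std_dev, 4))
```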

Poisson Distribution
● This is another discrete distribution which also plays a major role in
quality control in the context of reducing the number of defects per
standard unit such as number of defects per item etc.
● Other real life examples include;
○ The number of cars arriving at a highway entrance per hour.
○ The number of customers visiting a bank per hour during peak business period.

Poisson Distribution Function


● $P(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$
where e = 2.71828
x = number of successes per unit, which can take the values 0, 1, 2, …
λ = average number of successes per unit

● λ is the parameter of the Poisson distribution
● Mean of the Poisson distribution = λ
● Standard deviation of the Poisson distribution = √λ

Examples
● If, on average, 6 customers arrive every two minutes at a bank during busy working hours,
a) What is the probability that exactly 4 customers arrive in a given minute?
b) What is the probability that more than 3 customers arrive in a given minute?

● A book containing 500 pages has 750 misprints.
a) What is the average number of misprints per page?
b) Find the probability that any page of the book contains:
1. No misprints
2. Exactly 4 misprints
3. More than the average number of misprints
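
A possible way to work these examples is sketched below. The key step in each is converting the stated rate to the unit asked about: 6 customers per two minutes gives λ = 3 per minute, and 750 misprints over 500 pages gives λ = 1.5 per page. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Bank example: rate per minute.
lam_bank = 6 / 2
p_exactly_4 = poisson_pmf(4, lam_bank)
p_more_than_3 = 1 - sum(poisson_pmf(x, lam_bank) for x in range(4))  # 1 - P(X <= 3)

# Book example: misprints per page.
lam_book = 750 / 500
p_no_misprints = poisson_pmf(0, lam_book)
p_four_misprints = poisson_pmf(4, lam_book)
# "More than the average" means X > 1.5, i.e. X >= 2 for a count.
p_above_average = 1 - sum(poisson_pmf(x, lam_book) for x in range(2))

print(round(p_exactly_4, 4), round(p_more_than_3, 4))
print(lam_book, round(p_no_misprints, 4), round(p_four_misprints, 4),
      round(p_above_average, 4))
```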

The Normal Distribution


● This is the most widely used continuous distribution.
● It occupies a unique place in the field of Statistics.
● Among other things, people’s weight, height and shoe sizes are approximately normally distributed, as are the annual rainfall and temperature of a region, IQ scores, test scores and many other natural phenomena.
● Many distributions can be approximated by the Normal Distribution.
● It is a bell-shaped distribution that is symmetric about the mean; the mean, median and mode are all equal.
● This has 2 parameters namely, mean and variance.

Probability and The Normal Curve


● The total area under the normal curve is equal to 1.
● The probability that a normal random variable X equals any particular value
is 0.
● The probability that X > a is the area under the curve to the right of a.
● The probability that X < a is the area under the curve to the left of a.

The Standard Normal Distribution


● The Normal distribution with µ=0 and σ=1 is referred to as the
Standard Normal Distribution.
● It is represented as Z ~ N(0,1)
● We standardize a random variable X by using the formula; Z = (X-µ)/σ
● Standard normal tables are used to calculate the probabilities
associated with normal distribution.

Use of Standard Normal Tables


● Only positive values of z are given in the tables. So for negative values of
z, the symmetric property of the curve is used.

P(Z<-a)=P(Z>a)=prob

P(Z>-a)=1-P(Z>a)=1-prob

Examples
● P(Z≤2.1)
● P(Z>2.1)
● P(1.5 ≤Z ≤2.1)
● P(Z<-1.5)

● Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard deviation of 10, what is the probability that a person who takes the test will score between 90 and 110?
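
These probabilities are normally read from standard normal tables. As a cross-check, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the tables; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal, mean 0 and standard deviation 1

p_le_2_1 = z.cdf(2.1)                # P(Z <= 2.1)
p_gt_2_1 = 1 - z.cdf(2.1)            # P(Z > 2.1)
p_between = z.cdf(2.1) - z.cdf(1.5)  # P(1.5 <= Z <= 2.1)
p_lt_neg_1_5 = z.cdf(-1.5)           # P(Z < -1.5) = P(Z > 1.5) by symmetry

# IQ example: X ~ N(100, 10^2); standardizing gives P(-1 < Z < 1).
iq = NormalDist(mu=100, sigma=10)
p_iq = iq.cdf(110) - iq.cdf(90)

print(round(p_le_2_1, 4), round(p_gt_2_1, 4), round(p_between, 4),
      round(p_lt_neg_1_5, 4), round(p_iq, 4))
```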

Continuity Correction
●If P(X=n), use P(n-0.5<X<n+0.5)

●If P(X>n), use P(X>n+0.5)

●If P(X≥n), use P(X>n-0.5)

●If P(X≤n), use P(X<n+0.5)

●If P(X<n), use P(X<n-0.5)



Normal Approximation to Binomial Distribution
● Under certain conditions, the normal distribution can be used as an approximation to the binomial distribution.
● This works best when the sample size n is large, np > 5 and the probability of success p is close to 0.5.
● In general, if X ~ Bin(n, p), then µ = E(X) = np and σ² = V(X) = npq.
● Now, for large n and p close to 0.5, X ~ N(np, npq) approximately.

Example
● Find the probability of obtaining between 4 and 7 heads inclusive in 12 tosses of a fair coin,
○ Using the binomial distribution
○ Using the normal approximation to binomial distribution
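
The two approaches can be compared numerically. The sketch below computes the exact binomial probability P(4 ≤ X ≤ 7) for X ~ Bin(12, 0.5) and the normal approximation with continuity correction, P(3.5 < X < 7.5) for X ~ N(6, 3). It uses `statistics.NormalDist` rather than tables; the names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 0.5

# Exact: sum the binomial probabilities for r = 4, 5, 6, 7.
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(4, 8))

# Approximation: X ~ N(np, npq), with a continuity correction of 0.5
# on each side because the interval is inclusive.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(3.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```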

Normal Approximation to Poisson Distribution


● If X ~ Poi(λ), then E(X) = λ and V(X) = λ
● Now, for large λ, X ~ N(λ, λ) approximately.
● Generally we require λ > 20 for a good approximation.

Example
● A radioactive disintegration gives counts that follow a Poisson distribution
with mean count per second of 25. Find the probability that in 1 second
the count is between 23 and 27 inclusive,
○ Using the Poisson distribution
○ Using normal approximation to Poisson distribution
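
A sketch of both calculations follows: the exact Poisson probability P(23 ≤ X ≤ 27) with λ = 25, and the normal approximation X ~ N(25, 25) with continuity correction, P(22.5 < X < 27.5). The variable names are illustrative.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 25

# Exact: sum the Poisson probabilities for counts 23 to 27 inclusive.
exact = sum(exp(-lam) * lam**k / factorial(k) for k in range(23, 28))

# Approximation: mean lam, variance lam, continuity correction of 0.5.
approx_dist = NormalDist(mu=lam, sigma=sqrt(lam))
approx = approx_dist.cdf(27.5) - approx_dist.cdf(22.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```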

Sampling Applications
MCU3209_Jayani C. Hapugoda_OUSL 29

Introduction
● Statistics consists of two main branches called descriptive statistics
and inferential statistics.
● Inferential statistics are used to generalize the findings from sample
to the population.

Relatively small sample → infer or make decisions → whole population

Important Points on Notation


Parameter
● A numerical value associated with a population is called a
‘parameter’ (fixed)
● Ex: If you want to study the income level of a particular class of people, the average income of this entire class of people is a parameter.
Statistic
● A numerical value computed from a sample is called a ‘statistic’
(variable)
● It is used to make inferences about the population parameter.
● Ex: In the previous example, average income based on a sample
is a statistic.

Important Points on Notation


                     Parameter   Statistic
Mean                 µ           x̄
Standard Deviation   σ           s
Proportion           p           p̂

The Random Nature of Samples

● In theory, take many random samples (of the same size) from a population.
● What do you expect these samples to be?
○ You can expect different samples from the same population to give different values for the same statistic (ex: sample mean).

Example
● Consider a population of 5 school buses A, B, C, D, E. The number of students travelling in each bus is given below, followed by the sample mean for every possible sample of 2 buses:

Bus   No. of students
A     24
B     30
C     21
D     18
E     27

Sample   Sample mean
AB       27.0
AC       22.5
AD       21.0
AE       25.5
BC       25.5
BD       24.0
BE       28.5
CD       19.5
CE       24.0
DE       22.5
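
The table of sample means can be reproduced by enumerating every sample of 2 buses. The sketch below also shows that the average of all sample means equals the population mean. The names are illustrative.

```python
from itertools import combinations
from statistics import mean

students = {"A": 24, "B": 30, "C": 21, "D": 18, "E": 27}

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = {
    a + b: (students[a] + students[b]) / 2
    for a, b in combinations(students, 2)
}

population_mean = mean(students.values())
mean_of_sample_means = mean(sample_means.values())

for sample, value in sample_means.items():
    print(sample, value)
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
```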

Sampling Distribution
● The frequency distribution of possible values of a statistic for repeated samples of the same size from the same population is called the sampling distribution of the statistic.

● If individual observations in the population have the N(µ, σ²) distribution, then the sample mean x̄ of n independent observations has the distribution N(µ, σ²/n).

● If measurements in the population follow a Normal distribution, then so do the sample means.

Sampling Distribution of Sample Means – Skewed Populations

● As n increases, the shape of the sampling distribution becomes more normal.

Central Limit Theorem


● If the population has the N(µ, σ²) distribution, then the sampling distribution of sample means has the distribution N(µ, σ²/n).

● If the population has a skewed distribution but the sample size is sufficiently large (n > 30), then the sampling distribution of sample means is approximately N(µ, σ²/n).

● Whatever distribution the population values follow, the sample means will approximately follow a Normal distribution if the sample size is large.

Applications of Sampling Distribution Theory


● Parameter Estimation:
○ Uses sample data to provide a single value, or a range of values, that is expected to cover the true population value with a stated level of confidence.

● Hypothesis Testing / Tests of Significance:
○ Uses the sample data to test whether the population value is really what it is believed to be.

Parameter Estimation

● Usually population parameters are unknown.
● Therefore, they are estimated by taking a random sample or samples from the population (estimating the parameters using the sample values).
● Two methods are used for parameter estimation:
○ Point Estimation
○ Interval Estimation

1. Point Estimation

● A point estimate is a single value used to estimate the corresponding population parameter.
● Eg: the sample mean is used as an estimate of the population mean.

2. Interval Estimation
● For a given population parameter, we can construct a random interval
so that it has a given probability of capturing the population
parameter.
● It deals with replacing a point estimate, a single number, by an entire
interval of possible values. (An interval of possible values for the
parameter being estimated).
● An interval estimate provides more information about population
parameter than does a point estimate.

Lower confidence limit | Point estimate | Upper confidence limit
(the width of the interval is the distance between the two limits)

Confidence Intervals (CI)


● An interval of values that is likely to include the true population value can be computed from sample data. Such interval estimates are called confidence intervals.

● An interval calculated from the data is usually of the form:
○ Point estimate ± margin of error

● How do we define ‘likely’? It is based on the level of confidence.


Confidence Intervals (CI) Ctd.

Confidence Intervals
○ Population Mean (µ)
■ When population standard deviation (σ) known
■ When population standard deviation (σ) unknown
○ Population Proportion (p)

1) Confidence Interval for μ when σ known


● Example:
● A random sample of 12 items is selected from a normal population with
variance 4. Measurements of some characteristics of the sample are as
follows: 9.5, 9.5, 11.2, 10.6, 9.9, 11.1, 10.9, 9.8, 10.1, 10.2, 10.9, 11
Calculate 95% confidence interval for the population mean.
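
A sketch of this calculation follows. It assumes the standard interval for a normal population with known σ, x̄ ± z σ/√n, which is not spelled out in the surviving text. The critical value z comes from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist, mean

data = [9.5, 9.5, 11.2, 10.6, 9.9, 11.1, 10.9, 9.8, 10.1, 10.2, 10.9, 11]
sigma = sqrt(4)      # the population variance is given as 4
confidence = 0.95

x_bar = mean(data)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * sigma / sqrt(len(data))                 # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```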

2) Confidence Interval for μ when σ unknown


● In most of the real world applications, the standard deviation is
unknown.
● If the population standard deviation is unknown, we substitute it by the
sample standard deviation, s.
● This introduces extra uncertainty, since s varies from sample to sample.
● Therefore, we use the t distribution instead of Normal distribution.

2) Confidence Interval for μ when σ unknown ctd.


● Example:
● Ten packets of biscuits are selected at random and their weights are given
below: 397.3, 401, 392.9, 396.8, 400, 397.6, 392.1, 400.8, 399.2, 400.6
Calculate 95% confidence interval for the population mean.
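
A sketch of this calculation follows, assuming the usual t-based interval x̄ ± t s/√n with n - 1 = 9 degrees of freedom. The Python standard library has no t distribution, so the critical value 2.262 is hard-coded from a t-table; treat it as an input rather than something the code derives.

```python
from math import sqrt
from statistics import mean, stdev

weights = [397.3, 401, 392.9, 396.8, 400, 397.6, 392.1, 400.8, 399.2, 400.6]

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)      # sample standard deviation (divides by n - 1)

t_crit = 2.262          # t-table value: 9 df, 2.5% in each tail (assumed input)
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.2f}, s = {s:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```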

3) Confidence Interval for proportion (p)




● Example:
● A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular machine. In a random sample of 300 items, 45 were found to be defective. Calculate a 95% confidence interval for the proportion of defective items in the batch.
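
A sketch of this calculation follows, assuming the usual large-sample interval for a proportion, p̂ ± z √(p̂(1 - p̂)/n). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, defectives = 300, 45
confidence = 0.95

p_hat = defectives / n                                # sample proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```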
Hypothesis Testing


The Idea of Hypothesis Testing


● Is the population value really what it has always been thought to be?
○ Take a sample to test whether the hypothesized population value is correct.
○ Do the data give evidence against the claim? Consider the p-value.
● Ex:
○ A biscuit manufacturer claims that the average weight of a biscuit is 400mg.
○ Take a sample of biscuits to check whether the claim is true.
○ A sample of 10 biscuits gave a sample mean of 397.83.
○ Suppose we know that the weights of biscuits follow a Normal distribution with known standard deviation 3.19.
○ Do the data provide sufficient evidence against the manufacturer’s claim?

Applications of Hypothesis Testing


● The decision makers of companies have to make decisions with
minimum risk in an environment characterized by uncertainty.
● Acceptance or rejection of a decision depends on acceptance or
rejection of a hypothesis.
● Ex: A marketing manager is facing a decision whether to introduce
a new product in the market or not. If his company could get a
market share of 15% or more, then the new product would be
introduced to the market. A suitable hypothesis formulation and
testing would help the manager to take the right decision.

What is a Statistical Hypothesis?


● A statistical hypothesis is a statement about a population parameter.
● In any hypothesis test, there are two contradictory statements:
○ Null hypothesis (H0)
○ Alternative hypothesis (H1)
● These two hypotheses cannot be true simultaneously; exactly one of them is true.
● The acceptance or rejection of a particular hypothesis leads to the
acceptance or rejection of a particular decision.

Null Hypothesis (H0)




Alternative Hypothesis (H1)



One – Sided Test


● A one-sided test is a statistical hypothesis test in which the values for which we can reject the null hypothesis (H0) are located entirely in one tail of the probability distribution.
● Ex:
○ H0: µ = 50 vs H1: µ > 50
○ We believe the mean is 50, and now we want to check whether the mean has increased.
○ H0: µ = 50 vs H1: µ < 50
○ We believe the mean is 50, and now we want to check whether the mean has decreased.

Two – Sided Test


● A two-sided test is a statistical hypothesis test in which the values for which we can reject the null hypothesis (H0) are located in both tails of the probability distribution.
○ Ex: H0: µ = 50 vs H1: µ ≠ 50
■ We believe the mean is 50, and now we want to check whether the mean has changed.

● The choice between a one-sided test and a two-sided test is determined by the purpose of the test.

Test Statistic
● A test statistic is a quantity calculated from our sample of data (data
summary or measure).
● Its value is used to decide whether or not the null hypothesis should
be rejected in our hypothesis test.
● The choice of a test statistic will depend on the assumed probability
model and the hypotheses under question.

Errors in Hypothesis Testing


                              Decision
Actual condition   Reject H0      Do not reject H0
H0 true            Type I error   No error
H0 false           No error       Type II error

Significance Level (α)


● The significance level of a statistical hypothesis test is the fixed probability of incorrectly rejecting the null hypothesis H0 when it is in fact true (Type I error).
● Denoted by α
● Typical values are 0.01, 0.05 or 0.10
● Selected by the researcher at the beginning
● Determines the critical value(s) of the test

Critical Value and Critical Region


● While testing a hypothesis, we find a set of values of the test statistic that tell us to reject H0.
● This set of values is called the critical region.
● The boundary of the critical region is the critical value.
● The critical value for any hypothesis test depends on the significance level at which the test is carried out, and on whether the test is one-sided or two-sided.

Types of Hypothesis Tests


Hypothesis Testing: test statistic and critical value for each case
Columns: Population mean (known standard deviation), Population mean (unknown standard deviation), Population proportion
Rows: Small sample, Large sample (n>30)

Hypothesis Testing Process




Example 1
● A traditional manufacturing process has produced millions of TV tubes with a mean life of 1200h and a standard deviation of 300h. The engineering department of the company introduced a new process. A sample of 100 tubes from the new process gives a sample mean of 1265h. Assuming the standard deviation of the new process is the same as that of the traditional process, test the following hypotheses at the 5% significance level.
1. The traditional method and the new method give the same mean life
2. The new method is better than the traditional method
3. The traditional method is better than the new method
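
One way to set up the three tests is sketched below, assuming the z statistic for a mean with known σ, z = (x̄ - µ0)/(σ/√n). The mapping of the three statements to a two-sided, a right-tailed and a left-tailed alternative is one reading of the question; the critical values come from `statistics.NormalDist` rather than tables.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 1200, 300, 100, 1265, 0.05

# Test statistic under H0: mu = 1200.
z = (x_bar - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
z_one_sided = std_normal.inv_cdf(1 - alpha)       # about 1.645

reject_two_sided = abs(z) > z_two_sided   # 1. H1: mu != 1200
reject_right = z > z_one_sided            # 2. H1: mu > 1200 (new method better)
reject_left = z < -z_one_sided            # 3. H1: mu < 1200 (traditional better)

print(f"z = {z:.3f}")
print("reject H0 (two-sided):", reject_two_sided)
print("reject H0 (right-tailed):", reject_right)
print("reject H0 (left-tailed):", reject_left)
```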

Example 2
● A personnel specialist of a major corporation is recruiting a large number of employees for an overseas assignment. During the testing process, the management asks the specialist for the mean of the scores, and the reply is 90. When the management reviews 20 of the compiled test results, it finds that the mean score is 84 and the standard deviation of the scores is 11. If the management wants to test the specialist’s view at the 1% significance level, what decision can be drawn?

Example 3
● A marketing manager of an enterprise is facing a decision whether to
introduce a new product into the market or not. Consumer acceptance
measured in a blind comparison test is agreed upon as an appropriate
basis for evaluation. Marketing of the new product will be pursued only if
the acceptance rate exceeds 30%. Otherwise, the new product will not be
introduced in the market. A random sample of 200 consumers reveals
that the acceptance rate is 32%. Using a level of significance of 0.01,
perform the hypothesis testing and recommend your action.
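
A sketch of this test follows, assuming the large-sample z statistic for a proportion, z = (p̂ - p0)/√(p0(1 - p0)/n), with H0: p = 0.30 against H1: p > 0.30. The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, p_hat, n, alpha = 0.30, 0.32, 200, 0.01

# The standard error uses the hypothesized proportion p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed, about 2.326

reject_h0 = z > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls well short of the critical value, so the sample does not support introducing the product at the 0.01 level.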

Testing the Difference Between Two Population Parameters
● The null hypothesis for the two-sample test is H0: µ1 = µ2
● That is, the two samples have been drawn from populations with the same mean.
● This null hypothesis is tested against one of the alternative hypotheses (µ1 ≠ µ2, µ1 > µ2 or µ1 < µ2), depending on the question posed.

Testing Two Population Means: Large Samples (n>30)

Example 4
● Two research laboratories have independently produced drugs that
provide relief to arthritis sufferers. The first drug was tested on a group
of 90 arthritis sufferers and produced an average of 8.5h of relief, and a
sample standard deviation of 1.8h. The second drug was tested on 80
arthritis sufferers, producing an average of 7.9h of relief, and a sample
standard deviation of 2.1h. At the 0.05 level of significance, does the
second drug provide a significantly shorter period of relief?
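
A sketch of this test follows, assuming the large-sample two-sample z statistic z = (x̄1 - x̄2)/√(s1²/n1 + s2²/n2), with H0: µ1 = µ2 against H1: µ1 > µ2 (the second drug gives shorter relief). The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, mean1, s1 = 90, 8.5, 1.8   # first drug
n2, mean2, s2 = 80, 7.9, 2.1   # second drug
alpha = 0.05

std_error = sqrt(s1**2 / n1 + s2**2 / n2)
z = (mean1 - mean2) / std_error
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed, about 1.645

reject_h0 = z > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```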

Chi Square Test



Introduction
● This technique is used for 2 purposes:
○ Comparing population proportions of more than 2 samples (Goodness of fit)
○ To determine the association between 2 nominal variables (Test of independence)

● Assumptions:
○ Samples are randomly and independently drawn.
○ The data must be in frequency form.
○ No category should have an expected frequency of less than 5.

● The hypothesis testing procedure is the same as before for chi-square tests.

Chi-Square Test – Goodness of Fit


● Hypothesis:
○ H0: Population proportions are equal
○ H1: Population proportions are not equal

● Test statistic: $\chi^2 = \sum_i \dfrac{(f_o - f_e)^2}{f_e}$
where f_o = observed frequency for the ith category
f_e = expected frequency for the ith category

● Critical value = chi-square value with k-1 degrees of freedom, where k is the number of categories
● When test statistic > critical value (chi-square table value), we reject H0.

Chi-Square Test – Goodness of Fit


Example: Assume that a marketer wishes to compare five different package designs. He is interested in knowing which is the most preferred one so that it can be introduced to the market. A random sample of 200 consumers gives the following results:

Package design   Preference by consumers
A 36
B 52
C 40
D 35
E 37
Total 200
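
A sketch of the goodness-of-fit calculation follows, assuming the statistic χ² = Σ (f_o - f_e)²/f_e with equal expected frequencies under H0. The example does not state a significance level, so 5% is assumed here, and the critical value 9.488 (4 degrees of freedom) is hard-coded from a chi-square table.

```python
observed = {"A": 36, "B": 52, "C": 40, "D": 35, "E": 37}

total = sum(observed.values())
k = len(observed)
expected = total / k   # equal preference under H0: 200 / 5 = 40

chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())
df = k - 1
critical_value = 9.488   # chi-square table, df = 4, alpha = 0.05 (assumed level)

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```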

Chi-Square Test of Independence


● Hypothesis:
○ H0: There’s no association between the 2 categorical variables
○ H1: There’s an association between the 2 categorical variables

● Test statistic: $\chi^2 = \sum_i \dfrac{(f_o - f_e)^2}{f_e}$
where f_o = observed frequency for the ith category (cell)
f_e = expected frequency for the ith category (cell)
● The expected frequencies are calculated using the marginal totals of rows and columns
● Contingency tables are used to carry out the test
● Critical value = chi-square value with (c-1)(r-1) degrees of freedom, where c is the number of columns and r is the number of rows
● When test statistic > critical value (chi-square table value), we reject H0.

Chi-Square Test of Independence


Example: In a market survey conducted to examine whether the choice of a brand is related to the income levels of the consumers, a random sample of 400 consumers reveals the following:

Income level (per month)   Brand 1   Brand 2   Total
<20000                     122       118       240
20000-50000                62        60        122
>50000                     16        22        38
Total                      200       200       400

Test the association between income level and brand preference at the 5% significance level.
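
A sketch of the test of independence follows. Each expected frequency is computed as (row total × column total) / grand total, and the statistic is Σ (f_o - f_e)²/f_e over all cells. The critical value 5.991 (2 degrees of freedom, 5% level) is hard-coded from a chi-square table; the names are illustrative.

```python
# Observed counts: rows are income levels, columns are Brand 1 and Brand 2.
observed = [
    [122, 118],   # < 20000
    [62, 60],     # 20000 - 50000
    [16, 22],     # > 50000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_square += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991   # chi-square table, df = 2, alpha = 0.05

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```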

Correlation Analysis

The Relationship between Two Variables


● Managers very often have to assess the nature and degree of the relationship between variables.
○ Ex: the relationship between advertising expenditure and sales volume. Normally you expect a positive relationship.

● The manager would like to know whether the money spent on advertising is justified in terms of the sales generated, or how much extra sales volume a 10% increase in advertising expenditure will produce.

● These types of questions can be answered by ‘Correlation’.

The Relationship between Two Variables (Ctd)


● The relationship between two variables can be viewed graphically by using:
○ Scatter Diagram (two continuous variables):
■ Plot of the ordered pairs of observations of the two continuous variables.

● The relationship can be identified by comparing the distribution of the numerical variable in terms of shape, center and spread.

Types of Relationships using Scatter Plots



Correlation
● Correlation describes the strength of a linear relationship.
● The strength of a linear relationship is an indication of how closely the
points in the scatter plot fit a straight line.
● The coefficient of correlation (r) measures the strength of a linear relationship. Its numerical value ranges from -1 to +1.
● If the points are scattered randomly on the plot with no linear fit, we say there is no linear relationship; r = 0.
● If the points lie exactly on a straight line, we say that there is a perfect linear relationship; r = 1 or r = -1.


Correlation
● Guidelines for classifying the strength of a linear relationship.
Correlation coefficient (r)    Strength of relationship
+1                             Perfect positive
Between 0.75 and 0.99          Strong positive
Between 0.5 and 0.74           Moderate positive
Between 0.25 and 0.49          Weak positive
Between -0.24 and 0.24         No relationship
Between -0.25 and -0.49        Weak negative
Between -0.5 and -0.74         Moderate negative
Between -0.75 and -0.99        Strong negative
-1                             Perfect negative

Correlation Coefficient

Correlation Coefficient - Example


Promotional Expenses (X)   Sales (Y)
7                          12
10                         14
9                          13
4                          5
11                         15
5                          7
3                          4
Draw the scatter plot and draw the line of best fit.
Calculate and interpret the correlation between promotional
expenses and sales.
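
The correlation can be computed as sketched below, assuming the Pearson coefficient r = S_xy / √(S_xx S_yy), where S_xy, S_xx and S_yy are sums of products of deviations from the means. The formula slide did not survive extraction, so this is the standard definition rather than a quotation from the course; the names are illustrative.

```python
from math import sqrt
from statistics import mean

x = [7, 10, 9, 4, 11, 5, 3]     # promotional expenses
y = [12, 14, 13, 5, 15, 7, 4]   # sales

x_bar, y_bar = mean(x), mean(y)

# Sums of squares and cross-products of deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

A value this close to +1 falls in the "strong positive" band of the guideline table.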

Regression Analysis

Regression Analysis
● The correlation coefficient gives us the degree of relationship between
two variables.
● It doesn’t estimate or predict one variable using the other variable.
○ Ex: predicting the sales volume using the advertising expenditure.
● Using regression analysis, it is possible to predict one variable using other
variables.
● For business planning and forecasting, regression is much more useful
than correlation.

Simple Linear Regression




Simple Linear Regression


● Using the simple linear regression line, the estimated values of the
dependent variable can be predicted.
● More often than not, the estimated values and the actual values of the dependent variable will differ; these differences are the errors or residuals.
● Therefore, the standard error associated with the obtained regression equation can be calculated as follows:
$s_e = \sqrt{\dfrac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}$

● When the standard error is small compared with the range of y values, this indicates a good model fit.

Coefficient of Determination



Assumptions of Regression Analysis

● Linearity of the relationship between the variables
● Independence of errors of the model
● Normality of errors of the model
● Equal variance of errors

Example
Promotional Expenses (X) Sales (Y)

7 12
10 14
9 13
4 5
11 15
5 7
3 4

Example Ctd.
1) Calculate the regression equation. (i.e. Calculate the slope and
intercept of the regression line)
2) Interpret the slope coefficient of the regression equation.
3) Using the regression equation, calculate the sales volume with
respect to promotional expense of 4.
4) Obtain the coefficient of determination (i.e. how much of the
variability of sales is predicted by promotional expenses) and
interpret the results.
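
The four parts can be worked as sketched below, assuming the usual least-squares estimates for the line Ŷ = a + bX: slope b = S_xy / S_xx and intercept a = ȳ - b x̄, with the coefficient of determination taken as r² for simple linear regression. The names are illustrative.

```python
from statistics import mean

x = [7, 10, 9, 4, 11, 5, 3]     # promotional expenses
y = [12, 14, 13, 5, 15, 7, 4]   # sales

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope = s_xy / s_xx                  # 1) change in sales per unit of expense
intercept = y_bar - slope * x_bar    # 1) predicted sales when expense is 0

predicted_at_4 = intercept + slope * 4   # 3) predicted sales for X = 4

r_squared = s_xy ** 2 / (s_xx * s_yy)    # 4) share of variation in Y explained

print(f"Y-hat = {intercept:.4f} + {slope:.4f} X")
print(f"prediction at X = 4: {predicted_at_4:.3f}")
print(f"R^2 = {r_squared:.4f}")
```

Under these assumptions the slope means that each extra unit of promotional expense is associated with about 1.43 extra units of sales, and roughly 96% of the variation in sales is explained by promotional expenses.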

Multiple Regression
● Multiple regression is a statistical technique used to predict the value of a
dependent variable that is influenced by two or more independent variables.
● Ex: The performance of an employee depends on age, gender, education level
etc.
● The multiple regression equation with k independent variables can be given as
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$
■ Where;
■ Ŷi = the predicted value from the model
■ b0 = the estimated intercept
■ b1, b2, …, bk = the estimated slope coefficients
■ X1i, X2i, …, Xki = values of the independent variables

Coefficient of Determination (R2)

● The coefficient of determination of multiple regression represents the proportion of the variation in Y that is explained by the set of explanatory variables used.
● Ex: If R² = 0.99, the interpretation is that 99% of the variation of the dependent variable is explained by the variation of the independent variables.

Testing Significance of the Model


● Test whether there is a significant relationship between the dependent
variable and the set of explanatory variables.
● If the significance value (p-value) of the F statistic is less than the significance level, the model as a whole is statistically significant.

Significance of Individual Explanatory Variables


● This test shows if there is a linear relationship between the variable Xj
and Y, holding constant the effects of other X variables
○ H0: βj = 0 (no linear relationship)
○ H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
○ (The procedure follows the usual hypothesis-testing steps)
● If the significance value (p-value) of a coefficient is less than the
significance level, reject H0: a linear relationship exists between that
particular Xj and Y.

Residual Analysis of The Model


● Errors (residuals) from the multiple regression model can be given as:
ei = (Yi – Ŷi)
● Assumptions:
○ The errors are normally distributed
○ Errors have a constant variance
○ The errors are independent
● The residual plot can be plotted as ‘Residuals vs. Predicted Yi’
● If the residual plots show a pattern, this may indicate the model fit is not
good.
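
A small sketch of how the points of a 'Residuals vs. Predicted Yi' plot could be prepared, using ei = Yi - Ŷi. The observed and predicted values are hypothetical, and the plotting step itself is left to whatever charting tool is available.

```python
def residuals(y, y_hat):
    """Residuals e_i = Y_i - Y-hat_i for observed and predicted values."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


# Hypothetical observed and predicted values, for illustration only.
observed = [3.0, 5.0, 7.0, 10.0]
predicted = [3.5, 5.0, 6.5, 10.0]

e = residuals(observed, predicted)
# Each point of the residual plot is a (predicted value, residual) pair.
plot_points = list(zip(predicted, e))
print(plot_points)
```

If the points scatter randomly around zero with a roughly constant spread, the assumptions look reasonable; a curve or a funnel shape suggests otherwise.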


Time Series Analysis


Introduction
● Any variable that is measured over time in sequential order is called
a time series.
● Analyzing a time series helps to detect patterns in data recorded at
regular time intervals.
● We use these patterns to predict the future values of the time series.
● Applications:
○ Assessing future demand for products and services
○ Sales growth
○ Cost trends

Components in Time Series
● Four components in a time series:
○ Trend (T): represents the long-term behavior of the time series, i.e. whether the data
reveal a steady upward or downward movement.

○ Cyclic Effect (C): represents business cycles that recur at irregular intervals
over several years (long term).

○ Seasonal Variation (S): represents variation caused by the season: a repetitive pattern
with a period of less than one year (short term).

○ Random Variation (R): represents irregular variation that occurs by chance, has no
assignable cause and cannot be predicted.

Time Series Models
● Two commonly used models:
○ Additive model: Produces the forecasts by adding the components
Y = T+C+S+R

○ Multiplicative model: Produces the forecasts by multiplying the
components
Y = T*C*S*R

● Decomposition method: Since we cannot easily extract or
predict the cyclic effect and random variation in a time series,
the model is simplified to Y = T*S. Here, we assume the trend
component will capture cycles during the forecast period.

● De-seasonalized time series: When the seasonal effect is
removed from a time series, it is called the de-seasonalized
series.
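
Under the simplified model Y = T*S, a forecast multiplies the trend value by the seasonal index, and de-seasonalizing divides each observation by the index of its season. The sketch below assumes this multiplicative form; the function names, the two-season indices and the series are hypothetical.

```python
def multiplicative_forecast(trend, seasonal_index):
    """Forecast under the simplified multiplicative model Y = T * S."""
    return trend * seasonal_index


def deseasonalize(series, seasonal_indices):
    """Divide each value by the index of its season (indices repeat cyclically)."""
    m = len(seasonal_indices)
    return [v / seasonal_indices[i % m] for i, v in enumerate(series)]


# Hypothetical two-season example: a high season (1.2) and a low season (0.8).
indices = [1.2, 0.8]
series = [60, 40, 72, 48]

print(multiplicative_forecast(50, 1.2))
print(deseasonalize(series, indices))
```

After dividing out the seasonal indices, only the underlying level of the series remains, which is what the trend line is later fitted to.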
Smoothing Techniques
● To present a better forecast, we need to extract the components in a
time series.
● For that, we need to remove the random variation present in the
time series.
● Smoothing techniques are used to remove the random variation
from a time series.
● Smoothing techniques:
○ Moving Averages (MA)

Moving Averages (MA) - Example
● A company is interested in forecasting demand for one of its
products. Find 3-month and 5-month moving averages for the data.

Month   Sales (100 units)   3-month moving average   5-month moving average
1       15
2       9                   13.33
3       16                  14.00                    13.6
4       17                  14.67                    14.6
5       11                  16.00
6       20
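
The table values can be checked with a short sketch. Each moving average is the mean of a fixed number of consecutive observations and is placed at the middle month of its window. The function name `moving_average` is illustrative.

```python
def moving_average(series, window):
    """Simple moving averages of `series` over `window` consecutive periods."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


sales = [15, 9, 16, 17, 11, 20]  # months 1 to 6, in 100 units

ma3 = moving_average(sales, 3)  # aligned with months 2, 3, 4, 5
ma5 = moving_average(sales, 5)  # aligned with months 3, 4

print([round(v, 2) for v in ma3])
print([round(v, 2) for v in ma5])
```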

Centered Moving Averages
● When a moving average covers an even number of observations, the
average falls midway between two periods.
● To align it with an actual period, it has to be centered by
calculating the average of two consecutive moving averages.
● Ex:

Month   Sales (100 units)   4-month moving average   Centered moving average
1       15
2       9
                            14.25
3       16                                           13.75
                            13.25
4       17                                           14.625
                            16.00
5       11
6       20
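
The centering step can be sketched in two stages that mirror the description: first the 4-month moving averages, then the average of each consecutive pair. The function names are illustrative.

```python
def moving_average(series, window):
    """Simple moving averages of `series` over `window` consecutive periods."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def centered_moving_average(series, window):
    """For an even window: average each pair of consecutive moving averages."""
    ma = moving_average(series, window)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


sales = [15, 9, 16, 17, 11, 20]  # months 1 to 6, in 100 units

ma4 = moving_average(sales, 4)            # falls between months 2-3, 3-4, 4-5
cma4 = centered_moving_average(sales, 4)  # aligned with months 3 and 4

print(ma4)
print(cma4)
```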
Trend Analysis
● Trend can be linear or non-linear.
● Linear trend lines are most commonly used, as many non-linear trends
can be mathematically transformed to a linear trend.
● In trend analysis, we fit a trend line using the de-seasonalized time
series data.
● The least squares method is used to fit the line.
● The equation used is y = b0 + b1t
○ y is the forecasted value
○ t is the independent variable (time)
○ b0 is the intercept (the y value when t=0)
○ b1 is the slope of the line

Seasonal Effect Analysis
● Seasonal variation occurs over short time periods (within
a year).
● Seasonal indices are constructed to measure the seasonal effect.
● Steps:
1) Calculate a series of suitable centered moving averages (CMA).
2) Calculate the ratio of the actual value to the CMA value for
each period in the time series that has a CMA entry.
3) Use the ratios calculated in step 2 to calculate the average seasonal
effect for each season.
4) Adjust the seasonal effects so that they add up to the number of
seasons (e.g. 4 for quarterly data).
5) De-seasonalize the original series by dividing each value by the
corresponding adjusted seasonal effect.
6) Estimate the trend line by fitting an appropriate regression model to
the de-seasonalized series.
7) Prepare the forecast as trend * adjusted seasonal effect.

Seasonal Effect Analysis - Example
● A company is interested in forecasting sales for one of its products.
Forecast sales for year 5 for the data using seasonal adjustments.

Sales of Widgets in '000s

Year   Q1   Q2   Q3   Q4
1      20   32   62   29
2      21   42   75   31
3      23   39   77   48
4      27   39   92   53
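
The seven steps of seasonal effect analysis can be sketched for this data set as follows. This is one possible implementation under stated assumptions: a multiplicative model, a 4-quarter centered moving average, seasonal effects averaged as simple means, and a linear trend with t = 1 for Q1 of year 1. The function names are illustrative.

```python
# Quarterly widget sales ('000s) for years 1-4, in time order (year 1 Q1 first).
sales = [20, 32, 62, 29, 21, 42, 75, 31, 23, 39, 77, 48, 27, 39, 92, 53]
SEASONS = 4


def centered_ma(series, period):
    """Step 1: centered moving averages for an even period, keyed by 0-based index.

    Half weight on the two end points is equivalent to averaging two
    consecutive period-length moving averages.
    """
    h = period // 2
    return {i: (0.5 * series[i - h] + sum(series[i - h + 1:i + h])
                + 0.5 * series[i + h]) / period
            for i in range(h, len(series) - h)}


def seasonal_indices(series, period):
    """Steps 2-4: ratio to CMA, average by season, rescale to sum to `period`."""
    cma = centered_ma(series, period)
    ratios = [[] for _ in range(period)]
    for i, c in cma.items():
        ratios[i % period].append(series[i] / c)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [value * scale for value in raw]


def fit_trend(y):
    """Step 6: least-squares line y = b0 + b1*t with t = 1, 2, ..., len(y)."""
    n = len(y)
    t = range(1, n + 1)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
          / sum((ti - t_bar) ** 2 for ti in t))
    return y_bar - b1 * t_bar, b1


indices = seasonal_indices(sales, SEASONS)
# Step 5: de-seasonalize by dividing each value by the index of its quarter.
deseasonalized = [v / indices[i % SEASONS] for i, v in enumerate(sales)]
b0, b1 = fit_trend(deseasonalized)
# Step 7: year 5 covers t = 17..20; forecast = trend * seasonal index.
forecast = [(b0 + b1 * t) * indices[(t - 1) % SEASONS] for t in range(17, 21)]

for quarter, value in enumerate(forecast, start=1):
    print(f"Year 5 Q{quarter}: {value:.1f}")
```

The adjusted indices sum to 4, with Q3 well above 1 and Q1 well below 1, which matches the strong third-quarter peak visible in the table.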
