Regression, Correlation Analysis and Chi-Square Analysis
Introduction:
Regression analysis describes the linear relationship between two or more variables. One of them is
the dependent variable and the others are independent variables. For example, sales depend on
promotional expense. Using regression analysis, it is possible to predict sales for a given
promotional expense.
Regression Model:
Simple Linear Regression Model:
A linear relationship between two variables, one independent and the other
dependent, is called a simple linear regression model. For example, demand may be
modelled as a linear function of price.
Y = a + bX
Where
Y = dependent variable
X = independent variable
a = constant (intercept)
b = regression coefficient (slope)
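The constants of the simple model are usually estimated by the method of least squares. The following Python sketch shows one way to do this. The function name `fit_simple_regression` is illustrative, and the data are the promotional-expense and sales figures from the correlation example later in this handout, reused here on the assumption that sales is the dependent variable.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Promotional expenses (Rs lakhs) and sales (1000 units)
promotion = [7, 10, 9, 4, 11, 5, 3]
sales = [12, 14, 13, 5, 15, 7, 4]

a, b = fit_simple_regression(promotion, sales)
print(f"Y = {a:.3f} + {b:.3f} X")

# Predicted sales for a promotional expense of 8 (an arbitrary illustrative value)
predicted = a + b * 8
print(f"Predicted sales at X = 8: {predicted:.2f}")
```

The same two sums of products, XY and X squared, are the columns that the worked tables in this handout tabulate by hand.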
Multiple Linear Regression Model:
Y = b0 + b1X1 + b2X2 + b3X3
Where
Y = dependent variable
X1 = independent variable
X2 = independent variable
X3 = independent variable
b0 = constant
b1 = regression coefficient of X1
b2 = regression coefficient of X2
b3 = regression coefficient of X3
X     Y     XY     X²
40    85    3400   1600
25    95    2375   625
20    65    1300   400
(i). Find the regression line to predict increase in sales from advertisement expenditure.
Solution:
Y = a + bX
b = (n∑XY - ∑X∑Y) / (n∑X² - (∑X)²) = 3.22
a = Ȳ - bX̄ = 43.71, where Ȳ and X̄ are the means of Y and X
Y = 43.71 + 3.22X
2. Are good grades in college important for earning a good salary? A business statistics student has
taken a random sample of starting salaries and college grade point average for some recently
graduated friends of his. The data follow:
Starting salary ($ thousands): 36, 30, 30, 24, 27, 33, 21, 27
3. A landlord is interested in seeing whether his apartment rents are typical. Thus, he has taken a
random sample of 11 rents and apartment sizes of similar apartment complexes. The data
follow:
Rent: 230, 190, 450, 310, 218, 185, 340, 245, 125, 350, 280
4. Many small companies buy advertising without considering its effect. “Hamburger wars”
(substantial price rivalry with special “value meals”) have cut the profits of Ethiopian Burgers of
Santa Cruz, California, a small regional chain. The marketing manager is trying to make the case
that “you have to spend money to make money.” Spending on billboard advertisements, in the
manager’s opinion, has a direct result on sales. There are records for 7 months:
Monthly expenditure
Monthly sales
Outlet
Consumer          1814  1852  1012   1482   1578   1778   1748  1020   1058   840    1358   1744   1848   1214   904
Sales (lakh Rs)   22.4  22.1  13.68  18.42  18.84  20.16  18.9  13.46  14.48  12.24  15.26  18.86  18.92  15.28  13.84
33 3 125
61 6 115
70 10 140
82 13 130
17 9 145
24 6 140
(a) Calculate the least squares equation to predict sales from advertising and price.
(b) If advertising is 7 and price is $132, what sales would you predict?
Y X1 X2
3.0 80 2700
(a) Calculate the least squares equation that best describes the data.
(b) What percentage change in GNP would be expected in a year in which the federal
deficit was $240 billion and the mean Dow Jones value was 3000?
Km per litre             14   12   15   16    12   16   13   14   13   11
Average speed (km)       50   40   45   55    35   60   55   55   40   30
Total weight ('000 kg)   2    3    2    1.5   3    1    5    2    2    3
Fit a multiple regression model using SPSS, taking fuel consumption (km per litre) as the dependent
variable and average speed and total weight as the independent variables.
Money spent ($ thousand)   Percentage of dangerous pollutants
8.4                        35.9
10.2                       31.8
16.5                       24.7
21.7                       25.2
9.4                        36.8
8.3                        35.8
11.5                       33.4
18.4                       25.4
16.7                       31.4
19.3                       27.4
28.4                       15.8
4.7                        31.5
12.3                       28.9
Variable      No. of variables   Nature
Dependent     1                  Quantitative
Independent   1                  Quantitative
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .879   .773       .752                2.919
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   319.098          1    319.098       37.448   .000
  Residual     93.733           11   8.521
  Total        412.831          12
a Predictors: (Constant), money spent ($ thousand)
b Dependent Variable: percentage of dangerous pollutants
P-value < 0.05
Coefficients
Model                      Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                           B        Std. Error           Beta
money spent ($ thousand)   -.782    .128                 -.879                       -6.119   .000
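The figures in the SPSS tables above can be checked by hand from the 13 data pairs. The sketch below is one way to do that in Python. It assumes the first data column is money spent and the second is the percentage of dangerous pollutants; the function name `regression_anova` and the dictionary keys are illustrative, not SPSS names.

```python
import math

money = [8.4, 10.2, 16.5, 21.7, 9.4, 8.3, 11.5, 18.4, 16.7, 19.3, 28.4, 4.7, 12.3]
pollutants = [35.9, 31.8, 24.7, 25.2, 36.8, 35.8, 33.4, 25.4, 31.4, 27.4, 15.8, 31.5, 28.9]


def regression_anova(x, y):
    """Simple linear regression with the sums of squares behind the ANOVA table."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx                  # slope (B)
    a = y_bar - b * x_bar            # constant
    ss_regression = b * s_xy         # explained variation
    ss_residual = ss_total - ss_regression
    ms_residual = ss_residual / (n - 2)
    return {
        "a": a,
        "b": b,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "r_square": ss_regression / ss_total,
        "f": ss_regression / ms_residual,         # regression df = 1
        "std_error_estimate": math.sqrt(ms_residual),
        "t_slope": b / math.sqrt(ms_residual / s_xx),
    }


result = regression_anova(money, pollutants)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The printed slope, R Square, F and t values should agree with the SPSS output up to rounding.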
Sales Forecasting
To measure the effect of advertising and prices, the following data were collected from a consumer
marketing company for the last 10 months. Figures in the following table are in $1000. Fit a regression
model with sales as the dependent variable and the other variables as independent variables, and also find
the correlation coefficient.
Month   Sales   Advertising   Price
1       300     45            2400
2       350     50            2200
3       400     55            2100
4       500     85            2000
5       400     65            2150
6       450     60            2000
7       420     57            2150
8       550     68            1950
9       450     60            2050
10      500     70            2000
Situation   Variable      No. of variable(s)   Nature         Technique
2           Dependent     1                    Quantitative   Multiple regression
            Independent   >1                   Quantitative
Y: Sales
X1: Advertising
X2: Prices
Model
Y = a + b1X1 + b2X2
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .953   .908       .881                25.71
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   45533.520        2    22766.760     34.447   .000
  Residual     4626.480         7    660.926
  Total        50160.000        9
a Predictors: (Constant), price in Rs, advertisement expense rs in lacs
b Dependent Variable: sales in units
P-value (Sig. = .000) < 0.05 indicates that the model is significant for predicting the
dependent variable.
Coefficients
Model   Unstandardized Coefficients (B, Std. Error)   Standardized Coefficients (Beta)   t   Sig.
B: rate of change in Y per unit increase of X, keeping the others constant.
Sig.: indicates whether the variable is significant or insignificant.
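The two-predictor model Y = a + b1X1 + b2X2 can also be fitted without SPSS by solving the least-squares normal equations. The Python sketch below does this for the 10-month data. It assumes the data columns are, in order, month, sales, advertising and price; the function name `fit_multiple_regression` is illustrative.

```python
def fit_multiple_regression(predictors, y):
    """Least-squares fit of y on several predictors.

    predictors is a list of columns, one per X variable.
    Returns (intercept, slopes, r_square).
    """
    n, k = len(y), len(predictors)
    y_bar = sum(y) / n
    x_bars = [sum(col) / n for col in predictors]
    # Work with deviations from the means to keep the arithmetic well scaled
    dx = [[v - m for v in col] for col, m in zip(predictors, x_bars)]
    dy = [v - y_bar for v in y]
    # Augmented matrix of the normal equations in deviation form
    aug = [
        [sum(p * q for p, q in zip(dx[i], dx[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(dx[i], dy))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    slopes = [aug[i][k] for i in range(k)]
    intercept = y_bar - sum(b * m for b, m in zip(slopes, x_bars))
    ss_total = sum(d * d for d in dy)
    fitted_dev = [sum(slopes[j] * dx[j][i] for j in range(k)) for i in range(n)]
    ss_regression = sum(f * f for f in fitted_dev)
    return intercept, slopes, ss_regression / ss_total


sales = [300, 350, 400, 500, 400, 450, 420, 550, 450, 500]
advertising = [45, 50, 55, 85, 65, 60, 57, 68, 60, 70]
price = [2400, 2200, 2100, 2000, 2150, 2000, 2150, 1950, 2050, 2000]

a, (b1, b2), r_square = fit_multiple_regression([advertising, price], sales)
print(f"Y = {a:.2f} + {b1:.4f} X1 + {b2:.4f} X2, R Square = {r_square:.3f}")
```

With this column assignment the R Square agrees with the .908 in the Model Summary, which supports reading the second column as sales.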
Correlation
The interdependence of two variables X and Y is called correlation. In other words, two variables
are said to be correlated if they tend to vary together in some direction. If both variables tend
to increase (or decrease) together, the correlation is said to be direct or positive; if one variable tends to
increase as the other decreases, the correlation is said to be negative or inverse.
Properties of correlation
(i) rxy = ryx
(ii) -1 ≤ r ≤ +1
Calculate the product moment coefficient of correlation between X and Y from the following
data:
X      Y      X²     Y²     XY
1      2      1      4      2
2      5      4      25     10
3      3      9      9      9
4      8      16     64     32
5      7      25     49     35
∑ 15   25     55     151    88
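r = (n∑XY - ∑X∑Y) / √[(n∑X² - (∑X)²)(n∑Y² - (∑Y)²)]
  = (5×88 - 15×25) / √[(5×55 - 15²)(5×151 - 25²)] = 65 / √6500 ≈ 0.8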
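The product moment calculation above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Product moment coefficient of correlation between x and y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
r = pearson_r(x, y)
print(round(r, 2))  # 0.81, which the worked answer rounds to 0.8
```

The same function can be applied to the other correlation examples in this section by replacing the two lists.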
Example:
Calculate the coefficient of correlation of the following data
3 25
4 24
5 20
6 20
7 19
8 17
9 16
10 13
11 10
12 6
r = -0.976
Example
The following data refer to two variables promotional expenses (Rs lakhs) and sales (1000 units)
collected in the context of a promotional study. Calculate the correlation coefficient and comment.
Promotional Expenses   Sales
7 12
10 14
9 13
4 5
11 15
5 7
3 4
r= 0.9787
Comments The promotional expense is strongly associated with sales and the
Example
The following data were computed from personnel records of a manufacturing firm.
∑XY = 482788.
Example
Find the coefficient of correlation between persons employed and cloth manufactured in a
textile mill.
Persons employed   Cloth manufactured (‘000 yds)
137 23
209 47
113 22
189 40
176 39
200 51
219 49
r = (∑XY - n X̄ Ȳ) / (n Sx Sy)
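The standard-deviation form of the formula can be applied to the textile mill data as in the Python sketch below. It assumes that Sx and Sy are standard deviations computed with divisor n, and that the first data column is persons employed; the function name `correlation_from_sd` is illustrative.

```python
import math

persons = [137, 209, 113, 189, 176, 200, 219]
cloth = [23, 47, 22, 40, 39, 51, 49]  # '000 yds


def correlation_from_sd(x, y):
    """r = (SumXY - n * mean(X) * mean(Y)) / (n * Sx * Sy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Standard deviations with divisor n (assumed convention for Sx and Sy)
    s_x = math.sqrt(sum(xi * xi for xi in x) / n - x_bar ** 2)
    s_y = math.sqrt(sum(yi * yi for yi in y) / n - y_bar ** 2)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - n * x_bar * y_bar) / (n * s_x * s_y)


r = correlation_from_sd(persons, cloth)
print(f"r = {r:.3f}")
```

If the standard deviations were computed with divisor n - 1 instead, the denominator would need n - 1 in place of n to give the same coefficient.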
Example
S2x = 4, S2y = 9,
(ii) Given rxy = 0.8 and bxy = 0.45, what would be the value of byx ?
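Part (ii) rests on the standard identity that the squared correlation coefficient equals the product of the two regression coefficients, r² = bxy × byx. A short Python check of the arithmetic, with illustrative variable names, is:

```python
r_xy = 0.8    # given correlation coefficient
b_xy = 0.45   # given regression coefficient of X on Y

# r^2 = b_xy * b_yx, so b_yx = r^2 / b_xy
b_yx = r_xy ** 2 / b_xy
print(f"byx = {b_yx:.3f}")
```

Both regression coefficients carry the same sign as r, so the positive value is consistent with rxy = 0.8.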
Example
Calculate the correlation coefficient for the following data on supply and demand
Supply Demand
400 50
200 60
700 20
100 70
500 40
300 30
600 10
Correlation by SPSS
Example
The following figures are index numbers of average weekly earnings and prices in a country.
Chi-Square Analysis
Chi-square analysis is widely used in research studies for testing hypotheses involving nominal
data. Nominal data are also known as categorical data or attribute data. The symbol χ²
denotes the chi-square statistic, whose distribution depends on the number of degrees
of freedom (d.f.). A chi-square distribution is skewed, particularly with smaller d.f. As the
d.f. increase, the χ² distribution becomes more symmetrical,
approaching normality.
Conditions for applying the chi-square test:
(i) The sample observations drawn from a population must be independent and random.
(ii) The data must be in frequency (count) form. If the original data are in percentages, they must
be converted into frequencies.
(iii) No expected frequency in any cell/category should be less than 5. If the expected frequency is less
than 5 for a category, regroup (combine) categories.
Example
An educator believes that the grades high school students make depend on the amount
of time they spend listening to music. To test this theory, he has given a questionnaire to 400 randomly
selected students. The questionnaire includes two questions: How many hours per week do you listen
to music? What is the average grade for all your classes? The data from the survey are in the following
table. Using a 5 percent significance level, test whether grades and time spent listening to music are
independent or dependent.
Hours spent listening          Average Grade
to music                 A     B     C     D     F     Total
< 5 hrs                  13    10    11    16    5     55
5-10 hrs                 20    27    27    16    5     95
11-20 hrs                9     27    71    16    32    155
> 20 hrs                 8     11    41    24    11    95
Crosstabs
hours * average grade Crosstabulation
Count
                 average grade
                 A     B     C     D     F     Total
hours   <5       13    10    11    16    5     55
        5-10     20    27    27    19    2     95
        11-20    9     27    71    16    32    155
        > 20     8     11    41    24    11    95
Chi-Square Tests
a. 0 cells (.0%) have expected count less than 5. The minimum expected
count is 6.88.
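The test of independence behind this output can be sketched in Python as follows. The counts are taken from the SPSS crosstabulation above; the function name `chi_square_independence` is illustrative, and the 5 percent critical value for 12 degrees of freedom (21.026) is read from a standard chi-square table rather than computed.

```python
observed = [
    [13, 10, 11, 16, 5],    # < 5 hrs
    [20, 27, 27, 19, 2],    # 5-10 hrs
    [9, 27, 71, 16, 32],    # 11-20 hrs
    [8, 11, 41, 24, 11],    # > 20 hrs
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)
min_expected = min(min(row) for row in expected)

CRITICAL_VALUE = 21.026  # chi-square table, 12 d.f., 5 percent level
reject_independence = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print(f"minimum expected count = {min_expected:.2f}")
print("Grades and hours are dependent" if reject_independence
      else "No evidence against independence")
```

The minimum expected count printed here corresponds to the 6.88 reported in the SPSS footnote, so the condition on expected frequencies is met.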
Example
A newspaper publisher, trying to pinpoint his market’s characteristics, wondered whether
newspaper readership in the community is related to readers’ educational achievement. A survey
questioned adults in the area on their level of education and their frequency of readership. The results
are shown in the following table.
Never                10    17    11    21    59
Sometimes            12    23    8     5     48
Morning or evening   35    38    16    7     96
Both editions        28    19    6     13    66
Total                85    97    41    46    269
At the 0.05 significance level, does the frequency of newspaper readership in the community differ
according to the readers’ level of education?
Example
An advertising firm is trying to determine the demographics for a new product. They have
randomly selected 75 people in each of 5 different age groups and introduced the product to them. The
results of the survey are given below.
Purchase frequently   12    18    17    22    32
Seldom purchase       18    25    29    24    30
Never purchase        45    32    29    29    13