Regression and Factor
Statistics and Advanced Data Analysis with
SPSS for Windows
4. Thursday, 2 September 2021, 18:30 – 21:30
➢ Discriminant Analysis
▪ Assumptions
▪ Examining conditions (assumptions)
▪ Example
Regression Analysis
• Analyzes the relationship among variables.
• Causal relationship.
Objectives
1. Study the pattern of the relationship between variables.
2. Estimate or forecast.
Kanlaya Vanichbuncha
Regression Analysis
Independent (X) → Dependent (Y)
• Income → Expense
• Advertising expense → Sales revenue
• Income → Satisfaction score
Type of Data
1. Cross-sectional data
2. Time-series data
3. Longitudinal data
Simple Linear Regression Analysis
The relationship between the dependent variable (Y) and the independent variable (X) is linear:

𝑌ᵢ = β₀ + β₁𝑋ᵢ + 𝑒ᵢ,  i = 1, 2, …, N
[Figure: 𝑌 = β₀ + β₁𝑋 + 𝑒 with values predicted by the regression line (Ŷ); one panel slopes upward (β₁ > 0), the other downward (β₁ < 0); β₀ marks the Y-intercept]

β₁ = slope of the line, i.e. the average change in Y for each one-unit change (increase or decrease) in X; β₁ is the regression coefficient.
β₀ = Y-intercept, the estimated value of Y when X = 0.
β₀ is expressed in the units of the dependent variable Y; β₁ in units of Y per unit of X.
Figure 1: Positive linear relationship. Figure 2: Negative linear relationship. [Plots omitted]
For Sample Data

𝑌ᵢ = β₀ + β₁𝑋ᵢ + 𝑒ᵢ …(1)
Ŷᵢ = 𝑏₀ + 𝑏₁𝑋ᵢ …(2)

𝑏₀ and 𝑏₁ are estimated by:
1. Ordinary Least Squares (OLS)
2. Maximum Likelihood Estimation (MLE)
Example 1
X = Advertising expense (Baht 100,000)
Y = Sales revenue (million Baht)
Regression equation
Example 2

X:  2   4   8   10  13  14
Y:  50  38  26  25  7   2

Step 1: Use a scatter plot to examine the relationship between X and Y. [Scatter plot omitted: Y falls roughly linearly as X increases]
Step 2: Fit Ŷᵢ = 𝑏₀ + 𝑏₁𝑋ᵢ. The error (residual) term is 𝑒ᵢ = 𝑌ᵢ − Ŷᵢ. Estimates: 𝑏₁ = -3.368, 𝑏₀ = 54.417.
The Coefficient of Determination: R²

R² = SSReg / SST,  0 ≤ R² ≤ 1 (or 0% ≤ R² ≤ 100%)

SSTotal = SST = Σᵢ₌₁ⁿ (𝑌ᵢ − Ȳ)²   SSReg = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²   SSE = Σᵢ₌₁ⁿ (𝑌ᵢ − Ŷᵢ)²
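The decomposition behind R² can be checked numerically. A minimal Python sketch; the x/y values below are made up for illustration, not taken from the slides:

```python
# OLS fit plus the variance decomposition SST = SSReg + SSE
# (x, y are a small made-up data set, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sum((a - xbar)**2 for a in x)
b0 = ybar - b1*xbar
yhat = [b0 + b1*a for a in x]
sst   = sum((b - ybar)**2 for b in y)
ssreg = sum((v - ybar)**2 for v in yhat)
sse   = sum((b - v)**2 for b, v in zip(y, yhat))
r2 = ssreg / sst
assert abs(sst - (ssreg + sse)) < 1e-9   # decomposition holds
assert abs(r2 - (1 - sse/sst)) < 1e-9    # both routes to R^2 agree
```

Both routes, SSReg/SST and 1 − SSE/SST, give the same R².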
The Coefficient of Correlation

Pearson correlation describes the strength of the linear relationship between two sets of quantitative variables.

r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]},  −1 ≤ r ≤ +1

r = +1 or r = −1 indicates perfect correlation.
r = +1: X and Y are perfectly related in a positive linear fashion.
r = −1: X and Y are perfectly related in a negative linear fashion.
r = 0: there is no linear relationship between the two variables.

Strength scale: −1 (perfect negative) ← strong ← moderate ← weak ← 0 → weak → moderate → strong → +1 (perfect positive)
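The computational formula for r can be sketched directly. A minimal Python check; `pearson_r` is our own helper name and the data are made up:

```python
import math

# Computational formula for Pearson's r (pearson_r is our own helper).
def pearson_r(x, y):
    n = len(x)
    num = n*sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)
    den = math.sqrt((n*sum(a*a for a in x) - sum(x)**2) *
                    (n*sum(b*b for b in y) - sum(y)**2))
    return num / den

x = [1, 2, 3, 4, 5]
assert abs(pearson_r(x, [2*v + 1 for v in x]) - 1.0) < 1e-9   # perfect positive
assert abs(pearson_r(x, [-3*v for v in x]) + 1.0) < 1e-9      # perfect negative
```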
1. Σ𝑒ᵢ = Σ(𝑌ᵢ − Ŷᵢ) = 0
Examining Conditions
1. e is normally distributed:
   - Chi-square test
   - Kolmogorov–Smirnov test (any sample size n)
   - Shapiro–Wilk test (n ≤ 50)
2. V(e) is constant (if V(e) is not constant, there is a heteroscedasticity problem):
   - Plot e against Ŷ or X
3. 𝑒ₜ and 𝑒ₜ₊₁ are independent:
   - Durbin–Watson test
Checking the condition that V(e) is constant
• Homoskedasticity: equal error variance — the residuals e scatter evenly around 0 when plotted against X or Ŷ.
[Residual plot of HAPPY against INCOME (0–100,000) omitted]
• Homoskedasticity: equal error variance.
• Heteroskedasticity: unequal error variance — at higher values of X, the error variance increases a lot.
[Residual plots of HAPPY against INCOME (0–100,000) omitted]
Example 3
Ŷ = 2.286 + 2.679𝑋

X   Y    Ŷ      Error
1   4    4.96   -.96
2   8    7.64   .36
3   9    10.32  -1.32
4   16   13.00  3.00
5   13   15.68  -2.68
6   24   18.36  5.64
7   17   21.04  -4.04

Σ𝑒ᵢ = 0
[Residual plot: e against X (0 to 8); the residuals scatter around 0]
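As a check, the closed-form OLS estimates reproduce the Example 3 equation. A Python sketch using the data from the table above:

```python
# Closed-form OLS fit on the Example 3 data; should reproduce
# Yhat = 2.286 + 2.679X and residuals that sum to zero.
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 8, 9, 16, 13, 24, 17]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sum((a - xbar)**2 for a in x)
b0 = ybar - b1*xbar
residuals = [b - (b0 + b1*a) for a, b in zip(x, y)]
assert round(b0, 3) == 2.286 and round(b1, 3) == 2.679
assert abs(sum(residuals)) < 1e-9   # OLS residuals sum to zero
```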
Ŷ = 1.906 + 0.397𝑋  (the 𝑌 column below is the transformed √Y of Example 3)

X   𝑌     Ŷ      Error (nonlinear)   Error (linear)
1   2.00  2.30   -.30                -.96
2   2.83  2.70   .13                 .36
3   3.00  3.10   -.10                -1.32
4   4.00  3.49   .51                 3.00
5   3.61  3.89   -.29                -2.68
6   4.90  4.29   .61                 5.64
7   4.12  4.68   -.56                -4.04

Σ𝑒ᵢ = 0 for both fits. [Residual plot of e against X omitted]
Checking condition: 𝑒ₜ and 𝑒ₜ₊₁ are independent

D–W = Σₜ₌₂ⁿ (𝑒ₜ − 𝑒ₜ₋₁)² / Σₜ₌₁ⁿ 𝑒ₜ²,  0 ≤ D–W ≤ 4
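The statistic is straightforward to compute. A minimal sketch; the residual series below are made up to show the two autocorrelation extremes:

```python
# Durbin-Watson statistic; the residual series below are made up to
# show the two autocorrelation extremes.
def durbin_watson(e):
    num = sum((e[t] - e[t-1])**2 for t in range(1, len(e)))
    return num / sum(v*v for v in e)

# Strong negative autocorrelation (alternating signs) pushes D-W toward 4.
assert durbin_watson([1, -1, 1, -1]) == 3.0
# Strong positive autocorrelation (identical residuals) pushes D-W toward 0.
assert durbin_watson([2, 2, 2, 2]) == 0.0
```

Values near 2 are consistent with no first-order autocorrelation in the residuals.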
Regression: Outliers
• Note: even if regression assumptions are met, slope estimates can have problems.
• Example: outliers — cases with extreme values that differ greatly from the rest of your sample; more formally, "influential cases".
• Outliers can result from:
  - Errors in coding or data entry
  - Highly unusual cases
  - Or, sometimes, important "real" variation
• Even a few outliers can dramatically change estimates of the slope, especially if N is small.
▫ Choose "influence" and "distance" statistics such as Cook's Distance, DFFIT, and the standardized residual.
▫ High values signal potential outliers.
▫ Note: this is less useful if you have a very large dataset, because you have to look at each case's value.
Regression: Outliers
• Outlier example: [Figure: an extreme case pulls the regression line up; with the extreme case removed from the sample, the regression line is much flatter]
Outlier Diagnostics
• Cook's D: identifies cases that are strongly influencing the regression line.
  ▫ SPSS calculates a value for each case: go to the "Save" menu and click on Cook's D.
• How large a Cook's D is a problem?
  ▫ Rule of thumb: values greater than 4 / (n − k − 1).
  ▫ Example: n = 7, k = 1: cut-off = 4/5 = .80.
  ▫ Cases with higher values should be examined.
• Example: outlier/influential-case statistics: Hours, Score, Residual, Std. Residual, Cook's D.
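For simple regression, Cook's D can also be computed by hand from residuals and leverages. A sketch reusing the Example 3 data; the slide's Hours/Score values are not reproduced here, so treat this purely as an illustration:

```python
# Cook's D for simple linear regression, computed from residuals and
# leverages. Data reuse Example 3 (X = 1..7) as an illustration.
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 8, 9, 16, 13, 24, 17]
n, k = len(x), 1
p = k + 1                                    # coefficients estimated (b0, b1)
xbar, ybar = sum(x)/n, sum(y)/n
sxx = sum((a - xbar)**2 for a in x)
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1*xbar
e = [b - (b0 + b1*a) for a, b in zip(x, y)]
mse = sum(v*v for v in e) / (n - p)
h = [1/n + (a - xbar)**2 / sxx for a in x]   # leverage of each case
cooks_d = [(e[i]**2 / (p*mse)) * (h[i] / (1 - h[i])**2) for i in range(n)]
cutoff = 4 / (n - k - 1)                     # the slide's rule of thumb
assert cutoff == 0.8
flagged = [i for i, d in enumerate(cooks_d) if d > cutoff]
```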
Multiple Linear Regression
One dependent variable and at least 2 independent variables (k independent variables are potentially related to the dependent variable).

𝑌 = β₀ + β₁𝑋₁ + β₂𝑋₂ + … + βₖ𝑋ₖ + 𝑒,  k ≥ 2

X₁ (Nominal), X₂ (Ordinal), X₃ (Interval), X₄ (Ratio) → Dependent Y (Interval/Ratio)
Example (research model): perceived self-efficacy; job characteristics (skill variety, task identity, task significance, job autonomy, feedback); and happiness at work → work behavior.
Example (research model): good-citizenship characteristics (conservative dimension); informational social support; hometown; belief in one's own internal locus of control; and participation in student affairs (sharing of benefits) → students' appropriate political expression behavior.
Estimate Y by Ŷ: for k = 3, Ŷ = 𝑏₀ + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + 𝑏₃𝑋₃ (in general Ŷ = 𝑏₀ + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + … + 𝑏ₖ𝑋ₖ).

Y = sales revenue (million Baht)
X₁ = advertising expense (Baht 100,000)
X₂ = selling price per unit (Baht)
X₃ = number of branches

If 𝑏₁ = 7: if we increase advertising expense by 1 unit (Baht 100,000), mean sales (Y) will increase by 7 million Baht, holding selling price per unit and number of branches fixed.
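The "holding the other predictors fixed" reading of b₁ = 7 can be illustrated directly. Only b₁ = 7 comes from the example; b₀, b₂, b₃ below are made-up values:

```python
# Illustration of "holding other predictors fixed": only b1 = 7 comes from
# the slide; b0, b2, b3 are made-up values for the sketch.
b0, b1, b2, b3 = 10.0, 7.0, -0.5, 2.0

def yhat(x1, x2, x3):
    # predicted sales (million Baht): x1 = advertising (Baht 100,000),
    # x2 = price per unit (Baht), x3 = number of branches
    return b0 + b1*x1 + b2*x2 + b3*x3

# One extra unit of advertising, with price and branches fixed,
# raises predicted sales by exactly b1 = 7 million Baht.
assert yhat(4, 30, 12) - yhat(3, 30, 12) == 7.0
```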
Testing Hypotheses

t = (𝑏ᵢ − 0) / SE(𝑏ᵢ), where for simple regression SE(𝑏₁) = √(MSE / Σ(𝑋ᵢ − X̄)²)
The Coefficient of Determination: R²
R² is the proportion of total variation in the dependent variable Y that is explained by the variation of the independent variables X.

SS total = SS regression + SS error
SST = SSReg + SSResidual

R² = SSReg/SST = 1 − SSE/SST,  0 ≤ R² ≤ 1 (or 0% ≤ R² ≤ 100%)

Note: as more independent variables (X) are added to the regression model, R² never decreases, even if the additional variables are not related to Y.
Multicollinearity
2. b is unstable.
3. SE(b) will be very large, so the t statistic becomes too small => accept H₀ even when X and Y are strongly related.
Detecting Multicollinearity
1. Compute the pairwise correlations between the X's; multicollinearity may be a serious problem if any pairwise correlation is bigger than 0.5.
2. R²(𝑋ᵢ) = coefficient of determination of the auxiliary regression
   𝑋ᵢ = a + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + … + 𝑏ᵢ₋₁𝑋ᵢ₋₁ + 𝑏ᵢ₊₁𝑋ᵢ₊₁ + … + 𝑏ₖ𝑋ₖ,  0 ≤ R² ≤ 1
   Tolerance(𝑋ᵢ) = 1 − R²(𝑋ᵢ),  0 ≤ Tolerance ≤ 1
3. VIF (Variance Inflation Factor): VIF(𝑋ᵢ) = 1 / Tolerance(𝑋ᵢ),  1 ≤ VIF
1. VIF(𝑋ᵢ) > 10 => multicollinearity may be influencing the least-squares estimates of the regression coefficients (𝑏ᵢ).
2. Mean VIF = (1/k) Σⱼ₌₁ᵏ VIF(𝑋ⱼ) > 10 => serious problem of multicollinearity.
3. Ridge regression (a remedy when multicollinearity is present).
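With just two predictors, the auxiliary regression's R² equals the squared pairwise correlation, so VIF = 1/(1 − r²). A sketch with a deliberately collinear pair; the data and the `pearson_r` helper are ours:

```python
import math

# With two predictors the auxiliary regression's R^2 is just r^2,
# so VIF = 1 / (1 - r^2). Data are made up and deliberately collinear.
def pearson_r(x, y):
    n = len(x)
    num = n*sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)
    den = math.sqrt((n*sum(a*a for a in x) - sum(x)**2) *
                    (n*sum(b*b for b in y) - sum(y)**2))
    return num / den

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 14.1, 15.9]   # almost exactly 2*x1
r = pearson_r(x1, x2)
tolerance = 1 - r**2
vif = 1 / tolerance
assert vif > 10   # flags serious multicollinearity
```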
Variables Selection
1. Enter / Remove
2. Backward Elimination
3. Forward Selection
4. Stepwise Regression
Example 1: Data file: diaper.sav
[Diagram: product attributes, from Price to Tape, as independent variables predicting Preference]
1.1 Enter method for 9 independent variables
- Click Statistics
- Click Plots
- Click Save
Descriptive Statistics
Mean Std. Deviation N
Preference 3.99 1.961 296
Size 4.80 1.048 296
price 4.75 1.005 296
Value 4.81 1.166 296
unisex 4.13 1.500 296
style 4.03 1.325 296
absorb 3.92 1.172 296
leakage 3.89 1.153 296
Comfort 3.83 1.203 296
tape 3.35 1.167 296
ANOVAᵃ
Model | Sum of Squares | df | Mean Square | F | Sig. (values omitted)
Coefficientsᵃ
Model | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig.
(Constant) -3.590 .307 -11.694 .000
Size .524 .113 .280 4.644 .000
price .124 .122 .063 1.009 .314
Value -.027 .074 -.016 -.369 .712
1 unisex .430 .089 .329 4.819 .000
style -.014 .097 -.010 -.148 .882
absorb .370 .149 .221 2.479 .014
leakage .271 .157 .159 1.731 .085
Comfort .074 .081 .045 .910 .363
tape .031 .066 .018 .467 .641
a. Dependent Variable: Preference
Model Summaryᵇ
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate (values omitted)
1.2 Enter for 3 independent variables
Size, unisex, absorb
ANOVAᵃ (values omitted)
Coefficientsᵃ
Model | Unstandardized Coefficients | Standardized Coefficients | t | Sig. | Collinearity Statistics (values omitted)
Model Summaryᵇ
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .866ᵃ | .750 | .747 | .986
2. Forward
Variables Entered/Removedᵃ (values omitted)
Coefficientsᵃ
Model | Unstandardized Coefficients | Standardized Coefficients | t | Sig. | Collinearity Statistics (values omitted)
3. Backward
Variables Entered/Removedᵃ
Model | Variables Entered | Variables Removed | Method (values omitted)
3. Backward
Coefficientsᵃ (Model 6)

Variable     B       Std. Error  Beta   t       Sig.   Tolerance  VIF
(Constant)   -3.484  .283               -12.3   .000
size         .614    .066        .328   9.311   .000   .682       1.467
unisex       .424    .048        .325   8.909   .000   .637       1.570
absorb       .380    .147        .227   2.580   .010   .109       9.190
leak         .332    .148        .195   2.242   .026   .112       8.964
a. Dependent Variable: Preference
Model Summaryᵍ (values omitted)
4. Stepwise
Variables Entered/Removedᵃ

Model  Variables Entered  Variables Removed  Method
1      absorb             .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      unisex             .                  Stepwise (same criteria).
3      size               .                  Stepwise (same criteria).
4      leak               .                  Stepwise (same criteria).
Coefficientsᵃ
Model | B | Std. Error | Beta | t | Sig. | Tolerance | VIF (values omitted)
X1, X2, …, X10
Example: variables and factors
• Quality of Servers: service of nurses, service of doctors, service of office employees
• Quality of Products: quality of drugs, quality of treatment, quality of equipment
Factor Analysis
➢ Data are used to help reveal or identify the structure of the factor
model.
Confirmatory Factor Analysis
[CFA path diagram: observed variables X1–X7 (with errors e1–e7) load on factors F1 and F2; observed variables Y1–Y7 (with errors e8–e14) load on factors F3 and F4; D1–D4 are disturbance terms]
Types of Exploratory Factor Analysis
1. Principal Component Analysis (PCA)
2. Common Factor Analysis

Exploratory Factor Analysis: PCA
[Diagram: Item 1 (Variable 1), Item 2 (Variable 2), Item 3 (Variable 3) as independent variables forming the Component (dependent variable)]
I. Principal Component Analysis (PCA)
(Measurement model)
3. Full solution: the number of components (factors) equals the number of measured variables.
4. No unique error.
Exploratory Factor Analysis (Principal Component Analysis)
[Diagram: variables loading on component 1 (factor 1) and component 2]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
Note: the factor structure is descriptive of what the variables have in common.
1. Total variance = common variance (shared) + unique variance
[Venn diagram: the overlap of I1, I2, I3 is their common variance; the non-overlapping part of I3 is the unique variance of I3]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
1. Factor = independent variable; indicators/measurements = dependent variables.
[Diagram: one latent factor (independent) with three indicators, each with its own error term e1, e2, e3]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
[Diagram: two latent (unobserved) factors, Factor 1 and Factor 2, each with three indicators; each indicator has an error/unique term e1–e6]
Steps for Exploratory Factor Analysis

KMO = (Σᵢ≠ⱼ rᵢⱼ²) / (Σᵢ≠ⱼ rᵢⱼ² + Σᵢ≠ⱼ aᵢⱼ²),  0 ≤ KMO ≤ 1

where rᵢⱼ are the pairwise correlations and aᵢⱼ are the partial correlations.
Steps for Factor Analysis
KMO | Recommendation
≥ 0.9 Marvelous
0.80 + Meritorious
0.70 + Middling
0.60 + Mediocre
0.50 + Miserable
Below 0.5 Unacceptable
2. Common Factor
◼ Principal Axis Factoring (PAF)
◼ Maximum Likelihood (MLE): assumes multivariate normality, but provides a goodness-of-fit evaluation.
➢ etc.
Fundamental Steps for EFA
Step 3: Consider the number of factors. Determine the appropriate number of factors by:
1. Eigenvalues
2. Scree plot of eigenvalues
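The eigenvalue criterion rests on the fact that, for standardized variables, the eigenvalues of the correlation matrix sum to the number of variables. A tiny two-variable sketch; r = 0.6 is an arbitrary illustrative value, and the eigenvalue > 1 cut-off is Kaiser's commonly used rule:

```python
# Two standardized variables with correlation r: the 2x2 correlation matrix
# [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r (closed form).
# r = 0.6 is an arbitrary illustrative value.
r = 0.6
eigenvalues = sorted([1 + r, 1 - r], reverse=True)
# Total variance equals the number of standardized variables.
assert abs(sum(eigenvalues) - 2) < 1e-12
# Kaiser's common rule keeps factors with eigenvalue > 1.
kept = [ev for ev in eigenvalues if ev > 1]
assert len(kept) == 1 and abs(kept[0] - 1.6) < 1e-9
```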
Step 6: Identify the original/observed variables for each factor by considering the factor loadings.
Step 7: Interpret the factors and evaluate the quality of the solution.
➢ Consider the meaningfulness and interpretability of the factors.
➢ Eliminate poorly defined factors.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .806
Approx. Chi-Square 2372.596
Bartlett's Test of Sphericity df 36
Sig. .000
Communalities
Initial Extraction
Total Variance Explained
Component | Initial Eigenvalues | Extraction Sums of Squared Loadings | Rotation Sums of Squared Loadings (values omitted)
Component Matrixa
Component
1 2 3
Size .761 .485 .217
price .751 .524 .227
Value .669 .487 .316
unisex .752 .148 -.596
style .726 .126 -.633
absorb .828 -.383 .085
leakage .820 -.439 .070
Comfort .757 -.454 .158
tape .636 -.423 .179
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Rotated Component Matrixa
Component
1 2 3
Rotation Method
I. Orthogonal Rotation
1. Varimax (Kaiser, 1958)
2. Quartimax
3. Equamax
Note: gives one Rotated Factor Matrix (structure coefficients).
II. Oblique Rotation
Direct Oblimin
Note: gives 2 different rotated factor matrices:
1. Pattern Matrix (pattern coefficients)
2. Structure Matrix
Orthogonal Rotation
[Plot: variables X1–X10 positioned in the unrotated F1–F2 factor space]
Factor Rotation
[Plot: the same variables X1–X10 after the F1–F2 axes are rotated orthogonally]
Factor Rotation
[Plot: after rotation, the F1 and F2 axes pass through the clusters of variables X1–X6]
II. Common Factor Analysis
(PAF: Principal Axis Factoring)
Communalities
Initial Extraction
Size .766 .820
price .784 .911
Value .558 .588
unisex .817 .911
style .803 .877
absorb .893 .843
leakage .900 .905
Comfort .655 .701
tape .453 .420
Extraction Method: Principal Axis Factoring.
Total Variance Explained
Factor Matrixᵃ

Variable   Factor 1  Factor 2  Factor 3
Size       .750      .454      .228
price      .753      .526      .259
Value      .627      .372      .237
unisex     .758      .145      -.562
style      .727      .122      -.578
absorb     .818      -.401     .112
leakage    .820      -.472     .102
Comfort    .724      -.399     .135
tape       .572      -.288     .101
Extraction Method: Principal Axis Factoring.
a. 3 factors extracted. 11 iterations required.

Pattern Matrixᵃ

Variable   Factor 1  Factor 2  Factor 3
Size       .012      .881      -.034
price      -.048     .972      -.011
Value      .037      .763      .029
unisex     .002      .027      -.939
style      .003      -.019     -.945
absorb     .892      .022      -.028
leakage    .962      -.052     -.027
Comfort    .850      .004      .028
tape       .640      .025      .009
Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 5 iterations.
Factor Correlation Matrix
Factor 1 2 3
1 1.000 .504 -.534
2 .504 1.000 -.536
3 -.534 -.536 1.000
Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
References
Kanlaya Vanichbuncha, การวิเคราะห์ข้อมูลหลายตัวแปร (Multivariate Data Analysis).