Regression and Factor
Statistics and Advanced Data Analysis with
SPSS for Windows
4. Thursday, 2 September 2021, 18:30 – 21:30
➢ Discriminant Analysis
▪ Assumptions
▪ Examining conditions (assumptions)
▪ Example
Regression Analysis
• Analyzes the relationship among variables.
• Causal relationship.
Objectives
1. Study the pattern of the relationship between variables.
2. Estimate or forecast.
Kanlaya Vanichbuncha
Regression Analysis
Independent (X) → Dependent (Y)
• Income → Expense
• Advertising expense → Sales revenue
• Income → Satisfaction score
Type of Data
1. Cross-sectional data
2. Time-series data
3. Longitudinal data
Simple Linear Regression Analysis
The relationship between the dependent variable (Y) and the independent variable (X) is linear:

𝑌ᵢ = β₀ + β₁𝑋ᵢ + 𝑒ᵢ,  i = 1, 2, …, N
[Figure: 𝑌 = β₀ + β₁𝑋 + 𝑒 with values predicted by the regression line (Ŷ); one panel slopes upward (β₁ > 0), the other downward (β₁ < 0); β₀ marks the Y-intercept]

β₁ = slope of the line, i.e. the average change in Y for each one-unit change (increase or decrease) in X; β₁ is the regression coefficient.
β₀ = Y-intercept, the estimated value of Y when X = 0.
β₀ is expressed in the units of the dependent variable Y; β₁ in units of Y per unit of X.
Figure 1: Positive linear relationship. Figure 2: Negative linear relationship. [Plots omitted]
For Sample Data

𝑌ᵢ = β₀ + β₁𝑋ᵢ + 𝑒ᵢ …(1)
Ŷᵢ = 𝑏₀ + 𝑏₁𝑋ᵢ …(2)

𝑏₀ and 𝑏₁ are estimated by:
1. Ordinary Least Squares (OLS)
2. Maximum Likelihood Estimation (MLE)
Example 1
X = Advertising expense (Baht 100,000)
Y = Sales revenue (million Baht)
Regression equation
Example 2

X:  2   4   8   10  13  14
Y:  50  38  26  25  7   2

Step 1: Use a scatter plot to examine the relationship between X and Y. [Scatter plot omitted: Y falls roughly linearly as X increases]
Step 2: Fit Ŷᵢ = 𝑏₀ + 𝑏₁𝑋ᵢ. The error (residual) term is 𝑒ᵢ = 𝑌ᵢ − Ŷᵢ. Estimates: 𝑏₁ = -3.368, 𝑏₀ = 54.417.
The Coefficient of Determination: R²

R² = SSReg / SST,  0 ≤ R² ≤ 1 (or 0% ≤ R² ≤ 100%)

SSTotal = SST = Σᵢ₌₁ⁿ (𝑌ᵢ − Ȳ)²   SSReg = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²   SSE = Σᵢ₌₁ⁿ (𝑌ᵢ − Ŷᵢ)²
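The decomposition behind R² can be checked numerically. A minimal Python sketch; the x/y values below are made up for illustration, not taken from the slides:

```python
# OLS fit plus the variance decomposition SST = SSReg + SSE
# (x, y are a small made-up data set, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sum((a - xbar)**2 for a in x)
b0 = ybar - b1*xbar
yhat = [b0 + b1*a for a in x]
sst   = sum((b - ybar)**2 for b in y)
ssreg = sum((v - ybar)**2 for v in yhat)
sse   = sum((b - v)**2 for b, v in zip(y, yhat))
r2 = ssreg / sst
assert abs(sst - (ssreg + sse)) < 1e-9   # decomposition holds
assert abs(r2 - (1 - sse/sst)) < 1e-9    # both routes to R^2 agree
```

Both routes, SSReg/SST and 1 − SSE/SST, give the same R².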
The Coefficient of Correlation

Pearson correlation describes the strength of the linear relationship between two sets of quantitative variables.

r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]},  −1 ≤ r ≤ +1

r = +1 or r = −1 indicates perfect correlation.
r = +1: X and Y are perfectly related in a positive linear fashion.
r = −1: X and Y are perfectly related in a negative linear fashion.
r = 0: there is no linear relationship between the two variables.

Strength scale: −1 (perfect negative) ← strong ← moderate ← weak ← 0 → weak → moderate → strong → +1 (perfect positive)
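The computational formula for r can be sketched directly. A minimal Python check; `pearson_r` is our own helper name and the data are made up:

```python
import math

# Computational formula for Pearson's r (pearson_r is our own helper).
def pearson_r(x, y):
    n = len(x)
    num = n*sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)
    den = math.sqrt((n*sum(a*a for a in x) - sum(x)**2) *
                    (n*sum(b*b for b in y) - sum(y)**2))
    return num / den

x = [1, 2, 3, 4, 5]
assert abs(pearson_r(x, [2*v + 1 for v in x]) - 1.0) < 1e-9   # perfect positive
assert abs(pearson_r(x, [-3*v for v in x]) + 1.0) < 1e-9      # perfect negative
```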
1. Σ𝑒ᵢ = Σ(𝑌ᵢ − Ŷᵢ) = 0
Examining Conditions
1. e is normally distributed:
   - Chi-square test
   - Kolmogorov–Smirnov test (any sample size n)
   - Shapiro–Wilk test (n ≤ 50)
2. V(e) is constant (if V(e) is not constant, there is a heteroscedasticity problem):
   - Plot e against Ŷ or X
3. 𝑒ₜ and 𝑒ₜ₊₁ are independent:
   - Durbin–Watson test
Checking the condition that V(e) is constant
• Homoskedasticity: equal error variance — the residuals e scatter evenly around 0 when plotted against X or Ŷ.
[Residual plot of HAPPY against INCOME (0–100,000) omitted]
• Homoskedasticity: equal error variance.
• Heteroskedasticity: unequal error variance — at higher values of X, the error variance increases a lot.
[Residual plots of HAPPY against INCOME (0–100,000) omitted]
Example 3
Ŷ = 2.286 + 2.679𝑋

X   Y    Ŷ      Error
1   4    4.96   -.96
2   8    7.64   .36
3   9    10.32  -1.32
4   16   13.00  3.00
5   13   15.68  -2.68
6   24   18.36  5.64
7   17   21.04  -4.04

Σ𝑒ᵢ = 0
[Residual plot: e against X (0 to 8); the residuals scatter around 0]
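As a check, the closed-form OLS estimates reproduce the Example 3 equation. A Python sketch using the data from the table above:

```python
# Closed-form OLS fit on the Example 3 data; should reproduce
# Yhat = 2.286 + 2.679X and residuals that sum to zero.
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 8, 9, 16, 13, 24, 17]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sum((a - xbar)**2 for a in x)
b0 = ybar - b1*xbar
residuals = [b - (b0 + b1*a) for a, b in zip(x, y)]
assert round(b0, 3) == 2.286 and round(b1, 3) == 2.679
assert abs(sum(residuals)) < 1e-9   # OLS residuals sum to zero
```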
Ŷ = 1.906 + 0.397𝑋  (the 𝑌 column below is the transformed √Y of Example 3)

X   𝑌     Ŷ      Error (nonlinear)   Error (linear)
1   2.00  2.30   -.30                -.96
2   2.83  2.70   .13                 .36
3   3.00  3.10   -.10                -1.32
4   4.00  3.49   .51                 3.00
5   3.61  3.89   -.29                -2.68
6   4.90  4.29   .61                 5.64
7   4.12  4.68   -.56                -4.04

Σ𝑒ᵢ = 0 for both fits. [Residual plot of e against X omitted]
Checking condition: 𝑒ₜ and 𝑒ₜ₊₁ are independent

D–W = Σₜ₌₂ⁿ (𝑒ₜ − 𝑒ₜ₋₁)² / Σₜ₌₁ⁿ 𝑒ₜ²,  0 ≤ D–W ≤ 4
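The statistic is straightforward to compute. A minimal sketch; the residual series below are made up to show the two autocorrelation extremes:

```python
# Durbin-Watson statistic; the residual series below are made up to
# show the two autocorrelation extremes.
def durbin_watson(e):
    num = sum((e[t] - e[t-1])**2 for t in range(1, len(e)))
    return num / sum(v*v for v in e)

# Strong negative autocorrelation (alternating signs) pushes D-W toward 4.
assert durbin_watson([1, -1, 1, -1]) == 3.0
# Strong positive autocorrelation (identical residuals) pushes D-W toward 0.
assert durbin_watson([2, 2, 2, 2]) == 0.0
```

Values near 2 are consistent with no first-order autocorrelation in the residuals.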
Regression: Outliers
• Note: even if regression assumptions are met, slope estimates can have problems.
• Example: outliers — cases with extreme values that differ greatly from the rest of your sample; more formally, "influential cases".
• Outliers can result from:
  - Errors in coding or data entry
  - Highly unusual cases
  - Or, sometimes, important "real" variation
• Even a few outliers can dramatically change estimates of the slope, especially if N is small.
▫ Choose "influence" and "distance" statistics such as Cook's Distance, DFFIT, and the standardized residual.
▫ High values signal potential outliers.
▫ Note: this is less useful if you have a very large dataset, because you have to look at each case's value.
Regression: Outliers
• Outlier example: [Figure: an extreme case pulls the regression line up; with the extreme case removed from the sample, the regression line is much flatter]
Outlier Diagnostics
• Cook's D: identifies cases that are strongly influencing the regression line.
  ▫ SPSS calculates a value for each case: go to the "Save" menu and click on Cook's D.
• How large a Cook's D is a problem?
  ▫ Rule of thumb: values greater than 4 / (n − k − 1).
  ▫ Example: n = 7, k = 1: cut-off = 4/5 = .80.
  ▫ Cases with higher values should be examined.
• Example: outlier/influential-case statistics: Hours, Score, Residual, Std. Residual, Cook's D.
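For simple regression, Cook's D can also be computed by hand from residuals and leverages. A sketch reusing the Example 3 data; the slide's Hours/Score values are not reproduced here, so treat this purely as an illustration:

```python
# Cook's D for simple linear regression, computed from residuals and
# leverages. Data reuse Example 3 (X = 1..7) as an illustration.
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 8, 9, 16, 13, 24, 17]
n, k = len(x), 1
p = k + 1                                    # coefficients estimated (b0, b1)
xbar, ybar = sum(x)/n, sum(y)/n
sxx = sum((a - xbar)**2 for a in x)
b1 = sum((a - xbar)*(b - ybar) for a, b in zip(x, y)) / sxx
b0 = ybar - b1*xbar
e = [b - (b0 + b1*a) for a, b in zip(x, y)]
mse = sum(v*v for v in e) / (n - p)
h = [1/n + (a - xbar)**2 / sxx for a in x]   # leverage of each case
cooks_d = [(e[i]**2 / (p*mse)) * (h[i] / (1 - h[i])**2) for i in range(n)]
cutoff = 4 / (n - k - 1)                     # the slide's rule of thumb
assert cutoff == 0.8
flagged = [i for i, d in enumerate(cooks_d) if d > cutoff]
```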
Multiple Linear Regression
One dependent variable and at least 2 independent variables (k independent variables are potentially related to the dependent variable).

𝑌 = β₀ + β₁𝑋₁ + β₂𝑋₂ + … + βₖ𝑋ₖ + 𝑒,  k ≥ 2

X₁ (Nominal), X₂ (Ordinal), X₃ (Interval), X₄ (Ratio) → Dependent Y (Interval/Ratio)
Example (research model): perceived self-efficacy; job characteristics (skill variety, task identity, task significance, job autonomy, feedback); and happiness at work → work behavior.
Example (research model): good-citizenship characteristics (conservative dimension); informational social support; hometown; belief in one's own internal locus of control; and participation in student affairs (sharing of benefits) → students' appropriate political expression behavior.
Estimate Y by Ŷ: for k = 3, Ŷ = 𝑏₀ + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + 𝑏₃𝑋₃ (in general Ŷ = 𝑏₀ + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + … + 𝑏ₖ𝑋ₖ).

Y = sales revenue (million Baht)
X₁ = advertising expense (Baht 100,000)
X₂ = selling price per unit (Baht)
X₃ = number of branches

If 𝑏₁ = 7: if we increase advertising expense by 1 unit (Baht 100,000), mean sales (Y) will increase by 7 million Baht, holding selling price per unit and number of branches fixed.
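The "holding the other predictors fixed" reading of b₁ = 7 can be illustrated directly. Only b₁ = 7 comes from the example; b₀, b₂, b₃ below are made-up values:

```python
# Illustration of "holding other predictors fixed": only b1 = 7 comes from
# the slide; b0, b2, b3 are made-up values for the sketch.
b0, b1, b2, b3 = 10.0, 7.0, -0.5, 2.0

def yhat(x1, x2, x3):
    # predicted sales (million Baht): x1 = advertising (Baht 100,000),
    # x2 = price per unit (Baht), x3 = number of branches
    return b0 + b1*x1 + b2*x2 + b3*x3

# One extra unit of advertising, with price and branches fixed,
# raises predicted sales by exactly b1 = 7 million Baht.
assert yhat(4, 30, 12) - yhat(3, 30, 12) == 7.0
```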
Testing Hypotheses

t = (𝑏ᵢ − 0) / SE(𝑏ᵢ), where for simple regression SE(𝑏₁) = √(MSE / Σ(𝑋ᵢ − X̄)²)
The Coefficient of Determination: R²
R² is the proportion of total variation in the dependent variable Y that is explained by the variation of the independent variables X.

SS total = SS regression + SS error
SST = SSReg + SSResidual

R² = SSReg/SST = 1 − SSE/SST,  0 ≤ R² ≤ 1 (or 0% ≤ R² ≤ 100%)

Note: as more independent variables (X) are added to the regression model, R² never decreases, even if the additional variables are not related to Y.
Multicollinearity
2. b is unstable.
3. SE(b) will be very large, so the t statistic becomes too small => accept H₀ even when X and Y are strongly related.
Detecting Multicollinearity
1. Compute the pairwise correlations between the X's; multicollinearity may be a serious problem if any pairwise correlation is bigger than 0.5.
2. R²(𝑋ᵢ) = coefficient of determination of the auxiliary regression
   𝑋ᵢ = a + 𝑏₁𝑋₁ + 𝑏₂𝑋₂ + … + 𝑏ᵢ₋₁𝑋ᵢ₋₁ + 𝑏ᵢ₊₁𝑋ᵢ₊₁ + … + 𝑏ₖ𝑋ₖ,  0 ≤ R² ≤ 1
   Tolerance(𝑋ᵢ) = 1 − R²(𝑋ᵢ),  0 ≤ Tolerance ≤ 1
3. VIF (Variance Inflation Factor): VIF(𝑋ᵢ) = 1 / Tolerance(𝑋ᵢ),  1 ≤ VIF
1. VIF(𝑋ᵢ) > 10 => multicollinearity may be influencing the least-squares estimates of the regression coefficients (𝑏ᵢ).
2. Mean VIF = (1/k) Σⱼ₌₁ᵏ VIF(𝑋ⱼ) > 10 => serious problem of multicollinearity.
3. Ridge regression (a remedy when multicollinearity is present).
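With just two predictors, the auxiliary regression's R² equals the squared pairwise correlation, so VIF = 1/(1 − r²). A sketch with a deliberately collinear pair; the data and the `pearson_r` helper are ours:

```python
import math

# With two predictors the auxiliary regression's R^2 is just r^2,
# so VIF = 1 / (1 - r^2). Data are made up and deliberately collinear.
def pearson_r(x, y):
    n = len(x)
    num = n*sum(a*b for a, b in zip(x, y)) - sum(x)*sum(y)
    den = math.sqrt((n*sum(a*a for a in x) - sum(x)**2) *
                    (n*sum(b*b for b in y) - sum(y)**2))
    return num / den

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 14.1, 15.9]   # almost exactly 2*x1
r = pearson_r(x1, x2)
tolerance = 1 - r**2
vif = 1 / tolerance
assert vif > 10   # flags serious multicollinearity
```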
Variables Selection
1. Enter / Remove
2. Backward Elimination
3. Forward Selection
4. Stepwise Regression
Example 1: Data file: diaper.sav
[Diagram: product attributes, from Price to Tape, as independent variables predicting Preference]
1.1 Enter method for 9 independent variables
- Click Statistics
- Click Plots
- Click Save
Descriptive Statistics
Mean Std. Deviation N
Preference 3.99 1.961 296
Size 4.80 1.048 296
price 4.75 1.005 296
Value 4.81 1.166 296
unisex 4.13 1.500 296
style 4.03 1.325 296
absorb 3.92 1.172 296
leakage 3.89 1.153 296
Comfort 3.83 1.203 296
tape 3.35 1.167 296
ANOVAᵃ
Model | Sum of Squares | df | Mean Square | F | Sig. (values omitted)
Coefficientsᵃ
Model | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig.
(Constant) -3.590 .307 -11.694 .000
Size .524 .113 .280 4.644 .000
price .124 .122 .063 1.009 .314
Value -.027 .074 -.016 -.369 .712
1 unisex .430 .089 .329 4.819 .000
style -.014 .097 -.010 -.148 .882
absorb .370 .149 .221 2.479 .014
leakage .271 .157 .159 1.731 .085
Comfort .074 .081 .045 .910 .363
tape .031 .066 .018 .467 .641
a. Dependent Variable: Preference
Model Summaryᵇ
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate (values omitted)
1.2 Enter for 3 independent variables
Size, unisex, absorb
ANOVAᵃ (values omitted)
Coefficientsᵃ
Model | Unstandardized Coefficients | Standardized Coefficients | t | Sig. | Collinearity Statistics (values omitted)
Model Summaryᵇ
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .866ᵃ | .750 | .747 | .986
2. Forward
Variables Entered/Removedᵃ (values omitted)
Coefficientsᵃ
Model | Unstandardized Coefficients | Standardized Coefficients | t | Sig. | Collinearity Statistics (values omitted)
3. Backward
Variables Entered/Removedᵃ
Model | Variables Entered | Variables Removed | Method (values omitted)
3. Backward
Coefficientsᵃ (Model 6)

Variable     B       Std. Error  Beta   t       Sig.   Tolerance  VIF
(Constant)   -3.484  .283               -12.3   .000
size         .614    .066        .328   9.311   .000   .682       1.467
unisex       .424    .048        .325   8.909   .000   .637       1.570
absorb       .380    .147        .227   2.580   .010   .109       9.190
leak         .332    .148        .195   2.242   .026   .112       8.964
a. Dependent Variable: Preference
Model Summaryᵍ (values omitted)
4. Stepwise
Variables Entered/Removedᵃ

Model  Variables Entered  Variables Removed  Method
1      absorb             .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      unisex             .                  Stepwise (same criteria).
3      size               .                  Stepwise (same criteria).
4      leak               .                  Stepwise (same criteria).
Coefficientsᵃ
Model | B | Std. Error | Beta | t | Sig. | Tolerance | VIF (values omitted)
X1, X2, …, X10
Example: variables and factors
• Quality of Servers: service of nurses, service of doctors, service of office employees
• Quality of Products: quality of drugs, quality of treatment, quality of equipment
Factor Analysis
➢ Data are used to help reveal or identify the structure of the factor
model.
Confirmatory Factor Analysis
[CFA path diagram: observed variables X1–X7 (with errors e1–e7) load on factors F1 and F2; observed variables Y1–Y7 (with errors e8–e14) load on factors F3 and F4; D1–D4 are disturbance terms]
Types of Exploratory Factor Analysis
1. Principal Component Analysis (PCA)
2. Common Factor Analysis

Exploratory Factor Analysis: PCA
[Diagram: Item 1 (Variable 1), Item 2 (Variable 2), Item 3 (Variable 3) as independent variables forming the Component (dependent variable)]
I. Principal Component Analysis (PCA)
(Measurement model)
3. Full solution: the number of components (factors) equals the number of measured variables.
4. No unique error.
Exploratory Factor Analysis (Principal Component Analysis)
[Diagram: variables loading on component 1 (factor 1) and component 2]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
Note: the factor structure is descriptive of what the variables have in common.
1. Total variance = common variance (shared) + unique variance
[Venn diagram: the overlap of I1, I2, I3 is their common variance; the non-overlapping part of I3 is the unique variance of I3]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
1. Factor = independent variable; indicators/measurements = dependent variables.
[Diagram: one latent factor (independent) with three indicators, each with its own error term e1, e2, e3]
II. Common Factor Analysis (PAF: Principal Axis Factoring)
[Diagram: two latent (unobserved) factors, Factor 1 and Factor 2, each with three indicators; each indicator has an error/unique term e1–e6]
Steps for Exploratory Factor Analysis

KMO = (Σᵢ≠ⱼ rᵢⱼ²) / (Σᵢ≠ⱼ rᵢⱼ² + Σᵢ≠ⱼ aᵢⱼ²),  0 ≤ KMO ≤ 1

where rᵢⱼ are the pairwise correlations and aᵢⱼ are the partial correlations.
Steps for Factor Analysis
KMO | Recommendation
≥ 0.9 Marvelous
0.80 + Meritorious
0.70 + Middling
0.60 + Mediocre
0.50 + Miserable
Below 0.5 Unacceptable
2. Common Factor
◼ Principal Axis Factoring (PAF)
◼ Maximum Likelihood (MLE): assumes multivariate normality, but provides a goodness-of-fit evaluation.
➢ etc.
Fundamental Steps for EFA
Step 3: Consider the number of factors. Determine the appropriate number of factors by:
1. Eigenvalues
2. Scree plot of eigenvalues
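The eigenvalue criterion rests on the fact that, for standardized variables, the eigenvalues of the correlation matrix sum to the number of variables. A tiny two-variable sketch; r = 0.6 is an arbitrary illustrative value, and the eigenvalue > 1 cut-off is Kaiser's commonly used rule:

```python
# Two standardized variables with correlation r: the 2x2 correlation matrix
# [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r (closed form).
# r = 0.6 is an arbitrary illustrative value.
r = 0.6
eigenvalues = sorted([1 + r, 1 - r], reverse=True)
# Total variance equals the number of standardized variables.
assert abs(sum(eigenvalues) - 2) < 1e-12
# Kaiser's common rule keeps factors with eigenvalue > 1.
kept = [ev for ev in eigenvalues if ev > 1]
assert len(kept) == 1 and abs(kept[0] - 1.6) < 1e-9
```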
Step 6: Identify the original/observed variables for each factor by considering the factor loadings.
Step 7: Interpret the factors and evaluate the quality of the solution.
➢ Consider the meaningfulness and interpretability of the factors.
➢ Eliminate poorly defined factors.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .806
Approx. Chi-Square 2372.596
Bartlett's Test of Sphericity df 36
Sig. .000
Communalities
Initial Extraction
Total Variance Explained
Component | Initial Eigenvalues | Extraction Sums of Squared Loadings | Rotation Sums of Squared Loadings (values omitted)
Component Matrixa
Component
1 2 3
Size .761 .485 .217
price .751 .524 .227
Value .669 .487 .316
unisex .752 .148 -.596
style .726 .126 -.633
absorb .828 -.383 .085
leakage .820 -.439 .070
Comfort .757 -.454 .158
tape .636 -.423 .179
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Rotated Component Matrixa
Component
1 2 3
Rotation Method
I. Orthogonal Rotation
1. Varimax (Kaiser, 1958)
2. Quartimax
3. Equamax
Note: gives one Rotated Factor Matrix (structure coefficients).
II. Oblique Rotation
Direct Oblimin
Note: gives 2 different rotated factor matrices:
1. Pattern Matrix (pattern coefficients)
2. Structure Matrix
Orthogonal Rotation
[Plot: variables X1–X10 positioned in the unrotated F1–F2 factor space]
Factor Rotation
[Plot: the same variables X1–X10 after the F1–F2 axes are rotated orthogonally]
Factor Rotation
[Plot: after rotation, the F1 and F2 axes pass through the clusters of variables X1–X6]
II. Common Factor Analysis
(PAF: Principal Axis Factoring)
Communalities
Initial Extraction
Size .766 .820
price .784 .911
Value .558 .588
unisex .817 .911
style .803 .877
absorb .893 .843
leakage .900 .905
Comfort .655 .701
tape .453 .420
Extraction Method: Principal Axis Factoring.
Total Variance Explained
Factor Matrixᵃ

Variable   Factor 1  Factor 2  Factor 3
Size       .750      .454      .228
price      .753      .526      .259
Value      .627      .372      .237
unisex     .758      .145      -.562
style      .727      .122      -.578
absorb     .818      -.401     .112
leakage    .820      -.472     .102
Comfort    .724      -.399     .135
tape       .572      -.288     .101
Extraction Method: Principal Axis Factoring.
a. 3 factors extracted. 11 iterations required.

Pattern Matrixᵃ

Variable   Factor 1  Factor 2  Factor 3
Size       .012      .881      -.034
price      -.048     .972      -.011
Value      .037      .763      .029
unisex     .002      .027      -.939
style      .003      -.019     -.945
absorb     .892      .022      -.028
leakage    .962      -.052     -.027
Comfort    .850      .004      .028
tape       .640      .025      .009
Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 5 iterations.
Factor Correlation Matrix
Factor 1 2 3
1 1.000 .504 -.534
2 .504 1.000 -.536
3 -.534 -.536 1.000
Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
References
Kanlaya Vanichbuncha, การวิเคราะห์ข้อมูลหลายตัวแปร (Multivariate Data Analysis).