Stepwise Multiple Regression Method To Forecast Fish Landing
Stepwise Multiple Regression Method To Forecast Fish Landing
net/publication/251713152
CITATIONS READS
41 2,619
1 author:
Sabri Ahmad
Universiti Malaysia Terengganu
36 PUBLICATIONS 287 CITATIONS
SEE PROFILE
All content following this page was uploaded by Sabri Ahmad on 29 January 2016.
Abstract
There are six types of linear regression analyses that available in statistics which are simple linear regression, multiple linear
regressions, logistic regression, ordinal regression, multinominal regression and descriminant analysis. Multiple linear
regressions are the one of linear regression analyses that used to analyze the relationship between single response variable
(dependent variable) with two or more controlled variables (independent variables). In this paper, stepwise multiple regression
will use because this method is combination of forward selection and backward elimination method. The main objective in this
paper is to select the suitable controlled variables in forecast fish landing. Data that has been used in this research were taken
from Fisheries Annually Statistics of Department of Fisheries Malaysia. Then, the data will be analyzed by using Minitab 15 and
SPSS 17.0.
© 2010 Published by Elsevier Ltd.
1. Introduction
A statistics analysis is widely used in all aspects such as in science, medicine, fisheries (Ofuoku et al., 2007) and
also in social sciences (Sarker et al., 2006). There are many methods in statistics and one of them is regression.
There are six types of linear regression analyses which are simple linear regression, multiple linear regression,
logistic regression, ordinal regression, multinominal regression and desriminant analysis. Multiple linear regression
was selected to build a model of fish landing. Some method that categorized in the stepwise-type procedures which
is stepwise regression also used in this paper. The main objective in this paper is to select the suitable controlled
variables in forecast fish landing.
2. Literature review
Wan Nawang et al. (2009) have conducted a research to identify factor of youth’s interest to become fishermen.
There are two methods that use in their research which are factor analysis and multiple linear regression. According
to the MLR that was used in their research, it shows that there are three factors that influence the respondence
interest in fishermen career which are training programme, profitable and marketing. MLR also used by Ofuolu et
al. (2007) in their research to determine of adoption of improved fish production technologies among fish farmers in
Delta States, Nigeria. Level of education, farm size, farm income and extension contact show positively correlation
while age and household size shows negatively correlation between fish production technologies. MLR that applied
in the research by Khamis et al. (2003) states that the higher of R2 gives good result on model fitting. Combination
method of Pearson correlations, multiple and simple linear regression and ANOVA was used by Sain (2006) to see
if there was change in measured habitat and fish metrics occurred in relation to increased urbanization. Sain (2006)
used multiple and linear regression to explain the strongest relationship between fish and habitat.
3. Methodolgy
3.1 Data
In this paper, data were taken from Fisheries Statistics at official website of Department of Fisheries Malaysia,
Ministry of Agriculture & Agro-Based Industry Malaysia. The fisheries statistics include during 40 years which is
from 1968 until 2007. This research is focusing on the marine fish landing in Terengganu.
Multiple linear regressions (MLR) are the method of statistics in regression that used to analyze the relationship
between single response variable (dependent variable) with two or more controlled variables (independent
variables). This method was selected for this research because there were more than controlled variables. In this
research, response variable is marine fish landing ( Y ) while fishermen ( X1 ), fishing boat ( X 2 ) and fishing gears
licensed ( X 3 ) were controlled variables.
There were four general steps to build forecasting model of fish landing in MLR. The general steps were checking
assumptions, selecting suitable methods of MLR, interpret the output and develop equation of MLR.
The first step is to build forecasting model by checking assumptions of data. There are four assumptions that should
be check which are normality, linearity, heteroscedasticity and multicollinearity. All of the variables in this paper
must be normal distribution. The normal distribution can be seen via histogram graph, plot P-P, plot Q-Q, kurtosis
and skewness. If the distribution of data is not normal, so we need to use transformation.
Then, MLR should have linear relationship between response variable and controlled variables. in regression the
model, we fit is a linear model (‘linear model’ just means ‘model based on a straight line’) (Andy, 2005).
The third assumption in MLR is any data should be free from heteroscedasticity. Heteroscedasticity will happen
whenever there is interruption in the model that not fulfilled. If the important variables in the model are missing,
hence heteroscedasticity will happen. Any model can be check whether there is heteroscedasticity or not based on
Spearman’s rank correlation test.
The last assumption that should be checked in research is multicollinearity. Multicollinearity means situation that
has high degree of correlation between controlled variables. Any analyses can be known the present of
multicollinearity by checking the value of variation inflation factor (VIF). When the value of VIF is less than 5,
hence multicollinearity is not serious. While if VIF is more than 5, then multicollinearity is substantial.
Multicollinearity will be more serious whenever the value of VIF is more than 10.
There are three methods in MLR which are forward selection, backward elimination and stepwise regression. All
three methods can be categorized into stepwise-type procedures. In this research, only stepwise regression method
was applied. Stepwise regression method is a combination of forward selection and backward elimination. By
Intan Martina Md Ghani and Sabri Ahmad / Procedia Social and Behavioral Sciences 8 (2010) 549–554 551
referring Minitab Methods and Formulas, standard stepwise regression both adds and removes controlled variables
as needed for each step. Minitab ended its procedure when all variables not in the model have p-value that are less
than the specified Alpha-to-Enter value and when all variables in the model have p-value that are greater than or
equal to the specified Alpha-to-Remove value. Based on Minitab StatGuide, Alpha-to-Enter is a value that
determines if any of the predictors that not currently in the model should be added to the model. While Alpha-to-
Remove is a value that determines if any of the predictors in the model should be removed from the model.
From the SPSS output, we can interpret the values of Pearson coefficient, multiple coefficient of determination (R 2),
multiple correlation coefficient (R) and adjusted multiple coefficient of determination (adjusted R 2). Correlation
analyses is the method in regression that is used to look how far the relationship between two variables. Pearson
linear correlation was applied because to check the existence of relationship between two variables in this research.
According to Piaw (2006), some of the strength value (r) are as follows:
R2 means the proportion of the total variation in the n observed values of the dependent variable that is explained by
the overall regression model (Bowerman et al., 2005). The higher R2, the better the model fits your data. R is the
positive squared root of R2 (Levin & Rubin, 1994).
Adjusted R2 is a measure of the loss of predictive power or shrinkage in regression. The adjusted R 2 will explain
how much variance in the outcome would be accounted for if the model had been derived from the population from
which the sample was taken (Andy, 2005). The larger adjusted R 2, the better the model fits the data.
H 0 : E1 E 2 E3 0
H a : At least one of the E1, E 2 and E 3 does not equal to 0
Y E0 E1 X1 E2 X 2 E3 X 3 H (1)
where
552 Intan Martina Md Ghani and Sabri Ahmad / Procedia Social and Behavioral Sciences 8 (2010) 549–554
All data was analyzed using Statistical Package for the Social Sciences 17.0 (SPSS 17.0) and Minitab version 15
(Minitab 15).
Kolmogorov-Smirnov Shapiro-Wilk
Statistic df Sig. Statistic df Sig.
Marine fish .080 40 .200 .973 40 .446
landing
The normality can check by using two test as shown in Table 2 which are by Kolmogorov-Smirnov (if sample size is
more than 50) and Shapiro-Wilk (if sample size is less than 50). Since the sample size in this research n 40 50 ,
so we need to use Shapiro-Wilk which gives a Sig. value equal to .446. It shows that marine fish landing is normally
distributed (Sig.= .446 > 0.05).
According to Table 3, it shows that correlations between response variable and controlled variables has been
determined using Pearson Correlation. From this table, it shows that the weak correlation between marine fish
landing and fishermen (r=-.429), while the correlation between marine fish landing and fishing gears licensed
(r=.630) is medium. The strong correlation shows between fishermen and fishing boat (r=.828). Therefore the three
relationship are significant because Sig. < .05.
The strength correlation between marine fish landing and fishing boat (r=-.268), fishermen and fishing gears
licensed (r=-.239) also fishing boat and fishing gears licensed (r=-.073) shows very weak. The three relationship
explains that there are not significant because Sig. > .05. The negative sign in the Table 3 shows the direction of the
correlation.
Intan Martina Md Ghani and Sabri Ahmad / Procedia Social and Behavioral Sciences 8 (2010) 549–554 553
The model summary is based on the Table 4. The value of R 2 is .397 (Model 1) shows that there are 39.7% (R=.630)
changes in response variable (marine fish landing). It is because changes in controlled variable (fishing gears
licensed). This explains that fishing gears licensed is major factor to the marine fish landing. The value of R 2 is .479
(R=.692) for Model 2 shows that there are 47.9% changes in response variable (marine fish landing) is occurred
because changes in combination of two controlled variables which are fishing gears licensed and fishermen. By
comparing both models, Model 2 is a better model fits to the data than Model 1. This is because the higher the value
or R2 and adjusted R2 (Model 1= .381; Model 2= .451), the better the model fits to the data.
Table 5. Coefficients
According to Table 5, it shows coefficients. From that coefficient, we can develop two model using MLR equation
such as below:
Model 1:
Y 16062.147 33.463X 3 (2)
Model 2:
Y 88359.030 6.410X1 29.716X 3 (3)
Based on the Model 1, the standardized coefficients for fishing gears licensed ( E .630, p .05 ) is significant. This
explained that only fishing gears licensed is factor to the marine fish landing. While in Model 2, the standardized
coefficients for fishermen ( E .296, p .05 ) and fishing gears licensed ( E .559, p .05 ) are significant. This
explained that fishing gears licensed and fishermen are factors to the marine fish landing. As conclude, Model 2 is
better model fits to the data than Model 1.
Step 1 2
Constant 16062 88359
X3 33.5 29.7
t-value 5.00 4.58
p-value .000 .000
X1 -6.4
t-value -2.42
p-value .020
S 27700 26082
R-Sq 39.69 47.94
554 Intan Martina Md Ghani and Sabri Ahmad / Procedia Social and Behavioral Sciences 8 (2010) 549–554
According to Table 6, it presents the result of stepwise regression using Minitab 15. There are two steps that used to
select the controlled variables. In the step 1, X 3 indicates smallest p-value than Alpha-to-Enter (p=.000<.05).
Therefore X 3 is the first variable that enters into the model. In the step 2, the p-value for X 1 shows .020 which is
smallest than .05. Therefore X 1 is the second variable that needs to enter into the model. X 3 is retained in the
model with the coefficient 29.7, t-value is 4.58 and p-value is .000. After the second step, there are no controlled
variables that enter and remove the model.
For the marine fish landing output, the value of S decreases from step 1 (S=27700) to step 2 (S=26082), R-Sq and
R-Sq(adj) in the step 2 (R-Sq=47.94, R-Sq(adj)=45.13) is higher than step 1 (R-Sq=39.69, R-Sq(adj)=38.11)and
Mallows Cp in step 2 is closer to number of controlled variables than step 1. Hence, these statistics indicates step 2
which containing controlled variables X 3 and X 1 is provides better fits to the data. The final model shows only two
controlled variables by using stepwise regression and produced regression model such as follows:
5. Conclusions
This paper using multiple linear regressions (MLR) to built forecasting model for fish landing. Based on the output
from SPSS 17.0, there are two model was built. By comparing two models using SPSS 17.0, Model 2 is a better
model fits to the data than Model 1. This is because the value of R 2 and adjusted R2 in Model 2 (R2=.479, adjusted
R2=.451) is higher than Model 1 (R2=.397, adjusted R2=.381). In the stepwise regression method shows that only
two controlled variables that were selected using Minitab 15 which are fishermen and fishing gears licensed. This
explained that only fishermen and fishing gears licensed are factors to the marine fish landing.
References
Andy, F. (2005). Discovering statistics using SPSS: (and sex, drugs and rock ‘n’ roll). London: SAGE.
Bowerman, B.L., O’Connell, R.T. and Koehler, A.B. (2005). Forecasting, time series and regression. 4 th ed. United States of America:
Brooks/Cole Thomson Learning Inc.
Khamis, A., Ismail, Z. & Shabri, A. (2003). Pemodelan harga minyak sayuran menggunakan analisis regresi linear berganda. Matematika, Jilid 9,
bil. 1, 59-70.
Levinm R.I. and Rubin, D.S. (1994). Statistics for management. 6 th ed. Englewood Cliffs, N.J.: Prentice Hall.
Ofuoku, A.U., Olele & Emah, G. (2007). Determinants of adoption of improved fish production technologies among fish farmer in Delta States,
Nigeria. Journal of Fisheries International, 2(2):147-151.
Piaw, C.Y. (2006). Asas statistik penyelidikan (Buku 2). Kuala Lumpur: McGraw-Hill.
Sain, R.L. (2006). Characterizing how fish communities and physical habitat structure are affected by urbanization in an East Tennessee
Watershed, Master thesis, University of Tennessee, Knoxville.
Sarker, M.A., Chowdhury, A.H. & Itohara, Y. (2006). Entrepreneurships barriers of pond fish culture in Bangladesh- A case study from
Mymensingh district. Journal of Social Sciences, 2(3):68-73.
Wan Nawang, W.M.Z., Mamat, I. & Mohd Isa, A.M. (2009). Faktor peramal minat belia untuk menjadi nelayan: Satu kajian di Mukim Kuala
Besut, Terengganu. Jurnal Teknologi, 50(E):29-52, Universiti Teknologi Malaysia.