Enrollment Forecasting For School Management System: Rabby Q. Lavilles and Mary Jane B. Arcilla
5, October 2012
A. Simple Moving Average
Moving average techniques forecast demand by calculating an average of actual demands from a specified number of prior periods. Each new forecast drops the demand in the oldest period and replaces it with the demand in the most recent period [6]. The formula for simple moving average of order 3 is shown in (1).

$$T_t = \frac{1}{3}\,(Y_1 + Y_2 + Y_3) \qquad (1)$$

where
$Y_1$ – is the 3rd value from the value to be forecasted
$Y_2$ – is the 2nd value from the value to be forecasted
$Y_3$ – is the 1st value from the value to be forecasted

B. Single Exponential Smoothing
Exponential smoothing is highly suitable for environments such as inventory systems where forecasts must be made [7]. The equation for single exponential smoothing is shown in (2) [8].

$$F_t = \alpha\, y_t + (1 - \alpha)\, F_{t-1} \qquad (2)$$

where
$F_{t-1}$ is the forecast for the period before the current time period t
$y_t$ is the actual data at time period t
$\alpha$ is the weight given to the latest data

C. Double Exponential Smoothing
The formula for double exponential smoothing is shown in (3) and (4).

$$F_t = \alpha\, y_t + (1 - \alpha)(F_{t-1} + T_{t-1}) \qquad (3)$$
$$T_t = \beta\,(F_t - F_{t-1}) + (1 - \beta)\, T_{t-1} \qquad (4)$$

where:
$y_t$ is the observed value at time t.
$F_t$ is the forecast at time t.
$T_t$ is the estimated slope/trend at time t.
$\alpha$ - representing alpha - is the first smoothing constant, used to smooth the observations.
$\beta$ - representing beta - is the second smoothing constant, used to smooth the trend.

Initializing the values of the models is dependent on the implementation. In this study the initial value is computed by setting the first $F_t$ to $y_t$, and the initial slope $T_t$ is set to the difference between the first two observations [9].

Moving average is compared with single exponential smoothing to determine which model can represent the data of the university. These models represent the mean of the students enrolled in a given subject. Double exponential smoothing is used for subjects that show increasing or decreasing trends.

D. Choosing the Smoothing Constants
The smoothing constants must be values in the range 0.0-1.0. The most appropriate smoothing constant depends on the data series being modeled. In general, the speed at which the older responses are dampened is a function of the value of the smoothing constant. When this smoothing constant is close to 1.0, old data are dampened quickly, which means more weight is given to recent observations; when it is close to 0.0, relatively less weight is given to recent observations. The best value for the smoothing constant is the one that results in the smallest mean of the squared errors or other similar accuracy indicator.

E. Forecasting Error
The deviation of the forecast from the actual data is measured to determine the accuracy of the model. This phase of model building is called model evaluation. The goal is to compare and get the difference between actual and forecasted values using the mean absolute percentage error (MAPE). MAPE is a measure of the accuracy of fitted time series values in statistics, specifically in trending. It usually expresses accuracy as a percentage, and is defined by the formula [10][11] in (5):

$$M = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \qquad (5)$$

where
$A_t$ – Actual value
$F_t$ – Forecasted value

IV. METHODS AND RESULTS
Initial analysis reveals that there are subjects in the university which refer to the same subject but use different subject codes. This led to cleaning the data as preparation for the modeling. Graphical presentation of MA of order three (3) showed that it is similar to the SES, so the consideration of MA to be integrated in the school management system was dropped. The researchers then focused on SES and DES and on determining the alpha and beta of the models that produce the least error.

As shown in Fig. 1 and Fig. 2, the effects of the different values of alpha on the service courses show that an alpha of 0.9 gives more weight to the latest values used in the forecast, as described in the model. The graph with an alpha of 0.1 creates an average value of the data. It smooths the differences that the actual data is producing and tends to create an average value based on the history of the data. Hence, the value of alpha in SES determines the percentage to be extracted from the latest data. Determining the value of alpha is therefore a critical concern of the SES.

[Figure: Simple Exponential Smoothing, Parameters: Alpha = 0.9. Vertical axis: No. of students (0 to 3,000); horizontal axis: SY (1 to 11); series: Initial Values, Forecasting.]
Fig. 1. Single exponential smoothing with alpha of 0.9.
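The effect of alpha described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in the study: the function names `sma3_forecast` and `ses_levels` and the enrollment figures are hypothetical, and the smoothed series is started at the first observation, following the initialization stated in Section C.

```python
def sma3_forecast(y):
    """Next-period forecast as the mean of the last three observations, as in (1)."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    return sum(y[-3:]) / 3


def ses_levels(y, alpha):
    """Smoothed values F_t = alpha*y_t + (1 - alpha)*F_{t-1}, starting at F_1 = y_1."""
    levels = [y[0]]
    for obs in y[1:]:
        levels.append(alpha * obs + (1 - alpha) * levels[-1])
    return levels


# Hypothetical enrollment counts for one subject over six school years.
enrollment = [1000, 1200, 1100, 1300, 1250, 2000]

next_sma = sma3_forecast(enrollment)           # mean of 1300, 1250, 2000
next_ses_09 = ses_levels(enrollment, 0.9)[-1]  # follows the latest jump to 2000
next_ses_01 = ses_levels(enrollment, 0.1)[-1]  # stays near the historical level
print(next_sma, next_ses_09, next_ses_01)
```

With an alpha of 0.9 the last smoothed value sits close to the most recent observation, while an alpha of 0.1 keeps it near the long-run level, which is the behaviour the two figures contrast.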
[Figure: Simple Exponential Smoothing, Parameters: Alpha = 0.1. Vertical axis: No. of students (0 to 3,000); horizontal axis: SY (1 to 11).]

V. MODELING PROCESS
Moving average is comparable with the naive model of forecasting. Including the model as part of the academic program management is not a pragmatic approach. It also exhibits the same characteristics as single exponential smoothing and differs only in the value assigned to past data.

Single exponential smoothing shows that, on average, an alpha of 0.9 gives less error using MAPE or MPE. For these subjects, the error decreases as the alpha increases. On the other hand, 20% of the subjects showed the opposite pattern. That means that 80% of the subjects treat the latest observation as the major factor in projecting the number of students. The remaining 20% give emphasis to old records of the students enrolled in a subject.

A. Initial Analysis
Initial analysis on single exponential smoothing can be interpreted to mean that the alpha of the subjects cannot be generalized to an average alpha applied to all subjects. Even though the patterns are similar for 80% of the subjects, this does not guarantee that each subject has the same alpha. The alpha that produces the smallest error is not the average alpha, because every subject has its own alpha based on its data. Even the remaining 20% of the subjects do not produce the same alpha. The results still differ by 10 to 50 percent, which matters when the number of students is large.
Double exponential smoothing shows varied alpha and beta combinations with little difference in error. For example, on average, an alpha of 0.9 with a beta of 0.1 and an alpha of 0.8 with a beta of 0.1 differ only slightly, with MAPE of 16.4 and 16.5 respectively.
These results show that double exponential smoothing gives emphasis to the latest base value, while less weight is given to the trend part. A beta of 0.1 means that 10% will be derived from the trend component of the subject.
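The role of alpha and beta in (3) and (4) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions rather than the study's code: the name `des_forecast` is hypothetical, the initialization follows the rule given with (3) and (4) (first level set to the first observation, first slope set to the difference of the first two observations), and the next-period forecast is assumed to be the last level plus the last trend.

```python
def des_forecast(y, alpha, beta):
    """One-step-ahead forecast from double exponential smoothing, (3) and (4)."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level = y[0]          # first F_t is set to the first observation
    trend = y[1] - y[0]   # initial slope: difference of the first two observations
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)  # equation (3)
        trend = beta * (level - prev_level) + (1 - beta) * trend  # equation (4)
    return level + trend  # assumed forecast for the next period


# Hypothetical subject whose enrollment grows by 10 students every school year.
steady_growth = [100, 110, 120, 130]
forecast = des_forecast(steady_growth, alpha=0.9, beta=0.1)
print(forecast)  # about 140: the steady trend is carried one period forward
```

On a series with a constant increase the forecast continues the trend regardless of the constants chosen, which is why subjects with consistent increasing or decreasing enrollment suit this model.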
B. Model Selection
Selecting a model to be integrated in the school management system is based on the smallest percentage error. MAPE is the main basis since, by taking absolute values, it keeps negative errors from offsetting positive ones [11]. For each subject, the smallest error of single exponential smoothing over its alpha is compared to the smallest error of double exponential smoothing over its alpha and beta. The model with the least error is used to forecast that particular subject, with its alpha for single exponential smoothing or its alpha and beta for double exponential smoothing.
The idea is implemented as a program so that generating the alpha for single exponential smoothing and the alpha and beta for double exponential smoothing is not tedious, computation-intensive work for a forecaster.
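The reason for preferring MAPE, and the selection rule itself, can be sketched as follows. This is a hedged illustration: `mape`, `mpe`, and `select_model` are hypothetical names, MAPE is returned as a fraction following (5) (multiply by 100 for a percentage), and the candidate errors passed to `select_model` are made-up numbers, not results from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as in (5); actual values must be nonzero."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mpe(actual, forecast):
    """Mean percentage error: signed errors are allowed to cancel."""
    return sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)


def select_model(candidates):
    """Return the (name, params, error) candidate with the smallest error."""
    return min(candidates, key=lambda c: c[2])


actual = [100, 100]
forecast = [110, 90]            # one over-forecast and one under-forecast
print(mpe(actual, forecast))    # the two signed errors cancel out
print(mape(actual, forecast))   # the absolute errors do not cancel

# Hypothetical best-per-model errors for a single subject.
chosen = select_model([
    ("SES", {"alpha": 0.9}, 0.180),
    ("DES", {"alpha": 0.8, "beta": 0.1}, 0.165),
])
print(chosen[0])
```

Keeping the parameters next to each error means the selected tuple already carries everything needed to produce the forecast for that subject.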
C. Model Building
Brute force is used to find the least error based on MAPE. Subjects that exhibit a consistent pattern of increasing or decreasing number of students enrolled were observed to be best modeled using double exponential smoothing. On the other hand, single exponential smoothing is a good model for subjects that follow no particular pattern, since it identifies a mean that depends on the value of alpha.
A total of 182 subjects were used in generating the least error model based on the available data. About 58% of the subjects have their least average MAPE using double exponential smoothing with varying alpha. The remaining subjects used single exponential smoothing. This indicates that 58% of the subjects have consistent patterns, whether increasing or decreasing, in the number of enrolled students. Subjects that follow double exponential smoothing with a beta greater than 0.30 show consistently increasing or decreasing values. Those that have small changes in the number of students every school year have a lower beta.
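A brute-force search of this kind might look like the following Python sketch. It is not the study's program: the grid of 0.1 to 0.9 in steps of 0.1, the function names, and the choice to score both models on one-step-ahead forecasts from the third observation onward (so that the two observations used for initialization are not scored) are all assumptions made for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction, following (5)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def ses_one_step(y, alpha):
    """Forecasts for y[2:], each equal to the smoothed level of the prior period."""
    level, out = y[0], []
    for t in range(1, len(y)):
        if t >= 2:
            out.append(level)
        level = alpha * y[t] + (1 - alpha) * level
    return out


def des_one_step(y, alpha, beta):
    """Forecasts for y[2:], each equal to level + trend of the prior period."""
    level, trend, out = y[0], y[1] - y[0], []
    for t in range(1, len(y)):
        if t >= 2:
            out.append(level + trend)
        prev = level
        level = alpha * y[t] + (1 - alpha) * (prev + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return out


GRID = [i / 10 for i in range(1, 10)]  # assumed candidates: 0.1, 0.2, ..., 0.9


def best_model(y):
    """Try every constant on the grid; return (error, model, alpha, beta)."""
    candidates = []
    for a in GRID:
        candidates.append((mape(y[2:], ses_one_step(y, a)), "SES", a, None))
        for b in GRID:
            candidates.append((mape(y[2:], des_one_step(y, a, b)), "DES", a, b))
    return min(candidates, key=lambda c: c[0])


# Hypothetical subject with steadily increasing enrollment.
error, model, alpha, beta = best_model([100, 110, 120, 130, 140, 150])
print(model, error)
```

The search evaluates 9 SES candidates and 81 DES candidates per subject, which stays cheap for a few hundred subjects; a finer grid or an optimizer could replace the loop if performance became a concern.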
The remaining 42% use single exponential smoothing, which means that these subjects have an average number of students throughout their offering. This does not mean that such a subject has a consistent number of students; it might have a pattern that is increasing in the first part of the school year and decreasing in the second part. This behavior causes the value of alpha to take past values into account, or to offset some of the errors, because of the large error generated in some parts of the school year while the model is in the process of generating the value of alpha and beta with the smallest MAPE.

D. Testing and Evaluation
Testing is done by comparing the actual number of students enrolled in the first semester of academic year 2009-2010 with the forecasted value for the said year and with the naive method.
The projected data from the time series models yield less error using MAPE, with a 20.5% difference in favor of the time series models. The 20.5% advantage of the model over the current practice, which is the naive method, does not have much effect on subjects with fewer than 100 students, but in subjects where enrollment can go beyond one thousand, this value matters. Take for instance ENG 1, which has 2,514 enrollees: 20% of that number is more or less 502 students. This number of students can open up to 12 or 13 sections with 40 students in each section. The 20% difference therefore has a major effect on the number of sections to be considered. The 20.5% advantage over the naive method can already make a difference in estimating the number of students to enroll in a certain subject. Of the subjects selected as part of the sample study, about 87% have more than 100 enrollees. This implies that for service courses, the model is of good assistance in determining the number of students. Furthermore, the majority of service subjects have more than 100 students enrolled.

E. Integration of Forecasting
The enrollment forecasting module handles the determination of subjects as well as the number of students expected to enroll. The model is implemented using PostgreSQL. It is programmed in the database server as part of a function of the database. It then generates a table listing the subjects and the expected number of students to enroll.
The table is to be retrieved in the graphical user interface of the school management system. This can be used as reference or input when deciding how many sections to open in a particular subject.

VI. CONCLUSION AND FUTURE WORK
The integration of a forecasting module into the school management system is of assistance in projecting the number of students expected to enroll in a subject. The alpha for SES and the alpha and beta for DES are critical factors to be considered in using time series models. These values are dependent on the data to be used. The brute force approach used in the study shows a promising result, although in terms of performance other algorithms can also be considered.
Three time series statistical models are considered in the study: the simple moving average of order 3 (MA 3), SES with varying alpha, and DES. MAPE is used to evaluate the fit or accuracy of the data to the models. About 58% of the subjects have their least average MAPE using double exponential smoothing with varying alpha and beta. The remaining subjects use single exponential smoothing with the alpha having the least MAPE.
The result of the projections for academic year 2009-2010 is compared to the naive model used by the university. The initial result yields less error using MAPE, with about 20% difference in favor of the time series models.
The techniques used in this study can be applied to other educational institutions with a similar setting. They can also be used to enhance the understanding of enrolment patterns. The connection between advising and the projection of the number of students can also be used to create a model for forecasting. Other statistical models can be integrated into the existing model to enhance its accuracy. Furthermore, a combination of different statistical models can be considered for further study. One form of combination is using cohort survival models of every program as an independent variable in projecting the number of students to enroll. One category of quantitative models is causal models. This can also be explored by using the factors identified in this study.

ACKNOWLEDGMENT
This research is funded by PGMASEGS (President Gloria Macapagal-Arroyo Science and Engineering Graduate Scholarship) under the Commission on Higher Education of the Philippines.

REFERENCES
[1] D. Cecez-Kecmanovic, “The Discipline of Information Systems-issues and challenges,” R. Ramsower and J. Windsor (Eds.), in Proc. of the Eighth Americas Conference on Information Systems AMCIS 2002, Dallas, Texas, USA, August 9-11, 2002, pp. 1696-1703.
[2] S. E. Choudhuri, C. R. Standridge, C. Griffin, and W. Wenner, “Enrollment Forecasting for an Upper Division General Education Component,” 37th ASEE/IEEE Frontiers in Education Conference, Milwaukee, WI, 2007.
[3] S. Guo, “Three Enrollment Forecasting Models: Issues in Enrollment Projection for Community Colleges,” presented at the 40th RP Conference, Asilomar Conference Grounds, Pacific Grove, California, May 1-3, 2002.
[4] J. M. Fraser, S. Djumin, and J. J. Mager, “The University as Educational Lab,” in Proc. 1999 American Society for Engineering Education Annual Conference & Exposition, 1999.
[5] J. C. Segura-Ramirez and W. Chang, “Using Markov Chain and Nearest Neighbor Criteria in an Experience Based Study Planning System with Linear Time Search and Scalability,” in Proc. 2006 IEEE International Conference, Waikoloa Village, HI, 2006, pp. 395-403.
[6] O. T. Olalekon, B. M. Oyewole, and O. A. Olawande, “Forecasting Demand for Office Space in Ikeja, Nigeria,” Mediterranean Journal of Social Sciences, vol. 3, pp. 323-338, January 2012.
[7] D. A. Freedman, Statistical Models: Theory and Practice, Cambridge: Cambridge University Press, 2005.
[8] NIST/SEMATECH e-Handbook of Statistical Methods. [Online]. Available: http://www.itl.nist.gov/div898/handbook, U. S. Commerce Department’s Technology Administration, 2012.
[9] M. L. Berenson and D. M. Levine, Basic Business Statistics, 7th Edition. USA: Prentice Hall, 2006.
[10] J. R. Evans, Statistics, Data Analysis, and Decision Modeling, Upper Saddle, New Jersey: Prentice Hall, 2000.
[11] M. Hamburg, Statistical Analysis for Decision Making, 6th Edition, Belmont, California: Duxbury Press, 1996.

Rabby Q. Lavilles is a graduate of MS in Information Technology at De La Salle University – Manila, Philippines. He is a faculty member of the Information Technology Department of Mindanao State University – Iligan Institute of Technology.

Mary Jane B. Arcilla is an Assistant Professor at De La Salle University – Manila, Philippines. She is a graduate of MS in Information Technology from De La Salle University – Manila, Philippines.