Enrollment Prediction - Project
1 Introduction
Following World War II, a great need for higher education institutions arose
in the United States, and higher education leaders built institutions on a
"build it and they will come" basis. After World War II, enrollment in
public as well as private institutions soared (Greenberg, 2004); however, this
changed by the 1990s: a significant drop in enrollment left universities in
a marketplace of "hypercompetition," and institutions faced the unfamiliar
problem of receiving fewer applicants than they were used to receiving (Klein, 2001).
Today, higher education institutions face the problem of student retention,
which is related to graduation rates: colleges with higher freshman
retention rates tend to have higher four-year graduation rates. The
national average retention rate is close to 55%, in some colleges fewer than
20% of an incoming student cohort graduate (Druzdzel and Glymour, 1994), and
approximately 50% of students entering an engineering program leave before
graduation (Scalise et al., 2000).
Tinto (1982) reported national dropout rates and BA degree completion
rates over the past 100 years to be constant at 45 and 52 percent, respectively,
with the exception of the World War II period (see Figure 1 for the completion
rates from 1880 to 1980). Tillman and Burns at Valdosta State University
(VSU) projected lost revenues per 10 students who do not persist past their first
semester to be $326,811. Although the gap between private institutions and public
institutions in terms of first-year students returning for a second year is closing,
retention rates have been constant for a long period for both types of
institutions (ACT, 2007). The National Center for Public Policy and Higher Education
(NCPPHE) reported the U.S. average retention rate for the year 2002 to be
73.6% (NCPPHE, 2007). This problem is not limited to U.S. institutions;
it also affects institutions in many other countries, such as the U.K. and Belgium.
The U.K. national average freshman retention rate for the year 1996 was 75% (Lau,
2003), and Vandamme (2007) found that 60% of first-generation first-year
students in Belgium fail or drop out.
Figure 1: BA Degree Completion Rates for the period 1880 to 1980, where
Percent Completion is the Number of BAs Divided by the Number of First-time
Degree Enrollments Four Years Earlier (Tinto, 1982)
2 Research Objective
The research objectives of this project were:
• To build models to predict enrollment using the student admissions data
• To evaluate the models using cross-validation, win-loss tables and quartile
charts
3 Classifiers
3.1 Decision Trees
Decision trees are a collection of nodes, branches, and leaves. Each node
represents an attribute; the node is then split into branches and leaves. Decision
trees follow a "divide and conquer" approach: each node is divided, using a
purity criterion, until the data are classified well enough to meet a stopping
condition. The Gini index and the information gain ratio are two common purity
criteria; the Classification and Regression Tree (CART) algorithm uses the
Gini index, and the C4.5 algorithm uses the information gain ratio (Quinlan, 1986,
1996). The Gini index is given by Equation 1, and the entropy from which
information gain is computed is given by Equation 2.
$$I_G(i) = 1 - \sum_{j=1}^{m} f(i,j)^2 = \sum_{j \neq k} f(i,j)\, f(i,k) \quad (1)$$

$$I_E(i) = -\sum_{j=1}^{m} f(i,j) \log_2 f(i,j) \quad (2)$$
where m is the number of values an attribute can take, and f(i, j) is the
proportion of instances in node i that belong to the jth class.

Figure 2: CRISP-DM Model Version 1.0
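To make the two purity criteria concrete, here is a small Python sketch (our illustration, not part of the original study) that computes the Gini index of Equation 1 and the entropy of Equation 2 for the class labels at a node:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Information entropy: -sum of p * log2(p) over the classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A node holding 6 enrolled ("Y") and 4 not-enrolled ("N") applicants.
node = ["Y"] * 6 + ["N"] * 4
print(gini(node))     # 1 - (0.6^2 + 0.4^2) = 0.48
print(entropy(node))  # -(0.6*log2(0.6) + 0.4*log2(0.4)) = 0.971
```

A pure node scores zero on both measures; the split that most reduces the measure is preferred.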
Figure 3 is an example of constructing a decision tree using the Titanic data
and the Clementine software. Based on impurity, Clementine selected the attribute
sex (male or female) as the root node; then, for the attribute value sex =
male, Clementine created one more split on age (child or adult).
3.2 Rules
The construction of rules is quite similar to the construction of decision trees;
however, a rule learner takes each class in turn and seeks rules that cover all the
instances of that class while excluding the instances that do not belong to it.
Such algorithms are therefore called covering algorithms, and pseudocode for one
is given in Figure 4, reproduced from Witten and Frank (2005). A fictitious example
of a rule such a learner might produce is given in Figure 5.
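In the same spirit as that pseudocode, the sketch below gives a simplified PRISM-style covering learner in Python. It is an illustration under our own naming, not the exact algorithm of Figure 4: for one class, it repeatedly grows a rule by greedily adding the attribute-value test with the highest accuracy, then removes the covered instances.

```python
def learn_rules_for_class(instances, target, klass):
    """Simplified PRISM-style covering learner (illustration only):
    grow rules that cover only `klass`, removing covered instances."""
    rules = []
    remaining = list(instances)
    while any(x[target] == klass for x in remaining):
        conds = {}
        covered = remaining
        while any(x[target] != klass for x in covered):
            # Candidate tests: attribute=value pairs not yet in the rule.
            tests = {(a, v) for x in covered for a, v in x.items()
                     if a != target and a not in conds}
            if not tests:          # inconsistent data: give up on purity
                break
            def accuracy(t):
                hit = [x for x in covered if x[t[0]] == t[1]]
                return sum(x[target] == klass for x in hit) / len(hit)
            a, v = max(tests, key=accuracy)
            conds[a] = v
            covered = [x for x in covered if x[a] == v]
        rules.append(conds)
        remaining = [x for x in remaining if x not in covered]
    return rules

# Tiny hypothetical admissions table, in the spirit of this project.
apps = [
    {"FinancialAid": "Yes", "GPA": "High", "Enroll": "Yes"},
    {"FinancialAid": "Yes", "GPA": "Low",  "Enroll": "Yes"},
    {"FinancialAid": "No",  "GPA": "High", "Enroll": "No"},
    {"FinancialAid": "No",  "GPA": "Low",  "Enroll": "No"},
]
print(learn_rules_for_class(apps, "Enroll", "Yes"))
# [{'FinancialAid': 'Yes'}]
```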
Figure 3: Construction of Decision Tree by Clementine
IF FinancialAid = "Yes" AND HighSchoolGPA > 3.00
THEN Persistence = "Yes"

Figure 5: A Fictitious Example of a Rule
4 Data
The data for this research came from WVU's data warehouse. WVU used SunGard
Banner as its Enterprise Resource Planning (ERP) system to run student
operations. This system stores the data in a relational database, which a
unit in the Office of Information Technology (OIT) queried with SQL to
obtain the data in flat files. This flow of data is represented in Figure 6.
The extracted data contained 248 attributes with demographic and academic
information about the applicants. We performed some preprocessing on these data
to use them in the modeling process (a sketch of these steps is given after the list):
• All the data tables were joined to create a single table.
• Applications that were not accepted were removed, reducing the total number
of instances to 112,390.
• Domain knowledge and common sense were used to remove some attributes
(e-mail addresses, phone numbers, etc.).
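As an illustration only, the following pandas sketch mirrors these steps; the file, table, and column names (demographics.csv, AcceptedIndicator, and so on) are hypothetical stand-ins, not WVU's actual Banner schema.

```python
import pandas as pd

# Hypothetical extracts from the Banner flat files.
demographics = pd.read_csv("demographics.csv")   # one row per applicant
academics = pd.read_csv("academics.csv")

# Join all the data tables into a single table on a shared applicant key.
data = demographics.merge(academics, on="ApplicantID", how="inner")

# Keep only accepted applications.
data = data[data["AcceptedIndicator"] == "Y"]

# Drop attributes that domain knowledge says cannot generalize.
data = data.drop(columns=["EmailAddress", "PhoneNumber"])
```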
5 Experiment
5.1 Feature Subset Selection (FSS)
Feature subset selection is a method for selecting relevant attributes (or features)
from the full set of attributes as a means of dimensionality reduction. Although
some data mining techniques, such as decision trees, select relevant attributes
on their own, experiments have shown that their performance can still be improved
by prior feature selection (Witten and Frank, 2005, p. 288).
The two main approaches to feature (or attribute) selection are filters and
wrappers (Witten and Frank, 2005). A filter is an unsupervised attribute
selection method that conducts an independent assessment of the general
characteristics of the data; it is called a filter because the attributes are
filtered before the learning procedure starts. A wrapper is a supervised attribute
selection method that uses a data mining algorithm to evaluate the attributes;
it is called a wrapper because the learning method is wrapped inside the attribute
selection technique. Attribute selection methods can employ different search
algorithms, such as genetic algorithms, greedy stepwise search, and rank search.

Figure 7: Enrolled Indicator

Figure 9: Residency Indicator
For this research, we used Wrapper and InfoGain; Wrapper included the J48
tree learner and the Naive Bayes learner as part of the attribute selection process.
We used these FSS techniques to generate rankings of the attributes in order of
importance. We then used these rankings to add attributes to the dataset one at a
time and evaluate the changes in accuracy of three different learners: J48, Naive
Bayes, and Ridor. To avoid learning bias, we cross-validated each learning
procedure 10 times.
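The original experiment used Weka's attribute selection. As a rough modern analogue, the sketch below expresses the wrapper and filter ideas in scikit-learn terms, with DecisionTreeClassifier standing in for J48 and mutual information standing in for InfoGain; it assumes the hypothetical table from the preprocessing sketch, already numerically encoded, and does not reproduce the exact Weka configuration.

```python
from sklearn.feature_selection import SequentialFeatureSelector, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

# Continuing the hypothetical table from the preprocessing sketch.
X = data.drop(columns=["EnrolledIndicator"])   # attributes (already numeric)
y = data["EnrolledIndicator"]                  # class: enrolled or not

# Wrapper: a forward search that scores each candidate attribute subset
# by the 10-fold cross-validated accuracy of the learner itself.
wrapper = SequentialFeatureSelector(
    DecisionTreeClassifier(),                  # stand-in for J48
    n_features_to_select=2, direction="forward",
    scoring="accuracy", cv=10)
wrapper.fit(X, y)
print("wrapper picked:", list(X.columns[wrapper.get_support()]))

# Filter: rank attributes by mutual information with the class,
# independently of any learner (an InfoGain-style ranking).
scores = mutual_info_classif(X, y)
print("InfoGain-style ranking:", sorted(zip(scores, X.columns), reverse=True))
```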
5.2 xval
We ran a script called xval, which performed the following actions:
For this experiment, we set the value of <repeat> to 10, and we used the Nbins
and Fayyad-Irani discretizers. We used five learners: JRip, J48, AODE, Bayes,
and OneR.
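The xval script itself is not reproduced here; the following sketch only shows the general shape of such an experiment in scikit-learn terms, with 10 repeats of 10-fold cross-validation and an equal-width nbins discretizer (scikit-learn has no built-in Fayyad-Irani MDL discretizer, so that variant is omitted). The learner pairing and bin count are our assumptions, and X and y come from the previous sketch.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

# <repeat> = 10: repeat stratified 10-fold cross-validation ten times.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)

for name, learner in [("tree (J48-like)", DecisionTreeClassifier()),
                      ("naive Bayes", GaussianNB())]:
    # Discretize with equal-width bins, then learn, inside each fold.
    model = make_pipeline(
        KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform"),
        learner)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```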
6 Results
Using the rankings obtained from the FSS experiment, we added each attribute
to the dataset sequentially and ran the learning procedure to observe the changes
in accuracy. As shown in Figure 10, accuracy stayed between 83% and 84% for all
combinations after adding the variable FinancialAid Indicator.
Figure 10: Results Comparison for FSS, Accuracy, and Number of Attributes.
(a) Accuracy Using Wrapper and J48; (b) Accuracy Using Wrapper and Bayes
The dataset Data WRP NB J48 was created with the two attributes selected using
the wrapper, and the dataset Data IG was created with the seven attributes selected
using InfoGain, because the tree size was small with seven attributes (see Figure
11). Figure 12 shows the results obtained by using different learners on the
datasets created using Wrapper and InfoGain. Ridor with the nbins discretizer was
the best for the Data WRP NB J48 dataset (highlighted in Figure 12a), and J48
with the Fayyad-Irani discretizer was the best for the Data IG dataset (highlighted in
Figure 12b).
No significant difference was found between these two datasets by any of the
learners; however, statistically, by means of a t-test at the 95% confidence level,
J48 with Fayyad-Irani was the best and OneR with nbins was the worst, as shown
in the win-loss table (Figure 13a). The quartile charts show the margin of a win
or loss over the other learners, as shown in Figure 13b.

Figure 11: Number of Attributes vs. Tree Size using J48
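As an aside on how such a win-loss cell can be computed: a paired t-test over per-fold accuracies decides whether one learner beats another. The sketch below is a minimal illustration with made-up fold scores, not the study's actual numbers.

```python
from scipy.stats import ttest_rel

# Per-fold accuracies of two learners on the same folds (hypothetical).
scores_a = [0.84, 0.83, 0.85, 0.84, 0.83, 0.84, 0.85, 0.83, 0.84, 0.84]
scores_b = [0.81, 0.80, 0.82, 0.81, 0.80, 0.81, 0.82, 0.80, 0.81, 0.81]

t_stat, p_value = ttest_rel(scores_a, scores_b)
if p_value < 0.05:          # 95% confidence level
    winner = "A" if t_stat > 0 else "B"
    print(f"learner {winner} wins (p = {p_value:.4f})")
else:
    print("tie: no significant difference")
```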
7 Conclusions
Overall, financial aid was the most important factor attracting students to
enroll. Students enrolled at this institution if they received some form of financial
aid, regardless of their high school GPA and ACT/SAT scores. Therefore,
financial aid can be used as a controlling factor for increasing the quality of
incoming students.
Figure 12: (a) Results for Wrapper Dataset; (b) Results for InfoGain Dataset
Figure 13: (a) Win-Loss Table; (b) Quartile Chart
FinancialAidIndicator = N
| ApplicationStypCode = 0: N
| ApplicationStypCode = A: N
| ApplicationStypCode = B: Y
| ApplicationStypCode = C: N
| ApplicationStypCode = D: Y
| ApplicationStypCode = E: Y
FinancialAidIndicator = Y: Y
Number of Leaves : 7
Size of the tree : 9
Correctly Classified Instances 93448 83.1462%
Figure 14: J48 tree with two attributes and accuracy of 83.15%
EnrolledIndicator = Y
Except (FinancialAidIndicator = N) and (ApplicationStypCode = A) =>
EnrolledIndicator = N
Except (FinancialAidIndicator = N) and (ApplicationStypCode = C) =>
EnrolledIndicator = N
Total number of rules (incl. the default rule): 3
Correctly Classified Instances 93349 83.0581 %
Figure 15: Ridor rules with two attributes and accuracy of 83.06%
8 Future Work
Attributes such as distance from campus and first method of contact should be
created to see their effect. Although financial aid was the most significant
factor resulting in enrollment, the amount of financial aid offered should also
be included in the data, so that "bins" can be created on the amounts offered
and learners can then use those bins for classification, as sketched below.
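As a minimal sketch of that binning step, assuming a hypothetical FinancialAidAmount column and arbitrary dollar cut points of our own choosing:

```python
import pandas as pd

# Cut hypothetical aid amounts (in dollars) into labeled bins that a
# classifier can then use as a discrete attribute.
data["AidBin"] = pd.cut(
    data["FinancialAidAmount"],
    bins=[0, 2_500, 5_000, 10_000, float("inf")],
    labels=["low", "medium", "high", "full"],
    include_lowest=True)
```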
Even though financial aid helps in recruiting students, it does not necessarily
help in retaining them. In order to find the attributes that affect retention,
the enrolled indicator and a "persistence indicator" should be combined. Similar
experiments would then be necessary to find relationships between student
demographics, academic information, and retention.
9 Acknowledgments
The authors sincerely thank Dr. Tim Menzies, our faculty adviser, and Roberta
Dean, director of institutional research, both from West Virginia University, for
the effort, expertise, and help they offered us on this project.
References
ACT. ACT National Collegiate Retention and Persistence to Degree Rates, 2007.
http://www.act.org/research/policymakers/reports/retain.html.
C.M. Antons and E.N. Maltz. Expanding the role of institutional research at small
private universities: A case study in enrollment management using data mining.
New Directions for Institutional Research, 2006(131):69, 2006.
B.L. Bailey. Let the data talk: Developing models to explain IPEDS graduation rates.
New Directions for Institutional Research, 2006(131):101–115, 2006.
K. Barker, T. Trafalis, and T. R. Rhoads. Learning from student data. Systems and
Information Engineering Design Symposium, pages 79–86, 2004.
L. Chang. Applying data mining to predict college admissions yield: A case study.
New Directions for Institutional Research, 2006(131), 2006.
N. Delavari and M. R. Beikzadeh. A new analysis model for data mining processes in
higher educational systems, 2004.
P.W. Eykamp. Using data mining to explore which students use advanced placement
to reduce time to degree. New Directions for Institutional Research, 2006(131):83,
2006.
M. Greenberg. How the GI Bill changed higher education. June 18, 2004.
K. H. Im, T. H. Kim, S. Bae, and S. C. Park. Conceptual modeling with neural network
for giftedness identification and education. In Advances in Natural Computation,
volume 3611, page 530. Springer, 2005.
T.A. Klein. A fresh look at market segments in higher education. Planning for Higher
Education, 30(1):5, 2001.
J. Luan and A. M. Serban. Data mining and its application in higher education. In
Knowledge Management: Building a Competitive Advantage in Higher Education:
New Directions for Institutional Research. Jossey-Bass, 2002.
Y. Ma, B. Liu, C. K. Wong, P. S. Yu, and S. M. Lee. Targeting the right students
using data mining. In Conference on Knowledge Discovery and Data Mining, pages
457–464, Boston, Massachusetts, 2000. ACM Press New York, NY, USA.
S. Massa and P.P. Puliafito. An application of data mining to the problem of the
university students' dropout using Markov chains. In Principles of Data Mining and
Knowledge Discovery. Third European Conference, PKDD’99, pages 51–60, Prague,
Czech Republic, 1999.
NCPPHE. Retention rates - first-time college freshmen returning their second year
(ACT), 2007.
E.N. Ogor. Student academic performance monitoring and evaluation using data min-
ing techniques. Electronics, Robotics and Automotive Mechanics Conference, 2007.
CERMA 2007, pages 354–359, 2007.
Z.A. Pardos, N.T. Heffernan, B. Anderson, and C.L. Heffernan. Using fine grained
skill models to fit student performance with Bayesian networks. In 8th Interna-
tional Conference on Intelligent Tutoring Systems (ITS 2006), pages 5–12, Jhongli,
Taiwan, 2006.
A. Salazar, J. Gosalbez, I. Bosch, R. Miralles, and L. Vergara. A case study of knowl-
edge discovery on academic achievement, student desertion and student retention.
Information Technology: Research and Education, 2004. ITRE 2004. 2nd Interna-
tional Conference on, pages 150–154, 2004.
V. Tinto. Limits of Theory and Practice in Student Attrition. The Journal of Higher
Education, 53(6):687–700, 1982.
I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Tech-
niques. Morgan Kaufmann Publishers, San Francisco, 2nd edition, 2005.
Chong Ho Yu, Samuel DiGangi, Angel Jannasch-Pennell, Wenjuo Lo, and Charles
Kaprolet. A data-mining approach to differentiate predictors of retention between
online and traditional students, 2007.