A Survey On Research Work in Educational Data Mining
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 2, Ver. II (Mar-Apr. 2015), PP 43-49
www.iosrjournals.org
1 Associate Professor, MCA Department, MVSR Engineering College, Osmania University, Hyderabad, INDIA
2 Professor & Dean, Department of Informatics, Osmania University, Hyderabad, INDIA
3 Associate Professor, IT Department, MVSR Engineering College, Osmania University, Hyderabad, INDIA
Abstract: Educational Data Mining is an emerging discipline that applies data mining tools and
techniques to educational data. The discipline focuses on analyzing such data to develop
models for improving learning experiences and institutional effectiveness. This paper reviews the literature on
educational data mining, covering topics such as student retention and attrition, personal recommender systems
within education, and the use of data mining to analyze course management system data. Gaps in the
current literature and opportunities for further research are presented.
Keywords: Data mining, Educational Data Mining, Student Modelling, Student Retention, Recommendation
Systems, Learning Experience
I. Introduction
Educational Data Mining (EDM) is growing at a very fast pace. The main aim of EDM is to develop methods for exploring
the unique types of data that come from educational institutions and to use those methods to better understand
students and their learning environments. Educational data, whatever its source, has
multiple levels of meaningful hierarchy, which are determined by properties of the data itself rather than in advance.
Time, sequence, and context also play important roles in the study of educational data.
The International Educational Data Mining Society was formed to support collaboration
and scientific development in this area. To realize its objectives, the society organizes a series of conferences,
publishes a journal, and develops community resources for sharing data and techniques.
EDM deals with mining large sets of educational data to answer educational research questions.
These data sets may come from learning management systems, interactive learning environments, intelligent
tutoring systems, or any system used in a learning context. The types of data range from raw log files to
eye-tracking and other sensor data. EDM is interdisciplinary research and may require adapting existing
approaches or developing new ones that build upon techniques from areas such as statistics,
psychometrics, machine learning, information retrieval, recommender systems and scientific computing.
This survey features innovative basic and applied research centered on data
mining, education and learning technologies. It includes a diverse set of papers spanning the fields of
Machine Learning, Artificial Intelligence, Learning Technologies, Education, Linguistics and Psychology.
These papers study the application of data mining to data generated by various information systems
that support learning or education. They also deal with EDM applications that have an actual impact on the future of
learning and teaching. The papers are contributed by researchers from computer science, machine learning and data
mining, artificial intelligence in education, intelligent tutoring systems, education, learning sciences,
psychometrics, statistics and cognitive psychology.
II. Literature Survey
Educational data mining is emerging as a research area with a suite of computational and psychological methods
and research approaches for understanding how students learn.
2.1 Student Modelling Research:
Student modelling is the major area of research in EDM. Work in this area ranges from
automated improvement of student models and unified discovery of student and cognitive models to the impact of
individualizing student models on practice opportunities. A technique for automated improvement of student
models has been presented that covers data sets ranging from intelligent tutors to games. The improvements highlight flaws
in the original model, which can lead to new insights into the learning process and thereby improve tutor design.
The unified model, called Dynamic Cognitive Tracing, expresses student learning in terms of skill
mastery over time by simultaneously building the student and cognitive models.
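To make the idea of tracking skill mastery over time concrete, the following is a minimal sketch of standard Bayesian Knowledge Tracing (BKT) for a single skill. It is not an implementation of Dynamic Cognitive Tracing, which additionally learns the cognitive model. The function names and all parameter values (initial mastery, learning, slip and guess probabilities) are illustrative assumptions, not values taken from the surveyed papers.

```python
def predict_correct(p_know, p_slip=0.1, p_guess=0.2):
    """Probability that the next attempt is correct, given current mastery."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess


def update_mastery(p_know, correct, p_learn=0.15, p_slip=0.1, p_guess=0.2):
    """One BKT step: condition on the observed attempt, then apply learning.

    All default parameter values are illustrative assumptions.
    """
    if correct:
        evidence = p_know * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
    # The student may also acquire the skill after this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def trace_mastery(attempts, p_init=0.3, **params):
    """Return the mastery estimate after each attempt (sequence of booleans)."""
    p_know = p_init
    history = []
    for correct in attempts:
        p_know = update_mastery(p_know, correct, **params)
        history.append(p_know)
    return history


if __name__ == "__main__":
    attempts = [False, True, True, True]
    for outcome, mastery in zip(attempts, trace_mastery(attempts)):
        print(f"correct={outcome!s:5}  P(mastered)={mastery:.3f}")
```

In this sketch a correct attempt raises the mastery estimate and an incorrect one lowers it, while `predict_correct` turns the current estimate into a prediction for the next attempt. In practice the four parameters would be fitted per skill from student performance data rather than fixed by hand.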
Limits to Accuracy: How Well Can We Do at Student Modelling (predicting a student's next attempt): Here
a student modelling approach is used to predict whether a student's next attempt will be correct. Many student
DOI: 10.9790/0661-17224349
Summary table (extracted fragments)
Problem Statement: Explore potential reasons behind the inability to create highly accurate models.
Field: Predicting next item correctness; Student Modeling; Predictive State Representation, Spectral algorithm, classification; Inferring knowledge; Improving student models; Speech Act Classification; Multiple Graphical Representation; Student Modeling; Bayesian Knowledge Tracing; Classification; Student Mastery Learning Assessment.
Future work:
- Construct student models that could detect student behaviors like boredom, frustration, discouragement and retention, instead of just using them for determining whether a student is learning or not.
- Learning complex latent variable models (variations of BKT) directly from student performance data.

IV. Conclusion
This paper presents research work carried out on Educational Data Mining by several research
scholars and professional experts. A wide variety of EDM applications are discussed, namely
improving student models, discovering or improving models of the knowledge structure of the domain,
studying the pedagogical support provided by learning software, and scientific discovery about learning and
learners. With discovery with models as a key method, EDM offers considerable scope for researchers and software
developers. A final recommendation is to create and continue strong collaboration across the research, commercial,
and educational sectors. Commercial companies operate on fast development cycles and can produce data useful
for research.