Handbook of Educational Data Mining, 1st Edition
Author(s): Cristobal Romero, Sebastian Ventura, Mykola Pechenizkiy, Ryan
S.J.d. Baker
ISBN(s): 9781439804582, 1439804583
Edition: 1
Year: 2010
Language: english
Handbook of
Educational Data Mining
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A
PUBLISHED TITLES
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
TEMPORAL DATA MINING
Theophano Mitsa
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
Handbook of
Educational Data Mining
Edited by
Cristóbal Romero, Sebastian Ventura,
Mykola Pechenizkiy, and Ryan S.J.d. Baker
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the
accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products
does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular
use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my wife, Ana, and my son, Cristóbal
Cristóbal Romero
Sebastián Ventura
Mykola Pechenizkiy
Ryan S. J. d. Baker
Contents
Preface
Editors
Contributors
1. Introduction
Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S. J. d. Baker
11. Novel Derivation and Application of Skill Matrices: The q-Matrix Method
Tiffany Barnes
29. Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks
Zachary A. Pardos, Neil T. Heffernan, Brigham S. Anderson, and Cristina L. Heffernan
Preface
are strong in statistical and computational techniques, but techniques and data are not sufficient to advance a scientific domain; researchers with a basic understanding of the teaching and learning process are also required. Thus, education researchers and psychologists are key participants in the EDM community.
teacher is a better choice. The first example is at the edge of what EDM is capable of; the second is, for now, beyond our capabilities.
This job of expanding our horizons and determining which new, exciting questions to ask of the data is necessary for EDM to grow.
The third avenue of EDM is identifying educational stakeholders who could benefit from the richer reporting made possible with EDM. Obvious interested parties are students and teachers. However, what about the students' parents? Would it make sense for them to receive reports? Aside from report cards and parent–teacher conferences, there is little communication to parents about their child's performance. Most parents are too busy for a detailed report of their child's school day, but what about some distilled information? A system that informed parents if their child did not complete the homework that was due that day could be beneficial. Similarly, if a student's performance noticeably declines, such a change would be detectable using EDM and the parents could be informed. Other stakeholders include school principals, who could be informed of teachers who were struggling relative to their peers, and of areas in which the school was performing poorly. Finally, there are the students themselves. Although students currently receive an array of grades on homework, quizzes, and exams, they receive far less information at a coarser grain size, such as suggestions, based on their past performance, about which classes to take, or a notice that their homework scores are lower than expected given their exam performance. Note that such features also change the context of educational data from something that is used in the classroom to something that is potentially used in a completely different place.
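The two parent-notification scenarios above (missed homework and a noticeable decline in performance) can be sketched in a few lines. The Python below is a minimal, hypothetical illustration: `missed_homework` compares the assignments due today with those submitted, and `detect_decline` flags a decline when the mean of a student's most recent scores falls more than a chosen threshold below the mean of the earlier scores. The function names, the window size, and the threshold are assumptions for illustration; the preface does not specify how such a decline would be detected.

```python
from statistics import mean


def missed_homework(submitted: set[str], due_today: set[str]) -> set[str]:
    """Return the IDs of assignments due today that were not submitted."""
    return due_today - submitted


def detect_decline(scores: list[float], window: int = 3, threshold: float = 10.0) -> bool:
    """Flag a noticeable decline: the mean of the last `window` scores falls
    more than `threshold` points below the mean of all earlier scores.
    Window and threshold are arbitrary, hypothetical defaults."""
    if len(scores) <= window:
        return False  # not enough history to compare against
    earlier = mean(scores[:-window])
    recent = mean(scores[-window:])
    return earlier - recent > threshold


if __name__ == "__main__":
    print(missed_homework({"hw1", "hw2"}, {"hw2", "hw3"}))  # {'hw3'}
    print(detect_decline([85, 88, 90, 86, 70, 72, 68]))     # True
```

As the next paragraph notes, the detection logic is the easy part; a real deployment would also have to integrate with a school's IT infrastructure.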
Research in this area focuses on expanding the list of stakeholders for whom we can provide information, and where this information is received. Although much potential work in this area is not technically demanding (notifying parents of missed homework assignments is simple enough), such work has to integrate with a school's IT infrastructure, and it changes the ground rules. Previously, teachers and students controlled the flow of information to parents; now parents are getting information directly. Overcoming such issues is challenging. Therefore, although this area has seen some attention, it remains relatively unexplored by EDM researchers.
The field of EDM has grown substantially in the past five years, with the first workshop referred to as "Educational data mining" occurring in 2005. Since then, the field has held its third international conference (in 2010), seen one book published, and established its own online journal, and it is now having this book published. This growth is exciting for multiple reasons. First, education is a fundamentally important topic that cuts across countries and cultures, rivaled in importance only by medicine and health. Being able to better answer age-old questions in education, as well as finding ways to answer questions that have not yet been asked, is an activity that will have a broad impact on humanity. Second, doing effective educational research is no longer about having a large team of graduate assistants to score and code data, and sufficient offices with filing cabinets to store the results. Public repositories of educational data sets allow others to try their hand at EDM, and anyone with a computer and Internet connection can join the community. Thus, a much larger and broader population can participate in helping improve the state of education.
This book is a good first step for anyone wishing to join the EDM community, or for
active researchers wishing to keep abreast of the field. The chapters are written by key
EDM researchers, and cover many of the field’s essential topics. Thus, the reader gets a
broad treatment of the field by those on the front lines.
Joseph E. Beck
Worcester Polytechnic Institute, Massachusetts
Editors
Contributors
Cristina L. Heffernan
Department of Computer Science
Worcester Polytechnic Institute
Worcester, Massachusetts

Neil T. Heffernan
Department of Computer Science
Worcester Polytechnic Institute
Worcester, Massachusetts

Cecily Heiner
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania
and
Computer Science Department
University of Utah
Salt Lake City, Utah

Arnon Hershkovitz
Knowledge Technology Lab
School of Education
Tel Aviv University
Tel Aviv, Israel

Earl Hunt
Department of Psychology
University of Washington
Seattle, Washington

Octavio Juarez
Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Brian W. Junker
Department of Statistics
Carnegie Mellon University
Pittsburgh, Pennsylvania

Judy Kay
School of Information Technologies
University of Sydney
Sydney, New South Wales, Australia

Kenneth R. Koedinger
Human–Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Irena Koprinska
School of Information Technologies
University of Sydney
Sydney, New South Wales, Australia

Brett Leber
Human–Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Tara M. Madhyastha
Department of Psychology
University of Washington
Seattle, Washington

David Masip
Department of Computer Science, Multimedia and Telecommunications
Universitat Oberta de Catalunya
Barcelona, Spain

Manolis Mavrikis
London Knowledge Lab
The University of London
London, United Kingdom

Riccardo Mazza
Faculty of Communication Sciences
University of Lugano
Lugano, Switzerland
and
Department of Innovative Technologies
University of Applied Sciences of Southern Switzerland
Manno, Switzerland

Victor H. Menendez
Facultad de Matemáticas
Universidad Autónoma de Yucatán
Merida, Mexico

Agathe Merceron
Media and Computer Science Department
Beuth University of Applied Sciences
Berlin, Germany

John C. Nesbit
Faculty of Education
Simon Fraser University
Burnaby, British Columbia, Canada

Engelbert Mephu Nguifo
Department of Computer Sciences
Université Blaise-Pascal Clermont 2
Clermont-Ferrand, France

Amelia Zafra
Department of Computer Science and Numerical Analysis
University of Cordoba
Cordoba, Spain
1
Introduction
Contents
1.1 Background
1.2 Educational Applications
1.3 Objectives, Content, and How to Read This Book
References
1.1 Background
In recent years, researchers from a variety of disciplines (including computer science, statistics, data mining, and education) have begun to investigate how data mining can improve education and facilitate education research. Educational data mining (EDM) is increasingly recognized as an emerging discipline [10]. EDM focuses on the development of methods for exploring the unique types of data that come from an educational context. These data come from several sources, including traditional face-to-face classroom environments, educational software, online courseware, and summative/high-stakes tests. These sources increasingly provide vast amounts of data, which can be analyzed to address questions that were not previously feasible to answer, such as questions involving differences between student populations or uncommon student behaviors. EDM is contributing to education and education research in a multitude of ways, as the diversity of educational problems considered in the following chapters of this volume shows. EDM's contributions have influenced thinking on pedagogy and learning, and have promoted the improvement of educational software, increasing its capacity to individualize students' learning experiences. As EDM has matured as a research area, it has produced a conference series (The International Conference on Educational Data Mining, which as of 2010 was in its third iteration), a journal (the Journal of Educational Data Mining), and a number of highly cited papers (see [2] for a review of some of the most highly cited EDM papers).
These contributions in education build on data mining's past impacts in other domains such as commerce and biology [11]. In some ways, the advent of EDM can be considered education "catching up" to other areas, where improved methods for exploiting data have promoted transformative impacts in practice [4,7,12]. Although the discovery methods used across domains are similar (e.g., [3]), there are some important differences between them. For instance, in comparing the use of data mining within e-commerce and EDM, there are the following differences:
2 Handbook of Educational Data Mining
[Figure: Users (students, learners, instructors, teachers, course administrators, academic researchers, school district officials) use, interact with, participate in, design, plan, build, and maintain Educational systems (traditional classrooms, e-learning systems, LMSs, web-based adaptive systems, intelligent tutoring systems, questionnaires and quizzes). The systems provide and store course information, contents, academic data, grades, and student usage and interaction data. Data mining techniques (statistics, visualization, clustering, classification, association rule mining, sequence mining, text mining) model learners and learning, communicate findings, and make recommendations.]
FIGURE 1.1
Applying data mining to the design of educational systems.
Chapters 2 through 4, 9, 12, 22, 24, and 28 discuss methods and case studies for this
category of application.
• Maintaining and improving courses. The objective is to help course administrators and educators determine how to improve courses (contents, activities, links, etc.), using information, in particular about student usage and learning. The most frequently used techniques for this type of goal are association, clustering, and classification. Chapters 7, 17, 26, and 34 discuss methods and case studies for this category of application.
• Generating recommendations. The objective is to recommend to students which content (or tasks or links) is most appropriate for them at the current time. The most frequently used techniques for this type of goal are association, sequencing, classification, and clustering. Chapters 6, 8, 12, 18, 19, and 32 discuss methods and case studies for this category of application.
• Predicting student grades and learning outcomes. The objective is to predict a student’s
final grades or other types of learning outcomes (such as retention in a degree
program or future ability to learn), based on data from course activities. The most
frequently used techniques for this type of goal are classification, clustering, and
association. Chapters 5 and 13 discuss methods and case studies for this category
of application.
• Student modeling. User modeling in the educational domain has a number of applications, including, for example, the detection (often in real time) of student states and characteristics such as satisfaction, motivation, learning progress, or certain types of problems that negatively impact their learning outcomes (making too many errors, misusing or underusing help, gaming the system, inefficiently exploring learning resources, etc.), affect, learning styles, and preferences. The common objective here is to create a student model from usage information. The techniques frequently used for this type of goal include not only clustering, classification, and association analysis, but also statistical analyses, Bayes networks (including Bayesian Knowledge-Tracing), psychometric models, and reinforcement learning. Chapters 6, 12, 14 through 16, 20, 21, 23, 25, 27, 31, 33, and 35 discuss methods and case studies for this category of application.
• Domain structure analysis. The objective is to determine domain structure, using the ability to predict student performance as a measure of the quality of a domain structure model. Performance on tests or within a learning environment is utilized for this goal. The most frequently used techniques for this type of goal are association rules, clustering methods, and space-searching algorithms. Chapters 10, 11, 29, and 30 discuss methods and case studies for this category of application.
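The student-modeling category above lists Bayesian Knowledge-Tracing (BKT) among its techniques. As a hedged illustration, the Python sketch below implements the commonly used four-parameter form of BKT: the probability that a skill is already known before practice, the probability of learning it at each opportunity, and the guess and slip probabilities. After each answer, the estimate of P(known) is conditioned on the observation with Bayes' rule and then updated for the chance of learning. The class and function names and the parameter values in the usage example are illustrative assumptions, not taken from this handbook; in practice the parameters are fitted to student data.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # P(L0): probability the skill is already known before practice
    p_learn: float  # P(T): probability of learning the skill at each opportunity
    p_guess: float  # P(G): probability of answering correctly without knowing the skill
    p_slip: float   # P(S): probability of answering incorrectly despite knowing it


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """One BKT step: condition P(known) on the observed answer with Bayes' rule,
    then apply the probability of learning at this opportunity."""
    if correct:
        evidence_known = p_known * (1 - params.p_slip)
        evidence_unknown = (1 - p_known) * params.p_guess
    else:
        evidence_known = p_known * params.p_slip
        evidence_unknown = (1 - p_known) * (1 - params.p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Even if the skill was not known, the student may learn it at this step.
    return posterior + (1 - posterior) * params.p_learn


def trace(observations: list[bool], params: BKTParams) -> list[float]:
    """Return the estimated P(known) after each observed answer."""
    p_known = params.p_init
    history = []
    for correct in observations:
        p_known = bkt_update(p_known, correct, params)
        history.append(p_known)
    return history


if __name__ == "__main__":
    # Illustrative parameter values only; real values are fitted to student data.
    params = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
    for step, p in enumerate(trace([False, True, True, True], params), start=1):
        print(f"after answer {step}: P(known) = {p:.3f}")
```

With these example values, correct answers raise the estimate and incorrect answers lower it, because a correct answer is far more likely from a student who knows the skill (1 − slip = 0.9) than from one who is guessing (0.2).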
courses and primary and secondary schools. For instance, 6% of U.S. high schools now use
Cognitive Tutor software for mathematics learning (cf. [6]). As these environments become
more widespread, ever-larger collections of data are accumulating in educational data
repositories. A case study on one of the largest of these repositories is given in the chapter
on the PSLC DataShop by Koedinger and colleagues.
This expansion of data has led to increasing interest among education researchers in a variety of disciplines, and among practitioners and educational administrators, in tools and techniques for analyzing the accumulated data in order to improve understanding of learners and the learning process, to drive the development of more effective educational software, and to support better educational decision-making. This interest has become a driving force for EDM. We believe that this book can support researchers and practitioners in integrating EDM into their research and practice, and can help bring the educational and data mining communities together, so that education experts understand what types of questions EDM can address, and data miners understand what types of questions are important to educational design and educational decision-making.
This volume, the Handbook of Educational Data Mining, consists of two parts. In the first part, we offer nine surveys and tutorials about the principal data mining techniques that have been applied in education. In the second part, we give a set of 25 case studies, offering readers a rich overview of the problems on which EDM has gained leverage.
The book is structured so that it can be read in its entirety, first introducing concepts and methods and then showing their applications. However, readers can also focus on areas of specific interest, as outlined in the categorization of educational applications. We welcome readers to the field of EDM and hope that this book is of value to their research or practical goals. If you enjoy this book, we hope that you will join us at a future iteration of the Educational Data Mining conference; see www.educationaldatamining.org for the latest information and to subscribe to our community mailing list, edm-announce.
References
1. Arruabarrena, R., Pérez, T. A., López-Cuadrado, J., and Vadillo, J. G. J. (2002). On evaluating
adaptive systems for education. In International Conference on Adaptive Hypermedia and Adaptive
Web-Based Systems, Málaga, Spain, pp. 363–367.
2. Baker, R. S. J. d. and Yacef, K. (2009). The state of educational data mining in 2009: A review and
future visions. Journal of Educational Data Mining, 1(1), 3–17.
3. Hanna, M. (2004). Data mining in the e-learning domain. Computers and Education Journal, 42(3),
267–287.
4. Hirschman, L., Park, J. C., Tsujii, J., Wong, W., and Wu, C. H. (2002). Accomplishments and challenges in literature data mining for biology. Bioinformatics, 18(12), 1553–1561.
5. Ingram, A. (1999). Using web server logs in evaluating instructional web sites. Journal of
Educational Technology Systems, 28(2), 137–157.
6. Koedinger, K. and Corbett, A. (2006). Cognitive tutors: Technology bringing learning science to
the classroom. In K. Sawyer (Ed.), The Cambridge Handbook of the Learning Sciences. Cambridge,
U.K.: Cambridge University Press, pp. 61–78.
7. Lewis, M. (2004). Moneyball: The Art of Winning an Unfair Game. New York: Norton.
8. Li, J. and Zaïane, O. (2004). Combining usage, content, and structure data to improve web
site recommendation. In International Conference on Ecommerce and Web Technologies, Zaragoza,
Spain, pp. 305–315.
9. Pahl, C. and Donnellan, C. (2003). Data mining technology for the evaluation of web-based
teaching and learning systems. In Proceedings of the Congress e-Learning, Montreal, Canada.
10. Romero, C. and Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert
Systems with Applications, 33(1), 135–146.
11. Srivastava, J., Cooley, R., Deshpande, M., and Tan, P. (2000). Web usage mining: Discovery and
applications of usage patterns from web data. SIGKDD Explorations, 1(2), 12–23.
12. Shaw, M. J., Subramanian, C., Tan, G. W., and Welge, M. E. (2001). Knowledge management and
data management for marketing. Decision Support Systems, 31(1), 127–137.
Part I
2
Visualization in Educational Environments
Riccardo Mazza
Contents
2.1 Introduction
2.2 What Is Information Visualization?
2.2.1 Visual Representations
2.2.2 Interaction
2.2.3 Abstract Data
2.2.4 Cognitive Amplification
2.3 Design Principles
2.3.1 Spatial Clarity
2.3.2 Graphical Excellence
2.4 Visualizations in Educational Software
2.4.1 Visualizations of User Models
2.4.1.1 UM/QV
2.4.1.2 ViSMod
2.4.1.3 E-KERMIT
2.4.2 Visualizations of Online Communications
2.4.2.1 Simuligne
2.4.2.2 PeopleGarden
2.4.3 Visualizations of Student-Tracking Data
2.5 Conclusions
References
2.1 Introduction
This chapter presents an introduction to information visualization, a new discipline with origins in the late 1980s that is part of the field of human–computer interaction. We will illustrate the purposes of this discipline, its basic concepts, and some design principles that can be applied to graphically render students' data from educational systems. The chapter starts with a description of information visualization, followed by a discussion of some design principles defined by leading scholars in the field. Finally, it describes some systems in which visualizations have been used in learning environments to represent user models, discussions, and tracking data.
* A Dictionary of Computing. Oxford University Press, 1996. Oxford Reference Online. Oxford University Press.
Visualization in Educational Environments 11
the end. This can be defined as a scrutiny task, because it is a conscious operation that
involves memory, semantics, and symbolism.
Let us try the same operation, this time using the bars on the left. Thanks to the pre-attentive property of length, the length of the bars allows us to identify the longest and the shortest almost immediately.
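To make the contrast concrete, the hypothetical Python sketch below renders a few made-up values as text bars. In the output, the longest and shortest bars stand out at a glance, whereas finding the largest and smallest of the raw numbers requires scanning and comparing them one by one. The `render_bars` function and the sample values are assumptions for illustration, not part of the chapter.

```python
def render_bars(values: dict[str, int]) -> str:
    """Render each labeled value as a horizontal bar of '#' characters,
    followed by the number itself, one bar per line."""
    width = max(len(label) for label in values)  # align bars after the longest label
    lines = [
        f"{label.ljust(width)} | {'#' * value} {value}"
        for label, value in values.items()
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; the extremes are visible at a glance as bar lengths.
    print(render_bars({"A": 12, "B": 27, "C": 5, "D": 19}))
```

Length is used here because it is the pre-attentive property the passage discusses; the numeric label is kept only so the bars can be checked against the data.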
Graphical representations are often associated with the term "visualization" (or "visualisation" in British spelling). Spence [16] has noted that the term "visualization" is used in diverse ways. For instance, in a dictionary the following definitions can be found:
These definitions reveal that visualization is an activity in which humans are engaged, as an internal construct of the mind [16,20]. It is something that cannot be printed on paper or displayed on a computer screen. In summary, visualization is a cognitive activity, facilitated by external graphical representations, from which people construct an internal mental representation of the world [16,20].
Computers can facilitate the visualization process through visualization tools. This is especially true in recent years, with the availability of powerful computers at low cost. However, the above definition is independent of computers: although computers can facilitate visualization, it still remains an activity that happens in the mind.
2.2.2 Interaction
Recently there has been great progress in high-performance, affordable computer graphics. The common personal computer has reached a graphics power that just 10 years ago was possible only with very expensive graphics workstations built specifically for graphics processing. At the same time, there has been a rapid expansion in the information that people have to process in their daily activities. This need led scientists to explore new ways to represent huge amounts of data with computers, taking advantage of the possibility of users interacting with the algorithms that create the graphical representation. Interactivity draws on people's ability to identify interesting facts when the visual display changes, and it allows them to manipulate the visualization or the underlying data to explore such changes.
* The Concise Oxford Dictionary. Ed. Judy Pearsall. Oxford University Press, 2001. Oxford Reference Online. Oxford University Press.
† A Dictionary of Computing. Oxford University Press, 1996. Oxford Reference Online. Oxford University Press.
‡ Merriam-Webster Online Dictionary. http://www.webster.com
earth), and data that is more abstract in nature (e.g., stock market fluctuations). The former is known as scientific visualization, and the latter as IV [4,16,19].
Scientific visualization was developed in response to the needs of scientists and engineers to view experimental or phenomenal data in graphical formats (an example is given in Figure 2.2), while IV, a distinct flavor, deals with unstructured data sets [4]. Table 2.1 lists some examples of abstract data and physical data.
FIGURE 2.2
Example of scientific visualization: The ozone hole over the South Pole on September 22, 2004. (Image from the NASA Goddard Space Center archives and reproduced with permission.)
However, this distinction is not strict, and sometimes abstract and physical data are combined in a single representation. For instance, the results of the Swiss federal referendum of September 24, 2006, on changing the Swiss law on asylum can be considered a sort of abstract data if the goal of the graphical representation is to highlight the preference (yes or no) with respect to the voters' social status, age, sex, etc. But if we want to highlight the percentage of yes votes in each town, a map based on geographical location can help show how the linguistic regions, the cantons, and proximity to the border influenced the electorate's choice (see Figure 2.3).
$$
\begin{array}{r}
27 \\
\times\ 42 \\
\hline
54 \\
1080 \\
\hline
1134
\end{array}
$$
This example shows how the visual and manipulative use of external representations and processing amplifies cognitive performance. Graphics use visual representations that help to amplify cognition: they convey information to our minds in a form that allows us to search
TABLE 2.1
Some Examples of Abstract Data and Physical Data
Abstract Data       Physical Data
Names               Data gathered from instruments
Grades              Simulations of wind flow
News or stories     Geographical locations
Jobs                Molecular structure
Visualization in Educational Environments 13
[Map of Switzerland showing the provisional results of the federal vote of 24 September 2006 (vote no. 525): turnout 48.4%, share of "yes" votes 67.8%. Source: Swiss Federal Statistical Office (BFS/OFS), voting statistics; © BFS, ThemaKart, Neuchâtel 2006.]
FIGURE 2.3
Graphical representation of the results of the federal referendum in Switzerland on September 24, 2006. (Image from the Swiss Federal Statistical Office, http://www.bfs.admin.ch. © Bundesamt für Statistik, ThemaKart 2009, reproduced with permission.)
for patterns, recognize relationships between data, and make some inferences more easily.
Card et al. [2] propose six major ways in which visualizations can amplify cognition by
[Schematic map of the Madrid metro network (lines 1 to 12) with station names. © 2007 designed and drawn by Matthew McLauchlin, http://www.metrodemontreal.com/; released under Creative Commons Share-Alike Attribution Licence (CC-SA-BY 2.5), http://creativecommons.org/licenses/by-sa/2.5/; not affiliated with, released by, or approved of by the Consorcio Regional de Transportes de Madrid (http://www.metromadrid.es/).]
FIGURE 2.4
Map of the Madrid metro system. (Images licensed under Creative Commons Share-Alike.)
Following these principles is the key to achieving what Tufte calls graphical excellence, which consists in "giving the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space" [18].
A key question in IV is how to convert abstract data into a graphical representation that preserves the underlying meaning and, at the same time, provides new insight. There is no "magic formula" that lets researchers systematically build a graphical representation from a raw data set. The right representation depends on the nature of the data, the type of information to be represented, and its intended use, but above all on the creativity of its designer. Some interesting ideas, even innovative ones, have often failed in practice.
Graphics facilitate IV, but a number of issues must be considered [16,18]:
2.4.1.1 UM/QV
QV [6] is an overview interface for UM [5], a toolkit for cooperative user modeling. A model is structured as a hierarchy of elements of the domain, and QV uses a hierarchical representation of concepts to present the user model. For instance, Figure 2.5 gives a graphical representation of a model showing concepts of the SAM text editor. It gives a quick overview of whether the user appears to know each element of the domain. QV exploits different types
[Tree diagram of a user model rooted at "Root"; node labels include Editors, Sam, Mouse, emacs, vi, Programming languages, pascal_c, lisp_c, fortran_c, user_info, and wpm_info.]
FIGURE 2.5
The QV tool showing a user model. (Image courtesy of Judy Kay.)
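The idea of a hierarchical overview can be sketched as a tree of concepts whose leaves carry a "known" belief, with each inner node summarizing how many of its leaves the user appears to know. This is a minimal sketch under stated assumptions: the `Concept` class, the `overview` function, and the boolean belief per leaf are hypothetical illustrations, not QV's or UM's actual data model, and the concept names are loosely borrowed from Figure 2.5 with invented values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Hypothetical node of a hierarchical user model (not QV's real structure)."""
    name: str
    known: Optional[bool] = None          # belief for leaf concepts; None = no evidence
    children: List["Concept"] = field(default_factory=list)

    def counts(self):
        """Return (known_leaves, total_leaves) for the subtree rooted here."""
        if not self.children:
            return (1 if self.known else 0), 1
        known = total = 0
        for child in self.children:
            k, t = child.counts()
            known += k
            total += t
        return known, total


def overview(node, depth=0):
    """One text line per concept; indentation mirrors the hierarchy."""
    known, total = node.counts()
    lines = [f"{'  ' * depth}{node.name}: {known}/{total} known"]
    for child in node.children:
        lines.extend(overview(child, depth + 1))
    return lines


# Invented example model (values are illustrative only).
model = Concept("Root", children=[
    Concept("Editors", children=[
        Concept("sam", True), Concept("emacs", False), Concept("vi", True)]),
    Concept("Programming languages", children=[
        Concept("pascal_c", True), Concept("lisp_c", False)]),
])

print("\n".join(overview(model)))
```

Aggregating at every level is what makes such a view useful as an overview: a glance at the top lines shows which branches of the domain the user seems to master before drilling down.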
2.4.1.2 ViSMod
ViSMod [22] is an interactive visualization tool for the representation of Bayesian learner models. In ViSMod, learners and instructors can inspect the learner model through a graphical representation of the Bayesian network. ViSMod uses concept maps to render a Bayesian student model, and it uses various visualization techniques such as color, size, proximity, link thickness, and animation to represent concepts such as marginal probability, changes in probability, probability propagation, and cause–effect relationships (Figure 2.6). One interesting aspect of this model is that the overall belief that a student knows a particular concept is captured by taking into account the student's opinion, the instructor's opinion, and the influence of social aspects of learning on each concept. Using ViSMod, it is possible to inspect complex networks by focusing on a particular segment (e.g., by zooming or scrolling) and to use animations to represent how probability propagation occurs in a simple network in which several causes affect a single node.
FIGURE 2.6
A screenshot of ViSMod showing a fragment of a Bayesian student model in the area of cell biology. (Image courtesy of Diego Zapata-Rivera.)
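To make "marginal probability" and "probability propagation" concrete for the case of several causes affecting a single node, the sketch below performs exact inference by enumeration on a three-node network. It rests on stated assumptions: the concept names (`membrane`, `organelles`, `cell_function`) and all probabilities are invented, and the sketch does not reproduce ViSMod's own model, which additionally combines student, instructor, and social evidence.

```python
from itertools import product

# Hypothetical priors: probability that the student knows each parent concept.
PRIORS = {"membrane": 0.7, "organelles": 0.6}

# Hypothetical conditional table: P(student knows "cell_function" | parents).
CPT = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(m, o, c):
    """Joint probability of one full assignment of the three concept nodes."""
    p = PRIORS["membrane"] if m else 1 - PRIORS["membrane"]
    p *= PRIORS["organelles"] if o else 1 - PRIORS["organelles"]
    p_child = CPT[(m, o)]
    return p * (p_child if c else 1 - p_child)


def marginal_child():
    """Marginal P(cell_function known): sum out both parent concepts."""
    return sum(joint(m, o, True) for m, o in product([True, False], repeat=2))


def posterior_parent(parent, child_known):
    """P(parent known | observed child): evidence propagated back to a cause."""
    idx = 0 if parent == "membrane" else 1
    num = den = 0.0
    for m, o in product([True, False], repeat=2):
        p = joint(m, o, child_known)
        den += p
        if (m, o)[idx]:
            num += p
    return num / den


print(round(marginal_child(), 3))  # 0.602
```

Observing that the student knows `cell_function` raises the belief in `membrane` from the 0.7 prior to about 0.86. This kind of change is what ViSMod animates when it shows propagation through the network. Enumeration is only practical for tiny networks; larger models need proper inference algorithms.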
2.4.1.3 E-KERMIT
KERMIT (Knowledge-based Entity Relationship Modelling Intelligent Tutor) [17] is a knowledge-based intelligent tutoring system aimed at teaching conceptual database design to university students. KERMIT teaches basic entity-relationship (ER) database modeling by presenting the student with the requirements for a database, for which the student has to design an ER diagram. E-KERMIT is an extension of KERMIT with an open student model, developed by Hartley and Mitrovic [3]. In E-KERMIT, the student may examine the global view of the student model through a dedicated interface (see Figure 2.7). The course domain is divided into categories representing the processes and concepts in ER modeling. In the representation of the open student model, the concepts of the domain are mapped to histograms. Each histogram shows how much of the corresponding part of the domain the student knows correctly (in black) or incorrectly (in gray), and what percentage of the concepts of the category the student has covered. For instance, the example shows that the student has covered 32% of the concepts of the category Type and has scored 23% out of a possible 32% on this category. This means that the student's performance on the category Type so far is about 72% (23/32 × 100).
FIGURE 2.7
The main view of a student’s progress in E-KERMIT. Progress bars indicate how much the student compre-
hends each category of the domain. (Image reproduced with permission of Tanja Mitrovic.)
2.4.2.1 Simuligne
Simuligne [12] is a research project that uses social network analysis [14] to monitor group communications in distance learning, in order to help instructors detect collaboration problems or slowdowns in group interactions. Social network analysis is a research field that aims to "characterize the group's structure and, in particular, the influence of each of the members on that group, reasoning on the relationship that can be observed in that group" (Reffay and Chanier [12], p. 343). It provides both a graphical and a mathematical analysis of interactions between individuals. The graphical version can be represented as a network, where the nodes are the individuals and groups and the links show relationships or flows between the nodes. Social network analysis can help determine the prominence of a student with respect to others, as well as other measures used by social network researchers, such as the cohesion factor between students. Cohesion is a statistical measure that represents how much the individuals in a group that shares goals and values socialize with each other. Reffay and Chanier applied this theory to a list of e-mails exchanged in a class of distance learners. Figure 2.8 illustrates the graphical representation of the e-mail graph for each learning group. We can see, for instance, that there is no communication with Gl2 and Gl3, and that the tutor (node Gt) plays a central role in the discussions.
[Graph of e-mail exchanges within the learning groups, with nodes such as Gl1, Gl4, Gl6, Gn1, and Gn2 and numeric labels on the edges.]
FIGURE 2.8
The communication graph of the e-mails exchanged within groups in Simuligne. (Image courtesy of Christophe Reffay.)
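The sketch below shows how an e-mail list can be turned into a graph and then measured. Degree centrality serves as the prominence measure, and graph density stands in as a simple cohesion proxy. It rests on stated assumptions: the function names and the e-mail list are invented, with node labels echoing Figure 2.8, and density is only a common stand-in that is not necessarily the cohesion measure Reffay and Chanier used.

```python
from collections import defaultdict


def build_graph(emails, members=()):
    """Undirected weighted graph: edge weight = number of e-mails exchanged.

    `members` lets silent participants appear as isolated nodes.
    """
    weights = defaultdict(int)
    nodes = set(members)
    for sender, recipient in emails:
        nodes.update((sender, recipient))
        if sender != recipient:
            weights[frozenset((sender, recipient))] += 1
    return nodes, weights


def degree_centrality(nodes, weights):
    """Fraction of the other members each node is directly connected to."""
    degree = {n: 0 for n in nodes}
    for edge in weights:
        for n in edge:
            degree[n] += 1
    others = len(nodes) - 1
    return {n: d / others for n, d in degree.items()} if others else degree


def density(nodes, weights):
    """Fraction of possible pairs that exchanged at least one e-mail."""
    possible = len(nodes) * (len(nodes) - 1) / 2
    return len(weights) / possible if possible else 0.0


# Invented e-mail list; Gl2 and Gl3 are members who never write or receive mail.
emails = [("Gt", "Gl1"), ("Gl1", "Gt"), ("Gt", "Gl4"), ("Gl4", "Gn1"), ("Gt", "Gn1")]
nodes, weights = build_graph(emails, members=["Gl2", "Gl3"])
centrality = degree_centrality(nodes, weights)
print(max(centrality, key=centrality.get), round(density(nodes, weights), 3))
```

In this toy data the tutor `Gt` comes out as the most central node. The two silent members have centrality 0, the numeric counterpart of the isolated nodes one would spot in a drawing like Figure 2.8.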
2.4.2.2 PeopleGarden
PeopleGarden [21] uses a flower and garden metaphor to visualize participation on a message board. The message board is visualized as a garden full of flowers, in which each flower represents one individual. The height of a flower denotes the amount of time the user has been on the board, and its petals denote the user's postings. Initial postings are shown in red, replies in blue. An example is shown in Figure 2.9. The figure can help the instructor of a course quickly grasp the underlying situation, such as a single dominant member in the discussion on the left, or a group with many members at different levels of participation on the right.
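The PeopleGarden encoding amounts to mapping each user's record onto visual attributes. The sketch below shows that mapping: height comes from time on the board, and there is one petal per posting, red for initial posts and blue for replies. It rests on stated assumptions: the `Flower` class, the `make_flower` function, and the normalization cap `max_days` are hypothetical, and drawing the garden itself is out of scope.

```python
from dataclasses import dataclass
from typing import List

PETAL_COLORS = {"initial": "red", "reply": "blue"}


@dataclass
class Flower:
    height: float       # normalized to [0, 1]; grows with time on the board
    petals: List[str]   # one petal color per posting, oldest first


def make_flower(days_on_board, posts, max_days=30):
    """Map one user's activity to flower attributes.

    posts: sequence of "initial" or "reply" labels, one per message.
    max_days is a hypothetical cap used to normalize the height.
    """
    height = min(days_on_board, max_days) / max_days
    return Flower(height=height, petals=[PETAL_COLORS[p] for p in posts])


dominant = make_flower(30, ["initial"] * 8 + ["reply"] * 4)
newcomer = make_flower(3, ["reply"])
print(len(dominant.petals), len(newcomer.petals))  # 12 1
```

Because the mapping is computed per user, a single member with many petals stands out immediately next to small flowers, which is the "dominant member" pattern the instructor looks for.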
FIGURE 2.9
The PeopleGarden visual representations of participation on a message board. (Image by Rebecca Xiong and
Judith Donath, © 1999 MIT Media Lab.)
to the instructor of the course, but it is commonly presented in the format of a textual log file, which is inappropriate for the instructor's needs [8]. Since the log data is collected in a format that is suitable for analysis with IV techniques and tools, a number of approaches have been proposed to graphically represent the tracking data generated by a CMS.
Recently, a number of research projects that exploit graphical representations to analyze student tracking data have been proposed. ViSION [15] is a tool implemented to display student interactions with a courseware website designed to assist students with their group project work. CourseVis [10] is another application that exploits graphical representations to analyze student tracking data. CourseVis is a visual student tracking tool that transforms tracking data from a CMS into graphical representations that course instructors can explore and manipulate to examine social, cognitive, and behavioral aspects of distance students. CourseVis originated from a systematic investigation aimed at finding out what information about distance students instructors need when they run courses with a CMS, and at identifying possible ways to help instructors acquire this information. This investigation was conducted with a survey, and the results were used to draw up the requirements and to inform the design of the graphical representations. One of the several graphical representations produced by CourseVis is shown in Figure 2.10.
This comprehensive image represents, in a single view, the behavior of a specific student in an online course. It takes advantage of the single-axis composition method (multiple variables share an axis and are aligned using that axis) to present a large number of variables in a 2D metric space. With a common x-axis mapping the dates of the course, a number of variables are represented: the student's access to the content pages (ordered by course topic), the global access to the course (content pages, quizzes, discussions, etc.), progress with respect to the course schedule, messages (posted, read, follow-ups), and the submission of quizzes and assignments. For a detailed description of CourseVis, see Mazza and Dimitrova [10].
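The data preparation behind single-axis composition can be sketched as follows. Raw tracking events are bucketed into daily counts, one row per variable, and every row is aligned on the same list of dates, so that the rows can be drawn one above the other. It rests on stated assumptions: the `(day, student, kind)` event format, the function names, and the sample events are hypothetical, since real CMS logs use their own formats, and CourseVis's actual processing is not reproduced here.

```python
from datetime import date, timedelta


def single_axis_table(events, student, start, end, kinds):
    """Align several per-student variables on one shared date axis.

    events: iterable of (day, student, kind) tuples (hypothetical log format).
    Returns (axis, rows): axis lists every date from start to end inclusive,
    and rows maps each kind to daily counts aligned index-by-index with axis.
    """
    axis = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    index = {d: i for i, d in enumerate(axis)}
    rows = {k: [0] * len(axis) for k in kinds}
    for day, who, kind in events:
        if who == student and kind in rows and day in index:
            rows[kind][index[day]] += 1
    return axis, rows


def render(rows):
    """Text stand-in for the chart: '#' marks days with activity."""
    width = max(len(k) for k in rows)
    return "\n".join(
        f"{kind:<{width}} " + "".join("#" if c else "." for c in counts)
        for kind, counts in rows.items()
    )


events = [
    (date(2002, 1, 15), "Francesco", "content"),
    (date(2002, 1, 15), "Francesco", "content"),
    (date(2002, 1, 16), "Francesco", "post"),
    (date(2002, 1, 16), "Other", "post"),
    (date(2002, 1, 20), "Francesco", "quiz"),   # outside the chosen window
]
axis, rows = single_axis_table(events, "Francesco", date(2002, 1, 15),
                               date(2002, 1, 17), ["content", "post", "quiz"])
print(render(rows))
```

Because every row is indexed by the same `axis`, a given column always refers to the same day. This shared alignment is what lets a single chart show many variables without a separate time axis for each.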
[CourseVis chart titled "Summary of student's behaviors from 2002-01-15 to 2002-04-11", Student: Francesco. Its "Access to content pages by topics" panel lists course topics such as Variable, Threads, String, Inheritance, Exception, I/O streams, Applet, and AWT events.]
––––
“Stand up straight, Polly, and put your feet down flat, so,” said
Johnny, as Polly slid helplessly along on the backs of her heels,
resting all her little weight confidingly upon the boys. And, after two
or three earnest explanations from Johnny and Pep, she suddenly
seemed to understand; she stiffened up, grasped a hand on each
side, and went off in such style that the boys had almost to run to
keep up with her, and she obeyed her mother’s call very unwillingly.
“Wasn’t it fun to see her little face, though!” said Johnny, as he
and Pep walked home, having declined the proffered drive for the
sake of a little more skating. “I think she thought something had
made her feet slippery, all of a sudden—she’d never been on ice
before.”
The thaw came very soon after this, as thaws will come, even
when people have new steel skates, but happily, there are always
tops and marbles, and, as some wise person has remarked, “When
one door shuts, another opens.”
Johnny did not play marbles “for keeps”;
his father had explained to him that taking
anything without giving a fair return for it is
dishonesty, and as he quite understood this,
he had no desire to “win” marbles from
boys who could not shoot so well as he
could, but he enjoyed playing fully as much
as anybody did, and found the game
exciting enough when played merely for the
hope of victory.
It was in the midst of a very even game that the school bell rang
one morning. Johnny and one other boy were the champions; the
rest had “gone out.” They lingered for one more shot—two more—
then just a third to finish the game, and then, as they hurried into
the schoolroom, they found that the roll had been called, and they
were marked late.
Johnny had intended to take one more look at his history lesson,
but there was no time. He was sure of it all, except two or three
dates, and of course, one of those dates came to him—or rather,
didn’t come; it was the question that came. The next boy gave the
answer, and Johnny’s history lesson for the first time that term, was
marked “Imperfect.”
This vexed him so, that he gave only a small half of his mind to
his mental arithmetic, and he lost his place in the class,—lost it to a
boy who was almost certain to keep it, too.
Thinking of this misfortune, he dropped a penful of ink on his
spotless new copy-book, and, although he promptly licked it off, an
ugly smear remained, and the writing teacher reproved him for
untidiness. So he was very glad when two o’clock struck, and he
could go home and tell his mournful story, for he had an
uncomfortable feeling, under the injured one, that the real,
responsible cause of his misfortunes was one Johnny Leslie.
When his mother had heard it all with much sympathy, she paused
a moment, and then repeated these words,—
“‘That they who do lean only upon the hope of Thy Heavenly
grace, may evermore be defended by Thy mighty power.’”
A sudden light came into Johnny’s face, and he exclaimed,—
“That was it, mamma dear! I wasn’t leaning on it at all, and of
course, I went down! I know all about it now. I didn’t get up when
you called me the first time, and I said my prayers in a hurry, just as
if they were the multiplication table, and I didn’t wait to read the
verse in my little book—I meant to do it after breakfast, but the
marbles rattled in my pocket, and I forgot all about it, I was in such
a hurry to have a game before school. And I wouldn’t stop to think,
when the bell rang, except a sort of make-believe think that a
minute more would not make me too late to answer to my name,
and so I lost the chance to go over those dates. And the question I
missed in mental arithmetic was a mean little easy thing, if I’d had
my wits about me, but I was worrying about the history, and I made
that dreadful blot because I was thinking of both, and did not look,
and dug my pen down to the bottom of the inkstand. It’s just like
‘The House that Jack built.’”
“Yes,” said his mother, “I don’t think anything, the
smallest thing, stands quite alone; it is fast to
something else that it pulls after it, so we must keep
a sharp lookout for the first things. We can’t rub out
this bad day—it is like the blot on your copy book;
you will keep seeing the mark, even if you don’t
make another. But then, you can use the mark, with
the dear Saviour’s help, to keep you from making
another. To-morrow will be another day. You know
Tiny and you like Tennyson’s ‘Bugle Song’ so much,
here is something else he said,—