Literature Review 1. Computational Methods for the Analysis of Learning and Knowledge Building Communities
This article describes the process, practice, and challenges of using predictive modelling
in teaching and learning. In both educational data mining (EDM) and learning analytics (LA),
predictive modelling has become a core practice of researchers, largely with a focus on
predicting student success as operationalized by academic achievement. In predictive
modelling, the purpose is to create a model that will predict the values (or classes, if the
prediction does not deal with numeric data) of new data based on past observations. Unlike
explanatory modelling, which seeks to account for why an outcome occurs, predictive modelling
rests on the assumption that a set of known data can be used to predict the value or class of
new data from its observed variables.
Predictive Modelling Workflow (illustrated in the sketch below):
1. Problem Identification
2. Data Collection
3. Classification and Regression
4. Feature Selection
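To make this workflow concrete, the following is a minimal sketch in Python (using pandas and scikit-learn; the dataset, feature names, and pass/fail label are all hypothetical) that moves from collected data through feature selection to a classifier evaluated on held-out data:

```python
# Hedged sketch of the predictive modelling workflow: collect data, select
# features, fit a classifier, and evaluate on held-out data.
# All data and column names are invented for illustration.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Data collection: one row per student; "passed" is the class label.
df = pd.DataFrame({
    "logins":        [25, 3, 18, 7, 30, 2, 22, 5],
    "forum_posts":   [10, 0, 6, 1, 12, 0, 8, 2],
    "quiz_avg":      [82, 41, 75, 55, 90, 38, 70, 50],
    "video_minutes": [300, 20, 210, 90, 350, 15, 260, 60],
    "passed":        [1, 0, 1, 0, 1, 0, 1, 0],
})
X, y = df.drop(columns="passed"), df["passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Feature selection followed by classification, mirroring the steps above.
model = Pipeline([
    ("select", SelectKBest(f_classif, k=3)),   # keep the 3 most informative features
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```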
A number of different algorithms exist for building predictive models. With educational data, it
is common to see models built using methods such as these:
1. Linear Regression predicts a continuous numeric output from a linear combination of
attributes.
2. Logistic Regression predicts the odds of two or more outcomes, allowing for categorical
predictions.
3. Nearest Neighbours Classifiers use only the closest labelled data points in the training
dataset to determine the appropriate predicted labels for new data.
4. Decision Trees (e.g., C4.5 algorithm) are repeated partitions of the data based on a series of
single attribute “tests.” Each test is chosen algorithmically to maximize the purity of the
classifications in each partition.
5. Naïve Bayes Classifiers assume the statistical independence of each attribute given the
classification, and provide probabilistic interpretations of classifications.
6. Bayesian Networks feature manually constructed graphical models and provide probabilistic
interpretations of classifications.
7. Support Vector Machines use a high dimensional data projection in order to find a
hyperplane of greatest separation between the various classes.
8. Neural Networks are biologically inspired algorithms that propagate data input through a
series of sparsely interconnected layers of computational nodes (neurons) to produce an
output. Increased interest has been shown in neural network approaches under the label of
deep learning.
9. Ensemble Methods use a voting pool of either homogeneous or heterogeneous classifiers.
Two prominent techniques are bootstrap aggregating (bagging), in which several predictive
models are built from random sub-samples of the dataset, and boosting, in which successive
predictive models are designed to account for the misclassifications of prior models (see the
sketch after this list).
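To make the ensemble family concrete, the sketch below (scikit-learn on synthetic data; an illustration, not a method from the reviewed paper) builds one bagging and one boosting ensemble over decision trees and compares them by cross-validation:

```python
# Hedged sketch: bootstrap aggregating vs. boosting over decision trees,
# scored by 5-fold cross-validation on synthetic classification data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagging: many trees, each fit to a random bootstrap sub-sample of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: successive weak learners reweight the prior models' mistakes.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, clf in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```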
4. Going beyond Better Data Prediction to Create Explanatory Models of Educational Data
The vast majority of educational data mining research has focused on achieving predictive
accuracy, but this paper argues that the field could benefit from more focus on developing
explanatory models. Explanatory models seek to identify interpretable causal relationships
between constructs that can be either observed or inferred from the data. Educational data mining
research has largely focused on developing two types of models: the statistical model and the
cognitive model. Statistical models drive the outer loop of intelligent tutoring systems (ITSs) based on
observable features of students’ performance as they learn. Cognitive models are representations
of the knowledge space (facts, concepts, skills, et cetera) underlying a particular educational
domain. Cognitive models map knowledge components to problem steps or tasks on which
student performance can be observed. This mapping provides a way for statistical models to
make inferences about students’ underlying knowledge based on their observable performance
on different problem steps. Thus, cognitive models are an important basis for the instructional
design of automated tutors and are important for accurate assessment of learning and knowledge.
Better cognitive models lead to better predictions of what a student knows, allowing adaptive
learning to work more efficiently. Learning factors analysis (LFA) was developed to automate
the data-driven refinement of knowledge component (KC) models and further alleviate demands
on human time. LFA searches across hypothesized knowledge components drawn from different
existing KC models, evaluates candidate models by their fit to the data, and outputs the
best-fitting KC model in symbolic form. As such, LFA greatly reduces demands on human effort
while also easing the burden of interpretation, even if it does not automate interpretation
entirely. The relationships between the fields of educational data mining, learning theory,
and the practice of education could be greatly strengthened with increased attention to the
explanatory power of models and their ability to influence future learning outcomes.
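The statistical core of this kind of search is often an additive logistic model in which the log-odds of a correct response combine student proficiency, KC difficulty, and the amount of prior practice on the KC. Below is a minimal sketch of such a model (Python with pandas and scikit-learn; the attempt data and KC names are invented, and a single shared practice slope stands in for the per-KC slopes of a full additive factors model):

```python
# Hedged sketch of an additive-factors-style model, the kind LFA fits when
# scoring candidate KC models:
#   logit P(correct) ~ student proficiency + KC difficulty + practice slope.
# One row per student attempt at a problem step; all values are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

attempts = pd.DataFrame({
    "student":     ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3"],
    "kc":          ["add", "add", "carry", "add", "carry", "carry", "add", "carry"],
    "opportunity": [1, 2, 1, 1, 1, 2, 1, 1],  # prior practice on that KC
    "correct":     [0, 1, 0, 1, 0, 1, 1, 0],
})

features = ColumnTransformer([
    ("ids", OneHotEncoder(), ["student", "kc"]),   # proficiency and difficulty terms
    ("practice", "passthrough", ["opportunity"]),  # learning-rate term (shared here)
])
model = Pipeline([("feats", features), ("logit", LogisticRegression())])
model.fit(attempts[["student", "kc", "opportunity"]], attempts["correct"])

# LFA would fit one such model per candidate KC-to-step mapping and keep the
# mapping with the best fit (e.g., lowest AIC/BIC or cross-validated likelihood).
print(model.predict_proba(attempts[["student", "kc", "opportunity"]])[:, 1].round(2))
```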
5. Natural Language Processing and Learning Analytics
In this research work, the authors discuss multiple, available NLP tools that can be
harnessed to understand discourse, as well as some applications of these tools for education. A
primary focus of these tools is the automated interpretation of human language input in order to
drive human–computer interaction. Thus, the
tools measure a variety of linguistic features important for understanding text, including
coherence, syntactic complexity, lexical diversity, and semantic similarity. NLP can be used to
describe multiple facets of language from simple descriptive statistics, such as the number of
words, n-grams, and paragraphs, to richer features of words, sentences, and the text as a
whole. This information
can be analyzed using machine learning techniques such as linear regression, discriminant
function classifiers, Naïve Bayes classifiers, support vector machines, logistic regression
classifiers, and decision tree classifiers. When these techniques are used to predict learning outcomes,
algorithms can be derived that can then be used within educational technologies or applications.
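As an illustration of that pipeline, the sketch below (Python with scikit-learn; the texts and outcome labels are invented) derives three simple indices (word count, lexical diversity, and mean sentence length) and feeds them to a logistic regression classifier:

```python
# Hedged sketch: derive simple NLP indices from student texts and use them
# as features for a learning-outcome classifier. All data are invented.
import re
from sklearn.linear_model import LogisticRegression

def text_features(text):
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),                             # number of words
        len(set(words)) / max(len(words), 1),   # lexical diversity (type-token ratio)
        len(words) / max(len(sentences), 1),    # mean sentence length
    ]

posts = ["I am confused about the assignment.",
         "Integrating these concepts clarified the underlying mechanism considerably."]
outcomes = [0, 1]  # hypothetical: 0 = did not complete, 1 = completed

clf = LogisticRegression().fit([text_features(p) for p in posts], outcomes)
print(clf.predict([text_features("This module was thoroughly engaging.")]))
```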
The most common NLP approach to analyzing student language in MOOCs has been through
tools that analyze emotions. Sentiment analysis examines language for positive or negative
emotion words, or for words related to motivation, agreement, cognitive mechanisms, or
engagement. Students who were more likely to receive a certificate of completion in the course
generally used more sophisticated language: for example, their posts were more concise and
cohesive, used less frequent (rarer) and more specific words, and had greater overall writing quality.
Interestingly, indices related to affect were not predictive of completion rates. Collectively, this
research provides promising evidence that NLP can be a powerful predictor of success in the
context of MOOCs. Communication between the instructor and the students as well as between
the students is crucial, particularly for distance courses. Further, this communication can then be
used as forms of assessment of student performance. Therefore, it seems apparent that MOOCs
should include discussion forums in order to better monitor student participation and potential
success. The language that students use can also be utilized to identify students who are less
likely to complete the course, and target those students for interventions such as sending emails,
suggesting content, or recommending tutoring. Automating language understanding, and thereby
providing information about the language and social interactions within these courses, will help
to enhance both learning and engagement in MOOCs.
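A toy sketch of such intervention targeting follows (pure Python; the word lists and threshold are invented placeholders, not a validated sentiment lexicon): it flags students whose forum posts skew negative so that a check-in email can be sent.

```python
# Hedged sketch: lexicon-based sentiment scoring of forum posts to flag
# students for outreach. The tiny word lists and cutoff are invented.
import re

POSITIVE = {"enjoy", "great", "understand", "interesting", "helpful"}
NEGATIVE = {"confused", "frustrated", "boring", "lost", "difficult"}

def sentiment_score(post):
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

forum_posts = {
    "alice": "I really enjoy this course, the videos are helpful",
    "bob": "I am lost and frustrated by the difficult material",
}

for student, post in forum_posts.items():
    if sentiment_score(post) < 0:  # invented intervention threshold
        print(f"Flag {student} for a supportive email or tutoring suggestion")
```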
6. Discourse Analytics
This research work introduces the area of discourse analytics (DA). Discourse analytics
has impact in multiple areas: offering analytic lenses to support research, enabling
formative and summative assessment, enabling dynamic and context-sensitive triggering of
interventions to improve the effectiveness of learning activities, and providing reflection
tools such as reports and feedback after learning activities in support of both learning and instruction.
The purpose of this research work is to encourage both an appropriate level of hope and an
appropriate level of skepticism for what is possible while also exposing the reader to the breadth
of expertise needed to do meaningful work in this area. Key decisions that strongly influence
how the data will appear through the analytic lens are made at the representation stage. At this
stage, text is transformed from a seemingly monolithic whole to a set of features that are said to
be extracted from it. Each feature extractor asks a question of the text, and the answer that the
text gives is the value of the corresponding feature within the representation. The most popular
unsupervised techniques in the education space include factor analysis approaches such as
latent semantic analysis (LSA) and structured latent variable models such as latent Dirichlet
allocation (LDA).
Recently, supervised machine learning has been applied to the problem of
assessing learning processes in discussion. This problem is referred to as automatic
collaborative-learning process analysis. Automatic analysis of collaborative processes has value
for real-time assessment during collaborative learning, for dynamically triggering supportive
interventions in the midst of collaborative-learning sessions, and for facilitating efficient analysis
of collaborative-learning processes at a grand scale. This dynamic approach has been
demonstrated to be more effective than an otherwise equivalent static approach to support.
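To ground the LDA mention above, here is a minimal sketch (scikit-learn; the corpus is invented) that turns discussion posts into bag-of-words features, each vocabulary count acting as one extracted feature, and fits a two-topic LDA model:

```python
# Hedged sketch: unsupervised topic discovery over discussion posts with
# latent Dirichlet allocation. The four posts are invented.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "How do I submit the assignment before the deadline?",
    "The lecture on neural networks was fascinating",
    "Can someone explain backpropagation from the lecture?",
    "Is the assignment deadline extended this week?",
]

# Each word count is a feature "extracted" from the text.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [terms[j] for j in topic.argsort()[-4:]])
```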
7. Emotional Learning Analytics
This research work discusses the ubiquity and importance of emotion to learning. It
argues that substantial progress can be made by coupling the discovery-oriented, data-driven,
analytic methods of learning analytics (LA) and educational data mining (EDM) with theoretical
advances and methodologies from the affective and learning sciences. Emotion is related to,
but not equivalent to, motivation, attitudes, preferences, physiology, arousal, and a host of
other constructs often conflated with it. Emotions are also distinct from moods and affective traits.
Affective states cannot be directly measured because they are conceptual entities (constructs).
However, they emerge from environment–person interactions (context) and influence action by
modulating cognition. Affect is an embodied phenomenon in that it activates bodily response
systems for action. This should make it possible to infer learner affect (a latent variable) from
machine-readable bodily signals (observables). Language communicates feelings. Hence,
sentiment analysis and opinion mining techniques have considerable potential to study how stu-
dents' thoughts (expressed in written language) about a learning experience predicts relevant
behaviours (most importantly attrition). Recent advances in sensing and signal processing
technologies have made it possible to automatically model aspects of students' classroom
experience that could previously only be obtained from self-reports and cumbersome human
observations. For example, second-generation Kinects can detect whether the eyes or mouth are
open, if a person is looking away, and if the mouth has moved, for up to six people at a time.
Learning is not a cold intellectual activity; it is punctuated with emotion. Emotions are not
merely decorative; they have agency. But emotion is a complex phenomenon with multiple
components that dynamically unfold across multiple time scales. The discovery-oriented,
data-driven, analytic methods of LA and EDM, along with an emphasis on real-world data
collection, have the unique potential to advance both the science of learning and the science
of emotion. It all begins by incorporating emotion into the analysis of learning.
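As a toy illustration of inferring a latent affective state from machine-readable observables, the sketch below (scikit-learn; every feature value and label is invented, standing in for sensor output such as the Kinect cues above) trains a classifier mapping bodily signals to affect labels:

```python
# Hedged sketch: infer a latent affect label from hypothetical bodily
# observables. All feature values and labels are invented.
from sklearn.ensemble import RandomForestClassifier

# Per observation: [eyes_open, mouth_open, looking_away, leaning_forward]
X = [[1, 0, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 1, 0]]
y = ["engaged", "engaged", "bored", "bored"]  # e.g., from self-reports

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict([[1, 0, 0, 0]]))  # predicted affective state for a new observation
```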
8. Multimodal Learning Analytics
This research work presents a different way to approach learning analytics (LA) praxis
through the capture, fusion, and analysis of complementary sources of learning traces to obtain a
more robust and less uncertain understanding of the learning process. The sources or modalities
in multimodal learning analytics (MLA) include the traditional log-file data captured by online
systems, but also learning artifacts and more natural human signals such as gestures, gaze,
speech, or writing. Computer-based learning systems, even if not initially designed with
analytics in mind, tend to capture automatically, in fine-grained detail, the interactions with their
users. The data describing these interactions is stored in many forms; for example, log-files or
word-processor documents that can be later mined to extract the traces to be analyzed. In its
communication theory definition, multimodality refers to the use of diverse modes of
communication (textual, aural, linguistic, spatial, visual, et cetera) to exchange information
and meaning between individuals. Humans tend to look directly at what draws their attention. As
such, the direction of the gaze of an individual is a proxy indicator of the direction of his or her
attention. Posture, gestures, and motion are three interrelated modes, jointly referred to as body
language, although each one could carry different types of information. Posture refers to the
position that the body or part of the body adopts at a given moment in time. Gestures are
coordinated movements of different parts of the body, especially the head, arms, and hands,
made to communicate a specific meaning. This non-verbal form of communication is usually conscious.
It is used as a way to provide short feedback loops and alternative emphasizing channels in the
learning process. Motion is any change in body position not necessary to acquire a new posture
or to perform a given gesture. This motion is often the result of unconscious body movements
that reveal the inner state of the subject during the learning process. The action mode is very
similar to the gesture and motion modes. Both are body movements usually captured by video
recordings in MLA. However, actions are purposeful, typically learned movements, often
involving the manipulation of a tool. The human face can communicate very complex
mental states through relatively simple expressions. The most common use of audio recordings
in MLA is to capture traces of what the student is talking about or listening to. As the main and
most complex form of communication among humans, speech is especially important in
understanding the learning process. In the current practice of MLA, two main signals are
extracted from audio recordings: what is being said and how it is being said. Two closely related
modes are writing and sketching. They both use an instrument, most commonly a pen, to
communicate complex thoughts. ITSs are usually studied by traditional LA using log-files.
However, video and audio of the learner have been captured to add new modes that complement
the interaction data. LA has revolutionized the approaches used to understand and optimize the
learning process. However, its current bias towards studies and tools involving only computer-
based learning contexts jeopardizes its applicability to learning in general. MLA is a subfield that
seeks to integrate non-computer-mediated learning contexts into the mainstream research and
practice of LA.
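A minimal sketch of one common MLA strategy, feature-level fusion (NumPy and scikit-learn; all feature values are invented), in which log-file, audio, and gaze features are concatenated before one model is trained:

```python
# Hedged sketch: feature-level fusion of three modalities into one model.
# All per-student feature values and outcomes are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

log_feats   = np.array([[12, 3], [4, 0], [9, 5], [2, 1]])  # clicks, submissions
audio_feats = np.array([[0.6], [0.1], [0.8], [0.2]])       # speaking-time ratio
gaze_feats  = np.array([[0.7], [0.3], [0.9], [0.4]])       # on-task gaze ratio

X = np.hstack([log_feats, audio_feats, gaze_feats])  # fused representation
y = [1, 0, 1, 0]  # hypothetical outcome, e.g., task success

clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:1]))
```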
9. Learning Analytics Dashboards
This research work presents learning analytics dashboards that visualize learning traces to give
users insight into the learning process. The objectives of these dashboards include providing
feedback on learning activities, supporting reflection and decision making, increasing
engagement and motivation, and reducing dropout. These learning analytics dashboards apply
information visualization techniques to help teachers, learners, and other stakeholders explore
and understand relevant user traces collected in various (online) environments. The overall
objective is to improve (human) learning. This paper makes a useful distinction among various
types of dashboards:
1. Dashboards that support traditional face-to-face lectures, so as to enable the teacher to adapt
the teaching, or to engage students during lecture sessions.
2. Dashboards that support face-to-face group work and classroom orchestration, for instance
by visualizing activities of both individual learners and groups of learners.
3. Dashboards that support online or blended learning: an early, well-known example is Course
Signals, which visualizes predicted learning outcomes as a traffic light based on grades in the
course so far, time on task, and past performance (a simplified computation is sketched after
this list).
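A simplified, hypothetical version of such a traffic-light computation is sketched below (pure Python; the weights and cutoffs are invented for illustration and are not Course Signals' actual formula):

```python
# Hedged sketch of a Course Signals-style traffic light. The weighting and
# thresholds are invented; the real system's formula differs.
def risk_signal(grade_pct, time_on_task_hrs, past_gpa):
    score = (0.5 * (grade_pct / 100)
             + 0.2 * min(time_on_task_hrs / 20, 1)
             + 0.3 * (past_gpa / 4.0))
    if score > 0.7:
        return "green"   # on track
    if score > 0.4:
        return "yellow"  # at some risk
    return "red"         # likely needs intervention

print(risk_signal(grade_pct=55, time_on_task_hrs=6, past_gpa=2.5))  # yellow
```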
As for what can be incorporated into a dashboard, the paper lists the following kinds of data:
1. Artefacts produced by learners, including blog posts, shared documents, software, and other
artefacts that would often end up in a student project portfolio.
2. Social interaction, including speech in face-to-face group work, blog comments, Twitter or
discussion forum interactions.
3. Resource use can include consultation of documents (manuals, web pages, slides), views of
videos, et cetera. Techniques like software trackers and eye-tracking can provide detailed
information about exactly which parts of resources are being used and how.
4. Time spent can be useful for teachers to identify students at risk and for students to compare
their own efforts with those of their peers (a sketch of deriving time spent from raw logs
follows this list).
5. Test and self-assessment results can provide an indication of learning progress.
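Deriving time spent (item 4) is less direct than it sounds, because logs record events rather than durations. A minimal sketch (pure Python; the 30-minute session timeout is a common but arbitrary heuristic) of estimating time on task from timestamped events:

```python
# Hedged sketch: estimate time on task from timestamped log events using a
# session-timeout heuristic. The 30-minute cutoff is an arbitrary choice.
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def time_on_task(timestamps):
    """Sum gaps between consecutive events, discarding gaps beyond TIMEOUT."""
    ts = sorted(timestamps)
    total = timedelta()
    for prev, cur in zip(ts, ts[1:]):
        gap = cur - prev
        if gap <= TIMEOUT:
            total += gap
    return total

events = [datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 10),
          datetime(2024, 1, 8, 9, 20), datetime(2024, 1, 8, 14, 0)]
print(time_on_task(events))  # 0:20:00; the long idle gap is discarded
```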
The first step is getting to know the problem domain, the data set, the intended end-users of the
tool, the typical tasks they should be able to perform, and so on. The following questions need to
be answered at this stage:
1. Why: What is the goal of the visualization? What questions about the data should it answer?
2. For whom: For whom is the visualization intended? Are the people involved specialists in the
domain, or in visualization?
3. What: What data will the visualization display? Do these data exhibit a specific internal
structure, like time, a hierarchy, or a network?
4. How: How will the visualization support the goal? How will people be able to interact with
the visualization? What is the intended output device?
The following are common evaluation criteria:
1. Effectiveness, which can refer to engagement, higher grades or post-test results, higher
retention rates, improved self-assessment, and overall course satisfaction.
2. Efficiency in the use of time of a teacher or learner.
3. Usability and usefulness evaluations often focus on teachers being able to identify learners at
risk or asking learners how well they think they are performing in a course.
Information visualization concepts and methodologies are key enablers for:
1. Learners, to gain insight into their learning actions and the effects these have.
2. Teachers, to stay aware of the subtle interactions in their courses.
3. Researchers, to discover patterns in large data sets of user traces and to communicate
these findings to their peers.
Designing and creating an effective information visualization system for learning analytics is an
art, as the designer needs both domain expertise in learning theories and paradigms and
techniques ranging from visual design to algorithm design.
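To make one such visualization concrete, here is a minimal sketch (matplotlib; all data are invented) of a common dashboard panel that lets a learner compare their weekly effort with the class median:

```python
# Hedged sketch: one dashboard panel comparing a learner's weekly study time
# with the class median. All numbers are invented.
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5]
class_median = [3.0, 3.5, 3.2, 4.0, 3.8]  # hours per week
learner      = [2.0, 2.5, 1.5, 1.0, 0.5]

plt.plot(weeks, class_median, marker="o", label="class median")
plt.plot(weeks, learner, marker="s", label="this learner")
plt.xlabel("Week")
plt.ylabel("Study time (hours)")
plt.title("Effort compared with peers")
plt.legend()
plt.show()
```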
The use of learning analytics in ICT-rich learning environments assists teachers in (re)designing
their learning scenarios. In this paper, the authors present preliminary research on four
dimensions of learning analytics (engagement, assessment, progression, satisfaction), and on
their visualization as teaching analytics, that are hypothesized to be relevant in helping
teachers (re)design their learning scenarios. The term teacher inquiry has been defined as “a
systematic, intentional research by teachers,” which aims at improving instruction at four levels:
1. By defining important instructional problems specific to the local context of the
participating teachers
2. By planning and implementing instructional solutions, connecting theory to action
3. By using evidence to drive reflection and analysis
4. By working towards detectable improvements and specific cause-effect findings about
teaching and learning
The research question explored in this paper is:
RQ: Which learning analytics are useful to (re)design or to re-use a learning design?
This research question is investigated through the following more specific questions:
RQ1: Are the above learning analytics dimensions (engagement, assessment, progression,
satisfaction) or other information perceived as valuable by educational practitioners?
RQ2: Is there any relation between the four dimensions and between the dimensions and
the contexts of the students?
RQ3: At a collective level, are educators willing to share learning analytics visualizations or
to look at the results of their colleagues?
Correlation analysis between those dimensions showed that the perceived usefulness of
engagement analytics was correlated with that of progression, and assessment with satisfaction.
Moreover, interest in knowing the educational context was correlated with interest in
engagement and assessment.
The qualitative responses of the participants regarding additional information that could help
them redesign their course or re-use an implemented design showed the importance of having
descriptive qualitative information about face-to-face sessions, such as teacher reports and
observations about the levels of students' interactions. Data-driven reflections on the teaching
practice can impact the way in which educators design for learning and deliver their teaching.
Educational teams or communities can be formed around situated activities such as teacher
planning, analysis of students' data, and improvement of learning designs. In this paper, the
authors analyzed which learning analytics data or additional information are useful in helping
educational practitioners redesign their learning scenarios. They situated their analysis within
teacher inquiry teams or wider communities and thus proposed four learning analytics
dimensions which can be aligned with teachers' pedagogical intentions expressed in a learning
design and can drive discussions.
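A minimal sketch of the kind of correlation analysis reported here (pandas; the ratings below are invented survey data, not the study's results):

```python
# Hedged sketch: cross-correlating teachers' perceived usefulness of the four
# learning analytics dimensions. Ratings (1-5) are invented.
import pandas as pd

ratings = pd.DataFrame({
    "engagement":   [5, 4, 4, 3, 5, 2],
    "assessment":   [4, 4, 3, 3, 4, 2],
    "progression":  [5, 4, 4, 2, 5, 3],
    "satisfaction": [4, 3, 3, 3, 4, 2],
})

# Pairwise Pearson correlations between the four dimensions.
print(ratings.corr().round(2))
```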
References:
1. Hoppe, H. (2017). Computational methods for the analysis of learning and knowledge
building communities.
2. Bergner, Y. (2017). Measurement and its uses in learning analytics. In C. Lang, G. Siemens,
A. F. Wise, & D. Gašević (Eds.), Handbook of Learning Analytics (pp. 34–48). Society for
Learning Analytics Research (SoLAR).
3. Brooks, C., & Thompson, C. (2017). Predictive modelling in teaching and learning. In
Handbook of Learning Analytics (pp. 61–68). Society for Learning Analytics Research (SoLAR).
4. Liu, R., & Koedinger, K. R. (2017). Going beyond better data prediction to create
explanatory models of educational data.
5. McNamara, D. S., Allen, L. K., Crossley, S. A., Dascalu, M., & Perret, C. A. Natural
language processing and learning analytics. In G. Siemens & C. Lang (Eds.), Handbook of
Learning Analytics and Educational Data Mining (in press).
6. Rosé, C. (2017). Discourse analytics. In C. Lang, G. Siemens, A. Wise, & D. Gašević
(Eds.), Handbook of Learning Analytics (1st ed., pp. 105–114). Edmonton: SoLAR.
7. D'Mello, S. K. (2017). Emotional learning analytics.
8. Blikstein, P. (2013). Multimodal learning analytics. https://doi.org/10.1145/2460296.2460316
9. Rienties, B., Herodotou, C., Olney, T., Schencks, M., & Boroowa, A. (2018). Making sense
of learning analytics dashboards: A technology acceptance perspective of 95 teachers. The
International Review of Research in Open and Distributed Learning, 19(5).
https://doi.org/10.19173/irrodl.v19i5.3493
10. Siemens, G., Gašević, D., Haythornthwaite, C., Dawson, S., Buckingham Shum, S.,
Ferguson, R., Duval, E., Verbert, K., & Baker, R. S. J. d. (2011). Open learning analytics: An
integrated & modularized platform. Proposal to design, implement and evaluate an open
platform to integrate heterogeneous learning analytics techniques.
11. Sclater, N., Peasgood, A., & Mullan, J. (2016). Learning analytics in higher education: A
review of UK and international practice. www.jisc.ac.uk/reports/learning-analytics-in-higher-education
12. Zhou, Q., Han, X., Yang, J., & Cheng, J. (2014). Design and implementation of learning
analytics system for teachers and learners based on the specified LMS. In 2014 International
Conference of Educational Innovation through Technology (pp. 79–82). Brisbane.
https://doi.org/10.1109/EITT.2014.21
13. Rebholz, S., Libbrecht, P., & Müller, W. (2012). Learning analytics as an investigation tool
for teaching practitioners. In Proceedings of the Workshop Towards Theory and Practice of
Teaching Analytics 2012 (TaPTA-2012), Saarbrücken, Germany. CEUR-WS.
14. Michos, K., & Hernández-Leo, D. (2016). Towards understanding the potential of teaching
analytics within educational communities. In R. Vatrapu, M. Kickmeier-Rust, B. Ginon, & S.
Bull (Eds.), Proceedings of the Fourth International Workshop on Teaching Analytics (IWTA
2016), in conjunction with EC-TEL 2016, Lyon, France (pp. 1–8). CEUR Workshop Proceedings.
15. Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, 18(6),
683–695.