
Moving beyond Test Scores: Analyzing the Effectiveness of a Digital Learning Game through Learning Analytics

Huy Anh Nguyen, Xinying Hou, John Stamper, Bruce M. McLaren
Carnegie Mellon University
[email protected], [email protected], [email protected], [email protected]

Published in: Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020), Anna N. Rafferty, Jacob Whitehill, Violetta Cavalli-Sforza, and Cristobal Romero (eds.), 2020, pp. 487–495.

ABSTRACT
A challenge in digital learning games is assessing students' learning behaviors, which are often intertwined with game behaviors. How do we know whether students have learned enough or needed more practice at the end of their game play? To answer this question, we performed post hoc analyses on a prior study of the game Decimal Point, which teaches decimal numbers and decimal operations to middle school students. Using Bayesian Knowledge Tracing, we found that students had the most difficulty with mastering the number line and sorting skills, but also tended to over-practice the skills they had previously mastered. In addition, using students' survey responses and in-game measurements, we identified the best feature sets to predict test scores and self-reported enjoyment. Analyzing these features and their connections with learning outcomes and enjoyment yielded useful insights into areas of improvement for the game. We conclude by highlighting the need for combining traditional test measures with rigorous learning analytics to critically evaluate the effectiveness of learning games.

Keywords
Decimal, Digital Learning Game, Bayesian Knowledge Tracing, Over-practice

1. INTRODUCTION
Digital learning games are typically regarded as a powerful tool to promote learning by engaging students with a novel and interactive game environment. While there have been concerns about the lack of empirical results on learning games' effectiveness [21, 32], recently we have seen more research that addresses this issue by showing students' learning gains from pretest to posttest in rigorous randomized experiments [9, 41, 52]. More generally, a meta-analysis of 69 studies by [10] showed that game conditions promoted significantly more learning than non-game conditions with equivalent knowledge content, and that augmented game designs with more learning-oriented features were more instructionally effective than standard designs.

While this prior research has demonstrated that digital learning games can enhance learning, the next step is to examine how they do so. In particular, even though the common measures of pretest and posttest scores are necessary to evaluate students' transferable learning, they are inadequate to address many questions about how learning takes place during the game. For example, did students get just enough practice from the game, or more practice than necessary? How does in-game learning correlate with test performance? These questions have been explored in great detail in Intelligent Tutoring Systems (ITS), but not as much in digital learning games, primarily because of the differences in design approaches between these two platforms. ITS are typically very structured environments where students are frequently evaluated on their knowledge and, in mastery learning settings [28], move to a new skill as soon as the system determines they have mastered the current skill. In contrast, digital learning games emphasize students' freedom in shaping their own learning experience without concern about the consequences of failure [15]; as a result, the game's learning objectives are not always obvious to the students [4]. The question, then, is how can we combine the traditional pretest and posttest measures in learning game studies with learning analytics methods from ITS to paint a better picture of students' learning, both inside and outside of the game context? Furthermore, given the game's dual goal of promoting both learning and enjoyment, do in-game learning metrics also relate to students' enjoyment in any meaningful way?

Our work explores these questions in the context of Decimal Point, a game that teaches decimal numbers and operations to middle-school students. Here we present a post hoc analysis of the data from a prior study [22]. First, we investigated how well students mastered the in-game skills, how long it took them to master each skill, and whether students continued practicing after mastery. Next, we used student data from before and during game play to predict their learning outcomes and enjoyment after the game. Based on these results, we derived lessons for improving learning support in Decimal Point as well as in a more general learning game context.

2. RELATED WORK
2.1 Learning Analytics in Games
In-game formative assessment can be a powerful complementary tool for capturing students' learning progress [59]. Traditional formative measures typically make use of game-based metrics, such as the number of completed levels or the highest level beaten [2, 11], but these metrics may not always align with actual learning. Prior studies on Decimal Point, for instance, reported that students who played more mini-game rounds did not learn more than those who played fewer [18, 39]. An alternative approach is to employ learning analytics methods from ITS studies. For example, learning curve analysis, which visualizes students' error rates over time, has been applied in several learning games and yielded valuable insights that range from instructional redesign lessons to the discovery of unforeseen student strategies [17, 29, 42].
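As an illustration, a basic learning curve of this kind can be computed directly from first-attempt log data. The Python sketch below assumes a hypothetical log schema (student_id, skill, opportunity, first_attempt_correct); it is not the pipeline used in the studies cited above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log: one row per (student, skill, opportunity),
# with a 0/1 flag for first-attempt correctness.
log = pd.read_csv("game_log.csv")

# Average first-attempt error rate at each opportunity, per skill.
curve = (log.assign(error=lambda d: 1 - d["first_attempt_correct"])
            .groupby(["skill", "opportunity"])["error"]
            .mean())

for skill, series in curve.groupby(level="skill"):
    plt.plot(series.index.get_level_values("opportunity"),
             series.values, label=skill)
plt.xlabel("Opportunity")
plt.ylabel("First-attempt error rate")
plt.legend()
plt.show()
```

A mostly flat curve signals a skill where additional practice is not reducing errors, which is the kind of insight the studies above report.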

Learning analytics techniques can also connect formative assessment with external performance. For example, Bayesian networks have been applied to predict posttest responses from students' in-game data in several learning games [30, 48, 54]. Similarly, [27] employed feature engineering and a gradient boosted random forest algorithm to identify struggling students in real time in a physics learning game. Recently we have also seen more use of deep learning for this prediction task [24, 51]. In general, research in this direction can illustrate how well students' learning aligns with the game's learning objectives, while also guiding the development of adaptive game support features.

2.2 Decimal Point
Decimal Point is a web-based single-player digital learning game that helps middle-school students learn about decimal numbers and their operations (e.g., adding and comparing). The game features an amusement park metaphor, with a map of the park used to guide students (Figure 1). There are 8 theme areas with 24 mini-games, connected by a line that is designed to interleave skill types and theme areas. Each mini-game is aimed at helping students resolve one of the common decimal misconceptions: Megz (longer decimals are larger), Segz (shorter decimals are larger), Pegz (the two sides of a decimal number are separate and independent) and Negz (decimals smaller than 1 are treated as negative numbers) [25]. Also, each mini-game calls for one of the following skills:

1. Addition: add two decimals by entering the carry digits and the sum.
2. Bucket: compare given decimals to a threshold number and place each decimal in a "less than" or "greater than" bucket.
3. Number Line: locate the position of a decimal number on the number line.
4. Sequence: fill in the next two numbers of a sequence of decimal numbers.
5. Sorting: sort a list of decimal numbers in ascending or descending order.

Figure 1: Screenshots of the main map screen and two example mini-games. (a) Goal is a Number Line game; (b) Space Raider is a Sorting game.

In each mini-game, students solve a number of decimal problems related to the game's targeted skill and receive immediate feedback about the correctness of their answers. Students face no penalty for incorrect responses and can resubmit answers as many times as needed; however, they are not allowed to move forward without solving all the problems in the mini-game. More details about the instructional content of the mini-game problems can be found in [35].

The original study of Decimal Point showed that the game led to more learning and enjoyment than a conventional tutor with the same instructional content [35]. Subsequent studies have integrated the element of agency into the game, allowing students to select their preferred mini-games and to decide when to stop playing [18, 39]. Based on those findings, students who were given agency achieved equivalent learning gains in less time compared to those who were not. Most recently, a study by [22] compared two versions of the game, one that encourages students to play to learn and one that encourages them to play for fun. Their results indicated that the learning-oriented group focused on re-practicing the same mini-games, while the enjoyment-oriented group did more exploration of different mini-games. In general, while all of these previous works reported that students learned from the game across all study conditions, it is not yet clear which game factors contributed to these findings. Furthermore, no connection between students' learning and their enjoyment has been identified. Our work aims to acquire more insights into these areas.

Table 1: Survey items before and after game play.

Pre-intervention surveys
Dimension (item count)         | Example statement                                                  | Cronbach's α
Decimal efficacy (3) [44]      | I can do an excellent job on decimal number math assignments.      | .83
Computer efficacy (3) [31]     | I know how to find information on a computer.                      | .71
Identification agency (2) [50] | I work on my classwork because I want to learn new things.         | .60
Intrinsic agency (2) [50]      | I work on my classwork because I enjoy doing it.                   | .86
External agency (3) [50]       | I work on my classwork so the teacher won't be upset with me.      | .61
Perseverance (3) [12]          | Setbacks don't discourage me. I don't give up easily.              | .79
Math utility (3) [13]          | Math is useful in everyday life.                                   | .63
Math interest (2) [14]         | I find working on math to be very interesting.                     | .75
Expectancy (1) [23]            | I plan to take the highest level of math available in high school. | -

Post-intervention surveys
Dimension (item count)         | Example statement                                                  | Cronbach's α
Affective engagement (3) [5]   | I felt frustrated or annoyed.                                      | .78
Cognitive engagement (3) [5]   | I tried out my ideas to see what would happen.                     | .54
Game engagement (5) [7]        | I lost track of time.                                              | .74
Achievement emotion (6) [43]   | Reflecting on my progress in the game made me happy.               | .89

3. DATASET
Our work uses data from 159 fifth and sixth grade students in our prior study [22], where students could select and play the mini-games from the map in Figure 1 in any order, and were allowed to stop playing at any time after finishing 24 mini-game rounds. They could also play more rounds of the completed mini-games, with the same game mechanics but different question content. For example, the first round of the mini-game Goal asks students to locate 0.76 on the number line, while the second round features the same game interactions but involves locating 0.431. Before playing, students did a pretest and answered demographic survey questions. After game play, they completed another survey to evaluate their experience and did a posttest, followed by a delayed posttest one week later. Here we outline the measures which are relevant to our analyses. A more detailed description of the experimental design can be found in [22].

Pretest, Posttest, and Delayed Posttest: Each test consisted of 43 items, for a total of 52 points. The items were designed to probe for specific decimal misconceptions, and involved either the five decimal skills targeted by the game or conceptual questions (e.g., "is a longer decimal larger than a shorter decimal?"). There are three test versions (A, B and C), which are isomorphic to one another and counterbalanced across students (e.g., ABC, ACB, BAC, etc. for pre, post, and delayed). Our prior analysis showed no differences in difficulty between the three versions [22].

Questionnaires: Before game play, students reported their age and gender, as well as their ratings of survey items about their background information, from 1 ("Strongly Disagree") to 5 ("Strongly Agree"). After playing, students rated their enjoyment (also from 1 to 5) via survey questions that address four enjoyment dimensions (Table 1). If a dimension comprises several items, we compute the average rating of all items in that dimension to derive its representative rating score. According to [16], a measure should have α ≥ .60 to be considered reliable; therefore, based on Table 1 we removed the cognitive engagement dimension (with α = .54) from further analyses.

The full log data from the study is archived in the DataShop repository [55], in dataset number 3086. We present our analysis of this data in the following section.

4. RESULTS
4.1 Investigating in-game learning
In our prior work on Knowledge Component (KC) modeling in Decimal Point, based on data from a separate study, we used the correctness of the student's first attempt in answering each mini-game problem to update their mastery of the KC covered by that mini-game. With this mapping from in-game action to KC, we found that students' learning can be better captured by a KC model based on skill types (e.g., Addition, Bucket) than on decimal misconceptions (e.g., Segz, Negz) [40]. Therefore, in this work we used the five skill types as our KCs, and tracked students' learning progress on these skills with Bayesian Knowledge Tracing (BKT) [60]. The BKT parameters were set as p(L0) = 0.4, p(T) = 0.05, p(S) = p(G) = 0.299 [3], and the mastery threshold is 0.9.
First, we looked at how well students mastered each of the five skills in the game.
Comparing the students' final mastery probabilities in each skill against our mastery threshold, we observed that there were 4 students who did not master any skill, 20 students who mastered one skill, 33 students who mastered two skills, 42 students who mastered three skills, 34 students who mastered four skills, and 26 students who mastered all five skills. Next, we counted how many opportunities each student who mastered a skill took to reach mastery in that skill. An opportunity is defined as one complete decimal exercise; each mini-game round consists of one opportunity, except for those in Sequence, which contain three opportunities (i.e., students have to fill in three decimal sequences per round). The distributions of opportunity count until mastery are plotted in Figure 2, which shows that Number Line and Sorting took the longest to master, at around 5 opportunities on average. For Number Line, one student even needed 26 opportunities to reach mastery.

Figure 2: Opportunity counts until mastery for each skill. The number next to each skill indicates the count of students who mastered that skill and were included in the violin plot.

Next, we examined how well students regulated their learning, i.e., after mastering a skill, did they tend to continue practicing the same skill, or switch to a different skill? For each student, following [8], once they mastered a skill (≥ 90% mastery probability), we considered their subsequent opportunities as over-practice. Then, for each student who mastered a particular skill, we computed the ratio between their over-practice count and total opportunity count in that skill. Plotting these ratios for all the mastered students in each skill (Figure 3), we observed that between 20-80% of a student's practice opportunities in a skill could be considered over-practice, i.e., they took place after the student had mastered the skill.

Figure 3: Over-practice ratio in each skill. The number next to each skill indicates the count of students who mastered that skill and were included in the violin plot.
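This over-practice ratio can be computed directly from the BKT trace; a minimal sketch, reusing bkt_update() and the constants from the earlier example:

```python
def over_practice_ratio(first_attempts):
    """Fraction of a student's opportunities in one skill that occur after
    the mastery threshold (>= 0.9) is first crossed; None if never mastered."""
    p_known = P_L0
    mastered_at = None
    for i, correct in enumerate(first_attempts, start=1):
        p_known = bkt_update(p_known, correct)
        if mastered_at is None and p_known >= MASTERY_THRESHOLD:
            mastered_at = i
    if mastered_at is None:
        return None
    # Opportunities after the mastery point, over total opportunities.
    return (len(first_attempts) - mastered_at) / len(first_attempts)
```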

4.2 Investigating factors related to posttest and delayed posttest performance
Having examined students' in-game learning, we then looked at how it related to test performance after the game. In order to predict posttest and delayed posttest scores, we collected features that reflected students' in-game learning and also included demographic measures that account for individual student differences. In total, we considered 19 features: pretest score, decimal efficacy, gender, computer efficacy, identification agency, intrinsic agency, external agency, perseverance, math utility, math interest, expectancy, final in-game mastery probabilities of the five skills (Addition, Bucket, Sequence, Number Line, Sorting), total opportunity count, over-practice opportunity count and total incorrect answer count. To identify the most important features, we (1) performed feature selection with linear regression, and (2) ran another linear regression model with the selected features on the full dataset to inspect the coefficient and significance of each feature. In step (1), we used the mlxtend library [45] to run a forward feature selection procedure that returns the feature subset with the best cross-validated performance, measured in terms of mean squared error (MSE).
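A sketch of step (1) with mlxtend's SequentialFeatureSelector is below; the DataFrame construction and the number of cross-validation folds are our assumptions, not reported details.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# Hypothetical feature table: one row per student, columns for the
# 19 features listed above plus the target score.
data = pd.read_csv("student_features.csv")
y = data.pop("posttest_score")
X = data

sfs = SFS(LinearRegression(),
          k_features="best",                  # subset with best CV score
          forward=True,                       # forward selection
          scoring="neg_mean_squared_error",   # cross-validated MSE
          cv=5)                               # fold count is an assumption
sfs = sfs.fit(X, y)
print(sfs.k_feature_names_)                   # the selected feature subset
```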

In predicting posttest scores, our feature selection identified three features: Bucket mastery, Sorting mastery and pretest score. A linear regression model with these three features, when trained and evaluated on the entire dataset, had an MSE of 26.167 and an adjusted R² of .735. Based on the regression table, the coefficient and significance of each feature was as follows: pretest score with β = 0.734, p < .001, Bucket mastery with β = 6.833, p < .001, Sorting mastery with β = 5.100, p = .001. In other words, pretest score, Bucket mastery and Sorting mastery each had a positive and significant association with posttest scores.
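To obtain the per-feature coefficients and p-values reported here, one option is an ordinary least squares fit whose summary includes both; the use of statsmodels is our illustration (the paper names mlxtend only for the selection step).

```python
import statsmodels.api as sm

# Continue from the selection sketch above.
selected = ["pretest_score", "bucket_mastery", "sorting_mastery"]
X_sel = sm.add_constant(X[selected])  # add an intercept term
ols = sm.OLS(y, X_sel).fit()
print(ols.summary())  # per-feature beta and p-value, plus adjusted R^2
```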
The delayed posttest model incorporated two additional features – Number Line mastery and gender – and yielded an MSE of 24.218, as well as an adjusted R² of .747. Based on the regression table, the coefficient and significance of each feature was as follows: pretest score with β = 0.730, p < .001, Bucket mastery with β = 4.276, p = .018, Sorting mastery with β = 4.270, p = .003, Number Line mastery with β = 3.099, p = .029, and gender with β = 1.426, p = .074. In other words, the three skill mastery values – Bucket, Sorting, Number Line – as well as pretest score each had a positive and significant association with delayed posttest score, while gender (male = 0, female = 1) had a positive and marginally significant association.

4.3 Investigating factors related to enjoyment
For each enjoyment dimension measured in the post-intervention surveys (achievement emotion, game engagement, affective engagement; see Table 1), we computed the per-student average Likert scores for the statements in that dimension. Then, we performed the same feature selection procedure as in 4.2 and report our results in Table 2.
Table 2: Results of feature selection for predicting game enjoyment. The Overall performance row indicates the selected model's scores when trained and evaluated on the entire dataset.

Achievement Emotion
  Selected features: computer efficacy, identification agency, intrinsic agency, math interest, pretest score, total opportunity count
  Overall performance: MSE = 0.520, Adjusted R² = 0.386

Game Engagement
  Selected features: math interest, computer efficacy, gender
  Overall performance: MSE = 0.602, Adjusted R² = 0.225

Affective Engagement
  Selected features: decimal efficacy, gender, intrinsic agency, Sorting mastery, Bucket mastery, total incorrect attempt count, identification agency
  Overall performance: MSE = 0.660, Adjusted R² = 0.218

We observed that the adjusted R² values of the game engagement and affective engagement models were much lower than those of the test score models. Even when trained and evaluated on the entire dataset, linear regression could only explain about 20% of the variance in game engagement and affective engagement. On the other hand, the achievement emotion model did have reasonable performance (adjusted R² = .386), so we focused on analyzing the features in this model. The linear regression table showed the coefficient and significance of each feature as follows: computer efficacy with β = 0.047, p = .063, identification agency with β = 0.099, p = .024, intrinsic agency with β = 0.116, p = .002, math interest with β = 0.114, p = .001, pretest score with β = −0.017, p = .011, opportunity count with β = 0.009, p = .033. In other words, computer efficacy had a positive and marginally significant association, while pretest score had a negative and significant association; the remaining features (identification agency, intrinsic agency, math interest and opportunity count) each had a positive and significant association.

5. DISCUSSION
5.1 Investigating in-game learning
Based on the opportunity count until mastery in each skill (Figure 2), we identified Sorting and Number Line as the most difficult skills in the game. Our prior learning curve analysis [40] on a different Decimal Point study reported a consistent finding – that the learning curves of these two skills were mostly flat and reflected small learning rates. Based on previous research in decimal learning, a plausible explanation is that there are several misconceptions which can lead to students making a mistake in Sorting or Number Line problems, including (1) treating decimals as whole numbers, (2) treating decimals as fractions, and (3) ignoring the zero in the tenths place [46]. Furthermore, even when students recognize their misconception, they may shift to a different misconception instead of arriving at the correct understanding [56]. This phenomenon likely also occurred in Decimal Point, as the game provides corrective feedback (whether an answer is right or wrong) but does not emphasize the underlying reasoning; consequently, as an example, a student realizing it is wrong to assume longer decimals are larger may end up concluding that shorter decimals must be larger, thereby adopting a new misconception. This highlights the need for more refined tracing of the student's dynamic learning states in a digital learning environment. While the standard KC modeling technique can track when students make an intended mistake (e.g., longer decimals are larger), it does not investigate their specific input to see whether a new misconception (e.g., shorter decimals are larger) has emerged. To address this issue, future iterations of the game should provide more instructional support that can react to various misconceptions from students, for example via explanatory feedback [19] or predefined error messages for different types of error [36].
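As a sketch of what such misconception-sensitive diagnosis could look like, the snippet below compares a student's incorrect answer on a two-decimal comparison against the answer each misconception would predict; the question format and all names here are our own illustration, not an existing Decimal Point feature.

```python
def predicted_larger(a, b, misconception):
    """The decimal string a given misconception predicts as larger."""
    if misconception == "megz":   # longer decimals are larger
        return a if len(a) > len(b) else b
    if misconception == "segz":   # shorter decimals are larger
        return a if len(a) < len(b) else b
    raise ValueError(misconception)

def diagnose(a, b, student_answer):
    """Misconceptions consistent with an incorrect 'which is larger' answer."""
    correct = a if float(a) > float(b) else b
    if student_answer == correct:
        return []
    return [m for m in ("megz", "segz")
            if predicted_larger(a, b, m) == student_answer]

# diagnose("0.25", "0.7", "0.25") -> ["megz"]: the student's input matches the
# "longer decimals are larger" misconception, so feedback can target it.
```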
Once students have mastered a skill, however, our analysis showed that over-practice was very common, i.e., students kept playing more mini-games in the mastered skill. At the same time, only 26 out of 159 students mastered all five skills, suggesting that the majority of students still had room for improvement in the unmastered skills but chose not to practice them. One possible reason is that the game environment did not explicitly indicate when the student had reached mastery or force them to switch to practicing a different skill. Consequently, young students, who were likely to be weak at self-regulated learning [37, 53], simply played the mini-games that they thought were engaging, which in this case involved the skills they had already mastered. A prior study by [29] similarly found that, in a game about locating fractions on a number line, students were more engaged when the game was easier, contradicting game design theories that optimal engagement would occur at a moderate difficulty level.

5.2 Investigating factors related to posttest and delayed posttest performance
We saw that our linear regression models were able to predict posttest and delayed posttest performance well, capturing about 75% of the variance in test scores with only 3-5 features. The three features present in both models are pretest score, Sorting mastery and Bucket mastery. The inclusion of pretest score is not surprising, as it is consistent with the standard practice of controlling for prior knowledge when analyzing posttest scores [58]. On the other hand, both Sorting mastery and Bucket mastery suggest that the ability to compare decimal numbers plays a large role in test performance. This is likely due to the game and test materials focusing on the four most common decimal misconceptions (Megz, Segz, Pegz, Negz), three of which are related to decimal comparison [25]. Based on the distribution of practice opportunities until mastery, however, students took many more attempts to master Sorting problems than Bucket problems, which may explain why they did not achieve high scores on the posttest and delayed posttest, averaging only around 30 out of 52 points [22]. Therefore, improving students' performance on Sorting problems, potentially by incorporating hints and error messages as we previously discussed, is crucial in future studies of the game.
At the same time, we saw that Number Line mastery had a significant positive association with delayed posttest score, but was not selected in the posttest model. An interpretation of this result is that Number Line tasks, which we identified as among the most difficult in the game, could be at a desirable difficulty level, which can promote deeper and longer-lasting learning than more straightforward tasks [61]. For instance, a prior study comparing erroneous examples and problem-solving decimal tasks found that erroneous examples, which are more aligned with the desirable difficulty, led to significantly higher delayed posttest scores but similar posttest scores [34]. In our case, we also saw that Number Line is an important feature for predicting delayed posttest but not posttest performance.

Similar to Number Line mastery, gender (male = 0, female = 1) was not a feature in the posttest model, but had a positive association with delayed posttest scores. In other words, with other factors being equal, females could achieve higher delayed posttest scores than males. While this association is only marginally significant (p = .074), similar findings about females' tendency to outperform males in retention and delayed posttests have been reported in previous mathematics intervention studies [1, 20]. Using the same dataset as in this work, [22] also found that females demonstrated significantly higher pre-post and pre-delayed learning gains than males, with a larger effect size in pre-delayed learning gains. Therefore, an important next step is to conduct future studies of Decimal Point with a larger sample size to draw more conclusive findings about whether the game promotes more retention in females and what could lead to this effect.

5.3 Investigating factors related to enjoyment
Our enjoyment prediction models did not perform as well as the learning models and could explain only about 20% of the variance in game engagement and affective engagement. These poor model fits likely result from the lack of appropriate features in our data. To track student engagement, previous work has emphasized the use of fine-grained measures such as time spent on decision making [47], social engagement profile [49] and interaction traces [6]; in contrast, our feature set consists mainly of quantitative scores (e.g., Likert responses) and aggregate data (e.g., error count). Related to this direction, a previous study of Decimal Point by [57] clustered students based on their mini-game selection orders and found that the cluster which demonstrated more agency reported higher enjoyment. Adopting their method of encoding students' mini-game sequences is a good first step in building more fine-grained features for our prediction tasks. On the other hand, the lack of association between our in-game learning measures (e.g., skill mastery, over-practice opportunity count, error count) and game engagement or affective engagement implies that students' game performance, whether good or bad, was unlikely to yield any negative emotion such as confusion or frustration. This is a positive outcome, indicating that our game environment does not impose any performance pressure on students – one of the primary principles of learning games [15].

At the same time, we did find that a linear regression model was able to predict achievement emotion reasonably well from students' identification agency, intrinsic agency, math interest, computer efficacy, pretest score and opportunity count. Identification and intrinsic agency indicate that, with all other factors being equal, the more students identified their learning as coming from intrinsic motivation (rather than external pressures), the more achievement they felt after learning. Math interest and computer efficacy suggest that students' acquaintance with the learning domain or medium could also be positively associated with achievement emotion [26]. On the other hand, pretest score had a negative association, likely because students with lower prior knowledge were able to learn more from the game and therefore felt more achievement than those with high prior knowledge. Similarly, for opportunity count, a plausible reason for students choosing to play more mini-game rounds is that they felt the mini-games were helpful, which contributed to their achievement emotion after game play. Overall, the features we identified could serve as a guideline for promoting achievement emotion in learning games and in more general instructional contexts.

6. CONCLUSIONS
From our analyses, we gained several insights into students' learning outcomes and enjoyment in Decimal Point. First, we found that Sorting and Number Line are important skills for posttest and delayed posttest performance, but students required more instructional support to effectively master them. Second, very few students mastered all five decimal skills from the game, while the majority engaged in over-practice, likely due to their preference for playing easy mini-games, i.e., those they had already mastered. Third, expanding on prior findings about gender effects in Decimal Point [22, 33], we identified a trend of females outperforming males in the delayed posttest, which should be investigated with a larger sample size. Fourth, we learned that students' achievement emotion can be reasonably captured by their level of computer efficacy, learning motivation, prior knowledge and number of mini-game rounds. All of these insights can be derived from log data alone and would serve as useful metrics to assist digital learning game researchers in evaluating and improving their own games. For Decimal Point, in particular, an important next step is to perform similar analyses in other studies of the game to see which of our findings can be replicated. Identifying consistent trends in student data could allow us to construct a more generalized model of students' game play that combines existing theories with novel exploratory analyses [38].

In a broader context, we have seen the rapid growth of digital learning games in recent years, from being conceived as a novel learning platform [15, 21] to having their effectiveness validated by rigorous studies [10]. The game Decimal Point, in particular, has been shown to significantly improve students' learning across several research works [18, 22, 35, 39]. When viewed from a learning analytics perspective, however, one could identify room for improvement that would otherwise not be reflected in pretest and posttest scores alone. For instance, a game may not adequately support all of its learning objectives, or students may engage in non-optimal learning behavior due to a lack of self-regulation. At the heart of these issues is the question of how digital learning games can optimize student learning while retaining their core value as playful environments, where players are free to exercise their agency. Addressing this question is an important step for future work in the field.
7. REFERENCES
[1] J. Ajai and B. Imoko. Gender differences in mathematics achievement and retention scores: A case of problem-based learning method. International Journal of Research in Education and Science, 1(1):45–50, 2015.
[2] E. Andersen, E. O'Rourke, Y.-E. Liu, R. Snider, J. Lowdermilk, D. Truong, S. Cooper, and Z. Popovic. The impact of tutorials on games of varying complexity. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 59–68, 2012.
[3] R. S. Baker. Personal correspondence, 2019.
[4] R. S. Baker, M. J. Habgood, S. E. Ainsworth, and A. T. Corbett. Modeling the acquisition of fluent skill in educational action games. In International Conference on User Modeling, pages 17–26. Springer, 2007.
[5] A. Ben-Eliyahu, D. Moore, R. Dorph, and C. D. Schunn. Investigating the multidimensionality of engagement: Affective, behavioral, and cognitive engagement across science activities and contexts. Contemporary Educational Psychology, 53:87–105, 2018.
[6] P. Bouvier, K. Sehaba, and É. Lavoué. A trace-based approach to identifying users' engagement and qualifying their engaged-behaviours in interactive systems: Application to a social game. User Modeling and User-Adapted Interaction, 24(5):413–451, 2014.
[7] J. H. Brockmyer, C. M. Fox, K. A. Curtiss, E. McBroom, K. M. Burkhart, and J. N. Pidruzny. The development of the game engagement questionnaire: A measure of engagement in video game-playing. Journal of Experimental Social Psychology, 45(4):624–634, 2009.
[8] H. Cen, K. R. Koedinger, and B. Junker. Is over practice necessary? Improving learning efficiency with the cognitive tutor through educational data mining. Frontiers in Artificial Intelligence and Applications, 158:511, 2007.
[9] C.-H. Chen, K.-C. Wang, and Y.-H. Lin. The comparison of solitary and collaborative modes of game-based learning on students' science learning and motivation. Journal of Educational Technology & Society, 18(2):237–248, 2015.
[10] D. B. Clark, E. E. Tanner-Smith, and S. S. Killingsworth. Digital games, design, and learning: A systematic review and meta-analysis. Review of Educational Research, 86(1):79–122, 2016.
[11] G. C. Delacruz, G. K. Chung, and E. L. Baker. Validity evidence for games as assessment environments. CRESST Report 773. National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2010.
[12] A. L. Duckworth, C. Peterson, M. D. Matthews, and D. R. Kelly. Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6):1087, 2007.
[13] A. M. Durik, M. Vida, and J. S. Eccles. Task values and ability beliefs as predictors of high school literacy choices: A developmental analysis. Journal of Educational Psychology, 98(2):382, 2006.
[14] W. Fan and C. A. Wolters. School motivation and high school dropout: The mediating role of educational expectation. British Journal of Educational Psychology, 84(1):22–39, 2014.
[15] J. P. Gee. What video games have to teach us about learning and literacy. Computers in Entertainment (CIE), 1(1):20–20, 2003.
[16] J. F. Hair, W. C. Black, B. J. Babin, R. E. Anderson, R. L. Tatham, et al. Multivariate Data Analysis (Vol. 6), 2006.
[17] E. Harpstead and V. Aleven. Using empirical learning curve analysis to inform design in an educational game. In Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play, pages 197–207, 2015.
[18] E. Harpstead, J. E. Richey, H. Nguyen, and B. M. McLaren. Exploring the subtleties of agency and indirect control in digital learning games. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge, pages 121–129, 2019.
[19] J. Hattie and H. Timperley. The power of feedback. Review of Educational Research, 77(1):81–112, 2007.
[20] L. L. Haynes and J. V. Dempsey. How and why students play computer-based mathematics games: A consideration of gender differences. 2001 Annual Proceedings, Atlanta, page 178.
[21] M. A. Honey and M. L. Hilton. Learning Science Through Computer Games. National Academies Press, Washington, DC, 2011.
[22] X. Hou, H. Nguyen, J. E. Richey, and B. M. McLaren. Exploring how gender and enjoyment impact learning in a digital learning game. In International Conference on Artificial Intelligence in Education. Springer, 2020.
[23] C. S. Hulleman, O. Godes, B. L. Hendricks, and J. M. Harackiewicz. Enhancing interest and performance with a utility value intervention. Journal of Educational Psychology, 102(4):880, 2010.
[24] A. Illanas Vila, J. R. Calvo-Ferrer, F. J. Gallego-Durán, F. Llorens Largo, et al. Predicting student performance in foreign languages with a serious game. 2013.
[25] S. Isotani, D. Adams, R. E. Mayer, K. Durkin, B. Rittle-Johnson, and B. M. McLaren. Can erroneous examples help middle-school students learn decimals? In European Conference on Technology Enhanced Learning, pages 181–195. Springer, 2011.
[26] M. Jansen, O. Lüdtke, and U. Schroeders. Evidence for a positive relation between interest and achievement: Examining between-person and within-person variation in five domains. Contemporary Educational Psychology, 46:116–127, 2016.
[27] S. Karumbaiah, R. S. Baker, and V. Shute. Predicting quitting in students playing a learning game. International Educational Data Mining Society, 2018.
[28] K. R. Koedinger, E. Brunskill, R. S. Baker, E. A. McLaughlin, and J. Stamper. New potentials for data-driven intelligent tutoring system development and optimization. AI Magazine, 34(3):27–41, 2013.
[29] D. Lomas, K. Patel, J. L. Forlizzi, and K. R. Koedinger. Optimizing challenge in an educational game using large-scale design experiments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 89–98, 2013.
[30] M. Manske and C. Conati. Modelling learning in an educational game. In AIED, pages 411–418, 2005.
[31] G. Marakas, R. Johnson, and P. F. Clay. The evolving nature of the computer self-efficacy construct: An empirical investigation of measurement construction, validity, reliability and stability over time. Journal of the Association for Information Systems, 8(1):2, 2007.
[32] R. E. Mayer. Computer Games for Learning: An Evidence-Based Approach. MIT Press, 2014.
[33] B. McLaren, R. Farzan, D. Adams, R. Mayer, and J. Forlizzi. Uncovering gender and problem difficulty effects in learning with an educational game. In International Conference on Artificial Intelligence in Education, pages 540–543. Springer, 2017.
[34] B. M. McLaren, D. M. Adams, and R. E. Mayer. Delayed learning effects with erroneous examples: A study of learning decimals with a web-based tutor. International Journal of Artificial Intelligence in Education, 25(4):520–542, 2015.
[35] B. M. McLaren, D. M. Adams, R. E. Mayer, and J. Forlizzi. A computer-based game that promotes mathematics learning more than a conventional approach. International Journal of Game-Based Learning (IJGBL), 7(1):36–56, 2017.
[36] B. M. McLaren, S.-J. Lim, D. Yaron, and K. R. Koedinger. Can a polite intelligent tutoring system lead to improved learning outside of the lab? Frontiers in Artificial Intelligence and Applications, 158:433, 2007.
[37] J. Metcalfe and N. Kornell. The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132(4):530, 2003.
[38] R. J. Mislevy, J. T. Behrens, K. E. Dicerbo, and R. Levy. Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining, 4(1):11–48, 2012.
[39] H. Nguyen, E. Harpstead, Y. Wang, and B. M. McLaren. Student agency and game-based learning: A study comparing low and high agency. In International Conference on Artificial Intelligence in Education, pages 338–351. Springer, 2018.
[40] H. Nguyen, Y. Wang, J. Stamper, and B. M. McLaren. Using knowledge component modeling to increase domain understanding in a digital learning game. In International Conference on Educational Data Mining, pages 139–148, 2019.
[41] M. Ninaus, K. Moeller, J. McMullen, and K. Kiili. Acceptance of game-based learning and intrinsic motivation as predictors for learning success and flow experience. 2017.
[42] Z. Peddycord-Liu, R. Harred, S. Karamarkovich, T. Barnes, C. Lynch, and T. Rutherford. Learning curve analysis in a large-scale, drill-and-practice serious math game: Where is learning support needed? In International Conference on Artificial Intelligence in Education, pages 436–449. Springer, 2018.
[43] R. Pekrun. Progress and open problems in educational emotion research. Learning and Instruction, 15(5):497–506, 2005.
[44] P. Pintrich, D. Smith, T. Garcia, and W. McKeachie. A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). National Center for Research to Improve Postsecondary Teaching and Learning, Ann Arbor, MI, pages 1–76, 1991.
[45] S. Raschka. Mlxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack. The Journal of Open Source Software, 3(24), Apr. 2018.
[46] L. B. Resnick, P. Nesher, F. Leonard, M. Magone, S. Omanson, and I. Peled. Conceptual bases of arithmetic errors: The case of decimal fractions. Journal for Research in Mathematics Education, pages 8–27, 1989.
[47] V. Riemer and C. Schrader. Impacts of behavioral engagement and self-monitoring on the development of mental models through serious games: Inferences from in-game measures. Computers in Human Behavior, 64:264–273, 2016.
[48] J. P. Rowe and J. C. Lester. Modeling user knowledge with dynamic Bayesian networks in interactive narrative environments. In Sixth AI and Interactive Digital Entertainment Conference, 2010.
[49] J. A. Ruiperez-Valiente, M. Gaydos, L. Rosenheck, Y. J. Kim, and E. Klopfer. Patterns of engagement in an educational massive multiplayer online game: A multidimensional view. IEEE Transactions on Learning Technologies, 2020.
[50] R. M. Ryan and J. P. Connell. Perceived locus of causality and internalization: Examining reasons for acting in two domains. Journal of Personality and Social Psychology, 57(5):749, 1989.
[51] J. L. Sabourin, L. R. Shores, B. W. Mott, and J. C. Lester. Understanding and predicting student self-regulated learning strategies in game-based learning environments. International Journal of Artificial Intelligence in Education, 23(1-4):94–114, 2013.
[52] R. Sawyer, A. Smith, J. Rowe, R. Azevedo, and J. Lester. Is more agency better? The impact of student agency on game-based learning. In International Conference on Artificial Intelligence in Education, pages 335–346. Springer, 2017.
[53] W. Schneider. The development of metacognitive knowledge in children and adolescents: Major trends and implications for education. Mind, Brain, and Education, 2(3):114–121, 2008.
[54] V. J. Shute, L. Wang, S. Greiff, W. Zhao, and G. Moore. Measuring problem solving skills via stealth assessment in an engaging video game. Computers in Human Behavior, 63:106–117, 2016.
[55] J. Stamper, K. Koedinger, R. S. J. d. Baker, A. Skogsholm, B. Leber, J. Rankin, and S. Demi. PSLC DataShop: A data analysis service for the learning science community. In International Conference on Intelligent Tutoring Systems, pages 455–455. Springer, 2010.
[56] W. Van Dooren, D. De Bock, A. Hessels, D. Janssens, and L. Verschaffel. Remedying secondary school students' illusion of linearity: A teaching experiment aiming at conceptual change. Learning and Instruction, 14(5):485–501, 2004.
[57] Y. Wang, H. Nguyen, E. Harpstead, J. Stamper, and B. M. McLaren. How does order of gameplay impact learning and enjoyment in a digital learning game? In International Conference on Artificial Intelligence in Education, pages 518–531. Springer, 2019.
[58] B. E. Whitley and M. E. Kite. Principles of Research in Behavioral Science. Routledge, 2013.
[59] J. Wiemeyer, M. Kickmeier-Rust, and C. M. Steiner. Performance assessment in serious games. In Serious Games, pages 273–302. Springer, 2016.
[60] M. V. Yudelson, K. R. Koedinger, and G. J. Gordon. Individualized Bayesian knowledge tracing models. In International Conference on Artificial Intelligence in Education, pages 171–180. Springer, 2013.
[61] C. L. Yue, E. L. Bjork, and R. A. Bjork. Reducing verbal redundancy in multimedia learning: An undesired desirable difficulty? Journal of Educational Psychology, 105(2):266, 2013.
