
AI-Tutoring in Software Engineering Education

Experiences with Large Language Models in Programming Assessments

Eduard Frankford (University of Innsbruck, Austria), Clemens Sauerwein (University of Innsbruck, Austria), Patrick Bassner (Technical University of Munich, Germany), Stephan Krusche (Technical University of Munich, Germany), Ruth Breu (University of Innsbruck, Austria)

arXiv:2404.02548v2 [cs.SE] 5 Apr 2024

ABSTRACT

With the rapid advancement of artificial intelligence (AI) in various domains, the education sector is set for transformation. The potential of AI-driven tools in enhancing the learning experience, especially in programming, is immense. However, the scientific evaluation of Large Language Models (LLMs) used in Automated Programming Assessment Systems (APASs) as AI-Tutors remains largely unexplored. There is therefore a need to understand how students interact with such AI-Tutors and to analyze their experiences.

In this paper, we conducted an exploratory case study by integrating the GPT-3.5-Turbo model as an AI-Tutor within the APAS Artemis. Through a combination of empirical data collection and an exploratory survey, we identified different user types based on their interaction patterns with the AI-Tutor. The findings highlight advantages such as timely feedback and scalability, but also challenges such as generic responses and students' concerns that using the AI-Tutor might inhibit their learning progress. This research adds to the discourse on AI's role in education.

KEYWORDS

Programming Education, Automated Programming Assessment Systems, Artificial Intelligence, ChatGPT, OpenAI, ChatBots

1 INTRODUCTION

The recent rise of artificial intelligence (AI) has resulted in transformative changes across various sectors. In healthcare, AI has enabled advanced diagnostics and personalized treatments [9]. In finance, algorithmic trading and fraud detection have been revolutionized [7], and the automotive industry is on the brink of a new era with the development of autonomous vehicles [34]. In the educational sector, we have seen first applications of AI through Intelligent Tutoring Systems (ITS) [4]. ITSs offer personalized learning experiences, yet their reliance on limited training data confines their applicability to specific scenarios [4]. This limitation not only escalates development costs but also restricts the scope and depth of feedback, hindering their broader adoption in diverse educational contexts.

Recently, with the introduction of ChatGPT, we have entered the age of accessible generative AI (GenAI) and large language models (LLMs). LLMs are trained on vast amounts of diverse data and can therefore generate nuanced, comprehensive, and context-aware feedback [21]. Beyond unit test feedback, LLMs like OpenAI's GPT-3.5-Turbo or GPT-4 have the potential to recognize a broader spectrum of student mistakes and offer tailored guidance. Such capabilities can address the shortcomings of traditional ITSs and expand the horizon for feedback mechanisms within programming education. While the integration of LLMs into various tools and sectors is well documented, their specific application in programming education, especially in the form of an AI-Tutor within APASs, remains mostly unexplored.

To address this gap, we seek to answer the following research questions:
(1) RQ1: What is the nature of student interaction with Automated Programming Assessment Systems when facilitated by an AI-Tutor?
(2) RQ2: How do students experience AI-driven feedback in Automated Programming Assessment Systems?
(3) RQ3: What are the lessons learned after implementing and operating an AI-Tutor within an Automated Programming Assessment System?

As a first step, the primary objective is to explore the effectiveness and implications of integrating an AI-Tutor based on GenAI, specifically OpenAI's GPT-3.5-Turbo model, into an APAS [13]. This approach combines empirical data collection with an exploratory survey. As part of the empirical investigation, we closely monitored the AI-Tutor's usage, student interactions, code submissions, and feedback timings. Additionally, we analyzed code changes between submissions to understand student engagement patterns with the AI-Tutor. The preliminary findings suggest that the AI-Tutor offers unique benefits, but that there is still a long way to go to fully optimize the student learning experience.

The remainder of this paper is structured as follows: Section 2 provides an overview of related work. Section 3 elaborates on the research techniques. Section 4 presents the main findings of this study, which are further discussed in Section 5. Section 6 outlines potential constraints of this study, and we conclude in Section 7, summarizing the main insights and reflecting on the broader implications of this research.

2 RELATED WORK

ITSs have long been a subject of interest in the realm of programming education [1]. These systems are generally designed to deliver instructional content in a way that is tailored to individual learners, adapting to a student's needs [4]. Experiments have shown that these systems can have effects similar to human tutoring [18]. As a result, many ITSs have been created for programming education [1, 3, 10]. Adaptive or intelligent feedback is a common feature, but this feedback is mainly generated by extensive unit testing [4].

Besides unit testing, the application of machine learning to emulate human feedback is no recent advancement: the first chat-bot was introduced over 50 years ago [2]. Since then, chat-bots have become more and more intelligent [17, 32]. However, they are normally trained on questions the creators expect users to ask; this is changing with the introduction of ChatGPT [5].

Rudolph et al. conducted one of the first extensive literature reviews on ChatGPT, focusing on its relevance for higher education, especially student assessment, student learning, and teaching [26]. They found that ChatGPT makes it possible to simulate the assistance provided by a tutor, such as personalised help in solving problems. Ray focused on the applications of ChatGPT across various domains and found, among other things, that it has potential for personalizing learning by analyzing data on students' learning preferences, strengths, and weaknesses [25]. Kasneci et al. discuss the opportunities and challenges of using generative AI tools like ChatGPT in education [12], pointing out the opportunity to provide personalized feedback to students.

The literature also documents the effective use of ChatGPT for improving source code. For example, Surameery and Shakor explored the use of ChatGPT to solve programming bugs [30]. Specifically, they examined how the model can provide debugging assistance, bug prediction, and bug explanation to help solve programming problems. They conclude that ChatGPT can play an important role in solving programming bugs, but that it is not a perfect solution and should be seen as an additional debugging tool. Sobania et al. analyzed the automatic bug-fixing performance of ChatGPT on the bug-fixing benchmark set QuixBugs [28]. They found that ChatGPT's bug-fixing performance is notably better than other state-of-the-art approaches, solving 31 out of 40 bugs. Other researchers, like Ouh et al. and Tian et al., conducted empirical analyses of ChatGPT's potential as a programming assistant, focusing on code generation, program repair, and code summarization [22, 31]. Tian et al. found that ChatGPT can hint surprisingly well at the original intention behind what a correct version of a program should look like [31].

Pardos and Bhandari compared the efficacy of hints authored by human tutors with hints generated by ChatGPT for elementary and intermediate algebra [23]. They found that 79% of the hints produced by ChatGPT passed a manual quality test. Additionally, Lo investigated how ChatGPT performs in different subject domains [21] and found that its overall performance regarding programming ranged from outstanding to satisfactory [29]. However, regarding software testing, it was able to answer only 55.6% of the questions partially correctly [11].

Industry has also recognized the value of generative AI, with EdTech organizations developing AI-based solutions that help students with their coursework and give educators ideas for lessons [16]. Kshetri reports that Quizlet launched an AI-Tutor, Q-Chat, which combines ChatGPT with Quizlet's educational content library [16]. Furthermore, Khan Academy started using AI to create a chat-bot based on the GPT-4 model, designed so that students can ask it for assistance without the tool revealing the solution, helping them solve the exercise instead [16].

In summary, existing literature has explored traditional ITSs and the general capabilities of ChatGPT in various domains, and industry has already implemented sophisticated tools using generative AI. However, an exploratory understanding of its practical application as an AI-Tutor within APASs is still missing. The empirical studies have primarily focused on the model's ability to debug, generate code, and provide hints, and were therefore able to state that large language models can be used for specific tasks, like tutoring. However, none have actually implemented such a system, and therefore the real student experience, interaction patterns, and perceptions when using ChatGPT as an AI-Tutor in APASs have not been scientifically investigated. This study seeks to offer a scientific evaluation of the integration of a GPT-based AI-Tutor in APASs, demonstrating that its tutoring capabilities, as proposed in the literature, can be realized in practice.

3 METHODOLOGICAL APPROACH

This study is guided by a set of research questions (RQ1, RQ2, and RQ3) defined in Section 1. To address them, we implemented a three-stage methodological approach:
(1) Integration of the AI-Tutor within an APAS: Initial integration into the Artemis platform [15, 19].
(2) Practical application by students: Students solved a specific programming task on the platform.
(3) Exploratory survey: A survey targeting students of the "Introduction to Programming" course at the University of Innsbruck to collect their experiences with the AI-Tutor.
The following sections explain the details of each of these stages.

3.1 Integration of the AI-Tutor within an APAS

We implemented the AI-Tutor to collect data. This included developing a prototype that integrates the APAS Artemis [14, 15] with OpenAI's GPT-3.5-Turbo model. We chose Artemis as the APAS for this study for several reasons:
(1) Open Source: Artemis is available as an open-source project on GitHub, which makes this research reproducible.¹
(2) Functional Scope: Artemis provides all the basic features necessary for an APAS, including automatic exercise evaluation via test-driven feedback, which improves the external validity of the findings [27].
(3) Online Editor: Artemis allows students to solve exercises online via a built-in code editor. This made the implementation of the AI-Tutor and the data collection easier.
(4) Large User Base: Artemis is used by more than ten universities, including TU Munich and the University of Innsbruck, and therefore by thousands of students every semester. Improving the Artemis APAS is thus directly beneficial for a large user base.

¹ https://github.com/ls1intum/Artemis

Figure 1: Sequence diagram of the usage workflow of the Artemis system extended by the AI-Tutor functionality.

Figure 2: Artemis code editor with the button to "View AI Feedback".

Before the integration of the AI-Tutor, the workflow for using Artemis for programming exercises was the following [15]:
(1) Instructors prepare an exercise: This mainly involves creating an exercise description, a template file, a sample solution, and unit tests for the code submitted by the students.
(2) Students solve an exercise: Students write code to solve the problem statement using the integrated online editor offered by the platform. When students submit a solution attempt, the code of the submission is stored in a version control system; for this study, it was GitLab.
(3) System returns feedback: For each submission, a build pipeline is triggered that executes the test cases written by the instructors and returns the test results with individual messages as feedback to the students.

The integration of the AI-Tutor extended this workflow with an additional possibility to request feedback from the AI-Tutor. The extended workflow is depicted in Figure 1.

Figure 2 shows the Artemis code editor with the new possibility to request AI feedback by clicking the "View AI Feedback" button displayed on the top right.

Once this request is sent to the server, the current solution of the student, the exercise description, and the sample solution are retrieved from the APAS' database, and an API call containing the following information is sent to the OpenAI servers:
(1) Model: The requested LLM, in our case GPT-3.5-Turbo.
(2) Message: The prompt the LLM should take into consideration; see Listing 1.
(3) Temperature: We set the temperature to 0.7 to balance predictability and creativity in the LLM's responses. This level ensures relevant feedback with sufficient variability for exploring diverse solutions.²

Having sent this API call to the OpenAI servers, we extracted the response of the LLM and displayed it in a pop-up window without any further modifications, as shown in Figure 3. All code files needed to adapt Artemis accordingly can be found on Figshare.³

Figure 3: AI-generated feedback displayed in a pop-up.

The language model receives the prompt shown in Listing 1, where {language}, {description}, {current}, and {solution} are placeholders:

Listing 1: GPT-3.5-Turbo prompt

Act as a programming tutor and give informal feedback in {language} to the student.
The exercise description is the following:
{description}
The students code looks like that at the moment:
{current}
Do not provide a code solution.
The optimal solution should look like that:
{solution}
Important: Do not provide code.

The prompt starts with the main instruction telling the LLM to act as a tutor. {language} is the currently selected language in Artemis; this value can be either English or German.

² https://platform.openai.com/docs/guides/text-generation/how-should-i-set-the-temperature-parameter
³ https://figshare.com/s/636a9c5ff8f2c8315f26

{description} is the task to be solved by the student. For this study, the students had to implement Pascal's triangle: the task was to implement functions to generate, display, and release the memory for a portion of Pascal's triangle [8]. In addition to the task description, the students were given a starting template with method stubs to start from.
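The exact starter template is not reproduced in this paper. Listing 2 is a minimal C sketch of what such an exercise could look like, with a filled-in reference implementation instead of empty stubs; the function names and signatures are illustrative assumptions, not the actual template handed to the students.

Listing 2: Illustrative C solution for the Pascal's triangle exercise

#include <stdio.h>
#include <stdlib.h>

/* Allocate and fill the first n rows of Pascal's triangle. Row i has
   i + 1 entries; each inner entry is the sum of the two entries above
   it, with 1s on the borders. (Error handling omitted for brevity.) */
unsigned long **generate_triangle(int n) {
    unsigned long **rows = malloc(n * sizeof *rows);
    for (int i = 0; i < n; i++) {
        rows[i] = malloc((i + 1) * sizeof **rows);
        rows[i][0] = rows[i][i] = 1;
        for (int j = 1; j < i; j++)
            rows[i][j] = rows[i - 1][j - 1] + rows[i - 1][j];
    }
    return rows;
}

/* Print the first n rows, one row per line. */
void display_triangle(unsigned long **rows, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++)
            printf("%lu ", rows[i][j]);
        printf("\n");
    }
}

/* Release all memory allocated by generate_triangle. */
void free_triangle(unsigned long **rows, int n) {
    for (int i = 0; i < n; i++)
        free(rows[i]);
    free(rows);
}

int main(void) {
    unsigned long **triangle = generate_triangle(6);
    display_triangle(triangle, 6);
    free_triangle(triangle, 6);
    return 0;
}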
We have chosen the Pascal's triangle exercise to test the AI-Tutor because:
(1) Foundational programming constructs: Implementing Pascal's triangle touches upon many foundational programming concepts such as loops, conditionals, arrays, and, in some languages, dynamic memory management. If an AI can give valuable feedback on this exercise, this indicates its capability to understand and instruct on tasks involving these foundational concepts.
(2) Algorithmic thinking: Creating Pascal's triangle involves iterative and recursive thinking. This showcases the AI's capability to handle a diverse range of algorithmic challenges, as many other algorithmic problems involve similar patterns of thought.
(3) Concept overlap: Many problems in computer science and mathematics share concepts with Pascal's triangle, e.g., the binomial expansion and combinatorics. Successful tutoring here indicates the AI's potential to generalize its capabilities to related problems.
(4) Versatility in problem complexity: Pascal's triangle can be approached in multiple ways. If the AI-Tutor can manage the range of solutions for this problem, this suggests its robustness in tutoring exercises with different levels of complexity.
(5) Debugging and problem solving: Common mistakes are possible in implementing Pascal's triangle. An AI-Tutor's ability to diagnose and correct these signifies its potential to generalize this capability to other programming challenges.

The last two parts of the prompt, {current} and {solution}, represent the current solution of the student and the optimal solution defined by the exercise creator, respectively.
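To make the request flow concrete, Listing 3 sketches in C (the course language) how the filled-in prompt could be assembled and sent to the OpenAI chat-completions endpoint with libcurl. This is an illustration under stated assumptions, not the production Artemis integration: the build_prompt helper, the placeholder contents, and the hard-coded strings are hypothetical, and proper JSON escaping of the prompt is omitted for brevity.

Listing 3: Illustrative sketch of the API call to the OpenAI servers

#include <stdio.h>
#include <curl/curl.h>

/* Hypothetical helper: fill the Listing 1 template with the exercise
   description, the student's current code, and the sample solution. */
static void build_prompt(char *buf, size_t len, const char *language,
                         const char *description, const char *current,
                         const char *solution) {
    snprintf(buf, len,
             "Act as a programming tutor and give informal feedback in %s "
             "to the student. The exercise description is the following: %s "
             "The students code looks like that at the moment: %s "
             "Do not provide a code solution. "
             "The optimal solution should look like that: %s "
             "Important: Do not provide code.",
             language, description, current, solution);
}

int main(void) {
    char prompt[8192];
    build_prompt(prompt, sizeof prompt, "English",
                 "Implement Pascal's triangle ...",  /* exercise description */
                 "int main(void) { ... }",           /* student's current code */
                 "...");                             /* sample solution */

    /* Request body: the three pieces of information described above,
       i.e., model, message, and temperature. */
    char body[16384];
    snprintf(body, sizeof body,
             "{\"model\": \"gpt-3.5-turbo\", \"temperature\": 0.7,"
             " \"messages\": [{\"role\": \"user\", \"content\": \"%s\"}]}",
             prompt);

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (curl) {
        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "Content-Type: application/json");
        /* Replace with a real API key before running. */
        headers = curl_slist_append(headers, "Authorization: Bearer YOUR_KEY");

        curl_easy_setopt(curl, CURLOPT_URL,
                         "https://api.openai.com/v1/chat/completions");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

        /* Without a write callback, libcurl prints the JSON response,
           which contains the feedback text, to stdout. */
        if (curl_easy_perform(curl) != CURLE_OK)
            fprintf(stderr, "request failed\n");

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}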
To integrate the AI-Tutor, we chose not to enable direct interactions with the model, for the following reasons:
(1) Easier usability: The assumption was that predetermining the prompt would simplify the user experience. By eliminating the chat-bot-style interaction, we sidestepped the necessity for students to formulate a question, thus streamlining their interactions.
(2) Controlled environment: A predefined interaction model provides a more controlled setting, thereby simplifying measures taken to prevent students from receiving the solution to the exercise via prompt engineering [20].
(3) Quality assurance: With a static model, we were able to ensure that the AI-Tutor offers consistent, pedagogically sound feedback in line with the course's learning objectives.
(4) Data privacy: Direct interactions could inadvertently lead students to input sensitive or personal information. A static model minimizes this risk, adhering better to data privacy standards.
(5) Resource efficiency: Direct, dynamic interactions with the system may consume more resources, because the chat history would have to be given as context to enable meaningful conversations, leading to higher costs as more tokens are used.
(6) Reduction of over-reliance: By limiting direct interactions, students were encouraged to think critically and not over-rely on the AI-Tutor for every minor query or challenge.

When the student presses the "View AI Feedback" button, we store the following information in the database (Listing 4 below sketches a corresponding record):
(1) Code: The current solution of the student.
(2) Feedback: The feedback returned by the LLM.
(3) User: A user identifier to identify each request.
(4) File: The file on which the student is working.
(5) Timestamp: The current time.
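As an illustration, the stored record could map onto a structure like the one in Listing 4; the field names are our own sketch, since the actual Artemis database schema is not shown here.

Listing 4: Illustrative record for one AI feedback request

#include <time.h>

/* One AI feedback request, as persisted when the student clicks
   "View AI Feedback" (illustrative field names). */
typedef struct {
    char  *code;      /* the student's current solution       */
    char  *feedback;  /* the feedback returned by the LLM     */
    char  *user_id;   /* identifier of the requesting student */
    char  *file;      /* the file the student is working on   */
    time_t timestamp; /* time of the request                  */
} ai_feedback_record;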


3.2 Exploratory Survey

We selected students from the "Introduction to Programming" tutorial as subjects for the experiment. This tutorial is part of the Bachelor in Computer Science curriculum at the University of Innsbruck and teaches first-year students the basics of the programming language C. A total of 23 students actively participated in the experiment. While this may seem like a modest sample size, the qualitative nature of this analysis allowed for a more in-depth understanding of individual experiences, making the size not only manageable but advantageous. Moreover, given their recent interactions with traditional, human tutors, these students were especially appropriate subjects for assessing an AI-Tutor.

Prior to the data collection and survey implementation, we introduced the students to the possibility of receiving feedback from the newly implemented AI-Tutor via Artemis.

In the subsequent tutorial, the students were tasked with solving the previously described Pascal's triangle task [8]. They had one week to solve the task. While they were solving the exercise, they were free to choose whether to use the new AI feedback functionality. However, when they pressed the "View AI Feedback" button, we stored the AI feedback data (code at feedback time, feedback returned by the LLM, user, file, and timestamp) in the database, and when they submitted their current code to the Artemis system, their current solution, test results, and timestamp were saved in the version control system connected to Artemis.

Finally, in the next course session, we allotted approximately 15 minutes for the students to complete a questionnaire. In the survey, we asked the following questions based on the Technology Acceptance Model (TAM) [6]. TAM is a theoretical model that includes two primary factors determining an individual's intention to use a technology: (1) Perceived Ease of Use (PEOU) and (2) Perceived Usefulness (PU). The model has been widely adopted in various fields to understand and predict the acceptance of newly implemented features.
(1) I find the AI-Tutor easy to use: This is directly related to the PEOU dimension of TAM. It seeks to collect the respondents' perceptions about the ease of interface and interaction with the AI-Tutor.
(2) Using the AI-Tutor for my tasks enables me to accomplish the tasks more quickly: This question is mainly about the efficiency offered by the AI-Tutor, which can be seen as a subset of PU as it implies the benefit of time saving.
(3) Using the AI-Tutor improves my performance: This touches on the PU dimension by gauging whether users feel they perform better in their tasks due to the AI-Tutor.
(4) Using the AI-Tutor for my tasks increases my productivity: Again, this is a question about PU. By increasing productivity, the AI-Tutor is seen as adding value for the user.
(5) Using the AI-Tutor makes it easier to do my tasks: This question is about both PEOU and PU. On one hand, it assesses ease of task accomplishment (PEOU); on the other, it speaks to the utility value of the AI-Tutor (PU).
(6) I find the AI-Tutor useful: This is a direct reflection of the PU dimension, asking the respondent to evaluate the overall usefulness of the AI-Tutor.

To obtain additional feedback, we asked the following open questions:
(1) What challenges did you encounter when utilizing the AI-Tutor?
(2) Do you have any further suggestions on how the AI-Tutor could be improved?

Both questions aim to uncover specific difficulties or obstacles that users have faced, helping to identify areas for improvement in the design or functionality of the AI-Tutor. Lastly, we asked questions about demographics, including the students' highest degree, current semester, and programming experience.

3.3 Data Analysis

The data analysis first involved combining two datasets: the data saved when AI feedback was requested and the data saved when the students submitted their solutions. The AI feedback dataset provided insights into the code at feedback time, the feedback from the AI model, user details, and timestamps. The student submissions included the code at submission time, the test results, and the respective timestamps.

For accuracy, students who neither solved the exercise nor engaged with the AI-Tutor were excluded from the qualitative analysis, allowing for an assessment of a total of 12 participants. This analysis was designed to identify patterns and insights from the student responses, ensuring a comprehensive understanding of their experiences with the AI-Tutor. It included:
(1) Temporal Coding: Students' submission and feedback request times were identified and marked in different colors to identify interaction patterns.
(2) Thematic Coding: Students' responses to the open-ended questions were read and re-read to identify common themes and patterns.
(3) Theme Development: The patterns were grouped under broader thematic categories, and a narrative was constructed around each theme. This involved interpreting the data within the context of this study's research questions.
Key insights derived from this qualitative analysis were essential in understanding the intricacies of the student interactions and experiences.

4 RESULTS

In the following, we present the results of the conducted analysis. This section is divided into three subsections, each addressing a specific research question.

4.1 Student Interaction

Overall, the following interaction patterns emerged from the analysis of the data. Four students neither made submissions to Artemis nor sought feedback from the AI-Tutor. One student made a single submission to Artemis without asking for any feedback from the AI-Tutor; in contrast, another student sought feedback from the AI-Tutor once yet did not submit anything to Artemis. Two students made several submissions to Artemis without seeking feedback from the AI-Tutor. Three further patterns were observed, each displayed by one student: making a single submission to Artemis and seeking feedback from the AI-Tutor once, making multiple submissions to Artemis and asking for feedback from the AI-Tutor once, and making a single submission to Artemis while seeking feedback from the AI-Tutor multiple times. It is particularly noteworthy that 12 students worked intensively with both systems, uploading numerous submissions to Artemis and frequently asking for feedback from the AI-Tutor.

Considering its significance, we primarily focus on the behavior of the 12 students who exhibited high interaction rates with both Artemis and the AI-Tutor. Figure 4 illustrates the timestamps at which the students asked the AI-Tutor or submitted their solution to the APAS. On the y-axis, each line corresponds to a student; the x-axis can be interpreted as a timeline starting on 2023-05-23 and ending on 2023-05-29. The red points indicate the times at which a student submitted code to the APAS; the remaining points indicate the exact times at which a student requested feedback from the AI-Tutor. The figure shows that there are mainly two different ways in which students interact with the AI-Tutor. Based on this timeline, we were able to derive two user personas.

4.1.1 Continuous Feedback - Iterative Ivy. Iterative Ivy represents students who use the AI-Tutor intensively before their initial submission to the APAS. These students often begin without a complete solution and turn to the AI-Tutor for guidance on understanding and solving the exercise. The AI-Tutor guides them through specific instructions covering aspects like function implementation, memory management, and value calculation. Over multiple feedback cycles, students refine their solutions. When the AI-Tutor's feedback shifts towards minor optimizations, students tend to transition to submitting their work to the APAS, aiming for a perfect score.

4.1.2 Alternating Feedback - Hybrid Harry. Hybrid Harry exemplifies students who alternate between AI-Tutor feedback and APAS submissions throughout their coding process. Typically, they begin their tasks by seeking initial insights from the AI-Tutor even before submitting a solution. Some send repeated requests for feedback on the same code segment, indicating potential uncertainties or the need for more explicit guidance. These students tend to submit their work to the APAS after establishing a foundation of their code. Notably, the AI-Tutor recognized incomplete or non-functional implementations, which students corrected after being told so by the AI-Tutor.


Figure 4: This figure shows the times at which each student asked the AI-Tutor or submitted a code solution to the APAS.

Main Findings for RQ 1
We identified two user personas: (1) Continuous Feedback - Iterative Ivy, who relied mainly on AI feedback before final submissions to the APAS, and (2) Alternating Feedback - Hybrid Harry, who alternately used the AI-Tutor and APAS submissions throughout the process.

4.2 Student Experience

Figure 5 shows the distribution of user responses based on the questionnaire defined in Section 3. The responses are presented as horizontal stacked bars. Each bar represents a different statement, and the segments of the bar represent the proportion of responses for each level of agreement. The position of the bars along the x-axis reflects the average sentiment of the responses, ranging from negative on the left to positive on the right. The zero point serves as a reference for interpreting Figure 5: if the majority of a bar lies to the left of this point, it generally indicates a more negative sentiment; conversely, if it is situated to the right of the zero point, the sentiment is predominantly positive. In this way, the figure makes it easy to visualize how users perceive the AI-Tutor and its benefits. Examining this figure, we found that reactions were mixed, ranging from positive to equally negative. However, the polarized responses appear to neutralize one another, resulting in a largely neutral overall response.

Figure 5: Students' satisfaction with the AI-Tutor.

Mapping the Likert scale from -3 to +3, with 0 as a neutral midpoint, the average sentiment for each statement is the following:
• I find the AI-Tutor easy to use: Somewhat Agree (1.29).
• Using the AI-Tutor for my tasks enables me to accomplish the tasks more quickly: Neutral (-0.29).
• Using the AI-Tutor improves my performance: Neutral (-0.43).
• Using the AI-Tutor for my tasks increases my productivity: Neutral (-0.43).
• Using the AI-Tutor makes it easier to do my tasks: Neutral (0.14).
• I find the AI-Tutor useful: Neutral (0.14).

Only the statement "I find the AI-Tutor easy to use" received an average response other than neutral, indicating mild agreement.

Regarding the open questions about challenges encountered while using Artemis and its AI-Tutor, student feedback consistently touched on several main themes:
(1) Desire for Greater Specificity: The AI-Tutor's responses were perceived as too generic. Students preferred more context-specific feedback pointing directly to areas for improvement in the code.
(2) Request for Increased Interactivity and Interface Concerns: Students expressed the wish for enhanced interactive capabilities, such as the ability to ask follow-up questions after initial feedback. Additionally, the interface was criticized because there was no possibility to see old feedback: once the feedback window was closed, one could only request new feedback.


(3) Demand for Concrete Examples: To supplement the written feedback, students believed that concrete code examples would help them interpret the AI's suggestions.
(4) Apprehension about Learning Inhibition: Some students feared that using the AI-Tutor might lead to over-reliance, which would slow down their learning progress.

In terms of general feedback, students acknowledged the system's potential but noted that it is clearly perceived as an early-stage prototype. Furthermore, they compared its current utility to rudimentary software aids, expressing hope for more refined, context-aware feedback in future iterations.

Main Findings for RQ 2
Some students found the system useful while others stated the opposite, resulting in an overall neutral result regarding the TAM. However, answers to the open questions revealed that students who gave mostly negative responses found the feedback too generic and lacking concrete examples.

4.3 Lessons Learned

The practical integration of a large language model into an APAS offered valuable insights into the system's strengths and weaknesses. Through this experience, several key lessons and actionable insights can be derived.

A key lesson learned is that the AI-Tutor exhibits the capacity for real-time, personalized feedback provision. We found that the system tends to return a more high-level explanation of the task if the student has not yet written much code. When the student has already written much code that is mostly correct, the system tends to give recommendations on how to improve code quality, for example, to add comments explaining the code or to replace ternary operators with if-statements for better readability. Another surprising insight is that the AI-Tutor was able to give feedback on logical and semantic issues. We found that if students had defined wrong boundary conditions to terminate a loop, the AI-Tutor recognized this and proposed that the student change the condition. This immediate feedback helped students to quickly correct their errors, thereby mitigating the acquisition of poor coding practices. Additionally, the system's inherent ability to serve feedback to a large, diverse student population in real time underscores its applicability in large-scale educational contexts, particularly in Massive Open Online Courses (MOOCs).

However, we also learned that AI-Tutoring does not come without its challenges. The AI-Tutor, while efficient, occasionally delivered only general feedback, which means that there is room to refine its responses for more detailed, code-specific guidance. Analyzing the feedback provided by the AI-Tutor, we found that of 75 feedback requests, 55 (73.3%) were useful and 20 (26.7%) were categorized as not useful. Among the 20 not-useful responses, three revealed the solution of the exercise to be solved, four were hallucinations, and 13 were too general to be helpful in the student's situation. Mostly, if the answers were too general, we found that the AI-Tutor explained the exercise to the student even though their solution was already very sophisticated. The hallucinations were mainly the AI-Tutor stating that a function looked well implemented although there was no student implementation yet, or that a function should be implemented that the student had already implemented.
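To make the boundary-condition observation above concrete, Listing 5 shows a hypothetical mistake of the kind the AI-Tutor flagged in the Pascal's triangle exercise; the surrounding code is illustrative and not taken from an actual student submission.

Listing 5: Illustrative loop-boundary mistake flagged by the AI-Tutor

#include <stdio.h>

/* Print row i of Pascal's triangle. The loop condition stops one
   entry early, so the trailing 1 of the row is never printed; the
   AI-Tutor pointed students to such wrong boundary conditions. */
void display_row(const unsigned long *row, int i) {
    for (int j = 0; j < i; j++)  /* wrong boundary: should be j <= i */
        printf("%lu ", row[j]);
    printf("\n");
}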


Additionally, as the existing system did not allow any interaction, we learned that enhancing its capability to address follow-up queries would improve the learning experience. Regarding operational dependency, we found that downtimes in the API could jeopardize the tutor's functionality.

We also found that it is important to address students' over-reliance concerns, encouraging them to use the AI-Tutor without the fear that their learning progress might be hindered. Additionally, feedback quality might be compromised by the context limits of models like GPT-3.5-Turbo; exploring ways to manage this limitation effectively will be beneficial. Despite careful prompt crafting, there were instances where the AI-Tutor revealed solutions. Ensuring that the model adheres to the guidelines is a pivotal lesson.

Main Findings for RQ 3
Implementing an AI-Tutor in an APAS showed that AI can support human tutors, allowing them to focus on deeper personal interactions. However, improvements are needed, as the AI's feedback was effective only 73.3% of the time, otherwise being too generic, revealing solutions, or incorrect. Additionally, some students worry that using the AI-Tutor may slow their learning progress.

5 DISCUSSION

In this study, we found mainly two usage strategies adopted by students when interacting with the AI-Tutor. It is important to understand these user personas, as they provide insights that can inform the design of AI-powered educational systems to fit different learning styles and strategies. First, we defined the user persona Iterative Ivy. Users assigned to this persona first used the AI-Tutor intensively and only began submitting solutions to the system once their solution was already very advanced. This approach seems to favor a traditional, linear programming strategy: (1) comprehending the problem, (2) writing a complete solution, and then (3) validating the solution. By seeking continuous feedback from the AI-Tutor, these students ensured they were on the right track before submitting their final solution. This finding suggests that AI-Tutors are beneficial to students who prefer to seek guidance and validation throughout their learning process, rather than just at the end. However, a potential concern here is over-reliance on the AI-Tutor: the continuous feedback-seeking behavior may stem from uncertainty or a lack of confidence, which needs to be addressed in further pedagogical planning. Second, we defined the user persona Hybrid Harry, whose strategy contrasts sharply with Iterative Ivy's approach. Users assigned to this persona alternated between seeking AI-Tutor feedback and submitting solutions to the APAS. These students opted for an iterative learning approach, which represents a more agile programming practice. This suggests that AI-Tutors and APASs can facilitate active, self-regulated learning. However, the risk here lies in students relying too heavily on the test feedback to guide their work, which could inadvertently lead to a trial-and-error approach to solving an exercise rather than understanding the core principles.

Furthermore, given the responses to the TAM questions, we found that students have mixed feelings regarding the usefulness of the AI-Tutor: some students do not appreciate its help, while others do. It is important to identify the exact reasons why this is the case, but a first analysis indicates that the main reasons for the negative responses are related to the user interface and the prompt. Students who responded negatively to the TAM questions also complained about the inability to ask follow-up questions, the missing code examples, and feedback that was too general. The inability to ask follow-up questions can be solved by changing to a chat-bot-based system; the missing code examples can be addressed by changing the prompt to allow code examples as responses. As a result, it is reasonable to conclude that GPT-3.5-Turbo can be successfully used as the language model behind an AI-Tutor.

Addressing fears that AI tools may inhibit learning success is also crucial. Reiterating that the tool's purpose is to supplement rather than supplant traditional learning methods may decrease such concerns.

Additionally, the use of AI-Tutors in programming education, as seen in this research, presents a unique set of lessons learned. Compared to ITSs, we found that LLMs offer a distinct adaptability advantage: in a conventional ITS, altering feedback mechanisms often demands intricate changes in the system's codebase, which can be time-consuming and resource-intensive. With LLMs, modifications are primarily done through prompt engineering. Given their vast training data, refining or adjusting the prompts can quickly adapt the feedback the model provides, without needing to change its internal mechanics. This allows educators to adapt swiftly to changing educational needs or methodologies.

Among the advantages of AI-Tutors over traditional human tutors, the promptness of feedback stands out. The capacity to instantly identify and correct errors can be vital for students learning programming, because quick feedback can minimize the propagation of misunderstandings and bad coding practices. Moreover, the scalability and cost-effectiveness of an AI-Tutor make it a compelling choice, especially in resource-limited settings or with a large student base.

However, these benefits come with their share of challenges. One aspect of the AI-Tutor that needs attention is its occasional inclination to "hallucinate", producing responses that might not be entirely accurate or relevant. In this study, this primarily manifested when a student had already implemented a correct solution, resulting in the LLM sometimes advising the student to refine a program that was already functioning correctly. A potential mitigation strategy could involve integrating the AI feedback with unit test results. This would inform the student when their solution meets all criteria, signaling that subsequent AI feedback may not be entirely accurate.
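Listing 6 sketches this mitigation; the gating function and the test-result parameters are hypothetical, as such a combination was not implemented in this study.

Listing 6: Illustrative gating of AI feedback by unit test results

#include <stdio.h>

/* Hypothetical mitigation: before displaying the AI feedback, check
   the unit test results and warn the student when the solution is
   already correct, so misleading "improvement" hints carry less weight. */
void show_feedback(const char *ai_feedback, int tests_passed, int tests_total) {
    if (tests_passed == tests_total)
        puts("Note: all unit tests pass; the following AI feedback may "
             "suggest changes to an already correct solution.");
    puts(ai_feedback);
}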
The dependence on API availability and the inherent token limit of models like GPT-3.5-Turbo add an additional layer of complexity. Any change or downtime in the API can hinder the AI-Tutor's operations, and the context limitation imposed by the token limit can affect the quality of feedback, especially for more complex or lengthy code submissions.

A more psychological perspective brings forth concerns about the impact of AI-Tutors on students' learning progress and the lack of a personal touch. Over-reliance on the AI-Tutor might impede students from developing their problem-solving skills, as they might lean on instant feedback rather than trying to debug and solve problems themselves. Additionally, the impersonal nature of an AI-Tutor might make the learning experience less engaging and less adaptive to individual student needs, which could affect motivation and learning outcomes.


These challenges should not overshadow the immense potential that an AI-Tutor holds. By addressing the issues mentioned, we can create more sophisticated and effective AI-Tutors that significantly enhance the educational experience and outcomes for students learning programming. The journey towards optimizing AI-Tutors for programming education is still in progress, but the destination seems promising.

6 LIMITATIONS

In this paper, we mainly considered the following four categories of validity, also used by [33]: (1) construct validity, (2) reliability, (3) internal validity, and (4) external validity.

6.1 Construct Validity

The research questions were defined using the PICOC system [24]. The PICOC framework provides a systematic way to formulate research questions by emphasizing five elements: Population, Intervention, Comparison, Outcome, and Context. This structured approach ensures that research questions are both comprehensive and relevant. For this study:
(1) RQ1 primarily addresses the Population, Intervention, Outcome, and Context by examining the nature of student interactions within the specific context of an APAS assisted by an AI-Tutor.
(2) RQ2 focuses on the Population, Intervention, and Outcome by probing the students' experiences with AI-driven feedback when guided by the AI-Tutor.
(3) RQ3 encompasses all the PICOC elements, especially Context, by analyzing the broader lessons learned from deploying an AI-Tutor within the APAS environment.
The research questions were further refined through discussions with several experts in the field to ensure alignment with the topic of interest. Leveraging the PICOC system as a foundation, coupled with the structured data collection approach and exploratory survey, facilitated a thorough answering of RQ1–3.

6.2 Reliability

We conducted a systematic data collection and analysis approach, as detailed in Section 3; the process is therefore both transparent and reproducible. However, it is crucial to note that the use of GPT-3.5-Turbo introduced a variable element. Given the nature of LLMs, not every prompt produces identical responses on different occasions. As a result, while the core structure and methodology can be reproduced, there may be slight variations in the responses generated by the model across different replications of the experiment.

6.3 External Validity

One potential limitation arises from the fact that we integrated the AI-Tutor only into Artemis. However, a systematic comparison of various APASs confirmed that Artemis' basic functionalities are echoed in many other APASs, deeming it a representative system [27]. Additionally, the integration of the AI-Tutor can be done platform-independently, because the approach stays the same: it should be possible on any APAS to integrate a pop-up window that displays the results of the REST API calls.

Another potential threat to external validity is the use of a GPT model as the foundation for the AI-Tutor. While the model used is a state-of-the-art LLM and exhibits advanced conversational abilities, it might not perfectly mimic every possible LLM's behavior. Nonetheless, given that the model is based on the same foundational architectures as most other prevalent LLMs and shares many of their characteristics and capabilities, we argue that the findings related to GPT-3.5-Turbo can largely be extrapolated to other similar models. It serves as a representative example, providing insights that are likely applicable across various LLMs.

Furthermore, the total number of participants can influence the external validity. In this study, 23 students from the course "Introduction to Programming" participated. This sample size is too small to conduct a statistically significant quantitative analysis. As a result, we decided to focus on a qualitative analysis and report the experiences of implementing and operating an AI-Tutor. This allows for a deeper exploration of the students' experiences and behaviors when using the system. Last but not least, given the students' recent engagements with traditional human tutors, they were especially well-suited to evaluate the AI-Tutor.

6.4 Internal Validity

This study largely leaned on qualitative analysis, which can sometimes introduce subjective bias. Nevertheless, the methodological rigor employed aimed to minimize such biases. The detailed procedures involved in the qualitative analysis have been outlined in the research methodology section. By closely following these methodological steps, we have aimed to ensure that the findings are both credible and trustworthy.

7 CONCLUSION

In this study of integrating the model behind ChatGPT as an AI-Tutor into the Artemis APAS, we uncovered both the immense potential and the challenges of such an application. While the AI-Tutor offered advantages like timely feedback and scalability, its limitations were apparent. These included occasionally generic feedback, lack of interactive dialog, operational vulnerabilities related to API availability, potential over-reliance by students, the absence of a human touch, and technical constraints like context limits.

The vast potential of AI-Tutors in programming education is undeniable, but careful implementation and ongoing refinement are essential. This exploration underscores the need for more research in this domain, balancing technological progress with the irreplaceable human aspect of education.

Future work should focus on enhancing the AI-Tutor's feedback specificity, improving interactivity, refining the user interface, and addressing the token limit and prompt engineering challenges. Exploring more powerful models like GPT-4 may further improve the feedback quality. This study's findings serve as a foundation for continued research at this innovative intersection of AI and education.


8 DATA AVAILABILITY

The data supporting the findings of this study are openly available in Figshare.⁴ The dataset comprises the following:
(1) Data Analysis ANONYM.xlsx:
(a) Sheet 1: Contains extracted data from the database, such as code, feedback, user, and time, as well as various descriptive statistics detailing, for instance, the frequency with which each user consulted the AI-Tutor, submissions to the APAS, and the final score.
(b) Sheet 2: Houses the responses from the qualitative survey.
(c) Sheet 3: Features an analysis that groups submissions and the state of the code when querying the AI-Tutor. It assesses the quality of the feedback and observes code alterations post-feedback.
(2) Student submissions to the version control system: Comprises multiple anonymized folders, each storing the code a student uploaded to the system. This code is augmented at the end with annotations detailing whether the student had previously consulted the AI-Tutor and, if so, the associated timestamps.
(3) Artemis-Files: Contains all essential files to be integrated into a public Artemis project to activate the AI-Tutor functionality. These files are designed to work with the open-source Artemis project.⁵ A comprehensive repository has not been released due to challenges associated with its anonymization.

It is essential to note that all personal identifiers have been removed to maintain confidentiality and adhere to data protection principles.

⁴ https://figshare.com/s/636a9c5ff8f2c8315f26
⁵ https://github.com/ls1intum/Artemis

9 ACKNOWLEDGMENTS

The CodeAbility Austria project has been funded by the Austrian Federal Ministry of Education, Science and Research (BMBWF).

REFERENCES
[1] J. R. Anderson and E. Skwarecki. 1986. The automated tutoring of introductory computer programming. Commun. ACM 29, 9 (Sep. 1986), 842–849. https://doi.org/10.1145/6592.6593
[2] Luka Bradeško and Dunja Mladenić. 2012. A survey of chatbot systems through a Loebner Prize competition. In Proceedings of Slovenian Language Technologies Society Eighth Conference of Language Technologies, Vol. 2. 34–37.
[3] P. L. Brusilovsky. 1992. Intelligent tutor, environment and manual for introductory programming. Educational & Training Technology International 29, 1 (1992), 26–34.
[4] Tyne Crow, Andrew Luxton-Reilly, and Burkhard Wuensche. 2018. Intelligent tutoring systems for programming education. In Proceedings of the 20th Australasian Computing Education Conference. ACM. https://doi.org/10.1145/3160489.3160492
[5] Marian Daun and Jennifer Brings. 2023. How ChatGPT Will Change Software Engineering Education. In Proceedings of the Conference on Innovation and Technology in Computer Science Education V. 1. 110–116.
[6] Fred D. Davis. 1985. A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results. Ph.D. Dissertation. Massachusetts Institute of Technology, Sloan School of Management.
[7] Awishkar Ghimire, Surendrabikram Thapa, Avinash Kumar Jha, Surabhi Adhikari, and Ankit Kumar. 2020. Accelerating business growth with big data and artificial intelligence. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). IEEE, 441–448.
[8] Andreas M. Hinz. 1992. Pascal's Triangle and the Tower of Hanoi. The American Mathematical Monthly 99, 6 (1992), 538–544.
[9] Dean Ho, Stephen R. Quake, Edward R. B. McCabe, Wee Joo Chng, Edward K. Chow, Xianting Ding, Bruce D. Gelb, Geoffrey S. Ginsburg, Jason Hassenstab, Chih-Ming Ho, et al. 2020. Enabling technologies for personalized and precision medicine. Trends in Biotechnology 38, 5 (2020), 497–518.
[10] Jay Holland, Antonija Mitrovic, and Brent Martin. 2009. J-LATTE: a Constraint-based Tutor for Java. (2009).
[11] Sajed Jalil, Suzzana Rafi, Thomas D. LaToza, Kevin Moran, and Wing Lam. 2023. ChatGPT and software testing education: Promises & perils. In 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 4130–4137.
[12] Enkelejda Kasneci, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences 103 (Apr. 2023), 102274. https://doi.org/10.1016/j.lindif.2023.102274
[13] Hieke Keuning, Johan Jeuring, and Bastiaan Heeren. 2016. Towards a systematic review of automated feedback generation for programming exercises. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education. 41–46.
[14] Stephan Krusche. 2021. Interactive learning - A scalable and adaptive learning approach for large courses. Habilitation. Technische Universität München.
[15] Stephan Krusche and Andreas Seitz. 2018. Artemis: An automatic assessment management system for interactive learning. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education. 284–289.
[16] Nir Kshetri. 2023. The Economics of Generative Artificial Intelligence in the Academic Industry. Computer 56, 8 (Aug. 2023), 77–83. https://doi.org/10.1109/mc.2023.3278089
[17] Mohammad Amin Kuhail, Nazik Alturki, Salwa Alramlawi, and Kholood Alhejori. 2023. Interacting with educational chatbots: A systematic review. Education and Information Technologies 28, 1 (2023), 973–1018.
[18] James A. Kulik and J. D. Fletcher. 2016. Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research 86, 1 (2016), 42–78.
[19] Matthias Linhuber, Jan Philip Bernius, and Stephan Krusche. 2023. Constructive Alignment in Modern Computing Education: An Open-Source Computer-Based Examination System. In 23rd Koli Calling International Conference on Computing Education Research (Koli, Finland) (Koli '23). https://doi.org/10.35542/osf.io/nmpf6
[20] Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, and Yang Liu. 2023. Jailbreaking ChatGPT via prompt engineering: An empirical study. arXiv preprint arXiv:2305.13860 (2023).
[21] Chung Kwan Lo. 2023. What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences 13, 4 (2023), 410.
[22] Eng Lieh Ouh, Benjamin Kok Siew Gan, Kyong Jin Shim, and Swavek Wlodkowski. 2023. ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course. arXiv preprint arXiv:2305.13680 (2023).
[23] Zachary A. Pardos and Shreya Bhandari. 2023. Learning gain differences between ChatGPT and human tutor generated algebra hints. arXiv preprint arXiv:2302.06871 (2023).
[24] Mark Petticrew and Helen Roberts. 2008. Systematic reviews in the social sciences: A practical guide. John Wiley & Sons.
[25] Partha Pratim Ray. 2023. ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems (2023).
[26] Jürgen Rudolph, Samson Tan, and Shannon Tan. 2023. ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching 6, 1 (2023).
[27] Clemens Sauerwein, Simon Priller, Martin Dobiasch, Stefan Oppl, Michael Felderer, and Ruth Breu. 2023. Lecturers' and Students' Experiences with an Automated Programming Assessment System. (2023).
[28] Dominik Sobania, Martin Briesch, Carol Hanna, and Justyna Petke. 2023. An analysis of the automatic bug fixing performance of ChatGPT. arXiv preprint arXiv:2301.08653 (2023).

[29] Petra Stutz, Maximilian Elixhauser, Judith Grubinger-Preiner, Vivienne Linner, Eva Reibersdorfer-Adelsberger, Christoph Traun, Gudrun Wallentin, Katharina Wöhs, and Thomas Zuberbühler. 2023. Ch(e)atGPT? An Anecdotal Approach on the Impact of ChatGPT on Teaching and Learning GIScience. Preprint. https://doi.org/10.35542/osf.io/j3m9b
[30] Nigar M. Shafiq Surameery and Mohammed Y. Shakor. 2023. Use Chat GPT to solve programming bugs. International Journal of Information Technology & Computer Engineering (IJITC) 3, 01 (2023), 17–22.
[31] Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawendé F. Bissyandé. 2023. Is ChatGPT the Ultimate Programming Assistant - How far is it? arXiv preprint arXiv:2304.11938 (2023).
[32] Rainer Winkler and Matthias Söllner. 2018. Unleashing the potential of chatbots in education: A state-of-the-art analysis. In Academy of Management Proceedings, Vol. 2018. Academy of Management, Briarcliff Manor, NY, 15903.
[33] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Springer Science & Business Media.
[34] Caiming Zhang and Yang Lu. 2021. Study on artificial intelligence: The state of the art and future prospects. Journal of Industrial Information Integration 23 (2021), 100224.