AI Coding Tools, LLM, ChatGPT, Copilot, Instructor Perspectives
Figure 1: Summary of our study findings. We interviewed 20 introductory programming instructors and present both a) their
short-term plans, and their longer-term plans to either b) resist or c) embrace the use of AI coding tools in their classes.
having zero prior experience with AI coding tools to having used them for personal programming projects.

Figure 1 summarizes our findings: a) In the short-term, all participants were concerned about cheating, which led to immediate reactions such as weighing exam scores more, banning the use of AI, or showing students the capabilities and limitations of AI tools. Longer-term, opinions diverged into two groups, with some wanting to b) resist the use of AI tools and continue teaching programming fundamentals, while others wanted to c) embrace AI tools by integrating them into their classes to help both students and instructors. Participants brainstormed a range of ideas for both resisting and embracing AI in future classes, ranging from creating ‘AI-proof’ assignments that may deter AI tools to new kinds of assignments where students must collaborate with AI.

Over the past academic year (2022–2023), computing education researchers have been actively discussing these topics in blog posts [19, 54, 55], a SIGCSE position paper [14], and workshops [65, 67]. Our paper complements these ongoing discussions by presenting the first empirical study of computing instructor perspectives on AI code generation and explanation tools. The timing of our study is unique since our interviews occurred in early 2023, which is the first academic term where large numbers of students have access to these tools due to ChatGPT’s release in late 2022. Thus, our findings capture a rare snapshot in time when computing instructors are recounting their early reactions to this fast-growing phenomenon but have not worked out any best practices yet.

Since we are still very early in the adoption curve of AI coding tools, we hope our study findings can spur conversations within the computing education community about whether to resist or embrace these tools in the coming years, and how to work with these tools in ethical and equitable ways. We have a unique and timely opportunity to develop both policies and social norms that influence how these tools may impact future generations of students. Thus, we conclude this paper with a set of open research questions derived from our study findings (Section 7).

In sum, the contributions of this paper are:
• A comprehensive snapshot of the current state of AI coding tools and the range of human-centered research surrounding them as of early 2023, less than a year after the public release of ChatGPT and GitHub Copilot.
• The first study of computing instructors’ perceptions of AI coding tools, which found that they were most concerned about cheating in the short term but that longer-term their sentiments bifurcated into either wanting to resist these tools or to embrace them by integrating them into future classes.
• A set of open research questions for the computing education community to consider as AI coding tools potentially grow more widespread in the coming years.

2 BACKGROUND: THE CURRENT STATE OF AI CODING TOOLS IN EARLY 2023

Over the past year (2022–2023), AI code generation and explanation tools have become more widespread with the release of products like GitHub Copilot (currently free for students and instructors) and ChatGPT (currently free for the public). These are built upon neural network models trained on terabytes of textual data scraped from the public internet (e.g., billions of webpages, billions of lines of open-source code from GitHub, and the contents of open-licensed books [24, 85]). Their large-scale architecture enables them to ‘learn’ patterns from data and generate text that a human might plausibly write, which is why they are commonly called Large Language Models (LLMs) [24]. And since code is a structured form of text, these tools can also synthesize code; thus, some refer to AI code generation as “Large Language Model (LLM)-driven program synthesis” [11, 55]. For brevity, throughout this paper we use the terms ‘AI coding tools’ or simply ‘AI tools’ as shorthand to refer to these tools. Users interact with these AI tools in three main ways:

1) Standalone: The simplest interface to LLMs is a web application that shows a text box, such as the OpenAI Playground [84] for GPT-series LLMs [20, 24, 85] (e.g., GPT-4) and TextSynth for various open-source LLMs [36].
The user can input some text (called a ‘prompt’ [112]) and the tool tries to generate text (or code) that plausibly continues from the user’s input. An example code-generation prompt might be “Write a Python function that adds two numbers.”

2) Conversational: Chatbots such as ChatGPT [82] improve upon a standalone interface by enabling users to hold a back-and-forth multi-turn conversation with the AI. This allows users to refer back to prior context instead of needing to re-enter their entire prompt every time. For instance, a user could say “Now rewrite this code using more descriptive variable names” and ChatGPT knows that the user is referring to code that was written earlier in the conversation.

3) IDE-integrated: Tools such as GitHub Copilot [35], Replit Ghostwriter [93], Amazon CodeWhisperer [8], Codeium [2], and Tabnine [110] integrate into the user’s IDE (e.g., Visual Studio Code). This lets them do autocomplete and generate suggestions within the context of the user’s codebase as they are coding. Another benefit of IDE integration is that these tools can pull in the user’s own surrounding code (both before and after the cursor, plus in other open project files [94]) as context, which enables them to generate personalized code suggestions to fit the user’s current task. Also, some IDE-integrated AI tools include an embedded chat interface.

As an indicator of just how fast things are moving in this space, many new LLMs and AI coding tools have been announced in the 2.5 months between when this paper was submitted (mid-March 2023) and when the final camera-ready publication was completed (early June 2023). Examples include new coding-capable LLMs such as LLaMA [103], GPT-4 [83], Cerebras-GPT [34], CodeGen2 [79], DIDACT [70], replit-code [5], and StarCoder [61]; LLM-based chatbots such as Alpaca [102], Claude [1], Dolly [29], Koala [42], and Vicuna [26]; and new IDE-integrated AI tools such as Sourcegraph Cody [7], Google’s Codey LLM (similar name but unrelated tool!) integrated into Colab and Android Studio IDE [90], and Meta’s CodeCompose [78]. GitHub also announced new Copilot X [3] enhancements, which include IDE-embedded AI chat interfaces.

2.1 Current Capabilities of AI Coding Tools

To provide context for the early-2023 era when our study’s interviews took place, we now summarize the current capabilities of AI coding tools that are most relevant for educational use cases.

2.1.1 Code generation capabilities. Given natural language and/or code as input, these tools can generate relevant code:

Specification-to-code: Given a natural language description for what a piece of code should do (e.g., a function or class specification), these tools can generate code to meet that specification. For example: “Create a function that takes a list of first names and a list of last names then returns a new list with those names joined.” Note that many CS1/CS2 programming assignments are phrased as specifications that students can directly input into tools.
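To make this concrete, here is the kind of code a tool might produce for the specification above (a hypothetical sketch of one plausible output; real tools emit many different variants):

    # One plausible AI-generated response to the specification above
    # (illustrative only; actual outputs vary by tool and by run).
    def join_names(first_names, last_names):
        """Return a new list joining corresponding first and last names."""
        return [f"{first} {last}" for first, last in zip(first_names, last_names)]

    print(join_names(["Ada", "Alan"], ["Lovelace", "Turing"]))
    # ['Ada Lovelace', 'Alan Turing']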
Conversational specification-to-code: The main limitation of ‘specification-to-code’ is that novices are not good at writing precise specifications, so the generated code may not be what they want. To overcome this limitation, one can use the ‘flipped interaction prompt pattern’ [112] to have a back-and-forth conversation with ChatGPT before it generates the requested code. For instance, the user could write: “ChatGPT, I want you to write a Python function to join first and last names. Ask me clarifying questions one at a time until you have enough information to write this code for me.”¹

¹ The prompts in this paper are simplified illustrative examples that may not work optimally as-is. In practice, it likely takes a fair amount of iteration to craft prompts that work reliably and effectively. This process is known as prompt engineering [112].

Code completion: Tools that integrate into the IDE, such as Copilot [35], Ghostwriter [93], CodeWhisperer [8], and Tabnine [110], enable the user to start typing code and see a list of contextually-relevant code completions, like an AI-enhanced autocomplete.

Code refactoring: Once the user has written some code, they can ask the tool to rewrite it in order to improve readability, style, or maintainability; e.g., “Refactor this function to use smaller helper functions.” Again, holding a conversation with ChatGPT and answering its follow-up questions can help it to generate better code.

Code simplification: One kind of refactoring that students may try is to ask the tool to simplify a piece of code. For instance, “Rewrite this code using only simple Python features that a student in an introductory programming course would know about.” Students could use this technique to generate more ‘plausible-looking’ answers to CS1/CS2 assignments, because otherwise it may look suspicious if they turn in code that uses too many advanced language features.
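As a hypothetical illustration, a simplification request applied to the join_names() sketch from earlier might replace its list comprehension and zip() call with constructs from the first weeks of a typical CS1 course:

    # Hypothetical 'simplified' variant that avoids zip() and list
    # comprehensions, using only constructs a CS1 student would have seen.
    def join_names(first_names, last_names):
        """Return a new list joining corresponding first and last names."""
        joined = []
        for i in range(len(first_names)):
            joined.append(first_names[i] + " " + last_names[i])
        return joined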
Language translation: These tools can also translate code written in one programming language into another (albeit imperfectly). They can also translate mentions of human languages within code.

Test generation: Users can also ask AI tools to generate test cases (e.g., unit/regression tests). For instance, Copilot has a TestPilot feature [72] that generates tests and interactively refines them based on user feedback. These tools can often create tests for unusual edge cases that novices may not think of on their own [113].
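As a sketch of what AI-generated tests might look like for the join_names() example from earlier (hypothetical output, assuming the function lives in a module named names.py and tests are run with the pytest framework):

    # Hypothetical AI-generated tests; note the edge case a novice might skip.
    from names import join_names  # assuming join_names is defined in names.py

    def test_join_names_basic():
        assert join_names(["Ada"], ["Lovelace"]) == ["Ada Lovelace"]

    def test_join_names_empty_lists():
        assert join_names([], []) == []  # edge case: nothing to join

    def test_join_names_preserves_order():
        assert join_names(["A", "B"], ["X", "Y"]) == ["A X", "B Y"]

    # A tool might also probe mismatched-length inputs, whose behavior
    # the original specification leaves undefined.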
Structured test data generation: AI tools can also generate structured data that users can pass into their software to manually test it. For instance, one could ask a tool to generate 100 fake user profiles (with fake names, ages, and locations) in some format, like JSON or a Python dictionary, in order to test a prototype social media app.
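A prompt like the one above might yield data of the following shape (a minimal sketch; all profile values are invented placeholders):

    # Sketch of AI-generated structured test data (fabricated example values).
    fake_users = [
        {"name": "Maya Chen", "age": 29, "location": "Valparaiso, Chile"},
        {"name": "Tumelo Dube", "age": 21, "location": "Gaborone, Botswana"},
        {"name": "Liam O'Leary", "age": 34, "location": "Toronto, Canada"},
        # ...a tool can churn out 100 of these on request
    ]

    for user in fake_users:  # feed each profile into the app being tested
        assert isinstance(user["age"], int) and 0 < user["age"] < 120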
2.1.2 Code explanation capabilities. Given a piece of code and natural language instructions as input, these tools can explain what that code does in a way that emulates how a human instructor might explain it to a student:

Explanations at varying expertise levels: The most straightforward prompt is to ask the tool to explain what a piece of code does. One can also use the ‘persona prompt pattern’ [112] to generate explanations for a given expertise level. For instance, “ChatGPT, I want you to take on the persona of a university CS1 instructor talking to a student who has never taken a programming class before. Explain what this function does: [user’s code].” Users can also ask it to automatically generate code comments or API documentation.

Debugging help: Users can ask the tool to find possible bugs in the given code and explain why those may be bugs. Note that the tool does not run the code or perform rigorous static code analysis. But in practice, ‘superficial’ bugs in student code (e.g., off-by-one errors in a loop bound) can be found by the tool matching against patterns learned from billions of lines of open-source code. And with ChatGPT, one can ask follow-up clarifying questions and engage in a back-and-forth debugging conversation, which simulates some aspects of working with a human tutor [71].
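For example, the kind of ‘superficial’ off-by-one bug mentioned above might look like this (a made-up student snippet of the sort that pattern-matching tools routinely flag):

    def sum_first_n(nums, n):
        """Return the sum of the first n elements of nums."""
        total = 0
        for i in range(n + 1):  # off-by-one bug: loops n+1 times; should be range(n)
            total += nums[i]
        return total

    # sum_first_n([1, 2, 3, 4], 2) silently returns 6 instead of the intended 3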
Conversational bug finding: A more powerful way to do AI-assisted bug finding is to start a back-and-forth debugging conversation. For instance: “Here is my code and the output I see when I run it. This output looks wrong because the last two array elements are duplicated. What should I change in my code to help me find the bug more easily?” The AI tool might suggest a code edit; then the user can apply that edit, re-run their modified code, and paste the new text output or error message into the next round of dialogue. This conversational technique elegantly bypasses the AI tool’s limitation that it cannot directly run the user’s code – instead, the user runs the code on their computer and sends the output to the AI tool.

Code review and critique: AI tools can also serve as a code reviewer and give detailed critiques. Again, the ‘persona pattern’ [112] can be useful here, e.g.: “I want you to take on the persona of a senior software engineer at a top technology company. I have submitted this code to you for a formal code review. Please critique it: [user’s code]”

Conceptual explanations with code examples: The user can also ask the tool to explain a programming concept just like how they would ask a human instructor. For instance: “What’s the difference between checked and unchecked exceptions in Java? Give code examples for each.” The tool’s ability to generate code examples can enable some basic level of (albeit imperfect) fact-checking since the user can run that code and see if it matches the given explanation.

2.2 Notable Limitations of AI Coding Tools

Here are some commonly-known limitations of these tools and other reasons people have cited for not using them:
• Inaccuracies: The most notable limitation is that these tools can generate inaccurate outputs with no quality guarantees [81]. This may result in subtly-inaccurate code or incorrect explanations which appear believable to novices.
• Code quality: They may generate code that is stylistically non-ideal, that may not be robust to edge cases, that has security vulnerabilities [86], or that is not aligned with what students are learning in a particular class.
• Knowledge cutoff: These tools only ‘know’ what is in their training data, which is an older snapshot of the web. So they cannot help with, say, a JavaScript library released last week. That said, they do get periodically re-trained, and Microsoft and Google are augmenting them to search the web [74, 91]. Also, tools like Sourcegraph Cody [7] augment an LLM by retrieving code and text from a user’s own project repository in order to generate responses about facts that are not on the public web (a form of retrieval-augmented generation [60]).
• Learning curve: Novices may have a hard time producing high-quality results with simple prompts [115]. It takes some level of expertise to craft effective and reliable prompts [112].
• Nondeterminism: AI tools can produce different outputs even when given the same prompt. There are settings to reduce randomness of outputs (see the sketch after this list), but whenever the underlying AI models get updated, results can still end up non-reproducible.
• Offensive content: AI tools can generate outputs that exhibit harmful biases [17, 63]. For instance, AI-generated code examples may contain offensive stereotypes embedded in variable names or strings [14, 18, 24].
• Ethical objections: Some people are opposed to using AI tools due to concerns about their creators disregarding software licenses when scraping code repositories for training data [22], the environmental impact of training and running LLMs [100], and companies using underpaid human workers to label and filter training data [89].
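To illustrate one such randomness setting, here is a minimal sketch using the temperature parameter of the OpenAI Python library (shown with the pre-1.0 API that was current in early 2023; the exact interface has since changed, so treat this as illustrative):

    import openai  # OpenAI Python library, pre-1.0 API (early 2023)

    openai.api_key = "YOUR_API_KEY"  # placeholder

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Write a Python function that adds two numbers."}],
        temperature=0,  # 0 = least random; higher values increase output variety
    )
    print(response.choices[0].message.content)

Even with temperature pinned to 0, outputs are only as reproducible as the underlying model, which the provider may update at any time.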
3 RELATED WORK

There are fast-growing lines of research on the technical architecture of LLMs (large language models) that power AI code generation tools², applications of LLMs to many specific domains (e.g., creative writing [77]), and broader societal implications of LLMs [17]. Instead of surveying the entire landscape of research on LLMs, we focus our discussion on parts of the literature that are the most relevant to our interview study. This encompasses a range of human-centered research on how LLM-based code generation tools relate to the fields of software engineering and computing education.

² Some examples of LLMs specialized for programming include CodeBERT [37], PyMT5 [28], Codex [24], AlphaCode [62], CodeGen [80], CodeGen2 [79], Parsel [116], InCoder [40], CodeT [23], StarCoder [61], CodeCompose [78], replit-code [5], and Codey [90]. Note that modern general-purpose LLMs such as GPT-4 [83] are trained on large amounts of code as well, so they can also do code generation and explanation.

3.1 How Software Developers Use AI Code Generation Tools

Several recent groups of researchers have studied how software developers use GitHub Copilot in practice, since it is marketed as a tool to help developers be more productive [52]. Bird et al. combined forum analysis, a think-aloud study, and a survey to gather usage patterns such as Copilot enabling faster code-writing but at the expense of less code understanding [18]. Sarkar et al. analyzed blog and forum posts to give a similar overview [96]. Cheng et al. studied how developer communities might build trust in AI tools [25]. Barke et al. found that developers used it in two ways: to help them explore options and to accelerate their path toward a known goal [13]. Peng et al. found in a 95-user between-subjects study that using Copilot helped developers complete a web development task 56% faster than the control group [87]. But Vaithilingam et al. found in a 24-user within-subjects study that although developers liked using Copilot as a starting point, it did not always improve task completion time or accuracy [106].

More broadly, HCI researchers have done usability studies of AI code generation tools and proposed improved interface designs. For instance, Jayagopal/Lubin et al. analyzed the usability of 5 program synthesis tools (Copilot was one of them) and found that those which run in the background (without explicit user triggering) can be more learnable for novices [49]. Sun et al. discovered users’ needs for explainability in AI code generation tools [101]. Vaithilingam et al. prototyped 19 user interface ideas for augmenting IDEs with AI assistance [105]. Ross et al. augmented an IDE with a conversational AI tool (similar to ChatGPT) called the Programmer’s Assistant [95]. McNutt et al. proposed a design space for how to integrate AI assistance within computational notebooks, which have different affordances than IDEs [73]. Liu et al. proposed a user experience enhancement that translates the user’s prompts into code and then back again to natural language in order to clarify what the tool intends to do [64].
Although our study focuses on computing instructors and not software developers, some of the instructors we interviewed mentioned that their goal is to train the next generation of developers. Thus, instructors have been thinking about how to prepare students for a future where they might use these AI tools on the job.

3.2 Computing Education Research on AI Tools

When these AI tools first came out, computing education researchers were curious about whether they could solve programming assignments that are typically given in CS1 and CS2. For instance, Finnie-Ansley et al. found that Codex³ could solve both CS1 [38] and CS2 [39] exam problems better than most students and also performed well on variations of the classic Rainfall Problem [98]. Wermelinger followed a similar study protocol but instead used Copilot within an IDE (instead of standalone Codex) to qualitatively understand the user experience of coding with an AI tool in an IDE [111]. Denny et al. then showed that using prompt engineering (i.e., adjusting the wording of the input prompt to Copilot) can significantly improve results when prompts are phrased more like step-by-step pseudocode [32]. Vahid et al. found that ChatGPT could solve all of the programming assignments in several CS1 courses either by directly copying the assignment prompt into ChatGPT or, if that did not work, by telling it which autograder test cases failed and having it correct itself [104]. A student copied all the 2022 ‘AP Computer Science A’ free-response questions into ChatGPT, and it scored 32 out of 36 points [99].

³ Codex is the LLM that GitHub Copilot was originally built on, although newer versions of Copilot may move to GPT-4 [3] since Codex was discontinued in March 2023.

A complementary line of research uses AI tools to assist instructors in creating course content: MacNeil et al. used these tools to generate explanations for CS1-level code [68] and then embedded them within an interactive e-book to see how students engage with them [66]. Sarsa et al. used Codex to automatically generate programming exercises and code explanations, both of which could be used to help students get extra practice and guidance [97]. Denny et al. extended this idea by combining Codex with learnersourcing (i.e., crowdsourcing using learners) to generate and validate exercises in a way that is personally motivating to learners [33]. Lastly, Leinonen et al. used Codex to generate a specialized type of code explanation: enhanced compiler and run-time error messages [59].

An emerging line of work measures the impact of AI tools on learners. Prather et al. observed how students used Copilot on CS1 assignments in a lab study and then interviewed them about their first impressions [92]. And Kazemitabaar et al. ran a controlled study with 69 pre-college students where half used Codex to learn Python and the other half did not [53]. They found that the Codex group could write code better while demonstrating a similar level of understanding as the control group.

These projects all focus on applying or extending AI coding tools. Our study complements their findings by reporting the perspectives of instructors regarding how they plan to prepare for a future where these tools become more widespread.

3.3 Perspectives of Computing Instructors

In terms of methodology, the closest related studies to ours are those that uncover the perspectives of computing instructors. For instance, Mirhosseini et al. interviewed 32 computing instructors across 5 countries to ask about the day-to-day challenges they face when running their courses [76]. Although their study did not ask about AI tools, they found that “having more examples or variety of assignments could benefit students both as additional resources as well as a way to prevent plagiarism.” [76] AI tools can potentially help generate these more varied examples and assignments [67, 97]. Krause-Levy et al. interviewed 21 CS instructors to get their views on the purpose of prerequisite coursework [56]. Valstar et al. interviewed 14 faculty [107] then surveyed 249 faculty [108] to discover how they felt CS programs should be preparing students for industry jobs. In contrast to these prior studies, to our knowledge our study is the first to interview computing instructors about their perspectives on AI code generation and explanation tools.

Of recent note is a SIGCSE position paper that shares similar motivations as our study. The authors warn that “the sudden viability and ease of access to these [AI] tools suggest educators may be caught unaware or unprepared for the significant impact on education practice resulting from AI-generated code. We therefore urgently need to review our educational practices in the light of these new technologies.” [14] Although that paper was not an empirical study, it poses relevant questions such as: “What does an introductory computing course look like when we can assume that students will be able to easily auto-generate code solutions to their lab and assignment tasks by merely pasting problem descriptions into an AI-powered tool?” Our interview protocol (Section 4) shares some similarities with this question, and some of our findings confirm what the authors foresaw as challenges and opportunities of AI tools (see Section 6 for details). Also, two other events at SIGCSE 2023 – a workshop on generating course materials using AI tools [67] and a Birds-of-a-Feather session on the implications of AI for computing instructors and students [65] – indicate ongoing community interest in the topics that our paper covers.

4 METHODS

To gather instructors’ perspectives on AI tools, in early 2023 we conducted semi-structured interviews with 20 instructors who teach introductory programming courses at universities across 9 countries. Each was done by one researcher over Zoom videoconferencing, lasted 45 minutes to 1 hour, and was video-recorded upon getting verbal consent from the participant. Our interview protocol was semi-structured and began with three background questions:
• What is your level of personal experience with AI code generation and explanation tools?
• How much do you think that students are using these AI tools right now?
• How much have you heard your colleagues discussing these AI tools? (And in what settings?)

The purpose of these background questions is to establish a baseline for each participant’s perceptions of the status quo in early 2023. They also help get the participant into the frame of mind to discuss our main open-ended question:

Imagine a future where all students had an AI tool that can: 1) automatically write code to ‘perfectly’ solve any programming problem in your classes and be undetectable by plagiarism detectors since AI tools generate diverse code variants (not exact copies), and 2) explain what any piece of code does in English so that it can answer free-response homework questions for students too. Walk me through your CS1/CS2 course materials and let’s brainstorm how you would help students to learn effectively given this possible future. What might you do in both the short-term and longer-term?
Table 1: We interviewed 20 introductory programming instructors to elicit their thoughts on the use of AI code generation and
explanation tools in their classes. The ‘Years’ column indicates years of experience as a full-time instructor so far.
ID Gender Age Country Years Type of university Prior experience with AI coding tools
P1 F 35-44 U.S. 9 Private PhD-granting experimented with ChatGPT in her class
P2 F 35-44 U.S. 8 Public PhD-granting no usage
P3 M 35-44 Chile 8 Private PhD-granting minimal usage (ChatGPT)
P4 M 45-54 U.S. 16 Private PhD-granting minimal usage (Copilot)
P5 M 65-74 Switzerland 35 Public PhD-granting no usage
P6 F 55-64 U.S. 23 Private liberal arts minimal usage (ChatGPT)
P7 M 35-44 U.S. 8 Public PhD-granting used for programming and in his class
P8 F 35-44 Botswana 3 Public PhD-granting no usage
P9 F 35-44 U.S. 14 Public PhD-granting no usage, purposely avoiding AI tools for now
P10 M 45-54 Rwanda 15 Private PhD-granting no usage
P11 M 55-64 U.S. 25 Private liberal arts minimal usage (ChatGPT)
P12 F 35-44 U.S. 6 Private liberal arts no usage
P13 M 25-34 China 2 Public PhD-granting minimal usage (ChatGPT)
P14 M 45-54 Canada 23 Public undergrad-only lots of usage for programming
P15 F 35-44 U.S. 2 Public PhD-granting used for programming
P16 F 25-34 U.S. 4 Public PhD-granting lots of experience using AI tools in her class
P17 F 45-54 U.S. 14 Private liberal arts minimal usage (ChatGPT)
P18 M 45-54 Spain 20 Public PhD-granting no usage
P19 M 55-64 Australia 33 Public PhD-granting no usage, but has AI research experience
P20 M 25-34 U.S. 3 Public undergrad-only experimented with ChatGPT, AI researcher
We displayed this question on-screen via Zoom screen-share and spent the majority of each interview focused on it. We chose this walkthrough format because having an artifact to discuss (such as course materials) can help more substantive ideas come out of brainstorming sessions. One risk here is that participants might get fixated on low-level details, so we also gave them time to do higher-level reflections before and after walking through their course materials.

To reduce cognitive biases such as priming [12] or anchoring [41] effects, we purposely did not mention specific tools such as ChatGPT or Copilot in our interview protocol. Everything was worded generically as ‘AI tools.’ However, if participants started talking about specific tools, then we let the conversation naturally turn to discussing the details of those tools.
4.3 Data Overview and Analysis

One researcher conducted each interview via Zoom and took timestamped notes during the interview. Then a second researcher independently watched each Zoom video recording and took their own set of notes. Both researchers met regularly to discuss their notes and watch excerpts of videos together. Throughout this process, we iteratively came up with a set of themes using an inductive approach [30]. We made several iterations together as a team before finalizing our split into the short-term and long-term curriculum ideas that participants mentioned in their interviews, with long-term itself then split into two groups (Section 5.3 and Section 5.4, respectively). We originally positioned sets of ideas as ‘dimensions’ along a continuous spectrum like a design space diagram [45, 58]. However, we realized that the ideas that participants proposed were more discrete in nature (e.g., using paper exams vs. computer-based exams) rather than falling along a spectrum, so we ended up grouping our findings using the format summarized by Figure 1.

4.4 Study Scope and Limitations

We scoped our study to CS1 and CS2 university instructors since, at this time, most prior research on AI coding tools focuses on these introductory courses [32, 33, 38, 39, 68, 97, 111]. Thus, upper-division university courses, K-12 settings, and informal learning environments outside of schools are beyond the scope of our study, so our findings may not generalize to them.

Next, we designed our interview study as a speculative futures brainstorming exercise. Thus, if that future does not materialize (e.g., if AI tools stop being developed or are made less accessible) then the findings from our study may not be as applicable for future researchers or practitioners. Relatedly, while we asked participants to brainstorm how they might adapt their courses around a hypothetical future AI tool, in practice they often talked about a specific present-day tool such as ChatGPT or Copilot because those tools are what they have heard about or tried firsthand. Although we encouraged participants to generalize beyond current tools, some of our findings may still be tied to what participants think about the present-day capabilities of AI tools rather than what future tools might look like.

Lastly, even though our participants came from 9 countries across 6 continents, the majority were still from universities in the United States. Thus, despite our efforts to recruit globally, our findings are not as globally-representative as we would ideally like [16]. And since we conducted all the interviews in English, we are likely missing out on the diverse experiences of instructors worldwide who teach programming in other natural languages [44].

5 FINDINGS: COMPUTING INSTRUCTORS’ PERCEPTIONS OF AI CODING TOOLS

Although our 20 interview participants teach courses in a wide range of institution types and locations (see Table 1), everyone felt that their learning environments would likely change in response to the growing presence of AI coding tools. To organize the themes that surfaced from our interviews, this section begins with what instructors have heard so far about these tools (Section 5.1) and any short-term changes they are now making to their courses (Section 5.2). Then we present the ways they envision their courses changing longer-term in response to AI, which fall into two possible futures: one where students are discouraged from using AI tools in introductory programming courses (Section 5.3), and another where these courses embrace AI tools (Section 5.4).

5.1 What do instructors currently know about AI coding tools in early 2023?

We began each interview with questions about each participant’s personal experience with AI coding tools, how much they think students are now using them, and how much their colleagues are discussing this topic. Their responses represent a baseline as of early 2023, a few months after ChatGPT launched on Nov 30, 2022.

Personal experience so far: Although programming assistance tools have existed for many years (e.g., code autocomplete in IDEs), participants mentioned that AI-based tools only came to the forefront of their attention over the past year. A few were early adopters of trying GitHub Copilot for personal programming in 2022 (e.g., P7, P14, P15), but the majority started being aware of AI coding tools after the release of ChatGPT at the end of 2022. The rightmost column of Table 1 shows that roughly half tried out ChatGPT to varying degrees, ranging from casual personal use to experimenting with integrating it into their classes already. Eight participants reported never having used these tools yet (‘no usage’ in Table 1), but all had heard of them being discussed by others. P9 purposely avoided using AI tools for ethical reasons: since she told her students not to use them, she wants to follow the same rules herself.

How much do they think students are using AI coding tools? Since for many instructors this is their first full teaching term where ChatGPT is available, nearly everyone we interviewed responded with some variant of “I don’t know.” The general sentiment was that as instructors it was hard for them to get a sense of whether students were using AI tools because students would likely not tell the professor about it. P7 asked one of his undergraduate TAs whether she had witnessed students using it, and the TA responded that it almost felt like a taboo topic. The TA said she was afraid to be seen in the computer lab with ChatGPT open in her web browser (even for innocuous use cases) out of fear that she may be setting a bad example for her students; she said that it felt like being caught browsing a website that offered cheating services. P14 was the only one who saw direct evidence of students using AI tools. For some assignments he requires students to submit screenshots of their computer, and he has seen screenshots with the ChatGPT website open in a browser tab with the start of the assignment prompt clearly visible. He told us that “they don’t even try to hide it.”

How much are their colleagues discussing AI tools? Despite not having a clear sense of how much students were using AI coding tools, all participants had heard their colleagues discussing these tools in recent months. These discussions ranged from informal hallway chats to faculty mailing list threads all the way to official department committees formed to investigate policies around AI tools. For instance, P6 said that over 20 faculty across her university contributed to a 57-message-long email thread reacting to the implications of AI tools for teaching. P9 said that her department formed a committee to investigate and eventually make policy decisions about AI usage in computing courses.
P13 was the only one who reported their university issuing an official policy decision; he showed us a policy document banning the use of ChatGPT and other AI tools in all classes across the university.

5.2 Short-term concerns about cheating lead to immediate course adjustments

Although our interview questions encouraged participants to think longer-term about future uses of AI tools in their classes, everyone started their conversation by bringing up the topic of cheating as an immediate short-term concern that is on their minds. Note that all 20 participants brought up the topic of AI-assisted cheating near the start of their interviews even though we did not mention cheating as a topic in our interview questions.

A common concern was that students who relied on AI code generation tools to get the answers would not be learning the material. For example, a student who used an AI tool to complete an entire assignment could receive a good grade without meeting the learning objectives. P12 and P15 both independently pointed out that since future AI tools may become more seamlessly integrated into IDEs, some students might be unintentionally using them without realizing they are activated, thus hindering their learning.

Even though students already have access to outside resources such as Stack Overflow, assignment solutions leaked on the web, and peers who can help them, instructors were especially concerned about AI tools because these tools generate variations of code that are not exact copies and are thus less detectable by plagiarism detectors. Instructors with more prior AI experience, such as P14, said that one way to detect the use of these tools is if students submit assignments containing more advanced programming constructs (e.g., Python list comprehensions or lambdas) that have not yet been taught in an introductory course.

This concern was compounded by the pervasive belief among instructors that AI tools had inherent limitations – even though these tools could get “95%” of the problem correct, they would never be able to produce correct code in all scenarios, as P15 described: “In the real world, programming is all about edge cases, so you get 95% there and that other 5% is the tricky part. Copilot is pretty good with the 95%.” Similarly, P15 noted how “a lot of times the code it [GitHub Copilot] creates is fairly subtly incorrect because your situation is just slightly different from where it learned from, so you usually have to tweak one or two things.” Thus, she perceived that not only would students using AI tools be cheating, but they would be “cheating badly” since their code would be incorrect in subtle ways that they would not be able to understand.

Short-term adjustments to courses: In response to concerns about academic integrity, the majority of participants (14 out of 20) were already making adjustments during the current term. Some common examples include:
• Weigh exam scores more heavily: P3, P5, P7, P12, P15, P17, and P18 adapted by weighing exam scores more in students’ final grades, which involved no changes to course content. The rationale for doing so is that exams are taken in controlled environments, so it is presumably harder to use AI tools to cheat on them. But P14 noted that exams are not cheat-proof: He witnessed a recent cheating case where a student had a smartphone hidden in their lap, took photos of their paper exam pages, and sent those photos to friends in a WhatsApp group; those friends then used ChatGPT to generate the solutions and messaged them back.
• Ban AI tools in class: P2, P4, P9, and P13 added bans of AI tools into their syllabuses this term. They showed us their syllabus and described how they wrote the bans using language that equated AI with other forms of code plagiarism. A few others, such as P6 and P17, referred students to their university’s honor code when they had questions about AI, but they did not issue a course-wide ban since they wanted to deal with usage on a case-by-case basis. P17, P19, and P20 argued against such bans, with P19 remarking that “it would be impossible to enforce, so why bother?” Similarly, both P17 and P20 said that an official ban would only arouse students’ curiosity since a ban could be interpreted as the school admitting that AI tools were effective.
• Expose students to the capabilities and limitations of AI tools: P1, P7, and P16 took the opposite approach by showing students what AI tools can and cannot do. P1 added an optional exercise where students use ChatGPT to solve a programming problem and turn in a chat transcript annotated with their reflections. P7 showed a live coding demo in class where he copied homework questions from a prior term into ChatGPT and asked his class to critique the AI-generated code to assess its strengths and weaknesses. P16 did something similar by annotating how ChatGPT solved her class’s prior homeworks. Then she let students use any AI tool they wanted on a take-home coding exam and reflect on their experiences. Note that these activities were easy to add to their current courses because they did not involve creating brand-new assignments. The rationale here was to show students that these tools were imperfect, so if they wanted to use them then they had to carefully scrutinize the generated code, which may itself be a learning opportunity.

Participants remarked that these imperfect short-term patches were the best they could do right now given time constraints. The sudden appearance of ChatGPT at the end of 2022 meant that when they started teaching in January 2023, it was the first term when lots of students had access to a tool that could potentially solve their programming assignments. The last time they taught their current class, usually 6 to 12 months ago, AI tools were not nearly as widespread or easily accessible to students.

5.3 Longer-term ideas (1 of 2): Resisting AI tools may improve programming pedagogy

While everyone started their interviews by discussing cheating as a short-term concern (Section 5.2), eventually the conversation turned to brainstorming longer-term ideas that they could potentially implement over the next year or few years. These ideas were either about 1) resisting the use of AI tools in introductory programming courses (this section), or 2) embracing AI tools and integrating them into new curricula (Section 5.4).

Why resist? Participants wanted to resist using AI tools in introductory programming courses due to:
• Importance of learning programming fundamentals: The most common reason given here is that participants felt it is still important to learn the fundamentals of programming, even if AI tools will be doing a lot of the coding in the future. Several made the oft-repeated analogy to math education after the introduction of calculators: Students still learn the fundamentals of arithmetic and algebra even though calculators can do all of those routine operations. P1 said that using AI coding tools is like “giving kids a calculator and they can play around with a calculator, but if they don’t know what a decimal point means, what do they really learn or do with it? They may not know how to plug in the right thing, or they don’t know how to interpret the answer.” P1 and P5 both made an analogy to power tools versus hand tools for performing physical labor. P5 said that AI is like a power tool that professionals use, but novices should start with the equivalent of hand tools (i.e., writing code themselves) to understand the fundamental principles before graduating to power tools. P1 said that “it feels a bit weird to give them [CS1 students] this power so early.” P8 brought up the concept of fragile knowledge [88] and how she was concerned that if students use code that they did not write themselves, then the mental models they build about that code may not be robust; she likened it to how she currently sees students copying code from Stack Overflow into their projects and using that code without trying to understand it first.
• Ethical objections to AI: P9 brought up the lawsuit filed against GitHub [22] for potentially violating the software licenses of open-source code repositories they used to train the Codex model that powers Copilot. She said that since the legal ramifications of these AI tools have not been clarified yet, it may be unethical for her to even teach her students how to use them: “What if my students use AI at their job and their company gets sued, that’s not good!” She does not want to risk teaching her students to do something that may turn out to be illegal. Similarly, P17 brought up how these models ingest not only open-source code but terabytes of written and image content created by people who did not consent to have their work used in AI tools without attribution.
• Potential lack of equity and access: P6 raised questions about who is providing the data for training these AI systems, whether that data is representative of certain groups of people, and whether users may unknowingly reinforce biases by programming using AI tools trained on such data. P9 was concerned that students who are more ‘in the know’ about technology trends will learn how to use AI tools from peers while those without much prior technology exposure will not (i.e., an AI digital divide [31]). Thus, even though she is currently banning AI tools in her class, she may consider teaching with them in the future in order to share this knowledge with students in a more equitable way so that everyone starts off with access to this same baseline knowledge.

How to resist? Participants proposed the following ways to resist AI tools in their courses, operating under the assumption that more and more students will gain access to these tools in the future. They cannot stop students from using AI since even an official ban in the syllabus cannot always be enforced, so they want to redesign their curricula to mitigate its effects.
• Designing AI-proof assignments: One set of ideas for resisting AI tools involved designing assignments to be more ‘AI-proof.’ Several participants mentioned how traditional CS1/CS2 assignments consisting of self-contained programming tasks that are autograded with a test suite are no longer viable since AI can solve them [32, 104]. One way to improve upon them is by adding more local context (see the code sketch after this list). For instance, P7 walked us through a Java CS1 assignment that used thousands of lines of starter code to wrap around the Twitter API, which he and his TAs had written just for this class. He tried putting his assignment questions into ChatGPT and, unsurprisingly, it could not generate good solutions since it lacked the context of his starter code. P4 and P6 showed us similar setups involving locally-written libraries of code, with P6 walking us through an assignment that uses a custom graphics library she developed for her class. These instructors felt that creating context-specific assignments with bespoke starter code may be a good way to stay ahead of AI tools’ capabilities.⁴ P3 (from Chile) and P8 (from Botswana) described a different form of local context: cultural and language context. They both wanted to incorporate local slang and cultural references from their home regions into programming assignments because their hunch is that AI tools trained on U.S./English-centric web data would likely not be knowledgeable about those nuances and would not be able to produce code to solve those assignments.

⁴ A few weeks after these interviews, Sourcegraph announced Cody [7], which uses retrieval-augmented generation [60] to take in the context of a user’s entire custom codebase. Future AI tools will likely be able to ingest large bespoke codebases as well.

• Bringing back paper exams: P1, P4, P6, and P10 proposed going back to paper-based exams to assess learning since it would be harder to cheat using AI in this format. In recent years, especially during remote-only classes due to the global pandemic, there has been more of a trend toward computer-based exams in CS1/CS2, with the benefits being that students can run, test, and debug their code. However, students need to install special software that locks down their browser or records their session or even webcam, which can feel invasive [9]. Paper exams eliminate the need for such software and make it harder (but not impossible) for students to use devices to cheat since proctors can scan the room to make sure that only pencil and paper are present. P1 supported this idea but acknowledged that exams can feel too high-stakes and stress-inducing. So she proposed giving more frequent lower-stakes quizzes throughout the semester and letting students drop their 3 lowest quiz scores.
• Oral, video, and image-based assessments: Aside from paper exams, some participants also brainstormed other forms of assessment that could both 1) prevent the use of AI tools and 2) assess student learning in more meaningful ways. P7 and P10 brought up using oral exams where a TA would question students live one-on-one, though they acknowledged the challenges of scaling to large classes.
To scale better, P7, P9, and P13 wanted students to video-record themselves tracing through code execution or explaining code that they just wrote. P3 brought up the fact that since AI tools can take only text instructions as input, he wanted to create assignments where the inputs are images, such as sketching out what he wants the student’s code to do.⁵

⁵ The week after we interviewed P3, OpenAI announced a new GPT-4 LLM, which can now take images as input and analyze those images to generate relevant code [83].

• Process-based assessments: P9 mentioned that in software engineering courses, students are already graded in part based on the process of engineering (e.g., making feature branches and pull requests on GitHub, doing code reviews for teammates), not just on the final output. Can we adapt something similar for CS1/CS2? This may discourage students from using AI tools since they will not be able to explain their design process in depth. (But future AI tools might be able to come up with convincing process explanations too!)
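To make the ‘AI-proof assignments’ idea above more concrete, here is a hypothetical sketch (all names invented for illustration, not from any participant’s actual course) of an assignment that depends on a bespoke starter-code API; pasting only the question text into an AI tool omits the custom library that a solution must call:

    # Hypothetical course-specific starter code (stubbed down for illustration;
    # a real course library might span thousands of lines).
    from dataclasses import dataclass

    @dataclass
    class Tweet:                  # part of the course's custom starter library
        author: str
        text: str

    class TweetFeed:
        def __init__(self, tweets):
            self._tweets = tweets

        def tweets(self):
            """Yield Tweet objects; this API exists only in the starter code."""
            yield from self._tweets

    # Student task: count how many tweets in `feed` mention @username.
    # Without the starter-code context above, a generic AI tool cannot know
    # what TweetFeed is or how to iterate over it.
    def count_mentions(feed, username):
        return sum(1 for t in feed.tweets() if "@" + username in t.text)

    feed = TweetFeed([Tweet("ann", "hi @bob!"), Tweet("bob", "hello world")])
    assert count_mentions(feed, "bob") == 1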
Some participants predicted that their ideas for resisting AI tools may actually improve pedagogy in introductory courses. For instance, oral- and video-based assessments can get students to think more deeply about why code works the way that it does rather than simply writing code to obtain a known answer. And contextualized programming assignments can be more motivating to students than the generic questions that AI tools can now solve.

5.4 Longer-term ideas (2 of 2): Embracing AI tools for forward-looking CS curricula

The second set of longer-term ideas involved why and how to embrace AI coding tools in the coming years. Note that some of these ideas came from the same people who raised objections in the prior section, which indicates that participants were not universally pro- or anti-AI. Notably, several participants, such as P2, who had put short-term bans on AI tools in their current classes were open to the idea of allowing AI in the future if they had time to adapt their curricula. P2 mentioned: “I feel like the class would have to change a lot because almost all the questions we ask students could be solved by this thing [ChatGPT]. We’re gonna have to change what we’re asking of them.”

Why embrace? Participants gave four main reasons why they wanted to integrate AI tools into their curricula.
• Preparing students for future jobs: The most common reason instructors gave for integrating AI tools into CS1/CS2 was that they felt responsible for preparing students for a future where they will likely be programming using AI. P2, P12, and P18 argued that learning computing should not be about programming per se but should rather be about how to use software tools to solve real problems; up until now writing code has been the most expressive way to do so, but if using AI to generate code becomes the industry norm, then that is what we should be teaching. P17 stated that “it’s inevitable” that AI tools will pervade future workplaces, so the sooner her students learn to evaluate the output of those tools, the more prepared they will be for industry. This heavy focus on job preparation was also emphasized by P20, who teaches at a public undergraduate-only school designated as a U.S. minority-serving institution (over 70% of students belong to a minority ethnic group): P20 mentioned that many of his students are looking for focused vocational job training, so it was his responsibility to keep on top of AI coding trends to help his students remain competitive for jobs.
• Making one’s institution more competitive: A related motivation for embracing AI was several instructors’ desire to make their institutions stand out among their peers, which could help attract future students. P11 predicted that many universities will be slow to change in response to AI – if his CS department can be among the first to integrate AI coding tools into its curriculum, then it can emerge as a leader in this field. P16 mentioned that the rise of AI tools is a fun opportunity to update her curriculum and that it is generating excitement amongst others in her department. Thus, if more colleagues adopt these tools, then that might elevate her department’s reputation as a leader in pedagogy.
• Covering more advanced material in CS1/CS2: P2, P11, P12, P16, and P19 discussed how if students can code with the help of AI tools, then that will enable instructors to cover more advanced material in CS1/CS2. Currently, significant amounts of time are spent teaching the rote mechanics of programming. But if AI tools can automate away those mechanics, then CS1/CS2 can be more about software design, which is usually reserved for more advanced courses. P16 said “I could imagine having a lot more fun talking about concepts rather than syntax” and felt that AI would allow her to cover a lot more ground in CS1. P19 proposed to get rid of what we now think of as CS1/CS2 and instead have students jump straight into software development courses since AI may be able to take care of the more mundane coding details in the future. He made an analogy to the invention of compilers a few decades ago: “We don’t need to look at 1’s and 0’s anymore, and nobody ever says, ‘Wow what a big problem, we don’t write machine language anymore!’ Compilers are already like AI in that they can outperform the best humans in generating code.”
• AI may improve equity and access: In contrast to Section 5.3, where participants mentioned objections related to equity and access, others felt that AI could be beneficial here. For instance, P12 works at a liberal arts college and teaches introductory programming using Arduino to provide a motivating context for students from art and design backgrounds. She felt that the mechanics of coding was discouraging for many of her students who just wanted to create interesting projects. She mentioned that if AI could generate this low-level Arduino code then students could be more motivated to do more creative design and problem-solving in class. P15 similarly mentioned how writing English prompts for AI tools can be far less intimidating than writing code and having to get all the syntax right, so that could make programming more accessible to a wider range of student backgrounds. P19 was excited by how AI code generation can potentially improve diversity in the software industry by encouraging more diverse types of students to learn computing, including those who are currently discouraged by having to manually write code.
curricula will be more about fostering good communication (i.e., communicating with AI via natural language prompts and back-and-forth dialogue) rather than the obscure details of programming language syntax and semantics.

Integrating AI tools into current courses: Some participants brainstormed ways that AI tools could be integrated into their current courses without requiring any major structural changes.

• Giving personalized help to students: P12, P15, and P17 were excited that by doing some prompt engineering [112], one could use AI tools to explain step-by-step how code works, teach relevant concepts on-demand within the context of each student’s code, and create extra exercises for students who want more practice. P12 reported that after students complete assignments or labs in her class, they never go back to reflect on why they might have gotten the answers right or wrong. She tries to provide these explanations during office hours and hopes that AI can help do this for students in the future: “Imagine you hover over a line of code in your IDE and it explains it to you, that could be great for learners.” P17, who teaches at a liberal arts college, described how she purposely does not use a textbook for CS1 so that she can customize the curriculum for its liberal arts context. However, she finds that students sometimes come to her office hours asking for more examples of a specific concept they are learning. So she wants an AI tool that can generate these additional examples and explanations in response to student questions without needing her to be present.

• Helping instructors with time-consuming tasks: P14 has already been using AI tools to generate new variants of his programming assignments. He is doing this so that he can create fresh new variants each term to prevent students from copying the answers from friends who took his course in prior terms. In the past, he created variants manually, and it was very time-consuming. But this is something that AI is well-suited to do, since it can generate many candidate variants and he can pick the most suitable ones to use. Similarly, P17 currently creates small drills for her students to practice basic programming, akin to basic math drills; she wants to use AI to help her generate lots of these drills to give students more practice. P15 and P18 said that their TAs hand-grade hundreds of programming assignments using rubrics to check for code style best practices, and they felt that AI could be trained to do this sort of stylistic grading, which is akin to doing a code review. Lastly, P17 wanted to use AI as a helper at her office hours: since her CS1 students often ask the same types of basic questions, she could let the AI take a first pass at answering those while she spends time on the harder questions.

Designing new AI-embedded course materials: In contrast to the ideas above, which can be retrofitted into existing courses, the following ideas aim to redesign courses with AI tools in mind.

• Focusing more on code reading and critique: P9, P14, and P15 proposed shifting introductory computing courses more toward reading and critiquing code rather than simply writing code. P9 raised this question during her interview:

“If these tools are the norm moving forward, the emphasis in intro courses might move from code writing to code reading, code comprehension, and testing. How do you validate what comes out of an AI tool? Does it really meet the expectations that you have? [...] Code comprehension and code reading are going to be super important, like can you trust what code you get out of an AI tool?”

Similarly, P14 described how he wanted his introductory programming course to turn more into an English or literature class where the emphasis turns to reading, critically analyzing, and editing code that may be produced by AI tools. He made the analogy to teaching students to become good editors rather than just writers. Note that this skill of code review (i.e., critiquing others’ code) is useful even when collaborating with other humans in the workplace, as P15 noted: “Yeah I think [code review] is generally going to be a skill for a while, even if someone isn’t using Copilot, it’s a skill you’re going to need when working with others in software engineering.”

• Creating open-ended design assignments: P2, P3, P11, P12, P15, and P19 were excited about the prospect of making more open-ended, design-based assignments as early as CS1/CS2 rather than waiting until more advanced courses. This could become possible because future students will not be as hindered by the mechanics of writing code if AI can help do it. For instance, P2 mentioned giving data science problems where students can find realistic data sets and analyze them in more creative ways, with AI helping them write code using the Python pandas data analysis library [4]. P15 and P19 wanted a portfolio-based approach to assignments like what occurs in art and design schools: Students could design their own projects and use AI to help them code up a prototype. P3 mentioned that open-ended project grading could be done by probing how well each student understood the process of working with AI, including its strengths and weaknesses: “Tell me what things you tried to do with the AI and failed, and why. If you just press a button and the AI does it all for you, then you won’t have a good story to tell about what challenges and setbacks you faced while doing this.”

• Having students work collaboratively with AI: Going one step beyond open-ended assignments, P8, P11, P12, and P19 wanted to design assignments where students have to work collaboratively with AI. For instance, P11 and P12 both proposed algorithm design problems where the student would assume the role of a ‘client’ and specify what they wanted to make in English. Then the AI would generate code, and the student would test and critique it and pass it back to the AI for the next round of iteration. P19 proposed this sort of collaboration even during exams: AI could generate problems for the student to solve and then help the student along when prompted with clarifying questions. P8 proposed a variation of the think-pair-share [51] classroom activity where a pair of students would try to solve a programming task (e.g., reversing a string) and then prompt an AI to solve it. Then each pair would discuss how their human solutions compare to the AI solution (see the illustrative sketch below).
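To make this last activity concrete, here is a minimal illustrative sketch, in Python, of the kind of side-by-side contrast that P8’s think-pair-share exercise could surface for the string-reversal task. It is our own hypothetical example rather than material from any participant’s class: the loop-based function stands in for what a novice pair might write by hand, while the one-liner stands in for the terser idiom that a tool like Copilot or ChatGPT might suggest.

```python
# Hypothetical artifacts from P8's proposed think-pair-share activity:
# students first write their own solution, then prompt an AI tool for
# one, and finally discuss how the two solutions compare.

def reverse_string_human(s: str) -> str:
    """Loop-based solution that a novice pair might write by hand."""
    result = ""
    for ch in s:
        result = ch + result  # prepend each character to reverse the order
    return result

def reverse_string_ai(s: str) -> str:
    """Terser solution in the style an AI assistant might suggest."""
    return s[::-1]  # slicing with a step of -1 walks the string backward

if __name__ == "__main__":
    # Checking that the two versions agree is itself discussion fodder:
    # how would students convince themselves the AI's version is correct?
    for text in ["hello", "racecar", "ab", ""]:
        assert reverse_string_human(text) == reverse_string_ai(text)
    print("Human and AI solutions agree on all test inputs.")
```

Even on a task this small, the two versions give each pair something concrete to critique, such as readability, idiomatic style, and how to validate the output of an AI tool before trusting it, which connects directly to the code-reading and code-critique skills that P9, P14, and P15 emphasized above.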
These sets of ideas reflect the sentiment that it is inevitable that AI tools will become more widespread, so it is futile to resist [6]. Recall that P11 mentioned how computing departments that adapt fastest to this change will emerge as leaders in the coming decade; he recalled a similar moment in the 1990s when many students first gained widespread internet access. Similar to today, instructors were concerned that students could simply look up all the answers online. But eventually, all schools had no choice but to adapt to the internet, and the departments that first embraced students using the internet had an advantage because they adapted early. P11 believed that a similar outcome could happen for AI tools.

6 REFLECTION AND DISCUSSION

The most distinctive aspect of our interviews was their timeliness. We conducted them in early 2023, during the first full academic term after ChatGPT’s release in late 2022, which was likely the first time that many CS1/CS2 instructors started thinking about what to do with their courses in light of the growing prevalence of AI tools. Thus, our study captures a unique moment in time when instructors have started brainstorming but have not solidified their plans yet.

Due to this timing, participants seemed enthusiastic to talk about this topic because it was already on their minds, regardless of whether they had used AI coding tools. Note that we did not purposely recruit instructors who had experience with these tools; we tried to get a diverse sample of CS1/CS2 instructors around the world. As Table 1 shows, many had little experience with AI tools. At the end of their interviews, several mentioned how our conversation helped them to clarify their own thinking about AI coding tools and that they were curious about the ideas that other study participants came up with. For instance, at the end of his interview, P3 asked unprompted, “One more thing, when you write the paper, can you email it to me since I am very interested in seeing [what other instructors said]?” We emailed a draft of the accepted paper to all participants to get any additional feedback they had before finalizing the camera-ready.

A limitation of our participants being personally invested in this topic was that we felt there were anchoring effects [41] in their proposed ideas. Although we designed our interview prompt to encourage open-ended, speculative futures brainstorming (Section 4), all participants anchored their responses to their knowledge of present-day tools such as ChatGPT and Copilot, obtained either through personal experience or from what they heard from colleagues. Thus, some of their ideas about longer-term course changes felt like direct reactions to these tools rather than radically new notions of what computing education ought to be like in the future.

Relating to ideas from other computing education researchers: In the midst of our interviews, we were made aware of several blog posts and a position paper written by computing education researchers in early 2023. To avoid biasing our interviews and data analyses, we waited until after completing our analyses to read these articles in depth. Here we reflect on some points of commonality between our study and what they wrote: Our participants’ ideas about re-orienting programming education toward critiquing code produced by AI resonate with the Bootstrap Project’s blog post [19], which posits that meaningful learning happens when one can verify (or refute) someone else’s solution (whether that someone is another human or a machine). Some of our findings also resonate with Ko’s pair of blog posts [54, 55], such as participants’ concerns about students overrelying on AI tools as shortcuts to bypass learning (Section 5.2) and the opportunity to focus more on specifying requirements for software rather than the mechanics of coding (Section 5.4). Lastly, our study findings add empirical detail to several of the high-level themes that Becker et al. proposed in their position paper [14], such as using AI to give personalized tutoring, to help instructors with time-consuming tasks such as creating exercises, and to re-orient courses more toward code reading, along with concerns about academic integrity (Section 5.2) and ethical objections to AI tools. In addition, we present new ideas such as instructors’ varied motivations behind why they wanted to either resist or embrace AI tools, along with specific proposed ways to make assignments and exams more ‘AI-proof’ (Section 5.3).

7 OPEN RESEARCH QUESTIONS

Since we are still early in the adoption curve of AI coding tools, as a community we now have a rare window of opportunity to guide their future usage in effective, equitable, and ethical ways. To work toward this goal, we hope that researchers can investigate some of the relevant open questions that our study findings raised:

• Theory-building – Our participants (even those who want to embrace AI tools) worry that students will become overreliant on AI tools without knowing how they work. Thus, we feel that it is important to build theories about how people believe these tools work. For instance, what mental models do novices currently form, both about the code that AI generates and about how the AI works to produce that code? How do those novice mental models compare to experts’ mental models? And what strategies (if any) do novices and experts currently use to try to validate AI tool outputs for themselves? These questions will be critical for designing techniques to guide novices to form viable mental models so that they can learn to use AI tools effectively.

• Scaffolding novice understanding – Related to the above, how can we add pedagogical scaffolds to the outputs of AI tools to help novices understand how they are coming up with their code suggestions or explanations? Having the AI “show its work” can potentially help novices to better understand both its capabilities and limitations. Recent lines of research from the HCI and XAI (Explainable AI) communities could provide some inspiration. For instance, the grounded abstraction matching technique [64] may be one starting point.

• Tailoring AI coding tools for pedagogy – Current AI coding tools are meant to directly help a programmer write code as quickly and efficiently as possible. However, such directness may not be the best for pedagogy, since it gives away the (possibly-right, possibly-wrong) answer without making the learner think deeply. How can we tailor these tools to become better at teaching rather than doing, perhaps by integrating pedagogical content knowledge [48] about programming?

• Adapting IDEs for AI-aware pedagogy – Today’s IDEs are optimized to streamline code writing, but if future AI-aware
curricula move toward emphasizing skills like code comprehension and critique, how should we redesign IDEs to align with these goals? For instance, how can we design IDEs to discourage students from developing harmful habits, such as overreliance on AI-generated code? And how can pedagogical IDEs nudge students to engage in activities such as reading and critically analyzing AI-generated code?

• Equity and access – Participants brought up how AI tools can be both beneficial and detrimental to these goals. So how can we design curricula that use these AI tools in such a way as to work toward greater equity and access? On one hand, current AI systems are criticized for their negative impacts on equity [17, 63]. But if these systems become more widespread, then perhaps it is necessary to teach everyone to use them, or else a new digital divide [31] may open up between those with and without access to modern AI tools. As P9 shared at the end of her interview: “My concern would be from an equity perspective, the students who don’t know about AI tools would be disadvantaged. So how would you make it so that everyone has the opportunity to use them, from an equitable perspective?”

• Efficacy studies – How can we tell whether AI tools in introductory courses make students more effective? Can we design controlled experiments where cohorts of CS1/CS2 students receive either AI-enriched or AI-free curricula and track their progress throughout the major? How can we design and sustain these longitudinal studies in an ethical way if it turns out that one condition is significantly better or worse for students? And will it even be possible to enforce a control condition if AI tools become so pervasive that most students are regularly accessing them outside of class?

• Evaluating AI-aware assessments – Can we effectively assess student knowledge if future students collaborate with AI tools on their assignments (and perhaps even on exams)? Our participants suggested a variety of alternative assessment methods, such as having students record video explanations. How can we evaluate whether these methods are effective?

• Upper-division computing courses – Our study focused on introductory programming courses, but what about uses of AI tools in upper-division courses where the learning objectives differ? Five of our interview participants mentioned how even though they were opposed to AI in introductory courses, they actually wanted to use them in upper-division courses such as software engineering labs.

• (programming != computing) – Related to the above, our paper focuses on introductory programming education, but computing education encompasses a much broader set of topics and learning objectives. How will AI tools like current and future LLMs [24] affect the many facets of computing education that are not just about writing and running code? What about using AI to teach computing concepts to students who do not necessarily want to become programmers [27, 109]?

• Scaling instruction – It seems plausible for AI tools to help scale instructors’ human expertise. For example, in large classes it is impractical for any one instructor to provide in-depth feedback on hundreds of code submissions. If AI could be tailored using an instructor’s past feedback, these tools could enable consistent, personalized instruction at scale. How might we enable instructors to tailor AI tools to their own context and desires, and what impact might this have on the way instructors design and deliver their classes?

• Beyond autograded programming assignments – Related to the above, CS1/CS2 assignments currently often consist of programming prompts that are graded by autograder software. This is the status quo not necessarily because it is ideal but simply because of how well it scales to large classes. It is hard to scale up open-ended free-response questions that need to be hand-graded by humans. However, modern LLMs show promise in natural language reasoning, so perhaps they can help to both design and assess more interesting assignments that go beyond writing code to pass autograder test cases. But that raises concerns about the ethics of having AI grade student assignments, so any such interventions may need to be carefully vetted by instructors.

• Rethinking CS1/CS2 in light of AI tools – Lastly, what if we could redesign CS1/CS2 without following the traditions of the past 50+ years [15] of research and practice in our field? If AI coding tools become more pervasive in the future, what timeless pedagogical themes should still remain the same, and what aspects need to be radically reconsidered? How can we prepare our students for the next 50 years? And what will programmers need to know in the year 2073?

8 CONCLUSION

We presented the perspectives of 20 introductory programming instructors across 9 countries on how they plan to adapt their courses in light of the growing prevalence of AI coding tools. Our study captures a rare moment in time during the first full academic term (early 2023) when these AI tools started becoming widely accessible. We found that in the short term, many planned to take immediate measures to discourage cheating. Then opinions diverged about how to work with these AI tools longer-term, with one side wanting to ban them and continue teaching programming fundamentals, and the other side wanting to integrate them into courses to prepare students for future jobs. We hope these findings, along with our open research questions, can spur conversations about how to work with these tools in effective, equitable, and ethical ways.

ACKNOWLEDGMENTS

Thanks to all of our interview participants for sharing their time and expertise, to the ICER reviewers for their feedback, and to Majeed Kazemitabaar and Sangho Suh for their feedback. Also thanks to a well-timed Hawaii vacation for giving Philip the much-needed headspace to begin thinking about research directions in this emerging area. This material is based upon work supported by the National Science Foundation under Grant No. NSF IIS-1845900.

REFERENCES

[1] 2023. Anthropic: Introducing Claude. https://www.anthropic.com/index/introducing-claude. Accessed: 2023-05-20.
[2] 2023. Codeium: The modern coding superpower. https://codeium.com/. Accessed: 2023-05-20.
[3] 2023. Introducing GitHub Copilot X: Your AI pair programmer is leveling up. https://github.com/features/preview/copilot-x. Accessed: 2023-05-20.
[4] 2023. pandas - Python Data Analysis Library. https://pandas.pydata.org/. Accessed: 2023-03-15.
[5] 2023. ReplitLM: Guides, code and configs for the ReplitLM model family. https://github.com/replit/ReplitLM/. Accessed: 2023-06-05.
[6] 2023. "Resistance is futile" – Borg – Wikipedia. https://en.wikipedia.org/wiki/Borg. Accessed: 2023-06-02.
[7] 2023. Sourcegraph Cody: Read, write, and understand code 10x faster with AI. https://about.sourcegraph.com/cody. Accessed: 2023-05-20.
[8] Ankur Desai and Atul Deo. 2022. Introducing Amazon CodeWhisperer, the ML-powered Coding Companion | AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/introducing-amazon-codewhisperer-the-ml-powered-coding-companion/. Accessed: 2023-03-23.
[9] Anushka Patil and Jonah Engel Bromwich. 2020. How It Feels When Software Watches You Take Tests - The New York Times. https://www.nytimes.com/2020/09/29/style/testing-schools-proctorio.html. Accessed: 2023-03-23.
[10] James Auger. 2013. Speculative Design: Crafting the Speculation. Digital Creativity 24, 1 (2013), 11–35.
[11] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, and Quoc Le. 2021. Program Synthesis with Large Language Models. arXiv preprint arXiv:2108.07732 (2021). arXiv:2108.07732
[12] John A. Bargh and Tanya L. Chartrand. 2014. The Mind in the Middle: A Practical Guide to Priming and Automaticity Research. (2014).
[13] Shraddha Barke, Michael B. James, and Nadia Polikarpova. 2023. Grounded Copilot: How Programmers Interact with Code-Generating Models. Proc. ACM Program. Lang. 7, OOPSLA 1, Article 78 (apr 2023), 27 pages. https://doi.org/10.1145/3586030
[14] Brett A. Becker, Paul Denny, James Finnie-Ansley, Andrew Luxton-Reilly, James Prather, and Eddie Antonio Santos. 2023. Programming Is Hard - Or at Least It Used to Be: Educational Opportunities and Challenges of AI Code Generation. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 500–506. https://doi.org/10.1145/3545945.3569759
[15] Brett A. Becker and Keith Quille. 2019. 50 Years of CS1 at SIGCSE: A Review of the Evolution of Introductory Programming Education Research. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education (Minneapolis, MN, USA) (SIGCSE '19). Association for Computing Machinery, New York, NY, USA, 338–344. https://doi.org/10.1145/3287324.3287432
[16] Brett A. Becker, Amber Settle, Andrew Luxton-Reilly, Briana B. Morrison, and Cary Laxer. 2021. Expanding Opportunities: Assessing and Addressing Geographic Diversity at the SIGCSE Technical Symposium. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education. 281–287.
[17] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 610–623.
[18] Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2023. Taking Flight with Copilot: Early Insights and Opportunities of AI-Powered Pair-Programming Tools. Queue 20, 6 (Jan 2023), 35–57. https://doi.org/10.1145/3582083
[19] Bootstrap Blog. 2023. What Do Tools like ChatGPT Mean for Math and CS Education? https://bootstrapworld.org/blog/misc/thoughts-on-chat-gpt.shtml. Accessed: 2023-03-23.
[20] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models Are Few-Shot Learners. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 1877–1901.
[21] Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Conference on Fairness, Accountability and Transparency. PMLR, 77–91.
[22] Matthew Butterick. 2022. GitHub Copilot Investigation · Joseph Saveri Law Firm & Matthew Butterick. https://githubcopilotinvestigation.com/. Accessed: 2023-03-23.
[23] Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, and Weizhu Chen. 2022. CodeT: Code Generation with Generated Tests. arXiv preprint arXiv:2207.10397 (2022). arXiv:2207.10397
[24] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, and Greg Brockman. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374 (2021). arXiv:2107.03374
[25] Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, and Denae Ford. 2022. "It would work for me too": How Online Communities Shape Software Developers' Trust in AI-Powered Code Generation Tools. arXiv:2212.03491 [cs.HC]
[26] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
[27] Parmit K. Chilana, Rishabh Singh, and Philip J. Guo. 2016. Understanding Conversational Programmers: A Perspective from the Software Industry. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI '16). Association for Computing Machinery, New York, NY, USA, 1462–1472. https://doi.org/10.1145/2858036.2858323
[28] Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, and Neel Sundaresan. 2020. PyMT5: Multi-Mode Translation of Natural Language and Python Code with Transformers. arXiv preprint arXiv:2010.03150 (2020). arXiv:2010.03150
[29] Mike Conover, Matt Hayes, Ankit Mathur, Xiangrui Meng, Jianwei Xie, Jun Wan, Sam Shah, Ali Ghodsi, Patrick Wendell, Matei Zaharia, and Reynold Xin. 2023. Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM. https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm. Accessed: 2023-05-20.
[30] Juliet M. Corbin and Anselm L. Strauss. 2008. Basics of qualitative research: techniques and procedures for developing grounded theory. SAGE Publications, Inc.
[31] Rowena Cullen. 2001. Addressing the digital divide. Online information review 25, 5 (2001), 311–320.
[32] Paul Denny, Viraj Kumar, and Nasser Giacaman. 2023. Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 1136–1142. https://doi.org/10.1145/3545945.3569823
[33] Paul Denny, Sami Sarsa, Arto Hellas, and Juho Leinonen. 2022. Robosourcing Educational Resources – Leveraging Large Language Models for Learnersourcing. arXiv:2211.04715 [cs.HC]
[34] Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, and Joel Hestness. 2023. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. arXiv:2304.03208 [cs.LG]
[35] Thomas Dohmke. 2022. GitHub Copilot Is Generally Available to All Developers. https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/. Accessed: 2023-03-23.
[36] Fabrice Bellard. 2023. TextSynth. https://textsynth.com/. Accessed: 2023-03-23.
[37] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, and Daxin Jiang. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv preprint arXiv:2002.08155 (2020). arXiv:2002.08155
[38] James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, and James Prather. 2022. The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. In Proceedings of the 24th Australasian Computing Education Conference (Virtual Event, Australia) (ACE '22). Association for Computing Machinery, New York, NY, USA, 10–19. https://doi.org/10.1145/3511861.3511863
[39] James Finnie-Ansley, Paul Denny, Andrew Luxton-Reilly, Eddie Antonio Santos, James Prather, and Brett A. Becker. 2023. My AI Wants to Know If This Will Be on the Exam: Testing OpenAI's Codex on CS2 Programming Exercises. In Proceedings of the 25th Australasian Computing Education Conference (Melbourne, VIC, Australia) (ACE '23). Association for Computing Machinery, New York, NY, USA, 97–104. https://doi.org/10.1145/3576123.3576134
[40] Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis. 2022. InCoder: A Generative Model for Code Infilling and Synthesis. arXiv preprint arXiv:2204.05999 (2022). arXiv:2204.05999
[41] Adrian Furnham and Hua Chu Boo. 2011. A Literature Review of the Anchoring Effect. The journal of socio-economics 40, 1 (2011), 35–42.
[42] Xinyang Geng, Arnav Gudibande, Hao Liu, Eric Wallace, Pieter Abbeel, Sergey Levine, and Dawn Song. 2023. Koala: A Dialogue Model for Academic Research. Blog post. https://bair.berkeley.edu/blog/2023/04/03/koala/
[43] Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. 2017. Program Synthesis. Foundations and Trends® in Programming Languages 4, 1-2 (2017), 1–119.
[44] Philip J. Guo. 2018. Non-Native English Speakers Learning Computer Programming: Barriers, Desires, and Design Opportunities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–14.
[45] Björn Hartmann, Loren Yu, Abel Allison, Yeonsoo Yang, and Scott R. Klemmer. 2008. Design as Exploration: Creating Interface Alternatives through Parallel Authoring and Runtime Tuning. In Proceedings of the 21st Annual ACM Symposium
on User Interface Software and Technology. 91–100.
[46] J. Hoffman. 2022. Speculative Futures: Design Approaches to Navigate Change, Foster Resilience, and Co-Create the Cities We Need. North Atlantic Books. https://books.google.com/books?id=rdd2EAAAQBAJ
[47] Krystal Hu. 2023. ChatGPT Sets Record for Fastest-Growing User Base - Analyst Note | Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/. Accessed: 2023-03-23.
[48] Aleata Hubbard. 2018. Pedagogical content knowledge in computing education: a review of the research literature. Computer Science Education 28, 2 (2018), 117–135. https://doi.org/10.1080/08993408.2018.1509580
[49] Dhanya Jayagopal, Justin Lubin, and Sarah E. Chasins. 2022. Exploring the Learnability of Program Synthesizers by Novice Programmers. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (Bend, OR, USA) (UIST '22). Association for Computing Machinery, New York, NY, USA, Article 64, 15 pages. https://doi.org/10.1145/3526113.3545659
[50] Arianna Johnson. 2023. ChatGPT In Schools: Here's Where It's Banned—And How It Could Potentially Help Students. https://www.forbes.com/sites/ariannajohnson/2023/01/18/chatgpt-in-schools-heres-where-its-banned-and-how-it-could-potentially-help-students/. Accessed: 2023-03-23.
[51] Mahmoud Kaddoura. 2013. Think Pair Share: A Teaching Learning Strategy to Enhance Students' Critical Thinking. Educational Research Quarterly 36, 4 (2013), 3–24.
[52] Eirini Kalliamvakou. 2022. Research: quantifying GitHub Copilot's impact on developer productivity and happiness. GitHub blog – https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/. Accessed: 2023-03-15.
[53] Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J. Ericson, David Weintrop, and Tovi Grossman. 2023. Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA.
[54] Amy J. Ko. 2023. Large Language Models Will Change Programming . . . a Lot. https://medium.com/bits-and-behavior/large-language-models-will-change-programming-a-lot-5cfe13afa46c. Accessed: 2023-03-23.
[55] Amy J. Ko. 2023. Large Language Models Will Change Programming . . . a Little. https://medium.com/bits-and-behavior/large-language-models-will-change-programming-a-little-81445778d957. Accessed: 2023-03-23.
[56] Sophia Krause-Levy, Adrian Salguero, Rachel S. Lim, Hayden McTavish, Jelena Trajkovic, Leo Porter, and William G. Griswold. 2023. Instructor Perspectives on Prerequisite Courses in Computing. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 277–283. https://doi.org/10.1145/3545945.3569787
[57] Sarah Kreps, R. Miles McCain, and Miles Brundage. 2022. All the News That's Fit to Fabricate: AI-generated Text as a Tool of Media Misinformation. Journal of experimental political science 9, 1 (2022), 104–117.
[58] Sam Lau, Ian Drosos, Julia M. Markel, and Philip J. Guo. 2020. The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry. In 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 1–11.
[59] Juho Leinonen, Arto Hellas, Sami Sarsa, Brent Reeves, Paul Denny, James Prather, and Brett A. Becker. 2023. Using Large Language Models to Enhance Programming Error Messages. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 563–569. https://doi.org/10.1145/3545945.3569770
[60] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
[61] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries. 2023. StarCoder: may the source be with you! arXiv:2305.06161 [cs.CL]
[62] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, and Agustin Dal Lago. 2022. Competition-Level Code Generation with AlphaCode. Science 378, 6624 (2022), 1092–1097.
[63] Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2021. Towards understanding and mitigating social biases in language models. In International Conference on Machine Learning. PMLR, 6565–6576.
[64] Michael Xieyang Liu, Advait Sarkar, Carina Negreanu, Benjamin Zorn, Jack Williams, Neil Toronto, and Andrew D. Gordon. 2023. "What It Wants Me To Say": Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 598, 31 pages. https://doi.org/10.1145/3544548.3580817
[65] Stephen MacNeil, Joanne Kim, Juho Leinonen, Paul Denny, Seth Bernstein, Brett A. Becker, Michel Wermelinger, Arto Hellas, Andrew Tran, Sami Sarsa, James Prather, and Viraj Kumar. 2023. The Implications of Large Language Models for CS Teachers and Students. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 2 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 1255. https://doi.org/10.1145/3545947.3573358
[66] Stephen MacNeil, Andrew Tran, Arto Hellas, Joanne Kim, Sami Sarsa, Paul Denny, Seth Bernstein, and Juho Leinonen. 2023. Experiences from Using Code Explanations Generated by Large Language Models in a Web Software Development E-Book. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 931–937. https://doi.org/10.1145/3545945.3569785
[67] Stephen MacNeil, Andrew Tran, Juho Leinonen, Paul Denny, Joanne Kim, Arto Hellas, Seth Bernstein, and Sami Sarsa. 2023. Automatically Generating CS Learning Materials with Large Language Models. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 2 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 1176. https://doi.org/10.1145/3545947.3569630
[68] Stephen MacNeil, Andrew Tran, Dan Mogil, Seth Bernstein, Erin Ross, and Ziheng Huang. 2022. Generating Diverse Code Explanations Using the GPT-3 Large Language Model. In Proceedings of the 2022 ACM Conference on International Computing Education Research - Volume 2 (Lugano and Virtual Event, Switzerland) (ICER '22). Association for Computing Machinery, New York, NY, USA, 37–39. https://doi.org/10.1145/3501709.3544280
[69] Thomas Mahatody, Mouldi Sagar, and Christophe Kolski. 2010. State of the Art on the Cognitive Walkthrough Method, Its Variants and Evolutions. Intl. Journal of Human–Computer Interaction 26, 8 (2010), 741–785.
[70] Petros Maniatis and Daniel Tarlow. 2023. Large sequence models for software development activities. Google Research Blog – https://ai.googleblog.com/2023/05/large-sequence-models-for-software.html. Accessed: 2023-06-04.
[71] Julia M. Markel and Philip J. Guo. 2021. Inside the Mind of a CS Undergraduate TA: A Firsthand Account of Undergraduate Peer Tutoring in Computer Labs. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education (Virtual Event, USA) (SIGCSE '21). Association for Computing Machinery, New York, NY, USA, 502–508. https://doi.org/10.1145/3408877.3432533
[72] Max Schaefer, Sarah Nadi, and Frank Tip. 2023. GitHub Next | TestPilot. https://next.github.com/projects/testpilot. Accessed: 2023-03-23.
[73] Andrew M. McNutt, Chenglong Wang, Robert A. Deline, and Steven M. Drucker. 2023. On the Design of AI-Powered Code Assistants for Notebooks. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 434, 16 pages. https://doi.org/10.1145/3544548.3580940
[74] Yusuf Mehdi. 2023. Reinventing Search with a New AI-powered Microsoft Bing and Edge, Your Copilot for the Web. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/. Accessed: 2023-03-23.
[75] Lance A. Miller. 1981. Natural Language Programming: Styles, Strategies, and Contrasts. IBM Systems Journal 20, 2 (1981), 184–215.
[76] Samim Mirhosseini, Austin Z. Henley, and Chris Parnin. 2023. What Is Your Biggest Pain Point? An Investigation of CS Instructor Obstacles, Workarounds, and Desires. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 291–297. https://doi.org/10.1145/3545945.3569816
[77] Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, and Richard Evans. 2023. Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 355, 34 pages. https://doi.org/10.1145/3544548.3581225
[78] Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, and Nachiappan Nagappan. 2023. CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring. arXiv:2305.12050 [cs.SE]
[79] Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, and Yingbo Zhou. 2023. CodeGen2: Lessons for Training LLMs on Programming and Natural Languages. arXiv:2305.02309 [cs.LG]
[80] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv:2203.13474 [cs.LG]
[81] OpenAI. 2022. Educator Considerations for ChatGPT. https://platform.openai.com. Accessed: 2023-03-23.
[82] OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt. Accessed: 2023-03-23.
[83] OpenAI. 2023. GPT-4 Technical Report. https://arxiv.org/abs/2303.08774v2.
[84] OpenAI. 2023. OpenAI Playground. https://platform.openai.com. Accessed: 2023-03-23.
[85] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training Language Models to Follow Instructions with Human Feedback. https://doi.org/10.48550/arXiv.2203.02155 arXiv:2203.02155
[86] Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2022. Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 754–768.
[87] Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv:2302.06590 [cs.SE]
[88] David N. Perkins and Fay Martin. 1986. Fragile Knowledge and Neglected Strategies in Novice Programmers. In Papers Presented at the First Workshop on Empirical Studies of Programmers on Empirical Studies of Programmers. 213–229.
[89] Billy Perrigo. 2023. Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer. https://time.com/6247678/openai-chatgpt-kenya-workers/. Accessed: 2023-03-23.
[90] Chris Perry and Shrestha Basu Mallick. 2023. AI-powered coding, free of charge with Colab: Google Colab will soon introduce AI coding features using Google's most advanced family of code models, Codey. https://blog.google/technology/developers/google-colab-ai-coding-features/. Accessed: 2023-05-20.
[91] Sundar Pichai. 2023. An Important next Step on Our AI Journey. https://blog.google/technology/ai/bard-google-ai-search-updates/. Accessed: 2023-03-23.
[92] James Prather, Brent N. Reeves, Paul Denny, Brett A. Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, and Eddie Antonio Santos. 2023. "It's Weird That it Knows What I Want": Usability and Interactions with Copilot for Novice Programmers. ACM Transactions on Computer-Human Interaction (TOCHI) (2023).
[93] Reed Albergotti. 2023. Startup Replit Launches a ChatGPT-like Bot for Coders | Semafor. https://www.semafor.com/article/02/15/2023/startup-replit-launches-a-chatgpt-like-bot-for-coders. Accessed: 2023-03-23.
[94] Johan Rosenkilde. 2023. How GitHub Copilot is getting better at understanding your code. https://github.blog/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code/. Accessed: 2023-05-20.
[95] Steven I. Ross, Fernando Martinez, Stephanie Houde, Michael Muller, and Justin D. Weisz. 2023. The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development (IUI '23). Association for Computing Machinery, New York, NY, USA.
[96] Advait Sarkar, Andrew D. Gordon, Carina Negreanu, Christian Poelitz, Sruti Srinivasa Ragavan, and Ben Zorn. 2022. What is it like to program with artificial intelligence? Proceedings of the 33rd Annual Conference of the Psychology of Programming Interest Group (PPIG 2022). arXiv:2208.06213 [cs.HC]
[97] Sami Sarsa, Paul Denny, Arto Hellas, and Juho Leinonen. 2022. Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. In Proceedings of the 2022 ACM Conference on International Computing Education Research - Volume 1 (Lugano and Virtual Event, Switzerland) (ICER '22). Association for Computing Machinery, New York, NY, USA, 27–43. https://doi.org/10.1145/3501385.3543957
[98] Otto Seppälä, Petri Ihantola, Essi Isohanni, Juha Sorva, and Arto Vihavainen. 2015. Do We Know How Difficult the Rainfall Problem Is?. In Proceedings of the 15th Koli Calling Conference on Computing Education Research (Koli, Finland) (Koli Calling '15). Association for Computing Machinery, New York, NY, USA, 87–96. https://doi.org/10.1145/2828959.2828963
[99] Gaelan Steele. 2022. ChatGPT passes the 2022 AP Computer Science A free response section. https://gist.github.com/Gaelan/cf5ae4a1e9d8d64cb0b732cf3a38e04a. Accessed: 2023-03-15.
[100] Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. arXiv preprint arXiv:1906.02243 (2019). arXiv:1906.02243
[101] Jiao Sun, Q. Vera Liao, Michael Muller, Mayank Agarwal, Stephanie Houde, Kartik Talamadupula, and Justin D. Weisz. 2022. Investigating Explainability of Generative AI for Code through Scenario-Based Design. In 27th International Conference on Intelligent User Interfaces (Helsinki, Finland) (IUI '22). Association for Computing Machinery, New York, NY, USA, 212–228. https://doi.org/10.1145/3490099.3511119
[102] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. 2023. Alpaca: A Strong, Replicable Instruction-Following Model. https://crfm.stanford.edu/2023/03/13/alpaca.html.
[103] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971 [cs.CL]
[104] Frank Vahid, Lizbeth Areizaga, and Ashley Pang. 2023. ChatGPT and Cheat Detection in CS1 Using a Program Autograding System. https://www.zybooks.com/research-items/chatgpt-and-cheat-detection-in-cs1-using-a-program-autograding-system/. Accessed: 2023-03-15.
[105] Priyan Vaithilingam, Elena L. Glassman, Peter Groenwegen, Sumit Gulwani, Austin Z. Henley, Rohan Malpani, David Pugh, Arjun Radhakrishna, Gustavo Soares, Joey Wang, and Aaron Yim. 2023. Towards More Effective AI-Assisted Programming: A Systematic Design Exploration to Improve Visual Studio IntelliCode's User Experience. In Proceedings of the IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '23). Association for Computing Machinery, New York, NY, USA.
[106] Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA '22). Association for Computing Machinery, New York, NY, USA, Article 332, 7 pages. https://doi.org/10.1145/3491101.3519665
[107] Sander Valstar, Sophia Krause-Levy, Alexandra Macedo, William G. Griswold, and Leo Porter. 2020. Faculty Views on the Goals of an Undergraduate CS Education and the Academia-Industry Gap. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (Portland, OR, USA) (SIGCSE '20). Association for Computing Machinery, New York, NY, USA, 577–583. https://doi.org/10.1145/3328778.3366834
[108] Sander Valstar, Caroline Sih, Sophia Krause-Levy, Leo Porter, and William G. Griswold. 2020. A Quantitative Study of Faculty Views on the Goals of an Undergraduate CS Program and Preparing Students for Industry. In Proceedings of the 2020 ACM Conference on International Computing Education Research (Virtual Event, New Zealand) (ICER '20). Association for Computing Machinery, New York, NY, USA, 113–123. https://doi.org/10.1145/3372782.3406277
[109] April Y. Wang, Ryan Mitts, Philip J. Guo, and Parmit K. Chilana. 2018. Mismatch of Expectations: How Modern Learning Resources Fail Conversational Programmers. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI '18). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3173574.3174085
[110] Dror Weiss. 2022. Announcing Our Next-generation AI Models. https://www.tabnine.com/blog/announcing-tabnine-next-generation/. Accessed: 2023-03-23.
[111] Michel Wermelinger. 2023. Using GitHub Copilot to Solve Simple Programming Problems. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (Toronto ON, Canada) (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 172–178. https://doi.org/10.1145/3545945.3569830
[112] Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt. 2023. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv:2302.11382 [cs.SE]
[113] Simon Willison. 2022. Writing Tests with Copilot. https://til.simonwillison.net/til/til/gpt3_writing-test-with-copilot.md. Accessed: 2023-03-23.
[114] Chloe Xiang. 2023. OpenAI Is Now Everything It Promised Not to Be: Corporate, Closed-Source, and For-Profit. https://www.vice.com/en/article/5d3naz/openai-is-now-everything-it-promised-not-to-be-corporate-closed-source-and-for-profit. Accessed: 2023-03-15.
[115] J.D. Zamfirescu-Pereira, Richmond Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3581388
[116] Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D Goodman, and Nick Haber. 2022. Parsel: A Unified Natural Language Framework for Algorithmic Reasoning. arXiv preprint arXiv:2212.10561 (2022).