
2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)

Building Collaborative Learning: Exploring Social Annotation in Introductory Programming

Francisco Gomes de Oliveira Neto
Chalmers and the University of Gothenburg
Dept. of Computer Science and Engineering
Gothenburg, Sweden
[email protected]

Felix Dobslaw
Mid Sweden University
Dept. of Quality Mngmt, Communication and Inf. Systems
Östersund, Sweden
[email protected]

ABSTRACT
The increasing demand for software engineering education presents learning challenges in courses due to the diverse range of topics that require practical applications, such as programming or software design, all of which are supported by group work and interaction. Social Annotation (SA) is an approach to teaching that can enhance collaborative learning among students. In SA, both students and teachers utilize platforms like Feedback Fruits, Perusall, and Diigo to collaboratively annotate and discuss course materials. This approach encourages students to share their thoughts and answers with their peers, fostering a more interactive learning environment. We share our experience of implementing social annotation via Perusall as a preparatory tool for lectures in an introductory programming course aimed at undergraduate students in Software Engineering. We report the impact of Perusall on the examination results of 112 students. Our results show that 81% of students engaged in meaningful social annotation successfully passed the course. Notably, the proportion of students passing the exam tends to rise as they complete more Perusall assignments. In contrast, only 56% of students who did not participate in Perusall discussions managed to pass the exam. We did not enforce mandatory Perusall participation in the course. Yet, the feedback from our course evaluation questionnaire reveals that most students ranked Perusall among their favorite components of the course and that their interest in the subject has increased.

CCS CONCEPTS
• Applied computing → Collaborative learning; Computer-assisted instruction; Interactive learning environments; • Social and professional topics → Computing education.

KEYWORDS
Social Annotation, Educational Technology, Computing Education

ACM Reference Format:
Francisco Gomes de Oliveira Neto and Felix Dobslaw. 2024. Building Collaborative Learning: Exploring Social Annotation in Introductory Programming. In 46th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET '24), April 14–20, 2024, Lisbon, Portugal. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3639474.3640063

This work is licensed under a Creative Commons Attribution International 4.0 License.
ICSE-SEET '24, April 14–20, 2024, Lisbon, Portugal
© 2024 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0498-7/24/04.
https://doi.org/10.1145/3639474.3640063

1 INTRODUCTION
Programming is one of the first topics taught in many engineering disciplines. Large classes of students typically have their first contact with programming when teachers explain basic programming constructs, such as statements, and show examples of the output produced by executing code [21, 29]. Most of that knowledge is not introduced to students before higher-level education. Moreover, most students are unfamiliar with the technological content knowledge associated with teaching and learning programming (e.g., development environments, installation of compilers or interpreters). Exercising those skills already in the first class can easily overwhelm students, particularly those without any prior knowledge of programming [29]. Besides the content itself, students must learn how to explain their algorithms to each other, i.e., explain to peers the steps that they followed to solve a specific problem [25, 28]. This is particularly challenging when students need to collaborate towards a solution in, e.g., a project course.

Teaching approaches focused on flipping the classroom or performing active learning have shown effective results in improving the exam results of students [3, 5]. Particularly, preparing for lectures by reading material or watching videos has been one of the main tools used to allow teachers and students to focus their time together on solving problems and discussing different solutions to the problems. However, those studies did not investigate the collaborative dimension of students working and learning together.

Social Annotation (SA) is a pedagogical approach that fosters collaborative learning among students, enabling them to jointly engage with course materials, discuss concepts, solve problems, and compare their annotations with peers [1, 18]. Recent research highlights the effectiveness of collaborative learning in computer science and programming education, with students showing increased engagement and improved learning outcomes [12, 24]. SA has garnered overwhelmingly positive student responses, underscoring its potential for enhancing the educational experience [4].

Social Annotation requires an online platform facilitating communication and knowledge sharing among students as they interact with course resources, such as textbooks, exercises, and video lectures. Feedback Fruits, Diigo, and Perusall are a few examples of such tools. For instance, in Perusall¹, instructors share course materials, allowing students to asynchronously and collaboratively generate annotations by highlighting specific sections within the


material — whether those annotations target timestamps in videos, sentences or paragraphs in text, or sections of a web page. These annotations serve as a medium for students to write their understanding, identify challenging concepts, and seek clarification through questions or comments.

¹ https://www.perusall.com/

This interactive annotation process encourages students to articulate their comprehension of concepts and pinpoint difficulties. These annotations trigger discussions among students, as other students provide their own explanations, hence fostering collaborative learning dynamics within the environment. Teachers can use such discussions to discover why certain ideas remain unclear to students and cover them in discussions with the entire class.

Our goal is to investigate the impact of social annotations in Perusall when teaching programming to first-year students in the Software Engineering and Management bachelor program at the University of Gothenburg (Sweden). Particularly, we aim to verify whether students sharing their understanding of the course material with their peers affects their performance in the course exam. We compare the results of 112 students who used Perusall to prepare for each lecture by completing reading assignments. Particularly, our report targets the following research questions:

RQ1: Do students of a programming course engage in non-compulsory social annotation activities?
Yes. Most students (on average 78% of 112 students) engaged in social annotations throughout all 18 course lectures. However, only roughly 20% of those students created meaningful comments that showed their understanding of the topic.

RQ2: Does social annotation engagement have an impact on the students' grades and passing rates?
Yes. Students who engaged in social annotation by creating more meaningful comments had, proportionally, better grades and passing rates than those who were less engaged in social annotations.

Our results reveal that students who create meaningful comments in Perusall and engage in discussion with their peers have better grades in the course. Failure rates decrease for those students who use Perusall more in the course. Moreover, the course feedback questionnaire reveals a positive response from students using Perusall in the course, which aligns with existing findings in the literature [12, 23]. On the other hand, the course feedback indicates that many of the students were not motivated to engage in Perusall discussions.

This paper is structured as follows. Section 2 presents related work regarding teaching programming, social annotation, and previous studies with Perusall. Section 3 provides context to our investigation by sharing course details, such as student population and course structure.² We detail the data collection and analysis in Section 4. We present results and findings from our research questions in Section 5, followed by a discussion involving student feedback, lessons learned and the limitations of our report (Section 6). Lastly, we conclude and outline future work in Section 7.

² Some details were omitted due to the double-blind review process. Some of the course artefacts (feedback form, example of the exam, course material) can be shared after the reviewing process.

2 RELATED WORK
Learning programming goes beyond the skill of writing code according to a syntax (theory); it also requires students to trace the execution by predicting outputs and state changes [16, 21], as well as explaining the code to other programmers [28]. Those skills are connected but distinct from one another, such that instruction models aim to refine them. For instance, the Theory of Instruction model proposed by Xie et al. [29] highlights four programming skills based on reading and writing code using knowledge at a machine level (semantics) or at task level (templates). Those four skills are progressively obtained by the student by reading semantics, writing semantics, reading templates, and writing templates.

Our goal is not to propose or evaluate such models; rather, we investigate the prospects of social annotations that can later be used in combination with such models to bring forward the aspect of metacognitive thinking. In Perusall, students are encouraged to share their cognitive processes of understanding when posing questions or providing answers within shared course materials [20], and prompting students to engage in metacognitive thinking can enhance their abilities in reading and writing code [14, 15].

Perusall, or social annotation, has not been widely investigated in the context of teaching and learning programming yet. Meyer and Müller found significant challenges in implementing Perusall in a Data Structures and Algorithms course [19]. The primary difficulty was in maintaining student motivation to annotate materials and participate in online discussions. Similarly, we observed that many of our students failed to produce annotations that effectively demonstrated their understanding of the subject matter. These experiences underscore the necessity of promoting a shift towards a culture of continuous learning.

Other subject areas have shown promising results in using SA. In a comparative study, Suhre et al. identified a positive correlation between active participation and examination results with Perusall in eight different courses in social sciences [27]. They further found that engagement can be fostered through the tool if certain criteria are respected, including stimulating texts and assignment formulations, appropriate group sizes, providing good annotation examples up front, as well as timely feedback from instructors. Those findings align with other studies focused on social annotation with other platforms, where researchers see improvements in learning engagement [8], attention [7], peer communication, and sense of community [10].

The papers above provide insights into diverse methods for evaluating student performance and learning. In this experience report, we establish correlations between Perusall activity and students' exam performance to discern distinctions among groups of students actively engaged in social annotation. Although exam scores offer a limited perspective on student performance [6], they have been employed in prior research that investigates Perusall and social annotation, enabling the exploration of student engagement and learning [7, 13, 20]. This approach allows us to draw parallels with our findings. In future studies, we intend to explore additional dimensions of student performance, including their sense of belonging and learning progression throughout the course.


Figure 1: Example of student interaction in Perusall. Three students comment on the highlighted (pink) annotation in the
course material about the lecture on Functions in Java. Students help each other understand the difference between reusable
code for simple tasks (functions) and structural abstractions (classes).
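To make the distinction the students are discussing concrete, the sketch below contrasts a reusable function for a simple task with a class that bundles state and behaviour. It is an illustrative example written for this text, not an excerpt from the Perusall course material.

    // Illustrative example (not from the course material): a function vs. a class.
    public class Circles {

        // A function (static method): reusable code for a simple, stateless task.
        static double circleArea(double radius) {
            return Math.PI * radius * radius;
        }

        // A class is a structural abstraction: it bundles state (the radius)
        // with the behaviour that operates on that state.
        static class Circle {
            private final double radius;

            Circle(double radius) {
                this.radius = radius;
            }

            double area() {
                return circleArea(radius); // the object reuses the plain function
            }
        }

        public static void main(String[] args) {
            System.out.println(circleArea(2.0));        // call the function directly
            System.out.println(new Circle(2.0).area()); // or go through an object
        }
    }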

3 CASE COURSE: CONTEXT AND SCOPE
We investigate the impact of Perusall in a course taught in the first study period of an international bachelor program in Software Engineering and Management at the University of Gothenburg (Sweden). The course is on Object-oriented Programming (OOP) and covers the following learning outcomes: (i) basics in procedural programming (e.g., printing, conditionals, loops, arrays and functions), and (ii) core concepts of OOP (e.g., classes, objects, encapsulation, polymorphism). The programming language taught in the course is Java.

Course structure: The course instance took place during 10 weeks in 2022 and had 143 registered students. Students were expected to dedicate 20 hours per week to the course, which would include time in lectures, laboratory sessions (focused on practical exercises), and self-studies at home. Students were offered three 2-hour lecture sessions and three 2-hour lab sessions a week — all of which non-compulsory. For course completion, the students must submit: (i) three programming assignments done in groups of up to three; and (ii) a final individual written hall exam where students score between 0–100 points. The course was taught on campus by one course responsible and eleven teaching assistants.

Student background: No entry requirements in programming or computer science applied, as this was the students' first programming course in the program. Nonetheless, students may or may not have had previous programming knowledge (e.g., from their high-school education), leading to a heterogeneous sample of student backgrounds. Students took one other course in parallel, in discrete mathematics, with the same expected workload.

Perusall and social annotation: For each lecture, students were instructed to prepare by reading the material in the Perusall platform in the form of reading assignments. Students complete these reading assignments by creating meaningful annotations, reading the material until the end, and engaging with the material (e.g., scrolling, highlighting text, etc.). The reading assignments are the main component of Perusall that fosters collaborative interaction between students before the lecture. Therefore, the completion of reading assignments will be our main metric to measure the social annotation element in the course.

To motivate students to complete reading assignments, the course instructor offered bonus points to students. A maximum of eight bonus points towards the exam were given to students creating meaningful annotations or engaging in discussion in the Perusall material for the eighteen lectures.³ For each lecture, a 0.5 bonus point could be obtained. Thus, the maximum amount of points was achieved by completing any 16 of the 18 Perusall tasks.

³ Students were also informed that bonus points could not be used to cross the passing threshold of 50 points, i.e., a student could not pass the course through bonus points.

We used Perusall's definition of meaningful annotations and shared examples with the students at course start.⁴ In short, meaningful annotations are comments or questions that showcase the student's comprehension of the concepts discussed in the material. We exemplify a meaningful student exchange in Perusall through Figure 1, where three students discuss what a Function is. All three students received the bonus for that lecture as they made multiple similar comments throughout that lecture's material.

⁴ https://support.perusall.com/hc/en-us/articles/360034824694-How-is-annotation-quality-defined-in-Perusall-

To cope with the large number of students, we used Perusall's algorithm that automatically assesses the quality of students' annotations based on the content quality, the number of students' replies, length of text, among other features extracted from the annotation [11]. The course instructor chose the holistic scoring strategy defined in Perusall, which considers the annotations created and whether the student has read the entire material before the lecture.⁵ The course instructor also determined that students needed to create at least two meaningful annotations to complete


the reading assignment to foster discussion threads and communication between groups of students. The practical exercises and student-teacher interactions happened mainly during lectures. In case a student disagrees with Perusall's automatic score, the course instructor can revise and override Perusall's decision. Fine-tuning Perusall's accuracy is beyond the scope of our study; therefore, we acknowledge and discuss some limitations associated with its automated grading system in our threats to validity.

⁵ https://www.perusall.com/hubfs/downloads/scoring-details.pdf

Lecture format: All lectures were hosted on campus. An average of 80 students showed up to class (55% attendance rate). Each two-hour lecture was divided into two parts with a 10-min break in between. Part one was a Mentimeter⁶ session with multiple-choice questions regarding the Perusall material. We chose Mentimeter for its simplicity and to allow for anonymity. Students were told that the quiz participation had no influence on the grade and was optional. For each question, the answer statistics were presented live to the students, and the teacher initiated a discussion about the students' reasoning, particularly when the answers were not converging to the correct option. The second part of the lecture focused on applying the lecture topics with the help of one or two coding exercises solved together with the class.

⁶ https://www.mentimeter.com/

Written hall exam: Students did a four-hour written exam with various questions focusing on tracing code, writing small classes or functions, and explaining the application and trade-offs of topics covered in the course. Students received between 0–100 points based on the quality of their answers. We applied the four-level grading scale below. The exams were anonymised by the examination office and graded by the course responsible.
• Fail (U): Assigned to students that scored less than 50 points in the exam.
• Pass (3): Given to students that scored between 50 and 69 points.
• Pass with merit (4): Given to students that scored between 70 and 84 points.
• Pass with distinction (5): This is the highest grade in the scale and is given to students that received 85 points or more.
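As a small illustration in the course's own language, the snippet below maps exam points to this grading scale using the kind of basic conditionals taught in the first weeks. It is our own illustrative sketch, not part of the course or examination material.

    public class GradeScale {
        // Illustrative sketch (not from the course material): map 0–100 exam
        // points to the four-level grading scale described above.
        static String grade(int points) {
            if (points >= 85) {
                return "5"; // Pass with distinction
            } else if (points >= 70) {
                return "4"; // Pass with merit
            } else if (points >= 50) {
                return "3"; // Pass
            } else {
                return "U"; // Fail
            }
        }

        public static void main(String[] args) {
            System.out.println(grade(49)); // U
            System.out.println(grade(72)); // 4
        }
    }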
Course evaluation: At the course's outset, five students volunteered to become student representatives, who are the contact point for all students when the student collective wants to offer feedback regarding the teaching and learning throughout the course. Nonetheless, all students have direct channels to communicate with the course instructor. In the last week of lectures, all students receive a questionnaire following the SEEQ feedback template [17]. The questionnaire is closed before the written exam to reduce the risk of bias introduced by the examination experience. The course responsible and student representatives meet on two occasions: the first time halfway into the course to reflect on the course status and the teaching methods for possible intervention; the second meeting was a retrospective with the presence of the program manager and study administrators where the SEEQ questionnaire results were discussed. The information collected from those instruments helped the instructor to understand: (i) some of the main obstacles in the usage of Perusall, (ii) the frequency and level of satisfaction from students engaged in peer instruction, (iii) students' reactions to the teaching methods, and (iv) the self-reported impact on their learning.

Prior Knowledge in Programming: One of the main challenges in teaching first-year programming courses is the variance among students regarding their prior knowledge of programming. On one hand, having prior knowledge can lower students' motivation to engage in discussion about topics they are already familiar with. On the other hand, those students can also share their experiences with novice students to help them learn. Typically, programming is not taught in primary or secondary school, even though that reality might change, given the benefits of introducing students earlier to programming [26, 29]. The instructor estimated the prior knowledge of students by sending them an anonymous questionnaire with various programming-related questions before the first lecture (each question had the option "I do not know/I cannot answer yet"). From the sample of 115 respondents, 34% of students could not answer what a String is, and 67% of students did not know what an if-statement is. Both topics are basic programming constructs taught in the first week of the course. Therefore, we argue that prior programming knowledge is not a prevalent factor influencing our analysis in this instance of the course.

4 RESEARCH METHODOLOGY
To prevent an unfair teaching environment and the risk of favoring or disadvantaging a particular group of students, we chose not to conduct a controlled experiment. Instead, we gave students the choice by making social annotation an optional part of the course. Therefore, to answer our research questions, we used the individual results of the Perusall reading assignments together with the students' exam results. The feedback from the course evaluation questionnaire is used to discuss the qualitative aspects of the students' feedback about using social annotation. We refer to Perusall activity as the outcome of the reading assignments made by each student. Each lecture had a corresponding reading assignment. There were three outcomes for each reading assignment:
• Skipped: The student did not create a single annotation for that reading assignment, or they did not even read the material in Perusall.
• Incomplete: The student created at least one annotation in the material, but the content was not assessed as meaningful by Perusall's algorithm, i.e., the annotation did not convey the student's understanding of the subject covered.
• Completed: The student made at least two comments or annotations that were classified as meaningful according to Perusall's algorithms. These comments can be questions they asked, answers provided to other students or comments in discussion threads.

We compared those different types of activities in relation to the students' exam results (both points and grade). We analysed the exam results without adding the bonus points from Perusall since this would otherwise introduce a bias towards passing students. We analysed the results of 112 students considering the intersection between those registering for Perusall during the course instance


and taking the written exam.⁷ We anonymised the data set by removing all identity information from the records used throughout our analysis.

⁷ Students who did not complete previous course instances also take the exam. Similarly, a subset of students participated in the course but decided to skip the written exam.

For simplicity, the plots discussed in our results include IDs for each lecture. Table 1 maps each lecture ID to the corresponding subject covered by the reading material. For the remainder of the paper, we use the term assignments to refer to the reading assignments in Perusall. Throughout our discussions, we consider that a student who completed a reading assignment conveyed their understanding of the topic to other students, which is one of the main goals of social annotation.

Table 1: List of topics covered in the course.
ID   Topic of the lecture
L01  Variables, Types and Expressions
L02  User Input
L03  Conditionals
L04  Loops
L05  Arrays
L06  Basics in OOP
L07  Reference variables
L08  Encapsulation and immutable objects
L09  Collections - Lists, Sets and Maps
L11  Composition and Aggregation
L12  Inheritance
L13  Polymorphism
L14  Abstract Classes
L15  Exceptions and Error Handling
L16  Interfaces in Java
L17  OOP Design Principles: SOLID
L18  Files

4.1 Scientific Ethics and Data Availability
The University of Gothenburg is a State University under Sweden's principle of publicity (in Swedish, offentlighetsprincipen), which ensures transparency with the population.⁸ Therefore, the public can request all exams (digital or printed) via a transparency office at the University. Students are made aware of such principles when admitted to the University. Nonetheless, we anonymise the data shared in this paper. To comply with scientific ethical guidelines, we also asked students and teachers for consent to use their course data (e.g., annotations, Perusall login data, exam results) during the first week of the course. We clarified that students could opt out of the study at any moment.

⁸ https://medarbetarportalen.gu.se/service-support/for-arbetsgivare/8.universitetet-ar-en-statlig-myndighet/, available in Swedish.

We share the files and scripts relevant to this experience report in our analysis package in Zenodo [2].⁹ The CSV files include the exam points, grades and the classification of Perusall's annotations per student and lecture. We also share the report of the course evaluation.

⁹ https://doi.org/10.5281/zenodo.10483184

5 RESULTS
Table 2 presents the results of the exam without any association to the Perusall activity. This allowed us to understand how the class performed. Based on the exam grade distribution, 45% of the students failed the written exam. The one course parallel to this one showed a smaller yet similar failing rate (roughly 30%). Below, we analyse our research questions by relating those percentages to the Perusall activity of students.

Table 2: Distribution of exam grades. Students who failed the exam received a grade of U. Passing students received grades 3, 4 or 5 (highest grade).
Grades:                  U      3      4      5
Number of students:      51     37     17     7
Percentage of students:  45.5%  33.0%  15.2%  6.2%

5.1 RQ1: Level of Engagement in Social Annotation
Figure 2 contains an overview of students' annotation patterns per reading assignment. More than 50% of students participated in social annotation (Incomplete + Completed), with variations depending on the topic. An average of 78% attempted or completed the interactions with the material. The lecture on loops (L03) had the least social annotation (44% of the students skipped it), whereas L17 (OOP Design) had the most engagement (only 13% skipped it). While the majority of the engaged students did not complete the reading assignments, students used the material throughout the course with no consistent signs of decline.

Figure 2: The overall distribution of social annotations with non-compulsory peer instruction. The dashed line intercepts the y-axis at half of the number of students (n = 56 students).

On the other hand, there was a decline in the proportion of students completing the reading assignments, hence indicating that fewer were making meaningful annotations/comments in Perusall. The drop was higher after L03 (loops), from 44% to 21%, which is then sustained in different topics. The completion rate stayed consistently below 20% after L08 (Encapsulation) and the other core


OOP concepts. Note that the drop in completion rate was not accompanied by a drop in reading assignment engagement, as the proportion of incomplete assignments varies roughly between 40–60% throughout all lectures. Below, we illustrate two contrasting annotations from different students about overriding methods in Lecture 12 (Inheritance).

"not very clear what does it mean functionality?". (Classified as Incomplete.)

"Why is [overriding] risky? I get that it could become a problem if a subclass needs to override a method, but doesn't. Is there any risk if everything works even if the subclass does not override the method and the superclass method is executed instead?". (Classified as Completed.)

The first comment was classified as incomplete because the student is not clear whether they mean the functionality of a shown piece of code or how Inheritance and overridden methods work. In contrast, the student that completed the assignment conveys their current understanding of overriding risks ("I get that it could...") and their struggle to realise different risky scenarios ("... even if the subclass does not override...").
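For readers less familiar with Java inheritance, the sketch below illustrates the scenario the second student asks about: one subclass that overrides a superclass method, and one that silently falls back to the superclass implementation. It is an illustrative example written for this report, not code from the course material.

    // Illustrative sketch (not from the course material) of the overriding
    // scenario discussed in Lecture 12 (Inheritance).
    class Shape {
        // A general default that subclasses are expected to override.
        double area() {
            return 0.0;
        }
    }

    class Square extends Shape {
        double side = 2.0;

        // Overriding replaces the superclass behaviour for Square objects.
        @Override
        double area() {
            return side * side;
        }
    }

    class Triangle extends Shape {
        double base = 3.0, height = 4.0;
        // No override: calling area() silently falls back to Shape.area(),
        // which compiles and runs but returns a misleading result (0.0).
    }

    public class OverridingDemo {
        public static void main(String[] args) {
            System.out.println(new Square().area());   // 4.0
            System.out.println(new Triangle().area()); // 0.0 — the "risky" case
        }
    }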
The lower percentage of completed assignments can be attributed to a variety of reasons; for instance, the learning curve to create meaningful annotations, lack of time to dedicate to social annotations, or a lack of accuracy in Perusall's automated algorithm to detect meaningful comments related to programming. Determining an accurate decline in the quality of annotations requires a more extensive and manual qualitative analysis of the text written by students, which is outside the scope of our experience report. Therefore, we summarise our RQ1 findings below.

RQ1: More than 50% of students took part in the non-compulsory social annotation activities in Perusall. However, the percentage of students making meaningful comments or annotations decreased over time.

5.2 RQ2: Impact of Social Annotations on Grades and Passing Rates
We measure social annotation based on the number of completed reading assignments. The reason behind our choice is that Perusall's algorithm mainly grades students based on the quality of their annotations, which is one of the affecting factors in social annotation [7, 13]. Therefore, we assumed that students completing assignments have provided more insights to help their peers learn about the subject. Figure 3 shows the correlation between the points obtained in the exam and the number of completed reading assignments. For students with none/little annotations, i.e., few completed assignments, no grade impact could be observed. As we increase the number of completed assignments, note that the number of students who failed started to decrease. Nonetheless, there were still many students who passed the exam without completing many reading assignments.

Figure 3: Exam points in correlation to the number of completed Perusall assignments. The dashed vertical line denotes the passing threshold (50 points).

To verify whether the number of completed assignments affected the exam results, we divided our sample into two groups and compared the proportion of students for each grade. We chose the median number of completed assignments for all passing students to divide the groups. Our reasons were two-fold: (i) the median conveys the expected number of completed assignments to pass the exam; (ii) the median divides the sample into two student groups of roughly equal size, such that a similar proportion of students in all grades indicates that students would pass/fail the exam independently of their Perusall interactions. The median number of reading assignments completed for the passing students was 2, resulting in the two student cohorts in Figure 4.

Figure 4: Percentage of students per each grade based on the expected number of completed assignments from passing students (more than 2 assignments). Students who completed above the median have better grades compared to those who do not.


Of the 62 students who completed less than two reading assignments, 64.5% (40) failed the exam. This proportion was almost three times higher than the proportion of failing students who completed at least two assignments (22%). Moreover, the proportion of students in all passing grades is significantly higher in the group of students that completed the expected number of assignments to pass, particularly for the better grades 4 (6 vs. 11) and 5 (1 vs. 6).

RQ2.1: We observed a distinct grade distribution among students who actively participated in social annotation by completing the required number of assignments for passing. Notably, we identified a positive correlation between engagement and the distribution of exam scores and final grades. As students increased their involvement in social annotations within Perusall, we noted a decrease in the number of exam failures. Furthermore, a significant trend emerged, revealing that the majority of students who completed fewer than two assignments ended up failing the exam, while those who completed at least two assignments exhibited proportionally better performance in their exam grades.

Figure 5 contrasts the proportion of passing and failing students based on their corresponding number of completed assignments. We see that the largest proportion of failing students are those who did not complete a single assignment (x = 0), which aligns with our observations above. Moreover, when completing more than 4 assignments, the cumulative number of students passing the exam (25) is much higher than the number of students failing (5). Focusing on the middle range of completed assignments (4–9), few students fail and many more pass in that range. For the students highly engaged in social annotation (above 9 lectures), the correlation is even more apparent as only 2 (out of 19) students failed the exam.

RQ2.2: The majority of students who failed did not complete any reading assignments. After completing more than 4 assignments, the proportion of students passing the exam (22.5%) is much higher than those that failed (4.5%).

6 DISCUSSIONS AND LESSONS LEARNED
Here, we complement our quantitative analysis above with a qualitative analysis of the course feedback provided by students. The course evaluation questionnaire reveals some qualitative aspects of the usage and the response of students to using Perusall as a tool. We also cover connections between Perusall and the students' learning and summarise our findings. We end the section with a summary of lessons learned and the limitations of our observations.

Course evaluations at our University are anonymous and use the Student Evaluation of Educational Quality (SEEQ) template [17]. We extended the questionnaire with a few questions focusing on the social annotation aspect of the course. A subset of questions relevant to our discussion and their corresponding answers is presented in Table 3. The response rate was 17% (25 out of 142 students), which is low. One of the reasons for the low response rate is that the questionnaire was only available to students in the last two weeks of the course. Moreover, students reported that they received few reminders to complete the course evaluation.

Most students (64%) would often or always read the material available in Perusall, but only three students stated that they create annotations in Perusall at the same frequency. Below, we share the statement from a student reporting that the need to annotate the material added a distraction to their studies, despite seeing the benefits when reading discussions from other students. Perusall has options to hide comments and annotations, but students did not receive a walk-through or demonstration of Perusall's features at the start of the course.

"Personally, I enjoyed being able to ask questions directly in Perusall and receiving answers. However, my personal learning style implies highlighting key concepts I find important, and sometimes in Perusall there were full paragraphs highlighted with a question, and it distracted me from the material". (Student)

After L12 (Inheritance), student representatives in the course asked the teacher to enable anonymous annotations, which is an option in Perusall. The anonymity allows teachers to see the identity of students creating or replying to comments, but students do not see each other's identity. We see a slight increase in activity from students after that, but this is still lower than some assignments before the anonymity was enabled.

The course evaluation questionnaire also includes two questions about the different teaching practices used in the course: (Q9) "What are the three things you liked the least about the course?" and (Q10) "What are the three things you liked the most about the course?". Only one student listed Perusall and social annotation as one of the things they liked the least in the course. Particularly, the student was unsatisfied with the amount of time spent during the lecture quizzes (which typically cover the content from Perusall). Also, this student is more interested in a more traditional format of lectures where explanations are delivered predominantly by the lecturer rather than by their colleagues. Related work also reports that students struggle to adopt social annotations and move towards continuous learning [19].

"The discussions about the questions on Perusall. I think they were unnecessarily long and took away precious time from the lectures. I personally was interested in listening to the explanations from my lecturer, not from my peer that may be as lost as me." (Student)

In contrast, a clear majority of students responded positively to the usage of social annotation and practical exercises during the lectures. From 18 text responses, 8 students (40%) explicitly mentioned Perusall as one of the things they liked the most in the course, and 11 students (61%) mentioned that the lecture quizzes and discussions helped them understand programming better. Particularly, students emphasised the scope, size and quality of the reading material created by the instructor for this course.

"The provided material on Perusall was on a nice level, and it was never unclear what to read before each lecture or where we were in the course. And lastly, the quizzes were a good measure of what had been understood or needed more practice, and also led to some nice discussions." (Student)


Figure 5: Course passing statistics per lecture. Completing many of the assignments (x-axis) has a large impact on passing the course, while little activity results in a high risk of failing. The "negative" y-axis is used simply to emphasise the difference between students who passed and those who failed.

Table 3: The responses for a subset of questions from the course evaluation questionnaire. 25 students answered the questionnaire using a Likert scale with 5 levels detailed below. For each question, the median answer is highlighted in bold and blue.

ID  Question Description                                                               1   2   3   4   5
(Scale: 1: Never, 2: Rarely, 3: Sometimes, 4: Often, 5: Always)
Q1  How often did you read the material before the lecture (Perusall or offline)?      1   2   6   6   10
Q2  How often did you create or respond to annotations in Perusall?                    8   9   5   1   2
Q3  How often did you attend the lectures?                                             1   0   0   2   22
(Scale: 1: Strongly disagree, 2: Disagree, 3: Neutral, 4: Agree, 5: Strongly agree)
Q4  Reading the material before the lectures helped me better understand the lessons.  1   2   6   7   9
Q5  The lecture quizzes made me better understand the concepts being taught.           2   2   4   10  6
Q6  Students are encouraged to ask questions and are given meaningful answers.         0   0   1   9   15
Q7  I have learned something that I consider valuable.                                 0   1   1   8   15
Q8  My interest in the subject has increased as a consequence of this course.          1   1   2   9   12

Most of our analysis focuses on the correlation between Perusall annotations and exam points, such that we cannot use those measures to confidently infer causation between social annotation and students' learning. When analyzing the correlation between exam scores and students' learning, it is important to consider potential confounding factors. For instance, students who engage in social annotation might already be highly motivated and diligent, inherently contributing to their higher exam scores. Additionally, students vary in study habits or access to additional educational resources outside the course, which might influence exam scores independently of the course's teaching methods. Such an analysis would require other instruments, such as a thematic analysis of students' annotations throughout the course, as well as exercises or assessments that can show more of a progression throughout the course. We aim to perform those analyses in future work.

For the scope of this paper, we evaluate the learning based on the self-reported satisfaction from the course evaluation. The results suggest that social annotation correlates with the students' learning satisfaction. Note that 23 students (92%) agree or strongly agree


that they have learned something valuable in the course (Q7) and that, similarly, their interest (84%) in programming increased as a consequence of the course (Q8). Therefore, we summarise our findings and lessons learned in the points below:
• Most students who engaged in social annotation passed the exam and, proportionally, showed higher grades. This is also reported in other areas such as physics [20] or multimedia applications [7].
• Completing more reading assignments in Perusall is correlated with higher passing rates.
• Most students listed Perusall, the lecture quizzes and class discussions among the three things they liked the most in the course.
• More than 90% of the course evaluation respondents agree that they learned something that they consider valuable, and their interest in programming has increased after the course.
• Social annotation can leverage flipped classroom approaches. Many students read the material before the lecture and were motivated to engage in active learning during classes (e.g., Mentimeter quizzes and discussions).

Based on the experience reported in this paper, we make the following recommendations to instructors interested in introducing social annotation and Perusall to their programming courses:
• Consider material length and scope: Given that students in the analyzed course had approximately 24 hours between lectures to prepare and annotate the associated reading materials, it is crucial for these materials to be concise. In this particular course, the average content for each lecture encompassed approximately 10 pages, comprising text and Java code examples.
• Investigate incentives for social annotation: More than half of the students consistently engaged with Perusall throughout the course. However, the percentage of completed reading assignments dropped over time. Increasing the number of bonus points, or making social annotation compulsory, can encourage engagement.
• Demonstrate the social annotation platform early: Students are not familiar with social annotation platforms in education, which can create initial barriers to engagement and add friction to creating annotations in the first weeks of the course. Additionally, they may not be used to articulating their cognitive processes while writing their notes. To mitigate this, demonstrating how a tool like Perusall works, along with illustrative examples of both meaningful and not-so-meaningful annotations, can reduce the learning curve associated with social annotation as a practice.
• Enable anonymous annotations: Initially, students may hesitate to openly express their uncertainties or questions to their peers. We observed a small increase in the number of annotations following the introduction of anonymous annotation, albeit introduced later in the course. It is worth considering that implementing anonymous annotations from the outset could have fostered greater engagement early on.

6.1 Limitations
There are some limitations in our analysis. One of the main construct validity threats is focusing on one instrument to indicate learning performance in the course. Written exams have limitations in conveying the learning of students due to various factors, such as anxiety due to time constraints or cultural biases [6, 22]. On the other hand, exam results or grades provide a consistent way to compare trends across many instances of courses and have been used to evaluate teaching in software engineering education in the literature [5]. Using exam points also allowed us to compare our findings with other results from the literature [7, 13, 20]. We mitigate the limitation in using exam scores by focusing our conclusions on the correlations between points, grades and Perusall activity without inferring a direct causation to learning.

Even though the instructor can override Perusall's automated grading, there are also risks with students receiving bonus points without making substantial comments (false positives). We did not make adjustments for those cases to avoid reducing the grade of the student. In our findings, most students did not earn bonus points as their comments were deemed insubstantial by Perusall. Fewer than five students requested a score review, and of those, just two had their scores modified. Nonetheless, we plan to explore Perusall's accuracy further in future research.

Moreover, course evaluation feedback is also subject to various factors, such as the student population (e.g., student bias), the impact of exam results, and the phrasing of the questions. We mitigate these factors by (i) using the standardised SEEQ template [17] and (ii) collecting course feedback before the written exam is performed. Moreover, course evaluations can be a useful tool to gather information about the students' experiences and can offer insights on how to improve course quality [9].

In our analysis, the heterogeneity of students' background knowledge in programming is a key internal validity threat. To assess this, we conducted an entry questionnaire at the course's outset, revealing that 67% of students were unfamiliar with basic concepts like "conditionals". This suggests a relatively uniform knowledge level among participants. However, our results are not broadly generalizable due to the specific context of our study. Despite this, our large student sample and the range of topics align with those in many university programming courses. Future iterations of this course will incorporate student feedback to refine these teaching activities.

7 CONCLUSIONS AND FUTURE WORK
We report on our experience in introducing optional social annotation in a programming course via reading assignments using Perusall. We analyse whether social annotation has an impact on the students' grades and satisfaction. Our findings suggest that many (in our case, the majority of) students actively engage with the optional material and that a significant correlation to passing grades can be observed. However, only a subset of the annotations


done by students are classified as meaningful. We also observe that many students interact with the material but leave assignments incomplete. Therefore, teachers aiming to use social annotation should account for the additional study time and effort required by students between lectures. Moreover, students might require guidance on how to use social annotation platforms and examples of how to create meaningful annotations.

Students were also positive about the use of Perusall and, particularly, its combination with the quizzes and practical sessions during lectures. Students report that they felt that they learned something valuable in the course and that their interest in programming has increased. We argue that this latter aspect is particularly important, as an increased interest in programming can help students keep their motivation and engagement as they progress through different courses in the program. Although not the main focus of this study, the significant reduction in grading demands through the use of an AI grading tool indicates potential for scalability. With 140 students enrolled in the course and using Perusall, manually assessing each student's contributions would be impractical, even with numerous teaching assistants. We believe that teachers' time is better spent engaging in Perusall discussions, offering comments, and using these interactions to inform the preparation of exercises and practical sessions.

Future studies aim to compare our results to previous and future instances of the course. Another important question is the relationship between social annotation and the other practical components of the course, such as the programming assignments done in groups or the lab sessions where students interact with teaching assistants. Lastly, we aim to compare our results to other topics in software engineering that require critical thinking and assessment of other complex constructs, such as software architectural design patterns, test specifications, or planning software development sessions. Most of these activities are typically carried out in teams and require consensus among participants, hence there might be an inherent component of collaborative learning in those activities that can be enhanced with tools such as Perusall.

REFERENCES
[1] Catherine H. Crouch and Eric Mazur. 2001. Peer instruction: Ten years of experience and results. American Journal of Physics 69, 9 (2001), 970–977.
[2] Francisco Gomes de Oliveira Neto and Felix Dobslaw. 2024. Analysis Package: Building Collaborative Learning: Exploring Social Annotation in Introductory Programming. https://doi.org/10.5281/zenodo.10483184
[3] Scott Freeman, Sarah L. Eddy, Miles McDonough, Michelle K. Smith, Nnadozie Okoroafor, Hannah Jordt, and Mary Pat Wenderoth. 2014. Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences 111, 23 (2014), 8410–8415. https://doi.org/10.1073/pnas.1319030111
[4] Fei Gao. 2013. A case study of using a social annotation tool to support collaboratively learning. The Internet and Higher Education 17 (2013), 76–83.
[5] Lucas Gren. 2020. A Flipped Classroom Approach to Teaching Empirical Software Engineering. IEEE Transactions on Education 63, 3 (2020), 155–163. https://doi.org/10.1109/TE.2019.2960264
[6] Jason A. Grissom, Demetra Kalogrides, and Susanna Loeb. 2015. Using student test scores to measure principal performance. Educational Evaluation and Policy Analysis 37, 1 (2015), 3–28.
[7] Yueh-Min Huang, Tien-Chi Huang, and Meng-Yeh Hsieh. 2008. Using annotation services in a ubiquitous Jigsaw cooperative learning environment. Journal of Educational Technology & Society 11, 2 (2008), 3–15. http://www.jstor.org/stable/jeductechsoci.11.2.3
[8] Wu-Yuin Hwang, Chin-Yu Wang, and Mike Sharples. 2007. A study of multimedia annotation of Web-based materials. Computers & Education 48, 4 (2007), 680–699. https://doi.org/10.1016/j.compedu.2005.04.020
[9] David W. Johnson, Roger T. Johnson, and Karl Smith. 2007. The state of cooperative learning in postsecondary and professional settings. Educational Psychology Review 19 (2007), 15–29.
[10] Jeremiah H. Kalir. 2020. Social annotation enabling collaboration for open learning. Distance Education 41, 2 (2020), 245–260. https://doi.org/10.1080/01587919.2020.1757413
[11] Gary King, Eric Mazur, Kelly Miller, and Brian Lukoff. 2019. Instructional support platform for interactive learning environments. US Patent 10,438,498.
[12] Cynthia Bailey Lee, Saturnino Garcia, and Leo Porter. 2013. Can peer instruction be effective in upper-division computer science courses? ACM Transactions on Computing Education (TOCE) 13, 3 (2013), 1–22.
[13] Jian-Wei Lin and Yuan-Cheng Lai. 2013. Harnessing Collaborative Annotations on Online Formative Assessments. Journal of Educational Technology & Society 16, 1 (2013), 263–274. http://www.jstor.org/stable/jeductechsoci.16.1.263
[14] Dastyni Loksa and Amy J. Ko. 2016. The Role of Self-Regulation in Programming Problem Solving Process and Success. In Proceedings of the 2016 ACM Conference on International Computing Education Research (ICER '16). ACM, New York, NY, USA, 83–91. https://doi.org/10.1145/2960310.2960334
[15] Dastyni Loksa, Lauren Margulieux, Brett A. Becker, Michelle Craig, Paul Denny, Raymond Pettit, and James Prather. 2022. Metacognition and Self-Regulation in Programming Education: Theories and Exemplars of Use. ACM Transactions on Computing Education 22, 4, Article 39 (2022), 31 pages. https://doi.org/10.1145/3487050
[16] Mike Lopez, Jacqueline Whalley, Phil Robbins, and Raymond Lister. 2008. Relationships between Reading, Tracing and Writing Skills in Introductory Programming. In Proceedings of the Fourth International Workshop on Computing Education Research (ICER '08). ACM, New York, NY, USA, 101–112. https://doi.org/10.1145/1404520.1404531
[17] Herbert W. Marsh. 1982. SEEQ: A reliable, valid and useful instrument for collecting students' evaluation of university teaching. British Journal of Educational Psychology 52, 1 (1982), 77–95.
[18] Eric Mazur. 1997. Peer Instruction: Getting Students to Think in Class. American Institute of Physics, 981–988.
[19] M. Meyer and T. Müller. 2019. If it were that easy: First experiences of introducing a social learning platform in an undergraduate CS course. In ICERI2019 Proceedings (12th annual International Conference of Education, Research and Innovation). IATED, 10688–10697. https://doi.org/10.21125/iceri.2019.2624
[20] Kelly Miller, Brian Lukoff, Gary King, and Eric Mazur. 2018. Use of a Social Annotation Platform for Pre-Class Reading Assignments in a Flipped Introductory Physics Class. Frontiers in Education 3 (2018), 1–11. https://doi.org/10.3389/feduc.2018.00008
[21] Greg L. Nelson, Benjamin Xie, and Amy J. Ko. 2017. Comprehension First: Evaluating a Novel Pedagogy and Tutoring System for Program Tracing in CS1. In Proceedings of the 2017 ACM Conference on International Computing Education Research (ICER '17). ACM, New York, NY, USA, 2–11. https://doi.org/10.1145/3105726.3106178
[22] Anthony J. Nitko. 1996. Educational Assessment of Students. Prentice-Hall.
[23] Elena Novak, Rim Razzouk, and Tristan E. Johnson. 2012. The educational use of social annotation tools in higher education: A literature review. The Internet and Higher Education 15, 1 (2012), 39–49.
[24] Leo Porter, Cynthia Bailey Lee, Beth Simon, and Daniel Zingaro. 2011. Peer Instruction: Do Students Really Learn from Peer Discussion in Computing? In Proceedings of the Seventh International Workshop on Computing Education Research (ICER '11). ACM, New York, NY, USA, 45–52. https://doi.org/10.1145/2016911.2016923
[25] Yizhou Qian and James Lehman. 2017. Students' misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE) 18, 1 (2017), 1–24.
[26] Mara Saeli, Jacob Perrenet, Wim M.G. Jochems, and Bert Zwaneveld. 2011. Teaching programming in secondary school: A pedagogical content knowledge perspective. Informatics in Education 10, 1 (2011), 73–88.
[27] Cor Suhre, Koos Winnips, Vincent Boer, Pablo Valdivia, and Hans Beldhuis. 2019. Students' experiences with the use of a social annotation tool to improve learning in flipped classrooms. In Fifth International Conference on Higher Education Advances. UPV Press, Valencia, Spain. https://doi.org/10.4995/HEAD19.2019.9131
[28] Jacqueline L. Whalley, Raymond Lister, Errol Thompson, Tony Clear, Phil Robbins, P. K. Ajith Kumar, and Christine Prasad. 2006. An Australasian Study of Reading and Comprehension Skills in Novice Programmers, Using the Bloom and SOLO Taxonomies. In Proceedings of the 8th Australasian Conference on Computing Education - Volume 52 (ACE '06). Australian Computer Society, Inc., 243–252.
[29] Benjamin Xie, Dastyni Loksa, Greg L. Nelson, Matthew J. Davidson, Dongsheng Dong, Harrison Kwik, Alex Hui Tan, Leanne Hwa, Min Li, and Amy J. Ko. 2019. A theory of instruction for introductory programming skills. Computer Science Education 29, 2-3 (2019), 205–253.
