Performance Secondary

SINTEZA 2024 – INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATION TECHNOLOGY, COMPUTER SCIENCE, AND DATA SCIENCE
DOI: 10.15308/Sinteza-2024-31-37

Milić Vukojičić1* [0009-0002-1218-5893], Ivana Korica1 [0009-0009-1918-347X], Mladen Veinović2 [0000-0001-6136-1895]

1 British International School Belgrade, Belgrade, Serbia
2 Singidunum University, Belgrade, Serbia

* Correspondence: Milić Vukojičić, e-mail: [email protected]

Abstract:
This study aims to show the difference in teaching and learning approaches in secondary education. The purpose is to compare the output of a Computing project completed by three groups of students given different resources for the same task. The main resource for the first group was the Computing book and teacher presentations, for the second group it was the search engines Google and Bing, and for the third group it was a Large Language Model. By comparing the outcomes and performance of the three groups, the effectiveness of using different resources in a controlled classroom environment is assessed, and the differences in overall performance of each group are shown: project completion times vary widely, as do the definitions of key terms and the answers to specific questions. The study also shows that resources such as books can be very useful, since they give students with limited experience very clear standards. Using a search engine expands the reach of information available to students but also confronts them with too many choices. The problem that arises with the LLM is that of data accuracy, for example whether the data it provides are accurate and sufficient for defining key terms and completing tasks.

Keywords:
Large Language Model, Artificial Intelligence in education, AI Chatbots, Computing education, Tutoring systems.
1. INTRODUCTION

In recent years, education has become one of the main fields to have experienced major changes and to face the need for improvement. The difference between secondary students now and those ten years ago is dramatic, particularly in their attention span and their understanding of what they read. The main change in the past two years has come from the increased use of Large Language Models (LLMs) such as ChatGPT (Generative Pre-trained Transformer). Discussions of LLMs replacing teachers in the classroom [1], differences in results when students use LLMs [2], and research in the domains of essay writing [3] and digitised education [4,5] all suggest that ChatGPT is already affecting, and will continue to affect, all forms of education. In addition to the impact observed on teachers and students within the educational framework, both parties are subject to influences exerted by LLMs [6], which shape and alter their roles, interactions, and experiences within the educational environment.
Interest and research attention in the field of education regarding Large Language Models is experiencing a notable increase. Recent scholarly activity reflects a global trend wherein educators across diverse contexts are actively assessing the integration of LLMs within their instructional practices. This growing interest is particularly evident in evaluations centred around language teaching [7], as well as computer science and programming education [2], highlighting the predominant concerns and inquiries within these domains.

ChatGPT embodies multiple roles within educational settings, functioning as an interlocutor, content provider, teaching assistant, and evaluator. Equally, teachers undertake complex roles that encompass orchestrating resources with pedagogical decisions, fostering student agency as active investigators, and instilling awareness of AI ethics [7].

Moreover, ChatGPT's utility extends to academia, as evidenced by its role as a writing assistant in scholarly work, demonstrated in the systematic review "Analysing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature", published in Contemporary Educational Technology [8].

In various educational domains, ChatGPT – as an example of a Large Language Model – offers valuable support to both students and teachers, facilitating automated grading and feedback, customised learning experiences, language translation and vocabulary assistance, personalised educational resources, efficient lesson design, and time savings for educators [9]. This highlights the potential for ChatGPT to benefit students and teachers across all of these areas.

Researchers are actively exploring avenues for integrating ChatGPT into educational practices, accompanied by guidance for its responsible implementation [10]. This concentrated effort reflects a growing recognition of ChatGPT's potential to enhance teaching and learning experiences while ensuring ethical considerations are prioritised.

This paper aims to explain the impact of different resources on students' results in the domain of Computing education in secondary school. The experiment was planned and conducted with a group of 39 students who were divided into three separate groups, each equipped with different resources to facilitate learning and task completion. Specifically, the resources provided to the students were a Computing coursebook accompanied by teacher presentations, access to conventional search engines such as Google and Bing, and access to a Large Language Model, the above-mentioned ChatGPT (Generative Pre-trained Transformer). This approach aimed to examine the efficacy and comparative advantages of different learning resources within the educational context, thereby offering insights into their respective impacts on student performance and outcomes.

2. EXPERIMENT METHODOLOGY

The experiment is based upon the participation of three separate groups, totalling 39 secondary school students: the first group comprised 12 students, the second included 15 students, and the third encompassed 12 students. Each group was equipped with a distinct resource to tackle the same Computing task. Specifically, the first group relied primarily on Computing books and teacher presentations, while the second group utilised the search engines Google and Bing. In contrast, the third group's main resource was the LLM ChatGPT, as depicted in Figure 1. This deliberate allocation of resources across groups enabled a comparative analysis of their respective impacts on task completion and learning outcomes within the context of Computing education.
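To keep the group setup easy to reference in the analysis that follows, a minimal sketch of this design as a small data structure is shown below; the Group dataclass and its field names are illustrative only and are not part of any tooling used in the study.

```python
# A minimal sketch of the experimental design described above; the dataclass
# and field names are illustrative, not tooling used in the study.
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    students: int
    main_resource: str


GROUPS = [
    Group("Group 1", 12, "Computing coursebook and teacher presentations"),
    Group("Group 2", 15, "search engines (Google and Bing)"),
    Group("Group 3", 12, "LLM chatbot (ChatGPT)"),
]

# 12 + 15 + 12 students participated in total.
assert sum(g.students for g in GROUPS) == 39
```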
The experiment involved each student completing four different tasks within two 40-minute lessons, totalling 80 minutes. Although group discussions on the subject topics were present, all the tasks were written tasks; the discussions were not counted towards the timing, nor treated as a resource. The task-time-point distribution can be seen in Figure 2.

For the first task, students were asked to define ten keywords, with each correct definition earning them 1 point; they had 15 minutes to complete this assignment. In the second task, students participated in a class discussion followed by answering short questions: after discussing as groups for 20 minutes, they had an additional 15 minutes for further discussion and 5 minutes to answer 3 questions, with each correct answer worth 10 points. The third task involved completing a short test comprising 10 brief questions; the test carried a maximum of 27 points, and students were allotted 15 minutes for completion. Lastly, the fourth task centred on creating a presentation. Students were provided with a template and specific instructions for each slide and were required to create 8 to 10 slides, including a title slide and a "Thank you" slide, with the remaining slides containing topic-related information. Each slide was valued at 10 points, and students had 30 minutes to finalise their presentations.

This structured approach ensured that students engaged in diverse activities within the designated time frame, covering keyword definition, group discussion, individual assessment, and presentation development. The clear description of the tasks and the time allocation facilitated efficient completion while allowing for a comprehensive evaluation of student performance across different skill sets.

The values displayed in Table 1, Table 2 and Table 3, and in Figure 3, Figure 4 and Figure 5, offer a comprehensive overview of the outcomes obtained across Task 1 through Task 4 for each of the three groups under study. These tables summarise the performance metrics and achievements of the students throughout the experimental process. For Task 1 through Task 4, which encompass activities ranging from keyword definition to presentation development, the values presented in the tables are expressed in percentages. This percentage representation offers a standardised means of comparison, enabling a nuanced understanding of the relative performance levels achieved across the different tasks and groups.

Furthermore, the time taken to complete each task is recorded in seconds, providing insight into the efficiency and pace at which students engaged with the assigned activities. This time-based measurement adds detail to the analysis, enabling an overview of patterns related to time management and task completion rates among the groups.
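As a concrete illustration of the scoring scheme, the sketch below shows one plausible mapping from raw task points to the percentages reported in Tables 1-3. It is illustrative only: the Task 4 maximum of 100 points (10 slides at 10 points each) is an assumption, since submissions could contain 8 to 10 slides, and the rounding behaviour is not specified in the paper.

```python
# Illustrative mapping from raw points to the percentage scores in Tables 1-3.
# Maxima for Tasks 1-3 follow the task descriptions above; the Task 4 maximum
# of 100 (10 slides x 10 points) is an assumption, not stated in the paper.
TASK_MAX_POINTS = {
    1: 10,   # 10 keyword definitions, 1 point each
    2: 30,   # 3 questions, 10 points each
    3: 27,   # short test with a 27-point maximum
    4: 100,  # assumed: 10 slides at 10 points each
}


def score_percentage(task: int, raw_points: float) -> float:
    """Convert raw points on a task into the percentage used in the tables."""
    return round(100 * raw_points / TASK_MAX_POINTS[task], 1)


# Example: 8 correctly defined keywords in Task 1 correspond to 80.0%.
print(score_percentage(1, 8))
```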
Table 1. Results of Group 1, with the Computing book and teacher presentations as the main resource.

Student Number | Task 1 (%) | Task 1 (sec) | Task 2 (%) | Task 2 (sec) | Task 3 (%) | Task 3 (sec) | Task 4 (%) | Task 4 (sec)
1 80 600 100 1200 77 900 80 1800
2 90 480 90 1200 55 900 90 1800
3 100 360 100 1200 55 900 95 1800
4 100 480 90 1200 96 900 80 1200
5 100 600 90 1200 96 900 95 1800
6 80 600 100 1200 44 900 75 1500
7 90 540 100 1200 92 900 80 1200
8 100 480 100 1200 100 900 100 1800
9 100 300 100 1200 100 900 100 1500
10 90 600 80 1200 70 900 93 1800
11 90 600 80 1200 92 900 91 1800
12 70 600 80 1200 70 900 80 1800
Figure 3. Group 1 - task score in percentages (a) and time taken to complete each task in seconds (b).
Table 2. Results of Group 2, with the search engines Google and Bing as the main resource.

Student Number | Task 1 (%) | Task 1 (sec) | Task 2 (%) | Task 2 (sec) | Task 3 (%) | Task 3 (sec) | Task 4 (%) | Task 4 (sec)
1 100 660 100 1200 74 900 100 1200
2 100 900 100 1200 77 900 100 1800
3 60 780 100 1200 48 900 75 1500
4 80 600 90 1200 59 900 95 1200
5 90 600 90 1200 66 900 80 1800
6 60 900 100 1200 62 900 70 1500
7 100 540 90 1200 100 900 90 1800
8 100 480 100 1200 100 900 95 1800
9 60 480 90 1200 59 900 73 1500
10 80 420 90 1200 62 900 93 1800
11 80 480 100 1200 51 900 80 1200
12 90 360 90 1200 51 900 85 1800
13 100 420 80 1200 92 900 93 1800
14 80 660 80 1200 81 900 90 1200
15 80 480 80 1200 77 900 90 1500
Figure 4. Group 2 - task score in percentages (a) and time taken to complete each task in seconds (b).
Figure 5. Group 3 - task score in percentages (a) and time taken to complete each task in seconds (b).
3. RESULTS AND DISCUSSION

This paper presents the outcomes obtained by students upon completing all four tasks (Tasks 1-4), as well as the corresponding time required for task completion. These results are compiled in Table 4, offering a comparative analysis across Group 1 (G1), Group 2 (G2), and Group 3 (G3). The data in Table 4 are structured to showcase key performance metrics, including the minimum (Min), maximum (Max), average, and median percentages attained by each group following evaluation. Additionally, the time taken for task completion is presented in seconds. This facilitates a clear and concise examination of the performance outcomes and time-based aspects associated with each group's engagement in the assigned tasks. Through this presentation of results, valuable insights can be gained into the effectiveness of the different instructional strategies and resource allocations organised within the educational context.

Upon closer examination of the data, it becomes evident that Group 1 achieved the highest test scores, attributable to their use of Computing books and teacher presentations as primary resources. Meanwhile, Group 3, which relied on the Large Language Model ChatGPT, demonstrated the fastest completion times among the groups.

Another crucial aspect to consider is the discrepancy between the minimum and maximum values across the groups. Notably, Group 3 exhibited the most significant variance, with the widest range observed between the lowest and highest scores attained by students across tasks. Specifically, for Task 1 the score range spanned from 40 to 100, while for Task 3 it ranged from 37 to 100. Interestingly, Task 3 exhibited consistent differences in score ranges across all three groups.

These findings underscore the nuanced interplay between resource allocation, task performance, and time management within the educational setting. While certain groups may excel in specific tasks owing to their chosen resources, variations in individual performance levels underscore the need for tailored instructional approaches. Furthermore, the consistent patterns observed in Task 3 across all groups invite further investigation into potential underlying factors influencing student outcomes. Overall, this analysis provides a solid foundation for refining educational strategies and optimising resource allocation to enhance student learning experiences.
Table 4. Comparison of results for Group 1 (G1), Group 2 (G2) and Group 3 (G3): minimum, maximum, average and median percentage received after evaluation, and completion time in seconds.

Metric | Task 1 (%) | Task 1 (sec) | Task 2 (%) | Task 2 (sec) | Task 3 (%) | Task 3 (sec) | Task 4 (%) | Task 4 (sec)
Min (G1) 70 300 80 1200 44 900 75 1200
Max (G1) 100 600 100 1200 100 900 100 1800
Average (G1) 90.83333333 520 92.5 1200 78.91666667 900 88.25 1650
Median (G1) 90 570 95 1200 84.5 900 90.5 1800
Min (G2) 60 360 80 1200 48 900 70 1200
Max (G2) 100 900 100 1200 100 900 100 1800
Average (G2) 84 584 92 1200 70.6 900 87.26666667 1560
Median (G2) 80 540 90 1200 66 900 90 1500
Min (G3) 40 240 80 1200 37 900 68 600
Max (G3) 100 600 100 1200 100 900 100 1800
Average (G3) 83.33333333 445 90 1200 76.58333333 900 82.91666667 1275
Median (G3) 85 480 90 1200 79.5 900 81 1200
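As a sanity check on the aggregation behind Table 4, the sketch below reproduces that kind of summary from per-student values; the list holds Group 1's Task 1 percentages copied from Table 1, and the same computation applies to any score or time column in Tables 1-3.

```python
# Reproducing the Table 4 style summary (Min, Max, Average, Median) from
# per-student values; the list holds Group 1's Task 1 percentages (Table 1).
from statistics import mean, median

group1_task1_scores = [80, 90, 100, 100, 100, 80, 90, 100, 100, 90, 90, 70]

summary = {
    "Min": min(group1_task1_scores),
    "Max": max(group1_task1_scores),
    "Average": mean(group1_task1_scores),
    "Median": median(group1_task1_scores),
}

# Matches the Min (G1), Max (G1), Average (G1) and Median (G1) entries for
# Task 1 (%) in Table 4: 70, 100, about 90.83, and 90.
print(summary)
```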
4. CONCLUSION

In recent years we have seen more papers on how LLMs can be used in teaching and learning to solve tasks, perform tests, and make presentations, as well as to help students with a variety of tasks. Criticism of LLMs [11,12] and the difference between human and LLM output [13] have been discussed in several papers.

In this paper, we examined the difference between three groups of students that relied on different sources – coursebooks and presentations, search engines, and an LLM – when completing school work tasks. The results show that the best-performing students were those in the first group, which had only the coursebook and presentations as a resource. We can also observe that they needed a long time to finish the tasks, as the information available to them about the task topics was limited. The second group, which used search engines, took the most time to finish the tasks, as they had unlimited resources on the internet to choose from; here it was very hard for them to differentiate which information was crucial and which was not. Group 3, which used ChatGPT as its main resource, was the fastest, but it also had the lowest scores on the task results. The biggest difference between the lowest-scoring and the highest-scoring student was observed in Group 3.

While students can be overwhelmed when they have unlimited choices of resources, as with search engines, their performance decreases when they rely on a limited resource such as ChatGPT. LLM-based models can improve the speed of completing tasks; however, they rely on the student's ability to ask questions. Future work on secondary school students' performance with LLMs can be carried out using different LLM-based chatbots such as Gemma and Llama 2.

5. REFERENCES

[6] J. Whalen and C. Mouza, "ChatGPT: Challenges, opportunities, and implications for teacher education," Contemporary Issues in Technology and Teacher Education, vol. 23, no. 1, pp. 1-23, 2023.
[7] J. Jeon and S. Lee, "Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT," Education and Information Technologies, vol. 28, no. 12, pp. 15873-15892, 2023.
[8] M. Imran and N. Almusharraf, "Analysing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature," Contemporary Educational Technology, vol. 15, no. 4, p. ep464, 2023.
[9] M. Javaid, A. Haleem, R. P. Singh, S. Khan, and I. H. Khan, "Unlocking the opportunities through ChatGPT Tool towards ameliorating the education system," BenchCouncil Transactions on Benchmarks, Standards and Evaluations, vol. 3, no. 2, p. 100115, 2023.
[10] M. Halaweh, "ChatGPT in education: Strategies for responsible implementation," Contemporary Educational Technology, vol. 15, no. 2, p. ep421, 2023.
[11] E. M. Bender and A. Koller, "Climbing towards NLU: On meaning, form, and understanding in the age of data," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, pp. 5185-5198.
[12] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, "On the dangers of stochastic parrots: Can language models be too big?," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021, pp. 610-623.