Journal of Computer Science Research - Vol.5, Iss.3 July 2023
Editor-in-Chief
Dr. Lixin Tao
Dr. Jerry Chun-Wei Lin
Volume 5 | Issue 3 | July 2023 | Pages 1-73
Contents
Articles
1 Similarity Intelligence: Similarity Based Reasoning, Computing, and Analytics
Zhaohao Sun
15 Development of New Machine Learning Based Algorithm for the Diagnosis of Obstructive Sleep Apnea
from ECG Data
Erdem Tuncer
22 Enhancing Human-Machine Interaction: Real-Time Emotion Recognition through Speech Analysis
Dominik Esteves de Andrade, Rüdiger Buchkremer
57 Expert Review on Usefulness of an Integrated Checklist-based Mobile Usability Evaluation Framework
Hazura Zulzalil, Hazwani Rahmat, Abdul Azim Abd Ghani, Azrina Kamaruddin
Review
46 Innovating Pedagogical Practices through Professional Development in Computer Science Education
Xiaoxue Du, Ellen B Meier
Journal of Computer Science Research | Volume 05 | Issue 03 | July 2023
ARTICLE
Department of Business Studies, PNG University of Technology, Private Mail Bag, Lae 411, Morobe, Papua New
Guinea
ABSTRACT
Similarity has been playing an important role in computer science, artificial intelligence (AI) and data
science. However, similarity intelligence has been ignored in these disciplines. Similarity intelligence is a
process of discovering intelligence through similarity. This article will explore similarity intelligence, similarity-
based reasoning, similarity computing and analytics. More specifically, this article looks at similarity as a form of
intelligence and at its impact on several real-world areas. It explores similarity intelligence accompanying
experience-based intelligence, knowledge-based intelligence, and data-based intelligence to play an important
role in computer science, AI, and data science. This article explores similarity-based reasoning (SBR) and
proposes three similarity-based inference rules. It then examines similarity computing and analytics, and a
multiagent SBR system. The main contributions of this article are: 1) Similarity intelligence is discovered
from experience-based intelligence consisting of data-based intelligence and knowledge-based intelligence. 2)
Similarity-based reasoning, computing and analytics can be used to create similarity intelligence. The proposed
approach will facilitate research and development of similarity intelligence, similarity computing and analytics,
machine learning and case-based reasoning.
Keywords: Similarity intelligence; Similarity computing; Similarity analytics; Similarity-based reasoning; Big data
analytics; Artificial intelligence; Intelligent agents
*CORRESPONDING AUTHOR:
Zhaohao Sun, Department of Business Studies, PNG University of Technology, Private Mail Bag, Lae 411, Morobe, Papua New Guinea; Email:
[email protected]
ARTICLE INFO
Received: 19 March 2023 | Revised: 20 May 2023 | Accepted: 26 May 2023 | Published Online: 9 June 2023
DOI: https://doi.org/10.30564/jcsr.v5i3.5575
CITATION
Sun, Zh.H., 2023. Similarity Intelligence: Similarity Based Reasoning, Computing, and Analytics. Journal of Computer Science Research. 5(3): 1-14. DOI: https://doi.org/10.30564/jcsr.v5i3.5575
COPYRIGHT
Copyright © 2023 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. (https://creativecommons.org/licenses/by-nc/4.0/).
and 75% [15]. Therefore, one of the important tasks of ChatGPT (http://www.openAi.com) is to discover similarity intelligence from two or more objects, texts, and cases.
Similarity intelligence is important because it enables us to identify similarities and patterns in data sets, which can be used to make more informed decisions and predictions. By identifying similarities between different sets of data, objects, and cases [8], we can better understand relationships and draw insights that might not be immediately apparent. At the least, similarity intelligence allows us to select one member of a similarity class as a representative and then analyze it as characteristic of the similarity class [16].
For example, in the field of customer relationship management [12], intelligence can be used to identify patterns and preferences in consumer behaviors through similarity metrics. The patterns and preferences can then be used to develop targeted advertising and product recommendations that are more likely to appeal to specific groups of consumers.
Therefore, similarity intelligence is important not only for computer science, AI, big data, and data science, but also for businesses and organizations in a wide range of industries, enabling decision makers to make more informed decisions in an intelligent experience-based, knowledge-driven, and data-driven world.
3. Experience-based intelligence
Experience-based intelligence is a process of discovering intelligence from experiences, based on experience-based reasoning [17]. Experience-based intelligence consists of data-based intelligence and knowledge-based intelligence. This section looks at similarity intelligence from experience-based intelligence using two examples, machine learning and CBR. Machine learning is data-based intelligence. CBR is knowledge-based intelligence.
3.1 Machine learning
Similarity has always been important in pattern recognition, graphical pattern recognition, and machine learning [7,18], because as soon as we have created patterns, we have to use similarity to match what was input to the system and compare it with our patterns.
Machine learning is about how to build computers and apps that improve automatically through experience [19]; that is, machine learning is a process of discovering intelligence from experience using computers and software. Therefore, machine learning is an experience-based intelligence.
Machine learning is about how a computer can use a model and algorithm to observe some data about the world, adapt to new circumstances, and detect and extrapolate patterns [11]. Therefore, machine learning is data-based intelligence, a process of discovering intelligence from data, because it uses probabilistic models and algorithms on data to create intelligence through data [11].
Clustering is one form of unsupervised machine learning [7]. How we calculate the similarity between two clusters or two objects is important for clustering [4,7,18]. Several methods are used to calculate this similarity, for example, Min, Max, the distance between centroids, and other similarity metrics mentioned in Section 4.4. Therefore, machine learning is similarity intelligence, a process for creating intelligence through similarity.
Overall, machine learning is an experience-based intelligence, a process of discovering intelligence through experience [4]. Machine learning is data-based intelligence, a process of discovering intelligence from data. Machine learning is also similarity intelligence, a process for creating intelligence through similarity.
3.2 Case-based reasoning
CBR is a process of discovering similarity intelligence from a case base, just as data mining is a process of discovering data intelligence from a large DB [12]. Similarity intelligence includes the exact case that has been used in the past for solving the problem encountered recently.
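As a minimal sketch of discovering similarity intelligence from a case base, the following Python fragment retrieves the stored case most similar to a new problem and reuses its solution. The attribute-matching similarity measure, the function names, and the car data are all hypothetical choices made for illustration; the article does not prescribe them.

```python
from typing import Dict, List, Tuple

# A case pairs a problem description with its stored solution.
Case = Tuple[Dict[str, str], str]


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of attributes (over the union of keys) on which a and b agree.

    This simple matching measure is an assumption of the sketch.
    """
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def retrieve(case_base: List[Case], problem: Dict[str, str]) -> Tuple[Case, float]:
    """Return the stored case most similar to the new problem, with its score."""
    best = max(case_base, key=lambda case: similarity(case[0], problem))
    return best, similarity(best[0], problem)


# Hypothetical case base: car descriptions paired with a price band.
CASE_BASE: List[Case] = [
    ({"body": "sedan", "engine": "petrol", "age": "new"}, "high price"),
    ({"body": "sedan", "engine": "diesel", "age": "old"}, "low price"),
    ({"body": "truck", "engine": "diesel", "age": "new"}, "high price"),
]

new_problem = {"body": "sedan", "engine": "petrol", "age": "new", "colour": "red"}
(best_problem, reused_solution), score = retrieve(CASE_BASE, new_problem)
print(reused_solution, score)
```

An exact match would score 1.0, which corresponds to reusing the exact past case; lower scores correspond to reusing a merely similar one.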
CBR is a reasoning paradigm based on previous experiences or cases [12,8]. CBR is based on two principles about the nature of the world [8]. First, the types of problems an agent encounters tend to recur. Hence, future problems are likely to be similar to current problems. Second, the world is regular: similar problems have similar solutions, or similar causes bring similar effects [8]. Consequently, solutions to similar prior problems are a useful starting point for new problem solving. The first principle implies that CBR is a kind of experience-based reasoning (EBR), while the second principle is the guiding principle underlying most approaches to similarity-based reasoning (SBR) [8]. “Two cars with similar quality features have similar prices” is one application of the above-mentioned second principle, and also a popular experience principle summarizing many individual experiences of buying cars. It is a kind of SBR. In other words, SBR is a concrete realization of CBR. The CBR system (CBRS) is an intelligent system based on CBR, which can be modelled as [8]:
CBRS = Case Base + CBRE (1)
where the case base (CB) is a set of cases, each of which consists of the previously encountered problem and its solution. CBRE is a CBR engine, which is the inference mechanism for performing CBR, in particular for performing SBR. The SBR can be formalized as:
$$\frac{P',\; P' \sim P,\; P \to Q}{\therefore\; Q'} \tag{2}$$
where P, Pꞌ, Q and Qꞌ represent compound propositions, and Pꞌ ∼ P means that if Pꞌ and P are similar (in terms of similarity relations, metrics and measures, see Section 4), then Q and Qꞌ are also similar. (2) is called generalized modus ponens, that is, (2) is one of the inference rules for performing modus ponens based on SBR. Typical reasoning in CBR, known as the CBR cycle, consists of (case) Repartition, Retrieve, Reuse, Revise and Retain [8]. Each of these five stages is a complex process. SBR dominates all these five stages [16]. Therefore, CBR is a process for discovering intelligence through SBR, because similar problems have similar solutions.
One significant contribution of CBR research and development is that it points out the importance of experience and similarity [9,16]. CBR is experience-based intelligence, a process for discovering intelligence based on experience. Because a case base is a kind of knowledge base [10,8], CBR is also a knowledge-based intelligence [11].
Overall, similarity intelligence accompanies experience-based intelligence [10,8], data-based intelligence [4] and knowledge-based intelligence [11] to provide constructive insights and decision support for businesses and organizations.
4. Fundamentals of similarity
Similarity is a fundamental concept for many fields in mathematics, mathematical logic, computer science, AI, data science, and other sciences [16,9,20,21]. This section first briefly looks at similarity and then focuses on similarity relations, fuzzy similarity relations, and similarity metrics.
4.1 Introduction
The concept of similarity has been studied by numerous researchers from different disciplines such as mathematics [20], big data [5], computer science [22,23], AI and fuzzy logic [1,2,21], to name a few. For example, Klawonn and Castro [24] examined similarity in fuzzy reasoning and showed that similarity is inherent to fuzzy sets. Fontana and Formato [25] extended the resolution rule as the core of a logic programming language based on similarity and discussed similarity in deductive databases. The concepts of similarity and similarity relations play a fundamental role in many fields of pure and applied science [26,20]. The notion of a metric or distance between objects has long been used in many contexts as a measure of similarity or dissimilarity between elements of a set [27,22,18]. Thus, there exists a wide variety of techniques for dealing with problems involving similarity, similarity relations, similarity measures, and similarity metrics [21,23]. For example, fuzzy logic [1,2], databases [5], data mining [18] and CBR [8] provide a number of concepts and techniques for dealing with similarity relations, similarity measures, and similarity metrics.
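To illustrate how a distance between objects can serve as a measure of similarity or dissimilarity, the sketch below computes a Euclidean distance and converts it into a similarity score in (0, 1]. The conversion 1 / (1 + d) is one common convention chosen here as an assumption; it is not taken from the article, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity_from_distance(d: float) -> float:
    """Map a distance d >= 0 to a similarity in (0, 1]; identical objects score 1.

    The mapping 1 / (1 + d) is an illustrative assumption.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


a, b = (0.0, 0.0), (3.0, 4.0)
d_ab = euclidean(a, b)
s_ab = similarity_from_distance(d_ab)
print(d_ab, s_ab)
```

With this mapping, a larger distance always gives a smaller similarity, so ranking objects by similarity and ranking them by distance yield the same order reversed.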
Similarity-based modus tollens (SMT) is another inference rule for SBR. From a traditional viewpoint, we can consider SMT as an integration of SBR and modus tollens. The general form of SMT is as follows:
$$\frac{\neg Q',\; Q' \approx Q,\; P \to Q,\; P \to P'}{\therefore\; \neg P'} \tag{15}$$
Fuzzy modus tollens has not been investigated in fuzzy logic [1], and this is the first time that similarity-based modus tollens is discussed. With the increasing importance of similarity, SMT and its corresponding SBR will find their applications in business and mathematics.
Example 5. Similarity-based modus tollens. Let:
● RAT: The applicant has a good credit rating,
● REP: The applicant has a good financial reputation,
● The loan officer has an experienced rule, RAT→REP: If the applicant has a good credit rating, then the applicant has a good financial reputation.
In this case, the loan officer knows the information from applicant A, ¬REPꞌ: The applicant has an unsatisfactory financial reputation. Because “a satisfactory financial reputation” is similar to “a good financial reputation”, that is, REP ~ REPꞌ, and RAT→REP holds, the loan officer uses the above SMT to make the decision and obtains ¬RATꞌ: The applicant has an unsatisfactory credit rating, since “a good credit rating” is similar to “a satisfactory credit rating”.
In the context of fuzzy similarity relations or similarity metrics [22], applying the compositional rule of inference [1] to the above Equation (15), we obtain:
$$F_0 = 1 - (1 - F_1) \circ F_{10} \circ F_{11} \circ F_{01} \tag{16}$$
This is a computational foundation for similarity-based modus tollens. In the case Q ≈ Qꞌ, F_{10} is a unit metric, and Equation (16) is then simplified into:
$$F_0 = 1 - (1 - F_1) \circ F_{11} \circ F_{01} \tag{17}$$
Similarity-based abduction
Abduction has been used in system diagnosis or medical diagnosis [8] and scientific discovery [34]. Abduction is an important reasoning paradigm in SBR. Similarity-based abductive reasoning (SAR) is a natural development of abductive reasoning [35], or an application of SBR in abductive reasoning. Its general form is as follows:
$$\frac{Q',\; Q' \approx Q,\; P \to Q,\; P \to P'}{\therefore\; P'} \tag{18}$$
Example 6. Similarity-based abductive reasoning. As in Example 5, let:
● RAT: The applicant has a good credit rating,
● REP: The applicant has a good financial reputation,
● The loan officer has an experienced rule, RAT→REP: If the applicant has a good credit rating, then the applicant has a good financial reputation.
In this case, the loan officer knows the information from applicant A, REPꞌ: The applicant has a satisfactory financial reputation. Because “a satisfactory financial reputation” is similar to “a good financial reputation”, that is, REP ~ REPꞌ, the loan officer uses the above similarity-based abductive reasoning to make the decision and obtains RATꞌ: The applicant has a satisfactory credit rating, because “a good credit rating” is similar to “a satisfactory credit rating”. It is obvious that “The applicant has a satisfactory credit rating” is an explanation for “The applicant has a satisfactory financial reputation.” Therefore, similarity-based abductive reasoning can also be used for the generation of explanations, as abductive reasoning does in scientific discovery [34,36].
In the context of fuzzy similarity relations and similarity metrics, applying the compositional rule of inference [1] to the above Equation (18), we obtain:
$$F_0 = F_1 \circ F_{10} \circ F_{11} \circ F_{01} \tag{19}$$
This is a computational foundation for similarity-based abductive reasoning. In the case of Q ≈ Qꞌ, F_{10} is a unit metric, and Equation (19) is then simplified into:
$$F_0 = F_1 \circ F_{11} \circ F_{01} \tag{20}$$
5.3 Summary
Table 1 summarizes the well-known inference rules (modus ponens, modus tollens, and abduction) and proposes three inference rules with respect to SBR, corresponding to the traditional forms: modus ponens, modus tollens, and abduction [37,31]. So far, we have examined three different inference rules for SBR (see Table 1) from a unified viewpoint. Each of the traditional forms has been thoroughly used in computer science, mathematics, mathematical logic [38], and other sciences [30,39,34]. However, they are all abstractions and summaries of SBR, natural reasoning, and ordinary reasoning in the real world. Furthermore, CBR has been based on only one of modus ponens, modus tollens, or abduction [33,8], whereas SBR is based on all three of the mentioned inference rules. It should be noted that reasoning paradigms can be classified into simple (atomic or first level) reasoning paradigms and composite (second level) reasoning paradigms [40], just as propositions can be divided into simple (atomic) propositions and compound propositions [39]. The simplest reasoning paradigm is an inference rule, which is the basis for any reasoning paradigm.
A composite reasoning paradigm consists of more than one inference rule. For example, fuzzy modus ponens [2] is a composite reasoning paradigm that integrates modus ponens and fuzzy rules. Any process model of a reasoning paradigm in AI is a method for obtaining composite reasoning paradigms. For example, the simplest rule-based expert system (RBES) can mainly consist of the knowledge base (KB) and an inference engine (IE), where the IE is an inference mechanism for performing modus ponens, modus tollens, or abduction. However, in order to manipulate the knowledge in the KB, the RBES must deal with knowledge representation, knowledge explanation, and knowledge utility, which are the main components of the process model [11,8]. Therefore, the reasoning involved in RBES can be considered as a composite reasoning paradigm. In this way, we can differentiate reasoning paradigms in mathematical logic and AI. What we have examined in this article are simple or atomic inference rules for SBR. In future work, we will examine composite reasoning paradigms for SBR, which constitute a “reasoning chain” [3], “reasoning network” or “reasoning tree” with some depth, and correspond to natural reasoning in human professional activities.
It should be noted that the above-mentioned abductive reasoning and its SBR are unsound reasoning paradigms from a logical viewpoint [31]. However, like nonmonotonic reasoning, which is also unsound reasoning [8], this inference rule and its similarity-based abduction are a summarization of SBR used by people in real-world situations.
6. Similarity computing and analytics
Similarity computing and analytics are the science, technology, systems and tools used in data, information, and knowledge analysis to measure and compare the similarity between different data, information, and knowledge sets. They are used in various fields such as AI, including machine learning, data science, natural language understanding and processing, image recognition, and information retrieval. This section will examine similarity computing and
Table 1. Traditional and similarity-based forms of modus ponens, modus tollens, and abduction.
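The fuzzy computations in Equations (16), (17), (19) and (20) can be sketched numerically. The fragment below is only an illustration under stated assumptions: the operator ∘ is read as max-min composition, the relations are small matrices with entries in [0, 1], the unit metric is modelled as an identity relation, and "1 −" is applied elementwise to the composed result. The function names and the numbers are hypothetical and do not come from the article.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def compose(vec: Vector, rel: Matrix) -> Vector:
    """Max-min composition of a fuzzy vector with a fuzzy relation (assumed reading of the operator)."""
    if len(vec) != len(rel):
        raise ValueError("vector length must equal the number of relation rows")
    return [max(min(v, row[j]) for v, row in zip(vec, rel))
            for j in range(len(rel[0]))]


def chain(vec: Vector, relations: List[Matrix]) -> Vector:
    """Compose a vector with several relations from left to right."""
    for rel in relations:
        vec = compose(vec, rel)
    return vec


def sar(f1: Vector, f10: Matrix, f11: Matrix, f01: Matrix) -> Vector:
    """Similarity-based abduction, following the shape of Equation (19)."""
    return chain(f1, [f10, f11, f01])


def smt(f1: Vector, f10: Matrix, f11: Matrix, f01: Matrix) -> Vector:
    """Similarity-based modus tollens, following the shape of Equation (16)."""
    complement = [1.0 - x for x in f1]
    return [1.0 - x for x in chain(complement, [f10, f11, f01])]


# Hypothetical data: identity relations stand in for unit metrics.
IDENTITY: Matrix = [[1.0, 0.0], [0.0, 1.0]]
F1: Vector = [0.2, 0.9]
F11: Matrix = [[1.0, 0.3], [0.3, 1.0]]

abduced = sar(F1, IDENTITY, F11, IDENTITY)
simplified = chain(F1, [F11, IDENTITY])  # shape of Equation (20)
refuted = smt(F1, IDENTITY, F11, IDENTITY)
print(abduced, simplified, refuted)
```

Under these assumptions, composing with an identity relation leaves a fuzzy vector unchanged, which is why dropping the unit-metric factor gives the same result as keeping it.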
this SBR System is the extension of CBRS and similarity-based reasoning [8,31].
Figure 1. A general architecture for a SBR system.
... similarity-based modus tollens and its algorithms (see Section 5.2).
3) The SAR agent is responsible for manipulating the SCB to infer the case requested by the user based on similarity-based abductive reasoning and its algorithm (see Section 5.3). This agent can generate the explanation for the experience-based reasoning inferred by the MEBIE. This agent can be considered as an agentization of an inference engine in an abductive CBR system [33].
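The behaviour of the SAR agent can be sketched symbolically with the loan example. The fragment below is a hypothetical illustration, not the article's algorithm: the similarity relation is a hand-written table of similar statement pairs, rules are (cause, effect) pairs, and the agent returns every candidate hypothesis that is similar to the cause of a rule whose effect is similar to the observation.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, str]  # (cause, effect), read as "cause -> effect"

# Hypothetical similarity table: each entry is an unordered pair of similar statements.
SIMILAR: Set[FrozenSet[str]] = {
    frozenset({"good financial reputation", "satisfactory financial reputation"}),
    frozenset({"good credit rating", "satisfactory credit rating"}),
}


def similar(a: str, b: str) -> bool:
    """Two statements are similar if identical or listed together in the table."""
    return a == b or frozenset({a, b}) in SIMILAR


def abduce(observed: str, rules: List[Rule], hypotheses: List[str]) -> List[str]:
    """Return the hypotheses that explain the observation via similar rules."""
    explanations: List[str] = []
    for cause, effect in rules:
        if similar(effect, observed):
            explanations.extend(h for h in hypotheses if similar(cause, h))
    return explanations


RULES: List[Rule] = [("good credit rating", "good financial reputation")]
CANDIDATES = ["satisfactory credit rating", "large savings balance"]

explanation = abduce("satisfactory financial reputation", RULES, CANDIDATES)
print(explanation)
```

In a fuller system the similarity table would be replaced by a similarity relation or metric over cases in the SCB; the table is used here only to keep the sketch self-contained.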
MSBRS. This issue will be resolved by the Analysis assistant of the MSBRS. The Analysis assistant will:
● Rank the degree of importance of the sub outputs from the MAMIE taking into account the knowledge inconsistency,
● Give an explanation for each of the outputs from the MIE and how the different results are conflicting,
● Combine or vote to establish the best solutions,
● Forward them to the SBRS interface agent who then forwards them to the user.
The SCB manager is responsible for administering the SCB. Its main tasks are SCB creation and maintenance, similarity case base evaluation, reuse, revision, and retention. Therefore, the roles of the SCB manager are an extended form of the functions of a CBR system [8], because case base creation, case retrieval, reuse, revision and retention are the main tasks of the CBR system [16].
7.4 Workflows of agents in MSBRS
Now let us have a look at how the MSBRS works. The user, U, asks the SBRS interface agent to solve the problem, p. The SBRS interface agent asks U whether a special reasoning agent is needed [45]. U does not know. Thus, the SBRS interface agent forwards p (after formalizing it) to all agents in the MIE for further processing. Each agent in the MIE manipulates the cases in the SCB based on p and the corresponding reasoning mechanism, and then obtains a solution, which is forwarded to the Analysis assistant. After the Analysis assistant receives all solutions to p, it will rank the degree of importance of the solutions, give an explanation for each of the solutions and how the results are conflicting or inconsistent, and then forward them (with p) to the SBRS interface agent, who would then forward them to U. If U accepts one of the solutions to the problem, then the MSBRS completes this mission. In this case, the SCB manager will look at whether this case is a new one. If yes, then it will add it to the SCB. Otherwise, it will keep some routine records to update the SCB. If U does not accept the solution provided, the SBRS interface agent will ask U to adjust some aspects of the problem p, which is changed into pꞌ; then the SBRS interface agent will once again forward the revised problem pꞌ to the MIE for further processing.
8. Conclusions
Artificial intelligence (AI) addressed experience-based intelligence and knowledge-based intelligence at its early stage. Big data has been experiencing significant progress in the past 10 years; AI has been developing machine learning and deep learning to address data-based intelligence. In fact, similarity intelligence has been accompanying experience-based intelligence, knowledge-based intelligence, and data-based intelligence to play an important role in computer science, AI, and data science in general and similarity computing and analytics in particular. The main contributions of this article are:
1) It explored similarity intelligence, based on the similarity discovered from experience-based intelligence in machine learning and CBR. Similarity intelligence will be developed and created by many systems and algorithms in AI, computer science, and data science.
2) It explored similarity-based reasoning and proposed its three different rules, which constitute the fundamentals for all SBR paradigms.
3) It highlighted similarity-based reasoning, computing, and analytics to create similarity intelligence. As an example, the article also proposed a multiagent architecture for an SBR system (MSBRS).
Overall, similarity intelligence is discovered from big data, information, and knowledge using similarity relations, fuzzy similarity relations and metrics, SBR, similarity computing and semantics.
Furthermore, the similarity-based approach to similarity intelligence, SBR, similarity computing and analytics proposed in the article opens a new way to integrate machine learning (e.g. machine learning algorithms such as instance-based learning and the k-Nearest Neighbor (kNN) classifier) and experience-based reasoning based on SBR, which will be examined in future work. Knowledge management
and experience management have drawn increasing attention in business, e-commerce, and computer science. Their counterparts among intelligent systems are similarity-based systems such as CBR systems and machine learning. How to apply similarity intelligence in knowledge management, experience management, and similarity-based systems will also be examined in future work.
Measurement of intelligence is based on the ability to solve difficult problems. How to define the measurement of similarity intelligence is still a weakness of this article. In future work, we will explore the measurement of similarity intelligence.
Conflict of Interest
There is no conflict of interest.
References
[1] Zimmermann, H.J., 2011. Fuzzy set theory—and its applications. Springer Science & Business Media: Berlin.
[2] Zadeh, L.A., 1971. Similarity relations and fuzzy orderings. Information Sciences. 3(2), 177-200.
[3] Minsky, M., 1988. Society of mind. Simon and Schuster: New York.
[4] Aroraa, C., Chitra, L., Munish, J., 2022. Data analytics: Principles, tools, and practices. BPB Publications: New Delhi.
[5] Sun, Z., 2022. A mathematical theory of big data. Journal of Computer Science Research. 4(2), 13-23.
[6] Zhang, D.G., Ni, C.H., Zhang, J., et al., 2022. A novel edge computing architecture based on adaptive stratified sampling. Computer Communications. 183, 121-135.
[7] Milošević, P., Petrović, B., Jeremić, V., 2017. IFS-IBA similarity measure in machine learning algorithms. Expert Systems with Applications. 89, 296-305.
[8] Finnie, G., Sun, Z., 2004. Intelligent techniques in E-commerce: A case based reasoning perspective. Springer-Verlag: Berlin.
[9] Sun, R., 1995. Robust reasoning: Integrating rule-based and similarity-based reasoning. Artificial Intelligence. 75(2), 241-295.
[10] Bergmann, R., 2002. Experience management: Foundations, development methodology, and internet-based applications. Springer: Berlin.
[11] Russell, S., Norvig, P., 2020. Artificial intelligence: A modern approach (4th Edition). Prentice Hall: Upper Saddle River.
[12] Laudon, K.G., Laudon, K.C., 2020. Management information systems: Managing the digital firm (16th Edition). Pearson: Harlow.
[13] López-Robles, J.R., Otegi-Olaso, J.R., Gómez, I.P., et al., 2019. 30 years of intelligence models in management and business: A bibliometric review. International Journal of Information Management. 48, 22-38.
[14] Turing, A., 1950. Computing machinery and intelligence. Mind. 59, 433-460.
[15] Schwab, P.N., 2023. ChatGPT: 1000 Texts Analyzed and up to 75,3% Similarity [Internet] [cited 2023 Mar 17]. Available from: https://www.intotheminds.com/blog/en/chatgpt-similarity-with-plan/
[16] Sun, Z., Finnie, G., Weber, K., 2004. Case base building with similarity relations. Information Sciences. 165(1-2), 21-43.
[17] Finnie, G., Sun, Z., 2003. R5 model for case-based reasoning. Knowledge-Based Systems. 16(1), 59-65.
[18] Kantardzic, M., 2011. Data mining: Concepts, models, methods, and algorithms. John Wiley & Sons: Hoboken.
[19] Jordan, M.I., Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and prospects. Science. 349(6245), 255-260.
[20] Epp, S.S., 2010. Discrete mathematics with applications. Cengage Learning: Boston.
[21] Zhang, D.G., Ni, C.H., Zhang, J., et al., 2022. New method of vehicle cooperative communication based on fuzzy logic and signaling game strategy. Future Generation Computer Systems. 142, 131-149.
[22] Finnie, G., Sun, Z., 2002. Similarity and metrics
ARTICLE
Faculty of Technology, Biomedical Eng. Department of Kocaeli University, Kocaeli, 41001, Turkey
ABSTRACT
In this study, a machine learning algorithm is proposed to be used in the detection of Obstructive Sleep Apnea
(OSA) from the analysis of single-channel ECG recordings. Eighteen ECG recordings from the PhysioNet Apnea-ECG
dataset were used in the study. In the feature extraction stage, dynamic time warping and median frequency features
were obtained from the coefficients obtained from different frequency bands of the ECG data by using the wavelet
transform-based algorithm. In the classification phase, OSA patients and normal ECG recordings were classified using
Random Forest (RF) and Long Short-Term Memory (LSTM) classifier algorithms. The performance of the classifiers
was evaluated using a split of 90% training and 10% testing data. According to this evaluation, the accuracy of the RF
classifier was 82.43% and the accuracy of the LSTM classifier was 77.60%. These results suggest that the proposed
features and classifier algorithms can be used in OSA classification and may offer an alternative to existing machine
learning methods. The proposed method and feature set are promising because their low computing overhead allows
them to be implemented efficiently.
Keywords: ECG; Sleep apnea; Classification; Dynamic time warping; Median frequency
*CORRESPONDING AUTHOR:
Erdem Tuncer, Faculty of Technology, Biomedical Eng. Department of Kocaeli University, Kocaeli, 41001, Turkey; Email: [email protected]
ARTICLE INFO
Received: 6 June 2023 | Revised: 4 July 2023 | Accepted: 5 July 2023 | Published Online: 14 July 2023
DOI: https://doi.org/10.30564/jcsr.v5i3.5762
CITATION
Tuncer, E., 2023. Development of New Machine Learning Based Algorithm for the Diagnosis of Obstructive Sleep Apnea from ECG Data. Journal of Computer Science Research. 5(3): 15-21. DOI: https://doi.org/10.30564/jcsr.v5i3.5762
COPYRIGHT
Copyright © 2023 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. (https://creativecommons.org/licenses/by-nc/4.0/).
ogram (ECG) is the process of recording the electrical activity of the heart. Today, ECG signals are used in the diagnosis of OSA, and apnea is detected from the ECG signal by means of heart rate variability. Determining whether a person has OSA syndrome with the proposed machine learning technique using single-channel ECG recordings will be economical and practical, because such a system removes the need for environments such as sleep laboratories [1,2]. There are many studies in the literature on the detection of OSA from ECG using various methods. In the study conducted by Yildiz [3], obstructive sleep apnea data from ECG recordings were classified. Twelve features were obtained using the wavelet transform, and the highest success rate of 98.3% was achieved with the support vector machine/artificial neural network classifier algorithms. Faal et al. [4] presented a new feature generation method using the autoregressive integrated moving average and exponential generalized autoregressive conditional heteroscedasticity models in the time domain from ECG signals. ECG signals were analyzed in one-minute segments. The results were evaluated using five different classifiers (support vector machine, neural network, quadratic discriminant analysis, linear discriminant analysis and k-nearest neighbor). As a result of the classification, a success rate of 81.43% was achieved. Tyagi et al. [5] proposed a new approach that cascades two different types of restricted Boltzmann machines in the deep belief network method for sleep apnea classification using electrocardiogram signals. They achieved a success rate of 89.11% on ECG data examined in one-minute epochs. Yang et al. [6] proposed a one-dimensional squeeze-and-excitation residual group network for sleep apnea detection. With the proposed method, an accuracy rate of 90.3% was achieved. Thus, they argued that cheap and useful sleep apnea detectors can be integrated with wearable devices.
The aim of this study is to present an automatic machine-learning method that can detect OSA from ECG recordings. The proposed method is based on a wavelet transform algorithm. Unlike the studies in the literature, this study examines the effect of different features on apnea data instead of the features frequently used in the literature.
2. Materials and methods
2.1 Data set
The ECG recordings used in the study were taken from the PhysioNet Apnea-ECG dataset. There are 70 ECG recordings in total. Recordings can be up to 10 hours in length. All of the sleep recordings were taken from 32 subjects. The age range of the subjects was between 27 and 63 years. The standard V2 lead was used for the placement of the electrodes on the body surface during recording. ECGs were digitized at a sampling rate of 100 Hz with 16-bit resolution. Whether the ECG recordings belong to people with obstructive sleep apnea was determined according to the sleep study technique [7]. In this study, 18 ECG recordings of 10 randomly selected patients (a01, a02, a03, a04, a05, a06, a07, a08, a09, a10) were used. Randomly selected apnea and normal ECG signals are shown in Figure 1. In Figure 1(a), heart rate variability is visually striking after the 4000th sample. Figure 1(b) shows a normal one-minute ECG signal.
Figure 1. (a) ECG signal with apnea, (b) Normal ECG signal.
2.2 Feature selection
Discrete wavelet transform
The discrete wavelet transform aims to solve the fixed-width window problem of the Fourier transform by using a scalable wavelet function. Thus, optimum time-frequency resolution is provided in different frequency ranges for the biomedical signals to be analyzed. The discrete wavelet transform also aims to eliminate excessive computational load. Because it relies on an efficient filter-based algorithm, the wavelet coefficients are calculated only for discrete values at certain points. This algorithm, called multiresolution analysis, consists of sequential high-pass and low-pass filter pairs [8,9]. The lower frequency bands of the ECG data used in the study are given in Table 1. As shown in Table 1, a six-level wavelet transform is used.
Table 1. Ranges of frequency bands in wavelet transform decomposition of ECG signal.
Sub-bands    Frequency ranges (Hz)
D1           25-50
D2           12.5-25
D3           6.25-12.5
D4           3.125-6.25
D5           1.5625-3.125
D6           0.78125-1.5625
A6           0-0.78125
Dynamic time warping algorithm
The dynamic time warping algorithm measures the similarity of time series and is commonly used for classification. Biomedical signals sampled over a period of time form a time series. The similarity between two discrete time series can be calculated by summing the Euclidean distances between their aligned elements. The closer this sum is to zero, the more similar the time series are. Today, the dynamic time warping algorithm is used in many areas from image processing to audio processing [10,11].
$Q = q_1, q_2, \ldots, q_{n-1}, q_n$ (1)
$C = c_1, c_2, \ldots, c_{m-1}, c_m$ (2)
Q and C in Equation (1) and Equation (2) represent two different signals or data; n and m indicate the lengths of these signals. The similarity between the Q and C signals is calculated using the Euclidean distance as in Equation (3).
$d(q_i, c_j) = (q_i - c_j)^2$ (3)
After obtaining the (i, j) distance matrix for Q and C, the accumulated distance matrix is calculated from this matrix. The accumulated cost matrix is calculated recursively from the local distances d [12].
Median frequency
Power spectral density is the frequency-domain equivalent of the power content of the signal. It is used to characterize broadband random signals. The median frequency represents the midpoint of the power spectral density distribution: it is the frequency that divides the spectrum into two parts, each containing 50% of the total power in the ECG [13,14].
2.3 Classification
Random forest
Random Forest (RF) is a very popular learning algorithm for classification and regression problems.
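When the trees of a random forest are grown, each candidate split is usually scored by a node impurity measure computed from the class fractions at the node. The sketch below assumes the Gini impurity, a common choice for classification trees; the choice of measure, the function name `gini_impurity` and the example labels are illustrative assumptions rather than details taken from this study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_i f_i**2 of the class labels at one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty node is treated as pure
    counts = Counter(labels)
    # f_i is the fraction of samples at the node that belong to class i
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical nodes holding labels of one-minute ECG windows.
pure_node = ["apnea"] * 4
mixed_node = ["apnea", "apnea", "normal", "normal"]
print(gini_impurity(pure_node))   # 0.0
print(gini_impurity(mixed_node))  # 0.5
```

A split is preferred when it lowers the size-weighted impurity of the child nodes relative to the parent node.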
Here fi is the fraction of class i recorded at node v.
Long short-term memory
Introduced by Hochreiter and Schmidhuber, Long Short-Term Memory (LSTM) is an advanced variant of the Recurrent Neural Network (RNN) architecture. The basic idea of LSTM is to use a memory cell to remember and explicitly carry unit outputs across different time steps. The memory cell of LSTM uses cell states to remember the information of temporal contexts. It has a forget gate, an input gate and an output gate to control the flow of information between different time steps. The three gates of LSTM make it easy to organize long-term memory. LSTM models can learn the temporal dependence between data. Due to their ability to learn long-term correlations in a sequence, LSTM networks are capable of accurately modeling complex multivariate sequences such as the ECG signal [17,18].
2.4 Evaluation of classification models
One of the performance metrics for machine learning classification problems is the confusion matrix. The table containing the four combinations of predicted and actual values is called the confusion matrix (Table 2) [19].
Table 2. Confusion matrix.
               Predicted: No     Predicted: Yes
Actual: No     True Negative     False Positive
Actual: Yes    False Negative    True Positive
Here, TP: True positive, TN: True negative, FP: False positive, FN: False negative. Some of the metrics that can be calculated with the terms in Table 2 are accuracy, precision and recall. Their mathematical equations are given in Equations (5), (6) and (7).
In this study, a machine learning method for the automatic detection of OSA, which is time-consuming and costly to diagnose, from single-channel ECG recordings is presented. The flow chart of the proposed method is shown in Figure 2.
Figure 2. Flow chart of the proposed model.
In the presented method, ECG data were analyzed in one-minute windows. The coefficients of the lower frequency bands were obtained from each window by using the wavelet transform (6-level Symlet2 wavelet). After applying the dynamic time warping algorithm to the wavelet coefficients in different frequency bands, the results obtained are recorded in the feature matrix. The relationship of the A6 coefficients with the other coefficients was evaluated with the dynamic time warping algorithm. Another parameter calculated as a feature is the median frequency. The median frequency values of the wavelet coefficients obtained from all lower frequency bands were calculated. As shown in Table 3, a total of 13 features were extracted and given as input to the classifier algorithms.
In this study, two different classifier algorithms were evaluated. One is the deep learning architecture LSTM and the other is the traditional learning algorithm RF. The architecture of the model created for the LSTM classifier is shown in Figure 3. The LSTM architecture is composed of an input layer, an LSTM layer, a dropout layer, an LSTM layer, a dropout layer and an output layer, in that order. Each LSTM layer contains 50 units. These units use the Rectified
Linear Unit (ReLU) activation function and give a different output for each time step. ReLU was chosen because it generally makes the model less costly to train in terms of computational load and can achieve better performance than other activation functions. In addition, ReLU can avoid the vanishing gradient problem, which is an advantage over the tanh function.
After the first LSTM layer, a dropout layer (with a value of 0.2) is applied to reduce overfitting. The next layer is a second LSTM layer containing 50 units with ReLU activation functions, followed by another dropout layer. Finally, the output layer applies a sigmoid activation function to estimate the classification result.
Figure 3. Diagram of the LSTM architecture. Layers shown, in order: input layer, LSTM layer, dropout layer, LSTM layer, dropout layer, output layer.
The classification accuracy of the RF method depends on user-defined parameters such as the number of trees and the number of parameters. Therefore, selecting the most appropriate parameters for the data increases the classification accuracy. In the study, multiple combinations were tested to find the optimum numbers of trees and parameters. The success rates obtained for combinations of different tree and parameter numbers are shown in Table 4. As can be seen in Table 4, the highest success rate was obtained with 250 trees and two parameters for the classification of apnea data. Since increasing the number of trees further does not increase the performance of the model, the model with the highest performance and the fewest trees was selected.
Table 4. RF algorithm success results by parameters.
Number of trees    Number of parameters    Accuracy rate (%)
10                 2                       78.15
20                 2                       80.13
30                 2                       80.46
40                 2                       80.57
50                 2                       80.79
70                 2                       81.22
100                2                       81.66
150                2                       81.99
200                2                       81.88
250                2                       82.43
500                2                       82.43
The success rates obtained with the LSTM architecture and the RF algorithm are given in Table 5. As can be seen from Table 5, the optimized RF algorithm performed better than the LSTM architecture. Therefore, the RF algorithm has the best performance.
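The accuracy, precision and recall values reported for the two classifiers follow from the confusion-matrix counts defined in Section 2.4. As a minimal sketch, assuming the standard definitions of these metrics, they can be computed as follows; the function name and the example counts are illustrative and are not results of this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # accuracy: share of all windows that were classified correctly
    accuracy = (tp + tn) / total if total else 0.0
    # precision: share of predicted positives that are truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # recall: share of actual positives that were detected
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for 100 one-minute windows (not taken from the study).
acc, prec, rec = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

Reporting precision and recall next to accuracy matters when apnea and normal windows are not equally frequent, because accuracy alone can hide a weak detection rate for the rarer class.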
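The two feature types used in this study, the dynamic time warping distance between coefficient sequences and the median frequency of each band, can be sketched in plain Python. This is an illustrative sketch, not the implementation used in the study: the function names, the naive DFT used for the power spectrum, the single sampling rate `fs` applied to every band, and the split into six DTW features (A6 against D1 to D6) plus seven median frequencies are assumptions, chosen as one reading of the text that yields 13 features.

```python
import cmath
import math


def dtw_distance(q, c):
    """Accumulated DTW cost between sequences q and c.

    The local cost is the squared difference (q_i - c_j)**2 and each cell
    of the accumulated cost matrix is built from its three predecessors.
    """
    n, m = len(q), len(c)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning q[:i] with c[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def median_frequency(x, fs):
    """Frequency (Hz) at which the cumulative spectral power reaches 50%."""
    n = len(x)
    half = n // 2
    # Naive DFT power for the non-negative frequency bins 0..n//2.
    power = []
    for k in range(half + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= total / 2:
            return k * fs / n
    return half * fs / n


def feature_vector(bands, fs):
    """bands: dict mapping 'D1'..'D6' and 'A6' to coefficient lists (assumed layout)."""
    details = ["D1", "D2", "D3", "D4", "D5", "D6"]
    # Six DTW features: A6 compared with each detail band.
    dtw_feats = [dtw_distance(bands["A6"], bands[d]) for d in details]
    # Seven median-frequency features: one per band (single fs is a simplification).
    mf_feats = [median_frequency(bands[b], fs) for b in details + ["A6"]]
    return dtw_feats + mf_feats
```

In practice the effective sampling rate differs from band to band after wavelet decimation, so a per-band rate would be needed to express each median frequency in Hz.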
Table 5. Classifier performances.
Dataset      Accuracy (%)       Precision (%)      Recall (%)
             RF      LSTM       RF      LSTM       RF      LSTM
ECG Apnea    82.43   77.60      82.10   76.70      82.40   77.50
4. Discussion
The results show that the RF algorithm gives the highest accuracy, 82.43%. High classification performance was achieved with thirteen features derived from two feature types extracted from the ECG data. Among the studies in the literature, Yildiz [3] calculated the norm entropy values of each wavelet level by using the twelve-level wavelet transform of the obstructive sleep apnea data from the ECG recordings. The obtained features were applied to the support vector machine/artificial neural network classifier algorithms and the highest success rate of 98.3% was obtained. Faal et al. [4] presented a new feature generation method. With five different classifier algorithms, a success rate of 81.43% was achieved. Tyagi et al. [5] proposed a new approach and achieved a success rate of 89.11%. Yang et al. [6] proposed a one-dimensional squeeze-and-excitation residual group network and 90.3% accuracy was achieved with the proposed method. In the study by Razi et al. [20], ten time-domain features were extracted and reduced to five features. Principal component analysis and linear discriminant analysis were used for dimensionality reduction. The RF algorithm was proposed for classification and the results were compared with other classifier algorithms. The highest success rate reported is 95.01%.
When the studies in the literature are examined, it is observed that their success rates are generally higher than that of this study. Most of the studies aimed to reach a higher success rate by using similar methods and techniques. However, in the field of machine learning, the goal is not only to increase classification success but also to develop different features and methods. From this point of view, our article differs from the studies in the literature. A previously unused feature set is suggested on ECG apnea data. At the same time, the change in success rates with the optimization of the classifier algorithms was examined. It is possible to reach higher success rates by diversifying and optimizing the parameters of the machine learning model.
5. Conclusions
This article discusses the estimation of apnea diagnosis from ECG data. We propose a binary classification machine learning method to support physicians’ decisions in clinical practice. For decision support applications, modeling using the RF algorithm as a classifier and classification of patients’ apnea data are recommended. The feature method selected with the RF algorithm proved successful. In the classification made with the feature set used and RF algorithm optimization, a successful prediction was made with 13 features with an accuracy rate of 82.43%. The feature set and method we used in our study give hope for higher future success rates. Further studies will aim to evaluate the efficiency of the feature set by expanding the dataset.
Conflict of Interest
The author has no conflicts of interest to declare.
Funding
This research received no external funding.
References
[1] Wiegand, L., Zwillich, C.W., 1994. Obstructive sleep apnea. Disease-a-Month. 40(4), 202-252. DOI: https://doi.org/10.1016/0011-5029(94)90013-2
[2] Paiva, T., Attarian, H., 2014. Obstructive sleep apnea and other sleep-related syndromes. Handbook of clinical neurology. Elsevier: Amsterdam. pp. 251-271.
[3] Yildiz, A., 2017. Tek kanallı EKG kayıtları analizinden uyku apne tespiti (Turkish) [Detection of sleep apnea from analysis of single-channel ECG recordings]. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi. 8(1), 111-122.
[4] Faal, M., Almasganj, F., 2021. Obstructive sleep apnea screening from unprocessed ECG signals using statistical modelling. Biomedical Signal Processing and Control. 68, 102685. DOI: https://doi.org/10.1016/j.bspc.2021.102685
[5] Tyagi, P.K., Agrawal, D., 2023. Automatic detection of sleep apnea from single-lead ECG signal using enhanced-deep belief network model. Biomedical Signal Processing and Control. 80(2), 104401. DOI: https://doi.org/10.1016/j.bspc.2022.104401
[6] Yang, Q., Zou, L., Wei, K., et al., 2022. Obstructive sleep apnea detection from single-lead electrocardiogram signals using one-dimensional squeeze-and-excitation residual group network. Computers in Biology and Medicine. 140, 105124. DOI: https://doi.org/10.1016/j.compbiomed.2021.105124
[7] Penzel, T., Moody, G.B., Mark, R.G., et al. (editors), 2000. The Apnea-ECG database. Computers in Cardiology. 2000 September 24-27; Cambridge. USA: IEEE. p. 255-258.
[8] Tuncer, E., Bolat, E.D., 2022. Destek Vektör Makinaları ile EEG Sinyallerinden Epileptik Nöbet Sınıflandırması (Turkish) [Epileptic seizure classification from EEG signals with support vector machines]. Politeknik Dergisi. 25(1), 239-249. DOI: https://doi.org/10.2339/politeknik.672077
[9] Mallat, S.G., 1989. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 11(7), 674-693. DOI: http://dx.doi.org/10.1109/34.192463
[10] Zhang, Z., Tavenard, R., Bailly, A., et al., 2017. Dynamic time warping under limited warping path length. Information Sciences. 393, 91-107. DOI: https://doi.org/10.1016/j.ins.2017.02.018
[11] Jeong, Y.S., Jeong, M.K., Omitaomu, O.A., 2011. Weighted dynamic time warping for time series classification. Pattern Recognition. 44(9), 2231-2240. DOI: https://doi.org/10.1016/j.patcog.2010.09.022
[12] Bakir, C., 2016. Automatic speaker gender identification for the German language. Balkan Journal of Electrical and Computer Engineering. 4(2), 79-83. DOI: https://doi.org/10.17694/bajece.43067
[13] Brown, C.G., Griffith, R.F., Ligten, P.V., et al., 1991. Median frequency—a new parameter for predicting defibrillation success rate. Annals of Emergency Medicine. 20(7), 787-789. DOI: https://doi.org/10.1016/S0196-0644(05)80843-1
[14] Tonner, P.H., Bein, B., 2006. Classic electroencephalographic parameters: Median frequency, spectral edge frequency etc. Best Practice & Research Clinical Anaesthesiology. 20(1), 147-159. DOI: https://doi.org/10.1016/j.bpa.2005.08.008
[15] Masetic, Z., Subasi, A., 2016. Congestive heart failure detection using random forest classifier. Computer Methods and Programs in Biomedicine. 130, 54-64. DOI: https://doi.org/10.1016/j.cmpb.2016.03.020
[16] Coskun, G., Aytekin, I., 2021. Early detection of mastitis by using infrared thermography in holstein-friesian dairy cows via classification and regression tree (CART) Analysis. Selcuk Journal of Agriculture and Food Sciences. 35(2), 118-127.
[17] Tuncer, E., Bolat, E.D., 2022. Classification of epileptic seizures from electroencephalogram (EEG) data using bidirectional short-term memory (Bi-LSTM) network architecture. Biomedical Signal Processing and Control. 73, 103462. DOI: https://doi.org/10.1016/j.bspc.2021.103462
[18] Sowmya, S., Jose, D., 2022. Contemplate on ECG signals and classification of arrhythmia signals using CNN-LSTM deep learning model. Measurement: Sensors. 24, 100558. DOI: https://doi.org/10.1016/j.measen.2022.100558
[19] Dağli, E., Büber, M., Taspinar, Y.S., 2022. Detection of accident situation by machine learning methods using traffic announcements: The case of metropol Istanbul. International journal of applied mathematics electronics and computers. 10(3), 61-67.
[20] Razi, A.P., Einalou, Z., Manthouri, M., 2021. Sleep Apnea classification using random forest via ECG. Sleep and Vigilance. 5, 141-146. DOI: https://doi.org/10.1007/s41782-021-00138-4
ARTICLE
Institute of IT Management and Digitization Research (IFID), FOM University of Applied Sciences, Dusseldorf,
40476, Germany
ABSTRACT
Humans, as intricate beings driven by a multitude of emotions, possess a remarkable ability to decipher and
respond to socio-affective cues. However, many individuals and machines struggle to interpret such nuanced signals,
including variations in tone of voice. This paper explores the potential of intelligent technologies to bridge this
gap and improve the quality of conversations. In particular, the authors propose a real-time processing method that
captures and evaluates emotions in speech, utilizing a terminal device like the Raspberry Pi computer. Furthermore,
the authors provide an overview of the current research landscape surrounding speech emotional recognition and
delve into our methodology, which involves analyzing audio files from renowned emotional speech databases. To aid
comprehension, the authors present visualizations of these audio files in situ, employing dB-scaled Mel spectrograms
generated through TensorFlow and Matplotlib. The authors use a support vector machine kernel and a Convolutional
Neural Network with transfer learning to classify emotions. Notably, the classification accuracies achieved are 70%
and 77%, respectively, demonstrating the efficacy of our approach when executed on an edge device rather than relying
on a server. The system can evaluate pure emotion in speech and provide corresponding visualizations to depict the
speaker’s emotional state in less than one second on a Raspberry Pi. These findings pave the way for more effective
and emotionally intelligent human-machine interactions in various domains.
Keywords: Speech emotion recognition; Edge computing; Real-time computing; Raspberry Pi
*CORRESPONDING AUTHOR:
Rüdiger Buchkremer, Institute of IT Management and Digitization Research (IFID), FOM University of Applied Sciences, Dusseldorf, 40476,
Germany; Email: [email protected]
ARTICLE INFO
Received: 7 June 2023 | Revised: 7 July 2023 | Accepted: 10 July 2023 | Published Online: 21 July 2023
DOI: https://doi.org/10.30564/jcsr.v5i3.5768
CITATION
Esteves de Andrade, D., Buchkremer, R., 2023. Enhancing Human-Machine Interaction: Real-Time Emotion Recognition through Speech Analysis. Journal of Computer Science Research. 5(3): 22-45. DOI: https://doi.org/10.30564/jcsr.v5i3.5768
COPYRIGHT
Copyright © 2023 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. (https://creativecommons.org/licenses/by-nc/4.0/).
stone in the early 2000s, enabling new business works, SVM, or any combination of these two. Stud-
models and innovations. However, the era of cloud ies reveal that even pure emotion determination by
computing seems to be ending as the edge comput- humans is not accurate in all cases, so the focus is on
ing paradigm is increasingly replacing the cloud the use and further development of neural networks [13].
computing paradigm due to new requirements. Edge In the field of neural networks, recurrent neural net-
computing can support the new requirements for works (RNNs) such as Long Short-Term Memory
low latency, increased data security, mobility sup- Hochreiter and Schmidhuber [14] were initially used
port, and real-time processing. The literature divides because their feedback loops make them more suita-
edge computing into the sub-areas of fog computing, ble for processing continuous inputs such as speech
cloudlet, and mobile edge computing (MEC). While signals [15,16]. RNNs have been superseded by convo-
the first two approaches mentioned are hardly found lutional neural networks (CNNs) such as AlexNet,
in practice, MEC is ubiquitous. In MEC, compu- VGG16, ResNet, or MobileNetV2 due to their high
tationally intensive cloud servers are stationed in resource and memory requirements and continued
mobile base stations at the network’s edge and thus success. Furthermore, MFCC or Mel spectrograms
close to the end devices, ensuring daily use of this were launched using a Convolutional Neural Net-
technology. As Shi et al. [10] stated, MEC means data work (CNN). Moreover, the everyday use of transfer
processing immediately to the end device and on it. learning and Multitask Learning methods makes the
In addition to MEC, mobile cloud computing (MCC) CNN deployment even more efficient [17].
is based on the principle that end devices perform Every pattern recognition is based on previous-
the processing and only send the result or partial re- ly extracted features in considerable quantity and
sult to the MEC server or the MCC server. However, quality. Due to this given diversity, selecting suitable
none of these approaches can be found in pure form parts is relevant in classification. The method gener-
in practice. Instead, cloud and edge computing tech- ally used in machine learning for feature extraction
niques are combined to cover various use cases and is the use of the framework open-source Speech
exploit their advantages. and Music Interpretation by Large-space Extraction
The topic of speech emotion recognition (SER) (openSMILE) [18], which in turn includes the datasets
and its feature extraction and pattern recognition extended Geneva Minimalistic Acoustic Parameter
are a constant part of current research. Thus, the re- Set (eGeMAPS) and ComParE. In deep learning,
cent literature review shows that in SER, especially recent literature has increasingly used CNN for this
the continuous and the spectral features of speech purpose. In this approach, the output layer is either
are used since these reflect the characteristics of preserved as a classifier or replaced by, for example,
emotions most appropriately. Priority is given to an SVM.
the course of the primary speech frequency or loud- In the phase of emotion classification, diverse sets
ness, the temporal ratios, pauses, and spectral fea- of emotions diverge, which in turn harbor a differ-
tures such as the Mel frequency cepstral coefficient ent number of emotions. The settings can vary from
(MFCC) and the Mel spectrograms [3]. The most five to 20 different emotions. The most common set of
common classification techniques used in speech emotions in the literature refers to the six basic emo-
recognition in recent years are the Gaussian Mix- tions, according to Ekman [19], which are happiness,
ture Model (GMM) in combination with the Hidden sadness, anger, fear, disgust, and surprise, including
Markov Model (HMM), the support vector machine a seventh neutral emotion.
(SVM) (Cortes and Vapnik 1995), and more recently, In the mainstream literature, descriptions of the
neural networks [11,12]. Consequently, the successes hardware on which a neural network is trained or
achieved in this regard also inspired using these executed are scarce. However, Tariq et al. [15] de-
techniques in SER, but with a focus on neural net- scribe that neural networks—especially deep neural
Database | Number of files | Min. length in sec. | Max. length in sec. | Average length in sec. | Total length in minutes | Language | Emotions
Emo-DB | 535 | 1.23 | 8.98 | 2.78 | 24.79 | German | Neutral, joy, sadness, anger, fear, disgust, boredom
TESS | 2800 | 1.25 | 2.98 | 2.06 | 95.91 | English | Neutral, joy, sadness, anger, fear, disgust, surprise
EMOVO | 588 | 1.29 | 13.99 | 3.12 | 30.59 | Italian | Neutral, joy, sadness, anger, fear, disgust, surprise
guage data within the system. Consequently, implementing SER necessitates an initial filtering process that distinguishes between speech and non-speech audio. For this purpose, Hershey et al. [31] describe a neural network called YAMNet in their paper, which is specifically trained on audio classification using the AudioSet database [32]. YAMNet can distinguish between 521 audio classes from human, animal, machine, and natural sources. This publicly available Convolutional Neural Network (CNN) serves as the upstream filter for SER and is utilized in both methods described in this work. However, since YAMNet does not impact the main SER algorithm developed in this research, further details regarding its functionality or structure are not provided here.
Distinct terminal configurations are employed during the creation and execution of the prototypes. The machine learning and deep learning models are trained on a Windows server. This step focuses on processing the five databases using the respective method, necessitating hardware with appropriate performance capabilities. It should be noted that servers do not possess microphone inputs due to their general structure, broad physical localization, and clustering, which are also irrelevant for training purposes. Table 2 lists the hardware components used in the training process.
Once the functional models are prepared, they are transferred to ambient end devices where real-time classification occurs. Typically, these end devices have lower computational power and memory than servers, rendering them unsuitable for training machine learning or deep learning models. However, these devices’ internal processors and microphones are well-suited for executing such models. The performance achieved is contingent upon the specific hardware components of each device. Table 3 presents the ambient terminals employed in this study and their respective specifications. The table encompasses two distinct types of devices chosen to represent each category.
The selection of end devices covers various categories, encompassing multiple operating systems, performance levels, and storage capacities. As a result, the chosen range serves as a representative cross-section of the available ambient end devices.
Due to these end devices’ distinct architectures and operating systems, specific methods and requirements are necessary for utilizing the trained models. The fundamental prerequisites for deploying the ported models on the end devices include the frameworks openSMILE and TensorFlow/TensorFlow Lite alongside the Python programming language.
To measure and evaluate the performance of the prototypes, appropriate metrics are employed, focusing on real-time capability and classification success.
In contrast, the ISO/IEC 2382:2015 standard defines real-time as the “processing of data by a computer in connection with another process outside the computer according to time requirements imposed by the outside process” (ISO/IEC JTC 2015). Thus, it is apparent from this definition that specifying an exact time in seconds or milliseconds is not feasible. Instead, the external process defines the real-time capability, which may include human perception. Human perception is sensitive in linguistic communication, as pauses of a few milliseconds can be subjectively perceived as interruptions. Vogt et al. [33] suggest that subjective interruption is perceived after 1000 milliseconds. Zhang et al. [34] report that neural networks for image classification require a range of 15.2 to 184 milliseconds for processing, with input dimensions similar to the Deep Learning method utilized in this study (224 × 224 × 3). Furthermore, Liu et al. [35] state that compressed neural networks require only 103 to 189 milliseconds for processing on ambient devices such as smartphones. Consequently, in this prototyping without employing compressed methods, a measured duration of fewer than 1000 milliseconds is considered real-time.
Confusion matrices are commonly employed for representing and evaluating classification problems in machine learning. These matrices juxtapose the model’s predictions with the actual states. Figure 3 illustrates an example of a confusion matrix, depicting the four potential outcomes. The primary distinction lies in whether the model’s prediction aligns with reality or deviates from it.
The overall accuracy of the respective machine learning system is calculated from the confusion matrix and represented in decimal numbers, where the value 1.0 represents the maximum and the value 0.0 the minimum. The accuracy indicates the proportion of correct predictions among all predictions of the model and is determined using the following formula:
\[ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{False Positives} + \text{True Negatives} + \text{False Negatives}} \]
For comparison, the previously mentioned CNNs are used, whereby the highest accuracy achieved in each case is listed in Table 4. The CNNs mentioned are sorted chronologically by publication date within the table. Besides MobileNetV2, the cited papers do not specify the machine used to generate results. Therefore, for the time being, it is assumed that the results of the CNNs were generated on cloud-like servers, similar to what is described by Tariq et al. [15]. Since a direct comparison of server-generated results with terminal device-generated results is not possible, the subsequent interpretation of the results of this study is limited.
Additional metrics are utilized for neural networks to measure and evaluate training results. These include the training and validation accuracy as well as the training and validation losses. Training and validation accuracy are represented as decimal values, ranging between 0.0 and 1.0, where
1.0 signifies the highest accuracy. Similarly, training and validation losses are expressed as decimal numbers, with no upper limit but a minimum value of 0.0 representing the optimal loss. Consequently, Table 4 can also be applied to assess the accuracy measurements of the neural networks in this context. However, in evaluating CNNs, the focus is primarily on accuracy, rendering an evaluation or classification of training losses unnecessary. Accuracy measurement and the creation of confusion matrices occur immediately after training on the server, unlike the measurement of classification time.
Table 4. Comparison of prediction accuracy of known CNN models.
cific application is challenging. Energy consumption can be estimated indirectly by measuring processor utilization. Thus, the difference in measured processor utilization, represented as a percentage, before and during classification is utilized as a metric in this context.
The overall accuracy metric is related to the model and, therefore, independent of the hardware utilized. However, metrics such as time, memory consumption, energy consumption, and processor utilization are hardware-dependent. The metrics mentioned above are evaluated using the hardware listed in Table 3 in the subsequent analysis.
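As an illustration of the accuracy metric defined earlier, the following minimal Python sketch computes the overall accuracy from a confusion matrix. It assumes that rows hold the actual classes and columns the predicted classes; the function name overall_accuracy and the example counts are hypothetical and are not taken from this study. In the two-class case, the diagonal sum equals True Positives + True Negatives, and the same calculation carries over to a seven-class matrix.

```python
def overall_accuracy(confusion):
    """Return the share of correct predictions in a square confusion matrix.

    Rows are the actual classes and columns the predicted classes, so the
    diagonal holds the correct predictions.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no predictions")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical counts for illustration only (not results from this study).
binary = [
    [40, 10],  # actual class 0: 40 predicted correctly, 10 misclassified
    [5, 45],   # actual class 1: 5 misclassified, 45 predicted correctly
]
print(overall_accuracy(binary))  # 0.85
```

Because the value depends only on the counts in the matrix, it is independent of the hardware on which the model is executed.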
training partition is used for model training, while the test partition validates the training results.
The relevant hyperparameters for training are optimized and determined by the algorithm itself. Initially, four hyperparameters with specified value ranges are provided. These include the selection of available SVM kernels (polynomial, linear, sigmoidal, and radial basis function), a regularization parameter ranging from 10^-3 to 10^2, and a degree parameter ranging from zero to nine for the polynomial kernel. The algorithm optimizes and applies various combinations of hyperparameters during training on the training partition. Following the training phase, validation is performed using the training dataset.
Upon completion of training, the machine learning system can classify new, unknown data based on the learned generalization. The system is connected to a microphone, which records human speech at 1024 frames per buffer every three seconds. The recorded audio is stored locally in a 16-bit WAV format with a sampling rate of 16000 Hz and a mono channel. The stored file is then read by the machine learning system and processed using YAMNet. If the classification result from YAMNet indicates “human speech”, the file is further processed using openSMILE to obtain eGeMAPS features. Similar to the training phase, this dataset is normalized and passed to the SVM for emotion classification. The classification is performed immediately, and the process continues in a continuous three-second cycle, processing the following files until manually terminated.
Figure 4 provides a schematic overview of the mentioned processing steps, indicating the sequence of individual actions. Not all processing steps are executed on the same hardware, and the figure specifies which steps are performed on the server and which on the end device.
Figure 4. Schematic representation of the processing steps of the machine learning method.
The deep learning algorithm is based on the same data corpus to ensure a subsequent like-for-like comparison of both approaches. The goal is to generate an executable model for subsequent porting to the end devices. As an alternative to the machine learning method, a CNN acts as both feature extractor and classifier. In this context, the creation and training of the CNN are based on TensorFlow [42]. Since a CNN expects image files instead of audio files as input, it is first required to generate corresponding representative spectrograms from audio data.
The input for the CNN consists of Mel spectrograms derived from the spectrogram representations of the audio. Speech recordings of different lengths therefore result in spectrograms of various sizes. However, since it is necessary to always use identically sized spectrograms for training the CNN, the audio files must be read in and processed with a fixed window. To ensure a subsequent comparison, the first three seconds of the audio files are read in, of which only two seconds are processed with an offset of half a
second. If an audio file is shorter than three seconds, the content of the file is duplicated until the minimum length is reached. To generate the spectrograms, the final two-second audio file is transformed using Fast Fourier Transform with a window size of 512 milliseconds and a hop size of 256 milliseconds between windows. From this spectrogram, the Mel spectrogram is derived with 128 Mel filters, a minimum frequency of 0 Hertz, and a maximum frequency of 8000 Hertz. Finally, this Mel spectrogram is plotted on a dB scale with 80 dB and the magma color scheme and is available for subsequent classification. The generated dB-scaled Mel spectrogram, including its intermediate stages, is visually presented in Figure 5. This way, the entire data corpus is preprocessed and then split again into a training and a test data set in a ratio of 80 to 20.
Figure 5. Mel spectrogram generation in individual steps visualized with TensorFlow and Matplotlib.
The Mel spectrograms must be normalized before they can serve as input data to the CNN. In this method, normalization consists of importing the image files with fixed dimensions of 224 × 224 × 3 pixels and then dividing each pixel value by a factor of 255. The dimensions of 224 × 224 × 3 pixels have been proven in image recognition by CNN since AlexNet, which is why they are also used here. The division by 255 is necessary because neural networks are known to operate on values from zero to one, and thus the pixel values are normalized.
Training a neural network from scratch is computationally intensive, time-consuming, and involves significant data, so transfer learning is used here. Transfer learning for neural networks consists of removing the output layer of a pre-trained neural network and replacing it with new output layers of its own, which act as classifiers. MobileNetV2, which is pre-trained on ImageNet (see footnote a), is this method’s base model for transfer learning. MobileNetV2, the training base, was initially designed for object recognition and execution on mobile devices. The base model is then extended by three new output layers: a GlobalAveragePooling2D layer, a dropout layer with a rate of 0.2, and a fully connected layer with a softmax activation function, which is used when the number of classes is greater than two. The neural network is then optimized using the Adam optimization algorithm [43] and an initial learning rate of 10^-5. Furthermore, categorical cross-entropy is used as a loss function, which quantifies the differences between two probability distributions in prediction. Finally, the model is trained in two phases of equal size to apply transfer learning. First, the new model is trained for 50 initial epochs with the 154 layers of the base model and their weights untrainable, which transfers the experience of the base model to the task. Only the three newly added layers are trainable in this phase. Subsequently, the model is trained for another 50 epochs, this time with 54 trainable and 100 untrainable layers, which is called fine-tuning in the corresponding literature. Each epoch is run with 100 training steps and ten validation steps. The training and validation data are read into the model training with a batch size of 16.
a ImageNet is an image database consisting of over 15 million labeled, high-resolution natural images with approximately 22000 categories.
Figure 6 schematically visualizes the described sequence of the deep learning process. In addition to the individual processing steps and their arrangement, it is also apparent here which processing steps are executed on the server or the end device. The
similarities and differences between this diagram and Figure 4 become apparent. The diagrams outline the appropriate processing steps, sequence, and execution location. Furthermore, the optimization of the hyperparameters and the general parameterization of the models are part of the training and are, therefore, not listed in either diagram. It can also be seen from the comparison that additional work steps are necessary for the deep learning method before the neural network training is started. The effects of the extra steps on the result will be discussed later.
While describing the results of both prototypes, a distinction is made between the generation of the executable model, including its training, and the real-time classification by the same model.
The prototype is a supervised machine learning method using a support vector machine as a classifier for emotion determination. The algorithm is trained on the five databases to generate an executable and portable model. The training of this algorithm, including the optimization of the hyperparameters, takes about 96 hours. The hyperparameters selected and optimized by the algorithm are the radial basis function kernel, the regularization parameter with a value of 10^1, the gamma with 10^-2, and the degree parameter with a zero value.
The accuracy after the model training is indicated by the confusion matrix, shown in Figure 7, for the test partition of the trained model and the classification report based on it. In the former, the seven considered emotions are listed vertically on the left edge as actual values and horizontally on the bottom edge as values predicted by the model. Furthermore, it can be seen from the marked green fields that the model’s prediction agrees with the actual values in most cases. Those correct predictions represent the true positives. The remaining whitish areas represent the false positives since the predicted emotion classes do not match the real-world conditions.
Figure 7. Confusion matrix of the machine learning process.
The model’s overall accuracy is calculated from the confusion matrix using the above formula, which is
already included in the classification report and vis- Furthermore, measurement is also performed on
ualized in Figure 8, together with other metrics. For the Raspberry Pi. Here, the arithmetic means of 15
example, in addition to the overall accuracy rates, observed cycles is 4.33 seconds for audio and emo-
accuracy rates for individual emotions are also pres- tion classification. Also, at this point, the result is
ent. Figure 8 shows that the overall accuracy of this compared with a pure emotion classification without
procedure is 0.77. With an achieved value of 0.77 prior audio classification. This cycle takes an aver-
and 77%, respectively, the model is ranked between age of 0.337 seconds, calculated from 15 observed
the CNN MobileNetV2 and LeNet based on Table 4. cycles. In conclusion, based on the set benchmarks,
The confusion matrix and the classification report the emotion-only classification is declared real-time
are valid for the machine learning model and thus capable, but the combined audio and emotion classi-
independent of the end device used. fication with a time of 4330 milliseconds is not. Fur-
ther memory measurement indicates an increase in
memory usage after starting classification from 415
megabytes to 586 megabytes, an increase of 4.4
percentage points for a total availability of 3838 megabytes,
from 10.8% utilization before classification to 15.2%
during it. On the other hand, processor utilization
increases by 26.4 percentage points during execution, from
0.7% utilization before classification to 27.1%.
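The memory figures above can be checked with a short calculation. The following Python sketch is only an illustration: the helper name utilization_change is hypothetical, and the inputs are the megabyte values reported for the Raspberry Pi. It expresses memory use before and during classification as a share of the total memory and reports the change in percentage points.

```python
def utilization_change(before_mb, after_mb, total_mb):
    """Return utilization before, utilization after, and their difference.

    The first two values are percentages of the total memory; the third is
    the change in percentage points.
    """
    before_pct = 100.0 * before_mb / total_mb
    after_pct = 100.0 * after_mb / total_mb
    return before_pct, after_pct, after_pct - before_pct


# Values reported in the text for the Raspberry Pi (machine learning method).
before, after, delta = utilization_change(415, 586, 3838)
print(f"{before:.1f}% -> {after:.1f}% (+{delta:.1f} percentage points)")
```

Depending on whether values are rounded or truncated, the last digit may differ slightly from the figures reported in the text.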
The second method described in 4.3, a CNN, is
Figure 8. Classification report of the machine learning method. used based on the pre-trained MobileNetV2 network.
An exemplary metrics measurement is performed The CNN developed in this method is also trained
on the notebook mentioned in Table 3. The estimat- with the same intention on the five databases. The
ed time is measured within the model between two training time of the neural network with a total of
cycles. A cycle consists of a speech recording, an 100 epochs is about six hours. As described above,
audio classification, and an emotion classification, the training proceeds in two identical phases of 50
depending on this result and its output. The average epochs each, one step for initial learning and one
estimated time for 15 observed cycles is 0.799 sec- stage for finetuning the model. Following each train-
onds. For comparison purposes, processes without ing epoch, the training and validation accuracy and
audio classification are also performed, where emo- the training and validation loss are reported. The pre-
tion classification is applied to each incoming audio liminary result after the first 50 epochs is graphically
signal. The average time required here is 0.114 sec- visualized in Figure 9. On the left side is the training
onds, calculated from 15 observed cycles. and validation accuracy course. The training and
In conclusion, based on the criteria set, 144 mil- validation loss, each for 50 epochs, is shown on the
liseconds for emotion classification alone and 799 right side. Each of these four parameters is shown as
milliseconds for emotion classification, including a separate curve.
previous audio classification, are declared to be re- The accuracy curve shows that the training ac-
al-time capable. The memory requirement increases curacy starts at around 0.16 and increases to ap-
from 9.8 gigabytes to 10.1 gigabytes after starting proximately 0.42 by the 50th epoch. The validation
the classification, which is derivatively an increase accuracy also starts at about 0.16 and reaches an ac-
from 62% to 64% utilization. On the other hand, the curacy of about 0.5 at the 50th epoch. It is noticeable
processor utilization increases by 17% points after here that the validation accuracy is always above the
starting the classification, from an average of 11% to training accuracy. This phenomenon is due to the pe-
around 28%. culiarity of transfer learning. When training a neural
network without transfer learning, the training accuracy is always above the validation accuracy.
Figure 9. Training result of the CNN after initial 50 epochs with transfer learning.
Similar behavior can be observed in the course of the loss curve. The training loss curve starts at a loss of around 2.3 and drops to about 1.55 by completing the 50th epoch. On the other hand, the validation loss curve begins at 2.2 and drops to about 1.4 by the 50th epoch. Once again, it is characteristic of transfer learning that the validation curve always lies below the training curve.
During the subsequent fine-tuning of the CNN, more trainable layers and, thus, more trainable weights are available. The model also has more possibilities to optimize performance. The result of the fine-tuning is illustrated in Figure 10.
Figure 10. Training result of the CNN after 100 epochs using transfer learning.
Fifty further fine-tuning epochs extend the result shown in Figure 9. A vertical straight line at the 50th epoch shows at which point the fine-tuning phase starts. After the beginning of the fine-tuning stage, the training accuracy curve drops to 0.35 but then takes a steeper course than before and reaches the maximum of 1.0 from the 90th epoch, at which point the curve stagnates until the 100th epoch. On the other hand, the validation accuracy curve initially drops to around 0.45 after the start of the fine-tuning phase. Still, it rises again and reaches an accuracy rate of about 0.7 by the 100th epoch.
A change can also be observed in the loss curves after the start of fine-tuning. For example, the training loss curve rises to 1.75 after the start of fine-tuning but then drops more sharply than before, reaching a value of 0.1 at the 100th epoch. The validation loss curve does not rise after the start of fine-tuning but drops to a value of 0.8 by the 80th epoch, where the local minimum of the curve is located. By completing the 100th epoch, the curve rises to about 1.0.
Ultimately, it is not the training accuracy but the validation accuracy that is decisive for the correct classification. With an accuracy of 0.7 and 70%, respectively, this result ranks below MobileNetV2 based on Table 4.
Analogously, the CNN of this method is trained a second time on the five databases but without applying transfer learning. In this training, the CNN is also trained for 100 epochs, but in only one phase and with all layers trainable. With its 100 epochs, this training has a running time of about six hours, as before. The result of this training of 100 epochs without transfer learning is visualized in Figure 11. Here, it can be seen that the training accuracy curve starts at a value of 0.22 and rises to the maximum of 1.0 by the 50th epoch. There the curve remains until the 100th epoch. The validation accuracy curve also starts at a value of 0.22 and increases to 0.7 by the 50th epoch, where it remains with fluctuations until the 100th epoch. The training loss curve begins at the value of 2.0 and steadily decreases until it reaches the minimum of 0.0 at about the 70th epoch and remains there.
On the other hand, the validation loss curve starts with a value of about 2.2 and falls with a fluctuating downward trend until the 35th epoch. There, the curve has reached its local minimum of 0.9. However, the curve rises again until the 100th epoch to about 1.2.
Figure 11. Training result of the CNN after 100 epochs without using transfer learning.
The advantages of transfer learning become apparent when comparing Figure 10 with Figure 11. Counting from the start of fine-tuning, the validation accuracy curve has already reached the value of 0.7 after 20 epochs. In contrast, the validation accuracy curve without transfer learning has only reached this value after about 50 epochs. The advantage can also be seen in that the validation accuracy curve for the method with transfer learning has a higher slope than the validation accuracy curve without transfer learning. Finally, it can be seen that the validation accuracy curve for the process with transfer learning starts higher on the Y axis with a value of 0.45 than the curve without transfer learning with a value of 0.22.
In contrast to an SVM, the classification result in a neural network does not output a single value but a value range with seven entries corresponding to the number of classes present. The entries in this value range represent the probabilities with which the model predicts one class each. The individual entries
can assume a value between 0.0 and 1.0, with the but the previous mark with 4427 milliseconds is not.
sum of all entries in the value range again resulting Thus, implementing the prototype on the Raspberry
in 1.0. The emotion class with the highest probability Pi lacks real-time capability. The memory used in-
value is the classified emotion. creases from 563 megabytes to 675 megabytes dur-
An exemplary metrics measurement is also ing runtime, a rise of 2.9% points for a total avail-
performed on the notebook mentioned in Table 3. ability of 3838 megabytes, from 14.7% utilization
Here, the time is also measured between two cycles, for the first time to 17.6% now. On the other hand,
whereby one consists of the audio and emotion clas- processor utilization increases by 22% points during
sification, including the output. The arithmetic mean execution, from 2.9% utilization for the first time to
of the estimated time is 0.856 seconds for 15 ob- 24.9%.
served processes. At this point, a comparison is also In this study, four core elements are to be noted as
made to a cycle without prior audio classification. findings. First, a tabular comparison of the results of
The time counted for this cycle is 0.119 seconds, the two methods used is provided in Table 5, where
also calculated from 15 observed cycles. With a time the metrics listed here represent the arithmetic mean
value of 119 milliseconds for a cycle without audio across all measured metrics.
classification and 856 milliseconds for a cycle with It can be deduced from the previous table that an
audio classification, respectively, the result is below SER system can distinguish between speech, non-
the set benchmarks and is therefore considered re- speech, and silence. To this end, the YAMNet neural
al-time capable. The memory requirement increases network, which is not a primary component of this
from 9.9 gigabytes to 10.6 gigabytes from the start of work and was not developed within this research,
classification. Relative to the total available memory, is used within the prototypes. Nevertheless, the
this is an increase of 4% points, from 63% utilization YAMNet neural network is part of both prototypes,
for the first time to 67%. The processor utilization which are thus able to classify audio inputs into dif-
also shows an increase of 16% points, from 15% to ferent categories, such as music, meowing, barking,
31% for the first time. silence, or even speech.
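The timing measurements described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the measurement code of this study: the names time_cycles and is_real_time, the run_cycle callback, and the sample durations are hypothetical. Only the 1000-millisecond benchmark and the use of the arithmetic mean over the observed cycles come from the text.

```python
import time
from statistics import mean

REAL_TIME_LIMIT_MS = 1000  # benchmark adopted in the text


def time_cycles(run_cycle, n_cycles=15):
    """Run one classification cycle n_cycles times and return durations in seconds."""
    durations = []
    for _ in range(n_cycles):
        start = time.perf_counter()
        run_cycle()  # hypothetical callback: record, filter, classify, output
        durations.append(time.perf_counter() - start)
    return durations


def is_real_time(durations_s, limit_ms=REAL_TIME_LIMIT_MS):
    """Return the mean cycle time in milliseconds and whether it is below the limit."""
    mean_ms = mean(durations_s) * 1000.0
    return mean_ms, mean_ms < limit_ms


# Hypothetical per-cycle durations in seconds, for illustration only.
print(is_real_time([0.84, 0.86, 0.87])[1])  # True: mean is below 1000 ms
print(is_real_time([4.40, 4.43, 4.46])[1])  # False: mean is above 1000 ms
```

Using a monotonic clock such as time.perf_counter avoids distortions from system clock adjustments during the measurement.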
Based on the implementation of the prototype Concerning the databases used in this research, it
on the Raspberry Pi, the average time required for a was shown that they contain various emotional audio
cycle, including audio and emotion classification, is files, including the six basic emotions mentioned by
around 4.43 seconds, calculated from 15 observed Ekman (1971), plus further emotional stimuli such
cycles. In comparison, emotion recognition without as tiredness or boredom. Neutral emotion can also
prior audio classification requires an arithmetic mean be found in the majority of the databases. The proto-
of only 0.393 seconds, again calculated from 15 types trained on these databases are thus able to dis-
practical cycles. With a needed time of 393 millisec- tinguish between the seven emotions. Therefore, an
onds, the latter result is below the set benchmarks, SER system can distinguish between positive, nega-
36
Journal of Computer Science Research | Volume 05 | Issue 03 | July 2023
tive, and neutral emotions but is not limited to this. the set index of 1000 milliseconds. Additionally, the
Instead, such a system can perform a more detailed SER, including initial audio classification with an
categorization of speech input into individual emo- average time of 799 milliseconds, is below the set of
tions with an accuracy of 77% for machine learning 1000 milliseconds. Related to Deep Learning, the av-
and 70% for deep learning. erage time for a cycle without an audio classification
The third finding relates to the feasibility of an is 256 milliseconds. Meanwhile, the average time for
SER system on ambient terminals but is distin- a cycle with audio and emotion classification is 2642
guished between the phases and the ambient end milliseconds. According to the results, the fourth
devices used in Table 3. Due to the intensive com- finding is that the choice of the method determines
puting power and high runtime, the one-time model whether the real-time capability is given or not.
training step must be executed on a server. There- However, since there is no porting and, therefore, no
fore, therefore not feasible on an end device. The results regarding the machine learning method, there
subsequent real-time classification phase is based is the possibility that this last finding is falsified.
on the trained model and can be performed multiple
times on a terminal device. The prototype porting to 5. Discussion
a notebook is feasible since notebooks generally sup-
Both machine learning and deep learning ap-
port corresponding Python runtime environments.
proaches in this study rely on a shared data corpus,
Thus, running the emotion classification is possible
which is obtained and selected based on predefined
on a notebook regardless of the method used. Porting
criteria outlined in the existing literature. These crite-
the prototypes to a Raspberry Pi, on the other hand,
ria encompass several factors, including a minimum
is more complex since, on the one hand, Python audio duration of one second and a maximum dura-
runtime environments are supported in principle. tion of 20 seconds. The choice of one second as the
Still, on the other hand, the necessary frameworks, minimum duration is justified because shorter audio
openSMILE and TensorFlow, are not available for files generally lack sufficient information. Converse-
Raspberry Pi’s. Alternatively, for TensorFlow, the ly, selecting a maximum period of 20 seconds is
porting of the Deep Learning procedure is done with somewhat arbitrary, as durations of 10, 15, or 30 sec-
TensorFlow Lite, which runs the trained model on onds could have also been considered. However, as
the end device. In the absence of openSMILE com- depicted in Figure 2, most audio files in the chosen
patibility with 32-bit operating systems and a lack of databases have durations of less than 10 seconds.
a qualitative alternative, the porting of the machine Another criterion is the exclusion of non-spoken
learning procedure is omitted at this point. In sum- sentences since the prototypes focus on speech-
mary, it can be stated that realizing an SER system emotion recognition (SER) rather than general
using edge computing is only possible to a limited audio classification. Therefore, the audio files must
extent. While they assist in executing deep learning exclusively contain speech, even though some music
approaches and neural networks on the end devices, may include spoken segments accompanied by
this does not always apply to the machine learning instruments.
method. Furthermore, it is essential to note that this study
Regarding the real-time capability of the classifi- does not address emotion recognition in music,
cation system, it is necessary to differentiate which although it could be a potential avenue for future
method is used and whether only SER or SER plus research. Including audio files with background noise
prior audio classification is considered. Concerning is essential, as real-life communication often occurs
the machine learning method, the SER system re- in noisy environments. While background noises are
quires an average of 114 milliseconds for pure SER prominent in music, they play a secondary role in
without prior audio classification and is thus below speech-related scenarios. Therefore, incorporating
databases containing audio data with such background noise would be valuable for enhancing the research.

The criteria specify that the files must be in an auditory or audiovisual format, but there are no restrictions regarding file type, sampling rate, or dubbing. Although limiting the selection to purely auditory file formats may influence the choice of databases, it would not impact the subsequent procedures, as all audio files are transformed to a standardized format and file type before model training.

The native language used in the audio files is also not a selection criterion, as indicated in Table 1, which demonstrates the inclusion of German, English, and Italian data in the selected databases as well as audio files in other languages such as Turkish, Danish, or Chinese. Since the six basic emotions described by Darwin (1873) and Ekman (1971) are expressed similarly across cultures, the spoken language does not significantly affect the Mel spectrograms, model training, or results. However, it is crucial to include both male and female voices when selecting databases. Failing to meet this criterion could impede data generalization and lead to overfitting or underfitting of the model.

Open accessibility and availability of labeled data are mandatory for data collection. Without open access to the databases, it would be impossible for third parties to reproduce the procedures and results of this study. Moreover, the absence of labeled data would render supervised machine-learning algorithms infeasible. Investigating the impact of different database quantities or compositions, including language variations, on the outcomes of this research can be pursued in future investigations.

Other methodologies that equally impact both procedures involve dividing the data corpus into training and validation sets using an 80 to 20 ratio. Preprocessing the audio files commonly entails transforming them to a 16,000-hertz format with a mono channel, as frequently reported in the literature.

In the machine learning method, the hyperparameters and their value ranges are defined at the start of the training process. The algorithm then independently determines the optimal values based on these predefined hyperparameters. The selection of these hyperparameters is based on the research conducted by Mao et al. [27]. However, alternative parameters or value ranges described by Wang et al. [44] or Cummins et al. [45] could also be considered. Feature extraction utilizes the openSMILE framework, particularly the eGeMAPS feature set, which aligns with its usage in Cummins et al.’s work.

In contrast, the deep learning method employs explicit hyperparameters. The training process consists of two phases, each comprising 50 epochs, as established by Tan et al. Alternatively, Zhang et al. utilized a batch size of 30, SGD as the optimization algorithm, and a learning rate of 10^-3 as hyperparameters. However, standard hyperparameters employed by Lim et al. include SGD with a learning rate of 10^-2, a dropout of 0.25, and a Rectified Linear Unit activation function. Discrepancies also exist in the generation of Mel spectrograms, as mentioned in section 2.3.3 and the relevant literature. For instance, Zhang et al. used 64 Mel filters for audio classification within a frequency range of 20 to 8000 hertz, utilizing a 25-millisecond Hamming window with a ten-millisecond overlap for each window. The variation in Mel spectrogram generation can be justified since speech emotion recognition (SER) and speech recognition are distinct processes, as noted by Zhang et al. Nonetheless, the use of Mel spectrograms aligns with the current state of research. Alternatively, MFCC can also be applied within the deep learning procedure.

The base model used for transfer learning in this study is MobileNetV2. However, the literature also suggests considering the CNNs ResNet50 or SqueezeNet [46]. In this study, only the last 54 layers out of 154 are fine-tuned. Optionally, different numbers of trainable layers, such as 32 or 16, can be considered. Exploring the impact of modifications to these hyperparameters on the results can be a subject of future research.

A conclusion about which procedure is superior cannot be drawn solely from comparing the approaches and their results. Table 5 does
not provide conclusive evidence to support the dominance of either procedure. Examining the training time reveals that the Machine Learning approach requires approximately 16 times the duration of the Deep Learning approach. However, the Machine Learning model exhibits higher accuracy, faster classification, and lower increases in processor load and memory requirements.

The speed advantage of the machine learning method in real-time classification arises from the utilization of distinct emotion recognition algorithms. In this case, audio classification is not considered, as it is identical in both approaches and precedes emotion recognition. In the Machine Learning model, speech input undergoes openSMILE processing, normalization, and subsequent classification using SVM. Conversely, in the Deep Learning model, the speech input is initially transformed into a spectrogram, stored, normalized, and processed through all neural network layers. The storage and retrieval of spectrograms involve additional read-and-write transactions that are not required in the machine learning method, thereby impacting the speed of the Deep Learning model. However, the difference in speed is marginal and imperceptible to humans, as such disparities occur in milliseconds.

Furthermore, Table 5 highlights that the pure emotion classification alone operates, on average, ten times faster than the combined audio and emotion classification, regardless of the chosen method. This discrepancy significantly influences the declaration of real-time capability, particularly concerning porting to the Raspberry Pi. While pure emotion classification can be deemed real-time capable, the same cannot be said for the combined type due to its extended runtime. The disparity likely stems from the CNN YAMNet employed for audio classification, whose external development falls outside the scope of this study. Consequently, a comprehensive analysis of the time difference and its origin cannot be provided. Therefore, optimizing speech classification, which is not extensively examined in this paper, could considerably reduce the overall process latency.

When examining the individual process steps in Figures 4 and 6, no direct conclusion regarding training duration is apparent. However, the Deep Learning method necessitates more process steps than the Machine Learning method. As mentioned earlier, the models’ parameterization is not depicted in these figures, as it is part of the mapped training. Specifically, the choice between fixed hyperparameters and ranges of hyperparameter values significantly impacts training time. In the Deep Learning method, training duration also varies based on parameters such as batch size, number of epochs, and number of steps per epoch. In the Machine Learning method, training duration is determined by the number of hyperparameters and their value ranges. The resulting hyperparameters are determined through the algorithm’s processing and optimization of potential combinations. In contrast to explicit parameterization, processing all conceivable combinations is likely responsible for the disparity in training duration. Consequently, this implies that the overall accuracy of the machine learning model surpasses that of the deep learning model due to the processing of all combinations.

Moreover, the results in Figure 10 reveal that the validation loss curve increases again after epoch 80. A similar phenomenon is observed in Figure 11 from epoch 50 onwards. However, the validation accuracies in these figures do not exhibit the same increase. The rising course of these curves may indicate overfitting, which warrants further investigation in future research.

The higher memory requirement in the Deep Learning model is attributed to the necessity of storing the generated spectrogram, in addition to the primary audio file, for emotion recognition. Another contributing factor is the higher number of parameters in the CNN than in an SVM, which are also stored in memory.

Contrary to our expectations, the Machine Learning method exhibits higher processor utilization than the Deep Learning method. This phenomenon is likely due to the improper timing of the recording. However, considering the marginal variances, the
21.7% difference in processor utilization between the two methods is negligible.

It should be noted that both models consider the presence of a microphone as an essential prerequisite. Unlike multimodal emotion recognition, a unimodal SER system does not require additional provisions such as cameras. Most ambient terminals are equipped with native microphones but lack native cameras, as seen in smart speakers or smart TVs. Therefore, the prototypes developed in this study are suitable for porting to such devices.

Regarding the theoretical foundations of this research, SER plays an increasingly crucial role in human-computer interaction (HCI). HCI occurs within the context of remote participation, which is a component of the growing computer-supported hybrid communication in everyday life. Consequently, SER holds greater significance in everyday life and is the subject of ongoing research. Similar to this study, there are investigations into real-time SER [33] and edge computing. However, no research exists on SER applications on edge devices as defined by Shi et al. (2016). Thus, the combination of SER, edge computing, and real-time processing explored in this study represents a novel research extension. To maintain the focus of this work, restrictions are deliberately made. However, other external limitations also limit this work, which will be explained in more detail below. Following Ekman (1971), only the six basic emotions plus a seventh, neutral emotion are considered, which is why emotions such as tiredness or boredom are excluded in this work. Accordingly, the data acquisition is restricted to the seven emotions mentioned, further limiting the selection of suitable databases.

Furthermore, the dimensions of arousal and valence are also omitted. These dimensions can be considered in continuing work but do not play a role in the mere emotion recognition in this research. This study therefore acknowledges that these dimensions exist but does not address them further.

A technical limitation, however, is the mapping of processor and memory utilization. Since the system constantly updates these two indicators, it is impossible to identify the exact utilization. Thus, the documented processor and RAM utilization only represents a snapshot, not a calculated average value. Furthermore, the maximum number of simultaneously recognizable emotions is another technical limitation. This paper assumes that only one emotion is contained in a sentence or voice recording. As the length of the sentences and audio files increases, the probability that multiple emotions are contained increases. However, the machine learning method using the SVM can only classify one emotion, which is why the length of the voice recording is limited to three seconds.

On the other hand, the CNN in the Deep Learning method calculates a probability for each of the seven emotions. For this reason, this method can potentially identify multiple emotions within one speech recording. Another technical limitation is the applicability of the prototypes to only one person. The model training is based on emotional content in the audio files of the acquired databases. Only individual speakers can be heard in each audio file, so the prototypes can apply emotion recognition only to individual speakers. When multiple individuals speak simultaneously, the prototypes cannot distinguish between individuals and their emotions. The extension to multi-person recognition goes beyond the definition of these prototypes and therefore needs to be investigated in further work.

Furthermore, the porting of prototypes is also limited. For example, only two device categories were selected since porting to more devices would exceed the scope of this paper. For this reason, porting to smartphones or tablets, for example, is not included.

6. Conclusions

The outcomes and interpretations presented in this study provide compelling evidence that the developed prototypes are functional and well-suited for practical applications. These Speech Emotion Recognition (SER) systems have the potential for various use cases and can offer extensions to existing products and services. Here are some examples of application areas and their resulting benefits.

(i) Universal Application: SER applications can be utilized wherever speech plays a central role, such as call centers, radio broadcasts, podcasts, and television shows. New business models can be developed that merge physical and virtual presence. For instance, implementing an SER system in a smart speaker can detect vocal activity and emotions in a home environment. These emotions can be visually presented to the user and, with their consent, transmitted to the producer for product optimization, offering the user a premium in return. Similarly, SER can automate the editing of highlights in a broadcast sports game based on detected emotions. Such scenarios can be extended to internet-based broadcasts like Twitch or Netflix.

(ii) Real-time Audience Mood Capture: SER applications can capture the current mood of an audience in real time. Unlike the previous use case, where emotions are summarized over time, this approach focuses on determining emotion levels at precise moments. This can be valuable in political talks or product presentations, where immediate feedback on expressed emotions is crucial. By providing unbiased input to speakers, SER enables them to gauge audience response accurately. These techniques apply to physical, virtual, or hybrid forms of communication, further emphasizing the increasing trend of remote participation.

(iii) Individual-focused Applications: SER applications can cater to individual users, tailoring experiences based on their detected emotions. For example, a smart speaker or automobile with an SER system can adjust music or lighting according to the user’s emotional state. In gaming, the algorithm can offer in-game relief when anger is detected. Individualized advertising in social media or e-commerce platforms, varying prices dynamically based on emotional states (e.g., increasing costs for joyful emotions), is also a potential application.

This research investigated the ability of an SER system to distinguish speech, non-speech, and silence, as well as classify different emotions. The study involved a systematic literature review, developing two prototypes using machine learning and deep learning approaches, and training the models using a data corpus comprising five audio databases. Before being used for model training, the audio files underwent preprocessing, including conversion to a sampling rate of 16000 Hz and a mono channel.

In the machine learning approach, the openSMILE framework was employed for feature extraction, generating eGeMAPS features that were normalized and used for classification. A Support Vector Machine (SVM) served as the classifier. The model training took approximately 96 hours on a server, and while successful porting to a notebook was achieved, porting to a Raspberry Pi was unsuccessful. The prototype demonstrated the capability to identify different sounds in under 1000 milliseconds and classify seven emotions in the case of speech.

In the Deep Learning approach, audio files were transformed into Mel spectrograms, normalized, and used as input for a CNN implemented using TensorFlow. The CNN performed feature extraction and classification. The model training, including transfer learning, took around six hours on the server. The completed model was successfully ported to a notebook and a Raspberry Pi. The notebook achieved classification below 1000 milliseconds, while the Raspberry Pi required approximately 4427 milliseconds. The models’ computation time and classification accuracy were evaluated using the provided formula.

SER systems are embedded in Human-Computer Interaction (HCI) systems and can potentially be applied in everyday scenarios. The results of this study demonstrate that the technical feasibility of practical implementation is achievable, and several use cases described in this research can find real-world applications. Moreover, these findings highlight the growing relevance of SER in everyday communication, where remote participation is increasingly combined with a physical presence. Such SER systems have the potential to enhance human-machine interaction, making communication more human-like and intuitive. Based on this research, the acceptance and utilization of SER-enabled remote participation applications can be considered an extension of the fourth criterion of emotion recognition.

The results and discussions presented in this study can be further enriched and expanded through future research. Additional investigations could explore other emotions or broaden the scope of the utilized databases. Furthermore, examining arousal and valence dimensions would be valuable. Investigating the machine’s subsequent actions linked to recognized emotions within the SER system is another avenue worth exploring. For example, studying the most suitable lighting settings, color combinations, or music choices to support or counteract specific emotions based on the determined emotions could be interesting. This could involve studying music’s genre, volume, and beat rate and its relation to emotion recognition within songs. Combining both approaches, selecting songs based on identified emotions and playing them in response to human emotions, could provide an intriguing direction for further exploration.

Further research could also focus on the Deep Learning method, exploring different hyperparameters for model training and investigating modified transfer learning techniques. Multitask or semi-supervised learning could offer new perspectives in advancing SER research. Additionally, the limitations identified in this study open up opportunities for independent research and raise further questions. For instance, investigating whether an SER system can differentiate between multiple individuals based on speech or identify numerous emotions within a sentence could be explored. Exploring subjective user perception and experience could also be valuable. Lastly, the prototypes developed in this study were ported to two device categories, prompting the question of whether they can be extended to other device categories, such as smartphones or tablets, each with its diverse range of devices and operating systems.

Regarding the real-time capability of the prototype, it would be worthwhile to explore the use of digital signal processors and their potential for optimizing runtimes. Utilizing digital signal processors optimized for real-time functions like the Fast Fourier Transform in mobile devices like the Raspberry Pi could further enhance the prototypes’ real-time capability and overall performance.

In conclusion, this study demonstrates the theoretical and practical feasibility of real-time speech-based emotion recognition through edge computing. The implications of this research extend to practical applications and provide a foundation for future investigations.

Author Contributions

D.E.d.A. conceived the idea of researching a real-time processing method that captures and evaluates emotions in speech. R.B. and D.E.d.A. conceived the study. R.B. served as D.E.d.A.’s graduate advisor on his graduate thesis at the FOM University of Applied Sciences. All authors reviewed and approved the final manuscript.

Conflict of Interest

There is no conflict of interest.

Funding

This research received no external funding.

References

[1] El Ayadi, M., Kamel, M.S., Karray, F., 2011. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition. 44(3), 572-587.
DOI: https://doi.org/10.1016/j.patcog.2010.09.020
[2] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., et al., 2001. Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine. 18(1), 32-80.
DOI: https://doi.org/10.1109/79.911197
[3] Schuller, B.W., 2018. Speech emotion recognition: Two decades in a nutshell, benchmarks, and ongoing trends. Communications of the ACM. 61(5), 90-99.
DOI: https://doi.org/10.1145/3129340
[4] Kraus, M.W., 2017. Voice-only communication enhances empathic accuracy. American Psychologist. 72(7), 644.
DOI: https://doi.org/10.1037/amp0000147
[5] Akçay, M.B., Oğuz, K., 2020. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Communication. 116, 56-76.
DOI: https://doi.org/10.1016/j.specom.2019.12.001
[6] Dincer, I., 2000. Renewable energy and sustainable development: A crucial review. Renewable and Sustainable Energy Reviews. 4(2), 157-175.
DOI: https://doi.org/10.1016/S1364-0321(99)00011-8
[7] Chao, K.M., Hardison, R.C., Miller, W., 1994. Recent developments in linear-space alignment methods: A survey. Journal of Computational Biology. 1(4), 271-291.
DOI: https://doi.org/10.1089/cmb.1994.1.271
[8] Abbas, N., Zhang, Y., Taherkordi, A., et al., 2017. Mobile edge computing: A survey. IEEE Internet of Things Journal. 5(1), 450-465.
DOI: https://doi.org/10.1109/JIOT.2017.2750180
[9] Cao, K., Liu, Y., Meng, G., et al., 2020. An overview on edge computing research. IEEE Access. 8, 85714-85728.
DOI: https://doi.org/10.1109/ACCESS.2020.2991734
[10] Shi, W., Cao, J., Zhang, Q., et al., 2016. Edge computing: Vision and challenges. IEEE Internet of Things Journal. 3(5), 637-646.
DOI: https://doi.org/10.1109/JIOT.2016.2579198
[11] Lin, Y.L., Wei, G. (editors), 2005. Speech emotion recognition based on HMM and SVM. 2005 International Conference on Machine Learning and Cybernetics; 2005 Aug 18-21; Guangzhou, China. New York: IEEE.
DOI: https://doi.org/10.1109/icmlc.2005.1527805
[12] Nassif, A.B., Shahin, I., Attili, I., et al., 2019. Speech recognition using deep neural networks: A systematic review. IEEE Access. 7, 19143-19165.
DOI: https://doi.org/10.1109/ACCESS.2019.2896880
[13] Schuller, B., Batliner, A., Steidl, S., et al., 2011. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication. 53(9-10), 1062-1087.
DOI: https://doi.org/10.1016/j.specom.2011.01.011
[14] Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Computation. 9(8), 1735-1780.
DOI: https://doi.org/10.1162/neco.1997.9.8.1735
[15] Khalil, R.A., Jones, E., Babar, M.I., et al., 2019. Speech emotion recognition using deep learning techniques: A review. IEEE Access. 7, 117327-117345.
DOI: https://doi.org/10.1109/ACCESS.2019.2936124
[16] Hinton, G., Deng, L., Yu, D., et al., 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine. 29(6), 82-97.
DOI: https://doi.org/10.1109/MSP.2012.2205597
[17] Torrey, L., Shavlik, J., Walker, T., et al., 2010. Transfer learning via advice taking. Advances in machine learning. Springer: Berlin.
DOI: https://doi.org/10.1007/978-3-642-05177-7_7
[18] Eyben, F., Scherer, K.R., Schuller, B.W., et al., 2015. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing. 7(2), 190-202.
DOI: https://doi.org/10.1109/TAFFC.2015.2457417
[19] Ekman, P., 1971. Universals and cultural differences in facial expressions of emotion. Nebraska Symposium on Motivation. University of Nebraska Press: Nebraska.
[20] Siedlecka, E., Denson, T.F., 2019. Experimental methods for inducing basic emotions: A qualitative review. Emotion Review. 11(1), 87-97.
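As a supplementary illustration of the real-time findings in this article, the comparison of mean cycle times against the 1000-millisecond benchmark can be sketched in a few lines of Python. This is a minimal sketch, not the authors' evaluation code: the function and variable names are hypothetical, and attributing the 119 ms and 856 ms figures to the deep learning prototype on the notebook is an assumption inferred from the reported cross-device averages.

```python
from statistics import mean

# Benchmark stated in the text: a cycle must finish in under 1000 ms.
REALTIME_LIMIT_MS = 1000.0

# Mean cycle times in ms as reported in the text (15 observed cycles each).
# Assumption: 119/856 ms belong to the deep learning prototype on the notebook.
reported_ms = {
    ("machine learning", "notebook"): {"emotion_only": 114.0, "audio_plus_emotion": 799.0},
    ("deep learning", "notebook"): {"emotion_only": 119.0, "audio_plus_emotion": 856.0},
    ("deep learning", "raspberry_pi"): {"emotion_only": 393.0, "audio_plus_emotion": 4427.0},
}


def is_realtime(cycle_ms, limit_ms=REALTIME_LIMIT_MS):
    """Return True if a mean cycle time stays below the benchmark."""
    return cycle_ms < limit_ms


def verdicts(table):
    """Map each (method, device, mode) triple to its real-time verdict."""
    return {
        (method, device, mode): is_realtime(ms)
        for (method, device), modes in table.items()
        for mode, ms in modes.items()
    }


def method_average(table, method, mode):
    """Average one mode over all devices to which a method was ported."""
    return mean(modes[mode] for (m, _), modes in table.items() if m == method)


results = verdicts(reported_ms)
dl_emotion_avg = method_average(reported_ms, "deep learning", "emotion_only")
dl_combined_avg = method_average(reported_ms, "deep learning", "audio_plus_emotion")
```

Under these assumptions, the deep learning averages come out at 256 ms and 2641.5 ms, consistent with the 256 ms and 2642 ms quoted in the findings, and only the combined audio and emotion cycle on the Raspberry Pi exceeds the benchmark.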
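The Discussion of the preceding article attributes the longer training time of the machine learning method to the processing of all hyperparameter combinations, in contrast to the single explicit configuration of the deep learning method. The following minimal sketch illustrates why a search over value ranges multiplies the number of model fits. The grid below is purely hypothetical (the study's actual value ranges are not reproduced here); only the 96-hour and 6-hour training times are taken from the text.

```python
from itertools import product

# Hypothetical value ranges for an SVM search; NOT the ranges used in the study.
svm_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
    "kernel": ["rbf", "linear", "poly"],
}


def count_candidates(grid):
    """Number of combinations an exhaustive search has to train."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


def enumerate_candidates(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


n_grid = count_candidates(svm_grid)  # 4 * 4 * 3 combinations
n_fixed = 1                          # explicit hyperparameters: one configuration

# Training times reported in the text (hours on the server).
ml_hours, dl_hours = 96, 6
ratio = ml_hours / dl_hours          # the "approximately 16 times" from the Discussion
```

With this illustrative grid, an exhaustive search trains 48 candidate models where an explicit parameterization trains one, which is the kind of multiplier the article offers as the likely cause of the 16-fold difference in training time.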
REVIEW
ABSTRACT
Recent advancements in technology have opened up new avenues for educators to facilitate teaching and leverage
more learning access in the digital age. As the demand for computational skills continues to grow in preparation
for future careers, both teachers and students face the challenge of developing problem-solving, critical thinking,
communication, and collaboration skills within an emerging digital landscape. Technology adoption, big data,
cloud computing and artificial intelligence pose ongoing challenges for both teachers and students in adapting to
the changing workforce development landscape. To tackle these challenges, the paper highlights the importance of
exploring the implications of learning sciences in classroom teaching, developing a holistic vision for professional
development in education, and understanding the complexities of teacher change. To effectively implement these
components, it is crucial to adopt design approaches that prioritize student ownership in education and embrace the
principles of inclusive education to reconceptualize the teaching practices in education and technology.
Keywords: Education; Computational thinking; Teacher education; Professional development; Design; Equity
*CORRESPONDING AUTHOR:
Xiaoxue Du, MIT Media Lab, MIT, Cambridge, MA 02139, USA; Email: [email protected]
ARTICLE INFO
Received: 30 May 2023 | Revised: 9 July 2023 | Accepted: 13 July 2023 | Published Online: 20 July 2023
DOI: https://doi.org/10.30564/jcsr.v5i3.5757
CITATION
Du, X.X., Meier, E.B., 2023. Innovating Pedagogical Practices through Professional Development in Computer Science Education. Journal of Computer Science Research. 5(3): 46-56. DOI: https://doi.org/10.30564/jcsr.v5i3.5757
COPYRIGHT
Copyright © 2023 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons Attribu-
tion-NonCommercial 4.0 International (CC BY-NC 4.0) License. (https://creativecommons.org/licenses/by-nc/4.0/).
the unprecedented changes in the immediate future of education [2]. Teachers must acquire fundamental knowledge and adopt innovative teaching methods in order to incorporate technology effectively into their instruction and meet both the academic and social-emotional needs of students in the realm of technology [3,4]. Insights from the learning sciences and teacher education literature suggest that technology can pave the way for groundbreaking teaching approaches within classrooms. By harnessing the power of technology, educators can unlock diverse learning opportunities that cater to the needs of all students, including those from diverse cultural and language backgrounds [5]. Emerging technologies could serve as cognitive tutors, peer learners, and conversational agents that introduce students to novel methods of reflection, reasoning, and learning in their everyday lives [6,7].
The growing demand for computer science (CS) education among schools and educational entities has shown the need to strengthen students’ knowledge and skills in problem-solving and analytical thinking [8,9]. Therefore, establishing meaningful pedagogical practices and fostering a culture of lifelong learning are crucial aspects of computer science education. The basic premise of the paper is that integrating CS is a complex process, which requires much more than simply “shoehorning” a new curriculum into the school day. Teachers need to cultivate the skills to design and establish a student-centered learning environment that purposefully enables the effective integration of computer science education. Moreover, it is essential to foster a rigorous community space that encourages learners to make connections between their acquired knowledge and other areas within the computing field. This approach helps foster a sense of belonging to the wider computing community. In this paper, we will first synthesize key challenges and opportunities identified in computer science education. Then, we will introduce the key components of a research-based, systematic professional development approach to build teachers’ capacity to design a student-centered learning environment. Finally, we will discuss the implications of strengthening teachers’ professional development in the computer science education community.
2. Key challenges and opportunities
Implementing CS in school settings presents new challenges for educators for at least five key reasons. First, there is a lack of “shared meaning” [10] for computer science as an academic discipline in K-12 education. Teachers should strive to develop a shared comprehension of both content knowledge and effective pedagogical practices in order to integrate them seamlessly into their curriculum planning. Second, computational thinking is increasingly considered a foundational skill in the 21st century, but it is not systematically addressed in the curriculum. It serves as a process for recognizing aspects of computation in one’s surroundings and applies techniques from CS to understand both natural and artificial systems and processes [11]. Third, there is no clear scope and sequence for standards in each grade, which creates challenges for educators interested in developing student learning plans to integrate CS across disciplines. Because of the lack of a scope and sequence, there is insufficient empirical evidence for student learning and a lack of clear assessment objectives to support content definition and sequencing. Fourth, teachers’ professional development in computer science is a new process, which requires more empirical evidence and research to identify the core professional development content and resources needed to prepare educators for designing student-centered learning experiences [12]. Lastly, recent research has emphasized the significance of nurturing young people’s capacity to create through the acquisition of computing skills. The development of these skills holds substantial implications for their personal lives and the betterment of their communities. To ensure that students develop essential computing design skills, it is imperative to create an inclusive, motivating, and empowering learning environment. This could provide students with greater autonomy to code, break down complex problems, and apply their learning across various
contexts to solve real-world problems. However, in many cases, the curriculum may focus exclusively on technical challenges and entry points, requiring students to have prior programming experience. This limits opportunities for students to engage critically with a broader curriculum and participate in a larger computing community. To address the challenge, it is important to create a wide and deep learning space for learners, allowing them to connect what they have learned to other computing and content fields and fostering a sense of belonging to the greater computing community [13].
One way to address the challenges faced by both educators and students in computer science education is to provide more professional development opportunities that position teachers as designers and effectively integrate computationally oriented curriculum into daily learning and teaching. Teachers need the essential knowledge and skills to plan enriched lessons, select the most relevant use cases, and design curricula to develop students’ skills in problem-solving, computational thinking, and critical awareness. In addition, in the design process, teachers can develop student-centered learning environments that allow students with different background knowledge to engage with the curriculum, develop their interests, and build the confidence that empowers them to learn and grow in the computer science curriculum. Finally, more research should be conducted to develop high-quality professional development, which could, in turn, prepare a cohort of change leaders to innovate pedagogical practices in computer science beyond programming, while building local community networks to sustain the change and innovation in daily teaching.
3. Applying the innovating instruction model in education
The Innovating Instruction © model was developed by the Center for Technology and School Change, Teachers College, Columbia University. The model builds upon the theory of change, learning science, professional development theories, and the emerging capabilities of technology. In this model, teachers take on the role of designers, collaborating with facilitators to co-design projects that can be implemented in their real school settings. Technology plays a pivotal role as a catalyst for driving pedagogical innovation and motivates teachers as changemakers to advocate for and sustain change in classroom teaching [14,15] (see Table 1, CTSC Professional Development model: Innovating Instruction model).
The model comprises three fundamental components: Design, Situate, and Lead, all aimed at assisting teachers in transforming their teaching and learning approaches. It is imperative for teachers to grasp effective teaching practices aligned with principles derived from the learning sciences. This understanding enables them to design environments that foster meaningful learning experiences and facilitate opportunities for students to deepen their understanding of the subject. Equipping teachers with the capacity to design curriculum goals, employ formative assessments, and engage diverse students in inquiry-based learning environments holds great promise in supporting their professional growth [15].
The Situate component plays a crucial role in customizing the learning experience for each teacher’s classroom and their students. It not only showcases engaging pedagogical practices through a hands-on approach but also offers personalized support to teachers. Incorporating insights from the learning sciences, it establishes a foundation for comprehending the intricacies of learning and thinking. The science of learning and development (SoLD) approach has been utilized to expound upon the “whole child model”, which underscores the necessity of addressing various aspects of students’ academic, cognitive, ethical, physical, psychological, and socio-emotional well-being. Specifically, creating a supportive environment fosters strong relationships and a sense of community among students. The situated approach encourages teachers to position students as active “knowledge-builders” within an inquiry-based learning environment [16,17].
Finally, the Lead component of the model focuses on preparing teachers to become leaders and on collaborating with building administrators to empower individual leadership and foster a culture of change and innovation. The guiding theoretical framework is the theory of change, which recognizes the complexity of the learning environment and the essential components required to facilitate transformative shifts in education. Strategic planning, the effective implementation of new teaching approaches, and the continuous development of teachers’ beliefs about teaching and learning are emphasized as pivotal factors for driving meaningful change within the school system. The model also underscores the importance of creating “shared meaning” among key stakeholders, considering the institutional, historical, and cultural perspectives that shape relationships and language in the field of education [10,17].
Two recent National Science Foundation (NSF) grants, the Systemic Transformation of Inquiry Learning Environments (STILE 1.0) for STEM (Exploratory Award No. DRL-1238643) and STILE 2.0 (Early-Stage Design and Development Award No. DRL-1621387), have established the model’s positive impact on teachers’ ability to design projects, to shift from disciplinary to transdisciplinary project design, and to shift instructional thinking to include inquiry-based approaches. The research findings from the STILE initiatives demonstrate the positive impacts of the model on teachers’ pedagogical change as defined by shifts in STEM perspectives, STEM design practices, and STEM classroom practices [11]. The research, conducted in thirteen diverse school contexts, included 169 classroom visitations, 372 planning meetings, and over 51 hours of administrator interaction. The total average dosage was estimated at 61 hours per teacher, supporting 169 teachers in the New York City public school systems and ultimately serving over 7536 students. The research identified positive changes that teachers have made under the STILE 2.0 program, specifically in teachers’ ability to design projects, shift from a disciplinary/subject orientation to a more sophisticated transdisciplinary focus, and broaden their instructional thinking to include more inquiry-based approaches [11].
4. Visionary goals in professional development and CS education
4.1 A grand vision for professional development in CS education
To effectively introduce computer science curriculum in schools, professional development plays a crucial role in helping teachers adopt a broader vision for the teaching profession that prioritizes learning and the needs of learners from diverse communities [18]. For example, professional development programs can provide teachers with training and resources to enhance their pedagogical skills in computer science education. This could include workshops on integrating technology into lessons, designing engaging coding activities, or implementing project-based learning in the computer science classroom. By equipping teachers with the necessary knowledge and tools, professional development empowers them to create meaningful learning experiences and cater to the diverse needs of their students in the realm of computer science [19,20].
With access to resources in computer science (CS) education, teachers receive valuable support in designing student-centered learning experiences that foster the development of students’ identity and their willingness to actively engage in the broader computing community [21]. For instance, through professional development, teachers can learn about various tools, platforms, and instructional strategies that enable them to create interactive coding projects, collaborative programming exercises, or real-world CS applications. By incorporating these resources into their teaching, teachers can empower students to explore their interests, develop problem-solving skills, and cultivate a sense of belonging within the computing field [22]. This not only enhances students’ learning experiences but also nurtures their enthusiasm and motivation to actively participate in the wider CS community beyond the classroom [23]. As a result, students will have more ownership and responsibility to explore concepts beyond essential programming ideas (e.g., loops, arrays, conditional statements). They will utilize their skills to build a deeper understanding of how these concepts apply to broader social and cultural contexts. This expanded perspective encourages students to consider the practical applications of computer science in various domains, such as healthcare, environmental sustainability, or social justice [24,25]. By connecting programming skills to real-world contexts, students develop a more comprehensive understanding of the societal
impact and significance of computer science, empowering them to become critical thinkers and active contributors to their communities [26,27].
Finally, CS professional development should deepen teachers’ content knowledge and provide curriculum resources, encouraging teachers to utilize different means, access, and representations to simulate abstract concepts in order to develop students’ interests and curiosity in the field [28,29]. Specifically, ongoing research in professional development should prepare teachers to continuously innovate pedagogical practices to design, pilot, and implement computer science curriculum in classroom settings [30]. It should invite educators to consider a broader, culturally relevant approach that designs curriculum situated for a range of learners, especially students who have been traditionally under-represented in the computing fields [31]. The CS literature has shown that women and students of color have been overlooked and excluded by the wider computing community [8]. It is therefore critical for teachers to identify effective approaches for engaging all students in the field of computer science education [32,33].
4.2 The complexity of teacher change in CS education
Research findings demonstrate that thoughtful professional development can significantly impact teachers’ ability to incorporate technology into their classroom practices [34]. Studies indicate that teachers’ beliefs and practices can evolve when teachers are provided with clear and specific instructions during professional development sessions in CS education [35]. In the context of computer science education, ongoing research should place a strong emphasis on exposing teachers to the design process and enabling them to explore the integration of technology in projects that promote a shift in instructional thinking [36]. For example, professional development programs can provide teachers with opportunities to engage in coding and computational thinking activities themselves, allowing them to experience firsthand the creative process involved in designing and developing computer programs. These programs can also offer teachers access to various technology tools and platforms specifically designed for computer science education. This includes platforms for creating interactive programming projects, virtual environments for exploring computer science concepts, and collaborative coding platforms. Through hands-on workshops and training sessions, teachers can gain practical knowledge of these tools and learn effective strategies for incorporating them into their teaching. To summarize, professional development initiatives offer teachers increased opportunities to gain familiarity with a variety of technology tools, demonstrate their use in classroom instruction, and provide checkpoints for reflection on the implementation process [37,38]. By successfully employing new practices and research-based methods, teachers receive further support in assimilating innovative approaches into their existing belief systems [39,40].
4.3 Reconceptualizing teaching practices in CS education
In the realm of computer science education, it is crucial to critically examine practices and emerging research in general education [41]. For example, teacher education programs can incorporate pedagogical strategies that promote hands-on learning experiences, such as coding workshops or robotics projects. By engaging in these practical activities, students can develop a deeper understanding of programming concepts and gain valuable problem-solving skills. Additionally, project-based approaches can be implemented in computer science education [42]. For instance, students can work on real-world projects like designing a mobile application or creating a website for a local business. These projects not only reinforce technical skills but also encourage critical thinking and creativity as students navigate challenges and make design decisions [43].
Collaborative problem-solving is another important strategy to consider. An example of this could be organizing group activities where students collaborate to solve complex coding problems or develop a software solution together [44]. Through teamwork, students learn how to communicate effectively, share
ideas, and leverage each other’s strengths to achieve a common goal [45]. To stay up-to-date with the latest advancements, it is important for educators to explore research and developments in educational technology, computational thinking, and computer science instruction [46]. For instance, they can learn about new tools and platforms that facilitate interactive learning experiences or discover innovative teaching methods that enhance student engagement and understanding [47].
By reconceptualizing teacher education in computer science, educators can better prepare future teachers to design and deliver engaging and meaningful learning experiences [48]. For example, they can develop new curricular materials that incorporate coding exercises, multimedia resources, and interactive simulations to make learning more interactive and dynamic [49]. Furthermore, assessment approaches can be adapted to evaluate students’ computational thinking skills, creativity, and problem-solving abilities [50]. This could involve designing coding challenges or projects that require students to apply their knowledge in practical contexts, as opposed to traditional exams or quizzes that focus solely on theoretical concepts [51]. When teaching focuses on these aspects, students not only gain technical knowledge but also develop the necessary skills to thrive in an ever-changing digital landscape. They become equipped with computational thinking skills, creativity, and problem-solving abilities, which are highly sought-after in the industry [52,53].
4.4 Embracing computer science education for all students with real-world connections
Specifically, effective professional development should provide teachers with more accessible resources that reduce the barriers to learning, adopting, and integrating computer science into the daily curriculum [54]. For instance, providing teachers with sample curriculum that they can adapt and integrate into current lesson plans could help teachers develop capacity in computer science disciplines, develop students’ interests in exploring CS topics, and encourage educators to understand the value of designing situated, culturally relevant computer science curriculum [55]. In addition, professional development should provide teachers with an easy-to-use platform that could encourage students to quickly build prototypes and implement solutions without writing complicated programming syntax [56]. For instance, the growing usage of block-based programming languages (e.g., PoseBlocks, App Inventor) has shown the value of letting students build solutions, implement designs, and create functional mobile applications without a complex debugging and programming process [57,58]. Ongoing research could further explore the core processes and components that prepare teachers to use, adapt, and implement computer science curriculum with technology integration across diverse classroom settings, domestically and internationally [59,60].
5. Conclusions
To enhance teacher education in the field of computer science, it is crucial to equip teachers with the skills to strategically utilize technology in designing engaging curriculum that promotes deep learning in computer science education. Given the complexity of school systems, collaborative efforts involving researchers, scientists, and professionals are necessary to drive these transformative shifts. To prepare for the change, the Innovating Instruction model has shown itself to be an effective model for helping teachers incorporate interactive and hands-on activities, project-based learning, and real-world applications of computer science concepts, tailor their instruction to meet the diverse needs and interests of their students, and constantly refine their instructional techniques to become more effective educators. By nurturing a community of practice and fostering collaboration among teachers, researchers, and professionals, the field can collectively drive the adoption of innovative technology and create a more engaging and impactful learning environment for students.
Author Contributions
Both authors made equal contributions to the
practice of the science of learning and development. Applied Developmental Science. 24(2), 97-140.
[17] Scardamalia, M.B., 2014. Knowledge building and knowledge creation. Cambridge handbook of the learning sciences. Cambridge University Press: Cambridge. pp. 297-417.
[18] Harju, V., Niemi, H., 2020. Newly qualified teachers’ support needs in developing professional competences: The principal’s viewpoint. Teacher Development. 24(1), 52-70.
[19] Ng, D.T.K., Lee, M., Tan, R.J.Y., et al., 2022. A review of AI teaching and learning from 2000 to 2020. Education and Information Technologies. 1-57.
[20] Dash, B.B., 2022. Digital tools for teaching and learning English language in 21st century. International Journal of English and Studies. 4(2), 8-13.
[21] Biswas, S., Benabentos, R., Brewe, E., et al., 2022. Institutionalizing evidence-based STEM reform through faculty professional development and support structures. International Journal of STEM Education. 9(1), 1-23.
[22] McGill, M.M., Reinking, A., 2022. Early findings on the impacts of developing evidence-based practice briefs on middle school computer science teachers. ACM Transactions on Computing Education. 22(4), 1-29.
[23] Apiola, M., Sutinen, E., 2021. Design science research for learning software engineering and computational thinking: Four cases. Computer Applications in Engineering Education. 29(1), 83-101.
[24] Casey, E., Jocz, J., Peterson, K.A., et al., 2023. Motivating youth to learn STEM through a gender inclusive digital forensic science program. Smart Learning Environments. 10(1), 2.
[25] Tissenbaum, M., Weintrop, D., Holbert, N., et al., 2021. The case for alternative endpoints in computing education. British Journal of Educational Technology. 52(3), 1164-1177.
[26] Schaper, M.M., Smith, R.C., Tamashiro, M.A., et al., 2022. Computational empowerment in practice: Scaffolding teenagers’ learning about emerging technologies and their ethical and societal impact. International Journal of Child-Computer Interaction. 34, 100537.
[27] Tsortanidou, X., Daradoumis, T., Barberá, E., 2019. Connecting moments of creativity, computational thinking, collaboration and new media literacy skills. Information and Learning Sciences. 120(11/12), 704-722.
[28] Alfaro-Ponce, B., Patiño, A., Sanabria-Z, J., 2023. Components of computational thinking in citizen science games and its contribution to reasoning for complexity through digital game-based learning: A framework proposal. Cogent Education. 10(1), 2191751.
[29] Ketelhut, D.J., Mills, K., Hestness, E., et al., 2020. Teacher change following a professional development experience in integrating computational thinking into elementary science. Journal of Science Education and Technology. 29, 174-188.
[30] Bragg, L.A., Walsh, C., Heyeres, M., 2021. Successful design and delivery of online professional development for teachers: A systematic review of the literature. Computers & Education. 166, 104158.
[31] Mystakidis, S., Fragkaki, M., Filippousis, G., 2021. Ready teacher one: Virtual and augmented reality online professional development for K-12 school teachers. Computers. 10(10), 134.
[32] Li, M., 2020. Multimodal pedagogy in TESOL teacher education: Students’ perspectives. System. 94, 102337.
[33] Kafai, Y.B., Baskin, J., Fields, D., et al. (editors), 2020. Looking ahead: Professional development needs for experienced CS teachers. SIGCSE’20: Proceedings of the 51st ACM Technical Symposium on Computer Science Education; 2020 Mar 11-14; Portland OR, USA. New York: Association for Computing Machinery. p. 1118-1119.
[34] Tshukudu, E., Cutts, Q., Goletti, O., et al. (editors), 2021. Teachers’ views and experiences on teaching second and subsequent programming languages. ICER 2021: Proceedings of the 17th ACM Conference on International Computing Education Research; 2021 Aug 16-19; Virtual Event, USA. New York: Association for Computing Machinery. p. 294-305.
[35] Rich, P.J., Larsen, R.A., Mason, S.L., 2021. Measuring teacher beliefs about coding and computational thinking. Journal of Research on Technology in Education. 53(3), 296-316.
[36] Bereczki, E.O., Kárpáti, A., 2021. Technology-enhanced creativity: A multiple case study of digital technology-integration expert teachers’ beliefs and practices. Thinking Skills and Creativity. 39, 100791.
[37] Griful-Freixenet, J., Struyven, K., Vantieghem, W., 2021. Exploring pre-service teachers’ beliefs and practices about two inclusive frameworks: Universal design for learning and differentiated instruction. Teaching and Teacher Education. 107, 103503.
[38] Dignath, C., Rimm-Kaufman, S., van Ewijk, R., et al., 2022. Teachers’ beliefs about inclusive education and insights on what contributes to those beliefs: a meta-analytical study. Educational Psychology Review. 34(4), 2609-2660.
[39] Almazroa, H., Alotaibi, W., 2023. Teaching 21st century skills: Understanding the depth and width of the challenges to shape proactive teacher education programmes. Sustainability. 15(9), 7365.
[40] Bhutoria, A., 2022. Personalized education and artificial intelligence in the United States, China, and India: A systematic review using a human-in-the-loop model. Computers and Education: Artificial Intelligence. 3, 100068.
[41] Bozkurt, A., 2020. Educational technology research patterns in the realm of the digital knowledge age. Journal of Interactive Media in Education. (1).
[42] Çiftci, S., Bildiren, A., 2020. The effect of coding courses on the cognitive abilities and problem-solving skills of preschool children. Computer Science Education. 30(1), 3-21.
[43] Saad, A., Zainudin, S., 2022. A review of Project-Based Learning (PBL) and Computational Thinking (CT) in teaching and learning. Learning and Motivation. 78, 101802.
[44] Wang, Y., 2023. The role of computer supported project-based learning in students’ computational thinking and engagement in robotics courses. Thinking Skills and Creativity. 48, 101269.
[45] Bers, M.U., Blake-West, J., Kapoor, M.G., et al., 2023. Coding as another language: Research-based curriculum for early childhood computer science. Early Childhood Research Quarterly. 64, 394-404.
[46] Huang, W., Looi, C.K., 2021. A critical review of literature on “unplugged” pedagogies in K-12 computer science and computational thinking education. Computer Science Education. 31(1), 83-111.
[47] Yildiz Durak, H., Atman Uslu, N., Canbazoğlu Bilici, S., et al., 2022. Examining the predictors of TPACK for integrated STEM: Science teaching self-efficacy, computational thinking, and design thinking. Education and Information Technologies. 1-28.
[48] Lee, S.W.Y., Liang, J.C., Hsu, C.Y., et al., 2023. Students’ beliefs about computer programming predict their computational thinking and computer programming self-efficacy. Interactive Learning Environments. 1-21.
[49] Lee, S.J., Francom, G.M., Nuatomue, J., 2022. Computer science education and K-12 students’ computational thinking: A systematic review. International Journal of Educational Research. 114, 102008.
[50] Ung, L.L., Labadin, J., Mohamad, F.S., 2022. Computational thinking for teachers: Development of a localised E-learning system. Computers & Education. 177, 104379.
[51] Kallia, M., van Borkulo, S.P., Drijvers, P., et al., 2021. Characterising computational thinking in mathematics education: A literature-informed Delphi study. Research in Mathematics Education. 23(2), 159-187.
[52] Ogegbo, A.A., Ramnarain, U., 2022. A systematic review of computational thinking in science classrooms. Studies in Science Education. 58(2),
Journal of Computer Science Research | Volume 05 | Issue 03 | July 2023
ARTICLE
ABSTRACT
Previous mobile usability studies are only pertinent in the context of ergonomics, physical user interface, and mobility aspects. In addition, much of the previous mobile usability conception was built on desktop computing measurements, such as desktop and web application checklists, or scarcely addressed the mobile user interface. Moreover, the studies focus mainly on interface features for desktop applications and do not reflect comprehensive mobile interface features such as navigation drawers and spinners. Therefore, conducting usability evaluation using conventional usability measurement would produce irrelevant results. In addition, the resulting works are tailored for usability testing, which requires highly skilled evaluators and usability specialists (e.g., usability testers and user experience designers), who are rarely integrated into a development team. The lack of expertise could lead to unreliable usability evaluations. This paper presents a review from industrial experts on a comprehensive and feasible usability evaluation framework developed in our previous work. The framework is dedicated to smartphone apps and integrates evaluator skills and design concerns. However, there is no evidence of its usefulness in practice. Therefore, the usefulness of the framework measurement for evaluating apps’ usability in the eyes of non-usability specialists is empirically assessed in this paper through an expert review. The expert review involved eleven industrial developers and was complemented by a semi-structured interview. The method is replicated in comparison with a framework from another study. The findings show that the formulated framework significantly outperformed the framework from the other study (p = 0.0286) with a large effect size (r = 1.81) in terms of usefulness.
Keywords: Usability framework; Mobile usability; Usability evaluation; Expert review; Heuristic walkthrough
*CORRESPONDING AUTHOR:
Hazura Zulzalil, Department of Software Engineering and Information System, Universiti Putra Malaysia, UPM Serdang, Selangor, 43400, Malaysia; Email: [email protected]
ARTICLE INFO
Received: 28 June 2023 | Revised: 24 July 2023 | Accepted: 26 July 2023 | Published Online: 31 July 2023
DOI: https://doi.org/10.30564/jcsr.v5i3.5816
CITATION
Zulzalil, H., Rahmat, H., Ghani, A.A.A., et al., 2023. Expert Review on Usefulness of an Integrated Checklist-based Mobile Usability Evaluation
Framework. Journal of Computer Science Research. 5(3): 57-73. DOI: https://doi.org/10.30564/jcsr.v5i3.5816
COPYRIGHT
Copyright © 2023 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. (https://creativecommons.org/licenses/by-nc/4.0/).
teristics in their work, the UIs mainly reflect desktop and web application UIs and features such as input, hardware, bookmarks, and headers.

2.3 Theoretical-based frameworks

The usability framework, which was developed based on usability conceptions of principles and criteria, mainly revolves around the effectiveness of existing usability measurements for evaluating apps. For example, Dubey and Rana [9] acknowledged the characteristics and features of mobile devices. They doubt the effectiveness of existing usability measurements on mobile phones. By hierarchically organizing usability indicators (principles), criteria, and properties based on a goal-mean relationship between the parameters, they formulated a framework for usability specialists to conduct an analytical evaluation of mobile phones. While focusing on the parameters of each abstraction level and all three categories of UIs (PUI, LUI, and GUI), their checklist suffers from redundancy, ambiguity, confusion, and indirectly measurable issues.

Pursuing a different approach, Gómez, Caballero, and Sevillano formed their framework by formulating a structure of heuristics and sub-heuristics, paired with a checklist based on their semantic relations [6]. They achieved excellent results in addressing mobile-specific usability issues while focusing on LUI and GUI. Unfortunately, though they argue for the effectiveness of a desktop-centered checklist for evaluating apps, a portion of their checklist stems from a web-based checklist that appears irrelevant for apps.

Judging by the limitations of mobile devices, Fatih Nayebi developed a heuristic-based framework for app evaluation [7]. A set of usability criteria established based on his review of academic and industrial heuristics, theories, and guidelines was assigned to the most relevant logical groups of the reviewed bibliographic references. Although he managed to address the characteristics of mobile devices, the proposed criteria were ambiguous and hardly addressed mobile interface features.

Further, arguing for the effectiveness of current usability measurement for mobile applications, Hoehle, Aljafari, and Venkatesh proposed a set of measurements for mobile applications in view of interface features based on measurement theory [12]. A content analysis approach was used to relate the constructs and variables. Though their work explicitly focused on apps, the measurements were tailored for Microsoft-based apps, and mobile interface features are not well addressed in their work. Instead, they emphasize aspects such as usability principles, aesthetics, and navigation.

2.4 Decision-based framework

The primary purpose of adopting a decision-based framework is to determine a usable mobile application. Lachgar and Abdelmounaim pursued an analytic hierarchy process in developing their framework. Grounded in measurement theory, they developed usability constructs and variables to facilitate the selection of usable mobile phones [15]. Table 1 summarizes the literature review.

Earlier mobile usability studies emphasized the physical user interface. While the logical user interface persists across most computing platforms, rapid updates in smartphone technologies highlight the importance of the graphical user interface, particularly interface features. The coverage of UI studied previously conforms to the scope of UI covered in the reviewed framework from the age of feature phones to handhelds until smartphones, where PUI is scarcely studied in recent works.

3. Formulation of framework measurement

Representative definitions of usability by the industry (i.e., ISO 9241-11 [16], ISO 9126 [17]) and academia [5,18-22] are usually referred to in most studies. In the context of mobile usability studies, the work of Harrison et al. [5], which extends Nielsen’s usability conception in view of the ISO 9241-11 context, is deemed a comprehensive reference [23]. However, neither metrics nor checklists are associated with their work, thus leaving little support for usability
Mapping of abstraction levels components | Goal Question Metric | Goal Question Metric | Pairwise-comparison | Pairwise-comparison | Content analysis | Content analysis | Content analysis | Content analysis | Analytic Hierarchy Process
Drawback | Tunnel vision bias | Tunnel vision bias | Large number of evaluations | Large number of evaluations | Tunnel vision bias | Tunnel vision bias | Tunnel vision bias | Tunnel vision bias | Large number of evaluations
Countermeasure implemented | Not applicable | Not applicable | Establishing selection criteria | Establishing selection criteria | Expert review | Not applicable | Not applicable | Expert review | Content rating
Platform | Smartphone | Smartphone | Feature phone | Tablet | Feature phone | Smartphone, tablet | Smartphone | Smartphone | Handhelds
Scope | LUI | LUI | LUI, GUI | LUI, GUI | PUI, LUI, GUI | LUI, GUI | LUI, GUI | GUI | PUI, LUI, GUI
Game-based measures 10
Physical user interface related measure (Ex: widget, soft keys, notification drawer) 11
Input and output devices (Ex: Desktop based input hardware, wearable) 8
Miscellaneous (Ex: conflicting measure between the same bibliographic reference, application purpose, design statement or fact) 8
pose is understandable at first sight”, were excluded, although they refer to usability criteria of understandability. The rationale is that the measure does not contribute towards achieving user goals in operating an app, thus irrelevant in representing usability for apps.

Additionally, given that the measures consist of different forms such as usability requirements, heuristics, checklists, guidelines, recommendations, and usability problems, it is not possible to review the bibliography in terms of quality criteria that share a similar meaning, the same name, or both. Instead, regardless of their original form, the measures were rephrased into a checklist.

These measures were reviewed using a content analysis technique to develop usability constructs for apps. Content analysis of the measures developed relevant emergent quality attributes and interface features, which later resulted in a paired usability checklist. Initially, a conceptual definition for each usability criterion and interface feature is established. The conceptual definitions are made as unambiguous as possible in the context of apps. Conceptually similar items and repeated items referring to the same usability criteria are grouped together and rephrased to homogenize the resulting usability checklist. In the case of conflicting items, items that coincide with other items are retained, and the conflicting items are excluded from the checklist pool. Finally, the usability criteria are examined for similarities and differences in terms of their design patterns. The usability criteria are then grouped under conceptual units of similar apps’ design patterns and usability features.

4. An integrated usability evaluation framework

Characterizing usability solely on usability principles or usability attributes suffers from a lack of reflection on interface features in detail such as notification and interaction method, which is another aspect influencing mobile usability. On the other hand, depending solely on the UI component for the evaluation would be inappropriate for measuring the usability factor. In addition, considering apps’ short time-to-market, where usability specialists are rarely involved during the usability evaluation, there is a need to support non-usability specialists in conducting reliable usability evaluations from their point of view. These suggest a mobile usability framework that integrates multiple evaluator viewpoints. However, this would result in different evaluation criteria, such as interface features and usability features, in contrast to usability specialists and developers, who mostly view usability in terms of usability heuristics and quality criteria. Figure 1 illustrates the conceptual framework.

The usability constructs are abstracted into three tiers of abstraction levels: usability feature level, usability criteria level, and interface feature level. Each abstraction level of the framework denotes a construct that consists of a group of framework components. The framework components are paired with the usability checklists for usability inspection.
Each framework tier reflects the different viewpoints of the usability evaluator and their level of expertise. The identified interface feature serves as the framework measurement, which formed the interface feature component for the lowest abstraction level in the framework. The usability criteria are tied to the middle tier of the framework, the usability criteria component. Components for the top abstraction level, the usability features, were identified by formulating conceptual units with similar usability criteria. Figure 2 illustrates the framework abstraction level.

4.1 Usability features

Usability is commonly viewed by specialists in terms of constructs such as heuristics, principles, and guidelines, which are generally abstract. However, the mobile context of use, such as the interaction and operating environment, of the application on the intended platform has been regarded as an emergent property that affects usability [9,10,26]. Functional features of technology have been addressed in usability studies through design patterns [28-31]. The design pattern of app functionalities demonstrates the interaction complexities of smartphone apps. In this study, the design patterns are formulated as usability features to meet the viewpoint of usability specialists in conceptualizing usability as an emergent property of app interaction complexities. Table 3 presents the elicited usability features in this study.

The usability features level denotes a collection of smartphone characteristics. These features are characterized by the attributes in the usability criteria level. It is formulated to meet the usability specialist’s viewpoint in conceptualizing usability as an emergent property of app interaction complexities. It serves as an evaluation basis for both 1) specialists who view usability in terms of design patterns and 2) non-usability specialists, such as developers and designers, who could benefit from understanding usability in terms of design functionalities, in conducting usability evaluation.

4.2 Usability criteria

Characterizing usability solely by either usability principles or usability attributes suffers from a lack of reflection on interface features in detail such as notification and interaction methods, which is another aspect influencing mobile usability. However, on
the other hand, depending solely on the UI component for the evaluation would be inappropriate for measuring the usability factor. Therefore, this study bridged usability constructs, usability criteria, and interface features together with usability features.

The usability criteria level consists of a collection of usability attributes addressing the corresponding usability feature in the top tier. It emphasizes usability evaluation from a software engineering perspective. Table 4 lists the components of the usability criteria.

The label next to each usability criterion denotes the usability features it is associated with. The usability criteria and interface features in the next tier facilitate usability evaluation and the perception of the evaluator in the domain of software engineering and development.

4.3 Interface features

The interface feature level defines components that are tied to the usability criteria in the middle tier. This level facilitates technical evaluators (e.g., analysts and designers) who perceive usability in view of the design context approach. It evaluates usability in view of design elements. Table 5 lists the components of the interface features.

In the formulated framework, each usability feature is decomposed into several usability criteria (as in a one-to-many relationship). However, a usability criterion is tied to more than one checklist, assessing different UI elements. Likewise, it is also possible for the UI elements to be associated with more than one usability criterion (as in a many-to-many relationship). Table 6 exhibits the partial list of the paired usability checklist.
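The relationships between the three tiers can be pictured with a small data-structure sketch. It is only an illustration under stated assumptions: the feature names, UI elements, and checklist wording below are hypothetical placeholders and are not taken from Tables 3 to 6, and the helper names are ours.

```python
from collections import defaultdict

# Top tier -> middle tier (one-to-many). Feature names are hypothetical.
FEATURE_TO_CRITERIA = {
    "navigation": ["learnability", "operability"],
    "notification": ["user error protection"],
}

# Each checklist item pairs one usability criterion with one UI element.
# Items and wording are illustrative placeholders only.
CHECKLIST = [
    ("learnability", "navigation drawer", "Drawer entries use familiar labels."),
    ("operability", "navigation drawer", "The drawer opens with a single gesture."),
    ("operability", "spinner", "The spinner shows the current selection."),
    ("user error protection", "spinner", "An unintended selection can be undone."),
]


def items_for_feature(feature):
    """Return every checklist item reachable from a top-tier usability feature."""
    criteria = set(FEATURE_TO_CRITERIA[feature])
    return [item for item in CHECKLIST if item[0] in criteria]


def criteria_by_element():
    """Index the many-to-many link from UI elements back to usability criteria."""
    index = defaultdict(set)
    for criterion, element, _text in CHECKLIST:
        index[element].add(criterion)
    return dict(index)


if __name__ == "__main__":
    print(len(items_for_feature("navigation")))
    print(sorted(criteria_by_element()["spinner"]))
```

Under this layout, an evaluator who reasons in terms of features can start from the top tier, while one who reasons in terms of UI elements can start from the element index; both arrive at the same checklist items.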
5. Evaluating the framework’s usefulness

In our previous work, we validated the comprehensiveness of the framework components among academicians in Malaysia’s public universities [32]. The components were refined based on the survey responses. Subsequently, the components were evaluated for their feasibility in real practice among software engineering practitioners in Malaysia and refined once again based on the survey response [33]. In this paper, we conducted an expert review and a semi-structured interview to evaluate the framework’s usefulness in comparison to existing usability evaluation frameworks.

Usefulness is characterized in most usability studies as a composition of usability and as-is utility [34,35]. Likewise, available usefulness questionnaires (e.g., USE and TAM) measure usefulness in the same dimension. The dimension includes a composition of several usability criteria, such as ease of use, learnability, and satisfaction, in addition to as-is utility. This section demonstrates the framework’s evaluation in terms of its usefulness in comparison to the selected study.

The usability evaluation framework to be compared to the one from this study was selected through an exhaustive search of existing work on online databases subscribed to by Universiti Putra Malaysia (UPM) and accessed publications. The search was performed using Google Scholar to review the recently proposed checklist-based frameworks published during the development of the framework in this study. The query returned 424 results in the English language. Any matching results that have been adopted in developing the framework in this study are omitted to avoid bias. Subsequently, publications on checklist-based frameworks were filtered for selection. The process ends with two relevant search results. Since the work of Joseph [36] is more about usability heuristics, we have selected the work of Thitichaimongkhol and Senivongse [37] as a comparison against the formulated framework.

Methods and material

Prior to the evaluation, the participants were given a demographic form to record their background experience, the specifications of the smartphone used during the evaluation, such as brand and operating system, and their experience using apps in the dominant category in the marketplace.

Evaluating the entire framework measurement (373 checklists) from this study in comparison with the previous work is inefficient in terms of time and resources. Therefore, the evaluation scope covers usability measurements from both sets that match the
ISO/IEC 25010 product quality model. Although the usability criteria corresponding to both checklist sets for the evaluation have a different name compared to the ISO/IEC 25010 quality criteria, the conceptual definition for the corresponding usability criteria shares the same description as the ISO/IEC 25010 usability criteria.

Three apps from different categories commonly used by Malaysians (from survey responses in our previous study) are selected from the Play Store. Task analysis is performed on the apps to identify the primary task and the interface feature associated with each task. Usability criteria from both studies (this study and the other) corresponding to the interface features associated with the primary task are selected for the heuristic walkthrough. Figure 3 exhibits an excerpt from the identified checklist from this study, corresponding to the usability criteria from ISO/IEC 25010 and interface features of the primary task for Lazada apps.

Likewise, the checklist for Set 2 is prepared using the same task and interface features of the same apps, corresponding to the same usability criteria as in Set 1. Figure 4 exhibits an excerpt from the identified checklist from another study, corresponding to the usability criteria from ISO/IEC 25010 and interface features of the primary task for Lazada apps.

The participants were given two sets of checklists (76 items from the formulated framework and 39 items from the other framework), which correspond to the ISO/IEC 25010 usability criteria and interface features of primary tasks from selected apps. The evaluators were required to perform a heuristic walkthrough on three apps (Google+, Viber, and Lazada) using both checklist sets. Subsequently, they are required to review both checklist sets. The checklist sets are given in random order. The first evaluator is given Set 1, followed by Set 2. Meanwhile, the next evaluator is given Set 2, followed by Set 1.

Finally, both frameworks were rated for their usefulness using the USE questionnaires. The evaluators were given two sets of USE questionnaires, one for each framework. The questionnaire includes 30 checklist items on a 7-point Likert scale. The scale ranges from 1 (strongly disagree) to 7 (strongly agree). The resulting USE score was analyzed using paired t-test to determine if there was any difference between the compared frameworks. A post hoc test, Cohen’s D, is used to investigate the effect size on the significance of the compared framework. Equation (1) explains Cohen’s D measure of effect size.
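The presentation order of the two checklist sets and the scoring of the questionnaire can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, and summing the 30 item ratings into a single overall USE score is an assumption that fits the stated scale (30 items rated from 1 to 7) but is not spelled out in the text.

```python
def presentation_order(evaluator_index):
    """Alternate the checklist sets between consecutive evaluators.

    evaluator_index is zero-based: the first evaluator receives Set 1 then
    Set 2, the next evaluator receives Set 2 then Set 1, and so on.
    """
    if evaluator_index % 2 == 0:
        return ("Set 1", "Set 2")
    return ("Set 2", "Set 1")


def overall_use_score(ratings):
    """Sum 30 Likert ratings (1 to 7) into one score (assumed aggregation)."""
    if len(ratings) != 30:
        raise ValueError("expected 30 item ratings")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("ratings must lie between 1 and 7")
    return sum(ratings)


if __name__ == "__main__":
    for i in range(4):
        print(i + 1, presentation_order(i))
    # A hypothetical evaluator who rates every item 6 out of 7.
    print(overall_use_score([6] * 30))
```

Under the summation assumption, the overall score ranges from 30 to 210, so alternating the order across evaluators spreads any order effect evenly over both frameworks without changing the score range.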
Figure 4. Checklist for Set 2.

$$d = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{S_1^2 + S_2^2}{2}}} \qquad (1)$$

where $\bar{x}$ = sample mean, $S$ = sample standard deviation.

Effect size is categorized into small, medium, large, and very large impacts [38]. An effect size of 0.2 indicates a small magnitude of the effect. A medium effect size lies in the vicinity of 0.5, and a large effect size in the vicinity of 0.8. Meanwhile, a very large effect size is indicated by values larger than or equal to 1.3.

A semi-structured interview is conducted after the experiment to clarify why the experts rated one framework better than the other. The identity of each treatment, which framework was the one from this study, and which framework was from the previous study were not revealed until the end. The rationale is to have an expert’s honest opinion on the framework.

6. Results and discussions

We approached eleven industrial experts, ranging from mobile developers to mobile testers and UX designers. However, five of them repeatedly rescheduled the deadline to complete the evaluation and failed to complete the requested evaluation even after more than three follow-up reminders.

Only six of the experts managed to complete the experiment. A difference in the overall USE scores rated by the six experts for both frameworks was computed. However, one of them gave an unreliable rating, even after reviewing the score given. It was not feasible to set up an upfront meeting with that expert. The expert’s background profile showed that this expert is the only participant to select gaming apps as the most frequently used apps. Since gaming apps have different designs and purposes compared to other categories of apps, the expert’s perception of usability might skew away from the other five participants, who were not familiar with mobile gaming. Thus, it is reasonable that the expert gave a contradictory score compared to the other participants. Therefore, the response by this expert was excluded from the analysis. The overall USE score collected from each expert is analyzed to evaluate the usefulness of the frameworks. Table 7 exhibits the mean differences in the overall USE score for both treatments.

Table 7. Overall USE score for both frameworks.
Paired Samples Statistics
Pair 1 | Mean | N | Std. deviation | Std. error mean
Formulated framework USE score | 180.60 | 5 | 10.991 | 4.915
Previous framework USE score | 156.60 | 5 | 15.143 | 6.772

A paired t-test is used to determine the significance of the usefulness score at a significance level of 0.05. The distribution of the USE score differences between both groups of experts is normally distributed, thus making it appropriate for conducting a paired t-test. The mean indicates that the five experts (N = 5) gave a larger USE score for the formulated framework (mean = 180.60) compared to the other framework (mean = 156.60). In addition, a smaller standard deviation compared to the other framework indicates that the USE scores among the experts were more consistent in the formulated framework. Table 8 exhibits the results of the paired t-test.

USE scores for the formulated framework are

Table 8. Results of the paired t-test.
Paired samples test
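Equation (1) can be checked against the summary statistics reported in Table 7. The sketch below is a worked illustration, not the authors' analysis script: the function names are ours, the cut-offs between categories are one reading of the thresholds quoted in the text (0.2, 0.5, 0.8, and 1.3), and the label "negligible" for values below 0.2 is an added assumption.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d as in Equation (1): mean difference over the RMS of the two SDs."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_2 - mean_1) / pooled_sd


def effect_size_label(d):
    """Map |d| to a category, treating each quoted threshold as a lower bound."""
    magnitude = abs(d)
    if magnitude >= 1.3:
        return "very large"
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not named in the text


if __name__ == "__main__":
    # Table 7: previous framework (sample 1) and formulated framework (sample 2).
    d = cohens_d(mean_1=156.60, sd_1=15.143, mean_2=180.60, sd_2=10.991)
    print(round(d, 2), effect_size_label(d))  # prints: 1.81 very large
```

The result agrees with the effect size of 1.81 reported in the abstract. The paired t-test itself cannot be reproduced from Table 7 alone, because it depends on the standard deviation of the per-expert score differences, which the summary statistics do not include.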
evaluation plan established beforehand, which determines the criteria to be evaluated during an inspection. The formulated framework came in handy due to its features in supporting different backgrounds of usability evaluators through the abstraction level. In fact, restricting usability evaluation to usability criteria of interest will eventually reduce the number of checklists to be used during the usability evaluation.

7. Threats to validity

Threats are inevitable yet manageable in research. In this section, threats to internal, external, conclusion, and construct validity are discussed. The selection of associated usability criteria tied to UI elements was determined by adopting the ISO/IEC 25010 product quality model (usability component) as a benchmark criterion in comparing the formulated framework with previously proposed frameworks. However, the experiment is still vulnerable to the order effect. Thus, in replicating the experiment with the other framework for comparison, two sets of evaluation plans representing the formulated framework and the previously proposed framework were given to the evaluator in random order.

Regarding the external threat, the respondent’s expertise and experience in using the evaluated app might affect the validity of the result. The respondents consist of field experts from various branches of software engineering disciplines and app development stages with varying years of experience. In addition, they might use their experience of using a particular type of app, e.g., transactional, communication, or games, as a benchmark in scoring UI elements. Altogether, the respondent might perceive usability differently based on their background of expertise and experience with the app, thus affecting their subjective judgement. These threats are controlled through two countermeasures. Firstly, a conceptual definition of the evaluated interface feature was established. There is a possibility that an interface feature is recognized by a different name in academia and industry. For example, a drop-down menu is well-known in desktop computing. On some occasions, it is used as a jump menu. However, in mobile computing, the drop-down menu is recognized as a list. In addition, the jump menu is known as a spinner in mobile computing. Furthermore, elements such as sub-screen and gesture are interface features that are absent in desktop computing and could be interpreted differently by individuals. This necessitates a further description of a UI element’s operation or behavior relative to desktop computing to assist an inexperienced evaluator in this case. Secondly, the experiment is designed as a repeated measure to reduce variability across participants.

The main threat to the conclusion validity of the result is statistical power. This threat is alleviated by applying the most common statistical test appropriate for a within-subject research design. Moreover, the significance level was 5%. Hence, the chance of a Type I error is small.

A checklist from the previous study is used in comparison to the checklist in this study to manage construct validity. The scores of both checklists in measuring the ISO/IEC 25010 product quality model (usability component) were correlated in conjunction with the use of an established questionnaire to measure the framework’s usefulness. In addition, a well-established usability questionnaire was carefully selected for this study to measure usefulness appropriately.

8. Conclusions

This study empirically evaluates the usefulness of an integrated usability evaluation framework through an expert review. The framework measurement is reviewed and compared against a framework from another study. Both frameworks were compared based on the ISO/IEC 25010 product quality model (usability component). Hypothesis testing was conducted to investigate the significance and effect size of the response from the expert review. The results of the statistical test showed that the formulated framework was significantly more useful than the other framework, with a large effect size. In the future, we plan to improve the effectiveness of this framework by comparing the results of using it
in usability testing against usability inspection. The rationale is to alleviate a possible false alarm in the formulated framework measurement and capture the true usability problem. Consequently, an additional checklist could be proposed based on the usability testing result to complement the developed usability measurement.

Author Contributions

H. R. conceived the idea and study of proposing a usability evaluation framework for mobile apps that incorporates the usability criteria and interface features in conjunction with different evaluator viewpoints into a framework abstraction level. H. Z., A. K. and the late A. A. A. A. served as H. R.’s supervisor and co-supervisors on her Ph.D. thesis at the Universiti Putra Malaysia. All authors reviewed and approved the final manuscript.

Conflict of Interest

There is no conflict of interest.

Funding

This research was partially funded by the Research University Grant Scheme (RUGS), Universiti Putra Malaysia (UPM).