Development of a Modularized Undergraduate Data Science and Big Data Curricular Using No-Code Software Development Tools
ABSTRACT Over the last decade, Data Science has emerged as one of the most important subjects
that has had a major impact on industry. This is due to the continual development of scientific methods,
algorithms, processes, and computational tools that help to extract knowledge from raw data efficiently
and cost-effectively, compared with early-generation tools. Professional data scientists create code that
processes, analyses and extracts actionable insights from high volumes of data. This process requires a
deep understanding of mathematical principles, statistics, business knowledge, and computer science. Most
importantly, the data science development chain requires knowledge of a high-level programming tool
and its dependencies. This poses a major problem for many learners because of the steep learning curve. In this
paper, we describe and present a modularized Data Science curriculum for undergraduate learners that
relies on no-code software development tools as programming aids for non-computer science majors.
No-code development tools have been added to the traditional teaching pedagogy to improve students’
motivation and conceptual understanding of coding despite their limited programming skills. The study
aims to assess the impacts of visual programming languages on the performance of non-computer science
majors on programming problems. The study’s sample consists of 50 fourth-year students from the Faculty
of Science and Technology at the Midlands State University. A post-survey questionnaire and assessment
items were administered to the control and experimental groups. Results show that the students drawn
from the experimental group benefited from the use of a visual programming language. These results offer
evidence-based recommendations for incorporating high-performance no-code software development tools
in the formal curriculum to aid teaching and learning data science programming for students of diverse
academic backgrounds.
INDEX TERMS Curriculum, data science, education, no-code tools, visual programming languages.
universities, a high-powered delegation comprising 25 undergraduate faculty members from a variety of institutions across the US met to develop a series of curriculum guidelines for an undergraduate data science course [2]. Drawn from three major disciplines (mathematics, statistics and computer science), the guidelines stipulate that a graduating student majoring in data science must be proficient in subjects such as computational and statistical thinking, mathematics, model building, algorithms, data modelling and communication. Such guidelines define the skills that learners are supposed to have after completion of the course [3].
As the report by MGI states, data is now a key asset for companies, and data analytics can improve a company’s key operations or help launch new business models to expand its markets. There are two ways to achieve this goal: (i) engaging professional data scientists, or (ii) up-skilling existing professionals who are not data scientists with the necessary data-based skills required to meet the needs of industry. Producing data scientists is straightforward: students graduate with a major specialization in data science and are then deployed in various industries. On the other hand, developing data literacy skills in graduate students who do not have the pre-requisite programming experience may be challenging due to the steep learning curve of the text-based programming languages that have traditionally been used to teach or learn the practical aspect of data science. However, a different programming paradigm has been developed over the years. It relies on visual, drag-and-drop, no-code computer programming tools, where instructions are encapsulated in blocks instead of text-based formal languages. Several blocks can then be connected sequentially to solve a programming problem. As we can expect, such no-code models offer several advantages, especially to new learners. Mainly, they enable the learner to focus on algorithm development, instead of struggling with the intricacies of the structure or style of the programming language.
This work is motivated by our experiences in teaching Data Science as a module in the Department of Applied Physics and Telecommunications under the Faculty of Science and Technology at Midlands State University in Zimbabwe. The Department offers a four-year Bachelor of Engineering Degree in Telecommunication Engineering, where Data Science is offered in the final year. In the module, students learn the fundamentals of data science as well as the applications of data science and big data in various domains. The Department also offers a four-year Bachelor of Science Degree in Industrial Physics and Instrumentation. Of late, data science has emerged as a favourite among students, with the majority of them implementing data science concepts in their final year projects. However, we realized that although our students have a strong mathematical background, the majority of them, especially those from BSc Industrial Physics and Instrumentation, often struggle with developing practical computer code. As a result, this has affected the majority of students who intend to apply data science concepts in their final year projects. To assist our students, we came across VPLs, which have helped them develop data-related projects without actually worrying about the intricacies of coding.
The primary motive driving the present work is the need to reduce the data science learning curve for non-majors, especially on the practical side, which is mainly characterized by heavy programming. Currently, little effort has been made to formally merge data science education with these new visual programming tools, despite their promising advantages. As of now, the two fields have existed in parallel with each other, thereby failing to provide an adequate broad-based curriculum required to support professional development in data science. This paper presents a data curriculum initiative using No-Code tools (NCTs). The curriculum has been designed in such a way that the knowledge base, course structure and content are similar in principle to the curriculum that is based on traditional programming languages. It should be noted, however, that the proposed curriculum has not added or removed content from the existing data science curriculum; rather, the present work proposes the integration of emerging and flexible programming environments in data science education initiatives. We define the appropriate teaching and evaluation methods that are suitable for this type of programming. The primary objectives of our work are to integrate data science education with NCTs and accelerate data science education using such tools to reduce the amount of time required to up-skill a non-data professional. We summarize the key contributions of our work as follows:
1) We describe key features and processes for visual programming approaches.
2) We demonstrate the feasibility of no-code paradigms as a potential aid for programming in data science education.
3) We evaluate the performance of visual programming environments, and demonstrate that they are comparable with text-based programming languages in terms of data science education.
4) We provide a survey on NCTs, as well as empirical evidence on the use of NCTs in education.
5) We demonstrate, through experiments, that NCTs are enablers of student success, especially for non-computer science majors in data science programming.
This paper is organized as follows: Section II describes the data science curriculum initiative. The need for a supplementary data science programming tool and the current state of visual programming and data science education are discussed. Furthermore, empirical evidence on the use of VPLs in formalized educational environments is presented. In Section III, we discuss the data science topics that can be implemented using VPLs as well as develop assessment items for Python and visual programming languages. Chapter IV discusses the teaching methods suitable for visual programming languages and Chapter V discusses the assessment
In 1986, Myers defined this as any system that could be programmed by the user in two or more dimensions [10]. This requires environments that use graphical techniques to aid the entire process of programming and developing computer applications [10], [11]. Unlike conventional text-based languages, visual programming is motivated by the ideology that graphical techniques, particularly pictures, can convey more information concisely, compared with 1-dimensional linear programming languages [11]. In addition, pictures can break the language barrier, simplifying the process of programming for users regardless of the language they speak [11]. In summary, graphical tools provide two things: (i) a visual environment for programming, and (ii) a language interface for expressing visual information flow [12]. These are some of the predominant factors that have influenced the development of no-code programming languages.
To establish a common understanding, Kuhail et al. [13] combined Myers’ [10] and Burnett and Baker’s [14] taxonomies to divide existing VPLs into four broad areas: form-based, diagram-based, icon-based, and block-based languages. According to the authors, form-based programming allows end-user programmers to configure a graphical form, whereby parameters are added or changed using drop-down menus or windows. Diagram-based languages enable end-user developers to connect basic shapes such as rectangles, parallelograms, circles, etc. by arrows, lines or connectors to represent programming constructs. On the other hand, icon-based languages rely on the use of icons to visualize the organization, design and flow of a program. Lastly, block-based languages enable end-user developers to drag and drop components of a program in the form of blocks. Several blocks can then be connected to define program flow. In general, using visual or graphical expressions as a way of writing computer programs greatly reduces syntax errors, and is easily comprehended by users of diverse backgrounds, since the human visual system and visual information processing are greatly optimized for multi-dimensional data [10].
Many VPLs have been developed in the literature. However, a large number of those do not possess the features of a true VPL. Although there is no consensus on what makes up a complete VPL [15], certain criteria must be met for a language to be considered a VPL. It is generally agreed that the language of a visual programming environment must be able to convey meaningful information for programming, rather than just cosmetic graphics [12] or rich graphical user interface features. To extend the criteria, Burnett and Baker develop a classification scheme for VPL research papers [14]. In their work, they highlight a set of important features of a VPL, which can then be used for comparison. A detailed set of criteria is presented by Kiper et al. [15]. They suggest that VPLs can be assessed based on visual nature, functionality, ease of comprehension, paradigm support, and scalability.
Although this field has received considerable interest in the past, work is ongoing to address inherent problems that have plagued early generations of visual programming tools. In the past, researchers faced difficulty developing large programs or processing large datasets [12]. This problem has been solved by recent advances in computer graphics, abstraction and cloud computing. It is now possible to fit a lot of blocks, icons or diagrams on the same computer screen. Users are now able to navigate large programs through the use of multiple sheets. The wide uptake of cloud computing and related services has played a crucial role in processing large datasets efficiently, and hence improves the functionality of visual programming tools. In other spheres, early researchers have cited a lack of functionality as another major drawback of early visual programming tools. Indeed, even some modern VPLs are hindered by this problem. Besides the lack of functionality, novice programmers may face limited or no room at all to develop and integrate customized program elements due to proprietary software and the steep learning curve of the tool. Another aspect that characterized early generations of VPLs is inefficiency. This was a major challenge due to slow program execution [12] and large memory requirements. However, this is no longer a problem nowadays, as most tools leverage web-based online environments to deliver programming tools with high
TABLE 1. Components of a widget.
FIGURE 2. Basic layout and connection of widgets in NCTs. Two widgets are connected sequentially via a communication link that relays information between the two. In this workflow, widget A processes the data, and transmits it to the next widget connected to it through its output port. The input of widget B receives this data, processes it further, and transmits using its output port. Depending on the configuration, multiple widgets could be connected to one widget to reveal several instances of the data.
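For readers who think in text-based code, the widget connection described in the Figure 2 caption can be approximated in a few lines of Python. This is only a minimal sketch of the dataflow idea: the `Widget` class, its `connect` and `send` methods, and the sample readings are hypothetical names introduced here for illustration, and they do not correspond to the API of any particular no-code tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Widget:
    """Hypothetical stand-in for a no-code widget with one input and one output port."""
    name: str
    process: Callable[[Any], Any]  # operation applied to whatever arrives at the input port
    downstream: List["Widget"] = field(default_factory=list)

    def connect(self, other: "Widget") -> "Widget":
        """Link this widget's output port to another widget's input port."""
        self.downstream.append(other)
        return other

    def send(self, data: Any) -> dict:
        """Process the data, then relay the result to every connected widget."""
        result = self.process(data)
        outputs = {self.name: result}
        for widget in self.downstream:
            outputs.update(widget.send(result))
        return outputs


# Widget A drops missing readings; widget B averages what is left (illustrative data).
widget_a = Widget("A", lambda rows: [r for r in rows if r is not None])
widget_b = Widget("B", lambda rows: sum(rows) / len(rows))
widget_a.connect(widget_b)

outputs = widget_a.send([4.0, None, 6.0, 8.0])
print(outputs)  # {'A': [4.0, 6.0, 8.0], 'B': 6.0}
```

Because `downstream` is a list, several widgets can be attached to the same output port, which mirrors the caption's remark that multiple widgets could be connected to one widget.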
is reported in [30]. In their work, Kelleher et al. survey a variety of graphics-oriented programming languages that can be used for different application areas. They cite syntax and program style as primary barriers to programming and demonstrate that simplifying the mechanics of programming, especially for novice programmers, greatly reduces the barrier to learning to program. Later in 2007, the authors considered using storytelling to motivate programming [31]. In their later work, they attribute a falling interest in Computer Science in the US to the uninspiring courses taught. To inspire middle-school girls’ interest in programming, they use the Storytelling Alice programming environment to create custom 3D animated movies from in-built characters and environments. Their results show that most of the participants were able to create a sequential program in Storytelling Alice within just two hours of programming, while 87% were able to create a program with multiple flow control mechanisms. Based on these results, the authors concluded that offering computer programming in the form of storytelling encouraged the target group to learn to develop computer programs.
To support self-directed learning among young learners in developing computer programs, Maloney et al. developed the Scratch programming language and environment [32]. Maloney’s programming environment allows students, especially those of primary-school age, to create engaging projects such as animated stories, games and simulations [32]. A distinguishing feature of Scratch is that program flow is constructed sequentially by joining together building blocks that represent actions or flow control mechanisms. Its primary goal is to introduce programming education to learners who have little or no programming experience. This goal has motivated the worldwide use of the tool, and by 2010, the program had been offered in nearly fifty languages and had supported almost two million users. By January 2024, the number of subscribers had risen to over one hundred million registered users [33], signifying the importance of graphical-based teaching of computer programming. The program has been so popular that it has been incorporated into formal education streams, targeting early-stage programmers in different fields.
A pilot study by Friss et al. at ORT Uruguay University during the 2nd semester of 2007 experimented with Scratch in two scenarios. They incorporated Scratch in (i) a university course and (ii) vocational studies environments to improve students’ capabilities in computer science courses [34]. In their work entitled Scratch: Applications in Computer Science 1, the authors conduct formative and summative assessments on a group of students who were randomly selected from the class. They administered Scratch over three weeks, with the control group solving the same programming tasks manually. For their results, 88% of students who had used the Scratch programming environment described their learning experiences as ‘‘motivating’’ or ‘‘easy’’, while 80% of the control group described their learning experiences as ‘‘normal’’ or ‘‘difficult’’.
In perhaps an interesting and related application area, Estevez et al. introduced Artificial Intelligence to high-school students using Scratch. Their work is motivated by a strong need to grasp the attention of young learners through the use of interactive graphical programming tools in computational sciences, which are usually characterized by a lack of appeal in their presentation [35]. In their work, they teach young learners two basic methods of Artificial Intelligence: data clustering, and artificial neural network learning. They selected 37 students and followed a scaffolded teaching approach, where they provided a pre-built template for the students to fill in the gaps between lines of code. Just like the authors in the previous work, Estevez’s approach also conducts a formative and summative assessment of the target group. A quantitative analysis of the results reveals that the students acquired the confidence to understand the fundamentals of Artificial Intelligence algorithms, as well as a holistic view of them.
A case study by Ase et al. to teach engineering modules using computer-aided animations was conducted at the University of Hertfordshire. The study focused on applying visualization and 3D animations in automotive engineering courses to help improve conventional teaching resources. They explore the benefits of automation, with particular emphasis on 3D computer-aided animation tools for automotive studies. This is an innovative paradigm shift from the conventional methods of delivery that are based on 1D flowcharts, schematics and static objects. In their results, they report that over 61% of the students reported a better understanding of automotive engineering modules taught using animations.
We have provided empirical evidence where VPL tools have been applied successfully in education. Results show that such programming environments are helpful to early-stage learners and have the necessary tools to foster learning. Furthermore, it has been shown that such tools grasp the attention of learners, promote their motivation to learn, and improve their learning experiences, without focusing on the mechanics and intricacies of programming, which have been shown to be a barrier to programming. With these results, we are convinced that such tools can also be integrated with data science education to improve problem-solving, creativity, motivation, collaboration and data science communication at the tertiary level.
III. DATA SCIENCE PROGRAMMING USING NCTs
The following Chapter discusses the sections of the data science curriculum and develops section-specific assessment items for evaluating Python coding against NCT workflows as shown in Table 4 – Table 12.
It should be noted, however, that the chapters for this proposed curriculum have been adapted from the conventional data science curriculum, as found in modern textbooks such as [4] and [36], and teaching has been modified to support data science education using no-code programming environments. This was done to ensure learners would be exposed to the same content that is offered in a conventional data science curriculum while being taught in dynamic and interactive learning environments. This curriculum can also be used by tutors who want to study introductory data science using such tools.
TABLE 3. Overview of the proposed curriculum: The proposed data science curriculum using NCTs, including the aims, knowledge area and learning objectives in nine chapters. The design follows a typical data science structure for majors, except that the practical component does not rely on textual programming.
TABLE 4. Assessment items for Chapter 1.
TABLE 5. Assessment items for Chapter 2.
TABLE 6. Assessment items for Chapter 3.
Therefore, the curriculum for data collection and preparation focuses on students understanding the various data types, formats, and pre-processing stages that are performed on datasets using NCTs, especially very large datasets.
The starting point of any data-related problem is the collection of usable, representative and unbiased data. This is a critical process that requires (1) a prior understanding of the problem at hand, (2) formulating research questions, and (3) a thorough comprehension of the subsequent objectives of the data analysis [38]. Several authors in the literature discuss various principles and procedures that must be followed to ensure data integrity; however, that is beyond the scope of this work. A comprehensive overview is provided by [39]. In this chapter, we will only focus on working with the data that has already been collected. However, the rule of thumb is to ask ‘‘What data?’’. Navigating this space will require identifying the relevant data sources and planning the data collection and processing methods. However, research shows that a lot of students and novice data scientists often struggle with this part [40], hence the use of NCTs to simplify the data collection and preparation process.
Table 5 shows some of the assessment items for the data collection and preparation chapter. The primary focus is to take learners through the processes of collecting data and preparing it for the subsequent steps.
The learning outcomes of this chapter have been developed as follows: (i) identify the sources of data for a particular project, (ii) evaluate the reliability of data sources and the data collection procedures, (iii) collect data from different sources and (iv) clean and pre-process the data to detect and handle inconsistencies such as missing values and outliers.
C. CHAPTER 3: DATA VISUALIZATION
Data visualization is considered to be one of the most important topics in data science [41]. As such, a lot of emphasis has been placed on the development of programming languages, visualization libraries and frameworks to enable data-driven decision-making. Recent efforts are in huge graph visualization with big data infrastructure [42].
The concept of data visualization can be traced back centuries to ancient Greek mathematicians who utilized latitude and longitude information to visualize geographic information [43]. Subsequent developments in coordinate systems and Cartesian graphs by scientists, mathematicians and philosophers in the 17th Century are widely considered to have laid the foundations of modern data visualization techniques [44]. However, it was in the 20th Century that data visualization rose to prominence due to major developments in computer graphics, technology, scientific visualization, personal computers, and software tools [43]. The continual developments in software tools into the 21st Century, especially open-source tools, enabled users to create custom visualizations using simple, yet powerful data visualization libraries such as Pandas, Matplotlib and Seaborn [45], among others.
The objectives of the data visualization stage in the data processing pipeline vary depending on a lot of factors. However, any data-proficient student must be able to effectively communicate insights and findings, support informed decision-making, and identify patterns, trends, and correlations derived from the data analysis, irrespective of the platform. NCTs support data visualization through the use of widgets that create visual elements such as scatter plots, line plots, histograms, bar charts, and heat maps, among others. At the end of this chapter, learners must be able to confidently create interactive and informative visualizations that (1) facilitate the understanding of relatively straightforward or complex data and (2) provide a holistic view of the data, thus facilitating more informed knowledge discovery.
Table 6 shows some of the assessment items for the data visualization chapter. Here, the focus is on using relevant widgets to visualize relations in the data.
D. CHAPTER 4: UNSUPERVISED LEARNING
The subject of unsupervised learning rose from a strong need to detect anomalies or discover hidden structures or trends in unlabeled datasets. A central feature of these algorithms is that they do not require prior knowledge or output labels of the datasets. In other words, they do not require training datasets to learn data dependencies; instead, they learn features on their own from uncategorized data on the fly. This chapter aims to study a range of unsupervised machine learning algorithms for clustering such as K-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). The chapter proceeds with a discussion on dimensionality reduction techniques. These are a set of algorithms that reduce the number of variables or features, creating a lower dimensional representation of the dataset. Principal component analysis is widely used for this. Lastly, the chapter explores algorithms for anomaly detection. This is a very critical and broad area that has witnessed significant research over the years due to its capability of detecting
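To make the clustering idea from Chapter 4 concrete for readers coming from text-based programming, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions rather than the curriculum's own assessment code: the function name `kmeans`, the toy data, and the choice to seed the centroids with the first `k` points (made only so the run is reproducible) are all introduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain K-means on unlabeled points.

    Assumption for reproducibility: the first k points seed the centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical toy data: two visibly separated groups, with no labels supplied.
data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No output labels are passed in at any point; the grouping emerges from the distances alone, which is the property the chapter highlights for unsupervised algorithms.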
TABLE 7. Assessment items for Chapter 4.
TABLE 8. Assessment items for Chapter 5.
TABLE 9. Assessment items for Chapter 6.
TABLE 11. Assessment items for Chapter 8.
TABLE 13. Methods of creating project-based learning workflow.
content that consists of an introduction to data science,
data collection, data visualization, unsupervised learning,
supervised learning, feature engineering, model evaluation
and deployment, time series analysis, and machine learning
automation. In that regard, we believe that more emphasis
should be placed on developing teaching, assessment and
evaluation methods suitable for NCT-based education.
Table 12 shows some of the assessment items for the
machine learning automation chapter. The emphasis is on
testing students’ ability to automate the entire process of
machine learning.
V. ASSESSMENT METHODS
This chapter addresses five important questions: (i) how
do you assess learners who use graphical tools for
programming? (ii) who has reported on the use of visual
programming languages, especially in higher and tertiary
education scenarios? (iii) has it been successful? (iv) what
were their recommendations? and (v) how can this impact
our curriculum design?
The question of student assessment, especially in general
Andragogy, has been addressed thoroughly throughout the
literature [63], [64], [65]. This is mainly conducted to
evaluate how well students have performed against a set of
learning outcomes at various stages of learning. This provides
quantifiable evidence that can then be used by both students
and lecturers to evaluate the knowledge and skills gained
through learning [66]. In [67], Llamas-Nistal et al. discussed
the two main categories of assessment: continuous and
summative assessment. Continuous assessment is usually
carried out during the instructional process to gather and
analyze information on students’ performance [68]. On the
other hand, summative assessment is carried out towards or
at the end of the learning process to evaluate cumulative
knowledge and skills gained [68].
According to the computing curriculum developed in
2013 by the Association for Computing Machinery – IEEE-
Computer Society, the generic learning outcome of any
programming course is to design, implement, test, and
debug a program that incorporates some basic programming
constructs [69]. This is a guideline that has been used
by many authors in the literature to assess students in
programming and has been used as a basis to judge the
coding abilities of first-year computer science students [70].
Assessing students who use graphical tools for programming
is not in any way different from assessing students who
use conventional programming tools. With VPL tools, tutors
can also test students’ main learning outcomes such as
designing, implementing, testing, and debugging a program
that incorporates basic to advanced programming constructs.
Here, students will connect various widgets of a VPL
to demonstrate program sequence, selection functions, and
iteration loops. Afterwards, the tutor will run their programs
to evaluate these concepts.
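As a text-based point of comparison, the three constructs named above (sequence, selection, and iteration) can be shown in one short Python function. This is a hedged illustration only: the function name `grade_marks`, the pass mark of 50, and the sample marks are hypothetical and are not taken from the assessment items in this work.

```python
def grade_marks(marks):
    """Return (average, labels) for a list of marks; the pass mark of 50 is assumed."""
    labels = []
    for mark in marks:                 # iteration: visit each mark once
        if mark >= 50:                 # selection: branch on a condition
            labels.append("pass")
        else:
            labels.append("fail")
    average = sum(marks) / len(marks)  # sequence: runs only after the loop has finished
    return average, labels


average, labels = grade_marks([72, 48, 65])
print(labels)  # ['pass', 'fail', 'pass']
```

In a widget-based workflow the same logic would be expressed by connecting a loop widget, a conditional widget, and an aggregation widget in order, and a tutor could run either version to check the same learning outcomes.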
TABLE 14. Questions regarding students’ views and experiences on the use of VPLs in their data science projects.
TABLE 15. t-Test – Statistical analysis of the performance of non-computer science majors (Physics and Telecommunications students) on Python-based and VPL-based assessment items.
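The artificial neuron summarized in the Figure 7 caption computes a weighted sum $z = \sum_{i=0}^{n} w_i x_i$ and passes it through an activation function $y = f(z)$. A minimal Python sketch of that computation follows. Two points are assumptions made here and not stated in the caption: the bias is modelled as weight $w_0$ paired with a constant input $x_0 = 1$, and the sigmoid is used as an example activation. The names `neuron_output` and `sigmoid` are likewise illustrative.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """Example activation function f(z); the choice of sigmoid is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(
    weights: Sequence[float],
    inputs: Sequence[float],
    activation: Callable[[float], float] = sigmoid,
) -> float:
    """Compute y = f(z) with z = sum_{i=0}^{n} w_i * x_i.

    Assumption: x_0 is fixed at 1 so that weights[0] plays the role of the bias.
    """
    x = [1.0, *inputs]
    if len(weights) != len(x):
        raise ValueError("expected one weight per input plus one bias weight")
    z = sum(w * xi for w, xi in zip(weights, x))
    return activation(z)


# Bias of -1.0 and two inputs weighted 0.5 each: z = -1.0 + 0.5 + 0.5 = 0.0
y = neuron_output(weights=[-1.0, 0.5, 0.5], inputs=[1.0, 1.0])
print(y)  # 0.5
```

Passing a different callable as `activation` (for example a step function) changes only $f$, leaving the weighted sum untouched.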
FIGURE 7. Basic representation of an artificial neuron. The artificial neuron is analogous in operation to the biological neuron. It has a bias parameter (constant), and accepts two or more inputs, which are multiplied by their respective coefficients ($w_i$). These weighted inputs are summed together to produce an output $z = \sum_{i=0}^{n} w_i x_i$, which is then passed through an activation function to produce $y = f(z)$ [72].
non-computer science majors to achieve higher marks that are comparable to the marks of computer science majors.
In this work, VPLs have shown great potential as a pedagogical aid to data science students who do not have a strong programming background. Assisting learners to transition from no-code tools to text-based programming languages is left for further research.
In conclusion, the overarching contribution of this work will support students, tutors, educational institutions and data industries by (i) reducing the time and resources required to learn data science programming, (ii) offering an alternative approach to text-based programming in data science education, and (iii) providing detailed procedures for developing teaching, learning, evidence-based assessment and evaluation methods using interactive learning environments.
APPENDIX A
OVERVIEW OF ARTIFICIAL NEURAL NETWORKS AND DATA SCIENCE
The human brain consists of several interconnected cells that transmit information encapsulated in electrical and chemical signals from various parts of the brain. These cells or neurons receive sensory inputs, process the signals and relay the output to other neurons. Neurons can work together to learn the solution to a problem by creating a neural pathway. This pathway becomes more accurate through trial and error by identifying neurons that regularly communicate. Through regular practice, the brain learns to solve a problem.
This simple, yet complex operation is the fundamental
REFERENCES
[1] M. Analytics, ‘‘The age of analytics: Competing in a data-driven world,’’ in McKinsey Global Institute Research. McKinsey & Company, 2016. [Online]. Available: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
[2] R. D. De Veaux et al., ‘‘Curriculum guidelines for undergraduate programs in data science,’’ Annu. Rev. Statist. Appl., vol. 4, pp. 15–30, Aug. 2017.
[3] M. J. Ramzan, S. U. R. Khan, Inayat-Ur-Rehman, T. A. Khan, A. Akhunzada, and C. Naseeb, ‘‘A conceptual model to support the transmuters in acquiring the desired knowledge of a data scientist,’’ IEEE Access, vol. 9, pp. 115335–115347, 2021.
[4] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2. Springer, 2009.
[5] J. Demsar, T. Curk, A. Erjavec, C. Gorup, T. Hočevar, M. Milutinovič, M. Možina, M. Polajnar, M. Toplak, A. Starič, M. Štajdohar, L. Umek, L. Žagar, J. Žbontar, M. Žitnik, and B. Župan, ‘‘Orange: Data mining toolbox in Python,’’ J. Mach. Learn. Res., vol. 14, no. 1, pp. 2349–2353, 2013.
[6] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel, ‘‘KNIME–the Konstanz information miner: Version 2.0 and beyond,’’ ACM SIGKDD Explorations Newslett., vol. 11, no. 1, pp. 26–31, Nov. 2009.
[7] A. Santos. (2023). Home. [Online]. Available: https://www.neuraldesigner.com/
[8] J. M. Wing, ‘‘The data life cycle,’’ Harvard Data Sci. Rev., vol. 1, no. 1, p. 6, 2019.
[9] A. X. Zhang, M. Müller, and D. Wang, ‘‘How do data science workers collaborate? Roles, workflows, and tools,’’ Proc. ACM Hum.-Comput. Interact., vol. 4, no. CSCW1, pp. 1–23, May 2020.
[10] B. A. Myers, ‘‘Visual programming, programming by example, and program visualization: A taxonomy,’’ ACM SIGCHI Bull., vol. 17, no. 4, pp. 59–66, Apr. 1986.
[11] N. C. Shu, ‘‘Visual programming languages: A perspective and a dimensional analysis,’’ in Visual Languages. Cham, Switzerland: Springer, 1986, pp. 11–34.
[12] N. C. Shu, ‘‘Visual programming: Perspectives and approaches,’’ IBM Syst. J., vol. 38, no. 2, pp. 199–221, 1999.
[13] M. A. Kuhail, S. Farooq, R. Hammad, and M. Bahja, ‘‘Characterizing visual programming approaches for end-user developers: A systematic review,’’ IEEE Access, vol. 9, pp. 14181–14202, 2021.
[14] M. M. Burnett and M. J. Baker, ‘‘A classification system for visual programming languages,’’ J. Vis. Lang. Comput., vol. 5, no. 3, pp. 287–300, Sep. 1994.
[15] J. D. Kiper, E. Howard, and C. Ames, ‘‘Criteria for evaluation of visual programming languages,’’ J. Vis. Lang. Comput., vol. 8, no. 2, pp. 175–192, Apr. 1997.
[16] Z. Dobesova, ‘‘Evaluation of orange data mining software and examples for lecturing machine learning tasks in geoinformatics,’’ in Computer Applications in Engineering Education. Hoboken, NJ, USA: Wiley, 2024.
[17] U. Thange, V. K. Shukla, R. Punhani, and W. Grobbelaar, ‘‘Analyzing COVID-19 dataset through data mining tool ‘Orange,’’’ in Proc. 2nd Int. Conf. Comput., Autom. Knowl. Manage. (ICCAKM), Jan. 2021,
pp. 198–203.
principle that has influenced the development of the artificial
[18] A. Abdelmagid and A. Qahmash, ‘‘Utilizing the educational data mining
neuron as shown in Figure 7. Each neuron can be seen as an techniques,’’ Inf. Sci. Lett., vol. 12, no. 3, pp. 1415–1431, 2023.
individual computing node that accepts one or more inputs [19] I. Popchev and D. Orozova, ‘‘Algorithms for machine learning with
xi , and produces an output y based on an activation function orange system,’’ Int. J. Online Biomed. Eng., vol. 19, no. 4, pp. 109–123,
Apr. 2023.
f (z). Non-linear functions Sigmoid functions are typically [20] J. Demsar and B. Zupan, ‘‘From experimental machine learning to
used, however, several activation functions are sufficient for interactive data mining,’’ in Proc. Knowl. Discovery Databases, 2005,
this purpose. A summary of the most common activation pp. 537–539.
functions for neural networks is presented by et al. in [72]. [21] B. Cukic, D. Hague, and M. Lou Maher, ‘‘An innovative interdisciplinary
undergraduate data science program: Pathways and experience,’’ in Proc.
A collection of these neurons form an artificial neural IEEE Frontiers Educ. Conf. (FIE), Oct. 2020, pp. 1–5.
network that learns from data by adjusting their parameters [22] D. Conway. (2010). The Data Science Venn Diagram. [Online]. Available:
and finding the correct solutions on their own, thereby http://www.dataists.com/2010/09/the-data-science-venn-diagram
[23] J. W. Tukey, ‘‘The future of data analysis,’’ Ann. Math. Statist., vol. 33,
mimicking human intelligence. In what follows, we describe no. 1, pp. 1–67, 1962.
the technical principles that govern the operation of artificial [24] J. M. Chambers, ‘‘Greater or lesser statistics: A choice for future research,’’
neural networks. Statist. Comput., vol. 3, no. 4, pp. 182–184, Dec. 1993.
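The single-neuron computation described in this appendix (a weighted sum of the inputs followed by an activation function) can be illustrated with a minimal Python sketch. The function names `sigmoid` and `neuron_output`, the numeric values, and the treatment of the first input as a constant 1 (so that its weight acts as a bias) are assumptions made for illustration only; they are not taken from the paper or from any of the no-code tools it discusses.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation: f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, activation=sigmoid):
    """Return y = f(z), where z is the weighted sum of the inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # z = sum over i of w_i * x_i
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Illustrative values only (hypothetical, not from the paper).
# x[0] = 1.0 is assumed to be a constant input, so w[0] plays the role of a bias.
x = [1.0, 0.5, -1.0]
w = [0.0, 2.0, 1.0]

# z = 0.0*1.0 + 2.0*0.5 + 1.0*(-1.0) = 0.0, so y = sigmoid(0.0) = 0.5
y = neuron_output(x, w)
print(y)
```

Passing a different callable as `activation` swaps the non-linearity without changing the weighted-sum step, which mirrors the remark that several activation functions can serve this purpose.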
[25] W. S. Cleveland, “Data science: An action plan for expanding the technical areas of the field of statistics,” Stat. Anal. Data Mining: ASA Data Sci. J., vol. 7, no. 6, pp. 414–417, Dec. 2014.
[26] S. C. Hicks and R. A. Irizarry, “A guide to teaching data science,” Amer. Statistician, vol. 72, no. 4, pp. 382–391, 2018.
[27] D. Donoho, “50 years of data science,” J. Comput. Graph. Statist., vol. 26, no. 4, pp. 745–766, Oct. 2017.
[28] N. Corte-Real, P. Ruivo, T. Oliveira, and A. Popovic, “Unlocking the drivers of big data analytics value in firms,” J. Bus. Res., vol. 97, pp. 160–173, Apr. 2019.
[29] O. Hazzan and K. Mike, Guide to Teaching Data Science: An Interdisciplinary Approach. Springer, 2023.
[30] C. Kelleher and R. Pausch, “Lowering the barriers to programming: A taxonomy of programming environments and languages for novice programmers,” ACM Comput. Surveys, vol. 37, no. 2, pp. 83–137, 2005.
[31] C. Kelleher and R. Pausch, “Using storytelling to motivate programming,” Commun. ACM, vol. 50, no. 7, pp. 58–64, Jul. 2007.
[32] J. Maloney, M. Resnick, N. Rusk, B. Silverman, and E. Eastmond, “The scratch programming language and environment,” ACM Trans. Comput. Educ., vol. 10, no. 4, pp. 1–15, Nov. 2010.
[33] Scratch Statistics. Accessed: Feb. 7, 2024. [Online]. Available: https://scratch.mit.edu/statistics/
[34] I. F. de Kereki, “Scratch: Applications in computer science 1,” in Proc. 38th Annu. Frontiers Educ. Conf., Oct. 2008, pp. 1–7.
[35] J. Estevez, G. Garate, and M. Graña, “Gentle introduction to artificial intelligence for high-school students using scratch,” IEEE Access, vol. 7, pp. 179027–179036, 2019.
[36] R. J. Brunner and E. J. Kim, “Teaching data science,” Proc. Comput. Sci., vol. 80, pp. 1947–1956, Dec. 2016.
[37] H. Habibzadeh, K. Dinesh, O. Rajabi Shishvan, A. Boggio-Dandry, G. Sharma, and T. Soyata, “A survey of healthcare Internet of Things (HIoT): A clinical perspective,” IEEE Internet Things J., vol. 7, no. 1, pp. 53–71, Jan. 2020.
[38] H. Hu, Y. Wen, T.-S. Chua, and X. Li, “Toward scalable systems for big data analytics: A technology tutorial,” IEEE Access, vol. 2, pp. 652–687, 2014.
[39] A. K. Pandey, A. I. Khan, Y. B. Abushark, Md. M. Alam, A. Agrawal, R. Kumar, and R. A. Khan, “Key issues in healthcare data integrity: Analysis and recommendations,” IEEE Access, vol. 8, pp. 40612–40628, 2020.
[40] B. K. Daniel, “Big data and data science: A critical review of issues for educational research,” Brit. J. Educ. Technol., vol. 50, no. 1, pp. 101–113, Jan. 2019.
[41] X. Qin, Y. Luo, N. Tang, and G. Li, “DeepEye: An automatic big data visualization framework,” Big Data Mining Analytics, vol. 1, no. 1, pp. 75–82, Mar. 2018.
[42] A. Perrot and D. Auber, “Cornac: Tackling huge graph visualization with big data infrastructure,” IEEE Trans. Big Data, vol. 6, no. 1, pp. 80–92, Mar. 2020.
[43] M. Aparicio and C. J. Costa, “Data visualization,” Commun. Design Quart., vol. 3, no. 1, pp. 7–11, Jan. 2015, doi: 10.1145/2721882.2721883.
[44] R. Descartes, The Philosophical Works of Descartes [2 Vols.]. Dover, 1955.
[45] E. Bisong, “Matplotlib and seaborn,” in Building Machine Learning and Deep Learning Models on Google Cloud Platform. Cham, Switzerland: Springer, 2019, pp. 151–165.
[46] J. Davis and M. Goadrich, “The relationship between precision-recall and ROC curves,” in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 233–240.
[47] A. N. Shewalkar, “Comparison of RNN, LSTM and GRU on speech recognition data,” Comput. Sci. Masters Papers, 2018.
[48] Y. Bai, J. Xie, C. Liu, Y. Tao, B. Zeng, and C. Li, “Regression modeling for enterprise electricity consumption: A comparison of recurrent neural network and its variants,” Int. J. Electr. Power Energy Syst., vol. 126, Mar. 2021, Art. no. 106612.
[49] K. H. Lycke, P. Grøttum, and H. I. Strømsø, “Student learning strategies, mental models and learning outcomes in problem-based and traditional curricula in medicine,” Med. Teacher, vol. 28, no. 8, pp. 717–722, Jan. 2006.
[50] M. Khatiban, S. N. Falahan, R. Amini, A. Farahanchi, and A. Soltanian, “Lecture-based versus problem-based learning in ethics education among nursing students,” Nursing Ethics, vol. 26, no. 6, pp. 1753–1764, Sep. 2019.
[51] L. D. Kantar and S. Sailian, “The effect of instruction on learning: Case based versus lecture based,” Teaching Learn. Nursing, vol. 13, no. 4, pp. 207–211, Oct. 2018.
[52] B. Zhao and D. D. Potter, “Comparison of lecture-based learning vs discussion-based learning in undergraduate medical students,” J. Surgical Educ., vol. 73, no. 2, pp. 250–257, Mar. 2016.
[53] W. Hung, D. H. Jonassen, and R. Liu, “Problem-based learning,” Handbook Res. Educ. Commun. Technol., vol. 3, no. 1, pp. 485–506, 2008.
[54] M. A. Albanese and L. C. Dast, “Problem-based learning,” in Understanding Medical Education: Evidence, Theory and Practice. Wiley, 2013, pp. 61–79.
[55] E. de Graaff and A. Kolmos, “Characteristics of problem-based learning,” Int. J. Eng. Educ., vol. 19, pp. 657–662, Jan. 2003.
[56] C. Onyon, “Problem-based learning: A review of the educational and psychological theory,” Clin. Teacher, vol. 9, no. 1, pp. 22–26, Feb. 2012.
[57] D. Kokotsaki, V. Menzies, and A. Wiggins, “Project-based learning: A review of the literature,” Improving Schools, vol. 19, no. 3, pp. 267–277, Nov. 2016.
[58] N. Hosseinzadeh and M. R. Hesamzadeh, “Application of project-based learning (PBL) to the teaching of electrical power systems engineering,” IEEE Trans. Educ., vol. 55, no. 4, pp. 495–501, Nov. 2012.
[59] B. Condliffe, “Project-based learning: A literature review. Working paper,” in Proc. MDRC, 2017, pp. 1–11.
[60] M. M. Grant, “Getting a grip on project-based learning: Theory, cases and recommendations,” Meridian, A Middle School Comput. Technol. J., vol. 5, no. 1, p. 83, 2002.
[61] J. S. Krajcik and P. C. Blumenfeld, Project-Based Learning. Cambridge Univ. Press, 2006.
[62] M. Saleh, M. Abbas, and R. B. Le Jeannès, “FallAllD: An open dataset of human falls and activities of daily living for classical and deep learning applications,” IEEE Sensors J., vol. 21, no. 2, pp. 1849–1858, Jan. 2021.
[63] S. C. dos Santos, “PBL-SEE: An authentic assessment model for PBL-based software engineering education,” IEEE Trans. Educ., vol. 60, no. 2, pp. 120–126, May 2017.
[64] P. Abichandani, V. Sivakumar, D. Lobo, C. Iaboni, and P. Shekhar, “Internet-of-Things curriculum, pedagogy, and assessment for STEM education: A review of literature,” IEEE Access, vol. 10, pp. 38351–38369, 2022.
[65] G. V. Helden, V. Van Der Werf, G. N. Saunders-Smits, and M. M. Specht, “The use of digital peer assessment in higher education: An umbrella review of literature,” IEEE Access, vol. 11, pp. 22948–22960, 2023.
[66] H.-P. Yueh, T.-L. Chen, L.-A. Chiu, S.-L. Lee, and A.-B. Wang, “Student evaluation of teaching effectiveness of a nationwide innovative education program on image display technology,” IEEE Trans. Educ., vol. 55, no. 3, pp. 365–369, Aug. 2012.
[67] M. Llamas-Nistal, F. A. Mikic-Fonte, M. Caeiro-Rodríguez, and M. Liz-Domínguez, “Supporting intensive continuous assessment with BeA in a flipped classroom experience,” IEEE Access, vol. 7, pp. 150022–150036, 2019.
[68] J. Moreno and A. F. Pineda, “A framework for automated formative assessment in mathematics courses,” IEEE Access, vol. 8, pp. 30152–30159, 2020.
[69] S. Draft, Computer Science Curricula. New York, NY, USA: ACM, 2013.
[70] M. McCracken, V. Almstrum, D. Diaz, M. Guzdial, D. Hagan, Y. B.-D. Kolikant, C. Laxer, L. Thomas, I. Utting, and T. Wilusz, “A multi-national, multi-institutional study of assessment of programming skills of first-year CS students,” in Working Group Reports From ITiCSE on Innovation and Technology in Computer Science Education. Association for Computing Machinery, 2001, pp. 125–180.
[71] A. Joshi, S. Kale, S. Chandel, and D. Pal, “Likert scale: Explored and explained,” Brit. J. Appl. Sci. Technol., vol. 7, no. 4, pp. 396–403, Jan. 2015.
[72] R. Parhi and R. D. Nowak, “The role of neural network activation functions,” IEEE Signal Process. Lett., vol. 27, pp. 1779–1783, 2020.

HARRY D. MAFUKIDZE received the B.Sc. degree (Hons.) in physics from Midlands State University, Gweru, Zimbabwe, in 2009, and the M.Eng. degree in electronic engineering from the University of Stellenbosch, Stellenbosch, South Africa, in 2014. He is currently with the Department of Applied Physics and Telecommunications, Midlands State University. His research interests include radar signal processing, data science, machine learning, and deep learning and their applications.
ACTION NECHIBVUTE received the B.Sc. degree in physics from Midlands State University, in 2001, the B.Sc. degree in mathematics from the University of Zimbabwe, in 2001, the M.Sc. degree in physics from the University of Botswana, in 2008, and the Ph.D. degree in physics in the area of energy harvesting for wireless sensor devices from Midlands State University, in 2015. He is currently an Academic Researcher with Midlands State University.

IRFAN ANJUM BADRUDDIN received the Graduate degree in mechanical engineering, in 1998, the Master of Technology degree, in 2001, and the Ph.D. degree in heat transfer from Universiti Sains Malaysia, in 2007. He is currently a Professor with the Department of Mechanical Engineering, King Khalid University, Saudi Arabia. He works in interdisciplinary fields. He has more than 300 articles to his credit.