Technical Background
We have seen how powerful and useful modern AI is. In this part, we study the key
factors behind the success of current AI. This will help us better understand the nature of
this technology. Along the way, we will also explain some common terminology that one
is likely to encounter when reading about AI today.
• What are the key factors that enable AI to do what seemed impossible in the past?
– artificial neural networks
– hardware acceleration
– Big Data
• What are the key factors that enable AI to be so widely available today?
– open-source code
– low-code development
– cloud computing
– edge computing
At the end, we will illustrate these factors using the transformer model, one of the most
influential AI models today.
6.1.1 The challenge
• In the traditional setting, some human programmers must give a computer each and
every instruction precisely to get it to perform a task.
• In such a setting, the complexity of the problems a computer can solve is limited by
the complexity of the instructions humans can comprehend precisely.
– For example, it is humanly impossible to describe precise rules which, when followed,
would allow a computer to tell correctly whether an arbitrary input is an image
of noodles or not, be it at the left or the right of the image, made from rice or
wheat, raw or cooked, in a soup or stir-fried, with a poached egg or wontons on
top, served in a hawker centre or in a restaurant, while excluding french fries and
beansprouts.
Image sources: [1] Tekkyy (English Wiki), Public domain, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Wanton_noodles.jpg.
[2] Alpha, CC BY-SA 2.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Curry_Laksa_-_Laksa_King_(2597729514).jpg.
[3] Ocdp, CC0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Nissin_Chicken_Ramen_002.jpg.
[4] N509FZ, CC BY-SA 4.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Gon_caau_ngau_ho_(20150222171214).JPG.
[5] https://m.facebook.com/323068641198750/posts/1202981213207484/.
[6] Photo by CEphoto, Uwe Aranas, https://commons.wikimedia.org/wiki/File:Dalian_Liaoning_China_Noodlemaker-01.jpg.
[7] Popo le Chien, CC BY-SA 3.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Rice_vermicelli.jpg.
[8] Popo le Chien, CC BY-SA 3.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Fries_2.jpg.
[9] cyclonebill, CC BY-SA 2.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Stegte_gr%C3%B8ntsager_(6290846922).jpg.
– In machine learning, the specific class of programs from which a suitable model is
to be found has to be carefully chosen: if it is too small, then it may not contain a
program that can perform the given task well enough; if it is too big, then it may
be too hard to find a suitable program in it.
• Traditionally, there are three basic machine learning paradigms: supervised learning,
unsupervised learning, and reinforcement learning.
• In supervised learning, training is done using labelled examples, meaning that each
data point possesses an associated label to be learnt by the model: the training is then
essentially a process of finding a model that can produce labels sufficiently similar to
the given ones.
– For example, in object recognition, the training data can be a large number of
images, each of which is labelled “noodles” or “non-noodles”. The model then
learns a way to reproduce the labels by looking at the images without seeing the
labels.
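• To make this concrete, here is a minimal code sketch of supervised learning in Python. The data, the "yellowness" feature, and the class of threshold rules are all made up for illustration and are not part of these notes; the point is only that training searches a class of candidate programs for one whose labels agree best with the given ones.

    # Toy labelled data: (average "yellowness" of an image, noodles label), both made up.
    examples = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]

    def classify(feature, threshold):
        """The class of candidate programs: predict 'noodles' (1) iff feature >= threshold."""
        return 1 if feature >= threshold else 0

    def training_error(threshold):
        """How often a candidate program's labels disagree with the given labels."""
        return sum(classify(x, threshold) != y for x, y in examples)

    # "Training": pick the candidate with the smallest error on the labelled examples.
    best = min((t / 100 for t in range(101)), key=training_error)
    print("learned threshold:", best, "training errors:", training_error(best))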
• (Artificial) neural networks are a particular computational architecture inspired by
neural circuits in brains.
• A neural network typically has an input layer (e.g., containing the pixels in an image,
digitized as numbers), one or more hidden layers, and an output layer (e.g., indicating
whether the input is an image of noodles).
• A layer in a neural network is composed of nodes that are often referred to as neurons.
• Each neuron stores a number, which, except for the neurons in the input layer, is
calculated using a weighted sum of the numbers stored in the neurons in the previous
layer following the structure of the network.
• An activation function (often the Rectified Linear Unit (ReLU) function, which passes
positive numbers through unchanged and outputs zero otherwise) determines how much
of this weighted sum is passed on to the next layer.
• Before passing the weighted sum to the activation function at a neuron, one sometimes
adds a fixed number called a bias to the weighted sum to adjust the activation threshold.
• The weights and the biases are the parameters of the model; they do not depend on
the input.
• By varying these parameters, one gets a whole class of different programs of the same
neural network structure that have varying behaviours.
• A right set of parameters for each neuron is required for the neural network to give
desired outputs.
• To train a neural network, one tunes the parameters to look for a program that performs
the desired task sufficiently well.
• The training is typically done by repeatedly running labelled examples (e.g., images that
are known to show noodles or non-noodles) through the neural network: by comparing
the outputs with the labels, one revises the parameters to decrease the error (see the
sketch after this list).
• One advantage of the neural network architecture is the availability of well-tested
algorithms to train it, e.g., Gradient Descent with back-propagation.
• In general, more complicated tasks require bigger neural networks in terms of the
number of parameters, but bigger neural networks take more time and more energy to
train and run.
• It is mathematically proven that neural networks, when appropriately structured and
trained, can in theory perform any task on a digital computer arbitrarily well.
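• To make the preceding description concrete, here is a minimal sketch in Python (using NumPy; the network sizes, data and learning rate are made up for illustration) of a tiny network with one hidden layer: a forward pass with a ReLU activation, followed by a single gradient-descent update of the weights and biases computed via back-propagation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy network: 2 inputs -> 3 hidden neurons (ReLU) -> 1 output (sigmoid).
    # It has 13 parameters in total: 6 + 3 in the hidden layer, 3 + 1 in the output layer.
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden-layer weights and biases
    w2, b2 = rng.normal(size=3), 0.0                # output-layer weights and bias

    x, y = np.array([0.5, -1.0]), 1.0               # one made-up labelled example

    def relu(z):
        return np.maximum(z, 0.0)                   # lets only positive numbers through

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass: each neuron holds a weighted sum of the previous layer plus a bias,
    # passed through the activation function.
    z1 = W1 @ x + b1
    h = relu(z1)
    z2 = w2 @ h + b2
    p = sigmoid(z2)                                 # predicted probability of "noodles"

    # Back-propagation: gradients of the cross-entropy loss with respect to the parameters.
    dz2 = p - y
    dw2, db2 = dz2 * h, dz2
    dh = dz2 * w2
    dz1 = dh * (z1 > 0)                             # ReLU passes gradient only where z1 > 0
    dW1, db1 = np.outer(dz1, x), dz1

    # One gradient-descent step: nudge every parameter against its gradient.
    lr = 0.1
    W1 -= lr * dW1; b1 -= lr * db1
    w2 -= lr * dw2; b2 -= lr * db2
    print("prediction before the update:", round(float(p), 3))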
6.1.4 Deep neural networks
(Figure: two schematic neural networks mapping inputs to outputs, one shallower and one deeper.)
• Deep neural networks are neural networks with multiple hidden layers.
• Deep learning is machine learning using deep neural networks.
• Deeper neural networks can perform more tasks, and are observed to generalize better,
compared with shallower networks having the same number of parameters.
• Since around 2009, deep learning has made major advances in solving problems that
had resisted the best attempts of the AI community for many years, e.g., in recognizing
images and speech, predicting the activity of drug molecules, reconstructing brain
circuits, and understanding natural language.
References: [1] Yoshua Bengio. “Learning Deep Architectures for AI”. Foundations and Trends® in Machine
Learning, vol. 2, no. 1, pp. 1–127, 2009. [2] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep
learning”. Nature, vol. 521, pp. 436–444, 2015. [3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
“ImageNet Classification with Deep Convolutional Neural Networks”. Communications of the ACM, vol. 60,
no. 6, Jun. 2017, pp. 84–90.
6.2 Hardware acceleration
• Training AI models for real-world applications typically requires massive amounts of
computation that would take impractically long to execute without specialized hardware.
• Central processing units (CPUs) of modern computers are designed to perform a small
number of general-purpose tasks at a time, very quickly.
• Exploiting the similarity between neural-network calculations and those used in processing
graphics, graphics processing units (GPUs) help speed up the training and the running of
neural networks by performing a large number of simple arithmetic operations in parallel
(see the sketch after this list).
NVIDIA® GeForce RTX™ 4090 GPU, 16384 cores, 24 GB memory, Boost Clock 2.52 GHz,
starting at SG$2700 as of Sep. 2024.
• GPUs started to become available for non-graphics use in the mid-2000s. It took only
a few years for researchers to start using GPUs to train neural networks, and the
improvement in speed ranged from 5- to 70-fold. Nowadays, GPUs have become a
popular hardware accelerator for machine learning.
• There is now also hardware developed specifically for AI computations, e.g., Google's
tensor processing units (TPUs).
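• As a minimal illustration (assuming PyTorch and a CUDA-capable GPU, neither of which these notes require), the kind of large matrix multiplication that dominates neural-network computations can be moved from the CPU to the GPU, where its many independent multiply-adds run in parallel:

    import torch

    # A large matrix multiplication: the kind of simple, highly parallel arithmetic
    # that dominates neural-network training and inference.
    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    c_cpu = a @ b                              # computed on the CPU

    if torch.cuda.is_available():              # only if a CUDA-capable GPU is present
        a_gpu = a.cuda()                       # copy the data into GPU memory
        b_gpu = b.cuda()
        c_gpu = a_gpu @ b_gpu                  # thousands of GPU cores work in parallel
        print("largest CPU/GPU difference:",
              (c_cpu - c_gpu.cpu()).abs().max().item())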
Reference: Rajat Raina, Anand Madhavan, and Andrew Y. Ng. “Large-scale deep unsupervised learning
using graphics processors”. In Proceedings of the 26th Annual International Conference on Machine Learning
(ICML ’09), Association for Computing Machinery, New York, NY, USA, pp. 873–880, 2009.
– IoT expanded quickly in recent years because of decreasing hardware costs, decreasing
cost of digital communication, and increasing device proliferation, amongst
other reasons.
• Digitalization makes other types of data, e.g., business records and customer feedback,
more easily available for training and analytics too.
• All these are Big Data as defined in §2.4, i.e., they are extensive data sets that are too
large to be analyzed using traditional methods.
• Using Big Data to train AI models is actually a way of extracting information from the
data.
• Big Data’s huge volume and variety are important in training AI models that perform
well.
• The high velocity at which Big Data are generated provides up-to-date data for training
AI, while AI provides a means to process Big Data at high velocity.
Reference: Malika Bakdi and Wassila Chadli. “Big Data: An Overview”. In Soraya Sedkaoui, Mounia
Khelfaoui, Nadjat Kadi, eds., Big Data Analytics, pp. 3–13. Apple Academic Press, 2022.
Image source: Open Source Initiative, “Logo Usage Guidelines”, 5 May 2023. https://opensource.org/logo-usage-guidelines/.
• AI researchers often make their research findings, code and data sets freely available,
e.g., on arXiv, GitHub, Papers With Code, and Hugging Face.
• This makes it easy and fast for people to build on others’ work, and modify others’
code for their own needs.
• The Open Source Initiative is working towards a definition of open-source AI that takes
into account the special nature of AI. A first version is scheduled to be released in late
Oct. 2024.
Reference: Opensource.org. “Open Source AI Deep Dive”. https://opensource.org/deepdive. Last
accessed: 14 Sep. 2024.
• AI models for specific tasks, e.g., image, sound, and pose recognition, can be built with
a no-code graphical interface, from data collection and training to evaluation. Watch
how this is done with Google’s Teachable Machine in the video below.
2 min 8 sec
Video source: Google. “Teachable Machine 2.0: Making AI easier for everyone”. YouTube, 8 Nov. 2019.
https://youtu.be/T2qQGqZxkD0.
• There are platforms with graphical interfaces that automate the process of comparing
and implementing AI algorithms for specific applications. One example is DataRobot.
See what it is like in the video below.
1 min 31 sec
Video source: DataRobot. “DataRobot AI Platform [2018 Version - Update Available]”. YouTube,
16 Apr. 2018. https://youtu.be/RrbJLm6atwc.
• AI can now automate even the selection of AI algorithms and the pre-processing of
data. This capability is called automated machine learning (AutoML). Watch Google
CEO Sundar Pichai talk about it when they first made it possible in 2017.
1 min 11 sec
Video source: Elrashid Media : Tech-meetups-Startups-Hackathons (@elrashidmediatech-meetups-2861).
“Google #IO17 | Keynote | AutoML”. YouTube, 18 May 2017. https://youtu.be/92-DoDjCdsY.
• Watch how Amazon Web Services (AWS)'s cloud computing services can help businesses
in the video below.
3 min 1 sec
Video source: Amazon Web Services. “Introduction to AWS Lambda — Serverless Compute on Ama-
zon Web Services”. YouTube, 20 May 2015. https://youtu.be/eOBq h4OJ4.
• AWS, Google Cloud, Microsoft’s cloud computing platform Azure, and Alibaba Cloud
all have services specific to AI applications.
• Google’s Colaboratory (Colab) allows users to run Python code with GPUs online, free
of charge.
• With the advance of technology, computing devices are getting smaller, cheaper, more
powerful, more power-efficient, and more flexible physically, to the extent that some
IoT devices are now able to run AI models locally.
• This creates a so-called Artificial Intelligence of Things (AIoT) system.
• Example 1 cont’d: an AI model learns what the optimal tire pressure to maintain is,
given the car model, the tire model, and the terrain frequently driven on, and makes
timely suggestions to the driver on when to pump the tires and to what pressure.
• Example 2 cont’d: the pacemaker monitors for irregular heart activity, warns the user
about it, and can send out an automated distress call via the mobile phone during
a heart attack.
Image source: Michael H. („Laserlicht“), CC BY-SA 4.0, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Raspberry_Pi_4_Model_B_-_Side.jpg.
– NVIDIA® Jetson Nano™ (Developer Kit, about SG$270 as of Sep. 2024, 100 mm ×
80 mm × 29 mm, 5–10 W power, 128-core GPU, 4-core CPU at 1.43 GHz, 4 GB
memory, designed for AI applications).
• There are also lightweight, low-power hardware accelerators that are suitable for edge
devices, for example:
– The Intel® Neural Compute Stick 2 (about SG$250 as of Sep. 2024, 72.5 mm ×
27 mm × 14 mm, plug and play via USB) contains a hardware accelerator for AI
vision applications.
– One can add a TPU to a device via a USB port using Google Coral's USB Accelerator
(about SG$80 as of Sep. 2024, 65 mm × 30 mm × 8 mm, capable of performing
4 trillion operations per second using 2 W).
• Some smartphones nowadays are equipped with (co-)processors that are designed with
AI applications in mind: for example, the iPhone 15 has a 5-core GPU and a 16-core
Neural Engine, the HUAWEI P60 Pro has an Adreno GPU and a Qualcomm AI Engine,
and Google's Pixel 9 has a Google Tensor G4 chip.
References: [1] https://www.apple.com/sg/iphone-15/specs/, last accessed: 14 Sep. 2024. [2] https://consumer.huawei.com/sg/phones/p60-pro/specs/, last accessed: 14 Sep. 2024. [3] https://store.google.com/product/pixel_9_specs, last accessed: 14 Sep. 2024.
• One can make trained AI models programmed in TensorFlow or PyTorch run on mobile
devices and on the web, for example:
– TensorFlow Lite enables one to run and retrain AI models written in TensorFlow
on mobile, microcontrollers and other edge devices.
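• As a minimal sketch (assuming the TensorFlow Python package; the tiny model below is a made-up stand-in for a trained model, not an example from these notes), a Keras model can be converted into the compact TensorFlow Lite format that such edge devices run:

    import tensorflow as tf

    # A made-up toy model standing in for a trained TensorFlow/Keras model.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Convert the model to the compact TensorFlow Lite format used on edge devices.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    # Save the converted model; a phone or microcontroller runtime can then load it.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)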
• Fifth-generation (5G) cellular networks enable IoT and AIoT devices to communicate
with one another much faster, which is extremely important, for example, in autonomous
vehicles and remote surgeries.
• AIoT systems bring many advantages, but they also raise some concerns, for example:
– The sharing of data, e.g., health data, amongst edge devices raises privacy and
security issues.
– As these systems are typically connected to the Internet, such data may even be
sent to the cloud without the user knowing.
– The reliance on AI in decision making increases the severity of malicious attacks.
• A special feature of transformer models is that, when processing a word, the surrounding
words are directly taken into account as well (this is the so-called attention mechanism).
• Transformer models are generally pretrained only for simple tasks like predicting a
masked word or the next word/sentence in a given piece of text (a code sketch of the
masked-word task appears after this list).
• This pretraining is self-supervised, in the sense that the training data come unlabelled
like in unsupervised learning, but labels are extracted from the data, which the model
then learns like in supervised learning.
• Via transfer learning, the pretrained models can then be trained with smaller datasets
for more specific tasks, e.g., text categorization, named entity recognition, rudimentary
reading comprehension, question answering, summarization, and translations (between
natural languages and between programming languages).
• As a result, these pretrained models are sometimes called foundation models.
• Transformer-based large language models are mostly hosted on the cloud.
• Meta’s LLaMa is open source, but OpenAI’s ChatGPT is not, as of Sep. 2024.
• As of Sep. 2024, the free version of ChatGPT is powered by GPT-4o mini, where the
acronym GPT stands for “generative pretrained transformer”.
• More technical information is publicly available about GPT-3, a predecessor of GPT-4o
mini.
– The GPT-3 model has 175 billion parameters and is 96 layers deep.
– The training data for GPT-3 consist of over 300 billion words (or, more precisely,
tokens).
– It is estimated that the training for GPT-3 would have taken 34 days if 1024 NVIDIA
Tensor Core A100 GPUs were used.
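– As a rough sanity check of that estimate (not taken from the cited reference), one can use the common approximation that training costs about 6 floating-point operations per parameter per token, NVIDIA's quoted peak of roughly 312 teraFLOP/s per A100 in half precision, and an assumed fraction of that peak actually sustained; all three figures are assumptions for illustration.

    params = 175e9            # GPT-3 parameters
    tokens = 300e9            # training tokens
    flops_needed = 6 * params * tokens          # ~3.2e23 floating-point operations

    gpus = 1024
    peak_per_gpu = 312e12     # approximate peak half-precision FLOP/s of one A100
    utilization = 0.35        # assumed fraction of peak actually sustained

    seconds = flops_needed / (gpus * peak_per_gpu * utilization)
    print(round(seconds / 86400), "days")       # roughly a few tens of days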
• Transformer models find applications also in protein structure prediction, speech
recognition, image classification, and video classification.
• A similar training method gives the popular diffusion model for images, which is trained
to remove noise added to images.
– The text-to-image programs DALL·E and Stable Diffusion mentioned in §3.6 are
both based on diffusion models.
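• As a minimal sketch of the masked-word pretraining task mentioned above (assuming the Hugging Face transformers package and the publicly available bert-base-uncased model, neither of which is prescribed by these notes), a pretrained transformer can be asked to fill in a blanked-out word:

    from transformers import pipeline

    # Load a small pretrained transformer that was trained to predict masked words.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Ask the model to fill in the blank; it returns its most likely guesses with scores.
    for guess in fill_mask("Noodles are usually made from [MASK] or wheat flour."):
        print(round(guess["score"], 3), guess["token_str"])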
References: [1] Jakob Uszkoreit. “Transformer: A Novel Neural Network Architecture for Language Understanding”. Google Research, 31 Aug. 2017. https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html. Last accessed: 14 Sep. 2024. [2] Craig S. Smith. “Battle Of The Bots: China's ChatGPT Comes Out Swinging To Challenge OpenAI”. Forbes, 24 Mar. 2023. https://www.forbes.com/sites/craigsmith/2023/03/24/battle-of-the-bots-baidus-ernie-comes-out-swinging-to-challenge-openai/. Last accessed: 14 Sep. 2024. [3] Rick Merritt. “What Is a Transformer Model?”. NVIDIA blog, 25 Mar. 2022. https://blogs.nvidia.com/blog/2022/03/25/what-is-a-transformer-model/. Last accessed: 14 Sep. 2024. [4] Tom Brown, et al. “Language models are few-shot learners”. Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020. [5] Deepak Narayanan, et al. “Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM”. SC ’21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, art. no. 58, pp. 1–15, Nov. 2021.
6.9 Reflection
• We saw a greatly simplified picture of how AI works under the hood, and the key
resources it relies on.
• Are you more interested in being a user or a developer of AI, or both?
• What would you most want to achieve using Google’s Teachable Machine we saw
in §6.5?
• Do you think current AI qualifies as being intelligent? Why?