
HS1501 §6

Technical background

We have seen how powerful and how useful modern AI is. In this part, we study the key
factors behind the success of current AI. This will help us better understand the nature of
this technology. In the process, we will also explain some common terminology that one
is likely to run into when reading about AI today.
• What are the key factors that enable AI to do what seemed impossible in the past?
– artificial neural networks
– hardware acceleration
– Big Data
• What are the key factors that enable AI to be so widely available today?
– open-source code
– low-code development
– cloud computing
– edge computing
At the end, we will illustrate these factors using the transformer model, which is one of the
most influential AI models today.

6.1 Artificial neural networks

Image source: Lollixzc, CC BY-SA 4.0, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:AI hierarchy.svg.

6.1.1 The challenge
• In the traditional setting, some human programmers must give a computer each and
every instruction precisely to get it to perform a task.
• In such a setting, the complexity of the problems a computer can solve is limited by
the complexity of the instructions humans can comprehend precisely.
– For example, it is humanly impossible to describe precise rules that, when followed,
would allow a computer to tell correctly whether an arbitrary input is an image
of noodles or not, be it at the left or the right of the image, made from rice or
wheat, raw or cooked, in a soup or stir-fried, with a poached egg or wontons on
top, served in a hawker centre or in a restaurant, while excluding french fries and
beansprouts.

Image sources: [1] Tekkyy (English Wiki), Public domain, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Wanton noodles.jpg.
[2] Alpha, CC BY-SA 2.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Curry Laksa - Laksa King (2597729514).jpg.
[3] Ocdp, CC0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Nissin Chicken Ramen 002.jpg.
[4] N509FZ, CC BY-SA 4.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Gon caau ngau ho (20150222171214).JPG.
[5] https://m.facebook.com/323068641198750/posts/1202981213207484/.
[6] Photo by CEphoto, Uwe Aranas, https://commons.wikimedia.org/wiki/File:Dalian Liaoning China Noodlemaker-01.jpg.
[7] Popo le Chien, CC BY-SA 3.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Rice vermicelli.jpg.
[8] Popo le Chien, CC BY-SA 3.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Fries 2.jpg.
[9] cyclonebill, CC BY-SA 2.0, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Stegte gr%C3%B8ntsager (6290846922).jpg.

6.1.2 Machine learning


• These days, one popular solution is to use a (generally simpler) algorithm to find, within
a specific class of (generally more complicated) programs, one that performs the given
task well.
– This approach is known as machine learning.
– The program to be found is called a model, as in §1.1.
– The algorithmic process of finding a model is called training (from the point of
view of the machine learning algorithm) or learning (from the point of view of the
model).

– In machine learning, the specific class of programs from which a suitable model is
to be found has to be carefully chosen: if it is too small, then it may not contain a
program that can perform the given task well enough; if it is too big, then it may
be too hard to find a suitable program in it.
• Traditionally, there are three basic machine learning paradigms: supervised learning,
unsupervised learning, and reinforcement learning.
• In supervised learning, training is done using labelled examples, meaning that each
data point possesses an associated label to be learnt by the model: the training is then
essentially a process of finding a model that can produce labels sufficiently similar to
the given ones.

– For example, in object recognition, the training data can be a large number of
images, each of which is labelled “noodles” or “non-noodles”. The model then
learns a way to produce the correct labels from the images alone (a toy code
sketch at the end of this subsection illustrates the idea on made-up data).

• In unsupervised learning, training is done by identifying patterns in unlabelled data.


– For example, in data analytics, the model can use the transactions on an online
marketplace as training data to identify shopping patterns, according to which
users can be categorized.
• In reinforcement learning, a model learns an action policy through trial and error using
a reward/punishment system.
– For example, DeepMind’s virtual robot in §4.3 learnt to walk using forward progress
as reward.
• In all cases, a trained model is supposed to generalize, in the sense that it performs
well even on inputs that it has not seen during training.
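
As a deliberately toy illustration of supervised learning, the sketch below (Python with
NumPy only; the two numerical features and all the data are made up, and a real image
classifier would work on raw pixels with a far more complex model) represents each dish by
two hand-crafted features, attaches a label, and uses a simple nearest-neighbour rule as the
“model”.

```python
import numpy as np

# Made-up labelled examples: each data point has two features
# (imagine, say, "strand length" and "wheat content", scaled to 0..1)
# and a label: 1 for "noodles", 0 for "non-noodles".
X_train = np.array([[0.90, 0.80], [0.80, 0.30], [0.70, 0.90],   # noodles
                    [0.10, 0.20], [0.20, 0.90], [0.05, 0.10]])  # non-noodles
y_train = np.array([1, 1, 1, 0, 0, 0])

def predict(x):
    # A 1-nearest-neighbour "model": label a new input like its closest
    # labelled example. "Training" here is just memorising the examples.
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict(np.array([0.85, 0.70])))  # 1, i.e., classified as "noodles"
print(predict(np.array([0.15, 0.30])))  # 0, i.e., classified as "non-noodles"
```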

6.1.3 Neural networks

Image source: BrunelloN, CC BY-SA 4.0, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Example of a deep neural network.png.

• Many AI models nowadays are based on deep neural networks.

• (Artificial) neural networks are a particular computational architecture inspired by
neural circuits in brains.
• A neural network typically has an input layer (e.g., containing the pixels in an image,
digitized as numbers), one or more hidden layers, and an output layer (e.g., indicating
whether the input is an image of noodles).

• A layer in a neural network is composed of nodes that are often referred to as neurons.
• Each neuron stores a number which, except for the neurons in the input layer, is
calculated as a weighted sum of the numbers stored in the neurons of the previous
layer, following the connections of the network.

• An activation function (often the Rectified Linear Unit (ReLU) function, which lets
positive numbers through and outputs zero otherwise) determines whether, and how much
of, this weighted sum goes through to the next layer.
• Before passing the weighted sum to the activation function at a neuron, one sometimes
adds a fixed number called a bias to the weighted sum to adjust the activation threshold
(a small code sketch at the end of this subsection illustrates these calculations).

• The weights and the biases are the parameters of the model; they do not depend on
the input.
• By varying these parameters, one gets a whole class of different programs of the same
neural network structure that have varying behaviours.

• A suitable set of parameters at each neuron is required for the neural network to give
the desired outputs.
• To train a neural network, one tunes the parameters to look for a program that performs
the desired task sufficiently well.

• The training is typically done by repeatedly running labelled examples (e.g., images that
are known to show noodles or non-noodles) through the neural network: by comparing
the outputs with the labels, one revises the parameters to decrease the error.
• One advantage of the neural network architecture is the availability of well-tested
algorithms to train it, e.g., Gradient Descent with back-propagation.

• In general, more complicated tasks require bigger neural networks in terms of the
number of parameters, but bigger neural networks take more time and more energy to
train and run.
• It is mathematically proven that neural networks, when appropriately structured and
trained, can in theory perform any task on a digital computer arbitrarily well.
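
The following minimal sketch (Python with NumPy; the layer sizes and all parameter values
are made up, since in practice they would come from training) shows how one hidden layer
computes its numbers from the input layer using weighted sums, biases, and ReLU.

```python
import numpy as np

def relu(x):
    # ReLU lets positive numbers through and replaces negative numbers with zero.
    return np.maximum(0, x)

# A made-up "input layer": four input values, digitized as numbers.
inputs = np.array([0.2, 0.7, 0.1, 0.9])

# Parameters of a hidden layer with three neurons: a 3x4 weight matrix and three
# biases. In a real model these values are found by training, not chosen by hand.
weights = np.array([[ 0.5, -0.2,  0.1,  0.3],
                    [-0.4,  0.8,  0.0,  0.2],
                    [ 0.1,  0.1, -0.6,  0.4]])
biases = np.array([0.0, -0.5, 0.1])

# Each hidden neuron stores the weighted sum of the previous layer plus its bias,
# passed through the activation function.
hidden = relu(weights @ inputs + biases)
print(hidden)  # three numbers, one per hidden neuron
```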

6.1.4 Deep neural networks

[Figure: two schematic neural networks mapping inputs to outputs, one shallower and one deeper.]

• Deep neural networks are neural networks with multiple hidden layers (the sketch after
the references below shows one way to write such a network in code).
• Deep learning is machine learning using deep neural networks.
• Deeper neural networks can perform more tasks, and are observed to generalize better,
compared to shallower networks with the same number of parameters.

• Since around 2009, deep learning has made major advances in solving problems that
had resisted the best attempts of the AI community for many years, e.g., in recognizing
images and speech, predicting the activity of drug molecules, reconstructing brain
circuits, and understanding natural language.
References: [1] Yoshua Bengio. “Learning Deep Architectures for AI”. Foundations and Trends® in Machine
Learning, vol. 2, no. 1, pp. 1–127, 2009. [2] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep
learning”. Nature, vol. 521, pp. 436–444, 2015. [3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.
“ImageNet Classification with Deep Convolutional Neural Networks”. Communications of the ACM, vol. 60,
no. 6, Jun. 2017, pp. 84–90.
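
As referenced above, here is a minimal sketch in PyTorch (layer sizes are arbitrary and
chosen only for illustration) of a deep neural network with four hidden layers; the last line
counts the parameters (weights and biases) that training would have to tune.

```python
import torch.nn as nn

# A small deep network: an input layer of 784 values, four hidden layers of 64
# neurons each with ReLU activations, and a 2-neuron output layer
# (e.g., "noodles" vs "non-noodles").
model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),  # input layer -> hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 2
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 3
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 4
    nn.Linear(64, 2),               # output layer
)
print(sum(p.numel() for p in model.parameters()))  # total number of parameters
```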

6.2 Hardware acceleration
• Training AI models for real-world applications typically requires massive amounts of
computation that would take impractically long to execute without specialized hardware.
• Central processing units (CPUs) of modern computers are designed to perform a small
number of general-purpose tasks at a time, very fast.
• Exploiting the similarities with the calculations used in processing graphics, graphics
processing units (GPUs) help speed up the training and the running of neural networks
by performing a large number of simple arithmetic operations in parallel (see the sketch
at the end of this section).

Image source: ZMASLO, CC BY 3.0, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:NVIDIA RTX 4090 Founders Edition - Verpackung (ZMASLO).png.

NVIDIA® GeForce RTX™ 4090 GPU, 16384 cores, 24 GB memory, Boost Clock 2.52 GHz,
starting at SG$2700 as of Sep. 2024.
• It was in the mid-2000s that GPUs started to become available for non-graphics use. It
took only a few years for researchers to start using GPUs to train neural networks, and
the reported improvements in speed ranged from 5- to 70-fold. Nowadays, GPUs have
become a popular hardware accelerator for machine learning.
• There is now also specialized hardware developed for AI computations, e.g., Google’s
tensor processing units (TPUs).
Reference: Rajat Raina, Anand Madhavan, and Andrew Y. Ng. “Large-scale deep unsupervised learning
using graphics processors”. In Proceedings of the 26th Annual International Conference on Machine Learning
(ICML ’09), Association for Computing Machinery, New York, NY, USA, pp. 873–880, 2009.
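
As a small illustration of how such acceleration is used in practice, the sketch below shows a
common PyTorch idiom (not specific to these notes) for running the same computation on a
GPU when one is available and falling back to the CPU otherwise.

```python
import torch

# Pick a GPU ("cuda") when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move a model and a batch of inputs onto the chosen device; the same code then
# runs on either processor, only the speed differs.
model = torch.nn.Linear(1000, 10).to(device)
batch = torch.randn(64, 1000, device=device)
outputs = model(batch)
print(outputs.shape, device)
```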

6.3 Big Data


• We saw in §6.1.2 above that data are needed in training AI models.
• The growth of the Internet and the Internet of Things (IoT) has made more and more
data readily available for different kinds of training and analytics.
– The IoT refers to infrastructures in which multiple physical devices, typically with
sensors or interactive components, exchange data with one another over the Internet
or another kind of communication network (a toy code sketch at the end of this
section illustrates such an exchange).
– Often, this exchange of data does not require human intervention.
– Example 1: in a smart car, tire pressure sensors send measurements to the car
dashboard, so that the driver can continuously monitor tire condition without
leaving the car.
– Example 2: a pacemaker can send heart activity data to the user’s mobile phone.
– IoT systems bring data from the physical world into the digital world.

– The IoT has expanded quickly in recent years because of decreasing hardware costs,
decreasing costs of digital communication, and increasing device proliferation, amongst
other reasons.
• Digitalization makes other types of data, e.g., business records and customer feedback,
more easily available for training and analytics too.

• All these are Big Data as defined in §2.4, i.e., they are extensive data sets that are too
large to be analyzed using traditional methods.
• Using Big Data to train AI models is actually a way of extracting information from the
data.

• Big Data’s huge volume and variety are important in training AI models that perform
well.
• The high velocity at which Big Data are generated provides up-to-date data for training
AI, while AI provides a means to process Big Data at high velocity.
Reference: Malika Bakdi and Wassila Chadli. “Big Data: An Overview”. In Soraya Sedkaoui, Mounia
Khelfaoui, Nadjat Kadi, eds., Big Data Analytics, pp. 3–13. Apple Academic Press, 2022.
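
For illustration only, here is a toy sketch of the kind of machine-to-machine exchange
described above: a sensor device posting a reading to a server over HTTP. The endpoint URL
and field names are made up, and real IoT deployments often use dedicated messaging
protocols (e.g., MQTT) rather than plain HTTP.

```python
import json
import urllib.request

# A made-up tire-pressure reading, as in Example 1.
reading = {"sensor": "tire_front_left", "pressure_kpa": 231.0}

# The device packages the reading and sends it to a (hypothetical) server,
# with no human intervention involved.
req = urllib.request.Request(
    "https://example.com/telemetry",  # hypothetical endpoint
    data=json.dumps(reading).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("server replied with status", resp.status)
except OSError as err:  # expected here, since the endpoint above is not real
    print("sending failed:", err)
```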

6.4 Open-source code

Image source: Open Source Initiative, “Logo Usage Guidelines”, 5 May 2023. https://opensource.org/logo-usage-guidelines/.

• AI researchers often make their research findings, code and data sets freely available,
e.g., on arXiv, GitHub, Papers With Code, and Hugging Face.
• This makes it easy and fast for people to build on others’ work, and modify others’
code for their own needs.

• Two popular software libraries/frameworks for programming AI algorithms are TensorFlow
(developed by Google Brain) and PyTorch (originally developed by Facebook AI, now
Meta AI; the project is now part of the Linux Foundation). A tiny comparative sketch
appears after the references below.
• Both are open source, meaning in particular that the source code is widely and freely
available, and it may be redistributed and modified freely.
• Both allow computation on one or more CPUs and GPUs.
• Both can be used with Python, which is also open source, and is arguably the top
programming language for AI applications as of today.
References: [1] Preston Fore. Reviewed by Jasmine Suarez. “AI programming languages power
today’s innovations like ChatGPT. These are some of the most popular”. Fortune Recommends,
2 Mar. 2024. https://fortune.com/education/articles/ai-programming-languages/. Last
accessed: 14 Sep. 2024. [2] Vinita Silaparasetty. “Top 10 AI Programming Languages: A Beginner’s
Guide to Getting Started”. DataCamp, Jun. 2024. https://www.datacamp.com/blog/ai-programming-languages. Last accessed: 14 Sep. 2024.
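
To give a first feel for the two frameworks mentioned above, here is a tiny, purely
illustrative sketch of the same computation (a small matrix-vector product) written with
each; both print statements should show the same numbers.

```python
import tensorflow as tf
import torch

# The same computation in each framework: multiply a 2x2 matrix by a vector of ones.
x_tf = tf.constant([[1.0, 2.0], [3.0, 4.0]]) @ tf.constant([[1.0], [1.0]])
x_pt = torch.tensor([[1.0, 2.0], [3.0, 4.0]]) @ torch.tensor([[1.0], [1.0]])

print(x_tf.numpy().ravel())  # [3. 7.]
print(x_pt.numpy().ravel())  # [3. 7.]
```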

• The Open Source Initiative is working towards a definition of open-source AI that takes
into account the special nature of AI. A first version is scheduled to be released in late
Oct. 2024.
Reference: Opensource.org. “Open Source AI Deep Dive”. https://opensource.org/deepdive. Last
accessed: 14 Sep. 2024.

6.5 Low-code development


• With TensorFlow and PyTorch, simple AI algorithms can be programmed with fewer
than ten lines of code, as the sketch below illustrates.
Reference: Google Developers. “Hello World – Machine Learning Recipes #1”. YouTube, 31 Mar. 2016.
https://youtu.be/cKxRvEZd3Mw, 6 min 52 sec.
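
As an illustration of the point above (this is a hedged sketch using TensorFlow’s Keras API,
not necessarily the code shown in the referenced video), the following few lines define and
train a one-neuron model that learns the rule y = 2x - 1 from six labelled examples.

```python
import numpy as np
import tensorflow as tf

# Six labelled examples of the rule y = 2x - 1.
xs = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])   # inputs
ys = np.array([[-3.0], [-1.0], [1.0], [3.0], [5.0], [7.0]])  # labels
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mean_squared_error")
model.fit(xs, ys, epochs=500, verbose=0)
print(model.predict(np.array([[10.0]])))  # approximately 19
```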

• AI models for specific tasks, e.g., image, sound, and pose recognition, can be built with
a no-code graphical interface, from data collection and training to evaluation. Watch
how this is done with Google’s Teachable Machine in the video below.

2 min 8 sec
Video source: Google. “Teachable Machine 2.0: Making AI easier for everyone”. YouTube, 8 Nov. 2019.
https://youtu.be/T2qQGqZxkD0.

• There are platforms with graphical interfaces that automate the process of comparing
and implementing AI algorithms for specific applications. One example is DataRobot.
See what it is like in the video below.

1 min 31 sec
Video source: DataRobot. “DataRobot AI Platform [2018 Version - Update Available]”. YouTube,
16 Apr. 2018. https://youtu.be/RrbJLm6atwc.

• AI can now automate even the selection of AI algorithms and the pre-processing of
data. This capability is called automated machine learning (AutoML). Watch Google
CEO Sundar Pichai talk about it when Google first introduced it in 2017.

1 min 11 sec
Video source: Elrashid Media : Tech-meetups-Startups-Hackathons (@elrashidmediatech-meetups-2861).
“Google #IO17 | Keynote | AutoML”. YouTube, 18 May 2017. https://youtu.be/92-DoDjCdsY.

6.6 Cloud computing


• The cloud refers to a distributed network of servers, accessible via the Internet, that
delivers services such as software, hardware, and data storage virtually.
• Installing and maintaining the hardware required to run complex AI on a commercial
scale can be prohibitively expensive, especially for Small and Medium Enterprises
(SMEs).
• Cloud computing services provide smaller companies with an affordable option to equip
themselves with powerful AI capabilities that drive their products.
• Here are the characteristics of cloud computing.
– On-demand self-service: the resources are available whenever the user wants
them.
– Broad network access: the resources are available through the Internet on
common consumer devices.
– Resource pooling: the resources are deployed to serve multiple users.
– Rapid elasticity: users can choose the amount and the type of resources they
get dynamically.
– Measured service: the user pays according to the amount of resources they use.
– The service provider, not the user, is responsible for the set-up, the management,
the maintenance, and the security of the software and the hardware resources.
Reference: Peter Mell and Timothy Grance. “The NIST Definition of Cloud Computing”. National
Institute of Standards and Technology, NIST Special Publication 800-145, Sep. 2011. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf.

• Watch how Amazon Web Services (AWS)’s cloud computing service can help businesses
in the video below.

3 min 1 sec

Video source: Amazon Web Services. “Introduction to AWS Lambda — Serverless Compute on Amazon
Web Services”. YouTube, 20 May 2015. https://youtu.be/eOBq h4OJ4.

• AWS, Google Cloud, Microsoft’s cloud computing platform Azure, and Alibaba Cloud
all have services specific to AI applications.
• Google’s Colaboratory (Colab) allows users to run Python code with GPUs online, free
of charge.
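
As a small, hedged illustration of the Colab point above, the following is the kind of check
one might run in a Colab notebook to confirm that a GPU has been allocated (the exact device
listed depends on the runtime chosen).

```python
import tensorflow as tf

# On a Colab runtime with a GPU enabled, this lists the allocated GPU device;
# on a CPU-only runtime, the list is simply empty.
print(tf.config.list_physical_devices("GPU"))
```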

6.7 Edge computing


• Edge computing refers to the idea of storing data and performing computations on
them at the edge, i.e., at or close to the data sources and the users, e.g., on a sensor, a
mobile phone, or, more generally, an IoT device.

• With the advance of technology, computing devices are getting smaller, cheaper, more
powerful, more power-efficient, and more flexible physically, to the extent that some
IoT devices are now able to run AI models locally.
• This creates a so-called Artificial Intelligence of Things (AIoT) system.
• Example 1 cont’d: an AI model learns what the optimal tire pressure to maintain is,
given the car model, the tire model, and the terrain frequently driven on, and makes
timely suggestions to the driver on when to pump the tires and to what pressure.
• Example 2 cont’d: the pacemaker monitors for irregular heart activity, warns the user
about it, and can send out an automated distress call via the mobile phone during
a heart attack.

• A number of small, power-efficient single-board computers can be used to deploy AI in
IoT devices, for example:
– Raspberry Pi (model 4B, from SG$72 as of Sep. 2024, 85 mm × 56 mm × 17 mm,
Power over Ethernet, 4-core CPU at 1.5GHz, from 2GB memory, originally designed
for educational use);

Image source: Michael H. („Laserlicht“), CC BY-SA 4.0, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Raspberry Pi 4 Model B - Side.jpg.

– NVIDIA® Jetson Nano™ (Developer Kit, about SG$270 as of Sep. 2024, 100 mm ×
80 mm × 29 mm, 5–10W power, 128-core GPU, 4-core CPU at 1.43GHz, 4GB
memory, designed for AI applications).

Image source: SparkFun Electronics, via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:NVIDIA Jetson Nano Developer Kit %2847616885631%29.jpg.

• There are also lightweight, low-power hardware accelerators that are suitable for edge
devices, for example:
– The Intel® Neural Compute Stick 2 (about SG$250 as of Sep. 2024, 72.5 mm ×
27 mm × 14 mm, plug and play via USB) contains a hardware accelerator for AI
vision applications.
– One can add a TPU to a device via a USB port using Google Coral’s USB Accelerator
(about SG$80 as of Sep. 2024, 65 mm × 30 mm × 8 mm, capable of performing
4 trillion operations per second using 2W).
• Some smartphones nowadays are equipped with (co-)processors that are designed with
AI applications in mind: for example, the iPhone 15 has a 5-core GPU and a 16-core Neural
Engine, the HUAWEI P60 Pro has an Adreno GPU and a Qualcomm AI Engine, and
Google’s Pixel 9 has a Google Tensor G4.
References: [1] https://www.apple.com/sg/iphone-15/specs/, last accessed: 14 Sep. 2024.
[2] https://consumer.huawei.com/sg/phones/p60-pro/specs/, last accessed: 14 Sep. 2024.
[3] https://store.google.com/product/pixel 9 specs, last accessed: 14 Sep. 2024.

• One can make trained AI models programmed in TensorFlow or PyTorch run on mobile
devices and on the web, for example:
– TensorFlow Lite enables one to run and retrain AI models written in TensorFlow
on mobile devices, microcontrollers, and other edge devices (a minimal conversion
sketch appears at the end of this section).

• Fifth-generation (5G) cellular networks enable IoT and AIoT devices to communicate
with one another much faster, which is extremely important, for example, in autonomous
vehicles and remote surgery.
• Here are some advantages of AIoT systems.

– The incorporation of AI allows IoT devices to perform a wider range of functions.


– Not having to send the data over to the cloud for processing can improve the speed
of the operation and save network bandwidth.
– Having the data stored and processed at the edge avoids the privacy and the
security issues of sending the data to the cloud for processing.
– There is sometimes no network connection, e.g., for drones flying in remote areas
or for robots working underground, in which case the AI must run on the device
itself.
• Here are some issues associated with the use of AIoT systems.

– The sharing of data, e.g., health data, amongst edge devices raises privacy and
security issues.
– As these systems are typically connected to the Internet, such data may even be
sent to the cloud without the user knowing.
– The reliance on AI in decision making increases the potential damage that malicious
attacks can cause.

Listen to Prof. Yu talk about the potential of AIoT systems.

2 min 9 sec
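
Referring back to the TensorFlow Lite point above, here is a minimal, hedged sketch of how a
trained Keras model might be converted to the TensorFlow Lite format for deployment on an
edge device; the model here is just a stand-in, and real deployments usually also apply
optimizations such as quantization.

```python
import tensorflow as tf

# A stand-in for a trained model; in practice it would be loaded from disk.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])

# Convert the model to the compact TensorFlow Lite format...
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# ...and save it; the resulting .tflite file is what gets shipped to the edge device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```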

6.8 Example: Transformer


• In 2017, Google Brain introduced the Transformer model for natural language processing,
which was shown to outperform a number of other AI NLP models that had been in use.
• Many powerful chatbots mentioned in §2.9, e.g., OpenAI’s ChatGPT, Google’s Gemini,
Meta’s LLaMA, and reportedly Baidu’s ERNIE bot, are based on Transformer models.
• Transformer models use neural networks.

• A special feature of Transformer models is that, when processing a word, the surrounding
words are directly involved as well; the mechanism behind this is called attention (a small
numerical sketch appears at the end of this section).
• Transformer models are generally pretrained only for simple tasks like predicting a
masked word or the next word/sentence in a given piece of text.

• This pretraining is self-supervised, in the sense that the training data come unlabelled
like in unsupervised learning, but labels are extracted from the data, which the model
then learns like in supervised learning.
• Via transfer learning, the pretrained models can then be trained with smaller datasets
for more specific tasks, e.g., text categorization, named entity recognition, rudimentary
reading comprehension, question answering, summarization, and translations (between
natural languages and between programming languages).
• As a result, these pretrained models are sometimes called foundation models.
• Transformer-based large language models are mostly hosted on the cloud.

• Meta’s LLaMA is open source, but OpenAI’s ChatGPT is not, as of Sep. 2024.
• As of Sep. 2024, the free version of ChatGPT is powered by GPT-4o mini, where the
acronym GPT stands for “generative pretrained transformer”.
• More technical information is publicly available about GPT-3, a predecessor of GPT-4o
mini.

– The GPT-3 model has 175 billion parameters and is 96 layers deep.

– The training data for GPT-3 consist of over 300 billion words (or, more precisely,
tokens).
– It is estimated that the training of GPT-3 would have taken 34 days if 1024 NVIDIA
A100 Tensor Core GPUs had been used.

• Transformer models also find applications in protein structure prediction, speech
recognition, image classification, and video classification.
• A similar training method gives the popular diffusion model for images, which is trained
to remove noise added to images.
– The text-to-image programs DALL·E and Stable Diffusion mentioned in §3.6 are
both based on diffusion models.
References: [1] Jakob Uszkoreit. “Transformer: A Novel Neural Network Architecture for Language
Understanding”. Google Research, 31 Aug. 2017. https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html. Last accessed: 14 Sep. 2024. [2] Craig S. Smith. “Battle Of The Bots: China’s ChatGPT
Comes Out Swinging To Challenge OpenAI”. Forbes, 24 Mar. 2023. https://www.forbes.com/sites/craigsmith/2023/03/24/battle-of-the-bots-baidus-ernie-comes-out-swinging-to-challenge-openai/. Last
accessed: 14 Sep. 2024. [3] Rick Merritt. “What Is a Transformer Model?”. NVIDIA blog, 25 Mar. 2022.
https://blogs.nvidia.com/blog/2022/03/25/what-is-a-transformer-model/. Last accessed: 14 Sep. 2024.
[4] Tom Brown, et al. “Language models are few-shot learners”. Advances in Neural Information Processing
Systems, vol. 33, pp. 1877–1901, 2020. [5] Deepak Narayanan, et al. “Efficient Large-Scale Language Model
Training on GPU Clusters Using Megatron-LM”. SC ’21: Proceedings of the International Conference for
High Performance Computing, Networking, Storage and Analysis, art. no. 58, pp. 1–15, Nov. 2021.
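
To make the idea that the surrounding words are directly involved concrete, here is a small,
self-contained numerical sketch (NumPy only, with made-up vectors) of the scaled dot-product
attention computation at the heart of Transformer models; in a real model the query, key,
and value vectors are produced by projections learnt during training.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Made-up 4-dimensional query (Q), key (K), and value (V) vectors for a
# three-word sentence; real Transformers learn the projections that produce them.
Q = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
K = Q.copy()
V = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]])

d = Q.shape[-1]
weights = softmax(Q @ K.T / np.sqrt(d))  # how strongly each word attends to every word
output = weights @ V                     # each word's new value mixes in the surrounding words
print(np.round(weights, 2))
print(np.round(output, 2))
```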

6.9 Reflection
• We saw a greatly simplified picture of how AI works under the hood, and the key
resources it relies on.
• Are you more interested in being a user or a developer of AI, or both?
• What would you most want to achieve using Google’s Teachable Machine, which we saw
in §6.5?
• Do you think current AI qualifies as being intelligent? Why?

