Block 3
AI and Machine Learning
Indira Gandhi
National Open University
MIO-002
School of Engineering & Technology SMART TECHNOLOGIES
(HARDWARE AND
SOFTWARE)
Block
3
AI AND MACHINE LEARNING
UNIT 7
Basics of AI
UNIT 8
Basics of Machine Learning / Introduction to Machine Learning
UNIT 9
AI and Machine Learning for Smart Cities
MIO - 002 : SMART TECHNOLOGIES (HARDWARE AND SOFTWARE)
BLOCK 3: AI AND MACHINE LEARNING
GUIDANCE
Prof. Nageshwar Rao Prof. Satyakam Prof. Ashish Agarwal
Vice-Chancellor, IGNOU PVC, IGNOU Director, SOET, IGNOU
October 2022
©Indira Gandhi National Open University, (IGNOU)2022 CRC Prepared in 2021
ISBN - 978-93-5568-577-3
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without
permission in writing from the Indira Gandhi National Open University (IGNOU).
Further Information on the Indira Gandhi National Open University (IGNOU) courses may be obtained from the
University’s office at Maidan Garhi, New Delhi-110068.
Laser Typesetting by School of Engineering and Technology, IGNOU, New Delhi-110 068. Phone 29532863
Printed and published as Digital Materials on behalf of School of Engineering and Technology (SOET), Indira Gandhi
National Open University, New Delhi 110068.
UNIT 7 BASICS OF AI
Structure
7.1 Introduction
7.2 What is AI?
7.3 Components of Artificial Intelligence
7.4 Fields of Applications of AI
7.5 Implementation of AI
7.6 The Future of AI
7.7 AI Ethics
7.8 Summary
7.9 Keywords
7.10 Check Your Progress – Possible Answers
7.11 References and Selected Readings
7.1 INTRODUCTION
How can we decide whether a computer has Artificial Intelligence? Alan Turing proposed a test to
answer this question. A computer that satisfies Turing’s definition of Artificial
Intelligence should be able to do things that are expected from humans such as
writing an essay, recognizing pictures of celebrities, engaging in conversation,
composing music, solving reasoning tests, and so on. To pass the Turing test,
AI should have capabilities like natural language processing, knowledge
representation, reasoning, and machine learning. The domain of natural
language processing (NLP) is concerned with understanding how human
languages like English can be understood and replicated by computers.
Different NLP models perform a variety of tasks such as sentiment analysis (i.e.
detecting the tone of a sentence), machine translation (like Google Translate) and speech
recognition (like Alexa and Siri). Generative Pre-trained Transformer or GPT is a
model for Natural Language Processing. An AI called GPT-3, trained on
millions of online articles and posts, can generate human-like textual passages
based on prompts. AI is a powerful technology and it is progressing rapidly. AI
is capable of performing many tasks. AI can translate between languages. It
can beat the best chess players. It can recognize objects in images and
videos. It has made inroads into stock trading, self-driving cars, and many such
applications. Neural networks and machine learning can creatively generate
texts, music pieces and even paintings in the style of famous painters. Some of
the prominent fields where applications of AI can be discerned are: climate
science, finance, cybersecurity, and natural language processing. The ultimate
goal of AI is to make machines that think like human beings. This idea is
called Artificial General Intelligence. As opposed to the current AI systems,
which are dedicated to solving specific tasks, a machine with Artificial General
Intelligence will be capable of learning and performing several tasks. With the
advancement and progress of AI, questions about the ethics of AI become
more prominent.
Objective
In this unit, you will learn about the basics of AI. After reading this unit, you
will be able to:
Understand the concept of AI
Identify components of AI
Appreciate applications of AI in different fields
Appreciate AI through the implementation of a small project
Discuss the future of AI
Discuss ethics of AI
If we denote a win for Player 1 by +1, a win for Player 2 by -1, and a draw by zero,
then the objective of Player 1 is to maximize the final score and the objective
of Player 2 is to minimize the score.
Figure 7.2 shows an example of a game tree which we will search using alpha-beta
pruning. To begin with, two variables, α = −∞ and β = +∞, are initialized for each
node. α represents the best-case scenario for Player 1, while β
represents the best-case scenario for Player 2. Following are the steps to
understand the working of alpha-beta pruning.
Figure 7.2: Alpha-beta Pruning Example
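The pruning idea can be sketched in Python as follows. This is only a minimal sketch: the small tree and its leaf scores are hypothetical and are not the tree of Figure 7.2, and the function name alphabeta is chosen here for illustration.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of `node` using alpha-beta pruning.

    A node is either a number (a leaf score) or a list of child nodes.
    `visited` optionally collects the leaf scores that were actually evaluated.
    """
    if not isinstance(node, list):        # leaf: final score of the game
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:                        # Player 1 tries to maximize the score
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:             # Player 2 would never allow this branch
                break
        return best
    best = math.inf                       # Player 2 tries to minimize the score
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:                 # Player 1 already has a better option
            break
    return best

# Hypothetical tree: the root is Player 1's move, the next level is Player 2's.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
value = alphabeta(tree, visited=seen)
print(value, seen)   # prints: 3 [3, 5, 2, 0]
```

In this run the leaves 9 and 1 are never evaluated: once Player 2 can force a score of 2 (or 0) in a branch, Player 1, who already has 3 guaranteed, will not choose that branch.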
Figure 7.3: City Graph with the Distance between Cities
A* Search
A* search uses both the cost to reach the state (same as Dijkstra’s algorithm)
and the heuristic. Thus, the total cost is given by
f(n) = h(n) + g(n)
where h(n) is the heuristic and g(n) is the total cost of the path from the
starting node to the current node n. Different problems have different
heuristic functions h. For our example, h(n) can simply be chosen as the aerial
distance of the destination from the current node.
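A* search progressively chooses the nodes which lead to a path with the
minimum total cost.
In general, a heuristic search does not guarantee an optimal result but usually
runs faster than uninformed searches. A* does return an optimal path, provided
the heuristic never overestimates the true remaining cost.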
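A minimal Python sketch of A* search is given below. The four cities, the road distances and the aerial distances are hypothetical values chosen for illustration; they are not the data of Figure 7.3.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None.

    graph: dict mapping each node to a dict {neighbour: road distance}
    h:     dict mapping each node to its heuristic (aerial distance to goal)
    """
    # Each queue entry is (f, g, node, path) with f = g + h(node).
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                      # a cheaper route to this node was found
        for neighbour, dist in graph[node].items():
            new_g = g + dist
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[neighbour], new_g, neighbour, path + [neighbour]),
                )
    return None

# Hypothetical road map (distances are symmetric).
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}
# Hypothetical aerial distances to the destination D (never overestimates).
aerial = {"A": 6, "B": 4, "C": 5, "D": 0}

result = a_star(roads, aerial, "A", "D")
print(result)   # prints: (8, ['A', 'C', 'B', 'D'])
```

Setting every heuristic value to zero turns this sketch into Dijkstra’s algorithm, which shows how the heuristic is the only difference between the two.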
Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) is useful when the tree is very large. It
estimates the probability of winning after a given move by using Monte Carlo
simulation. An example of Monte Carlo Tree Search is given in Figure 7.4.
Monte Carlo Tree Search has 4 essential steps:
1. Selection: It is decided which node is to be selected next; based on a
function called Upper Confidence Bound (UCB).
UCB = v_i + \sqrt{\ln N / n_i}
Where v_i represents the value of the current node, n_i is the number of
times the node has been visited and N is the number of times its parent has
been visited. The value of a node corresponds to the probability of winning.
The UCB function has two important characteristics.
Exploitation: You choose to explore the nodes that have higher
chances of winning. Thus, following the exploitation strategy,
the higher the value of v_i, the more likely the node will be
selected.
Exploration: The term \sqrt{\ln N / n_i} quantifies the exploration
strategy. Once a node has been explored many times, the value
n_i becomes much larger than the logarithm term, eventually
bringing the UCB value down for that node compared to nodes
that have a small n_i value.
You stop the selection process when you reach the node which has not
been further expanded.
2. Expansion: When you reach the last node, you randomly make a move
to add another node to the tree.
3. Simulation: For the given node, you run a classical Monte Carlo
simulation to accumulate statistics for the node. A Monte Carlo
simulation in this case is simply playing multiple games by making
random moves until a result is achieved and recording the statistics for
that node.
4. Backpropagation: The newly collected statistics are used to update the
probability values of nodes upwards to the top.
After a sufficient number of iterations are completed, you simply choose the
move with the highest probability of winning. Monte Carlo Tree Search is
widely used in programs that play games like chess.
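The selection step can be sketched in Python as shown below. The sketch assumes the standard form of the Upper Confidence Bound, in which the exploration term is the square root of ln N / n_i; the exploration weight c, the function names and the node statistics are hypothetical and are introduced only for illustration.

```python
import math

def ucb(value, visits, parent_visits, c=1.0):
    """Upper Confidence Bound of a child node.

    value:         estimated probability of winning from this node (exploitation)
    visits:        number of times this node has been visited (n_i)
    parent_visits: number of times its parent has been visited (N)
    c:             assumed exploration weight; c = 1 gives value + sqrt(ln N / n_i)
    """
    if visits == 0:
        return math.inf                   # unvisited nodes are tried first
    return value + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Return the name of the child with the highest UCB score.

    children: dict mapping a move name to a (value, visits) pair
    """
    return max(children, key=lambda name: ucb(*children[name], parent_visits))

# Hypothetical statistics after 100 visits to the parent node.
children = {"a": (0.6, 50), "b": (0.5, 5), "c": (0.2, 45)}
print(select_child(children, 100))        # prints: b
```

Move b is selected even though move a has the higher value, because b has been visited only 5 times and its exploration term is still large. Once b has been visited as often as a, the higher value of a wins.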
7.3.2 Learning from Data
The domain of learning from data is more commonly known as machine
learning. Machine learning techniques use data to learn to do a specific task.
For example, using a collection of your album photos as input, machine
learning techniques can be used to develop a model that can recognize your
face. In machine learning, “training a model” is meant in much the same
sense as “training a pet”. To be more precise, training
a model means finding parameters for which the model gives optimal results.
Machine learning is classified into 3 types:
i. Supervised Learning
ii. Unsupervised Learning
iii. Reinforcement Learning
Supervised learning uses data that has been labelled or tagged. The goal here is
to predict the labels for new input.
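As a minimal sketch of supervised learning, the Python code below trains a very simple model (one average point, or centroid, per label) and then predicts the label of new inputs. The measurements and the labels "cat" and "dog" are hypothetical, and the nearest-centroid method is just one simple technique chosen for illustration.

```python
def train(samples):
    """Learn one centroid (mean point) per label from (features, label) pairs.

    The centroids are the parameters of this model.
    """
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical labelled data: (height in cm, weight in kg) -> animal
labelled = [
    ((25, 4), "cat"),
    ((23, 5), "cat"),
    ((60, 30), "dog"),
    ((55, 25), "dog"),
]
model = train(labelled)
print(predict(model, (30, 6)))    # prints: cat
print(predict(model, (50, 20)))   # prints: dog
```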
Unsupervised learning utilizes data that is not labelled. The main goal here is
to find patterns in the data. A very common example is clustering, i.e.
grouping similar data points together so that the data falls into distinct
groups. For example, given a satellite image of India, an unsupervised learning
model may recognize different features like forests, deserts, mountains, and so on.
Reinforcement learning allows the model to learn by exploring the
environment and receiving rewards or punishments for certain actions. The
model, often referred to as the agent, makes decisions according to
a policy that it learns so as to maximize the accumulated reward.
Neural Networks
Neural networks are explained here to illustrate the ideas in machine
learning. Neural networks were inspired by the human brain. The
human brain has billions of neurons connected to each other, which convey
messages from one part of the brain to another and make us capable of
listening, speaking, and reasoning. Likewise, a neural network has neurons that
are connected with each other. Each neuron performs computation and
forwards its output to connected neurons.
Perceptron
A perceptron is the single unit of the neural network. The perceptron has some
parameters which it can learn. It takes an input, performs some calculations on
it, and returns the output. The calculation consists of two parts:
i) Linear Transformation: w^T x
ii) Activation function: f(w^T x)
A linear transformation is simply the dot product of the input x and the
parameters of the perceptron (w). The activation function introduces
nonlinearity to the output. This activation function makes sure that the neural
network is able to learn even complex relationships.
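The two parts of the calculation can be sketched in Python as follows. The sigmoid is used here as one common choice of activation function; the input and parameter values are hypothetical, and the bias term is left out to keep the sketch short.

```python
import math

def sigmoid(z):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, activation=sigmoid):
    """Compute the output of a single perceptron.

    x: list of input values
    w: list of parameters (weights), one per input
    """
    z = sum(wi * xi for wi, xi in zip(w, x))   # i) linear transformation (dot product)
    return activation(z)                        # ii) activation function

# Hypothetical input and parameters: 0.5 * 1.0 + (-0.25) * 2.0 = 0
output = perceptron([1.0, 2.0], [0.5, -0.25])
print(output)   # prints: 0.5
```

Training the perceptron would mean adjusting the values in w until the outputs match the desired outputs as closely as possible.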
In a neural network, the perceptrons are arranged in layers as shown in
Figure 7.5.
The output of the neural network can be designed to represent the probabilities
of different output classes. Sometimes, the output layer of the neural network
is removed. This gives us processed data, and this process is called feature
extraction.
Recently, many advances have been made in this area, which have led to the
development of neural networks specializing in different tasks such as image
processing and language processing.
Figure 7.5: Representation of a Neural Network
(Source: Wikipedia: https://upload.wikimedia.org/wikipedia/commons/e/e4/Artificial_neural_network.svg)
7.5 IMPLEMENTATION OF AI
Many implementations of AI are released under open source licenses. This
software can be used and modified by anyone in the world. Here you can look
at one such model, GPT-2. You were acquainted with the GPT-3 model in the last
section. GPT-2 is the precursor of GPT-3. The project example is used to
generate passages based on prompts.
7.5.1 Implementation through a Project Example: Generating Passages
Based on Prompts
To run GPT-2, you will use Google Colab as it gives free access to a
cloud-based GPU, which can speed up code execution.
Setting Up Google Colab
1. To use Google Colab, you must have a Google account. If you don’t
already have an account, you can create a Google account and set up an
email address and a password.
2. Go to https://colab.research.google.com/ and sign in with your Google
account. Select the New Notebook option. This will open a blank
notebook (Figure 7.9).
3. The default notebook contains a code cell. You can run blocks of
Python code in these cells. The other type of cell is the Text cell, where
you can enter a description of your code or any other information that
you want to.
4. Just to get started, make a Text cell and type “My First Colab
Notebook”. Now create a Code cell and type print('hello'). It should
look something like this (Figure 7.10):
3. Clone the GitHub repository of GPT-2. Like much other open source
software, GPT-2 is hosted on GitHub. Cloning a repository means
making a local copy of the repository. To clone the GPT-2 source
code, execute the following code in the code cell.
to interactively generate text based on a prompt. The model will then ask
for a prompt; enter a prompt of your choice and press Enter.
Please note that it is fine if you get several warnings before the output. This
happens because, as software engineers update the software, they introduce
better functionalities and add warnings to older methods.
7. Output
============================== SAMPLE 1 ==============================
(AI) and human-machine
cooperation to produce a new generation of vehicles that drive autonomously
and safely.
The idea of robot cars that drive themselves has been around for decades, with
some of the first self-driving car prototypes produced in the 1970s. It has only
been in the last few years that the technology has become commercially
feasible, with Google's self-driving car project, Waymo, and Ford's F-650
concept car being among the first to hit the streets.
Although no firm timeline has been set for the commercial introduction of a
self-driving car, some of the early trials have already begun, with the world's
first fully self-driving car, which was driven for the first time in Switzerland,
taking the reins at a race meeting in Singapore last year.
Now, it's the turn of Chinese tech giant Baidu, which hopes to be one of the
first carmakers to use the technology.
The autonomous vehicle will be developed by Baidu as part of its self-driving
car research. The Chinese technology giant has partnered with Carnegie
Mellon University to develop and test the vehicle.
The car, which has been dubbed the "Baidu Drive", will be able to drive from
New York to San Francisco in less than 12 hours and from Beijing to Shanghai
in about 17 hours. Baidu is currently building a fleet of 100 prototype self-
driving cars in China.
"By combining AI, machine learning, and robotics to improve driving safety,
we hope to dramatically decrease the time it takes to travel from the U.S. to
China," Baidu CEO Robin Li said in a release.
The company hopes the self-driving car will become a mainstream technology
used in the transportation market in the near future, and will help it compete
with Chinese rivals such as Tencent and Alibaba. Baidu will use the test results
to help develop further technologies for autonomous driving, according to the
release.
"The new Baidu Drive prototype demonstrates our technology can handle
highly congested urban environments and has a promising safety record in the
city," said Dr. Bin Zhao, director of the Department of Robotics, the Carnegie
Mellon University's Department of Computer Science.
The autonomous vehicle has been built using Baidu's self-driving technology.
Researchers believe the self-driving car will be able to handle difficult driving
conditions, such as road construction, traffic jams, and sudden changes in road
conditions.
"We are very happy to
8. You can experiment with different prompts. For example:
Question: Is artificial intelligence dangerous?, Answer:
This question-and-answer format can be used to put questions to the model.
9. Even though the model performs well, there are caveats. Sometimes it
seems to be copying information from random internet sources.
This is because it was trained on a corpus of text taken from the internet.
10. Change the values of the temperature and top_k parameters. These
parameters control how creative or “random” the model can become.
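The effect of the temperature and top_k parameters can be illustrated with a short Python sketch. It shows the general technique of temperature scaling followed by top-k sampling; it is not the code of the GPT-2 repository, and the candidate words, their scores and the function names are hypothetical.

```python
import math
import random

def top_k_probabilities(logits, temperature=1.0, top_k=0):
    """Turn raw scores into sampling probabilities.

    logits:      dict mapping each candidate word to its raw score
    temperature: values below 1 sharpen the distribution (less random),
                 values above 1 flatten it (more random)
    top_k:       if greater than 0, only the k highest-scoring words are kept
    """
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        items = items[:top_k]
    scaled = [score / temperature for _, score in items]
    biggest = max(scaled)                       # subtract for numerical stability
    weights = [math.exp(s - biggest) for s in scaled]
    total = sum(weights)
    return {word: w / total for (word, _), w in zip(items, weights)}

def sample_word(logits, temperature=1.0, top_k=0, rng=random):
    """Pick the next word at random according to the probabilities."""
    probs = top_k_probabilities(logits, temperature, top_k)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical scores for the word that follows "The self-driving ..."
scores = {"car": 2.0, "road": 1.0, "banana": -1.0}
print(top_k_probabilities(scores, temperature=1.0, top_k=2))
print(top_k_probabilities(scores, temperature=0.1))
print(sample_word(scores, top_k=1))             # prints: car
```

With top_k set to 2, the unlikely word "banana" can never be chosen. With a temperature of 0.1, nearly all of the probability goes to "car", so the output becomes predictable; a higher temperature spreads the probability more evenly and makes the text more varied.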
Check Your Progress 4
In this section, you studied the implementation of AI. Now answer the
questions given in Check Your Progress-4.
Note: a) Write your answer in about 50 words
b) Check your answer with possible answers given at the end of the unit
(1) Set up Google Colab.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(2) Load and use GPT-2.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(3) Use the GPT-2 model for various prompts.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
7.7 AI ETHICS
AI is a powerful technology and it is progressing rapidly. AI is capable of
performing many tasks. AI can translate between languages. It can beat the best
chess players. It can recognize objects in images and videos. It has made
inroads into stock trading, self-driving cars, and many such applications. With
the advancement and progress of AI, questions about the ethics of AI become
more prominent. One such ethical issue arises in self-driving cars. When rushing
a patient to a hospital, should a self-driving car give priority to the life of
the patient in the car or to the life of a pedestrian on a crowded road?
Another example pertains to autonomous weapon systems. Can an
autonomous weapon system be given a free hand in identifying and attacking a
target? These are dilemmas. There are several ethical issues in using AI.
Some of the ethical issues pertaining to AI are discussed here.
7.7.1 Data Privacy
Perhaps the episode freshest in people’s minds is the Cambridge
Analytica scandal, in which about 87 million people, mostly in the USA, were
profiled and their profiles were sold to political campaign managers for the
purpose of influencing voters.
The data was leaked from Facebook, which earns its revenue by running ads.
Tech giants like Facebook and Google that offer free services usually use AI to
monitor and analyze the behavior of their users to improve the quality of ads
by making them more targeted. Thus naturally they collect and store petabytes
of data.
Even smaller, new organizations are collecting and analyzing data for
improving their services. For example, smart appliances send data periodically
to servers which helps the manufacturer to gain insights into the performance
of the product and improve it.
Clearly, there is a trade-off between the amount of data collected and the
performance of a product or service. Thus several practices are incorporated to
ensure that the privacy of individual users is not compromised. The data
collected is usually anonymized, that is, all the markers that may be used to
identify a user are removed before storing and processing the data.
k-anonymity
k-anonymity is a standard metric that measures how many users in a database
share the same combination of attributes. For example, if we have a database of
school students from which obvious identifiers such as name and roll number are
omitted, one can think of identifying a person based on other attributes such as
percentage of marks obtained, sports team they are a part of, etc. If every query
on such attributes returns at least k records, so that each person is
indistinguishable from at least k-1 others, then the database is said to
possess k-anonymity. Thus, for sufficiently large ‘k’, it is not possible to
ascertain the identity of a person based on their attributes.
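A minimal Python sketch of checking k-anonymity is given below. It assumes the usual reading of the definition, in which every combination of identifying attributes must be shared by at least k records. The student records, the attribute names and the function name are hypothetical.

```python
from collections import Counter

def k_anonymity(records, attributes):
    """Return the largest k for which the records are k-anonymous.

    records:    non-empty list of dicts, one per person
    attributes: names of the attributes that could be used to identify a person
    """
    groups = Counter(tuple(record[a] for a in attributes) for record in records)
    return min(groups.values())           # size of the smallest group

# Hypothetical school database with names and roll numbers already removed.
students = [
    {"marks": "80-90", "team": "cricket"},
    {"marks": "80-90", "team": "cricket"},
    {"marks": "70-80", "team": "football"},
    {"marks": "70-80", "team": "football"},
    {"marks": "70-80", "team": "football"},
]
print(k_anonymity(students, ["marks", "team"]))   # prints: 2

# One student with a unique combination breaks the anonymity.
students.append({"marks": "90-100", "team": "chess"})
print(k_anonymity(students, ["marks", "team"]))   # prints: 1
```

Note that the marks are stored as ranges rather than exact percentages. Generalizing values in this way is a common means of making groups large enough.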
L-diversity
L-diversity extends the idea of k-anonymity by merging several sensitive
attributes together. For example, if we have a hospital database that has a list
of patients in each disease category, one can run a “background knowledge
attack” i.e. just knowing that a certain number of people in a hospital suffer
from a particular disease can be used for malicious purposes. Thus in the
database instead of listing patients by individual diseases, we can group ‘l’
similar diseases together.
7.7.2 Biases in AI
Many AI models are black boxes: even leading experts do not understand which
factors were taken into account and to what extent. Many AI models,
especially in machine learning, are trained on historical data which itself
contains biases. For example, if a police database is used to train a model that
determines the bail amount, people from certain neighborhoods might see their
bail set at higher amounts if the historical data is biased against them.
7.7.3 Deepfakes
With advancements in deep learning, a new phenomenon has emerged called
deepfakes. Deepfakes refer to audio, video, or images that are generated by AI,
and never actually existed in the real world. For example, the image shown
below (Figure 7.12) is not a picture of a real woman but an image generated by
AI.
AI models can also be trained to generate images or videos that resemble
real-life humans but in entirely different contexts. For example, a video of
Barack Obama calling Trump names was released in 2018.
The most notable architecture that drives these models is the Generative
Adversarial Network or GAN. A GAN consists of two neural network models,
a generator and a discriminator. The generator generates images and
the discriminator tries to classify the images it sees as real or generated.
The generator and the discriminator are trained simultaneously, and when the
discriminator’s accuracy approaches 50% (i.e. no better than a random guess), we
consider the model to be trained.
Deepfakes are problematic because they are hard to detect and hence can
convey misinformation very convincingly to large audiences.
Check Your Progress 6
In this section, you studied AI ethics. Now answer the questions given in Check
Your Progress-6.
Note: a) Write your answer in about 50 words
b) Check your answer with possible answers given at the end of the unit
(1) What are k-anonymity and L-diversity?
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(2) Discuss biases in AI.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(3) What are deepfakes?
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
7.8 SUMMARY
This unit discussed the basics of Artificial Intelligence. The concept of AI was
elucidated. Various search algorithms were introduced. An
overview of learning from data i.e. machine learning was presented. Further,
fields of applications such as natural language processing, and self-driving cars
were discussed. A small project was illustrated to demonstrate the
implementation of AI. Finally, the unit concluded with the ethics of AI.
7.9 KEYWORDS
Application Specific Integrated Circuit: An application specific integrated
circuit (ASIC) is a kind of integrated circuit that is specially built for a specific
application or purpose.
Tensor Processing Unit: Tensor Processing Unit (TPU) is an AI accelerator
application-specific integrated circuit (ASIC) developed by Google
specifically for neural network machine learning.
Depth-First Search: Depth-first search is an algorithm for traversing or
searching tree or graph data structures. The algorithm starts at the root node
and explores as far as possible along each branch before backtracking.
Dijkstra’s Algorithm: Dijkstra’s algorithm allows you to calculate the
shortest path between one node of your choosing and every other node in a
graph.
Field Programmable Gate Array (FPGA): It is a semiconductor IC where a
large majority of the electrical functionality inside the device can be changed;
changed by the design engineer, changed during the PCB assembly process, or
even changed after the equipment has been shipped to customers out in the
‘field’.
Game Tree: A game tree is a representation of a sequential game (i.e. a game
in which players take turns, like tic-tac-toe or chess). It represents all possible
situations (states) that can occur in the game. Each state is represented by a
node in the graph. Nodes that radiate downwards from a node are called
daughter nodes and the node itself is referred to as the parent node.
GPT-3: Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive
language model that uses deep learning to produce human-like text.
7.10 CHECK YOUR PROGRESS 1 – POSSIBLE ANSWERS
1) Explain the Turing test that ascertains whether a computer has Artificial
Intelligence.
In 1950, Alan Turing, the famous British cryptographer remembered for his
work on decoding the German Enigma machine, proposed a test to ascertain
whether a computer has Artificial Intelligence (AI). The test goes as follows:
An interviewer interviews a human and a computer without knowing their
identities initially. The interviewer then asks a series of questions to both for
five minutes and tries to figure out which one of them is the AI. If the AI is
able to confuse the human interviewer, then it can be said that the computer
has Artificial Intelligence.
Thus, a computer that satisfies Turing’s definition of Artificial Intelligence
should be able to do things that are expected from humans, for example,
writing an essay, recognizing pictures of celebrities, engaging in conversation,
composing music, solving reasoning tests, and so on.
Check Your Progress 2 – Possible Answers
1) Discuss A* search.
A* search uses both the cost to reach the state (same as Dijkstra’s algorithm)
and the heuristic. Thus, the total cost is given by
f(n) = h(n) + g(n)
where h(n) is the heuristic and g(n) is the total cost of the path from the
starting node to the current node n. Different problems have different
heuristic functions h. For our example, h(n) can simply be chosen as the aerial
distance of the destination from the current node.
2) Discuss the knowledge representation and reasoning.
Humans often reason based on their knowledge. Certain statements about the
world are assumed to be true; in mathematics and reasoning, these statements
which are assumed to be true are called axioms. A famous example is Euclid’s
axioms of geometry.
In logic, sentences that are either true or false are called statements. We can
represent statements by variables. Statements are usually denoted by upper-case
letters. Each statement has a truth value, that is, it is either True or
False.
3) Discuss Hardware for AI.
The development of AI algorithms has also led to developments in the
hardware used to run AI algorithms. On the other hand, the development of
processors with high processing power has allowed models like neural
networks to become feasible. Graphics Processing Units or GPUs were
developed for image processing and rendering. GPUs are structured in a highly
parallelized manner, which means they can run thousands of processes
simultaneously. Training neural networks, for example, can be divided into
several sub-processes which can be run simultaneously on a GPU. Thus, neural
networks train much faster on GPUs than on a general-purpose CPU.
Field Programmable Gate Arrays (FPGA) are integrated circuits that are
widely used to deploy AI as they are reconfigurable and offer flexibility to the
designer. Since no standard circuits are available for AI, the customizable
nature of FPGA is beneficial.
Check Your Progress 3 – Possible Answers
1) Discuss transformer architecture.
The transformer is basically an encoder-decoder architecture. The left side of
the figure shows the encoder. The encoder takes a sentence as input and returns
an embedding (i.e. a vector that is obtained by processing of the input.) The
embedding can be thought of as an essence or meaning of the sentence.
The other half is the decoder, which takes the encoder output and returns the
desired probabilities. For example, if we are using the transformer for machine
translation we can get the output as probabilities for words in the translated
sentence.
The key feature of the transformer architecture is the Multi-Headed Attention
block. The attention block allows the model to make long-range correlations. Such
correlations are important because languages have pronouns, verbs and
adjectives which are usually correlated with nouns that may occur much later
in the sentence.
2) Discuss Tesla’s HydraNet architecture.
Tesla uses the feed from eight cameras to generate a 3-D map of its environment.
The architecture used is a “HydraNet”. The HydraNet has a single neural
network backbone that takes images from the cameras as input and extracts the
features. These features are then provided to different heads which are
fine-tuned for different tasks such as vehicle detection, marking detection and so on.
Check Your Progress 4 – Possible Answers
1) Set up Google Colab
8.1 INTRODUCTION
Credit for defining machine learning goes to Arthur Samuel. IBM’s Arthur
Samuel wrote a paper titled “Some Studies in Machine Learning Using the
Game of Checkers” in 1959. The paper investigated the application of machine
learning to the game of checkers. The concept of machine learning introduced
by Samuel showed that machines (computers) can learn without being
explicitly programmed. “Without being explicitly programmed” means without
the use of direct programming commands. Here machine learning refers to
self-learning by a machine (computer). How will a machine learn? A machine will
learn from historical data and empirical information. Machine learning,
statistical learning and predictive modelling represent the same concept.
Statistical modelling is at the core of Machine Learning.
Broadly speaking, machines can learn in three ways. These form three
categories of Machine Learning. These are: Supervised Learning,
Unsupervised Learning and Reinforcement Learning. Supervised learning
uses labelled datasets. Labelled data is data that comes with a name or type.
Unsupervised learning involves finding a pattern in data. Thus unsupervised
learning segregates data in clusters or groups. These clusters or groups are
unlabeled. Reinforcement learning works on the principle of reward and
punishment. In other words, Reinforcement learning builds its prediction
model by gaining feedback from random trial and error and leveraging insight
from previous iterations. There are Machine Learning Algorithms
corresponding to these Machine Learning categories. Support Vector Machine
is a supervised machine learning algorithm. K-means clustering is an
unsupervised machine learning algorithm. These traditional models do not
scale in performance as the size of the dataset increases. However, deep
learning methods continue to scale in performance with the increasing size of
the dataset. Machine Learning is an emerging field of computer science with
wide applications in search engines, recommendation systems, spam filters,
etc.
Objectives
In this unit, you will learn the fundamentals of Machine Learning with proper
examples and illustrations. After reading this unit, you will be able to:
Understand the concept of Machine Learning
Discuss various types of Machine Learning
Appreciate various Machine Learning Algorithms
Understand the concept of neural networks and deep learning
Explore some frameworks for Machine Learning in Python
However, a machine learning model tries to figure out the function f given x
and y. This approach is similar to how humans figure out things. They observe
the cause and effect and try to figure out how it happened. In other words,
machine learning tries to figure out the rule that connects the input and output.
This approach works wonders when a computer is tasked with performing
non-trivial tasks which do not have a well-defined rule, such as recognizing a
human face, differentiating between different cat species, and talking to humans
(Remember Siri and Alexa?).
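The idea of figuring out the rule f from examples of x and y can be sketched in Python. In this minimal sketch the model is assumed to be a straight line, and its two parameters are found by the method of least squares. The hidden rule y = 2x + 1 used to produce the data is hypothetical.

```python
def fit_line(xs, ys):
    """Find the slope and intercept of the line that best fits the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Observed inputs and outputs. The rule that produced them is not given
# to the program; it has to be learned from the data.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)           # prints: 2.0 1.0

# The learned rule can now be applied to an input that was never observed.
print(slope * 10 + intercept)     # prints: 21.0
```

A conventional program would be given the rule and the input and would compute the output. Here the program is given inputs and outputs and recovers the rule.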
Another interesting non-trivial task is playing games such as chess and Go, at
which humans seem to excel. However, computers have a tough time
understanding and evaluating the game. For a long time, computers relied on
brute force calculations and tabulated data to try and outperform human
players. For example, when Deep Blue defeated the World Chess Champion
Garry Kasparov in 1997, it relied heavily on an opening book, a database of
about 70,000 master-level games and Good Old-Fashioned AI algorithms like
alpha-beta search.
Cut to the present, Google’s AlphaZero is ruling the chess world. It is an
AI-based system that learns chess purely by playing against itself and
consistently outperforms grandmasters and conventional chess
engines.
Unlike chess, in Go (a game quite popular in Korea), the possible scenarios
grow at an even faster rate than in chess, making it even harder to use brute
force techniques and the role of intuition becomes much more important.
Google’s AlphaGo defeated the reigning world champion of Go, again
showing how superior AI and Machine Learning Models can be compared to
conventional algorithms.
A more business-like application of machine learning is in building
recommendation systems. Websites like Netflix and YouTube rely on
recommendation engines to show relevant results to users from an almost
infinite set of possibilities.
Another widely used application of Machine Learning is spam detection.
Google and other companies classify Emails as Spam or Not-Spam based on
certain patterns found in them. Recently Machine Learning has been applied to
this task and with great success.
Check Your Progress 1
In this section, you studied “What is Machine Learning?”, now answer the
questions given in Check Your Progress-1.
Note: a) Write your answer in about 50 words
b) Check your answer with possible answers given at the end of the unit
(1) Briefly explain the idea of Machine Learning.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(2) Mention currently used applications based on Machine Learning.
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(The colours are for readers’ convenience. The dataset does not have any prior
information regarding the cluster.)
The simplest way of doing this is using the Naive k-means algorithm. Start
with some arbitrary values for the centroids $(\mu_1, \mu_2, \ldots, \mu_k)$. Then allocate
each point to its nearest centroid. Finally, update each centroid to the mean
of the data points in its partition. The allocation and update steps are
repeated until the centroids stop changing.
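The steps above can be sketched in a few lines of numpy. This is only an illustrative sketch: the function name naive_kmeans, the toy data (two made-up blobs) and the choice of picking the starting centroids from the data points are assumptions, not part of the unit.

```python
import numpy as np

def naive_kmeans(points, k, n_iters=100, seed=0):
    """Naive k-means: alternate between allocation and centroid update."""
    rng = np.random.default_rng(seed)
    # Arbitrary starting centroids: k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Allocate each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update each centroid to the mean of the points allocated to it
        # (an empty partition keeps its old centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated blobs (made up for illustration).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])

centroids, labels = naive_kmeans(data, k=2)
print(centroids)
```

Feel free to change k or the blob positions to see how the partitions change.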
8.4.2 Linear and Logistic Regression
Regression is a widely used method to establish a relation between input and
output variables. In linear regression, the relation between the input and output
is known to be linear in nature (while this might seem too restrictive, there are
indeed a lot of problems that can be mapped to this simple case)
i.e.,
$y = ax + b$
(a, b are unknown but fixed constants)
A formal presentation of the problem at hand can be done in the following
manner.
Given a set of ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the goal is to find
the constants a and b that characterize a function $f(x) = ax + b$ such that the
total error $\sum_{i=1}^{n} L(f(x_i), y_i)$ (L is a loss function) is minimized. Commonly
used loss functions include the mean square error, which is defined as
$$\sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2$$
The above statement can be understood visually, as shown in Figure 8.3.
Here ‘n’ (input, output) pairs are given, and the task is to find the line which
minimizes the sum of the squared vertical distances of the points from the line.
This can be done using the gradient descent algorithm (Ref Section 8.6 for
details). Start with some random values for ‘a’ and ‘b’ and then calculate the
loss. Then find the gradient of the loss and update parameters to reduce the
loss function. Do so until the error is below the acceptable limit or a certain
number of iterations have been done.
Repeat
$a = a - \frac{\partial L}{\partial a}$
$b = b - \frac{\partial L}{\partial b}$
Until
$L < \epsilon$, where $\epsilon$ is the acceptable error limit
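The loop above can be sketched in Python as follows. Two things here are assumptions made for the sketch: the updates are scaled by a step size lr (a learning rate, which the update rule above does not show), and the loss is averaged over the points rather than summed, so that the same step size works for any number of points. The data is a made-up, noise-free line.

```python
import numpy as np

def fit_line(x, y, lr=0.1, eps=1e-6, max_iters=10_000):
    """Fit y = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0                              # starting values (any values would do)
    for _ in range(max_iters):
        residual = a * x + b - y                 # f(x_i) - y_i for every point
        loss = np.mean(residual ** 2)
        if loss < eps:                           # stop once the error is acceptable
            break
        grad_a = 2.0 * np.mean(residual * x)     # dL/da
        grad_b = 2.0 * np.mean(residual)         # dL/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, loss

# Toy data generated from the line y = 3x + 1 (made up for illustration).
x = np.linspace(0.0, 2.0, 21)
y = 3.0 * x + 1.0

a, b, loss = fit_line(x, y)
print(a, b, loss)
```

A step size that is too large makes the loop diverge instead of converge, so lr usually has to be tuned to the data.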
However, there is another class of problem that can be approached in a similar
manner. This problem is that of binary classification. So now the output is
binary (i.e. $y_i \in \{0, 1\}$) instead of being a continuous variable. One simple
approach can be to fit a straight line to the data and define a threshold value $y_0$
and classify everything greater than $y_0$ as 1 and everything less than $y_0$ as 0.
Figure 8.4: Sigmoid Function
where,
$z = ax + b$
Equation (i) represents the logistic regression model. The output of the sigmoid
function can be interpreted as the probability or the confidence that a given
data point has the label 1. There is one last catch before we can fully
implement the logistic regression model, that is, the mean squared error (one
used in linear regression) cannot be used here as it leads to non-convex
optimization problems. So, finally, a Log Loss (also called Cross-Entropy Loss)
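function is used which is defined as:
$$L = -\sum_{i=1}^{n} \left[ y_i \log \sigma(z_i) + (1 - y_i) \log\left(1 - \sigma(z_i)\right) \right]$$
where $\sigma(z_i)$ is the output of the sigmoid function for the i-th data point.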
The gradient descent is used to optimize the parameters ‘a’ and ‘b’ and
minimize the loss function.
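A minimal sketch of logistic regression trained with gradient descent is given below. The helper names (sigmoid, log_loss, fit_logistic), the step size lr, the number of iterations and the toy data are all assumptions made for illustration; the loss is the standard log loss averaged over the data points.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    """Average cross-entropy between labels y_true and probabilities p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)           # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def fit_logistic(x, y, lr=0.5, n_iters=2000):
    """Optimize a and b in z = a*x + b by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    for _ in range(n_iters):
        p = sigmoid(a * x + b)                   # predicted probability of label 1
        a -= lr * np.mean((p - y) * x)           # gradient of the log loss w.r.t. a
        b -= lr * np.mean(p - y)                 # gradient of the log loss w.r.t. b
    return a, b

# Toy one-dimensional data: negative inputs have label 0, positive inputs label 1.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

initial_loss = log_loss(y, sigmoid(0.0 * x + 0.0))
a, b = fit_logistic(x, y)
final_loss = log_loss(y, sigmoid(a * x + b))
predictions = (sigmoid(a * x + b) >= 0.5).astype(int)
print(initial_loss, final_loss, predictions)
```

The threshold of 0.5 on the sigmoid output plays the role of the threshold value discussed earlier.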
Bias-variance Trade-off
Bias: A model is said to have a high bias when it underfits the data, that is,
when the model does not correctly learn the relationship between the input data
and the output. This might happen because the model is too simple, or because
the model is making assumptions that simply aren’t true.
Variance: When the model overfits, it is said to have high variance. In this case,
the model is unable to generalize to unseen data and performs poorly on the test
data while scoring very well on the training data. Generally, this happens when
the model is too complex, or there is too little data to train on.
8.4.3 Support Vector Machines
Support Vector Machines (SVM) are robust classifiers that are great at
separating data that have a complex decision boundary. An SVM finds a
hyperplane, as shown in Figure 8.5, that separates the two classes such that it
maximizes the distance from the nearest data points (also known as the
“Support Vectors” and hence the name).
In Figure 8.5, there are two classes of points, green and blue. SVM finds $w$ and $b$
such that the line (or more generally the hyperplane) separating the two classes
maximizes its distance from the points nearest to it.
Mathematically we are interested in finding the hyperplane,
$w \cdot x - b = 0$
which is mid-way between the hyperplanes containing the support vectors, i.e.,
$w \cdot x - b = 1$
$w \cdot x - b = -1$
Such that the distance between them,
$d = \frac{2}{\lVert w \rVert}$
is maximized.
The intuition behind using an SVM is that points that are closest to the other
group are more important to consider while making a boundary than those which
are not. Further, the best boundary that separates the two groups is the one that
maximizes the distance between these nearby points. As can be seen
in Figure 8.6, the red line is the best boundary.
Figure 8.6: Possible Hyperplanes Separating the Two Classes
The real power of SVM is, however, unleashed when the “Kernel Trick” is
used. In this technique, the data is embedded into a higher dimensional space.
By such embedding, the data which is not linearly separable in the original
space becomes linearly separable in the higher dimensional space. The SVM
then goes on to find the optimal hyperplane separating the data.
Embedding the data in a higher dimensional space basically means we add
more dimensions to the data that are derived from the original data. For
example, as shown in Figure 8.7, we can see there is no hyperplane
separating the two classes. However, if we introduce a third variable
$z = x^2 + y^2$
we get an embedding in a 3-D space. When we plot $(x_i, y_i, z_i)$, we can clearly
see that purple points are higher up compared to the red points; thus, a
horizontal plane can separate the two.
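The embedding described above can be tried out with a small numpy sketch. The data here is made up (an inner ring and an outer ring of points), and the height of the separating plane (threshold) is chosen by hand; a real SVM with a kernel would find the boundary itself and would not need to build the extra coordinate explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)

# Two classes that no straight line can separate in the (x, y) plane:
# an inner ring of radius 0.5 and an outer ring of radius 2.
inner = np.column_stack([0.5 * np.cos(angles[:50]), 0.5 * np.sin(angles[:50])])
outer = np.column_stack([2.0 * np.cos(angles[50:]), 2.0 * np.sin(angles[50:])])

def embed(points):
    """Add the third coordinate z = x^2 + y^2 to every point."""
    z = points[:, 0] ** 2 + points[:, 1] ** 2
    return np.column_stack([points, z])

inner_3d = embed(inner)
outer_3d = embed(outer)

# In 3-D the horizontal plane z = 1 now separates the two classes.
threshold = 1.0
print(inner_3d[:, 2].max(), outer_3d[:, 2].min())
```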
Feel free to change the input data and weights to see how the output changes.
Exercise: Generate plot for ReLU activation function.
(Hint: ReLU function is defined as $\max(0, z)$)
Ans.
To summarize mathematically:
Inputs: $x = [x_1, x_2, \ldots, x_n]^T$
Parameters: $w = [w_1, w_2, \ldots, w_n]^T$
$z = w^T x$
$y = \sigma(z)$
$\text{Loss} = \frac{1}{2}\left(y' - y\right)^2$
$y'$ is the actual label, $y$ is the predicted label, and $\sigma$ is the activation function.
Loss is generally taken to be mean square error or cross entropy loss.
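The summary above translates almost line by line into numpy. The function names and the input and weight values below are made up for illustration; ReLU is used as the activation function.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def perceptron_forward(x, w, activation):
    """Compute z = w^T x and y = activation(z) for a single perceptron."""
    z = w @ x
    y = activation(z)
    return z, y

def squared_loss(y_true, y_pred):
    """Loss = 1/2 (y' - y)^2, with y' the actual label and y the prediction."""
    return 0.5 * (y_true - y_pred) ** 2

x = np.array([1.0, 2.0, -1.0])      # inputs
w = np.array([0.5, -0.25, 1.0])     # parameters (weights)

z, y = perceptron_forward(x, w, relu)
loss = squared_loss(1.0, y)
print(z, y, loss)
```

Swapping relu for another activation function changes y but not z.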
8.5.2 Neural Network
One perceptron isn’t very powerful on its own. To build a more powerful
classifier, we combine a lot of perceptrons so that they can learn even more
complex decision boundaries; this is called a neural network. A simple Google
search would return the familiar image of a neural network shown in
Figure 8.9. One can now instantly recognize that it’s just a lot of
perceptrons, each having its own set of input vectors, parameters and
activation function. Output from one layer is passed on to the next as input.
This layered structure gives rise to some fairly obvious terminology:
Input Layer: The input layer is basically the input data itself, reshaped to a
suitable shape.
Hidden Layer: Hidden layer is where all of the computation and learning
occurs.
Output Layer: It finally converts the output of the hidden layer to the
desired output such as {0,1} in the case of binary classification.
Loss Function:
The goal of training the neural network is to minimize the loss function. For
binary classification problems we can use the same loss function as in the case
of logistic regression, i.e., binary cross entropy. Mean Squared Error is used
for regression problems.
Backpropagation
The neural network works by minimizing the loss function. To minimize the
loss function, the gradient descent algorithm is used. For that, there is a need to
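find gradients with the help of the chain rule. Differentiating the equations
developed for the perceptron using the chain rule,
$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \, \frac{\partial y}{\partial z} \, \frac{\partial z}{\partial w}$$
Notice that $L$ is the loss, which is a function of $y$. In turn, $y$ is a function of $z$,
which is a function of $w$. We can compute each derivative term separately.
$$\frac{\partial L}{\partial y} = y - y'$$
$$\frac{\partial y}{\partial z} = \sigma'(z)$$
If $\sigma$ is ReLU,
$$\frac{\partial y}{\partial z} = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z < 0 \end{cases}$$
And finally
$$\frac{\partial z}{\partial w} = x$$
Combining,
$$\frac{\partial L}{\partial w} = (y - y') \, \sigma'(z) \, x$$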
Now in a neural network, there are several layers of perceptrons. For the
output layer, the above formula can be modified to,
$$\frac{\partial L}{\partial w} = (y - y') \, \sigma'(z) \, x_h$$
where $x_h$ is the output of the hidden layer and the input of the output layer.
For any other layer, the gradient depends on the next outer layer. In other
words, the gradient of a layer determines the gradient of the layer before it, hence
the name backpropagation.
$$\frac{\partial L}{\partial w_h} = (y - y') \, \sigma'(z) \, w \, \frac{\partial x_h}{\partial w_h}$$
where
$$x_h = \sigma\left(\sum_j w_{h,j} \, x_{h,j}\right)$$
The subscript j denotes each perceptron in the hidden layer.
Thus,
$$\frac{\partial L}{\partial w_h} = \delta_0 \, \sigma'(z_h) \, x_{h,j}$$
$\delta_0$ is the term that is carried over from the output layer.
Finally, all parameters are updated,
$$w_{i,j} = w_{i,j} - \alpha \frac{\partial L}{\partial w_{i,j}}$$
α is called the learning rate; it determines by how much the parameters are
updated. A very small learning rate will make the learning slow, while a very high
learning rate may cause the model to skip the minimum loss point and diverge.
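A quick way to build confidence in a chain-rule gradient is to compare it with a numerical estimate. The sketch below does this for a single perceptron with the loss $\frac{1}{2}(y' - y)^2$. It uses the sigmoid as the activation (an assumption made here because it is differentiable everywhere, with $\sigma'(z) = \sigma(z)(1 - \sigma(z))$); the function names, inputs, weights and learning rate are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, x, y_true):
    y = sigmoid(w @ x)
    return 0.5 * (y_true - y) ** 2

def analytic_grad(w, x, y_true):
    """Chain rule: dL/dw = (y - y') * sigma'(z) * x."""
    y = sigmoid(w @ x)
    return (y - y_true) * y * (1.0 - y) * x

def numeric_grad(w, x, y_true, h=1e-6):
    """Central-difference estimate of dL/dw, one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (loss_fn(w + step, x, y_true) - loss_fn(w - step, x, y_true)) / (2 * h)
    return grad

x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.25, 1.0])
y_true = 1.0

grad = analytic_grad(w, x, y_true)
grad_check = numeric_grad(w, x, y_true)

alpha = 0.5                        # learning rate
w_new = w - alpha * grad           # one gradient descent update
print(grad, grad_check)
print(loss_fn(w, x, y_true), loss_fn(w_new, x, y_true))
```

The two gradients should agree to several decimal places, and the loss after the update should be smaller than before.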
8.5.3 Deep Learning
A Neural Network can have any number of neurons (aka perceptrons) in the
hidden layer, and the hidden layer itself can be multilayered. Neural Networks
that have a lot of hidden layers are termed deep neural networks. As the layers
in a neural network increase, its capacity to learn more complex models
increases. Thus, larger datasets can generally support, and benefit from, deeper
networks. However, as the depth of the network increases, the computational
resources required to train the model go up considerably. Thus it is preferable to
keep the network deep enough so that it is capable of fitting the data, and at the
same time, avoid problems like overfitting and resource wastage. The two
most basic deep learning models are Convolutional Neural Networks and
Recurrent Neural Networks, used in image processing and Natural Language
Processing, respectively.
Convolutional Neural Networks
Convolutional neural networks (CNN) use “Convolutional layers” along with
Dense Layers (the normal neural network layers we studied in the previous
section).
Convolutional Layers: Convolutional layers use filters to scan the image. A
filter is basically an NxN array where N is significantly less than the
dimension of the input image. The filter is placed at the start of the image, and
a dot product of the filter and overlapping image is calculated.
$$A = \begin{pmatrix} 0.8 & 2.4 & 2.5 \\ 2.4 & 0 & 4.0 \\ 1.1 & 2.4 & 0.8 \end{pmatrix} * \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = (16.4)$$
This gives us the first output value. The filter is then shifted by a number of
pixels called the “stride”, and the dot product is evaluated again. Repeating this
over the whole image gives us the output of the
convolutional layer. When multiple filters are used, we can stack the output
back to back, so if we have k kernels each giving an MxM output, we would
have a kxMxM output. Convolutional Layers help the model to learn the
correlation between neighbouring pixels and identify structures and patterns
such as eyes on a face and so on.
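The sliding dot product described above can be sketched with plain numpy loops. The function name conv2d_valid and the 4x4 test image are assumptions for illustration; the 3x3 patch and the filter of ones are the values from the example in the text. Deep learning libraries use much faster implementations of the same idea.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide a square filter over the image and take the dot product at each stop."""
    n = kernel.shape[0]
    out_h = (image.shape[0] - n) // stride + 1
    out_w = (image.shape[1] - n) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + n, j * stride:j * stride + n]
            out[i, j] = np.sum(patch * kernel)   # dot product of filter and patch
    return out

# The 3x3 patch and the 3x3 filter of ones from the example above.
patch = np.array([[0.8, 2.4, 2.5],
                  [2.4, 0.0, 4.0],
                  [1.1, 2.4, 0.8]])
kernel = np.ones((3, 3))
first_value = conv2d_valid(patch, kernel)[0, 0]
print(first_value)

# A 4x4 image scanned by a 3x3 filter with stride 1 gives a 2x2 output.
image = np.arange(16, dtype=float).reshape(4, 4)
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
```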
Another important type of layer used in CNNs is the Pooling Layer, which
comes in two varieties: the MaxPooling Layer and the AveragePooling
Layer.
Pooling Layers: Pooling Layers work in a manner similar to the convolutional
layers. They scan the image using a small NxN window, but instead of taking the
dot product between the filter and image, they pick out the maximum value
from the overlapping image in the case of max pooling or return the average value
in the case of average pooling.
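Both kinds of pooling can be sketched with a numpy reshape. The sketch assumes non-overlapping windows (the window moves by its own size each time), which is a common choice but not the only one; the function name pool2d and the 4x4 image are made up for illustration.

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Pool non-overlapping size x size windows of a 2-D image."""
    out_h = image.shape[0] // size
    out_w = image.shape[1] // size
    # Drop any rows/columns that do not fill a complete window.
    trimmed = image[:out_h * size, :out_w * size]
    # windows[i, a, j, b] is the pixel at row i*size + a, column j*size + b.
    windows = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

image = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 1],
                  [0, 1, 5, 6],
                  [2, 2, 7, 8]], dtype=float)

max_pooled = pool2d(image, size=2, mode="max")
avg_pooled = pool2d(image, size=2, mode="average")
print(max_pooled)
print(avg_pooled)
```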
Recurrent Neural Networks
Recurrent neural networks (RNN) are useful in cases where we have sequential
data, such as language models, music and so on. RNNs take a sequential input:
$x = (x_0, x_1, \ldots, x_T)$. This could be a sentence, for example, “This is a book
about neural networks”. Now, of course, to use a model such as RNN there is a
need to encode this sentence into numeric values. So, each word in the
vocabulary is mapped to a number. For example,
This → $x_0$
is → $x_1$
a → $x_2$
book → $x_3$
about → $x_4$
neural networks → $x_5$
Now, the way an RNN works is by passing the value at each time-step $x_t$ to a
neural network, which returns two values $y_t$, $h_t$ as shown in Figure 8.11. $y_t$
is the output for that time-step and $h_t$ is passed over to the next time-step. Then
$h_t$ is taken as the input along with $x_{t+1}$ to give $y_{t+1}$ and $h_{t+1}$. Thus,
an RNN utilizes the information of the previous time step to generate the
output for the current time-step, something that is desired when sequential
information is being processed.
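The idea of carrying a value from one time-step to the next can be sketched as below. This is a sketch under stated assumptions, not the unit's own formulation: it uses the common simple recurrence in which the carried-over state is tanh(W_x x + W_h h) and the output is a linear function of that state, the weights are random rather than trained, and each word index is turned into a one-hot vector before being fed in. All names (rnn_step, run_rnn, W_x, W_h, W_y) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 6, 4

# Random (untrained) weights, only to show how information flows.
W_x = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
W_y = rng.normal(scale=0.5, size=(vocab_size, hidden_size))

def rnn_step(x_t, h_prev):
    """One time-step: combine the current input with the carried-over state."""
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev)   # state passed to the next time-step
    y_t = W_y @ h_t                           # output for this time-step
    return y_t, h_t

def run_rnn(token_ids):
    h = np.zeros(hidden_size)                 # initial state
    outputs = []
    for token in token_ids:
        x_t = np.eye(vocab_size)[token]       # one-hot vector for the word index
        y_t, h = rnn_step(x_t, h)
        outputs.append(y_t)
    return np.array(outputs)

# Word indices 0..5 stand for the six tokens of the example sentence.
outputs = run_rnn([0, 1, 2, 3, 4, 5])
print(outputs.shape)
```

Changing an early word changes the outputs at later time-steps, which is exactly the dependence on previous time-steps described above.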
Let’s look at the mathematical description of an RNN.
�� 2
�
�� ��
�
The inner product gives us the sense of how correlated two vectors are. The
inner product is maximum when two vectors are in the same direction. When
the inner product between two vectors is zero, they are termed Orthogonal
Vectors.
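These properties are easy to check with numpy. The vectors below are made up for illustration, and the helper cosine (the inner product divided by the product of the lengths) is introduced here only to compare directions independently of length.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])     # perpendicular to a
c = np.array([3.0, 0.0])     # same direction as a

def cosine(u, v):
    """Inner product scaled by the lengths of the two vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

dot_ab = np.dot(a, b)        # zero: a and b are orthogonal
dot_ac = np.dot(a, c)
print(dot_ab, dot_ac, cosine(a, c))
```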
8.6.2 Multivariable Calculus
Single variable calculus deals with functions of one variable, $f(x)$. However, in
machine learning, we are often interested in functions that depend on several
variables. For example, to predict if it will rain today, we might want to know
the Temperature, Pressure, Wind speed, Humidity and so on. Thus the
probability of it raining today is a function of all of these variables,
$f(T, P, W, H)$
Naturally, we want to find analogues of derivatives and integrals for such
functions. Also, we would like to see how to find minima and maxima for
these functions. Multivariable Calculus deals with these questions. We will
consider the functions of two variables x and y, as these are easy to understand
visually.
Concept of Gradient
Partial Derivative: Consider a function $f(x, y)$. The partial derivative of $f$
with respect to $x$ at a given point $(x_0, y_0)$ is given by
$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \left. \frac{f(x + h, y) - f(x, y)}{h} \right|_{(x_0, y_0)}$$
The partial derivative measures the change in the function with respect to one
variable while keeping the other variables fixed. Let’s look at an example.
Example: Find the partial derivatives of the function $f(x, y) = x^2 + y^2$ at
$(x = 2, y = 3)$.
Solution:
$$\frac{\partial f}{\partial x} = 2x$$
At $x = 2$: $\frac{\partial f}{\partial x} = 4$
$$\frac{\partial f}{\partial y} = 2y$$
At $y = 3$: $\frac{\partial f}{\partial y} = 6$
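The limit definition can be checked numerically by using a small but finite h. The helper names partial_x and partial_y and the step size h are choices made for this sketch.

```python
def f(x, y):
    return x ** 2 + y ** 2

def partial_x(func, x, y, h=1e-6):
    """Approximate the partial derivative with respect to x, keeping y fixed."""
    return (func(x + h, y) - func(x, y)) / h

def partial_y(func, x, y, h=1e-6):
    """Approximate the partial derivative with respect to y, keeping x fixed."""
    return (func(x, y + h) - func(x, y)) / h

df_dx = partial_x(f, 2.0, 3.0)
df_dy = partial_y(f, 2.0, 3.0)
print(df_dx, df_dy)
```

The printed values are close to, but not exactly, the hand-computed derivatives because h is small but not zero.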
Gradient
Gradient gives the direction (in the domain of the function) in which the
function is the “steepest,” i.e., the direction of maximum change. The
magnitude of the gradient quantifies the steepness.
$$\nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{pmatrix}$$
Note that the gradient of a function is a vector quantity.
If the gradient of a function is zero at any point, it may point to one of the three
cases:
Local Maxima: A point where the function attains a maximum value
locally, that is, the value at the point is greater than at all neighbouring
points.
Local Minima: A point where the function attains a minimum value
locally, that is, the value at the point is less than at all neighbouring
points.
Saddle Point: Saddle points occur when the gradient is zero, but the
point is neither a maximum nor a minimum. The figure shows how the
function appears to have a minimum when viewed from the front and a
maximum when viewed from the side.
Check Your Progress 5
In this section, you studied Mathematics for Machine Learning, now answer
the questions given in Check Your Progress-5.
Note: a) Write your answer in about 50 words
b) Check your answer with possible answers given at the end of the unit
(1) Find the gradient of the following function:
$f(x, y) = e^{x^2 + y^2}$
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(2) Find the Eigenvalue and Eigenvector for the following matrix.
$$A = \begin{pmatrix} 3 & 0 \\ 1 & 1 \end{pmatrix}$$
(Hint:)
$Ax = kx$
$(A - kI)x = 0$
$|A - kI| = 0$
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(3) Find the Dot Product of the following pair of vectors.
$a = [2, 4, 6, 1, 8]^T$
$b = [1, 0, 3, 0, 1]^T$
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
8.7 SOFTWARE FOR MACHINE LEARNING
Python is the most commonly used programming language for designing
machine learning algorithms. The easiest way of using Python is to use Google
Colab Notebooks. Colab Notebooks can run Python commands interactively.
Google also gives access to high-end GPUs for free, which greatly reduces the
time taken to train models, especially deep-learning models.
8.7.1 Setting Up the Environment
Google Colab
To use Google Colab you first must have a Google account. If you don’t already
have an account, you can create a Google account and set up an email address
and a password.
Go to https://colab.research.google.com/ and sign in with your Google account.
Select the New Notebook option. This will open a blank notebook.
The default notebook contains a code cell. You can run blocks of Python code
in these cells. The other type of cell is the Text Cell, where you can enter a
description of your code or any other information that you want to.
Just to get started, make a text cell and type “My First Colab Notebook”, now
create a Code Cell and type print('hello').
It should look something like this,
Now run each cell using Ctrl+Enter. Alternatively, use Runtime->Run All. The
text will get formatted according to Markdown Syntax. All code output will be
displayed below the cell.
You can add multiple code cells to implement different parts of a code. All
variables, functions and classes defined in one cell are accessible to other cells.
The getting started guide gives a more detailed and comprehensive
introduction.
Setting Up Locally
The easiest way to set up an environment on your system is by using Anaconda.
Anaconda is a package manager, an environment manager and a Python/R data
science distribution. This basically means that machine learning
packages such as numpy, TensorFlow etc. can be installed and managed from
one place. Following is a quick guide on how to install Anaconda:
i. Download the Anaconda installer.
ii. Double click the installer to launch.
iii. Complete the set up by agreeing to the Licencing terms and selecting
the location for installation.
iv. Select the following option in advanced menu(Recommended by
Anaconda).
v. Click the Install button. If you want to watch the packages Anaconda is
installing, click Show Details.
vi. After a successful installation you will see the “Thanks for installing
Anaconda” dialog box:
Open Anaconda Navigator from the Start menu. You can install different tools
that are used in Data Science and Machine Learning from here. It is
recommended that you install Jupyter Notebook for this course. A Jupyter
Notebook works almost like a Colab Notebook, just with a minor difference.
Anaconda Navigator
8.7.2 Numpy
Although numpy is not a machine learning library, it is extensively used to
handle the data and pass inputs to other libraries/frameworks. Numpy helps to
handle data that is in the form of an array. For example, an image is an array of
pixels, so you can represent an image using numpy. Operations in numpy are
highly optimized and work way faster than other alternatives such as iterating
over loops. To illustrate this, let’s sum an array with 10^7 elements.
It takes the loop about 2 seconds to evaluate the sum, whereas numpy takes
only 0.02 seconds for the same task, making numpy 100x faster than the
alternative.
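A comparison of this kind can be sketched as follows. The array size n is set to 10^6 here only to keep the run short (set n = 10**7 to match the text), and the measured times will differ from machine to machine, so the figures quoted above should be read as indicative.

```python
import time
import numpy as np

n = 10**6                                   # use 10**7 to match the text
values = np.arange(n, dtype=np.int64)

# Sum by iterating over the elements in a Python loop.
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += v
loop_seconds = time.perf_counter() - start

# Sum using numpy's optimized routine.
start = time.perf_counter()
numpy_total = values.sum()
numpy_seconds = time.perf_counter() - start

print("loop :", loop_total, f"{loop_seconds:.4f} s")
print("numpy:", numpy_total, f"{numpy_seconds:.4f} s")
```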
Numpy arrays are often called “ndarrays” which is a short form for n-
dimensional arrays. Numpy has inbuilt methods that can be utilized to perform
almost all linear algebra tasks. The simplest way to create a numpy array is by
passing a list to the np.array() method. It returns an ndarray having the same
values as the list, but a fixed data type.
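The short sketch below shows np.array() turning lists into ndarrays; the values are made up for illustration.

```python
import numpy as np

# A list mixing integers and a float becomes an array with one fixed data type.
mixed = [1, 2, 3.5]
arr = np.array(mixed)
print(arr, arr.dtype)

# A list of lists becomes a 2-dimensional array.
matrix = np.array([[1, 2], [3, 4]])
print(matrix.ndim, matrix.shape)

# Linear algebra operations work directly on ndarrays.
product = matrix @ matrix        # matrix multiplication
print(product)
```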
The fit() method returns a history object which stores the values of different
parameters such as accuracy, validation accuracy, loss for each epoch. The
history member inside the history object is a dictionary that stores values for
different parameters for each epoch.
The compile() method configures the model. It takes as input the optimizer that
we want to use to minimize the loss function and the metric for validation.
Finally, we test the model for our test data and see how it performs.
The evaluate() method returns the value of the loss and the metric chosen (In
our case accuracy). The evaluate function gives the results for the entire data
set. If we wish to make an individual prediction on a test data, we can use the
predict() method.
Check Your Progress 6
In this section, you studied Software for Machine Learning, now answer the
questions given in Check Your Progress-6.
Note: a) Write your answer in about 50 words
b) Check your answer with possible answers given at the end of the unit
(1) What will be the output for the following code?
print(np.sum([1,2,3,4]))
print(np.concatenate([[1,2,3,4],[4,3,2,1]]))
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(2) Identify and describe the layers used in the following Keras model.
model = keras.Sequential(
[
keras.Input(shape=input_shape),
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(num_classes, activation="softmax"),
]
)
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
(3) What is one-hot encoding?
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
8.8 SUMMARY
This unit dealt with the fundamentals of Machine Learning. This unit discussed
how a machine learns. Machine learning algorithms were explained and linked
to various machine learning categories. Neural Networks and Deep learning
were described to appreciate the power of machine learning. Mathematics and
Machine Learning Software are crucial in appreciating machine learning
algorithms. These have been ingrained at appropriate places in this unit and
can be used easily by learners either using Google Colab (online) or python
package Anaconda (by installing on your computer/laptop).
8.9 KEYWORDS
Activation Function: The activation function is a mathematical function that
transforms the output of a neuron into a desired non-linear form before it is
sent to the next layer. It maps the summation result to a desired range.
Agent: An agent is the entity which performs certain actions by interacting
with the environment and has a goal of maximizing its rewards.
Algorithm: An algorithm provides fixed computational rules.
Conventional algorithm: Algorithms based on explicit instructions of how to
perform a task
Machine Learning algorithm: Algorithms that are based on an implicit set of
rules (based on data, not on explicit instructions of how to perform a task)
Anaconda: The most popular data science platform for Python
Backpropagation: Backpropagation is the method for performing gradient
descent in artificial neural networks. It allows us to compute the derivative of
a loss function with respect to every free parameter (i.e. weight and bias) in the
network. It does so layer by layer.
Classifier: A classifier is a type of machine learning algorithm used to assign a
class label to a data input.
Convolutional Neural Networks (CNN): CNN is one of the most popular
deep neural network algorithms. It is mostly used in visual recognition tasks. It
takes an image as input and learns features from the different parts of the
image.
Environment: Think of the environment as a board game in which landing on
a particular square gives a reward or penalty. Thus the environment is a
collection of entities with which an agent can interact
Gradient descent: Gradient descent is an optimization algorithm which is
commonly used to train machine learning models and neural networks.
K-means clustering: K-means clustering is an unsupervised learning
algorithm, which attempts to partition the set of given input data into k
partitions by minimizing the variance for each partition.
Keras: Keras is a high level open source neural network library written in
python.
Kernel: In machine learning, a kernel is the measure of resemblance where a
kernel function defines the distribution of similarity of points around a given
point.
Kernel-trick: The kernel-trick is a method that allows one to use a linear
classifier to solve a non-linear problem.
Labelled Dataset: Labelled data is data that comes with a name or type. In
other words, a dataset that contains the data as well as its labels (name/type).
MNIST dataset: The MNIST (Modified National Institute of Standards and
Technology) dataset is a large collection of handwritten digits.
Reward: Reward is a numerical incentive given to the agent for performing a
certain task.
Recurrent Neural Networks (RNN): A recurrent neural network (RNN) is a
type of artificial neural network which uses sequential data or time series data.
These deep learning algorithms are commonly used in language translation,
natural language processing (nlp), speech recognition, and image captioning;
they are incorporated into popular applications such as Siri, voice search, and
Google Translate.
ReLU (Rectified Linear Unit): ReLU is an activation function used in deep neural
networks. The ReLU function is defined as $\max(0, z)$.
Scikit-Learn: Scikit-Learn is a complete machine learning package that has
most of the machine learning techniques implemented in a highly optimized
fashion.
Support Vector Machine (SVM): SVM is a machine learning algorithm
popular for regression and classification.
Tanh: Tanh is an activation function in a neural network.
1. Input Layer: The input layer is basically the input data itself, reshaped
to a suitable shape.
2. Hidden Layer: Hidden layer is where all of the computation and
learning occurs.
3. Output Layer: It finally converts the output of the hidden layer to the
desired output such as {0,1} in the case of binary classification.
2) What advantages does deep learning have over other algorithms such as
Support Vector Machine?
As the size of data scales up, traditional machine learning algorithms (like
Support Vector Machine) stop performing better. However, in this regime,
deep learning models continue to scale up in performance with the data making
them very relevant in the current scenario when a humongous volume of data
is generated on a daily basis (Think Facebook, YouTube, Satellite Data).
3) What is the major problem associated with simple RNNs?
It turns out very simple RNNs are not able to learn long-distance connections
in the data. For example, in the sentence “Sachin Tendulkar is one of the
greatest cricket players, he has over 14,000 runs”, we expect the RNN to be
able to figure out that “he” refers to Sachin Tendulkar. However, in practice,
RNNs do poorly when the gap between two time-steps increases and are
unable to make the connection.
Check Your Progress 5 – Possible Answers
1) Find the gradient of the following function.
$$\nabla f(x, y) = \begin{pmatrix} 2x \, e^{x^2 + y^2} \\ 2y \, e^{x^2 + y^2} \end{pmatrix}$$
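2) Find the Eigenvalue and Eigenvector for the following matrix.
$$\det \begin{pmatrix} 3 - k & 0 \\ 1 & 1 - k \end{pmatrix} = 0$$
$(3 - k)(1 - k) = 0$
Eigen values: $k = 1$, $k = 3$
For $k = 3$, solve
$$\begin{pmatrix} 3 - 3 & 0 \\ 1 & 1 - 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0$$
$x = 2$, $y = 1$
Similarly, solve for k=1.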
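The hand calculation can be cross-checked with numpy's np.linalg.eig(). Note that numpy returns eigenvectors scaled to unit length, so they may differ from the hand-computed vector by a constant factor; the variable names below are made up for illustration.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [1.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)
print(eigenvectors)          # each column is an eigenvector (unit length)

# The hand-computed eigenvector for k = 3 satisfies A v = 3 v.
v = np.array([2.0, 1.0])
print(A @ v, 3 * v)
```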
2) Identify and describe the layers used in the following Keras model.
Convolution Layer: Uses a filter to generate an output corresponding to the dot
product of filter and overlapping image.
MaxPool Layer: Uses a filter to select maximum values from the image for
each filter window
Flatten Layer: Reshapes the data for compatibility
Dropout Layer: Randomly turns off some neurons during training
Dense Layer: Is the simple neural network layer
3) What is one-hot encoding?
Note that we changed the labels to categorical labels. What this means is
that each number from 0 to 9 is represented by a vector with 1 at the
corresponding position.
$0 \to [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]^T$,
$1 \to [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]^T$, …,
$9 \to [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]^T$
This is done because we use a SoftMax activation function with the last
layer, which returns output in this format. This is known as one-hot
encoding.
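One simple way to produce such vectors with numpy is to pick rows of an identity matrix, as sketched below. The function name one_hot is made up for illustration; Keras also provides a utility for the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the position of the label."""
    return np.eye(num_classes, dtype=int)[labels]

encoded = one_hot(np.array([0, 1, 9]))
print(encoded)

# Taking the position of the largest entry recovers the original labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```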
4. https://web.mit.edu/6.034/wwwbob/svm.pdf
5. https://ml-cheatsheet.readthedocs.io/en/latest/backpropagation.html
6. https://users.math.msu.edu/users/gnagy/teaching/11-fall/mth234/L19-234-th.pdf
7. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
8. https://cs231n.github.io/convolutional-networks/
9. https://keras.io/examples/vision/mnist_convnet/
UNIT 9 AI AND MACHINE LEARNING FOR
SMART CITIES
Structure
9.1 Introduction
9.2 Healthcare
9.3 Education
9.4 Mobility and Transportation
9.5 Energy Sector
9.6 Environment and Economy
9.7 AI and ML Challenges
9.8 Summary
9.9 Keywords
9.10 Check Your Progress – Possible Answers
9.11 References and Selected Readings
9.1 INTRODUCTION
Key components of smart cities are: smart people, smart transportation, smart
living, smart environment, smart economy and smart governance. These
components get manifested in education, mobility and transportation,
healthcare, energy, environment and economy. In education, AI can aid in
offering personalized, flexible and inclusive learning.
AI can play a significant role in developing a more efficient intelligent
transport system in smart cities. With the coming of IoT devices, advanced
sensors and an increase in data rates, AI and ML have seen extensive use in the
healthcare sector. The very objective of a smart environment is sustainability.
Sustainability signifies the balance between city and environment. Smart cities
should be developed to utilize natural resources in a sustainable way. Air
quality, water quality, waste management and building management are the
attributes of a smart environment. Smart cities are flooded with real-time data
collected from various sources. Analysing this vast amount of data in full is
nearly impossible without the use of machine learning tools. AI and ML have
penetrated every aspect of smart cities. When it comes to the diagnosis of
various diseases, Artificial Intelligence (AI) and Machine Learning (ML)
algorithms play a key role. Deep Learning, a subfield of machine learning,
offers models such as Convolutional Neural Networks (CNN) that can be applied
in cancer imaging to assist pathologists in detecting and classifying the
disease at earlier stages. Support Vector Machine (SVM) can help in the
diagnosis of heart disease. The main algorithm used for autonomous vehicles,
also called self-driving cars, is the Convolutional Neural Network (CNN). The
long short-term memory (LSTM) model, a common model in the field of deep
learning, can be effectively applied to predicting the output of wind and
photovoltaic power generation. K-nearest neighbour, Random Forest and Support
Vector Machine can be used to classify pollution data and estimate the
pollution level. Alongside this ubiquitous influence of AI and ML on every
component of smart cities, there are many challenges that need to be
addressed in the coming days.
Objectives:
In this unit, you will learn the applications of AI and Machine Learning for
smart cities. After reading this unit, you will be able to:
Appreciate applications of AI and ML for healthcare
Understand ML algorithms in education
Identify the role of AI and ML in mobility and transportation
Understand the concept of AI and ML in the energy sector
Identify various applications of AI and ML in the environment and
economy
Discuss challenges of AI and ML in key components of smart cities
9.2 HEALTHCARE
With the coming of IoT devices, advanced sensors and an increase in data rates,
AI and ML have seen extensive use in the healthcare sector. AI and ML are
playing a pivotal role in disease diagnosis, cure prediction, and medical
imaging.
9.2.1 Predictive Medicine
Predictive medicine is a branch of medicine that aims to identify patients at
risk of developing a disease, thereby enabling either prevention or early
treatment of that disease. As AI can find meaningful relationships in raw data,
it can support diagnosis, treatment and predictive outcomes in many medical
situations. Applying AI helps medical professionals to proactively manage
a disease that is likely to develop. Further, AI can help predict disease by
identifying a patient's risk factors, which makes earlier healthcare
intervention possible.
Approaches to predicting cardiovascular risk without AI and ML fail to identify
many people who would benefit from preventive treatment, whereas others
receive an unnecessary intervention. ML offers improved accuracy in
predicting cardiovascular risks. ML can also be used to substantially improve
the accuracy of predicting cancer susceptibility, recurrence and mortality. Deep
Learning technologies like Convolutional Neural Networks (CNN) can be
applied in cancer imaging to assist pathologists in detecting and classifying
the disease at earlier stages, thus improving the patient's chances of survival,
especially for lung, breast and thyroid cancer.
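The two kinds of error mentioned for cardiovascular risk prediction (at-risk people who are missed, and people who receive an unnecessary intervention) are commonly measured as sensitivity and specificity. The following is a minimal sketch with made-up labels for ten hypothetical patients; the data and the function name are assumptions for illustration and do not come from any study.

```python
def sensitivity_specificity(actual, predicted):
    """actual / predicted: sequences of 1 (at risk) and 0 (not at risk)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # at risk, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # at risk, missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # not at risk, not flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # unnecessary intervention
    sensitivity = tp / (tp + fn)   # share of at-risk patients identified
    specificity = tn / (tn + fp)   # share of healthy patients left alone
    return sensitivity, specificity

# Hypothetical labels for ten patients (made up for illustration)
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```

A better predictive model raises both numbers: fewer at-risk patients are missed and fewer healthy patients are treated unnecessarily.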
9.2.2 AI in Clinical Trials
Clinical trials are a type of research that studies new tests and treatments and
evaluates their effects on human health outcomes. People volunteer to take part
in clinical trials to test medical interventions including drugs, cells and other
biological products, surgical procedures, radiological procedures, devices,
behavioural treatments and preventive care. Clinical trials proceed through
four phases, and every phase takes a considerable amount of time and money.
Further, the success rate is very low. AI can help eliminate time-consuming
data monitoring procedures and has the potential to reduce clinical trial
cycle duration.
9.2.3 ML in Medical Image Analysis
The objective of medical image analysis is to assist clinicians and radiologists
in the efficient diagnosis and prognosis of diseases. Deep Learning (DL), a
subfield of ML, is used for automatic extraction of information from medical
images such as magnetic resonance imaging (MRI), X-ray, computed tomography
(CT) and ultrasound. The important tasks in medical image analysis are
detection, classification and segmentation. Detection is the identification of
specific abnormalities, such as tumours and cancers, in medical images. CNNs
give higher performance in detection and classification tasks than
conventional techniques. CNNs and RNNs are widely used for segmentation
tasks in medical image analysis.
The architecture of ML in medical image analysis is shown in Figure 9.1.
DL automatically extracts imaging features from medical images. Predictive
modelling for tasks such as detection, classification and segmentation is
then done based on the extracted imaging information.
Figure 9.1: Medical Images → Feature Extraction (using CNN) → Predictive Modeling (ML model) → Prediction
[Figure labels: Data Gather (of learner’s interactions), Data Analysis (using AI and ML), Feedback]
[Figure labels: Smart Meter, Smart Controller]
9.8 SUMMARY
This unit discussed the key components of smart cities and the applications of
AI and ML in the light of these components. It was shown that AI and ML are
embedded in every aspect of smart cities. In smart cities, millions of devices
generate data, and analysing these data in full is nearly impossible without
the use of AI and ML. Whether in the diagnosis of disease, personalized
learning, the integration of renewable energy, intelligent transportation
systems, entrepreneurship or air quality monitoring, AI and ML are there to
support and influence.
9.9 KEYWORDS
Convolutional Neural Networks (CNN): CNN is one of the most popular
deep neural network algorithms. It is mostly used in visual recognition tasks.
It takes an image as an input and learns the features from the different parts of
the image.
Decision Tree: A decision tree is a supervised learning technique that can be
used for both classification and regression problems.
Genetic Algorithms (GA): Genetic algorithms are stochastic search
algorithms that act on a population of possible solutions. They are loosely
based on the mechanics of population genetics and selection. The potential
solutions are encoded as ‘genes’ — strings of characters from some alphabet.
K-means clustering: K-means clustering is an unsupervised learning
algorithm, which attempts to partition the set of given input data into k
clusters by minimizing the variance within each cluster.
K-nearest neighbour: The K-nearest neighbours algorithm is a supervised
learning classifier, which uses proximity to make classifications or predictions
about the grouping of an individual data point. While it can be used for either
regression or classification problems, it is typically used as a classification
algorithm, working off the assumption that similar points can be found near
one another.
LSTM neural network: The long short-term memory (LSTM) network is a common
model in the field of deep learning. It is an improved model based on the
recurrent neural network (RNN). The characteristic of the LSTM network is its
use of memory modules instead of common hidden nodes, which helps prevent the
gradient from vanishing or exploding after passing through many time steps and
so overcomes some difficulties encountered in traditional RNN training. LSTM
is suitable for processing and predicting important events with relatively
long intervals and delays in time series.
Naïve Bayes: The Naive Bayes classification algorithm is a probabilistic
classifier. It is based on probability models that incorporate strong
independence assumptions.
Random Forest: This method uses multiple decision trees for the classification
and regression of large amounts of data. Each tree generates a value from a
random subset of variables, and the outputs of the trees are combined into the
final prediction.
RNN: Recurrent neural networks (RNN) are useful in cases where we have
sequential data, such as language models, music and so on. RNNs take a
sequential input: $x = x_0, x_1, \ldots, x_n$.
Support Vector Machine (SVM): SVM is a machine learning algorithm
popular for regression and classification.
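As an illustration of the K-nearest neighbour entry in this list, the sketch below classifies a pollution reading by majority vote among its nearest labelled neighbours. It assumes NumPy; the readings, units, labels and the helper name `knn_predict` are all made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_x - query, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]                   # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical readings: [PM2.5, NO2] in arbitrary units (made-up values)
train_x = np.array([[10, 15], [12, 18], [14, 20],
                    [80, 60], [85, 70], [90, 65]], dtype=float)
train_y = ["low", "low", "low", "high", "high", "high"]

label_a = knn_predict(train_x, train_y, np.array([11.0, 16.0]))
label_b = knn_predict(train_x, train_y, np.array([88.0, 66.0]))
print(label_a, label_b)
```

The choice of k and of the distance measure affects the result; k = 3 and Euclidean distance are used here only as simple defaults.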