Machine Learning and Soft Computing: CSCC53 MCA V Sem 2020
Soft Computing
CSCC53
MCA V Sem
2020
Textbook
• Text Book: Machine Learning by Tom M. Mitchell, TMH
• Reference: S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Prentice Hall, 2010.
Other texts (Title, Authors, Publisher, Year, Edition)
• AI: Structures and Strategies for Complex Problem Solving, George F. Luger, Pearson Ed., 2004, 4th edition
• Artificial Intelligence, Rich, Knight, Nair, McGraw-Hill, 2012, 3rd edition
• Artificial Intelligence and Intelligent Systems, N. P. Padhy, Oxford University Press, 2005, 1st edition
• Artificial Intelligence: A Guide to Intelligent Systems, Michael Negnevitsky, Pearson Ed., 2011, 2nd edition
Artificial Intelligence
• The automation of activities that we associate with
human thinking, activities such as decision making,
problem solving and learning.
AI techniques
• Knowledge Representation: This technique addresses the problem of capturing the full range of knowledge required for intelligent behavior in a formal language, i.e. one suitable for computer manipulation. Some methods are listed below; a small illustrative sketch follows the list:
– Predicate Calculus
– Semantic nets (Quillian)
– Frames (introduced by Marvin Minsky to represent common-sense knowledge)
– Conceptual Dependency (Schank, for natural language)
– Scripts (stereotyped sequences of events)
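As a rough illustration of one of these methods, the sketch below shows a tiny frame-style representation in Python; the Frame class and the example slots are invented purely for illustration and are not part of the course material.

# A minimal frame-style knowledge representation (illustrative sketch only).
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent        # "is-a" link to a more general frame
        self.slots = slots          # attribute/value pairs

    def get(self, slot):
        # Look up a slot locally, then inherit it from the parent frame.
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.get(slot)
        return None

bird = Frame("bird", covering="feathers", locomotion="flies")
penguin = Frame("penguin", parent=bird, locomotion="walks")   # overrides an inherited default

print(penguin.get("covering"))    # "feathers", inherited from the bird frame
print(penguin.get("locomotion"))  # "walks", overridden locally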
AI
• Two notions of AI as defined by Prof.
Andrew Ng, Stanford Univ.
– Artificial Narrow Intelligence
– Artificial General Intelligence
Machine learning
• Machine learning is a type of artificial intelligence (AI)
that provides computers with the ability to learn without
being explicitly programmed.
Machine Learning
• An important subfield of AI, as it has a huge impact on society.
• How can we build computer systems/programs that automatically improve with experience, and what are the fundamental laws that govern all learning processes?
Machine Learning
• Instead of writing a program by hand, we collect lots of
examples that specify the correct output for a given
input.
Traditional Programming
Traditionally, we input data and a program to a computer to get output.
Data + Program → Computer → Output
Definition of Machine Learning
In machine learning, we instead give the computer data together with the desired output, and the computer produces the program.
Training: Data + Output → Computer → Program
Testing: Data + Program → Computer → Output
Machine Learning
• The world has become immeasurably data-rich
– Human genome is being sequenced
– Vast chemical databases
– Pharmaceutical databases
– Financial databases
– Medical records of patients etc.
Soft Computing
• Soft computing differs from conventional (hard) computing in that it is tolerant of
– imprecision,
– uncertainty,
– partial truth, and
– approximation.
Supervised Machine Learning
• To design a machine-learning-based model, we collect lots of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples and produces a program that does the job.
– The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers.
– If we do it right, the program works for new cases as well as the ones we trained it on.
Some more examples of tasks that are best solved by using a
machine learning algorithm:
• Machine Translation
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a
nuclear power plant
• Prediction:
– Future stock prices
– Future currency exchange rates
Type of prediction
The different types of predictive models are summed up below; a small illustrative sketch follows.
• Regression – Outcome: continuous. Examples: linear/non-linear regression (fitting a line/curve to the data). Tasks: stock price prediction, electricity load forecasting, predicting changes in temperature or fluctuations in power demand, etc.
• Classification – Outcome: a class. Examples: logistic regression, SVM, Naive Bayes, backpropagation. Tasks: classifying whether an email is spam or not, whether a tumour is malignant or benign, whether a website is fraudulent or not, etc.
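As a rough illustration of this distinction (not part of the slides), the sketch below fits one model of each kind in Python; it assumes scikit-learn and NumPy are available, and the data are made-up toy values.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # a single input feature

# Regression: the target is a continuous value (e.g. a price).
y_continuous = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
reg = LinearRegression().fit(X, y_continuous)
print(reg.predict([[6.0]]))      # a continuous number, close to 6 for this toy fit

# Classification: the target is a class label (e.g. spam / not spam).
y_class = np.array([0, 0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[6.0]]))      # a class label, 0 or 1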
Type of model
The different models are summed up below:
• Discriminative model – Goal: directly estimate P(y|x). What's learned: the decision boundary.
• Generative model – Goal: estimate P(x|y), then use it to deduce P(y|x). What's learned: the probability distributions of the data.
Examples of unsupervised learning
• Text mining
• Web analysis
• Marketing
• E.g. targeted marketing, recommender systems, and customer segmentation are applications of unsupervised learning.
– Targeted marketing identifies an audience likely to buy services or products and promotes those services or products to that audience.
– Once these key groups are recognized, companies develop marketing campaigns and specific products for those preferred market segments.
Deep Learning
• Deep learning tasks:
– Image/video clip analysis: face recognition
– Computer vision: object detection, car detection
– Speech recognition: listening to an audio clip and understanding what is said in it
Pre-trained Models
• Pretrained models are a wonderful source of help for
people looking to learn an algorithm or try out an existing
framework.
– E.g. classifying images using GoogLeNet, AlexNet, VGG-16, VGG-19, DenseNet-201, etc.
– The pretrained networks are trained on more than a million
images and can classify images into 1000 object categories,
such as keyboard, coffee mug, pencil, and many animals.
– The training images are a subset of the ImageNet database
• Transfer Learning
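As a rough illustration (not from the slides), the sketch below classifies one image with a pretrained ImageNet network in Python; it assumes PyTorch with a recent torchvision (0.13 or later) and Pillow are installed, and the input file name is hypothetical.

# Classify one image with a network pretrained on ImageNet (illustrative sketch only).
import torch
from torchvision import models
from PIL import Image

weights = models.GoogLeNet_Weights.DEFAULT
model = models.googlenet(weights=weights)     # any of the pretrained nets listed above works similarly
model.eval()

preprocess = weights.transforms()             # the resizing/normalization the net was trained with

img = Image.open("coffee_mug.jpg")            # hypothetical input image
batch = preprocess(img).unsqueeze(0)          # add a batch dimension

with torch.no_grad():
    logits = model(batch)
class_index = logits.argmax(dim=1).item()
print(weights.meta["categories"][class_index])   # one of the 1000 ImageNet category names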
EEG
• Article cutting from Times Trends dated July 3, 2017
AI and Data
• IBM quits the facial-recognition business
dated: June 10, 2020
source: https://www.theverge.com/2020/6/8/21284683/ibm-no-
longer-general-purpose-facial-recognition-analysis-software
• IBM will no longer sell “general purpose” facial-recognition
technology, chief executive Arvind Krishna wrote in a letter to US
Congress.
• Amazon's Rekognition is a tool for monitoring people of interest; Amazon has doubled down on providing other surveillance technologies to governments.
• In 2018, nearly 70 civil rights and research organizations wrote a
letter to Amazon CEO Jeff Bezos demanding that Amazon stop
providing face recognition technology to governments. source:
https://www.aclu.org/letter-nationwide-coalition-amazon-ceo-jeff-
bezos-regarding-rekognition/
MIT Technology Review
On Wednesday, June 10, 2020, Amazon shocked civil rights activists and
researchers when it announced that it would place a one-year moratorium on police
use of Rekognition.
AI and data
• In modern artificial intelligence, data rules.
• If there are many more white men than black women in the training data, the system will be worse at identifying black women.
AI Definitions: Summary
Bayesian Learning
• Provides practical learning algorithms
– Naïve Bayes learning
– Bayesian belief network learning
– Combines prior knowledge (prior probabilities) with observed data
Dilemma
This person dropped their
ticket in the hallway.
Do you call out
“Excuse me, ma’am!”
or
“Excuse me, sir!”
You have to make a
guess.
Bayesian Inference
• Bayesian inference is a way to capture
common sense.
• It helps you use what you know to make
better guesses.
Conditional probabilities
P(A | B) is the probability of A, given B.
“If I know B is the case, what is the probability that A is also the case?”
P(A | B) is not the same as P(B | A).
Joint probabilities
● P(A and B) or P(A,B) or P(A with B) or P(A ∩ B)
● P(A,B)=P(A)*P(B|A)
● P(A,B,C)=P(A)*P(B|A)*P(C|A and B)
● P(B,A)=P(B)* P(A|B)
e.g. What is the probability that a person is both a woman and has short
hair?
P(woman with short hair)
= P(woman) * P(short hair | woman)
= .5 * .5 = .25
● P(¬A | B) = 1 − P(A | B)
Marginal Probability
● It is either sunny or it's rainy. Probability of a sunny day is 0.9. A sunny day follows a sunny day with a probability of 0.8. A sunny day follows a rainy day with probability of 0.6. What is the probability that Day 2 is Sunny?
P(D1=Sunny)=0.9, P(D1=Rainy)=0.1
P(D2=Sunny | D1=Sunny)=0.8, P(D2=Rainy | D1=Sunny)=0.2
P(D2=Sunny | D1=Rainy)=0.6, P(D2=Rainy | D1=Rainy)=0.4
P(D2=Sunny)=?
Answer:
P(D2=Sunny) = P(D2=Sunny and D1=Sunny) + P(D2=Sunny and D1=Rainy)
= P(D2=Sunny | D1=Sunny) * P(D1=Sunny) + P(D2=Sunny | D1=Rainy) * P(D1=Rainy)
= 0.8 * 0.9 + 0.6 * 0.1
= 0.78
Marginal Probability
● It is either sunny or it's rainy. Probability of a sunny day is 0.9. A sunny day follows a sunny day with a probability of 0.8. A sunny day follows a rainy day with probability of 0.6. What is the probability that Day 2 is Rainy?
P(D1=Sunny)=0.9, P(D1=Rainy)=0.1
P(D2=Sunny | D1=Sunny)=0.8, P(D2=Rainy | D1=Sunny)=0.2
P(D2=Sunny | D1=Rainy)=0.6, P(D2=Rainy | D1=Rainy)=0.4
P(D2=Rainy)=?
Answer: 0.22 (= 0.2 * 0.9 + 0.4 * 0.1, or simply 1 − 0.78)
Marginal Probability
● It is either sunny or it's rainy. Probability of a sunny day is 0.9. A sunny day follows a sunny day with a probability of 0.8. A sunny day follows a rainy day with probability of 0.6. What is the probability that Day 3 is Sunny?
P(D1=Sunny)=0.9, P(D1=Rainy)=0.1
P(D2=Sunny | D1=Sunny)=0.8, P(D2=Rainy | D1=Sunny)=0.2
P(D2=Sunny | D1=Rainy)=0.6, P(D2=Rainy | D1=Rainy)=0.4
P(D3=Sunny)=?
Answer: solve it as in the previous solution, conditioning on Day 2 instead of Day 1 (the same transition probabilities apply):
P(D3=Sunny) = P(D3=Sunny | D2=Sunny) * P(D2=Sunny) + P(D3=Sunny | D2=Rainy) * P(D2=Rainy)
= 0.8 * 0.78 + 0.6 * 0.22 = 0.756
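A minimal Python sketch of the same day-by-day calculation (the variable names are my own choice; the numbers are the ones given above).

# Marginal probability of sun on Day 2 and Day 3 (law of total probability).
p_sunny = 0.9                # P(D1 = Sunny)
p_sun_given_sun = 0.8        # P(Sunny tomorrow | Sunny today)
p_sun_given_rain = 0.6       # P(Sunny tomorrow | Rainy today)

for day in range(2, 4):
    # P(Sunny today) = P(Sunny | yesterday Sunny)*P(yesterday Sunny) + P(Sunny | yesterday Rainy)*P(yesterday Rainy)
    p_sunny = p_sun_given_sun * p_sunny + p_sun_given_rain * (1.0 - p_sunny)
    print(f"P(Day {day} = Sunny) = {p_sunny:.3f}")   # 0.780 for Day 2, then 0.756 for Day 3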
P(B) = Σ (i = 1 to n) P(B | Ai) * P(Ai)   (law of total probability)
Product Rule: the probability P(A,B) of a conjunction of two events A and B:
P(A,B) = P(A | B) * P(B) = P(B | A) * P(A)
● The joint probability of the same two events is equal whichever way round it is written:
● P(B,A) = P(A,B)
● P(B) * P(A|B) = P(A) * P(B|A)
P(A | B) = P(B|A) * P(A) / P(B)
P(A | B) = P(A,B) / P(B)
Bayes’ Theorem
Properties of Bayes Rule
• The computation or revision of old probabilities, called prior probabilities, in the light of additional information made available by experiment or past records, to derive a set of new probabilities known as posterior probabilities.
• Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied by the probability of the observed data given that hypothesis
• Probabilistic hypothesis: outputs not only a
classification, but a probability distribution over all
classes
Example 2
Does patient have cancer or not?
A patient takes a lab test and the result comes back positive.
The test returns a correct positive result in only 98% of the cases
in which the disease is actually present, and a correct negative
result in only 97% of the cases in which the disease is not
present. Furthermore, .008 of the entire population have this
cancer.
P(cancer) = 0.008, P(¬cancer) = 0.992
P(+ | cancer) = 0.98, P(− | cancer) = 0.02
P(+ | ¬cancer) = 0.03, P(− | ¬cancer) = 0.97
P(cancer | +) = P(+ | cancer) * P(cancer) / P(+)
P(¬cancer | +) = P(+ | ¬cancer) * P(¬cancer) / P(+)
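A small Python sketch of this calculation (the variable names are my own; the numbers are the ones given above).

# Posterior probability of cancer given a positive test, via Bayes' theorem.
p_cancer = 0.008
p_not_cancer = 0.992
p_pos_given_cancer = 0.98
p_pos_given_not_cancer = 0.03

# P(+) by the law of total probability.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_not_cancer * p_not_cancer

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(f"P(cancer | +) = {p_cancer_given_pos:.3f}")   # about 0.21, despite the positive test result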
Reasoning Under Uncertainty
Many different types of errors can contribute to
uncertainty.
1. data might be missing or unavailable
2. data might be ambiguous or unreliable due to
measurement errors
3. the representation of data may be imprecise or
inconsistent
4. data may just be user's best guess (random)
5. data may be based on defaults, and defaults
may have exceptions
Naïve Bayesian Classifier
● It is based on Bayes' theorem with an independence assumption between the predictors.
Bayes' Theorem
• Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem:
P(h | D) = P(D | h) * P(h) / P(D)
• MAP (maximum a posteriori) hypothesis:
h_MAP = argmax (h ∈ H) P(h | D) = argmax (h ∈ H) P(D | h) * P(h)
• Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost
Estimating a-posteriori
probabilities
• Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
• P(X) is constant for all classes
• P(C) = relative freq of class C samples
• C such that P(C|X) is maximum =
C such that P(X|C)·P(C) is maximum
Basic Approach
Bayes Rule: P(h | D) = P(D | h) * P(h) / P(D)
• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h|D) = probability of h given D (posterior density )
• P(D|h) = probability of D given h (likelihood of D given h)
The goal of Bayesian learning: the most probable hypothesis given the training data (the Maximum A Posteriori hypothesis h_MAP):
h_MAP = argmax (h ∈ H) P(h | D)
= argmax (h ∈ H) P(D | h) * P(h) / P(D)
= argmax (h ∈ H) P(D | h) * P(h)
Source: http://artint.info/2e/html/ArtInt2e.Ch10.S1.SS2.html#p1
P(h | D) = P(D | h) * P(h) / P(D)
Output the hypothesis h_MAP with the highest posterior probability:
h_MAP = argmax (h ∈ H) P(h | D)
Comments:
• Computationally intensive
• Provides a standard for judging the performance of learning algorithms
• Choosing P(h) and P(D|h) reflects our prior knowledge about the learning task
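A brute-force sketch of MAP hypothesis selection in Python; the hypothesis space, priors, and likelihoods below are made-up toy numbers, purely for illustration.

# Brute-force MAP selection: pick the hypothesis maximising P(D|h) * P(h).
prior = {"h1": 0.6, "h2": 0.3, "h3": 0.1}           # P(h), toy values
likelihood = {"h1": 0.02, "h2": 0.10, "h3": 0.50}   # P(D|h), toy values

# P(D) is the same for every hypothesis, so it can be dropped from the argmax.
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])
print("MAP hypothesis:", h_map)   # "h3": 0.50*0.1 = 0.05 beats 0.02*0.6 = 0.012 and 0.10*0.3 = 0.03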
Bayesian classification
• The classification problem may be
formalized using a-posteriori probabilities:
• P(C|X) = prob. that the sample tuple
X=<x1,…,xk> is of class C.
• E.g. P(class=N |
outlook=sunny,windy=true,…)
Naïve Bayesian Classification
• Naïve assumption: attribute independence
P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)
• If i-th attribute is categorical:
P(xi|C) is estimated as the relative freq of
samples having value xi as i-th attribute in class
C
• If i-th attribute is continuous:
P(xi|C) is estimated through a Gaussian density
function
• Computationally easy in both cases
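A rough Python sketch of these two estimates (the tiny class-C sample values below are invented purely for illustration).

import math

# Toy training samples belonging to one class C: a categorical and a continuous attribute.
outlook_values = ["sunny", "rain", "sunny", "overcast", "sunny"]   # categorical attribute values in class C
temperatures = [30.0, 21.0, 28.0, 24.0, 27.0]                      # continuous attribute values in class C

# Categorical attribute: P(xi|C) as the relative frequency of the value within the class.
p_sunny_given_c = outlook_values.count("sunny") / len(outlook_values)   # 3/5 = 0.6

# Continuous attribute: P(xi|C) via a Gaussian density fitted to the class samples.
mean = sum(temperatures) / len(temperatures)
var = sum((t - mean) ** 2 for t in temperatures) / (len(temperatures) - 1)   # sample variance

def gaussian_density(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(p_sunny_given_c)                      # estimate of P(outlook=sunny | C)
print(gaussian_density(26.0, mean, var))    # density used as the likelihood P(temperature=26 | C)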
(Training data: the play-tennis table with columns Day, Outlook, Temperature, Humidity, Wind, Play ball; 14 examples, of which 9 have Play ball = yes (p) and 5 have Play ball = no (n).)
Weather example: classifying X
• An unseen sample X = <rain, hot, high, weak>
• P(X|p) · P(p) =
P(rain|p) · P(hot|p) · P(high|p) · P(weak|p) · P(p) =
3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
• P(X|n) · P(n) =
P(rain|n) · P(hot|n) · P(high|n) · P(weak|n) · P(n) =
2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
• Since 0.018286 > 0.010582, X is classified as n (do not play).
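The same arithmetic as a tiny Python sketch (using the counts quoted above; exact fractions are used only to keep the arithmetic clean).

from fractions import Fraction as F

# Class-conditional relative frequencies for X = <rain, hot, high, weak>, as quoted above.
score_p = F(3, 9) * F(2, 9) * F(3, 9) * F(6, 9) * F(9, 14)   # P(X|p) * P(p)
score_n = F(2, 5) * F(2, 5) * F(4, 5) * F(2, 5) * F(5, 14)   # P(X|n) * P(n)

print(float(score_p))   # ~0.010582
print(float(score_n))   # ~0.018286
print("predicted class:", "p (play)" if score_p > score_n else "n (do not play)")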