
Machine Learning and

Soft Computing

CSCC53
MCA V Sem
2020

Textbook
• Text Book: Machine Learning by Tom M. Mitchell, TMH

• Reference: S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Prentice Hall, 2010.

• Papers: selected, topic-based papers from journals such as Artificial Intelligence, Artificial Intelligence Programming, Machine Learning, IEEE Xplore, Data and Knowledge Engineering, Pattern Recognition, etc.

Other texts
Title | Authors | Publisher | Year | Edition
AI: Structures and Strategies for Complex Problem Solving | George F. Luger | Pearson Ed. | 2004 | 4th
Artificial Intelligence | Rich, Knight, Nair | McGraw-Hill | 2012 | 3rd
Artificial Intelligence and Intelligent Systems | N. P. Padhy | Oxford University Press | 2005 | 1st
Artificial Intelligence: A Guide to Intelligent Systems | Michael Negnevitsky | Pearson Ed. | 2011 | 2nd

Artificial Intelligence
• The automation of activities that we associate with
human thinking, activities such as decision making,
problem solving and learning.

• One of the earliest and most significant papers on machine intelligence, "Computing Machinery and Intelligence", was written by the British mathematician Alan Turing [Turing, 1950] over sixty-five years ago.

• The term AI was coined by John McCarthy in 1956.

• It has stood up well to the test of time, and Turing's approach remains universal.

AI techniques
• Knowledge Representation: This technique addresses the problem of capturing the full range of knowledge required for intelligent behavior in a formal language, i.e. one suitable for computer manipulation. Some methods are:
– Predicate Calculus
– Semantic nets (Quillian)
– Frames (Marvin Minsky, to represent common-sense knowledge)
– Conceptual Dependency (Schank, for natural language)
– Scripts (stereotyped sequences of events)

• Search: It is a problem-solving technique that systematically explores a space of problem states, i.e. successive and alternative stages in the problem-solving process. Some methods are (a minimal sketch in Python follows at the end of this slide):
– BFS
– DFS
– Best First Search
– MiniMax Search
– Alpha Beta Cutoff

• Machine Learning: Grew out of work in AI (ANN, GA, HMM, Reinforcement Learning, etc.)
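A minimal sketch of one of the search methods above (breadth-first search), written in Python; the problem-state graph, node names and goal below are invented purely for illustration and are not part of the course material.

from collections import deque

def bfs(graph, start, goal):
    # Breadth-first search over an explicit graph (dict of adjacency lists).
    # Returns a path from start to goal, or None if no path exists.
    frontier = deque([[start]])          # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

# Hypothetical problem-state graph, purely for illustration
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["G"]}
print(bfs(graph, "A", "G"))   # ['A', 'B', 'D', 'G']

Depth-first search differs only in taking the next state from the other end of the frontier (a stack instead of a queue).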

AI
• Two notions of AI as defined by Prof.
Andrew Ng, Stanford Univ.
– Artificial Narrow Intelligence
– Artificial General Intelligence

"Machines will not replace physicians, but


physicians using AI will soon replace those
not using it."
-Antonio Di leva
The Lancet

Machine learning
• Machine learning is a type of artificial intelligence (AI)
that provides computers with the ability to learn without
being explicitly programmed.

• Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Machine Learning
• An important subfield of AI as it has a huge
impact on society.
• How can we build computer systems/programs that automatically improve with experience, and
• What are the fundamental laws that govern all learning processes?

Machine Learning

• It is very hard to write programs that solve problems like recognizing a face.
– We don't know what program to write because we don't know how it's done.
– Even if we had a good idea about how to do it, the program might be horrendously complicated.

• For example, it is very hard to say what makes a handwritten digit a '2'.

Machine Learning
• Instead of writing a program by hand, we collect lots of
examples that specify the correct output for a given
input.

• A machine learning algorithm then takes these examples and produces a program that does the job.
– The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers.
– If we do it right, the program works for new cases as well as the ones we trained it on.

Traditional Programming
Traditionally, we input data and a program to a computer to get output:

Data + Program → Computer → Output
Definition of Machine Learning
In machine learning the roles are reversed: during training, data and the desired outputs are given to the computer, which produces a program (the learned model); during testing, new data and that program are given to the computer, which produces the outputs.

Training: Data + Output → Computer → Program (learned model)
Testing:  Data + Program (learned model) → Computer → Output

Formal Definition of ML given by Tom M. Mitchell (1997)
• Formally: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Informally: algorithms that improve on some task with experience.
• Example: for spam filtering, T is classifying emails as spam or not, P is the fraction of emails classified correctly, and E is a set of emails labelled by users.

Machine Learning
• The world has become immeasurably data-rich
– Human genome is being sequenced
– Vast chemical databases
– Pharmaceutical databases
– Financial databases
– Medical records of patients etc.

• To make sense of the data and to extract information from it, machine learning is the discipline to turn to.
• ML tasks:
– to mine historical medical records to learn which future patients will respond best to which treatments,
– to build search engines that automatically customize to their user's interests.

Data Mining versus Machine Learning

• The process of machine learning is similar to that of data mining, but with a difference.
• Both systems search through data to look for patterns.
– However, instead of extracting data for human comprehension --
as is the case in data mining applications -- machine learning
uses that data to improve the program's own
understanding.
• Machine learning programs detect patterns in data and
adjust program actions accordingly, e.g.:
– Facebook's News Feed changes according to the user's
personal interactions with other users.
– If a user frequently tags a friend in photos, writes on his wall or
"likes" his links, the News Feed will show more of that friend's
activity in the user's News Feed due to presumed closeness.

Soft Computing
• Soft computing differs from conventional (hard)
computing in that, unlike hard computing, it is tolerant of
– imprecision,
– uncertainty,
– partial truth, and
– approximation.

• In effect, the role model for soft computing is the human mind.

• The tools and techniques of Soft Computing (SC) are:
– Fuzzy Logic (FL), Neural Networks (NN),
– Support Vector Machines (SVM),
– Evolutionary Computation (EC) or Genetic Algorithms (GA),
– Probabilistic Reasoning (PR)

Three kinds of Machine Learning
• Supervised
– Target labels are present
– sequence of training vectors or patterns, each with an
associated target output vector
• Unsupervised
– Target labels are missing
– A sequence of input vectors is provided but no target
vectors are specified
• Reinforcement
– Here an agent learns from feedback from the physical environment, by interacting, trying actions, and receiving some sort of evaluation from the environment.

Supervised Machine Learning
• For designing an ML-based model, we collect lots of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples
and produces a program that does the job.
– The program produced by the learning algorithm may look very
different from a typical hand-written program. It may contain
millions of numbers.
– If we do it right, the program works for new cases as well as
the ones we trained it on.

Different types of Supervised Machine Learning

• Classification: the training data set has target labels as classes or discrete output values
– e.g. 0 or 1, malignant or benign, type 1/2/3/4 cancer, etc.
• Regression: the goal is to predict a continuous/real-valued output (see the code sketch after this list)
– It means approximating a real-valued target.
– E.g. you have some inventory and want to predict how many of these items will sell over the next 3 months.
– Predict continuous responses, for example changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
• Pattern association: the desired output is not just a yes or no, but rather a pattern.
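To make the classification/regression distinction concrete, here is a minimal sketch using scikit-learn on toy data; the library choice and all the numbers are assumptions for illustration only, not something prescribed by the slides.

# Classification vs. regression on toy data (illustrative numbers only)
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])   # one input feature

# Classification: discrete target labels (0 = benign, 1 = malignant)
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[3.5]]))        # predicted class label

# Regression: continuous target (e.g. expected monthly sales)
y_reg = np.array([10.2, 19.8, 30.5, 39.9, 50.1, 60.3])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[3.5]]))        # predicted real-valued output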

Some more examples of tasks that are best solved by using a
machine learning algorithm:
• Machine Translation
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a
nuclear power plant
• Prediction:
– Future stock prices
– Future currency exchange rates

Type of prediction
The different types of predictive models are summed up in
the table below:

         | Regression                                        | Classifier
Outcome  | Continuous                                        | Class
Examples | Linear/Non-linear regression (fitting a line/curve to the data) | Logistic regression, SVM, Naive Bayes, Backpropagation
Tasks    | Stock price prediction, electricity load forecasting, predicting changes in temperature or fluctuations in power demand, etc. | Classifying whether an email is spam or not, whether a tumour is malignant or benign, whether a website is fraudulent or not, etc.

Type of model
The different models are summed up in
the table below:
               | Discriminative model       | Generative model
Goal           | Directly estimate P(y|x)   | Estimate P(x|y), then use it to deduce P(y|x)
What's learned | Decision boundary          | Probability distributions of the data
Examples       | Regressions, SVMs          | Naive Bayes, Gaussian Discriminant Analysis

Examples of Supervised learning


• Supervised learning is about Input to Output mapping.

Input Output Application

Email Spam? (Yes/No) Spam filtering

Audio Text Transcript Speech recognition

English Text German Text Machine Translation

Ad, user info Click? (Yes/No) Online advertising

Image of phones Defect? (Yes/No) Visual inspection

Image, radar info Position of other cars Self-driving car

Examples of Un-supervised learning
• Text mining
• Web analysis
• Marketing
• E.g. Targeted marketing, Recommender Systems, and Customer Segmentation are applications of unsupervised learning.
– Targeted marketing identifies an audience likely to buy services
or products and promotes those services or products to that
audience.
– Once these key groups are recognized, companies develop
marketing campaigns and specific products for those preferred
market segments.

Deep Learning
• Deep learning tasks:
– Image/Video clip analysis: face recognition
– Computer vision: object detection, car detection
– Speech: listening to an audio clip and understanding what is said in it

• Deep-learning networks perform automatic feature extraction without human intervention, unlike most traditional machine-learning algorithms.

Pre-trained Models
• Pretrained models are a wonderful source of help for
people looking to learn an algorithm or try out an existing
framework.
– E.g. Classify Image Using GoogLeNet, AlexNet, VGG-16, VGG-
19, and DenseNet-201 etc.
– The pretrained networks are trained on more than a million
images and can classify images into 1000 object categories,
such as keyboard, coffee mug, pencil, and many animals.
– The training images are a subset of the ImageNet database
• Transfer Learning
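A minimal transfer-learning sketch, assuming PyTorch/torchvision as the framework; the slides mention GoogLeNet, AlexNet, VGG, etc. without prescribing a toolchain, so this is only one possible way to reuse a pretrained ImageNet network.

import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (older torchvision uses pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer so the network predicts our own classes (say, 5)
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new layer's parameters are trained on the new (smaller) dataset
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)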

Watch, Attend and Spell S/W System

• Article cutting from Times Trends dated March 20, 2017.

EEG
• Article cutting from Times Trends dated July 3, 2017.

AI and Data
• IBM quits the facial-recognition business
dated: June 10, 2020
source: https://www.theverge.com/2020/6/8/21284683/ibm-no-longer-general-purpose-facial-recognition-analysis-software
• IBM will no longer sell “general purpose” facial-recognition
technology, chief executive Arvind Krishna wrote in a letter to US
Congress.
• Amazon's Rekognition is a tool for monitoring people of interest; Amazon has doubled down on providing such surveillance technologies to governments.
• In 2018, nearly 70 civil rights and research organizations wrote a
letter to Amazon CEO Jeff Bezos demanding that Amazon stop
providing face recognition technology to governments.
source: https://www.aclu.org/letter-nationwide-coalition-amazon-ceo-jeff-bezos-regarding-rekognition/

MIT Technology Review

On Wednesday, June 10, 2020, Amazon shocked civil rights activists and
researchers when it announced that it would place a one-year moratorium on police
use of Rekognition.

AI and data
• In modern artificial intelligence, data rules.

• A.I. software is only as smart as the data used to train it.

• If there are many more white men than black women in the
system, it will be worse at identifying the black women.

• The reason being: society itself is biased, discriminatory, messy, and unequal, and this is embedded into datasets.

• Under-represented groups just don't produce enough data that AI systems can train on.

• During July 2019, Hyderabad airport became the first in India to launch a voluntary facial-recognition system called DigiYatra, followed by Bengaluru (2nd) and Delhi (3rd).

AI Definitions: Summary

Bayesian Learning
• Provides practical learning algorithms
– Naïve Bayes learning
– Bayesian belief network learning
– Combine prior knowledge (prior probabilities)

• Provides foundations for machine learning
– Evaluating learning algorithms
– Guiding the design of new algorithms
– Learning from models: meta learning

Dilemma
This person dropped their ticket in the hallway. Do you call out "Excuse me, ma'am!" or "Excuse me, sir!"? You have to make a guess.

Bayesian inference is a way to make guesses about what your data mean based on sometimes very little data.

Bayesian Inference
• Bayesian inference is a way to capture
common sense.
• It helps you use what you know to make
better guesses.

Conditional probabilities
P(A | B) is the probability of A, given B.
“If I know B is the case, what is the probability that A is also the case?”
P(A | B) is not the same as P(B | A).

P(cute | puppy) is not the same as P(puppy | cute)


If I know the thing I’m holding is a puppy, what is the probability that it is cute?
If I know the thing I'm holding is cute, what is the probability that it is a puppy?

Joint probabilities
● P(A and B) or P(A,B) or P(A with B) or P(A ∩ B)
● P(A,B) = P(A) * P(B|A)
● P(A,B,C) = P(A) * P(B|A) * P(C|A and B)
● P(B,A) = P(B) * P(A|B)

e.g. What is the probability that a person is both a woman and has short hair?
P(woman with short hair)
= P(woman) * P(short hair | woman)
= 0.5 * 0.5 = 0.25
● P(¬A | B) = 1 - P(A | B)
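A tiny Python sketch checking the chain-rule arithmetic above; the 0.5 values are the illustrative numbers from this slide, not real statistics.

p_woman = 0.5
p_short_hair_given_woman = 0.5

# P(A,B) = P(A) * P(B|A)
p_woman_and_short_hair = p_woman * p_short_hair_given_woman
print(p_woman_and_short_hair)            # 0.25

# P(not A | B) = 1 - P(A | B)
print(1 - p_short_hair_given_woman)      # 0.5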

Marginal Probability
● It is either sunny or it’s rainy. Probability of a sunny day is 0.9. A sunny day
follows a sunny day with a probability of 0.8. A sunny day follows a rainy
day with probability of 0.6. What is the probability that Day 2 is Sunny?
P(D1=Sunny) = 0.9, P(D1=Rainy) = 0.1
P(D2=Sunny | D1=Sunny) = 0.8, P(D2=Rainy | D1=Sunny) = 0.2
P(D2=Sunny | D1=Rainy) = 0.6, P(D2=Rainy | D1=Rainy) = 0.4
P(D2=Sunny) = ?
Answer:

Marginal Probability
● It is either sunny or it’s rainy. Probability of a sunny day is 0.9. A sunny day
follows a sunny day with a probability of 0.8. A sunny day follows a rainy
day with probability of 0.6. What is the probability that Day 2 is Sunny?
P(D1=Sunny) = 0.9, P(D1=Rainy) = 0.1
P(D2=Sunny | D1=Sunny) = 0.8, P(D2=Rainy | D1=Sunny) = 0.2
P(D2=Sunny | D1=Rainy) = 0.6, P(D2=Rainy | D1=Rainy) = 0.4
P(D2=Sunny) = ?
Answer:
● P(D2=Sunny) = P(D2=Sunny and D1=Sunny) + P(D2=Sunny and D1=Rainy)
  = P(D2=Sunny | D1=Sunny) * P(D1=Sunny) + P(D2=Sunny | D1=Rainy) * P(D1=Rainy)
  = 0.8 * 0.9 + 0.6 * 0.1
  = 0.78

Marginal Probability
● It is either sunny or it’s rainy. Probability of a sunny day is 0.9. A sunny day
follows a sunny day with a probability of 0.8. A sunny day follows a rainy
day with probability of 0.6. What is the probability that Day 2 is Rainy?
P(D1=Sunny) = 0.9, P(D1=Rainy) = 0.1
P(D2=Sunny | D1=Sunny) = 0.8, P(D2=Rainy | D1=Sunny) = 0.2
P(D2=Sunny | D1=Rainy) = 0.6, P(D2=Rainy | D1=Rainy) = 0.4
P(D2=Rainy) = ?
Answer:

Marginal Probability
● It is either sunny or it’s rainy. Probability of a sunny day is 0.9. A sunny day
follows a sunny day with a probability of 0.8. A sunny day follows a rainy
day with probability of 0.6. What is the probability that Day 2 is Rainy?
P(D1=Sunny) = 0.9, P(D1=Rainy) = 0.1
P(D2=Sunny | D1=Sunny) = 0.8, P(D2=Rainy | D1=Sunny) = 0.2
P(D2=Sunny | D1=Rainy) = 0.6, P(D2=Rainy | D1=Rainy) = 0.4
P(D2=Rainy) = ?

Answer: 0.22

Marginal Probability
● It is either sunny or it’s rainy. Probability of a sunny day is 0.9. A sunny day
follows a sunny day with a probability of 0.8. A sunny day follows a rainy
day with probability of 0.6. What is the probability that Day 3 is Sunny?

P(D1=Sunny) = 0.9, P(D1=Rainy) = 0.1
P(D2=Sunny | D1=Sunny) = 0.8, P(D2=Rainy | D1=Sunny) = 0.2
P(D2=Sunny | D1=Rainy) = 0.6, P(D2=Rainy | D1=Rainy) = 0.4
P(D3=Sunny) = ?
Answer: solve it by replacing D3 by D2 and D2 by D1 in the previous solution, using P(D2=Sunny) = 0.78 and P(D2=Rainy) = 0.22 (a short code sketch follows below).
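A short Python sketch of the same total-probability computation; it reproduces the Day 2 answers above and carries the same recursion one step further to Day 3 (the 0.756 value is computed here, not stated in the slides).

# Marginal (total) probability for the sunny/rainy example
p_sunny = {1: 0.9}                       # P(D1 = Sunny)
p_s_given_s, p_s_given_r = 0.8, 0.6      # P(Sunny | prev Sunny), P(Sunny | prev Rainy)

for day in (2, 3):
    prev = p_sunny[day - 1]
    # P(Dk=Sunny) = P(S|S) * P(Dk-1=S) + P(S|R) * P(Dk-1=R)
    p_sunny[day] = p_s_given_s * prev + p_s_given_r * (1 - prev)

print(round(p_sunny[2], 3))   # 0.78  (so P(D2 = Rainy) = 0.22)
print(round(p_sunny[3], 3))   # 0.756 (Day 3 sunny, by the same recursion)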

Theorem of Total Probability


• If events A1, ..., An are mutually exclusive with Σ_{i=1}^{n} P(A_i) = 1, then

  P(B) = Σ_{i=1}^{n} P(B | A_i) P(A_i)

Product Rule: the probability P(A,B) of a conjunction of two events A and B:

  P(A,B) = P(A | B) P(B) = P(B | A) P(A)

● The two joint probabilities on occurrence of the same two events are always equal:
● P(B,A) = P(A,B)
● P(B) * P(A|B) = P(A) * P(B|A)
⇒ P(A | B) = P(B|A) * P(A) / P(B)
⇒ P(A | B) = P(A,B) / P(B)

Bayes’ Theorem

  P(A | B) = P(B | A) P(A) / P(B)

Properties of Bayes Rule
• The computation or revision of unknown or old probabilities, called prior probabilities, in the light of additional information made available by experiment or past records, to derive a set of new probabilities known as posterior probabilities.
• Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied by the probability of the observed training data given that hypothesis.
• Probabilistic hypothesis: outputs not only a
classification, but a probability distribution over all
classes

Bayes Rule example

• There is a specific type of cancer which exists in 1% of the population. The probability of a test coming out positive given that one has cancer is 0.9, and the probability of the test coming out positive given that one does not have cancer is 0.2. What is the probability that a person has this cancer given that he just received a positive test?
• Answer:
P(C) = 0.01, P(¬C) = 0.99
P(+ | C) = 0.9, P(- | C) = 0.1
P(+ | ¬C) = 0.2, P(- | ¬C) = 0.8
P(C | +) = ?

P(+) = P(+ | C) P(C) + P(+ | ¬C) P(¬C) = 0.9 * 0.01 + 0.2 * 0.99 = 0.207

P(C | +) = P(+ | C) P(C) / P(+)
         = 0.9 * 0.01 / 0.207
         ≈ 0.0435, i.e. about 4.35%: even after a positive test, the probability of cancer stays small, because the disease is rare and false positives dominate.
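The same computation as a small Python sketch (a check of the arithmetic above, nothing more).

# Bayes' rule for the cancer-test example (illustrative numbers from the slide)
p_c = 0.01                 # P(C): prior probability of cancer
p_pos_given_c = 0.9        # P(+ | C)
p_pos_given_not_c = 0.2    # P(+ | not C): false-positive rate

# Total probability of a positive test
p_pos = p_pos_given_c * p_c + p_pos_given_not_c * (1 - p_c)     # 0.207

# Posterior: P(C | +) = P(+ | C) * P(C) / P(+)
p_c_given_pos = p_pos_given_c * p_c / p_pos
print(round(p_pos, 3), round(p_c_given_pos, 4))   # 0.207, ~0.0435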

Example 2
Does patient have cancer or not?
A patient takes a lab test and the result comes back positive.
The test returns a correct positive result in only 98% of the cases
in which the disease is actually present, and a correct negative
result in only 97% of the cases in which the disease is not
present. Furthermore, .008 of the entire population have this
cancer.
P(cancer) = 0.008, P(¬cancer) = 0.992
P(+ | cancer) = 0.98, P(- | cancer) = 0.02
P(+ | ¬cancer) = 0.03, P(- | ¬cancer) = 0.97

P(cancer | +) = P(+ | cancer) P(cancer) / P(+) ∝ 0.98 * 0.008 = 0.0078
P(¬cancer | +) = P(+ | ¬cancer) P(¬cancer) / P(+) ∝ 0.03 * 0.992 = 0.0298

Since 0.0298 > 0.0078, the more probable hypothesis is ¬cancer; normalizing, P(cancer | +) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21.

Bayesian Classification: Why?


• Probabilistic learning: Calculate explicit probabilities for
hypothesis, among the most practical approaches to
certain types of learning problems
• Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct. Prior knowledge can be combined with
observed data.
• Probabilistic prediction: Predict multiple hypotheses,
weighted by their probabilities
• Standard: Even when Bayesian methods are
computationally intractable, they can provide a standard
of optimal decision making against which other methods
can be measured.

Reasoning Under Uncertainty
Many different types of errors can contribute to
uncertainty.
1. data might be missing or unavailable
2. data might be ambiguous or unreliable due to
measurement errors
3. the representation of data may be imprecise or
inconsistent
4. data may just be user's best guess (random)
5. data may be based on defaults, and defaults
may have exceptions

Approaches in Dealing with Uncertainty

Numerically oriented (quantitative) approaches:
• Bayes' Rule
• Certainty Factors
• Dempster-Shafer theory
• Fuzzy Sets

Symbolic approaches:
• Non-monotonic reasoning
• Cohen's Theory of Endorsements
• Fox's semantic systems

Naïve Bayesian Classifier
● It is based on Baye’s theorem with independence
assumption between predictors.

● A naïve Bayesian model is easy to build, with no


complicated iterative parameter estimation which makes it
particularly useful for very large datasets.

● Despite its simplicity, it often does surprisingly well and is


widely used because it often outperforms more
sophisticated classification methods.

Naïve Bayesian Classifier


● Let D = training set of tuples, each tuple represented by an n-dimensional vector X = (x1, x2, x3, ..., xn)

● Let there be m classes, C1, C2, ..., Cm

● Given a tuple X, the naïve Bayesian classifier predicts that tuple X belongs to class Ci

  iff P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

  i.e. maximize P(Ci|X); since P(X) is the same for every class, this is equivalent to maximizing P(X|Ci) P(Ci)

Bayesian Theorem
• Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem:

  P(h | D) = P(D | h) P(h) / P(D)

• MAP (maximum a posteriori) hypothesis:

  h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h)

• Practical difficulty: requires initial knowledge of many probabilities, and has significant computational cost

Estimating a-posteriori
probabilities
• Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
• P(X) is constant for all classes
• P(C) = relative freq of class C samples
• C such that P(C|X) is maximum =
C such that P(X|C)·P(C) is maximum

Basic Approach
Bayes Rule:

  P(h | D) = P(D | h) P(h) / P(D)

• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h|D) = probability of h given D (posterior density)
• P(D|h) = probability of D given h (likelihood of D given h)

The Goal of Bayesian Learning: the most probable hypothesis given the training data (the Maximum A Posteriori hypothesis h_MAP):

  h_MAP = argmax_{h ∈ H} P(h | D)
        = argmax_{h ∈ H} P(D | h) P(h) / P(D)
        = argmax_{h ∈ H} P(D | h) P(h)

Source: http://artint.info/2e/html/ArtInt2e.Ch10.S1.SS2.html#p1

MAP (Maximum A Posteriori hypothesis) Learner


For each hypothesis h in H, calculate the posterior probability

  P(h | D) = P(D | h) P(h) / P(D)

Output the hypothesis h_MAP with the highest posterior probability (a brute-force sketch follows below):

  h_MAP = argmax_{h ∈ H} P(h | D)

Comments:
• Computationally intensive
• Provides a standard for judging the performance of learning algorithms
• Choosing P(h) and P(D|h) reflects our prior knowledge about the learning task
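A brute-force MAP learner sketch in Python, following the procedure above; the hypothesis space (two coin biases), the priors and the data are made-up assumptions used only to show the argmax step.

# Brute-force MAP learner: score every hypothesis by P(D|h) * P(h), keep the best
hypotheses = {"fair": 0.5, "biased": 0.8}     # candidate heads-probabilities
prior = {"fair": 0.7, "biased": 0.3}          # P(h)
data = [1, 1, 0, 1, 1, 1]                     # observed coin flips (1 = heads)

def likelihood(d, theta):
    # P(D | h) for i.i.d. Bernoulli flips with heads-probability theta
    p = 1.0
    for x in d:
        p *= theta if x == 1 else (1 - theta)
    return p

# h_MAP = argmax_h P(D|h) P(h); P(D) is the same for all h, so it can be dropped
scores = {h: likelihood(data, theta) * prior[h] for h, theta in hypotheses.items()}
h_map = max(scores, key=scores.get)
print(scores)      # unnormalized posteriors
print(h_map)       # 'biased' wins for this data despite its smaller prior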

Bayesian classification
• The classification problem may be formalized
using a-posteriori probabilities:
• P(C|X) = prob. that the sample tuple
X=<x1,…,xk> is of class C.

• E.g. P(class=N | outlook=sunny, windy=true,…)

• Idea: assign to sample X the class label C such that P(C|X) is maximal

Naïve Bayesian Classification
• Naïve assumption: attribute independence
P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)
• If i-th attribute is categorical:
P(xi|C) is estimated as the relative freq of
samples having value xi as i-th attribute in class
C
• If i-th attribute is continuous:
P(xi|C) is estimated through a Gaussian density
function
• Computationally easy in both cases
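As a hedged illustration of the continuous case, a Gaussian density used as the estimate of P(xi|C); the mean and standard deviation below are invented numbers standing in for statistics that would normally be computed from the training samples of class C.

import math

def gaussian(x, mu, sigma):
    # Gaussian density N(x; mu, sigma^2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# e.g. attribute "temperature" in class Yes with sample mean 21.0 and std 3.0
print(gaussian(23.5, mu=21.0, sigma=3.0))   # used as P(temperature = 23.5 | Yes)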

Likelihood

Day Outlook Temperature Humidity Wind Play ball

D1 Sunny Hot High Weak No


D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Naive Bayesian Classifier (II)


• Given a training set, we can compute the
probabilities
Outlook | Y | N
sunny | 2/9 | 3/5
overcast | 4/9 | 0
rain | 3/9 | 2/5

Humidity | Y | N
high | 3/9 | 4/5
normal | 6/9 | 1/5

Temperature | Y | N
hot | 2/9 | 2/5
mild | 4/9 | 2/5
cool | 3/9 | 1/5

Windy | Y | N
weak | 6/9 | 2/5
strong | 3/9 | 3/5

Weather example: classifying X
• An unseen sample X = <rain, hot, high, weak>

• P(X | yes) · P(yes) =
  P(rain | yes) · P(hot | yes) · P(high | yes) · P(weak | yes) · P(yes) =
  3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
• P(X | no) · P(no) =
  P(rain | no) · P(hot | no) · P(high | no) · P(weak | no) · P(no) =
  2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

• Sample X is classified in class no (don't play)
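The same classification done as a short Python sketch, using the relative-frequency tables above (a check of the computation on this slide, not an independent implementation).

# Naive Bayes for the play-tennis example: P(C) * product of P(x_i | C)
p_yes, p_no = 9/14, 5/14

cond = {
    "yes": {"rain": 3/9, "hot": 2/9, "high": 3/9, "weak": 6/9},
    "no":  {"rain": 2/5, "hot": 2/5, "high": 4/5, "weak": 2/5},
}

x = ["rain", "hot", "high", "weak"]          # unseen sample X

score_yes, score_no = p_yes, p_no
for attr in x:
    score_yes *= cond["yes"][attr]
    score_no *= cond["no"][attr]

print(round(score_yes, 6), round(score_no, 6))          # ~0.010582 vs ~0.018286
print("play" if score_yes > score_no else "don't play") # don't play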
