
Neural Networks: A Statistical Pattern Recognition Perspective
Statistical Framework
 The natural framework for studying the
design and capabilities of pattern
classification machines is statistical
 Nature of information available for decision
making is probabilistic

Feedforward Neural Networks
 Have a natural propensity for performing
classification tasks
 Solve the problem of recognition of patterns in
the input space or pattern space
 Pattern recognition:
 Concerned with the problem of decision making based
on complex patterns of information that are
probabilistic in nature.
 Network outputs can be given a proper
interpretation in terms of conventional statistical
pattern recognition concepts.
Pattern Classification
 Linearly separable pattern sets:
 only the simplest ones
 Iris data: classes overlap
 Important issue:
 Find an optimal placement of the discriminant
function so as to minimize the number of
misclassifications on the given data set, and
simultaneously minimize the probability of
misclassification on unseen patterns.
Notion of Prior
 The prior probability P(Ck) of a pattern
belonging to class Ck is the fraction of
patterns in that class, in the limit of an
infinite number of patterns in the training
set.
 Priors influence our decision to assign an
unseen pattern to a class.

Assignment without Information
 In the absence of all other information:
 Experiment:
 In a large sample of outcomes of a coin toss
experiment the ratio of Heads to Tails is 60:40
 Is the coin biased?
 Classify the next (unseen) outcome and
minimize the probability of mis-classification
 (Natural and safe) Answer: Choose Heads!

Introduce Observations
 Can do much better with an observation…
 Suppose we are allowed to make a single
measurement of a feature x of each pattern
of the data set.
 x takes values from a discrete set
{x1, x2, …, xd}

Conditional Probability
 The probability of an event given that another
event has occurred is called a conditional
probability.
 The conditional probability of A given B is
denoted by P(A|B).
 A conditional probability is computed as follows:
P(A|B) = P(A ∩ B) / P(B)
Conditional Probability
 The conditional probability is the joint probability
P(A ∩ B) divided by the marginal probability P(B):
P(A|B) = P(A ∩ B) / P(B)
Conditional Probability -- Example
 The next table shows the number of male and
female members of the standing faculty in
the departments of Math and English.
          Math   English   Total
Female       1        17      18
Male        37        20      57
Total       38        37      75
Conditional Probability -- Example
 Joint probability of gender and area:
          Math   English   Total
Female    .013      .227    .240
Male      .493      .267    .760
Total     .506      .494    1.00
 If gender is the class of a professor and area
is the attribute of a professor, then
P(female, math) = 0.013
Conditional Probability -- Example
          Math   English   Total
Female    .013      .227    .240
Male      .493      .267    .760
Total     .506      .494    1.00
 With gender as the class and area as the attribute,
the joint probabilities are symmetric:
P(female, math) = 0.013 = P(math, female)
P(female, English) = 0.227 = P(English, female)
P(male, math) = 0.493 = P(math, male)
P(male, English) = 0.267 = P(English, male)
Conditional Probability -- Example
          Math   English   Total
Female    .013      .227    .240
Male      .493      .267    .760
Total     .506      .494    1.00
 The joint probability P(Ck, xl) is the fraction of the
total patterns that have value xl while belonging to
class Ck
 Here, C1 = female, C2 = male and
x1 = math, x2 = English
Joint and Conditional Probability
 Conditional probability P(xl|Ck) is the
fraction of patterns that have value xl (math,
English) given only patterns from class Ck
(male, female)
 P(math|male)
= P(xl = x1 = math | Ck = C2 = male)
= P(math, male) / P(male)
= 0.493 / 0.760 = 0.649
 That is, P(xl|Ck) = P(xl, Ck) / P(Ck)
Joint and Conditional Probability
 From previous result,
P(Ck, xl) = P(xl, Ck)
= P(xl|Ck)*P(Ck)
 So

Joint Probability = Conditional Probability × Class Prior
 In terms of pattern counts:
P(Ck, xl) = (number of patterns with value xl in class Ck) / (total number of patterns)
P(xl|Ck)  = (number of patterns with value xl in class Ck) / (number of patterns in class Ck)
P(Ck)     = (number of patterns in class Ck) / (total number of patterns)
so that P(Ck, xl) = P(xl|Ck) P(Ck)
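The following short NumPy sketch (an added illustration, not part of the original slides) recovers the joint, prior, and conditional probabilities of the faculty example directly from the raw counts, reproducing P(math|male) ≈ 0.649.

```python
import numpy as np

# Counts from the faculty table (rows: female, male; columns: math, english)
counts = np.array([[1.0, 17.0],
                   [37.0, 20.0]])
total = counts.sum()                 # 75 faculty members

joint = counts / total               # P(class, attribute), e.g. P(female, math) ~ 0.013
prior = joint.sum(axis=1)            # P(class): P(female) = 0.24, P(male) = 0.76
cond = joint / prior[:, None]        # P(attribute | class) = P(attribute, class) / P(class)

print(round(joint[1, 0], 3))         # P(male, math)  ~ 0.493
print(round(cond[1, 0], 3))          # P(math | male) ~ 0.649
```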
Posterior Probability: Bayes' Theorem
 P(Ck|xl) is the posterior probability: the
probability that a pattern with feature value xl
belongs to class Ck

 Bayes' Theorem:
P(Ck|xl) = P(xl|Ck) P(Ck) / P(xl)
Posterior Probability: Bayes' Theorem
 Bayes' Theorem:
P(Ck|xl) = P(xl|Ck) P(Ck) / P(xl)

 Posterior = (likelihood × prior) / evidence
Bayes’ Theorem and
Classification
 Bayes’ Theorem provides the key to
classifier design:
 Assign pattern with x=xl to class Ck for which
the posterior is the highest!
 Note therefore that all posteriors must sum
to one:
Σk P(Ck|xl) = 1
 And the evidence is
P(xl) = Σk P(xl|Ck) P(Ck)
Bayes’ Theorem and
Classification-- Example
The objects can be classified as either GREEN or RED.
Our task is to classify new cases as they arrive,
i.e., decide to which class label they belong,
based on the currently existing objects.
Bayes’ Theorem and
Classification-- Example
The objects can be classified as either GREEN or RED.
Our task is to classify new cases as they arrive,
i.e., decide to which class label they belong,
based on the currently existing objects.
20 of the 60 objects are RED and 40 of them are GREEN.
Prior Probability for GREEN: 40 / 60 = 2/3
Prior Probability for RED:   20 / 60 = 1/3

Bayes' Theorem and Classification -- Example
 We are now ready to classify a new object X
(the WHITE circle in the diagram)
 To measure the likelihood of X under each class,
we draw a circle around X which encompasses a
number (to be chosen a priori) of points
irrespective of their class labels.

Bayes' Theorem and Classification -- Example
 Likelihood of X given GREEN = 1/40
Likelihood of X given RED = 3/20
 It is clear that the likelihood of X given GREEN is
smaller than the likelihood of X given RED, since
the circle encompasses 1 GREEN object
and 3 RED ones.

Bayes' Theorem and Classification -- Example
 This ordering is different from that of the prior probabilities:
Prior Probability for GREEN: 40 / 60 = 2/3
Prior Probability for RED:   20 / 60 = 1/3
 Thus we need to compute the posterior
probabilities.
Bayes' Theorem and Classification -- Example
 Finally, we classify X as RED, since that class
achieves the largest posterior probability.
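To make the arithmetic of this example concrete, here is a small Python sketch (an added illustration, not from the slides) that combines the stated priors with the circle-based likelihood estimates and picks the class with the largest posterior.

```python
# GREEN/RED example: 40 GREEN and 20 RED objects in total; the circle drawn
# around the new point X contains 1 GREEN object and 3 RED objects.
priors = {"GREEN": 40 / 60, "RED": 20 / 60}
likelihoods = {"GREEN": 1 / 40, "RED": 3 / 20}   # fraction of each class inside the circle

# Unnormalized posteriors: likelihood x prior
scores = {c: likelihoods[c] * priors[c] for c in priors}

# Normalize by the evidence so the posteriors sum to one
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

print(posteriors)                              # {'GREEN': 0.25, 'RED': 0.75}
print(max(posteriors, key=posteriors.get))     # RED
```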
Bayes' Theorem for Continuous Variables
 Probabilities for discrete intervals of a
feature measurement are replaced by
probability density functions p(x):
P(Ck|x) = p(x|Ck) P(Ck) / p(x)
Gaussian Distributions
 Two-class, one-dimensional Gaussian
probability density function:
p(x|Ck) = (1 / sqrt(2π σk²)) exp( −(x − μk)² / (2σk²) )
where μk is the distribution mean, σk² is the variance,
and 1/sqrt(2π σk²) is the normalizing factor.
Example of Gaussian
Distribution
 Two classes are assumed to be distributed
about means 1.5 and 3 respectively, with
equal variances 0.25.

Example of Gaussian Distribution
[Figure: plot of the two class-conditional Gaussian densities]
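As an added illustration (not part of the original slides), the sketch below evaluates the two class-conditional densities of this example and locates the point where the unnormalized densities p(x|Ck)P(Ck) cross, which is where the Bayesian decision boundary falls; equal priors are assumed.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian probability density function."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Two classes with means 1.5 and 3.0, equal variances 0.25; equal priors assumed
mean1, mean2, var = 1.5, 3.0, 0.25
prior1, prior2 = 0.5, 0.5

x = np.linspace(0.0, 4.5, 901)
g1 = gaussian_pdf(x, mean1, var) * prior1   # unnormalized posterior for class 1
g2 = gaussian_pdf(x, mean2, var) * prior2   # unnormalized posterior for class 2

# Decision boundary: the crossover point of the two unnormalized densities
boundary = x[np.argmin(np.abs(g1 - g2))]
print(boundary)   # ~2.25, midway between the means for equal priors and variances
```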
Extension to n-dimensions
 The probability density function expression
extends to the following:
p(X|Ck) = (1 / ((2π)^(n/2) |K|^(1/2))) exp( −(1/2) (X − μ)ᵀ K⁻¹ (X − μ) )
 Mean: μ
 Covariance matrix: K
Covariance Matrix and Mean
 Covariance matrix
 describes the shape and orientation of the
distribution in space
 Mean
 describes the translation of the scatter from the
origin

Covariance Matrix and Data Scatters
[Figures: example data scatters illustrating how the covariance matrix determines the shape and orientation of the distribution]
Probability Contours
 Contours of the probability density function
are loci of constant Mahalanobis distance:
Δ² = (X − μ)ᵀ K⁻¹ (X − μ)
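A short sketch (added for illustration, with an assumed example mean and covariance) of how the Mahalanobis distance above can be computed for a batch of points.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T K^-1 (x - mean) for each row x of X."""
    diff = X - mean
    cov_inv = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Hypothetical 2-D class statistics
mean = np.array([1.5, 3.0])
cov = np.array([[0.25, 0.10],
                [0.10, 0.50]])

points = np.array([[1.5, 3.0],    # at the mean: distance 0
                   [2.0, 3.0],
                   [1.5, 4.0]])
print(mahalanobis_sq(points, mean, cov))
# Points with equal values lie on the same elliptical probability contour.
```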
Classification Decisions with Bayes' Theorem
 Key: Assign X to class Ck such that
P(Ck|X) > P(Cj|X) for all j ≠ k
or, equivalently,
p(X|Ck) P(Ck) > p(X|Cj) P(Cj) for all j ≠ k
Note: the evidence p(X) is common to all classes and
can be dropped from the comparison.
Placement of a Decision
Boundary
 Decision boundary separates the classes in
question
 Where do we place decision region
boundaries such that the probability of
misclassification is minimized?

Quantifying the Classification
Error
 Example: 1-dimension, 2 classes identified by
regions R1, R2
 Perror = P(x ∈ R1, C2) + P(x ∈ R2, C1)
Optimal Placement of a Decision Boundary
Quantifying the Classification
Error
 Place decision boundary such that
 point x lies in R1 (decide C1) if p(x|C1)P(C1) >
p(x|C2)P(C2)
 point x lies in R2 (decide C2) if p(x|C2)P(C2) >
p(x|C1)P(C1)

Optimal Placement of a Decision Boundary
 Bayesian decision boundary: the point where the
unnormalized probability density functions
p(x|C1)P(C1) and p(x|C2)P(C2) cross over.
Probabilistic Interpretation of a
Neuron Discriminant Function
 An artificial neuron
implements the discriminant
function:
 Each of C neurons
implements its own
discriminant function for a
C-class problem
 An arbitrary input vector X
is assigned to class Ck if
neuron k has the largest
activation
Probabilistic Interpretation of a
Neuron Discriminant Function
 An optimal Bayes’ classification chooses the
class with maximum posterior probability P(Cj|X)
 Discriminant function yj = p(X|Cj) P(Cj)
 yj notation re-used for emphasis
 Two-class case:
 point x lies in R1 (decide C1) if p(x|C1)P(C1) >
p(x|C2)P(C2)
 Only relative magnitudes matter: any monotonic
function of the probabilities may be used to
generate an equivalent discriminant function
Probabilistic Interpretation of a Neuron Discriminant Function
 Assume an n-dimensional Gaussian density function
 Taking the logarithm of yj = p(X|Cj) P(Cj) yields
yj(X) = −(1/2)(X − μj)ᵀ Kj⁻¹ (X − μj) − (1/2) ln|Kj| − (n/2) ln 2π + ln P(Cj)
 Ignoring the constant term and assuming that all
covariance matrices are the same:
yj(X) = −(1/2)(X − μj)ᵀ K⁻¹ (X − μj) + ln P(Cj)
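A minimal sketch (an added illustration, not the slides' own code) of this shared-covariance Gaussian discriminant: each class receives the score −(1/2)(X − μ)ᵀK⁻¹(X − μ) + ln P(C), and the input is assigned to the class with the largest score.

```python
import numpy as np

def discriminant(x, mean, cov_inv, log_prior):
    """Gaussian discriminant with shared covariance: -(1/2)(x-mu)^T K^-1 (x-mu) + ln P(C)."""
    diff = x - mean
    return -0.5 * diff @ cov_inv @ diff + log_prior

# Hypothetical two-class, 2-D setup with a shared covariance matrix K
K = np.array([[0.25, 0.05],
              [0.05, 0.25]])
K_inv = np.linalg.inv(K)
means = [np.array([1.5, 1.5]), np.array([3.0, 3.0])]
priors = [0.5, 0.5]

x = np.array([2.0, 2.2])
scores = [discriminant(x, m, K_inv, np.log(p)) for m, p in zip(means, priors)]
print(1 + int(np.argmax(scores)))   # class label (1 or 2) with the largest discriminant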
Plotting a Bayesian Decision
Boundary: 2-Class Example
 Assume classes C1, C2, and discriminant functions
of the form,

 Combine the discriminants: y(X) = y2(X) − y1(X)
 New rule:
 Assign X to C2 if y(X) > 0; to C1 otherwise

Plotting a Bayesian Decision
Boundary: 2-Class Example
 This boundary is elliptic

 If K1 = K2 = K then the boundary becomes


linear…

Bayesian Decision Boundary
[Figures: plots of the Bayesian decision boundary for the two-class example]
Cholesky Decomposition of Covariance Matrix K
 Returns a matrix Q such that QᵀQ = K, where
Q is upper triangular
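As an aside (added here, not in the slides), NumPy's np.linalg.cholesky returns a lower-triangular factor L with L Lᵀ = K, so the upper-triangular Q of this slide is simply Lᵀ. Such a factor is handy, for example, for whitening data or sampling from the Gaussian.

```python
import numpy as np

K = np.array([[0.25, 0.05],
              [0.05, 0.50]])     # a symmetric, positive-definite covariance matrix

L = np.linalg.cholesky(K)        # lower triangular, with L @ L.T == K
Q = L.T                          # upper triangular, with Q.T @ Q == K

print(np.allclose(Q.T @ Q, K))   # True
```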
Interpreting Neuron Signals as Probabilities: Gaussian Data
 Gaussian Distributed Data
 2-Class data, K2 = K1 = K
 From Bayes' Theorem, the posterior probability is
P(C1|X) = p(X|C1) P(C1) / ( p(X|C1) P(C1) + p(X|C2) P(C2) )
Interpreting Neuron Signals as Probabilities: Gaussian Data
 Consider Class 1: dividing the numerator and the
denominator by p(X|C1) P(C1) gives
P(C1|X) = 1 / (1 + exp(−a))
Sigmoidal neuron?
Interpreting Neuron Signals as Probabilities: Gaussian Data
 We substituted
a = ln[ p(X|C1) P(C1) / ( p(X|C2) P(C2) ) ]
 or, expanding the Gaussian densities, a reduces to a
linear function of X:
Neuron activation!
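A quick numerical check (an added illustration, not from the slides): for two one-dimensional Gaussian classes, the logistic sigmoid of a = ln[p(x|C1)P(C1) / (p(x|C2)P(C2))] reproduces the Bayes posterior P(C1|x) exactly.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Two 1-D Gaussian classes with shared variance (values assumed for illustration)
m1, m2, var = 1.5, 3.0, 0.25
P1, P2 = 0.4, 0.6

x = 2.1
num1 = gaussian_pdf(x, m1, var) * P1
num2 = gaussian_pdf(x, m2, var) * P2

posterior = num1 / (num1 + num2)            # Bayes posterior P(C1 | x)
a = np.log(num1 / num2)                     # neuron activation
print(np.isclose(sigmoid(a), posterior))    # True: the sigmoid output is the posterior
```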
Interpreting Neuron Signals as Probabilities
 Bernoulli Distributed Data
 Random variable xi takes values 0, 1
 Bernoulli distribution:
P(xi|Ck) = pki^xi (1 − pki)^(1−xi), where pki = P(xi = 1|Ck)
 Extending this result to an n-dimensional
vector of independent input variables:
P(X|Ck) = Πi pki^xi (1 − pki)^(1−xi)
Interpreting Neuron Signals as Probabilities: Bernoulli Data
 Bayesian discriminant:
yk(X) = ln P(X|Ck) + ln P(Ck)
      = Σi xi ln( pki / (1 − pki) ) + Σi ln(1 − pki) + ln P(Ck)
a weighted sum of the inputs plus a bias:
Neuron activation
Interpreting Neuron Signals as Probabilities: Bernoulli Data
 Consider the posterior probability for class C1:
P(C1|X) = 1 / (1 + exp(−a))
where
a = ln[ P(X|C1) P(C1) / ( P(X|C2) P(C2) ) ]
Multilayered Networks
 The computational power of neural
networks stems from their multilayered
architecture
 What kind of interpretation can the outputs of
such networks be given?
 Can we use some other (more appropriate) error
function to train such networks?
 If so, with what consequences for network
behaviour?
Likelihood
 Assume a training data set T = {Xk, Dk}
drawn from a joint p.d.f. p(X, D) defined on
R^n × R^p
 Joint probability or likelihood of T:
L = Πk p(Xk, Dk) = Πk p(Dk|Xk) p(Xk)
Sum of Squares Error Function
 Motivated by the concept of maximum likelihood
 Context: neural network solving a classification or
regression problem
 Objective: maximize the likelihood function
 Alternatively: minimize the negative log-likelihood
−ln L = −Σk ln p(Dk|Xk) − Σk ln p(Xk)
where the second sum does not depend on the network
parameters and can be dropped as a constant.
Sum of Squares Error Function
 The error function is the negative sum of the
log-probabilities of the desired outputs
conditioned on the inputs:
E = −Σk ln p(Dk|Xk)

 A feedforward neural
network provides a
framework for modelling
p(D|X)
Normally Distributed Data
 Decompose the p.d.f. into a product of
individual density functions:
p(Dk|Xk) = Πj p(dj|Xk)
 Assume the target data is Gaussian distributed:
dj = gj(X) + εj
 εj is a Gaussian distributed noise term
 gj(X) is an underlying deterministic function
From Likelihood to Sum Square Errors
 The noise term has zero mean and standard deviation σ
 The neural network is expected to provide a model
f(X, W) of g(X)
 Since f(X, W) is deterministic, p(dj|X) = p(εj)
From Likelihood to Sum Square Errors
 Neglecting the constant terms yields the
sum of squares error function
E = (1/2) Σk Σj ( fj(Xk, W) − djk )²
Interpreting Network Signal Vectors
 Re-write the sum of squares error function with a
factor of 1/Q in front, where Q is the number of training patterns
 The 1/Q provides averaging and permits replacement of
the summations by integrals
Interpreting Network Signal Vectors
 Algebra yields the result that the
error is minimized when fj(X, W) = E[dj|X] for each j.
 The error minimization procedure tends to drive the
network map fj(X, W) towards the conditional average
E[dj|X] of the desired outputs
 At the error minimum, the network map approximates the
regression of d conditioned on X!
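The sketch below (an added illustration with assumed data, not the slides' own experiment) hints at this result: for noisy targets observed at a fixed input, the value that minimizes the sum of squares is the sample mean of the targets, i.e. an estimate of the conditional average E[d|X].

```python
import numpy as np

rng = np.random.default_rng(0)

# Many noisy targets observed at the same input X: d = g(X) + noise
g_of_x = 0.7                                   # assumed underlying deterministic value g(X)
d = g_of_x + 0.1 * rng.standard_normal(10_000)

# The constant prediction f that minimizes sum_k (f - d_k)^2
f_candidates = np.linspace(0.0, 1.5, 301)
sse = ((f_candidates[:, None] - d[None, :]) ** 2).sum(axis=1)
f_best = f_candidates[np.argmin(sse)]

print(f_best, d.mean())                        # both close to 0.7 = E[d | X]
```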
Numerical Example
 A noisy distribution of 200 points
scattered about an underlying function
 Used to train a neural network with
7 hidden nodes
 The response of the network is plotted
with a continuous line
Residual Error
 The error expression just presented neglected a second
integral term: the averaged variance of the target data
about its conditional mean
 Even if the training environment manages to reduce the
error from the first integral term to zero, a residual error
still remains due to this second integral term
Notes…
 The network cannot reduce the error below
the average variance of the target data!
 The results discussed rest on three
assumptions:
 The data set is sufficiently large
 The network architecture is sufficiently general
to drive the error to zero.
 The error minimization procedure selected does
find the appropriate error minimum.
An Important Point
 Sum of squares error function was derived from maximum
likelihood and Gaussian distributed target data
 Using a sum of squares error function to train a neural
network does not require the target data to be Gaussian distributed.
 A neural network trained with a sum of squares error
function generates outputs that provide estimates of the
average of the target data and the average variance of target
data
 Therefore, the specific selection of a sum of squares error
function does not allow us to distinguish between Gaussian
and non-Gaussian distributed target data which share the
same average desired outputs and average desired output
variances…

Classification Problems
 For a C-class classification problem, there will be
C outputs
 Only 1 of the C outputs will be one (1-of-C encoding)
 Input pattern Xk is classified into class J if
output J is the largest of the C network outputs
 A more sophisticated approach seeks to represent
the outputs of the network as posterior
probabilities of class membership.
Advantages of a Probabilistic
Interpretation
 We make classification decisions that lead to the
smallest error rates.
 By computing a prior from the network outputs averaged
over the patterns, and comparing that value with a prior
calculated from class frequency fractions on the training
set, one can measure how closely the network is able to
model the posterior probabilities.
 The network outputs estimate posterior probabilities
from training data in which class priors are naturally
estimated from the training set. The class priors met in
operation may differ from those computed from the
training set; a compensation for this difference can be
made easily.
NN Classifiers and Square Error
Functions
 Recall: feedforward neural network trained on a squared
error function generates signals that approximate the
conditional average of the desired target vectors
 If the error approaches zero,
fj(X, W) → E[dj|X]
 Define the Kronecker delta function
δij = 1 if i = j, and 0 if i ≠ j
NN Classifiers and Square Error Functions
 In the classification scheme, the desired output djk for
pattern Xk belonging to class CJ is given by
djk = δJj
 That is, the desired output is one for node J and 0 for
the other nodes.
 The probability that X takes on desired output dj is p(dj|X).
NN Classifiers and Square Error Functions
 Note:
δ(x) = 1 if x = 0, and 0 if x ≠ 0
 For the two-class problem:
p(d1|X) = δ(d1 − δ11) p(C1|X) + δ(d1 − δ12) p(C2|X)
NN Classifiers and Square Error Functions
p(d1 = 1|X) = δ(1 − δ11) p(C1|X) + δ(1 − δ12) p(C2|X)
            = p(C1|X)
p(d1 = 0|X) = δ(0 − δ11) p(C1|X) + δ(0 − δ12) p(C2|X)
            = p(C2|X)
Network Output = Class Posterior
 The jth output sj is
sj = E[dj|X] = 1 · p(Cj|X) + 0 · (1 − p(Cj|X)) = p(Cj|X)
i.e., the class posterior.
Relaxing the Gaussian Constraint
 Design a new error function
 Without the Gaussian noise assumption on the
desired outputs
 Retain the ability to interpret the network
outputs as posterior probabilities
 Subject to constraints:
 signal confinement to (0,1) and
 sum of outputs to 1

Neural Network With A Single Output
 Output s represents the Class 1 posterior
 Then 1 − s represents the Class 2 posterior
 The probability that we observe a target value dk
on pattern Xk is
p(dk|Xk) = s^dk (1 − s)^(1−dk)
 Problem: maximize the likelihood of observing
the training data set
Cross Entropy Error Function
 Maximizing the probability of observing desired
value dk for input Xk on each pattern in T
 Likelihood:
L = Πk sk^dk (1 − sk)^(1−dk)
 It is convenient to minimize the negative
log-likelihood, which we denote as the error:
E = −Σk [ dk ln sk + (1 − dk) ln(1 − sk) ]
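A minimal sketch (added for illustration) of this cross entropy error for a single-output network, computed over a batch of patterns:

```python
import numpy as np

def binary_cross_entropy(s, d, eps=1e-12):
    """E = -sum_k [ d_k ln s_k + (1 - d_k) ln(1 - s_k) ] for outputs s and 0/1 targets d."""
    s = np.clip(s, eps, 1.0 - eps)      # guard against log(0)
    return -np.sum(d * np.log(s) + (1.0 - d) * np.log(1.0 - s))

# Hypothetical network outputs and desired targets for four patterns
s = np.array([0.9, 0.2, 0.7, 0.1])
d = np.array([1.0, 0.0, 1.0, 0.0])
print(binary_cross_entropy(s, d))       # small when s matches d, large otherwise
```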
Architecture of Feedforward Network Classifier
[Figure: the feedforward network classifier architecture]
Network Training
 Using the chain rule (Chapter 6) with the cross
entropy error function

 The input-hidden weight derivatives can be found
similarly
C-Class Problem
 Assume a 1-of-C encoding scheme
 The network has C outputs, one per class,
and one desired output per class for each pattern
 Likelihood function:
L = Πk Πj (sjk)^(djk)
Modified Error Function
 Cross entropy error function for the C-class case:
E = −Σk Σj djk ln sjk
 Its minimum value is
Emin = −Σk Σj djk ln djk
 Subtracting the minimum value ensures that the
minimum of the error is always zero:
E = −Σk Σj djk ln( sjk / djk )
Softmax Signal Function
 Ensures that
 the outputs of the network are confined to the
interval (0,1) and
 simultaneously all outputs add to 1:
sj = exp(aj) / Σi exp(ai)
 It is a close relative of the sigmoid
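A small sketch (an added illustration) of the softmax signal function, with the usual subtraction of the maximum activation for numerical stability:

```python
import numpy as np

def softmax(a):
    """Softmax signal function: outputs lie in (0, 1) and sum to 1."""
    a = a - np.max(a)          # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

activations = np.array([2.0, 1.0, -0.5])
s = softmax(activations)
print(s, s.sum())              # e.g. [0.69 0.25 0.06], summing to 1.0
```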
Error Derivatives
 For the hidden-output weights, differentiate the cross
entropy error with respect to the weights using the chain rule
 The remaining part of the error
backpropagation algorithm remains intact
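For a softmax output layer trained with the cross entropy error, the standard output-layer delta works out to sj − dj; the sketch below (added here, with assumed variable names) checks that analytic form against a numerical gradient.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def cross_entropy(a, d):
    """E = -sum_j d_j ln s_j with s = softmax(a)."""
    return -np.sum(d * np.log(softmax(a)))

a = np.array([0.5, -1.2, 2.0])          # hypothetical output-layer activations
d = np.array([0.0, 0.0, 1.0])           # 1-of-C desired output

analytic = softmax(a) - d               # standard delta for softmax + cross entropy

# Numerical gradient of E with respect to the activations
eps = 1e-6
numeric = np.zeros_like(a)
for j in range(a.size):
    step = np.zeros_like(a)
    step[j] = eps
    numeric[j] = (cross_entropy(a + step, d) - cross_entropy(a - step, d)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```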
