All in One
Department of CSE
Daffodil International University
Topic Contents
3
Case Studies: Playing
Grandmaster Chess…
Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation match sponsored by ACM
• Kasparov lost the match: 1 win and 3 draws against Deep Blue's 2 wins
• This was a historic achievement for computer chess
being the first time a computer became
the best chess player on the planet.
• Note that Deep Blue plays by “brute force” (i.e. raw
power from computer speed and memory). It uses
relatively little that is similar to human intuition and
cleverness.
4
Game Playing and AI
Why would game playing be a good problem for AI
research?
– game playing is non-trivial
• players need “human-like” intelligence
• games can be very complex (e.g. chess, go)
• requires decision making within limited time
– game playing can be usefully viewed as a search problem in
a space defined by a fixed set of rules
• Nodes are either white or black, reflecting which adversary's turn it is
to move.
• The tree of possible moves can be searched for favourable positions.
5
Game Playing and AI…
Why would game playing be a good problem for AI
research?
– games often are:
• well-defined and repeatable
• easy to represent
• fully observable and limited environments
– can directly compare humans and computers
6
Game Playing as Search
• Consider two-player, turn-taking, board
games
– e.g., tic-tac-toe, checkers, chess
7
Game Playing as Search:
Game Tree
What's the new aspect
to the search problem?
There's an opponent that we cannot control!
[Figure: partial tic-tac-toe game tree — X's possible opening moves, then O's replies, alternating level by level]
8
Game Playing as Search:
Complexity
• Assume the opponent’s moves can be
predicted given the computer's moves.
10
11
Greedy Search Game Playing
• Expand each branch to the terminal states
• Evaluate the utility of each terminal state
• Choose the move that results in the board
configuration with the maximum value
[Figure: two-ply game tree — from the root A the computer's possible moves lead to B, C, D, E with values -5, 9, 2, 3; beneath them the opponent's possible moves lead to F–O with values -7, -5, 3, 9, -6, 0, 2, 1, 3, 2]
12
Greedy Search Game Playing
Assuming a reasonable search space,
what's the problem with greedy search?
It ignores what the opponent might do!
e.g. Computer chooses C. Opponent
chooses J and defeats computer.
[Figure: the same game tree — greedy search picks C (value 9), but the opponent then replies with J and defeats the computer]
13
Minimax: Idea
14
Minimax: Idea
• The computer assumes after it moves the
opponent will choose the minimizing move.
• It chooses its best move considering
both its move and opponent’s best move.
[Figure: game tree — the opponent's minimizing replies give B, C, D, E the backed-up values -7, -6, 0, 1, so the computer's best move is E, with minimax value 1]
15
Minimax: Passing Values up
Game Tree
• Explore the game tree to the terminal states
• Evaluate the utility of the terminal states
• Computer chooses the move to put the board
in the best configuration for it assuming
the opponent makes best moves on her
turns:
– start at the leaves
– assign value to the parent node as follows
• use the minimum of the children's values at the opponent's moves
• use the maximum of the children's values at the computer's moves
16
Deeper Game Trees
• Minimax can be generalized for > 2 moves
• Values backed up in minimax way
[Figure: four-ply game tree — values are backed up level by level, taking the maximum at the computer's (max) levels and the minimum at the opponent's (min) levels; the root A receives the value 3]
17
Minimax: Direct Algorithm
Note:
• minimax values gradually propagate upwards as DFS
proceeds: i.e., minimax values propagate up
in “left-to-right” fashion
• minimax values for sub-tree backed up “as we go”,
so only O(bd) nodes need to be kept in memory at any
time
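A minimal Python sketch of the direct minimax procedure described above (illustrative only; the terminal(), utility() and successors() helpers are assumed to be supplied by the game, they are not part of the slides):

# Minimal recursive (depth-first) minimax.
def minimax(state, maximizing):
    if terminal(state):
        return utility(state)                     # exact value at a terminal state
    values = [minimax(s, not maximizing) for s in successors(state)]
    return max(values) if maximizing else min(values)

# The computer picks the successor whose backed-up minimax value is largest, e.g.:
# best = max(successors(root), key=lambda s: minimax(s, maximizing=False))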
18
Minimax: Algorithm Complexity
Assume all terminal states are at depth d and the branching factor is b.
Space complexity?
depth-first search, so O(bd)
Time complexity?
O(b^d), since the entire tree down to depth d must be generated
Time complexity is a major problem!
The computer typically has only a
finite amount of time to make a move.
19
Minimax: Algorithm Complexity
• The direct minimax algorithm is impractical: it cannot search to the terminal states in the time available
– instead, do depth-limited search to ply (depth) m and apply a static board evaluator at the depth limit
21
Minimax:
Static Board Evaluator (SBE)
• How to design good static board evaluator functions?
First, the evaluation function should order the
terminal states in the same way as the true utility
function; otherwise, an agent using it might select
suboptimal moves even if it can see ahead all the
way to the end of the game.
Second, the computation must not take too long!
Third, for nonterminal states, the evaluation
function should be strongly correlated with the actual
chances of winning.
22
Minimax:
Static Board Evaluator (SBE)
• Evaluation function is a heuristic function, and it is where the
domain experts’ knowledge resides.
• Example of an evaluation function for Tic-Tac-Toe:
f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you],
where a 3-length is a complete row, column, or diagonal.
• Alan Turing’s function for chess
– f(n) = w(n)/b(n), where w(n) = sum of the point value of white’s pieces
and b(n) is sum for black.
• Most evaluation functions are specified as a weighted sum of
position features:
f(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n)
• Example features for chess are piece count, piece placement,
squares controlled, etc.
• Deep Blue has about 6,000 features in its evaluation function.
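As a concrete illustration of the tic-tac-toe evaluator f(n) above, here is a small Python sketch (the board encoding and the helper names open_lines/opponent are my own assumptions, not from the slides):

# SBE for tic-tac-toe: f(n) = (# 3-lengths open for me) - (# 3-lengths open for you).
# A board is a 3x3 list of 'X', 'O' or None.
LINES = [[(r, c) for c in range(3)] for r in range(3)] + \
        [[(r, c) for r in range(3)] for c in range(3)] + \
        [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]

def opponent(player):
    return 'O' if player == 'X' else 'X'

def open_lines(board, player):
    # a row/column/diagonal is "open" for player if the opponent has no piece on it
    return sum(all(board[r][c] != opponent(player) for r, c in line) for line in LINES)

def sbe(board, me):
    return open_lines(board, me) - open_lines(board, opponent(me))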
Minimax: Algorithm with SBE
25
Recap
• We can't minimax search all the way to the end of the
game.
– if we could, choosing the move would be easy
• The SBE isn't a perfect estimator.
– if it were, we could just choose the best move without
searching
26
Alpha-Beta Pruning Idea
• Some of the branches of the game tree won't be
taken if playing against a smart opponent.
• Use pruning to ignore those branches.
• While doing DFS of the game tree, keep track of:
– alpha at maximizing levels (computer's moves)
• highest SBE value seen so far (initialize to -infinity)
• a lower bound on the state's evaluation
– beta at minimizing levels (opponent's moves)
• lowest SBE value seen so far (initialize to +infinity)
• an upper bound on the state's evaluation
27
Alpha-Beta Pruning Idea
• Beta cutoff pruning occurs when maximizing
if child’s alpha ≥ parent's beta
Why stop expanding children?
opponent won't allow computer to take this move
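Below is a compact Python sketch of depth-limited minimax with alpha-beta cutoffs, matching the idea above (a sketch only: sbe(), terminal() and successors() are assumed game-specific helpers):

# Depth-limited alpha-beta search. alpha = best value found so far for MAX,
# beta = best value found so far for MIN; prune as soon as alpha >= beta.
def alphabeta(state, depth, alpha, beta, maximizing):
    if depth == 0 or terminal(state):
        return sbe(state)                           # static board evaluation at the limit
    if maximizing:
        value = float('-inf')
        for s in successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                       # beta cutoff: the opponent won't allow this line
                break
        return value
    else:
        value = float('inf')
        for s in successors(state):
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:                       # alpha cutoff: the computer already has better
                break
        return value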
28
Alpha-Beta Search Example
minimax(A,0,4) alpha initialized to -infinity
Expand A? Yes since there are successors, no cutoff test for root
[Figure: game tree with the current α/β values; call stack shown at right]
29
Alpha-Beta Search Example
minimax(B,1,4) beta initialized to +infinity
Expand B? Yes since A’s alpha >= B’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
30
Alpha-Beta Search Example
minimax(F,2,4) alpha initialized to -infinity
Expand F? Yes since F’s alpha >= B’s beta is false, no beta cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
31
Alpha-Beta Search Example
minimax(N,3,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
32
Alpha-Beta Search Example
back to alpha = 4, since 4 >= -infinity (maximizing)
minimax(F,2,4)
Keep expanding F? Yes since F’s alpha >= B’s beta is false, no beta cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
33
Alpha-Beta Search Example
minimax(O,3,4) beta initialized to +infinity
Expand O? Yes since F’s alpha >= O’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
34
Alpha-Beta Search Example
minimax(W,4,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; blue: non-terminal state evaluated at the depth limit; call stack shown at right]
35
Alpha-Beta Search Example
back to beta = -3, since -3 <= +infinity (minimizing)
minimax(O,3,4)
Keep expanding O? No since F’s alpha >= O’s beta is true: alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
36
Alpha-Beta Search Example
Why?
Smart opponent will choose W or worse, thus O's upper bound is –3.
Computer already has better move at N.
[Figure: game tree with the current α/β values; red: pruned state; call stack shown at right]
37
Alpha-Beta Search Example
back to alpha doesn’t change, since -3 < 4 (maximizing)
minimax(F,2,4)
Keep expanding F? No since no more successors for F
[Figure: game tree with the current α/β values; call stack shown at right]
38
Alpha-Beta Search Example
back to beta = 4, since 4 <= +infinity (minimizing)
minimax(B,1,4)
Keep expanding B? Yes since A’s alpha >= B’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
39
Alpha-Beta Search Example
minimax(G,2,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
40
Alpha-Beta Search Example
back to beta = -5, since -5 <= 4 (minimizing)
minimax(B,1,4)
Keep expanding B? No since no more successors for B
[Figure: game tree with the current α/β values; call stack shown at right]
41
Alpha-Beta Search Example
back to alpha = -5, since -5 >= -infinity (maximizing)
minimax(A,0,4)
Keep expanding A? Yes since there are more successors, no cutoff test
[Figure: game tree with the current α/β values; call stack shown at right]
42
Alpha-Beta Search Example
minimax(C,1,4) beta initialized to +infinity
Expand C? Yes since A’s alpha >= C’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
43
Alpha-Beta Search Example
minimax(H,2,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
44
Alpha-Beta Search Example
back to beta = 3, since 3 <= +infinity (minimizing)
minimax(C,1,4)
Keep expanding C? Yes since A’s alpha >= C’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
45
Alpha-Beta Search Example
minimax(I,2,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
46
Alpha-Beta Search Example
back to beta doesn’t change, since 8 > 3 (minimizing)
minimax(C,1,4)
Keep expanding C? Yes since A’s alpha >= C’s beta is false, no alpha cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
47
Alpha-Beta Search Example
minimax(J,2,4) alpha initialized to -infinity
Expand J? Yes since J’s alpha >= C’s beta is false, no beta cutoff
[Figure: game tree with the current α/β values; call stack shown at right]
48
Alpha-Beta Search Example
minimax(P,3,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
49
Alpha-Beta Search Example
back to alpha = 9, since 9 >= -infinity (maximizing)
minimax(J,2,4)
Keep expanding J? No since J’s alpha >= C’s beta is true: beta cutoff
[Figure: game tree with the current α/β values; red: pruned states; call stack shown at right]
50
Alpha-Beta Search Example
Why?
Computer will choose P or better, thus J's lower bound is 9.
Smart opponent won’t let computer take move to J
(since opponent already has better move at H).
[Figure: game tree with the current α/β values; red: pruned states; call stack shown at right]
51
Alpha-Beta Search Example
back to beta doesn’t change, since 9 > 3 (minimizing)
minimax(C,1,4)
Keep expanding C? No since no more successors for C
[Figure: game tree with the current α/β values; call stack shown at right]
52
Alpha-Beta Search Example
back to alpha = 3, since 3 >= -5 (maximizing)
minimax(A,0,4)
Keep expanding A? Yes since there are more successors, no cutoff test
[Figure: game tree with the current α/β values; call stack shown at right]
53
Alpha-Beta Search Example
minimax(D,1,4) evaluate and return SBE value
[Figure: game tree with the current α/β values; green: terminal state just evaluated; call stack shown at right]
54
Alpha-Beta Search Example
back to alpha doesn’t change, since 0 < 3 (maximizing)
minimax(A,0,4)
Keep expanding A? Yes since there are more successors, no cutoff test
[Figure: game tree with the current α/β values; call stack shown at right]
55
Alpha-Beta Search Example
How does the algorithm finish searching the tree?
[Figure: game tree with the current α/β values; call stack shown at right]
56
Alpha-Beta Search Example
Stop Expanding E since A's alpha >= E's beta is true: alpha cutoff
Why?
Smart opponent will choose L or worse, thus E's upper bound is 2.
Computer already has better move at C.
[Figure: game tree with the current α/β values; red: the remaining children of E are pruned; call stack shown at right]
57
Alpha-Beta Search Example
Result: Computer chooses move to C.
[Figure: final game tree — green: terminal states, red: pruned states, blue: non-terminal state evaluated at the depth limit]
58
Alpha-Beta Effectiveness
Effectiveness depends on the order
in which successors are examined.
[Figure: the same subtree rooted at E searched with two different successor orderings — one ordering allows more of E's children to be pruned than the other]
60
Alpha-Beta Effectiveness
• In the worst case (poor move ordering), alpha-beta prunes nothing and examines O(b^d) nodes, the same as plain minimax.
• In the best case (ideal move ordering), it examines only about O(b^(d/2)) nodes, which roughly doubles the depth that can be searched in the same amount of time.
61
62
Other Issues: The Horizon
Effect
• Sometimes disaster lurks just
beyond search depth
– e.g. computer captures queen,
but a few moves later the opponent checkmates
63
Other Issues: The Horizon
Effect
• Quiescence Search
– when the SBE value is changing rapidly near the depth limit, search deeper than the limit
– keep looking for a point where the game "quiets down" (a sketch follows after this list)
• Secondary Search
1. find best move looking to depth d
2. look k steps beyond to verify that it still looks
good
3. if it doesn't, repeat step 2 for next best move
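A hedged Python sketch of the quiescence idea: keep expanding past the nominal depth limit while the position is still "noisy" (the quiet() test, the MAX_EXTENSION cap, and the other helpers are assumptions for illustration, not from the slides):

MAX_EXTENSION = 4   # hard floor on how far past the limit we will look (assumption)

def evaluate(state, depth, maximizing):
    if terminal(state):
        return utility(state)
    if depth <= -MAX_EXTENSION or (depth <= 0 and quiet(state)):
        return sbe(state)                         # cut off only in "quiet" positions
    values = [evaluate(s, depth - 1, not maximizing) for s in successors(state)]
    return max(values) if maximizing else min(values)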
64
THANKS…
CSE 412: Artificial Intelligence
• Logic
• Propositional Logic: A Very Simple Logic
☞ Syntax
☞ Semantics
☞ A simple knowledge base
☞ Inference
☞ Equivalence, validity, and satisfiability
Architecture of a
Simple Intelligent Agent
[Figure: architecture of a simple intelligent agent — sensors observe the Environment and update the agent's Model of the World (combined with Prior Knowledge about the World); a Reasoning & Decision-Making component uses the model, the List of Possible Actions, and the agent's Goals/Utility to choose actions, which the Effectors carry out]
3
Knowledge Based Agent
● Knowledge base:
– A knowledge base (abbreviated KB or kb) is a special
kind of database for knowledge management.
– A knowledge base is an information repository that
provides a means for information to be collected,
organized, shared, searched and utilized.
– The part of an expert system that contains the facts
and rules needed to solve problems.
– A collection of facts and rules for problem solving.
4
Knowledge Based Agent
5
Knowledge Bases (KB)
● A knowledge base:
– contains the domain-specific content for an agent
– is a set of representations of facts about the world
– is a set of sentences in a formal language
6
Knowledge Bases (KB)
8
Algorithm
9
General Logic
10
General Logic
11
General Logic
12
General Logic
13
Entailment
● KB ╞ α
Knowledge base KB entails sentence α
if and only if α is true in all worlds where KB is true
14
Entailment
● KB ╞ α
Knowledge base KB entails sentence α
if and only if α is true in all worlds where KB is true
● For example:
KB: "sky is blue" = true, "sun is shining" = true
entails α: "sky is blue and sun is shining" = true
– α represents a true fact
as long as facts represented in KB are true
– if the sky were actually cloudy, then the KB would not describe the true world state,
and α would not be guaranteed to represent a true fact
● Entailment requires the sentences in the KB to be true.
15
Logical Inference
16
General Logic
17
General Logic
18
Propositional Logic (PL) Basics
19
Logical Connectives of PL
● ¬S negation (not)
● S1∧S2 conjunction (and)
S1 and S2 are conjuncts
● S1∨S2 disjunction (or)
S1 and S2 are disjuncts
● S1⇒S2 implication/conditional (if-then)
S1 is the antecedent/premise
S2 is the consequent/conclusion
● S1⇔S2 equivalence/biconditional (if and only if)
20
Syntax of PL
A     B     C
false false false
false false true
false true  false
false true  true
true  false false
true  false true
true  true  false
true  true  true

Given n symbols, there are 2^n possible combinations of truth-value assignments;
here each row is an interpretation.
22
Implication Truth Table
A B A⇒B B∨ ¬A
false false true true
false true true true
true false false false
true true true true
A⇒B is equivalent to B∨ ¬A
23
Validity
A sentence is valid if it is true in all interpretations,
i.e. its entire truth-table column is true.
Examples: P1 ∨ ¬P1, P1 ⇒ P1 (tautologies)

A     B     C     | A ∨ ¬A
false false false | true
false false true  | true
false true  false | true
false true  true  | true
true  false false | true
true  false true  | true
true  true  false | true
true  true  true  | true
24
Satisfiability
A sentence is satisfiable if it is true in some interpretation,
i.e. its column contains true in at least one row.
Examples: P1 ∨ ¬P2, P2 ⇒ P1

A     B     C     | A ∨ ¬B
false false false | true
false false true  | true
false true  false | false
false true  true  | false
true  false false | true
true  false true  | true
true  true  false | true
true  true  true  | true
25
Unsatisfiability
A sentence is unsatisfiable if it is true in no interpretation,
i.e. its entire column is false.
Example: P1 ∧ ¬P1 (inconsistent, a contradiction)

A     B     C     | C ∧ ¬C
false false false | false
false false true  | false
false true  false | false
false true  true  | false
true  false false | false
true  false true  | false
true  true  false | false
true  true  true  | false
26
Inference Proof Methods
● Model Checking:
– truth table enumeration
sound and complete for propositional logic
– heuristic search in model space
sound but incomplete
● Application of Syntactic Operations
(i.e. Inference Rules):
– sound generation of new sentences from old
– could use inference rules as operators for search
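A small Python sketch of the truth-table (model-checking) method: enumerate every interpretation of the proposition symbols and test whether α holds in every model of the KB. Representing sentences as Python functions of a model is an assumption made here for illustration:

from itertools import product

def entails(kb, alpha, symbols):
    # KB |= alpha iff alpha is true in every interpretation that makes the KB true
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False
    return True

# Example from the entailment slide: KB = "sky is blue" AND "sun is shining".
kb = lambda m: m['blue'] and m['shining']
alpha = lambda m: m['blue'] and m['shining']
print(entails(kb, alpha, ['blue', 'shining']))   # True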
27
Inference by Enumeration
[Figure: driving-agent example — effectors (accelerator, brakes, steering) carry out actions (change speed, change steering) in pursuit of goals such as "drive home"]
3
Uncertainty in the World Model
The agent can never be completely certain about the
state of the external world since there is ambiguity and
uncertainty.
Why?
sensors have limited precision
e.g. camera has only so many pixels to capture an image
sensors have limited accuracy
e.g. tachometer’s estimate of velocity is approximate
there are hidden variables that sensors can’t “see”
e.g. vehicle behind large truck or storm clouds approaching
the future is unknown and uncertain,
i.e. the agent cannot foresee all possible future events that may happen
4
Rules and Uncertainty
Say we have a rule:
if toothache then problem is cavity
But not all patients have toothaches due to cavities
so we could set up rules like:
if toothache and not(gum disease) and not(filling) and ...
then problem = cavity
This gets complicated; a better method is:
if toothache then problem is cavity with 0.8 probability
or P(cavity|toothache) = 0.8
the probability of cavity is 0.8 given toothache is all that is known
5
Uncertainty in the World Model
True uncertainty: rules are probabilistic in nature
rolling dice, flipping a coin?
Laziness: too hard to determine exceptionless rules
takes too much work to determine all of the relevant factors
too hard to use the enormous set of rules that results
Theoretical ignorance: don't know all the rules
problem domain has no complete theory (medical diagnosis)
Practical ignorance: do know all the rules BUT
haven't collected all relevant information for a particular case
6
Logics
Logics are characterized by
what they commit to as "primitives".
Logic              | What Exists in the World         | Knowledge States
Propositional      | facts                            | true / false / unknown
First-Order        | facts, objects, relations        | true / false / unknown
Temporal           | facts, objects, relations, times | true / false / unknown
Probability Theory | facts                            | degree of belief 0..1
Fuzzy              | facts with a degree of truth     | degree of belief 0..1
7
Probability Theory
8
Utility Theory
Every state has a degree of usefulness or utility and the
agent will prefer states with higher utility.
9
Decision Theory
An agent is rational if and only if it chooses the action
that yields the highest expected utility, averaged over all
the possible outcomes of the action.
10
Kolmogorov's Axioms of Probability
1. 0 ≤ P(a) ≤ 1
probabilities are between 0 and 1 inclusive
2. P(true) = 1, P(false) = 0
probability of 1 for propositions believed to be absolutely true
probability of 0 for propositions believed to be absolutely false
3. P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
the probability of a disjunction is the sum of the two probabilities minus the probability of their "intersection" (conjunction)
11
Inference Using Full Joint Distribution
Start with the joint probability distribution:
12
Inference by Enumeration
13
Inference by Enumeration
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.08 / 0.2
= 0.4
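The same enumeration can be done mechanically. In this hedged Python sketch, the four joint entries involving toothache are the values quoted above; the remaining four entries are filled in from the standard textbook example and should be treated as assumptions:

# Full joint distribution over (Cavity, Toothache, Catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,   # assumed entries
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,   # assumed entries
}

def p(pred):
    # sum the joint entries consistent with a predicate over (cavity, toothache, catch)
    return sum(v for k, v in joint.items() if pred(*k))

p_toothache = p(lambda cavity, toothache, catch: toothache)
p_no_cavity_and_toothache = p(lambda cavity, toothache, catch: (not cavity) and toothache)
print(p_no_cavity_and_toothache / p_toothache)   # 0.4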
14
Independence
Independence between propositions a and b can be
written as:
P(a | b) = P(a)   or   P(b | a) = P(b)   or   P(a ∧ b) = P(a) P(b)
Independence assertions are usually based on knowledge
of the domain.
As we have seen, they can dramatically reduce the
amount of information necessary to specify the full joint
distribution.
If the complete set of variables can be divided into
independent subsets, then the full joint can be factored
into separate joint distributions on those subsets.
For example, the joint distribution on the outcome of n
independent coin flips, P(C1, . . . , Cn), can be represented
as the product of n single-variable distributions P(Ci).
15
Bayes’ Theorem
16
Bayes’ Theorem
P(A ∧ B) = P(A|B) P(B)
P(A ∧ B) = P(B|A) P(A)
⇒ P(B|A) P(A) = P(A|B) P(B)
⇒ P(B|A) = P(A|B) P(B) / P(A)
Bayes’ theorem (also called Bayes’ law or Bayes’ rule) is fundamental to
probabilistic reasoning in AI!
17
Bayes’ in Action
Bayes’ rule requires three terms - a conditional probability
and two unconditional probabilities - just to compute one
conditional probability.
Bayes’ rule is useful in practice because there are many
cases where we do have good probability estimates for
these three numbers and need to compute the fourth.
In a task such as medical diagnosis, we often have
conditional probabilities on causal relationships and want
to derive a diagnosis.
18
Bayes’ in Action
Example:
A doctor knows that the disease meningitis causes the patient
to have a stiff neck, say, 50% of the time. The doctor also
knows some unconditional facts: the prior probability that a
patient has meningitis is 1/50,000, and the prior probability
that any patient has a stiff neck is 1/20. Let s be the
proposition that the patient has a stiff neck and m be the
proposition that the patient has meningitis.
Solution:
This question can be answered by using the well-known
Bayes’ theorem.
19
Bayes’ in Action
Solution:
P(s|m) = 0.5
P(m) = 1/50000
P(s) = 1/20
P(m|s) = P(s|m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
i.e. only about 1 in 5,000 patients with a stiff neck is expected to have meningitis.
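The same arithmetic as a tiny Python check (only the numbers given in the example are used):

# P(m | s) = P(s | m) * P(m) / P(s)
p_s_given_m = 0.5
p_m = 1 / 50000
p_s = 1 / 20
print(p_s_given_m * p_m / p_s)   # 0.0002, i.e. 1 in 5000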
20
Bayes’ Theorem
Example
Consider a football game between two rival teams: Team 0 and
Team 1. Suppose Team 0 wins 65% of the time and Team 1 wins
the remaining matches. Among the games won by Team 0, only
30% of them come from playing on Team 1 ’s football field. On
the other hand, 75% of the victories for Team 1 are obtained
while playing at home. If Team 1 is to host the next match
between the two teams, which team will most likely emerge as
the winner?
Solution:
This question can be answered by using the well-known Bayes’
theorem.
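One way to carry out that computation is sketched below in Python (a hedged sketch; the variable names are mine). Let W be the winning team and H the event that the match is hosted by Team 1; then compare P(W=1 | H) with P(W=0 | H):

# Priors and likelihoods taken directly from the problem statement.
p_w0, p_w1 = 0.65, 0.35          # P(Team 0 wins), P(Team 1 wins)
p_h_w0, p_h_w1 = 0.30, 0.75      # P(hosted by Team 1 | Team 0 wins), P(... | Team 1 wins)

p_h = p_h_w0 * p_w0 + p_h_w1 * p_w1          # total probability of a Team 1 home game
p_w1_given_h = p_h_w1 * p_w1 / p_h           # ≈ 0.574
p_w0_given_h = p_h_w0 * p_w0 / p_h           # ≈ 0.426
print(p_w1_given_h, p_w0_given_h)            # Team 1 is the more likely winner at home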
21
Bayes’ Theorem
22
Bayes’ Theorem
23
Using Bayes’ Theorem More Realistically
Bayes’ian updating
P(Cavity | Toothache ∧ Catch) = α P(Toothache ∧ Catch | Cavity) P(Cavity),
where α = 1 / P(Toothache ∧ Catch) is a normalizing constant
24
THANKS…
Department of CSE
Artificial Intelligence
CSE 412
Topic – Natural Language Processing
Topic Contents
Natural Language
Formal Language
Formal Grammar
Chomsky Hierarchy
Natural Language
In the philosophy of language, a natural language
is any language which arises in an unpremeditated
fashion as the result of the innate facility for
language possessed by the human intellect.
A natural language is typically used for
communication, and may be spoken, signed, or
written.
Natural language is distinguished from formal
languages such as computer-programming
languages.
Natural Language Processing
Natural Language Processing (NLP) is the
study of human languages and how they can
be represented computationally and analyzed
and generated algorithmically.
NLP can also be defined as the study of
building computational models of natural
language comprehension and production.
Natural Language Processing
Other Names of NLP:
Computational Linguistics (CL)
Human Language Technology (HLT)
Natural Language Engineering (NLE)
Speech and Text Processing
Studying NLP involves studying natural
language, formal representations, and
algorithms for their manipulation.
Formal Language
A formal language is a set of strings of symbols
that may be constrained by rules that are specific to
it.
The alphabet of a formal language is the finite,
nonempty set of symbols, letters, or tokens from
which the strings of the language may be formed.
The strings formed from this alphabet are called
words.
A formal language is often defined by means of a
formal grammar.
Formal Grammar
A formal grammar is a set of production rules for
strings in a formal language.
The rules describe how to form strings from the
language's alphabet that are valid according to the
language's syntax.
A grammar does not describe the meaning of the
strings or what can be done with them in whatever
context—only their form.
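As a small illustration of production rules, the following Python sketch encodes a toy context-free grammar of my own (not one from the slides) and generates random strings from it:

import random

# Toy CFG: each nonterminal maps to a list of alternative right-hand sides.
grammar = {
    'S':  [['NP', 'VP']],
    'NP': [['the', 'dog'], ['a', 'cat']],
    'VP': [['sees', 'NP'], ['sleeps']],
}

def generate(symbol='S'):
    if symbol not in grammar:                 # terminal symbol: emit it as a word
        return [symbol]
    rhs = random.choice(grammar[symbol])      # pick one production rule for the nonterminal
    return [word for sym in rhs for word in generate(sym)]

print(' '.join(generate()))   # e.g. "the dog sees a cat"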
Informal Example of CFG’s
Informal Example of CFG’s
Formal Definition of CFG’s
Formal Definition Examples
Formal Definition Examples
Chomsky Hierarchy
Grammatical formalisms can be classified by their
generative capacity: the set of languages they can
represent.
Noam Chomsky (1957) described four classes of
grammatical formalisms that differ only in the form
of their rewrite rules.
The classes can be arranged in a hierarchy, where
each class can be used to describe all the
languages that can be described by a less powerful
class, as well as some additional languages.
This hierarchy of grammars is known as Chomsky
hierarchy.
Chomsky Hierarchy
Chomsky Hierarchy
Type-0 grammars (recursively enumerable
or unrestricted grammars) include all
formal grammars.
They generate exactly all languages that can
be recognized by a Turing machine.
These languages are also known as the
recursively enumerable languages.
Chomsky Hierarchy
Type-1 grammars (context-sensitive grammars)
generate the context-sensitive languages.
These grammars have rules of the form αAβ → αγβ
with A a nonterminal and α, β and γ strings of
terminals and nonterminals. The strings α and β
may be empty, but γ must be nonempty. The rule S
→ ε is allowed if S does not appear on the right side
of any rule.
The languages described by these grammars are
exactly all languages that can be recognized by a
linear-bounded Turing machine.
Chomsky Hierarchy
Type-2 grammars (context-free grammars)
generate the context-free languages.
These are defined by rules of the form A → γ with A
a nonterminal and γ a string of terminals and
nonterminals.
These languages are exactly all languages that can
be recognized by a pushdown automaton.
Context-free languages are the theoretical basis for
the syntax of most programming languages.
Chomsky Hierarchy
Type-3 grammars (regular grammars) generate the
regular languages.
Such a grammar restricts its rules to a single nonterminal on
the left-hand side and a right-hand side consisting of a
single terminal, possibly followed (or preceded, but not both
in the same grammar) by a single nonterminal.
The rule S → ε is also allowed here if S does not appear on
the right side of any rule.
These languages are exactly all languages that can be
decided by a finite state automaton. Additionally, this
family of formal languages can be obtained by regular
expressions.
Regular languages are commonly used to define search
patterns and the lexical structure of programming
languages.
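For example, regular expressions (equivalent in power to Type-3 grammars) are routinely used for such lexical patterns; a quick Python illustration (the pattern chosen here is my own example):

import re

# A regular expression for simple decimal numbers such as "42" or "3.14".
number = re.compile(r'^\d+(\.\d+)?$')
print(bool(number.match('3.14')))   # True
print(bool(number.match('3.')))     # False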
Chomsky Hierarchy
The following table summarizes each of
Chomsky's four types of grammars, the class
of language it generates, the type of
automaton that recognizes it, and the form its
rules must have.
Chomsky Hierarchy
THANKS…
Artificial Intelligence (CSE 412)
Department of CSE
Daffodil International University
Topic Contents
Overview
Applications
History
Threshold Logic
3
Threshold Logic
4
Threshold Logic
5
Threshold Logic
6
Threshold Logic
7
Threshold Logic
A single unit can compute the disjunction or the
conjunction of n arguments as is shown in
Figure 2.9, where the conjunction of three and
the disjunction of four arguments are computed
by two units, respectively.
The same kind of computation requires several
conventional logic gates with two inputs.
It should be clear from this simple example that
threshold logic elements can reduce the
complexity of the circuit used to implement a
given logical function.
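A hedged Python sketch of such threshold units: one unit computes the conjunction of three inputs (threshold 3), another the disjunction of four inputs (threshold 1), mirroring the Figure 2.9 example described above (the function names are my own):

def mcculloch_pitts(inputs, threshold):
    # Unweighted McCulloch-Pitts unit: fires (1) iff the number of active inputs
    # reaches the threshold.
    return 1 if sum(inputs) >= threshold else 0

and3 = lambda x1, x2, x3: mcculloch_pitts([x1, x2, x3], threshold=3)           # conjunction
or4  = lambda x1, x2, x3, x4: mcculloch_pitts([x1, x2, x3, x4], threshold=1)   # disjunction

print(and3(1, 1, 1), and3(1, 0, 1))       # 1 0
print(or4(0, 0, 0, 0), or4(0, 1, 0, 0))   # 0 1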
8
Threshold Logic
9
Threshold Logic
10
11
Weighted and Unweighted
Networks
Since McCulloch–Pitts networks do not use
weighted edges, the question of whether
weighted networks are more general than
unweighted ones must be answered.
A simple example shows that both kinds of
networks are equivalent.
Assume that three weighted edges converge on the unit shown in Figure 2.17. The unit
computes a weighted sum of its inputs and compares it with its threshold; by scaling the
weights and the threshold to integers and replacing each weighted edge with the
corresponding number of parallel unweighted edges, an equivalent unweighted unit can
be constructed.
12
13
Perceptron
In the preceding slides, we arrived at the conclusion
that McCulloch–Pitts units can be used to build
networks capable of computing any logical function
and of simulating any finite automaton.
Learning can only be implemented by modifying the
connection pattern of the network and the thresholds
of the units, but this is necessarily more complex than
just adjusting numerical parameters.
For that reason, we turn our attention to weighted
networks and consider their most relevant properties.
14
Perceptron
In 1958 Frank Rosenblatt, an American psychologist,
proposed the perceptron, a more general computational model
than McCulloch–Pitts units.
The essential innovation was the introduction of numerical
weights and a special interconnection pattern.
In the original Rosenblatt model the computing units are
threshold elements and the connectivity is determined
stochastically. Learning takes place by adapting the weights of
the network with a numerical algorithm.
Rosenblatt’s model was refined and perfected in the 1960s and
its computational properties were carefully analyzed by Minsky
and Papert.
In the following, Rosenblatt’s model will be called the classical
perceptron and the model analyzed by Minsky and Papert the
perceptron.
15
Perceptron
16
Perceptron
17
18
Perceptron:
Computational Limits
Early experiments with Rosenblatt’s model had aroused
unrealistic expectations in some quarters, and there was no
clear understanding of the class of pattern recognition
problems which it could solve efficiently.
Minsky and Papert used their simplified perceptron model to
investigate the computational capabilities of weighted
networks.
They found that perceptrons are only capable of learning
linearly separable patterns.
Theorem (Minsky and Papert, 1969): The perceptron rule
converges to weights that correctly classify all training
examples provided the given data set represents a function
that is linearly separable.
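A minimal sketch of the perceptron learning rule in Python, run on a linearly separable data set (the AND function); details such as the learning rate and the iteration cap are assumptions, not from the slides:

# Perceptron learning rule: w <- w + lr * (target - prediction) * x, with the bias as w[0].
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # logical AND: linearly separable
w = [0.0, 0.0, 0.0]          # bias weight plus one weight per input
lr = 0.1

def predict(x):
    s = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1 if s >= 0 else 0

for _ in range(100):                      # converges because the data are linearly separable
    for x, target in data:
        error = target - predict(x)
        w[0] += lr * error
        w[1] += lr * error * x[0]
        w[2] += lr * error * x[1]

print([predict(x) for x, _ in data])      # [0, 0, 0, 1]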
19
Perceptron:
Computational Limits
Linear separability:
Perceptron: Conclusion
21