AI
AI
General information
Course name
Artificial Intelligence
Volume
15 x 2h
Lecturer
Tran Duc Khanh, Department of Information Systems, SOICT, HUT, [email protected]
Goals
An overall picture of AI. Basic concepts of AI. Major techniques of AI. Important applications of AI.
Content
Introduction Agents Problem Solving
Search Algorithms Constraint Satisfaction Problems
Reference
Book
Artificial Intelligence A Modern Approach. Stuart Russell and Peter Norvig. Prentice Hall
Evaluation
Final examination
70% Achievement test
Continuous assessment
30% Class attendance, midterm test, short tests or quizzes, project
Do not hesitate to
Ask questions. Give your opinions/feedback. Discuss with your lecturer.
Artificial Intelligence
For HEDSPI Project
Lecture 1 - Introduction
Lecturer: Tran Duc Khanh Dept of Information Systems School of Information and Communication Technology HUST
1
Outline
What is AI? Foundations of AI Short history of AI Philosophical discussions
What is AI?
Views of AI fall into four categories: Thinking humanly, Thinking rationally, Acting humanly, Acting rationally. The textbook advocates "acting rationally".
Acting humanly: the Turing test (Turing, 1950). Predicted that by 2000, a machine might have a 30% chance of fooling a lay person for 5 minutes. Anticipated all major arguments against AI in the following 50 years. Suggested major components of AI: knowledge, reasoning, language understanding, learning.
Thinking rationally
The Laws of Thought approach What does it mean to think rationally? Normative / prescriptive rather than descriptive Logicist tradition: Logic: notation and rules of derivation for thoughts Aristotle: what are correct arguments/thought processes? Direct line through mathematics, philosophy, to modern AI Problems: Not all intelligent behavior is mediated by logical deliberation What is the purpose of thinking? What thoughts should I have? Logical systems tend to do the wrong thing in the presence of uncertainty
6
Acting rationally
Rational behavior: doing the right thing. The right thing: that which is expected to maximize goal achievement, given the available information. Doesn't necessarily involve thinking, e.g., blinking; thinking can be in the service of rational action. Entirely dependent on goals! Irrational ≠ insane; irrationality is sub-optimal action. Rational ≠ successful. Our focus here: rational agents, i.e., systems which make the best possible decisions given goals, evidence, and constraints. In the real world there is usually lots of uncertainty and lots of complexity, so usually we're just approximating rationality. "Computational rationality" would be a better title for this course.
7
Rational agents
An agent is an entity that perceives and acts. An agent function maps from percept histories to actions: f: P* → A
[Diagram: the environment delivers percepts to the agent's sensors; the agent program ("?") chooses actions, which are carried out through its actuators.]
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance Computational limitations make perfect rationality unachievable So we want the best program for given machine resources
Foundations of AI
Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning, language, rationality
Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability
Economics: utility, decision theory
Neuroscience: physical substrate for mental activity
Psychology: phenomena of perception and motor control, experimental techniques
Computer engineering: building fast computers
Control theory: design systems that maximize an objective function over time
Linguistics: knowledge representation, grammar
Short history of AI
1940-1950: Early days
1943: McCulloch & Pitts: Boolean circuit model of the brain. 1950: Turing's "Computing Machinery and Intelligence".
Expert system
[Diagram: an expert system adds a knowledge base and knowledge engineering to the usual computer components (hardware, software, data input/output).]
Proved a mathematical conjecture (Robbins conjecture) unsolved for decades No hands across America (driving autonomously 98% of the time from Pittsburgh to San Diego) During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo, and people NASA's on-board autonomous planning program controlled the scheduling of operations for a spacecraft Proverb solves crossword puzzles better than most humans
12
Philosophical discussions
What Can AI Do? Play a decent game of table tennis? Drive safely along a curving mountain road? Buy a week's worth of groceries on the web? Discover and prove a new mathematical theorem? Converse successfully with another person for an hour? Perform a complex surgical operation? Unload a dishwasher and put everything away? Translate spoken English into spoken Vietnamese in real time? Write an intentionally funny story? Can machines think?
13
14
Artificial Intelligence
For HEDSPI Project
Lecture 2 - Agent
Lecturer: Tran Duc Khanh Dept of Information Systems School of Information and Communication Technology HUST
1
Outline
1. Agents and environments
2. PEAS (Performance measure, Environment, Actuators, Sensors)
3. Environment types
4. Agent types
[Diagram: the agent receives percepts from the environment through its sensors and acts on it through its actuators.]
The agent program runs on the physical architecture to produce the agent function agent = architecture + program
function TABLE-DRIVEN-AGENT(percept) returns an action
  static: percepts, a sequence, initially empty
          table, a table of actions, indexed by percept sequences, initially fully specified
  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action
Vacuum-cleaner world
Vacuum-cleaner world
function REFLEX-VACUUM-AGENT([position, state]) returns an action
  if state = Dirty then return Suck
  else if position = A then return Right
  else if position = B then return Left
Does the agent act reasonably?
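A minimal Python sketch of this reflex agent driving a two-cell vacuum world; the simulation loop and names are illustrative, not part of the lecture:

```python
# Minimal sketch: the reflex vacuum agent acting in a two-cell world (A, B).

def reflex_vacuum_agent(position, status):
    if status == "Dirty":
        return "Suck"
    return "Right" if position == "A" else "Left"

world = {"A": "Dirty", "B": "Dirty"}   # initial dirt in both cells
position = "A"
for _ in range(4):
    action = reflex_vacuum_agent(position, world[position])
    if action == "Suck":
        world[position] = "Clean"
    elif action == "Right":
        position = "B"
    elif action == "Left":
        position = "A"
    print(action, world)
```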
Rational agent
A rational agent is one that does the right thing, i.e., acts so as to be most successful. The performance measure embodies the criterion for success of an agent's behavior.
E.g., performance measure of a vacuum-cleaner agent:
amount of dirt cleaned up amount of time taken amount of electricity consumed amount of noise generated
8
Rational agent
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has. An agent is autonomous if its behavior is determined by its own experience (with ability to learn and adapt)
9
PEAS
4 factors should be considered when designing an automated agent:
Performance measure Environment Actuators Sensors
10
11
13
Environment types
Fully observable (vs. partially observable): an agent's sensors give it access to the complete state of the environment at each point in time. Deterministic (vs. stochastic): the next state of the environment is completely determined by the current state and the action executed by the agent. Episodic (vs. sequential): the agent's experience is divided into atomic "episodes" (each episode consists of the agent perceiving and then performing a single action).
14
Environment types
Static (vs. dynamic): The environment is unchanged while an agent is deliberating. Discrete (vs. continuous): A limited number of distinct, clearly defined percepts and actions. Single agent (vs. multiagent): An agent operating by itself in an environment.
15
Agent types
Four basic agent types:
Simple reflex agents Model-based reflex agents Goal-based agents Utility-based agents
16
Simple reflex agents
These agents select actions on the basis of the current percept, ignoring the rest of the percept history.
[Diagram: sensors feed the current percept to condition-action rules, which drive the effectors.]
function SIMPLE-REFLEX-AGENT(percept) returns an action
  static: rules, a set of condition-action rules
  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action
17
Model-based reflex agents
[Diagram: the agent maintains an internal state, updated using knowledge of how the world evolves and what its actions do; condition-action rules then select the action.]
function REFLEX-AGENT-WITH-STATE(percept) returns an action
  static: state, a description of the current world state
          rules, a set of condition-action rules
          action, the most recent action, initially none
  state ← UPDATE-STATE(state, action, percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action
18
Goal-based agents
[Diagram: a goal-based agent combines its model of how the world evolves and what its actions do ("what the world is like now", "what it will be like if I do action A") with its goals in order to choose an action.]
Goals introduce the need to reason about the future or other hypothetical states. It may be the case that none of the actions an agent can currently perform will lead to a goal state.
19
Utility-based agents
[Diagram: a utility-based agent predicts what the world will be like if it does action A and evaluates "how happy I will be in such a state" with a utility function before deciding what action to do now.]
Agents that take actions that make them the most happy in the long run.
More formally agents that prefer actions that lead to states with higher utility. Utility-based agents can reason about multiple goals, conflicting goals, and uncertain situations.
20
Learning agents
Learning allows the agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow. The most important question: What kind of performance element will my agent need to do this once it has learned how?
21
Knowledge bases
Knowledge base is a set of sentences in a formal language, telling an agent what it needs to know Agent can ASK itself what to do, the answer should follow from the KB Agents can be viewed at:
the knowledge level: what they know, what its goals are the implementation level: data structures in KB and algorithms that manipulate them
Knowledge-based agents
function KB-AGENT(percept) returns an action
  static: KB, a knowledge base
          t, a counter, initially 0, indicating time
  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action
23
Multi-agent planning
Environment: cooperative or competitive. Issue: the environment is not static → synchronization is required. Requires a model of the other agent's plans. Cooperation: joint goals and plans, e.g., team planning in doubles tennis. Joint goal: returning the ball that has been hit to them and ensuring that at least one of them is covering the net. Joint plan: multibody planning. Coordination mechanisms: decompose and distribute tasks. Competition: e.g., chess playing. An agent in a competitive environment must
recognize that there are other agents compute some of the other agent's possible plans compute how the other agent's plans interact with its own plans decide on the best action in view of these interactions.
24
Artificial Intelligence
For HEDSPI Project
Lecture 3 - Search
Lecturer: Tran Duc Khanh Dept of Information Systems School of Information and Communication Technology HUST
Outline
Problem-solving agents Problem types Problem formulation Example problems Basic search algorithms
breadth-first search depth-first search depth-limited search iterative deepening depth-first search
2
2
Problem-solving agents
3
3
Performance: Get from Arad to Bucharest as quickly as possible Environment: The map, with cities, roads, and guaranteed travel times Actions: Travel a road between adjacent cities
4
Problem types
Deterministic, fully observable → single-state problem: the agent knows exactly which state it will be in; the solution is a sequence.
Non-observable → sensorless problem (conformant problem): the agent may have no idea where it is; the solution is a sequence.
Nondeterministic and/or partially observable → contingency problem: percepts provide new information about the current state; often interleave search and execution.
Unknown state space → exploration problem.
e.g., S(Arad) = {<Arad → Zerind, Zerind>, ...}
4.
A solution is a sequence of actions leading from the initial state to a goal state
6
states: locations of tiles
actions: move blank left, right, up, down
goal test: = goal state (given)
path cost: 1 per move
7
Search tree
Search trees:
Represent the branching paths through a state graph. Usually much larger than the state graph. Can a finite state graph give an infinite search tree?
10
We can turn graph search problems into tree search problems by:
replacing undirected links by 2 directed links; avoiding loops in a path (or keeping track of visited nodes globally)
11
[Diagram: a search tree rooted at node n0; expanding a node n generates successors(n), working toward a Goal node.]
13
The Expand function creates new nodes, filling in the various fields and using the SuccessorFn of the problem to create the corresponding states.
14
Search strategies
A search strategy is defined by picking the order of node expansion. Strategies are evaluated along the following dimensions:
completeness: does it always find a solution if one exists?
time complexity: number of nodes generated
space complexity: maximum number of nodes in memory
optimality: does it always find a least-cost solution?
Time and space complexity are measured in terms of
b: maximum branching factor of the search tree
d: depth of the least-cost solution
m: maximum depth of the state space (may be ∞)
15
[Diagram: a node n is taken out of the fringe, expanded, and its successors(n) are inserted into the fringe.]
Breadth-first search
Expand shallowest unexpanded node
17
18
Uniform-cost search
Expand the cheapest unexpanded node n, where g(n) = cost so far to reach n; fringe = queue ordered by path cost. Equivalent to breadth-first search if step costs are all equal.
Complete? Yes, if step cost ≥ ε.
Time? # of nodes with g ≤ cost of optimal solution, O(b^⌈C*/ε⌉), where C* is the cost of the optimal solution.
Space? # of nodes with g ≤ cost of optimal solution, O(b^⌈C*/ε⌉).
Optimal? Yes: nodes are expanded in increasing order of g(n).
19
You can promote or demote keys by resetting their priorities. Unlike a regular queue, insertions into a priority queue are not constant time, usually O(log n). We'll need priority queues for most cost-sensitive search methods.
20
Depth-first search
Expand deepest unexpanded node
21
22
Depth-limited search
Depth-first search can get stuck on infinite path when a different choice would lead to a solution Depth-limited search = depth-first search with depth limit l, i.e., nodes at depth l have no successors
23
Problem with depth-limited search: if the shallowest goal is beyond the depth limit, no solution is found.
Iterative deepening search (see the sketch below):
1. Do a DFS which only searches for paths of length 1 or less (DFS gives up on any path of length 2).
2. If 1 failed, do a DFS which only searches paths of length 2 or less.
3. If 2 failed, do a DFS which only searches paths of length 3 or less.
4. ... and so on.
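A minimal Python sketch of depth-limited search and the iterative-deepening loop above, on an explicit graph given as an adjacency dict (the graph itself is illustrative):

```python
# Depth-limited DFS plus the iterative-deepening wrapper.

def depth_limited(graph, node, goal, limit, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                       # cut off at the depth limit
    for child in graph.get(node, []):
        result = depth_limited(graph, child, goal, limit - 1, path)
        if result is not None:
            return result
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    for limit in range(max_depth + 1):    # limits 0, 1, 2, ...
        result = depth_limited(graph, start, goal, limit)
        if result is not None:
            return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(iterative_deepening(graph, "A", "D"))   # ['A', 'B', 'D']
```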
25
26
27
For b = 10 and d = 5:
N_DLS = 1 + 10 + 100 + 1,000 + 10,000 + 100,000 = 111,111
N_IDS = 6 + 50 + 400 + 3,000 + 20,000 + 100,000 = 123,456
Overhead = (123,456 - 111,111)/111,111 = 11%
28
29
Summary of algorithms
30
Artificial Intelligence
For HEDSPI Project
Lecture 4 - Search
Lecturer: Tran Duc Khanh Dept of Information Systems School of Information and Communication Technology HUST
Outline
Graph search Best-first search A* search
Graph search
Graph search
Failure to detect repeated states can turn a linear problem into an exponential one!
Graph search
Best-first search
Idea: use an evaluation function f(n) for each node, an estimate of its "desirability". Expand the most desirable unexpanded node.
10
[Figures: greedy best-first search on the Romania map with h = straight-line distance to Bucharest. After expanding Arad the frontier is Sibiu 253, Timisoara 329, Zerind 374; expanding Sibiu adds Arad 366, Fagaras 176 and Oradea 380; the search then expands Fagaras.]
A* search
Idea: Expand unexpanded node with lowest evaluation value Evaluation function f(n) = g(n) + h(n) g(n) = cost so far to reach n h(n) = estimated cost from n to goal f(n) = estimated total cost of path through n to goal Nodes are ordered according to f(n).
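A minimal Python sketch of A* with f(n) = g(n) + h(n) on an explicit weighted graph; the graph and heuristic values are illustrative, not the Romania map:

```python
import heapq

# A* on a small graph: neighbors(n) yields (successor, step_cost) pairs,
# h(n) is an admissible heuristic estimate of the remaining cost.

def a_star(start, goal, neighbors, h):
    frontier = [(h(start), 0, start, [start])]       # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)   # lowest f first
        if node == goal:
            return path, g
        for succ, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):  # better path to succ
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

edges = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
heuristic = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star("A", "D", lambda n: edges[n], lambda n: heuristic[n]))
# (['A', 'B', 'C', 'D'], 4)
```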
16
A* search example
[Figures: A* on the Romania map with f = g + h, h = straight-line distance. Arad 366 = 0 + 366 is expanded first; its successors are Sibiu 393 = 140 + 253, Timisoara 447 = 118 + 329 and Zerind 449 = 75 + 374. Expanding Sibiu adds Arad 646 = 280 + 366, Fagaras 415 = 239 + 176 and Oradea 671 = 291 + 380; later expansions add Pitesti 417 = 317 + 100 and Sibiu 553 = 300 + 253, and the search proceeds to Bucharest along the optimal path.]
Admissible heuristic
Let h*(N) be the true cost of the optimal path from N to a goal node. Heuristic h(N) is admissible if: 0 ≤ h(N) ≤ h*(N). An admissible heuristic is always optimistic.
24
Admissible heuristics
The 8-puzzle: h1(n) = number of misplaced tiles h2(n) = total Manhattan distance (i.e., no. of squares from desired location of each tile)
[8-puzzle figure: for the start and goal states shown, h1(S) = 7 misplaced tiles and h2(S) = 2+3+3+2+4+2+0+2 = 18.]
Heuristic quality
Effective branching factor b*: the branching factor that a uniform tree of depth d would have in order to contain N+1 nodes (N being the total number of nodes generated by the search).
26
27
The optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem.
28
Suppose a suboptimal goal G2 is in the queue. Let n be an unexpanded node on a shortest path to an optimal goal G.
f(G2) = g(G2)   since h(G2) = 0
       > g(G)   since G2 is suboptimal
       ≥ f(n)   since h is admissible
Since f(G2) > f(n), A* will never select G2 for expansion.
A heuristic is consistent (aka monotonic) if, for every successor n' of a node n generated by any action a: h(n) ≤ c(n, a, n') + h(n'). Admissible heuristics are generally consistent.
30
Thus, first goal-state selected for expansion must be optimal Theorem: If h(n) is consistent, A* using GRAPH-SEARCH is optimal
31
Contours of A* Search
A* expands nodes in order of increasing f value Gradually adds "f-contours" of nodes Contour i has all nodes with f=fi, where fi < fi+1
32
Contours of A* Search
With uniform-cost (h(n) = 0, contours will be circular With good heuristics, contours will be focused around optimal path A* will expand all nodes with cost f(n) < C*
33
A* search, evaluation
Completeness: YES
Since bands of increasing f are added Unless there are infinitely many nodes with f<f(G)
34
A* search, evaluation
Completeness: YES Time complexity:
Number of nodes expanded is still exponential in the length of the solution.
35
A* search, evaluation
Completeness: YES Time complexity: (exponential with path length) Space complexity:
It keeps all generated nodes in memory Hence space is the major problem not time
36
A* search, evaluation
Completeness: YES Time complexity: (exponential with path length) Space complexity:(all nodes are stored) Optimality: YES
Cannot expand fi+1 until fi is finished. A* expands all nodes with f(n)< C* A* expands some nodes with f(n)=C* A* expands no nodes with f(n)>C*
37
38
Artificial Intelligence
For HEDSPI Project
Outline
Memory-bounded heuristic search Hill-climbing search Simulated annealing search Local beam search
Iterative Deepening A*
Iterative Deepening version of A*: use a threshold on f(.) as the depth bound, to find a solution under the current threshold of f(.).
[Flowchart of the Iterative Deepening A* algorithm (for tree search): start by setting threshold = h(s), put s in OPEN and compute f(s); repeatedly remove the node n with the smallest f(.) value from OPEN and put it in CLOSE; if n is a goal, stop with success; otherwise expand n, compute f(.) of each successor and put on OPEN only successors with f(suc) < threshold, with pointers back to n; when OPEN becomes empty, set threshold = min{ f(.) | f(.) > threshold } and restart.]
RBFS evaluation
RBFS is a bit more efficient than IDA*, but still has excessive node generation (mind changes). Like A*, it is optimal if h(n) is admissible. Space complexity is O(bd), whereas IDA* retains only one single number (the current f-cost limit). Time complexity is difficult to characterize: it depends on the accuracy of h(n) and how often the best path changes. Both IDA* and RBFS suffer from using too little memory.
(simplified) memory-bounded A*
Use all available memory.
i.e., expand the best leaves until the available memory is full. When full, SMA* drops the worst leaf node (highest f-value). Like RBFS, we remember the best descendant in the branch we delete.
The deleted node is regenerated when all other candidates look worse than the node. SMA* is complete if solution is reachable, optimal if optimal solution is reachable. Time can still be exponential.
10
12
Example: n-queens
Put n queens on an n n board with no two queens on the same row, column, or diagonal
13
Hill-climbing search
Simple, general idea:
Start wherever Always choose the best neighbor If no neighbors have better scores than current, quit
Hill climbing does not look ahead of the immediate neighbors of the current state. Hill-climbing chooses randomly among the set of best successors, if there is more than one. Some problem spaces are great for hill climbing and others are terrible.
14
Hill-climbing search
function HILL-CLIMBING(problem) returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node; neighbor, a node
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor
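A minimal Python sketch of hill climbing applied to n-queens (one queen per column); the neighbourhood (move one queen within its column) and the scoring are illustrative choices:

```python
import random

# Hill climbing on n-queens: "better" means fewer attacking pairs.

def attacking_pairs(state):
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def hill_climbing(n=8):
    current = [random.randrange(n) for _ in range(n)]
    while True:
        neighbours = [current[:c] + [r] + current[c + 1:]
                      for c in range(n) for r in range(n) if r != current[c]]
        best = min(neighbours, key=attacking_pairs)
        if attacking_pairs(best) >= attacking_pairs(current):
            return current          # local optimum (not necessarily a solution)
        current = best

solution = hill_climbing()
print(solution, attacking_pairs(solution))
```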
15
Introduce randomness
16
Hill-climbing variations
Stochastic hill-climbing
Random selection among the uphill moves. The selection probability can vary with the steepness of the uphill move.
First-choice hill-climbing
Stochastic hill climbing by generating successors randomly until a better one is found.
Random-restart hill-climbing
Tries to avoid getting stuck in local maxima. "If at first you don't succeed, try, try again."
17
Simulated Annealing
Simulates the slow cooling of the annealing process. Applied to combinatorial optimization problems by S. Kirkpatrick (1983). What is annealing?
The process of slowly cooling down a compound or a substance. Slow cooling lets the substance settle around thermodynamic equilibrium, so the molecules reach an optimal conformation.
18
Simulated annealing
Gradually decrease the shaking to make sure the ball escapes from local minima and falls into the global minimum.
19
Simulated annealing
Escape local maxima by allowing bad moves, but gradually decrease their size and frequency.
If T decreases slowly enough, the best state is reached. Applied to VLSI layout, airline scheduling, etc.
20
Simulated annealing
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem; schedule, a mapping from time to temperature
  local variables: current, a node; next, a node; T, a temperature controlling the probability of downward steps
  current ← MAKE-NODE(INITIAL-STATE[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current        (like hill climbing, but a random move instead of the best move)
    ΔE ← VALUE[next] - VALUE[current]
    if ΔE > 0 then current ← next                          (in case of improvement, make the move)
    else current ← next only with probability e^(ΔE/T)     (otherwise, accept the move with a probability that decreases exponentially with the badness of the move)
What's the probability when T → ∞? When T → 0? When ΔE = 0? When ΔE → -∞?
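A minimal Python sketch of simulated annealing minimising a simple 1-D energy function; the neighbour move, cooling schedule and energy function are illustrative assumptions:

```python
import math, random

# Simulated annealing: accept worse moves with probability exp(delta / T).

def simulated_annealing(energy, start, schedule, steps=1000):
    current = start
    for t in range(1, steps + 1):
        T = schedule(t)
        if T <= 0:
            break
        candidate = current + random.uniform(-1.0, 1.0)   # random neighbour
        delta = energy(current) - energy(candidate)       # > 0 means improvement
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate
    return current

energy = lambda x: (x - 3) ** 2                 # global minimum at x = 3
schedule = lambda t: 1.0 / math.log(1 + t)      # Geman-style slow cooling
print(round(simulated_annealing(energy, start=10.0, schedule=schedule), 2))
```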
Cooling Schedule
Determines the rate at which the temperature T is lowered. If T is lowered slowly enough, the algorithm will find a global optimum.
In the beginning, search alternatives aggressively; become conservative as time goes by.
22
If T_t is reduced too fast, the solution quality is poor. If T_t ≥ T(0)/log(1+t) (Geman), the system will converge to the minimum configuration. Other schedules: T_t = k/(1+t) (Szu); T_t = a·T_{t-1}, where a is between 0.8 and 0.99.
23
Difficulties
Determination of parameters If cooling is too slow Too much time to get solution If cooling is too rapid Solution may not be the global optimum
24
p(x) ∝ e^(-E(x)/kT)
Greedy Search
Beam Search
26
27
Artificial Intelligence
For HEDSPI Project
Outline
Game and search Alpha-beta pruning
Machines are better than humans at: Othello. Humans are better than machines at: Go. Here: perfect-information, zero-sum games.
4
Games adversary
Solution is strategy (strategy specifies move for every possible opponent reply). Time limits force an approximate solution Evaluation function: evaluate goodness of game position Examples: chess, checkers, Othello, backgammon
Ignoring computational complexity, games are a perfect application for a complete search. Of course, ignoring complexity is a bad idea, so games are a good place to study resource bounded searches.
6
Types of Games
                        deterministic                    chance
perfect information     chess, checkers, go, othello     backgammon, monopoly
imperfect information   battleships, blind tictactoe     bridge, poker, scrabble, nuclear war
Minimax
Two players: MAX and MIN. MAX moves first and they take turns until the game is over. The winner gets an award, the loser gets a penalty.
Games as search:
Initial state: e.g., the board configuration in chess
Successor function: list of (move, state) pairs specifying legal moves
Terminal test: is the game finished?
Utility function: gives a numerical value for terminal states, e.g., win (+1), lose (-1) and draw (0) in tic-tac-toe
MAX uses a search tree to determine its next move. Perfect play for deterministic games.
Minimax
From among the moves available to you, take the best one The best one is determined by a search using the MiniMax strategy
9
Optimal strategies
Find the contingent strategy for MAX assuming an infallible MIN opponent. Assumption: both players play optimally!
Given a game tree, the optimal strategy can be determined by using the minimax value of each node:
MINIMAX-VALUE(n) =
  UTILITY(n)                                   if n is a terminal node
  max_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MAX node
  min_{s ∈ successors(n)} MINIMAX-VALUE(s)     if n is a MIN node
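A minimal Python sketch of this recursion on an explicit game tree (nested lists whose leaves are utilities for MAX); the tree values are a small illustrative example:

```python
# Minimax over an explicit game tree given as nested lists.

def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):      # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX at the root, MIN below
print(minimax(tree))                        # -> 3
```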
10
Minimax
11
Minimax algorithm
12
Properties of minimax
Complete? Yes (if the tree is finite). Optimal? Yes (against an optimal opponent). Time complexity? O(b^m). Space complexity? O(bm) (depth-first exploration). For chess, b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible.
13
Alpha-beta pruning:
Remove branches that do not influence final decision Revisit example
14
α-β pruning
Alpha value: the best value achievable so far for MAX, hence the max value so far. Beta value: the best value achievable so far for MIN, hence the min value so far. At a MIN node: compare the node's value V with alpha; if V ≤ alpha, stop expanding this node and pass the value to the parent (alpha cut-off). At a MAX node: compare V with beta; if V ≥ beta, stop expanding this node and pass the value to the parent (beta cut-off).
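A minimal Python sketch of alpha-beta pruning over the same kind of explicit game tree; it returns the same value as minimax while cutting off branches:

```python
# Alpha-beta pruning over an explicit game tree given as nested lists.

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                 # beta cut-off
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                     # alpha cut-off
    return value

print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))   # -> 3, with pruning
```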
15
α-β pruning example
[Figures: the example minimax tree is evaluated left to right; branches whose values cannot affect the root decision are pruned step by step.]
Properties of α-β
Pruning does not affect final result Entire sub-trees can be pruned. Good move ordering improves effectiveness of pruning. With "perfect ordering"
time complexity = O(b^(m/2)) → doubles the depth of search; effective branching factor of sqrt(b)!! Alpha-beta pruning can look twice as far ahead as minimax in the same amount of time.
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
21
Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX. If v is worse than α, MAX will avoid it
→ prune that branch
The α-β algorithm
23
The α-β algorithm
24
25
Cut-off search
Change:
if TERMINAL-TEST(state) then return UTILITY(state)
into:
if CUTOFF-TEST(state,depth) then return EVAL(state)
Only useful for quiescent (no wild swings in value in near future) states
27
For chess, typically a linear weighted sum of features: Eval(s) = w1·f1(s) + w2·f2(s) + ... + wn·fn(s), e.g., w1 = 9 with f1(s) = (number of white queens) - (number of black queens), etc.
28
Chess complexity
A PC can search 200 million nodes per 3 minutes. Branching factor: ~35.
35^5 ≈ 50 million, so using minimax it could look ahead only about 5 plies: defeated by an average player, who plans 6-8 plies ahead.
To reach grandmaster level, it needs a better, extensively tuned evaluation function and a large database of optimal openings and endgames.
29
Nondeterministic games
Chance is introduced by dice, card-shuffling, coin-flipping, ... Example with coin-flipping: chance nodes.
31
Backgammon
EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                        if n is a terminal node
  max_{s ∈ successors(n)} EXPECTEDMINIMAX(s)        if n is a MAX node
  min_{s ∈ successors(n)} EXPECTEDMINIMAX(s)        if n is a MIN node
  Σ_{s ∈ successors(n)} P(s)·EXPECTEDMINIMAX(s)     if n is a chance node
where P(s) is the probability of s occurring.
33
34
Artificial Intelligence
For HEDSPI Project
CSP
Standard search problems
State is a black-box
Any data structure that implements initial states, goal states, successor function
CSPs
State is composed of variables Xi with value in domain Di Goal test is a set of constraints over variables
Domain
Di = {red, green, blue}
Constraint
Neighbor regions must have different colors
WA /= NT WA /= SA NT /= SA
WA=red, and NT=green, and Q=red, and NSW=green, and V=red, and SA=blue
Constraint Graph
Binary CSPs
Each constraint relates at most two variables
Constraint graph
Node is variable Edge is constraint
Varieties of CSPs
Discrete variables
Finite domain, e.g, SAT Solving Infinite domain, e.g., work scheduling
Variables are the start/end of a working day. Constraint language, e.g., StartJob1 + 5 ≤ StartJob3. Linear constraints are decidable; non-linear constraints are undecidable.
Continuous variables
e.g., start/end time of observing the universe using Hubble telescope Linear constraints are solvable using Linear Programming
Varieties of Constraints
Single-variable constraints
e.g., SA /= green
Binary constraints
e.g., SA /= WA
Multi-variable constraints
Relate at least 3 variables
Soft constraints
Priority, e.g., red better than green Cost function over variables
Example: Cryptarithmetic
Variables
F,T,O,U,R,W, X1,X2,X3
Domain
{0,1,2,3,4,5,6,7, 8,9}
Constraints
Alldiff(F,T,O,U,R,W) O+O = R+10*X1 X1+W+W= U+10*X2 X2+T+T= O+10*X3 X3=F
Scheduling
E.g., when and where the class takes place
Initial state
The empty assignment
Successor function
Assign a value to an unassigned variable that does not conflict with the current assignment.
Fail if no legal assignment exists.
Goal test
All variables are assigned and no conflict
Backtracking Search
Variable assignments are commutative, e.g.,
{WA=red, NT=green} is the same as {NT=green, WA=red}
Single-variable assignment
Only consider assignments to one variable at each node → d^n leaves
Backtracking search
Depth-first search + single-variable assignment
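A minimal Python sketch of backtracking search for the map-colouring CSP; the variables, domains and not-equal constraints follow the Australia example, while the variable and value ordering here is the naive one (no MRV or LCV):

```python
# Backtracking search for the Australia map-colouring CSP.

neighbours = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
              "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
              "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": []}
colours = ["red", "green", "blue"]

def consistent(var, value, assignment):
    # the only constraints are "neighbouring regions differ"
    return all(assignment.get(n) != value for n in neighbours[var])

def backtrack(assignment):
    if len(assignment) == len(neighbours):
        return assignment
    var = next(v for v in neighbours if v not in assignment)  # naive ordering
    for value in colours:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None        # triggers backtracking in the caller

print(backtrack({}))
```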
Choosing Variables
Minimum remaining values (MRV)
Choose the variable with the fewest legal values
Degree heuristic
Choose the variable with the most constraints on remaining variables
Choosing Values
Least constraining value (LCV)
Choose the least constraining value
the one that rules out the fewest values in the remaining variables
Forward Checking
Constraint propagation
Problem Structure
Assume we have a new region T in addition → T and the rest are two independent problems
Each problem is a connected component in the constraint graph
Problem Structure
Tree-structured problem
Theorem
If the constraint graph has no loops, then the CSP can be solved in O(n·d²) time
Problem Structure
Algorithm for tree-structured problems
Value selection by the min-conflicts heuristic: choose the value that violates the fewest constraints,
i.e., hill climbing with h(n) = total number of violated constraints
Example: 4-Queens
State: 4 queens in four columns (4^4 = 256 states). Operators: move a queen within its column. Goal test: no attacks. Evaluation: h(n) = number of attacks.
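A minimal Python sketch of the min-conflicts heuristic on n-queens (one queen per column); the step limit and restart-free formulation are illustrative choices:

```python
import random

# Min-conflicts local search for n-queens.

def attacks(state, col, row):
    # number of other queens attacking the square (col, row)
    return sum(1 for c, r in enumerate(state) if c != col and
               (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n=4, max_steps=1000):
    state = [random.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if attacks(state, c, state[c]) > 0]
        if not conflicted:
            return state                     # solution found
        col = random.choice(conflicted)
        # move the queen in this column to the row with the fewest conflicts
        state[col] = min(range(n), key=lambda r: attacks(state, col, r))
    return None

print(min_conflicts())
```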
Summary
CSPs are a special kind of problem:
states defined by values of a fixed set of variables goal test defined by constraints on variable values
Backtracking = depth-first search with one variable assigned per node Variable ordering and value selection heuristics help significantly Forward checking prevents assignments that guarantee later failure Constraint propagation (e.g., arc consistency) does additional work to constrain values and detect inconsistencies The CSPs representation allows analysis of problem structure Tree-structured CSPs can be solved in linear time Iterative min-conflicts is usually effective in practice
Exercise
Solve the following cryptarithmetic problem by combining the heuristics
Constraint Propagation Minimum Remaining Values Least Constraining Values
Exercise
1. Choose X3: domain {0,1}
2. Choose X3 = 1; constraint propagation gives F ≠ 0, so F = 1
3. Choose X2: X1 and X2 have the same number of remaining values
4. Choose X2 = 0
5. Choose X1: X1 has MRV
6. Choose X1 = 0
7. Choose O: O must be even and less than 5, and therefore has MRV (from X2+T+T = O+10·X3 with X2 = 0, X3 = 1, and O+O = R+10·X1 with X1 = 0)
8. Choose O = 4; then R = 8 and T = 7
9. Choose U: U must be even and less than 9
10. U = 6: constraint propagation gives W = 3
Artificial Intelligence
For HEDSPI Project
Outline
What is Logic Propositional Logic
Syntax Semantic
Knowledge-based Agents
Know about the world
They maintain a collection of facts (sentences) about the world, their Knowledge Base, expressed in some formal language.
What is Logic ?
A logic is a triplet <L,S,R>
L, the language of the logic, is a class of sentences described by a precise syntax, usually a formal grammar. S, the logic's semantics, describes the meaning of elements in L. R, the logic's inference system, consists of derivation rules over L.
Examples of logics:
Propositional, First Order, Higher Order, Temporal, Fuzzy, Modal, Linear,
Propositional Logic
Propositional Logic is about facts in the world that are either true or false, nothing else Propositional variables stand for basic facts Sentences are made of
propositional variables (A, B, ...), logical constants (TRUE, FALSE), and logical connectives (not, and, or, ...)
The meaning of sentences ranges over the Boolean values {True, False}
Examples: "It's sunny", "John is married"
¬, ∧, ∨, ⇒, ⇔
Sentences
Each propositional variable is a sentence. Each logical constant is a sentence. If φ and ψ are sentences then the following are sentences:
(φ), ¬φ, φ ∧ ψ, φ ∨ ψ, φ ⇒ ψ, φ ⇔ ψ
Formal Grammar
Sentence -> Asentence | Csentence
Asentence -> TRUE | FALSE | A | B | ...
Csentence -> (Sentence) | ¬Sentence | Sentence Connective Sentence
Connective -> ∧ | ∨ | ⇒ | ⇔
Validity
A sentence is valid if it is true in every interpretation Ex: ((P or H) and A ) => P is valid P or H is not valid
We write Γ |= φ if and only if every interpretation that makes all sentences in Γ true also makes φ true.
φ is valid iff True |= φ (also written |= φ)
φ is unsatisfiable iff φ |= False
Γ |= φ iff Γ ∪ {¬φ} is unsatisfiable
Forward Chaining
Given a set of rules, i.e., formulae of the form p1 ∧ p2 ∧ ... ∧ pn ⇒ q, and a set of known facts, i.e., formulae of the form q, r, ... When a new fact p is added: find all rules that have p as a premise; if the other premises are already known to hold, then
add the consequent to the set of know facts, and trigger further inferences
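A minimal Python sketch of forward chaining for propositional Horn rules; the rule representation (a pair of premise set and conclusion) and the symbols are illustrative assumptions, not the lecture's example:

```python
# Forward chaining over propositional Horn rules.
# rules: list of (premises, conclusion); facts: initially known atoms.

def forward_chaining(rules, facts, query):
    agenda = list(facts)
    known = set(facts)
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        for premises, conclusion in rules:
            if p in premises and conclusion not in known and premises <= known:
                known.add(conclusion)       # all premises hold: add consequent
                agenda.append(conclusion)   # and trigger further inferences
    return query in known

rules = [({"A", "B"}, "C"), ({"C"}, "D")]
print(forward_chaining(rules, {"A", "B"}, "D"))   # -> True
```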
Forward Chaining
Example
Soundness
Yes
Completeness
Yes
Backward Chaining
Given a set of rules, and a set of known facts We ask whether a fact P is a consequence of the set of rules and the set of known facts The procedure check whether P is in the set of known facts Otherwise find all rules that have P as a consequent
If the premise is a conjunction, then process the conjunction conjunct by conjunct
Backward Chaining
Example
Soundness
Yes
Completeness
Yes
Artificial Intelligence
For HEDSPI Project
FOL
Syntax Semantic Inference
Resolution
FOL Syntax
Symbols
Variables: x, y, z, ...
Constants: a, b, c, ...
Function symbols (with arities): f, g, h, ...
Relation symbols (with arities): p, q, r, ...
Logical connectives: ¬, ∧, ∨, ⇒, ⇔
Quantifiers: ∀, ∃
FOL Syntax
Variables, constants and function symbols are used to build terms
X, Bill, FatherOf(X),
FOL Syntax
Terms
Variables are terms Constants are terms If t1,, tn are terms and f is a function symbol with arity n then f(t1,, tn ) is a term
FOL Syntax
Predicates
If t1,, tn are terms and p is a relation symbol with arity n then p(t1,, tn ) is a predicate
FOL Syntax
Sentences
True, False are sentences. Predicates are sentences. If φ and ψ are sentences then the following are sentences:
∀x. φ, ∃x. φ, (φ), ¬φ, φ ∧ ψ, φ ∨ ψ, φ ⇒ ψ, φ ⇔ ψ
FOL
Syntax Semantic Inference
Resolution
FOL Semantic
Variables
Objects
Constants
Entities
Function symbol
Function from objects to objects
Relation symbol
Relation between objects
Quantifiers
FOL Semantic
Interpretation (D, σ)
D is a set of objects, called the domain or universe
σ is a mapping from variables to D
C^D is a member of D for each constant C
F^D is a mapping from D^n to D for each function symbol F with arity n
R^D is a relation over D^n for each relation symbol R with arity n
FOL Semantic
Given an interpretation (D, σ), the semantics of a term/sentence t is denoted [[t]].
Interpretation of terms
FOL Semantic
Interpretation of sentence
Example
Symbols
Variables: x,y,z, Constants: 0,1,2, Function symbols: +,* Relation symbols: >, =
Semantic
Universe: N (natural numbers) The meaning of symbols Constants: the meaning of 0 is the number zero, Function symbols: the meaning of + is the natural number addition, Relation symbols: the meaning of > is the relation greater than,
FOL Semantic
Satisfiability, Model
A sentence φ is satisfiable if it is true under some interpretation (D, σ).
An interpretation (D, σ) is a model of a sentence φ if φ is true under (D, σ); we then write (D, σ) |= φ.
A sentence is valid if every interpretation is a model of it. A sentence φ is valid in D if (D, σ) |= φ for all σ. A sentence is unsatisfiable if it has no model.
Example
Consider the universe N of natural numbers:
x + 1 > 5 is satisfiable (e.g., with x = 5)
∀x. x + 1 > 0 is valid in N
2x + 1 = 6 is unsatisfiable in N
Artificial Intelligence
For HEDSPI Project
Lecturers : Le Thanh Huong Tran Duc Khanh Dept of Information Systems Faculty of Information Technology - HUT
Inference in FOL
Difficulties
Quantifiers Infinite sets of terms Infinite sets of sentences
Robinson's Resolution
Herbrand's Theorem (~1930)
A set of sentences S is unsatisfiable if and only if there exists a finite subset Sg of the set of all ground instances Gr(S) which is unsatisfiable.
Herbrand showed that there is a procedure to demonstrate the unsatisfiability of an unsatisfiable set of sentences. Robinson proposed the Resolution procedure (1965).
Idea of Resolution
Refutation-based procedure: S |= A if and only if S ∪ {¬A} is unsatisfiable. Resolution procedure:
Transform S ∪ {¬A} into a set of clauses; apply the Resolution rule to find the empty clause (a contradiction). If the empty clause is found
Conclude S |= A
Otherwise
No conclusion
Clause
A clause is a disjunction of literals, i.e., it has the form
  P1 ∨ P2 ∨ ... ∨ Pn
where each Pi is a literal, i.e., an atom Ri or its negation ¬Ri.
Example: P(x) ∨ Q(x, a) ∨ R(b),  P(y) ∨ Q(b, y) ∨ R(y)
The empty clause corresponds to a contradiction. Any sentence can be transformed into an equi-satisfiable set of clauses.
Elements of Resolution
Resolution rule Unification Transform a sentence to a set of clauses
Elements of Resolution
Resolution rule Unification Transform a sentence to a set of clauses
Resolution rule
Resolution rule:
  A ∨ B        ¬C ∨ D
  --------------------    where σ = mgu(B, C)
        (A ∨ D)σ
mgu: most general unifier, the most general assignment of variables to terms such that the two terms become equal (computed by a syntactic unification algorithm).
Example: σ = {x = b, y = a}
Elements of Resolution
Resolution rule Unification Transform a sentence to a set of clauses
Unification
Input
Set of equalities between two terms
Output
Most general assignment of variables that satisfies all equalities Fail if no such assignment exists
Unification algorithm
Vars(U), Vars(t) are sets of variables in U and t v is a variable s and t are terms f and g are function symbols
Example of Unification
Elements of Resolution
Resolution rule Unification Transform a sentence to a set of clauses
Eliminate implication Move negation inward Standardize variable scope Move quantifiers outward Skolemize existential quantifiers Eliminate universal quantifiers Distribute and, or Flatten and, or Eliminate and
Eliminate implication
Distribute and, or
Flatten and, or
Eliminate and
Summary of Resolution
Refutation-based procedure: S |= A if and only if S ∪ {¬A} is unsatisfiable. Resolution procedure:
Transform S ∪ {¬A} into a set of clauses; apply the Resolution rule to find the empty clause (a contradiction). If the empty clause is found
Conclude S |= A
Otherwise
No conclusion
Summary of Resolution
Theorem
A set of clauses S is unsatisfiable if and only if upon the input S, Resolution procedure finds the empty clause (after a finite time).
Exercise
The law says that it is a crime for an American to sell weapons to hostile nations The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American Is West a criminal?
Modeling
Exercise
Jack owns a dog Every dog owner is an animal lover No animal lover kills an animal Either Jack or Curiosity killed the cat, who is named Tuna Did Curiosity kill the cat?
Modeling
∃x. Dog(x) ∧ Owns(Jack, x)
∀x. (∃y. Dog(y) ∧ Owns(x, y)) ⇒ AnimalLover(x)
∀x ∀y. (AnimalLover(x) ∧ Animal(y)) ⇒ ¬Kills(x, y)
Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
Cat(Tuna)
∀x. Cat(x) ⇒ Animal(x)
Artificial Intelligence
For HEDSPI Project
Lecture 12 - Planning
Lecturer : Tran Duc Khanh Dept of Information Systems School of Information and Communication Technology HUST
Outline
Planning problem State-space search Partial-order planning Planning graphs Planning with propositional logic
Planning problem
Planning is the task of determining a sequence of actions that will achieve a goal. Domain independent heuristics and strategies must be based on a domain independent representation
General planning algorithms require a way to represent states, actions and goals STRIPS, ADL, PDDL are languages based on propositional or first-order logic
Classical planning environment: fully observable, deterministic, finite, static and discrete.
Additional complexities
Because the world is
Dynamic Stochastic Partially observable
AI Planning background
Focus on classical planning; assume none of the above Deterministic, static, fully observable
Basic Most of the recent progress Ideas often also useful for more complex problems
Problem Representation
State What is true about the (hypothesized) world? Goal What must be true in the final state of the world? Actions What can be done to change the world? Preconditions and effects Well represent all these as logical predicates
STRIPS operators
Tidily arranged action descriptions, restricted language. Action: Buy(x); Precondition: At(p), Sells(p, x); Effect: Have(x). [Note: this abstracts away many important details!] Restricted language → efficient algorithm. Precondition: conjunction of positive literals. Effect: conjunction of literals. A complete set of STRIPS operators can be translated into a set of successor-state axioms.
[Blocks-world example figure: initial configuration of blocks A, B, C; goal: A on B on C.]
Progression planners: forward state-space search, considering the effect of all possible actions in a given state. Regression planners: backward state-space search, determining what must have been true in the previous state in order to achieve the current state.
10
goal
11
Progression algorithm
Formulation as state-space search problem: Initial state and goal test: obvious Successor function: generate from applicable actions Step cost = each action costs 1 Any complete graph search algorithm is a complete planning algorithm. E.g. A* Inherently inefficient: (1) irrelevant actions lead to very broad search tree (2) good heuristic required for efficient search
12
13
Regression algorithm
How to determine predecessors?
What are the states from which applying a given action leads to the goal?
Goal state = At(C1, B) ∧ At(C2, B) ∧ ... ∧ At(C20, B)
Relevant action for the first conjunct: Unload(C1, p, B). Works only if its preconditions are satisfied.
Previous state = In(C1, p) ∧ At(p, B) ∧ At(C2, B) ∧ ... ∧ At(C20, B)
The subgoal At(C1, B) should not be present in this state.
Actions must not undo desired literals (consistent) Main advantage: only relevant actions are considered.
Often much lower branching factor than forward search.
14
Regression algorithm
General process for predecessor construction Give a goal description G Let A be an action that is relevant and consistent The predecessors are as follows: Any positive effects of A that appear in G are deleted. Each precondition literal of A is added , unless it already appears. Any standard search algorithm can be added to perform the search. Termination when predecessor satisfied by initial state. In FO case, satisfaction might require a substitution.
15
16
17
Subgoal independence: the cost of solving a set of subgoals equals the sum of the costs of solving each one independently.
Can be pessimistic (interacting subplans). Can be optimistic (negative effects).
Simple: number of unsatisfied subgoals. Various ideas related to removing negative effects or positive effects.
19
SNLP: Systematic Nonlinear Planning (McAllester and Rosenblitt 1991) NONLIN (Tate 1977)
20
21
Partial-order planning
Partially ordered collection of steps with
Start step has the initial state description as its effect Finish step has the goal description as its precondition causal links from outcome of one step to precondition of another temporal ordering between pairs of steps
Open condition = precondition of a step not yet causally linked A plan is complete iff every precondition is achieved A precondition is achieved iff it is the effect of an earlier step and no possibly intervening step undoes it
22
Example
Example
Example
Planning process
Operators on partial plans:
add a link from an existing action to an open condition add a step to fulfill an open condition order one step wrt another to remove possible conflicts
Gradually move from incomplete/vague plans to complete, correct plans Backtrack if an open condition is unachievable or if a conflict is unresolvable
26
29
Properties of POP
Nondeterministic algorithm: backtracks at choice points on failure (choice of S_add to achieve S_need; choice of demotion or promotion for a clobberer); selection of S_need is irrevocable. POP is sound, complete, and systematic (no repetition). Extensions for disjunction, universals, negation, conditionals. Can be made efficient with good heuristics derived from the problem description. Particularly good for problems with many loosely related subgoals.
30
31
32
33
34
35
Planning Graphs
A planning graph consists of a sequence of levels that correspond to time steps in the plan; level 0 is the initial state. Each level contains a set of literals and a set of actions. The literals are those that could be true at that time step; the actions are those whose preconditions could be satisfied at that time step. Works only for propositional planning.
36
37
38
Planning graphs are a relaxation of the problem. Representing more than pair-wise mutexes is not cost-effective.
40
41
43
Because literals increase and mutexes decrease, it is guaranteed that we will reach a level where all goals are non-mutex.
44
We search for a model of the formula. Those actions that are assigned true constitute a plan. To have a single plan we may add a mutual exclusion axiom for all actions in the same time slot. We can also choose to allow partial-order plans and only write exclusions between actions that interfere with each other. Planning: iteratively try to find longer and longer plans.
46
SATplan algorithm
47
Complexity of satplan
The total number of action symbols is |T| × |Act| × |O|^p, where O is the number of objects and p is the maximum arity (scope) of atoms. The number of clauses is higher. Example: with 10 time steps, 12 planes and 30 airports, the complete action exclusion axiom has 583 million clauses.
48
Artificial Intelligence
For HEDSPI Project
Training examples
How many training examples are sufficient? How does the size of the training set influence the accuracy of the learned target function? How does noise and/or missing-value data influence the accuracy?
Learning capability
What target function should the system learn?
Representation of the target function: expressiveness vs. complexity
What are the theoretical limits of learnability? How can the system generalize from the training examples?
To avoid the overfitting problem
Artificial Intelligence
For HEDSPI Project
A DT can be represented (interpreted) as a set of IF-THEN rules (i.e., easy to read and understand) Capable of learning disjunctive expressions DT learning is robust to noisy data One of the most widely used methods for inductive inference Successfully applied to a range of real-world applications
2
[Figure: an example decision tree over the attributes "player", "football" and "goal" (each present or absent), classifying instances as Interested or Uninterested.]
[Figure: a decision subtree from the PlayTennis example testing Humidity (High/Normal) and Wind (Strong/Weak), with Yes/No leaves.]
(Outlook=Overcast, Temperature=Hot, Humidity=High, Wind=Weak) → Yes
(Outlook=Rain, Temperature=Mild, Humidity=High, Wind=Strong) → No
(Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) → No
4
10
Entropy
A commonly used measure in the Information Theory field, used to measure the impurity (inhomogeneity) of a set of instances. The entropy of a set S relative to a c-class classification:
  Entropy(S) = - Σ_{i=1..c} p_i·log2(p_i)
where p_i is the proportion of instances in S belonging to class i.
The entropy of a set S relative to a two-class classification: Entropy(S) = -p1·log2(p1) - p2·log2(p2)
Interpretation of entropy (in the Information Theory field): the entropy of S specifies the expected number of bits needed to encode the class of a member randomly drawn out of S. An optimal-length code assigns -log2(p) bits to a message having probability p, so the expected number of bits needed to encode a class is Σ -p·log2(p).
12
S contains 14 instances, where 9 belong to class c1 and 5 to class c2. The entropy of S relative to this two-class classification:
  Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.94
[Figure: entropy as a function of p1, peaking at 1 when p1 = 0.5.]
Entropy = 0 if all the instances belong to the same class (either c1 or c2): 0 bits are needed for encoding (no message need be sent).
Entropy = some value in (0, 1) if the set contains unequal numbers of c1 and c2 instances: on average fewer than 1 bit per message is needed for encoding.
13
Information gain
Information gain of an attribute relative to a set of instances is
the expected reduction in entropy caused by partitioning the instances according to the attribute:
  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v|/|S|)·Entropy(S_v)
where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which A has value v.
In the above formula, the second term is the expected value of the entropy after S is partitioned by the values of attribute A. Interpretation of Gain(S, A): the number of bits saved (reduced) for encoding the class of a randomly drawn member of S, by knowing the value of attribute A.
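The two measures translate directly into code. A minimal Python sketch, assuming examples are given as (attribute-dict, label) pairs, a representation chosen here only for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, attribute):
    labels = [label for _, label in examples]
    gain = entropy(labels)
    for value in set(x[attribute] for x, _ in examples):
        subset = [label for x, label in examples if x[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)   # weighted entropy
    return gain

data = [({"Wind": "Weak"}, "Yes"), ({"Wind": "Weak"}, "Yes"),
        ({"Wind": "Strong"}, "No"), ({"Wind": "Strong"}, "Yes")]
print(round(entropy([label for _, label in data]), 3))    # 0.811
print(round(information_gain(data, "Wind"), 3))           # 0.311
```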
14
[Training data: the 14-example PlayTennis data set, with attributes Outlook, Temperature, Humidity and Wind, and the PlayTennis label: 9 Yes, 5 No.]
[Mitchell, 1997]
So, Outlook is chosen as the test attribute for the root node!
[Figure: the partially grown tree. Outlook=? splits S = {9+, 5-} into S_Sunny = {2+, 3-} (Node1), S_Overcast = {4+, 0-} (a Yes leaf) and S_Rain = {3+, 2-} (Node2).]
[Figure: the Sunny branch is split on Humidity=?, giving S_High = {0+, 3-} (Node3) and S_Normal = {2+, 0-} (Node4).]
18
19
At each step in the search, ID3 uses a statistical measure of all the instances (i.e., information gain) to refine its current hypothesis
The resulting search is much less sensitive to errors in individual training instances
20
[Figure: the final decision tree. Outlook=? with branches Sunny (test Humidity=?: High → No, Normal → Yes), Overcast (→ Yes) and Rain (test Wind=?: Strong → No, Weak → Yes); an alternative subtree using Temperature=? (Hot/Mild/Cool) is also shown.]
22
Issues in DT learning
Over-fitting the training data Handling continuous-valued (i.e., real-valued) attributes Choosing appropriate measures for attribute selection Handling training data with missing attribute values Handling attributes with differing costs An extension of the ID3 algorithm with the above mentioned issues resolved results in the C4.5 algorithm
23
Artificial Intelligence
For HEDSPI Project
Features of RL
Learning from numerical rewards Interaction with the task; sequences of states, actions and rewards Uncertainty and non-deterministic worlds Delayed consequences The explore/exploit dilemma The whole problem of goal-directed learning
Points of view
From the point of view of agents
RL is a process of trial-and-error learning How much reward will I get if I do this action?
Applications of RL
Robot Animal training Scheduling Games Control systems
Examples
Chess
Win: +1, lose: -1
Elevator dispatching
reward based on mean squared time for elevator to arrive (optimization problem)
Reward Function
Defines the goal(s) in an RL problem. Maps from states, state-action pairs, or (state, action, successor state) triplets to a numerical reward. The goal of the agent is to maximise the total reward in the long run; the policy is altered to achieve this goal.
Return: R_t = r_{t+1} + r_{t+2} + ... + r_T, where T is the last time step of the world.
Discounted Return
The geometrically discounted model of return:
  R_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ... + γ^{T-t-1}·r_T,   0 ≤ γ ≤ 1
γ is called the discount rate, used to bound the infinite sum and to favor earlier rewards, in other words to give preference to shorter paths.
Optimal Policies
An RL agent adapts its policy in order to increase return. A policy π1 is at least as good as a policy π2 if its expected return is at least as great in each possible initial state. An optimal policy π* is at least as good as any other policy.
Value Functions
A value function maps each state to an estimate of return under a policy An action-value function maps from stateaction pairs to estimates of return Learning a value function is referred to as the prediction problem or policy evaluation in the Dynamic Programming literature
Q-learning
Learns action-values Q(s,a) rather than statevalues V(s) Action-values learning
Q-learning Algorithm
Algorithm Q {
  For each (s, a), initialize Q(s, a) to zero
  Observe the current state s
  Iterate infinitely {
    Choose and execute an action a
    Get immediate reward r
    Observe the new state s'
    Update Q(s, a):  Q(s, a) ← r + γ·max_{a'} Q(s', a')
    s ← s'
  }
}
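A minimal Python sketch of this update on a toy problem; the corridor environment, the explicit learning rate (alpha = 1.0 reproduces the deterministic update above) and the episode count are illustrative assumptions, not the lecture's example:

```python
import random

# Q-learning on a 1-D corridor: states 0..4, goal at state 4,
# reward 100 on entering the goal, 0 otherwise.

gamma, alpha, episodes = 0.9, 1.0, 200
n_states, goal = 5, 4
Q = {(s, a): 0.0 for s in range(n_states) for a in ("left", "right")}

def step(s, a):
    s2 = max(0, s - 1) if a == "left" else min(n_states - 1, s + 1)
    return s2, (100 if s2 == goal else 0)

for _ in range(episodes):
    s = random.randrange(n_states - 1)                    # start anywhere but the goal
    while s != goal:
        a = random.choice(("left", "right"))              # random exploration policy
        s2, r = step(s, a)
        best_next = max(Q[(s2, b)] for b in ("left", "right"))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])   # Q-update
        s = s2

print(round(Q[(3, "right")], 1), round(Q[(2, "right")], 1))   # ~100.0, ~90.0
```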
Example (γ = 0.9): a small grid world with goal cell G; entering G yields reward 100, all other moves yield reward 0.
[Figures: all Q-values are initialized to zero; moving right from s1 to s2 yields reward 0; moving right from s2 into G yields reward 100, so Q(s2, right) ← 100; updating s1 then gives Q(s1, right) ← 0 + 0.9·100 = 90; after further episodes the Q-values settle to 100, 90, 81, 72, ... as the distance to G increases.]
Exercise
The agent is in room C of a building; the goal is to get out of the building.
[Figure: rooms A-F with doors between them; actions that lead directly outside receive reward 100, all others 0.]
Result (γ = 0.8):
[Figure: the learned Q-values, including 500, 400, 320 and 255 depending on the distance to the exit; the figure notes "divide all rewards by 5".]
Artificial Intelligence
For HEDSPI Project
Neuron
Has an input/output (I/O) characteristic Implements a local computation
Applications of ANNs
Image processing and computer vision
E.g., image matching, preprocessing, segmentation and analysis, computer vision, image compression, stereo vision, and processing and understanding of time-varying images
Signal processing
E.g., seismic signal analysis and morphology
Pattern recognition
E.g., feature extraction, radar signal classification and analysis, speech recognition and understanding, fingerprint identification, character recognition, face recognition, and handwriting analysis
Medicine
E.g., electrocardiographic signal analysis and understanding, diagnosis of various diseases, and medical image processing
4
Applications of ANNs
Military systems
E.g., undersea mine detection, radar clutter classification, and tactical speaker recognition
Financial systems
E.g., stock market analysis, real estate appraisal, credit card authorization, and securities trading
Power systems
E.g., system state estimation, transient detection and classification, fault detection and recovery, load forecasting, and security assessment
...
5
The bias w0 (with the input x0 = 1). The net input is an integration function of the inputs, Net(w, x). The activation (transfer) function computes the output of the neuron, f(Net(w, x)). Output of the neuron: Out = f(Net(w, x)).
  Net = w0 + w1·x1 + w2·x2 + ... + wm·xm = w0·1 + Σ_{i=1..m} wi·xi = Σ_{i=0..m} wi·xi
(for a single input: Net = w1·x1 + w0·1)
7
[Figures: common activation functions. The binary and bipolar hard-limiters output 1 or 0/-1 depending on the sign of Net; the ramp function rises linearly between its limits; the sigmoid function Out = 1/(1 + e^(-λ(Net+θ))) (λ > 0) rises smoothly from 0 through 0.5 to 1 as Net increases; a tanh-like variant is also shown.]
Network structure
Topology of an ANN is composed by:
The number of input signals and output signals The number of layers The number of neurons in each layer The number of weights in each neuron The way the weights are linked together within or between the layer(s) Which neurons receive the (error) correction signals input hidden layer output layer output An ANN with one hidden layer Input space: 3-dimensional Output space: 2-dimensional In total, there are 6 neurons - 4 in the hidden layer - 2 in the output layer bias
Network structure
A layer is a group of neurons A hidden layer is any layer between the input and the output layers Hidden nodes do not directly interact with the external environment An ANN is said to be fully connected if every output from one layer is connected to every node in the next layer An ANN is called feed-forward network if no node output is an input to a node in the same layer or in a preceding layer When node outputs can be directed back as inputs to a node in the same (or a preceding) layer, it is a feedback network
If the feedback is directed back as input to the nodes in the same layer, then it is called lateral feedback
Feedback networks that have closed loops are called recurrent networks
13
14
Learning rules
Two kinds of learning in neural networks
Parameter learning
Focus on the update of the connecting weights in an ANN
Structure learning
Focus on the change of the network structure, including the number of processing elements and their connection types
These two kinds of learning can be performed simultaneously or separately Most of the existing learning rules are the type of parameter learning We focus the parameter learning
15
xj
...
xm
Perceptron
A perceptron is the simplest type of ANN. It uses the hard-limit activation function:
  Out = sign(Net(w, x)) = sign( Σ_{j=0..m} wj·xj )
[Figure: inputs x0 = 1, x1, x2, ..., xm with weights w0, w1, w2, ..., wm feed a single neuron producing Out.]
17
Perceptron Illustration
[Figure: the decision hyperplane w0 + w1·x1 + w2·x2 = 0 in the (x1, x2) plane; the perceptron outputs 1 on one side and -1 on the other.]
18
Perceptron Learning
Given a training set D= {(x,d)}
x is the input vector d is the desired output value (i.e., -1 or 1)
The perceptron learning is to determine a weight vector that makes the perceptron produce the correct output (-1 or 1) for every training instance If a training instance x is correctly classified, then no update is needed If d=1 but the perceptron outputs -1, then the weight w should be updated so that Net(w,x) is increased If d=-1 but the perceptron outputs 1, then the weight w should be updated so that Net(w,x) is decreased
19
Perceptron_incremental(D, η)
  Initialize w (wi ← an initial (small) random value)
  do
    for each training instance (x, d) ∈ D
      Compute the real output value Out
      if (Out ≠ d) then w ← w + η(d - Out)x
    end for
  until all the training instances in D are correctly classified
  return w
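A minimal Python sketch of the incremental rule above on a linearly separable problem (logical AND with labels in {-1, +1}); eta and the data set are illustrative:

```python
# Incremental perceptron learning for the AND function.

def sign(x):
    return 1 if x >= 0 else -1

def train_perceptron(data, eta=0.1, max_epochs=100):
    w = [0.0, 0.0, 0.0]                       # w[0] is the bias weight (x0 = 1)
    for _ in range(max_epochs):
        errors = 0
        for x, d in data:
            xs = [1.0] + list(x)              # prepend the bias input
            out = sign(sum(wi * xi for wi, xi in zip(w, xs)))
            if out != d:
                w = [wi + eta * (d - out) * xi for wi, xi in zip(w, xs)]
                errors += 1
        if errors == 0:                       # all instances correctly classified
            return w
    return w

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data))
```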
20
Perceptron_batch(D, η)
  Initialize w (wi ← an initial (small) random value)
  do
    Δw ← 0
    for each training instance (x, d) ∈ D
      Compute the real output value Out
      if (Out ≠ d) then Δw ← Δw + η(d - Out)x
    end for
    w ← w + Δw
  until all the training instances in D are correctly classified
  return w
21
Perceptron - Limitation
The perceptron learning procedure is proven to converge if:
the training instances are linearly separable
a sufficiently small η is used
[Figure: a training set that is not linearly separable; a perceptron cannot correctly classify this training set!]
The perceptron may not converge if the training instances are not linearly separable. We then need to use the delta rule, which converges toward a best-fit approximation of the target function. The delta rule uses gradient descent to search the hypothesis space (of possible weight vectors) to find the weight vector that best fits the training instances.
22
The training error made by the currently estimated weight vector w on a training instance x:
  E_x(w) = (1/2)·Σ_i (d_i - Out_i)²
The training error made by the currently estimated weight vector w over the entire training set D:
  E_D(w) = (1/|D|)·Σ_{x ∈ D} E_x(w)
Gradient descent
Gradient of E (denoted ∇E) is a vector: its direction points most uphill and its length is proportional to the steepness of the hill. The gradient of E specifies the direction that produces the steepest increase in E:
  ∇E(w) = [ ∂E/∂w1, ∂E/∂w2, ..., ∂E/∂wN ]
where N is the number of weights in the network (i.e., N is the length of w).
Hence, the direction that produces the steepest decrease is the negative of the gradient of E:
  Δw = -η·∇E(w);   Δwi = -η·∂E/∂wi, i = 1..N
Requirement: The activation functions used in the network must be continuous functions of the weights, differentiable everywhere
24
25
Gradient_descent_incremental(D, η)
  Initialize w (wi ← an initial (small) random value)
  do
    for each training instance (x, d) ∈ D
      Compute the network output
      for each weight component wi
        wi ← wi - η·(∂E_x/∂wi)
      end for
    end for
  until (stopping criterion satisfied)
  return w
Stopping criterion: number of iterations (epochs), error threshold, etc.
26
29
[Figure: a multi-layer network; hidden neuron z_q receives inputs x_j through weights w_qj and sends its output Out_q to output neuron y_i through weight w_iq.]
30
Given an input vector x, a neuron z_q in the hidden layer receives a net input of Net_q = Σ_{j=1..m} w_qj·x_j and produces the output Out_q = f(Net_q).
32
Before discussing the error signals and their back-propagation, we first define an error (cost) function:
  E(w) = (1/2)·Σ_{i=1..n} (d_i - Out_i)² = (1/2)·Σ_{i=1..n} [d_i - f(Net_i)]² = (1/2)·Σ_{i=1..n} [d_i - f( Σ_{q=1..l} w_iq·Out_q )]²
33
where Net_i is the net input to neuron y_i in the output layer, and f'(Net_i) = ∂f(Net_i)/∂Net_i
34
From the equation of the error function E(w), it is clear that each error term (d_i - Out_i) (i = 1..n) is a function of Out_q:
  E(w) = (1/2)·Σ_{i=1..n} [d_i - f( Σ_{q=1..l} w_iq·Out_q )]²
where Net_q is the net input to neuron z_q in the hidden layer, and f'(Net_q) = ∂f(Net_q)/∂Net_q
36
The important feature of the BP algorithm: the weights update rule is local
To compute the weight change for a given connection, we need only the quantities available at both ends of that connection!
37
38
Back_propagation_incremental(D, η)
Consider a network with Q feed-forward layers, q = 1, 2, ..., Q. Let qNet_i and qOut_i be the net input and output of the i-th neuron in the q-th layer, and qw_ij the weight of the connection from the j-th neuron in layer (q-1) to the i-th neuron in layer q.
Step 0 (Initialization): choose E_threshold (a tolerable error); initialize the weights to small random values; set E = 0.
Step 1 (Training loop): apply the input vector of the k-th training instance to the input layer (q = 1): qOut_i = 1Out_i = x_i^(k) for all i.
Step 2 (Forward propagation): propagate the signal forward through the network, qOut_i = f(qNet_i), until the network outputs QOut_i (in the output layer) have all been obtained.
Step 3 (Output error measure): compute the error and the error signals Qδ_i for every neuron in the output layer:
  E = E + (1/2)·Σ_{i=1..n} (d_i^(k) - QOut_i)²
Step 4 (Error back-propagation): propagate the error backward to update the weights and compute the error signals (q-1)δ_i for the preceding layers:
  Δqw_ij = η·(qδ_i)·((q-1)Out_j);   qw_ij = qw_ij + Δqw_ij
Step 5: check whether the entire training set has been exploited (i.e., one epoch). If the entire training set has been exploited, then go to step 6; otherwise, go to step 1.
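For concreteness, a minimal Python sketch of incremental backpropagation on a small network; the 2-3-1 architecture, the XOR data set, eta and the epoch count are assumptions chosen for illustration, not the lecture's example:

```python
import math, random

# A 2-3-1 sigmoid network trained on XOR with per-example (incremental) updates.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W1, W2, x1, x2):
    xs = [x1, x2, 1.0]                                                  # inputs + bias
    h = [sigmoid(sum(w * x for w, x in zip(row, xs))) for row in W1]    # hidden outputs
    out = sigmoid(sum(w * x for w, x in zip(W2, h + [1.0])))            # network output
    return xs, h, out

random.seed(1)
n_hidden, eta = 3, 0.5
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
W2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for _ in range(20000):
    for (x1, x2), d in data:
        xs, h, out = forward(W1, W2, x1, x2)
        delta_out = (d - out) * out * (1 - out)                         # output error signal
        delta_h = [h[i] * (1 - h[i]) * W2[i] * delta_out
                   for i in range(n_hidden)]                            # back-propagated signals
        W2 = [w + eta * delta_out * x for w, x in zip(W2, h + [1.0])]   # update output weights
        W1 = [[w + eta * delta_h[i] * x for w, x in zip(W1[i], xs)]
              for i in range(n_hidden)]                                 # update hidden weights

print([round(forward(W1, W2, a, b)[2], 2) for (a, b), _ in data])       # should approach [0, 1, 1, 0]
```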
[Worked example (figures): a small feed-forward network with inputs x1, x2, first-layer neurons f(Net1), f(Net2), f(Net3), second-layer neurons f(Net4), f(Net5) and an output neuron f(Net6) producing Out6.
Forward pass: Out1 = f(w1x1·x1 + w1x2·x2), Out2 = f(w2x1·x1 + w2x2·x2), Out3 = f(w3x1·x1 + w3x2·x2); Out4 = f(w41·Out1 + w42·Out2 + w43·Out3), Out5 = f(w51·Out1 + w52·Out2 + w53·Out3); Out6 = f(w64·Out4 + w65·Out5).
Backward pass: the output error signal δ6 is computed first; then δ4 = f'(Net4)·(w64·δ6) and δ5 = f'(Net5)·(w65·δ6); then δ1 = f'(Net1)·(w41·δ4 + w51·δ5), δ2 = f'(Net2)·(w42·δ4 + w52·δ5) and δ3 = f'(Net3)·(w43·δ4 + w53·δ5).
Weight updates: w1x1 = w1x1 + η·δ1·x1 and w1x2 = w1x2 + η·δ1·x2, and similarly for neurons 2 and 3 with δ2 and δ3; w41 = w41 + η·δ4·Out1, ..., w53 = w53 + η·δ5·Out3; w64 = w64 + η·δ6·Out4 and w65 = w65 + η·δ6·Out5.]
Disadvantages
No clear rules or design guidelines for arbitrary applications No general way to assess the internal operation of the network (therefore, an ANN system is seen as a black-box) Difficult to predict future network performance (generalization)
60
61