
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Textbook
Areas of AI and some dependencies

[Diagram: core techniques Search, Logic and Knowledge Representation support Planning and Machine Learning, which in turn support NLP, Vision, Robotics and Expert Systems.]
What is Artificial Intelligence ?

 Making computers that think?

 The automation of activities we associate with human


thinking, like decision making, learning ... ?

 The art of creating machines that perform functions that


require intelligence when performed by people ?
What is Artificial Intelligence ?

 The study of computations that make it possible to perceive,


reason and act ?

 A field of study that seeks to explain and emulate intelligent


behaviour in terms of computational processes ?

 A branch of computer science that is concerned with the


automation of intelligent behaviour ?
What is Artificial Intelligence ?

                 HUMAN                            RATIONAL
THOUGHT     Systems that think like humans   Systems that think rationally
BEHAVIOUR   Systems that act like humans     Systems that act rationally
Systems that act like humans:

 “The art of creating machines that perform


functions that require intelligence when performed
by people.” (Kurzweil)

 “The study of how to make computers do things at


which, at the moment, people are better.” (Rich
and Knight)
Systems that act like humans:

The Turing Test:
 You enter a room which has a computer terminal. You
have a fixed period of time to type what you want into
the terminal, and study the replies. At the other end of
the line is either a human being or a computer
system.

 If it is a computer system, and at the end of the


period you cannot reliably determine whether it is a
system or a human, then the system is deemed to be
intelligent.
What is Artificial Intelligence ?

                 HUMAN                            RATIONAL
THOUGHT     Systems that think like humans   Systems that think rationally
BEHAVIOUR   Systems that act like humans     Systems that act rationally
Systems that think like humans:
cognitive modeling
 How do we know how humans think?
 Introspection vs. psychological experiments

 “The exciting new effort to make computers think …


machines with minds in the full and literal sense”
(Haugeland)

 “[The automation of] activities that we associate with


human thinking, activities such as decision-making,
problem solving, learning …” (Bellman)
What is Artificial Intelligence ?

                 HUMAN                            RATIONAL
THOUGHT     Systems that think like humans   Systems that think rationally
BEHAVIOUR   Systems that act like humans     Systems that act rationally
Systems that think ‘rationally’
"laws of thought"
 Humans are not always ‘rational’

 Rational - defined in terms of logic?


 Logic can’t express everything (e.g. uncertainty)

 “The study of the computations that make it possible


to perceive, reason, and act” (Winston)
Systems that act rationally:
“Rational agent”
 Rational behavior: doing the right thing
 The right thing: that which is expected to maximize
goal achievement, given the available information

 Giving answers to questions is ‘acting’.


 I don't care whether a system:
 replicates human thought processes
 makes the same decisions as humans
 uses purely logical reasoning
Real AI
 General-purpose AI like the robots of science
fiction is incredibly hard
 Human brain appears to have lots of special and
general functions, integrated in some amazing
way that we really do not understand at all

 Special-purpose AI is more doable


 E.g., chess/poker playing programs, logistics
planning, automated translation, voice recognition,
web search, data mining, medical diagnosis.
Natural language processing
 Speech technologies (e.g. Siri)
 Automatic speech recognition (ASR)
 Text-to-speech synthesis (TTS)
 Dialog systems

 Language processing technologies


 Question answering
 Machine translation

 Web search
 Text classification, spam filtering, etc…
Computer vision

 Object and face recognition


 Scene segmentation
 Image classification
Robotics
 Robotics
 Part mech. eng.
 Part AI
 Reality much
harder than
simulations!

 Technologies
 Vehicles
 Rescue
 Soccer!
 Lots of automation…
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Content of previous lecture

                 HUMAN                            RATIONAL
THOUGHT     Systems that think like humans   Systems that think rationally
BEHAVIOUR   Systems that act like humans     Systems that act rationally

2
Agents

 Agent – perceives the environment through sensors and


acts on it through actuators
 Percept – agent’s perceptual input (the basis for its
actions)
 Percept sequence – complete history of what has been
perceived.
What do you mean, sensors/percepts
and effectors/actions?

 Humans
– Sensors: Eyes (vision), ears (hearing), skin (touch), tongue, nose,
neuromuscular system
– Percepts:
• At the lowest level – electrical signals from these sensors
• After preprocessing – objects in the visual field (location, textures,
colors, ...), auditory streams (pitch, loudness, direction), ...
– Effectors: limbs, digits, eyes, tongue, ...
– Actions: lift a finger, turn left, walk, run, carry an object, ...
Agent function

 Agent function – maps a given percept sequence


into an action; describes what the agent does.
 Externally – Table of actions
 Internally – Agent program
How do you design an intelligent
agent?

 Definition: An intelligent agent perceives its environment via sensors and


acts rationally upon that environment with its effectors.
 A discrete agent receives percepts one at a time, and maps this percept
sequence to a sequence of discrete actions.
 Properties
– Autonomous
– Reactive to the environment
– Pro-active (goal-directed)
– Interacts with other agents via the environment
A more specific example:
Automated taxi driving system

 Percepts: Video, sonar, speedometer, odometer, engine sensors,


keyboard input, microphone, GPS, ...
 Actions: Steer, accelerate, brake, horn, speak/display, ...
 Goals: Maintain safety, reach destination, maximize profits (fuel, tire
wear), obey laws, provide passenger comfort, ...
 Environment: U.S. urban streets, freeways, traffic, pedestrians,
weather, customers, ...
Vacuum cleaner world

 Percepts: which square (A or B); dirt?


 Actions: move right, move left, suck, do nothing
 Agent function: maps percept sequence into actions
 Agent program: function’s implementation
 How should the program act?
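One way the program could act is as a simple reflex agent. A minimal sketch in Python follows; the function name and action strings are illustrative, not from the slides:

    def reflex_vacuum_agent(percept):
        """Agent program for the two-square vacuum world.

        percept: (location, status), e.g. ('A', 'Dirty').
        Returns one of: 'Suck', 'Right', 'Left', 'NoOp'.
        """
        location, status = percept
        if status == 'Dirty':
            return 'Suck'          # always clean the current square first
        if location == 'A':
            return 'Right'         # square A is clean, move to B
        if location == 'B':
            return 'Left'          # square B is clean, move to A
        return 'NoOp'

    # Externally the agent function could be given as a table of actions;
    # internally this small program implements the same mapping.
    print(reflex_vacuum_agent(('A', 'Dirty')))   # -> 'Suck'
    print(reflex_vacuum_agent(('A', 'Clean')))   # -> 'Right'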
Rational agent – does the right thing
What does that mean? One that behaves as well as possible given the
environment in which it acts. How should success be measured?

 Performance measure
 Embodies criterion for success
• Amount of dirt cleaned?
• Cleaned floors?
 Generally defined in terms of desired effect on environment (not
on actions of agent)
 Defining measure not always easy!
Rationality depends on:
1. Performance measure that defines criterion for
success.
2. Agent’s prior knowledge of the environment.
3. Actions the agent can perform.
4. Agent’s percept sequence to date.

For each possible percept sequence, a rational agent should select an


action that is expected to maximize its performance measure,
given the evidence provided by the percept sequence and
whatever built-in knowledge the agent has.
Autonomy
o A system is autonomous to the extent that its own behavior
is determined by its own experience.
o Therefore, a system is not autonomous if it is guided by its
designer according to a priori decisions.

o To survive, agents must have:


– Enough built-in knowledge to survive.
– The ability to learn.
Task environment
 The “problems” for which rational agents are the “solutions”
PEAS description of task environment
 Performance Measure

 Environment

 Actuators (actions)

 Sensors (what can be perceived)


Properties of task environments
(affect appropriate agent design)
 Single Agent vs Multi-agent
 Single Agent – crossword puzzle
 Multi-agent – chess, taxi driving? (are other drivers best described as
maximizing a performance element?)
 Multi-agent means other agents may be competitive or cooperative, and
may require communication between agents

 Deterministic vs Stochastic
 Deterministic – next state completely determined by current state and
action
 Uncertainty may arise because of defective actions or partially
observable state (i.e., agent might not see everything that affects the
outcome of an action).
Properties of task environments
(affect appropriate agent design)
 Episodic vs Sequential
 Episodic – the agent’s experience is divided into atomic episodes
 Next episode not dependent on actions taken in previous episode. e.g.,
assembly line
 Sequential – current action may affect future actions. e.g., playing chess,
taxi
 must think ahead in choosing an action

 Static vs Dynamic
 does environment change while agent is deliberating?
 Static – crossword puzzle
 Dynamic – taxi driver
Properties of task environments
(affect appropriate agent design)
 Discrete vs Continuous.
 the state of the environment (chess has finite number of discrete
states)
 the way time is handled (taxi driving continuous – speed and
location of taxi sweep through range of continuous values)
 percepts and actions (taxi driving continuous – steering angles)
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Content of previous lecture

 Simple agent: direct mapping from state to actions


 Goal-based agent: can succeed by considering future
actions and the desirability of their outcomes

2
Summary for today’s lecture

 Simple agent: direct mapping from state to actions


 Goal-based agent: can succeed by considering future
actions and the desirability of their outcomes
 Problem solving agent: decides what to do by finding
sequences of actions that lead to desirable states.
 Uninformed search: strategies that are given no information
about the problem except its definition.
3
Search
 Search: process of looking for such a
sequence
 What choices are we searching through?
 Problem solving
Action combinations (move 1, then move 3,
then move 2...)
 Natural language
Ways to map words to parts of speech
 Computer vision
Ways to map features to object model
Assumptions

 Static or dynamic?

Environment is static
Assumptions

 Static or dynamic?
 Fully or partially observable?

Environment is fully observable


Assumptions

 Static or dynamic?
 Fully or partially observable?

 Discrete or continuous?

Environment is discrete
Assumptions

 Static or dynamic?
 Fully or partially observable?

 Discrete or continuous?

 Deterministic or stochastic?

Environment is deterministic
Assumptions

 Static or dynamic?
 Fully or partially observable?

 Discrete or continuous?

 Deterministic or stochastic?

 Episodic or sequential?

Environment is sequential
Assumptions

 Static or dynamic?
 Fully or partially observable?

 Discrete or continuous?

 Deterministic or stochastic?

 Episodic or sequential?

 Single agent or multiple agents?
Search example
Formulate goal: Be in
Bucharest.

Formulate problem: states


are cities, operators drive
between pairs of cities

Find solution: Find a


sequence of cities (e.g.,
Arad, Sibiu, Fagaras,
Bucharest) that leads from
the current state to a state
meeting the goal condition
Search space definitions
 State
 A description of a possible state of the world
 Initial state
 Agent starts the search
 Goal test
 Conditions the agent is trying to meet
 Goal state
 Any state which meets the goal condition
 Action
 Function that maps (transitions) from one state to
another
Search space definitions
 Problem formulation
 Describe a general problem as a search problem
 Solution
 Sequence of actions that transitions the world from the initial
state to a goal state
 Solution cost (additive)
 Sum of the cost of operators
 Alternative: sum of distances, number of steps, etc.
 Search
 Process of looking for a solution
 Search algorithm takes problem as input and returns solution
 We are searching through a space of possible states
 Execution
 Process of executing sequence of actions
Problem formulation

A search problem is defined by the

1. Initial state (e.g., Arad)


2. Operators (e.g., Arad -> Zerind, Arad -> Sibiu,
etc.)
3. Goal test (e.g., at Bucharest)
4. Solution cost (e.g., path cost)
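As a hedged sketch of this formulation in Python (city names and step costs follow the usual Romania map from the textbook; treat the exact numbers and names as illustrative):

    # A search problem as (initial state, operators, goal test, path cost).
    romania = {                   # operators: drive between neighbouring cities
        'Arad':    [('Zerind', 75), ('Sibiu', 140), ('Timisoara', 118)],
        'Zerind':  [('Arad', 75), ('Oradea', 71)],
        'Sibiu':   [('Arad', 140), ('Fagaras', 99), ('Rimnicu Vilcea', 80), ('Oradea', 151)],
        'Fagaras': [('Sibiu', 99), ('Bucharest', 211)],
        # ... remaining cities omitted
    }

    initial_state = 'Arad'

    def goal_test(state):
        return state == 'Bucharest'            # goal test: at Bucharest

    def successors(state):
        """Operators applicable in `state` and the states they lead to."""
        return romania.get(state, [])

    def path_cost(path):
        """Solution cost: sum of the step costs along the path."""
        return sum(cost for _, cost in path)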
Example problems
 Toy problem: intended to illustrate or exercise various
problem-solving methods, and to compare the performance of
algorithms.

 Real-world problem: one whose solutions people actually care
about.
Example problems – Eight Puzzle
States: tile locations

Initial state: one specific tile configuration

Operators: move blank tile left, right, up,


or down

Goal: tiles are numbered from one to


eight around the square

Path cost: cost of 1 per move (solution


cost same as number of moves, i.e., path
length)

Eight puzzle applet


Example problems – eight
queens
States: locations of 8 queens on chess
board

Initial state: one specific queen


configuration

Operators: move queen x to row y and


column z

Goal: no queen can attack another


(cannot be in same row, column, or
diagonal)

Path cost: 0 per move

Eight queens applet


Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Content of previous lecture

Formulate goal: Be in
Bucharest.

Formulate problem: states


are cities, operators drive
between pairs of cities

Find solution: Find a


sequence of cities (e.g.,
Arad, Sibiu, Fagaras,
Bucharest) that leads from
the current state to a state
meeting the goal condition

2
Searching for a solution: Key concepts in
search
 Set of states
 Including an initial state and goal states
 For every state, a set of actions
 Each action results in a new state
 Typically defined by successor function
 Cost function that determines the cost of each
action (path, sequence of actions)
 Solution: path from initial state to a goal state
 Optimal solution: solution with minimal cost
Measuring problem-solving
performance
 Completeness: is the algorithm guaranteed to
find a solution ?
 Optimality: does the strategy find optimal solution
?
 Time complexity: how long does it take to find a
solution?
 Space complexity: how much memory is needed
to perform the search?

4
8-puzzle

[Figure: a start state (1 2 _, 4 5 3, 7 8 6) next to the goal
state (1 2 3, 4 5 6, 7 8 _).]

8-puzzle

[Figure: the start state (1 2 _, 4 5 3, 7 8 6) and the successor
states obtained by moving the blank, continuing downward in the
search tree.]
Recall: State-space formulation
 Intelligent agents: problem solving as search

 Search consists of
 state space
 operators
 start state
 goal states

 A Search Tree is an effective way to represent the search process

 There are a variety of search algorithms, including


 Depth-First Search
 Breadth-First Search

8
Uninformed search strategies
 Uninformed: While searching we have no clue
whether one non-goal state is better than any
other. The search is blind.

 Various blind strategies:


 Breadth-first search
 Uniform-cost search
 Depth-first search
 Iterative deepening search

9
Breadth-first search
 Expand shallowest unexpanded node
 Fringe: nodes waiting in a queue to be explored, also
called OPEN
 Implementation:
 fringe is a first-in-first-out (FIFO) queue, i.e., new
successors go at end of the queue.

Is A a goal state?

10
Breadth-first search

 Expand shallowest unexpanded node


 Implementation:
 fringe is a FIFO queue, i.e., new successors
go at end

Expand:
fringe = [B,C]

Is B a goal state?

11
Breadth-first search

 Expand shallowest unexpanded node


 Implementation:
 fringe is a FIFO queue, i.e., new successors go at
end

Expand:
fringe=[C,D,E]

Is C a goal state?

12
Breadth-first search

 Expand shallowest unexpanded node


 Implementation:
 fringe is a FIFO queue, i.e., new successors go at
end

Expand:
fringe=[D,E,F,G]

Is D a goal state?

13
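The expansion steps above can be written as a short breadth-first search in Python; this is a minimal sketch assuming the same FIFO-fringe idea, and the small example tree is illustrative:

    from collections import deque

    def breadth_first_search(start, goal_test, successors):
        """Expand the shallowest unexpanded node first (FIFO fringe)."""
        fringe = deque([[start]])         # queue of paths, also called OPEN
        explored = set()
        while fringe:
            path = fringe.popleft()       # shallowest node leaves the queue first
            node = path[-1]
            if goal_test(node):
                return path               # solution: path from start to goal
            if node in explored:
                continue
            explored.add(node)
            for child in successors(node):
                fringe.append(path + [child])   # new successors go at the end
        return None                       # no solution found

    # Example on a tiny tree: A -> B, C ; B -> D, E
    tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': [], 'D': [], 'E': []}
    print(breadth_first_search('A', lambda n: n == 'E', lambda n: tree[n]))
    # -> ['A', 'B', 'E']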
Example: map navigation

A B C

S G

D E F

S = start, G = goal, other nodes = intermediate states, links = legal transitions

14
What is the complexity of Breadth-first
search?
• Time Complexity
– assume (worst case) that there is 1 goal leaf at the
right-hand side of the tree, at depth d
– so BFS will expand all nodes:
1 + b + b^2 + ... + b^d = O(b^d)

• Space Complexity
– how many nodes can be in the queue (worst case)?
– at depth d there are b^d unexpanded nodes in the queue: O(b^d)
– counting generated nodes, time and space are O(b^(d+1))
Examples of time and memory requirements for
the Breadth-first search

Depth of    Nodes
solution    expanded    Time             Memory
   0              1     1 millisecond    100 bytes
   2            111     0.1 seconds      11 kilobytes
   4         11,111     11 seconds       1 megabyte
   8           10^8     31 hours         11 gigabytes
  12          10^12     35 years         111 terabytes

Assuming b=10, 1000 nodes/sec, 100 bytes/node
Depth-first search
1. Put the start node s on OPEN
2. If OPEN is empty exit with failure.
3. Remove the first node n from OPEN and place it on
CLOSED.
4. If n is a goal node, exit successfully with the solution
obtained by tracing back pointers from n to s.
5. Otherwise, expand n, generating all its successors;
attach to them pointers back to n, and put them at the
top of OPEN in some order.
6. Go to step 2.
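A hedged Python rendering of the numbered procedure above, keeping the OPEN/CLOSED lists and back pointers (here stored in a dictionary named parent, an illustrative choice):

    def depth_first_search(start, goal_test, successors):
        """Steps 1-6 above: OPEN as a stack (LIFO), CLOSED as the expanded set."""
        open_list = [start]                     # step 1: put the start node on OPEN
        closed = set()
        parent = {start: None}                  # back pointers for tracing the solution
        while open_list:                        # step 2: if OPEN is empty, fail
            n = open_list.pop(0)                # step 3: remove the first node from OPEN
            closed.add(n)                       #         and place it on CLOSED
            if goal_test(n):                    # step 4: trace back pointers from n to start
                path = []
                while n is not None:
                    path.append(n)
                    n = parent[n]
                return list(reversed(path))
            children = [c for c in successors(n)            # step 5: expand n
                        if c not in closed and c not in open_list]
            for c in children:
                parent[c] = n                   # attach back pointers to n
            open_list = children + open_list    # put successors at the top of OPEN
        return None                             # step 2: exit with failure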

17
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = Last In First Out (LIFO) queue, i.e., put
successors at front

Is A a goal state?

18
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[B,C]

Is B a goal state?

19
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[D,E,C]

Is D = goal state?

20
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[H,I,E,C]

Is H = goal state?

21
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[I,E,C]

Is I = goal state?

22
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[E,C]

Is E = goal state?

23
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[J,K,C]

Is J = goal state?

24
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[K,C]

Is K = goal state?

25
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[C]

Is C = goal state?

26
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[F,G]

Is F = goal state?

27
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[L,M,G]

Is L = goal state?

28
Depth-first search
 Expand deepest unexpanded node
 Implementation:
 fringe = LIFO queue, i.e., put successors at front

queue=[M,G]

Is M = goal state?

29
What is the complexity of Depth-first search?

 Time Complexity
 assume (worst case) that there is 1 goal leaf at the
right-hand side of the tree
 so DFS will expand all nodes (m is the cutoff, i.e.,
the maximum depth):
1 + b + b^2 + ... + b^m = O(b^m)

 Space Complexity
 how many nodes can be in the queue (worst case)?
 at each depth l < m we keep b - 1 nodes
 at depth m we keep b nodes
 total = (m - 1)*(b - 1) + b = O(bm)
Comparing DFS and BFS
 Same worst-case time Complexity, but
 In the worst-case BFS is always better than DFS
 Sometimes, on average, DFS is better if:
• many goals, no loops and no infinite paths
 In general
• BFS is better if goal is not deep, if infinite paths, if many
loops, if small search space
• DFS is better if many goals, not many loops,
• DFS is much better in terms of memory

31
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Content of previous lecture
 Same worst-case time Complexity, but
 In the worst-case BFS is always better than DFS
 Sometimes, on average, DFS is better if:
• many goals, no loops and no infinite paths
 In general
• BFS is better if goal is not deep, if infinite paths, if many
loops, if small search space
• DFS is better if many goals, not many loops,
• DFS is much better in terms of memory

2
Informed search and exploration

3
Best-first search
 Idea: use an evaluation function f(n) for each node
 Estimate of "desirability“
 Expand most desirable unexpanded node
 Choose node which appears best

 Implementation:
 use a data structure that maintains the frontier in a decreasing order of
desirability

 Special cases: uniform-cost (Dijkstra’s algorithm), greedy


search, A*search

 A key component is a heuristic function h(n):


 h(n) = estimated cost of the cheapest path from node n to a goal node
 h(n) = 0 if n is the goal
 h(n) could be general or problem-specific
Romania with step costs in Km
 hSLD = straight-line
distance heuristic
 hSLD cannot be
computed from the
problem description
itself
 In greedy best-first
search f(n)=h(n)
 Expand node that is
closest to goal

5
Greedy best-first search:
Example
Greedy best-first search:
Example
Greedy best-first search:
Example
Greedy best-first search:
Example

 Goal reached
 For this example no node is expanded that is not on the solution path
 But not optimal (see Arad, Sibiu, Rimnicu Vilcea, Pitesti)
Greedy best-first search:
Example
 Complete or optimal: no
 Minimizing h(n) can result in false starts, e.g. Iasi to Fagaras
 Check on repeated states
Greedy best-first search:
Example
 Time and space complexity:
 In the worst case all the nodes in the search
tree are generated: O(b^m)
(m is the maximum depth of the search tree and b is
the branching factor)
A* Search
 Best-known form of best-first search
 Idea: avoid expanding paths that are already expensive

 Evaluation function f(n)= g(n) + h(n)


 g(n): the cost (so far) to reach the node
 h(n): estimated cost to get from the node to the goal
 f(n): estimated total cost of path through n to goal

 A* search is both complete and optimal if h(n) satisfies certain


conditions
A* Search
 A* search is optimal if h(n) is an admissible
heuristic
 A heuristic is admissible if it never overestimates the
cost to reach the goal
 h(n) ≤ h*(n) where h*(n) is the true cost from n
 e.g. hSLD(n) never overestimates the actual road
distance
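A compact A* sketch in Python using f(n) = g(n) + h(n) with a priority queue; graph, successors and h are placeholders for a concrete problem (e.g. the Romania map with h = straight-line distance):

    import heapq

    def a_star_search(start, goal_test, successors, h):
        """successors(n) yields (child, step_cost); h(n) is an admissible heuristic."""
        # priority queue ordered by f = g + h
        frontier = [(h(start), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if goal_test(node):
                return path, g                       # optimal if h never overestimates
            for child, step_cost in successors(node):
                g2 = g + step_cost                   # cost so far to reach child
                if g2 < best_g.get(child, float('inf')):
                    best_g[child] = g2
                    heapq.heappush(frontier,
                                   (g2 + h(child), g2, child, path + [child]))
        return None, float('inf')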
Romania Example

14
A* Search: Example
A* Search: Example
A* Search: Example
A* Search: Example
A* Search: Example
A* Search: Example

20
A* Search: Example
A* Search: Evaluation

 Complete: yes
 Unless there are infinitely many nodes with f < f(G)
 Optimal: yes
 A* is also optimally efficient for any given h(n). That is, no
other optimal algorithm is guaranteed to expand fewer nodes
than A*.
Heuristic functions
 A heuristic is a technique designed for solving a problem more quickly when
classic methods are too slow, or for finding an approximate solution when classic
methods fail to find any exact solution

 A heuristic function, also simply called a heuristic, is a function that ranks


alternatives in search algorithms at each branching step based on available
information to decide which branch to follow
Example: 8-puzzle

 States: location of each tile plus blank


 Initial state: Any state can be initial
 Actions: Move blank {Left, Right, Up, Down}
 Goal test: Check whether goal configuration is
reached
 Path cost: Number of actions to reach goal
24
Example: 8-puzzle

 For 8-puzzle problem:


 h1 = number of tiles out of place. In the
example h1= 8
 h2 = total Manhattan distance
In the example
h2 = 3+1+2+2+2+3+3+2 = 18
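Both heuristics can be computed directly from a board; a small sketch (states are listed row by row, with 0 for the blank, a representation chosen here purely for illustration):

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)        # 0 denotes the blank

    def h1(state):
        """Number of tiles out of place (blank not counted)."""
        return sum(1 for i, tile in enumerate(state)
                   if tile != 0 and tile != GOAL[i])

    def h2(state):
        """Total Manhattan distance of each tile from its goal square."""
        dist = 0
        for i, tile in enumerate(state):
            if tile == 0:
                continue
            goal_i = GOAL.index(tile)
            dist += abs(i // 3 - goal_i // 3) + abs(i % 3 - goal_i % 3)
        return dist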
Hill-Climbing search
• continually moves uphill
– increasing value of the evaluation function

• is a loop that continuously moves in the direction


of increasing value - that is, uphill.
– It terminates when a peak is reached.
Hill-Climbing search

• Hill-climbing does not look ahead of the immediate


neighbors of the current state.
• Hill-climbing chooses randomly among the set of
best successors, if there is more than one.
• Hill-climbing is also called greedy local search
• Hill-climbing is a very simple strategy with low
space requirements
– stores only the state and its evaluation, no search tree
Hill-Climbing example

 8-queens problem (complete-state


formulation).
 Successor function: move a single queen to
another square in the same column.
 Heuristic function h(n): the number of pairs of
queens that are attacking each other (directly
or indirectly).
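A minimal hill-climbing loop for this complete-state 8-queens formulation (one queen per column; the board is encoded as a list of row indices, an illustrative choice):

    import random

    def attacking_pairs(board):
        """h(n): number of pairs of queens attacking each other."""
        h, n = 0, len(board)
        for c1 in range(n):
            for c2 in range(c1 + 1, n):
                same_row = board[c1] == board[c2]
                same_diag = abs(board[c1] - board[c2]) == abs(c1 - c2)
                if same_row or same_diag:
                    h += 1
        return h

    def hill_climb(n=8):
        board = [random.randrange(n) for _ in range(n)]   # random complete state
        while True:
            current_h = attacking_pairs(board)
            if current_h == 0:
                return board                              # a solution: no attacks
            # successors: move a single queen to another square in its column
            best_h, best_move = current_h, None
            for col in range(n):
                for row in range(n):
                    if row == board[col]:
                        continue
                    old, board[col] = board[col], row
                    h = attacking_pairs(board)
                    board[col] = old
                    if h < best_h:
                        best_h, best_move = h, (col, row)
            if best_move is None:
                return board        # stuck at a local minimum or plateau
            board[best_move[0]] = best_move[1]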
Hill-Climbing example
(a) (b)

(a) An 8-queens state with heuristic cost estimate h = 17,


showing the value of h for each possible successor
obtained by moving a queen within its column. The best
moves are marked.
(b) A local minimum in the 8-queens state space; the state
has h = 1 but every successor has a higher cost.
Drawbacks: get stuck

 Local maxima: a peak that is higher than each of its neighbouring states


 Ridge = sequence of local maxima difficult for greedy
algorithms to navigate
 Plateaux = an area of the state space where the
evaluation function is flat.
 Gets stuck 86% of the time.
Escaping local optima

 HC gets stuck at local maxima limiting


the quality of the solution found.

 Two ways to modify HC:


1. choice of neighbor
2. criterion for accepting a neighbor as the current state
 For example:
accept the neighbor if it is better, or
if it isn't, accept it with some fixed probability p
Hill-Climbing variations

 Stochastic hill-climbing
 Random selection among the uphill moves.
 The selection probability can vary with the
steepness of the uphill move.
 First-choice hill-climbing
 implements stochastic hill climbing by
generating successors randomly until a better
one is found.
 Random-restart hill-climbing
 Tries to avoid getting stuck in local maxima.
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Introduction to Neural Networks

2
Neural network history
 History traces back to the 50’s but became popular in
the 80’s with work by Rumelhart, Hinton, Kanade, and
McClelland
 A General Framework for Parallel Distributed Processing in
Parallel Distributed Processing

 Peaked in the 90’s. Today:


 Hundreds of variants
 Less a model of the actual brain than a useful tool, but still some
debate
 Numerous applications
 Handwriting, face, speech recognition, virtual reality
 Computer vision, autonomous vehicles
 Models of reading, sentence production, dreaming
 Debate for philosophers and cognitive scientists
 Can human cognitive abilities be explained by a connectionist
model?
How do our brains work?
 The Brain is a massively parallel information processing system.
 Our brains are a huge network of processing elements. A typical
brain contains a network of 10 billion neurons.
How do our brains work?
 A processing element

Dendrites: Input
Cell body: Processor
Synaptic: Link
Axon: Output
How do our brains work?
 A processing element

A neuron is connected to other neurons through about 10,000


synapses
How do our brains work?
 A processing element

A neuron receives input from other neurons.


How do our brains work?
 A processing element

Transmission of an electrical signal from one neuron to the next is


effected by neurotransmitters.
How do our brains work?
 A processing element

This link is called a synapse.


The first neural networks

[Figure: a unit Y with three inputs: X1 with weight 2, X2 with
weight 2, and X3 with weight -1.]

For the network shown here the activation function for unit Y is

    f(y_in) = 1, if y_in >= θ, else 0

where y_in is the total input signal received and θ is the
threshold for Y.

The first neural networks

[Figure: the same network: X1 (weight 2), X2 (weight 2) and
X3 (weight -1) feeding Y.]

Each neuron has a fixed threshold. If the net input into
the neuron is greater than the threshold, the neuron fires.
The first neural networks

AND: X1 → Y with weight 1, X2 → Y with weight 1, Threshold(Y) = 2

AND Function
X1  X2  Y
 1   1  1
 1   0  0
 0   1  0
 0   0  0

The first neural networks

OR: X1 → Y with weight 2, X2 → Y with weight 2, Threshold(Y) = 2

OR Function
X1  X2  Y
 1   1  1
 1   0  1
 0   1  1
 0   0  0

The first neural networks

AND NOT: X1 → Y with weight 2, X2 → Y with weight -1, Threshold(Y) = 2

AND NOT Function
X1  X2  Y
 1   1  0
 1   0  1
 0   1  0
 0   0  0
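All three networks above are instances of one threshold unit; a short Python sketch with the weights and thresholds taken from the slides (the function names are illustrative):

    def threshold_unit(inputs, weights, theta):
        """Fires (outputs 1) if the net input reaches the threshold theta."""
        y_in = sum(x * w for x, w in zip(inputs, weights))
        return 1 if y_in >= theta else 0

    def AND(x1, x2):      return threshold_unit((x1, x2), (1, 1), theta=2)
    def OR(x1, x2):       return threshold_unit((x1, x2), (2, 2), theta=2)
    def AND_NOT(x1, x2):  return threshold_unit((x1, x2), (2, -1), theta=2)

    for x1 in (1, 0):
        for x2 in (1, 0):
            print(x1, x2, AND(x1, x2), OR(x1, x2), AND_NOT(x1, x2))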
How do ANNs work?

An artificial neuron is an imitation of a human neuron


How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.

How do ANNs work?
Input:      x1, x2, ..., xm
Processing: sum of the inputs, y = x1 + x2 + ... + xm
Output:     y

How do ANNs work?
Not all inputs are equal:
Input:      x1, x2, ..., xm
Weights:    w1, w2, ..., wm
Processing: weighted sum, y = x1*w1 + x2*w2 + ... + xm*wm
Output:     y

How do ANNs work?
The signal is not passed down to the next neuron verbatim:
Input:      x1, x2, ..., xm
Weights:    w1, w2, ..., wm
Processing: weighted sum, then a transfer function f(vk)
            (activation function)
Output:     y

The output is a function of the inputs, affected by the weights
and the transfer function.
Neural network model

[Figure: inputs Age = 34, Gender = 2, Stage = 4 (independent
variables) feed a hidden layer of sigmoid units through weighted
connections (weights such as .6, .4, .2, .1, .5, .3, .8, .7); the
hidden units feed a sigmoid output unit that gives 0.6, the
"Probability of being Alive" (dependent variable / prediction).]

"Combined logistic models"

[Figures: the same network viewed one hidden unit at a time;
each hidden unit is a logistic model of the inputs, and the
output layer combines these models into the final prediction 0.6.]

Not really, no target for hidden units...

[Figure: the full network again; the training data provide a
target only for the output unit, not for the hidden units.]
Decision Functions: Neural Network

[Figure: a feed-forward network drawn as Input → Hidden Layer → Output.]

Building a neural network

[Figure: Features → Output, i.e. a network with no hidden layer.]

Building a neural network

[Figures: Input → Hidden Layer → Output, first with as many hidden
units as inputs (D = M), then with fewer hidden units than inputs
(D < M).]

Multi-class output

[Figure: Input → Hidden Layer → several Output units, one per class.]
Artificial Intelligence

Md. Zasim Uddin, PhD


Associate professor, Dept. Computer Science & Engineering
Begum Rokeya University, Rangpur
Introduction to Deep Learning

2
Outline
 Machine Learning basics
 Introduction to Deep Learning
 what is Deep Learning
 why is it useful

 Main components/hyper-parameters:
 activation functions
 optimizers, cost functions and training
 regularization methods
 classification vs. regression tasks
Machine learning basics
 Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed

Machine Learning
Labeled Data algorithm

Training

Learned model

Methods that can learn from data


Machine learning basics
 Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed

Machine Learning
Labeled Data algorithm

Training
Prediction

Labeled Data Learned model Prediction

Methods that can learn from and make predictions on data


Types of learning
 Supervised: Learning with a labeled training set
Example: Animal classification with already labeled animals (e.g., cat, dog,
horse, etc.,)

 Unsupervised: Discover patterns in unlabeled data


Example: cluster similar documents based on text

 Reinforcement learning: learn to act based on feedback/reward


Example: learn to play Go, reward: win or lose

[Figure: example task types: classification, anomaly detection,
regression, clustering, sequence labeling.]

ML vs. Deep learning
 Most machine learning methods work well because of human-designed
representations and input features
 ML becomes just optimizing weights to best make a final prediction
What is deep learning (DL) ?
 A machine learning subfield of learning representations of data. Exceptionally
effective at learning patterns.
 Deep learning algorithms attempt to learn (multiple levels of) representation by using
a hierarchy of multiple layers
 If you provide the system tons of information, it begins to understand it and
respond in useful ways.
Why is DL useful?
 Manually designed features are often over-specified, incomplete and take a
long time to design and validate
 Learned Features are easy to adapt, fast to learn
 Deep learning provides a very flexible, (almost?) universal, learnable framework
for representing world, visual and linguistic information.
 Can learn both unsupervised and supervised
 Effective end-to-end joint system learning
 Utilize large amounts of training data
Neural network

Output


Hidden Layer


Input
Training
1. Sample labeled data (a batch)
2. Forward it through the network, get predictions
3. Back-propagate the errors
4. Update the network weights

 Optimize (minimize or maximize) the objective/cost function J(θ)

 Generate an error signal that measures the difference
between predictions and target values

 Use the error signal to change the weights and get more
accurate predictions

 Subtracting a fraction of the gradient moves you
towards the (local) minimum of the cost function
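A hedged sketch of this loop for a single-layer network trained by gradient descent on mean squared error (the data, shapes and learning rate are purely illustrative; a deep network would back-propagate through every layer):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))              # a batch of labeled data
    true_w = np.array([0.5, -1.0, 2.0])
    y = X @ true_w                             # target values (toy regression problem)

    w = np.zeros(3)                            # the network's weights
    lr = 0.1                                   # fraction of the gradient to subtract

    for step in range(200):
        y_hat = X @ w                          # forward pass: predictions
        error = y_hat - y                      # error signal vs. target values
        cost = np.mean(error ** 2)             # objective / cost function J(theta)
        grad = 2 * X.T @ error / len(X)        # gradient of J with respect to w
        w -= lr * grad                         # update the weights (gradient step)

    print(w)                                   # approaches [0.5, -1.0, 2.0]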
Convolutional neural network (CNN)

12
Convolutional neural network (CNN)

13
CNN stride and padding

14
CNN stride and padding (input output same)
CNN stride and padding
CNN layer

17
CNN layer

18
CNN layer

19
CNN layer

20
CNN layer

21
Pooling layer

22
Activation function

23
Activation: Sigmoid
Takes a real-valued number and
“squashes” it into the range between 0
and 1:
    sigmoid: R^n → [0, 1]
Activation: Tanh
Takes a real-valued number and
“squashes” it into the range between -1
and 1:
    tanh: R^n → [-1, 1]

- Like sigmoid, tanh neurons saturate
- Unlike sigmoid, the output is zero-centered
- Tanh is a scaled sigmoid: tanh(x) = 2 sigm(2x) − 1
Activation: ReLU
Takes a real-valued number and
thresholds it at zero: f(x) = max(0, x)

    ReLU: R^n → R+^n

Most Deep Networks use ReLU nowadays

 Trains much faster
• accelerates the convergence of SGD
• due to its linear, non-saturating form
 Less expensive operations
• compared to sigmoid/tanh (exponentials etc.)
• implemented by simply thresholding a matrix at zero
 More expressive
 Prevents the vanishing gradient problem
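The three activations written out in NumPy, as a straightforward sketch:

    import numpy as np

    def sigmoid(x):
        """Squashes each value into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        """Zero-centered squashing into (-1, 1); a scaled sigmoid: 2*sigmoid(2x) - 1."""
        return np.tanh(x)

    def relu(x):
        """Thresholds at zero: f(x) = max(0, x)."""
        return np.maximum(0, x)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x), tanh(x), relu(x), sep="\n")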
Loss functions and output

Classification:
  Training examples: R^n x {class_1, ..., class_n} (one-hot encoding)
  Output layer: soft-max [maps R^n to a probability distribution]
  Cost (loss) function: cross-entropy
    J(θ) = -(1/n) Σ_{i=1..n} Σ_{k=1..K} [ y_k^(i) log ŷ_k^(i) + (1 - y_k^(i)) log(1 - ŷ_k^(i)) ]

Regression:
  Training examples: R^n x R^m
  Output layer: linear (identity, f(x) = x) or sigmoid
  Cost (loss) function: Mean Squared Error
    J(θ) = (1/n) Σ_{i=1..n} ( y^(i) - ŷ^(i) )^2
  or Mean Absolute Error
    J(θ) = (1/n) Σ_{i=1..n} | y^(i) - ŷ^(i) |
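The same cost functions in NumPy (variable names and shapes are illustrative; y holds targets, y_hat holds predictions):

    import numpy as np

    def cross_entropy(y, y_hat, eps=1e-12):
        """Classification loss for one-hot targets y and predicted probabilities y_hat."""
        y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
        return -np.mean(np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat), axis=1))

    def mean_squared_error(y, y_hat):
        return np.mean((y - y_hat) ** 2)

    def mean_absolute_error(y, y_hat):
        return np.mean(np.abs(y - y_hat))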
LeNet-5
 Proposed in “Gradient-based learning applied to document
recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner,
in Proceedings of the IEEE, 1998

 Apply convolution on 2D images (MNIST) and use backpropagation

 Structure: 2 convolutional layers (with pooling) + 3 fully connected layers


• Input size: 32x32x1
• Convolution kernel size: 5x5
• Pooling: 2x2
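A hedged PyTorch sketch of that structure (2 convolutional layers with pooling, then 3 fully connected layers, 32x32x1 input, 5x5 kernels, 2x2 pooling); the hidden sizes 120/84 follow the original paper, and details such as the activation choice are assumptions for illustration:

    import torch
    import torch.nn as nn

    class LeNet5(nn.Module):
        """Sketch of LeNet-5: 2 conv layers (with pooling) + 3 fully connected layers."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
                nn.Tanh(),
                nn.AvgPool2d(2),                  # 28x28x6 -> 14x14x6
                nn.Conv2d(6, 16, kernel_size=5),  # 14x14x6 -> 10x10x16
                nn.Tanh(),
                nn.AvgPool2d(2),                  # 10x10x16 -> 5x5x16
            )
            self.classifier = nn.Sequential(
                nn.Linear(16 * 5 * 5, 120),
                nn.Tanh(),
                nn.Linear(120, 84),
                nn.Tanh(),
                nn.Linear(84, num_classes),
            )

        def forward(self, x):
            x = self.features(x)                  # convolution + pooling stages
            x = torch.flatten(x, 1)               # flatten to a vector per image
            return self.classifier(x)             # fully connected classifier

    net = LeNet5()
    out = net(torch.zeros(1, 1, 32, 32))          # one dummy 32x32 grayscale image
    print(out.shape)                              # torch.Size([1, 10])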

28
LeNet-5

29
LeNet-5

30
LeNet-5

31
LeNet-5

32
LeNet-5

33
LeNet-5

34
LeNet-5

35
LeNet-5

36
VGG-16 architecture

37
Alexnet architecture
Software platforms for CNN

 Platform: Caffe
 Platform: Tensorflow

 Platform: Pytorch

39
Adversarial Search
Game Playing

Chapter 6
Outline
• Games
• Perfect Play
– Minimax decisions
– α-β pruning
• Resource Limits and Approximate Evaluation
• Games of chance
Games

• Multi agent environments : any given agent will need


to consider the actions of other agents and how they
affect its own welfare.
• The unpredictability of these other agents can
introduce many possible contingencies

• There could be competitive or cooperative


environments

• Competitive environments, in which the agent’s goals


are in conflict require adversarial search – these
problems are called as games
What kind of games?
• Abstraction: To describe a game we must capture every
relevant aspect of the game. Such as:
– Chess
– Tic-tac-toe
– …
• Accessible environments: Such games are
characterized by perfect information
• Search: game-playing then consists of a search through
possible game positions
• Unpredictable opponent: introduces uncertainty thus
game-playing must deal with contingency problems
Type of Games
Games
• In game theory (economics), any multi-agent environment (either
cooperative or competitive) is a game provided that the impact of
each agent on the other is significant

• AI games are a specialized kind - deterministic, turn taking, two-


player, zero sum games of perfect information

– a zero-sum game is a mathematical representation of a situation in


which a participant's gain (or loss) of utility is exactly balanced by the
losses (or gains) of the utility of other participant(s)

• In our terminology – deterministic, fully observable environments


with two agents whose actions alternate and the utility values at the
end of the game are always equal and opposite (+1 and –1)
– If a player wins a game of chess (+1), the other player necessarily loses
(-1)
Searching for the next move
• Complexity: many games have a huge search space
– Chess: b ≈ 35, m ≈ 100, so the game tree has about 35^100 nodes
– if each node takes about 1 ns to explore then each move will
take about 10^50 millennia to calculate.

• Resource (e.g., time, memory) limit: optimal solution


not feasible/possible, thus must approximate

• 1. Pruning: makes the search more efficient by


discarding portions of the search tree that cannot
improve the quality of the result.

• 2. Evaluation functions: heuristics to evaluate utility of


a state without exhaustive search.
Optimal Decision (Two-player Games)
• A game formulated as a search problem:
Example: Tic-Tac-Toe
The minimax algorithm
• Perfect play for deterministic environments with perfect
information

• Basic idea: choose move with highest minimax value


= best achievable payoff against best play

• Algorithm:
1. Generate game tree completely
2. Determine utility of each terminal state
3. Propagate the utility values upward in the tree by applying
MIN and MAX operators on the nodes in the current level
4. At the root node use minimax decision to select the move
with the max (of the min) utility value

• Steps 2 and 3 in the algorithm assume that the opponent will


play perfectly.
Generate Game Tree
Optimal Strategy (Minimax Example)
Minimax value
• Given a game tree, the optimal strategy can be
determined by examining the minimax value of
each node (MINIMAX-VALUE(n))

• The minimax value of a node is the utility of


being in the corresponding state, assuming that
both players play optimally from there to the end
of the game

• Given a choice, MAX prefers to move to a state of


maximum value, whereas MIN prefers a state of
minimum value
Minimax: Recursive implementation
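A compact recursive minimax in Python; the game interface (is_terminal, utility, moves, result) is a placeholder for a concrete game such as tic-tac-toe:

    def minimax_value(state, game, maximizing):
        """Utility of `state` assuming both players play optimally to the end."""
        if game.is_terminal(state):
            return game.utility(state)            # step 2: utility of terminal states
        values = [minimax_value(game.result(state, m), game, not maximizing)
                  for m in game.moves(state)]     # steps 1 and 3: expand, propagate upward
        return max(values) if maximizing else min(values)

    def minimax_decision(state, game):
        """Step 4: at the root, MAX picks the move with the highest minimax value."""
        return max(game.moves(state),
                   key=lambda m: minimax_value(game.result(state, m), game, False))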
α-β pruning
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: example
α-β pruning: General Principle
Why is it called α-β?
• α is the value of the
best (i.e., highest-
value) choice found
so far at any choice
point along the path
for max

• If v is worse than α,
max will avoid it
•  prune that branch
• Define β similarly for
min
α-β pruning
• Alpha-beta search updates the values of α and β
as it goes along and prunes the remaining
branches at a node as soon as the value of the
current node is known to be worse than the
current α or β value for MAX or MIN,
respectively.

• The effectiveness of alpha-beta pruning is highly


dependent on the order in which the successors
are examined.
Properties of α-β
• Pruning does not affect final result

• Good move ordering improves effectiveness of pruning


• With "perfect ordering," time complexity = O(bm/2)


 doubles depth of search

• A simple example of the value of reasoning about which


computations are relevant (a form of metareasoning)

The α-β algorithm
The α-β algorithm
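A hedged sketch of the α-β algorithm in the same style as the minimax code above (same placeholder game interface); pruning changes only the work done, not the final result:

    import math

    def alpha_beta_value(state, game, alpha, beta, maximizing):
        """Minimax value with α-β pruning."""
        if game.is_terminal(state):
            return game.utility(state)
        if maximizing:
            v = -math.inf
            for m in game.moves(state):
                v = max(v, alpha_beta_value(game.result(state, m), game, alpha, beta, False))
                alpha = max(alpha, v)          # best value found so far for MAX
                if alpha >= beta:
                    break                      # MIN will never allow this branch: prune
            return v
        else:
            v = math.inf
            for m in game.moves(state):
                v = min(v, alpha_beta_value(game.result(state, m), game, alpha, beta, True))
                beta = min(beta, v)            # best value found so far for MIN
                if beta <= alpha:
                    break                      # MAX will never allow this branch: prune
            return v

    def alpha_beta_decision(state, game):
        return max(game.moves(state),
                   key=lambda m: alpha_beta_value(game.result(state, m), game,
                                                  -math.inf, math.inf, False))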
Imperfect Real-Time Decisions
Suppose we have 100 secs, and explore 10^4
nodes/sec
 10^6 nodes per move
Standard approach:
cutoff test:
e.g., depth limit (perhaps add quiescence search)
evaluation function
= estimated desirability of position
* Replace the utility function by a heuristic evaluation
function EVAL, which gives an estimate of the
position’s utility
Evaluation Functions
• The evaluation function should order the
terminal states in the same way as the true utility
function
• The computation must not take too long
• For non-terminal states, the evaluation function
should be strongly correlated with the actual
chances of winning
– Uncertainty introduced by computational limits
Evaluation Functions
Evaluation Functions
• Material value for each piece in chess
– Pawn: 1
– Knight: 3
– Bishop: 3
– Rook: 5
– Queen: 9
This can be used as weights and the number of each kind can be used as
features
• Other features
– Good pawn structure
– King safety

• These features and weights are not part of the rules of chess, they
come from playing experience
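Treating the material values as weights on piece-count features gives a weighted linear evaluation function; a small sketch (the counts dictionary and sign convention are illustrative):

    MATERIAL = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}   # pawn, knight, bishop, rook, queen

    def eval_material(counts_white, counts_black):
        """EVAL(s) = sum_i w_i * f_i(s): weighted sum of piece-count features,
        positive when White is ahead in material."""
        score = 0
        for piece, weight in MATERIAL.items():
            score += weight * (counts_white.get(piece, 0) - counts_black.get(piece, 0))
        return score

    # Example: White has an extra knight, Black an extra pawn -> +2 for White.
    print(eval_material({'P': 7, 'N': 2, 'B': 2, 'R': 2, 'Q': 1},
                        {'P': 8, 'N': 1, 'B': 2, 'R': 2, 'Q': 1}))   # -> 2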
Cutting off search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?

b^m = 10^6, b = 35  m ≈ 4

4-ply lookahead is a hopeless chess player!

– 4-ply ≈ human novice


– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov

Knowledge and Reasoning
Knowledge and Reasoning
Knowledge and Reasoning:
humans are very good at acquiring new information by
combining raw knowledge, experience with reasoning.
AI-slogan: “Knowledge is power” (or “Data is power”?)

Examples:
Medical diagnosis --- physician diagnosing a patient
infers what disease, based on the knowledge s/he
acquired as a student, textbooks, prior cases

Common sense knowledge / reasoning ---


common everyday assumptions / inferences
e.g., “lecture starts at four” infer pm not am;
when traveling, I assume there is some way to get from the
airport to the hotel.
Logical agents:
Agents with some representation of the
complex knowledge about the world / its environment,
and uses inference to derive new information from that
knowledge combined with new inputs (e.g. via
perception).

Key issues:
1- Representation of knowledge
What form? Meaning / semantics?
2- Reasoning and inference processes
Efficiency.
Knowledge-base Agents

• Key issues:
– Representation of knowledge  knowledge base
– Reasoning processes  inference/reasoning

Knowledge base = set of sentences in a formal language


representing facts about the world(*)

(*) called Knowledge Representation (KR) language


Knowledge bases
• Key aspects:
– How to add sentences to the knowledge base
– How to query the knowledge base

Both tasks may involve inference – i.e. how to derive new


sentences from old sentences

Logical agents – inference must obey the fundamental


requirement that when one asks a question to the
knowledge base,
the answer should follow from what has been told to the
knowledge base previously.
A simple knowledge-based agent

• The agent must be able to:


– First, it TELLs the knowledge base what it perceives. Second, it
ASKs the knowledge base what action it should perform.
– Represent states, actions, etc.
– Incorporate new percepts and update internal representations
of the world
– Deduce hidden properties of the world and deduce appropriate
action
Wumpus World
Wumpus world characterization
• Fully Observable No – only local perception

• Deterministic Yes – outcomes exactly specified

• Static Yes – Wumpus and Pits do not move

• Discrete Yes

• Single-agent? Yes – Wumpus is essentially a “natural
feature.”

Illustrative example: Wumpus World
•Performance measure
– gold +1000,
– death -1000
(falling into a pit or being eaten by the wumpus)
– -1 per step, -10 for using the arrow
•Environment
– Rooms / squares connected by doors.
– Squares adjacent to wumpus are smelly
– Squares adjacent to pit are breezy
– Glitter iff gold is in the same square
– Shooting kills wumpus if you are facing it
– Shooting uses up the only arrow
– Grabbing picks up gold if in same square
– Releasing drops the gold in same square
– Randomly generated at start of game. Wumpus only senses
current room.
•Sensors: Stench, Breeze, Glitter, Bump, Scream [perceptual inputs]
•Actuators: Left turn, Right turn, Forward, Grab, Release, Shoot
Wumpus world characterization
• Fully Observable No – only local perception

• Deterministic Yes – outcomes exactly specified

• Static Yes – Wumpus and Pits do not move

• Discrete Yes

• Single-agent? Yes – Wumpus is essentially a “natural
feature.”

Exploring a wumpus world

The knowledge base of the agent


consists of the rules of the
Wumpus world plus the percept
“nothing” in [1,1]

None, none, none, none, none


Stench, Breeze, Glitter, Bump, Scream

Boolean percept
feature values:
<0, 0, 0, 0, 0>
Exploring a wumpus world
Exploring a wumpus world
Exploring a wumpus world
Exploring a wumpus world
Knowledge and Reasoning
First order logic
Inference rule
Propositional logic
Basic: Propositional logic
First order predicate logic
First order predicate logic
Constants
Variables
Predicates
Statement in predicate logic
Quantifiers
Homeworks
• Bound and free variable
• Translate different sentences into predicate
logic and vice versa
Learning from Observations
Outline
• Learning agents
• Inductive learning
• Decision tree learning
Learning
• Learning is essential for unknown environments,
– i.e., when designer lacks omniscience

• Learning is useful as a system construction


method,
– i.e., expose the agent to reality rather than trying to
write it down

• Learning modifies the agent's decision


mechanisms to improve performance
Learning agents
Learning agents
Learning element
• Design of a learning element is affected by
– Which components of the performance element are to
be learned
– What feedback is available to learn these
components
– What representation is used for the components

• Type of feedback:
– Supervised learning: correct answers for each example (e.g.,
give label during training)
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards
Inductive learning
• Simplest form: learn a function from examples
• f is the target function

An example is a pair (x, f(x))

Problem: find a hypothesis h


such that h ≈ f
given a training set of examples

(This is a highly simplified model of real learning:


– Ignores prior knowledge
– Assumes examples are given)

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:
Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)

• E.g., curve fitting:

• Ockham’s razor: prefer the simplest hypothesis


consistent with data
Learning decision trees
Problem: decide whether to wait for a table at a restaurant,
based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Attribute-based representations
• Examples described by attribute values (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait for a table:

• Classification of examples is positive (T) or negative (F)



Decision trees
• One possible representation for hypotheses
• E.g., here is the “true” tree for deciding whether to wait:
Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, truth table row → path to leaf:

• Trivially, there is a consistent decision tree for any training set with one path
to leaf for each example (unless f nondeterministic in x) but it probably won't
generalize to new examples

• Prefer to find more compact decision trees


Decision tree learning
• Aim: find a small tree consistent with the training examples
• Idea: (recursively) choose "most significant" attribute as root of
(sub)tree
Choosing an attribute
• Idea: a good attribute splits the examples into subsets
that are (ideally) "all positive" or "all negative"

• Patrons? is a better choice
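The slides do not name a measure, but the usual way to quantify a "good split" is information gain, i.e. the reduction in entropy; a brief sketch under that assumption, with illustrative positive/negative counts:

    import math

    def entropy(pos, neg):
        """Entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
        total = pos + neg
        h = 0.0
        for count in (pos, neg):
            if count:
                p = count / total
                h -= p * math.log2(p)
        return h

    def information_gain(parent, subsets):
        """parent and each subset are given as (pos, neg) counts."""
        total = sum(p + n for p, n in subsets)
        remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
        return entropy(*parent) - remainder

    # Illustrative: 12 examples (6 positive / 6 negative) split by an attribute.
    print(information_gain((6, 6), [(0, 2), (4, 0), (2, 4)]))  # mostly pure subsets: high gain
    print(information_gain((6, 6), [(3, 3), (3, 3)]))          # 50/50 subsets: gain 0.0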


Example contd.
• Decision tree learned from the 12 examples:

• Substantially simpler than “true” tree---a more complex


hypothesis isn’t justified by small amount of data
