AI All Slides
Topics: Knowledge Representation, Search, Logic, Machine Learning, Planning, Expert Systems, NLP, Vision, Robotics
What is Artificial Intelligence?
Systems that act like humans: the Turing test
You enter a room which has a computer terminal. You
have a fixed period of time to type what you want into
the terminal, and study the replies. At the other end of
the line is either a human being or a computer
system.
Systems that think like humans:
cognitive modeling
How do we know how humans think?
Introspection vs. psychological experiments
Systems that think ‘rationally’
"laws of thought"
Humans are not always ‘rational’
Web search
Text classification, spam filtering, etc…
Computer vision
Technologies
Vehicles
Rescue
Soccer!
Lots of automation…
Artificial Intelligence
Agents
Humans
– Sensors: Eyes (vision), ears (hearing), skin (touch), tongue, nose,
neuromuscular system
– Percepts:
• At the lowest level – electrical signals from these sensors
• After preprocessing – objects in the visual field (location, textures,
colors, ...), auditory streams (pitch, loudness, direction), ...
– Effectors: limbs, digits, eyes, tongue, ...
– Actions: lift a finger, turn left, walk, run, carry an object, ...
Agent function
Performance measure
Embodies criterion for success
• Amount of dirt cleaned?
• Cleaned floors?
Generally defined in terms of desired effect on environment (not
on actions of agent)
Defining measure not always easy!
Rationality depends on:
1. Performance measure that defines criterion for
success.
2. Agent’s prior knowledge of the environment.
3. Actions the agent can perform.
4. Agent’s percept sequence to date.
Task environment: the Environment itself and the Actuators (actions) available to the agent
Deterministic vs Stochastic
Deterministic – next state completely determined by current state and
action
Uncertainty may arise because of defective actions or partially
observable state (i.e., agent might not see everything that affects the
outcome of an action).
Properties of task environments
(affect appropriate agent design)
Episodic vs Sequential
Episodic – the agent's experience is divided into atomic episodes; the next episode does not depend on actions taken in previous episodes, e.g., an assembly line
Sequential – the current action may affect future actions, e.g., playing chess, driving a taxi; the agent must think ahead in choosing an action
Static vs Dynamic
does environment change while agent is deliberating?
Static – crossword puzzle
Dynamic – taxi driver
Properties of task environments
(affect appropriate agent design)
Discrete vs Continuous.
the state of the environment (chess has finite number of discrete
states)
the way time is handled (taxi driving continuous – speed and
location of taxi sweep through range of continuous values)
percepts and actions (taxi driving continuous – steering angles)
Artificial Intelligence
Summary for today’s lecture
Assumptions
• Static or dynamic? Environment is static
• Fully or partially observable? Environment is fully observable
• Discrete or continuous? Environment is discrete
• Deterministic or stochastic? Environment is deterministic
• Episodic or sequential? Environment is sequential
Formulate goal: Be in
Bucharest.
Searching for a solution: Key concepts in
search
Set of states
Including an initial state and goal states
For every state, a set of actions
Each action results in a new state
Typically defined by successor function
Cost function that determines the cost of each
action (path, sequence of actions)
Solution: path from initial state to a goal state
Optimal solution: solution with minimal cost
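To make the formulation concrete, here is a minimal Python sketch of a generic problem definition (illustrative only; the names SearchProblem, successors, and is_goal are assumptions, not from the lecture):

class SearchProblem:
    def __init__(self, initial_state, goal_states):
        self.initial_state = initial_state
        self.goal_states = set(goal_states)

    def successors(self, state):
        # return an iterable of (action, next_state, step_cost) triples
        raise NotImplementedError

    def is_goal(self, state):
        return state in self.goal_states

def path_cost(steps):
    # the cost of a path (sequence of actions) is the sum of its step costs
    return sum(cost for _action, _state, cost in steps)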
Measuring problem-solving performance
Completeness: is the algorithm guaranteed to find a solution?
Optimality: does the strategy find the optimal solution?
Time complexity: how long does it take to find a solution?
Space complexity: how much memory is needed to perform the search?
8-puzzle
Start state:
1 2 _
4 5 3
7 8 6
Goal state:
1 2 3
4 5 6
7 8 _
8-puzzle: expanding the start state
From the start state above, successor states are generated by sliding a tile into the blank; the figure shows the resulting search tree, continued level by level.
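As an illustration (not from the slides), a successor function for the 8-puzzle; states are 9-tuples in row-major order with 0 marking the blank:

def puzzle_successors(state):
    # yields (action, next_state) pairs obtained by sliding a tile into the blank
    i = state.index(0)                # position of the blank
    row = i // 3
    for action, delta in {'up': -3, 'down': 3, 'left': -1, 'right': 1}.items():
        j = i + delta
        # stay on the board; left/right moves must not wrap across rows
        if 0 <= j < 9 and not (abs(delta) == 1 and j // 3 != row):
            nxt = list(state)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield action, tuple(nxt)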
Recall: State-space formulation
Intelligent agents: problem solving as search
Search consists of
state space
operators
start state
goal states
Uninformed search strategies
Uninformed: While searching we have no clue
whether one non-goal state is better than any
other. The search is blind.
Breadth-first search
Expand shallowest unexpanded node
Fringe: nodes waiting in a queue to be explored, also
called OPEN
Implementation:
fringe is a first-in-first-out (FIFO) queue, i.e., new
successors go at end of the queue.
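A minimal BFS sketch in Python (assuming a successors(state) function that yields (action, next_state) pairs, as in the 8-puzzle example):

from collections import deque

def breadth_first_search(start, is_goal, successors):
    fringe = deque([(start, [start])])    # FIFO queue of (state, path) pairs
    explored = {start}
    while fringe:
        state, path = fringe.popleft()    # expand the shallowest node
        if is_goal(state):
            return path
        for _action, nxt in successors(state):
            if nxt not in explored:
                explored.add(nxt)
                fringe.append((nxt, path + [nxt]))  # new successors go at the end
    return None                           # failure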
Breadth-first search: example
Is A a goal state? Expand A: fringe = [B,C]. Is B a goal state?
Expand B: fringe = [C,D,E]. Is C a goal state?
Expand C: fringe = [D,E,F,G]. Is D a goal state?
Example: map navigation
A B C
S   G
D E F
(S = start, G = goal)
What is the complexity of Breadth-first search?
• Time complexity
– assume (worst case) that there is 1 goal leaf at the right-hand side of the tree, at depth d
– so BFS will expand all nodes: 1 + b + b² + … + b^d = O(b^d)
• Space complexity
– how many nodes can be in the queue (worst case)?
– at depth d there are b^d unexpanded nodes in the queue: O(b^d)
– counting generated nodes, time and space are O(b^(d+1))
Examples of time and memory requirements for Breadth-first search
(table: depth of solution, nodes expanded, time, memory)
Depth-first search
1. Put the start node s on OPEN
2. If OPEN is empty exit with failure.
3. Remove the first node n from OPEN and place it on
CLOSED.
4. If n is a goal node, exit successfully with the solution
obtained by tracing back pointers from n to s.
5. Otherwise, expand n, generating all its successors; attach to them pointers back to n, and put them at the top of OPEN in some order.
6. Go to step 2.
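The same steps in a minimal Python sketch (illustrative; OPEN holds (node, parent) pairs, CLOSED maps each node to its back pointer):

def depth_first_search(start, is_goal, successors):
    OPEN = [(start, None)]          # step 1: put the start node on OPEN
    CLOSED = {}                     # node -> parent, for tracing back pointers
    while OPEN:                     # step 2: empty OPEN means failure
        node, parent = OPEN.pop()   # step 3: remove first node, place on CLOSED
        if node in CLOSED:
            continue
        CLOSED[node] = parent
        if is_goal(node):           # step 4: trace pointers back from n to s
            path = [node]
            while parent is not None:
                path.append(parent)
                parent = CLOSED[parent]
            return path[::-1]
        for _action, nxt in successors(node):   # step 5: expand n
            if nxt not in CLOSED:
                OPEN.append((nxt, node))        # put successors on top of OPEN
    return None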
Depth-first search
Expand deepest unexpanded node
Implementation:
fringe = last-in-first-out (LIFO) queue (a stack), i.e., put successors at the front
Example trace of the fringe:
Is A a goal state? Expand A: queue = [B,C]. Is B a goal state?
Expand B: queue = [D,E,C]. Is D a goal state?
Expand D: queue = [H,I,E,C]. Is H a goal state?
H has no successors: queue = [I,E,C]. Is I a goal state?
I has no successors: queue = [E,C]. Is E a goal state?
Expand E: queue = [J,K,C]. Is J a goal state?
J has no successors: queue = [K,C]. Is K a goal state?
K has no successors: queue = [C]. Is C a goal state?
Expand C: queue = [F,G]. Is F a goal state?
Expand F: queue = [L,M,G]. Is L a goal state?
L has no successors: queue = [M,G]. Is M a goal state?
What is the complexity of Depth-first search?
Time complexity
– assume (worst case) that there is 1 goal leaf at the right-hand side of the tree
– so DFS will expand all nodes down to the cutoff depth m: 1 + b + b² + … + b^m = O(b^m)
Space complexity
– how many nodes can be in the queue (worst case)?
– at each depth l < m we keep b − 1 nodes, and at depth m we keep b nodes
– total = (m − 1)(b − 1) + b = O(bm)
Comparing DFS and BFS
Same worst-case time complexity, but:
In the worst case BFS is always better than DFS
Sometimes, on average, DFS is better if:
• there are many goals, no loops, and no infinite paths
In general:
• BFS is better if the goal is not deep, if there are infinite paths, if there are many loops, or if the search space is small
• DFS is better if there are many goals and not many loops
• DFS is much better in terms of memory
Artificial Intelligence
Informed search and exploration
Best-first search
Idea: use an evaluation function f(n) for each node as an estimate of "desirability"; expand the most desirable unexpanded node (choose the node which appears best)
Implementation:
use a data structure that maintains the frontier in decreasing order of desirability
Greedy best-first search: example (Romania map, figure walkthrough)
Goal reached: for this example no node is expanded that is not on the solution path, but the solution is not optimal (see Arad, Sibiu, Rimnicu Vilcea, Pitesti)
Complete or optimal: no
Minimizing h(n) can result in false starts, e.g., Iasi to Fagaras; check on repeated states
Time and space complexity:
In the worst case all the nodes in the search tree are generated: O(b^m)
(m is the maximum depth of the search tree and b is the branching factor)
A* Search
Best-known form of best-first search
Idea: avoid expanding paths that are already expensive
Evaluation function: f(n) = g(n) + h(n), where g(n) is the cost so far to reach n and h(n) is the estimated cost from n to the goal
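A minimal A* sketch (an illustration, assuming successors(state) yields (action, next_state, step_cost) triples and h(state) is the heuristic):

import heapq, itertools

def a_star(start, is_goal, successors, h):
    counter = itertools.count()   # tie-breaker so the heap never compares states
    frontier = [(h(start), next(counter), 0, start, [start])]   # ordered by f = g + h
    best_g = {start: 0}
    while frontier:
        _f, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for _action, nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):   # keep only the cheapest known path
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(counter), g2, nxt, path + [nxt]))
    return None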
A* Search: example (figure walkthrough)
A* Search: Evaluation
Complete: yes
Unless there are infinitely many nodes with f ≤ f(G)
Optimal: yes (with an admissible heuristic)
A* is also optimally efficient for any given h(n). That is, no other optimal algorithm is guaranteed to expand fewer nodes than A*.
Heuristic functions
A heuristic is a technique designed to solve a problem more quickly when classic methods are too slow, or to find an approximate solution when classic methods fail to find an exact one
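Two standard 8-puzzle heuristics, sketched for illustration (states as in the earlier successor function, 0 = blank); both never overestimate the true cost, so they are admissible:

def h_misplaced(state, goal):
    # number of tiles not in their goal position (blank excluded)
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h_manhattan(state, goal):
    # sum over tiles of row distance + column distance to the goal position
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            j = goal.index(tile)
            total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total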
Stochastic hill-climbing
Random selection among the uphill moves.
The selection probability can vary with the
steepness of the uphill move.
First-choice hill-climbing
implements stochastic hill climbing by
generating successors randomly until a better
one is found.
Random-restart hill-climbing
Tries to avoid getting stuck in local maxima.
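A sketch of first-choice hill climbing with random restarts (value, random_successor, and random_state are assumed problem-specific helpers, not from the slides):

def first_choice_hill_climbing(state, value, random_successor, max_tries=1000):
    while True:
        for _ in range(max_tries):          # generate successors randomly
            nxt = random_successor(state)
            if value(nxt) > value(state):   # take the first uphill move found
                state = nxt
                break
        else:
            return state                    # no uphill move found: local maximum

def random_restart(random_state, value, random_successor, restarts=10):
    # restart from random initial states and keep the best local maximum
    return max((first_choice_hill_climbing(random_state(), value, random_successor)
                for _ in range(restarts)), key=value)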
Artificial Intelligence
Neural network history
History traces back to the 50's, but the field became popular in the 80's with work by Rumelhart, Hinton, and McClelland:
"A General Framework for Parallel Distributed Processing," in Parallel Distributed Processing
Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output
How do our brains work?
A processing element: inputs (X1, X2, X3, …) arrive over weighted links (weights such as 2, 2, −1, 1 in the figure) and are combined to produce the output Y
The first neural networks
AND function (weights 1 and 1, Threshold(Y) = 2):
X1 X2 | Y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
OR function (weights 2 and 2, Threshold(Y) = 2):
X1 X2 | Y
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0
AND NOT function (X1 AND NOT X2; weights 2 and −1, Threshold(Y) = 2):
X1 X2 | Y
 1  1 | 0
 1  0 | 1
 0  1 | 0
 0  0 | 0
Y fires (outputs 1) when the weighted sum of the inputs reaches the threshold
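These threshold units are easy to reproduce in code; a minimal sketch:

def mp_neuron(inputs, weights, threshold=2):
    # fires (outputs 1) iff the weighted sum of the inputs reaches the threshold
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# AND: weights (1, 1); OR: weights (2, 2); X1 AND NOT X2: weights (2, -1)
assert mp_neuron((1, 1), (1, 1)) == 1 and mp_neuron((1, 0), (1, 1)) == 0
assert mp_neuron((1, 0), (2, 2)) == 1 and mp_neuron((0, 0), (2, 2)) == 0
assert mp_neuron((1, 0), (2, -1)) == 1 and mp_neuron((1, 1), (2, -1)) == 0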
How do ANNs work?
Processing: the unit sums its inputs
y = X1 + X2 + … + Xm
Output: y
How do ANNs work?
Not all inputs are equal: each input xi (x1, x2, …, xm) carries a weight wi (w1, w2, …, wm), so the unit computes the weighted sum
y = w1·x1 + w2·x2 + … + wm·xm
How do ANNs work?
The signal is not passed down to the next neuron verbatim: the weighted sum v_k = Σ wi·xi is passed through a transfer function f(v_k) (the activation function), giving the output y
The output is a function of the input, affected by the weights and the transfer function
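In code, a single unit's output is just the activation applied to the weighted sum (a sketch; the sigmoid is chosen here as the example transfer function):

import math

def unit_output(xs, ws, f=lambda v: 1 / (1 + math.exp(-v))):
    v = sum(w * x for w, x in zip(ws, xs))   # weighted sum v_k
    return f(v)                              # output y = f(v_k)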
Neural network model (figure)
Inputs (independent variables): Age = 34, Gender = 2, Stage = 4
Weighted edges (e.g., .6, .4, .2, .1, .5, .3, .8, .7) feed hidden-layer units (Σ), whose outputs feed a final Σ unit
Output (dependent variable): 0.6 = "Probability of being Alive"
Independent variables → weights → hidden layer → weights → prediction
"Combined logistic models" (figures)
The same network traced piece by piece: each hidden unit computes a logistic function of the weighted inputs (Age, Gender, Stage), and the output unit combines the hidden units' values into the prediction, the "Probability of being Alive" (0.6)
Not really, no target for hidden units...
(figure: the same network; the training data gives target values only for the output unit, not for the hidden units)
Neural network decision functions (figures): Input layer → Hidden layer → Output
Building a neural network (figures)
Features form the input layer (size M); a hidden layer (size D) feeds the output
In the figures, the hidden layer first has D = M units and is then reduced to D < M
Multi-class output (figure): several output units, one per class, on top of the hidden layer
Artificial Intelligence
Outline
Machine Learning basics
Introduction to Deep Learning
what is Deep Learning
why is it useful
Main components/hyper-parameters:
activation functions
optimizers, cost functions and training
regularization methods
classification vs. regression tasks
Machine learning basics
Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed
Machine Learning (figures)
Training: labeled data → machine learning algorithm → learned model
Prediction: new data → learned model → predicted label (e.g., class A)
Classification (figure): Input → Hidden layer → Output
Training
1. Sample labeled data (a batch)
2. Forward it through the network, get predictions
3. Back-propagate the errors
4. Update the network weights
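These four steps map directly onto a framework training loop; a minimal PyTorch-style sketch (the model, data loader, and hyper-parameters are assumed):

import torch

def train(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:        # 1. sample a batch of labeled data
            pred = model(x)        # 2. forward pass: get predictions
            loss = loss_fn(pred, y)
            opt.zero_grad()
            loss.backward()        # 3. back-propagate the errors
            opt.step()             # 4. update the network weights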
Convolutional neural network (CNN) (figures)
CNN stride and padding (figures; with suitable padding the output has the same size as the input)
CNN layer (figure walkthrough)
Pooling layer (figure)
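The effect of stride and padding on output size follows one formula: for input width W, kernel size K, padding P, and stride S, the output width is ⌊(W − K + 2P)/S⌋ + 1. A small sketch:

def conv_output_size(w, k, p, s):
    # output width for input w, kernel k, padding p, stride s
    return (w - k + 2 * p) // s + 1

# a 3x3 kernel with stride 1 keeps the input size when padding is 1:
assert conv_output_size(32, 3, 1, 1) == 32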
Activation function
Activation: Sigmoid
Takes a real-valued number and "squashes" it into the range between 0 and 1:
σ(x) = 1 / (1 + e^(−x)),  mapping ℝⁿ → [0, 1]ⁿ
Activation: Tanh
Takes a real-valued number and "squashes" it into the range between −1 and 1:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)),  mapping ℝⁿ → [−1, 1]ⁿ
Activation: ReLU
Clips negative values to zero and is the identity for positive ones:
f(x) = max(0, x),  mapping ℝⁿ → ℝ₊ⁿ  (f(x) = x for x > 0)
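The three activations side by side, sketched with NumPy:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes into (-1, 1)

def relu(x):
    return np.maximum(0, x)       # zero for x < 0, identity for x >= 0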
LeNet-5 (figure walkthrough of the architecture)
VGG-16 architecture (figure)
AlexNet architecture (figure)
Software platforms for CNN
Platform: Caffe
Platform: TensorFlow
Platform: PyTorch
Adversarial Search
Game Playing
Chapter 6
Outline
• Games
• Perfect Play
– Minimax decisions
– α-β pruning
• Resource Limits and Approximate Evaluation
• Games of chance
Games
• Algorithm:
1. Generate game tree completely
2. Determine utility of each terminal state
3. Propagate the utility values upward in the tree by applying MIN and MAX operators on the nodes in the current level
4. At the root node use minimax decision to select the move
with the max (of the min) utility value
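The algorithm in recursive form, as a sketch (successors and utility are assumed game-specific functions):

def minimax_value(state, is_terminal, utility, successors, maximizing):
    if is_terminal(state):
        return utility(state)                 # utility of a terminal state
    values = [minimax_value(s, is_terminal, utility, successors, not maximizing)
              for s in successors(state)]     # generate and evaluate the subtree
    return max(values) if maximizing else min(values)   # propagate MIN/MAX upward

def minimax_decision(state, is_terminal, utility, successors):
    # at the root, pick the successor with the max (of the min) utility value
    return max(successors(state),
               key=lambda s: minimax_value(s, is_terminal, utility, successors, False))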
• Evaluation functions use features of the position and weights on those features; these features and weights are not part of the rules of chess, they come from playing experience
Cutting off search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
Examples:
Medical diagnosis --- a physician diagnosing a patient infers the disease based on knowledge acquired as a student, from textbooks, and from prior cases
Key issues:
1- Representation of knowledge
What form? Meaning / semantics?
2- Reasoning and inference processes
Efficiency.
Knowledge-based agents
• Key issues:
– Representation of knowledge knowledge base
– Reasoning processes inference/reasoning
Boolean percept feature values (Stench, Breeze, Glitter, Bump, Scream):
<0, 0, 0, 0, 0>
Exploring a wumpus world (figure walkthrough)
Knowledge and Reasoning
Basic: propositional logic and its inference rules
First-order predicate logic:
Constants
Variables
Predicates
Statement in predicate logic
Quantifiers
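For example (a standard illustration, not from the slides), "all kings are persons" and "some student passed" translate as:

∀x King(x) ⇒ Person(x)
∃x Student(x) ∧ Passed(x)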
Homework
• Bound and free variables
• Translate different sentences into predicate logic and vice versa
Learning from Observations
Outline
• Learning agents
• Inductive learning
• Decision tree learning
Learning
• Learning is essential for unknown environments,
– i.e., when designer lacks omniscience
• Type of feedback:
– Supervised learning: correct answers for each example (e.g.,
give label during training)
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards
Inductive learning
• Simplest form: learn a function from examples, where each example is a pair (x, f(x))
• f is the target function
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
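As a concrete, illustrative instance (using scikit-learn, which the lecture does not prescribe), fitting a decision tree that is consistent with a small training set:

from sklearn.tree import DecisionTreeClassifier

# toy training set: inputs x are (x1, x2) pairs, y holds f(x); here f happens to be OR
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[1, 0]]))   # consistent with the training examples: prints [1]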