AI Final


Local Search

Local search algorithms


 Look at the neighboring states to decide what to do next.
 Do not keep track of the states that have been reached.
 Don't care about the fact that there may be a better solution somewhere else.

Hill-Climbing
 Keeps track of one current state (no backtracking)
 Does not look ahead beyond the immediate neighbors of the current state (greedy)
 On each iteration moves to the neighboring state with highest value (steepest ascent)
 Terminates when a peak is reached (no neighbor has a higher value)
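
A minimal Python sketch of steepest-ascent hill climbing as described above; the objective and neighbors functions are placeholders that a concrete problem would supply:

def hill_climb(start, objective, neighbors):
    # Keep only the current state; no backtracking, no look-ahead beyond the neighbors.
    current = start
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best = max(candidates, key=objective)
        if objective(best) <= objective(current):
            return current              # peak reached: no neighbor has a higher value
        current = best                  # steepest ascent: move to the highest-valued neighbor

# Toy usage (illustrative): maximize f(x) = -(x - 3)^2 over the integers.
print(hill_climb(0, lambda x: -(x - 3) ** 2, lambda x: [x - 1, x + 1]))   # -> 3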

o Local maxima: a peak that is higher than each of its neighboring states but lower than the global maximum.

o Global maximum: best possible state of the state space landscape. Has the highest
value of objective function.

o Current state: state in the diagram where an agent is currently present.

o Flat local maximum: a flat area of the landscape where the current state and all of its neighboring states have the same value.

o Shoulder: It is a plateau region which has an uphill edge.

o Random sideways moves can escape from shoulders, but they loop forever on flat
maxima

o Plateau is a flat area of the state-space landscape (flat local maximum and shoulder)
Problems
 Local Maxima

 Plateaus (flat local maximum or shoulder)


 Ridges (see figure)
 Sequence of local maxima that are not directly
connected
 Each local maximum only has worse connecting states
 Common in low-dimensional state spaces

Improvements
 Allow for a limited number of sideways moves (if on plateau that is really a shoulder)
‣ Higher success rate + Higher number of moves

 Stochastic hill climbing: random selection between the uphill moves, with probability related to steepness.
‣ First-choice hill climbing: random testing of successors until one is found that is better than the current state.

‣ Good strategy when testing all successors is costly

 Random-restart hill climbing: do several hill-climbing searches from randomly selected initial states.
‣ If each hill-climbing search has probability of success p, then a solution will be found on average after 1/p restarts.

‣ Will eventually find a solution, because eventually a goal state will be generated as the initial state
-------------------------
If elevation = objective function -> find the global maximum or highest peak -> hill climbing

If elevation = cost -> find the global minimum or lowest valley -> gradient descent

Simulated Annealing
o Problem with hill climbing: efficient, but will get stuck in a local maximum.
o Problem with a random walk: very inefficient, but will eventually find the global maximum.
o Combination of both -> simulated annealing (complete and more efficient)

 Move to randomly chosen neighbor state


 If utility is higher, always move to that state
 If utility is lower, move to that state with probability p < 1
 Probability of a move to a worse state (see the sketch below)
o Becomes less likely the worse the move makes the situation
o Becomes less likely as the temperature decreases
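
A hedged Python sketch of the scheme above: pick a random neighbor, always accept uphill moves, and accept downhill moves with a probability that shrinks both with how much worse the move is and with the temperature. The exponential acceptance rule and the geometric cooling schedule are common illustrative choices, not the only ones:

import math, random

def simulated_annealing(start, objective, neighbors, t0=1.0, cooling=0.95, steps=10000):
    current, t = start, t0
    for _ in range(steps):
        if t < 1e-9:
            break
        nxt = random.choice(neighbors(current))        # move to a randomly chosen neighbor
        delta = objective(nxt) - objective(current)
        # Uphill: always accept. Downhill: accept with probability exp(delta / t) < 1.
        if delta > 0 or random.random() < math.exp(delta / t):
            current = nxt
        t *= cooling                                   # temperature decreases over time
    return current
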
Local Beam Search
 Selects the k best successors at every step
 If k = 1 -> hill climbing
 If k ≥ 2 -> parallel hill-climbing processes
 Stochastic local beam search (see the sketch below)
 Selects k successors at random at every step
 Probability of selection is a function of utility (aka fitness)
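
A rough sketch of stochastic local beam search: keep k states, pool all their successors, then sample k of them with probability proportional to their fitness (assumed non-negative, with at least one positive value in the pool):

import random

def stochastic_beam_search(starts, fitness, neighbors, k, steps=100):
    population = list(starts)                          # the k current states
    for _ in range(steps):
        pool = [s for state in population for s in neighbors(state)]
        if not pool:
            break
        weights = [fitness(s) for s in pool]           # assumes non-negative fitness values
        population = random.choices(pool, weights=weights, k=k)
    return max(population, key=fitness)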

Genetic Algorithms
 Starts with k randomly selected states (the population)
 Each state (or individual) is encoded as a string

 Each state is rated by an objective function (aka fitness function)

 Two pairs are selected at random, with the probability of selection increasing with fitness
 A crossover point for each pair is chosen at random
 Offspring are created by combining the parents' strings at the crossover point
 Small probability of random mutation (see the sketch below)
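
The steps above, sketched in Python for states encoded as strings over a small alphabet; the fitness function, initial population, and rates are all assumptions of the example:

import random

def genetic_algorithm(population, fitness, generations=100,
                      mutation_rate=0.01, alphabet="0123456789ABCDEF"):
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]     # selection probability grows with fitness
        next_generation = []
        for _ in range(len(population)):
            mum, dad = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, len(mum))            # random crossover point
            child = mum[:cut] + dad[cut:]                  # combine the strings at the cut
            if random.random() < mutation_rate:            # small probability of random mutation
                i = random.randrange(len(child))
                child = child[:i] + random.choice(alphabet) + child[i + 1:]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)
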
Combine
 Uphill tendency and Random exploration
 Exchange of information between search threads
 Biggest advantage comes from the crossover operation (no advantage if the code is initially permuted at random)
 Good applications require careful engineering of the encoding
Summary for Local Search
 For many search problems we do not need the best possible solution, or the best
solution is not achievable at all.

 Local search methods are a useful tool because they operate on complete-state formulations without keeping track of all the states that have been reached.

 Simulated annealing adds a stochastic element to hill climbing and gives optimal solutions in some circumstances.

 Stochastic local beam search provides a first approach to the generation and selection
of states.

 Genetic algorithms maintain a large population of states and use operations such as
mutation and crossover to expand the search space.

The uninformed and informed search algorithms that we have seen are designed to explore
search spaces systematically.
- They keep one or more paths in memory and record which alternatives have been explored at each point along the path.
- When a goal is found, the path to that goal also constitutes a solution to
the problem.
- In many problems, however, the path to the goal is irrelevant.
If the path to the goal does not matter, we might consider a different class of algorithms that
do not worry about paths at all.
Adversarial Search
 How can we design AI systems to play adversarial games?

 Is there a way to let AI play a game optimally?

 Is there a limit to how far ahead AI can look in the game?

 Which techniques are there to look further ahead in the game?

Games most commonly studied in AI are:

 Deterministic (e.g., chess: no randomness in the game)


 Two-player (card and board games)
 Turn-taking (ball catching or kicking games)
 Perfect information (chess, tic-tac-toe)
 Zero-sum (poker)

Approaches to modelling adversarial games


1. Consider the agents together as an economy.

 No need to predict the actions of individual agents.


 Can capture aggregate characteristics of the system, such as the laws of supply and demand.
2. Consider the adversarial agent as part of the environment.

 Models the probabilistic behavior of agents as a dynamic system.


 Does not explicitly take into account that agents may have conflicting goals.

3. Model agents using adversarial game-tree search.


 Explicitly models the other players as adversarial agents.
 Only suitable for specific games.
Two-player zero-sum games
Formalization

State space graph

Tic-Tac-Toe
Ordinary search vs Adversarial search
 In a normal search, such as for the 8-puzzle, we could solve the problem by finding a path to a good end position on our own.
 However, in adversarial search, the other player co-determines the path.
Minimax search

 Two players, MAX, and MIN, take turns in the game.


 MAX must plan ahead against each of MIN’s possible moves (a move by a player is
also called a ply)

 If we are at a terminal node -> utility of terminal node


 If it’s MAX’s turn to move -> maximum of descendant’s utilities
 If it’s MIN’s turn to move -> minimum of descendant’s utilities

MINIMAX search Algorithm

 Depth-first exploration of the tree


 Recursively descends each branch of the tree
 Computes utility for terminal nodes
 Goes back up, assigning minimax value to each node
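
A compact Python sketch of the recursion: depth-first descent, utilities at terminal nodes, maxima backed up on MAX's turns and minima on MIN's. The game interface (is_terminal, utility, successors) is assumed:

def minimax(state, is_terminal, utility, successors, max_to_move=True):
    if is_terminal(state):
        return utility(state)                       # utility of a terminal node
    values = [minimax(s, is_terminal, utility, successors, not max_to_move)
              for s in successors(state)]
    return max(values) if max_to_move else min(values)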

Complexity of MINIMAX

 Time complexity of MINIMAX is exponential: O(b^m)


b = (average) branching factor
m = maximum depth of the tree
 More efficient way to search the game tree -> Alpha-Beta pruning

Alpha-Beta pruning

 α = the value of the best (i.e., highest value) choice we have found so
far at any choice point along the path for MAX. Think: α = “at least.”

 β = the value of the best (i.e., lowest value) choice we have found so
far at any choice point along the path for MIN. Think: β = “at most.”
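
A sketch of minimax with alpha-beta pruning along those lines; alpha ("at least" for MAX) and beta ("at most" for MIN) are passed down, and a branch is cut as soon as alpha >= beta:

import math

def alphabeta(state, is_terminal, utility, successors,
              max_to_move=True, alpha=-math.inf, beta=math.inf):
    if is_terminal(state):
        return utility(state)
    if max_to_move:
        value = -math.inf
        for s in successors(state):
            value = max(value, alphabeta(s, is_terminal, utility, successors, False, alpha, beta))
            alpha = max(alpha, value)               # best ("at least") found for MAX so far
            if alpha >= beta:
                break                               # MIN would never let the game reach here
        return value
    value = math.inf
    for s in successors(state):
        value = min(value, alphabeta(s, is_terminal, utility, successors, True, alpha, beta))
        beta = min(beta, value)                     # best ("at most") found for MIN so far
        if alpha >= beta:
            break                                   # MAX would never let the game reach here
    return value
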
Transposition tables

In games like chess, the same positions can occur as a result of different moves -> this is
called a transposition
 Exploring the search-tree from that point again would be at least double work
 Results of search for positions can be stored in a transposition table
 Lookup from transposition table instead of search
 Chess positions can be converted into unique indexes using special hashing
techniques so that lookup has O(1) time complexity
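
A rough sketch of the idea: store the value computed for a position under a hashable key and look it up in O(1) before searching the subtree again. A plain Python dictionary stands in for the hashing scheme here; real chess programs build the key with Zobrist hashing:

transposition_table = {}        # position key -> previously computed search value

def search_with_table(state, position_key, search_fn):
    key = position_key(state)                  # e.g. a tuple fully describing the position
    if key in transposition_table:             # O(1) lookup instead of repeating the search
        return transposition_table[key]
    value = search_fn(state)                   # fall back to the normal tree search
    transposition_table[key] = value
    return value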

Heuristic strategies
Shannon (1950)
 Type A strategy (historically used for chess)
Consider wide but shallow part of tree and estimate the utility at that point
 Type B strategy (historically used for Go)
Consider promising parts of the tree deeply and ignore unpromising paths

Heuristic Alpha-Beta Tree Search


 Can treat non-terminal nodes as if they were terminal
 Utility function, which is certain, is replaced by an evaluation function, which
provides an estimate
o E.g., queen=9, knight=3, bishop=3, rook=5, pawn=1….
o Typically, a weighted linear function of values
o …. but can be any function of the features
 H-MINIMAX(s, d)
o If the cut-off is reached -> compute the estimated utility of the node (the true utility for terminal nodes)
o If it’s MAX’s turn to move -> maximum of the descendants’ estimated utilities
o If it’s MIN’s turn to move -> minimum of the descendants’ estimated utilities
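
An illustrative weighted linear evaluation of the kind mentioned above, using the listed piece values; the board representation and the depth-based cutoff test are assumptions of the sketch, not a fixed recipe:

PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(material):
    # material: assumed dict piece -> (count for MAX, count for MIN)
    return sum(PIECE_VALUES[p] * (ours - theirs)
               for p, (ours, theirs) in material.items())

def cutoff(state, depth, depth_limit, is_terminal):
    # H-MINIMAX stops at terminal states or once the depth limit is reached.
    return is_terminal(state) or depth >= depth_limit
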
Forward Pruning
 Prune moves that appear to be bad (based on experience)
o Type B strategy
o PROBCUT: Forward pruning version of alpha-beta search that prunes
nodes that are probably outside the window
o Late move reduction reduces the depth to which later moves in the move ordering are searched; it backs this up with a full-depth search if a value above alpha is found
Monte Carlo Tree Search

Summary for Adversarial Search


 Games can be formalized by their initial state, the legal actions, the result of each
action, a terminal test, and a utility function

 The MINIMAX algorithm can determine the optimal moves for two-player, discrete, deterministic, turn-taking, zero-sum games with perfect information

 Alpha-beta pruning can remove subtrees that are provably irrelevant

 Heuristic evaluation functions must be used when the entire game-tree cannot be
explored (i.e., when the utility of the terminal nodes can’t be computed)

 Monte-Carlo tree search is an alternative which plays-out entire games repeatedly and
chooses the next move based on the proportion of winning playouts
Problem Solving Under Uncertainty

How can we build machines that can handle the uncertainty of the natural world?
(“In this world nothing can be said to be certain, except death and taxes” – Benjamin Franklin, 1789)
It is a valid system of making predictions, but is it a good one?
What makes a good model?

Good models make errors because there is often noise in the data, and we don’t want our model to capture all that noise. The black lines are the errors.
Q1: degree-25 polynomial Q2: degree-3 polynomial

How can we build machines that can handle the uncertainty of the natural world?

If we were making a prediction, we would conclude that he now has over $30 billion, but that is not the case. His fund didn’t do well during the pandemic.
Quantifying Uncertainty
Acting under Uncertainty

Rational agents with perfect knowledge of the environment


 Can find an optimal solution by exploring the complete environment
 Can find a good (but perhaps suboptimal) solution by exploring part of the environment using
heuristics

What should agents do if they don’t have perfect information?


 Maximize performance by keeping track of the relative importance of different outcomes and the likelihood that each outcome will be achieved. (Blackjack example)

Most of the time we use deductive reasoning, but logic is insufficient


 Toothache -> cavity
 Toothache-> cavity v gum problem
 Toothache-> cavity v gum problem v abscess
 Toothache-> cavity v gum problem v abscess v …...

 Cavity -> toothache

Only an exhaustive list of possibilities on the right side will make the rule true.

Why is logic insufficient?


 Laziness: it’s too much work to make and use the rules.
 Theoretical Ignorance: we don’t know everything there is to know.
 Practical Ignorance: we don’t have access to all the information
 In that case we replace certainty (logic) with degree of belief (probability)

Probability Theory
 Probability statements are usually made with regard to a knowledge state.
 Actual state: patient has a cavity or patient does not have a cavity
 Knowledge state: probability that the patient has a cavity if we haven’t observed her yet.

Decision theory = probability theory + utility theory

Principle of maximum expected utility (MEU)


 An agent is rational if and only if it chooses the action that yields the highest expected utility.

Expected = average of outcome utilities, weighted by probability of the outcome


Example: Choose between an 80% chance of getting 4000$ and a 100% chance of getting 3000$

Utility | Probability
4000$   | 0.8
3000$   | 1

Answer:

Utility | Probability | Expected utility
4000$   | 0.8         | 4000 * 0.8 + 0.2 * 0 = 3200$
3000$   | 1           | 3000 * 1 = 3000$
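
The same comparison as a two-line computation, with each lottery written as (probability, utility) pairs:

def expected_utility(lottery):
    return sum(p * u for p, u in lottery)       # outcomes weighted by their probability

print(expected_utility([(0.8, 4000), (0.2, 0)]))   # 3200.0  (80% chance of 4000$)
print(expected_utility([(1.0, 3000)]))             # 3000.0  (certain 3000$) -> MEU picks the first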

Probability Terminological Map

Possible worlds

Example: 1) When you throw two dice, the set of possible outcomes are the possible worlds of throwing the dice.
2) All possible configurations of a chessboard would be the possible worlds of the game.
 In statistics and AI, we use the term “possible worlds” to refer to the possible states of whatever we are trying to represent. The term “world” is limited to the problem we are trying to represent.

 A possible world (ω lowercase omega) is a state that the world could be in.

 A set of possible worlds (Ω, capital omega) includes all the states that the world could be in. Ω must be exhaustive.
 Each possible world must be different from all the other possible worlds. Worlds must be mutually exclusive.
Set of all possible worlds = sample space = Ω

Ω = {(1,1), (1,2), …, (6,5), (6,6)}

Possible world = element of the sample space = ω


ω1 = (1,1) ω36 = (6,6)

Events
 Set of worlds in which a proposition holds
 Probability of an event: sum of probabilities of the worlds in which a proposition holds
Example:
 Proposition: rolling 11 with two dice
P (total= 11)
 Event: set of worlds in which the proposition holds
{(5,6), (6,5)}
 Probability of event
P((5,6)) + P((6,5)) = 1/36 + 1/36 = 1/18
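
The same event probability, obtained by enumerating the 36 equally likely possible worlds:

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))       # sample space: all 36 possible worlds
event = [w for w in omega if sum(w) == 11]         # worlds in which Total = 11 holds
print(event, Fraction(len(event), len(omega)))     # [(5, 6), (6, 5)] 1/18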

Conditional and Unconditional Probabilities

Unconditional probabilities: Degree of belief in propositions in the absence of other information
 Also known as prior probabilities or priors

Conditional probabilities: Degree of belief given other information


 Example: rolling a double if the first dice is 5
P (double | Dice1 = 5)

Conditional Probabilities

P (double | Dice1 = 5) = P (double ^ Dice1 = 5) / P (Dice1 = 5)

The product rule:

P (a | b) = P (a ^ b) / P (b)

implies

P (a ^ b) = P (a | b) P (b)
Random Variables

 Function that maps from a set of possible worlds to a domain or range


 Always uppercase

Example: the random variable Total is defined as the sum of throwing two dice
 Possible worlds: {(1,1), (1,2), …, (6,6)}
 Domain or range: {2, 3, 4, …, 12}

Domains of a Random Variable

 Boolean: {true, false}

A= true is written as a
A = false is written as -a

 Arbitrary: {blonde, brown, black, red}


A= blonde, written as blonde

 Infinite and discrete: A=Z (set of integers)


 Infinite and continuous: A=R (set of real numbers)

Joint Probabilities

P (Toothache ^ Cavity) = P (Toothache | Cavity) * P (Cavity)

 Boldface P means “for all possible values of the random variable”


 A probability model is completely determined by the joint distribution for all the
random variables.

Example: P (Cavity, Toothache, Catch) = 2x2x2 table (all three variables are Boolean)

Probability Axioms

from that we can derive:


 Complement of a proposition and its negation
P (-a) = 1 – P (a)
 Inclusion–exclusion principle
P (a v b) = P (a) + P (b) – P (a ^ b)
Probabilistic Inference

P (cavity v toothache) =?

cavity – rectangle
toothache - circle

P (cavity v (OR) toothache) = (0.108 + 0.012 + 0.072 + 0.008) + (0.108 + 0.012 + 0.016 + 0.064) – (0.108 + 0.012) = 0.28

Extracting unconditional probabilities (marginalization)

P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

P (toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2


P (catch) = 0.108 + 0.016 + 0.072 + 0.144 = 0.34
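
These marginalizations can be checked mechanically from the full joint distribution. The sketch below uses the seven table entries that appear in the calculations above; the eighth entry, for (¬toothache, ¬catch, ¬cavity), is taken as 0.576 so that all entries sum to 1:

# Full joint distribution, keyed by (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,   # remainder so the table sums to 1
}

def prob(condition):
    # Marginalization: sum the probabilities of the worlds in which the condition holds.
    return sum(p for world, p in joint.items() if condition(world))

print(prob(lambda w: w[2]))                 # P(cavity)             -> 0.2 (up to float rounding)
print(prob(lambda w: w[0]))                 # P(toothache)          -> 0.2
print(prob(lambda w: w[1]))                 # P(catch)              -> 0.34
print(prob(lambda w: w[2] or w[0]))         # P(cavity v toothache) -> 0.28
print(prob(lambda w: w[2] and w[0]) / prob(lambda w: w[0]))   # P(cavity | toothache) -> 0.6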

Conditioning

Computing conditional probabilities


Normalization
General inference procedure

Independence

The cavity has no influence on the weather and, on the other hand, the weather outside has nothing to do with whether there is a cavity.

Weather is considered independent of all these things.

The variables are independent of each other, so we don’t need the weather in our table for making the calculations.
 Assumptions about independence are usually based on domain knowledge.
 Independence drastically reduces the amount of information needed to specify the full
joint distribution

For instance: rolling 5 dice


 Full joint distribution: 6^5 = 7776 entries
 Five single-variable distributions: 5 * 6 = 30 entries

Conditional Independence

P (X, Y | Z) = P (X | Z) * P (Y | Z)

Example:
 Catch and toothache are not independent: if the probe catches, then it is likely that the
tooth has a cavity, and that this cavity causes a toothache.

 However, toothache and catch are independent, given the presence or absence of a
cavity.
o If a cavity is present, then whether there is a toothache does not depend on whether the probe catches, and vice versa.

o If a cavity is not present, then whether there is a toothache likewise does not depend on whether the probe catches, and vice versa.

P (toothache, catch | cavity) = P (toothache | cavity) * P (catch | cavity)

Bayes’ Rule – derivation

 Product Rule
o P (a ^ b) = P (a | b) * P (b)
o P (a ^ b) = P (b | a) * P (a)

 Bayes Rule
o P (b | a) = P (a | b) * P (b) / P (a)

o Useful when you have an estimate for three of the four terms and you need to compute the fourth
Bayes’ Rule

Determining the probability of a cause given a certain effect (diagnosis)

o Example: what is the probability that you ate a magic mushroom if you are
hallucinating?

P (magic mushroom | hallucination) = P (hallucination | magic mushroom) * P (magic mushroom) / P (hallucination)

Magic mushrooms cause hallucinations 70% of the time. The prior probability that someone
ate magic mushroom for lunch is 1/50,000. The prior probability that someone who comes
into the hospital is hallucinating is 1%.

P (magic mushroom | hallucination) = (0.7 * 0.00002) / 0.01 = 0.0014

o Example 2: What is the probability that you will hallucinate when you eat a magic mushroom?

P (hallucination | magic mushroom) = P (magic mushroom | hallucination) * P (hallucination) / P (magic mushroom)

A devoted researcher does experiment with psychoactive substances once a month. Over the
course of the last year, the researcher did 12 experiments. The researcher hallucinated 9 times
out of 12. Out of 10 times the researcher was hallucinating, two were attributable to magic
mushroom use. Half of the experiments involved magic mushrooms.

12 experiments; 9/12 (0.75) hallucinated; 2/10 (0.2) of the hallucinations were because of magic mushrooms; 6/12 (0.5) of the experiments involved magic mushrooms

P (hallucination | magic mushroom) = (0.2 * 0.75) / 0.5 = 0.3
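
Both examples are single applications of Bayes’ rule; a tiny helper makes the arithmetic explicit (the numbers are the ones given above):

def bayes(p_b_given_a, p_a, p_b):
    # P(a | b) = P(b | a) * P(a) / P(b)
    return p_b_given_a * p_a / p_b

# Example 1: a = magic mushroom, b = hallucination.
print(bayes(0.7, 1 / 50000, 0.01))    # 0.0014
# Example 2: a = hallucination, b = magic mushroom.
print(bayes(0.2, 0.75, 0.5))          # 0.3
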
Scaling up inference?

Summary
Practical 4 – Local & Adversarial Search 2

1.MINIMAX

Determine the node values of MAX’s last move

1st step

2nd step
2.Alpha-Beta Pruning 1

3.Alpha-Beta Pruning 2 (Same as the tree with minimax, different search strategy)
4.Genetic Algorithms
o A genetic algorithm (GA) is a variant of stochastic beam search in which successors are generated by combining two parent states rather than by modifying a single state
o Starts with k randomly selected states (the population)
o Each state (or individual) is encoded as a string

o Each state is rated by the objective function (aka fitness function)


o Two pairs are selected at random with probability of selection increasing with fitness.
o Crossover point for each pair is chosen at random
o Offspring are created by combining strings at crossover point
o Small probability of random mutation

Examples:
Probabilistic Reasoning / Bayesian Networks
Graphs/Networks

Mathematical structures used to model pairwise relations between objects (any edge in the graph tells us something about how two nodes are connected)

Types
Acyclic graph: there is no cycle, so by following the edges you cannot end up back at the same point. Cyclic graph: there is at least one cycle in the graph, and if you decide to follow it you will end up back at the starting point.

A directed, connected graph without directed cycles is a tree. It does not need to look like a tree when drawn.

Paths, Trails, and Walks

Real-life examples, organized by how the “thing” moves along an edge:

 Transfer: when you have something in a node and you follow an edge, the thing you had goes to the new node (it is gone from the old one).
 Serial duplication: when you have something in a node, by following the edge it also appears in the other node (from one node to one other node at a time, not to many others).
 Parallel duplication: when you have something in a node, by following the edge it also appears in the other nodes (from one node to many others at once).

 Shortest path (unique nodes and edges): package delivery (transfer); mitosis, i.e. cell division (duplication).
 Path (unique nodes and edges): hand-me-down clothing (transfer); sexually transmitted diseases (serial duplication); web server (parallel duplication).
 Trail (unique edges): book lending (transfer); gossip (serial duplication); chain letters (parallel duplication).
 Walk (no restrictions): money exchange (transfer); emotional support (serial duplication); ideology transfer (parallel duplication).

Notes on the examples:
 Hand-me-down clothing: the previous owner won’t wear these clothes again.
 Sexually transmitted diseases: a person can transmit the disease to one person at a time; once you have it and transmit it to someone else, you still have it, but you cannot get it a second time.
 Web server: whatever is served from the web server to your PC remains on the web server, and it can be transmitted to many, many different machines at once.
 Gossip: when you hear a gossip, you share it with someone new.
 Chain letters: any kind of scam that involves sending letters to a bunch of people, promising that you will get money in return if you pay money to the previous person.
 Money exchange: you can exchange money with the same person and then do the reverse transaction any way you want.
 Emotional support: someone makes you feel better, and you make someone else feel better; there is no restriction on the way it spreads.
 Ideology transfer: it can be done by talking to many different people at once, and there are no restrictions on the way the message is transferred. It is parallel duplication because when I believe something and I tell others what it is, it is not gone from me.
Bayesian Network

o How do we decide which variables have a direct influence on each other?


- There is no easy way to do it. It relies on what is called domain knowledge: the expertise of somebody else, or your own expertise, to define which variables have an influence on other variables and which do not.

- We use common-sense inference to say that the weather has no influence on the cavity, the catch, or the toothache; on the other hand, we think that whether or not there is a cavity has an influence on the probability of a toothache, and the same for the catch. But we do not think that the toothache directly influences the catch.

- Making these kinds of inferences and assumptions is usual when you build a Bayesian network, and that is what is called domain knowledge.
Example:

We want to know the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both John and Mary call.
P(a, -b, -e, j, m)
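
For this query the joint entry factorizes over the network as P(a, ¬b, ¬e, j, m) = P(¬b) · P(¬e) · P(a | ¬b, ¬e) · P(j | a) · P(m | a). As a hedged numeric sketch, the conditional probability table values below are the ones used in the standard textbook version of this burglary network; if the lecture used different numbers, only the constants change:

# Assumed CPT values (standard textbook burglary network).
p_b, p_e = 0.001, 0.002                  # P(burglary), P(earthquake)
p_a_given_nb_ne = 0.001                  # P(alarm | no burglary, no earthquake)
p_j_given_a, p_m_given_a = 0.90, 0.70    # P(John calls | alarm), P(Mary calls | alarm)

p = (1 - p_b) * (1 - p_e) * p_a_given_nb_ne * p_j_given_a * p_m_given_a
print(p)    # about 0.000628
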
Constructing Bayesian Network
Two ways to understand the semantics of a Bayesian Network

 As an encoding of conditional independence statements. Useful to design inference


procedure (i.e., answer questions about the probability of events)

 As a representation of the joint probability distribution. Useful to construct networks

Chain Rule
Constructing Bayesian Network

Correctly constructed network = DIRECTED ACYCLIC GRAPH, in which each node is directly influenced only by its parents (earlier nodes that are not parents have no influence on it).

*1 would be wrong if there was a connection between 1 and 4

Why is 2 wrong? -the graph is not fully directed (no connection between weather and the
other nodes)
Representing conditional probability tables more efficiently

Most relationships between parent nodes and their descendants are not completely arbitrary.

Deterministic nodes
 Value of the node is specified exactly by the values of its parents, with no uncertainty
 Logic is sufficient in this case; we do not need probability.
 We do not create probability tables if
there is an easier way.
Dealing with continuous variables:
 Discretization (e.g., split up temperature in low, medium and high)

Summary
Probabilistic Reasoning 1

P(c0) =?
P(-r1) =?
P(c1) =?
P (c1 | -c0) =?
P (-c1 | c0) =?
P (r1 | c0) =?
P (-r1 | -c0) =?

P(c0) =?
P(c0) = 0.171 + 0.076 + 0.189 + 0.126 = 0.562

P(-r1) =?
P(-r1) = 0.189 + 0.126 + 0.075 + 0.258 = 0.648
P(c1) =?
P(c1) = 0.171 + 0.189 + 0.087 + 0.075 = 0.522

P (c1 | -c0) =?
P (c1 | -c0) = P (c1 ^ -c0) / P (-c0)
P (c0) = 0.562 => P (-c0) = 1 – 0.562 = 0.438

P (c1 ^ -c0) = 0.087 + 0.075 = 0.162

P (c1 | -c0) = P (c1 ^ -c0) / P (-c0) = 0.162 / 0.438 = 0.370

P (-c1 | c0) =?
P (-c1 | c0) = P (-c1 ^ c0) / P (c0)
P (c0) = 0.562

P (-c1 ^ c0) = 0.076 + 0.126 = 0.202

P (-c1 | c0) = P (-c1 ^ c0) / P (c0) = 0.202 / 0.562 = 0.359

P (r1 | c0) =?
P (r1 | c0) = P (r1 ^ c0) / P(c0)
P (c0) = 0.562

P (r1 ^ c0) = 0.171 + 0.076 = 0.247

P (r1 | c0) = P (r1 ^ c0) / P(c0) = 0.247 / 0.562 =0.440


P (-r1 | -c0) =?
P (-r1 | -c0) = P (-r1 ^ -c0) / P (-c0)
P (-c0) = 1 – 0.562 = 0.438

P (-r1 ^ -c0) = 0.075 + 0.258 = 0.333

P (-r1 | -c0) = P (-r1 ^ -c0) / P (-c0) = 0.333 / 0.438 = 0.760

----------------------------------------------------------------------------------------------------------------

P (-c1 | m0) =?
P (c1 | (c0 ^ r0)) =?
P (c1 | (h0 ^ -r1)) =?
P (-c1 | (-c0 ^ l0)) =?
P (-c1 | m0) =?

P (c1 | (c0 ^ r0)) =?

P (c1 | (h0 ^ -r1)) =?

P (c1 ^ h0 ^ -r1) = 0.007 + 0.000 + 0.003 + 0.004 = 0.014


P (h0 ^ -r1) = 0.041 + 0.003 + 0.011 + 0.008 + 0.007 + 0.000 + 0.003 + 0.004 = 0.077

P (c1 | (h0 ^ -r1)) = 0.014 / 0.077 ≈ 0.182

P (-c1 | (-c0 ^ l0)) =?

P (-c1 ^ -c0 ^ l0) = 0.179 + 0.013 + 0.008 + 0.002 = 0.202


P (-c0 ^ l0) = 0.179 + 0.013 + 0.008 + 0.002 + 0.063 + 0.069 + 0.005 + 0.002 = 0.341

P (-c1 | (-c0 ^ l0)) = 0.202 / 0.341 ≈ 0.592
------------------------------------------------------------------------------------------------------
Probabilistic Reasoning 2
Discussion: Searle

Connectionism (symbolic vs. subsymbolic AI, parallel distributed processing) = a movement in


cognitive science that hopes to explain intellectual abilities using artificial neural networks. Symbolic
AI = the term for the collection of all methods in artificial intelligence research that are based on
high-level symbolic (human-readable) representations of problems, logic and search. In subsymbolic AI, an implicit representation is derived from learning from experience, with no symbolic representation of rules and properties. Parallel distributed processing = a type of information
processing where large numbers of computing units perform their calculations simultaneously.
Computational units can receive and process multiple inputs and transmit multiple outputs.
Syntax = arrangement of words and phrases to make well-formed sentences in a language.

Semantics = the meaning of words and how to combine words into meaningful phrases and
sentences.

Behaviourism (and how it’s implicit in Strong-AI) = the theory that human and animal behaviour can
be explained in terms of conditioning, without appeal to thoughts or feelings. Behaviour is seen as
the outward expression of what goes on in the mind, so, according to the strong AI view, if a
machine behaves as if it has intelligence/a mind, then it must be intelligent/have a mind.

Searle’s main points

1. Some AI researchers believe that by finding the right program they will create a thinking,
conscious machine.
2. Searle’s Chinese room argument: Running the right program is not sufficient for a thinking
machine.
3. The Chinese room runs the right program, but has no understanding of Chinese.
4. Searle is not arguing against the possibility of creating a thinking machine, he is arguing
against the idea that doing this is merely a matter of coming up with the right program.
5. If we are to construct thinking machines with consciousness, we also need to consider the
nature of the machinery that runs the program.
6. What kind of machinery do we need?
7. Will it ever be possible to “measure” consciousness?

4 counter arguments

1: the systems reply

The person in the room doesn’t understand Chinese, but the system as a whole does understand
Chinese. Searle is playing the role of a CPU, but the system has other components like a memory etc.
E.g., we don’t say the brain understands and feels love, we say that people understand and feel love.

• Searle’s response: The person in the room could internalize the whole system, and would still not
understand Chinese.

2: the robot reply

The person in the room doesn’t understand Chinese, but if the system were connected to the world
like a robot, with sensors etc, then it would understand Chinese. This would establish a causal
connection between the world and the structures being manipulated.

• Searle responds: All these sensors provide is information. There is no difference between this
information and information passed into the room in the form of questions.
3: the brain simulator reply

What if the program precisely simulated the brain of a Chinese speaker, including the neural architecture and the state of every neuron? Then the system would understand Chinese.

• Searle responds: Whatever system the person in the room is simulating, it will still only be a
simulation.

4: the other minds reply

The only way we attribute understanding to other people is through their behaviour. There is no
other way. Therefore, we must decide if we attribute understanding to machines in the same way,
only through their behaviour.

 Searle responds: “The problem in this discussion is not about how I know that other people
have cognitive states, but rather what it is that I am attributing to them when I attribute
cognitive states to them.”
 There is a difference: We know machines are just manipulating symbols without knowing
what they mean, but we are not sure about people.

Overview: is this a well-formed question?

 Do we have a single example of an agent that we know to be conscious?


 Is there any scientific way in which we could distinguish conscious and non-conscious agents? Strong AI vs. weak AI: an inconsistency?

Searle considers 7 replies to the Chinese Room argument (points a-g on


pages 29-30). Summarize and elaborate on each view in one paragraph,
as well as explaining why the group agrees with (1) Searle, (2) the
opposing view, (3) both views, or (4) neither view.

1-Understanding something comes from the system around us and it


actually doesn’t entail the person realizing that they understand it. We
believe that this is close to impossible because we view ‘understanding’ as
actually applying thought processes and consideration. To sum it up, in order
to understand something we need to actively reflect on it.
2 -Understanding of the surrounding system is innate and unconscious
(happens via internal subsystems). This opinion seems plausible as people
possess innate knowledge. For example individuals do not learn that they are
thirsty, they just know it.
3- Understanding the system isn’t the main objective, we’re a mere
contribution to the system making it whole and giving it meaning. We agree
with this opinion, as cooperation of systems brings a better understanding of the system since different standpoints are presented.
4- Computation is strictly analogical. Complicated brain processes can be
reduced to formal symbol manipulation in machines. We disagree with this
statement as the brain is much more complicated, for example it can process
emotions, thoughts, etc.
5 - Programs are conscious. This statement seems a little bit too far-fetched, at least given all the technology available nowadays. Yes, there is a
theory of emerging agents (using the trial-error learning method for
example) , but seems implausible as intelligent agents will (for now at least)
need a little ‘push’ and continuous support and supervision from humans.
6-” F” talks about how computers could attach meaning to the symbols
(semantics) and therefore act upon the environmental stimuli. These
processes, however, are still rule-governed (and there isn’t true randomness
in computer systems as well), therefore we agree with Searle.
7- Knowledge can be replicated by ‘imitating’ the system. We agree since
knowledge is represented by mere facts and intelligence is the one thing that
for now at least cannot be replicated.

In what senses, if any, does Searle's argument pose a problem to AI?


Your group should compose a single response to this question in no
more than two paragraphs. If group members disagree, describe the
disagreement. For the sake of this exercise, we encourage
disagreement!

Searle points out that simply running a computer program on a symbol-manipulating device is not enough to guarantee cognition, and precisely hits
the mark. A potential problem with this is that one could never achieve true
AI, as the human brain is too sophisticated a unit and duplication seems
implausible for the time being.

Discussion: Ethics
Two main concerns

1. Protecting us from AI: As AI penetrates deeper into society, what ethical and moral issues
does this pose?
2. Protecting the rights of AI systems: Should AI systems have moral status, and if so, when and
why? What are the implications?

Three scenarios

1. The current scenario: Approaching Artificial General Intelligence.


2. The scenario when we attribute moral status to machines.
3. The scenario when minds with “exotic properties” exist.

Scenario 1: approaching artificial general intelligence (AGI)

 Because AGI aims at general abilities, AI systems of the future are likely to carry out tasks
that we didn’t design them for. Will they behave ethically when carrying out these tasks?
 The moral/ethical implications of AGI systems need to be verified before they are deployed.
How can we do this? The systems must somehow think in the same way that trustworthy
designer would.
 Ethical cognitive considerations need to be made part of the engineering problem, rather
than being considered as an afterthought.

Scenario 2: machines with moral status

Two properties seem relevant when attributing moral status:

1. Sentience: The ability to feel.


2. Sapience: The ability to think, reason, and be self-aware.

Do both need to be established for moral status?

Non-discrimination principles when attributing moral status:

 Principle of substrate non-discrimination: All else being equal, an agent's sentience or sapience should be judged independently of the physical substrate on which it is implemented.
 Principle of ontogeny non-discrimination: All else being equal, an agent's sentience or sapience should be judged independently of the process that created the agent.

Scenario 3: minds with exotic properties

We need to be open-minded about what kinds of systems might possess sentience and sapience. The notions of morality and ethics have always evolved, fitting the concerns of the time. This is likely to continue, and AI may play a significant role in shaping future notions of ethics and morality.

Two exotic properties:

 Objective vs. subjective time: Should machines that think faster than us go to prison for a
shorter period of time?
 Accelerated reproduction: Should machines that reproduce faster than others be subject to
different moral codes?
Bostrom and Yudkowsky propose how we should decide if a machine
should have moral status. Summarize and elaborate on their proposal in
one paragraph. In addition, summarize in one paragraph why the group
agrees, or disagrees, with this proposal.

Two criteria are commonly proposed as being importantly linked to moral status,
either separately or in combination: sentience and sapience (or personhood).
These may be characterized roughly as follows:

 Sentience: the capacity for phenomenal experience or qualia, such as


the capacity to feel pain and suffer.
 Sapience: a set of capacities associated with higher intelligence, such as
self- awareness and being a reason-responsive agent.

We strongly agree with this proposal, simply because there is no other criteria
relevant to have a moral status.

Bostrom and Yudkowsky propose three principles (defined on pages 7,


8, and 11). Summarize and elaborate on each of these principles, using
one paragraph for each principle. In addition, for each of these
principles, summarize one argument for, and one argument against, the
principle.

 Principle of Substrate Non-Discrimination

“If two beings have the same functionality and the same conscious experience,
and differ only in the substrate of their implementation, then they have the
same moral status.”

Summary: AI and humans can have the same conscious experience. The only
difference between them is the material they are composed of.

For: The precondition of same functionality and same conscious experience are
required to make sure that the two entities have at least some common
propensities and idea of how the world is. This argument implies that difference
in moral status does not arise simply due to difference in substrate. If we have
the presupposition that the theory of mind we use to judge whether other
humans have a conscious experience similar to us does not necessarily depend
upon their physical appearance, then the same is true for entities other than
humans if we have judged that they have similar conscious experience. If a
human consciousness is uploaded into a computer chip but still has the same
conscious experience judged by the functionality and propensities that they
have, then the same is true for an AI. Considerations other than moral status can
be used to differentiate, like we use among humans (family members vs
strangers), simply not the moral status itself.

Against: We will never know whether an AI has the same conscious experience as a human, because we would need to be the AI in order to experience what it is experiencing. So it is impossible to compare experiences.

In addition, even two people with the same functionality and conscious experience may not necessarily have the same morals, regardless of how those are put into practice. A machine can encode an algorithm that is implicitly similar to human functionality and conscious experience, but it is still an algorithm; at the end of the day it is just matrix multiplication (no consciousness, although we are unable to clearly define consciousness). The authors who decide how the AI conceives rules and moralities are conscious entities who carefully consider their reactions and fully comprehend the inputs. The AI in this case is simply acting: even though it is “physically or consciously” there, a conscious being has already responded to those inputs morally.

 Principle of Ontogeny Non-Discrimination

“If two beings have the same functionality and the same consciousness
experience, and differ only in how they came into existence, then they have the
same moral status.”

Summary: A being’s moral status is not affected by how it came into existence.
The moral standing of that being is not undermined, reduced, or altered by
deliberate design. So, the Principle of Ontogeny Non-Discrimination is consistent
with the claim that the creators or owners of an AI system with moral status may
have special duties to their artificial mind which they do not have to another
artificial mind, even if the minds in question are qualitatively similar and have
the same moral status.

For: If two beings have the same conscious experience and functionality, then they have the same quality of subjective experience and even the same possible uses. Although they differ in how they came into existence, they have the same moral status. As an example: if we made a human clone, the clone would not remember its birth, just as we do not, and it would have exactly the same functions as we do. Therefore the clone could think it is human. So if the only difference we could find were the way of birth, we could argue about whether the clone is human, but we would have to assign it the same moral status.

Against: AI shouldn’t have a moral status because it, while it can be argued it
could experience reality the same way humans do, does not have a brain, which
is responsible for consciousness and intelligence. People do not oppose causal
factors such as assisted delivery, in vitro fertilization, etc. in humans when
deciding whether new humans deserve a moral status, but they possess a brain
which is the main prerequisite both for sentience and sapience. An AI cannot
have a moral status in my opinion. First of all, it cannot reproduce emotions like
humans, they don’t feel pain, regret or sorrow after being done wrong for
example. Secondly AI machines don’t know what is morally wrong or right. A
human can for example deduct that stealing is wrong, but a machine cannot
decide that for itself, without human intervention.

 Principle of Subjective Rate of Time

“In cases where the duration of an experience is of basic normative significance,


it is the experience’s subjective duration that counts.”

Summary: The idea of whole brain emulation or “uploading” is the foundation


of the principle of the subjective rate of time. “Uploading” refers to the idea that one day a technology will be invented that is able to transfer human intelligence from its organic form onto a digital computer. If uploaded to a faster
computer the consciousness will perceive the external world as if it were slowed
down and therefore develop a subjective sense of time that is moving faster than
the actual time. From there arises the question when judging time should we use
the subjective perspective that the uploads have or should we use the objective
perspective that we all perceive.

For: I agree that humans and computers experience time differently and this
time of ‘reflection’ sometimes is crucial. AI could experience time differently
from humans, at least in its uploading and processing of information (which it
does much faster compared to a human being). For example, humans are put
into prison to reflect on what they have done. This entails that fairness requires
us to take subjective experiences into account.

Against: Time should not be viewed subjectively. Humans and AI may experience


time subjectively differently, but time can also be viewed subjectively differently
between humans. "Time flies when you're having fun", fits this well. Besides, it is
not possible to measure something subjective objectively. So it will be difficult to
take subjective experiences into account. Because of the experiential difference
of time between people themselves and between AI, it would be better to
assume one objective time.

MOCK

1.Given the figure below, indicate from which starting states the global maximum will be reached
with the hill-climbing algorithm

Answer: C, D, A, B

2. Given no mutations and a crossover point of 4 (after the fourth character), give the strings for the
following ancestors and descendants.

Answer: A2: A3CE42 A3: 049AFF


Y2: A3CEA0 Y3: 049A3D

3. Given the following game tree for a two-player game, match nodes J, L, O, R to their correct
value, applying the MINIMAX algorithm

Answer:
J=2
L=8
O = 12
R = 18
4.Given the following game tree with utility values for
the terminal nodes, indicate which nodes will be
pruned using MINIMAX with alpha-beta pruning.

Answer: L, M, P
Neither

Neither

Neither

Searle

A critic
Local and Adversarial Search 1

State Space

Suppose that we want to sort the unsorted sequence using steepest descent: A-D-C-B

 The next states can be reached by swapping two neighbouring characters


 The cost function is the total number of steps the characters are out of place
 The cost of A-D-C-B is 0 + 2 + 0 + 2 = 4
 Our goal is to minimize the cost: A-B-C-D has a cost of 0, because no characters are out of
place
 Which actions are available from the initial state?
 What is the cost of each of these resulting states?

What is the result of applying the steepest descent algorithm


to this problem?
• The cost in our initial state is 4
• The cost of the neighbouring states are 6, 4, and 4
• None of the neighbours can improve the cost
• We cannot make a move

Find the result of applying the steepest descent algorithm, allowing moves to another state with equal cost (sideways moves). A sketch of this exercise follows below.
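
A Python sketch of this exercise; the cost function and adjacent-swap neighbours follow the description above, and the limit of 10 sideways moves is an arbitrary choice to avoid looping forever:

import random

def cost(seq, target="ABCD"):
    # Total number of steps the characters are out of place.
    return sum(abs(i - target.index(ch)) for i, ch in enumerate(seq))

def neighbors(seq):
    # Swap two neighbouring characters.
    return [seq[:i] + seq[i + 1] + seq[i] + seq[i + 2:] for i in range(len(seq) - 1)]

def steepest_descent(seq, max_sideways=10):
    sideways = 0
    while True:
        best = min(neighbors(seq), key=cost)
        if cost(best) > cost(seq):
            return seq                            # every neighbour is worse: stuck
        if cost(best) == cost(seq):               # sideways move (equal cost)
            sideways += 1
            if sideways > max_sideways:
                return seq
            best = random.choice([n for n in neighbors(seq) if cost(n) == cost(seq)])
        else:
            sideways = 0
        seq = best

print(cost("ADCB"))                               # 4
print([cost(n) for n in neighbors("ADCB")])       # [6, 4, 4]
print(steepest_descent("ADCB"))                   # with sideways moves this reaches ABCD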
