AI Final
Hill-Climbing
Keeps track of one current state (no backtracking)
Does not look ahead beyond the immediate neighbors of the current state (greedy)
On each iteration moves to the neighboring state with highest value (steepest ascent)
Terminates when a peak is reached (no neighbor has a higher value)
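A minimal steepest-ascent hill-climbing sketch in Python; the neighbors and value functions are hypothetical, problem-specific inputs rather than anything from the slides:

def hill_climbing(start, neighbors, value):
    """Steepest-ascent hill climbing: keep one current state, no backtracking.

    `neighbors(state)` and `value(state)` are assumed, problem-specific callables.
    """
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        # Terminate at a peak: no neighbor has a strictly higher value.
        if best is None or value(best) <= value(current):
            return current
        current = best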
o Global maximum: best possible state of the state space landscape. Has the highest
value of objective function.
o Flat local maximum: a flat area of the landscape where all neighboring states have the
same value as the current state.
o Random sideways moves can escape from shoulders, but they loop forever on flat
maxima
o Plateau is a flat area of the state-space landscape (flat local maximum and shoulder)
Problems
Local Maxima
Improvements
Allow for a limited number of sideways moves (if on plateau that is really a shoulder)
‣ Higher success rate + Higher number of moves
Stochastic hill climbing: random selection among the uphill moves, with probability
related to steepness.
‣ First-choice hill climbing: random generation of successors until one is found that is
better than the current state.
‣ Random-restart hill climbing: will eventually find a solution, because eventually a goal state will be generated as the initial state
-------------------------
If elevation = objective function -> find the global maximum or highest peak -> hill climbing
If elevation = cost -> find the global minimum or lowest valley -> gradient descent
Simulated Annealing
o Problem with hill climbing: efficient but will get stuck in a local maximum.
o Problem with random walk: extremely inefficient, but will eventually find the global
maximum.
o Combination of both -> simulated annealing (complete and efficient)
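A simulated-annealing sketch along the same lines; the neighbors and value functions and the exponential cooling schedule are assumptions for illustration:

import math
import random

def simulated_annealing(start, neighbors, value, t0=1.0, cooling=0.995, t_min=1e-3):
    """Simulated annealing for maximization (a sketch).

    Downhill moves are accepted with probability exp(delta / T), so at high
    temperature the search behaves like a random walk and at low temperature
    like hill climbing. `neighbors` and `value` are problem-specific.
    """
    current, t = start, t0
    while t > t_min:
        candidate = random.choice(neighbors(current))
        delta = value(candidate) - value(current)
        if delta > 0 or random.random() < math.exp(delta / t):
            current = candidate
        t *= cooling  # simple exponential cooling schedule (an assumption)
    return current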
Genetic Algorithms
Starts with k randomly selected states (the population)
Each state (or individual) is encoded as a string
Two pairs are selected at random with probability of selection increasing with fitness
Crossover point for pair is chosen at random
Offspring are created by combining string at crossover point
Small probability of random mutation
Combine
Uphill tendency and Random exploration
Exchange of information between search threads
Biggest advantage comes from the crossover operation (no advantage if the positions in the
code are initially permuted at random)
Good application requires careful engineering of code
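A genetic-algorithm sketch over string-encoded states, with fitness-proportional selection, single-point crossover and a small mutation probability; the fitness function and alphabet are hypothetical inputs:

import random

def genetic_algorithm(population, fitness, alphabet, generations=100, p_mutate=0.01):
    """Genetic algorithm over string-encoded states (a sketch).

    `fitness` and `alphabet` are assumed, problem-specific inputs.
    """
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        new_population = []
        for _ in range(len(population)):
            # Parents chosen with probability increasing with fitness.
            mum, dad = random.choices(population, weights=weights, k=2)
            # Single-point crossover at a random position.
            cut = random.randrange(1, len(mum))
            child = mum[:cut] + dad[cut:]
            # Small probability of randomly mutating each character.
            child = "".join(c if random.random() > p_mutate else random.choice(alphabet)
                            for c in child)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)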
Summary for Local Search
For many search problems we do not need the best possible solution, or the best
solution is not achievable at all.
Local search methods are a useful tool because they operate on complete-state
formulations without keeping track of the paths that have been explored.
Simulated annealing adds a stochastic element to hill climbing and gives optimal
solutions in some circumstances.
Stochastic local beam search provides a first approach to the generation and selection
of states.
Genetic algorithms maintain a large population of states and use operations such as
mutation and crossover to explore the search space.
The uninformed and informed search algorithms that we have seen are designed to explore
search spaces systematically.
- They keep one or more paths in memory and record which
alternatives have been explored at each point along the path.
- When a goal is found, the path to that goal also constitutes a solution to
the problem.
- In many problems, however, the path to the goal is irrelevant.
If the path to the goal does not matter, we might consider a different class of algorithms that
do not worry about paths at all.
Adversarial Search
How can we design AI systems to play adversarial games?
Is there a limit to how far ahead an AI can look in a game?
Tic-Tac-Toe
Ordinary search vs Adversarial search
In a normal search, such as for the 8-puzzle, we could solve the problem by finding a path
to a good end position.
However, in adversarial search, the other player co-determines the path.
Minimax search
Complexity of MINIMAX
Alpha-Beta pruning
α = the value of the best (i.e., highest value) choice we have found so
far at any choice point along the path for MAX. Think: α = “at least.”
β = the value of the best (i.e., lowest value) choice we have found so
far at any choice point along the path for MIN. Think: β = “at most.”
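A sketch of minimax with alpha-beta pruning on a toy game tree given as nested lists (leaves are terminal utilities); this tree representation is just an illustration, not the practical's format:

def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning on a toy tree.

    A node is either a number (terminal utility) or a list of child nodes.
    alpha = best value MAX has found so far ("at least"),
    beta  = best value MIN has found so far ("at most").
    """
    if not isinstance(node, list):          # terminal node
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # MIN will never let play reach here
                break                       # prune remaining children
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:                   # MAX will never let play reach here
            break                           # prune remaining children
    return value

# Example: alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]) returns 3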
Transposition tables
In games like chess, the same positions can occur as a result of different moves -> this is
called a transposition
Exploring the search tree from that point again would mean doing the same work at least twice
Results of search for positions can be stored in a transposition table
Lookup from transposition table instead of search
Chess positions can be converted into unique indexes using special hashing
techniques so that lookup has O(1) time complexity
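A sketch of minimax with a transposition table: results are cached in a dictionary keyed by a hashable position identifier (real chess engines use Zobrist hashing for this); the key, children and utility helpers below are hypothetical:

transposition_table = {}

def minimax_tt(position, maximizing, key, children, utility):
    """Minimax that caches values so each transposition is searched only once.

    `key(position)` must return a hashable identifier; `children` and
    `utility` are assumed, game-specific helpers.
    """
    k = (key(position), maximizing)
    if k in transposition_table:            # reached before via another move order
        return transposition_table[k]
    kids = children(position, maximizing)
    if not kids:                            # terminal position
        value = utility(position)
    else:
        values = [minimax_tt(c, not maximizing, key, children, utility) for c in kids]
        value = max(values) if maximizing else min(values)
    transposition_table[k] = value
    return value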
Heuristic strategies
Shannon (1950)
Type A strategy (historically used for chess)
Consider wide but shallow part of tree and estimate the utility at that point
Type B strategy (historically used for Go)
Consider promising parts of the tree deeply and ignore unpromising paths
Heuristic evaluation functions must be used when the entire game-tree cannot be
explored (i.e., when the utility of the terminal nodes can’t be computed)
Monte-Carlo tree search is an alternative which plays-out entire games repeatedly and
chooses the next move based on the proportion of winning playouts
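A minimal "pure Monte-Carlo" move chooser based on that idea: play many random games after each candidate move and pick the move with the highest winning proportion. Full MCTS also grows a search tree and balances exploration against exploitation, which is omitted here; the game interface (legal_moves, play, is_over, winner) is assumed:

import random

def monte_carlo_move(state, me, legal_moves, play, is_over, winner, playouts=200):
    """Choose the move whose random playouts win most often (a sketch)."""
    def playout(s):
        # Play random moves until the game ends; return 1 for a win.
        while not is_over(s):
            s = play(s, random.choice(legal_moves(s)))
        return 1 if winner(s) == me else 0

    def win_rate(move):
        first = play(state, move)
        return sum(playout(first) for _ in range(playouts)) / playouts

    return max(legal_moves(state), key=win_rate)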
Problem Solving Under Uncertainty
How can we build machines that can handle the uncertainty of the natural world?
(“in this world nothing can be said to be certain, except death and taxes” - Benjamin Franklin, 1789)
It is a valid system of making predictions, but is it a good one?
What makes a good model?
Good models make errors because there is often noise in the data, and we do not want our model to
capture all that noise. The black lines are the errors.
Q1: degree-25 polynomial Q2: degree-3 polynomial
How can we build machines that can handle the uncertainty of the natural world?
Only an exhaustive list of possibilities on the right side will make the rule true.
Probability Theory
Probability statements are usually made with regard to a knowledge state.
Actual state: patient has a cavity or patient does not have a cavity
Knowledge state: probability that the patient has a cavity if we haven’t observed her yet.
Example choice: $4000 with probability 0.8 versus $3000 with probability 1.
Answer:
Possible worlds
Example: 1) when you throw two dice, the set of outcomes constitutes the
possible worlds of throwing the dice;
2) all possible configurations of a chessboard would be the possible worlds of
the game
In statistics and AI, we use the term “possible worlds” to refer to the possible states of
whatever we are trying to represent. The term ”world” is limited to the problem we
are trying to represent.
A possible world (ω, lowercase omega) is a state that the world could be in.
The set of possible worlds (Ω, capital omega) includes all the states that the world could
be in. Ω must be exhaustive.
Each possible world must be different from all the other possible worlds. Worlds must
be mutually exclusive.
Set of all possible worlds = sample space = Ω
Ω = {(1,1), (1,2), ..., (6,5), (6,6)}
Events
Set of worlds in which a proposition holds
Probability of an event: sum of probabilities of the worlds in which a proposition holds
Example:
Proposition: rolling 11 with two dice
P (total= 11)
Event: set of worlds in which the proposition holds
{(5,6), (6,5)}
Probability of event
P((5,6)) + P((6,5)) = 1/36 + 1/36 = 1/18
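A quick check of this by enumerating the 36 equally likely worlds in Python (just an illustration of the definitions above):

from fractions import Fraction
from itertools import product

# All 36 equally likely possible worlds for two dice.
worlds = list(product(range(1, 7), repeat=2))

# Event: the set of worlds in which the proposition "total = 11" holds.
event = [w for w in worlds if sum(w) == 11]

print(event, Fraction(len(event), len(worlds)))   # [(5, 6), (6, 5)] 1/18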
Conditional Probabilities
P(doubles | Dice1 = 5) = P(doubles ^ Dice1 = 5) / P(Dice1 = 5)
In general:
P(a | b) = P(a ^ b) / P(b)
implies
P(a ^ b) = P(a | b) P(b)
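Using the same 36 dice worlds, a small check of the conditional-probability definition and the product rule (illustration only):

from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))

def prob(holds):
    """Probability of an event = sum of the (equal) probabilities of its worlds."""
    return Fraction(sum(1 for w in worlds if holds(w)), len(worlds))

doubles = lambda w: w[0] == w[1]
dice1_is_5 = lambda w: w[0] == 5

# P(doubles | Dice1 = 5) = P(doubles ^ Dice1 = 5) / P(Dice1 = 5) = (1/36) / (1/6) = 1/6
p_cond = prob(lambda w: doubles(w) and dice1_is_5(w)) / prob(dice1_is_5)

# Product rule: P(a ^ b) = P(a | b) * P(b)
assert prob(lambda w: doubles(w) and dice1_is_5(w)) == p_cond * prob(dice1_is_5)
print(p_cond)   # 1/6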
Random Variables
Example: Random variable Total is defined as the sum of throwing two dice
Possible worlds: {(1,1), (1,2),.....(6,6)}
Domain or range: {2,3, 4…12}
A= true is written as a
A = false is written as -a
Joint Probabilities
Probability Axioms
P (cavity v toothache) =?
cavity – rectangle
toothache - circle
Conditioning
Independence
The cavity has no influence on the weather and, likewise, the weather outside has no influence on
the cavity.
Conditional Independence
P (X, Y | Z) = P (X | Z) * P (Y | Z)
Example:
Catch and toothache are not independent: if the probe catches, then it is likely that the
tooth has a cavity, and that this cavity causes a toothache.
However, toothache and catch are independent, given the presence or absence of a
cavity.
o If a cavity is present, then whether there is a toothache does not depend on
whether the probe catches, and vice versa.
Product Rule
o P (a ^ b) = P (a | b) * P (b)
o P (a ^ b) = P (b | a) * P (a)
Bayes Rule
o P(b | a) = P(a | b) P(b) / P(a)
o Useful when you have estimates for three of the four terms and you
need to compute the fourth
Bayes’ Rule
o Example: what is the probability that you ate a magic mushroom if you are
hallucinating?
Magic mushrooms cause hallucinations 70% of the time. The prior probability that someone
ate magic mushroom for lunch is 1/50,000. The prior probability that someone who comes
into the hospital is hallucinating is 1%.
P(magic mushroom | hallucination) = P(hallucination | mushroom) × P(mushroom) / P(hallucination)
= (0.7 × 0.00002) / 0.01 = 0.0014
o Example 2: What is the probability that you will hallucinate when you eat a magic
mushroom?
A devoted researcher does experiments with psychoactive substances once a month. Over the
course of the last year, the researcher did 12 experiments. The researcher hallucinated 9 times
out of 12. Out of 10 times the researcher was hallucinating, two were attributable to magic
mushroom use. Half of the experiments involved magic mushrooms.
12 experiments; 9/12 (0.75) hallucinated; 2/10 (0.2) times hallucinating because of magic
mushrooms; 6/12 (0.5) of the experiments involved magic mushrooms
P(hallucination | magic mushroom) = P(mushroom | hallucination) × P(hallucination) / P(mushroom)
= (0.2 × 0.75) / 0.5 = 0.3
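A tiny numeric check of both worked examples with Bayes' rule (example 1 uses 70% = 0.7 from the problem statement):

def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# Example 1: P(mushroom | hallucination) = 0.7 * (1/50000) / 0.01
print(bayes(0.7, 1 / 50_000, 0.01))   # 0.0014

# Example 2: P(hallucination | mushroom) = 0.2 * 0.75 / 0.5
print(bayes(0.2, 0.75, 0.5))          # 0.3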
Scaling up inference?
Summary
Practical 4 – Local & Adversarial Search 2
1.MINIMAX
1st step
2nd step
2.Alpha-Beta Pruning 1
3.Alpha-Beta Pruning 2 (Same as the tree with minimax, different search strategy)
4.Genetic Algorithms
o A genetic algorithm (GA) is a variant of stochastic beam search in which successors
are generated by combining two parent states rather than by modifying a single state
o Starts with k randomly selected states (the population)
o Each state (or individual) is encoded as a string
Examples:
Probabilistic Reasoning / Bayesian Networks
Graphs/Networks
Mathematical structures used to model pairwise relations between objects (any edge in the graph
tells us something about how two nodes are connected)
Types
Acyclic (directed) graph: you cannot end up back at the point where you started.
Cyclic graph: there is at least one cycle in the graph, and if you decide to follow it you will
end up back at the starting point.
A connected directed graph without directed cycles is a tree (it does not need to look like a tree).
Paths, Trails, and Walks
Real life examples:
- We use common-sense inference to say that the weather has no influence on the cavity, the catch,
or the toothache. On the other hand, we think that whether or not there is a cavity has an
influence on the probability of a toothache, and likewise on the probability of a catch. But we do
not think that the toothache directly influences the catch.
- Making these kinds of inferences and assumptions when constructing a Bayesian
network is usual; this is called domain knowledge.
Example:
We want to know the probability that the alarm has sounded, but neither a burglary nor an
earthquake has occurred, and both John and Mary call.
P(a, -b, -e, j, m)
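By the Bayesian-network semantics, this joint probability is the product of each variable's conditional probability given its parents: P(j, m, a, -b, -e) = P(j | a) P(m | a) P(a | -b, -e) P(-b) P(-e). The small check below uses the standard textbook CPT values for the burglary network, which do not appear in these notes and are therefore an assumption:

# Standard textbook CPT values for the burglary network (assumed, not from the notes).
p_b = 0.001                     # P(burglary)
p_e = 0.002                     # P(earthquake)
p_a_given_not_b_not_e = 0.001   # P(alarm | -b, -e)
p_j_given_a = 0.90              # P(JohnCalls | alarm)
p_m_given_a = 0.70              # P(MaryCalls | alarm)

# Joint = product of P(variable | its parents).
p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * (1 - p_b) * (1 - p_e)
print(p)   # about 0.000628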
Constructing Bayesian Network
Two ways to understand the semantics of a Bayesian Network
Chain Rule
Constructing Bayesian Network
Why is 2 wrong? -the graph is not fully directed (no connection between weather and the
other nodes)
Representing conditional probability tables more efficiently
Most relationships between parent nodes and their descendants are not completely arbitrary.
Deterministic nodes
The value of the node is specified exactly by
the values of its parents, with no
uncertainty.
Logic is sufficient in this case; we do
not need probability.
We do not create probability tables if
there is an easier way.
Dealing with continuous variables:
Discretization (e.g., split up temperature in low, medium and high)
Summary
Probabilistic Reasoning 1
P(c0) =?
P(-r1) =?
P(c1) =?
P (c1 | -c0) =?
P (-c1 | c0) =?
P (r1 | c0) =?
P (-r1 | -c0) =?
P(c0) =?
P(c0) = 0.171 + 0.076 + 0.189 + 0.126 = 0.562
P(-r1) =?
P(-r1) = 0.189 + 0.126 + 0.075 + 0.258 = 0.648
P(c1) =?
P(c1) = 0.171 + 0.189 + 0.087 + 0.075 = 0.522
P (c1 | -c0) =?
P (c1 | -c0) = P (c1 ^ -c0) / P (-c0)
P (c0) = 0.562 => P (-c0) = 1 – 0.562 = 0.438
P (-c1 | c0) =?
P (-c1 | c0) = P (-c1 ^ c0) / P (c0)
P (c0) = 0.562
P (r1 | c0) =?
P (r1 | c0) = P (r1 ^ c0) / P(c0)
P (c0) = 0.562
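The answers above are sums over worlds of the full joint distribution. A sketch of how they can be computed in Python; the joint table below is reconstructed so that its marginals match the sums in these notes, so treat the individual entries as an assumption:

# Joint distribution over three Boolean variables (c0, c1, r1); entries are
# reconstructed from the marginal sums above and are an assumption.
joint = {
    ( True,  True,  True): 0.171,   # (c0, c1, r1)
    ( True, False,  True): 0.076,
    ( True,  True, False): 0.189,
    ( True, False, False): 0.126,
    (False,  True,  True): 0.087,
    (False,  True, False): 0.075,
    (False, False, False): 0.258,
    (False, False,  True): 0.018,
}

def prob(holds):
    """Sum the probabilities of all worlds in which the condition holds."""
    return sum(p for world, p in joint.items() if holds(*world))

print(prob(lambda c0, c1, r1: c0))                    # P(c0)  = 0.562
print(prob(lambda c0, c1, r1: c1 and not c0)
      / prob(lambda c0, c1, r1: not c0))              # P(c1 | -c0)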
----------------------------------------------------------------------------------------------------------------
P (-c1 | m0) =?
P (c1 | (c0 ^ r0)) =?
P (c1 | (h0 ^ -r1)) =?
P (-c1 | (-c0 ^ l0)) =?
P (-c1 | m0) =?
Semantics = the meaning of words and how to combine words into meaningful phrases and
sentences.
Behaviourism (and how it’s implicit in Strong-AI) = the theory that human and animal behaviour can
be explained in terms of conditioning, without appeal to thoughts or feelings. Behaviour is seen as
the outward expression of what goes on in the mind, so, according to the strong AI view, if a
machine behaves as if it has intelligence/a mind, then it must be intelligent/have a mind.
1. Some AI researchers believe that by finding the right program they will create a thinking,
conscious machine.
2. Searle’s Chinese room argument: Running the right program is not sufficient for a thinking
machine.
3. The Chinese room runs the right program, but has no understanding of Chinese.
4. Searle is not arguing against the possibility of creating a thinking machine, he is arguing
against the idea that doing this is merely a matter of coming up with the right program.
5. If we are to construct thinking machines with consciousness, we also need to consider the
nature of the machinery that runs the program.
6. What kind of machinery do we need?
7. Will it ever be possible to “measure” consciousness?
4 counter arguments
1: the systems reply
The person in the room doesn’t understand Chinese, but the system as a whole does understand
Chinese. Searle is playing the role of a CPU, but the system has other components like a memory etc.
E.g., we don’t say the brain understands and feels love, we say that people understand and feel love.
• Searle’s response: The person in the room could internalize the whole system, and would still not
understand Chinese.
2: the robot reply
The person in the room doesn’t understand Chinese, but if the system were connected to the world
like a robot, with sensors etc, then it would understand Chinese. This would establish a causal
connection between the world and the structures being manipulated.
• Searle responds: All these sensors provide is information. There is no difference between this
information and information passed into the room in the form of questions.
3: the brain simulator reply
What if the program precisely simulated the brain of a Chinese speaker, including the neural
architecture and the state of every neuron? Then the system would understand Chinese.
• Searle responds: Whatever system the person in the room is simulating, it will still only be a
simulation.
4: the other minds reply
The only way we attribute understanding to other people is through their behaviour. There is no
other way. Therefore, we must decide if we attribute understanding to machines in the same way,
only through their behaviour.
Searle responds: “The problem in this discussion is not about how I know that other people
have cognitive states, but rather what it is that I am attributing to them when I attribute
cognitive states to them.”
There is a difference: We know machines are just manipulating symbols without knowing
what they mean, but we are not sure about people.
Discussion: Ethics
Two main concerns
1. Protecting us from AI: As AI penetrates deeper into society, what ethical and moral issues
does this pose?
2. Protecting the rights of AI systems: Should AI systems have moral status, and if so, when and
why? What are the implications?
Three scenarios
Because AGI aims at general abilities, AI systems of the future are likely to carry out tasks
that we didn’t design them for. Will they behave ethically when carrying out these tasks?
The moral/ethical implications of AGI systems need to be verified before they are deployed.
How can we do this? The systems must somehow think in the same way that a trustworthy
designer would.
Ethical cognitive considerations need to be made part of the engineering problem, rather
than being considered as an afterthought.
We need to be open-minded about what kind of systems might possess sentience and
sapience. The notions of morality and ethics have always evolved, fitting the concerns of the
time. This is likely to continue, and AI may play a significant role in shaping future notions of
ethics and morality.
Objective vs. subjective time: Should machines that think faster than us go to prison for a
shorter period of time?
Accelerated reproduction: Should machines that reproduce faster than others be subject to
different moral codes?
Bostrom and Yudkowsky propose how we should decide if a machine
should have moral status. Summarize and elaborate on their proposal in
one paragraph. In addition, summarize in one paragraph why the group
agrees, or disagrees, with this proposal.
Two criteria are commonly proposed as being importantly linked to moral status,
either separately or in combination: sentience and sapience (or personhood).
These may be characterized roughly as follows:
Sentience: the capacity for phenomenal experience or qualia, such as the capacity to feel pain and suffer.
Sapience: a set of capacities associated with higher intelligence, such as self-awareness and being a reason-responsive agent.
We strongly agree with this proposal, simply because there are no other criteria
relevant to having moral status.
“If two beings have the same functionality and the same conscious experience,
and differ only in the substrate of their implementation, then they have the
same moral status.”
Summary: AI and humans can have the same conscious experience. The only
difference between them is the material they are composed of.
For: The precondition of same functionality and same conscious experience are
required to make sure that the two entities have at least some common
propensities and idea of how the world is. This argument implies that difference
in moral status does not arise simply due to difference in substrate. If we have
the presupposition that the theory of mind we use to judge whether other
humans have a conscious experience similar to us does not necessarily depend
upon their physical appearance, then the same is true for entities other than
humans if we have judged that they have similar conscious experience. If a
human consciousness is uploaded into a computer chip but still has the same
conscious experience judged by the functionality and propensities that they
have, then the same is true for an AI. Considerations other than moral status can
be used to differentiate, like we use among humans (family members vs
strangers), simply not the moral status itself.
Against: Even two people with the same functionality and the same conscious
experience do not necessarily have the same morals, regardless of how those
morals are put into practice. A machine can encode an algorithm that mimics
human functionality and conscious experience, but it is still an algorithm; at the
end of the day it is just matrix multiplication, with no consciousness (although we
are unable to define consciousness clearly). The authors of the rules and moral
codes that the AI follows are conscious entities who carefully consider their
reactions and fully comprehend the inputs. The AI in this case is simply acting:
even though it is “physically or consciously” there, a conscious being has already
responded to those inputs morally.
“If two beings have the same functionality and the same conscious
experience, and differ only in how they came into existence, then they have the
same moral status.”
Summary: A being’s moral status is not affected by how it came into existence.
The moral standing of that being is not undermined, reduced, or altered by
deliberate design. So, the Principle of Ontogeny Non-Discrimination is consistent
with the claim that the creators or owners of an AI system with moral status may
have special duties to their artificial mind which they do not have to another
artificial mind, even if the minds in question are qualitatively similar and have
the same moral status.
For: If two beings have the same conscious experience and functionality, then
they have the same quality of subjective experience and even the same range of
possible uses. Although they differ in how they came into existence, they have
the same moral status. As an example: suppose we made a human clone. The
clone would not remember its birth, just as we do not, and it would have exactly
the same functions as we do, so the clone could think it is human. If the only
difference we could find were the manner of birth, we could argue about whether
the clone is human, but we would still have to assign it the same moral status.
Against: AI should not have moral status because, while it can be argued that it
could experience reality the same way humans do, it does not have a brain, which
is responsible for consciousness and intelligence. People do not object to causal
factors such as assisted delivery, in vitro fertilization, etc. when deciding whether
new humans deserve moral status, but those humans possess a brain, which is
the main prerequisite for both sentience and sapience. An AI cannot have moral
status, in my opinion. First of all, it cannot reproduce emotions like humans; it
does not feel pain, regret or sorrow after being wronged, for example. Secondly,
AI machines do not know what is morally wrong or right. A human can, for
example, deduce that stealing is wrong, but a machine cannot decide that for
itself without human intervention.
For: I agree that humans and computers experience time differently and this
time of ‘reflection’ sometimes is crucial. AI could experience time differently
from humans, at least in its uploading and processing of information (which it
does much faster compared to a human being). For example, humans are put
into prison to reflect on what they have done. This entails that fairness requires
us to take subjective experiences into account.
MOCK
1.Given the figure below, indicate from which starting states the global maximum will be reached
with the hill-climbing algorithm
Answer: C, D, A, B
2.Given no mutations and a crossover point of 4 (after the fourth character), give the strings for the
following ancestors and descendants.
3.Given the following game tree for a two-player game, match nodes J, L, O, R to their correct
value, applying the MINIMAX algorithm
Answer:
J=2
L=8
O = 12
R = 18
4.Given the following game tree with utility values for
the terminal nodes, indicate which nodes will be
pruned using MINIMAX with alpha-beta pruning.
Answer: L, M, P
Neither
Neither
Neither
Searle
A critic
Local and Adversarial Search 1
State Space
Suppose that we want to sort the unsorted sequence using steepest descent: A-D-C-B