AI MCS101 Module 1 Notes
AI, which stands for artificial intelligence, refers to systems or machines that mimic
human intelligence to perform tasks and can iteratively improve themselves based
on the information they collect.
The field of artificial intelligence, or AI, attempts not just to understand but also to build
intelligent entities.
AI currently encompasses a huge variety of subfields, ranging from the general (learning and
perception) to the specific, such as playing chess, proving mathematical theorems, writing poetry,
driving a car on a crowded street, and diagnosing diseases.
AI is relevant to any intellectual task; it is truly a universal field.
In Figure 1.1 we see eight definitions of AI, laid out along two dimensions.
The definitions on top are concerned with thought processes and reasoning, whereas the ones on the
bottom address behavior.
The definitions on the left measure success in terms of fidelity to human performance, whereas
the ones on the right measure against an ideal performance measure, called rationality.
A system is rational if it does the “right thing,” given what it knows. Historically, all four approaches
to AI have been followed, each by different people with different methods.
A human-centered approach must be in part an empirical science, involving observations and
hypotheses about human behavior.
A rationalist approach involves a combination of mathematics and engineering.
The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory
operational definition of intelligence.
A computer passes the test if a human interrogator, after posing some written questions, cannot
tell whether the written responses come from a person or from a computer.
The computer would need to possess the following capabilities: natural language processing to communicate successfully; knowledge representation to store what it knows or hears; automated reasoning to use the stored information to answer questions and draw new conclusions; and machine learning to adapt to new circumstances and to detect and extrapolate patterns.
Turing’s test deliberately avoided direct physical interaction between the interrogator and the
computer, because physical simulation of a person is unnecessary for intelligence. However, the
so-called total Turing Test includes a video signal so that the interrogator can test the subject’s
perceptual abilities, as well as the opportunity for the interrogator to pass physical objects “through
the hatch.”
To say that a program thinks like a human, we must know how humans think. We can learn about
human thought in three ways:
• introspection—trying to catch our own thoughts as they go by;
• psychological experiments—observing a person in action;
• brain imaging—observing the brain in action.
Once we have a sufficiently precise theory of the mind, it becomes possible to express the theory as
a computer program.
If the program’s input–output behavior matches corresponding human behavior, that is evidence that
some of the program’s mechanisms could also be operating in humans.
For example, Allen Newell and Herbert Simon, who developed GPS, the “General Problem Solver”
(Newell and Simon, 1961), were not content merely to have their program solve problems correctly.
They were more concerned with comparing the trace of its reasoning steps to traces of human
subjects solving the same problems.
The interdisciplinary field of cognitive science brings together computer models from AI and
experimental techniques from psychology to construct precise and testable theories of the human
mind.
The two fields continue to fertilize each other, most notably in computer vision, which incorporates
neurophysiological evidence into computational models.
The Greek philosopher Aristotle was one of the first to attempt to codify “right thinking,” that is,
irrefutable reasoning processes.
His syllogisms provided patterns for argument structures that always yielded correct conclusions
when given correct premises—for example, “Socrates is a man; all men are mortal; therefore,
Socrates is mortal.” These laws of thought were supposed to govern the operation of the mind; their
study initiated the field called logic.
Logicians in the 19th century developed a precise notation for statements about all kinds of objects
in the world and the relations among them. (Contrast this with ordinary arithmetic notation, which
provides only for statements about numbers.)
By 1965, programs existed that could, in principle, solve any solvable problem described in
logical notation. The so-called logicist tradition within artificial intelligence hopes to build on
such programs to create intelligent systems.
There are two main obstacles to this approach:
First, it is not easy to take informal knowledge and state it in the formal terms required by logical
notation, particularly when the knowledge is less than 100% certain.
Second, there is a big difference between solving a problem “in principle” and solving it in practice.
An agent is just something that acts. All computer programs do something, but computer
agents are expected to do more: operate autonomously, perceive their environment, persist
over a prolonged time period, adapt to change, and create and pursue goals.
A rational agent is one that acts so as to achieve the best outcome or, when there is
uncertainty, the best expected outcome.
Making correct inferences is sometimes part of being a rational agent, because one way to
act rationally is to reason logically to the conclusion that a given action will achieve one’s
goals and then to act on that conclusion.
On the other hand, correct inference is not all of rationality; in some situations, there is no
provably correct thing to do, but something must still be done.
All the skills needed for the Turing Test also allow an agent to act rationally.
Knowledge representation and reasoning enable agents to reach good decisions.
The rational-agent approach has two advantages over the other approaches:
First, it is more general than the “laws of thought” approach because correct inference is
just one of several possible mechanisms for achieving rationality.
Second, it is more amenable to scientific development than are approaches based on human
behavior or human thought.
The standard of rationality is mathematically well defined and completely general, and can be
“unpacked” to generate agent designs that provably achieve it.
Human behavior, on the other hand, is well adapted for one specific environment and is
defined by, well, the sum total of all the things that humans do.
In this section, we provide a brief history of the disciplines that contributed ideas, viewpoints, and
techniques to AI.
Typically, an axon is 1 cm long (100 times the diameter of the cell body), but can reach up to 1 meter.
A neuron makes connections with 10 to 100,000 other neurons at junctions called synapses. Signals are
propagated from neuron to neuron by a complicated electrochemical reaction.
The signals control brain activity in the short term and also enable long-term changes in the connectivity
of neurons. These mechanisms are thought to form the basis for learning in the brain.
Most information processing goes on in the cerebral cortex, the outer layer of the brain. The basic
organizational unit appears to be a column of tissue about 0.5 mm in diameter, containing about 20,000
neurons and extending the full depth of the cortex (about 4 mm in humans).
Machine Translation:
A computer program automatically translates from Arabic to English, allowing an
English speaker to see the headline “Ardogan Confirms That Turkey Would Not
Accept Any Pressure, Urging Them to Recognize Cyprus.”
The program uses a statistical model built from examples of Arabic-to-English
translations and from examples of English text totaling two trillion words (Brants et al.,
2007).
None of the computer scientists on the team speak Arabic, but they do understand
statistics and machine learning algorithms.
These are just a few examples of artificial intelligence systems that exist today. They are not magic or
science fiction, but rather science, engineering, and mathematics, to which this book provides an
introduction.
The simplest agents are reflex agents, which base their actions on a direct mapping from states to actions.
Such agents cannot operate well in environments for which this mapping would be too large to store and
would take too long to learn.
Goal-based agents consider future actions and the desirability of their outcomes.
Problem-solving agents use atomic representations, that is, states of the world are considered as wholes,
with no internal structure visible to the problem-solving algorithms.
Goal-based agents that use more advanced factored or structured representations are usually called
planning agents.
There are several general-purpose search algorithms that can be used to solve these problems.
Uninformed search algorithms—algorithms that are given no information about the problem other
than its definition. Although some of these algorithms can solve any solvable problem, none of them
can do so efficiently.
Informed search algorithms, on the other hand, can do quite well given some guidance on where to
look for solutions.
• Formulate goal:
• be in Bucharest
• Formulate problem:
• states: various cities
• actions: drive between cities
• Find solution:
• sequence of cities, e.g., Arad, Sibiu, Fagaras, Bucharest.
Single-State Problem Formulation
A description of what each action does; the formal name for this is the transition model,
specified by a function RESULT(s, a) that returns the state that results from doing action a
in state s. We also use the term successor to refer to any state reachable from a given state
by a single action. For example, we have
RESULT(In(Arad), Go(Zerind)) = In(Zerind).
A path cost function that assigns a numeric cost to each path. The step cost of taking action
a in state s to reach state s’ is denoted by c(s, a, s’).
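To make these components concrete, here is a minimal sketch in Python of the Romania route-finding problem; the class name RomaniaProblem and the handful of roads included are illustrative assumptions, not the full map of Figure 3.2.

```python
# A minimal sketch of a single-state problem formulation (illustrative only):
# initial state, actions, transition model RESULT(s, a), goal test, and step cost.
class RomaniaProblem:
    # Only a few of the roads from Figure 3.2 are included here, for brevity.
    ROADS = {
        "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
        "Zerind": {"Arad": 75},
        "Timisoara": {"Arad": 118},
        "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
        "Fagaras": {"Sibiu": 99, "Bucharest": 211},
        "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
        "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
        "Bucharest": {"Fagaras": 211, "Pitesti": 101},
    }

    def __init__(self, initial="Arad", goal="Bucharest"):
        self.initial, self.goal = initial, goal

    def actions(self, state):
        # Actions available in a state: drive to any directly connected city.
        return list(self.ROADS[state])

    def result(self, state, action):
        # Transition model RESULT(s, a): driving toward a city puts us in that city.
        return action

    def goal_test(self, state):
        return state == self.goal

    def step_cost(self, state, action, result):
        # Step cost c(s, a, s'): the road distance between the two cities.
        return self.ROADS[state][result]
```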
Formulating problems:
Our formulation of the problem of getting to Bucharest is a model—an abstract
mathematical description—and not the real thing.
• Level of abstraction: Think of the abstract states and actions we have chosen as
corresponding to large sets of detailed world states and detailed action sequences.
Now consider a solution to the abstract problem: for example, the path from Arad to
Sibiu to Rimnicu Vilcea to Pitesti to Bucharest. This abstract solution corresponds to
a large number of more detailed paths. For example, we could drive with the radio on
between Sibiu and Rimnicu Vilcea, and then switch it off for the rest of the trip.
• The abstraction is valid if we can elaborate any abstract solution into a solution in the
more detailed world; a sufficient condition is that for every detailed state that is “in
Arad,” there is a detailed path to some state that is “in Sibiu,” and so on. The
abstraction is useful if carrying out each of the actions in the solution is easier than the
original problem; in our case, the action “drive from Arad to Sibiu” can be carried
out without further search or planning by a driver with average skill.
Figure 3.3 The state space for the vacuum world. Links denote actions: L = Left, R = Right, S = Suck.
• States: The state is determined by both the agent location and the dirt locations. The
agent is in one of two locations, each of which might or might not contain dirt. Thus,
there are 2 × 2² = 8 possible world states. A larger environment with n locations has
n × 2ⁿ states.
• Initial state: Any state can be designated as the initial state.
• Actions: In this simple environment, each state has just three actions: Left, Right, and
Suck. Larger environments might also include Up and Down.
• Transition model: The actions have their expected effects, except that moving Left in
the leftmost square, moving Right in the rightmost square, and Sucking in a clean
square have no effect. The complete state space is shown in Figure 3.3.
• Goal test: This checks whether all the squares are clean.
• Path cost: Each step costs 1, so the path cost is the number of steps in the path.
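As a rough sketch of how this formulation could be written down, the snippet below encodes the two-location vacuum world; the class and method names are illustrative choices rather than a fixed interface.

```python
# A sketch of the two-location vacuum world of Figure 3.3.
# A state is (agent location, dirt at A?, dirt at B?); 2 x 2 x 2 = 8 states.
from itertools import product

class VacuumWorld:
    ACTIONS = ("Left", "Right", "Suck")

    def states(self):
        return list(product(("A", "B"), (True, False), (True, False)))

    def actions(self, state):
        return list(self.ACTIONS)

    def result(self, state, action):
        loc, dirt_a, dirt_b = state
        if action == "Left":
            return ("A", dirt_a, dirt_b)      # moving Left while in A has no effect
        if action == "Right":
            return ("B", dirt_a, dirt_b)      # moving Right while in B has no effect
        if action == "Suck":                  # sucking in a clean square has no effect
            return (loc, False, dirt_b) if loc == "A" else (loc, dirt_a, False)
        raise ValueError(action)

    def goal_test(self, state):
        _, dirt_a, dirt_b = state
        return not dirt_a and not dirt_b      # goal: all squares clean

    def step_cost(self, state, action, result):
        return 1                              # each step costs 1
```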
The 8-puzzle, an instance of which is shown in Figure 3.4, consists of a 3×3 board with eight
numbered tiles and a blank space. A tile adjacent to the blank space can slide into the space.
The object is to reach a specified goal state, such as the one shown on the right of the figure.
The standard formulation is as follows:
• States: A state description specifies the location of each of the eight tiles and the blank in one of the nine squares.
• Initial state: Any state can be designated as the initial state. (Any given goal can be reached from exactly half of the possible initial states.)
• Actions: The simplest formulation defines the actions as movements of the blank space: Left, Right, Up, or Down; different subsets of these are applicable depending on where the blank is.
• Transition model: Given a state and an action, this returns the resulting state; for example, applying Left moves the tile to the left of the blank into the blank’s square.
• Goal test: This checks whether the state matches the goal configuration.
• Path cost: Each step costs 1, so the path cost is the number of steps in the path.
The 8-puzzle belongs to the family of sliding-block puzzles, which are often used as test
problems for new search algorithms in AI. This family is known to be NP-complete, so one
does not expect to find methods significantly better in the worst case than the search
algorithms described in this chapter and the next. The 8-puzzle has 9!/2 = 181,440 reachable
states and is easily solved. The 15-puzzle (on a 4×4 board) has around 1.3 trillion states, and
random instances can be solved optimally in a few milliseconds by the best search
algorithms. The 24-puzzle (on a 5×5 board) has around 10²⁵ states, and random instances
take several hours to solve optimally.
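For concreteness, a small sketch of a state representation and transition model for the 8-puzzle might look as follows; the 9-tuple encoding with 0 standing for the blank is one illustrative choice among several.

```python
# Sketch: successor generation for the 8-puzzle.
# A state is a tuple of 9 entries (a row-major 3x3 board); 0 marks the blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def actions(state):
    # The blank may move Up, Down, Left, or Right, subject to the board edges.
    row, col = divmod(state.index(0), 3)
    moves = []
    if row > 0: moves.append("Up")
    if row < 2: moves.append("Down")
    if col > 0: moves.append("Left")
    if col < 2: moves.append("Right")
    return moves

def result(state, action):
    # Slide the tile adjacent to the blank into the blank's square.
    i = state.index(0)
    j = i + {"Up": -3, "Down": 3, "Left": -1, "Right": 1}[action]
    board = list(state)
    board[i], board[j] = board[j], board[i]
    return tuple(board)

# Example: the states reachable from the goal state in one move.
print([result(GOAL, a) for a in actions(GOAL)])
```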
8-Queens Problem:
The goal of the 8-queens problem is to place eight queens on a chessboard such that no
queen attacks any other. There are two main kinds of formulation. An incremental
formulation involves operators that augment the state description, starting with an empty
state; for the 8-queens problem, this means that each action adds a queen to the state. A
complete-state formulation starts with all 8 queens on the board and moves them around.
• States: All possible arrangements of n queens (0 ≤ n ≤ 8), one per column in the leftmost
n columns, with no queen attacking another.
• Actions: Add a queen to any square in the leftmost empty column such that it is not
attacked by any other queen.
This formulation reduces the 8-queens state space from 1.8 × 10¹⁴ to just 2,057, and solutions
are easy to find. Our final toy problem was devised by Donald Knuth (1964) and illustrates
how infinite state spaces can arise. Knuth conjectured that, starting with the number 4, a
sequence of factorial, square root, and floor operations will reach any desired positive integer.
For example, we can reach 5 from 4 as follows:
⌊√√√√√(4!)!⌋ = 5
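As a quick numeric check of that chain of operations, the short sketch below applies a factorial, five square roots, and a final floor to reach 5 from 4.

```python
# Sketch: reaching 5 from 4 with factorial, square root, and floor, i.e.
# floor(sqrt(sqrt(sqrt(sqrt(sqrt((4!)!)))))) = 5.
import math

x = math.factorial(math.factorial(4))   # (4!)! = 24!
for _ in range(5):                      # apply the square root five times
    x = math.sqrt(x)
print(math.floor(x))                    # prints 5
```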
Real-world problems
Consider the airline travel problems that must be solved by a travel-planning Web site (its state formulation is given below, after the touring and VLSI examples).
Touring problems
Touring problems are closely related to route-finding problems, but with an important
difference.
Consider, for example, the problem “Visit every city in Figure 3.2 at least once, starting
and ending in Bucharest.”
As with route finding, the actions correspond to trips between adjacent cities. The
state space, however, is quite different.
Each state must include not just the current location but also the set of cities the agent
has visited.
So the initial state would be In(Bucharest), Visited({Bucharest}), a typical intermediate state
would be In(Vaslui), Visited({Bucharest, Urziceni, Vaslui}), and the goal test would check
whether the agent is in Bucharest and all 20 cities have been visited.
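One way such a state might be encoded is as a pair of the current city and a frozenset of visited cities; the tiny city set below is illustrative, not the full 20-city map.

```python
# Sketch: a touring-problem state is (current city, set of cities visited so far).
ALL_CITIES = frozenset({"Bucharest", "Urziceni", "Vaslui"})   # illustrative subset

initial_state = ("Bucharest", frozenset({"Bucharest"}))
intermediate = ("Vaslui", frozenset({"Bucharest", "Urziceni", "Vaslui"}))

def goal_test(state, all_cities=ALL_CITIES):
    city, visited = state
    # Goal: back in Bucharest with every city visited at least once.
    return city == "Bucharest" and visited == all_cities
```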
A VLSI layout problem requires positioning millions of components and connections on a chip
to minimize area, minimize circuit delays, minimize stray capacitances, and maximize
manufacturing yield.
The layout problem comes after the logical design phase and is usually split into two parts: cell
layout and channel routing.
In cell layout, the primitive components of the circuit are grouped into cells, each of
which performs some recognized function.
Each cell has a fixed footprint (size and shape) and requires a certain number of connections to
each of the other cells.
The aim is to place the cells on the chip so that they do not overlap and so that there is room for
the connecting wires to be placed between the cells.
Channel routing finds a specific route for each wire through the gaps between the cells.
These search problems are extremely complex, but definitely worth solving.
• States: For the airline travel problem introduced above, each state obviously includes a location (e.g., an airport) and the current time.
Furthermore, because the cost of an action (a flight segment) may depend on previous
segments, their fare bases, and their status as domestic or international, the state must
record extra information about these “historical” aspects.
Search algorithms require a data structure to keep track of the search tree that is being constructed.
For each node n of the tree, we have a structure that contains four components:
• n.STATE: the state in the state space to which the node corresponds;
• n.PARENT: the node in the search tree that generated this node;
• n.ACTION: the action that was applied to the parent to generate the node;
• n.PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial state to
the node, as indicated by the parent pointers.
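A hedged sketch of this node structure in Python is shown below; child_node and solution are assumed helper names that show how the four components are filled in and used.

```python
# Sketch: the four-component search-tree node, plus a helper that builds a
# child node from a parent by applying an action.
class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0.0):
        self.state = state            # n.STATE
        self.parent = parent          # n.PARENT
        self.action = action          # n.ACTION
        self.path_cost = path_cost    # n.PATH-COST, traditionally written g(n)

def child_node(problem, parent, action):
    state = problem.result(parent.state, action)
    cost = parent.path_cost + problem.step_cost(parent.state, action, state)
    return Node(state, parent, action, cost)

def solution(node):
    # Follow the parent pointers back to the root to recover the action sequence.
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))
```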
Queues are characterized by the order in which they store the inserted nodes. Three common
variants are the first-in, first-out or FIFO queue, which pops the oldest element of the queue; the
last-in, first-out or LIFO queue (also known as a stack), which pops the newest element of the
queue; and the priority queue, which pops the element of the queue with the highest
priority according to some ordering function.
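In Python, the three queue disciplines might be sketched with the standard library as follows.

```python
# Sketch: FIFO queue, LIFO queue (stack), and priority queue from the standard library.
from collections import deque
import heapq

fifo = deque()
fifo.append("a"); fifo.append("b")
print(fifo.popleft())               # "a": pops the oldest element

lifo = []
lifo.append("a"); lifo.append("b")
print(lifo.pop())                   # "b": pops the newest element

pq = []
heapq.heappush(pq, (140, "Sibiu"))
heapq.heappush(pq, (75, "Zerind"))
print(heapq.heappop(pq))            # (75, 'Zerind'): pops the lowest-cost entry
```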
The Uninformed Search term means that the strategies have no additional information about
states beyond that provided in the problem definition. All they can do is generate successors and
distinguish a goal state from a non-goal state. All search strategies are distinguished by the order
in which nodes are expanded. Strategies that know whether one non-goal state is “more
promising” than another are called informed search or heuristic search strategies.
• Breadth-first search (BFS)
• Uniform-cost search
• Depth-first search (DFS)
• Depth-limited search
• Iterative deepening search
• Bidirectional Search
First, the memory requirements are a bigger problem for breadth-first search than is the
execution time. One might wait 13 days for the solution to an important problem with search
depth 12, but no personal computer has the petabyte of memory it would take. Fortunately, other
strategies require less memory.
The second lesson is that time is still a major factor. If your problem has a solution at depth 16,
then (given our assumptions) it will take about 350 years for breadth-first search (or indeed any
uninformed search) to find it. In general, exponential-complexity search problems cannot be
solved by uninformed methods for any but the smallest instances.
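For reference, a minimal breadth-first graph search might look like the sketch below, assuming a problem object with the initial/actions/result/goal_test interface used in the earlier sketches; it is illustrative rather than the textbook's exact pseudocode.

```python
# Sketch: breadth-first graph search with a FIFO frontier and an explored set.
from collections import deque

def breadth_first_search(problem):
    if problem.goal_test(problem.initial):
        return []
    frontier = deque([(problem.initial, [])])   # (state, action sequence so far)
    explored = {problem.initial}
    while frontier:
        state, path = frontier.popleft()        # FIFO: expand the shallowest node first
        for action in problem.actions(state):
            child = problem.result(state, action)
            if child not in explored:
                if problem.goal_test(child):    # goal test when the node is generated
                    return path + [action]
                explored.add(child)
                frontier.append((child, path + [action]))
    return None                                 # no solution
```

Applied to the RomaniaProblem sketch from earlier, this would return the route through Sibiu and Fagaras to Bucharest, the shallowest (though not the cheapest) solution.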
Iterative deepening search (or iterative deepening depth-first search) is a general strategy,
often used in combination with depth-first tree search, that finds the best depth limit. It does
this by gradually increasing the limit—first 0, then 1, then 2, and so on—until a goal is found.
This will occur when the depth limit reaches d, the depth of the shallowest goal node. The
algorithm is shown in Figure 3.18. Iterative deepening combines the benefits of depth-first
and breadth-first search. Like depth-first search, its memory requirements are modest: O(bd)
to be precise. Like breadth-first search, it is complete when the branching factor is finite and
optimal when the path cost is a nondecreasing function of the depth of the node. Figure 3.19
shows four iterations of iterative deepening search on a binary search tree, where the solution is found on the fourth
iteration.
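A hedged sketch of iterative deepening built on a recursive depth-limited search is given below; the function names follow the spirit of Figure 3.18 but are not a literal transcription.

```python
# Sketch: iterative deepening search as repeated depth-limited depth-first search.
import itertools

def depth_limited_search(problem, state, limit, path):
    if problem.goal_test(state):
        return path
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for action in problem.actions(state):
        child = problem.result(state, action)
        outcome = depth_limited_search(problem, child, limit - 1, path + [action])
        if outcome == "cutoff":
            cutoff_occurred = True
        elif outcome is not None:
            return outcome
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(problem):
    # Gradually increase the depth limit: 0, 1, 2, ... until a goal is found.
    for depth in itertools.count():
        outcome = depth_limited_search(problem, problem.initial, depth, [])
        if outcome != "cutoff":
            return outcome
```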
Bidirectional search
Advantage:
Bidirectional search delays exponential growth by cutting the exponent of the time and space
complexity roughly in half: two searches of depth d/2 are much cheaper than one search of depth d.
Disadvantage:
At every step the two frontiers must be checked for intersection, which requires an efficient
hashing data structure.
Bidirectional search also requires the ability to search backward (generating the predecessors of a
state), which is not always possible.
3.4.4 Comparing uninformed search strategies
Figure 3.21 compares search strategies in terms of the four evaluation criteria set forth in Section
3.3.2. This comparison is for tree-search versions. For graph searches, the main differences are
that depth-first search is complete for finite state spaces and that the space and time
complexities are bounded by the size of the state space.
h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
Heuristic functions are the most common form in which additional knowledge of the
problem is imparted to the search algorithm.
The most widely known form of best-first search is called A∗ search (pronounced “A-star search”).
It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the cost to get from the
node to the goal:
f(n) = g(n) + h(n)
Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the
cheapest path from n to the goal, we have
f(n) = estimated cost of the cheapest solution through n
Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the node with the lowest value of g(n) + h(n).
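Turning f(n) = g(n) + h(n) directly into code, the sketch below is one way A∗ graph search might be written, using a priority queue ordered on f and taking the heuristic as a function argument; the problem interface is the assumed one from the earlier sketches.

```python
# Sketch: A* graph search, always expanding the frontier node with the lowest
# f(n) = g(n) + h(n).
import heapq

def astar_search(problem, h):
    start = problem.initial
    frontier = [(h(start), 0, start, [])]         # entries are (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if problem.goal_test(state):
            return path, g                        # solution path and its cost
        for action in problem.actions(state):
            child = problem.result(state, action)
            g2 = g + problem.step_cost(state, action, child)
            if child not in best_g or g2 < best_g[child]:
                best_g[child] = g2                # best known cost to reach child
                heapq.heappush(frontier, (g2 + h(child), g2, child, path + [action]))
    return None, float("inf")
```

Supplying the straight-line-distance values of Figure 3.22 as h would show the same preference described below for Figure 3.24: the search pursues Pitesti (f = 417) rather than settling for the Bucharest entry with f = 450.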
The first condition we require for optimality is that h(n) be an admissible heuristic.
An admissible heuristic is one that never overestimates the cost to reach the goal.
Because g(n) is the actual cost to reach n along the current path, and f(n) = g(n) + h(n), we have as an
immediate consequence that f(n) never overestimates the true cost of a solution along the current path
through n.
Admissible heuristics are by nature optimistic because they think the cost of solving the problem is
less than it actually is.
An obvious example of an admissible heuristic is the straight-line distance hSLD that we used in
getting to Bucharest.
Straight-line distance is admissible because the shortest path between any two points is a straight
line, so the straight line cannot be an overestimate.
In Figure 3.24, we show the progress of an A∗ tree search for Bucharest.
The values of g are computed from the step costs in Figure 3.2, and the values of hSLD are given in
Figure 3.22. Notice in particular that Bucharest first appears on the frontier at step (e), but it is not
selected for expansion because its f-cost (450) is higher than that of Pitesti (417).
Another way to say this is that there might be a solution through Pitesti whose cost is as low as 417,
so the algorithm will not settle for a solution that costs 450.
A second, slightly stronger condition called consistency (or sometimes monotonicity) is required
only for applications of A∗ to graph search.
A heuristic h(n) is consistent if, for every node n and every successor n′ of n generated by any action
a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n′ plus
the estimated cost of reaching the goal from n′:
h(n) ≤ c(n, a, n′) + h(n′).
This is a form of the general triangle inequality, which stipulates that each side of a triangle cannot be
longer than the sum of the other two sides.
Here, the triangle is formed by n, n′, and the goal Gn closest to n.
For an admissible heuristic, the inequality makes perfect sense: if there were a route from n to Gn via
n′ that was cheaper than h(n), that would violate the property that h(n) is a lower bound on the cost to
reach Gn.
Consistency is therefore a stricter requirement than admissibility, but one has to work quite hard to
concoct heuristics that are admissible but not consistent.
Consider, for example, hSLD. We know that the general triangle inequality is satisfied when each
side is measured by the straight-line distance and that the straight-line distance between n and n′ is no
greater than c(n, a, n′). Hence, hSLD is a consistent heuristic.
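For a finite set of states, the consistency inequality can be checked mechanically over every edge; the helper below is an illustrative sketch, not a standard routine.

```python
# Sketch: verify h(n) <= c(n, a, n') + h(n') for every state n and successor n'.
def is_consistent(problem, states, h):
    for s in states:
        for a in problem.actions(s):
            s2 = problem.result(s, a)
            if h(s) > problem.step_cost(s, a, s2) + h(s2):
                return False    # the triangle inequality fails on the edge (s, a, s2)
    return True
```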
Optimality of A*
Completeness requires that there be only finitely many nodes with cost less than or equal to
C∗, a condition that is true if all step costs exceed some finite ε and if b is finite.
That A∗ search is complete, optimal, and optimally efficient among all such algorithms is
rather satisfying.
Unfortunately, it does not mean that A∗ is the answer to all our searching needs. The catch
is that, for most problems, the number of states within the goal contour search space is still
exponential in the length of the solution.
The details of the analysis are beyond the scope of this book, but the basic results are as
follows. For problems with constant step costs, the growth in run time as a function of the
optimal solution depth d is analyzed in terms of the absolute error or the relative error of
the heuristic.
The average solution cost for a randomly generated 8-puzzle instance is about 22 steps.
The branching factor is about 3.
There is a long history of such heuristics for the 15-puzzle; here are two commonly
used candidates:
• h1 = the number of misplaced tiles. h1 is admissible because any tile that is out of place must be moved at least once.
• h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot move along diagonals, the distance counted is the sum of the horizontal and vertical distances, sometimes called the city-block distance or Manhattan distance. h2 is also admissible because any move can do no better than bring one tile one step closer to the goal.
To test the heuristic functions h1 and h2, we generated 1200 random problems with solution
lengths from 2 to 24 (100 for each even number) and solved them with iterative deepening
search and with A∗ tree search using both h1 and h2.
Figure 3.29 gives the average number of nodes generated by each strategy and the effective
branching factor.
The results suggest that h2 is better than h1, and is far better than using iterative deepening
search.
Even for small problems with d = 12, A∗ with h2 is 50,000 times more efficient than
uninformed iterative deepening search.
Given a collection of admissible heuristics h1, . . . , hm for a problem, we can define
h(n) = max{h1(n), . . . , hm(n)}.
This composite heuristic uses whichever function is most accurate on the node in question.
Because the component heuristics are admissible, h is admissible; it is also easy to prove that h is
consistent. Furthermore, h dominates all of its component heuristics.
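As a hedged sketch, the two 8-puzzle heuristics and the composite maximum can be computed for the 9-tuple state representation used in the earlier 8-puzzle sketch (0 stands for the blank, which is excluded from both counts).

```python
# Sketch: the two classic 8-puzzle heuristics for a 9-tuple state with 0 as the blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def h1(state, goal=GOAL):
    # Number of misplaced tiles (the blank is not counted).
    return sum(1 for tile, g in zip(state, goal) if tile != 0 and tile != g)

def h2(state, goal=GOAL):
    # Sum of Manhattan (city-block) distances of each tile from its goal square.
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

def h_max(state):
    # Composite heuristic: take the most accurate (largest) admissible estimate.
    return max(h1(state), h2(state))
```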
A heuristic function h(n) is supposed to estimate the cost of a solution beginning from the state at node
n. How could an agent construct such a function?
One solution was given in the preceding sections—namely, to devise relaxed problems for which an
optimal solution can be found easily.
Another solution is to learn from experience. “Experience” here means solving lots of 8-puzzles, for
instance.
Each optimal solution to an 8-puzzle problem provides examples from which h(n) can be learned.
Each example consists of a state from the solution path and the actual cost of the solution from that point.
From these examples, a learning algorithm can be used to construct a function h(n) that can
(with luck) predict solution costs for other states that arise during search.
Inductive learning methods work best when supplied with features of a state that are relevant
to predicting the state’s value, rather than with just the raw state description.
For example, the feature “number of misplaced tiles” might be helpful in predicting the
actual distance of a state from the goal.
Let’s call this feature x1(n). We could take 100 randomly generated 8-puzzle configurations
and gather statistics on their actual solution costs.
We might find that when x1(n) is 5, the average solution cost is around 14, and so on.
Given these data, the value of x1 can be used to predict h(n). Of course, we can use several features.
A second feature x2(n) might be “number of pairs of adjacent tiles that are not adjacent in the goal
state.” How should x1(n) and x2(n) be combined to predict h(n)? A common approach is to use a
linear combination:
h(n) = c1 x1(n) + c2 x2(n)
The constants c1 and c2 are adjusted to give the best fit to the actual data on solution costs.
One expects both c1 and c2 to be positive because misplaced tiles and incorrect adjacent pairs
make the problem harder to solve.
This heuristic does satisfy the condition that h(n)=0 for goal states, but it is not necessarily
admissible or consistent.
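A small sketch of fitting the constants c1 and c2 by ordinary least squares on (feature, solution-cost) examples is shown below; the tiny data set is made up purely to show the mechanics, not real measurements.

```python
# Sketch: fit h(n) = c1*x1(n) + c2*x2(n) to (features, true solution cost) examples.
import numpy as np

# Illustrative, made-up training data: one row [x1, x2] per solved instance.
X = np.array([[5.0, 3.0], [7.0, 4.0], [2.0, 1.0], [9.0, 6.0]])
y = np.array([14.0, 18.0, 6.0, 24.0])       # made-up optimal solution costs

c, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates of c1 and c2
c1, c2 = c

def h(x1, x2):
    # Learned heuristic; h(goal) = 0 since both features vanish at the goal,
    # but admissibility and consistency are not guaranteed.
    return c1 * x1 + c2 * x2

print(h(5, 3))                              # predicted cost for a new state's features
```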