AI Summary
This summary is heavily based on the course and written content provided by Dr. Kareem Badawi, though
not entirely.
Artificial Intelligence
1. Introduction
Artificial intelligence is the science of making machines that have the ability to act rationally.
One might therefore assume this means machines that think like people, in both the good ways and the bad, which is not ideal because machines cannot differentiate between the good and the bad.
Thinking like people also requires us to get inside the human mind to see how it works and then create a model that attempts to copy it, which is a difficult task and is more related to cognitive science.
Alan Turing proposed the Turing test, which aims to identify a machine by having an interrogator talk to it for about an hour; if the AI manages to go unidentified, it means that a convincingly human-like AI has been built.
However, an AI that can pass this test has not yet been created, as people focus on things like not answering too quickly, having a favorite movie, and not being able to calculate the square root of 1412 in the blink of an eye.
Thinking logically is a concept introduced long ago by Aristotle, but it still isn't the recommended approach, because rationality only concerns what decisions are made, not the thought process behind them.
Since goals are expressed in terms of the utility (benefit) of outcomes, being rational means maximizing your expected utility.
One can argue that to act rationally, one must think logically; because an AI is basically a computer and computers have the ability to process logic, they can work toward their goals using logic and trial and error.
Since an AI is a computer, it can also store past experiences in its memory or use databases to achieve its goals.
Therefore, we can deduce that memory and simulation are the key to decision making.
The study of AI as a rational agent is advantageous but it’s also not perfect.
Achieving perfect rationality in complex environments is not possible because the computational
demands are too high.
Note: There's a huge difference between programming a computer to do a certain thing given all the steps to achieve the goal, and programming a computer with certain decision-making tools that allow it to learn how to achieve said goal by trial and error.
1.3. AI History:
1.4. Examples of what an AI can do:
Action: Can it be implemented?
• Play a decent game of table tennis: Yes
• Play a decent game of Jeopardy: Yes
• Drive safely along a curving mountain road: Yes
• Drive safely through Cairo: Maybe
• Buy a week's worth of groceries on the web: Yes
• Buy a week's worth of groceries at a hypermarket: No
• Discover and prove a new mathematical theorem: Maybe
• Converse successfully with another person for an hour: No
• Perform a surgical operation: No
• Put away the dishes and fold the laundry: Yes
• Translate spoken Chinese into spoken English in real time: Yes
• Write an intentionally funny story: No
When the answer is No, it is because the environment is too complex and this complexity MUST be captured by the AI, since the action is too sensitive; achieving that complexity requires unachievable computational demands.
1.5. AI Applications:
Natural Language:
Speech technologies:
• Automatic Speech Recognition (ASR).
• Text-To-Speech Synthesis (TTS).
• Dialog Systems.
Language processing technologies:
• Question answering.
• Machine translation.
• Web search.
• Text classification and spam filtering.
Vision (Perception):
• Object and face recognition.
• Scene segmentation.
• Image classification.
Robotics:
• Home assistant robots.
• Soccer robots.
• Self-driving cars (Google cars).
• Advanced motion robots.
1.6. Designing Rational Agents:
• An agent is an entity that perceives and acts.
• A percept is a lasting result of something we have perceived, which, rather than being
immediately perceived, is something we know we could possibly perceive due to a certain action.
• A rational agent selects actions that maximize its expected utility.
• An agent is anything that can be viewed as perceiving its environment through sensors and acting
upon that environment through actuators.
• Characteristics of the sensors, actuators, and environment dictate techniques for selecting
rational actions.
• Humans can be considered as agents.
Agents can be grouped into five classes based on their degree of perceived intelligence and capability.
With Waymo, for example, the model-based agent uses GPS to understand its location and anticipate what upcoming drivers will do. You and I take for granted that, when the brake lights of the car ahead of us come on,
the driver has hit the brakes and so the car in front of us is going to slow down. But there's no reason to
associate a red light with the deceleration of a vehicle, unless you are used to seeing those two things
happen at the same time. So the Waymo can learn that it needs to hit the brakes by drawing on its
perceptual history. Waymo can learn to associate red brake lights just ahead with the need to slow itself
down.
2.1. Types of agents:
Reflex Agents:
Reflex agents ignore percept history and act only on the current percept.
The agent function is based on the condition-action rule: if a condition (state) is true, the corresponding action is taken; otherwise it is not.
We can deduce from these characteristics that a reflex agent doesn't consider the future consequences of its actions; therefore, it doesn't need a model of the world's current state.
This kind of agent can work in a fully observable environment but in partially observable environments
infinite loops are impossible to avoid unless the agent can randomize its actions.
Example: Blinking your eye, and vacuum cleaner moving towards the nearest dirt.
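For understanding, here is a minimal sketch (not from the course) of a condition-action reflex rule for the classic two-square vacuum world; the percept format and the action names are assumptions made for illustration.

```python
def reflex_vacuum_agent(percept):
    """Condition-action rules applied to the current percept only (no history).
    The percept is assumed to be a (location, is_dirty) pair."""
    location, is_dirty = percept
    if is_dirty:               # condition: the current square is dirty
        return "Suck"          # action
    if location == "A":        # otherwise move toward the other square
        return "Right"
    return "Left"

# reflex_vacuum_agent(("A", True))   -> "Suck"
# reflex_vacuum_agent(("A", False))  -> "Right"
```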
Model-based agents must have a model of the environment and of how the environment evolves in response to their actions (percept history).
The model of the environment must be given to them initially, but how the environment evolves in response to their actions is percept history, learned by trial and error or through given data.
Therefore, a planning agent can replan to complete its action more precisely, but at a higher computational cost, by evolving its percept history.
Re-planning Agents
Mastermind Agents
2.2. Environment Classification:
Fully or Partially Observable:
Fully: meaning that the AI can see the whole environment clearly, as in chess where it can see all the pieces, the tiles, and where the pieces currently are.
Partially: meaning that the AI can see only part of the environment, as in dominoes where it can't see the pieces in the opponent's hand, in medical diagnosis, and in self-driving cars where the agent can only see the road as far as its sensors can reach.
Adversarial:
Deterministic or Stochastic:
Deterministic: Meaning that the current state and the action taken can completely determine the next
state of the environment.
Stochastic: meaning that the next state of the environment cannot be completely determined by the current state and the action taken; it appears random from the agent's point of view.
Discrete or Continuous:
Discrete: meaning that the environment has a finite number of percepts and actions that can be performed within it.
Continuous: meaning that the environment has an infinite number of percepts and actions that can be performed within it.
Single or Multi-Agent
2.3. PEAS:
PEAS stands for Performance measure, Environment, Actuator, Sensor.
• Performance Measure: it’s the unit to define the success of an agent in what it does.
• Environment: it’s the surroundings of an agent at every instant.
• Actuator: It’s the part of the agent that delivers the output of action to the environment.
• Sensor: It’s the receptive part of an agent that takes in the input for the agent.
A world state includes every last detail of the environment while a search state keeps only the details
needed for planning or to solve the search problem.
Therefore, we can understand that we have to be picky when designing an AI; choosing the states with
the greatest benefits to the AI’s functionality and neglecting the ones with minimum benefits.
2.6. State Space Representations:
2.6.1. State Space Graph:
• It’s a mathematical representation of a search
problem.
• Nodes are abstracted world configurations.
• Arcs represent successors (action results).
• The goal test is a set of goal nodes (or just one
node).
• In a state space graph, each state occurs only
once.
A state space graph cannot be fully built as it’s too big; usually a partial graph is built to solve the problem.
• The whole idea is to ask the question of “what if” on each node and if the goal state isn’t met at
that node, then the tree expands according to the search algorithm.
A search tree only expands the states that are needed by the agent to achieve its goals.
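For understanding, a minimal sketch (in Python, not from the course) of how a search problem can be written down: the state space is a successor function returning (action, next state, step cost) triples, and the goal test is a simple check. The toy graph, state names, and costs below are made up.

```python
# A hypothetical toy state space graph with made-up step costs.
GRAPH = {
    "s": [("s->a", "a", 2), ("s->b", "b", 5)],
    "a": [("a->c", "c", 2), ("a->d", "d", 4)],
    "b": [("b->d", "d", 1)],
    "c": [("c->e", "e", 3)],
    "d": [("d->e", "e", 3)],
    "e": [],
}
START, GOAL = "s", "e"

def successors(state):
    """Successor function: (action, next_state, step_cost) triples."""
    return GRAPH.get(state, [])

def is_goal(state):
    """Goal test for the single goal node."""
    return state == GOAL
```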
Definitions:
Properties:
1 + b + b^2 + b^3 + ⋯ + b^m = O(b^m) [Complexity]
Such that b is the branching factor and m is the maximum depth of the search tree.
In this chapter, the search problem -travelling from the node “s” to the node “e”- will be used to
demonstrate different search algorithms.
3.1. Uninformed Search:
Idea of work:
Since iterative deepening visits states multiple times, it may seem wasteful, but it turns out to be not so
costly, since in a tree, most of the nodes are in the bottom level, so it does not matter much if the upper
levels are visited multiple times.
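A minimal sketch of iterative deepening as described above: run a depth-limited DFS with limits 0, 1, 2, ... until a goal is found. It assumes the successors/is_goal interface from the earlier toy sketch.

```python
def depth_limited_dfs(state, successors, is_goal, limit, path=()):
    """Depth-limited DFS: returns a tuple of states leading to a goal, or None."""
    if is_goal(state):
        return path + (state,)
    if limit == 0:
        return None
    for _action, nxt, _cost in successors(state):
        result = depth_limited_dfs(nxt, successors, is_goal, limit - 1,
                                   path + (state,))
        if result is not None:
            return result
    return None

def iterative_deepening(start, successors, is_goal, max_depth=50):
    """Re-run depth-limited DFS with increasing depth limits (0, 1, 2, ...)."""
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(start, successors, is_goal, limit)
        if result is not None:
            return result
    return None

# e.g. iterative_deepening(START, successors, is_goal) with the toy problem above
```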
Properties:
• Complete.
• Optimal if all costs are 1.
• Time complexity is proven to be similar to BFS (advantage).
• Space complexity similar to DFS (advantage).
3.2. Informed Search:
3.2.1. Uniform Cost Search:
Algorithm:
• It processes all nodes with cost less than the cheapest solution by exploring increasing cost
contours.
• If that solution costs C* and arcs cost at least ε, then the effective depth is roughly C*/ε.
• Expand the cheapest node first.
• Fringe is a priority queue depending on cumulative cost.
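A minimal UCS sketch matching the algorithm above: the fringe is a priority queue keyed by the cumulative cost g(n), and the cheapest node is always expanded first. It assumes the toy successors/is_goal interface from the earlier sketch.

```python
import heapq

def uniform_cost_search(start, successors, is_goal):
    """Expand the cheapest node first; the fringe is keyed by cumulative cost g(n)."""
    fringe = [(0, start, [start])]           # (g, state, path)
    expanded = {}                            # cheapest g at which a state was expanded
    while fringe:
        g, state, path = heapq.heappop(fringe)
        if is_goal(state):
            return path, g                   # cheapest path and its cost
        if state in expanded and expanded[state] <= g:
            continue                         # already expanded more cheaply
        expanded[state] = g
        for _action, nxt, step_cost in successors(state):
            heapq.heappush(fringe, (g + step_cost, nxt, path + [nxt]))
    return None, float("inf")
```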
Properties:
• Complete: assuming that the best solution has a finite cost and the minimum arc cost is positive.
• Optimal: chooses the path with the lowest cost.
• Time complexity: O(b^(C*/ε)) (exponential in the effective depth).
• Space complexity: holds roughly the last tier, so O(b^(C*/ε)).
Disadvantages:
• Explores options in every direction.
• Has no information about the location of the goal.
A heuristic is a function that estimates how close a state is to the goal. A heuristic is designed for a particular search problem (meaning it changes according to the search problem).
Examples:
For the following Pac-man game, assume the search problem of pathing where we want to reach the dot.
The heuristic here would be the distance between the Pac-man and the dot. The distance may be the
Manhattan distance (horizontal and vertical distance summed up), or Euclidean distance (directly from
the Pac-man to the dot).
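A minimal sketch of the two distances mentioned above, treating positions as (x, y) grid coordinates (the example coordinates are made up).

```python
import math

def manhattan_distance(pos, goal):
    """Horizontal and vertical distances summed up."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

def euclidean_distance(pos, goal):
    """Straight-line distance from the position to the goal."""
    return math.hypot(pos[0] - goal[0], pos[1] - goal[1])

# Example: Pac-man at (1, 1), dot at (4, 5)
# manhattan_distance((1, 1), (4, 5)) -> 7
# euclidean_distance((1, 1), (4, 5)) -> 5.0
```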
For the following map of Romania, assume the search problem of travelling from Arad to Bucharest. The
heuristic here would be the straight-line distance to Bucharest (Euclidean distance to Bucharest).
Choosing the heuristic for the search problem is a critical design step as a different heuristic will lead to a
solution with different properties (time complexity, space complexity, …)
4.1. Informed Search using Search Heuristics:
4.1.1. Greedy Search:
Expand the node that seems closest (with the least heuristic).
Example:
For the following map of Romania, assume the search problem of travelling from Arad to Bucharest. The
heuristic here would be the straight-line distance to Bucharest (Euclidean distance to Bucharest).
Notice how in terms of cost, the algorithm chose a path with a cost equal to (140+99+211=450) while the
optimal path has a cost equal to (140+80+97+101=418), therefore, the greedy search didn’t choose the
optimal solution.
Properties:
• Heuristic: Estimate of distance to nearest goal for each state.
• Uses a single-shot cost, the heuristic at the node (it doesn't accumulate the total path cost).
• Classified as a very fast search algorithm.
• Common-case: Can easily take you straight to the (wrong/not the optimal) goal.
• Worst-case: A bad greedy search algorithm can be classified as a badly-guided DFS.
• Greedy search is complete but not optimal.
4.1.2. A* Search
The UCS algorithm orders by path cost (cumulative), or backward cost g(n), while greedy search orders by proximity to the goal, or forward cost h(n).
The A* search algorithm combines both and orders by the sum of the backward and forward costs, f(n) = g(n) + h(n).
Example:
Algorithm:
• Combines the speed of greedy search with the optimality of UCS.
• Only stops when we dequeue a goal.
• Expands mainly in a fairly direct route toward the goal, but does hedge its bets to ensure optimality.
Properties:
• Complete.
• Optimal through admissible/consistent heuristics.
• Time Complexity: Less than UCS.
• Space Complexity: Almost the same as UCS.
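A minimal A* sketch following the description above: the fringe is ordered by f(n) = g(n) + h(n), and the search only terminates when a goal is dequeued. The successors/is_goal interface and the heuristic h are assumed inputs (any admissible heuristic can be plugged in; h = 0 everywhere reduces A* to UCS, while ordering by h alone gives greedy search).

```python
import heapq

def a_star_search(start, successors, is_goal, h):
    """A*: fringe ordered by f(n) = g(n) + h(n); stops only when a goal is dequeued."""
    fringe = [(h(start), 0, start, [start])]     # (f, g, state, path)
    expanded = {}                                # cheapest g at which a state was expanded
    while fringe:
        _f, g, state, path = heapq.heappop(fringe)
        if is_goal(state):
            return path, g                       # goal dequeued: terminate
        if state in expanded and expanded[state] <= g:
            continue
        expanded[state] = g
        for _action, nxt, step_cost in successors(state):
            g2 = g + step_cost
            heapq.heappush(fringe, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")
```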
Termination of A* Search:
An A* search algorithm only stops/terminates when we dequeue a goal.
To put it simply, this means that the algorithm only terminates when no other nodes exist on the fringe
with a total cost equal to or less than the total cost of the current candidate solution. When that happens,
the algorithm explores the goal which removes it from the fringe.
Examples:
In this example, the red path is not optimal; however, the algorithm will not terminate once it finds that goal. It will first explore S → A, as A is a node on the fringe with a total cost lower than the current candidate solution's cost.
In this example, the red path is not optimal but will be chosen as the solution anyway, because the node A on the fringe has a total cost greater than the total cost of the candidate solution; therefore, it won't be explored.
Therefore, the node A, beyond which the optimal goal lies, is trapped on the fringe by the bad heuristic.
This is where admissibility is required to solve the problem that may be caused by the heuristic function.
Admissibility of a Heuristic:
A heuristic is inadmissible when it breaks optimality (leads to the non-optimal solution) by trapping good
plans on the fringe.
A heuristic is admissible when it slows down bad plans but never outweighs true costs. This means that
the heuristic at any node is lower than or equal to the true cost to a nearest goal.
0 ≤ h(n) ≤ h*(n)
An example of an admissible heuristic is the Euclidean distance in a Pac-man pathing search problem: because of the barriers, the heuristic at any location is always less than or equal to the actual cost to the nearest goal.
Optimality of A* Search Algorithm:
In the following example, we'll prove that the A* search algorithm is optimal when using an admissible heuristic h, by showing that the optimal goal A will exit the fringe before the suboptimal goal B under the following assumptions:
1. Assume that B is on the fringe and that an ancestor of A, called n, is also on the fringe.
2. Since A is the optimal goal, its cost is the true cost to the nearest goal:
o f(A) = g(A) + h(A) = g(A) [h(A) = 0, as the heuristic at a goal is zero]
3. The total cost of reaching the node n is:
o f(n) = g(n) + h(n)
4. Since the heuristic is admissible and n is an ancestor of the optimal goal A:
o f(n) = g(n) + h(n) ≤ f(A) = g(A)
5. The total cost of reaching the candidate goal B is:
o f(B) = g(B) + h(B) = g(B) [h(B) = 0, as the heuristic at a goal is zero]
6. Since B is a suboptimal goal, its path cost is higher than the optimal cost:
o g(A) < f(B) = g(B)
7. Therefore:
o f(n) ≤ f(A) < f(B)
o n expands before B.
o The same holds for every ancestor of A, and eventually for A itself, so A expands before B and leaves the fringe (dequeues) first.
o The A* search algorithm is optimal.
A* Applications:
• Video games.
• Pathing/routing problems.
• Resource planning problems.
• Robot motion planning.
• Language analysis.
• Machine translation.
• Speech recognition.
5. Constraint Satisfaction Problems:
5.1. A comparison between SSPs & CSPs
5.1.1. Standard Search Problems:
In standard search problems we assumed that:
• The path to the goal is important but the goal itself is really not the center of attention.
• Paths have various costs, depths.
• Heuristics give problem-specific guidance.
An identification problem is one where we must simply identify whether a state is a goal state or not,
with no regard to how we arrive at that goal.
• In standard search problems, a state is a "black box": an arbitrary data structure given to the agent to solve the problem; the AI agent doesn't have any information about what each state represents.
• Goal test can be any function over states.
• Successor function can also be anything.
CSPs are a specialized subset of search problems that allow useful general-purpose algorithms with more power than standard search algorithms.
• Variables: CSPs possess a set of N variables 𝑋1 , … , 𝑋𝑁 that can each take on a single value from
some defined set of values.
• Domain: A set {𝑥1 , … , 𝑥𝑑 } representing all possible values that a CSP variable can take on.
• Constraints: Constraints define restrictions on the values of variables, potentially with regard to
other variables.
EX: for IPv4 you have 32 bits in the IP, so there are 2^32 variations of that IP; the domain is {0, 1}, and the variables are the 32 bits (with their positions in mind).
5.2. Examples of CSPs
5.2.1. Map Coloring Example:
The idea is that we want each state (province) to have a color that is different from its neighbors'.
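For understanding, a minimal sketch of how a map-coloring CSP can be written down as variables, domains, and binary constraints. The regions and adjacencies below are the commonly used Australia toy map, not the figure from the course.

```python
# Variables: the regions to color.  Domain: the available colors.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}

# Binary constraints: neighboring regions must receive different colors.
neighbors = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
             ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
             ("NSW", "V")]

def satisfies(assignment):
    """True if every pair of assigned neighbors has different colors."""
    return all(assignment[a] != assignment[b]
               for a, b in neighbors
               if a in assignment and b in assignment)
```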
5.2.2. N-Queens Example:
Formulation One:
• Variables: 𝑋𝑖𝑗
o The variables represent each tile on the board, with 1 ≤ i, j ≤ N, where N is the number of rows/columns and the number of queens on the board.
• Domain: D = {0, 1}
o The domain represents whether there’s a queen or not.
If a variable/tile is given the value 0 from the domain, then there's no queen on that tile; if it's given 1, then there is.
• Constraints:
• ∑i,j Xij = N
▪ This constraint states that we must have exactly N grid positions marked
with a 1, and all others marked with a 0, capturing the requirement that
there are exactly N queens on the board.
• ∀𝑖, 𝑗, 𝑘 (𝑋𝑖𝑗 , 𝑋𝑖𝑘 ) ∈ {(0, 0), (0, 1), (1,0)}
▪ This constraint states that if two variables have the same value for i, only
one of them can take on a value of 1, encapsulating the condition that no
two queens can be in the same row.
• ∀𝑖, 𝑗, 𝑘 (𝑋𝑖𝑗 , 𝑋𝑘𝑗 ) ∈ {(0, 0), (0, 1), (1,0)}
▪ This constraint states that if two variables have the same value for j, only
one of them can take on a value of 1, encapsulating the condition that no
two queens can be in the same column.
• ∀𝑖, 𝑗, 𝑘 (𝑋𝑖𝑗 , 𝑋𝑖+𝑘,𝑗+𝑘 ) ∈ {(0, 0), (0, 1), (1,0)}
• ∀𝑖, 𝑗, 𝑘 (𝑋𝑖𝑗 , 𝑋𝑖+𝑘,𝑗−𝑘 ) ∈ {(0, 0), (0, 1), (1,0)}
▪ With similar reasoning as above, we can see that the previous two
constraints represent the conditions that no two queens can be in the same
major or minor diagonals, respectively.
Formulation Two:
• Variables: 𝑄𝑘
o The variable represents each row.
• Domain: {1, 2, 3, …, N}
o The domain represents the tiles in the row.
• Constraints:
o Implicit: ∀𝑖, 𝑗 𝑁𝑜𝑛 − 𝑡ℎ𝑟𝑒𝑎𝑡𝑒𝑛𝑖𝑛𝑔(𝑄𝑖 , 𝑄𝑗 )
o Explicit: (𝑄1 , 𝑄2 ) ∈ {(1, 3), (1, 4), … }, (𝑄2 , 𝑄3 ) ∈ ⋯
▪ The explicit constraint means that, for rows 1 and 2, we may, for example, select the first tile in row 1 and the third tile in row 2; the set lists all the choices where the queens in rows 1 and 2 are safe from each other.
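A minimal sketch of formulation two's constraint check: each variable Q_k holds the column chosen for the queen in row k, and two queens are non-threatening when they share neither a column nor a diagonal (rows differ by construction). The example uses N = 4 with rows and columns numbered 1..4.

```python
def non_threatening(q_i, q_j, i, j):
    """Queens in rows i and j (columns q_i and q_j) must not share a column or a diagonal."""
    return q_i != q_j and abs(q_i - q_j) != abs(i - j)

def is_solution(assignment):
    """assignment[k] is the column chosen for the queen in row k."""
    rows = list(assignment)
    return all(non_threatening(assignment[i], assignment[j], i, j)
               for idx, i in enumerate(rows)
               for j in rows[idx + 1:])

# is_solution({1: 2, 2: 4, 3: 1, 4: 3}) -> True (a valid 4-queens placement)
```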
5.3. Expanding Knowledge on CSPs:
5.3.1. Constraint Graphs:
Constraint satisfaction problems are often represented as constraint graphs, where nodes represent
variables and edges represent constraints between them.
Algorithms use the constraint graph structure to speed up the search process.
Soft Constraints
Discrete Variables
A discrete variable can take only a specific value from the set of all possible values; in other words, if the value cannot keep counting (it comes from a set of separate options), then it is a discrete variable, also known as a categorical variable.
• Finite Domains:
o The number of values in the domain is limited.
o E.g., Boolean CSPs, including Boolean satisfiability.
o Size d means O(d^n) complete assignments.
• Infinite Domains (integers, strings, etc.)
o The number of values in the domain is vast.
o E.g., job scheduling, variables are start/end times for each job.
o Linear constraints are solvable, while nonlinear constraints are undecidable.
Continuous Variables
A continuous variable can take any value. Think of it like this: if the number in the variable can keep counting, then it's a continuous variable.
The whole idea is to formulate CSPs as standard search problems and then use CSP-specific methods to improve the search's performance.
For understanding:
A State is defined by the value assigned so far from the domain to the variables in the state:
• Initial State: The state contains variables; those variables don’t have any assigned values from
the domain.
• Successor Function: Assign a value from the domain to an unassigned variable in the current
state.
• Goal Test: Is the current state complete and satisfies all the constraints?
For memorizing:
When assigning values from the domain to variables, only do one variable at a time.
When selecting values for a variable from the domain, only select values that don’t conflict with any
previously assigned values. If no such values exist, backtrack and return to the previous variable,
changing its value.
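A minimal backtracking-search sketch following the recipe above: assign one variable at a time, only keep values consistent with the previous assignments, and backtrack when a variable has no consistent value left. The variables/domains/consistent inputs are assumed to come from the CSP's formulation (e.g. the map-coloring sketch earlier, with consistent = satisfies).

```python
def backtracking_search(variables, domains, consistent, assignment=None):
    """Assign one variable at a time; backtrack when no consistent value exists."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return assignment                       # complete and consistent
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):              # only keep non-conflicting values
            result = backtracking_search(variables, domains, consistent, assignment)
            if result is not None:
                return result
        del assignment[var]                     # undo and try the next value
    return None                                 # no value worked: backtrack
```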
6.2.1. Filtering:
Filtering involves keeping track of the domain for unassigned variables and crossing off bad options.
Forward Checking:
After assigning a value from the domain to a variable, remove from the domains of the unassigned variables that share a constraint with the assigned variable any values that would cause a violation.
• Cross off values that violate a constraint when added to the existing assignment.
• Disadvantage is that it doesn’t provide early detection for all failures.
Arc Consistency of a Single Arc:
• For an arc X→Y, the arc is consistent if for every value of "X" in the tail, there is some value of "Y" in the head which could be assigned without violating a constraint.
o In other words, every value that can be assigned to the variable at the tail has some compatible value that can be assigned to the variable at the head. If that is true, then the arc is consistent.
Forward checking enforces that all arcs pointing from the unassigned variables that share a constraint
with the assigned variable are consistent.
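A minimal forward-checking sketch: after assigning var = value, cross off from each neighboring variable's domain every value that would violate the shared constraint; an emptied domain signals failure early. The neighbors function and the conflicts predicate are assumed, problem-specific inputs.

```python
def forward_check(var, value, domains, neighbors, conflicts):
    """Prune the domains of variables sharing a constraint with var after assigning var=value.
    Returns pruned copies of the domains, or None if some domain becomes empty."""
    new_domains = {v: set(vals) for v, vals in domains.items()}
    new_domains[var] = {value}
    for other in neighbors(var):
        # cross off values that would violate the constraint with var=value
        new_domains[other] = {w for w in new_domains[other]
                              if not conflicts(var, value, other, w)}
        if not new_domains[other]:
            return None       # a neighbor has no legal values left: fail early
    return new_domains
```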
Arc-Consistency of an entire CSP:
In the arc-consistency filtering algorithm, we make sure that all arcs are consistent.
Extra Knowledge:
• Begin by storing all arcs in the constraint graph for the CSP in a queue 𝑄.
• Iteratively remove arcs from 𝑄, and for each removed arc 𝑋𝑗 → 𝑋𝑖 enforce that, for every value of the tail variable 𝑋𝑗, there exists at least one value of the head variable 𝑋𝑖 that does not violate any constraints.
• If some value of the tail variable 𝑋𝑗 does not work with any value of the head variable 𝑋𝑖, we remove that value from the set of possible values of the tail variable 𝑋𝑗.
• If a value is removed for 𝑋𝑗 when enforcing arc-consistency for an arc 𝑋𝑗 → 𝑋𝑖 , add arcs of the
form 𝑋𝑘 → 𝑋𝑗 to 𝑄 where 𝑋𝑘 represents all unassigned variables.
• Continue until all arcs are removed from 𝑄.
• Assign a value to a variable, then repeat the previous steps.
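A minimal sketch of the arc-consistency procedure described in the steps above (AC-3 style). The domains are assumed to be sets, arcs is a list of directed (tail, head) pairs, and allowed(tail_value, head_value, tail, head) is an assumed predicate saying whether a pair of values violates no constraint.

```python
from collections import deque

def enforce_arc_consistency(domains, arcs, allowed):
    """Prune domains until every arc is consistent.
    Returns False if some domain becomes empty, else True (domains pruned in place)."""
    queue = deque(arcs)
    while queue:
        tail, head = queue.popleft()
        removed = False
        for t_val in set(domains[tail]):
            # t_val needs at least one supporting value in the head's domain
            if not any(allowed(t_val, h_val, tail, head) for h_val in domains[head]):
                domains[tail].discard(t_val)
                removed = True
        if not domains[tail]:
            return False                      # no legal values left: failure
        if removed:
            # re-check every arc that points into the tail we just pruned
            for xk, xl in arcs:
                if xl == tail and xk != head:
                    queue.append((xk, tail))
    return True
```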
Limitations of Arc-Consistency:
• After enforcing arc consistency, you may have one solution left, multiple solutions, or no
solutions at all and not know it.
• Because it only reasons about one arc at a time, arc-consistency may leave every arc consistent even when the problem as a whole has no solution.
6.2.2. Ordering:
Minimum Remaining Values (MRV):
• Choose the variable with the fewest legal values left in its domain.
• That variable is also called the "most constrained variable".
Least Constraining Value (LCV):
• Given a choice of variable, choose the least constraining value: the one that rules out the fewest values in the domains of the remaining variables.
• Finding that value may take more computation.
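A minimal sketch of the two ordering heuristics just described: MRV picks which unassigned variable to try next, and LCV orders that variable's values. The count_ruled_out helper is an assumed problem-specific function that counts how many values a candidate assignment would remove from the remaining variables' domains.

```python
def select_mrv_variable(variables, assignment, domains):
    """MRV: choose the unassigned variable with the fewest legal values left."""
    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))

def order_lcv_values(var, domains, count_ruled_out):
    """LCV: try first the value that rules out the fewest values in the remaining variables."""
    return sorted(domains[var], key=lambda value: count_ruled_out(var, value))
```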
6.2.3. Structure:
Structure is exploiting the problem’s structure to improve the performance of backtracking search.