Artificial Intelligence Unit 1 - 5
UNIT-I
1.1 Introduction
Formal Definition of AI:
AI is a branch of computer science which is concerned with the study and creation of computer
systems that exhibit
some form of intelligence
OR
those characteristics which we associate with intelligence in human behavior
AI is a broad area consisting of different fields, from machine vision, expert systems to the
creation of machines that can "think". In order to classify machines as "thinking", it is necessary
to define intelligence.
Intelligence:
Intelligence is a property of mind that encompasses many related mental abilities, such as the
capabilities to
reason
plan
solve problems
think abstractly
comprehend ideas and language and
learn
Intelligent Systems:
An intelligent system is a system that can imitate and automate some of the intelligent
behavior of human beings. Expert systems, intelligent agents and knowledge-based systems
are examples of intelligent systems. Intelligent systems perform search and optimization
along with learning
capabilities. So they are technologically advanced machines that perceive and respond to the
world around them. The field of intelligent systems also focuses on how these systems interact
with human users in changing and dynamic physical and social environments.
Categories of AI System
Systems that think like humans
Systems that act like humans
Systems that think rationally
Systems that act rationally
Foundations of AI
Foundation of AI is based on
• Philosophy
• Mathematics
• Economics
• Neuroscience
• Control Theory
• Linguistics
• Computer Engineering
• Psychology
Philosophy:
Can formal rules be used to draw valid conclusions?
How does the mind arise from a physical brain?
Where does knowledge come from?
How does knowledge lead to action?
Mathematics:
o More formal logical methods
Boolean logic
Fuzzy logic
o Uncertainty
The basis for most modern approaches to handle uncertainty in AI
applications can be handled by Probability theory, modal and temporal logics
Economics:
How should we make decisions so as to maximize payoff?
How should we do this when others may not go along?
How should we do this when the payoff may be far in the future?
Neuroscience:
How does the brain work?
o Early studies (1824) relied on injured and abnormal people to understand what
parts of brain work
o More recent studies use accurate sensors to correlate brain activity to human
thought
▪ By monitoring individual neurons, monkeys can now control a computer
mouse using thought alone
o How close are we to having a mechanical brain?
▪ Parallel computation, remapping, interconnections,….
Control Theory:
o Machines can modify their behavior in response to the environment (sense/action loop)
▪ Water-flow regulator, steam engine governor, thermostat
o The theory of stable feedback systems (1894)
▪ Build systems that transition from initial
state to goal state with minimum energy
▪ In 1950, control theory could only describe
linear systems and AI largely rose as a
response to this shortcoming
Linguistics:
How does language relate to thought?
Speech demonstrates so much of human intelligence
o Analysis of human language reveals thought taking place in ways not understood
in other settings
▪ Children can create sentences they have never heard before
Language and thought are believed to be tightly intertwined
Computer Engineering:
How can we build an efficient computer?
Psychology:
How do humans and animals think and act?
History of AI
A boom of AI (1980-1987):
Year 1980: After AI winter duration, AI came back with "Expert System". Expert
systems were programmed that emulate the decision-making ability of a human expert.
In the Year 1980, the first national conference of the American Association of Artificial
Intelligence was held at Stanford University.
Sub-areas of AI
Artificial Intelligence has various subfields in its domain. These subfields can be
distinguished by the techniques they use:
Neural Networks
Neural networks are inspired by the human brain and copy its working process.
They are based on a collection of connected units or nodes called artificial
neurons or perceptrons.
The objective of this approach is to solve problems in the same way that a human
brain does.
Vision
In artificial intelligence, vision (vision applications) means processing
image/video sources to extract meaningful information and take action based on it.
In this field, robots have also been developed that learn human activities within
days, or sometimes hours, and train themselves, e.g. object recognition, image
understanding, game-playing robots, etc.
Machine Learning
The capability of Artificial Intelligence systems to learn by extracting patterns from data
is known as Machine Learning.
It is an approach or subset of Artificial Intelligence that is based on the idea that
machines can be given access to data along with the ability to learn from it.
Robotics
Robots are artificial agents that behave like humans and are built to manipulate
objects by perceiving, picking, moving, or modifying their physical properties,
or to have some other effect, thereby freeing people from repetitive tasks, which
the robot performs without getting bored, distracted, or exhausted.
Applications
Some of the applications are given below:
Business : Financial strategies, give advice
Engineering: check design, offer suggestions to create new product
Manufacturing: Assembly, inspection & maintenance
Mining: used when conditions are dangerous
Hospital : monitoring, diagnosing & prescribing
Education : In teaching
Household : Advice on cooking, shopping etc.
Farming : prune trees & selectively harvest mixed crops.
Structure of Agents
Agents
Definition: An agent perceives its environment via sensors and acts upon that environment
through its actuators
Rational Agents:
An agent should strive to "do the right thing", based on what:
– it can perceive and
– the actions it can perform.
The right action is the one that will cause the agent to be most successful
Definition:
For each possible percept sequence, a rational agent should select an action that maximizes
its performance measure (in expectation) given the evidence provided by the percept
sequence and whatever built-in knowledge the agent has.
Types of Agents
1. Simple reflex agents
2. Model based reflex agents
3. Goal based agents
4. Utility based agents
5. Learning agents
Example: The agent program for a simple reflex agent in the two-state vacuum
environment.
function REFLEX-VACUUM-AGENT([location,status]) returns an action
if status = Dirty then return Suck
else if location = A then return Right
else if location = B then return Left
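The pseudocode above translates almost line for line into Python; this is a minimal sketch, with percepts represented as (location, status) pairs:

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent for the two-state vacuum world.

    percept is a (location, status) pair, e.g. ("A", "Dirty").
    The rules mirror the pseudocode: suck if dirty, otherwise
    move toward the other square.
    """
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    elif location == "B":
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(reflex_vacuum_agent(("A", "Clean")))  # Right
```

Note the agent consults only the current percept, never past percepts, which is exactly what makes it a simple reflex agent.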
3. Goal-based agents
They choose their actions in order to achieve goals. The goal-based approach is more flexible
than the reflex agent approach, since the knowledge supporting a decision is explicitly
modeled, thereby allowing for modifications.
Goal − It is the description of desirable situations.
4. Utility-based agents
They choose actions based on a preference (utility) for each state.
Goals are inadequate when −
• There are conflicting goals, of which only a few can be achieved.
• Goals have some uncertainty of being achieved, and you need to weigh the likelihood of
success against the importance of a goal.
5. Learning agents
Learning agents are agents that adapt and improve over time.
More complicated when agent needs to learn utility information: Reinforcement learning
Overview of Structure of Agents:
An agent perceives and acts in an environment, has an architecture, and is implemented by an
agent program.
A rational agent always chooses the action which maximizes its expected performance, given its
percept sequence so far.
An autonomous agent uses its own experience rather than built-in knowledge of the environment
by the designer.
An agent program maps from percept to action and updates its internal state.
Simple reflex agents
are based on condition-action rules, implemented with an appropriate production
system. They are stateless devices which do not have memory of past world
states.
Agents with memory - Model-based reflex agents
have internal state, which is used to keep track of past states of the world.
Agents with goals – Goal-based agents
are agents that, in addition to state information, have goal information that
describes desirable situations. Agents of this kind take future events into
consideration.
Utility-based agents
base their decisions on classic axiomatic utility theory in order to act rationally.
Learning agents
they have the ability to improve performance through learning.
State Space
Together the initial state, actions and transition model implicitly define the state space of
the problem – the set of all states reachable from the initial state by any sequence of
actions.
The state space forms a directed network or graph in which the nodes are states and the
links between nodes are actions.
A path in the state space is a sequence of states connected by a sequence of actions.
Formulate problem
States: various cities
Actions: drive between cities
Problem types
1. Single-state problem
2. Multiple-state problem
3. Contingency problem
Contingency problem
partially observable (initial state not observable)
non-deterministic
Solution :
A sequence of operators leading from the initial state to a goal state
Abstraction
Real world is absurdly complex
State space must be abstracted for problem solving
(Abstract) operator
Complex combination of real actions
Example: Arad → Zerind represents complex set of possible routes
(Abstract) solution
Set of real paths that are solutions in the real world
[Figure: 8-puzzle example. Start state: 7 2 4 / 5 _ 6 / 8 3 1; goal state: 1 2 3 / 4 5 6 / 7 8 _]
Place eight queens on a chess board such that no queen can attack another queen
No path cost because only the final state counts!
Incremental formulations
Complete state formulations
Solutions:
Eight Queens Problem Formulation 1:
States:
o Any arrangement of 0 to 8 queens on the board
Initial state:
o No queens on the board
Successor function:
o Add a queen to an empty square
Goal Test:
o 8 queens on the board and none are attacked
64·63·…·57 ≈ 1.8×10^14 possible sequences
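The goal test of this formulation ("8 queens on the board and none are attacked") can be sketched in Python; here a board is simply a list of (row, col) positions, an illustrative representation rather than one fixed by the text:

```python
def attacks(q1, q2):
    """True if queens at (row, col) positions q1 and q2 attack each other:
    same row, same column, or same diagonal."""
    r1, c1 = q1
    r2, c2 = q2
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

def goal_test(queens):
    """8 queens on the board and no pair attacks each other."""
    if len(queens) != 8:
        return False
    return all(not attacks(a, b)
               for i, a in enumerate(queens)
               for b in queens[i + 1:])

# A known solution, given as the column of the queen in each row 0..7.
solution = [(r, c) for r, c in enumerate([0, 4, 7, 5, 2, 6, 1, 3])]
print(goal_test(solution))  # True
```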
Search: Searching is a step by step procedure to solve a search-problem in a given search space.
A search problem can have three main factors:
Search Space: Search space represents a set of possible solutions, which a system may
have.
Start State: It is a state from where agent begins the search.
Goal test: It is a function which observes the current state and returns whether the goal
state is achieved or not.
Search tree: A tree representation of a search problem is called a search tree. The root of the
search tree is the root node, which corresponds to the initial state.
Search Trees
(a) The initial state Arad
Informed Search:
Informed search algorithms use domain knowledge. In an informed search, problem
information is available which can guide the search. Informed search strategies can find a
solution more efficiently than an uninformed search strategy. Informed search is also
called a Heuristic search.
A heuristic is a technique that is not guaranteed to find the best solution, but is
designed to find a good solution in a reasonable time.
Informed search can solve complex problems that could not otherwise be
solved.
The informed search algorithm is more useful for large search spaces. Informed search
algorithms use the idea of heuristics, so informed search is also called heuristic search.
So information about the cost to the goal is taken into account
Ex: Best first search , A* search
Heuristic function: A heuristic is a function used in informed search that finds the
most promising path.
It takes the current state of the agent as its input and produces an estimate of how close
the agent is to the goal.
The heuristic method, however, might not always give the best solution, but it is
designed to find a good solution in reasonable time.
Heuristic function estimates how close a state is to the goal. It is represented by h(n), and
it estimates the cost of an optimal path from the given state to the goal state.
The value of the heuristic function is always non-negative.
Admissibility of the heuristic function is given as:
h(n) <= h*(n)
Here h(n) is the heuristic cost and h*(n) is the actual (optimal) cost to reach the goal.
Hence the heuristic cost should be less than or equal to the actual cost; an admissible
heuristic never overestimates.
Example:
Let us see how this works for route-finding problems in Romania; we use the straight
line distance heuristic, which we will call hSLD.
If the goal is Bucharest, we need to know the straight-line distances to Bucharest, which
are shown in below figure.
For example, hSLD(In(Arad))=366. Notice that the values of hSLD cannot be computed
from the problem description itself.
Moreover, it takes a certain amount of experience to know that hSLD is correlated with
actual road distances and is, therefore, a useful heuristic.
The above shows the progress of a greedy best-first search using hSLD to find a path from
Arad to Bucharest.
The first node to be expanded from Arad will be Sibiu because it is closer to Bucharest
than either Zerind or Timisoara.
The next node to be expanded will be Fagaras because it is closest. Fagaras in turn
generates Bucharest, which is the goal.
For this particular problem, greedy best-first search using hSLD finds a solution without
ever expanding a node that is not on the solution path; hence, its search cost is minimal. It is not
optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer than the
path through Rimnicu Vilcea and Pitesti.
This shows why the algorithm is called “greedy”—at each step it tries to get as close to the goal
as it can.
Algorithm A*:
OPEN = nodes on frontier; CLOSED = expanded nodes
OPEN = {<s,nil>}
while OPEN is not empty
remove from OPEN the node <n,p> with minimum f(n)
place <n,p> on CLOSED
if n is a goal node, return success (path p)
for each edge e connecting n & m with cost c
if <m,q> is on CLOSED and {p|e} is cheaper than q
then remove m from CLOSED, put <m,{p|e}> on OPEN
else if <m,q> is on OPEN AND {p|e} is cheaper than q
then replace q with {p|e}
else if m is not on OPEN put <m,{p|e}> on OPEN
return failure
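The pseudocode above can be sketched in Python with a priority queue ordered by f(n) = g(n) + h(n). The toy graph and heuristic values below are illustrative assumptions, not from the text:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search. neighbors(n) yields (m, cost) edges; h(n) is the
    heuristic estimate. Returns a cheapest path start..goal, or None."""
    open_heap = [(h(start), 0, start, [start])]  # entries: (f, g, node, path)
    closed = {}                                   # best g seen per expanded node
    while open_heap:
        f, g, n, path = heapq.heappop(open_heap)
        if n == goal:
            return path
        if n in closed and closed[n] <= g:
            continue                              # stale entry with a worse path
        closed[n] = g
        for m, c in neighbors(n):
            heapq.heappush(open_heap, (g + c + h(m), g + c, m, path + [m]))
    return None

# Toy graph; the heuristic values are made up but admissible.
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
heur = {"S": 3, "A": 4, "B": 1, "G": 0}
print(a_star("S", "G", lambda n: graph[n], lambda n: heur[n]))  # ['S', 'B', 'G']
```

Because the heuristic is admissible, the path returned (cost 5 via B) is optimal even though A looks tempting at first.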
Example 1:
Solution:
A*: Difficulties
It often becomes difficult to use A* because the OPEN queue grows very large.
A solution is to use algorithms that work with less memory
Memory-bounded heuristic search algorithms such as IDA* reduce the memory
requirements of A*.
IDA* is an iterative deepening algorithm.
The cut-off for nodes expanded in an iteration is decided by the f-value of the nodes.
IDA* Algorithm
Algorithm:
Set certain threshold/f-bound
If f(node) > threshold/f-bound, prune the node
Set threshold/f-bound = minimum cost of any node that is pruned
Terminates when goal is reached.
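The threshold-update loop above can be sketched as follows; the graph and heuristic are the same kind of illustrative toy example used for A*:

```python
def ida_star(start, goal, neighbors, h):
    """Iterative-deepening A*: depth-first search bounded by f = g + h,
    raising the bound to the smallest pruned f-value each iteration."""
    def dfs(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f, None          # pruned: report this node's f-value
        if node == goal:
            return f, path
        minimum = float("inf")      # smallest f among pruned descendants
        for m, c in neighbors(node):
            if m in path:           # avoid cycles on the current path
                continue
            t, found = dfs(m, g + c, bound, path + [m])
            if found is not None:
                return t, found
            minimum = min(minimum, t)
        return minimum, None

    bound = h(start)
    while True:
        bound, found = dfs(start, 0, bound, [start])
        if found is not None:
            return found
        if bound == float("inf"):
            return None             # no solution exists

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
heur = {"S": 3, "A": 4, "B": 1, "G": 0}
print(ida_star("S", "G", lambda n: graph[n], lambda n: heur[n]))  # ['S', 'B', 'G']
```

Only the current path is kept in memory, which is exactly the saving over A*'s large OPEN queue.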
Game Setup:
Two players: MAX and MIN
MAX moves first and they take turns until the game is over
o Winner gets award, loser gets penalty.
Games as search:
o Initial state: e.g. board configuration of chess
o Successor function: list of (move , state) pairs specifying legal moves.
o Terminal test: Is the game finished?
o Utility function: Gives numerical value of terminal states.
E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe or chess
Draw back:
The main drawback of the minimax algorithm is that it gets really slow for complex
games such as Chess, go, etc.
These games have a huge branching factor, giving the player many choices to
consider.
This limitation of the minimax algorithm can be overcome by alpha-beta pruning
α-β Pruning
Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization
technique for the minimax algorithm.
As we have seen with the minimax search algorithm, the number of game states it has to
examine is exponential in the depth of the tree.
We cannot eliminate the exponent, but we can roughly halve it. Hence there is a
technique by which we can compute the correct minimax decision without checking
every node of the game tree, and this technique is called pruning.
It involves two threshold parameters, alpha and beta, for future expansion, so it is
called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes
not only leaves but entire subtrees.
Alpha: The best (highest-value) choice we have found so far at any point along the path
of Maximizer. The initial value of alpha is -∞.
Beta: The best (lowest-value) choice we have found so far at any point along the path of
Minimizer. The initial value of beta is +∞.
Alpha-beta pruning returns the same move as the standard minimax algorithm, but it
removes all the nodes that do not affect the final decision and only slow the
algorithm down. Pruning these nodes therefore makes the algorithm fast.
Using α-β pruning improves search by reducing the size of the game tree.
Example 1:
Rules of Thumb
α is the best ( highest) found so far along the path for Max
β is the best (lowest) found so far along the path for Min
Search below a MIN node may be alpha-pruned if its β ≤ α of some MAX ancestor
Search below a MAX node may be beta-pruned if its α ≥ β of some MIN ancestor.
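A minimal sketch of minimax with alpha-beta pruning follows; the game tree here is a made-up example in which leaves are terminal utility values and inner lists are positions:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a nested-list game tree:
    an int is a terminal utility; a list holds the child positions."""
    if isinstance(node, int):       # terminal test
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:       # beta cut-off: a MIN ancestor blocks this
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:       # alpha cut-off: a MAX ancestor blocks this
                break
        return value

# MAX chooses among three MIN nodes (a classic three-branch example).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3
```

In the second MIN branch, the first leaf (2) already makes β ≤ α, so leaves 4 and 6 are never examined, illustrating the cut-off rules above.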
Example 2:
The α-β algorithm
Knowledge Bases:
A Wumpus World
A Wumpus World PEAS description
Performance measure:
gold +1000, death -1000
-1 per step, -10 for using the arrow
Principal difficulty: the agent is initially ignorant of the configuration of the environment; it is
going to have to reason to figure out where the gold is without getting killed!
Wumpus goal:
The agent’s goal is to find the gold and bring it back to the start square as quickly as possible,
without getting killed
1000 points reward for climbing out of the cave with the gold
1 point deducted for every action taken
1000 points penalty for getting killed
Note that in each case for which the agent draws a conclusion from the available information,
that conclusion is guaranteed to be correct if the available information is correct. This is a
fundamental property of logical reasoning.
The Wumpus Agent First Step
Later
Let’s Play
Wumpus World Characterization:
Fully Observable: No – only local perception
Deterministic: Yes – outcomes exactly specified
Episodic: No – sequential at the level of actions
Static : Yes – Wumpus and Pits do not move
Discrete: Yes
Single-agent: Yes – Wumpus is essentially a natural feature
Propositional Logic:
Propositional logic (PL) is the simplest form of logic where all the statements are made by
propositions. A proposition is a declarative statement which is either true or false. It is a
technique of knowledge representation in logical and mathematical form. Propositions can be
either true or false, but it cannot be both. Propositional logic is also called Boolean logic as it
works on 0 and 1.
Examples:
Today is Tuesday.
The Sun rises from West (False proposition)
2+2= 5(False proposition)
2+3= 5
Examples of Connectives:
(P ∧ Q) Arsalan likes football and Arsalan likes baseball.
(P ∨ Q) Arsalan is a doctor or Arsalan is an engineer.
(P ⇒ Q) If it is raining, then the street is wet.
(P ⇔ Q) I am breathing if and only if I am alive
Precedence of Connectives:
To eliminate the ambiguity we define a precedence for each operator. The “not” operator (¬)
has the highest precedence, followed by ∧ (conjunction), ∨(disjunction),⇒(implication),⇔
(biconditional).
Example: In ¬A ∧ B, the ¬ binds most tightly, giving us the equivalent of (¬A)∧B rather than
¬(A∧B)
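Python's boolean operators happen to follow the same ordering (`not` binds tighter than `and`, which binds tighter than `or`), so the precedence point can be checked directly:

```python
A, B = True, False
# "not A and B" parses as "(not A) and B", mirroring (¬A) ∧ B.
print(not A and B)    # False: (not True) and False
print(not (A and B))  # True:  not (True and False)
```

With these truth values the two readings disagree, which is exactly why the precedence convention matters.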
Syntax of FOL:
Constants KingJohn, 2,...
Predicates Brother, >,...
Functions Sqrt, ...
Variables x, y, a, b,...
Connectives ¬, ∧, ∨, ⇒, ⇔
Equality =
Quantifiers ∀, ∃
Logics in general:
Ontological Commitment:
What exists in the world — TRUTH
PL : facts hold or do not hold.
FOL : objects with relations between them that hold or do not hold
Constant Symbols:
▪ Stand for objects
▪ e.g., KingJohn, 2,...
Predicate Symbols
▪ Stand for relations
▪ e.g., Brother(Richard, John), greater_than(3,2)...
Function Symbols
▪ Stand for functions
▪ e.g., Sqrt(3) ...
Relations:
Some relations are properties: they state
some fact about a single object:
Round(ball), Prime(7).
n-ary relations state facts about two or more objects:
Married(John,Mary), LargerThan(3,2).
Some relations are functions: their value is another object: Plus(2,3), Father(Dan).
Terms:
Term = logical expression that refers to an object.
There are 2 kinds of terms:
o constant symbols: Table, Computer
o function symbols: Sqrt(3), Plus(2,3) etc
Functions can be nested:
o Pat_Grandfather(x) = father(father(x))
Terms can contain variables.
No variables = ground term.
Atomic Sentence:
Atomic sentences are the most basic sentences of first-order logic. These sentences are formed
from a predicate symbol followed by a parenthesis with a sequence of terms.
Atomic sentence = predicate (term1,...,termn)
Atomic sentences state facts using terms and predicate symbols: P(x,y) is interpreted as “x is P of y”
Examples: LargerThan(2,3) is false.
Married(Father(Richard), Mother(John)) could be true or false
Complex Sentence:
Complex sentences are made from atomic sentences using connectives
¬S, S1 ∧ S2, S1 ∨ S2, S1 ⇒ S2, S1 ⇔ S2
Variables:
Person(John) is true or false because we give it a single argument ‘John’
We can be much more flexible if we allow variables which can take on values in a
domain. e.g., all persons x, all integers i, etc.
o E.g., can state rules like Person(x) => HasHead(x)
o or Integer(i) => Integer(plus(i,1))
Example:
All men drink coffee.
∀x man(x) → drink(x, coffee).
It is read as: for all x, if x is a man then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something. It is denoted by the logical operator ∃,
which resembles an inverted E. When it is used with a predicate variable, it is called an
existential quantifier.
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
There exists an 'x.'
For some 'x.'
For at least one 'x.'
Example:
Some boys are intelligent.
∃x: boys(x) ∧ intelligent(x)
It is read as: there is some x such that x is a boy and x is intelligent.
Note:
The main connective for the universal quantifier ∀ is implication →.
The main connective for the existential quantifier ∃ is conjunction ∧.
Properties of Quantifiers:
For the universal quantifier, ∀x∀y is equivalent to ∀y∀x.
For the existential quantifier, ∃x∃y is equivalent to ∃y∃x.
∃x∀y is not equivalent to ∀y∃x
Some Examples of FOL using quantifier:
1. All birds fly.
In this question the predicate is "fly(bird)." Since all birds fly, it is represented as
follows: ∀x bird(x) → fly(x)
2. Every man respects his parent.
In this question, the predicate is "respects(x, y)," where x = man and y = parent. Since this
holds for every man, we use ∀, and it is represented as follows: ∀x man(x) → respects(x, parent).
Forward chaining: Forward chaining is the process of matching a set of conditions and
inferring results from them. It is a data-driven approach that works from the data toward
the goal condition.
Example: Given A (is true)
B->C
A->B
C->D
Prove D is also true.
Solution: Starting from A, A is true then B is true (A->B)
B is true then C is true (B->C)
C is true then D is true Proved (C->D)
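The derivation above can be sketched as a tiny forward chainer; the rules and the starting fact match the example (A is given, with A->B, B->C, C->D):

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules (premise, conclusion) whose premise is an
    already-known fact, until no new facts can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)   # data-driven: derive forward
                changed = True
    return facts

rules = [("B", "C"), ("A", "B"), ("C", "D")]
print(sorted(forward_chain({"A"}, rules)))  # ['A', 'B', 'C', 'D']
```

D appears in the derived set, which is the forward-chaining proof of the example.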
Backward chaining: Backward chaining is the process of searching backward from the
goal to the conditions used to reach it. It is a goal-driven approach that works back to
the initial condition.
Example: Given A (is true)
B->C
A->B
C->D
Prove D is also true.
Solution: Starting from D,
to prove D, C must be true (C->D)
to prove C, B must be true (B->C)
to prove B, A must be true, and A is given. Proved (A->B)
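The same example can be run goal-first; the recursive query below mirrors the backward search described above:

```python
def backward_chain(goal, facts, rules, seen=frozenset()):
    """True if goal is a known fact, or some rule concludes goal and its
    premise can itself be proved. `seen` guards against rule cycles."""
    if goal in facts:
        return True
    if goal in seen:
        return False
    return any(backward_chain(premise, facts, rules, seen | {goal})
               for premise, conclusion in rules if conclusion == goal)

rules = [("B", "C"), ("A", "B"), ("C", "D")]
print(backward_chain("D", {"A"}, rules))  # True
```

The query for D asks for C, which asks for B, which asks for A; A is a given fact, so the chain succeeds, exactly the solution traced above.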
2.1.5 Resolution in FOL:
Resolution is a theorem proving technique that proceeds by building refutation proofs, i.e.,
proofs by contradiction. It was invented by the mathematician John Alan Robinson in
1965. Resolution is used when various statements are given and we need to prove a
conclusion from those statements. Unification is a key concept in proofs by resolution.
Resolution is a single inference rule which can efficiently operate on statements in
conjunctive normal form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause containing
a single literal is known as a unit clause.
Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said
to be conjunctive normal form or CNF.
Steps in Resolution:
Conversion of facts into first-order logic.
Negate the statement which needs to prove (proof by contradiction)
Convert FOL statements into CNF
Draw resolution graph (unification).
To better understand all the above steps, we will take an example in which we will apply
resolution.
Example:
Facts:
All people who are graduating are happy
All happy people smile
Someone is graduating
Prove that “someone is smiling” using resolution
Eliminate Implication
α⇒β≡¬α∨β
α ⇔ β ≡ (α ⇒ β) ∧ (β ⇒ α )
Example:
Eliminate Implication: α⇒β≡¬α∨β
∀x(¬ Graduating(x) ∨ happy(x))
∀x(¬happy(x) ∨ smile(x))
∃x graduating(x)
¬∃x smile(x)
Standardize variable
∀x regular(x) becomes ∀x regular(x)
∃x busy(x) becomes ∃y busy(y)
∃x attentive(x) becomes ∃z attentive(z)
Example:
∀x(¬ Graduating(x) ∨ happy(x))
∀y(¬happy(y) ∨ smile(y))
∃z graduating(z)
¬ ∃w smile(w)
Example:
∀x(¬ Graduating(x) ∨ happy(x))
∀y(¬happy(y) ∨ smile(y))
∃z graduating(z)
∀w ¬ smile(w)
Example:
∀x(¬ Graduating(x) ∨ happy(x))
∀y(¬happy(y) ∨ smile(y))
4. Resolution Graph
If fact F is to be proved, then start with ¬F
Resolve it against the other clauses in the KB
The process stops when the null (empty) clause is derived
Example:
¬ Graduating(x) ∨ happy(x)
¬happy(y) ∨ smile(y)
graduating(A)
¬ smile(w)
Resolution Graph:
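The refutation these four ground clauses produce can be mechanized. This is a minimal sketch for the propositional (ground) case: clauses are frozensets of literal strings, with negation marked by a leading '~', and we saturate until the empty clause appears:

```python
def resolve(c1, c2):
    """All resolvents of two ground clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:  # complementary pair found: cancel it
            out.append((c1 - {lit}) | (c2 - {comp}))
    return out

# Ground instances of the example clauses (x, y, w unified with A).
clauses = [frozenset({"~graduating(A)", "happy(A)"}),
           frozenset({"~happy(A)", "smile(A)"}),
           frozenset({"graduating(A)"}),
           frozenset({"~smile(A)"})]

# Saturate: keep resolving until no new clause can be added.
kb = set(clauses)
derived = True
while derived:
    derived = False
    for c1 in list(kb):
        for c2 in list(kb):
            for r in resolve(c1, c2):
                if r not in kb:
                    kb.add(r)
                    derived = True
print(frozenset() in kb)  # True: empty clause derived, so someone smiles
```

Deriving the empty clause means ¬smile(A) contradicts the knowledge base, which is exactly the refutation the resolution graph depicts.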
2.2.1 Semantic Network:
Semantic nets were originally proposed in the early 1960s by M. Ross Quillian to represent the
meaning of English words. Semantic networks are an alternative to predicate logic for knowledge
representation. In semantic networks, we can represent our knowledge in the form of graphical
networks. This network consists of nodes representing objects and arcs which describe the
relationship between those objects. Semantic networks can categorize the object in different
forms and can also link those objects. Semantic networks are easy to understand and can be
easily extended.
Representation:
Semantic network representation consists of mainly two types of relations:
1. IS-A relation (Inheritance)
2. Kind-of-relation
Example 1: Following are some statements which we need to represent in the form of nodes and
arcs.
Example 2:
Statements
Jerry is a cat.
Jerry is a mammal
Jerry is owned by Priya.
Jerry is white colored.
All mammals are animals.
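The five statements can be stored as (node, relation, node) arcs; a flat list of triples is one minimal way to hold such a network, with inheritance recovered by following IS-A links:

```python
# Each arc is (subject, relation, object), mirroring the statements above.
arcs = [("Jerry", "is-a", "Cat"),
        ("Jerry", "is-a", "Mammal"),
        ("Jerry", "owned-by", "Priya"),
        ("Jerry", "has-color", "White"),
        ("Mammal", "is-a", "Animal")]

def related(node, relation):
    """Objects linked to `node` by `relation`."""
    return [o for s, r, o in arcs if s == node and r == relation]

def is_a(node, category):
    """Follow is-a links transitively (inheritance)."""
    targets = related(node, "is-a")
    return category in targets or any(is_a(t, category) for t in targets)

print(is_a("Jerry", "Animal"))  # True, via Jerry -> Mammal -> Animal
```

The transitive IS-A lookup is what lets Jerry inherit the property of being an animal without that fact being stored explicitly.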
Hendrix's partitioned semantic network now comprises two partitions, SA and S1. Node G is an
instance of the special class of general statements about the world, comprising a statement
link, a form, and one universal quantifier
The partitioning of a semantic network renders them more logically adequate, in that one can
distinguish between individuals and sets of individuals and indirectly more heuristically adequate
by way of controlling the search space by delineating semantic networks.
Hendrix's partitioned semantic-network formalism has been used in building natural
language front-ends for databases and in programs that deduce information from databases.
2.2.2 Frames:
A frame is a record like structure which consists of a collection of attributes and its values to
describe an entity in the world. A frame is analogous to a record structure, corresponding to the
fields and values of a record are the slots and slot fillers of a frame
Frames are the AI data structure which divides knowledge into substructures by representing
stereotypes situations. It consists of a collection of slots and slot values. These slots may be of
any type and sizes. Slots have names and values which are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features of frames which
enable us to put constraints on the frames.
Example: IF-NEEDED facets are invoked when the data of a particular slot is needed.
A frame may consist of any number of slots, a slot may include any number of facets, and a
facet may have any number of values. A frame is also known as slot-filler knowledge
representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and
objects. A single frame is not very useful on its own; a frame system consists of a collection
of connected frames. In a frame, knowledge about an object or event can be stored together in
the knowledge base. The frame is a technology widely used in various applications, including
natural language processing and machine vision.
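A frame can be sketched as nested dictionaries, slot → facet → value. The slot names, the example ages, and the VALUE/IF-NEEDED facet names used below are illustrative assumptions:

```python
# Slot -> facet -> value; an IF-NEEDED facet holds a procedure that is
# only run when the slot's value is actually requested.
jerry_frame = {
    "name":  {"VALUE": "Jerry"},
    "class": {"VALUE": "Cat"},
    "color": {"VALUE": "White"},
    "age":   {"IF-NEEDED": lambda: 2024 - 2021},  # computed on demand
}

def get_slot(frame, slot):
    """Return a slot's value, running its IF-NEEDED facet if required."""
    facets = frame[slot]
    if "VALUE" in facets:
        return facets["VALUE"]
    if "IF-NEEDED" in facets:
        return facets["IF-NEEDED"]()
    return None

print(get_slot(jerry_frame, "color"))  # White
print(get_slot(jerry_frame, "age"))    # 3
```

Stored values and demand-computed values are accessed uniformly, which is the point of attaching procedures to facets.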
Predicates not included on either of these lists are assumed to be unaffected by the operation.
Frame axioms are specified implicitly in STRIPS which greatly reduces amount of information
stored.
Rules:
• R1 : pickup(x)
Precondition & Deletion List: hand empty, on(x,table), clear(x)
Add List: holding(x)
• R2 : putdown(x)
Precondition & Deletion List: holding(x)
Add List: hand empty, on(x,table), clear(x)
• R3 : stack(x,y)
Precondition & Deletion List: holding(x), clear(y)
Add List: on(x,y), clear(x), hand empty
• R4 : unstack(x,y)
Precondition & Deletion List: on(x,y), clear(x), hand empty
Add List: holding(x), clear(y)
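Applying one of these STRIPS operators means checking its preconditions, removing the deletion list, and adding the add list; everything else is untouched (the implicit frame axiom noted above). A minimal sketch, with the state as a set of predicate strings and only R1 written out:

```python
# Operator R1 as (preconditions == deletion list, add list), per the rules above.
def pickup(x):
    return ({"hand empty", f"on({x},table)", f"clear({x})"},
            {f"holding({x})"})

def apply_op(state, op):
    """STRIPS application: if the preconditions hold, delete them and add
    the add list; predicates not mentioned are assumed unaffected."""
    pre_and_del, add = op
    if not pre_and_del <= state:
        return None                       # preconditions not satisfied
    return (state - pre_and_del) | add

state = {"hand empty", "on(A,table)", "clear(A)", "on(B,table)", "clear(B)"}
new_state = apply_op(state, pickup("A"))
print(sorted(new_state))  # ['clear(B)', 'holding(A)', 'on(B,table)']
```

After picking up A the hand is no longer empty, so a second pickup immediately fails its precondition check, which is the intended behavior of R1.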
The space of partial-order plans (POPs) is smaller than that of total-order plans (TOPs) and
hence involves less search, because one partial-order plan here corresponds to six TOP
linearizations.
Causal link A → B is added, along with the ordering constraints A < B, Start < A, and
A < Finish
Resolve Conflict : add B<C or C<A
Goal Test:
There are no open preconditions
What is Knowledge?
➢ Data is a collection of facts. Information is data organized
as facts about the task domain. Data,
information, and past experience combined together are
termed knowledge.
Components of Expert Systems
Components of Knowledge Base:
The knowledge base of an ES is a store of both factual and
heuristic knowledge.
➢ Factual Knowledge − It is the information widely accepted
by the Knowledge Engineers and scholars in the task
domain.
➢ Heuristic Knowledge − It is about practice, accurate
judgement, one's ability to evaluate, and guessing.
Inference Engine (Rules of Engine)
➢ The inference engine is known as the brain of the expert
system as it is the main processing unit of the system. It
applies inference rules to the knowledge base to derive a
conclusion or deduce new information. It helps in deriving
an error-free solution of queries asked by the user.
➢ With the help of an inference engine, the system extracts
the knowledge from the knowledge base.
The knowledge engineer and the domain expert usually work very
closely together for long periods of time throughout the several
stages of the development process.
1. Identification Phase
➢ To begin, the knowledge engineer, who may be unfamiliar with
this particular domain, consults manuals and training guides to
gain some familiarity with the subject. Then the domain expert
describes several typical problem states.
➢ The knowledge engineer attempts to extract fundamental concepts
from the similar cases in order to develop a more general idea of
the purpose of the expert system.
➢ After the domain expert describes several cases, the knowledge
engineer develops a ‘first-pass’ problem description.
➢ Typically, the domain expert may feel that the description does
not entirely represent the problem.
➢ The domain expert then suggests changes to the description and
provides the knowledge engineer with additional examples to
illustrate further the problem’s fine points.
Next, the knowledge engineer revises the description, and the domain
expert suggests further changes. This process is repeated until the
domain expert is satisfied that the knowledge engineer understands
the problems and until both are satisfied that the description
adequately portrays the problem which the expert system is expected
to solve.
2.Conceptualisation Phase
In the conceptualisation stage, the knowledge engineer
frequently creates a diagram of the problem to depict
graphically the relationships between the objects and processes
in the problem domain.
It is often helpful at this stage to divide the problem into a series
of sub-problems and to diagram both the relationships among
the pieces of each sub-problem and the relationships among the
various sub-problems.
As in the identification stage, the conceptualisation stage
involves a circular procedure of iteration and reiteration
between the knowledge engineer and the domain expert. When
both agree that the key concepts, and the relationships among
them, have been adequately conceptualised, this stage is
complete.
Examples:
➢In a drawer of ten socks where 8 of them are yellow, there is a
20% chance of choosing a sock that is not yellow.
➢There are 9 red candies in a bag and 1 blue candy in the same
bag. The chance of picking a blue candy is 10%.
P(¬A) = the probability of event A not happening.
P(¬A) + P(A) = 1.
➢ Event: Each possible outcome of a variable is called an event.
➢ Sample space: The collection of all possible events is called
sample space.
➢ Random variables: Random variables are used to represent
the events and objects in the real world.
➢ Prior probability: The prior probability of an event is
probability computed before observing new information.
➢ Posterior Probability: The probability that is calculated after
all evidence or information has been taken into account. It is a
combination of prior probability and new information.
Conditional Probability
➢ Conditional probability is the probability of an event
occurring given that another event has already happened:
P(A|B) = P(A ∧ B) / P(B).
➢ For example, if 40% of students like both C and Java and 70%
like C, then P(Java | C) = 0.40 / 0.70 ≈ 0.57.
Hence, 57% of the students who like C also like Java.
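The conditional-probability ratio can be checked with a few lines of Python; the 40%/70% figures are hypothetical numbers chosen to reproduce the 57% result above.

```python
def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# Hypothetical class: 70% like C, 40% like both C and Java.
p_java_given_c = conditional(0.40, 0.70)
print(round(p_java_given_c, 2))  # 0.57
```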
Prior Probability
Prior Probability- Degree of belief in an event, in the
absence of any other information
Example:
➢ P(rain tomorrow) = 0.7
➢ P(no rain tomorrow) = 0.3
Conditional Probability
The probability of an event, given knowledge of
another event.
Example:
➢ P(raining | sunny)
➢ P(raining | cloudy)
➢ P(raining | cloudy, cold)
Conditional Probability…
In some cases, given knowledge of one or more random
variables, we can improve on our prior belief about another
random variable.
For example:
➢ P(slept in stadium) = 0.5
➢ P(slept in stadium | liked match) = 0.33
➢ P(didn't sleep in stadium | liked match) = 0.67
Bayes Theorem
➢ Bayes' theorem is also known as Bayes' rule, Bayes' law,
or Bayesian reasoning, which determines the probability
of an event with uncertain knowledge.
➢ In probability theory, it relates the conditional probability
and marginal probabilities of two random events.
➢ Bayes' theorem was named after the British
mathematician Thomas Bayes.
➢ The Bayesian inference is an application of Bayes'
theorem, which is fundamental to Bayesian statistics.
➢ It is a way to calculate the value of P(B|A) with the
knowledge of P(A|B).
Bayes Theorem …
➢ Bayes' theorem allows updating the probability prediction
of an event by observing new information of the real
world.
➢ Example: If cancer corresponds to one's age then by using
Bayes' theorem, we can determine the probability of cancer
more accurately with the help of age.
➢ Bayes' theorem can be derived using product rule and
conditional probability of event A with known event B:
As from product rule we can write:
P(A ∧ B) = P(A|B)P(B) and
Similarly, the probability of event B with known event A:
P(A ∧ B) = P(B|A)P(A)
Bayes Theorem …
Equating the right-hand sides of both equations, we get Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
Worked example (using the total-probability form for the denominator):
P(A|B) = (0.00025)(0.4) / [(0.00025)(0.4) + (0.0002)(0.6)]
= 0.4545
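The worked example above can be sketched directly in Python; the function applies Bayes' theorem with the two-hypothesis total-probability denominator, and the numeric values are the ones from the example.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|not A)P(not A))."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * p_not_a)

# Reproduces the worked example above:
print(round(bayes(0.00025, 0.4, 0.0002), 4))  # 0.4545
```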
Bayesian Networks
➢ A Bayesian belief network is a key computer technology for
dealing with probabilistic events and for solving problems
which involve uncertainty. We can define a Bayesian network
as:
➢ "A Bayesian network is a probabilistic graphical model
which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
➢ It is also called a Bayes network, belief network, decision
network, or Bayesian model.
Bayesian Networks
➢ Bayesian networks are probabilistic, because these networks
are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.
➢ Real world applications are probabilistic in nature, and to
represent the relationship between multiple events, we need a
Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction,
and decision making under uncertainty.
➢ A Bayesian network can be used for building models from data
and experts' opinions, and it consists of two parts:
Directed Acyclic Graph
Table of conditional probabilities.
A Bayesian network graph (Directed Acyclic Graph) is made up
of nodes and Arcs (directed links), where:
➢ Each node corresponds to the random
variables, and a variable can
be continuous or discrete.
➢ Arc or directed arrows represent the causal
relationship or conditional probabilities between
random variables.
➢ These directed links or arrows connect pairs of
nodes in the graph.
➢ These links represent that one node directly
influences the other node; if there is no
directed link, the nodes are independent of
each other.
➢In the above diagram X1, X2,X3 and X4 are random variables represented
by the nodes of the network graph.
➢If we are considering node X3, which is connected with node X1 by a
directed arrow, then node X1 is called the parent of Node X3.
➢Node X4 is independent of node X1.
Conditional Probability Tables- CPTs
➢ The conditional probability tables in the network give the
probabilities for the value of the random variables
depending on the combination of values for the parent
nodes.
➢ Each row must sum to 1.
➢ If all variables are Boolean, then when the probability of
a true value is p, the probability of the false value must be 1 − p.
➢ A table for a Boolean variable with k parents contains 2^k
independently specifiable probabilities.
➢ A variable with no parents has only one row, representing
the prior probabilities of each possible value of the
variable.
Joint Probability Distribution
➢Bayesian network is based on Joint probability distribution and
conditional probability. So let's first understand the joint
probability distribution:
The conditional probability tables for J and M in this example network:
P(j|a) = 0.90 (P(¬j|a) = 0.10)    P(m|a) = 0.70 (P(¬m|a) = 0.30)
P(j|¬a) = 0.05                    P(m|¬a) = 0.01
Their product gives the factor f1(A) = P(j|A) P(m|A):
f1(a) = 0.90 × 0.70 = 0.63
f1(¬a) = 0.05 × 0.01 = 0.0005
Inference by Variable Elimination…
α P(B) Σ_E P(E) Σ_A P(A|E,B) f1(A)
where f1(A) = P(j|A) P(m|A):
f1(a) = 0.90 × 0.70 = 0.63
f1(¬a) = 0.05 × 0.01 = 0.0005
Inference by Variable Elimination…
α P(B) Σ_E P(E) Σ_A P(A|E,B) f1(A)
P(A|E,B):
P(a|e,b) = 0.95 (P(¬a|e,b) = 0.05)
P(a|e,¬b) = 0.29 (P(¬a|e,¬b) = 0.71)
P(a|¬e,b) = 0.94 (P(¬a|¬e,b) = 0.06)
P(a|¬e,¬b) = 0.001 (P(¬a|¬e,¬b) = 0.999)
Summing out A, e.g. Σ_A P(A|e,b) f1(A) = 0.95 × 0.63 + 0.05 × 0.0005 ≈ 0.60,
gives the factor f2(E,B):
f2(e,b) = 0.60
f2(e,¬b) = 0.18
f2(¬e,b) = 0.59
f2(¬e,¬b) = 0.001
Inference by Variable Elimination…
α P(B) Σ_E P(E) f2(E,B)
With the priors P(e) = 0.002, P(¬e) = 0.998 and P(b) = 0.001, P(¬b) = 0.999,
summing out E and multiplying by P(B) gives f3(B):
f3(b) = 0.0006
f3(¬b) = 0.0013
Inference by Variable Elimination…
α f3(B) → P(B | j, m)
Normalising f3(B) (dividing by 0.0006 + 0.0013 = 0.0019):
P(b | j, m) ≈ 0.32
P(¬b | j, m) ≈ 0.68
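The whole elimination can be replayed in Python using the CPT values quoted above. Note that exact arithmetic gives a posterior of about 0.284 / 0.716; the 0.32 / 0.68 figures come from rounding the intermediate factors f2 and f3 to two significant figures before normalising.

```python
# CPTs for the example network used above
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(a | b, e)
       (False, True): 0.29, (False, False): 0.001}  # keyed (b, e)
P_J = {True: 0.90, False: 0.05}   # P(j | A)
P_M = {True: 0.70, False: 0.01}   # P(m | A)

# f1(A) = P(j|A) P(m|A)
f1 = {a: P_J[a] * P_M[a] for a in (True, False)}

# f2(E,B) = sum_A P(A|E,B) f1(A)
f2 = {}
for e in (True, False):
    for b in (True, False):
        p_a = P_A[(b, e)]
        f2[(e, b)] = p_a * f1[True] + (1 - p_a) * f1[False]

# f3(B) = P(B) * sum_E P(E) f2(E,B)
f3 = {b: P_B[b] * sum(P_E[e] * f2[(e, b)] for e in (True, False))
      for b in (True, False)}

# Normalise to obtain P(B | j, m)
total = f3[True] + f3[False]
posterior = {b: f3[b] / total for b in f3}
print(round(posterior[True], 3), round(posterior[False], 3))  # 0.284 0.716
```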
Fuzzy logic works with degrees of truth rather than a crisp true/false; for
example, the possible answers to a yes/no question may be:
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
Implementation
Fuzzy logic can be implemented in systems of various sizes and
capabilities, ranging from small micro-controllers to large,
networked, workstation-based control systems.
It can be implemented in hardware, software, or a combination
of both.
Slow: Speed = 0    Fast: Speed = 1
bool speed;
// get the speed
if (speed == 0) {
    // speed is slow
}
else {
    // speed is fast
}
FUZZY LOGIC REPRESENTATION
• Every problem must be represented in terms of fuzzy sets.
• What are fuzzy sets? For example, speed can be divided into:
Slowest [ 0.0 – 0.25 ]
Slow [ 0.25 – 0.50 ]
Fast [ 0.50 – 0.75 ]
Fastest [ 0.75 – 1.00 ]
FUZZY LOGIC REPRESENTATION
CONT.
Example:
Let's suppose A is a fuzzy set which contains the following elements:
A = {( X1, 0.3 ), (X2, 0.7), (X3, 0.5), (X4, 0.1)}
The complement Ā has the membership function
μĀ(x) = 1 − μA(x),
so
Ā = {( X1, 0.7 ), (X2, 0.3), (X3, 0.5), (X4, 0.9)}
Inference Engine:
It determines the degree of match between the fuzzy
input and the rules. Based on the percentage of match, it
determines which rules need to be applied to the given input
field. After this, the applied rules are combined to develop the
control actions.
Fuzzy Logic Systems Architecture
Defuzzification:
Finally, the defuzzification process is performed to convert the
fuzzy sets into a crisp value. Many techniques are
available, so you need to select the one best suited for use
with the expert system.
Fuzzy logic algorithm
1) Initialization process:
▪ Define the linguistic variables.
▪ Construct the fuzzy logic membership functions that
define the meaning or values of the input and output
terms used in the rules.
▪ Construct the rule base (Break down the control problem
into a series of IF X AND Y, THEN Z rules based on the
fuzzy logic rules).
2)Convert crisp input data to fuzzy values using the
membership functions (fuzzification).
3) Evaluate the rules in the rule base (inference).
4) Combine the results of each rule (inference).
5)Convert the output data to non-fuzzy values
(defuzzification).
Example: Air conditioner system
controlled by a FLS
Example: Air conditioner system
controlled by a FLS
The system adjusts the temperature of the room according
to the current temperature of the room and the target value.
The fuzzy engine periodically compares the room
temperature and the target temperature, and produces a
command to heat or cool the room.
Room Temp. / Target | Too Cold | Cold | Warm | Hot | Too Hot
Too Cold | No_Change | Heat | Heat | Heat | Heat
Triangular membership function:
F(x; a, b, c) =
  0,                 if x < a
  (x − a) / (b − a), if a ≤ x ≤ b
  (c − x) / (c − b), if b ≤ x ≤ c
  0,                 if c < x
[Figure: Triangular membership function — the membership value rises from 0 at a to 1 at b, then falls back to 0 at c.]
Trapezoidal membership function:
F(x; a, b, c, d) =
  0,                 if x < a
  (x − a) / (b − a), if a ≤ x ≤ b
  1,                 if b < x < c
  (d − x) / (d − c), if c ≤ x ≤ d
  0,                 if d < x
Gaussian membership function:
F(x; a, b) = e^(−(x − b)² / (2a²))
[Figure: Gaussian membership function, centred at b with width a.]
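The three membership functions above translate directly into Python; the sample parameter values in the calls below are illustrative.

```python
import math

def triangular(x, a, b, c):
    """Rises linearly from a to a peak at b, falls back to 0 at c."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Like triangular, but with a flat top of membership 1 between b and c."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, a, b):
    """Bell curve centred at b with width a: exp(-(x-b)^2 / (2a^2))."""
    return math.exp(-((x - b) ** 2) / (2 * a ** 2))

print(triangular(30, 20, 40, 60))       # 0.5
print(trapezoidal(50, 20, 40, 60, 80))  # 1.0
print(gaussian(5, 2, 5))                # 1.0 (peak at x = b)
```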
Applications of Fuzzy Logic
Following are the different application areas where the Fuzzy Logic
concept is widely used:
It is used in Businesses for decision-making support system.
It is used in Automotive systems for controlling traffic and
speed, and for improving the efficiency of automatic transmissions.
Automotive systems also use the shift-scheduling method for
automatic transmissions.
This concept is also used in the Defence in various areas. Defence
mainly uses the Fuzzy logic systems for underwater target
recognition and the automatic target recognition of thermal infrared
images.
It is also widely used in the Pattern Recognition and
Classification in the form of Fuzzy logic-based recognition and
handwriting recognition. It is also used in the searching of fuzzy
images.
Applications of Fuzzy Logic
Fuzzy logic systems also used in Securities.
It is also used in microwave ovens for setting the power level and
cooking strategy.
This technique is also used in the area of modern control
systems such as expert systems.
Finance is also another application where this concept is used for
predicting the stock market, and for managing the funds.
It is also used for controlling the brakes.
It is also used in the chemical industry for controlling the pH
and the chemical distillation process.
It is also used in the industries of manufacturing for the
optimization of milk and cheese production.
It is also used in the vacuum cleaners, and the timings of washing
machines.
It is also used in heaters, air conditioners, and humidifiers.
Utility Theory and utility functions
Decision theory, in its simplest form, deals with choosing
among actions based on the desirability of their immediate
outcomes
The agent may not know the current state, so we define
RESULT(a) as a random variable whose values are the
possible outcome states. The probability of outcome s′,
given evidence observations e, is written
P(RESULT(a) = s′ | a, e)
where the a on the right-hand side of the conditioning bar
stands for the event that action a is executed.
The agent’s preferences are captured by a utility function,
U(s), which assigns a single number to express the
desirability of a state.
The expected utility of an action given the evidence, EU
(a|e), is just the average utility value of the outcomes,
weighted by the probability that each outcome occurs:
EU(a|e) = Σ_s′ P(RESULT(a) = s′ | a, e) U(s′)
The principle of maximum expected utility (MEU) says that a
rational agent should choose the action that maximizes the
agent’s expected utility:
action = argmax_a EU(a|e)
In a sense, the MEU principle could be seen as defining all of
AI. All an intelligent agent has to do is calculate the various
quantities, maximize utility over its actions, and away it goes.
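The MEU computation can be sketched in a few lines; the umbrella scenario and its probabilities/utilities below are hypothetical numbers invented for illustration.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def meu(actions):
    """Pick the action that maximizes expected utility (the MEU principle)."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical decision: take an umbrella or not, with P(rain) = 0.7
actions = {
    "take_umbrella": [(0.7, 50), (0.3, 40)],    # dry either way, slightly encumbered
    "no_umbrella":   [(0.7, -100), (0.3, 60)],  # soaked if it rains
}
print(meu(actions))  # take_umbrella (EU = 47 vs. EU = -52)
```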
Basis of Utility Theory
Intuitively, the principle of Maximum Expected Utility
(MEU) seems like a reasonable way to make decisions, but
it is by no means obvious that it is the only rational way.
Learning
Machine Learning Paradigms
What is learning?
“Learning denotes changes in a system that ... enable a
system to do the same task more efficiently the next time.”
–Herbert Simon
“Learning is constructing or modifying representations of
what is being experienced.”
–Ryszard Michalski
“Learning is making useful changes in our minds.” –
Marvin Minsky
Paradigms in Machine Learning
A paradigm, as most of us know, is a set of ideas,
assumptions and values held by an entity, and it shapes
the way that entity interacts with its environment.
For machine learning, this translates into the set of policies
and assumptions inherited by a machine learning algorithm,
which dictate how it interacts with both the data inputs and
the user.
Machine Learning
Machine learning is a subset of artificial
intelligence that is mainly concerned with the development
of algorithms which allow a computer to learn from
data and past experiences on its own. The term machine
learning was first introduced by Arthur Samuel in 1959.
We can define it in a summarized way as:
Machine learning enables a machine to automatically
learn from data, improve performance from
experiences, and predict things without being explicitly
programmed.
Machine Learning
With the help of sample historical data, which is known
as training data, machine learning algorithms build
a mathematical model that helps in making predictions or
decisions without being explicitly programmed. Machine
learning brings computer science and statistics together for
creating predictive models. Machine learning constructs or
uses algorithms that learn from historical data. The
more information we provide, the higher the
performance will be.
A machine has the ability to learn if it can improve its
performance by gaining more data.
How does Machine Learning work
A Machine Learning system learns from historical data,
builds the prediction models, and whenever it receives
new data, predicts the output for it. The accuracy of
predicted output depends upon the amount of data, as the
huge amount of data helps to build a better model which
predicts the output more accurately.
Classification of Machine Learning
At a broad level, machine learning can be classified into three
types:
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised Learning
Supervised learning is a type of machine learning method in
which we provide sample labeled data to the machine learning
system in order to train it, and on that basis, it predicts the
output.
The system creates a model using labeled data to understand
the datasets and learn about each one; once training and
processing are done, we test the model by providing
sample data to check whether it predicts the correct output
or not.
The goal of supervised learning is to map input data to the
output data. Supervised learning is based on supervision,
much as a student learns under the supervision of a
teacher. An example of supervised learning
is spam filtering.
Supervised Learning
Supervised learning can be grouped further in two categories of
algorithms:
Classification
Regression
Classification:
Classification is a process of categorizing a given set of data
into classes. It can be performed on both structured and
unstructured data. The process starts with predicting the class
of given data points. The classes are often referred to as targets,
labels or categories.
Classification
The classification predictive modeling is the task of
approximating the mapping function from input variables to
discrete output variables. The main goal is to identify which
class/category the new data will fall into.
Heart disease detection can be framed as a classification
problem; this is a binary classification since there can be
only two classes, i.e. has heart disease or does not have heart
disease. The classifier, in this case, needs training data to
understand how the given input variables are related to the
class. And once the classifier is trained accurately, it can be
used to detect whether heart disease is there or not for a
particular patient.
Classification
Since classification is a type of supervised learning, the
targets are provided along with the input data. Let us get
familiar with the terminology of classification in machine
learning.
Examples of supervised machine learning algorithms for
classification are:
Decision Tree Classifiers
Support Vector Machines
Naive Bayes Classifiers
K Nearest Neighbor
Artificial Neural Networks
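To make the idea concrete, here is a minimal sketch of one of the classifiers listed above, k-Nearest Neighbor, written in plain Python; the 2-D training points and labels are made-up illustration data.

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """train: list of (feature_vector, label) pairs.
    Predicts the majority label among the k nearest points
    (squared Euclidean distance)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(train, key=lambda t: dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled 2-D points: two well-separated classes
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (9, 9)))  # B
```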
Regression
The regression algorithms attempt to estimate the mapping
function (f) from the input variables (x) to numerical or
continuous output variables (y). Now, the output variable could
be a real value, which can be an integer or a floating point
value. Therefore, the regression prediction problems are
usually quantities or sizes.
For example, if you are provided with a dataset about houses,
and you are asked to predict their prices, that is a regression
task because the price will be a continuous output.
Examples of supervised machine learning algorithms for
regression:
Linear Regression
Logistic Regression (despite its name, it is most often used for classification)
Regression Decision Trees
Artificial Neural Networks
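The simplest member of the list above, linear regression, can be fitted with ordinary least squares in a few lines; the house-price numbers below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b on 1-D data:
    w = cov(x, y) / var(x), b = mean(y) - w * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

# Hypothetical house sizes (in 100 m^2) vs. prices (in lakhs)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]
w, b = fit_line(xs, ys)
print(w, b)  # 10.0 0.0 -> price = 10 * size
```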
Unsupervised Learning
As the name suggests, unsupervised learning is a machine
learning technique in which models are not supervised
using a training dataset. Instead, the models themselves find
the hidden patterns and insights in the given data. It can be
compared to the learning which takes place in the human brain
while learning new things. It can be defined as:
Unsupervised learning is a type of machine learning in
which models are trained using unlabeled dataset and are
allowed to act on that data without any supervision.
Unsupervised learning cannot be directly applied to a
regression or classification problem because unlike
supervised learning, we have the input data but no
corresponding output data.
Unsupervised Learning
The goal of unsupervised learning is to find the
underlying structure of dataset, group that data
according to similarities, and represent that dataset in a
compressed format.
Example: Suppose the unsupervised learning algorithm is
given an input dataset containing images of different types
of cats and dogs. The algorithm is never trained upon the
given dataset, which means it does not have any idea about
the features of the dataset. The task of the unsupervised
learning algorithm is to identify the image features on their
own. Unsupervised learning algorithm will perform this
task by clustering the image dataset into the groups
according to similarities between images.
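Grouping unlabeled data by similarity, as in the cats-and-dogs example above, is clustering. A minimal 1-D k-means sketch (not the image case, just the idea) with made-up data:

```python
def kmeans_1d(points, centres, iters=10):
    """Minimal 1-D k-means: repeatedly assign each point to its
    nearest centre, then move each centre to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:
            i = min(range(len(centres)), key=lambda j: abs(p - centres[j]))
            clusters[i].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres

# Hypothetical unlabeled data with two obvious groups
print(kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0]))  # [2.0, 11.0]
```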
Gain(S, Temperature) = 0.94 − (4/14)(1.0) − (6/14)(0.9183) − (4/14)(0.8113) = 0.0289
Attribute: Humidity
Values(Humidity) = High, Normal

Day | Outlook | Temperature | Humidity | Wind | Play Golf
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No
D3 | Overcast | Hot | High | Weak | Yes
D4 | Rain | Mild | High | Weak | Yes
D5 | Rain | Cool | Normal | Weak | Yes
D6 | Rain | Cool | Normal | Strong | No
D7 | Overcast | Cool | Normal | Strong | Yes
D8 | Sunny | Mild | High | Weak | No
D9 | Sunny | Cool | Normal | Weak | Yes
D10 | Rain | Mild | Normal | Weak | Yes
D11 | Sunny | Mild | Normal | Strong | Yes
D12 | Overcast | Mild | High | Strong | Yes
D13 | Overcast | Hot | Normal | Weak | Yes
D14 | Rain | Mild | High | Strong | No
Attribute: Wind
Values(Wind) = Strong, Weak
(The same 14-row Play Golf dataset as above is used.)
We calculating information gain for all attributes:
Gain(S,Outlook)= 0.2464,
Gain(S,Temperature)= 0.0289
Gain(S,Humidity)=0.1516
Gain(S,Wind) =0.0478
We can clearly see that IG(S, Outlook) has the highest
information gain of 0.246, hence we chose Outlook attribute as
the root node. At this point, the decision tree looks like.
Here we observe that whenever the Outlook is Overcast,
Play Golf is always 'Yes'. This is no coincidence: the
simple tree results from the attribute Outlook giving the
highest information gain.
Now how do we proceed from this point? We can simply
apply recursion; you might want to look at the algorithm
steps described earlier.
Now that we've used Outlook, we've got three attributes
remaining: Humidity, Temperature, and Wind. And we had
three possible values of Outlook: Sunny, Overcast, Rain.
The Overcast node already ended up as the leaf
node 'Yes', so we're left with two subtrees to compute:
Sunny and Rain.
Attribute: Temperature
Values(Temperature) = Hot, Mild, Cool

Sunny subset:
Day | Temperature | Humidity | Wind | Play Golf
D1 | Hot | High | Weak | No
D2 | Hot | High | Strong | No
D8 | Mild | High | Weak | No
D9 | Cool | Normal | Weak | Yes
D11 | Mild | Normal | Strong | Yes
Attribute: Humidity
Values(Humidity) = High, Normal
(Same Sunny subset as above.)
Attribute: Wind
Values(Wind) = Strong, Weak
(Same Sunny subset as above.)
Gain(S_sunny, Temperature) = 0.570
Gain(S_sunny, Humidity) = 0.97
Gain(S_sunny, Wind) = 0.0192
For the Rain subset:
Gain(S_rain, Humidity) = 0.0192
Gain(S_rain, Wind) = 0.97
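The entropy and information-gain computations used throughout this example can be replayed in Python on the Play Golf data (only the Outlook column is encoded here, for brevity). Note that exact arithmetic gives Gain(S, Outlook) ≈ 0.2467; the 0.2464 quoted earlier reflects rounding of the intermediate entropies.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i log2 p_i over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    gain = entropy([r[target] for r in rows])
    for v in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# Outlook and Play Golf columns of the 14-row table above
data = [{"Outlook": o, "Play": p} for o, p in [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
    ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No"),
    ("Overcast", "Yes"), ("Sunny", "No"), ("Sunny", "Yes"),
    ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rain", "No")]]

print(f"{info_gain(data, 'Outlook', 'Play'):.3f}")  # 0.247
```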
Decision Tree
Neural Networks
● Artificial neural network (ANN) is a machine learning
approach that models human brain and consists of a number
of artificial neurons.
● Neurons in ANNs tend to have fewer connections than
biological neurons.
● Each neuron in ANN receives a number of inputs.
● An activation function is applied to these inputs which
results in activation level of neuron (output value of the
neuron).
● Knowledge about the learning task is given in the form of
examples called training examples.
Contd..
The neuron output is y = φ(u + b), where u = w1x1 + w2x2 + … + wmxm is the weighted sum of the inputs, b is the bias, and φ is the activation function.
The Neuron Diagram
[Figure: Input values x1 … xm are multiplied by weights w1 … wm and passed to a summing function; the bias b is added to give the induced field v, which is passed through the activation function φ(−) to produce the output y.]
Bias of a Neuron
The bias can be treated as an extra weight w0 = b attached to a fixed input x0 = +1.
Neural Networks Activation
Functions
Activation functions are mathematical equations that determine
the output of a neural network.
The function is attached to each neuron in the network, and
determines whether it should be activated (“fired”) or not, based
on whether each neuron’s input is relevant for the model’s
prediction.
Activation functions also help normalize the output of each
neuron to a range between 0 and 1 or between -1 and 1.
An additional aspect of activation functions is that they must be
computationally efficient because they are calculated across
thousands or even millions of neurons for each data sample.
Contd..
Modern neural networks use a technique called
backpropagation to train the model, which places an
increased computational strain on the activation function,
and its derivative function.
An activation function is simply the function used to get the
output of a node. It is also known as a Transfer Function.
It is used to determine the output of neural network like yes
or no. It maps the resulting values in between 0 to 1 or -1 to
1 etc. (depending upon the function).
Step Function
A step function is a function like that used by the original
Perceptron.
The output is a certain value, A1, if the input sum is above a
certain threshold and A0 if the input sum is below a certain
threshold.
The values used by the Perceptron were A1 = 1 and A0 = 0.
Step Function
Linear or Identity Activation
Function
As you can see, the function is a line, i.e. linear. Therefore,
the output of the function will not be confined to
any range.
Equation : f(x) = x
Range : (-infinity to infinity)
It doesn’t help with the complexity or various parameters of
usual data that is fed to the neural networks.
Sigmoid or Logistic Activation
Function
The sigmoid function is an activation function which
scales the values between 0 and 1 by applying a threshold.
The equation σ(x) = 1 / (1 + e^(−x)) represents a sigmoid
function. When we apply the weighted sum in place of x,
the values are scaled between 0 and 1.
The beauty of the exponent is that the value never reaches 0
nor exceeds 1 in the above equation.
Large negative numbers are scaled towards 0 and large
positive numbers are scaled towards 1.
Sigmoid Function
Tanh or Hyperbolic Tangent
Function
The tanh function is an activation function which rescales
the values between -1 and 1 by applying a threshold, just
like a sigmoid function.
Its advantage is that the values of tanh are zero-centred,
which helps the next neuron during propagation.
When we apply the weighted sum of the inputs to
tanh(x), it rescales the values between -1 and 1.
Large negative numbers are scaled towards -1 and large
positive numbers are scaled towards 1.
ReLU (Rectified Linear Unit):
This is one of the most widely used activation functions.
The benefit of ReLU is sparsity: only positive values are
passed through and negative values are suppressed, which
speeds up the process and reduces the possibility of a dead
neuron occurring.
f(x) = max(0, x)
This function allows only positive values to pass
during forward propagation.
The drawback of ReLU is that when the gradient hits zero for
negative values, it does not converge towards the
minimum, which can result in a dead neuron during
backpropagation.
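The four activation functions discussed above are short enough to write out directly in Python:

```python
import math

def step(x, threshold=0.0):
    """Outputs 1 above the threshold, 0 below (the original Perceptron)."""
    return 1 if x >= threshold else 0

def sigmoid(x):
    """Squashes any real input into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def tanh(x):
    """Zero-centred squashing into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Passes positive values through, suppresses negative ones."""
    return max(0.0, x)

print(step(0.3), sigmoid(0), tanh(0), relu(-2.5))  # 1 0.5 0.0 0.0
```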
Network Architectures
● Three different classes of network architectures
− single-layer feed-forward
− multi-layer feed-forward
− recurrent
Input Output
layer layer
Hidden Layer
3-4-2 Network
FFNN for XOR
FFNN for XOR
● The ANN for XOR has two hidden nodes that realizes this non-linear
separation and uses the sign (step) activation function.
● Arrows from input nodes to two hidden nodes indicate the directions of
the weight vectors.
● The output node is used to combine the outputs of the two hidden nodes.
Inputs (X1, X2) | Hidden H1 (OR) | Hidden H2 (NAND) | Output AND(H1, H2) = X1 XOR X2
0, 0 | 0 | 1 | 0
0, 1 | 1 | 1 | 1
1, 0 | 1 | 1 | 1
1, 1 | 1 | 0 | 0
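The OR/NAND/AND construction in the table above can be verified with step-activation neurons in Python; the specific weight and bias values below are one conventional choice that realizes those three gates.

```python
def step(x):
    """Sign/step activation: fires (1) when the induced field is >= 0."""
    return 1 if x >= 0 else 0

def neuron(inputs, weights, bias):
    """One neuron: step(weighted sum of inputs + bias)."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor(x1, x2):
    h1 = neuron((x1, x2), (1, 1), -0.5)    # hidden node 1: OR
    h2 = neuron((x1, x2), (-1, -1), 1.5)   # hidden node 2: NAND
    return neuron((h1, h2), (1, 1), -1.5)  # output node: AND(H1, H2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # reproduces the XOR truth table
```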
Network activation
Forward Step
Error propagation
Backward Step
The agent learns by trying all the possible paths and then choosing the path which
gives it the reward with the fewest hurdles. Each right step gives the agent a
reward and each wrong step subtracts from the agent's reward. The total reward
is calculated when it reaches the final state, where the agent
gets a +1 reward.
Steps in Reinforcement Learning
Input: The input should be an initial state from which the
model will start
Output: There are many possible outputs, as there is a variety
of solutions to a particular problem.
Training: The training is based upon the input. The model
returns a state, and the user decides to reward or
punish the model based on its output.
The model keeps on learning.
The best solution is decided based on the maximum reward.
Policy: It is a mapping of an action to every
possible state in the system (sequence of
states).
Optimal Policy: A policy which maximizes
the long term reward.
Active and Passive Reinforcement
Learning
Both active and passive reinforcement learning are types of
Reinforcement Learning.
In case of passive reinforcement learning, the agent’s policy
is fixed which means that it is told what to do.
In contrast to this, in active reinforcement learning, an
agent needs to decide what to do as there’s no fixed policy
that it can act on.
Therefore, the goal of a passive reinforcement learning
agent is to execute a fixed policy (sequence of actions) and
evaluate it while that of an active reinforcement learning
agent is to act and learn an optimal policy.
Passive Reinforcement Learning
Techniques
In this kind of RL, the agent's policy π(s) is assumed to be
fixed.
The agent is therefore bound to do what the policy dictates,
although the outcomes of its actions are probabilistic.
The agent may watch what is happening, so it
knows which states it is reaching and what rewards
it gets there.
Techniques:
1. Direct utility estimation
2. Adaptive dynamic programming
3. Temporal difference learning
Active Reinforcement Learning
Techniques
In this kind of RL, the agent's policy
π(s) is not fixed.
The agent is therefore not bound to an existing policy; it tries
to act and find an optimal policy for calculating and
maximizing the overall reward value.
Techniques:
1. Q-Learning
2. ADP with exploration function
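A minimal sketch of Q-learning, the first active technique listed above, on a made-up 5-state corridor environment (reach the rightmost state to earn a +1 reward). The environment, learning-rate, discount, and exploration values are all hypothetical illustration choices.

```python
import random

# A tiny deterministic corridor: states 0..4, reward +1 on entering state 4.
N_STATES, ACTIONS = 5, ("left", "right")

def move(s, a):
    s2 = max(0, s - 1) if a == "left" else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    random.seed(0)
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection (the "exploration" part)
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r = move(s, a)
            # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            best_next = max(q[(s2, a2)] for a2 in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = q_learn()
# The learned (greedy) policy: always move right toward the reward.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # ['right', 'right', 'right', 'right']
```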
Applications of Reinforcement
Learning
Robotics for industrial automation.
Business strategy planning
Machine learning and data processing
It helps you to create training systems that provide custom
instruction and materials according to the requirement of
students.
Aircraft control and robot motion control
UNIT-5
Natural Language Processing
(NLP)
What is NLP?
Natural Language Processing or NLP is a
field of Artificial Intelligence that gives the
machines the ability to read, understand and
derive meaning from human languages.
Why NLP?
Everything we express (either verbally or in written) carries
huge amounts of information.
The topic we choose, our tone, our selection of words,
everything adds some type of information that can be
interpreted and value extracted from it.
In theory, we can understand and even predict human
behavior using that information.
But there is a problem: one person may generate hundreds
or thousands of words in a declaration, each sentence with
its corresponding complexity.
If you want to scale and analyze several hundreds,
thousands or millions of people or declarations in a given
geography, the situation becomes unmanageable.
Why NLP?
Data generated from conversations, declarations or even
tweets are examples of unstructured data.
Unstructured data doesn't fit neatly into the traditional
row-and-column structure of relational databases, and
represents the vast majority of data available in the real
world.
It is messy and hard to manipulate. Nevertheless, thanks to
advances in disciplines like machine learning, a big
revolution is going on regarding this topic.
Applications of NLP
Chatbots
Speech recognition
Machine Translation
Spell Checking
Keyword Searching
Information extraction
Advertisement Matching
Components of NLP
Natural Language Understanding (NLU)
Taking some spoken / typed sentence and working out what
it means
Natural Language Generation (NLG)
Taking some formal representation of what you want to say
and working out a way to express it in a natural (human)
language (e.g., English)
Components of NLP
Natural Language Understanding (NLU)
Mapping the given input in the natural language into a
useful representation
Different levels of analysis required:
Morphological analysis
Syntactic Analysis
Semantic Analysis
Discourse Analysis
Components of NLP
Natural Language Generation (NLG)
Producing some output in the natural language from
some internal representation
Different levels of synthesis required:
Deep planning (what to say)
Syntactic Generation
NL understanding is much harder than NL generation,
but both are hard.
Building an NLP pipeline
An NLP pipeline is built from the following steps:
Tokenization
Stemming
Lemmatization
POS Tags
Named Entity Recognition
Chunking
Tokenization
Splitting sentences into small units called tokens.
Example: Today is Wednesday
“Today”
“is”
“Wednesday”
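A minimal sketch of a tokenizer in Python, using the standard `re` module; the regular expression (words, with punctuation kept as separate tokens) is one simple illustrative choice, not the only possible tokenization rule:

```python
import re

def tokenize(text):
    """Split a sentence into word tokens, keeping punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Today is Wednesday"))   # ['Today', 'is', 'Wednesday']
```

Real tokenizers also handle contractions, abbreviations and hyphenation, which this sketch ignores.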
Stemming
Refers to the process of slicing the end or the beginning of
words with the intention of removing affixes (lexical
additions to the root of the word).
It normalizes words into their base or root forms.
For example, celebrates, celebrated and celebrating all
originate from the single root word "celebrate."
Problem in Stemming
The big problem with stemming is that it sometimes
produces a root word that has no meaning.
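A toy suffix-stripping stemmer sketched in Python illustrates both the idea and the problem; the suffix list and minimum-length check are illustrative simplifications, not the Porter algorithm used by real stemmers:

```python
def stem(word):
    """Naive stemmer: slice a known suffix off the end of the word."""
    for suffix in ("ations", "ing", "ed", "es", "s"):
        # only strip if a reasonably long stem remains
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("celebrates", "celebrated", "celebrating"):
    print(w, "->", stem(w))
# All three map to 'celebrat', which is not a real English word:
# exactly the stemming problem described above.
```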
Lemmatization
Lemmatization is quite similar to stemming, but it
overcomes the limitation of stemming.
It groups the different inflected forms of a word into its
base form, called the lemma.
The main difference from stemming is that lemmatization
produces a root word that has a meaning.
The output of lemmatization is a proper word.
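A minimal dictionary-based lemmatizer sketch in Python; the lookup table here is a tiny illustrative stand-in for the large lexicons (for example, WordNet) plus part-of-speech information that real lemmatizers rely on:

```python
# Tiny illustrative lemma table; real systems use full lexicons.
LEMMA_TABLE = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "went": "go",
    "better": "good",
}

def lemmatize(word):
    """Return the known lemma for a word, or the word itself if unknown."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

print(lemmatize("celebrating"))   # 'celebrate'
print(lemmatize("went"))          # 'go'
```

Unlike the stemmer above, every output here is a proper word.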
POS Tags
POS stands for Parts of Speech Tags.
Parts of speech include noun, verb, adverb and adjective.
A POS tag indicates how a word functions, in meaning as
well as grammatically, within a sentence.
A word can have one or more parts of speech depending on
the context in which it is used.
Example: "Google" something on the Internet.
In the above example, Google is used as a verb, although it
is a proper noun.
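A minimal lexicon-lookup tagger sketched in Python; the words and tags are illustrative, and the category abbreviations (det, adj, n, iv, adv) follow the syntactic-category notation used later in this unit. Real taggers also use context, precisely because, as the "Google" example shows, one word can carry different parts of speech:

```python
# Tiny illustrative lexicon mapping each word to its most common tag.
LEXICON = {"the": "det", "big": "adj", "dog": "n",
           "barks": "iv", "loudly": "adv"}

def pos_tag(tokens):
    """Tag each token from the lexicon; unknown words get tag 'x'."""
    return [(t, LEXICON.get(t.lower(), "x")) for t in tokens]

print(pos_tag(["The", "big", "dog", "barks", "loudly"]))
```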
Named Entity Recognition
Named Entity Recognition (NER) is the process of
detecting named entities such as person names,
organization names, locations or monetary values.
Example: HelpingHands founder Arhaan lists his Bangalore
penthouse for 20 Million rupees.
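A toy gazetteer-and-pattern NER sketch in Python, run on the example sentence; the word lists and the money pattern are illustrative assumptions, whereas real NER systems use statistical models rather than fixed lists:

```python
import re

# Illustrative gazetteers; a real system would learn these from data.
PERSONS = {"Arhaan"}
ORGS = {"HelpingHands"}
LOCATIONS = {"Bangalore"}

def ner(text):
    """Label capitalized tokens via gazetteers and money amounts via a pattern."""
    entities = []
    for m in re.finditer(r"\b[A-Z]\w*\b|\b\d+\s+Million\s+rupees\b", text):
        tok = m.group()
        if tok in PERSONS:
            entities.append((tok, "PERSON"))
        elif tok in ORGS:
            entities.append((tok, "ORG"))
        elif tok in LOCATIONS:
            entities.append((tok, "LOC"))
        elif "Million" in tok:
            entities.append((tok, "MONEY"))
        # capitalized words not in any gazetteer are silently skipped
    return entities

print(ner("HelpingHands founder Arhaan lists his "
          "Bangalore penthouse for 20 Million rupees."))
```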
Chunking
Chunking collects individual pieces of information and
groups them into bigger units, such as phrases within
sentences.
This helps in extracting insight and meaningful information
from the text.
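A minimal noun-phrase chunker sketched in Python over POS-tagged tokens; the grouping rule (determiners and adjectives accumulate until a noun closes the phrase) is a deliberate simplification of real chunking grammars:

```python
def np_chunk(tagged):
    """Group det/adj/n runs into noun-phrase chunks; other tokens stand alone."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("det", "adj", "n"):
            current.append(word)
            if tag == "n":               # a noun closes the phrase
                chunks.append(("NP", " ".join(current)))
                current = []
        else:
            current = []                 # discard any incomplete phrase
            chunks.append((tag, word))
    return chunks

print(np_chunk([("the", "det"), ("big", "adj"),
                ("dog", "n"), ("barks", "iv")]))
# [('NP', 'the big dog'), ('iv', 'barks')]
```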
Phases of NLP
Morphological and Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Morphological and Lexical Analysis
The lexicon of a language is its vocabulary that includes its
words and expressions
Morphology is the analysis, identification and description
of the structure of words
Lexical analysis involves dividing a text into paragraphs,
sentences and words
Syntactic Analysis
Syntax concerns the proper ordering of words and its
effects on meaning
This involves analysis of the words in a sentence to depict
the grammatical structure of the sentence.
The words are transformed into structure that shows how
the words are related to each other.
Eg: "School went to Raju" would be rejected by an English
syntactic analyzer
Semantic Analysis
Semantic concerns the (literal) meaning of words, phrases
and sentences.
This abstracts the dictionary meaning or the exact meaning
from context.
The structures which are created by the syntactic analyzer
are assigned meaning.
Eg: "Hot ice cream"
It is rejected because it does not make sense
Discourse Integration
Sense of the context
The meaning of any single sentence depends upon the
sentences that precede it and may also invoke the meaning
of the sentences that follow it.
Eg: The word “there” in the sentence “He wants to go
there” depends upon the prior discourse context.
Pragmatic Analysis
Pragmatic concerns the overall communicative and social
context and its effects on interpretation.
It means abstracting or deriving the purposeful use of
language in situations, importantly those aspects of
language which require world knowledge.
The main focus is on reinterpreting what was said as what
was actually meant.
Eg: "close the window" should be interpreted as a
request rather than an order.
Natural Language Generation (NLG)
Natural Language Generation is the process of constructing
natural language outputs from non-linguistic inputs.
NLG can be viewed as the reverse process of NL
understanding.
A NLG system contains:
Discourse Planner
decides what will be generated, i.e., which sentences
Surface Realizer
realizes a sentence from its internal representation
Lexical Selection
selects the correct words for describing the concepts
Why NLP is difficult?
NLP is difficult because ambiguity and uncertainty exist in
language.
Ambiguity
There are three kinds of ambiguity:
Lexical Ambiguity
Lexical Ambiguity exists in the presence of two or more
possible meanings of the sentence within a single word.
Example: Jack is looking for a match.
In the above example, the word match means that Jack is
looking for either a partner or a match (a cricket or other
sporting match).
Syntactic Ambiguity
Syntactic Ambiguity exists in the presence of two or more
possible meanings within the sentence.
Example: I saw the boy with the binoculars.
In the above example, did I have the binoculars, or did the
boy have the binoculars?
Referential Ambiguity
Referential Ambiguity exists when something is referred to
using a pronoun.
Example: John met Jack. He said, "I am hungry."
In the above sentence, it is not clear who is hungry, John
or Jack.
Grammars and Parsing
This part covers the implementation aspects of syntactic
analysis
Researchers have developed a number of algorithms for
syntactic analysis, but we consider only the following
simple methods:
Context-Free Grammar
Top-Down Parser
Bottom-Up Parser
Syntactic categories (common denotations) in NLP
np - noun phrase
vp - verb phrase
s - sentence
det - determiner (article)
n - noun
tv - transitive verb (takes an object)
iv - intransitive verb
prep - preposition
pp - prepositional phrase
adj - adjective
Context Free Grammar
A context-free grammar (CFG) is a list of rules that define
the set of all well-formed sentences in a language. Each
rule has a left-hand side, which identifies a syntactic
category, and a right-hand side, which defines its
alternative component parts, reading from left to right.
E.g., the rule s --> np vp means that "a sentence is defined
as a noun phrase followed by a verb phrase."
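Such a rule set maps directly onto a data structure. A minimal sketch in Python, representing the small giraffe grammar used later in the text as a dictionary from each category to its list of alternative right-hand sides:

```python
# CFG as a dictionary: category -> list of alternative right-hand sides.
GRAMMAR = {
    "s":   [["np", "vp"]],    # s --> np vp
    "np":  [["det", "n"]],    # np --> det n
    "vp":  [["iv"]],          # vp --> iv
    "det": [["the"]],
    "n":   [["giraffe"]],
    "iv":  [["dreams"]],
}

# "a sentence is defined as a noun phrase followed by a verb phrase"
print(GRAMMAR["s"])   # [['np', 'vp']]
```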
Parsing
Parsing in NLP is the process of determining the syntactic
structure of a text (parse tree) by analyzing its constituent words
based on an underlying grammar (of the language).
A sentence in the language defined by a CFG is a series of words
that can be derived by systematically applying the rules, beginning
with a rule that has s on its left-hand side.
A parse of the sentence is a series of rule applications in which a
syntactic category is replaced by the right-hand side of a rule that has
that category on its left-hand side, and the final
rule application yields the sentence itself.
E.g., a parse of the sentence "the giraffe dreams" is:
s => np vp => det n vp => the n vp => the giraffe vp => the giraffe
iv => the giraffe dreams
A convenient way to describe a parse is to show its parse tree, which
is simply a graphical display of the parse. Figure 1 shows a parse tree
for the sentence "the giraffe dreams".
Note that the root of every subtree has a grammatical category that
appears on the left-hand side of a rule, and the children of that root
are identical to the elements on the right-hand side of that rule.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to
rewrite it, by applying grammar rules, into a sequence of
terminal symbols that matches the classes of the words in
the input sentence.
These are then checked against the input sentence to see if
they match. If not, the process starts over again with a
different set of rules. This is repeated until a rule sequence is
found which describes the structure of the sentence.
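A top-down parser can be sketched as a recursive-descent recognizer over the giraffe grammar; this simple version, which assumes one lexical class per word and does only rule-level backtracking, is illustrative rather than a full parser:

```python
# Giraffe grammar from the text, split into phrase rules and a lexicon.
GRAMMAR = {
    "s":  [["np", "vp"]],
    "np": [["det", "n"]],
    "vp": [["iv"]],
}
LEXICON = {"the": "det", "giraffe": "n", "dreams": "iv"}

def parse(cat, tokens):
    """Try to derive `tokens` from category `cat`; return leftover tokens or None."""
    if cat in GRAMMAR:
        for rhs in GRAMMAR[cat]:          # try each rule for this category
            rest = tokens
            for sub in rhs:
                rest = parse(sub, rest)
                if rest is None:
                    break                 # this rule failed; try the next one
            else:
                return rest               # every element of the rule matched
        return None
    # terminal category: consume one word whose lexical class matches
    if tokens and LEXICON.get(tokens[0]) == cat:
        return tokens[1:]
    return None

def accepts(sentence):
    """A sentence is accepted if parsing from 's' consumes every word."""
    return parse("s", sentence.split()) == []

print(accepts("the giraffe dreams"))   # True
```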
Bottom-Up Parser
Looks at the words in the input string first, checks/assigns
their category or categories, and tries to combine them into
structures acceptable in the grammar.
It involves scanning the derivation so far for substrings
which match the right-hand side of grammar/production
rules, and replacing each such substring with the
nonterminal symbol on the left-hand side of that rule.
Finally it ends at the S symbol.
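The bottom-up strategy can be sketched as a shift-reduce recognizer over the same giraffe grammar; the greedy reduce loop below is an illustrative simplification (real shift-reduce parsers must also handle shift/reduce conflicts, which cannot arise in this tiny grammar):

```python
# Rules as (right-hand side, left-hand side), plus the word lexicon.
RULES = [
    (("det", "n"), "np"),
    (("iv",), "vp"),
    (("np", "vp"), "s"),
]
LEXICON = {"the": "det", "giraffe": "n", "dreams": "iv"}

def shift_reduce(words):
    """Shift each word's category onto a stack; reduce whenever a rule's RHS
    matches the top of the stack. A full parse leaves just ['s']."""
    stack = []
    for w in words:
        stack.append(LEXICON[w])          # shift
        reduced = True
        while reduced:                    # reduce as long as any rule applies
            reduced = False
            for rhs, lhs in RULES:
                n = len(rhs)
                if tuple(stack[-n:]) == rhs:
                    stack[-n:] = [lhs]    # replace RHS with its nonterminal
                    reduced = True
    return stack

print(shift_reduce("the giraffe dreams".split()))   # ['s']
```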