Network Analysis
Unit - 10
Artificial Intelligence
In today's world, technology is growing very fast, and we come into contact with new technologies every day.
AI is one of the most fascinating and universal fields of computer science, and it has great scope for the future. AI aims to make a machine work like a human.
Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and problem solving.
AI is arguably not a new idea: some people say that, as per Greek myth, there were mechanical men in early days which could work and behave like humans.
With the help of AI, we can create software or devices which can solve real-world problems easily and accurately, in areas such as health, marketing, and traffic.
With the help of AI, we can create personal virtual assistants, such as Cortana, Google Assistant, and Siri.
With the help of AI, we can build robots which can work in environments where human survival is at risk.
AI opens a path for other new technologies, new devices, and new opportunities.
1. Proving a theorem
2. Playing chess
3. Planning a surgical operation
4. Driving a car in traffic
5. Creating systems which can exhibit intelligent behavior, learn new things by themselves, demonstrate, explain, and advise their users.
Artificial Intelligence is not just a part of computer science; it is vast and draws on many other fields which contribute to it. To create AI, we should first know how intelligence is composed. Intelligence is an intangible property of our brain which combines reasoning, learning, problem solving, perception, language understanding, and more. AI draws on disciplines such as:
Mathematics
Biology
Psychology
Sociology
Computer Science
Neuroscience (the study of neurons)
Statistics
o High reliability: AI machines are highly reliable and can perform the same action many times with high accuracy.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars that make journeys safer and hassle-free, facial recognition for security, and natural language processing for communicating with humans in human language.
Every technology has some disadvantages, and the same goes for Artificial Intelligence. However advantageous the technology is, it still has some disadvantages which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:
o Can't think outside the box: Even though we are making smarter machines with AI, they still cannot work outside the box: a robot will only do the work for which it is trained or programmed.
Algorithms must express the transitions between states using a well-defined and
formal language that the computer can understand. In processing the data and
solving the problem, the algorithm defines, refines, and executes a function. The
function is always specific to the kind of problem being addressed by the
algorithm.
Each of the five tribes has a different technique and strategy for solving problems
that result in unique algorithms. Combining these algorithms should lead
eventually to the master algorithm that will be able to solve any given problem.
The following discussion provides an overview of the five main algorithmic
techniques.
Symbolic reasoning
One of the earliest tribes, the symbolists, believed that knowledge could be
obtained by operating on symbols (signs that stand for a certain meaning or
event) and deriving rules from them. By putting together complex systems of rules, we could attain a logical deduction of the result we wanted to know; thus the symbolists shaped their algorithms to produce rules from data. In symbolic
reasoning, deduction expands the realm of human knowledge,
while induction raises the level of human knowledge. Induction commonly opens
new fields of exploration, while deduction explores those fields.
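The symbolists' rule-based deduction can be sketched as a tiny forward-chaining loop that derives new facts from known facts. The facts and rules below are illustrative assumptions, not taken from the text:

```python
# Minimal forward-chaining sketch: apply rules (premises -> conclusion)
# repeatedly until no new fact can be derived.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)   # deduce a new fact
                changed = True
    return facts

rules = [({"rain"}, "wet_ground"), ({"wet_ground"}, "slippery")]
print(forward_chain({"rain"}, rules))  # derives wet_ground and slippery
```

Each pass expands the set of known facts, which mirrors how deduction explores a field that induction (learning the rules from data) has opened.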
Essentially, each of the neurons (created as an algorithm that models the real-
world counterpart) solves a small piece of the problem, and using many neurons
in parallel solves the problem as a whole.
Bayesian inference
A group of scientists, called Bayesians, perceived that uncertainty was the key aspect to keep an eye on and that learning wasn't assured but rather took place as a continuous updating of previous beliefs that grew more and more accurate. This perception led the Bayesians to adopt statistical methods and, in particular, derivations from Bayes' theorem, which helps us to calculate probabilities under specific conditions (for instance, the probability of seeing a card of a certain suit drawn from a deck after three other cards of the same suit).
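The belief-updating idea can be made concrete with a short calculation. The prior, likelihood, and false-positive rate below are made-up assumptions chosen only to illustrate the theorem:

```python
from fractions import Fraction

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

p_disease = Fraction(1, 100)            # prior belief P(A)
p_pos_given_disease = Fraction(9, 10)   # likelihood P(B|A)
# Total probability of the evidence B (positive result),
# with an assumed 1/10 false-positive rate:
p_pos = p_pos_given_disease * p_disease + Fraction(1, 10) * (1 - p_disease)
posterior = bayes(p_pos_given_disease, p_disease, p_pos)
print(posterior)  # 1/12: the prior belief, updated by the evidence
```

The posterior becomes the new prior when the next piece of evidence arrives, which is exactly the "continuous updating of previous beliefs" the Bayesians describe.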
Programming without AI: Modification is not quick and easy; it may adversely affect the program.
Programming with AI: Quick and easy program modification.
AI Technique
Applications of AI
AI has been dominant in various fields, such as:
o Police use computer software that can recognize the face of a criminal from the stored portrait made by a forensic artist.
History of AI
1945 Isaac Asimov, a Columbia University alumnus, coined the term Robotics.
1985 Harold Cohen created and demonstrated the drawing program, Aaron.
Multi-agent planning
Scheduling
Games
1997 The Deep Blue chess program beats the then world chess champion, Garry Kasparov.
Turing Test
A Turing Test is a method of inquiry in artificial intelligence (AI) for determining whether or not a computer is capable of thinking like a human being. The test is named after Alan Turing, an English computer scientist, cryptanalyst, mathematician and theoretical biologist who devised it.
During the test, one of the humans functions as the questioner, while the second
human and the computer function as respondents. The questioner interrogates
the respondents within a specific subject area, using a specified format and
context. After a preset length of time or number of questions, the questioner is
then asked to decide which respondent was human and which was a computer.
The test is repeated many times. If the questioner makes the correct
determination in half of the test runs or less, the computer is considered to have
artificial intelligence because the questioner regards it as "just as human" as the
human respondent.
The test is named after Alan Turing, who pioneered machine learning during the
1940s and 1950s. Turing introduced the test in his 1950 paper, "Computing Machinery and Intelligence," while at the University of Manchester.
In his paper, Turing proposed a twist on what is called "The Imitation Game." The Imitation Game involves no use of AI, but rather three human participants in three separate rooms. Each room is connected via a screen and keyboard, one containing a male, another a female, and the third containing a male or female judge. The female tries to convince the judge that she is the male, and the judge tries to discern which is which.
Turing changed the concept of this game to include an AI, a human and a human questioner. The questioner's job is then to decide which is the AI and which is the human. Since the creation of the test, many AI programs have been claimed to pass; one of the first was a program created by Joseph Weizenbaum, called ELIZA.
The Turing Test has been criticized over the years, in particular because
historically, the nature of the questioning had to be limited in order for a
computer to exhibit human-like intelligence. For many years, a computer might
only score high if the questioner formulated the queries, so they had "Yes" or
"No" answers or pertained to a narrow field of knowledge. When questions were
open-ended and required conversational answers, it was less likely that the
computer program could successfully fool the questioner.
In addition, a program such as ELIZA could pass the Turing Test by manipulating
symbols it does not understand fully. John Searle argued that this does not
determine intelligence comparable to humans.
There have been a number of variations to the Turing Test to make it more
relevant. Such examples include:
Total Turing Test- Where the questioner can also test perceptual abilities as
well as the ability to manipulate objects.
Alternatives to Turing Tests were later developed because many see the Turing
test to be flawed. These alternatives include tests such as:
The Marcus Test - In which a program that can 'watch' a television show is tested by being asked meaningful questions about the show's content.
The Lovelace Test 2.0- Which is a test made to detect AI through examining
its ability to create art.
o Motor Control (For total Turing test): To act upon objects if requested.
Note : Every agent can perceive its own actions (but not always the effects)
Examples of Agent:-
A software agent has keystrokes, file contents, and received network packets as sensors, and displays on the screen, files, and sent network packets as actuators.
A Human agent has eyes, ears, and other organs which act as sensors and hands,
legs, mouth, and other body parts acting as actuators.
A Robotic agent has Cameras and infrared range finders which act as sensors and
various motors acting as actuators.
Types of Agents
Agents can be grouped into four classes based on their degree of perceived
intelligence and capability :
Goal-Based Agents
Utility-Based Agents
Learning Agent
Simple reflex agents ignore the rest of the percept history and act only on the basis of the current percept. Percept history is the history of all that an agent has perceived to date. The agent function is based on the condition-action rule. A condition-action rule is a rule that maps a state, i.e. a condition, to an action. If the condition is true, then the action is taken; otherwise not. This agent function only succeeds when the environment is fully observable. For simple reflex agents operating in partially observable environments, infinite loops are often unavoidable. It may be possible to escape from infinite loops if the agent can randomize its actions. Problems with simple reflex agents are:
If any change occurs in the environment, then the collection of rules needs to be updated.
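The condition-action mapping described above can be sketched as a small function from percept to action. This is a vacuum-world-style toy example; the percepts, locations, and actions are assumptions for illustration:

```python
# A simple reflex agent: the action depends only on the current percept,
# with no percept history and no internal state.
def simple_reflex_agent(percept):
    location, status = percept
    if status == "dirty":    # condition-action rule: dirty square -> clean it
        return "suck"
    if location == "A":      # clean at A -> move right
        return "right"
    return "left"            # clean at B -> move left

print(simple_reflex_agent(("A", "dirty")))  # suck
```

Because the agent never remembers what it has already seen, in a partially observable world two clean squares can bounce it left and right forever, which is exactly the infinite-loop problem noted above.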
Model-based reflex agents work by maintaining an internal state that keeps track of the part of the world which cannot be seen. Updating the state requires information about:
Goal-based agents
These kinds of agents take decisions based on how far they currently are from their goal (a description of desirable situations). Their every action is intended to reduce the distance from the goal. This gives the agent a way to choose among multiple possibilities, selecting the one which reaches a goal state. The knowledge that supports its decisions is represented explicitly and can be modified, which makes these agents more flexible. They usually require search and planning.
Utility-based agents
The agents which are developed with their end uses as building blocks are called utility-based agents. When there are multiple possible alternatives, utility-based agents are used to decide which one is best. They choose actions based on a preference (utility) for each state. Sometimes achieving the desired goal is not enough; we may look for a quicker, safer, or cheaper trip to reach a destination.
Agent happiness should be taken into consideration. Utility describes how "happy" the agent is. Because of the uncertainty in the world, a utility agent chooses the action that maximizes the expected utility. A utility function maps a state onto a real number which describes the associated degree of happiness.
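Maximizing expected utility can be shown in a few lines. The actions, outcome probabilities, and utility values below are made-up assumptions, standing in for a real model of the world:

```python
# Expected utility of an action: sum of probability * utility over outcomes.
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

# A utility-based agent picks the action with the highest expected utility.
def choose_action(actions):
    return max(actions, key=lambda a: expected_utility(actions[a]))

actions = {
    "highway": [(0.8, 10), (0.2, -5)],   # fast, but some risk of a jam
    "back_road": [(1.0, 6)],             # slower, but certain
}
print(choose_action(actions))  # highway: expected utility 7.0 vs 6.0
```

Note that the agent prefers the risky highway only because 0.8 * 10 + 0.2 * (-5) = 7.0 exceeds the certain 6.0; changing the probabilities changes the choice, which is the point of reasoning under uncertainty.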
Learning Agent
A learning agent in AI is an agent which can learn from its past experiences; it has learning capabilities.
It starts to act with basic knowledge and is then able to act and adapt automatically through learning.
A learning agent has mainly four conceptual components, which are: the learning element, the critic, the performance element, and the problem generator.
Heuristic Search
Other names for these are Blind Search, Uninformed Search, and Blind Control Strategy. These aren't always practical, since they demand much time or memory.
They search the entire state space for a solution and use an arbitrary ordering of
operations. Examples of these are Breadth First Search (BFS) and Depth First
Search (DFS).
Other names for these are Informed Search, Heuristic Search, and Heuristic
Control Strategy. These are effective if applied correctly to the right types of tasks
and usually demand domain-specific information. We need this extra information
to compute preference among child nodes to explore and expand. Each node has
a heuristic function associated with it. Examples are Best First Search (BFS) and
A*.
Before moving on to the techniques in detail, first take a look at the ones we generally observe. Below are a few names:
Best-First Search
A* Search
Bidirectional Search
Tabu Search
Beam Search
Simulated Annealing
Hill Climbing
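One of the simplest entries in the list above, hill climbing, can be sketched directly: keep moving to the best neighbour until no neighbour improves the score. The objective function and neighbour set below are an assumed toy example:

```python
# Hill climbing: greedy local search toward higher heuristic scores.
def hill_climb(start, neighbours, score):
    current = start
    while True:
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):
            return current          # local maximum reached: stop
        current = best

# Toy objective (assumed): maximize f(x) = -(x - 3)^2 over the integers.
f = lambda x: -(x - 3) ** 2
print(hill_climb(0, lambda x: [x - 1, x + 1], f))  # climbs from 0 up to 3
```

The stopping rule also exposes hill climbing's well-known weakness: it halts at the first local maximum it reaches, whether or not a better state exists elsewhere.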
Search algorithms are one of the most important areas of Artificial Intelligence.
This topic will explain all about the search algorithms in AI.
Problem-solving agents:
Actions: It gives the description of all the available actions to the agent.
Solution: It is an action sequence which leads from the start node to the
goal node.
Optimal Solution: A solution that has the lowest cost among all solutions.
Following are the four essential properties of search algorithms, used to compare their efficiency: completeness, optimality, time complexity, and space complexity.
Uninformed/Blind Search:
The uninformed search does not use any domain knowledge, such as closeness or the location of the goal. It operates in a brute-force way, as it only includes information about how to traverse the tree and how to identify leaf and goal nodes. Uninformed search explores the search tree without any information about the search space, such as the initial state, operators, and goal test, so it is also called blind search. It examines each node of the tree until it achieves the goal node. Common uninformed strategies include:
o Breadth-first search
o Depth-first search
o Bidirectional Search
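The first strategy in the list, breadth-first search, can be sketched over an explicit graph. The graph below is an assumed toy example; BFS expands nodes level by level, so the first path it returns is a shortest one in number of edges:

```python
from collections import deque

# Breadth-first search: explore the tree level by level using a FIFO queue,
# with no domain knowledge beyond how to expand a node.
def bfs(graph, start, goal):
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path                     # goal test on the current node
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                             # goal unreachable

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A", "D"))  # ['A', 'B', 'D']
```

Swapping the deque's popleft for a stack pop would turn this into depth-first search, which is the other classic blind strategy listed above.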
Informed Search
Informed search algorithms use domain knowledge. In an informed search,
problem information is available which can guide the search. Informed search
strategies can find a solution more efficiently than an uninformed search strategy.
Informed search is also called a Heuristic search.
A heuristic is a method which is not always guaranteed to find the best solution, but is expected to find a good solution in reasonable time.
Informed search can solve much more complex problems which could not be solved in another way. Two main informed search algorithms are:
1. Greedy Search
2. A* Search
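The greedy strategy just listed always expands the node with the smallest heuristic value h(n). The graph and heuristic values below are assumed toy data, standing in for domain-specific knowledge such as straight-line distance:

```python
import heapq

# Greedy best-first search: a priority queue ordered by the heuristic h(n).
def greedy_best_first(graph, h, start, goal):
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)   # smallest h(n) first
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):
            heapq.heappush(frontier, (h[nxt], nxt, path + [nxt]))
    return None

graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h = {"S": 5, "A": 2, "B": 4, "G": 0}
print(greedy_best_first(graph, h, "S", "G"))  # ['S', 'A', 'G']
```

A* differs only in the ordering key: it ranks nodes by g(n) + h(n), the cost so far plus the heuristic, which is what restores the optimality guarantee that pure greedy search lacks.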
Game Playing
Game playing is an important domain of artificial intelligence. Games don't require much knowledge; the only knowledge we need to provide is the rules, the legal moves, and the conditions of winning or losing the game.
Both players try to win the game, so both of them try to make the best move possible at each turn. Searching techniques like BFS (Breadth First Search) are not suitable for this, as the branching factor is very high, so searching will take a lot of time. Instead, we need search procedures built on two components:
MOVEGEN : It generates all the possible moves that can be generated from
the current position.
STATICEVALUATION : It returns a value depending upon the goodness of a position from the viewpoint of the two players.
The algorithm assumes a two-player game, so we call the first player PLAYER1 and the second player PLAYER2. The value of each node is backed up from its children. For PLAYER1 the backed-up value is the maximum value of its children, and for PLAYER2 the backed-up value is the minimum value of its children. It provides the most promising move to PLAYER1, assuming that PLAYER2 makes the best move. It is a recursive algorithm, as the same procedure occurs at each level.
We assume that PLAYER1 starts the game. Four levels are generated. The values of nodes H, I, J, K, L, M, N, O are provided by the STATICEVALUATION function. Level 3 is a maximizing level, so all nodes of level 3 take the maximum values of their children. Level 2 is a minimizing level, so all its nodes take the minimum values of their children. This process continues. The value of A is 23, which means A should choose the move to C to win.
o The Min-Max algorithm is mostly used for game playing in AI, for games such as chess, checkers, tic-tac-toe, Go, and various other two-player games. This algorithm computes the minimax decision for the current state.
o In this algorithm two players play the game, one is called MAX and other is
called MIN.
o Both players fight it out, so that the opponent player gets the minimum benefit while they get the maximum benefit.
o Both Players of the game are opponent of each other, where MAX will
select the maximized value and MIN will select the minimized value.
o The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then backs the values up the tree as the recursion unwinds.
maxEva = -infinity
for each child of node:
    maxEva = max(maxEva, Minimax(child, depth-1, false))
return maxEva
Initial call:
Minimax(node, 3, true)
Working of the Min-Max Algorithm:
o In this example, there are two players: one is called Maximizer and the other is called Minimizer.
o Maximizer will try to get the Maximum possible score, and Minimizer will
try to get the minimum possible score.
o This algorithm applies DFS, so in this game-tree, we have to go all the way
through the leaves to reach the terminal nodes.
o At the terminal nodes, the terminal values are given, so we will compare those values and backtrack the tree until the initial state is reached. Following are the main steps involved in solving the two-player game tree:
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to get the utility values for the terminal states. In the tree diagram below, let A be the initial state of the tree. Suppose the maximizer takes the first turn, with a worst-case initial value of -infinity, and the minimizer takes the next turn, with a worst-case initial value of +infinity.
Step 2: Now, first we find the utility value for the Maximizer. Its initial value is -∞, so we compare each terminal value with the Maximizer's initial value and determine the higher node values. It finds the maximum among them all.
Step 3: In the next step, it is the Minimizer's turn, so it compares all node values with +∞ and finds the third-layer node values.
Step 4: Now it is the Maximizer's turn again, and it chooses the maximum of all node values as the value of the root node. In this game tree, there are only 4 layers, hence we reach the root node immediately, but in real games there will be more than 4 layers.
That was the complete workflow of the minimax two player game.
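The workflow above can be condensed into a short recursive sketch. The game tree here is an assumed toy example, represented as nested lists whose leaves are terminal utility values:

```python
# Recursive minimax: MAX levels take the maximum of their children,
# MIN levels take the minimum, and leaves return their utility directly.
def minimax(node, maximizing):
    if not isinstance(node, list):       # terminal node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# 3-ply toy tree: MAX at the root, MIN below it, MAX above the leaves.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(minimax(tree, True))  # 3
```

Tracing it by hand: the leaf-level MAX nodes yield 3, 9, 1, and 7; the MIN level reduces these to 3 and 1; and the root MAX picks 3, so the first player should choose the left branch.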
Alpha-Beta Pruning
o Alpha-beta pruning is a modified version of the minimax algorithm. It is an
optimization technique for the minimax algorithm.
o As we have seen, the number of game states the minimax search algorithm has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can cut it in half. Hence there is a technique by which we can compute the correct minimax decision without checking each node of the game tree, and this technique is called pruning. It involves two threshold parameters, Alpha and Beta, for future expansion, so it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only the tree leaves but also entire sub-trees.
Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the standard algorithm does, but it removes all the nodes which do not really affect the final decision but make the algorithm slow. Hence, by pruning these nodes, it makes the algorithm fast.
Note: To better understand this topic, kindly study the minimax algorithm.
The main condition required for alpha-beta pruning is:
1. α >= β
o While backtracking the tree, the node values will be passed to the upper nodes instead of the values of alpha and beta.
o We will only pass the alpha and beta values down to the child nodes.
maxEva = -infinity                    (maximizing player)
for each child of node:
    eva = Minimax(child, depth-1, alpha, beta, false)
    maxEva = max(maxEva, eva); alpha = max(alpha, eva)
    if beta <= alpha: break
return maxEva
minEva = +infinity                    (minimizing player)
for each child of node:
    eva = Minimax(child, depth-1, alpha, beta, true)
    minEva = min(minEva, eva); beta = min(beta, eva)
    if beta <= alpha: break
return minEva
Step 1: At the first step, the Max player makes the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.
Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3, and max(2, 3) = 3 becomes the value of α at node D; the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β will change, as this is Min's turn. Now β = +∞ is compared with the available subsequent node value, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.
Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right successor of E is pruned, and the algorithm does not traverse it; the value at node E will be 5.
Step 5: At the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha is changed; the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is again compared with the left child, which is 0, so max(3, 0) = 3, and then with the right child, which is 1, so max(3, 1) = 3. α remains 3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta will change, as it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again it satisfies the condition α >= β, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire sub-tree G.
Step 8: C now returns the value 1 to A. Here the best value for A is max(3, 1) = 3. The final game tree shows the nodes which were computed and the nodes which were never computed. Hence the optimal value for the maximizer is 3 for this example.
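The worked example can be reproduced with a short alpha-beta sketch. The leaf values under D and F match the steps above; the leaves that were pruned (E's right child and G's children) never influenced the result, so the values filled in for them below are arbitrary assumptions:

```python
import math

# Alpha-beta pruning over a nested-list game tree (leaves are utilities).
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:
                break                  # beta cut-off: Min won't allow this
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:
            break                      # alpha cut-off: Max won't allow this
    return value

# A(max) -> B, C (min) -> D, E, F, G (max) -> leaves, as in the steps above.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(alphabeta(tree, -math.inf, math.inf, True))  # 3, matching Step 8
```

Running plain minimax on the same tree gives the same answer, which illustrates the claim above: pruning changes how much work is done, never the decision.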
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning happens in the tree, and the best moves occur on the left side of the tree. We apply DFS, so it searches the left of the tree first and goes twice as deep as the minimax algorithm in the same amount of time. The complexity with ideal ordering is O(b^(m/2)).
o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. E.g., for chess, try this order: captures first, then threats, then forward moves, then backward moves.
o We can bookkeep the states, as there is a possibility that states may repeat.
Knowledge Representation
What is Knowledge?
Knowledge is a useful term to judge the understanding of an individual on a given subject.
In intelligent systems, the domain is the main focus area, so the system specifically concentrates on acquiring the domain knowledge.
Types of knowledge in AI
1. Declarative knowledge
2. Procedural knowledge
3. Heuristic knowledge
4. Meta-knowledge
This type gives an idea about the other types of knowledge that are suitable for solving a problem.
5. Structural knowledge
1. Important attributes
There are two attributes shown in the diagram, instance and isa. Since these attributes support the property of inheritance, they are of prime importance.
i. What are the primitives and at what level should the knowledge be
represented?
ii. What should be the number (small or large) of low-level primitives or high-level
facts?
Such a representation can make it easy to answer questions such as: Who spotted
Alex?
Hence, the user can add other facts, such as "Spotted (x, y) → saw (x, y)"
"There are more sheep than people in Australia", and "English speakers can be
found all over the world."
These facts can be described by including an assertion to the sets representing
people, sheep, and English.
While selecting and using the right structure, it is necessary to solve the following problem statements. They include the process of how to:
There is no specific way to solve these problems, but some of the effective
knowledge representation techniques have the potential to solve them.
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with
propositions and has no ambiguity in representation. Logical representation
means drawing a conclusion based on various conditions. This representation lays
down some important communication rules. It consists of precisely defined syntax and semantics which support sound inference. Each sentence can be translated into logic using its syntax and semantics.
Facts are general statements that may be either true or false. Thus, logic can be used to represent such simple facts.
The user has to define a set of primitive symbols along with the required semantics.
New logical statements are formed from the existing ones. Statements which can be either true or false, but not both, are called propositions. A declarative sentence expresses a statement with a proposition as its content.
Example: The declarative sentence "Cotton is white" expresses the proposition that cotton is white; the sentence "Cotton is white" is a true statement.
Example:
a) It is Sunday.
d) 5 is a prime number.
o A propositional formula which takes both true and false values (depending on the assignment) is called a contingency.
The syntax of propositional logic defines the allowable sentences for the
knowledge representation. There are two types of Propositions:
a. Atomic Propositions
b. Compound propositions
Example:
Example:
Logical Connectives:
Truth Table:
Precedence of connectives:
Precedence Operators
Logical equivalence:
Let's take two propositions A and B; for logical equivalence, we write A⇔B. In the truth table below, we can see that the columns for ¬A∨B and A→B are identical; hence the two formulas are equivalent.
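Since the truth table itself is not reproduced here, the same check can be carried out in a few lines by enumerating every assignment of A and B:

```python
from itertools import product

# Two formulas are logically equivalent when they agree on every assignment.
def equivalent(f, g):
    return all(f(a, b) == g(a, b) for a, b in product([True, False], repeat=2))

not_a_or_b  = lambda a, b: (not a) or b
a_implies_b = lambda a, b: b if a else True   # material implication A -> B
print(equivalent(not_a_or_b, a_implies_b))  # True
```

Enumerating assignments is exactly what building the truth table does by hand; for n propositional variables there are 2**n rows to check.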
Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.
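The laws listed above can each be verified mechanically over all assignments of their variables, in the same truth-table style as before:

```python
from itertools import product

# A law holds when it is true for every assignment of its n variables.
def holds(law, n):
    return all(law(*vals) for vals in product([True, False], repeat=n))

de_morgan_and = lambda p, q: (not (p and q)) == ((not p) or (not q))
de_morgan_or  = lambda p, q: (not (p or q)) == ((not p) and (not q))
distributive  = lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r))

print(holds(de_morgan_and, 2), holds(de_morgan_or, 2), holds(distributive, 3))
```

All three print True; any of the other properties above (commutativity, associativity, double negation) can be checked the same way by writing them as a lambda and passing the right variable count.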
Syntax:
o Syntax consists of the rules which decide how we can construct legal sentences in the logic.
Semantics:
o Semantics consists of the rules by which we can interpret sentences in the logic.
Note: We will discuss Propositional Logic and Predicate Logic in later chapters.
b. Kind-of-relation
Statements:
a. Jerry is a cat.
b. Jerry is a mammal
In the above diagram, we have represented the different types of knowledge in the form of nodes and arcs. Each object is connected with another object by some relation.
4. Semantic networks do not have any standard definition for the link names.
5. These networks are not intelligent and depend on the creator of the
system.
3. Frame Representation
Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put constraints on the frames. Example: IF-NEEDED facets are called when the data of a particular slot is needed. A frame may consist of any number of slots, a slot may include any number of facets, and facets may have any number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A single frame is not very useful by itself. A frame system consists of a collection of connected frames. In a frame, knowledge about an object or event is stored together in the knowledge base. The frame is a technology widely used in various applications, including natural language processing and machine vision.
Example 1: A frame for a book
Slots Fillers
Year 1996
Page 1152
Example 2:
Slots Fillers
Name Peter
Profession Doctor
Age 25
Weight 78
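The slot-filler tables above map naturally onto a dictionary, with an IF-NEEDED facet modelled as a callable that computes its value on demand. The BMI slot and the height it assumes are illustrative additions, not part of the original frame:

```python
# A frame as a dictionary of slots and fillers, mirroring Example 2 above.
person_frame = {
    "Name": "Peter",
    "Profession": "Doctor",
    "Age": 25,
    "Weight": 78,
    # IF-NEEDED facet: value computed only when the slot is asked for
    # (height of 1.75 m is an assumed value for illustration).
    "BMI": lambda frame: round(frame["Weight"] / (1.75 ** 2), 1),
}

def get_slot(frame, slot):
    value = frame[slot]
    return value(frame) if callable(value) else value   # fire IF-NEEDED

print(get_slot(person_frame, "Profession"))  # Doctor
print(get_slot(person_frame, "BMI"))         # 25.5, computed on demand
```

Connecting frames, e.g. giving this one an "isa" slot pointing at a generic Person frame whose slots are consulted when a lookup fails, is what turns isolated frames into the frame systems described above.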
4. Production Rules
A production rules system consists of (condition, action) pairs, which mean: "If condition then action". It has mainly three parts:
o The set of production rules
o Working Memory
o The recognize-act cycle
In production rules, the agent checks for the condition, and if the condition holds, the production rule fires and the corresponding action is carried out. The condition part of the rule determines which rule may be applied to a problem, and the action part carries out the associated problem-solving steps. This complete process is called a recognize-act cycle.
The working memory contains the description of the current state of problem solving, and rules can write knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, then multiple production rules may be triggered together; this set is called the conflict set. In this situation, the agent needs to select a rule from the set, which is called conflict resolution.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (bus arrives at destination) THEN action (get down from the bus).
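The bus rules above can be run through a minimal recognize-act cycle: working memory holds the current facts, and a rule fires when all of its conditions are present. The effect facts added by each rule are assumptions introduced to let one rule enable the next:

```python
# Rules as (conditions, action, effects): when the conditions are all in
# working memory, the rule fires and writes its effects back into memory.
rules = [
    ({"at bus stop", "bus arrives"}, "get into the bus", {"on the bus"}),
    ({"on the bus", "paid", "empty seat"}, "sit down", {"seated"}),
]

def recognize_act(memory, rules):
    actions = []
    fired = True
    while fired:                        # repeat until no rule can fire
        fired = False
        for conditions, action, effects in rules:
            if conditions <= memory and not effects <= memory:
                actions.append(action)
                memory |= effects       # write effects to working memory
                fired = True
    return actions

wm = {"at bus stop", "bus arrives", "paid", "empty seat"}
print(recognize_act(wm, rules))  # ['get into the bus', 'sit down']
```

Note how firing the first rule adds "on the bus" to working memory, which is what allows the second rule to match on the next pass; that chaining is the recognize-act cycle described above.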
2. The production rules are highly modular, so we can easily remove, add or
modify an individual rule.
1. A production rule system does not exhibit any learning capabilities, as it does not store the results of problems for future use.
2. During the execution of the program, many rules may be active; hence rule-based production systems can be inefficient.
Rules are usually expressed in the form of IF . . . THEN . . . statements, such as:
IF A THEN B
This can be considered to have a similar logical meaning as the following: A → B
In other words, the purpose of a rule is usually to tell a system (such as an expert
system) what to do in certain circumstances, or what conclusions to draw from a
set of inputs about the current situation.
In general, a rule can have more than one antecedent, usually combined either by
AND or by OR (logically the same as the operators ∧ and ∨ ).
Similarly, a rule may have more than one consequent, which usually suggests that
there are multiple actions to be taken.
IF x > 3
IF name is "Bob"
IF weather is cold
Here, the objects being considered are x, name, and weather; the operators are ">" and "is"; and the values are 3, "Bob," and cold.
An object in this sense is simply a variable that represents some physical object or
state in the real world.
IF name is "Bob"
The conclusion of the rule is actually an action, and the action takes the form of a
recommendation to Bob that he should wear a coat.
In some cases, the rules provide more definite actions, such as "move left" or "close door," in which case the rules are being used to represent directives.
IF temperature is below 0
Scripts
Some of the elements of a typical script include entry conditions, props, roles,
tracks, and scenes. The entry conditions describe situations that must be satisfied
before events in this script can occur or be valid. Props refer to objects that are
used in the sequence of events that occur. Roles refer to the people involved in
the script. The results are the conditions that exist after the events in the script have occurred. Track refers to variations that might occur in a particular script. And finally, scenes describe the actual sequence of events that occur.
A script is useful in predicting what will happen in a specific situation. Even though
certain events have not been observed, the script permits the computer to
predict what will happen to whom and when. If the computer triggers a script,
questions can be asked and accurate answers derived with little or no original
input knowledge. Like frames, scripts are a particularly useful form of knowledge
representation because there are so many stereotypical situations and events
that people use every day. Knowledge like this is generally taken for granted, but
in computer problem-solving situations, such knowledge must often be simulated
to solve a particular problem using artificial intelligence.
To use the script, we store knowledge in the computer in symbolic form. This is
best done using LISP or another symbolic language. We can then ask questions
about various persons and conditions. A search and pattern-matching process
examines the script for the answers. For example, what does the customer do
first? Well, he parks the car, then goes into the restaurant. Whom does he pay?
The server, of course. The whole thing is totally predictable.
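As a rough illustration of how such a script could be stored and queried in symbolic form (here in Python rather than LISP; the field names follow the text, while the individual events are invented examples):

```python
# A restaurant script stored as a plain Python structure. Field names follow
# the elements of a script; the events themselves are invented examples.
restaurant_script = {
    "entry_conditions": ["customer is hungry", "customer has money"],
    "props": ["car", "table", "menu", "food", "bill"],
    "roles": ["customer", "server", "cook"],
    "track": "sit-down restaurant",
    "scenes": [
        ("customer", "parks the car"),
        ("customer", "enters the restaurant"),
        ("server", "brings the menu"),
        ("customer", "orders food"),
        ("cook", "prepares the food"),
        ("customer", "eats the food"),
        ("customer", "pays the server"),
        ("customer", "leaves"),
    ],
    "results": ["customer is not hungry", "customer has less money"],
}

def first_action(script, role):
    """Answer questions like 'what does the customer do first?'"""
    return next(act for who, act in script["scenes"] if who == role)

def who_is_paid(script):
    # Simple pattern matching over the scenes, as the text describes.
    for who, act in script["scenes"]:
        if act.startswith("pays the"):
            return act.split("pays the ")[1]

first_action(restaurant_script, "customer")  # "parks the car"
who_is_paid(restaurant_script)               # "server"
```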
That is to say: for any two (or more) sentences that are identical in meaning,
there should be only one representation of that meaning.
A double arrow indicates a two-way link between the actor and the action.
ATRANS is one of the primitive acts used by the theory; it indicates transfer of
possession.
A second set of building blocks is the set of allowable dependencies among the
conceptualizations described in a sentence.
Conceptual Ontologies
An ontology may give axioms to restrict the use of some symbol. For example, it
may specify that apartment buildings are buildings, which are human-constructed
artifacts. It may give some restriction on the size of buildings so that shoeboxes
cannot be buildings or that cities cannot be buildings. It may state that a building
cannot be at two geographically dispersed locations at the same time (so if you
take off some part of the building and move it to a different location, it is no
longer a single building). Because apartment buildings are buildings, these
restrictions also apply to apartment buildings.
Aristotelian Definitions
If genera are different and co-ordinate, their differentiae are themselves different
in kind. Take as an instance the genus "animal" and the genus "knowledge". "With
feet", "two-footed", "winged", "aquatic", are differentiae of "animal"; the species
of knowledge are not distinguished by the same differentiae.
In the style of modern ontologies, we would say that "animal" is a class, and
"knowledge" is a class. The property "two-footed" has domain "animal". If
something is an instance of knowledge, it does not have a value for the property
"two-footed".
For each class you may want to define, determine a relevant superclass,
and then select those attributes that distinguish the class from other
subclasses. Each attribute gives a property and a value.
For each property, define the most general class for which it makes sense,
and define the domain of the property to be this class. Make the range
another class that makes sense (perhaps requiring this range class to be
defined, either by enumerating its values or by defining it using an
Aristotelian definition).
This can get quite complicated. For example, when defining "luxury furniture",
perhaps the superclass you want is "furniture" and the distinguishing
characteristics are that the cost is high and that luxury furniture is soft. The
softness of furniture is different from the softness of rocks. We also probably
want to distinguish the squishiness from the texture (both of which may be
regarded as soft).
This methodology does not, in general, give a tree hierarchy of classes. Objects
can be in many classes. Each class does not have a single most-specific superclass.
However, it is still straightforward to check whether one class is a subclass of
another, to check the meaning of a class, and to determine the class that
corresponds to a concept in your head.
This methodology has worked well in biology, where the natural hierarchy of
species reflects evolution. Trying to force a tree structure in other domains has
been much less successful.
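The methodology above (pick a superclass, then add distinguishing attributes) can be sketched in a few lines of Python. The classes and property values below are illustrative assumptions based on the "luxury furniture" example:

```python
# Aristotelian definitions: each class is a genus (superclass) plus
# differentiae (property -> value). Class and property names are invented.
classes = {
    "furniture":        (None, {}),
    "luxury furniture": ("furniture", {"cost": "high", "squishiness": "soft"}),
    "sofa":             ("luxury furniture", {"seats": "several"}),
}

def is_subclass(cls, ancestor):
    """Walk the superclass chain to check the subclass relation."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls, _ = classes[cls]
    return False

def defining_properties(cls):
    """Collect the differentiae inherited down the superclass chain."""
    props = {}
    while cls is not None:
        superclass, diffs = classes[cls]
        props = {**diffs, **props}   # more specific classes take precedence
        cls = superclass
    return props

is_subclass("sofa", "furniture")     # True
defining_properties("sofa")["cost"]  # "high", inherited from luxury furniture
```

Because the superclass field holds a single parent here, this sketch does give a tree; allowing a set of superclasses per class would give the more general lattice the text describes.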
An ontology does not specify the individuals not known at design time. For
example, an ontology of buildings would typically not include actual buildings. An
ontology would specify those individuals that are fixed and should be shared, such
as the days of the week, or colors.
The primary purpose of an ontology is to document what the symbols mean - the
mapping between symbols (in a computer) and concepts (in someone's head).
Given a symbol, a person is able to use the ontology to determine what it means.
When someone has a concept to be represented, the ontology is used to find the
appropriate symbol or to determine that the concept does not exist in the
ontology. The secondary purpose, achieved by the use of axioms, is to allow
inference or to determine that some combination of values is inconsistent. The
main challenge in building an ontology is the organization of the concepts to allow
a human to map concepts into symbols in the computer, and for the computer to
infer useful new knowledge from stated facts.
Expert System
The expert system is a part of AI, and the first ES was developed in the year 1970,
which was the first successful approach of artificial intelligence. It solves the most
complex issues as an expert would, by extracting the knowledge stored in its
knowledge base. The system helps in decision making for complex problems using
both facts and heuristics, like a human expert. It is called an expert system
because it contains the expert knowledge of a specific domain and can solve any
complex problem of that particular domain. These systems are designed for a
specific domain, such as medicine, science, etc.
Below is the block diagram that represents the working of an expert system:
o PXDES: It is an expert system that is used to determine the type and level
of lung cancer. To determine the disease, it takes a picture of the
upper body, which looks like a shadow. This shadow identifies the type
and degree of harm.
o User Interface
o Inference Engine
o Knowledge Base
1. User Interface
With the help of a user interface, the expert system interacts with the user, takes
queries as an input in a readable format, and passes it to the inference engine.
After getting the response from the inference engine, it displays the output to the
user. In other words, it is an interface that helps a non-expert user to
communicate with the expert system to find a solution.
o With the help of an inference engine, the system extracts the knowledge
from the knowledge base.
o Forward Chaining: It starts from the known facts and rules, and applies
the inference rules to add their conclusion to the known facts.
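A minimal forward-chaining sketch in Python (the medical rules below are invented placeholders, not real MYCIN knowledge):

```python
# Forward chaining: repeatedly apply rules whose antecedents are all known,
# adding their conclusions to the known facts until nothing new is inferred.
def forward_chain(facts, rules):
    """rules: list of (antecedents, conclusion) pairs over plain strings."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if conclusion not in facts and all(a in facts for a in antecedents):
                facts.add(conclusion)
                changed = True
    return facts

# Invented placeholder rules, loosely in the spirit of a medical KB.
rules = [
    (["fever", "cough"], "possible infection"),
    (["possible infection", "positive culture"], "bacterial infection"),
]
forward_chain(["fever", "cough", "positive culture"], rules)
# the result contains "bacterial infection"
```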
3. Knowledge Base
o One can also view the knowledge base as a collection of objects and their
attributes. For example, a Lion is an object, and its attributes are that it is a
mammal, it is not a domestic animal, etc.
o The KB of the MYCIN is updated successfully. In order to test it, the doctor
provides a new problem to it. The problem is to identify the presence of the
bacteria by inputting the details of a patient, including the symptoms,
current condition, and medical history.
o Now the system has collected all the information, so it will find the solution
for the problem by applying if-then rules using the inference engine and
using the facts stored within the KB.
o In the end, it will provide a response to the patient by using the user
interface.
Before using any technology, we must have an idea about why to use that
technology, and the same holds for the ES. Although we have human experts in
every field, why do we need to develop a computer-based system? Below are the
points describing the need for the ES:
5. High security: These systems provide high security to resolve any query.
o They can be used for risky places where the human presence is not safe.
o The response of the expert system may get wrong if the knowledge base
contains the wrong information.
o For each domain, we require a specific ES, which is one of the big
limitations.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world.
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
In the real world, there are lots of scenarios, where the certainty of something is
not confirmed, such as "It will rain today," "behavior of someone for some
situations," "A match between two teams or two players." These are probable
sentences for which we can assume that it will happen but not sure about it, so
here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain
knowledge:
o Bayes' rule
o Bayesian Statistics
We can find the probability of an uncertain event by using the below formula.
o P(¬A) + P(A) = 1.
Sample space: The collection of all possible outcomes of an experiment is called
the sample space.
Random variables: Random variables are used to represent the events and
objects in the real world.
Conditional probability:
Let's suppose we want to calculate the probability of event A when event B has
already occurred; "the probability of A under the conditions of B" can be written
as:
P(A|B) = P(A⋀B) / P(B)
It can be explained by using the below Venn diagram, where B is the event that
has occurred, so the sample space is reduced to set B, and now we can calculate
event A given that event B has already occurred by dividing the probability of
P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both
English and Mathematics. What percent of the students who like English also like
Mathematics?
Solution:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.4 / 0.7 ≈ 0.57
Hence, 57% of the students who like English also like Mathematics.
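The same arithmetic, checked directly:

```python
# The class example as arithmetic: P(Math | English) = P(English and Math) / P(English).
p_english = 0.70
p_english_and_math = 0.40
p_math_given_english = p_english_and_math / p_english
round(p_math_given_english, 2)   # 0.57, i.e. about 57%
```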
What is planning in AI?
In blocks-world problem, three blocks labeled as 'A', 'B', 'C' are allowed to
rest on the flat surface. The given condition is that only one block can be
moved at a time to achieve the goal.
The start state and goal state are shown in the following diagram.
Apply the chosen rule for computing the new problem state.
Detect dead ends so that they can be abandoned and the system‘s effort is
directed in more fruitful directions.
This is one of the most important planning algorithms, which is specifically used
by STRIPS.
The stack is used in an algorithm to hold the action and satisfy the goal. A
knowledge base is used to hold the current state, actions.
Goal stack is similar to a node in a search tree, where the branches are
created if there is a choice of an action.
i. Start by pushing the original goal on the stack. Repeat this until the stack
becomes empty. If stack top is a compound goal, then push its unsatisfied
subgoals on the stack.
ii. If stack top is a single unsatisfied goal then, replace it by an action and push the
action‘s precondition on the stack to satisfy the condition.
iii. If stack top is an action, pop it from the stack, execute it and change the
knowledge base by the effects of the action.
iv. If stack top is a satisfied goal, pop it from the stack.
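Steps i-iv can be sketched as a small Python loop. The "tea-making" actions below are invented stand-ins for blocks-world operators, chosen only to keep the sketch short:

```python
# A minimal goal-stack planning sketch. state: set of facts; goals: list of
# facts to achieve; actions: name -> (preconditions, add_list, delete_list).
def goal_stack_plan(state, goals, actions):
    state = set(state)
    stack = [tuple(goals)]          # i. push the original (compound) goal
    plan = []
    while stack:
        top = stack.pop()
        if isinstance(top, tuple):  # a compound goal
            unsatisfied = [g for g in top if g not in state]
            if unsatisfied:
                stack.append(top)   # re-check after the subgoals are done
                for g in unsatisfied:
                    stack.append(g)
        elif top in actions:        # iii. an action: execute it
            _pre, add, delete = actions[top]
            state |= set(add)
            state -= set(delete)
            plan.append(top)
        elif top in state:          # iv. a satisfied goal: just pop it
            continue
        else:                       # ii. an unsatisfied single goal
            # replace it by an action that achieves it, and push the
            # action's preconditions on the stack
            act = next(a for a, (_p, add, _d) in actions.items() if top in add)
            stack.append(act)
            stack.append(tuple(actions[act][0]))
    return plan

actions = {
    "boil_water": ((), ("hot_water",), ()),
    "make_tea":   (("hot_water",), ("tea",), ("hot_water",)),
}
goal_stack_plan(set(), ["tea"], actions)   # ["boil_water", "make_tea"]
```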
Non-linear planning
This planning uses a goal set rather than a goal stack, and its search space
includes all possible subgoal orderings. It handles goal interactions by the
interleaving method.
It has a larger search space, since all possible goal orderings are taken into
consideration.
Algorithm
1. Choose a goal 'g' from the goalset
2. If 'g' does not match the state, then
      Choose an operator 'o' whose add-list matches goal g
      Push 'o' on the opstack
      Add the preconditions of 'o' to the goalset
3. While all the preconditions of the operator on top of the opstack are met in
   the state
      Pop operator 'o' from the top of the opstack
      state = apply(o, state)
      plan = [plan; o]
hierarchical planning to truly cope with the requirements of applications from the
real world. We propose a framework-based approach to remedy this situation.
First, we provide a basis for defining different formal models of hierarchical
planning, and define two models that comprise a large portion of HTN planners.
Second, we provide a set of concepts that helps to interpret HTN planners from
the aspect of their search space. Then, we analyse and compare the planners
based on a variety of properties organised in five segments, namely domain
authoring, expressiveness, competence, performance and applicability.
Furthermore, we select Web service composition as a real-world and current
application, and classify and compare the approaches that employ HTN planning
to solve the problem of service composition. Finally, we conclude with our
findings and present directions for future work.
The biggest contribution towards this kind of "popular" image of HTN planning
has emerged after the proposal of the Simple Hierarchical Ordered Planner
(SHOP) and its successors. SHOP is an HTN-based planner that shows efficient
performance even on complex problems, but at the expense of providing well-
written and possibly algorithmic-like domain knowledge. Several situations may
confirm our observation, but the most well-known is the disqualification of SHOP
from the International Planning Competition (IPC) in 2000 with the reason that
the domain knowledge was not well-written so that the planner produced plans
that were not solutions to the competition problems. Furthermore, the
disqualification was followed by a dispute on whether providing such knowledge
to a planner should be considered as "cheating" in the world of AI planning.
SHOP's style of HTN planning was introduced by the end of 1990s, but HTN
planning existed long before that. The initial idea of hierarchical planning was
presented by the Nets of Action Hierarchies (NOAH) planner in 1975. It was
followed by a series of studies on practical implementations and theoretical
contributions on HTN planning up until today. We believe that the fruitful ideas
and scientific contribution of nearly 40 years must not be easily reduced to
controversy and antagonism towards HTN planning. On the other hand, we are
faced with a situation full of fuzziness in terms of difficulty to understand what
kind of planning style other HTN planners perform, how it is achieved and
implemented, what are the similarities and differences among these planners,
and finally, what is their actual contribution to the creation of the overall and
possibly objective image of HTN planning. The situation cannot be effortlessly
clarified because the current literature on HTN planning, despite being very rich,
reports little or nothing at all on any of these issues, especially in a consolidated
form.
Partial-Order Planning
The forward and regression planners enforce a total ordering on actions at all
stages of the planning process. The CSP planner commits to the particular time
that the action will be carried out. This means that those planners have to commit
to an ordering of actions that cannot occur concurrently when adding them to a
partial plan, even if there is no particular reason to put one action before another.
For uniformity, treat start as an action that achieves the relations that are true in
the initial state, and treat finish as an action whose precondition is the goal to be
solved. The pseudoaction start is before every other action, and finish is after
every other action. The use of these as actions means that the algorithm does not
require special cases for the initial situation and for the goals. When the
preconditions of finish hold, the goal is solved. An action, other than start or
finish, will be in a partial-order plan to achieve a precondition of an action in the
plan. Each precondition of an action in the plan is either true in the initial state, and
so achieved by start, or there will be an action in the plan that achieves it.
We must ensure that the actions achieve the conditions they were assigned to
achieve. Each precondition P of an action act1 in a plan will have an
action act0 associated with it such that act0 achieves precondition P for act1. The
triple ⟨ act0,P,act1⟩ is a causal link. The partial order specifies that
action act0 occurs before action act1, which is written as act0 < act1. Any other
action A that makes P false must either be before act0 or after act1.
At each stage in the planning process, a pair ⟨G,act1⟩ is selected from the
agenda, where G is a precondition for action act1. Then an action, act0, is
chosen to achieve G. That action is either already in the plan - it could be the
start action, for example - or it is a new action that is added to the plan. Action
act0 must happen before act1 in the partial order. It adds a causal link that
records that act0 achieves G for action act1. Any action in the plan that deletes G
must happen either before act0 or after act1. If act0 is a new action, its
preconditions are added to the agenda, and the process continues until the
agenda is empty.
This is a non-deterministic procedure. The "choose" and the "either ... or ..." form
choices that must be searched over. There are two choices that require search:
non-deterministic procedure PartialOrderPlanner(Gs)
Inputs:
    Gs: set of atomic propositions to achieve
Output:
    linear plan to achieve Gs
Local:
    Agenda: set of ⟨P,A⟩ pairs where P is an atom and A an action
    Actions: set of actions in the current plan
    Constraints: set of temporal constraints on actions
    CausalLinks: set of ⟨act0,P,act1⟩ triples
Agenda ← {⟨G,finish⟩ : G ∈ Gs}
Actions ← {start, finish}
Constraints ← {start < finish}
CausalLinks ← {}
repeat
    select and remove ⟨G,act1⟩ from Agenda
    either
        choose act0 ∈ Actions such that act0 achieves G
    or
        choose act0 ∉ Actions such that act0 achieves G
        Actions ← Actions ∪ {act0}
        Constraints ← add_const(start < act0, Constraints)
        for each CL ∈ CausalLinks do
            Constraints ← protect(CL, act0, Constraints)
        Agenda ← Agenda ∪ {⟨P,act0⟩ : P is a precondition of act0}
    Constraints ← add_const(act0 < act1, Constraints)
    CausalLinks ← CausalLinks ∪ {⟨act0,G,act1⟩}
    for each A ∈ Actions do
        Constraints ← protect(⟨act0,G,act1⟩, A, Constraints)
until Agenda = {}
The preceding algorithm has glossed over one important detail. It is sometimes
necessary to perform some action more than once in a plan. The preceding
algorithm will not work in this case, because it will try to find a partial ordering
with both instances of the action occurring at the same time. To fix this problem,
the ordering should be between action instances, and not actions themselves. To
implement this, assign an index to each instance of an action in the plan, and the
ordering is on the action instance indexes and not the actions themselves. This is
left as an exercise.
The field of NLP involves making computers perform useful tasks with the
natural languages humans use. The input and output of an NLP system can be −
Speech
Written Text
Components of NLP
tasks −
It involves −
Difficulties in NLU
For example, "He lifted the beetle with red cap." − Did he use the cap to lift the
beetle, or did he lift a beetle that had a red cap?
NLP Terminology
Steps in NLP
Context-Free Grammar
Top-Down Parser
Context-Free Grammar
It is the grammar that consists of rules with a single symbol on the left-hand side
of the rewrite rules. Let us create a grammar to parse a sentence −
The parse tree breaks down the sentence into structured parts so that the
computer can easily understand and process it. In order for the parsing algorithm
to construct this parse tree, a set of rewrite rules, which describe what tree
structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence
of other symbols. According to first order logic rule, if there are two strings Noun
Phrase (NP) and Verb Phrase (VP), then the string combined by NP followed by VP
is a sentence. The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
Now consider the above rewrite rules. Since V can be replaced by both "peck"
and "pecks", sentences such as "The bird peck the grains" can be wrongly
permitted, i.e. the subject-verb agreement error is approved as correct.
Demerits −
They are not highly precise. For example, "The grains peck the bird" is
syntactically correct according to the parser, but even though it makes no
sense, the parser takes it as a correct sentence.
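The over-generation problem can be demonstrated with a tiny recognizer for the rewrite rules above (the small lexicon is an assumption):

```python
# The grammar S -> NP VP, NP -> DET N | DET ADJ N, VP -> V NP has no notion
# of number agreement, so "the bird peck the grains" is accepted alongside
# "the bird pecks the grains". The lexicon below is an invented sample.
lexicon = {
    "DET": {"a", "the"},
    "N":   {"bird", "grains"},
    "ADJ": {"small"},
    "V":   {"peck", "pecks"},
}

def is_np(words):
    if len(words) == 2:                      # NP -> DET N
        return words[0] in lexicon["DET"] and words[1] in lexicon["N"]
    if len(words) == 3:                      # NP -> DET ADJ N
        return (words[0] in lexicon["DET"] and words[1] in lexicon["ADJ"]
                and words[2] in lexicon["N"])
    return False

def is_sentence(text):
    words = text.lower().split()
    # S -> NP VP, VP -> V NP: try every split point for the subject NP
    for i in range(1, len(words)):
        if (is_np(words[:i]) and words[i] in lexicon["V"]
                and is_np(words[i + 1:])):
            return True
    return False

is_sentence("The bird pecks the grains")  # True
is_sentence("The bird peck the grains")   # True (agreement error accepted)
is_sentence("The grains peck the bird")   # True (nonsense, but accepted)
```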
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a
sequence of terminal symbols that matches the classes of the words in the input
sentence until it consists entirely of terminal symbols.
These are then checked with the input sentence to see if it matched. If not, the
process is started over again with a different set of rules. This is repeated until a
specific rule is found which describes the structure of the sentence.
Demerits −
Noam Chomsky invented a hierarchy that classifies the types of grammars that
exist.
A finite state automaton could be designed that defined the language that
consisted of a string of one or more occurrences of the letter a. Hence, the
following strings would be valid strings in this language:
aaa
aaaaaaaaaaaaaaaaa
Regular languages are of interest to computer scientists, but are not of great
interest to the field of natural language processing because they are not powerful
enough to represent even simple formal languages, let alone the more complex
natural languages.
It is context free because it defines the grammar simply in terms of which word
types can go together - it does not specify the way that words should agree with
each other. For example, it would accept:
Chickens eats.
A context-free grammar can have only one symbol (a single nonterminal) on the
left-hand side of its rewrite rules.
Rewrite rules for a context-sensitive grammar, in contrast, can have more than
one symbol on the left-hand side. This enables the grammar to specify
number, case, tense, and gender agreement.
Each context-sensitive rewrite rule must have at least as many symbols on the
right-hand side as it does on the left-hand side.
A X B→A Y B
This process, in which we convert a sentence into a tree that represents the
sentence‘s syntactic structure, is known as parsing.
This tree shows how the sentence is made up of a noun phrase and a verb phrase.
The noun phrase consists of an article, an adjective, and a noun. The verb phrase
consists of a verb and a further noun phrase, which in turn consists of an article
and a noun.
Building a parse tree from the top down involves starting from a sentence and
determining which of the possible rewrites for Sentence can be applied to the
sentence that is being parsed. Hence, in this case, Sentence would be rewritten
using the following rule:
Sentence→NounPhrase VerbPhrase
Then the verb phrase and noun phrase would be broken down recursively in the
same way, until only terminal symbols were left.
When a parse tree is built from the top down, it is known as a derivation tree.
To build a parse tree from the bottom up, the terminal symbols of the sentence
are first replaced by their corresponding nonterminals (e.g., cat is replaced by
noun), and then these nonterminals are combined to match the right-hand sides
of rewrite rules.
Parsing Techniques
Transition Networks
Each network represents one nonterminal symbol in the grammar. Hence, in the
grammar for the English language, we would have one transition network for
Sentence, one for Noun Phrase, one for Verb Phrase, one for Verb, and so on.
Fig shows the transition network equivalents for three production rules.
In each transition network, S1 is the start state, and the accepting state, or final
state, is denoted by a heavy border. When a phrase is applied to a transition
network, the first word is compared against one of the arcs leading from the first
state.
If this word matches one of those arcs, the network moves into the state to which
that arc points. Hence, the first network shown in Fig 10.2, when presented with a
Noun Phrase, will move from state S1 to state S2.
A cat sat.
The first arc of the NounPhrase network is labeled Noun. We thus move into the
Noun network. We now follow each of the arcs in the Noun network and discover
that our first word, A, does not match any of them. Hence, we backtrack to the
next arc in the NounPhrase network.
This arc is labeled Article, so we move on to the Article transition network. Here,
on examining the second label, we find that the first word is matched by the
terminal symbol on this arc.
As before, we move into the Noun network and find that our next word, cat,
matches. We thus move to state S4 in the NounPhrase network. This is a success
node, and so we move back to the Sentence network and repeat the process for
the VerbPhrase arc.
This can be done by simply having the system build up the tree by noting which
arcs it successfully followed. When, for example, it successfully follows the
NounPhrase arc in the Sentence network, the system generates a root node
labeled Sentence and an arc leading from that node to a new node labeled
NounPhrase. When the system follows the NounPhrase network and
identifies an article and a noun, these are similarly added to the tree.
In this way, the full parse tree for the sentence can be generated using transition
networks. Parsing using transition networks is simple to understand, but is not
necessarily as efficient or as effective as we might hope for. In particular, it does
not pay any attention to potential ambiguities or the need for words to agree
with each other in case, gender, or number.
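A rough Python sketch of parsing with transition networks, where each network is a set of arc sequences and an arc names either a word category or another network (the grammar and lexicon are small illustrative assumptions):

```python
# Recursive transition networks: each network is a list of alternative arc
# paths; an arc is either a word category (looked up in the lexicon) or the
# name of another network, in which case we recurse.
lexicon = {"Article": {"a", "the"}, "Noun": {"cat", "dog"}, "Verb": {"sat", "ran"}}

networks = {
    "Sentence":   [["NounPhrase", "VerbPhrase"]],
    "NounPhrase": [["Noun"], ["Article", "Noun"]],   # try the Noun arc first
    "VerbPhrase": [["Verb"]],
}

def match(network, words, pos):
    """Return all positions reachable after traversing `network` from `pos`."""
    results = []
    for path in networks[network]:
        positions = [pos]
        for arc in path:
            next_positions = []
            for p in positions:
                if arc in lexicon:               # terminal arc: match a word
                    if p < len(words) and words[p] in lexicon[arc]:
                        next_positions.append(p + 1)
                else:                            # sub-network arc: recurse
                    next_positions.extend(match(arc, words, p))
            positions = next_positions
        results.extend(positions)                # backtrack: try the next path
    return results

def accepts(sentence):
    words = sentence.lower().rstrip(".").split()
    return len(words) in match("Sentence", words, 0)

accepts("A cat sat.")   # True: Noun fails on "a", backtrack to Article-Noun
```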
Parsing
Top-Down Parsing
We have learnt in the last chapter that the top-down parsing technique parses
the input, and starts constructing a parse tree from the root node gradually
moving down to the leaf nodes. The types of top-down parsing are depicted
below:
Recursive descent is a top-down parsing technique that constructs the parse tree
from the top and the input is read from left to right. It uses procedures for every
terminal and non-terminal entity. This parsing technique recursively parses the
input to make a parse tree, which may or may not require back-tracking. But the
grammar associated with it (if not left factored) cannot avoid back-tracking. A
form of recursive-descent parsing that does not require any back-tracking is
known as predictive parsing.
Back-tracking
Top- down parsers start from the root node (start symbol) and match the input
string against the production rules to replace them (if matched). To understand
this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-
most letter of the input, i.e. 'r'. The first production of S (S → rXd) matches with
it. So the top-down parser advances to the next input letter (i.e. 'e'). The parser
tries to expand non-terminal 'X' and checks its production from the left (X → oa).
It does not match with the next input symbol. So the top-down parser backtracks
to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is
accepted.
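The backtracking behaviour just described can be reproduced with a tiny recursive-descent recognizer for the same grammar:

```python
# Recursive-descent with backtracking for S -> rXd | rZd, X -> oa | ea, Z -> ai.
grammar = {"S": ["rXd", "rZd"], "X": ["oa", "ea"], "Z": ["ai"]}

def derive(symbols, text):
    """Can the symbol string `symbols` derive exactly `text`?"""
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in grammar:
        # Nonterminal: try each production in turn (backtracking on failure).
        return any(derive(prod + rest, text) for prod in grammar[head])
    # Terminal: must match the next input letter, else this branch fails.
    return text.startswith(head) and derive(rest, text[1:])

derive("S", "read")   # True: S -> rXd, then X -> oa fails, backtrack to X -> ea
derive("S", "red")    # False: no production sequence yields "red"
```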
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict
which production is to be used to replace the input string. The predictive parser
does not suffer from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which
points to the next input symbols. To make the parser back-tracking free, the
predictive parser puts some constraints on the grammar and accepts only a class
of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate
a parse tree. Both the stack and the input contain an end symbol $ to denote
that the stack is empty and the input is consumed. The parser refers to the
parsing table to take any decision on the input and stack element combination.
In recursive descent parsing, the parser may have more than one production to
choose from for a single instance of input, whereas in predictive parser, each step
has at most one production to choose. There might be instances where there is
no production matching the input string, causing the parsing procedure to fail.
LL Parser
LL parser is denoted as LL(k). The first L in LL(k) stands for parsing the input from
left to right, the second L stands for left-most derivation, and k represents the
number of lookaheads. Generally k = 1, so LL(k) may also be written as LL(1).
LL Parsing Algorithm
We may stick to deterministic LL(1) for parser explanation, as the size of table
grows exponentially with the value of k. Secondly, if a given grammar is not LL(1),
then usually, it is not LL(k), for any given k.
Input:
   string ω
   parsing table M for grammar G
Output:
   If ω is in L(G), then a left-most derivation of ω;
   error otherwise.
Initial State: $S on stack (with S being the start symbol),
   ω$ in the input buffer.
SET ip to point to the first symbol of ω$.
repeat
   let X be the top stack symbol and a the symbol pointed by ip.
   if X ∈ Vt or $
      if X = a
         POP X and advance ip.
      else
         error()
      endif
   else /* X is non-terminal */
      if M[X,a] = X → Y1, Y2, ... Yk
         POP X
         PUSH Yk, Yk-1, ... Y1
      else
         error()
      endif
   endif
until X = $
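The same loop can be written in Python for a toy LL(1) grammar (S → aSb | ε; the grammar and its parsing table are assumptions chosen for illustration, not taken from the text):

```python
# Table-driven LL(1) parsing for the toy grammar S -> a S b | ε.
table = {
    ("S", "a"): ["a", "S", "b"],   # M[S, a] = S -> a S b
    ("S", "b"): [],                # M[S, b] = S -> ε
    ("S", "$"): [],                # M[S, $] = S -> ε
}
nonterminals = {"S"}

def ll1_parse(word):
    stack = ["$", "S"]              # $S on the stack, S the start symbol
    tokens = list(word) + ["$"]     # ω$ in the input buffer
    ip = 0
    while True:
        x, a = stack[-1], tokens[ip]
        if x == "$" and a == "$":
            return True             # stack empty and input consumed
        if x in nonterminals:
            if (x, a) not in table:
                return False        # error(): no table entry
            stack.pop()             # POP X
            stack.extend(reversed(table[(x, a)]))   # PUSH Yk ... Y1
        else:                       # X is a terminal (or $)
            if x != a:
                return False        # error(): terminal mismatch
            stack.pop()             # POP X and advance ip
            ip += 1

ll1_parse("aabb")   # True
ll1_parse("abb")    # False
```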
Bottom-up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and works in upward
direction till it reaches the root node. Here, we start from a sentence and then
apply production rules in reverse manner in order to reach the start symbol. The
image given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps
are known as shift-step and reduce-step.
Shift step: The shift step refers to the advancement of the input pointer
to the next input symbol, which is called the shifted symbol. This symbol
is pushed onto the stack. The shifted symbol is treated as a single node of
the parse tree.
Reduce step: When the parser finds a complete grammar rule (RHS)
and replaces it with (LHS), it is known as a reduce-step. This occurs when the
top of the stack contains a handle. To reduce, a POP function is performed
on the stack, which pops off the handle and replaces it with the LHS non-
terminal symbol.
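A minimal shift-reduce sketch for the toy grammar E → E + n | n, reducing greedily whenever the top of the stack matches a rule's right-hand side (a real LR parser would consult a parsing table instead of this greedy check):

```python
# Shift-reduce parsing for the toy grammar E -> E + n | n.
rules = [("E", ["E", "+", "n"]), ("E", ["n"])]

def shift_reduce(tokens):
    stack, trace = [], []
    tokens = list(tokens)
    while tokens or stack != ["E"]:
        reduced = False
        for lhs, rhs in rules:               # reduce-step: handle on top?
            if stack[-len(rhs):] == rhs:
                del stack[-len(rhs):]        # pop off the handle ...
                stack.append(lhs)            # ... and replace it with the LHS
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                reduced = True
                break
        if not reduced:
            if not tokens:
                raise ValueError("parse error")
            stack.append(tokens.pop(0))      # shift-step: push the next symbol
            trace.append(f"shift {stack[-1]}")
    return trace

shift_reduce(["n", "+", "n"])
# ["shift n", "reduce E -> n", "shift +", "shift n", "reduce E -> E + n"]
```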
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide
class of context-free grammars, which makes it a very efficient syntax analysis
technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-
right scanning of the input stream; R stands for the construction of right-most
derivation in reverse, and k denotes the number of lookahead symbols used to
make decisions.
There are three widely used algorithms available for constructing an LR parser:
LR(1) – LR Parser:
o Works on the complete set of LR(1) grammars
o Generates a large table and a large number of states
o Slow construction
LR Parsing Algorithm
token = next_token()
repeat forever
   s = top of stack
   if action[s, token] = "shift si" then
      PUSH token
      PUSH si
      token = next_token()
   else if action[s, token] = "reduce A ::= β" then
      POP 2 * |β| symbols
      s = top of stack
      PUSH A
      PUSH goto[s, A]
   else if action[s, token] = "accept" then
      return
   else
      error()
LL vs. LR
o LL starts with the root nonterminal on the stack; LR ends with the root
nonterminal on the stack.
o LL uses the stack for designating what is still to be expected; LR uses the
stack for designating what is already seen.
o LL builds the parse tree top-down; LR builds the parse tree bottom-up.
o LL reads the terminals when it pops one off the stack; LR reads the
terminals while it pushes them on the stack.
o LL performs a pre-order traversal of the parse tree; LR performs a
post-order traversal.
Semantic Analysis
The purpose of semantic analysis is to draw exact meaning, or you can say
dictionary meaning from the text. The work of semantic analyzer is to check the
text for meaningfulness.
We already know that lexical analysis also deals with the meaning of the words;
then how is semantic analysis different from lexical analysis? Lexical analysis is
based on smaller tokens, while semantic analysis focuses on larger chunks. That
is why semantic analysis can be divided into the following two parts
−
It is the first part of the semantic analysis in which the study of the meaning of
individual words is performed. This part is called lexical semantics.
In the second part, the individual words will be combined to provide meaning in
sentences.
The most important task of semantic analysis is to get the proper meaning of the
sentence. For example, analyze the sentence “Ram is great.” In this sentence, the
speaker is talking either about Lord Ram or about a person whose name is Ram.
That is why the job, to get the proper meaning of the sentence, of semantic
analyzer is important.
Hyponymy
It may be defined as the relationship between a generic term and instances of
that generic term. Here the generic term is called hypernym and its instances are
called hyponyms. For example, the word color is hypernym and the color blue,
yellow etc. are hyponyms.
Homonymy
It may be defined as words having the same spelling or same form but having
different and unrelated meanings. For example, the word "Bat" is a homonymy
word because bat can be an implement to hit a ball, or bat can be a nocturnal
flying mammal.
Polysemy
Polysemy is a Greek word, which means "many signs". It is a word or phrase with
different but related senses. In other words, we can say that polysemy has the
same spelling but different and related meanings. For example, the word "bank" is
a polysemy word having the following meanings −
A financial institution.
The building in which such an institution is located.
A synonym for "to rely on".
Both polysemous and homonymous words have the same syntax or spelling. The main difference between them is that in polysemy the meanings of the words are related, but in homonymy the meanings of the words are not related. For example, for the same word "bank", we can take the meanings 'a financial institution' and 'a river bank'. In that case it would be an example of homonymy, because the meanings are unrelated to each other.
Synonymy
It is the relation between two lexical items having different forms but expressing the same or a close meaning. Examples are 'author/writer' and 'fate/destiny'.
Antonymy
It is the relation between two lexical items having symmetry between their semantic components relative to an axis; for example, 'long/short' relative to the axis of length.
Meaning Representation
Now, we can understand that meaning representation shows how to put together
the building blocks of semantic systems. In other words, it shows how to put
together entities, concepts, relation and predicates to describe a situation. It also
enables the reasoning about the semantic world.
Semantic Nets
Frames
Rule-based architecture
Case Grammar
Conceptual Graphs
The very first reason is that with the help of meaning representation the linking of
linguistic elements to the non-linguistic elements can be done.
Meaning representation can be used to reason for verifying what is true in the
world as well as to infer the knowledge from the semantic representation.
Lexical Semantics
The first part of semantic analysis, studying the meaning of individual words is
called lexical semantics. It includes words, sub-words, affixes (sub-units),
compound words and phrases also. All the words, sub-words, etc. are collectively
called lexical items. In other words, we can say that lexical semantics is the
relationship between lexical items, meaning of sentences and syntax of sentence.
Pragmatic Analysis
It is the fourth phase of NLP. Pragmatic analysis simply fits the actual
objects/events, which exist in a given context with object references obtained
during the last phase (semantic analysis). For example, the sentence "Put the banana in the basket on the shelf" can have two semantic interpretations, and the pragmatic analyzer will choose between these two possibilities.
The pragmatic analysis means handling the situation in a much more practical or
realistic manner than using a theoretical approach. As we know that a sentence
can have different meanings in various situations. For example, The average is 18.
We can see that for the same input there can be different perceptions. To
interpret the meaning of the sentence we need to understand the situation. To
tackle such problems we use pragmatic analysis. The pragmatic analysis tends to
make the understanding of the language much more clear and easy to interpret.
Implementation:
The phases of language processing are required to follow an order. Each phase takes its input from the previous phase's output and sends it along to the next phase for processing. During this process, the input can get rejected halfway if it does not follow the rules defined for the next phase.
An AI system can be defined as the study of the rational agent and its
environment. The agents sense the environment through sensors and act on their
environment through actuators. An AI agent can have mental properties such as
knowledge, belief, intention, etc.
Agent
An agent can be anything that perceives its environment through sensors and acts upon that environment through actuators. An Agent runs in the cycle of perceiving, thinking, and acting. An agent can be:
o Human-Agent: A human agent has eyes, ears, and other organs which
work for sensors and hand, legs, vocal tract work for actuators.
Before moving forward, we should first know about sensors, effectors, and
actuators.
Sensor: Sensor is a device which detects the change in the environment and
sends the information to other electronic devices. An agent observes its
environment through sensors.
Effectors: Effectors are the devices which affect the environment. Effectors
can be legs, wheels, arms, fingers, wings, fins, and display screen.
Intelligent Agents:
Rational Agent:
A rational agent is an agent which has clear preference, models uncertainty, and
acts in a way to maximize its performance measure with all possible actions.
A rational agent is said to perform the right things. AI is about creating rational
agents to use for game theory and decision theory for various real-world
scenarios.
Rationality:
Structure of an AI Agent
Following are the main three terms involved in the structure of an AI agent:
1. Architecture: the machinery that the agent executes on.
2. Agent function: a map from the percept sequence to an action, f: P* → A.
3. Agent program: an implementation of the agent function, which executes on the architecture.
PEAS Representation
o P: Performance measure
o E: Environment
o A: Actuators
o S: Sensors
Here performance measure is the objective for the success of an agent's behavior.
Types of AI Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability. All these agents can improve their performance and generate better actions over time. These are given below:
o The Simple reflex agents are the simplest agents. These agents take
decisions on the basis of the current percepts and ignore the rest of the
percept history.
o The Simple reflex agent does not consider any part of percepts history
during their decision and action process.
o These agents have the model, "which is knowledge of the world" and based
on the model they perform actions.
3. Goal-based agents
o The agent needs to know its goal which describes desirable situations.
4. Utility-based agents
o These agents are similar to the goal-based agent but provide an extra
component of utility measurement which makes them different by
providing a measure of success at a given state.
o Utility-based agents act based not only on goals but also on the best way to achieve the goal.
o The utility function maps each state to a real number to check how
efficiently each action achieves the goals.
5. Learning Agents
o A learning agent in AI is the type of agent which can learn from its past
experiences, or it has learning capabilities.
o It starts to act with basic knowledge and is then able to act and adapt automatically through learning.
Hence, learning agents are able to learn, analyze performance, and look for new
ways to improve the performance.
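As a minimal sketch of the simplest class above, a simple reflex agent can be written as a direct mapping from the current percept to an action via condition-action rules. The two-square vacuum world, its percepts and its actions below are illustrative assumptions, not taken from the text:

```python
# Simple reflex agent: acts only on the current percept, ignoring history.
# The (location, status) percepts and the rules are an assumed vacuum world.

def simple_reflex_vacuum_agent(percept):
    """percept is a (location, status) pair, e.g. ('A', 'Dirty')."""
    location, status = percept
    if status == 'Dirty':      # condition-action rule 1: clean a dirty square
        return 'Suck'
    elif location == 'A':      # rule 2: otherwise move to the other square
        return 'Right'
    else:
        return 'Left'

print(simple_reflex_vacuum_agent(('A', 'Dirty')))   # Suck
print(simple_reflex_vacuum_agent(('B', 'Clean')))   # Left
```

Note that the agent has no memory at all: the same percept always produces the same action, which is exactly the limitation the model-based and goal-based classes address.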
Semantic Web
Current World Wide Web (WWW) is a huge library of interlinked documents that
are transferred by computers and presented to people. It has grown from
hypertext systems, but the difference is that anyone can contribute to it. This also
means that the quality of information or even the persistence of documents
cannot be generally guaranteed. Current WWW contains a lot of information and
knowledge, but machines usually serve only to deliver and present the content of
documents describing the knowledge. People have to connect all the sources of
relevant information and interpret them themselves.
Semantic web is an effort to enhance the current web so that computers can process the information presented on WWW, and interpret and connect it, to help humans find the required knowledge.
The architecture of semantic web is illustrated in the figure below. The first layer,
URI and Unicode, follows the important features of the existing WWW. Unicode is
a standard of encoding international character sets and it allows that all human
languages can be used (written and read) on the web using one standardized
form. Uniform Resource Identifier (URI) is a string of a standardized form that
allows to uniquely identify resources (e.g., documents). A subset of URI is Uniform
Resource Locator (URL), which contains access mechanism and a (network)
location of a document - such as http://www.example.org/. Another subset of URI
is URN that allows to identify a resource without implying its location and means
of dereferencing it - an example is urn:isbn:0-123-45678-9. The usage of URI is
important for a distributed internet system as it provides understandable
identification of all resources. An international variant to URI is Internationalized
Resource Identifier (IRI) that allows usage of Unicode characters in identifier and
for which a mapping to URI is defined. In the rest of this text, whenever URI is
used, IRI can be used as well as a more general concept.
Extensible Markup Language (XML) layer with XML namespace and XML
schema definitions makes sure that there is a common syntax used in the
semantic web. XML is a general purpose markup language for documents containing structured information. An XML document contains elements that can be nested and that may have attributes and content. XML namespaces allow specifying different markup vocabularies in one XML document. An XML schema serves for expressing the schema of a particular set of XML documents.
Resource Description Framework (RDF) is the primary representation language of the semantic web. The normative syntax for serializing RDF is XML in the RDF/XML form. Formal semantics of RDF is defined as well.
RDF itself serves as a description of a graph formed by triples. Anyone can define
vocabulary of terms used for more detailed description. To allow standardized
description of taxonomies and other ontological constructs, a RDF Schema
(RDFS) was created together with its formal semantics within RDF. RDFS can be
used to describe taxonomies of classes and properties and use them to create
lightweight ontologies.
More detailed ontologies can be created with Web Ontology Language OWL. The
OWL is a language derived from description logics, and offers more constructs
over RDFS. It is syntactically embedded into RDF, so like RDFS, it provides
additional standardized vocabulary. OWL comes in three species - OWL Lite for
taxonomies and simple constrains, OWL DL for full description logic support, and
OWL Full for maximum expressiveness and syntactic freedom of RDF. Since OWL
is based on description logic, it is not surprising that a formal semantics is defined
for this language.
RDFS and OWL have semantics defined and this semantics can be used for
reasoning within ontologies and knowledge bases described using these
languages. To provide rules beyond the constructs available from these
languages, rule languages are being standardized for the semantic web as well.
Two standards are emerging - RIF and SWRL.
For querying RDF data as well as RDFS and OWL ontologies with knowledge bases,
a Simple Protocol and RDF Query Language (SPARQL) is available. SPARQL is an SQL-like language, but it uses RDF triples and resources both for matching part of the query and for returning results of the query. Since both RDFS and OWL are built on RDF, SPARQL can be used for querying ontologies and knowledge bases directly as well. Note that SPARQL is not only a query language; it is also a protocol for accessing RDF data.
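The triple model and SPARQL-style pattern matching can be sketched without any RDF library, using plain Python tuples. The `ex:` URIs and the tiny store below are invented for illustration; a real system would use a proper RDF toolkit and full SPARQL:

```python
# Toy RDF-style triple store: each fact is a (subject, predicate, object) tuple.
triples = {
    ("ex:Delhi", "ex:capitalOf", "ex:India"),
    ("ex:Delhi", "rdf:type", "ex:City"),
    ("ex:India", "rdf:type", "ex:Country"),
}

def match(pattern, store):
    """Return every triple matching the pattern.
    None in the pattern plays the role of a SPARQL variable."""
    results = []
    for s, p, o in store:
        if all(q is None or q == v for q, v in zip(pattern, (s, p, o))):
            results.append((s, p, o))
    return results

# In the spirit of: SELECT ?s WHERE { ?s rdf:type ex:City }
print(match((None, "rdf:type", "ex:City"), triples))
# [('ex:Delhi', 'rdf:type', 'ex:City')]
```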
It is expected that all the semantics and rules will be executed at the layers below
Proof and the result will be used to prove deductions. Formal proof together with
trusted inputs for the proof will mean that the results can be trusted, which is
shown in the top layer of the figure. For reliable inputs, cryptographic means are to be used, such as digital signatures for verification of the origin of the sources.
On top of these layers, application with user interface can be built.
Agent communication
Components of communicating
agents Speaker
1. Intention:
Before speaking anything, we know the intention of what we want to
convey to the other person. The same thing is implemented in the
communicating systems. This makes communication valid and relevant
from the side of the communicating system.
2. Generation:
After knowing the intention of what is to be conveyed, the system must
gather words so that the information can be reached to the user in his very
own communicating language. So, the generation of relevant words is done
by the system after the intention process.
3. Synthesis:
Once the agent has all the relevant words, they still have to be uttered in a way that they have some meaning. So, after the generation of words, the formation of meaningful sentences takes place and finally, the agent speaks them out to the user.
Hearer
1. Perception:
In the perception phase, the communicating system perceives what the
user has spoken to it. This is a sort of an audio input signal which the agent
receives from the user and then this signal is sent for the further processing
by the system.
2. Analysis:
After getting the audio input from the user which is a sequence of
sentences and phrases, the system tries to analyze them by extracting the
meaningful terms out of the sentences by removing the articles, connectors
and other words which are there only for the sake of sentence formation.
3. Disambiguation:
This is the most important thing that a communicating system carries out.
After the analyzing process, the agent must understand the meaning of the sentences that the user has spoken. This understanding phase, in which the system tries to derive the meaning of the sentences by removing various ambiguities and errors, is known as disambiguation. This is done by
understanding the Syntax, Semantics, and Pragmatics of the sentences.
4. Incorporation:
In incorporation, the system figures out whether the understanding that it
has derived out of the audio signal is correct or not. Whether it is
meaningful, whether the system should consider it or ask the user for
further input for resolving any sort of ambiguity.
Fuzzy sets
Mathematical Concept
Ã = {(y, μÃ(y)) | y ∈ U}
Let us now consider two cases of universe of information and understand how a
fuzzy set can be represented.
Case 1
Case 2
Union/Fuzzy 'OR'
μÃ∪B̃(y) = μÃ(y) ∨ μB̃(y), ∀ y ∈ U
Intersection/Fuzzy 'AND'
μÃ∩B̃(y) = μÃ(y) ∧ μB̃(y), ∀ y ∈ U
Complement/Fuzzy 'NOT'
μÃ′(y) = 1 − μÃ(y), ∀ y ∈ U
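The three fuzzy operations can be sketched directly, representing a fuzzy set as a dict mapping each element of U to its membership grade (the sets A and B below are illustrative):

```python
# Two fuzzy sets over the same universe {y1, y2, y3}; grades are invented.
A = {'y1': 0.6, 'y2': 0.2, 'y3': 1.0}
B = {'y1': 0.4, 'y2': 0.9, 'y3': 0.0}

def fuzzy_union(A, B):          # fuzzy OR: max of the two grades
    return {y: max(A[y], B[y]) for y in A}

def fuzzy_intersection(A, B):   # fuzzy AND: min of the two grades
    return {y: min(A[y], B[y]) for y in A}

def fuzzy_complement(A):        # fuzzy NOT: 1 minus the grade
    return {y: 1 - A[y] for y in A}

print(fuzzy_union(A, B))         # {'y1': 0.6, 'y2': 0.9, 'y3': 1.0}
print(fuzzy_intersection(A, B))  # {'y1': 0.4, 'y2': 0.2, 'y3': 0.0}
print(fuzzy_complement(A))       # {'y1': 0.4, 'y2': 0.8, 'y3': 0.0}
```

With max/min as the OR/AND operators, the commutative, associative, distributive and De Morgan properties listed below all hold element-wise.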
Commutative Property
Having two fuzzy sets Ã and B̃, this property states −
Ã ∪ B̃ = B̃ ∪ Ã
Ã ∩ B̃ = B̃ ∩ Ã
Associative Property
Having three fuzzy sets Ã, B̃ and C̃, this property states −
Ã ∪ (B̃ ∪ C̃) = (Ã ∪ B̃) ∪ C̃
Ã ∩ (B̃ ∩ C̃) = (Ã ∩ B̃) ∩ C̃
Distributive Property
Having three fuzzy sets Ã, B̃ and C̃, this property states −
Ã ∪ (B̃ ∩ C̃) = (Ã ∪ B̃) ∩ (Ã ∪ C̃)
Ã ∩ (B̃ ∪ C̃) = (Ã ∩ B̃) ∪ (Ã ∩ C̃)
Idempotency Property
Ã ∪ Ã = Ã
Ã ∩ Ã = Ã
Identity Property
For fuzzy set Ã and universal set U, this property states −
Ã ∪ φ = Ã
Ã ∩ U = Ã
Ã ∩ φ = φ
Ã ∪ U = U
Transitive Property
Having three fuzzy sets Ã, B̃ and C̃, this property states −
If Ã ⊆ B̃ and B̃ ⊆ C̃, then Ã ⊆ C̃
Involution Property
For any fuzzy set Ã, this property states −
(Ã′)′ = Ã
De Morgan's Law
This law plays a crucial role in proving tautologies and contradictions. This law states −
(Ã ∪ B̃)′ = Ã′ ∩ B̃′
(Ã ∩ B̃)′ = Ã′ ∪ B̃′
Membership Function
We already know that fuzzy logic is not logic that is fuzzy but logic that is used to
describe fuzziness. This fuzziness is best characterized by its membership
function. In other words, we can say that membership function represents the
degree of truth in fuzzy logic.
Mathematical Notation
We have already studied that a fuzzy set à in the universe of information U can be
defined as a set of ordered pairs and it can be represented mathematically as −
Ã = {(y, μÃ(y)) | y ∈ U}
Here μÃ(∙) is the membership function of Ã; it assumes values in the range from 0 to 1, i.e., μÃ(∙) ∈ [0, 1]. The membership function μÃ(∙) maps U to the membership space M.
The dot (∙) in the membership function described above, represents the element
in a fuzzy set; whether it is discrete or continuous.
Core
For any fuzzy set Ã, the core of a membership function is that region of the universe that is characterized by full membership in the set. Hence, the core consists of all those elements y of the universe of information such that,
μÃ(y) = 1
Support
For any fuzzy set Ã, the support of a membership function is the region of the universe that is characterized by nonzero membership in the set. Hence, the support consists of all those elements y of the universe of information such that,
μÃ(y) > 0
Boundary
For any fuzzy set Ã, the boundary of a membership function is the region of the universe that is characterized by nonzero but incomplete membership in the set. Hence, the boundary consists of all those elements y of the universe of information such that,
1 > μÃ(y) > 0
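For a discrete fuzzy set, the three regions follow directly from the definitions above. The membership grades in this sketch are invented:

```python
# A discrete fuzzy set: element -> membership grade (illustrative values).
A = {'y1': 0.0, 'y2': 0.3, 'y3': 1.0, 'y4': 0.7}

core     = {y for y, mu in A.items() if mu == 1}      # full membership
support  = {y for y, mu in A.items() if mu > 0}       # nonzero membership
boundary = {y for y, mu in A.items() if 0 < mu < 1}   # nonzero but incomplete

print(sorted(core))      # ['y3']
print(sorted(support))   # ['y2', 'y3', 'y4']
print(sorted(boundary))  # ['y2', 'y4']
```

Note that the boundary is exactly the support minus the core, which matches the three inequalities above.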
Fuzzification
In this method, the fuzzified set can be expressed with the help of the following
relation −
Ã = μ1·Q(x1) + μ2·Q(x2) + ... + μn·Q(xn)
Here the fuzzy set Q(xi) is called the kernel of fuzzification. This method is implemented by keeping μi constant and xi being transformed to a fuzzy set Q(xi).
It is quite similar to the above method, but the main difference is that it keeps xi constant and μi is expressed as a fuzzy set.
Defuzzification
It may be defined as the process of reducing a fuzzy set into a crisp set or to
convert a fuzzy member into a crisp member.
We have already studied that the fuzzification process involves conversion from crisp quantities to fuzzy quantities. In a number of engineering applications, it is necessary to defuzzify the result, or rather the "fuzzy result", so that it is converted to a crisp result. Mathematically, the process of defuzzification is also called "rounding it off".
Max-Membership Method
This method is limited to peak output functions and also known as height
method. Mathematically it can be represented as follows −
Centroid Method
This method is also known as the center of area or the center of gravity method. Mathematically, the defuzzified output x∗ will be represented as −
Mean-Max Membership
This method is also known as the middle of the maxima. Mathematically, the defuzzified output x∗ will be represented as −
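Two of the methods above can be sketched on a sampled membership function: the centroid x∗ = Σ μ(x)·x / Σ μ(x), and the max-membership (height) method, which simply returns the x at the peak. The sample points below are invented:

```python
def centroid(xs, mus):
    # Discrete center-of-gravity: weighted average of x by membership grade.
    return sum(m * x for x, m in zip(xs, mus)) / sum(mus)

def max_membership(xs, mus):
    # Height method: the x value where the membership function peaks.
    return xs[mus.index(max(mus))]

xs  = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]     # triangular output set peaking at x = 2

print(centroid(xs, mus))        # 2.0
print(max_membership(xs, mus))  # 2
```

For this symmetric triangle both methods agree; for a skewed membership function the centroid shifts toward the heavier side while the height method stays at the peak.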
Logic, which was originally just the study of what distinguishes sound argument
from unsound argument, has now developed into a powerful and rigorous system
whereby true statements can be discovered, given other statements that are
already known to be true.
Predicate Logic
This logic deals with predicates, which are propositions containing variables.
Propositional Logic
Connectives
OR (∨)
The OR of two propositions A and B (written as A ∨ B) is true if at least one of A and B is true.
A B A∨B
True True True
True False True
False True True
False False False
AND (∧)
The AND of two propositions A and B (written as A ∧ B) is true only when both A and B are true.
A B A∧B
True True True
True False False
False True False
False False False
Negation (¬)
The negation of a proposition A (written as ¬A¬A) is false when A is true and is
true when A is false.
A ¬A
True False
False True
Implication (→)
An implication A→B is the proposition "if A, then B". It is false if A is true and B is false; all other cases are true.
A B A→B
True True True
True False False
False True True
False False True
If and only if (⇔)
A⇔B is a bi-conditional connective, which is true when A and B have the same truth value.
A B A⇔B
True True True
True False False
False True False
False False True
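The connectives can be expressed as ordinary Boolean functions; printing all four input combinations reproduces the corresponding truth-table rows:

```python
def implies(a, b):   # A → B is equivalent to (not A) or B
    return (not a) or b

def iff(a, b):       # A ⇔ B: true exactly when A and B agree
    return a == b

# One line per truth-table row: A, B, A∨B, A∧B, A→B, A⇔B
for A in (True, False):
    for B in (True, False):
        print(A, B, A or B, A and B, implies(A, B), iff(A, B))
```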
Quantifiers
Universal Quantifier
Existential Quantifier
Universal Quantifier
Universal quantifier states that the statements within its scope are true for every
value of the specific variable. It is denoted by the symbol ∀ .
Existential Quantifier
Existential quantifier states that the statements within its scope are true for some
values of the specific variable. It is denoted by the symbol ∃ .
Nested Quantifiers
Example
Approximate Reasoning
Following are the different modes of approximate reasoning −
Categorical Reasoning
In this mode of approximate reasoning, the antecedents, containing no fuzzy
quantifiers and fuzzy probabilities, are assumed to be in canonical form.
Qualitative Reasoning
In this mode of approximate reasoning, the antecedents and consequents have
fuzzy linguistic variables; the input-output relationship of a system is expressed as
a collection of fuzzy IF-THEN rules. This reasoning is mainly used in control system
analysis.
Syllogistic Reasoning
Dispositional Reasoning
The expression as stated above is referred to as the Fuzzy IF-THEN rule base.
Canonical Form
Rules
Assignment Statements
These kinds of statements use "=" (the equal-to sign) for the purpose of assignment. They are of the following form −
a = hello
climate = summer
Conditional Statements
These kinds of statements use the "IF-THEN" rule base form for the purpose of condition. They are of the following form −
Unconditional Statements
GOTO 10
Linguistic Variable
We have studied that fuzzy logic uses linguistic variables which are the words or
sentences in a natural language. For example, if we say temperature, it is a
linguistic variable; the values of which are very hot or cold, slightly hot or cold,
very warm, slightly warm, etc. The words very, slightly are the linguistic hedges.
A proposition has the canonical form "s is P", where s is the subject and P is a predicate showing a property of the subject. For example, "Delhi is the capital of India" is a proposition where "Delhi" is the subject and "is the capital of India" is the predicate.
We know that logic is the basis of reasoning and fuzzy logic extends the capability
of reasoning by using fuzzy predicates, fuzzy-predicate modifiers, fuzzy quantifiers
and fuzzy qualifiers in fuzzy propositions which creates the difference from
classical logic.
Fuzzy Predicate
Almost every predicate in natural language is fuzzy in nature hence, fuzzy logic
has the predicates like tall, short, warm, hot, fast, etc.
Fuzzy-predicate Modifiers
Fuzzy Quantifiers
It can be defined as a fuzzy number which gives a vague classification of the
cardinality of one or more fuzzy or non-fuzzy sets. It can be used to influence
probability within fuzzy logic. For example, the words many, most and frequently are used as fuzzy quantifiers, and propositions can be like "most people are allergic to it."
Fuzzy Qualifiers
Inference System
Fuzzy Inference System is the key unit of a fuzzy logic system having decision
making as its primary work. It uses the "IF…THEN" rules along with the connectors "OR" and "AND" for drawing essential decision rules.
The output from FIS is always a fuzzy set irrespective of its input which can
be fuzzy or crisp.
A defuzzification unit would be there with FIS to convert fuzzy variables into
crisp variables.
The following five functional blocks will help you understand the construction of
FIS −
Working of FIS
A knowledge base − a collection of the rule base and the database − is formed upon the conversion of crisp input into fuzzy input.
In the defuzzification unit, the fuzzy output is finally converted into crisp output.
Methods of FIS
Let us now discuss the different methods of FIS. Following are the two important
methods of FIS, having different consequent of fuzzy rules −
Following steps need to be followed to compute the output from this FIS −
This model was proposed by Takagi, Sugeno and Kang in 1985. The format of this rule is given as −
IF x is A and y is B THEN z = f(x, y)
Here, A and B are fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent.
The fuzzy inference process under Takagi-Sugeno Fuzzy Model (TS Method) works
in the following way −
Step 1: Fuzzifying the inputs − Here, the inputs of the system are
made fuzzy.
Step 2: Applying the fuzzy operator − In this step, the fuzzy operators
must be applied to get the output.
Let us now understand the comparison between the Mamdani System and the
Sugeno Model.
Control System
Fuzzy logic is applied with great success in various control applications. Almost all consumer products have fuzzy control. Some of the examples include controlling room temperature with the help of an air-conditioner, anti-lock braking systems used in vehicles, control of traffic lights, washing machines, large economic systems, etc.
While applying traditional control, one needs to know about the model and
the objective function formulated in precise terms. This makes it very
difficult to apply in many cases.
By applying fuzzy logic for control we can utilize the human expertise and
experience for designing a controller.
The fuzzy control rules, basically the IF-THEN rules, can be best utilized in
designing a controller.
While designing fuzzy control system, the following six basic assumptions should
be made −
The following diagram shows the architecture of Fuzzy Logic Control (FLC).
Following are the major components of the FLC as shown in the above figure −
Fuzzifier − The role of fuzzifier is to convert the crisp input values into
fuzzy values.
Fuzzy Knowledge Base − It stores the knowledge about all the input-
output fuzzy relationships. It also has the membership function which
defines the input variables to the fuzzy rule base and the output variables
to the plant under control.
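A toy fuzzify → rule base → defuzzify loop in the spirit of this architecture can be sketched as follows. The membership functions, the two rules, and all numbers are invented for illustration, and the defuzzifier here is a Sugeno-style weighted average rather than a full centroid:

```python
def mu_cold(t):
    # Assumed membership: full at 0 deg C, falling to zero at 25 deg C.
    return max(0.0, min(1.0, (25 - t) / 25))

def mu_hot(t):
    # Assumed membership: zero up to 15 deg C, full at 40 deg C.
    return max(0.0, min(1.0, (t - 15) / 25))

def fan_speed(t):
    w_hot, w_cold = mu_hot(t), mu_cold(t)   # fuzzifier: crisp t -> grades
    # Rule base: IF hot THEN fan = 100; IF cold THEN fan = 0.
    # Defuzzifier: weighted average of the rule outputs.
    # (The two memberships overlap, so the denominator is never zero.)
    return (w_hot * 100 + w_cold * 0) / (w_hot + w_cold)

print(fan_speed(20))   # 50.0  (equally "hot" and "cold")
print(fan_speed(35))   # 100.0 (fully "hot")
```

The controller output varies smoothly with temperature, which is precisely the behavior that makes fuzzy control attractive over crisp threshold rules.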
We will now discuss what are the disadvantages of Fuzzy Logic Control.
Needs regular updating of rules − The rules must be updated with time.
Every neuron is connected with other neuron through a connection link. Each
connection link is associated with a weight having the information about the input
signal. This is the most useful information for neurons to solve a particular
problem because the weight usually inhibits the signal that is being
communicated. Each neuron is having its internal state which is called the
activation signal. Output signals, which are produced after combining the input
signals and the activation rule, may be sent to other units. It also consists of a bias
‗b‘ whose weight is always 1.
As we have discussed above that every neuron in ANN is connected with other
neuron through a connection link and that link is associated with a weight having
the information about the input signal. Hence we can say that weights have the
useful information about input to solve the problems.
Fuzzy logic is largely used to define the weights, from fuzzy sets, in neural
networks.
When crisp values are not possible to apply, then fuzzy values are used.
We have already studied that training and learning help neural networks
perform better in unexpected situations. At that time fuzzy values would be
more applicable than crisp values.
When we use fuzzy logic in neural networks then the values must not be
crisp and the processing can be done in parallel.
Despite having numerous advantages, there is also some difficulty while using fuzzy logic in neural networks. The difficulty is related to the membership rules and the need to build the fuzzy system, because it is sometimes complicated to deduce them from the given set of complex data.
The reverse relationship between neural networks and fuzzy logic, i.e., a neural network used to train fuzzy logic, is also a good area of study. Following are two major reasons to build neural-trained fuzzy logic −
New patterns of data can be learned easily with the help of neural
networks hence, it can be used to preprocess data in fuzzy systems.
Ford Motor Company has developed trainable fuzzy systems for automobile
idle-speed control.
Genetic Algorithm
Genetic algorithms (GAs) are a class of search algorithms designed on the natural evolution process. Genetic algorithms are based on the principle of survival of the fittest.
The advancement of ANNs is a subject that has been broadly dealt with using extremely different techniques. The world of evolutionary algorithms is no exception, and evidence of that is the incredible number of works that have been published about the various techniques in this area, even with genetic algorithms or GP. As a general rule, the field of ANN generation using evolutionary algorithms is separated into three principal fields: evolution of weights, architectures, and learning rules.
Initially, the weight evolution begins from an ANN with a previously determined
topology. The issue to be solved is the training of the association weights,
attempting to limit the network error. With the utilization of an evolutionary algorithm, the weights can be represented as strings of either binary or real values.
In the first option, direct encoding, there is a one-to-one correspondence between the genes and their resulting phenotypes. The most typical encoding technique comprises a matrix that represents an architecture, where each component reveals the presence or absence of a connection between two nodes.
In the encoding schemes, GP has been utilized to create both the architecture and the connection weights at the same time, either for feed-forward or recurrent ANNs.
Start:
It generates a random population of n chromosomes (suitable solutions for the problem).
Fitness:
It evaluates the fitness f(x) of each chromosome x in the population.
New Population:
It generates a new population by repeating the following steps until the New
population is finished.
Selection:
It chooses two parent chromosomes from a population as per their fitness. The
better fitness, the higher the probability of getting selected.
Crossover:
In crossover probability, cross over the parents to form new offspring (children). If
no crossover was performed, the offspring is the exact copy of the parents.
Mutation:
With a mutation probability, it mutates the new offspring at each position in the chromosome.
Accepting:
It places the new offspring in the new population.
Replace:
It uses the newly generated population for a further run of the algorithm.
Test:
If the end condition is satisfied, then it stops and returns the best solution in the
current population.
Loop:
In this step, we need to go to the second step for fitness evaluation.
The basic principle behind the genetic algorithms is that they generate and
maintain a population of individuals represented by chromosomes. Chromosomes
are a character string practically equivalent to the chromosomes appearing in
DNA. These chromosomes are usually encoded solutions to a problem. It
undergoes a process of evolution as per rules of selection, reproduction, and
mutation. Each individual in the environment (represented by chromosome) gets
a measure of its fitness in the environment. Reproduction chooses individuals
with high fitness values in the population. Through crossover and mutation of
such individuals, a new population is determined in which individuals might be an
even better fit for their environment. The process of crossover includes two chromosomes swapping chunks of data and is analogous to the process of reproduction. Mutation introduces slight changes into a small extent of the population, and it is representative of an evolutionary step.
A traditional algorithm selects the next point in the series by a deterministic computation, whereas a genetic algorithm selects the next population by a computation which utilizes random number generators.
Along with making a decent choice of the fitness function, different parameters of a genetic algorithm like population size, mutation rate, and crossover rate must be chosen effectively. A small population size will not give the genetic algorithm enough solutions to produce precise results, and a high frequency of genetic change or a poor selection scheme will result in disrupting the beneficial schema.
Basic principles :
Algorithmic Phases :
Simple_Genetic_Algorithm()
{
    Initialize the population;
    Calculate the fitness of each chromosome;
    while (termination criteria not reached)
    {
        Selection;
        Crossover;
        Mutation;
        Calculate the fitness of each chromosome;
    }
}
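The algorithmic phases can be made concrete with a compact, runnable GA. The problem (OneMax: maximize the number of 1s in a binary chromosome), the parameter values, and the choice of tournament selection are all illustrative assumptions:

```python
import random

random.seed(0)                           # deterministic run for illustration
GENES, POP, GENERATIONS, P_MUT = 20, 30, 60, 0.02

def fitness(chrom):                      # OneMax: count of 1s (to maximize)
    return sum(chrom)

def select(pop):                         # tournament selection of size 2
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                   # single-point crossover
    cut = random.randrange(1, GENES)
    return p1[:cut] + p2[cut:]

def mutate(chrom):                       # bit-flip mutation with prob. P_MUT
    return [1 - g if random.random() < P_MUT else g for g in chrom]

# Start: random initial population of binary chromosomes.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

# New population loop: selection -> crossover -> mutation -> replace.
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

best = max(population, key=fitness)
print(fitness(best))   # typically at or very near the optimum, 20
```

Swapping in a different `fitness` function (for example, the knapsack fitness discussed later) turns the same loop into a solver for that problem.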
Encoding Methods :
Binary Encoding : Most common methods of encoding. Chromosomes
are string of 1s and 0s and each position in the chromosome represents
a particular characteristics of the problem.
Value Encoding : Used in problems where complicated values, such as real numbers, are used and where binary encoding would not suffice. Good for some problems, but it is often necessary to develop specific crossover and mutation techniques for these chromosomes.
The fitness value is usually positive, and the larger the number, the better the genome. When we use such a fitness function, we are performing maximization on the search space − looking for the maximum value of fitness.
The objective function is quite similar to the fitness function, and in a lot of cases they are the same, but sometimes the distinction is important. The objective function is used to calculate the fitness of the best genome in each generation (the one with the maximum fitness function value) in order to check whether it satisfies a predetermined condition.
Why use two different functions? Well, because the fitness function is performed
on every genome in every generation, it's very important for it to be fast. It
doesn't have to be very precise, as long as it more or less sorts the genomes by
quality reasonably well.
On the other hand, the objective function is called only once per generation, so
we can afford to use a more costly and more precise function, so we'd know for
sure how good our result is. The objective function would be our f(x) on the
clifftop picture, while the fitness function would be its close approximation.
In most cases the fitness function and the objective function are the same as the
objective is to either maximize or minimize the given objective function. However,
for more complex problems with multiple objectives and constraints, an Algorithm Designer might choose to have a different fitness function.
In some cases, calculating the fitness function directly might not be possible due
to the inherent complexities of the problem at hand. In such cases, we do fitness
approximation to suit our needs.
The following image shows the fitness calculation for a solution of the 0/1
Knapsack. It is a simple fitness function which just sums the profit values of the
items being picked (which have a 1), scanning the elements from left to right till
the knapsack is full.
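The fitness rule described above can be sketched in a few lines of Python (the function and variable names here are illustrative, not from the source):

```python
# 0/1 Knapsack fitness: sum the profits of picked items (genes equal to 1),
# scanning left to right, and stop counting once the capacity would be exceeded.
def knapsack_fitness(chromosome, profits, weights, capacity):
    total_profit = 0
    total_weight = 0
    for gene, profit, weight in zip(chromosome, profits, weights):
        if gene == 1:
            if total_weight + weight > capacity:
                break  # knapsack is full; ignore the remaining items
            total_weight += weight
            total_profit += profit
    return total_profit

print(knapsack_fitness([1, 0, 1, 1], profits=[10, 5, 8, 7],
                       weights=[2, 3, 4, 5], capacity=9))  # → 18
```

Items 0 and 2 fit (weight 2 + 4 = 6, profit 10 + 8 = 18); item 3 would exceed the capacity, so scanning stops there.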
Crossover
The crossover operator is analogous to reproduction and biological crossover. In
this, more than one parent is selected and one or more offspring are produced
using the genetic material of the parents. Crossover is usually applied in a GA with
a high probability pc.
Crossover Operators
In this section we will discuss some of the most popularly used crossover
operators. It is to be noted that these crossover operators are very generic and
the GA Designer might choose to implement a problem-specific crossover
operator as well.
Uniform Crossover
In a uniform crossover, we don't divide the chromosome into segments; rather,
we treat each gene separately, essentially flipping a coin for each gene to decide
whether it goes to the first or the second offspring.
Whole Arithmetic Recombination
This takes the weighted average of the two parents: Child1 = α·x + (1 − α)·y and
Child2 = α·y + (1 − α)·x. Obviously, if α = 0.5, then both children will be identical,
as shown in the following image.
Davis' Order Crossover (OX1)
OX1 is used for permutation-based chromosomes with the intention of transmitting
information about relative ordering to the offspring. It works as follows −
Create two random crossover points in the parent and copy the segment
between them from the first parent to the first offspring.
Now, starting from the second crossover point in the second parent, copy
the remaining unused numbers from the second parent to the first child,
wrapping around the list.
Repeat for the second child with the parents' roles reversed.
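The steps above can be sketched as follows (a minimal illustration with fixed crossover points instead of random ones; all names are illustrative):

```python
def order_crossover(p1, p2, a, b):
    # Copy the segment [a, b) from the first parent into the child.
    n = len(p1)
    child = [None] * n
    child[a:b] = p1[a:b]
    segment = set(p1[a:b])
    # From the second crossover point in the second parent, collect the unused
    # values in order, wrapping around the list...
    remaining = [p2[(b + i) % n] for i in range(n) if p2[(b + i) % n] not in segment]
    # ...and place them into the empty positions, also starting at point b.
    positions = [(b + i) % n for i in range(n) if child[(b + i) % n] is None]
    for pos, val in zip(positions, remaining):
        child[pos] = val
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(order_crossover(p1, p2, 3, 7))  # → [3, 8, 2, 4, 5, 6, 7, 1, 9]
```

The second child is produced by calling `order_crossover(p2, p1, a, b)` with the parents' roles reversed.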
There exist a lot of other crossovers like Partially Mapped Crossover (PMX), Order
based crossover (OX2), Shuffle Crossover, Ring Crossover, etc.
Mutation
Mutation is the part of the GA which is related to the "exploration" of the search
space. It has been observed that mutation is essential to the convergence of the
GA while crossover is not.
Mutation Operators
In this section, we describe some of the most commonly used mutation
operators. Like the crossover operators, this is not an exhaustive list and the GA
designer might find a combination of these approaches or a problem-specific
mutation operator more useful.
Bit Flip Mutation
In bit flip mutation, we select one or more random bits and flip them. This is
used for binary encoded GAs.
Random Resetting
Random Resetting is an extension of the bit flip for the integer representation. In
this, a random value from the set of permissible values is assigned to a randomly
chosen gene.
Swap Mutation
In swap mutation, popular with permutation-based encodings, we select two
positions on the chromosome at random and interchange the values.
Scramble Mutation
Scramble mutation is also popular with permutation representations. In this, from
the entire chromosome, a subset of genes is chosen and their values are
scrambled or shuffled randomly.
Inversion Mutation
In inversion mutation, we select a subset of genes like in scramble mutation, but
instead of shuffling the subset, we merely invert the entire string in the subset.
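Each mutation operator described above can be sketched in a line or two of Python (a toy illustration; the function names are not from the source):

```python
import random

def bit_flip(chrom, i):
    # flip one bit (binary encoding)
    chrom[i] = 1 - chrom[i]

def swap(chrom, i, j):
    # interchange the values at two positions (permutation encoding)
    chrom[i], chrom[j] = chrom[j], chrom[i]

def scramble(chrom, i, j):
    # shuffle the subset of genes in [i, j) randomly
    sub = chrom[i:j]
    random.shuffle(sub)
    chrom[i:j] = sub

def inversion(chrom, i, j):
    # reverse (invert) the subset of genes in [i, j)
    chrom[i:j] = chrom[i:j][::-1]
```

In a full GA each operator would pick its positions at random; here the indices are parameters so the effect of each operator is easy to see.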
In this section, we list some of the areas in which Genetic Algorithms are
frequently used. These are −
Parallelization − GAs also have very good parallel capabilities, prove to
be an effective means of solving certain problems, and provide
a good area for research.
Image Processing − GAs are used for various digital image processing
(DIP) tasks as well like dense pixel matching.
Robot Trajectory Generation − GAs have been used to plan the path
which a robot arm takes by moving from one point to another.
Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Synapse → Weights
Axon → Output
There are around 100 billion neurons in the human brain. Each neuron has an
association point somewhere in the range of 1,000 to 100,000. In the human
brain, data is stored in such a manner as to be distributed, and we can extract
more than one piece of this data when necessary from our memory parallelly. We
can say that the human brain is made up of incredibly amazing parallel
processors.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by
the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the
calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the
inputs and includes a bias. This computation is represented in the form of a
transfer function.
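The weighted-sum-plus-bias computation can be sketched as follows (a hypothetical minimal neuron, assuming a simple binary step transfer function):

```python
def neuron_output(inputs, weights, bias, activation):
    # weighted sum of the inputs plus the bias, passed through a transfer function
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

step = lambda s: 1 if s >= 0 else 0   # a simple binary threshold transfer function
print(neuron_output([1.0, 0.5], [0.4, -0.6], 0.1, step))  # → 1
```

Here 1.0·0.4 + 0.5·(−0.6) + 0.1 = 0.2, which is above the threshold, so the neuron fires.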
Parallel processing capability: Artificial neural networks can perform more than
one task simultaneously.
Storing data on the entire network: Data that is used in traditional programming
is stored on the whole network, not in a database. The disappearance of a few
pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge: After ANN training, the network
may produce output even with incomplete data. The loss of performance here
depends upon the significance of the missing data.
Unrecognized behavior of the network: This is the most significant issue of ANN.
When an ANN produces a solution, it does not provide insight concerning why
and how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in
accordance with their structure. Therefore, the realization of the network is
equipment-dependent.
Difficulty of showing the problem to the network: ANNs can work only with
numerical data. Problems must be converted into numerical values before being
introduced to the ANN. The representation mechanism chosen here will directly
impact the performance of the network, and it relies on the user's abilities.
The duration of the network is unknown: The network is trained down to a
specific value of the error, but this value does not guarantee optimum results.
Artificial neural networks, which stepped into the world in the mid-20th century,
are developing exponentially, and today we are only at the beginning of their
utilization. It should not be overlooked that while the cons of ANN remain, their
pros are increasing day by day. It means that artificial neural networks will
progressively turn into an irreplaceable part of our lives.
If the weighted sum is equal to zero, then bias is added to make the output non-
zero, or to otherwise scale up the system's response. The bias has a fixed input
of 1 with its own weight. Here the total of weighted inputs can be in the
range of 0 to positive infinity. To keep the response within the limits of the
desired value, a certain maximum value is benchmarked, and the total of
weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve the
desired output. There are different kinds of activation functions, but they are
primarily either linear or non-linear sets of functions. Some of the commonly used
activation functions are the binary, linear, and tan hyperbolic sigmoidal activation
functions. Let us take a look at each of them in detail:
Binary:
In a binary activation function, the output is either a one or a zero, depending on
whether the total net input is above or below a threshold value.
Sigmoidal Hyperbolic:
The Sigmoidal Hyperbola function is generally seen as an "S" shaped curve. Here
the tan hyperbolic function is used to approximate output from the actual net
input. The function is defined as:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
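The activation functions named above can be sketched as follows (standard textbook definitions, not code from the source):

```python
import math

def binary_step(x, threshold=0.0):
    # Binary: output is 1 when the net input reaches the threshold, else 0
    return 1 if x >= threshold else 0

def sigmoid(x):
    # Logistic sigmoid: a smooth "S"-shaped curve with outputs in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh_act(x):
    # Tan hyperbolic: an "S"-shaped curve with outputs in (-1, 1)
    return math.tanh(x)

print(binary_step(0.3), round(sigmoid(0.0), 3), round(tanh_act(0.0), 3))  # → 1 0.5 0.0
```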
There are various types of Artificial Neural Networks (ANN). Depending upon the
human brain's neurons and network functions, an artificial neural network
performs tasks in a similar manner. The majority of artificial neural networks have
some similarities with their more complex biological counterparts and are very
effective at their intended tasks, for example, segmentation or classification.
Feedback ANN:
In this type of ANN, the output returns into the network to accomplish the best-
evolved results internally. As per the University of Massachusetts Lowell Center
for Atmospheric Research, feedback networks feed information back into
themselves and are well suited to solving optimization problems. Internal system
error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an
output layer, and at least one hidden layer. In this type, data moves in only one
direction, from input to output.
Supervised learning
In supervised learning, the training data provided to the machines work as the
supervisor that teaches the machines to predict the output correctly. It applies
the same concept as a student learns in the supervision of the teacher.
o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the
model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a new
shape, it classifies the shape on the basis of its number of sides, and predicts the
output.
o Split the dataset into a training dataset, test dataset, and
validation dataset.
o Determine the input features of the training dataset, which should have
enough knowledge so that the model can accurately predict the output.
o Determine the suitable algorithm for the model, such as support vector
machine, decision tree, etc.
o Evaluate the accuracy of the model by providing the test set. If the model
predicts the correct output, it means our model is accurate.
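The steps above can be sketched end to end. This is a toy illustration that uses a simple nearest-neighbour rule in place of SVM or decision trees; the dataset and all names are made up for the example:

```python
def train_test_split(data, test_ratio=0.25):
    # split the labelled dataset into a training part and a test part
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]

def predict(train, features):
    # nearest neighbour: return the label of the closest training example
    closest = min(train, key=lambda ex: sum((a - b) ** 2
                                            for a, b in zip(ex[0], features)))
    return closest[1]

def accuracy(train, test):
    # evaluate the model on the test set
    correct = sum(1 for features, label in test
                  if predict(train, features) == label)
    return correct / len(test)

# labelled dataset: (features, label)
data = [((0, 0), 'A'), ((0, 1), 'A'), ((5, 5), 'B'), ((5, 4), 'B'),
        ((1, 0), 'A'), ((4, 5), 'B'), ((0, 2), 'A'), ((5, 6), 'B')]
train, test = train_test_split(data)
print(accuracy(train, test))  # → 1.0
```

The same split-train-evaluate skeleton applies regardless of which supervised algorithm replaces the nearest-neighbour rule.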
1. Regression
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which
means there are discrete classes such as Yes-No, Male-Female, True-False, etc.
Example: Spam Filtering.
o Random Forest
o Decision Trees
o Logistic Regression
Advantages of supervised learning:
o With the help of supervised learning, the model can predict the output on
the basis of prior experience.
Disadvantages of supervised learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
Unsupervised learning
Unsupervised learning is the training of a machine using information that is
neither classified nor labeled, allowing the algorithm to act on that information
without guidance. Here the task of the machine is to group unsorted information
according to similarities, patterns, and differences, without any prior training on
the data.
Thus the machine has no idea about the features of dogs and cats, so it cannot
categorize the pictures into dogs and cats directly. But it can categorize them
according to their similarities, patterns, and differences, i.e., we can easily divide
the picture collection into two parts. The first part may contain all pics having
dogs in them, and the second part may contain all pics having cats in them. Here
the machine learned nothing beforehand; there is no training data or examples.
Unsupervised Learning
o Unsupervised learning is helpful for finding useful insights from the data.
Here, we have taken unlabeled input data, which means it is not categorized
and corresponding outputs are also not given. Now, this unlabeled input data is
fed to the machine learning model in order to train it. Firstly, it will interpret the
raw data to find the hidden patterns from the data and then will apply suitable
algorithms such as k-means clustering, hierarchical clustering, etc.
Once it applies the suitable algorithm, the algorithm divides the data objects into
groups according to the similarities and difference between the objects.
The unsupervised learning algorithm can be further categorized into two types of
problems: clustering and association. Some popular unsupervised learning
algorithms are:
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Apriori algorithm
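The k-means algorithm named above can be sketched minimally as follows (one-dimensional points, deterministic initialization instead of random seeding; all names are illustrative):

```python
def kmeans_1d(points, k, iters=20):
    # deterministic initialization: use the first k points as centroids
    centroids = points[:k]
    for _ in range(iters):
        # assignment step: put each point in the group of its nearest centroid
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            groups[idx].append(p)
        # update step: move each centroid to the mean of its group
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids, groups

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, groups = kmeans_1d(points, k=2)
print(sorted(centroids))  # two cluster centres, near 1.0 and 9.0
```

No labels are given anywhere; the two groups emerge purely from the similarities between the points.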
Reinforcement Learning
Reinforcement learning is the training of machine learning models to make
a sequence of decisions. The agent learns to achieve a goal in an uncertain,
potentially complex environment. In reinforcement learning, an artificial
intelligence faces a game-like situation. The computer employs trial and error
to come up with a solution to the problem. To get the machine to do what
the programmer wants, the artificial intelligence gets either rewards or penalties
for the actions it performs. Its goal is to maximize the total reward.
Although the designer sets the reward policy, that is, the rules of the game, the
designer gives the model no hints or suggestions for how to solve the game. It's up
to the model to figure out how to perform the task to maximize the reward,
starting from totally random trials and finishing with sophisticated tactics
and superhuman skills. By leveraging the power of search and many trials,
reinforcement learning is currently the most effective way to hint at machine
creativity. In contrast to human beings, an artificial intelligence can gather
experience from thousands of parallel gameplays if a reinforcement learning
algorithm is run on sufficiently powerful computer infrastructure.
passengers' comfort and obey the rules of law. With an autonomous race
car, on the other hand, we would emphasize speed much more than
the driver's comfort. The programmer cannot predict everything that could
happen on the road. Instead of building lengthy "if-then" instructions,
the programmer prepares the reinforcement learning agent to be capable
of learning from a system of rewards and penalties. The agent (another
name for reinforcement learning algorithms performing the task) gets
rewards for reaching specific goals.
was designed for. An interesting example can be found in the OpenAI video
below, where the agent learned to gain rewards, but not to complete the race.
Output: There are many possible outputs, as there are a variety of solutions to a
particular problem.
Training: The training is based upon the input; the model returns a state,
and the user decides whether to reward or punish the model based on its output.
Example: Chess game (reinforcement learning) vs. object recognition
(supervised learning)
1. Positive –
Positive Reinforcement is defined as when an event, occurring due to a
particular behavior, increases the strength and frequency of that
behavior. In other words, it has a positive effect on behavior.
Maximizes Performance
2. Negative –
Negative Reinforcement is defined as strengthening of a behavior because
a negative condition is stopped or avoided.
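The reward-and-penalty loop described in this section can be sketched with tabular Q-learning on a toy corridor environment. This is an illustrative example, not from the source; the agent receives positive reinforcement only for reaching the goal state:

```python
import random

random.seed(0)
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q[state][action]; 0 = left, 1 = right

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0     # reward only at the goal
    return nxt, reward

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        nxt, r = step(s, a)
        # Q-learning update: move towards reward plus discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[nxt]) - Q[s][a])
        s = nxt

# after training, the greedy policy moves right (action 1) from every state
print([max((0, 1), key=lambda act: Q[s][act]) for s in range(GOAL)])
```

Starting from totally random trials, the trial-and-error loop discovers by itself that moving right maximizes the total reward; no "if-then" rules were programmed in.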
The diagram shows that the hidden units communicate with the external layer,
while the input and output units communicate only through the hidden layer of
the network.
The pattern of connections between nodes, the total number of layers, and the
number of neurons per layer between inputs and outputs define the architecture
of a neural network.
There are two types of architecture. These types focus on the functionality of
artificial neural networks as follows −
The single layer perceptron was the first proposed neural model. The content of
the local memory of the neuron consists of a vector of weights. The computation
of a single layer perceptron is performed as the sum of the input
vector, each element multiplied by the corresponding element of the vector of
weights. The value which is displayed in the output will be the input of an
activation function.
For each element of the training set, the error is calculated with the
difference between desired output and the actual output. The error
calculated is used to adjust the weights.
The process is repeated until the error made on the entire training set is
less than the specified threshold, or until the maximum number of
iterations is reached.
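The error-correction procedure just described can be sketched in plain Python (a hypothetical toy perceptron learning logical AND; the learning rate and data are illustrative):

```python
def train_perceptron(samples, lr=1, max_iter=100):
    # samples: list of (inputs, desired_output) with desired output in {0, 1}
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(max_iter):
        total_error = 0
        for x, d in samples:
            # actual output of the threshold unit
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            err = d - y                      # desired output minus actual output
            if err:
                # adjust the weights using the calculated error
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                total_error += abs(err)
        if total_error == 0:                 # whole training set classified correctly
            break
    return w, b

# learn the logical AND function
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
print([predict(x) for x, _ in data])  # → [0, 0, 0, 1]
```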
The following TensorFlow (1.x API) example trains a single layer perceptron with
softmax activation on the MNIST dataset:
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
# tf Graph Input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
# Create model: single layer with softmax activation
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
activation = tf.nn.softmax(tf.matmul(x, W) + b)
# Minimize error using cross entropy
cross_entropy = -tf.reduce_sum(y * tf.log(activation))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
# Plot settings
avg_set = []
epoch_set = []
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += sess.run(cross_entropy, feed_dict={x: batch_xs, y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            avg_set.append(avg_cost)
            epoch_set.append(epoch + 1)
    plt.plot(epoch_set, avg_set, 'o', label='Logistic Regression Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal(tf.argmax(activation, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
Output
MLP networks are usually used in a supervised learning format. A typical learning
algorithm for MLP networks is the backpropagation algorithm.
The following TensorFlow (1.x API) example trains a two-hidden-layer MLP on
the MNIST dataset:
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256
n_hidden_2 = 256
n_input = 784
n_classes = 10
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
# weights layer 1
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
# bias layer 1
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))
# layer 1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, h), bias_layer_1))
# weights layer 2
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
# bias layer 2
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))
# layer 2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w), bias_layer_2))
# weights output layer
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
# bias output layer
bias_output = tf.Variable(tf.random_normal([n_classes]))
# output layer
output_layer = tf.matmul(layer_2, output) + bias_output
# cost function
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=output_layer, labels=y))
# optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# optimizer = tf.train.GradientDescentOptimizer(
#     learning_rate=learning_rate).minimize(cost)
# Plot settings
avg_set = []
epoch_set = []
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys}) / total_batch
        if epoch % display_step == 0:
            avg_set.append(avg_cost)
            epoch_set.append(epoch + 1)
    plt.plot(epoch_set, avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Model accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
The entire learning process occurs without supervision because the nodes are
self-organizing. They are also known as feature maps, as they are basically
retaining the features of the input data, and simply grouping themselves as
indicated by the similarity between each other. SOMs have practical value for
visualizing complex or huge quantities of high-dimensional data, showing the
relationships between them in a low, usually two-dimensional, field in order to
check whether the given unlabeled data have any structure to them.
A self-Organizing Map (SOM) varies from typical artificial neural networks (ANNs)
both in its architecture and algorithmic properties. Its structure consists of a
single layer linear 2D grid of neurons, rather than a series of layers. All the nodes
on this lattice are associated directly to the input vector, but not to each other. It
means the nodes don't know the values of their neighbors, and only update the
weight of their associations as a function of the given input. The grid itself is the
map that coordinates itself at each iteration as a function of the input data. As
such, after clustering, each node has its own coordinate (i, j), which enables one
to calculate the Euclidean distance between two nodes by means of the
Pythagorean theorem.
The selected node, the Best Matching Unit (BMU), is chosen according to the
similarity between the current input values and all the nodes in the network. The
node with the smallest Euclidean distance from the input vector is selected, and
its neighboring nodes within a specific radius have their positions slightly
adjusted towards the input vector. By iterating over all the nodes present on the
grid, the whole grid eventually matches the entire input dataset, with similar
nodes gathered towards one area and dissimilar ones isolated.
Algorithm:
Step:1
Initialize each node's weight wij to a random value.
Step:2
Choose a random input vector x(t).
Step:3
Repeat steps 4 and 5 for all nodes on the map.
Step:4
Calculate the Euclidean distance between the weight vector wij and the input
vector x(t) connected with the current node, where t, i, j = 0 at the start.
Step:5
Track the node that produces the smallest distance t.
Step:6
Calculate the overall Best Matching Unit (BMU). It means the node with the
smallest distance from all calculated ones.
Step:7
Discover the topological neighborhood βij(t) and its radius σ(t) of the BMU in the
Kohonen Map.
Step:8
Repeat for all nodes in the BMU neighborhood: Update the weight vector w_ij of
each node in the neighborhood of the BMU by adding a fraction of the
difference between the input vector x(t) and the weight w(t) of the neuron.
Step:9
Repeat the complete iteration until reaching the selected iteration limit t=n.
Where:
t = current iteration
W = weight vector
X = input vector
β_ij = the neighborhood function, decreasing and representing node i,j distance
from the BMU.
σ(t) = The radius of the neighborhood function, which calculates how far neighbor
nodes are examined in the 2D grid when updating vectors. It gradually decreases
over time.
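The BMU selection and neighborhood update just described can be sketched as follows (a toy 2×2 grid; the Gaussian neighborhood function and all names are illustrative assumptions, not from the source):

```python
import math

def bmu(weights, x):
    # Best Matching Unit: the grid node whose weight vector is closest to the input
    return min(weights, key=lambda node: math.dist(weights[node], x))

def som_update(weights, x, bmu_node, lr, radius):
    # pull the BMU and its grid neighbours a fraction of the way towards the input
    for node, w in weights.items():
        grid_dist = math.dist(node, bmu_node)
        if grid_dist <= radius:
            influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
            weights[node] = [wi + lr * influence * (xi - wi)
                             for wi, xi in zip(w, x)]

# a 2x2 grid of nodes, each holding a 2-dimensional weight vector
weights = {(0, 0): [0.1, 0.1], (0, 1): [0.2, 0.9],
           (1, 0): [0.9, 0.2], (1, 1): [0.8, 0.8]}
x = [1.0, 1.0]
node = bmu(weights, x)           # node (1, 1) is closest to this input
som_update(weights, x, node, lr=0.5, radius=1.0)
```

In a full SOM, both the learning rate and the radius would gradually decrease over the iterations, as described above.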
Hopfield Networks
A Hopfield network has symmetric connection weights Wij with zero
self-connectivity, Wii = 0. Here, the given three neurons i = 1, 2, 3 with values
Xi = ±1 have connectivity weights Wij.
Updating rule:
Compute the local field hi = Σj Wij xj + bi.
If hi ≥ 0 then xi → 1, otherwise xi → -1
We need to put bi=0 so that it makes no difference in training the network with
random patterns.
Synchronously:
In this approach, the update of all the nodes takes place simultaneously at each
time step.
Asynchronously:
In this approach, at each point of time, update one node chosen randomly or
according to some rule. Asynchronous updating is more biologically realistic.
We can describe a metric on X by using the Hamming distance between any two
states:
Case 1: w12 = w21 = 1
Case 2: w12 = w21 = -1
Asynchronous updating:
In the first case, there are two attracting fixed points, [1, 1] and [-1, -1]. Every
orbit converges to one of these. In the second case, the fixed points are [-1, 1]
and [1, -1], and all orbits converge to one of these. For any fixed
point, swapping all the signs gives another fixed point.
Synchronous updating:
In the first and second cases, although there are fixed points, none of them
attracts nearby points, i.e., they are not attracting fixed points, and some orbits
oscillate forever.
For a given state X ∈ {−1, 1}^N of the network and for any set of
association weights Wij with Wij = Wji and Wii = 0, let the energy be
E = −(1/2) Σi Σj Wij Xi Xj
Here, we update Xm to X'm, denote the new energy by E', and show that
E' − E = (Xm − X'm) Σj Wmj Xj ≤ 0.
Thus, E' - E ≤ 0
Note:
If Xm flips, then E' - E = 2Xmhm
Suppose the connection weight Wij = Wji between two neurons i and
j. If Wij > 0, the updating rule implies:
o If Xj = 1, then the contribution of j in the weighted sum, i.e., WijXj,
is positive. Thus the value of Xi is pulled by j towards its value Xj = 1.
o If Xj = -1, then WijXj is negative, and Xi is likewise pulled by j towards
its value Xj = -1.
Thus, if Wij > 0 , then the value of i is pulled by the value of j. By symmetry, the
value of j is also pulled by the value of i.
If we select Wij = η XiXj for 1 ≤ i, j ≤ N (here, i ≠ j), where η > 0 is the learning
rate, then the value of Xi will not change under the updating rule, as we
illustrate below.
We have
hi = Σj≠i Wij Xj = η Σj≠i Xi Xj Xj = η (N − 1) Xi
which has the same sign as Xi. It implies that the value of Xi, whether 1 or -1,
will not change, so that x is a fixed point.
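The Hebbian rule and the asynchronous updating rule above can be sketched together (a toy four-neuron network storing one pattern; all names are illustrative, and η = 1):

```python
def energy(W, x):
    # E = -(1/2) * sum_ij W[i][j] x_i x_j   (zero biases)
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j]
                      for i in range(n) for j in range(n))

def update(W, x, i):
    # asynchronous update of neuron i: x_i -> sign of the local field h_i
    h = sum(W[i][j] * x[j] for j in range(len(x)))
    x[i] = 1 if h >= 0 else -1

# store one pattern with the Hebbian rule W_ij = eta * x_i * x_j (i != j)
pattern = [1, -1, 1, -1]
eta, n = 1.0, len(pattern)
W = [[0.0 if i == j else eta * pattern[i] * pattern[j] for j in range(n)]
     for i in range(n)]

# start from a corrupted copy and update each neuron once
x = [1, 1, 1, -1]
for i in range(n):
    update(W, x, i)
print(x == pattern)  # → True: the stored pattern is recovered
```

Each asynchronous update can only lower (or keep) the energy, so the corrupted state slides down to the stored pattern, which is a fixed point.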