Artificial Intelligence: Turing Test
Approaches to AI
A heuristic is a technique used to solve a problem faster than classical methods, or to
find an approximate solution when classical methods fail to find an exact one.
Heuristics are problem-solving techniques that result in practical and quick solutions.
Heuristics are strategies derived from past experience with similar problems.
Heuristics use practical methods and shortcuts to produce solutions that may or may
not be optimal, but that are sufficient within a given limited timeframe.
Examples of direct (uninformed) heuristic search techniques include Breadth-First
Search (BFS) and Depth-First Search (DFS).
Examples of weak (informed) heuristic search techniques include Best-First Search
and A*.
o Bidirectional Search
o A* search
o Simulated Annealing
o Hill Climbing
o Best First search
o Beam search
Game Playing in Artificial Intelligence
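Before the worked example, here is a minimal Python sketch of the depth-limited minimax procedure that the initial call below refers to; the node interface (is_terminal, children, utility) is an assumption for illustration:

```python
def minimax(node, depth, maximizing):
    # Depth-limited minimax: returns the utility value of `node`.
    # `is_terminal`, `children`, and `utility` are assumed helpers.
    if depth == 0 or node.is_terminal():
        return node.utility()
    if maximizing:
        value = float('-inf')          # maximizer's worst-case initial value
        for child in node.children():
            value = max(value, minimax(child, depth - 1, False))
        return value
    else:
        value = float('inf')           # minimizer's worst-case initial value
        for child in node.children():
            value = min(value, minimax(child, depth - 1, True))
        return value
```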
Initial call:
Minimax(node, 3, true)
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility
function to obtain the utility values for the terminal states. In the tree diagram below, let A be
the initial state of the tree. Suppose the maximizer takes the first turn, with a worst-case initial
value of -infinity, and the minimizer takes the next turn, with a worst-case initial value of
+infinity.
Step 2: Now, first we find the utility values for the Maximizer. Its initial value is -∞, so we
compare each terminal-state value with the Maximizer's initial value and determine the
higher node values. It finds the maximum among them all.
Step 3: In the next step, it is the minimizer's turn, so it compares all node values with +∞
and determines the third-layer node values.
History
Allen Newell and Herbert A. Simon who used what John McCarthy calls an
"approximation"[2] in 1958 wrote that alpha–beta "appears to have been reinvented a number
of times".[3] Arthur Samuel had an early version for a checkers simulation. Richards, Timothy
Hart, Michael Levin and/or Daniel Edwards also invented alpha–beta independently in the
United States.[4] McCarthy proposed similar ideas during the Dartmouth workshop in 1956
and suggested it to a group of his students including Alan Kotok at MIT in 1961.[5] Alexander
Brudno independently conceived the alpha–beta algorithm, publishing his results in 1963.[6]
Donald Knuth and Ronald W. Moore refined the algorithm in 1975.[7][8] Judea Pearl proved
its optimality in terms of the expected running time for trees with randomly assigned leaf
values in two papers.[9][10] The optimality of the randomized version of alpha–beta was shown
by Michael Saks and Avi Wigderson in 1986.[11]
Core idea
A game tree can represent many two-player zero-sum games, such as chess, checkers, and
reversi. Each node in the tree represents a possible situation in the game. Each terminal node
(outcome) of a branch is assigned a numeric score that determines the value of the outcome to
the player with the next move.[12]
The algorithm maintains two values, alpha and beta, which respectively represent the
minimum score that the maximizing player is assured of and the maximum score that the
minimizing player is assured of. Initially, alpha is negative infinity and beta is positive
infinity, i.e. both players start with their worst possible score. Whenever the maximum score
that the minimizing player (i.e. the "beta" player) is assured of becomes less than the
minimum score that the maximizing player (i.e., the "alpha" player) is assured of (i.e. beta <
alpha), the maximizing player need not consider further descendants of this node, as they will
never be reached in the actual play.
To illustrate this with a real-life example, suppose somebody is playing chess, and it is their
turn. Move "A" will improve the player's position. The player continues to look for moves to
make sure a better one hasn't been missed. Move "B" is also a good move, but the player then
realizes that it will allow the opponent to force checkmate in two moves. Thus, other
outcomes from playing move B no longer need to be considered since the opponent can force
a win. The maximum score that the opponent could force after move "B" is negative infinity:
a loss for the player. This is less than the minimum position that was previously found; move
"A" does not result in a forced loss in two moves.
The benefit of alpha–beta pruning lies in the fact that branches of the search tree can be
eliminated.[12] This way, the search time can be limited to the 'more promising' subtree, and a
deeper search can be performed in the same time. Like its predecessor, it belongs to the
branch and bound class of algorithms. The optimization reduces the effective depth to slightly
more than half that of simple minimax if the nodes are evaluated in an optimal or near
optimal order (best choice for side on move ordered first at each node).
With an (average or constant) branching factor of b, and a search depth of d plies, the
maximum number of leaf node positions evaluated (when the move ordering is pessimal) is
O(b×b×...×b) = O(b^d) – the same as a simple minimax search. If the move ordering for the
search is optimal (meaning the best moves are always searched first), the number of leaf node
positions evaluated is about O(b×1×b×1×...×b) for odd depth and O(b×1×b×1×...×1) for even
depth, or O(b^(d/2)) = O(√(b^d)). In the latter case, where the ply of a search is even, the effective
branching factor is reduced to its square root, or, equivalently, the search can go twice as
deep with the same amount of computation.[13] The explanation of b×1×b×1×... is that all the
first player's moves must be studied to find the best one, but for each, only the second player's
best move is needed to refute all but the first (and best) first player move – alpha–beta
ensures no other second player moves need be considered. When nodes are considered in a
random order (i.e., the algorithm randomizes), asymptotically, the expected number of nodes
evaluated in uniform trees with binary leaf values is Θ(((b − 1 + √(b² + 14b + 1))/4)^d).[11] For the
same trees, when the values are assigned to the leaf values independently of each other and
say zero and one are both equally probable, the expected number of nodes evaluated is
Θ((b/2)^d), which is much smaller than the work done by the randomized algorithm
mentioned above, and is again optimal for such random trees.[9] When the leaf values are
chosen independently of each other but from the [0, 1] interval uniformly at random, the
expected number of nodes evaluated increases to Θ(b^(d/log d)) in the d → ∞ limit,[10] which is
again optimal for these kinds of random trees. Note that the actual work for "small" values of d
is better approximated using b^(0.925·d^0.747).[10][9]
A chess program that searches four plies with an average of 36 branches per node evaluates
more than one million terminal nodes. An optimal alpha–beta prune would eliminate all but
about 2,000 terminal nodes, a reduction of 99.8%.[12]
Normally during alpha–beta, the subtrees are temporarily dominated by either a first player
advantage (when many first player moves are good, and at each search depth the first move
checked by the first player is adequate, but all second player responses are required to try to
find a refutation), or vice versa. This advantage can switch sides many times during the
search if the move ordering is incorrect, each time leading to inefficiency. As the number of
positions searched decreases exponentially each move nearer the current position, it is worth
spending considerable effort on sorting early moves. An improved sort at any depth will
exponentially reduce the total number of positions searched, but sorting all positions at depths
near the root node is relatively cheap as there are so few of them. In practice, the move
ordering is often determined by the results of earlier, smaller searches, such as through
iterative deepening.
Additionally, this algorithm can be trivially modified to return an entire principal variation in
addition to the score. Some more aggressive algorithms such as MTD(f) do not easily permit
such a modification.
Pseudocode
The pseudo-code for depth limited minimax with alpha–beta pruning is as follows:[13]
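The referenced pseudocode is not reproduced in this copy; a fail-soft version (matching the fail-soft/fail-hard discussion that follows) can be sketched in Python, with the same assumed node interface as before:

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    # Fail-soft alpha-beta: the returned value may lie outside [alpha, beta].
    if depth == 0 or node.is_terminal():
        return node.utility()
    if maximizing:
        value = float('-inf')
        for child in node.children():
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)   # bounds updated before the cutoff check
            if beta <= alpha:
                break                   # beta cutoff: minimizer avoids this branch
        return value
    else:
        value = float('inf')
        for child in node.children():
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break                   # alpha cutoff: maximizer avoids this branch
        return value

# Initial call: alphabeta(root, depth, float('-inf'), float('inf'), True)
```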
Implementations of alpha–beta pruning can often be delineated by whether they are "fail-
soft" or "fail-hard". With fail-soft alpha–beta, the alphabeta function may return values (v)
that exceed (v < α or v > β) the α and β bounds set by its function call arguments. In
comparison, fail-hard alpha–beta limits its function return value into the inclusive range of α
and β. The main difference between fail-soft and fail-hard implementations is whether α and
β are updated before or after the cutoff check. If they are updated before the check, then they
can exceed initial bounds and the algorithm is fail-soft.
Heuristic improvements
Further improvement can be achieved without sacrificing accuracy by using ordering
heuristics to search earlier parts of the tree that are likely to force alpha–beta cutoffs. For
example, in chess, moves that capture pieces may be examined before moves that do not, and
moves that have scored highly in earlier passes through the game-tree analysis may be
evaluated before others. Another common, and very cheap, heuristic is the killer heuristic,
where the last move that caused a beta-cutoff at the same tree level in the tree search is
always examined first. This idea can also be generalized into a set of refutation tables.
Alpha–beta search can be made even faster by considering only a narrow search window
(generally determined by guesswork based on experience). This is known as aspiration
search. In the extreme case, the search is performed with alpha and beta equal; a technique
known as zero-window search, null-window search, or scout search. This is particularly
useful for win/loss searches near the end of a game where the extra depth gained from the
narrow window and a simple win/loss evaluation function may lead to a conclusive result. If
an aspiration search fails, it is straightforward to detect whether it failed high (high edge of
window was too low) or low (lower edge of window was too high). This gives information
about what window values might be useful in a re-search of the position.
Over time, other improvements have been suggested, and indeed the Falphabeta (fail-soft
alpha–beta) idea of John Fishburn is nearly universal and is already incorporated above in a
slightly modified form. Fishburn also suggested a combination of the killer heuristic and
zero-window search under the name Lalphabeta ("last move with minimal window alpha–
beta search").
Knowledge Representation
Knowledge Representation in AI describes how knowledge can be represented in a
form that a machine can use. Basically, it is the study of how the beliefs, intentions,
and judgments of an intelligent agent can be expressed suitably for automated
reasoning. One of the primary purposes of Knowledge Representation is modeling
intelligent behavior for an agent.
o Object: All the facts about objects in our world domain. E.g., guitars
contain strings, trumpets are brass instruments.
o Events: Events are the actions which occur in our world.
o Performance: It describes behavior which involves knowledge about
how to do things.
o Meta-knowledge: It is knowledge about what we know.
o Facts: Facts are the truths about the real world and what we represent.
o Knowledge-Base: The central component of a knowledge-based
agent is the knowledge base, represented as KB. The knowledge base
is a group of sentences (here, "sentence" is used as a technical term;
it is not identical to a sentence in English).
The cycle of knowledge in an intelligent agent involves the following components:
o Perception
o Learning
o Knowledge Representation and Reasoning
o Planning
o Execution
Techniques of knowledge representation
There are mainly four ways of knowledge representation which are given as
follows:
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals
with propositions and has no ambiguity in representation. Logical
representation means drawing a conclusion based on various conditions. This
representation lays down some important communication rules. It consists of
precisely defined syntax and semantics which support sound inference.
Each sentence can be translated into logic using syntax and semantics.
Syntax:
o Syntax consists of the rules which decide how we can construct legal
sentences in the logic.
o It determines which symbols we can use in knowledge representation,
and how to write those symbols.
Semantics:
o Semantics consists of the rules by which we can interpret a sentence in the
logic.
o Semantics also involves assigning a meaning to each sentence.
Logical representation can be categorised into mainly two logics:
a. Propositional Logics
b. Predicate logics
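To make the syntax/semantics distinction concrete, here is a minimal propositional-logic evaluator sketch in Python; the tuple-based sentence syntax is an assumption for illustration:

```python
# Sentences are built by syntax rules (nested tuples); an interpretation
# (the semantics) maps proposition symbols to truth values.
def evaluate(sentence, interpretation):
    if isinstance(sentence, str):                 # atomic proposition
        return interpretation[sentence]
    op, *args = sentence
    if op == 'not':
        return not evaluate(args[0], interpretation)
    if op == 'and':
        return evaluate(args[0], interpretation) and evaluate(args[1], interpretation)
    if op == 'or':
        return evaluate(args[0], interpretation) or evaluate(args[1], interpretation)
    if op == 'implies':
        return (not evaluate(args[0], interpretation)) or evaluate(args[1], interpretation)
    raise ValueError(f"unknown connective: {op}")

# "If it rains and I am outside, then I get wet"
sentence = ('implies', ('and', 'Rain', 'Outside'), 'Wet')
print(evaluate(sentence, {'Rain': True, 'Outside': True, 'Wet': False}))  # False
```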
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes
and their values to describe an entity in the world. Frames are an AI data
structure which divides knowledge into substructures by representing
stereotyped situations. A frame consists of a collection of slots and slot values.
These slots may be of any type and size. Slots have names and values,
which are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features
of frames which enable us to put constraints on the frames. Example: IF-
NEEDED facets are invoked when the data of a particular slot is needed. A frame
may consist of any number of slots, a slot may include any number of
facets, and facets may have any number of values. A frame is also known
as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our
modern-day classes and objects. A single frame is not of much use on its own. A frame
system consists of a collection of frames which are connected. In a frame,
knowledge about an object or event can be stored together in the knowledge
base. Frames are a type of technology which is widely used in various
applications including natural language processing and machine vision.
Example 1:
Let's take an example of a frame for a book:
Slots    Fillers
Year     1996
Page     1152
Example 2:
Let's suppose we are taking an entity, Peter. Peter is an engineer by
profession, his age is 25, he lives in the city of London, and his country is
England. The following is the frame representation for this:
Slots             Fillers
Name              Peter
Profession        Engineer
Age               25
Marital status    Single
Weight            78
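A frame maps naturally onto a record-like structure; here is a minimal sketch of the Peter frame as a Python dictionary of slot-filler pairs (the facet machinery is omitted for brevity):

```python
# A frame as slot -> filler pairs; facets (constraints, IF-NEEDED procedures)
# could be attached per slot in a fuller implementation.
peter = {
    "Name": "Peter",
    "Profession": "Engineer",
    "Age": 25,
    "Marital status": "Single",
    "Weight": 78,
    "City": "London",      # a filler may itself be another frame
    "Country": "England",
}
print(peter["Profession"])   # Engineer
```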
4. Production Rules
A production rules system consists of (condition, action) pairs, which mean "IF
condition THEN action". It has mainly three parts:
o The set of production rules
o Working memory
o The recognize-act cycle
The working memory contains the description of the current state of problem-
solving, and rules can write knowledge to the working memory. This knowledge
may then match and fire other rules.
If a new situation (state) is generated, then multiple production rules may be
fired together; this set of matching rules is called the conflict set. In this situation, the
agent needs to select a rule from the set, and that selection is called conflict resolution.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the
bus).
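A minimal sketch of these bus rules as a production system in Python; working memory is a set of facts, conditions are tested against it, and picking the first matching rule is a deliberately trivial conflict-resolution strategy:

```python
# Each rule is a (condition over working memory, action name) pair.
rules = [
    (lambda wm: "at bus stop" in wm and "bus arrives" in wm, "get into the bus"),
    (lambda wm: "on the bus" in wm and "paid" in wm and "empty seat" in wm, "sit down"),
    (lambda wm: "on the bus" in wm and "unpaid" in wm, "pay charges"),
    (lambda wm: "bus arrives at destination" in wm, "get down from the bus"),
]

working_memory = {"on the bus", "unpaid"}
conflict_set = [action for condition, action in rules if condition(working_memory)]
# Conflict resolution: here, simply take the first matching rule.
print(conflict_set[0] if conflict_set else "no rule fires")   # pay charges
```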
Logical Connectives:
Logical connectives are used to connect simpler propositions or to represent a
sentence logically. We can create compound propositions with the help of
logical connectives. There are mainly five connectives, which are given as
follows:
o Negation (¬)
o Conjunction (∧)
o Disjunction (∨)
o Implication (→)
o Biconditional (↔)
Inference:
In artificial intelligence, we need intelligent computers which can create new
logic from old logic or from evidence; generating conclusions from
evidence and facts is termed inference.
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P→Q is true and Q→R is true,
then P→R is true. It can be represented by the following notation:
P→Q, Q→R ⊢ P→R
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P∨Q is true and ¬P is true, then Q
will be true. It can be represented as:
P∨Q, ¬P ⊢ Q
5. Addition:
The Addition rule is one of the common inference rules, and it states that if P is
true, then P∨Q will be true:
P ⊢ P∨Q
6. Simplification:
The Simplification rule states that if P∧Q is true, then P or Q (individually) will
also be true. It can be represented as:
P∧Q ⊢ P and P∧Q ⊢ Q
Conceptual Dependency
Conceptual dependency theory is a model of natural language
understanding used in artificial intelligence systems.
• The actions are built up from a set of primitive acts, which can be modified
by tense.
The expert system is a part of AI, and the first ES was developed in the year 1970;
it was among the first successful applications of artificial intelligence. It solves the
most complex issues, as an expert would, by extracting the knowledge stored in its knowledge base.
The system helps in decision making for complex problems using both facts and
heuristics like a human expert. It is called so because it contains the expert
knowledge of a specific domain and can solve any complex problem of that particular
domain. These systems are designed for a specific domain, such as medicine,
science, etc.
The performance of an expert system is based on the expert's knowledge stored in its
knowledge base. The more knowledge stored in the KB, the more that system improves
its performance. One of the common examples of an ES is a suggestion of spelling
errors while typing in the Google search box.
Below is the block diagram that represents the working of an expert system:
1. User Interface
With the help of a user interface, the expert system interacts with the user, takes
queries as an input in a readable format, and passes it to the inference engine. After
getting the response from the inference engine, it displays the output to the user. In
other words, it is an interface that helps a non-expert user to communicate with
the expert system to find a solution.
2. Inference Engine
o Forward Chaining: It starts from the known facts and rules, and applies the inference
rules to add their conclusions to the known facts.
o Backward Chaining: It is a backward reasoning method that starts from the goal and
works backward to prove the known facts.
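A minimal forward-chaining sketch in Python; the rules and facts here are illustrative, not from the text:

```python
# Rules as (premises, conclusion); forward chaining fires any rule whose
# premises are all known and adds its conclusion to the fact base,
# repeating until nothing new can be derived.
rules = [
    ({"fever", "rash"}, "suspect measles"),
    ({"suspect measles", "unvaccinated"}, "order blood test"),
]
facts = {"fever", "rash", "unvaccinated"}

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True
print(facts)   # includes the derived conclusions
```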
3. Knowledge Base
o The knowledge base is a type of storage that stores knowledge acquired from
different experts of the particular domain. It can be considered a large store of knowledge.
The larger the knowledge base, the more precise the expert system will be.
o It is similar to a database that contains information and rules of a particular domain or
subject.
o One can also view the knowledge base as collections of objects and their attributes.
For example, a lion is an object, and its attributes are that it is a mammal, it is not a
domestic animal, etc.
Planning is an important part of Artificial Intelligence which deals with the tasks
and domains of a particular problem. Planning is considered the logical side of acting.
Everything we humans do is with a definite goal in mind, and all our actions are
oriented towards achieving our goal. Similarly, Planning is also done for Artificial
Intelligence.
We have Forward State Space Planning (FSSP) and Backward State Space
Planning (BSSP) at the basic level.
Hierarchical Planning
Another interesting concept related to planning is hierarchical
planning:
In complex problems, reaching the goal state from the initial state
can be challenging.
Hierarchical planning involves eliminating some problem details
until a solution is found. These details are not removed from the
actual description of operators but are deferred.
Once a high-level solution is obtained, the missing details are filled
in.
o Imagine a table with several blocks placed on it. Some blocks may be stacked
on top of others, and we have a robot arm capable of picking up or putting
down these blocks.
o The robot arm can move only one block at a time, and it must ensure that no
other block is stacked on top of the one it intends to move.
o Our objective is to transform the configuration of blocks from the initial
state to the goal state.
STRIPS
What is STRIPS?
o STRIPS is an acronym for "STanford Research Institute Problem
Solver." Developed at the Stanford Research Institute in the early 1970s, it was initially
designed for use with a robotic arm. However, its applications extend beyond
robotics to various other planning problems.
o At its core, STRIPS aims to find a solution given a domain (which
describes the world) and a problem (which defines the initial state and goal
condition).
2. How Does STRIPS Work?
o Describing the World:
In STRIPS, you start by describing the world using several
components:
Objects: These represent entities in the game world (e.g.,
ogres, trolls, dragons, magical items).
Actions: Specify what can be done (e.g., picking up items,
building weapons).
Preconditions: Conditions that must be met before an action
can be executed.
Effects: Changes that occur after an action is performed.
This description sets the stage for the planning process.
o Creating a Problem Set:
A problem consists of an initial state (where things begin) and a goal
condition (what you want to achieve).
STRIPS then searches through all possible states, starting from the
initial state, and executes various actions until it reaches the goal.
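A minimal Python sketch of this state-space view, where a state is a set of facts and a STRIPS action has preconditions plus add/delete effects; the blocks-world action and predicate names are illustrative assumptions:

```python
# A STRIPS-style action: applicable when its preconditions hold in the state;
# applying it deletes some facts and adds others.
class Action:
    def __init__(self, name, preconditions, add_effects, del_effects):
        self.name = name
        self.preconditions = set(preconditions)
        self.add_effects = set(add_effects)
        self.del_effects = set(del_effects)

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.del_effects) | self.add_effects

pickup_b = Action("pickup(B)",
                  preconditions={"clear(B)", "ontable(B)", "handempty"},
                  add_effects={"holding(B)"},
                  del_effects={"clear(B)", "ontable(B)", "handempty"})

state = {"clear(B)", "ontable(B)", "handempty"}
if pickup_b.applicable(state):
    print(pickup_b.apply(state))   # {'holding(B)'}
```

A planner then searches over such states, applying applicable actions until the goal condition is a subset of the current state.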
3. Using PDDL: Planning Domain Definition Language
o PDDL (Planning Domain Definition Language) is a common language for
writing STRIPS domain and problem sets.
o It allows you to express most of the code using English words, making it
readable and understandable.
o Writing simple AI planning problems using PDDL is relatively
straightforward.
4. What Can STRIPS Solve?
o A wide range of problems can be tackled using STRIPS and PDDL:
Stacking Blocks: Imagine arranging blocks in a specific order.
Rubik’s Cube: Solving the classic puzzle.
Navigating a Robot: In scenarios like Shakey’s World.
Starcraft Build Orders: Planning optimal strategies.
And much more!
Phases of NLP
There are the following five phases of NLP:
1. Lexical Analysis
2. Syntactic Analysis (Parsing)
3. Semantic Analysis
4. Discourse Integration
5. Pragmatic Analysis
Parsing techniques in NLP:
Top-Down and Bottom-Up parsing techniques
Tokenization: the process of breaking text into individual words or phrases
Part-of-speech tagging: the process of labelling each word in a sentence with its
grammatical part of speech
Stemming: a technique that comes from morphology and information retrieval
which is used in natural language processing for pre-processing and efficiency
purposes
Text Segmentation
Named Entity Recognition
Relationship Extraction
Sentiment Analysis
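A minimal Python sketch of tokenization plus a toy suffix-stripping stemmer; the suffix rules are illustrative (a real system would use a proper stemmer such as the Porter stemmer):

```python
import re

def tokenize(text):
    # Break text into lowercase word tokens.
    return re.findall(r"[a-z']+", text.lower())

def stem(word):
    # Toy suffix stripping; real stemmers use ordered, context-aware rule sets.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = tokenize("The parsers were parsing tokenized sentences")
print([stem(t) for t in tokens])
```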
Now consider parsing techniques in the context of compiler design. Parsing is a crucial phase in the
compilation process, where a token string (usually generated by lexical analysis) is
transformed into an Intermediate Representation (IR) based on the given grammar. The
parser, also known as the Syntax Analyzer, plays a pivotal role in this process.
Here are the primary types of parsers:
1. Top-Down Parser:
o The top-down parser constructs the parse tree by expanding non-
terminals using grammar productions. It starts from the start symbol and
proceeds towards the terminals.
o Two subtypes of top-down parsers are:
Recursive Descent Parser: Also known as the Brute
Force or Backtracking parser, it generates the parse tree using brute
force and backtracking.
Non-Recursive Descent Parser (LL(1)): This parser employs
a parsing table to generate the parse tree without backtracking.
o Useful for: Simple grammars and LL(1) languages.
2. Bottom-Up Parser:
o The bottom-up parser constructs the parse tree by reducing the input string. It
starts from the terminals and works its way up to the start symbol.
o Two subtypes of bottom-up parsers are:
LR Parser: Generates the parse tree using unambiguous grammar. It
has four variants: LR(0), SLR(1), LALR(1), and CLR(1).
Operator Precedence Parser: Constructs the parse tree based on
operator grammars, where consecutive non-terminals do not appear
without any terminal in between.
o Useful for: Handling complex language constructs.
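As a concrete illustration of the top-down (recursive descent) idea, here is a minimal sketch for the toy grammar E → T ('+' T)*, T → NUMBER; the grammar and token format are assumptions for illustration:

```python
# Recursive descent: each non-terminal becomes a function, and the parser
# expands from the start symbol E down toward the terminals.
def parse_expression(tokens, pos=0):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        rhs, pos = parse_term(tokens, pos + 1)
        value += rhs                       # evaluate while parsing
    return value, pos

def parse_term(tokens, pos):
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected number at position {pos}")
    return int(tokens[pos]), pos + 1

print(parse_expression(['2', '+', '3', '+', '4'])[0])   # 9
```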
Types of Agents
Agents can be grouped into five classes based on their degree of perceived
intelligence and capability:
Simple Reflex Agents
Model-Based Reflex Agents
Goal-Based Agents
Utility-Based Agents
Learning Agent
Multi-agent systems
Hierarchical agents
Agents Vs Objects
• Agents are a natural extension/evolution of objects, but with some very
fundamental differences:
– Level of autonomy
– Stronger design metaphor
– High-level interactions
– Supporting organizational structure
– Proactivity
– Separate thread of execution
– Independent life span
Agents Vs Objects
• It is about adding new abstraction entities:
– OOP = structured programming + objects that have persistent local states
– AOP = OOP + agents that have an independent execution thread +
pro-activity + a greater level of autonomy over themselves
• An agent is able to act in a goal-directed fashion rather than just
passively reacting to procedure calls – "An agent can say no!"
Semantic Web
Semantic Web is an extension to the World Wide Web. The purpose of the
semantic web is to provide structure to the web and data in general. It
emphasizes representing a web of data instead of a web of documents. It allows
computers to intelligently search, combine, and process web content based on
the meaning that the content has. Three main models of the semantic web are:
1. Building models
2. Computing with Knowledge
3. Exchanging Information
Building Models:
A model is a simplified version or description of certain aspects of
real-world entities. A model gathers information which is useful for the
understanding of the particular domain.
Computing Knowledge:
Conclusions can be obtained from the knowledge present.
Example: If two sentences are given as 'John is the son of Harry' and
another sentence given is 'Harry's father is Joey', then the knowledge
that can be computed from them is 'John is the grandson of Joey'.
Similarly, another example useful in the understanding of computing
knowledge is-
'All A is B' and 'All B is C', then the conclusion that can be drawn from them
is 'All A is C'.
Exchanging Information:
It is an important aspect. Various communication protocols and formats have been
implemented for the exchange of information, such as TCP/IP, HTML, and the
WWW. Web services have also been used for the exchange of data.
Agent Communication:
b) Voyager
Voyager [10, 1, 9] is an agent development tool developed by
ObjectSpace in mid-1996. ObjectSpace was taken over by
Recursion Software Inc. in 2001, and Voyager is now their commercial
product.
c) JADE JADE (Java Agent DEvelopment Framework) [13, 3, 4, 6, 5,
17] is a software framework fully implemented in the Java language. It is
developed by TILab for the development of multi-agent applications
based on a peer-to-peer communication architecture.
d) Anchor The Anchor [12] agent toolkit is developed by Lawrence Berkeley
National Laboratory, USA. It facilitates the transmission and secure
management of mobile agents in heterogeneous distributed
environments. This toolkit is available under a BSD-style license.
e) Zeus Zeus [13, 7, 18] is an integrated environment for the rapid
development of collaborative agent applications, developed by the
Advanced Applications & Technology Department of British
Telecommunications labs.
Fuzzy set:
The word 'fuzzy' refers to things that are not clear or are vague. In real life, we sometimes
cannot decide whether a given problem or statement is true or false. In such cases, this
concept provides many values between true and false and gives us the flexibility to find the
best solution to the problem.
Fuzzy set theory is a mathematical framework that extends classical set theory by
allowing elements to have degrees of membership rather than a strict binary
classification. Here are the key points:
1. Classical Sets vs. Fuzzy Sets:
o In classical set theory, an element either belongs or does not
belong to a set (bivalent condition).
o In contrast, fuzzy set theory permits gradual assessment of
membership. Elements can have partial membership based on
a membership function that assigns a value between 0 and 1.
o Fuzzy sets generalize classical sets, where the indicator functions of
classical sets are special cases of the membership functions of fuzzy
sets.
2. Definition:
o A fuzzy set is defined by a pair: a reference set (universe of
discourse) and a membership function.
o The membership function assigns a value to each element in the
reference set, representing the grade of membership.
o For example, if we have a fuzzy set denoted as (A), the membership
function (m(x)) describes how much element (x) belongs to (A).
3. Membership Levels:
o An element can be:
Not included in the fuzzy set if (m(x) = 0) (no membership).
Fully included if (m(x) = 1) (full membership).
Partially included if (0 < m(x) < 1) (fuzzy membership).
4. Applications:
o Fuzzy set theory is used in various domains where information
is incomplete or imprecise, such as:
Bioinformatics
Linguistics
Decision-making
Clustering
5. Example:
o Imagine a fuzzy set representing “tall buildings.” Instead of
categorizing a building as tall or not tall, fuzzy set theory allows us to
express the degree of tallness for each building.
Fuzzy sets provide a powerful way to handle imprecise information and vagueness,
making them valuable in practical applications.
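As a minimal sketch of the "tall buildings" example, the following piecewise-linear membership function assigns each building a degree of tallness; the 20 m and 50 m breakpoints are assumptions for illustration:

```python
def tall_membership(height_m):
    # Degree of membership in "tall": 0 below 20 m, 1 above 50 m,
    # linearly increasing in between.
    if height_m <= 20:
        return 0.0
    if height_m >= 50:
        return 1.0
    return (height_m - 20) / (50 - 20)

for h in (10, 35, 80):
    print(h, "->", round(tall_membership(h), 2))   # 0.0, 0.5, 1.0
```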
Notion of fuzziness:
The notion of fuzziness refers to the quality of being unclear, vague,
or imprecise. In various contexts, we encounter situations where it’s challenging to
determine whether a state is strictly true or completely false. Fuzzy logic provides
a valuable approach to reasoning in such scenarios, allowing us to
consider inaccuracies and uncertainties.
Here are some key points about fuzzy logic:
1. Definition: Fuzzy logic is a form of many-valued logic where the truth values
of variables can be any real number between 0 and 1. Unlike traditional
binary logic (true or false), fuzzy logic accommodates shades of gray in
between.
2. Applications: Fuzzy logic finds applications in various fields, including:
o Control systems: It’s used to handle imprecise information in control
processes.
o Image processing: Fuzzy techniques enhance image analysis and
feature extraction.
o Natural language processing: Fuzzy logic aids in understanding and
processing ambiguous language.
o Medical diagnosis: It deals with uncertain medical data.
o Artificial intelligence: Fuzzy systems model human reasoning.
3. Membership Function: The fundamental concept in fuzzy logic is
the membership function. It maps an input value to a membership degree
between 0 and 1, representing the degree of belonging to a certain set or
category.
4. Fuzzy Rules: Fuzzy logic operates using if-then rules that express
relationships between input and output variables in a fuzzy manner.
5. Output: The output of a fuzzy logic system is a fuzzy set, which provides
membership degrees for each possible output value.
In summary, fuzzy logic allows for partial truths and is a mathematical method for
handling vagueness and uncertainty in decision-making. It recognizes that the world
isn’t always black and white, but rather a spectrum of possibilities.
Fuzzification is the process of converting a crisp quantity into a fuzzy quantity. On the
other hand, defuzzification is the process of translating a fuzzy quantity into a crisp
quantity.
What is Fuzzification?
Fuzzification may be defined as the process of transforming a crisp set into a fuzzy set, or
a fuzzy set into a fuzzier set. Basically, this operation translates accurate crisp input values
into linguistic variables. In a number of engineering applications, it is necessary to
defuzzify the result, or rather the "fuzzy result", so that it can be converted into a crisp result.
Fuzzification translates the crisp input data into linguistic variables which are represented
by fuzzy sets. After that, it applies the membership functions to measure and determine
the degree of membership.
What is Defuzzification?
Defuzzification may be defined as the process of reducing a fuzzy set into a crisp set or
to convert a fuzzy member into a crisp member. The process of
defuzzification is also informally called "rounding off". Defuzzification basically transforms
imprecise data into precise data. However, it is relatively complex to implement
defuzzification as compared to fuzzification.
Conclusion
The most significant difference to note here is that fuzzification converts
precise data into imprecise data, while defuzzification converts imprecise data into
precise data.
Union:
In the case of the union of crisp sets, we simply have to select repeated
elements only once. In the case of fuzzy sets, when there are common
elements in both fuzzy sets, we should select the element with
the maximum membership value.
The union of two fuzzy sets A and B is a fuzzy set C, written
as C = A ∪ B
C = A ∪ B = {(x, μA ∪ B (x)) | ∀x ∈ X}
μC(x) = μA ∪ B (x) = μA(x) ∨ μB(x)
= max( μA(x), μB(x) ), ∀x ∈ X
Graphically, we can represent union operations as follows: Red and Blue
membership functions represent the fuzzy value for elements in sets A
and B, respectively. Wherever these fuzzy functions overlap, we have to
consider the point with the maximum membership value.
Example of Fuzzy Union:
C = A ∪ B = {(x, μA ∪ B (x)) | ∀x ∈ X}
A = { (x1, 0.2), (x2, 0.5), (x3, 0.6), (x4, 0.8), (x5, 1.0) }
B = { (x1, 0.8), (x2, 0.6), (x3, 0.4), (x4, 0.2), (x5, 0.1) }
μA ∪ B (x1) = max( μA(x1), μB(x1) ) = max { 0.2, 0.8 } = 0.8
μA ∪ B (x2) = max( μA(x2), μB(x2) ) = max { 0.5, 0.6 } = 0.6
μA ∪ B (x3) = max( μA(x3), μB(x3) ) = max { 0.6, 0.4 } = 0.6
μA ∪ B (x4) = max( μA(x4), μB(x4) ) = max { 0.8, 0.2 } = 0.8
μA ∪ B (x5) = max( μA(x5), μB(x5) ) = max { 1.0, 0.1 } = 1.0
So, A ∪ B = { (x1, 0.8), (x2, 0.6), (x3, 0.6), (x4, 0.8), (x5, 1.0) }
Intersection:
In the case of the intersection of crisp sets, we simply have to select
common elements from both sets. In the case of fuzzy sets, when there
are common elements in both fuzzy sets, we should select the element
with minimum membership value.
The intersection of two fuzzy sets A and B is a fuzzy set C, written
as C = A ∩ B
C = A ∩ B = {(x, μA ∩ B (x)) | ∀x ∈ X}
μC(x) = μA ∩ B (x) = μA(x) ⋀ μB(x)
= min( μA(x), μB(x) ), ∀x ∈ X
Graphically, we can represent the intersection operation as follows: Red
and blue membership functions represent the fuzzy value for elements in
sets A and B, respectively. Wherever these fuzzy functions overlap, we
have to consider the point with the minimum membership value.
Unlike crisp sets, fuzzy sets do not hold the law of contradiction and the
law of excluded middle.
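The max/min definitions above can be checked directly in code; a minimal sketch using the example sets A and B from the union example:

```python
A = {'x1': 0.2, 'x2': 0.5, 'x3': 0.6, 'x4': 0.8, 'x5': 1.0}
B = {'x1': 0.8, 'x2': 0.6, 'x3': 0.4, 'x4': 0.2, 'x5': 0.1}

union        = {x: max(A[x], B[x]) for x in A}   # pointwise maximum
intersection = {x: min(A[x], B[x]) for x in A}   # pointwise minimum

print(union)          # {'x1': 0.8, 'x2': 0.6, 'x3': 0.6, 'x4': 0.8, 'x5': 1.0}
print(intersection)   # {'x1': 0.2, 'x2': 0.5, 'x3': 0.4, 'x4': 0.2, 'x5': 0.1}
```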
Fuzzy Functions and Linguistic Variables:
1. Linguistic Variables:
o In traditional mathematics, variables typically take numeric values. However,
in fuzzy logic, we often encounter linguistic variables to express concepts
more intuitively.
o For instance, consider the variable “Age”. Instead of using precise numeric
values, we can define linguistic terms like “Child,” “Young,” and “Old.”
o The linguistic variable “AGE” can be represented as:
AGE = {Child, Young, Old}
o Each linguistic term (e.g., “Child”) has a membership function associated
with a specific age range. These membership values help determine whether a
person falls into the category of a child, young person, or elderly individual.
o For example, if someone’s age is 11, their membership values might be
approximately:
Child: 0.75
Young: 0.2
Old: 0
o The formal definition of a linguistic variable is a quintuple:
(x, T(x), U, G, M)
x: Variable name (e.g., AGE)
T(x): Set of linguistic terms (e.g., {Child, Young, Old})
U: Universe (the range of possible values)
G: Syntactical rules that modify the linguistic terms
M: Semantic rules associated with each linguistic term (giving
meaning to the terms)
Example:
Given A = { (a1, 0.2), (a2, 0.7), (a3, 0.4) } and B = { (b1, 0.5), (b2, 0.6) },
Complement:
R^c = { ((a, b), μR^c(a, b)) }
μR^c(a, b) = 1 − μR(a, b)
...
μR^c(x3, y4) = 1 − μR(x3, y4) = 1 − 0.8 = 0.2
Projection:
The projection of R on X:
∏X(x) = sup{ R(x, y) | y ∈ Y }
The projection of R on Y:
∏Y(y) = sup{ R(x, y) | x ∈ X }
Fuzzy rules:
Fuzzy rules are used within fuzzy logic systems to infer an output based on input
variables. Modus ponens and modus tollens are the most important rules of inference. A modus
ponens rule is in the form
Premise: x is A
Implication: IF x is A THEN y is B
Consequent: y is B
In crisp logic, the premise x is A can only be true or false. However, in a fuzzy rule,
the premise x is A and the consequent y is B can be true to a degree, instead of
entirely true or entirely false. This is achieved by representing the linguistic
variables A and B using fuzzy sets. In a fuzzy rule, modus ponens is extended
to generalised modus ponens:
Premise: x is A*
Implication: IF x is A THEN y is B
Consequent: y is B*
The key difference is that the premise x is A can be only partially true.
As a result, the consequent y is B is also partially true. Truth is
represented as a real number between 0 and 1, where 0 is false and 1
is true.
Fuzzy inference:
Fuzzy inference is a fundamental concept in fuzzy logic, which allows us to make decisions
based on imprecise or uncertain information. Let’s explore it further:
1. Definition:
o Fuzzy inference is the process of mapping from a given input to an output
using fuzzy logic.
o It provides a basis for making decisions or discerning patterns when dealing
with inexact or vague data.
2. Components of Fuzzy Inference System (FIS):
o FIS is the key unit of a fuzzy logic system responsible for decision-making.
o It uses IF…THEN rules along with connectors such as “OR” or “AND” to
draw essential decision rules.
o Key characteristics of FIS include:
The output from FIS is always a fuzzy set, regardless of whether the
input is fuzzy or crisp.
A defuzzification unit converts fuzzy variables into crisp variables
when FIS is used as a controller.
3. Functional Blocks of FIS:
o Rule Base: Contains fuzzy IF-THEN rules.
o Database: Defines the membership functions of fuzzy sets used in fuzzy rules.
o Decision-making Unit: Performs operations on rules.
o Fuzzification Interface Unit: Converts crisp quantities into fuzzy quantities.
o Defuzzification Interface Unit: Converts fuzzy quantities into crisp
quantities.
4. Methods of FIS:
o Mamdani Fuzzy Inference System:
Proposed by Ebrahim Mamdani in 1975.
Steps for computing the output:
1. Determine a set of fuzzy rules.
2. Fuzzify the input using input membership functions.
3. Combine fuzzified inputs according to fuzzy rules to establish
rule strength.
4. Determine the consequent of the rule by combining rule
strength and output membership function.
5. Combine all consequents to obtain the output distribution.
6. Finally, defuzzify the output distribution.
o Takagi-Sugeno Fuzzy Model (TS Method):
Proposed by Takagi, Sugeno, and Kang in 1985.
Format of rules: IF x is A and y is B THEN Z = f(x, y).
Here, A and B are fuzzy sets in antecedents, and Z = f(x, y) is a crisp
function in the consequent.
5. Application Areas:
o FIS has been successfully applied in fields such as:
Automatic control
Data classification
Decision analysis
Expert systems
And more!
Remember that fuzzy inference allows us to handle uncertainty and imprecision, making it a
powerful tool in various domains.
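To make the pipeline concrete, here is a heavily simplified single-rule Mamdani-style sketch in Python with centroid defuzzification; the membership functions, the temperature/fan-speed rule, and all numbers are illustrative assumptions, not from the text above:

```python
# One rule: IF temperature is "hot" THEN fan_speed is "high".
def hot(t):            # membership of temperature in "hot"
    return max(0.0, min(1.0, (t - 25) / 10))

def high(speed):       # membership of fan speed (0..100) in "high"
    return max(0.0, min(1.0, speed / 100))

temperature = 31.0
strength = hot(temperature)                  # fuzzify input, get rule strength

speeds = [s / 10 for s in range(0, 1001)]    # sampled output universe 0..100
clipped = [min(strength, high(s)) for s in speeds]   # min-implication

# Centroid defuzzification of the clipped output fuzzy set.
crisp = sum(s * m for s, m in zip(speeds, clipped)) / sum(clipped)
print(round(crisp, 1))   # a crisp fan speed
```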
The various steps involved in designing a fuzzy logic controller are as follows:
Step 1: Locate the input, output, and state variables of the plant under
consideration.
Step 2: Split the complete universe of discourse spanned by each variable
into a number of fuzzy subsets, assigning each with a linguistic label. The
subsets include all the elements in the universe.
Step 3: Obtain the membership function for each fuzzy subset.
Step 4: Assign the fuzzy relationships between the inputs or states of
fuzzy subsets on one side and the output of fuzzy subsets on the other side,
thereby forming the rule base.
Step 5: Choose appropriate scaling factors for the input and output
variables for normalizing the variables to the [0, 1] or [-1, 1] interval.
Step 6: Carry out the fuzzification process.
Step 7: Identify the output contributed from each rule using fuzzy
approximate reasoning.
Step 8: Combine the fuzzy outputs obtained from each rule.
Step 9: Finally, apply defuzzification to form a crisp output.
The above steps are performed and executed for a simple FLC system. The
following design elements are adopted for designing a general FLC system:
1. Fuzzification strategies and the interpretation of a fuzzifier.
2. Fuzzy knowledge base: Normalization of the parameters involved;
partitioning of input and output spaces; selection of membership functions
of a primary fuzzy set.
3. Fuzzy rule base: Selection of input and output variables; the source from
which fuzzy control rules are to be derived; types of fuzzy control rules;
completeness of fuzzy control rules.
4. Decision-making logic: The proper definition of fuzzy implication;
interpretation of connective "and"; interpretation of connective "or";
inference engine.
5. Defuzzification strategies and the interpretation of a defuzzifier.
Applications:
FLC systems find a wide range of applications in various industrial and commercial
products and systems. In several applications – especially those related to nonlinear,
time-varying, ill-defined, and complex systems – FLC systems have proved to be very
efficient in comparison with other conventional control systems. The applications of
FLC systems include:
1. Traffic Control
2. Steam Engine
3. Aircraft Flight Control
4. Missile Control
5. Adaptive Control
6. Liquid-Level Control
7. Helicopter Model
In summary, fuzzy rule-based systems (FRBSs) provide a powerful way to reason and make
decisions in complex scenarios, leveraging the flexibility and expressiveness of fuzzy logic.
Genetic Algorithms
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to
the larger class of evolutionary algorithms. Genetic algorithms are based on the
ideas of natural selection and genetics. They are an intelligent exploitation of
random search, provided with historical data, to direct the search into the
region of better performance in the solution space. They are commonly used to
generate high-quality solutions for optimization problems and search
problems.
Genetic algorithms simulate the process of natural selection, which means those
species that can adapt to changes in their environment survive, reproduce,
and go on to the next generation. In simple words, they simulate "survival of the fittest"
among individuals of consecutive generations to solve a problem. Each generation
consists of a population of individuals, and each individual represents a point in the
search space and a possible solution. Each individual is represented as a string of
characters/integers/floats/bits. This string is analogous to a chromosome.
Genetic algorithms (GAs) are optimization techniques inspired by the process of natural
selection. They mimic the principles of evolution to find optimal solutions to complex
problems. Let’s delve into the encoding strategies used in GAs:
1. Binary Encoding:
o Most common method: Chromosomes are represented as strings of 1s and 0s.
o Each position in the chromosome corresponds to a specific characteristic of
the solution.
o Well-suited for optimization problems in a discrete search space.
o For example, in binary encoding, each gene controls a particular trait, just like
genes in DNA encode traits in living organisms.
2. Permutation Encoding:
o Useful for problems involving ordering, such as the Traveling Salesman
Problem (TSP).
o In TSP, each chromosome is a string of numbers, where each number
represents a city to be visited.
o The order of cities in the chromosome determines the tour route.
3. Value Encoding:
o Applied when complex values (e.g., real numbers) are involved.
o Binary encoding may not suffice for such cases.
o Requires specific crossover and mutation techniques tailored to these
chromosomes.
o Used in various domains, including engineering, finance, medical diagnostics,
artificial intelligence, and logistics.
Remember, the choice of encoding method significantly impacts the GA’s performance and
convergence to optimal solutions.
genetic operators play a crucial role in guiding the algorithm toward finding solutions to
specific problems. Let’s explore these operators:
1. Mutation Operator:
o Mutation is a unary operator that operates on individual
chromosomes (solutions).
o Its purpose is to introduce genetic diversity by randomly altering one or more
genes within a chromosome.
o By doing so, it prevents the algorithm from getting stuck in local optima and
allows exploration of different regions of the solution space.
o Think of it as a way to introduce small, random changes to the genetic makeup
of an individual.
o For example, if we’re optimizing a set of parameters, mutation might tweak
one of those parameters slightly.
2. Crossover Operator (Recombination):
o Crossover is a binary operator that combines two parent chromosomes to
create a new child chromosome.
o It mimics the process of genetic recombination in natural evolution.
o By recombining portions of good solutions, the algorithm is more likely to
create better offspring.
o Different methods exist for combining parent solutions, such as edge
recombination, cut and splice crossover, and uniform crossover.
o The choice of crossover method often depends on the problem being solved
and the representation of the solution.
o For instance, if variables are grouped together as building blocks, a respectful
crossover operator is essential to maintain their integrity.
3. Selection Operator:
o Selection operators give preference to better solutions (chromosomes) based
on some fitness function.
o The best solutions are chosen to pass their genes to the next generation.
o Methods like fitness proportionate selection and tournament selection help
determine the best solutions.
o Elitism, where the best solutions directly pass to the next generation without
mutation, is also a form of selection.
4. Inversion (Permutation) Operator (less commonly used):
o This operator is rarely discussed and its effectiveness remains uncertain.
o It involves reversing a portion of a chromosome.
o While it’s not widely used, it’s interesting to note its existence.
Let's return to Genetic Algorithms (GAs) and explore the concepts of fitness functions and the GA cycle.
1. Fitness Function:
o The fitness function is a crucial component in GAs. It evaluates how "fit" or
"good" a candidate solution (individual) is with respect to the problem being
considered.
o Specifically, the fitness function takes a candidate solution as input and
produces a quantitative measure of its fitness.
o In most cases, the fitness function aligns with the objective function of the
problem. For optimization tasks, the goal is either
to maximize or minimize the objective function.
o Characteristics of a good fitness function:
It should be fast to compute, as the fitness value is calculated
repeatedly during the GA process.
It must quantitatively measure the fitness of a solution or the
potential of producing fit individuals from that solution.
In complex problems, direct calculation of the fitness function may not
be feasible due to inherent complexities. In such cases, fitness
approximation is used to suit our needs.
o For example, consider the 0/1 Knapsack problem. A simple fitness function
might sum the profit values of the selected items (those with a value of 1) until
the knapsack is full.
2. GA Cycle:
o The GA operates in a loop over a specified number of generations.
o Key steps in each generation:
Evaluation: The fitness value of each individual in the population is
assessed using the fitness function.
Selection: Individuals with better fitness scores have a higher chance
of being selected for reproduction.
Reproduction: The current generation produces the next generation
through genetic operators (such as crossover and mutation) based on
the selected individuals.
o The process continues iteratively, allowing the population to evolve toward
better solutions.
Remember, GAs mimic natural selection and evolution, making them powerful tools for
optimization and search problems.
Algorithm:
This chromosome undergoes mutation. During mutation, the positions of two cities in
the chromosome are swapped to form a new configuration, except for the first and the last
cells, as they represent the start and end points.
The original chromosome had a path length equal to INT_MAX because, according to the input
defined below, the path between city 1 and city 4 didn't exist. After mutation,
the new child formed has a path length equal to 21, a much better
answer than the original. This is how the genetic algorithm optimizes
solutions to hard problems.
A genetic algorithm (GA) is a search and optimization technique inspired
by the process of natural selection and evolution. A GA works by
creating and maintaining a population of candidate solutions (called
individuals) to a given problem, and applying biologically inspired
operators such as mutation, crossover, and selection to evolve them
toward better solutions.
The main steps of a GA are:
Initialization: Generate an initial population of random individuals,
each representing a possible solution to the problem.
Evaluation: Calculate the fitness score of each individual, which
measures how well it solves the problem.
Selection: Select a subset of individuals from the current
population, based on their fitness scores, to be the parents of the
next generation.
Crossover: Combine two or more parents to create new offspring,
by exchanging some of their genetic information (such as bits,
characters, or numbers).
Mutation: Introduce some random changes in the offspring, by
flipping, swapping, or altering some of their genetic information.
Replacement: Replace the current population with the new
offspring, or keep some of the best individuals from the current
population.
Termination: Check if a stopping criterion is met, such as
reaching a maximum number of generations, finding an optimal
solution, or reaching a convergence point. If not, go back to the
evaluation step and repeat the process.
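These steps can be sketched in Python for the (a, b) example worked out below; the population size, bit length, seed, and genetic-operator choices are illustrative assumptions:

```python
import random

BITS = 8  # bits per variable; each individual encodes (a, b) in [1, 10]

def decode(bits):
    scale = 9 / (2 ** BITS - 1)
    a = 1 + int(bits[:BITS], 2) * scale
    b = 1 + int(bits[BITS:], 2) * scale
    return a, b

def fitness(bits):
    # Inverse of the expression; a small epsilon avoids division by zero.
    a, b = decode(bits)
    return 1.0 / (1e-9 + (a - 3) ** 2 + (b - 5) ** 2)

def mutate(bits):
    # Bit-flip mutation at one random position.
    i = random.randrange(len(bits))
    return bits[:i] + ('1' if bits[i] == '0' else '0') + bits[i + 1:]

random.seed(1)
population = [''.join(random.choice('01') for _ in range(2 * BITS))
              for _ in range(6)]                       # initialization

for generation in range(200):
    weights = [fitness(ind) for ind in population]      # evaluation
    parents = random.choices(population, weights=weights,
                             k=len(population))         # roulette-wheel selection
    offspring = []
    for p1, p2 in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, 2 * BITS)             # one-point crossover
        offspring.append(mutate(p1[:cut] + p2[cut:]))
        offspring.append(mutate(p2[:cut] + p1[cut:]))
    population = offspring                              # replacement

best = max(population, key=fitness)
print(decode(best))   # should be near (3, 5)
```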
Here is an example of applying a GA to find the optimal values of a and
b that satisfy the following expression:
f(a, b) = (a - 3)^2 + (b - 5)^2
The objective function is to minimize the value of the expression, which
is zero when a = 3 and b = 5.
The steps are:
Initialization: Generate six random pairs of a and b values
between 1 and 10, and encode them as binary strings of length
8 per variable (16 bits in total). For example, (a = 2.47, b = 6.84) can be encoded as
0010011001101100.

| Individual | Binary encoding  | Decimal values |
|------------|------------------|----------------|
| 1          | 0010011001101100 | (2.47, 6.84)   |
| 2          | 0100100110000111 | (4.77, 8.07)   |
| 3          | 0001110000011010 | (1.87, 2.66)   |
| 4          | 0110001010100101 | (6.21, 10.61)  |
| 5          | 0100000001110010 | (4.00, 7.38)   |
| 6          | 0011100110101001 | (3.51, 5.41)   |
Evaluation: Calculate the fitness score of each individual, which is
the inverse of the value of the expression. For example, the
fitness score of individual 1 is 1 / ((2.47 - 3)^2 + (6.84 - 5)^2) = 0.33.

| Individual | Binary encoding  | Decimal values | Fitness score |
|------------|------------------|----------------|---------------|
| 1          | 0010011001101100 | (2.47, 6.84)   | 0.33          |
| 2          | 0100100110000111 | (4.77, 8.07)   | 0.06          |
| 3          | 0001110000011010 | (1.87, 2.66)   | 0.14          |
| 4          | 0110001010100101 | (6.21, 10.61)  | 0.02          |
| 5          | 0100000001110010 | (4.00, 7.38)   | 0.08          |
| 6          | 0011100110101001 | (3.51, 5.41)   | 0.28          |
Selection: Select three pairs of individuals to be the parents of the
next generation, using a roulette wheel selection method, which
gives a higher probability of selection to individuals with higher
fitness scores.
| Individual | Binary encoding  | Decimal values | Fitness score | Selection probability |
|------------|------------------|----------------|---------------|-----------------------|
| 1          | 0010011001101100 | (2.47, 6.84)   | 0.33          | 0.36                  |
| 2          | 0100100110000111 | (4.77, 8.07)   | 0.06          | 0.07                  |
| 3          | 0001110000011010 | (1.87, 2.66)   | 0.14          | 0.15                  |
| 4          | 0110001010100101 | (6.21, 10.61)  | 0.02          | 0.02                  |
| 5          | 0100000001110010 | (4.00, 7.38)   | 0.08          | 0.09                  |
| 6          | 0011100110101001 | (3.51, 5.41)   | 0.28          | 0.31                  |
| Total      |                  |                | 0.91          | 1.00                  |
Artificial Neural Network (ANN)
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer presents in-between input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs
and includes a bias. This computation is represented in the form of a transfer function.
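In symbols, assuming inputs x1, …, xn with weights w1, …, wn and a bias b (notation added here for illustration), the computation described above is:

net = w1·x1 + w2·x2 + … + wn·xn + b = Σi wi·xi + b,   y = f(net)

where f is the transfer (activation) function applied to the weighted sum.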
2. Unsupervised Learning:
o In unsupervised learning, the machine is trained on unlabeled data. There
are no paired input-output examples.
o The goal is to discover patterns, relationships, or structures within the data.
o Common tasks in unsupervised learning include clustering, dimensionality
reduction, and anomaly detection.
o For example, given a collection of customer purchase data, unsupervised
learning can group similar customers together without any predefined labels.
3. Reinforcement Learning:
o Reinforcement learning is distinct from the other two. It deals
with sequential decision-making.
o Here, an agent interacts with an environment and learns by receiving rewards
or penalties based on its actions.
o Reinforcement learning is used in scenarios like game playing, robotics, and
recommendation systems.
o Think of it as teaching an AI to play chess: it explores different moves,
receives feedback (rewards or penalties), and adjusts its strategy accordingly.
What is a Perceptron?
A perceptron is a type of artificial neuron or the simplest form of a neural
network. It is a model of a single neuron that can be used for binary
classification problems, which means it can decide whether an input
represented by a vector of numbers belongs to one class or another. The
concept of the perceptron was introduced by Frank Rosenblatt in 1957 and
is considered one of the earliest algorithms for supervised learning.
At its core, a perceptron takes several binary inputs, multiplies each input
by a weight, sums all the weighted inputs, and then passes that sum
through a step function, which is a type of activation function, to produce a
single binary output.
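A minimal sketch of that computation in Python; the weights here are hand-picked to realize logical AND for illustration rather than learned by a training rule:

```python
def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through a step function.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

# Hand-picked weights realizing logical AND of two binary inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', perceptron([x1, x2], weights=[1, 1], bias=-1.5))
```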
Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers,
which transform any input dimension to the desired dimension. A multi-layer
perceptron is a neural network that has multiple layers. To create a neural network
we combine neurons together so that the outputs of some neurons are inputs of other
neurons.
A multi-layer perceptron has one input layer and for each input, there is one
neuron(or node), it has one output layer with a single node for each output and it can
have any number of hidden layers and each hidden layer can have any number of
nodes. A schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.
In the multi-layer perceptron diagram above, we can see that there are three inputs
and thus three input nodes and the hidden layer has three nodes. The output layer
gives two outputs, therefore there are two output nodes. The nodes in the input layer
take input and forward it for further process, in the diagram above the nodes in the
input layer forwards their output to each of the three nodes in the hidden layer, and
in the same way, the hidden layer processes the information and passes it to the
output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The
sigmoid activation function takes real values as input and converts them to numbers
between 0 and 1 using the sigmoid formula σ(x) = 1 / (1 + e^(-x)).
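A minimal forward-pass sketch matching the 3-input, 3-hidden-node, 2-output architecture described above, with sigmoid activations; the random weights and zero biases are illustrative assumptions:

```python
import math, random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    # One fully connected layer: weighted sums plus bias, then sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_output = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

x = [0.5, -0.2, 0.8]                       # three inputs
hidden = layer(x, w_hidden, biases=[0.0] * 3)
output = layer(hidden, w_output, biases=[0.0] * 2)
print(output)                              # two values in (0, 1)
```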
Remember, SOMs (Self-Organizing Maps) are powerful tools for understanding complex data
patterns and visualizing high-dimensional data in a more manageable form!
Hopfield Network
A Hopfield network is a special kind of neural network whose response is different from
that of other neural networks. Its output is calculated by a converging iterative process. It has
just one layer of neurons, whose size relates to the size of the input and output, which must be
the same. When such a network recognizes, for example, digits, we present a list of correctly
rendered digits to the network. Subsequently, the network can transform a noisy input
into the corresponding perfect output.
A Hopfield network is a single-layered and recurrent network in which the neurons are
entirely connected, i.e., each neuron is connected to every other neuron. If there are two
neurons i and j, then there is a connectivity weight w_ij between them, which is
symmetric: w_ij = w_ji.
Self-connectivity is zero, i.e., w_ii = 0. For example, three neurons
i = 1, 2, 3 with values x_i = ±1 have connectivity weights w_ij.
Updating rule:
Consider N neurons i = 1, …, N with values x_i = +1 or -1. Each neuron computes a
local field from the states of the other neurons and its bias b_i:
h_i = Σ_j w_ij x_j + b_i
If h_i ≥ 0 then x_i → 1, otherwise x_i → -1.
Thus, x_i → sgn(h_i), where sgn(r) = 1 if r ≥ 0, and sgn(r) = -1 if r < 0.
We set b_i = 0 so that it makes no difference when training the network with
random patterns.
Synchronously:
In this approach, the update of all the nodes takes place simultaneously at each time step.
Asynchronously:
In this approach, at each point in time, we update one node, chosen randomly or
according to some rule. Asynchronous updating is more biologically realistic.
We can describe a metric on the state space X by using the Hamming distance between any
two states: d(x, y) = |{ i : x_i ≠ y_i }|, the number of positions at which x and y differ.
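A minimal sketch of asynchronous updating with Hebbian weights storing one pattern; the pattern, network size, and iteration count are illustrative assumptions:

```python
import random

# Hebbian weights storing one pattern p: w_ij = p_i * p_j, with w_ii = 0.
pattern = [1, -1, 1, -1]
N = len(pattern)
w = [[0 if i == j else pattern[i] * pattern[j] for j in range(N)]
     for i in range(N)]

x = [1, 1, 1, -1]                  # noisy version of the stored pattern
for _ in range(20):                # asynchronous updates
    i = random.randrange(N)        # pick one neuron at random
    h = sum(w[i][j] * x[j] for j in range(N))   # local field h_i (b_i = 0)
    x[i] = 1 if h >= 0 else -1     # x_i -> sgn(h_i)
print(x)                           # recovers the stored pattern [1, -1, 1, -1]
```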