
Unit 3: Contents

Evolutionary computing uses principles of natural selection to optimize solutions for various problems, employing concepts such as populations, chromosomes, and genetic operators. Genetic algorithms (GAs) mimic natural evolution through processes like crossover and mutation to navigate large search spaces effectively. Various encoding schemes, including binary, Gray, and real-valued coding, are used to represent solutions, enhancing the efficiency of optimization tasks.

Uploaded by yt
Copyright © All Rights Reserved

Evolutionary Computing

❑ Why do we use evolutionary computation?
❑ It can produce tightly optimized solutions for a wide range of problems.
❑ It is used extensively in computer science.
❑ There are even variants that are created and used specifically for particular data structures and families of problems.
Basic Terminology of Evolutionary Computing
Before beginning a discussion on Genetic Algorithms, it is essential to be familiar with some
basic terminology which will be used throughout this Unit.

•Population − It is a subset of all the possible (encoded) solutions to the given problem. The population for a GA is analogous to a population of human beings, except that its members are candidate solutions instead of people.
•Chromosomes − A chromosome is one such solution to the given problem.
•Gene − A gene is one element position of a chromosome.
•Allele − It is the value a gene takes for a particular chromosome.

[Figure: DNA sequence. An individual inherits two alleles, one from each parent.]
•Genotype − Genotype is the population in the computation space. In the computation space, the
solutions are represented in a way which can be easily understood and manipulated using a
computing system.

•Phenotype − Phenotype is the population in the actual real-world solution space, in which solutions are represented the way they appear in real-world situations.

•Decoding and Encoding − For simple problems, the phenotype and genotype spaces are the
same. However, in most of the cases, the phenotype and genotype spaces are different.
Decoding is a process of transforming a solution from the genotype to the phenotype space,
while encoding is a process of transforming from the phenotype to genotype space.
Decoding should be fast as it is carried out repeatedly in a GA during the fitness value
calculation.

•For example, consider the 0/1 Knapsack Problem. The phenotype space consists of solutions that simply contain the item numbers of the items to be picked.
•However, in the genotype space it can be represented as a binary string of length n (where n is the number of items). A 0 at position x represents that the xth item is picked, while a 1 represents the reverse. This is a case where the genotype and phenotype spaces are different.
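The knapsack encoding and decoding can be sketched in Python as follows. This is a minimal illustration that follows the convention stated above (a 0 at position x means item x is picked); item numbers are assumed to be 1-based, and the function names are hypothetical.

```python
def decode(genotype):
    """Genotype (bit string) -> phenotype (1-based numbers of the picked items).

    Convention from the notes: a '0' at position x means item x is picked.
    """
    return [x + 1 for x, bit in enumerate(genotype) if bit == "0"]


def encode(picked, n):
    """Phenotype (picked item numbers) -> genotype bit string of length n."""
    chosen = set(picked)
    return "".join("0" if x + 1 in chosen else "1" for x in range(n))


genotype = "0110"
items = decode(genotype)                          # items 1 and 4 are picked
assert encode(items, len(genotype)) == genotype   # encoding inverts decoding
```

Swapping the convention (1 = picked) only requires changing the comparison; what matters is that encoding and decoding agree. Because decoding runs on every fitness evaluation, it is kept to a single pass over the string.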
•Fitness Function − Simply defined, a fitness function takes a candidate solution as input and outputs a measure of how suitable that solution is.
In some cases, the fitness function and the objective function may be the same, while in others they differ, depending on the problem.
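For the same knapsack problem, one possible fitness function returns the total value of the picked items and 0 for selections that exceed the capacity. The weights, values, capacity and the zero-fitness penalty below are hypothetical choices for illustration; for feasible solutions the fitness here simply equals the objective function (total value).

```python
def knapsack_fitness(genotype, weights, values, capacity):
    """Total value of the picked items, or 0 if the weight limit is exceeded.

    As in the notes, a '0' at position x means item x is picked.
    """
    picked = [x for x, bit in enumerate(genotype) if bit == "0"]
    total_weight = sum(weights[x] for x in picked)
    if total_weight > capacity:
        return 0  # simple penalty: infeasible selections get the worst fitness
    return sum(values[x] for x in picked)


weights = [2, 3, 4, 5]   # hypothetical item weights
values = [3, 4, 5, 6]    # hypothetical item values
print(knapsack_fitness("0110", weights, values, capacity=7))  # items 1 and 4: prints 9
```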

•Genetic Operators − These alter the genetic composition of the offspring. They include crossover, mutation, selection, etc.
Basic Structure
The basic structure of a GA is as follows:

We start with an initial population (which may be generated at random or seeded by other heuristics) and select parents from this population for mating.

Apply crossover and mutation operators on the parents to generate new offspring.

Finally, these offspring replace the existing individuals in the population, and the process repeats.

In this way, genetic algorithms try to mimic natural evolution to some extent.
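The loop above can be sketched as a small generational GA. This is a minimal illustration under stated assumptions: the "OneMax" fitness (count of 1 bits), binary tournament selection, keeping the best individual (elitism), and all parameter values are hypothetical choices, not part of the notes.

```python
import random


def onemax(bits):
    """Hypothetical fitness to maximize: the number of 1s in the bit list."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=30, generations=50, p_cross=0.9, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    # 1. Initial population, generated at random.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # 2. Parent selection: binary tournament (one common choice).
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    for _ in range(generations):
        offspring = [max(pop, key=onemax)[:]]  # keep a copy of the best (elitism)
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            # 3a. Single-point crossover with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # 3b. Bit-flip mutation on each child.
            for child in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                offspring.append(child)
        # 4. Offspring replace the old population and the process repeats.
        pop = offspring[:pop_size]
    return max(pop, key=onemax)


best = run_ga()
print(onemax(best), best)
```

Elitism is a design choice made here so that the best fitness never decreases between generations; plain generational replacement, as described in the notes, would drop that line.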
GA (Genetic Algorithm) Operators

▪ GAs are good at navigating large, potentially huge search spaces in search of optimal solutions that we might not find in a lifetime.
▪ GAs are better than many traditional algorithms in that they are more robust.
▪ They do not break easily even if the inputs change slightly or in the presence of reasonable noise.

GAs are used to solve complicated optimization problems such as timetabling, job-shop scheduling, and game playing.
The concept of a GA is directly derived from natural evolution and heredity, i.e., inheritance, where a child inherits characteristics (stored in the chromosomes) from its parents.
Operators in GA:
1.Crossover (Recombination):-
Crossover is the process of taking two parent solutions and producing from them a
child. After the selection (reproduction) process, the population is enriched with better
individuals. Crossover operator is applied to the mating pool with the hope that it creates a
better offspring.

The various crossover techniques are:

i) Single-Point Crossover: Here the two mating chromosomes are cut once at corresponding points, and the sections after the cuts are exchanged.
ii) Two-Point Crossover: Here two crossover points are chosen, and the contents between these points are exchanged between the two mated parents.
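Both crossover techniques reduce to slicing. In this sketch the cut points are passed in explicitly so the result is easy to check; in a GA they would normally be drawn at random (for example with `random.randint`). The functions work on strings or lists, and their names are illustrative.

```python
def single_point_crossover(p1, p2, cut):
    """Exchange everything after position `cut` between the two parents."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def two_point_crossover(p1, p2, a, b):
    """Exchange the segment between positions a and b (a < b)."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


c1, c2 = single_point_crossover("110010", "001101", cut=3)  # "110101", "001010"
d1, d2 = two_point_crossover("110010", "001101", 2, 4)      # "111110", "000001"
```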
2. Inversion:-
The inversion operator reverses the order of the bits between two randomly chosen sites.
Before: 01 0011 1
After: 01 1100 1 (i.e., 0111001)
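A minimal inversion sketch that reverses the segment between two sites. For the segment 0011 in the example, reversing and complementing happen to give the same result (1100); this sketch implements the reversal form of inversion. The function names are illustrative.

```python
import random


def invert(bits, i, j):
    """Reverse the order of the bits in positions i..j-1."""
    return bits[:i] + bits[i:j][::-1] + bits[j:]


def random_inversion(bits, rng=random):
    """Choose two distinct sites at random and invert the bits between them."""
    i, j = sorted(rng.sample(range(len(bits) + 1), 2))
    return invert(bits, i, j)


print(invert("0100111", 2, 6))  # 0111001, as in the example
```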

3. Deletion:-
i) Deletion and duplication: Here two or three consecutive bits at a random position are deleted, and the bits immediately preceding them are duplicated into the vacated positions.
Before deletion: 00 1001 0
After deletion: 00 10_ _ 0
After duplication: 00 1010 0
ii) Deletion and regeneration: Here the bits between the two cross sites are deleted and regenerated randomly.
Before deletion: 10 0110 1
After deletion: 10 _ _ _ _ 1
After regeneration: 10 1101 1
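Both deletion variants can be sketched on bit strings. The positions are passed explicitly here so the examples above can be reproduced; the function names are hypothetical, and in a GA the positions would be chosen at random.

```python
import random


def delete_and_duplicate(bits, start, length):
    """Delete `length` bits starting at `start` and fill the gap by duplicating
    the `length` bits that immediately precede it (requires start >= length)."""
    if start < length:
        raise ValueError("not enough preceding bits to duplicate")
    return bits[:start] + bits[start - length:start] + bits[start + length:]


def delete_and_regenerate(bits, i, j, rng=random):
    """Delete the bits between cross sites i and j and regenerate them randomly."""
    new = "".join(rng.choice("01") for _ in range(j - i))
    return bits[:i] + new + bits[j:]


print(delete_and_duplicate("0010010", 4, 2))   # 0010100, as in the example
print(delete_and_regenerate("1001101", 2, 6))  # "10" + four random bits + "1"
```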
4. Mutation:-
After crossover, the strings are subjected to mutation. Mutation prevents the algorithm from being trapped in a local minimum. It plays the role of recovering lost genetic material as well as randomly distributing genetic information. It helps the search escape the trap of local minima and maintains diversity in the population.
Mutation of a bit involves flipping it, changing 0 to 1 and vice versa.
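Bit-flip mutation is usually applied independently to every bit with a small probability `p_mut`; a common rule of thumb is about 1 divided by the string length, though that value is an assumption here. A minimal sketch:

```python
import random


def bit_flip_mutation(bits, p_mut, rng=random):
    """Flip each bit (0 -> 1, 1 -> 0) independently with probability p_mut."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < p_mut else b for b in bits
    )


print(bit_flip_mutation("0101", 1.0))  # every bit flipped: 1010
```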
❖ Introduction: Genetic Algorithm
❖ Abstraction of real biological evolution
❖ Solves complex problems (NP-hard type, e.g., TSP)
❖ Focus on optimization
❖ A population of possible solutions for a given problem
❖ From a group of individuals, the best one will survive
[Figure: Encoding maps the phenotype to the genotype; decoding maps the genotype back to the phenotype.]

Genetic Algorithms offer the following advantages:
Point-01:
•Genetic Algorithms are better than conventional AI.
•This is because they are more robust.
Point-02:
•They do not break easily, unlike older AI systems.
•They do not break easily even in the presence of reasonable noise or if the inputs change slightly.
Point-03:
•While performing search in a multimodal or large state space, genetic algorithms have significant benefits over other typical search optimization techniques.
Evolutionary Computing
❖ We see a diversity of life on earth: millions of species, each with its own unique behaviour patterns and characteristics or traits.
❖ All of these plants, animals, birds, fishes and other creatures have evolved, and continue evolving, over millions of years.
❖ They have adapted themselves to a constantly shifting and changing environment in order to survive.
❖ The weaker and less fit members of a species tend to die away, leaving the stronger and fitter to mate, create offspring and ensure the continuing survival of the species.
❖ Their lives are dictated by the laws of natural selection and Darwinian evolution: the struggle for existence and the survival of the fittest.

❖ Evolutionary computing is the emulation of the process of natural selection in a search procedure.
❖ In nature, organisms have certain characteristics that influence their ability to survive and reproduce.
❖ These characteristics are represented by information encoded in the chromosomes of the organisms.
❖ New offspring chromosomes are created by means of mating and reproduction mechanisms.
❖ The end result will be offspring chromosomes that contain the best characteristics of each parent's chromosomes, which enable them to survive in an adverse environment.
❖ The process of natural selection ensures that fitter individuals have the opportunity to mate most of the time, leading to the expectation that the offspring will have similar or better fitness.
Terminologies of Evolutionary Computing

1.Chromosome Representation
In nature, characteristics or traits of organisms are represented by long strings of information encoded in the
chromosomes. The first design step in EC is commonly called chromosome representation, where each individual of a
population represents a candidate solution to an optimization or search problem. The characteristics of an individual are
represented by the chromosome, or genome. A chromosome can be thought of as a vector X consisting of m genes x_i:
$X = \{x_1, x_2, x_3, \ldots, x_m\}$
Objects forming possible solutions within the original problem context are referred to as phenotypes, while their encodings are called genotypes.
A phenotype is the expressed behavioural traits of an individual in a specific environment. A genotype describes the genetic composition of an individual as inherited from its parents. In other words, it is a mechanism to store experiential evidence as gathered by the parents. It is important to understand the difference between the phenotype space and the genotype space.
2.Encoding Schemes
❖ The first task in EC is to find a mechanism to encode the genetic information of a population representing an entire search space of a problem domain into chromosomes. In fact, this is a mapping from the phenotype to the genotype space.
❖ Encoding schemes in any evolutionary algorithm (EA) should be such that the representation and the problem space are close together, i.e., a natural representation of the problem.
❖ This also allows the incorporation of knowledge about the problem domain into the EC system in the form of special genetic information and a set of operations.

Some of the encoding schemes that have been used with some success in the field are introduced below.

I] Binary Coding :
The most commonly used chromosome representation in the EC is the binary coding scheme. For an n-dimensional search
space, each individual consists of n variables with each variable in the parameter set encoded as a binary string and
concatenated to form a chromosome. Some problems can be expressed very efficiently using binary coding, as follows:
For example
Chromosome A 10110010
Chromosome B 11111110
In this example, chromosomes A and B have eight genes each. The position or locus of the ith gene is simply the ith bit in the bit string, and the value or allele is given by the bit A[i] or B[i].

Example Details

The problem is to optimize a function f(x1, x2, x3) that takes real values in [0.0, 1.0], with each value represented by 8 binary digits. In binary coding the string {00000000} corresponds to the real value 0.0 and {11111111} corresponds to 1.0. A chromosome representing {x1, x2, x3} is then the concatenation of three 8-bit strings, i.e., a 24-bit string.
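Decoding such a chromosome can be sketched as follows. The divisor 2^8 - 1 = 255 is an assumption chosen so that 00000000 maps to 0.0 and 11111111 maps to 1.0, as stated above; the resolution is then 1/255 (about 0.0039). The example chromosome is hypothetical.

```python
def decode_gene(bits):
    """Map an n-bit string to [0.0, 1.0]: all zeros -> 0.0, all ones -> 1.0."""
    return int(bits, 2) / (2 ** len(bits) - 1)


def decode_chromosome(chrom, n_vars=3, bits_per_var=8):
    """Split a concatenated chromosome into n_vars genes and decode each one."""
    return [
        decode_gene(chrom[k * bits_per_var:(k + 1) * bits_per_var])
        for k in range(n_vars)
    ]


chrom = "00000000" + "11111111" + "10000000"  # hypothetical 24-bit chromosome
print(decode_chromosome(chrom))  # x1 = 0.0, x2 = 1.0, x3 = 128/255
```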
II] Gray Coding :
While binary coding is frequently used, it has the disadvantage of Hamming cliffs. A Hamming cliff is formed when two numerically adjacent values have bit representations with a large Hamming distance. For example, consider the decimal numbers 7 and 8. The corresponding binary representations using 4 bits are 7 = 0111 and 8 = 1000, with a Hamming distance of 4. This causes a problem when a small change in variables should result in a small change in fitness. To overcome the problem of Hamming cliffs, an alternative bit representation is to use Gray coding. Gray coding has the advantage over binary coding that the Hamming distance between two successive numerical values is one. Binary numbers can easily be converted into Gray coding using the conversion

$g_1 = b_1, \qquad g_k = b_{k-1} \oplus b_k \quad (k = 2, \ldots, n)$

where $b = b_1 b_2 \ldots b_n$ is the binary representation, $b_1$ is its most significant bit, and $\oplus$ represents the XOR operation.

Example 6.2 Represent the decimal numbers 1–8 in binary and Gray code, showing the Hamming cliffs between two subsequent numbers. The difference in bit positions is shown in italics in the box below.
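The conversion g_1 = b_1, g_k = b_(k-1) XOR b_k can be applied to all bits at once with a shift and an XOR on integers. This sketch reproduces Example 6.2 for 4-bit numbers and measures the Hamming distance between consecutive values; the function names are illustrative.

```python
def binary_to_gray(n):
    """g1 = b1 and g_k = b_(k-1) XOR b_k, computed for all bits at once."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse conversion: b1 = g1 and b_k = b_(k-1) XOR g_k."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


for k in range(1, 9):
    print(k, format(k, "04b"), format(binary_to_gray(k), "04b"))

binary_steps = [hamming(k, k + 1) for k in range(1, 8)]
gray_steps = [hamming(binary_to_gray(k), binary_to_gray(k + 1)) for k in range(1, 8)]
print(binary_steps)  # [2, 1, 3, 1, 2, 1, 4]: the 7 -> 8 step is a Hamming cliff
print(gray_steps)    # [1, 1, 1, 1, 1, 1, 1]
```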
Real-valued Coding :
The use of real-valued genes in EC is claimed to offer a number of advantages in numerical function optimization over
binary encoding (Wright, 1991).
The efficiency of EC is increased as there is no need to convert binary strings into real values before each function
evaluation and hence there is no loss in precision caused by conversion.
When real values are used in chromosome representation, chromosomes are simply a string of real values:
$X = \{r_1, r_2, r_3, \ldots\}, \quad r_i \in \mathbb{R}$
For example
Chromosome A 456.1, 0.6879, 4.589
Chromosome B 456.34, 0.7968, 5.984
Example The problem is to optimize a function f (x1, x2, x3) that takes real values between [0.0, 1.0]
with each value represented by a real value rather than binary digits.

Now the chromosome represented by {x1, x2, x3} looks like X = {0.00390625 0.15625 0.69140625}
The advantages of real-valued coding over binary coding include increased precision and a shorter chromosome string. Also, real-valued coding gives greater freedom to use special crossover and mutation techniques.
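Two examples of operators that work directly on real-valued genes are arithmetic crossover (a weighted average of the parents) and Gaussian mutation. This sketch uses the chromosome X from the example above as one parent; the second parent, the weight `alpha`, the step size `sigma` and the clipping to [0.0, 1.0] are assumptions for illustration.

```python
import random


def arithmetic_crossover(p1, p2, alpha):
    """Child genes are weighted averages of the parents' genes (0 <= alpha <= 1)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


def gaussian_mutation(x, sigma, low=0.0, high=1.0, rng=random):
    """Add N(0, sigma^2) noise to every gene and clip it back into [low, high]."""
    return [min(high, max(low, g + rng.gauss(0.0, sigma))) for g in x]


parent_a = [0.00390625, 0.15625, 0.69140625]  # chromosome X from the notes
parent_b = [0.5, 0.5, 0.5]                     # hypothetical second parent
child = arithmetic_crossover(parent_a, parent_b, alpha=0.5)
mutant = gaussian_mutation(child, sigma=0.05, rng=random.Random(0))
print(child, mutant)
```

No binary-to-real conversion is needed before evaluating `child` or `mutant`, which is the efficiency argument made above.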
Hybrid Coding :
There are many heterogeneously structured problems that occur very often in the industry which have a large
complex set of solutions.
A simple homogeneous encoding scheme of chromosome representation, such as a binary string, encoded integers, a permutation of symbols or expression trees, does not work well for such problems.
Partitioning a problem into components is sometimes realistic in terms of implementation issues.
A component is defined as a homogeneous collection of parameters or variable values of the same type or structure.
A component can be, for example, a set of integers, floating points, trees, permutation strings, etc.
The chromosome can thus be a combination of binary, real values and other expressions, depending on the problem.
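A hybrid chromosome can be sketched as a record with one field per homogeneous component, where each component is mutated with an operator suited to its type. The class name, the three components and the mutation settings (flip probability 0.1, Gaussian step 0.1, one swap) are hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HybridChromosome:
    """Hypothetical chromosome made of three homogeneous components."""
    flags: list = field(default_factory=list)   # binary component
    params: list = field(default_factory=list)  # real-valued component
    order: list = field(default_factory=list)   # permutation component


def mutate(ch, rng=random):
    """Return a new chromosome, mutating each component with a type-specific operator."""
    flags = [1 - f if rng.random() < 0.1 else f for f in ch.flags]  # bit flips
    params = [p + rng.gauss(0.0, 0.1) for p in ch.params]         # Gaussian noise
    order = ch.order[:]
    if len(order) >= 2:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # a swap keeps it a permutation
    return HybridChromosome(flags, params, order)


x = HybridChromosome([1, 0, 1], [0.5, 2.0], [3, 1, 2, 4])
y = mutate(x, random.Random(42))
```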
Permutation Coding:
Permutation problems require the optimal arrangement of a set of symbols in a list.
The travelling salesperson problem (TSP) is such a problem, where a symbol represents a city and the
arrangements of symbols in a list represent the order in which the person should visit each city for a circuit of all
cities.
Chromosome A = {1 5 3 2 6 4 7 9 8}
Chromosome B = {a d e b f i c g h }
This representation, in fact, prohibits missing or duplicate allele values and facilitates a simple decoding mechanism.
There are actually two classes of problems that can be represented by permutation coding. The first kind is the ordering problem, in which events should occur in a fixed order, e.g., job shop scheduling. The second kind is the adjacency problem; a typical example is the TSP.
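For a TSP, a permutation chromosome can be checked for validity, mutated by swapping two cities (which can never create a missing or duplicate city), and scored by the length of the closed tour. The function names and the city coordinates are hypothetical; swap mutation is one common permutation operator, not one the notes prescribe.

```python
import math
import random


def is_valid_permutation(chrom, symbols):
    """True if every symbol appears exactly once (no missing or duplicate alleles)."""
    return sorted(chrom) == sorted(symbols)


def swap_mutation(chrom, rng=random):
    """Exchange the cities at two random positions; the result is still a permutation."""
    child = list(chrom)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def tour_length(tour, coords):
    """Length of the closed circuit that visits the cities in `tour` order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:] + tour[:1]))


coords = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 0)}  # hypothetical city positions
print(tour_length([1, 2, 3, 4], coords))                # around the unit square: 4.0
```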

Value Coding:
Direct value encoding can be used in problems where more complicated values, such as real numbers, are used. Using binary encoding for this type of problem would be difficult. In value encoding, every chromosome is a sequence of some values. Values can be anything connected to the problem, such as (real) numbers, characters or any objects. An example of chromosomes with value encoding is as follows:
Chromosome A = {1.2324 5.3243 0.4556 2.3293 2.4545}
Chromosome B = {ABDJEIFJDHDIERJFDLDFLFEGT}
Chromosome C = {(back),(back),(right), (forward),(left)}
Chromosome D = {NB, NB, ZO, ZO, PS, NS, NS, ZO}
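Because value-encoded genes need not be bits, mutation typically replaces a gene with another value drawn from the set of allowed values. In this sketch the allowed moves are taken from chromosome C; the function name and the mutation probability are assumptions.

```python
import random

MOVES = ["back", "right", "forward", "left"]  # allowed values, taken from chromosome C


def value_mutation(chrom, domain, p_mut=0.2, rng=random):
    """Replace each gene, with probability p_mut, by a random value from its domain."""
    return [rng.choice(domain) if rng.random() < p_mut else g for g in chrom]


chrom_c = ["back", "back", "right", "forward", "left"]
print(value_mutation(chrom_c, MOVES, rng=random.Random(7)))
```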
Tree Coding:
Tree encoding is used mainly for evolving programs or expressions, i.e., for genetic programming. In tree encoding, every chromosome is a tree of some objects, such as functions or commands in a programming language. For example, an algebraic expression x + 5y can be described by the tree encoding shown in chromosome A in Figure 6.3(a). Similarly, a computer command ‘steps do until wall’ is expressed using the tree encoding shown in chromosome B in Figure 6.3(b). Tree encoding is useful for evolving programs or any other structures that can be encoded as trees. A programming language like LISP is often used for this purpose, since programs in LISP are represented directly in the form of a tree and can easily be parsed as a tree, so crossover and mutation can be done relatively easily. A typical task is to find a function that approximates given pairs of values.
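In Python, nested tuples of the form (operator, left, right) can play the role that LISP expressions play in the notes. This sketch evaluates chromosome A (x + 5y) and lists its subtrees, which are the candidate points for subtree crossover and mutation; the representation and function names are assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree written as nested tuples (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]  # variable leaf, looked up by name
    return tree           # constant leaf


def subtrees(tree):
    """Yield every node (subtree); these are the possible crossover/mutation points."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


chromosome_a = ("+", "x", ("*", 5, "y"))          # x + 5y
print(evaluate(chromosome_a, {"x": 1, "y": 2}))   # 11
```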
Grammar Coding:

Grammatical coding was introduced by Kitano (1990) to train neural network architectures. A grammar is a set of rules that is applied to produce a set of structures (e.g., sentences in a natural language, programs in a computer language). A simple example is the following:

S → aSb
S → ε

Here S is the start symbol and a non-terminal, a and b are terminals, and ε is the empty string. S → ε means that S can be replaced by the empty string. To construct a structure from this grammar, start with S and replace it with one of the allowed replacements given by the right-hand sides; take the resulting structure and continue until no non-terminals are left.

For example: S → aSb → a(aSb)b → a(a(ε)b)b → aabb
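One way to view a grammar as an encoding is to let the genotype be a list of rule choices that drives the derivation. The sketch below does this for the grammar S → aSb | ε; it is a simplified illustration of the idea, not Kitano's network-generating scheme, and the names are hypothetical.

```python
RULES = {"S": ["aSb", ""]}  # rule 0: S -> aSb, rule 1: S -> ε (empty string)


def derive(choices, start="S"):
    """Expand the leftmost non-terminal using the rule indices in `choices`.

    If the choices run out first, the partially derived string is returned.
    """
    s = start
    for c in choices:
        i = next((k for k, ch in enumerate(s) if ch in RULES), None)
        if i is None:
            break  # no non-terminals left
        s = s[:i] + RULES[s[i]][c] + s[i + 1:]
    return s


print(derive([0, 0, 1]))  # S -> aSb -> aaSbb -> aabb
```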


Evolutionary Programming
❖ EP is derived from the simulation of adaptive behaviour in evolution. That is, EP considers phenotypic evolution. The evolutionary process consists of finding a set of optimal behaviours from a space of observable behaviours.
❖ For example, if one were trying to find the shortest path in a travelling salesman problem, each solution would be a path. The length of the path could be expressed as a number, which would serve as the solution's fitness.
❖ The goal would be to find the globally shortest path in that space or, more practically, to find very short tours very quickly.

The basic EP method involves four steps (repeated until a threshold for iteration is exceeded or an adequate solution is obtained):
1. Initialization of population. A population of individuals is created randomly, which uniformly covers the search space of the optimization problem. The number of individuals in a population strongly affects the speed of optimization, but there is no definite answer as to how many individuals are appropriate (other than more than one) and how many are wasteful.
2. Mutation. Each individual is replicated into a new population. Each of these offspring is mutated according to a distribution of mutation types, ranging from minor to extreme, with a continuum of mutation types in between. The severity of mutation is judged on the basis of the functional change imposed on the parents.
3. Evaluation. Each offspring is assessed by computing its fitness value f(xi) from the objective function, scaling it to a positive value and sometimes imposing some random alteration νi. The fitness values quantify behavioural traits. Survival in EP is usually based on a relative fitness measure: individuals that go into the next generation are selected based on relative fitness.
4. Selection. The purpose of the selection mechanism is to choose which individuals from the parents and offspring survive to the next generation. Typically, a stochastic tournament is held to determine the N individuals to be retained for the population of the next generation, although this is occasionally performed deterministically. There is no requirement, however, that the population size be held constant, nor that only a single offspring be generated from each parent.

EP differs from the other evolutionary algorithms in that no crossover operation is implemented. Only selection and mutation operators are applied to produce the new generation of the population. Selection is based on competition, and mutation is based on the amount of variation determined by a step size sampled from some probability distribution.
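The four EP steps can be sketched as follows: mutation only (no crossover), then a stochastic tournament over parents and offspring in which each individual meets q random opponents and the N individuals with the most wins survive. The sphere objective, the fixed step size `sigma` (EP variants often self-adapt it), the search range and all parameter values are assumptions for illustration.

```python
import random


def sphere(x):
    """Hypothetical objective to minimize: the sum of squares."""
    return sum(v * v for v in x)


def evolutionary_programming(dim=3, mu=20, q=5, sigma=0.3, generations=100, seed=0):
    rng = random.Random(seed)
    # 1. Initialization: uniform over the search space [-5, 5]^dim.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        # 2. Mutation only: each parent produces one Gaussian-perturbed offspring.
        children = [[v + rng.gauss(0.0, sigma) for v in p] for p in pop]
        union = pop + children
        # 3. Evaluation of parents and offspring.
        fits = [sphere(x) for x in union]
        # 4. Stochastic tournament: count wins against q random opponents.
        wins = []
        for i in range(len(union)):
            opponents = rng.sample(range(len(union)), q)
            wins.append(sum(fits[i] <= fits[j] for j in opponents))
        # Keep the mu individuals with the most wins (ties broken by fitness).
        ranked = sorted(range(len(union)), key=lambda i: (-wins[i], fits[i]))
        pop = [union[i] for i in ranked[:mu]]
    return min(pop, key=sphere)


best = evolutionary_programming()
print(sphere(best), best)
```

Breaking ties by fitness guarantees that the current best individual (which wins all q of its contests) always survives, so the best objective value never gets worse.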
Advance Topics: Multi-objective Optimization
Problems requiring simultaneous optimization of more than one objective function are
known as multi-objective optimization problems (MOOPs).
▪ This type of problem has no unique perfect solution. In traditional multi-objective optimization, it is very common to simply aggregate all the objectives together to form a single (scalar) fitness function.

▪ This requires knowledge about the underlying problem, which is not known a priori in most cases.

▪ Most MOOPs do not have a single solution; rather, they offer a set of solutions. Such solutions are the 'trade-offs' or good compromises among the objectives.

▪ Solving a MOOP therefore involves two tasks: the first ensures that the obtained set of solutions is near optimal, while the second ensures that a wide range of trade-off solutions is obtained.

▪ EAs are applied in MOOPs and the combination became known as a multi-objective evolutionary algorithm (MOEA).
In an MOEA,
Figure 7.1 shows many solutions trading off differently between the objectives
for a two-objective minimization problem. Any two solutions from the feasible
objective space can be compared. For a pair of solutions, it can be seen that
one solution is better than the other in the first objective but worse in the
second objective. The individuals that fall close to either axis or to the origin of the two-dimensional objective space are better than those far from the axes or the origin. In the objective
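Comparing two solutions in a multi-objective minimization problem is usually formalized as Pareto dominance: a solution dominates another if it is no worse in every objective and strictly better in at least one. The non-dominated solutions form the trade-off set. The objective values below are hypothetical.

```python
def dominates(a, b):
    """Minimization: a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points that no other point dominates (the trade-off set).

    A point never dominates itself, so it can safely be compared with the whole list.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


objectives = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]  # hypothetical (f1, f2) values
print(non_dominated(objectives))  # [(1, 5), (2, 3), (4, 1)]
```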
Unit 4 Introduction to Basic Terminologies in Genetic Algorithm:
What is GA ?
• A genetic algorithm (or GA) is a search technique used in computing to find true or
approximate solutions to optimization and search problems.
• GAs are categorized as global search heuristics.
• GAs are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).
• The new population is used in the next iteration of the algorithm.
• The algorithm terminates when either a maximum number of generations has been produced, or a satisfactory
fitness level has been reached for the population. No convergence rule or guarantee!

• Individual - Any possible solution


• Population - Group of all individuals
• Fitness – Target function that we are optimizing (each individual has a fitness)
• Trait - Possible aspect (features) of an individual
• Genome - Collection of all chromosomes (traits) for an individual.
GA Operators
• Methods of representation
Binary Representation in GA
Calculate the Fitness Function for Selection (to Maximize the Value)

To find the selection probability of each string, we use the formula
$P_i = \dfrac{f(x_i)}{\sum_{j=1}^{n} f(x_j)}$
and multiplying each probability by 100 gives the % probability.
To compute the expected count of each string, we use
$E_i = n \, P_i = \dfrac{f(x_i)}{\bar{f}}$, where $\bar{f}$ is the average fitness of the population.
This gives fractional values, so for the actual count we round each expected count to the nearest whole number (e.g., 2.1 becomes 2 and 0.0 becomes 0). Strings with an actual count of 0 are excluded from the mating pool, strings with an actual count of 1 are copied once, and strings with an actual count of 2 are copied twice.
For example, string 1 has an actual count of 1, so it is taken once, while a string with an actual count of 2 (e.g., string 2) is taken twice.
Crossover is then applied to the mating pool to generate new offspring; each offspring is decoded to its decimal value and its fitness is calculated with the same formula.
Comparing with the previous population, the maximum fitness has increased from 625 to 729.
These offspring are then used for the mutation operation, after which the maximum fitness increases further (841 > 729 > 625). The strings obtained after mutation are used again for the next round of selection, and this loop repeats until the stopping condition is met.
Crossover Mutation
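The selection arithmetic above (probability, % probability, expected count, actual count, mating pool) can be sketched for f(x) = x². The slides' own table is not reproduced here, so the four 5-bit strings below (01101, 11000, 01000, 10011, i.e., 13, 24, 8 and 19) are a hypothetical population.

```python
def roulette_counts(xs, f=lambda x: x * x):
    """Selection probability, expected count and rounded actual count per string."""
    fits = [f(x) for x in xs]
    total = sum(fits)
    avg = total / len(fits)
    probs = [fi / total for fi in fits]     # P_i = f(x_i) / sum of f
    expected = [fi / avg for fi in fits]    # E_i = n * P_i = f(x_i) / average f
    actual = [round(e) for e in expected]   # nearest whole number of copies
    return probs, expected, actual


population = [13, 24, 8, 19]  # hypothetical strings 01101, 11000, 01000, 10011
probs, expected, actual = roulette_counts(population)
mating_pool = [x for x, c in zip(population, actual) for _ in range(c)]
print([f"{p:.1%}" for p in probs])  # % probability
print(actual, mating_pool)          # [1, 2, 0, 1] [13, 24, 24, 19]
```

In general the rounded counts need not add up to the population size; implementations then either adjust the counts or sample the pool with a roulette wheel instead of rounding.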
Messy Genetic Algorithms:
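The messy genetic algorithm (MGA) is a type of genetic algorithm designed for solving combinatorial optimization problems. In an MGA, the encoding of solutions is "messy" because each gene is stored together with its position as a (position, value) pair, so chromosomes can have variable length, genes can appear in any order, and a position can be missing or specified more than once.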
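Decoding a messy chromosome into an ordinary fixed-length string can be sketched as follows, assuming the conventions commonly described for Goldberg's messy GA: genes are (locus, allele) pairs, a locus specified more than once takes its first occurrence (first-come, first-served), and unspecified loci are filled from a template. The function name and the template string are illustrative assumptions.

```python
def express(messy_chrom, template):
    """Decode a messy chromosome of (locus, allele) pairs into a fixed-length string.

    Over-specification: if a locus appears more than once, the first pair wins.
    Under-specification: loci that never appear keep the value from `template`.
    """
    genes = list(template)
    seen = set()
    for locus, allele in messy_chrom:
        if locus not in seen:
            genes[locus] = allele
            seen.add(locus)
    return "".join(genes)


chrom = [(2, "1"), (0, "0"), (2, "0")]  # locus 2 given twice, loci 1 and 3 missing
print(express(chrom, template="1111"))  # 0111
```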

Step 1: Initialization: Start by randomly generating an initial population of candidate solutions.
Step 2: Fitness Evaluation: Evaluate the fitness of each candidate solution using an objective function.
Step 3: Selection: Select parent solutions from the population based on their fitness, using methods like tournament selection or roulette wheel selection.
Step 4: Crossover (Recombination): Create new offspring solutions by combining genetic material from the selected parent solutions.
Step 5: Mutation: Introduce random changes to some genes in the offspring solutions to promote diversity.
Step 6: Replacement: Replace some solutions in the current population with the newly generated offspring solutions.
Step 7: Termination: Repeat the process for a fixed number of generations or until a termination condition is met (e.g., reaching a maximum number of iterations or finding a satisfactory solution).

Example:
Let's consider an example where we want to find the optimal combination of ingredients for a recipe using MGA.
•Problem: Find the best combination of ingredients for a cake recipe.
•Genes: Each gene represents a particular ingredient (e.g., flour, sugar, eggs).
•Objective Function: The objective function evaluates each recipe based on factors like taste, texture, and
appearance.
Example Explained:
Let's consider a simple optimization problem of maximizing the value of the function f(x, y) = x² + y², where x and y are real-valued decision variables restricted to a bounded range (without bounds the function has no maximum).
Pseudo Code for Messy Genetic Algorithm:
1. Initialize population P with randomly generated individuals (x, y)
2. Evaluate the fitness of each individual in P using the objective function f(x, y)
3. Repeat until termination condition is met:
a. Select parents from P based on fitness (e.g., roulette wheel selection)
b. Perform crossover to generate offspring (e.g., single-point crossover)
c. Perform mutation on offspring (e.g., randomly perturb gene values)
d. Evaluate the fitness of offspring
e. Replace some individuals in P with the offspring
4. Return the best individual found as the solution
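The pseudocode above describes a standard steady-state GA loop (it does not include the variable-length encoding or cut-and-splice operators usually associated with messy GAs). A minimal runnable rendering follows; the bounds [-5, 5], the Gaussian step `sigma`, the population size and "offspring replace the worst individuals" are assumptions for illustration.

```python
import random


def f(x, y):
    """Objective from the example: maximize x^2 + y^2."""
    return x * x + y * y


def run_mga(pop_size=20, generations=60, low=-5.0, high=5.0, sigma=0.5, seed=0):
    rng = random.Random(seed)

    def clip(v):
        return min(high, max(low, v))

    def fitness(ind):
        return f(*ind)

    # 1. Initialize the population with random (x, y) individuals.
    pop = [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(pop_size)]
    for _ in range(generations):
        # 2/3a. Roulette-wheel selection: probability proportional to fitness (>= 0 here).
        p1, p2 = rng.choices(pop, weights=[fitness(i) for i in pop], k=2)
        # 3b. Single-point crossover between the two genes: exchange the y values.
        children = [(p1[0], p2[1]), (p2[0], p1[1])]
        # 3c. Mutation: Gaussian perturbation of each gene, clipped to the bounds.
        children = [tuple(clip(g + rng.gauss(0.0, sigma)) for g in c) for c in children]
        # 3d/3e. Evaluate everyone and drop the worst, so offspring replace weak individuals.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    # 4. Return the best individual found.
    return max(pop, key=fitness)


best = run_mga()
print(best, f(*best))  # approaches a corner of the box, where f = 50
```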

This example shows how a Messy Genetic Algorithm can be used to solve an optimization problem. By iteratively evolving a population of potential solutions through selection, crossover, and mutation operations, MGAs are capable of finding near-optimal solutions to complex optimization problems.
Unit V Computational Intelligence and NLP

Computational Intelligence (CI):


•Computational Intelligence refers to the study of adaptive mechanisms to enable or facilitate intelligent
behaviour in complex and changing environments.
It encompasses various techniques inspired by biological and natural systems, such as neural networks,
evolutionary algorithms, fuzzy logic, and swarm intelligence.
Natural Language Processing (NLP):
•Natural Language Processing is a subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language.
•NLP involves tasks such as text parsing, sentiment analysis, machine translation, question
answering, and language generation.
It aims to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful.
Examples:
•Sentiment analysis systems often use machine learning algorithms, such as neural networks or support
vector machines, trained on large datasets to classify the sentiment of text.
•Machine translation systems can employ evolutionary algorithms to optimize translation models,
improving the quality of translated output over time.
•Chatbots and virtual assistants utilize various CI techniques to understand and generate natural language
responses, providing more intelligent and contextually appropriate interactions.

Word embedding:
Word embedding is a technique used in natural language processing (NLP) to represent words as
vectors in a continuous vector space, where words with similar meanings are mapped to nearby points. This
method captures the semantic relationships between words, enabling algorithms to better understand the
context and meaning of words in textual data.
A few of the tasks that NLP is used for
•Text summarization: extractive or abstractive text summarization
•Sentiment Analysis
•Translating from one language to another: neural machine translation
•Chatbots

Word Embedding Techniques


1)Bag of words(BOW)
Bag of words is a simple and popular technique for feature extraction from text. The Bag of Words model processes the text to find how many times each word appears in each document. This is also called vectorization.
Steps for creating BOW
•Tokenize the text into sentences
•Tokenize sentences into words
•Remove punctuation or stop words
•Convert the words to lowercase
•Build the vocabulary and count how often each vocabulary word occurs in each document
Here's a simplified example:

Consider a corpus with two documents:


•Document 1: "The cat sat on the mat."
•Document 2: "The dog ate the bone."

The vocabulary would be: {"the", "cat", "sat", "on", "mat", "dog", "ate", "bone"}
The Bag of Words representation for each document would be:

•Document 1: [2, 1, 1, 1, 1, 0, 0, 0]
•Document 2: [2, 0, 0, 0, 0, 1, 1, 1]

In these vectors, each index corresponds to the count of the respective word in the
vocabulary within the document.
While Bag of Words is simple and easy to implement, it doesn't capture the semantic relationships
between words or consider the order of words in the text.
It treats each document as a collection of words without regard to their sequence. Consequently, it
may not be suitable for tasks where the context and order of words are important, such as language
translation or sentiment analysis.
However, it can still be effective for tasks like document classification or spam detection.
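The example can be reproduced with the standard library. This sketch lowercases the text, drops punctuation with a simple regular expression, builds the vocabulary in order of first appearance, and counts words per document; stop-word removal is skipped so that the output matches the example (which keeps "the"). The function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return the vocabulary and one count vector per document."""
    vocab = []
    for doc in docs:
        for w in tokenize(doc):
            if w not in vocab:
                vocab.append(w)  # vocabulary in order of first appearance
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["The cat sat on the mat.", "The dog ate the bone."])
print(vocab)    # ['the', 'cat', 'sat', 'on', 'mat', 'dog', 'ate', 'bone']
print(vectors)  # [[2, 1, 1, 1, 1, 0, 0, 0], [2, 0, 0, 0, 0, 1, 1, 1]]
```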
Word Embedding Techniques continue….

2)TF-IDF (Term Frequency-Inverse Document Frequency):


It is a popular technique used in natural language processing for representing text data and determining the importance of words in a document corpus. Unlike the traditional Bag of Words representation, TF-IDF takes into account both the frequency of a term in a document and its rarity across the entire corpus.

Term Frequency (TF):


•Term Frequency measures the frequency of a term (word) within a document.

Inverse Document Frequency (IDF):


•Inverse Document Frequency measures the rarity of a term across the entire corpus.

TF-IDF Calculation:
•TF-IDF is calculated by multiplying the Term Frequency (TF) of a term in a document by its Inverse Document
Frequency (IDF) across the corpus.

•The formula for TF-IDF is: TF-IDF = TF(term, document) * IDF(term, corpus)
•The result is a weight that represents the importance of the term in the document relative to its importance in
the corpus.
Example: Consider a small corpus containing three documents:
•Document 1: "The cat sat on the mat."
•Document 2: "The dog ate the bone."
•Document 3: "The cat and the dog are friends."

Let's calculate the TF-IDF scores for each term in each document:
•Term Frequency (TF) is calculated as the number of times a term appears in a document divided by
the total number of terms in the document.
•Inverse Document Frequency (IDF) is calculated as the logarithm of the total number of documents
divided by the number of documents containing the term.
•TF-IDF is the product of TF and IDF.

For example,
let's calculate the TF-IDF score for the term "cat" in Document 1:

•TF("cat", Document 1) = 1 (cat appears once in Document 1) / 6 (total terms in Document 1) = 1/6
•IDF("cat", corpus) = ln(3 (total documents) / 2 (documents containing "cat")) = ln(1.5) ≈ 0.405 (natural logarithm)
•TF-IDF("cat", Document 1) = (1/6) × 0.405 ≈ 0.0676
Similarly, TF-IDF scores for all terms in all documents can be calculated.
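The hand calculation above can be checked with a short sketch. It assumes lowercase tokenization that drops punctuation and uses the natural logarithm, which reproduces the 0.405 IDF value; libraries such as scikit-learn use smoothed IDF variants by default, so their numbers differ slightly.

```python
import math
import re

corpus = [
    "The cat sat on the mat.",
    "The dog ate the bone.",
    "The cat and the dog are friends.",
]

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (punctuation is dropped)
    return re.findall(r"[a-z]+", text.lower())

docs = [tokenize(d) for d in corpus]

def tf(term, doc):
    # Number of occurrences divided by the number of terms in the document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Natural log of (total documents / documents containing the term);
    # assumes the term occurs in at least one document
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(round(tfidf("cat", docs[0], docs), 4))  # 0.0676
print(tfidf("the", docs[0], docs))            # 0.0, "the" occurs in every document
```

The second line shows why TF-IDF downweights stop words: a term present in every document gets an IDF of log(1) = 0.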

Word2Vec is a widely used word embedding technique in natural language processing (NLP) that
captures semantic relationships between words by representing them as dense vectors in a
continuous vector space.
Word2Vec is based on the distributional hypothesis, which states that words that appear in
similar contexts tend to have similar meanings.
How Word2Vec Works:
1. Architecture:
•Word2Vec consists of two main models: Continuous Bag of Words (CBOW) and Skip-gram.
•CBOW predicts a target word from its context words, while Skip-gram predicts the context
words from a target word.
2. Training:
•Word2Vec models are trained on large text corpora. During training, the model learns to
predict the context words for a given target word (Skip-gram) or the target word given
its context words (CBOW).
3. Vector Representation:
•Words with similar meanings or usage patterns are mapped to nearby points in the vector
space, which makes semantic similarity efficient to compute.
4. Semantic Similarity:
•Words with similar meanings have high cosine similarity values, while words with
dissimilar meanings have low cosine similarity values.
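The Skip-gram training objective described above can be made concrete with a minimal sketch in plain Python. It uses a full softmax over the vocabulary, not the negative sampling or hierarchical softmax of the original word2vec tool, and the vector size, window, learning rate, and epoch count are illustrative assumptions. On a corpus this small it only shows the prediction loss going down; it will not produce meaningful embeddings.

```python
import math
import random

sentences = ["The cat sat on the mat.", "The dog chased the ball."]
tokens = [s.lower().replace(".", "").split() for s in sentences]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}
V, DIM, WINDOW, LR = len(vocab), 5, 1, 0.1  # assumed hyperparameters

# Skip-gram training examples: (target index, context index)
pairs = [
    (index[sent[i]], index[sent[j]])
    for sent in tokens
    for i in range(len(sent))
    for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1))
    if j != i
]

rng = random.Random(0)
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(V)]   # target vectors (the embeddings)
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(V)]  # context vectors

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_epoch():
    total_loss = 0.0
    for t, c in pairs:
        h = W_in[t]
        # Probability of every vocabulary word being the context of the target
        probs = softmax([sum(h[k] * W_out[o][k] for k in range(DIM)) for o in range(V)])
        total_loss -= math.log(probs[c])  # cross-entropy for the true context word
        grad_h = [0.0] * DIM
        for o in range(V):
            err = probs[o] - (1.0 if o == c else 0.0)  # d(loss)/d(score_o)
            for k in range(DIM):
                grad_h[k] += err * W_out[o][k]
                W_out[o][k] -= LR * err * h[k]
        for k in range(DIM):
            h[k] -= LR * grad_h[k]  # updates W_in[t] in place
    return total_loss / len(pairs)

losses = [train_epoch() for _ in range(100)]
print(f"average loss: epoch 1 = {losses[0]:.3f}, epoch 100 = {losses[-1]:.3f}")
```

After training, the rows of `W_in` are the word vectors. A CBOW variant would instead average the context vectors and predict the target word.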
Example: Word2Vec

Consider the following sentences:

1. "The cat sat on the mat."
2. "The dog chased the ball."

Let's train a Word2Vec model on these sentences and then explore the resulting word vectors:

•After training, the Word2Vec model might represent the word "cat" as a vector like [0.2, -0.3, 0.5],
"dog" as [0.1, -0.4, 0.6], "sat" as [0.3, -0.2, 0.4], and so on.
•We can then compute the cosine similarity between these word vectors to measure their semantic
similarity. For example, the cosine similarity between the illustrative vectors for "cat" and "dog"
above is about 0.98, indicating a high degree of semantic similarity between these words.

•We can also perform vector arithmetic with word vectors. For example, if we subtract the vector for
"cat" from the vector for "dog" and add the result to the vector for "mat", we might obtain a vector
that is close to the vector for "ball", suggesting a semantic relationship between these words.
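The cosine similarity step can be checked directly on the illustrative vectors from the example. These are toy three-dimensional values taken from the text, not the output of a trained model, and `most_similar` is a hypothetical helper for ranking neighbours.

```python
import math

# Illustrative vectors from the example above (not real trained embeddings)
vectors = {
    "cat": [0.2, -0.3, 0.5],
    "dog": [0.1, -0.4, 0.6],
    "sat": [0.3, -0.2, 0.4],
}

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    # Rank the other known words by cosine similarity to `word`
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(round(cosine_similarity(vectors["cat"], vectors["dog"]), 2))  # 0.98
print(most_similar("cat"))  # dog
```

The same `cosine_similarity` function is what an analogy query would use: compute dog - cat + mat element-wise, then return the known word whose vector is closest to the result.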
GloVe (Global Vectors for Word Representation):
GloVe is a popular word embedding technique that captures the global co-occurrence statistics of words in a corpus
to generate dense vector representations.
These vectors are useful for various natural language processing (NLP) tasks, such as sentiment analysis,
machine translation, and named entity recognition.

Word Co-occurrence Matrix: GloVe starts by constructing a word co-occurrence matrix X, where X_ij represents
how often word i appears in the context of word j.

Example: Consider a small corpus containing the following sentences:

•Sentence 1: "The cat sat on the mat."
•Sentence 2: "The dog played in the park."
For simplicity, let's assume a window size of 1. The co-occurrence matrix for the words "cat," "dog," "sat," and
"played" would look like this:

        cat  dog  sat  played
cat      0    0    1     0
dog      0    0    0     1
sat      1    0    0     0
played   0    1    0     0

GloVe uses this matrix to learn vector representations for each word that capture the semantic relationships
between words based on their co-occurrences. These learned vectors can then be used in various downstream
NLP tasks.
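A minimal sketch of the counting step and of one term of the GloVe objective, f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)². The symmetric window of 1 matches the example; the weighting parameters x_max = 100 and α = 0.75 are the values reported in the GloVe paper. The training loop that fits the vectors and biases is not shown.

```python
import math
from collections import defaultdict

sentences = ["The cat sat on the mat.", "The dog played in the park."]
WINDOW = 1

def cooccurrence_counts(sents, window):
    # X[(w_i, w_j)] = how often w_j appears within `window` words of w_i
    X = defaultdict(float)
    for s in sents:
        toks = s.lower().replace(".", "").split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    X[(w, toks[j])] += 1.0
    return X

def glove_weight(x, x_max=100.0, alpha=0.75):
    # f(X_ij): damps rare pairs and caps very frequent pairs at 1
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij):
    # One term of the GloVe objective, defined only for non-zero counts x_ij
    dot = sum(a * b for a, b in zip(w_i, w_tilde_j))
    return glove_weight(x_ij) * (dot + b_i + b_tilde_j - math.log(x_ij)) ** 2

X = cooccurrence_counts(sentences, WINDOW)
words = ["cat", "dog", "sat", "played"]
print(" " * 7 + "".join(f"{w:>8}" for w in words))
for wi in words:
    print(f"{wi:>7}" + "".join(f"{X.get((wi, wj), 0.0):>8.0f}" for wj in words))
```

Summing `pair_loss` over all non-zero entries of X gives the quantity that GloVe minimizes with respect to the word vectors and biases.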
Neural word embeddings:
Neural word embeddings, particularly techniques like Word2Vec and FastText, are widely used methods for
representing words as dense vectors in NLP tasks.

Example: Consider a small corpus consisting of one sentence, from which training pairs of the form
(target, context) are built:

"the quick brown fox jumps over the lazy dog"
Data Preparation:
•From this sentence, we create (target, context) word pairs. With a window size of 1, for example:
• (quick, the), (quick, brown), (brown, quick), (brown, fox), and so on.

Consider a simple corpus with the following sentences:

•Sentence 1: "The cat sat on the mat."
•Sentence 2: "The dog played in the park."
We'll train a Word2Vec model on this corpus to generate word embeddings.
1. Data Preparation:
•Extract context-target word pairs within a specified window size around each target word.
•For example, with a window size of 1, the context words for the target word "cat" are "the" and "sat."
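The data-preparation step from both examples can be sketched as a small pair generator. The function name `context_pairs`, the whitespace tokenization, and the default window of 1 are assumptions for illustration.

```python
def context_pairs(sentence, window=1):
    # Build (target, context) pairs from every word within `window` positions
    tokens = sentence.lower().replace(".", "").split()
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = context_pairs("the quick brown fox jumps over the lazy dog")
print(pairs[:6])
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]

cat_context = [c for t, c in context_pairs("The cat sat on the mat.") if t == "cat"]
print(cat_context)  # ['the', 'sat']
```

Skip-gram trains on these pairs as given; CBOW groups the context words of each target and predicts the target from them.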
Neural Machine Translation (NMT) is a powerful approach to machine translation that uses neural networks,
including embedding layers, to translate text from one language to another.
1. Word Embeddings:
Embedding layers map the words of the source and target sentences to dense vectors. These word embeddings
capture semantic and syntactic information about words.
2. Neural Network Architecture:
Encoder: Takes the input sentence in the source language and processes it into a fixed-size vector
representation (the context vector).
Decoder: Generates the translated sentence in the target language based on the context vector produced by the
encoder.
3. Training:
During training, the model learns to minimize the difference between the predicted translation and the actual
(reference) translation, typically measured with a cross-entropy loss over the target words.
4. Inference:
During inference, the trained NMT model is used to translate new sentences. The input sentence is first tokenized
and converted into word embeddings. The encoder produces the context vector, which the decoder then uses to
generate the translated sentence word by word.

Example:
Let's say we have a sentence in English: "The cat is on the mat."
We want to translate this sentence into French: "Le chat est sur le tapis."

NMT with embeddings has achieved state-of-the-art performance in machine translation tasks and is widely
used in production systems for translating text between different languages.
Sequence-to-Sequence (Seq2Seq) Models:
Sequence-to-Sequence (Seq2Seq) models are a class of neural network architectures used for various
sequence generation tasks, with one prominent application being Neural Machine Translation (NMT).

Architecture:
•Seq2Seq models consist of two main components: an encoder and a decoder.
•Encoder: Processes an input sequence and encodes it into a fixed-size context vector, which captures the
semantic information of the input.
•Decoder: Takes the context vector produced by the encoder and generates an output sequence one step at a time.
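The encoder-decoder flow can be sketched with a toy, untrained recurrent model in plain Python. Everything here is an assumption for illustration: simple tanh RNN cells instead of the LSTM, GRU, attention, or Transformer layers used in practice, a tiny hand-written vocabulary, and random weights. It demonstrates only the structure: the encoder compresses an input of any length into a fixed-size context vector, and the decoder emits one word at a time (greedy choice) until it produces <eos> or reaches a length limit.

```python
import math
import random

rng = random.Random(42)
DIM = 8  # assumed embedding and hidden-state size

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(W, U, x, h):
    # Simple (Elman) RNN cell: h_new = tanh(W x + U h)
    return [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]

src_vocab = ["je", "suis", "étudiant"]
tgt_vocab = ["<sos>", "<eos>", "i", "am", "a", "student"]
out_vocab = tgt_vocab[1:]  # the decoder never emits <sos>

# Embedding layers (random here; learned during training in a real model)
src_emb = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in src_vocab}
tgt_emb = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in tgt_vocab}
W_enc, U_enc = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
W_dec, U_dec = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
W_out = rand_matrix(len(out_vocab), DIM)  # hidden state -> score per output word

def encode(tokens):
    h = [0.0] * DIM
    for tok in tokens:
        h = rnn_step(W_enc, U_enc, src_emb[tok], h)
    return h  # fixed-size context vector, whatever the input length

def greedy_decode(context, max_len=10):
    h, prev, output = context, "<sos>", []
    for _ in range(max_len):
        h = rnn_step(W_dec, U_dec, tgt_emb[prev], h)
        scores = matvec(W_out, h)
        prev = out_vocab[max(range(len(scores)), key=scores.__getitem__)]
        if prev == "<eos>":
            break
        output.append(prev)
    return output

context = encode(["je", "suis", "étudiant"])
print(len(context))            # 8
print(greedy_decode(context))  # arbitrary words, because the weights are untrained
```

Because the weights are random, the decoded words are arbitrary; training on sentence pairs such as ("Je suis étudiant.", "I am a student.") with a cross-entropy loss is what would turn this structure into a translator.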
Because Seq2Seq models are closely related to NMT, the following example is a translation task:

Example:
1.Input Sentence (Source Language): "Je suis étudiant."
2.Target Translation (Target Language): "I am a student."
In this example, the Seq2Seq model takes the input sentence "Je suis étudiant." (French) and generates the
corresponding translation "I am a student." (English).

Overall, Seq2Seq models have been highly successful in NMT tasks, achieving state-of-the-art
performance and outperforming traditional statistical machine translation approaches in many cases.
Metrics (BLEU Score & BERT Score)
BLEU Score (Bilingual Evaluation Understudy Score):

•Purpose: BLEU Score is a popular metric for evaluating the quality of machine-generated translations by
comparing them to reference translations.

•Calculation: It computes a score based on the n-gram overlap between the machine-generated
translation and one or more reference translations.

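•Procedure:
• Calculate the modified (clipped) precision of n-grams (1-gram, 2-gram, ..., up to a maximum n, commonly 4) in the
machine-generated translation: each candidate n-gram is counted at most as many times as it occurs in a reference translation.
• Compute a brevity penalty to penalize translations that are shorter than the reference: BP = 1 if the candidate length c
is greater than the reference length r, otherwise BP = exp(1 − r/c).
• Combine the n-gram precisions using a weighted geometric mean; standard BLEU gives every n-gram order the same weight (1/N).
• The final BLEU score is the brevity penalty multiplied by this geometric mean.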
•Example:
•Machine Translation: "The cat is on the mat."
•Reference Translation: "The cat is sitting on the mat."
We'll calculate the BLEU Score for this translation using unigrams and bigrams.
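A minimal sketch of that calculation, assuming a single reference, lowercase whitespace tokenization with the final period removed, uniform weights, and no smoothing. Production toolkits such as sacreBLEU standardize tokenization and support multiple references, so their scores can differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(cand, ref, n):
    # Clip each candidate n-gram count by its count in the reference
    cand_counts = Counter(ngrams(cand, n))
    ref_counts = Counter(ngrams(ref, n))
    clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    return clipped / max(1, sum(cand_counts.values()))

def brevity_penalty(c, r):
    return 1.0 if c > r else math.exp(1 - r / c)

def bleu(candidate, reference, max_n=2):
    cand = candidate.lower().replace(".", "").split()
    ref = reference.lower().replace(".", "").split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_avg = sum(math.log(p) for p in precisions) / max_n  # uniform weights 1/N
    return brevity_penalty(len(cand), len(ref)) * math.exp(log_avg)

score = bleu("The cat is on the mat.", "The cat is sitting on the mat.")
print(round(score, 3))  # 0.757
```

Here the unigram precision is 6/6 = 1.0, the bigram precision is 4/5 = 0.8 ("is on" does not occur in the reference), and the brevity penalty is exp(1 − 7/6) ≈ 0.846, giving 0.846 × √0.8 ≈ 0.757.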
Traditional Versus Neural Metrics for Machine Translation Evaluation
Neural Style Transfer (NST) is a technique in artificial intelligence and computer vision that allows
for the synthesis of images by combining the content of one image with the style of another. It is
inspired by the idea of separating and recombining content and style in visual art, and it has gained
popularity for generating artistic images and videos.

1. Content and Style Representation:


•Content Image (C): This is the input image whose content we want to preserve in the final stylized
image.
•Style Image (S): This image provides the artistic style that we want to apply to the content image.
2. Feature Extraction:
•Both the content image (C) and style image (S) are passed through a pretrained convolutional neural
network (CNN) such as VGG, ResNet, or Inception.
3. Content Loss:
•The content loss measures the difference in content between the generated image (G) and the content
image (C).
4. Style Loss:
•The style loss quantifies the difference in style between the generated image (G) and the style image (S). It is
computed by comparing the Gram matrices of their feature maps.
•The Gram matrix captures the correlations between different features, which helps in preserving style
characteristics such as textures, colors, and patterns.
5. Total Variation Loss:
•To ensure smoothness and reduce artifacts in the generated image, a total variation loss term is often
added.
•This loss encourages spatial coherence by penalizing rapid changes in pixel values.
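6. Optimization:
•Starting from an initial image (for example, the content image or random noise), the generated image (G) is updated
iteratively by gradient descent to minimize a weighted sum of the content, style, and total variation losses.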
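The three loss terms can be sketched in plain Python on toy feature maps. In real NST the features come from layers of a pretrained CNN such as VGG and the gradients come from an autodiff framework; here small hand-written lists stand in for both, and the loss weights `alpha`, `beta`, and `gamma` are hypothetical values a user would tune.

```python
def gram_matrix(features):
    # features: one list of spatial activations per channel (C x M)
    # Entry (i, j) is the inner product of channel i and channel j
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]

def content_loss(gen_feats, content_feats):
    # Half the squared error between the feature maps of G and C
    return 0.5 * sum(
        (g - c) ** 2
        for fg, fc in zip(gen_feats, content_feats)
        for g, c in zip(fg, fc)
    )

def style_loss(gen_feats, style_feats):
    # Squared Gram-matrix difference, normalized by channels (C) and positions (M)
    C, M = len(gen_feats), len(gen_feats[0])
    G, A = gram_matrix(gen_feats), gram_matrix(style_feats)
    return sum((G[i][j] - A[i][j]) ** 2 for i in range(C) for j in range(C)) / (4 * C**2 * M**2)

def total_variation_loss(img):
    # img: 2-D grid of pixel intensities; penalize differences between neighbours
    h, w = len(img), len(img[0])
    down = sum((img[i + 1][j] - img[i][j]) ** 2 for i in range(h - 1) for j in range(w))
    right = sum((img[i][j + 1] - img[i][j]) ** 2 for i in range(h) for j in range(w - 1))
    return down + right

def total_loss(gen_feats, content_feats, style_feats, img, alpha=1.0, beta=1e3, gamma=1e-2):
    # Weighted sum minimized during optimization (weights are hypothetical)
    return (alpha * content_loss(gen_feats, content_feats)
            + beta * style_loss(gen_feats, style_feats)
            + gamma * total_variation_loss(img))

print(gram_matrix([[1, 2, 3], [0, 1, 0]]))  # [[14, 2], [2, 1]]
```

Raising `beta` relative to `alpha` pushes the result toward the style image; raising `gamma` gives smoother, less noisy output.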
Pretrained NLP BERT Model and Its Applications
Pretrained NLP (Natural Language Processing) BERT (Bidirectional Encoder Representations from
Transformers) models have revolutionized various NLP tasks by providing contextualized word representations.

Pretrained BERT Model:

•BERT is a transformer-based model developed by Google AI, introduced in the paper "BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding."
•It utilizes bidirectional attention mechanisms to capture contextual information from both left and right
contexts in a sentence.
•BERT is pretrained on large text corpora using unsupervised learning objectives, such as masked language
modeling (MLM) and next sentence prediction (NSP).

Key Features of BERT:

1.Contextual Word Representations: BERT captures the contextual meaning of words by considering their
surrounding words in a sentence.
2.Bidirectional Attention: It uses bidirectional attention mechanisms to understand the relationships between
words in both directions.
3.Transformer Architecture: BERT is built upon the transformer architecture, which enables efficient processing of
long-range dependencies in text.
Applications of Pretrained BERT Models:

1. Text Classification:
•BERT can be fine-tuned for various text classification tasks such as sentiment analysis, spam detection, and document
categorization.

2. Named Entity Recognition (NER):
•BERT can be used for NER tasks to identify and classify named entities such as persons, organizations, and locations
within a text.

3. Question Answering (QA):
•BERT can be fine-tuned for QA tasks, such as those in SQuAD (Stanford Question Answering Dataset), where the model
is required to answer questions based on a given context passage.

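4. Text Generation:
•BERT is an encoder-only model and is not designed for left-to-right text generation; tasks such as language modeling and
dialogue generation are usually handled by decoder models like GPT (Generative Pre-trained Transformer). BERT can still
support generation-related tasks, for example extractive summarization or serving as the encoder in an encoder-decoder system.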

5. Semantic Similarity:
•BERT embeddings can be used to measure semantic similarity between sentences or documents. Applications include
duplicate detection, paraphrase identification, and information retrieval.

6. Machine Translation:
•While not traditionally a primary application, BERT embeddings can aid in machine translation tasks by providing
contextualized representations of source and target language sentences.
•These embeddings can be used in conjunction with encoder-decoder architectures to improve translation quality.
Sentiment Analysis By Using BERT As an Example For Reference

MLM: Masked Language Modeling
NSP: Next Sentence Prediction
CLS: special classification token
SEP: separator token
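To make these terms concrete, the sketch below builds a BERT-style input sequence and applies the masking used for MLM pretraining. Assumptions: whitespace tokenization stands in for BERT's WordPiece tokenizer, and the sentences and replacement vocabulary are hypothetical. The 15% selection rate and the 80/10/10 replacement split follow the original BERT description. For sentiment analysis, a classification layer reads the final hidden state of [CLS] during fine-tuning; that model side is not shown.

```python
import random

SPECIAL = ("[CLS]", "[SEP]")

def build_bert_input(sentence_a, sentence_b=None):
    # [CLS] A [SEP] (B [SEP]); segment id 0 for sentence A, 1 for sentence B (as in NSP)
    tokens = ["[CLS]"] + sentence_a.lower().split() + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if sentence_b is not None:
        part_b = sentence_b.lower().split() + ["[SEP]"]
        tokens += part_b
        segment_ids += [1] * len(part_b)
    return tokens, segment_ids

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    # MLM corruption: select ~15% of ordinary tokens; replace 80% of those
    # with [MASK], 10% with a random word, and leave 10% unchanged
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to predict the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
    return masked, labels

tokens, segments = build_bert_input("the movie was great", "i would watch it again")
print(tokens)
print(segments)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
masked, labels = mask_tokens(tokens, vocab=["good", "bad", "film"], rng=random.Random(0))
print(masked)
```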
Unit VI Artificial Immune Systems
What is the immune system?
The immune system protects the body from outside invaders. These include germs such as bacteria,
viruses, and fungi, as well as toxins (chemicals made by microbes). The immune system is made up of different organs,
cells, and proteins that work together.
There are 2 main parts of the immune system:
•The innate immune system. You are born with this.
•The adaptive immune system. You develop this when your body is exposed to microbes or chemicals released by
microbes.
These 2 immune systems work together.
Computational aspects of Immune System
From the point of view of information processing, the natural biological immune system exhibits many interesting characteristics.
Pattern matching: The immune system is able to recognize specific antigens and generate appropriate responses.
Feature extraction: In general, antibodies do not bind to the complete antigen but only to a portion of it. In this way, the immune
system can recognize an antigen just by matching segments of it.

Learning and Memory: The main characteristic of the adaptive immune system is that it is able to learn through interaction
with previously encountered antigens. The next time the same antigen is detected, the memory cells generate a faster and more
intense response (secondary response). Memory cells work as an associative distributed memory.

Diversity: Clonal selection and hypermutation mechanisms constantly test different detector configurations for known
and unknown antigens.

Distributed Processing: Unlike the nervous system, the immune system does not possess a central controller. Detection and
response can be executed locally and immediately without communicating with any central organ.

Self-regulation: Depending on the severity of the attack, the response of the immune system can range from very light, almost
imperceptible, to very strong. A stronger response uses a lot of resources to help repel the attacker. Once the invader is
eliminated, the immune system regulates itself to stop the delivery of new resources and to release the used ones.

Self-protection: By protecting the whole body, the immune system also protects itself. There is no additional system to
protect and maintain the immune system.
Immune Network Model:
This theory proposes that the immune system maintains an idiotypic network of interconnected
cells for antigen recognition. These cells both stimulate and suppress each other in a way
that leads to stabilization of the network. The formation of such a network is made possible by
the presence of a paratope and an idiotope on each antibody. The paratope of one B-cell
recognizes the idiotopes of other B-cells, so each cell both recognizes and is recognized.

From the network-formation point of view, two things are very important: antigen-antibody
binding and antibody-antibody binding. This idiotypic network can also be thought of as having
cognitive capabilities, which makes it similar to a neural network.
Negative Selection Algorithm
1. The purpose of negative selection is to provide tolerance for self-cells. It deals with the immune system's ability to detect
unknown antigens while not reacting to the self-cells.
2. During the generation of T-cells, receptors are made through a pseudo-random genetic rearrangement process. Then, they
undergo a censoring process in the thymus, called negative selection.

Step 1. In the generation stage, the detectors are generated by some random process and censored by trying to match self
samples, as shown in Fig 2.
Step 2. Those candidates that match are eliminated and the rest are kept as detectors.
Step 3. In the detection stage, the collection of detectors (or detector set) is used to check whether an incoming data instance
is self or non-self, as shown in Fig 3.
Step 4. If it matches any detector, then it is claimed as non-self or anomaly.
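The four steps above can be sketched in Python using binary strings and the r-contiguous-bits matching rule, a common choice in the negative selection literature. The self set, string length, threshold r, and detector count below are illustrative assumptions.

```python
import random

def r_contiguous_match(a, b, r):
    # Equal-length binary strings match if they agree in at least r consecutive positions
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=10_000):
    # Steps 1-2: generate random candidates and censor any that match a self sample
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_nonself(sample, detectors, r):
    # Steps 3-4: a sample that matches any detector is flagged as non-self (anomaly)
    return any(r_contiguous_match(sample, d, r) for d in detectors)

self_set = ["00000000", "00001111", "11110000"]
R = 4
detectors = generate_detectors(self_set, n_detectors=5, length=8, r=R, rng=random.Random(7))
print(detectors)
for sample in ["00000000", "10101010", "01010101"]:
    print(sample, "non-self" if is_nonself(sample, detectors, R) else "self")
```

Because matching is symmetric and every detector was censored against the self set, no self sample can ever be flagged; unseen strings are flagged only if some detector happens to cover them, which is why detector count and r control the detection rate.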
Clonal Selection Algorithm
The clonal selection principle of AIS describes how the
immune cells eliminate a foreign antigen, and it yields a
simple but efficient approximation algorithm for finding
near-optimal solutions.

Step 1: Initialize a number of antibodies (immune cells);
this number is the initial population size.
Step 2: When an antigen or pathogen invades the
organism, a number of antibodies that recognize these
antigens survive. In Fig.3.4, only antibody C is able to
recognize the antigen, as its structure fits a portion of
the pathogen. So the fitness of antibody C is higher than
that of the others.
Step 3: The immune cells that recognize antigens undergo
cellular reproduction. During reproduction the somatic cells
reproduce in an asexual form, i.e. there is no crossover of
genetic material during cell mitosis. The new cells are
copies (clones) of their parents, as shown for antibody C in
Fig.3.4.
Step 4: A portion of cloned cells undergo a mutation
mechanism which is known as somatic hypermutation as
described in [3.25].
Step 5: The affinity between two cells is a measure of their similarity; it is calculated from the distance between them.
The antibodies present in a memory response have, on average, a higher affinity than those of the early primary response.
This phenomenon is referred to as maturation of the immune response. During the mutation process, both the fitness and the
affinity of the antibodies change. In each iteration, after cloning and mutation, the antibodies with higher fitness and higher
affinity are allowed to enter the pool of efficient cells. Cells with low affinity or self-reactive receptors must be
efficiently eliminated.

Step 6: At each iteration, among the efficient immune cells, some become effector cells (plasma cells), while others are
maintained as memory cells. The effector cells secrete antibodies, while memory cells have a longer life span so that they can
act faster or more effectively in the future when the organism is exposed to the same or a similar pathogen.

Step 7: The process stops when the termination condition is satisfied; otherwise, steps 2 to 7 are repeated.
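A minimal sketch of these steps as a one-dimensional optimizer, loosely in the spirit of the CLONALG algorithm. The objective function, population size, clone counts, and mutation schedule are all assumptions made for illustration; here affinity is simply the objective value, whereas the text also describes affinity as a distance between cells.

```python
import random

def affinity(x):
    # Hypothetical objective to maximize; its single peak (1.0) is at x = 2
    return 1.0 / (1.0 + (x - 2.0) ** 2)

def clonal_selection(pop_size=20, n_select=5, clone_factor=3, n_random=2,
                     generations=50, lo=-10.0, hi=10.0, seed=1):
    rng = random.Random(seed)
    # Step 1: initial population of antibodies (candidate solutions)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):  # Step 7: repeat until the stopping condition
        # Step 2: antibodies that best recognize the antigen have the highest affinity
        population.sort(key=affinity, reverse=True)
        clones = []
        for rank, parent in enumerate(population[:n_select]):
            # Step 3: asexual cloning, with more clones for better-ranked antibodies
            n_clones = clone_factor * (n_select - rank)
            # Step 4: hypermutation; the mutation size shrinks as affinity rises
            sigma = 0.1 * (hi - lo) * (1.0 - affinity(parent))
            for _ in range(n_clones):
                clones.append(min(hi, max(lo, parent + rng.gauss(0.0, sigma))))
        # Steps 5-6: keep the highest-affinity cells; the best survive as memory
        pool = sorted(population + clones, key=affinity, reverse=True)
        # Low-affinity cells are eliminated and replaced by new random ones (diversity)
        population = pool[:pop_size - n_random] + [rng.uniform(lo, hi) for _ in range(n_random)]
    return max(population, key=affinity)

best = clonal_selection()
print(f"best x = {best:.4f}, affinity = {affinity(best):.4f}")
```

Keeping the top cells from one generation to the next plays the role of memory cells: the best affinity found so far can never decrease, while the random newcomers keep exploring.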
Danger Theory

For the immune system to function properly, it is very important that only the "correct" cells are matched; otherwise,
this could lead to a self-destructive autoimmune reaction.
In particular, it is thought that the maturation process plays an important role in achieving self-tolerance by eliminating
those T- and B-cells that react to self. In addition, a "confirmation" signal is required: for either B-cell or T-
(killer) cell activation, a T- (helper) lymphocyte must also be activated. This dual activation is further protection against
the chance of accidentally reacting to self.
❑ The Danger Theory takes care of “non-self but harmless” and of “self but harmful” invaders into our system.

❑ The central idea is that the immune system does not respond to non-self but to danger.

❑ In practice, there is no need to attack everything that is foreign, which seems to be supported by the
counter-examples above.

❑ In this theory, danger is measured by damage to cells indicated by distress signals that are sent out when cells die an
unnatural death.

❑ Signal 1: used for antigen recognition, basically to determine whether the cell is a foreign cell.

❑ Signal 2: used for co-stimulation.

In accordance with the two-signal model, the Danger Theory operates by three steps:

Step 1: Become activated if you receive signals one and two together. Die if you receive signal one in the absence of
signal two. Ignore signal two without signal one.

Step 2: Accept signal two from antigen-presenting cells only. Signal one may come from any cell.

Step 3: After activation, revert to the resting state after a short time.
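The three steps can be sketched as a small state machine. The class name, the state names, the "apc" source label, and the lifetime of two time steps are hypothetical choices for illustration.

```python
class Lymphocyte:
    # A minimal sketch of the two-signal rules (hypothetical names and lifetime)

    def __init__(self, active_lifetime=2):
        self.state = "resting"
        self.active_lifetime = active_lifetime  # time steps an activated cell stays active
        self._timer = 0

    def receive(self, signal1, signal2_source=None):
        if self.state == "dead":
            return self.state
        # Step 2: signal two counts only if it comes from an antigen-presenting cell
        signal2 = signal2_source == "apc"
        if signal1 and signal2:
            # Step 1: signals one and two together -> activation
            self.state, self._timer = "active", self.active_lifetime
        elif signal1:
            # Step 1: signal one without signal two -> the cell dies
            self.state = "dead"
        # Step 1: signal two without signal one is ignored (no state change)
        return self.state

    def tick(self):
        # Step 3: an activated cell reverts to the resting state after a short time
        if self.state == "active":
            self._timer -= 1
            if self._timer <= 0:
                self.state = "resting"
        return self.state

cell = Lymphocyte()
print(cell.receive(signal1=False, signal2_source="apc"))  # resting (signal two alone is ignored)
print(cell.receive(signal1=True, signal2_source="apc"))   # active
print(cell.tick(), cell.tick())                           # active resting
print(Lymphocyte().receive(signal1=True))                 # dead
```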


Applications of AIS models
1. Anomaly Detection: AIS models are widely used for anomaly detection in diverse fields such as cybersecurity,
network intrusion detection, fraud detection, and fault diagnosis.

2. Pattern Recognition: AIS models are employed for pattern recognition tasks in fields such as image processing,
bioinformatics, and data mining.

3. Optimization: AIS models are applied to optimization problems in various domains, including engineering
design, scheduling, and logistics.

4. Intrusion Detection Systems (IDS): AIS-based IDS are employed to detect and respond to malicious activities
and security breaches in computer networks.

5. Robotics and Autonomous Systems: AIS models are utilized in robotics and autonomous systems for tasks
such as path planning, navigation, and behavior adaptation.

6. Bioinformatics and Computational Biology: AIS models are applied in bioinformatics and computational
biology for tasks such as sequence alignment, protein structure prediction, and gene expression analysis.

7. Resource Allocation and Management: AIS models are utilized for resource allocation and management tasks
in dynamic and distributed systems such as telecommunications networks and cloud computing environments.
