AIML Material: Units I to V

This document provides a comprehensive overview of Artificial Intelligence (AI), its evolution, and applications, including intelligent agents and their environments. It covers key concepts such as neural networks, machine learning, and various types of agents, along with their structures and rationality. It also discusses historical milestones in AI development and the characteristics of the environments in which AI agents operate.


UNIT– I:
Introduction: Definition of Artificial Intelligence, Evolution, Need, and applications in real world.
Intelligent Agents, Agents and environments; Good Behavior-The concept of rationality, the nature
of environments, structure of agents.
Neural Networks and Genetic Algorithms: Neural network representation, problems, perceptrons,
multilayer networks and back propagation algorithms, Genetic algorithms.
UNIT– II:
Knowledge–Representation and Reasoning: Logical Agents: Knowledge based agents, the
Wumpus world, logic. Patterns in Propositional Logic, Inference in First-Order Logic-Propositional
vs first order inference, unification and lifting
UNIT– III:
Bayesian and Computational Learning: Bayes theorem, concept learning, maximum likelihood,
minimum description length principle, Gibbs Algorithm, Naïve Bayes Classifier, Instance Based
Learning- K-Nearest neighbour learning
Introduction to Machine Learning (ML): Definition, Evolution, Need, applications of ML in
industry and real world, classification; differences between supervised and unsupervised learning
paradigms.
UNIT– IV:
Basic Methods in Supervised Learning: Distance-based methods, Nearest-Neighbors, Decision
Trees, Support Vector Machines, Nonlinearity and Kernel Methods.
Unsupervised Learning: Clustering, K-means, Dimensionality Reduction, PCA and kernel.
UNIT– V:
Machine Learning Algorithm Analytics: Evaluating Machine Learning algorithms, Model
Selection, Ensemble Methods (Boosting, Bagging, and Random Forests).
Modeling Sequence/Time-Series Data and Deep Learning: Deep generative models, Deep
Boltzmann Machines, Deep auto-encoders, Applications of Deep Networks.

UNIT-I

Artificial Intelligence is a technology that enables a machine to think like a human.

The goal of AI is to make smart computer systems that, like humans, can solve complex
problems.

Machine Learning is a subset of AI that allows a machine to learn automatically from past
data without being explicitly programmed by a human.

Deep Learning is a subset of Machine Learning that uses huge volumes of data and complex
algorithms to train a machine.

What Is AI?

Artificial Intelligence suggests that machines can mimic (imitate) humans in:

• Talking

• Thinking

• Learning

• Planning

• Understanding

Artificial Intelligence is also called Machine Intelligence and Computer Intelligence.

AI Examples

Some well-known examples of Artificial Intelligence:

• Self Driving Cars

• E-Payment

• Google Maps

• Text Autocorrect

• Automated Translation

• Chatbots

• Social Media

• Face Detection

• Search Algorithms

• Robots

• Automated Investment

• NLP - Natural Language Processing

• Flying Drones

• IBM Watson

• Apple Siri

• Microsoft Cortana

• Amazon Alexa

Artificial Intelligence is not a new term or a new technology for researchers;
it is much older than you might imagine.

Maturation of Artificial Intelligence (1943-1952)

• Year 1943: The first work now recognized as AI was done by Warren McCulloch and
Walter Pitts in 1943. They proposed a model of artificial neurons.

• Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection strength
between neurons. His rule is now called Hebbian learning.

• Year 1950: Alan Turing, an English mathematician and a pioneer of machine
intelligence, published "Computing Machinery and Intelligence" in 1950, in which he
proposed a test that checks a machine's ability to exhibit intelligent behavior
equivalent to human intelligence, now called the Turing test.

The birth of Artificial Intelligence (1952-1956)


• Year 1955: Allen Newell and Herbert A. Simon created the "first artificial intelligence
program", named "Logic Theorist". This program proved 38 of 52

mathematics theorems, and found new and more elegant proofs for some of them.

• Year 1956: The term "Artificial Intelligence" was first adopted by American computer scientist

John McCarthy at the Dartmouth Conference. For the first time, AI was coined as an academic
field.
Around that time, high-level computer languages such as FORTRAN, LISP, and COBOL were invented,
and enthusiasm for AI was very high.

The golden years-Early enthusiasm (1956-1974)


• Year 1966: Researchers emphasized developing algorithms that could solve mathematical

problems. Joseph Weizenbaum created the first chatbot in 1966, which was named
ELIZA.

• Year 1972: The first intelligent humanoid robot, named WABOT-1, was built in Japan.

The first AI winter (1974-1980)


• The period from 1974 to 1980 was the first AI winter. "AI winter" refers
to a time period in which computer scientists faced a severe shortage of
government funding for AI research.

• During AI winters, public interest in artificial intelligence declined.

A boom of AI (1980-1987)

• Year 1980: After the AI winter, AI came back with "expert systems": programs

that emulate the decision-making ability of a human expert.

• Also in 1980, the first national conference of the American Association for Artificial
Intelligence (AAAI) was held at Stanford University.

The second AI winter (1987-1993)


• The period between 1987 and 1993 was the second AI winter.

• Investors and governments again stopped funding AI research because of high costs and
inefficient results. Even expert systems such as XCON proved very expensive to maintain.

The emergence of intelligent agents (1993-2011)


• Year 1997: IBM's Deep Blue beat world chess champion Garry Kasparov, becoming
the first computer to beat a reigning world chess champion.

• Year 2002: For the first time, AI entered the home, in the form of Roomba, a robotic vacuum cleaner.

• Year 2006: AI entered the business world. Companies like Facebook, Twitter, and
Netflix started using AI.

Deep learning, big data and artificial general intelligence (2011-present)
• Year 2011: IBM's Watson won Jeopardy!, a quiz show in which it had to answer
complex questions as well as riddles. Watson proved that it could understand natural language
and solve tricky questions quickly.
• Year 2012: Google launched an Android app feature, "Google Now", which was able to provide
information to the user as a prediction.
• Year 2014: The chatbot "Eugene Goostman" won a competition based on the famous "Turing
test".
• Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and
performed extremely well.

Agents in Artificial Intelligence


An AI system can be defined as the study of rational agents and their environments.

The agents sense the environment through sensors and act on their environment through actuators.

An AI agent can have mental properties such as knowledge, belief, intention, etc.

Agent Terminology

• Performance Measure of Agent − The criterion that determines how successful an agent is.
• Behavior of Agent − The action that the agent performs after any given sequence of percepts.
• Percept − The agent's perceptual input at a given instant.
• Percept Sequence − The history of everything the agent has perceived to date.
• Agent Function − A map from the percept sequence to an action.

What is an Agent?
An agent gets information from its environment through sensors

and acts upon that environment through actuators.

An agent runs in a cycle of perceiving, thinking, and acting.

An agent can be:

o Human Agent: A human agent has eyes, ears, and other organs that work as sensors, and hands, legs, and a vocal tract
that work as actuators.

o Robotic Agent: A robotic agent can have cameras, infrared range finders, and other sensors, and various motors as actuators.

o Software Agent: A software agent can take keystrokes and file contents as sensory input, act on those inputs, and
display output on the screen.

Hence the world around us is full of agents, such as cellphones and cameras; even we ourselves are agents.

Before moving forward, we should first know about sensors, effectors, and actuators.

Sensor: A sensor is a device that detects changes in the environment and sends the information to other electronic
devices. An agent observes its environment through sensors.

Actuators: Actuators are the components of a machine that convert energy into motion. Actuators are responsible
for moving and controlling a system. An actuator can be an electric motor, gears, etc.

Effectors: Effectors are the devices that actually affect the environment. Effectors can be legs, wheels, arms, fingers, wings, fins,
and display screens.

PEAS for self-driving cars:


Let's suppose a self-driving car then PEAS representation will be:

Performance: Safety, time, legal driving, comfort

Environment: Roads, other vehicles, road signs, pedestrians

Actuators: Steering, accelerator, brake, signal, horn

Sensors: Camera, GPS, speedometer, odometer, accelerometer, sonar.

Intelligent Agents:
An intelligent agent is an autonomous entity that acts upon an environment using sensors and actuators to achieve
goals.

An intelligent agent may learn from the environment to achieve its goals.

A thermostat is an example of an intelligent agent.

Following are the four main rules for an AI agent:

o Rule 1: An AI agent must have the ability to perceive (receive) the environment.

o Rule 2: The observation must be used to make decisions.

o Rule 3: Decision should result in an action.

o Rule 4: The action taken by an AI agent must be a rational action.

Rationality means the existence of reasons for a particular set of actions.

Good Behavior: Rationality



The rationality of an agent is measured by its performance measure.

Rationality can be judged on the basis of following points:

o Performance measure which defines the success criterion.

o The agent's prior knowledge of its environment.

o Best possible actions that an agent can perform.

o The sequence of percepts.

Rational Agent:
An agent that maximizes its performance measure is called a rational agent.

A rational agent has clear reasons for its actions.

A rational agent is said to do the right things:

it maximizes its performance measure, given all possible reasons based on its environment.

AI is about creating rational agents.

Rational agents are useful in game theory and decision theory for various real-world scenarios.

Structure of an AI Agent
The task of AI is to design an agent program which implements the agent function.

The structure of an intelligent agent is a combination of architecture and agent program.

An agent's structure can be viewed as:

• Agent = Architecture + Agent Program


• Architecture = the machinery that an agent executes on.
• Agent Program = an implementation of an agent function.

Following are the three main terms involved in the structure of an AI agent:

Architecture: Architecture is machinery that an AI agent executes on.

Agent Function: Agent function is used to map a percept to an action.



PEAS Representation
PEAS is a model used to describe the task an AI agent works upon.

When we define an AI agent or rational agent, then we can group its properties under PEAS representation model. It is made
up of four words:

o P: Performance measure

o E: Environment

o A: Actuators

o S: Sensors

Here performance measure is the objective for the success of an agent's behavior.

Example of Agents with their PEAS representation

Agent | Performance measure | Environment | Actuators | Sensors

1. Medical Diagnosis | Healthy patient, minimized cost | Patient, hospital, staff | Tests, treatments | Keyboard (entry of symptoms)

2. Vacuum Cleaner | Cleanness, efficiency, battery life, security | Room, table, wood floor, carpet, various obstacles | Wheels, brushes, vacuum extractor | Camera, dirt detection sensor, cliff sensor, bump sensor, infrared wall sensor

3. Part-picking Robot | Percentage of parts in correct bins | Conveyor belt with parts, bins | Jointed arms, hand | Camera, joint angle sensors

What is Ideal Rational Agent?


An ideal rational agent is capable of performing all expected actions to maximize its performance measure, on the basis of −

• Its percept sequence (the sequence of information it has received)


• Its built-in knowledge base
Rationality of an agent depends on the following −
• The performance measures, which determine the degree of success.
• Agent’s Percept Sequence till now.
• The agent’s prior knowledge about the environment.
• The actions that the agent can carry out.
A rational agent always performs the right action,

where the right action means the action that causes the agent to be most successful for the given percept sequence.

The problem the agent solves is characterized by Performance Measure, Environment, Actuators, and Sensors (PEAS).

Simple Reflex Agents


• They choose actions based only on the current percept (input).
• They are rational only if a correct decision can be made on the basis of the current percept (input) alone.
• Their environment is completely observable.
Condition-Action Rule − It is a rule that maps a state (condition) to an action.
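A condition-action rule can be made concrete with a minimal sketch in Python; the two-square vacuum world and its rules below are illustrative assumptions, not from the text:

```python
# A simple reflex agent for a two-square vacuum world.
# The percept is (location, status); each rule maps a condition to an action.
def simple_reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":        # condition: current square is dirty
        return "Suck"            # action
    if location == "A":          # condition: on square A and it is clean
        return "Right"
    return "Left"                # otherwise: on square B and it is clean

print(simple_reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(simple_reflex_vacuum_agent(("A", "Clean")))  # Right
```

Note that the agent consults only the current percept; it keeps no memory of past percepts, which is exactly why it needs a fully observable environment.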

Model Based Reflex Agents


They use a model of the world to choose their actions. They maintain an internal state.

Model − knowledge about “how things happen in the world”.

Internal State − It is a representation of unobserved aspects of current state depending on percept history.

Updating the state requires the information about −

• How the world evolves.

• How the agent’s actions affect the world.



Goal Based Agents


They choose their actions in order to achieve goals.

The goal-based approach is more flexible than the reflex approach,

since the knowledge supporting a decision is explicitly modeled, thereby allowing for modifications.

Goal − It is the description of desirable situations.

Utility Based Agents


They choose actions based on a preference (utility) for each state.

Goals alone are inadequate when −


• There are conflicting goals, out of which only a few can be achieved.

The Nature of Environments


An environment is everything in the world that surrounds the agent,

but is not part of the agent itself.

An environment can be described as the situation in which an agent is present.

The environment is where the agent lives and operates; it provides the agent with something to sense and act upon.

Features of Environment
As per Russell and Norvig, an environment can have various features from the point of view of an agent:

1. Fully observable vs Partially Observable

2. Static vs Dynamic

3. Discrete vs Continuous

4. Deterministic vs Stochastic

5. Single-agent vs Multi-agent

6. Episodic vs sequential

7. Known vs Unknown

8. Accessible vs Inaccessible

1. Fully observable vs Partially Observable:



o If an agent's sensors can sense or access the complete state of the environment at each point in time, then it is a fully
observable environment; otherwise it is partially observable.

o A fully observable environment is easy to handle, as there is no need to maintain internal state to keep track of the history
of the world.

o If an agent has no sensors in an environment, then such an environment is called unobservable.

2. Deterministic vs Stochastic:
o If an agent's current state and selected action completely determine the next state of the environment,

o then such an environment is called a deterministic environment.

o A stochastic environment is random in nature and cannot be determined completely by an agent.

o In a deterministic, fully observable environment, the agent does not need to worry about uncertainty.

3. Episodic vs Sequential:
o In an episodic environment, there is a series of one-shot actions, and only the current percept is required for the
action.

o However, in Sequential environment, an agent requires memory of past actions to determine the next best actions.

4. Single-agent vs Multi-agent
o If only one agent is involved in an environment, and operating by itself then such an environment is called single
agent environment.

o However, if multiple agents are operating in an environment, then such an environment is called a multi-agent
environment.

o The agent design problems in the multi-agent environment are different from single agent environment.

5. Static vs Dynamic:
o If the environment can change itself while an agent is deliberating then such environment is called a dynamic
environment else it is called a static environment.

o Static environments are easy to deal with because an agent does not need to keep looking at the world while
deciding on an action.

o However, in a dynamic environment, agents need to keep looking at the world before each action.

6. Discrete vs Continuous:
o If in an environment there are a finite number of percepts and actions that can be performed within it, then such an
environment is called a discrete environment else it is called continuous environment.

o A chess game comes under discrete environment as there is a finite number of moves that can be performed.

o A self-driving car is an example of a continuous environment.



7. Known vs Unknown
o Known and unknown are not actually features of an environment; they describe the agent's state of knowledge needed to perform
an action.

o In a known environment, the results for all actions are known to the agent. While in unknown environment, agent
needs to learn how it works in order to perform an action.

o It is quite possible for a known environment to be partially observable and for an unknown environment to be fully
observable.

8. Accessible vs Inaccessible
o If an agent can obtain complete and accurate information about the environment's state, then it is
called an accessible environment; otherwise it is called inaccessible.

o An empty room whose state can be defined by its temperature is an example of an accessible environment.

o Information about an event on earth is an example of Inaccessible environment.



Neural Networks and Genetic Algorithms: Neural network representation, problems, perceptrons,
multilayer networks and back propagation algorithms, Genetic algorithms.

[Figure: example diagram of a neural network]



Introduction: Artificial Neural Networks


1. Artificial Neural Networks (ANN) are multi-layer fully-connected neural nets that look like the figure
below.
2. They consist of an input layer, multiple hidden layers, and an output layer.
3. Every node in one layer is connected to every other node in the next layer.
4. We make the network deeper by increasing the number of hidden layers.

Neural networks can solve problems that cannot easily be solved by hand-written algorithms, for example:


1. Medical Diagnosis
2. Face Detection
3. Voice Recognition

Neural networks are the foundation of Deep Learning.



1. A given node takes the weighted sum of its inputs, and passes it through a non-linear activation function.
2. This is the output of the node, which then becomes the input of another node in the next layer.
3. The signal flows from left to right, and the final output is calculated by performing this procedure for all
the nodes.
4. Training this deep neural network means learning the weights associated with all the edges.
5. The equation for a given node looks as follows:
6. the weighted sum of its inputs is passed through a non-linear activation function, y = f(w1x1 + w2x2 + ... + wnxn + b).
7. It can be represented as a vector dot product, y = f(w · x + b), where n is the number of inputs for the node.

Referred from:-https://towardsdatascience.com/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6
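As a minimal sketch of this node computation in Python (the input, weight, and bias values below are arbitrary illustrations, and sigmoid is just one common choice of activation):

```python
import numpy as np

def node_output(inputs, weights, bias):
    """Weighted sum of inputs passed through a non-linear activation (sigmoid)."""
    z = np.dot(weights, inputs) + bias      # vector dot product over n inputs, plus bias
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid squashes the sum non-linearly

# Example node with n = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
print(node_output(x, w, bias=0.1))          # a value between 0 and 1
```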

A weight is a parameter within a neural network that transforms

input data within the network's hidden layers.

A neural network is a series of nodes, or neurons. Within each node is a set

of inputs, weights, and a bias value.

Neurons
Scientists agree that our brain has around 100 billion neurons.
These neurons have hundreds of billions of connections between them.

Neurons (aka Nerve Cells) are the fundamental units of our brain and nervous system.
The neurons are responsible for receiving input from the external world, for sending output
(commands to our muscles), and for transforming the electrical signals in between.

Neural Networks
Artificial Neural Networks are normally called Neural Networks (NN).
Neural networks are in fact multi-layer perceptrons ("perceptron" connotes perceiving ability).
The perceptron is the single-unit building block of multi-layer neural networks.

Perceptron in Machine Learning


1. The perceptron is a building block of an Artificial Neural Network.
2. In Machine Learning and Artificial Intelligence, "perceptron" is a very commonly used term.
3. It is the primary step in learning Machine Learning and Deep Learning technologies,
4. and it consists of a set of weights, input values or scores, and a threshold.
5. Frank Rosenblatt invented the perceptron in the mid-20th century (1957)
6. for performing certain calculations to detect capabilities in input data.
7. The perceptron is a linear Machine Learning algorithm used for supervised learning of various binary classifiers.
8. This algorithm enables neurons to learn elements and process them one by one during training.
9. Let's start with a basic introduction to the perceptron.

Basic Components of Perceptron


Frank Rosenblatt invented the perceptron model as a binary classifier, which contains three main components.
These are as follows:

Input Nodes or Input Layer:


This is the primary component of Perceptron which accepts the initial data into the system for further processing.
Each input node contains a real numerical value.
Weight and Bias:
The weight parameter represents the strength of the connection between units.
This is another very important parameter of the perceptron's components.
Weight is directly proportional to the strength of the associated input neuron in deciding the output.
Activation Function:

These are the final and most important components, which help determine whether the neuron will fire or not.
The activation function can be considered primarily a step function.
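Putting the three components together, here is a minimal sketch of a Rosenblatt-style perceptron in Python; the AND-function dataset, learning rate, and epoch count are illustrative assumptions, not from the text:

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Step activation: fire (output 1) if the weighted sum exceeds 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

def perceptron_train(X, y, lr=0.1, epochs=10):
    """Classic perceptron learning rule: nudge weights toward correct outputs."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - perceptron_predict(xi, w, b)
            w += lr * error * xi   # adjust weights in proportion to the error
            b += lr * error
    return w, b

# Example: learn the logical AND function (a linearly separable problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([perceptron_predict(xi, w, b) for xi in X])  # [0, 0, 0, 1]
```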

Backpropagation Step by Step


If you are building your own neural network, you will definitely need to understand how to train it.
Backpropagation is a commonly used technique for training neural networks.
There are many resources explaining the technique, but this section explains backpropagation with a concrete example in
detailed steps.

In this post, we will build a neural network with three layers:


• An input layer with two input neurons

• One hidden layer with two neurons

• Output layer with a single neuron

Weights
Neural network training is about finding weights that minimize prediction error.
We usually start our training with a set of randomly generated weights.
Then, backpropagation is used to update the weights in an attempt to correctly map arbitrary inputs to outputs.

Our initial weights will be as following:

w1 = 0.11,
w2 = 0.21,
w3 = 0.12,
w4 = 0.08,
w5 = 0.14 and
w6 = 0.15

Dataset
Our dataset has one sample with two inputs and one output.

Our single sample has inputs = [2, 3] and output = [1].

Forward Pass
We will use the given weights and inputs to predict the output. Inputs are multiplied by weights, and the results are passed
forward to the next layer.

Calculating Error
• Now, it's time to find out how our network performed by calculating the difference between the actual output
and the predicted one.
• It's clear that our network's output, or prediction, is not even close to the actual output.
• We can calculate the difference, or the error, as the squared error E = ½(prediction − actual)².
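Following the numbers above, here is a minimal sketch of the forward pass and the error calculation (assuming simple linear nodes, i.e. no activation function, to keep the arithmetic visible):

```python
# Weights and data from the text
w1, w2, w3, w4, w5, w6 = 0.11, 0.21, 0.12, 0.08, 0.14, 0.15
i1, i2 = 2, 3          # inputs
target = 1             # actual output

# Forward pass: multiply inputs by weights, pass results to the next layer
h1 = i1 * w1 + i2 * w2          # hidden node 1: 0.85
h2 = i1 * w3 + i2 * w4          # hidden node 2: 0.48
prediction = h1 * w5 + h2 * w6  # output node:   0.191

# Squared-error loss: not even close to the actual output of 1
error = 0.5 * (prediction - target) ** 2
print(prediction, error)        # 0.191  0.327...
```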

Reducing Error
Our main goal in training is to reduce the error, i.e., the difference between prediction and actual output.
Since the actual output is constant ("not changing"), the only way to reduce the error is to change the prediction value.
The question now is: how do we change the prediction value?
By decomposing the prediction into its basic elements, we can find that the weights are the variable elements
affecting the prediction value.
In other words, in order to change the prediction value, we need to change the weight values.

Backpropagation

Backpropagation, short for “backward propagation of errors”, is a mechanism used to update

the weights using gradient descent.

The calculation proceeds backwards through the network.
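Continuing the sketch from the forward pass above, one backpropagation step computes the gradients by the chain rule and updates the weights with gradient descent (the learning rate is an illustrative choice):

```python
lr = 0.05                       # learning rate (illustrative)
delta = prediction - target     # dE/d(prediction) for E = 1/2 (prediction - target)^2

# Gradients via the chain rule, working backwards through the network
# (prediction = h1*w5 + h2*w6, h1 = i1*w1 + i2*w2, h2 = i1*w3 + i2*w4)
grad_w5, grad_w6 = delta * h1, delta * h2
grad_w1, grad_w2 = delta * w5 * i1, delta * w5 * i2
grad_w3, grad_w4 = delta * w6 * i1, delta * w6 * i2

# Gradient-descent update: move each weight against its gradient
w5 -= lr * grad_w5; w6 -= lr * grad_w6
w1 -= lr * grad_w1; w2 -= lr * grad_w2
w3 -= lr * grad_w3; w4 -= lr * grad_w4
```

Repeating the forward pass with the updated weights gives a prediction slightly closer to the target; training repeats this cycle until the error is acceptably small.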

Genetic Algorithm in Machine Learning


1. A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's theory of
evolution in Nature."
2. It is used to solve optimization problems in machine learning.
3. It is one of the important algorithms as it helps solve complex problems that would take a long time to
solve.
4. Genetic Algorithms are being widely used in different real-world applications, for example, Designing
electronic circuits, code-breaking, image processing, and artificial creativity.
5. The genetic algorithm is a method for solving optimization and search problems that is based on natural
selection,
6. deriving from the process that drives biological evolution.
7. The genetic algorithm repeatedly modifies a population of individual solutions.
8. At each step, the genetic algorithm selects individuals from the current population to be parents and uses
them to produce the children for the next generation.
9. Over successive generations, the population "evolves" toward an optimal solution.
10. You can apply the genetic algorithm to solve a variety of optimization problems.

What is a Genetic Algorithm?


Before understanding the Genetic algorithm, let's first understand basic terminologies to better understand this algorithm:

o Population: Population is the subset of all possible or probable solutions, which can solve the given problem.

o Chromosomes: A chromosome is one of the solutions in the population for the given problem, and a collection of genes generates a
chromosome.

o Gene: A chromosome is divided into different genes; a gene is an element of the chromosome.

o Allele: Allele is the value provided to the gene within a particular chromosome.

o Fitness Function: The fitness function is used to determine the individual's fitness level in the population. It means the ability of an individual
to compete with other individuals. In every iteration, individuals are evaluated based on their fitness function.

o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring better than the parents. Genetic operators play
a role in changing the genetic composition of the next generation.

o Selection

After calculating the fitness of every individual in the population, a selection process is used to determine which of the individuals in the population
will get to reproduce and produce the offspring that will form the coming generation.

1. Initialization
The process of a genetic algorithm starts by generating the set of individuals, which is called population. Here each individual is the solution for the
given problem. An individual contains or is characterized by a set of parameters called Genes. Genes are combined into a string and generate
chromosomes, which is the solution to the problem. One of the most popular techniques for initialization is the use of random binary strings.

2. Fitness Assignment
1. The fitness function is used to determine how fit an individual is.

2. It means the ability of an individual to compete with other individuals.

3. In every iteration, individuals are evaluated based on their fitness function.

4. The fitness function provides a fitness score to each individual.

5. This score further determines the probability of being selected for reproduction.

6. The higher the fitness score, the greater the chance of being selected for reproduction.

3. Selection
1. The selection phase involves the selection of individuals for the reproduction of offspring.
2. All the selected individuals are then arranged in pairs of two for reproduction.
3. Then these individuals transfer their genes to the next generation.

4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step, the genetic algorithm uses
two variation operators that are applied to the parent population. The two operators involved in the reproduction phase are
given below:
o Crossover: The crossover plays a most significant role in the reproduction phase of the genetic algorithm. In this
process, a crossover point is selected at random within the genes. Then the crossover operator swaps genetic
information of two parents from the current generation to produce a new individual representing the offspring.

o The genes of parents are exchanged among themselves until the crossover point is met.

o These newly generated offspring are added to the population.

o This process is also called recombination or crossover.

o Types of crossover styles available:

1. One-point crossover

2. Two-point crossover

3. Uniform crossover

4. Whole arithmetic crossover

Mutation
The mutation operator inserts random genes into the offspring (new child) to maintain diversity in the population.
o It can be done by flipping some bits in the chromosomes.
Mutation helps in solving the issue of premature convergence and enhances diversification.
Types of mutation styles available:
➢ Flip-bit mutation
➢ Exchange/swap mutation
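The phases above can be combined into a short program. Below is a minimal Python sketch for the toy "OneMax" problem (maximize the number of 1-bits in a chromosome); the problem, population size, tournament selection, and mutation rate are illustrative assumptions, not from the text:

```python
import random

GENES, POP, GENERATIONS, MUT_RATE = 20, 30, 50, 0.01

def fitness(chrom):                     # fitness function: count the 1-bits
    return sum(chrom)

def select(pop):                        # tournament selection: best of 3 random picks
    return max(random.sample(pop, 3), key=fitness)

def crossover(p1, p2):                  # one-point crossover of two parents
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(chrom):                      # flip-bit mutation with small probability
    return [1 - g if random.random() < MUT_RATE else g for g in chrom]

# Initialization: a population of random binary strings
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):            # evolve generation by generation
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]
print(max(fitness(c) for c in population))  # approaches the optimum of 20
```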

What Is the Genetic Algorithm?

Referred from: https://in.mathworks.com/help/gads/what-is-the-genetic-algorithm.html


UNIT – II
Introduction to Prolog

Knowledge Representation, problems in representing knowledge, knowledge
representation using propositional and predicate logic, logical consequences, syntax and
semantics of an expression, semantic tableau, forward and backward reasoning, proof
methods, substitution and unification, conversion to clausal form, normal forms, resolution,
refutation, deduction, theorem proving, inferencing, monotonic and non-monotonic
reasoning.

KNOWLEDGE REPRESENTATION

What exactly is meant by knowledge representation?

Knowledge consists of facts, concepts, rules, and so forth. It can be represented in

different forms: as mental images in one's thoughts, as spoken or written words in some
language, as graphical or other pictures, and as character strings or collections of magnetic
spots stored in a computer.
Example:

Suppose we wish to write a program to play a simple card game using the standard
deck of 52 playing cards. We will need some way to represent the cards dealt to each player
and a way to express the rules. We can represent cards in different ways.

1. The most straightforward way is to record the suit (clubs, diamonds, hearts, spades)
and face value (ace, 2, 3, ..., 10, jack, queen, king) as a symbolic pair. So the queen
of hearts might be represented as <queen, hearts>.

2. Alternatively, we could assign abbreviated codes (c6 for the 6 of clubs), numeric
values which ignore suit (1, 2, ... , 13), or some other scheme. If the game we wish to
play is bridge, suit as well as value will be important.

3. On the other hand, if the game is blackjack, only face values are important, and a
simpler program will result if only numeric values are used.
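As a small illustration of these representation choices (the dictionary and function names are hypothetical):

```python
# Two ways to represent playing cards, as described above.
# 1. Symbolic pairs keep suit and face value (needed for a game like bridge):
queen_of_hearts = ("queen", "hearts")
six_of_clubs = ("6", "clubs")

# 2. Numeric values that ignore suit (enough for a game like blackjack):
FACE_VALUES = {"ace": 1, "jack": 11, "queen": 12, "king": 13}

def numeric_value(card):
    face, _suit = card
    if face in FACE_VALUES:
        return FACE_VALUES[face]
    return int(face)                   # "2" .. "10" map to their numbers

print(numeric_value(queen_of_hearts))  # 12
print(numeric_value(six_of_clubs))     # 6
```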

To see how important a good representation is, one only needs to try solving a few simple
problems using different representations. Consider the problem of discovering a pattern in the
sequence of numbers 1 1 2 3 4 7. A change of base in the numbers from 10 to 2 transforms the
sequence to
011011011011011011.

Clearly a representation in the proper base greatly simplifies finding the


pattern solution. There are several representation schemes that have become popular among
AI practitioners. Perhaps the most important of these is first order predicate logic. It has
become important because it is one of the few methods that has a well-developed theory, has
reasonable expressive power, and uses valid form of inferring. Other representation schemes
include frames and associative networks (also called semantic and conceptual networks),
fuzzy logic, modal logics, and object-oriented methods.

To solve the complex problems encountered in artificial intelligence,


one needs both large amount of knowledge and some mechanisms for manipulating that
knowledge to create solutions to new problems. We must consider the following point that
pertains to all discussions of representation, namely that we are dealing with two different
kinds of entities:
• Facts: truths in some relevant world. These are the things we want to represent.
• Representations of facts in some chosen formalism. These are the things we will
actually be able to manipulate.
• The knowledge level, at which facts (including each agent's behaviors and current
goals) are described.
• The symbol level, at which representations of objects at the knowledge level are
defined in terms of symbols that can be manipulated by programs.

Representations and Mappings

 In order to solve complex problems encountered in artificial intelligence, one needs both
a large amount of knowledge and some mechanism for manipulating that knowledge to
create solutions.

 Knowledge and Representation are two distinct entities. They play central but
distinguishable roles in the intelligent system.

 Knowledge is a description of the world. It determines a system’s competence by what it


knows.

 Moreover, representation is the way knowledge is encoded. It defines a system’s

performance in doing something.

 Different types of knowledge require different kinds of representation.

The Knowledge Representation models/mechanisms are often based on:


 Logic
 Rules
 Frames
 Semantic Net
Knowledge is categorized into two major types:

1. Tacit corresponds to “informal” or “implicit“

 Exists within a human being;


 It is embodied.
 Difficult to articulate formally.
 Difficult to communicate or share.
 Moreover, Hard to steal or copy.
 Drawn from experience, action, subjective insight

2. Explicit corresponds to the “formal“ type of knowledge

 Explicit knowledge
 Exists outside a human being;
 It is embedded.
 Can be articulated formally.
 Also, Can be shared, copied, processed and stored.
 So, Easy to steal or copy
 Drawn from the artifact of some type as a principle,
procedure, process, concepts.
A variety of ways of representing knowledge have been exploited in AI programs.
There are two different kinds of entities, we are dealing with.
1. Facts: Truth in some relevant world. Things we want to represent.
2. Also, Representation of facts in some chosen formalism. Things we will be able to
manipulate.
These entities structured at two levels:
1. The knowledge level, at which facts described.
2. Moreover, the symbol level, at which representation of objects defined in terms of
symbols that can manipulate by programs.

Framework of Knowledge Representation

 The computer requires a well-defined problem description to process and provide a well-
defined, acceptable solution.

 Moreover, to collect fragments of knowledge we need first to formulate a description in

our spoken language and then represent it in a formal language so that the computer can
understand it.

 Also, the computer can then use an algorithm to compute an answer. This process is
illustrated as the knowledge representation framework.

The steps are:


 An informal formulation of the problem takes place first.
 It is then represented formally, and the computer produces an output.
 This output can then be represented as an informally described solution that the user understands
or checks for consistency.
The Problem solving requires,
 Formal knowledge representation, and
 Moreover, Conversion of informal knowledge to a formal knowledge that is the
conversion of implicit knowledge to explicit knowledge.

Mapping between Facts and Representation

 Knowledge is a collection of facts from some domain.


 Also, we need a representation of “facts“ that can be manipulated by a program.
 Moreover, normal English is insufficient: it is currently too hard for a computer program to
draw inferences in natural languages.
 Thus some symbolic representation is necessary. A good knowledge representation enables
fast and accurate access to knowledge and understanding of the content.
A knowledge representation system should have following properties.
1. Representational Adequacy
 The ability to represent all kinds of knowledge that are needed in that domain.
2. Inferential Adequacy
 Also, The ability to manipulate the representational structures to derive new
structures corresponding to new knowledge inferred from old.
3. Inferential Efficiency
 The ability to incorporate additional information into the knowledge structure that
can be used to focus the attention of the inference mechanisms in the most
promising direction.
4. Acquisitional Efficiency
 Moreover, The ability to acquire new knowledge using automatic methods
wherever possible rather than reliance on human intervention.

Knowledge Representation Schemes

Relational Knowledge
 The simplest way to represent declarative facts is as a set of relations of the same sort used
in database systems.
 Provides a framework to compare two objects based on equivalent attributes.
 Any instance in which two different objects are compared is a relational type of knowledge.
 The table below shows a simple way to store facts.
 Also, The facts about a set of objects are put systematically in columns.
 This representation provides little opportunity for inference.

 Given the facts, it is not possible to answer a simple question such as: “Who is the
heaviest player?”
 Also, But if a procedure for finding the heaviest player is provided, then these facts will
enable that procedure to compute an answer.
 Moreover, We can ask things like who “bats – left” and “throws – right”.

Inheritable Knowledge
 Here the knowledge elements inherit attributes from their parents.
 The knowledge embodied in the design hierarchies found in the functional, physical and
process domains.
 Within the hierarchy, elements inherit attributes from their parents, but in many cases, not
all attributes of the parent elements are prescribed to the child elements.
 Also, The inheritance is a powerful form of inference, but not adequate.
 Moreover, The basic KR (Knowledge Representation) needs to augment with inference
mechanism.
 Property inheritance: The objects or elements of specific classes inherit attributes and
values from more general classes.
 So, The classes organized in a generalized hierarchy.

 Boxed nodes — objects and values of attributes of objects.


 Arrows — point from an object to its value.
 This structure is known as a slot and filler structure, semantic network or a collection of
frames.
The steps to retrieve a value for an attribute of an instance object:
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of instance; if none, fail.
4. Also, go to that node and find a value for the attribute and then report it.
5. Otherwise, search upward through the isa hierarchy until a value is found for the attribute.
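These retrieval steps can be sketched in a few lines of Python over a toy slot-and-filler knowledge base (the classes and values below are hypothetical):

```python
# A minimal slot-and-filler knowledge base: 'instance' and 'isa' links
# point from an object or class to a more general class.
kb = {
    "Person":          {"isa": None, "legs": 2},
    "Baseball-Player": {"isa": "Person", "bats": "right"},
    "Pitcher":         {"isa": "Baseball-Player"},
    "Chris":           {"instance": "Pitcher", "bats": "left"},
}

def get_value(node, attribute):
    """Walk instance/isa links upward until the attribute is found."""
    while node is not None:
        slots = kb[node]
        if attribute in slots:          # value stored at this node: report it
            return slots[attribute]
        node = slots.get("instance") or slots.get("isa")  # climb the hierarchy
    return None                         # attribute not found anywhere

print(get_value("Chris", "bats"))  # 'left' (the local value overrides the class)
print(get_value("Chris", "legs"))  # 2      (inherited from Person)
```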

Inferential Knowledge
 This knowledge generates new information from the given information.
 This new information does not require further data gathering from a source, but it does
require analysis of the given information to generate new knowledge.
 Example: given a set of relations and values, one may infer other values or relations. A
predicate logic (a mathematical deduction) used to infer from a set of attributes.
Moreover, Inference through predicate logic uses a set of logical operations to relate
individual data.
 Represent knowledge as formal logic: All dogs have tails: ∀x: dog(x) → hastail(x)
 Advantages:
 A set of strict rules.
 Can use to derive more facts.
 Also, Truths of new statements can be verified.
 Guaranteed correctness.
 So, many inference procedures are available to implement the standard rules of logic popular in
AI systems, e.g., automated theorem proving.

Procedural Knowledge
 A representation in which the control information needed to use the knowledge is embedded in the
knowledge itself. For example, computer programs, directions, and recipes; these indicate a
specific use or implementation.
 Moreover, Knowledge encoded in some procedures, small programs that know how to do
specific things, how to proceed.
 Advantages:
 Heuristic or domain-specific knowledge can represent.
 Moreover, extended logical inferences, such as default reasoning, are facilitated.
 Also, side effects of actions may be modeled. Some rules may become false in time;
keeping track of this in large systems may be tricky.
 Disadvantages:
 Completeness — not all cases may be representable.
 Consistency — not all deductions may be correct. E.g., if we know that Fred is a
bird we might deduce that Fred can fly. Later we might discover that Fred is an
emu.
 Modularity is sacrificed. Changes in the knowledge base might have far-reaching effects.
 Cumbersome control information.

PROBLEMS IN REPRESENTING KNOWLEDGE

All Knowledge representation schemes suffer from problems. We will discuss them here.

 Are any attributes of objects so basic that they occur in almost every problem
domain? If there are, we need to make sure that they are handled appropriately in
each of the mechanisms we propose. If such attributes exist, what are they?

 Are there any important relationships that exist among attributes of objects?

 At what level should knowledge be represented? Is there a good set of primitives

into which all knowledge can be broken down? Is it helpful to use such primitives?

 How should sets of objects be represented?

 Given a large amount of knowledge stored in a database, how can relevant parts
be accessed when they are needed?

Important Attributes

There are two attributes that are of very general significance, and we have already seen
their use: instance and isa. These attributes are important because they support property
inheritance. They are called a variety of things in AI systems, but the names do not
matter. What does matter is that they represent class membership and class inclusion,
and that class inclusion is transitive.

Relationship among Attributes

The attributes that we use to describe objects are themselves entities that we represent.
What properties do they have, independent of the specific knowledge they encode? There
are four such properties, listed below as issues that should be raised when using a knowledge
representation technique:

Inverses

The relationships between the attributes of an object include inverses, existence, techniques
for reasoning about values, and single-valued attributes. We can consider an example of an
inverse in

band(John Zorn, Naked City)

This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked
City.

The second approach is to use attributes that focus on a single entity but to use them in pairs,
one the inverse of the other. The band information is represented with two attributes:

band = Naked City

band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron,

Existence in isa hierarchy

Just as there are classes of objects and specialized subsets of those classes, there are attributes
and specializations of attributes. Consider, for example, the attribute height. It is actually a
specialization of the more general attribute physical-size, which is, in turn, a specialization of
physical-attribute. These generalization-specialization relationships are important for
attributes for the same reason that they are important for other concepts: they support
inheritance. In the case of attributes, they support inheriting information about such things
as constraints on the values that the attribute can have and mechanisms for computing those
values.

Techniques for Reasoning about values

Sometimes values of attributes are specified explicitly when a knowledge base is created.
Several kinds of information can play a role in this reasoning, including:

• Information about the type of the value. For example, the value of height must be a number
measured in a unit of length.
• Constraints on the value, often stated in terms of related entities. For example, the age of a
person cannot be greater than the age of either of that person's parents.
• Rules for computing the value when it is needed. These rules are called backward rules;
such rules have also been called if-needed rules.
• Rules that describe actions that should be taken if a value ever becomes known. These rules
are called forward rules, or sometimes if-added rules.

Single-valued attribute

A specific but very useful kind of attribute is one that is guaranteed to take a unique value.
For example, a baseball player can, at any one time, have only a single height and be a
member of only one team. If there is already a value present for one of these attributes and a
different value is asserted, then one of two things has happened: either a change has
occurred in the world, or there is now a contradiction in the knowledge base that needs to be
resolved. Knowledge-representation systems have taken several different approaches to
providing support for single-valued attributes, including:

• Introduce an explicit notation for temporal intervals. If two different values are ever asserted
for the same temporal interval, signal a contradiction automatically.
• Assume that the only temporal interval that is of interest is now. So, if a new value is
asserted, replace the old value.

Choosing the Granularity of Representation

At what level should the knowledge be represented, and what are the primitives? Primitives
are fundamental concepts such as holding, seeing, and playing, and as English is a very rich
language with over half a million words, it is clear we will find difficulty in deciding which
words to choose as our primitives in a series of situations.

If Tom feeds a dog then it could become:

feeds(tom, dog)

If Tom gives the dog a bone like:

gives(tom, dog, bone) Are these the same?

In any sense does giving an object food constitute feeding?

If give(x, food) → feed(x), then we are making progress.

There are several arguments against the use of low-level primitives. One is that simple high-
level facts may require a lot of storage when broken down into primitives. Much of that
storage is really wasted since the low-level rendition of a particular high level concept will
appear many times, once for each time the high-level concept is referenced. For example,
suppose that actions are being represented as combinations of a small set of primitive actions.
Then the fact that John punched Mary might be represented as shown in the first figure. The
representation says that there was physical contact between John's fist and Mary. The contact
was caused by John propelling his fist toward Mary, and to do that John first went to where
Mary was. But suppose we also know that Mary punched John. Then we must also store the
structure shown in the second figure below. If, however, punching were represented simply as
punching, then most of the detail of both structures could be omitted from the structures
themselves. It could instead be stored just once in a common representation of the concept of
punching.
Representing Set of Objects

It is important to be able to represent sets of objects for several reasons. One is that there are
some properties that are true of sets that are not true of the individual members of a set. As
examples, consider the assertions made in the sentences

"There are more sheep than people in Australia" and "English speakers can be found all over
the world."

The only way to represent the facts described in these sentences is to attach assertions to the
sets representing people, sheep, and English speakers, since, for example, no single English
speaker can be found all over the world. The other reason it is important to be able to
represent sets of objects is that if a property is true of all (or even most) elements of a set,
then it is more efficient to associate it once with the set rather than to associate it explicitly
with every element of the set.

Thus if we assert something like large(Elephant), it must be clear whether we are asserting
some property of the set itself (i.e., that the set of elephants is large) or some property that
holds for individual elements of the set (i.e., that anything that is an elephant is large). There
are three obvious ways in which sets may be represented. The simplest is just by a name.

Finding the Right Structures as Needed

This is the issue of locating appropriate knowledge structures that have been stored in
memory. For example, suppose we have a script (a description of a class of events in terms of
contexts, participants, and subevents) that describes the typical sequence of events in a .
restaurant. This script would enable us to take a text such as ·

John went to Steak and Ale last night. He ordered a large, rare steak, paid his bill, and left.
And answer yes to the question: Did john eat dinner last night?

Notice that nowhere in the story was John's eating anything mentioned explicitly. But the fact
that when one goes to a restaurant one eats will be contained in the restaurant script. If we
know in advance to use the restaurant script, then we can answer the question easily. But in
order to be able to reason about a variety of things, a system must have many scripts, for
everything from going to work to sailing around the world. How will it select the appropriate
one each time? For example, nowhere in our story was the word "restaurant" mentioned.
In fact, in order to have access to the right structure for describing a particular situation, it is
necessary to solve all of the following problems.

• How to perform an initial selection of the most appropriate structure.


• How to fill in appropriate details from the current situation.
• How to find a better structure 'if the one chosen initially turns out not to be appropriate.
• What to do if none of the available structures is appropriate.
• When to create and remember a new structure.

There is no good, general-purpose method for solving all these problems. Some knowledge-
representation techniques solve some of them. This leads to two questions: how to select an
initial structure to consider, and how to find a better structure (or revise it) if that one turns out
not to be a good match.
KNOWLEDGE REPRESENTATION USING PROPOSITIONAL AND PREDICATE
LOGIC

SYNTAX AND SEMANTICS FOR PROPOSITIONAL LOGIC

Valid statements or sentences in PL are determined according to the rules of propositional
syntax. This syntax governs the combination of basic building blocks such as propositions
and logical connectives. Propositions are elementary atomic sentences. (We shall also use the
terms formula or well-formed formula in place of sentence.) Propositions may be either true
or false but may take on no other value. Some examples of simple propositions are

It is raining.
My car is painted silver.
John and Sue have five children.
Snow is white.
People live on the moon.

Compound propositions are formed from atomic formulas using the logical connectives not,
and, or, if...then, and if and only if. For example, the following are compound formulas.

It is raining and the wind is blowing.


The moon is made of green cheese or it is not.
If you study hard you will be rewarded.
The sum of 10 and 20 is not 50.

We will use capital letters, sometimes followed by digits, to stand for propositions; T and F
are special symbols having the values true and false, respectively.

The following symbols will also be used for logical connectives.

- for not or negation

& for and or conjunction

V for or or disjunction

→ for if...then or implication

↔ for if and only if or double implication

In addition, left and right parentheses, left and right braces, and the period will be used as
delimiters for punctuation. So, for example, to represent the compound sentence "It is raining
and the wind is blowing" we could write (R & B) where R and B stand for the propositions "It
is raining" and "the wind is blowing," respectively. If we write (R V B) we mean "it is raining
or the wind is blowing or both" that is, V indicates inclusive disjunction.

Syntax

The syntax of PL is defined recursively as follows.


Semantics

The semantics or meaning of a sentence is just the value true or false; that is, it is an
assignment of a truth value to the sentence. The values true and false should not be confused
with the symbols T and F which can appear within a sentence. An interpretation for a
sentence or group of sentences is an assignment of a truth value to each propositional symbol.
As an example, consider the statement (P & -Q). One interpretation (I1) assigns true to P and
false to Q. A different interpretation (I2) assigns true to P and true to Q. Clearly, there are
four distinct interpretations for this sentence. Some semantic rules are summarized in the
table below.

We can now find the meaning of any statement given an interpretation I for the statement.
For example, let I assign true to P, false to Q, and false to R in the statement (P & -Q) → R.

Application of rule 2 then gives -Q as true, rule 3 gives (P & -Q) as true, and rule 6 gives (P &
-Q) → R, and hence the statement as a whole, the value false.
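The same evaluation can be mechanized. Here is a minimal sketch in Python, writing & as `and`, - as `not`, and a → b as `(not a) or b`:

```python
# Evaluate the statement (P & -Q) -> R under the interpretation I.
def statement(P, Q, R):
    return (not (P and not Q)) or R    # implication a -> b written as (not a) or b

I = {"P": True, "Q": False, "R": False}
print(statement(**I))                  # False, matching the rule-by-rule evaluation above
```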
Properties of statements

Satisfiable. A statement is satisfiable if there is some interpretation for which it is true.


Contradiction. A sentence is contradictory (unsatisfiable) if there is no interpretation for
which it is true.
Valid. A sentence is valid if it is true for every interpretation. Valid sentences are also called
tautologies.
Equivalence. Two sentences are equivalent if they have the same truth value under every
interpretation.
Logical consequences. A sentence is a logical consequence of another if it is satisfied by all
interpretations which satisfy the first.

LOGICAL CONSEQUENCES

More generally, a statement is a logical consequence of other statements if and only if for any
interpretation in which those statements are true, the resulting statement is also true. A valid
statement is satisfiable, and a contradictory statement is invalid, but the converse is not
necessarily true. As examples of the above definitions, consider the following statements.
P is satisfiable but not valid since an interpretation that assigns false to P assigns false to the
sentence P.

P V -P is valid since every interpretation results in a value of true for (P V -P).


P & -P is a contradiction since every interpretation results in a value of false for (P & -P).
P and -(-P) are equivalent since each has the same truth value under every interpretation.
P is a logical consequence of (P & Q) since for any interpretation for which (P & Q) is true, P is
also true.

The notion of logical consequence provides us with a means to perform valid inferencing in
PL. The following are two important theorems which give criteria for a statement to be a
logical consequence of a set of statements.
Theorem 4.1. The sentence s is a logical consequence of s1, ..., sn if and only if s1 & s2
& ... & sn → s is valid.
Theorem 4.2. The sentence s is a logical consequence of s1, ..., sn if and only if s1 & s2
& ... & sn & -s is inconsistent. Table 4.2 lists some of the important laws of PL.

One way to determine the equivalence of two sentences is by using a truth table. For example,
the conditional elimination and bi-conditional elimination in the above table can be verified
by the following truth table.
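The same verification can also be carried out mechanically by enumerating every
interpretation. A minimal Python sketch (illustrative only), using the truth-table
definition of implication:

from itertools import product

# Truth-table definition of implication (P -> Q)
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

for p, q in product([True, False], repeat=2):
    # Conditional elimination: P -> Q is equivalent to -P V Q
    assert IMPLIES[(p, q)] == ((not p) or q)
    # Bi-conditional elimination: P <-> Q is equivalent to (P -> Q) & (Q -> P)
    assert (p == q) == (IMPLIES[(p, q)] and IMPLIES[(q, p)])

print("Both equivalences hold under all four interpretations.")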

Inference Rules

The inference rules of PL provide the means to perform logical proofs or deductions. The
problem is, given a set of sentences S = {s1, ............. , sn} (the premises), prove the truth of s
(the conclusion); that is, show that S ⊢ s. The use of truth tables to do this is a form of
semantic proof. Other syntactic methods of inference or deduction are also possible. Such
methods do not depend on truth assignments but on syntactic relationships only; that is, it is
possible to derive new sentences which are logical consequences of s1… sn using only
syntactic operations. Few inference rules are given here.
SYNTAX AND SEMANTICS FOR PREDICATE LOGIC (FOPL-First Order
Predicate Logic)

Expressiveness is one of the requirements for any serious representation scheme. It should be
possible to accurately represent most, if not all, concepts which can be verbalized. PL falls
short of this requirement in some important respects. It is too "coarse" to easily describe
properties of objects, and it lacks the structure to express relations that exist among two or
more entities. Furthermore, PL does not permit us to make generalized statements about
classes of similar objects. These are serious limitations when reasoning about real world
entities. For example, given the following statements, it should be possible to conclude that
John must take the Pascal course.

All students in Computer Science must take Pascal.

John is a Computer Science major.

As stated, it is not possible to conclude in PL that John must take Pascal since the second
statement does not occur as part of the first one. To draw the desired conclusion with a valid
inference rule, it would be necessary to rewrite the sentences.
FOPL was developed by logicians to extend the expressiveness of PL. It is a generalization of
PL that permits reasoning about world objects as relational entities as well as classes or
subclasses of objects. This generalization comes from the introduction of predicates in place
of propositions, the use of functions and the use of variables together with variable
quantifiers.

These concepts are formalized below.


The syntax for FOPL, like PL, is determined by the allowable symbols and rules of
combination. The semantics of FOPL are determined by interpretations assigned to
predicates, rather than propositions. This means that an interpretation must also assign values
to other terms including constants, variables and functions, since predicates may have
arguments consisting of any of these terms. Therefore, the arguments of a predicate must be
assigned before an interpretation can be made.

Syntax of FOPL

The symbols and rules of combination permitted in FOPL are defined as follows.
Semantics for FOPL

When considering specific wffs, we always have in mind some domain D. If not stated
explicitly, D will be understood from the context. D is the set of all elements or objects from
which fixed assignments are made to constants and from which the domain and range of
functions are defined. The arguments of predicates must be terms (constants, variables, or
functions). Therefore, the domain of each n-place predicate is also defined over D.
For example, our domain might be all entities that make up the Computer Science
Department at the University of Texas. In this case, constants would be professors (Bell,
Cooke, Gelfond, and so on), staff (Martha, Pat, Linda, and so on), books, labs, offices, and so
forth. The functions we may choose might be
PROPOSITIONAL LOGIC: RESOLUTION

Resolution in Propositional Logic:


Resolution is a rule of inference leading to a refutation-based theorem-proving technique
for statements in propositional logic and first-order logic. In other words, iteratively applying
the resolution rule in a suitable way allows us to tell whether a propositional formula (a well-
formed formula, WFF) is satisfiable.

The following steps should be carried out in sequence to employ resolution for theorem
proving in propositional logic:

Resolution Algorithm:
Given:
A set of clauses, called axioms and a goal.

Aim:
To test whether the goal is derivable from the axioms.
Begin:
1. Construct a set S of axioms plus the negated goal.
2. Represent each element of S in conjunctive normal form (CNF) by the following
steps:
(a) Replace each 'if-then' operator by NEGATION and OR operations, using the
equivalence p → q ≡ ¬ p V q (theorem 10).

(b) Bring each modified clause into the following form and then drop the AND operators
connected between each square bracket. The clauses thus obtained are in conjunctive normal
form (CNF). It may be noted that pij may be in negated or non-negated form.
3. Repeat:
(a) Select any two clauses from S, such that one clause contains a negated literal and the other
clause contains its corresponding positive (non-negated) literal.

(b) Resolve these two clauses and call the resulting clause the resolvent. Remove the parent
clauses from S.
Until a null clause is obtained or no further progress can be made.

4. If a null clause is obtained, then report: “goal is proved”
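This refutation procedure can be realized directly for propositional clauses. Below is a
minimal Python sketch, assuming the clauses are already in CNF and are represented as
frozensets of literal strings, with a leading '-' marking negation:

def negate(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def resolve(c1, c2):
    # Return every resolvent of two clauses (one per complementary pair).
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

def resolution_prove(axioms, goal):
    # Step 1: the set S = axioms plus the negated goal.
    clauses = set(axioms) | {frozenset({negate(goal)})}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolve(a, b):
                        if not r:        # null clause: goal is proved
                            return True
                        new.add(frozenset(r))
        if new <= clauses:               # no further progress can be made
            return False
        clauses |= new

# The clause set of Example 2 below: p V q, -q V r, -p V s, -s; goal r.
axioms = [frozenset({'p', 'q'}), frozenset({'-q', 'r'}),
          frozenset({'-p', 's'}), frozenset({'-s'})]
print(resolution_prove(axioms, 'r'))     # True: the goal is derivable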

Example 1:
We are given the axioms given in the first column of table 6.3, and we want to prove R. First
we convert the axioms to clause form, as shown in the second column of the table. Then we
negate R, producing ¬ R which is already in clause form it is added into the given clauses
(data base).
Then we look for pairs of clauses to resolve together. Although many pairs of clauses can be
resolved, only those pairs which contain complementary literals will produce a resolvent
which is likely to lead to the goal, shown by the empty clause (drawn as a box). We begin by
resolving R with the clause ¬ R since that is one of the clauses which must be involved in the
contradiction we are trying to find. The sequence of contradiction resolvents of the example
in table 6.3., is shown in Fig. 6.4.

The procedure adopted in the above example can be explained as follows:


Resolution process starts with a set of clauses all assumed to be true. The clause 2 becomes
true when either ¬ P or ¬ Q or R is true. Since ¬ R is assumed to be true, clause 2 remains
true for either ¬ P or ¬ Q. This is the first resolvent clause. Now the proposition 1 says that P
is true, meaning thereby that ¬ P cannot be true. This leaves only one possibility, ¬ Q, for
clause 2 to be true. This is shown by the second resolvent. Proposition 4 can be true if either ¬ T
or Q is true. But ¬ Q must be true, so the only way for clause 4 to be true is for ¬ T to be true,
shown as the third resolvent. But clause 5 says that T is true; this contradiction yields the null clause.

Example 2:
Consider the following knowledge base:
1. Either the-humidity-is-high or the-sky-is-cloudy.
2. If the-sky-is-cloudy then it-will-rain.
3. If the-humidity-is-high then it-is-hot.
4. It-is-not-hot.
and the goal: It-will-rain prove by resolution theorem that the goal is derivable from the
knowledge base.
Proof:
Let us first denote the above clauses by the following symbols.

p = the-humidity-is high, q = the-sky-is-cloudy, r = it-will-rain, s = it-is-hot.

The CNF form of the above clause thus become-

1. p ∨ q
2. ¬ q ∨ r (after applying theorem 10)
3. ¬ p ∨ s (after applying theorem 10)
4. ¬ s
5. ¬ r
and the negated goal = ¬ r. The set of statements S thus includes all these 5 clauses in
normal form. When all the clauses are connected through the connective ∧, they form the
CNF of the set S.

Now by resolution algorithm, we construct the graph of Fig. 6.5. Since it terminates with a
null clause the goal is proved.
KNOWLEDGE REPRESENTATION USING PREDICATE LOGIC

Representation of Simple Facts in Logic

Propositional logic is useful because it is simple to deal with and a decision procedure for it
exists.
Also, in order to draw conclusions, facts are represented in a more convenient way as,
1. Marcus is a man.
- man(Marcus)
2. Plato is a man.
- man(Plato)
3. All men are mortal.
- mortal(men)
But propositional logic fails to capture the relationship between an individual being a man
and that individual being a mortal.
- How can these sentences be represented so that we can infer the third sentence from the
first two?
- Also, propositional logic commits only to the existence of facts that may or may not be
the case in the world being represented.
- Moreover, it has a simple syntax and simple semantics. It suffices to illustrate the process
of inference.
- Propositional logic quickly becomes impractical, even for very small worlds.

Predicate logic
First-order Predicate logic (FOPL) models the world in terms of
- Objects, which are things with individual identities
- Properties of objects that distinguish them from other objects
- Relations that hold among sets of objects
- Functions, which are a subset of relations where there is only one “value” for any given
“input”
First-order Predicate logic (FOPL) provides
- Constants: a, b, dog33. Name a specific object.
- Variables: X, Y. Refer to an object without naming it.
- Functions: Mapping from objects to objects.
- Terms: Refer to objects.
- Atomic Sentences: in(dad-of(X), food6). Can be true or false; correspond to propositional
symbols P, Q.
A well-formed formula (wff) is a sentence containing no “free” variables; that is, all
variables are “bound” by universal or existential quantifiers.
(∀x)P(x, y) has x bound as a universally quantified variable, but y is free.

Quantifiers
Universal quantification
- (∀x)P(x) means that P holds for all values of x in the domain associated with that variable
- E.g., (∀x) dolphin(x) → mammal(x)
Existential quantification
- (∃x)P(x) means that P holds for some value of x in the domain associated with that
variable
- E.g., (∃x) mammal(x) ∧ lays-eggs(x)
Also, consider the following example that shows the use of predicate logic as a way of
representing knowledge.
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. Also, All Pompeians were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
The facts described by these sentences can be represented as a set of well-formed formulas
(wffs)
as follows:
1. Marcus was a man.
- man(Marcus)
2. Marcus was a Pompeian.
- Pompeian(Marcus)
3. All Pompeians were Romans.
- ∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
- ruler(Caesar)
5. All Pompeians were either loyal to Caesar or hated him.
- inclusive-or: ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
- exclusive-or: ∀x: Roman(x) → (loyalto(x, Caesar) ∧ ¬ hate(x, Caesar)) ∨
(¬ loyalto(x, Caesar) ∧ hate(x, Caesar))
6. Everyone is loyal to someone.
- ∀x: ∃y: loyalto(x, y)
7. People only try to assassinate rulers they are not loyal to.
- ∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬ loyalto(x, y)
8. Marcus tried to assassinate Caesar.
- tryassassinate(Marcus, Caesar)

Now suppose we want to use these statements to answer the question: Was Marcus loyal to
Caesar?
Now let’s try to produce a formal proof, reasoning backward from the desired goal:
¬ loyalto(Marcus, Caesar)
In order to prove the goal, we need to use the rules of inference to transform it into another
goal (or possibly a set of goals) that can, in turn, be transformed, and so on, until there are no
unsatisfied goals remaining.

Figure: An attempt to prove ¬loyalto(Marcus, Caesar).


- The problem is that, although we know that Marcus was a man, we do not have any way
to conclude from that that Marcus was a person. We need to add the representation of
another fact to our system, namely: ∀x: man(x) → person(x)
- Now we can satisfy the last goal and produce a proof that Marcus was not loyal to
Caesar.
- From this simple example, we see that three important issues must be addressed in the
process of converting English sentences into logical statements and then using those
statements to deduce new ones:

1. Many English sentences are ambiguous (for example, 5, 6, and 7 above). Choosing the
correct interpretation may be difficult.
2. There is often a choice of how to represent the knowledge. Simple representations
are desirable, but they may exclude certain kinds of reasoning.
3. Even in very simple situations, a set of sentences is unlikely to contain all the
information necessary to reason about the topic at hand. In order to use a set of
statements effectively, it is usually necessary to have access to another set of
statements that represent facts that people consider too obvious to mention.

Representing Instance and ISA Relationships

* Specific attributes instance and isa play an important role particularly in a useful form of
reasoning called property inheritance.
* The predicates instance and isa explicitly captured the relationships they used to express,
namely class membership and class inclusion.
* Figure shows the first five sentences of the last section represented in logic in three
different ways.
* The first part of the figure contains the representations we have already discussed. In
these representations, class membership represented with unary predicates (such as
Roman), each of which corresponds to a class.
* Asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P.
* The second part of the figure contains representations that use the instance predicate
explicitly.

The following figure shows three ways of representing class membership: isa relationships

- The predicate instance is a binary one, whose first argument is an object and whose second
argument is a class to which the object belongs.
- But these representations do not use an explicit isa predicate.
- Instead, subclass relationships, such as that between Pompeians and Romans, are described
as shown in sentence 3.
- The implication rule states that if an object is an instance of the subclass Pompeian then it
is an instance of the superclass Roman.
- Note that this rule is equivalent to the standard set-theoretic definition of the subclass-
superclass relationship.
- The third part contains representations that use both the instance and isa predicates
explicitly.
- The use of the isa predicate simplifies the representation of sentence 3, but it requires that
one additional axiom (shown here as number 6) be provided.

Computable Functions and Predicates

- To express simple facts, such as the following greater-than and less-than relationships:
gt(1,0) lt(0,1) gt(2,1) lt(1,2) gt(3,2) lt(2,3)
- It is often also useful to have computable functions as well as computable predicates.
Thus we might want to be able to evaluate the truth of gt(2 + 3, 1).
- To do so requires that we first compute the value of the plus function given the arguments
2 and 3, and then send the arguments 5 and 1 to gt.
Consider the following set of facts, again involving Marcus:
1) Marcus was a man.
man(Marcus)
2) Marcus was a Pompeian.
Pompeian(Marcus)
3) Marcus was born in 40 A.D.
born(Marcus, 40)
4) All men are mortal.
∀x: man(x) → mortal(x)
5) All Pompeians died when the volcano erupted in 79 A.D.
erupted(volcano, 79) ∧ ∀x: [Pompeian(x) → died(x, 79)]
6) No mortal lives longer than 150 years.
∀x: ∀t1: ∀t2: mortal(x) ∧ born(x, t1) ∧ gt(t2 − t1, 150) → dead(x, t2)
7) It is now 1991.
now = 1991
The above example shows how these ideas of computable functions and predicates can be
useful. It also makes use of the notion of equality and allows equal objects to be substituted
for each other whenever it appears helpful to do so during a proof.
- So, now suppose we want to answer the question "Is Marcus alive?"
- From the statements suggested here, there may be two ways of deducing an answer.
- Either we can show that Marcus is dead because he was killed by the volcano, or we can
show that he must be dead because he would otherwise be more than 150 years old, which we
know is not possible.
- As soon as we attempt to follow either of those paths rigorously, however, we discover,
just as we did in the last example, that we need some additional knowledge. For example, our
statements talk about dying, but they say nothing that relates to being alive, which is what the
question is asking. So we add the following facts:
8) Alive means not dead.
∀x: ∀t: [alive(x, t) → ¬ dead(x, t)] ∧ [¬ dead(x, t) → alive(x, t)]
9) If someone dies, then he is dead at all later times.
∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)
So, now let’s attempt to answer the question "Is Marcus alive?" by proving: ¬ alive(Marcus,
now)

RESOLUTION

Resolution is a theorem proving technique that proceeds by building refutation proofs,


i.e., proofs by contradictions.

Resolution is used, if there are various statements are given, and we need to prove a
conclusion of those statements. Unification is a key concept in proofs by resolutions.
Resolution is a single inference rule which can efficiently operate on the conjunctive normal
form or clausal form.

Clause: Disjunction of literals (an atomic sentence) is called a clause. It is also known as a
unit clause.

Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to


be conjunctive normal form or CNF.

Steps for Resolution:

1. Conversion of facts into first-order logic.


2. Convert FOL statements into CNF
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).

To better understand all the above steps, we will take an example in which we will
apply resolution.

Example:

a. John likes all kinds of food.

b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.

Prove by resolution that:

f. John likes peanuts.

Step-1: Conversion of Facts into FOL

In the first step we will convert all the given statements into first-order logic:

a. ∀x: food(x) → likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y: [eats(x, y) Λ ¬ killed(x)] → food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x: eats(Anil, x) → eats(Harry, x)
f. ∀x: ¬ killed(x) → alive(x)
g. ∀x: alive(x) → ¬ killed(x)
h. likes(John, Peanuts)

Step-2: Conversion of FOL into CNF

Eliminate all implications (→) and rewrite:

1. ∀x ¬ food(x) V likes(John, x)
2. food(Apple) Λ food(vegetables)
3. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
4. eats(Anil, Peanuts) Λ alive(Anil)
5. ∀x ¬ eats(Anil, x) V eats(Harry, x)
6. ∀x ¬ [¬ killed(x)] V alive(x)
7. ∀x ¬ alive(x) V ¬ killed(x)
8. likes(John, Peanuts)

Move negation (¬) inwards and rewrite:

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts)

Rename variables (standardize variables):

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀w ¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts)

Eliminate existential quantifiers. In this step, we eliminate existential quantifiers (∃); this
process is known as Skolemization. In this example problem, since there is no existential
quantifier, all the statements remain the same in this step.

Drop universal quantifiers. In this step we drop all universal quantifiers, since all the
statements are implicitly universally quantified, so we don't need them:

1. ¬ food(x) V likes(John, x)
2. food(Apple)
3. food(vegetables)
4. ¬ eats(y, z) V killed(y) V food(z)
5. eats(Anil, Peanuts)
6. alive(Anil)
7. ¬ eats(Anil, w) V eats(Harry, w)
8. killed(g) V alive(g)
9. ¬ alive(k) V ¬ killed(k)
10. likes(John, Peanuts)

Distribute conjunction (Λ) over disjunction (V). This step will not make any change in this
problem.

Step-3: Negate the statement to be proved

In this step, we apply negation to the conclusion statement, which will be written as
¬ likes(John, Peanuts)

Step-4: Draw Resolution graph:

Now in this step, we will solve the problem by a resolution tree using substitution.
For the above problem, it will be given as follows:

FORWARD AND BACKWARD CHAINING

In artificial intelligence, forward and backward chaining is one of the important


topics, but before understanding forward and backward chaining lets first understand that
from where these two terms came.
Inference engine:

The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known facts.
The first inference engine was part of the expert system. Inference engine commonly
proceeds in two modes, which are:
a. Forward chaining

b. Backward chaining

Horn Clause and Definite clause:

Horn clause and definite clause are the forms of sentences, which enables knowledge base to
use a more restricted and efficient inference algorithm. Logical inference algorithms use
forward and backward chaining approaches, which require KB in the form of the first-order
definite clause.

Definite clause: A clause which is a disjunction of literals with exactly one positive
literal is known as a definite clause or strict horn clause.

Horn clause: A clause which is a disjunction of literals with at most one positive literal is
known as horn clause. Hence all the definite clauses are horn clauses.

Example: (¬ p V ¬ q V k). It has only one positive literal, k.

It is equivalent to p ∧ q → k.

Forward Chaining

Forward chaining is also known as the forward deduction or forward reasoning method when
using an inference engine. Forward chaining is a form of reasoning which starts with atomic
sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward
direction to extract more data until a goal is reached.

The forward-chaining algorithm starts from known facts, triggers all rules whose premises
are satisfied, and adds their conclusions to the known facts. This process repeats until the
problem is solved.

Properties of Forward-Chaining:

o It is a bottom-up approach, as it moves from the facts at the bottom to the goal at the top.


o It is a process of making a conclusion based on known facts or data, by starting from
the initial state and reaches the goal state.
o Forward-chaining approach is also called as data-driven as we reach to the goal using
available data.
o Forward -chaining approach is commonly used in the expert system, such as CLIPS,
business, and production rule systems.

Consider the following famous example which we will use in both approaches:

Example:

"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A,
an enemy of America, has some missiles, and all the missiles were sold to it by Robert, who
is an American citizen."
Prove that "Robert is criminal."

To solve the above problem, first, we will convert all the above facts into first-order definite
clauses, and then we will use a forward-chaining algorithm to reach the goal.

Facts Conversion into FOL:


o It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r
are variables)
American(p) ∧ weapon(q) ∧ sells(p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
o Country A has some missiles. ∃p Owns(A, p) ∧ Missile(p). It can be written in two
definite clauses by using Existential Instantiation, introducing the new constant T1.
Owns(A, T1)...................(2)
Missile(T1) ................... (3)
o All of the missiles were sold to country A by Robert.
∀p Missiles(p) ∧ Owns(A, p) → Sells(Robert, p, A) ............. (4)
o Missiles are weapons.
Missile(p) → Weapons (p) ................... (5)
o Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) .................... (6)
o Country A is an enemy of America.
Enemy (A, America) ..................... (7)
o Robert is American
American(Robert). ...................... (8)

Forward chaining proof:

Step-1:

In the first step we will start with the known facts and will choose the sentences which do not
have implications, such as: American(Robert), Enemy(A, America), Owns(A, T1), and
Missile(T1). All these facts will be represented as below.

Step-2:

At the second step, we will see those facts which can be inferred from the available facts
with satisfied premises.

Rule-(1) does not have its premises satisfied, so it will not be added in the first iteration.

Rule-(2) and (3) are already added.

Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is
inferred from the conjunction of Rules (2) and (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred
from Rule-(7).

Step-3:

At step-3, as we can check, Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A},
so we can add Criminal(Robert), which is inferred from all the available facts. And hence we
reached our goal statement.

Hence it is proved that Robert is Criminal using forward chaining approach.
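The same derivation can be reproduced in a few lines of code. Below is a minimal
forward-chaining sketch with the example propositionalized by hand: the substitutions
{p/Robert}, {p/T1} and {p/A} are pre-applied, so plain string facts suffice.

# Each rule is (premises, conclusion); facts are plain strings.
rules = [
    (["American_Robert", "Weapon_T1", "Sells_Robert_T1_A", "Hostile_A"],
     "Criminal_Robert"),                                   # Rule (1)
    (["Missile_T1", "Owns_A_T1"], "Sells_Robert_T1_A"),    # Rule (4)
    (["Missile_T1"], "Weapon_T1"),                         # Rule (5)
    (["Enemy_A_America"], "Hostile_A"),                    # Rule (6)
]
facts = {"American_Robert", "Owns_A_T1", "Missile_T1", "Enemy_A_America"}

changed = True
while changed:            # repeat until no rule adds a new fact
    changed = False
    for premises, conclusion in rules:
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            changed = True

print("Criminal_Robert" in facts)   # True: the goal is reached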

Backward Chaining:

Backward-chaining is also known as a backward deduction or backward reasoning method


when using an inference engine. A backward chaining algorithm is a form of reasoning,
which starts with the goal and works backward, chaining through rules to find known facts
that support the goal.

Properties of backward chaining:

o It is known as a top-down approach.


o Backward-chaining is based on modus ponens inference rule.
o In backward chaining, the goal is broken into sub-goal or sub-goals to prove the facts
true.
o It is called a goal-driven approach, as a list of goals decides which rules are selected
and used.
o Backward -chaining algorithm is used in game theory, automated theorem proving
tools, inference engines, proof assistants, and various AI applications.
o The backward-chaining method mostly used a depth-first search strategy for proof.

Example:

In backward-chaining, we will use the same above example, and will rewrite all the rules.

o American(p) ∧ weapon(q) ∧ sells(p, q, r) ∧ hostile(r) → Criminal(p) ...(1)

o Owns(A, T1) ......................... (2)
o Missile(T1) ......................... (3)
o ∀p Missiles(p) ∧ Owns(A, p) → Sells(Robert, p, A) ................. (4)
o Missile(p) → Weapons(p) ....................... (5)
o Enemy(p, America) → Hostile(p) ........................ (6)
o Enemy(A, America) ......................... (7)
o American(Robert) .......................... (8)

Backward-Chaining proof:

In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and
then infer further rules.

Step-1:

At the first step, we will take the goal fact. And from the goal fact, we will infer other facts,
and at last, we will prove those facts true. So our goal fact is "Robert is Criminal," so
following is the predicate of it.

Step-2:

At the second step, we will infer other facts from the goal fact which satisfy the rules. So as we
can see in Rule-1, the goal predicate Criminal(Robert) is present with substitution
{Robert/p}. So we will add all the conjunctive facts below the first level and will replace p
with Robert.

Here we can see American (Robert) is a fact, so it is proved here.


Step-3:

At step-3, we will extract the further fact Missile(q), which is inferred from Weapon(q), as it
satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 at q.

Step-4:

At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r),
which satisfies Rule-(4), with the substitution of A in place of r. So these two statements are
proved here.
Step-5:

At step-5, we can infer the fact Enemy(A, America) from Hostile(A) which satisfies Rule-
6. And hence all the statements are proved true using backward chaining.
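The same propositionalized rule set can also be driven goal-first. A minimal backward-
chaining sketch (the recursion terminates because this rule set is acyclic):

rules = [
    (["American_Robert", "Weapon_T1", "Sells_Robert_T1_A", "Hostile_A"],
     "Criminal_Robert"),
    (["Missile_T1", "Owns_A_T1"], "Sells_Robert_T1_A"),
    (["Missile_T1"], "Weapon_T1"),
    (["Enemy_A_America"], "Hostile_A"),
]
facts = {"American_Robert", "Owns_A_T1", "Missile_T1", "Enemy_A_America"}

def backward_chain(goal):
    # A goal is proved if it is a known fact, or if some rule concludes it
    # and every one of that rule's premises (sub-goals) can be proved.
    if goal in facts:
        return True
    return any(conclusion == goal and all(backward_chain(p) for p in premises)
               for premises, conclusion in rules)

print(backward_chain("Criminal_Robert"))   # True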
SEMANTIC TABLEAU

Since the 1980s another technique for determining the validity of arguments in either PC
(the propositional calculus) or LPC (the lower predicate calculus) has gained some popularity,
owing both to its ease of learning and to its straightforward implementation by computer programs.
Originally suggested by the Dutch logician Evert W. Beth, it was more fully developed and
publicized by the American mathematician and logician Raymond M. Smullyan. Resting on
the observation that it is impossible for the premises of a valid argument to be true while
the conclusion is false, this method attempts to interpret (or evaluate) the premises in such a
way that they are all simultaneously satisfied and the negation of the conclusion is also
satisfied. Success in such an effort would show the argument to be invalid, while failure to
find such an interpretation would show it to be valid.

The construction of a semantic tableau proceeds as follows: express the premises


and negation of the conclusion of an argument in PC using only negation (∼)
and disjunction (∨) as propositional connectives. Eliminate every occurrence of two negation
signs in a sequence (e.g., ∼∼∼∼∼a becomes ∼a). Now construct a tree diagram branching
downward such that each disjunction is replaced by two branches, one for the left disjunct
and one for the right. The original disjunction is true if either branch is true. Reference to De
Morgan’s laws shows that a negation of a disjunction is true just in case the negations of both
disjuncts are true [i.e., ∼(p ∨ q) ≡ (∼p · ∼q)]. This semantic observation leads to the rule that
the negation of a disjunction becomes one branch containing the negation of each disjunct:

Consider the following argument:

Write:

Now strike out the disjunction and form two branches:

Only if all the sentences in at least one branch are true is it possible for the original premises
to be true and the conclusion false (equivalently for the negation of the conclusion). By
tracing the line upward in each branch to the top of the tree, one observes that no valuation
of a in the left branch will result in all the sentences in that branch receiving the value true
(because of the presence of a and ∼a). Similarly, in the right branch the presence of b and
∼b makes it impossible for a valuation to result in all the sentences of the branch receiving
the value true. These are all the possible branches; thus, it is impossible to find a situation in
which the premises are true and the conclusion false. The original argument is therefore valid

This technique can be extended to deal with other connectives:

Furthermore, in LPC, rules for instantiating quantified wffs need to be introduced. Clearly,
any branch containing both (∀x)ϕx and ∼ϕy is one in which not all the sentences in that
branch can be simultaneously satisfied. Again, if all the branches fail to be simultaneously
satisfiable, the original argument is valid.
UNIFICATION ALGORITHM
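Unification computes a substitution that makes two expressions identical; it is the step
that produces bindings such as {p/Robert, q/T1, r/A} in the resolution and chaining
examples above. The following is a minimal Python sketch, assuming variables are strings
marked with a leading '?' and compound terms are tuples; the occurs check is omitted
for brevity.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def unify(x, y, s):
    # Unify x and y under substitution s (a dict); return the extended
    # substitution, or None on failure.
    if s is None:
        return None
    if x == y:
        return s
    if is_var(x):
        return unify_var(x, y, s)
    if is_var(y):
        return unify_var(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            s = unify(xi, yi, s)   # a None result propagates to the end
        return s
    return None

def unify_var(v, t, s):
    if v in s:
        return unify(s[v], t, s)
    if is_var(t) and t in s:
        return unify(v, s[t], s)
    out = dict(s)                  # bind v to t (occurs check omitted)
    out[v] = t
    return out

print(unify(('loyalto', '?x', 'Caesar'),
            ('loyalto', 'Marcus', '?y'), {}))
# -> {'?x': 'Marcus', '?y': 'Caesar'}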
CONVERSION TO DIFFERENT FORMS

DEDUCTION
PROPOSITIONAL THEOREM PROVING

Theorem proving is a very popular artificial intelligence application in mathematics. It was a
very early technique that proved to be effective in checking mathematical theorems.

Standard Logical Equivalence

INFERENCING

It applies logical rules to the knowledge base to infer new information from known facts.
The inference engine proceeds in two modes:
1. Forward Chaining 2. Backward Chaining
The location of inference engine is shown in the figure here.
In artificial intelligence, we need intelligent computers which can create new logic
from old logic or by evidence, so generating the conclusions from evidence and
facts is termed as Inference.

Inference rules:

Inference rules are the templates for generating valid arguments. Inference rules are applied
to derive proofs in artificial intelligence, and the proof is a sequence of the conclusion that
leads to the desired goal.

In inference rules, the implication among all the connectives plays an important role.
Following are some terminologies related to inference rules:

o Implication: It is one of the logical connectives which can be represented as P → Q.


It is a Boolean expression.
o Converse: The converse of implication, which means the right-hand side proposition
goes to the left-hand side and vice-versa. It can be written as Q → P.
o Contrapositive: The negation of converse is termed as contrapositive, and it can be
represented as ¬ Q → ¬ P.
o Inverse: The negation of implication is called inverse. It can be represented as ¬ P →
¬ Q.

From the above term some of the compound statements are equivalent to each other, which
we can prove using truth table:

Hence from the above truth table, we can prove that P → Q is equivalent to ¬ Q → ¬ P, and
Q→ P is equivalent to ¬ P → ¬ Q.
Types of Inference rules:
1. Modus Ponens:

The Modus Ponens rule is one of the most important rules of inference, and it states that if P
and P → Q is true, then we can infer that Q will be true. It can be represented as:

Example:

Statement-1: "If I am sleepy then I go to bed" ==> P→ Q Statement-


2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q.
Hence, we can say that, if P→ Q is true and P is true then Q will be true.

Proof by Truth table:

2. Modus Tollens:

The Modus Tollens rule states that if P → Q is true and ¬ Q is true, then ¬ P will also be true.
It can be represented as:

Statement-1: "If I am sleepy then I go to bed" ==> P → Q
Statement-2: "I do not go to the bed." ==> ~Q
Statement-3: Which infers that "I am not sleepy" ==> ~P

Proof by Truth table:

3. Hypothetical Syllogism:

The Hypothetical Syllogism rule states that if P→Q is true and Q→R is true, then P→R will
be true. It can be represented as the following notation:
Example:

Statement-1: If you have my home key then you can unlock my home. ==> P→Q
Statement-2: If you can unlock my home then you can take my money. ==> Q→R
Conclusion: If you have my home key then you can take my money. ==> P→R

Proof by truth table:

4. Disjunctive Syllogism:

The Disjunctive Syllogism rule states that if P∨Q is true, and ¬P is true, then Q will be true.
It can be represented as:

Example:

Statement-1: Today is Sunday or Monday. ==>P∨Q


Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q

Proof by truth-table:

5. Addition:

The Addition rule is one of the common inference rules, and it states that if P is true, then
P∨Q will be true.
Example:

Statement-1: I have a vanilla ice-cream. ==> P
Statement-2: I have chocolate ice-cream. ==> Q
Conclusion: I have vanilla or chocolate ice-cream. ==> (P∨Q)

Proof by Truth-Table:

6. Simplification:

The Simplification rule states that if P ∧ Q is true, then Q or P will also be true. It can be
represented as:

Proof by Truth-Table:

7. Resolution:

The Resolution rule states that if P∨Q and ¬P∨R are true, then Q∨R will also be true. It can
be represented as

Proof by Truth-Table:
MONOTONIC AND NON-MONOTONIC REASONING

Monotonic Reasoning:

In monotonic reasoning, once a conclusion is drawn, it will remain the same
even if we add some other information to the existing information in our knowledge base. In
monotonic reasoning, adding knowledge does not decrease the set of propositions that can be
derived.

To solve monotonic problems, we can derive the valid conclusion from the available
facts only, and it will not be affected by new facts.

Monotonic reasoning is not useful for the real-time systems, as in real time, facts get
changed, so we cannot use monotonic reasoning.

Monotonic reasoning is used in conventional reasoning systems, and a logic-based


system is monotonic.

Any theorem proving is an example of monotonic reasoning.

Example:

o Earth revolves around the Sun.

It is a true fact, and it cannot be changed even if we add another sentence to the
knowledge base like "The moon revolves around the earth" or "Earth is not round," etc.

Advantages of Monotonic Reasoning:

o In monotonic reasoning, each old proof will always remain valid.


o If we deduce some facts from available facts, then it will remain valid for always.

Disadvantages of Monotonic Reasoning:

o We cannot represent the real world scenarios using Monotonic reasoning.


o Hypothesis knowledge cannot be expressed with monotonic reasoning, which means
facts should be true.
o Since we can only derive conclusions from the old proofs, so new knowledge from
the real world cannot be added.
Non-monotonic Reasoning

In Non-monotonic reasoning, some conclusions may be invalidated if we add some


more information to our knowledge base.

Logic will be said as non-monotonic if some conclusions can be invalidated by adding


more knowledge into our knowledge base.

Non-monotonic reasoning deals with incomplete and uncertain models.

"Human perceptions for various things in daily life, "is a general example of non-
monotonic reasoning.

Example: Let's suppose the knowledge base contains the following knowledge:

o Birds can fly


o Penguins cannot fly
o Pitty is a bird

So from the above sentences, we can conclude that Pitty can fly.

However, if we add another sentence into the knowledge base, "Pitty is a penguin",
then we conclude "Pitty cannot fly", which invalidates the above conclusion.

Advantages of Non-monotonic reasoning:

o For real-world systems such as Robot navigation, we can use non-monotonic


reasoning.
o In Non-monotonic reasoning, we can choose probabilistic facts or can make
assumptions.

Disadvantages of Non-monotonic Reasoning:

o In non-monotonic reasoning, the old facts may be invalidated by adding new


sentences.
o It cannot be used for theorem proving.
UNIT-3

Bayes' Theorem Formula:

P(H|E) = P(E|H) * P(H) / P(E)

Where:

- P(H|E) is the posterior probability of the hypothesis (H) given the evidence (E)

- P(E|H) is the likelihood of the evidence given the hypothesis

- P(H) is the prior probability of the hypothesis

- P(E) is the probability of the evidence

How Bayes' Theorem Works:

1. Prior Probability: Start with an initial estimate of the probability of the


hypothesis (P(H)).

2. New Evidence: Collect new data or evidence (E) related to the hypothesis.

3. Likelihood: Calculate the likelihood of the evidence given the hypothesis


(P(E|H)).

4. Posterior Probability: Update the probability of the hypothesis using Bayes'


Theorem (P(H|E)).

Example:
Suppose we want to determine the probability that a person has a disease (H)
based on a positive test result (E).

- Prior Probability (P(H)): 0.01 (1% of the population has the disease)

- Likelihood (P(E|H)): 0.9 (90% of people with the disease test positive)

- Probability of Evidence (P(E)): 0.02 (2% of the population tests positive)

Using Bayes' Theorem, we can update the probability of the hypothesis:

P(H|E) = P(E|H) * P(H) / P(E)

= 0.9 * 0.01 / 0.02

= 0.45

The posterior probability of the hypothesis (P(H|E)) is now 45%, indicating a


significant increase in the likelihood of the disease given the positive test result.
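The same calculation, written directly in Python with the numbers from the example:

p_h = 0.01          # prior P(H): 1% of the population has the disease
p_e_given_h = 0.9   # likelihood P(E|H): 90% of the diseased test positive
p_e = 0.02          # evidence P(E): 2% of the population tests positive

p_h_given_e = p_e_given_h * p_h / p_e   # Bayes' theorem
print(round(p_h_given_e, 2))            # 0.45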

Applications of Bayes' Theorem:

1. Medical diagnosis

2. Spam filtering

3. Image recognition

4. Natural Language Processing (NLP)

5. Finance and risk analysis

Concept Learning:

Concept Learning is a type of machine learning that involves learning concepts


or categories from data. It's a fundamental aspect of artificial intelligence,
enabling machines to understand and generalize concepts from specific
instances.

1. Concepts: Abstract ideas or categories that represent a set of objects, events,


or attributes.

2. Instances: Specific examples or data points that illustrate a concept.

3. Learning: The process of identifying patterns, relationships, or rules that


define a concept.

4. Generalization: The ability to apply learned concepts to new, unseen


instances.

Types of Concept Learning:

1. Supervised Concept Learning: The machine is provided with labeled


instances, and it learns to map inputs to outputs.

2. Unsupervised Concept Learning: The machine discovers patterns and


relationships in unlabeled data.

3. Semi-Supervised Concept Learning: A combination of labeled and unlabeled


data is used to learn concepts.

Techniques for Concept Learning:

1. Decision Trees: A hierarchical representation of concepts, using decision


nodes and leaf nodes.

2. Rule-Based Systems: A set of rules is learned to define a concept.

3. Neural Networks: A network of interconnected nodes (neurons) learns to


represent concepts.

4. Support Vector Machines (SVMs): A machine learning algorithm that finds


the best hyperplane to separate concepts.
Applications of Concept Learning:

1. Image Classification: Learning concepts like objects, scenes, and actions


from images.

2. Natural Language Processing (NLP): Learning concepts like sentiment,


entities, and intent from text.

3. Recommendation Systems: Learning concepts like user preferences and item


attributes.

4. Expert Systems: Learning concepts like rules and decision-making processes


from experts.

Concept Learning is a crucial aspect of machine learning, enabling machines to


understand and apply abstract concepts to real-world problems.

Maximum Likelihood:

Maximum Likelihood (ML) is a fundamental concept in statistics and machine
learning. It's a method for estimating the parameters of a statistical model by
finding the values that maximize the likelihood of observing the data.

Key Aspects of Maximum Likelihood:

1. Likelihood Function: A mathematical function that describes the probability


of observing the data given the model parameters.

2. Parameter Estimation: The process of finding the values of the model


parameters that maximize the likelihood function.

3. Maximum Likelihood Estimator (MLE): The value of the parameter that


maximizes the likelihood function.

How Maximum Likelihood Works:


1. Define the Model: Specify the statistical model and its parameters.

2. Collect Data: Gather the data to be analyzed.

3. Define the Likelihood Function: Write the likelihood function based on the
model and data.

4. Maximize the Likelihood Function: Use optimization techniques to find the


values of the parameters that maximize the likelihood function.
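As a concrete illustration of these steps, the following minimal sketch estimates the
heads probability of a coin from ten flips by maximizing the Bernoulli log-likelihood
over a grid of candidate values (the data are hypothetical; the analytic MLE is simply
heads/total, which the grid search recovers):

import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # 7 heads out of 10 flips
heads, n = data.sum(), len(data)

thetas = np.linspace(0.01, 0.99, 99)               # candidate parameter values
# Step 3: the Bernoulli log-likelihood of the data for each candidate theta
log_lik = heads * np.log(thetas) + (n - heads) * np.log(1 - thetas)

# Step 4: the MLE is the candidate that maximizes the log-likelihood
print(round(thetas[np.argmax(log_lik)], 2))        # 0.7 = heads / n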

Advantages of Maximum Likelihood:

1. Consistency: MLEs are consistent, meaning they converge to the true


parameter values as the sample size increases.

2. Efficiency: MLEs are often the most efficient estimators, meaning they have
the smallest variance.

3. Invariance: MLEs are invariant to transformations of the data.

Common Applications of Maximum Likelihood:

1. Linear Regression: Estimating the coefficients of a linear regression model.

2. Logistic Regression: Estimating the coefficients of a logistic regression


model.

3. Time Series Analysis: Estimating the parameters of a time series model.

4. Machine Learning: Estimating the parameters of a machine learning model,


such as a neural network.

Common Challenges and Limitations:

1. Computational Complexity: Maximizing the likelihood function can be


computationally intensive.
2. Non-Identifiability: In some cases, the likelihood function may not be
identifiable, making it difficult to estimate the parameters.

3. Model Misspecification: If the model is misspecified, the MLE may not be


consistent or efficient.

Maximum Likelihood is a powerful tool for estimating model parameters and


making inferences about the data. However, it's essential to be aware of its
limitations and challenges.

The Minimum Description Length (MDL) principle

is a fundamental concept in information theory, machine learning, and statistics.


It provides a framework for model selection, hypothesis testing, and data
compression.

Key Aspects of MDL:

1. Description Length: The length of the description of a model or hypothesis,


typically measured in bits.

2. Model Complexity: The complexity of a model, which affects its ability to fit
the data.

3. Data Compression: The idea that a good model should be able to compress
the data, reducing the description length.

MDL Principle:

The MDL principle states that the best model is the one that minimizes the total
description length, which includes:

1. Model Description Length: The length of the description of the model itself.

2. Data Description Length: The length of the description of the data given the
model.
Mathematical Formulation:

Let M be a model and D be the data. The MDL principle can be formulated as:

L(M, D) = L(M) + L(D|M)

where L(M) is the model description length, L(D|M) is the data description
length given the model, and L(M, D) is the total description length.
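As a toy illustration of this formula, the following sketch compares a one-parameter
constant model with a two-parameter straight-line model on a small dataset. L(M) is
charged as a crude fixed number of bits per parameter and L(D|M) as the Gaussian
negative log-likelihood of the residuals in bits; this coding scheme is an assumption
made purely for illustration.

import numpy as np

x = np.arange(10.0)
noise = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, 0.1, -0.1, 0.2])
y = 3.0 * x + noise                      # data that really follow a line

def description_length(residuals, n_params, bits_per_param=32):
    model_bits = n_params * bits_per_param            # L(M)
    sigma2 = residuals.var() + 1e-12                  # residual variance
    # L(D|M): Gaussian negative log2-likelihood of the residuals
    data_bits = 0.5 * len(residuals) * np.log2(2 * np.pi * np.e * sigma2)
    return model_bits + data_bits

r_const = y - y.mean()                                # 1-parameter model
slope, intercept = np.polyfit(x, y, 1)
r_line = y - (slope * x + intercept)                  # 2-parameter model

print(description_length(r_const, 1))   # larger total description length
print(description_length(r_line, 2))    # smaller total: the line is preferred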

Applications of MDL:

1. Model Selection: MDL can be used to select the best model from a set of
candidate models.

2. Hypothesis Testing: MDL can be used to test hypotheses and select the most
plausible explanation.

3. Data Compression: MDL can be used to compress data by finding the most
concise description.

4. Machine Learning: MDL can be used to regularize models and prevent


overfitting.

Advantages of MDL:

1. Interpretable: MDL provides an interpretable framework for model selection


and hypothesis testing.

2. Flexible: MDL can be applied to various domains, including machine


learning, statistics, and information theory.

3. Robust: MDL is robust to overfitting and can handle high-dimensional data.


Common Challenges and Limitations:

1. Computational Complexity: Computing the MDL can be computationally


intensive.

2. Model Misspecification: MDL assumes that the model is correctly specified,


which may not always be the case.

3. Hyperparameter Tuning: MDL requires hyperparameter tuning, which can be


challenging.

The Minimum Description Length principle provides a powerful framework


for model selection, hypothesis testing, and data compression. While it has its
limitations, MDL remains a widely used and influential concept in machine
learning, statistics, and information theory.


Gibbs Algorithm:

The Gibbs Algorithm is a Markov chain Monte Carlo (MCMC) method used for
estimating the distribution of a random variable or a set of random variables. It's
a powerful tool for Bayesian inference and is widely used in machine learning,
statistics, and data science.

1. Markov Chain: The Gibbs Algorithm generates a Markov chain, which is a


sequence of random variables where each variable depends only on the previous
one.

2. Monte Carlo: The algorithm uses Monte Carlo methods to approximate the
distribution of the random variables.

3. Conditional Distributions: The Gibbs Algorithm iteratively samples from the


conditional distributions of each variable given the others.

How the Gibbs Algorithm Works:


1. Initialization: Initialize the variables to some arbitrary values.

2. Iteration: Iterate through the variables, sampling from the conditional


distribution of each variable given the others.

3. Convergence: Continue iterating until the Markov chain converges to a


stationary distribution.
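A minimal sketch of these three steps, sampling from a bivariate normal with correlation
rho = 0.8, for which the exact conditional distributions are known in closed form (a
standard textbook target for Gibbs sampling):

import numpy as np

rng = np.random.default_rng(0)
rho, n_iter = 0.8, 10_000
x, y = 0.0, 0.0                          # 1. arbitrary initialization
samples = []

for _ in range(n_iter):                  # 2. iterate over the variables
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))   # sample x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))   # sample y | x
    samples.append((x, y))

samples = np.array(samples[1000:])       # 3. discard burn-in draws
print(np.corrcoef(samples.T)[0, 1])      # close to rho = 0.8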

Advantages of the Gibbs Algorithm:

1. Flexibility: The Gibbs Algorithm can be used for a wide range of


distributions and models.

2. Efficiency: The algorithm can be more efficient than other MCMC methods,
especially for high-dimensional distributions.

3. Easy to Implement: The Gibbs Algorithm is relatively simple to implement,


especially when compared to other MCMC methods.

Common Applications of the Gibbs Algorithm:

1. Bayesian Inference: The Gibbs Algorithm is widely used for Bayesian


inference, especially for complex models with many parameters.

2. Machine Learning: The algorithm is used in machine learning for tasks such
as clustering, dimensionality reduction, and regression.

3. Statistics: The Gibbs Algorithm is used in statistics for tasks such as


hypothesis testing, confidence intervals, and regression analysis.

Common Challenges and Limitations:

1. Convergence: The Gibbs Algorithm can be slow to converge, especially for


complex distributions.
2. Autocorrelation: The algorithm can suffer from autocorrelation, which can
affect the accuracy of the estimates.

3. Computational Complexity: The Gibbs Algorithm can be computationally


intensive, especially for large datasets.

The Naïve Bayes Classifier

is a popular supervised learning algorithm used for classification tasks. It's a


simple, yet effective, method for predicting the class of a new instance based on
its features.

Key Aspects of Naïve Bayes Classifier:

1. Bayes' Theorem: The algorithm is based on Bayes' theorem, which describes


the probability of an event given some prior knowledge.

2. Naïve Assumption: The algorithm assumes that the features of the instances
are independent of each other, given the class.

3. Conditional Probability: The algorithm calculates the conditional probability


of each class given the features of the instance.

How Naïve Bayes Classifier Works:

1. Training: The algorithm is trained on a labeled dataset, where each instance is


described by a set of features.

2. Prior Probabilities: The algorithm calculates the prior probabilities of each


class, which represent the probability of each class before observing the
features.

3. Likelihood: The algorithm calculates the likelihood of each feature given


each class, which represents the probability of observing the feature given the
class.

4. Posterior Probabilities: The algorithm calculates the posterior probabilities of


each class given the features of the instance, using Bayes' theorem.
5. Prediction: The algorithm predicts the class with the highest posterior
probability.
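A minimal sketch of these five steps for a tiny text-classification task, using word
counts, Laplace (add-one) smoothing, and log-probabilities to avoid numeric underflow;
the four-document corpus is hypothetical:

import math
from collections import Counter

train = [("buy cheap pills now", "spam"), ("cheap pills buy buy", "spam"),
         ("meeting at noon", "ham"), ("lunch meeting tomorrow", "ham")]

# 1-2. Training: class priors and per-class word counts
class_counts = Counter(label for _, label in train)
word_counts = {c: Counter() for c in class_counts}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for text, _ in train for w in text.split()}

def predict(text):
    scores = {}
    for c in class_counts:
        score = math.log(class_counts[c] / len(train))   # log prior
        total = sum(word_counts[c].values())
        for w in text.split():
            # 3. likelihood of each word given the class (Laplace smoothed)
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score          # 4. (log) posterior, up to a constant
    return max(scores, key=scores.get)   # 5. highest-posterior class

print(predict("cheap pills"))   # spam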

Advantages of Naïve Bayes Classifier:

1. Simple to Implement: The algorithm is easy to implement and understand.

2. Fast Training: The algorithm trains quickly, even on large datasets.

3. Good Performance: The algorithm performs well on many classification


tasks, especially when the features are independent.

Common Applications of Naïve Bayes Classifier:

1. Text Classification: The algorithm is widely used for text classification tasks,
such as spam detection and sentiment analysis.

2. Image Classification: The algorithm can be used for image classification


tasks, such as object recognition and image tagging.

3. Recommendation Systems: The algorithm can be used in recommendation


systems to predict user preferences.

Common Challenges and Limitations:

1. Assumes Independence: The algorithm assumes that the features are


independent, which may not always be the case.

2. Sensitive to Prior Probabilities: The algorithm is sensitive to the prior


probabilities of the classes, which can affect the accuracy of the predictions.

3. Not Suitable for Complex Relationships: The algorithm is not suitable for
modeling complex relationships between features.

Instance-Based Learning (IBL)


Instance-Based Learning (IBL) and K-Nearest Neighbors (KNN) are two
closely related concepts in machine learning.

IBL is a type of machine learning that involves storing and retrieving instances
of data. The goal of IBL is to make predictions or take actions based on the
similarity between new, unseen instances and the stored instances.

K-Nearest Neighbors (KNN)

KNN is a specific type of IBL that involves finding the k most similar instances
to a new instance. The KNN algorithm works as follows:

1. Training: Store a set of labeled instances, where each instance is described by


a set of features.

2. Prediction: For a new, unseen instance, calculate the distance or similarity


between the new instance and each stored instance.

3. K-Nearest Neighbors: Select the k instances with the smallest distance or


highest similarity to the new instance.

4. Prediction: Make a prediction based on the majority vote or weighted average


of the k nearest neighbors.
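A minimal sketch of the four steps above, using Euclidean distance and a simple majority
vote; the training points are hypothetical:

import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs; query: feature vector.
    # 2. sort stored instances by distance to the query (math.dist is
    #    Euclidean distance, available from Python 3.8)
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # 3-4. majority vote among the k nearest neighbours
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
print(knn_predict(train, (1.1, 1.0)))   # "A"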

Advantages of KNN

1. Simple to Implement: KNN is a simple algorithm to implement, especially


when compared to more complex machine learning algorithms.

2. Flexible: KNN can be used for classification, regression, and other tasks.

3. Robust: KNN is robust to noisy data and can handle high-dimensional data.
Disadvantages of KNN

1. Computational Complexity: KNN can be computationally expensive,


especially for large datasets.

2. Sensitive to Choice of k: The choice of k can significantly affect the


performance of KNN.

3. Not Suitable for Complex Relationships: KNN is not suitable for modeling
complex relationships between features.

Applications of KNN

1. Image Classification: KNN can be used for image classification tasks, such as
object recognition and image tagging.

2. Text Classification: KNN can be used for text classification tasks, such as
spam detection and sentiment analysis.

3. Recommendation Systems: KNN can be used in recommendation systems to


predict user preferences.

Variations of KNN

1. Weighted KNN: Assigns weights to the k nearest neighbors based on their


distance or similarity.

2. K-D Trees: Uses a k-d tree data structure to efficiently search for the k
nearest neighbors.

3. Ball Tree: Uses a ball tree data structure to efficiently search for the k nearest
neighbors.

Definition:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that involves
the development of algorithms and statistical models that enable machines to
learn from data, make decisions, and improve their performance on a task
without being explicitly programmed.

Evolution:

Machine Learning has its roots in the 1950s, but it has evolved significantly
over the years. Here's a brief timeline:

1. 1950s: Alan Turing proposed the Turing Test, which measures a machine's
ability to exhibit intelligent behavior equivalent to, or indistinguishable from,
that of a human.

2. 1960s: The first machine learning algorithms, such as decision trees and
clustering, were developed.

3. 1980s: Machine learning gained popularity with the introduction of expert


systems and rule-based systems.

4. 1990s: The development of support vector machines (SVMs) and neural


networks marked a significant milestone in machine learning.

5. 2000s: The rise of big data and the development of deep learning algorithms,
such as convolutional neural networks (CNNs) and recurrent neural networks
(RNNs), transformed the field of machine learning.

Need:

Machine Learning is necessary for several reasons:

1. Automation: Machine learning enables automation of repetitive and mundane tasks, freeing up human resources for more strategic and creative work.

2. Insight Generation: Machine learning helps generate insights from large datasets, enabling businesses to make informed decisions.

3. Personalization: Machine learning enables personalization of products and services, improving customer experience and loyalty.

4. Competitive Advantage: Machine learning provides a competitive advantage to businesses that adopt it, enabling them to innovate and disrupt markets.

Applications of Machine Learning:

Machine Learning has numerous applications across industries, including:

1. Computer Vision: Image recognition, object detection, facial recognition, and image segmentation.

2. Natural Language Processing (NLP): Text classification, sentiment analysis, language translation, and speech recognition.

3. Predictive Maintenance: Predicting equipment failures, reducing downtime, and improving overall efficiency.

4. Recommendation Systems: Personalized product recommendations, content recommendation, and advertising.

5. Healthcare: Disease diagnosis, medical imaging analysis, and personalized medicine.

6. Finance: Credit risk assessment, fraud detection, and portfolio optimization.

7. Marketing: Customer segmentation, lead scoring, and marketing automation.

8. Autonomous Vehicles: Object detection, tracking, and motion forecasting.

Real-World Examples:

1. Virtual Assistants: Siri, Alexa, and Google Assistant use machine learning to
understand voice commands and respond accordingly.
2. Image Recognition: Facebook's facial recognition feature uses machine
learning to identify and tag people in photos.

3. Self-Driving Cars: Tesla's Autopilot system uses machine learning to detect and respond to objects on the road.

4. Personalized Recommendations: Netflix's recommendation engine uses machine learning to suggest TV shows and movies based on user preferences.

In summary, Machine Learning is a powerful technology that enables machines to learn from data and make decisions without being explicitly programmed. Its applications are diverse, and it has the potential to transform industries and revolutionize the way we live and work.


Classification

Classification is a type of machine learning task where the goal is to assign a label or category to a new instance based on its features. The classification task can be further divided into two sub-tasks:

- Binary Classification: Involves assigning one of two labels to an instance (e.g., spam vs. non-spam emails).

- Multi-Class Classification: Involves assigning one of multiple labels to an instance (e.g., classifying handwritten digits into one of 10 classes).

Supervised Learning

Supervised learning is a type of machine learning where the algorithm is trained on labeled data. The goal is to learn a mapping between input data and the corresponding output labels.
- Key Characteristics:

- Labeled data is used for training.

- The algorithm learns to predict the output label based on the input data.

- The goal is to minimize the error between predicted and actual labels.

- Examples:

- Image classification

- Sentiment analysis

- Speech recognition

Unsupervised Learning

Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data. The goal is to discover patterns, relationships, or groupings in the data.

- Key Characteristics:

- Unlabeled data is used for training.

- The algorithm learns to identify patterns or structure in the data.

- The goal is to discover meaningful insights or representations of the data.

- Examples:

- Clustering

- Dimensionality reduction

- Anomaly detection

Key Differences
Here are the key differences between supervised and unsupervised learning:

- Labeled vs. Unlabeled Data: Supervised learning uses labeled data, while unsupervised learning uses unlabeled data.

- Goal: Supervised learning aims to predict output labels, while unsupervised learning aims to discover patterns or structure in the data.

- Algorithmic Approach: Supervised learning typically uses discriminative models, while unsupervised learning often relies on generative or structure-finding models.

- Evaluation Metrics: Supervised learning uses metrics like accuracy, precision, and recall, while unsupervised learning uses metrics like clustering quality, dimensionality reduction quality, or anomaly detection performance.
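The contrast can be made concrete with a short sketch; scikit-learn is an assumption about tooling, and the toy data and labels are illustrative. The supervised model needs the labels y, while the unsupervised one sees only X.

from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])                 # labels: available only in the supervised case

supervised = LogisticRegression().fit(X, y)            # learns the X -> y mapping
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # sees X alone, finds groupings

print(supervised.predict([[0.15, 0.15]]))  # predicted label
print(unsupervised.labels_)                # discovered cluster assignments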
UNIT-4

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Perceptron model, Multilayer perceptron
In machine learning, the perceptron is an algorithm for supervised learning of binary
classifiers. A binary classifier is a function which can decide whether an input,
represented by a vector of numbers, belongs to some specific class. It is a type of
linear classifier, i.e. a classification algorithm that makes its predictions based on a
linear predictor function combining a set of weights with the feature vector.
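As a small illustration of this decision rule, here is a minimal sketch; the weights and bias are illustrative assumptions, not learned values.

import numpy as np

def perceptron_predict(w, b, x):
    # weighted sum of the features plus a bias, passed through a step function
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.5, -0.4])   # one weight per feature
b = 0.1                     # bias term
print(perceptron_predict(w, b, np.array([1.0, 0.5])))  # -> 1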

The perceptron algorithm was invented in 1958 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the United States Office of Naval Research.

The perceptron was intended to be a machine, rather than a program, and while its
first implementation was in software for the IBM 704, it was subsequently
implemented in custom-built hardware as the “Mark 1 perceptron”. This machine
was designed for image recognition: it had an array of 400 photocells, randomly
connected to the “neurons”. Weights were encoded in potentiometers, and weight
updates during learning were performed by electric motors.

In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognize many classes of patterns. This caused the field of neural network research to stagnate for many years before it was recognized that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron).

Single-layer perceptrons are only capable of learning linearly separable patterns. For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, is sufficient to solve many otherwise non-separable problems.

In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often believed (incorrectly) that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons could produce an XOR function. (See the page on Perceptrons (book) for more information.) Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. The text was reprinted in 1987 as "Perceptrons - Expanded Edition", where some errors in the original text are shown and corrected.

The kernel perceptron algorithm was already introduced in 1964 by Aizerman. Margin bounds guarantees were given for the perceptron algorithm in the general non-separable case first by Freund and Schapire (1998), and more recently by Mohri and Rostamizadeh (2013), who extend previous results and give new L1 bounds.

The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons.

Single-layered perceptron model

A single-layer perceptron model includes a feed-forward network that depends on a threshold transfer function. It is the simplest type of artificial neural network, able to analyze only linearly separable objects with binary outcomes (targets), i.e., 1 and 0.

Multi-layered perceptron model

A multi-layered perceptron model has a structure similar to a single-layered perceptron model, but with a greater number of hidden layers. It is typically trained with the backpropagation algorithm, which executes in two stages: the forward stage and the backward stage.

Gradient descent and the Delta rule

The development of the perceptron was a big step towards the goal of creating useful connectionist networks capable of learning complex relations between inputs and outputs. In the late 1950s, the connectionist community understood that what was needed for further development of connectionist models was a mathematically derived (and thus potentially more flexible and powerful) rule for learning. By the early 1960s, the Delta Rule [also known as the Widrow & Hoff learning rule or the Least Mean Square (LMS) rule] was invented by Widrow and Hoff. This rule is similar to the perceptron learning rule (McClelland & Rumelhart, 1988), but is also characterized by a mathematical utility and elegance missing in the perceptron and other early learning rules.
The Delta Rule uses the difference between target activation (i.e., target output values) and obtained activation to drive learning. For reasons discussed below, the use of a threshold activation function (as used in both the McCulloch-Pitts network and the perceptron) is dropped, and instead a linear sum of products is used to calculate the activation of the output neuron (alternative activation functions can also be applied). The activation function is therefore called a linear activation function, in which the output node's activation is simply equal to the sum of the network's respective input/weight products. The strengths of the network connections (i.e., the values of the weights) are adjusted to reduce the difference between target and actual output activation (i.e., error).
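In symbols, the delta rule weight update is commonly written as follows, with learning rate \eta, target output t, obtained linear output o, and i-th input x_i:

\Delta w_i = \eta \,(t - o)\, x_i, \qquad o = \sum_i w_i x_i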

A set of data points is said to be linearly separable if the data can be divided into two classes using a straight line. If the data cannot be divided in this way, the data points are called non-linearly separable.

Although the perceptron rule finds a successful weight vector when the training
examples are linearly separable, it can fail to converge if the examples are not
linearly separable.

A second training rule, called the delta rule, is designed to overcome this difficulty.

If the training examples are not linearly separable, the delta rule converges toward a
best-fit approximation to the target concept.

The key idea behind the delta rule is to use gradient descent to search the
hypothesis space of possible weight vectors to find the weights that best fit the
training examples.
This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units.
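A minimal sketch of this gradient-descent search, training a single linear unit with the delta rule; the learning rate, epoch count, and data are illustrative assumptions.

import numpy as np

def train_linear_unit(X, t, lr=0.05, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            o = np.dot(w, x_i)          # linear activation (threshold dropped)
            w += lr * (t_i - o) * x_i   # delta rule: step against the error gradient
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # first column acts as a bias input
t = np.array([1.0, 2.0, 3.0])                       # target function: 1 + x
print(train_linear_unit(X, t))                      # converges toward approximately [1., 1.]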

Multilayer networks
In network theory, multidimensional networks, a special type of multilayer network,
are networks with multiple kinds of relations. Increasingly sophisticated attempts to
model real-world systems as multidimensional networks have yielded valuable
insight in the fields of social network analysis, economics, urban and international
transport, ecology, psychology, medicine, biology, Commerce, climatology, physics,
computational neuroscience, operations management, infrastructures, and finance.

The rapid exploration of complex networks in recent years has been dogged by a
lack of standardized naming conventions, as various groups use overlapping and
contradictory terminology to describe specific network configurations (e.g.,
multiplex, multilayer, multilevel, multidimensional, multirelational, interconnected).
Formally, multidimensional networks are edge-labeled multigraphs. The term “fully
multidimensional” has also been used to refer to a multipartite edge-labeled
multigraph. Multidimensional networks have also recently been reframed as specific
instances of multilayer networks. In this case, there are as many layers as there are
dimensions, and the links between nodes within each layer are simply all the links
for a given dimension.

Different Routes of Infection

Multilayer networks can be usefully applied in contexts where a pathogen can be transmitted through multiple modes or pathways of infection, as the multiplex approach provides a framework to account for multiple transmission probabilities. Considering the presence of multiple transmission modes can influence the efficacy of targeted interventions, particularly if nodes were traditionally targeted according to their degree in only one layer. This has implications for situations where data, networks, and resultant optimal control strategies are only available for one mode of transmission, leading to overconfidence in the efficacy of control.
In the context of veterinary epidemiology, animal movements are typically
considered the most effective transmission mode between farms (direct contacts).
However, other infection mechanisms might play an important role such as wind-
borne spread and fomites disseminated through contaminated clothes, equipment,
and vehicles by personnel (indirect contacts). Ignoring one mode of transmission
could lead to inaccurate farm risk predictions and ineffective targeted surveillance.
This has been demonstrated in a network analysis that considered both direct (cattle
movements) and indirect (veterinarian movements) contacts to reveal that indirect
contact, despite being less efficient in transmission, can play a major role in spread
of a pathogen within a network.

In another example, Stella used an "eco multiplex model" to study the spread of Trypanosoma cruzi (the cause of Chagas disease in humans) across different mammal species. This pathogen can be transmitted either through invertebrate vectors (Triatominae, or kissing bugs) or through predation, when a susceptible predator feeds on infected prey or vectors. Thus, their model included two ecological/transmission layers: the food-web and vector layers. Their results showed that studying the multiplex network structure offered insights on which host species facilitate parasite spread, and thus which would be more effective to immunize to control the spread. At the same time, they showed how, in this system, when parasite spread occurs primarily through the trophic layer, immunizing predators hampers parasite transmission more than immunizing prey.

Furthermore, multilayer network analysis can help differentiate between different types of social interactions that may lead to disease transmission. For example, sex-related dynamics of contact networks can have important implications for disease spread in animal populations, as seen in the spread of Mycobacterium bovis in European badgers (Meles meles). The authors constructed an interconnected network that distinguished male-male, female-female, and between-sex contacts recorded by proximity loggers. Inter-layer between-sex edges and edges in the male-male layer were more important in connecting groups into wider social communities, and contacts between different social communities were also more likely in these layers.

Dynamics of Coupled Processes: The Spread of Two Pathogens

Another application of multilayer networks in epidemiology is to model the concurrent propagation of two entities through a network, such as two different pathogens co-occurring in the same population or the spread of disease awareness alongside the spread of infection. In both scenarios, the spread of one entity within the network interacts with the spread of the other, creating a coupled dynamical system. A multiplex approach can allow for each coupled process to spread through a network that is based on the appropriate type of contact for propagation (i.e., contact networks involved in pathogen transmission vs. interaction or association networks that allow information to spread). In the case of two infectious diseases concurrently spreading through a network, a multiplex approach can be particularly useful if infection of a node by pathogen A alters the susceptibility to pathogen B, or if coinfection of a node influences its ability to transmit either pathogen. For example, when infection by one pathogen increases the likelihood of becoming infected by another pathogen, it could theoretically facilitate the spread of a second pathogen and thus alter epidemic dynamics. This type of dynamic is likely to be widespread in wild and domestic animals due to the importance of co-infection in affecting infectious disease dynamics by influencing the replication of pathogens within hosts. However, when there is competition or cross-immunity, the spread of one pathogen could reduce the spread of a second pathogen. For example, this type of dynamic could be expected for pathogen strains characterized by partial cross-immunity, such as avian influenza, or microparasite-macroparasite coinfections in which infection with one parasite reduces transmission of a second, such as infection with gastrointestinal helminths reducing the transmission of bovine tuberculosis in African buffalo (Syncerus caffer). Similar "within-node" dynamics could be important at a farm level in livestock movement networks. For example, the detection of a given pathogen infection in a farm might cause it to be quarantined, thus reducing its susceptibility and ability to transmit other pathogen infections.

Dynamics of Coupled Processes: Interactions Between Transmission Networks and Information/Social Networks

For coupled processes involving a disease alongside a social process (i.e., spread of
information or disease awareness), we might expect that the spread of the pathogen
will be associated with the spread of disease awareness or preventative behaviors
such as mask-wearing, and in these cases theoretical models suggest that
considering the spread of disease awareness can result in reduced disease spread. A
model was presented by Granell, which represented two competing processes on the
same network: infection spread (modeled using a Susceptible-Infected-Susceptible
compartmental model) coupled with information spread through a social network (an
Unaware-Aware-Unaware compartmental model). The authors used their model to
show that the timing of self-awareness of infection had little effect on the epidemic
dynamics. However, the degree of immunization (a parameter which regulates the
probability of becoming infected when aware) and mass media information spread
on the social layer did critically impact disease spread. A similar framework has been
used to study the effect of the diffusion of vaccine opinion (pro or anti) across a
social network with concurrent infectious disease spread. The study showed a clear
regime shift from a vaccinated population and controlled outbreak to vaccine refusal
and epidemic spread depending on the strength of opinion on the perceived risks of
the vaccine. The shift in outcomes from a controlled to uncontrolled outbreak was
accompanied by an increase in the spatial correlation of cases. While models in the
veterinary literature have accounted for altered behavior of nodes (imposition of
control measures) because of detection or awareness of disease, it is not common
for awareness to be considered as a dynamic process that is influenced by how each
node has interacted with the pathogen (i.e., contact with an infected neighbor). For
example, the rate of adoption of biosecurity practices at a farm, such as enhanced
surveillance, use of vaccination, or installation of air filtration systems, may be
dependent on the presence of disease in neighboring farms or the farmers’
awareness of a pathogen through a professional network of colleagues.
There is also some evidence that nodes that are more connected in their “social
support” networks (e.g., connections with family and close friends in humans) can
alter network processes that result in negative outcomes, such as pathogen
exposure or engagement in high-risk behavior. In a case based on users of injectable drugs, social connections with non-injectors can reduce drug users' connectivity in a network based on risky behavior with other drug injectors. In a
model presented by Chen, a social-support layer of a multiplex network drove the
allocation of resources for infection recovery, meaning that infected individuals
recovered faster if they possessed more neighbors in the social support layer. In
animal (both wild and domesticated) populations, this concept could be adapted to
represent an individual’s likelihood of recovery from, or tolerance to, infection being
influenced by the buffering effect of affiliative social relationships. For domestic
animals, investment in certain resources at a farm level could influence a premise’s
ability to recover (e.g., treatment) or onwards transmission of a pathogen (e.g.,
treatment or biosecurity practices). Sharing of these resources between farms could
be modeled through a “social-support” layer in a multiplex, for example, where a
farm’s transmissibility is impacted by access to shared truck-washing facilities.

Multi-Host Infections

Multilayer networks can be used to study the features of mixed-species contact networks or to model the spread of pathogens in a host community, providing important insights into multi-host pathogens. Scenarios like this are commonplace at
the livestock-wildlife interface and therefore the insights provided could be of real
interest to veterinary epidemiology. In the case of multi-host pathogens, intralayer
and interlayer edges represent the contacts between individuals of the same species
and between individuals of different species, respectively. They can therefore be
used to identify bottlenecks of transmission and provide a clearer idea of how
spillover occurs. For example, Silk used an interconnected network with three layers
to study potential routes of transmission in a multi-host system. One layer consisted
of a wild European badger (Meles meles) contact network, the second a
domesticated cattle contact network, and the third a layer containing badger latrine
sites (potentially important sites of indirect environmental transmission). No
intralayer edges were possible in the latrine layer. The authors demonstrated the
importance of these environmental sites in shortening paths through the multilayer
network (for both between- and within-species transmission routes) and showed
that some latrine sites were more important than others in connecting the different
layers. Pilosof presented a theoretical model, labeling the species as focal (i.e., of
interest) and non-focal, showing that the outbreak probability and outbreak size
depend on which species originates the outbreak and on asymmetries in between-
species transmission probabilities.

Similar applications of multilayer networks could easily be extended to systems where two or more species are domesticated animals as well. Examples could be the study of a pathogen such as Bluetongue virus, which affects both cattle and sheep, or foot-and-mouth disease virus, which infects cattle, sheep, and pigs. In such cases, each species can be represented by a different layer in the network, and interlayer edges are made possible because of mixed farms (i.e., cattle and sheep), different species from different farms grazing on the same pasture, or other types of indirect contacts such as the sharing of equipment or personnel.

Overall, multilayer approaches provide an elegant way to analyze cross-species transmission and spillover, including for zoonotic pathogens across the human-livestock-wildlife interface. They can be used to simultaneously model within-species
transmission, identify heterogeneities among nodes in their tendency to engage in
between-species contacts relevant for spillover and spillback, and better predict the
dynamics of spread prior and subsequent to cross-species transmission events,
which may contribute to forecasting outbreaks in target species. Measures of
multilayer network centrality in this instance could be used to extend the super
spreader concept into a community context; individuals that are influential in within-
species contact networks and possess between-species connections might be
predicted to have a more substantial influence on infectious disease dynamics in the
wider community.

Backpropagation Algorithm
In machine learning, backpropagation (backprop, BP) is a widely used algorithm for
training feedforward neural networks. Generalizations of backpropagation exist for
other artificial neural networks (ANNs), and for functions generally. These classes of
algorithms are all referred to generically as “backpropagation”. In fitting a neural
network, backpropagation computes the gradient of the loss function with respect to
the weights of the network for a single input–output example, and does so
efficiently, unlike a naive direct computation of the gradient with respect to each
weight individually. This efficiency makes it feasible to use gradient methods for
training multilayer networks, updating weights to minimize loss; gradient descent, or
variants such as stochastic gradient descent, are commonly used. The
backpropagation algorithm works by computing the gradient of the loss function
with respect to each weight by the chain rule, computing the gradient one layer at a
time, iterating backward from the last layer to avoid redundant calculations of
intermediate terms in the chain rule; this is an example of dynamic programming.

The term backpropagation strictly refers only to the algorithm for computing the
gradient, not how the gradient is used; however, the term is often used loosely to
refer to the entire learning algorithm, including how the gradient is used, such as by
stochastic gradient descent. Backpropagation generalizes the gradient computation
in the delta rule, which is the single-layer version of backpropagation, and is in turn
generalized by automatic differentiation, where backpropagation is a special case of
reverse accumulation (or “reverse mode”). The term backpropagation and its
general use in neural networks was announced in Rumelhart, Hinton & Williams
(1986a), then elaborated and popularized in Rumelhart, Hinton & Williams (1986b),
but the technique was independently rediscovered many times and had many predecessors dating to the 1960s. A modern overview is given in the deep learning textbook by Goodfellow, Bengio & Courville.
The algorithm is used to effectively train a neural network via the chain rule. In simple terms, after each forward pass through the network, backpropagation performs a backward pass while adjusting the model's parameters (weights and biases).
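A compact sketch of one forward and one backward pass for a two-layer network with sigmoid activations and squared-error loss; the layer sizes, learning rate, and data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # input (2) -> hidden (3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden (3) -> output (1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x, y = np.array([0.5, -0.2]), np.array([1.0])
lr = 0.1

# forward pass: store z (weighted inputs) and a (activations) per layer
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# backward pass: propagate the error delta layer by layer (chain rule)
delta2 = (a2 - y) * a2 * (1 - a2)          # output-layer error
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # hidden-layer error

# gradient-descent update of weights and biases
W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1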

Input layer

The input neurons represent the input data. These can be as simple as scalars or more complex entities like vectors or multidimensional matrices.

Hidden layers

The final values at the hidden neurons are computed using z^l, the weighted inputs in layer l, and a^l, the activations in layer l.

Output layer

The final part of a neural network is the output layer, which produces the predicted value. In the simplest example it is represented as a single neuron.

Plan of attack: Backpropagation is based around four fundamental equations. Together, those equations give us a way of computing both the error δ^l and the gradient of the cost function. Be warned, though: you shouldn't expect to instantaneously assimilate the equations. Such an expectation will lead to disappointment. In fact, the backpropagation equations are so rich that understanding them well requires considerable time and patience as you gradually delve deeper into them. The good news is that such patience is repaid many times over. And so the discussion in this section is merely a beginning, helping you on the way to a thorough understanding of the equations.
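For reference, one standard presentation of these four equations (in Nielsen's notation, where \odot denotes the elementwise product, C the cost, and L the final layer) is:

\delta^L = \nabla_a C \odot \sigma'(z^L)

\delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l)

\frac{\partial C}{\partial b^l_j} = \delta^l_j

\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j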

Deep Learning Introduction

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.

Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition,
natural language processing, machine translation, bioinformatics, drug design,
medical image analysis, material inspection and board game programs, where they
have produced results comparable to and in some cases surpassing human expert
performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various
differences from biological brains. Specifically, artificial neural networks tend to be
static and symbolic, while the biological brain of most living organisms is dynamic
(plastic) and analogue.

The adjective “deep” in deep learning refers to the use of multiple layers in the
network. Early work showed that a linear perceptron cannot be a universal classifier,
but that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can. Deep learning is a modern variation which is concerned with
an unbounded number of layers of bounded size, which permits practical application
and optimized implementation, while retaining theoretical universality under mild
conditions. In deep learning the layers are also permitted to be heterogeneous and
to deviate widely from biologically informed connectionist models, for the sake of
efficiency, trainability, and understandability, whence the “structured” part.

Most modern deep learning models are based on artificial neural networks,
specifically convolutional neural networks (CNN)s, although they can also include
propositional formulas or latent variables organized layer-wise in deep generative
models such as the nodes in deep belief networks and deep Boltzmann machines.

In deep learning, each level learns to transform its input data into a slightly more
abstract and composite representation. In an image recognition application, the raw
input may be a matrix of pixels; the first representational layer may abstract the
pixels and encode edges; the second layer may compose and encode arrangements
of edges; the third layer may encode a nose and eyes; and the fourth layer may
recognize that the image contains a face. Importantly, a deep learning process can
learn which features to optimally place in which level on its own. This does not
completely eliminate the need for hand-tuning; for example, varying numbers of
layers and layer sizes can provide different degrees of abstraction.

The word “Deep” in “Deep learning” refers to the number of layers through which
the data is transformed. More precisely, deep learning systems have a substantial
credit assignment path (CAP) depth. The CAP is the chain of transformations from
input to output. CAPs describe potentially causal connections between input and
output. For a feedforward neural network, the depth of the CAPs is that of the
network and is the number of hidden layers plus one (as the output layer is also
parameterized). For recurrent neural networks, in which a signal may propagate
through a layer more than once, the CAP depth is potentially unlimited. No
universally agreed-upon threshold of depth divides shallow learning from deep
learning, but most researchers agree that deep learning involves CAP depth higher
than 2. CAP of depth 2 has been shown to be a universal approximator in the sense
that it can emulate any function. Beyond that, more layers do not add to the
function approximator ability of the network. Deep models (CAP > 2) can extract
better features than shallow models and hence, extra layers help in learning the
features effectively.

Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features improve performance.

Deep neural networks are generally interpreted in terms of the universal approximation theorem or probabilistic inference.

The classic universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous
functions. In 1989, the first proof was published by George Cybenko for sigmoid
activation functions and was generalized to feed-forward multi-layer architectures in
1991 by Kurt Hornik. Recent work also showed that universal approximation also
holds for non-bounded activation functions such as the rectified linear unit.

The universal approximation theorem for deep neural networks concerns the
capacity of networks with bounded width but the depth is allowed to grow. Lu
proved that if the width of a deep neural network with ReLU activation is strictly
larger than the input dimension, then the network can approximate any Lebesgue
integrable function; if the width is smaller than or equal to the input dimension, then a deep neural network is not a universal approximator.

The probabilistic interpretation derives from the field of machine learning. It features
inference, as well as the optimization concepts of training and testing, related to
fitting and generalization, respectively. More specifically, the probabilistic
interpretation considers the activation nonlinearity as a cumulative distribution
function. The probabilistic interpretation led to the introduction of dropout as a regularizer in neural networks. The probabilistic interpretation was introduced by
researchers including Hopfield, Widrow and Narendra and popularized in surveys
such as the one by Bishop.

Architectures:

Deep Neural Network: A neural network with a certain level of complexity (having multiple hidden layers between the input and output layers). Deep neural networks are capable of modeling and processing non-linear relationships.

Deep Belief Network (DBN): A class of deep neural network built as a stack of multiple layers of belief networks.

Steps for performing DBN (a hedged code sketch follows this list):

1. Learn a layer of features from visible units using the Contrastive Divergence algorithm.

2. Treat the activations of previously trained features as visible units and then learn features of features.

3. Finally, the whole DBN is trained when the learning for the final hidden layer is achieved.
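A hedged sketch of this greedy procedure, using a minimal CD-1 (Contrastive Divergence) RBM with bias terms omitted for brevity; the sizes, learning rate, epochs, and data are illustrative assumptions, not a production implementation.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, lr=0.1, epochs=50):
    W = 0.01 * rng.normal(size=(V.shape[1], n_hidden))
    for _ in range(epochs):
        h_prob = sigmoid(V @ W)                              # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_sample @ W.T)                    # one Gibbs step back
        h_recon = sigmoid(v_recon @ W)                       # negative phase
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
    return W

def train_dbn(data, layer_sizes):
    weights, visible = [], data
    for size in layer_sizes:                 # steps 1-3 above, one layer at a time
        W = train_rbm(visible, size)
        weights.append(W)
        visible = sigmoid(visible @ W)       # features become the next layer's visible units
    return weights

data = rng.random((20, 6))                   # toy visible data
dbn = train_dbn(data, layer_sizes=[4, 2])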

Recurrent Neural Network (performs the same task for every element of a sequence): Allows for parallel and sequential computation, like the human brain (a large feedback network of connected neurons). Recurrent networks can remember important things about the input they received, which enables them to be more precise.

Limitations:

 Learning through observations only.
 The issue of biases.

Advantages:

 Reduces the need for feature engineering.
 Best-in-class performance on many problems.
 Eliminates unnecessary costs.
 Easily identifies defects that are difficult to detect.

Disadvantages:

 Computationally expensive to train.
 Large amount of data required.
 No strong theoretical foundation.

Applications:

Healthcare: Helps in diagnosing various diseases and treating them.

Automatic Text Generation: A corpus of text is learned, and from this model new text is generated, word-by-word or character-by-character. Such a model can learn how to spell, punctuate, and form sentences, and it may even capture the style of the corpus.

Concept of Convolutional Neural Network

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network most commonly applied to analyze visual imagery. They are also
known as shift invariant or space invariant artificial neural networks (SIANN), based
on the shared-weight architecture of the convolution kernels or filters that slide
along input features and provide translation equivariant responses known as feature
maps. Counter-intuitively, most convolutional neural networks are only equivariant,
as opposed to invariant, to translation. They have applications in image and video
recognition, recommender systems, image classification, image segmentation,
medical image analysis, natural language processing, brain-computer interfaces, and
financial time series.

CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks
makes them prone to overfitting data. Typical ways of regularization, or preventing
overfitting, include penalizing parameters during training (such as weight decay) or
trimming connectivity (skipped connections, dropout, etc.) CNNs take a different
approach towards regularization: they take advantage of the hierarchical pattern in
data and assemble patterns of increasing complexity using smaller and simpler
patterns embossed in their filters. Therefore, on a scale of connectivity and
complexity, CNNs are on the lower extreme.

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.
Individual cortical neurons respond to stimuli only in a restricted region of the visual
field known as the receptive field. The receptive fields of different neurons partially
overlap such that they cover the entire visual field.

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimize the filters (or kernels)
through automated learning, whereas in traditional algorithms these filters are hand-
engineered. This independence from prior knowledge and human intervention in
feature extraction is a major advantage.

Architecture

A convolutional neural network consists of an input layer, hidden layers and an output layer. In any feed-forward neural network, any middle layers are called
hidden because their inputs and outputs are masked by the activation function and
final convolution. In a convolutional neural network, the hidden layers include layers
that perform convolutions. Typically, this includes a layer that performs a dot
product of the convolution kernel with the layer’s input matrix. This product is
usually the Frobenius inner product, and its activation function is commonly ReLU.
As the convolution kernel slides along the input matrix for the layer, the convolution
operation generates a feature map, which in turn contributes to the input of the next
layer. This is followed by other layers such as pooling layers, fully connected layers,
and normalization layers.
Convolutional layers

In a CNN, the input is a tensor with a shape: (number of inputs) x (input height) x
(input width) x (input channels). After passing through a convolutional layer, the
image becomes abstracted to a feature map, also called an activation map, with
shape: (number of inputs) x (feature map height) x (feature map width) x (feature
map channels).

Convolutional layers convolve the input and pass its result to the next layer. This is
similar to the response of a neuron in the visual cortex to a specific stimulus. Each
convolutional neuron processes data only for its receptive field. Although fully
connected feedforward neural networks can be used to learn features and classify
data, this architecture is generally impractical for larger inputs such as high-
resolution images. It would require a very high number of neurons, even in a
shallow architecture, due to the large input size of images, where each pixel is a
relevant input feature. For instance, a fully connected layer for a (small) image of
size 100 x 100 has 10,000 weights for each neuron in the second layer. Instead,
convolution reduces the number of free parameters, allowing the network to be
deeper. For example, regardless of image size, using a 5 x 5 tiling region, each with
the same shared weights, requires only 25 learnable parameters. Using regularized
weights over fewer parameters avoids the vanishing gradients and exploding
gradients problems seen during backpropagation in traditional neural networks.
Furthermore, convolutional neural networks are ideal for data with a grid-like
topology (such as images) as spatial relations between separate features are
considered during convolution and/or pooling.
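A minimal sketch of a single 2-D convolution (strictly, a cross-correlation, as in most deep-learning libraries) producing one feature map; the 3 x 3 kernel values are illustrative assumptions. Note that the kernel's 9 weights are shared across every receptive field, matching the parameter counting above.

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # dot product of the kernel with one receptive field
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1., 0., -1.]] * 3)   # 3x3 vertical-edge detector: 9 shared weights
print(conv2d(image, edge_kernel).shape)       # (3, 3) feature map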

Pooling layers

Convolutional networks may include local and/or global pooling layers along with
traditional convolutional layers. Pooling layers reduce the dimensions of data by
combining the outputs of neuron clusters at one layer into a single neuron in the
next layer. Local pooling combines small clusters; tiling sizes such as 2 x 2 are commonly used. Global pooling acts on all the neurons of the feature map. There
are two common types of pooling in popular use: max and average. Max pooling
uses the maximum value of each local cluster of neurons in the feature map, while
average pooling takes the average value.
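A sketch of 2 x 2 max and average pooling over a feature map, using non-overlapping windows and assuming the dimensions are divisible by 2; the feature-map values are illustrative.

import numpy as np

def pool2x2(fmap, mode="max"):
    h, w = fmap.shape
    # group the map into non-overlapping 2x2 blocks
    blocks = fmap.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 5., 7.],
                 [1., 1., 8., 6.]])
print(pool2x2(fmap, "max"))    # [[6. 2.] [2. 8.]]
print(pool2x2(fmap, "mean"))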

Fully connected layers

Fully connected layers connect every neuron in one layer to every neuron in another
layer. It is the same as a traditional multi-layer perceptron neural network (MLP).
The flattened matrix goes through a fully connected layer to classify the images.

Receptive field

In neural networks, each neuron receives input from some number of locations in
the previous layer. In a convolutional layer, each neuron receives input from only a
restricted area of the previous layer called the neuron’s receptive field. Typically, the
area is a square (e.g. 5 by 5 neurons). Whereas, in a fully connected layer, the
receptive field is the entire previous layer. Thus, in each convolutional layer, each
neuron takes input from a larger area in the input than previous layers. This is due
to applying the convolution over and over, which considers the value of a pixel, as
well as its surrounding pixels. When using dilated layers, the number of pixels in the
receptive field remains constant, but the field is more sparsely populated as its
dimensions grow when combining the effect of several layers.

Weights

Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer.
The function that is applied to the input values is determined by a vector of weights
and a bias (typically real numbers). Learning consists of iteratively adjusting these
biases and weights.

The vector of weights and the bias are called filters and represent features of the
input (e.g., a particular shape). A distinguishing feature of CNNs is that many
neurons can share the same filter. This reduces the memory footprint because a
single bias and a single vector of weights are used across all receptive fields that
share that filter, as opposed to each receptive field having its own bias and vector
weighting.

Types of Layers (Convolutional Layers, Activation Function, Pooling, Fully Connected)
Convolutional Layers

Convolutional layers are the major building blocks used in convolutional neural
networks.

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

The innovation of convolutional neural networks is the ability to automatically learn many filters in parallel, specific to a training dataset, under the constraints of a specific predictive modelling problem such as image classification. The result is highly specific features that can be detected anywhere on input images.

Activation function

An activation function decides whether a neuron should be activated or not by calculating a weighted sum and further adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

A neural network has neurons that work in correspondence with their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons based on the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases.

1) Linear Function:

 Equation: A linear function has an equation similar to that of a straight line, i.e., y = ax.
 No matter how many layers we have, if all of them are linear in nature, the final activation of the last layer is nothing but a linear function of the input of the first layer.
 Range: -inf to +inf
 Uses: The linear activation function is used at just one place, i.e., the output layer.
 Issues: The derivative of a linear function is a constant, so the gradient no longer depends on the input x; stacking linear layers therefore cannot introduce any non-linear behavior into the algorithm.

2) Sigmoid Function:

 It is a function which is plotted as an 'S'-shaped graph.
 Equation: A = 1/(1 + e^(-x))
 Nature: Non-linear. Notice that for x values between -2 and 2, the y values are very steep. This means that small changes in x bring about large changes in the value of y.
 Value Range: 0 to 1
 Uses: Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.

3) Tanh Function: The activation that works almost always better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is a mathematically shifted and scaled version of the sigmoid function; the two are similar and can be derived from each other.
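The three activation functions above, written as plain NumPy functions so their output ranges can be compared; the sample inputs are illustrative.

import numpy as np

linear  = lambda x, a=1.0: a * x                 # range (-inf, +inf)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))     # range (0, 1)
tanh    = lambda x: np.tanh(x)                   # range (-1, 1); tanh(x) = 2*sigmoid(2x) - 1

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # approx [0.119 0.5 0.881]
print(tanh(x))      # approx [-0.964 0. 0.964]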

Pooling Layer

The pooling or down-sampling layer is responsible for reducing the spatial size of the activation maps. In general, pooling layers are used after multiple stages of other layers (i.e., convolutional and non-linearity layers) to progressively reduce the computational requirements through the network, as well as to minimize the likelihood of overfitting.

The key concept of the pooling layer is to provide translational invariance since, particularly in image recognition tasks, feature detection is more important than the feature's exact location. The pooling operation therefore aims to preserve the detected features in a smaller representation, and does so by discarding less significant data at the cost of spatial resolution.

Fully connected

Fully connected neural networks (FCNNs) are a type of artificial neural network
where the architecture is such that all the nodes, or neurons, in one layer are
connected to the neurons in the next layer.

While this type of algorithm is commonly applied to some types of data, in practice
this type of network has some issues in terms of image recognition and
classification. Such networks are computationally intense and may be prone to
overfitting. When such networks are also ‘deep’ (meaning there are many layers of
nodes or neurons) they can be particularly hard for humans to understand.

Training of Network, Recent Applications

ANNs are statistical models designed to adapt and self-program by using learning
algorithms to understand and sort out concepts, images, and photographs. For
processors to do their work, developers arrange them in layers that operate in
parallel. The input layer is analogous to the dendrites in the human brain’s neural
network. The hidden layer is comparable to the cell body and sits between the input
layer and output layer (which is akin to the synaptic outputs in the brain). The
hidden layer is where artificial neurons take in a set of inputs based on synaptic
weight, which is the amplitude or strength of a connection between nodes. These
weighted inputs generate an output through a transfer function to the output layer.

Attributes of Neural Networks

With the human-like ability to problem-solve, and to apply that skill to huge datasets, neural networks possess the following powerful attributes:

Adaptive Learning: Like humans, neural networks model non-linear and complex
relationships and build on previous knowledge. For example, software uses adaptive
learning to teach math and language arts.
Self-Organization: The ability to cluster and classify vast amounts of data makes
neural networks uniquely suited for organizing the complicated visual problems
posed by medical image analysis.

Real-Time Operation: Neural networks can (sometimes) provide real-time answers, as is the case with self-driving cars and drone navigation.

Prognosis: NNs' ability to predict based on models has a wide range of applications, including for weather and traffic.

Fault Tolerance: When significant parts of a network are lost or missing, neural
networks can fill in the blanks. This ability is especially useful in space exploration,
where the failure of electronic devices is always a possibility.

Tasks Neural Networks Perform

Neural networks are highly valuable because they can carry out tasks to make sense
of data while retaining all their other attributes. Here are the critical tasks that
neural networks perform:

Classification: NNs organize patterns or datasets into predefined classes.

Prediction: They produce the expected output from given input.

Clustering: They identify a unique feature of the data and classify it without any
knowledge of prior data.

Associating: You can train neural networks to “remember” patterns. When you
show an unfamiliar version of a pattern, the network associates it with the most
comparable version in its memory and reverts to the latter.

Neural network applications currently in use in various industries:

 Aerospace: Aircraft component fault detectors and simulations, aircraft control systems, high-performance auto-piloting, and flight path simulations
 Automotive: Improved guidance systems, development of power trains,
virtual sensors, and warranty activity analyzers
 Electronics: Chip failure analysis, circuit chip layouts, machine vision, non-
linear modeling, prediction of the code sequence, process control, and voice
synthesis
 Manufacturing: Chemical product design analysis, dynamic modeling of
chemical process systems, process control, process and machine diagnosis,
product design and analysis, paper quality prediction, project bidding,
planning and management, quality analysis of computer chips, visual quality
inspection systems, and welding quality analysis
 Mechanics: Condition monitoring, systems modeling, and control
 Robotics: Forklift robots, manipulator controllers, trajectory control, and
vision systems
 Telecommunications: ATM network control, automated information
services, customer payment processing systems, data compression,
equalizers, fault management, handwriting recognition, network design,
management, routing and control, network monitoring, real-time translation
of spoken language, and pattern recognition (faces, objects, fingerprints,
semantic parsing, spell check, signal processing, and speech recognition).

Types of Neural Networks in Artificial Intelligence

Based on the connection pattern (Feedforward, Recurrent): Feedforward: the connection graph has no loops. Recurrent: loops occur because of feedback connections.

Based on the number of hidden layers (Single layer, Multilayer): Single layer: has one hidden layer (e.g., a single perceptron). Multilayer: has multiple hidden layers (e.g., a multilayer perceptron).

Based on the nature of weights (Fixed, Adaptive): Fixed: weights are fixed a priori and never changed. Adaptive: weights are updated and changed during training.

Based on the memory unit (Static, Dynamic): Static: memoryless unit; the current output depends only on the current input (e.g., a feedforward network). Dynamic: memory unit; the output depends on the current input as well as on previous outputs (e.g., a recurrent neural network).

Perceptron Model in Neural Networks

A perceptron model is a neural network having two input units and one output unit, with no hidden layers. These are also known as 'single-layer perceptrons.'

Radial Basis Function Neural Network

These networks are like the feed-forward neural network, except that a radial basis function is used as the activation function of these neurons.

Multilayer Perceptron Neural Network

These networks use more than one hidden layer of neurons, unlike the single-layer perceptron. They are also known as Deep Feedforward Neural Networks.

Recurrent Neural Network

A type of neural network in which hidden-layer neurons have self-connections. Recurrent neural networks possess memory. At any instant, the hidden-layer neuron receives activation from the lower layer as well as its own previous activation value.

Long Short-Term Memory Neural Network (LSTM)

The type of neural network in which a memory cell is incorporated into the hidden-layer neurons is called an LSTM network.

Hopfield Network

A fully interconnected network of neurons in which each neuron is connected to every other neuron. The network is trained with input patterns by setting the values of the neurons to the desired pattern, after which its weights are computed. The weights are not changed thereafter. Once trained for one or more patterns, the network will converge to the learned patterns. It differs from other neural networks in this respect.

Boltzmann Machine Neural Network

These networks are like the Hopfield network, except that some neurons are input, while others are hidden in nature. The weights are initialized randomly and learned through a stochastic learning procedure (for restricted Boltzmann machines, typically contrastive divergence).

Convolutional Neural Network

A network that applies learned convolutional filters to grid-structured data such as images; see the detailed discussion of convolutional neural networks earlier in this unit.

Modular Neural Network

A combined structure of different types of neural networks, such as a multilayer perceptron, a Hopfield network, a recurrent neural network, etc., each of which is incorporated as a single module into the network to perform an independent subtask of the complete neural network.

Physical Neural Network

In this type of artificial neural network, electrically adjustable resistance material is used to emulate synapses, instead of the software simulations performed in conventional neural networks.

Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Model for Reinforcement: Markov Decision Process
Reinforcement learning (RL) is an area of machine learning concerned with how
intelligent agents ought to take actions in an environment in order to maximize the
notion of cumulative reward. Reinforcement learning is one of three basic machine
learning paradigms, alongside supervised learning and unsupervised learning.
Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). Partially supervised RL algorithms can combine the advantages of supervised and RL algorithms.

The environment is typically stated in the form of a Markov decision process (MDP)
because many reinforcement learning algorithms for this context use dynamic
programming techniques. The main difference between the classical dynamic
programming methods and reinforcement learning algorithms is that the latter do
not assume knowledge of an exact mathematical model of the MDP and they target
large MDPs where exact methods become infeasible.

Reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer; the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

Focal point of Reinforcement learning:

Input: The input should be an initial state from which the model will start

Output: There are many possible outputs as there are a variety of solutions to a
particular problem

Training: The training is based upon the input; the model returns a state, and the
user decides whether to reward or punish the model based on its output. The model
continues to learn. The best solution is decided based on the maximum reward.

Types of Reinforcement: There are two types of Reinforcement:

Positive:

Positive Reinforcement occurs when an event, occurring due to a particular
behavior, increases the strength and frequency of that behavior. In other words, it
has a positive effect on behavior.

Advantages of positive reinforcement:

 Maximizes performance
 Sustains change for a long period of time

A drawback is that too much reinforcement can lead to an overload of states, which
can diminish the results.
Negative:

Negative Reinforcement is defined as the strengthening of a behavior because a
negative condition is stopped or avoided.

Advantages of negative reinforcement:

 Increases behavior.
 Helps enforce a minimum standard of performance.

A drawback is that it only provides enough motivation to meet the minimum required
behavior.

Example of Reinforcement Learning in Practice

 Robotics for industrial automation.
 Business strategy planning.
 Machine learning and data processing.
 Training systems that provide custom instruction and materials according to the
requirements of students.
 Aircraft control and robot motion control.

Learning model for Reinforcement Markov Decision process

There are many different algorithms that tackle this issue. As a matter of fact,
Reinforcement Learning is defined by a specific type of problem, and all its solutions
are classed as Reinforcement Learning algorithms. In the problem, an agent is
supposed to decide the best action to select based on its current state. When this
step is repeated, the problem is known as a Markov Decision Process.

A Markov Decision Process (MDP) model contains:

 A set of possible world states S.
 A set of Models.
 A set of possible actions A.
 A real-valued reward function R(s,a).
 A policy π, the solution of the Markov Decision Process.

State

A State is a set of tokens that represent every state that the agent can be in.

Model

A Model (sometimes called a Transition Model) gives an action’s effect in a state.
In particular, T(S, a, S’) defines a transition T where being in state S and taking
an action ‘a’ takes us to state S’ (S and S’ may be the same). For stochastic
actions (noisy, non-deterministic) we also define a probability P(S’|S,a), which
represents the probability of reaching state S’ if action ‘a’ is taken in state S.
Note that the Markov property states that the effects of an action taken in a state
depend only on that state and not on the prior history.

Actions

An Action A is a set of all possible actions. A(s) defines the set of actions that can be
taken being in state S.
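
As a minimal sketch, these components can be written down as plain Python data
(a hypothetical two-state, two-action MDP; every name and number below is
illustrative, not from the original text):

# States S, actions A(s), transition model T with P(S'|S,a), rewards R(s,a).
S = ["s0", "s1"]
A = {"s0": ["stay", "go"], "s1": ["stay", "go"]}
T = {  # T[(s, a)] -> list of (next_state, probability) pairs
    ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],   # a stochastic action
    ("s0", "stay"): [("s0", 1.0)],
    ("s1", "go"):   [("s0", 1.0)],
    ("s1", "stay"): [("s1", 1.0)],
}
R = {("s0", "go"): 1.0, ("s0", "stay"): 0.0,
     ("s1", "go"): 0.0, ("s1", "stay"): 2.0}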

Q Learning: Q Learning Function, Q Learning Algorithm, Application of
Reinforcement Learning, Introduction to Deep Q Learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of
an action in a particular state. It does not require a model of the environment
(hence “model-free”), and it can handle problems with stochastic transitions and
rewards without requiring adaptations.

For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in
the sense of maximizing the expected value of the total reward over all successive
steps, starting from the current state. Q-learning can identify an optimal action-
selection policy for any given FMDP, given infinite exploration time and a partly
random policy. “Q” refers to the function that the algorithm computes: the expected
reward for an action taken in a given state.

Reinforcement learning involves an agent, a set S of states, and a set A of actions
per state. By performing an action a ∈ A, the agent transitions from state to
state. Executing an action in a specific state provides the agent with a reward (a
numerical score).

The goal of the agent is to maximize its total reward. It does this by adding the
maximum reward attainable from future states to the reward for achieving its
current state, effectively influencing the current action by the potential future
reward. This potential reward is a weighted sum of expected values of the rewards
of all future steps starting from the current state.
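
Written out (standard notation, with discount factor γ between 0 and 1; the symbols
are not from the original text), this weighted sum is the discounted return:

G_t = r_(t+1) + γ·r_(t+2) + γ²·r_(t+3) + … = Σ_k γ^k · r_(t+k+1)

so rewards further in the future are weighted by higher powers of γ and count for
less.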

As an example, consider the process of boarding a train, in which the reward is
measured by the negative of the total time spent boarding (alternatively, the cost
of boarding the train is equal to the boarding time). One strategy is to enter the
train doors as soon as they open, minimizing your initial wait time. If the train
is crowded, however, then you will have a slow entry after that initial action, as
people fight to depart the train while you attempt to board. The total boarding
time, or cost, is then:

0 seconds wait time + 15 seconds fight time = 15 seconds

On the next day, by random chance (exploration), you decide to wait and let other
people depart first. This initially results in a longer wait time, but the time
spent fighting other passengers is less. Overall, this path has a higher reward
than that of the previous day, since the total boarding time is now:

5 seconds wait time + 0 seconds fight time = 5 seconds

Through exploration, despite the initial (patient) action resulting in a larger cost (or
negative reward) than in the forceful strategy, the overall cost is lower, thus
revealing a more rewarding strategy.

Q Learning Algorithm

Q-learning is a model-free reinforcement learning algorithm.

Q-learning is a value-based learning algorithm. Value-based algorithms update the
value function based on an equation (in particular, the Bellman equation), whereas
the other type, policy-based algorithms, estimate the value function with a greedy
policy obtained from the last policy improvement.
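
For reference, the standard Q-learning update rule (with learning rate α and
discount factor γ; the notation is the usual one, not from the original text) is:

Q(s,a) ← Q(s,a) + α · [ r + γ · max_a′ Q(s′,a′) − Q(s,a) ]

where r is the reward received and s′ is the resulting state; the max over the next
action a′ is what makes the update off-policy.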

Q-learning is an off-policy learner, meaning it learns the value of the optimal
policy independently of the agent’s actions. An on-policy learner, by contrast,
learns the value of the policy being carried out by the agent, including the
exploration steps, and it will find a policy that is optimal considering the
exploration inherent in that policy.

Q-Table

Q-Table is the data structure used to calculate the maximum expected future
rewards for action at each state. Basically, this table will guide us to the best action
at each state. To learn each value of the Q-table, Q-Learning algorithm is used.

Step 1: initialize the Q-Table

We will first build a Q-table. There are n columns, where n= number of actions.
There are m rows, where m= number of states. We will initialize the values at 0.

Steps 2 and 3: choose and perform an action

This combination of steps is done for an undefined amount of time. This means that
this step runs until the time we stop the training, or the training loop stops as
defined in the code.

We will choose an action (a) in the state (s) based on the Q-Table. But, as
mentioned earlier, when the episode initially starts every Q-value is 0.
Steps 4 and 5: evaluate

Now we have taken an action and observed an outcome and reward. We need to
update the function Q(s,a), as in the sketch below.
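
Putting the steps together, here is a minimal sketch of the tabular Q-learning
loop. The environment object, its reset()/step() interface, and all hyperparameter
values are illustrative assumptions, not part of the original text:

import numpy as np

# Tabular Q-learning; `env` is assumed to expose a Gym-style interface:
# reset() -> integer state, step(a) -> (next_state, reward, done).
def q_learning(env, n_states, n_actions,
               episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Step 1: initialize the Q-table (m rows = states, n columns = actions)
    Q = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Steps 2-3: choose an action (epsilon-greedy) and perform it
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)      # explore
            else:
                a = int(np.argmax(Q[s]))              # exploit
            s_next, r, done = env.step(a)

            # Steps 4-5: evaluate the outcome and update Q(s, a)
            # using the Bellman update shown earlier
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q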

Application of Reinforcement Learning

Manufacturing

At Fanuc, a robot uses deep reinforcement learning to pick a device from one box
and put it in a container. Whether it succeeds or fails, it memorizes the object,
gains knowledge, and trains itself to do this job with great speed and precision.

Many warehousing facilities used by eCommerce sites and other supermarkets use
these intelligent robots for sorting millions of products every day and helping to
deliver the right products to the right people. Tesla’s factory, for instance,
comprises more than 160 robots that do a major part of the work on its cars,
reducing the risk of defects.

Finance

Reinforcement learning has helped develop several innovative applications in the
financial industry. Combined with Machine Learning, it has made several differences
in the domain over the years. Today, numerous technologies in finance, such as
search engines and chatbots, involve these techniques.

Several reinforcement learning techniques can help generate more return on
investment, reduce cost, improve customer experience, and so on. Reinforcement
learning and Machine Learning, together, can result in improved execution when
approving loans, measuring risk factors, and managing investments.

One of the most popular applications of reinforcement learning in finance is
portfolio management: building a platform that allows you to make significantly
more accurate predictions regarding stocks and other such investments, thereby
providing better results. This is one of the main reasons why most investors in the
industry wish to create such applications to evaluate the financial market in
detail. Moreover, many of these portfolio management applications, including
Robo-advisors, allow you to generate more accurate results over time.

Inventory Management

A major issue in supply chain inventory management is the coordination of inventory
policies adopted by different supply chain actors, such as suppliers,
manufacturers, and distributors, to smooth material flow and minimize costs while
responsively meeting customer demand.

Reinforcement learning algorithms can be built to reduce transit time for stocking
as well as retrieving products in the warehouse, optimizing space utilization and
warehouse operations.

Healthcare

With technology improving and advancing on a regular basis, it has taken over
almost every industry today, especially the healthcare sector. With the
implementation of reinforcement learning, the healthcare system has generated
consistently better outcomes. One commonly cited example of reinforcement learning
in the healthcare domain is Quotient Health.

Quotient Health is a software app built to reduce expenses on electronic medical
record assistance. The app achieves this by standardizing and enhancing the methods
used to create such systems. Its main goal is to improve the healthcare system,
specifically by lowering unnecessary costs.

Delivery Management

Reinforcement learning is used to solve the problem of Split Delivery Vehicle
Routing. Q-learning is used to serve the appropriate customers with just one
vehicle.

Image Processing

The image processing field overlaps strongly with the healthcare domain; it is
somewhat a part of the medical industry, yet it is a domain of its own.
Reinforcement learning has revolutionized not only image processing but the medical
industry at large. Here, however, we discuss some applications of this technology
in image processing alone.

Deep Q-learning

The DeepMind system used a deep convolutional neural network, with layers of tiled
convolutional filters to mimic the effects of receptive fields. Reinforcement learning is
unstable or divergent when a nonlinear function approximator such as a neural
network is used to represent Q. This instability comes from the correlations present
in the sequence of observations, the fact that small updates to Q may significantly
change the policy of the agent and the data distribution, and the correlations
between Q and the target values.

The technique used experience replay, a biologically inspired mechanism that uses a
random sample of prior actions instead of the most recent action to proceed. This
removes correlations in the observation sequence and smooths changes in the data
distribution. Iterative updates adjust Q towards target values that are only
periodically updated, further reducing correlations with the target.
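
As a minimal sketch of the experience-replay mechanism (the class and its interface
are illustrative assumptions, not DeepMind's actual implementation):

import random
from collections import deque

# A replay buffer stores transitions and serves uniformly random
# mini-batches, breaking the correlations in the observation sequence.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall out

    def push(self, s, a, r, s_next, done):
        # store the transition instead of learning from it immediately
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # a random sample of prior transitions, not just the most recent one
        return random.sample(self.buffer, batch_size)

# During training, a separate target copy of the Q-network would be synced
# with the online network only every N steps, so that the targets change
# slowly, matching the periodic updates described above.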

Because the future maximum approximated action value in Q-learning is evaluated
using the same Q function as the current action-selection policy, Q-learning can
sometimes overestimate the action values in noisy environments, slowing the
learning. A variant called Double Q-learning was proposed to correct this. Double
Q-learning is an off-policy reinforcement learning algorithm in which a different
policy is used for value evaluation than the one used to select the next action.
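
As a sketch of the idea (standard notation, not from the original text): two value
functions Q_A and Q_B are maintained, and on each step one of them, chosen at
random, is updated using the other's estimate:

Q_A(s,a) ← Q_A(s,a) + α · [ r + γ · Q_B(s′, argmax_a′ Q_A(s′,a′)) − Q_A(s,a) ]

so the next action is selected by Q_A but evaluated by Q_B, which counteracts the
overestimation.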
