
Machine learning

Unsupervised Learning
In supervised learning, the aim is to learn a mapping from the input to an output
whose correct values are provided by a supervisor. In unsupervised learning,
there is no such supervisor and we only have input data. The aim is to find the
regularities in the input. There is a structure to the input space such that certain
patterns occur more often than others, and we want to see what generally
happens and what does not. In statistics, this is called density estimation.
One method for density estimation is clustering where the aim is to find clusters
or groupings of input. In the case of a company with a data of past customers,
the customer data contains the demographic information as well as the past
transactions with the company, and the company may want to see the
distribution of the profile of its customers, to see what type of customers
frequently occur. In such a case, a clustering model allocates customers similar in
their attributes to the same group, providing the company with natural groupings
of its customers; this is called customer segmentation. Once such groups are
found, the company may decide strategies, for example, services and products,
specific to different groups; this is known as customer relationship management.
Such a grouping also allows identifying those who are outliers, namely, those
who are different from other customers, which may imply a niche in the market
that can be further exploited by the company.
Unsupervised learning can be classified into two categories:
1) Clustering
2) Association
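
As a small illustration of the customer-segmentation idea described above, the following is a minimal clustering sketch in Python (assuming scikit-learn is available; the customer features, values, and the choice of three clusters are made-up assumptions, not part of the original example):

# Minimal customer-segmentation sketch using k-means clustering.
# The feature values and the choice of 3 clusters are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one customer: [age, annual_income_k, transactions_per_year]
customers = np.array([
    [25, 30, 12],
    [27, 32, 15],
    [45, 80, 4],
    [48, 85, 5],
    [33, 50, 40],
    [35, 52, 38],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster (segment) assigned to each customer
print(kmeans.cluster_centers_)  # average profile of each segment
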
Supervised learning

In Supervised Learning, the machine learns under supervision. It contains a
model that is able to predict with the help of a labelled dataset. A labelled dataset is
one where you already know the target answer.

Supervised learning can be further divided into two types:

1. Classification

2. Regression

Classification is used when the output variable is categorical, i.e. it has 2 or
more classes. For example, yes or no, male or female, true or false, etc.

Regression is used when the output variable is a real or continuous value.
In this case, there is a relationship between two or more variables, i.e., a
change in one variable is associated with a change in the other variable.
For example, salary based on work experience or weight based on height,
etc.

Reinforcement Learning

1) In reinforcement learning, the learner is a decision-making agent that takes


actions in an environment and receives a reward (or penalty) for its actions in trying
to solve a problem. After a set of trial-and-error runs, it should learn the best
policy, which is the sequence of actions that maximizes the total reward.
2) In reinforcement learning there is a decision maker, called the agent, that is
placed in an environment (see figure 18.1). In chess, the game-player is the
decision maker and the environment is the board;

3)At any time, the environment is in a certain state that is one of a set of possible
states for example, the state of the board, the position of the robot in the maze. The
decision maker has a set of actions possible: legal movement of pieces on the chess
board, movement of the robot in possible directions without hitting the walls, and
so forth. Once an action is chosen and taken, the state changes. The solution to the
task requires a sequence of actions, and we receive feedback, in the form of a reward,
only rarely, generally only when the complete sequence is carried out. The reward
defines the problem and is necessary if we want a learning agent. The learning agent
learns the best sequence of actions to solve a problem where “best” is quantified as
the sequence of actions that has the maximum cumulative reward. Such is the
setting of reinforcement learning.

Reinforcement learning is different from the learning methods we discussed before


in a number of respects. It is called “learning with a critic,” as opposed to learning
with a teacher which we have in supervised learning. A critic differs from a teacher
in that it does not tell us what to do but only how well we have been doing in the
past; the critic never informs in advance. The feedback from the critic is scarce and
when it comes, it comes late.
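
To make the agent/environment/reward loop concrete, below is a minimal trial-and-error (Q-learning) sketch in Python; the toy corridor environment, the reward of +1 at the right end, and the hyperparameters are illustrative assumptions rather than anything prescribed by the text:

# Q-learning sketch: the agent tries actions, receives a reward only at the end,
# and gradually learns the policy that maximizes the cumulative reward.
import random

n_states = 5
actions = [-1, +1]                      # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:            # episode ends when the goal is reached
        if random.random() < epsilon:
            a = random.choice(actions)                       # explore
        else:
            a = max(actions, key=lambda act: Q[(s, act)])    # exploit
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0           # reward comes late
        # Update the estimate of total future reward for (state, action)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states)}
print(policy)   # learned best action per state (should be +1, i.e. "move right")
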
Difference between human and machine intelligence

Human intelligence: difficult to predict.
Machine intelligence: easy and consistent to predict.
Difference between supervised and unsupervised learning
Decision Tree
Application of decision tree

1. Decision Making

In daily life, we use decision trees without even realizing it. For instance, imagine you’re
trying to decide what to wear in the morning. You might ask yourself:

 “What’s the weather like?” (root node)

 If it’s hot, you might choose to wear a t-shirt (leaf node).

 If it’s cold (decision node), you might ask yourself:

 “Will it rain?” (another decision node)

 If yes, you’d wear a jacket and take an umbrella (leaf node).


 If no, you might just wear a sweater (leaf node).

This simple example represents a decision tree in real life. Similarly, businesses use

decision trees to help make decisions about strategies, investments, and operations.

2. Health Care

In healthcare, decision trees can help medical professionals with diagnoses. For

example, based on symptoms (decision nodes), a doctor can narrow down the possible

conditions (leaf nodes). This can be particularly useful for initial screening or in rural

areas where there is a lack of specialized healthcare professionals.

3. Financial Analysis

In the financial sector, decision trees are used in options pricing and strategy

development. They can model possible future price movements based on different

market conditions to help investors make informed decisions.

4. Customer Relationship Management (CRM)

Companies use decision trees to predict customer behavior, such as whether a customer

will churn or respond positively to a marketing campaign. Based on different

characteristics (e.g., age, purchase history, browsing behavior), a company can

categorize customers and tailor their marketing strategies accordingly.

5. Quality Control

In manufacturing and quality control, decision trees can be used to predict whether a

product will fail a quality assurance test based on different measurements and

conditions during the manufacturing process.

6. Fraud Detection
Decision trees can help detect fraud by identifying patterns in transactions. Based on

parameters like transaction frequency, amount, and location, a decision tree model can

flag suspicious activities for further investigation.

7. Recommendation Systems

Many online platforms use decision trees as part of their recommendation algorithms.

For example, Netflix or Spotify may use decision trees to determine what movies or

songs to recommend based on a user’s past viewing or listening habits, demographic

information, and preferences.
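
As a small, hedged illustration of how such trees are built in practice, the "what to wear" example above can be trained with a library decision tree; the feature encoding, data values, and thresholds below are made up for illustration:

# Decision-tree sketch for the "what to wear" decision (illustrative data only).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [temperature_in_celsius, will_it_rain (0 = no, 1 = yes)]
X = [[30, 0], [28, 1], [10, 0], [8, 1], [15, 1], [32, 0]]
y = ["t-shirt", "t-shirt", "sweater", "jacket+umbrella", "jacket+umbrella", "t-shirt"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["temperature", "rain"]))  # the learned questions
print(tree.predict([[12, 1]]))   # cold and rainy -> expected to predict "jacket+umbrella"
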

Clustering
Before diving into algorithmic details, let’s just build an intuition behind clustering using a toy
example of fruit datasets.

Let’s say we have a huge image dataset containing three fruits: (i) strawberries, (ii)
pears, and (iii) apples.

In the dataset all the images are mixed up, and your use-case is to group similar fruits together,
i.e. create three groups, each containing one type of fruit. This is exactly what
a clustering algorithm will do.
REGRESSION
1) Regression analysis is a statistical method to model the relationship between a dependent (target) variable and
one or more independent (predictor) variables. More specifically, regression analysis
helps us to understand how the value of the dependent variable changes corresponding to an independent
variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature,
age, salary, price, etc.

2) Regression is a supervised learning technique which helps in finding the correlation between variables
and enables us to predict the continuous output variable based on one or more predictor variables. It is
mainly used for prediction, forecasting, time series modeling, and determining the cause-effect relationship
between variables.

3) There are different types of regression. Two of the most common are linear regression and logistic regression.
In linear regression, the goal is to fit a line that best describes the data points. Logistic regression focuses on
determining whether each data point should fall below or above that line. This is useful for sorting observations
into distinct buckets such as fraud/not-fraud, spam/not-spam or cat/not-cat.

4) Regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world
where we need future predictions, such as weather conditions, sales, marketing trends, etc. For such
cases we need a technique which can make predictions accurately, and regression analysis, a statistical
method used in machine learning and data science, serves this purpose. Below are some other reasons
for using regression analysis:

o Regression estimates the relationship between the target and the independent variable.

o It is used to find the trends in data.

o It helps to predict real/continuous values.

o By performing the regression, we can confidently determine the most important factor, the least
important factor, and how each factor is affecting the other factors.

Example: Suppose there is a marketing company A, which runs various advertisements every year
and gets sales accordingly. The list below shows the advertisement spend of the company in the last
5 years and the corresponding sales:
Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction of
the sales for this year. To solve such prediction problems in machine learning, we need regression
analysis.

Linear Regression:

o Linear regression is a statistical regression method which is used for predictive analysis.

o It is one of the simplest and easiest regression algorithms; it shows the relationship
between continuous variables.

o It is used for solving the regression problem in machine learning.

o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent
variable (Y-axis), hence called linear regression.

o If there is only one input variable (x), then such linear regression is called simple linear regression. And
if there is more than one input variable, then such linear regression is called multiple linear regression.

o The relationship between variables in the linear regression model can be explained using the below image.
Here we are predicting the salary of an employee on the basis of the year of experience.
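
Since the image itself is not reproduced here, a minimal simple-linear-regression sketch of the same salary-vs-experience idea is shown below (assuming scikit-learn; the numbers are made up for illustration):

# Simple linear regression: predict salary from years of experience (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1], [2], [3], [4], [5]])          # input x (years)
salary = np.array([30000, 35000, 41000, 46000, 52000])    # target y

model = LinearRegression().fit(experience, salary)
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line
print(model.predict([[6]]))            # predicted salary for 6 years of experience
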

Logistic Regression:

o Logistic regression is another supervised learning algorithm which is used to solve the classification
problems. In classification problems, we have dependent variables in a binary or discrete format such as 0
or 1.

o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or False,
Spam or not spam, etc.

o It is a predictive analysis algorithm which works on the concept of probability.

o Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of
how it is used.

o Logistic regression uses the sigmoid function (logistic function), which maps any real value to a value
between 0 and 1. This sigmoid function is used to model the data in logistic regression.
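
A minimal sketch of the sigmoid function referred to above, showing how it squashes any real value into the range (0, 1) so that a 0/1 class can be read off with a 0.5 threshold (the input values are arbitrary):

# The sigmoid (logistic) function used in logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
print(probs)                        # approx. [0.018, 0.269, 0.5, 0.731, 0.982]
print((probs >= 0.5).astype(int))   # thresholded 0/1 class labels
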
Classification
As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and
Classification algorithms. Regression algorithms predict the output for continuous values, but to
predict categorical values, we need Classification algorithms.

What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is used to identify the category of new
observations on the basis of training data. In Classification, a program learns from the given dataset or observations
and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not
Spam, cat or dog, etc. Classes can be called targets/labels or categories.

Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue", "fruit or
animal", etc. Since the Classification algorithm is a Supervised learning technique, hence it takes labeled input data,
which means it contains input with the corresponding output.

The best example of an ML classification algorithm is Email Spam Detector.

The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms are
mainly used to predict the output for the categorical data.

Classification algorithms can be better understood using the below diagram. In the below diagram, there are two
classes, class A and Class B. These classes have features that are similar to each other and dissimilar to other classes.
The algorithm which implements the classification on a dataset is known as a classifier. There are two types of
Classifications:

o Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary
Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.

o Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class
Classifier.
Example: Classification of types of crops, classification of types of music.
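
A small sketch contrasting the two cases (the spam counts, crop features, and labels below are invented purely for illustration):

# Binary vs. multi-class classification with the same library classifier.
from sklearn.linear_model import LogisticRegression

# Binary classifier: exactly two possible outcomes
X_bin = [[0], [1], [2], [8], [9], [10]]        # e.g. number of suspicious words
y_bin = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]
print(LogisticRegression(max_iter=1000).fit(X_bin, y_bin).predict([[7]]))

# Multi-class classifier: more than two outcomes
X_multi = [[5, 20], [6, 22], [30, 5], [32, 4], [15, 60], [14, 55]]  # e.g. [temperature, rainfall]
y_multi = ["rice", "rice", "wheat", "wheat", "maize", "maize"]
print(LogisticRegression(max_iter=1000).fit(X_multi, y_multi).predict([[16, 58]]))
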

Difference between classification and regression?

What is kernel Machine algorithm?

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is
the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems.

The general task of pattern analysis is to find and study general types of relations (for
example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that
solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations
via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity
function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite
dimensional but only requires a finite dimensional matrix from user-input according to the Representer theorem.
Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel
processing.

The kernel method in machine learning is defined as a class of algorithms for pattern analysis, used to study
and find general types of relations (such as correlations, classifications, rankings, clusters, principal components,
etc.) in datasets. Rather than explicitly transforming the raw representation of the data into a feature vector
representation via a user-specified feature map, these methods operate in the high-dimensional implicit feature
space without computing the coordinates of the data in that particular space.

Kernel methods are algorithms used for pattern analysis. In general, pattern analysis is done to find relations in
datasets. These relations can be clusterings, classifications, principal components, correlations, etc. Most of the
algorithms that solve these tasks need the data in raw representation to be explicitly transformed into a feature
vector representation. This transformation is done via a user-specified feature map. In contrast, only a
user-specified kernel is required by the kernel method.

The term kernel method comes from the fact that these algorithms use kernel functions, which allow them to perform
operations in a high-dimensional, implicit feature space without the need to compute the coordinates of the data
in that space. Instead, they simply compute the inner product between the images of all pairs of data in feature
space.

These kinds of operations are most of the time computationally cheaper compared to the explicit computation of
the coordinates. This technique is termed the 'kernel trick'. Any linear model can be converted into a non-linear
model by applying the kernel trick to the model.
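
A minimal sketch of the kernel trick in practice: the XOR-style data below cannot be separated by a linear classifier, but the same SVM with an RBF kernel separates it in the implicit feature space (the data and kernel parameters are illustrative assumptions):

# Kernel machine (SVM) sketch: linear kernel vs. RBF kernel on XOR-like data.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]   # XOR pattern: not linearly separable
y = [0, 0, 1, 1]

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)

print(linear_svm.score(X, y))   # below 1.0: no straight line separates XOR
print(rbf_svm.score(X, y))      # 1.0: separable in the implicit kernel space
print(rbf_svm.support_)         # indices of the support vectors
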

What is linear discrimination ?

Linear discriminant analysis (LDA) is an approach used in supervised machine learning to solve multi-class
classification problems. LDA separates multiple classes with multiple features through data dimensionality
reduction. This technique is important in data science as it helps optimize machine learning models.

What is Linear Discriminant Analysis?


Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis
or Discriminant Function Analysis, is a dimensionality reduction technique
primarily utilized in supervised classification problems. It facilitates the modeling
of distinctions between groups, effectively separating two or more classes. LDA
operates by projecting features from a higher-dimensional space into a lower-
dimensional one. In machine learning, LDA serves as a supervised learning
algorithm specifically designed for classification tasks, aiming to identify a linear
combination of features that optimally segregates classes within a dataset.
For example, we have two classes and we need to separate them efficiently. Classes
can have multiple features. Using only a single feature to classify them may result
in some overlapping, as shown in the figure below. So, we keep on increasing
the number of features for proper classification.

Assumptions of LDA
LDA assumes that the data has a Gaussian distribution and that
the covariance matrices of the different classes are equal. It also assumes that the
data is linearly separable, meaning that a linear decision boundary can accurately
classify the different classes.
Suppose we have two sets of data points belonging to two different classes that we
want to classify. As shown in the given 2D graph, when the data points are plotted
on the 2D plane, there’s no straight line that can separate the two classes of data
points completely. Hence, in this case, LDA (Linear Discriminant Analysis) is used
which reduces the 2D graph into a 1D graph in order to maximize the separability
between the two classes.
Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis
and projects data onto a new axis in a way to maximize the separation of the two
categories and hence, reduces the 2D graph into a 1D graph.
Two criteria are used by LDA to create a new axis:
1. Maximize the distance between the means of the two classes.
2. Minimize the variation within each class.
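
A minimal LDA sketch following the 2-D-to-1-D idea above (assuming scikit-learn; the data points are made up):

# LDA sketch: project two-class, 2-D data onto the single most separating axis.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[1, 2], [2, 3], [3, 3], [6, 8], [7, 9], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
print(lda.transform(X).ravel())   # the 2-D points reduced to one new axis
print(lda.predict([[4, 5]]))      # class predicted for a new point
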


Search in AI

Artificial Intelligence is the study of building agents that act rationally. Most of the time, these agents perform some
kind of search algorithm in the background in order to achieve their tasks. Search is one of the most important areas
of artificial intelligence. Searching is a universal technique of problem solving in AI.

o Search: Searching is a step by step procedure to solve a search-problem in a given search space. A search
problem can have three main factors:

a. Search Space: Search space represents a set of possible solutions, which a system may have.

b. Start State: It is the state from where the agent begins the search.

c. Goal test: A function that looks at the current state and returns whether or not it is the goal state.
Types of search algorithms

Based on the search problems we can classify the search algorithms into uninformed (Blind search) search
and informed search (Heuristic search) algorithms.

 The uninformed search does not use any domain knowledge, such as the closeness or location of the goal.
It operates in a brute-force way, as it only includes information about how to traverse the tree and how to
identify leaf and goal nodes.
 Uninformed search explores the search tree without any information about the search
space, such as the initial state, operators, and test for the goal, so it is also called blind search. It examines each node
of the tree until it reaches the goal node.

Informed Search

 Informed search algorithms use domain knowledge. In an informed search, problem information is available
which can guide the search. Informed search strategies can find a solution more efficiently than an
uninformed search strategy. Informed search is also called a Heuristic search.
 A heuristic is a technique which might not always be guaranteed to find the best solution but is guaranteed to find a
good solution in reasonable time.
 Informed search can solve much more complex problems which could not be solved in another way.

An example of a problem tackled with informed search algorithms is the traveling salesman problem.

1. Greedy Search

2. A* Search

1. Breadth-first Search:

o Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm
searches breadthwise in a tree or graph, so it is called breadth-first search.

o The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current
level before moving to nodes of the next level.

o The breadth-first search algorithm is an example of a general-graph search algorithm.

o Breadth-first search is implemented using a FIFO queue data structure.

Example:

Question. Which solution would BFS find to move from node S to node G if run on the graph below?
Solution. The equivalent search tree for the above graph is as follows. As BFS traverses the tree “shallowest
node first”, it would always pick the shallower branch until it reaches the solution (or it runs out of nodes,
and goes to the next branch). The traversal is shown in blue arrows.

Path: S -> D -> G

 s = the depth of the shallowest solution.

 n = the branching factor (so that n^i is the number of nodes in level i).

 Time complexity: Equivalent to the number of nodes traversed in BFS until the shallowest solution.
T(n) = 1 + n + n^2 + ... + n^s = O(n^s)

 Space complexity: Equivalent to how large the fringe can get. S(n) = O(n^s)

 Optimality: BFS is optimal as long as the costs of all edges are equal.
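
A minimal BFS sketch for the S-to-G example above; since the original graph figure is not reproduced, the adjacency list below is an illustrative assumption chosen so that the shallowest solution is S -> D -> G:

# Breadth-first search with a FIFO queue: shallowest nodes are expanded first.
from collections import deque

graph = {"S": ["A", "D"], "A": ["B"], "B": ["C"], "C": ["G"], "D": ["G"], "G": []}

def bfs(start, goal):
    frontier = deque([[start]])        # FIFO queue of paths
    visited = set()
    while frontier:
        path = frontier.popleft()      # take the shallowest path first
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)
            for succ in graph[node]:
                frontier.append(path + [succ])
    return None

print(bfs("S", "G"))   # ['S', 'D', 'G'] - the shallowest solution
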

2. Depth-first Search
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.

Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts
at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible
along each branch before backtracking. It uses a last-in first-out strategy and hence is implemented using a stack.
Example:

Question. Which solution would DFS find to move from node S to node G if run on the graph below?

The equivalent search tree for the above graph is as follows. As DFS traverses the tree “deepest node first”, it would
always pick the deeper branch until it reaches the solution (or it runs out of nodes, and goes to the next branch). The
traversal is shown in blue arrows.

Path: S -> A -> B -> C -> G


d = the depth of the search tree = the number of levels of the search tree.

n = the branching factor (so that n^i is the number of nodes in level i).

Time complexity: Equivalent to the number of nodes traversed in DFS. T(n) = 1 + n + n^2 + ... + n^d = O(n^d)

Space complexity: Equivalent to how large the fringe can get. S(n) = O(n*d)

Completeness: DFS is complete if the search tree is finite, meaning for a given finite search tree, DFS will come up
with a solution if it exists.

Optimality: DFS is not optimal, meaning the number of steps taken to reach the solution, or the cost spent in reaching
it, may be high.
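
A matching DFS sketch on the same assumed graph; it follows the deeper branch first and therefore finds the longer path S -> A -> B -> C -> G:

# Depth-first search: follow one branch as deep as possible, then backtrack.
graph = {"S": ["A", "D"], "A": ["B"], "B": ["C"], "C": ["G"], "D": ["G"], "G": []}

def dfs(node, goal, path=None, visited=None):
    path = (path or []) + [node]
    visited = visited if visited is not None else set()
    visited.add(node)
    if node == goal:
        return path
    for succ in graph[node]:            # the first (deeper) branch is tried first
        if succ not in visited:
            result = dfs(succ, goal, path, visited)
            if result:
                return result
    return None

print(dfs("S", "G"))   # ['S', 'A', 'B', 'C', 'G']
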

Uniform Cost Search:

Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This algorithm comes
into play when a different cost is available for each edge. The primary goal of the uniform-cost search is to find a
path to the goal node which has the lowest cumulative cost. Uniform-cost search expands nodes according to their
path costs from the root node. It can be used to solve any graph/tree where the optimal cost is in demand. The uniform-
cost search algorithm is implemented using a priority queue. It gives maximum priority to the lowest cumulative cost.
Uniform-cost search is equivalent to the BFS algorithm if the path cost of all edges is the same.
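
A minimal uniform-cost-search sketch using a priority queue ordered by cumulative path cost (the weighted graph is an illustrative assumption):

# Uniform-cost search: always expand the node with the lowest cumulative cost.
import heapq

graph = {"S": [("A", 1), ("D", 4)], "A": [("B", 2)], "B": [("G", 5)],
         "D": [("G", 2)], "G": []}

def uniform_cost_search(start, goal):
    frontier = [(0, start, [start])]             # (cumulative cost, node, path)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # lowest-cost entry first
        if node == goal:
            return cost, path
        if node not in visited:
            visited.add(node)
            for succ, step_cost in graph[node]:
                heapq.heappush(frontier, (cost + step_cost, succ, path + [succ]))
    return None

print(uniform_cost_search("S", "G"))   # (6, ['S', 'D', 'G'])
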

INFORMED SEARCH

1.) Best-first Search Algorithm (Greedy Search):

The greedy best-first search algorithm always selects the path which appears best at that
moment. It is a combination of depth-first search and breadth-first search algorithms. It
uses a heuristic function to guide the search. Best-first search allows us to take the advantages
of both algorithms. With the help of best-first search, at each step, we can choose the
most promising node. In the best-first search algorithm, we expand the node which is
closest to the goal node, and the closest cost is estimated by the heuristic function, i.e.
f(n) = h(n), where h(n) is the estimated cost from node n to the goal.

A* Tree Search:

A* Tree Search, or simply A* Search, combines the strengths of uniform-cost
search and greedy search. In this search, the heuristic is the summation of the cost in
UCS, denoted by g(x), and the cost in the greedy search, denoted by h(x). The summed
cost is denoted by f(x), i.e. f(x) = g(x) + h(x).

A* Graph Search:

A* tree search works well, except that it spends time re-exploring branches it has
already explored. In other words, if the same node is expanded twice in different
branches of the search tree, A* search might explore both of those branches, thus
wasting time.

A* Graph Search, or simply Graph Search, removes this limitation by adding this rule: do
not expand the same node more than once.
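
A minimal A* sketch with f(x) = g(x) + h(x) and the graph-search rule of never expanding a node twice; the graph, edge costs, and heuristic values are illustrative assumptions (setting h to 0 everywhere reduces this to uniform-cost search):

# A* graph search: order the frontier by f = g (cost so far) + h (estimated cost to goal).
import heapq

graph = {"S": [("A", 1), ("D", 4)], "A": [("B", 2)], "B": [("G", 5)],
         "D": [("G", 2)], "G": []}
h = {"S": 5, "A": 6, "B": 4, "D": 2, "G": 0}     # heuristic estimates to the goal

def a_star(start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    expanded = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in expanded:
            continue                              # do not expand a node twice
        expanded.add(node)
        for succ, cost in graph[node]:
            heapq.heappush(frontier, (g + cost + h[succ], g + cost, succ, path + [succ]))
    return None

print(a_star("S", "G"))   # (6, ['S', 'D', 'G'])
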

Hill Climbing Algorithm in Artificial Intelligence


o Hill climbing algorithm is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain or best solution to
the problem. It terminates when it reaches a peak value where no neighbor has a higher
value.
o Hill climbing algorithm is a technique which is used for optimizing the mathematical
problems. One of the widely discussed examples of Hill climbing algorithm is Traveling-
salesman Problem in which we need to minimize the distance traveled by the salesman.
o It is also called greedy local search.
o A node of hill climbing algorithm has two components which are state and value.
o Hill Climbing is mostly used when a good heuristic is available.
o In this algorithm, we don't need to maintain and handle the search tree or graph as it only
keeps a single current state.
o Hill Climbing is a heuristic search used for mathematical optimization problems in the
field of Artificial Intelligence.

State-space Diagram for Hill Climbing:

The state-space landscape is a graphical representation of the hill-climbing algorithm,
showing a graph between the various states of the algorithm and the objective function/cost.

On the Y-axis we take the function, which can be an objective function or a cost function, and
the state-space on the X-axis. If the function on the Y-axis is cost, then the goal of the search is to find the
global minimum and local minimum. If the function on the Y-axis is an objective function, then the goal
of the search is to find the global maximum and local maximum.
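
A minimal hill-climbing sketch on a one-dimensional objective function (the function, step size, and start point are illustrative assumptions; the function below has a single peak, so the run ends at the global maximum rather than a local one):

# Hill climbing: keep moving to the better neighbour until no neighbour is higher.
def objective(x):
    return -(x - 3) ** 2 + 9            # peak value 9 at x = 3

def hill_climb(start, step=0.5):
    current = start
    while True:
        neighbours = [current - step, current + step]
        best = max(neighbours, key=objective)
        if objective(best) <= objective(current):   # no higher neighbour: stop at the peak
            return current, objective(current)
        current = best

print(hill_climb(start=0.0))   # converges to (3.0, 9.0)
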
Model-Based Learning

Model-based Reinforcement Learning refers to learning optimal behavior


indirectly by learning a model of the environment by taking actions and
observing the outcomes that include the next state and the immediate reward.
We start with model-based learning, where we completely know the
environment model parameters, p(r_{t+1} | s_t, a_t) and P(s_{t+1} | s_t, a_t).
In such a case, we do not need any exploration and can directly solve for the
optimal value function and policy using dynamic programming.
The optimal value function is unique and is the solution to the simultaneous
equations given in equation 18.6.

For each possible next state s_{t+1}, we move with probability P(s_{t+1} | s_t, a_t), and,
continuing from there using the optimal policy, the expected cumulative reward
is V*(s_{t+1}). We sum over all such possible next states, and we discount this sum
because it is one time step later. Adding our immediate expected reward, we get
the total expected cumulative reward for action a_t.
Once we have the optimal value function, the optimal policy is to choose the
action that maximizes the value in the next state:

Value Iteration


To find the optimal policy, we can use the optimal value function, and there
is an iterative algorithm called value iteration that has been shown to converge to
the correct V* values. Its pseudocode is given in figure 18.2.
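
Since figure 18.2 is not reproduced here, the following is a minimal value-iteration sketch on a tiny, fully known model; the 3-state chain, its rewards, and the discount factor are illustrative assumptions:

# Value iteration: repeat the Bellman optimality backup until V converges,
# then read off the optimal (greedy) policy.
n_states, gamma = 3, 0.9
actions = [-1, +1]                                 # move left or right

def step(s, a):                                    # known model: next state and reward
    s_next = min(max(s + a, 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

V = [0.0] * n_states
for _ in range(100):                               # iterate until the values settle
    V = [max(r + gamma * V[s_next]
             for s_next, r in (step(s, a) for a in actions))
         for s in range(n_states)]

policy = [max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
print(V)        # approximate optimal values V*(s)
print(policy)   # optimal action in each state (here: always move right, +1)
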





Policy Iteration

In policy iteration, we store and update the policy rather than doing this indirectly
over the values. The pseudocode is given in figure 18.3. The idea is to start with a
policy and improve it repeatedly until there is no change. The value function can
be calculated by solving for the linear equations. We then check whether we can
improve the policy by taking these into account. This step is guaranteed to
improve the policy, and when no improvement is possible, the policy is
guaranteed to be optimal. Each iteration of this algorithm takes O(|A||S|^2 + |S|^3)
time, which is more than that of value iteration, but policy iteration needs fewer
iterations than value iteration.
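
Figure 18.3 is likewise not reproduced, so here is a minimal policy-iteration sketch on the same toy model (the iterative evaluation loop stands in for solving the linear equations directly; all model details are illustrative assumptions):

# Policy iteration: evaluate the current policy, improve it greedily, repeat
# until the policy stops changing.
n_states, gamma, actions = 3, 0.9, [-1, +1]

def step(s, a):                                    # known environment model
    s_next = min(max(s + a, 0), n_states - 1)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

policy = [-1] * n_states                           # start with "always move left"
while True:
    V = [0.0] * n_states                           # policy evaluation
    for _ in range(100):
        V_new = []
        for s in range(n_states):
            s_next, r = step(s, policy[s])
            V_new.append(r + gamma * V[s_next])
        V = V_new
    new_policy = [max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
                  for s in range(n_states)]        # policy improvement
    if new_policy == policy:                       # no change: policy is optimal
        break
    policy = new_policy

print(policy)   # optimal policy: move right (+1) in every state
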
UNIT 1
What is Artificial Intelligence?

 Intelligence is the ability to learn from experience and to adapt to, shape,
and select environments.


Knowledge-Based Systems
1)An agent is a program that implements a mapping from perceptions to
actions. For simple agents this way of looking at the problem is sufficient. For
complex applications in which the agent must be able to rely on a large amount
of information and is meant to do a difficult task, programming the agent can be
very costly and it can be unclear how to proceed. Here AI provides a clear path to follow
that will greatly simplify the work.
2) First we separate knowledge from the system or program, which uses the
knowledge to, for example, reach conclusions, answer queries, or come up with
a plan. This system is called the inference mechanism. The knowledge is stored
in a knowledge base (KB). Acquisition of knowledge for the knowledge base is
called Knowledge Engineering and is based on various knowledge sources
such as human experts, the knowledge engineer, and databases.
3)Moving toward a separation of knowledge and inference has several
crucial advantages. The separation of knowledge and inference can allow
inference systems to be implemented in a largely application-independent
way. For example, application of a medical expert system to other diseases
is much easier by replacing the knowledge base rather than by programming
a whole new system.

4)Through the decoupling of the knowledge base from inference, knowledge can
be stored declaratively. In the knowledge base there is only a description of the
knowledge, which is independent from the inference system in use. Without this
clear separation, knowledge and processing of inference steps would be
interwoven, and any changes to the knowledge would be very costly.
Intelligent AGENT

The term agent generally denotes a system that processes information and produces an


output from an input. These agents may be classified in many different ways.

In classical computer science, software agents are primarily employed (Fig. 1.5).
In this case the agent consists of a program that calculates a result from user
input.

In robotics, on the other hand, hardware agents (also called autonomous robots)
are employed, which additionally have sensors and actuators at their disposal
(Fig. 1.6). The agent can perceive its environment with the sensors. With the
actuators it carries out actions and changes its environment.

With respect to the intelligence of the agent, there is a distinction


between reflex agents, which only react to input, and agents with memory, which
can also include the past in their decisions.
If a reflex agent is controlled by a deterministic program, it represents a function
of the set of all inputs to the set of all outputs. An agent with memory, on the
other hand, is in general not a function.
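
A small sketch of this distinction (the perceptions and actions are invented for illustration): the reflex agent is a pure function of its current input, while the agent with memory can respond differently to the same input depending on its past:

# Reflex agent vs. agent with memory.
def reflex_agent(perception):
    # Same input always produces the same output (a function of the input)
    return "turn_on_wipers" if perception == "rain" else "do_nothing"

class MemoryAgent:
    def __init__(self):
        self.history = []                  # internal state: past perceptions

    def act(self, perception):
        self.history.append(perception)
        # The same input can lead to different actions depending on the past
        if perception == "rain" and self.history.count("rain") > 3:
            return "close_the_roof"
        return "turn_on_wipers" if perception == "rain" else "do_nothing"

agent = MemoryAgent()
print(reflex_agent("rain"))                      # always the same answer
print([agent.act("rain") for _ in range(5)])     # behaviour changes over time
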

A mobile robot which should move from room 112 to room 179 in a building
takes actions different from those of a robot that should move to room 105. In
other words, the actions depend on the goal. Such agents are called goal-based
agents.

The goal of a cost-based agent is to minimize the cost of erroneous decisions in


the long term, that is, on average. In Sect. 7.3 we will become familiar with the
medical diagnosis system LEXMED as an example of a cost-based agent.

Analogously, the goal of a utility-based agent is to maximize the utility derived


from correct decisions in the long term, that is, on average. The sum of all
decisions weighted by their respective utility factors gives the total utility.

Of particular interest in AI are Learning agents, which are capable of changing


themselves given training examples or through positive or negative feedback,
such that the average utility of their actions grows over time.

The design of an agent is oriented, along with its objective, strongly toward its
environment, or alternately its picture of the environment, which strongly
depends on its sensors. The environment is observable if the agent always knows
the complete state of the world. Otherwise the environment is only partially
observable. If an action always leads to the same result, then the environment is
deterministic. Otherwise it is nondeterministic. In a discrete environment only
finitely many states and actions occur, whereas a continuous environment boasts
infinitely many states or actions.

CHARACTERISTICS

 The intelligent agent must learn and improve through interaction with the
environment.
 The IA must adapt online and in real time.
 The IA must learn quickly from large amounts of data.
 The IA must accommodate new problem-solving rules incrementally.
 The IA must have memory which must exhibit storage and retrieval
capacities.
 The IA should be able to analyze itself in terms of behavior, errors and
successes.

1)Situatedness
The agent receives some form of sensory input from its environment, and it
performs some action that changes its environment in some way. Examples of
environments: the physical world and the Internet.

2)Autonomy
The agent can act without direct intervention by humans or other agents and that
it has control over its own actions and internal state.

3)Adaptivity
The agent is capable of
(1) reacting flexibly to changes in its environment;
(2) taking goal-directed initiative (i.e., is pro-active), when appropriate; and
(3) Learning from its own experience, its environment, and interactions with
others.

4)Sociability
The agent is capable of interacting in a peer-to-peer manner with other agents or
humans.

5)Adjustments
Ability to take on new rules incrementally.

6)Proactivity
refers to the ability of an agent to perceive and react to environmental changes in
order to achieve the goal(s).

7)Goal-oriented

They are designed to achieve specific goals, which can be pre-defined or learned
through interactions with the environment.

8)Perception
Intelligent agents use sensors to perceive their environment and understand
information. This perception can be through various sensors such as cameras,
microphones, or other data collection tools.
AGENT
Unit 2
Propositional logic
First order logic

In the topic of Propositional logic, we have seen how to represent statements using
propositional logic. But unfortunately, in propositional logic, we can only represent facts
which are either true or false. PL is not sufficient to represent complex sentences or natural language
statements. Propositional logic has very limited expressive power. Consider the following
sentences, which we cannot represent using PL logic.

o"Some humans are intelligent", or


o"Sachin likes cricket."

To represent the above statements, PL logic is not sufficient, so we require some more powerful logic,
such as first-order logic.
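
For instance, using a quantifier, predicates and a relation, the two sentences above can be written in first-order logic roughly as:

∃x (Human(x) ∧ Intelligent(x))    ("Some humans are intelligent")
Likes(Sachin, Cricket)            ("Sachin likes cricket")
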

First-Order logic:
 First-order logic is another way of knowledge representation in artificial intelligence. It is
an extension to propositional logic.
 FOL is sufficiently expressive to represent the natural language statements in a concise way.
 First-order logic is also known as Predicate logic or First-order predicate logic. First-order
logic is a powerful language that expresses information about objects in an easier way and
can also express the relationships between those objects.
 First-order logic (like natural language) does not only assume that the world contains facts
like propositional logic but also assumes the following things in the world:

 Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
 Relations: These can be unary relations such as: red, round, is adjacent, or n-ary relations such
as: the sister of, brother of, has color, comes between
 Function: Father of, best friend, third inning of, end of, ......

Like natural language, first-order logic also has two main parts:


a. Syntax
b. Semantics

Syntax of First-Order logic:


The syntax of FOL determines which collection of symbols is a logical expression
in first-order logic. The basic syntactic elements of first-order logic are
symbols. We write statements in short-hand notation in FOL.
Basic Elements of First-order logic:
Following are the basic elements of FOL syntax.
difference between first order logic and propositional logic

Key Differences:

Expressiveness: Propositional logic deals with simple true/false propositions and their combinations
using logical operators. First-order logic goes beyond this by allowing the representation of complex
relationships, quantification over variables, and functions that operate on objects.

Quantification: Propositional logic lacks quantifiers (like "for all" and "exists") which are essential for
expressing general statements and relationships involving variables. First-order logic includes
quantifiers to make statements about entire classes of objects.

Structure: In propositional logic, propositions are atomic and not further decomposed. In first-order
logic, propositions can contain variables, predicates, and functions, allowing for more detailed
representation of relationships and properties.

Scope: Propositional logic is often used for simple reasoning tasks and truth tables. First-order logic is
more suitable for representing complex relationships, making inferences, and expressing higher-level
concepts.

Ontology
 The limitation of PL is that it cannot represent individual entities,
whereas FOL can easily represent individual entities; that means if
you are writing a sentence about an individual, it can be easily represented in FOL.

 PL does not signify or express generalization, specialization or patterns;
for example, QUANTIFIERS cannot be used in PL, but in FOL users can
easily use quantifiers, as FOL does express generalization, specialization,
and patterns.
What are the limitations of First-Order Logic?
First-Order Logic struggles with uncertainty, handling large-scale reasoning, and capturing
context-dependent knowledge. It can become complex for certain problems, and its
representations might not be ideal for dealing with vague or probabilistic information.

While first-order logic is powerful, it has limitations, such as its


inability to handle uncertainty, make decisions under incomplete
information, or reason about self-reference and certain types of
non-monotonic reasoning. As a result, it’s often used in combination with
other formalisms and techniques in AI systems.
Unit 4
How does ANN work?
1. An ANN can be viewed as a weighted directed graph in which artificial neurons are nodes, and
directed edges with weights are connections between neuron outputs and neuron inputs.
2. The Artificial Neural Network receives information from the external world in the form of patterns and
images, represented as vectors. These inputs are designated by the notation x(n) for n inputs.
3. Every input is multiplied by its specific weights, which serve as crucial information for the
neural network to solve problems. These weights essentially represent the strength of the
connections between neurons within the neural network.
4. The weighted inputs are all summed up inside the computing unit (artificial neuron). In case
the weighted sum is zero, a bias is added to make the output non-zero or to scale up the system
response. The bias has a weight and input always equal to '1'.
5. The sum can be any numerical value ranging from 0 to infinity. To limit the response
to the desired value, a threshold value is set up. For this, the sum is passed through
an activation function.
6. The activation function is set as the transfer function to get the desired output. There are linear
as well as nonlinear activation functions.
Some of the commonly used activation functions are the binary, sigmoidal (linear) and tan hyperbolic
sigmoidal (nonlinear) functions.
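
A single artificial neuron implementing the steps above can be sketched in a few lines (the inputs, weights, and bias value are made up for illustration):

# One artificial neuron: weighted sum of the inputs plus bias, then a sigmoid activation.
import numpy as np

def neuron(x, w, bias):
    weighted_sum = np.dot(x, w) + bias             # sum of the weighted inputs
    return 1.0 / (1.0 + np.exp(-weighted_sum))     # sigmoidal activation function

x = np.array([0.5, 0.8, 0.2])        # inputs x(n)
w = np.array([0.4, -0.6, 0.9])       # connection weights
print(neuron(x, w, bias=0.1))        # output limited to the range (0, 1)
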

Architecture of ANN
ANNs are arranged in layers. Nodes interconnect the layers. Each node is characterized by a
specific activation function. Typically, an ANN has the following layers.

Input layer
It receives the data, which usually is in the form of vectors. The vector may contain any number
of parameters. Generally, the number of input nodes in the input layer is equal to the number of
parameters in the input vector. Input layers preprocess the data and feed it to subsequent hidden
layers. The input nodes do not change the data; they simply check that the data is in a valid format
and then pass it along to the next layer.
Hidden Layer
The main processing happens in the hidden layers. The number of hidden layers can vary. The
incoming information passes through weighted connections in the hidden layer, where the input
values are multiplied with weights. Subsequently, the weighted inputs are summed up to
produce a single number.
Output layer
The processed information is directed towards the output layer. The output layer can be
connected with the hidden layer, input layer, or both. In some cases, the output layers feed the
information back to the input layer. The output layer generates the final prediction value. There
is typically one output node in classification networks. The activation functions at the nodes of
the output layer add and change the data to produce the output values. Proper weight adjustment
is vital for neural networks to find useful data patterns and prevent overfitting.

Difference between ANN and BNN


The human brain works asynchronously, while ANNs work synchronously.
Processing speed: single biological neurons are slow, while standard neurons in ANNs
are fast.

What is Neural Network?

The term ‘Neural’ is derived from the human (animal) nervous system’s basic
functional unit ‘neuron’ or nerve cells which are present in the brain and other
parts of the human (animal) body.
Humans have made several attempts to mimic the biological systems, and one of them is
artificial neural networks inspired by the biological neural networks in living organisms.
However, they are very much different in several ways. For example, birds inspired
humans to create airplanes, and four-legged animals inspired us to develop cars. The
artificial counterparts are definitely more powerful and make our life better. The perceptrons,
which are the predecessors of artificial neurons, were created to mimic parts of a biological neuron
such as the dendrites, axon, and cell body using mathematical models, electronics, and whatever
limited information we have of biological neural networks.

 Structure of BNN

In living organisms, the brain is the control unit of the neural network, and it has different subunits that
take care of vision, senses, movement, and hearing. The brain is connected with a dense network of nerves
to the rest of the body's sensors and actors. There are approximately 10¹¹ neurons in the brain, and these
are the building blocks of the complete central nervous system of the living body. The neuron is the
fundamental building block of neural networks. In biological systems, a neuron is a cell just like any
other cell of the body, which has a DNA code and is generated in the same way as the other cells. Though
it might have different DNA, the function is similar in all organisms. A neuron comprises
three major parts: the cell body (also called the Soma), the dendrites, and the axon. The dendrites are like
fibers branched in different directions and are connected to many cells in that cluster. Dendrites receive
the signals from surrounding neurons, and the axon transmits the signal to the other neurons. At the
ending terminal of the axon, contact with the dendrite is made through a synapse. The axon is a long fiber
that transports the output signal as electric impulses along its length. Each neuron has one axon. Axons
pass impulses from one neuron to another like a domino effect.

Dendrite — It receives signals from other neurons.


Soma (cell body) — It sums all the incoming signals to generate input.
Axon — When the sum reaches a threshold value, neuron fires and the signal travels
down the axon to the other neurons.
Synapses — The points of interconnection of one neuron with other neurons. The amount
of signal transmitted depends upon the strength (synaptic weights) of the connections.
The connections can be inhibitory (decreasing strength) or excitatory (increasing
strength) in nature.
So, a neural network, in general, is a highly interconnected network of billions of neurons
with trillions of interconnections between them.

2.2.3. Back Propagation Networks (BPN)


Introduced by Rumelhart, Hinton, & Williams in 1986. BPN is a Multi-layer Feedforward
Network, but the error is back-propagated, hence the name Back Propagation Network (BPN). It
uses a Supervised Training process; it has a systematic procedure for training the network and is
used in Error Detection and Correction. The Generalized Delta Law / Continuous Perceptron Law /
Gradient Descent Law is used in this network. The Generalized Delta rule minimizes the mean
squared error between the calculated output and the target output. The Delta law has a faster convergence rate
when compared with the Perceptron Law. It is the extended version of the Perceptron Training Law.
The limitation of this law is the local minima problem. Due to this the convergence speed
reduces, but it is still better than the perceptron's. Figure 1 represents a BPN network architecture. Even
though multi-level perceptrons can be used, they are not as flexible and efficient as BPN. In figure 1
the weights between the input and the hidden layer are denoted Wij and the weights between the
first hidden layer and the next layer are denoted Vjk. This network is valid only for differentiable
output functions. The training process used in backpropagation involves three stages, which
are listed below:
1. Feedforward of the input training pair
2. Calculation and backpropagation of the associated error
3. Adjustment of the weights
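
The three stages can be sketched with a tiny network trained on XOR (a minimal NumPy sketch, not the exact architecture of figure 1; layer sizes, learning rate, and iteration count are illustrative assumptions):

# Back-propagation sketch: a 2-4-1 feedforward network trained on XOR with the
# generalized delta (gradient descent) rule on the squared error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)         # XOR targets

W, b_h = rng.normal(size=(2, 4)), np.zeros(4)   # input  -> hidden weights (Wij) + bias
V, b_o = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights (Vjk) + bias
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Stage 1: feedforward of the input training pairs
    h = sigmoid(X @ W + b_h)
    y = sigmoid(h @ V + b_o)
    # Stage 2: calculation and backpropagation of the associated error
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (delta_out @ V.T) * h * (1 - h)
    # Stage 3: adjustment of the weights
    V -= lr * (h.T @ delta_out)
    b_o -= lr * delta_out.sum(axis=0)
    W -= lr * (X.T @ delta_hid)
    b_h -= lr * delta_hid.sum(axis=0)

print(np.round(y, 2))   # should be close to [[0], [1], [1], [0]]; training can
                        # occasionally stall in a local minimum, as noted above
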

Advantage of SVM
What is SVM?
 Hopfield Network
1)Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a
single layer which contains one or more fully connected recurrent neurons. The Hopfield
network is commonly used for auto-association and optimization tasks.

2)Hopfield networks, which are a beautiful and visualizable example of auto-associative


memory, are based on this idea. Patterns can be stored in auto-associative memory. To call up
a saved pattern, it is sufficient to provide a similar pattern. The store then finds the most
similar saved pattern. A classic application of this is handwriting recognition.

3) The basic diagram for Hopfield Networks is given in Figure 7. Here no learning algorithm
is used. No hidden units/layers are used. Patterns are simply stored by learning the energies.
It is similar to the human brain in storing and retrieving memory patterns. Some patterns / images
are stored, and when a similar noisy input is provided the network recalls the related stored
pattern. A neuron can be ON (+1) or OFF (-1). The neurons can change state between +1 and
-1 based on the inputs which they receive from other neurons. A Hopfield Network is trained
to store patterns (memories). It can recognize a previously learned (stored) pattern from partial
(noisy) inputs.

4) A Hopfield network is a single-layered and recurrent network in which the neurons are
entirely connected, i.e., each neuron is connected to the other neurons. If there are two neurons
i and j, then there is a connectivity weight w_ij between them, which is symmetric: w_ij = w_ji.
5) Types of Hopfield Network: Based on the activation functions used, the Hopfield Network can be classified into two
types.
They are (a) Discrete Hopfield network (b) Continuous Hopfield Network
Discrete Hopfield Network – Uses Discrete Activation Function
Continuous Hopfield Network – Uses Continuous Activation Function

6)Continuous Hopfield Network


Unlike the discrete Hopfield networks, here the time parameter is treated as a continuous variable. So, instead of
getting binary/bipolar outputs, we can obtain values that lie between 0 and 1. It can be used to solve constrained
optimization and associative memory problems.

7)Discrete Hopfield Network


It is a fully interconnected neural network where each unit is connected to every other unit. It behaves in a
discrete manner, i.e. it gives finite distinct output, generally of two types:
 Binary (0/1)
 Bipolar (-1/1)
The weights associated with this network are symmetric in nature and have the following properties.
1. w_ij = w_ji
2. w_ii = 0

8) These networks were introduced to collect and retrieve memory and store various patterns. Also, auto-
association and optimization tasks can be done using these networks. In this network, each node is fully
connected (recurrent) to the other nodes. These nodes exist only in two states: ON (1) or OFF (0). These states can be
restored based on the input received from other nodes. Unlike other neural networks, the output of the Hopfield
network is finite. Also, the input and output sizes must be the same in these networks.
9)The Hopfield network consists of associative memory. This memory allows the system to retrieve the memory
using an incomplete portion. The network can restore the closest pattern using the data captured in associative
memory. This feature of Hopfield networks makes it a good candidate for pattern recognition.
10) Hopfield's model consists of processing elements with two outputs, one inverting and the other
non-inverting. The outputs from each processing element are fed back to the inputs of the other processing elements
but not to itself.
11) Hopfield networks are a type of recurrent neural network used for associative memory.
12) The number of feedback loops is equal to the number of neurons.
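
A minimal discrete-Hopfield sketch of the store-and-recall behaviour described above (the stored pattern and the single flipped bit are illustrative assumptions):

# Discrete Hopfield network: store one bipolar pattern with Hebbian weights
# (w_ij = x_i * x_j, w_ii = 0) and recall it from a noisy input.
import numpy as np

pattern = np.array([1, -1, 1, -1, 1, -1])         # stored memory (+1 / -1)
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)                            # w_ii = 0, and w_ij = w_ji

noisy = pattern.copy()
noisy[0] = -noisy[0]                              # corrupt one bit

state = noisy.copy()
for _ in range(5):                                # update until the state is stable
    for i in range(len(state)):                   # asynchronous neuron updates
        state[i] = 1 if W[i] @ state >= 0 else -1

print(noisy)   # noisy input
print(state)   # recalled pattern: matches the stored one
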

Deep Learning
 Deep learning is a subset of machine learning that uses multi-layered neural networks, called deep neural
networks, to simulate the complex decision-making power of the human brain.
 Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way
that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures,
text, sounds, and other data to produce accurate insights and predictions.
 Deep learning neural networks, or artificial neural networks, attempt to mimic the human brain through a
combination of data inputs, weights, and bias. These elements work together to accurately recognize,
classify, and describe objects within the data.
 What is deep learning?

 Deep learning is a branch of machine learning that is made up of a neural network with three or more layers:
 Input layer: Data enters through the input layer.
 Hidden layers: Hidden layers process and transport data to other layers.
 Output layer: The final result or prediction is made in the output layer.
 Deep learning is an important element of data science, including statistics and predictive modeling. It is
extremely beneficial to data scientists who are tasked with collecting, analyzing and interpreting large
amounts of data; deep learning makes this process faster and easier.
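
A minimal deep-network sketch with these three kinds of layers, written with the Keras API that is described under Unit 5 (the layer sizes and the toy data are illustrative assumptions):

# A small deep neural network: input layer, two hidden layers, output layer.
import numpy as np
from tensorflow import keras

X = np.random.rand(100, 4)                    # 100 samples, 4 input features
y = (X.sum(axis=1) > 2).astype(int)           # toy binary target

model = keras.Sequential([
    keras.Input(shape=(4,)),                      # input layer
    keras.layers.Dense(8, activation="relu"),     # hidden layer 1
    keras.layers.Dense(8, activation="relu"),     # hidden layer 2
    keras.layers.Dense(1, activation="sigmoid"),  # output layer (final prediction)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, verbose=0)
print(model.predict(X[:3]))                   # predicted probabilities
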
Examples of deep learning used in everyday life:

 Voice-enabled technology, such as TV remotes


 Virtual assistants
 Credit card fraud detection
 Chatbots and service bots
 Image analysis
 Customised shopping and entertainment

Applications

Healthcare

Deep Learning has found its application in the Healthcare sector. Computer-aided disease detection and computer-
aided diagnosis have been possible using Deep Learning. It is widely used for medical research, drug discovery,
and diagnosis of life-threatening diseases such as cancer and diabetic retinopathy through the process of medical
imaging.

Entertainment

Companies such as Netflix, Amazon, YouTube, and Spotify give relevant movies, songs, and video
recommendations to enhance their customer experience. This is all thanks to Deep Learning. Based on a person’s
browsing history, interest, and behavior, online streaming companies give suggestions to help them make product
and service choices. Deep learning techniques are also used to add sound to silent movies and generate subtitles
automatically.

Robotics

Deep Learning is heavily used for building robots to perform human-like tasks. Robots powered by Deep Learning
use real-time updates to sense obstacles in their path and pre-plan their journey instantly. It can be used to carry
goods in hospitals, factories, warehouses, inventory management, manufacturing products, etc.

Natural Language Processing

Another important field where Deep Learning is showing promising results is NLP, or Natural Language
Processing. It is the procedure for allowing robots to study and comprehend human language.

Fraud Detection

Another attractive application for deep learning is fraud protection and detection; major companies in the payment
system sector are already experimenting with it. PayPal, for example, uses predictive analytics technology to detect
and prevent fraudulent activity.

Computer vision. Deep learning has greatly enhanced computer vision, providing computers with extreme

accuracy for object detection and image classification, restoration and segmentation.
UNIT 5

An ‘end-to-end’ open-source platform for ML, TensorFlow by Google is a flexible, efficient, and easy-to-use
framework. With features like robust production, easy development, and impactful experimentation, this open-
source AI library was launched in the year 2015. TensorFlow is a successful AI library with advanced Google AI
tools. Written in Python and C++, TensorFlow is not only meant for ML programming, but it also integrates Deep
Learning technologies for the user to apply.

Written in Python and C++, PyTorch is one of the best open-source AI libraries that allows optimization of
performance and gives way for scalable training of a software application. With a rich set of tools and models to
support API, PyTorch is also used for development in applications of Natural Language Processing or NLP. Some
of the artificial intelligence tools and techniques of the PyTorch ecosystem are - pystiche, bayesian active learning,
glow, and skorch. All these tools enable applications to be user-friendly and compatible in terms of performance

A Python AI library, Keras is programmed in the Python programming language and allows deep learning techniques to
be integrated. As it is one of the most used AI libraries of all, it is certainly powerful, efficient, and
flexible to use. Launched in 2016, Keras has always been a promising AI library with advanced machine
learning algorithms in Python.

Scikit-Learn : Scikit-Learn is a Python library for machine learning. It is an open-source and beginner-friendly
tool that offers data mining and machine learning capabilities, as well as comprehensive documentation and
tutorials. Scikit-Learn is well-suited for smaller projects and quick model prototyping but may not be the best
choice for deep learning tasks.

OpenAI : OpenAI provides a range of tools for different AI tasks, including making images or converting text to
speech. It's known for its powerful GPT language models that can understand and generate human-like text.
OpenAI's platform is user-friendly, making it easier for people to use advanced AI in their own projects, especially
for creating AI assistants

PyBrain : PyBrain is an open-source ML library for Python. It provides a simple and flexible environment for
experimenting with various machine learning algorithms and is perfect for researchers, educators, and developers
looking for a lightweight Python-based framework for exploring machine learning concepts. It is lightweight and
easy to use for experimentation, supporting a wide range of machine learning algorithms

Microsoft Cognitive Toolkit (CNTK) : The Microsoft Cognitive Toolkit, or CNTK, is a free and open-source
deep learning AI framework developed by Microsoft. It's known for its efficiency, especially on multi-GPU
systems, and is suitable for both research and production deployments. It is preferred by many researchers, data
scientists, and developers working on deep learning projects with access to powerful hardware because it's highly
efficient, particularly for training large models. It also supports multiple neural network types, including
feedforward and recurrent networks; additionally, it provides a Python API for ease of use.

4) Theano, an open-source Python library for deep learning, is also popular in the neural processing and
data science communities. It's widely known for making it easy to implement complex neural networks
by abstracting away the neural network components (such as the layers and hidden layers). It's often
used to build and train AI models on graphics processing units (GPUs).

Theano comes with a library of operations for performing neural network computations on tensors. It integrates
closely with Python and NumPy, can compile computations for CPUs and GPUs, and has historically served as a
backend for higher-level libraries such as Keras and as an alternative to TensorFlow.

5)Caffe

Developed by Berkeley AI Research, Caffe is another popular deep learning framework. It is most widely used in
the development of convolutional neural networks (CNNs) and is also popular for its fast execution time.

The framework includes numerous tools for developing and training neural networks and also provides support for
GPU and CPU computation. You can use Caffe for image classifications, semantic segmentation, and object
detection.

6) OpenNN

OpenNN is another robust AI framework that includes extensive analytics and is suitable for both beginners and
experienced programmers. It comes with a tool called Neural Designer, which is one of the most popular tools for
sophisticated analysis. The tool has the ability to offer tables and graphs to analyze the data entered.

7) MxNet

MxNet is another AI framework that is built with scalability in mind. It provides full support for multi-machine and
multi-GPU training. The framework includes many features, including easy writing of custom layers in higher-
level languages. As a community-developed framework, MxNet is available with TVM support, increasing
implementation support.

Importance of regression

1. Understanding the relationship between variables: One of the main benefits of regression
analysis is that it helps in understanding the relationship between two variables. It helps in
identifying the degree of correlation between two variables and whether the correlation is
positive or negative. For example, if we want to understand the relationship between income and
expenditure, we can use regression analysis to determine whether an increase in income leads to
an increase in expenditure.

Kernel Machines
1)Kernel machines are maximum margin methods that allow the model to be written as a sum
of the influences of a subset of the training instances. These influences are given by application-
specific similarity kernels, and we discuss “kernelized” classification, regression, ranking,
outlier detection and dimensionality reduction, and how to choose and use kernels.
2) The parameter of the linear model, the weight vector, can be written down in terms of a subset
of the training set, which are the so-called support vectors. In classification, these are the cases
that are close to the boundary and as such, knowing them allows knowledge extraction: Those
are the uncertain or erroneous cases that lie in the vicinity of the boundary between two classes.
Their number gives us an estimate of the generalization error, and, as we see below, being able
to write the model parameter in terms of a set of instances allows kernelization.

3) The output is written as a sum of the influences of support vectors and these are given by
kernel functions that are application-specific measures of similarity between data instances.
Previously, we talked about nonlinear basis functions allowing us to map the input to another
space where a linear (smooth) solution is possible; the kernel function uses the same idea.

4) Typically, in most learning algorithms, data points are represented as vectors, and either the dot
product (as in multilayer perceptrons) or the Euclidean distance (as in radial basis function
networks) is used. A kernel function allows us to go beyond that.

5) The kernel function defines the space according to its notion of similarity, and a kernel
function is good if we have better separation in its corresponding space.
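
A minimal sketch of such a similarity kernel: the RBF kernel below compares two data points directly, without ever constructing their (possibly infinite-dimensional) feature vectors; the gamma value is an illustrative assumption:

# RBF kernel: a similarity function over pairs of data points.
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

a = np.array([1.0, 2.0])
b = np.array([1.2, 1.9])
c = np.array([5.0, 7.0])

print(rbf_kernel(a, b))   # close to 1: the two points are very similar
print(rbf_kernel(a, c))   # close to 0: the two points are far apart
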
