Ai Mca
Tic-Tac-Toe Learner AI
Tic-Tac-Toe is a simple two-player game that many of us enjoyed playing as kids
(especially in boring classrooms). The two players take turns placing their
respective symbols in a 3x3 grid. The player who manages to place three of their
symbols in a horizontal, vertical, or diagonal row wins the game; if neither
player manages to do so, the game ends in a draw. If both players always play
their optimal strategies, the game always ends in a draw.
Since the grid is small & there are only two players involved the number of
possible moves for every board state is limited thus allowing Tree-based search
algorithms like Alpha-Beta pruning to provide a computationally feasible &
exact solution to building a Computer-based Tic-Tac-Toe player.
In this article, we look at an approximate (Learning-based) approach to the same
game. Even though a better algorithm exists (i.e. Alpha-beta pruning), the
approximate method provides an alternative method that might be useful if the
complexity of the board were to increase. Also, the code changes to incorporate
that would be minimal.
The basic idea behind the learning system is that the system should be able to
improve its performance (P) with respect to a set of tasks (T) by learning from
training experience (E). The training experience (E) can be direct (a predefined
set of data with individual labels) or indirect feedback (no label for each
training example). In our case:
The score (R) for each non-final board state is assigned the estimated score of
the successor board state, while the final board state is assigned a score based
on the end result of the game:
3. V(boardState) ← V_hat(Successor(boardState))
4. V(finalBoardState) ← 100 (Win) | 0 (Draw) | −100 (Loss)
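The two scoring rules above can be sketched in Python. This is a minimal illustration, not the project's actual code; the names `training_values` and `v_hat` are assumptions made for the example.

```python
def training_values(board_states, final_result, v_hat):
    # Rule 3: each non-final state takes the estimated value of its successor.
    values = [v_hat(successor) for successor in board_states[1:]]
    # Rule 4: the final state is scored by the game's outcome.
    values.append({"win": 100, "draw": 0, "loss": -100}[final_result])
    return values
```

Given a trace of board states from one game and the current estimator, this yields one training value per state.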
Implementation
The final design is split into four modules (Ch. 1, Tom Mitchell’s Machine
Learning book):
1. Experiment Generator: proposes new board states (games) to learn from.
2. Performance System: plays the game using the current learned target function.
3. Critic: produces training examples from the trace of each game played.
4. Generalizer: This module uses the training examples provided by the critic
to update/improve the target function by learning the desired weights using
the LMS weight update rule at each epoch.
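The LMS weight update the Generalizer performs can be sketched as below, assuming a linear target function over board features; the function and variable names are illustrative, not taken from the original project.

```python
def lms_update(weights, features, v_train, lr=0.1):
    # Current estimate: V_hat(b) = sum of w_i * x_i over board features.
    v_hat = sum(w * x for w, x in zip(weights, features))
    # LMS rule: w_i <- w_i + lr * (V_train(b) - V_hat(b)) * x_i
    return [w + lr * (v_train - v_hat) * x for w, x in zip(weights, features)]
```

Each call nudges the weights so that the estimate moves toward the training value for that board.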
1. JAVA
Java by Oracle is one of the best programming languages available out there.
Over the years, this language has adapted to the latest innovations and
technological advancements. The same is true for AI: using Java for AI
development can help you build scalable applications.
For AI development, Java offers ease of usage and debugging and simplifies
large-scale projects. You can represent the data in graphics and offer better user
interaction.
Java’s Virtual Machine Technology helps the developers build a single version
of an app that they can run on other Java-based platforms. Developers can also
work on the graphics and interfaces, making them more appealing with the
Standard Widget Toolkit. In all, Java helps you maintain, port, and make AI
applications secure.
2. PYTHON
Another one on the list is Python, a programming language that typically
requires the least code among the options here. Python is popular for AI
development for several reasons: a simple syntax, a vast ecosystem of
machine-learning libraries, and a large developer community.
With all these features and many others, Python has become one of the best
languages for AI development.
3. JAVASCRIPT
Just like Java, JavaScript is also an ideal match for AI development; it is
typically used to develop more secure and dynamic websites. While Python suits
developers who want to write as little code as possible, JavaScript is for
those who don’t mind writing more of it.
The AI capabilities of JavaScript help it interact and work smoothly with other
source codes like HTML and CSS. Like Java, JavaScript also has a large
community of developers that support the development process. With libraries
like jQuery, React.js, and Underscore.js, AI development becomes more
effective. From multimedia, buttons, to data storage, you can manage both
frontend and backend functions using JavaScript.
With JavaScript, you can ensure security, high performance, and less
development time.
4. JULIA
While Julia does not come with a large community or support, it offers many
high-end features for top-notch AI development. When it comes to handling
data analysis and numbers, Julia is the best development tool.
If you need to make a dynamic interface, catchy graphics, and data visuals, Julia
provides you with the right tools for perfect execution. With features like
debugging, memory management, and metaprogramming, this language makes
AI development a breeze.
For machine learning AI projects, Julia is a strong bet. It comes with many
packages like Metalhead.jl, MLJ.jl, Turing.jl, and Flux.jl.
5. LISP
Lisp is one of the oldest languages used for AI development. It was developed
in the late 1950s and has always been an adaptable and smart language. If your
project requires modification of code, problem-solving, rapid prototyping, or
dynamic development, Lisp is for you.
Some successful projects made with Lisp are Routific, Grammarly, and DART.
Though it has its drawbacks, Lisp is still a promising programming language for
AI development.
6. R
7. PROLOG
Prolog is short for Programming in Logic. The language was developed in 1972
in a rule-like form. It is majorly used for projects that involve computational
linguistics and artificial intelligence. For the projects that require a database,
natural language processing, and symbolic reasoning, Prolog is the best bet! It is
the perfect language support for research when it comes to artificial
intelligence.
Used for automated planning, theorem proving, expert and type systems, Prolog
still has limited usage. However, it is used to build some high-end NLP
applications and by giants like IBM Watson.
8. SCALA
Scala makes the coding process fast, easy, and much more productive. Scaladex,
an index of Scala libraries and resources, helps developers create quality
applications.
It runs on the Java Virtual Machine (JVM) environment and helps developers
program smart software. Scala is compatible with Java and JS and offers many
features like pattern matching, high-performing functions, browser tools, and
flexible interfaces. For AI development, Scala is one of the best options and it
has impressed the developers in that area.
9. RUST
10. HASKELL
As for modern technology, the most important reason why Python is always
ranked near the top is that there are AI-specific frameworks created for the
language. One of the most popular is TensorFlow, an open-source library created
specifically for machine learning that can be used for training and inference
of deep neural networks. Other AI-centric frameworks include PyTorch, Keras,
and scikit-learn.
Lisp
Lisp has been around since the 60s and has been widely used for scientific
research in the fields of natural languages, theorem proofs, and to solve artificial
intelligence problems. Lisp was originally created as a practical mathematical
notation for programs, but eventually became a top choice of developers in the
field of AI.
Even though Lisp is the second oldest programming language still in use, it
includes several features that are critical to successful AI projects:
Rapid prototyping.
Dynamic object creation.
Mandatory garbage collection.
Data structures can be executed as programs.
Programs can be modified as data.
Uses recursion as a control structure rather than iteration.
Great symbolic information processing capabilities.
Read-Eval-Print-Loop to ease interactive programming.
More importantly, the man who created Lisp (John McCarthy) was very
influential in the field of AI, so much of his work has long been implemented
in AI systems.
Java
It should go without saying that Java is an important language for AI. One
reason for that is how prevalent the language is in mobile app development.
And given how many mobile apps take advantage of AI, it’s a perfect match.
Not only can Java work with TensorFlow, but it also has other libraries and
frameworks specifically designed for AI:
Deep Java Library – a library built by Amazon to create deep learning abilities.
Kubeflow – makes it possible to deploy and manage Machine Learning stacks
on Kubernetes.
OpenNLP – a Machine Learning tool for processing natural language.
Java Machine Learning Library – provides several Machine Learning
algorithms.
Neuroph – makes it possible to design neural networks.
Java also offers simplified debugging, and its easy-to-use syntax offers
graphical data presentation and incorporates both WORA (“write once, run
anywhere”) and Object-Oriented patterns.
C++
C++ is another language that’s been around for quite some time, but still is a
legitimate contender for AI use. One of the reasons for this is how widely
flexible the language is, which makes it perfectly suited for resource-intensive
applications. C++ is a low-level language that provides better handling for the
AI model in production. And although C++ might not be the first choice for AI
engineers, it can’t be ignored that many of the deep and machine learning
libraries are written in C++.
And because C++ converts user code to machine-readable code, it’s incredibly
efficient and performant.
R
R might not be the perfect language for AI, but it’s fantastic at crunching very
large numbers, which makes it better than Python at scale. And with R’s built-in
functional programming, vectorial computation, and Object-Oriented Nature, it
does make for a viable language for Artificial Intelligence.
R also enjoys a few packages that are specifically designed for AI:
gmodels – provides several tools for the task of model fitting.
tm – a framework used for text mining applications.
RODBC – an ODBC interface.
OneR – makes it possible to implement the One Rule Machine Learning
classification algorithm.
Julia
Julia is one of the newer languages on the list and was created to focus on
performance computing in scientific and technical fields. Julia includes several
features that directly apply to AI programming:
https://nitsri.ac.in/Department/Computer%20Science%20&%20Engineering/
ProblemSolving(L-2).pdf
The set of all possible states for a given problem is known as the state space
of the problem.
To find a solution, one can use an explicit search tree that is generated by
the initial state and the successor function, which together define the state
space.
In general, we may have a search graph rather than a search tree, as the same
state can be reached via multiple paths.
State Space:
– The root of the search tree is a search node corresponding to the initial
state; on this state we first check whether the goal is reached.
– If the goal is not reached, we need to consider other states. This is done
by expanding the current state: applying the successor function generates new
states, possibly several of them.
– For each of these, we again apply the goal test, or else repeat the
expansion of each state.
– The choice of which state to expand is determined by the search strategy.
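The expand-and-test loop described above can be sketched as a generic tree search. This is a minimal illustration using a FIFO frontier (breadth-first strategy); the function names `successors` and `is_goal` are assumptions for the example.

```python
from collections import deque

def tree_search(initial_state, successors, is_goal):
    # The root of the search tree corresponds to the initial state.
    frontier = deque([initial_state])
    while frontier:
        state = frontier.popleft()     # search strategy: FIFO = breadth-first
        if is_goal(state):             # goal test on the selected node
            return state
        frontier.extend(successors(state))  # expand via the successor function
    return None
```

Swapping the deque for a stack or a priority queue changes the search strategy without touching the rest of the loop.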
Exhaustive Search
Constraint Satisfaction
1. Until a complete solution is found or all paths have led to dead ends, do:
(i) Select an unexpanded node of the search graph.
(ii) Apply the constraint inference rules to the selected node to generate all
possible new constraints.
(iii) If the set of constraints contains a contradiction then report that this path is
a dead-end.
(iv) If the set of constraints describes a complete solution then report success.
(v) If neither a contradiction nor a complete solution has been found then apply
the problem space rules to generate new partial solutions that are consistent with
the current set of constraints. Insert these partial solutions into the search graph.
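The five-step loop above can be sketched as follows. This is a schematic illustration with assumed callback names (`expand`, `contradicts`, `complete`), not a canonical constraint solver.

```python
def constraint_search(start, expand, contradicts, complete):
    # Step (i): select an unexpanded node (here: depth-first via a stack).
    stack = [start]
    while stack:
        node = stack.pop()
        if contradicts(node):      # step (iii): contradiction means dead end
            continue
        if complete(node):         # step (iv): complete solution means success
            return node
        # step (v): generate partial solutions consistent with the constraints
        stack.extend(expand(node))
    return None
```

A tiny usage: build binary tuples of length 2 whose elements sum to at most 1; the search prunes any partial tuple that violates that constraint.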
Means-End Analysis in Artificial Intelligence
Means-end-analysis is a special type of knowledge-rich search that allows both
backward and forward-searching.
Principle of Means-End Analysis:
“It allows us to solve parts of a problem first and then go back and solve
the smaller problems that arise while assembling the final solution”.
Technique: It is based on the use of the operations which transform the state of
the world. MEA works on three primary goals:
i. Transformation
ii. Reduction
iii. Application
Transformation: It means to transform object A into object B. It is an AND
graph which subdivides the problem into an intermediate problem and then
transforms that problem into the goal state B.
Note: This process terminates when there is no difference between A and B or
we can say when the goal is reached.
Reduction: It means to reduce the difference between object-A and object-B by
modifying object-A.
Note: The goal of the operation is to reduce the difference between the object-A
and object-B by transforming object-A into object-A’ nearer goal B. This is
called Relevant operator (R).
Application: It means to apply the operator R to object-A. This will again be an
AND graph showing the goal of reducing the difference between object-A and
the pre-conditions required for the operator R, giving intermediate object A”.
Operator R is then applied to A”, transforming it to A’, which is close to goal B.
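The transformation/reduction/application cycle can be sketched as a loop that repeatedly applies an operator relevant to the current difference. All names here (`differs`, the operator dictionaries) are assumptions for illustration, not a standard MEA implementation.

```python
def means_end_analysis(current, goal, operators, differs):
    # While A and B differ, pick a relevant operator R that reduces the gap.
    while differs(current, goal):
        relevant = [op for op in operators if op["relevant"](current, goal)]
        if not relevant:
            return None                      # no operator reduces the difference
        current = relevant[0]["apply"](current)  # apply R: A -> A'
    return current                           # terminates when A matches B
```

As a toy usage, take integer states with increment/decrement operators: the loop steps the current value toward the goal until the difference vanishes.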
Game Playing in AI
1. Introduction
2. What is Alpha Beta pruning?
3. Condition for Alpha-beta pruning
4. Minimax algorithm
5. Key points in Alpha-beta Pruning
6. Working of Alpha-beta Pruning
7. Move Ordering in Pruning
8. Rules to find Good ordering
9. Codes in Python
Introduction
The word ‘pruning’ means cutting down branches and leaves. In data science,
pruning is a widely used term that refers to post- and pre-pruning in decision
trees and random forests. Alpha-beta pruning is nothing but the pruning of
useless branches in the game’s decision tree. The alpha-beta pruning algorithm
was discovered independently by several researchers in the mid-twentieth
century.
Alpha: At any point along the Maximizer path, alpha is the best
option or the highest value we’ve discovered. The initial value for
alpha is −∞.
Beta: At any point along the Minimizer path, beta is the best option
or the lowest value we’ve discovered. The initial value for beta is
+∞.
The condition for Alpha-beta Pruning is that α >= β.
The alpha and beta values of each node must be kept track of. Alpha
can be updated only when it is MAX’s turn, and beta can be updated
only when it is MIN’s turn.
MAX will update only alpha values and the MIN player will update
only beta values.
While backtracking up the tree, node values (not alpha and beta
values) are passed to parent nodes.
Alpha and beta values are passed only to child nodes.
Minimax algorithm
Here is an example. Four coins are in a row and each player can pick up one
coin or two coins on his/her turn. The player who picks up the last coin wins.
Assuming that Max plays first, what move should Max make to win?
If Max picks two coins, then only two coins remain and Min can pick two
coins and win. Thus picking up 1 coin shall maximise Max’s reward.
As you might have noticed, the nodes of the tree in the figure below have
values inscribed on them; these are called minimax values. The minimax
value of a node is the utility of the node if it is a terminal node.
If the node is a non-terminal Max node, the minimax value of the node is the
maximum of the minimax values of all of the node’s successors. On the other
hand, if the node is a non-terminal Min node, the minimax value of the node
is the minimum of the minimax values of all of the node’s successors.
Now we will discuss the idea behind alpha-beta pruning. If we apply
alpha-beta pruning to the standard minimax algorithm, it returns the same
decision as the standard algorithm, but it prunes (cuts off) the nodes
that do not affect the final decision made by the algorithm. This helps
to avoid the complexity of interpreting large trees.
Now let us discuss the intuition behind this technique. Let us try to find
the minimax decision in the tree below:
In this case,
Minimax Decision = MAX {MIN {3, 5, 10}, MIN {2, a, b}, MIN {2, 7, 3}}
= MAX {3, c, 2} = 3
In the above result you may wonder how we can find the maximum when a value
is missing. Here is the resolution: in the second node we choose the minimum
value c, which satisfies c ≤ 2. Since c ≤ 2 < 3, the maximum of {3, c, 2}
is 3 regardless of the actual value of c.
1. We will first start with the initial move. We will initially define the
alpha and beta values as the worst case i.e. α = -∞ and β= +∞. We
will prune the node only when alpha becomes greater than or equal
to beta.
2. Since the initial value of alpha is less than beta, we don’t prune. Now
it is MAX’s turn. So, at node D, the value of alpha will be calculated:
max(2, 3) = 3. So the value of alpha at node D will be 3.
3. Now the next move will be on node B and it is MIN’s turn. So, at
node B, the value of beta will be min(∞, 3) = 3. So, at node B, the values
will be alpha = −∞ and beta = 3.
In the next step, algorithms traverse the next successor of Node B which is
node E, and the values of α= -∞, and β= 3 will also be passed.
4. Now it is MAX’s turn. So, at node E we look for the maximum. The current
value of alpha at E is −∞ and it is compared with 5, so max(−∞, 5) = 5.
So, at node E, alpha = 5 and beta = 3 (passed down from B). Now alpha is
greater than beta, which satisfies the pruning condition, so we can prune
the right successor of node E; the algorithm will not traverse it, and the
value at node E will be 5.
6. In the next step the algorithm again comes to node A from node B. At node
A alpha will be changed to maximum value as MAX (- ∞, 3). So now the
value of alpha and beta at node A will be (3, + ∞) respectively and will be
transferred to node C. These same values will be transferred to node F.
7. At node F the value of alpha is compared with the left child, which is
0: max(3, 0) = 3. It is then compared with the right child, which is 1:
max(3, 1) = 3, so α remains 3, but the node value of F becomes 1.
8. Now node F returns the node value 1 to C, where it is compared with the
beta value. It is MIN’s turn: min(+∞, 1) = 1. Now at node C, α = 3 and
β = 1, and alpha is greater than beta, which again satisfies the pruning
condition. So the next successor of node C, i.e. G, is pruned, and the
algorithm does not compute the entire subtree under G.
Now, C will return the node value to A and the best value of A will be MAX
(1, 3) will be 3.
The above represented tree is the final tree which is showing the nodes which
are computed and the nodes which are not computed. So, for this example the
optimal value of the maximizer will be 3.
The effectiveness of alpha-beta pruning is based on the order in which nodes
are examined. Move ordering plays an important role in alpha-beta pruning.
Codes in Python
import time
from random import shuffle

# AIElements is the game-specific helper module assumed by this agent;
# it supplies get_possible_action, result_function, and evaluation_function.

class MinimaxABAgent:
    """
    Minimax agent with alpha-beta pruning
    """
    def __init__(self, max_depth, player_color):
        """
        Parameters
        ----------
        max_depth : int
            The max depth of the tree
        player_color : int
            The player's index as MAX in the minimax algorithm
        """
        self.max_depth = max_depth
        self.player_color = player_color
        self.node_expanded = 0

    def choose_action(self, state):
        """
        Predict the move using the minimax algorithm
        Parameters
        ----------
        state : State
        Returns
        -------
        float, str:
            The evaluation or utility and the action key name
        """
        self.node_expanded = 0

        start_time = time.time()

        print("MINIMAX AB : Wait AI is choosing")
        list_action = AIElements.get_possible_action(state)
        eval_score, selected_key_action = self._minimax(
            0, state, True, float('-inf'), float('inf'))
        print("MINIMAX : Done, eval = %d, expanded %d"
              % (eval_score, self.node_expanded))
        print("--- %s seconds ---" % (time.time() - start_time))

        return (selected_key_action, list_action[selected_key_action])

    def _minimax(self, current_depth, state, is_max_turn, alpha, beta):

        if current_depth == self.max_depth or state.is_terminal():
            return AIElements.evaluation_function(state, self.player_color), ""

        self.node_expanded += 1

        possible_action = AIElements.get_possible_action(state)
        key_of_actions = list(possible_action.keys())

        shuffle(key_of_actions)  # randomness to break ties between equal moves
        best_value = float('-inf') if is_max_turn else float('inf')
        action_target = ""
        for action_key in key_of_actions:
            new_state = AIElements.result_function(state, possible_action[action_key])

            eval_child, action_child = self._minimax(
                current_depth + 1, new_state, not is_max_turn, alpha, beta)

            if is_max_turn and best_value < eval_child:
                best_value = eval_child
                action_target = action_key
                alpha = max(alpha, best_value)
                if beta <= alpha:
                    break

            elif (not is_max_turn) and best_value > eval_child:
                best_value = eval_child
                action_target = action_key
                beta = min(beta, best_value)
                if beta <= alpha:
                    break

        return best_value, action_target
In this document we have seen an important component of game theory. Although
the minimax algorithm’s results are good, the algorithm is slow. To make it
fast we use the alpha-beta pruning algorithm, which cuts the irrelevant nodes
from the game tree to improve performance. Nowadays this fast,
well-performing algorithm is widely used.
Unit III
Assignment: 6 problems each on the following topics:
Propositional calculus
Propositional logic
Predicate logic
Unit III
Knowledge Representation
Unit IV
Probability Theory
There are some basic terminologies associated with probability theory that aid
in the understanding of this field of mathematics.
Random Experiment
Sample Space
Sample space can be defined as the set of all possible outcomes that result from
conducting a random experiment. For example, the sample space of tossing a
fair coin is {heads, tails}.
Event
Independent events: Events that are not affected by other events are
independent events.
Dependent events: Events that are affected by other events are known as
dependent events.
Mutually exclusive events: Events that cannot take place at the same time
are mutually exclusive events.
Equally likely events: Two or more events that have the same chance of
occurring are known as equally likely events.
Exhaustive events: An exhaustive event is one that is equal to the sample
space of an experiment.
Random Variable
Probability
Conditional Probability
Expectation
Variance
Variance is the measure of dispersion that shows how the distribution of a
random variable varies with respect to the mean. It can be defined as the
average of the squared differences from the mean of the random variable.
Variance can be denoted as Var[X].
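The definition above can be checked with a short sketch; this computes the population variance of a list of equally likely values, and verifies the standard identity Var[X] = E[X²] − (E[X])².

```python
def variance(values):
    # Var[X] = E[(X - E[X])^2]: the mean of squared deviations from the mean.
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)
```

For example, variance([2, 4, 6]) gives (4 + 0 + 4) / 3 = 8/3.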
There are many formulas in probability theory that help in calculating the
various probabilities associated with events. The most important probability
theory formulas are listed below.
Probability theory is used in every field to assess the risk associated with a
particular decision. Some of the important applications of probability theory are
listed below:
This article is all about the certainty factor, which tells us how likely a
situation is to be true. Through this, we can determine an estimate of the
amount of certainty or uncertainty in our decisions. In this article, we are
going to study what the certainty factor is and how it is determined for a
particular statement.
Submitted by Monika Sharma, on June 10, 2019
As we all know, when analyzing a situation and drawing conclusions about it in
the real world, we cannot be a hundred percent sure about those conclusions;
there is always some uncertainty. We as human beings have the capability of
deciding whether a statement is true or false according to how certain we are
about our observations. But machines do not have this analyzing power. So,
there needs to be some method to quantify this estimate of certainty or
uncertainty in any decision made. To implement this, the certainty factor was
introduced for systems which work on Artificial Intelligence.
The Certainty Factor (CF) is a numeric value which tells us how likely an
event or a statement is to be true. It is somewhat similar to probability,
but the difference is that an agent cannot decide what to do merely from the
probability of an event. Based on the probability and other knowledge that
the agent has, the certainty factor is decided, through which the agent can
decide whether to declare the statement true or false.
The value of the Certainty factor lies between -1.0 to +1.0, where the negative
1.0 value suggests that the statement can never be true in any situation, and the
positive 1.0 value defines that the statement can never be false. The value of
the Certainty factor after analyzing any situation will either be a positive or a
negative value lying between this range. The value 0 suggests that the agent has
no information about the event or the situation.
A minimum certainty factor is decided for every case, through which the agent
decides whether the statement is true or false. This minimum certainty
factor is also known as the threshold value. For example, if the
minimum certainty factor (threshold value) is 0.4 and the value of CF is
less than this value, then the agent claims that the statement is false.
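The threshold rule described above can be sketched in a few lines; the function name is an assumption made for this illustration.

```python
def agent_claims_true(cf, threshold=0.4):
    # CF lies in [-1.0, +1.0]; the statement is claimed true only when
    # CF reaches the minimum certainty factor (the threshold value).
    assert -1.0 <= cf <= 1.0
    return cf >= threshold
```

With the example threshold of 0.4, a CF of 0.6 is accepted while 0.3 is not.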
Dempster-Shafer Theory was given by Arthur P. Dempster in 1967 and extended
by his student Glenn Shafer in 1976.
This theory was introduced to handle uncertainty more flexibly than classical
probability, in particular to represent ignorance when exact probabilities of
individual events are not known.
Fuzzy Logic
Fuzzy logic has been used in numerous applications such as facial pattern
recognition, air conditioners, washing machines, vacuum cleaners, antiskid
braking systems, transmission systems, control of subway systems and
unmanned helicopters, knowledge-based systems for multiobjective
optimization of power systems
Development
1. Step 1 − Define linguistic variables and terms. Linguistic variables are input and
output variables in the form of simple words or sentences.
2. Step 2 − Construct membership functions for them.
3. Step 3 − Construct knowledge base rules.
4. Step 4 − Obtain fuzzy values.
5. Step 5 − Perform defuzzification.
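Steps 2, 4, and 5 above can be sketched with a triangular membership function and a simple weighted-average defuzzification. The temperature/fan-speed example, the term ranges, and the output speeds are all assumptions made for illustration.

```python
def triangular(x, a, b, c):
    # Membership rises linearly from a to the peak b, then falls to c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temp):
    # Step 4: fuzzify the input against two linguistic terms.
    mu_cold = triangular(temp, 0, 10, 20)
    mu_warm = triangular(temp, 15, 25, 35)
    # Step 5: defuzzify by a weighted average of each rule's output speed
    # (rule "cold -> speed 20", rule "warm -> speed 60").
    total = mu_cold + mu_warm
    return (mu_cold * 20 + mu_warm * 60) / total if total else 0.0
```

At 22 degrees only the "warm" term fires (membership 0.7), so the crisp output is 60; at 17.5 degrees both terms fire equally and the output averages to 40.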
What are the characteristics of fuzzy logic?
Characteristics of Fuzzy Logic
Fuzzy Sets
https://www.tutorialspoint.com/fuzzy_logic/fuzzy_logic_set_theory.htm
What is Fuzzy Logic?
In other words, we can say that fuzzy logic is not logic that is fuzzy, but logic
that is used to describe fuzziness. There can be numerous other examples like
this with the help of which we can understand the concept of fuzzy logic.
Fuzzy Logic was introduced in 1965 by Lotfi A. Zadeh in his research paper
“Fuzzy Sets”. He is considered the father of fuzzy logic.
Fuzzy Logic - Classical Set Theory
A set is an unordered collection of different elements. It can be written
explicitly by listing its elements using the set bracket. If the order of the
elements is changed or any element of a set is repeated, it does not make any
changes in the set.
Example
A set of all positive integers.
A set of all the planets in the solar system.
A set of all the states in India.
A set of all the lowercase letters of the alphabet.
Types of Sets
Sets can be classified into many types; some of which are finite, infinite, subset,
universal, proper, singleton set, etc.
Finite Set
A set which contains a definite number of elements is called a finite set.
Example − S = {x|x ∈ N and 70 > x > 50}
Infinite Set
A set which contains infinite number of elements is called an infinite set.
Example − S = {x|x ∈ N and x > 10}
Subset
A set X is a subset of set Y (Written as X ⊆ Y) if every element of X is an
element of set Y.
Example 1 − Let X = {1,2,3,4,5,6} and Y = {1,2}. Here set Y is a subset of set
X as all the elements of set Y are in set X. Hence, we can write Y ⊆ X.
Example 2 − Let X = {1,2,3} and Y = {1,2,3}. Here set Y is a subset (but not a
proper subset) of set X as all the elements of set Y are in set X. Hence, we
can write Y ⊆ X.
Proper Subset
The term “proper subset” can be defined as “subset of but not equal to”. A Set
X is a proper subset of set Y (Written as X ⊂ Y) if every element of X is an
element of set Y and |X| < |Y|.
Example − Let, X = {1,2,3,4,5,6} and Y = {1,2}. Here set Y ⊂ X, since all
elements in Y are contained in X too and X has at least one element which is
more than set Y.
Universal Set
It is a collection of all elements in a particular context or application. All the
sets in that context or application are essentially subsets of this universal set.
Universal sets are represented as U.
Example − We may define U as the set of all animals on earth. In this case, a
set of all mammals is a subset of U, a set of all fishes is a subset of U, a set of
all insects is a subset of U, and so on.
Empty Set or Null Set
An empty set contains no elements. It is denoted by Φ. As the number of
elements in an empty set is finite, empty set is a finite set. The cardinality of
empty set or null set is zero.
Example – S = {x|x ∈ N and 7 < x < 8} = Φ
Singleton Set or Unit Set
A Singleton set or Unit set contains only one element. A singleton set is denoted
by {s}.
Example − S = {x|x ∈ N, 7 < x < 9} = {8}
Equal Set
If two sets contain the same elements, they are said to be equal.
Example − If A = {1,2,6} and B = {6,1,2}, they are equal as every element of
set A is an element of set B and every element of set B is an element of set A.
Equivalent Set
If the cardinalities of two sets are same, they are called equivalent sets.
Example − If A = {1,2,6} and B = {16,17,22}, they are equivalent as
cardinality of A is equal to the cardinality of B. i.e. |A| = |B| = 3
Overlapping Set
Two sets that have at least one common element are called overlapping sets. In
case of overlapping sets −
n(A∪B) = n(A) + n(B) − n(A∩B)
n(A∪B) = n(A−B) + n(B−A) + n(A∩B)
n(A) = n(A−B) + n(A∩B)
n(B) = n(B−A) + n(A∩B)
Example − Let, A = {1,2,6} and B = {6,12,42}. There is a common element
‘6’, hence these sets are overlapping sets.
Disjoint Set
Two sets A and B are called disjoint sets if they do not have even one element
in common. Therefore, disjoint sets have the following properties −
A∩B = ϕ, so n(A∩B) = 0
n(A∪B) = n(A) + n(B)
Example − Let A = {1,2,6} and B = {7,9,14}. There is not a single common
element, hence these sets are disjoint sets.
Set Operations include Set Union, Set Intersection, Set Difference, Complement
of Set, and Cartesian Product.
Union
The union of sets A and B (denoted by A ∪ B) is the set of elements which are
in A, in B, or in both A and B. Hence, A ∪ B = {x | x ∈ A OR x ∈ B}.
Example − If A = {10,11,12,13} and B = {13,14,15}, then A ∪ B =
{10,11,12,13,14,15} – The common element occurs only once.
Intersection
The intersection of sets A and B (denoted by A ∩ B) is the set of elements
which are in both A and B. Hence, A ∩ B = {x|x ∈ A AND x ∈ B}.
Complement of a Set
The complement of a set A (denoted by A′) is the set of elements which are not
in set A. Hence, A′ = {x|x ∉ A}.
More specifically, A′ = (U−A) where U is a universal set which contains all
objects.
Example − If A = {x | x belongs to the set of odd integers} then A′ = {y | y does
not belong to the set of odd integers}
Cartesian Product / Cross Product
The Cartesian product of n sets A1, A2, … An, denoted as A1 × A2 × … × An,
can be defined as the set of all possible ordered n-tuples (x1, x2, … xn) where
x1 ∈ A1, x2 ∈ A2, … xn ∈ An.
Example − If we take two sets A = {a,b} and B = {1,2},
The Cartesian product of A and B is written as − A × B = {(a,1),(a,2),(b,1),
(b,2)}
And, the Cartesian product of B and A is written as − B × A = {(1,a),(1,b),(2,a),
(2,b)}
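These operations can be checked directly in Python, whose built-in set type supports union, intersection, and difference, and whose itertools.product gives the Cartesian product. A minimal sketch using the example sets from the text (the universal set U here is an illustrative choice):

```python
from itertools import product

A = {10, 11, 12, 13}
B = {13, 14, 15}

print(A | B)   # union: the common element 13 occurs only once
print(A & B)   # intersection: {13}
print(A - B)   # difference A − B: {10, 11, 12}

# Complement is taken relative to a universal set U (illustrative choice)
U = set(range(10, 20))
print(U - A)   # A′ = U − A

# Cartesian product of the second example from the text
A2, B2 = {'a', 'b'}, {1, 2}
print(set(product(A2, B2)))   # {('a', 1), ('a', 2), ('b', 1), ('b', 2)}

# Inclusion-exclusion from the overlapping-sets section:
# n(A ∪ B) = n(A) + n(B) − n(A ∩ B)
assert len(A | B) == len(A) + len(B) - len(A & B)
```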
Properties on sets play an important role for obtaining the solution. Following
are the different properties of classical sets −
Commutative Property
Having two sets A and B, this property states −
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Associative Property
Having three sets A, B and C, this property states −
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
Distributive Property
Having three sets A, B and C, this property states −
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Idempotency Property
For any set A, this property states −
A ∪ A = A
A ∩ A = A
Identity Property
For set A and universal set X, this property states −
A ∪ Φ = A
A ∩ X = A
A ∩ Φ = Φ
A ∪ X = X
Transitive Property
Having three sets A, B and C, the property states −
If A ⊆ B and B ⊆ C, then A ⊆ C
Involution Property
For any set A, this property states −
(A′)′ = A
De Morgan’s Law
It is a very important law and helps in proving tautologies and contradictions.
This law states −
(A ∩ B)′ = A′ ∪ B′
(A ∪ B)′ = A′ ∩ B′
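De Morgan's laws can be verified by brute force over any concrete universal set. A small sketch (the sets U, A, and B here are illustrative choices, not taken from the text):

```python
U = set(range(10))   # illustrative universal set
A = {1, 2, 6}
B = {2, 3, 7}

def complement(S, universe=U):
    """Complement of S relative to the universal set."""
    return universe - S

# (A ∩ B)′ = A′ ∪ B′
assert complement(A & B) == complement(A) | complement(B)
# (A ∪ B)′ = A′ ∩ B′
assert complement(A | B) == complement(A) & complement(B)
print("De Morgan's laws hold for these sets")
```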
Fuzzy Sets
Fuzzy sets can be considered as an extension and generalization of classical
sets. They are best understood in the context of set membership. Basically, a
fuzzy set allows partial membership, which means it contains elements that
have varying degrees of membership in the set. From this, we can understand
the difference between a classical set and a fuzzy set: a classical set contains
elements that satisfy precise properties of membership, while a fuzzy set
contains elements that satisfy imprecise properties of membership.
Mathematical Concept
Given two fuzzy sets Ã and B̃, the universe of information U and an element y
of the universe, the following relations express the union, intersection and
complement operations on fuzzy sets.
Union/Fuzzy ‘OR’
Let us consider the following representation to understand how
the Union/Fuzzy ‘OR’ relation works −
μÃ∪B̃(y) = μÃ(y) ∨ μB̃(y), ∀ y ∈ U
Here ∨ represents the ‘max’ operation.
Intersection/Fuzzy ‘AND’
Let us consider the following representation to understand how
the Intersection/Fuzzy ‘AND’ relation works −
μÃ∩B̃(y) = μÃ(y) ∧ μB̃(y), ∀ y ∈ U
Here ∧ represents the ‘min’ operation.
Complement/Fuzzy ‘NOT’
Let us consider the following representation to understand how
the Complement/Fuzzy ‘NOT’ relation works −
μÃ′(y) = 1 − μÃ(y), y ∈ U
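The three fuzzy relations above translate directly to Python if each membership function is stored as a dictionary mapping elements of U to membership grades. The universe and membership values below are illustrative:

```python
# Illustrative universe and fuzzy sets (membership grades in [0, 1])
U = ['x1', 'x2', 'x3']
A = {'x1': 0.5, 'x2': 0.75, 'x3': 0.0}    # fuzzy set Ã
B = {'x1': 0.75, 'x2': 0.25, 'x3': 1.0}   # fuzzy set B̃

union        = {y: max(A[y], B[y]) for y in U}   # fuzzy OR is the 'max' operation
intersection = {y: min(A[y], B[y]) for y in U}   # fuzzy AND is the 'min' operation
complement_A = {y: 1 - A[y] for y in U}          # fuzzy NOT: 1 − μÃ(y)

print(union)          # {'x1': 0.75, 'x2': 0.75, 'x3': 1.0}
print(intersection)   # {'x1': 0.5, 'x2': 0.25, 'x3': 0.0}
print(complement_A)   # {'x1': 0.5, 'x2': 0.25, 'x3': 1.0}
```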
We already know that fuzzy logic is not logic that is fuzzy but logic that is used
to describe fuzziness. This fuzziness is best characterized by its membership
function. In other words, we can say that membership function represents the
degree of truth in fuzzy logic.
Mathematical Notation
We have already studied that a fuzzy set à in the universe of information U can
be defined as a set of ordered pairs and it can be represented mathematically as
−
Ã = {(y, μÃ(y)) | y ∈ U}
Here μÃ(∙) is the membership function of Ã; it assumes values in the range
from 0 to 1, i.e., μÃ(∙) ∈ [0,1]. The membership function μÃ(∙) maps U to the
membership space M.
The dot (∙) in the membership function described above represents the element
in a fuzzy set, whether it is discrete or continuous.
Features of Membership Functions
Fuzzification
Defuzzification
It may be defined as the process of reducing a fuzzy set to a crisp set or of
converting a fuzzy member into a crisp member.
We have already studied that the fuzzification process involves conversion from
crisp quantities to fuzzy quantities. In a number of engineering applications, it is
necessary to defuzzify the result, or rather the “fuzzy result”, so that it can be
converted to a crisp result. Mathematically, the process of defuzzification is also
called “rounding off”.
The different methods of Defuzzification are described below −
Max-Membership Method
This method is limited to peaked output functions and is also known as the
height method. Mathematically it can be represented as follows −
μÃ(x∗) ≥ μÃ(x) for all x ∈ X
Here, x∗ is the defuzzified output.
Centroid Method
This method is also known as the center of area or the center of gravity method.
Mathematically, the defuzzified output x∗x∗ will be represented as −
x∗ = ∫ μÃ(x)·x dx / ∫ μÃ(x) dx
Weighted Average Method
In this method, each membership function is weighted by its maximum
membership value. Mathematically, the defuzzified output x∗x∗ will be
represented as −
x∗ = Σ μÃ(x̄ᵢ)·x̄ᵢ / Σ μÃ(x̄ᵢ)
Mean-Max Membership
This method is also known as the middle of the maxima. Mathematically, the
defuzzified output x∗x∗ will be represented as −
x∗ = (Σᵢ₌₁ⁿ x̄ᵢ) / n
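On a discretized universe, the centroid formula reduces to simple sums and the max-membership method to a lookup. A minimal sketch with an illustrative sampled membership function:

```python
# Discretized universe and a sampled membership function (illustrative values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
mu = [0.0, 0.5, 1.0, 0.5, 0.0]   # peaks at x = 2

# Centroid (center of gravity): x* = Σ μ(x)·x / Σ μ(x)
centroid = sum(m * x for m, x in zip(mu, xs)) / sum(mu)
print(centroid)   # 2.0, the membership is symmetric so the centroid is at the peak

# Max-membership (height) method: x* is the point of maximum membership
x_star = xs[mu.index(max(mu))]
print(x_star)     # 2.0
```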
UNIT V
Machine Learning
Machine learning is a field in which a computer/machine learns from past
experiences (input data) and makes future predictions.
Supervised Learning:
In supervised learning the machine experiences the examples along with the
labels or targets for each example. The labels in the data help the algorithm to
correlate the features.
Unsupervised Learning:
When we have unclassified and unlabeled data, the system attempts to uncover
patterns from the data. There is no label or target given for the examples. One
common task is to group similar examples together, called clustering.
Reinforcement Learning:
Linear Regression
Now to calculate the performance of the model, we first calculate the error of
each example i as:
We take the absolute value of the error to take into account both positive and
negative values of the error.
Finally we calculate the mean for all recorded absolute errors (Average sum of
all absolute errors).
The main aim of training the ML algorithm is to adjust the weights W to reduce
the error. To minimize the error, the model, while experiencing the examples of
the training set, updates the model parameters W. These error calculations,
when plotted against W, are also called the cost function J(w), since they
determine the cost/penalty of the model. So minimizing the error is also called
minimizing the cost function J.
As we see from the curve, there exists a value of parameters W which has the
minimum cost Jmin. Now we need to find a way to reach this minimum cost.
In the gradient descent algorithm, we start with random model parameters,
calculate the error for each learning iteration, and keep updating the model
parameters to move closer to the values that result in minimum cost.
In the above equation we are updating the model parameters after each iteration.
The second term of the equation calculates the slope or gradient of the curve at
each iteration.
Batch gradient descent: Uses all of the training instances to update the model
parameters in each iteration.
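Batch gradient descent for a single-weight linear model can be sketched in plain Python. The synthetic data, learning rate, and iteration count below are illustrative choices:

```python
# Synthetic training data following y = 2x, so the ideal weight is w = 2
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]

w, lr = 0.0, 0.05          # initial weight and learning rate (illustrative)
for _ in range(200):       # each iteration uses ALL training instances (batch)
    # Gradient of the mean squared error J(w) = (1/n) Σ (w·x − y)²
    grad = sum(2 * (w * x - y) * x for x, y in zip(X, Y)) / len(X)
    w -= lr * grad         # step opposite the gradient, toward minimum cost

print(round(w, 3))         # converges to 2.0
```

Each update moves w a step down the slope of the cost curve, which is exactly the behaviour described in the paragraphs above.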
Logistic Regression
In some problems the response variable is not normally distributed. For instance,
a coin toss can result in two outcomes: heads or tails. The Bernoulli distribution
describes the probability distribution of a random variable that can take the
positive case with probability P or the negative case with probability 1 − P. If
the response variable represents a probability, it must be constrained to the
range [0, 1].
In logistic regression, the response variable describes the probability that the
outcome is the positive case. If the response variable is equal to or exceeds a
discrimination threshold, the positive class is predicted; otherwise, the negative
class is predicted.
Now coming back to our logistic regression problem, Let us assume that z is a
linear function of a single explanatory variable x. We can then express z as
follows:
The input to the sigmoid function ‘g’ doesn’t need to be a linear function. It can
very well be a circle or any other shape.
Cost Function
We cannot use the same cost function that we used for linear regression because
the Sigmoid Function will cause the output to be wavy, causing many local
optima. In other words, it will not be a convex function.
In order to ensure the cost function is convex (and therefore ensure convergence
to the global minimum), the cost function is transformed using the logarithm of
the sigmoid function. The cost function for logistic regression looks like:
Which can be written as:
Since the cost function is a convex function, we can run the gradient descent
algorithm to find the minimum cost.
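A minimal sketch of the sigmoid function and the convex log-loss cost described above, in pure Python; the sample inputs and labels are illustrative:

```python
import math

def sigmoid(z):
    """Maps any real-valued z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, y_prob):
    """Logistic-regression cost: −[y·log(p) + (1−y)·log(1−p)], averaged."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / len(y_true)

print(sigmoid(0))   # 0.5, the value at the discrimination threshold point
probs = [sigmoid(z) for z in [-2.0, 3.0]]       # illustrative values of z
preds = [1 if p >= 0.5 else 0 for p in probs]   # threshold at 0.5
print(preds)        # [0, 1]
# The cost of the correct labels is lower than the cost of the swapped labels:
print(log_loss([0, 1], probs) < log_loss([1, 0], probs))   # True
```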
To increase model capacity, we add another feature by adding the term x² to it.
This produces a better fit (middle figure). But if we keep on doing so (x⁵, a 5th-
order polynomial, figure on the right side), we may be able to better fit the data
but will not generalize well for new data. The first figure represents under-fitting
and the last figure represents over-fitting.
Under-fitting:
When the model has fewer features and hence is not able to learn from the data
very well. Such a model has high bias.
Over-fitting:
When the model has complex functions and hence is able to fit the data very
well but is not able to generalize to predict new data. Such a model has high
variance.
2. Regularization: Keep all the features, but reduce the magnitude of the
weights W. Regularization works well when we have a lot of slightly useful
features.
Regularization
Regularization can be applied to both linear and logistic regression by adding a
penalty term to the error function in order to discourage the coefficients or
weights from reaching large values.
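The penalty term can be sketched by adding an L2 (squared-weight) penalty to the error; the lam hyper-parameter and sample values below are illustrative:

```python
def regularized_cost(errors, weights, lam=0.1):
    """Mean squared error plus an L2 penalty that discourages large weights."""
    mse = sum(e ** 2 for e in errors) / len(errors)
    penalty = lam * sum(w ** 2 for w in weights)   # grows with weight magnitude
    return mse + penalty

# Same prediction errors, but larger weights, give a larger regularized cost:
small_w = regularized_cost([0.1, -0.2], [0.5, 0.5])
large_w = regularized_cost([0.1, -0.2], [5.0, 5.0])
print(small_w < large_w)   # True
```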
Hyper-parameters
Cross-Validation
The training set is used to fit the different models, and the performance on the
validation set is then used for the model selection. The advantage of keeping a
test set that the model hasn’t seen before during the training and model selection
steps is that we avoid over-fitting the model and the model is able to better
generalize to unseen data.
In many applications, however, the supply of data for training and testing will be
limited, and in order to build good models, we wish to use as much of the
available data as possible for training. However, if the validation set is small, it
will give a relatively noisy estimate of predictive performance. One solution to
this dilemma is to use cross-validation, which is illustrated in Figure below.
The following cross-validation steps are included here for completeness.
Cross-Validation Step-by-Step:
These are the steps for selecting hyper-parameters using K-fold cross-validation:
3. Train your model with that set of hyper-parameters on the first 3 folds.
7. Repeat steps (2) to (6) for all sets of hyper-parameters you wish to consider.
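The fold bookkeeping in these steps can be sketched without any library: the sample indices are split into K contiguous folds, and each fold serves once as the validation set (an illustrative sketch, with no shuffling):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for K-fold cross-validation."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]     # the held-out fold
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

# 8 samples, 4 folds: each fold of 2 samples is validated on exactly once
for train, val in k_fold_indices(8, 4):
    print(val, train)
# first split: val = [0, 1], train = [2, 3, 4, 5, 6, 7]
```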
Conclusion
We’ve covered some of the key concepts in the field of Machine Learning,
starting with the definition of machine learning and then covering different types
of machine learning techniques. We discussed the theory behind the most
common regression techniques (Linear and Logistic) alongside discussed other
key concepts of machine learning.
Example: Let's understand the clustering technique with the real-world example
of Mall: When we visit any shopping mall, we can observe that the things with
similar usage are grouped together. Such as the t-shirts are grouped in one
section, and trousers are at other sections, similarly, at vegetable sections,
apples, bananas, Mangoes, etc., are grouped in separate sections, so that we can
easily find out the things. The clustering technique also works in the same way.
Other examples of clustering are grouping documents according to the topic.
The clustering technique can be widely used in various tasks. Some most
common uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
The below diagram explains the working of the clustering algorithm. We can
see the different fruits are divided into several groups with similar properties.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (each data
point belongs to only one group) and Soft clustering (a data point can belong to
more than one group). But various other approaches to clustering also exist.
Below are the main clustering methods used in Machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
In this type, the dataset is divided into a set of k groups, where K is used to
define the number of pre-defined groups. The cluster center is created in such a
way that the distance between the data points of one cluster is minimum as
compared to another cluster centroid.
Density-Based Clustering
These algorithms can face difficulty in clustering the data points if the dataset
has varying densities and high dimensions.
Distribution Model-Based Clustering
Hierarchical Clustering
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to
more than one group or cluster. Each dataset has a set of membership
coefficients, which depend on the degree of membership to be in a
cluster. Fuzzy C-means algorithm is the example of this type of clustering; it
is sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms
The clustering algorithms can be divided based on the models explained above.
Many types of clustering algorithms have been published, but only a few are
commonly used. The choice of clustering algorithm depends on the kind of data
that we are using. For example, some algorithms need to guess the number of
clusters in the given dataset, whereas others need to find the minimum distance
between the observations of the dataset.
Here we are discussing mainly popular Clustering algorithms that are widely
used in machine learning:
Applications of Clustering
Both approaches are used in various types of research, and it’s not uncommon
to combine them in one large study.
1. Observation
o A low-cost airline flight is delayed
o Dogs A and B have fleas
o Elephants depend on water to exist
2. Observe a pattern
o Another 20 flights from low-cost airlines are delayed
o All observed dogs have fleas
o All observed animals depend on water to exist
3. Develop a theory or general (preliminary) conclusion
o Low-cost airlines always have delays
o All dogs have fleas
o All biological life depends on water to exist
Example
You observe 1000 flights from low-cost airlines. All of them experience a
delay, which is in line with your theory. However, you can never prove that
flight 1001 will also be delayed. Still, the larger your dataset, the more reliable
the conclusion.
Example
Based on the premises we have, the conclusion must be true. However, if the
first premise turns out to be false, the conclusion that Benno has fleas cannot be
relied upon.
In the examples above, the conclusion (theory) of the inductive study is also
used as a starting point for the deductive study.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put the
new data point in the correct category in the future. This best decision boundary
is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. Consider the below diagram in which there
are two different categories that are classified using a decision boundary or
hyperplane:
Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs; if we want a model that can accurately identify whether it is a cat or a
dog, such a model can be created by using the SVM algorithm. We will first
train our model with lots of images of cats and dogs so that it can learn about
the different features of cats and dogs, and then we test it with this strange
creature. The support vector machine creates a decision boundary between
these two classes (cat and dog) and chooses the extreme cases (support
vectors), so it will see the extreme cases of cat and dog. On the basis of the
support vectors, it will classify it as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which
means if a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable
data, which means if a dataset cannot be classified by using a straight
line, then such data is termed non-linear data and the classifier used is
called a Non-linear SVM classifier.
The dimensions of the hyperplane depend on the features present in the dataset,
which means if there are 2 features (as shown in the image), then the hyperplane
will be a straight line. And if there are 3 features, then the hyperplane will be a
2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means the
maximum distance between the hyperplane and the data points.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect
the position of the hyperplane are termed as Support Vector. Since these vectors
support the hyperplane, hence called a Support vector.
Linear SVM:
Since it is a 2-d space, we can easily separate these two classes just by using a
straight line. But there can be multiple lines that can separate these classes.
Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called as a hyperplane. SVM algorithm finds the
closest point of the lines from both the classes. These points are called support
vectors. The distance between the vectors and the hyperplane is called
as margin. And the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:
So to separate these data points, we need to add one more dimension. For linear
data, we have used two dimensions x and y, so for non-linear data, we will add
a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way.
Consider the below image:
Since we are in 3-d space, the hyperplane looks like a plane parallel to the
x-axis. If we convert it to 2-d space with z = 1, then it will become:
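The effect of the mapping z = x² + y² can be sketched directly: points near the origin and points on an outer ring are not linearly separable in 2-d, but after adding the third dimension they are split by the plane z = 1 (the sample points below are illustrative):

```python
# Two classes that are NOT linearly separable in 2-d:
inner = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]   # class 0, near origin
outer = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]   # class 1, on a ring

def lift(point):
    """Add the third dimension z = x² + y²."""
    x, y = point
    return (x, y, x ** 2 + y ** 2)

# In 3-d, the plane z = 1 separates the two classes perfectly:
assert all(lift(p)[2] < 1 for p in inner)   # inner points fall below the plane
assert all(lift(p)[2] > 1 for p in outer)   # outer points fall above it
print("separable by the plane z = 1")
```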
Now we will implement the SVM algorithm using Python. Here we will use the
same dataset user_data, which we have used in Logistic regression and KNN
classification.
Till the Data pre-processing step, the code will remain the same. Below is the
code:
After executing the above code, we will pre-process the data. The code will give
the dataset as:
The scaled output for the test set will be:
Fitting the SVM classifier to the training set:
Now the training set will be fitted to the SVM classifier. To create the SVM
classifier, we will import the SVC class from the sklearn.svm library. Below is
the code for it:
In the above code, we have used kernel='linear', as here we are creating an
SVM for linearly separable data. However, we can change it for non-linear
data. We then fitted the classifier to the training dataset (x_train, y_train).
Output:
Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=0,
shrinking=True, tol=0.001, verbose=False)
Output: Below is the output for the prediction of the test set:
o Creating the confusion matrix:
Now we will see the performance of the SVM classifier: how many incorrect
predictions there are as compared to the Logistic regression classifier. To
create the confusion matrix, we need to import the confusion_matrix function
of the sklearn library. After importing the function, we will call it using a new
variable cm. The function takes two parameters, mainly y_true (the actual
values) and y_pred (the predicted values returned by the classifier). Below is
the code for it:
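As a sketch of what confusion_matrix computes, the 2×2 count of actual versus predicted labels can be written in plain Python (the labels below are illustrative, not the user_data results):

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows = actual class (0, 1); columns = predicted class (0, 1)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

y_true = [0, 0, 1, 1, 1, 0]   # illustrative actual labels
y_pred = [0, 1, 1, 1, 0, 0]   # illustrative predicted labels
cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)   # [[2, 1], [1, 2]]
# Correct predictions lie on the diagonal: cm[0][0] + cm[1][1]
print(cm[0][0] + cm[1][1])   # 4
```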
Output:
As we can see in the above output image, there are 66 + 24 = 90 correct
predictions and 8 + 2 = 10 incorrect predictions. Therefore we can say that our
SVM model improved as compared to the Logistic regression model.
Output:
As we can see, the above output is appearing similar to the Logistic regression
output. In the output, we got the straight line as hyperplane because we
have used a linear kernel in the classifier. And we have also discussed above
that for the 2d space, the hyperplane in SVM is a straight line.
Output:
As we can see in the above output image, the SVM classifier has divided the
users into two regions (Purchased or Not purchased). Users who purchased the
SUV are in the red region with the red scatter points. And users who did not
purchase the SUV are in the green region with green scatter points. The
hyperplane has divided the two classes into Purchased and not purchased
variable.
The history of ANN can be divided into the following three eras −
ANN during 1940s to 1960s
Some key developments of this era are as follows −
1943 − The concept of neural networks is considered to have started with
the work of physiologist Warren McCulloch and mathematician Walter
Pitts, who in 1943 modeled a simple neural network using electrical
circuits in order to describe how neurons in the brain might work.
1949 − Donald Hebb’s book, The Organization of Behavior, put forth the
fact that repeated activation of one neuron by another increases its
strength each time they are used.
1956 − An associative memory network was introduced by Taylor.
1958 − A learning method for McCulloch and Pitts neuron model named
Perceptron was invented by Rosenblatt.
1960 − Bernard Widrow and Marcian Hoff developed models called
"ADALINE" and “MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows −
1961 − Rosenblatt made an unsuccessful attempt but proposed the
“backpropagation” scheme for multilayer networks.
1964 − Taylor constructed a winner-take-all circuit with inhibitions
among output units.
1969 − The multilayer perceptron (MLP) was invented by Minsky and
Papert.
1971 − Kohonen developed Associative memories.
1976 − Stephen Grossberg and Gail Carpenter developed Adaptive
resonance theory.
ANN from 1980s till Present
Some key developments of this era are as follows −
1982 − The major development was Hopfield’s Energy approach.
1985 − Boltzmann machine was developed by Ackley, Hinton, and
Sejnowski.
1986 − Rumelhart, Hinton, and Williams introduced Generalised Delta
Rule.
1988 − Kosko developed Binary Associative Memory (BAM) and also
gave the concept of Fuzzy Logic in ANN.
The historical review shows that significant progress has been made in this
field. Neural network based chips are emerging and applications to complex
problems are being developed. Surely, today is a period of transition for neural
network technology.
The following terminology maps a biological neuron (BNN) to its ANN
counterpart:
Biological Neuron (BNN) | ANN
Soma                     | Node
Dendrites                | Input
Axon                     | Output
The following table shows the comparison between ANN and BNN based on
some criteria mentioned.
Criteria        | BNN                                            | ANN
Learning        | They can tolerate ambiguity                    | Very precise, structured and formatted data is required to tolerate ambiguity
Fault tolerance | Performance degrades with even partial damage  | It is capable of robust performance, hence has the potential to be fault tolerant
The following diagram represents the general model of ANN followed by its
processing.
For the above general model of artificial neural network, the net input can be
calculated as follows −
yin = x1·w1 + x2·w2 + x3·w3 + … + xm·wm
i.e., Net input yin = Σᵢ₌₁ᵐ xi·wi
The output can be calculated by applying the activation function over the net
input.
Y = F(yin)
Output = function (net input calculated)
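The net input and output computation can be sketched in a few lines; the inputs, weights, and threshold activation below are illustrative choices:

```python
def net_input(xs, ws):
    """yin = x1·w1 + x2·w2 + … + xm·wm"""
    return sum(x * w for x, w in zip(xs, ws))

def step(y_in, threshold=0.5):
    """A simple threshold activation function F (illustrative choice)."""
    return 1 if y_in >= threshold else 0

xs = [1.0, 0.0, 1.0]     # input signals
ws = [0.5, 0.9, 0.25]    # connection weights
y_in = net_input(xs, ws)
print(y_in)              # 0.75
print(step(y_in))        # 1, the net input exceeds the threshold
```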
Network Topology
A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the
following kinds −
Feedforward Network
It is a non-recurrent network having processing units/nodes in layers, and all
the nodes in a layer are connected with the nodes of the previous layer. The
connections have different weights upon them. There is no feedback loop,
meaning the signal can only flow in one direction, from input to output. It may
be divided into the following two types −
Single layer feedforward network − The concept is of feedforward
ANN having only one weighted layer. In other words, we can say the
input layer is fully connected to the output layer.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of
similar type are combined to form clusters. When a new input pattern is applied,
then the neural network gives an output response indicating the class to which
the input pattern belongs.
There is no feedback from the environment as to what the desired output should
be or whether it is correct or incorrect. Hence, in this type of learning, the
network itself must discover the patterns and features from the input data, and
the relation of the input data to the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the
network over some critic information. This learning process is similar to
supervised learning, however we might have very little information.
During the training of network under reinforcement learning, the network
receives some feedback from the environment. This makes it somewhat similar
to supervised learning. However, the feedback obtained here is evaluative not
instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network performs adjustments of the weights to get
better critic information in future.
Activation Functions
It may be defined as the extra force or effort applied over the input to obtain an
exact output. In ANN, we can also apply activation functions over the input to
get the exact output. Followings are some activation functions of interest −
Linear Activation Function
It is also called the identity function as it performs no input editing. It can be
defined as −
F(x) = x
Sigmoid Activation Function
It is of two type as follows −
Binary sigmoidal function − This activation function performs input
editing between 0 and 1. It is positive in nature. It is always bounded,
which means its output cannot be less than 0 or more than 1. It is also
strictly increasing in nature, which means the greater the input, the
higher the output. It can be defined as
F(x) = sigm(x) = 1 / (1 + exp(−x))
Bipolar sigmoidal function − This activation function performs input
editing between -1 and 1. It can be positive or negative in nature. It is
always bounded, which means its output cannot be less than -1 or more
than 1. It is also strictly increasing like the binary sigmoid function. It
can be defined as
F(x) = sigm(x) = 2 / (1 + exp(−x)) − 1 = (1 − exp(−x)) / (1 + exp(−x))
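Both sigmoidal functions translate directly to Python; a small sketch verifying the bounds stated above:

```python
import math

def binary_sigmoid(x):
    """Bounded in (0, 1), strictly increasing."""
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    """Bounded in (−1, 1): 2 / (1 + exp(−x)) − 1."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

print(binary_sigmoid(0))    # 0.5
print(bipolar_sigmoid(0))   # 0.0
# Both stay inside their bounds for any input:
assert all(0 < binary_sigmoid(x) < 1 for x in (-10, -1, 0, 1, 10))
assert all(-1 < bipolar_sigmoid(x) < 1 for x in (-10, -1, 0, 1, 10))
```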