
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

VI SEMESTER
MVJ22CS62 MACHINE LEARNING NOTES
ACADEMIC YEAR 2024 – 2025 [EVEN]

MODULE 4
BY
PROF POSHITHA M
ASSISTANT PROFESSOR
COMPUTER SCIENCE AND ENGINEERING
MVJ COLLEGE OF ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


VISION:
To create an ambiance of excellence, provide innovative emerging programs in Computer Science and Engineering, and bring out future-ready engineers equipped with technical expertise and strong ethical values.

MISSION:
1. Concepts of Computing Discipline: To educate students at undergraduate, postgraduate and doctoral levels in the fundamental and advanced concepts of the computing discipline.
2. Quality Research: To provide a strong theoretical and practical background across the Computer Science and Engineering discipline, with emphasis on computing technologies, quality research, consultancy and training.
3. Continuous Teaching Learning: To promote a teaching-learning process that brings advancements in the Computer Science and Engineering discipline, leading to new technologies and products.
4. Social Responsibility and Ethical Values: To inculcate professional behavior, innovative research capabilities, leadership abilities and strong ethical values in young minds, so that they work with commitment for the betterment of society.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs):


PEO1: Current Industry Practices: Graduates will analyze real-world problems and provide solutions using current industry practices in computing technology.
PEO2: Research and Higher Studies: Graduates will have a strong foundation in mathematics and engineering fundamentals that enables them to pursue higher learning, R&D activities and consultancy.
PEO3: Social Responsibility: Graduates will be professionals with ethics who contribute to industry growth and social transformation as responsible citizens.
PEO4: Entrepreneur: Graduates will be able to become entrepreneurs who address social, technical and business challenges.

PROGRAM OUTCOMES (POs):


1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and
an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, research literature, and analyze complex engineering problems
reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering
and IT tools including prediction and modeling to complex engineering activities with an understanding
of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse
teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one’s own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OUTCOMES (PSOs):


PSO1: Programming: Ability to understand, analyze and develop computer programs in the areas
related to algorithms, system software, multimedia, web design, DBMS, and networking for efficient
design of computer-based systems of varying complexity.
PSO2: Practical Solution: Ability to practically provide solutions for real world problems with a broad
range of programming language and open source platforms in various computing domains.
PSO3: Research: Ability to use innovative ideas to do research in various domains to solve societal
problems.

COURSE OBJECTIVES:
1. Understand fundamentals of machine learning, including the types of learning, data pre-processing
techniques, and design principles, to enable them to develop effective learning systems that can tackle
real-world problems.
2. Implement and evaluate regression and classification models, including linear and polynomial regression,
logistic regression, and decision trees, to solve real-world problems and make informed decisions.
3. Understand classification techniques, including decision trees, random forests, naive Bayes, K-NN, SVM,
and evaluation metrics, to develop robust and accurate classification models that can handle complex data
sets and real-world applications.
4. Understand the concepts and techniques of clustering and artificial neural networks, enabling them to
apply clustering algorithms and design neural networks to solve real-world problems, including data
clustering, classification, and prediction.
5. Understand the fundamentals of reinforcement learning and deep learning, enabling them to understand
the concepts of learning from feedback and building deep neural networks to solve complex problems in
artificial intelligence, such as decision-making and pattern recognition.
PREREQUISITES:
 Programming experience in Python.
 Knowledge of basic Machine Learning Algorithms.
 Knowledge of common statistical methods and data analysis best practices

COURSE OUTCOMES (COs):

CO No. Course Outcomes
C406.1 Design and develop effective machine learning systems that can tackle real-
world problems by applying fundamental concepts, data pre-processing
techniques, and design principles to build accurate and reliable models.
C406.2 Evaluate, and interpret various regression and classification models to solve
real-world problems, making informed decisions by analyzing and predicting
continuous and categorical outcomes with accuracy and confidence.
C406.3 Design and develop robust and accurate classification models using various
techniques, including decision trees, random forests, and Naive Bayes, K-NN,
SVM, and evaluation metrics, to effectively classify complex data sets and
solve real-world problems.
C406.4 Apply clustering algorithms and design artificial neural networks to solve real-
world problems, including data clustering, classification, and prediction, using
techniques such as partitioning, hierarchical clustering, and deep learning.
C406.5 Design and implement reinforcement learning and deep learning models that
can learn from feedback, recognize patterns, and make decisions in complex
environments, enabling them to solve real-world problems in artificial
intelligence


MODULE 4
CONTENTS

CHAPTER 1
Clustering:
 Need and Applications of Clustering
 Partitioned methods
 Hierarchical methods
 Density-based methods.

CHAPTER 2
Artificial Neural Networks:
 Introduction
 Neural Network representation
 Appropriate problems
 Perceptron
 Backpropagation algorithm.

CHAPTER 1
Clustering in Machine Learning
In the real world, not all the data we work with has a target variable. Have you ever wondered how Netflix groups similar movies together or how Amazon organizes its vast product catalog? These are real-world applications of clustering. This kind of data cannot be analyzed using supervised learning algorithms. When the goal is to group similar data points in a dataset, we use cluster analysis. In this guide, we'll understand the concept of clustering, its applications, and some popular clustering algorithms.
What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering or Cluster
Analysis. This method is defined under the branch of unsupervised learning, which aims at gaining insights
from unlabelled data points.
Suppose you have a dataset of customers' shopping habits. Clustering can help you group customers with similar purchasing behaviors, which can then be used for targeted marketing, product recommendations, or customer segmentation.
For example, in the graph given below, we can clearly see three circular clusters forming on the basis of distance.

The clusters formed need not be circular in shape; their shape can be arbitrary, and many algorithms work well at detecting arbitrarily shaped clusters. For example, in the graph given below, the clusters formed are not circular.


Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data points:
 Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely or not at all. For example, suppose there are 4 data points and we have to cluster them into 2 clusters. Each data point will then belong to either cluster 1 or cluster 2.
Data Points Clusters

A C1

B C2

C C2

D C1

 Soft Clustering: In this type of clustering, instead of assigning each data point to exactly one cluster, a probability or likelihood of that point belonging to each cluster is evaluated. For example, suppose there are 4 data points and we have to cluster them into 2 clusters. We then evaluate the probability of each data point belonging to each of the two clusters; this probability is calculated for all data points.

Data Points Probability of C1 Probability of C2

A 0.91 0.09

B 0.3 0.7

C 0.17 0.83

D 1 0
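As a concrete illustration of soft assignments like those in the table above, a Gaussian mixture model can report per-cluster membership probabilities. The sketch below is a minimal example, assuming scikit-learn is available; the data values are made up.

import numpy as np
from sklearn.mixture import GaussianMixture

# Six made-up 1-D data points forming two obvious groups
X = np.array([[1.0], [1.1], [1.3], [5.0], [5.2], [5.4]])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict(X))        # hard assignment: one cluster ID per point
print(gm.predict_proba(X))  # soft assignment: a probability for each cluster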

Uses of Clustering
Before we look at the types of clustering algorithms, let us go through their use cases. Clustering algorithms are mainly used for:
1. Market Segmentation: Businesses use clustering to group their customers and run targeted advertisements to attract a wider audience.
2. Market Basket Analysis: Shop owners analyze their sales to figure out which items are most often bought together by customers. For example, one US study found that diapers and beer were usually bought together by fathers.
3. Social Network Analysis: Social media sites use your data to understand your browsing behavior and
provide you with targeted friend recommendations or content recommendations.
4. Medical Imaging: Doctors use Clustering to find out diseased areas in diagnostic images like X-rays.
5. Anomaly Detection: To find outliers in a stream of real-time dataset or forecasting fraudulent transactions
we can use clustering to identify them.
6. Simplify working with large datasets: Each cluster is given a cluster ID after clustering is complete, so an entire feature set can be reduced to a single cluster ID. Clustering is effective when it can represent a complicated case with a straightforward cluster ID; by the same principle, clustering can make complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and most common ones. Moving forward, we will discuss clustering algorithms that will help you perform the above tasks.

Why is clustering useful?


 Data visualization: Clusters can help identify natural groups in data, which can be visualized to help
understand the data.
 Anomaly detection: Clusters can help identify data points that are not part of any cluster, or outliers.
 Resource allocation: Clusters can help identify which groups or areas need the most attention or
resources.
 Model building: Clusters can help improve the predictive performance of supervised models.
 Sampling: Clusters can be used to create different types of data samples.
 Market segmentation: Clusters can help marketers identify distinct groups in their customer base.

Partitioning Method: This clustering method classifies the information into multiple groups based on the characteristics and similarity of the data. It is up to the data analyst to specify the number of clusters to be generated.
In the partitioning method, given a database D containing N objects, the method constructs K user-specified partitions of the data, in which each partition represents a cluster and a particular region.
Many algorithms come under the partitioning method; some of the popular ones are K-Means, PAM (K-Medoids), and CLARA (Clustering Large Applications).
K-Means (a centroid-based technique): The K-Means algorithm takes an input parameter K from the user and partitions the dataset containing N objects into K clusters so that the similarity among the data objects inside a group (intra-cluster) is high while the similarity with data objects outside the cluster (inter-cluster) is low.
1. The similarity of a cluster is determined with respect to the mean value of the cluster; K-Means is a type of squared-error algorithm.
2. At the start, K objects are chosen at random from the dataset, each representing a cluster mean (centre). Each of the remaining data objects is assigned to the nearest cluster based on its distance from the cluster mean.
3. The new mean of each cluster is then calculated from its updated membership.

Algorithm: K-Means
Input:
K: the number of clusters into which the dataset has to be divided
D: a dataset containing N objects

Output:
A set of K clusters
Method:
1. Randomly choose K objects from the dataset D as the initial cluster centres C.
2. (Re)assign each object to the cluster whose mean it is most similar to.
3. Update the cluster means, i.e., recalculate the mean of each cluster with the updated values.
4. Repeat Steps 2 and 3 until no change occurs.

Flowchart: (Figure – K-Means clustering)
Example: Suppose we want to group the visitors to a website using just their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.
Iteration-1:
C1 = 16.33 [16, 16, 17]
C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.56 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
No change occurs between Iterations 3 and 4, so we stop. Therefore the K-Means algorithm gives us two clusters: (16-29) and (36-66).
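In practice the same grouping can be obtained with a library implementation. Below is a minimal sketch assuming scikit-learn is available; the learned centres should be close to the hand-worked values above, though cluster IDs depend on initialization.

import numpy as np
from sklearn.cluster import KMeans

ages = np.array([16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
                 36, 41, 42, 43, 44, 45, 61, 62, 66]).reshape(-1, 1)

# K = 2 clusters; n_init random restarts guard against poor initial centres
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ages)

print(kmeans.cluster_centers_)  # final cluster means (about 20.5 and 48.9)
print(kmeans.labels_)           # cluster ID assigned to each visitor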
Hierarchical Clustering in Data Mining
A hierarchical clustering method works by grouping data into a tree of clusters. Hierarchical clustering begins by treating every data point as a separate cluster. Then, it repeatedly executes the following steps:
1. Identify the two clusters that are closest together, and
2. Merge the two most similar clusters. These steps continue until all the clusters are merged together.
In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters. A diagram called a Dendrogram (a tree-like diagram that records the sequences of merges or splits) graphically represents this hierarchy; it is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are broken up (top-down view).
What is Hierarchical Clustering?
Hierarchical clustering is a method of cluster analysis in data mining that creates a hierarchical representation
of the clusters in a dataset. The method starts by treating each data point as a separate cluster and then
iteratively combines the closest clusters until a stopping criterion is reached. The result of hierarchical
clustering is a tree-like structure, called a dendrogram, which illustrates the hierarchical relationships among the
clusters.

Advantages of Hierarchical Clustering

 The ability to handle non-convex clusters and clusters of different sizes and densities.
 The ability to handle missing data and noisy data.
 The ability to reveal the hierarchical structure of the data, which can be useful for understanding the
relationships among the clusters.

Drawbacks of Hierarchical Clustering

 The need for a criterion to stop the clustering process and determine the final number of clusters.
 The computational cost and memory requirements of the method can be high, especially for large datasets.
 The results can be sensitive to the initial conditions, linkage criterion, and distance metric used.
In summary, Hierarchical clustering is a method of data mining that groups similar data points into clusters
by creating a hierarchical structure of the clusters.
 This method can handle different types of data and reveal the relationships among the clusters. However, it
can have high computational cost and results can be sensitive to some conditions.

Types of Hierarchical Clustering

Basically, there are two types of hierarchical Clustering:


1. Agglomerative Clustering
2. Divisive clustering

1. Agglomerative Clustering

Initially, consider every data point as an individual cluster and, at every step, merge the nearest pair of clusters (it is a bottom-up method). At first, every data point is considered an individual entity or cluster. At every iteration, clusters merge with other clusters until one cluster is formed.
The algorithm for Agglomerative Hierarchical Clustering is:
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (compute the proximity matrix).
3. Merge the clusters that are most similar or closest to each other.
4. Recalculate the proximity matrix for the new set of clusters.
5. Repeat Steps 3 and 4 until only a single cluster remains.
Let’s see the graphical representation of this algorithm using a dendrogram.
Note: This is just a demonstration of how the actual algorithm works; no calculation has been performed below, and all the proximities among the clusters are assumed.
Let’s say we have six data points A, B, C, D, E, and F.


Agglomerative Hierarchical clustering


 Step-1: Consider each alphabet as a single cluster and calculate the distance of one cluster from all the other clusters.
 Step-2: Comparable clusters are merged together to form a single cluster. Let's say cluster (B) and cluster (C) are very similar to each other, so we merge them in this step; similarly with clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].
 Step-3: We recalculate the proximity according to the algorithm and merge the two nearest clusters ((DE) and (F)) to form the new clusters [(A), (BC), (DEF)].
 Step-4: Repeating the same process, the clusters DEF and BC are comparable and are merged together. We are now left with the clusters [(A), (BCDEF)].
 Step-5: At last, the two remaining clusters are merged to form a single cluster [(ABCDEF)]. A code sketch of this bottom-up process follows below.
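As promised above, here is a minimal sketch of agglomerative clustering and its dendrogram, assuming SciPy and Matplotlib are available; the six 2-D coordinates standing in for points A-F are invented for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Made-up 2-D coordinates standing in for the points A..F
points = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9],
                   [5.0, 5.0], [5.1, 4.8], [8.0, 1.0]])

# 'single' linkage merges the two closest clusters at every step (bottom-up)
Z = linkage(points, method='single')

# The dendrogram records the sequence of merges
dendrogram(Z, labels=['A', 'B', 'C', 'D', 'E', 'F'])
plt.show()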

2. Divisive Hierarchical clustering

We can say that divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering. In divisive hierarchical clustering, we start with all of the data points in a single cluster and, in every iteration, split off the data points that are not comparable to the rest. In the end, we are left with N clusters.

Divisive Hierarchical clustering


DBSCAN is a density-based clustering algorithm that groups data points that are closely packed together
and marks outliers as noise based on their density in the feature space. It identifies clusters as dense regions
in the data space, separated by areas of lower density.
Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical, DBSCAN
excels in handling real-world data irregularities such as:
 Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
 Noise and Outliers: It effectively identifies and handles noise points without assigning them to any
cluster.

Figure: K-Means and hierarchical clustering handle compact, spherical clusters with varying noise tolerance, while DBSCAN manages arbitrary-shaped clusters and excels at noise handling.
Key Parameters in DBSCAN
 1. eps: This defines the radius of the neighborhood around a data point. If the distance between two points is less than or equal to eps, they are considered neighbors. Choosing the right eps is crucial:
 If eps is too small, most points will be classified as noise.
 If eps is too large, clusters may merge, and the algorithm may fail to distinguish between them.
A common method to determine eps is to analyze the k-distance graph (see the sketch after this list).
 2. MinPts: This is the minimum number of points required within the eps radius to form a dense region.
A general rule of thumb is to set MinPts >= D + 1, where D is the number of dimensions in the dataset. For most cases, a minimum value of MinPts = 3 is recommended.
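The following is a minimal sketch of the k-distance graph idea, assuming scikit-learn and Matplotlib are available; the random data and the choice k = 4 are placeholders.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(200, 2)  # placeholder 2-D data

# For MinPts = 4, sort every point's distance to its 4th nearest neighbour;
# the "elbow" of the resulting curve is a common heuristic choice for eps.
k = 4
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors(X)
plt.plot(np.sort(distances[:, k - 1]))
plt.ylabel('distance to k-th nearest neighbour')
plt.show()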

How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. core points, which have a sufficient number of neighbors within a specified radius (epsilon)
2. border points, which are near core points but lack enough neighbors to be core points themselves
3. noise points, which do not belong to any cluster.
By iteratively expanding clusters from core points and connecting density-reachable points, DBSCAN forms
clusters without relying on rigid assumptions about their shape or size.

Steps in the DBSCAN Algorithm

 Identify Core Points: For each point in the dataset, count the number of points within
its eps neighborhood. If the count meets or exceeds MinPts, mark the point as a core point.
 Form Clusters: For each core point that is not already assigned to a cluster, create a new cluster.
Recursively find all density-connected points (points within the eps radius of the core point) and add them
to the cluster.
 Density Connectivity: Two points, a and b, are density-connected if there exists a chain of points where
each point is within the eps radius of the next, and at least one point in the chain is a core point. This
chaining process ensures that all points in a cluster are connected through a series of dense regions.
 Label Noise Points: After processing all points, any point that does not belong to a cluster is labeled
as noise.

Pseudocode for the DBSCAN Clustering Algorithm

DBSCAN(dataset, eps, MinPts) {
    C = 0                                          # cluster index
    for each unvisited point p in dataset {
        mark p as visited
        N = the points within distance eps of p    # find neighbors
        if |N| < MinPts:
            mark p as NOISE
        else:
            C = C + 1
            add p to cluster C
            for each point p' in N {               # expand the cluster
                if p' is not visited:
                    mark p' as visited
                    N' = the points within distance eps of p'
                    if |N'| >= MinPts:
                        N = N U N'
                if p' is not a member of any cluster:
                    add p' to cluster C
            }
    }
}
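In practice DBSCAN is available off the shelf. The sketch below assumes scikit-learn and uses its two-moons toy dataset; the eps and min_samples values are illustrative.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters that K-Means would struggle to separate
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# Label -1 marks noise points; other labels are cluster IDs
print(set(db.labels_))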


CHAPTER 2
Artificial Neural Networks

The term "artificial neural network" refers to a biologically inspired sub-field of artificial intelligence modeled after the brain.


What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from Biological neural networks that develop the structure of
a human brain. Similar to the human brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in various layers of the networks. These
neurons are known as nodes.

The figure below illustrates a typical biological neural network.

A typical artificial neural network looks something like the figure below.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus
represents Nodes, synapse represents Weights, and Axon represents Output.

Relationship between Biological neural network and artificial neural network:

Biological neural network Artificial neural network

Dendrites Inputs

Cell nucleus Nodes

Synapse Weights

Axon Output

An artificial neural network is an attempt, in the field of artificial intelligence, to mimic the network of neurons that makes up the human brain, so that computers have an option to understand things and make decisions in a human-like manner. An artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

The human brain contains on the order of 100 billion neurons, and each neuron is connected to somewhere in the range of 1,000 to 100,000 others. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes an input and gives an output, such as an "OR" gate with two inputs: if one or both of the inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not work this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

The hidden layer lies between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally results in output that is
conveyed using this layer.

The artificial neural network takes input and computes the weighted sum of the inputs and includes a bias. This
computation is represented in the form of a transfer function.

The weighted total is then passed as input to an activation function to produce the output. Activation functions decide whether a node should fire or not; only the nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.

Advantages of Artificial Neural Network (ANN)

 Parallel processing capability:

Artificial neural networks can perform more than one task simultaneously.

 Storing data on the entire network:

Unlike traditional programming, data is stored across the whole network rather than in a single database. The disappearance of a few pieces of data in one place does not prevent the network from working.

 Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The loss of performance depends on the importance of the missing data.

 Having a memory distribution:

For an ANN to be able to adapt, it is important to determine suitable examples and to train the network toward the desired output by presenting these examples to it. The success of the network is directly proportional to the chosen instances; if the problem cannot be shown to the network in all its aspects, the network can produce false output.

 Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output; this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

 Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks. The appropriate
network structure is accomplished through experience, trial, and error.

 Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it provides no insight into why or how, which decreases trust in the network.

 Hardware dependence:

Artificial neural networks need processors with parallel processing power suited to their structure, so realizing them depends on suitable hardware.

 Difficulty of showing the issue to the network:

ANNs can only work with numerical data, so problems must be converted into numerical values before being introduced to the ANN. The representation chosen here directly impacts the performance of the network and relies on the user's abilities.

 The duration of the network is unknown:

Training is stopped once the network's error falls to a specific value, but reaching this value does not guarantee optimal results.

Artificial neural networks, which entered the world in the mid-20th century, are developing rapidly. We have examined the pros of artificial neural networks and the issues encountered in the course of their utilization. It should not be overlooked that the cons of this flourishing branch of science are being eliminated one by one, while its pros are increasing day by day; artificial neural networks are progressively becoming an irreplaceable part of our lives.

How do artificial neural networks work?

An artificial neural network can be best represented as a weighted directed graph, where the artificial neurons form the nodes. The connections between neuron outputs and neuron inputs can be viewed as directed edges with weights. The artificial neural network receives the input signal from an external source in the form of a pattern or image represented as a vector. These inputs are then denoted mathematically by x(n) for each of the n inputs.

Afterward, each input is multiplied by its corresponding weight (these weights are the details the artificial neural network uses to solve a specific problem). In general terms, the weights represent the strength of the interconnections between neurons inside the network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is zero, a bias is added to make the output non-zero, or otherwise to scale up the system's response; the bias can be viewed as an extra input fixed at 1 with a weight of its own. The total of the weighted inputs can range from 0 to positive infinity, so to keep the response within the desired limits, a maximum value is benchmarked and the total of the weighted inputs is passed through an activation function.

The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:

Binary:

In the binary activation function, the output is either a 1 or a 0. To accomplish this, a threshold value is set up: if the net weighted input of the neuron is more than the threshold, the final output of the activation function is returned as 1; otherwise the output is returned as 0.

Sigmoidal Hyperbolic:

The Sigmoidal Hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:

F(x) = 1 / (1 + exp(-λx))

where λ is considered the steepness parameter.
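A small NumPy sketch of the two activation functions described above; the threshold and steepness values are illustrative assumptions.

import numpy as np

def binary_step(z, threshold=0.0):
    # Fires (outputs 1) only when the net input reaches the threshold
    return np.where(z >= threshold, 1, 0)

def sigmoid(z, steepness=1.0):
    # F(z) = 1 / (1 + exp(-lambda * z)); 'steepness' plays the role of lambda
    return 1.0 / (1.0 + np.exp(-steepness * z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(binary_step(z))  # [0 0 1 1 1]
print(sigmoid(z))      # smooth values between 0 and 1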

Types of Artificial Neural Network:

There are various types of artificial neural networks, which perform tasks in a manner modeled on the neurons and network functions of the human brain. Most artificial neural networks bear some similarity to their more complex biological counterparts and are very effective at their intended tasks, for example segmentation or classification.

Feedback ANN:

In this type of ANN, the output is fed back into the network to achieve the best-evolved results internally. According to the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. Through assessment of its output by reviewing its input, the intensity of the network can be noticed based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.

Appropriate Problems for ANN

 Training data is noisy or complex (e.g., sensor data); also problems where symbolic algorithms such as decision tree learning (DTL) are used; ANN and DTL produce results of comparable accuracy
 Instances are attribute-value pairs; attributes may be highly correlated or independent, and values can be any real value
 The target function may be discrete-valued, real-valued, or a vector
 Training examples may contain errors
 Long training times are acceptable
 Fast evaluation of the learned target function may be required
 Humans do NOT need to understand the learned target function

What is Perceptron?
A perceptron is a type of neural network that performs binary classification, mapping input features to an output decision and usually classifying data into one of two categories, such as 0 or 1.
Perceptron consists of a single layer of input nodes that are fully connected to a layer of output nodes. It is
particularly good at learning linearly separable patterns. It utilizes a variation of artificial neurons
called Threshold Logic Units (TLU), which were first introduced by Warren McCulloch and Walter Pitts in the
1940s. This foundational model has played a crucial role in the development of more advanced neural
networks and machine learning algorithms.

Types of Perceptron

1. A Single-Layer Perceptron is limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line. While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.
2. A Multi-Layer Perceptron has enhanced processing capabilities, as it consists of two or more layers, and is adept at handling more complex patterns and relationships within the data.

Basic Components of Perceptron


A Perceptron is composed of key components that work together to process information and make predictions.
 Input Features: The perceptron takes multiple input features, each representing a characteristic of the
input data.
 Weights: Each input feature is assigned a weight that determines its influence on the output. These
weights are adjusted during training to find the optimal values.
 Summation Function: The perceptron calculates the weighted sum of its inputs, combining them with
their respective weights.
 Activation Function: The weighted sum is passed through the Heaviside step function, comparing it to a
threshold to produce a binary output (0 or 1).
 Output: The final output is determined by the activation function, often used for binary
classification tasks.
 Bias: The bias term helps the perceptron make adjustments independent of the input, improving its
flexibility in learning.

Learning Algorithm:

The perceptron adjusts its weights and bias using a learning algorithm, such as the Perceptron Learning Rule,
to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a single perceptron
can handle simple binary classification, complex tasks require multiple perceptrons organized into layers,
forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the importance of that input in
determining the output. The Perceptron’s output is calculated as a weighted sum of the inputs, which is then
passed through an activation function to decide whether the Perceptron will fire.
The weighted sum is computed as:

z = w1*x1 + w2*x2 + ... + wn*xn + b

The step function compares this weighted sum to a threshold. If the input is larger than the threshold value, the output is 1; otherwise, it is 0. The most common activation function used in perceptrons is the Heaviside step function:

h(z) = 1 if z >= 0, otherwise 0

A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully connected to all
input nodes.

Threshold Logic units

In a fully connected layer, also known as a dense layer, all neurons in one layer are connected to every
neuron in the previous layer.
The output of the fully connected layer is computed as:

f(X) = h(XW + b)

where X is the input, W is the weight for each input neuron, b is the bias, and h is the step function.
During training, the Perceptron’s weights are adjusted to minimize the difference between the predicted
output and the actual output. This is achieved using supervised learning algorithms like the delta rule or the
Perceptron learning rule.
The weight update formula is:

w(i,j) = w(i,j) + η (y_j − ŷ_j) x_i

Where:
 w(i,j) is the weight between the i-th input and j-th output neuron,
 x_i is the i-th input value,
 y_j is the actual value and ŷ_j is the predicted value,
 η is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy over time.

Example: Perceptron in Action

Let’s take a simple example of classifying whether a given fruit is an apple or not based on two inputs: its
weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron receives these inputs,
multiplies them by their weights, adds a bias, and applies the activation function to decide whether the fruit is
an apple or not.
 Input 1 (Weight): 150 grams
 Input 2 (Color): 0.9 (since the fruit is mostly red)
 Weights: [0.5, 1.0]
 Bias: 1.5
The perceptron's weighted sum would be:
(150 × 0.5) + (0.9 × 1.0) + 1.5 = 77.4
Let's assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
Building and Training Single Layer Perceptron Model
For building the perceptron model we will implement the following steps.
Step 1: Initialize the weights and learning rate
We create weights for the number of inputs + 1 (the additional +1 accounting for the bias term). This ensures that both the inputs and the bias are adjusted during training. The code assumes NumPy is available.

import numpy as np

class Perceptron:
    def __init__(self, num_inputs, learning_rate=0.01):
        # Initialize the weights (num_inputs + 1 to include the bias)
        self.weights = np.random.rand(num_inputs + 1)  # Random initialization
        self.learning_rate = learning_rate             # Learning rate
Step 2: Define the Linear Layer
The first step is to calculate the weighted sum of the inputs. This is done using the formula: Z = XW + b,
where X represents the inputs, W the weights, and b the bias.
def linear(self, inputs):
    Z = inputs @ self.weights[1:].T + self.weights[0]  # Weighted sum: XW + b
    return Z
Step 3: Define the Activation Function
The Heaviside Step function is used as the activation function, which compares the weighted sum to a
threshold. If the sum is greater than or equal to 0, it outputs 1; otherwise, it outputs 0.
def Heaviside_step_fn(self, z):
    if z >= 0:
        return 1  # Output 1 if the input is >= 0
    else:
        return 0  # Output 0 otherwise
Step 4: Define the Prediction
Use the linear function followed by the activation function to generate predictions based on the input features.
def predict(self, inputs):
    Z = self.linear(inputs)  # Pass inputs through the linear layer
    try:
        pred = []
        for z in Z:  # For batch inputs
            pred.append(self.Heaviside_step_fn(z))
    except TypeError:
        return self.Heaviside_step_fn(Z)  # For a single input
    return pred  # Return predictions
Step 5: Define the Loss Function
The loss function calculates the error between the predicted output and the actual output. In the Perceptron,
the loss is the difference between the target value and the predicted value.
def loss(self, prediction, target):
    loss = (target - prediction)  # Error: actual minus predicted
    return loss
Step 6: Define Training
In this step, weights and bias are updated according to the error calculated from the loss function.
The Perceptron learning rule is applied to adjust the weights to minimize the error.
def train(self, inputs, target):
    prediction = self.predict(inputs)      # Get prediction
    error = self.loss(prediction, target)  # Calculate error (loss)
    self.weights[1:] += self.learning_rate * error * inputs  # Update weights
    self.weights[0] += self.learning_rate * error            # Update bias
Step 7: Fit the Model
The fitting process involves training the model over multiple iterations (epochs) to adjust the weights and
bias. This allows the Perceptron to learn from the data and improve its prediction accuracy over time.
def fit(self, X, y, num_epochs):
    for epoch in range(num_epochs):
        for inputs, target in zip(X, y):  # Loop through the dataset
            self.train(inputs, target)    # Train on each input-target pair
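Putting the pieces together, here is a hypothetical usage of the class built above, training it on the linearly separable AND function; the data and hyperparameters are illustrative.

# Hypothetical usage: learning the AND gate with the Perceptron class above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND is linearly separable

p = Perceptron(num_inputs=2, learning_rate=0.1)
p.fit(X, y, num_epochs=20)

print(p.predict(X))  # expected: [0, 0, 0, 1]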

Backpropagation

Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural networks,
particularly feed-forward networks. It works iteratively, minimizing the cost function by adjusting weights
and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient descent.
The algorithm computes the gradient using the chain rule from calculus, allowing it to effectively navigate
complex layers in the neural network to minimize the cost function.

Figure (a): A simple illustration of how backpropagation works by adjusting weights

Why is Backpropagation Important?


Backpropagation plays a critical role in how neural networks improve over time. Here's why:
 Efficient Weight Update: It computes the gradient of the loss function with respect to each weight
using the chain rule, making it possible to update weights efficiently.
 Scalability: The backpropagation algorithm scales well to networks with multiple layers and complex
architectures, making deep learning feasible.
 Automated Learning: With backpropagation, the learning process becomes automated, and the
model can adjust itself to optimize its performance.

Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward Pass.

How Does the Forward Pass Work?

In the forward pass, the input data is fed into the input layer. These inputs, combined with their respective
weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output from h1 serves
as the input to h2. Before applying an activation function, a bias is added to the weighted inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the input if
it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex relationships in
the data. Finally, the outputs from the last hidden layer are passed to the output layer, where an activation
function, such as softmax, converts the weighted outputs into probabilities for classification.

Figure: The forward pass using weights and biases

How Does the Backward Pass Work?

In the backward pass, the error (the difference between the predicted and actual output) is propagated back
through the network to adjust the weights and biases. One common method for error calculation is the Mean
Squared Error (MSE), given by:
MSE = (Predicted Output − Actual Output)²
Once the error is calculated, the network adjusts weights using gradients, which are computed with the chain
rule. These gradients indicate how much each weight and bias should be adjusted to minimize the error in the
next iteration. The backward pass continues layer by layer, ensuring that the network learns and improves its
performance. The activation function, through its derivative, plays a crucial role in computing these gradients
during backpropagation.
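To make the two passes concrete, here is a minimal NumPy sketch (not the exact network from the figures): a single hidden layer trained with backpropagation and gradient descent on the XOR problem, using sigmoid activations and the squared error above. The layer sizes, learning rate, and epoch count are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output
lr = 0.5

for epoch in range(5000):
    # Forward pass: weighted sums plus bias, then activation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error via the chain rule
    d_out = (out - y) * out * (1 - out)  # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)   # propagated back to the hidden layer

    # Gradient-descent weight and bias updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # should approach [[0], [1], [1], [0]]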
