MCA - ML Question Bank Answer
1. Define Machine learning and explain the concept of machine learning with a
neat diagram.
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses
on the development of algorithms and models that enable computers to learn
and make predictions or decisions without being explicitly programmed for a
specific task.
The first step in any machine learning activity starts with data. In the case of
supervised learning, this is the labelled training data set, followed by test data
which is not labelled. In the case of unsupervised learning, there is no labelled
data; the task is to find patterns in the input data.
A thorough review and exploration of the data is needed to understand the
type of the data, the quality of the data and relationship between the different
data elements. Based on that, multiple pre-processing activities may need to
be done on the input data before we can go ahead with core machine learning
activities. Following are the typical preparation activities done once the input
data comes into the machine learning system:
• Understand the type of data in the given input data set (For example
Numerical Data).
• Explore the data to understand the nature and quality.
• Explore the relationships amongst the data elements, e.g. inter-feature relationships.
• Find potential issues in data. (you might find missing values, outliers,
duplicate entries, or data entry errors)
• Do the necessary remediation, e.g. impute missing data values, etc.,
if needed. Once issues are identified, you take steps to address them.
• Apply the following pre-processing steps, as necessary.
1. Dimensionality Reduction
2. Feature sub-set selection
Once the data is prepared for modelling, then the learning tasks start off. As a
part of it, do the following activities:
1. The input data is first divided into two parts — the training data and the test
data (called the holdout). This step is applicable for supervised learning only.
2. Consider different models or learning algorithms for selection. Train the
model on the training data for a supervised learning problem and apply it to
unknown data. For an unsupervised learning problem, apply the chosen
unsupervised model directly on the input data.
3. After the model is selected, trained (for supervised learning), and applied
on input data, the performance of the model is evaluated. Based on the options
available, specific actions can be taken to improve the performance of the
model, if possible (a minimal code sketch of this workflow is shown below).
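A minimal code sketch of this workflow, assuming scikit-learn and using its built-in Iris data set purely for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)   # labelled input data

# 1. Divide the data into training data and test data (holdout)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2. Choose a learning algorithm and train the model on the training data
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# 3. Apply the trained model to the unseen test data and evaluate its performance
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))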
6. Image Recognition:
It is one of the most common machine learning applications. There
are many situations where you can classify the object as a digital image. For
digital images, the measurements describe the outputs of each pixel in the
image.
In the case of a black and white image, the intensity of each pixel serves as
one measurement.
So if a black and white image has N×N pixels, the total number of pixels, and
hence of measurements, is N². In a coloured image, each pixel is considered to
provide 3 measurements — the intensities of the 3 main colour components,
i.e. RGB. So for an N×N coloured image there are 3N² measurements.
• For face detection – The categories might be face versus no face
present. There might be
a separate category for each person in a database of several individuals.
• For character recognition – We can segment a piece of writing into
smaller images, each
containing a single character. The categories might consist of the 26 letters
of the English alphabet, the 10 digits, and some special characters.
7. Speech Recognition
Speech recognition (SR) is the translation of spoken words into text. It is also
known as “automatic speech recognition” (ASR), “computer speech
recognition”, or “speech to text” (STT). In speech recognition, a software
application recognizes spoken words. The measurements in
this Machine Learning application might be a set of numbers that represent
the speech signal. We can segment the signal into portions that contain
distinct words or phonemes. In each segment, we can represent the speech
signal by the intensities or energy in different time frequency bands.
Although the details of signal representation are outside the scope of this
program, we can represent the signal by a set of real values.
Machine Learning applications of speech recognition include voice user
interfaces, such as voice dialling, call routing, and domotic (home) appliance
control. It can also be used for simple data entry, preparation of structured
documents, speech-to-text processing, etc.
3. Briefly Explain the types of Supervised and Unsupervised Machine Learning
with appropriate examples.
2. Regression :
Regression is a type of supervised machine learning task where the goal is
to predict a continuous numerical value or outcome based on input features.
Regression is a type of supervised machine learning where algorithms learn
from the data to predict continuous values such as sales, salary, weight, or
temperature.
Note:
In the context of regression in machine learning, a continuous numerical value
refers to an outcome or target variable that can take on an infinite number of
values within a specific range. In case of predicting a person's age, age is
considered a continuous numerical value because it can theoretically take on
any value within a certain range (for example, from 0 to 100+ years). There are
no gaps or intervals between possible ages, and age can be expressed as a
decimal or fraction if necessary (e.g., 25.5 years).
It is a variable that can have any real number value, and there are no
distinct categories or classes. The term "continuous" implies that the variable
can vary over a continuous range, and there are no gaps or interruptions in the
possible values it can take. In contrast, in a classification task, the target
variable would be a discrete set of categories or classes.
Example:
1. Predicting age of a person: Given certain features or attributes of a person,
such as height, weight, gender, and other relevant factors, the task is to
predict the person's age in years.
2. Predicting the price of houses based on their features:
In real estate markets, house prices can vary continuously based on factors
such as location, size, amenities, market conditions, and other features.
House prices can range from a few thousand dollars for smaller properties in
certain areas to millions of dollars for luxury properties in prime locations.
3. Predicting the salary of an employee on the basis of the year of experience.
Unsupervised Learning:
1. Clustering:
Clustering is an unsupervised learning task in which the algorithm groups
unlabeled data points into clusters so that points within the same cluster are
more similar to each other than to points in other clusters. For example, a
retailer can cluster customers into segments based on their purchasing
behaviour without any predefined labels.
Example (Reinforcement Learning):
Consider training an autonomous vehicle to navigate a maze. The vehicle
(agent) interacts with the maze environment, receiving positive rewards for
moving closer to the maze's exit and negative rewards for hitting walls or
going further from the exit. Through trial and error, the vehicle learns a policy
(sequence of actions) to efficiently navigate the maze and reach the exit,
optimizing its path to maximize cumulative rewards.
7. Discuss the broad classification of data used in machine learning along with
appropriate examples
1. Discrete Data:
Discrete data consists of countable, whole-number values; there are gaps
between the possible values.
Examples:
• Number of siblings.
• Number of goals scored in a soccer match.
• Number of defects in a manufacturing process.
• Number of customers in a store at a given time.
• Age of a Person
• Number of cars in a parking lot
2. Continuous Data:
Continuous data is data that can take any value. Height, weight,
temperature and length are all examples of continuous data. Some
continuous data will change over time; the weight of a baby in its first
year or the temperature in a room throughout the day.
Continuous data represents measurements that can take on any
value within a certain range. These values are not restricted to whole
numbers and can include decimals or fractions.
Example:
• Height of individuals.
• Weight of objects.
• Temperature readings.
• Time taken to complete a task.
• Distance traveled by a vehicle.
2. Imputation:
Imputation involves replacing missing values with estimated or calculated
values based on the available data.
Common imputation techniques include mean, median, mode imputation, or
using predictive models to estimate missing values.
For numerical features, replacing missing values with the mean or median of
the respective feature is a straightforward approach.
For categorical features, replacing missing values with the mode (most
frequent value) is often used.
Example: In a dataset containing age values with missing entries, missing
values can be replaced with the mean age of the non-missing entries.
5. Domain-specific Methods:
In some cases, domain-specific knowledge may guide the handling of missing
values.
For example, in time-series data, missing values may be filled with the most
recent available value or interpolated based on trends in the data (see the
sketch after this section).
By employing appropriate techniques to handle missing values, data
preprocessing ensures that machine learning models can effectively learn from
the available data, leading to more accurate and reliable predictions or
insights. Each method has its advantages and limitations, and the choice of
technique depends on factors such as the nature of the data, the extent of
missingness, and the requirements of the analysis or modeling task.
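A minimal pandas sketch of the imputation and domain-specific techniques described above, using small illustrative (assumed) data:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":  [25, 30, np.nan, 40, np.nan],              # numerical feature with missing values
    "city": ["Pune", None, "Delhi", "Pune", "Pune"],    # categorical feature with missing values
})

df["age"] = df["age"].fillna(df["age"].mean())          # mean imputation for numerical data
df["city"] = df["city"].fillna(df["city"].mode()[0])    # mode imputation for categorical data

# Domain-specific handling for time-series data: forward-fill with the most recent value
ts = pd.Series([10.0, np.nan, 12.0, np.nan], index=pd.date_range("2024-01-01", periods=4))
ts = ts.ffill()

print(df)
print(ts)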
10 Mark Questions:
3. Classification:
Classification is a type of supervised machine learning task where the goal
is to predict the category or class that a new instance or observation belongs
to, on the basis of training data. The output variable in classification is
discrete and represents different classes or labels.
In Classification, a program learns from the given dataset or observations
and then classifies new observation into a number of classes or groups. Such
as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called
as targets/labels or categories.
4. Regression :
Regression is a type of supervised machine learning task where the goal is
to predict a continuous numerical value or outcome based on input features.
Regression is a type of supervised machine learning where algorithms learn
from the data to predict continuous values such as sales, salary, weight, or
temperature.
Note:
In the context of regression in machine learning, a continuous numerical value
refers to an outcome or target variable that can take on an infinite number of
values within a specific range. In case of predicting a person's age, age is
considered a continuous numerical value because it can theoretically take on
any value within a certain range (for example, from 0 to 100+ years). There are
no gaps or intervals between possible ages, and age can be expressed as a
decimal or fraction if necessary (e.g., 25.5 years).
It is a variable that can have any real number value, and there are no
distinct categories or classes. The term "continuous" implies that the variable
can vary over a continuous range, and there are no gaps or interruptions in the
possible values it can take. In contrast, in a classification task, the target
variable would be a discrete set of categories or classes.
Example:
4. Predicting age of a person: Given certain features or attributes of a person,
such as height, weight, gender, and other relevant factors, the task is to
predict the person's age in years.
5. Predicting the price of houses based on their features:
In real estate markets, house prices can vary continuously based on factors
such as location, size, amenities, market conditions, and other features.
House prices can range from a few thousand dollars for smaller properties in
certain areas to millions of dollars for luxury properties in prime locations.
6. Predicting the salary of an employee on the basis of the year of experience.
Unsupervised Learning:
Example:
4. Association Rule:
We typically see association rule mining used for market basket analysis:
this is a data mining technique retailers use to gain a better understanding of
customer purchasing patterns based on the relationships between various
products. So Association is the process of discovering interesting relationships,
associations, or patterns within a dataset. This type of analysis is often applied to
transactional data, where the goal is to identify associations between items or
events that frequently co-occur. Association rules are used to express these
relationships, and they help reveal hidden connections in the data.
Reinforcement Learning:
Reinforcement Learning (RL) is a type of machine learning paradigm in
which an agent learns to make decisions by interacting with an environment.
The agent takes actions in the environment, and in return, it receives feedback in
the form of rewards or punishments.
Reinforcement Learning is a feedback-based Machine learning technique
in which an agent learns to behave in an environment by performing the actions
and seeing the results of actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or penalty.
In Reinforcement Learning, the agent learns automatically using feedbacks
without any labeled data, unlike supervised learning Since there is no labeled
data, so the agent is bound to learn by its experience only. RL solves a specific
type of problem where decision making is sequential, and the goal is long-term,
such as game-playing, robotics, etc. The agent interacts with the environment and
explores it by itself. The primary goal of an agent in reinforcement learning is to
improve the performance by getting the maximum positive rewards.
1. Discrete Data:
Discrete data consists of countable, whole-number values; there are gaps
between the possible values.
Examples:
• Number of siblings.
• Number of goals scored in a soccer match.
• Number of defects in a manufacturing process.
• Number of customers in a store at a given time.
• Age of a Person
• Number of cars in a parking lot
4. Continuous Data:
Continuous data is data that can take any value. Height, weight,
temperature and length are all examples of continuous data. Some
continuous data will change over time; the weight of a baby in its first
year or the temperature in a room throughout the day.
Continuous data represents measurements that can take on any
value within a certain range. These values are not restricted to whole
numbers and can include decimals or fractions.
Example:
• Height of individuals.
• Weight of objects.
• Temperature readings.
• Time taken to complete a task.
• Distance traveled by a vehicle.
1. Structured Data:
Structured data refers to data that has a well-defined and organized
structure, typically stored in databases or tabular formats.
2. Unstructured Data:
Unstructured data refers to data that does not have a predefined data
model or organization, making it more challenging to analyze using
traditional methods.
It lacks a formal structure and can include text documents, images, audio
files, videos, social media posts, emails, and web pages.
3. Semi-Structured Data:
Semi-structured data does not conform to a strict tabular structure, but it
contains tags or markers that separate and label data elements, making it
easier to process than unstructured data. Examples include JSON and XML
files and email messages.
1. Feature transformation:
Feature transformation transforms the data — structured or unstructured,
into a new set of features which can represent the underlying problem
which machine learning is trying to solve.
Feature transformation involves changing the representation of the
features in the dataset to make them more suitable for the machine
learning algorithm.
Engineering a good feature space is a crucial prerequisite for the
success of any machine learning model. However, often it is not clear
which feature is more important. For that reason, all available attributes of
the data set are used as features and the problem of identifying the
important features is left to the learning model. This is definitely not a
feasible approach, particularly for certain domains e.g. medical image
classification, text categorization, etc. In case a model has to be trained to
classify a document as spam or non-spam, we can represent a document as
a bag of words. Then the feature space will contain all unique words
occurring across all documents. This will easily be a feature space of a few
hundred thousand features. If we start including bigrams or trigrams along
with words, the count of features will run in millions. To deal with this
problem, feature transformation comes into play. Feature transformation
is used as an effective tool for dimensionality reduction and hence for
boosting learning model performance. Broadly, there are two distinct
goals of feature transformation:
1. Achieving best reconstruction of the original features in the data
set
2. Achieving highest efficiency in the learning task
There are two variants of feature transformation:
1. Feature construction (or generation): new features are created by
combining or transforming the existing features.
2. Feature extraction: a new, smaller set of features is derived from the
original features, e.g. using Principal Component Analysis (a sketch is shown
below).
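A minimal sketch of feature extraction, assuming scikit-learn's PCA and its built-in Iris data set for illustration:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # original feature space: 4 features
pca = PCA(n_components=2)                # extract 2 new features (principal components)
X_new = pca.fit_transform(X)             # transformed feature space
print(X.shape, "->", X_new.shape)        # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)     # how much information each new feature retains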
Example: Consider a standard deck of 52 playing cards. The card chosen is
known to be a face card, and the problem is to find the probability that this
face card is a King. Since there are 12 face cards, of which 4 are Kings,
P(King | Face card) = 4/12 = 1/3.
2. Posterior:
The probability that a particular hypothesis holds for a data set based on
the Prior is called the posterior probability or simply Posterior.
In the above example, the probability of the hypothesis that the patient
has a malignant tumour, considering the Prior of correctness of the malignancy
test, is a posterior probability.
The posterior probability represents our updated belief or probability of an
event or hypothesis being true after observing new evidence or data.
The posterior probability is calculated using Bayes' theorem, which
combines the prior probability, the likelihood, and the evidence or data.
The posterior probability reflects our updated understanding of the event
or hypothesis based on the observed evidence.
Example:
After the patient undergoes a diagnostic test for Disease X and the test
results come back positive, we want to update our belief about the probability
of the patient having the disease. Using Bayes' theorem, we calculate the
posterior probability of the patient having Disease X given the positive test
result. Let's say the likelihood of a positive test result given that the patient
has Disease X is 0.95, and the likelihood of a positive test result given that the
patient does not have Disease X (false positive rate) is 0.10. Using Bayes'
theorem, we update our prior probability to calculate the posterior
probability:
P(Disease X | Positive Test) = P(Positive Test | Disease X)×P(Disease X) /
P(Positive Test)
3. Likelihood:
The likelihood represents the probability of observing the evidence or data
given that a particular hypothesis or event is true. It measures how well the
Hypothesis explains the observed data. i.e, The likelihood quantifies how well
the observed data supports the hypothesis or event A.
The likelihood plays a crucial role in Bayesian inference as it helps update
the prior probability to the posterior probability.
For Example:
The likelihood represents the probability of observing the evidence (test
results) given the hypothesis (presence or absence of Disease X). In our
example, the likelihood of a positive test result given that the patient has
Disease X is 0.95, and the likelihood of a positive test result given that the
patient does not have Disease X (false positive rate) is 0.10.
In summary, the prior probability represents our initial belief about the
likelihood of an event (presence of Disease X), the posterior probability
represents our updated belief after observing new evidence (positive test
result), and the likelihood represents the probability of observing the evidence
given the hypothesis. These concepts are fundamental to Bayesian inference
and help us make informed decisions in uncertain situations, such as medical
diagnosis.
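For reference, the Naive Bayes classification rule that the following lines refer to can be written (in its standard form) as:

P(y | x1, x2, …, xn) = P(y) × P(x1 | y) × P(x2 | y) × … × P(xn | y) / P(x1, x2, …, xn)

and the predicted class is the y that maximizes the numerator:

y = argmax over y of P(y) × P(x1 | y) × P(x2 | y) × … × P(xn | y)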
Here, P(x1, x2, x3, ……, xn) is common for all the classes or records in the
data set, so we can ignore it.
This equation finds the value of y that maximizes the expression on the RHS.
In other words, it returns the class label y that has the highest probability
given the input features x1, x2, x3, …, xn.
For example: Suppose we have a binary classifier for Spam or Not Spam. If we
get a probability of 0.7 that the mail is Spam and 0.3 that the mail is Not
Spam, then we consider the maximum of the two, i.e. 0.7, which means the
mail is classified as Spam.
Example:
Suppose we have a dataset with two classes, "spam" (denoted as y = spam)
and "not spam" (denoted as y=not spam), and two features, x1 and x2.
We want to classify a new email with the following features:
x1=buy
x2=discount
Let's assume we have already calculated the following probabilities from our
training dataset:
1. Prior probabilities:
• P(spam)=0.4 // Consider it as 40% out of 100%
• P(not spam)=0.6 // Consider it as 60%
2. Likelihoods:
• P(buy∣spam)=0.8
• P(discount∣spam)=0.6
• P(buy∣not spam)=0.3
• P(discount∣not spam)=0.5
Now, let's plug these values into the Naive Bayes classifier equation (ignoring
the common denominator):
For y = spam: P(spam) × P(buy | spam) × P(discount | spam) = 0.4 × 0.8 × 0.6 = 0.192
For y = not spam: P(not spam) × P(buy | not spam) × P(discount | not spam) = 0.6 × 0.3 × 0.5 = 0.09
Comparing the two values, we see that y = spam gives the higher result
(0.192 > 0.09). Therefore, according to the Naive Bayes classifier, the
predicted class for the given features "buy" and "discount" is "spam".
2. Spam Filtering:
Spam filtering aims to automatically identify and filter out unwanted or
unsolicited emails (spam) from legitimate emails (ham).
The Naive Bayes classifier analyzes the content and features of emails, such
as words, sender information, and email headers, to determine the
probability of an email being spam or ham.
Example: Gmail's spam filter uses a Naive Bayes classifier to classify
incoming emails as spam or not spam based on various criteria.
Formula for the conditional probability of event A given that (|) event B has already occurred:
P(A | B) = P(A ∩ B) / P(B)
Here, P(A ∩ B) is the probability of both A and B occurring together, and P(B) is the probability of event B.
Formula for the conditional probability of event B given that (|) event A has already occurred:
P(B | A) = P(A ∩ B) / P(A)
Bayes' theorem is one of the most popular machine learning concepts: it helps
to calculate the probability of one event occurring, with uncertain knowledge,
when another related event has already occurred. It is stated as:
P(A | B) = P(B | A) × P(A) / P(B)
10 Mark Questions:
1. What is Bayes' theorem? Briefly explain Bayes' theorem and the various
terms associated with Bayesian theory, along with its derivation.
Formula for the conditional probability of event A given that (|) event B has already occurred:
P(A | B) = P(A ∩ B) / P(B)
Here, P(A ∩ B) is the probability of both A and B occurring together, and P(B) is the probability of event B.
Bayes' theorem is one of the most popular machine learning concepts: it helps
to calculate the probability of one event occurring, with uncertain knowledge,
when another related event has already occurred.
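The theorem follows from the conditional probability formulas above; a standard derivation is:

P(A | B) = P(A ∩ B) / P(B) and P(B | A) = P(A ∩ B) / P(A)
Both expressions contain the same joint probability P(A ∩ B), so
P(A ∩ B) = P(A | B) × P(B) = P(B | A) × P(A)
Dividing both sides by P(B) gives Bayes' theorem:
P(A | B) = P(B | A) × P(A) / P(B)
Here, P(A) is the prior probability, P(B | A) is the likelihood, P(B) is the evidence, and P(A | B) is the posterior probability.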
2. Posterior:
The probability that a particular hypothesis holds for a data set based on
the Prior is called the posterior probability or simply Posterior.
In the above example, the probability of the hypothesis that the patient
has a malignant tumour, considering the Prior of correctness of the malignancy
test, is a posterior probability.
The posterior probability represents our updated belief or probability of an
event or hypothesis being true after observing new evidence or data.
The posterior probability is calculated using Bayes' theorem, which
combines the prior probability, the likelihood, and the evidence or data.
The posterior probability reflects our updated understanding of the event
or hypothesis based on the observed evidence.
Example:
After the patient undergoes a diagnostic test for Disease X and the test
results come back positive, we want to update our belief about the probability
of the patient having the disease. Using Bayes' theorem, we calculate the
posterior probability of the patient having Disease X given the positive test
result. Let's say the likelihood of a positive test result given that the patient
has Disease X is 0.95, and the likelihood of a positive test result given that the
patient does not have Disease X (false positive rate) is 0.10. Using Bayes'
theorem, we update our prior probability to calculate the posterior
probability:
P(Disease X | Positive Test) = P(Positive Test | Disease X)×P(Disease X) /
P(Positive Test)
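As a worked illustration, assume (purely for illustration, since the value is not given above) a prior probability P(Disease X) = 0.01. Then:

P(Positive Test) = 0.95 × 0.01 + 0.10 × 0.99 = 0.0095 + 0.099 = 0.1085
P(Disease X | Positive Test) = (0.95 × 0.01) / 0.1085 ≈ 0.088

So even after a positive test, the posterior probability of the disease is only about 8.8%, because the disease is rare under the assumed prior.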
3. Likelihood:
The likelihood represents the probability of observing the evidence or data
given that a particular hypothesis or event is true. It measures how well the
Hypothesis explains the observed data. i.e, The likelihood quantifies how well
the observed data supports the hypothesis or event A.
The likelihood plays a crucial role in Bayesian inference as it helps update
the prior probability to the posterior probability.
For Example:
The likelihood represents the probability of observing the evidence (test
results) given the hypothesis (presence or absence of Disease X). In our
example, the likelihood of a positive test result given that the patient has
Disease X is 0.95, and the likelihood of a positive test result given that the
patient does not have Disease X (false positive rate) is 0.10.
In summary, the prior probability represents our initial belief about the
likelihood of an event (presence of Disease X), the posterior probability
represents our updated belief after observing new evidence (positive test
result), and the likelihood represents the probability of observing the evidence
given the hypothesis. These concepts are fundamental to Bayesian inference
and help us make informed decisions in uncertain situations, such as medical
diagnosis.
2. What is Naïve Bayes Classifier? Write the algorithm of Naive Bayes Classifier
with appropriate example.
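For reference, the Naive Bayes classification rule that the following lines refer to can be written (in its standard form) as:

P(y | x1, x2, …, xn) = P(y) × P(x1 | y) × P(x2 | y) × … × P(xn | y) / P(x1, x2, …, xn)

and the predicted class is the y that maximizes the numerator:

y = argmax over y of P(y) × P(x1 | y) × P(x2 | y) × … × P(xn | y)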
Here, P(x1, x2, x3, ……, xn) is common for all the classes or records in the
data set, so we can ignore it.
This equation finds the value of y that maximizes the expression on the RHS.
In other words, it returns the class label y that has the highest probability
given the input features x1, x2, x3, …, xn.
For example: Suppose we have a binary classifier for Spam or Not Spam. If we
get a probability of 0.7 that the mail is Spam and 0.3 that the mail is Not
Spam, then we consider the maximum of the two, i.e. 0.7, which means the
mail is classified as Spam.
Example:
Suppose we have a dataset with two classes, "spam" (denoted as y = spam)
and "not spam" (denoted as y=not spam), and two features, x1 and x2.
We want to classify a new email with the following features:
x1=buy
x2=discount
Let's assume we have already calculated the following probabilities from our
training dataset:
1. Prior probabilities:
• P(spam)=0.4 // Consider it as 40% out of 100%
• P(not spam)=0.6 // Consider it as 60%
2. Likelihoods:
• P(buy∣spam)=0.8
• P(discount∣spam)=0.6
• P(buy∣not spam)=0.3
• P(discount∣not spam)=0.5
Now, let's plug these values into the Naive Bayes classifier equation (ignoring
the common denominator):
For y = spam: P(spam) × P(buy | spam) × P(discount | spam) = 0.4 × 0.8 × 0.6 = 0.192
For y = not spam: P(not spam) × P(buy | not spam) × P(discount | not spam) = 0.6 × 0.3 × 0.5 = 0.09
Comparing the two values, we see that y = spam gives the higher result
(0.192 > 0.09). Therefore, according to the Naive Bayes classifier, the
predicted class for the given features "buy" and "discount" is "spam".
Classification :
6. Training:
The learning algorithm identified in the previous step is run on the gathered
training set for further fine tuning. Some supervised learning algorithms
require the user to determine specific control parameters (which are given as
inputs to the algorithm). These parameters (inputs given to algorithm) may
also be adjusted by optimizing performance on a subset (called as validation
set) of the training set.
7. Evaluation with the Test Data Set:
The test data is run through the trained algorithm, and its performance is
measured here. If a suitable result is not obtained, further training or tuning
of parameters may be required.
Note:
When we are trying to predict a categorical or nominal variable, the
problem is known as a classification problem. A classification problem is one
where the output variable is a category such as ‘red’ or ‘blue’ or ‘malignant
tumour’ or ‘benign tumour’ etc. Whereas when we are trying to predict a
numerical variable such as ‘price’, ‘weight’, etc. the problem falls under the
category of regression.
Note: The best-fit line is drawn based on the slope (and intercept) of the line.
Logistic Regression:
Logistic regression is a versatile technique that serves both classification
and regression tasks, depending on the context in which it is applied. It's
primarily utilized as a classification method and is often referred to as logit
regression.
This statistical approach is employed for predicting the outcome of a
categorical dependent variable. In logistic regression, the dependent variable
(Y) typically takes on binary values (0 or 1), representing two possible
outcomes. Meanwhile, the independent variables (X) are continuous in
nature, providing predictive features for the model.
Logistic regression is used when our dependent variable is dichotomous
or binary. It just means a variable that has only 2 outputs, for example, A
person will survive this accident or not, The student will pass this exam or
not. The outcome can either be yes or no (2 outputs). This regression
technique is similar to linear regression and can be used to predict
the Probabilities for classification problems. Like all regression analyses,
logistic regression is a predictive analysis.
Also, instead of fitting a line to the data, logistic regression fits an "S"
shaped "Logistic Function" or "Sigmoid Function".
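A minimal sketch, assuming scikit-learn's LogisticRegression and small illustrative data (hours studied vs. pass/fail):

import numpy as np
from sklearn.linear_model import LogisticRegression

# hours studied (X) vs. pass/fail outcome (y) -- illustrative values
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[2.2], [4.2]]))        # predicted classes (0 or 1)
print(clf.predict_proba([[2.2], [4.2]]))  # probabilities from the sigmoid function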
4. Discuss the SVM model in detail with different scenarios
Support Vector Machines:
Support Vector Machines (SVM) is a powerful supervised machine learning
algorithm used for classification and regression tasks. The primary objective
of SVM is to find the optimal hyperplane that best separates different classes
in the feature space.
The goal of the SVM algorithm is to create the Best line or decision
boundary that can segregate the n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
SVM is a model, which can do linear classification as well as regression.
SVM is based on the concept of a surface, called a hyperplane, which draws a
boundary between data instances plotted in the multi-dimensional feature
space. The output prediction of an SVM is one of two conceivable classes
which are already defined in the training data. In summary, the SVM algorithm
builds an N-dimensional hyperplane model that assigns future instances into
one of the two possible output classes.
Note:
The SVM model does not depend on a single hyperplane alone. In addition to
the separating hyperplane, it creates two more parallel hyperplanes: one
passing through the nearest point of one class or category, and the other
passing through the nearest point of the other class or category.
The goal of the SVM analysis is to find a plane, or rather a hyperplane, which
separates the instances on the basis of their classes. New examples (i.e. new
instances) are then mapped into that same space and predicted to belong to a
class on the basis of which side of the gap the new instance will fall on. In
summary, in the overall training process, the SVM algorithm analyses input
data and identifies a surface in the multidimensional feature space called the
hyperplane. There may be many possible hyperplanes, and one of the
challenges with the SVM model is to find the optimal hyperplane.
Important Terminologies:
1. Marginal Plane:
In Support Vector Machines (SVM), the marginal plane refers to the
hyperplane or decision boundary that maximizes the margin between the
support vectors of different classes. The margin is the distance between the
decision boundary and the closest data points (support vectors) from each
class.
2. Support Vectors: The data points that lie on (pass through) the marginal
planes are called support vectors. There can be more than one support vector
per class.
3. Marginal Distance: In Support Vector Machines (SVM), the marginal
distance refers to the distance between the data point and the decision
boundary (hyperplane) of the SVM model.
The higher the marginal distance, the better the model generalizes.
Scenario 2:
As depicted in figure below, we have three hyperplanes: A, B, and C. We have
to identify the correct hyperplane which classifies the triangles and circles in
the best possible way. Here, maximizing the distances between the nearest
data points of both the classes and hyperplane will help us decide the correct
hyperplane. This distance is called as margin.
We can see that the margin for hyperplane A is high as compared to those for
both B and C. Hence, hyperplane A is the correct hyperplane.
Another quick reason for selecting the hyperplane with higher margin
(distance) is robustness. If we select a hyperplane having a lower margin
(distance), then there is a high probability of misclassification.
Scenario 3:
When we use the rules discussed in the previous section to identify the
correct hyperplane in the scenario shown in the figure, there may be a chance of
selecting hyperplane B as it has a higher margin (distance from the class) than
A. But, here is the catch; SVM selects the hyperplane which classifies the
classes accurately before maximizing the margin. Here, hyperplane B has a
classification error, and A has classified all data instances correctly. Therefore,
A is the correct hyperplane.
Scenario 4:
In this scenario, as shown in Figure a, it is not possible to distinctly segregate
the two classes by using a straight line, as one data instance belonging to one
of the classes (triangle) lies in the territory of the other class (circle) as an
outlier. One triangle at the other end is like an outlier for the triangle class.
SVM has a feature to ignore outliers and find the hyperplane that has the
maximum margin. Hence, we can say that SVM is robust to outliers. So the
hyperplane in Figure b is considered the final hyperplane.
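A minimal sketch, assuming scikit-learn's SVC with a linear kernel and small illustrative data:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3],      # class 0 ("circles")
              [6, 5], [7, 8], [8, 6]])     # class 1 ("triangles")
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0)          # C controls tolerance to outliers/misclassification
clf.fit(X, y)

print(clf.support_vectors_)                # the support vectors defining the marginal planes
print(clf.predict([[4, 4], [7, 7]]))       # new points are classified by which side of the hyperplane they fall on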
5. Explain the concept of KNN with an example
k-Nearest Neighbour (kNN)
# The imports and the NearestNeighbors model were not shown in the original
# fragment; they are added here (assuming scikit-learn) so the code runs end to end.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

A = np.array(
    [
        [3.1, 2.3],
        [2.3, 4.2],
        [3.9, 3.5],
        [3.7, 6.4],
        [4.8, 1.9],
        [8.3, 3.1],
        [5.2, 7.5],
        [4.8, 4.7],
        [3.5, 5.1],
        [4.4, 2.9],
    ]
)

plt.figure()
plt.title('Input data')
plt.scatter(A[:, 0], A[:, 1], marker='x', s=50, color='red')
# A[:, 0] means take all rows of the data set and only the 1st column
# A[:, 1] means take all rows of the data set and only the 2nd column

# Find the nearest neighbours for the new data point [5.2, 2.9]
test_data = [5.2, 2.9]
k = 3  # number of neighbours to consider (assumed value)
knn_model = NearestNeighbors(n_neighbors=k)
knn_model.fit(A)
distances, indices = knn_model.kneighbors([test_data])

plt.figure()
plt.title('Nearest neighbors')
plt.scatter(A[:, 0], A[:, 1], marker='x', s=100, color='red')
plt.scatter(test_data[0], test_data[1], marker='x', s=100, color='blue')
# Circle the k nearest neighbours of the test point
plt.scatter(A[indices[0]][:, 0], A[indices[0]][:, 1],
            marker='o', s=200, facecolors='none', edgecolors='green')
plt.show()
1. Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. Logistic Regression
Classification:
3. Decision Tree
Decision tree learning is one of the most widely adopted algorithms
for classification. As the name indicates, it builds a model in the form of a
tree structure.
It has a hierarchical tree structure consisting of a root node,
branches, internal nodes, and leaf nodes. Decision trees are used for
classification and regression tasks, providing easy-to-understand models.
Its classification accuracy is comparable with other techniques, and it is
highly efficient. A decision tree is used for multi-dimensional
analysis with multiple classes. It is characterized by fast execution time and
ease in the interpretation of the rules.
4. Random Forest:
Random forest is an ensemble classifier, i.e. a combining classifier
that uses and combines many decision tree classifiers. It is also used for
Regression.
Random Forest combines the output of multiple decision trees to
reach a single result. It combines the opinions of many “trees” i.e,
individual models to make better predictions, creating a more robust and
accurate overall model.
Regression:
1. Linear Regression
Simple linear regression:
As the name indicates, simple linear regression is the simplest
regression model which involves only one independent variable or
predictor and only one Dependent Variable or the Response Variable.
This model assumes a linear relationship between the dependent
variable and the predictor variable, i.e. the relationship between the
dependent and independent variable is linear: the model can be written as
ŷ = a + b·x, where a is the intercept and b is the slope. Here, linear means
that when the value of the independent variable increases or decreases, the
value of the dependent variable changes at a constant rate in response.
2. Multiple Linear Regression:
In a multiple regression model, two or more independent variables,
i.e. predictors are involved in the model. In the context of simple linear
regression, we considered Price of a Property as the dependent variable
and the Area of the Property (in sq. m.) as the predictor variable.
However, location, floor, number of years since purchase, amenities
available, etc. are also important predictors which should not be ignored.
Thus, if we consider Price of a Property (in ₹) as the dependent variable and
Area of the Property (in sq.m.), location, floor, number of years since
purchase and amenities available as the independent variables, we can
form a multiple regression equation as shown below:
Price_property = f(Area_property, Location, Floor, Ageing, Amenities)
The simple linear regression model and the multiple regression
model assume that the dependent variable is continuous.
3. Polynomial Regression:
A simple linear regression algorithm only works when the relationship
in the data is linear. If we have non-linear data, linear regression will not
be able to draw a best-fit line, and simple regression analysis fails in such
conditions. Consider the diagram below, which shows a non-linear
relationship: the linear regression fit does not perform well, i.e. it does
not come close to reality. Hence, we introduce polynomial regression to
overcome this problem; it helps identify the curvilinear relationship
between the independent and dependent variables by fitting a model of
the form ŷ = b0 + b1·x + b2·x² + … + bn·xⁿ.
4. Logistic Regression:
Logistic regression is a versatile technique that serves both
classification and regression tasks, depending on the context in which it is
applied. It's primarily utilized as a classification method and is often
referred to as logit regression.
This statistical approach is employed for predicting the outcome of a
categorical dependent variable. In logistic regression, the dependent
variable (Y) typically takes on binary values (0 or 1), representing two
possible outcomes. Meanwhile, the independent variables (X) are
continuous in nature, providing predictive features for the model.
Logistic regression is used when our dependent variable is
dichotomous or binary. It just means a variable that has only 2 outputs, for
example, A person will survive this accident or not, The student will pass
this exam or not. The outcome can either be yes or no (2 outputs). This
regression technique is similar to linear regression and can be used to
predict the Probabilities for classification problems. Like all regression
analyses, logistic regression is a predictive analysis.
10 Mark Questions:
Classification Learning Algorithms: (Pick one algorithm from this list and write the Python code)
1. KNN
2. SVM
3. Decision Tree
4. Random Forest
Regression Learning Algorithms: (Pick one algorithm from this list and write the Python code)
1. Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. Logistic Regression
Here I will choose KNN for Classification and Linear Regression for Regression:
# The imports and the NearestNeighbors model were not shown in the original
# fragment; they are added here (assuming scikit-learn) so the code runs end to end.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

A = np.array(
    [
        [3.1, 2.3],
        [2.3, 4.2],
        [3.9, 3.5],
        [3.7, 6.4],
        [4.8, 1.9],
        [8.3, 3.1],
        [5.2, 7.5],
        [4.8, 4.7],
        [3.5, 5.1],
        [4.4, 2.9],
    ]
)

plt.figure()
plt.title('Input data')
plt.scatter(A[:, 0], A[:, 1], marker='x', s=50, color='red')
# A[:, 0] means take all rows of the data set and only the 1st column
# A[:, 1] means take all rows of the data set and only the 2nd column

# Find the nearest neighbours for the new data point [5.2, 2.9]
test_data = [5.2, 2.9]
k = 3  # number of neighbours to consider (assumed value)
knn_model = NearestNeighbors(n_neighbors=k)
knn_model.fit(A)
distances, indices = knn_model.kneighbors([test_data])

plt.figure()
plt.title('Nearest neighbors')
plt.scatter(A[:, 0], A[:, 1], marker='x', s=100, color='red')
plt.scatter(test_data[0], test_data[1], marker='x', s=100, color='blue')
# Circle the k nearest neighbours of the test point
plt.scatter(A[indices[0]][:, 0], A[indices[0]][:, 1],
            marker='o', s=200, facecolors='none', edgecolors='green')
plt.show()
2. Linear Regression
Simple linear regression:
As the name indicates, simple linear regression is the simplest regression
model which involves only one independent variable or predictor and only
one Dependent Variable or the Response Variable.
This model assumes a linear relationship between the dependent variable
and the predictor variable, i.e. the relationship between the dependent and
independent variable is linear: ŷ = a + b·x, where a is the intercept and b is
the slope. Here, linear means that when the value of the independent variable
increases or decreases, the value of the dependent variable changes at a
constant rate in response.
Note:
When we are trying to predict a categorical or nominal variable, the
problem is known as a classification problem. A classification problem is one
where the output variable is a category such as ‘red’ or ‘blue’ or ‘malignant
tumour’ or ‘benign tumour’ etc. Whereas when we are trying to predict a
numerical variable such as ‘price’, ‘weight’, etc. the problem falls under the
category of regression.
X = [[2001, 5.2], [2002, 5.1], [2003, 5.1], [2004, 4.9], [2005, 5.0],
     [2006, 5.1], [2007, 5.4], [2008, 5.6], [2009, 5.9], [2010, 5.8],
     [2011, 6.2], [2012, 6.0], [2013, 5.8], [2014, 6.1], [2015, 6.4],
     [2016, 6.6], [2017, 6.6], [2018, 6.8], [2019, 6.85], [2020, 5.9]]
Y = [2.5, 2.52, 2.54, 2.48, 2.52, 2.54, 2.55, 2.7, 2.9, 3.2,
     3.16, 3.28, 3.2, 3.15, 3.26, 3.29, 3.17, 3.25, 3.29, 3.18]
len(X), len(Y)
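The fitting step is not shown in the original fragment; a minimal sketch, assuming scikit-learn's LinearRegression, of training on the X and Y values above:

import numpy as np
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(np.array(X), np.array(Y))

print(model.coef_, model.intercept_)   # learned coefficients and intercept
print(model.predict([[2021, 6.5]]))    # prediction for a hypothetical new input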
Aspect | Supervised Learning | Unsupervised Learning
Definition | Learning from labeled data, where input-output pairs are given. | Learning from unlabeled data, where the algorithm must infer patterns without explicit output labels.
Types | Classification, Regression. | Clustering, Dimensionality Reduction, Association.
Usage | Used for off-line analysis of data. | Used for real-time analysis of data.
Examples of Algorithms | Decision Trees, Support Vector Machines, Neural Networks. | K-Means, Hierarchical Clustering, Principal Component Analysis (PCA).
2. Confidence:
Confidence measures the reliability of the inference made by an association rule. It is
the probability of seeing the consequent (B) in a transaction given that the
transaction also contains the antecedent (A). A high confidence indicates a strong
association between the antecedent and consequent.
For example, let's calculate the confidence for the rule {Milk} → {Bread}.
Confidence({Milk} → {Bread}) = (Number of transactions containing {Milk,
Bread}) / (Number of transactions containing {Milk})
Confidence({Milk} → {Bread}) = 3 / 4 = 0.75
This means that the confidence for the rule {Milk} → {Bread} is 0.75,
indicating that in 75% of the transactions where Milk is purchased, Bread is also
purchased.
Example: Apply Market Basket Analysis to the below transactions:
# market data
transactions = [('butter', 'milk', 'bread'),
('butter', 'milk', 'apple'),
('bread', 'milk', 'banana'),
('milk','bread','butter')]
The Apriori algorithm is a classic algorithm used for association rule mining
in data mining and machine learning. It is particularly useful for discovering
frequent itemsets in transactional datasets and extracting association rules
between items.
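A minimal sketch of applying the Apriori algorithm to the transactions listed above, assuming the mlxtend library (the library used in the original is not shown):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# the market data from the example above
transactions = [['butter', 'milk', 'bread'],
                ['butter', 'milk', 'apple'],
                ['bread', 'milk', 'banana'],
                ['milk', 'bread', 'butter']]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_array, columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules with confidence >= 75%
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.75)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])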
Applications of the Apriori algorithm:
2. E-commerce Recommendations:
E-commerce platforms leverage the Apriori algorithm to generate personalized
product recommendations for users based on their browsing and purchase
history.
By identifying frequent itemsets in historical transaction data, e-commerce
websites can recommend related or complementary products to users,
enhancing their shopping experience and increasing sales.
3. Inventory Management:
In inventory management, the Apriori algorithm can assist in optimizing stock
levels and inventory replenishment strategies.
By analyzing transaction data and identifying frequently co-purchased items,
businesses can better predict demand for certain products and ensure that
they have adequate stock on hand to meet customer needs.
6. Fraud Detection:
The Apriori algorithm can be used in fraud detection applications to identify
patterns of fraudulent behavior in financial transactions or insurance claims
data.
By detecting frequent combinations of suspicious activities or transactions,
organizations can implement preventive measures and mitigate the risk of
fraud.
Algorithm Implementation:
# Note: X (the list of [height, age] data points) and model (the fitted clustering
# model from the earlier steps) are assumed to be defined in Steps 1-4, which are
# not shown in this fragment.
x1 = []  # height
x2 = []  # age
for item in X:
    x1.append(item[0])
    x2.append(item[1])
print(x1)
print(x2)

# Step-5: scatter plot of the points, coloured by the cluster label assigned by the model
import matplotlib.pyplot as plt
plt.scatter(x1, x2, c=model.labels_)
plt.show()
The cost is computed as the total distance of every data point from the medoid
of its cluster:
Cost = Σ over clusters Ci Σ over points Pi in Ci | Ci − Pi |
Where,
Ci : Cluster number (C1, C2, ….. , Cn), which is nothing but the medoid
Pi : a data point assigned to that cluster
| | : absolute value, to consider only positive values.
Step-4: Swap one medoid point with a non-medoid point and recalculate the
cost.
Step-5: If the calculated cost with the new medoid point is greater than the
previous cost, we undo the swap and the algorithm converges; else, we repeat
Step-4.
7. Explain the concept of DBSCAN algorithm in unsupervised learning
1. Dense Area:
A dense area in the dataset is a region where there is a high concentration
of data points. In a dense area, data points are closely packed together, and
there are relatively many data points within a small area. Dense areas often
correspond to clusters in clustering algorithms, as they represent regions
where the data points share similar characteristics or properties.
2. Sparse Area:
A sparse area in the dataset is a region where there is a low concentration
of data points. In a sparse area, data points are more sparsely distributed, and
there are relatively few data points within a given area. Sparse areas often
occur between clusters or in regions of the feature space where there is little
or no data present.
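A minimal sketch, assuming scikit-learn's DBSCAN and small illustrative data with two dense areas and one point in a sparse area:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],    # a dense area (first cluster)
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.0],    # another dense area (second cluster)
              [4.5, 15.0]])                          # an isolated point in a sparse area

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)   # points in dense areas get cluster ids (0, 1, ...); noise points get -1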
10 Mark Questions:
Algorithm Implementation:
# Note: X (the list of [height, age] data points) and model (the fitted clustering
# model from the earlier steps) are assumed to be defined in Steps 1-4, which are
# not shown in this fragment.
x1 = []  # height
x2 = []  # age
for item in X:
    x1.append(item[0])
    x2.append(item[1])
print(x1)
print(x2)

# Step-5: scatter plot of the points, coloured by the cluster label assigned by the model
import matplotlib.pyplot as plt
plt.scatter(x1, x2, c=model.labels_)
plt.show()
k-Medoids Cluster:
The k-Medoids algorithm is a variation of the k-Means algorithm that
focuses on finding representative objects or medoids in the dataset to form
clusters. Instead of using the mean or centroid of the data points within a
cluster, the k-Medoids algorithm selects actual data points or medoids as
cluster representatives. This makes the k-Medoids algorithm more robust to
outliers compared to k-Means, as it directly uses data points as cluster centers
rather than relying on the mean, which can be sensitive to outliers.
Because of the use of medoids from the actual representative data points,
k-medoids is less influenced by the outliers in the data. One of the practical
implementations of the k-medoids principle is the Partitioning Around
Medoids (PAM) algorithm.
What do you mean by medoids?
A medoid in a data set is a central point within a cluster that minimizes the
sum of distances to the other points in that cluster.
Outliers:
Outliers refer to data points that significantly differ from other
observations in a dataset.
For Example in a dataset of student exam scores where most students score
between 60 and 90, but there is one student who scores 10. This score of 10
would be considered an outlier because it significantly differs from the rest of
the scores in the dataset.
The cost is computed as the total distance of every data point from the medoid
of its cluster:
Cost = Σ over clusters Ci Σ over points Pi in Ci | Ci − Pi |
Where,
Ci : Cluster number (C1, C2, ….. , Cn), which is nothing but the medoid
Pi : a data point assigned to that cluster
| | : absolute value, to consider only positive values.
Step-4: Swap one medoid point with a non-medoid point and recalculate the
cost.
Step-5: If the calculated cost with the new medoid point is greater than the
previous cost, we undo the swap and the algorithm converges; else, we repeat
Step-4.
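A minimal sketch of k-Medoids clustering, assuming the optional scikit-learn-extra package (sklearn_extra) and small illustrative data containing an outlier:

import numpy as np
from sklearn_extra.cluster import KMedoids

X = np.array([[1, 2], [2, 1], [1, 1],        # illustrative data points
              [8, 8], [9, 8], [8, 9],
              [25, 80]])                      # an outlier; medoids are less affected by it

kmedoids = KMedoids(n_clusters=2, random_state=0).fit(X)
print(kmedoids.labels_)            # cluster label of each point
print(kmedoids.cluster_centers_)   # the medoids are actual data points from X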
UNIT 5: NEURAL NETWORKS
7 Mark Questions:
Deep Learning:
Deep learning is a branch of machine learning which is completely based
on artificial neural networks, as neural network is going to mimic the human
brain so deep learning is also a kind of mimic of human brain.
In deep learning, we don’t need to explicitly program everything. The
concept of deep learning is not new; it has been around for a couple of years
now. It is hyped nowadays because earlier we did not have that much
processing power or that much data. As processing power has increased
exponentially over the last 20 years, deep learning and machine learning have
come into the picture.
Deep learning deals with algorithms inspired by the structure and function
of the brain's neural networks. It aims to mimic the way humans learn and
process information, enabling computers to learn from large amounts of data
and make predictions or decisions without being explicitly programmed.
Architectures in Deep Learning:
Deep learning encompasses a wide range of neural network architectures,
each designed to solve specific types of problems and address various
challenges in machine learning. These architectures differ in their structure,
connectivity, and functionality, allowing them to excel in different domains
and tasks.
Here are some popular Techniques in deep learning:
1. Deep Neural Network :
2. Convolutional Neural Networks (CNNs):
3. Recurrent Neural Networks (RNNs):
4. Deep Belief Network(DBN)
In the multi-layer perceptron diagram above, we can see that there are
three inputs and thus three input nodes and the hidden layer has three
nodes. The output layer gives two outputs, therefore there are two output
nodes. The nodes in the input layer take input and forward it for further
process, in the diagram above the nodes in the input layer forwards their
output to each of the three nodes in the hidden layer, and in the same way,
the hidden layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation
function. The sigmoid activation function takes real values as input and
converts them to numbers between 0 and 1 using the sigmoid formula.
A basic perceptron works very successfully for data sets which
possess linearly separable patterns. This is the philosophy used to design the
multi-layer perceptron model.
The major highlights of this model are as follows:
1. The neural network contains one or more intermediate layers between the
input and the output nodes, which are hidden from both input and output
nodes.
2. Each neuron in the network includes a non-linear activation function that
is differentiable.
3. The neurons in each layer are connected with some or all the neurons in
the previous layer.
The diagram in the figure below resembles a fully connected multi-layer
perceptron with multiple hidden layers between the input and output layers. It
is called fully connected because any neuron in any layer of the perceptron is
connected with all neurons (or input nodes in the case of the first hidden
layer) in the previous layer. The signals flow from one layer to another layer
from left to right.
• In this definition, x represents the input to the function, and the threshold
is a predetermined value that separates the two classes.
• The binary sigmoid function is defined as f(x) = 1 / (1 + e^(−kx)), where k is
the steepness or slope parameter of the sigmoid function. By varying the
value of k, sigmoid functions with different slopes can be obtained. It has a
range of (0, 1).
• The binary sigmoid function is commonly used in binary classification tasks,
where the goal is to classify inputs into one of two categories.
• The slope at the origin is k/4. As the value of k becomes very large, the
sigmoid function becomes a threshold function
• Like the binary sigmoid function, the bipolar sigmoid function,
f(x) = (1 − e^(−kx)) / (1 + e^(−kx)), produces output values bounded
between -1 and 1.
• The bipolar sigmoid function is useful in contexts where inputs and outputs
are naturally signed, such as in certain types of neural networks or signal
processing applications.
• It can also be advantageous in scenarios where the mean of the inputs is
close to zero, as it allows the network to capture both positive and
negative information.
6. Discuss the various types of activation functions in neural networks.
Activation Function:
• The Activation Function is applied over the net input i.e, ysum to calculate the output
of an ANN.
• The activation function is a mathematical "Gate" in between the input feeding in the
current node and it's output to the next layer.
1. Identity Function:
The identity function, also known as the "linear activation function," is a simple
mathematical function commonly used as an activation function for the input layer of
neural networks.
Unlike other activation functions that introduce non-linearity to the model, the identity
function preserves the original input values, resulting in a linear relationship between the
input and output.
y_out = f(x) = x, for all x
2. Step function:
The threshold function, also known as the step function or Heaviside step function, is a
simple mathematical function commonly used in artificial neural networks as an activation
function.
It is a binary function that outputs one of two possible values based on whether the
input is greater than or equal to a specified threshold.
Mathematical Form: f(x) = 1, if x ≥ θ; f(x) = 0, if x < θ
Note:
θ represents a threshold value that determines the point at which the function
transitions from one state to another.
Here, θ acts as the boundary or threshold. If the input x is greater than or equal to θ, the
function outputs 1. If x is less than θ, the function outputs 0.
4. ReLU (Rectified Linear Unit) function:
The Rectified Linear Unit (ReLU) function is a popular activation function used in
artificial neural networks, particularly in deep learning models.
It introduces non-linearity to the network by outputting the input directly if it is
positive, and zero otherwise. Mathematically, the ReLU function can be defined as:
f(x)=max(0,x)
Where,
• For any input x, if x is greater than zero, the function outputs x.
• If x is less than zero, the function outputs zero.
Graphically, the ReLU function appears as a linear function with a positive slope for
positive input values, and it flattens out to zero for negative input values.
5. Sigmoid Function or Logistic Function:
The sigmoid function is a common activation function used in artificial neural networks.
It is a smooth, S-shaped function that squashes or compresses the input values into the
range between 0 and 1.
The sigmoid function is particularly useful for binary classification tasks, where the goal
is to produce a probability score indicating the likelihood of an input belonging to one of
two classes.
There are two types of sigmoid function:
1. Binary sigmoid function
2. Bipolar sigmoid function
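A minimal NumPy sketch of the activation functions described above (the steepness parameter k and the sample inputs are illustrative):

import numpy as np

def identity(x):                 # linear / identity: output equals input
    return x

def step(x, theta=0.0):          # threshold / step function: 1 if x >= theta, else 0
    return np.where(x >= theta, 1, 0)

def relu(x):                     # ReLU: max(0, x)
    return np.maximum(0, x)

def binary_sigmoid(x, k=1.0):    # binary sigmoid with steepness k, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-k * x))

def bipolar_sigmoid(x, k=1.0):   # bipolar sigmoid, output in (-1, 1)
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(x), relu(x), binary_sigmoid(x), bipolar_sigmoid(x))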
Backpropagation :
Backpropagation is a fundamental algorithm used for training artificial
neural networks, including Multi-layer Perceptron’s and other deep learning
models.
In 1986, an efficient method of training an ANN was discovered. In this
method, errors, i.e. difference in output values of the output layer and the
expected values, are propagated back from the output layer to the preceding
layers. Hence, the algorithm implementing this method is known as
backpropagation, i.e. propagating the errors backward to the preceding
layers.
The backpropagation algorithm is applicable for multi-layer feed forward
networks. It is a supervised learning algorithm which continues adjusting the
weights of the connected neurons with an objective to reduce the deviation
of the output signal from the target output.
This algorithm consists of multiple iterations, also known as epochs.
Each epoch consists of two phases —
1. A forward phase in which the signals flow from the neurons in the input
layer to the neurons in the output layer through the hidden layers. The
weights of the interconnections and activation functions are used during
the flow. In the output layer, the output signals are generated.
2. A backward phase in which the output signal is compared with the
expected value. The computed errors are propagated backwards from the
output to the preceding layers. The errors propagated back are used to
adjust the interconnection weights between the layers.
The iterations continue till a stopping criterion is reached. The figure below
depicts a reasonably simplified version of the backpropagation algorithm.
Here's an overview of how backpropagation works:
1. Forward Pass:
During the forward pass, input data is passed through the network, layer
by layer, to produce a predicted output. Each layer applies a set of linear
transformations (matrix multiplications) and non-linear activation functions to
the input data.
2. Loss Calculation:
After the forward pass, the difference between the predicted outputs and
the actual targets (the ground truth) is computed using a loss function.
Common loss functions include mean squared error (MSE) for regression tasks
and cross-entropy loss for classification tasks.
3. Backward Pass (Backpropagation):
In the backward pass, gradients (derivatives) of the loss function with
respect to the network's parameters, such as weights and biases, are
computed recursively using the chain rule of calculus. Gradients are
computed layer by layer, starting from the output layer and moving backward
through the network.
4. Gradient Descent:
Once the gradients are computed, the network's parameters are updated
in the opposite direction of the gradient (i.e., descending along the gradient)
to minimize the loss function. This process is known as gradient descent. The
magnitude of the parameter updates is controlled by a learning rate
hyperparameter.
5. Iterative Training:
The forward pass, loss calculation, backward pass, and parameter updates
are repeated iteratively for multiple epochs (passes through the entire training
dataset). During training, the network's parameters gradually adjust to
minimize the error between predicted outputs and actual targets, improving
the network's performance on the task.
Backpropagation enables neural networks to learn complex patterns and
relationships in data by iteratively adjusting their parameters based on the
error feedback from the training data.
It is a key algorithm in the field of deep learning and has enabled the
development of powerful models for a wide range of tasks, including image
classification, natural language processing, and reinforcement learning.
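A tiny illustrative NumPy sketch (not the textbook's code) of these steps for a single sigmoid neuron trained by gradient descent on assumed toy data:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: 2 inputs per sample, binary target (illustrative values)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 1.0])      # OR-like targets

w = np.zeros(2)                          # weights
b = 0.0                                  # bias
lr = 0.5                                 # learning rate

for epoch in range(1000):
    # 1. Forward pass: weighted sum + activation
    y = sigmoid(X.dot(w) + b)
    # 2. Loss calculation (mean squared error)
    loss = np.mean((y - t) ** 2)
    # 3. Backward pass: gradients via the chain rule
    dL_dy = 2 * (y - t) / len(t)
    dy_dz = y * (1 - y)                  # derivative of the sigmoid
    delta = dL_dy * dy_dz
    grad_w = X.T.dot(delta)
    grad_b = delta.sum()
    # 4. Gradient descent: update parameters opposite to the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(loss, w, b)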
10 Mark Questions:
Each of the layers may have a varying number of neurons. For example, the
network shown above has ‘m’ neurons in the input layer and ‘r’ neurons in the
output layer, and there is only one hidden layer with ‘n’ neurons. The net signal
input to the k-th neuron in the hidden layer is the weighted sum of the signals
coming from all ‘m’ input neurons:
y_in_k = x1·w1k + x2·w2k + … + xm·wmk
Similarly, the net signal input to the k-th output neuron is the weighted sum of
the outputs of all ‘n’ hidden-layer neurons.
3. Competitive Network:
A Competitive Network is also known as a Self-Organizing Map (SOM) or Kohonen
Network. The competitive network is almost the same in structure as the single-layer
feed forward network.
The only difference is that the output neurons are connected with each other
(either partially or fully). The figure depicts a fully connected competitive network. In
competitive networks, for a given input, the output neurons compete amongst
themselves to represent the input. It represents a form of unsupervised learning algorithm
in ANN that is suitable to find clusters in a data set.
Competitive Network is a type of neural network used for unsupervised
learning and dimensionality reduction. It is commonly used for clustering and
visualization of high-dimensional data.
4. Recurrent Network:
We have seen that in feed forward networks, signals always flow from the input
layer towards the output layer (through the hidden layers in the case of multi-layer feed
forward networks), i.e. in one direction.
In the case of recurrent neural networks, there is a small deviation. There is a
feedback loop, as depicted in Figure from the neurons in the output layer to the input
layer neurons. There may also be self-loops.
2. Write a note on Artificial Neuron and Biological Neuron along with their
differences? Briefly explain the Learning process in Artificial Neural
Network?
Machine learning, as we have seen, mimics the human form of learning.
On the other hand, human learning, or for that matter every action of a human
being, is controlled by the nervous system.
In any human being, the nervous system coordinates the different actions
by transmitting signals to and from different parts of the body.
The nervous system is constituted of a special type of cell, called neuron or
nerve cell, which has special structures allowing it to receive or send signals to
other neurons. Neurons connect with each other to transmit signals to or
receive signals from other neurons. This structure essentially forms a network
of neurons or a Neural Network.
Note:
1. The CNS integrates all information, in the form of signals, from the
different parts of the body.
2. The peripheral nervous system, on the other hand, connects the CNS with
the limbs and organs.
3. Neurons are basic structural units of the CNS.
4. A neuron is able to receive, process, and transmit information in the form
of chemical and electrical signals.
1. Inputs:
An artificial neuron receives input signals from other neurons or external
sources. Each input is associated with a weight that represents the strength
or importance of that input signal to the neuron.
2. Weights:
The weights assigned to the inputs determine how much influence each
input has on the neuron's output. A higher weight amplifies the input signal's
contribution, while a lower weight diminishes it. The weights are parameters
of the neuron that are adjusted during the training process to optimize the
network's performance.
3. Summation:
The neuron computes a weighted sum of its inputs by multiplying each
input signal by its corresponding weight and summing up the results.
Mathematically, this can be represented as the dot product of the input vector
and weight vector, followed by adding a bias term
5. Output:
The output of the activation function represents the neuron's response to
the input signals. It can be interpreted as the neuron's activation level or
firing rate.
This output is either transmitted to other neurons as input or serves as the
final output of the neural network.
1. Number of Layers:
As we have seen, a neural network may have a single layer or multi-layer.
In the case of a single layer, a set of neurons in the input layer receives signal,
i.e. a single feature per neuron, from the data set. The value of the feature is
transformed by the activation function of the input neuron. The signals
processed by the neurons in the input layer are then forwarded to the
neurons in the output layer. The neurons in the output layer use their own
activation function to generate the final prediction.
More complex networks may be designed with multiple hidden layers
between the input layer and the output layer. Most of the multi-layer
networks are fully connected.