4. Computer Vision
This branch allows machines to see, interpret, and process visual information
from the world.
Applications:
o Facial recognition
o Object detection
o Medical image analysis
o Surveillance systems
5. Expert Systems
Expert systems are AI programs that simulate the decision-making ability of a
human expert.
Features:
o Knowledge base (facts and rules)
o Inference engine (logic to apply rules)
Example: Medical diagnosis systems, legal advisors
6. Robotics
Robotics combines AI with mechanical engineering to create intelligent
machines that can perform tasks in the real world.
Examples:
o Industrial robots
o Delivery drones
o Humanoid robots like Sophia
7. Fuzzy Logic
Fuzzy logic helps AI systems handle uncertain or imprecise information, unlike
classical logic that deals with true or false.
Used in:
o Climate control systems
o Automatic gear transmission
o Washing machines
8. Cognitive Computing
Cognitive computing aims to simulate human thought processes in a
computerized model. It uses AI and signal processing to mimic human brain
functioning.
Applications:
o Personalized learning
o Medical research analysis
Features of an AI Technique:
1. Efficiency: Should represent knowledge in a way that makes solving
problems fast.
2. Flexibility: Should handle a variety of situations, including unexpected
ones.
3. Generality: Should be applicable to many types of problems.
4. Correctness: Should give accurate results or good approximations.
1. Problem Representation:
State: Current configuration of the 3x3 board.
Initial State: Empty board.
Players: Maximizer (X) and Minimizer (O).
Moves: Placing a symbol in an empty cell.
Terminal States: Win, Lose, or Draw.
Utility Values:
o Win: +10 (for X), -10 (for O)
o Draw: 0
3. Example:
Suppose the game is midway through on a partially filled 3x3 board, and it is AI (X)'s turn to move.
The algorithm explores all empty spots.
Applies minimax on each possible future state.
Chooses the move that gives the best outcome for X (a win, or at least a draw), as in the sketch below.
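To make this concrete, here is a minimal Python sketch of minimax for Tic-Tac-Toe. It is an illustrative reconstruction, not code from the notes: the board encoding (a list of 9 cells) and all function names are our own choices.

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    # Return 'X' or 'O' if either player has completed a line, else None.
    for a, b, c in WINS:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, is_max):
    # Terminal states carry the utility values from the problem representation.
    w = winner(board)
    if w == 'X':
        return +10
    if w == 'O':
        return -10
    if all(cell is not None for cell in board):
        return 0  # draw
    player = 'X' if is_max else 'O'
    scores = []
    for i in range(9):
        if board[i] is None:
            board[i] = player
            scores.append(minimax(board, not is_max))
            board[i] = None  # undo the move
    return max(scores) if is_max else min(scores)

def best_move(board):
    # X (the maximizer) picks the empty cell with the highest minimax value.
    best, best_score = None, float('-inf')
    for i in range(9):
        if board[i] is None:
            board[i] = 'X'
            score = minimax(board, is_max=False)
            board[i] = None
            if score > best_score:
                best, best_score = i, score
    return best

print(best_move(['X','X',None, 'O','O',None, None,None,None]))  # 2: X completes the top row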
Advantages of KBS:
Provides expert-level solutions.
Available 24/7.
Reduces human error.
Speeds up decision-making.
Disadvantages:
Cannot learn automatically (unless combined with ML).
Needs regular updates.
Difficult to build a complete and correct knowledge base.
Question-4 What is PEAS? Explain different agent types with their PEAS
descriptions.
PEAS stands for:
Performance Measure, Environment, Actuators, Sensors
It is a framework used to describe an intelligent agent by clearly defining its
task environment. The PEAS model helps in designing and understanding
agents by specifying:
1. Performance Measure: What defines success for the agent?
2. Environment: The surroundings in which the agent operates.
3. Actuators: Devices the agent uses to take actions.
4. Sensors: Devices the agent uses to perceive the environment.
✅ 1. Simple Reflex Agent
Description: Acts directly on the current percept using condition-action rules.
Example: Vacuum cleaner robot
Performance Measure: Cleanliness, energy efficiency
Environment: Rooms with dirt and obstacles
Actuators: Wheels, vacuum, brushes
Sensors: Dirt sensor, bump sensor
✅ 3. Goal-Based Agent
Description: Takes actions to achieve a specific goal.
Example: Autonomous car reaching a destination
Performance Measure: Safe and fast arrival at destination
Environment: Roads, traffic, pedestrians
Actuators: Steering, brakes, accelerator
Sensors: Cameras, GPS, LIDAR, speedometer
✅ 4. Utility-Based Agent
Description: Chooses actions based on a utility function (preferences).
Example: Shopping recommendation system
Performance Measure: Customer satisfaction, sales growth
Environment: Online users, product catalog
Actuators: Product suggestions
Sensors: User behavior, preferences, purchase history
✅ 5. Learning Agent
Description: Can learn from past experiences and improve its behavior.
Example: Personalized virtual assistant (like Siri, Alexa)
Components of a Problem:
1. Initial State
o The starting point of the agent.
o Example: Robot at position A.
2. Actions / Successor Function
o The set of all possible actions that can be taken from a state.
o Also defines the result of each action.
3. State Space
o The set of all states reachable from the initial state by applying
sequences of actions.
4. Goal Test
o A function to determine whether the current state is a goal state
or not.
5. Path Cost
o The cost associated with a path from the initial state to the goal
state.
o Helps in finding an optimal solution.
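These components map directly onto code. Below is a hedged Python sketch for a toy "robot moves from A to D" problem; the map, action names, and step costs are invented purely for illustration.

problem = {
    "initial_state": "A",
    # Successor function: state -> list of (action, next_state, step_cost)
    "successors": {
        "A": [("right", "B", 1)],
        "B": [("right", "C", 1), ("jump", "D", 5)],
        "C": [("right", "D", 1)],
        "D": [],
    },
    "goal_test": lambda s: s == "D",
}

def path_cost(path):
    # Path cost: sum of step costs along a list of (action, state, cost) triples.
    return sum(cost for _, _, cost in path)

def state_space(problem):
    # The state space: every state reachable from the initial state.
    seen, frontier = set(), [problem["initial_state"]]
    while frontier:
        s = frontier.pop()
        if s not in seen:
            seen.add(s)
            frontier += [nxt for _, nxt, _ in problem["successors"][s]]
    return seen

print(state_space(problem))                              # {'A', 'B', 'C', 'D'}
print(path_cost([("right", "B", 1), ("jump", "D", 5)]))  # 6: not optimal (A-B-C-D costs 3)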
🔹 Example:
Let’s say:
"John is a student"
We can represent it as:
👉 Student(John)
Here, Student is the predicate.
John is the object (constant).
The sentence says that the predicate 'Student' is true for 'John'.
🔹 Relationship Example:
"Alice likes Bob"
Represented as:
👉 Likes(Alice, Bob)
Likes is a predicate showing a relationship between two people.
✅ 2) Quantifier
A quantifier is used to indicate the quantity (how many) of subjects the
predicate applies to.
There are two main types:
1. Universal Quantifier (∀): "For all" / "Every".
Example: "All humans are mortal" 👉 ∀x (Human(x) → Mortal(x))
2. Existential Quantifier (∃): "There exists" / "At least one".
Example: "There is a student who is smart" 👉 ∃x (Student(x) ∧ Smart(x))
🔹 Worked translations:
"There is a number less than 4"
✅ So, we write: ∃x (x < 4)
"For every x, there is a y such that y = x + 1"
"There is a y" → ∃y; "y = x + 1" stays the same.
✅ So, we write: ∀x ∃y (y = x + 1)
"There is a y such that, for every x, y = x + 1"
✅ So, we write: ∃y ∀x (y = x + 1)
(Note: It's correct symbolically, but doesn't make sense in real math.)
"All rational numbers are real numbers"
✅ So, we write: ∀x (Rational(x) → Real(x))
"All men are mortal"
✅ So, we write: ∀x (Man(x) → Mortal(x))
"Every number is either less than 4 or greater than or equal to 4"
✅ So, we write: ∀x (x < 4 ∨ x ≥ 4)
🔹 Universal Quantifier
Symbol: ∀
Meaning: "For all", "Every", or "Each"
It says that the statement is true for every possible value.
🔹 Example:
∀x P(x) means:
➡️"For every x, P(x) is true."
🔹 Existential Quantifier
Symbol: ∃
Meaning: "There exists", "At least one", or "Some"
It says the statement is true for at least one value.
🔹 Example:
∃x Q(x) means:
➡️"There is at least one x for which Q(x) is true."
Or in natural English (if Q(x) means "x is a teacher"):
"Some person is a teacher."
a) ∀x P(x)
Meaning: For every x, P(x) is true.
English sentence:
➡️"All x satisfy property P"
(Example: If P(x) means "x is honest", then this means:
"All people are honest.")
b) ∃x Q(x)
Meaning: There is at least one x for which Q(x) is true.
English sentence:
➡️"There exists an x such that Q(x) is true."
(Example: If Q(x) means "x is a doctor", then this means:
"There is at least one doctor.")
Question-10 Use Q(x) for "x is a rational number" and R(x) for "x is a real number".
Translate the following statements using quantifiers:
a) All rational numbers are real numbers.
This means: "If x is a rational number, then x is a real number."
Symbolic Translation: 👉 ∀x (Q(x) → R(x))
b) No rational numbers are real numbers.
This means: "If x is a rational number, then x is not a real number."
Symbolic Translation: 👉 ∀x (Q(x) → ¬R(x))
c) Some rational numbers are real numbers.
Symbolic Translation: 👉 ∃x (Q(x) ∧ R(x))
d) Some rational numbers are not real numbers.
Symbolic Translation: 👉 ∃x (Q(x) ∧ ¬R(x))
In the Hill Climbing algorithm, the process begins with an initial solution,
which is then iteratively improved by making small, incremental changes.
These changes are evaluated by a heuristic function to determine the quality
of the solution. The algorithm continues to make these adjustments until it
reaches a local maximum—a point where no further improvement can be
made with the current set of moves.
1. Local Maximum: A point that is higher than its neighboring points, but
not the highest overall. It’s a "small peak" that might not be the best
possible solution.
2. Global Maximum: The highest point in the entire diagram, representing
the best possible solution to the problem.
3. Plateau: A flat area where all nearby points have the same value. It’s
hard for the algorithm to figure out where to go next since all options
seem equally good.
4. Ridge: A long, sloped area that looks like a peak. The algorithm might
get stuck here, thinking it’s the highest point, even though better solutions
might be nearby.
5. Current State: This is where the algorithm is at any point during its
search, representing its current position in the diagram.
6. Shoulder: A flat area with a slight upward slope at one edge. If the
algorithm keeps going, it might find better solutions beyond the flat area.
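The loop itself is short. Here is a minimal steepest-ascent hill climbing sketch in Python; the 1-D landscape is made up to show the local-maximum trap described above, and all names are our own.

def hill_climb(start, neighbors, value, max_steps=1000):
    # Steepest ascent: move to the best neighbor while it improves the value.
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current  # local maximum or plateau: no better neighbor
        current = best
    return current

# Toy landscape: a local peak of height 4 near x = -2, a global peak of 9 near x = 3.
f = lambda x: -(x + 2) ** 2 + 4 if x < 0 else -(x - 3) ** 2 + 9
step_neighbors = lambda x: [x - 0.1, x + 0.1]

print(round(hill_climb(-5.0, step_neighbors, f), 1))  # -2.0: stuck on the local maximum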
https://youtu.be/dEs_kbvu_0s?si=ODAyDzAelw4qhz1R
The AO* (And-Or Star) algorithm is used for searching AND-OR graphs, where problems
can have:
OR nodes: Choose one of the branches (like in decision trees).
AND nodes: All child nodes must be solved together (like solving subproblems in
parallel).
It's typically used in problem-solving, automated planning, and heuristic-based AI
systems.
Question-7 What are the problems in hill climbing search methods due to which they may
fail to find the solutions?
Hill Climbing is a simple and commonly used local search algorithm that
continuously moves in the direction of increasing value (uphill) to find the
peak (optimal solution). However, it suffers from several limitations that
can prevent it from finding the best solution.
1. Local Maxima
The search reaches a peak that is higher than its neighbours but lower than the
global maximum, so it stops with a suboptimal solution.
2. Plateaus
A flat region where all neighbouring states have the same value; the algorithm
gets no guidance on which way to move.
3. Ridges
The optimal path is along a steep slope, but moving directly uphill
doesn’t lead there.
The algorithm can’t follow the ridge because it moves only in simple
directions (e.g., north, south, east, west).
🔹 Example: Trying to climb a mountain ridge but taking only straight steps;
you need diagonal or smarter moves.
5. Lack of Backtracking
Once it moves in a direction, it doesn’t remember past decisions or
explore other paths.
So it can't recover from a wrong turn.
Solutions/Improvements
To handle these issues, variants are used:
Stochastic Hill Climbing – picks a random uphill move.
First-choice Hill Climbing – chooses the first move that improves.
Simulated Annealing – allows some downhill moves to escape local
maxima (see the sketch after this list).
Beam Search – keeps track of multiple states at a time.
Genetic Algorithms – maintain a population of states.
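As a hedged illustration of the simulated annealing idea from this list, the acceptance rule can be sketched as below (it reuses the neighbors/value style of the previous sketch; the temperature schedule and constants are arbitrary).

import math, random

def simulated_annealing(start, neighbors, value, t0=10.0, cooling=0.995, steps=5000):
    # Unlike plain hill climbing, a worse neighbor is sometimes accepted with
    # probability exp(delta / T), which lets the search escape local maxima.
    current, t = start, t0
    for _ in range(steps):
        nxt = random.choice(neighbors(current))
        delta = value(nxt) - value(current)
        if delta > 0 or random.random() < math.exp(delta / t):
            current = nxt
        t *= cooling  # lower the temperature: fewer downhill moves over time
    return current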
⚠️Limitations of Sigmoid
Vanishing Gradient Problem:
For very high or low inputs, the gradient becomes near zero, slowing
down or stopping learning.
Outputs not zero-centered:
Causes gradients to zigzag and makes optimization slower.
Because of these issues, ReLU and its variants are now more popular in modern
deep learning architectures.
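A quick numeric check of the vanishing-gradient claim, using the standard identity that the sigmoid's derivative is σ(x)(1 − σ(x)):

import math

sigmoid = lambda x: 1 / (1 + math.exp(-x))
sigmoid_grad = lambda x: sigmoid(x) * (1 - sigmoid(x))  # derivative of sigmoid

for x in [0, 2, 5, 10]:
    print(x, round(sigmoid_grad(x), 6))
# 0 -> 0.25, 2 -> 0.104994, 5 -> 0.006648, 10 -> 4.5e-05: gradient vanishes for large |x|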
🧠 Structure of a Perceptron
Inputs (x₁ … xₙ): The feature values fed into the neuron.
Weights (w₁ … wₙ): The strength assigned to each input.
Bias (b): A constant that shifts the decision boundary.
Summation unit: Computes the weighted sum w·x + b.
Activation function: A step function that outputs 1 if the sum crosses the threshold, else 0.
⚠️Common Mistakes
Using too many layers without enough data → overfitting
Using too few neurons → underfitting
Ignoring validation performance → misleading accuracy
1. Dataset
A collection of data used to train or evaluate a model.
Types:
o Training Set: Used to train the model.
o Testing Set: Used to test the model’s accuracy.
o Validation Set (optional): Used to tune the model’s
hyperparameters.
2. Features
The input variables (attributes) that describe each example.
3. Label / Target
The output value the model should predict (used in supervised learning).
4. Model
A mathematical representation that maps inputs (features) to outputs
(predictions).
It is built by learning from the training data.
5. Algorithm
A method or set of rules used by the model to learn patterns from
data.
Examples: Linear Regression, Decision Trees, k-Nearest Neighbors (k-
NN).
6. Training
The process of feeding data into a model so that it learns to make
accurate predictions.
Involves adjusting internal parameters to minimize error.
7. Prediction
The output generated by the model when it processes new input data.
8. Overfitting
When a model performs well on training data but poorly on unseen
data because it has learned noise and details too well.
Symptom: High accuracy on training data, low accuracy on test data.
9. Underfitting
When a model is too simple and fails to learn the underlying patterns
in the data.
Symptom: Poor performance on both training and test data.
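The terms above fit together as in the following hedged sketch (it assumes scikit-learn is installed; the iris dataset and the decision tree are arbitrary choices for illustration).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)    # dataset: features X, labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()     # model + algorithm
model.fit(X_train, y_train)          # training
pred = model.predict(X_test)         # prediction on new inputs

# A large gap between these two scores is the classic overfitting symptom.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy :", model.score(X_test, y_test))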
🔑 1. Classification
Goal: Assign input data to one of the predefined categories or classes.
Type: Supervised Learning
Example: Email spam detection (spam or not spam), disease diagnosis
(positive or negative).
🔑 2. Regression
Goal: Predict a continuous numeric value based on input features.
Type: Supervised Learning
Example: Predicting house prices, stock market forecasting,
temperature prediction.
🔑 3. Clustering
Goal: Group similar data points together without predefined labels.
Type: Unsupervised Learning
Example: Customer segmentation, grouping articles by topic.
🔑 4. Dimensionality Reduction
Goal: Reduce the number of input variables (features) while retaining
important information.
Type: Unsupervised Learning
Example: Visualizing high-dimensional data, speeding up learning
algorithms.
Techniques: PCA (Principal Component Analysis), t-SNE
🔑 6. Recommendation Systems
Goal: Suggest items or content to users based on their preferences or
behavior.
Type: Supervised, Unsupervised, or Reinforcement Learning
Example: Product recommendations on Amazon, movie suggestions on
Netflix.
🔑 7. Ranking
Goal: Order items based on relevance or importance.
Example: Search engine results, job candidates ranking.
Clustering: K-Means, Hierarchical Clustering (e.g., customer segmentation)
Recommendation: Collaborative Filtering, Matrix Factorization (e.g., suggesting movies/products)
🧠 Key Idea:
“Learn from the past (training data with answers) to predict the future.”
🟩 Advantages:
Produces highly accurate models if enough labeled data is available
Easy to evaluate using metrics like accuracy and error rate
🟥 Disadvantages:
Needs a lot of labeled data, which can be expensive to collect
May not generalize well if the training data is biased
🧠 Example:
Suppose you want to classify animals as "cat" or "dog" based on features such as
weight and ear shape. The labeled examples form the training data, and the model
learns to predict the label for new animals.
🔍 Types of Classification:
1. Binary Classification:
o Only two classes
o Example: Yes/No, Spam/Not Spam
2. Multiclass Classification:
o More than two classes
o Example: Cat, Dog, Bird
3. Multilabel Classification:
o Each input can belong to multiple classes at the same time
o Example: A news article might be labeled as both Politics and
Economy
Bayes' Theorem states:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where:
P(A|B) = Probability of A given B (posterior)
P(B|A) = Probability of B given A (likelihood)
P(A) = Probability of A (prior)
P(B) = Probability of B (evidence)
✅ Pros:
Fast and simple
Works well with large datasets and text
❌ Cons:
Assumes feature independence (not always true)
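A tiny worked example of Bayes' theorem with made-up numbers: a spam filter estimating how likely an email containing the word "free" is spam.

# Assumed values (invented): P(spam)=0.2, P("free"|spam)=0.6, P("free"|ham)=0.05
p_spam = 0.2
p_free_given_spam = 0.6
p_free_given_ham = 0.05

p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)  # evidence P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free                # posterior P(A|B)

print(round(p_spam_given_free, 3))  # 0.75: "free" raises spam probability from 20% to 75%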
🧠 Key Concepts:
1. Agent: The learner or decision-maker.
2. Environment: The world the agent interacts with.
3. Action: Choices the agent makes.
4. State: A snapshot of the environment at a particular
time.
5. Reward: Feedback from the environment based on the
action taken.
✅ Types of RL Algorithms:
1. Q-Learning: A model-free algorithm where the agent
learns the value of actions in states.
2. Deep Q-Networks (DQN): Combines Q-learning with
deep learning for complex environments.
3. Policy Gradient Methods: Directly optimize the agent’s
policy.
✅ Applications of RL:
Robotics: Robots learning to perform tasks.
Gaming: AI playing games like Chess, Go, or video
games.
Self-Driving Cars: Learning to drive by interacting with
the environment.
Healthcare: Personalized treatment planning.
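A minimal tabular Q-learning sketch tying these concepts together (the 5-state corridor environment and all hyperparameters are invented for illustration).

import random

# States 0..4, actions 0=left / 1=right; reward +1 only for reaching state 4.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(5)]     # Q[state][action]

for _ in range(500):                   # episodes
    s = 0
    while s != 4:
        # Epsilon-greedy action choice: mostly exploit, sometimes explore.
        a = random.choice([0, 1]) if random.random() < EPSILON else (0 if Q[s][0] > Q[s][1] else 1)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == 4 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])   # state values grow as they get closer to the goal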
Question-11:- Differentiate:
a) Regression & Classification.
Definition: Regression predicts continuous values; Classification predicts discrete labels or categories.
Output: Regression outputs a real number (e.g., 25.3, 1500); Classification outputs a class label (e.g., Spam, Not Spam).
Example: Regression: predicting house prices based on features (e.g., size, location); Classification: classifying emails as Spam or Not Spam.
Algorithms: Regression: Linear Regression, Decision Trees, Random Forest (for continuous outputs); Classification: Logistic Regression, SVM, Decision Trees (for categorical outputs).
Goal: Regression minimizes the error between predicted and actual continuous values; Classification assigns the correct class label based on features.
Evaluation Metric: Regression: Mean Squared Error (MSE), R² (coefficient of determination); Classification: Accuracy, Precision, Recall, F1 Score.
b) Feature Extraction.
Feature Extraction is the process of transforming the original
features into a smaller set of new features that still capture the
essential information from the original ones. It reduces the
dimensionality of the dataset by combining or transforming features
into more meaningful forms.
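A hedged sketch of feature extraction with PCA (assumes scikit-learn; the iris data and the choice of 2 components are arbitrary).

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)      # 4 original features per sample
pca = PCA(n_components=2)              # extract 2 new, combined features
X_new = pca.fit_transform(X)

print(X.shape, "->", X_new.shape)      # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)   # share of information each new feature retains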
🔑 What is Bias?
Bias is the error that occurs when your model makes too many assumptions
and doesn’t fit the data well. A model with high bias is too simple and can't
capture the patterns in the data, leading to underfitting.
High Bias: The model misses the patterns and makes incorrect
predictions (e.g., a straight line to predict a curve).
Low Bias: The model can capture the patterns in the data better.
🔑 What is Variance?
Variance is the error that happens when your model is too sensitive to the
small details in the training data. A model with high variance fits the data too
closely, including noise or mistakes, leading to overfitting.
High Variance: The model fits the training data too well but doesn't
perform well on new, unseen data (e.g., memorizing the data).
Low Variance: The model is stable and doesn’t change too much when
trained on different data sets.
✅ The Tradeoff
You need to find a balance between bias and variance to create a model that
works well on both training data and new data:
High Bias, Low Variance: The model is too simple and doesn't fit the data
well (underfitting).
Low Bias, High Variance: The model is too complex and fits the training
data too closely (overfitting).
Low Bias, Low Variance: The ideal model that fits the data well and
generalizes to new data.
✅ Examples
1. High Bias (Underfitting): Imagine trying to predict house prices using
only one feature, like size, with a simple linear model. It would ignore
other important features like location, so it wouldn’t do well.
2. High Variance (Overfitting): Now, imagine using a very complex model
(like a decision tree) that fits the training data perfectly, but it doesn't
work well on new data because it learned too much of the noise in the
data.
3. Balanced Model: Using a simpler model or adding regularization (e.g.,
pruning the decision tree) can help find the right balance and make the
model perform well on both training and new data.
✅ Key Takeaways
Bias = Error from overly simple models (underfitting).
Variance = Error from overly complex models (overfitting).
The goal is to balance bias and variance to get a model that fits well and
performs well on new data.
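The tradeoff can also be seen numerically. In this hedged sketch (assumes numpy and scikit-learn; the noisy quadratic data is invented), a degree-1 polynomial underfits, degree 15 overfits, and degree 2 balances both errors.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 2, 30)       # quadratic signal plus noise
X_test = np.linspace(-3, 3, 100).reshape(-1, 1)
y_test = X_test.ravel() ** 2                    # noise-free truth for testing

for degree in [1, 2, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train_err = np.mean((model.predict(X) - y) ** 2)
    test_err = np.mean((model.predict(X_test) - y_test) ** 2)
    print(degree, round(train_err, 2), round(test_err, 2))
# degree 1: both errors high (bias); degree 15: low train, high test error (variance)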
Applications:
Medical diagnosis
Fraud detection
Machine learning
Risk assessment
Natural language processing
🔍 When to Use EM
When your data has missing values.
When the model involves latent variables, such as in:
o Gaussian Mixture Models (GMMs)
o Hidden Markov Models (HMMs)
o Bayesian Networks with hidden nodes
✅ Advantages
Can handle incomplete data.
Often converges quickly.
Provides a general-purpose framework for parameter estimation in
probabilistic models.
⚠️Disadvantages
Only guarantees local maxima, not global.
Convergence can be slow in some cases.
Can be sensitive to initialization.
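A hedged sketch using scikit-learn's GaussianMixture, which runs EM internally; the two-cluster data is invented, and the unobserved cluster membership plays the role of the latent variable.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Mix samples from two hidden Gaussians centred at 0 and 6.
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_.ravel())   # approximately [0, 6]: EM recovered the hidden means
print(gmm.converged_)       # True once EM has reached a (possibly local) maximum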
✅ What is a Hyperplane?
A hyperplane is a decision boundary that:
Separates data points of different classes.
In 2D: it's a line.
In 3D: it's a plane.
In higher dimensions: it’s called a hyperplane.
SVM aims to find the optimal hyperplane that maximizes the margin.
🔄 Different Scenarios
1. Linearly Separable Case
Classes can be perfectly separated by a straight line (or hyperplane).
SVM finds the hyperplane with the maximum margin.
Example:
The support vectors
are the closest o and x to the hyperplane.
2. Non-Linearly Separable Case
Data isn't separable by a straight line.
SVM uses a kernel trick to transform the data into a higher dimension
where it becomes linearly separable.
Example:
A circular pattern can be separated using the Radial Basis Function (RBF)
kernel.
3. Soft Margin SVM
Used when data has noise or overlaps.
Allows some misclassification but tries to maintain a balance between
maximizing margin and minimizing errors.
Introduces a regularization parameter (C) to control this trade-off.
🧠 Summary
Support Vectors: Closest points to the hyperplane; they define the decision boundary.
🟰 What is a Hyperplane?
A hyperplane is just a fancy word for a line (in 2D), or a plane (in 3D),
that separates the data.
It looks like this in 2D:
Class A: o o o
|
| <--- Hyperplane (line)
|
Class B: x x x
📏 What is Margin?
Margin is the space between the hyperplane and the nearest points
(support vectors) from each class.
The wider the margin, the better the model.
The margin is given by: Margin = 2 / ||w||
Where:
w = the weight vector (the slope/orientation of the line).
Bigger margin → better classification.
✍️ Hyperplane
A decision boundary that separates data points into different classes.
In 2D: it's a line, in 3D: a plane, in higher dimensions: still called a
hyperplane.
SVM tries to find the best hyperplane that gives the largest margin
between classes.
🧠 Example:
Suppose you have data like this:
Class A (o): in the center
Class B (x): surrounding in a circle
📌 How It Works:
1. Original data (non-linear) → mapped to higher dimension using a
kernel function.
2. In higher dimension, SVM finds a linear hyperplane.
3. That hyperplane corresponds to a non-linear boundary in original
space.
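A hedged scikit-learn sketch of exactly this circular case, using make_circles, the RBF kernel, and the soft-margin parameter C (all parameter values are arbitrary).

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# C controls the soft margin: larger C tolerates fewer misclassifications,
# smaller C prefers a wider, more forgiving margin.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

print(clf.score(X, y))               # near 1.0: a non-linear boundary was found
print(len(clf.support_vectors_))     # the points that define the margin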
✅ What is AdaBoost?
AdaBoost stands for Adaptive Boosting.
It is a machine learning algorithm used for classification (and sometimes
regression).
💡 Main Idea:
Combine many weak learners (simple models like decision stumps) to
create a strong learner.
It focuses more on the mistakes made by previous models.
🧪 Simple Example:
Imagine you’re trying to classify apples 🍎 and oranges 🍊 using pictures.
1. First small tree says:
"If round → Apple" (gets 70% correct)
2. Next model focuses on the 30% it got wrong.
3. Repeat this 5–10 times.
4. Final model combines all and gives a much better result.
Speed: AdaBoost is slower (because it's sequential); Random Forest is faster (trees are built independently).
Accuracy: AdaBoost is often better on clean data; Random Forest is better for large, messy data.
✅ Summary:
AdaBoost = Many weak models built one after another, each fixing the
mistakes of the previous one.
It works best on clean data.
It's different from Random Forest, which builds many full trees
independently and averages their results.
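A hedged scikit-learn sketch comparing a single stump with AdaBoost over many stumps (the synthetic data and 50 estimators are arbitrary; AdaBoostClassifier's default weak learner is a depth-1 decision tree).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)                  # one weak learner
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)  # 50 stumps in sequence

print("one stump:", stump.score(X, y))
print("AdaBoost :", boost.score(X, y))   # many weak stumps combine into a stronger model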
Question-7:- What is the General Principle of Ensemble method?Discuss the
Bagging and Boosting with their Difference.
The general principle of the Ensemble method is to combine the predictions of
several individual models so that the combined model is more accurate and
robust than any single model on its own.
✅ 1. Bagging (Bootstrap Aggregating)
Builds models independently (in parallel), each on a random bootstrap
sample of the data.
Final prediction = majority vote (classification) or average (regression).
📌 Example:
Random Forest
🔧 How it helps:
Reduces variance (less overfitting).
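For the bagging side, a hedged scikit-learn sketch: BaggingClassifier trains its base models independently on bootstrap samples and votes (the data and 25 estimators are arbitrary; the default base model is a decision tree).

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)
print(bag.score(X, y))   # averaging many independently trained trees reduces variance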
✅ 2. Boosting
Builds models one after another (sequentially).
Each new model focuses more on the mistakes made by the previous
model.
Final prediction = weighted vote of all models.
📌 Example:
AdaBoost, Gradient Boosting, XGBoost
🔧 How it helps:
Reduces bias (learns complex patterns).
Focuses on hard-to-classify examples.
Error Handling: Bagging treats all errors equally; Boosting focuses more on difficult cases.
Stump
A stump is a very simple decision tree with just one level, meaning it has only
one split. It takes one feature from the data and splits it into two parts. In other
words, it only makes one decision, dividing the data based on a single
condition.
For example, if you were predicting whether someone will buy a product, a
stump might look at just one feature like age and split the data into "younger
than 30" and "older than 30". It’s a basic decision tree and is considered a
weak learner because it can’t capture complex patterns by itself.
Weak Learners
A weak learner is a model that performs slightly better than random guessing.
It doesn't have strong predictive power on its own but can still make useful
contributions when combined with other weak learners.
In boosting algorithms like AdaBoost, many weak learners are combined to
create a strong model. A weak learner could be something like a simple
decision stump (a one-level decision tree), or any model that is not very
accurate by itself, but when used together with other weak models, it can
significantly improve performance.
In summary:
Stump: A decision tree with just one split, very simple.
Weak Learner: A model that is not very accurate on its own but can be
part of a stronger combined model in ensemble methods.
MODULE-3:-Artificial Neural Networks
Question-1:- What is an activation function? Explain the sigmoid activation function.
ANS:
It decides whether the neuron’s output signal should be passed forward or not.
With activation functions, the network can learn complex and non-linear
relationships in data (like speech, images, and text).
It helps the neural network to mimic the human brain, where not all neurons fire all
the time — only important ones do.
The sigmoid function is a popular activation function. It looks like an "S" curve and
converts any value into a range between 0 and 1.
🧠 Formula of sigmoid: σ(x) = 1 / (1 + e^(−x))
Question-2:- Explain the Gradient Descent algorithm.
ANS:
It helps in finding the best weights in a neural network so that predictions are
accurate.
o Start with an initial weight (a random guess); it usually doesn't give the lowest cost.
o Take a small step downhill from the initial weight, in the direction that reduces the cost.
o Keep updating the weights, moving closer to the bottom of the cost curve
(minimum cost).
o After many steps, we reach the lowest point of the curve. This is where the
cost is minimum, and the weights are optimal for the model.
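A minimal 1-D sketch mirroring these steps (the cost function (w − 3)², the starting weight, and the learning rate are invented for illustration; the minimum cost is at w = 3).

cost_grad = lambda w: 2 * (w - 3)   # derivative of the cost (w - 3)^2

w = -4.0                            # initial weight: the random guess
lr = 0.1                            # learning rate: the size of each downhill step
for step in range(50):
    w -= lr * cost_grad(w)          # small step in the direction that lowers the cost
print(round(w, 3))                  # approximately 3.0: the minimum-cost weight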
Question-3:- What is the Perceptron model? Write an algorithm for the perceptron
learning rule. What are the limitations of the Perceptron?
ANS:
1. Initialize the weights and bias to small random values (or zeros), and choose a learning rate.
2. For each training example with input x and desired output d:
o Compute output: y = step(w · x + b)
o Update weights: wᵢ = wᵢ + η (d − y) xᵢ, and b = b + η (d − y)
o Where:
η = learning rate
d = desired output
y = actual output
3. Repeat until all outputs are correct or a max number of epochs is reached.
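A hedged Python sketch of this learning rule on the AND function (the dataset, learning rate, and epoch count are our own choices).

def train_perceptron(samples, eta=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, d in samples:                                   # d = desired output
            y = 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0      # step activation
            err = d - y
            w[0] += eta * err * x[0]                           # w_i = w_i + eta*(d - y)*x_i
            w[1] += eta * err * x[1]
            b += eta * err
    return w, b

AND = [((0,0),0), ((0,1),0), ((1,0),0), ((1,1),1)]
w, b = train_perceptron(AND)
print([1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0 for x, _ in AND])  # [0, 0, 0, 1]

(Training it on XOR instead never converges, which demonstrates the linear-separability limitation below.)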
⚠️Limitations of Perceptron
A single-layer perceptron can only learn linearly separable functions; it famously
cannot represent XOR.
It gives only hard binary outputs and cannot capture complex non-linear patterns
without hidden layers.
Question-4:- What is Backpropagation? Explain it with an example.
ANS:
✅ What is Backpropagation?
It is used to minimize the error by adjusting the weights of the network using
Gradient Descent.
It works by propagating the error backward from the output layer to the input layer.
🔹 1. Forward Pass
Inputs are passed through the network, layer by layer, to compute the predicted output.
🔹 2. Error Calculation
Calculate the error (loss) using a loss function like Mean Squared Error (MSE).
🔹 3. Backward Pass
Calculate the gradient of the loss with respect to each weight (using derivatives).
The error is propagated backward through the layers using the chain rule.
🔹 4. Update Weights
Adjust each weight a small step against its gradient:
w_new = w_old − η × (∂E/∂w)
where:
o w = weight,
o η = learning rate,
o ∂E/∂w = gradient of the error with respect to that weight.
🔁 5. Repeat
Repeat the process for many epochs (iterations) until the error is minimized.
Consider a tiny example network with:
2 inputs,
1 output neuron,
🧩 Inputs:
Input = [1, 0]
Target Output = 1
🧮 Process:
1. Forward pass:
2. Calculate error:
3. Backward pass:
4. Update weights:
o Use gradients and learning rate to slightly change weights in direction that
reduces error.
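Since the numeric details of the worked example are not shown here, the following is a hedged reconstruction with made-up weights: one sigmoid output neuron, inputs [1, 0], target 1, squared-error loss, and a single training step.

import math

x, target = [1.0, 0.0], 1.0
w, b, eta = [0.5, -0.3], 0.1, 0.5   # invented starting weights, bias, learning rate

# 1. Forward pass
z = w[0]*x[0] + w[1]*x[1] + b       # 0.6
y = 1 / (1 + math.exp(-z))          # sigmoid output, about 0.646

# 2. Calculate error
loss = 0.5 * (target - y) ** 2      # about 0.063

# 3. Backward pass (chain rule): dLoss/dw_i = (y - target) * y * (1 - y) * x_i
grad_z = (y - target) * y * (1 - y)
grad_w = [grad_z * xi for xi in x]

# 4. Update weights: step against the gradient, scaled by the learning rate
w = [wi - eta * gi for wi, gi in zip(w, grad_w)]
b = b - eta * grad_z

print(round(y, 3), round(loss, 3), [round(wi, 4) for wi in w])  # output moves toward target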
https://youtu.be/QZ8ieXZVjuE?si=xp7Jvv6jTCljRjjv
Question-5:- Write a note on tuning the network size of a neural network.
ANS:
Tuning the network size means choosing the number of hidden layers and the number
of neurons per layer. A network that is too small may underfit, while a large
network may memorize the data but fail on new data (overfitting).
✅ 1. Start Simple
Begin with 1 hidden layer and a small number of neurons (e.g., 4–10).
✅ 2. Use Cross-Validation
Try different network sizes and pick the one with best validation performance.
✅ 4. Rule of Thumb
A common starting point is a hidden-layer size somewhere between the input and
output layer sizes; beyond that, try grid search or automated tools like Keras
Tuner to test different sizes.
Question-6:- What are the basic elements of Biological neuron? What are equivalent
elements in ANN?
ANS:
1. Dendrites: These are the tree-like structures that receive signals from other neurons.
They act as the input channels for a neuron.
2. Cell Body (Soma): The cell body integrates the signals received from the dendrites
and contains the nucleus of the neuron.
3. Axon: This is a long projection that carries the electrical signal away from the cell
body to other neurons or muscles. It transmits the output.
4. Axon Terminals: These are the endings of the axon, where the signal is transmitted
to the next neuron or muscle.
5. Synapse: The synapse is the gap between two neurons, where neurotransmitters are
released to transmit the signal across.
6. Myelin Sheath: This is a fatty layer that insulates the axon and helps speed up the
transmission of the signal.
1. Dendrites → Input layer: The input layer of an ANN receives the data, similar to how
dendrites receive signals.
2. Cell Body (Soma) → Summation unit: Each artificial neuron sums its weighted inputs,
similar to how the soma integrates incoming signals.
3. Axon → Output signal: The computed value is passed on to the next layer, similar to
how the axon carries the signal away from the cell body.
4. Axon Terminals → Output layer: The output layer of an ANN produces the final
result, similar to how the axon terminal sends signals to the next neuron.
5. Synapse → Activation Function: The synapse in biological neurons is where the
signal is transmitted using neurotransmitters. In ANN, this is analogous to the
activation function that determines if a neuron will "fire" and pass the signal forward.