
Ai Final

The document covers various search algorithms and optimization techniques in AI, including exhaustive search, greedy search, hill climbing, simulated annealing, and gradient descent. It also discusses Prolog's logic programming capabilities, rule-based systems, fuzzy logic, and machine learning types such as supervised, unsupervised, and reinforcement learning. Key concepts include the importance of balancing exploration and exploitation, the use of fuzzy sets for handling uncertainty, and the machine learning process from data collection to model evaluation.

Uploaded by

shaikham2004
Copyright
© All Rights Reserved

Intro to AI final

Lecture 2

Exhaustive Search:
Systematically explores all possible solutions.
Guarantees finding the global optimum but is computationally
expensive.

Greedy Search & Hill Climbing:

Greedy Search: Makes locally optimal choices, hoping to find a global optimum.

Hill Climbing: Iteratively improves a single solution by comparing neighboring solutions.

Example problem: Travelling Salesman Problem (TSP)

Exploration vs. Exploitation:

Balancing between exploring new possibilities (exploration) and refining known good solutions (exploitation).
Striking this balance is critical for finding good solutions.

Simulated Annealing:
Mimics the annealing process in metallurgy.
Uses a probabilistic technique to escape local optima and find a
global optimum.
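
The acceptance rule behind simulated annealing can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the test function `bumpy`, and the cooling schedule (multiplying the temperature by 0.95 each step) are assumptions chosen for the example.

```python
import math
import random

def simulated_annealing(f, x0, temp=10.0, cooling=0.95, steps=500, step_size=1.0, seed=0):
    """Minimize f starting from x0 and return the best point seen."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        delta = f(candidate) - f(x)
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
        if f(x) < f(best):
            best = x
        temp = max(temp * cooling, 1e-9)  # cool down, never reach zero
    return best

def bumpy(x):
    # Hypothetical objective with several local minima.
    return x * x + 10 * math.sin(x)

best = simulated_annealing(bumpy, x0=8.0)
print(best, bumpy(best))
```

At high temperature the search behaves almost randomly (exploration); as the temperature drops it behaves like hill climbing (exploitation).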

Continuous Optimization & Gradient Descent:

Applied to continuous, differentiable functions.

Gradient Descent: Minimizes a function by iteratively moving in the direction of steepest descent.

Gradient Ascent: Maximizes a function by moving in the direction of steepest ascent.
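
As a small worked example of gradient descent, the sketch below minimizes f(x) = (x - 3)^2, whose gradient is 2(x - 3). The function name, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # use "+" instead for gradient ascent
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

Each step shrinks the distance to the minimum by a constant factor here, so the result is very close to 3 after 100 steps.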
Assignment 1
1. In hill climbing search, what is a “local maximum”?

Answer: A peak that is higher than neighboring points but not the highest overall

2. What type of problems is gradient descent particularly useful for?

Answer: Continuous optimization problems

3. What is the primary purpose of simulated annealing in optimization?

Answer: To avoid getting trapped in local optima by allowing occasional worse moves

4. What is the goal of optimization in computational problems?

Answer: To find the best possible solution

5. Which of the following best describes “exhaustive search”?

Answer: Considering all possible solutions

6. What is the main principle behind greedy search algorithms?

Answer: To choose the best immediate option at each step

7. Greedy algorithms always find the global optimum in all cases.

Answer: False

8. In continuous optimization, what is the goal of gradient descent?

Answer: To find the lowest point in a solution space

9. Which optimization technique is inspired by the process of cooling metals?

Answer: Simulated annealing


Lecture 3
A notable feature of Prolog is its ability to handle uncertain or
incomplete information.
In Prolog, a programmer can specify a set of rules
and facts that are known to be true, but they
can also specify rules and facts that might be
true or false.
The Prolog interpreter will then use those rules
and facts to automatically reason about the
problem domain and find solutions that are
most likely to be correct, given the available
information.

Fact: a proposition that is assumed to be true.
Query: a proposition whose truth is to be determined.

Headed Horn Clause: Contains a single
atomic proposition on the left side.

Headless Horn Clause: Has an empty left
side, typically used to state facts.

Bottom-up resolution, forward chaining


Begin with facts and rules of the database and attempt to find a sequence that leads to a goal

Works well with a large set of possibly correct answers

Top-down resolution, backward chaining


Begin with a goal and attempt to find a sequence that leads to a set of facts in the database

Works well with a small set of possibly correct answers

Prolog implementations use backward chaining

Logical Operators
Breadth-First Search (BFS)
A search strategy that explores all nodes at the present depth level before moving to the next level.
How It Works:
Explores all subgoals at the current level simultaneously.
Moves level by level, expanding all possible nodes at each depth.
Ensures all possible paths are considered in parallel.
Advantages:
Guaranteed to find the shortest path to a solution, if one exists.
Does not get stuck in infinite loops.
Disadvantages:
Can be memory-intensive as it stores all nodes at the current level.
Slower in deep search spaces compared to DFS.

Depth-First Search (DFS)
A search strategy that explores as far down a branch as possible before backtracking.
How It Works:
-Focuses on one subgoal at a time.
-Attempts to find a complete proof for the first subgoal before moving to the next.
-If a subgoal fails, backtracks to the previous choice point to try alternative paths.
Advantages:
-Efficient in terms of memory usage.
-Quickly finds a solution in deep but narrow search spaces.
Disadvantages:
May get stuck in deep or infinite paths (i.e., may not find a solution if it exists on a different path).
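
The difference between the two strategies can be seen on a small graph. The sketch below is illustrative only: the graph and the function names are made up for this example, and it searches a plain graph rather than a Prolog proof tree.

```python
from collections import deque

def bfs(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()  # oldest path first: level by level
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

def dfs(graph, start, goal, path=None):
    """Follow one branch as deep as possible, backtracking on failure."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph.get(start, []):
        if nxt not in path:  # do not revisit nodes on the current branch
            found = dfs(graph, nxt, goal, path)
            if found:
                return found
    return None

# Hypothetical graph: G is reachable through B-D (longer) or through C (shorter).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(bfs(graph, "A", "G"))  # ['A', 'C', 'G']
print(dfs(graph, "A", "G"))  # ['A', 'B', 'D', 'G']
```

BFS returns the shorter path because it finishes each level before going deeper; DFS commits to the first branch and returns the first solution it reaches.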

QUESTION 1
What is the primary purpose of logic programming?

Answer: to define rules and relationships for problem-solving.

QUESTION 2
In predicate calculus, what is a "predicate"?

Answer: A function that represents a property or relationship.

QUESTION 3
Which logic programming language is most commonly associated with artificial intelligence and computational linguistics?

Answer: Prolog.

QUESTION 4
In Prolog, which search strategy explores as far down a branch as possible before backtracking?

Answer: depth-first search.

QUESTION 5
Which is a common application of logic programming?

Answer: Natural language processing.

QUESTION 6
In logic programming, "resolution" is used to prove the truth of propositions.

Answer: True.

QUESTION 7
Prolog uses depth-first search as its default strategy for solving goals.

Answer: True.
Lecture 4

Types of Rule-Based Inference

1. Forward Chaining:
• Starts with known facts and applies rules to generate new facts.
• Continues until a specific goal is reached.
• Commonly used in real-time systems.
Example:
For a weather-based system:
• Input: Temperature = 5°C, Condition = Rainy
• Output: “Wear a coat and take an umbrella.”

2. Backward Chaining:
• Starts with a goal and works backward to find facts supporting the goal.
• Often used in diagnostic systems (e.g., determining a disease based on symptoms).
Example:
Diagnosing a disease: the system starts from a hypothesized disease (the goal) and works backward to check whether
the observed symptoms support it.
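
Both inference styles can be sketched over the same rule set. This is a minimal illustration under stated assumptions: the rule format (a list of conditions plus one conclusion), the fact names, and both function names are hypothetical, and the backward chainer assumes the rules contain no cycles.

```python
def forward_chain(facts, rules):
    """Start from known facts and apply rules until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts, rules):
    """Start from a goal and look for facts that support it (no cyclic rules assumed)."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(c, facts, rules) for c in conditions)
        for conditions, conclusion in rules
        if conclusion == goal
    )

# Hypothetical weather rules in the spirit of the example above.
rules = [
    (["cold"], "wear_coat"),
    (["rainy"], "take_umbrella"),
    (["wear_coat", "take_umbrella"], "ready_to_go_out"),
]
derived = forward_chain({"cold", "rainy"}, rules)
print(sorted(derived))
print(backward_chain("ready_to_go_out", {"cold", "rainy"}, rules))  # True
print(backward_chain("ready_to_go_out", {"cold"}, rules))           # False
```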

Steps to Build a Rule-Based System

1. Define the Problem Domain: Understand the area where the system will be applied.
2. Gather Knowledge: Collect rules from domain experts.
3. Design the Knowledge Base: Organize the rules logically.
4. Implement the Inference Engine: Choose either forward or backward chaining.
5. Test and Validate: Ensure that the system works correctly with real-world data.

Fuzzy Logic is an extension of classical logic, where truth values range between 0 and 1 instead of being strictly
true (1) or false (0). It is used to handle uncertainty and imprecision, mimicking how humans reason.

Why Fuzzy Logic?


• Handles ambiguous and imprecise information.
• Suitable for systems that require approximate reasoning, like control systems, decision-making, and
pattern recognition.

Basic Concepts in Fuzzy Logic

• Fuzzy Sets: In contrast to classical (crisp) sets, where an element either belongs or doesn’t, fuzzy sets
allow partial membership (values between 0 and 1).
Example:
• “Temperature is hot” could have a membership degree of 0.7 (somewhat hot) or 0.9 (very hot),
depending on how close it is to the boundary.

• Membership Functions: These define how each element maps to a degree of membership in a fuzzy set.
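
One common shape for a membership function is a triangle. The sketch below is an assumption for illustration: the notes do not say which shape the course uses, and the boundaries of the "hot" set (20, 40, 60) are made up so that the degrees match the 0.7 and 0.9 mentioned above.

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical "hot" set: membership rises from 20°C to a peak at 40°C.
print(triangular(34, 20, 40, 60))  # 0.7 (somewhat hot)
print(triangular(38, 20, 40, 60))  # 0.9 (very hot)
print(triangular(10, 20, 40, 60))  # 0.0 (not hot at all)
```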

Crisp Sets vs. Fuzzy Sets

• Crisp Sets: A crisp set is binary in nature. An element either belongs to the set (1) or does not (0).
Examples include even and odd numbers or a class of students (boys or girls).
• Fuzzy Sets: Membership in fuzzy sets is gradual, and elements can partially belong to a set. For
example, a temperature of 25°C might be “somewhat hot,” while 30°C might be “very hot.”
Fuzzy Rule-Based Systems

A Fuzzy Rule-Based System uses fuzzy logic instead of Boolean logic to make decisions.

Components:
• Fuzzification Module: Converts crisp input values into fuzzy values (e.g., a temperature reading might be converted into fuzzy
terms like “warm”).

• Inference Engine: Applies fuzzy rules (e.g., “If temperature is warm, set heating to medium”).

• Defuzzification Module: Converts fuzzy outputs back into crisp values for decision-making.

Example:
A heating system might have rules such as:
• If temperature is “cold,” then set heating to “high.”
• If temperature is “warm,” then set heating to “medium.”
• If temperature is “hot,” then set heating to “low.”
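
The three components can be put together in a small sketch of the heating example. Everything numeric here is an assumption: the set boundaries, the crisp heating levels (90, 50, 10), and the weighted-average defuzzification are illustrative choices, not values from the lecture.

```python
def falling(x, full, zero):
    """Membership 1 at or below `full`, dropping linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """Membership 0 at or below `zero`, rising linearly to 1 at `full`."""
    return 1.0 - falling(x, zero, full)

def triangular(x, left, peak, right):
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def heating_power(temp_c):
    # Fuzzification: crisp temperature -> degrees of cold / warm / hot.
    cold = falling(temp_c, 5, 15)
    warm = triangular(temp_c, 10, 20, 30)
    hot = rising(temp_c, 25, 35)
    # Inference: cold -> high (90), warm -> medium (50), hot -> low (10).
    # Defuzzification: weighted average of the rule outputs.
    total = cold + warm + hot
    return (cold * 90 + warm * 50 + hot * 10) / total

for t in (0, 12, 20, 40):
    print(t, heating_power(t))
```

At 12°C the reading is partly "cold" and partly "warm", so the output lands between the high and medium settings instead of jumping from one to the other.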

Differences Between Probability and Fuzziness

• Probability: Deals with the likelihood of an event happening (e.g., “There’s a 50% chance of rain tomorrow”).
• Fuzziness: Deals with the degree of truth (e.g., “It’s somewhat warm today”). Fuzziness acknowledges that the boundary between
“cold” and “warm” isn’t sharp.

Fuzzy Set Representation

Fuzzy sets can be represented in various ways, depending on the membership function. The closer an element is to the ideal value, the higher
its membership value. As the element moves away, the membership decreases.

Example:
• For a heating system, a temperature of 15°C might be “warm” with a membership value of 0.7, while a temperature of 20°C might
be “warm” with a value of 0.9.

Advantages and Challenges of Fuzzy Logic

Advantages:
• Better at handling uncertain and imprecise data.
• Models human-like reasoning, making it suitable for applications like control systems.
• Flexible and adaptable.

Challenges:
• Defining accurate membership functions and rules can be difficult.
• Requires tuning to perform optimally.
• Not always as interpretable as rule-based systems.
Intro. To AI systems QUIZ 2

QUESTION 1:
Fuzzy rule-based systems use:
Answer: Fuzzy logic for reasoning

QUESTION 2:
Fuzzy sets allow for partial membership of elements.
Answer: True

QUESTION 3:
Which inference method starts with a goal and works backward to find supporting facts?
Answer: Backward Chaining

QUESTION 4:
Fuzzy logic can handle situations with uncertainty and imprecision.
Answer: True

QUESTION 5:
Fuzzy logic differs from classical logic by using:
Answer: Degrees of membership

QUESTION 6:
A crisp set is characterized by:
Answer: Clear boundaries and distinct elements

QUESTION 7:
In fuzzy logic, a membership function defines:
Answer: The degree of membership for elements in a fuzzy set

QUESTION 8:
Which of the following is a key advantage of fuzzy logic?
Answer: Handles uncertainty and imprecision

QUESTION 9:
What is the primary component of a rule-based system that stores temporary information during processing?
Answer: Working Memory

Assignment 2

Question 1
Rule-based systems are always more accurate than fuzzy logic systems.
Answer: False

Question 2
In a fuzzy set, membership values:
Answer: Can range from 0 to 1

Question 3
Fuzzy logic differs from classical logic in that:
Answer: Fuzzy logic allows for partial truth values between 0 and 1

Question 4
What is the primary advantage of using fuzzy logic over classical logic in certain applications?
Answer: Fuzzy logic can handle uncertainty and imprecision

Question 5
Backward chaining is commonly used in expert systems.
Answer: True

Question 6
Fuzzy logic can be used to model human reasoning and decision-making processes.
Answer: True

Question 7
The process of converting crisp inputs into fuzzy values is called:
Answer: Fuzzification

Question 8
Which of the following is not a component of a rule-based system?
Answer: Defuzzification module

Question 9
Which of the following is a common application of fuzzy logic?
Answer: Air conditioning control systems
Lecture 5

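Swap mutation
Pick any two genes and swap them.
Example: 0123456789 -> 0153426789 (2 and 5 swapped; the parent string is inferred from the result)

Insert mutation
Pick any two genes and put them beside each other.
Example: 12345678 -> 12364578

Scramble mutation
Pick a set of genes and move them.
Example: 12345789 -> 17892345

Crossover operation:
[Figure: crossover diagram; only the fragments "1232" and "54354" survive]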
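
The mutation operators can be sketched on lists of genes. The function names and index arguments are hypothetical. Note one assumption: `scramble_mutation` follows the common definition (shuffle a chosen segment in place), whereas the example in these notes shows the chosen segment being moved.

```python
import random

def swap_mutation(genes, i, j):
    """Swap the genes at positions i and j."""
    child = list(genes)
    child[i], child[j] = child[j], child[i]
    return child

def insert_mutation(genes, i, j):
    """Move the gene at position j so it sits right after position i (i < j)."""
    child = list(genes)
    gene = child.pop(j)
    child.insert(i + 1, gene)
    return child

def scramble_mutation(genes, start, end, rng):
    """Shuffle the genes in positions start..end-1, leaving the rest in place."""
    child = list(genes)
    segment = child[start:end]
    rng.shuffle(segment)
    child[start:end] = segment
    return child

print("".join(swap_mutation("0123456789", 2, 5)))   # 0153426789
print("".join(insert_mutation("12345678", 2, 5)))   # 12364578
print("".join(scramble_mutation("12345789", 1, 5, random.Random(0))))
```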
Lecture 6

Introduction to Machine Learning

• Definition: Machine Learning involves algorithms that improve automatically through experience, enabling systems to:
• Generalize: Provide sensible outputs for unseen inputs.
• Extract and apply relevant information from data to analyze new data.

Types of Machine Learning

• Supervised Learning:
• Learning from labeled data to predict labels for new data.
• Example: Classifying animals based on given labels like “dog,” “cat,” etc.

• Unsupervised Learning:
• No labeled data; focuses on identifying patterns and grouping data.
• Example: Clustering users based on reading habits to recommend articles.

• Reinforcement Learning:
• Learning through rewards and penalties.
• Example: A pigeon pecking the right button for a reward.

The Machine Learning Process

Steps involved:
1. Data Collection and Preparation: Gathering and cleaning data.
2. Feature Selection and Extraction: Identifying key variables (features) from the data.
3. Algorithm Choice: Selecting an ML algorithm suitable for the task.
4. Model Selection: Choosing the best model structure.
5. Parameter Selection: Tuning model parameters for optimal performance.
6. Training: Teaching the model using training data.
7. Evaluation: Assessing the model using testing data.

Classification and Features

• Classifier:
• Maps objects to predefined labels (e.g., MNIST dataset for digit classification).
• Features:
• Attributes representing data.
• Can be categorical (e.g., color) or numerical (e.g., weight).
• Feature Engineering: Transforming raw data into suitable inputs for algorithms (e.g., scaling, encoding categorical data).

Supervised Learning

• Key Concepts:
• Training and Test Sets: Used to train and evaluate the model, respectively.
• Confusion Matrix: Tool to measure performance (e.g., true positives, false positives).
• Accuracy: Proportion of correct predictions out of total predictions.

Example Algorithm: k-Nearest Neighbors (kNN)

• How it Works:
• Measures similarity (distance) between data points in feature space.
• Predicts based on the majority label of the nearest neighbors.
• Key Considerations:
• The value of k (number of neighbors) impacts accuracy and generalization.
• Scaling features ensures distances are meaningful.

Feature Scaling

• Importance:
• Features with different scales can distort distance measurements.
• Normalization techniques (e.g., Min-Max Scaler, Standard Scaler) adjust features to a uniform range.
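
A minimal kNN classifier with min-max scaling might look like the sketch below. The data set (weight in kg, height in cm), the labels, and the function names are made up for illustration; a library implementation would normally be used in practice.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, lows, highs):
    """Map each feature to [0, 1] using the training minimum and maximum."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    distances = sorted((math.dist(x, query), label)
                       for x, label in zip(train_x, train_y))
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical data: [weight_kg, height_cm]
train_x = [[4, 25], [5, 23], [3, 24], [30, 60], [25, 55], [35, 65]]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

lows, highs = fit_min_max(train_x)
scaled_train = [scale(row, lows, highs) for row in train_x]
prediction = knn_predict(scaled_train, train_y, scale([6, 26], lows, highs))
print(prediction)  # cat
```

The scaler is fitted on the training data only and then applied to the query, so both live in the same feature space.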

Practical Applications

• Examples include email spam detection, language models predicting the next word, and clustering users for recommendations.
Homework 3:

Question 1
What type of data does the k-Nearest Neighbors (kNN) algorithm require?
Answer: numerical features

Question 2
Which of the following is an example of supervised learning?
Answer: Predicting email spam

Question 3
Reinforcement learning is based on the psychological concept of Operant Conditioning.
Answer: True

Question 4
Which of the following is a type of machine learning?
Answer: All of the above

Question 5
Which machine learning method is used for the MNIST dataset?
Answer: Supervised learning

Question 6
What does supervised learning involve?
Answer: Learning from exemplars

Question 7
The confusion matrix is used to evaluate the performance of supervised learning classification models.
Answer: True

Question 8
What is the purpose of scaling in machine learning?
Answer: To ensure input features are within a similar range

Quiz 3

1.
Which of the following is an example of supervised learning?
Answer: Linear regression

2.
What does the Mean Squared Error (MSE) measure?
Answer: The average squared difference between actual values and predictions

3.
What is the function of the bias term in a perceptron?
Answer: It shifts the decision boundary

4.
Gradient descent can be used to minimize Mean Squared Error (MSE).
Answer: True

5.
The perceptron is an example of a linear classifier.
Answer: True

6.
Which of the following inspires AI and ML?
Answer: The human brain

7.
What is the purpose of gradient descent?
Answer: To minimize the Mean Squared Error
Lecture 7: Perceptron and Linear Regression

1. Perceptron:
The perceptron is a simple model for binary classification, consisting of:

Inputs: Features from the data.


Weights: Assigned to each input to indicate its importance.
Bias: Adjusts the output to better fit data patterns.
Activation Function: Converts the weighted sum of inputs to a binary output (e.g., a step function).

2. Training a Perceptron:
Forward pass: Compute the weighted sum of inputs, apply the activation function to produce an output.
Weight updates: Adjust based on the difference between predicted and actual labels to minimize error.
Linear separability: The perceptron works if data can be separated by a straight line.

The perceptron is a linear classifier

[Figure: perceptron diagram showing inputs and weights, an adder, and a step function]
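
The forward pass and weight update described above can be sketched as follows. This is a minimal illustration: the function names, the learning rate of 1.0, and the choice of logical AND as the (linearly separable) training set are assumptions.

```python
def step(z):
    """Step activation: binary output from the weighted sum."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Error is 0 when the prediction is right, +1 or -1 otherwise.
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so the perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print(weights, bias)
print([predict(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```

The same code would never settle on XOR labels, because no single straight line separates them.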

Inductive bias refers to the set of assumptions a model uses to generalize beyond the training data. It
affects how a model makes predictions on unseen data.

Examples:

• Linear Models:

• Assume relationships between variables are linear, meaning the change in the output is proportional to the
change in the input.

• Decision Trees:

• Assume the data can be split hierarchically into subsets, which allows the model to capture complex
interactions between features through a series of binary decisions.
Linear Regression

• Definition: Linear regression is a method for modeling the relationship between a dependent variable (target) and one or
more independent variables (features) using a linear equation.

• Simple Linear Regression: One feature, produces a straight line.

• Multiple Linear Regression: Multiple features, forms a hyperplane.


What is MSE?

• MSE stands for Mean Squared Error, and it measures the average squared difference between the actual data points and
the predictions made by the line.

Why do we need MSE?

• When we create a line (or a model) to predict values from data, we want to know how well this line fits the actual data points.

• We also want to compare different lines (or models) to see which one does a better job at predicting the data.

• MSE helps us quantify the "goodness" of the fit by telling us how far the predictions are from the actual values.

Purpose of MSE

• Minimization Objective:
In regression models, the goal is to find parameters (weights and bias) that minimize the MSE.

• Model Evaluation:
MSE indicates how well the model predictions match the actual values. Smaller MSE values mean better performance.

Disadvantages of MSE

1. Outlier Sensitivity:

Because errors are squared, outliers (extreme deviations) can disproportionately increase the MSE.

2. Units of Measurement:

The units of MSE are the square of the target variable’s units, which can make interpretation less intuitive.

Applications of MSE

1. Linear Regression:

MSE is used as the loss function to train regression models. By minimizing MSE, the model learns the best-fitting parameters.

2. Model Comparison:

MSE is commonly used to compare the performance of different regression models or parameter settings.

Between lectures 7 and 8


Definition:

Linear regression is a method to predict a continuous outcome based on the relationship between dependent and independent
variables.

Key Features:

• Model: Predicts a continuous value using the equation y = w1*x1 + ... + wn*xn + b, where w1..wn are the weights and b is the bias.

• Goal: Minimize the difference between predicted and actual values (error).

• Output: A continuous value (e.g., house price, temperature).

• Assumption: The relationship between variables is linear.

Learning Process:

1. Calculate the best-fitting line by minimizing the Mean Squared Error (MSE).

2. Use Gradient Descent or other methods to optimize weights.

Use Cases:

• Predicting prices (houses, stocks)

• Forecasting trends (weather, sales)
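
The learning process above (minimize MSE with gradient descent) can be sketched for a single feature. The data, learning rate, epoch count, and function names are illustrative assumptions; the data is generated from y = 2x + 1 so the expected result is known.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit w and b by batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b, mse(xs, ys, w, b))
```

The fitted values approach w = 2 and b = 1, and the MSE approaches zero because the data lies exactly on a line.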


Lecture 8: Logistic Regression

1. Logistic Function:
• Maps any input to a range between 0 and 1, representing probabilities.
• Essential for converting linear regression outputs to probabilities for classification.

2. Forward Pass:
• Compute the linear combination of weights and inputs.
• Apply the logistic function to produce a probability for the positive class.

3. Backward Pass in Logistic Regression:


• Uses gradient descent to minimize the cross-entropy loss, which measures the difference between predicted
probabilities and actual class labels.

Sigmoid Curve:

• A mathematical function that maps any input from (−∞, ∞) to a value between 0 and 1.

• Common in logistic regression for converting linear outputs into probabilities.

[Figure: sigmoid curve]

• Key Properties:

• Monotonic: The function is always increasing, never reversing direction.

• Probabilistic Interpretation: Transforms numeric outputs into probabilities, making it useful for classification tasks.
Understanding logistic regression

Logistic Regression:

• A probability-based model used for binary classification.
• It predicts the likelihood that an input x belongs to one of two classes:
*Class 1 (t = 1)
*Class 0 (t = 0)

Goal:
For a given observation x (feature vector x), determine:
• The probability that x belongs to Class 1.
• The probability that x belongs to Class 0.
• Compare these probabilities to assign the input to the most likely class.

Cross-entropy loss, also known as log-loss, is a measure used to quantify the difference between two probability
distributions. It’s commonly used in classification problems, especially for models like logistic regression and neural
networks.

• Cross-entropy penalizes the model more for confident wrong predictions.
• If the predicted probability is close to the true label (either 0 or 1), the loss will be small.
• If the model is confident but wrong, the loss increases sharply.

Use in Optimization:

• Cross-entropy loss is used as the objective function in classification tasks, and the model parameters are adjusted to
minimize this loss during training (e.g., using gradient descent).

Gradient descent

• Loss Function: Measures how well the model predicts the target values.
• No Closed-Form Solution: Unlike linear regression, there's no simple formula to find the best model.
• Good News: The log-loss function is convex.
• This means any local minimum is also a global minimum, so the search cannot get trapped in a worse one.
• We always know the direction to move for optimization.

Variants of Gradient Descent

Batch Training:
• Calculate the loss for the whole training set and the gradient for this.
• Make one move in the direction opposite to the gradient.
• Repeat (an epoch).
• Can be slow.

Stochastic Gradient Descent:
• Pick one item.
• Calculate the loss for this item.
• Calculate the gradient for this item and move in the opposite direction.
• Each move does not have to be towards the direction of the gradient for the whole set.
• But the overall effect may be good.
• Can be faster.

Mini-Batch Training:
• Pick a subset of the training set of a certain size.
• Calculate the loss for this subset.
• Make one move in the direction opposite of this gradient.
• Repeat (an epoch).
• A good compromise between the two extremes.
• (The other two are subcases of this).

Comparison and Application
1. Batch Gradient Descent is suitable for smaller datasets where computational resources are not a concern.
2. Stochastic Gradient Descent is effective for online learning or when working with very large datasets.
3. Mini-Batch Gradient Descent is often the most practical approach, striking a balance between efficiency and convergence stability.
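
Logistic regression trained with mini-batch gradient descent on the cross-entropy loss can be sketched as follows. This is a one-feature illustration under stated assumptions: the data, batch size, learning rate, and function names are all made up for the example. Setting `batch_size=1` gives stochastic gradient descent, and `batch_size=len(xs)` gives batch training.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(p, t, eps=1e-12):
    """Log-loss for one prediction p (probability of class 1) and label t."""
    p = min(max(p, eps), 1 - eps)  # keep log() away from 0
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def train_logistic(xs, ts, batch_size=2, learning_rate=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ts))
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # For sigmoid + cross-entropy the gradient is (prediction - target) * input.
            grad_w = sum((sigmoid(w * x + b) - t) * x for x, t in batch) / len(batch)
            grad_b = sum((sigmoid(w * x + b) - t) for x, t in batch) / len(batch)
            w -= learning_rate * grad_w  # move opposite to the gradient
            b -= learning_rate * grad_b
    return w, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ts = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ts)
final_loss = sum(cross_entropy(sigmoid(w * x + b), t) for x, t in zip(xs, ts)) / len(xs)
print(w, b, final_loss)
```

Before training (w = b = 0) every prediction is 0.5 and the mean loss is ln 2; training drives the loss below that starting value.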
Multi-Class Classification
• Assigns a label (class) from a finite set of labels to an observation.

Examples:
• Binary: Yes/No, 1/0.
• Multi-class: To each observation, choose one label from a set.

Approaches:
1. One-vs-Rest Classifier (One-vs-All):
• Builds one classifier per class.
• Compares scores of all classifiers and chooses the class with the highest score.

2. Multinomial Logistic Regression (Softmax Regression):
• Directly handles multi-class classification by modeling probabilities for all classes.

Multinomial Logistic Regression

Extension:
Extends logistic regression to handle multiple classes.

Model:
Calculates the probability of each class using the softmax function.
Optimizes the log-loss (cross-entropy) for multi-class predictions.

Applications: Handwriting recognition, sentiment analysis, image classification.
One-vs-All (One-vs-Rest) Classification

One-vs-All (OvA) is a strategy for handling multi-class classification tasks using binary classification algorithms.

Advantages
• Simple to implement with binary classifiers.
• Works well for most classification problems.

Disadvantages
• Assumes independence between classifiers.
• Can struggle with ambiguous cases where multiple classifiers give high probabilities.

Homework 4:

Question 1
Which of the following is used in logistic regression to measure the difference between predicted and actual outcomes?
Answer: Cross-entropy loss

Question 2
Which of the following is an assumption of linear regression?
Answer: The relationship between variables is linear

Question 3
Which function does logistic regression use to map predicted values to probabilities?
Answer: Sigmoid function

Question 4
Multi-class classification involves predicting more than two possible categories.
Answer: True

Question 5
What type of outcome does logistic regression typically predict?
Answer: Binary

Question 6
What is a key advantage of stochastic gradient descent (SGD)?
Answer: It only updates based on one item, making it faster

Question 7
What does logistic regression predict?
Answer: A categorical outcome

Question 8
Logistic regression assumes a linear relationship between the logit of the outcome and predictor variables.
Answer: True

Question 9
One vs. rest is a method used for binary classification only.
Answer: False (one-vs-rest applies binary classifiers to multi-class problems)
Lecture 9

1. Feed-Forward Neural Networks (Multi-Layer Perceptron)


• Structure:
• Consists of input, hidden, and output layers.
• Connections flow from the input to the output layers in one direction.

• Hidden Layers:
• Nodes process inputs through weighted sums and activation functions.
• Activation functions introduce non-linearity, enabling the network to model complex relationships.

2. Activation Functions
• Purpose: Introduce non-linearity to handle complex patterns.

• Common functions:
• Sigmoid: Outputs values between 0 and 1.
• ReLU: Outputs the input directly if positive, else zero.
• Softmax: Converts raw scores into probabilities (used in multi-class classification).
• Tanh: Outputs values between -1 and 1.

3. Tasks and Output Layer


• Regression: Continuous output; no activation function in the output layer.
• Binary Classification: One output neuron with a sigmoid activation function.
• Multi-Class Classification: Multiple output neurons with softmax activation.
• Multi-Label Classification: Independent probabilities for each label using sigmoid activation.

1.Regression:

1. Output: A single continuous value or multiple values.

2. Activation function: None (or linear activation).

3. Example: Predicting house prices.

2.Binary Classification:

1. Output: A single neuron, outputting a probability between 0 and 1.

2. Activation function: Sigmoid or logistic.

3. Example: Classifying whether an email is spam or not.

3.Multi-Class Classification:

1. Output: Multiple neurons, each corresponding to a class.

2. Activation function: Softmax (to get class probabilities).

3. Example: Recognizing digits (0-9) from images.

4.Multi-Label Classification:

1. Output: Multiple neurons, each outputting a probability for different labels.

2. Activation function: Sigmoid (for independent probability per label).

3. Example: Detecting multiple objects in an image (cat, car, tree).


4. Backpropagation Algorithm

Overview: Backpropagation is a key algorithm for training neural networks by adjusting weights based on the error of
predictions.

1. Forward Pass: Input data flows through the network to generate an output, which is compared to the true target to
compute a loss.

2. Loss Calculation: The loss quantifies the difference between predicted and actual values.

3. Backward Pass:

• Calculate the gradient of the loss with respect to the output.

• Use the chain rule to propagate this gradient backward through the network.

4. Weight Update: Adjust weights and biases using an optimization algorithm (e.g., stochastic gradient descent) by
subtracting a fraction of the gradients.

5. Iteration: Repeat the process over many epochs to improve the model's performance.
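
The five steps above can be sketched for a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output. Everything here is an illustrative assumption: the architecture, the logical-OR training data, the learning rate, and the function names. It uses cross-entropy loss, for which the gradient at the output simplifies to (output - target).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> hidden layer -> single output probability."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output

def loss_fn(output, target, eps=1e-12):
    output = min(max(output, eps), 1 - eps)
    return -(target * math.log(output) + (1 - target) * math.log(1 - output))

def train_step(x, target, W1, b1, W2, b2, lr):
    hidden, output = forward(x, W1, b1, W2, b2)
    # Backward pass: gradient of the loss at the output pre-activation.
    delta_out = output - target
    # Chain rule through W2 and the hidden sigmoid (derivative h * (1 - h)).
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(W2, hidden)]
    # Weight update: subtract a fraction (lr) of each gradient.
    new_W2 = [w - lr * delta_out * h for w, h in zip(W2, hidden)]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hidden)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hidden)]
    return new_W1, new_b1, new_W2, new_b2

def mean_loss(data, params):
    return sum(loss_fn(forward(x, *params)[1], t) for x, t in data) / len(data)

rng = random.Random(0)
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],  # W1
    [0.0, 0.0],                                                  # b1
    [rng.uniform(-1, 1) for _ in range(2)],                      # W2
    0.0,                                                         # b2
)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # logical OR
initial_loss = mean_loss(data, params)
for _ in range(500):  # iterate over many epochs
    for x, target in data:
        params = train_step(x, target, *params, lr=0.5)
final_loss = mean_loss(data, params)
print(initial_loss, final_loss)
```

Updating after every sample, as done here, corresponds to stochastic gradient descent.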

5. Gradient Descent Variants


•Batch Training: Uses the entire dataset for weight updates.
•Stochastic Gradient Descent (SGD): Updates weights for each data point.
•Mini-Batch Training: Processes a small subset of the dataset for weight updates (a balance between batch and SGD).

7. Practical Considerations
•Scaling: Normalize input data to ensure efficient training.
•Weight Initialization: Random initialization helps avoid symmetry issues.
•Avoiding Overfitting:
•Regularization techniques like dropout.
•Early stopping by monitoring validation set loss.

8. Evaluation Metrics
•Accuracy: Fraction of correct predictions.
•Precision: Focuses on correctly predicted positives.
•Recall: Measures how well the model identifies all positives.
•F1 Score: Harmonic mean of precision and recall.
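
These four metrics can be computed directly from the confusion-matrix counts. The labels and predictions below are made-up values for illustration, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, all four metrics come out to 0.75 for this data.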
Question 1:
What is backpropagation in the context of neural networks?
Answer: A method to update weights to minimize the error

Question 2:
Which of the following best describes optimization in machine learning?
Answer: Finding the best possible solution from all possible options

Question 3:
What is the primary goal of search algorithms?
Answer: To navigate through a set of possibilities to find an optimal solution

Question 4:
Logistic regression is mainly used for what type of tasks?
Answer: Classification tasks

Question 5:
Evolutionary algorithms are inspired by what?
Answer: Natural selection and biological evolution

Question 6:
Logical programming mainly deals with which of the following?
Answer: Representing and reasoning with logical statements

Question 7:
Rule-based systems are often used in which type of application?
Answer: Expert systems

Question 8:
Which of the following is true about fuzzy logic?
Answer: It handles reasoning that is approximate rather than fixed and exact

Question 9:
Which of these best describes unsupervised learning?
Answer: Finding patterns and groupings in data without labeled outputs

Question 10:
Which of these is a key feature of the perceptron model?
Answer: It can classify linearly separable data

HM

1. Question 1: What activation function is typically used for the output layer in binary classification?
• Answer: Sigmoid

2. Question 2: What is the role of the output layer in a neural network?
• Answer: To map learned representations to task-specific outputs

3. Question 3: What is the purpose of the loss calculation in neural networks?
• Answer: To minimize the difference between predictions and actual values

4. Question 4: What does the backpropagation algorithm primarily update in a neural network?
• Answer: Weights

5. Question 5: In feed-forward neural networks, data flows in which direction?
• Answer: Forward only

6. Question 6: What is the most common activation function recommended for most tasks in the hidden layers of a neural network?
• Answer: ReLU

7. Question 7: Batch processing in neural networks processes data samples one by one.
• Answer: False

8. Question 8: Backpropagation requires calculating the gradient of the loss with respect to network weights.
• Answer: True

9. Question 9: Multi-label classification uses sigmoid activation in the output layer.
• Answer: True
(Excluding Cross-Validation)
Supervised Learning Summary
• Definition: A type of machine learning where the model is trained on labeled data.
• Goal: Make accurate predictions on unseen data based on training data.

Key Terms
• Training Data: Input-output pairs used for training.
• Labels: Known outputs corresponding to the inputs.
• Features: Characteristics used as input for the model.
• Model: Mathematical representation trained to make predictions.

Common Algorithms
• Linear Regression: Predicts continuous values assuming a linear relationship.
• Logistic Regression: Estimates probabilities for binary classification.
• Decision Trees: Splits data based on feature values to make decisions.
• Support Vector Machines (SVM): Finds a hyperplane to separate classes.
• Neural Networks: Composed of interconnected nodes for learning complex patterns.

Model Evaluation Metrics


1. Accuracy: Proportion of correct predictions.
2. Precision: Ratio of true positives to total predicted positives.
3. Recall: Ratio of true positives to actual positives.
4. F1 Score: Harmonic mean of precision and recall; useful for imbalanced datasets.
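
The four metrics above can be computed directly from the true and predicted labels. The sketch below is a minimal illustration for binary labels; the function name `classification_metrics` and the toy labels are assumptions, and a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced toy data: only 2 of 10 samples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

On this imbalanced example accuracy is 0.9 even though the model finds only half of the positives (recall 0.5, precision 1.0, F1 about 0.67), which is why F1 is the more informative number here.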

Overfitting and Underfitting


• Overfitting:
• Model fits the training data too closely, noise included, and performs poorly on unseen data.
• Characteristics: High training accuracy, low testing accuracy, complex models.
• Solutions: Regularization (L1, L2), pruning, reducing complexity.
• Underfitting:
• Model is too simple to capture data patterns.
• Characteristics: Low training and testing accuracy, simple models.
• Solutions: Increase complexity, feature engineering, reduce regularization.

Hyperparameter Tuning
• Definition: Adjusting settings that govern training (e.g., learning rate).
• Methods: Grid search, random search, Bayesian optimization.
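
Grid search and random search differ only in how candidate settings are picked. In the sketch below, `validation_score` is a made-up stand-in for "train a model and score it on validation data", and the hyperparameter names and values are assumptions for illustration. Bayesian optimization is not shown.

```python
import itertools
import random

def validation_score(learning_rate, num_layers):
    """Made-up objective that peaks at learning_rate=0.01, num_layers=3."""
    return -abs(learning_rate - 0.01) * 10 - abs(num_layers - 3) * 0.1

grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [1, 2, 3, 4]}

def grid_search(grid):
    """Evaluate every combination of values in the grid."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(grid, n_trials, seed=0):
    """Evaluate only `n_trials` randomly sampled combinations."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_grid, best_grid_score = grid_search(grid)
best_random, best_random_score = random_search(grid, n_trials=5)
```

Grid search tries all 12 combinations and is guaranteed to find the best one in the grid; random search tries only 5, so it is cheaper but may miss it.
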
Real-World Applications
• Healthcare: Predicting outcomes, diagnosing diseases.
• Finance: Credit scoring, fraud detection.
• Marketing: Customer segmentation, recommendation systems.
• NLP: Sentiment analysis, text classification.

Regularization Techniques
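• Discourage overfitting, usually by adding a penalty term to the loss:
• L1 Regularization: Adds the sum of absolute values of weights.
• L2 Regularization: Adds the sum of squares of weights.
• Dropout: Randomly drops neurons during training; it adds no penalty term but reduces reliance on specific neurons.
• Elastic Net: Combines L1 and L2 regularization.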
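
The penalty-based techniques can be sketched as extra terms added to the data loss. The names `lam` (penalty strength) and `l1_ratio` (mix between L1 and L2) and the weight values are assumptions for illustration; libraries differ in how they scale and combine the two penalties.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squares of the weights."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam=0.1, l1_ratio=0.0):
    """data_loss + lam * (l1_ratio * L1 + (1 - l1_ratio) * L2).

    l1_ratio=0 gives pure L2, l1_ratio=1 gives pure L1, and values
    in between give an elastic-net style mix (assumed parametrization).
    """
    penalty = (l1_ratio * l1_penalty(weights)
               + (1 - l1_ratio) * l2_penalty(weights))
    return data_loss + lam * penalty

weights = [0.5, -2.0, 1.0]   # made-up weights
data_loss = 1.0              # made-up unregularized loss

l2_loss = regularized_loss(data_loss, weights, lam=0.1, l1_ratio=0.0)
elastic_loss = regularized_loss(data_loss, weights, lam=0.1, l1_ratio=0.5)
```

Large weights raise the total loss, so minimizing it pushes the model toward smaller weights and a simpler fit.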

Bias-Variance Tradeoff
• Bias: Systematic error from overly simple assumptions; high bias leads to underfitting.
• Variance: Sensitivity to the particular training data, including its noise; high variance leads to overfitting.
• Goal: Balance bias and variance for optimal performance.

Ensemble Learning
• Combines multiple models for better predictions:
• Bagging: Trains models on random data subsets (e.g., Random Forests).
• Boosting: Sequentially corrects errors (e.g., AdaBoost).
• Stacking: Combines predictions from multiple models using a meta-model.
• Voting Classifiers:
• Combines classifiers for predictions via majority votes or averaging probabilities.
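
The two voting schemes can be sketched in a few lines. The function names `hard_vote` and `soft_vote` and the example predictions are assumptions for illustration, not a library API.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over the class labels predicted by several classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the per-class probabilities and return the winning class index."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models
                for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: averaged[c])

# Three classifiers predicting a label for the same sample.
label_votes = ["cat", "dog", "cat"]
majority_label = hard_vote(label_votes)

# Three classifiers giving [P(class 0), P(class 1)] for the same sample.
class_probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
soft_winner = soft_vote(class_probs)
hard_winner = hard_vote([max(range(2), key=lambda c: p[c]) for p in class_probs])
```

In the second example two of the three classifiers lean toward class 1, so the majority vote picks class 1, while averaging probabilities picks class 0 because the first classifier is very confident. The two schemes can therefore disagree.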
Q5

1. Question 1: What is an indicator of overfitting in a model?
Answer: High training accuracy and low testing accuracy.

2. Question 2: Scikit-learn provides preprocessing tools for data.
Answer: True.

3. Question 3: Underfitting occurs when the model is too complex.
Answer: False.

4. Question 4: What does regularization aim to address in machine learning?
Answer: Balancing training error and generalization.

5. Question 5: When does underfitting typically occur?
Answer: When the model is too simple to capture data patterns.

6. Question 6: What is a key concept in hyperparameter tuning?
Answer: Finding the optimal settings for model performance.

7. Question 7: Which of the following is a common application of supervised learning in healthcare?
Answer: Predicting patient outcomes.

8. Question 8: What technique can prevent overfitting by reducing the reliance on specific neurons in neural networks?
Answer: Dropout.
