Ai Final

The document covers various search algorithms and optimization techniques in AI, including exhaustive search, greedy search, hill climbing, simulated annealing, and gradient descent. It also discusses Prolog's logic programming capabilities, rule-based systems, fuzzy logic, and machine learning types such as supervised, unsupervised, and reinforcement learning. Key concepts include the importance of balancing exploration and exploitation, the use of fuzzy sets for handling uncertainty, and the machine learning process from data collection to model evaluation.

Uploaded by

shaikham2004

Intro to AI final

Lecture 2

Exhaustive Search:
Systematically explores all possible solutions.
Guarantees finding the global optimum but is computationally
expensive.

Greedy Search & Hill Climbing:
Greedy Search: Makes locally optimal choices, hoping to find a global optimum. The Travelling Salesman Problem (TSP) is the example problem.
Hill Climbing: Iteratively improves a single solution by comparing neighboring solutions.
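A minimal sketch of hill climbing, assuming a one-dimensional integer search space. The objective `bumpy` and the helper `hill_climb` are hypothetical names chosen for illustration; the point is that the result depends on where the search starts.

```python
def hill_climb(f, x, step=1, max_iters=1000):
    """Maximize f over the integers, starting from x (illustrative helper)."""
    for _ in range(max_iters):
        # Neighbors of the current solution: one step left or one step right.
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # no neighbor is better: a (possibly local) maximum
        x = best
    return x


def bumpy(x):
    """Toy objective: local maximum at x = 2, global maximum at x = 8."""
    return -(x - 2) ** 2 + 3 if x < 5 else -(x - 8) ** 2 + 10


print(hill_climb(bumpy, 0))  # 2: stuck on the local maximum
print(hill_climb(bumpy, 6))  # 8: reaches the global maximum
```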

Exploration vs. Exploitation:

Balancing between exploring new possibilities (exploration)

and refining known good solutions (exploitation).
Critical in achieving an optimal balance for solution discovery.

Simulated Annealing:
Mimics the annealing process in metallurgy.
Uses a probabilistic technique to escape local optima and find a global optimum.
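A hedged sketch of simulated annealing on a toy integer problem. The objective `bumpy`, the cooling schedule, and all parameter values are assumptions made for illustration, not settings taken from the lecture. Worse moves are accepted with probability exp(delta / temp), which shrinks as the temperature cools.

```python
import math
import random


def bumpy(x):
    """Toy objective: local maximum at x = 2, global maximum at x = 8."""
    return -(x - 2) ** 2 + 3 if x < 5 else -(x - 8) ** 2 + 10


def simulated_annealing(f, x, temp=10.0, cooling=0.95, steps=500, seed=0):
    rng = random.Random(seed)
    best = x
    for _ in range(steps):
        candidate = x + rng.choice((-1, 1))
        delta = f(candidate) - f(x)
        # Always accept improvements; accept a worse move with
        # probability exp(delta / temp), where delta is negative.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            x = candidate
        if f(x) > f(best):
            best = x  # remember the best solution seen so far
        temp = max(temp * cooling, 1e-9)  # cool down, but never reach zero
    return best


result = simulated_annealing(bumpy, 0)
print(result, bumpy(result))
```

Early on, the high temperature favors exploration; as it drops, the search behaves more and more like plain hill climbing (exploitation).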

Continuous Optimization & Gradient Descent:

Applied to continuous, differentiable functions.

Gradient Descent: Minimizes a function by iteratively moving in the direction of steepest descent.

Gradient Ascent: Maximizes a function by moving in the direction of steepest ascent.
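As a small illustration, gradient descent can be sketched on f(x) = (x - 3)^2, whose derivative is 2(x - 3). The function name `gradient_descent`, the learning rate, and the step count are assumptions chosen for the example.

```python
def gradient_descent(grad, x, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has its minimum at x = 3; its gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(x_min, 6))  # 3.0
```

Gradient ascent is the same loop with the sign flipped (`x += learning_rate * grad(x)`).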
Assignment 1
1. In hill climbing search, what is a “local maximum”?

Answer: A peak that is higher than neighboring points but not the highest overall

2. What type of problems is gradient descent particularly useful for?

Answer: Continuous optimization problems

3. What is the primary purpose of simulated annealing in optimization?

Answer: To avoid getting trapped in local optima by allowing occasional worse moves

4. What is the goal of optimization in computational problems?

Answer: To find the best possible solution

5. Which of the following best describes “exhaustive search”?

Answer: Considering all possible solutions

6. What is the main principle behind greedy search algorithms?

Answer: To choose the best immediate option at each step

7. Greedy algorithms always find the global optimum in all cases.

Answer: False

8. In continuous optimization, what is the goal of gradient descent?

Answer: To find the lowest point in a solution space

9. Which optimization technique is inspired by the process of cooling metals?

Answer: Simulated annealing

Lecture 3
A key feature of Prolog is its ability to handle uncertain or incomplete information. In Prolog, a programmer can specify a set of rules and facts that are known to be true, but they can also specify rules and facts that might be true or false. The Prolog interpreter will then use those rules and facts to automatically reason about the problem domain and find solutions that are most likely to be correct, given the available information.

Fact: a proposition that is assumed to be true.
Query: a proposition whose truth is to be determined.

Headed Horn Clause: Contains a single atomic proposition on the left side.
Headless Horn Clause: Has an empty left side, typically used to state facts.

Bottom-up resolution, forward chaining

Begin with facts and rules of the database and attempt to find a sequence that leads to a goal

Works well with a large set of possibly correct answers

Top-down resolution, backward chaining

Begin with a goal and attempt to find a sequence that leads to a set of facts in the database

Works well with a small set of possibly correct answers

Prolog implementations use backward chaining

Logical Operators
Breadth-First Search (BFS)
A search strategy that explores all nodes at the present depth level before moving to the next level.
How It Works:
-Explores all subgoals at the current level simultaneously.
-Moves level by level, expanding all possible nodes at each depth.
-Ensures all possible paths are considered in parallel.
Advantages:
-Guaranteed to find the shortest path to a solution, if one exists.
-Does not get stuck in infinite loops.
Disadvantages:
-Can be memory-intensive as it stores all nodes at the current level.
-Slower in deep search spaces compared to DFS.

Depth-First Search (DFS)
A search strategy that explores as far down a branch as possible before backtracking.
How It Works:
-Focuses on one subgoal at a time.
-Attempts to find a complete proof for the first subgoal before moving to the next.
-If a subgoal fails, backtracks to the previous choice point to try alternative paths.
Advantages:
-Efficient in terms of memory usage.
-Quickly finds a solution in deep but narrow search spaces.
Disadvantages:
-May get stuck in deep or infinite paths (i.e., may not find a solution if it exists on a different path).
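The difference between the two strategies can be sketched on a small graph. The graph and the helper names `bfs_path` and `dfs_path` are illustrative assumptions; this is plain graph search, not Prolog's resolution engine.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Explore level by level; the first path found is a shortest one."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def dfs_path(graph, start, goal, visited=None):
    """Follow one branch as deep as possible, backtracking on failure."""
    if visited is None:
        visited = set()
    visited.add(start)
    if start == goal:
        return [start]
    for nxt in graph.get(start, []):
        if nxt not in visited:
            rest = dfs_path(graph, nxt, goal, visited)
            if rest is not None:
                return [start] + rest
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(bfs_path(graph, "A", "G"))  # ['A', 'C', 'G']
print(dfs_path(graph, "A", "G"))  # ['A', 'B', 'D', 'G']
```

BFS returns the two-step path, while DFS commits to the first branch and returns a longer one.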

QUESTION 1
What is the primary purpose of logic programming?

Answer: to define rules and relationships for problem-solving.

QUESTION 2
In predicate calculus, what is a "predicate"?

Answer: A function that represents a property or relationship.

QUESTION 3
Which logic programming language is most commonly associated with artificial intelligence and computational linguistics?

Answer: Prolog.

QUESTION 4
In Prolog, which search strategy explores as far down a branch as possible before backtracking?

Answer: depth-first search.

QUESTION 5
Which is a common application of logic programming?

Answer: Natural language processing.

QUESTION 6
In logic programming, "resolution" is used to prove the truth of propositions.

Answer: True.

QUESTION 7
Prolog uses depth-first search as its default strategy for solving goals.

Answer: True.
Lecture 4

Types of Rule-Based Inference

1. Forward Chaining:
• Starts with known facts and applies rules to generate new facts.
• Continues until a specific goal is reached.
• Commonly used in real-time systems.
Example:
For a weather-based system:
• Input: Temperature = 5°C, Condition = Rainy
• Output: “Wear a coat and take an umbrella.”

2. Backward Chaining:
• Starts with a goal and works backward to find facts supporting the goal.
• Often used in diagnostic systems (e.g., determining a disease based on symptoms).
Example:
Diagnosing a disease: the system starts from a hypothesized disease (the goal) and works backward to check whether the patient's symptoms support it.
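Forward chaining can be sketched as a loop that keeps firing rules until no new facts appear. The fact and rule names below are assumptions that loosely mirror the weather example; a real system would also map raw readings (such as 5°C) onto facts like `temperature_low`.

```python
def forward_chain(facts, rules):
    """Apply rules to known facts until nothing new can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts.
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


rules = [
    (["temperature_low"], "wear_coat"),
    (["rainy"], "take_umbrella"),
    (["wear_coat", "take_umbrella"], "dressed_for_weather"),
]
derived = forward_chain(["temperature_low", "rainy"], rules)
print(sorted(derived))
```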

Steps to Build a Rule-Based System

1. Define the Problem Domain: Understand the area where the system will be applied.
2. Gather Knowledge: Collect rules from domain experts.
3. Design the Knowledge Base: Organize the rules logically.
4. Implement the Inference Engine: Choose either forward or backward chaining.
5. Test and Validate: Ensure that the system works correctly with real-world data.

Fuzzy Logic is an extension of classical logic, where truth values range between 0 and 1 instead of being strictly
true (1) or false (0). It is used to handle uncertainty and imprecision, mimicking how humans reason.

Why Fuzzy Logic?

• Handles ambiguous and imprecise information.
• Suitable for systems that require approximate reasoning, like control systems, decision-making, and pattern recognition.

Basic Concepts in Fuzzy Logic

• Fuzzy Sets: In contrast to classical (crisp) sets, where an element either belongs or doesn’t, fuzzy sets
allow partial membership (values between 0 and 1).
Example:
• “Temperature is hot” could have a membership degree of 0.7 (somewhat hot) or 0.9 (very hot),
depending on how close it is to the boundary.

• Membership Functions: These define how each element maps to a degree of membership in a fuzzy set.
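One common choice is a triangular membership function, sketched below. The shape and the breakpoints (20, 40, 60) are assumptions for illustration; the notes do not say which membership functions the course uses.

```python
def triangular(x, a, b, c):
    """Membership rises linearly from a to the peak at b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy set "hot", peaking at 40 degrees.
print(triangular(30, 20, 40, 60))  # 0.5
print(triangular(40, 20, 40, 60))  # 1.0
```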

Crisp Sets vs. Fuzzy Sets

• Crisp Sets: A crisp set is binary in nature. An element either belongs to the set (1) or does not (0).
Examples include even and odd numbers or a class of students (boys or girls).
• Fuzzy Sets: Membership in fuzzy sets is gradual, and elements can partially belong to a set. For
example, a temperature of 25°C might be “somewhat hot,” while 30°C might be “very hot.”
Fuzzy Rule-Based Systems

A Fuzzy Rule-Based System uses fuzzy logic instead of Boolean logic to make decisions.

Components:
• Fuzzification Module: Converts crisp input values into fuzzy values (e.g., a temperature reading might be converted into fuzzy
terms like “warm”).

• Inference Engine: Applies fuzzy rules (e.g., “If temperature is warm, set heating to medium”).

• Defuzzification Module: Converts fuzzy outputs back into crisp values for decision-making.

Example:
A heating system might have rules such as:
• If temperature is “cold,” then set heating to “high.”
• If temperature is “warm,” then set heating to “medium.”
• If temperature is “hot,” then set heating to “low.”
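The three components can be sketched end to end for the heating rules above. Everything numeric here is an assumption: the membership breakpoints, the heating percentages standing in for "high", "medium", and "low", and the weighted-average defuzzification (a simplification of centroid methods).

```python
def falling(x, lo, hi):
    """1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    return 1.0 - falling(x, lo, hi)


def triangle(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def heating_level(temp_c):
    # Fuzzification: crisp temperature -> degrees of cold / warm / hot.
    degrees = {
        "cold": falling(temp_c, 5, 15),
        "warm": triangle(temp_c, 10, 20, 30),
        "hot": rising(temp_c, 25, 35),
    }
    # Inference: each rule fires to the degree its condition holds.
    # Assumed crisp outputs: high = 100%, medium = 50%, low = 10%.
    rule_output = {"cold": 100.0, "warm": 50.0, "hot": 10.0}
    # Defuzzification: weighted average of the rule outputs.
    total = sum(degrees.values())
    return sum(degrees[k] * rule_output[k] for k in degrees) / total


print(heating_level(0))     # 100.0 (fully cold)
print(heating_level(12.5))  # 75.0 (partly cold, partly warm)
print(heating_level(20))    # 50.0 (fully warm)
```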

Differences Between Probability and Fuzziness

• Probability: Deals with the likelihood of an event happening (e.g., “There’s a 50% chance of rain tomorrow”).
• Fuzziness: Deals with the degree of truth (e.g., “It’s somewhat warm today”). Fuzziness acknowledges that the boundary between
“cold” and “warm” isn’t sharp.

Fuzzy Set Representation

Fuzzy sets can be represented in various ways, depending on the membership function. The closer an element is to the ideal value, the higher
its membership value. As the element moves away, the membership decreases.

Example:
• For a heating system, a temperature of 15°C might be “warm” with a membership value of 0.7, while a temperature of 20°C might
be “warm” with a value of 0.9.

Advantages and Challenges of Fuzzy Logic

Advantages:
• Better at handling uncertain and imprecise data.
• Models human-like reasoning, making it suitable for applications like control systems.
• Flexible and adaptable.

Challenges:
• Defining accurate membership functions and rules can be difficult.
• Requires tuning to perform optimally.
• Not always as interpretable as rule-based systems.
Intro. To AI systems QUIZ 2

QUESTION 1:
Fuzzy rule-based systems use:
* Fuzzy logic for reasoning

QUESTION 2:
Fuzzy sets allow for partial membership of elements.
* True

QUESTION 3:
Which inference method starts with a goal and works backward to find supporting facts?
* Backward Chaining

QUESTION 4:
Fuzzy logic can handle situations with uncertainty and imprecision.
* True

QUESTION 5:
Fuzzy logic differs from classical logic by using:
* Degrees of membership

QUESTION 6:
A crisp set is characterized by:
* Clear boundaries and distinct elements

QUESTION 7:
In fuzzy logic, a membership function defines:
* The degree of membership for elements in a fuzzy set

QUESTION 8:
Which of the following is a key advantage of fuzzy logic?
* Handles uncertainty and imprecision

QUESTION 9:
What is the primary component of a rule-based system that stores temporary information during processing?
* Working Memory

Assignment 2

Question 1
Rule-based systems are always more accurate than fuzzy logic systems.
Answer: False

Question 2
In a fuzzy set, membership values:
Answer: Can range from 0 to 1

Question 3
Fuzzy logic differs from classical logic in that:
Answer: Fuzzy logic allows for partial truth values between 0 and 1

Question 4
What is the primary advantage of using fuzzy logic over classical logic in certain applications?
Answer: Fuzzy logic can handle uncertainty and imprecision

Question 5
Backward chaining is commonly used in expert systems.
Answer: True

Question 6
Fuzzy logic can be used to model human reasoning and decision-making processes.
Answer: True

Question 7
The process of converting crisp inputs into fuzzy values is called:
Answer: Fuzzification

Question 8
Which of the following is not a component of a rule-based system?
Answer: Defuzzification module

Question 9
Which of the following is a common application of fuzzy logic?
Answer: Air conditioning control systems
Lecture 5

Swap mutation
Pick any two elements and swap them.
Example: 0123456789 -> 0153426789 (2 and 5 swapped)

Insert mutation
Pick any two elements and move the second so that they sit beside each other.
Example: 12345678 -> 12364578

Scramble mutation
Pick a set of elements and move them.
Example: 12345789 -> 17892345

Crossover operation: [diagram]
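The mutation operators can be sketched as below. The function names and the explicit index arguments are assumptions made so the examples are reproducible; in a genetic algorithm the positions would be drawn at random. Note that `scramble_mutation` follows the conventional definition (shuffle a chosen segment in place), which is stated here as an assumption and is not taken from the notes.

```python
import random


def swap_mutation(genes, i, j):
    """Swap the elements at positions i and j."""
    g = list(genes)
    g[i], g[j] = g[j], g[i]
    return g


def insert_mutation(genes, i, j):
    """Move the element at position j so it sits right after position i (i < j)."""
    g = list(genes)
    g.insert(i + 1, g.pop(j))
    return g


def scramble_mutation(genes, i, j, rng):
    """Shuffle the segment between positions i and j (inclusive)."""
    g = list(genes)
    segment = g[i:j + 1]
    rng.shuffle(segment)
    g[i:j + 1] = segment
    return g


print("".join(swap_mutation("0123456789", 2, 5)))   # 0153426789
print("".join(insert_mutation("12345678", 2, 5)))   # 12364578
print("".join(scramble_mutation("12345678", 1, 4, random.Random(0))))
```

All three operators return a permutation of the parent, which is what makes them suitable for problems such as the TSP.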
Lecture 6

Introduction to Machine Learning

• Definition: Machine Learning involves algorithms that improve automatically through experience, enabling systems to:
• Generalize: Provide sensible outputs for unseen inputs.
• Extract and apply relevant information from data to analyze new data.

Types of Machine Learning

• Supervised Learning:
• Learning from labeled data to predict labels for new data.
• Example: Classifying animals based on given labels like “dog,” “cat,” etc.

• Unsupervised Learning:
• No labeled data; focuses on identifying patterns and grouping data.
• Example: Clustering users based on reading habits to recommend articles.

• Reinforcement Learning:
• Learning through rewards and penalties.
• Example: A pigeon pecking the right button for a reward.

The Machine Learning Process

Steps involved:
1. Data Collection and Preparation: Gathering and cleaning data.
2. Feature Selection and Extraction: Identifying key variables (features) from the data.
3. Algorithm Choice: Selecting an ML algorithm suitable for the task.
4. Model Selection: Choosing the best model structure.
5. Parameter Selection: Tuning model parameters for optimal performance.
6. Training: Teaching the model using training data.
7. Evaluation: Assessing the model using testing data.

Classification and Features

• Classifier:
• Maps objects to predefined labels (e.g., MNIST dataset for digit classification).
• Features:
• Attributes representing data.
• Can be categorical (e.g., color) or numerical (e.g., weight).
• Feature Engineering: Transforming raw data into suitable inputs for algorithms (e.g., scaling, encoding categorical data).

Supervised Learning

• Key Concepts:
• Training and Test Sets: Used to train and evaluate the model, respectively.
• Confusion Matrix: Tool to measure performance (e.g., true positives, false positives).
• Accuracy: Proportion of correct predictions out of total predictions.

Example Algorithm: k-Nearest Neighbors (kNN)

• How it Works:
• Measures similarity (distance) between data points in feature space.
• Predicts based on the majority label of the nearest neighbors.
• Key Considerations:
• The value of k (number of neighbors) impacts accuracy and generalization.
• Scaling features ensures distances are meaningful.
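A minimal kNN sketch using Euclidean distance and a majority vote. The toy dataset, the labels, and the helper name `knn_predict` are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label."""
    # Sort training points by distance to the query and keep the k nearest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
    ((5.0, 5.0), "dog"), ((5.2, 4.9), "dog"), ((4.8, 5.1), "dog"),
]
print(knn_predict(train, (1.1, 1.0)))  # cat
print(knn_predict(train, (5.1, 5.0)))  # dog
```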

Feature Scaling

• Importance:
• Features with different scales can distort distance measurements.
• Normalization techniques (e.g., Min-Max Scaler, Standard Scaler) adjust features to a uniform range.
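The two scalers named above can be sketched in plain Python. These are hand-written illustrations of the formulas, not the library implementations; the function names are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(standardize([10, 20, 30]))
```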

Practical Applications

• Examples include email spam detection, language models predicting the next word, and clustering users for recommendations.
Homework 3

Question 1
What type of data does the k-Nearest Neighbors (kNN) algorithm require?
Answer: numerical features

Question 2
Which of the following is an example of supervised learning?
Answer: Predicting email spam

Question 3
Reinforcement learning is based on the psychological concept of Operant Conditioning.
Answer: True

Question 4
Which of the following is a type of machine learning?
Answer: All of the above

Question 5
Which machine learning method is used for the MNIST dataset?
Answer: Supervised learning

Question 6
What does supervised learning involve?
Answer: Learning from exemplars

Question 7
The confusion matrix is used to evaluate the performance of supervised learning classification models.
Answer: True

Question 8
What is the purpose of scaling in machine learning?
Answer: To ensure input features are within a similar range

Quiz 3

1.
Which of the following is an example of supervised learning?
Answer: Linear regression

2.
What does the Mean Squared Error (MSE) measure?
Answer: The average squared difference between actual values and predictions

3.
What is the function of the bias term in a perceptron?
Answer: It shifts the decision boundary

4.
Gradient descent can be used to minimize Mean Squared Error (MSE).
Answer: True

5.
The perceptron is an example of a linear classifier.
Answer: True

6.
Which of the following inspires AI and ML?
Answer: The human brain

7.
What is the purpose of gradient descent?
Answer: To minimize the Mean Squared Error
Lecture 7: Perceptron and Linear Regression

1. Perceptron:
The perceptron is a simple model for binary classification, consisting of:

Inputs: Features from the data.

Weights: Assigned to each input to indicate its importance.
Bias: Adjusts the output to better fit data patterns.
Activation Function: Converts the weighted sum of inputs to a binary output (e.g., a step function).

2. Training a Perceptron:
Forward pass: Compute the weighted sum of inputs, apply the activation function to produce an output.
Weight updates: Adjust based on the difference between predicted and actual labels to minimize error.
Linear separability: The perceptron works if data can be separated by a straight line.
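The training procedure can be sketched on the logical AND function, which is linearly separable. The helper names, the learning rate of 1.0, and the epoch count are assumptions for this example.

```python
def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Forward pass, then nudge weights by the prediction error.
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])  # [0, 0, 0, 1]
```

The same loop never settles on XOR, because no straight line separates its classes.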

The perceptron is a linear classifier.

[Figure: inputs and weights feed an adder, followed by a step function]

Inductive bias refers to the set of assumptions a model uses to generalize beyond the training data. It
affects how a model makes predictions on unseen data.

Examples:

• Linear Models:

• Assume relationships between variables are linear, meaning the change in the output is proportional to the
change in the input.

• Decision Trees:

• Assume the data can be split hierarchically into subsets, which allows the model to capture complex
interactions between features through a series of binary decisions.
Linear Regression

• Definition: Linear regression is a method for modeling the relationship between a dependent variable (target) and one or
more independent variables (features) using a linear equation.

• Simple Linear Regression: One feature, produces a straight line.

• Multiple Linear Regression: Multiple features, forms a hyperplane.

What is MSE?

• MSE stands for Mean Squared Error, and it measures the average squared difference between the actual data points and the predictions made by the line.

Why do we need MSE?

• When we create a line (or a model) to predict values from data, we want to know how well this line fits the actual data points.

• We also want to compare different lines (or models) to see which one does a better job at predicting the data.

• MSE helps us quantify the "goodness" of the fit by telling us how far the predictions are from the actual values.

Purpose of MSE

• Minimization Objective: In regression models, the goal is to find parameters (weights and bias) that minimize the MSE.

• Model Evaluation: MSE indicates how well the model predictions match the actual values. Smaller MSE values mean better performance.

Disadvantages of MSE

1. Outlier Sensitivity:

Because errors are squared, outliers (extreme deviations) can

disproportionately increase the MSE.

2. Units of Measurement:

The units of MSE are the square of the target variable’s units, which can
make interpretation less intuitive.
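A short sketch of the MSE computation, with made-up numbers chosen to show the outlier sensitivity described above: an error ten times larger contributes a hundred times more.

```python
def mse(actual, predicted):
    """Average of the squared differences between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [2.0, 4.0, 6.0, 8.0]
small_error = [2.0, 4.0, 6.0, 9.0]   # one prediction off by 1
outlier = [2.0, 4.0, 6.0, 18.0]      # one prediction off by 10

print(mse(actual, small_error))  # 0.25
print(mse(actual, outlier))      # 25.0
```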

Applications of MSE

1. Linear Regression:

MSE is used as the loss function to train regression models. By

minimizing MSE, the model learns the best-fitting parameters.

2. Model Comparison:

MSE is commonly used to compare the performance of different

regression models or parameter settings.

Between lectures 7 and 8

Definition:

Linear regression is a method to predict a continuous outcome based on the relationship between dependent and independent
variables.

Key Features:

• Model: Predicts a continuous value using the equation y = w1·x1 + w2·x2 + ... + wn·xn + b, where w1 ... wn are the weights and b is the bias.

• Goal: Minimize the difference between predicted and actual values (error).

• Output: A continuous value (e.g., house price, temperature).

• Assumption: The relationship between variables is linear.

Learning Process:

1. Calculate the best-fitting line by minimizing the Mean Squared Error (MSE).

2. Use Gradient Descent or other methods to optimize weights.

Use Cases:

• Predicting prices (houses, stocks)

• Forecasting trends (weather, sales)
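The learning process can be sketched for simple linear regression (one feature) with batch gradient descent on the MSE. The data, learning rate, and epoch count are assumptions; the points lie exactly on y = 2x + 1 so the expected fit is known.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```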

Lecture 8: Logistic Regression

1. Logistic Function:
• Maps any input to a range between 0 and 1, representing probabilities.
• Essential for converting linear regression outputs to probabilities for classification.

2. Forward Pass:
• Compute the linear combination of weights and inputs.
• Apply the logistic function to produce a probability for the positive class.

3. Backward Pass in Logistic Regression:

• Uses gradient descent to minimize the cross-entropy loss, which measures the difference between predicted
probabilities and actual class labels.

Sigmoid Curve:

• A mathematical function that maps any input from (−∞, ∞) to a value between 0 and 1.

• Common in logistic regression for converting linear outputs into probabilities.

• Key Properties:

• Monotonic: The function is always increasing, never reversing direction.

• Probabilistic Interpretation: Transforms numeric outputs into probabilities, making it useful for classification tasks.
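A small sketch of the sigmoid, sigma(z) = 1 / (1 + e^(-z)). The two-branch form is an implementation choice made here to avoid overflow for large negative inputs; it computes the same function.

```python
import math


def sigmoid(z):
    """Logistic function, written to stay numerically stable for any z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return e / (1.0 + e)


print(sigmoid(0))      # 0.5
print(sigmoid(-1000))  # 0.0
print(sigmoid(1000))   # 1.0
```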
Understanding logistic regression

Logistic Regression:

• A probability-based model used for binary classification.
• It predicts the likelihood that an input 𝒙 belongs to one of two classes:
*Class 1 (𝑡 = 1)
*Class 0 (𝑡 = 0)

Goal:
For a given observation 𝒙 (feature vector), determine:
• The probability that 𝒙 belongs to Class 1.
• The probability that 𝒙 belongs to Class 0.
• Compare these probabilities to assign the input to the most likely class.

Cross-entropy loss, also known as log-loss, is a measure used to quantify the difference between two probability distributions. It's commonly used in classification problems, especially for models like logistic regression and neural networks.

• Cross-entropy penalizes the model more for confident wrong predictions.
• If the predicted probability is close to the true label (either 0 or 1), the loss will be small.
• If the model is confident but wrong, the loss increases sharply.

Use in Optimization:

• Cross-entropy loss is used as the objective function in classification tasks, and the model parameters are adjusted to minimize this loss during training (e.g., using gradient descent).

Gradient descent

• Loss Function: Measures how well the model predicts the target values.
• No Closed-Form Solution: Unlike linear regression, there's no simple formula to find the best model.
• Good News: The log-loss function is convex.
• This means there are no local minima other than the global minimum.
• We always know the direction to move for optimization.

Variants of Gradient Descent

Batch Training:
• Calculate the loss for the whole training set and the gradient for this.
• Make one move in the correct direction.
• Repeat (an epoch).
• Can be slow.

Stochastic Gradient Descent:
• Pick one item.
• Calculate the loss for this item.
• Calculate the gradient for this item and move in the opposite direction.
• Each move does not have to be towards the direction of the gradient for the whole set.
• But the overall effect may be good.
• Can be faster.

Mini-Batch Training:
• Pick a subset of the training set of a certain size.
• Calculate the loss for this subset.
• Make one move in the direction opposite of this gradient.
• Repeat (an epoch).
• A good compromise between the two extremes.
• (The other two are subcases of this).

Comparison and Application

1. Batch Gradient Descent is suitable for smaller datasets where computational resources are not a concern.
2. Stochastic Gradient Descent is effective for online learning or when working with very large datasets.
3. Mini-Batch Gradient Descent is often the most practical approach, striking a balance between efficiency and convergence stability.
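A hedged sketch tying these pieces together: logistic regression with one feature, trained by mini-batch gradient descent on the cross-entropy loss. The dataset, hyperparameters, and function names are assumptions. Setting `batch_size=1` gives stochastic gradient descent, and `batch_size=len(data)` gives batch training.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(t, p, eps=1e-12):
    """Log-loss for one example with true label t and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def train_logistic(data, learning_rate=0.1, epochs=200, batch_size=2, seed=0):
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # For cross-entropy with a sigmoid, dLoss/dz = p - t per example.
            errors = [sigmoid(w * x + b) - t for x, t in batch]
            grad_w = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


data = [(-2.5, 0), (-1.5, 0), (-0.5, 0), (0.5, 1), (1.5, 1), (2.5, 1)]
w, b = train_logistic(data)
mean_loss = sum(cross_entropy(t, sigmoid(w * x + b)) for x, t in data) / len(data)
print(w, b, mean_loss)

# Confident and wrong is penalized far more than confident and right.
print(cross_entropy(1, 0.01), cross_entropy(1, 0.99))
```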
Multi-Class Classification
•Assigns a label (class) from a finite set of labels to an observation.

Examples:
•Binary: Yes/No, 1/0.
•Multi-class: To each observation, choose one label from a set.

Approaches:
1.One-vs-Rest Classifier (One-vs-All):
•Builds one classifier per class.
•Compares scores of all classifiers and chooses the class with the highest score.

2.Multinomial Logistic Regression (Softmax Regression):
•Directly handles multi-class classification by modeling probabilities for all classes.

Multinomial Logistic Regression
Extension:
Extends logistic regression to handle multiple classes.

Model:
Calculates the probability of each class using the softmax function.
Optimizes the log-loss (cross-entropy) for multi-class predictions.

Applications: Handwriting recognition, sentiment analysis, image classification.
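The softmax function used by multinomial logistic regression can be sketched as follows. The scores are made-up values; subtracting the maximum score before exponentiating is a standard stability trick that does not change the result.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)

# Multi-class cross-entropy for one example: -log(probability of the true class).
true_class = 0
print(-math.log(probs[true_class]))
```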

One-vs-All (One-vs-Rest) Classification

One-vs-All (OvA) is a strategy for handling multi-class classification tasks using binary classification algorithms.

Advantages
• Simple to implement with binary classifiers.
• Works well for most classification problems.

Disadvantages
• Assumes independence between classifiers.
• Can struggle with ambiguous cases where multiple classifiers give high probabilities.

Homework 4:

Question 1
Which of the following is used in logistic regression to measure the difference between predicted and actual outcomes?
Answer: Cross-entropy loss

Question 2
Which of the following is an assumption of linear regression?
Answer: The relationship between variables is linear

Question 3
Which function does logistic regression use to map predicted values to
probabilities?
Answer: Sigmoid function

Question 4
Multi-class classification involves predicting more than two possible categories.
Answer: True

Question 5
What type of outcome does logistic regression typically predict?
Answer: Binary

Question 6
What is a key advantage of stochastic gradient descent (SGD)?
Answer: It only updates based on one item, making it faster

Question 7
What does logistic regression predict?
Answer: A categorical outcome

Question 8
Logistic regression assumes a linear relationship between the logit of the
outcome and predictor variables.
Answer: True

Question 9
One vs. rest is a method used for binary classification only.
Answer: False. One-vs-rest uses binary classifiers to solve multi-class problems.
Lecture 9

1. Feed-Forward Neural Networks (Multi-Layer Perceptron)

• Structure:
• Consists of input, hidden, and output layers.
• Connections flow from the input to the output layers in one direction.

• Hidden Layers:
• Nodes process inputs through weighted sums and activation functions.
• Activation functions introduce non-linearity, enabling the network to model complex relationships.

2. Activation Functions
• Purpose: Introduce non-linearity to handle complex patterns.

• Common functions:
• Sigmoid: Outputs values between 0 and 1.
• ReLU: Outputs the input directly if positive, else zero.
• Softmax: Converts raw scores into probabilities (used in multi-class classification).
• Tanh: Outputs values between -1 and 1.

3. Tasks and Output Layer

• Regression: Continuous output; no activation function in the output layer.
• Binary Classification: One output neuron with a sigmoid activation function.
• Multi-Class Classification: Multiple output neurons with softmax activation.
• Multi-Label Classification: Independent probabilities for each label using sigmoid activation.

1.Regression:

1. Output: A single continuous value or multiple values.

2. Activation function: None (or linear activation).

3. Example: Predicting house prices.

2.Binary Classification:

1. Output: A single neuron, outputting a probability between 0 and 1.

2. Activation function: Sigmoid or logistic.

3. Example: Classifying whether an email is spam or not.

3.Multi-Class Classification:

1. Output: Multiple neurons, each corresponding to a class.

2. Activation function: Softmax (to get class probabilities).

3. Example: Recognizing digits (0-9) from images.

4.Multi-Label Classification:

1. Output: Multiple neurons, each outputting a probability for different

labels.

2. Activation function: Sigmoid (for independent probability per label).

3. Example: Detecting multiple objects in an image (cat, car, tree).
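The difference between the multi-class and multi-label output layers can be sketched by applying both activations to the same raw scores. The scores and the 0.5 decision threshold are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, -1.0]  # hypothetical raw outputs for (cat, car, tree)

# Multi-class: probabilities compete and sum to 1; pick exactly one class.
class_probs = softmax(scores)
print(class_probs.index(max(class_probs)))  # 0

# Multi-label: each label gets its own independent probability.
label_probs = [sigmoid(s) for s in scores]
print([p >= 0.5 for p in label_probs])  # [True, True, False]
```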

4. Backpropagation Algorithm

Overview: Backpropagation is a key algorithm for training neural networks by adjusting weights based on the error of
predictions.

1. Forward Pass: Input data flows through the network to generate an output, which is compared to the true target to
compute a loss.

2. Loss Calculation: The loss quantifies the difference between predicted and actual values.

3. Backward Pass:

• Calculate the gradient of the loss with respect to the output.

• Use the chain rule to propagate this gradient backward through the network.

4. Weight Update: Adjust weights and biases using an optimization algorithm (e.g., stochastic gradient descent) by
subtracting a fraction of the gradients.

5. Iteration: Repeat the process over many epochs to improve the model's performance.
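The five steps can be sketched for a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output. The architecture, the squared-error loss, the single training example, and all names are assumptions chosen to keep the chain rule visible; this is an illustration, not a general-purpose implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def train_step(params, x, target, learning_rate=0.5):
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)            # 1. forward pass
    loss = 0.5 * (output - target) ** 2            # 2. loss calculation
    # 3. backward pass: chain rule through the output and hidden sigmoids
    delta_out = (output - target) * output * (1 - output)
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w2, hidden)]
    # 4. weight update: subtract a fraction of each gradient
    new_w2 = [w - learning_rate * delta_out * h for w, h in zip(w2, hidden)]
    new_b2 = b2 - learning_rate * delta_out
    new_w1 = [[w - learning_rate * d * xi for w, xi in zip(row, x)]
              for row, d in zip(w1, delta_hidden)]
    new_b1 = [b - learning_rate * d for b, d in zip(b1, delta_hidden)]
    return (new_w1, new_b1, new_w2, new_b2), loss


rng = random.Random(0)
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],  # hidden weights
    [0.0, 0.0],                                                  # hidden biases
    [rng.uniform(-1, 1) for _ in range(2)],                      # output weights
    0.0,                                                         # output bias
)
losses = []
for _ in range(200):                               # 5. iteration
    params, loss = train_step(params, x=[1.0, 0.0], target=1.0)
    losses.append(loss)
print(losses[0], losses[-1])
```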

5. Gradient Descent Variants

•Batch Training: Uses the entire dataset for weight updates.
•Stochastic Gradient Descent (SGD): Updates weights for each data point.
•Mini-Batch Training: Processes a small subset of the dataset for weight updates (a balance between batch and SGD).

7. Practical Considerations
•Scaling: Normalize input data to ensure efficient training.
•Weight Initialization: Random initialization helps avoid symmetry issues.
•Avoiding Overfitting:
•Regularization techniques like dropout.
•Early stopping by monitoring validation set loss.

8. Evaluation Metrics
•Accuracy: Fraction of correct predictions.
•Precision: Focuses on correctly predicted positives.
•Recall: Measures how well the model identifies all positives.
•F1 Score: Harmonic mean of precision and recall.
Question 1:
What is backpropagation in the context of neural networks?
Answer: A method to update weights to minimize the error

Question 2:
Which of the following best describes optimization in machine learning?
Answer: Finding the best possible solution from all possible options

Question 3:
What is the primary goal of search algorithms?
Answer: To navigate through a set of possibilities to find an optimal solution

Question 4:
Logistic regression is mainly used for what type of tasks?
Answer: Classification tasks

Question 5:
Evolutionary algorithms are inspired by what?
Answer: Natural selection and biological evolution

Question 6:
Logical programming mainly deals with which of the following?
Answer: Representing and reasoning with logical statements

Question 7:
Rule-based systems are often used in which type of application?
Answer: Expert systems

Question 8:
Which of the following is true about fuzzy logic?
Answer: It handles reasoning that is approximate rather than fixed and exact

Question 9:
Which of these best describes unsupervised learning?
Answer: Finding patterns and groupings in data without labeled outputs

Question 10:
Which of these is a key feature of the perceptron model?
Answer: It can classify linearly separable data

1. Question 1: What activation function is typically used for the output layer in binary classification?
• Answer: Sigmoid

2. Question 2: What is the role of the output layer in a neural network?
• Answer: To map learned representations to task-specific outputs

3. Question 3: What is the purpose of the loss calculation in neural networks?
• Answer: To minimize the difference between predictions and actual values

4. Question 4: What does the backpropagation algorithm primarily update in a neural network?
• Answer: Weights

5. Question 5: In feed-forward neural networks, data flows in which direction?
• Answer: Forward only

6. Question 6: What is the most common activation function recommended for most tasks in the hidden layers of a neural network?
• Answer: ReLU

7. Question 7: Batch processing in neural networks processes data samples one by one.
• Answer: False

8. Question 8: Backpropagation requires calculating the gradient of the loss with respect to network weights.
• Answer: True

9. Question 9: Multi-label classification uses sigmoid activation in the output layer.
• Answer: True
Supervised Learning Summary (Excluding Cross-Validation)
• Definition: A type of machine learning where the model is trained on labeled data.
• Goal: Make accurate predictions on unseen data based on training data.

Key Terms
• Training Data: Input-output pairs used for training.
• Labels: Known outputs corresponding to the inputs.
• Features: Characteristics used as input for the model.
• Model: Mathematical representation trained to make predictions.

Common Algorithms
• Linear Regression: Predicts continuous values assuming a linear relationship.
• Logistic Regression: Estimates probabilities for binary classification.
• Decision Trees: Splits data based on feature values to make decisions.
• Support Vector Machines (SVM): Finds a hyperplane to separate classes.
• Neural Networks: Composed of interconnected nodes for learning complex patterns.

Model Evaluation Metrics

1. Accuracy: Proportion of correct predictions.
2. Precision: Ratio of true positives to total predicted positives.
3. Recall: Ratio of true positives to actual positives.
4. F1 Score: Harmonic mean of precision and recall; useful for imbalanced datasets.
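
The four metrics above can be computed directly from counts of true positives, false positives, and false negatives. The following is a minimal sketch in plain Python; the function name `classification_metrics`, the `positive` parameter, and the sample labels are illustrative assumptions, not part of the course material.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this sample all four values come out to 0.75 because the false positives and false negatives are balanced; on an imbalanced dataset accuracy and F1 typically diverge.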

Overfitting and Underfitting

• Overfitting:
• The model fits noise and incidental patterns in the training data too closely, so it performs poorly on unseen data.
• Characteristics: High training accuracy, low testing accuracy, complex models.
• Solutions: Regularization (L1, L2), pruning, reducing complexity.
• Underfitting:
• Model is too simple to capture data patterns.
• Characteristics: Low training and testing accuracy, simple models.
• Solutions: Increase complexity, feature engineering, reduce regularization.
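
The characteristics listed above suggest a simple rule of thumb for reading training and testing accuracy. The sketch below encodes that rule; the function name `diagnose_fit` and the thresholds `low` and `gap` are hypothetical values chosen only for illustration, and sensible cutoffs depend on the task.

```python
def diagnose_fit(train_acc, test_acc, low=0.7, gap=0.15):
    """Label a model from its train/test accuracy (heuristic only).

    `low` and `gap` are hypothetical thresholds, not standard values.
    """
    # Low accuracy on both sets: the model is too simple for the data.
    if train_acc < low and test_acc < low:
        return "underfitting"
    # High training accuracy with a large drop on test data.
    if train_acc - test_acc > gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(0.99, 0.70))  # large train/test gap
print(diagnose_fit(0.55, 0.52))  # low on both sets
print(diagnose_fit(0.90, 0.88))  # small gap, decent accuracy
```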

Hyperparameter Tuning
• Definition: Adjusting settings that govern training (e.g., learning rate).
• Methods: Grid search, random search, Bayesian optimization.
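
As an illustration of grid search, the sketch below tries several learning rates for gradient descent on a toy function, f(w) = (w - 3)^2, and keeps the one with the lowest final loss. The toy function, the candidate values, and the names `final_loss` and `grid_search` are assumptions made for this example; in practice each candidate would be scored on validation data rather than on a known function.

```python
def final_loss(learning_rate, steps=20, start=0.0, target=3.0):
    """Run gradient descent on f(w) = (w - target)^2 and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - target)  # derivative of (w - target)^2
        w -= learning_rate * gradient
    return (w - target) ** 2


def grid_search(candidates):
    """Score every candidate learning rate and return (best_rate, best_loss)."""
    scored = [(final_loss(lr), lr) for lr in candidates]
    best_loss, best_lr = min(scored)  # smallest loss wins
    return best_lr, best_loss


best_lr, best_loss = grid_search([0.001, 0.01, 0.1, 0.5, 1.1])
print(best_lr, best_loss)
```

Here the smallest rates converge too slowly within 20 steps and 1.1 diverges, which is why the learning rate is worth tuning. Random search would draw candidates at random from a range instead of using a fixed list.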
Real-World Applications
• Healthcare: Predicting outcomes, diagnosing diseases.
• Finance: Credit scoring, fraud detection.
• Marketing: Customer segmentation, recommendation systems.
• NLP: Sentiment analysis, text classification.

Regularization Techniques
• Discourage overfitting, usually by adding a penalty term to the loss:
• L2 Regularization: Adds a penalty proportional to the sum of squared weights.
• Dropout: Randomly drops neurons during training (it does not use a penalty term).
• Elastic Net: Combines L1 and L2 regularization.
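
The techniques above can be sketched in a few lines of plain Python. The L1 penalty is taken here as the sum of absolute weights. The parameter names `alpha` and `l1_ratio` and the exact way Elastic Net mixes the two penalties are assumptions for this example; libraries use slightly different scalings. The dropout function uses the common "inverted dropout" scaling, which is also an assumption rather than something stated in these notes.

```python
import random


def l1_penalty(weights):
    """Sum of absolute weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, alpha=0.1, l1_ratio=0.5):
    """Weighted mix of the L1 and L2 penalties (one possible parameterization)."""
    mix = l1_ratio * l1_penalty(weights) + (1.0 - l1_ratio) * l2_penalty(weights)
    return alpha * mix


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0, 0.5]
data_loss = 1.0  # made-up unregularized loss
total_loss = data_loss + elastic_net_penalty(weights)
print(total_loss)

rng = random.Random(0)  # seeded so the run is repeatable
print(dropout([0.2, 0.4, 0.6, 0.8], rate=0.5, rng=rng))
```

Large weights raise the penalized loss, so training is pushed toward smaller weights; dropout instead prevents the network from relying on any single neuron.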

Bias-Variance Tradeoff
• Bias: Systematic errors; leads to underfitting.
• Variance: Sensitivity to noise; leads to overfitting.
• Goal: Balance bias and variance for optimal performance.

Ensemble Learning
• Combines multiple models for better predictions:
• Bagging: Trains models on random data subsets (e.g., Random Forests).
• Boosting: Trains models sequentially, each one correcting the errors of the previous ones (e.g., AdaBoost).
• Stacking: Combines predictions from multiple models using a meta-model.
• Voting Classifiers:
• Combines classifiers for predictions via majority votes or averaging probabilities.
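
The two voting schemes can be sketched as follows. The function names `hard_vote` and `soft_vote` and the spam/ham probabilities are made up for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the most classifiers (majority vote)."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class's probability across classifiers; return the top class."""
    classes = probabilities[0].keys()
    averaged = {
        c: sum(p[c] for p in probabilities) / len(probabilities)
        for c in classes
    }
    return max(averaged, key=averaged.get)


# Three hypothetical classifiers scoring one email.
probs = [
    {"spam": 0.90, "ham": 0.10},
    {"spam": 0.40, "ham": 0.60},
    {"spam": 0.45, "ham": 0.55},
]
labels = [max(p, key=p.get) for p in probs]  # each classifier's top label

print(hard_vote(labels))  # two of three classifiers say "ham"
print(soft_vote(probs))   # the averaged probability favors "spam"
```

The example is chosen so the two schemes disagree: one very confident classifier outweighs two mildly confident ones under soft voting, but not under hard voting.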
Q5

1. Question 1: What is an indicator of overfitting in a model?
Answer: High training accuracy and low testing accuracy.

2. Question 2: Scikit-learn provides preprocessing tools for data.
Answer: True.

3. Question 3: Underfitting occurs when the model is too complex.
Answer: False.

4. Question 4: What does regularization aim to address in machine learning?
Answer: Balancing training error and generalization.

5. Question 5: When does underfitting typically occur?
Answer: When the model is too simple to capture data patterns.

6. Question 6: What is a key concept in hyperparameter tuning?
Answer: Finding the optimal settings for model performance.

7. Question 7: Which of the following is a common application of supervised learning in healthcare?
Answer: Predicting patient outcomes.

8. Question 8: What technique can prevent overfitting by reducing the reliance on specific neurons in neural networks?
Answer: Dropout.
