AI Final
Lecture 2
Exhaustive Search:
Systematically explores all possible solutions.
Guarantees finding the global optimum but is computationally
expensive.
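Gradient Ascent / Descent:
Repeatedly steps in the direction that most improves the objective (uphill for ascent, downhill for descent).
Fast, but can get trapped in a local optimum.
Simulated Annealing:
Mimics the annealing process in metallurgy.
Uses a probabilistic technique (occasionally accepting worse moves) to escape local optima, improving the chance of finding the global optimum.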
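As a sketch (not the lecture's code), the Python below compares hill climbing, the discrete analogue of gradient ascent, with simulated annealing on a hypothetical 11-point landscape (`VALUES`) that has a local peak at x = 2 and the global peak at x = 8. The starting temperature, geometric cooling rate and step count are illustrative assumptions.

```python
import math
import random

# Hypothetical landscape over x = 0..10: local peak at x=2, global peak at x=8.
VALUES = [0, 2, 4, 2, 0, 1, 4, 7, 10, 7, 4]

def objective(x):
    return VALUES[x]

def neighbors(x):
    # Step one position left or right, staying inside the range.
    return [n for n in (x - 1, x + 1) if 0 <= n < len(VALUES)]

def hill_climb(start):
    current = start
    while True:
        best_next = max(neighbors(current), key=objective)
        if objective(best_next) <= objective(current):
            return current  # no better neighbor: a (possibly only local) peak
        current = best_next

def simulated_annealing(start, t0=10.0, cooling=0.95, steps=500, seed=0):
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        delta = objective(candidate) - objective(current)
        # Always accept improvements; accept worse moves with probability e^(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = candidate
        if objective(current) > objective(best):
            best = current  # remember the best point seen so far
        t = max(t * cooling, 1e-3)  # cool down, never reaching exactly zero
    return best
```

From x = 2, hill climbing stops immediately because both neighbors are worse, while annealing can accept those downhill moves while the temperature is still high; where it ends up depends on the seed and the cooling schedule.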
Answer: A peak that is higher than neighboring points but not the highest overall
Answer: To avoid getting trapped in local optima by allowing occasional worse moves
Answer: False
Logical Operators
Breadth-First Search (BFS)
A search strategy that explores all nodes at the present depth level before moving to the next level.
How It Works:
-Explores all subgoals at the current level simultaneously.
-Moves level by level, expanding all possible nodes at each depth.
-Ensures all possible paths are considered in parallel.
Advantages:
-Guaranteed to find the shortest path (fewest steps) to a solution, if one exists.
-Does not get stuck in infinite loops.
Disadvantages:
-Can be memory-intensive as it stores all nodes at the current level.
-Slower in deep search spaces compared to DFS.
Depth-First Search (DFS)
A search strategy that explores as far down a branch as possible before backtracking.
How It Works:
-Focuses on one subgoal at a time.
-Attempts to find a complete proof for the first subgoal before moving to the next.
-If a subgoal fails, backtracks to the previous choice point to try alternative paths.
Advantages:
-Efficient in terms of memory usage.
-Quickly finds a solution in deep but narrow search spaces.
Disadvantages:
-May get stuck in deep or infinite paths (i.e., may not find a solution if it exists on a different path).
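The contrast between the two strategies shows up on a small hypothetical graph (node names are illustrative) where the goal is reachable by a short path through `b` and a longer one through `a`. This sketch uses a FIFO queue of paths for BFS and recursion into the first child for DFS, which mirrors how Prolog tries clauses in order.

```python
from collections import deque

# Hypothetical search graph: each node maps to its children.
GRAPH = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["goal"],
    "c": ["d"],
    "d": ["goal"],
    "goal": [],
}

def bfs(graph, start, goal):
    # The queue holds whole paths; FIFO order expands one depth level at a time.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph[node]:
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None

def dfs(graph, start, goal, path=None):
    # Follow the first child as deep as possible, backtracking on failure.
    path = (path or []) + [start]
    if start == goal:
        return path
    for child in graph[start]:
        if child not in path:  # avoid cycles along the current branch
            result = dfs(graph, child, goal, path)
            if result is not None:
                return result
    return None
```

Here BFS returns the two-step path `start → b → goal`, while DFS commits to the first branch and returns the longer `start → a → c → d → goal`.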
QUESTION 1
What is the primary purpose of logic programming?
QUESTION 2
In predicate calculus, what is a "predicate"?
QUESTION 3
Which logic programming language is most commonly associated with artificial intelligence and computational linguistics?
Answer: Prolog.
QUESTION 4
In Prolog, which search strategy explores as far down a branch as possible before backtracking?
Answer: Depth-first search.
QUESTION 5
Which is a common application of logic programming?
QUESTION 6
In logic programming, "resolution" is used to prove the truth of propositions.
Answer: True.
QUESTION 7
Prolog uses depth-first search as its default strategy for solving goals.
Answer: True.
Lecture 4
1. Forward Chaining:
• Starts with known facts and applies rules to generate new facts.
• Continues until the goal is reached or no new facts can be derived.
• Commonly used in real-time systems.
Example:
For a weather-based system:
• Input: Temperature = 5°C, Condition = Rainy
• Output: “Wear a coat and take an umbrella.”
2. Backward Chaining:
• Starts with a goal and works backward to find facts supporting the goal.
• Often used in diagnostic systems (e.g., determining a disease based on symptoms).
Example:
Diagnosing a disease: the system starts from a hypothesized disease (the goal) and works backward to check whether the observed symptoms support it.
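Both inference methods can be sketched over the same rule base. The fact names below (`temperature_cold`, assuming 5°C counts as cold, and `raining`) and the rules themselves are hypothetical encodings of the weather example, written as (set of premises, conclusion) pairs.

```python
# Hypothetical rules for the weather example: (set of premises, conclusion).
RULES = [
    ({"temperature_cold", "raining"}, "wear_coat"),
    ({"raining"}, "take_umbrella"),
    ({"wear_coat", "take_umbrella"}, "advice_coat_and_umbrella"),
]

def forward_chain(facts, rules):
    # Data-driven: keep firing rules whose premises hold until nothing new is derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts, rules, seen=frozenset()):
    # Goal-driven: a goal holds if it is a known fact, or if some rule concludes it
    # and every premise of that rule can itself be proven.
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
    )
```

Forward chaining derives every reachable conclusion from the inputs; backward chaining only explores the rules needed to prove the one goal it was asked about.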
1. Define the Problem Domain: Understand the area where the system will be applied.
2. Gather Knowledge: Collect rules from domain experts.
3. Design the Knowledge Base: Organize the rules logically.
4. Implement the Inference Engine: Choose either forward or backward chaining.
5. Test and Validate: Ensure that the system works correctly with real-world data.
Fuzzy Logic is an extension of classical logic, where truth values range between 0 and 1 instead of being strictly
true (1) or false (0). It is used to handle uncertainty and imprecision, mimicking how humans reason.
• Fuzzy Sets: In contrast to classical (crisp) sets, where an element either belongs or doesn’t, fuzzy sets
allow partial membership (values between 0 and 1).
Example:
• “Temperature is hot” could have a membership degree of 0.7 (somewhat hot) or 0.9 (very hot),
depending on how close it is to the boundary.
• Membership Functions: These define how each element maps to a degree of membership in a fuzzy set.
• Crisp Sets: A crisp set is binary in nature. An element either belongs to the set (1) or does not (0).
Examples include even and odd numbers or a class of students (boys or girls).
• Fuzzy Sets: Membership in fuzzy sets is gradual, and elements can partially belong to a set. For
example, a temperature of 25°C might be “somewhat hot,” while 30°C might be “very hot.”
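A minimal sketch of the crisp/fuzzy contrast, assuming a hypothetical 28°C cutoff for the crisp set and a linear "ramp" (shoulder) membership function for the fuzzy set that rises from 0 at 20°C to 1 at 35°C; real systems choose these breakpoints per application.

```python
def crisp_hot(temp_c, threshold=28.0):
    # Crisp set: membership is exactly 0 or 1 (hypothetical 28 °C cutoff).
    return 1.0 if temp_c >= threshold else 0.0

def fuzzy_hot(temp_c, low=20.0, high=35.0):
    # Fuzzy set with a linear ramp: 0 below `low`, 1 above `high`,
    # and partial membership in between (breakpoints are hypothetical).
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    return (temp_c - low) / (high - low)
```

With these breakpoints, 25°C is "somewhat hot" (about 0.33) and 30°C is "hotter" (about 0.67), whereas the crisp set jumps from 0 to 1 at a single temperature.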
Fuzzy Rule-Based Systems
A Fuzzy Rule-Based System uses fuzzy logic instead of Boolean logic to make decisions.
Components:
• Fuzzification Module: Converts crisp input values into fuzzy values (e.g., a temperature reading might be converted into fuzzy
terms like “warm”).
• Inference Engine: Applies fuzzy rules (e.g., “If temperature is warm, set heating to medium”).
• Defuzzification Module: Converts fuzzy outputs back into crisp values for decision-making.
Example:
A heating system might have rules such as:
• If temperature is “cold,” then set heating to “high.”
• If temperature is “warm,” then set heating to “medium.”
• If temperature is “hot,” then set heating to “low.”
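The three heating rules can be run end to end in a small sketch. The triangular membership breakpoints and the crisp heating levels (90, 50 and 10 percent for "high", "medium" and "low") are hypothetical, and defuzzification uses a weighted average of those single output values, a simplified (Sugeno-style) scheme rather than, for example, centroid defuzzification.

```python
def triangle(x, a, b, c):
    # Triangular membership: 0 at a and c, peak of 1 at b.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp_c):
    # Fuzzification module: crisp temperature -> degree in each fuzzy term.
    return {
        "cold": triangle(temp_c, -10, 5, 18),
        "warm": triangle(temp_c, 10, 20, 30),
        "hot": triangle(temp_c, 22, 35, 50),
    }

# Rules from the example: cold -> high, warm -> medium, hot -> low.
# Each output term is represented by one hypothetical crisp heating level (percent).
RULE_OUTPUT = {"cold": 90.0, "warm": 50.0, "hot": 10.0}

def heating_level(temp_c):
    degrees = fuzzify(temp_c)
    total = sum(degrees.values())
    if total == 0:
        raise ValueError("temperature outside all fuzzy sets")
    # Inference: each rule fires with the strength of its input term.
    # Defuzzification: weighted average of the rule outputs.
    return sum(degrees[t] * RULE_OUTPUT[t] for t in degrees) / total
```

At 14°C the input is partly "cold" and partly "warm", so both rules fire and the heating level lands between "medium" and "high" instead of switching abruptly.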
• Probability: Deals with the likelihood of an event happening (e.g., “There’s a 50% chance of rain tomorrow”).
• Fuzziness: Deals with the degree of truth (e.g., “It’s somewhat warm today”). Fuzziness acknowledges that the boundary between
“cold” and “warm” isn’t sharp.
Fuzzy sets can be represented in various ways, depending on the membership function. The closer an element is to the ideal value, the higher
its membership value. As the element moves away, the membership decreases.
Example:
• For a heating system, a temperature of 15°C might be “warm” with a membership value of 0.7, while a temperature of 20°C might
be “warm” with a value of 0.9.
Advantages:
• Better at handling uncertain and imprecise data.
• Models human-like reasoning, making it suitable for applications like control systems.
• Flexible and adaptable.
Challenges:
• Defining accurate membership functions and rules can be difficult.
• Requires tuning to perform optimally.
• Not always as interpretable as rule-based systems.
Intro. To AI systems QUIZ 2
QUESTION 1:
Fuzzy rule-based systems use:
* Fuzzy logic for reasoning
QUESTION 2:
Fuzzy sets allow for partial membership of elements.
* True
QUESTION 3:
Which inference method starts with a goal and works backward to find supporting facts?
* Backward Chaining
QUESTION 4:
Fuzzy logic can handle situations with uncertainty and imprecision.
* True
QUESTION 5:
Fuzzy logic differs from classical logic by using:
* Degrees of membership
QUESTION 6:
A crisp set is characterized by:
* Clear boundaries and distinct elements
QUESTION 7:
In fuzzy logic, a membership function defines:
* The degree of membership for elements in a fuzzy set
QUESTION 8:
Which of the following is a key advantage of fuzzy logic?
* Handles uncertainty and imprecision
QUESTION 9:
What is the primary component of a rule-based system that stores temporary information during processing?
* Working Memory
Assignment 2
Question 1
Rule-based systems are always more accurate than fuzzy logic systems.
Answer: False
Question 2
In a fuzzy set, membership values:
Answer: Can range from 0 to 1
Question 3
Fuzzy logic differs from classical logic in that:
Answer: Fuzzy logic allows for partial truth values between 0 and 1
Question 4
What is the primary advantage of using fuzzy logic over classical logic in certain applications?
Answer: Fuzzy logic can handle uncertainty and imprecision
Question 5
Backward chaining is commonly used in expert systems.
Answer: True
Question 6
Fuzzy logic can be used to model human reasoning and decision-making processes.
Answer: True
Question 7
The process of converting crisp inputs into fuzzy values is called:
Answer: Fuzzification
Question 8
Which of the following is not a component of a rule-based system?
Answer: Defuzzification module
Question 9
Which of the following is a common application of fuzzy logic?
Answer: Air conditioning control systems
Lecture 5
Figure: Insert mutation and scramble mutation examples (permutation-encoded chromosomes).
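The insert and scramble mutation operators named in this lecture can be sketched for permutation-encoded chromosomes. Positions are passed in explicitly here so the behavior is easy to check; in a genetic algorithm they would normally be chosen at random, and the helper names are hypothetical.

```python
import random

def insert_mutation(chrom, i, j):
    # Move the gene at position j so that it sits right after position i (i < j);
    # the genes in between shift one place to the right.
    genes = list(chrom)
    gene = genes.pop(j)
    genes.insert(i + 1, gene)
    return genes

def scramble_mutation(chrom, i, j, rng=random):
    # Randomly shuffle the genes in the segment chrom[i..j] (inclusive).
    genes = list(chrom)
    segment = genes[i:j + 1]
    rng.shuffle(segment)
    genes[i:j + 1] = segment
    return genes
```

For example, `insert_mutation([1, 2, 3, 4, 5, 6, 7], 1, 4)` moves gene 5 next to gene 2, giving `[1, 2, 5, 3, 4, 6, 7]`. Both operators only rearrange genes, so the child is still a valid permutation.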
Lecture 6
• Definition: Machine Learning involves algorithms that improve automatically through experience, enabling systems to:
• Generalize: Provide sensible outputs for unseen inputs.
• Extract and apply relevant information from data to analyze new data.
• Supervised Learning:
• Learning from labeled data to predict labels for new data.
• Example: Classifying animals based on given labels like “dog,” “cat,” etc.
• Unsupervised Learning:
• No labeled data; focuses on identifying patterns and grouping data.
• Example: Clustering users based on reading habits to recommend articles.
• Reinforcement Learning:
• Learning through rewards and penalties.
• Example: A pigeon pecking the right button for a reward.
Steps involved:
1. Data Collection and Preparation: Gathering and cleaning data.
2. Feature Selection and Extraction: Identifying key variables (features) from the data.
3. Algorithm Choice: Selecting an ML algorithm suitable for the task.
4. Model Selection: Choosing the best model structure.
5. Parameter Selection: Tuning model parameters for optimal performance.
6. Training: Teaching the model using training data.
7. Evaluation: Assessing the model using testing data.
• Classifier:
• Maps objects to predefined labels (e.g., MNIST dataset for digit classification).
• Features:
• Attributes representing data.
• Can be categorical (e.g., color) or numerical (e.g., weight).
• Feature Engineering: Transforming raw data into suitable inputs for algorithms (e.g., scaling, encoding categorical data).
Supervised Learning
• Key Concepts:
• Training and Test Sets: Used to train and evaluate the model, respectively.
• Confusion Matrix: Tool to measure performance (e.g., true positives, false positives).
• Accuracy: Proportion of correct predictions out of total predictions.
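As a small illustration with hypothetical labels (1 = positive, 0 = negative), the four cells of a binary confusion matrix and the accuracy can be counted directly:

```python
def confusion_counts(actual, predicted, positive=1):
    # Count the four cells of a binary confusion matrix.
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # true positive
        elif p == positive:
            fp += 1  # false positive
        elif a == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return tp, fp, fn, tn

def accuracy(actual, predicted):
    # Proportion of correct predictions out of total predictions.
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)
```

With actual labels `[1, 1, 0, 0, 1, 0]` and predictions `[1, 0, 0, 1, 1, 0]`, the counts are TP = 2, FP = 1, FN = 1, TN = 2, and accuracy is 4/6.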
k-Nearest Neighbors (kNN)
• How it Works:
• Measures similarity (distance) between data points in feature space.
• Predicts based on the majority label of the nearest neighbors.
• Key Considerations:
• The value of k (number of neighbors) impacts accuracy and generalization.
• Scaling features ensures distances are meaningful.
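A minimal kNN sketch, assuming Euclidean distance and a simple majority vote; the two-cluster training data and "cat"/"dog" labels are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points in feature space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by distance to the query and vote among the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

There is no training step: all the work happens at prediction time, which is why the choice of k and the scale of each feature directly change the answer.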
Feature Scaling
• Importance:
• Features with different scales can distort distance measurements.
• Normalization techniques (e.g., Min-Max Scaler, which rescales each feature to [0, 1], and Standard Scaler, which rescales it to zero mean and unit variance) put features on comparable scales.
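The arithmetic behind the two scalers can be sketched in pure Python; this is not scikit-learn's `MinMaxScaler`/`StandardScaler` API, just the formulas, and it assumes the population standard deviation for standard scaling.

```python
import statistics

def min_max_scale(values):
    # Rescale to [0, 1]: (x - min) / (max - min).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    # Rescale to mean 0 and (population) standard deviation 1: (x - mean) / std.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]
```

Applied to hypothetical weights of 50, 60, 70 and 80 kg, min-max scaling gives 0, 1/3, 2/3 and 1, so the feature no longer dominates distances just because its raw numbers are large.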
Practical Applications
• Examples include email spam detection, language models predicting the next word, and clustering users for recommendations.
Homework 3
Question 1
What type of data does the k-Nearest Neighbors (kNN) algorithm require?
Answer: Numerical features
Question 2
Which of the following is an example of supervised learning?
Answer: Predicting email spam
Question 3
Reinforcement learning is based on the psychological concept of Operant Conditioning.
Answer: True
Question 4
Which of the following is a type of machine learning?
Answer: All of the above
Question 5
Which machine learning method is used for the MNIST dataset?
Answer: Supervised learning
Question 6
What does supervised learning involve?
Answer: Learning from exemplars
Question 7
The confusion matrix is used to evaluate the performance of supervised learning classification models.
Answer: True
Question 8
What is the purpose of scaling in machine learning?
Answer: To ensure input features are within a similar range
Quiz 3
1. Which of the following is an example of supervised learning?
Answer: Linear regression
2. What does the Mean Squared Error (MSE) measure?
Answer: The average squared difference between actual values and predictions
3. What is the function of the bias term in a perceptron?
Answer: It shifts the decision boundary
4. Gradient descent can be used to minimize Mean Squared Error (MSE).
Answer: True
5. The perceptron is an example of a linear classifier.
Answer: True
6. Which of the following inspires AI and ML?
Answer: The human brain
7. What is the purpose of gradient descent?
Answer: To minimize the Mean Squared Error
Lecture 7: Perceptron and Linear Regression
1. Perceptron:
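The perceptron is a simple model for binary classification, consisting of:
• Inputs, each multiplied by a weight.
• An adder that sums the weighted inputs (plus a bias term).
• A step (activation) function that converts the sum into the class label.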
2. Training a Perceptron:
Forward pass: Compute the weighted sum of inputs, apply the activation function to produce an output.
Weight updates: Adjust based on the difference between predicted and actual labels to minimize error.
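Linear separability: The perceptron can only classify all training points correctly if the classes can be separated by a straight line (a hyperplane in higher dimensions); on such data, training is guaranteed to converge.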
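The forward pass and weight update above can be sketched on the logical AND function, a hypothetical, linearly separable training set. The learning rate of 1.0, zero initial weights and a step function that outputs 1 when the sum is at least 0 are illustrative choices.

```python
def step(z):
    # Step activation: output 1 if the weighted sum reaches the threshold 0.
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    # Forward pass: weighted sum (the "adder") followed by the step function.
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(data, epochs=20, lr=1.0):
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            # Perceptron rule: nudge the weights toward the correct side of the boundary.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias
```

On AND the weights settle after a few epochs. On XOR, which is not linearly separable, the same loop never finds weights that classify all four points.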
Figure: Perceptron diagram (inputs and weights → adder → step function).
Inductive bias refers to the set of assumptions a model uses to generalize beyond the training data. It
affects how a model makes predictions on unseen data.
Examples:
• Linear Models:
• Assume relationships between variables are linear, meaning the change in the output is proportional to the
change in the input.
• Decision Trees:
• Assume the data can be split hierarchically into subsets, which allows the model to capture complex
interactions between features through a series of binary decisions.
Linear Regression
• Definition: Linear regression is a method for modeling the relationship between a dependent variable (target) and one or
more independent variables (features) using a linear equation.
• MSE stands for Mean Squared Error, and it measures the average squared difference between the actual data points and the predictions made by the line.
• When we create a line (or a model) to predict values from data, we want to know how well this line fits the actual data points.
• We also want to compare different lines (or models) to see which one does a better job at predicting the data.
• MSE helps us quantify the "goodness" of the fit by telling us how far the predictions are from the actual values.
• Minimization Objective:
In regression models, the goal is to find parameters (weights and bias) that minimize the MSE. MSE indicates how well the model predictions match the actual values. Smaller MSE values mean better performance.
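A minimal sketch of MSE and of fitting a line y = w·x + b by gradient descent on it. The data points (which lie exactly on y = 2x + 1), learning rate and number of epochs are hypothetical choices for illustration.

```python
def mse(actual, predicted):
    # Mean of the squared differences between actual and predicted values.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def fit_line_gd(xs, ys, lr=0.01, epochs=5000):
    # Gradient descent on MSE for the line y = w * x + b.
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = 2 / n * sum(errors)                             # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), the fitted line approaches w = 2 and b = 1, and the MSE of its predictions approaches 0.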
Disadvantages of MSE
1. Outlier Sensitivity:
Squaring the errors gives large errors (such as outliers) a disproportionate influence on the MSE.
2. Units of Measurement:
The units of MSE are the square of the target variable’s units, which can
make interpretation less intuitive.
Applications of MSE
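1. Linear Regression:
The loss that is minimized when fitting the regression line.
2. Model Comparison:
Models evaluated on the same data can be compared directly; a lower MSE indicates a closer fit.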
Linear regression is a method to predict a continuous outcome based on the relationship between dependent and independent
variables.
Key Features:
• Goal: Minimize the difference between predicted and actual values (error).
• Method: Calculate the best-fitting line by minimizing the Mean Squared Error (MSE).
Logistic Regression
1. Logistic Function:
• Maps any input to a range between 0 and 1, representing probabilities.
• Essential for converting linear regression outputs to probabilities for classification.
2. Forward Pass:
• Compute the linear combination of weights and inputs.
• Apply the logistic function to produce a probability for the positive class.
Figure: Sigmoid curve.
Optimization: Gradient descent.
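The logistic function, forward pass and gradient-descent training can be combined into one sketch of 1-D logistic regression. The training data, learning rate and epoch count are hypothetical; the gradient used is that of the average cross-entropy (log-loss), which simplifies to the mean of (p − y)·x.

```python
import math

def sigmoid(z):
    # Logistic function, written in a numerically stable form.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def train_logistic(xs, ys, lr=0.5, epochs=1000):
    # p = sigmoid(w * x + b), fitted by gradient descent on average cross-entropy.
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in xs]  # forward pass
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

After training on negatives at x = -3, -2, -1 and positives at x = 1, 2, 3, the model assigns probability above 0.5 to the positive class only on the positive side.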
Multi-Class Classification
• Assigns a label (class) from a finite set of labels to an observation.
Examples:
• Binary: Yes/No, 1/0.
• Multi-class: To each observation, choose one label from a set of classes.
Approaches:
1. One-vs-Rest Classifier (One-vs-All):
• Builds one classifier per class.
• Compares scores of all classifiers and chooses the class with the highest score.
Advantages
• Simple to implement with binary classifiers.
• Works well for most classification problems.
Disadvantages
• Assumes independence between classifiers.
• Can struggle with ambiguous cases where multiple classifiers give high probabilities.
Multinomial Logistic Regression
• Extension: Extends logistic regression to handle multiple classes.
• Model: Calculates the probability of each class using the softmax function.
• Optimizes the log-loss (cross-entropy) for multi-class predictions.
• Applications: Handwriting recognition, sentiment analysis, image classification.
Homework 4:
Question 1
Which of the following is used in logistic regression to measure the difference between predicted and actual outcomes?
Answer: Cross-entropy loss
Question 2
Which of the following is an assumption of linear regression?
Answer: The relationship between variables is linear
Question 3
Which function does logistic regression use to map predicted values to
probabilities?
Answer: Sigmoid function
Question 4
Multi-class classification involves predicting more than two possible categories.
Answer: True
Question 5
What type of outcome does logistic regression typically predict?
Answer: Binary
Question 6
What is a key advantage of stochastic gradient descent (SGD)?
Answer: It only updates based on one item, making it faster
Question 7
What does logistic regression predict?
Answer: A categorical outcome
Question 8
Logistic regression assumes a linear relationship between the logit of the
outcome and predictor variables.
Answer: True
Question 9
One vs. rest is a method used for binary classification only.
Answer: False (one-vs-rest combines binary classifiers to solve multi-class problems).
Lecture 9
• Hidden Layers:
• Nodes process inputs through weighted sums and activation functions.
• Activation functions introduce non-linearity, enabling the network to model complex relationships.
2. Activation Functions
• Purpose: Introduce non-linearity to handle complex patterns.
• Common functions:
• Sigmoid: Outputs values between 0 and 1.
• ReLU: Outputs the input directly if positive, else zero.
• Softmax: Converts raw scores into probabilities (used in multi-class classification).
• Tanh: Outputs values between -1 and 1.
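The four activation functions listed above can be written in a few lines each; this is a plain-Python sketch, with softmax shifted by the maximum score for numerical stability (which does not change its result).

```python
import math

def sigmoid(z):
    # Squashes any input into (0, 1).
    return 1 / (1 + math.exp(-z))

def relu(z):
    # Passes positive inputs through unchanged; negative inputs become 0.
    return max(0.0, z)

def tanh(z):
    # Squashes any input into (-1, 1).
    return math.tanh(z)

def softmax(scores):
    # Turns raw scores into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax preserves the ordering of the raw scores, so the class with the highest score also gets the highest probability.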
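Output Layer by Task Type:
1. Regression:
• Activation function: Linear (identity), producing continuous values.
2. Binary Classification:
• Activation function: Sigmoid (logistic), giving the probability of the positive class.
• Example: Classifying whether an email is spam or not.
3. Multi-Class Classification:
• Activation function: Softmax, giving a probability for each class.
4. Multi-Label Classification:
• Activation function: Sigmoid on each output node, so each label is predicted independently.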
Overview: Backpropagation is a key algorithm for training neural networks by adjusting weights based on the error of
predictions.
1. Forward Pass: Input data flows through the network to generate an output, which is compared to the true target to
compute a loss.
2. Loss Calculation: The loss quantifies the difference between predicted and actual values.
3. Backward Pass:
• Compute the gradient of the loss with respect to the network's output.
• Use the chain rule to propagate this gradient backward through the network.
4. Weight Update: Adjust weights and biases using an optimization algorithm (e.g., stochastic gradient descent) by
subtracting a fraction of the gradients.
5. Iteration: Repeat the process over many epochs to improve the model's performance.
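The forward pass, loss, backward pass and weight update can be traced on a deliberately tiny, hypothetical network: one input, one sigmoid hidden unit, one linear output and a squared-error loss. Each gradient below is one application of the chain rule.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden layer
    y_hat = w2 * h + b2       # linear output layer
    return h, y_hat

def loss(params, x, y):
    # Squared error between prediction and target.
    return (forward(params, x)[1] - y) ** 2

def backward(params, x, y):
    # Chain rule, from the loss back to each parameter.
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_yhat = 2 * (y_hat - y)   # dL/dy_hat
    d_w2 = d_yhat * h
    d_b2 = d_yhat
    d_h = d_yhat * w2          # propagate into the hidden unit
    d_z = d_h * h * (1 - h)    # sigmoid'(z) = h * (1 - h)
    return [d_z * x, d_z, d_w2, d_b2]  # gradients for w1, b1, w2, b2

def sgd_step(params, x, y, lr=0.1):
    # Weight update: subtract a fraction (lr) of each gradient.
    grads = backward(params, x, y)
    return [p - lr * g for p, g in zip(params, grads)]
```

Comparing these analytic gradients with finite-difference estimates of `loss` is a common way to check a backpropagation implementation.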
7. Practical Considerations
•Scaling: Normalize input data to ensure efficient training.
•Weight Initialization: Random initialization helps avoid symmetry issues.
•Avoiding Overfitting:
•Regularization techniques like dropout.
•Early stopping by monitoring validation set loss.
8. Evaluation Metrics
•Accuracy: Fraction of correct predictions.
•Precision: Of all predicted positives, the fraction that are actually positive.
•Recall: Of all actual positives, the fraction the model correctly identifies.
•F1 Score: Harmonic mean of precision and recall.
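Precision, recall and F1 follow directly from the confusion-matrix counts. In this sketch, the convention of returning 0.0 when a denominator is zero is an assumption (libraries handle that case in different ways).

```python
def precision_recall_f1(actual, predicted, positive=1):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A model that flags only one of four actual positives, and is right about it, has precision 1.0 but recall 0.25; F1 (0.4 here) penalizes that imbalance.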
Question 1:
What is backpropagation in the context of neural networks?
Answer: A method to update weights to minimize the error
Question 2:
Which of the following best describes optimization in machine learning?
Answer: Finding the best possible solution from all possible options
Question 3:
What is the primary goal of search algorithms?
Answer: To navigate through a set of possibilities to find an optimal solution
Question 4:
Logistic regression is mainly used for what type of tasks?
Answer: Classification tasks
Question 5:
Evolutionary algorithms are inspired by what?
Answer: Natural selection and biological evolution
Question 6:
Logical programming mainly deals with which of the following?
Answer: Representing and reasoning with logical statements
Question 8:
Which of the following is true about fuzzy logic?
Answer: It handles reasoning that is approximate rather than fixed and exact
Question 9:
Which of these best describes unsupervised learning?
Answer: Finding patterns and groupings in data without labeled outputs
Question 10:
Which of these is a key feature of the perceptron model?
Answer: It can classify linearly separable data
HM
1. Question 1: What activation function is typically used for the output layer in binary classification?
• Answer: Sigmoid
2. Question 2: What is the role of the output layer in a neural network?
• Answer: To map learned representations to task-specific outputs
3. Question 3: What is the purpose of the loss calculation in neural networks?
• Answer: To minimize the difference between predictions and actual values
4. Question 4: What does the backpropagation algorithm primarily update in a neural network?
• Answer: Weights
5. Question 5: In feed-forward neural networks, data flows in which direction?
• Answer: Forward only
7. Question 7: Batch processing in neural networks processes data samples one by one.
• Answer: False
8. Question 8: Backpropagation requires calculating the gradient of the loss with respect to network weights.
• Answer: True
9. Question 9: Multi-label classification uses sigmoid activation in the output layer.
• Answer: True
(Excluding Cross-Validation)
Supervised Learning Summary
• Definition: A type of machine learning where the model is trained on labeled data.
• Goal: Make accurate predictions on unseen data based on training data.
Key Terms
• Training Data: Input-output pairs used for training.
• Labels: Known outputs corresponding to the inputs.
• Features: Characteristics used as input for the model.
• Model: Mathematical representation trained to make predictions.
Common Algorithms
• Linear Regression: Predicts continuous values assuming a linear relationship.
• Logistic Regression: Estimates probabilities for binary classification.
• Decision Trees: Splits data based on feature values to make decisions.
• Support Vector Machines (SVM): Finds a hyperplane to separate classes.
• Neural Networks: Composed of interconnected nodes for learning complex patterns.
Hyperparameter Tuning
• Definition: Adjusting settings that govern training (e.g., learning rate).
• Methods: Grid search, random search, Bayesian optimization.
Real-World Applications
• Healthcare: Predicting outcomes, diagnosing diseases.
• Finance: Credit scoring, fraud detection.
• Marketing: Customer segmentation, recommendation systems.
• NLP: Sentiment analysis, text classification.
Regularization Techniques
• Adds a penalty term to discourage overfitting:
• L1 Regularization: Adds the sum of absolute values of weights.
• L2 Regularization: Adds the sum of squares of weights.
• Dropout: Randomly drops neurons during training.
• Elastic Net: Combines L1 and L2 regularization.
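The weight-penalty techniques above amount to adding a term to the data loss. This is a minimal sketch with hypothetical strength parameters `l1` and `l2`; libraries differ in conventions (some scale the L2 term by 1/2, or parameterize elastic net with a single mixing ratio).

```python
def l1_penalty(weights):
    # Sum of absolute values of the weights.
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    # Sum of squares of the weights.
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    # Data loss plus weighted penalties; using both l1 > 0 and l2 > 0
    # gives an elastic-net-style combination.
    return data_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)
```

Because the penalty grows with the size of the weights, minimizing the total loss trades a slightly worse fit on the training data for smaller, less overfit weights.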
Bias-Variance Tradeoff
• Bias: Systematic errors; leads to underfitting.
• Variance: Sensitivity to noise; leads to overfitting.
• Goal: Balance bias and variance for optimal performance.
Ensemble Learning
• Combines multiple models for better predictions:
• Bagging: Trains models on random data subsets (e.g., Random Forests).
• Boosting: Trains models sequentially, each correcting the errors of the previous ones (e.g., AdaBoost).
• Stacking: Combines predictions from multiple models using a meta-model.
• Voting Classifiers:
• Combines classifiers for predictions via majority votes or averaging probabilities.
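The two voting schemes can be contrasted with three hypothetical spam classifiers. Hard voting counts each model's predicted label; soft voting averages the predicted probabilities per class.

```python
from collections import Counter

def hard_vote(predictions):
    # Majority vote over the labels predicted by each model.
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    # Average each class's probability across models and pick the highest.
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get)
```

If the models output spam probabilities of 0.9, 0.4 and 0.45, hard voting says "ham" (two of three models lean that way), but soft voting says "spam" because the first model is far more confident than the other two.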