AI Notes
Notes
This module serves as an introduction to Artificial Intelligence (AI), covering foundational concepts,
historical developments, and key applications. It also explores the structure and functioning of
intelligent agents, and provides a detailed overview of computer vision and natural language
processing, two important subfields of AI.
• Artificial Intelligence (AI) refers to the simulation of human intelligence in machines. These
machines are designed to think, learn, and solve problems in ways that mimic human
behavior. AI aims to make computers capable of performing tasks that require human-like
intelligence, such as reasoning, learning, perception, and language understanding.
o AI systems are capable of learning from data, improving performance over time, and
making decisions based on inputs.
• Early Concepts:
o The idea of automating human thought dates back to early philosophy and
mythology, with stories of artificial beings such as the "golem" and the "automaton."
o Alan Turing (1950): Proposed the Turing Test, a test to determine whether a
machine can exhibit intelligent behavior indistinguishable from that of a human.
o Expert Systems: AI systems that could make decisions based on a set of rules and a
knowledge base. These were used in fields like medicine and finance.
o AI Winter: Funding for AI research declined sharply in the mid-to-late 1970s and again in the
late 1980s, partly due to overly ambitious expectations and technical limitations (e.g., lack of
computing power and data).
o Focus shifted to machine learning (ML), where algorithms learned from data rather
than following fixed rules. The availability of larger datasets and advances in
computing power facilitated this shift.
o Support Vector Machines (SVMs), decision trees, and neural networks became
popular tools for ML tasks like classification and regression.
o Deep Learning: A subset of machine learning that uses deep neural networks (DNNs)
to model complex patterns in large datasets. Deep learning has led to breakthroughs
in areas like computer vision, natural language processing, and speech recognition.
AI has vast applications across a wide range of industries. Here are some of the most prominent
fields where AI is being utilized:
• Healthcare:
o AI helps in diagnostic systems, such as image analysis for detecting diseases like
cancer and interpreting medical scans (e.g., X-rays, MRIs).
o Predictive analytics for patient outcomes, personalized treatment plans, and drug
discovery.
• Finance:
• Transportation:
• Retail:
• Entertainment:
o AI-generated content is also being used for creating realistic visual effects and
animation in films and video games.
• Manufacturing:
4. Intelligent Agents
An Intelligent Agent is any system that perceives its environment, makes decisions based on its
goals, and takes actions to achieve those goals.
2. Actuators: Perform actions to change the environment (e.g., robotic arm, speaker).
3. Perception: The agent processes and interprets data from sensors to understand the
environment.
4. Reasoning: The agent makes decisions based on the interpreted data to achieve a
specific goal.
5. Learning: The agent improves its performance over time through experience (e.g.,
reinforcement learning).
o Simple Reflex Agents: These agents act based on the current state, responding to
specific stimuli (e.g., thermostat, basic robotic vacuum).
o Goal-based Agents: These agents take actions to achieve specific goals, such as a
chess-playing agent trying to win the game.
1. Perception: The agent perceives the environment via sensors (e.g., visual, auditory).
2. Decision-Making: The agent processes the sensory data and decides on an appropriate
action.
3. Action: The agent performs the chosen action using actuators (e.g., moving, speaking).
The agent continually cycles through this process of perceiving, reasoning, and acting, with the goal
of maximizing its chances of achieving its goals.
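The perceive, decide, act cycle described above can be sketched with a thermostat-style simple reflex agent. This is only an illustrative sketch: the `Thermostat` class, the `run` helper, and the one-degree-per-step toy environment are assumptions made for the example, not part of the notes.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Simple reflex agent: maps the current percept directly to an action."""
    target: float            # desired temperature (hypothetical setting)
    tolerance: float = 0.5

    def decide(self, temperature: float) -> str:
        # Decision-making: condition-action rules on the current percept only.
        if temperature < self.target - self.tolerance:
            return "heat"
        if temperature > self.target + self.tolerance:
            return "cool"
        return "idle"


def run(agent: Thermostat, temperature: float, steps: int) -> list:
    """Repeat the perceive -> decide -> act cycle against a toy environment."""
    actions = []
    for _ in range(steps):
        percept = temperature              # perception (sensor reading)
        action = agent.decide(percept)     # decision-making
        # Action: the actuator changes the environment by one degree.
        if action == "heat":
            temperature += 1.0
        elif action == "cool":
            temperature -= 1.0
        actions.append(action)
    return actions


history = run(Thermostat(target=21.0), temperature=18.0, steps=5)
```

Starting at 18 degrees, the agent heats until the reading falls inside the tolerance band around the target and then idles.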
6. Computer Vision
Computer Vision is a field of AI that focuses on enabling machines to interpret and understand visual
information from the world, much like humans do.
2. Object Detection: Detecting and locating objects within an image (e.g., detecting
faces or pedestrians in a photo).
o Feature Extraction: Identifying key features in images (e.g., edges, corners, textures)
for analysis.
o Retail: Computer vision systems help with inventory tracking and checkout-free
shopping experiences.
1. Text Classification: Assigning labels or categories to text (e.g., spam vs. non-spam
email).
2. Named Entity Recognition (NER): Identifying entities such as names, dates, and
locations in text.
6. Text Generation: Generating coherent and contextually relevant text (e.g., chatbots
or language models like GPT).
• Techniques in NLP:
• Applications of NLP:
o Chatbots: Virtual assistants (e.g., Siri, Alexa) use NLP to interact with users in natural
language.
o Text Analytics: Analyzing customer reviews, feedback, or social media posts for
insights.
Notes
This module focuses on searching for solutions in AI problems, discussing different types of search
strategies, algorithms, and techniques used to explore the problem space and find optimal or
feasible solutions. Search is a fundamental problem-solving technique in AI, especially when the
solution involves navigating through large search spaces (e.g., finding the shortest path in a graph,
solving puzzles, playing games). The module covers both uninformed and informed search
strategies, local search algorithms, adversarial search, and Alpha-Beta pruning.
In AI, many problems can be modeled as a search problem where the goal is to find a path from an
initial state to a goal state. The search involves exploring the search space (i.e., all possible states)
and finding a sequence of actions that lead to the goal.
• State Space: The set of all possible configurations that the problem can take.
• Solution: A sequence of actions that leads from the initial state to the goal state.
Search Algorithms: These are methods for exploring the search space to find a solution, where:
• A search tree represents the states as nodes, with edges representing actions that lead from
one state to another.
• Uninformed Search (Blind Search): Algorithms that do not have additional information about
the goal (other than what is provided in the problem definition).
Uninformed search strategies explore the search space without using any additional knowledge
other than the structure of the problem. The primary objective is to find a solution by systematically
exploring all possible options.
o Description: BFS explores the search tree level by level, starting from the initial state
and expanding all nodes at the current level before moving to the next level.
o Advantages:
▪ Guarantees finding the shortest path to the goal (if the path cost is uniform).
o Disadvantages:
o Advantages:
o Disadvantages:
o Description: UCS is a variant of BFS that expands nodes in order of their cumulative path cost
rather than their depth. It always expands the frontier node with the lowest path cost.
o Advantages:
▪ Guarantees finding the optimal solution (i.e., the path with the minimum
cost), provided every step cost is positive.
o Disadvantages:
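The difference between BFS and UCS described above can be seen on a small weighted graph. The following is a minimal sketch; the graph, its edge costs, and the function names are hypothetical and chosen so that the path with the fewest edges is not the cheapest one.

```python
import heapq
from collections import deque


def bfs(graph, start, goal):
    """Breadth-first search: returns a path with the fewest edges, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()          # FIFO queue gives level-by-level expansion
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, {}):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


def ucs(graph, start, goal):
    """Uniform cost search: returns (cost, path) with minimum total cost, or None."""
    frontier = [(0, [start])]              # priority queue ordered by path cost
    expanded = {}
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in expanded and expanded[node] <= cost:
            continue                       # already expanded via a cheaper path
        expanded[node] = cost
        for neighbor, step in graph.get(node, {}).items():
            heapq.heappush(frontier, (cost + step, path + [neighbor]))
    return None


# Hypothetical graph: node -> {neighbor: step cost}
graph = {
    "A": {"B": 1, "G": 10},
    "B": {"C": 1},
    "C": {"G": 1},
}
bfs_path = bfs(graph, "A", "G")
ucs_cost, ucs_path = ucs(graph, "A", "G")
```

BFS returns the one-edge path A, G (cost 10), while UCS returns the three-edge path A, B, C, G (cost 3), which illustrates why BFS is only optimal when all step costs are equal.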
Informed search strategies, also known as heuristic search, use additional information (heuristics) to
guide the search process. These algorithms estimate how close a given state is to the goal, allowing
the search to focus on promising paths.
o Description: This algorithm expands the node that appears to be closest to the goal,
based on a heuristic function $h(n)$, which estimates the cost from a node to
the goal.
o Advantages:
o Disadvantages:
2. A* Search:
o Description: A* combines the advantages of uniform cost search and greedy search. It uses
both the cost to reach the node, $g(n)$, and the estimated cost to the goal,
$h(n)$, to determine the most promising path. The total cost function is
$f(n) = g(n) + h(n)$, where:
▪ $g(n)$ is the cost of the path from the initial state to node $n$.
▪ $h(n)$ is the heuristic estimate of the cost from node $n$ to the goal.
o Advantages:
o Disadvantages:
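As an illustration of how A* orders its frontier by f(n) = g(n) + h(n), here is a minimal sketch on a small grid. The grid layout, the unit step cost, and the choice of Manhattan distance as the heuristic are assumptions made for the example.

```python
import heapq


def a_star(grid, start, goal):
    """A* on a 4-connected grid; cells equal to 1 are walls.

    Returns the number of steps on a shortest path, or None if unreachable.
    """
    def h(cell):
        # Manhattan distance: never overestimates with unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]      # entries are (f, g, cell)
    best_g = {start: 0}
    while frontier:
        f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue                       # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


# Hypothetical map: a wall forces a detour around the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
steps = a_star(grid, (0, 0), (2, 0))
```

Because the heuristic never overestimates the remaining cost here, the first time the goal is popped from the queue its path cost is optimal.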
Local search algorithms are used for problems where finding an exact solution is infeasible due to the
large search space. These algorithms focus on finding a solution by iteratively improving a candidate
solution.
1. Hill Climbing:
o Description: Hill climbing is a local search algorithm that starts with an arbitrary
solution and iteratively moves to a neighbor state with a better value, hoping to
reach the peak (goal state).
o Advantages:
o Disadvantages:
▪ Can get stuck in local optima (solutions that are better than their neighbors
but not globally optimal).
2. Simulated Annealing:
o Advantages:
▪ Can escape local minima and is more likely to find the global optimum.
o Disadvantages:
3. Genetic Algorithms:
o Advantages:
o Disadvantages:
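The local-optimum problem noted for hill climbing can be reproduced with a tiny one-dimensional landscape. This is a sketch under simple assumptions: states are list indices, each state's neighbors are the adjacent indices, and the `landscape` values are made up.

```python
def hill_climb(values, start):
    """Steepest-ascent hill climbing over a 1-D list of state values."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(values)]
        best = max(neighbors, key=lambda i: values[i])
        if values[best] <= values[current]:
            return current                 # no better neighbor: a (possibly local) optimum
        current = best


# Hypothetical landscape with a local peak at index 2 and the global peak at index 6.
landscape = [1, 3, 5, 4, 2, 6, 9, 7]
from_left = hill_climb(landscape, start=0)
from_middle = hill_climb(landscape, start=4)
```

Starting from index 0 the search stops at the local peak (value 5), while starting from index 4 it reaches the global peak (value 9). Simulated annealing addresses this by occasionally accepting worse neighbors.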
5. Adversarial Search
Adversarial search involves searching for solutions in environments where multiple agents (e.g.,
players in a game) have conflicting goals. This type of search is commonly used in game theory and
board games (e.g., chess, checkers).
• Two-Person Zero-Sum Games: These are games where one player's gain is the other player's
loss. The goal is to find an optimal strategy for one player, assuming the opponent plays
optimally.
• Minimax Algorithm:
o Description: The Minimax algorithm is used to choose the best action for a player by
minimizing the possible loss for a worst-case scenario. It assumes that the opponent
is also playing optimally to minimize the player's payoff.
o Advantages:
o Disadvantages:
Alpha-Beta pruning is an optimization technique for the Minimax algorithm that reduces the number
of nodes evaluated in the search tree. The idea is to "prune" branches of the tree that cannot
possibly affect the final decision.
o The algorithm maintains two values: alpha (the best score the maximizing player can
guarantee so far) and beta (the best score the minimizing player can guarantee so far).
o If, at any point, alpha becomes greater than or equal to beta, the remaining children of the
current node cannot affect the final decision, so the branch is pruned (i.e., not explored
further).
• Advantages:
o Can sometimes explore a game tree more efficiently, without losing the optimality of
the solution.
• Complexity: Alpha-Beta pruning reduces the time complexity of the Minimax algorithm from
$O(b^d)$ to $O(b^{d/2})$ in the best case, where $b$ is the branching factor and $d$ is the
search depth.
This module introduced the essential concepts of search in Artificial Intelligence, focusing on
different search strategies and algorithms for solving problems. It covered:
• Uninformed search strategies like BFS, DFS, and Uniform Cost Search, which explore the
search space without additional knowledge.
• Informed search strategies like Greedy Search and A*, which use heuristics to guide the
search more efficiently.
• Local search algorithms like Hill Climbing and Simulated Annealing, which are used for
optimization problems.
• Adversarial search, including the Minimax algorithm and Alpha-Beta pruning, for problems
like two-player games where players have conflicting goals.
Module III: Knowledge Representation & Reasoning
Notes
This module delves into Knowledge Representation (KR) and Reasoning in Artificial Intelligence. It
discusses how knowledge is represented in AI systems, how reasoning mechanisms work, and
various formal systems and algorithms used for inference and decision-making. This module covers
logical reasoning, probabilistic reasoning, and the theory behind some of the most widely used
models, including Propositional Logic, First-Order Logic (FOL), Forward & Backward Chaining,
Resolution, and Bayesian Networks.
1. Propositional Logic
Propositional logic (also called sentential logic or Boolean logic) is one of the simplest formal
systems for representing knowledge. It deals with statements or propositions that can be either true
or false.
• Propositions: Basic units of propositional logic. They represent statements that can be either
true or false (e.g., "It is raining").
• Logical Connectives: Propositions are combined using logical operators to form more
complex expressions:
o AND ($\land$): $A \land B$ is true if both $A$ and $B$ are true.
o IMPLIES ($\rightarrow$): $A \rightarrow B$ is false only when $A$ is true and $B$ is false
(if $A$ is true, then $B$ must also be true).
• Truth Tables: A truth table is used to evaluate the truth values of logical expressions by
considering all possible truth values of the propositions involved.
• Logical Inference: Propositional logic allows reasoning through logical inference, where new
facts can be derived from existing facts using valid rules of inference (e.g., Modus Ponens,
Modus Tollens, etc.).
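Truth tables can also be used to check whether an inference rule is valid, that is, whether it holds under every assignment of truth values. The sketch below enumerates all assignments; the helper names `implies` and `is_valid` are hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    # A -> B is false only when A is true and B is false.
    return (not a) or b


def is_valid(formula, num_vars: int) -> bool:
    """True if `formula` holds under every truth assignment (a tautology)."""
    return all(formula(*values) for values in product([True, False], repeat=num_vars))


def modus_ponens(a, b):
    # ((A -> B) AND A) -> B
    return implies(implies(a, b) and a, b)


def affirming_the_consequent(a, b):
    # ((A -> B) AND B) -> A, a well-known fallacy
    return implies(implies(a, b) and b, a)


modus_ponens_valid = is_valid(modus_ponens, 2)
fallacy_valid = is_valid(affirming_the_consequent, 2)
```

Modus Ponens holds in all four rows of its truth table, whereas affirming the consequent fails when A is false and B is true.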
First-Order Logic (FOL), also known as Predicate Logic, is a more expressive formalism than
propositional logic. It allows for reasoning about objects, relations between objects, and
quantification.
o Universal Quantifier ($\forall$): Indicates that the statement applies to all elements
in the domain (e.g., $\forall x \, \text{Loves}(x, \text{Mary})$).
o Existential Quantifier ($\exists$): Indicates that there exists at least one element for
which the statement holds true (e.g., $\exists x \, \text{Loves}(x, \text{Mary})$).
• Logical Connectives: Same as in propositional logic, but applied to predicates and quantified
statements.
• Example:
Inference in FOL refers to the process of deriving new knowledge from existing knowledge using
logical rules. The main aim is to deduce consequences (theorems) from axioms or facts.
• Forward Chaining:
o Example: If you know that "John loves Mary" and "Loves(x, y) implies Happy(x)", you
can deduce that "John is happy".
o Advantages: Suitable for rule-based systems or expert systems where the facts are
known upfront.
• Backward Chaining:
o Example: If you want to prove "John is happy", you check if the rule "Loves(x, y)
implies Happy(x)" is satisfied, and then check if "John loves Mary".
o Advantages: Suitable for goal-oriented systems (like Prolog), where you work
backward from the desired conclusion.
• Resolution:
o Process:
o Example: If you have a set of facts and want to prove a particular statement, you
negate the statement, and use resolution to check if it leads to a contradiction.
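The forward and backward chaining examples above can be sketched over ground (variable-free) facts. This is a simplified illustration: rules are written as already-instantiated Horn clauses, so no unification is performed, and the second rule is a hypothetical addition used to show a goal that cannot be proved.

```python
def forward_chain(facts, rules):
    """Data-driven: apply rules (premises -> conclusion) until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: prove `goal` by recursively proving the premises of a matching rule."""
    if goal in facts:
        return True
    if goal in seen:                       # avoid looping on circular rules
        return False
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
    )


facts = {"Loves(John, Mary)"}
rules = [
    (["Loves(John, Mary)"], "Happy(John)"),             # ground instance of Loves(x, y) -> Happy(x)
    (["Happy(John)", "Rich(John)"], "Content(John)"),   # hypothetical extra rule
]
derived = forward_chain(facts, rules)
john_is_happy = backward_chain("Happy(John)", facts, rules)
john_is_content = backward_chain("Content(John)", facts, rules)
```

Forward chaining starts from the known facts and derives "Happy(John)"; backward chaining starts from the query and works back to the same fact.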
4. Probabilistic Reasoning
Probabilistic reasoning allows reasoning about uncertainty and is particularly useful in situations
where the knowledge is incomplete or ambiguous. Instead of relying on deterministic logic,
probabilistic reasoning assigns probabilities to different events or hypotheses.
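• Bayes' Theorem: A foundational tool in probabilistic reasoning, it allows the update of the
probability of a hypothesis based on new evidence. It states:
$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$
where:
o $P(H \mid E)$ is the posterior probability of hypothesis $H$ given evidence $E$,
o $P(E \mid H)$ is the likelihood of the evidence if $H$ is true,
o $P(H)$ is the prior probability of $H$, and
o $P(E)$ is the overall probability of the evidence.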
• Probabilistic Models:
o Markov Models: Systems where the future state depends only on the current state
and not on past states (memoryless property).
o Hidden Markov Models (HMMs): A more advanced probabilistic model used for
sequential data, where the system has hidden (unobservable) states.
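A short numeric example may help show how Bayes' Theorem updates a belief. The numbers below (a 1% prior, a 90% true-positive rate, and a 5% false-positive rate) are invented for illustration, and the function name is hypothetical.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical diagnostic test: rare condition, fairly accurate test.
p_condition_given_positive = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

Even with a positive result, the posterior is only about 15%, because the hypothesis was unlikely to begin with and false positives are comparatively common.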
5. Utility Theory
Utility theory is a framework for decision-making under uncertainty, where agents make decisions
that maximize their expected utility.
• Utility Function: A function that assigns a real number to each possible outcome in a way
that reflects the agent’s preferences. The higher the utility, the more preferred the outcome
is.
• Expected Utility: The expected utility of an action is calculated by summing the utilities of all
possible outcomes, weighted by their probabilities:
$EU(a) = \sum_{i} P(o_i) \cdot U(o_i)$
where:
o $P(o_i)$ is the probability of outcome $o_i$,
o $U(o_i)$ is the utility of outcome $o_i$.
• Risk and Decision Making: Utility theory helps agents make decisions by balancing potential
rewards with the risks or uncertainties involved.
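The expected-utility calculation can be sketched in a few lines. The umbrella scenario, its probabilities, and its utility values are invented for illustration.

```python
def expected_utility(outcomes):
    """outcomes: iterable of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)


# Hypothetical decision: P(rain) = 0.3; each pair is (probability, utility).
actions = {
    "take_umbrella": [(0.3, 8), (0.7, 6)],
    "leave_umbrella": [(0.3, 0), (0.7, 10)],
}
utilities = {name: expected_utility(outcomes) for name, outcomes in actions.items()}
best_action = max(utilities, key=utilities.get)
```

Here taking the umbrella has an expected utility of about 6.6 and leaving it about 7.0, so an agent that maximizes expected utility leaves it at home. A more risk-averse utility function (for example, a much lower utility for getting wet) would reverse the choice.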
6. Hidden Markov Models (HMMs)
HMMs are a type of probabilistic model used to represent systems that transition between hidden
states, observable only through noisy measurements. They are widely used in time-series analysis,
speech recognition, and other sequential data tasks.
• Components of HMM:
2. Observations: A set of observable events that are emitted by the hidden states.
• Forward and Backward Algorithms: These algorithms efficiently compute the likelihood of a
sequence of observations in an HMM and the posterior probability of each hidden state. The most
likely sequence of hidden states given the observations is decoded with the related Viterbi
algorithm.
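The forward algorithm can be sketched as follows. The weather states, the observation names, and all probabilities are made-up example values; only the recursion itself is the standard technique.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of the observation sequence under an HMM."""
    # alpha[s] = P(o_1 .. o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: hidden weather, observed activity.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
likelihood = forward(["walk", "shop"], states, start_p, trans_p, emit_p)
```

The recursion sums over the previous hidden state at each step, so the cost grows linearly with the sequence length instead of exponentially, as it would if every hidden-state sequence were enumerated.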
7. Bayesian Networks
A Bayesian Network is a graphical model used to represent a set of random variables and their
conditional dependencies via a directed acyclic graph (DAG).
• Components:
o Conditional Probability Tables (CPTs): Define the probability distribution for each
variable, conditioned on its parents in the graph.
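As a sketch of how the graph and the CPTs combine, the example below encodes a three-variable network and answers a query by enumeration. The network structure (Rain and Sprinkler both influencing WetGrass) and every probability are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler
p_rain = 0.2
p_sprinkler = 0.1
# CPT for WetGrass, keyed by (rain, sprinkler): P(WetGrass = true | parents)
p_wet = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability factorized along the DAG: P(R) * P(S) * P(W | R, S)."""
    pr = p_rain if rain else 1 - p_rain
    ps = p_sprinkler if sprinkler else 1 - p_sprinkler
    pw = p_wet[(rain, sprinkler)] if wet else 1 - p_wet[(rain, sprinkler)]
    return pr * ps * pw


def prob_rain_given_wet():
    """Inference by enumeration: P(Rain = true | WetGrass = true)."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


p = prob_rain_given_wet()
```

Observing wet grass raises the probability of rain from the prior of 0.2 to roughly 0.74. Enumeration is exponential in the number of variables, so it is only practical for small networks like this one.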
This module introduced the foundational concepts and techniques for Knowledge Representation
and Reasoning in AI. Topics covered include:
• Inference techniques such as forward and backward chaining, resolution, and logical
reasoning in FOL.
• Probabilistic reasoning using Bayes' Theorem, and models like Hidden Markov Models
(HMM) and Bayesian Networks for dealing with uncertainty.
• Utility theory for decision-making under uncertainty, helping agents choose actions that
maximize expected outcomes.
Module IV: Concepts and Algorithms in Machine Learning (ML)
Notes
Machine learning can be broadly classified into two main types based on the nature of the training
data and the learning task:
Supervised Learning
• Definition: In supervised learning, the model is trained on labeled data. The training dataset
contains both input data (features) and the corresponding correct output (labels or targets).
The goal is for the model to learn a mapping from inputs to outputs, and make predictions
on unseen data.
• Common Algorithms:
o Neural Networks,
o Decision Trees.
Unsupervised Learning
• Definition: In unsupervised learning, the model is trained on data that has no labels or target
outputs. The goal is to identify underlying patterns, structures, or relationships in the data,
such as grouping similar items together or reducing the data’s dimensionality.
• Common Algorithms:
o K-Means Clustering,
o Hierarchical Clustering,
2. Decision Trees
A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It
works by recursively splitting the data into subsets based on feature values, creating a tree-like
structure of decisions.
o Nodes: Each internal node represents a decision based on a feature, and each leaf
node represents the output (a class label for classification or a numerical value for
regression).
o The goal is to recursively partition the data to maximize information gain (in
classification) or variance reduction (in regression). Popular methods to determine
the best feature to split on include:
▪ Entropy and Information Gain: Measures the uncertainty in the dataset and
the reduction in uncertainty by splitting (used in classification).
• Advantages:
• Disadvantages:
• Pruning: To avoid overfitting, trees are often pruned (i.e., remove nodes or branches that
provide little predictive power).
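Entropy and information gain, as used to choose a split, can be computed as in the sketch below. The tiny weather-style dataset and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on rows[i][feature]."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical dataset: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

A tree-building procedure would split on "outlook" first, since it removes all uncertainty (a gain of 1 bit), while "windy" gives no gain.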
Statistical learning refers to a broad set of machine learning techniques based on statistical theory.
These models aim to infer the relationship between input variables and output variables.
1. Linear Regression:
o Used for regression tasks where the goal is to predict a continuous output variable as
a linear combination of input features.
o The model estimates the coefficients ($\beta$) that minimize the error between
predicted and actual values (usually by minimizing the Mean Squared Error, MSE).
2. Logistic Regression:
3. Bayesian Models:
o These are statistical models where the parameters are treated as random variables
with probability distributions.
4. Support Vector Machines (SVM):
o A powerful algorithm for classification tasks. It works by finding the hyperplane that
maximizes the margin between classes in a high-dimensional space.
5. Naive Bayes:
o A simple probabilistic classifier based on Bayes' Theorem with the naive assumption
that features are conditionally independent given the class label.
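For the linear regression item above, the coefficients that minimize the MSE have a closed form when there is a single input feature. The sketch below uses that closed form; the data points are invented, and `b0` and `b1` stand for the intercept and slope coefficients.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (minimizes the mean squared error)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                         # slope
    b0 = mean_y - b1 * mean_x              # intercept
    return b0, b1


def mse(xs, ys, b0, b1):
    """Mean squared error of the line b0 + b1 * x on the data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noisy samples of a roughly linear relationship.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b0, b1 = fit_line(xs, ys)
fit_error = mse(xs, ys, b0, b1)
```

Any other choice of intercept and slope gives a larger MSE on these points, which is what "least squares" means.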
Naive Bayes is a probabilistic classifier based on Bayes' Theorem. It assumes that the features are
independent given the class label, which simplifies the computation and makes the model very
efficient, even with high-dimensional data.
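• Bayes' Theorem:
$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$
where:
o $P(C \mid X)$ is the posterior probability of class $C$ given the features $X$,
o $P(X \mid C)$ is the likelihood of the features given the class,
o $P(C)$ is the prior probability of the class, and
o $P(X)$ is the probability of the features (the evidence).
• Naive Assumption: The features are conditionally independent given the class label, which
simplifies the likelihood term $P(X \mid C)$ as the product of the individual feature
likelihoods. For $X = (x_1, \dots, x_n)$:
$P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$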
• Advantages:
o Works well with small datasets or when features are highly independent.
• Disadvantages:
o Not suitable for tasks where the relationship between features is important.
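A count-based Naive Bayes classifier for categorical features can be sketched as below. Two details are assumptions added for the example rather than taken from the notes: add-one (Laplace) smoothing, so that unseen feature values do not produce zero probabilities, and log-probabilities, to avoid numerical underflow. The dataset is invented.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Collect the counts needed for a categorical Naive Bayes classifier."""
    class_counts = Counter(labels)
    # feature_counts[class][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)              # distinct values seen per feature
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[c][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, x):
    """Return the class with the highest (unnormalized) log posterior."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        # log P(C) + sum_i log P(x_i | C), with add-one smoothing (an assumption)
        score = math.log(n_c / total)
        for i, v in enumerate(x):
            score += math.log((feature_counts[c][i][v] + 1) / (n_c + len(values[i])))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical training data: (outlook, temperature) -> play?
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(samples, labels)
```

The evidence term P(X) is omitted because it is the same for every class and does not change which class scores highest.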
The Expectation-Maximization (EM) algorithm is an iterative method for finding maximum likelihood
estimates of parameters in models with latent (hidden) variables.
• Overview: EM is particularly useful when the data is incomplete or has missing values, and
the model involves unobserved (latent) variables that need to be estimated.
1. E-step (Expectation): Estimate the expected value of the hidden variables, given the
current model parameters and the observed data.
2. M-step (Maximization): Maximize the likelihood of the observed data given the
expected values of the hidden variables. Update the model parameters based on this
maximization.
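The E-step and M-step can be illustrated with the classic two-coin problem: several trials of coin tosses are observed, but which of two biased coins produced each trial is hidden. The sketch assumes both coins are equally likely to be chosen, and the head counts and starting guesses are example values.

```python
import math


def em_two_coins(heads, flips, theta_a, theta_b, iterations=20):
    """EM for two biased coins with equal mixing weights (an assumption).

    heads[i] is the number of heads in trial i of `flips` tosses; the coin
    that produced each trial is the hidden variable.
    """
    history = []                           # log-likelihood per iteration, up to a constant
    for _ in range(iterations):
        # E-step: expected head/tail counts per coin under the current parameters.
        heads_a = tails_a = heads_b = tails_b = 0.0
        log_likelihood = 0.0
        for h in heads:
            t = flips - h
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            log_likelihood += math.log(0.5 * like_a + 0.5 * like_b)
            w_a = like_a / (like_a + like_b)   # responsibility of coin A for this trial
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += (1 - w_a) * h
            tails_b += (1 - w_a) * t
        history.append(log_likelihood)
        # M-step: re-estimate each coin's bias from its expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b, history


theta_a, theta_b, history = em_two_coins([5, 9, 8, 4, 7], 10, theta_a=0.6, theta_b=0.5)
```

Each iteration cannot decrease the likelihood of the observed data, which is the property that makes EM converge to a (possibly local) maximum.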
6. Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions
by interacting with an environment. The agent performs actions to maximize a cumulative reward
over time, based on feedback from the environment.
• Key Concepts:
o Reward: A scalar feedback signal that evaluates the agent’s action in a given state.
o Value Function: A function that estimates the long-term reward for each state (or
state-action pair).
• Markov Decision Process (MDP): RL problems are often modeled as MDPs, where:
o Transition Model: Describes how the environment transitions between states based
on actions.
o Reward Function: Describes the reward given to the agent for taking an action in a
particular state.
• Q-Learning:
• Deep Q-Networks (DQN):
o A variant of Q-learning that uses deep neural networks to approximate the Q-values,
enabling the use of RL in environments with large state spaces (e.g., video games).
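Tabular Q-learning can be sketched on a toy problem. Everything about the environment below is hypothetical: a chain of five states, two actions (left and right), and a reward of 1 for reaching the last state. The update line is the standard Q-learning rule; the learning rate `alpha`, discount factor `gamma`, and purely random behavior policy are example choices.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain: move left/right, reward 1 at the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]   # q[state][action]; 0 = left, 1 = right
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Random behavior policy; Q-learning is off-policy, so it still
            # learns the values of the greedy policy.
            action = rng.randrange(2)
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = [row.index(max(row)) for row in q[:-1]]   # greedy action per non-terminal state
```

After training, the greedy policy chooses "right" in every non-terminal state, and the learned values decay by the discount factor with each extra step from the goal.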
• Supervised learning (e.g., Decision Trees, Naive Bayes) and Unsupervised learning (e.g.,
Clustering, Dimensionality Reduction).
• Statistical learning models for regression and classification, including linear models and
SVM.
• Techniques for learning with complete data (e.g., Naive Bayes) and learning with
incomplete or hidden data (e.g., EM algorithm).
Each concept and algorithm has its own applications, strengths, and limitations, and the choice of
model depends on the problem at hand, the data available, and the desired outcome.
Module V: Pattern Recognition
Notes
Pattern Recognition (PR) is a field of machine learning that focuses on classifying input data based
on patterns or features extracted from the data. This module covers the fundamental concepts of
pattern recognition, including its design principles, statistical methods, and various classification and
clustering techniques.
Pattern Recognition refers to the process of classifying data into predefined categories or classes
based on some measure of similarity or features extracted from the data. It is widely used in fields
such as computer vision, speech recognition, bioinformatics, and data mining.
The goal is to build a system that can identify patterns in complex, high-dimensional data and assign
each input to the appropriate category.
A pattern recognition system typically involves several key components, each contributing to the
overall process of identifying patterns from input data.
1. Feature Extraction:
o The first step in pattern recognition is to extract relevant features from the raw data.
These features are more compact and informative representations of the data.
2. Preprocessing:
o This step involves transforming or normalizing the data to reduce noise, improve
quality, and ensure consistency.
3. Classification:
o The core task is to classify the data into one of the predefined classes. This step uses
the features extracted and applies a classification algorithm to make a decision.
4. Evaluation:
o After classification, the system's performance is evaluated using measures such as
accuracy, precision, recall, and F1 score to assess how well the system performs.
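The evaluation measures named above can be computed directly from the true and predicted labels. A minimal sketch for a binary problem follows; the label vectors are invented, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)   # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)   # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)   # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and classifier output.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is their harmonic mean.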
Statistical pattern recognition uses statistical methods to classify patterns based on the statistical
properties of the data. The key idea is to model the distribution of data in each class and use these
models to assign new data to the correct class.
• Bayes’ Theorem plays a crucial role in statistical pattern recognition, as it helps in calculating
the probability of a data point belonging to a certain class based on observed features.
• Assumptions: Statistical methods often assume that the data in each class follows a
particular statistical distribution, such as a Gaussian distribution.
Two common methods used in pattern recognition for reducing the complexity of the data and
improving classification performance are:
• Purpose: PCA is a technique used for dimensionality reduction. It reduces the number of
variables while preserving as much of the variance of the data as possible. This is done by
transforming the original features into a new set of orthogonal features called principal components.
• Steps:
1. Standardize the data (make the features have zero mean and unit variance).
5. Select the top $k$ eigenvectors and project the original data onto these new axes.
• Advantages:
• Applications: Image compression, noise reduction, feature extraction for machine learning
tasks.
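The idea behind PCA can be sketched without external libraries for the case $k = 1$. Two simplifications are assumptions of this sketch: the data is only centered (the unit-variance scaling of step 1 is skipped so the dominant direction stays visible), and the top eigenvector of the covariance matrix is found by power iteration rather than a full eigendecomposition. The data points are invented.

```python
import math


def first_principal_component(data, iterations=100):
    """Top principal component of `data` (list of equal-length rows) via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Repeated multiplication by the covariance matrix converges to its top eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered data onto the component (a 1-D representation).
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Hypothetical 2-D points lying close to the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
component, scores = first_principal_component(data)
```

The recovered component points along the direction of greatest variance (roughly the line y = 2x), and `scores` is the one-dimensional projection of each point onto it.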
• Purpose: LDA is another technique for dimensionality reduction, but unlike PCA, which
focuses on preserving variance, LDA seeks to maximize the separation between multiple
classes. It is especially useful for supervised learning tasks.
• Steps:
1. Compute the mean vector of each class and the overall mean vector of the data.
2. Compute the between-class scatter matrix and the within-class scatter matrix.
3. Compute the eigenvectors and eigenvalues of the matrix formed by the inverse of
the within-class scatter matrix multiplied by the between-class scatter matrix.
4. Select the top $k$ eigenvectors to form a transformation matrix that maps the data
to a lower-dimensional space.
• Advantages:
5. Classification Techniques
Classification is the process of predicting the category or class of an unknown data point based on its
features. There are several classification techniques commonly used in pattern recognition.
• Definition: The Nearest Neighbor (NN) rule is one of the simplest classification algorithms. It
assigns a class to a data point based on the class of its nearest neighbor(s) in the feature
space.
• Types:
o 1-NN (k = 1): Classifies a new point based on the class of the single closest training
point.
o k-NN (k > 1): Classifies a new point based on the majority class of the $k$ closest
training points.
• Distance Metric: The most common distance metric used is Euclidean distance, but other
measures such as Manhattan distance or cosine distance (one minus cosine similarity) can also
be used.
• Advantages:
• Disadvantages:
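The nearest neighbor rule is short enough to sketch in full. The example uses Euclidean distance and a majority vote; the training points and labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
label_near_a = knn_predict(train, (2.0, 2.0), k=3)
label_near_b = knn_predict(train, (6.0, 7.0), k=3)
```

There is no training phase: every prediction sorts the whole training set by distance, which is why the method becomes slow on large datasets.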
Bayes Classifier
• Definition: The Bayes classifier is based on Bayes’ Theorem, which calculates the posterior
probability of a class given the observed data and the prior probability of the class.
• Bayes' Theorem:
$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$
• Naive Bayes Assumption: Naive Bayes assumes that features are conditionally independent
given the class label, which simplifies the computation of $P(X \mid C)$.
• Advantages:
• Disadvantages:
• Definition: SVM is a powerful classification algorithm that aims to find the hyperplane that
best separates the data into different classes with the largest margin.
• Key Concepts:
o Margin: The distance between the decision boundary (hyperplane) and the closest
data points (called support vectors).
o Kernel Trick: SVM can be extended to non-linearly separable data by using kernel
functions (e.g., polynomial, Gaussian) to map the input features into a higher-
dimensional space where the classes are separable.
• Advantages:
o Works well even when the number of dimensions is greater than the number of
samples.
• Disadvantages:
6. K-Means Clustering
K-Means is an unsupervised clustering algorithm that groups data into $k$ clusters, where each data
point belongs to the cluster whose center (centroid) is closest.
• Steps:
3. Update the centroids by calculating the mean of all data points assigned to each
cluster.
4. Repeat steps 2 and 3 until convergence (when the centroids no longer change).
• Advantages:
• Disadvantages:
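The assign-and-update loop of K-Means can be sketched as follows. Using the first $k$ points as the initial centroids is an assumption made here to keep the example deterministic (random initialization is more common in practice), and the points are invented.

```python
import math


def k_means(points, k, max_iterations=100):
    """K-Means with the first k points as initial centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iterations):
        # Assignment: each point joins the cluster with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:     # convergence: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two compact groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
```

Even though both initial centroids start inside the same group, the loop separates the two groups within a few iterations. With less favorable data, a poor initialization can leave the algorithm in a worse local solution.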