complete ml (1)
History of ML, Introduction of Machine Learning Approaches - (Artificial Neural Network, Clustering, Reinforcement
Learning, Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic Algorithm), Issues in Machine
Learning and Data Science Vs Machine Learning.
• (UNIT-2: REGRESSION & BAYESIAN LEARNING) REGRESSION: Linear Regression and Logistic Regression. BAYESIAN
LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naïve Bayes classifier, Bayesian belief networks,
EM algorithm. SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel - (Linear kernel, polynomial
kernel, and Gaussian kernel), Hyperplane - (Decision surface), Properties of SVM, and Issues in SVM.
• (UNIT-3: DECISION TREE LEARNING) DECISION TREE LEARNING - Decision tree learning algorithm, Inductive bias,
Inductive inference with decision trees, Entropy and information theory, Information gain, ID-3 Algorithm, Issues in
Decision tree learning. INSTANCE-BASED LEARNING - k-Nearest Neighbour Learning, Locally Weighted Regression,
Radial basis function networks, Case-based learning.
• (UNIT-4: ARTIFICIAL NEURAL NETWORKS) ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron,
Gradient descent & the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization,
Unsupervised Learning - SOM Algorithm and its variant; DEEP LEARNING - Introduction, concept of convolutional neural
network, Types of layers - (Convolutional Layers, Activation function, pooling, fully connected), Concept of Convolution
(1D and 2D) layers, Training of network, Case study of CNN, e.g., on Diabetic Retinopathy, Building a smart speaker,
Self-driving car, etc.
• (UNIT-5: REINFORCEMENT LEARNING) REINFORCEMENT LEARNING - Introduction to Reinforcement Learning, Learning
Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement - (Markov Decision process, Q
Learning - Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q
Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic
Programming, Models of Evolution and Learning, Applications.
(UNIT-1 : INTRODUCTION)
• Learning, Types of Learning
• Well defined learning problems, Designing a Learning System.
• History of ML
• Introduction of Machine Learning Approaches:
• Artificial Neural Network
• Clustering
• Reinforcement Learning
• Decision Tree Learning
• Bayesian networks
• Support Vector Machine
• Genetic Algorithm
• Issues in Machine Learning
• Data Science Vs Machine Learning
Definition of Learning
• Problem Solving:
• Integrates new knowledge and deduces solutions when not all data is available, akin to a
doctor diagnosing illnesses with limited information.
Performance measures for learning
• Generality
• Generality refers to a machine learning model's ability to perform well across
various datasets and environments, not just the one it was trained on. For
instance, a facial recognition system that can accurately identify faces in
diverse lighting conditions and angles demonstrates good generality.
• Efficiency
• Efficiency in machine learning measures how quickly a model can learn from
data. A spam detection algorithm that quickly adapts to new types of spam
emails with minimal training data exhibits high efficiency.
• Robustness
• Robustness is the ability of a model to handle errors, noise, and unexpected
data without failing. A voice recognition system that can understand
commands in a noisy room shows robustness.
• Efficacy
• Efficacy is the overall effectiveness of a machine learning model in performing its
intended tasks. An autonomous driving system that safely navigates city traffic and avoids
accidents under various conditions demonstrates high efficacy.
• Ease of Implementation
• This measures how straightforward it is to develop and deploy a machine
learning model. A recommendation system that can be integrated into an
existing e-commerce platform using standard algorithms and software
libraries highlights ease of implementation.
Supervised Learning
• Supervised learning involves training a machine learning model using labeled
data, which means the data is already associated with the correct answer.
• Example: Consider teaching a child to identify fruits. You show them pictures of
various fruits, like apples and bananas, while telling them, "This is an apple,"
and "This is a banana." Over time, the child learns to identify fruits correctly
based on the examples given.
• Key Steps in Supervised Learning:
• Input and Output Pairing: Each input (e.g., a fruit picture) is paired with its
correct label (e.g., "apple").
• Training: The model learns by comparing its prediction with the actual label
and adjusting itself to improve accuracy.
• Error Correction: If the model predicts incorrectly (e.g., calls an apple a
banana), it adjusts its internal parameters to reduce the error.
• Outcome: The model eventually learns to map inputs (fruit images) to the
correct outputs (fruit names).
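The training and error-correction steps above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustrative sketch: the two fruit features (weight and a yellowness score), the labels, and the names `train_perceptron` and `predict` are assumptions made up for this example, not part of these notes.

```python
# Toy labeled data: (weight in hundreds of grams, yellowness from 0 to 1) -> label.
# Label 1 = banana, label 0 = apple. All values are invented for illustration.
samples = [
    ((1.8, 0.10), 0),
    ((2.0, 0.20), 0),
    ((1.7, 0.15), 0),
    ((1.2, 0.90), 1),
    ((1.1, 0.80), 1),
    ((1.3, 0.95), 1),
]


def predict(weights, bias, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(data, lr=0.1, epochs=1000):
    """Perceptron rule: change the weights only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in data:
            error = label - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Error correction: move the weights toward the correct label.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

Each pass over the data pairs an input with its label, compares the prediction with that label, and adjusts the parameters only on mistakes, which mirrors the three key steps listed above.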
Unsupervised learning
• Unsupervised learning involves training a model without any labels, which
means the model tries to identify patterns and data groupings on its own.
• Example: Imagine placing a mix of different coins on a table and asking a child to
sort them. Without explaining any criteria, the child might start grouping the
coins by size, color, or denomination on their own.
• Key Steps in Unsupervised Learning:
• Input Without Labels: The model receives data without any explicit
instructions on what to do with it.
• Pattern Recognition: The model analyzes the data and tries to find any
natural groupings or patterns (e.g., clustering coins based on size or color).
• Self-Organization: The model organizes data into different categories based
on the patterns it perceives.
• Outcome: The model creates its own system of categorization without
external guidance.
Well-defined learning problems
• A well-defined learning problem allows a computer program to improve at a
specific task through experience. This is characterized by three key elements:
• Task (T): The specific activity or challenge the program is expected to
perform.
• Performance Measure (P): The criteria used to gauge the program's
effectiveness at the task.
• Experience (E): The data or interactions from which the program learns.
• Checkers Game:
• Task (T): Playing the game of checkers.
• Performance Measure (P): The percentage of games won against various
opponents.
• Experience (E): Engaging in numerous practice games, possibly including self-
play.
• Handwriting Recognition:
• Task (T): Identifying and categorizing handwritten words in images.
• Performance Measure (P): The accuracy rate, measured as the percentage of
words correctly recognized.
• Experience (E): Analysis of a large dataset of labeled handwritten word
images.
• Autonomous Driving Robot:
• Task (T): Navigating public four-lane highways using vision-based sensors.
• Performance Measure (P): The average distance the robot travels without
making a mistake, as determined by a human supervisor.
• Experience (E): Processing sequences of images and corresponding steering
commands previously collected from human drivers.
Overview of the history of Machine Learning
Early Developments:
• 1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts
introduced the concept of a neural network by modeling neurons with electrical
circuits.
knowledgegate.in/gate
• 1952: Arthur Samuel developed the first computer program capable of learning
from its activities.
• 1958: Frank Rosenblatt created the Perceptron, the first artificial neural network,
which was designed for pattern and shape recognition.
• 1959: Bernard Widrow and Marcian Hoff developed two neural network models:
ADALINE, which could detect binary patterns, and MADALINE, which was used to
reduce echo on phone lines.
Advancements in the 1980s and 1990s:
• 1982: John Hopfield proposed a network with bidirectional lines that
mimicked actual neuronal structures.
• 1986: The backpropagation algorithm was popularized, allowing the
use of multiple layers in neural networks, enhancing their learning
capabilities.
• 1997: IBM’s Deep Blue, a chess-playing computer, famously beat the
reigning world chess champion.
• 1998: AT&T Bell Laboratories achieved significant progress in digit
recognition, notably enhancing the ability to recognize handwritten
postcodes for the US Postal Service.
21st Century Innovations:
• The 21st century has seen a significant surge in machine learning, driven by
both industry and academia, to boost computational capabilities and
innovation.
• Notable projects include:
• Google Brain (2012): A deep learning project.
• AlexNet (2012): A deep convolutional neural network.
• DeepFace (2014) and DeepMind (2014): Projects that advanced facial
recognition and AI decision-making.
• OpenAI (2015), ResNet (2015), and U-Net (2015): Each contributed to
advancements in AI capabilities, from gameplay to medical imaging.
Machine learning
• Machine learning is a subset of artificial intelligence (AI) that enables computers to
learn from and make decisions based on data, without being explicitly programmed.
• Definition: Machine learning involves developing algorithms that allow computers to
process and learn from data automatically.
• Purpose: The aim is to enable computers to learn from their experiences and improve
their performance over time without human intervention.
• Functionality: Machine learning algorithms analyze vast amounts of data, enabling
them to perform tasks more efficiently and accurately. This could be anything from
predicting consumer behavior to detecting fraudulent transactions.
• Integration: Combining machine learning with AI and cognitive technologies enhances
its ability to process and interpret large volumes of complex data.
Example: Consider a streaming service like Netflix. Machine learning is used to analyze your
viewing habits and the habits of others with similar tastes. Based on this data, the system
recommends movies and shows that you might like. Here, the algorithm learns from the
accumulated data to make increasingly accurate predictions over time, thereby enhancing user
experience without manual intervention. This demonstrates machine learning’s capability to
adapt and improve autonomously, making it a powerful tool in many tech-driven applications.
Machine learning has a wide range of applications across different fields. Here are
some key applications along with examples:
• Image Recognition:
• Application: Image recognition involves identifying objects, features, or
patterns within digital images or videos.
• Example: Used in facial recognition systems for security purposes or to
detect defective products on assembly lines in manufacturing.
• Speech Recognition:
• Application: Speech recognition technology converts spoken words into text,
facilitating user interaction with devices and applications.
• Example: Virtual assistants like Siri and Alexa use speech recognition to
understand user commands and provide appropriate responses.
• Medical Diagnosis:
• Application: Machine learning assists in diagnosing diseases by analyzing
clinical parameters and their combinations.
• Example: Predicting diseases such as diabetes or cancer by examining
patient data and previous case histories to identify patterns that precede
diagnoses.
• Statistical Arbitrage:
• Application: In finance, statistical arbitrage involves automated trading
strategies that capitalize on patterns identified in trading data.
• Example: Algorithmic trading platforms that analyze historical stock data to
make buy or sell decisions in milliseconds to capitalize on market
inefficiencies.
• Learning Associations:
• Application: This process uncovers relationships between variables in large
databases, often revealing hidden patterns.
• Example: Market basket analysis in retail, which analyzes purchasing
patterns to understand product associations and optimize store layouts.
• Information Extraction:
• Application: Information extraction involves pulling structured information
from unstructured data, like text.
• Example: Extracting key pieces of information from legal documents or news
articles to summarize content or populate databases automatically.
Advantages of Machine Learning:
• Identifies Trends and Patterns:
• Example: Streaming services like Netflix analyze viewer data to identify
viewing patterns and recommend shows and movies that individual users
are likely to enjoy.
• Automation:
• Example: Autonomous vehicles use machine learning to interpret sensory
data and make driving decisions without human input, improving
transportation efficiency and safety.
• Continuous Improvement:
• Example: Credit scoring systems evolve by learning from new customer data,
becoming more accurate in predicting creditworthiness over time.
• Handling Complex Data:
• Example: Financial institutions use machine learning algorithms to detect
fraudulent transactions by analyzing complex patterns of customer behavior
that would be difficult for humans to process.
Disadvantages of Machine Learning:
• Data Acquisition:
• Example: In healthcare, acquiring large datasets of patient medical records
that are comprehensive and privacy-compliant is challenging and expensive.
• Time and Resources:
• Example: Developing a machine learning model for predicting stock market
trends requires extensive computational resources and time to analyze years
of market data before it can be deployed.
• Interpretation of Results:
• Example: In genomics research, interpreting the vast amounts of data
produced by machine learning algorithms requires highly specialized
knowledge to ensure findings are accurate and meaningful.
• High Error-Susceptibility:
• Example: Early stages of facial recognition technology showed high error
rates, particularly in accurately identifying individuals from minority groups,
leading to potential biases and inaccuracies.
Machine Learning Approaches
Artificial Neural Network
• Overview of ANNs:
• Inspiration: ANNs mimic the structure and function of the nervous systems in animals,
particularly how neurons transmit signals.
• Functionality: These networks are used for machine learning and pattern recognition,
handling complex data inputs effectively.
• Components of ANNs:
• Neurons: Modeled as nodes within a network.
• Connections: Nodes are linked by arcs that represent synapses, with weights
that signify the strength of each connection.
• Processing: The network processes signals in a way analogous to neural
activity in biological brains.
• Operation:
• Signal Transmission: Connections in the network facilitate the propagation
of data, similar to synaptic transmission in biology.
• Information Processing: ANNs adjust the weights of connections to learn
from data and make informed decisions.
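The idea of weighted connections and weight adjustment can be sketched with a single artificial neuron. This is a minimal sketch under stated assumptions: the sigmoid activation, the squared-error loss, the learning rate, and all numeric values are choices made for illustration, and `neuron_output` and `gradient_step` are hypothetical helper names.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Signal transmission: weighted sum of the inputs, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def gradient_step(inputs, weights, bias, target, lr=0.5):
    """One gradient-descent step on the loss 0.5 * (output - target) ** 2."""
    out = neuron_output(inputs, weights, bias)
    # Chain rule: d(loss)/dz = (out - target) * sigmoid'(z), with sigmoid' = out * (1 - out).
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


inputs, target = [1.0, 0.5], 1.0
weights, bias = [0.2, -0.4], 0.1

before = neuron_output(inputs, weights, bias)
weights, bias = gradient_step(inputs, weights, bias, target)
after = neuron_output(inputs, weights, bias)
print(before, after)
```

After the step, the output moves closer to the target, which is the sense in which adjusting connection weights lets the network learn from data.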
Clustering
• Definition: Clustering is the process of sorting items into groups based on their similarities,
forming distinct clusters where items within each cluster are more alike to each other than to
those in other clusters.
• Visual Representation: Imagine organizing fruits into groups by type, such as grouping apples
together, oranges in another group, and bananas in a separate one, visually representing how
clusters segregate similar items.
• Characteristics: Clusters act like exclusive clubs, where members share common
traits but differ significantly from members of other clusters, illustrating the
distinctiveness of each group.
• Multidimensional Space: Clusters are akin to islands in an expansive ocean, with
dense population points representing similar items within each cluster, and low-
density water symbolizing dissimilar items separating clusters.
• Machine Learning Perspective: Clustering entails discovering patterns without
explicit guidance, akin to exploring a forest without a map, where similarities
guide the grouping process. It's a form of unsupervised learning, akin to solving
a puzzle without knowledge of the final solution.
No. | Clustering | Classification
3. | It is done by grouping only the input data, because the output is not predefined. | It is done by classifying the output based on the values of the input data.
4. | The number of clusters is not known before clustering. These are identified after the completion of clustering. | The number of classes is known before classification, as there is a predefined output based on the input data.
5. | Unknown class label | Known class label
6. | It is considered unsupervised learning because there is no prior knowledge of the class labels. | It is considered supervised learning because the class labels are known.
• Hierarchical Clustering:
• Agglomerative Hierarchical Clustering: Treats each data point as its own cluster, then merges
clusters into larger ones. For example, a dataset of academic papers starts with each paper
as its own cluster, then papers on similar topics merge into bigger clusters.
• Divisive Hierarchical Clustering: Starts with all data points in one cluster and splits them into
smaller clusters. For instance, starting with one cluster of all store customers, the cluster is
split based on purchasing behavior until each customer forms their own cluster.
• Partitional Clustering:
• Centroid-based Clustering (e.g., K-means): Partitions data into clusters, each represented by a
centroid. Clusters minimize distance between data points and centroid, optimizing intra-cluster
similarity and inter-cluster dissimilarity. For example, retail customers can be clustered by buying
patterns, with each cluster's centroid reflecting average behavior.
• Model-based Clustering: Uses a statistical model for each cluster, finding the best data fit. For
instance, Gaussian mixture models assume data points in each cluster are Gaussian distributed.
This method is used in image processing to model different textures as coming from different
Gaussian distributions.
• Density-based Clustering (e.g., DBSCAN):
• This method clusters points that are closely packed together, marking as
outliers points that lie alone in low-density regions. This is useful in
geographical data analysis where, for example, identifying regions of high
economic activity based on point density of businesses can be achieved.
• Grid-based Clustering:
• This method quantizes the space into a finite number of cells that form a grid structure
and then performs clustering on the grid structure. This is effective for large spatial data
sets, as it speeds up the clustering process. For example, in meteorological data, clustering
can be applied to grid squares to categorize regional weather patterns.
• Spectral Clustering:
• Uses the eigenvalues of a similarity matrix to reduce dimensionality before clustering in
fewer dimensions. This technique is particularly useful when the clusters have a complex
shape, unlike centroid-based clustering which assumes spherical clusters. For example, in
social network analysis, spectral clustering can help identify communities based on the
patterns of relationships between members.
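Of the methods above, centroid-based clustering is the easiest to sketch in a few lines. The following is a minimal K-means sketch in plain Python, not a production implementation: the customer data (visits per month, average spend), the choice of two clusters, the fixed starting centroids, and the function name `kmeans` are all assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place

        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical retail customers: (visits per month, average spend).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(final_centroids)
print(clusters)
```

Starting centroids are passed in explicitly here to keep the run deterministic; in practice they are usually chosen at random or with a seeding heuristic, and the result can depend on that choice.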
Decision Tree
• A decision tree is a model used in data mining, statistics, and machine learning to predict
an outcome based on input variables. It resembles a tree structure with branches and
leaves, where each internal node represents a "decision" based on a feature, each branch
represents the outcome of that decision, and each leaf node represents the final outcome
or class label.
• Advantages and Limitations:
• Advantages:
• Easy to interpret and visualize.
• Requires little data preparation compared to other algorithms.
• Can handle both numerical and categorical data.
• Limitations:
• Prone to overfitting, especially with many branches.
• Can be biased towards features with more levels.
• Decisions are based on heuristics, hence might not provide the best split in some cases.
Bayesian belief networks
• Are tools for representing and reasoning under conditions of uncertainty. They capture the
probabilistic relationships among a set of variables and allow for the inference of probabilities
even with partial information.
• Structure: The core components of a Bayesian belief network include:
• Directed Acyclic Graph (DAG): Each node in the graph represents a random
variable, which can be either discrete or continuous. These variables often
correspond to attributes in data. Arrows or arcs between nodes represent causal
influences.
• Conditional Probability Tables (CPTs): Each node has an associated table that
quantifies the effect of the parents on the node.
• Usage:
• Learning: Bayesian networks can be trained using data to learn the conditional
dependencies.
• Inference: Once trained, the network can be used for inference, such as predicting the
likelihood of lung cancer given that a patient is a smoker with no family history.
• Classification: Bayesian networks can classify new cases based on learned probabilities.
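The lung cancer example can be sketched as a three-node network in which Smoker and FamilyHistory are the parents of LungCancer. This is a toy sketch: every probability below is an invented number, not a medical estimate, and the structure and function names are assumptions made for illustration.

```python
from itertools import product

# Prior probabilities of the two parent nodes (hypothetical values).
p_smoker = {True: 0.3, False: 0.7}
p_family = {True: 0.2, False: 0.8}

# Conditional probability table: P(LungCancer = True | Smoker, FamilyHistory).
p_cancer = {
    (True, True): 0.10,
    (True, False): 0.05,
    (False, True): 0.03,
    (False, False): 0.01,
}


def joint(smoker, family, cancer):
    """Joint probability: product of each node's probability given its parents."""
    p_c = p_cancer[(smoker, family)]
    return p_smoker[smoker] * p_family[family] * (p_c if cancer else 1.0 - p_c)


def prob_cancer():
    """Marginal P(LungCancer = True), summing over both parents."""
    return sum(joint(s, f, True) for s, f in product([True, False], repeat=2))


def prob_smoker_given_cancer():
    """Diagnostic query P(Smoker = True | LungCancer = True) via Bayes' theorem."""
    return sum(joint(True, f, True) for f in [True, False]) / prob_cancer()


print(p_cancer[(True, False)])      # smoker with no family history: read from the CPT
print(prob_cancer())                # inference with no evidence about the parents
print(prob_smoker_given_cancer())   # reasoning from effect back to cause
```

When both parents are observed, the answer is a direct table lookup; the marginal and diagnostic queries show how the same tables support inference with partial information.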
Reinforcement learning
• Reinforcement learning is a type of machine learning where an agent learns to
make decisions by performing actions and receiving feedback in the form of
rewards or penalties. This method is similar to how individuals learn from the
consequences of their actions in real life.
• Key Concepts in Reinforcement Learning:
• Environment: The world in which the agent operates.
• State: The current situation of the agent.
• Actions: What the agent can do.
• Rewards: Feedback from the environment which can be positive
(reinforcements) or negative (punishments).
• Imagine a robot navigating a maze. The robot has to find the shortest path to a
destination without prior knowledge of the layout. Each step it takes provides new
information:
• If it moves closer to the destination, it receives a positive reward.
• If it hits a wall or moves away from the goal, it receives a negative reward. Through
trial and error, the robot learns the optimal path by maximizing its cumulative rewards.
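The maze example can be sketched with tabular Q-learning on a one-dimensional corridor, a much simpler stand-in for a real maze. Everything specific here is an assumption made for illustration: the five-cell corridor, the reward values (+1 for moving closer, -1 for moving away or hitting the wall), the hyperparameters, and the names `step` and `train`.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the destination
ACTIONS = (-1, +1)    # action index 0 = step left, index 1 = step right


def step(state, action_index):
    """Return (next_state, reward, done) for one move in the corridor."""
    target = state + ACTIONS[action_index]
    if target < 0:                       # bumped into the left wall
        return state, -1.0, False
    reward = 1.0 if target > state else -1.0   # closer is rewarded, farther is penalized
    return target, reward, target == N_STATES - 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):             # step cap as a safety net
            if rng.random() < epsilon:
                action = rng.randrange(2)                          # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])    # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action index per non-terminal cell
```

The Q-table plays the role of the robot's memory: each entry estimates the cumulative reward of taking an action in a state, and the greedy policy read from it is the learned path.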
Support Vector Machine
• A Support Vector Machine (SVM) is a powerful machine learning algorithm most commonly used in
classification problems.
• SVM constructs a hyperplane or set of hyperplanes in a high-dimensional space, which
can be used for classification. The goal is to find the best hyperplane that has the
largest distance to the nearest training data points of any class (functional margin), in
order to improve the classification performance on unseen data.
• Applications of SVM:
• Text and Hypertext Classification: For filtering spam and categorizing text-based
content such as news articles.
• Image Classification: Useful in categorizing images into different groups (e.g.,
animals, cars, fruits).
• Handwritten Character Recognition: Used to recognize letters and digits from
handwritten documents.
• Biological Sciences: Applied in protein classification and cancer classification
based on gene expression data.
Genetic Algorithm
• A genetic algorithm (GA) is a search heuristic inspired by Charles Darwin's theory
of natural selection. It is used to find optimal or near-optimal solutions to
complex problems which might otherwise take a long time to solve.
No. | Data Science | Machine Learning
4. | Tools include SAS, Tableau, Apache Spark, MATLAB. | Tools include Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio.
Aspect | Linear Regression | Logistic Regression
Prediction Outcome | Predicts continuous values | Predicts binary outcomes (0 or 1)
Purpose | Estimates values of a dependent variable | Estimates probability of an event
• Disadvantages:
• Overfitting: Risk of overly complex models.
• Requires Labeled Data: Needs a lot of labeled examples.
• Applications:
• Email Spam Detection: Classifies emails as spam or not.
• Medical Diagnosis: Predicts diseases from patient data.
Bayes Optimal Classifier
• The Bayes Optimal Classifier is a theoretical model in machine learning that
makes predictions based on the highest posterior probability. It uses Bayes'
theorem to combine prior knowledge with observed data to make the most
accurate possible predictions.
• Advantages
• Optimal Predictions: Provides the most accurate predictions theoretically possible by
minimizing the probability of misclassification.
• Incorporates All Information: Uses all available data and prior knowledge.
• Disadvantages
• Computationally Intensive: Often impractical to compute for real-world applications due
to the need for exact probability distributions.
• Requires Accurate Probabilities: Performance depends on the accuracy of the estimated
probabilities.
• Applications
• Theoretical Benchmark: Serves as an ideal performance benchmark for other classifiers.
• Ensemble Methods: Used in ensemble learning techniques like Bayesian averaging to
improve classification performance.
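The rule can be sketched as a posterior-weighted vote over hypotheses: each hypothesis contributes its prediction, weighted by how probable that hypothesis is given the data. The three hypotheses and their posterior values below are hypothetical numbers chosen for illustration, and `bayes_optimal` is an assumed helper name.

```python
def bayes_optimal(posteriors, predictions, labels):
    """Pick the label with the highest posterior-weighted vote.

    posteriors[h]  = P(h | data) for each hypothesis h
    predictions[h] = {label: P(label | h)}
    """
    scores = {
        label: sum(posteriors[h] * predictions[h].get(label, 0.0) for h in posteriors)
        for label in labels
    }
    return max(scores, key=scores.get), scores


# Hypothetical posteriors over three hypotheses and the label each one predicts.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": {"+": 1.0}, "h2": {"-": 1.0}, "h3": {"-": 1.0}}

label, scores = bayes_optimal(posteriors, predictions, ["+", "-"])
print(label, scores)
```

In this toy setup the single most probable hypothesis, h1, predicts "+", yet the combined vote favours "-". That difference is why the Bayes optimal prediction can disagree with the prediction of the best individual hypothesis.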
Naive Bayes Classifier
• The Naive Bayes classifier is a simple and effective probabilistic classifier based on Bayes'
Theorem. It assumes that the features are conditionally independent given the class label,
which is often not true in practice but simplifies the computation significantly.
• Overweight Due to Eating Habits or Medical Issues
Person | Eating Habits (Good/Bad) | Medical Issues (Yes/No) | Overweight (Yes/No)
1 | Bad | No | Yes
2 | Bad | No | No
3 | Bad | Yes | Yes
4 | Good | No | No
5 | Good | Yes | Yes
6 | Good | No | No
7 | Bad | No | Yes
8 | Good | Yes | Yes
9 | Bad | No | No
10 | Good | Yes | No
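The ten records above are enough to work a Naive Bayes prediction by hand. The sketch below does the counting in Python with exact fractions. The query (bad eating habits and a medical issue) and the function name `naive_bayes_posterior` are assumptions chosen for illustration, and no smoothing is applied, so a feature value never seen with a class would get probability zero.

```python
from fractions import Fraction

# (eating_habits, medical_issues, overweight) for persons 1 to 10, as in the table.
rows = [
    ("Bad", "No", "Yes"), ("Bad", "No", "No"), ("Bad", "Yes", "Yes"),
    ("Good", "No", "No"), ("Good", "Yes", "Yes"), ("Good", "No", "No"),
    ("Bad", "No", "Yes"), ("Good", "Yes", "Yes"), ("Bad", "No", "No"),
    ("Good", "Yes", "No"),
]


def naive_bayes_posterior(eating, medical):
    """Return P(Overweight | eating, medical) under the independence assumption."""
    scores = {}
    for label in ("Yes", "No"):
        subset = [r for r in rows if r[2] == label]
        prior = Fraction(len(subset), len(rows))
        # Conditional probabilities estimated by counting within the class.
        p_eating = Fraction(sum(r[0] == eating for r in subset), len(subset))
        p_medical = Fraction(sum(r[1] == medical for r in subset), len(subset))
        scores[label] = prior * p_eating * p_medical
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior("Bad", "Yes")
print(posterior)
```

For this query the unnormalized scores are 1/2 × 3/5 × 3/5 = 9/50 for "Yes" and 1/2 × 2/5 × 1/5 = 2/50 for "No", so the classifier predicts Overweight = Yes with probability 9/11.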
Bayesian Belief Network (BBN)
• A Bayesian Belief Network (BBN), also known as a Bayesian Network or Belief Network, is a
graphical model that represents a set of variables and their conditional dependencies using a
directed acyclic graph (DAG). Each node in the network represents a variable, and each edge
represents a conditional dependency between variables.
• Advantages:
• Modular Representation: Each variable is conditionally independent of its
non-descendants given its parents.
• Efficient Computation: Supports efficient inference algorithms for calculating
probabilities.
• Visualization: Provides a clear visual representation of dependencies among
variables.
• Disadvantages:
• Complexity: Can become computationally intensive for large networks.
• Data Requirements: Requires sufficient data to accurately estimate the CPTs.
• Design Effort: Constructing a good network requires expert knowledge and
effort.
• Applications:
• Medical Diagnosis: Modeling diseases and symptoms to assist in diagnosis.
• Risk Assessment: Evaluating risks in engineering and finance.
• Natural Language Processing: Understanding dependencies between words
and sentences.
• Decision Support Systems: Providing recommendations based on
probabilistic reasoning.
Support Vector Machine
• A Support Vector Machine (SVM) is a powerful machine learning algorithm most commonly used in
classification problems.
• SVM constructs a hyperplane or set of hyperplanes in a high-dimensional space, which
can be used for classification. The goal is to find the best hyperplane that has the
largest distance to the nearest training data points of any class (functional margin), in
order to improve the classification performance on unseen data.
• Applications of SVM:
• Text and Hypertext Classification: For filtering spam and categorizing text-based
content such as news articles.
• Image Classification: Useful in categorizing images into different groups (e.g.,
animals, cars, fruits).
• Handwritten Character Recognition: Used to recognize letters and digits from
handwritten documents.
• Biological Sciences: Applied in protein classification and cancer classification based
on gene expression data.
Linear SVM Vs Non-Linear SVM
Polynomial Kernel in SVM
• Historical Context: The polynomial kernel has been a fundamental part of kernel methods in
machine learning since the 1990s. Introduced as a way to handle non-linearly separable data,
it has helped extend the applicability of Support Vector Machines (SVMs) to more complex
problems. Initially popularized by researchers like Vladimir Vapnik, the polynomial kernel has
been used in various domains including image recognition, bioinformatics, and text
classification.
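A common form of the polynomial kernel is K(x, y) = (x · y + c)^d. The sketch below checks, for degree 2 in two dimensions, that this kernel equals an ordinary dot product in an expanded feature space, which is how it handles data that is not linearly separable. The parameter names `degree` and `coef0`, the sample vectors, and the helper `explicit_features` are assumptions made for illustration.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def explicit_features(x):
    """Feature map whose dot product equals the degree-2 kernel with coef0 = 1 (2-D input)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0)


x, y = (1.0, 2.0), (3.0, -1.0)
k_value = polynomial_kernel(x, y)
dot_in_feature_space = sum(
    a * b for a, b in zip(explicit_features(x), explicit_features(y))
)
print(k_value, dot_in_feature_space)
```

Both numbers agree, but the kernel gets there without ever building the six-dimensional vectors. For higher degrees and more input features the expanded space grows quickly, while the kernel cost stays that of one dot product.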
[Figure: scatter plot with axes Sepal length and Sepal width]
Gaussian (RBF) Kernel in SVM
• The Gaussian kernel, also known as the Radial Basis Function (RBF) kernel, is
widely used in SVM due to its ability to handle non-linear relationships between
features. It maps input features into an infinite-dimensional space where a linear
separator can be found.
• Applications
• Image and text classification: Effective for data with complex structures.
• Anomaly detection: Widely used in one-class SVM for identifying outliers.
• Advantages
• Capable of modelling complex decision boundaries.
• Effective in high-dimensional spaces.
• Disadvantages
• Requires careful tuning of the parameter γ (gamma).
• Computationally intensive with large datasets.
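The Gaussian kernel is usually written K(x, y) = exp(-γ ‖x − y‖²). The short sketch below evaluates it for one pair of points at several values of γ to show why the parameter needs tuning. The sample points, the γ values, and the function name `rbf_kernel` are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = (1.0, 2.0), (2.0, 4.0)   # squared distance is 5
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, y, gamma))
```

A small γ makes distant points still look similar, which gives smooth decision boundaries, while a large γ makes similarity fall off sharply, so each training point influences only its immediate neighbourhood and the model can overfit.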
• (UNIT-3: DECISION TREE LEARNING) DECISION TREE
LEARNING - Decision tree learning algorithm, Inductive
bias, Inductive inference with decision trees, Entropy
and information theory, Information gain, ID-3
Algorithm, Issues in Decision tree learning. INSTANCE-
BASED LEARNING - k-Nearest Neighbour Learning,
Locally Weighted Regression, Radial basis function
networks, Case-based learning.
Basic terminology used in Decision Trees
• Root Node: This is the starting point of the decision tree. It represents the entire
dataset, which is then divided into smaller, more homogeneous groups.
• Splitting: This process involves dividing a node into two or more sub-nodes based on a
certain condition that maximizes the separation of the classes.
• Decision Node: These nodes are where the splits happen. A decision node can lead to
further decision nodes or to a leaf node.
• Leaf (Terminal) Node: These are the final nodes that do not split further. Each leaf
node represents a classification or decision outcome.
• Pruning: This technique is used to reduce the size of a decision tree. It involves
removing nodes that have little impact on the decision process, to prevent overfitting.
• Branch/Sub-tree: A section of the decision tree that represents a part of the whole
decision process.
• Parent and Child Nodes: In any split, the original node is called the parent node, and
the resulting sub-nodes are called child nodes.
Decision trees are a popular tool for decision-making and prediction for
several reasons:
• Clear Visualization: Decision trees are easily visualized, allowing users to follow
decision-making steps clearly. For example, a bank might use a decision tree to
determine creditworthiness based on criteria like income, debt, and credit
history.
• Low Data Preparation: They require minimal data preprocessing, such as not
needing to normalize data. An example is a marketing team using customer
demographic data directly to predict purchasing behaviors.
• Efficient and Scalable: The model is efficient, with the cost of a
prediction growing roughly logarithmically with the number of training
examples when the tree is reasonably balanced. For instance, an online
retailer could use a decision tree to handle millions of customer
transactions efficiently.
• Flexible Data Handling: Capable of processing both numerical and categorical data,
making them suitable for diverse scenarios. For example, a healthcare provider could
use decision trees to analyze patient data that includes both numerical (e.g., blood
pressure readings) and categorical (e.g., symptom presence) data.
• Robust to Assumptions: Decision trees work well even when data assumptions
are not met, such as non-linearity or non-normal distributions. An example is
their use in environmental studies, where data often defies typical statistical
assumptions.
Entropy
• Entropy is a measure of the uncertainty or impurity in a dataset. In the context of
machine learning, particularly in decision trees, entropy quantifies the amount of
randomness or disorder in the target variable. It helps in determining the best way to
split the data to reduce this uncertainty and create pure subsets.
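• Formula: $H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$, where $S$ is the set of examples (dataset),
$c$ is the number of classes, and $p_i$ is the proportion of examples in $S$ that belong to class $i$.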
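A minimal Python sketch of the entropy calculation, assuming the standard Shannon definition H(S) = -Σ p_i log2(p_i) over the class proportions in a set of labels. The function name and the example labels are illustrative, not taken from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # H(S) = -sum(p_i * log2(p_i)) over the classes present in S
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

pure = entropy(["yes"] * 4)                   # a pure subset: no uncertainty
mixed = entropy(["yes", "no"] * 2)            # a 50/50 subset: maximum uncertainty for two classes
skewed = entropy(["yes"] * 9 + ["no"] * 5)    # hypothetical 9-vs-5 split
```

A pure subset has entropy 0 and an evenly mixed two-class subset has entropy 1 bit, which is why a split that produces purer children reduces entropy.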
C4.5
• Usage Today: Used in various machine learning applications and often cited in academic
research. Basis for the popular data mining tool, See5/C5.0.
• Advantages: More robust than ID3, handling both categorical and continuous data. Includes
tree pruning to reduce overfitting. Handles missing values better than ID3.
• Disadvantages: More complex than ID3. Computationally intensive due to gain ratio
calculations.
CART (Classification and Regression Trees)
• Historical Context: Developed by Breiman, Friedman, Olshen, and Stone in 1984. Introduced to
provide a method for both classification and regression tasks.
• Algorithm: Uses Gini impurity (for classification) or variance reduction (for regression) to split
nodes. Can handle both categorical and continuous attributes. Performs binary splits, which
simplifies the tree but may increase its depth. Includes pruning to avoid overfitting.
• Usage Today: Widely used in various machine learning applications, especially in ensemble
methods like Random Forests and Gradient Boosting Machines.
• Advantages: Versatile, supporting both classification and regression tasks. Handles both
categorical and continuous data. Simplifies splits to binary decisions, making it easy to
implement. Robust and forms the basis of many ensemble learning techniques.
• Disadvantages: Prone to overfitting if not pruned properly. Can create deep trees, which may
be difficult to interpret.
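The Gini impurity that CART uses for classification splits can be sketched as follows. This is a minimal illustration assuming the usual definition, 1 minus the sum of squared class proportions; the function names and sample labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes in `labels`."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each child node."""
    total = len(left) + len(right)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)

parent = ["a", "a", "b", "b"]
before = gini(parent)                             # impurity before splitting
after = weighted_gini(["a", "a"], ["b", "b"])     # a split that separates the classes
```

Among candidate binary splits, the one with the lowest weighted impurity would be preferred.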
Each algorithm has contributed significantly to the development of decision tree
methods in machine learning:
• ID3 laid the foundation for decision tree algorithms but is less commonly used
today due to its limitations.
• C4.5 improved on ID3 by handling a wider variety of data types and
implementing pruning techniques, making it more robust and versatile.
• Linear Regression: Assumes a linear relationship between input features and the target
variable. Used for predictive modeling in economics, biology, and engineering.
• Neural Networks: Assume that complex patterns can be learned through multiple layers of
non-linear transformations. Used for image and speech recognition, natural language
processing, and game playing.
• k-Nearest Neighbors (k-NN): Assumes that similar instances have similar outputs (locality-based
bias). Used in recommendation systems, pattern recognition, and anomaly detection.
• Local Decision Making: Decisions are made based on the local neighbourhood
of the input instance. The algorithm considers only a subset of the training data
(e.g., the k-nearest neighbours) to make a prediction.
Example Algorithms:
• k-Nearest Neighbors (k-NN):
• Algorithm: For a given input instance, the algorithm finds the k training
instances closest to the input and assigns the most common label (for
classification) or averages the labels (for regression).
• Application: Used in recommendation systems, image recognition, and
anomaly detection.
Advantages of Instance-Based Learning:
• Simplicity:
• Easy to implement and understand.
• No need for a complex model training phase.
• Adaptability:
• Can adapt quickly to new data without retraining the entire model.
• Useful for applications where the data distribution changes over time.
• Performance:
• Effective for problems where the decision boundary is irregular and cannot
be easily captured by a global model.
Disadvantages of Instance-Based Learning:
• Computational Cost:
• High storage requirements since all training instances are retained.
• Prediction can be slow for large datasets due to the need to compute
distances to all stored instances.
• Sensitivity to Noise:
• Can be sensitive to noisy data, as outliers can affect the prediction.
• Feature Scaling:
• Performance depends on the proper scaling of features, as different scales
can distort distance calculations.
K-Nearest Neighbors (KNN)
• Historical Context: Introduced in the 1950s by Fix and Hodges,
formalized in the 1960s by Cover and Hart, KNN was initially popular
for pattern recognition and statistical estimation due to its simplicity
and intuitive approach.
• K-Nearest Neighbors (KNN) is a simple, non-parametric, and lazy learning
algorithm used for classification and regression tasks. KNN is an instance-based
learning algorithm that classifies a data point based on how its neighbors are
classified.
• Algorithm Steps:
• Select K: Choose the number of neighbors (K).
• Calculate Distance: Compute the distance between the new data point and
all the training points (commonly used distance metrics are Euclidean,
Manhattan).
• Find Nearest Neighbors: Identify the K nearest neighbors to the new data
point.
• Vote (for classification): Each neighbor votes for their class, and the class
with the most votes is assigned to the data point.
• Average (for regression): The average value of the K nearest neighbors is
taken as the prediction.
• To classify the new data point (170, 57) using the K-Nearest Neighbors (KNN)
algorithm with Euclidean distance, compute its distance to every training point
in the table below and keep the nearest ones:
Height (cm) Weight (kg) Classification
168 50 Underweight
181 63 Normal
175 68 Normal
172 63 Normal
173 64 Normal
173 55 Underweight
170 59 Normal
174 56 Normal
170 57 ?
Nearest Neighbors (K = 3):
• The three smallest Euclidean distances from (170, 57) are:
• (170, 59): 2 (Normal)
• (173, 55): 3.61 (Underweight)
• (174, 56): 4.12 (Normal)
• Classification:
• The majority class among the nearest neighbors is "Normal".
• Thus, the new data point (170, 57) is classified as "Normal".
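The worked example can be checked with a short Python sketch. K = 3 is an assumption inferred from the three distances listed; the function and variable names are illustrative.

```python
import math
from collections import Counter

# Training data from the table: (height cm, weight kg) -> class
training = [
    ((168, 50), "Underweight"), ((181, 63), "Normal"), ((175, 68), "Normal"),
    ((172, 63), "Normal"), ((173, 64), "Normal"), ((173, 55), "Underweight"),
    ((170, 59), "Normal"), ((174, 56), "Normal"),
]

def knn_classify(query, data, k=3):
    """Return the majority label and the k nearest (distance, point, label) tuples."""
    neighbours = sorted(
        (math.dist(query, point), point, label) for point, label in data
    )[:k]
    votes = Counter(label for _, _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours

label, nearest = knn_classify((170, 57), training, k=3)
```

Here `nearest` holds the points (170, 59), (173, 55), and (174, 56), and `label` is "Normal", matching the hand calculation.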
• Advantages:
• Simple and easy to implement.
• No training phase (lazy learning), which makes it fast for small datasets.
• Disadvantages:
• Computationally expensive for large datasets.
• Sensitive to the choice of K and the distance metric.
• Poor performance with imbalanced data.
• Applications:
• Pattern recognition
• Data mining
• Image classification
• Recommendation systems
• Modern-Day Use: KNN is currently applied in image recognition,
recommendation systems, anomaly detection, and genomics. It remains
favoured for its ease of implementation and versatility, despite computational
challenges, which are mitigated by techniques like KD-Trees, PCA, and distributed
computing.
• Current Research and Trends: Ongoing research focuses on optimizing KNN for
specific applications and enhancing efficiency through approximate nearest
neighbour searches. Trends include integrating KNN with deep learning models,
using it in edge computing and IoT, and improving performance with parallel
processing and GPU acceleration.
Locally Weighted Regression (LWR)
• Locally Weighted Regression (LWR), also known as Locally Weighted Scatterplot Smoothing
(LOWESS or LOESS), is a non-parametric regression method that fits multiple regressions in
localized subsets of the data to produce a smooth curve through points in a scatter plot.
• Non-Parametric: LWR does not assume a fixed form (parametric form) for the relationship
between predictors and the response variable. It adapts to the data locally.
• Localized Fitting: Regression is performed at each point using a subset of data that is close to
the point in question. This local fitting is weighted by the distance of each data point to the
point being estimated.
• Weighting: Points closer to the point of interest have higher weights. Commonly used
weighting functions include the Gaussian kernel and the tricube kernel.
• Imagine you are a botanist studying the growth of trees in a forest. You have a small
dataset of tree heights at various ages, and you want to predict the height of a tree
that is 3 years old.
Age (years) Height (meters)
1 2
2 3
3 2
4 5
5 4
• The goal of Locally Weighted Regression (LWR) is to fit a linear model to the data points in a
localized region around a query point. To achieve this, we assign weights to the data points
based on their distance from the query point, and then perform a weighted least squares
regression. This ensures that points closer to the query point have more influence on the
model than points farther away.
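A minimal sketch of this procedure on the tree data, assuming a Gaussian kernel with bandwidth tau = 1.0 (both are illustrative choices, not values given in the slides) and a straight-line local model.

```python
import math

ages = [1, 2, 3, 4, 5]
heights = [2, 3, 2, 5, 4]

def lwr_predict(x_query, xs, ys, tau=1.0):
    """Predict y at x_query with a locally weighted straight-line fit."""
    # Gaussian kernel: points near x_query get weights close to 1
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means of x and y
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least squares slope and intercept
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * x_query

prediction = lwr_predict(3, ages, heights, tau=1.0)
```

With these assumptions the estimate for a 3-year-old tree is about 3.09 m. A smaller tau would follow the nearest points more closely, while a larger tau approaches an ordinary global linear fit.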
• History and Development:
• Emerged from mid-20th-century non-parametric regression techniques to handle
nonlinear relationships without assuming a global model.
• Formalized in the 1970s and 1980s by statisticians like Cleveland and Devlin through
methods like LOWESS and LOESS, introducing key innovations such as kernel functions and
bandwidth selection.
• Modern-Day Applications:
• Used in diverse fields like economics, bioinformatics, robotics, and environmental science
for forecasting, gene expression analysis, trajectory planning, and modeling environmental
variables.
• Advantages include flexibility in modeling complex, nonlinear relationships and local
adaptation to data variations.
• Current Trends and Research:
• Ongoing research focuses on improving efficiency, especially for high-dimensional data,
and integrating LWR with deep learning.
• Trends include real-time applications, big data, and edge computing, with modern toolkits
like Python's statsmodels and scikit-learn and R's loess and gam facilitating
implementation.
Radial Basis Functions
• Radial Basis Functions (RBFs) are a powerful tool for modeling complex, nonlinear
relationships in data. They originated from mathematical interpolation methods and have
evolved to become a key component in various modern applications, particularly in machine
learning and data science. RBFs offer flexibility, smoothness, and local adaptation, making
them suitable for a wide range of problems where traditional linear models fall short.
Problem They Solve:
• Function Approximation, Interpolation, and Machine Learning: RBFs
approximate complex functions and model intricate patterns in data, create
smooth surfaces for precise interpolation (crucial in geostatistics for predicting
unsampled values), and are used in machine learning for classification and
regression, effectively handling nonlinear relationships between inputs and
outputs.
History:
• Origins and Development: Rooted in mid-20th century interpolation and
approximation theory, RBFs were formalized in the 1970s and 1980s by
researchers like Hardy. They became popular for smooth interpolations and
handling scattered data points, with RBF networks extending their applications
to machine learning and artificial intelligence.
• Applications:
• Machine Learning: RBFs are used in Radial Basis Function Networks (RBFNs)
for classification, regression, and time-series prediction.
• Image and Signal Processing: RBFs help in tasks like image reconstruction,
noise reduction, and signal interpolation.
• Geostatistics: Used for spatial interpolation and modeling, such as predicting
pollution levels at unmonitored locations.
• Robotics: Applied in path planning and control systems, where smooth and
adaptive control strategies are required.
• Finance: Used for modeling financial data and predicting market trends.
• Problem and Significance:
• RBFs are used to predict air pollution levels across a city by interpolating data from
a limited number of monitoring stations, providing smooth and accurate estimates
even in unsampled locations.
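The interpolation idea can be sketched in Python. This is a minimal illustration assuming a Gaussian basis function exp(-(eps * r)^2); the station coordinates, readings, and shape parameter eps are hypothetical values chosen only for the example.

```python
import math

def gaussian_rbf(r, eps=1.0):
    """Gaussian radial basis function of the distance r."""
    return math.exp(-(eps * r) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_rbf(centres, values, eps=1.0):
    """Find weights so the RBF sum passes through every (centre, value) pair."""
    phi = [[gaussian_rbf(math.dist(ci, cj), eps) for cj in centres] for ci in centres]
    return solve(phi, values)

def predict_rbf(point, centres, weights, eps=1.0):
    """Evaluate the weighted sum of basis functions at `point`."""
    return sum(w * gaussian_rbf(math.dist(point, c), eps) for w, c in zip(weights, centres))

# Hypothetical monitoring stations (x, y in km) and pollution readings
stations = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
readings = [40.0, 55.0, 35.0, 60.0]
weights = fit_rbf(stations, readings)
estimate = predict_rbf((1.0, 1.0), stations, weights)    # an unsampled location
```

The fitted surface reproduces each station's reading exactly and gives a smooth estimate in between. In practice eps would need tuning, since it controls how far each station's influence reaches.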
• Hidden Layers:
• Located between the input and output layers.
• Comprise hidden nodes or neurons not directly visible to the external system.
• Hidden layers are responsible for intermediate computations and feature extraction.
• There can be zero or more hidden layers depending on the complexity of the problem.
• One hidden layer is sufficient for many problems, but deeper networks (with more hidden layers)
can model more complex patterns.
Classification of Artificial Neural Networks (ANN)
• Single Layer Feed-forward Network:
• Single Layer: Only one computational layer (output layer), making it a single-layer ANN, but
it consists of two layers in total (input and output layers).
• Feed-forward Network: Information flows from the input layer to the output layer without
any feedback loops.
• Multilayer Feed-forward Network:
• Multilayer: Includes an input layer, one or more intermediate (hidden) layers, and
an output layer.
• Hidden layers perform intermediate computations before passing the information
to the output layer.
• Recurrent Network:
• These networks include at least one feedback loop, differing from feed-
forward architectures. Can be single-layer or multi-layer recurrent networks.
(Artificial Neuron) Perceptron
• Definition: The simplest form of a neural network. Consists of a single neuron
with adjustable synaptic weights and bias.
• History: Introduced by Frank Rosenblatt in 1957.
• Components: One or more inputs, a processing unit, and a single output. Used
for supervised learning of binary classifiers. Enables neurons to learn and
process elements in the training set one at a time.
Perceptron Function
• Definition: Maps input 𝑥, multiplied with the learned weight coefficient, to
generate an output value 𝑓(𝑥).
• Formula: $f(x) = 1$ if $\sum_{i=1}^{n} w_i x_i + b > 0$, otherwise $f(x) = 0$
• Where:
• 𝑤: Values of weights.
• 𝑏: Bias.
• 𝑥: Values of inputs.
• 𝑛: Number of inputs to the perceptron.
Output:
• The output can be represented as 0 or 1, or as -1 or 1, depending on the
activation function used. For example, a perceptron can learn the AND and OR gates:
AND gate using neural networks
• Steps in Perceptron Algorithm:
• Initialize weight values and bias.
• Forward propagate the inputs.
• Check the error.
• Backpropagate and adjust weights and bias.
• Repeat for all training examples.
Truth Table (AND): inputs A, B; output Y
A B Y=A.B
0 0 0
0 1 0
1 0 0
1 1 1
OR gate using neural networks
• Steps in Perceptron Algorithm: the same five steps as for the AND gate.
Truth Table (OR): inputs A, B; output Y
A B Y=A+B
0 0 0
0 1 1
1 0 1
1 1 1
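The perceptron steps listed for the two gates can be sketched in Python. This is a minimal illustration assuming a step activation that fires when the weighted sum is positive, zero initial weights, a learning rate of 0.5, and 20 passes over the data; all of these are illustrative choices rather than values from the slides.

```python
def step(z):
    """Threshold activation: output 1 when the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(weights, bias, inputs):
    """Forward propagate: weighted sum of the inputs plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, eta=0.5, epochs=20):
    """Train on (inputs, target) pairs with the perceptron update rule."""
    weights, bias = [0.0, 0.0], 0.0           # initialise weights and bias
    for _ in range(epochs):                   # repeat for all training examples
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)    # check the error
            # adjust weights and bias in proportion to the error
            weights = [w + eta * error * x for w, x in zip(weights, inputs)]
            bias += eta * error
    return weights, bias

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

and_w, and_b = train_perceptron(AND)
or_w, or_b = train_perceptron(OR)
```

Both gates are linearly separable, so the updates stop changing after a few passes and the learned weights reproduce each truth table.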
Linearly Separable Data
• Definition: Data that can be perfectly separated by a straight line (a hyperplane in
higher dimensions).
• Algorithm: Perceptron adjusts weights and bias to minimize classification error.
• Outcome: Converges to a decision boundary that separates the classes, though not
necessarily the best possible one.
Non-Linearly Separable Data
• Definition: Data that cannot be perfectly separated by a single straight line.
• Algorithm Limitation: Perceptron may not converge, continuously adjusting
weights and bias without finding a perfect solution.
Key Point
• The Perceptron rule is effective for linearly separable data.
• For non-linearly separable data, more advanced methods like the delta rule
(gradient descent) or multi-layer neural networks are required.
Gradient Descent Learning Algorithm
• It is the most used algorithm to train neural networks.
• An optimization algorithm to find model parameters (coefficients, weights)
that minimize the error on the training dataset.
• It achieves this by making changes to the model that move it along a
gradient/slope of errors towards a minimum error value.
• Working:
• Adjusts model parameters iteratively to minimize the error.
• Gradient descent can vary in terms of the training patterns used to calculate
errors and update the model.
The main practical challenges in applying gradient descent are:
• Converging to a local minimum can be quite slow, often requiring many
thousands of gradient descent steps.
• With multiple local minima in the error surface, there is no guarantee that the
procedure will find the global minimum.
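A minimal sketch of gradient descent, assuming a one-feature linear model y = w * x + b trained on the mean squared error with full-batch updates. The data, learning rate, and step count are hypothetical values chosen for illustration.

```python
def fit_line(xs, ys, eta=0.05, steps=5000):
    """Batch gradient descent on the mean squared error of y = w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # prediction error on every training example
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # gradient of the mean squared error with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # move the parameters a small step down the slope
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1
w, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

This error surface has a single minimum, so the parameters settle near w = 2 and b = 1. The local-minimum problem described above arises only for non-convex error surfaces, such as those of multi-layer networks.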
Types of Gradient Descent Algorithms
• Key Concepts:
• Error: The difference between the target and the actual output value.
• Also Known As: LMS (Least Mean Square) Learning Rule.
• Characteristics:
• Used in supervised training models.
• Independent of the activation function.
• Update rule for a single layer perceptron.
Steps in Delta Learning Rule
• Initialize weights with random values.
• Apply perceptron to each training sample 𝑖:
• If sample 𝑖 is misclassified, modify all weights:
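• $w_j \leftarrow w_j + \eta\,(t_i - y_i)\,x_{ij}$, where $\eta$ is the learning rate, $t_i$ the target,
$y_i$ the output, and $x_{ij}$ the $j$-th input of sample $i$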
• Continue until all samples are correctly classified or the process
converges.
Back Propagation
• Definition:
• A standard method for training artificial neural networks.
• Repeatedly adjusts the weights of the connections in the network to
minimize the difference between actual output and desired output.
• Steps in Back Propagation Algorithm:
• Inputs 𝑋 arrive through the preconnected path.
• The inputs are modelled using weights 𝑊, which are usually assigned randomly.
• Calculate the output for each neuron from the input layer, through the
hidden layers, to the output layer.
• Calculate the error in the outputs: Error = Actual Output − Desired Output
• Travel back from the output layer to the hidden layer to adjust the weights
such that the error is decreased.
• Repeat the process until the desired output is achieved.
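The steps above can be sketched for a very small network. This is an illustrative example under stated assumptions: one hidden layer of two sigmoid neurons, a sigmoid output, the XOR truth table as training data, full-batch updates, and a hypothetical learning rate, epoch count, and random seed. It is not a tuned implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: return hidden activations h and network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def mean_squared_error(data, W1, b1, W2, b2):
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data) / len(data)

def init_network(n_in=2, hidden=2, seed=0):
    """Randomly assigned starting weights (sizes and seed are illustrative)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    return W1, [0.0] * hidden, W2, 0.0

def train(data, params, eta=0.5, epochs=2000):
    W1, b1, W2, b2 = params
    n = len(data)
    for _ in range(epochs):
        gW1 = [[0.0] * len(row) for row in W1]
        gb1 = [0.0] * len(b1)
        gW2 = [0.0] * len(W2)
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # output error (actual - desired) times the sigmoid slope
            delta_out = (y - t) * y * (1 - y)
            gb2 += delta_out / n
            for j, hj in enumerate(h):
                # share of the error sent back to hidden neuron j
                delta_hid = delta_out * W2[j] * hj * (1 - hj)
                gW2[j] += delta_out * hj / n
                gb1[j] += delta_hid / n
                for i, xi in enumerate(x):
                    gW1[j][i] += delta_hid * xi / n
        # move every weight a small step against its gradient
        W1 = [[w - eta * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
        b1 = [b - eta * g for b, g in zip(b1, gb1)]
        W2 = [w - eta * g for w, g in zip(W2, gW2)]
        b2 -= eta * gb2
    return W1, b1, W2, b2

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params = init_network()
loss_before = mean_squared_error(XOR, *params)
params = train(XOR, params)
loss_after = mean_squared_error(XOR, *params)
```

The sketch only demonstrates that repeated backward passes reduce the error. Whether such a small network fully learns XOR depends on the starting weights and the number of epochs.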
Advantages of Back Propagation
• Speed: Fast, simple, and easy to program.
• Flexibility: Does not require prior knowledge about the network.
• Utility: Mainly useful for deep neural networks working on error-prone
projects such as image or speech recognition.
Self-Organizing Maps (SOM) / Kohonen Self-Organizing Maps
• Developed by T. Kohonen in 1982. Named "self-organizing" because no
supervision is required; they learn through unsupervised competitive learning.
• A kind of neural network that uses a competitive learning algorithm to map
multidimensional data onto lower dimensions for easier interpretation, especially
for complex problems. It contains only two layers:
• Input layer (𝑛 units)
• Output layer (𝑚 units)
Deep Learning Overview
• Deep learning is a subset of machine learning that involves neural networks with many
layers (hence "deep"). These models are capable of learning from large amounts of
data and can perform complex tasks by automatically learning features and
representations.
Historical Context
• 1940s-1950s: Early neural network models like the Perceptron were developed.
• 1980s: The backpropagation algorithm was popularized, enabling the training of
multi-layer neural networks.
• 2000s: Deep learning gained popularity due to increased computational power
(GPUs), large datasets, and algorithmic advancements.
• 2012: Breakthrough in image recognition with AlexNet winning the ImageNet
competition, showcasing the power of deep neural networks.
Current Use
• Deep learning is widely used across various domains:
• Computer Vision: Image classification, object detection, facial recognition (e.g., self-driving
cars use deep learning for environment perception).
• Natural Language Processing (NLP): Language translation, sentiment analysis, chatbots (e.g.,
Google Translate, GPT-3).
• Speech Recognition: Transcribing spoken words into text (e.g., virtual assistants like Siri,
Alexa).
• Healthcare: Medical image analysis, drug discovery (e.g., detecting anomalies in X-rays,
predicting patient outcomes).
• Finance: Fraud detection, algorithmic trading (e.g., identifying fraudulent transactions, stock
price prediction).
Advantages
• High Performance: Exceptional accuracy and performance on complex tasks.
• Automatic Feature Extraction: Automatically learns relevant features from raw data.
• Versatility: Applicable to a wide range of problems (vision, speech, text, etc.).
Disadvantages
• Data-Hungry: Requires large amounts of labeled data for training.
• Computationally Intensive: Demands significant computational resources (GPUs, TPUs).
• Black Box: Lack of interpretability; difficult to understand how decisions are made.
• Overfitting: Prone to overfitting if not properly regularized or if insufficient data is available.
Convolutional Neural Networks (CNNs)
• Convolutional Neural Networks (CNNs) are a class of deep learning models designed
specifically for processing data with a grid-like topology, such as images. They are composed of
multiple layers that automatically and adaptively learn spatial hierarchies of features from
input images.
Convolutional Layers
• Function: Extract features from the input image by applying convolution operations using
filters (kernels).
• Mechanism:
• Filters/Kernels: Small matrices (e.g., 3x3, 5x5) that slide over the input image.
• Convolution Operation: Each filter convolves (slides) over the input image, computing the
dot product between the filter and the input patch. This generates a feature map.
• Stride: The step size by which the filter moves. Larger strides result in
smaller feature maps.
• Padding: Adding zeros around the input to control the output size.
'Same' padding keeps the output size the same as the input (at stride 1),
while 'valid' padding adds no zeros, so the output is smaller than the input.
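A minimal pure-Python sketch of the sliding-window operation with stride and zero padding. It assumes a square kernel and, like most deep learning libraries, does not flip the kernel; the image and kernels are hypothetical. For input size N, kernel size K, padding P, and stride S, the output size along each dimension is floor((N - K + 2P) / S) + 1.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` (lists of lists) and return the feature map."""
    if padding:
        width = len(image[0]) + 2 * padding
        zero_row = [0] * width
        image = (
            [zero_row[:] for _ in range(padding)]
            + [[0] * padding + list(row) + [0] * padding for row in image]
            + [zero_row[:] for _ in range(padding)]
        )
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [
        [
            # dot product between the kernel and the patch under it
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k) for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

image = [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]]
diag = [[1, 0], [0, 1]]
valid = conv2d(image, diag)                      # 4x4 input, 2x2 kernel -> 3x3 map
strided = conv2d(image, diag, stride=2)          # larger stride -> 2x2 map
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
same = conv2d(image, identity, padding=1)        # 3x3 kernel with padding 1 -> 4x4 map
```

The three calls show how a larger stride shrinks the feature map and how one ring of zeros keeps a 3x3 kernel's output the same size as the input.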
Pooling Layers
• Function: Reduce the spatial dimensions of the feature maps, keeping the most important
information.
• Types:
• Max Pooling: Takes the maximum value from each patch of the feature map.
• Average Pooling: Takes the average value from each patch of the feature map.
• Mechanism:
• A pooling layer slides a window (e.g., 2x2) over the input feature map and downsamples it
by taking the maximum or average value within the window.
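Pooling can be sketched in a few lines of Python. This illustration assumes non-overlapping windows (stride equal to the window size), and the feature map values are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample by taking the max or average of each size x size window."""
    combine = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            combine([feature_map[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

fmap = [[1, 3, 2, 4], [5, 6, 1, 2], [7, 2, 9, 0], [1, 8, 3, 4]]
max_pooled = pool2d(fmap, size=2, mode="max")
avg_pooled = pool2d(fmap, size=2, mode="avg")
```

A 4x4 map becomes 2x2 in both cases: max pooling keeps the strongest response in each window, while average pooling smooths it.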
Fully Connected Layers
• Function: Perform classification based on the features extracted by the convolutional and
pooling layers.
• Mechanism:
• Flatten the output from the last convolutional or pooling layer into a one-dimensional
vector.
• Connect each neuron in the fully connected layer to every neuron in the previous layer.
Dropout Layers
• Function: Prevent overfitting by randomly setting a fraction of input units to
zero at each update during training.
• Mechanism:
• During training, randomly drop neurons along with their connections.
• During testing, use all neurons but scale the output to account for dropped
units during training.
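The two phases can be sketched as below. This minimal illustration follows the scheme described here (drop during training, scale by the keep probability at test time); the function names and values are hypothetical. Many libraries instead use "inverted" dropout, which scales the surviving units during training and leaves test-time outputs untouched.

```python
import random

def dropout_train(activations, rate, rng):
    """Training: zero each unit independently with probability `rate`."""
    return [0.0 if rng.random() < rate else a for a in activations]

def dropout_test(activations, rate):
    """Testing: keep every unit but scale by the keep probability (1 - rate)."""
    return [a * (1.0 - rate) for a in activations]

rng = random.Random(0)
layer_output = [2.0, 4.0, 6.0, 8.0]
train_out = dropout_train(layer_output, rate=0.5, rng=rng)   # some units set to zero
test_out = dropout_test(layer_output, rate=0.5)              # all units, scaled down
```

Scaling at test time keeps the expected input to the next layer the same as it was, on average, during training.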
(UNIT-5: REINFORCEMENT LEARNING)
• Introduction to Reinforcement Learning, Learning Task,
Example of Reinforcement Learning in Practice, Learning
Models for Reinforcement - (Markov Decision process, Q
Learning - Q Learning function, Q Learning Algorithm),
Application of Reinforcement Learning, Introduction to Deep Q
Learning. GENETIC ALGORITHMS: Introduction, Components,
GA cycle of reproduction, Crossover, Mutation, Genetic
Programming, Models of Evolution and Learning, Applications.
Reinforcement Learning
• Reinforcement Learning (RL) is a feedback-based machine learning approach. An agent learns
which actions to perform by observing the environment and the results of its actions.
• Feedback:
• Correct actions receive positive feedback.
• Incorrect actions receive negative feedback or penalties.
• Key Components:
• Agent: Learns and makes decisions.
• Environment: The external system the agent interacts with.
• Action: What the agent does.
• State: The current situation of the agent.
• Reward: Feedback from the environment.
• Process:
• The agent interacts with the environment.
• It identifies possible actions.
• Performs actions based on observations.
• Receives rewards or penalties.
• Goals and Learning:
• Primary Goal: Perform actions that maximize positive rewards.
• Learning Method:
• The agent learns automatically from feedback without labelled data.
• Unlike supervised learning, RL relies on experience.
• Application:
• Problem Solving: Suitable for problems requiring sequential decision-making
like game-playing, robotics, etc.
• Learning Types:
• Positive Reinforcement: Encourages behaviour by providing positive rewards.
• Negative Reinforcement: Discourages behaviour by providing negative
rewards.
• Examples:
• Maze Game: Avoiding danger spots to minimize loss.
• Grid Games: Finding the shortest path or avoiding blocks and danger.
• Limitations:
• Inapplicability: Not suitable for environments with complete information. For example,
object detection and face recognition are better solved using classifiers rather than RL.
Criteria | Supervised Learning | Reinforcement Learning | Unsupervised Learning
Mapping | Present | Present | Not present
Feedback | Instantaneous feedback once the model is created | Constant feedback from environment | No feedback from environment
Supervisor | Presence of supervisor and labeled data | No supervisor and labeled dataset is not available | No supervisor
Decisions | Independent and based on training input | Dependent and made sequentially | Independent
Examples | Classifiers (e.g., image classification) | Chess, Go Games, Robotics | Clustering (e.g., customer segmentation)
Characteristics of Reinforcement Learning
• Sequential Decision Making:
• Decisions are made in sequence, each action affecting the next.
Example: In a maze game, the path from start to goal involves a
series of decisions.
• Actions are Interdependent:
• Each action's outcome affects subsequent actions. Example: The
agent's next move depends on the reward received from the
previous action.
• Delayed Feedback:
• Rewards or penalties are not immediate and can take time to be
realized. Example: A robot may need several steps to receive
feedback on whether it reached a target or avoided an obstacle.
• Time-Related:
• All actions are associated with time stamps and are ordered in
sequence. Example: In a grid game, actions are performed in a
specific order over time.
• Time-Consuming Operations:
• The learning process can be time-consuming due to the complexity
and number of possible actions and states. Example: Training a
model to play a game requires extensive computation due to the
large state space.
Challenges of Reinforcement Learning
• Reward Design:
• Designing appropriate rewards and determining their value can be
complex. Example: In many games, setting suitable rewards is a
significant challenge.
• Absence of a Model:
• Some environments lack a fixed structure or rules, requiring
simulations to gather experience. Example: Unlike chess, many
games need simulations because they have no underlying model.
• Partial Observability of States:
• Not all states are fully observable, leading to uncertainty. Example: In
weather forecasting, complete information about the state is often
unavailable.
• Complexity:
• Large board configurations and numerous possible actions increase
complexity. Example: Games like Go have vast possibilities, making
labeled data unavailable and increasing algorithmic complexity.
• Time-Consuming Operations:
• The need to explore extensive state spaces and possible actions
increases the time required for learning. Example: Complex
scenarios result in longer computational times, making the training
process slow.
Applications of Reinforcement Learning
• Industrial Automation:
• Used to automate and optimize manufacturing processes.
• Resource Management Applications:
• Allocates resources efficiently in various domains.
• Traffic Light Control:
• Reduces congestion by optimizing traffic light timings.
• Personalized Recommendation Systems:
• Provides personalized content recommendations, such as news articles.
• Bidding for Advertisement:
• Optimizes bidding strategies in digital advertising.
• Driverless Cars:
• Used in autonomous vehicle navigation and decision-making.
Markov Decision Process (MDP)
• A Markov Decision Process (MDP) is a mathematical framework used to describe an
environment in reinforcement learning. It provides a formal model for decision-making in
situations where outcomes are partly random and partly under the control of a decision-maker
(agent).
• Characteristics of MDP
• Markov Property:
• The future state depends only on the current state and action, not on the sequence of
events that preceded it.
• This is known as the "memoryless" property.
• Policy (π):
• A strategy or rule that defines the action to be taken in each state.
• Example: A policy that always chooses the action that leads to the highest reward.
Q-Learning
• Q-learning is a model-free reinforcement learning algorithm used to find the
optimal action-selection policy for any given finite Markov decision process
(MDP). It is a value-based method, meaning it focuses on finding the optimal
value of the action-state pairs.
R 0 1 2 3 4 5
0 -1 -1 -1 -1 0 -1
1 -1 -1 -1 0 -1 100
2 -1 -1 -1 0 -1 -1
3 -1 0 0 -1 0 -1
4 0 -1 -1 0 -1 100
5 -1 0 -1 -1 0 100
• Let the discount factor γ = 0.8 and the initial state be room 1. The Q matrix starts as all zeros.
• Update rule: Q(state, action) = R(state, action) + γ × max[Q(next state, all actions)]
R 0 1 2 3 4 5 Q 0 1 2 3 4 5
0 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 0
1 -1 -1 -1 0 -1 100 1 0 0 0 0 0 0
2 -1 -1 -1 0 -1 -1 2 0 0 0 0 0 0
3 -1 0 0 -1 0 -1 3 0 0 0 0 0 0
4 0 -1 -1 0 -1 100 4 0 0 0 0 0 0
5 -1 0 -1 -1 0 100 5 0 0 0 0 0 0
Consider the second row (state 1) of matrix 𝑅:
• There are two possible actions for the current state 1:
• Go to state 3 (Action 3)
• Go to state 5 (Action 5)
• By random selection, we choose to go to state 5 as our action.
• Q(1, 5) = R(1, 5) + 0.8 × max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 × 0 = 100
R 0 1 2 3 4 5 Q 0 1 2 3 4 5
0 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 0
1 -1 -1 -1 0 -1 100 1 0 0 0 0 0 100
2 -1 -1 -1 0 -1 -1 2 0 0 0 0 0 0
3 -1 0 0 -1 0 -1 3 0 0 0 0 0 0
4 0 -1 -1 0 -1 100 4 0 0 0 0 0 0
5 -1 0 -1 -1 0 100 5 0 0 0 0 0 0
Now let's consider the next episode. Suppose the initial state is 3.
We can go to 1, 2, or 4; suppose we choose 1.
• Q(3, 1) = R(3, 1) + 0.8 × max[Q(1, 3), Q(1, 5)] = 0 + 0.8 × 100 = 80
R 0 1 2 3 4 5 Q 0 1 2 3 4 5
0 -1 -1 -1 -1 0 -1 0 0 0 0 0 0 0
1 -1 -1 -1 0 -1 100 1 0 0 0 0 0 100
2 -1 -1 -1 0 -1 -1 2 0 0 0 0 0 0
3 -1 0 0 -1 0 -1 3 0 80 0 0 0 0
4 0 -1 -1 0 -1 100 4 0 0 0 0 0 0
5 -1 0 -1 -1 0 100 5 0 0 0 0 0 0
After repeating these episodes many times, the Q matrix converges. Normalized so that the
largest entry is 100, it becomes:
R 0 1 2 3 4 5 Q 0 1 2 3 4 5
0 -1 -1 -1 -1 0 -1 0 0 0 0 0 80 0
1 -1 -1 -1 0 -1 100 1 0 0 0 64 0 100
2 -1 -1 -1 0 -1 -1 2 0 0 0 64 0 0
3 -1 0 0 -1 0 -1 3 0 80 51 0 80 0
4 0 -1 -1 0 -1 100 4 64 0 0 64 0 100
5 -1 0 -1 -1 0 100 5 0 80 0 0 80 100
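The walkthrough can be reproduced with a short Python sketch. It assumes the value 0.8 acts as the discount factor in the update Q(s, a) = R(s, a) + 0.8 × max Q(next state, ·), which is what yields the 100 and 80 entries in the tables. For simplicity it sweeps over every valid state-action pair instead of sampling random episodes; that is an illustrative shortcut, not the procedure on the slides.

```python
R = [
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, 0, -1, 100],
    [-1, -1, -1, 0, -1, -1],
    [-1, 0, 0, -1, 0, -1],
    [0, -1, -1, 0, -1, 100],
    [-1, 0, -1, -1, 0, 100],
]
GAMMA = 0.8   # assumed to be the discount factor

def q_update(Q, state, action):
    """Q(s, a) = R(s, a) + gamma * max over a' of Q(next state, a')."""
    next_state = action                  # in this example, the action is the room moved to
    Q[state][action] = R[state][action] + GAMMA * max(Q[next_state])

Q = [[0.0] * 6 for _ in range(6)]
q_update(Q, 1, 5)                        # first episode: room 1 -> room 5
first = Q[1][5]
q_update(Q, 3, 1)                        # second episode: room 3 -> room 1
second = Q[3][1]

# Sweep all valid moves (R >= 0) many times until the values settle
for _ in range(200):
    for s in range(6):
        for a in range(6):
            if R[s][a] >= 0:
                q_update(Q, s, a)

# Express every entry as a percentage of the largest one
largest = max(max(row) for row in Q)
normalized = [[round(100 * q / largest) for q in row] for row in Q]
```

The raw values settle with 500 as the largest entry; dividing by it and rounding gives the final matrix shown above. From any room, following the largest Q value in each row leads to room 5.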
Deep Q-Learning
• Deep Q-Learning (DQL) is an extension of Q-Learning that uses deep
neural networks to approximate the Q-values. This method is particularly
useful for handling high-dimensional state spaces, where traditional Q-
Learning becomes impractical due to the exponential growth in the
number of state-action pairs.
Benefits:
• Can handle high-dimensional state spaces.
• Generalizes across similar states, reducing the need for exhaustive state-
action pair enumeration.
• Experience replay improves sample efficiency and stability.
Challenges:
• Training can be unstable due to the non-stationary target values.
• Requires significant computational resources.
• Hyperparameter tuning (e.g., learning rate, batch size, replay buffer size)
can be complex.
GENETIC ALGORITHMS: Introduction
• Evolutionary Algorithms (EAs) are optimization algorithms inspired by the
principles of natural selection and genetics. They are used to find approximate
solutions to complex problems. Types of Evolutionary Algorithms:
• Uniform Crossover
• Process: Each gene is treated separately, and we decide for each gene whether to swap it
based on a predefined probability or using a binary mask. With a mask, a 1 means Child 1
takes the gene from Parent 1 (and Child 2 from Parent 2); a 0 means the reverse.
• Example:
• Parent 1: 00101100
• Parent 2: 11010011
• Binary Mask: 11010110
• Child 1: 00000101 (biased to Parent 1, since the mask has five 1s)
• Child 2: 11111010
• Half-Uniform Crossover
• Method:
• Calculate the Hamming distance (number of different bits) between Parent 1 and Parent 2.
• Swap exactly half of the differing bits between the two offspring (rounding down when the
distance is odd); bits that are the same in both parents are left unchanged.
• Example:
• Parent 1: 11100010
• Parent 2: 01011011
• Hamming Distance = 5 (positions 1, 3, 4, 5, and 8 differ), so 2 bits are swapped, here positions 1 and 5.
• Offspring 1: 01101010
• Offspring 2: 11010011
• Three-Parent Crossover
• Method:
• Three parents are taken.
• Each bit of Parent 1 is compared with each bit of Parent 2.
• If the bit is the same in both parents, it is taken in the offspring. If different, the bit from Parent 3 is
taken for the offspring.
• Example:
• Parent 1: A B C A D B E A
• Parent 2: B D E A C D B E
• Parent 3: D A B C D B E A
• Offspring: D A B A D B E A
• Shuffle Crossover
• Method:
• A single crossover point is selected, dividing the chromosome into two
parts.
• Shuffle bits (genes) in each parent using any logic.
• After shuffling, mix modified parents as in the single crossover method.
• Example:
• Parent 1: 1 2 3 4 | 5 6 7 8
• Parent 2: 8 1 7 6 | 2 5 3 4
• Shuffled Parent 1: 3 5 1 4 | 2 7 6 8
• Shuffled Parent 2: 5 7 8 6 | 1 2 3 4
• Offspring 1: 3 5 1 4 1 2 3 4
• Offspring 2: 5 7 8 6 2 7 6 8
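Three of the crossover operators above can be sketched in Python. The conventions are assumptions made for illustration: in the uniform case a mask bit of 1 means Child 1 copies Parent 1 and a 0 means it copies Parent 2, and single-point crossover exchanges the tails after the cut. Function names are hypothetical.

```python
def uniform_crossover(p1, p2, mask):
    """Mask bit '1': child 1 copies parent 1; '0': child 1 copies parent 2."""
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

def three_parent_crossover(p1, p2, p3):
    """Keep a gene where parents 1 and 2 agree; otherwise take it from parent 3."""
    return [a if a == b else c for a, b, c in zip(p1, p2, p3)]

def single_point_crossover(p1, p2, point):
    """Exchange everything after `point` between the two parents."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

children = uniform_crossover("00101100", "11010011", "11010110")
offspring = three_parent_crossover("ABCADBEA", "BDEACDBE", "DABCDBEA")
mixed = single_point_crossover([3, 5, 1, 4, 2, 7, 6, 8], [5, 7, 8, 6, 1, 2, 3, 4], 4)
```

The last call applies the single-point step of shuffle crossover to parents that have already been shuffled.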
Mutation
• Objective: Encourage genetic diversity among solutions and attempt to provide
genetic algorithms with solutions that are slightly different from the previous
generation. Mutation alters one or more gene values in a chromosome from its
initial state. With the new gene values, the genetic algorithm may be able to
arrive at better solutions.
• Convergence:
• The progression toward increasing uniformity. A population is said to have
converged when 95% of the individuals (solutions) constituting the
population have the same fitness value.
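Mutation and the convergence check can be sketched as follows. This illustration assumes bit-string chromosomes with independent bit-flip mutation and uses the 95% rule described above; the function names, rates, and fitness values are hypothetical.

```python
import random
from collections import Counter

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit of a '0'/'1' string independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

def has_converged(fitness_values, threshold=0.95):
    """True when at least `threshold` of the population shares one fitness value."""
    most_common_count = Counter(fitness_values).most_common(1)[0][1]
    return most_common_count / len(fitness_values) >= threshold

rng = random.Random(0)
mutant = bit_flip_mutation("00101100", rate=0.1, rng=rng)   # a few bits may change
uniform_population = has_converged([7] * 20)                # every fitness identical
diverse_population = has_converged([7] * 18 + [3, 4])       # only 90% identical
```

A low mutation rate changes only an occasional gene, which is usually enough to keep diversity in the population without destroying good solutions.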
• Example Workflow:
• Initialization: Generate an initial population of random programs.
• Evaluation: Assess the fitness of each program using the fitness function.
• Selection: Select the fittest programs to act as parents for the next generation.
• Crossover: Combine parts of two parent programs to create offspring programs.
• Mutation: Apply random changes to some programs to introduce variability.
• Iteration: Repeat the evaluation, selection, crossover, and mutation steps for many
generations.
• Example Application:
• Symbolic Regression: GP can be used to find a mathematical expression that
best fits a set of data points. For instance, given a dataset of input-output
pairs, GP can evolve an equation that models the relationship between inputs
and outputs.