Machine Learning Complete Notes
BATCH: 2022-25
Unit-I
Introduction
What is Machine Learning, Unsupervised Learning, Reinforcement Learning, Machine
Learning Use-Cases, Machine Learning Process Flow, Machine Learning Categories,
Linear Regression and Gradient Descent.
Unit-II
Supervised Learning
Classification and its use cases, Decision Tree, Algorithm for Decision Tree Induction,
Creating a Perfect Decision Tree, Confusion Matrix, Random Forest, What is Naïve
Bayes, How Naïve Bayes works, Implementing Naïve Bayes Classifier, Support Vector
Machine, Illustration of how Support Vector Machine works, Hyperparameter
Optimization, Grid Search vs Random Search, Implementation of Support Vector
Machine for Classification.
Unit-III
Clustering
What is Clustering & its Use Cases, K-means Clustering, How does the K-means algorithm
work, C-means Clustering, Hierarchical Clustering, How Hierarchical Clustering
works.
Unit-IV
UNIT-1
1. Supervised Learning: In this type of learning, the algorithm is trained on labeled data to learn the relationship
between input data and the corresponding output.
Example: Image classification, where the algorithm is trained on labeled images to learn the features of different
objects.
2. Unsupervised Learning: In this type of learning, the algorithm is trained on unlabeled data to discover patterns,
relationships, or groupings.
Example: Customer segmentation, where the algorithm groups customers based on their buying behavior and
demographics.
3. Reinforcement Learning: In this type of learning, the algorithm learns by interacting with an environment and
receiving rewards or penalties for its actions.
Example: Game playing, where the algorithm learns to play a game by trial and error and receiving rewards for
winning.
Reinforcement Learning (RL) is a type of Machine Learning where an agent learns to make decisions
by interacting with an environment.
# Key Components:
# How RL Works:
# Goal:
The ultimate goal of RL is to learn an optimal policy that maximizes the cumulative reward over time.
# Simple Analogy:
Imagine a child learning to ride a bike. The child takes actions (pedaling, steering), receives rewards (staying upright,
reaching a destination), and learns to improve their policy (balancing, turning) through trial and error.
5. Computer Vision
a. Image Classification: Classify images into different categories, such as objects, scenes, and
actions.
b. Object Detection: Detect and locate objects within images, such as pedestrians, cars, and
buildings.
c. Facial Recognition: Identify individuals based on their facial features.
Natural Language Processing
a. Sentiment Analysis: Analyze text data to determine the sentiment or emotional tone behind it.
b. Text Classification: Classify text into different categories, such as spam vs. non-spam emails.
c. Language Translation: Translate text from one language to another.
d. Chatbots: Build conversational interfaces that can understand and respond to user queries.
Healthcare
1. Disease Diagnosis: Analyze medical data to diagnose diseases, such as cancer or diabetes.
2. Personalized Medicine: Tailor medical treatment to individual patients based on their genetic profiles and
medical histories.
3. Medical Imaging Analysis: Analyze medical images, such as X-rays and MRIs, to diagnose diseases.
Finance
Customer Service
1. Customer Segmentation: Segment customers based on their behavior, demographics, and preferences.
2. Recommendation Systems: Recommend products or services to customers based on their past purchases and
behavior.
3. Chatbots: Build conversational interfaces that can understand and respond to customer queries.
Marketing
1. Targeted Advertising: Target advertisements to specific customer segments based on their behavior,
demographics, and preferences.
2. Marketing Automation: Automate marketing tasks, such as email marketing and lead scoring.
3. Predictive Analytics: Analyze customer data to predict future behavior and preferences.
Education
1. Personalized Learning: Tailor educational content to individual students based on their learning styles and
abilities.
2. Intelligent Tutoring Systems: Build systems that can provide one-on-one support to students.
3. Automated Grading: Automate the grading process to reduce teacher workload and improve accuracy.
Transportation
1. Predictive Maintenance: Predict when vehicles are likely to require maintenance, allowing for proactive
scheduling.
2. Route Optimization: Optimize routes to reduce fuel consumption and lower emissions.
3. Autonomous Vehicles: Build self-driving cars that can navigate roads safely and efficiently.
Energy
1. Energy Demand Forecasting: Predict energy demand to optimize energy production and reduce waste.
2. Energy Efficiency Optimization: Optimize energy efficiency in buildings and homes.
3. Renewable Energy Integration: Integrate renewable energy sources, such as solar and wind power, into the
grid.
Cybersecurity
Unsupervised Learning
Unsupervised Learning is a type of Machine Learning where the algorithm is trained on unlabeled data.
The goal is to discover patterns, relationships, or groupings in the data without any prior knowledge of
the expected output.
# Supervised Learning
Supervised Learning is a type of Machine Learning where the algorithm is trained on labeled data. The
goal is to learn a mapping between input data and the corresponding output labels.
Linear Regression
Linear Regression is a supervised learning algorithm that predicts a continuous output variable based on one or more
input features. The goal is to learn a linear relationship between the input features and the output variable.
y = β0 + β1x + ε
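where y is the output (target) variable, x is the input feature, β0 is the intercept, β1 is the slope (the change in y
for a one-unit change in x), and ε is the error term.
For example, we can use linear regression to learn a linear relationship between the size of a house (x) and its price (y).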
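For a single input feature, β0 and β1 can also be estimated directly with the ordinary least squares formulas. The sketch below is a minimal plain-Python illustration; the house sizes and prices are made-up numbers, and `fit_simple_linear_regression` is a hypothetical helper name, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (closed-form solution)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data: house size (hundreds of sq. ft) and price (arbitrary units)
sizes = [10, 15, 20, 25, 30]
prices = [30, 42, 55, 66, 80]
b0, b1 = fit_simple_linear_regression(sizes, prices)
print(f"price = {b0:.2f} + {b1:.2f} * size")  # price = 5.00 + 2.48 * size
```

Here β1 = 620 / 250 = 2.48, so each extra hundred square feet adds about 2.48 price units in this toy data.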
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the cost function in linear regression. The goal is to
find the values of the model parameters (β0 and β1) that minimize the sum of the squared errors.
βj = βj - α * (dJ/dβj)
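where α is the learning rate (step size) and dJ/dβj is the partial derivative of the cost function J with respect to βj.
We start by initializing the model parameters, for example to β0 = 0 and β1 = 0 (small random values are also common).
We then compute the cost function (J) using the current values of the model parameters.
Next, we compute the partial derivatives of the cost function with respect to each model parameter.
We then update the model parameters using the gradient descent equation, and repeat these steps until the cost stops
decreasing (convergence) or a fixed number of iterations is reached.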
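A minimal sketch of these steps in plain Python, assuming a mean-squared-error cost J = (1/n) Σ (β0 + β1x − y)². Using the mean rather than the plain sum of squared errors only rescales the gradient. The learning rate α = 0.01, the 5000 iterations and the toy data are illustrative assumptions, and `gradient_descent` is a hypothetical helper name.

```python
def gradient_descent(xs, ys, alpha=0.01, iterations=5000):
    """Fit y = b0 + b1 * x by minimising the mean squared error
    J = (1/n) * sum((b0 + b1 * x - y) ** 2) with batch gradient descent."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # initial parameter values
    for _ in range(iterations):
        # Prediction errors for the current parameters
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives dJ/db0 and dJ/db1
        d_b0 = (2 / n) * sum(errors)
        d_b1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        # Update rule: bj = bj - alpha * dJ/dbj (both parameters updated together)
        b0 = b0 - alpha * d_b0
        b1 = b1 - alpha * d_b1
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # points on the line y = 1 + 2x
b0, b1 = gradient_descent(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 1.000, b1 = 2.000
```

If α is chosen too large, the updates overshoot and the cost grows instead of shrinking; too small an α makes convergence slow.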
UNIT-II
Supervised Learning
1. Data Collection: Gather labeled data, where each example is paired with its corresponding target
variable.
2. Data Preprocessing: Clean, transform, and prepare the data for training.
3. Model Training: Train a machine learning algorithm on the labeled data to learn the relationship
between the input features and the target variable.
4. Model Evaluation: Evaluate the trained model on a separate test dataset to estimate its performance.
1. Image Classification: Classify images into categories, such as objects, scenes, or actions.
- Example: Self-driving cars use image classification to recognize pedestrians, traffic lights, and road signs.
2. Recommendation Systems: Suggest products or services based on user behavior and preferences.
- Example: Online services like Amazon and Netflix use recommendation systems to suggest products and movies.
3. Medical Diagnosis: Predict medical outcomes or diagnose diseases based on patient data.
- Example: Machine learning models can help doctors diagnose diseases like cancer and diabetes.
4. Credit Risk Assessment: Predict the likelihood of loan repayment based on credit history and other factors.
- Example: Banks and financial institutions use credit risk assessment models to evaluate loan applications.
Types of Classification
1. Binary Classification: This type of classification involves predicting one of two classes or labels. For example, spam
vs. not spam emails, cancer vs. not cancer diagnosis, or creditworthy vs. not creditworthy customers.
Example: A bank wants to predict whether a customer is creditworthy or not based on their credit score, income, and
other factors.
2. Multi-Class Classification: This type of classification involves predicting one of multiple classes or labels. For
example, classifying images into different categories such as animals, vehicles, buildings, etc.
Example: A self-driving car needs to classify objects on the road into different categories such as pedestrians, cars,
trees, etc.
3. Multi-Label Classification: This type of classification involves predicting multiple labels or classes for a single
instance. For example, classifying a movie into multiple genres such as action, comedy, romance, etc.
Example: A music streaming service wants to classify a song into multiple genres such as rock, pop, hip-hop, etc.
4. Hierarchical Classification: This type of classification involves predicting a label or class that is part of a
hierarchical structure. For example, classifying a product into a category and subcategory.
Example: An e-commerce website wants to classify a product into a category (e.g., electronics) and subcategory (e.g.,
smartphones).
5. Imbalanced Classification: This type of classification involves dealing with datasets where one class has a
significantly larger number of instances than the others.
Example: A bank wants to predict whether a customer is likely to default on a loan or not. However, the dataset has a
large number of customers who are not likely to default, making it an imbalanced classification problem.
Use Cases
1. Image Classification: Classifying images into different categories (e.g., objects, scenes, actions).
- Example: Facebook's image classification algorithm can identify objects, people, and scenes in images.
# Confusion Matrix
A confusion matrix is a table that is used to evaluate the performance of a classification model. It provides a summary
of the predictions made by the model against the actual outcomes.
From the counts in a confusion matrix (true positives, false positives, true negatives, and false negatives), we can
calculate various metrics such as accuracy, precision, recall, F1-score, false positive rate, and false negative rate.
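A minimal sketch for the binary case, assuming labels are 0/1 with 1 as the positive class. The two label lists are made-up data and the helper names are hypothetical. It counts the four cells of the confusion matrix and derives the metrics listed above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def classification_metrics(tp, fp, tn, fn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "false_positive_rate": safe_div(fp, fp + tn),
        "false_negative_rate": safe_div(fn, fn + tp),
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("TP, FP, TN, FN =", tp, fp, tn, fn)  # TP, FP, TN, FN = 3 1 3 1
print(classification_metrics(tp, fp, tn, fn))
```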
# Random Forest
Random Forest is an ensemble learning method that combines multiple decision trees to produce a more accurate and
robust prediction model.
1. Improved Accuracy: Random Forest can produce more accurate predictions than individual decision trees.
2. Robustness to Overfitting: Random Forest is less prone to overfitting than individual decision trees.
3. Handling High-Dimensional Data: Random Forest can handle high-dimensional data with a large number of features.
4. Feature Importance: Random Forest provides feature importance scores, which help interpret which features drive its predictions.
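A scikit-learn sketch of the iris example described in the next paragraph, assuming scikit-learn is installed. The 70/30 split, `n_estimators=100`, `stratify=y` and `random_state=42` are illustrative choices, not values taken from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset and hold out 30% of it for testing
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# An ensemble of 100 decision trees, each trained on a bootstrap sample
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")

# Feature importance scores (one per feature, normalised to sum to 1)
for name, score in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```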
In this example, we load the iris dataset and split it into training and testing sets. We then train a Random Forest
classifier on the training set and evaluate its performance on the testing set.
Naïve Bayes
Naïve Bayes is a supervised learning algorithm used for classification tasks. It is based on Bayes' theorem and assumes
that the features of the data are conditionally independent of each other, given the class.
1. Bayes' Theorem: Naïve Bayes uses Bayes' theorem to calculate the probability of a class given a set of features.
2. Assumption of Independence: Naïve Bayes assumes that the features are conditionally independent of each other given
the class. This simplifies the calculation of the probability of a class given a set of features, because the likelihood
becomes a product of per-feature probabilities.
3. Class Prior Probability: The class prior probability is the probability of a class before observing any features. It is
calculated as the number of instances of the class divided by the total number of instances.
4. Likelihood: The likelihood is the probability of observing a set of features given a class. It is calculated as the product
of the probabilities of each feature given the class.
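5. Posterior Probability: The posterior probability is the probability of a class given a set of features. It is calculated
using Bayes' theorem: P(C | X) = P(X | C) * P(C) / P(X). Since P(X) is the same for every class, the classifier predicts
the class with the largest P(X | C) * P(C).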
Here is an example implementation of a Naïve Bayes classifier in Python using the scikit-learn library:
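from sklearn.naive_bayes import GaussianNB
# X_train, y_train, X_test: training features, training labels and test features prepared earlier
gnb = GaussianNB()
gnb.fit(X_train, y_train)  # Train the model (learns class priors and per-feature Gaussians)
# Make predictions
y_pred = gnb.predict(X_test)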
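To see the prior × likelihood calculation from the list above without a library, here is a hand-rolled sketch for categorical features. The toy weather data and helper names are hypothetical, and the add-one (Laplace) smoothing is an added assumption so that an unseen feature value does not force a probability of zero. For the query (rainy, mild), the unnormalised scores are 0.5 × 2/5 × 2/6 ≈ 0.067 for "no" and 0.5 × 3/5 × 2/6 = 0.1 for "yes", so "yes" is predicted with posterior 0.6.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    # value_counts[cls][i][value] = times feature i had `value` in class `cls`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # feature_values[i] = set of distinct values seen for feature i
    feature_values = [set() for _ in rows[0]]
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[cls][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values, len(labels)


def predict_naive_bayes(model, row):
    """Return the most probable class and the posterior P(class | row) for every class."""
    class_counts, value_counts, feature_values, n = model
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / n  # class prior P(C)
        for i, value in enumerate(row):
            # Likelihood P(x_i | C) with add-one (Laplace) smoothing
            score *= (value_counts[cls][i][value] + 1) / (cls_count + len(feature_values[i]))
        scores[cls] = score
    total = sum(scores.values())  # the model's estimate of P(X), shared by all classes
    posteriors = {cls: s / total for cls, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"),
        ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
prediction, posteriors = predict_naive_bayes(model, ("rainy", "mild"))
print(prediction, posteriors)
```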
A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It
works by finding the hyperplane that maximally separates the classes in the feature space.
1. Linearly Separable Data: If the data is linearly separable, SVM chooses, among all hyperplanes that separate the
classes, the one with the largest margin, i.e. the largest distance to the nearest training points of each class. These
nearest points are called support vectors.
2. Non-Linearly Separable Data: If the data is not linearly separable, SVM uses the kernel trick, which implicitly maps
the data into a higher-dimensional space where the classes may become linearly separable, without computing that
mapping explicitly.
3. Soft Margin: SVM allows for some misclassifications by introducing a soft margin. The regularization parameter C
controls the trade-off between a wide margin and the misclassification error: a large C penalizes errors heavily, while a
small C tolerates more errors in exchange for a wider margin.
4. Kernel Functions: SVM uses kernel functions to compute similarities in a higher-dimensional space. Common kernel
functions include linear, polynomial, radial basis function (RBF), and sigmoid.
Suppose we have a dataset with two features (x1 and x2) and two classes (Class A and Class B). The dataset is linearly
separable.
| x1 | x2 | Class |
| --- | --- | --- |
|1 |2 |A |
|2 |3 |A |
|3 |4 |A |
|4 |5 |B |
|5 |6 |B |
|6 |7 |B |
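SVM finds the hyperplane that maximally separates the classes. In this case, the hyperplane is a line. The closest points
of opposite classes are (3, 4) and (4, 5), so these are the support vectors, and the maximum-margin line is
x1 + x2 = 8, which lies midway between them.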
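A scikit-learn sketch that fits a linear SVM to the six points in the table, assuming scikit-learn is installed. Setting `C=1000` is an illustrative choice that makes the soft margin behave almost like a hard margin. The support vectors should come out as (3, 4) and (4, 5), and the fitted line should be close to x1 + x2 = 8.

```python
from sklearn.svm import SVC

# The six points from the table: features (x1, x2) and class labels
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]]
y = ["A", "A", "A", "B", "B", "B"]

# Linear kernel; a large C allows almost no margin violations
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

print("Support vectors:", clf.support_vectors_.tolist())
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# New points on either side of the separating line
print(clf.predict([[2, 2], [6, 5]]))
```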
Hyperparameter Optimization
Hyperparameter optimization is the process of selecting the best hyperparameters for a machine learning model.
Hyperparameters are parameters that are set before training the model, such as the learning rate, regularization strength,
and kernel parameters.
Grid search and random search are two common methods for hyperparameter optimization.
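- Grid Search: Grid search evaluates every combination in a predefined grid of hyperparameter values. It is an exhaustive
search method that can be computationally expensive, because the number of combinations multiplies with each added hyperparameter.
- Random Search: Random search evaluates a fixed number of hyperparameter combinations sampled at random from predefined
values or distributions. It is usually cheaper than grid search, and it often finds comparably good settings when only a
few hyperparameters strongly affect performance.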
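A sketch comparing the two methods on an RBF-kernel SVM, assuming scikit-learn is installed. The iris dataset, the candidate values for C and gamma, 5-fold cross-validation and `n_iter=6` are illustrative choices. Grid search fits all 4 × 4 = 16 combinations, while random search samples only 6 of them, so it does less work but may miss the best combination.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values for the RBF-kernel SVM hyperparameters (4 x 4 = 16 combinations)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Grid search: cross-validates every one of the 16 combinations
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X, y)

# Random search: samples only 6 combinations from the same candidate values
rand = RandomizedSearchCV(SVC(kernel="rbf"), param_grid, n_iter=6, cv=5, random_state=0)
rand.fit(X, y)

print("Grid search:  ", grid.best_params_, round(grid.best_score_, 3))
print("Random search:", rand.best_params_, round(rand.best_score_, 3))
```

Because both searches use the same folds here, random search can at best match the grid's score on this candidate set; its advantage shows when the search space is large or continuous.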
UNIT-III
CLUSTERING
1. What is Clustering?
Clustering is a type of unsupervised learning technique in machine learning where data
points are grouped into clusters based on their similarities. In clustering, we do not have
labeled data. Instead, the goal is to group data points into clusters where points in the
same cluster are more similar to each other than to points in other clusters.
Clustering is used in a wide range of applications where the goal is to identify inherent
structures in the data.
Customer Segmentation:
Businesses use clustering to divide customers into different segments based on
purchasing behavior, demographics, or preferences.
Example: A retailer can use clustering to segment customers into groups such as
"young, high-spending customers," "middle-aged, moderate-spending customers," and
so on.
Document or Text Clustering:
Grouping similar documents or text files based on topics.
Example: News articles can be clustered into categories like "Politics," "Sports,"
"Technology," and "Health."
Image Segmentation:
Segmenting an image into different regions based on pixel similarity.
Example: Clustering can be used in medical imaging to identify different types of tissues
or tumors.
Anomaly Detection:
Detecting outliers or anomalies by identifying data points that do not fit into any
cluster.
Example: Fraud detection systems can cluster user behavior and flag deviations from
normal patterns as potential fraud.
2. K-Means Clustering
K-means clustering is one of the simplest and most widely used clustering algorithms. It
groups data points into K clusters, where K is predefined. The algorithm works
iteratively to minimize the variance within each cluster.
Example:
Let's say you have the following 2D data points:
[(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]
Initially, pick 2 random points, say (1, 2) and (6, 7), as the centroids. Assign each
point to the nearest centroid.
Calculate the new centroids based on the mean of the assigned points. Repeat
until the centroids stop changing.
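The iterations described above can be traced with a short plain-Python sketch, assuming Euclidean distance and the same starting centroids (1, 2) and (6, 7). `k_means` is a hypothetical helper name, not a library function. On these five points the clusters {(1, 2), (2, 3)} and {(6, 7), (8, 8), (5, 6)} form in the first pass, the centroids move to (1.5, 2.5) and (6.33, 7), and the second pass changes nothing, so the algorithm stops.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Plain K-means: assign each point to its nearest centroid, then move each centroid to the mean."""
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: new centroid = mean of the points assigned to it
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # centroids stopped changing
        centroids = new_centroids
    return centroids, clusters


points = [(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]
centroids, clusters = k_means(points, [(1, 2), (6, 7)])
print(centroids)  # [(1.5, 2.5), (6.333333333333333, 7.0)]
print(clusters)   # [[(1, 2), (2, 3)], [(6, 7), (8, 8), (5, 6)]]
```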
Real-Time Example:
In customer segmentation, K-means can be used to group customers based on their
spending habits. For instance, customers who spend similarly on products like
electronics, clothes, and groceries can be grouped into one cluster.
3. C-Means Clustering
Real-Time Example:
In an image segmentation task, pixels may belong to multiple categories (e.g., a pixel
could belong partly to a "sky" cluster and partly to a "cloud" cluster). Fuzzy C-means
can be used to model such ambiguities.
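4. Hierarchical Clustering
Hierarchical clustering builds a hierarchy of clusters, either bottom-up (agglomerative: start with each
point as its own cluster and repeatedly merge the closest clusters) or top-down (divisive). In the divisive
approach, start with all points in a single cluster. At each step, split the most heterogeneous cluster into
two. Repeat this process until each point is in its own cluster.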
Example:
Consider the following data points:
[(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]
Real-Time Example:
In a document clustering task, hierarchical clustering can be used to group similar
documents together, starting from individual documents and
progressively merging them based on similarity (e.g., cosine similarity). It’s particularly
useful when you want to visualize the relationships between clusters using a
dendrogram.
If you have 5 points: [(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)], initially you have 5 clusters, one
for each point.
Calculate distances between each pair of clusters:
Use a distance metric like Euclidean distance to calculate the distance between each pair
of clusters.
Merge the closest clusters:
Identify the two clusters that are closest (in terms of distance) and merge them into one.
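Repeat until all points are in one cluster:
Keep recalculating the distances and merging the closest pair of clusters until only one cluster remains.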
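The first two steps for these five points (Euclidean distances between every pair of single-point clusters, then picking the closest pair) can be sketched as follows. The labels P1 to P5 are hypothetical names for the points in the order listed. Two pairs tie at √2 ≈ 1.414, namely (1, 2)-(2, 3) and (6, 7)-(5, 6); this sketch simply takes the first one found.

```python
import math
from itertools import combinations

points = {"P1": (1, 2), "P2": (2, 3), "P3": (6, 7), "P4": (8, 8), "P5": (5, 6)}

# Euclidean distance between every pair of (single-point) clusters
distances = {
    (a, b): math.dist(points[a], points[b]) for a, b in combinations(points, 2)
}
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))

# The closest pair is merged first (ties go to the first pair found)
closest_pair = min(distances, key=distances.get)
print("Merge first:", closest_pair, round(distances[closest_pair], 3))  # Merge first: ('P1', 'P2') 1.414
```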
Real-Time Example:
In the case of gene expression analysis, hierarchical clustering can be used to group
genes that have similar patterns of expression across multiple conditions. This allows
biologists to identify genes that work together in specific biological processes.
C-Means Clustering:
In fuzzy clustering, each point can belong to multiple clusters with a certain degree of
membership.
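A compact fuzzy C-means sketch, assuming the standard update rules with fuzzifier m = 2: the membership of point i in cluster j is u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)), and each centroid is the average of all points weighted by u_ij^m. The starting centroids, the iteration count and the helper names are illustrative assumptions. Unlike K-means, every point gets a membership degree for every cluster, and each point's degrees sum to 1.

```python
import math


def memberships_for(points, centroids, m, eps=1e-12):
    """Degree of membership u[i][j] of point i in cluster j (each row sums to 1)."""
    u = []
    for p in points:
        # eps avoids division by zero when a point sits exactly on a centroid
        d = [max(math.dist(p, c), eps) for c in centroids]
        u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(len(d)))
                  for j in range(len(d))])
    return u


def fuzzy_c_means(points, centroids, m=2.0, iterations=100):
    """Fuzzy C-means with fuzzifier m > 1, starting from the given centroids."""
    for _ in range(iterations):
        u = memberships_for(points, centroids, m)
        # Each centroid is the mean of all points weighted by membership ** m
        new_centroids = []
        for j in range(len(centroids)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centroids.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ))
        centroids = new_centroids
    return centroids, memberships_for(points, centroids, m)


points = [(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]
centroids, u = fuzzy_c_means(points, [(1, 2), (6, 7)])
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```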
Simple Example (hierarchical clustering):
You start with each point as its own cluster and repeatedly merge the closest clusters.
Eventually, all points belong to one cluster.
Why is it called K-means?
K: Refers to the number of clusters.
Means: Refers to the centroids (means) of the clusters. During the algorithm, the mean
(average) of the data points in each cluster is calculated and used as the cluster's
"center" or centroid. This is why it's called K-means: it divides data into K
clusters and finds the mean of the points within each cluster.
Summary:
K = Number of clusters
Means = The average (mean) of the points in each cluster
So, K-means is simply the algorithm that divides data into K clusters, where each cluster
is represented by its centroid (the mean of the data points in that cluster).
1. Start with individual data points: Each data point is treated as a separate
cluster.
2. Calculate distances: Calculate the distance between each pair of clusters.
3. Merge closest clusters: Merge the two closest clusters into a new cluster.
4. Repeat steps 2-3: Continue calculating distances and merging clusters until all
clusters are merged into a single cluster.
5. Build the hierarchy: The resulting hierarchy of clusters is represented as a tree called a dendrogram.
| |A|B|C|D|E|
| --- | --- | --- | --- | --- | --- |
|A|0|2|4|6|8|
|B|2|0|3|5|7|
|C|4|3|0|2|4|
|D|6|5|2|0|3|
|E|8|7|4|3|0|
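A and B are at the smallest distance (2; C and D are tied at 2 as well), so merge A and B first into a new cluster (AB).
Remaining clusters: AB, C, D, E. The distance from AB to another cluster is the smaller of its distances from A and
from B (single linkage); for example, d(AB, C) = min(4, 3) = 3.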
| | AB | C | D | E |
| --- | --- | --- | --- | --- |
| AB | 0 | 3 | 5 | 7 |
|C|3|0|2|4|
|D|5|2|0|3|
|E|7|4|3|0|
C and D are now the closest pair (distance 2), so merge them into a new cluster (CD). Remaining clusters: AB, CD, E.
Repeat steps 2-3 until all clusters are merged into a single cluster. The resulting dendrogram has leaves A, B, C, D
and E: A and B join first to form AB, C and D join to form CD, and AB, CD and E are then merged step by step into a
single cluster.
This dendrogram shows the hierarchy of clusters, with the most similar clusters
merged first.
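The full merge sequence for the distance matrix above can be reproduced with a small sketch, assuming single linkage (the distance between two clusters is the distance between their closest members), which is what the updated table uses. `single_linkage` is a hypothetical helper, and ties are broken by taking the first pair found. With that rule the merges are A+B at 2, C+D at 2, then E joins CD at 3, and finally AB joins the rest at 3.

```python
def single_linkage(labels, dist):
    """Agglomerative clustering with single linkage on a precomputed distance matrix.

    Returns the merges as (cluster_a, cluster_b, distance) in the order they happen.
    Ties are broken by taking the first pair found.
    """
    clusters = [(label,) for label in labels]
    index = {label: i for i, label in enumerate(labels)}

    def cluster_distance(a, b):
        # Single linkage: distance between the closest members of the two clusters
        return min(dist[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append(("".join(a), "".join(b), d))
        # Replace the two merged clusters with their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return merges


labels = ["A", "B", "C", "D", "E"]
dist = [
    [0, 2, 4, 6, 8],
    [2, 0, 3, 5, 7],
    [4, 3, 0, 2, 4],
    [6, 5, 2, 0, 3],
    [8, 7, 4, 3, 0],
]
for a, b, d in single_linkage(labels, dist):
    print(f"merge {a} + {b} at distance {d}")
```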
1. Customer Segmentation
A company wants to segment its customers based on their buying
behavior. They collect data on customer demographics, purchase history,
and browsing behavior.
2. Image Segmentation
A computer vision system wants to segment an image into regions based on pixel
similarities.
Hierarchical Clustering can be used to group pixels into clusters based on their
color, texture, and intensity features.
3. Text Classification
A text analysis system wants to group documents into categories based on their
content.
Hierarchical Clustering helps you group customers into clusters based on their
similarities.
Hierarchical Clustering helps you group songs into clusters based on their
similarities.
These examples illustrate how Hierarchical Clustering can help group similar
things together based on their characteristics.
UNIT-IV
Example:
A self-driving car needs to learn how to navigate through a city. The car receives a
reward signal for reaching its destination safely and efficiently. Through trial and error,
the car learns to take actions (e.g., turn left, accelerate) to maximize the reward signal.
Example:
A person wants to find the best restaurant in town. They can either try a new restaurant
(exploration) or go to a familiar restaurant that they know is good (exploitation).
Example:
A person wants to play a game where they can choose between two buttons. They use an
ε-greedy algorithm with ε = 0.1. This means that 10% of the time, they will choose a
random button, and 90% of the time, they will choose the button that they think will
give them the highest reward.
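The two-button example can be simulated with a short sketch. The payout probabilities (0.3 and 0.8), the 1000 rounds, the fixed random seed and the sample-average reward estimates are all assumptions for illustration, and `epsilon_greedy` is a hypothetical helper. With ε = 0.1 the agent keeps exploring occasionally, so it can discover that button 1 pays more and then exploit it most of the time.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action (explore), otherwise the best-valued one (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical game: button 0 pays 1 with probability 0.3, button 1 with probability 0.8
payout_prob = [0.3, 0.8]
rng = random.Random(0)
q = [0.0, 0.0]    # estimated reward of each button
counts = [0, 0]   # how many times each button was pressed

for _ in range(1000):
    button = epsilon_greedy(q, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < payout_prob[button] else 0.0
    counts[button] += 1
    # Incremental average of the rewards observed for this button
    q[button] += (reward - q[button]) / counts[button]

print("Estimated rewards:", [round(v, 2) for v in q])
print("Times pressed:", counts)
```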
Example:
A person wants to navigate a grid world. The states are the different cells in the grid, the
actions are up, down, left, and right, the transitions are the probabilities of moving from
one cell to another, and the rewards are the feedback signals received for reaching
certain cells.
Example:
A person wants to play a game where they can choose between two buttons. The
Q-values represent the expected return for pressing each button in each state, and
the V-values represent the expected return for being in each state.
Q-Learning
Q-learning is a popular RL algorithm.
Example:
A person wants to play a game where they can choose between two buttons. They use Q-
learning to learn the optimal policy. The Q-learning update rule updates the Q-values
based on the observed reward and the next state.
α values
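α (alpha) is the learning rate in Q-learning: it controls how much each new experience changes the current
Q-value estimate (0 < α ≤ 1).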
Example:
A person wants to play a game where they can choose between two buttons. They
use Q-learning with an α value of 0.1. This means that the Q-values will be
updated slowly, and the agent will learn the optimal policy gradually.
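The effect of α can be seen with a single button that always pays a reward of 1. In this one-step game there is no next state, so the Q-learning update reduces to Q ← Q + α (r − Q). The sketch below, with a hypothetical helper `update_estimate`, compares α = 0.1 with an illustrative α = 0.5; after n updates starting from Q = 0, the estimate is 1 − (1 − α)^n.

```python
def update_estimate(q, reward, alpha):
    """One learning-rate update: move q a fraction alpha of the way towards the reward."""
    return q + alpha * (reward - q)


for alpha in (0.1, 0.5):
    q = 0.0
    for _ in range(10):
        q = update_estimate(q, reward=1.0, alpha=alpha)  # the button always pays 1
    print(f"alpha = {alpha}: Q after 10 updates = {q:.4f}")
```

This prints 0.6513 for α = 0.1 and 0.9990 for α = 0.5: the smaller learning rate moves towards the true value more slowly, but it is also less disturbed by a single noisy reward.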
How it works:
Formula:
Example:
Let's say we are using the epsilon-greedy algorithm with ε = 0.1 in a simple 2-state environment:
90% of the time, the agent picks the action with the highest Q-value in its current state; the other 10% of the time,
it picks a random action.
Components of MDP:
1. States $S$: All possible situations or configurations the agent can be in.
2. Actions $A$: The set of actions the agent can take.
3. Transition Model $P$: The probability $P(s' \mid s, a)$ that the agent
will transition from state $s$ to state $s'$ when action $a$ is taken.
4. Reward Function $R(s, a)$: The immediate reward received when
action $a$ is taken in state $s$.
5. Discount Factor $\gamma$: A factor that discounts future rewards ($0 \leq \gamma \leq 1$).
6. Policy $\pi$: A strategy or function that determines the action to take based on
the current state.
Example:
Q-Value:
The Q-value represents the expected future reward (return) when starting from a state
$s$ and taking an action $a$, and then following a certain policy. It is a key part of Q-learning.
$Q(s, a)$: The expected cumulative reward for taking action $a$ in state $s$ and
following the policy thereafter.
V-Value:
The V-value represents the value of a state, which is the expected return when starting
in that state and following a policy.
$V(s)$: The expected cumulative reward starting in state $s$ and following the
policy.
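Relationship:
A V-value averages the Q-values of a state over the actions the policy would choose. For the optimal policy, the value
of a state is the Q-value of its best action: $V^*(s) = \max_a Q^*(s, a)$.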
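A small illustration of the relationship between Q-values and V-values, assuming a hypothetical Q-table that already holds optimal Q-values for two states (S1, S2) and two actions (left, right); the numbers are made up.

```python
# Hypothetical optimal Q-table: Q[state][action] = expected return
Q = {
    "S1": {"left": 1.0, "right": 4.0},
    "S2": {"left": 2.5, "right": 0.5},
}

# With optimal Q-values, V(s) = max over actions of Q(s, a)
V = {s: max(actions.values()) for s, actions in Q.items()}
# The optimal policy is greedy: pick the action that attains that maximum
policy = {s: max(actions, key=actions.get) for s, actions in Q.items()}

print(V)       # {'S1': 4.0, 'S2': 2.5}
print(policy)  # {'S1': 'right', 'S2': 'left'}
```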
4. Q-Learning
Q-learning is a model-free reinforcement learning algorithm used to find the optimal
policy in an MDP. It learns the Q-values through interactions with the environment.
Example:
The agent starts in $S_1$, takes action $A_1$, and moves to $S_2$,
receiving a reward of 5.
The Q-learning update rule, $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$,
will adjust $Q(S_1, A_1)$ to account for this reward.
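The single update in this example can be checked with the sketch below, assuming all Q-values start at 0, α = 0.1 and γ = 0.9 (illustrative values, not given in the notes): Q(S1, A1) = 0 + 0.1 × (5 + 0.9 × 0 − 0) = 0.5. `q_learning_update` is a hypothetical helper name.

```python
def q_learning_update(Q, state, action, reward, next_state, alpha, gamma):
    """Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))"""
    best_next = max(Q[next_state].values())  # value of the best action in the next state
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# All Q-values start at 0; alpha and gamma are illustrative choices
Q = {"S1": {"A1": 0.0, "A2": 0.0}, "S2": {"A1": 0.0, "A2": 0.0}}
new_value = q_learning_update(Q, "S1", "A1", reward=5, next_state="S2", alpha=0.1, gamma=0.9)
print(new_value)  # 0.5
```

Repeating the same experience would move Q(S1, A1) further towards 5, by 10% of the remaining gap each time.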
Steps:
Impact of $\alpha$:
Conclusion
a. Machine Learning?
b. Hierarchical Clustering?
g. Reinforcement Learning?
h. Supervised Learning?
i. Unsupervised Learning?
Q2. Differentiate between unsupervised learning and reinforcement learning with the help
of suitable examples.
Q3. How are Decision Trees used for classification? Explain the steps of constructing a
decision tree with the help of an example.
Q4. Discuss the working of Naive Bayes classifier in detail. List the advantages and
disadvantages of this technique.
Q6. Explain why Reinforcement Learning is required, with an example. Write the elements of
Reinforcement Learning.
Unit -4 Notes
1. What is Reinforcement Learning (RL)? How does it differ from supervised and unsupervised
learning?
3. What are the key components of a reinforcement learning problem? Explain with examples.
5. What is a policy in reinforcement learning, and how does it guide an agent's behavior?
6. How does the concept of delayed rewards affect the learning process in reinforcement learning?
1. What are the main elements of a reinforcement learning system? Explain the roles of the
agent, environment, actions, and rewards.
2. What is the role of a value function in reinforcement learning, and how does it
relate to decision-making?
3. How does the concept of state and action influence the decision-making process
in reinforcement learning?
5. What is a reward function, and how does it shape the behavior of the agent in
reinforcement learning?