Machine Learning Complete Notes


CLASS: BSC-IT & BCA, 6th Semester

SUBJECT: MACHINE LEARNING

BATCH: 2022-25

Notes as per IKGPTU Syllabus

NAME OF FACULTY: HIMANSHU SINGH

FACULTY OF IT, SBS COLLEGE, LUDHIANA


SYLLABUS OF MACHINE LEARNING

Unit-I

Introduction
What is Machine Learning, Unsupervised Learning, Reinforcement Learning, Machine
Learning Use-Cases, Machine Learning Process Flow, Machine Learning Categories,
Linear Regression and Gradient Descent. (10 hours)

Unit-II

Supervised Learning
Classification and its use cases, Decision Tree, Algorithm for Decision Tree Induction,
Creating a Perfect Decision Tree, Confusion Matrix, Random Forest, What is Naïve
Bayes, How Naïve Bayes works, Implementing Naïve Bayes Classifier, Support Vector
Machine, Illustration of how Support Vector Machine works, Hyperparameter
Optimization, Grid Search vs Random Search, Implementation of Support Vector
Machine for Classification. (12 hours)

Unit-III
Clustering
What is Clustering & its Use Cases, K-means Clustering, How does K-means algorithm
work, C-means Clustering, Hierarchical Clustering, How Hierarchical Clustering
works. (10 hours)

Unit-IV

Why Reinforcement Learning, Elements of Reinforcement Learning, Exploration vs
Exploitation dilemma, Epsilon Greedy Algorithm, Markov Decision Process (MDP),
Q values and V values, Q-Learning, α values. (12 hours)


UNIT-1

Introduction to Machine Learning


Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables machines to learn from data and improve
their performance on a task without being explicitly programmed. It involves training algorithms on data to make
predictions, classify objects, or make decisions.

# Types of Machine Learning

There are three primary types of Machine Learning:

1. Supervised Learning: In this type of learning, the algorithm is trained on labeled data to learn the relationship
between input data and the corresponding output.

Example: Image classification, where the algorithm is trained on labeled images to learn the features of different
objects.

2. Unsupervised Learning: In this type of learning, the algorithm is trained on unlabeled data to discover patterns,
relationships, or groupings.

Example: Customer segmentation, where the algorithm groups customers based on their buying behavior and
demographics.

3. Reinforcement Learning: In this type of learning, the algorithm learns by interacting with an environment and
receiving rewards or penalties for its actions.

Example: Game playing, where the algorithm learns to play a game by trial and error and receiving rewards for
winning.

Reinforcement Learning (RL) is a type of Machine Learning where an agent learns to make decisions by
interacting with an environment.

# Key Components:

1. Agent: The learner or decision-maker (e.g., a robot, a chatbot, or a self-driving car).


2. Environment: The external world with which the agent interacts (e.g., a game board, a physical space, or a user).
3. Actions: The decisions made by the agent (e.g., moving left, turning right, or responding to a user query).
4. Rewards: Feedback from the environment indicating how good or bad the agent's actions were (e.g., +1 for winning a
game, -1 for losing, or a reward for completing a task).
5. Policy: The strategy used by the agent to select actions (e.g., random, greedy, or optimized).

# How RL Works:

1. The agent takes an action in the environment.


2. The environment responds with a reward and a new state.
3. The agent uses the reward and state to learn and update its policy.
4. The agent repeats steps 1-3 to continually improve its policy.


# Goal:
The ultimate goal of RL is to learn an optimal policy that maximizes the cumulative reward over time.

# Simple Analogy:
Imagine a child learning to ride a bike. The child takes actions (pedaling, steering), receives rewards (staying upright,
reaching a destination), and learns to improve their policy (balancing, turning) through trial and error.

# Machine Learning Use-Cases


Machine Learning has numerous applications across various industries, including:

1. Healthcare: Disease diagnosis, patient outcome prediction, and personalized medicine.

2. Finance: Credit risk assessment, portfolio optimization, and fraud detection.

3. Marketing: Customer segmentation, recommendation systems, and sentiment analysis.

4. Autonomous Vehicles: Object detection, motion planning, and control.

5. Computer Vision
a. Image Classification: Classify images into different categories, such as objects, scenes, and
actions.
b. Object Detection: Detect and locate objects within images, such as pedestrians, cars, and
buildings.
c. Facial Recognition: Identify individuals based on their facial features.

6. Natural Language Processing (NLP)

a. Sentiment Analysis: Analyze text data to determine the sentiment or emotional tone behind it.
b. Text Classification: Classify text into different categories, such as spam vs. non-spam emails.
c. Language Translation: Translate text from one language to another.
d. Chatbots: Build conversational interfaces that can understand and respond to user queries.

Healthcare

1. Disease Diagnosis: Analyze medical data to diagnose diseases, such as cancer or diabetes.
2. Personalized Medicine: Tailor medical treatment to individual patients based on their genetic profiles and
medical histories.
3. Medical Imaging Analysis: Analyze medical images, such as X-rays and MRIs, to diagnose diseases.

Finance

1. Credit Risk Assessment: Assess the creditworthiness of individuals or businesses.


2. Portfolio Optimization: Optimize investment portfolios to maximize returns and minimize risk.
3. Fraud Detection: Detect fraudulent transactions, such as credit card transactions or insurance claims.

Customer Service

1. Customer Segmentation: Segment customers based on their behavior, demographics, and preferences.
2. Recommendation Systems: Recommend products or services to customers based on their past purchases and
behavior.
3. Chatbots: Build conversational interfaces that can understand and respond to customer queries.

Marketing

1. Targeted Advertising: Target advertisements to specific customer segments based on their behavior,
demographics, and preferences.
2. Marketing Automation: Automate marketing tasks, such as email marketing and lead scoring.
3. Predictive Analytics: Analyze customer data to predict future behavior and preferences.

Education

1. Personalized Learning: Tailor educational content to individual students based on their learning styles and
abilities.
2. Intelligent Tutoring Systems: Build systems that can provide one-on-one support to students.
3. Automated Grading: Automate the grading process to reduce teacher workload and improve accuracy.

Transportation

1. Predictive Maintenance: Predict when vehicles are likely to require maintenance, allowing for proactive
scheduling.
2. Route Optimization: Optimize routes to reduce fuel consumption and lower emissions.
3. Autonomous Vehicles: Build self-driving cars that can navigate roads safely and efficiently.

Energy

1. Energy Demand Forecasting: Predict energy demand to optimize energy production and reduce waste.
2. Energy Efficiency Optimization: Optimize energy efficiency in buildings and homes.
3. Renewable Energy Integration: Integrate renewable energy sources, such as solar and wind power, into the
grid.

Cybersecurity

1. Intrusion Detection: Detect and prevent cyber attacks.


2. Malware Detection: Detect and prevent malware infections.
3. Predictive Analytics: Analyze network traffic to predict future cyber threats.

# Difference between Supervised and Unsupervised Learning

Unsupervised Learning
Unsupervised Learning is a type of Machine Learning where the algorithm is trained on unlabeled data.
The goal is to discover patterns, relationships, or groupings in the data without any prior knowledge of
the expected output.

Characteristics of Unsupervised Learning


- No labeled data is provided.
- The algorithm learns from the data without any guidance.
- The goal is to identify patterns, relationships, or groupings in the data.

Examples of Unsupervised Learning


- Clustering customers based on their buying behavior.
- Identifying patterns in stock prices.
- Segmenting images into different regions.


# Supervised Learning
Supervised Learning is a type of Machine Learning where the algorithm is trained on labeled data. The
goal is to learn a mapping between input data and the corresponding output labels.

Characteristics of Supervised Learning


- Labeled data is provided.
- The algorithm learns from the data with guidance.
- The goal is to predict the output label for new, unseen input data.

Examples of Supervised Learning


- Image classification (e.g., classifying images as dogs or cats).
- Sentiment analysis (e.g., classifying text as positive or negative).
- Predicting house prices based on features like number of bedrooms and location.

# Machine Learning Process Flow


The Machine Learning process flow involves the following steps:

1. Problem Definition: Define the problem you want to solve.


2. Data Collection: Collect relevant data for the problem.
3. Data Preprocessing: Clean, transform, and prepare the data for training.
4. Model Selection: Choose a suitable algorithm for the problem.
5. Model Training: Train the model using the preprocessed data.
6. Model Evaluation: Evaluate the performance of the model.
7. Model Deployment: Deploy the model in a production-ready environment.

# Machine Learning Categories


Machine Learning can be categorized into several types, including:

1. Regression: Predicting continuous values.


2. Classification: Predicting categorical values.
3. Clustering: Grouping similar data points.
4. Dimensionality Reduction: Reducing the number of features in the data.

Linear Regression
Linear Regression is a supervised learning algorithm that predicts a continuous output variable based on one or more
input features. The goal is to learn a linear relationship between the input features and the output variable.

Linear Regression Equation


The linear regression equation is:

y = β0 + β1x + ε

where:

- y is the output variable


- x is the input feature
- β0 is the intercept or bias term
- β1 is the slope coefficient
- ε is the error term

Linear Regression Example


Suppose we want to predict the price of a house based on its size. We collect data on the size of several houses and
their corresponding prices.

| Size (x) | Price (y) |
| --- | --- |
| 1000 | 200000 |
| 1200 | 250000 |
| 1500 | 300000 |
| 1800 | 350000 |
| 2000 | 400000 |

We can use linear regression to learn a linear relationship between the size of the house and its price.
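As a quick illustration, here is a minimal sketch (assuming scikit-learn is installed) that fits the data from the
table above with ordinary least squares and recovers the intercept β0 and slope β1:

from sklearn.linear_model import LinearRegression
import numpy as np

# House sizes (input feature x) and prices (output y) from the table above
X = np.array([[1000], [1200], [1500], [1800], [2000]])
y = np.array([200000, 250000, 300000, 350000, 400000])

# Fit the model: learns beta0 (intercept_) and beta1 (coef_)
model = LinearRegression()
model.fit(X, y)

print("Intercept (beta0):", model.intercept_)
print("Slope (beta1):", model.coef_[0])
print("Predicted price for size 1600:", model.predict([[1600]])[0])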

# Gradient Descent

Gradient Descent is an optimization algorithm used to minimize the cost function in linear regression. The goal is to
find the values of the model parameters (β0 and β1) that minimize the sum of the squared errors.

Gradient Descent Equation


The gradient descent equation is:

βj = βj - α * (dJ/dβj)

where:

- βj is the jth model parameter


- α is the learning rate
- J is the cost function
- dJ/dβj is the partial derivative of the cost function with respect to the jth model parameter

Gradient Descent Example


Suppose we want to use gradient descent to optimize the model parameters (β0 and β1) in the linear regression equation.

We start by initializing the model parameters to some random values, say β0 = 0 and β1 = 0.

We then compute the cost function (J) using the current values of the model parameters.

Next, we compute the partial derivatives of the cost function with respect to each model parameter.

We then update the model parameters using the gradient descent equation.

We repeat this process until convergence or a stopping criterion is reached.
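A minimal NumPy sketch of these steps on the house-price data above; the learning rate and iteration count are
illustrative choices, and the feature is standardized so the gradient steps stay stable:

import numpy as np

# Toy data from the earlier table (size x, price y)
X = np.array([1000, 1200, 1500, 1800, 2000], dtype=float)
y = np.array([200000, 250000, 300000, 350000, 400000], dtype=float)

# Standardize the feature so one learning rate works for both parameters
X_scaled = (X - X.mean()) / X.std()

b0, b1 = 0.0, 0.0   # initialize beta0 and beta1 to zero
alpha = 0.1         # learning rate
n = len(X_scaled)

for _ in range(1000):
    error = (b0 + b1 * X_scaled) - y
    # Partial derivatives of the mean squared error cost J
    db0 = (2 / n) * error.sum()
    db1 = (2 / n) * (error * X_scaled).sum()
    # Gradient descent update: beta_j = beta_j - alpha * dJ/dbeta_j
    b0 -= alpha * db0
    b1 -= alpha * db1

print("Intercept:", b0, "Slope (per standardized unit):", b1)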


UNIT-II

Supervised Learning

# What is Supervised Learning?


Supervised learning is a type of machine learning where the algorithm is trained on labeled data to learn the relationship
between the input features and the target variable. The goal is to make predictions on new, unseen data.

# How Supervised Learning Works

1. Data Collection: Gather labeled data, where each example is paired with its corresponding target
variable.

2. Data Preprocessing: Clean, transform, and prepare the data for training.

3. Model Training: Train a machine learning algorithm on the labeled data to learn the relationship
between the input features and the target variable.

4. Model Evaluation: Evaluate the trained model on a separate test dataset to estimate its performance.

5. Model Deployment: Deploy the trained model in a production environment to


make predictions on new, unseen data.

# Use Cases for Supervised Learning

1. Image Classification: Classify images into categories, such as objects, scenes, or actions.
- Example: Self-driving cars use image classification to detect pedestrians, traffic lights, and road signs.

2. Speech Recognition: Transcribe spoken words into text.


- Example: Virtual assistants like Siri, Alexa, and Google Assistant use speech recognition to understand voice
commands.

3. Sentiment Analysis: Determine the sentiment or emotional tone of text data.


- Example: Companies use sentiment analysis to analyze customer feedback and reviews.

4. Recommendation Systems: Suggest products or services based on user behavior and preferences.
- Example: Online retailers like Amazon and Netflix use recommendation systems to suggest products and movies.

5. Medical Diagnosis: Predict medical outcomes or diagnose diseases based on patient data.
- Example: Doctors use machine learning algorithms to diagnose diseases like cancer and diabetes.

6. Credit Risk Assessment: Predict the likelihood of loan repayment based on credit history and other factors.
- Example: Banks and financial institutions use credit risk assessment models to evaluate loan applications.

7. Spam Detection: Classify emails or messages as spam or not spam.


- Example: Email providers like Gmail and Yahoo use spam detection algorithms to filter out unwanted emails.


# Supervised Learning Algorithms

Some popular supervised learning algorithms include:

1. Linear Regression: Predict continuous outcomes using linear equations.

2. Logistic Regression: Predict binary outcomes using logistic functions.

3. Decision Trees: Predict outcomes using tree-like models.

4. Random Forests: Predict outcomes using ensembles of decision trees.

5. Support Vector Machines (SVMs): Predict outcomes using hyperplanes.

6. Neural Networks: Predict outcomes using complex, layered models.

# Supervised Learning Classification


Supervised learning classification is a type of machine learning where the algorithm is trained on labeled data to predict
a categorical outcome.

Types of Classification

1. Binary Classification: This type of classification involves predicting one of two classes or labels. For example, spam
vs. not spam emails, cancer vs. not cancer diagnosis, or creditworthy vs. not creditworthy customers.

Example: A bank wants to predict whether a customer is creditworthy or not based on their credit score, income, and
other factors.

2. Multi-Class Classification: This type of classification involves predicting one of multiple classes or labels. For
example, classifying images into different categories such as animals, vehicles, buildings, etc.

Example: A self-driving car needs to classify objects on the road into different categories such as pedestrians, cars,
trees, etc.

3. Multi-Label Classification: This type of classification involves predicting multiple labels or classes for a single
instance. For example, classifying a movie into multiple genres such as action, comedy, romance, etc.

Example: A music streaming service wants to classify a song into multiple genres such as rock, pop, hip-hop, etc.

4. Hierarchical Classification: This type of classification involves predicting a label or class that is part of a
hierarchical structure. For example, classifying a product into a category and subcategory.

Example: An e-commerce website wants to classify a product into a category (e.g., electronics) and subcategory (e.g.,
smartphones).

5. Imbalanced Classification: This type of classification involves dealing with datasets where one class has a
significantly larger number of instances than the others.


Example: A bank wants to predict whether a customer is likely to default on a loan or not. However, the dataset has a
large number of customers who are not likely to default, making it an imbalanced classification problem.

Use Cases
1. Image Classification: Classifying images into different categories (e.g., objects, scenes, actions).
- Example: Facebook's image classification algorithm can identify objects, people, and scenes in images.

2. Sentiment Analysis: Classifying text as positive, negative, or neutral.


- Example: Movie review websites use sentiment analysis to classify user reviews as positive or negative.

3. Speech Recognition: Classifying spoken words into text.


- Example: Virtual assistants like Siri and Alexa use speech recognition to classify spoken words into text.

# Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification model. It provides a summary
of the predictions made by the model against the actual outcomes, in terms of true positives (TP), true negatives
(TN), false positives (FP), and false negatives (FN).

Metrics Derived from Confusion Matrix


Several metrics can be derived from a confusion matrix, including:

1. Accuracy: (TP + TN) / (TP + TN + FP + FN)


2. Precision: TP / (TP + FP)
3. Recall: TP / (TP + FN)
4. F1-Score: 2 * (Precision * Recall) / (Precision + Recall)
5. False Positive Rate: FP / (FP + TN)
6. False Negative Rate: FN / (FN + TP)

Example of Confusion Matrix


Suppose we have a binary classification model that predicts whether a customer will buy a product or not. The actual
outcomes and predicted outcomes are as follows:

| Actual | Predicted | Count |
| --- | --- | --- |
| Buy | Buy | 80 |
| Buy | Not Buy | 20 |
| Not Buy | Buy | 30 |
| Not Buy | Not Buy | 70 |

The confusion matrix for this example would be:

| | Predicted Buy | Predicted Not Buy |
| --- | --- | --- |
| Actual Buy | 80 (TP) | 20 (FN) |
| Actual Not Buy | 30 (FP) | 70 (TN) |

Using this confusion matrix, we can calculate various metrics such as accuracy, precision, recall, F1-score, false positive
rate, and false negative rate.
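A short sketch that plugs the counts from this matrix into the formulas above:

# Counts taken from the confusion matrix above
TP, FN, FP, TN = 80, 20, 30, 70

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1_score  = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2f}")   # 0.75
print(f"Precision: {precision:.2f}")  # 0.73
print(f"Recall:    {recall:.2f}")     # 0.80
print(f"F1-score:  {f1_score:.2f}")   # 0.76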


# Random Forest

Random Forest is an ensemble learning method that combines multiple decision trees to produce a more accurate and
robust prediction model.

How Random Forest Works


Here's a step-by-step overview of how Random Forest works:

1. Bootstrapping: Randomly select a subset of samples from the training data.


2. Decision Tree Construction: Construct a decision tree using the bootstrapped samples.
3. Feature Randomization: Randomly select a subset of features to consider at each node of the decision tree.
4. Tree Ensemble: Repeat steps 1-3 to construct multiple decision trees.
5. Voting: Combine the predictions from each decision tree to produce a final prediction.

Advantages of Random Forest


Random Forest has several advantages, including:

1. Improved Accuracy: Random Forest can produce more accurate predictions than individual decision trees.
2. Robustness to Overfitting: Random Forest is less prone to overfitting than individual decision trees.
3. Handling High-Dimensional Data: Random Forest can handle high-dimensional data with a large number of features.
4. Interpretable Results: Random Forest provides feature importance scores, which can be used to interpret the results.

Disadvantages of Random Forest


Random Forest also has some disadvantages, including:

1. Computational Cost: Training a Random Forest model can be computationally expensive.


2. Hyperparameter Tuning: Random Forest has several hyperparameters that need to be tuned for optimal performance.

Example of Random Forest in Python


Here's an example of using Random Forest in Python using the scikit-learn library:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest classifier with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate the model on the testing set
accuracy = rf.score(X_test, y_test)
print("Accuracy:", accuracy)


In this example, we load the iris dataset and split it into training and testing sets. We then train a Random Forest
classifier on the training set and evaluate its performance on the testing set.

# Naïve Bayes

What is Naïve Bayes?

Naïve Bayes is a supervised learning algorithm used for classification tasks. It is based on Bayes' theorem and assumes
that the features of the data are independent of each other.

How Naïve Bayes Works

1. Bayes' Theorem: Naïve Bayes uses Bayes' theorem to calculate the probability of a class given a set of features.

P(Class|Features) = P(Features|Class) * P(Class) / P(Features)

2. Assumption of Independence: Naïve Bayes assumes that the features of the data are independent of each other. This
simplifies the calculation of the probability of a class given a set of features.

3. Class Prior Probability: The class prior probability is the probability of a class before observing any features. It is
calculated as the number of instances of the class divided by the total number of instances.

4. Likelihood: The likelihood is the probability of observing a set of features given a class. It is calculated as the product
of the probabilities of each feature given the class.

5. Posterior Probability: The posterior probability is the probability of a class given a set of features. It is calculated
using Bayes' theorem.

Implementing Naïve Bayes Classifier

Here is an example implementation of a Naïve Bayes classifier in Python using the scikit-learn library:

from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = load_iris()

# Split the dataset into features (X) and target (y)
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the classifier
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate the classifier
accuracy = gnb.score(X_test, y_test)
print("Accuracy:", accuracy)

Support Vector Machine

What is Support Vector Machine?

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It
works by finding the hyperplane that maximally separates the classes in the feature space.

How Support Vector Machine Works

1. Linearly Separable Data: SVM works by finding the hyperplane that maximally separates the classes in the feature
space. If the data is linearly separable, SVM finds the hyperplane that maximally separates the classes.

2. Non-Linearly Separable Data: If the data is not linearly separable, SVM uses a kernel trick to transform the data into
a higher-dimensional space where it becomes linearly separable.

3. Soft Margin: SVM allows for some misclassifications by introducing a soft margin. The soft margin is a parameter
that controls the trade-off between the margin and the misclassification error.

4. Kernel Functions: SVM uses kernel functions to transform the data into a higher-dimensional space. Common kernel
functions include linear, polynomial, radial basis function (RBF), and sigmoid.

Illustration of How Support Vector Machine Works

Here is an illustration of how SVM works:

Suppose we have a dataset with two features (x1 and x2) and two classes (Class A and Class B). The dataset is linearly
separable.

| x1 | x2 | Class |
| --- | --- | --- |
| 1 | 2 | A |
| 2 | 3 | A |
| 3 | 4 | A |
| 4 | 5 | B |
| 5 | 6 | B |
| 6 | 7 | B |

SVM finds the hyperplane that maximally separates the classes. In this case, the hyperplane is a line that separates the
two classes.
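A minimal sketch of training an SVM on this toy dataset, assuming scikit-learn; a linear kernel suffices because
the data is linearly separable:

from sklearn.svm import SVC
import numpy as np

# The toy dataset from the table above
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]])
y = np.array(["A", "A", "A", "B", "B", "B"])

# Linear kernel: find the maximally separating hyperplane (a line in 2D)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:", clf.support_vectors_)
print("Prediction for (2.5, 3.5):", clf.predict([[2.5, 3.5]])[0])  # expected "A"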


Hyperparameter Optimization
Hyperparameter optimization is the process of selecting the best hyperparameters for a machine learning model.

Hyperparameters are parameters that are set before training the model, such as the learning rate, regularization strength,
and kernel parameters.

Grid Search vs Random Search

Grid search and random search are two common methods for hyperparameter optimization.

- Grid Search: Grid search involves searching through a predefined grid of hyperparameter values. It is an
exhaustive search method that can be computationally expensive.

- Random Search: Random search involves randomly sampling hyperparameters from a predefined distribution. It is
typically cheaper than grid search because it evaluates far fewer combinations.
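A brief sketch contrasting the two approaches with scikit-learn's GridSearchCV and RandomizedSearchCV; the
parameter values below are illustrative, not recommended settings:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Grid search: exhaustively tries all 16 combinations
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_)

# Random search: samples only 5 of the combinations, so it is much cheaper
rand = RandomizedSearchCV(SVC(kernel="rbf"), param_grid, n_iter=5, cv=5, random_state=42)
rand.fit(X, y)
print("Random search best params:", rand.best_params_)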


UNIT-III

CLUSTERING

1. What is Clustering?
Clustering is a type of unsupervised learning technique in machine learning where data
points are grouped into clusters based on their similarities. In clustering, we do not have
labeled data. Instead, the goal is to group data points into clusters where points in the
same cluster are more similar to each other than to points in other clusters.

Clustering is used in a wide range of applications where the goal is to identify inherent
structures in the data.

Use Cases of Clustering:

Customer Segmentation:
Businesses use clustering to divide customers into different segments based on
purchasing behavior, demography, or preferences.

Example: A retailer can use clustering to segment customers into groups such as
"young, high-spending customers," "middle-aged, moderate-spending customers," and
so on.
Document or Text Clustering:
Grouping similar documents or text files based on topics.

Example: News articles can be clustered into categories like "Politics," "Sports,"
"Technology," and "Health."
Image Segmentation:
Segmenting an image into different regions based on pixel similarity.


Example: Clustering can be used in medical imaging to identify different types of tissues
or tumors.
Anomaly Detection:

Detecting outliers or anomalies by identifying data points that do not fit into any
cluster.

Example: Fraud detection systems can cluster user behavior and flag deviations from
normal patterns as potential fraud.

2. K-Means Clustering
K-means clustering is one of the simplest and most widely used clustering algorithms. It
groups data points into K clusters, where K is predefined. The algorithm works
iteratively to minimize the variance within each cluster.

# How does K-means Algorithm Work?


The K-means algorithm follows these steps:

Choose K initial centroids:


Select K random data points as the initial centroids (or center points) of K clusters.

Assign each data point to the nearest centroid:


For each data point, assign it to the cluster whose centroid is nearest (usually using the
Euclidean distance).

Recalculate the centroids:


After assigning all data points to clusters, recalculate the centroids as the mean of all
points within each cluster.

Repeat steps 2 and 3 until convergence:


Continue assigning points to the nearest centroid and recalculating the centroids until
the assignments no longer change, meaning the algorithm has converged.


Example:
Let's say you have the following 2D data points:
[(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]

You want to use K-means to group them into 2 clusters (K=2):

1. Initially, pick 2 random points, say (1, 2) and (6, 7), as the centroids.
2. Assign each point to the nearest centroid.
3. Calculate the new centroids based on the mean of the assigned points.
4. Repeat until the centroids stop changing.
Real-Time Example:
In customer segmentation, K-means can be used to group customers based on their
spending habits. For instance, customers who spend similarly on products like
electronics, clothes, and groceries can be grouped into one cluster.
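A minimal sketch of the five-point example above using scikit-learn's KMeans (the random_state is just for
reproducibility):

from sklearn.cluster import KMeans
import numpy as np

# The five 2D points from the example above
points = np.array([[1, 2], [2, 3], [6, 7], [8, 8], [5, 6]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(points)

print("Cluster labels:", labels)              # e.g. [0 0 1 1 1]
print("Centroids:", kmeans.cluster_centers_)  # mean of the points in each cluster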

3. C-Means Clustering (Fuzzy C-Means)

C-means clustering, or fuzzy C-means clustering, is a variation of K-means where each
data point can belong to multiple clusters with a certain degree of membership. This is
different from K-means, where each point belongs strictly to one cluster.

How C-means Algorithm Works:

Choose K centroids and initialize membership values:


Randomly select K centroids, and for each point, initialize its membership to each
cluster.

Calculate membership degrees:


Each data point’s degree of membership to each cluster is calculated based on its
distance from the centroids. The closer a point is to a centroid, the higher its
membership in that cluster.


Update the centroids:


Calculate the new centroid as the weighted average of all data points, where the weights
are the membership degrees.

Repeat steps 2 and 3 until convergence:


Continue updating the membership degrees and centroids until the membership values
stabilize.

Real-Time Example:

In an image classification task, pixels may belong to multiple categories (e.g., a pixel
could be partly part of a "sky" cluster and partly part of a "cloud" cluster). Fuzzy
C-means can be used to model such ambiguities.
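A minimal NumPy sketch of the fuzzy C-means updates described above; the fuzziness exponent m = 2 is a common
default, and the three data points are illustrative:

import numpy as np

def fuzzy_c_means(X, k=2, m=2.0, iters=100, eps=1e-6):
    n = len(X)
    rng = np.random.default_rng(42)
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each point sum to 1
    for _ in range(iters):
        Um = U ** m
        # Centroids: membership-weighted average of the points
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every point to every centroid
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        # Membership update: closer centroids get higher membership
        inv = dist ** (-2 / (m - 1))
        new_U = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(new_U - U).max() < eps:
            break
        U = new_U
    return centers, U

X = np.array([[1, 2], [2, 3], [8, 9]], dtype=float)
centers, U = fuzzy_c_means(X, k=2)
print("Centroids:\n", centers)
print("Membership degrees:\n", U.round(2))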

4. Hierarchical Clustering

Hierarchical clustering creates a tree-like structure (also called a dendrogram) that
shows the hierarchy of clusters. There are two types of hierarchical clustering:

Agglomerative (bottom-up approach):


Start with each data point as its own cluster. Then, iteratively merge the closest
clusters until only one cluster remains.

Divisive (top-down approach):


Start with all data points in a single cluster and iteratively split the clusters into
smaller ones.

How Hierarchical Clustering Works:


Agglomerative (bottom-up approach):

Each point is initially its own cluster.


At each step, find the two closest clusters and merge them into one. Repeat this
process until all points are in one cluster.
Divisive (top-down approach):

Start with all points in a single cluster.


At each step, split the most heterogeneous cluster into two. Repeat
this process until each point is in its own cluster.
Example:
Consider the following data points:
[(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)]

Initially, each data point is its own cluster.


Find the two closest points (e.g., (1, 2) and (2, 3)) and merge them into one cluster.
Continue this process until there is only one cluster.

Real-Time Example:
In a document clustering task, hierarchical clustering can be used to group similar
documents together, starting from individual documents and
progressively merging them based on similarity (e.g., cosine similarity). It’s particularly
useful when you want to visualize the relationships between clusters using a
dendrogram.
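A short sketch of agglomerative clustering on the five points from the earlier example, assuming SciPy is
available; single linkage (closest-pair distance) is one of several possible linkage choices:

from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np

# The five 2D points from the earlier example
points = np.array([[1, 2], [2, 3], [6, 7], [8, 8], [5, 6]])

# Agglomerative clustering: merge the closest clusters step by step
Z = linkage(points, method="single")

# Cut the merge tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print("Cluster labels:", labels)   # e.g. [1 1 2 2 2]

# scipy.cluster.hierarchy.dendrogram(Z) would plot the merge tree with matplotlib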

5. How Hierarchical Clustering Works


Here’s a deeper look at Agglomerative Hierarchical Clustering with an example:

Start with each point as its own cluster:

If you have 5 points: [(1, 2), (2, 3), (6, 7), (8, 8), (5, 6)], initially you have 5 clusters, one
for each point.
Calculate distances between each pair of clusters:

Use a distance metric like Euclidean distance to calculate the distance between each pair
of clusters.
Merge the closest clusters:

Identify the two clusters that are closest (in terms of distance) and merge them into one.
Repeat until all points are in one cluster:


Continue merging clusters until only one cluster remains.


Dendrogram:
The result of hierarchical clustering is often visualized as a dendrogram, which shows
the sequence of merges or splits.

Real-Time Example:
In the case of gene expression analysis, hierarchical clustering can be used to group
genes that have similar patterns of expression across multiple conditions. This allows
biologists to identify genes that work together in specific biological processes.

Summary of UNIT –III with Example


C-Means Clustering:

In fuzzy clustering, each point can belong to multiple clusters with a certain degree of
membership.
Simple Example:

Points: [(1, 2), (2, 3), (8, 9)]


K = 2 (2 clusters):
Point (2, 3) might belong 60% to Cluster 1 and 40% to Cluster 2.

Hierarchical Clustering (Agglomerative):

You start with each point as a cluster and repeatedly merge the closest clusters.
Eventually, all points belong to one cluster.

Simple Example:

Points: [(1, 2), (2, 3), (6, 5), (8, 8)]


Start with: [(1, 2)], [(2, 3)], [(6, 5)], [(8, 8)]
After merging: [(1, 2), (2, 3)], [(6, 5)], [(8, 8)]
Continue until one cluster remains.
Real-World Applications of Clustering with Simple Examples:

Customer Segmentation (K-Means):
A retailer wants to group customers based on spending habits. Using K-means, the
retailer can segment customers into groups like "high-spending" and "low-spending"
customers.

Image Segmentation (C-Means):


In medical image analysis, fuzzy clustering can be used to detect and segment various
regions of interest in an image (e.g., tumors), where each pixel might belong to more
than one region.

Gene Expression Clustering (Hierarchical Clustering):


Biologists use hierarchical clustering to group genes with similar expression patterns,
which can help identify genes involved in similar biological processes or diseases.


What Does "K-means" Mean?


K: Refers to the number of clusters you want to divide the data into. For example, if you
want to group your data into 3 clusters, K will be 3. So, the "K" is simply the number of
clusters you are aiming for.

Means: Refers to the centroids (mean) of the clusters. During the algorithm, the mean
(average) of the data points in each cluster is calculated and used as the cluster's
"center" or centroid. This is why it's called K-means, as it refers to dividing data into K
clusters and finding the mean of the points within each cluster.

Summary:
K = Number of clusters
Means = The average (mean) of the points in each cluster
So, K-means is simply the algorithm that divides data into K clusters, where each cluster
is represented by its centroid (the mean of the data points in that cluster).

FAQ: What is Hierarchical Clustering?


Hierarchical Clustering is a type of clustering algorithm that builds a hierarchy of
clusters by merging or splitting existing clusters.

How Hierarchical Clustering Works:


Here's a step-by-step explanation:

1. Start with individual data points: Each data point is treated as a separate
cluster.
2. Calculate distances: Calculate the distance between each pair of clusters.
3. Merge closest clusters: Merge the two closest clusters into a new cluster.
4. Repeat steps 2-3: Continue calculating distances and merging clusters until all
clusters are merged into a single cluster.
5. Build the hierarchy: The resulting hierarchy of clusters is represented as a
dendrogram (a tree-like diagram).


Example:
Let's say we have 5 data points: A, B, C, D, and E.

Step 1: Each data point is treated as a separate cluster. A, B, C, D, E

Step 2: Calculate distances between each pair of clusters.

| | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| A | 0 | 2 | 4 | 6 | 8 |
| B | 2 | 0 | 3 | 5 | 7 |
| C | 4 | 3 | 0 | 2 | 4 |
| D | 6 | 5 | 2 | 0 | 3 |
| E | 8 | 7 | 4 | 3 | 0 |

Step 3: Merge closest clusters.

A and B are closest (distance 2), so merge them into a new cluster (AB). Remaining clusters: AB, C, D, E.

Step 4: Repeat steps 2-3.

Calculate distances between each pair of clusters.

| | AB | C | D | E |
| --- | --- | --- | --- | --- |
| AB | 0 | 3 | 5 | 7 |
| C | 3 | 0 | 2 | 4 |
| D | 5 | 2 | 0 | 3 |
| E | 7 | 4 | 3 | 0 |

Merge closest clusters.


C and D are closest (distance 2), so merge them into a new cluster (CD). Remaining
clusters: AB, CD, E.

Repeat steps 2-3 until all clusters are merged into a single cluster. The resulting
hierarchy merges AB, CD, and E step by step into one root cluster.

This hierarchy can be represented as a dendrogram:

A ──┐
    ├── AB ──────────────┐
B ──┘                    │
                         ├── ABCDE
C ──┐                    │
    ├── CD ──┐           │
D ──┘        ├── CDE ────┘
E ───────────┘

This dendrogram shows the hierarchy of clusters, with the most similar clusters
merged first.


# Use Cases of Hierarchical Clustering

1. Customer Segmentation
A company wants to segment its customers based on their buying
behavior. They collect data on customer demographics, purchase history,
and browsing behavior.

Hierarchical Clustering can be used to group customers into clusters based on their
similarities. For example:

- Cluster 1: Young adults who frequently buy electronics


- Cluster 2: Middle-aged women who often purchase clothing
- Cluster 3: Retirees who mostly buy groceries

2. Gene Expression Analysis


Researchers want to identify patterns in gene expression data from different
tissue samples.

Hierarchical Clustering can be used to group genes with similar expression
profiles. For example:

- Cluster 1: Genes involved in cell proliferation


- Cluster 2: Genes related to immune response
- Cluster 3: Genes involved in metabolism

3. Image Segmentation
A computer vision system wants to segment an image into regions based on pixel
similarities.

Hierarchical Clustering can be used to group pixels into clusters based on their
color, texture, and intensity features. For example:

- Cluster 1: Pixels belonging to the sky


- Cluster 2: Pixels belonging to the trees
- Cluster 3: Pixels belonging to the buildings

4. Text Classification


A text analysis system wants to classify documents into categories based on their
content.

Hierarchical Clustering can be used to group documents into clusters based on their
semantic similarities. For example:

- Cluster 1: Documents related to sports


- Cluster 2: Documents related to politics
- Cluster 3: Documents related to entertainment

Example 1: Customer Groups


Imagine you own a coffee shop. You want to group your customers based on their
preferences.

You collect data on:

- What type of coffee they like (latte, cappuccino, etc.)


- How often they visit
- What time of day they usually come

Hierarchical Clustering helps you group customers into clusters based on their
similarities. For example:

- Cluster 1: Morning latte lovers


- Cluster 2: Afternoon cappuccino fans
- Cluster 3: Weekend coffee enthusiasts

Example 2: Music Playlist


Imagine you have a music library with thousands of songs. You want to group
similar songs together.

You collect data on:

- Song genre (pop, rock, hip-hop, etc.)


- Song tempo (fast, slow, etc.)


Hierarchical Clustering helps you group songs into clusters based on their
similarities. For example:

- Cluster 1: Upbeat pop songs


- Cluster 2: Slow rock ballads
- Cluster 3: Happy hip-hop tracks

These examples illustrate how Hierarchical Clustering can help group similar
things together based on their characteristics.

No genes or image segmentation involved!


UNIT –IV

Why Reinforcement Learning?


Reinforcement Learning (RL) is a type of machine learning that involves training an
agent to take actions in an environment to maximize a reward signal. RL is useful when:

- The problem is complex and difficult to model.


- The agent needs to learn from trial and error.
- The goal is to optimize a long-term reward.

Example:
A self-driving car needs to learn how to navigate through a city. The car receives a
reward signal for reaching its destination safely and efficiently. Through trial and error,
the car learns to take actions (e.g., turn left, accelerate) to maximize the reward signal.

Elements of Reinforcement Learning


1. Agent: The entity that takes actions in the environment.
2. Environment: The external world that the agent interacts with.
3. Actions: The decisions made by the agent.
4. States: The current situation of the environment.
5. Reward: The feedback signal received by the agent for taking an action.

Exploration vs Exploitation Dilemma


The agent needs to balance two competing goals:

1. Exploration: Gathering information about the environment to learn the optimal
policy.
2. Exploitation: Using the current knowledge to take actions that maximize the
reward.


Example:
A person wants to find the best restaurant in town. They can either try a new restaurant
(exploration) or go to a familiar restaurant that they know is good (exploitation).

Epsilon Greedy Algorithm


A simple algorithm to balance exploration and exploitation:

1. Choose a random action with probability ε (exploration).


2. Choose the action with the highest estimated value with probability (1 - ε)
(exploitation).

Example:
A person wants to play a game where they can choose between two buttons. They use an
ε-greedy algorithm with ε = 0.1. This means that 10% of the time, they will choose a
random button, and 90% of the time, they will choose the button that they think will
give them the highest reward.
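A minimal NumPy sketch of this two-button example (a two-armed bandit); the true reward probabilities are
invented for illustration and are hidden from the agent:

import numpy as np

rng = np.random.default_rng(0)

true_reward_prob = [0.3, 0.7]   # hidden payout probability of each button
Q = np.zeros(2)                 # estimated value of each button
counts = np.zeros(2)            # presses of each button
epsilon = 0.1

for step in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(2))   # explore: press a random button
    else:
        action = int(np.argmax(Q))      # exploit: press the best-known button
    reward = float(rng.random() < true_reward_prob[action])
    counts[action] += 1
    # Incremental average keeps Q[action] equal to the mean observed reward
    Q[action] += (reward - Q[action]) / counts[action]

print("Estimated button values:", Q.round(2))   # should approach [0.3, 0.7]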

Markov Decision Process (MDP)


A mathematical framework for modeling decision-making problems:

1. States: The current situation of the environment.


2. Actions: The decisions made by the agent.
3. Transitions: The probability of moving from one state to another.
4. Rewards: The feedback signal received by the agent for taking an action.

Example:
A person wants to navigate a grid world. The states are the different cells in the grid, the
actions are up, down, left, and right, the transitions are the probabilities of moving from
one cell to another, and the rewards are the feedback signals received for reaching
certain cells.

Q-values and V-values


Two important concepts in RL:

1. Q-values: The expected return for taking an action in a state.


2. V-values: The expected return for being in a state.


Example:
A person wants to play a game where they can choose between two buttons. The
Q-values represent the expected return for pressing each button in each state, and
the V-values represent the expected return for being in each state.

Q-Learning
A popular RL algorithm:

1. Initialize Q-values for each state-action pair.


2. Choose an action using an ε-greedy algorithm.
3. Take the action and observe the next state and reward.
4. Update the Q-values using the Q-learning update rule.

Example:
A person wants to play a game where they can choose between two buttons. They use Q-
learning to learn the optimal policy. The Q-learning update rule updates the Q-values
based on the observed reward and the next state.

α values
The learning rate in Q-learning:

1. α: The step size for updating the Q-values.


2. α ∈ [0, 1]: The learning rate should be between 0 and 1.

Example:
A person wants to play a game where they can choose between two buttons. They
use Q-learning with an α value of 0.1. This means that the Q-values will be
updated slowly, and the agent will learn the optimal policy gradually.


1. Epsilon Greedy Algorithm


The Epsilon-Greedy Algorithm is a popular exploration-exploitation strategy in
reinforcement learning. The algorithm is used to balance exploration (choosing random
actions to discover the best ones) and exploitation (choosing the best-known action
based on current knowledge).

How it works:

- Greedy approach: With probability 1 − ε, the algorithm chooses the action that has
the highest estimated reward (the one with the maximum Q-value).
- Exploration approach: With probability ε, the algorithm chooses a random action,
allowing the agent to explore other possible actions.

Formula:

- Choose action a = argmax_a Q(s, a) with probability 1 − ε (exploitation).
- Choose action a = random action with probability ε (exploration).

Example:

Let's say we are using the epsilon-greedy algorithm in a simple 2-state environment:

- States S1 and S2.
- Possible actions: A1 and A2.
- Initially, both Q-values Q(S1, A1) and Q(S1, A2) are zero.

If ε = 0.1 (i.e., 10% exploration):

- 90% of the time, the agent will pick the action with the highest Q-value.

- 10% of the time, the agent will pick a random action.

2. Markov Decision Process (MDP)


A Markov Decision Process (MDP) is a mathematical framework for modeling
decision-making situations. It provides a formal way to describe an environment in
reinforcement learning where an agent interacts with it.

Components of MDP:

1. States S: All possible situations or configurations the agent can be in.
2. Actions A: The set of actions the agent can take.
3. Transition Model P: The probability P(s′ | s, a) that the agent will transition
from state s to state s′ when action a is taken.
4. Reward Function R(s, a): The immediate reward received when action a is taken
in state s.
5. Discount Factor γ: A factor that discounts future rewards (0 ≤ γ ≤ 1).
6. Policy π: A strategy or function that determines the action to take based on
the current state.

Example:

In a grid world MDP:

- States: Each position on the grid (e.g., S1, S2, etc.)
- Actions: Moving up, down, left, right.
- Transition Model: For example, moving right in state S1 may lead to S2.
- Rewards: Moving to a particular state might give a reward (e.g., +1 for reaching
a goal state).
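A minimal sketch of how such an MDP can be written down as plain Python data structures; the state names,
probabilities, and rewards below are invented for illustration, and the final loop performs one sweep of the
value update V(s) = max_a [R(s, a) + γ Σ P(s′ | s, a) V(s′)]:

# Tiny illustrative MDP: a 3-state corridor where S3 is the goal
states = ["S1", "S2", "S3"]
actions = ["left", "right"]

# Transition model: P[s][a] is a list of (next_state, probability) pairs
P = {
    "S1": {"left": [("S1", 1.0)], "right": [("S2", 1.0)]},
    "S2": {"left": [("S1", 1.0)], "right": [("S3", 0.8), ("S2", 0.2)]},
    "S3": {"left": [("S3", 1.0)], "right": [("S3", 1.0)]},
}

# Reward function R[s][a]: +1 for the move that usually reaches the goal
R = {
    "S1": {"left": 0.0, "right": 0.0},
    "S2": {"left": 0.0, "right": 1.0},
    "S3": {"left": 0.0, "right": 0.0},
}

gamma = 0.9  # discount factor

# One sweep of the Bellman optimality update over all states
V = {s: 0.0 for s in states}
for s in states:
    V[s] = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in actions)
print(V)   # after one sweep, S2 already values the rewarded move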

3. Q-Values and V-Values


In reinforcement learning, Q-values and V-values are important for evaluating the
desirability of states and state-action pairs.


Q-Value:
The Q-value represents the expected future reward (return) when starting from a state
s and taking an action a, and then following a certain policy. It is a key part of
Q-learning.

- Q(s, a): The expected cumulative reward for taking action a in state s and
following the policy thereafter.

V-Value:

The V-value represents the value of a state, which is the expected return when starting
in that state and following a policy.

- V(s): The expected cumulative reward starting in state s and following the policy.

Relationship:

- The Q-value is more specific and action-dependent.
- The V-value is a summary of the values of all actions from a state, i.e.,
V(s) = max_a Q(s, a).

4. Q-Learning
Q-learning is a model-free reinforcement learning algorithm used to find the optimal
policy in an MDP. It learns the Q-values through interactions with the environment.

Example:

Consider an agent in a 3-state environment (S1, S2, S3) where:

- The agent starts in S1, takes action A1, and moves to S2, receiving a reward of 5.
- The Q-value update rule will adjust Q(S1, A1) to account for this reward.


Steps:

1. Initialize all Q-values to 0.


2. At each step, choose an action using epsilon-greedy.
3. Update Q-values based on the reward and next state.
4. Repeat for many episodes until convergence.
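A minimal sketch of these steps on a small, invented chain environment with three states; the update inside the
loop is the standard Q-learning rule Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, ·) − Q(s, a)]:

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 3, 2          # states 0..2; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    # Illustrative environment: reward 5 for entering the goal state (state 2)
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 5.0 if (s_next == n_states - 1 and s != n_states - 1) else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Q-learning update rule
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q.round(2))   # the "right" column should dominate along the chain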

5. α (Alpha) Value (Learning Rate)


The learning rate α controls how much the new information will influence the
existing Q-value.

Impact of α:

- High α (close to 1): The agent rapidly incorporates new information, giving less
weight to past experiences.
- Low α (close to 0): The agent takes smaller steps and values previous experiences
more than new ones.

Conclusion

- Epsilon-Greedy: Balances exploration and exploitation.
- MDP: A mathematical framework for modeling decision processes.
- Q-values: Represent the expected future rewards for state-action pairs.
- V-values: Represent the value of states.
- Q-Learning: A model-free reinforcement learning algorithm to find the optimal
policy.
- α: The learning rate that influences how quickly the agent learns from new
experiences.


IMPORTANT QUESTIONS OF UNIT-3,4

a. Machine Learning?

b. Hierarchical Clustering?

c. What is a random forest?

d. Define a confusion matrix.

e. Differences between grid search and random search?

f. Markov Decision Process?

g. Reinforcement Learning?

h. Supervised Learning?

i. Unsupervised Learning?

j. Use-cases of Machine Learning?


Q2. Differentiate between unsupervised learning and reinforcement learning with the help
of suitable examples.

Q3. How are Decision Trees used for classification? Explain the steps of constructing a
decision tree with the help of an example.

Q4. Discuss the working of Naive Bayes classifier in detail. List the advantages and
disadvantages of this technique.

Q5. What is the exploration-exploitation dilemma in reinforcement learning? How is the Epsilon Greedy
policy used to choose between exploration and exploitation?

Q6. Explain why Reinforcement Learning is required, with an example. Write the elements of
Reinforcement Learning.

Q7. Give examples of Machine Learning in computer applications. Include justification
for each.

Unit-4 Notes

1. What is Reinforcement Learning (RL)? How does it differ from supervised and unsupervised
learning?

2. Why is reinforcement learning important, and in which types of problems is it particularly
useful?

3. What are the key components of a reinforcement learning problem? Explain with examples.

4. Explain the concept of an agent, environment, reward, and action in RL.

5. What is a policy in reinforcement learning, and how does it guide an agent's behavior?

6. How does the concept of delayed rewards affect the learning process in reinforcement learning?


Elements of Reinforcement Learning

1. What are the main elements of a reinforcement learning system? Explain the roles of the
agent, environment, actions, and rewards.

2. What is the role of a value function in reinforcement learning, and how does it
relate to decision-making?

3. How does the concept of state and action influence the decision-making process
in reinforcement learning?

4. Explain the difference between model-free and model-based reinforcement learning.

5. What is a reward function, and how does it shape the behavior of the agent in
reinforcement learning?
