AI Algorithm Course
Linear regression is a fundamental statistical and machine learning technique used to model
the relationship between a dependent variable y and one or more independent variables x.
It assumes that there is a linear relationship between the input(s) and the output.
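In simple linear regression, the relationship between the independent variable x and the
dependent variable y is modeled as:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error term.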
The goal of linear regression is to find the best-fit line that minimizes the difference between
the predicted values and the actual values. The difference is measured by the residual sum of
squares (RSS).
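• Ordinary Least Squares (OLS). This method minimizes the sum of squared residuals:
$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $\hat{y}_i$ is the model's prediction for observation $i$.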
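The least-squares fit described above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: the toy data and the variable names (`x`, `y`, `X`, `beta`) are assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data (assumed for illustration): y is roughly 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

# Design matrix: a column of ones for the intercept, then the feature.
X = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem, i.e. minimize the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Residual sum of squares of the fitted line.
y_hat = X @ beta
rss = float(np.sum((y - y_hat) ** 2))

print(f"intercept={intercept:.2f}, slope={slope:.2f}, RSS={rss:.4f}")
```

Using `np.linalg.lstsq` avoids forming and inverting the normal-equation matrix explicitly, which tends to be numerically safer.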
1. Linearity: There is a linear relationship between the dependent and independent variables.
2. Independence: The residuals (errors) are independent of each other.
3. Homoscedasticity: The variance of the residuals is constant for all levels of the independent
variable(s).
4. Normality: The residuals are normally distributed.
6. Evaluation Metrics
To assess the performance of a linear regression model, several evaluation metrics are used:
POPULAR AI ALGORITHMS AND THEIR METRICS
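• Root Mean Squared Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
• R-squared ($R^2$): The proportion of the variance in the dependent variable that is predictable
from the independent variables:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of the observed values.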
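As a small worked example, RMSE and R-squared can be computed directly from their definitions. The arrays `y_true` and `y_pred` below are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical observed values and model predictions (assumed for illustration).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

# Residual sum of squares and total sum of squares.
rss = np.sum((y_true - y_pred) ** 2)
tss = np.sum((y_true - y_true.mean()) ** 2)

# RMSE is the square root of the mean squared error.
rmse = float(np.sqrt(rss / len(y_true)))

# R-squared is one minus the fraction of variance left unexplained.
r2 = float(1.0 - rss / tss)

print(f"RMSE={rmse:.4f}, R^2={r2:.3f}")
```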
7. Regularization Techniques
Linear regression can be prone to overfitting, especially when there are many features. Two
common regularization techniques are:
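1. Ridge Regression (L2 regularization): Adds a penalty term to the loss function:
$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
2. Lasso Regression (L1 regularization): Adds a penalty term that uses the absolute values of
the coefficients:
$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
In both cases, $\lambda \ge 0$ controls the strength of the penalty.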
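To see the effect of the two penalties, the sketch below fits ordinary, Ridge, and Lasso regression with scikit-learn on synthetic data. The dataset and the `alpha` values (scikit-learn's name for the penalty strength) are arbitrary assumptions, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumed): 20 features, only 5 of which carry signal.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=1.0).fit(X, y)    # L1 penalty can set coefficients to zero

# Compare coefficient sizes across the three models.
ols_l2 = np.linalg.norm(ols.coef_)
ridge_l2 = np.linalg.norm(ridge.coef_)
ols_l1 = np.abs(ols.coef_).sum()
lasso_l1 = np.abs(lasso.coef_).sum()
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))

print(f"L2 norm: OLS={ols_l2:.2f}, Ridge={ridge_l2:.2f}")
print(f"L1 norm: OLS={ols_l1:.2f}, Lasso={lasso_l1:.2f}")
print(f"Lasso coefficients set exactly to zero: {n_zero_lasso}")
```

In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.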
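Gradient descent is a common optimization algorithm used to minimize the MSE. The update
rule for the coefficients is:
$$\beta_j := \beta_j - \alpha \frac{\partial\, \mathrm{MSE}}{\partial \beta_j}$$
where $\alpha$ is the learning rate.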
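For large datasets or models with many features, the closed-form OLS solution can be
computationally expensive. In such cases, gradient descent is used to iteratively find the
optimal values of the coefficients.
For each feature $x_j$, the update rule for gradient descent is:
$$\beta_j := \beta_j + \alpha \cdot \frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_{ij}$$
where $\alpha$ is the learning rate, $n$ is the number of observations, and $x_{ij}$ is the value of feature $j$ for observation $i$.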
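A minimal sketch of batch gradient descent for linear regression on the MSE follows. The toy data, the learning rate of 0.05, and the iteration count are assumptions chosen so that this small example converges; real data generally needs feature scaling and a tuned learning rate.

```python
import numpy as np

# Hypothetical toy data (assumed for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature

def gradient_descent(X, y, learning_rate=0.05, n_iters=5000):
    """Minimize the mean squared error with batch gradient descent."""
    n = len(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residuals = y - X @ beta
        # Gradient of MSE with respect to beta: -(2/n) * X^T (y - X beta).
        gradient = -(2.0 / n) * (X.T @ residuals)
        beta = beta - learning_rate * gradient
    return beta

beta = gradient_descent(X, y)
print(f"intercept={beta[0]:.3f}, slope={beta[1]:.3f}")
```

On this data the result should agree with the closed-form least-squares solution up to numerical tolerance.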
10. Quiz Questions
Answer: D) Multicollinearity
Problem 3: Regularization
Given Data:
Question:
Solution:
• Use the formula for Ridge regression to compute the coefficients. The model will have less
variance compared to standard linear regression.
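Because the problem's data is not reproduced here, the sketch below uses made-up numbers to show how the closed-form Ridge solution, $\beta = (X^\top X + \lambda I)^{-1} X^\top y$ on centered data, could be computed. The function name `ridge_closed_form` and the value `lam=10.0` are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical data (assumed; not the data from the problem statement).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 4.2, 11.0, 9.9, 15.2])

def ridge_closed_form(X, y, lam):
    """Closed-form Ridge fit; data is centered so the intercept is not penalized."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    penalty = lam * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - X_mean @ coef
    return intercept, coef

_, coef_ols = ridge_closed_form(X, y, lam=0.0)     # lam = 0 reduces to OLS
_, coef_ridge = ridge_closed_form(X, y, lam=10.0)  # lam > 0 shrinks coefficients

print("OLS coefficients:  ", np.round(coef_ols, 3))
print("Ridge coefficients:", np.round(coef_ridge, 3))
```

The shrunken coefficients are what reduce the model's variance, at the cost of some added bias.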
K-Means is one of the most widely used unsupervised machine learning algorithms for
clustering tasks. The goal of the K-Means algorithm is to partition a given dataset into K
clusters, where each data point belongs to the cluster with the nearest mean.
It is widely used for clustering analysis, pattern recognition, and data mining applications.
This course provides a detailed overview of the K-Means algorithm, its working mechanism,
mathematical foundations, and applications. Quizzes, exercises, and problems are included
to reinforce the concepts.
1. Introduction to Clustering
What is Clustering?
Clustering is a technique used in machine learning where similar data points are grouped
together into a cluster, and different clusters are as dissimilar as possible. It is an
unsupervised learning task, meaning there are no labels or categories provided for the data
points.
The K-Means algorithm is a partitional clustering technique where the goal is to divide a
dataset into K distinct clusters. Each cluster is represented by its centroid, which is the
mean of all the points in that cluster.
Key Concepts:
2. Steps in K-Means Algorithm
1. Initialize Centroids:
o Choose K initial centroids randomly or using some heuristic method (e.g.,
K-Means++).
2. Assign Points to the Nearest Centroid:
o For each data point, assign it to the cluster whose centroid is closest. The
distance is usually measured by Euclidean distance.
3. Recompute Centroids:
o After assigning all points to clusters, compute the new centroids by calculating
the mean of all points assigned to each cluster.
4. Repeat:
o Repeat the process of assigning points and updating centroids until the
centroids no longer change (i.e., convergence) or until a maximum number of
iterations is reached.
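The four steps above can be sketched in NumPy as follows. This is a teaching sketch under simple assumptions (random initialization from the data points, Euclidean distance, a hypothetical six-point dataset), not a replacement for a library implementation.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain K-Means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two small, well-separated groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = kmeans(X, k=2)
print("labels:", labels)
print("centroids:\n", centroids)
```

Which cluster receives index 0 or 1 depends on the random initialization, so only the grouping, not the label values, is meaningful.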
3. Mathematical Formula
• Euclidean Distance: The distance between a data point $x_i$ and the centroid
$c_k$ of a cluster $k$ is given by the Euclidean distance formula:
$$d(x_i, c_k) = \sqrt{\sum_{j=1}^{m} (x_{ij} - c_{kj})^2}$$
where $m$ is the number of features.
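As a quick numeric check of the distance formula, the snippet below computes the Euclidean distance between one point and a centroid, then picks the nearest of several centroids. All coordinates are made-up values.

```python
import numpy as np

# Hypothetical point and centroid (assumed for illustration).
x_i = np.array([1.0, 2.0])
c_k = np.array([4.0, 6.0])

# Square root of the summed squared coordinate differences.
distance = float(np.sqrt(np.sum((x_i - c_k) ** 2)))

# The same point compared against several candidate centroids.
centroids = np.array([[4.0, 6.0], [0.0, 2.0], [10.0, 10.0]])
all_distances = np.linalg.norm(centroids - x_i, axis=1)
nearest = int(all_distances.argmin())

print(f"distance={distance}, nearest centroid index={nearest}")
```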
5. Convergence Criteria
• Elbow Method: Plot the sum of squared distances (inertia) as a function of K. The
"elbow" point indicates the optimal number of clusters.
• Silhouette Score: Measures how similar a point is to its own cluster compared to other
clusters. Higher silhouette scores suggest better clustering.
• Gap Statistic: Compares the within-cluster dispersion of the clustering against the
dispersion expected under a random reference distribution to determine the best K.
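The elbow method and the silhouette score can be tried out with scikit-learn as sketched below. The synthetic blobs (three deliberately well-separated centers) and the range of K values are assumptions for the demonstration; the gap statistic is not included.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumed): three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=0)

inertias = {}
silhouettes = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_          # within-cluster sum of squares
    if k >= 2:                            # silhouette needs at least 2 clusters
        silhouettes[k] = silhouette_score(X, model.labels_)

# The K with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes.get(k, float("nan")), 3))
print("best K by silhouette:", best_k)
```

Plotting `inertias` against K (for example with matplotlib) gives the elbow curve; on real data the elbow is often less sharp than in this synthetic case.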
7. Applications of K-Means
• Image Compression: Reducing the number of colors used in an image by clustering pixel
colors.
• Anomaly Detection: Identifying outliers by finding data points that do not fit well into any
cluster.
• Document Clustering: Grouping similar documents based on text data.
Advantages:
1. Scalability: K-Means can be efficient and scalable, particularly with large datasets.
2. Simple to Understand and Implement: It’s one of the most intuitive clustering algorithms.
3. Works well when clusters are spherical and well-separated.
Limitations:
3. Plot the dataset and visualize the clusters and their centroids.
1. Given a dataset of 100 points, how many distance calculations are required in the first
iteration of K-Means with K = 5?
2. If you use the K-Means algorithm with K = 3 on a 2D dataset, how will the algorithm
handle overlapping clusters?
9. Problems
• You are given a dataset of customer purchase history and need to segment customers into
distinct groups. How would you determine the optimal number of clusters for K-Means?
Explain the steps and the methods you would use.
• The K-Means algorithm tends to perform poorly on non-spherical or non-convex clusters. How
would you address this limitation in practice? Suggest an alternative clustering algorithm that
could handle non-spherical clusters.
Scenario: A retail company wants to segment its customer base into different clusters based
on purchasing behavior. They have a dataset of 1000 customers, each described by their
purchase history across 5 different categories: electronics, clothing, food, entertainment, and
home goods.
Steps:
1. Data Preprocessing:
o Normalize the data (since the scales of the features might vary).
o Handle missing values (if any).
2. Applying K-Means:
o Use the elbow method to determine the optimal number of clusters, K.
o Run K-Means clustering with K chosen from the elbow plot.
o Assign each customer to a cluster.
3. Analyzing Results:
o Visualize the clusters using PCA (Principal Component Analysis) to reduce the
dimensionality.
o Interpret the characteristics of each cluster. For example, a cluster might represent
"high spenders" or "price-sensitive shoppers".
4. Actionable Insights:
o Use the segmentation to target marketing campaigns or design personalized offers for
each customer segment.
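The workflow above might look like the following sketch. No real customer data is available here, so the spend matrix is randomly generated, and `k = 4` is an assumed value standing in for whatever the elbow plot would suggest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

categories = ["electronics", "clothing", "food", "entertainment", "home goods"]

# Hypothetical spend per customer and category (random stand-in data, no missing values).
rng = np.random.default_rng(0)
spend = rng.gamma(shape=2.0, scale=100.0, size=(1000, len(categories)))

# 1. Preprocessing: put all categories on a comparable scale.
scaled = StandardScaler().fit_transform(spend)

# 2. Clustering: k is assumed here; in practice take it from the elbow plot.
k = 4
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
labels = kmeans.labels_

# 3. Analysis: project to 2D for plotting and profile each segment.
coords = PCA(n_components=2).fit_transform(scaled)
for cluster in range(k):
    mean_spend = spend[labels == cluster].mean(axis=0)
    print(cluster, dict(zip(categories, mean_spend.round(1))))
```

The `coords` array can be passed to a scatter plot colored by `labels` to visualize the segments.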
• Use the Elbow Method to plot the total within-cluster sum of squares (inertia) as a function of
K. The point where the decrease in inertia slows down suggests a good number of clusters.
• Additionally, use the Silhouette Score to confirm the choice of K. A higher silhouette score
indicates well-separated clusters.
• K-Means assumes spherical clusters. If the data has non-spherical clusters (e.g., elongated or
concentric shapes), K-Means might perform poorly.
• Alternative:
o Use DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which can
find arbitrarily shaped clusters and is robust to noise.
o Another option is Gaussian Mixture Models (GMM), which assumes that the data is a
mixture of several Gaussian distributions and can capture elliptical clusters.
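To illustrate the point about non-spherical clusters, the sketch below compares K-Means and DBSCAN on two concentric rings. The dataset and the DBSCAN settings (`eps=0.2`, `min_samples=5`) are assumptions tuned for this synthetic example only; Gaussian Mixture Models are not shown.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Synthetic concentric rings (assumed): a shape K-Means cannot separate.
X, y_true = make_circles(n_samples=400, factor=0.3, noise=0.03, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true rings.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)

print(f"K-Means ARI: {ari_kmeans:.3f}")
print(f"DBSCAN ARI:  {ari_dbscan:.3f}")
```

DBSCAN's `eps` is sensitive to the scale of the data, so it usually needs tuning after normalization.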
• K-Means++ Initialization:
o To improve the initialization of centroids and reduce the likelihood of poor
convergence, K-Means++ spreads out the initial centroids by selecting them one by
one, with probability proportional to their squared distance from the nearest already
chosen centroid.
• Mini-Batch K-Means:
o A variation of K-Means that works on small random subsets (mini-batches) of the data
at each iteration, speeding up convergence for large datasets.
• K-Medoids:
o Unlike K-Means, which uses the mean of data points as the centroid, K-Medoids uses
actual data points as the center of the cluster. It is more robust to noise and outliers.
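The K-Means++ seeding rule described above can be sketched in a few lines of NumPy. The function name `kmeans_pp_init` and the toy data are hypothetical; scikit-learn's `KMeans` already applies this style of initialization by default (`init="k-means++"`).

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Choose k initial centroids from X using K-Means++ style seeding."""
    rng = np.random.default_rng(seed)
    # First centroid: one data point chosen uniformly at random.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Squared distance from each point to its nearest chosen centroid.
        sq_dists = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2)
        d2 = sq_dists.min(axis=1)
        # Sample the next centroid with probability proportional to d2
        # (assumes the points are not all identical, so d2.sum() > 0).
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

# Hypothetical toy data: two small groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
init = kmeans_pp_init(X, k=2)
print(init)
```

Points already chosen have zero distance to themselves, so they cannot be selected twice.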
Decision Tree Algorithm for AI
A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It is a tree-like structure where:
Decision trees are popular because they are easy to interpret and visualize, making them
useful in decision-making processes.
1. Root Node: The top node of the tree, where the dataset is split based on a feature.
2. Internal Nodes: Nodes that represent features or attributes; the decision points in the tree.
3. Leaf Nodes: The final nodes of the tree that represent a decision or classification output.
4. Edges/Branches: Represent the decision rules or conditions that split the data.
1. Choosing a Feature to Split on: The key idea in decision trees is to select a feature that
maximizes the information gain or minimizes the impurity (based on criteria like Gini
index or Entropy for classification, or variance reduction for regression).
2. Recursive Splitting: The tree is built recursively by selecting the best feature at each
step and splitting the dataset. This process is repeated for each child node.
3. Stopping Criteria: The recursion stops when:
o A stopping condition is met (e.g., a maximum tree depth or a minimum number of
samples in a node).
o All data points in a node belong to the same class.
o The feature set is exhausted.
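Information Gain: the reduction in entropy obtained by splitting a set $S$ on a feature $A$:
$$H(S) = -\sum_{c} p_c \log_2 p_c, \qquad IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} H(S_v)$$
where $p_c$ is the proportion of samples in $S$ that belong to class $c$, and $S_v$ is the subset of $S$ where $A$ takes the value $v$.
b) Gini Index
Gini Impurity: the probability of misclassifying a randomly chosen sample if it were labeled
according to the class distribution of the node:
$$\mathrm{Gini}(S) = 1 - \sum_{c} p_c^2$$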
c) Chi-square
• The Chi-square test can also be used as a splitting criterion to check for independence
between features and the target variable.
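A short sketch of how entropy, Gini impurity, and information gain could be computed for a candidate split follows. The helper names and the four-sample label set are assumptions for illustration; the Chi-square criterion is not covered.

```python
import numpy as np

def class_proportions(labels):
    """Proportion of each class among the labels of a node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def entropy(labels):
    p = class_proportions(labels)
    return float(np.sum(p * np.log2(1.0 / p)))

def gini(labels):
    p = class_proportions(labels)
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with a perfectly separating split.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]

print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("information gain:", information_gain(parent, children))
```

A split that leaves the class mix unchanged in every child would give an information gain of zero.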
6. Types of Decision Trees
1. Classification Trees: Used for predicting categorical outcomes. The target variable is
discrete (e.g., Yes/No, Spam/Not Spam).
2. Regression Trees: Used for predicting continuous outcomes (e.g., predicting house
prices, temperature).
A decision tree can become too complex, leading to overfitting. Pruning helps simplify the
tree by removing nodes that have little predictive power.
1. Pre-Pruning: Stop the tree-building process early (e.g., limit tree depth, minimum number of
samples in a node).
2. Post-Pruning: Remove nodes after the tree has been fully grown (e.g., using cross-validation).
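Both pruning styles can be tried in scikit-learn, as sketched below on the Iris dataset. The specific values (`max_depth=3`, `min_samples_leaf=5`, `ccp_alpha=0.02`) are arbitrary assumptions; scikit-learn's post-pruning is cost-complexity pruning, and its `ccp_alpha` would normally be selected with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Unpruned tree grown until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: stop growth early with depth and leaf-size limits.
pre_pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42).fit(X_train, y_train)

# Post-pruning: grow fully, then prune by cost-complexity (ccp_alpha).
post_pruned = DecisionTreeClassifier(
    ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

for name, model in [("full", full_tree), ("pre-pruned", pre_pruned),
                    ("post-pruned", post_pruned)]:
    print(name, model.get_depth(), model.get_n_leaves(),
          round(model.score(X_test, y_test), 3))
```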
• Overfitting: Decision trees can easily overfit the data, especially if the tree is too deep.
• Instability: Small changes in the data can result in a very different tree structure.
• Bias toward features with more levels: Decision trees tend to favor features with more
categories or levels.
Answers:
1. A decision tree is used to make predictions based on feature data, classifying data (for
classification) or predicting continuous values (for regression).
2. Root node, internal nodes, leaf nodes.
3. Gini Index measures impurity of a node, while Information Gain measures the reduction in
entropy. Gini is used in CART, while IG is used in ID3.
4. Overfitting happens when the model is too complex and fits noise in the data. Pruning,
limiting tree depth, or setting a minimum number of samples in a node helps prevent
overfitting.
5. Classification trees predict categorical outcomes, while regression trees predict continuous
values.
Problem: Given a dataset of customers (Age, Income, Purchased Product), build a decision
tree to predict whether a customer will purchase a product (Yes/No).
Steps:
Solution:
• Use a tool like Scikit-learn in Python to build a decision tree. Alternatively, you can use a
decision tree diagram on paper to represent the splits.
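from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load Iris as a stand-in dataset and hold out a test set (split settings are assumed)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the decision tree classifier
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Predictions
predictions = clf.predict(X_test)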
Problem: Visualize the decision tree you created in the previous exercise.
Solution:
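import matplotlib.pyplot as plt
from sklearn import tree
tree.plot_tree(clf, filled=True)  # clf is the classifier fitted in the previous exercise
plt.show()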
Answers:
1. Overfitting occurs when a decision tree learns patterns that do not generalize well to unseen
data, causing poor performance on new data.
2. Pruning (pre-pruning or post-pruning) and limiting tree depth or the minimum number of
samples in a leaf node.
3. Post-pruning involves trimming branches from a fully grown tree that do not significantly
improve prediction accuracy.
Advanced Problems
Problem 1: Implement a decision tree for regression using the mean squared error (MSE)
criterion.
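from sklearn.tree import DecisionTreeRegressor
# MSE criterion is "squared_error" in scikit-learn >= 1.0; X_train, y_train are assumed to hold a regression split
regressor = DecisionTreeRegressor(criterion="squared_error", random_state=42)
regressor.fit(X_train, y_train)
# Predictions
regressor_predictions = regressor.predict(X_test)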
from sklearn.model_selection import cross_val_score
# Perform cross-validation
scores = cross_val_score(clf, X_train, y_train