ML-Notes

The document provides an overview of various machine learning algorithms, including Decision Trees, Support Vector Machines (SVM), K-Means Clustering, Hierarchical Clustering, and Dimensionality Reduction techniques. It explains the structure and functioning of Decision Trees, the principles behind SVM, and methods for clustering and reducing dimensionality in datasets. Key concepts such as overfitting, splitting criteria, and feature extraction methods like PCA are also discussed.


DECISION TREES

Overview

 A Decision Tree is a supervised learning algorithm used for classification and regression tasks.

 It has a hierarchical structure:

o Root node: starting point of the tree.

o Internal nodes: represent decisions based on feature tests.

o Branches: represent the outcomes of those tests.

o Leaf nodes: represent final class labels or predictions.

Working Mechanism

 The algorithm starts at the root.

 It tests attributes at each internal node.

 Based on test results, it follows a branch to the next node.

 Once a leaf node is reached, a prediction or class label is assigned.
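
To make the mechanism concrete, here is a minimal scikit-learn sketch; the dataset and parameter choices are illustrative, not part of the original notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset (features X, class labels y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Internal nodes test features; leaf nodes assign class labels
model = DecisionTreeClassifier(criterion="entropy", random_state=42)
model.fit(X_train, y_train)

# Prediction walks each sample from the root down to a leaf
print(model.predict(X_test[:5]))
print("Accuracy:", model.score(X_test, y_test))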

Popular Decision Tree Algorithms

 ID3 (Iterative Dichotomiser 3):

o Developed by Ross Quinlan.

o Uses Entropy and Information Gain to evaluate splits.

 C4.5:

o Successor of ID3, also by Quinlan.

o Uses Gain Ratio or Information Gain.

 CART (Classification and Regression Trees):

o Developed by Leo Breiman.

o Uses Gini Impurity as the split criterion.

Splitting Criteria

Entropy and Information Gain


 Entropy: Measures impurity or randomness in the dataset.

o Entropy = 0 if all samples belong to one class.

o Maximum entropy (1 for binary classification) occurs when samples are equally divided among classes.

 Information Gain: Measures the reduction in entropy after a dataset is split on an attribute.

o The attribute with the highest information gain is chosen for the
split.
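
A small sketch of how entropy and information gain could be computed; the example split is made up for illustration:

import numpy as np

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the classes present in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # Reduction in entropy: H(parent) minus the weighted entropy of the child subsets
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 samples by a binary attribute
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])          # equally divided -> entropy 1.0
left, right = np.array([1, 1, 1, 1, 0]), np.array([1, 0, 0, 0, 0])
print(entropy(parent))                                      # 1.0
print(information_gain(parent, [left, right]))              # ~0.28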

Gini Impurity

 Measures the probability of misclassifying a randomly chosen instance.

 Lower Gini values are better.

 Similar to entropy but computationally simpler.
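
A matching sketch for Gini impurity; the label arrays are again made up:

import numpy as np

def gini(labels):
    # G = 1 - sum(p_i^2); 0 for a pure node, 0.5 is the maximum for two classes
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])))   # 0.5, perfectly mixed
print(gini(np.array([1, 1, 1, 1, 1])))                  # 0.0, pure node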

Overfitting and Instability

 Decision trees are prone to overfitting because they can grow very
deep and model noise in the data.

 Overfitting reduces generalization performance on unseen data.

 Instability: Small changes in training data can lead to very different tree structures.

 Pruning techniques help mitigate overfitting:

o Pre-pruning: Limit tree depth or minimum samples per node (e.g., max_depth, min_samples_split).

o Post-pruning: Grow the full tree, then cut back using complexity measures (e.g., ccp_alpha in Scikit-learn).
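
A minimal sketch of both pruning styles in scikit-learn; the parameter values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning: stop growth early by limiting depth and node size
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_split=20, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: grow fully, then cut back via cost-complexity pruning
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X, y)

print("Pre-pruned depth: ", pre_pruned.get_depth())
print("Post-pruned depth:", post_pruned.get_depth())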

Random Forests

Overview

 A powerful ensemble learning method for classification and regression.
 Composed of multiple decision trees.

 Built using a technique called bagging:

o Each tree trains on a random subset of the data (with replacement).

 The final prediction is made by majority vote (classification) or mean value (regression).

Key Features

 Reduces overfitting and instability of individual trees.

 Ensures diversity among trees using:

o Bootstrapping: Sampling training data with replacement for each tree.

o Feature randomness: Randomly selecting a subset of features for splitting at each node (controlled via max_features).

Important Properties

 Works best when trees are uncorrelated.

 Aggregating predictions from uncorrelated trees improves model accuracy and robustness.
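
A short scikit-learn sketch of a random forest with bagging and feature randomness; the hyperparameter values are arbitrary examples:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# bootstrap=True: each tree trains on a bootstrap sample of the data;
# max_features="sqrt": each split considers a random subset of features
forest = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", bootstrap=True, random_state=0
)

# Final prediction aggregates the trees by majority vote
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
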
SUPPORT VECTOR MACHINES (SVM)

Overview

 SVM is a supervised learning algorithm mainly used for classification tasks.

 It seeks to find the optimal hyperplane that separates data classes with the maximum margin.

 Originally designed for linear classification, but through the kernel trick, SVM can also handle non-linear problems.

Linear SVM

 For linearly separable data, SVM looks for the line (in 2D) or
hyperplane (in higher dimensions) that best separates the classes.

 Among many possible dividing lines, SVM chooses the one that
maximizes the margin, i.e., the distance between the hyperplane
and the closest data points from each class (called support
vectors).

 This large margin leads to better generalization on unseen data.

Key Terminology

 Hyperplane: A decision boundary that separates data into classes. In linear SVM, defined by wx + b = 0.

 Support Vectors: Data points closest to the hyperplane. Crucial for defining the margin.

 Margin: The distance between the hyperplane and the support vectors. Larger margins are better.

 Hard Margin: No misclassifications allowed. Used when data is perfectly separable.

 Soft Margin: Allows some misclassification to improve generalization, especially on noisy data.

 C: A regularization parameter that controls the trade-off between margin width and classification error. High C = less tolerance for misclassification.

 Hinge Loss: The loss function used to penalize misclassified points or margin violations.

Non-linear SVM and the Kernel Trick

 When data is not linearly separable, SVM uses kernel functions to map input data to a higher-dimensional space where it can be linearly separated.

 Common kernel functions:

o Linear Kernel: No transformation (for linearly separable data).

o Polynomial Kernel: Captures polynomial relationships.

o Gaussian (RBF) Kernel: Effective for complex, non-linear boundaries.

o Sigmoid Kernel: Similar to neural network activation functions.
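
A rough sketch of trying the different kernels and the C parameter in scikit-learn; the toy dataset and values are placeholders:

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# A toy dataset that is not linearly separable
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # C controls the soft margin: higher C tolerates fewer misclassifications
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(kernel, "kernel: CV accuracy =", round(score, 3))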

Robustness and Generalization

 SVM is robust to outliers due to its margin-maximization strategy.

 The soft margin formulation balances:

o Maximizing margin (to generalize well),

o Allowing some misclassification (to handle noise/outliers),

o Controlled via the C parameter.


UNSUPERVISED LEARNING – K-MEANS CLUSTERING
 k-means clustering takes the number of clusters k and a dataset of n objects as input.

 The algorithm outputs k clusters by minimizing within-cluster variances.

 Data is split into k clusters where:

o Objects in the same cluster have high similarity.

o Objects in different clusters have low similarity.

 Automatically classifies unlabeled data into groups based on feature similarity.

Choosing the Optimal Number of Clusters k

 Elbow Method:
Plot the inertia (sum of squared distances) and find the point where
additional clusters provide minimal gain (the "elbow").

 Silhouette Score Method:
Measures how similar an object is to its own cluster compared to other
clusters. Higher scores indicate better-defined clusters.

 Inertia:
Defined as the sum of squared distances between each point and its
assigned cluster's centroid.
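
A sketch of both heuristics with scikit-learn; the synthetic data and the range of k are arbitrary:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: sum of squared distances to the nearest centroid (elbow method)
    # silhouette_score: cohesion vs. separation, higher is better
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))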

k-Means Clustering Algorithm

1. Initialization:
Randomly select k centroids, where k is the number of desired
clusters. Each centroid represents the center of a cluster.

2. Expectation Step:
Assign each data point to the nearest centroid based on distance.

3. Maximization Step:
Recalculate the centroid as the mean of all points assigned to that
cluster.
4. Repeat:
Continue the expectation and maximization steps until centroids
stabilize and no further changes occur.
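
A from-scratch NumPy sketch of these four steps (simplified: random initialization rather than k-means++, and empty clusters are not handled):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Expectation: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Maximization: recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Repeat until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

demo_rng = np.random.default_rng(1)
X = np.vstack([demo_rng.normal(size=(50, 2)), demo_rng.normal(size=(50, 2)) + 5])
labels, centroids = kmeans(X, k=2)
print(centroids)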

Unsupervised Learning – Hierarchical Clustering

 Divides the dataset across different levels to form a tree-like structure.

 Two main approaches:

o Bottom-Up (Agglomerative):
Starts with individual points and merges them into clusters.

o Top-Down (Divisive):
Starts with one large cluster and splits it into smaller clusters.

 The result is a tree diagram (dendrogram) where:

o The root represents the complete dataset.

o The leaves represent individual data points.
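
As an illustration, agglomerative clustering and its dendrogram can be built with SciPy; the linkage method and toy data are arbitrary choices:

import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 4])

# Bottom-up (agglomerative): repeatedly merge the two closest clusters
Z = linkage(X, method="ward")

# Cut the tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# dendrogram(Z) would draw the tree: root = whole dataset, leaves = single points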

Unsupervised Learning – Density-Based Clustering (DBSCAN)

 DBSCAN groups together data points that are closely packed in feature
space.

 Points in low-density regions are considered noise or outliers.

 It identifies clusters as dense regions that are separated by areas of lower density.
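
A minimal scikit-learn sketch; eps and min_samples are illustrative and in practice depend on the data:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense region
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# The label -1 marks noise/outliers; other labels are cluster ids
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("Clusters found:", n_clusters)
print("Noise points:", (db.labels_ == -1).sum())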

Applications of Clustering

1. Preprocessing for Supervised Learning:
Clustering can be used to reduce dimensionality and simplify the data
before applying supervised algorithms.

2. Semi-Supervised Learning:
Useful when only a small subset of data is labeled. Clustering helps
detect patterns in the unlabeled portion.
3. Image Segmentation:
Pixels with similar characteristics (color, intensity, texture) are grouped
into clusters. Each cluster forms a segment or region in the image.

Dimensionality Reduction

Why Reduce Dimensionality?

Problems with Too Many Features:

 Curse of Dimensionality: As the number of features increases, data becomes sparse. Models struggle to generalize well.

 Overfitting: High-dimensional data can lead to the model learning noise instead of patterns.

 Computational Cost: More features mean slower training and prediction.

 Redundancy and Noise: Some features may be irrelevant or highly correlated.

Goal: Retain only the most informative features to simplify models and
improve their efficiency and accuracy.

What is Dimensionality Reduction?

Dimensionality reduction refers to the process of reducing the number of
input variables (features) in a dataset while preserving as much meaningful
information as possible.

This is done either by:

 Removing irrelevant/redundant features

 Combining or transforming features into a lower-dimensional space

Two Main Approaches:

1. Feature Selection – Selecting a subset of the original features.

2. Feature Extraction – Transforming or combining existing features to create new ones.

Feature Selection

Selects existing features without modifying them.

Techniques:
 Filter Methods: Use statistical tests (e.g., correlation, chi-square,
ANOVA F-test). Example: remove features with low variance.

 Wrapper Methods: Use a machine learning model to evaluate subsets of features. Example: Recursive Feature Elimination (RFE).

 Embedded Methods: Feature selection is integrated into the model itself. Example: Lasso Regression (L1 regularization reduces some coefficients to zero).
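
A sketch of one technique from each family in scikit-learn; thresholds and estimator choices are arbitrary:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel, VarianceThreshold
from sklearn.linear_model import Lasso, LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: drop near-constant features based on a variance threshold
X_filtered = VarianceThreshold(threshold=0.1).fit_transform(X)

# Wrapper: Recursive Feature Elimination driven by a model's performance
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: L1 regularization (Lasso) shrinks some coefficients to exactly zero
embedded = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)

print(X_filtered.shape[1], rfe.support_.sum(), embedded.get_support().sum())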

Feature Extraction

Transforms or combines original features into a new feature space.

Linear Methods:

 PCA (Principal Component Analysis): Projects data into a lower-dimensional space that captures the most variance.

 LDA (Linear Discriminant Analysis): Supervised technique that finds the axes maximizing class separability.

Non-Linear Methods:

 t-SNE (t-distributed Stochastic Neighbor Embedding): Good for visualizing high-dimensional data in 2D/3D; preserves local structure.

 UMAP (Uniform Manifold Approximation and Projection): Similar to t-SNE but faster and better at preserving global structure.

 Kernel PCA: Extension of PCA using kernel methods to capture non-linear patterns.
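
For illustration, a few of these transforms in scikit-learn; the component counts and kernel choice are placeholders:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 64-dimensional features

X_pca = PCA(n_components=10).fit_transform(X)                        # linear, variance-maximizing
X_kpca = KernelPCA(n_components=10, kernel="rbf").fit_transform(X)   # non-linear extension of PCA
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)       # 2D embedding for visualization

print(X_pca.shape, X_kpca.shape, X_tsne.shape)
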
Principal Component Analysis (PCA)

PCA is a linear feature extraction technique introduced by Karl Pearson
(1901). It reduces dimensionality while retaining as much variability as
possible in the data.

Key Points:

 Transforms high-dimensional data into a lower-dimensional space.

 Maximizes variance in the new axes (principal components).

 Principal components are linear combinations of the original features.

 First principal component captures the most variance; the second captures the most of the remaining variance, and so on.

 Components are uncorrelated and not directly interpretable.

Steps to Perform PCA

1. Standardize the Data

o Normalize the range of the continuous variables so each has a mean of 0 and standard deviation of 1.

2. Compute the Covariance Matrix

o Shows how variables vary with each other.

o Covariance matrix is symmetric and has dimensions p x p, where p is the number of features.

3. Compute Eigenvectors and Eigenvalues

o Eigenvectors determine the direction of the new feature space (principal components).

o Eigenvalues measure the variance carried in each eigenvector.

o Rank eigenvalues in descending order to prioritize components.

4. Create the Feature Vector

o Choose top k eigenvectors (with the highest eigenvalues).

o Combine these into a matrix called the feature vector.

5. Recast the Data

o Multiply the original standardized data by the feature vector to transform it into the new feature space (principal components).

This results in a dataset with reduced dimensions while preserving as much variance as possible.
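
The five steps above, sketched directly in NumPy; the toy data and the number of components k are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # toy data: 100 samples, 5 features
k = 2                                         # number of components to keep

# 1. Standardize: zero mean and unit standard deviation per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix (p x p)
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvectors/eigenvalues, ranked by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Feature vector: the top-k eigenvectors as columns
W = eigvecs[:, :k]

# 5. Recast the data into the new k-dimensional space
X_pca = X_std @ W
print(X_pca.shape)                            # (100, 2)
print(eigvals / eigvals.sum())                # proportion of variance per component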
