
OBJECT SEGMENTATION

🔷 What is Regression?

Regression analysis is a statistical method used to model and analyze the relationship between a
dependent variable (target) and one or more independent variables (predictors).

🔑 Key Points:

 Objective: Predict the value of a dependent variable (Y) using independent variables (X).

 Types:

o Linear Regression: Used when the target variable is continuous.

o Logistic Regression: Used when the target variable is binary (e.g., 0 or 1).

 Insights: Helps in understanding the impact of changes in independent variables on the dependent variable.

 Applications: Predicting sales, customer response, loan defaults, etc.
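To make the two regression types concrete, here is a minimal sketch assuming scikit-learn and NumPy are available; the tiny dataset and variable names are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Illustrative data: X = advertising spend, y_cont = sales, y_bin = responded to offer (0/1)
X = np.array([[10], [20], [30], [40], [50]])
y_cont = np.array([25.0, 41.0, 62.0, 79.0, 101.0])   # continuous target -> linear regression
y_bin = np.array([0, 0, 1, 1, 1])                     # binary target -> logistic regression

lin = LinearRegression().fit(X, y_cont)
print("Predicted sales for spend=35:", lin.predict([[35]]))

log = LogisticRegression().fit(X, y_bin)
print("P(response) for spend=35:", log.predict_proba([[35]])[0, 1])
```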

🔷 What is Segmentation?

Segmentation is the process of dividing a larger dataset (e.g., customer base) into smaller,
meaningful groups based on shared characteristics or behaviors.

🔑 Key Points:

 Purpose:

o To better understand subgroups.

o To improve targeting, personalization, and overall analytics.

 Methods:

o Objective (Supervised) Segmentation: Based on a specific outcome (e.g., response to an offer).

o Non-Objective (Unsupervised) Segmentation: Based on profiling (e.g., customer lifestyle, geography).

 Techniques:

o Supervised: CHAID (Chi-squared Automatic Interaction Detection), CRT (Classification and Regression Trees).

o Unsupervised: Clustering, K-means, K-Nearest Neighbors (KNN).


Supervised and Unsupervised Learning

🔷 Supervised Learning

Supervised learning is a machine learning technique where the model is trained using labeled data—
that is, the input (X) is provided along with the correct output (Y).

🔑 Key Features:

 Goal: Learn the mapping function Y = f(X)

 Needs supervision (similar to learning with a teacher).

 Used when both input and output data are available.

 Common tasks: Classification and Regression

 Learns from training data and predicts output for new/unseen data.

📌 Examples:

 Identifying fruits based on features like shape, color, size, and taste.

 Email spam detection (Spam or Not Spam).

 Predicting house prices based on size, location, etc.

✅ Common Algorithms:

 Linear Regression

 Logistic Regression
 Decision Trees

 Random Forest

 Support Vector Machine (SVM)

 Naive Bayes

 Multi-class Classification

🔷 Unsupervised Learning

Unsupervised learning is a technique where the model is given unlabeled data and must find hidden
patterns, relationships, or groupings on its own.

🔑 Key Features:

 No supervision is provided; model learns from raw input data.

 Goal: Discover structure or patterns in data.

 Used when only inputs (X) are known; no outputs (Y) are available.

 Common tasks: Clustering and Association

📌 Examples:

 Grouping customers by purchasing behavior.

 Clustering fruits into similar types based on features without labels.

 Market basket analysis to find items frequently bought together.

✅ Common Algorithms:

 K-means Clustering

 Hierarchical Clustering

 K-Nearest Neighbors (KNN)

 Principal Component Analysis (PCA)

 Independent Component Analysis (ICA)

 Apriori Algorithm

 Anomaly Detection

 Neural Networks (in some cases)
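As a small illustration of unsupervised grouping, the sketch below (assuming scikit-learn is installed; the customer numbers are made up) clusters unlabeled data with K-means.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative, unlabeled customer data: [annual_spend, visits_per_month]
X = np.array([[200, 2], [220, 3], [800, 12], [760, 10], [300, 4], [820, 11]])

# K-means groups the rows into k clusters without any labels being provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
```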


🌳 TREE BUILDING: REGRESSION, CLASSIFICATION, OVERFITTING, PRUNING & COMPLEXITY

🔷 Introduction to Decision Trees

 A decision tree is a supervised learning algorithm used for classification and regression
tasks.

 Tree building in regression means constructing a decision tree that splits data into regions
based on feature values to predict a numeric value.
 The model recursively splits the dataset into smaller subsets (nodes) to minimize the
prediction error (e.g., MSE).
 At each leaf node, the output is the mean value of the target variable for that region.

 It mimics human decision-making by asking a sequence of questions based on feature values.

 Structure:

o Root Node – The starting point containing the entire dataset.

o Internal Nodes – Represent tests on attributes.

o Branches – Outcomes of the tests.

o Leaf Nodes – Final class labels or output values.


📚 Decision Tree Terminology

Term Meaning

Root Node First node representing the entire dataset

Leaf Node Terminal node giving final output

Splitting Dividing node into sub-nodes

Parent/Child Node relationships in the tree

Subtree Part of the tree under a node

🔶 Types of Decision Trees

Type Purpose Output Type

Classification Tree Used when the target is categorical/class-based Discrete (e.g., Yes/No, Class A/B)

Regression Tree Used when the target is continuous Real values (e.g., price, temperature)

🔶 How Tree Building Works

Step-by-step Process (ID3 Algorithm Example):

1. Start with the entire dataset at the root node.

2. Select the best attribute using a measure like Information Gain.

3. Split the dataset into subsets based on the attribute's values.

4. Recursively build subtrees for each subset.

5. Stop when:

o All tuples belong to one class, or

o No attributes are left to split, or

o A stopping condition is met (like max depth).
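A hedged sketch of this process using scikit-learn's DecisionTreeClassifier: scikit-learn implements a CART-style tree rather than ID3, but setting criterion="entropy" makes it choose splits by information gain in the same spirit. The toy fruit data is illustrative.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative fruit data: [weight_g, is_red]; labels: 0 = apple, 1 = cherry
X = [[150, 1], [170, 1], [10, 1], [12, 1], [160, 0], [9, 1]]
y = [0, 0, 1, 1, 0, 1]

# criterion="entropy" selects splits by information gain, in the spirit of ID3/C4.5
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["weight_g", "is_red"]))
print(tree.predict([[140, 0]]))   # predict the class of a new sample
```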


📊 Key Concepts in Tree Construction

Entropy (H):

 Entropy is a measure of the randomness in the information being processed.


 The higher the entropy, the harder it is to draw any conclusions from that information. Flipping a fair coin is an example of an event whose outcome is completely random.
 The entropy H(X) is zero when the probability is either 0 or 1.
 Entropy is maximum when the probability is 0.5, because this represents perfect randomness in the data and there is no way of perfectly determining the outcome.

Information Gain (IG):

 Information Gain (IG) tells us how much we learn by splitting data using a particular
attribute (feature).

 It helps us find the best question to ask at each step in building a decision tree.

📊 Why is it useful?

 The goal is to reduce entropy — make the data more clear (less random).

 So we choose the attribute that gives the highest information gain — it best separates the
data into clear groups.
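The standard formulas behind these ideas are H(X) = -Σ p_i log2(p_i) and IG = H(parent) − Σ (n_k/n) H(child_k). A minimal NumPy sketch (the function names and toy labels are illustrative):

```python
import numpy as np

def entropy(labels):
    """H(X) = -sum(p_i * log2(p_i)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """IG = H(parent) - sum(|child|/|parent| * H(child))."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]        # H = 1.0 (maximum for a 50/50 split)
split = [["yes", "yes"], ["no", "no"]]     # perfectly pure children, entropy 0 each
print(entropy(parent))                     # 1.0
print(information_gain(parent, split))     # 1.0 -> best possible split
```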

Decision Trees (like ID3):

 If the entropy becomes 0 after a split, that branch becomes a leaf node (no further splitting).
 If entropy > 0, we keep splitting the data further.

Algorithms for Tree Building


Algorithm Description

ID3 Uses Information Gain and builds trees using greedy approach

C4.5 Improvement over ID3; handles missing data, uses Gain Ratio

CART Builds binary trees, supports classification & regression

CHAID Uses chi-square tests; supports multi-way splits

MARS Handles continuous variables better, used for regression

Conditional Inference Trees Uses statistical tests to avoid overfitting

✂️Pruning in Decision Trees

Definition:
Pruning helps make a decision tree simpler and more accurate by removing unnecessary branches.
This prevents overfitting (when the tree memorizes the training data too much).

Why Prune a Decision Tree?

 Avoid Overfitting: Removes branches that fit only training data, not new data.

 Improves Accuracy: Helps the model make better predictions on unseen data.

 Reduces Complexity: Makes the tree smaller and easier to understand.

 Increases Stability: Less sensitive to small changes in data.

🔸 Types of Pruning

1. Pre-Pruning (Early Stopping)

o Stop tree construction early based on a threshold (e.g., min samples per node).

o Pros: Prevents overgrowth.

o Cons: Might miss subtle patterns.

2. Post-Pruning (Reduced Error or Cost Complexity)

o Build the full tree first.

o Then remove branches that do not improve performance on a validation set.

o Often done using a validation set or cross-validation.
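For example, scikit-learn supports minimal cost-complexity post-pruning via the ccp_alpha parameter; the sketch below (synthetic data, illustrative values) grows a full tree and then evaluates progressively pruned versions on a validation set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Grow the full tree, then evaluate increasingly pruned versions on a validation set
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  val_acc={pruned.score(X_val, y_val):.3f}")
```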

🔻 Bottom-Up Pruning

 Starts from the bottom (leaf nodes).

 Moves upwards, checking each node to see if it’s useful for classification.

 If a node doesn’t improve accuracy, it’s removed or replaced by a leaf.

 ✅ Advantage: Keeps important sub-trees. No useful part is removed by mistake.


Examples:

 REP (Reduced Error Pruning)

 MCCP (Minimum Cost Complexity Pruning)

 MEP (Minimum Error Pruning)

🔺 Top-Down Pruning

 Starts from the top (root).

 Moves down the tree, checking if each node is helpful.

 If a node is not useful, it’s cut off, and the whole sub-tree under it may be removed.

 ⚠️Risk: May accidentally remove useful sub-trees.

 Still useful because it often works well with new (unseen) data.

Example:

 PEP (Pessimistic Error Pruning)

📈 Model Complexity

Definition:
Model complexity refers to how flexible a model is in capturing patterns from data.

🔸 Effects of Complexity

 Too Complex (Overfitting):

o Very low training error

o High generalization error

o Learns noise in data

 Too Simple (Underfitting):

o High training and test errors

o Fails to capture patterns in the data

🔸 Balancing Complexity

 Use pruning techniques to simplify complex trees.

 Control parameters like:

o Max Depth of tree

o Min Samples per Leaf

o Min Information Gain for splitting
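In scikit-learn these controls map roughly onto constructor arguments of the tree estimators; a brief sketch (parameter values are illustrative, and min_impurity_decrease is used here as the closest equivalent of a minimum information gain):

```python
from sklearn.tree import DecisionTreeRegressor

# The three controls listed above map roughly onto these constructor arguments
tree = DecisionTreeRegressor(
    max_depth=5,                 # Max Depth of tree
    min_samples_leaf=10,         # Min Samples per Leaf
    min_impurity_decrease=0.01,  # minimum improvement required to allow a split
    random_state=0,
)
# tree.fit(X_train, y_train) would then grow a deliberately constrained tree
```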

⚖️Overfitting and Underfitting


Concept Description

Overfitting Model is too complex and learns noise in training data; performs poorly on unseen data

Underfitting Model is too simple; fails to learn patterns in data

 Overfitting: Low training error but high test error.

 Underfitting: High training and test error.

 Balance model complexity using:

o Pruning

o Limiting tree depth

o Minimum samples per leaf

✅ When to Use Decision Trees

 Data is structured as attribute-value pairs.

 Target variable is categorical (for classification) or numerical (for regression).

 Handles noisy data and missing values well.

 Useful for extracting if-then rules from data.

🔁 Decision Tree as Rule Set

 After construction, the tree can be transformed into a series of if-then rules for better
interpretability.

Advantages of Decision Trees

 Easy to understand and explain — no complex math needed.

 No need for much data preparation (like scaling or converting values).

 Can handle both numbers and categories (e.g., age or color).

 Transparent ("white-box") model — you can explain how it made a decision.

 Testable and validatable — can be checked using statistics.

 Works well on large datasets using normal computers.

📌 Summary
 Decision Trees are simple, interpretable, and powerful for both classification and regression.

 Overfitting is a common issue, tackled via pruning and controlling model complexity.

 Key metrics like Entropy and Information Gain guide the tree-building process.

 Suitable for real-world data with categorical or continuous variables, even when incomplete
or noisy.

Classification & Regression Trees:

Classification Trees:

 A classification tree is an algorithm where the target variable is fixed or categorical. The
algorithm is then used to identify the “class” within which a target variable would most likely
fall.
 An example of a classification-type problem would be determining who will or will not
subscribe to a digital platform; or who will or will not graduate from high school.
 These are examples of simple binary classifications where the categorical dependent variable
can assume only one of two, mutually exclusive values.

Regression Trees :

 A regression tree refers to an algorithm where the target variable is continuous, and the algorithm is used to predict its value.
 As an example of a regression type problem, you may want to predict the selling prices of a
residential house, which is a continuous dependent variable.
 This will depend on both continuous factors like square footage as well as categorical factors.


🌳 Multiple Decision Trees – Ensemble Learning

Using multiple decision trees together is a core concept in ensemble learning — improving
prediction accuracy by combining the power of many models.

❓ Why Not Just One Tree?

A single decision tree can:

 Underfit: Too simple to capture patterns.

 Overfit: Too complex and memorizes data instead of generalizing.


✅ Multiple trees solve this by:

 Averaging predictions (for regression)

 Majority voting (for classification)

 Reducing overfitting and improving accuracy

🔍 Ensemble Methods Using Multiple Trees

1. 🌲 Random Forest

A "forest" of decision trees where each tree is built with randomness.

✅ How It Works:

 Randomly selects data samples (rows) and features (columns) to build each tree.

 Trees are trained independently.

 Final output:

o Regression → average of tree predictions

o Classification → majority vote of class predictions

✅ Why It Works:

 Combines many weak learners into a strong one.

 Reduces overfitting.

 Robust to noise and missing data.

📌 Analogy: Like asking 100 students to solve a problem and taking a group consensus.
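A minimal Random Forest sketch, assuming scikit-learn is available; the synthetic dataset is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each tree sees a bootstrap sample of rows and a random subset of features per split;
# the forest's prediction is the majority vote (or averaged probability) across trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0).fit(X, y)
print(forest.predict(X[:3]), forest.predict_proba(X[:3])[:, 1])
```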

2. ⚡ Boosting (e.g., XGBoost, AdaBoost, LightGBM)

Trees are built sequentially, each fixing the mistakes of the previous one.

✅ How It Works:

 First tree predicts the outcome.

 Next tree focuses on errors from the previous tree.

 Final prediction is a weighted sum of all trees.

✅ Why It Works:

 Learns from mistakes and focuses on hard-to-predict samples.

 Often more accurate than Random Forest on complex tasks.

📌 Analogy: Like improving your performance by learning from test mistakes each time.
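A minimal boosting sketch using scikit-learn's GradientBoostingClassifier (one boosting implementation; XGBoost and LightGBM are separate libraries built on the same idea); the data and hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Trees are added one at a time; each new tree fits the errors left by the current
# ensemble, and the final prediction is a weighted sum of all trees.
boost = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3,
                                   random_state=0).fit(X, y)
print("Training accuracy:", boost.score(X, y))
```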

3. 🔄 Bagging (Bootstrap Aggregation)

Similar to Random Forest but may not use random feature selection.

✅ How It Works:
 Create multiple datasets using bootstrapping (random sampling with replacement).

 Train one tree on each dataset.

 Combine predictions.

✅ Benefit:

 Reduces variance without increasing bias.
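A minimal bagging sketch, assuming scikit-learn; BaggingClassifier uses decision trees as its default base learner, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each base tree is trained on a bootstrap sample (rows drawn with replacement);
# unlike Random Forest, every feature is available at every split by default.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.predict(X[:3]))
```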

4. 🧠 Stacking

Uses different model types, not just decision trees.

✅ How It Works:

 Train several different models (e.g., trees, SVMs, neural networks).

 Each model makes predictions.

 A meta-model learns how to combine these predictions into a final output.

✅ Why Use It:

 Leverages strengths of different algorithms.

 Often achieves better performance than any single model.
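A minimal stacking sketch, assuming scikit-learn; the choice of base models and meta-model below is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Base models of different types make predictions; a meta-model (here logistic
# regression) learns how to combine those predictions into the final output.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=4)),
                ("svm", SVC(probability=True))],
    final_estimator=LogisticRegression(),
).fit(X, y)
print(stack.predict(X[:3]))
```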

📊 Time Series Forecasting and Analysis

🔄 What is Time Series Forecasting?

 It means studying how data changes over time at regular intervals — like every hour, day,
week, or year.

 We use this method to predict the future based on past patterns.

🌍 Where is it used?

Time series is used in many fields like:

 Economics (e.g., sales, stock market)

 Weather forecasting (e.g., temperature, wind speed)

 Earthquake prediction

 Health and medical treatment forecasting

 Engineering and applied sciences

💽 What is Time Series Data?

 It’s a type of data collected at different times, in order (like a calendar).

 Time series analysis helps find hidden patterns in this data.

 It also helps in:


o Predicting the future

o Finding unusual changes (anomalies)

 To get reliable results, we need a lot of data points over time.

🧠 Types of Time Series Analyses & Models

1. Classification

o Grouping or labeling the time-based data into different categories.

2. Curve Fitting

o Drawing a curve that follows the trend in the data, so we can understand how
variables are related.

3. Descriptive Analysis

o Describes patterns like:

 Trends (long-term direction),

 Cycles (repeating ups and downs),

 Seasonal effects (monthly or yearly repeats)

4. Explanative Analysis

o Tries to explain the reasons behind the changes.

o Looks at cause-and-effect relationships in the data.

5. Exploratory Analysis

o Focuses on showing the main features of the data, often using charts and graphs.

6. Forecasting

o Uses past data to predict future values.

o Very useful in planning, budgeting, or strategy building.

7. Intervention Analysis

o Studies how a specific event (like a festival, disaster, or policy change) affects the
time series.

8. Segmentation

o Splits the data into parts (segments) to find underlying patterns in each part.

🧩 Four Key Components of Time Series

1. Trend (Long-Term Direction)


o The overall direction the data is moving in — going up, down, or staying stable over
years.

2. Seasonal Variation

o Regular changes that repeat every year, like higher ice cream sales in summer.

3. Cyclical Variation

o Similar to seasonal, but these patterns happen over several years — like business
cycles.

4. Irregular Variation

o Random, unpredictable changes — caused by things like accidents or natural disasters.

o Can be:

 Stationary: No clear trend; changes are random.

 Non-Stationary: Data has patterns that can still be studied.

📊 ARIMA – AutoRegressive Integrated Moving Average - https://www.youtube.com/watch?v=JnRVpnwD4Kw

🔹 What is ARIMA?

 ARIMA is a powerful and popular statistical model used in time series analysis.

 The full form is:

o AR – AutoRegressive

o I – Integrated (Differencing)

o MA – Moving Average

 ARIMA is used to understand patterns in data over time and to predict future values (called
forecasting).

 It is a generalization of ARMA, which only works when the data is stationary (constant mean
and variance over time).

🔹 Why Use ARIMA?


 Time series data often shows trends or seasonality, making it non-stationary.

 ARIMA handles this using differencing (Integrated part), which converts non-stationary data
to stationary by subtracting past values.

🔹 Key Features

 ARIMA models can be used for:

o Forecasting future values

o Analyzing trends

o Filling in missing values

o Detecting outliers or sudden changes

o Monitoring systems over time (in industries, healthcare, finance, etc.)

🔹 ARIMA Model Structure: ARIMA(p, d, q)

ARIMA models are defined by three parameters:

Parameter Meaning

p Number of AutoRegressive (AR) terms (uses past values)

d Number of Differences applied to make data stationary

q Number of Moving Average (MA) terms (uses past forecast errors)

🔹 Understanding the Components

1. Autoregression (AR):

o Uses the relationship between an observation and a number of its past values.

o Example: predicting today’s temperature using past few days' temperatures.

2. Integrated (I):

o Means differencing the data to make it stationary.

o Example:

 If d = 1 → Yt - Yt−1 (first difference)

 If d = 2 → Yt - 2Yt−1 + Yt−2 (second difference)

3. Moving Average (MA):

o Uses past forecast errors to improve predictions.

o Example: adjusting today's forecast using how wrong we were in the last few days.
🔹 ARMA – Special Case of ARIMA

 If data is already stationary, you don’t need differencing.

 Then ARIMA becomes an ARMA(p, q) model (no "I" part).

 Used when:

o Data doesn’t have trends or seasonality

o The structure is stable over time

🔹 Seasonal ARIMA

 For data with seasonal patterns (like monthly or quarterly data), use:

ARIMA(p, d, q)(P, D, Q)m

Where:

o P, D, Q are seasonal counterparts of AR, I, and MA

o m is the seasonal cycle (e.g., 12 for monthly data)
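A minimal forecasting sketch using statsmodels (assuming it is installed); the synthetic monthly series, the order (1, 1, 1), and the seasonal order shown in the comment are illustrative choices, not recommendations.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative monthly series with a trend plus noise
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(np.linspace(100, 200, 48) + np.random.normal(0, 5, 48), index=idx)

# ARIMA(p=1, d=1, q=1): one AR term, first differencing, one MA term
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))          # forecast the next 6 months

# For seasonal data, a seasonal order (P, D, Q, m) can be added, e.g.:
# ARIMA(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
```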

🔹 Forecast Accuracy

Forecast error = Actual − Forecast


We evaluate accuracy using:

1. Mean Forecast Error (MFE)

o Average of forecast errors

o Tells whether model under-predicts (MFE > 0) or over-predicts (MFE < 0)

2. Mean Absolute Deviation (MAD)

o Average of absolute values of errors

o Shows how large the typical error is (ignores direction)

🔹 Applications of ARIMA

ARIMA is widely used in:

 Finance: stock price and sales forecasting

 Healthcare: predicting patient demand or disease spread


 Manufacturing: quality and process control

 Economics: GDP, inflation, and employment trends

 Environmental data: temperature, rainfall forecasting

 System monitoring: detecting sudden changes or breakdowns

📈 Measure of Forecast Accuracy

🔹 What is Forecast Accuracy?

Forecast Accuracy refers to how close the forecasted values are to the actual observed values in
time series data. It helps evaluate the performance of a forecasting model.

A perfect forecast would have zero error for all time periods.

🔹 Why is Forecast Accuracy Important?

Forecast accuracy is crucial for:

 Checking how well the model predicts

 Identifying and correcting model bias

 Choosing the best model among alternatives

 Adjusting models when patterns change

 Improving decision-making in business, economics, and engineering

🔹 Two Main Methods to Measure Forecast Accuracy

1. Mean Forecast Error (MFE)

 Measures the average error over n time periods.

 It indicates the bias in the forecast (whether the model tends to overestimate or
underestimate).

2. Mean Absolute Deviation (MAD)

 Measures the average size of the forecast errors ignoring direction (i.e., absolute values).

 It tells how far off, on average, the forecasts are from actual values.
🔹 Difference Between MFE and MAD

Metric Description Indicates

MFE Average of errors (with signs) Bias (under- or over-forecasting)

MAD Average of absolute errors Accuracy (how large errors are)
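A small sketch of both measures in NumPy, using made-up actual and forecast values.

```python
import numpy as np

actual   = np.array([100, 110, 105, 120, 115])
forecast = np.array([ 98, 112, 104, 118, 110])

errors = actual - forecast            # forecast error = actual - forecast
mfe = errors.mean()                   # Mean Forecast Error: sign shows bias
mad = np.abs(errors).mean()           # Mean Absolute Deviation: typical error size

print(f"MFE = {mfe:.2f}")  # > 0 here, so the model tends to under-predict
print(f"MAD = {mad:.2f}")
```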

🔹 Uses of Forecast Error

Forecast errors are useful for:

 Detecting forecast bias

 Measuring the absolute size of prediction errors

 Comparing multiple forecasting models

 Tuning or adjusting models when needed

 Identifying poor-performing models

 Improving future predictions and decisions

📊 STL Approach (Seasonal-Trend Decomposition using LOESS)

STL is a statistical method that decomposes a time series into three core components:

🔹 STL Decomposition Components

1. Trend (T):
Represents the long-term direction in the data. For example, sales consistently increasing
month by month shows a positive trend.

2. Seasonality (S):
Regular and recurring patterns at specific intervals, such as monthly or quarterly effects.
Example: festive sales spikes every December.

3. Residual/Noise (R):
Captures the irregular, unpredictable part of the data that cannot be explained by trend or
seasonality. Example: sales drop during a sudden lockdown.

🔹 STL Decomposition Process


1. Estimate the Trend
A smooth line is fitted using LOESS (locally weighted regression) to
extract the general trend of the data.
2. Detrend the Series
Subtract the estimated trend from the original series to isolate seasonal
and residual components.
3. Estimate Seasonality
Detect repeating seasonal patterns from the detrended series and model
them.
4. De-seasonalize the Series
Subtract the seasonal component from the original data to simplify the
dataset further.
5. Fit a Polynomial Trend (Optional)
A polynomial model may be used on the de-seasonalized series to refine
the trend curve.
6. Compute Residuals
This final component captures noise, randomness, or unexpected events.
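A minimal STL sketch using statsmodels (assuming it is installed); the synthetic monthly series and period=12 are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Illustrative monthly series: upward trend + yearly seasonality + noise
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
t = np.arange(60)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 1, 60), index=idx)

result = STL(y, period=12).fit()      # LOESS-based decomposition
trend, seasonal, resid = result.trend, result.seasonal, result.resid
print(resid.abs().idxmax())           # time point with the largest residual (anomaly candidate)
```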

Why Perform Seasonal Decomposition?


1. 🧩 Pattern Identification
Decomposition helps isolate and interpret underlying structures in time
series data—such as long-term movement (trend), repeating patterns
(seasonality), and irregular noise (residuals). This breakdown provides
clarity and aids in understanding the nature of the data.
2. 📈 Better Forecasting Accuracy
By modeling each component separately, forecasting becomes more
robust and reliable. For instance, future values can be predicted by
extrapolating the trend and seasonal components and then adjusting for
potential noise.
3. 🚨 Anomaly Detection
Unusual spikes or dips become easier to spot when we remove the
"expected" behavior. What remains—the residual—is more clearly
attributable to anomalies or outliers, making it easier to detect sudden
events like system failures or external shocks.
4. 📊 Improved Statistical Modeling
Statistical models work better on stable, stationary data. Decomposition
simplifies the data structure, allowing better application of forecasting
techniques like ARIMA, Exponential Smoothing, or even machine
learning models.

🔹 Key Advantages of STL


 Handles changing seasonality better than classical decomposition.
 Can deal with missing values and outliers more robustly.
 Ideal for feature extraction before applying ML models.
 Effective for anomaly detection, forecasting, and trend visualization.

🔹 Summary of STL

 STL is flexible, robust, and handles changing seasonality.

 Uses LOESS for smoothing curves.

 Helps in feature extraction and preprocessing for forecasting.


🧠 Feature Extraction & Dimensionality Reduction for Prediction

In image-based machine learning systems, the performance of a model depends heavily on how well
we extract and select meaningful features from data. However, as the number of features grows, so
does complexity. That’s where Dimensionality Reduction comes into play, helping streamline the
process for better results.

🔹 What is a Feature?

A feature is a measurable and informative property of data. In images, a feature—also known as a descriptor—could describe anything from shape and size to color and texture. These are essential for understanding the content and structure of images.

Methods of Feature Extraction

In a typical system, three different methods of feature extraction may be applied, each producing
different sets of results. These methods are then compared, and the most effective feature set
(based on classification performance) is selected for building the final model.
🧮 Why Feature Selection Matters
Not all features are equally useful. Some may be redundant or irrelevant, and
including them increases computational cost and risk of overfitting—where a
model performs well on training data but poorly on unseen data.
To avoid this:
 Only relevant and non-redundant features are selected.
 The aim is to preserve discriminative power while improving model
generalizability.
🔧 Using Extracted Features for Prediction
Once features are extracted:
1. A Predictor Model is trained using labeled examples (e.g., good vs bad
quality images, salient vs non-salient content).
2. The model outputs:
o A class label (e.g., "defective", "normal", "salient").

o A score matrix indicating prediction confidence.

✅ Model Choice & Overfitting Consideration


The selected predictor should:
 Be resistant to overfitting, which commonly occurs when the number of
features is too large relative to the number of samples.
 Be efficient in handling high-dimensional feature spaces, which is
typical in image data.

🔍 Feature Extraction

Feature Extraction involves calculating these descriptors from image data to capture:

 Geometric characteristics (e.g., shape, size)

 Textural properties (e.g., energy, contrast)

 Color information (e.g., RGB means, Lab values)

Multiple methods of extraction can be applied. Each method yields a different set of features, and
the most effective one is selected based on classification accuracy and computational efficiency.

📈 Why Dimensionality Reduction?

When many features are extracted, the dataset becomes high-dimensional, making it:

 Harder to train models effectively

 Slower to compute

 Prone to overfitting (where models perform well on training data but poorly on unseen data)

This is known as the Curse of Dimensionality.

✨ Benefits of Dimensionality Reduction:

 Reduces noise and irrelevant data

 Speeds up training and prediction

 Enhances model generalization

 Improves visualization in 2D/3D plots

 Lowers memory and storage usage

🔧 Techniques for Dimensionality Reduction

1. Feature Selection
Selects a subset of the original features based on relevance and redundancy, often using
statistical tests or model-based techniques.

2. Feature Extraction
Transforms data into a lower-dimensional space while preserving key information. The most
common method is:

📌 Principal Component Analysis (PCA)


PCA is a linear technique that identifies directions (principal components) where the data varies the
most.

Steps:

1. Compute the directions (components) of maximum variance.

2. Transform data to align with these components.

3. Retain top components that preserve most variance.
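A minimal PCA sketch with scikit-learn; the synthetic low-rank feature matrix and the 95% variance threshold are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative feature matrix: 100 samples x 20 extracted image features,
# built from 5 underlying factors so that most variance lies in few directions
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(100, 20))

# Keep enough components to explain ~95% of the variance (steps 1-3 above)
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```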

Advantages:

 Removes noise and redundant information

 Produces a compact representation

 Enables faster computation and clearer visualization

📊 Feature-Based Prediction

Once relevant features are extracted and dimensionality is reduced, a Predictor (machine learning
model) is trained using labeled examples:

 e.g., good vs bad quality images, or salient vs non-salient regions

Output:

 A class label

 A confidence score matrix

The predictor is selected to avoid overfitting, especially when feature count is high compared to
sample size—a common issue in image processing.
