Unit 4: Data Analytics
🔷 What is Regression?
Regression analysis is a statistical method used to model and analyze the relationship between a
dependent variable (target) and one or more independent variables (predictors).
🔑 Key Points:
Objective: Predict the value of a dependent variable (Y) using independent variables (X).
Types:
o Linear Regression: Used when the target variable is continuous (e.g., price, temperature).
o Logistic Regression: Used when the target variable is binary (e.g., 0 or 1).
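A minimal sketch of both ideas with scikit-learn; the experience/salary/promotion numbers below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: X = years of experience (independent variable)
X = np.array([[1], [2], [3], [4], [5]])

# Continuous target (e.g., salary) -> Linear Regression
y_salary = np.array([30, 35, 42, 50, 58])
lin = LinearRegression().fit(X, y_salary)
print(lin.predict([[6]]))                     # predicted salary for 6 years

# Binary target (e.g., 1 = promoted, 0 = not promoted) -> Logistic Regression
y_promoted = np.array([0, 0, 0, 1, 1])
log = LogisticRegression().fit(X, y_promoted)
print(log.predict([[6]]), log.predict_proba([[6]]))
```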
🔷 What is Segmentation?
Segmentation is the process of dividing a larger dataset (e.g., customer base) into smaller,
meaningful groups based on shared characteristics or behaviors.
🔑 Key Points:
Purpose:
Methods:
Techniques:
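A minimal sketch of one common segmentation technique, K-means clustering, applied to made-up customer data (the feature names and numbers are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer data: [annual_spend, visits_per_month]
customers = np.array([
    [200, 1], [250, 2], [1800, 8], [2000, 10], [900, 4], [950, 5],
])

# Divide the customer base into 3 smaller groups with shared behaviour
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)
print(segments)                  # segment label for each customer
print(kmeans.cluster_centers_)   # the "average" customer in each segment
```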
🔷 Supervised Learning
Supervised learning is a machine learning technique where the model is trained using labeled data—
that is, the input (X) is provided along with the correct output (Y).
🔑 Key Features:
Learns from training data and predicts output for new/unseen data.
📌 Examples:
Identifying fruits based on features like shape, color, size, and taste.
✅ Common Algorithms:
Linear Regression
Logistic Regression
Decision Trees
Random Forest
Naive Bayes
Multi-class Classification
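A tiny supervised-learning sketch using Naive Bayes (one of the algorithms listed above); the fruit features and labels are invented:

```python
from sklearn.naive_bayes import GaussianNB

# Labeled training data: [weight_g, color_score, roundness] -> fruit name
X_train = [[150, 0.9, 0.95], [170, 0.85, 0.90],   # apples
           [120, 0.1, 0.30], [130, 0.15, 0.35]]   # bananas
y_train = ["apple", "apple", "banana", "banana"]

model = GaussianNB().fit(X_train, y_train)        # learn from labeled examples (X with Y)
print(model.predict([[160, 0.8, 0.92]]))          # predict the class of an unseen fruit
```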
🔷 Unsupervised Learning
Unsupervised learning is a technique where the model is given unlabeled data and must find hidden
patterns, relationships, or groupings on its own.
🔑 Key Features:
Used when only inputs (X) are known; no outputs (Y) are available.
📌 Examples:
✅ Common Algorithms:
K-means Clustering
Hierarchical Clustering
Apriori Algorithm
Anomaly Detection
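As one concrete example from the list above, a minimal anomaly-detection sketch on unlabeled data (the transaction amounts are invented):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Unlabeled data: daily transaction amounts; no output labels (Y) are given
amounts = np.array([[50], [52], [48], [51], [49], [500], [47], [53]])

detector = IsolationForest(contamination=0.15, random_state=0)
labels = detector.fit_predict(amounts)   # -1 = anomaly, 1 = normal
print(labels)                            # the 500 transaction stands out on its own
```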
A decision tree is a supervised learning algorithm used for classification and regression
tasks.
Tree building in regression means constructing a decision tree that splits data into regions
based on feature values to predict a numeric value.
The model recursively splits the dataset into smaller subsets (nodes) to minimize the
prediction error (e.g., MSE).
At each leaf node, the output is the mean value of the target variable for that region.
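A small regression-tree sketch with scikit-learn; the house sizes and prices are invented, and the default splitting criterion here is squared error (MSE):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Invented data: X = house size (sq. ft.), y = price (in lakhs)
X = np.array([[600], [800], [1000], [1200], [1500], [1800]])
y = np.array([30, 38, 45, 55, 70, 85])

# The tree recursively splits X into regions that minimise MSE;
# each leaf then predicts the mean target value of its region.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["size_sqft"]))
print(tree.predict([[1100]]))   # predicted price for an unseen house
```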
Structure:
Term | Meaning
Root Node | The topmost node; it represents the entire dataset and the first split.
Decision (Internal) Node | A node that splits the data further based on a feature value.
Leaf Node | A terminal node with no further splits; it holds the final prediction.
Branch / Sub-tree | The part of the tree that grows out of a node.
5. Stop when:
o the node becomes pure (entropy = 0),
o a maximum depth is reached, or
o the number of samples in a node falls below a threshold.
Entropy (H): a measure of impurity or randomness in a dataset, H(S) = −Σ pᵢ log₂(pᵢ), where pᵢ is the proportion of examples belonging to class i.
Information Gain (IG) tells us how much entropy is reduced when we split the data using a particular attribute (feature): IG(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ), where Sᵥ is the subset of examples that take value v for attribute A.
It helps us find the best question to ask at each step in building a decision tree.
📊 Why is it useful?
The goal is to reduce entropy — make the data more clear (less random).
So we choose the attribute that gives the highest information gain — it best separates the
data into clear groups.
If the entropy becomes 0 after a split, that branch becomes a leaf node (no further splitting).
If entropy > 0, we keep splitting the data further.
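A small sketch that computes entropy and information gain for one split; the "yes/no" labels are invented:

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Invented example: class labels before and after splitting on some attribute
parent = np.array(["yes", "yes", "yes", "yes", "no", "no", "no", "no"])
left   = np.array(["yes", "yes", "yes", "yes"])   # one branch of the split
right  = np.array(["no", "no", "no", "no"])       # the other branch

weighted_child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
info_gain = entropy(parent) - weighted_child
# Parent entropy is 1.0; both children are pure (entropy 0), so the gain is 1.0
# and each branch becomes a leaf node.
print(entropy(parent), info_gain)
```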
Algorithm | Description
ID3 | Uses Information Gain and builds trees using a greedy approach.
C4.5 | Improvement over ID3; handles missing data and uses Gain Ratio.
Definition:
Pruning makes a decision tree simpler and more accurate by removing unnecessary branches. This prevents overfitting (when the tree memorizes the training data too closely).
Avoid Overfitting: Removes branches that fit only training data, not new data.
Improves Accuracy: Helps the model make better predictions on unseen data.
🔸 Types of Pruning
Pre-pruning (early stopping):
o Stop tree construction early based on a threshold (e.g., min samples per node).
Post-pruning:
🔻 Bottom-Up Pruning
o Starts at the leaf nodes and moves upwards, checking each node to see if it is useful for classification.
🔺 Top-Down Pruning
o Starts at the root and moves downwards; if a node is not useful, it is cut off, and the whole sub-tree under it may be removed.
o Still useful because it often works well with new (unseen) data.
Example:
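As an illustrative sketch, assuming scikit-learn's cost-complexity pruning (one common post-pruning method, not necessarily the one intended here) on the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree often overfits; increasing ccp_alpha prunes more branches.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[:-1]:   # the last alpha would prune down to a single node
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  test acc={tree.score(X_te, y_te):.2f}")
```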
📈 Model Complexity
Definition:
Model complexity refers to how flexible a model is in capturing patterns from data.
🔸 Effects of Complexity
Issue | Description
Overfitting | Model is too complex and learns noise in the training data; performs poorly on unseen data.
🔸 Balancing Complexity
o Pruning
After construction, the tree can be transformed into a series of if-then rules for better
interpretability.
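A quick sketch of printing a fitted tree as if-then rules, using scikit-learn's export_text (the iris data is used only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Print the tree as nested if-then conditions on the feature values
print(export_text(tree, feature_names=list(iris.feature_names)))
```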
📌 Summary
Decision Trees are simple, interpretable, and powerful for both classification and regression.
Overfitting is a common issue, tackled via pruning and controlling model complexity.
Key metrics like Entropy and Information Gain guide the tree-building process.
Suitable for real-world data with categorical or continuous variables, even when incomplete
or noisy.
Classification Trees:
A classification tree is an algorithm where the target variable is fixed or categorical. The
algorithm is then used to identify the “class” within which a target variable would most likely
fall.
An example of a classification-type problem would be determining who will or will not
subscribe to a digital platform; or who will or will not graduate from high school.
These are examples of simple binary classifications where the categorical dependent variable
can assume only one of two, mutually exclusive values.
Regression Trees:
A regression tree refers to an algorithm where the target variable is continuous, and the algorithm is used to predict its value.
As an example of a regression type problem, you may want to predict the selling prices of a
residential house, which is a continuous dependent variable.
This will depend on continuous factors such as square footage as well as categorical factors.
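A hedged sketch of a regression tree on mixed predictors; the columns and prices are invented, and the categorical column is one-hot encoded first because scikit-learn trees need numeric inputs:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Invented housing data: one continuous and one categorical predictor
houses = pd.DataFrame({
    "sqft":     [800, 1200, 1500, 900, 2000, 1100],
    "locality": ["suburb", "city", "city", "suburb", "city", "suburb"],
    "price":    [40, 75, 90, 45, 120, 52],
})

X = pd.get_dummies(houses[["sqft", "locality"]])   # encode the categorical column
y = houses["price"]
model = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Predict the selling price (a continuous value) for an unseen house
new = pd.DataFrame({"sqft": [1300], "locality": ["city"]})
new = pd.get_dummies(new).reindex(columns=X.columns, fill_value=0)
print(model.predict(new))
```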
🔷 Multiple Decision Trees (Ensemble Learning)
Using multiple decision trees together is a core concept in ensemble learning — improving
prediction accuracy by combining the power of many models.
1. 🌲 Random Forest
✅ How It Works:
Randomly selects data samples (rows) and features (columns) to build each tree.
Final output: the majority vote of the trees for classification, or the average of their predictions for regression.
✅ Why It Works:
Reduces overfitting.
📌 Analogy: Like asking 100 students to solve a problem and taking a group consensus.
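A brief Random Forest sketch with scikit-learn (the dataset and settings are illustrative; the row and feature subsampling happens inside the estimator):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample and a random subset of features;
# the final prediction is the majority vote across all trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_tr, y_tr)
print(forest.score(X_te, y_te))   # accuracy on unseen data
```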
2. 🚀 Boosting
Trees are built sequentially, each fixing the mistakes of the previous one.
✅ How It Works:
✅ Why It Works:
📌 Analogy: Like improving your performance by learning from test mistakes each time.
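A hedged boosting sketch using gradient boosting, one common boosting variant (the notes do not specify which); the study-hours data is invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Invented data: hours studied -> exam score
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([35, 40, 50, 55, 65, 70, 78, 85])

# Each new tree is fit to the residual errors left by the previous trees,
# so the ensemble gradually corrects its own mistakes.
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=2)
model.fit(X, y)
print(model.predict([[6.5]]))
```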
3. 🎒 Bagging (Bootstrap Aggregating)
Similar to Random Forest but may not use random feature selection.
✅ How It Works:
Create multiple datasets using bootstrapping (random sampling with replacement).
Combine predictions.
✅ Benefit: Reduces variance, so the combined model overfits less than a single tree.
4. 🧠 Stacking
✅ How It Works:
Several different base models are trained, and a meta-model learns how to combine their predictions into the final answer.
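A minimal stacking sketch with scikit-learn, assuming a decision tree and naive Bayes as base models and logistic regression as the meta-model (these choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Base models make their own predictions; the meta-model (logistic regression)
# learns how to combine those predictions.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
)
print(stack.fit(X, y).score(X, y))
```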
🔷 Time Series Analysis
Time series analysis means studying how data changes over time at regular intervals, such as every hour, day, week, or year.
🌍 Where is it used?
Earthquake prediction
🔹 Types of Time Series Analysis
1. Classification
2. Curve Fitting
o Drawing a curve that follows the trend in the data, so we can understand how
variables are related.
3. Descriptive Analysis
4. Explanative Analysis
5. Exploratory Analysis
o Focuses on showing the main features of the data, often using charts and graphs.
6. Forecasting
7. Intervention Analysis
o Studies how a specific event (like a festival, disaster, or policy change) affects the
time series.
8. Segmentation
o Splits the data into parts (segments) to find underlying patterns in each part.
🔹 Components of a Time Series
1. Trend (Secular Variation)
o The long-term increase or decrease in the data over time, such as sales rising steadily year after year.
2. Seasonal Variation
o Regular changes that repeat every year, like higher ice cream sales in summer.
3. Cyclical Variation
o Similar to seasonal, but these patterns happen over several years — like business
cycles.
4. Irregular Variation
o Unpredictable, short-term fluctuations caused by random or one-off events.
o Can be: episodic (traceable to a specific cause, like a strike or disaster) or random (no identifiable cause).
🔹 What is ARIMA?
ARIMA is a powerful and popular statistical model used in time series analysis.
o AR – AutoRegressive
o I – Integrated (Differencing)
o MA – Moving Average
ARIMA is used to understand patterns in data over time and to predict future values (called
forecasting).
It is a generalization of ARMA, which only works when the data is stationary (constant mean
and variance over time).
ARIMA handles this using differencing (Integrated part), which converts non-stationary data
to stationary by subtracting past values.
🔹 Key Features
o Analyzing trends and patterns in the data over time
o Forecasting future values
ARIMA(p, d, q) has three parameters:
Parameter | Meaning
p | Number of past values used by the autoregressive (AR) part.
d | Number of times the data is differenced (the Integrated part) to make it stationary.
q | Number of past forecast errors used by the moving average (MA) part.
1. Autoregression (AR):
o Uses the relationship between an observation and a number of its past values.
2. Integrated (I):
o Differencing: subtracting the previous value from the current one to make the series stationary.
o Example: using the month-to-month change in sales instead of the raw sales figures.
3. Moving Average (MA):
o Uses past forecast errors to adjust the current prediction.
o Example: adjusting today's forecast using how wrong we were in the last few days.
🔹 ARMA – Special Case of ARIMA
Used when: the data is already stationary, so no differencing is needed (d = 0).
🔹 Seasonal ARIMA
For data with seasonal patterns (like monthly or quarterly data), use SARIMA(p, d, q)(P, D, Q)m.
Where:
o (p, d, q) are the non-seasonal parameters described above.
o (P, D, Q) are their seasonal counterparts.
o m is the number of periods per season (e.g., m = 12 for monthly data with a yearly pattern).
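A minimal forecasting sketch with statsmodels, assuming an ARIMA(1, 1, 1) order and an illustrative monthly series (in practice the order is chosen from the data):

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative monthly sales series
sales = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
     115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
    index=pd.period_range("2022-01", periods=24, freq="M"),
)

# order=(p, d, q): 1 past value, 1 round of differencing, 1 past error term
model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.forecast(steps=3))   # forecast the next 3 periods
```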
🔹 Forecast Accuracy
🔹 Applications of ARIMA
Forecast Accuracy refers to how close the forecasted values are to the actual observed values in
time series data. It helps evaluate the performance of a forecasting model.
A perfect forecast would have zero error for all time periods.
1. Mean Forecast Error (MFE):
o MFE = (1/n) Σ (Actualₜ − Forecastₜ)
o It indicates the bias in the forecast (whether the model tends to overestimate or underestimate).
2. Mean Absolute Deviation (MAD):
o MAD = (1/n) Σ |Actualₜ − Forecastₜ|
o Measures the average size of the forecast errors, ignoring direction (i.e., absolute values).
o It tells how far off, on average, the forecasts are from the actual values.
🔹 Difference Between MFE and MAD
o In MFE, positive and negative errors can cancel out, so it mainly shows the direction (bias) of the forecast errors.
o In MAD, errors are taken as absolute values, so it shows the typical size of the error regardless of direction.
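A tiny numeric sketch of both measures (the actual and forecast values are made up):

```python
import numpy as np

actual   = np.array([100, 110, 120, 130])
forecast = np.array([ 98, 115, 118, 135])

errors = actual - forecast
mfe = errors.mean()            # -1.5: negative, so the model over-forecasts on average
mad = np.abs(errors).mean()    #  3.5: typical size of the error, ignoring sign
print(mfe, mad)
```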
STL is a statistical method that decomposes a time series into three core components:
1. Trend (T):
Represents the long-term direction in the data. For example, sales consistently increasing
month by month shows a positive trend.
2. Seasonality (S):
Regular and recurring patterns at specific intervals, such as monthly or quarterly effects.
Example: festive sales spikes every December.
3. Residual/Noise (R):
Captures the irregular, unpredictable part of the data that cannot be explained by trend or
seasonality. Example: sales drop during a sudden lockdown.
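A short decomposition sketch using statsmodels' STL on a synthetic monthly series (period=12 is an assumption for monthly data):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series: upward trend + yearly seasonality + noise
months = pd.date_range("2018-01", periods=48, freq="MS")
values = (np.arange(48) * 2                                   # trend
          + 10 * np.sin(2 * np.pi * np.arange(48) / 12)       # seasonality
          + np.random.default_rng(0).normal(0, 1, 48))        # residual noise
series = pd.Series(values, index=months)

result = STL(series, period=12).fit()
print(result.trend.head(), result.seasonal.head(), result.resid.head(), sep="\n")
```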
🔹 Summary of STL
🔷 Feature Extraction and Dimensionality Reduction in Image-Based Prediction Systems
In image-based machine learning systems, the performance of a model depends heavily on how well
we extract and select meaningful features from data. However, as the number of features grows, so
does complexity. That’s where Dimensionality Reduction comes into play, helping streamline the
process for better results.
🔹 What is a Feature?
A feature is a measurable property or characteristic computed from the data, such as a color, texture, or shape descriptor extracted from an image.
In a typical system, three different methods of feature extraction may be applied, each producing
different sets of results. These methods are then compared, and the most effective feature set
(based on classification performance) is selected for building the final model.
🧮 Why Feature Selection Matters
Not all features are equally useful. Some may be redundant or irrelevant, and
including them increases computational cost and risk of overfitting—where a
model performs well on training data but poorly on unseen data.
To avoid this:
Only relevant and non-redundant features are selected.
The aim is to preserve discriminative power while improving model
generalizability.
🔧 Using Extracted Features for Prediction
Once features are extracted:
1. A Predictor Model is trained using labeled examples (e.g., good vs bad
quality images, salient vs non-salient content).
2. The model outputs:
o A class label (e.g., "defective", "normal", "salient").
🔍 Feature Extraction
Feature Extraction involves calculating numerical descriptors from the image data that capture its meaningful properties (for example, color, texture, or shape).
Multiple methods of extraction can be applied. Each method yields a different set of features, and
the most effective one is selected based on classification accuracy and computational efficiency.
When many features are extracted, the dataset becomes high-dimensional, making it:
Slower to compute
Prone to overfitting (where models perform well on training data but poorly on unseen data)
Two main approaches are used to reduce dimensionality:
1. Feature Selection
Selects a subset of the original features based on relevance and redundancy, often using
statistical tests or model-based techniques.
2. Feature Extraction
Transforms the data into a lower-dimensional space while preserving key information. The most common method is Principal Component Analysis (PCA).
Steps:
o Standardize the features.
o Compute the covariance matrix and its eigenvectors and eigenvalues (the principal components).
o Keep the top components and project the data onto them.
Advantages:
o Fewer features, so models train faster and are less prone to overfitting.
o The retained components preserve most of the variance in the data.
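A minimal PCA sketch with scikit-learn; the feature matrix is random stand-in data for extracted image features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a high-dimensional feature matrix: 20 images x 10 extracted features
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 10))

# Project onto the 3 directions that retain the most variance
pca = PCA(n_components=3)
reduced = pca.fit_transform(features)
print(reduced.shape)                     # (20, 3)
print(pca.explained_variance_ratio_)     # share of variance kept by each component
```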
📊 Feature-Based Prediction
Once relevant features are extracted and dimensionality is reduced, a Predictor (machine learning
model) is trained using labeled examples:
Output:
A class label
The predictor is selected to avoid overfitting, especially when feature count is high compared to
sample size—a common issue in image processing.
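A compact end-to-end sketch under assumed choices (synthetic stand-in features, PCA for dimensionality reduction, and logistic regression as the predictor, chosen only for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for extracted image features and quality labels
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))                  # 60 images, 30 features each
y = (X[:, :5].sum(axis=1) > 0).astype(int)     # 1 = "good", 0 = "defective" (made up)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale -> reduce dimensionality -> train the predictor on labeled examples
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression()),
])
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))   # accuracy on unseen images
```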