Data Analytics Unit4 FullNotes

The document covers key concepts in data analytics, focusing on supervised vs unsupervised learning, segmentation, decision trees, overfitting and pruning, measures of forecast accuracy, and STL decomposition. It outlines definitions, types, techniques, applications, and challenges associated with each topic. The goal is to enhance analysis, decision-making, and forecasting in various fields such as marketing, healthcare, and finance.

Uploaded by

nagarajchintu1234

Data Analytics - Unit 4 Full Notes

1. Supervised vs Unsupervised Learning

Supervised Learning vs Unsupervised Learning:

| Feature | Supervised Learning | Unsupervised Learning |
|-------------------|-----------------------------------------|-----------------------------------------|
| Definition | Learning with labeled data | Learning with unlabeled data |
| Input Data | Input has output labels | Input has no output labels |
| Goal | Predict output | Discover hidden patterns |
| Output Type | Predictive (classification/regression) | Descriptive (clusters/associations) |
| Examples of Tasks | Classification, Regression | Clustering, Association |
| Evaluation | Accuracy, RMSE, etc. | Silhouette score, manual interpretation |
| Algorithms | Decision Trees, SVM, Linear Regression | K-Means, DBSCAN, PCA |
| Use Cases | Email spam detection, loan approval | Customer segmentation, anomaly detection |

2. Segmentation

Segmentation is the process of dividing a dataset into smaller, meaningful subgroups based on similarities in attributes or behavior.

Types of Segmentation:
- Demographic: Age, income, gender
- Geographic: Region, city, country
- Behavioral: Purchase habits, product usage
- Psychographic: Lifestyle, interests

Segmentation Techniques:
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN
- Self-Organizing Maps (SOM)
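
As an illustration of the first technique in the list, here is a minimal K-Means sketch in plain Python. The customer data, the function name `kmeans`, and its parameters are assumptions made for this example; a real analysis would normally scale the features first and use a library implementation.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical customers: (age, monthly spend)
customers = [(22, 30), (25, 35), (24, 28), (58, 210), (61, 190), (55, 205)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # the first three customers share one label, the last three the other
```

The cluster numbers themselves are arbitrary; only the grouping matters. Each resulting group can then be profiled (average age, average spend) to describe the segment.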

Applications:
- Marketing: Targeting specific customer groups
- Healthcare: Grouping patients by conditions
- Retail: Personalizing product recommendations

Goal: Improve analysis, decision-making, and forecasting by understanding group-specific behavior.

3. Decision Trees

Decision Trees are flowchart-like structures used for classification and regression tasks.

Types:
- Classification Tree: Output is categorical
- Regression Tree: Output is numerical

Structure:
- Nodes: Attribute tests
- Branches: Outcomes of tests
- Leaves: Final decisions or class labels

Splitting Criteria:
- Gini Index, Entropy/Information Gain for classification
- Variance reduction for regression
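
The splitting criteria above can be written as short functions. This is a sketch using the standard textbook definitions; the function names are chosen for this example only.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Regression analogue: parent variance minus weighted child variance."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(gini(parent))                                             # 0.5
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))   # 1.0
```

A perfectly mixed two-class node has Gini 0.5 and entropy 1 bit; a split that separates the classes completely recovers all of that as information gain.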

Process:
1. Choose the best splitting attribute
2. Partition the data accordingly
3. Recursively build subtrees
4. Stop when data is pure or depth is limited
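
The four steps can be sketched as a small recursive builder. This is a simplified illustration, assuming numeric features, binary splits of the form `value <= threshold`, and the Gini index as the criterion; the loan data and all names are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 4: stop when the node is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)          # Step 1: best splitting attribute
    if split is None:                         # no split improves purity
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]   # Step 2: partition
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {                                  # Step 3: recurse into each part
        "feature": f, "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    while isinstance(tree, dict):  # descend until a leaf (a plain label) is reached
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical loan data: (income in thousands, number of existing debts)
rows = [(20, 3), (25, 2), (40, 1), (60, 0), (80, 1), (30, 4)]
labels = ["reject", "reject", "approve", "approve", "approve", "reject"]
tree = build_tree(rows, labels)
print(predict(tree, (70, 0)))  # approve
```

On this toy data a single split on income (threshold 30) separates the classes, so both children are pure and the recursion stops after one level.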

Challenges:
- Overfitting: Very deep trees memorize noise in the training data
- Pruning: The usual remedy; simplifies the tree by removing branches that add little predictive value

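Ensembles (Multiple Trees):
- Random Forests: Train many trees on random samples of the data and combine them by voting (classification) or averaging (regression)
- Boosting: Trains weak learners one after another, each correcting the errors of the previous ones, and combines them into a strong model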
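
The voting idea behind Random Forests can be sketched in a few lines. This shows only the final voting step, not the random sampling of rows and features that a real Random Forest uses to train its trees; the three threshold rules below are hypothetical stand-ins for trained trees.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical "trees", each reduced to a single threshold rule
trees = [
    lambda income, debts: "approve" if income > 35 else "reject",
    lambda income, debts: "approve" if debts < 2 else "reject",
    lambda income, debts: "approve" if income > 50 else "reject",
]

votes = [tree(40, 1) for tree in trees]
print(votes)                 # ['approve', 'approve', 'reject']
print(majority_vote(votes))  # approve
```

The third rule disagrees, but it is outvoted. This is why an ensemble tends to be more stable than any single tree.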

Applications: Credit scoring, medical diagnosis, churn prediction

4. Overfitting and Pruning

Overfitting occurs when a model learns the training data too closely, including noise and anomalies, leading to poor generalization.

Symptoms:
- High training accuracy but low test accuracy
- Complex and deep tree structure

Causes:
- Too many attributes
- Lack of pruning
- Small datasets

Pruning is used to reduce tree size and improve generalization.

Types of Pruning:
- Pre-Pruning: Stops tree growth early (e.g., max depth, min samples)
- Post-Pruning: Removes unnecessary branches after full tree is built
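
Pre-pruning amounts to stopping conditions inside the tree builder (such as a depth limit). Post-pruning is less obvious, so here is a sketch of one common variant, reduced-error pruning, which uses a held-out validation set. The nested-dict tree format, the `majority` key (the majority training class at each node), and the data are assumptions made for this example.

```python
def predict(tree, row):
    while isinstance(tree, dict):  # a leaf is a plain class label
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

def prune(tree, rows, labels):
    """Bottom-up: replace a subtree with a leaf when the leaf is at least as
    accurate on the validation rows that reach that subtree."""
    if not isinstance(tree, dict) or not rows:
        return tree  # already a leaf, or no validation evidence here
    f, t = tree["feature"], tree["threshold"]
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    tree = dict(
        tree,
        left=prune(tree["left"], [r for r, _ in left], [y for _, y in left]),
        right=prune(tree["right"], [r for r, _ in right], [y for _, y in right]),
    )
    leaf = tree["majority"]
    if accuracy(leaf, rows, labels) >= accuracy(tree, rows, labels):
        return leaf
    return tree

# Hypothetical overgrown tree on (income, debts); the inner split memorised noise
overgrown = {
    "feature": 0, "threshold": 30, "majority": "approve",
    "left": "reject",
    "right": {"feature": 1, "threshold": 2, "majority": "approve",
              "left": "approve", "right": "reject"},
}
val_rows = [(40, 3), (60, 0), (25, 1), (50, 1)]
val_labels = ["approve", "approve", "reject", "approve"]

pruned = prune(overgrown, val_rows, val_labels)
print(pruned)
```

The inner split on debts misclassifies one validation row, so it is collapsed into an `approve` leaf. The root split is kept because removing it would lower validation accuracy.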

Benefits:
- Reduces overfitting
- Improves prediction on unseen data
- Enhances interpretability

Goal: Build a model that balances complexity and accuracy.

5. Measures of Forecast Accuracy

Forecast accuracy metrics evaluate how close predictions are to actual values.

Common Metrics:
- MAE (Mean Absolute Error): Average of absolute errors
- MSE (Mean Squared Error): Average of squared errors
- RMSE (Root Mean Squared Error): Square root of MSE
- MAPE (Mean Absolute Percentage Error): Error as a percentage
- sMAPE (Symmetric MAPE): Balanced version of MAPE
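
The metrics above can be computed directly from paired lists of actual and forecast values. A minimal sketch follows; sMAPE has several variants in the literature, and the one below (absolute difference divided by the mean of the absolute actual and forecast values) is an assumption for this example.

```python
from math import sqrt

def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return sqrt(mse(actual, forecast))

def mape(actual, forecast):
    # Percentage error relative to the actual value; fails if any actual is 0.
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # Symmetric variant: divides by the mean of |actual| and |forecast|.
    # Still undefined when actual and forecast are both 0.
    return 100 * sum(
        2 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / len(actual)

# Hypothetical sales figures
actual = [100, 200, 400]
forecast = [110, 180, 400]

for metric in (mae, mse, rmse, mape, smape):
    print(metric.__name__, round(metric(actual, forecast), 2))
```

MAE, MSE, and RMSE are in the units of the data (or squared units for MSE), while MAPE and sMAPE are percentages and can be compared across series of different scale.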

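Choosing the Right Metric:
- Use MAE for a simple average error in the units of the data
- Use RMSE when large errors matter more, since squaring penalizes them more heavily
- Use MAPE for relative accuracy (avoid it when actual values are zero or close to zero, because the percentage is undefined or explodes)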
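
To see why RMSE is preferred when large errors matter, compare two hypothetical forecasts whose errors have the same total size but a different spread. The numbers are invented for illustration.

```python
from math import sqrt

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return sqrt(sum(e ** 2 for e in errors) / len(errors))

steady = [10, 10, 10, 10]  # always off by a little
spiky = [0, 0, 0, 40]      # usually exact, occasionally far off

print(mae(steady), rmse(steady))  # 10.0 10.0
print(mae(spiky), rmse(spiky))    # 10.0 20.0
```

MAE rates both forecasts the same, while RMSE doubles for the forecast with one large miss.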

Applications:
- Retail: Sales forecasting
- Finance: Stock price prediction
- Healthcare: Patient count prediction

Lower metric values indicate higher accuracy.

6. STL Decomposition

STL (Seasonal and Trend decomposition using Loess) breaks a time series into three components:

1. Trend: Long-term progression
2. Seasonality: Repeating short-term cycles
3. Residual: Random noise

STL uses LOESS (Local regression) for smoothing and is highly flexible.

Advantages:
- Handles seasonality of any period (weekly, monthly, and so on), and the seasonal pattern may change over time
- Robust to outliers
- Allows component-wise analysis

Steps:
1. Input time series
2. Apply smoothing to extract trend and seasonality
3. Subtract from original to get residual
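
The three steps can be illustrated with a deliberately simplified stand-in. The sketch below is NOT STL: it replaces LOESS smoothing with a centered moving average and per-position averages (classical additive decomposition), which is enough to show how trend, seasonality, and residual are separated. The function name and the synthetic series are assumptions; the series must cover at least two full periods. For real STL, a library implementation such as the `STL` class in `statsmodels.tsa.seasonal` is the usual choice.

```python
def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual."""
    n, half = len(series), period // 2
    # Trend: centered moving average (end points get half weight if period is even).
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[i] = total / period
    # Seasonality: average detrended value at each position of the cycle.
    detrended = [s - t if t is not None else None for s, t in zip(series, trend)]
    means = []
    for pos in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == pos]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period  # center the seasonal component on zero
    seasonal = [means[i % period] - offset for i in range(n)]
    # Residual: what is left after removing trend and seasonality.
    residual = [s - t - c if t is not None else None
                for s, t, c in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic quarterly series: linear trend plus a repeating pattern, no noise
pattern = [5, -5, 2, -2]
series = [10 + i + pattern[i % 4] for i in range(12)]

trend, seasonal, residual = decompose(series, period=4)
print(seasonal[:4])
print(residual[2:10])
```

Because the synthetic series has no noise, the recovered seasonal values match the pattern and the residuals are zero wherever the trend is defined. The moving average cannot be computed for the first and last `period // 2` points, so those entries stay `None`.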

Applications:
- Retail: Understand sales trends
- Finance: Analyze stock patterns
- Weather: Seasonal forecasting

STL is often used to preprocess a time series (for example, to remove seasonality) before applying models like ARIMA.
