Data Analytics - Unit 4 Notes

1. Supervised vs Unsupervised Learning (Tabular Format)

| Feature | Supervised Learning | Unsupervised Learning |
|---------|---------------------|-----------------------|
| Definition | Learning from labeled data | Learning from unlabeled data |
| Input Data | Each input has a known output label | Inputs have no output labels |
| Goal | Predict the output for new inputs | Discover hidden patterns or structure |
| Output Type | Predictive (classification/regression) | Descriptive (clusters/associations) |
| Example Tasks | Classification, Regression | Clustering, Association |
| Evaluation | Accuracy, RMSE, etc. | Silhouette score, manual interpretation |
| Algorithms | Decision Trees, SVM, Linear Regression | K-Means, DBSCAN, PCA |
| Use Cases | Email spam detection, loan approval | Customer segmentation, anomaly detection |

2. Segmentation

Segmentation divides a large dataset into smaller, meaningful subgroups based on similar behavior or attributes.

Types: Demographic, Geographic, Behavioral, Psychographic

Techniques: K-Means, Hierarchical, DBSCAN

Applications: Marketing, Healthcare, Finance, E-commerce

Purpose: Discover patterns, target specific user groups, improve model performance.
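
As an illustration of the first technique listed, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The function name `kmeans`, the hand-picked starting centroids, and the customer data are hypothetical; library versions such as scikit-learn's `KMeans` choose starting centroids automatically (k-means++ by default) and can run several restarts (`n_init`).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in init_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
centroids, labels = kmeans(customers, init_centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]: a low-spend and a high-spend segment
```

Because K-Means relies on Euclidean distance, features measured on very different scales are usually standardized before clustering.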

3. Decision Trees

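A tree-structured model used for classification or regression: each internal node tests an attribute, each branch is an outcome of that test, and each leaf holds a prediction.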

Types:
- Classification Tree: Categorical output
- Regression Tree: Numerical output

Process:
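1. Select the best splitting attribute using an impurity criterion (e.g., Gini impurity or entropy/information gain for classification; variance reduction for regression)
2. Split the data into subsets on that attribute
3. Repeat recursively on each subset until the leaf nodes are pure or a stopping rule (e.g., maximum depth) is reached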

Overfitting: Very deep trees can memorize the training data instead of learning general patterns

Pruning: Reduces tree size to prevent overfitting

Applications: Loan approval, diagnosis, HR attrition
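
The splitting step in the process above can be sketched as follows: compute Gini impurity or entropy, then try every (feature, threshold) pair and keep the one whose child nodes have the lowest size-weighted impurity. Minimizing the weighted child impurity is equivalent to maximizing information gain (or Gini decrease), because the parent's impurity is the same for every candidate split. This is a minimal plain-Python sketch; the names `gini`, `entropy`, `best_split` and the loan data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_k^2); 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy -sum(p_k * log2(p_k)) in bits; 0 means pure."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(
    rows: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    impurity: Callable[[Sequence[Hashable]], float] = gini,
) -> Optional[Tuple[int, float, float]]:
    """Try every (feature, threshold) pair and return the one whose children
    have the lowest size-weighted impurity: (feature, threshold, impurity)."""
    n = len(rows)
    best: Optional[Tuple[int, float, float]] = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # both children must be non-empty
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical loan data: (income in $1000s, number of existing loans).
rows = [(25, 2), (30, 3), (45, 1), (60, 0), (80, 1), (95, 0)]
labels = ["reject", "reject", "reject", "approve", "approve", "approve"]
print(best_split(rows, labels))  # (0, 45, 0.0): split on income <= 45
```

A full tree builder would call `best_split` again on each child until a leaf is pure or a stopping rule applies.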

4. Overfitting and Pruning

Overfitting: The model fits the training data too closely, including its noise, and fails to generalize to new data

Symptoms: High training accuracy, poor test accuracy

Pruning Types:
- Pre-pruning: Stop early (e.g., max depth, min samples)
- Post-pruning: Build full tree, then cut weak branches

Goal: Improve generalization, reduce complexity
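
To make the symptoms and pre-pruning concrete, here is a minimal sketch: a tiny tree grower on one numeric feature, where `max_depth` and `min_samples_split` act as pre-pruning rules. The function names, the dict-based node format, and the toy data (with one deliberately flipped label standing in for noise) are all hypothetical. Post-pruning is not shown; in scikit-learn it is available as minimal cost-complexity pruning through the `ccp_alpha` parameter of `DecisionTreeClassifier`.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Union

Tree = Union[int, Dict[str, object]]  # leaf = class label, node = dict


def gini(ys: Sequence[int]) -> float:
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())


def build_tree(xs: Sequence[float], ys: Sequence[int],
               max_depth: Optional[int] = None,
               min_samples_split: int = 2) -> Tree:
    """Grow a classification tree on one numeric feature.
    max_depth and min_samples_split are pre-pruning (early-stopping) rules."""
    def grow(idx: List[int], depth: int) -> Tree:
        labels = [ys[i] for i in idx]
        majority = Counter(labels).most_common(1)[0][0]
        if (gini(labels) == 0.0 or len(idx) < min_samples_split
                or (max_depth is not None and depth >= max_depth)):
            return majority  # stop growing: this node becomes a leaf
        best = None  # (weighted child impurity, threshold, left idx, right idx)
        for t in sorted({xs[i] for i in idx}):
            left = [i for i in idx if xs[i] <= t]
            right = [i for i in idx if xs[i] > t]
            if left and right:
                score = (len(left) * gini([ys[i] for i in left])
                         + len(right) * gini([ys[i] for i in right])) / len(idx)
                if best is None or score < best[0]:
                    best = (score, t, left, right)
        if best is None:
            return majority  # all x values equal: no split possible
        _, t, left, right = best
        return {"threshold": t, "left": grow(left, depth + 1),
                "right": grow(right, depth + 1)}
    return grow(list(range(len(xs))), 0)


def predict(tree: Tree, x: float) -> int:
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


def accuracy(tree: Tree, xs: Sequence[float], ys: Sequence[int]) -> float:
    return sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: true rule is "class 1 iff x >= 5", but the label at
# x = 2 is flipped to simulate noise.
train_x, train_y = list(range(10)), [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
test_x, test_y = [1.5, 2.0, 3.5, 6.5, 8.5], [0, 0, 0, 1, 1]

full = build_tree(train_x, train_y)                 # no pre-pruning
pruned = build_tree(train_x, train_y, max_depth=1)  # pre-pruned
print(accuracy(full, train_x, train_y), accuracy(full, test_x, test_y))      # 1.0 0.6
print(accuracy(pruned, train_x, train_y), accuracy(pruned, test_x, test_y))  # 0.9 1.0
```

On this constructed data the unpruned tree carves out a region around the noisy point, scoring 100% on training data but misclassifying nearby test points; limiting the depth to 1 gives up one training point and generalizes better. The numbers describe this toy example only.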

5. Measures of Forecast Accuracy

Used to evaluate time series forecasting models by comparing forecasts with the actual observed values:

- MAE = Mean Absolute Error
- MSE = Mean Squared Error
- RMSE = Root Mean Squared Error
- MAPE = Mean Absolute Percentage Error
- sMAPE = Symmetric MAPE
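
For reference, the usual definitions, where y_t is the actual value, ŷ_t the forecast at time t, and n the number of forecast points. sMAPE has several variants in use; the form below, with the average of |y_t| and |ŷ_t| in the denominator, is one common choice.

```latex
\begin{aligned}
\text{MAE}   &= \frac{1}{n}\sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert \\
\text{MSE}   &= \frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2 \\
\text{RMSE}  &= \sqrt{\text{MSE}} \\
\text{MAPE}  &= \frac{100\%}{n}\sum_{t=1}^{n} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert \\
\text{sMAPE} &= \frac{100\%}{n}\sum_{t=1}^{n} \frac{\lvert y_t - \hat{y}_t \rvert}{\left(\lvert y_t \rvert + \lvert \hat{y}_t \rvert\right)/2}
\end{aligned}
```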

Applications: Retail demand, finance, weather forecasting

Lower values = Better accuracy
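
The same metrics as a minimal Python sketch, applied to a hypothetical four-week demand forecast. The function names are illustrative, and the sMAPE function uses the one variant noted in its comment. MAPE divides by the actual value, so it is undefined when any actual value is zero.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Error: average size of the errors, in the data's units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Squared Error: squaring penalizes large errors more heavily."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Squared Error: MSE brought back to the data's units."""
    return math.sqrt(mse(actual, forecast))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error (%); undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE (%), one common variant: the base is the mean of
    |actual| and |forecast|."""
    return 100.0 * sum(
        abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    ) / len(actual)


# Hypothetical weekly demand: actual units sold vs. a model's forecast.
actual = [100, 120, 130, 110]
forecast = [110, 115, 125, 100]
print(mae(actual, forecast), mse(actual, forecast))  # 7.5 62.5
print(round(rmse(actual, forecast), 3))              # 7.906
print(round(mape(actual, forecast), 2), round(smape(actual, forecast), 2))  # 6.78 6.81
```

MAE and RMSE are in the data's own units (units sold here), while MAPE and sMAPE are percentages, which makes them easier to compare across series of different scale.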

6. STL Decomposition

STL = Seasonal and Trend decomposition using Loess

Components:
- Trend: Long-term movement
- Seasonality: Repeated cycles
- Residual: Remaining noise after trend and seasonality are removed

Uses Loess smoothing for flexible decomposition
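
Loess fits a weighted straight line around each point, giving nearby observations more weight than distant ones. Below is a minimal sketch of the basic smoother (local linear fit, tricube weights, no robustness iterations); it is only the building block that STL applies repeatedly, not STL itself. The function name `loess`, the `frac` parameter (the share of points used in each local fit), and the assumption of at least two distinct x values are illustrative choices.

```python
from typing import List, Sequence


def loess(xs: Sequence[float], ys: Sequence[float], frac: float = 0.5) -> List[float]:
    """Degree-1 Loess: at each x0, fit a weighted least-squares line to the
    nearest points (tricube weights) and evaluate it at x0.
    Assumes distinct x values; no robustness iterations."""
    n = len(xs)
    q = min(n, max(3, round(frac * n)))  # points in each local neighbourhood
    fitted: List[float] = []
    for x0 in xs:
        # Bandwidth h: distance from x0 to its q-th nearest neighbour.
        h = sorted(abs(x - x0) for x in xs)[q - 1]
        # Tricube weights: 1 at x0, falling smoothly to 0 at distance h.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
             for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical noisy, roughly linear series.
xs = list(range(12))
ys = [0.0, 1.2, 1.8, 3.3, 3.9, 5.1, 6.2, 6.8, 8.1, 9.0, 9.8, 11.2]
print([round(v, 2) for v in loess(xs, ys, frac=0.5)])
```

A larger `frac` gives a smoother curve; a smaller one follows the data more closely.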

Applications: Sales trends, stock prices, weather patterns

Helps clean and analyze time series data before forecasting.
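
Full STL is too long for a short note, so the sketch below swaps in classical additive decomposition, a simpler method that produces the same three components (y = trend + seasonal + residual). Here the trend is a centred moving average and the seasonal pattern is the average detrended value at each position in the cycle; STL instead estimates both with iterated Loess smoothing, which lets the seasonal pattern change over time and can be made robust to outliers. The function name `classical_decompose` and the sales data are hypothetical. If statsmodels is available, `statsmodels.tsa.seasonal.STL(series, period=4).fit()` returns the STL components as `.trend`, `.seasonal`, and `.resid`.

```python
from typing import List, Optional, Sequence, Tuple


def classical_decompose(
    y: Sequence[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Additive decomposition y = trend + seasonal + residual.

    Trend: centred moving average over one full period (a 2 x m average
    when the period m is even). Seasonal: mean detrended value at each
    position in the cycle, centred to sum to zero. Assumes len(y) covers
    at least two full periods. Trend and residual are None at the edges,
    where the centred window does not fit.
    """
    n, half = len(y), period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points get half weight.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period

    sums, counts = [0.0] * period, [0] * period
    for t, tr in enumerate(trend):
        if tr is not None:
            sums[t % period] += y[t] - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    offset = sum(raw) / period
    pattern = [r - offset for r in raw]  # seasonal effects sum to zero
    seasonal = [pattern[t % period] for t in range(n)]

    resid = [None if tr is None else y[t] - tr - seasonal[t]
             for t, tr in enumerate(trend)]
    return trend, seasonal, resid


# Hypothetical quarterly sales: linear growth plus a fixed quarterly pattern.
quarter_effect = [3.0, -1.0, -4.0, 2.0]
sales = [10 + 0.5 * t + quarter_effect[t % 4] for t in range(16)]
trend, seasonal, resid = classical_decompose(sales, period=4)
print(seasonal[:4])         # [3.0, -1.0, -4.0, 2.0]
print(trend[2], trend[13])  # 11.0 16.5
```

Because this toy series is exactly trend plus a fixed pattern, the residuals come out as zero; on real data they hold the noise left after removing trend and seasonality.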
