Data Analytics Unit 4 Notes
2. Segmentation
Segmentation divides a large dataset into smaller, meaningful subgroups based on similar behavior or attributes.
Purpose: Discover patterns, target specific user groups, improve model performance.
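A minimal sketch of rule-based segmentation, to make the idea concrete. The customer records, the `segment_by_spend` helper, and the spend thresholds are all hypothetical and chosen only for illustration; real segmentation often uses clustering or several attributes at once.

```python
from collections import defaultdict

# Hypothetical customer records: (customer_id, monthly_spend)
customers = [("a", 12.0), ("b", 250.0), ("c", 80.0), ("d", 15.0), ("e", 300.0)]

def segment_by_spend(records, low=50.0, high=200.0):
    """Group records into 'low', 'mid', 'high' segments by spend thresholds."""
    segments = defaultdict(list)
    for customer_id, spend in records:
        if spend < low:
            label = "low"
        elif spend < high:
            label = "mid"
        else:
            label = "high"
        segments[label].append(customer_id)
    return dict(segments)

segments = segment_by_spend(customers)
for label, members in segments.items():
    print(label, members)
```

Each subgroup can then be analyzed or modeled separately, which is where the pattern discovery and targeting benefits come from.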
3. Decision Trees
Types:
- Classification Tree: Categorical output
- Regression Tree: Numerical output
Process:
1. Select the splitting attribute using an impurity measure (e.g., Gini index, entropy)
2. Split data into subsets
3. Recurse on each subset until leaf nodes are pure or a stopping rule is met
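The three steps can be sketched in plain Python for a classification tree using Gini impurity. This is an illustrative sketch, not a production implementation: the function names (`gini`, `best_split`, `build_tree`), the threshold-style binary splits, and the toy data are assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Step 1: return (feature, threshold, weighted_gini) of the best split."""
    best = (None, None, gini(labels))
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:  # keep only splits that reduce impurity
                best = (feature, threshold, score)
    return best

def build_tree(rows, labels):
    """Steps 2 and 3: split, then recurse until nodes are pure."""
    feature, threshold, _ = best_split(rows, labels)
    if feature is None:  # pure node, or no split reduces impurity
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left]),
        "right": build_tree([r for r, _ in right], [y for _, y in right]),
    }

# Toy data (hypothetical): two numeric features, binary label
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0]]
labels = ["no", "no", "yes", "yes"]
tree = build_tree(rows, labels)
print(tree)
```

Swapping `gini` for an entropy function would give the information-gain variant of the same procedure.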
Overfitting: The model fits the training data too closely, including its noise, and so generalizes poorly to new data
Pruning Types:
- Pre-pruning: Stop early (e.g., max depth, min samples)
- Post-pruning: Build full tree, then cut weak branches
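Pre-pruning can be illustrated with a small tree builder that accepts `max_depth` and `min_samples` limits. This is a sketch under simplifying assumptions: the `grow` and `depth_of` helpers are hypothetical, the data is a toy one-feature set with deliberately noisy labels, and splits are taken at the median rather than by an impurity measure. Post-pruning is not shown.

```python
from collections import Counter

def majority(labels):
    """Most common label in a node."""
    return Counter(labels).most_common(1)[0][0]

def grow(points, depth=0, max_depth=2, min_samples=2):
    """Grow a tree on (x, label) pairs, splitting at the lower median of x.

    Pre-pruning: stop when the node is pure, too deep, or too small.
    """
    labels = [y for _, y in points]
    if len(set(labels)) == 1 or depth >= max_depth or len(points) < min_samples:
        return majority(labels)
    xs = sorted(x for x, _ in points)
    threshold = xs[(len(xs) - 1) // 2]
    left = [p for p in points if p[0] <= threshold]
    right = [p for p in points if p[0] > threshold]
    if not left or not right:
        return majority(labels)
    return {
        "threshold": threshold,
        "left": grow(left, depth + 1, max_depth, min_samples),
        "right": grow(right, depth + 1, max_depth, min_samples),
    }

def depth_of(tree):
    """Number of split levels in the tree (0 for a single leaf)."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth_of(tree["left"]), depth_of(tree["right"]))

# Toy data (hypothetical): mostly "a" for small x, "b" for large x, with noise
points = [(1, "a"), (2, "a"), (3, "b"), (4, "a"),
          (5, "b"), (6, "b"), (7, "a"), (8, "b")]

shallow = grow(points, max_depth=1)   # pre-pruned: ignores the noisy points
full = grow(points, max_depth=10)     # unpruned: memorizes the noise
print(depth_of(shallow), depth_of(full))
```

The pre-pruned tree keeps a single split and treats the two mislabeled points as noise, while the unrestricted tree adds extra levels just to isolate them.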
6. STL Decomposition
STL (Seasonal-Trend decomposition using LOESS) splits a time series into three components:
- Trend: Long-term movement
- Seasonality: Pattern that repeats with a fixed period
- Residual: What remains after removing trend and seasonality (noise)
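The three components can be illustrated with a simplified additive decomposition. Note that this sketch swaps in a classical moving-average decomposition, not STL itself, which estimates trend and seasonality with LOESS smoothing. The `decompose` helper, the period of 4, and the synthetic series are assumptions made for the example.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the window does not fit at the edges."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: half weight on the two end points (2 x period MA)
            total = sum(window) - 0.5 * (window[0] + window[-1])
            trend[t] = total / period
    return trend

def decompose(series, period):
    """Additive split: series = trend + seasonal + residual."""
    trend = centered_moving_average(series, period)
    detrended = [y - m if m is not None else None for y, m in zip(series, trend)]
    # Average the detrended values at each position within the cycle
    seasonal_means = []
    for k in range(period):
        values = [d for i, d in enumerate(detrended)
                  if d is not None and i % period == k]
        seasonal_means.append(sum(values) / len(values))
    # Center the seasonal component so it sums to zero over one cycle
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[i % period] - offset for i in range(len(series))]
    residual = [y - m - s if m is not None else None
                for y, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic series (hypothetical): linear trend plus a repeating 4-step pattern
pattern = [2.0, -1.0, -2.0, 1.0]
series = [float(t) + pattern[t % 4] for t in range(12)]

trend, seasonal, residual = decompose(series, period=4)
print(trend[5], seasonal[:4], residual[5])
```

Because the synthetic series has no noise, the residual is essentially zero wherever the trend is defined; on real data the residual carries the irregular variation.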