
Data Analytics - Unit 4 Full Notes

1. Supervised vs Unsupervised Learning

Supervised Learning vs Unsupervised Learning:

| Feature | Supervised Learning | Unsupervised Learning |
|---------|---------------------|------------------------|
| Definition | Learning with labeled data | Learning with unlabeled data |
| Input Data | Input has output labels | Input has no output labels |
| Goal | Predict output | Discover hidden patterns |
| Output Type | Predictive (classification/regression) | Descriptive (clusters/associations) |
| Examples of Tasks | Classification, Regression | Clustering, Association |
| Evaluation | Accuracy, RMSE, etc. | Silhouette score, manual interpretation |
| Algorithms | Decision Trees, SVM, Linear Regression | K-Means, DBSCAN, PCA |
| Use Cases | Email spam detection, loan approval | Customer segmentation, anomaly detection |
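To make the contrast concrete, here is a minimal scikit-learn sketch; the toy age/income values and the loan-approval labels are purely illustrative assumptions.

```python
# Minimal sketch: the same feature matrix used for supervised prediction
# (labels provided) and unsupervised pattern discovery (no labels).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[25, 30_000], [47, 82_000], [35, 54_000], [52, 91_000]])  # toy: age, income
y = np.array([0, 1, 0, 1])  # labels, e.g. loan approved (1) or not (0)

# Supervised: learn a mapping from X to the known labels y, then predict.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[40, 60_000]]))

# Unsupervised: no labels; group rows purely by similarity.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```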

2. Segmentation

Segmentation is the process of dividing a dataset into smaller, meaningful subgroups based on similarities in attributes
or behavior.

Types of Segmentation:
- Demographic: Age, income, gender
- Geographic: Region, city, country
- Behavioral: Purchase habits, product usage
- Psychographic: Lifestyle, interests

Segmentation Techniques:
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN
- Self-Organizing Maps (SOM)

Applications:
- Marketing: Targeting specific customer groups
- Healthcare: Grouping patients by conditions
- Retail: Personalizing product recommendations

Goal: Improve analysis, decision-making, and forecasting by understanding group-specific behavior.
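As a minimal sketch of behavioral segmentation with K-Means, the synthetic customer features below (annual spend, visits per month) are assumptions for illustration only.

```python
# Sketch: segment customers into k groups with K-Means on two behavioral features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy customer data: [annual_spend, visits_per_month] -- illustrative values.
customers = np.array([[200, 1], [250, 2], [5000, 12], [4800, 10], [1500, 5], [1700, 6]])

X = StandardScaler().fit_transform(customers)  # scale so both features contribute equally
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(segments)  # cluster label per customer, e.g. low/medium/high engagement
```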

3. Decision Trees

Decision Trees are flowchart-like structures used for classification and regression tasks.

Types:
- Classification Tree: Output is categorical
- Regression Tree: Output is numerical

Structure:
- Nodes: Attribute tests
- Branches: Outcomes of tests
- Leaves: Final decisions or class labels

Splitting Criteria:
- Gini Index, Entropy/Information Gain for classification
- Variance reduction for regression
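As a quick numeric illustration of the classification criteria, the sketch below computes the Gini index and entropy for a node's class distribution; the class counts are made up.

```python
# Sketch: Gini index and entropy for a node's class distribution (toy counts).
import math

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Toy node with 6 positive and 4 negative examples.
print(gini([6, 4]))     # 0.48   (0 = pure, 0.5 = worst case for two classes)
print(entropy([6, 4]))  # ~0.971 bits (0 = pure, 1 = worst case for two classes)
```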

Process:
1. Choose the best splitting attribute
2. Partition the data accordingly
3. Recursively build subtrees
4. Stop when data is pure or depth is limited
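Libraries such as scikit-learn run this recursive split/partition loop internally; a minimal sketch follows, using the Iris dataset purely as a convenient example.

```python
# Sketch: scikit-learn performs the choose-split / partition / recurse loop internally.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3)  # depth limit = stopping rule
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on held-out data
```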

Challenges and Remedies:
- Overfitting: Very deep trees memorize noise
- Pruning: Remedy that simplifies the tree by removing branches

Ensembles (Multiple Trees):
- Random Forests: Aggregate the votes of many trees trained on random subsets of the data
- Boosting: Combines weak learners into a strong model
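A minimal scikit-learn sketch of both ensemble ideas; the Iris dataset is just a stand-in for any labeled data.

```python
# Sketch: a bagged forest (voting) and a boosted ensemble evaluated on the same data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # many trees vote
boost = GradientBoostingClassifier(random_state=0)                 # sequential weak learners

print(cross_val_score(forest, X, y, cv=5).mean())
print(cross_val_score(boost, X, y, cv=5).mean())
```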

Applications: Credit scoring, medical diagnosis, churn prediction

4. Overfitting and Pruning

Overfitting occurs when a model learns the training data too closely, including noise and anomalies, leading to poor generalization.

Symptoms:
- High training accuracy but low test accuracy
- Complex and deep tree structure

Causes:
- Too many attributes
- Lack of pruning
- Small datasets

Pruning is used to reduce tree size and improve generalization.

Types of Pruning:
- Pre-Pruning: Stops tree growth early (e.g., max depth, min samples)
- Post-Pruning: Removes unnecessary branches after full tree is built
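A minimal sketch of both styles in scikit-learn: pre-pruning via growth limits, and post-pruning via cost-complexity pruning (ccp_alpha). The dataset and the parameter values are illustrative assumptions, not tuned settings.

```python
# Sketch: pre-pruning (growth limits) vs post-pruning (cost-complexity, ccp_alpha).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with a depth cap and a minimum leaf size.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10).fit(X_train, y_train)

# Post-pruning: grow the tree, then prune weak branches via cost-complexity pruning.
post = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)

for name, model in [("pre-pruned", pre), ("post-pruned", post)]:
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```

Comparing training and test scores as above is the usual way to check that pruning has reduced the gap that signals overfitting.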

Benefits:
- Reduces overfitting
- Improves prediction on unseen data
- Enhances interpretability

Goal: Build a model that balances complexity and accuracy.

5. Measures of Forecast Accuracy

Forecast accuracy metrics evaluate how close predictions are to actual values.

Common Metrics:
- MAE (Mean Absolute Error): Average of absolute errors
- MSE (Mean Squared Error): Average of squared errors
- RMSE (Root Mean Squared Error): Square root of MSE
- MAPE (Mean Absolute Percentage Error): Error as a percentage
- sMAPE (Symmetric MAPE): Balanced version of MAPE
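These metrics are straightforward to compute directly; a minimal NumPy sketch with made-up actual and forecast values follows.

```python
# Sketch: computing MAE, MSE, RMSE, MAPE and sMAPE with NumPy (toy values).
import numpy as np

actual   = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([ 98.0, 125.0, 95.0, 105.0])
err = actual - forecast

mae   = np.mean(np.abs(err))
mse   = np.mean(err ** 2)
rmse  = np.sqrt(mse)
mape  = np.mean(np.abs(err / actual)) * 100  # undefined if any actual value is zero
smape = np.mean(2 * np.abs(err) / (np.abs(actual) + np.abs(forecast))) * 100

print(mae, mse, rmse, mape, smape)
```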

Choosing the Right Metric:
- Use MAE for simple average error
- Use RMSE when large errors matter more
- Use MAPE for relative accuracy (avoid when actual values can be zero)

Applications:
- Retail: Sales forecasting
- Finance: Stock price prediction
- Healthcare: Patient count prediction

Lower metric values indicate higher accuracy.

6. STL Decomposition

STL (Seasonal and Trend decomposition using Loess) breaks a time series into three components:

1. Trend: Long-term progression
2. Seasonality: Repeating short-term cycles
3. Residual: Random noise

STL uses LOESS (locally estimated scatterplot smoothing, a local regression method) for smoothing and is highly flexible.

Advantages:
- Works with any seasonality type
- Robust to outliers
- Allows component-wise analysis

Steps:
1. Input time series
2. Apply smoothing to extract trend and seasonality
3. Subtract from original to get residual
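statsmodels provides an STL implementation that follows these steps; a minimal sketch is shown below, assuming a monthly series (period=12) built from synthetic data.

```python
# Sketch: STL decomposition of a synthetic monthly series with statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly data: trend + yearly seasonality + noise.
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (np.linspace(100, 160, 60)
          + 10 * np.sin(2 * np.pi * np.arange(60) / 12)
          + np.random.default_rng(0).normal(0, 2, 60))
series = pd.Series(values, index=idx)

result = STL(series, period=12, robust=True).fit()  # robust=True downweights outliers
print(result.trend.head())     # long-term progression
print(result.seasonal.head())  # repeating yearly cycle
print(result.resid.head())     # what is left: residual/noise
```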

Applications:
- Retail: Understand sales trends
- Finance: Analyze stock patterns
- Weather: Seasonal forecasting

STL is ideal for preprocessing time series before applying models like ARIMA.
