Sales Forecasting Using Machine
Learning
Class Presentation by [Your Name]
1. Problem Understanding
• Sales forecasting is the process of estimating
future sales based on historical data.
• Why is it important?
• - Helps businesses plan inventory
• - Optimizes resources and logistics
• - Improves financial planning and decision
making
2. Data Collection
• The data used in sales forecasting typically
includes:
• - **Date**: Timestamps marking sales at a daily, weekly, or
monthly granularity
• - **Sales numbers**: Historical sales revenue
or units sold
• - **Features**: Promotions, holidays,
competitor prices, weather, etc.
• Example datasets are available on Kaggle.
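• As a rough idea of what such a dataset looks like once loaded (the file name and column names below are placeholders, not from a specific dataset):

```python
import pandas as pd

# Load the raw sales history; "sales.csv" and its columns are assumed names
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Typical layout: one row per period with the target and a few extra features
print(df.head())
```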
3. Data Preprocessing
• Data preprocessing is crucial before feeding
the data into machine learning models.
• Key steps include:
• - **Handling missing values**: Remove or fill
missing data
• - **Feature engineering**: Extract features
like 'Month', 'Day of the Week' from the 'Date'
• - **Handling outliers**: Correct or remove
extreme data points that may skew results
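• A minimal preprocessing sketch in Python/pandas, assuming a DataFrame with 'date' and 'sales' columns as in the loading step:

```python
# Handling missing values: forward-fill gaps in the sales series
df["sales"] = df["sales"].ffill()

# Feature engineering: derive calendar features from the 'date' column
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek

# Handling outliers: clip extreme values to the 1st-99th percentile range
lower, upper = df["sales"].quantile([0.01, 0.99])
df["sales"] = df["sales"].clip(lower, upper)
```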
4. Feature Engineering
• Feature engineering helps create meaningful
features from raw data to improve model
performance.
• Example Features:
• - **Date-based features**: Extract month,
week, day, and seasonality from the 'Date'
column
• - **Lag features**: Create previous day, week,
or month sales as input features
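• A short sketch of lag features, continuing from the preprocessed DataFrame above (the new column names are assumptions for illustration):

```python
# Sort chronologically so shifted values refer only to the past
df = df.sort_values("date")

# Lag features: yesterday's and last week's sales as inputs
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_7"] = df["sales"].shift(7)

# Rolling mean of the previous 7 days as a smoothed trend signal
df["sales_roll_7"] = df["sales"].shift(1).rolling(7).mean()

# Drop the first rows, where the lags are undefined
df = df.dropna()
```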
5. Model Selection
• Various machine learning models can be used
for sales forecasting:
• - **Linear Regression**: Suitable for simple,
linear relationships
• - **Decision Trees & Random Forest**:
Capable of capturing non-linear relationships
• - **XGBoost**: An advanced gradient
boosting algorithm, highly effective for
structured data
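• As a rough comparison of these candidates (a sketch only; the feature columns are the ones assumed in the earlier steps):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Feature matrix and target; column names follow the earlier sketches
X = df[["month", "day_of_week", "sales_lag_1", "sales_lag_7"]]
y = df["sales"]

# No shuffling: keep the most recent data as the test period
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Fit each candidate on the same split and compare R^2 on the test period
for model in (LinearRegression(), RandomForestRegressor(), XGBRegressor()):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(model.score(X_test, y_test), 3))
```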
6. Example: XGBoost Model
• Why XGBoost?
• - It handles missing data and works well with
large datasets.
• - It captures complex non-linear patterns and
interactions between features.
• Steps for training the model (sketched in code below):
• 1. Initialize the XGBoost model
• `model = XGBRegressor()`
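• Expanded into a runnable sketch, reusing the train/test split from the model comparison above; the hyperparameter values are illustrative, not tuned:

```python
from xgboost import XGBRegressor

# 1. Initialize the XGBoost model (hyperparameter values are illustrative)
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)

# 2. Train on the historical (training) portion of the data
model.fit(X_train, y_train)

# 3. Forecast sales for the held-out period
y_pred = model.predict(X_test)
```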
7. Model Evaluation
• To assess the performance of the model, we
use the following metrics:
• - **Root Mean Squared Error (RMSE)**:
Measures the square root of the average
squared differences between predicted and
actual values
• - **Mean Absolute Error (MAE)**: Measures
the average magnitude of the errors without
considering their direction
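• Both metrics are available in scikit-learn; a minimal sketch using the predictions from the previous slide:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# RMSE: square root of the average squared prediction error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# MAE: average absolute prediction error, in the same units as sales
mae = mean_absolute_error(y_test, y_pred)

print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}")
```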
8. Hyperparameter Tuning
• Hyperparameter tuning optimizes the model’s
performance by finding the best combination
of parameters.
• Common hyperparameters in XGBoost:
• - **n_estimators**: The number of trees to
build
• - **learning_rate**: Shrinkage applied to each new
tree's contribution (smaller values need more trees)
• - **max_depth**: Maximum depth of each tree
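• One way to search over these parameters is scikit-learn's GridSearchCV with a time-series split; the value ranges below are illustrative, not recommendations:

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

# Candidate values for the hyperparameters listed above (illustrative only)
param_grid = {
    "n_estimators": [200, 500],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 6],
}

# TimeSeriesSplit keeps each validation fold after its training fold,
# which avoids leaking future data into the past
search = GridSearchCV(
    XGBRegressor(),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
print(search.best_params_)
```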
9. Conclusion and Future Work
• In this project, we successfully built a machine
learning model to forecast sales using
historical data.
• Key Takeaways:
• - Data preprocessing and feature engineering
are critical to model success.
• - XGBoost is a powerful tool for handling
tabular data and providing accurate
predictions.