A project based on Python

The project focuses on time series forecasting of steel sales using Python and machine learning techniques, specifically XGBoost and Exponential Smoothing. It involves data preprocessing, model training, and evaluation, with XGBoost achieving the highest accuracy. The project aims to enhance business decision-making through effective sales predictions and plans for future improvements, including a user-friendly dashboard and integration of additional data sources.

Uploaded by

ceomessai1
A project based on Python

Time Series Forecasting of Steel Sales and Quantities

Python/ML Internship at RINL Steel Plant, Visakhapatnam

Duration: May 2024 - June 2024


Supervisor: K. Kameshwar Rao
(Deputy General Manager of IT and ERP)
Table of Contents

• Acknowledgement

• Introduction

• Abstract

• Code

• Output

• Conclusion
ACKNOWLEDGEMENT
We are very thankful to our project guide, Mr. T.V. KAMESHWARA RAO,
DGM (IT & ERP), from whom we received continuous support and
guidance throughout this period, which enabled us to complete
our project successfully. We are wholeheartedly thankful to him for
giving us his valuable time and attention and for showing us a
systematic way to complete our project on time. We are also very
thankful to Visakhapatnam Steel Plant, especially the IT & ERP and L&DC
departments, for giving us this opportunity.
1. Introduction
Objectives:
• To gather historical sales data and clean the dataset by handling
missing values and outliers and ensuring consistency.
• To understand the data characteristics and uncover underlying
patterns, trends, and seasonal effects.
• To enhance the dataset with additional features that could improve
the forecasting model's accuracy.
• To choose and implement appropriate time series forecasting models
to predict future sales.
• To implement models such as XGBoost and Exponential Smoothing and
compare them to select the model with the best accuracy.
• To train the selected models on historical data and validate their
performance on a validation set.
• To generate sales forecasts and evaluate their accuracy and
reliability.
• To deploy the forecasting model into a production environment where
it can be used for real-time sales prediction.
• To create visualizations and reports to communicate the forecast
results and insights to stakeholders.
• To assess the impact of the forecasting model on business operations
and decision-making.
• To ensure the forecasting model remains accurate and relevant over
time.
Problem Statement
Sales forecasting is a critical aspect of business operations, yet it
remains challenging due to fluctuating trends influenced by
multiple factors. Traditional methods often fail to capture
complex patterns, leading to inaccurate predictions and
inefficient resource utilization. This project aims to address these
issues using machine learning techniques.

Scope
The scope of this project includes:
• Analysing sales data for trends and patterns.
• Preprocessing data to ensure quality and consistency.
• Implementing and comparing machine learning models such as
XGBoost and Exponential Smoothing.
• Evaluating model performance using metrics such as Mean
Absolute Error (MAE), Mean Squared Error (MSE), Root Mean
Squared Error (RMSE), and R².
What is a Time Series?
A time series is a sequence of data points recorded or measured at
successive points in time, typically at uniform intervals. Examples
include daily stock prices, monthly sales data, yearly rainfall, and
quarterly GDP figures. Unlike other data types, time series data have a
natural temporal ordering, which is crucial for analysis and forecasting.

Key Components of Time Series


1. Trend: The long-term progression of the series. It represents the
general direction in which data points are moving over time. For
example, an upward trend in sales data indicates growing sales over
time.
2. Seasonality: Regular, periodic fluctuations in the time series data.
These are patterns that repeat at regular intervals due to seasonal
factors such as quarters of the year, months, or days of the week. For
instance, retail sales might peak during the holiday season.
3. Cyclic Patterns: Fluctuations in the time series data that are not of
a fixed period. These are often influenced by economic or business
cycles and can span several years.
4. Irregular (or Random) Component: The residual variations in the
time series data that cannot be attributed to trend, seasonality, or
cyclic patterns. These are random or unpredictable influences.
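The components above can be illustrated with a small synthetic series. The sketch below is only an illustration under stated assumptions: the numbers, the 12-month season length, and the function name `synthetic_series` are hypothetical and do not come from the project data, and the cyclic component is left out for brevity.

```python
import math
import random


def synthetic_series(n_months=36, seed=0):
    """Build a toy monthly series as trend + seasonality + noise.

    All values are hypothetical and only illustrate the components.
    """
    rng = random.Random(seed)
    # Trend: a steady rise of 2 units per month.
    trend = [100 + 2.0 * t for t in range(n_months)]
    # Seasonality: a pattern that repeats every 12 months.
    seasonal = [10 * math.sin(2 * math.pi * t / 12) for t in range(n_months)]
    # Irregular component: random noise with no structure.
    noise = [rng.gauss(0, 1) for _ in range(n_months)]
    series = [tr + s + e for tr, s, e in zip(trend, seasonal, noise)]
    return trend, seasonal, noise, series


trend, seasonal, noise, series = synthetic_series()
print(series[:3])
```

Because the series is built additively, subtracting the trend and seasonal parts from any observation leaves only the irregular component.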
Techniques in Time Series Analysis
1. Decomposition: Breaking down a time series into its trend,
seasonal, and irregular components. This helps in understanding
the individual effects.
2. Smoothing: Techniques like moving averages or exponential
smoothing are used to remove noise and highlight the underlying
patterns in the data.
3. Autoregressive (AR) Models: Models where future values depend
linearly on past values. AR models capture the relationship
between an observation and a number of lagged observations
(prior time steps).
4. Moving Average (MA) Models: Models where future values depend
linearly on past forecast errors. MA models use past forecast
errors in the prediction of future values.
5. Seasonal and Trend decomposition using Loess (STL): A method for
decomposing a time series into seasonal, trend, and remainder
components using local regression.
6. Machine Learning Models: Advanced techniques like Long
Short-Term Memory (LSTM) networks and other deep learning
models that can capture complex patterns in the time series
data.
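Two of the techniques listed above, moving-average smoothing and a first-order autoregressive model, can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the helper names `moving_average` and `fit_ar1` and the sample values are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def fit_ar1(values):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = values[:-1]  # lagged observations
    y = values[1:]   # observations one step later
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi


sales = [10.0, 12.0, 14.0, 13.0, 15.0, 17.0, 16.0, 18.0]  # hypothetical values
smoothed = moving_average(sales, window=3)
c, phi = fit_ar1(sales)
next_value = c + phi * sales[-1]  # one-step-ahead AR(1) forecast
print(smoothed, next_value)
```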

Applications of Time Series Analysis


1. Economic Forecasting: Predicting indicators like GDP,
unemployment rates, and inflation.
2. Stock Market Analysis: Forecasting stock prices and market
indices.
3. Sales Forecasting: Estimating future sales for
inventory management and planning.
4. Weather Prediction: Forecasting weather conditions such as
temperature and precipitation.
5. Energy Demand Forecasting: Predicting future energy
consumption for better resource management.
2. Abstract
This project uses two forecasting models, XGBoost and Exponential
Smoothing, to predict sales trends. Comprehensive data preprocessing
techniques, including data cleaning, transformation, handling of
missing values, and feature engineering, were employed to prepare the
dataset. The models were trained and evaluated using key performance
metrics, revealing that XGBoost outperformed Exponential Smoothing in
accuracy and robustness. Visualizations of actual vs. predicted sales
trends underscore the practical applicability of these methods in
enhancing business decision-making.
Libraries and Frameworks used
Flask
• Description: Flask is a lightweight web framework for Python. It
is designed to be easy to use and flexible, allowing developers to
create web applications and APIs quickly.
• Usage in Project: Used to develop a web interface for the sales
forecasting application, enabling users to interact with the
forecasting model through a browser.
XGBoost Regressor
• Description: XGBoost (Extreme Gradient Boosting) is an efficient
and scalable machine learning library for regression and
classification problems. It implements gradient boosting
algorithms for decision trees.
• Usage in Project: Applied for building a powerful sales
forecasting model, leveraging its ability to handle large datasets
and complex patterns.
Exponential Smoothing
• Description: Exponential Smoothing is a time series forecasting
method that applies weighted averages of past observations,
where the weights decrease exponentially over time.
• Usage in Project: Used to model and forecast sales data,
capturing trends and seasonality in a simple and effective
manner.
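The exponentially decreasing weights can be seen in the simplest form of the method. The sketch below covers simple exponential smoothing only (no trend or seasonal terms, unlike the Holt-Winters variant used later in this report); the function names and sample values are hypothetical.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the levels l[t] = alpha * y[t] + (1 - alpha) * l[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [values[0]]  # initialise the level with the first observation
    for y in values[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


def observation_weights(alpha, n):
    """Weight on the observation k steps back: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


sales = [20.0, 22.0, 21.0, 25.0]  # hypothetical values
levels = simple_exponential_smoothing(sales, alpha=0.5)
forecast = levels[-1]  # the one-step-ahead forecast is the last level
print(levels, forecast, observation_weights(0.5, 3))
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one smooths more heavily.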
Matplotlib
• Description: Matplotlib is a comprehensive library for creating
static, animated, and interactive visualizations in Python.
• Usage in Project: Utilized to create visualizations for data
exploration and to present forecast results, helping to understand
and communicate the findings.
NumPy
• Description: NumPy is a fundamental library for numerical
computing in Python. It provides support for arrays,
mathematical functions, and linear algebra operations.
• Usage in Project: Utilized for numerical computations and
handling array operations, which are essential for data
manipulation and model implementation.
Pandas
• Description: Pandas is an open-source data manipulation and
analysis library. It provides data structures like DataFrames,
which are essential for handling structured data.
• Usage in Project: Used for data cleaning, preparation, and
manipulation. Pandas is crucial for handling the time series data
efficiently.
Joblib
• Description: Joblib is a library for efficiently serializing and
deserializing Python objects. It is particularly useful for saving
and loading machine learning models.
• Usage in Project: Employed to save the trained forecasting
models, enabling them to be loaded and used without retraining.
Seaborn
• Description: Seaborn is a statistical data visualization library
based on Matplotlib. It provides a high-level interface for drawing
attractive and informative statistical graphics.
• Usage in Project: Used to create advanced visualizations and
statistical plots, complementing Matplotlib by offering more
aesthetically pleasing and informative graphics.
3. Code
Data Preprocessing:
Data preprocessing ensures the dataset is clean, consistent, and
suitable for analysis and model training.
It includes:
• Data Cleaning
• Data Transformation
• Handling Missing Values
• Handling Outliers
• Feature Engineering

Data Cleaning:
• Address missing values using techniques like mean imputation or
forward fill.
• Remove duplicates and handle outliers effectively.

# forward fill; fillna(method='ffill') is deprecated in recent pandas versions
data.ffill(inplace=True)
data.drop_duplicates(inplace=True)
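To make the effect of these two steps concrete, here is a small self-contained sketch on toy data. The column names `date` and `sales` and all values are assumptions for illustration, not the project's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one duplicated row.
data = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04"]
        ),
        "sales": [100.0, np.nan, 130.0, 130.0, 120.0],
    }
)

data = data.sort_values("date")        # forward fill only makes sense in time order
data = data.drop_duplicates()          # drops the repeated 2024-01-03 row
data["sales"] = data["sales"].ffill()  # carry the last known value forward
data = data.reset_index(drop=True)
print(data)
```

Sorting by date first matters here: forward fill copies whatever value precedes the gap, so the rows need to be in time order.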

Data Transformation:

• Normalize or standardize features to ensure uniform scaling.


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
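Standardization replaces each value x with (x - mean) / std. The plain-Python sketch below shows the arithmetic and one common precaution: computing the mean and standard deviation on the training portion only, then reusing them on later data. The helper names and numbers are hypothetical.

```python
import statistics


def fit_scaler(train_values):
    """Return the mean and population standard deviation of the training data."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)  # population std (divides by n)
    return mean, std


def transform(values, mean, std):
    """Standardize values using previously fitted statistics."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical training values
test = [5.0, 11.0]                                 # hypothetical later values

mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # reuse training statistics
print(mean, std, test_scaled)
```

Fitting the scaler on the full dataset would let information from the test period leak into training, which can make evaluation results look better than they are.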
Handling Missing Values:

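• Fill missing data points using interpolation, mean imputation, or
domain-specific logic.
# assign the result back; fillna(..., inplace=True) on a selected column is unreliable in recent pandas
data['column'] = data['column'].fillna(data['column'].mean())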

Handling Outliers:
• Use statistical methods like the IQR or z-scores to identify and
manage outliers.
import numpy as np
from scipy.stats import zscore
# keep rows within 3 standard deviations on either side of the mean
data = data[np.abs(zscore(data['column'])) < 3]
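The IQR rule mentioned above can be sketched with the standard library alone. This is an illustrative example with hypothetical values; the multiplier `k=1.5` is the conventional choice, not a value taken from the project.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


sales = [10, 12, 11, 13, 12, 11, 95]  # hypothetical values; 95 is an outlier
lower, upper = iqr_bounds(sales)
kept = [v for v in sales if lower <= v <= upper]
print(lower, upper, kept)
```

Unlike z-scores, the IQR fences are based on quartiles, so a single extreme value does not shift the thresholds much.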

Feature Engineering:
• Create new features to enhance model performance, such as
lag variables or rolling averages.
data['rolling_avg'] = data['sales'].rolling(window=3).mean()
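Lag variables can be built in the same way with `shift`. The sketch below is an assumption-laden illustration: the column names and values are hypothetical, and the rolling average is computed on the shifted series so that each row only uses sales from earlier periods, which is one way to avoid leaking the current value into its own feature.

```python
import pandas as pd

# Hypothetical sales values in time order.
data = pd.DataFrame({"sales": [10.0, 12.0, 14.0, 13.0, 15.0, 17.0]})

data["lag_1"] = data["sales"].shift(1)  # sales one period earlier
data["lag_2"] = data["sales"].shift(2)  # sales two periods earlier
# Average of the three previous periods, excluding the current one.
data["rolling_avg_prev3"] = data["sales"].shift(1).rolling(window=3).mean()

# The first rows have no history, so they are dropped before training.
features = data.dropna().reset_index(drop=True)
print(features)
```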

Data Splitting:
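• Divide the dataset into training and testing sets for unbiased
model evaluation. For time series data, keep the rows in
chronological order so that the test set contains only the most
recent observations.
from sklearn.model_selection import train_test_split
# shuffle=False keeps the time order and avoids training on future data
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, shuffle=False)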
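Because sales records are ordered in time, one option is to hold out the most recent observations rather than a random sample. The plain-Python sketch below shows that idea; the function name `chronological_split` and the sample data are hypothetical.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so the test set holds the latest observations."""
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


monthly_sales = list(range(1, 11))  # hypothetical values, oldest first
train, test = chronological_split(monthly_sales, test_fraction=0.2)
print(train, test)
```

Every training row then precedes every test row, which mirrors how the model would be used in practice: fitted on the past and asked about the future.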
Model Implementation
XGBoost:
from xgboost import XGBRegressor
model = XGBRegressor()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Exponential Smoothing:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
model = ExponentialSmoothing(data, trend="add",
seasonal="add", seasonal_periods=12)
model_fit = model.fit()
predictions = model_fit.forecast(10)
4. Output
Key Results
XGBoost:
Achieved the lowest MAE, MSE, and RMSE, with the highest R² among
the models.
• Mean Squared Error: 0.0006829916551204035
• Root Mean Squared Error: 0.02613410903628443
• R-squared: 0.9502872500313969
• The R² value of about 0.95 indicates that the model explains roughly
95% of the variance in the target, reflecting strong accuracy.

Key Terms:
• Mean Absolute Error (MAE): The average of the absolute
difference between the predicted and actual values. MAE is a
simple metric that is less sensitive to outliers than MSE. A lower
MAE means better predictions.
• Mean Squared Error (MSE): The average of the squared
difference between the predicted and actual values. MSE is more
sensitive to outliers than MAE and penalizes larger errors more.
• Root Mean Squared Error (RMSE): The square root of the average
of the squared difference between the predicted and actual
values. RMSE is expressed in the same units as the target, which
makes it easy to interpret.
• R-Squared: A statistical measure in a regression model that
gives the proportion of variance in the dependent variable
that can be explained by the independent variables.
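The four metrics can be computed directly from their definitions. The sketch below uses plain Python and hypothetical values; the function name `regression_metrics` is an illustration and not part of the project code.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R-squared from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


actual = [3.0, 5.0, 7.0, 9.0]     # hypothetical observed values
predicted = [2.0, 5.0, 8.0, 9.0]  # hypothetical model outputs
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Note that RMSE is always the square root of MSE, which gives a quick consistency check on reported results.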
Graphs:
XGBoost:
• Sales Value - Customer
• Sales Quantity - Customer

Exponential Smoothing:
• Sales Value - Customer
• Sales Quantity - Customer
5. Conclusion
Conclusion Summary:
The project successfully demonstrates the application of machine
learning for sales prediction. XGBoost proved to be the most effective
model due to its ability to handle large datasets and complex patterns.
Future Work:
• Integrate additional data sources for enhanced accuracy.
• Develop a user-friendly dashboard for real-time sales forecasting.
• Explore deep learning models for further improvements.
