E-Commerce Data Analysis and Visualization Using Random Forest Regression and Prophet Model
4 authors, including:
K. M. Zubair
International Islamic University Malaysia
All content following this page was uploaded by K. M. Zubair on 05 December 2022.
Abstract— The motivation behind this study is to better understand the e-commerce sector by
examining public datasets. In our case, we work with Pakistan’s Largest E-commerce dataset, which
we obtained from Kaggle. We address problems such as finding the best-selling products in the store
in order to develop a better business model, identifying the most popular categories in terms of
sales, measuring demand for products whose sales are rising rapidly, finding the products that are
cancelled most often, and predicting the coming year’s sales in order to analyze the sales growth
rate over different timespans. We use Apache Spark, an open-source cluster-computing technology,
as well as Tableau, both of which are central to the tasks at hand. This research could be extended
to reviews of purchased items on e-commerce sites in order to perform sentiment analysis.
Keywords: E-commerce, Sales Analysis, Upcoming Sales Prediction, fbProphet model, RandomForestRegressor
a) Correlation Test
B. Train Dataset
B. Model Testing
We used two models for prediction in our project: the
Random Forest Regression model and the Prophet model.
Both models were evaluated in terms of their r2, mean
squared error (MSE), and mean absolute error (MAE).
Based on this comparison, we preferred the Random Forest
method to forecast future sales.
Figure 8: Customers with total cancelled orders
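The three scores used for this comparison can be computed directly from their definitions. The
sketch below is illustrative only: the function name `regression_scores` and all numbers are
made up for the example and are not the results reported in this paper.

```python
import numpy as np

def regression_scores(y_true, y_pred):
    """Return r2, MSE and MAE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))       # mean squared error
    mae = float(np.mean(np.abs(errors)))    # mean absolute error
    # r2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mse": mse, "mae": mae}

# Made-up monthly sales and two made-up sets of predictions.
actual = [100.0, 120.0, 140.0, 160.0]
model_a = [110.0, 115.0, 135.0, 165.0]   # tracks the actual values closely
model_b = [130.0, 130.0, 130.0, 130.0]   # always predicts the mean

scores_a = regression_scores(actual, model_a)
scores_b = regression_scores(actual, model_b)
print(scores_a)
print(scores_b)
```

A higher r2 together with lower MSE and MAE indicates the better fit; a model that always predicts
the mean, like `model_b` here, has an r2 of 0. The same scores are also available in
`sklearn.metrics` as `r2_score`, `mean_squared_error` and `mean_absolute_error`.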
a) Descriptive Analysis:
The research question for descriptive analysis was,
“Which category is the most popular in terms of
sales?”
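A question of this kind reduces to a group-by and a sort. The sketch below uses pandas rather
than Apache Spark to stay short, and the column names `category` and `qty_ordered` as well as
the rows are hypothetical stand-ins, not the actual schema or contents of the dataset.

```python
import pandas as pd

# Toy order lines; column names and values are assumptions for illustration.
orders = pd.DataFrame({
    "category": ["Mobiles", "Fashion", "Mobiles", "Appliances", "Fashion", "Mobiles"],
    "qty_ordered": [2, 1, 3, 1, 4, 1],
})

# Total units sold per category, largest first.
units_by_category = (
    orders.groupby("category")["qty_ordered"]
    .sum()
    .sort_values(ascending=False)
)
top_category = units_by_category.index[0]
print(units_by_category)
print("Most popular category:", top_category)
```

Popularity is measured here by units sold; ranking by revenue instead would only require summing
a price column in place of the quantity column.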
b) Diagnostic Analysis:
For the diagnostic analysis, the research
question we selected was, “How did sales perform
over the period from July 2016 to
July 2018?”
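One way to look at sales over this period is to restrict the orders to the time window and total
them per calendar month. This is a minimal pandas sketch; the column names `created_at` and
`grand_total` and all values are assumptions made for the example.

```python
import pandas as pd

# Toy orders; column names and values are assumptions for illustration.
orders = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2016-07-03", "2016-07-20", "2016-08-05",
        "2017-01-15", "2018-07-30", "2018-09-01",
    ]),
    "grand_total": [100.0, 50.0, 75.0, 200.0, 120.0, 999.0],
})

# Keep only orders from July 2016 through July 2018 (end bound exclusive).
start = pd.Timestamp("2016-07-01")
end = pd.Timestamp("2018-08-01")
in_window = orders[(orders["created_at"] >= start) & (orders["created_at"] < end)]

# Total sales per calendar month.
month = in_window["created_at"].dt.to_period("M")
monthly_sales = in_window.groupby(month)["grand_total"].sum()
print(monthly_sales)
```

The resulting monthly series is the natural input for a line chart of sales over time, and months
with no orders simply do not appear in it.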
c) Causality Analysis:
For causality analysis, the research question
was, “Is item cancellation affected by time
or by customer?” We set out to determine whether
item cancellation depends on a particular time of year
or on some other factor, such as the customer.
Figure 14: Items not cancelled, by time
e) Predictive Analysis:
We made predictions about next year's sales
based on this research question: “What will
sales look like next year?” We used two models for the
prediction: the Random Forest Regression model
and the Prophet model.
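As an illustration of how a Random Forest regressor can be applied to a sales series, the sketch
below turns a monthly series into lagged features and forecasts twelve months ahead one step at a
time. It is not the pipeline used in this study: the synthetic data, the helper
`make_lag_features`, the choice of three lags and the six-month hold-out are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly sales with a trend and a yearly cycle (made-up data).
months = np.arange(36)
sales = 100 + 2.0 * months + 10 * np.sin(2 * np.pi * months / 12)

def make_lag_features(series, n_lags):
    """Each row holds the previous n_lags values; the target is the next value."""
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y

n_lags = 3
X, y = make_lag_features(sales, n_lags)

# Chronological split: the last 6 months are held out for testing.
X_train, X_test = X[:-6], X[-6:]
y_train, y_test = y[:-6], y[-6:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
test_pred = model.predict(X_test)

# Refit on all data, then forecast 12 months by feeding predictions back in.
model.fit(X, y)
history = list(sales)
forecast = []
for _ in range(12):
    features = np.array(history[-n_lags:]).reshape(1, -1)
    next_value = float(model.predict(features)[0])
    forecast.append(next_value)
    history.append(next_value)
print(forecast)
```

The split is chronological rather than random so that the model is never tested on months earlier
than those it was trained on. A tree-based model cannot predict values outside the range of its
training targets, so on a strongly trending series its forecasts level off.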
Figure 18: r2, MSE, MAE scores using RandomForestRegressor
From the scores (r2, MSE, MAE) stated above, it can be
concluded that the Random Forest regression model
performed better in this case in terms of accuracy and
prediction quality.
VII. CONCLUSIONS
VIII. ACKNOWLEDGMENT
This project was conducted as part of the Big Data
Analytics (CSC 3303) course offered by the Department of
Computer Science at the International Islamic University
Malaysia in Kuala Lumpur. The authors of the study would
like to express their gratitude to Dr. Sharyar Wani for his
assistance in making this work successful.