Demand Prediction of Consumer Intention To Buy Edible Items Using Machine Learning Techniques
Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE SAO PAULO. Downloaded on March 18,2024 at 23:15:16 UTC from IEEE Xplore. Restrictions apply.
cross-validation technique has been done on alternative sets. Finally, the %error of the values has been computed to evaluate the performance of the models. Various ML models have been used to predict the demand for okra and tomato. These algorithms are used to train the model to forecast the most precise prediction outcome. The proposed methodology is shown in Fig. 2.

Fig. 1. Cross-validation technique

The algorithms used in the proposed work are as follows:

Linear regression: Linear regression is a statistical method used for predictive analysis [13][14]. It can be described as a single-layer perceptron involving input and output variables. It models a linear relationship, which indicates how the value of the dependent variable changes with the values of the independent variables.

Decision tree regressor: The decision tree regressor is a supervised learning algorithm whose primary aim is to forecast the target variable by building a training model [15]. It is a tree-based technique in which each path originating from the root is represented by a data-separating sequence until a boolean value is reached at the leaf node [15][16]. The decision tree is also considered optimized when it classifies the data set properly with the minimum number of nodes.

Random forest regressor: The random forest regressor is a widely used supervised learning algorithm [17]. It can solve both classification and regression problems in ML. It is built on the concept of ensemble learning, which solves complex problems by combining multiple classifiers to improve the performance of the model. In short, a random forest is a classifier that fits numerous decision trees on various subsets of a given dataset and takes the mean to improve the predictive accuracy of the model. Simply put, a random forest obtains a prediction from each tree and, based on the majority vote of those predictions, produces the final output [18][19].
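The tree-based regressors described above can be illustrated with a minimal, self-contained sketch. It is not the implementation used in this work: the function names (`fit_tree`, `predict_tree`, `fit_forest`, `predict_forest`), the single input feature, and the toy demand values are all assumptions made for illustration. The forest here averages the outputs of trees fitted on bootstrap samples, which is the usual choice for regression (majority voting applies to classification); a full random forest would also subsample the features at each split, which has no effect with one feature.

```python
import random
from statistics import mean


def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a 1-D regression tree. A leaf is a number; a node is (threshold, left, right)."""
    if depth == 0 or len(xs) < 2 * min_leaf or len(set(xs)) == 1:
        return mean(ys)
    best = None  # (sum of squared errors, threshold)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    if best is None:  # no split satisfies min_leaf
        return mean(ys)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (
        t,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1, min_leaf),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1, min_leaf),
    )


def predict_tree(tree, x):
    """Follow the path from the root until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression forest output: the mean of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)


# Hypothetical toy data: daily demand that jumps from 50 to 80 units on day 10.
days = list(range(20))
demand = [50.0] * 10 + [80.0] * 10

tree = fit_tree(days, demand)
forest = fit_forest(days, demand)
print(predict_tree(tree, 3), predict_tree(tree, 15))
print(predict_forest(forest, 3), predict_forest(forest, 15))
```

On this step-shaped toy series a single tree recovers both demand levels exactly, while the forest prediction is a mean of leaf values and therefore always lies between the smallest and largest observed demand.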
Logistic regression: It is also a supervised learning technique, used to predict a categorical dependent variable from a given set of independent variables [20][21]. It is used in various predictive models such as healthcare, stock movement, crop yield and price prediction [22][23][24]. The output of this model must be a categorical or discrete value in boolean form: 0 or 1, true or false, Yes or No. It gives probability values between 0 and 1 instead of the exact values 0 and 1. This type of model is similar to linear regression; however, linear regression is best known for solving regression problems, while logistic regression is used for solving classification problems.

IV. IMPLEMENTATION RESULTS

This section presents the prediction performance of the various ML algorithms used in this paper, illustrated in the graphs below.

In the evaluation phase, various algorithms have been used to forecast the demand for fruits and vegetables, and each algorithm produced different results. The proposed model addresses a regression problem; hence, regression-based parameters have been used to assess the performance of the proposed work, namely accuracy and %error.

prediction and actual attribute shows that the execution of the model is good and the predicted values are very close to the original values.

Fig. 5 depicts the relationship between the actual and predicted values of the demand for okra and tomato. The graph shows that the random forest has performed quite well, but some predicted values are too large, which directly affects the accuracy of the model.

Fig. 5. Predicted vs original values using random forest

Fig. 6 shows the prediction performance of the logistic regression model in predicting the demand for okra and tomato. The graph shows a very large difference between the actual demand and the predicted demand, which indicates that this model is not suitable for prediction with this kind of dataset.
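The two evaluation parameters named in this section, accuracy and %error, are not defined formally in the text. The sketch below shows one plausible reading, offered only as an assumption: %error is taken as the mean absolute percentage error between actual and predicted demand, and accuracy as 100 minus that value. This reading is consistent with Table II, where the accuracy and %error columns sum to roughly 100 in every row, but the authors' exact formula may differ. The function names and the sample values are hypothetical.

```python
def percent_error(actual, predicted):
    """Mean absolute percentage error, in percent (assumed definition of %error).

    Raises ZeroDivisionError if any actual value is 0.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy(actual, predicted):
    """Accuracy in percent, assumed here to be 100 minus the %error."""
    return 100.0 - percent_error(actual, predicted)


# Hypothetical daily demand values (units sold), not data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"%error   = {percent_error(actual, predicted):.2f}")
print(f"accuracy = {accuracy(actual, predicted):.2f}")
```

For these sample values the individual errors are 10%, 5% and 0%, so the script prints a %error of 5.00 and an accuracy of 95.00.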
Fig. 7. Accuracy comparison of various machine learning algorithms

Fig. 8. %error comparison of various machine learning algorithms

The performance results of logistic regression, linear regression, random forest and decision tree are tabulated in Table II as %error and accuracy.

TABLE II. PERFORMANCE COMPARISON OF VARIOUS TECHNIQUES PROPOSED IN THE WORK WITH EXISTING WORK

Technique              Accuracy    %error
Linear Regression      95.9%       4.09%
Decision Tree          99.62%      0.37%
Random Forest          98.61%      1.38%
Logistic Regression    85.62%      14.37%
Naive Bayes[11]        87%         13%
KNN[11]                85%         15%

Table II presents the performance comparison of the proposed work and existing work. This comparison shows that the proposed decision tree model is the best-performing model, with the highest accuracy of 99.62%, while the other proposed models, namely linear regression and random forest, also outperform the existing models. Furthermore, logistic regression performs slightly better than KNN[11] but underperforms when compared to Naive Bayes[11].

V. CONCLUSION

The agriculture sector is extremely important to the country's economy. In the past few years, several advancements in the realm of agriculture have been put into practice. With the advent of AI and ML, various issues faced by farmers have been resolved. However, another issue causes a big loss to farmers: the uncertain demand for crops and vegetables required the next day. Various companies buy crops and vegetables from farmers and sell them directly to consumers. The problem is that demand is unpredictable, which leads to a huge amount of rotten vegetable and fruit waste each day. These companies buy only limited stock. However, this causes problems for the farmers, as the remaining vegetables and fruit rot and go to waste. The objective of this paper is to describe modern ML techniques in the commercial agricultural sector that reduce the wastage of fruits and vegetables by predicting the optimum demand for them. In this paper, we address a real-time problem by predicting the demand for a particular product using historical data. ML plays a vital role in the prediction of the demand for vegetables and fruit. Visualisation techniques, such as graphs and histograms, are also used for a better understanding of the data and make it easier to assess the performance of the algorithms. Four different ML approaches, namely random forest, decision tree, logistic regression and linear regression, have been used for prediction based on previous data. In this paper, the decision tree has been identified as the best method, with the highest accuracy of 99.62% for predicting the demand for crops on the upcoming day, while random forest, linear regression and logistic regression achieved accuracies of 98.61%, 95.90% and 85.62%, respectively. This article can help various eCommerce companies predict vegetable and fruit demand each day. Furthermore, this work is also helpful for academia to understand the applications of the various algorithms used in the paper.

REFERENCES

[1] R. Dhanapal, A. AjanRaj, S. Balavinayagapragathish, and J. Balaji, “Crop price prediction using supervised machine learning algorithms,” J. Phys. Conf. Ser., vol. 1916, no. 1, p. 012042, 2021.
[2] V. Nathgosavi, “A survey on crop yield prediction using machine learning,” Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 13, pp. 2343–2347, 2021.
[3] N. Bali and A. Singla, “Emerging trends in machine learning to predict crop yield and study its influential factors: A survey,” Arch. Comput. Methods Eng., vol. 29, no. 1, pp. 95–112, 2022.
[4] M. Rakhra et al., “Crop price prediction using random forest and decision tree regression:-A review,” Mater. Today, 2021.
[5] S. Sharma and K. Guleria, “Deep learning models for image classification: Comparison and applications,” in 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), 2022.
[6] R. Sharma and V. Kukreja, “Mustard Downy Mildew Disease Severity Detection using Deep Learning Model,” in 2021 International Conference on Decision Aid Sciences and Application (DASA), pp. 466-470, 2021.
[7] S. Sharma and K. Guleria, “A systematic literature review on deep learning approaches for pneumonia detection using chest X-ray images,” Multimed. Tools Appl., 2023.
[8] R. Sharma and V. Kukreja, “Amalgamated convolutional long term network (CLTN) model for Lemon Citrus Canker Disease Multi-classification,” in 2022 International Conference on Decision Aid Sciences and Applications (DASA), 2022.
[9] M. Iatrou et al., “Topdressing nitrogen demand prediction in rice crop using machine learning systems,” Agriculture, vol. 11, no. 4, p. 312, 2021.
[10] M. Rakhra, A. Bhargava, D. Bhargava, R. Singh, A. Bhanot, and A.
W. Rahmani, “Implementing machine learning for supply-demand
shifts and price impacts in farmer market for tool and equipment
sharing,” Journal of Food Quality, 2022.
[11] M. Rehman et al., “Semantics analysis of agricultural experts’
opinions for crop productivity through machine learning,” Appl.
Artif. Intell., vol. 36, no. 1, pp. 1–16, 2022.
[12] B. P. Bv and M. Dakshayini, “Computational Performance Analysis
of Neural Network and Regression Models in Forecasting the Societal
Demand for Agricultural Food Harvests,” in Research Anthology on
Artificial Neural Network Applications, IGI Global, 2022, pp. 1287–
1300.
[13] W. Wei and X. Yang, “Comparison of diagnosis accuracy between a
backpropagation artificial neural network model and linear regression
in digestive disease patients: An empirical research,” Comput. Math.
Methods Med., vol. 2021, p. 6662779, 2021.
[14] T. K. Saha, S. Pal, and R. Sarkar, “Prediction of wetland area and
depth using linear regression model and artificial neural network
based cellular automata,” Ecol. Inform., vol. 62, no. 101272, p.
101272, 2021.
[15] B. Charbuty and A. Abdulazeez, “Classification based on decision
tree algorithm for machine learning,” Journal of Applied Science and
Technology Trends, vol. 2, no. 01, pp. 20–28, 2021.
[16] C. S. Lee and P. Y. S. Cheang, “Predictive analysis in business
analytics: Application of decision tree in business decision making,”
Adv. Decis. Sci., vol. 26, no. 1, pp. 1–29, 2021.
[17] M. A. Khan et al., “Application of random forest for modelling of
surface water salinity,” Ain Shams Engineering Journal, vol. 13, no.
4, 2022.
[18] V. K. Gupta, A. Gupta, D. Kumar, and A. Sardana, “Prediction of
COVID-19 confirmed, death, and cured cases in India using random
forest model,” Big Data Min. Anal., vol. 4, no. 2, pp. 116–123, 2021.
[19] K. Guleria, S. Sharma, S. Kumar, and S. Tiwari, “Early prediction of
hypothyroidism and multiclass classification using predictive
machine learning and deep learning,” Measurement: Sensors, vol. 24,
no. 100482, p. 100482, 2022.
[20] P. Schober and T. R. Vetter, “Logistic regression in medical
research,” Anesth. Analg., vol. 132, no. 2, pp. 365–366, 2021.
[21] X. Song, X. Liu, F. Liu, and C. Wang, “Comparison of machine
learning and logistic regression models in predicting acute kidney
injury: A systematic review and meta-analysis,” Int. J. Med. Inform.,
vol. 151, no. 104484, p. 104484, 2021.
[22] P. K. Sarangi, K. Guleria, D. Prasad, and D. K. Verma, “Stock
movement prediction using neuro genetic hybrid approach and impact
on growth trend due to COVID-19,” Int. j. netw. virtual organ., vol.
25, no. 3/4, p. 333, 2021.
[23] S. Sharma, K. Guleria, S. Tiwari, and S. Kumar, “A deep learning
based convolutional neural network model with VGG16 feature
extractor for the detection of Alzheimer Disease using MRI scans,”
Measurement: Sensors, vol. 24, no. 100506, p. 100506, 2022.
[24] S. Srivastav, K. Guleria, and S. Sharma, “Tea Leaf Disease Detection
Using Deep Learning-based Convolutional Neural Networks,” in 2023
IEEE World Conference on Applied Intelligence and Computing
(AIC), Sonbhadra, India, 2023, pp. 569-574, doi:
10.1109/AIC57670.2023.10263835.