Crop Yield Prediction Paper
1 ABSTRACT
Agriculture has long been the foundation of India and contributes significantly to its economy, since
the whole population depends on it for sustenance. However, farmers in India face a number of problems
caused by weather variations. Crop yield prediction based on data from the previous 20 years can
therefore assist farmers in selecting which crop to grow.
Numerous factors, including soil, temperature, precipitation, fertilizers, and pesticides, affect the
quality and quantity of crops. We forecast crop yield using a variety of machine learning algorithms:
Linear Regression, KNeighbors Regressor, Gradient Boosting Regressor, Random Forest Regressor, and a
stacked model. This research uses rainfall, pesticides, area, fertilizers, and other factors to predict
agricultural yields, helping farmers choose crops appropriate for their location and time of year. The
models are compared on accuracy, R2 score, mean squared error, and mean absolute error; additionally,
a hybrid model is created to provide a better prediction.
2 INTRODUCTION
Various crops are cultivated throughout India, and agriculture contributes about 17 percent of the
overall Indian economy. Agriculture is considered the foundation of the nation: 70 percent of the
country’s population is engaged in it, and it provides raw materials for industry. Ending poverty and
ensuring food security by 2030 are two significant UN targets.
Farmers face numerous difficulties and losses due to climate change, which affects crops in multiple
ways. According to NCRB data for 2022, about 6083 deaths, including five suicides among agricultural
laborers, were recorded, of which 611 were females and 5472 were males. Such deaths are unfortunately
common in India and are often attributed to financial losses.
To help farmers determine which crop will be most profitable, crop yield predictions are essential.
With 20 years of historical data, we can forecast crop yields. The variables used for prediction
include season, state, area, production, annual rainfall, fertilizer, and pesticides. Various machine
learning models are implemented, with 80 percent of the data used for training and 20 percent for
testing. The best model for crop yield prediction is the one that achieves the highest accuracy on the
test data.
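The 80/20 split described above can be sketched as follows. This is a minimal illustration using only the
Python standard library; the helper name train_test_split_rows, the seed, and the example records are
assumptions made for this sketch and are not taken from the paper.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (annual_rainfall, fertilizer, pesticide, yield)
records = [(1000 + i, 50 + i, 5 + i, 2.0 + 0.1 * i) for i in range(10)]
train_rows, test_rows = train_test_split_rows(records)
print(len(train_rows), len(test_rows))  # prints: 8 2
```

Fixing the seed keeps the split reproducible, so that every model is trained and tested on the same rows.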
Models are compared using metrics such as accuracy, mean squared error, and mean absolute error. Many
technologies are now available to improve agricultural efficiency, and several models employing
different machine learning algorithms can forecast crop yield from the given data.
By applying crop yield prediction, farmers can increase production in favorable conditions and
reduce production loss in unfavorable climates. Regression analysis, a popular method for crop
prediction, can be done using techniques such as decision tree, random forest, gradient boosting, and
k-nearest neighbors, which provide accurate results when given pertinent data.
In this paper, various regressors are employed: linear regression, decision tree, K neighbors, gradient
boosting, random forest, and a stacked model. Hybridizing the gradient boosting and random forest
regressors in a stacking regressor gives our hybrid model the best accuracy, 97.9 percent, while the
random forest alone reaches 97.7 percent.
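A stacked model of this kind could be sketched with scikit-learn's StackingRegressor as shown below. The
paper does not state its hyperparameters or the meta-learner, so the LinearRegression final estimator, the
tree counts, the 5-fold setting, and the synthetic data are all assumptions made for illustration; the
score printed here says nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the paper's dataset): three numeric features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# 80/20 train/test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Base learners: random forest and gradient boosting. Their cross-validated
# predictions become the input features of the final (meta) estimator.
stacked = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),  # assumed meta-learner
    cv=5,
)
stacked.fit(X_train, y_train)

# score() returns the coefficient of determination (R2) on held-out data.
r2 = stacked.score(X_test, y_test)
print(f"stacked model R2 on held-out data: {r2:.3f}")
```

The meta-learner is trained on out-of-fold predictions, which prevents it from simply favoring whichever
base model overfits the training data most.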
3 LITERATURE SURVEY
Janmejay Pant et al. (2020) proposed using machine learning (ML) to develop a trained model that can
identify patterns in data. In this study, ML is used to predict the yield of the four most cultivated
crops in India. Data from the FAO and World Bank databases are used, and the results of the gradient
boosting regressor, random forest regressor, support vector machine, and decision tree regressor are
compared to find the model that best fits the observed data [1].
Dr. Y. Jeevan Nagendra Kumar et al. (2020) used a machine learning methodology to predict crop yield
from historical data that includes factors such as temperature, humidity, rainfall, and crop name. It
helps identify the best crop to plant based on field yields and weather conditions. The random forest
algorithm is used to predict crop yield. The proposed technique helps farmers acquire knowledge about
crop requirements and prices [2].
Aruvansh Nigam et al. (2019) focus on using a variety of machine learning techniques to predict crop
yield and compare their results based on mean absolute error. The predictions made by the machine
learning algorithms assist the farmer in choosing which crop to grow to get the maximum yield by taking
into account factors such as temperature, rainfall, and area. Random forest regressor, RNN, and long
short-term memory (LSTM) are used to obtain the best accuracy. Additionally, the RNN is more accurate
in predicting rainfall, whereas the LSTM is more accurate in predicting temperature [3].
Shilpa Mangesh Pande et al. (2021) employ a support vector machine, artificial neural network, random
forest, multivariate linear regression, and k-nearest neighbor; random forest gives the best accuracy
of 95 percent [4].
Gour Hari Santra et al. (2016) The authors Subhadra Mishra, Debahuti Mishra, and Gour Hari Santra
conclude that this is a highly researched area with room to grow in the future. Crop forecasting is
made easier by the combination of computer science and agriculture, which also helps provide
agricultural information and techniques for improving yield rates. The algorithms used are regression
analysis, decision tree techniques, and artificial neural networks. The lack of a clear technique is a
drawback [5].
S. Kunchakuri et al. (2021) implemented a Crop Yield Prediction System (CYPS) that uses the KNN
algorithm. A farmer’s production projections need to take into account a number of variables that could
affect the yield and quality of crops. Three factors affect yield production: season, crop type, and
production area. For this reason, the authors predict yield production using fields such as year, crop,
area, region, and season. Precise knowledge of crop yield history is essential for making decisions
about agricultural risk management [6].
P. Mohan et al. (2017) propose estimating crop productivity by applying Parallel Layer Regression
(PLR) and a Deep Belief Network (DBN). The DBN method addresses five key crops grown in Karnataka,
including rice, ragi, and pulses. According to the recommended methodology, one of the five crops is
assigned to each place listed in the database. The experimental results show that the strategy has
considerable potential for precise agricultural productivity prediction in terms of accuracy,
specificity, and sensitivity, and real-time data and human interactions have been used to validate its
efficacy [7].
Veenadhari et al. (2014) created a crop-advising application that operates in a few districts of
Madhya Pradesh, India. The crop is labeled and the results are obtained based on the trigger values
provided. The user inputs cloud cover, rainfall, temperature, and previous yield observations, and the
system predicts the yield [8].
D. Elavarasan et al. (2018) In supervised learning, samples of the input/output association are
provided to the system as the training dataset. The system aims to learn a function that maps each
input to its appropriate output. The associated target values are taken into account, with the ultimate
objective of minimizing the loss in future predictions. Unsupervised learning is a kind of machine
learning in which an unlabeled dataset is used to build the learning framework; there are no
ground-truth values. The training dataset consists solely of input vectors, and a cost function is
minimized [9].
Fathima explains various mining techniques that take quantitative crop data into account and relate
them across seasons. Because the amount of data is vast, a k-based algorithm is used to manage it, and
an appropriate computation determines which harvests are selected as continuous variables. They also
focus on cutting-edge procedures and management tactics [10].
4 METHODOLOGY
4.1 Dataset
The dataset was downloaded from the Kaggle platform and is drawn from historical records. It contains
19690 instances and ten features: Crop, Crop Year, Season, State, Area, Production, Annual Rainfall,
Fertilizer, Pesticide, and Yield.
Figure 1: Data flow diagram of the proposed model.
The strategy is highly attractive, but as the dimension grows, that is, when there are numerous
independent variables, it quickly becomes unfeasible.
Table 1: Comparison of the machine learning models.
Model                        Error         Accuracy (%)
LinearRegression             151974.730    81
GradientBoostingRegressor    24898.356     96.89
KNeighborsRegressor          32316.852     95.97
DecisionTreeRegressor        34273.404     95.73
RandomForestRegressor        17845.416     97.78
Stacked model                16571.325     97.93
5 RESULT
The random forest regressor and gradient boosting regressor are hybridized using a stacking regressor,
which is evaluated alongside XGB, linear, KNN, decision tree, and random forest regression. The models
are assessed using mean squared error and accuracy. Since this is a regression problem, accuracy and
MAE are used to evaluate how well the machine learning algorithms can predict yield values. Finding
more effective machine learning algorithms to forecast yield is helpful. Table 1 contains the values
that the various methods obtained on the India test dataset and compares the outcomes of the machine
learning algorithms used in this work. Our hybrid model gives the highest accuracy, 97.93 percent.
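The error metrics named in this paper can be computed as in the sketch below, written in plain Python so
that the definitions are explicit. The paper does not define how its "accuracy" percentage is obtained;
treating it as the R2 score multiplied by 100 is an assumption made here, and the sample values are
hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields and predictions, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

mse = mean_squared_error(y_true, y_pred)   # 0.25
mae = mean_absolute_error(y_true, y_pred)  # 0.5
r2 = r2_score(y_true, y_pred)              # 0.95
# Assumption: "accuracy" is read as R2 expressed as a percentage.
print(f"MSE={mse:.2f} MAE={mae:.2f} accuracy={100 * r2:.1f} percent")
```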
7 REFERENCES
1. Pant, J., Pant, R. P., Singh, M. K., Singh, D. P., Pant, H. (2021). Analysis of agricultural crop
yield prediction using statistical techniques of machine learning. Materials Today: Proceedings, 46,
10922-10926.
2. Kumar, Y. J. N., Spandana, V., Vaishnavi, V. S., Neha, K., Devi, V. G. R. R. (2020, June).
Supervised machine learning approach for crop yield prediction in agriculture sector. In 2020 5th
International Conference on Communication and Electronics Systems (ICCES) (pp. 736-741). IEEE.
3. Nigam, A., Garg, S., Agrawal, A., Agrawal, P. (2019, November). Crop yield prediction using
machine learning algorithms. In 2019 Fifth International Conference on Image Information Processing
(ICIIP) (pp. 125-130). IEEE.
4. Pande, S. M., Ramesh, P. K., Anmol, A., Aishwarya, B. R., Rohilla, K., Shaurya, K.
(2021, April). Crop recommender system using machine learning approach. In 2021 5th International
Conference on Computing Methodologies and Communication (ICCMC) (pp. 1066-1071). IEEE.
5. Santra, G. H., Mishra, D., Mishra, S. (2016, October). Applications of machine learning
techniques in agricultural crop production. Indian Journal of Science and Technology.
6. Kunchakuri, S., Pallerla, S., Kande, S., Sirisala, N. R. (2021, November). An efficient crop yield
prediction system using machine learning algorithm. In Proc. 4th Smart Cities Symp. (SCS), vol. 2021
(pp. 120-125).
7. Mohan, P., Patil, K. K. (2017, December). Crop production rate estimation using parallel layer
regression with deep belief network. In Proc. Int. Conf. Electr., Electron., Commun., Comput., Optim.
Techn. (ICEECCOT) (pp. 168-173).
8. Veenadhari, S., Misra, B., Singh, C. D. (2014, January). Machine learning approach for
forecasting crop yield based on climatic parameters. In 2014 International Conference on Computer
Communication and Informatics (pp. 1-5). IEEE.
9. Elavarasan, D., Vincent, D. R., Sharma, V., Zomaya, A. Y., Srinivasan, K. (2018). Forecasting yield
by integrating agrarian factors and machine learning models: A survey. Computers and Electronics
in Agriculture, 155, 257-282.
11. Nishant, P. S., Venkat, P. S., Avinash, B. L., Jabber, B. (2020, June). Crop yield prediction
based on Indian agriculture using machine learning. In 2020 International Conference for Emerging
Technology (INCET) (pp. 1-4). IEEE.
12. Arun, R. A., Saraswathi, T., Meeha, D., Sugeerthi, G. (2023, August). A Machine Learning-
based Agricultural Yield Forecasting System for Predicting Crop/Plant Yield before Planting the
Crops/Plants. In 2023 12th International Conference on Advanced Computing (ICoAC) (pp. 1-6).
IEEE.
13. Sharma, P., Dadheech, P., Aneja, N., Aneja, S. (2023). Predicting Agriculture Yields Based on
Machine Learning Using Regression and Deep Learning. IEEE Access.
14. Gupta, A., Dargar, A., Majid, A., Dargar, S. K., Kumar, M. S., Raju, M. (2023, July). Markov
Chain Model Used in Agricultural Yield Predictions Utilizing on Indian Agriculture. In 2023 IEEE
World Conference on Applied Intelligence and Computing (AIC) (pp. 1008-1014). IEEE.
15. Medar, R., Rajpurohit, V. S., Shweta, S. (2019, March). Crop yield prediction using machine
learning techniques. In 2019 IEEE 5th International Conference for Convergence in Technology (I2CT)
(pp. 1-5). IEEE.