
Crop Yield Prediction Using ML Algorithms

1. The document discusses using machine learning algorithms to predict crop yields based on factors like weather, temperature, humidity, and rainfall. 2. It proposes using a random forest machine learning algorithm to make predictions and help farmers plan crops accordingly. 3. The random forest algorithm considers multiple factors simultaneously, unlike previous methods that only considered one factor at a time, to generate more accurate yield predictions. This will help farmers increase crop yields and quality.

Authors:

Shashwat Sinha
Kalinga Institute of Industrial Technology
3rd Year Information Technology B.Tech

Swayam Verma
Kalinga Institute of Industrial Technology
3rd Year Information Technology B.Tech

Pratima Chaudhury
Kalinga Institute of Industrial Technology
3rd Year Information Technology B.Tech

Abstract-
Global climate change has badly affected most agricultural crops in India. This project allows farmers to estimate the yield of their crops before cultivation and thus helps them make the necessary decisions. It uses Random Forest, a machine learning algorithm. Although problems related to weather, temperature, humidity, and rainfall have been studied, there are still no adequate solutions to the situation farmers face. In countries like India, the agricultural sector remains a contributor to economic growth. The proposed processing is therefore useful for forecasting crop yields.
Keywords: Machine Learning; Crop yield prediction; Random Forest algorithm

1. Introduction
India's agricultural history began in the Indus Valley Civilization period. India ranks second in this industry. Agriculture and allied sectors accounted for 20.2% of GVA (gross value added) in fiscal year 2020-2021, which is 1.8% higher than in fiscal year 2019-2020, and for 18.8% of GVA with 42.6% of the workforce in fiscal year 2021-2022. In terms of net cultivated area, India leads the world with 9.6% of all arable land, followed by the US (8.9%), China (8.8%), and Russia (8.8%). According to demographics, India's socioeconomic fabric is based largely on agriculture. The GDP contribution of agriculture in India is declining significantly as industrialization rises. Integration with technology is not at the desired level, which is a problem for the Indian agricultural sector and one reason why its full potential is not being used. It is difficult for farmers to predict rainfall and temperature, which affect crop yield, as a result of the overuse of industrial technologies and non-renewable resources. Here, machine learning can help farmers by using algorithms such as RNN and LSTM to predict trends in temperature, rainfall, and crop yield. Because crops can be planned in advance in accordance with the predictions, this will ease farmers' lives a little and increase the yield and quality of their harvest.
The practical implementation of machine learning techniques and its quantification are the main topics of this study. To obtain a consistent trend, the work presented here additionally takes into account the erratic data from the temperature and rainfall databases. Contrary to the customary practice of predicting crop yields by considering only one factor at a time, this method takes all of the factors into account together.
The remainder of the paper is structured as follows. Section 2 contains the literature survey of research done before this paper. Section 3 gives brief details on the data sources and datasets. Section 4 contains the methodology, which briefly describes the different algorithms and the requirements for ML algorithms. Section 5 contains the proposed model. Section 6 presents the crop yield calculation using the yield formula. Section 7 contains the results and analysis obtained after processing the data in the Random Forest model. Section 8 contains the pros and cons of the proposed model. Section 9 contains the conclusion of the paper and future uses for the proposed model.

2. Literature Survey
On a dataset from the Indian government, experiments by Aruvansh Nigam, Saksham Garg, and Archit Agrawal [1] showed that the Random Forest machine learning method provides the best yield forecast accuracy.
Balamurugan [2] implemented crop yield prediction using only the random forest classifier. Features such as rainfall, temperature, and season were taken into account to predict the crop yield.
According to Dr. Y. Jeevan Nagendra Kumar [3], supervised learning allows machine learning algorithms to forecast an objective or outcome. That study focuses on supervised learning methods for predicting crop yields.
Jig Han et al. [4] used a random forest algorithm, together with environmental variables such as soil, climate, photoperiod, fertilization data, and water, to predict global and regional crop yields for potato, maize, and wheat.
Leo Breiman [5] focuses on the random forest algorithm's accuracy, strength, and correlation. The random forest algorithm generates decision trees from different data samples, predicts the data from each subset, and then provides the best answer for the system by voting.
Mishra [6] has theoretically described various machine learning techniques that can be applied in different forecasting areas.
Using data mining techniques, Shastry et al. [7] fitted various regression models to forecast crop yield in India. The crop yields of maize, wheat, and cotton are studied using time series data, soil, and weather parameters.
The research of Manjula et al. [8] aimed to propose and implement a rule-based system that predicts crop yield production from past data by using association rule mining on agricultural data from 2000 to 2012.
Table[2.1] summarizes further surveys on crop yield prediction:

Sr.No | Author | Aim | Method | Result
1. | Saeed Khaki, Lizhi Wang and Sotirios V. Archontoulis | Crop yield prediction | CNN-RNN Model | It is used to capture the time dependencies of environmental factors and the genetic improvement of seeds over time without having their genotype information.
2. | Mayank Champaneri, Darpan Chachpara, Chaitanya Chandvidkar, Mansing Rathod | Crop yield prediction | Random Forest Algorithm | Predicting the crop yield.
3. | Thomas van Klompenburg, Ayalew Kassahun, Cagatay Catal | To find the best-performing models for crop yield prediction | Deep Learning Methods | The results show that no specific conclusion can be drawn as to what the best model is, but they clearly show that some machine learning models are used more than the others.
4. | Mayank Champaneri, Chaitanya Chandvidkar, Darpan Chachpara, Mansing Rathod | Crop yield prediction | Random Forest Algorithm | Based on climatic input parameters, the study demonstrated the potential use of data mining techniques in predicting crop yield.
5. | Ms. Ranjani J, Ms. V.K.G Kalaiselvi, Ms. A. Sheela, Deepika Sree D, Janaki G | Crop yield prediction | Random Forest Algorithm | The user-friendly web page built for estimating crop yield can be used by any user for their choice of crop by giving climate data for that location.
6. | S. Vinson Joshua, A. Selwin Mich Priyadharson, Raju Kannadasan, Arfat Ahmad Khan, Worawat Lawanont, Faizan Ahmed Khan, Ateeq Ur Rehman and Muhammad Junaid Ali | Crop yield prediction | General Regression Neural Networks (GRNN), Back Propagation Neural Network (BPNN), Support Vector Machine (SVM) | Statistical models, namely MLR, and machine learning models such as BPNN, SVM, and GRNN are demonstrated over a wide-area spectrum covering the Indian state of Tamil Nadu. The GRNN model had greater potential, explaining 97% of the variance from the input parameters towards the crop yield, and offered higher prediction accuracy.
7. | Lontsi Saadio Cedric, Wilfried Yves Hamilton Adoni, Rubby Aworka, Jérémie Thouakesseh Zoueu, Franck Kalala Mutombo, Moez Krichen and Charles Lebon Mberi Kimpolo | Crop yield prediction | K-Nearest Neighbor | The authors proposed decision-support tools for decision-makers and farmers that predict six crop yields in some West African countries throughout the year, namely bananas, yams, cassava, maize, rice, and seed cotton.

Table[2.1] Different survey Data

3. Data Source and Datasets


Acquiring a dataset for India is somewhat difficult because there is no official compilation of the required data. Scattered datasets are available, however, and can be merged to derive the desired yield. The data described in Table[3.1] and sampled in Fig[3.2] were used throughout the paper. We also visualized the features of the dataset, as shown in Fig[3.3]. For a better understanding of the dataset, we created a heatmap of the data, shown in Fig[3.4].
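
As a rough illustration of the merging and correlation steps described above, the sketch below joins two small record sets on a shared (state, year) key and computes a Pearson correlation of the kind a heatmap such as Fig[3.4] would display. The field names, the `merge_on_key` and `pearson` helpers, and all sample values are hypothetical placeholders chosen for illustration; they are not the paper's data or code.

```python
from math import sqrt

def merge_on_key(left, right, keys):
    """Inner-join two lists of dicts on the given key fields."""
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder records (made-up values, for illustration only).
production = [
    {"state": "Odisha", "year": 2010, "area": 100.0, "production": 250.0},
    {"state": "Odisha", "year": 2011, "area": 120.0, "production": 310.0},
    {"state": "Bihar", "year": 2010, "area": 80.0, "production": 170.0},
]
weather = [
    {"state": "Odisha", "year": 2010, "rainfall": 1400.0},
    {"state": "Odisha", "year": 2011, "rainfall": 1550.0},
    {"state": "Bihar", "year": 2010, "rainfall": 1100.0},
]

merged = merge_on_key(production, weather, keys=("state", "year"))
area_vs_production = pearson(
    [r["area"] for r in merged], [r["production"] for r in merged]
)
```

Computing `pearson` for every pair of numeric attributes would give the full matrix that a correlation heatmap visualizes.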

Sl no. | Attribute | Description
1 | States | Andaman and Nicobar Islands, Andhra Pradesh, Arunachal Pradesh, Assam, Bihar, Chandigarh, Chhattisgarh, Dadra and Nagar Haveli, Goa, Gujarat, Haryana, Himachal Pradesh, Jammu and Kashmir, Jharkhand, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Manipur, Meghalaya, Mizoram, Nagaland, Odisha, Puducherry, Punjab, Rajasthan, Sikkim, Tamil Nadu, Telangana, Tripura, Uttar Pradesh, Uttarakhand, West Bengal
2 | Crops | Arecanut, Other Kharif pulses, Rice, Banana, Cashewnut, Coconut, Dry ginger, Sugarcane, Sweet potato, Tapioca, Black pepper, Dry chillies, other oilseeds, Turmeric, Maize, Moong(Green Gram), Urad, Arhar/Tur, Groundnut, Sunflower, Bajra, Castor seed, Cotton(lint), Horse-gram, Jowar, Korra, Ragi, Tobacco, Gram, Wheat, Masoor, Sesamum, Linseed, Safflower, Onion, other misc. Pulses, Samai, Small millets, Coriander, Potato, Other Rabi pulses, Soyabean, Beans & Mutter(Vegetable), Bhindi, Brinjal, Citrus Fruit, Cucumber, Grapes, Mango, Orange, other Fibers, Other Fresh Fruits, Other Vegetables, Papaya, Pome Fruits, Tomato, Rapeseed & Mustard, Mesta, Cowpea(Lobia), Lemon, Pomegranate, Sapota, Cabbage, Peas, Niger seed, Bottle Gourd, Sannhamp, Varagu, Garlic, Ginger, Oilseeds total, Pulses total, Jute, Peas & beans (Pulses), Blackgram, Paddy, Pineapple, Barley, Khesari, Guar seed, Other Cereals & Millets, Cond-spcs other, Turnip, Carrot, Redish, Arcanut (Processed), Atcanut (Raw), Cashew Nut Processed, Cashew Nut Raw, Cardamom, Rubber, Bitter Gourd, Drum Stick, JackFruit, Snake Guard, Pump Kin, Tea, Coffee, Cauliflower, Other Citrus Fruit, Water Melon, Total foodgrain, Kapas, Colocasia, Lentil, Bean, Jobster, Perilla, Rajmash Kholar, Ricebean (nagadal), Ash Gourd, Beet Root, Lab-Lab, Ribbed Gourd, Yam, Apple, Peach, Pear, Plums, Litchi, Ber, Other Dry Fruit, Jute & mesta

Table[3.1] Attribute List of dataset

Fig[3.2] Dataset sample

Fig[3.3] Graphical Representation of year, Area Size, Production, Temperature.

Fig[3.4] Correlation Matrix of Different attributes

4. Methodology
Data is crucial to machine learning. Data preprocessing is the technique used to turn raw data into a clean data set. The data are acquired from various sources, but because they are collected in raw form, they cannot be analyzed directly. We convert the data into a usable format with strategies such as substituting missing and null values. Dividing the data into training and testing sets is the last step of preprocessing. Because training the model typically requires as many data points as possible, the split is usually uneven, with the larger share going to training. The training dataset is the initial dataset used to teach ML algorithms how to learn and make accurate predictions.
➢ Factors Affecting the Crop Yield
Any crop's yield and production are influenced by a number of variables. In essence, these variables aid in predicting crop yield over a specific time frame. In this research we took into account variables such as area, temperature, rainfall, humidity, and wind speed.
➢ Different Machine Learning Algorithms
We must first assess and compare candidate algorithms before selecting the one that best fits this particular dataset. Machine learning is a practical way to approach the crop production problem, and numerous machine learning methods are used to forecast agricultural yield. This paper compares the accuracy of the following techniques:
● Logistic regression: A supervised learning classification approach used to forecast the likelihood of a target variable. Since the target (dependent) variable is binary, there are only two possible classes. This regression method assumes a linear relationship between the independent variables and the log-odds of the dependent variable. On our dataset, logistic regression delivers an accuracy of 87.8%.
● Random Forest: Random Forest can examine how crop growth is influenced by the prevailing climatic factors and biophysical change. It is a supervised machine learning algorithm that is frequently used for classification and regression problems. The algorithm builds decision trees from several data samples, obtains a prediction from each tree, and then selects the final answer by majority voting. Random Forest uses the bagging approach to train on the data, which increases the accuracy of the outcome. RF offers 90.47% accuracy on our data. As a result, we use the Random Forest algorithm to analyze our data because it is more accurate than the alternative.
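
The preprocessing steps described at the start of this section (substituting missing values, then splitting into training and testing data) could look roughly like the sketch below. It is only a minimal illustration: mean imputation, the 80/20 split, the helper names, and the sample rows are all assumptions, since the paper does not specify them.

```python
import random

def fill_missing_with_mean(rows, column):
    """Replace None entries in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [
        {**r, column: mean if r[column] is None else r[column]} for r in rows
    ]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off a held-out test portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder rows (made-up values); None marks a missing reading.
raw = [
    {"rainfall": 1200.0, "temperature": 27.0},
    {"rainfall": None, "temperature": 29.0},
    {"rainfall": 1000.0, "temperature": 31.0},
    {"rainfall": 1400.0, "temperature": 25.0},
    {"rainfall": 900.0, "temperature": 30.0},
]
clean = fill_missing_with_mean(raw, "rainfall")
train, test = train_test_split(clean, test_fraction=0.2, seed=0)
```

The uneven split reflects the point made above: most rows go to training, and a smaller held-out portion is kept for testing.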

5. Proposed Model
The proposed model, shown in Fig[5.1], is a Random Forest model. It works in the following steps:
1. When the algorithm starts, the data sets are loaded into the model and graphs are drawn from them. Random samples are then taken from the data sets and processed into a form suitable for constructing decision trees.
2. The decision trees are built using an attribute selection process; the selected attributes are data points [subset] selected by the user. Each decision tree receives its data and derives a set of rules and formulas for predicting the result. Each tree uses a different set of data and forms different rules for prediction.
3. The result from each decision tree is collected and voted upon by the Random Forest classifier, and the result with the most votes is selected as the final result.
4. The final result is displayed, and graphs are drawn from it.
The training pseudocode of the proposed system, shown in Fig[5.2], is:
1. Randomly select k features out of the total m features in the model.
2. Among the k features, calculate the node d using the best split point.
3. Using the split, divide the node into daughter nodes.
4. Repeat steps 1 to 3 until the required number of nodes has been reached.
5. Build the forest by repeating steps 1 to 4 n times to create n trees.
To perform prediction, the trained random forest algorithm uses the pseudocode below, also shown in Fig[5.2]:
1. Take the test features, use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome.
2. Calculate the votes for each predicted outcome.
3. Take the outcome with the most votes as the final prediction of the random forest algorithm.
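
The training and prediction pseudocode above can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: it assumes Gini impurity as the split criterion, a depth limit (`max_depth`) as the stopping rule, bootstrap sampling for each tree, and a tiny made-up two-feature dataset (rainfall, temperature). All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(X, y, k, rng, depth=0, max_depth=5):
    """Grow one tree, drawing k random candidate features at every node."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: majority class
    split = best_split(X, y, rng.sample(range(len(X[0])), k))
    if split is None:
        return majority
    f, threshold = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= threshold]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > threshold]
    children = [
        build_tree([r for r, _ in part], [lab for _, lab in part],
                   k, rng, depth + 1, max_depth)
        for part in (left, right)
    ]
    return (f, threshold, children[0], children[1])

def predict_tree(node, row):
    """Follow the splits down to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def fit_forest(X, y, n_trees=25, k=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up training rows: [rainfall, temperature] -> crop label.
X = [[1500, 30], [1400, 29], [1600, 31], [1450, 28],
     [400, 18], [500, 20], [450, 17], [550, 19]]
y = ["rice"] * 4 + ["wheat"] * 4
forest = fit_forest(X, y, n_trees=25, k=1, seed=0)
```

Here `k` plays the role of the k features drawn from the m available, and `n_trees` is the n in step 5; in practice a library implementation would normally be used instead of hand-written trees.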

Fig[5.2] Random Forest Algorithm Pseudocode

Fig[5.1] Proposed System Flowchart

6. Crop Yield Calculation


The crop predicted by the Random Forest classifier was mapped to the production of that crop. The production was then divided by the area entered by the user to obtain the crop yield:
Yield = Production / Area
Fig[6.1] shows the yield for the dataset.

Fig[6.1] Yield which indicates Production per unit Area.
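
The yield formula above can be written as a small helper. The function name, the guard against a non-positive area, and the sample figures are assumptions added for illustration; the units of the result simply follow the units of the inputs.

```python
def crop_yield(production, area):
    """Yield as production per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return production / area

# Placeholder figures (made up): 250 units of production on 100 units of area.
example_yield = crop_yield(production=250.0, area=100.0)
```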


7. Result & Analysis
The dataset contains about 2 lakh (200,000) records. We used random forest as the preferred algorithm for the paper. The model's accuracy is approximately 90.47%, with a standard deviation of 6.36%. Fig[7.1] shows the accuracy and standard deviation of the model, and Fig[7.2] shows the model's Random Forest predictions.
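
The paper does not state how the accuracy and its standard deviation were obtained. One common way to get such a pair of figures is k-fold cross-validation, where the model is scored on k held-out folds and the scores are summarized. The sketch below shows only that bookkeeping, under this assumption; the helper names are hypothetical and the fold scores in the last line are placeholders, not results from the paper.

```python
from statistics import mean, pstdev

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def summarize(fold_accuracies):
    """Mean and population standard deviation of per-fold accuracies."""
    return mean(fold_accuracies), pstdev(fold_accuracies)

folds = list(k_fold_indices(n_samples=10, k=3))
# Placeholder per-fold scores, for illustration only.
mean_accuracy, std_accuracy = summarize([0.75, 1.0, 0.5])
```

In a full run, each fold's training indices would be used to fit the model and `accuracy` would be computed on the matching test indices before calling `summarize`.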

Fig[7.1] Accuracy and standard deviation of the model

Fig[7.2] Prediction of yield by the Random Forest model


8. Pros & Cons
8.1. Pros of Using Random Forest Algorithm:
1. Reduced risk of overfitting: Decision trees run the risk of overfitting as they tend to tightly fit all the samples within training data. However, when there’s a robust number of decision trees in a random forest, the classifier won’t overfit the model since the averaging of uncorrelated trees lowers the overall variance and prediction error.
2. Provides flexibility: Since random forest can handle both regression and classification tasks with a high degree of accuracy, it is a popular method among data scientists. Feature bagging also makes the random forest classifier an effective tool for estimating missing values as it maintains accuracy when a portion of the data is missing.
3. Easy to determine feature importance: Random forest makes it easy to evaluate variable importance, or contribution, to the model. There are a few ways to evaluate feature importance. Gini importance and mean decrease in impurity (MDI) are usually used to measure how much the model’s accuracy decreases when a given variable is excluded. However, permutation importance, also known as mean decrease accuracy (MDA), is another important measure. MDA identifies the average decrease in accuracy by randomly permuting the feature values in samples.

8.2. Cons of Using Random Forest Algorithm:
1. Time-consuming process: Since random forest algorithms can handle large data sets, they can provide more accurate predictions, but can be slow to process data as they are computing data for each individual decision tree.
2. Requires more resources: Since random forests process larger data sets, they’ll require more resources to store that data.
3. More complex: The prediction of a single decision tree is easier to interpret when compared to a forest of them.
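
Permutation importance (MDA), mentioned under feature importance in 8.1, can be sketched as follows: shuffle one feature column, re-score the model, and average the drop in accuracy over several repeats. The rule-based `threshold_model` below is a hypothetical stand-in for a trained classifier, and the data are made-up placeholders.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        permuted = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(X, column)
        ]
        drops.append(baseline - accuracy(predict, permuted, y))
    return sum(drops) / n_repeats

def threshold_model(row):
    """Stand-in classifier that looks only at feature 0 (rainfall)."""
    return "rice" if row[0] > 1000 else "wheat"

# Made-up rows: [rainfall, temperature] -> crop label.
X = [[1500, 30], [1400, 29], [1600, 31], [1450, 28],
     [400, 18], [500, 20], [450, 17], [550, 19]]
y = ["rice"] * 4 + ["wheat"] * 4

rainfall_importance = permutation_importance(threshold_model, X, y, feature=0)
temperature_importance = permutation_importance(threshold_model, X, y, feature=1)
```

Because the stand-in model never reads the temperature column, shuffling it cannot change the accuracy, which is the behavior MDA is meant to expose for unimportant features.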

9. Conclusion & Future Work


The paper outlined a variety of machine learning techniques for estimating agricultural output based on area, season, temperature, and rainfall. Studies using datasets from the Indian government have shown that the Random Forest Regressor has the best accuracy for predicting yield. This will enable farmers in India to determine the yield they can expect in a particular climate and adjust the timing of crop planting accordingly. In future work, we can try to create a data-independent system, one that performs accurately regardless of the data format. Since crop selection also depends on soil knowledge, it would be advantageous to incorporate soil information into the system. Effective irrigation is also necessary for crop cultivation, and rainfall data can indicate whether additional water is needed.

10. References
[1] Aruvansh Nigam, Saksham Garg, and Archit Agrawal, "Predict the best yield forecast accuracy".
[2] Balamurugan, "Random Forests", 2001.
[3] Dr. Y. Jeevan Nagendra Kumar, "Predicting Yield of the Crop Using Machine Learning Algorithm", 2015.
[4] Jig Han et al., "Applications of machine learning techniques in agricultural crop production", 2016.
[5] Leo Breiman, "Random forest".
[6] Mishra, "Random forest".
[7] Shastry et al., "Crop yield prediction".
[8] Manjula et al., "Crop yield prediction".
[9] Saeed Khaki, Lizhi Wang and Sotirios V. Archontoulis, "Crop yield prediction".
[10] Mayank Champaneri, Darpan Chachpara, Chaitanya Chandvidkar, Mansing Rathod, "Crop yield prediction".
[11] Mayank Champaneri, Darpan Chachpara, Chaitanya Chandvidkar, Mansing Rathod, "Crop yield prediction".
[12] Thomas van Klompenburg, Ayalew Kassahun, Cagatay Catal, "Crop yield prediction".
[13] Mayank Champaneri, Chaitanya Chandvidkar, Darpan Chachpara, Mansing Rathod, "Crop yield prediction".
[14] Ms. Ranjani J, Ms. V.K.G Kalaiselvi, Ms. A. Sheela, Deepika Sree D, Janaki G, "Crop yield prediction".
[15] S. Vinson Joshua, A. Selwin Mich Priyadharson, Raju Kannadasan, Arfat Ahmad Khan, Worawat Lawanont, Faizan, "Crop yield prediction".
[16] Lontsi Saadio Cedric, Wilfried Yves Hamilton Adoni, Rubby Aworka, Jérémie Thouakesseh Zoueu, "Crop yield prediction".
