
Crop Recommendation System using KNN and Random Forest considering Indian Dataset
ABSTRACT

Agriculture plays a crucial role in the growth of the country's economy. In comparison to other countries, India has the highest production rate in agriculture. Agriculture, when combined with technology, can bring the finest results. Crop prediction is a highly complex task determined by multiple factors such as the contents of nitrogen, phosphorus and potassium, rainfall, temperature, humidity and pH level. Predicting the crop in advance would help policymakers and farmers take appropriate measures for farming, marketing and storage. Thus, in this paper we propose crop selection using machine learning techniques such as K-Nearest Neighbour (KNN) and Random Forest. Both models are simulated comprehensively on the Indian dataset and an analytical report is presented. This model will help farmers to know the type of crop before cultivating the agricultural field and thus help them make appropriate decisions.

Keywords—Crop Selection, Machine Learning.

I. INTRODUCTION

Agriculture is the backbone of the Indian economy. In India, most farmers depend on the weather conditions for agriculture. However, soil quality also plays a major role in the productivity of a particular crop; for example, rice cultivation mainly depends on rainfall. Nowadays the seasonal patterns are not the same as before, and we cannot even predict whether there will be floods or water scarcity in the future. In addition, many farmers are not equipped to use technology to predict the production of a crop they have chosen to farm. It is nevertheless clear that soil health status can be used to recommend a crop type to be farmed in the next season.

So, in order to maximize crop production, predictions of various aspects of the crop are required based on the weather conditions in the locality. Yield prediction is an important agricultural problem. Usually, farmers predict their yield from the previous year's yield. As discussed earlier, we cannot predict the yield based on last year's outcome because of many factors such as crop stress, soil impurity, floods, pesticides, pests and diseases.

Here we are going to use some existing mathematical models. Farmers are growing hybrid crops that the soil generally cannot support, relying on pesticides to grow them; as a result the quality of the soil decreases, and predictions for those crops become unreliable. Due to these practices, people concentrate on cultivating hybrid crops, which leads to an unhealthy life.

Nowadays, people can take the help of technology in various dimensions to grow crops. Thus, this paper aims at predicting the most suitable crop type from the nitrogen, phosphorus and potassium contents of the land, together with the rainfall in that area, the soil's humidity and the surrounding temperature, since temperature also plays an important role in crop growth; the farmer can then select the predicted crop to grow, which is the need of today's generation. Because of current cultivation techniques, the seasonal climatic conditions are changing and acting against fundamental assets like soil, water and air, which leads to food insecurity. This also affects the proportion of N-P-K in the soil, which in turn affects the rain. Thus, we mainly focus on these parameters and use data mining techniques to solve this problem. Data mining is also useful for predicting crop yield production [1], [2].

The rest of the paper is organized as follows. In Section II, some related literature is highlighted along with its problem-solving approaches. In Section III, we present our simulation models and details about the dataset, including the pre-processing stages. In Section IV, we discuss the simulation results and their analysis in two subsections. Finally, we conclude with some future directions in Section V.
II. STATE OF THE ART

This problem was identified a couple of years ago, and since then many attempts have been made by researchers throughout the globe. However, many limitations remain among the farmers and in the technological support provided by regions and states. Some of the directions taken on this issue are described as follows.

In [3], the authors predicted crop yield using boosting techniques, random forest, support vector machine, k-nearest neighbours and artificial neural networks. The paper proposed a method named Crop Selection Method (CSM) to achieve the net yield rate of crops over a season. The CSM method [3] may improve the net yield rate of crops using limited land resources and also increase the re-usability of the land. The CSM algorithm predicts the crop yield rate based on favourable conditions in advance and gives a sequence of crops with the highest net yield rate.

The authors in [4] mainly focus on predicting crop yield from existing data using the Random Forest algorithm. The data represents the scenario of Tamil Nadu. The datasets considered are rainfall, precipitation, production and temperature, used to construct a random forest, a collection of decision trees built from two-thirds of the records in the datasets [4]. Decision tree classifiers use a greedy approach, so an attribute chosen at an early step cannot be used again even if it would give a better classification at a later step [4]. Decision trees also overfit the training data, which can give poor results on unseen data. To overcome this limitation an ensemble model is used, in which the results from different models are combined; the result obtained from an ensemble model is usually better than the result from any one of the individual models.

In [5], the authors took a dataset containing soil type, soil pH, humidity, temperature, rainfall, wind, production, cost of production and the annual yield of the region for the past 10-12 years. A decision tree classifier was implemented on the data for crop yield, and K-Nearest Neighbours was applied for the prediction of rainfall, with 76.8% accuracy for crop yield prediction and 89.4% accuracy for rainfall prediction.

In [6], the authors used a deep neural network model to predict six crop yields, namely Aus rice, Aman rice, Boro rice, jute, wheat and potato, using rainfall data, land types, chemical fertilizers and soil information. The DNN model is compared with RF, SVM and LR. The DNN outperforms the other models with the highest accuracy rates of 98% (Aus rice), 95% (Aman rice), 96% (Boro rice), 97% (potato), 96% (wheat) and 94% (jute).

The authors in [7] investigated a crop suggestion model based on soil classification using machine learning techniques. The study proposed an SVM-based model to suggest crops specific to soil conditions. The proposed SVM model outperforms KNN and bagged trees with 95% accuracy.

In [8], the authors investigated the rice yield prediction performance of KNN, decision tree (DT) and Naive Bayes (NB) using 11 micronutrient and macronutrient parameters. The prediction accuracy is 98% for Naive Bayes, 94% for DT and 97% for KNN. The study concluded that NB achieved a better prediction rate and was suitable for rice yield prediction using soil parameters.

The authors in [9] proposed a crop recommender model for farmers using machine learning. The prediction model is prepared using an ANN, and its performance is compared against DT, KNN and RF; the ANN achieved the highest accuracy of 91%. Crop suitability is predicted using rainfall, soil type, soil conditions, temperature and geographical location.
III. OUR CONTRIBUTION

In this paper, we have focused on predicting the best crop to be grown on farmers' land for the maximum yield. The crop is predicted using machine learning algorithms, namely KNN and Random Forest, and we further make a quantitative analysis of their accuracy. To predict the best crop we use seven parameters from the dataset: Nitrogen (N), Phosphorus (P), Potassium (K), temperature, humidity, pH value and rainfall. These seven parameters are considered to recommend the crop name. The deciding parameters are the X input parameters and the recommended crop is the Y output parameter, as shown in Table I.

TABLE I: DATASET MODEL
  X (input parameters): N, P, K, Temperature, Humidity, pH, Rainfall
  Y (output parameter): Crop

The dataset has 1547 records for training and 618 records for testing, where the crop name is categorical data and the other fields are numerical data. A snapshot of the dataset is shown in Fig. 1; the training dataset and testing dataset are shown in Fig. 2 and Fig. 3.

[FIG. 1: SAMPLE DATASET]

[FIG. 2: TRAINING DATASET]

[FIG. 3: TESTING DATASET]

The dataset considered here is a complete dataset that contains almost no blank fields, so neither one-hot encoding nor pre-processing to fill or remove blank fields is required. As the dataset contains only one categorical field, we have chosen to use random forest, which may perform better. The dataset covers 22 different crop types; a snapshot of the unique crop types is shown in Fig. 4. The 22 unique crops in our project are:

'rice', 'maize', 'chickpea', 'kidney beans', 'pigeon peas', 'moth beans', 'mung bean', 'blackgram', 'lentil', 'pomegranate', 'banana', 'mango', 'grapes', 'watermelon', 'muskmelon', 'apple', 'orange', 'papaya', 'coconut', 'cotton', 'jute', 'coffee'

[FIG. 4: CROPS]
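To make the Table I split concrete, here is a minimal sketch of loading such a dataset with pandas and separating the seven X input parameters from the Y output parameter. The file name crop_data.csv and the column names are illustrative assumptions, not taken from the paper.

```python
import pandas as pd

# Hypothetical file and column names; the paper does not specify them.
df = pd.read_csv("crop_data.csv")

feature_columns = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]
X = df[feature_columns]   # seven numerical input parameters (Table I, X)
y = df["label"]           # categorical crop name (Table I, Y), e.g. 'rice'

print(X.shape)            # expect (n_records, 7)
print(y.nunique())        # expect 22 unique crop types
```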
EXISTING TECHNOLOGIES

In this project we are interested in predicting the crop that is suitable to grow under given conditions of nitrogen, phosphorus and potassium in the soil, the atmospheric temperature and humidity around the crop, the rainfall in the crop area, and the pH value of the soil. pH is one of the important factors that directly affects crop growth; nowadays the pH value of the land rises because of the pesticides and fertilizers we use on our crops to kill pests.

Here we use the K-Nearest Neighbour and Random Forest classifiers to predict the crop that is suitable for the field, and we further make a quantitative analysis of their accuracy. To predict the crop we use seven parameters, namely N (Nitrogen), P (Phosphorus), K (Potassium), Temperature, Humidity, pH and Rainfall, and predict the output "Crop".

Proposed Models

K-Nearest Neighbour

KNN stands for K-Nearest Neighbour. It is a supervised learning algorithm used for classification based on how a sample's neighbours are classified: it stores all previous cases and classifies new ones according to how similar they are to those previous cases. K signifies the number of neighbours we take when comparing distances. We find the distance between new and previous cases using the Minkowski distance,

$D(X_1, X_2) = \left( \sum_{i=1}^{n} |x_{1,i} - x_{2,i}|^{p} \right)^{1/p}$    (1)

(MINKOWSKI DISTANCE FORMULA)

where X_nearest = (x1, y1) gives the coordinates of the neighbour node and X_current = (x2, y2) gives the coordinates of the new node. Here we take the value of p as 2 in the Minkowski distance, which reduces it to the Euclidean distance (the square root of the sum of squared coordinate differences). By calculating the distances to all k nearest points we can reach a decision based on those coordinates.

Consider Table I above, where there are seven X variable columns and one Y variable column. The X variable columns are all numeric, while the Y variable column is categorical. Before applying the formula we have to convert the numerical values into standardised numerical values, so that features on large scales, and their outliers, do not dominate the distance. Our dataset after standardising the input values is shown in Fig. 5.

[FIG. 5: STANDARDIZED DATASET, after applying the standard scaler (dataset used for training the model)]

Outliers are responsible for sudden changes in the output; they lead to wrong outputs and prevent a properly generalised conclusion. We therefore need to apply standard scaling to the dataset, so that we can apply the formula to the dataset and get results efficiently.
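As a sketch of the standardisation step (Fig. 5), the snippet below applies scikit-learn's StandardScaler to the seven numeric columns; X is the feature table from the earlier sketch, and the choice of scaler settings is an assumption rather than the authors' exact configuration.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()            # per column: z = (x - mean) / std
X_scaled = scaler.fit_transform(X)   # X: the 7-column feature table from above

# After scaling, each column has roughly zero mean and unit variance, so a
# large-valued feature such as rainfall no longer dominates the Minkowski
# distance used by KNN.
```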
RANDOM FOREST CLASSIFIERS

Random forest is an ensemble classifier built from many decision tree models, and it can be used for both classification and regression. The key difference between a decision tree and a random forest is that the random forest provides a better generalized result than the decision tree.

The process is as follows: let the number of training cases be N and the number of variables in the classifier be M. A number m of input variables is used to determine the decision at a node of the tree, where m should be less than M. We must ensure m < M because we have to create random trees: if every tree used all M variables, there would be no way to obtain a diverse set of random trees in the random forest. Using all these trees, we take the cases not among the N sampled training cases, use them as test cases, and estimate the error of the tree by predicting their classes.

For each node of the tree, we choose m random variables on which to base the decision at that node, and we calculate the best split based on these m variables in the training set.

[FIG. 6: RANDOM FOREST CLASSIFIER]

Here we obtain our decision trees by randomly selecting m variables from the input variables, getting a unique tree from each set of input variables. One task remains, called pruning. Pruning is necessary to avoid overfitting: to predict the best split we calculate which split has the minimum prediction error, and based on that we prune the variables that have less impact on the prediction.

[FIG. 7: Process to select the best split]

In our project the training data subset is selected from the following columns: (N, P, K, Humidity, Rainfall, Temperature, pH). The algorithm checks which columns correlate efficiently with the output variable and produce the minimum error, and utilises that split; if the error is higher than that of another split, it neglects the current subset. After the pruning process, we get pruned decision trees, all of which are grouped together to become a model.

When we pass any test case into it, we get various outputs from the different decision trees belonging to the random forest classifier. If the output is categorical, the majority of all the decision trees' decisions is declared as the answer; if the output is numerical, we take the mean of the decisions as the final approximate decision.

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase. The working process can be explained in the following steps and diagram:

Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 & 2.
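A minimal sketch of the four steps above, under the assumption that X and y are NumPy arrays with integer-encoded class labels and that scikit-learn decision trees stand in for the per-subset trees; subset sizes and parameters are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=100, seed=0):
    """Steps 1-4: repeatedly sample random data points and fit a tree on each subset."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):                      # Step 3: number of trees to build
        idx = rng.integers(0, len(X), len(X))     # Step 1: random data points (with replacement)
        tree = DecisionTreeClassifier(max_features="sqrt")  # m < M variables per split
        tree.fit(X[idx], y[idx])                  # Step 2: tree on the selected subset
        trees.append(tree)                        # Step 4: repeat steps 1 & 2
    return trees

def forest_predict(trees, X_new):
    # Majority vote across trees, assuming integer-encoded class labels.
    votes = np.stack([t.predict(X_new) for t in trees])  # shape: (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```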

MATHEMATICAL MODELS

KNN CLASSIFICATION

In our project we classify a categorical output based on numerical inputs, where all the input variables are passed through standard scaling to limit the influence of outliers. We have 7 input variables and 1 output variable, and we use these variables to calculate the distance between any two inputs using the Minkowski distance of equation (1):

$D(X_1, X_2) = \left( \sum_{i=1}^{n} |x_{1,i} - x_{2,i}|^{p} \right)^{1/p}$

(MINKOWSKI DISTANCE)

Here p is taken as 2, so the nearest neighbour is identified and its position returned. For finding the distance between two points, consider the current point and the nearest point as follows (with p taken as 2). For explanation, let us assume the K value is 1, which means we compare with only 1 nearest neighbour.

[FIG. 8: EXAMPLE FOR KNN]

If we take the K value as 1, the method becomes the Nearest Neighbour algorithm. Applying the KNN Minkowski distance formula to the above dataset gives the following:

[FIG. 9: DISTANCE FORMULAE]

Current point -> Point 1 = 240.041317001
Current point -> Point 2 = 912.944739105
Current point -> Point 3 = 1607.38942287

From the above calculation we can see that the distance between the current point and Point 1 is smaller than the distances to Point 2 and Point 3. Since we are checking KNN with K as 1, we can finalize Point 1 as the nearest neighbour, and hence the current point belongs to the class of Point 1, that is, the class "rice".

In this example we took the K value as 1 and performed the algorithm; we can also use other values of K. For example, if we take the K value as 5, then we must find the 5 nearest points to the current point and take the majority class among them. The K value is recommended to be an odd value.

The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
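The nearest-neighbour calculation above can be reproduced in a few lines. This is a sketch only: since Fig. 8 and Fig. 9 are not reproduced here, the three neighbour points below are made-up 7-dimensional values, and the function is a direct implementation of equation (1) with p = 2.

```python
import numpy as np

def minkowski(a, b, p=2):
    """Equation (1): D(a, b) = (sum_i |a_i - b_i|^p)^(1/p); p = 2 is Euclidean."""
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p))

# Made-up points in the order (N, P, K, temperature, humidity, pH, rainfall);
# the actual values behind Fig. 8 / Fig. 9 are not available in the text.
current = [90, 42, 43, 20.8, 82.0, 6.5, 202.9]
points = {
    "Point 1": [85, 58, 41, 21.7, 80.3, 7.0, 226.7],
    "Point 2": [60, 55, 44, 23.0, 82.3, 7.8, 263.9],
    "Point 3": [74, 35, 40, 26.5, 80.2, 6.9, 242.9],
}

for name, pt in points.items():
    print(name, minkowski(current, pt))

# With K = 1 the predicted class is simply the class of the closest point.
nearest = min(points, key=lambda k: minkowski(current, points[k]))
print("nearest neighbour:", nearest)
```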

RANDOM FOREST

Random forest is a supervised machine learning technique that constructs multiple decision trees. In a random forest, the main algorithm is similar to the decision tree technique. Decision trees suffer from low bias and high variance; the random forest has the flexibility to convert the high variance we face in a decision tree into low variance, because randomisation is present: we do not use the whole dataset while training each decision tree.

We randomly select rows of the dataset, which gives a bagged dataset. As for variable selection, we do not select all the input variables at a time for training; we select a subset of the input variables as the inputs of a tree and generate the respective output. This process of input variable selection is the same at every level, so we obtain different trees with different structures that provide different classes as output.

The speciality of random forest is that it creates multiple decision trees internally. When new instances appear, we consider all the outputs that the decision trees provide: if the outputs are categorical we take the maximum vote among those outputs, and if the outputs are numerical we take the mean of all the outputs, or any other metric that provides a numerical output.

In the bagged table we can see many rows and many columns, with the final label column as the output variable. The algorithm may consider a subset of rows of the table, e.g. rows (0, 1, 3, .....), and for creating the nodes of a tree the random forest finds which column of the subset is best suited for good decision making. For example, we can consider only (N, P, K, Rainfall) as the subset of columns used to make decision nodes. The above process is called the bagging technique.

[FIG. 10: Table obtained after the bagging technique]

Later we use this table to create one decision tree among the many decision trees that the random forest creates; in the same way, the random forest creates many trees. The final decision is made based on the outcome of the majority of the decision trees.

IV. SIMULATION AND ANALYSIS

The dataset is manually separated into a 70% training and 30% testing dataset. The dataset contains numerical and categorical attributes. The pre-processing techniques for this dataset are the Standard Scaler, used on the numerical attributes for normalizing values and maintaining equality of scale, and the Label Encoder, used on the categorical attribute to convert labels into a numeric, machine-readable form.
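A sketch of this simulation setup, reusing X and y from the earlier sketches; the 70/30 split and the two pre-processing steps follow the text, while random_state and other options are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Label Encoder: categorical crop names ('rice', 'maize', ...) -> integers.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# 70% training / 30% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.30, random_state=42)

# Standard Scaler: fit on the training split only, then apply to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```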
The dataset considered here is complete, containing almost no blank fields (see Fig. 5). It has 8 columns, of which 7 are numerical (the input variables) and 1 is categorical, which is the output of the model we create. As it is a categorical variable, we need to encode it numerically for training, since supervised models cannot train on categorical values directly. The simulation setup and models on this dataset are as follows.

K-Nearest-Neighbour

KNN stands for K-Nearest Neighbour. In this supervised learning algorithm, used for classification, we classify based on how the neighbours are classified: the model stores all its previous cases and classifies new ones according to how similar they are to the previous cases. Here, K signifies the number of neighbours we take when comparing distances. We find the distance between new and previous cases based on equation (1); the distance D using the Minkowski equation is

$D(X_1, X_2) = \left( \sum_{i=1}^{n} |x_{1,i} - x_{2,i}|^{p} \right)^{1/p}$

(MINKOWSKI DISTANCE FORMULA)

Here the value of p is taken as 2 for the Minkowski distance. Further, X1 represents the coordinates of the neighbour nodes and X2 represents the coordinates of the new node. By calculating all the k-nearest point distances we can reach a decision based on those coordinates. Consider Fig. 1 above, where there are seven X variable columns and one Y variable column. The X columns contain no categorical variables; everything is numerical and is further scaled using the Standard Scaler, while Y is a categorical variable. So, before applying the formula, we do not need to convert any of the X variables, but we do need to encode the dependent variable Y, as it is categorical; this makes it easier both to present the mathematical model and to train the model. In this model we have used a label encoder to convert the categorical variable into a numerical one, which is then used to train the model.

The categorical column is transformed into a numerical column by applying label encoding, as shown in Fig. 11 and Fig. 12.

[FIG. 11: Output variable Y before applying label encoding, for the training dataset]

[FIG. 12: Output variable Y after applying label encoding, for the training dataset]

A confusion matrix is a summary of the prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by class; this is the key to the confusion matrix.
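Continuing the same sketch, the KNN simulation and its confusion matrix could look as follows; n_neighbors=5 matches the value the paper settles on, and the Minkowski metric with p=2 mirrors equation (1). The rest is assumed boilerplate, not the authors' exact code.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Minkowski distance with p=2 (Euclidean) is also scikit-learn's default metric.
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
print(accuracy_score(y_test, y_pred))

# Correct and incorrect predictions counted per class (a 22 x 22 matrix here).
print(confusion_matrix(y_test, y_pred))
```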
[FIG. 12: CONFUSION MATRIX FOR KNN]

[FIG. 13: K VALUE AND CHANGE IN ITS ACCURACY FOR OUR MODEL]

  K value    Accuracy
  1          0.9715586178097922
  3          0.9728531161916694
  5          0.9696147823363608
  7          0.9605658210669171
  9          0.9560392525315795
  11         0.9540995928593798

Even though the accuracies for K values 1 and 3 are higher than for K value 5, we choose the K value as 5 to prevent overfitting.

RANDOM FOREST CLASSIFIER

The Random Forest classifier is simply a bagging technique: a classification algorithm in which we come to a decision based on several trees. As our model classifies the crop based on user input, we can use the random forest classifier. This algorithm builds each tree so as to design uncorrelated forests, which predict accurate answers by using multiple decision trees. A decision tree generates only one scenario of a tree, whereas a random forest is a group of trees that checks many possible uncorrelated trees and generates an accurate aggregated answer. The random forest algorithm produces good predictions for both classification and regression tasks, generates better accuracy than a decision tree, and helps prevent overfitting.

As the result is a maximum-vote result, there is a high probability of getting a correct answer. The testing performance is not adversely affected as the number of trees increases, and random forest also produces lower bias.

A Random Forest has multiple decision trees generated from the original dataset by multiple sampling techniques (row sampling + feature sampling), so we find multiple correlations within the dataset and get an accurate answer. Row sampling and feature sampling happen with replacement of the records: rows that appear in Decision Tree 1 can also appear in Decision Tree 2, but not every row matches, which is why it is called a sampling technique. In general, decision trees have low bias and high variance, meaning a low training error but a high testing error; but when we combine all the decision trees and use the majority vote technique, the high variance is converted into low variance. The hyper-parameter denotes the number of decision trees that we use in our Random Forest machine learning model.

[FIG. 14: CONFUSION MATRIX FOR RANDOM FOREST]
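The K-versus-accuracy table and the Random Forest run could be reproduced along these lines, using the training and testing splits from the sketches above; n_estimators (the number-of-trees hyper-parameter) is set to an assumed value of 100, as the paper does not state it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Sweep odd K values, as in the K value / accuracy table (Fig. 13).
for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, accuracy_score(y_test, knn.predict(X_test)))

# Random Forest; n_estimators controls how many decision trees are combined.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("random forest:", accuracy_score(y_test, rf.predict(X_test)))
```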
ANALYSIS

This paper is related to the classification of crops; Random Forest and KNN are mostly used for classification problems. The dataset contains temperature, humidity, pH level, rainfall, nitrogen, phosphorus and potassium.

[FIG. 15: RESULTS OBTAINED IN OUR SIMULATION OF RANDOM FOREST]

TABLE II: ACCURACY ANALYSIS
  Model                       Accuracy in %
  K-Neighbours Classifier     96.96147823363608
  Random Forest Classifier    98.05825242718447

KNN stores instances; if any new instance matches these instances, it is a neighbour. Random Forest averages multiple instances, and majority voting gives the decision for the output, whereas a decision tree makes only a single decision. Logistic Regression, by contrast, is used for predicting a categorical dependent variable using a given set of independent variables.

When the value of K changes, the output also varies; hence the accuracy is unstable. This model is good when we have an abundant amount of data to track. Coming to the Random Forest, the main benefit is its sampling techniques: it finds the correlations within the dataset and models multiple decision trees based on sampled subsets of the data. Sampling includes both row and feature sampling, so there is a high chance of capturing different correlations within the dataset, and hence we get different outputs from different decision trees based on the correlations of the respective trees.

Our model uses the cross-validation technique, a resampling procedure used to evaluate machine learning models and assess how a model will perform on an independent test dataset; we used the KNN classifier with n_neighbours as 5, applied the classifier to each fold, and finally found the accuracy.

Based on those results, by using the majority vote technique we get a maximally accurate correlated result. The accuracy is very high because we develop multiple decision trees and in turn take the majority answer from them. So Random Forest is much preferable, but it is computationally heavier than KNN. Random forests have low bias and low variance.

The decision tree model gives high importance to a particular set of features, but the random forest chooses features randomly during the training process; therefore, it does not depend highly on any specific set of features. This is a special characteristic of random forest over bagged trees.

Here, we have trained our dataset with the KNN and Random Forest classifiers. It is observed that the Random Forest classifier shows better accuracy when it is simulated with our dataset. KNN could not work as well as Random Forest: both depend on the majority vote technique, but KNN mainly depends on the distance between the already existing data and the new data, and mostly on the classes of the neighbours.

This system may help farmers decide on the crop to be chosen for the upcoming season so that its product outcome will not degrade and lead to loss. Using this system, the profit of the agricultural sector can be improved by maximizing the crop harvest, which will also raise interest among youngsters in technology-based farming. In the future, a hybrid model can be designed with a dataset having a large number of attributes, which will make a stronger and more robust model.
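The cross-validation procedure mentioned in the analysis might be sketched as follows, assuming the scaled features and encoded labels from the earlier sketches; the 5-fold setting is an assumption, while n_neighbours = 5 follows the text.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Resampling-based estimate of performance on independent test data.
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X_scaled, y_encoded, cv=5)
print(scores.mean(), scores.std())
```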
V. CONCLUSION

In this paper, we have shown that we can suggest the crop that should be cultivated in a particular region with the help of its soil quality, N-P-K values, humidity and expected rainfall. The accuracy of the prediction may improve if we have properly tested soil with additional features. We have used the K Nearest Neighbours Classifier and the Random Forest Classifier for the crop recommendation.

REFERENCES

[1] T. van Klompenburg, A. Kassahun, and C. Catal, "Crop yield prediction using machine learning: A systematic literature review," Computers and Electronics in Agriculture, vol. 177, p. 105709, 2020.
[2] A. Chlingaryan, S. Sukkarieh, and B. Whelan, "Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review," Computers and Electronics in Agriculture, vol. 151, pp. 61–69, 2018.
[3] R. Kumar, M. Singh, P. Kumar, and J. Singh, "Crop selection method to maximize crop yield rate using machine learning technique," in 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 138–145, IEEE, 2015.
[4] P. Priya, U. Muthaiah, and M. Balamurugan, "Predicting yield of the crop using machine learning algorithm," International Journal of Engineering Sciences & Research Technology, vol. 7, no. 1, pp. 1–7, 2018.
[5] A. Patil, S. Kokate, P. Patil, V. Panpatil, and R. Sapkal, "Crop prediction using machine learning algorithms," International Journal of Advancements in Engineering & Technology, vol. 1, no. 1, pp. 1–8, 2020.
[6] T. Islam, T. A. Chisty, and A. Chakrabarty, "A deep neural network approach for crop selection and yield prediction in Bangladesh," in 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pp. 1–6, IEEE, 2018.
[7] S. A. Z. Rahman, K. C. Mitra, and S. M. Islam, "Soil classification using machine learning methods and crop suggestion based on soil series," in 2018 21st International Conference of Computer and Information Technology (ICCIT), pp. 1–4, IEEE, 2018.
[8] V. Singh, A. Sarwar, and V. Sharma, "Analysis of soil and prediction of crop yield (rice) using machine learning approach," International Journal of Advanced Research in Computer Science, vol. 8, no. 5, 2017.
[9] Z. Doshi, S. Nadkarni, R. Agrawal, and N. Shah, "AgroConsultant: Intelligent crop recommendation system using machine learning algorithms," in 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 1–6, IEEE, 2018.