Crop Recommendation System Using KNN and Random Forest Considering Indian Dataset
Fig. 4 CROPS
RANDOM FOREST
Random forest is a supervised machine learning technique that constructs multiple decision trees; its core algorithm is similar to the decision tree technique. A single decision tree suffers from low bias and high variance. Random forest has the flexibility to convert the high variance we face in a decision tree into low variance, because randomisation is present: we do not use the whole dataset while training each decision tree.

FIG 10: Table obtained after the bagging technique

We later use such a table to build one decision tree among the many that random forest creates, and the same process is repeated to create many trees. The final decision is made based on the outcome of the majority of the decision trees.
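The majority-vote rule above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation; the tree outputs shown are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the largest number of decision trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five decision trees for one new instance
tree_outputs = ["rice", "maize", "rice", "rice", "cotton"]
print(majority_vote(tree_outputs))  # -> rice
```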
We randomly select rows of the dataset with replacement; the result is called a bagged (bootstrap) dataset. Coming to variable selection, we do not use all of the input variables at a time for training: we select a subset of the input variables as inputs to the tree and generate the respective output. This process of selecting input variables is repeated in the same way at every level. Hence we obtain different trees of different structures, which may provide different classes as the output.
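The row bagging and input-variable subsetting described above can be sketched as follows. This is a toy illustration under assumed column names (N, P, K, Rainfall appear in the paper; the others are typical of crop datasets), not the authors' code:

```python
import random

random.seed(0)
columns = ["N", "P", "K", "Temperature", "Humidity", "pH", "Rainfall"]
rows = list(range(10))  # indices of a toy 10-row dataset

# Bagging: sample row indices with replacement to form one bootstrap dataset
bag = [random.choice(rows) for _ in rows]

# Variable subsetting: pick a random subset of columns for this tree
subset = random.sample(columns, k=4)

print(sorted(set(bag)))  # distinct rows this tree will train on
print(subset)            # e.g. a subset such as ['N', 'P', 'K', 'Rainfall']
```

Because both choices are random, repeating this process yields trees of different structures, which is what gives the forest its variance reduction.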
The speciality of random forest is that it creates multiple decision trees internally. When a new instance appears, we consider all the outputs that the decision trees provide: if the outputs are categorical, we take the maximum vote among them; if the outputs are numerical, we take the mean of all the outputs (or any other metric that produces a numerical value).

In the table we can see many rows and many columns, and the final label column is the output variable of the table. The algorithm may consider a subset of rows of the table, i.e. rows (0, 1, 3, .....), and for creating the nodes of the tree, random forest finds which column of the subset is best suited for good decision making. For example, we may consider only (N, P, K, Rainfall) as the subset of columns used to make decision nodes. The above process is called the bagging technique.

IV. SIMULATION AND ANALYSIS

The dataset was manually separated into a 70% training and a 30% testing dataset. The dataset contains numerical and categorical attributes. The pre-processing techniques for this dataset are StandardScaler, used on the numerical attributes to normalise values and keep them on an equal scale, and LabelEncoder, used on the categorical attributes to convert labels into a numeric, machine-readable form.
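The pre-processing and split described above can be sketched as follows. This is a minimal pure-Python illustration of what StandardScaler and LabelEncoder compute (in practice one would use scikit-learn's classes of those names); the toy values are hypothetical:

```python
import statistics

# Toy numerical attribute (e.g. rainfall) and categorical labels (crops)
rainfall = [200.0, 120.0, 80.0, 260.0, 150.0, 90.0]
crops = ["rice", "maize", "rice", "cotton", "maize", "cotton"]

# Standard scaling: (x - mean) / std, so attributes share an equal scale
mean = statistics.mean(rainfall)
std = statistics.pstdev(rainfall)
scaled = [(x - mean) / std for x in rainfall]

# Label encoding: map each distinct crop label to an integer
classes = sorted(set(crops))                 # ['cotton', 'maize', 'rice']
encoded = [classes.index(c) for c in crops]  # [2, 1, 2, 0, 1, 0]

# 70% training / 30% testing split by index
split = int(0.7 * len(rainfall))
X_train, X_test = scaled[:split], scaled[split:]
y_train, y_test = encoded[:split], encoded[split:]
print(len(X_train), len(X_test))  # -> 4 2
```

After scaling, the attribute has zero mean and unit variance, so no single attribute dominates distance-based learners such as KNN.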