Prediction of Crops Based On Soil Type Using Machine Learning

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI-590018

Project Report on

(18CSP82)

“Prediction Of Crops Based on Soil Type Using Machine Learning”


Submitted in the partial fulfilment of the requirement for the award of the degree in

Bachelor of Engineering
In
Information Science & Engineering
Submitted by

Aishwarya Das Prakash 1NC18IS002
Ankitha H R 1NC18IS006
Prathiksha S V 1NC18IS036
Madhusudhan Reddy 1NC17IS011

Under the Guidance of


Shruti Jalapur
Assistant Professor ISE, NCET

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING

NAGARJUNA COLLEGE OF ENGINEERING AND TECHNOLOGY


(An Autonomous Institution under VTU, Belagavi-590018) VENKATAGIRIKOTE,
DEVANAHALLI, BENGALURU - 562164

2021-2022
NAGARJUNA COLLEGE OF ENGINEERING AND TECHNOLOGY
(An Autonomous Institution under VTU, Accredited by NAAC with “A+” Grade)
Bengaluru-562164, Karnataka, India

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

CERTIFICATE
Certified that the project work entitled “Prediction Of Crops Based on Soil Type Using Machine
Learning” carried out by Ms. Aishwarya Das Prakash (1NC18IS002), Ms. Ankitha H R
(1NC18IS006), Ms. Prathiksha S V (1NC18IS036) and Ms. Madhusudhan Reddy (1NC17IS011),
bonafide students of Nagarjuna College of Engineering and Technology, an autonomous institution
under Visvesvaraya Technological University, Belagavi, is in partial fulfillment for the award of
Bachelor in Engineering in Information Science & Engineering during the academic year
2021-2022. It is certified that all corrections/suggestions indicated for internal assessment have been
incorporated in the report deposited in the departmental library. The project work has been approved, as
it satisfies the academic requirement in respect of project work prescribed for the said degree.

Signature of the Guide Signature of the HOD Signature of the Principal

Ms. Shruti Jalapur Dr. Anil Kannur Dr. BV Ravishankar

Assistant Professor Head of the Department Principal, NCET


Dept. of ISE Dept. of ISE

External Viva-Voce

Name of the Examiner Signature with date

1.......................................... .............................................

2. ......................................... .............................................
ACKNOWLEDGEMENT

Every project begins with an idea and materializes with concrete efforts. In the beginning, we would like
to thank the almighty God and our parents who gave us the strength and capability to work on this
project and complete it successfully.

We are extremely grateful to our project guide Ms. Shruti Jalapur, Assistant Professor, Department of
Information Science & Engineering, for her guidance and encouragement.
It is indeed gratifying to have the privilege to express our sense of gratitude to our project coordinator
Ms. Shruti Jalapur, Assistant Professor, Department of Information Science and Engineering, NCET,
for her scholarly guidance during the course of investigation.
We extend our sincere gratitude to Dr. Anil Kannur, Professor & Head of the Department, Information
Science & Engineering, NCET, for his consistent assistance and guidance during the course of the
project work.
We also express our gratitude to Dr. B V Ravishankar, Principal, Nagarjuna College of Engineering
and Technology, for his help and support.
Finally, we express our immense pleasure and thanks to all teaching and non-teaching staff of
the Department of Information Science & Engineering, NCET, for their co-operation and support.

Aishwarya Das Prakash (1NC18IS002)

Ankitha H R (1NC18IS006)

Prathiksha SV (1NC18IS036)

Madhusudhan Reddy (1NC17IS011)

TABLE OF CONTENTS

1. ACKNOWLEDGEMENT I

2. TABLE OF CONTENTS II-III

3. LIST OF FIGURES IV

4. LIST OF TABLES V

5. ABSTRACT VI

CHAPTER NO. CHAPTER TITLE PAGE NO.

PAPER WITH CERTIFICATE 1-5

CHAPTER 1 INTRODUCTION 6

CHAPTER 2 IMPLEMENTATION 7-12

SYSTEM ARCHITECTURE AND DESIGN 7-8
FLOWCHART DIAGRAM 8-9
SEQUENCE DIAGRAM 9-10
DATAFLOW DIAGRAM 10-12

CHAPTER 3 ALGORITHMS USED 13-19

K-NEAREST NEIGHBOR (KNN) 13-15
RANDOM FOREST 15-17
SUPPORT VECTOR MACHINE 17-19

CHAPTER 4 MODEL OF THE PROJECT 20-21

CHAPTER 5 SYSTEM TESTING 22-30

5.1 TEST ENVIRONMENT 22-23
5.2 TEST CASE 23-24
5.3 TESTING IN MACHINE LEARNING 24-25
5.4 SYSTEM TESTING 26
5.5 UNIT TESTING 26-30

CHAPTER 6 RESULTS 31-33

CHAPTER 7 CONCLUSION AND SCOPE FOR FUTURE WORK 34

REFERENCES 35

LIST OF FIGURES

FIGURE NO. CAPTION PAGE NO.

Figure 2.1 SYSTEM ARCHITECTURE 8
Figure 2.2 FLOW CHART 9
Figure 2.3 SEQUENCE DIAGRAM 10
Figure 2.4 LEVEL 0 DATA FLOW DIAGRAM 11
Figure 2.5 LEVEL 1 DATA FLOW DIAGRAM 12
Figure 3.1 K-NEAREST NEIGHBOR 14
Figure 3.2 KNN CLASSIFICATION 14
Figure 3.3 KNN EUCLIDEAN DISTANCE 15
Figure 3.4 SIMPLIFIED RANDOM FOREST 16
Figure 3.5 VISUALIZATION OF A RANDOM FOREST MODEL MAKING A PREDICTION 16
Figure 3.6 SUPPORT VECTOR MACHINE 18
Figure 3.7 FLOW DIAGRAM OF SVM 19
Figure 5.1 FILTRATION OF DATASETS 25
Figure 6.1 HOME PAGE WITH SIGN IN AND SIGN UP 31
Figure 6.2 ADMIN PAGE WITH STATISTICS OF DATASETS 31
Figure 6.3 INPUTTING ATTRIBUTES FOR CROP PREDICTION 32
Figure 6.4 OUTPUT FOR THE CROP PREDICTION ON INPUTTING THE ATTRIBUTE VALUES 32
Figure 6.5 INPUTTING ATTRIBUTE VALUES FOR FERTILIZER ESTIMATION 33
Figure 6.6 DATA VISUALIZATION ON ACCURACY OF VARIOUS ALGORITHMS 33

LIST OF TABLES

TABLE NO. CAPTION PAGE NO.

Table 5.1 TEST CASE FOR USER LOGIN 27
Table 5.2 TEST CASE FOR USER LOGIN IF NOT REGISTERED 27
Table 5.3 TEST CASE FOR REGISTRATION 28
Table 5.4 TEST CASE FOR DATA PRE-PROCESSING 28
Table 5.5 TEST CASE FOR MODEL CREATION 29
Table 5.6 TEST CASE FOR DATA VISUALIZATION 29
Table 5.7 CROP PREDICTION 30
Table 5.8 TEST CASE FERTILIZER ESTIMATION 30

ABSTRACT

The main goal of our project is to create a one-stop solution to various problems in the
domain of agriculture. Nowadays Machine Learning is becoming more popular, as it is the
technique of teaching machines to make decisions from the data provided. This stream of
computer science helps greatly in achieving our goal, i.e. predicting the suitable crop based on
the condition of the cultivation land and estimating the quantity of fertilizer to be used based
on the weather conditions and the farming practices followed. This project replaces the manual
and inaccurate approach practiced by farmers and helps them make the right decisions for a
better yield. In this application, crops are paired with their suitable soil type by
considering the soil's fertility, NPK content, pH value and other necessary nutrient content.

Prediction of crops based on soil type

CHAPTER 1

INTRODUCTION

In India, farming is largely done by traditional methods: farmers plant crops without knowing
the nutrient content and quality of their soil. As a result, farmers do not gain sufficient
profit from their farming. The existing method of soil testing is manual: soil samples are
collected and sent to laboratories for testing. This manual process is time consuming and
often not feasible. Human intervention also introduces the chance of human error, so
farmers may receive incorrect reports. There is therefore a need for an automated process for
soil testing and crop prediction. Soil testing is important because it helps determine the
fertility of the soil, on which crop prediction can be based. We therefore propose a system
with a handheld device that gives the pH value, and we estimate Nitrogen (N), Phosphorus (P)
and Potassium (K) from the pH, temperature, moisture and electrical conductivity of that soil.

India is an agriculture-based country in which 50% of the workforce is involved in agricultural
activities. India accounts for 7.68% of total global agricultural output. The contribution of
the agriculture sector to the Indian economy is much higher than the world average (6.1%).
However, traditional farms in India still have some of the lowest per capita productivity and
farmer incomes. The sector also requires a lot of human effort for tasks such as watering crops,
cultivating crops and spraying pesticides. Soil analysis is an important methodology because it
reports the nutrients present in the soil, such as NPK, along with temperature, moisture and
electrical conductivity values. In automated soil testing, human effort is reduced by
monitoring the quality of the soil using sensors.

Dept of ISE, NCET 6 2021-2022



CHAPTER 2

LITERATURE SURVEY

Paper 1: “Crop selection method to maximize crop yield rate using machine learning
technique”

The paper presents a Crop Selection Method that aims to solve the crop selection problem and
enhance the net yield of the harvest. The authors propose a strategy that suggests a range of
crops to be chosen over a season, taking into account essential factors such as climate, water
density and crop category. The precision of the Crop Selection Method depends on the estimated
values of the most influential factors. The technique considered in the paper is crop
sequencing. Crops are categorized into four divisions, namely seasonal, whole-year, short-time
plantation and long-time plantation. A group of crops from each category is selected in
sequence for cultivation. Hence there is a need for a prediction technique with improved
precision and performance.

Paper 2: “Improving crop productivity through a crop recommendation system using ensembling
technique”

In this paper, a crop recommendation system is designed that considers a soil dataset for four
crops: Rice, Cotton, Sugarcane and Wheat. The soil dataset is first pre-processed, and an
ensembling technique then classifies the four crops. The individual base learners used in the
ensemble model are Random Forest, Naive Bayes and Linear SVM. Majority Voting is used as the
combination method to provide the best accuracy. Hence, the proposed work helps the farmer
select a crop for cultivation accurately.

Paper 3: “AgroConsultant: Intelligent Crop Recommendation System Using Machine Learning
Algorithms”

In this paper, the authors propose and implement an intelligent crop recommendation system that
can be easily used by farmers all over India. The system assists farmers in making an informed
decision about which crop to grow depending on a variety of environmental and geographical
factors. They also implement a secondary system, called Rainfall Predictor, which predicts the
rainfall of the next 12 months. The high accuracies provided by both these models make them very
efficient for practical and real-time purposes. Furthermore, crop demand and supply, as well as
other economic indicators such as farm harvest prices and retail prices, can also be considered
as parameters to the Crop Suitability Predictor model. This would provide a holistic prediction
based not only on environmental and geographical factors but also on economic aspects.

Paper 4: “Soil Classification using Machine Learning Methods and Crop Suggestion Based on
Soil Series”

A model is proposed for predicting soil series and suggesting suitable crops for that specific
soil. The research was done on soil datasets of six upazillas of the Khulna region. The model
was tested by applying different kinds of machine learning algorithms. Bagged tree and K-NN
show good accuracy, but among all the classifiers SVM gives the highest accuracy in soil
classification. The proposed model is justified by a properly made dataset and machine learning
algorithms. The soil classification accuracy and the recommendation of crops for a specific
soil are more appropriate than in many existing methods. As future work, the authors plan to
provide fertilizer recommendations and to add data from other districts to make the model more
reliable and accurate.

Paper 5: “Detection of N, P, K Fertilizers in Agricultural Soil with NIR Laser Absorption
Technique”

The paper discusses the detection of nitrogen, phosphorus and potassium (N, P, K) contents by
a photon absorption technique. Photon absorption is a simple and non-destructive analytical
method that can be used to quantify several soil properties simultaneously.


Paper 6: “Machine learning and statistical approaches used in estimating parameters that affect
soil fertility status: a survey”

In this paper, a study is made of the different parameters used in the literature to define the
characteristics of the soil and of how they are used as input to machine learning algorithms
for predicting soil fertility. Based on this study, it can be observed that prediction
techniques can be applied efficiently over optimized soil parameters to predict soil fertility
with more accuracy and less human intervention. Many parameters affect fertility; they are
estimated in laboratories using conventional methods, and it is the need of the hour to
revolutionize these estimation procedures using automated methods. The survey takes account of
recent research work on predicting soil fertility parameters using machine learning approaches.
The papers included in the survey aim at mapping factors that directly or indirectly affect
soil fertility, viz. pH, phosphorus content, etc. Soil fertility is measured in terms of the
presence or absence of 8 macronutrients and 9 micronutrients in the soil. Apart from these,
physical properties such as texture and porosity and chemical properties such as pH, Cation
Exchange Capacity and Soil Water Retention Capacity (SWRC) are considered. Four types of
machine learning models are used: ANN (Artificial Neural Network), Support Vector Regression,
Subtractive Clustering Fuzzy Inference Systems (SC-FIS) and Wang and Mendel's Fuzzy Inference
Systems (WM-FIS). The crux of the paper is that it uses FIS, which is a novel approach in
microbial prediction and appears to be the best compared to other approaches that have tried to
model the same.

Paper 7: Potassium, an important element to improve water use efficiency and growth parameter
in quinoa under saline condition.

In this paper, the authors analyse the stress level in plants associated with salinity and K
deficiency, using machine learning algorithms such as KNN and Random Forest. By treating
mineral nutrition as an important key to salt tolerance, they optimize crop productivity under
saline conditions.

K+ is important for reducing possible damage caused by salinity while at the same time
increasing crop productivity.

Paper 8: Heat, wheat and CO2: The relevance of timing and the mode of temperature stress on
biomass and yield.

In this paper, the author predicts temperature and moisture; an IoT model is applied with sensor
techniques. Elevated temperature exerts different effects on wheat performance depending on the
mode and the timing of temperature stress.

A climate chamber approach is used to investigate the interactive effects of CO2 enrichment and
different temperature regimes, applied during different growth stages, on biomass and yield of
wheat.

Paper 9: Thresholds, sensitive stages and genetic variability of finger millet to high
temperature stress

To record the yield and its components, a crop selection method using SVM (support vector
machine), one of the assembling techniques, is applied. Development of finger millet genotypes
with improved tolerance to high temperature (HT) stress can provide greater yield stability and
resilience in current and future climates. Under field conditions, HT stress will differ due to
day-to-day variation in the timing, intensity and duration of stress events.

Based on the stress response and grain yield, tolerant or susceptible genotypes were identified.

Paper 10: A Model for Prediction of Crop Yield

The paper uses association rule mining to predict the yield of the crop. The algorithms used are
the k-Means algorithm (a clustering method) and Apriori association rule mining. The major
disadvantage is that the paper relies on association rule mining for predicting crop yield. The
problem with association rule mining is that it generates too many rules in some cases, which
reduces the accuracy of the prediction. The rules also tend to vary with the dataset, so the
results vary greatly. The proposed system focuses mainly on yield prediction, which plays a very
important role in crop selection because the farmer can select the crop with maximum yield.

The system uses association rule mining to find rules and crops with maximum yield. It focuses
on creating a prediction model that may be used for future prediction of crop yield.


CHAPTER 3
IMPLEMENTATION

Implementation of the project is given below:

System Architecture and Design

System architecture is a conceptual model that defines the structure, behavior and other views
of a system. A system architecture can comprise system components and the sub-systems
developed, which work together to implement the overall system.

We have divided the architecture into 3 different phases:

❖ Internet of Things (IoT) model

In this phase we use different sensors, including NPK, pH, temperature, moisture and electrical
conductivity sensors, which read live data from the soil as part of an IoT device built using
an Arduino. IoT is a system of interrelated computing devices, mechanical and digital machines
provided with unique identifiers and the ability to transfer data over a network without
requiring human-to-human or human-to-computer interaction. Hence the values captured by the
sensors placed in the soil are sent directly to the software for analysis.

❖ Prediction Model

In this phase we use three different machine learning algorithms to predict which crop is
suitable to grow in the soil being tested. The algorithms used are the Naive Bayes classifier,
k-Nearest-Neighbor and Random Forest. Each algorithm yields a set of high-probability matches
(crop varieties), and the crop that occurs the highest number of times across them is selected
as the most suitable.
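
The selection rule described above (pick the crop predicted most often) can be sketched as a simple majority vote. This is only an illustrative sketch: the function name `majority_vote`, the classifier keys and the crop labels are assumptions made for the example, not part of the project code.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often.

    Ties are resolved in favour of the label that appears first,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of the three classifiers for one soil sample
votes = {"naive_bayes": "rice", "knn": "wheat", "random_forest": "rice"}
chosen = majority_vote(list(votes.values()))
print(chosen)
```

With an odd number of classifiers and two candidate crops a tie cannot occur; with more candidate crops, the tie-breaking rule is a design choice that would need to be fixed explicitly.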

❖ Data Analytics

This phase reduces the resulting crop varieties to a single ideal variety that assures a
profitable yield. It works on our datasets and combines the results of each algorithm to
predict a final answer. Additional constraints are also applied at this stage to further narrow
down the possibilities.


Figure 2.1: System architecture

Fig. 2.1 lays out the system architecture for the project. The IoT model consists of a
temperature sensor, a moisture sensor and an electrical conductivity sensor, together with the
NPK and pH values. The three algorithms used for the prediction model are K-NN, Random Forest
and SVM, which are trained on the training dataset to make predictions. Data analytics is a
live review and visualization of the data sent from the Arduino in the IoT model.

Flowchart Diagram

A flowchart is one of the basic quality tools used in project management. It displays the
actions that are necessary to meet the goals of a particular task in the most practical
sequence. Also called a process map, this type of tool displays a series of steps with
branching possibilities that take one or more inputs and transform them into outputs.

In this project, data is collected from the IoT model and preprocessed together with the
dataset. The crop model is created by applying three machine learning algorithms for
classification and regression and comparing them on accuracy. A model is created and the
retrieved data is analysed. The crop prediction model predicts the crop to be cultivated in the
tested soil, and the fertilizer prediction model predicts the amount of fertilizer needed by
the soil.

Figure. 2.2: Flow chart

Sequence Diagram

A sequence diagram depicts the interaction between objects in sequential order, i.e. the order
in which these interactions take place. The terms event diagram or event scenario are also used
to refer to a sequence diagram. In this project, the user provides the soil sample to be tested
as input to the sensors. The sensors measure values such as temperature, moisture and
electrical conductivity. Additionally, the NPK and pH values are added manually, and all
attributes with their respective values are passed together to the IoT model. The model is a
collection of data sets required to aid in the prediction of crop variety and fertilizer
amount. It consists of historical data of previously tested soil samples and other research
findings. This data set creates the model, which is processed together with the
sensor-retrieved soil values. Multiple algorithms are applied to increase accuracy, and a
prediction is made of the most suitable crop for the given soil sample along with the
fertilizer requirement for that crop.

Figure. 2.3: Sequence diagram

Dataflow Diagram

A data flow diagram (DFD) is a graphic representation of the "flow" of data through an
information system. A data flow diagram can also be used for the visualization of data
processing (structured design). It is common practice for a designer to first draw a
context-level DFD, which shows the interaction between the system and outside entities. DFDs
show the flow of data from external entities into the system, how the data moves from one
process to another, and its logical storage. There are only four symbols:

1. Squares representing external entities, which are sources and destinations of information
entering and leaving the system.
2. Rounded rectangles representing processes, which take data as input, process it and output
it. In other methodologies these may be called 'Activities', 'Actions', 'Procedures',
'Subsystems' etc.
3. Arrows representing the data flows, which can be either electronic data or physical items.
Data cannot flow from data store to data store except via a process, and external entities are
not allowed to access data stores directly.
4. Flat three-sided rectangles representing data stores, which both receive information for
storing and provide it for further processing.

❖ Level 0 data flow diagram

Figure 2.4: Level 0 data flow diagram

A Level 0 DFD represents the system as a single process with its relationship to external
entities. The entire system is depicted by a single bubble with input and output data indicated
by incoming/outgoing arrows. In the Level 0 data flow of this project, the IoT model supplies
values such as NPK, pH, temperature, moisture and electrical conductivity to the combined crop
selection and fertilizer prediction model, which outputs the crop and fertilizer results for
them.
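
As a rough illustration of the input side of this data flow, the values passed from the IoT model to the prediction model could be grouped into one record. The class name `SoilSample`, the field names and the example values below are hypothetical; the report does not specify units or a data layout.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class SoilSample:
    """One set of readings handed to the prediction model (hypothetical layout)."""
    nitrogen: float
    phosphorus: float
    potassium: float
    ph: float
    temperature: float
    moisture: float
    conductivity: float

    def features(self):
        # Fixed attribute order, so every sample maps to the same feature vector
        return list(astuple(self))

# Illustrative values only, not measurements from the project
sample = SoilSample(90.0, 42.0, 43.0, 6.5, 25.0, 60.0, 1.2)
print(sample.features())
```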

❖ Level 1 data flow diagram

Figure 2.5: Level 1 data flow diagram

In a Level 1 DFD, the context diagram is decomposed into multiple processes that highlight the
main functions of the system and break down the high-level process of the Level 0 DFD into
subprocesses. Fig. 2.5 shows the Level 1 DFD for the project. The user provides the soil sample
as input. The IoT model, which is the collection of data sets, preprocesses the data and splits
the data set to create the model using three algorithms, namely Naive Bayes, KNN and Random
Forest; the crop is predicted based on their accuracy. The same procedure is followed for
fertilizer prediction, using the three algorithms based on their accuracy. The attributes used
for model creation are NPK, pH, moisture, temperature, electrical conductivity and crop.


CHAPTER 4

ALGORITHMS USED
The three algorithms used to accurately make a decision on the type of crop to cultivate are
mentioned below:

K-Nearest Neighbor (KNN)


K-Nearest Neighbor is one of the simplest machine learning algorithms based on the supervised
learning technique. The K-NN algorithm assumes similarity between the new case and the
available cases and puts the new case into the category that is most similar to the available
categories. K-NN stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using K-NN. It can be used for regression as well as classification, but
it is mostly used for classification problems. K-NN is a non-parametric algorithm, which means
it does not make any assumption about the underlying data. It is also called a lazy learner
algorithm because it does not learn from the training set immediately; instead it stores the
dataset and performs the computation at classification time, assigning new data to the category
it is most similar to.

For example, suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will compare the features of the
new image with those of the cat and dog images and, based on the most similar features, put it
in either the cat or the dog category. Similarly, suppose there are two categories, Category A
and Category B, and we have a new data point x1: in which of these categories does the data
point lie? To solve this type of problem we use a K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point.

Consider the below diagram:


Figure. 3.1. K-Nearest Neighbor

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to the available data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
o Step-6: Our model is ready.
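
The steps above can be sketched in a few lines of plain Python. This is a minimal sketch rather than the project's implementation: the function names `euclidean` and `knn_classify` and the toy training points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=5):
    """Classify `query` from `train`, a list of (features, label) pairs."""
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Steps 2-3: sort by distance to the query and keep the K nearest
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Steps 4-5: count labels among the neighbors and pick the most frequent
    counts = Counter(label for _, label in nearest)
    return counts.most_common(1)[0][0]

# Toy two-feature data set with two categories (illustrative values)
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_classify(train, (2.5, 2.0), k=5))
```

In this toy data the five nearest neighbors of the query are three points from category A and two from category B, so the vote goes to A. Because the distance function iterates over all features, the same sketch works for soil records with more than two attributes, although features on very different scales would normally be normalized first.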

Suppose we have a new data point and we need to put it in the required category. Consider the
below image:

Figure. 3.2. KNN Classification

o Firstly, we choose the number of neighbors; here we choose k = 5.
o Next, we calculate the Euclidean distance between the data points.

The Euclidean distance is the straight-line distance between two points, familiar from
geometry. It can be calculated as:

Figure. 3.3 KNN Euclidean Distance

o The Euclidean distance between A(X1, Y1) and B(X2, Y2) is sqrt((X2 - X1)^2 + (Y2 - Y1)^2).

o By calculating the Euclidean distance, we obtain the nearest neighbors: three nearest
neighbors in category A and two nearest neighbors in category B. Since category A has the
majority, the new data point is assigned to category A.

Random Forest

Random forests or random decision forests are an ensemble learning method for classification,
regression and other tasks that operates by constructing a multitude of decision trees at
training time and outputting the class that is the mode of the classes (classification) or the
mean prediction (regression) of the individual trees.

Figure. 3.4 Simplified Random Forest

Random forest, like its name implies, consists of a large number of individual decision trees
that operate as an ensemble. Each individual tree in the random forest outputs a class
prediction, and the class with the most votes becomes our model's prediction (see figure below).

Figure. 3.5 Visualization of a Random Forest Model Making a Prediction

The fundamental concept behind random forest is a simple but powerful one: the wisdom of
crowds. In data science speak, the reason that the random forest model works so well is:

A large number of relatively uncorrelated models (trees) operating as a committee will
outperform any of the individual constituent models. The low correlation between models is the
key. Just as investments with low correlations (like stocks and bonds) come together to form
a portfolio that is greater than the sum of its parts, uncorrelated models can produce ensemble
predictions that are more accurate than any of the individual predictions. The reason for this
effect is that the trees protect each other from their individual errors (as long as they don't
constantly all err in the same direction). While some trees may be wrong, many other trees will
be right, so as a group the trees are able to move in the correct direction.

The prerequisites for random forest to perform well are:

1. There needs to be some actual signal in our features so that models built using those
features do better than random guessing.

2. The predictions (and therefore the errors) made by the individual trees need to have low
correlations with each other.
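
Both prerequisites can be illustrated with a small calculation. The sketch below is not a random forest; it is an idealized model that assumes a two-class problem and trees whose errors are fully independent, each correct with the same probability. The function name `majority_accuracy` is an assumption made for the example.

```python
from math import comb

def majority_accuracy(n_trees, p_correct):
    """Probability that more than half of n independent voters are correct.

    Idealized assumption: binary labels and fully independent errors,
    which real trees trained on the same data only approximate.
    """
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Accuracy of the committee for a growing number of trees, each 60% accurate
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

Under these assumptions the committee improves on a single tree only when each tree is better than random guessing (prerequisite 1); with individual accuracy below 0.5 the same formula shows the vote getting worse as trees are added. Correlated errors (prerequisite 2) would reduce the gain, which this independent model does not capture.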

Support vector machine


“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression challenges. However, it is mostly used in
classification problems. In the SVM algorithm, we plot each data item as a point in
n-dimensional space (where n is the number of features) with the value of each feature
being the value of a particular coordinate. Then we perform classification by finding the
hyperplane that differentiates the two classes very well.

A hyperplane in an n-dimensional Euclidean space is a flat, (n-1)-dimensional subset of that
space that divides the space into two disconnected parts. For example, let us take a line to be
our one-dimensional Euclidean space. If we pick a point on the line, this point divides the
line into two parts. The line has 1 dimension, while the point has 0 dimensions. For two
dimensions, the separating line is the hyperplane. Similarly, for three dimensions, a plane
with two dimensions divides the 3D space into two parts and thus acts as a hyperplane. Thus,
for a space of n dimensions we have a hyperplane of n-1 dimensions separating it into two parts.
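
To make the idea of a separating hyperplane concrete, the sketch below represents a hyperplane by a weight vector w and an offset b, so that the hyperplane is the set of points where w·x + b = 0. The names `decision_value`, `side` and `distance_to_hyperplane` and the example line are assumptions for illustration; finding the best w and b is the job of SVM training and is not shown here.

```python
import math

def decision_value(w, b, x):
    """Signed value of w.x + b; zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def side(w, b, x):
    """Which of the two parts of the space the point x falls in (+1 or -1)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Example in two dimensions: the hyperplane is the line x + y = 4
w, b = (1.0, 1.0), -4.0
print(side(w, b, (3.0, 3.0)), side(w, b, (1.0, 1.0)))
print(distance_to_hyperplane(w, b, (3.0, 3.0)))
```

The distance function is the quantity that margin maximization works with: the margin of a separating hyperplane is determined by the distance to the closest training points on either side.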


Figure. 3.6. Support Vector Machine

To separate the two classes of data points, there are many possible hyperplanes that could be
chosen. Our objective is to find the hyperplane that has the maximum margin, i.e. the maximum distance
to the nearest data points of both classes. Maximizing the margin provides some
reinforcement so that future data points can be classified with more confidence.

The procedure is as follows. A sample of data is provided to the model; these inputs should be
structured appropriately in order to be read. The next step is to define the initial SVM
parameters and the kernel function that will be used by the SVM algorithm. The error cost term C
and the maximal margin ε are initially selected at random. Then the training process begins. The
sample is divided into v parts: one subset is used for validation and the remaining subsets are
used to train the model. This process guards against overfitting and helps the trained model
generalize well. Once the trained model is created, a new, unknown data set is provided to the
model, and the SVM produces a forecast for this unknown sample based on the trained model.
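
The training procedure described above (choose a kernel and C, divide the sample into v parts, validate on one part and train on the rest) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's implementation: the data is synthetic, the three classes stand in for crops, the seven features stand in for N, P, K, pH, moisture, temperature and electrical conductivity, and v is set to 5. Note that scikit-learn's classifier `SVC` exposes `C` and `kernel`; an ε parameter exists only on the regression variant `SVR`.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the soil dataset (an assumption for illustration):
# 3 hypothetical crop classes, 7 features per sample.
centers = [[0.0] * 7, [10.0] * 7, [-10.0] * 7]
X, y = make_blobs(n_samples=150, centers=centers, cluster_std=1.0, random_state=0)

# Scale the features, then fit an SVM with an RBF kernel and cost term C.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# v-fold cross-validation with v = 5: each fold is used once for validation
# while the remaining four folds are used for training.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()

print("accuracy per fold:", scores)
print("mean accuracy:", mean_accuracy)
```

In practice C and the kernel would be tuned (for example over a small grid) rather than fixed, using the cross-validation score as the selection criterion.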


Figure. 3.7. Flow diagram of SVM


CHAPTER 5
MODEL OF THE PROJECT
❖ IoT Model

In this IoT model we use three sensors to measure the following:

• Temperature

• Moisture

• Electrical conductivity

The soil sample is tested by the sensors and the measured values are sent to
an IoT device to be processed. The NPK and pH values of the soil sample are obtained
manually.

❖ Data preprocessing
Data preprocessing is a technique used to convert raw data into a clean data set. In
other words, whenever data is gathered from different sources it is collected in a raw format
that is not suitable for analysis. Therefore, certain steps are executed to convert the data
into a small, clean data set. These steps are carried out before any iterative
analysis, and together they are known as data preprocessing.

It includes:

• Data Cleaning

Data cleaning refers to the identification of incomplete, incorrect, inaccurate or
irrelevant parts of the data and the replacement, modification, or deletion of the
dirty or coarse data.

• Data Integration

Data integration is the method of combining data residing in different sources and
providing users with a unified view of them.

• Data Transformation

Data transformation is the process of converting data from one format or structure into another
format or structure.

• Data Reduction

Data reduction is the reduction of the amount of capacity required to store data. Data reduction can
increase storage efficiency and reduce costs.

Data preprocessing is necessary because of the
presence of unformatted real-world data. Mostly real-world data is composed of

❖ Crop model creation

After implementing three algorithms, namely Random Forest, KNN, and Naïve Bayes,
we select the algorithm with the highest accuracy to train and create the model. Once model
construction is complete, it is followed by model training. The dataset is first split into a
training dataset and a test dataset, and the model is then built and trained using the
training dataset. The attributes used for model creation are NPK,
pH, moisture, temperature, and electrical conductivity. Using this model we predict the crop most
suitable for cultivation.
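
The selection step described above (train the three candidate algorithms, then keep the one with the highest accuracy) could look roughly like the following scikit-learn sketch. The data here is synthetic and the hyperparameters are assumptions chosen for illustration; the project's actual dataset and settings are not shown in this report.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the soil dataset (an assumption): 3 hypothetical crop
# classes and 7 features (N, P, K, pH, moisture, temperature, conductivity).
centers = [[0.0] * 7, [10.0] * 7, [-10.0] * 7]
X, y = make_blobs(n_samples=300, centers=centers, cluster_std=2.0, random_state=0)

# Split the dataset into a training part and a test part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The three candidate algorithms named in the text.
candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

# Train each candidate and record its accuracy on the held-out test part.
accuracies = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

# Keep the candidate with the highest accuracy as the crop model.
best_name = max(accuracies, key=accuracies.get)
best_model = candidates[best_name]

print(accuracies)
print("selected:", best_name)
```

When several candidates tie, `max` simply returns the first one in the dictionary; a real selection would more likely compare cross-validated scores than a single split.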

❖ Fertilizer estimator model creation

Similarly, for the development of the fertilizer estimator model, the candidate algorithm with
the highest accuracy is chosen to train and create the model. The attributes used for model
creation include NPK, pH, moisture, temperature, electrical conductivity and crop variety.
Through this we are able to estimate the amount of fertilizer needed.

❖ Model analysis

The algorithms used in this project are Naive Bayes, Random Forest and KNN. The model analysis
covers the accuracy of each model and a visualization of the results.


CHAPTER 6

SYSTEM TESTING
In this chapter, an overview of testing is provided to verify the correctness and the
functionality of the system. Software testing is the process of analyzing a software item to
detect the differences between the existing and required conditions and to evaluate the
features of the software item. It is a task intended to detect defects in software by contrasting a
computer program’s expected results with its actual results for a given set of inputs, and it should be
done throughout the development process.

The aim of the testing phase is to discover defects or errors by testing individual program
components. During system testing, these components are integrated to form a complete
system. At this stage, testing focused on establishing that the system met its functional
requirements and did not behave in an unexpected way. Test data were inputs that had
been devised to test the system, and the expected outputs were predicted from these inputs
assuming the system operates according to its specification. Testing was done to examine the
behavior of the system as a cohesive whole. The test cases were selected to ensure that the
system behavior could be examined under all possible combinations of conditions.

Accordingly, the expected behavior of the system under different combinations was given.
Therefore, test cases were selected which had inputs and the outputs were on expected lines. Inputs
that were not valid, for which suitable messages had to be given, and inputs that did
not occur frequently were regarded as special cases.

Test Environment

A testing environment is a setup of software and hardware on which the testing team
performs the testing of the newly built software product. This setup consists of the
physical setup, which includes hardware, and the logical setup, which includes the server operating
system, client operating system, database server, front-end running environment, browser
(if it is a web application), and any other software components required to run the software product. This
testing setup is to be built on both ends.

In this project the testing environment mainly consists of the following:

❖ Software

• Anaconda command line interface to run the server

• Visual Studio Code for editing (any code editor can be used to write the programs)

• Python IDE to run Python scripts

• SQL database to store the collected datasets of soil properties

• XAMPP server to run the web server and the database server

❖ Hardware

• PC with sufficient storage space to store and run the module

• Internet connection

Test Case

A set of test inputs, execution conditions, and expected results was developed for a particular
objective, such as to exercise a particular program path or to verify compliance with a
specific requirement. It included the following:

❖ Features to be tested
• Sign up and sign in
• Crop prediction model
• Fertilizer prediction model
❖ Items to be tested
• Accuracy predictor scales
• Form inputs
• Submit button and other main menu buttons
❖ Purpose of testing
• To identify inaccurate predictions
• To make sure exception handling is done
• To identify incorrect inputs, in terms of value range or input format
❖ Pass/Fail criteria
• If the same inputs produce varying results, fail
• Inability to produce a result when a particular input is fed, fail
• Loading time to produce a result exceeds the threshold, fail
• Accurate result, as per analysis of the database, for each type of crop taken into
consideration, pass
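
Two of the checks listed above, detecting inputs with the wrong format or range and failing when the same inputs give varying results, can be sketched as small helper functions. Everything named here is hypothetical: the field names, the functions `validate_inputs` and `is_repeatable`, and all ranges except the pH scale of 0 to 14 are assumptions for illustration, not values taken from the project.

```python
# Accepted ranges per form field. Only the pH scale (0 to 14) is a general
# fact; every other range is a hypothetical placeholder.
VALID_RANGES = {
    "N": (0.0, 200.0),
    "P": (0.0, 200.0),
    "K": (0.0, 200.0),
    "pH": (0.0, 14.0),
    "moisture": (0.0, 100.0),      # percent (assumed unit)
    "temperature": (-10.0, 60.0),  # degrees Celsius (assumed unit)
    "ec": (0.0, 20.0),             # electrical conductivity (assumed unit)
}


def validate_inputs(form: dict) -> list:
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        raw = form.get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{field}: expected a number, got {raw!r}")
            continue
        if not low <= value <= high:
            errors.append(f"{field}: {value} is outside the range {low} to {high}")
    return errors


def is_repeatable(predict, features, runs: int = 3) -> bool:
    """Pass/fail check: the same inputs must always produce the same result."""
    results = [predict(features) for _ in range(runs)]
    return all(result == results[0] for result in results)


form = {"N": "73", "P": "48", "K": "73", "pH": "15",
        "moisture": "40", "temperature": "25", "ec": "abc"}
for message in validate_inputs(form):
    print(message)
```

Returning a list of messages instead of raising on the first problem lets the web form report every invalid field at once.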

Testing in Machine Learning


Machine learning is the study of applying algorithms and statistics so that a computer can
learn by itself without being programmed explicitly. Computers rely on an algorithm that
uses a mathematical model. In relation to machine learning models, the word "testing"
primarily refers to testing the model's performance in terms of its accuracy and precision.
Note that "testing" therefore means different things in conventional
software development and in machine learning model development.

Hence, as mentioned above, traditional unit/integration testing would not work on
machine learning models; they are instead tested based on their accuracy and predictions.

Accuracy is one metric for evaluating classification models. Informally, accuracy
is the fraction of predictions our model got right. Formally, accuracy has the following
definition:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

For binary classification, accuracy can also be calculated in terms of positives and negatives
as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False
Negatives.

Precision and recall are also used as metrics for evaluating classification models. Precision
(also called positive predictive value) is the fraction of relevant instances among the
retrieved instances, while recall (also known as sensitivity) is the fraction of the total number of
relevant instances that were actually retrieved.

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$
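
The three formulas can be checked with a short calculation. The function name `classification_metrics` and the counts used below are made up for illustration; they are not results from this project.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision and recall from confusion-matrix counts.

    A ratio whose denominator is zero is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative counts: 100 predictions in total.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

For these counts, accuracy is 85/100, precision is 40/45 and recall is 40/50, which shows that a model can have high precision while still missing a fifth of the positive cases.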

When it comes to forecasting, the models are evaluated based on the results they
predict. In the case of crop selection forecasting, we have divided the data into a training set
and a testing set. The training set is in turn split into a training dataset and a validation
dataset. We train our model using the training dataset, and the validation dataset is used to
evaluate the trained model. A validation dataset is a sample of data held back from training the
model that is used to give an estimate of model skill while tuning the model's hyperparameters.
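
The split into training, validation and test parts can be sketched in a few lines of plain Python. The function name `train_val_test_split`, the 60/20/20 proportions and the fixed seed are assumptions for illustration; the report does not state the proportions that were used.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows once, then cut them into train, validation and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                 # held back until the very end
    val = shuffled[n_test:n_test + n_val]    # used while tuning hyperparameters
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test


# 100 sample indices stand in for the rows of the soil dataset.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the file is ordered by crop, since otherwise one part could contain only a single class.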

Figure 5.1 Filtration of Datasets

A test dataset is a dataset that is independent of the training dataset, but that follows the
same probability distribution as the training dataset. If a model fit to the training dataset also fits
the test dataset well, it has generalized rather than overfitted. Hence, by comparing the predicted and
observed values, we can tell how well our model works.

System Testing

System testing is the testing conducted on a complete, integrated system to evaluate the
system's compliance with its specified requirements. System testing involves putting the new
program in many different environments to ensure that the program works in typical
customer environments with various versions and types of operating systems and/or
applications.

System testing is actually a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different purpose, the main
purpose is to verify that all the system elements have been properly integrated and perform
the allocated functions.

Unit Testing

Unit testing is the mechanism by which each individual module of the project is tested. It can also be
called differentiation testing, as the project is tested module by module. Using the
module-level design description as a guide, the important control paths are tested to discover faults
within the boundary of each module.

Table 6.1 below shows the successful loading of the hazy image that is selected by the user for
processing. Haze is caused by suspended particles in the atmosphere such as fog, murk, mist and
dust, which cause poor visibility in the image and distort the colors of the scene. Haze
is regarded as a major challenge in many applications in the fields of image processing
and computer vision.

A hazy image can be modeled as a combination of scene radiance, airlight and transmission. The
main challenge in the de-hazing process is the varying density of haze from one region to another
in the hazy image. Depending on the weather conditions at the time of image capture, hazy images lose
color fidelity and contrast. The position of the camera and its distance from the scene may also
cause image degradation.

❖ User login: The user needs to enter the user name and password. If the user name is registered,
the user gets a login successful message. The expected output is a successful login and the
output obtained is the same, so the final result is successful.

Test Case Sl. No    1
Test Name           User login
Test Feature        Insert user name and password. If the user name is registered, a login successful message is shown.
Output Expected     Login Successfully
Output Obtained     Login Successfully
Result              Successful

Table 5.1: Test case for user login.

❖ User not registered: The user needs to enter the user name and password. If the user name is
not registered, the user gets a not registered message. The expected output is not registered
and the output obtained is the same, so the final result is successful.

Test Case Sl. No    2
Test Name           User login not registered
Test Feature        Insert user name and password. If the user name is not registered, a not registered message is shown.
Output Expected     Not registered
Output Obtained     Not registered
Result              Successful

Table 5.2: Test case for user login if not registered.


❖ Registration: The user needs to enter the details for registration and then gets a
registration successful message. The expected output is a successful registration and the
output obtained is the same, so the final result is successful.

Test Case Sl. No    3
Test Name           Registration
Test Feature        Insert the details for registration
Output Expected     Registration Successfully
Output Obtained     Registration Successfully
Result              Successful

Table 5.3: Test case for Registration.

❖ Data pre-processing: The process removes the null values, converts the string
values to int or float, and splits the data. The expected output is that the dataset is split
successfully and the output obtained is the same, so the final result is successful.

Test Case Sl. No    4
Test Name           Data pre-processing
Test Feature        Removes null values, converts the string values to int or float, and splits the data.
Output Expected     Dataset splitting is successful
Output Obtained     Dataset splitting is successful
Result              Successful

Table 5.4: Test case for data pre-processing
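
The first two pre-processing steps exercised by this test case (dropping rows with null values and converting string values to numbers) can be sketched as follows. The function name `clean_rows`, the field names and the sample rows are assumptions for illustration; in particular the pH values are invented and do not come from the project's dataset.

```python
def clean_rows(rows, numeric_fields):
    """Drop rows with missing numeric values and convert the rest to float."""
    cleaned = []
    for row in rows:
        values = [row.get(field) for field in numeric_fields]
        # Remove rows in which any required value is null or blank.
        if any(value is None or str(value).strip() == "" for value in values):
            continue
        try:
            converted = {field: float(row[field]) for field in numeric_fields}
        except ValueError:
            continue  # a value such as "abc" cannot be converted, so drop the row
        cleaned.append({**row, **converted})
    return cleaned


# Sample rows as they might arrive from a CSV file (all values are strings).
raw = [
    {"N": "73", "P": "48", "K": "73", "pH": "6.5", "crop": "cucumber"},
    {"N": "51", "P": None, "K": "73", "pH": "6.8", "crop": "corn"},    # null value
    {"N": "49", "P": "76", "K": "76", "pH": " ", "crop": "cotton"},    # blank value
]

cleaned = clean_rows(raw, numeric_fields=["N", "P", "K", "pH"])
print(cleaned)
```

Only the first sample row survives, with its N, P, K and pH values converted to floats and the crop label left unchanged.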


❖ Model creation: Creates the model from the dataset using the chosen algorithm. The expected
output is that the model is created successfully and the output obtained is the same, so the
final result is successful.

Test Case Sl. No    5
Test Name           Model creation
Test Feature        Create the model based on the algorithm using the dataset.
Output Expected     Model created successfully
Output Obtained     Model created successfully
Result              Successful

Table 5.5: Test case for Model creation

❖ Data visualization: Using multiple algorithms, we create the model. Once the model is
created successfully, the data visualization is displayed. The expected output is that the data
is visualized successfully and the output obtained is the same, so the final result is successful.

Test Case Sl. No    6
Test Name           Data visualization
Test Feature        Using multiple algorithms, we create a model. Once the model is created successfully, the data visualization is displayed.
Output Expected     Data visualized successfully
Output Obtained     Data visualized successfully
Result              Successful

Table 5.6: Test case for Data visualization


Test Case Sl. No    7
Test Name           Crop prediction
Test Feature        Checks all the attributes and predicts the crop
Output Expected     Successfully predicts the crop
Output Obtained     Successfully predicts the crop
Result              Successful

Table 5.7: Test case for crop prediction

Test Case Sl. No    8
Test Name           Fertilizer estimation
Test Feature        Checks the attributes of the fertilizer and predicts the fertilizer estimation
Output Expected     Successfully predicts the fertilizer estimation
Output Obtained     Successfully predicts the fertilizer estimation
Result              Successful

Table 5.8: Test case for fertilizer estimation


CHAPTER 7
RESULTS
The resulting screenshots of the project are shown below:

❖ Login Page

Figure 6.1 Login page with Sign In and Sign Up

❖ Home Page

Figure 6.2 Home Page with statistics of datasets

❖ Crop Prediction

Figure 6.3 Inputting attributes for crop prediction

❖ Result Page

Figure 6.4 Output for the crop prediction on inputting the attribute values

❖ Fertilizer Estimator

Figure 6.5 Inputting attribute values for fertilizer estimation

❖ Data Visualization

Figure 6.6 Data visualization of the accuracy of various algorithms


ACCURACY and STANDARD VALUES TABLE

SVM

Crops               N     P     K     Accuracy (%)
Cucumber            73    48    73    96.5
Corn paddy          51    51    73    97.9
Cotton sugarcane    49    76    76    98

[Bar chart: SVM. Series: N, P, K, accuracy. Categories: cucumber, corn paddy, cotton sugarcane. Vertical axis: 0 to 120.]

KNN

Crops               N     P     K     Accuracy (%)
Cucumber            72    75    76    97.83
Corn paddy          48    48    72    95.3
Cotton sugarcane    47    73    74    96.4

[Bar chart: KNN. Series: N, P, K, accuracy. Categories: cucumber, corn paddy, cotton sugarcane. Vertical axis: 0 to 120.]


Random Forest

Crops               N     P     K     Accuracy (%)
Cucumber            75    49    73    98.2
Corn paddy          50    50    76    99
Cotton sugarcane    50    73    75    97.9

[Bar chart: Random Forest. Series: N, P, K, accuracy. Categories as labelled in the chart: cucumber, Category 2, Category 3. Vertical axis: 0 to 120.]


CHAPTER 8
CONCLUSION AND SCOPE FOR FUTURE WORK
The system “Analysis of soil behavior of crop prediction through sensor devices” was developed and
tested successfully and satisfies all the requirements of the user. The goals that have been
achieved by the developed system are:

• Simplified and reduced manual work.

• Large volumes of data can be stored.

• It provides a smooth workflow.

This was accomplished by applying the KNN, SVM and Random Forest classification
algorithms. These classification techniques come under data mining technology. The
algorithms take temperature, electrical conductivity, moisture, pH and NPK values as input
and predict the crop suited to the particular soil and land area. The input values are obtained
using the IoT device and compared with the master data.

Scope for Future Enhancement

• An IoT sensor device can be added to send values directly from the soil sample or area of
land to the server, using additional sensors such as pH, NPK and other soil sensors.
Based on these parameters, the crop that can be grown by the farmer can be predicted and
suggested by the admin, who can also guide the farmer manually.

• An Email module can be added so that, if there are any queries, the farmer can interact
directly with the administrator very easily.

Email module: In the proposed system, the live data and the predicted crop
values are obtained using IoT devices and sensors and are communicated manually. An Email
module can therefore be added as a future enhancement, through which the admin and the farmer
receive an Email notification regarding the Id and password.

• Audio output can be added that converts the text result to speech in various
dialects for better understanding by users.



