Paper 7

3rd IEEE International Conference on Computational Systems and Information Technology for Sustainable Solutions 2018
Improving Crop Productivity Through A Crop Recommendation System Using Ensembling Technique
Nidhi H Kulkarni, MTech IT, Dept. of ISE, RV College of Engineering, Bangalore, India
Dr. G N Srinivasan, Professor, Dept. of ISE, RV College of Engineering, Bangalore, India
Dr. B M Sagar, HOD, Dept. of ISE, RV College of Engineering, Bangalore, India
Dr. N K Cauvery, Professor, Dept. of ISE, RV College of Engineering, Bangalore, India
Abstract - Agriculture plays a predominant role in the economic growth and development of the country. A major setback to crop productivity is that farmers do not always choose the right crop for cultivation. To improve crop productivity, a crop recommendation system is developed that uses the ensembling technique of machine learning. The ensembling technique builds a model that combines the predictions of multiple machine learning models to recommend the right crop, with high accuracy, based on the specific soil type and its characteristics. The independent base learners used in the ensemble model are Random Forest, Naive Bayes, and Linear SVM. Each classifier provides its own set of class labels with an acceptable accuracy. The class labels of the individual base learners are combined using the majority voting technique. The crop recommendation system classifies the input soil dataset into the recommendable crop type, Kharif or Rabi. The dataset comprises soil-specific physical and chemical characteristics, in addition to climatic conditions such as average rainfall and surface temperature. The average classification accuracy obtained by combining the independent base learners is 99.91%.

Keywords - Ensemble, Majority-Voting, Naive-Bayes, soil, crop-recommendation

I. OBJECTIVES

• To design a recommendation system for accurate crop selection based on various soil, rainfall and surface temperature parameters.
• To improve crop productivity by providing predictions of high accuracy and efficiency through the ensembling technique.
• To reduce wrong crop choices by applying the principles of precision agriculture.

II. INTRODUCTION

India is one of the nations that has agriculture as its primary source of income. Agriculture contributes only around 14% to the GDP but has a considerable impact on the Indian economy. Conventional agricultural practices and techniques pose many issues in terms of efficiency, cost-effectiveness and resource utilization. There is a need for better techniques that can also improve the standard of living of the farmers. Over the years, due to globalization, agriculture has evolved by adopting the latest technologies and techniques. Among these, precision agriculture is an emerging technology. Precision agriculture mainly focuses on site-specific farming [1]. Crop recommendation is a primary area of precision agriculture. Crop recommendation relies on multiple parameters, and precision agriculture practices help in identifying these parameters, thereby facilitating better crop selection.

The agricultural domain has adopted machine learning algorithms to produce efficient, cost-effective solutions to the difficulties faced by farmers. Researchers can use computer simulations to run early tests that assess how a crop variety may perform when faced with different sub-climates, soil types, weather patterns and other variables. Researchers in modern agriculture are testing their hypotheses at a larger scale and making considerably more accurate, real-time predictions.

III. LITERATURE SURVEY

The paper [2] describes the implementation of a crop prediction system that is based on sensor networks and developed using IoT. Soil testing labs take a considerable amount of time to provide the results of submitted soil samples. Hence the system claims that it helps
ISBN: 978-1-5386-6078-2 © 2018 IEEE 114

the farmers to get a better crop prediction without the delay of a waiting period [2]. The authors mainly focus on analyzing the N (Nitrogen), P (Phosphorus) and K (Potassium) contents of the soil sample collected for the survey. The proposed method estimates the soil nutrients from the data fetched by the sensor network, which enables predicting the apt crop for the soil under test. The farmers need to register their NPK sensor with the main server. The NPK sensor extracts the nutrient levels from the soil sample and sends this information to the main server through the Raspberry Pi unit. Based on the readings received, the algorithm makes predictions from the recorded information. The major shortfalls of this implementation are the inefficiency of the crop prediction algorithm and the heavy focus on data collection through the NPK sensor, whose readings fluctuate widely.

The paper [3] presents a Crop Selection Method that aims to solve the crop selection problem and improve the net yield of the harvest [3]. The authors propose a method that suggests a sequence of crops to be chosen over a season, taking into account essential factors such as climate, soil composition, water density and crop category. The accuracy of the estimated values of the most influential factors determines the precision of the Crop Selection Method. The technique considered in the paper is crop sequencing. The crops are categorized into four divisions, namely seasonal, whole-year, short-time plantation and long-time plantation crops. Crops from each category are selected in a sequence for cultivation. Hence there is a need for a prediction technique with improved precision and performance. In addition, the compulsion to select at least one crop from the category is a major setback.

The paper [4] presents a comprehensive analysis of soil characteristics and behaviour, and uses it to introduce a method of predicting crop yield with a data mining approach [4]. The paper focuses on improving the yield of the crops rather than on the crop selection technique. The soil datasets are taken as inputs and analyzed. Based on this analysis, the soil is classified into low, medium and high classes using data mining techniques. As a result of this categorization, the crop yield is predicted using the Naive Bayes and k-Nearest Neighbour algorithms. Here the predicted crop yield is formalized as a classification rule. Although the approach has many positives with respect to improving crop yield, it has a few drawbacks. The major drawback is that it does not address the problem at its source; it focuses only on increasing the crop yield. Further setbacks are that the method has been tried with very few data mining algorithms, so more algorithms need to be included, and that the dataset used is very small, as the authors themselves state, since huge datasets introduce considerable complexity.

The paper [5] provides clear details of a new framework designed by its authors, the eXtensible Crop Yield Prediction Framework (XCYPF) [5]. This framework is mainly designed for predicting crop yield. It claims to provide crop selection, the selection of dependent and independent variables, and the datasets required for crop yield forecasting. The framework has been tested for rice and sugarcane crops. It claims to have been integrated with a management information system that provides expertise in the field of precision agriculture. The framework designed in the paper works exclusively for rice and sugarcane crops.

Increasing the productivity of crops has always been a special concern. Various methods and improvised techniques have been designed to boost the yield of crops. Solving the problem at its source eradicates the issue directly. Hence, deciding the right crop to cultivate will lead to better crop productivity and, in turn, boost the economy of the country.

IV. METHODOLOGY

A brief step-by-step procedure for designing the crop recommendation system is as follows:

Step 1: Input

The input is a comma-separated values (CSV) file containing the soil dataset, which has to be preprocessed.

Step 2: Preprocessing of input data

The input dataset is subjected to preprocessing techniques such as filling of missing values, encoding of categorical data and scaling of values to an appropriate range.

Step 3: Splitting into training and testing dataset

The preprocessed dataset is then split into training and testing datasets based on a specified split ratio. The split ratio used in the proposed work is 75:25, which means 75% of the dataset is used for training the ensemble model and the remaining 25% is used as the test dataset.
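Steps 1 to 3 above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the paper does not state which imputation, encoding or scaling method was used, so mean imputation, integer label encoding and min-max scaling are assumptions, and the helper names and toy values are hypothetical.

```python
import random

def fill_missing(column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def label_encode(column):
    """Map each distinct category to an integer code (in sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes

def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows reproducibly and split them by the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy columns; the real dataset values are not reproduced here.
ph = fill_missing([6.5, None, 7.5])            # -> [6.5, 7.0, 7.5]
soil, soil_codes = label_encode(["clay", "loam", "clay"])
ph_scaled = min_max_scale(ph)                  # -> [0.0, 0.5, 1.0]

# A 75:25 split of 9000 row indices gives 6750 training and 2250 test rows.
train, test = train_test_split(range(9000))
```

Fixing the shuffle seed keeps the split reproducible, which matters when the accuracies of several base learners are compared on the same test rows.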

Step 4: Building individual classifiers on the training dataset

The training dataset is fed to each of the independent base learners, and the individual classifiers are built from it.

Step 5: Testing the data on each of the classifiers

The testing dataset is applied to each of the classifiers, and the individual class labels are obtained.

Step 6: Ensembling the individual classifier output using Majority Voting Technique

The class labels obtained from the individual classifiers are subjected to the majority voting technique to get an ensembled class label as the final prediction.

A. Ensemble Framework

The ensemble framework is central to the proposed work. Before going into its details, the meaning of ensembling and the reason for using it are explained. Ensembling is a technique of building a predictive model by combining multiple models. The main reason for using an ensemble framework is that it provides a classifier that outperforms each of the individual classifiers.

Ensembling uses two frameworks, the dependent framework and the independent framework. In the dependent framework, the output of one classifier is used in the construction of the next classifier. In the independent framework, each classifier produces a class label independently. All the classifiers work in parallel, and the output of one classifier is independent of the others. The independent framework is used in the proposed work since it reduces the execution time.

The ensemble framework comprises the following basic components:

a) Training set – A labelled set of instances that is used to train the ensemble model. Each instance in the training set is represented as a vector of attribute values.

b) Base inducer – An induction algorithm that takes a labelled set of instances as input and produces a classifier. The resulting classifier represents the generalized relationship between the input attributes and the target attribute:

M = I(S)

where M is the classifier, I is the inducer and S is the training set.

c) Diversity generator – Responsible for generating diverse classifiers.

d) Combiner – Responsible for combining the class labels obtained from the individual classifiers [21].

Advantages of using the independent framework of ensembling:

a) It enhances the predictive power of the classifiers.
b) It decreases the total execution time.

Random Forest, Naive Bayes and Linear SVM are the three independent base learners used to build the ensemble model. Each of the algorithms is explained concisely as follows.

B. Random Forest

A Random Forest is a classifier consisting of a collection of tree-structured classifiers, built from independent and identically distributed random vectors, in which each tree casts a unit vote for the most popular class at input x. For each tree, a random vector is generated that is independent of the previous random vectors but has the same distribution, and a tree is created using the training set. The main advantages of the Random Forest algorithm are that it provides better accuracy, is robust to outliers, is faster than bagging and boosting, and is simple and easy to parallelize.

C. Naive Bayes

Naive Bayes is a classification algorithm for binary and multi-class classification problems. The method is easiest to understand when the input values are binary or categorical. A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, hence the name 'Naive Bayes'. The classifier is based on Bayes' theorem, and the technique is useful when the dimensionality of the inputs is high. Naive Bayes has multiple applications, such as making predictions in real time, predicting the probability of multiple classes of the target attribute, spam filtering and, coupled with collaborative filtering, building recommendation systems.

First, the probability of each class in the dataset is calculated, which is known as the class probability. Then the conditional probability of each input value given each class value is calculated.

D. Linear SVM

Linear SVM is among the fastest machine learning algorithms for solving multiclass classification problems. Linear SVM is linearly scalable, which means that the SVM model is created in a CPU time that scales linearly with the size of the training dataset. The main advantage of Linear SVM is that it works well with extremely large datasets while retaining high accuracy. Linear SVM also performs well on multidimensional data [5].

E. Majority Voting

Majority voting is one of the techniques for combining the class labels produced by the independent classifiers. In this combining scheme, an unlabelled instance is assigned the class that obtains the highest number of votes. This technique is also known as the plurality vote. It is the most frequently used combiner and has often served as a baseline combining method for comparing newly proposed methods. Mathematically, it can be expressed as:

$$class(x) = \arg\max_{c_i \in dom(y)} \sum_{k} g(y_k(x), c_i)$$

where $y_k(x)$ is the classification of the kth classifier and $g(y, c)$ is an indicator function defined as:

$$g(y, c) = \begin{cases} 1 & \text{if } y = c \\ 0 & \text{if } y \neq c \end{cases}$$

V. DATASET DETAILS

The dataset used in the proposed work is a soil dataset primarily comprising soil physical and chemical properties, along with climatic details. It is an open dataset obtained from the data repository site of the Government of India, data.gov.in.

The dataset is 5 MB in size and contains 9000 rows and 15 attributes. The crops considered are Cotton, Sugarcane, Rice and Wheat. The dataset attributes of prime importance are:

• Soil type
• pH value of the soil
• NPK content of the soil
• Porosity of the soil
• Average rainfall
• Surface temperature
• Sowing season

VI. RESULTS

The collected data is first subjected to preprocessing. After preprocessing, the dataset is divided into training and test samples. Out of the 9000 samples, 6750 are used as training samples and the remaining 2250 as test samples. The Random Forest, Naive Bayes and Linear SVM algorithms are each trained and tested on these samples. The average accuracy of crop classification into Kharif and Rabi crops is 99.91%.

CONCLUSION

A crop recommendation system has been designed that takes into consideration the soil dataset with respect to the four crops Rice, Cotton, Sugarcane and Wheat. The soil dataset is first preprocessed, and then the ensembling technique is used to classify the four crops. The individual base learners used in the ensemble model are Random Forest, Naive Bayes, and Linear SVM. Majority

Voting Technique has been used as the combination method to provide the best accuracy.

The accuracy obtained using the ensembling technique is 99.91%. Hence, the proposed work helps the farmer in the accurate selection of the crop for cultivation. This can improve crop productivity, which in turn boosts the economy of the country.

REFERENCES

[1] S. Pudumalar, E. Ramanujam, "Crop Recommendation System for Precision Agriculture", 2016, IEEE Eighth International Conference on Advanced Computing (ICoAC).
[2] Lokesh K., Shakti J., Sneha Wilson, Tharini M. S., "Automated crop prediction based on efficient soil nutrient estimation using sensor network", July 2016, National Conference on Product Design (NCPD 2016).
[3] Rakesh Kumar, M. P. Singh, Prabhat Kumar and J. P. Singh (2015), "Crop Selection Method to Maximize Crop Yield Rate using Machine Learning Technique", International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM).
[4] Monali Paul, Santosh K. Vishwakarma, Ashok Verma (2015), "Analysis of Soil Behaviour and Prediction of Crop Yield using Data Mining Approach", International Conference on Computational Intelligence and Communication Networks.
[5] Aakunuri Manjula, Dr. G. Narsimha (2015), "XCYPF: A Flexible and Extensible Framework for Agricultural Crop Yield Prediction", Conference on Intelligent Systems and Control (ISCO).
[6] Anshal Savla, Parul Dhawan, Himtanaya Bhadada, Nivedita Israni, Alisha Mandholia, Sanya Bhardwaj (2015), "Survey of classification algorithms for formulating yield prediction accuracy in precision agriculture", Innovations in Information, Embedded and Communication Systems (ICIIECS).
[7] D. Ramesh, B. Vishnu Vardhan, "Data mining technique and applications to agriculture yield data", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 9, September 2013.
[8] Satish Babu (2013), "A Software Model for Precision Agriculture for Small and Marginal Farmers", at the International Centre for Free and Open Source Software (ICFOSS), Trivandrum, India.
[9] M. Soundarya, R. Balakrishnan, "Survey on Classification Techniques in Data mining", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 7, July 2014.
[10] Miss. Snehal S. Dahikar, Dr. Sandeep V. Rode, "Agricultural Crop Yield Prediction Using Artificial Neural Network Approach", International Journal of Innovative Research in Electrical, Electronic, Instrumentation and Control Engineering, Vol. 2, Issue 1, January 2014.
[11] Thoranin Sujjaviriyasup, Komkrit Pitiruek, "Agricultural Product Forecasting Using Machine Learning Approach", Int. Journal of Math. Analysis, Vol. 7, no. 38, 1869-1875, 2013.
[12] Luke Bomn, James V. Zidek (2012), "Efficient stabilization of crop yield prediction in the Canadian Prairies", Elsevier, P223-232.
[13] Raorane A. A., Kulkarni R. V. (2012), "Data Mining: An effective tool for yield estimation in the agricultural sector", UETTCS, 1 (2), P75-79.
[14] M. C. S. Geetha, "Implementation of Association Rule Mining for different soil types in Agriculture", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, Issue 4, April 2015.
[15] T. R. Lekha, "Efficient Crop Yield and Pesticide Prediction for Improving Agricultural Economy using Data Mining Techniques", International Journal of Modern Trends in Engineering and Science, Vol. 03, Issue 10, 2016.
[16] Shweta Taneja, Rashmi Arora, Savneet Kaur, "Mining of Soil Data Using Unsupervised Learning Technique", International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 7, No. 11, 2012.
[17] Washington Okori, Joseph Obua, "Machine Learning Classification Technique for Famine Prediction", Proceedings of the World Congress on Engineering 2011, Vol II, WCE 2011, July 6 - 8, London, U.K., 2011.
[18] Liying Yang (2011), "Classifiers selection for ensemble learning based on accuracy and diversity", Elsevier Ltd.
[19] Aymen E Khedr, Mona Kadry, Ghada Walid (2015), "Proposed Framework for Implementing Data Mining Techniques to Enhance Decisions in Agriculture Sector Applied Case on Food Security Information Center Ministry of Agriculture, Egypt", International Conference on Communications, Management, and Information Technology (ICCMIT).
[20] Roshani Ade, P. R. Deshmukh (2014), "Efficient Knowledge Transformation System Using Pair of Classifiers for Prediction of Students Career Choice", International Conference on Information and Communication Technologies (ICICT).
[21] Lior Rokach, "Ensemble-based classifiers", Artif Intell Rev (2010) 33:1–39, DOI 10.1
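As a supplementary illustration of the majority voting combiner described in Section IV-E, the following minimal Python sketch combines the labels of three independent base learners by plurality vote. It is a sketch under stated assumptions, not the authors' code: the three rule-based functions are hypothetical stand-ins for the trained Random Forest, Naive Bayes and Linear SVM models, and their feature names and thresholds are invented for illustration only.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label with the most votes; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(classifiers, sample):
    """Ask every independent base learner for a label and combine by plurality."""
    return majority_vote([clf(sample) for clf in classifiers])

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical stand-ins for the three trained base learners. Each maps a
# sample (a dict of features) to "Kharif" or "Rabi"; thresholds are invented.
def by_rainfall(s):
    return "Kharif" if s["rainfall_mm"] > 500 else "Rabi"

def by_temperature(s):
    return "Kharif" if s["temp_c"] > 25 else "Rabi"

def by_ph(s):
    return "Kharif" if s["ph"] < 7.0 else "Rabi"

learners = [by_rainfall, by_temperature, by_ph]
sample = {"rainfall_mm": 800, "temp_c": 30, "ph": 7.4}

# Two of the three learners vote "Kharif", so the ensemble label is "Kharif".
label = ensemble_predict(learners, sample)
```

With an odd number of base learners and two classes, as in this sketch, a tie cannot occur; with more classes, a tie-breaking rule such as the first-seen rule used here has to be chosen explicitly.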
