
Analysis and Prediction in Agricultural Data Using Data Mining Techniques

This document discusses analyzing Indian agricultural data using data mining techniques to improve crop yields. It summarizes common data mining algorithms like K-means, DBSCAN, and EM clustering that have been applied to agricultural data. The document also provides an overview of the agricultural dataset from India containing attributes like crop type, production, rainfall, and temperature from 2005-2009. It proposes applying data mining algorithms to this dataset using the WEKA tool to form clusters and gain insights to help increase crop yields.


International Journal of Research In Science & Engineering e-ISSN: 2394-8299

Special Issue 7-ICEMTE March 2017 p-ISSN: 2394-8280

ANALYSIS AND PREDICTION IN AGRICULTURAL DATA USING DATA MINING TECHNIQUES
Vinayak A. Bharadi, Prachi P. Abhyankar, Ravina S. Patil, Sonal S. Patade,
Tejaswini U. Nate, Anaya M. Joshi
Information Technology, Finolex Academy of Management and Technology, India

Abstract: Agriculture contributes nearly sixteen percent of India's total GDP and ten
percent of its total exports, which helps to increase foreign exchange earnings. India's
population is growing continuously, and agricultural yield must be boosted to meet the
food needs of this growing population. Knowledge discovered from raw data is useful for
many purposes, and data mining techniques are well suited to discovering it. This paper
aims to analyze the agricultural data of India using data mining algorithms and to extract
useful information from their results that would help to improve agricultural yield.
Various mining algorithms previously applied to agricultural data were studied. The data
mining techniques applied in this paper are the clustering algorithms K-means, DBSCAN,
and EM; the results of these algorithms are analyzed.
Keywords: Data Mining, DBSCAN, K-means, EM, WEKA

I. INTRODUCTION

Agriculture is the backbone of the Indian economy. Although a number of emerging sectors
such as IT and BPO contribute significantly to India's GDP, agriculture is still the most
important sector. Agriculture contributes substantially to India's exports, directly
improving foreign exchange earnings. In India, the majority of farmers do not obtain the
expected yield, for several reasons. Agricultural yield depends primarily on environmental
factors such as rainfall, temperature, and the topography of the particular region. These
factors, along with some others, influence crop cultivation.
In this context, farmers require timely advice on expected crop productivity, and
producing such predictions accurately requires intensive analysis. Yield is an important
agricultural issue. A large amount of data can be gathered from the Indian agriculture
sector, and knowledge acquired from these data is highly useful for many purposes. Data
mining is a field of information technology that deals with finding unknown and hidden
patterns in available data. Applying data mining techniques in agriculture to predict
useful information related to crop productivity is a worthwhile undertaking [1].
This paper aims to analyze such agricultural data using data mining techniques and to
consolidate the knowledge acquired from their results. The results of different data
mining algorithms are compared, which helps in finding the most suitable algorithm for
agricultural data.

IJRISE | www.ijrise.org | [386-393]



II. BACKGROUND

Data mining in agriculture is a recent research topic. It consists of applying data mining
techniques to agricultural data. Recent technologies are able to provide abundant
information on agriculture-related activities, which can then be analyzed to find
important information. India is an agriculture-based country. Crop yield depends on
multiple factors such as climate change, soil type, etc. Farmers are interested in knowing
the crop yield beforehand. Traditionally, this estimate depended on the experience of
farmers and was limited to a particular region.
Data mining techniques can be helpful in predicting crop yield. Techniques such as data
classification and data clustering can be used for data analysis. Data classification is
supervised learning, in which a labelled training data set is used to classify further
data. Data clustering is unsupervised learning, in which no training set is available [2].
Multiple data mining algorithms have been used to analyze agricultural data. Algorithms
including K-Means, K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), and Support
Vector Machines (SVM) are applicable to agricultural data. Suitable data models can be
found that achieve high accuracy in yield prediction [3].
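
To make the supervised case concrete, the following is a minimal sketch of K-Nearest Neighbor classification in plain Python. It is not the implementation used in the cited work; the function name `knn_predict`, the yield labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (rainfall in mm, temperature in deg C) -> yield class.
# These values are illustrative and are not taken from the dataset.
training = [
    ((1400.0, 26.0), "high"),
    ((1450.0, 25.5), "high"),
    ((1500.0, 24.8), "high"),
    ((1900.0, 18.0), "low"),
    ((2000.0, 17.5), "low"),
    ((1850.0, 18.4), "low"),
]
prediction = knn_predict(training, (1420.0, 25.0), k=3)
```

Because the features are not rescaled here, rainfall dominates the Euclidean distance; in practice the attributes would normally be normalized first.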
The agriculture sector in India faces the problem of increasing crop yield to meet the
demands of a growing population. More than 50% of crops are still dependent on the
monsoon. Researchers have implemented the K-Means algorithm to forecast pollution in the
atmosphere, applied K-Nearest Neighbor to simulate daily rainfall and other weather
variables, and analyzed various changes in weather conditions using Support Vector
Machines [3]. Artificial Neural Networks can be used to analyze patterns in soil data
sets [4].

Frequent pattern mining is another data mining technique. A frequent pattern is a pattern
that occurs frequently in a dataset and can provide crucial information that was
previously unknown [5].
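
As a small illustration of the idea, the sketch below counts itemsets by brute force and keeps those that meet a minimum support. It is a simplification for exposition, not an optimized miner such as Apriori; the function name `frequent_itemsets` and the condition labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support count} for itemsets of up to `max_size` items
    that appear in at least `min_support` transactions (brute-force counting)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical records: each set lists conditions observed together in one season.
records = [
    {"high_rainfall", "low_temperature", "low_yield"},
    {"high_rainfall", "low_temperature", "low_yield"},
    {"moderate_rainfall", "warm_temperature", "high_yield"},
    {"moderate_rainfall", "warm_temperature", "high_yield"},
    {"moderate_rainfall", "low_temperature", "low_yield"},
]
patterns = frequent_itemsets(records, min_support=3)
```

With these illustrative records, the only pair that reaches a support of 3 is low temperature together with low yield.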
A support vector machine (SVM) is a binary classifier that separates disjoint classes. The
basic idea is to classify the sample data into linearly separable classes. SVMs are a set
of related supervised learning methods used for classification and regression. They have
been used to assess the spatiotemporal characteristics of a soil moisture product [6].
The decision tree is one of the popular classification algorithms currently used in data
mining and machine learning. Decision tree learning involves the algorithmic acquisition
of structured knowledge in forms such as concepts, decision trees, discrimination nets, or
production rules [7].
A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes'
theorem with strong independence assumptions. Depending on the precise nature of the
probability model, a Naïve Bayes classifier can be trained very efficiently in a
supervised learning setting. J48 is an open-source Java implementation of the C4.5
algorithm in the WEKA data mining tool. C4.5 builds a decision tree from a set of labelled
input data. This decision tree can then be tested against unseen labelled test data to
measure how well it generalizes [8].
Partitioning algorithms are based on specifying an initial number of groups and
iteratively reallocating objects among groups until convergence. In contrast, hierarchical
algorithms combine or divide existing groups, creating a hierarchical structure that
reflects the order in which groups are merged or divided [9]. Data clustering is an
efficient unsupervised learning technique that deals with grouping unlabeled data into
clusters. Clustering algorithms include k-Means clustering, hierarchical clustering,
DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering
Points to Identify the Clustering Structure), and STING (Statistical Information Grid)
[10]. The WEKA (Waikato Environment for Knowledge Analysis) system provides a broad suite
of facilities for applying data mining techniques to large data sets [11]. An overview of
the data used for analysis is given in the next section.

III. OVERVIEW OF DATA

The data used in this paper cover the years 2005 to 2009 and were obtained from the
website of the Planning Commission of India [12], [13]. They contain information about
plantation, fruit, and vegetable crops for 35 states and union territories of India:
Andhra Pradesh, Andaman and Nicobar, Arunachal Pradesh, Assam, Bihar, Chandigarh,
Chhattisgarh, Dadra and Nagar Haveli, Daman and Diu, Delhi, Goa, Gujarat, Haryana,
Himachal Pradesh, Jammu and Kashmir, Jharkhand, Karnataka, Kerala, Lakshadweep, Madhya
Pradesh, Maharashtra, Manipur, Meghalaya, Mizoram, Nagaland, Orissa, Pondicherry, Punjab,
Rajasthan, Sikkim, Tamil Nadu, Tripura, Uttar Pradesh, Uttarakhand, and West Bengal. The
dataset contains a total of 4180 instances with eight attributes: Year, State, Crop type,
Crop name, Area, Production, Rainfall, and Temperature. The following figure shows the
database schema.

Fig. 3.1 Dataset Overview
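
The eight attributes listed above could be represented in code roughly as follows. This is a sketch under assumptions: the CSV column names, the Python types, and the sample rows are hypothetical, since the paper does not specify the file format or the units of Area and Production.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CropRecord:
    """One instance of the dataset; fields mirror the eight attributes."""
    year: int
    state: str
    crop_type: str
    crop_name: str
    area: float
    production: float
    rainfall: float
    temperature: float


def load_records(csv_text):
    """Parse CSV text whose header row uses the (assumed) column names below."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        CropRecord(
            year=int(row["Year"]),
            state=row["State"],
            crop_type=row["Crop type"],
            crop_name=row["Crop name"],
            area=float(row["Area"]),
            production=float(row["Production"]),
            rainfall=float(row["Rainfall"]),
            temperature=float(row["Temperature"]),
        )
        for row in reader
    ]


# Illustrative rows only; these are not values from the actual dataset.
sample_csv = """Year,State,Crop type,Crop name,Area,Production,Rainfall,Temperature
2005,Kerala,Plantation,Coconut,10.5,72.0,2900.0,26.5
2006,Punjab,Vegetables,Potato,8.2,150.0,650.0,23.1
"""
records = load_records(sample_csv)
```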

IV. METHODOLOGY

The following diagram depicts the process model used to analyze the agricultural data and
extract useful information from it.

Fig. 4.1 Architecture Diagram

WEKA (Waikato Environment for Knowledge Analysis) provides filters for preprocessing of
data. The following data mining algorithms are used to form clusters of data: K-means, a
density-based algorithm (DBSCAN), and EM. The results of the individual algorithms are
given below.
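
The paper does not state which preprocessing filters were applied. One common step before distance-based clustering is min-max normalization, which rescales each numeric attribute to the range [0, 1]; the sketch below illustrates that step in plain Python as an assumption, with a hypothetical function name and illustrative values.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative rainfall values in mm; not taken from the actual dataset.
rainfall = [1000.0, 1500.0, 2000.0, 3000.0]
scaled = min_max_normalize(rainfall)
```

Normalizing in this way keeps an attribute with a large numeric range, such as rainfall, from dominating one with a small range, such as temperature.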

V. RESULT ANALYSIS
1. K-means

In the K-means algorithm, clusters are formed around centroids. Applying this algorithm
produced two clusters. The clusters and their centroids with respect to each attribute
are given below.

Table 5.1 Result of K-means algorithm

Attribute      Cluster 0    Cluster 1
Production     72.1519      305.1148
Rainfall       1903.281     1405.904
Temperature    17.8973      26.0942
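
The centroid-based procedure can be sketched as follows. This is a minimal plain-Python version of Lloyd's algorithm, not WEKA's implementation; the initialization from the first k points and the sample (rainfall, temperature) points are simplifying assumptions made for illustration.

```python
from math import dist


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [points[i] for i in range(k)]  # simplistic init: first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Illustrative (rainfall in mm, temperature in deg C) points; not the actual data.
points = [
    (1900.0, 18.0), (1400.0, 26.0), (1950.0, 17.5),
    (1420.0, 26.2), (1880.0, 18.3), (1390.0, 25.9),
]
centroids, assignment = k_means(points, k=2)
```

Each centroid is simply the per-attribute mean of the points assigned to its cluster, which is how the values in Table 5.1 should be read.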

The means and standard deviations of the clusters formed by the DBSCAN and EM algorithms
are given below:

2. DBSCAN

Table 5.2 Result of DBSCAN algorithm

Attribute      Statistic    Cluster 0    Cluster 1
Production     Mean         72.1519      305.1148
Production     Std. Dev     271.7347     838.2223
Rainfall       Mean         1903.281     1405.904
Rainfall       Std. Dev     767.6352     723.2672
Temperature    Mean         17.8973      26.0942
Temperature    Std. Dev     2.8739       1.4335
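
For reference, the textbook DBSCAN procedure can be sketched as below. This is a plain-Python illustration and not necessarily the density-based clusterer that WEKA applied to produce Table 5.2; the parameter names `eps` and `min_pts` follow common usage, and the sample points are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns one label per point (cluster id, or -1 for noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical, already-normalized 2-D points; not the actual data.
points = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Unlike K-means, this procedure does not take the number of clusters as input and can mark isolated points as noise.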

3. EM

Table 5.3 Result of EM algorithm

Attribute      Statistic    Cluster 0    Cluster 1    Cluster 2    Cluster 3    Cluster 4    Cluster 5
Production     Mean         98.7995      62.3811      45.6236      [...]0       1504.2668    31.2018
Production     Std. Dev     115.4027     83.3918      59.0153      718.7723     1481.1565    37.5763
Rainfall       Mean         1167.493     3004.691     2783.752     1649.424     1468.1716    1385.296
Rainfall       Std. Dev     405.5313     37.5244      35.2653      837.6832     681.3877     453.2867
Temperature    Mean         25.9862      26.7027      20.3796      23.2489      24.429       16.9638
Temperature    Std. Dev     1.5017       0.6337       1.8794       4.3715       4.0206       2.9644

Note: only the final digit of the Cluster 3 production mean is legible in the source.
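
The per-cluster means and standard deviations reported for EM correspond to the parameters of a Gaussian mixture. The sketch below fits a one-dimensional, two-component mixture in plain Python; it is an illustration under assumptions (a single attribute, fixed initial parameters, a fixed iteration count, hypothetical temperature values) and not WEKA's EM implementation.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))


def em_gmm_1d(data, means, stds, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by expectation-maximization."""
    k = len(means)
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[c] * gaussian_pdf(x, means[c], stds[c]) for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted observations.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            stds[c] = max(sqrt(var), 1e-3)  # floor to avoid a collapsing component
            weights[c] = n_c / len(data)
    return means, stds, weights


# Illustrative temperatures in deg C; not values from the actual dataset.
temperatures = [17.2, 17.9, 18.4, 18.1, 25.8, 26.1, 26.4, 25.9]
means, stds, weights = em_gmm_1d(
    temperatures, means=[18.0, 26.0], stds=[2.0, 2.0], weights=[0.5, 0.5]
)
```

Because EM assigns points to components softly and estimates a spread for each one, it can describe the data with more, and more specific, clusters than a two-centroid K-means run.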

Result analysis shows that production tends to increase when rainfall ranges from
1405.904 mm to 1562.3756 mm and temperature ranges from 23.5156 °C to 26.0942 °C.

The DBSCAN algorithm gives results similar to those of the base algorithm, K-means,
whereas EM gives more specific production values for a given rainfall and temperature
range than K-means and DBSCAN.

VI. CONCLUSION

In this paper, selected data mining algorithms were used to cluster the data with respect
to the attributes of interest. The K-means clustering algorithm was adopted as the base
algorithm, and the DBSCAN and EM algorithms were also applied to the data. DBSCAN showed
behavior similar to that of K-means. Many data mining techniques have not yet been applied
to agricultural data.
Future work aims at applying advanced mining techniques, such as the big data technique
MapReduce, to larger datasets.
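
As a rough indication of what a MapReduce-style computation looks like, the sketch below totals production per state in three phases on a single machine. It only mimics the programming model in plain Python and does not use any distributed framework; the function names and sample values are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for state, production in records:
        yield state, production


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each group's values into a single total."""
    return {
        key: reduce(lambda a, b: a + b, values, 0.0)
        for key, values in groups.items()
    }


# Illustrative (state, production) pairs; not values from the actual dataset.
records = [("Kerala", 120.0), ("Punjab", 300.0), ("Kerala", 80.0)]
totals = reduce_phase(shuffle_phase(map_phase(records)))
```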

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to Finolex Academy of Management and
Technology for providing an encouraging environment and all the resources required for
the project work. We are thankful to all our professors and friends for their unrelenting
support. We are thankful to the Planning Commission of India for making the data
available.

REFERENCES

[1] Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann, ASIN B0058NBJ2M

[2] D. Ramesh, Vishnu Vardhan, “Analysis of Crop Yield Prediction using Data Mining
Techniques” IJRET: International Journal of Research in Engineering and Technology
eISSN: 2319-1163 | pISSN : 2321-7308

[3] D.Ramesh, B Vishnu Vardhan, “Data Mining Techniques and Applications to
Agricultural Yield Data” International Journal of Advance Research in Computer and
Communication Engineering Volume 2, Issue 9,September 2013 ISSN: 2319-5940

[4] Dr. D. Ashok Kumar, N. Kannathasan, “A Survey on Data Mining and Pattern
Recognition Techniques for Soil Data Mining” IJCSI International Journal of Computer
Science Issues, Volume 8, Issue 3, No. 1, May 2011 ISSN :1694-0814 www.ijcsr.org

[5] Dr. Jean-Claude Franchitti, “Data Mining Session 6 – Mining Frequent Patterns,
Association, and Correlations” Adapted from course textbook resources Data Mining
Concepts and Techniques (2nd Edition)
Jiawei Han and Micheline Kamber

[6] Andrew Smith, Neil Alldrin, Doug Turnbull, “Clustering with EM and K-Means”
International Journal of Advance Research in Computer and Communication Engineering

[7] Dr. D. Ashok Kumar, N. Kannathasan, “A Survey on Data Mining and Pattern
Recognition Techniques for Soil Data Mining” IJCSI International Journal of Computer
Science Issues, Volume 8, Issue 3, No. 1, May 2011 ISSN :1694-0814

[8] “The Institute connecting the dots with Big Data” September 2014,
www.theinstitute.ieee.org.in

[9] Mr. Osama Abu Abbas, “Comparison between Data Clustering Algorithms” The
International Arab Journal of Information Technology Volume 5, No. 3, July 2008

[10] Aastha Joshi, Ranjeet Kaur, “A Review: Comparative Study of Various Clustering
Techniques in Data Mining”, International Journal of Advanced Research in Computer
Science and Software Engineering Volume 3, ISSN: 2277 128X, Issue 3, March 2013

[11] Sally Jo Cunningham and Geoffrey Holmes, “Developing Innovative Applications in
Agriculture Using Data Mining”, Department of Computer Science, University of Waikato,
Hamilton, New Zealand

[12] www.planningcommission.nic.in/data/datatables/index.php?data=datatab

[13] S. Shajitha Banu, S. Manjula, S. Swathi Priya, V. Yamuna Devi, “Predictive Analysis
of Rainfall Data to Help the Farmers”, International Journal of Advanced Research in
Computer Science and Software Engineering, Volume 6, Issue 3, March 2016, ISSN: 2277 128X
