
Data Mining Application to Identify Crop Disease and Recommend a Solution

This document proposes a framework to identify crop diseases and recommend solutions using big data analytics. The framework collects agricultural data from various sources, cleanses the data by removing irrelevant information, and extracts features. It then uses Hadoop and Hive tools to analyze the data and identify crop diseases based on symptom similarity. The framework recommends solutions based on historical data and highest similarity. It was tested on identifying leaf blast disease in paddy crops in Sindh, Pakistan. The framework aims to help farmers and researchers make better decisions using large agricultural data sets.

Data Mining Application to Identify Crop Disease and Recommend a Solution

Abstract
Rapid advances in technology have brought agricultural data into the era of big
data. Traditional tools and techniques are unable to store and analyze this massive
amount of data; a parallel computing and analysis paradigm is required, and big
data analytics is used as the solution. In this paper, a big data analytics
framework for agriculture is developed that identifies a disease based on symptom
similarity and recommends a solution based on the highest similarity. To achieve
this objective, Hadoop and Hive are used, tools that are now commonly applied to
this type of problem. The data is collected, cleansed and normalized. Data is
collected from laboratory reports, web sites and similar sources. Cleansing then
extracts the important information from the unstructured, redundant data. In the
next step, normalization extracts features from the cleaned data.

INTRODUCTION
With technological advancement and the rapid growth of data, agricultural data
has entered the era of big data. Big data is a term used to describe this rapid
growth of data. The data may reside in a file system or in a database, and it
cannot be processed by traditional software techniques and databases. The main aim
of this paper is to develop a recommendation system that identifies agricultural
crop diseases and provides solutions for them. With the help of big data analytics
in agriculture, researchers can easily make decisions from historical data. It will
be a great innovation and pioneering work in human history if big data analytics is
used in agriculture [4]. Agricultural data is increasing day by day at an
astonishing rate.

recommend solutions based on historical data. This framework will help researchers
in decision making and is easy to understand. The solution that is used most often
for a particular disease has the highest priority. For demonstration, the developed
framework is used to identify paddy crop leaf blast disease and recommend a
solution. Apache Hadoop and its various tools are explained in the next section.

HADOOP AND TOOLS

Apache Hadoop

Apache Hadoop is a freely available software framework, and Hadoop is a registered
trademark of the Apache Software Foundation. Hadoop runs on Linux. It implements a
distributed processing, storage and execution environment. Hadoop is written in the
Java programming language and provides a framework for MapReduce jobs.

MapReduce

MapReduce is a programming model for parallelizable problems over large-scale data
sets using many computers, or nodes. A MapReduce program contains two methods,
map() and reduce().

The Hadoop Distributed File System (HDFS)

HDFS uses commodity hardware to provide redundant storage for bulk amounts of data.
Commodity hardware may be a higher-capacity single-CPU machine or a server. HDFS
has two types of nodes: DataNodes and the NameNode.

Apache Mahout

Mahout is mainly used in machine learning applications. Its aim is to provide an
environment for quickly creating extensible, well-performing machine learning
applications.

Apache Hive

Hive is a tool used for data warehousing. To extract data from the Hadoop system,
Hive provides an interface similar to SQL, called HiveQL (Hive Query Language).
Hive is used for querying data in a distributed environment.

Apache HBase

HBase is a distributed, non-relational DBMS, which means it does not support SQL
(Structured Query Language). HBase applications are written in Java.

Apache Pig

Pig is an open source project used as a scripting platform for evaluating and
analyzing large data sets. The compiler is the bottom layer of Pig. A significant
advantage of Pig's structure is that it supports highly parallel processing based
on parallel computing.

RELATED WORK
Big data analytics is used in various sectors. Data in the agriculture sector is
growing at a rapid rate and has also entered the era of big data. In 2015, IBM
introduced agriculture big data analytics. Various software platforms have been
developed to give farmers information about new tools and techniques

sources are used, such as web pages, databases and flat files. Predict the
healthcare benefits of different drugs and of the patient's lifestyle choices. The
risk factor for heart disease is identified from the LDL and HDL cholesterol
levels. At ideal diastolic and systolic levels, a patient's blood pressure is under
control and there is less risk of moving to the next stage of hypertension.

FRAMEWORK METHODOLOGY
The primary motive for generating results from the collected data is to serve
researchers by providing solutions for various crop diseases. Developing a new
framework that identifies a disease and recommends a solution based on symptom
similarity was not an easy task. The framework provides solutions based on
historical data, which is collected from various sources. The model is essentially
a recommendation system. Recommendation systems use historical data or knowledge
of the product; many e-commerce companies (e.g. Amazon.in) use them for sales. In
the proposed model, a recommendation system is applied to the agriculture domain.
First, data is collected from various sources, e.g. lab reports and agriculture
websites. The collected data is known as raw data because it contains
irregularities and unwanted information; it is unformatted and needs to be
formatted and checked. This data is stored on HDFS. The HDFS NameNode keeps track
of how files are broken down into file blocks and which nodes store those blocks.
Clients communicate directly with the DataNodes to process the local files
corresponding to the blocks. Data sources are:
Laboratory test reports: These are a crucial source of data for researchers. The
tests conducted include soil, water, manure and plant analysis.

Agriculture info websites: These websites act as a mentor for farmers. They give
information about agricultural economic entities, commonly used pesticides and
similar topics. They tell farmers which crop to plant, where and when, and suggest
solutions to various crop-related problems. Through these sites farmers learn about
new techniques and tools.

Agriculture department reports: These reports make decision making easier for the
crops of a particular area. They are important because they provide information
about a particular field within a geographical area.

Data collected from the above sources is stored on the Hadoop Distributed File
System as text files. The collected data is unstructured and contains irrelevant
data.
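
The NameNode bookkeeping described in this section can be pictured with a small
Python sketch. It is only an illustration, not how HDFS is implemented: the 128 MB
block size and replication factor of 3 are HDFS defaults, but the round-robin
placement, the DataNode names and the file path are hypothetical simplifications
introduced here.

```python
from dataclasses import dataclass, field
from math import ceil

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size in bytes
REPLICATION = 3                 # default HDFS replication factor


@dataclass
class NameNode:
    """Toy stand-in for the NameNode: it holds metadata only, never file contents."""
    data_nodes: list
    block_map: dict = field(default_factory=dict)  # path -> [(block index, [nodes])]

    def add_file(self, path, size_bytes):
        """Record how a file is split into blocks and which nodes hold each block."""
        n_blocks = max(1, ceil(size_bytes / BLOCK_SIZE))
        replicas = min(REPLICATION, len(self.data_nodes))
        blocks = []
        for i in range(n_blocks):
            # Round-robin placement is an assumption; real HDFS placement is rack-aware.
            nodes = [self.data_nodes[(i + r) % len(self.data_nodes)]
                     for r in range(replicas)]
            blocks.append((i, nodes))
        self.block_map[path] = blocks
        return blocks

    def locate(self, path):
        """What a client asks before reading blocks directly from the DataNodes."""
        return self.block_map[path]


# Hypothetical cluster of four DataNodes and a 300 MB raw text file.
name_node = NameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = name_node.add_file("/agri/raw/lab_reports.txt", 300 * 1024 * 1024)
for index, nodes in blocks:
    print(index, nodes)
```

A client would call locate() once and then read each block from one of the listed
DataNodes, which is why the NameNode never becomes a data-transfer bottleneck.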

First, unimportant data is removed and the relevant data is extracted from the
collected data. Features are then selected and extracted from the relevant data
and saved into a text file in the Hive data warehouse. Hive is used for querying
data in a distributed environment. Hive is an open source software tool used for
data warehousing. To extract data from the Hadoop system, Hive provides an
interface similar to SQL, called HiveQL (Hive Query Language).
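
The cleansing and feature extraction steps above might look roughly like the
following Python sketch. The raw line layout, the field names (crop, symptoms,
state) and the sample lines are assumptions made for illustration; the paper does
not specify the format of its raw data.

```python
import re

# Hypothetical raw lines as they might arrive from reports and web pages.
RAW_LINES = [
    "Crop: Paddy | Symptoms: Spindle-shaped spots on leaves!! | State: Sindh",
    "*** click here for more offers ***",
    "Crop: Paddy | Symptoms: Spindle-shaped spots on leaves!! | State: Sindh",
    "Crop: Wheat | Symptoms: yellow   stripes on leaf | State: Punjab",
]

FIELD_RE = re.compile(r"(\w+):\s*([^|]+)")  # "Name: value" segments separated by "|"
REQUIRED = ("crop", "symptoms", "state")


def cleanse(lines):
    """Drop lines lacking the required fields and remove exact duplicates."""
    seen, records = set(), []
    for line in lines:
        fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(line)}
        if not all(key in fields for key in REQUIRED):
            continue  # irrelevant line, e.g. an advertisement
        key = tuple(fields[k] for k in REQUIRED)
        if key in seen:
            continue  # redundant copy of a record already kept
        seen.add(key)
        records.append(fields)
    return records


def extract_features(record):
    """Normalize each field: lowercase, strip punctuation, collapse whitespace."""
    def norm(text):
        text = re.sub(r"[^a-z0-9\s-]", " ", text.lower())
        return " ".join(text.split())
    return [norm(record[k]) for k in REQUIRED]


def to_delimited(records):
    """Produce comma-delimited rows suitable for loading as a text file."""
    return [",".join(extract_features(r)) for r in records]


rows = to_delimited(cleanse(RAW_LINES))
for row in rows:
    print(row)
```

Because normalization replaces commas inside a field with spaces, each output row
keeps exactly one comma per field boundary, which matches a comma-delimited text
table.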

A query can be submitted to the distributed environment in three ways:

1. Command line interface
2. Application programming interface
3. Web user interface

The Thrift server is used as an interface when the client and the server use
different languages. HiveQL extracts data from the Hive data warehouse and saves
the query results into a text file stored on HDFS. This text file is then
submitted to the distributed environment to identify the crop disease name based
on the similarity of crop disease symptoms. In this process the text file is split
and submitted to mappers, which calculate the pair-based symptom similarity.
Pair-based similarity ignores spelling mistakes and word ordering, which increases
the efficiency of the recommendation system. After calculating the similarity, the
mapper creates a <key, value> pair, saves it to a file and submits it to the
reducer. In this system the disease name is the key, and the similarity, solution
and location are saved as values. The reducer calculates the average similarity
over all pairs with the same disease name (key) and selects the disease with the
highest similarity. Finally, the solution ID with the highest similarity is
selected from the file saved by the mapper.
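
The mapper and reducer steps can be sketched in plain Python as follows. The paper
does not define its pair-based similarity, so this sketch assumes one common
choice, the Dice coefficient over adjacent character pairs, which tolerates small
spelling mistakes and ignores word order. The historical records, solution IDs and
locations are hypothetical, and the code runs in a single process rather than on
Hadoop.

```python
from collections import defaultdict


def letter_pairs(text):
    """Adjacent character pairs of every word, e.g. 'leaf' -> ['le', 'ea', 'af']."""
    pairs = []
    for word in text.lower().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def pair_similarity(a, b):
    """Dice coefficient over character pairs (an assumed measure), in [0, 1]."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = list(pairs_b)
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            remaining.remove(pair)  # each pair in b can be matched only once
            shared += 1
    return 2.0 * shared / total


def mapper(observed, records):
    """Emit <disease name, (similarity, solution ID, location)> per record."""
    for disease, symptoms, solution_id, location in records:
        yield disease, (pair_similarity(observed, symptoms), solution_id, location)


def reducer(pairs):
    """Average the similarity per disease name and pick the highest average."""
    grouped = defaultdict(list)
    for disease, value in pairs:
        grouped[disease].append(value)
    averages = {d: sum(v[0] for v in vals) / len(vals) for d, vals in grouped.items()}
    best = max(averages, key=averages.get)
    return best, averages[best], grouped[best]


def recommend(observed, records):
    """Identify the disease, then take the solution ID of its closest record."""
    disease, average, values = reducer(mapper(observed, records))
    _, solution_id, _ = max(values, key=lambda v: v[0])
    return disease, average, solution_id


# Hypothetical historical records: (disease, symptoms, solution ID, location).
HISTORY = [
    ("leaf blast", "spindle shaped spots on leaves with grey centre", "S1", "Larkana"),
    ("leaf blast", "grey spindle shaped leaf spots", "S2", "Badin"),
    ("brown spot", "small brown circular spots on leaves", "S3", "Thatta"),
]

# Observed symptoms with misspellings and a different word order.
disease, average, solution_id = recommend("spots on leafs spindle shapd grey center", HISTORY)
print(disease, round(average, 3), solution_id)
```

On a real cluster the grouping by key is done by the framework's shuffle phase
between map and reduce; here the reducer groups the pairs itself to keep the
sketch self-contained.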

EXPERIMENTAL SETUP AND RESULTS

For demonstration purposes, the developed model is used to identify paddy crop
leaf blast disease based on symptom similarity and to recommend a solution for the
specific region. The query that selects the crop- and region-specific data from
the Hive data warehouse is:

    INSERT OVERWRITE LOCAL DIRECTORY '/home/raghu/Documents/'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT Dname, location, solutionID
    FROM crop-details
    WHERE crop name="qurban" AND state="sindh"
    ORDER BY location;

The above query is executed using the Java API, and the results are saved on HDFS
in text file format. This file is then submitted to MapReduce to calculate the
similarity and identify the disease name. The file is divided into parts, and each
part is given to a mapper on a different host for processing. Each mapper
calculates the pair-based symptom similarity; pair-based similarity ignores
spelling mistakes and word ordering, which increases the efficiency of the
recommendation system. After the similarity is calculated, the reducer calculates
the average similarity for each disease, as shown in the graph.

CONCLUSION

With applications of this type, present-day agricultural problems can be solved on
the spot, helping countries grow and reach the peak of their economies.
