Data Mining Application To Identify Crop Disease and Recommendation A Solution

This document proposes a framework to identify crop diseases and recommend solutions using big data analytics. The framework collects agricultural data from various sources, cleanses the data by removing irrelevant information, and extracts features. It then uses Hadoop and Hive tools to analyze the data and identify crop diseases based on symptom similarity. The framework recommends solutions based on historical data and highest similarity. It was tested on identifying leaf blast disease in paddy crops in Sindh, Pakistan. The framework aims to help farmers and researchers make better decisions using large agricultural data sets.

Abstract
Rapid advances in technology have brought agricultural data into the era of big
data. Traditional tools and techniques cannot store and analyze this massive
amount of data; a parallel computing and analysis paradigm is required, and big
data analytics provides one. In this paper, a big data analytics framework for
agriculture is developed that identifies a disease based on symptom similarity
and recommends the solution with the highest similarity. To achieve this
objective, Hadoop and Hive are used, tools that are now widely applied to
problems of this type. The data is collected, cleansed, and normalized. Data is
collected from laboratory reports, websites, and similar sources. Cleansing then
extracts the important information from the unstructured, redundant data. In the
next step, normalization extracts features from the cleaned data.

INTRODUCTION
With technological advancement and the rapid growth of data, agricultural data
has entered the era of big data. Big data is a term used to describe this rapid
growth of data. The data may reside in a file system or in a database, and it
cannot be processed by traditional software techniques and databases. The main
aim of this paper is to develop a recommendation system that identifies
agricultural crop diseases and provides solutions for them. With the help of big
data analytics for agriculture, researchers can easily make decisions from
historical data. Using big data analytics in agriculture would be a significant
innovation and a pioneering effort [4]. Agricultural data is increasing day by
day at an astonishing rate.

The proposed framework identifies crop diseases from their symptoms and
recommends solutions based on historical data. It helps researchers in decision
making and is easy to understand. The solution that is used most often for a
particular disease has the highest priority. For demonstration, the developed
framework is used to identify leaf blast disease in paddy crops and recommend a
solution. Apache Hadoop and its various tools are explained in the next section.

HADOOP AND TOOLS

Apache Hadoop

Apache Hadoop is a freely available, open source software framework; Apache
Hadoop is a registered trademark of the Apache Software Foundation. Hadoop runs
on Linux. It implements a distributed processing, storage, and execution
environment. Hadoop is written in the Java programming language and provides a
framework for MapReduce jobs.

MapReduce

MapReduce is a programming model for parallelizable problems over large data
sets, using many computers (nodes). A MapReduce program contains two methods:
map() and reduce().

The Hadoop Distributed File System (HDFS)

HDFS uses commodity hardware to provide redundant storage for bulk amounts of
data. Commodity hardware may be a higher-capacity single-CPU machine or a
server. HDFS has two types of nodes: DataNodes and the NameNode.

Apache Mahout

Mahout is mainly used in machine learning applications. Its aim is to build an
environment for quickly creating scalable, performant machine learning
applications.

Apache Hive

Hive is a data warehousing tool. To extract data from the Hadoop system, Hive
provides an SQL-like interface called HiveQL (Hive Query Language). Hive is used
for querying data in a distributed environment.

Apache HBase

HBase is a distributed, non-relational DBMS, which means it does not support SQL
(Structured Query Language). HBase applications are written in Java.

Apache Pig

Pig is an open source project used as a scripting platform for evaluating and
analyzing large data sets. The compiler is the bottom layer of Pig. A
significant advantage of Pig is that its structure supports highly parallel
tasks based on parallel computing.
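To make the map() and reduce() methods mentioned under MapReduce more concrete, the following is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names (map_record, shuffle, reduce_counts, run_job) and the sample records are assumptions made for illustration only. It counts how many records mention each disease.

```python
from collections import defaultdict

def map_record(line):
    """Map step: emit a (disease, 1) pair for one 'disease, location' record."""
    disease, _location = line.split(",")
    yield disease.strip().lower(), 1

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: combine all values that share one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of input lines."""
    pairs = [pair for line in lines for pair in map_record(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

# Made-up sample records, not data from the paper.
records = ["Leaf blast, Larkana", "Brown spot, Badin", "leaf blast, Thatta"]
print(run_job(records))
```

On a real cluster the map calls run in parallel on different nodes and the grouping step is performed by the framework; the sketch only shows how the two user-supplied methods fit together.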

RELATED WORK
Big data analytics is used in various sectors. Data in the agriculture sector is
growing at a rapid rate and has also entered the era of big data. In 2015, IBM
introduced big data analytics for agriculture. Various software platforms have
been developed to give farmers information about new tools and techniques.

Various data sources are used, such as web pages, databases, and flat files. In
healthcare, big data analytics is used to predict the benefits of different
drugs and of a patient's lifestyle choices. The risk factor for heart disease is
identified from the LDL and HDL cholesterol levels. At ideal diastolic and
systolic levels, a patient's blood pressure is under control and there is less
risk of moving to the next stage of hypertension.

FRAMEWORK METHODOLOGY
The primary motive for generating results from the collected data is to serve
researchers by providing solutions for various crop diseases. Developing a new
framework that identifies a disease and recommends a solution based on symptom
similarity was not an easy task. The framework provides solutions based on
historical data, which is collected from various sources. The model works as a
recommendation system. Recommendation systems use historical data or knowledge
of the product; many e-commerce companies (e.g. Amazon.in) use recommendation
systems for sales. In the proposed model, a recommendation system is applied to
the agriculture domain. First, data is collected from various sources, e.g. lab
reports and agriculture websites. The collected data is known as raw data
because it contains irregularities and unwanted information. The data is
therefore unformatted and needs formatting. This data is stored on HDFS. The
NameNode of HDFS keeps track of how files are broken down into file blocks and
which nodes store those blocks. Clients communicate directly with the DataNodes
to process the local files corresponding to the blocks.

Data sources are:

Laboratory test reports: These are a crucial source of data for researchers. The
tests conducted include soil, water, manure, and plant analysis.

Agriculture information websites: These websites act as a mentor for farmers.
They give information related to agricultural economic entities, commonly used
pesticides, etc. They tell farmers which crop to plant, where, and when, and
suggest solutions to various crop problems. Through these sites, farmers learn
about new techniques and tools.

Agriculture department reports: These reports make decision making easier for
the crops of a particular area. They provide important information about
particular fields within a geographical area.

Data collected from the above sources is stored on the Hadoop Distributed File
System as text files. The collected data is unstructured and contains irrelevant
data.
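The way the NameNode tracks file blocks, as described above, can be pictured with a toy model. The sketch below is not the HDFS API: the function names, the 32-byte block size, the replication factor of 2, and the DataNode names are all assumptions chosen to keep the example small (real HDFS blocks are far larger).

```python
def split_into_blocks(data, block_size):
    """Break file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, datanodes, replication=2):
    """Toy NameNode metadata: block index -> DataNodes that hold a replica."""
    if replication > len(datanodes):
        raise ValueError("replication cannot exceed the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Made-up sample file contents, not data from the paper.
file_contents = b"paddy,leaf blast,sindh\n" * 3
blocks = split_into_blocks(file_contents, block_size=32)
block_map = place_blocks(len(blocks), ["datanode1", "datanode2", "datanode3"])

for index, nodes in block_map.items():
    print(f"block {index} ({len(blocks[index])} bytes) stored on {nodes}")
```

A client that wants block 2 would ask the NameNode-like mapping which nodes hold it and then read the bytes from one of those DataNodes directly.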

First, unimportant data is removed and relevant data is extracted from the
collected data. Features are then selected and extracted from the relevant data
and saved into a text file in the Hive data warehouse. Hive is an open source
data warehousing tool used for querying data in a distributed environment. To
extract data from the Hadoop system, Hive provides an SQL-like interface called
HiveQL (Hive Query Language).
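The cleansing and feature extraction steps described above might look roughly like the following sketch. The paper does not specify the raw data layout, so the "field: value" format, the field names in WANTED_FIELDS, the semicolon-separated symptoms, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed field names; the paper does not list the actual features.
WANTED_FIELDS = ("crop", "disease", "symptoms", "location")

def cleanse(raw_text):
    """Keep only 'field: value' lines whose field is in WANTED_FIELDS."""
    kept = {}
    for line in raw_text.splitlines():
        match = re.match(r"\s*(\w+)\s*:\s*(.+)", line)
        if match and match.group(1).lower() in WANTED_FIELDS:
            kept[match.group(1).lower()] = match.group(2).strip()
    return kept

def extract_features(record):
    """Normalize a cleansed record: lowercase values, split symptoms into a list."""
    features = {key: value.lower() for key, value in record.items()}
    symptoms = features.get("symptoms", "")
    features["symptoms"] = [s.strip() for s in symptoms.split(";") if s.strip()]
    return features

# Made-up raw text with irrelevant lines mixed in.
raw = """Visit our site for more offers!
Crop: Paddy
Disease: Leaf Blast
Symptoms: Spindle shaped spots on leaves; grey centre
Location: Sindh
Page 3 of 7
"""
features = extract_features(cleanse(raw))
print(features)
```

The resulting records could then be written out as delimited text lines for loading into the Hive data warehouse.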

A query can be submitted in the distributed environment in three ways:

1. By using the command line interface
2. Through the application programming interface
3. Through the web user interface

The Thrift server is used as an interface when the client and server use
different languages. HiveQL extracts data from the Hive data warehouse and saves
the query results into a text file stored on HDFS. This text file is then
submitted to the distributed environment to identify the crop disease name based
on the similarity of crop disease symptoms. The text file is split and the
splits are submitted to mappers, which calculate pair-based symptom similarity.
Pair-based similarity ignores spelling mistakes and word ordering, which
increases the efficiency of the recommendation system. After calculating
similarity, the mapper creates a <key, value> pair, saves it to a file, and
submits it to the reducer. In this system the disease name is the key, and the
similarity, solution, and location are saved as values. The reducer calculates
the average similarity over records with the same disease name (key) and selects
the disease with the highest similarity. Finally, the solution ID with the
highest similarity is selected from the file saved by the mapper.
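The mapper and reducer steps above can be sketched in plain Python as follows. The paper does not define "pair-based similarity"; this sketch assumes it means a Dice coefficient over adjacent letter pairs, a measure that is insensitive to word order and tolerant of small spelling mistakes. The function names, the record layout, and the sample records are likewise assumptions, and a single reducer function stands in for the per-key reducers of a real MapReduce job.

```python
from collections import defaultdict

def letter_pairs(text):
    """Adjacent letter pairs of every word, e.g. 'leaf' -> ['le', 'ea', 'af']."""
    pairs = []
    for word in text.lower().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs

def pair_similarity(a, b):
    """Dice coefficient over letter pairs: 2 * shared / total, between 0 and 1."""
    pairs_a, remaining = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(remaining)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            remaining.remove(pair)  # each pair in b can be matched only once
            shared += 1
    return 2.0 * shared / total

def mapper(query_symptoms, records):
    """Emit (disease, (similarity, solution_id, location)) for each record."""
    for disease, symptoms, solution_id, location in records:
        yield disease, (pair_similarity(query_symptoms, symptoms), solution_id, location)

def reducer(mapped):
    """Average similarity per disease; return the best disease and its best solution."""
    grouped = defaultdict(list)
    for disease, value in mapped:
        grouped[disease].append(value)
    averages = {d: sum(v[0] for v in vals) / len(vals) for d, vals in grouped.items()}
    best = max(averages, key=averages.get)
    _, solution_id, _ = max(grouped[best], key=lambda v: v[0])
    return best, averages[best], solution_id

# Made-up records: (disease, symptoms, solution ID, location).
records = [
    ("leaf blast", "spindle shaped spots on leaves", "S1", "larkana"),
    ("leaf blast", "grey spots on leaf", "S2", "badin"),
    ("brown spot", "round brown lesions", "S3", "thatta"),
]
# The query is deliberately reordered and misspelled ("spindel").
disease, average, solution = reducer(mapper("leaves spots shaped spindel", records))
print(disease, round(average, 3), solution)
```

Because the comparison works on letter pairs rather than whole words, the reordered and misspelled query still scores highest against the first leaf blast record.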

EXPERIMENTAL SETUP AND RESULTS

For demonstration purposes, the developed model is used to identify leaf blast
disease in paddy crops based on symptom similarity and to recommend a solution
for the specific region. The query that selects crop- and region-specific data
from the Hive data warehouse is:

    INSERT OVERWRITE LOCAL DIRECTORY '/home/raghu/Documents/'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT Dname, location, solutionID
    FROM crop_details
    WHERE crop_name = "qurban" AND state = "sindh"
    ORDER BY location;

The above query is run through the Java API and its results are saved on HDFS in
text file format. This file is then submitted to MapReduce, which calculates the
similarity to identify the disease name. The file is divided into parts and each
part is given to a different host for processing; the mappers on these hosts
calculate pair-based symptom similarity. Pair-based similarity ignores spelling
mistakes and word ordering, which increases the efficiency of the recommendation
system. After the similarity is calculated, the reducer calculates the average
for each disease, as shown in the graph.

CONCLUSION

With applications of this kind, present-day agricultural problems can be solved
on the spot, helping countries grow and strengthen their economies.
