Crop Price Prediction System Using Machi
Crop Price Prediction System Using Machi
Research Paper
ABSTRACT: Price Prediction, nowadays, has become a very important agricultural problem which is to be
solved only based on the available data. The aim of this paper is to predict the crop price for the next rotation.
This work is based on finding suitable data models that helps in achieving high accuracy and generality for
price prediction. For solving this problem, different Data Mining techniques were evaluated on different data
sets. This work presents a system which uses data analytics techniques in order to predict the price of the crop.
The proposed system will apply machine learning algorithms and predict the price of the crop based on multiple
factors like Area harvested, Area planted etc. This provides a farmer with an insight of what the future price of
the crop that he is going to harvest. Thus, the paper develops a system by integrating data from various sources,
data analytics, prediction analysis which can help predict the target price of the crop and increase the profit
margins of farmer helping them over a longer run. The complete research comes up to a conclusion that
XGBoost is the suitable technique for our project.
KEYWORDS: Data Analytics, Prediction, Machine Learning, Linear Regression, Decision Trees, XGBoost,
Neural
Networks
Received 04 Apr., 2020; Accepted 19 Apr., 2020 © The author(s) 2020.
Published with open access at www.questjournals.org
I. INTRODUCTION
Agriculture is the backbone of every economy. From ancient period, agriculture is considered as the
main and the foremost culture practiced in any region. There are multiple ways to increase and improve the crop
yield and the quality of the crops. Data mining is also useful for predicting the crop price. Generally, data
mining is the process of analyzing data from different perspectives and summarizing it into useful information.
Crop price prediction is an important agricultural problem. Each and every farmer always tries to know, how
much price he will get from his expectation. In the past, price prediction was calculated by analyzing farmer's
previous experience on a particular crop. Accurate information about history of crop yield is an important thing
for making decisions related to price prediction of the crops. Therefore, this paper proposes an idea to predict
the price of the crop. Data Analytics is the process of examining data sets in order to draw conclusions about the
information they contain, increasingly with the aid of specialized systems and software. Earlier, crop price
prediction was performed by considering the farmer's experience on a field and crop. However, as the
conditions change day by day very rapidly, farmers are forced to cultivate more and more crops. Being this as
the current situation, many of them don‟t have enough knowledge about the losses that might incur and are not
completely aware of the benefits they get while farming them. The proposed system applies machine learning
and prediction algorithm like Logistic Regression, Decision Trees, XGBoost, Neural Nets, and Clustering to
identify the pattern among data and then process it. This in turn will help predict the target price of the crop.
for the prediction which are Artificial Neural Networks, Information Fuzzy Network and Data Mining
techniques. Finally, it is seen that Artificial Neural Network is the suitable technique for the project.
III. METHODOLOGY
Tools Used: The whole system is implemented using python programming language in the jupyternotebook.
Workflow: The approach for analysis is shown in Figure 1.
Data Ingestion:
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be
accessed, used, and analyzed by an organization. The destination is typically a data warehouse, data mart,
database, or a document store. Sources may be almost anything – including SaaS data, in-house apps, databases,
spreadsheets, or even information scraped from the internet. The data ingestion layer is the backbone of any
analytics architecture. Downstream reporting and analytics systems rely on consistent and accessible data. There
are different ways of ingesting data, and the design of a particular data ingestion layer can be based on various
models or architectures.
Data Preprocessing:
Data Preprocessing is a data mining technique used to transform the raw data into useful and efficient format.
The data here goes through 2 stages
1. Data Cleaning: It is very important for data to be error free and free of unwanted data. So, the data is
cleansed before performing the next steps. Cleansing of data includes checking for missing values, duplicate
records and invalid formatting and removing them.
2. Data Transformation: Data Transformation is transformation of the datasets mathematically; data is
transformed into appropriate forms suitable for data mining process. This allows us to understand the data more
keenly by arranging the 100‟s of records in an orderly way. Transformation includes Normalization,
Standardization, Attribute Selection.
Also, from the monthly trends graphs it is observed that China‟s imports are historically positively correlated to
the US exports trend, but in the recent years, Cina‟s import growth rate is becoming higher than US export
increase rate, which in trun implies China also possibly sources it‟s supply from other parts of the world.
Similar to the yearly trends, US exports roughly follows areas planted and harvested whereas China imports is
in stable growth irrespective of the yield and production.
The monthly and yearly time trends allowed us to identify the relationship of price with area planted and
harvested. Price is inversely proportional to the area planted and harvested
Clustering is used to answer the question „Which years have similar profiles of US Soybean production, export
and import?‟
Elbow curve is used to find the optimal number of clusters as 3 and the visualization of the centroids of the
clusters is as follows:
Modeling: Modelling of data involves creating a data model for the data to be stored in the database. The
process of modeling means training a Machine Learning Algorithm to predict the labels from the features,
tuning it for business need, and validating it on the hold out data. The output from modeling is a trained model
that can be used for inference, making predictions on new data points. Modeling is independent of the previous
steps in the Machine Learning process and has standardized inputs which means we can alter the prediction
problem without needing to rewrite all our code. If the business requirements change, we can generate new label
times, build corresponding features, and input them into the model. Models are implemented and later evaluated
for their accuracies using root mean square error
Following are the plots of actual vs predicted values for each model:
Linear Regression Decision Trees
IV. CONCLUSION
In this paper certain Data Analytics techniques were adopted in order to estimate crop price analysis
with existing data. Linear Regression is employed for discovering important information from the agricultural
datasets. Neural Network is constructed for price prediction to increase the accuracy percentage. The root mean
square error is calculated for every technique to accurately measure the accuracy of each system employed and
the most accurate system is then selected. Strong dependencies between soybean yield/production and
exports/imports is observed, similarly between US soybean export and other countries' import, for example,
China. US soybean business has gone through 4 to 6 typical periods, from starting to grow before 2008, to big
fluctuations of both yield and prices from 2009 to 2014 and to adjustment/transition in 2014 and then the stable
increasing stage after 2015. It is reliable to use soybean growth areas, yield/productions and export/import
profiles to predict the nearby_close market price, which is of practical values for financial purposes. XGBoost
predicts the target better when compared to all the other algorithms.
REFERENCES
[1]. A Survey on Crop Prediction using Machine Learning Approach
[2]. Predicting Yield of the Crop Using Machine Learning Algorithm
[3]. Crop Price Forecasting System Using Supervised Machine Learning Algorithms
[4]. www.kaggle.com for datasets
Corresponding Author: Pandit Samuel 19 | Page
Crop Price Prediction System using Machine learning Algorithms
[5]. https://en.wikipedia.org/wiki/Agriculture
[6]. https://en.wikipedia.org/wiki/Data_analysis
[7]. JeetendraShenoy, YogeshPingle, “IOT in agriculture”, 2016 IEEE.
[8]. M.R. Bendre, R.C. Thool, V.R.Thool, “Big Data in Precision agriculture”,Sept,2015 NGCT.
[9]. Monali Paul, Santosh K. Vishwakarma, Ashok Verma, “Analysis of Soil Behavior and Prediction of Crop Yield using Data Mining
approach”, 2015 International Conference on Computational Intelligence and Communication Networks.
[10]. Abdullah Na, William Isaac, ShashankVarshney, Ekram Khan, “An IoT Based System for Remote Monitoring of Soil
Characteristics”, 2016 International Conference of Information Technology.
[11]. Dr.N.Suma, Sandra Rhea Samson, S.Saranya, G.Shanmugapriya, R.Subhashri, “IOT Based Smart Agriculture Monitoring System”,
Feb 2017 IJRITCC.
[12]. N.Heemageetha, “A survey on Application of Data Mining Techniques to Analyze the soil for agricultural purpose”, 2016IEEE.
Pandit Samuel,etal.“Crop Price Prediction System using Machine learning Algorithms." Quest Journals
Journal of Software Engineering And Simulation, Vol. 06, No. 01, 2020, Pp. 14-20.