Predictive Modeling For Optimal Crop Yields Using Machine Learning

S. Sravan Kumar1 [9920004595], S. Amutha2, T. Govardhan Reddy3 [9920004596],
T. Kumar4 [9920004585] and K. Janakiram5 [9920004604]

1 Department of Computer Science and Engineering, Kalasalingam Academy of
Research and Education, Virudhunagar dt, Tamil Nadu – 626126, India

Corresponding Author*: [email protected]

Abstract - Advanced technology has become essential in the rapidly changing field of
agriculture, where shifting soil conditions, erratic weather patterns, and changing insect
dynamics create uncertainty. This research applies machine learning algorithms to address
the issues facing contemporary agriculture by providing accurate production predictions.
Through a comprehensive investigation of agricultural data covering a variety of factors,
including meteorological data, soil characteristics, and insect dynamics, the main objective
is to transform traditional farming practices.

Keywords - Machine Learning, NLP, Python, Flask, HTML, CSS, JavaScript and Bootstrap

I. INTRODUCTION
Agriculture, the backbone of human civilization, plays a crucial role in sustaining the
world's food supply and supporting its ever-expanding population. To feed the projected
9.7 billion people by 2050, innovative solutions are imperative to maximize crop yields and
enhance agricultural productivity. Predictive modeling emerges as a viable option, using
machine learning to anticipate crop yields based on various contributing factors.
Traditional farming methods often rely on intuition, experience, and historical knowledge,
which may fall short of meeting the needs of a growing population and a rapidly changing
environment. Machine learning has the potential to revolutionize agricultural
decision-making through its ability to analyze vast datasets, discern intricate patterns,
and offer accurate forecasts.
The significance of precise crop output projections cannot be overstated. Early
information on potential yields enables farmers to optimize resource allocation, adapt
agricultural practices, and mitigate the impact of environmental conditions. Accurate
forecasts not only minimize resource wastage but also reduce environmental impact,
fostering sustainable agriculture and increasing overall agricultural production.

II. LITERATURE SURVEY


The work in [1] focuses on estimating agricultural production using machine learning
methods. Using available data for Tamil Nadu, the Random Forest algorithm is applied to
predict agricultural production and to illustrate how accurate crop yield estimates can be.
The usefulness and flexibility of Random Forests for predicting agricultural yields both
locally and globally are examined in [2]. According to that study, Random Forest
outperforms multiple linear regression (MLR), with high accuracy and precision. AdaNaive
and AdaSVM, ensemble machine learning models for estimating crop output over a given time
period, are introduced in [3]; AdaBoost is used to improve the performance of the SVM and
Naive Bayes algorithms. A machine learning technique for forecasting agricultural output
based on meteorological variables is described in [4]. That study presents Crop Advisor, an
easily navigable website that uses the C4.5 algorithm to determine the climate factors with
the greatest impact on crop production in Madhya Pradesh districts. Crop-soil and
fertilizer matching techniques are covered in [5], published in the International Journal
of Advanced Research in Computer Science and Electronics Engineering. The article discusses
soil micronutrient shortages and offers suggestions for raising crop yields. The work
presented in [6], published in the International Journal of Research in Technology and
Engineering, aims to give farmers an easy-to-use interface for analyzing rice production
using data that is already available. Several data mining techniques, such as the K-Means
algorithm, are used to predict agricultural productivity. A thorough summary of the
research on machine learning's application to agricultural production systems can be found
in [7]. Using Support Vector Machines (SVM) for the implementation, the study highlights
the relevance of big data technologies and high-performance computing for data-intensive
operations in agriculture. A study on precision agriculture that addresses field
variability by using remote sensors and GIS techniques is described in [8]. Precision
agriculture approaches are implemented there through ensemble learning (EL). Random Forests
for global and regional crops, in which the forecast is the average of the predictions of
the individual trees, are covered in [9]. The work implements the k-nearest neighbor
technique with Support Vector Regression (SVR) to highlight the dependability of Random
Forests for agricultural output estimates at both regional and global scales.

III. DATA ENTRY AND INVESTIGATIVE STUDY

A. Cleaning Data:

1. Elimination of Superfluous Data and Superfluous Columns: To guarantee that every
observation in the dataset was unique, duplicate entries were found and eliminated.
Removing these duplicates, which may have been caused by mistakes in data input or
collection, improves the dependability of the dataset. 'Year' and other extraneous
columns were also removed, simplifying the dataset for more targeted investigation.

2. Numerical Rainfall Data Conversion from Non-Numeric Source: Several fields, most
notably the rainfall data, were originally entered in non-numeric form and were converted
to numeric values. This stage is crucial for quantitative analysis because it makes
statistical calculations possible and allows a more thorough examination of the data.
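The two cleaning steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' code: the sample values are invented, the column names 'Unnamed: 0' and 'average_rain_fall_mm_per_year' are taken from Section 4.2.2, and dropping rows whose rainfall cannot be parsed is an assumed policy.

```python
import pandas as pd

# Toy frame standing in for the raw dataset; all values are invented.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "Area": ["India", "India", "Brazil", "Brazil"],
    "Item": ["Rice", "Rice", "Maize", "Maize"],
    "average_rain_fall_mm_per_year": ["1083", "1083", "1761", ".."],
})

RAIN = "average_rain_fall_mm_per_year"

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Drop index-like columns that carry no information.
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    # Remove exact duplicate observations.
    df = df.drop_duplicates()
    # Coerce rainfall to numbers; unparseable entries become NaN.
    df = df.assign(**{RAIN: pd.to_numeric(df[RAIN], errors="coerce")})
    # Assumed policy: discard rows whose rainfall could not be parsed.
    return df.dropna(subset=[RAIN]).reset_index(drop=True)

cleaned = clean(raw)
```

Using `errors="coerce"` keeps the conversion from failing on stray text and makes the unparseable rows easy to inspect or drop afterwards.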

B. Exploratory Data Analysis (EDA):

1. Examining Particular Regions and Crop Types: A closer look at the agricultural data
revealed the distinct crop types and geographical regions it contains. This exploration
shows the wide diversity of places and crops represented, providing important context for
further research. Understanding these distinct features of the dataset supports a fuller
understanding of the agricultural practices it describes.

2. Data Distribution Visualization Using Count Plots: Count plots were used to show how
the data is distributed across categories, including regions and crop types. These
graphics give a brief summary of category frequency. Count plots of areas and crop types
offer valuable information on the prevailing agricultural practices in different
countries. Visualization is a technique for finding patterns, trends, or anomalies in the
dataset.
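The numbers behind a count plot are simply per-category row counts. The sketch below computes them with pandas on invented sample rows; the column names 'Area' and 'Item' follow Section 4.3.1, and the helper name `category_counts` is hypothetical.

```python
import pandas as pd

# Invented sample rows; 'Area' and 'Item' are the categorical columns named in Section 4.3.1.
df = pd.DataFrame({
    "Area": ["India", "India", "Brazil", "India", "Brazil"],
    "Item": ["Rice", "Maize", "Maize", "Rice", "Wheat"],
})

def category_counts(frame: pd.DataFrame, column: str) -> pd.Series:
    # Rows per category, most frequent first: the bar heights of a count plot.
    return frame[column].value_counts()

area_counts = category_counts(df, "Area")
item_counts = category_counts(df, "Item")
```

With seaborn installed, `seaborn.countplot(data=df, x="Area")` draws the same frequencies as bars.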

Data processing includes steps to remove duplicates and unnecessary columns to
preserve data integrity. Exploratory Data Analysis uses visualization to identify
patterns and trends while delving into understanding the unique characteristics
of regions and crop types. These preparatory actions lay the foundation for
deeper analysis and the eventual creation of models in the study.
IV. METHODOLOGY

4.1 Synopsis

To estimate crop yields, this study uses a thorough technique that includes data collection,
preprocessing, machine learning model selection, and training. The aim is to provide a
workable framework for precise crop yield forecasts.

Figure 1: Flow Chart for Methodology of Crop Yield Prediction

4.2 Gathering and Preparing Data

4.2.1 Information Gathering


This study's agricultural dataset comes from reputable sources and includes crucial
information on the year, average rainfall, use of pesticides, temperature, location, and
crop type. This dataset serves as the foundation for creating a reliable prediction model.

4.2.2 Preprocessing the Data


Preprocessing was carried out in detail to guarantee data quality and compatibility:

- Removal of Duplicate Entries and Unnecessary Columns: Unnecessary columns such as
"Unnamed: 0" were removed to streamline analysis, and duplicate entries were removed to
guarantee data integrity.

- Conversion to Numeric Format: Non-numeric entries in the
'average_rain_fall_mm_per_year' column were converted to numerical format to enable
quantitative analysis.

- Handling Missing or Inconsistent Data: To provide a comprehensive and trustworthy
dataset, appropriate procedures were used to handle any missing or inconsistent data.

- Feature Selection: For model training, pertinent features such as crop type, crop year,
average rainfall, pesticide usage, temperature, and geographic location were chosen.
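The last two steps in the list above can be sketched as follows. The text does not say which procedure was used for missing data, so dropping incomplete rows is an assumption, as are the 80/20 split, the column names 'Year', 'pesticides_tonnes', 'avg_temp', and 'yield', and all sample values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented rows. Only 'Area', 'Item' and 'average_rain_fall_mm_per_year' are named
# in the text; the remaining column names are assumptions.
df = pd.DataFrame({
    "Area": ["India", "India", "Brazil", "Brazil", "Kenya", "Kenya"],
    "Item": ["Rice", "Maize", "Maize", "Wheat", "Maize", "Rice"],
    "Year": [2010, 2011, 2010, 2011, 2010, 2011],
    "average_rain_fall_mm_per_year": [1083.0, 1083.0, 1761.0, np.nan, 630.0, 630.0],
    "pesticides_tonnes": [40.0, 42.0, 300.0, 310.0, 5.0, 6.0],
    "avg_temp": [25.1, 25.4, 24.0, 24.2, 21.3, 21.5],
    "yield": [32000, 21000, 45000, 28000, 15000, 30000],
})

FEATURES = ["Area", "Item", "Year", "average_rain_fall_mm_per_year",
            "pesticides_tonnes", "avg_temp"]
TARGET = "yield"

# Assumed policy for missing values: drop rows with any gap in the used columns.
complete = df.dropna(subset=FEATURES + [TARGET])

# Feature selection: keep only the chosen predictors and the target.
X = complete[FEATURES]
y = complete[TARGET]

# Hold out part of the data for testing; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Imputation (for example with a column median) would be an alternative to dropping rows when too much data would otherwise be lost.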

4.3 Selecting and Training Machine Learning Models

4.3.1 Scaling and Encoding Features


A preprocessing pipeline made use of the `ColumnTransformer` from scikit-learn.
Numerical variables were standardized using the `StandardScaler`, whereas
categorical variables ('Area' and 'Item') were one-hot encoded. This made the
machine-learning models compatible with a variety of feature types.
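A minimal sketch of such a pipeline is shown below. The sample rows and the numeric column name 'avg_temp' are assumptions; `handle_unknown="ignore"` and `sparse_threshold=0` are illustrative choices, not settings reported in the study.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented rows; numeric column names other than rainfall are assumptions.
X = pd.DataFrame({
    "Area": ["India", "Brazil", "Kenya", "India"],
    "Item": ["Rice", "Maize", "Maize", "Maize"],
    "average_rain_fall_mm_per_year": [1083.0, 1761.0, 630.0, 1083.0],
    "avg_temp": [25.1, 24.0, 21.3, 25.4],
})

categorical = ["Area", "Item"]
numeric = ["average_rain_fall_mm_per_year", "avg_temp"]

preprocessor = ColumnTransformer(
    transformers=[
        # Standardize numeric columns to zero mean and unit variance.
        ("num", StandardScaler(), numeric),
        # One-hot encode categories; categories unseen at fit time encode as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense array, for easier inspection
)

X_encoded = preprocessor.fit_transform(X)
```

The output has one column per numeric feature followed by one column per category level, here 2 + 3 + 2 = 7 columns.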

4.3.2 Choosing the Model


To forecast crop yields, several regression models were taken into consideration:

- Lasso Regression (lss)
- Linear Regression (lr)
- K-Nearest Neighbours Regression (knr)
- Ridge Regression (rg)
- Decision Tree Regression (DTR)

Every model was trained using the pre-processed training set to investigate various
regression strategies and evaluate their efficacy.
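The sketch below trains the five candidate models on synthetic data, each wrapped in the same kind of preprocessing pipeline. The data, the column names 'rainfall' and 'avg_temp', and the use of library-default hyperparameters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic training data for illustration only.
rng = np.random.default_rng(0)
n = 60
X_train = pd.DataFrame({
    "Item": rng.choice(["Rice", "Maize", "Wheat"], size=n),
    "rainfall": rng.uniform(400, 2000, size=n),
    "avg_temp": rng.uniform(10, 30, size=n),
})
y_train = (20 * X_train["rainfall"] + 500 * X_train["avg_temp"]
           + rng.normal(0, 1000, size=n))

def make_preprocessor() -> ColumnTransformer:
    # A fresh transformer per model so that fitted state is never shared.
    return ColumnTransformer([
        ("num", StandardScaler(), ["rainfall", "avg_temp"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Item"]),
    ])

# Keys follow the abbreviations used in the text; hyperparameters are defaults.
models = {
    "lss": Lasso(),
    "lr": LinearRegression(),
    "knr": KNeighborsRegressor(),
    "rg": Ridge(),
    "dtr": DecisionTreeRegressor(random_state=0),
}

# Fit every candidate on the same training data.
fitted = {
    name: make_pipeline(make_preprocessor(), model).fit(X_train, y_train)
    for name, model in models.items()
}
```

Keeping preprocessing inside each pipeline means the scaler and encoder are fitted on training data only, which avoids leaking test-set statistics into the models.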
4.3.3 Model Assessment
Model performance was evaluated using conventional regression measures, namely Mean
Squared Error (MSE) and R-squared. This comparison made it easier to choose the best
model for precise crop yield forecasts.
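Both measures can be computed directly from their definitions: MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses invented numbers and checks the hand-written versions against scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean of squared residuals.
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # 1 - (residual sum of squares) / (total sum of squares around the mean).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented observed and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

manual_mse = mse(y_true, y_pred)        # 0.5
manual_r2 = r_squared(y_true, y_pred)   # 0.9

library_mse = mean_squared_error(y_true, y_pred)
library_r2 = r2_score(y_true, y_pred)
```

A lower MSE and an R-squared closer to 1 both indicate a better fit, so the two measures normally rank models in the same order on a given test set.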

V. FINDINGS AND DISCUSSION


5.1 Results Presentation

Promising outcomes were obtained when crop yields were predicted using machine
learning algorithms.

The following significant results were noted:


5.1.2 Analysis of Crop-wise Yield
Investigating crop-wise yield has provided important insight into the patterns that
govern agricultural output. Examining the overall yield for each type of crop revealed a
more detailed picture of the varied agricultural environment. We produced bar plots with
Seaborn and Matplotlib to illustrate how yield differs across crops. These visualizations
made it possible to determine the proportional contribution of each crop to the total
agricultural output.
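The quantities plotted in such a bar chart are per-crop totals, and the proportional contributions follow by dividing by the grand total. A minimal pandas sketch with invented values and an assumed 'yield' column name:

```python
import pandas as pd

# Invented rows; 'Item' is the crop column named in Section 4.3.1, 'yield' is assumed.
df = pd.DataFrame({
    "Item": ["Rice", "Maize", "Rice", "Wheat", "Maize"],
    "yield": [30, 20, 10, 25, 15],
})

# Total yield per crop, largest first: the bar heights.
total_by_crop = df.groupby("Item")["yield"].sum().sort_values(ascending=False)

# Each crop's share of the overall total.
share_by_crop = total_by_crop / total_by_crop.sum()
```

With Matplotlib installed, `total_by_crop.plot.bar()` renders these totals as a bar plot.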

5.1.3 Bar Plot-Based Insights


Designed to illustrate agricultural yield, the bar graphs provided comprehensive
information about the relative economic importance of different crops in different areas.
Each bar represented the overall yield for a particular crop, allowing for an easy-to-
understand visual comparison. The bars' varied heights drew attention to differences in
crop yields as well as the financial significance of various crops in the agricultural
system. For academics, policymakers, and stakeholders looking for more in-depth
knowledge of the economic dynamics related to certain crops, this information is
essential.
Figure 2: Bar Plot of Crop Per Yield.

5.1.4 Dissecting the Dynamics of Agriculture


The thorough crop-by-crop yield study revealed the intricate dynamics
underlying agricultural practices, going beyond simple numerical statistics. The
bar plots made it easier to identify economically important and high-yielding
crops. This investigation paved the way for strategic agricultural planning while
also advancing our knowledge of crop-specific production. Finding these trends
is critical to improving crop diversity, allocating resources as efficiently as
possible, and promoting sustainable farming methods.
5.1.5 Crops' Economic Significance
Beyond just numbers, the crop-by-crop yield study provides insights into the
relative economic importance of different crops in different geographic areas.
Stakeholders can use this data to guide their decisions about market strategy,
crop selection, and resource allocation. The bar plots gave an understandable
depiction of each crop's economic impact, which helped people understand the
agricultural environment and how it affects local economies.
5.2 Key Findings Discussion
5.2.1 Model Effectiveness
Determining which regression models were most useful for predicting crop productivity
required a thorough analysis of each one. Comprehensive training and testing were
conducted on Decision Tree Regression, Lasso, Ridge, K-Nearest Neighbours, and
Linear Regression, and each model showed distinct performance characteristics.
The models varied in how accurately they forecast crop yields. Decision Tree Regression
stood out as the most promising model, with the lowest Mean Squared Error and the
highest R-squared value on the test dataset. As a result, Decision Tree Regression is the
recommended model for precise and trustworthy crop yield forecasts.

5.2.2 Crop Yield-Relating Factors


The goal of the investigation was to pinpoint the major variables affecting crop output,
providing insightful information for agricultural decision-making. Temperature,
pesticide use, and average rainfall have all been shown to be important factors in
determining agricultural productivity.

5.3 Consequences and Prospects


5.3.1 Allocating Resources and Planning for Agriculture
The results of this study have important ramifications for resource allocation and
agricultural planning. The predictive model gives decision-makers in the agriculture
industry a powerful instrument for optimizing resource usage. Understanding key
variables like temperature, average rainfall, and pesticide use allows for strategic
planning, which in turn helps stakeholders make well-informed decisions that support
sustainable agriculture practices. This knowledge is especially helpful when dealing
with the difficulties brought on by shifting climatic conditions.

The interpretable Decision Tree Regression model captured the complex interactions
among temperature, pesticide use, and average rainfall well. For stakeholders and
policymakers looking to understand the precise influence of each feature on
agricultural output, this interpretive capacity is essential. Understanding the impact of
variables such as temperature, rainfall, and pesticide use helps farmers make
well-informed decisions and take preventative action to increase crop output and reduce
hazards.

5.2.3 The Decision Tree Model's Interpretability


The Decision Tree Regression model's interpretability offered a special benefit in
addition to its superior predictive ability. The decision-making process of the model is
transparent by nature, providing a clear view of how temperature, average rainfall, and
pesticide consumption contribute to each prediction. This interpretability is valuable
for stakeholders who need information that can be put into practice. By understanding
each feature's influence, practitioners can better focus their strategies, allocate
resources well, and tailor interventions to increase agricultural output.
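Two scikit-learn facilities illustrate this transparency: the fitted tree's `feature_importances_` attribute and the `export_text` helper, which prints the learned split rules. The sketch below uses synthetic data in which the target depends mainly on rainfall; the feature names, the data, and `max_depth=3` are assumptions made for illustration, not results from the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
feature_names = ["average_rainfall", "pesticide_use", "avg_temp"]

# Synthetic features in [0, 1]; the target depends strongly on rainfall,
# weakly on temperature, and not at all on pesticide use.
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] + 1 * X[:, 2] + rng.normal(0, 0.1, size=200)

# A shallow tree keeps the printed rules short enough to read.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importance of each feature; the values sum to 1.
importances = dict(zip(feature_names, tree.feature_importances_))

# Human-readable if/else rules learned by the tree.
rules = export_text(tree, feature_names=feature_names)
```

Printing `rules` shows the thresholds the tree splits on, which is the kind of information a practitioner can check against domain knowledge.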

5.2.4 Real-World Consequences


The results highlight the usefulness of the chosen model and the factors that were found.
Decision Tree Regression is a useful tool for those involved in the agriculture industry
because of its consistent performance and ease of interpretation. Understanding the
effects of temperature, pesticide use, and average rainfall supports resource allocation
and strategy development. This understanding guides interventions to improve crop
output, adjust to climatic fluctuations, and promote sustainable agricultural practices.

5.3.2 Adjusting to Changing Climate Circumstances


The predictive model has the potential to adapt to changing climate circumstances in
addition to streamlining present procedures. An accurate model that takes into account
dynamic elements like temperature and precipitation is necessary as climate variability
rises. Using the model's findings, decision-makers can put adaptive measures in place
to make sure that agricultural practices are resilient to shifting climatic trends. This
flexibility is essential to reduce risks and promote long-term sustainability in
agriculture.

5.3.3 Prospective Research Paths


Subsequent investigations ought to concentrate on improving the accuracy of the
model and broadening its use. Including more features, such as data on crop diseases and
soil quality, stands out as a critical step toward a deeper understanding of the
complex variables affecting agricultural productivity. Adding these features has the
potential to greatly improve the model's predictive power. Furthermore,
investigating more sophisticated machine learning strategies and ensemble approaches
offers encouraging avenues for development. These methods might increase model
accuracy and resilience, making agricultural production predictions more dependable.

5.3.4 Conclusion and Outlook


To sum up, the findings highlight the potential of the predictive model for crop
yield forecasting and offer insightful information about the variables influencing
agricultural output. In addition to outlining useful implications for policymakers, the
discussion suggested future research initiatives to advance the subject. The combination
of technological innovation and data-driven insights has the potential to revolutionize
agricultural methods, promote sustainability, and support global food security as we
move forward.

VI. CONCLUSION

This work is a comprehensive investigation of crop yield prediction that incorporates
important environmental elements and makes use of a variety of regression models. Our
research's contributions and conclusions provide important new information for the
agriculture industry. The Decision Tree Regression model stood out as the most
promising model due to its strong performance in crop production prediction.

Figure 3: Crop Yield Prediction interface for taking inputs.


6.1 Principal Inputs and Results
This study's main contribution is the creation and assessment of prediction models for
estimating crop yields. After a thorough investigation of several regression methods,
Decision Tree Regression was shown to be a reliable and accurate model. This finding
creates opportunities for agricultural stakeholders to use predictive methods for
scheduling and allocating resources. Additionally, our data demonstrated how
important environmental variables like temperature, pesticide use, and average rainfall
are in determining crop output. Decision-makers are better equipped to optimize
agricultural methods, adjust to changing climate circumstances, and increase overall
production with the help of this information.

Figure 4: Predictive Suggestion for Crop Yield for the following given inputs.

6.2 The Study's Limitations


Although our study offers insightful information, it has certain drawbacks. Despite their
effectiveness, the prediction models are essentially dependent on past data and
assumptions. Uncertainties may be introduced by environmental variation and
unanticipated outside factors. Furthermore, the models may show limits when
extrapolating to areas or circumstances that are not well represented in the training
data, and their predictive power is limited by the dataset that is currently available.

6.3 Ideas for Further Research


To advance agricultural yield prediction, further studies should look at several
approaches. Expanding the dataset to include other characteristics, such as data on crop
diseases and soil quality, may enhance the models' accuracy and resilience. There is
great potential for improvement in investigating ensemble approaches and more
sophisticated machine-learning techniques. Furthermore, research on particular crop
types and geographical particularities may offer more nuanced and localized insights for
customized agricultural strategy.

Ultimately, our research adds to the growing body of knowledge in agricultural data
science by laying the groundwork for data-driven decision-making. Although the study's
limitations are acknowledged, the knowledge gathered and the models created provide
a foundation for future developments in crop yield prediction, providing useful
instruments for resilient and sustainable agriculture practices.
VII. ACKNOWLEDGMENTS
We would like to express our sincere thanks to the people and institutions whose
assistance and efforts made this research project a success.

We would first and foremost like to sincerely thank S. Amutha for all of her help and
mentorship during this endeavor. Her knowledge and insights have greatly advanced our
understanding of agricultural data science and helped us create reliable prediction
models.

We would like to express our thanks to Kaggle for supplying the dataset needed for this
research. Access to this data was essential to the investigation and analysis
carried out in this study, allowing us to draw significant inferences and create
forecasting models for crop output.

We would like to thank the scientific community as well as our colleagues and peers for
their informative conversations and helpful criticism. Their input has been crucial in
helping us improve our approach and broaden the focus of our study.

Finally, we want to express our gratitude for the steadfast support that our family and
friends have given us during the difficulties of this research journey. Their support and
understanding have provided us with courage and inspiration.

Without the cooperation and assistance of everyone listed above, this research would not
have been feasible. We appreciate your crucial participation in our scholarly and
investigative activities.

VIII. REFERENCES
[1] D. Elavarasan, D. R. Vincent, V. Sharma, A. Y. Zomaya, and K. Srinivasan,
"Forecasting yield by integrating agrarian factors and machine learning models: A
survey," Computers and Electronics in Agriculture, vol. 155, pp. 257-282, 2018.

[2] J. Zhang, Y. Luo, Z. Zhang, F. Tao, L. Zhang, J. Cao, et al., "Integrating multi-source
data for rice yield prediction across China using machine learning and deep learning
approaches," Agricultural and Forest Meteorology, vol. 297, p. 108275, 2021.

[3] T. Van Klompenburg, A. Kassahun, and C. Catal, "Crop yield prediction using machine
learning: A systematic literature review," Computers and Electronics in Agriculture,
vol. 177, p. 105709, 2020.

[4] D. J. Reddy and M. R. Kumar, "Crop yield prediction using machine learning
algorithm," in 5th International Conference on Intelligent Computing and Control
Systems (ICICCS), IEEE, May 2021, pp. 1466-1470.

[5] M. Rashid, B. S. Bari, Y. Yusup, M. A. Kamaruddin, and N. Khan, "A thorough
evaluation of agricultural yield prediction using machine learning algorithms with
specific emphasis on palm oil production prediction," IEEE Access, vol. 9,
pp. 63406-63439, 2021.

[6] L. Zhang, Z. Zhang, Y. Luo, J. Cao, R. Xie, and S. Li, "Integrating satellite-derived
climatic and vegetation indices to predict smallholder maize yield using deep learning,"
Agricultural and Forest Meteorology, vol. 311, p. 108666, 2021.

[7] Y. Guo, Y. Fu, F. Hao, X. Zhang, W. Wu, X. Jin, et al., "Integrated phenology and
climate in rice yields prediction using machine learning methods," Ecological
Indicators, vol. 120, p. 106935, 2021.

[8] N. K. Newlands, D. S. Zamar, L. A. Kouadio, Y. Zhang, A. Chipanshi, A. Potgieter,
et al., "An integrated, probabilistic model for improved seasonal forecasting of
agricultural crop yield under environmental uncertainty," Frontiers in Environmental
Science, vol. 2, p. 17, 2014.

[9] S. Ju, H. Lim, J. W. Ma, S. Kim, K. Lee, S. Zhao, et al., "Optimal county-level crop
yield prediction using MODIS-based variables and weather data: A comparative study on
machine learning models," Agricultural and Forest Meteorology, vol. 307, p. 108530, 2021.

[10] P. Feng, B. Wang, L. Liu, D. Li, C. Waters, D. Xiao, et al., "Hybrid approach using
a biophysical model and machine learning technique improves dynamic wheat yield
forecasts," Agricultural and Forest Meteorology, vol. 285, p. 107922, 2020.

[11] S. V. Joshua, A. S. M. Priyadharson, R. Kannadasan, A. A. Khan, W. Lawanont,
F. A. Khan, et al., "Crop yield prediction using machine learning approaches on a wide
spectrum," Computers, Materials & Continua, vol. 72, no. 3, pp. 5663-5679, 2022.

[12] A. Chlingaryan, S. Sukkarieh, and B. Whelan, "Machine learning approaches for
nitrogen status estimation and crop yield prediction in precision agriculture: A review,"
Computers and Electronics in Agriculture, vol. 151, pp. 61-69, 2018.
