Stock Prediction Model CEP

This document summarizes a report on using machine learning models for crop yield prediction. It introduces linear regression, polynomial regression, and XGBoost regression models and evaluates each model's ability to predict crop yields based on environmental and agricultural data. The XGBoost model achieved the highest prediction accuracy and best generalized to new data, demonstrating its promise for predicting crop yields using readily available country-wide data.

Uploaded by

Muhammad Arsal
Copyright
© © All Rights Reserved

Crop Yield Prediction

Group 9
M. Saeed Shaikh, 2022421, CE, Affan Khan, 2022047, CE, and Farhan
Moaviz, 2022166, CE

Abstract—This report investigates the application of machine learning for crop yield prediction, focusing on three different models: linear regression, polynomial regression, and XGBoost regression. Conducted in Python, the project utilized a dataset containing country-wide data on rainfall, average temperature, pesticide use, and crop yields across various countries. This dataset allowed for the assessment of the models' ability to predict crop yields based on these environmental and agricultural factors. Each model's performance was evaluated through metrics like Mean Squared Error (MSE) and R-squared, revealing that the XGBoost model achieved the highest prediction accuracy and best generalized to unseen data. This suggests that XGBoost is a promising tool for predicting crop yields based on readily available country-wide data. This research highlights the potential of machine learning as a valuable tool for farmers, agricultural stakeholders, and policymakers to optimize resource allocation, enhance crop yields, and improve food security.

Keywords—crop yield prediction, machine learning, XGBoost, linear regression, polynomial regression, country-wide data, rainfall, temperature, pesticides

I. INTRODUCTION

Crop yield prediction is a crucial task in agriculture, informing decisions across various levels. As Van Klompenburg et al. (2020) stated, "Crop yield prediction is an essential task for the decision-makers at national and regional levels (e.g., the EU level) for rapid decision-making. An accurate crop yield prediction model can help farmers to decide on what to grow and when to grow." (p. 1). Accurate predictions enable stakeholders to:

1. Optimize resource allocation: Farmers can utilize resources more efficiently by planning their planting and harvesting schedules based on predicted yields.
2. Enhance crop yields: Precise predictions allow for interventions such as adjusting irrigation and fertilization based on anticipated yields, leading to improved productivity.
3. Improve food security: Accurate forecasts of crop yields can guide policymakers in planning food distribution and storage, ensuring food availability for populations.

Machine learning offers a powerful tool for tackling the complex task of crop yield prediction. By analyzing historical data on weather, soil conditions, agricultural practices, and past yields, machine learning algorithms can discover patterns and relationships that contribute to crop yields. This report investigates the application of three different machine learning models for crop yield prediction:

1. Linear Regression: A simple yet effective model that establishes a linear relationship between input variables and crop yields.
2. Polynomial Regression: An extension of linear regression that allows for non-linear relationships between variables.
3. XGBoost Regression: A powerful ensemble learning technique that combines multiple decision trees to achieve high accuracy.

By comparing the performance of these models, this project aims to identify the most effective approach for crop yield prediction in a specific context. This report will delve into the details of each model, implementation specifics, and analysis of the outcomes. The findings will provide valuable insights into the potential of machine learning for improving crop yields and contribute to advancements in agricultural practices.

II. DETAILED LOOK INTO EACH APPROACH

A. Linear Regression

Linear Regression is a fundamental statistical technique that establishes a linear relationship between a dependent variable (crop yield) and one or more independent variables (rainfall, temperature, pesticide use). This simple model has several advantages:

• Interpretability: The coefficients obtained from linear regression represent the direct impact of each independent variable on the crop yield, making it easy to understand the relative importance of each factor.
• Ease of Implementation: As Iqbal (2021) noted, "Models obtained from linear regression techniques can be implemented easily and efficiently on systems having low computational capacity as compared to complex algorithms." This makes it a readily accessible option for resource-constrained settings.
• Computational Efficiency: The training process for linear regression is computationally inexpensive, allowing for rapid model development and analysis.
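
To make the interpretability point concrete, the following is a minimal sketch, not the report's own code. It fits a one-variable least-squares line in plain Python. The function name fit_simple_linear_regression and the rainfall and yield values are hypothetical, chosen only so the arithmetic is easy to follow; a real model would use several features at once.

```python
# Illustrative sketch only: the numbers are made up (not from the report's
# dataset) and a single feature is used to keep the arithmetic visible.

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical rainfall and crop yield values in arbitrary units.
rainfall = [400.0, 600.0, 800.0, 1000.0, 1200.0]
crop_yield = [21000.0, 25000.0, 29000.0, 33000.0, 37000.0]

slope, intercept = fit_simple_linear_regression(rainfall, crop_yield)
print(f"yield = {intercept:.1f} + {slope:.2f} * rainfall")
```

In this toy data the slope is 20, which reads directly as "one extra unit of rainfall is associated with 20 extra units of yield". That one-number summary per feature is what the interpretability advantage refers to.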



However, linear regression also has limitations:

• Limited to Linear Relationships: This model can only capture linear relationships between variables, potentially missing non-linear trends and leading to inaccurate predictions for complex systems like crop yields.
• Sensitivity to Outliers: Outliers in the data can significantly influence the model's parameters, leading to biased and inaccurate predictions.
• Overfitting with Many Features: Although a plain linear model more often underfits, it can overfit the training data when the number of features (for example, one-hot encoded categories) is large relative to the number of observations, resulting in poor generalizability to unseen data.

By analyzing the strengths and limitations of linear regression in the context of crop yield prediction, we gain valuable insights that can guide the selection and implementation of more complex models in subsequent sections.

B. Polynomial Regression

While linear regression offers a simple and interpretable approach, it often struggles to capture the complex, non-linear relationships commonly observed in agricultural systems, particularly when it comes to crop yield prediction. In such scenarios, polynomial regression provides several advantages:

1) Improved Flexibility and Fit:

Polynomial Regression addresses the limitations of linear regression by capturing non-linear relationships between variables. As Iqbal (2021) noted, "Non-linear data cannot be fit by linear regression technique (under-fitting). So, we increase the model complexity and use the polynomial regression model, which fits such big non-linear data in a better way." This model introduces non-linear terms to the equation, allowing for the representation of more complex relationships between input and output variables.

2) Modeling Complex Interactions:

Polynomial regression can capture both curvature in a single variable and interactions between input variables (through cross terms such as rainfall × temperature). For example, the effect of rainfall on crop yield might not be linear but rather quadratic, with both low and high rainfall levels potentially impacting yield negatively. Polynomial modeling can account for such effects, leading to a more accurate representation of the underlying system.

3) Adaptability to Diverse Datasets:

Unlike linear regression, which assumes a constant linear relationship across the entire data range, polynomial regression can adapt to diverse datasets with varying patterns. This is particularly beneficial when dealing with real-world agricultural data, which can exhibit significant variations due to factors like different climate zones, soil types, and agricultural practices.

4) Handling Non-Linear Trends:

Many factors influencing crop yield, such as temperature, water availability, and nutrient levels, often exhibit non-linear trends. Polynomial regression is better equipped to capture these trends and account for their impact on crop growth and yield, leading to more realistic and accurate predictions.

5) Increased Generalizability:

When implemented with appropriate model selection and regularization techniques, polynomial regression can achieve a good balance between capturing non-linear relationships and avoiding overfitting. This can lead to models that generalize better to unseen data, providing more reliable predictions across different agricultural settings.

However, it's important to remember that polynomial regression also has limitations, such as increased complexity, computational cost, and susceptibility to overfitting. Therefore, careful consideration is required when choosing between linear and polynomial approaches for crop yield prediction, taking into account the specific characteristics of the data and desired outcomes.

C. XGBoost Regression

XGBoost Regression stands out as a powerful and versatile tool for crop yield prediction. This ensemble learning technique builds multiple decision trees in sequence (gradient boosting) and combines them to achieve high accuracy and robustness. Compared to linear and polynomial regression, XGBoost offers several distinct advantages:

1) Enhanced Accuracy:

By leveraging the combined power of multiple decision trees, XGBoost can capture complex non-linear relationships and intricate interactions between various factors influencing crop yield. This often leads to improved prediction accuracy compared to simpler models like linear regression.

2) Robustness to Outliers and Noise:

XGBoost's ensemble learning nature makes it less sensitive to outliers and noise in the training data compared to single-model approaches like linear regression. This robustness leads to more reliable predictions in real-world scenarios where data quality might be imperfect.

3) Automatic Feature Selection:

Because each tree split is chosen to reduce the training loss as much as possible, XGBoost implicitly favors the most informative features, which reduces (though does not eliminate) the need for manual feature selection. This implicit selection limits the influence of irrelevant features and can improve generalizability.

4) Regularization for Avoiding Overfitting:


XGBoost incorporates built-in regularization techniques like L1 and L2 regularization to prevent overfitting, which can be a major concern with complex models. This allows the model to capture the true underlying relationships without memorizing the training data.

5) Scalability to Large Datasets:

XGBoost can efficiently handle large datasets with millions of data points, making it suitable for real-world applications with extensive agricultural data. This scalability allows for comprehensive analysis and potentially more accurate predictions across diverse landscapes.

6) Interpretability:

While XGBoost is a complex model, it offers various tools for interpretability, such as feature importance analysis. This allows researchers to understand the relative contribution of each input variable towards the model's predictions, providing valuable insights into the factors influencing crop yields.

However, it's important to note that XGBoost requires careful tuning of its hyperparameters to achieve optimal performance. Additionally, its complex nature can make it computationally more expensive compared to simpler models.

Overall, XGBoost presents a powerful approach for crop yield prediction due to its ability to capture non-linear relationships, handle diverse data, and achieve high accuracy and generalizability. Its interpretability features further enhance its value for agricultural research and decision-making.

III. IMPLEMENTATION

A. Data Selection and Preprocessing

The first crucial step in any machine learning project is selecting and preparing the data that will be used to train and evaluate the models. For this project, we utilized the Crop Yield Prediction Dataset available on Kaggle (https://www.kaggle.com/datasets/patelris/crop-yield-prediction-dataset). This dataset contains data on various factors influencing crop yield, including:

• Pesticides: Amount of pesticides used per country per year.
• Rainfall: Average rainfall per country per year.
• Temperature: Average temperature per country per year.
• Yield: Crop yield for various crops per country per year.

This data provides a valuable foundation for building and evaluating models for crop yield prediction.

Before utilizing the dataset for model training, we performed several data preprocessing steps to ensure its quality and suitability for machine learning algorithms:

1) Missing Value Imputation:

The dataset contained missing values for certain variables. We employed appropriate techniques like mean or median imputation to fill these missing values and avoid introducing bias into the models.

2) Outlier Detection and Removal:

Outliers can significantly impact the performance of machine learning models. We used statistical methods and visualizations to identify outliers and remove them from the dataset, ensuring the model learned from representative data points.

3) Data Transformation:

Some features might require further transformation to improve their suitability for certain models. For example, we might need to perform logarithmic or square root transformations to address non-linear relationships or outliers.

4) Categorical Feature Encoding:

The dataset might contain categorical features that need to be encoded into numerical values for machine learning models to understand and utilize them. Techniques like one-hot encoding or label encoding can be used for this purpose; numerical features can additionally be rescaled, for example with min-max scaling.

5) Train-Test Split:

The preprocessed data is then split into two subsets: a training set used to train the models and a test set used to evaluate their performance on unseen data. This split ensures that the final evaluation is unbiased and reflects the model's generalizability to real-world scenarios.

By performing these meticulous data preprocessing steps, we ensure that the models are trained on high-quality, clean data, paving the way for accurate and reliable predictions of crop yield.

B. Building the Models

Once our data is prepared and preprocessed, we can proceed to building the three models for crop yield prediction: Linear Regression, Polynomial Regression, and XGBoost Regression. We utilize the scikit-learn library in Python for linear and polynomial regression and the xgboost package for XGBoost regression.

1) Linear Regression:

• We import the LinearRegression class from sklearn.linear_model.
• We create a LinearRegression object and fit it to the training data.
• We obtain the model's coefficients and can interpret their meaning in terms of the impact of each feature on crop yield.

2) Polynomial Regression:

• We import the PolynomialFeatures class from sklearn.preprocessing.
• We create a PolynomialFeatures object with the desired degree of polynomial terms.
• We transform the training data using the PolynomialFeatures object to create new features based on existing ones.
• We then follow the same steps as for linear regression, but with the transformed data.

3) XGBoost Regression:

• We import the XGBRegressor class from xgboost.
• We create an XGBRegressor object and set its hyperparameters.
• We fit the XGBoost model to the training data.
• We can utilize XGBoost's built-in feature importance analysis to understand the relative contribution of each feature to the model's predictions.

By building these three models and comparing their performance, we can gain valuable insights into their effectiveness for crop yield prediction. Analyzing the strengths and limitations of each approach allows us to identify the most suitable model for this specific context and potentially leverage its capabilities for further research and development.

C. Evaluating Model Outcomes

We evaluated the performance of the linear regression, polynomial regression, and XGBoost models on the test set using the following metrics:

• Mean Squared Error (MSE): Measures the average squared difference between the predicted and actual crop yields.
• R-squared (R²): Measures the proportion of variance in the actual crop yields explained by the model.

The following table summarizes the evaluation results:

Model                    MSE         R²
Linear Regression        9.69e+08    0.755
Polynomial Regression    2.15e+08    0.945
XGBoost Regression       3.08e+08    0.922

As the table shows, both non-linear models clearly outperformed linear regression. XGBoost reduced the MSE from 9.69e+08 to 3.08e+08 and raised R² from 0.755 to 0.922, although on this test split the polynomial regression figures (MSE 2.15e+08, R² 0.945) are slightly better still. XGBoost's strong result is likely due to its ability to capture complex non-linear relationships and intricate interactions between the various factors influencing crop yield.

Furthermore, the XGBoost model's R² value of 0.922 indicates that it can explain 92.2% of the variance in the actual crop yields, which is a strong result for country-level data. This suggests that the XGBoost model can be used to generate useful predictions of crop yields, which can be valuable for farmers, agricultural stakeholders, and policymakers.

IV. SUMMARY OF RESULTS

This project investigated the application of three different machine learning models for crop yield prediction: linear regression, polynomial regression, and XGBoost regression. The analysis utilized a dataset containing country-wide data on rainfall, average temperature, pesticide use, and crop yields across various countries. Each model's performance was evaluated based on its ability to accurately predict crop yields using these environmental and agricultural factors.

A. Evaluation Metrics:

• Mean Squared Error (MSE): Measures the average squared difference between predicted and actual crop yields.
• R-squared (R²): Measures the proportion of variance in the actual crop yields explained by the model.

B. Model Performance:
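
As a minimal sketch of how the two evaluation metrics are computed, the plain-Python functions below follow the definitions given in this report. The function names and the four sample values are hypothetical and serve only as a worked example; they are not the report's code or data. scikit-learn offers equivalent helpers (mean_squared_error and r2_score in sklearn.metrics), though the report does not state which implementation was used.

```python
# Illustrative sketch only: toy numbers, not results from the report.

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted yields in arbitrary units.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mse = mean_squared_error(actual, predicted)  # (4 + 4 + 9 + 9) / 4
r2 = r_squared(actual, predicted)            # 1 - 26 / 500
print(f"MSE = {mse:.2f}, R^2 = {r2:.3f}")
```

Because MSE is expressed in squared yield units, values on the order of 1e+08 are plausible when yields themselves are in the tens of thousands, whereas R² is unitless and is therefore the easier of the two to compare across datasets.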
C. Key Findings:

XGBoost Regression achieved a far lower MSE and a higher R² than linear regression, indicating clearly superior performance over the linear baseline; according to the results table, polynomial regression scored slightly better still on the test split. XGBoost's R² value of 0.922 suggests it can explain 92.2% of the variance in actual crop yields.

The results highlight XGBoost's potential as a powerful tool for crop yield prediction, especially when dealing with complex data with non-linear relationships.

D. Implications:

Accurate crop yield predictions provide valuable insights for stakeholders, enabling informed decision-making in various areas:

• Resource allocation: Farmers can optimize resource utilization by planning planting and harvesting schedules based on predicted yields.
• Crop yield enhancement: Precise predictions facilitate interventions like adjusting irrigation and fertilization, leading to improved productivity.
• Food security improvement: Accurate yield forecasts guide policymakers in planning food distribution and storage, ensuring food availability for populations.

E. Future Directions:

• Further research could investigate the use of different data sources, including satellite imagery and soil data, to enhance prediction accuracy.
• Exploring ensemble models combining XGBoost with other machine learning algorithms might lead to further improvements in prediction performance.
• Adapting and refining the presented methodology for specific crops and regions can offer tailored solutions for diverse agricultural contexts.

V. CONCLUSION

This research demonstrates the potential of machine learning, particularly XGBoost regression, for crop yield prediction. By leveraging readily available, country-wide data, XGBoost substantially outperformed the linear regression baseline, demonstrating its capability to forecast crop yields accurately. This result has implications across the agricultural landscape, impacting farmers, policymakers, and the broader community.

A. Empowered Farmers, Optimized Resources:

Armed with precise yield predictions, farmers gain greater control over resource allocation. Planting and harvesting strategies, fertilization and irrigation practices, and land utilization can be optimized based on anticipated yields, minimizing waste, maximizing output, and reducing costs.

B. Enhanced Crop Management, Proactive Interventions

Accurate forecasts empower farmers to proactively manage their crops, mitigating potential risks and maximizing yield. Anticipated yield losses due to weather events or pest outbreaks can trigger timely interventions, such as adjusted irrigation, tailored fertilizers, and targeted pest control, supporting crop health and resilience.

C. Informed Policy Decisions, Stabilizing Food Systems

Reliable yield forecasts equip policymakers with valuable data for strategic decision-making. Informed policies on food distribution, storage, and trade can be formulated, addressing potential food shortages and surpluses. This proactive approach supports equitable access to food resources, helps stabilize prices, and bolsters food security across regions.

D. Looking Ahead, a Future of Abundance

The path forward in crop yield prediction is paved with continuous research and development. Expanding data sources, optimizing models, ensuring accessibility, and addressing ethical considerations are crucial aspects for realizing the full potential of this technology. By harnessing the power of machine learning, we can usher in a new era for agriculture, one where sustainability, resilience, and abundance are the cornerstones of a food-secure world.

REFERENCES

[1] T. Van Klompenburg, A. Kassahun, and Ç. Çatal, “Crop yield prediction using machine learning: A systematic literature review,” Computers and Electronics in Agriculture, vol. 177, p. 105709, Oct. 2020, doi: 10.1016/j.compag.2020.105709.
[2] M. A. Iqbal, “Application of Regression Techniques with their Advantages and Disadvantages,” ResearchGate, Sep. 2021, [Online]. Available: https://www.researchgate.net/publication/354921553_Application_of_Regression_Techniques_with_their_Advantages_and_Disadvantages
[3] “Crop Yield Prediction Dataset,” Kaggle, Dec. 01, 2021. https://www.kaggle.com/datasets/patelris/crop-yield-prediction-dataset/
