Viraj Project Documentation

The project report presents an analysis of agriculture crop production in India, aiming to identify trends, high-performing crops, and utilize machine learning techniques for predictions. It includes methodologies for data collection, preprocessing, exploratory data analysis, and predictive modeling using various libraries such as Pandas and scikit-learn. The findings are intended to inform agricultural policy and enhance productivity and sustainability in Indian agriculture.

Uploaded by

shaikhaaqif

A

Project Report on
India Agriculture Crop
Production Analysis
Submitted to
UNIVERSITY OF MUMBAI
In partial fulfillment of the degree
of Master of Computer Science
Project By:
Mr. Viraj Vasudev Pawasakar
Exam Seat No:
1183893
Under the Guidance of
Mrs. Rupali Agavekar
Navkokan Education Society’s
D.B.J College, Chiplun
(2023-2024)
Navkokan Education Society’s

D.B.J. COLLEGE, CHIPLUN


NAAC Reaccredited Grade ‘A’ (CGPA 3.15)
DEPARTMENT OF COMPUTER SCIENCE

CERTIFICATE
This is to certify that Mr. Viraj Vasudev Pawasakar of
MSc. Part-II (Semester IV) Computer Science has
successfully completed the Project in Machine Learning
and has submitted the same to my satisfaction during the
academic year 2023-24 towards partial fulfillment of
MSc. Part-II (Semester IV) Computer Science,
University of Mumbai.

Date:
Guide Signature:

INCHARGE
Department of Computer Science
Acknowledgement

It is my great pleasure to take this opportunity to
sincerely thank all those who showed me the way to a
successful project and helped me greatly during its
completion.
I greatly thank my Project Guide, Mrs. Rupali
Agavekar, without whom the completion of this project
would not have been possible.
My sincere thanks to the respected Head of the Computer
Science Department, Mr. S. J. Nalawade, for providing all
the facilities, including the availability of the Computer
Lab. I take this opportunity to express my deep gratitude
towards all the members of the Computer Science
Department for helping me in the completion of the
project.
My special thanks to my parents, my friends, and all
those who encouraged me and helped me to complete this
project successfully and on time.

Mr. Viraj Vasudev Pawasakar


M.Sc. Part-II (Computer Science)
Table of Contents
Sr. No  Title
1.  Topic
2.  Implementation Details
3.  Experimental Setup and Results
4.  Analysis of the Results
5.  Conclusion
6.  Future Enhancement

India Agriculture Crop
Production Analysis
Mr. Viraj Vasudev Pawasakar

A dissertation submitted to D.B.J. College, Chiplun, in partial
fulfillment of the requirements for the degree of M.Sc. in
Computer Science (Machine Learning). July 2024

2. Implementation Details

This project focuses on analyzing agricultural crop production
in India. The aim of this analysis is to provide insights into
crop production trends, identify high-performing crops and
districts, and use data visualization and machine learning
techniques to understand and predict agricultural productivity.

Project Overview
India is one of the largest agricultural producers in the world,
and understanding the dynamics of crop production is crucial for
ensuring food security and optimizing resource allocation. This
project leverages historical data on crop production to derive
meaningful insights.

Aim of the Project


The primary aim of this project is to analyze and visualize crop
production data in India to uncover patterns and trends that can
inform agricultural policy and decision-making. By identifying
the factors that contribute to high crop yields, stakeholders can
develop strategies to enhance productivity and sustainability in
Indian agriculture.

Libraries and Frameworks Used

• Streamlit:
Streamlit is a framework for creating web applications with
Python. It is used for building interactive and customizable
web-based interfaces for data analysis, machine learning, and
more.

• Pandas:
Pandas is a powerful data manipulation and analysis library. It
provides data structures like DataFrames and Series, which
are essential for handling structured data.

• NumPy:
NumPy is a fundamental package for numerical computing in
Python. It provides support for large, multi-dimensional
arrays and matrices, along with a collection of mathematical
functions to operate on these arrays.

• Matplotlib:
Matplotlib is a comprehensive library for creating static,
animated, and interactive visualizations in Python. pyplot is a
module in Matplotlib that provides a MATLAB-like interface
for plotting.

• Seaborn:
Seaborn is built on top of Matplotlib and provides a
higher-level interface for drawing attractive and informative
statistical graphics. It simplifies the process of creating
complex visualizations such as heatmaps, violin plots, and
more.

• scikit-learn:
Scikit-learn is a versatile machine learning library for Python.
It includes various tools for supervised and unsupervised
learning, such as regression, classification, clustering, and
dimensionality reduction. LinearRegression is a model class
for fitting linear regression models, and train_test_split is a
function for splitting data into training and testing sets.
mean_squared_error is a function that calculates the mean
squared error between predicted values and actual values,
commonly used to evaluate regression models.

Implementation Steps

1. Setting up the Environment
Ensure that Python and the necessary libraries are installed. A
virtual environment can be created for the project and the
required libraries installed into it.

2. Loading the Dataset
Load the Indian Agriculture Crop Production Data into a Pandas
DataFrame.

3. Data Overview and Preprocessing
Get an overview of the dataset and preprocess it as necessary.

4. Exploratory Data Analysis (EDA)
Analyze the data using data visualizations to understand trends
and patterns.

5. Trend Analysis
Analyzes the trends in crop production to identify patterns and
seasonal variations.

6. Future Data Prediction using Linear Regression Model
Predicts future crop production based on historical data.

7. Correlation Analysis
Examines the relationships between different variables to
understand their interdependencies.

8. Seasonal Analysis
Analyzes the seasonal patterns in crop production to understand
the impact of seasons.

9. Linear Regression
Applied to predict future crop production based on historical
data.

10. Train-Test Split
Used to validate the performance of the predictive models.

11. Yield Prediction Model (Mean Squared Error)
Evaluates the accuracy of the yield prediction model using the
Mean Squared Error metric.

3. Experimental Setup and Results
Microsoft Visual Studio Code:

Visual Studio Code is a source-code editor that can be used with a
variety of programming languages, including Java, JavaScript, Go,
Node.js, Python and C++. It is based on the Electron framework, which
is used to develop Node.js web applications that run on the Blink layout
engine. Visual Studio Code employs the same editor component
(codenamed "Monaco") used in Azure DevOps (formerly called Visual
Studio Online and Visual Studio Team Services).

Instead of a project system, it allows users to open one or more
directories, which can then be saved in workspaces for future reuse. This
allows it to operate as a language-agnostic code editor for any language.
It supports a number of programming languages and a set of features
that differs per language. Unwanted files and folders can be excluded
from the project tree via the settings. Many Visual Studio Code features
are not exposed through menus or the user interface but can be accessed
via the command palette.

Visual Studio Code can be extended via extensions available through a
central repository. This includes additions to the editor and language
support. A notable feature is the ability to create extensions that add
support for new languages, themes, and debuggers, perform static code
analysis, and add code linters using the Language Server Protocol.

CSV

A CSV (Comma-Separated Values) file is a plain text file that stores
tabular data in a simple format, making it easy to import and export data
between different applications. Each line in a CSV file corresponds to a
row in the table, with fields separated by commas. The first line
typically contains headers that describe the fields. CSV files are highly
portable and widely supported, allowing data exchange across various
platforms and software. Their simplicity also makes them easy to create,
read, and edit with any text editor.

CSV files are especially useful in data analysis and machine learning
projects where large datasets need to be processed efficiently. Their
straightforward structure allows for quick parsing and integration with
numerous data processing libraries in programming languages like
Python, R, and Java. For instance, in Python, libraries such as pandas
provide robust tools for reading, writing, and manipulating CSV data,
facilitating tasks like data cleaning, transformation, and visualization.
Furthermore, the simplicity of CSV files keeps overhead low and
minimizes compatibility issues, making them a practical choice for both
small-scale data operations and large-scale data workflows.
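
The header-plus-rows layout described above can be illustrated with a minimal sketch. It uses Python's built-in csv module rather than the pandas calls used in this project, so that it runs without extra dependencies. The sample rows and the read_rows helper are hypothetical and are not taken from the project dataset; only the column names follow the dataset description in this report.

```python
import csv
import io

# Hypothetical sample rows: the values are invented for illustration only.
SAMPLE_CSV = """State,Crop,Year,Area,Production
Maharashtra,Rice,2018-19,1500.0,3000.0
Maharashtra,Wheat,2018-19,1000.0,1800.0
Punjab,Wheat,2018-19,3500.0,17500.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, converting numeric fields to float."""
    reader = csv.DictReader(io.StringIO(text))  # first line is used as the header
    rows = []
    for row in reader:
        row["Area"] = float(row["Area"])
        row["Production"] = float(row["Production"])
        rows.append(row)
    return rows


rows = read_rows(SAMPLE_CSV)
# Sum the production of one crop across all states in the sample.
total_wheat = sum(r["Production"] for r in rows if r["Crop"] == "Wheat")
print(len(rows), total_wheat)
```

In the project itself the same file-to-table step is a single pandas read_csv call; the sketch only makes the "one line per row, first line as header" structure explicit.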

Methodology

The methodology of this project involves several steps to
analyze Indian agriculture crop production and derive
meaningful insights, as listed below:

Data Collection:
• Gather data from reliable sources, including parameters like
crop type, year, area under cultivation, production, yield,
and weather conditions.
• Obtain data in CSV format for easy storage and analysis.

Data Preprocessing:
• Data Cleaning: Address missing values, remove
duplicates, and correct inconsistencies.
• Data Transformation: Ensure correct data types and
create derived features as needed.
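
The cleaning and type-conversion steps above can be sketched in plain Python as follows. This is an illustrative sketch only: the project code performs these steps with pandas, and the clean_records helper, its required parameter, and the sample records are all hypothetical.

```python
def clean_records(records, required=("Area", "Production")):
    """Drop rows with missing required values, drop exact duplicates,
    and convert the required fields to float."""
    seen = set()
    cleaned = []
    for rec in records:
        # Treat None and empty strings as missing values.
        if any(rec.get(key) in (None, "") for key in required):
            continue
        # Use the full row content as the duplicate-detection key.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        row = dict(rec)
        for key in required:
            row[key] = float(row[key])  # ensure a numeric data type
        cleaned.append(row)
    return cleaned


# Hypothetical raw records, as they might look straight after parsing a CSV.
raw = [
    {"State": "Punjab", "Crop": "Wheat", "Area": "100", "Production": "450"},
    {"State": "Punjab", "Crop": "Wheat", "Area": "100", "Production": "450"},  # duplicate
    {"State": "Kerala", "Crop": "Rice", "Area": "", "Production": "200"},      # missing Area
    {"State": "Bihar", "Crop": "Maize", "Area": "50", "Production": "100"},
]
cleaned = clean_records(raw)
print(len(cleaned))
```

Dropping incomplete rows is the simplest policy and matches what the application does later; imputing missing values would be an alternative when too many rows would otherwise be lost.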

Exploratory Data Analysis (EDA):
• Conduct EDA to understand data distribution, identify
trends, and detect outliers.
• Use visualizations (histograms, box plots, scatter plots,
heatmaps) to explore variable relationships.

Correlation Analysis:
• Calculate correlation coefficients to evaluate relationships
between variables like rainfall, temperature, and crop yield.
• Identify key factors significantly correlated with crop
production.
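
As an illustration of the correlation coefficient mentioned above, the following sketch computes the Pearson coefficient from its definition. The pearson function name and the rainfall and yield figures are assumptions made for this example; they are not values from the project dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations: seasonal rainfall (mm) and crop yield (tonnes/hectare).
rainfall_mm = [800.0, 950.0, 1100.0, 1250.0]
yield_t_ha = [2.0, 2.3, 2.6, 2.9]
r = pearson(rainfall_mm, yield_t_ha)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one; the coefficient says nothing about causation.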

Predictive Modeling:
• Model Selection: Choose machine learning models (e.g.,
Linear Regression) for future crop production prediction.
• Model Training: Split data into training and testing sets,
then train the models.
• Model Evaluation: Use metrics such as Mean Squared
Error (MSE) to assess model accuracy.
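
The three modeling steps above can be sketched without any third-party library. This is a simplified stand-in for the scikit-learn workflow used in the project: fit_line and mse are hypothetical helper names, the production series is synthetic, and the split is chronological (the last years are held out) rather than the random split that train_test_split performs by default.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Synthetic, exactly linear series (invented for illustration only).
years = list(range(2010, 2020))
production = [100.0 + 5.0 * (year - 2010) for year in years]

# Hold out the last two years as the test set.
train_x, test_x = years[:8], years[8:]
train_y, test_y = production[:8], production[8:]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
error = mse(test_y, predictions)
print(slope, error)
```

Because the synthetic series is exactly linear, the held-out error is essentially zero; on real production data the MSE on the test set indicates how far the linear trend is from the observed values.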

Database Description

• Year: The year in which the data was recorded (e.g., 2018-19,
2019-20).
• Crop: The type of crop being analyzed (e.g., rice, wheat, maize).
• Area: The area under cultivation, typically measured in hectares.
• Production: The total production of the crop, usually measured in
tonnes.
• Yield: The yield of the crop, calculated as production per unit area
(e.g., tonnes per hectare).
• Geographical Location: Details about the location of cultivation,
including state, district, and village.

Fields              DataTypes
State               object
District            object
Crop                object
Year                object
Season              object
Area                float64
Area Units          object
Production          float64
Production Units    object
Yield               float64
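
To make the field definitions above concrete, the sketch below models one row of the dataset and derives two values from it: the starting year of a season label such as 2018-19, and the yield as production per unit area. The CropRecord class and the sample values are hypothetical; only the field names and the yield definition come from the description above.

```python
from dataclasses import dataclass


@dataclass
class CropRecord:
    """One row of the dataset, limited to the fields needed for this example."""
    state: str
    district: str
    crop: str
    year: str        # stored as text, e.g. "2018-19"
    season: str
    area: float      # area under cultivation
    production: float

    @property
    def start_year(self):
        # "2018-19" -> 2018: keep the part before the hyphen.
        return int(self.year.split("-")[0])

    @property
    def yield_per_unit_area(self):
        # Yield is defined as production divided by area.
        return self.production / self.area


# Invented sample values, for illustration only.
record = CropRecord("Maharashtra", "Ratnagiri", "Rice", "2018-19", "Kharif", 200.0, 500.0)
print(record.start_year, record.yield_per_unit_area)
```

Keeping Year as text mirrors the object data type listed in the table; converting it to an integer is only needed when the year is used as a numeric input, for example in a regression.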

4. Analysis of the results
Code:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def scroll_to_top():
    # Inject a script tag intended to scroll the page back to the top.
    scroll_to_top_js = """
<script>
window.scrollTo(0, 0);
</script>
"""
    st.markdown(scroll_to_top_js, unsafe_allow_html=True)


def main():
    scroll_to_top()

    @st.cache_data
    def load_data():
        # Read the dataset once; Streamlit caches the result across reruns.
        data = pd.read_csv('India Agriculture Crop Production.csv')
        return data

    data = load_data()

    # Custom CSS to make the sidebar collapsible
    st.markdown(
        """
<style>
.css-1d391kg {
    transition: margin-left 0.3s;
}
.css-1d391kg[data-expanded="false"] {
    margin-left: -20rem;
}
.css-1d391kg[data-expanded="true"] {
    margin-left: 0;
}
</style>
""",
        unsafe_allow_html=True,
    )

    # Sidebar content: one navigation button per page
    st.sidebar.title("Navigation")

    page_names = [
        "Introduction",
        "Analysis of Data",
        "Data Cleaning",
        "Visual Analysis",
        "Trend Analysis",
        "Correlation Analysis",
        "Seasonal Analysis",
        "Yield Prediction Model",
    ]
    for page_name in page_names:
        if st.sidebar.button(page_name):
            st.session_state.page = page_name

    # Initialize session state variables if they don't exist.
    # Each flag records whether the corresponding chart is currently shown.
    toggle_flags = [
        'show_crop_production_years',
        'show_crop_production_state',
        'show_area_cultivation_state',
        'show_share_area_cultivation_year',
        'show_production_state_year',
        'show_production_crop_year',
        'show_selected_state_crop_production',
        'show_selected_crop_production_top_states',
        'show_total_production_rice_wheat',
        'show_heat_map_average_yield_by_state_year',
        'show_total_production',
        'show_future_data_prediction',
        'show_seasonal_analysis',
        'show_yield_prediction_model',
    ]
    for flag in toggle_flags:
        if flag not in st.session_state:
            st.session_state[flag] = False

    if 'page' not in st.session_state:
        st.session_state.page = "Introduction"

    if st.session_state.page == "Introduction":
        st.title("India Agriculture Crop Production Analysis")
        st.write("""
## Welcome to the Introduction Tab
This project is focused on analyzing the agriculture crop
production in India. The aim of this analysis is to
provide insights into crop production trends, identify
high-performing crops and districts, and utilize various
data visualization and machine learning techniques to
understand and predict agricultural productivity.

### Project Overview


India is one of the largest agricultural producers in the
world, and understanding the dynamics of crop
production is crucial for ensuring food security and
optimizing resource allocation. This project leverages
historical data on crop production to derive meaningful
insights.

### Types of Analysis Conducted


- **Data Cleaning**: Prepares the data for analysis by
handling missing values, outliers, and inconsistencies.
- **Crop-wise Analysis**: Identifies the top crops in terms of
production.
- **District-wise Analysis**: Identifies the top districts in
terms of crop production.
- **Year-wise Analysis**: Analyzes the data with the year as
the main factor.

### Data Visualizations Used


- **Bar Charts**: Used to display the average production of
top crops and districts.
- **Line Charts**: Used to show trends in crop production over
time (if applicable).
- **Scatter Plots**: Used to examine relationships between
different variables (if applicable).
- **Heat Map**: Used to show a graphical representation of
data where values are depicted by color.

### Machine Learning Algorithms Used


- **Trend Analysis**: Analyzes the trends in crop production
to identify patterns and seasonal variations.
- **Future Data Prediction using Linear Regression Model**:
Predicts future crop production based on historical data.
- **Correlation Analysis**: Examines the relationships between
different variables to understand their interdependencies.
- **Seasonal Analysis**: Analyzes the seasonal patterns in
crop production to understand the impact of seasons.

- **Linear Regression**: Applied to predict future crop
production based on historical data.
- **Train-Test Split**: Used to validate the performance of
the predictive models.
- **Yield Prediction Model (Mean Squared Error)**: Evaluates
the accuracy of the yield prediction model using the Mean Squared
Error metric.

### Aim of the Project


The primary aim of this project is to analyze and visualize
crop production data in India to uncover patterns
and trends that can inform agricultural policy and decision-
making. By identifying the factors that contribute
to high crop yields, stakeholders can develop strategies to
enhance productivity and sustainability in Indian
agriculture.

### Conclusion
This project provides a comprehensive analysis of agricultural
crop production in India, offering valuable
insights through data visualization and machine learning
techniques. We hope that this analysis will contribute
to a better understanding of India's agricultural landscape
and support efforts to improve crop production
efficiency and food security.
""")

    elif st.session_state.page == "Analysis of Data":
        st.title("Analysis of Data")
        st.write("""
## Welcome to the Analysis of Data Tab
In this section, we will get a basic understanding of the
data used, the columns present in the data, their data types, etc.
""")

        # First few rows of the dataset
        st.write("### First Few Rows of the Dataset")
        st.write("""
**First Few Rows of the Dataset**: This displays the first few
rows of the dataset to give an overview of the data structure and
contents.
""")
        st.write(data.head())

        # Summary statistics
        st.write("### Summary Statistics")
        st.write("""
**Summary Statistics**: Provides basic descriptive statistics
such as mean, standard deviation, min, max, and quartiles for each
numeric column. This helps in understanding the distribution and
spread of the data.
""")
        st.write(data.describe())

        # Data type information
        st.write("### Data Types")
        st.write("""
**Data Types**: Shows the data types of each column, which is
important to ensure that the data types are appropriate for analysis
(e.g., numeric columns should be of a numeric type).
""")
        st.write(data.dtypes)

    elif st.session_state.page == "Data Cleaning":
        st.title("Data Cleaning")
        st.write("""
## Welcome to the Data Cleaning Tab
In this section, we will perform data cleaning to prepare the
dataset for analysis.
This involves examining the first few rows of the dataset,
summarizing statistics,
checking data types, and identifying any missing values.

Data cleaning is essential to ensure that our analyses and
machine learning models
are accurate and reliable.
""")

        # Check for missing values
        st.write("### Missing Values")
        st.write("""
**Missing Values**: Lists the number of missing values in each
column. Identifying missing values is crucial as they need to be
handled before further analysis.
""")
        missing_values = data.isnull().sum()
        st.write(missing_values)

        # Drop missing values
        st.write("### Data after Dropping Missing Values")
        st.write("""
**Data after Dropping Missing Values**: Displays the dataset
after removing rows with missing values. This step ensures that
subsequent analyses are not affected by incomplete data.
""")
        data_cleaned = data.dropna()
        st.write(data_cleaned.head())

        # Ensure all columns have compatible data types:
        # convert object columns to numeric where possible, otherwise to str.
        for col in data_cleaned.select_dtypes(include=['object']).columns:
            try:
                data_cleaned[col] = pd.to_numeric(data_cleaned[col])
            except ValueError:
                data_cleaned[col] = data_cleaned[col].astype(str)

        st.write("### Data Types after Conversion")
        st.write("""
**Data Types after Conversion**: Displays the data types after
converting object columns to numeric or string types, ensuring
compatibility with Arrow.
""")
        st.write(data_cleaned.dtypes)

        # Summary by State and Crop
        st.write("### Summary Statistics by State and Crop")
        st.write("""
**Summary Statistics by State and Crop**: Provides descriptive
statistics for 'Area', 'Production', and 'Yield' grouped by 'State'
and 'Crop'. This allows for a detailed analysis of these metrics
across different states and crops.
""")
        summary_by_state_crop = data_cleaned.groupby(
            ['State', 'Crop'])[['Area', 'Production', 'Yield']].describe()
        st.write(summary_by_state_crop)

    elif st.session_state.page == "Visual Analysis":
        st.title("Learning Data Analysis Through Visualization")
        st.write("Welcome to the Learning Data Analysis Through Visualization tab.")

        # Crop Production Over the Years
        st.write("### Crop Production Over the Years")
        if st.button("Show Crop Production Over the Years"):
            # Toggle the chart on each button press.
            st.session_state.show_crop_production_years = (
                not st.session_state.show_crop_production_years
            )

        if st.session_state.show_crop_production_years:
            @st.cache_resource
            def plot_crop_production_years():
                plt.figure(figsize=(12, 6))
                sns.lineplot(data=data, x='Year', y='Production')
                plt.title('Crop Production Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_crop_production_years()

        # Crop Production by State
        st.write("### Crop Production by State")
        if st.button("Show Crop Production by State"):
            st.session_state.show_crop_production_state = (
                not st.session_state.show_crop_production_state
            )

        if st.session_state.show_crop_production_state:
            @st.cache_resource
            def plot_crop_production_state():
                plt.figure(figsize=(12, 8))
                sns.barplot(data=data, x='State', y='Production',
                            estimator=sum)
                plt.title('Crop Production by State')
                plt.xlabel('State')
                plt.ylabel('Total Production')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_crop_production_state()

        # Area under Cultivation by State
        st.write("### Area under Cultivation by State")
        if st.button("Show Area under Cultivation by State"):
            st.session_state.show_area_cultivation_state = (
                not st.session_state.show_area_cultivation_state
            )

        if st.session_state.show_area_cultivation_state:
            year = st.selectbox("Select Year", data['Year'].unique())

            @st.cache_resource
            def plot_area_cultivation_state(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)

                plt.figure(figsize=(12, 4))
                # Divide by 1e6 so the values match the "million" axis label.
                plt.bar(area_by_state.index, area_by_state / 1e6)
                plt.title(f'Area under Cultivation by State {year} (million hect)')
                plt.ylabel('Area under Cultivation (million hect)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_area_cultivation_state(year)

        # Share of Area under Cultivation in Year
        st.write("### Share of Area under Cultivation in Year")
        if st.button("Show Share of Area under Cultivation in Year"):
            st.session_state.show_share_area_cultivation_year = (
                not st.session_state.show_share_area_cultivation_year
            )

        if st.session_state.show_share_area_cultivation_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="share_area_cultivation_year")

            @st.cache_resource
            def plot_share_area_cultivation_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)
                # Top 10 states plus one slice for all remaining states.
                pie_break = [i for i in area_by_state.head(10)] + \
                    [area_by_state.sum() - (area_by_state.head(10).sum())]
                pie_labels = [i for i in area_by_state.head(10).index] + ['other']

                plt.figure(figsize=(10, 6))
                plt.pie(pie_break, labels=pie_labels, autopct='%.2f%%')
                plt.title(f'Share of Area under Cultivation in Year {year}')
                st.pyplot(plt)
            plot_share_area_cultivation_year(year)

        # Production by State in Year
        st.write("### Production by State in Year")
        if st.button("Show Production by State in Year"):
            st.session_state.show_production_state_year = (
                not st.session_state.show_production_state_year
            )

        if st.session_state.show_production_state_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_state_year")

            @st.cache_resource
            def plot_production_state_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                prod_by_state = grouped_state['Production'].sum().sort_values(ascending=False)

                plt.figure(figsize=(18, 4))
                # Divide by 1e6 so the values match the "million tonnes" label.
                plt.bar(prod_by_state.index, prod_by_state / 1e6)
                plt.title(f'Production by State in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_production_state_year(year)

        # Production by Crop in Year
        st.write("### Production by Crop in Year")
        if st.button("Show Production by Crop in Year"):
            st.session_state.show_production_crop_year = (
                not st.session_state.show_production_crop_year
            )

        if st.session_state.show_production_crop_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_crop_year")

            @st.cache_resource
            def plot_production_crop_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_crop = crop_df_Year.groupby('Crop')
                prod_by_crop = grouped_crop['Production'].sum().sort_values(ascending=False)

                plt.figure(figsize=(18, 4))
                # Divide by 1e6 so the values match the "million tonnes" label.
                plt.bar(prod_by_crop.index, prod_by_crop / 1e6)
                plt.title(f'Production by Crop in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_production_crop_year(year)

        # Selected State and Crop Production
        st.write("### Selected State and Crop Production")
        if st.button("Show Selected State and Crop Production"):
            st.session_state.show_selected_state_crop_production = (
                not st.session_state.show_selected_state_crop_production
            )

        if st.session_state.show_selected_state_crop_production:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="selected_state_crop_year")
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_state_crop_crop")

            @st.cache_resource
            def plot_selected_state_crop_production(year, crop):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_year = crop_df[crop_df.Year == year]
                selected_crop_df = crop_df_year[crop_df_year.Crop == crop]
                production_by_state = (
                    selected_crop_df.groupby('State')['Production']
                    .sum()
                    .sort_values(ascending=False)
                )

                plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} production by State {year} (million tonnes)')
                plt.ylabel(f'{crop} production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_selected_state_crop_production(year, crop)

        # Selected Crop Production Across Top 10 States
        st.write("### Selected Crop Production Across Top 10 States")
        if st.button("Show Selected Crop Production Across Top 10 States"):
            st.session_state.show_selected_crop_production_top_states = (
                not st.session_state.show_selected_crop_production_top_states
            )

        if st.session_state.show_selected_crop_production_top_states:
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_crop_production_top_states_crop")

            @st.cache_resource
            def plot_selected_crop_production_top_states(crop):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                selected_crop_df = crop_df[crop_df['Crop'] == crop]
                # Keep only the ten states with the highest total production.
                production_by_state = (
                    selected_crop_df.groupby('State')['Production']
                    .sum()
                    .sort_values(ascending=False)
                    .head(10)
                )

                plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} Production by State (Million Tonnes)')
                plt.ylabel(f'{crop} Production (Million Tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_selected_crop_production_top_states(crop)

        # Total Production of Rice & Wheat
        st.write("### Total Production of Rice & Wheat")
        if st.button("Show Total Production of Rice & Wheat"):
            st.session_state.show_total_production_rice_wheat = (
                not st.session_state.show_total_production_rice_wheat
            )

        if st.session_state.show_total_production_rice_wheat:
            @st.cache_resource
            def plot_total_production_rice_wheat():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][
                    ['Year', 'Yield', 'Area', 'Production', 'State']]
                # Exclude the 2020-21 season from the trend.
                rw_years.drop(rw_years.index[rw_years.Year == '2020-21'],
                              inplace=True)
                rw_group = rw_years.groupby('Year')

                plt.figure(figsize=(14, 8))
                # Divide by 1e6 so the values match the "million tonnes" label.
                plt.plot(rw_group['Production'].sum() / 1e6)
                plt.title('Total Production of Rice & Wheat Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)

            plot_total_production_rice_wheat()

        # Heat Map of Average Yield by State and Year
        st.write("### Heat Map Average Yield by State and Year")
        if st.button("Show Heat Map Average Yield by State and Year"):
            st.session_state.show_heat_map_average_yield_by_state_year = (
                not st.session_state.show_heat_map_average_yield_by_state_year
            )

        if st.session_state.show_heat_map_average_yield_by_state_year:
            @st.cache_resource
            def plot_heat_map_average_yield_by_state_year():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][
                    ['Year', 'Yield', 'Area', 'Production', 'State']]
                rw_years.drop(rw_years.index[rw_years.Year == '2020-21'],
                              inplace=True)
                # Rows are states, columns are years, cells are mean yield.
                heatmap_df = rw_years[['State', 'Year', 'Yield']].groupby(
                    ['State', 'Year'])['Yield'].mean().unstack(level=-1)

                # Handle missing values if necessary (e.g., fill with 0 or
                # a specific value)
                heatmap_df = heatmap_df.fillna(0)

                # Plot the heatmap
                plt.figure(figsize=(10, 8))
                sns.heatmap(heatmap_df, annot=True, cmap='viridis')
                plt.title('Average Yield by State and Year')
                st.pyplot(plt)

            plot_heat_map_average_yield_by_state_year()

    elif st.session_state.page == "Trend Analysis":
        st.title("Trend Analysis")
        st.write("Welcome to the Learning Data Analysis Through Other Analysis Algorithms tab.")

        # Total Crop Production in India
        st.write("### Total Crop Production in India (1997-2020)")
        if st.button("Total Crop Production in India"):
            st.session_state.show_total_production = (
                not st.session_state.show_total_production
            )

        if st.session_state.show_total_production:
            @st.cache_resource
            def plot_total_production():
                # Exclude the 2020-21 season from the trend.
                data.drop(data.index[data.Year == '2020-21'], inplace=True)
                production_trend = data.groupby('Year')['Production'].sum()
                plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values,
                         marker='o')
                plt.title('Total Crop Production in India (1997-2020)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_total_production()

        # Future Data Prediction Linear Regression Model
        st.write("## Future Data Prediction Linear Regression Model")
        st.write("### Why use a Linear Regression Model?")
        st.write("""A Linear Regression model is used in this function to identify
            and quantify the trend in historical crop production data. It helps in predicting
            future crop production by extending the linear trend observed in past data. The simplicity
            and interpretability of Linear Regression make it a suitable choice for forecasting future
            values based on historical trends. If the data shows a consistent linear trend, this model provides
            a straightforward method for making future projections.""")
        if st.button("Show Future Data Prediction Graph"):
            st.session_state.show_future_data_prediction = not st.session_state.show_future_data_prediction

        if st.session_state.show_future_data_prediction:
            @st.cache_resource
            def plot_future_data_prediction():
                # Work on a filtered copy so the shared DataFrame keeps its
                # original 'Year' strings for the other pages
                trend_data = data[data.Year != '2020-21'].copy()

                # '2001-02' -> 2001 (starting year of the season)
                trend_data['Year'] = trend_data['Year'].apply(lambda x: int(x.split('-')[0]))

                production_trend = trend_data.groupby('Year')['Production'].sum()

                X = production_trend.index.values.reshape(-1, 1)
                y = production_trend.values

                model = LinearRegression()
                model.fit(X, y)

                # Predict the five years after the last observed year
                last_year = int(X[-1, 0])
                future_years = np.arange(last_year + 1, last_year + 6).reshape(-1, 1)
                predictions = model.predict(future_years)

                plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values, marker='o', label='Actual Production')
                plt.plot(future_years, predictions, marker='x', linestyle='--', color='red', label='Predicted Production')
                plt.title('Total Crop Production in India (1997-2025)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                plt.legend()
                st.pyplot(plt)
            plot_future_data_prediction()

    elif st.session_state.page == "Correlation Analysis":
        st.title("Correlation Analysis")
        st.write("""**Correlation Analysis**:
            Correlation analysis helps in understanding the relationship between different variables
            related to crop production. For instance, it can reveal how factors like rainfall, temperature,
            soil pH, and fertilizer usage are correlated with crop yield.""")

        # Function to convert 'Year' from '2001-02' format to a numerical format
        def convert_year(year_str):
            # The suffix is always the following calendar year, so the midpoint
            # is start_year + 0.5. Prefixing the suffix with "20" would turn
            # '1997-98' into 2098.
            start_year = int(year_str.split('-')[0])
            return start_year + 0.5

        # Create a temporary column for the numerical year
        data['Temp_Year'] = data['Year'].apply(convert_year)

        # Function to display correlation between two fields
        def display_correlation(data, field1, field2):
            correlation = data[[field1, field2]].corr()
            if field1 == 'Temp_Year':
                field1 = 'Year'
            st.write(f"### Correlation between {field1} and {field2}")
            st.write(correlation)

        # Correlation between Area and Production
        display_correlation(data, 'Area', 'Production')

        # Correlation between Area and Yield
        display_correlation(data, 'Area', 'Yield')

        # Correlation between Production and Yield
        display_correlation(data, 'Production', 'Yield')

        # Correlation between Year and Production using Temp_Year
        display_correlation(data, 'Temp_Year', 'Production')

        # Correlation between Year and Yield using Temp_Year
        display_correlation(data, 'Temp_Year', 'Yield')

        # Correlation between Year and Area using Temp_Year
        display_correlation(data, 'Temp_Year', 'Area')

        # Drop the temporary column after analysis
        data.drop(columns=['Temp_Year'], inplace=True)

    elif st.session_state.page == "Seasonal Analysis":
        st.title("Seasonal Analysis")
        st.write("""Seasonal analysis is used in projects to identify and understand patterns that occur
            at regular intervals over a specific period, such as weeks, months, quarters, or years.
            This analysis helps in forecasting, decision-making, and strategy formulation.""")
        if st.button("Show Seasonal Analysis Graph"):
            st.session_state.show_seasonal_analysis = not st.session_state.show_seasonal_analysis

        if st.session_state.show_seasonal_analysis:
            @st.cache_resource
            def plot_seasonal_analysis():
                # Boxplot of production by season, excluding 'Whole Year' rows
                # (filtered into a new frame so the shared DataFrame is not mutated)
                seasonal_data = data[data.Season != 'Whole Year']
                plt.figure(figsize=(12, 6))
                sns.boxplot(x='Season', y='Production', data=seasonal_data)
                plt.title('Production by Season')
                plt.xlabel('Season')
                plt.ylabel('Production (Tonnes)')
                plt.xticks(rotation=45)
                st.pyplot(plt)

            plot_seasonal_analysis()

    elif st.session_state.page == "Yield Prediction Model":
        st.title("Yield Prediction Model")
        st.write("""A Yield Prediction Model is essential for optimizing resource use, financial planning,
            and risk management in agriculture. It enables accurate forecasting of crop yields, helping
            farmers and businesses make informed decisions. Calculating Mean Squared Error (MSE) is crucial
            as it measures the average squared difference between actual and predicted values, providing a
            clear metric for model accuracy. Lower MSE values indicate better model performance, guiding
            improvements and comparisons between different models.""")
        if st.button("Show Yield Prediction Model Graph"):
            st.session_state.show_yield_prediction_model = not st.session_state.show_yield_prediction_model

        if st.session_state.show_yield_prediction_model:
            @st.cache_resource
            def plot_yield_prediction_model():
                data = pd.read_csv('India Agriculture Crop Production.csv')

                # Convert to numeric first (invalid entries become NaN),
                # then drop rows that are missing any of the three fields
                data[['Area', 'Production', 'Yield']] = data[['Area', 'Production', 'Yield']].apply(pd.to_numeric, errors='coerce')
                data = data.dropna(subset=['Area', 'Production', 'Yield'])

                X = data[['Area', 'Production']]
                y = data['Yield']

                # Check for any remaining NaNs
                if X.isnull().any().any() or y.isnull().any():
                    st.write("Data contains missing values.")
                    return

                X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

                model = LinearRegression()
                model.fit(X_train, y_train)

                y_pred = model.predict(X_test)

                mse = mean_squared_error(y_test, y_pred)
                st.write(f'Mean Squared Error: {mse}')

                # The target is Yield, so the chart compares actual and predicted yield
                st.write("Actual vs Predicted Yield")
                comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
                st.line_chart(comparison)
            plot_yield_prediction_model()

if __name__ == "__main__":
    main()

Command to run the Project

python -m streamlit run agriculture_app1.py

Screenshots:

5. Conclusion

In conclusion, the Indian Agriculture Crop Production
Analysis provides critical insights into the trends,
patterns, and influencing factors of crop yield over the
years. By leveraging techniques such as correlation
analysis and predictive modeling with linear regression,
we can identify key variables that significantly impact
production.
This analysis not only helps in understanding past
performance but also supports forecasting of future
yields, aiding in strategic planning and decision-making.
Combining data science with machine learning models, and
evaluating those models with metrics such as the Mean
Squared Error, shows how reliable the predictions are and
can help optimize agricultural practices.
Ultimately, this comprehensive analysis serves as a
valuable tool for policymakers, farmers, and researchers
to improve crop management, ensure food security, and
drive sustainable agricultural growth in India.

6. Future Enhancement
Remote Sensing and Satellite Imagery: Utilize remote sensing
technologies and satellite imagery to monitor crop health, soil moisture,
and other critical parameters in real-time, enabling more precise and
timely interventions.

IoT Integration: Deploy Internet of Things (IoT) devices in fields to
collect real-time data on weather conditions, soil properties, and crop
health. This data can be integrated with predictive models to enhance
decision-making.

Climate Change Impact Analysis: Conduct detailed studies on the
impact of climate change on crop production. Develop adaptive
strategies and models to mitigate adverse effects and ensure resilience in
agricultural practices.

Precision Agriculture: Implement precision agriculture techniques that
use data analytics to optimize the use of inputs like water, fertilizers, and
pesticides, thereby increasing efficiency and reducing environmental
impact.

Mobile Applications for Farmers: Develop user-friendly mobile
applications that provide farmers with real-time data, predictive insights,
and recommendations based on the latest analysis, empowering them to
make informed decisions.

