Viraj Project Documentation

The project report presents an analysis of agriculture crop production in India, aiming to identify trends, high-performing crops, and utilize machine learning techniques for predictions. It includes methodologies for data collection, preprocessing, exploratory data analysis, and predictive modeling using various libraries such as Pandas and scikit-learn. The findings are intended to inform agricultural policy and enhance productivity and sustainability in Indian agriculture.

Uploaded by

shaikhaaqif

A

Project Report on
India Agriculture Crop
Production Analysis
Submitted to
UNIVERSITY OF MUMBAI
In partial fulfillment of the degree
of Master of Science (Computer Science)
Project By:
Mr. Viraj Vasudev Pawasakar
Exam Seat No:
1183893
Under the Guidance of
Mrs. Rupali Agavekar
Navkokan Education Society’s
D.B.J College, Chiplun
(2023-2024)
Navkokan Education Society’s

D.B.J. COLLEGE, CHIPLUN

NAAC Reaccredited Grade ‘A’ (CGPA 3.15)
DEPARTMENT OF COMPUTER SCIENCE

CERTIFICATE
This is to certify that Mr. Viraj Vasudev Pawasakar of
MSc. Part-II (Semester IV) Computer Science has
successfully completed the Project in Machine Learning
and has submitted the same to my satisfaction during the
academic year 2023-24 towards partial fulfillment of
MSc. Part-II (Semester IV) Computer Science,
University of Mumbai.

Date:
Guide Signature:

INCHARGE
Department of Computer Science
Acknowledgement

It is my great pleasure to take this opportunity to
sincerely thank all those who showed me the way to a
successful project and helped me greatly during its
completion.
I am deeply grateful to my Project Guide, Mrs. Rupali
Agavekar, without whom the completion of this project
would not have been possible.
My sincere thanks go to the respected Head of the Computer
Science Department, Mr. S. J. Nalawade, for providing all
the facilities, including the Computer Lab. I
take this opportunity to express my deep gratitude
to all the members of the Computer Science
Department for helping me complete the
project.
My special thanks go to my parents, my friends, and all
those who encouraged me and helped me
complete this project successfully and on time.

Mr. Viraj Vasudev Pawasakar

M.Sc. Part-II (Computer Science)
Table of Contents
Sr. No   Title
1.       Topic
2.       Implementation details
3.       Experimental setups and results
4.       Analysis of the results
5.       Conclusion
6.       Future enhancement

India Agriculture Crop
Production Analysis
Mr. Viraj Vasudev Pawasakar

A dissertation submitted to D.B.J. College (Chiplun) in partial
fulfillment of the requirements for the degree of M.Sc. in Computer Science
(Machine Learning). July 2024

2. Implementation Details

This project focuses on analyzing agricultural crop
production in India. The aim of this analysis is to provide
insights into crop production trends, identify high-performing
crops and districts, and use data visualization and
machine learning techniques to understand and predict
agricultural productivity.

Project Overview
India is one of the largest agricultural producers in the world,
and understanding the dynamics of crop production is crucial for
ensuring food security and optimizing resource allocation. This
project leverages historical data on crop production to derive
meaningful insights.

Aim of the Project

The primary aim of this project is to analyze and visualize crop
production data in India to uncover patterns and trends that can
inform agricultural policy and decision-making. By identifying
the factors that contribute to high crop yields, stakeholders can
develop strategies to enhance productivity and sustainability in
Indian agriculture.

Libraries and Frameworks Used

• Streamlit:
Streamlit is a framework for building web applications in
Python. It is used to create interactive, customizable
web-based interfaces for data analysis, machine learning, and
more.

• Pandas:
Pandas is a powerful data manipulation and analysis library. It
provides data structures such as DataFrame and Series, which
are essential for handling structured data.

• NumPy:
NumPy is a fundamental package for numerical computing in
Python. It provides support for large, multi-dimensional
arrays and matrices, along with a collection of mathematical
functions that operate on these arrays.

• Matplotlib:
Matplotlib is a comprehensive library for creating static,
animated, and interactive visualizations in Python. pyplot is a
Matplotlib module that provides a MATLAB-like interface
for plotting.

• Seaborn:
Seaborn is built on top of Matplotlib and provides a higher-level
interface for drawing attractive and informative
statistical graphics. It simplifies the creation of
complex visualizations such as heatmaps, violin plots, and
more.

• scikit-learn:
scikit-learn is a versatile machine learning library for Python.
It includes tools for supervised and unsupervised
learning, such as regression, classification, clustering, and
dimensionality reduction. LinearRegression is a model class
for fitting linear regression models, and train_test_split is a
function for splitting data into training and testing sets.
mean_squared_error is a function that calculates the mean
squared error between predicted and actual values and is
commonly used to evaluate regression models.
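
As a small illustration of the data structures named in this list, the sketch below builds a pandas DataFrame, takes one column as a Series, and applies NumPy functions to the underlying arrays. The column names and values are made up for illustration and are not taken from the project dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
df = pd.DataFrame({
    "Crop": ["Rice", "Wheat", "Maize"],
    "Area": [100.0, 80.0, 50.0],          # hectares
    "Production": [250.0, 240.0, 100.0],  # tonnes
})

# A single column of a DataFrame is a Series
production = df["Production"]

# NumPy functions operate on the underlying arrays
values = production.to_numpy()
total = np.sum(values)
mean_yield = np.mean(values / df["Area"].to_numpy())

print(type(production).__name__, total, mean_yield)
```

The same pattern (select columns with pandas, compute with NumPy) is what the later plotting and modeling steps rely on.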

Implementation Steps

1. Setting up the Environment

Ensure that Python and the necessary libraries are installed. A
virtual environment can be created for the project, and the required
libraries installed into it.

2. Loading the Dataset

Load the Indian Agriculture Crop Production data into a Pandas
DataFrame.

3. Data Overview and Preprocessing

Get an overview of the dataset and preprocess it as necessary.

4. Exploratory Data Analysis (EDA)

Analyze the data to understand trends and patterns using data
visualizations.

5. Trend Analysis

Analyzes the trends in crop production to identify patterns and
seasonal variations.

6. Future Data Prediction using Linear Regression Model

Predicts future crop production based on historical data.

7. Correlation Analysis

Examines the relationships between different variables to
understand their interdependencies.

8. Seasonal Analysis

Analyzes the seasonal patterns in crop production to understand
the impact of seasons.

9. Linear Regression

Applied to predict future crop production based on historical
data.

10. Train-Test Split

Used to validate the performance of the predictive models.

11. Yield Prediction Model (Mean Squared Error)

Evaluates the accuracy of the yield prediction model using the Mean
Squared Error metric.
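
Steps 5 and 6 can be sketched as follows: total production is aggregated per year and a straight line is fitted and extended a few years ahead. This is a minimal sketch on made-up yearly totals, not the project's data; the helper column name StartYear is an assumption introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical yearly totals; "Year" uses the "2001-02" style labels
df = pd.DataFrame({
    "Year": ["2015-16", "2016-17", "2017-18", "2018-19"],
    "Production": [100.0, 110.0, 120.0, 130.0],
})

# Keep the starting year of each label as an integer (StartYear is a hypothetical name)
df["StartYear"] = df["Year"].str.split("-").str[0].astype(int)

# Total production per year
trend = df.groupby("StartYear")["Production"].sum()

X = trend.index.to_numpy().reshape(-1, 1)
y = trend.to_numpy()
model = LinearRegression().fit(X, y)

# Extrapolate the fitted line three years ahead
last_year = int(X[-1, 0])
future_years = np.arange(last_year + 1, last_year + 4).reshape(-1, 1)
predictions = model.predict(future_years)
print(dict(zip(future_years.ravel().tolist(), predictions.round(1).tolist())))
```

A linear extrapolation like this is only as reliable as the assumption that the historical trend continues unchanged.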

3. Experimental Setup and Results
Microsoft Visual Studio Code:

Visual Studio Code is a source-code editor that can be used with a
variety of programming languages and runtimes, including Java, JavaScript, Go,
Node.js, Python, and C++. It is based on the Electron framework, which
is used to build desktop applications with web technologies on top of
Chromium (the Blink layout engine) and Node.js. Visual Studio Code employs the same editor component
(codenamed "Monaco") used in Azure DevOps (formerly called Visual
Studio Online and Visual Studio Team Services).

Instead of a project system, it lets users open one or more
directories, which can then be saved in workspaces for future reuse. This
allows it to operate as a language-agnostic code editor.
It supports a number of programming languages, with a set of features
that differs per language. Unwanted files and folders can be excluded
from the project tree via the settings. Many Visual Studio Code features
are not exposed through menus or the user interface but can be accessed
via the command palette.

Visual Studio Code can be extended via extensions available through a
central repository. These include additions to the editor and language
support. A notable feature is the ability to create extensions that add
support for new languages, themes, and debuggers, perform static code
analysis, and add code linters using the Language Server Protocol.

CSV

A CSV (Comma-Separated Values) file is a plain text file that stores
tabular data in a simple format, making it easy to import and export data
between different applications. Each line in a CSV file corresponds to a
row in the table, with fields separated by commas. The first line
typically contains headers that describe the fields. CSV files are highly
portable and widely supported, allowing data to be exchanged
across platforms and software. Their simplicity also makes them
easy to create, read, and edit with any text editor.

CSV files are especially useful in data analysis and machine learning
projects where large datasets need to be processed efficiently. Their
straightforward structure allows for quick parsing and integration with
data processing libraries in programming languages such as
Python, R, and Java. In Python, for instance, the pandas library
provides robust tools for reading, writing, and manipulating CSV data,
which supports tasks such as data cleaning, transformation, and visualization.
Because the format is so simple, it adds little overhead and causes few
compatibility issues, making it a practical choice for both small-scale
data operations and large-scale data workflows.
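
The row-and-header structure described above can be seen in a short pandas sketch. The CSV text below is a made-up sample held in memory (not the project's data file), so the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line holds the headers
csv_text = """State,Crop,Year,Area,Production
Maharashtra,Rice,2018-19,10.0,25.0
Punjab,Wheat,2018-19,20.0,90.0
"""

# pandas.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)

# Write the table back out as CSV text, without the index column
round_trip = df.to_csv(index=False)
print(round_trip.splitlines()[0])
```

Each CSV line becomes one DataFrame row, and the header line supplies the column names.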

Methodology

The methodology of this project involves several steps to
analyze Indian agriculture crop production and derive
meaningful insights, as listed below:

Data Collection:
Gather data from reliable sources, including parameters like
crop type, year, area under cultivation, production, yield,
and weather conditions.
Obtain data in CSV format for easy storage and analysis.

Data Preprocessing:
• Data Cleaning: Address missing values, remove
duplicates, and correct inconsistencies.
• Data Transformation: Ensure correct data types and
create derived features as needed.
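
The cleaning and transformation steps above can be sketched as follows. The records are made up for illustration, and dropping incomplete rows is only one of several ways to address missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records: one duplicate row, one missing value,
# and a numeric column stored as text
raw = pd.DataFrame({
    "Crop": ["Rice", "Rice", "Wheat", "Maize"],
    "Area": ["10", "10", "20", "5"],
    "Production": [25.0, 25.0, np.nan, 10.0],
})

cleaned = (
    raw.dropna()            # drop rows with missing values
       .drop_duplicates()   # drop exact duplicate rows
       .copy()
)

# Ensure a numeric dtype, then derive yield as production per unit area
cleaned["Area"] = pd.to_numeric(cleaned["Area"])
cleaned["Yield"] = cleaned["Production"] / cleaned["Area"]
print(cleaned)
```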

Exploratory Data Analysis (EDA):

• Conduct EDA to understand data distribution, identify
trends, and detect outliers.
• Use visualizations (histograms, box plots, scatter plots,
heatmaps) to explore variable relationships.
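
One common way to detect outliers during EDA is the interquartile-range (IQR) rule, which is also what a box plot draws. The sketch below applies it to made-up yield values; the report does not state which outlier rule was used, so this is an assumed choice.

```python
import pandas as pd

# Hypothetical yield values (tonnes per hectare); the last one is unusually high
yields = pd.Series([2.0, 2.2, 2.1, 2.3, 2.4, 9.0], name="Yield")

# Count, mean, standard deviation, min, quartiles, and max
summary = yields.describe()

# IQR rule: flag points more than 1.5 * IQR outside the quartiles
q1, q3 = yields.quantile(0.25), yields.quantile(0.75)
iqr = q3 - q1
outliers = yields[(yields < q1 - 1.5 * iqr) | (yields > q3 + 1.5 * iqr)]
print(outliers)
```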

Correlation Analysis:
• Calculate correlation coefficients to evaluate relationships
between variables like rainfall, temperature, and crop yield.
• Identify key factors significantly correlated with crop
production.
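
Correlation coefficients of this kind can be computed with pandas as sketched below. The rainfall and temperature values are invented purely for illustration and are constructed to be perfectly linear, so the coefficients come out at the extremes.

```python
import pandas as pd

# Hypothetical observations, for illustration only
df = pd.DataFrame({
    "Rainfall": [800, 900, 1000, 1100, 1200],
    "Temperature": [30, 29, 28, 27, 26],
    "Yield": [2.0, 2.2, 2.4, 2.6, 2.8],
})

# Pairwise Pearson correlation (the pandas default)
corr = df.corr()

# Rank the other variables by strength of association with yield
ranked = corr["Yield"].drop("Yield").abs().sort_values(ascending=False)
print(corr.round(2))
```

A coefficient near +1 or -1 indicates a strong linear relationship; it does not by itself show that one variable causes the other.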

Predictive Modeling:
• Model Selection: Choose machine learning models (e.g.,
Linear Regression) for future crop production prediction.
• Model Training: Split data into training and testing sets,
then train the models.
• Model Evaluation: Use metrics such as Mean Squared
Error (MSE) to assess model accuracy.
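
The three modeling steps above map directly onto scikit-learn calls. The sketch below uses synthetic data (production roughly proportional to area, plus noise) rather than the project dataset; the 80/20 split and the random seeds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: production grows roughly linearly with area, plus noise
rng = np.random.default_rng(0)
area = rng.uniform(1.0, 100.0, size=200).reshape(-1, 1)
production = 2.5 * area.ravel() + rng.normal(0.0, 1.0, size=200)

# Model training: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    area, production, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)

# Model evaluation: mean squared error on the held-out rows
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"slope={model.coef_[0]:.2f}, MSE={mse:.2f}")
```

Evaluating on held-out rows, rather than on the training rows, gives a fairer estimate of how the model behaves on unseen data.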

Database Description

• Year: The year in which the data was recorded (e.g., 2018-19,
2019-20).
• Crop: The type of crop being analyzed (e.g., rice, wheat, maize).
• Area: The area under cultivation, typically measured in hectares.
• Production: The total production of the crop, usually measured in
tonnes.
• Yield: The yield of the crop, calculated as production per unit area
(e.g., tonnes per hectare).
• Geographical Location: Details about the location of cultivation,
including state, district, and village.

Field               Data Type
State               object
District            object
Crop                object
Year                object
Season              object
Area                float64
Area Units          object
Production          float64
Production Units    object
Yield               float64
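
Since Yield is described as production per unit area, the three numeric fields can be cross-checked against each other. The sketch below is one possible sanity check on made-up rows (the last row is deliberately inconsistent); it assumes Area and Production are expressed in matching units.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last Yield value is deliberately inconsistent
df = pd.DataFrame({
    "Crop": ["Rice", "Wheat", "Maize"],
    "Area": [100.0, 80.0, 50.0],
    "Production": [250.0, 240.0, 100.0],
    "Yield": [2.5, 3.0, 2.4],
})

# The numeric fields are expected to be float64, as in the table above
numeric_columns = ["Area", "Production", "Yield"]
all_float = all(df[c].dtype == "float64" for c in numeric_columns)

# Recompute yield and flag rows where the stored value disagrees
recomputed = df["Production"] / df["Area"]
consistent = np.isclose(df["Yield"], recomputed)
suspect_rows = df.loc[~consistent, "Crop"].tolist()
print(all_float, suspect_rows)
```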

4. Analysis of the results
Code:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Sample data for illustration

def scroll_to_top():
    # Inject a small script that asks the browser to scroll to the top
    scroll_to_top_js = """
    <script>
    window.scrollTo(0, 0);
    </script>
    """
    st.markdown(scroll_to_top_js, unsafe_allow_html=True)

def main():
    scroll_to_top()

    @st.cache_data
    def load_data():
        # Read the dataset once; Streamlit caches the result across reruns
        data = pd.read_csv('India Agriculture Crop Production.csv')
        return data

    data = load_data()

    # Custom CSS to make the sidebar collapsible
    st.markdown(
        """
        <style>
        .css-1d391kg {
            transition: margin-left 0.3s;
        }
        .css-1d391kg[data-expanded="false"] {
            margin-left: -20rem;
        }
        .css-1d391kg[data-expanded="true"] {
            margin-left: 0;
        }
        </style>
        """,
        unsafe_allow_html=True,
    )

    # Sidebar content: one button per page; clicking a button selects that page
    st.sidebar.title("Navigation")

    pages = [
        "Introduction",
        "Analysis of Data",
        "Data Cleaning",
        "Visual Analysis",
        "Trend Analysis",
        "Correlation Analysis",
        "Seasonal Analysis",
        "Yield Prediction Model",
    ]
    for page_name in pages:
        if st.sidebar.button(page_name):
            st.session_state.page = page_name

    # Initialize session state variables if they don't exist
    toggle_flags = [
        'show_crop_production_years',
        'show_crop_production_state',
        'show_area_cultivation_state',
        'show_share_area_cultivation_year',
        'show_production_state_year',
        'show_production_crop_year',
        'show_selected_state_crop_production',
        'show_selected_crop_production_top_states',
        'show_total_production_rice_wheat',
        'show_heat_map_average_yield_by_state_year',
        'show_total_production',
        'show_future_data_prediction',
        'show_seasonal_analysis',
        'show_yield_prediction_model',
    ]
    for flag in toggle_flags:
        if flag not in st.session_state:
            st.session_state[flag] = False

    if 'page' not in st.session_state:
        st.session_state.page = "Introduction"

    if st.session_state.page == "Introduction":
        st.title("India Agriculture Crop Production Analysis")
        st.write("""
## Welcome to the Introduction Tab
This project is focused on analyzing the agriculture crop
production in India. The aim of this analysis is to
provide insights into crop production trends, identify high-performing
crops and districts, and utilize various
data visualization and machine learning techniques to
understand and predict agricultural productivity.

### Project Overview

India is one of the largest agricultural producers in the
world, and understanding the dynamics of crop
production is crucial for ensuring food security and
optimizing resource allocation. This project leverages
historical data on crop production to derive meaningful
insights.

### Types of Analysis Conducted

- **Data Cleaning**: Prepares the data for analysis by
handling missing values, outliers, and inconsistencies.
- **Crop-wise Analysis**: Identifies the top crops in terms of
production.
- **District-wise Analysis**: Identifies the top districts in
terms of crop production.
- **Year-wise Analysis**: Analyzes the data with the
user-selected year as the main factor.

### Data Visualizations Used

- **Bar Charts**: Used to display the average production of
top crops and districts.
- **Line Charts**: Used to show trends in crop production over
time (if applicable).
- **Scatter Plots**: Used to examine relationships between
different variables (if applicable).
- **Heat Map**: Used to show a graphical representation of
data where values are depicted by color.

### Machine Learning Algorithms Used

- **Trend Analysis**: Analyzes the trends in crop production
to identify patterns and seasonal variations.
- **Future Data Prediction using Linear Regression Model**:
Predicts future crop production based on historical data.
- **Correlation Analysis**: Examines the relationships between
different variables to understand their interdependencies.
- **Seasonal Analysis**: Analyzes the seasonal patterns in
crop production to understand the impact of seasons.

- **Linear Regression**: Applied to predict future crop
production based on historical data.
- **Train-Test Split**: Used to validate the performance of
the predictive models.
- **Yield Prediction Model (Mean Squared Error)**: Evaluates
the accuracy of the yield prediction model using the Mean Squared
Error metric.

### Aim of the Project

The primary aim of this project is to analyze and visualize
crop production data in India to uncover patterns
and trends that can inform agricultural policy and decision-
making. By identifying the factors that contribute
to high crop yields, stakeholders can develop strategies to
enhance productivity and sustainability in Indian
agriculture.

### Conclusion
This project provides a comprehensive analysis of agricultural
crop production in India, offering valuable
insights through data visualization and machine learning
techniques. We hope that this analysis will contribute
to a better understanding of India's agricultural landscape
and support efforts to improve crop production
efficiency and food security.
""")

    elif st.session_state.page == "Analysis of Data":
        st.title("Analysis of Data")
        st.write("""
        ## Welcome to the Analysis of Data Tab
        In this section, we get a basic understanding of the data used, the columns present
        in the data, their data types, etc.
        """)

        # First Few Rows of the Dataset
        st.write("### First Few Rows of the Dataset")
        st.write("""
        **First Few Rows of the Dataset**: This displays the first few
        rows of the dataset to give an overview of the data structure and
        contents.
        """)
        st.write(data.head())

        # Summary statistics
        st.write("### Summary Statistics")
        st.write("""
        **Summary Statistics**: Provides basic descriptive statistics
        such as mean, standard deviation, min, max, and quartiles for each
        numeric column. This helps in understanding the distribution and
        spread of the data.
        """)
        st.write(data.describe())

        # Data type information
        st.write("### Data Types")
        st.write("""
        **Data Types**: Shows the data type of each column, which is
        important to ensure that the data types are appropriate for analysis
        (e.g., numeric columns should be of a numeric type).
        """)
        st.write(data.dtypes)

    elif st.session_state.page == "Data Cleaning":
        st.title("Data Cleaning")
        st.write("""
        ## Welcome to the Data Cleaning Tab
        In this section, we perform data cleaning to prepare the dataset for analysis.
        This involves examining the first few rows of the dataset, summarizing statistics,
        checking data types, and identifying any missing values.

        Data cleaning is essential to ensure that our analyses and machine learning models
        are accurate and reliable.
        """)

        # Check for missing values
        st.write("### Missing Values")
        st.write("""
        **Missing Values**: Lists the number of missing values in each
        column. Identifying missing values is crucial as they need to be
        handled before further analysis.
        """)
        missing_values = data.isnull().sum()
        st.write(missing_values)

        # Drop missing values
        st.write("### Data after Dropping Missing Values")
        st.write("""
        **Data after Dropping Missing Values**: Displays the dataset
        after removing rows with missing values. This step ensures that
        subsequent analyses are not affected by incomplete data.
        """)
        data_cleaned = data.dropna()
        st.write(data_cleaned.head())

        # Ensure all columns have compatible data types:
        # try a numeric conversion first, otherwise fall back to strings
        for col in data_cleaned.select_dtypes(include=['object']).columns:
            try:
                data_cleaned[col] = pd.to_numeric(data_cleaned[col])
            except ValueError:
                data_cleaned[col] = data_cleaned[col].astype(str)

        st.write("### Data Types after Conversion")
        st.write("""
        **Data Types after Conversion**: Displays the data types after
        converting object columns to numeric or string types, ensuring
        compatibility with Arrow.
        """)
        st.write(data_cleaned.dtypes)

        # Summary by State and Crop
        st.write("### Summary Statistics by State and Crop")
        st.write("""
        **Summary Statistics by State and Crop**: Provides descriptive
        statistics for 'Area', 'Production', and 'Yield' grouped by 'State'
        and 'Crop'. This allows for a detailed analysis of these metrics
        across different states and crops.
        """)
        summary_by_state_crop = data_cleaned.groupby(
            ['State', 'Crop'])[['Area', 'Production', 'Yield']].describe()
        st.write(summary_by_state_crop)

    elif st.session_state.page == "Visual Analysis":
        st.title("Learning Data Analysis Through Visualization")
        st.write("Welcome to the Learning Data Analysis Through Visualization tab.")

        # Crop Production Over the Years
        st.write("### Crop Production Over the Years")
        if st.button("Show Crop Production Over the Years"):
            st.session_state.show_crop_production_years = (
                not st.session_state.show_crop_production_years
            )

        if st.session_state.show_crop_production_years:
            @st.cache_resource
            def plot_crop_production_years():
                plt.figure(figsize=(12, 6))
                sns.lineplot(data=data, x='Year', y='Production')
                plt.title('Crop Production Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_crop_production_years()

        # Crop Production by State
        st.write("### Crop Production by State")
        if st.button("Show Crop Production by State"):
            st.session_state.show_crop_production_state = (
                not st.session_state.show_crop_production_state
            )

        if st.session_state.show_crop_production_state:
            @st.cache_resource
            def plot_crop_production_state():
                plt.figure(figsize=(12, 8))
                sns.barplot(data=data, x='State', y='Production',
                            estimator=sum)
                plt.title('Crop Production by State')
                plt.xlabel('State')
                plt.ylabel('Total Production')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_crop_production_state()

        # Area under Cultivation by State
        st.write("### Area under Cultivation by State")
        if st.button("Show Area under Cultivation by State"):
            st.session_state.show_area_cultivation_state = (
                not st.session_state.show_area_cultivation_state
            )

        if st.session_state.show_area_cultivation_state:
            year = st.selectbox("Select Year", data['Year'].unique())

            @st.cache_resource
            def plot_area_cultivation_state(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)

                plt.figure(figsize=(12, 4))
                plt.bar(area_by_state.index, area_by_state / 1e7)
                plt.title(f'Area under Cultivation by State {year} (million hect)')
                plt.ylabel('Area under Cultivation (million hect)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_area_cultivation_state(year)

        # Share of Area under Cultivation in Year
        st.write("### Share of Area under Cultivation in Year")
        if st.button("Show Share of Area under Cultivation in Year"):
            st.session_state.show_share_area_cultivation_year = (
                not st.session_state.show_share_area_cultivation_year
            )

        if st.session_state.show_share_area_cultivation_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="share_area_cultivation_year")

            @st.cache_resource
            def plot_share_area_cultivation_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)
                # Top 10 states, with every remaining state lumped into 'other'
                pie_break = list(area_by_state.head(10)) + [
                    area_by_state.sum() - area_by_state.head(10).sum()
                ]
                pie_labels = list(area_by_state.head(10).index) + ['other']

                plt.figure(figsize=(10, 6))
                plt.pie(pie_break, labels=pie_labels, autopct='%.2f%%')
                plt.title(f'Share of Area under Cultivation in Year {year}')
                st.pyplot(plt)
            plot_share_area_cultivation_year(year)

        # Production by State in Year
        st.write("### Production by State in Year")
        if st.button("Show Production by State in Year"):
            st.session_state.show_production_state_year = (
                not st.session_state.show_production_state_year
            )

        if st.session_state.show_production_state_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_state_year")

            @st.cache_resource
            def plot_production_state_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                prod_by_state = grouped_state['Production'].sum().sort_values(ascending=False)

                plt.figure(figsize=(18, 4))
                plt.bar(prod_by_state.index, prod_by_state / 1e7)
                # Title unit matches the y-axis label (tonnes, not hectares)
                plt.title(f'Production by State in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_production_state_year(year)

        # Production by Crop in Year
        st.write("### Production by Crop in Year")
        if st.button("Show Production by Crop in Year"):
            st.session_state.show_production_crop_year = (
                not st.session_state.show_production_crop_year
            )

        if st.session_state.show_production_crop_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_crop_year")

            @st.cache_resource
            def plot_production_crop_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_crop = crop_df_Year.groupby('Crop')
                percent_crop = grouped_crop['Production'].sum().sort_values(ascending=False)

                plt.figure(figsize=(18, 4))
                plt.bar(percent_crop.index, percent_crop)
                # Title unit matches the y-axis label (tonnes, not hectares)
                plt.title(f'Production by Crop in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_production_crop_year(year)

        # Selected State and Crop Production
        st.write("### Selected State and Crop Production")
        if st.button("Show Selected State and Crop Production"):
            st.session_state.show_selected_state_crop_production = (
                not st.session_state.show_selected_state_crop_production
            )

        if st.session_state.show_selected_state_crop_production:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="selected_state_crop_year")
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_state_crop_crop")

            @st.cache_resource
            def plot_selected_state_crop_production(year, crop):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_year = crop_df[crop_df.Year == year]
                selected_crop_df = crop_df_year[crop_df_year.Crop == crop]
                production_by_state = (
                    selected_crop_df.groupby('State')['Production']
                    .sum().sort_values(ascending=False)
                )

                plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} production by State {year} (million tonnes)')
                plt.ylabel(f'{crop} production (mill tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_selected_state_crop_production(year, crop)

        # Selected Crop Production Across Top 10 States
        st.write("### Selected Crop Production Across Top 10 States")
        if st.button("Show Selected Crop Production Across Top 10 States"):
            st.session_state.show_selected_crop_production_top_states = (
                not st.session_state.show_selected_crop_production_top_states
            )

        if st.session_state.show_selected_crop_production_top_states:
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_crop_production_top_states_crop")

            @st.cache_resource
            def plot_selected_crop_production_top_states(crop):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                selected_crop_df = crop_df[crop_df['Crop'] == crop]
                production_by_state = (
                    selected_crop_df.groupby('State')['Production']
                    .sum().sort_values(ascending=False).head(10)
                )

                plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} Production by State (Million Tonnes)')
                plt.ylabel(f'{crop} Production (Million Tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_selected_crop_production_top_states(crop)

        # Total Production of Rice & Wheat
        st.write("### Total Production of Rice & Wheat")
        if st.button("Show Total Production of Rice & Wheat"):
            st.session_state.show_total_production_rice_wheat = (
                not st.session_state.show_total_production_rice_wheat
            )

        if st.session_state.show_total_production_rice_wheat:
            @st.cache_resource
            def plot_total_production_rice_wheat():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][
                    ['Year', 'Yield', 'Area', 'Production', 'State']]
                # Exclude the 2020-21 rows
                rw_years = rw_years[rw_years.Year != '2020-21']
                rw_group = rw_years.groupby('Year')

                plt.figure(figsize=(14, 8))
                plt.plot(rw_group['Production'].sum() / 1e7)
                plt.title('Total Production of Rice & Wheat Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(plt)

            plot_total_production_rice_wheat()

        # Heat Map of Average Yield by State and Year (rice and wheat only)
        st.write("### Heat Map Average Yield by State and Year")
        if st.button("Show Heat Map Average Yield by State and Year"):
            st.session_state.show_heat_map_average_yield_by_state_year = (
                not st.session_state.show_heat_map_average_yield_by_state_year
            )

        if st.session_state.show_heat_map_average_yield_by_state_year:
            @st.cache_resource
            def plot_heat_map_average_yield_by_state_year():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][
                    ['Year', 'Yield', 'Area', 'Production', 'State']]
                # Exclude the 2020-21 rows
                rw_years = rw_years[rw_years.Year != '2020-21']
                heatmap_df = (
                    rw_years[['State', 'Year', 'Yield']]
                    .groupby(['State', 'Year'])['Yield'].mean().unstack(level=-1)
                )

                # Handle missing values if necessary (e.g., fill with 0 or a specific value)
                heatmap_df = heatmap_df.fillna(0)

                # Plot the heatmap
                plt.figure(figsize=(10, 8))
                sns.heatmap(heatmap_df, annot=True, cmap='viridis')
                plt.title('Average Yield by State and Year')
                st.pyplot(plt)

            plot_heat_map_average_yield_by_state_year()

    elif st.session_state.page == "Trend Analysis":
        st.title("Trend Analysis")
        st.write("Welcome to the Learning Data Analysis Through Other Analysis Algorithms tab.")

        # Total Crop Production in India
        st.write("### Total Crop Production in India (1997-2020)")
        if st.button("Total Crop Production in India"):
            st.session_state.show_total_production = (
                not st.session_state.show_total_production
            )

        if st.session_state.show_total_production:
            @st.cache_resource
            def plot_total_production():
                # Exclude the 2020-21 rows without modifying the shared DataFrame
                df = data[data.Year != '2020-21']
                production_trend = df.groupby('Year')['Production'].sum()
                plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values,
                         marker='o')
                plt.title('Total Crop Production in India (1997-2020)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                st.pyplot(plt)
            plot_total_production()

        # Future Data Prediction Linear Regression Model
        st.write("## Future Data Prediction Linear Regression Model")
        st.write("### Why use a Linear Regression model?")
        st.write("""
        A Linear Regression model is used in this function to identify
        and quantify the trend in historical crop production data. It helps in predicting
        future crop production by extending the linear trend observed in past data. The simplicity
        and interpretability of Linear Regression make it a suitable choice for forecasting future
        values based on historical trends. If the data shows a consistent linear trend, this model provides
        a straightforward method for making future projections.
        """)
        if st.button("Show Future Data Prediction Graph"):
            st.session_state.show_future_data_prediction = (
                not st.session_state.show_future_data_prediction
            )

        if st.session_state.show_future_data_prediction:
            @st.cache_resource
            def plot_future_data_prediction():
                # Work on a filtered copy so the shared DataFrame is not modified
                df = data[data.Year != '2020-21'].copy()

                # Convert labels such as '2001-02' to the starting year as an integer
                df['Year'] = df['Year'].apply(lambda x: int(x.split('-')[0]))

                production_trend = df.groupby('Year')['Production'].sum()

                X = production_trend.index.values.reshape(-1, 1)
                y = production_trend.values

                model = LinearRegression()
                model.fit(X, y)

                # Predict the five years that follow the last observed year
                last_year = int(X[-1, 0])
                future_years = np.arange(last_year + 1, last_year + 6).reshape(-1, 1)
                predictions = model.predict(future_years)

                plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values,
                         marker='o', label='Actual Production')
                plt.plot(future_years, predictions, marker='x',
                         linestyle='--', color='red', label='Predicted Production')
                plt.title('Total Crop Production in India (1997-2025)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                plt.legend()
                st.pyplot(plt)
            plot_future_data_prediction()

    elif st.session_state.page == "Correlation Analysis":
        st.title("Correlation Analysis")
        st.write(""" **Correlation Analysis**:
                Correlation analysis helps in understanding the relationship between different variables
                related to crop production. For instance, it can reveal how factors like rainfall, temperature,
                soil pH, and fertilizer usage are correlated with crop yield.""")

        # Function to convert 'Year' from '2001-02' format to a numerical format
        def convert_year(year_str):
            # A season spans two consecutive calendar years, so the end year is
            # start_year + 1. Prefixing the suffix with "20" would map '1997-98' to 2098.
            start_year = int(year_str.split('-')[0])
            end_year = start_year + 1
            return (start_year + end_year) / 2

        # Create a temporary column for the numerical year
        data['Temp_Year'] = data['Year'].apply(convert_year)

        # Function to display correlation between two fields
        def display_correlation(data, field1, field2):
            correlation = data[[field1, field2]].corr()
            if field1 == 'Temp_Year':
                field1 = 'Year'
            st.write(f"### Correlation between {field1} and {field2}")
            st.write(correlation)

        # Correlation between Area and Production
        display_correlation(data, 'Area', 'Production')

        # Correlation between Area and Yield
        display_correlation(data, 'Area', 'Yield')

        # Correlation between Production and Yield
        display_correlation(data, 'Production', 'Yield')

        # Correlation between Year and Production using Temp_Year
        display_correlation(data, 'Temp_Year', 'Production')

        # Correlation between Year and Yield using Temp_Year
        display_correlation(data, 'Temp_Year', 'Yield')

        # Correlation between Year and Area using Temp_Year
        display_correlation(data, 'Temp_Year', 'Area')

        # Drop the temporary column after analysis
        data.drop(columns=['Temp_Year'], inplace=True)

    elif st.session_state.page == "Seasonal Analysis":
        st.title("Seasonal Analysis")
        st.write("""Seasonal analysis is used in projects to identify and understand patterns that occur
                at regular intervals over a specific period, such as weeks, months, quarters, or years.
                This analysis helps in forecasting, decision-making, and strategy formulation.""")
        if st.button("Show Seasonal Analysis Graph"):
            st.session_state.show_seasonal_analysis = (
                not st.session_state.show_seasonal_analysis
            )

        if st.session_state.show_seasonal_analysis:
            @st.cache_resource
            def plot_seasonal_analysis():
                # Boxplot of production by season, excluding 'Whole Year' records.
                # Filtering into a local frame leaves the shared DataFrame unchanged.
                seasonal_data = data[data.Season != 'Whole Year']
                fig = plt.figure(figsize=(12, 6))
                sns.boxplot(x='Season', y='Production', data=seasonal_data)
                plt.title('Production by Season')
                plt.xlabel('Season')
                plt.ylabel('Production (Tonnes)')
                plt.xticks(rotation=45)
                st.pyplot(fig)

            plot_seasonal_analysis()

    elif st.session_state.page == "Yield Prediction Model":
        st.title("Yield Prediction Model")
        st.write("""A Yield Prediction Model is essential for optimizing resource use, financial planning,
                and risk management in agriculture. It enables forecasting of crop yields, helping
                farmers and businesses make informed decisions. Calculating Mean Squared Error (MSE) is crucial
                as it measures the average squared difference between actual and predicted values, providing a
                clear metric for model accuracy. Lower MSE values indicate better model performance, guiding
                improvements and comparisons between different models.""")
        if st.button("Show Yield Prediction Model Graph"):
            st.session_state.show_yield_prediction_model = (
                not st.session_state.show_yield_prediction_model
            )

        if st.session_state.show_yield_prediction_model:
            @st.cache_resource
            def plot_yield_prediction_model():
                data = pd.read_csv('India Agriculture Crop Production.csv')

                # Coerce to numeric first, then drop rows that are missing or non-numeric
                cols = ['Area', 'Production', 'Yield']
                data[cols] = data[cols].apply(pd.to_numeric, errors='coerce')
                data = data.dropna(subset=cols)

                X = data[['Area', 'Production']]
                y = data['Yield']

                # Check for any remaining NaNs
                if X.isnull().any().any() or y.isnull().any():
                    st.write("Data contains missing values.")
                    return

                X_train, X_test, y_train, y_test = train_test_split(
                    X, y, test_size=0.2, random_state=42)

                model = LinearRegression()
                model.fit(X_train, y_train)

                y_pred = model.predict(X_test)

                mse = mean_squared_error(y_test, y_pred)
                st.write(f'Mean Squared Error: {mse}')

                # The target variable is Yield, so the chart compares yields
                st.write("Actual vs Predicted Yield")
                comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
                st.line_chart(comparison)

            plot_yield_prediction_model()


if __name__ == "__main__":
    main()

Command to run the Project

python -m streamlit run agriculture_app1.py
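
The pages in the listing repeatedly perform two small data-preparation steps: excluding the 2020-21 season and turning 'YYYY-YY' labels such as '2001-02' into numbers. The sketch below shows one possible way to factor these steps into helpers that return new objects rather than changing the loaded dataset. The function names `season_start_year` and `without_season` and the sample rows are illustrative assumptions, not part of the project code.

```python
import pandas as pd


def season_start_year(label: str) -> int:
    """Return the starting calendar year of a 'YYYY-YY' season label."""
    return int(label.split("-")[0])


def without_season(df: pd.DataFrame, label: str = "2020-21") -> pd.DataFrame:
    """Return a filtered copy; the caller's DataFrame is left untouched."""
    return df[df["Year"] != label].copy()


# Made-up sample rows, used only to exercise the helpers (hypothetical values).
sample = pd.DataFrame(
    {
        "Year": ["1997-98", "1999-00", "2020-21"],
        "Production": [10.0, 12.0, 3.0],
    }
)

filtered = without_season(sample)
filtered["StartYear"] = filtered["Year"].apply(season_start_year)
print(filtered)
```

Because each helper returns a new object, the order in which pages are visited no longer affects what the other pages see in the shared dataset.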

Screenshots:

5. Conclusion

In conclusion, the India Agriculture Crop Production
Analysis provides insights into the trends, patterns, and
relationships among crop area, production, and yield over
the years. By applying techniques such as correlation
analysis and predictive modeling with linear regression,
we can identify which variables are most strongly
associated with production.
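
As a rough illustration of the two techniques named above, the following sketch fits a straight line to yearly totals, extends it five years ahead, and computes the Pearson correlation between year and production. It uses `numpy.polyfit` in place of scikit-learn's `LinearRegression` to stay dependency-light, and the yearly totals are made-up numbers chosen to lie exactly on a line, so they say nothing about the real dataset.

```python
import numpy as np

# Made-up yearly totals on an exact line (hypothetical values, not project data).
years = np.array([2015, 2016, 2017, 2018, 2019], dtype=float)
production = 100.0 + 5.0 * (years - 2015)

# Degree-1 least-squares fit: production is approximated by slope * year + intercept.
slope, intercept = np.polyfit(years, production, deg=1)

# Extend the fitted line over the five years after the last observation.
future_years = np.arange(years[-1] + 1, years[-1] + 6)
forecast = slope * future_years + intercept

# Pearson correlation coefficient between year and production.
r = np.corrcoef(years, production)[0, 1]

print(f"slope={slope:.3f}, r={r:.3f}")
print(dict(zip(future_years.astype(int).tolist(), forecast.round(1).tolist())))
```

A correlation near 1 or -1 indicates that a straight line describes the series well; when it is closer to 0, a linear extrapolation of this kind is a weak basis for projections.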
This analysis not only helps in understanding past
performance but also supports forecasting of future
production and yield, aiding strategic planning and
decision-making. Evaluating the regression models with a
metric such as Mean Squared Error (MSE) shows how closely
the predictions match observed values and guides further
improvement of the models.
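
For reference, Mean Squared Error is the average of the squared differences between actual and predicted values. The sketch below computes it directly with NumPy; the function name `mse_by_hand` and the three sample values are assumptions made for illustration, and scikit-learn's `mean_squared_error` should give the same number for the same inputs.

```python
import numpy as np


def mse_by_hand(actual, predicted) -> float:
    """Average of squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


# Hypothetical values: the errors are -0.5, 0.0 and 1.0,
# so the squared errors are 0.25, 0.0 and 1.0.
actual = [2.0, 3.0, 5.0]
predicted = [2.5, 3.0, 4.0]

mse = mse_by_hand(actual, predicted)
print(f"MSE = {mse:.4f}")
```

Because the errors are squared, MSE is expressed in squared units of the target and weights large errors more heavily than small ones.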
Ultimately, this comprehensive analysis serves as a
valuable tool for policymakers, farmers, and researchers
to improve crop management, ensure food security, and
drive sustainable agricultural growth in India.

6. Future Enhancement
Remote Sensing and Satellite Imagery: Utilize remote sensing
technologies and satellite imagery to monitor crop health, soil moisture,
and other critical parameters in real-time, enabling more precise and
timely interventions.

IoT Integration: Deploy Internet of Things (IoT) devices in fields to
collect real-time data on weather conditions, soil properties, and crop
health. This data can be integrated with predictive models to enhance
decision-making.

Climate Change Impact Analysis: Conduct detailed studies on the
impact of climate change on crop production. Develop adaptive
strategies and models to mitigate adverse effects and ensure resilience in
agricultural practices.

Precision Agriculture: Implement precision agriculture techniques that
use data analytics to optimize the use of inputs like water, fertilizers, and
pesticides, thereby increasing efficiency and reducing environmental
impact.

Mobile Applications for Farmers: Develop user-friendly mobile
applications that provide farmers with real-time data, predictive insights,
and recommendations based on the latest analysis, empowering them to
make informed decisions.

References
https://www.youtube.com/
https://www.kaggle.com/
https://docs.streamlit.io/
