
A MINI PROJECT REPORT

ON

ENHANCING AGRICULTURAL
EFFICIENCY TECHNIQUES IN
PRECISION FARMING

Submitted in partial fulfilment of the requirement


For the award of the degree of

BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE AND ENGINEERING


By

N. Saikavyasree 20P61A05F1
R. Nikhila 21P65A0518
P. Neha 20P61A05H4

Under the esteemed guidance of

Dr. A.L Sreenivasulu


Professor
Department of CSE

Aushapur(V), Ghatkesar(M), Hyderabad, Medchal – Dist, Telangana – 501 301.

September – 2023

DEPARTMENT
OF
COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the mini project titled “ENHANCING AGRICULTURAL EFFICIENCY TECHNIQUES IN PRECISION FARMING” submitted by N. Saikavyasree (20P61A05F1), R. Nikhila (21P65A0518), and P. Neha (20P61A05H4) in B.Tech IV-I semester Computer Science & Engineering is a record of the bonafide work carried out by them.

The results embodied in this report have not been submitted to any other University for the award of any degree.

INTERNAL GUIDE: Dr. A.L Sreenivasulu (Professor, CSE)

HEAD OF THE DEPARTMENT: Dr. M. Venkateswara Rao

EXTERNAL EXAMINER

DECLARATION

We, N. Saikavyasree, R. Nikhila, and P. Neha, bearing hall ticket numbers 20P61A05F1, 21P65A0518, and 20P61A05H4, hereby declare that the mini project report entitled “ENHANCING AGRICULTURAL EFFICIENCY TECHNIQUES IN PRECISION FARMING”, carried out under the guidance of Dr. A.L Sreenivasulu, Associate Professor, Department of Computer Science and Engineering, Vignana Bharathi Institute of Technology, Hyderabad, has been submitted to Jawaharlal Nehru Technological University Hyderabad, Kukatpally, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering.

This is a record of bonafide work carried out by us, and the results embodied in this project have not been reproduced or copied from any source. The results embodied in this project report have not been submitted to any other university or institute for the award of any other degree or diploma.

N. Saikavyasree (20P61A05F1)
R. Nikhila (21P65A0518)
P. Neha (20P61A05H4)

ACKNOWLEDGEMENT

We are extremely thankful to our beloved Chairman, Dr. N. Goutham Rao and Secretary,

Dr. G. Manohar Reddy who took keen interest to provide us the infrastructural facilities for

carrying out the project work.

Self-confidence, hard work, commitment, and planning are essential to carry out any task. Possessing these qualities is a sheer waste if an opportunity does not exist. So, we whole-heartedly thank Dr. P. V. S. Srinivas, Principal, and Dr. M. Venkateswara Rao, Head of the Department, Computer Science and Engineering, for their encouragement, support, and guidance in carrying out the project.

We would like to express our indebtedness to the project coordinator, Mr. G. Srikanth

Reddy, Associate Professor, Department of CSE and Mrs. P. Subhadra, Associate Professor,

Department of CSE, for their valuable guidance during the course of project work.

We thank our Project Guide, Dr. A.L Sreenivasulu, Professor, Department of CSE, for

providing us with an excellent project and guiding us in completing our mini project

successfully.

We would like to express our sincere thanks to all the staff of Computer Science and

Engineering, VBIT, for their kind cooperation and timely help during the course of our project.

Finally, we would like to thank our parents and friends, who have always stood by us whenever we were in need of them.

ABSTRACT

Agriculture is a sector on which most people depend for their livelihood. Crop production is influenced by a variety of seasonal, economic, and biological patterns, and unforeseen variations in these patterns result in significant losses for farmers. These hazards can be mitigated when appropriate procedures are applied to data on soil type, temperature, air pressure, humidity, and crop type. Precision farming focuses on the important aspect of inter-field and intra-field variability for growing crops. Crop and weather outcomes, in turn, can be forecast by obtaining helpful insights from agricultural yield data and crop cost prediction algorithms. The system maps the input given by the user against the crop data already stored in the database to predict the crop that is most appropriate for the user’s soil and environment.

Increasing environmental consciousness among the general public requires us to modify agricultural management practices for sustainable conservation of natural resources such as water, air, and soil quality, while remaining economically profitable. With the help of new technology, we can gather data related to soil and weather and decide which type of crop should be planted to increase profit. The project is developed in Python on PyCharm/Google Colab, and its hardware requirements are 8 GB of RAM and a 512 GB hard disk.

CONTENTS

1. Introduction
   1.1. Introduction to the System
   1.2. Problem Statement
   1.3. Objective
   1.4. Aim of the Project
2. Literature Survey
   2.1. Existing System
   2.2. Proposed System
   2.3. Scope of the Project
3. Analysis
   3.1. Technical Feasibility
   3.2. Operational Feasibility
   3.3. Economical Feasibility
4. Hardware and Software Requirements
   4.1. Hardware Requirements
   4.2. Software Requirements
5. System Design and Implementation
   5.1. Software Design
   5.2. System Architecture
   5.3. UML Diagrams
6. Source Code and Performance Evaluation
   6.1. Testing
   6.2. Output
7. Conclusion and Future Work
8. References
LIST OF FIGURES

1. Elbow method for optimal values
2. System Architecture Diagram
3. Use Case Diagram
4. Collaboration Diagram
5. Activity Diagram
6. Sequence Diagram
7. Deployment Diagram
8. Reading dataset and description for each of the columns in the dataset
9. Descriptive statistics and cluster analysis
10. Graphical representation of crops in different conditions
11. Classification report of the data
12. Prediction of suitable crops and dataset sample
CHAPTER –1
1. INTRODUCTION

1.1. INTRODUCTION TO THE SYSTEM

Agriculture is the foundation of all economies. It has long been regarded as the primary and most important activity practised in every region. There are numerous methods for increasing and improving agricultural output and quality. Data mining can be used to forecast crops. Data mining, in general, is the process of examining data from various angles and synthesizing it into valuable knowledge. Crop forecasting is a significant agricultural issue: every farmer wants to know which crop to grow in a given region at a particular time. Farmers are required to produce more and more crops as the weather changes swiftly from day to day. Given the current circumstances, many of them are unaware of the potential losses, and equally unaware of the benefits they could gain from the crops they farm. To detect and process data, the proposed system uses machine learning and prediction algorithms such as Logistic Regression and Clustering, which in turn aids in predicting the most suitable crop.

1.2. PROBLEM STATEMENT

Farmers must decide which crop to grow on a given plot at a given time, but unforeseen variations in weather, soil, and other seasonal patterns make this decision difficult and can lead to significant losses. The problem addressed here is to predict the crop best suited to a given field from data on soil nutrients (N, P, K), temperature, humidity, pH, and rainfall, using machine learning techniques: K-means clustering, a simple unsupervised learning approach for resolving clustering problems, and logistic regression, a widely used supervised learning algorithm for predicting a categorical dependent variable from a set of independent variables.

1.3. OBJECTIVE

The main objective is to help farmers with the problems they face when unforeseen changes in climate drastically affect crop production. With a proper mitigation process, the system can also help reduce the significant losses farmers incur.

1.4. AIM OF THE PROJECT

The main aim of this project is to address the problems that arise when unforeseen changes in climate affect crop production, and to decrease the resulting losses so that, by taking the necessary measures, we can decide which crop should be planted.

CHAPTER –2
2. LITERATURE SURVEY

Jérôme Treboux, Dominique Genoud (Institute of Information Systems, University of Applied Sciences, HES-SO Valais, Sierre, Switzerland): This paper presents the impact of machine learning in precision agriculture. The study compares an innovative machine learning methodology with a baseline classically used on vineyard and agricultural objects. The main drawback is that aerial imaging may sometimes be inaccurate due to weather conditions, which causes miscalculation and may lead to wrong crop recommendations.

Priyadharshini A, Swapneel Chakraborty, Aayush Kumar, Omen Rajendra Pooniwala (Information Science and Engineering, CMR Institute of Technology, Bengaluru, India): Agriculture plays a vital role in the socioeconomic fabric of India. The failure of farmers to decide on the best-suited crop for their land using traditional, non-scientific methods is a serious issue for a country where approximately 58 percent of the population is involved in farming. Farmers sometimes fail to choose the right crops based on the soil conditions, sowing season, and geographical location. The results of this research work are not very consistent, and the approach has to be further optimized to obtain better results.

M. Kalimuthu, P. Vaishnavi, M. Kishore (Bannari Amman Institute of Technology, Tamil Nadu, India): Nowadays, food production and yield prediction are deteriorating due to unnatural climatic changes, which adversely affect farmers' economic situation through poor yields and leave them less able to forecast future crops. This research work guides beginner farmers in sowing suitable crops by deploying machine learning, one of the advanced technologies for crop prediction. Naive Bayes, a supervised learning algorithm, is put forward as the way to achieve this. The system relies solely on Naive Bayes and is less optimized.

2.1. EXISTING SYSTEM

Algorithms like Random Forest and Naive Bayes for precise agriculture have been used in the
past.

Random Forests (RF) have been used for their ability to predict crop yield responses to different variables at global and regional scales, with multiple linear regression (MLR) serving as a standard for comparison. When forecasting the extreme ends, or responses beyond the confines of the training data, however, RF may suffer a loss of accuracy. The Naive Bayes method is a supervised learning technique, based on Bayes' theorem, for addressing classification problems. The original Naive Bayes algorithm has a severe flaw: it generates redundant predictors.

Each method is accurate in particular settings, but they are not optimised and do not run smoothly on all computers.
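For comparison, a minimal sketch of how these two baseline classifiers could be evaluated on a labelled crop dataset; the file name data.csv and the 'label' column are assumptions matching the dataset used later in this report:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assumed dataset: soil/climate features plus a 'label' column holding the crop name
data = pd.read_csv('data.csv')
x = data.drop(['label'], axis=1)
y = data['label']
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=42)

for name, clf in [('Random Forest', RandomForestClassifier(n_estimators=100)),
                  ('Naive Bayes', GaussianNB())]:
    clf.fit(train_x, train_y)
    print(name, 'accuracy:', accuracy_score(test_y, clf.predict(test_x)))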

2.2. PROPOSED SYSTEM

K-means clustering is a simple unsupervised learning approach for resolving clustering problems.
Logistic regression is one of the most widely used Machine Learning algorithms, and it belongs to
the Supervised Learning approach. It is employed in the prediction of a categorical dependent
variable using a set of independent variables.
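A condensed, illustrative sketch of this two-step approach (the full listing appears in Chapter 6; the file name data.csv and the 'label' column follow the dataset described there):

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')                      # assumed crop dataset
features = data.drop(['label'], axis=1)

# Unsupervised step: group observations with similar soil/climate requirements
clusters = KMeans(n_clusters=4, n_init=10, init='k-means++').fit_predict(features)
print(pd.Series(clusters).value_counts())           # size of each cluster

# Supervised step: predict the categorical crop label from the independent variables
train_x, test_x, train_y, test_y = train_test_split(features, data['label'], test_size=0.2)
model = LogisticRegression(max_iter=1000).fit(train_x, train_y)
print('Test accuracy:', model.score(test_x, test_y))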

2.2.1 Data ingestion

Data ingestion is the process of transferring information from many sources to a storage medium where it may be accessed, used, and analysed. A data warehouse, database, or document store is frequently used as the destination. SaaS data, internal apps, databases, spreadsheets, and even information scraped from the internet can all serve as sources. The data ingestion layer is the foundation of any analytics architecture. Data consistency and accessibility are essential for downstream reporting and analytics. Different models or architectures might be used to construct a data ingestion layer.
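A minimal sketch of such an ingestion layer, assuming two illustrative sources (a CSV file and an Excel spreadsheet) and a SQLite database as the destination store; all file and table names are hypothetical:

import sqlite3
import pandas as pd

# Assumed sources: a CSV export and an Excel spreadsheet (names are illustrative)
csv_frame = pd.read_csv('sensor_readings.csv')
sheet_frame = pd.read_excel('field_survey.xlsx')    # requires openpyxl

combined = pd.concat([csv_frame, sheet_frame], ignore_index=True)

# Destination: a simple database that downstream analytics can query
with sqlite3.connect('agri_warehouse.db') as conn:
    combined.to_sql('raw_observations', conn, if_exists='replace', index=False)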
2.2.2 Data Pre-processing

Data pre-processing is a data mining approach for transforming raw data into a format that is both
useful and efficient.

The information presented here goes through two stages:


Data cleaning: It is critical that data be free of errors and undesirable entries. As a result, the data is cleaned before proceeding. Missing values, duplicate records, and improper formatting are all checked for and removed during data cleansing.
Data Transformation: Data transformation is the mathematical transformation of datasets; data is
changed into appropriate formats for data mining. This allows us to better grasp the data by
arranging the hundreds of records in a logical sequence. Normalisation, standardisation, and
attribute selection are examples of transformations.
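A brief sketch of these two stages applied to the crop dataset (column names assumed from the listing in Chapter 6):

import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('data.csv')                      # assumed crop dataset

# Data cleaning: drop duplicate records and rows with missing values
data = data.drop_duplicates().dropna()

# Data transformation: standardise the numeric features so they share a common scale
feature_cols = ['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']
data[feature_cols] = StandardScaler().fit_transform(data[feature_cols])
print(data.describe())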

2.2.3 Exploratory Data Analysis

Exploratory data analysis (EDA) is a technique for better understanding datasets using visual
features such as scatter plots and bar charts. This helps us to more effectively identify data trends
and execute analysis accordingly.

Fig: 2.2.3 elbow method for optimal values
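A plot like Fig. 2.2.3 can be reproduced with a sketch along these lines: fit K-means for a range of cluster counts and plot the inertia (within-cluster sum of squares) against k, looking for the "elbow" where the curve flattens. The data file name is assumed from Chapter 6.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = pd.read_csv('data.csv')                      # assumed crop dataset
x = data.drop(['label'], axis=1)

inertia = []
ks = range(1, 11)
for k in ks:
    # Within-cluster sum of squares for each candidate number of clusters
    inertia.append(KMeans(n_clusters=k, n_init=10, init='k-means++').fit(x).inertia_)

plt.plot(ks, inertia, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Within-cluster sum of squares')
plt.title('Elbow method for optimal k')
plt.show()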

2.2.4 Building and Training models

Train-Test Split:

In machine learning model development, it is desirable that the trained model performs well on new, unseen data. To simulate new, unseen data, the available data is split into two parts. The first part is a larger subset typically used as the training set, and the second is a smaller subset typically used as the test set. The training set is used to build a predictive model, which is then applied to the test set to make predictions. The best model is chosen based on its performance on the test set, and various optimizations can be performed to obtain the best possible model.

Train validation test split:


A different approach to data splitting is to divide the data into three parts:
Training set,
Validation set, and
Test set.

Similar to the above, the training set is used to build the predictive model, which is then evaluated against the validation set. This allows you to make predictions, tune your model, and select the best-performing model based on its results on the validation set; the validation set plays the role the test set played above. Note that the test set is not involved in model building and tuning. Therefore, the test set can act as genuinely new, unseen data.
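A sketch of the three-way split using two calls to train_test_split; the 60/20/20 proportions are an assumption for illustration, not a figure from this report:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')                      # assumed crop dataset
x, y = data.drop(['label'], axis=1), data['label']

# First carve off the test set, then split what remains into training and validation sets
train_val_x, test_x, train_val_y, test_y = train_test_split(x, y, test_size=0.2)
train_x, val_x, train_y, val_y = train_test_split(train_val_x, train_val_y, test_size=0.25)
# 0.25 of the remaining 80% yields a 60/20/20 train/validation/test split
print(len(train_x), len(val_x), len(test_x))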

2.2.5 Data Modelling


Data modelling entails building a data model for data that will be kept in a database. Modelling involves training a machine learning algorithm to predict labels from features, adjusting it for business requirements, and verifying it on holdout data. The result of modelling is a trained model that can be used to estimate new data points and make predictions. Modelling differs from the previous steps in the machine learning process in that it uses standardised input, so we can change the prediction problem without having to rewrite all of our code. If the business requirements change, we can produce new labels, build the associated features, and feed them into the model. Models are developed and then evaluated.
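As an illustration of such standardised input, the hypothetical helper below trains and evaluates a model for whatever label column is currently needed, so the prediction problem can change without rewriting the modelling code:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def build_model(frame: pd.DataFrame, label_col: str):
    """Hypothetical helper: train a model for whichever label the business needs."""
    x, y = frame.drop(columns=[label_col]), frame[label_col]
    train_x, hold_x, train_y, hold_y = train_test_split(x, y, test_size=0.2)
    model = LogisticRegression(max_iter=1000).fit(train_x, train_y)
    return model, model.score(hold_x, hold_y)       # verified on holdout data

data = pd.read_csv('data.csv')                      # assumed crop dataset
model, holdout_accuracy = build_model(data, 'label')
print('Holdout accuracy:', holdout_accuracy)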

2.3. SCOPE OF THE PROJECT

The applications of this project are at present limited, but with proper research, great results are expected. There are limitations with this project, such as requiring a larger amount of data to determine suitable crops than conventional methods. But there is scope for improvement: with better algorithms, there could be a significant decrease in evaluation time. Further, this project can be extended by creating a mobile-compatible application, so that users can determine their own crop without relying on others to decide which crop should be planted.

CHAPTER –3
3. ANALYSIS

3.1. FEASIBILITY STUDY

A feasibility study is a high-level capsule version of the entire system analysis and design process. The study begins by clarifying the problem definition; feasibility is determining whether the system is worth building. Once an acceptable problem definition has been generated, the analyst develops a logical model of the system, and a search for alternatives is analyzed carefully.

Three key considerations involved in the feasibility analysis are

• Technical Feasibility
• Operational Feasibility
• Economic Feasibility

3.1.1 Technical Feasibility

To determine whether the proposed system is technically feasible, a number of issues have to be
considered while doing technical analysis.
Understand the different technologies involved in the proposed system. Find out whether the
organization currently possesses the required technologies.

3.1.2 Operational Feasibility

To determine the operational feasibility of the system, we should take into consideration the awareness level of the users. This system is operationally feasible since the users are familiar with agricultural technologies, and it helps reduce the hardships encountered in the existing manual system. The new system was therefore considered operationally feasible, user-friendly, and easy to use.
3.1.3 Economical Feasibility
To decide whether a project is economically feasible, we have to consider various factors

• Cost benefit analysis


• Long-term returns
• Maintenance costs

The proposed system is developed in Python. It requires only average computing capabilities and access to the internet, which are very basic requirements; hence it does not incur any additional economic overheads, which renders the system economically feasible.
CHAPTER– 4
4. HARDWARE AND SOFTWARE REQUIREMENTS

4.1. Hardware Requirements

• Processor: 64-bit Intel Core i5, 2.1 GHz minimum per core
• RAM: 8 GB
• Hard Disk: 512 GB SSD

4.2. Software Requirements

• Operating System: Microsoft Windows 10
• IDE: PyCharm / Google Colab
• Language Used: Python

CHAPTER– 5
5. SYSTEM DESIGN AND IMPLEMENTATION

System design is the transition from a user-oriented document to one for programmers or database personnel. The design is a solution specifying how to approach the creation of the new system, and it is composed of several steps. It provides the understanding and procedural details necessary for implementing the system recommended in the feasibility study. Design goes through logical and physical stages of development. Logical design reviews the present physical system, prepares input and output specifications, details the implementation plan, and prepares a logical design walkthrough.

The database tables are designed by analyzing the functions involved in the system, and the format of the fields is also designed. The fields in the database tables should define their role in the system. Unnecessary fields should be avoided because they affect the storage requirements of the system. Then, in the input and output screen design, the design should be made user friendly. The menu should be precise and compact.

5.1. SOFTWARE DESIGN

• Modularity and partitioning: software is designed in such a way that each system consists of a hierarchy of modules, partitioned into separate functions.
• Coupling: modules should have little dependency on the other modules of a system.
• Cohesion: modules should carry out their operations in a single processing function.
• Shared use: avoid duplication by providing a single module that is called by any other module that needs the function it provides.
5.1.1. INPUT DESIGN

Considering the requirements, procedures are adopted to collect the necessary input data in the most efficiently designed format. The input design has to be done keeping in view that the interaction of the user with the system should be as effective and simple as possible. The necessary measures are also taken for the following:
• Controlling the amount of input
• Avoiding unauthorized access by users
• Eliminating extra steps
• Keeping the process simple
At this stage the input forms and screens are designed.

5.1.2. OUTPUT DESIGN

All the screens of the system are designed to provide the user with easy operation in a simple and efficient way, with the minimum number of keystrokes possible. Important information is emphasized on the screen. Almost every screen provides error and informational messages and facilitates option selection. Emphasis is given to faster processing and speedy transitions between screens. Each screen is designed to be as user-friendly as possible through interactive procedures. In other words, the user can operate the system without much help from the operating manual.

5.2. SYSTEM ARCHITECTURE

Firstly, we collect data from various resources and then ingest it. The data then goes through the pre-processing stages, and the final data is used for project execution. Here the data is used in further processing for the prediction of the crop that is most suitable and convenient to plant according to the weather and soil conditions of the area.

Fig: 5.2 System Architecture diagram for Precision Farming


5.3. UML DIAGRAMS

Unified Modelling Language

The Unified Modelling Language (UML) is a standard language for specifying, visualizing,
constructing, and documenting the artefacts of software systems, as well as for business modelling
and other non-software systems. The UML represents a collection of best engineering practices that
have proven to be successful in the modelling of large and complex systems. The UML is a very
important part of developing object-oriented software and the software development process. UML
mostly uses graphical notations to express the design of software projects. Using the UML helps
the project teams to communicate, explore potential designs and validate the architectural design
of the software.

The Unified Modelling Language (UML) is a standard language for writing software blueprints. The UML is a language for:
• Visualizing
• Specifying
• Constructing
• Documenting the artifacts of a software system.

UML is a language which provides vocabulary and the rules for combining words in that
vocabulary for the purpose of communication. A modelling language is a language whose
vocabulary and the rules focus on the conceptual and physical representation of a system. Modelling
yields an understanding of a system.
5.3.1. USE CASE DIAGRAM

• A use case diagram consists of use cases and actors.
• Its main purpose is to show the interaction between the use cases and the actors.
• It intends to represent the system requirements from the user's perspective.
• The use cases are the functions that are to be performed in the module.

Fig: 5.3.1. Use Case Diagram for Precision Farming

5.3.2. COLLABORATION DIAGRAM

• An architecture of a system defines the working model of a system.
• It provides an overview of a system in the form of UML (Unified Modelling Language) diagrams.

Fig: 5.3.2. Collaboration Diagram for precision farming

5.3.3. ACTIVITY DIAGRAM

• It shows the flow of the various activities that are undergone from the beginning till the end.
• It consists of the activities that are held and carried out throughout the session, from the starting till the ending stage.

Fig: 5.3.3. Activity Diagram for Precision Farming


5.3.4. SEQUENCE DIAGRAM

• It shows the sequence of the steps that are carried out throughout the process of execution.
• It involves lifelines, or the lifetime of a process, which show the duration for which the process is alive while the steps take place in a sequential manner.
• A sequence diagram specifies the order in which the various steps are executed.

Fig: 5.3.4. Sequence Diagram for Precision Farming


5.3.5. DEPLOYMENT DIAGRAM

• The deployment diagram visualizes the physical hardware on which the software will be deployed. It portrays the static deployment view of a system. It involves the nodes and their relationships.

Fig: 5.3.5. Deployment Diagram for precision farming



CHAPTER–6
6. SOURCE CODE AND PERFORMANCE EVALUATION

6.1. Testing

Testing is a critical phase in precision agriculture to ensure the accuracy, robustness, and
effectiveness of the model and systems. Here is a detailed explanation of some common testing
methods used in such projects:

1. Data Quality Testing: This method focuses on ensuring that collected data is accurate, complete, and consistent. Identify and address missing values, outliers, and noise in the dataset, and test the methods used for filling missing data points where necessary.

2. Real-time Data Testing: Real-time data testing verifies the model's ability to handle and process real-time data streams, such as weather updates, and assesses the model's response time in providing recommendations or predictions.

3. Performance Testing: Performance testing assesses accuracy, precision, recall, F1-score, and other relevant metrics based on the project's objectives, and uses metrics such as RMSE, MAE, and R-squared for regression tasks (a short metrics sketch is given after this list).

4. Security Testing: Security is a crucial aspect; this tests data encryption, access controls, and data privacy measures, and assesses model security against tampering and unauthorized access.

5. Integration Testing: Integration testing verifies that APIs and data integrations with external systems (e.g., weather APIs, farm management software) work as expected, and ensures compatibility with the various devices and operating systems used on farms.

6. Model Robustness Testing: Test the model's vulnerability to adversarial attacks or input variations, and evaluate its robustness to noise, interference, or sensor inaccuracies in the field.

7. Scalability Testing: Scalability testing assesses the model's performance as the dataset size increases and evaluates the system's ability to handle multiple users or devices simultaneously.

8. User Interface (UI) and User Experience (UX) Testing: UI and UX testing ensures that the user interface is intuitive and easy to use for farmers and stakeholders. Gather feedback from users to make improvements in the UI/UX.

9. Failure and Recovery Testing: Failure and recovery testing simulates system failures (e.g., server downtime) and tests the recovery mechanisms.

10. User Acceptance Testing (UAT): User acceptance testing involves end-users (farmers, agronomists) in testing to gather their feedback and ensure the system meets their needs.

These testing methods are crucial for adapting to changing environmental conditions and evolving user needs. Additionally, consider conducting field trials and validations to assess how well the machine learning models perform in real-world farming environments.
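As referenced under Performance Testing above, a minimal sketch of computing both groups of metrics with scikit-learn; the y_true/y_pred values are illustrative placeholders, not results from this project:

from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             mean_squared_error, mean_absolute_error, r2_score)

# Classification metrics (e.g. predicted crop labels); toy values for illustration only
y_true = ['rice', 'maize', 'rice', 'cotton']
y_pred = ['rice', 'rice', 'rice', 'cotton']
print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred, average='macro', zero_division=0))
print('Recall   :', recall_score(y_true, y_pred, average='macro', zero_division=0))
print('F1-score :', f1_score(y_true, y_pred, average='macro', zero_division=0))

# Regression metrics (e.g. predicted yields), as mentioned for regression tasks
y_true_reg = [3.2, 4.1, 5.0]
y_pred_reg = [3.0, 4.3, 4.8]
print('RMSE:', mean_squared_error(y_true_reg, y_pred_reg) ** 0.5)
print('MAE :', mean_absolute_error(y_true_reg, y_pred_reg))
print('R2  :', r2_score(y_true_reg, y_pred_reg))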
Source code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the crop dataset and inspect the first few rows
data = pd.read_csv('data.csv')
data.head()

# Count missing values in each column
data.isnull().sum()

print("Average Ratio of N : {0:.2f}".format(data['N'].mean()))


print("Average Ratio of P: {0:.2f}".format(data['P'].mean()))
print("Average Ratio of K: {0:.2f}".format(data['K'].mean()))
print("Average Temperature : {0:.2f}".format(data['temperature'].mean()))
print("Average Humidity : {0:.2f}".format(data['humidity'].mean()))
print("Average PH value : {0:.2f}".format(data['ph'].mean()))
print("Average Rainfall : {0:.2f}".format(data['rainfall'].mean()))

print("Winter Crops")
print(data[(data['temperature']<20.1)&(data['humidity']>30.1)]['label'].unique())
print("\n")
print("Rainy Crops")
print(data[(data['rainfall']>=200.9)&(data['humidity']>=30.1)]['label'].unique())
print("\n")
print('Summer Crops')
print(data[(data['temperature']>=30.1)&(data['humidity']>=60.1)]['label'].unique())

# Cluster the crops on their soil and climate requirements
x = data.loc[:, ['N','P','K','temperature','ph','humidity','rainfall']].values
data_x = pd.DataFrame(x)

k_means = KMeans(max_iter = 1000, n_clusters = 4, n_init = 10, init = 'k-means++')
ymeans = k_means.fit_predict(x)
n = data['label']
ymeans = pd.DataFrame(ymeans)
m = pd.concat([ymeans, n], axis = 1)
m = m.rename(columns = {0: 'cluster'})
print("K_Means Cluster Analysis\n")
print("Crops in First Cluster: ", m[m['cluster'] == 0]['label'].unique())
print("\t")
print("Crops in Second Cluster:", m[m['cluster'] == 1]['label'].unique())
print("\t")
print("Crops in Third Cluster:", m[m['cluster'] == 2]['label'].unique())
print("\t")
print("Crops in Fourth Cluster:", m[m['cluster'] == 3]['label'].unique())

# Bar plots of each feature per crop, arranged on a consistent 3 x 4 subplot grid
plt.subplot(3, 4, 1)
plt.xlabel('Nitrogen', fontsize=5)
sns.barplot(x=data['N'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 2)
plt.xlabel('Phosphorous', fontsize=5)
sns.barplot(x=data['P'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 3)
plt.xlabel('Potassium', fontsize=5)
sns.barplot(x=data['K'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 4)
plt.xlabel('Temperature', fontsize=5)
sns.barplot(x=data['temperature'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 5)
plt.xlabel('Humidity', fontsize=5)
sns.barplot(x=data['humidity'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 6)
plt.xlabel('pH value', fontsize=5)
sns.barplot(x=data['ph'], y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 7)
plt.xlabel('Rainfall', fontsize=5)
sns.barplot(x=data['rainfall'], y=data['label'])
plt.ylabel(' ')
plt.suptitle('Different Conditions on Crops', fontsize=15)
plt.show()

# Separate the features from the crop label and create an 80/20 train/test split
y = data['label']
x = data.drop(['label'], axis = 1)

train_x, test_x, train_y, test_y = train_test_split(x, y, test_size = 0.2)
print("Shape of X Train:", train_x.shape)
print("Shape of X Test:", test_x.shape)
print("Shape of Y Train:", train_y.shape)
print("Shape of Y Test:", test_y.shape)

# Standardise the features, then fit a logistic regression classifier on the scaled data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_x_scaled = scaler.fit_transform(train_x)
test_x_scaled = scaler.transform(test_x)
model = LogisticRegression(max_iter=1000, solver='sag', penalty='l2')
model.fit(train_x_scaled, train_y)
pred_y = model.predict(test_x_scaled)

# Evaluate the classifier on the held-out test set
cr = classification_report(test_y, pred_y)
print(cr)

# New samples must be scaled with the same scaler used in training.
# Feature order must match the training columns, i.e. data.drop(['label'], axis=1)
# (here: N, P, K, temperature, humidity, ph, rainfall).
prediction = model.predict(scaler.transform(np.array([[90, 40, 40, 20, 80, 7, 200]])))
print("The recommended Crop for the Given Climatic Condition:", prediction)

prediction = model.predict(scaler.transform(np.array([[90, 40, 20, 40, 60, 20, 200]])))
print("The recommended Crop for the Given Climatic Condition:", prediction)

6.1.1. Libraries used:

In this project we use various libraries like NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn.

• NumPy: a library that provides multi-dimensional array objects and a set of routines for manipulating them. NumPy can be used to conduct logical and mathematical operations on arrays.

• Pandas: a data cleaning and analysis tool that is widely used in Data Science and Machine Learning. Here, pandas is used to read the dataset.

• Matplotlib: a cross-platform library that creates 2D charts from array data. It has an object-oriented API for embedding plots in Python GUI toolkits such as PyQt, wxPython, and Tkinter. We use Matplotlib to plot graphs for better visualisation of the clustering.

• Seaborn: a Matplotlib-based data visualisation library. It helps in creating statistical graphs easily.

• Scikit-learn: classification, regression, and clustering are some of the useful tools in the sklearn package. Cross-validation, feature extraction, supervised learning methods, and unsupervised learning algorithms are all features of scikit-learn.

• sklearn.cluster.KMeans: used to group similar data points together. It scales well to large numbers of samples and has been used in a wide range of application fields.

• sklearn.model_selection.train_test_split: splits arrays into training and testing subsets.

• sklearn.linear_model.LogisticRegression: a classification technique that is used to predict a categorical dependent variable.

• sklearn.metrics.classification_report: a classification report consists of precision, recall, F1-score, and support, which together describe the accuracy of the model.

6.1.2. Loading Dataset:

We read the CSV file using the pd.read_csv command and then check whether any null values are present using isnull().

6.1.3. Descriptive Statistics:

Here, we get descriptive statistics and see the different crops that grow in different seasons.
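The listing above prints per-column averages; the fuller summary shown in the output figure is presumably produced with pandas' describe() (an assumption, as that call does not appear in the listing):

import pandas as pd

data = pd.read_csv('data.csv')
# count, mean, std, min, quartiles, and max for every numeric column
print(data.describe())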

6.1.4. K-means Implementation:

Now, we use k-means. Then, use Matplotlib and Seaborn for visualization of the clustering.

6.1.5. Visual Representation:

The following graph shows the visual representation of crops that can be grown in different
conditions.

6.1.6. Predictive Modelling:

Now, let's split the Dataset for Predictive Modelling. Then, by using “train_test_split” training and
testing of subsets is done for validation of results. Finally, we create a Predictive Model.
6.1.7. Classification Report:

Here we get the classification report for using logistic regression technique.

6.1.8. Final Output:

Finally, by giving the values of N, P, K, temperature, humidity, pH, and rainfall, we get the suitable crop for those climatic conditions.

6.2. Output

Fig: 6.2.1 Reading dataset

Fig: 6.2.2 Description for each of the columns in dataset

Fig: 6.2.3 Descriptive statistics

Fig: 6.2.4 Cluster analysis



Fig: 6.2.5 Graphical representation of crops in different conditions



Fig: 6.2.6 Classification report of the data



Fig: 6.2.7 Prediction of suitable crops

Database sample:

Fig: 6.2.8 Dataset sample


CHAPTER–7
7. CONCLUSION AND FUTURE SCOPE

CONCLUSION:

To conclude this documentation, we have discussed what our project is, its rationale, and the goal we achieve with it. We applied specific data analysis techniques to find a suitable crop using existing data. Logistic regression is used to extract important information from agricultural records. The analytical process began with data ingestion, followed by data cleansing and processing, handling of missing values, and exploration, and it concluded with analysing the data and, finally, modelling and evaluation. The model's accuracy was measured on the held-out test set, and the machine learning method was further assessed using precision, recall, and F1 score.

The proposed system takes into account relevant data on nitrogen, phosphorus, potassium, temperature, and rainfall, along with the highest-yielding crops over the past year that can be cultivated under appropriate environmental conditions. It lists all possible crops that can be cultivated and helps farmers decide which crop to grow. The system also takes historical data into account, which helps farmers gain insight into the requirements of the various crops that are suitable for growing on the given plot of land.

Future Scope:

The applications of this project are at present limited, but with proper research, great results are expected. There are limitations with this project, such as requiring a larger amount of data to determine suitable crops than conventional methods. But there is scope for improvement: with better algorithms, there could be a significant decrease in evaluation time.

Further, this project can also be extended by creating an application, to be compatible with
mobiles, so that the users can determine their own crop without relying on others.

CHAPTER–8
8. REFERENCES

1. https://scholar.google.co.in/scholar?q=random+forest+based+on+crop+prediction&hl=en&as_sdt=0&as_vis=1&oi=scholart – Random Forest (existing system)

2. https://ieeexplore.ieee.org/abstract/document/9214190 – Naive Bayes

3. https://publications.waset.org/9997276/modified-naive-bayes-based-prediction-modeling-for-crop-yield-prediction

4. https://en.wikipedia.org/wiki/Agriculture

5. https://en.wikipedia.org/wiki/Data_analysis

6. M. R. Bendre, R. C. Thool, V. R. Thool, “Big Data in Precision Agriculture”, NGCT, Sept. 2015.

7. Monali Paul, Santosh K. Vishwakarma, Ashok Verma, “Analysis of Soil Behavior and Prediction of Crop Yield using Data Mining Approach”.

8. Abdullah Na, William Isaac, Shashank Varshney, Ekram Khan, “An IoT Based System for Remote Monitoring of Soil Characteristics”, 2016 International Conference on Information Technology.

9. Dr. N. Suma, Sandra Rhea Samson, S. Saranya, G. Shanmugapriya, R. Subhashri, “IoT Based Smart Agriculture Monitoring System”, IJRITCC, Feb. 2017.

10. N. Heemageetha, “A Survey on Application of Data Mining Techniques to Analyse the Soil for Agricultural Purpose”, 2016 IEEE.
