MINI Project Report
ON
ENHANCING AGRICULTURAL
EFFICIENCY TECHNIQUES IN
PRECISION FARMING
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
N. Saikavyasree 20P61A05F1
R. Nikhila 21P65A0518
P. Neha 20P61A05H4
September – 2023
Aushapur(V), Ghatkesar(M), Hyderabad, Medchal – Dist, Telangana – 501 301.
DEPARTMENT
OF
COMPUTER SCIENCE & ENGINEERING
The results embodied in this report have not been submitted to any other university or institute for the award of any degree or diploma.
EXTERNAL EXAMINER
DECLARATION
We, N. Saikavyasree (20P61A05F1), R. Nikhila (21P65A0518), and P. Neha (20P61A05H4), hereby declare that the mini project report entitled "ENHANCING AGRICULTURAL EFFICIENCY TECHNIQUES IN PRECISION FARMING", carried out under the guidance of Dr. A.L Sreenivasulu, Associate Professor, Department of Computer Science and Engineering, Vignana Bharathi Institute of Technology, Hyderabad, has been submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering.
This is a record of bonafide work carried out by us and the results embodied in this project
have not been reproduced or copied from any source. The results embodied in this project
report have not been submitted to any other university or institute for the award of any degree or diploma.
N. Saikavyasree (20P61A05F1)
R. Nikhila (21P65A0518)
P. Neha (20P61A05H4)
ACKNOWLEDGEMENT
We are extremely thankful to our beloved Chairman, Dr. N. Goutham Rao and Secretary,
Dr. G. Manohar Reddy, who took a keen interest in providing us with the infrastructural facilities for carrying out this project work.
Self-confidence, hard work, commitment, and planning are essential to carry out any task.
Possessing these qualities is a sheer waste if an opportunity to use them does not exist. So, we whole-heartedly thank Dr. P. V. S. Srinivas, Principal, and Dr. M. Venkateswara Rao, Head of the Department, Computer Science and Engineering, for their encouragement and support and for giving us this opportunity.
We would like to express our indebtedness to the project coordinator, Mr. G. Srikanth
Reddy, Associate Professor, Department of CSE and Mrs. P. Subhadra, Associate Professor,
Department of CSE, for their valuable guidance during the course of project work.
We thank our Project Guide, Dr. A.L Sreenivasulu, Professor, Department of CSE, for
providing us with an excellent project and guiding us in completing our mini project
successfully.
We would like to express our sincere thanks to all the staff of Computer Science and
Engineering, VBIT, for their kind cooperation and timely help during the course of our project.
Finally, we would like to thank our parents and friends who have always stood by us whenever we needed them.
ABSTRACT
Agriculture is a sector on which most people depend for their livelihood. Crop production is influenced by a variety of seasonal, economic, and biological patterns, yet unforeseen variations in these patterns result in significant losses for farmers. These hazards can be mitigated when appropriate procedures are applied to data linked to soil type, temperature, air pressure, humidity, and crop type. Precision farming focuses on the important aspect of inter-field and intra-field variability in growing crops. Crop and weather behaviour, in turn, can be forecasted by obtaining helpful insights from agricultural yield data and crop cost prediction algorithms. The system maps the input given by the user against the crop data already stored in the database to predict the crop that is most appropriate for the user's soil and environment.
CONTENTS
4. Collaboration Diagram 16
5. Activity Diagram 17
6. Sequence Diagram 18
7. Deployment Diagram 19
CHAPTER –1
1. INTRODUCTION
Agriculture is the foundation of all economies. It has long been regarded as the primary and most important occupation practised in every region. There are numerous methods for increasing and
improving agricultural output and quality. Data mining can be used to forecast crops. Data mining,
in general, is the process of examining data from various angles and synthesizing it into valuable
knowledge. Crop forecasting is a significant agricultural issue. Every farmer wants to know which
crop to grow in a given region at that particular time. Farmers are required to produce more and
more crops as the weather changes swiftly from day to day. Under the current circumstances, many of them are unaware both of the potential losses they face and of the benefits they could receive from what they farm. To detect and process data, the proposed system uses machine learning and prediction algorithms such as Logistic Regression and Clustering, which in turn aid in predicting the crop.
K-means clustering is a simple unsupervised learning approach for resolving clustering problems.
Logistic regression is one of the most widely used Machine Learning algorithms, and it belongs to
the Supervised Learning approach. It is employed in the prediction of a categorical dependent
variable using a set of independent variables.
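As an illustration of how these two algorithms fit together, the following minimal sketch (assuming a crop dataset 'data.csv' with a categorical 'label' column, as used later in this report) clusters the growing conditions with K-means and then fits a logistic regression classifier; it is a simplified outline rather than the final project code.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# load the assumed crop dataset: condition columns plus a categorical 'label'
df = pd.read_csv('data.csv')
features = df.drop(columns=['label'])
labels = df['label']

# unsupervised step: group similar growing conditions into four clusters
clusters = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(features)
print("Cluster sizes:", pd.Series(clusters).value_counts().to_dict())

# supervised step: learn a mapping from conditions to the crop label
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)
print("Test accuracy:", clf.score(x_test, y_test))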
1.3. OBJECTIVE
The main objective is to help farmers with the problems they face when unforeseen changes in climate affect crop production drastically. With a proper mitigation process, this can also help reduce the significant losses farmers incur.
1.4. AIM OF THE PROJECT
The main aim of this project is to address the problems farmers face when unforeseen changes in climate affect crop production and to reduce the resulting losses, so that, by taking the necessary measures, we can decide which crop should be planted.
CHAPTER –2
2. LITERATURE SURVEY
2.1. EXISTING SYSTEM
Algorithms like Random Forest and Naive Bayes for precise agriculture have been used in the
past.
We used Random Forests (RF) for their ability to predict crop yield responses to different variables
at global and regional scales, in comparison with multiple linear regressions (MLR) serving as a
standard. When forecasting the extreme ends or reactions beyond the confines of the training data,
however, RF may result in a loss of accuracy. The Naive Bayes method is a supervised learning
technique for addressing classification problems and is based on Bayes' theorem. The original naive Bayes algorithm has a severe flaw: it generates redundant predictors.
Each of these methods is accurate in particular settings, but they are not optimised and do not run smoothly on all machines.
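For comparison, a small illustrative sketch of how these earlier approaches could be evaluated on the same assumed dataset is given below; it is not part of the proposed system, and the dataset path and column names are assumptions carried over from the rest of the report.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# assumed crop dataset with a categorical 'label' column
df = pd.read_csv('data.csv')
x = df.drop(columns=['label'])
y = df['label']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# fit and score the two baseline classifiers used in earlier work
for name, model in [("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                    ("Naive Bayes", GaussianNB())]:
    model.fit(x_train, y_train)
    print(name, "accuracy:", model.score(x_test, y_test))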
2.2. PROPOSED SYSTEM
K-means clustering is a simple unsupervised learning approach for resolving clustering problems.
Logistic regression is one of the most widely used Machine Learning algorithms, and it belongs to
the Supervised Learning approach. It is employed in the prediction of a categorical dependent
variable using a set of independent variables.
2.2.1 Data Ingestion
Data ingestion is the process of transferring information from many sources to a storage medium
where it may be accessed, used, and analysed. A data warehouse, database, or document store is
frequently used as the destination. SaaS data, internal apps, databases, spreadsheets, and even information scraped from the internet can all be used as sources. The data ingestion layer is the foundation of any analytics architecture. Data consistency and accessibility are essential for downstream reporting and analytics. Different models or architectures might be used to construct a data ingestion
layer.
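A minimal ingestion sketch is shown below, assuming a CSV export as the source and a local SQLite database as the destination store; the file and table names are illustrative only.

import sqlite3
import pandas as pd

source = pd.read_csv('data.csv')                 # assumed CSV source
print(source.shape)                              # quick check on what was ingested
conn = sqlite3.connect('farm_data.db')           # illustrative destination database
source.to_sql('crop_data', conn, if_exists='replace', index=False)
conn.close()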
2.2.2 Data Pre-processing
Data pre-processing is a data mining approach for transforming raw data into a format that is both
useful and efficient.
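The sketch below shows typical pre-processing steps on the assumed crop dataset (duplicate removal and mean imputation of missing numeric values); the exact steps needed depend on the data actually collected.

import pandas as pd

data = pd.read_csv('data.csv')                                   # assumed dataset
data = data.drop_duplicates()                                    # remove duplicated records
numeric_cols = ['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())   # impute missing values
print(data.isnull().sum())                                       # confirm no nulls remain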
2.2.3 Exploratory Data Analysis
Exploratory data analysis (EDA) is a technique for better understanding datasets using visual
features such as scatter plots and bar charts. This helps us to more effectively identify data trends
and execute analysis accordingly.
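A short EDA sketch along these lines, using the same assumed columns, is given below; the specific plots chosen here are only examples.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('data.csv')                                   # assumed dataset
# scatter plot: how temperature and humidity vary across crops
sns.scatterplot(x=data['temperature'], y=data['humidity'], hue=data['label'], legend=False)
plt.title('Temperature vs Humidity by crop')
plt.show()
# bar chart: average rainfall requirement per crop
sns.barplot(x=data['rainfall'], y=data['label'])
plt.title('Rainfall per crop')
plt.show()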
2.2.4 Building and Training models
Train-Test Split:
In machine learning model development, it is desirable that the trained model performs well on
new, unseen data. To simulate new, unseen data, the available data is subjected to data
segmentation, which splits it into two.
The first part is a larger data subset typically used as the training set, and the second is a smaller subset typically used as the test set.
The training set is used to build a predictive model, which is then applied to the test set to make predictions. The best model is chosen based on its performance on the test set, and various optimizations can be performed to obtain the best possible model.
When a validation set is also used, the training set is again used to build the predictive model, which is then evaluated against the validation set. This allows predictions to be made, the model to be tuned, and the best-performing model to be selected based on the validation-set results, just as described for the test set above. Note that the test set is not involved in model building and tuning, so it can genuinely act as new, unseen data.
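The following sketch illustrates the split described above on the assumed crop dataset, first holding out a test set and then carving a validation set out of the remainder; the 80/20 and 75/25 proportions are assumptions, not fixed project values.

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')                                   # assumed dataset
x, y = data.drop(columns=['label']), data['label']
# first split: keep 20% completely unseen as the test set
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
# second split: 25% of the remainder becomes the validation set
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=0.25, random_state=42)
# x_train/y_train build the model, x_val/y_val guide tuning, x_test/y_test stay unseen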
The applications of this project are at present limited, but with proper research, great results are expected. There are limitations with this project, such as requiring a larger amount of data than conventional methods to determine suitable crops. But there is scope for improvement. With better algorithms, there could be a significant decrease in evaluation time. Further, this project can also be extended by creating an application compatible with mobiles, so that users can determine their own crop without relying on others to decide which crop should be planted.
CHAPTER –3
3. ANALYSIS
3.1. FEASIBILITY STUDY
A feasibility study is a high-level capsule version of the entire system analysis and design process. The study begins by clarifying the problem definition; its purpose is to determine whether the system is worth building.
Once an acceptable problem definition has been generated, the analyst develops a logical model of the system, and the possible alternatives are analyzed carefully. Three key considerations are involved in the feasibility analysis:
• Technical Feasibility
• Operational Feasibility
• Economic Feasibility
3.1.1 Technical Feasibility
To determine whether the proposed system is technically feasible, a number of issues have to be
considered while doing technical analysis.
Understand the different technologies involved in the proposed system. Find out whether the
organization currently possesses the required technologies.
3.1.2 Operational Feasibility
To determine the operational feasibility of the system, we should take into consideration the awareness level of the users. This system is operationally feasible since the users are familiar with agricultural technologies, and because it helps to reduce the hardships encountered in the existing manual system, the new system is considered operationally feasible, user friendly, and easy to use.
3.1.3 Economical Feasibility
To decide whether a project is economically feasible, we have to consider various factors.
The proposed system is developed in Python. It requires only average computing capabilities and access to the internet, which are very basic requirements; hence it does not incur any additional economic overheads, which renders the system economically feasible.
CHAPTER– 4
4. HARDWARE AND SOFTWARE REQUIREMENTS
CHAPTER– 5
5. SYSTEM DESIGN AND IMPLEMENTATION
The database tables are designed by analyzing functions involved in the system and format of the
fields is also designed. The fields in the database tables should define their role in the system. Unnecessary fields should be avoided because they affect the storage requirements of the system. Then, in the input and output screen design, the design should be made user friendly. The menu should be
precise and compact.
Modularity and partitioning: the software is designed so that each system consists of a hierarchy of modules, partitioning the work into separate functions.
Coupling: modules should have little dependency on the other modules of the system.
Cohesion: each module should carry out its operations within a single processing function.
Shared use: duplication is avoided by providing a single module that can be called by any other module that needs the function it provides.
5.1.1. INPUT DESIGN
Considering the requirements, procedures are adopted to collect the necessary input data in the most efficiently designed format. The input design has to be done keeping in view that the interaction of
the user with the system should be in the most effective and simplified way. Also, the necessary
measures are taken for the following
Controlling the amount of input
All the screens of the system are designed with a view to provide the user with easy operations in a
simpler and efficient way, with minimum key strokes possible. Important information is emphasized
on the screen. Almost every screen is provided with error and other important messages and with option-selection facilities. Emphasis is given to faster processing and speedy transactions between the screens. Each screen is designed to be as user friendly as possible by using interactive procedures. In other words, the user can operate the system without much help
from the operating manual.
Firstly, we collect data from various resources and then ingest it. The data then goes through the pre-processing stages, and the final data is used for the project execution. Here the data is accessed for the further processes used in the prediction of the crop that is most suitable and convenient to plant according to the weather and soil conditions of the area.
The Unified Modelling Language (UML) is a standard language for specifying, visualizing,
constructing, and documenting the artefacts of software systems, as well as for business modelling
and other non-software systems. The UML represents a collection of best engineering practices that
have proven to be successful in the modelling of large and complex systems. The UML is a very
important part of developing object-oriented software and the software development process. UML
mostly uses graphical notations to express the design of software projects. Using the UML helps
the project teams to communicate, explore potential designs and validate the architectural design
of the software.
The Unified Modelling Language (UML) is a standard language for writing software blueprints.
The UML is a language for
Visualizing
Specifying
Constructing
Documenting the artifacts of a software system.
UML is a language which provides vocabulary and the rules for combining words in that
vocabulary for the purpose of communication. A modelling language is a language whose
vocabulary and the rules focus on the conceptual and physical representation of a system. Modelling
yields an understanding of a system.
5.3.1. USE CASE DIAGRAM
The main purpose is to show the interaction between the use cases and the actor.
The use cases are the functions that are to be performed in the module.
5.3.2. COLLABORATION DIAGRAM
5.3.3. ACTIVITY DIAGRAM
It shows the flow of the various activities that are undergone from the beginning
till the end.
It consists of the activities that are held and carried out throughout the session
from starting till the ending stage.
It shows the sequence of the steps that are carried out throughout the process
of execution.
5.3.4. SEQUENCE DIAGRAM
It involves lifelines, or the lifetime of a process, which show the duration for which the process is alive while the steps take place in a sequential manner.
A sequence diagram specifies the order in which the various steps are executed.
5.3.5. DEPLOYMENT DIAGRAM
The deployment diagram visualizes the physical hardware on which the software will be
deployed. It portrays the static deployment view of a system. It involves the nodes and
their relationships.
CHAPTER–6
6. SOURCE CODE AND PERFORMANCE EVALUATION
6.1. Testing
Testing is a critical phase in precision agriculture to ensure the accuracy, robustness, and
effectiveness of the model and systems. Here is a detailed explanation of some common testing
methods used in such projects:
1. Data Quality Testing: This method focuses on ensuring that the collected data are accurate, complete, and consistent. Identify and address missing values, outliers, and noise in the dataset, and test methods for filling missing data points where necessary.
2. Real-time Data Testing: Real-time data testing verifies the model's ability to handle and process real-time data streams, such as weather updates, and assesses the model's response time in providing recommendations or predictions.
3. Performance Testing: Performance testing evaluates accuracy, precision, recall, F1-score, and other relevant metrics based on the project's objectives; for regression tasks, metrics such as RMSE, MAE, and R-squared are used (see the metrics sketch after this list).
4. Security Testing: Security is a crucial aspect; this testing covers data encryption, access controls, and data privacy measures, and assesses model security against tampering and unauthorized access.
5. Integration Testing: Integration testing verifies that APIs and data integrations with external systems (e.g., weather APIs, farm management software) work as expected, and ensures compatibility with the various devices and operating systems used on farms.
6. Model Robustness Testing: Test the model's vulnerability to adversarial attacks or input variations, and evaluate its robustness to noise, interference, or sensor inaccuracies in the field.
7. Scalability Testing: Scalability testing assesses the model's performance as the dataset size increases and evaluates the system's ability to handle multiple users or devices simultaneously.
8. User Interface (UI) and User Experience (UX) Testing: UI and UX testing ensures that the user interface is intuitive and easy to use for farmers and stakeholders; feedback is gathered from users to make improvements in the UI/UX.
9. Failure and Recovery Testing: Failure and recovery testing simulates system failures (e.g., server downtime) and tests the recovery mechanisms.
10. User Acceptance Testing (UAT): User acceptance testing involves end-users (farmers, agronomists) to gather their feedback and ensure the system meets their needs.
These testing methods are crucial to adapt to changing environmental conditions and evolving user
needs. Additionally, consider conducting field trials and validations to assess how well the machine
learning models perform in real-world farming environments.
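As referenced in point 3 above, the sketch below shows how the named metrics can be computed with scikit-learn; the true and predicted values here are illustrative placeholders only, not project results.

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             mean_absolute_error, mean_squared_error, r2_score)

# classification metrics on illustrative crop labels
y_true = ['rice', 'maize', 'rice', 'cotton']
y_pred = ['rice', 'rice', 'rice', 'cotton']
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average='macro', zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average='macro', zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average='macro', zero_division=0))

# regression metrics on illustrative yield values
yield_true = np.array([3.2, 2.8, 4.1])
yield_pred = np.array([3.0, 3.1, 3.9])
print("MAE :", mean_absolute_error(yield_true, yield_pred))
print("RMSE:", np.sqrt(mean_squared_error(yield_true, yield_pred)))
print("R2  :", r2_score(yield_true, yield_pred))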
Source code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
data = pd.read_csv('data.csv')
data.head()
data.isnull()
print("Winter Crops")
print(data[(data['temperature']<20.1)&(data['humidity']>30.1)]['label'].unique())
print("\n")
print("Rainy Crops")
print(data[(data['rainfall']>=200.9)&(data['humidity']>=30.1)]['label'].unique())
print("\n")
print('Summer Crops')
print(data[(data['temperature']>=30.1)&(data['humidity']>=60.1)]['label'].unique())
x = data.loc[:, ['N','P','K','temperature','ph','humidity','rainfall']].values
data_x = pd.DataFrame(x)
from sklearn.cluster import KMeans
k_means = KMeans(max_iter = 1000,n_clusters = 4, n_init = 10,init = 'k-means++')
ymeans = k_means.fit_predict(x)
n = data['label']
ymeans = pd.DataFrame(ymeans)
m = pd.concat([ymeans,n], axis = 1)
m = m.rename(columns = {0: 'cluster'})
print("K_Means Cluster Analysis\n")
print("Crops in First Cluster: ", m[m['cluster'] == 0]['label'].unique())
print("\t")
print("Crops in Second Cluster:", m[m['cluster'] == 1]['label'].unique())
print("\t")
print("Crops in Third Cluster:", m[m['cluster'] == 2]['label'].unique())
print("\t")
print("Crops in Forth Cluster:", m[m['cluster'] == 3]['label'].unique())
22
plt.subplot(3,5,1)
plt.xlabel('Nitrogen',fontsize=5)
sns.barplot(x=data['N'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 2)
plt.xlabel(' Phosphorous',fontsize=5)
sns.barplot(x=data['P'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 3)
plt.xlabel('Potassium',fontsize=5)
sns.barplot(x=data['K'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 4)
plt.xlabel('Temperature',fontsize=5)
sns.barplot(x=data['temperature'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 5)
plt.xlabel('Humidity',fontsize=5)
sns.barplot(x=data['humidity'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 6)
plt.xlabel('pH value', fontsize = 5)
sns.barplot(x=data['ph'],y=data['label'])
plt.ylabel(' ')
plt.subplot(3, 4, 7)
plt.xlabel('Rainfall',fontsize=5)
sns.barplot(x=data['rainfall'],y=data['label'])
plt.ylabel(' ')
plt.suptitle('Different Conditions on Crops',fontsize=15)
plt.show()
y = data['label']
x = data.drop(['label'], axis = 1)
# split into training and test subsets, fit the logistic regression model, and
# predict on the held-out data (a reconstructed step; the extracted listing omits it)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size = 0.2, random_state = 42)
model = LogisticRegression(max_iter = 1000)
model.fit(train_x, train_y)
pred_y = model.predict(test_x)
cr = classification_report(test_y, pred_y)
print(cr)
prediction =model.predict((np.array([[90,40,40,20,80,7,200]])))
print("The recommended Crop for Climatic Condition :", prediction)
prediction =model.predict((np.array([[90,40,20,40,60,20,200]])))
print("The recomended Crop for Given Climatic Condition :", prediction)
In this project we use various libraries like NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn.
NumPy: It is a library that consists of multi-dimensional array objects and a set of routines for manipulating them. NumPy can be used to conduct logical and mathematical operations on arrays.
Pandas: Pandas is a data cleaning and analysis tool that is widely used in Data Science and
Machine Learning. Here, Pandas is used to read the dataset.
Matplotlib: It's a cross-platform library that creates 2D charts from array data. It has an object-
oriented API for embedding plots in Python GUI toolkits like PyQt and WxPython Tkinter. We
use Matplotlib to plot graphs for better visualisation of the clustering.
Seaborn: It is a statistical data visualisation library built on top of Matplotlib; here it is used for the bar plots of crop conditions.
Scikit-learn: Classification, regression, and clustering are some of the useful tools in the sklearn package. Cross-validation, feature extraction, supervised learning methods, and unsupervised learning algorithms are all features of scikit-learn.
Sklearn.linear_model.LogisticRegression: is a classification technique that is used to predict
the categorical dependent variable.
We read the CSV file using the 'pd.read_csv' command and then check whether any null values are present using 'isnull()'.
Here, we get a descriptive statistic and see different crops that grow in different seasons.
Now, we use k-means. Then, use Matplotlib and Seaborn for visualization of the clustering.
The following graph shows the visual representation of crops that can be grown in different
conditions.
Now, let's split the dataset for predictive modelling. Using "train_test_split", the data is divided into training and testing subsets for validation of the results. Finally, we create a predictive model.
6.1.7. Classification Report:
Here we get the classification report for the logistic regression technique.
Finally, by giving the values of N, P, K, temperature, pH, humidity, and rainfall, we get the suitable crop for those climatic conditions.
6.2. Output
Fig: 6.2.3 Descriptive statistics
Database sample:
CHAPTER–7
7. CONCLUSION
To conclude this documentation, we have discussed what our project is, its rationale, and the goal we achieve with it. We applied specific data analysis techniques to find a suitable crop using existing data. Logistic regression is used to extract important information from the agricultural records. The analytical process began with data ingestion, followed by data cleansing and processing, handling of missing values, and exploratory research; it concluded with analysing the data and, finally, modelling and evaluation. The model's accuracy is assessed on the held-out test set, and the machine learning method is further evaluated through cross-validation and through precision, recall, and F1 score.
The proposed system takes into account relevant data on nitrogen, phosphorus, potassium, temperature, rainfall, and the highest-yielding crops over the past year that can be cultivated under appropriate environmental conditions. It lists all possible crops that can be cultivated and helps farmers make decisions about which crop to grow. The system also takes historical data into account, which helps farmers gain insights into the requirements of the various crops that are suitable to grow on the given plot of land.
Future Scope:
The applications of this project are at present limited, but with proper research, great results are expected. There are limitations with this project, such as requiring a larger amount of data than conventional methods to determine suitable crops. But there is scope for improvement. With better algorithms, there could be a significant decrease in evaluation time.
Further, this project can also be extended by creating an application compatible with mobiles, so that users can determine their own crop without relying on others.
CHAPTER–8
8. REFERENCES
1. https://scholar.google.co.in/scholar?q=random+forest+based+on+crop+prediction&hl=en&as_sdt=0&as_vis=1&oi=scholart – random forest (existing system)
3. https://publications.waset.org/9997276/modified-naive-bayes-based-prediction-modeling-for-crop-yield-prediction
4. https://en.wikipedia.org/wiki/Agriculture
5. https://en.wikipedia.org/wiki/Data_analysis
7. Monali Paul, Santosh K. Vishwakarma, Ashok Verma, “Analysis of Soil Behavior and
Prediction of Crop Yield using Data Mining approach”
8. Abdullah Na, William Isaac, Shashank Varshney, Ekram Khan, "An IoT Based System for Remote Monitoring of Soil Characteristics", 2016 International Conference on Information Technology.