In-Plant Training (IPT) Report
Submitted by :
THOTA SIVA TEJA – 9920004281
SCHOOL OF COMPUTING
COMPUTER SCIENCE AND ENGINEERING
KALASALINGAM ACADEMY OF RESEARCH AND EDUCATION
KRISHNANKOIL 626 126
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
With deep gratitude, we extend our sincere thanks to the Almighty, the great architect of the
universe, and to our parents, who have blessed us to complete our project successfully. The success
and final outcome of our In-Plant Training required a lot of guidance and assistance from many people,
and we are extremely privileged to have received this throughout the completion of our project. All that we
have done is due to such supervision and guidance, and we would not forget to thank them.
We wish to express our deep sense of gratitude to our beloved Chancellor, Dr.K.SRIDHRAN, for
providing all necessary facilities to carry out the project work. We extend our sincere thanks to the Vice
President, the guiding light for our every action, Dr.S.SHASI ANAND, for his consistent
encouragement, and to the Vice Chancellor, Dr.S.NARAYANAN, for the constant support. We respect and
thank our Dean, Dr.P.DEEPALAKSHMI, for providing us an opportunity to do this project with all
the support and guidance, which helped us proceed successfully in this training.
We owe our deep gratitude to our beloved Head of the Department, Dr.N.SURESH KUMAR, for his
encouragement, support and guidance throughout this In-Plant Training. We express our sincere thanks
to our Internal Guide, Dr.J.Jane Rubel Angelina, for encouraging us to make our In-Plant Training
successful. We are filled with the utmost gratitude to our In-Plant Training coordinators,
Dr.R.Murugeswari and Mr.B.SHANMJUGA RAJA, for their superlative efforts and support.
We also thank SPARROW SOFTWARE SOLUTIONS LLP, who gave us the opportunity to complete this
In-Plant Training.
TABLE OF CONTENTS
1 ABSTRACT
2 CHAPTER I: INTRODUCTION
3 CHAPTER II: INTRODUCTION TO PYTHON & ML
4 CHAPTER III: PROPOSED SYSTEM
5 CHAPTER IV: METHODS & ALGORITHMS USED
6 CHAPTER V: SYSTEM ARCHITECTURE
7 CHAPTER VI: SYSTEM REQUIREMENTS
8 CHAPTER VII: OUTPUT SCREENSHOTS
9 CHAPTER VIII: CONCLUSION
10 ACCEPTANCE LETTER
11 COMPLETION CERTIFICATE
CHAPTER I: INTRODUCTION
India has many different types of soil available for farming. Even so, farmers still
suffer losses because they do not know which crop suits their soil, given the
properties of the soil and the weather. We are therefore developing a model that predicts which crop is
suitable for a given soil using ensemble learning. We also used several machine learning
algorithms to find which model is most effective for crop recommendation. The dataset
contains attributes such as chemical properties of the soil (sodium, potassium, nitrogen, pH) and weather
conditions (rainfall, humidity, etc.). This work is not only about crop recommendation;
it also helps machine learning researchers see which algorithm is more suitable for this type
of problem. The algorithms include the decision tree classifier, logistic regression, the random forest
classifier, and others. By measuring the accuracy of each algorithm after training and testing, we can
conclude which algorithm is more suitable. After crop recommendation, the other major problem
that farmers face is applying pesticides and insecticides without knowing the actual disease
affecting the crop. As a result, the crop may not produce a good yield because of excess fertilizers or chemicals. For
this, we can use deep learning, one of the latest technologies, for crop disease classification:
if we can identify the exact disease that has infected the crop, we can more easily find the
treatment for it. Using CNN and VGG16 models, we can classify crop diseases.
For this, we use an infected-crop image dataset to train our model.
CHAPTER II: INTRODUCTION TO PYTHON & ML
Machine learning (ML) is the scientific study of algorithms and statistical models that computer
systems use to perform a specific task without using explicit instructions, relying on patterns
and inference instead. It is seen as a subset of artificial intelligence. Machine learning
algorithms build a mathematical model based on sample data, known as "training data", in
order to make predictions or decisions without being explicitly programmed to perform the
task.
Machine learning algorithms are used in a wide variety of applications, such as email
filtering and computer vision, where it is difficult or infeasible to develop a conventional
algorithm for effectively performing the task.
Machine learning is closely related to computational statistics, which focuses on making
predictions using computers. The study of mathematical optimization delivers methods, theory
and application domains to the field of machine learning. Data mining is a field of study within
machine learning, and focuses on exploratory data analysis through learning. In its application
across business problems, machine learning is also referred to as predictive analytics.
Supervised learning:
Supervised learning algorithms build a mathematical model of a set of data that contains both
the inputs and the desired outputs. The data is known as training data, and consists of a set of
training examples. Each training example has one or more inputs and the desired output, also
known as a supervisory signal. In the mathematical model, each training example is represented
by an array or vector, sometimes called a feature vector, and the training data is represented by
a matrix. Through iterative optimization of an objective function, supervised learning
algorithms learn a function that can be used to predict the output associated with new inputs.
An optimal function will allow the algorithm to correctly determine the output for inputs that
were not a part of the training data. An algorithm that improves the accuracy of its outputs or
predictions over time is said to have learned to perform that task.
Supervised learning algorithms include classification and regression. Classification algorithms
are used when the outputs are restricted to a limited set of values, and regression algorithms
are used when the outputs may have any numerical value within a range. Similarity learning is
an area of supervised machine learning closely related to regression and classification, but the
goal is to learn from examples using a similarity function that measures how similar or related
two objects are. It has applications in ranking, recommendation systems, visual identity
tracking, face verification, and speaker verification.
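The difference between classification and regression can be illustrated with a minimal sketch. It assumes scikit-learn is available; the toy numbers are invented for illustration and are not taken from the dataset used in this training.

```python
# Toy data, invented for illustration only (not the report's dataset).
from sklearn.linear_model import LinearRegression, LogisticRegression

# Each row is one feature vector; together the rows form the training matrix.
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]

# Classification: the desired outputs come from a limited set of values (0 or 1).
y_class = [0, 0, 0, 1, 1, 1]
classifier = LogisticRegression().fit(X_train, y_class)

# Regression: the desired outputs are numerical values (here y = 2x).
y_reg = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
regressor = LinearRegression().fit(X_train, y_reg)

# Predict outputs for inputs that were not part of the training data.
class_predictions = classifier.predict([[0.0], [7.0]])
reg_prediction = regressor.predict([[7.0]])[0]
print(class_predictions, round(reg_prediction, 2))
```

The classifier returns one of the known class labels, while the regressor returns a number that need not appear in the training data.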
Unsupervised learning:
Unsupervised learning algorithms take a set of data that contains only
inputs, and find structure in the data, like grouping or clustering of data points. The algorithms,
therefore, learn from data that has not been labeled, classified or categorized. Instead of
responding to feedback, unsupervised learning algorithms identify commonalities in the data
and react based on the presence or absence of such commonalities in each new piece of data.
A central application of unsupervised learning is in the field of density estimation in statistics,
though unsupervised learning encompasses other domains involving summarizing and
explaining data features. Cluster analysis is the assignment of a set of observations into subsets
(called clusters) so that observations within the same cluster are similar according to one or
more predesignated criteria, while observations drawn from different clusters are dissimilar.
Different clustering techniques make different assumptions on the structure of the data, often
defined by some similarity metric and evaluated, for example, by internal compactness, or the
similarity between members of the same cluster, and separation, the difference between
clusters. Other methods are based on estimated density and graph connectivity.
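As a small illustration of cluster analysis, the sketch below groups six unlabeled points with k-means. It assumes scikit-learn and NumPy are available; k-means is just one possible clustering technique, and the points are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled 2-D points (invented): three near (1, 1) and three near (8, 8).
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# Ask k-means for two clusters; no labels are given to the algorithm.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_

# Points in the same group receive the same cluster id (the ids themselves are arbitrary).
print(labels)
```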
CHAPTER III: PROPOSED SYSTEM
Data Collection:
Gather data on various factors that influence crop growth, such as soil type, climate, rainfall,
temperature, humidity, elevation, etc.
Collect historical data on crop yields, diseases, pests, and other relevant agricultural
parameters.
Data Preprocessing:
Clean the data to remove any inconsistencies, missing values, or outliers.
Normalize or scale the data to ensure that all features are on a similar scale.
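A minimal sketch of this preprocessing step, assuming NumPy and a hypothetical table of readings (the column meanings and values are invented): rows with missing values are dropped, then each feature is standardized to zero mean and unit variance.

```python
import numpy as np

# Hypothetical raw readings: columns are rainfall (mm) and humidity (%).
raw = np.array([
    [200.0, 80.0],
    [np.nan, 75.0],   # missing rainfall value
    [100.0, 60.0],
    [150.0, 70.0],
])

# Cleaning: drop every row that contains a missing value.
clean = raw[~np.isnan(raw).any(axis=1)]

# Scaling: standardize each column to zero mean and unit variance.
scaled = (clean - clean.mean(axis=0)) / clean.std(axis=0)
print(scaled)
```

Dropping rows is only one option; filling missing values with a column mean or median is a common alternative when data is scarce.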
Feature Selection:
Identify the most relevant features that impact crop growth and yield through techniques like
correlation analysis, feature importance ranking, etc.
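Correlation analysis can be sketched as follows, assuming NumPy. The data is synthetic: `rainfall`, `noise_feature`, and `yield_score` are hypothetical names, and the target is constructed to depend on rainfall only, so the ranking should place rainfall first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic features (assumptions for illustration only).
rainfall = rng.normal(size=n)
noise_feature = rng.normal(size=n)

# Hypothetical numeric target that depends on rainfall plus a little noise.
yield_score = 3.0 * rainfall + rng.normal(scale=0.1, size=n)

features = {"rainfall": rainfall, "noise_feature": noise_feature}

# Absolute Pearson correlation between each feature and the target.
correlations = {
    name: abs(np.corrcoef(values, yield_score)[0, 1])
    for name, values in features.items()
}

# Rank features from most to least correlated with the target.
ranked = sorted(correlations, key=correlations.get, reverse=True)
print(ranked)
```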
Model Selection:
Choose appropriate machine learning algorithms for building the recommendation system.
Common choices include decision trees, random forests, support vector machines, or neural
networks.
Consider using ensemble methods for improved performance.
Model Training:
Split the dataset into training and testing sets to evaluate the model's performance.
Train the selected models using the training dataset.
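The split-and-train step might look like the sketch below. It assumes scikit-learn; the synthetic data from `make_blobs` is only a stand-in for the crop dataset, and the 80/20 split ratio is an assumption, not a figure from this report.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crop dataset: 300 samples, 4 features, 3 classes.
X, y = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)

# Hold out 20% of the samples for testing; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train on the training set only, then measure accuracy on the unseen test set.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train={len(X_train)} test={len(X_test)} accuracy={test_accuracy:.2f}")
```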
Model Evaluation:
Evaluate the performance of the trained models using appropriate metrics such as accuracy,
precision, recall, F1-score, etc.
Fine-tune the model hyperparameters to optimize performance.
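To make the evaluation metrics concrete, here is a small sketch in plain Python that computes them for a two-class case. The labels are invented; for the multi-class crop problem the same quantities would be computed per class and then averaged.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here 3 positives are found, 1 is missed, and 1 negative is wrongly flagged, giving an accuracy of 0.8 and precision, recall and F1 of 0.75 each.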
Deployment:
Deploy the trained model into a user-friendly interface, such as a web application or mobile
app.
Integrate the model with a user interface where farmers can input their location, soil type,
climate data, and other relevant parameters.
Farmers can then know which crop will be suitable for their land according to the above chemical properties. The dataset
used for crop disease classification contains images of cotton crops with various types of
infection. First, we develop a deep learning model and train it on the image dataset.
The dataset has 6 classes [Aphids, Army worm, Bacterial Blight, Healthy,
Powdery Mildew, Target Spot]. Using this model, we can identify which disease has infected the
crop.
CHAPTER IV: METHODS & ALGORITHMS USED
Logistic Regression
Logistic regression is a supervised machine learning algorithm. It is mostly used to predict a
categorical dependent variable, that is, to solve classification
problems, and it is a widely used baseline for predicting a categorical target variable. We
created a logistic regression model and trained it with our dataset. Before
prediction, we numbered the labels [1:10]: we have 10 different
categorical labels, but the model works with numerical data only. So we
assigned a number to each category and trained the model on the numbered labels. After
testing, we got 95.6% accuracy.
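The label numbering described above can be sketched in plain Python. The crop names are hypothetical examples, and the helper names `label_to_number` and `number_to_label` are ours; numbering starts at 1 to follow the [1:10] convention mentioned in the text.

```python
# Hypothetical crop names; the real dataset has 10 distinct labels.
crop_labels = ["rice", "maize", "cotton", "rice", "jute", "maize"]

# Assign a number to each distinct label, starting from 1, in alphabetical order.
label_to_number = {
    name: number
    for number, name in enumerate(sorted(set(crop_labels)), start=1)
}

# Encode the labels for training.
encoded = [label_to_number[name] for name in crop_labels]

# Keep the reverse mapping so numeric predictions can be shown as crop names.
number_to_label = {number: name for name, number in label_to_number.items()}
decoded = [number_to_label[number] for number in encoded]

print(label_to_number)
print(encoded)
```

Keeping the reverse mapping matters in practice, since the model's numeric output has to be turned back into a crop name before it is shown to a farmer.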
Decision Tree
A decision tree is a supervised machine learning algorithm. It can be used for both classification
and regression problems. A decision tree produces a tree structure as its output. In that
representation, the internal nodes represent features of the dataset, the branches represent decision
rules, and the leaf nodes represent outcomes. At each stage, the algorithm chooses the decision that best
splits the data. The tree built by the model is easy to understand,
which is why decision trees are commonly used. The decision tree is built using the CART algorithm.
We created a decision tree model, trained it using our dataset, and got an accuracy of up to
98.5%.
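To show how CART compares candidate splits, the sketch below computes Gini impurity, the criterion commonly used by CART for classification trees. We assume that criterion here; the report does not state which one was configured. The labels are invented.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes of a split."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) + \
           (len(right) / total) * gini_impurity(right)


parent = ["rice", "rice", "maize", "maize"]

# A split that separates the classes perfectly has zero impurity.
good_split = split_impurity(["rice", "rice"], ["maize", "maize"])

# A split that leaves both children mixed does not improve on the parent.
poor_split = split_impurity(["rice", "maize"], ["rice", "maize"])

print(gini_impurity(parent), good_split, poor_split)
```

At each node, the tree keeps the split with the lowest weighted impurity, which is the first of the two candidate splits in this example.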
Random Forest algorithm
Random forest is a supervised machine learning algorithm. It is also used for both
classification and regression problems. It is an ensemble learning method that aims for high accuracy.
A random forest classifier builds many decision trees, each on a random subset of the
data, and combines their outputs to increase accuracy. Because the trees are built independently, training can be parallelized and stays practical
even for large datasets. It first selects x random
data points and builds a decision tree on them, repeats this to obtain d decision trees, and then predicts the output by combining the outputs of the
decision trees (a majority vote for classification, an average for regression). We created a random forest classifier model and trained it
using our dataset. Among all the algorithms, this model gave the highest accuracy:
99.09%.
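A minimal sketch of a random forest, assuming scikit-learn. The data is synthetic (7 features and 3 classes are arbitrary choices), and `n_estimators=25` plays the role of d, the number of trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 samples, 7 features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=7, n_informative=4, n_classes=3, random_state=0
)

# d = 25 trees; each tree is trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# The individual trees are available for inspection after fitting.
tree_votes = [tree.predict(X[:1])[0] for tree in forest.estimators_]

# The forest combines its trees; scikit-learn averages their class probabilities.
forest_predictions = forest.predict(X[:5])
print(len(forest.estimators_), forest_predictions)
```

Printing `tree_votes` shows that single trees can disagree on a sample; the combined prediction is usually more stable than any one tree.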
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification
problems. SVM creates boundaries that segregate n-dimensional space into classes. Such a
boundary is called a hyperplane. SVM can also work on image datasets. We created an SVM model
and trained it using our dataset. The accuracy of SVM on the dataset is 97.42%.
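The hyperplane idea can be illustrated with a linear SVM on a few invented 2-D points, assuming scikit-learn. The linear kernel is chosen here so the hyperplane can be read off directly; the report does not state which kernel was used.

```python
from sklearn.svm import SVC

# Invented 2-D points: class 0 near the origin, class 1 further away.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
     [6.0, 6.0], [7.0, 6.0], [6.0, 7.0]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel yields a hyperplane w . x + b = 0 separating the classes.
svm = SVC(kernel="linear").fit(X, y)
w = svm.coef_[0]
b = svm.intercept_[0]

# New points are classified by the side of the hyperplane they fall on.
predictions = svm.predict([[0.0, 0.0], [8.0, 8.0]])
print(w, b, predictions)
```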
CHAPTER V: SYSTEM ARCHITECTURE
CHAPTER VI: SYSTEM REQUIREMENTS
● System: HP AMD5
● Hard Disk: 512 GB
● Monitor: 15" LED
● RAM: 16 GB
CHAPTER VII: OUTPUT SCREENSHOTS
CHAPTER VIII: CONCLUSION
By following these steps, agricultural stakeholders can leverage the power of machine learning
and data analysis to provide farmers with valuable insights and recommendations tailored to
their specific conditions and needs. This not only enhances agricultural productivity and
profitability but also contributes to sustainable farming practices and food security.
ACCEPTANCE LETTER
COMPLETION CERTIFICATE