Ipt Report

This document is an in-plant training report submitted by Thota Siva Teja to fulfill the requirements for a B.Tech degree in Computer Science and Engineering. The report covers training done at Native Sparrow Software Solutions LLP under the guidance of Dr. J. Jane Rubel Angelina. The report includes an introduction to machine learning using Python for crop recommendation and crop disease classification. It provides an overview of the Python programming language and of machine learning concepts such as supervised learning, classification, and regression. Key algorithms discussed are decision trees, logistic regression, random forest, CNN and VGG16.


IN-PLANT TRAINING REPORT

UNDERGONE AT NATIVE SPARROW SOFTWARE SOLUTIONS LLP

Submitted by:
THOTA SIVA TEJA – 9920004281

in partial fulfillment for the award of the degree of


B.TECH
IN
COMPUTER SCIENCE AND ENGINEERING

SCHOOL OF COMPUTING
COMPUTER SCIENCE AND ENGINEERING
KALASALINGAM ACADEMY OF RESEARCH AND EDUCATION
KRISHNANKOIL 626 126

SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the Industrial Training Report titled “MACHINE LEARNING USING PYTHON CROP
RECOMMENDATION SYSTEM USING ML” is a bonafide record of the work done by MANAPURAM JNANA
SAIKIRAN (9920004371) in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering, during the Odd Semester of the
Academic Year 2023-2024.

Dr. J. Jane Rubel Angelina
(Internal Guide)
Assistant Professor
School of Computing
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil 626126

Dr. N. Suresh Kumar
Head of the Department
School of Computing
Department of Computer Science and Engineering
Kalasalingam Academy of Research and Education
Krishnankoil 626126

Internal Examiner External Examiner

ACKNOWLEDGEMENT

Owing deeply to the supreme, we extend our sincere thanks to the Almighty, the great architect of the
universe, and to our parents, who have blessed us to complete our project successfully. The success
and final outcome of our in-plant training required a great deal of guidance and assistance from many
people, and we are extremely privileged to have received it throughout the completion of our project.
All that we have done is due to such supervision and guidance, and we will not forget to thank them.

We wish to express our deep sense of gratitude to our beloved Chancellor Dr. K. SRIDHRAN for
providing all the necessary facilities to carry out the project work. We extend our sincere thanks to
the Vice President, the guiding light for our every action, Dr. S. SHASI ANAND, for his consistent
encouragement, and to the Vice Chancellor, Dr. S. NARAYANAN, for his constant support. We respect
and thank our Dean, Dr. P. DEEPALAKSHMI, for providing us the opportunity to do this project, with
all the support and guidance that helped us complete this training successfully.

We owe our deep gratitude to our beloved Head of the Department, Dr. N. SURESH KUMAR, for his
encouragement, support and guidance throughout this in-plant training. We express our sincere thanks
to our Internal Guide, Dr. J. Jane Rubel Angelina, for encouraging us to make our in-plant training
successful. We are filled with the utmost gratitude to our In-Plant Training coordinators,
Dr. R. Murugeswari and Mr. B. SHANMJUGA RAJA, for their superlative efforts and support.

Finally, we would like to thank the respected proprietor of NATIVE SPARROW SOFTWARE SOLUTIONS LLP,
Mr. R. S. SENTHILNATHAN, for giving us the opportunity to complete this in-plant training.

TABLE OF CONTENTS

S.NO  CONTENT
1     ABSTRACT
2     CHAPTER I: INTRODUCTION
3     CHAPTER II: INTRODUCTION TO PYTHON & ML
4     CHAPTER III: PROPOSED SYSTEM
5     CHAPTER IV: METHODS & ALGORITHMS USED
6     CHAPTER V: SYSTEM ARCHITECTURE
7     CHAPTER VI: SYSTEM REQUIREMENTS
8     CHAPTER VII: OUTPUT SCREENSHOTS
9     CHAPTER VIII: CONCLUSION
10    ACCEPTANCE LETTER
11    COMPLETION CERTIFICATE

CHAPTER I: INTRODUCTION

India has a wide variety of soil types available for farming. Yet farmers still suffer losses
because they do not know which crop is suitable for their soil, given the soil's properties and the
weather. We are therefore developing a model that uses ensemble learning to predict which crop is
suitable for a given soil. We also compared different machine learning algorithms to find which
model is most effective for crop recommendation. The dataset contains attributes such as the
chemical properties of the soil [nitrogen, phosphorus, potassium, pH] and weather conditions
[rainfall, humidity, temperature, etc.]. This work is not only about crop recommendation; it also
helps machine learning researchers see which algorithm is more suitable for this type of problem.
The algorithms are the decision tree classifier, logistic regression, the random forest classifier,
etc. By measuring the accuracy of each algorithm after training and testing, we can conclude which
algorithm is most suitable. Beyond crop recommendation, another major problem farmers face is
applying pesticides and insecticides without correctly identifying the crop's disease. As a result,
the crop may not give a good yield because of excess fertilizers or chemicals. To address this, we
use deep learning for crop disease classification, because once we identify the exact disease that
has infected a crop, we can easily find the right treatment for it. We classify crop diseases using
a CNN and the VGG16 architecture, trained on a dataset of infected crop images.

CHAPTER II: INTRODUCTION TO PYTHON & ML

Python is a widely used, general-purpose, high-level programming language. It was created by
Guido van Rossum, first released in 1991, and its development is now overseen by the Python
Software Foundation.
It was designed with an emphasis on code readability, and its syntax allows programmers to express
concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems more
efficiently.
It is used for:
● web development (server-side),
● software development,
● mathematics,
● system scripting.

Python Syntax compared to other programming languages


● Python was designed for readability and has some similarities to the English language, with
influence from mathematics.
● Python uses new lines to complete a command, as opposed to other programming languages, which
often use semicolons or parentheses.
● Python relies on indentation, using whitespace, to define scope, such as the scope of loops,
functions and classes. Other programming languages often use curly brackets for this
purpose.
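A small snippet illustrating the three points above: each statement ends at a newline, and the bodies of the function, the `if` branches and the loop are defined purely by indentation. The function `classify_ph` and its pH cut-offs are hypothetical, chosen only to show the syntax, and are not agronomic guidance.

```python
def classify_ph(ph):
    # The indented lines below form the function body; no braces or semicolons are needed.
    if ph < 6.5:
        return "acidic"
    elif ph <= 7.5:
        return "neutral"
    else:
        return "alkaline"


readings = [5.8, 7.0, 8.1]
labels = []
for value in readings:
    # Each statement ends at the newline; the loop body is everything indented under "for".
    labels.append(classify_ph(value))

print(labels)  # ['acidic', 'neutral', 'alkaline']
```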

Machine learning (ML) is the scientific study of algorithms and statistical models that computer
systems use to perform a specific task without using explicit instructions, relying on patterns
and inference instead. It is seen as a subset of artificial intelligence. Machine learning
algorithms build a mathematical model based on sample data, known as "training data", in
order to make predictions or decisions without being explicitly programmed to perform the
task.
Machine learning algorithms are used in a wide variety of applications, such as email
filtering and computer vision, where it is difficult or infeasible to develop a conventional
algorithm for effectively performing the task.
Machine learning is closely related to computational statistics, which focuses on making
predictions using computers. The study of mathematical optimization delivers methods, theory
and application domains to the field of machine learning. Data mining is a related field of study
that focuses on exploratory data analysis through unsupervised learning. In its application
across business problems, machine learning is also referred to as predictive analytics.

Types of learning algorithms:


The types of machine learning algorithms differ in their approach, the type of data they input
and output, and the type of task or problem that they are intended to solve.

Supervised learning:
Supervised learning algorithms build a mathematical model of a set of data that contains both
the inputs and the desired outputs. The data is known as training data, and consists of a set of
training examples. Each training example has one or more inputs and the desired output, also
known as a supervisory signal. In the mathematical model, each training example is represented
by an array or vector, sometimes called a feature vector, and the training data is represented by
a matrix. Through iterative optimization of an objective function, supervised learning
algorithms learn a function that can be used to predict the output associated with new inputs.
An optimal function will allow the algorithm to correctly determine the output for inputs that
were not a part of the training data. An algorithm that improves the accuracy of its outputs or
predictions over time is said to have learned to perform that task.
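The description above (feature vectors stacked into a matrix, and a function learned by iteratively optimizing an objective) can be sketched in plain Python. This is a minimal illustration under our own assumptions: a linear model, mean squared error as the objective, batch gradient descent as the optimizer, and a tiny made-up dataset whose targets follow y = 2*x1 + 3*x2 + 1. None of these choices come from the report.

```python
def predict(weights, bias, x):
    """Linear model: dot product of the weight vector with a feature vector, plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(weights, bias, X, y):
    """Objective function: mean squared error over the training matrix X."""
    return sum((predict(weights, bias, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def fit(X, y, lr=0.05, epochs=2000):
    """Minimise the MSE with batch gradient descent (one update per pass over the data)."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(X, y):
            error = predict(weights, bias, x) - t
            for j in range(n_features):
                grad_w[j] += 2 * error * x[j] / len(X)
            grad_b += 2 * error / len(X)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical training data: each row is a feature vector; together the rows form the matrix X.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2 * a + 3 * b + 1 for a, b in X]

weights, bias = fit(X, y)
print([round(w, 2) for w in weights], round(bias, 2))  # expected to be close to [2.0, 3.0] 1.0
print(round(predict(weights, bias, [3.0, 2.0]), 1))     # prediction for an input not seen in training
```

Because the toy targets are exactly linear, the learned weights should recover the generating function, which is what lets the model predict correctly for the unseen input.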
Supervised learning algorithms include classification and regression. Classification algorithms
are used when the outputs are restricted to a limited set of values, and regression algorithms
are used when the outputs may have any numerical value within a range. Similarity learning is
an area of supervised machine learning closely related to regression and classification, but the
goal is to learn from examples using a similarity function that measures how similar or related
two objects are. It has applications in ranking, recommendation systems, visual identity
tracking, face verification, and speaker verification.
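To illustrate what a similarity function does in a recommendation setting, the sketch below ranks candidate items by cosine similarity to a query vector. Note that this uses a fixed, hand-chosen similarity function; similarity learning proper would learn such a function from labeled examples, which this sketch does not do. The field names and feature values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity function: cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query vector."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in candidates.items()]
    return [name for score, name in sorted(scored, reverse=True)]


# Hypothetical profiles (for example, scaled N, P, K levels) used only for illustration.
profiles = {
    "field_a": [0.9, 0.1, 0.2],
    "field_b": [0.2, 0.8, 0.7],
    "field_c": [0.8, 0.2, 0.3],
}
print(rank_by_similarity([1.0, 0.1, 0.2], profiles))  # ['field_a', 'field_c', 'field_b']
```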

Unsupervised learning: Unsupervised learning algorithms take a set of data that contains only
inputs and find structure in the data, such as a grouping or clustering of data points. The
algorithms therefore learn from data that has not been labeled, classified or categorized. Instead of
responding to feedback, unsupervised learning algorithms identify commonalities in the data
and react based on the presence or absence of such commonalities in each new piece of data.
A central application of unsupervised learning is density estimation in statistics, though
unsupervised learning also encompasses other domains involving summarizing and
explaining data features. Cluster analysis is the assignment of a set of observations into subsets
(called clusters) so that observations within the same cluster are similar according to one or
more predesignated criteria, while observations drawn from different clusters are dissimilar.
Different clustering techniques make different assumptions on the structure of the data, often
defined by some similarity metric and evaluated, for example, by internal compactness, or the
similarity between members of the same cluster, and separation, the difference between
clusters. Other methods are based on estimated density and graph connectivity.
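As a concrete instance of cluster analysis, the sketch below implements plain k-means, one common clustering technique (the report does not name a specific one). It alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its cluster), which rewards internal compactness. The data points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: alternately assign points to the nearest centroid and move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            [sum(coord) / len(c) for coord in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two obvious groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```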

CHAPTER III: PROPOSED SYSTEM

Data Collection:
Gather data on various factors that influence crop growth, such as soil type, climate, rainfall,
temperature, humidity, elevation, etc.
Collect historical data on crop yields, diseases, pests, and other relevant agricultural
parameters.

Data Preprocessing:
Clean the data to remove any inconsistencies, missing values, or outliers.
Normalize or scale the data to ensure that all features are on a similar scale.

Feature Selection:
Identify the most relevant features that impact crop growth and yield through techniques like
correlation analysis, feature importance ranking, etc.

Model Selection:
Choose appropriate machine learning algorithms for building the recommendation system.
Common choices include decision trees, random forests, support vector machines, or neural
networks.
Consider using ensemble methods for improved performance.

Model Training:
Split the dataset into training and testing sets to evaluate the model's performance.
Train the selected models using the training dataset.

Model Evaluation:
Evaluate the performance of the trained models using appropriate metrics such as accuracy,
precision, recall, F1-score, etc.
Fine-tune the model hyperparameters to optimize performance.

Deployment:
Deploy the trained model into a user-friendly interface, such as a web application or mobile
app.
Integrate the model with a user interface where farmers can input their location, soil type,
climate data, and other relevant parameters.
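The split, scale, train and evaluate steps above can be sketched end to end in plain Python. This is a minimal illustration, not the report's implementation: the helper names are our own, the data is synthetic, and a nearest-centroid classifier stands in for whichever model is chosen. In practice, scikit-learn provides `train_test_split`, `MinMaxScaler` and `classification_report` for the same steps. Note that the scaling statistics are learned from the training set only, so no information from the test set leaks into training.

```python
import random
from collections import defaultdict


def split_train_test(X, y, test_ratio=0.25, seed=42):
    """Shuffle row indices, then split features and labels into training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_min_max(X):
    """Learn per-feature minimum and maximum from the training set only."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(X, lo, hi):
    """Scale each feature to roughly [0, 1] using the training statistics."""
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)] for row in X]


def fit_nearest_centroid(X, y):
    """Stand-in model: remember the mean feature vector of each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    n = len(labels)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


# Hypothetical, synthetic data: [nitrogen, rainfall] for two well-separated crops.
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    X.append([rng.gauss(80, 5), rng.gauss(230, 20)])
    y.append("rice")
    X.append([rng.gauss(40, 5), rng.gauss(80, 15)])
    y.append("chickpea")

X_train, X_test, y_train, y_test = split_train_test(X, y)
lo, hi = fit_min_max(X_train)
model = fit_nearest_centroid(min_max_scale(X_train, lo, hi), y_train)
y_pred = [predict(model, row) for row in min_max_scale(X_test, lo, hi)]
print(evaluate(y_test, y_pred))
```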

3.1 Dataset Collection


The dataset used for crop recommendation has attributes describing the chemical properties of
the soil and the weather conditions. The labels in our dataset are rice, maize, chickpea,
kidneybeans, pigeonpeas, mothbeans, mungbean, blackgram, lentil, pomegranate, banana,
mango, grapes, watermelon, muskmelon, apple, orange, papaya, coconut, cotton, jute, and coffee.
The attributes are nitrogen, phosphorus, potassium, temperature, humidity, pH value, and
rainfall. The dataset contains 2201 records. The fertility of the soil, and hence its crop-yielding
capacity, changes with the chemical properties present in it, so farmers should know which crop
suits their land according to these properties. The dataset used for crop disease classification
contains images of cotton crops with various types of infection. We first developed a deep
learning model to train on this image dataset. The dataset has 6 classes [Aphids, Army worm,
Bacterial Blight, Healthy, Powdery Mildew, Target Spot]. Using this model, we can identify which
disease has infected the crop.
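A minimal sketch of loading the tabular crop recommendation data into feature vectors and labels with Python's standard `csv` module. The report lists the seven attributes but not the file's exact header names, so the column names `N`, `P`, `K`, `temperature`, `humidity`, `ph`, `rainfall` and `label` are assumptions, and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column layout; adjust to the real file's header.
FEATURES = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]
LABEL = "label"


def load_crop_rows(file_obj):
    """Read crop-recommendation rows into a list of feature vectors and a list of labels."""
    X, y = [], []
    for row in csv.DictReader(file_obj):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[LABEL])
    return X, y


# Made-up rows standing in for the real CSV file.
sample = io.StringIO(
    "N,P,K,temperature,humidity,ph,rainfall,label\n"
    "80,40,40,21.0,82.0,6.5,200.0,rice\n"
    "40,70,80,18.0,17.0,7.5,85.0,chickpea\n"
    "85,55,40,22.0,80.0,7.0,220.0,rice\n"
)
X, y = load_crop_rows(sample)
print(len(X), dict(Counter(y)))  # 3 {'rice': 2, 'chickpea': 1}
```

Counting the labels this way is a quick check that every crop class is represented before training.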

3.2 Data Augmentation


Data augmentation is used to increase the size of the image dataset for better training, using
techniques such as rotating the image and zooming in or out. We use data augmentation for the
crop disease classification dataset. With this technique we obtain more images for training and
testing in less time and with less manual effort.
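The report does not say which library performed the augmentation, so the following is only a dependency-free sketch on a grayscale image stored as a list of rows. It shows three simple transformations: a 90-degree rotation, a horizontal flip, and a zoom-in (centre crop scaled back up with nearest-neighbour sampling). Arbitrary-angle rotation and zooming out need interpolation and padding, which real image libraries handle and this sketch omits.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h rows by out_w columns."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)] for r in range(out_h)]


def zoom_in(img, factor=2):
    """Crop the central 1/factor region and scale it back up to the original size."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in img[top:top + ch]]
    return resize_nearest(crop, h, w)


def augment(img):
    """Return the original image plus three augmented variants."""
    return [img, rotate90(img), flip_horizontal(img), zoom_in(img)]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # hypothetical 4 x 4 grayscale image
variants = augment(image)
print(len(variants))  # 4
```

Each original image therefore yields four training images, which is how augmentation enlarges the dataset without collecting new photographs.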

3.3 Data Pre-processing


Data pre-processing is used to clean the data, remove unwanted data, select the required
features, and so on. For crop recommendation we have a tabular dataset, but it contained no noise
that needed correcting, and it already contains the required features as attributes. So data
pre-processing was not required for the crop recommendation dataset, because it was already
pre-processed. Data pre-processing applies not only to tabular data but also to image datasets.
The images in our crop disease classification dataset are not all the same size, so we scale them
down to a common size so that we can train the model effectively.
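A minimal sketch of the resizing step, again using nested lists to stay dependency-free. The 4 x 4 target size is hypothetical and only keeps the example small; VGG16's standard input size is 224 x 224 pixels. The sketch also scales 8-bit pixel values to [0, 1], a common companion step that the report does not mention explicitly.

```python
TARGET_SIZE = (4, 4)  # hypothetical; a real VGG16 pipeline would use (224, 224)


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image (list of rows) to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)] for r in range(out_h)]


def preprocess(images, size=TARGET_SIZE):
    """Bring every image to the same size and scale 8-bit pixel values into [0, 1]."""
    out_h, out_w = size
    return [[[px / 255.0 for px in row] for row in resize_nearest(img, out_h, out_w)] for img in images]


small = [[0, 255], [255, 0]]           # 2 x 2 image
large = [[128] * 6 for _ in range(6)]  # 6 x 6 image
batch = preprocess([small, large])
print([(len(img), len(img[0])) for img in batch])  # [(4, 4), (4, 4)]
```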

CHAPTER IV: METHODS & ALGORITHMS USED
Logistic Regression
Logistic regression is a supervised machine learning algorithm. It is mostly used to predict a
categorical dependent variable, so it is used to solve classification problems, and it is one of
the most widely used algorithms for predicting a categorical target variable. We created a logistic
regression model and trained it on our dataset. Before training, we assigned an integer to each
categorical crop label so that the model was trained on numerical labels. After testing, we
obtained 95.6% accuracy.

Figure 1. Classification report for logistic regression
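A minimal sketch of the two ideas in the paragraph above: mapping categorical crop labels to integers, and how a multinomial logistic regression turns one linear score per class into probabilities with the softmax function. The weights and biases are hypothetical, not learned from the report's data; in practice scikit-learn's `LogisticRegression` learns them, and its `LabelEncoder` performs a similar sorted label-to-integer mapping.

```python
import math


def encode_labels(labels):
    """Map each distinct class name to an integer (sorted order for reproducibility)."""
    classes = sorted(set(labels))
    mapping = {name: i for i, name in enumerate(classes)}
    return [mapping[name] for name in labels], classes


def softmax(scores):
    """Turn raw class scores into probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [sum(w * xi for w, xi in zip(w_row, x)) + b for w_row, b in zip(weights, biases)]
    return softmax(scores)


encoded, classes = encode_labels(["rice", "maize", "rice", "coffee"])
print(encoded, classes)  # [2, 1, 2, 0] ['coffee', 'maize', 'rice']

# Hypothetical weights for 3 classes and 2 scaled features.
weights = [[0.5, -1.0], [1.0, 0.2], [-0.3, 2.0]]
biases = [0.0, 0.1, -0.2]
probs = predict_proba(weights, biases, [0.4, 0.9])
print(classes[probs.index(max(probs))])  # rice
```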

Decision Tree
Decision tree is a supervised machine learning algorithm that can be used for both classification
and regression problems. A decision tree produces a tree structure as its output. In that tree,
internal nodes represent features of the dataset, branches represent decision rules, and leaf nodes
represent the outcome of the problem. At each stage, it chooses the decision (split) best suited to
solving the problem. Because the tree built by the model is easy to understand, decision trees are
widely used. The decision tree is built using the CART (Classification and Regression Trees)
algorithm. We created a decision tree model, trained it on our dataset, and obtained an accuracy
of up to 98.5%.

Figure 2. Decision tree classification
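The "best decision at each stage" in CART is usually chosen by minimising an impurity measure. The sketch below shows one such step for classification: it tries every feature and threshold and keeps the binary split with the lowest weighted Gini impurity. For simplicity, thresholds are taken at observed values (scikit-learn's optimised CART places them between values), and the rows are hypothetical. A full tree repeats this search recursively on each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Search every feature/threshold pair for the binary split with the lowest weighted Gini."""
    best = (None, None, gini(y))  # (feature index, threshold, impurity); "no split" by default
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if weighted < best[2]:
                best = (j, threshold, weighted)
    return best


# Hypothetical rows: [rainfall (mm), pH]
X = [[210, 6.4], [190, 6.8], [220, 5.9], [60, 7.2], [75, 6.9], [50, 7.5]]
y = ["rice", "rice", "rice", "chickpea", "chickpea", "chickpea"]
print(best_split(X, y))  # (0, 75, 0.0): rainfall <= 75 separates the two crops perfectly
```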

Random Forest algorithm

Random forest is a supervised machine learning algorithm that can also be used for both
classification and regression problems. It uses ensemble learning to achieve high accuracy and
efficiency. A random forest classifier builds many decision trees, each on a different random subset
of the data, and averages their outputs to improve accuracy. Because each tree is trained
independently of the others, the trees can be trained in parallel, which helps on larger datasets.
It first selects x random data points, builds d decision trees from them, and predicts the output
based on the average of the trees' outputs. We created a random forest classifier model and trained
it on our dataset. It achieved the highest accuracy of all the algorithms we tried: 99.09%.

Figure 3. Classification report for Random forest Classifier
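A toy sketch of the random forest idea: each tree is trained on a bootstrap sample (rows drawn with replacement) using a random subset of features, and the forest combines the trees' outputs. To stay short, the trees here are depth-1 "stumps" and the outputs are combined by majority vote; scikit-learn's `RandomForestClassifier` instead grows full trees by default and averages the trees' predicted class probabilities. The data and parameter values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label (the first one encountered wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 decision tree: the best single-threshold split over the given feature subset."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # the bootstrap sample cannot be split: always predict its majority class
        return (features[0], float("inf"), majority(y), majority(y))
    return best[1:]  # (feature index, threshold, left prediction, right prediction)


def fit_forest(X, y, n_trees=15, max_features=1, seed=0):
    """Train each stump on a bootstrap sample using a random subset of the features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        features = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the majority class is the forest's prediction."""
    votes = [left if row[j] <= t else right for j, t, left, right in forest]
    return majority(votes)


# Hypothetical rows: [rainfall (mm), pH]
X = [[210, 6.4], [190, 6.8], [220, 5.9], [60, 7.2], [75, 6.9], [50, 7.5]]
y = ["rice", "rice", "rice", "chickpea", "chickpea", "chickpea"]
forest = fit_forest(X, y)
print(predict_forest(forest, [215, 5.8]), predict_forest(forest, [45, 7.6]))
```

Because every tree sees a slightly different sample and feature, individual trees make different mistakes, and combining them reduces the variance of a single tree.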

Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification
problems. SVM finds a boundary, called a hyperplane, that separates the n-dimensional feature space
into classes, choosing the hyperplane with the largest margin between the classes. SVM can also be
applied to image datasets. We created an SVM model, trained it on our dataset, and obtained an
accuracy of 97.42%.

Figure 4. Classification report for SVM
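The report does not state which kernel or library its SVM used (scikit-learn's `SVC`, for example, defaults to an RBF kernel). The sketch below shows the simplest case, a soft-margin linear SVM: it learns a hyperplane w . x + b = 0 by subgradient descent on the regularised hinge loss, which penalises points that fall inside the margin. The learning rate, regularisation strength and data points are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Soft-margin linear SVM fitted by batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))) with labels y_i in {+1, -1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularisation term
        grad_b = 0.0
        for x, t in zip(X, y):
            margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified: hinge term is active
                for j in range(d):
                    grad_w[j] -= t * x[j] / n
                grad_b -= t / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Report which side of the hyperplane w . x + b = 0 the point lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, already-scaled 2-D points for two classes labelled +1 and -1.
X = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Multi-class problems such as crop recommendation are typically handled by combining several such binary classifiers (for example, one-vs-rest or one-vs-one).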

CHAPTER V: SYSTEM ARCHITECTURE

Figure 5. Architecture of the machine learning system

CHAPTER VI: SYSTEM REQUIREMENTS

6.1 HARDWARE REQUIREMENTS:

● System: HP AMD5.
● Hard Disk: 512 GB.
● Monitor: 15" LED
● RAM: 16 GB

6.2 SOFTWARE REQUIREMENTS:

● Operating system: Windows 11.

● Coding Language: Python.

CHAPTER VII: OUTPUT SCREENSHOTS

Figure 6. Crop recommendation

CHAPTER VIII: CONCLUSION

In conclusion, the development of a crop recommendation system involves a systematic
approach that encompasses data collection, preprocessing, feature selection, model selection,
training, evaluation, deployment, feedback mechanisms, scalability, maintenance, and ethical
considerations.

By following these steps, agricultural stakeholders can leverage the power of machine learning
and data analysis to provide farmers with valuable insights and recommendations tailored to
their specific conditions and needs. This not only enhances agricultural productivity and
profitability but also contributes to sustainable farming practices and food security.

As technology continues to advance, crop recommendation systems have the potential to
become even more sophisticated and effective, empowering farmers with the tools and
knowledge they need to thrive in an ever-changing environment.

Ultimately, the successful implementation of a crop recommendation system requires
collaboration between researchers, data scientists, agronomists, policymakers, and farmers to
ensure that it addresses real-world challenges and delivers tangible benefits to agriculture and
society as a whole.

ACCEPTANCE LETTER

COMPLETION CERTIFICATE
