● The main objective of this research was to compare the performance of different machine
learning algorithms and select the best algorithm for further development of a
mobile application that identifies herbal, fruit, and vegetable plants.
● To develop an accurate and efficient placement prediction model using Machine Learning (ML).
Objective of the project: identification of plants from their leaves on the basis of shape,
color, and texture features, using digital image processing techniques.
● In this work, a recognition system capable of identifying plants from images of their leaves has
been developed. A mobile application allows the user to take pictures and upload
them to a server. The server runs preprocessing and feature extraction on each image before a
pattern matcher compares the extracted information with the information stored in a database to find
matches. The features used are the length and width of the leaf, area and perimeter, hull area, hull perimeter, a
distance map along the vertical and horizontal axes, a color histogram, and a centroid-based radial distance
map. A kNN classifier was implemented and achieved an accuracy of 87.3 percent.
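A centroid-based radial distance map can be read as the distance from the leaf's centroid to each point along its outline. The NumPy sketch below is one way to compute it, assuming the outline is already available as an ordered (N, 2) array of boundary points. It uses the mean of the boundary points as the centroid (a simplification of the true area centroid) and resamples to a fixed length; the function name `radial_distance_map` and the default of 64 samples are illustrative choices, not the cited system's code.

```python
import numpy as np


def radial_distance_map(contour: np.ndarray, n_samples: int = 64) -> np.ndarray:
    """Centroid-to-boundary distances, resampled to n_samples and scaled to a max of 1.

    contour: (N, 2) array of (x, y) boundary points, ordered along the leaf outline.
    """
    contour = np.asarray(contour, dtype=float)
    # Simplification: mean of the boundary points, not the true area centroid.
    centroid = contour.mean(axis=0)
    distances = np.linalg.norm(contour - centroid, axis=1)
    # Resample to a fixed length so outlines with different point counts are comparable.
    positions = np.linspace(0, len(distances) - 1, n_samples)
    resampled = np.interp(positions, np.arange(len(distances)), distances)
    # Dividing by the largest distance makes the map independent of leaf size.
    return resampled / resampled.max()


# Usage: an ellipse (an elongated "leaf") peaks at its two tips.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = np.column_stack([4 * np.cos(theta), 2 * np.sin(theta)])
print(radial_distance_map(ellipse, n_samples=8).round(2))
```

The map still depends on where the contour starts, so comparisons between leaves usually align the starting point or use a shift-invariant distance; that refinement is left out here.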
● This paper presents the extraction of plant leaf gas, alongside other features, from camera images or a
dataset of images by applying a convolutional neural network (CNN). Extracting leaf gas enables
identification of the actual levels of chlorophyll (Ch) and nitrogen (N), which may help to interpret future
predictions. The work covers texture and geometric features, the ratio of Ch to N in
both healthy and dead leaves, and color-based methods using a CNN.
GAPS IDENTIFIED
● Sensitivity to Hyperparameters (KNN, Random Forest): KNN's performance is sensitive to the
choice of the number of neighbors (k), and an inappropriate k-value can lead to
suboptimal predictions. Similarly, Random Forest performance depends on the number of
trees and other hyperparameters, which makes tuning crucial but challenging.
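The effect of k can be checked directly by scoring a held-out split for several candidate values. The sketch below implements kNN from scratch in NumPy (rather than using scikit-learn's KNeighborsClassifier) so the voting step is visible; the function names `knn_predict` and `accuracy_by_k`, the synthetic two-class data, and the candidate k values are all hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (Euclidean distance).

    y_train must hold non-negative integer class labels; ties go to the smaller label.
    """
    # Pairwise distances, shape (n_query, n_train).
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]  # (n_query, k) labels of the neighbours
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy_by_k(X_train, y_train, X_val, y_val, ks):
    """Validation accuracy for each candidate k."""
    return {k: float(np.mean(knn_predict(X_train, y_train, X_val, k) == y_val)) for k in ks}


# Usage on hypothetical data: two overlapping 2-D Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(1.5, 1.0, (60, 2))])
y = np.repeat([0, 1], 60)
idx = rng.permutation(len(y))
train, val = idx[:90], idx[90:]
print(accuracy_by_k(X[train], y[train], X[val], y[val], ks=[1, 3, 7, 15, 45, 89]))
```

Very small k tends to follow noise in the training set, while k close to the training-set size predicts the majority class for almost every query; the useful range lies in between and is usually chosen by cross-validation.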
● Overfitting (KNN, Logistic Regression, Random Forest): All three algorithms are susceptible to
overfitting, especially on noisy or complex datasets. KNN overfits when k is very small, while Logistic Regression and
Random Forest may overfit if the model is too complex or the dataset is not representative.
● Difficulty in Handling Missing Data (KNN, Logistic Regression): KNN is sensitive to missing data, since
distances cannot be computed on missing values without imputation, and imputing them can introduce bias. Logistic
Regression also requires missing data to be handled appropriately, which can be challenging and may introduce inaccuracies.
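The trade-off between the two usual remedies can be made concrete with pandas: dropping incomplete rows loses samples, while filling gaps with the column mean keeps every row but understates the feature's spread, one source of the bias mentioned above. The leaf measurements in this sketch are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical leaf measurements with gaps (NaN marks a missing value).
df = pd.DataFrame({
    "area": [120.0, 95.0, np.nan, 210.0, 150.0, np.nan],
    "perimeter": [48.0, 40.0, 52.0, np.nan, 55.0, 44.0],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: mean imputation keeps all rows but shrinks each column's variance,
# because the filled-in values sit exactly at the mean.
imputed = df.fillna(df.mean())

print(len(df), len(dropped))                   # 6 3
print(df["area"].std(), imputed["area"].std())  # imputed std is smaller
```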
PROPOSED MODEL
METHODOLOGY
PRE-PROCESSING
FEATURE EXTRACTION
The following leaf features were extracted from the pre-processed image:
1. Shape-based features: physiological length, physiological width, area, perimeter, aspect ratio,
rectangularity, circularity
2. Color-based features: mean and standard deviation of the R, G, and B channels
3. Texture-based features: contrast, correlation, inverse difference moment, entropy
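Once the leaf has been segmented, the features listed above reduce to short formulas. The NumPy sketch below assumes one common set of definitions (aspect ratio as length / width, rectangularity as area / (length * width), circularity as 4πA/P²), which may differ from the exact convention used in this project, since some papers report reciprocals. Texture is computed from a gray-level co-occurrence matrix (GLCM) with a single horizontal offset and 8 gray levels, both arbitrary choices. All function names are illustrative.

```python
import numpy as np


def shape_features(area, perimeter, length, width):
    """One common set of definitions; papers differ (some use reciprocals)."""
    return {
        "aspect_ratio": length / width,
        "rectangularity": area / (length * width),          # 1.0 for a perfect rectangle
        "circularity": 4 * np.pi * area / perimeter ** 2,   # 1.0 for a perfect circle
    }


def color_features(rgb, mask):
    """Mean and standard deviation of R, G, B over the leaf pixels (mask == True)."""
    pixels = rgb[mask].astype(float)  # shape (n_pixels, 3)
    stats = {"mean": pixels.mean(axis=0), "std": pixels.std(axis=0)}
    return {f"{name}_{ch}": float(v) for name, vals in stats.items()
            for ch, v in zip("rgb", vals)}


def glcm_features(gray, levels=8):
    """Texture features from a symmetric GLCM, offset one pixel to the right.

    gray: 2-D array of 8-bit intensities (0-255).
    """
    q = (gray.astype(int) * levels) // 256          # quantize to `levels` gray levels
    glcm = np.zeros((levels, levels))
    # Count (pixel, right neighbour) pairs; np.add.at accumulates repeated indices.
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    p = glcm + glcm.T
    p /= p.sum()                                     # joint probabilities
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    cov = ((i - mu_i) * (j - mu_j) * p).sum()
    nz = p[p > 0]
    return {
        "contrast": float(((i - j) ** 2 * p).sum()),
        # Correlation is undefined for a perfectly uniform patch (zero spread).
        "correlation": float(cov / (sd_i * sd_j)) if sd_i * sd_j > 0 else float("nan"),
        "inverse_difference_moment": float((p / (1 + (i - j) ** 2)).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }


# Usage with made-up numbers: a 4 x 2 rectangle has rectangularity 1 and aspect ratio 2.
print(shape_features(area=8.0, perimeter=12.0, length=4.0, width=2.0))
```

For a library route on the texture side, scikit-image's graycomatrix and graycoprops compute GLCM properties; its "homogeneity" property uses the same 1 / (1 + (i - j)²) weighting as the inverse difference moment above.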
MODEL BUILDING AND TESTING
1. A Support Vector Machine (SVM) classifier was used to classify the plant species.
2. The features were scaled using StandardScaler.
3. The hyperparameters of the model were tuned using GridSearchCV.
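The three steps above can be combined with scikit-learn's Pipeline, so that StandardScaler is refit inside every cross-validation fold instead of seeing the validation data. This is a sketch under stated assumptions: the iris dataset stands in for the leaf-feature table, and cv=5 and the 80/20 split are arbitrary choices. The parameter grid is the one given under IMPLEMENTATION, renamed with the "svc__" prefix that Pipeline step parameters require.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: iris replaces the leaf-feature table, which is not available here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling inside the pipeline: each CV fold is scaled with its own training statistics.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Grid from the text, with the "svc__" prefix that Pipeline parameters require.
param_grid = [
    {"svc__kernel": ["rbf"], "svc__gamma": [1e-4, 1e-3, 0.01, 0.1, 0.2, 0.5],
     "svc__C": [1, 10, 100, 1000]},
    {"svc__kernel": ["linear"], "svc__C": [1, 10, 100, 1000]},
]
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# Mean and standard deviation of the CV accuracy for every parameter combination.
means = grid.cv_results_["mean_test_score"]
stds = grid.cv_results_["std_test_score"]
for m, s, p in zip(means, stds, grid.cv_results_["params"]):
    print(f"{m:.3f} (+/-{s:.3f}) {p}")
print("best:", grid.best_params_, "test accuracy:", grid.score(X_test, y_test))
```

If the PCA step listed under IMPLEMENTATION is wanted, it can be added as another pipeline step between the scaler and the SVC, for example ("pca", PCA(n_components=0.95)) with PCA imported from sklearn.decomposition; a float between 0 and 1 keeps as many components as needed to explain that fraction of the variance, and 0.95 is only an example value.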
IMPLEMENTATION
● Conversion of the RGB image to grayscale
● Smoothing the image using a Gaussian filter of size (25, 25)
● Automatic thresholding using Otsu's method
● Closing of holes using a morphological closing transformation
● Boundary extraction (using a Sobel filter and contours)
● Calculation of shape-based features (moments, rectangularity,
aspect ratio, circularity)
● Calculation of color-based features (mean and standard deviation)
● Calculation of texture-based features (contrast, correlation, entropy)
● The dataset and libraries are imported; numpy, pandas, seaborn, and matplotlib are used in
this implementation. The variable “df” is a pandas DataFrame containing the dataset.
● The dataset is cleaned by removing all rows that contain missing values.
● At this stage the dataset had already been preprocessed and contained attributes such as
area, perimeter, physiological_length, physiological_width,
aspect_ratio, etc.
● A train-test split was then performed to evaluate how well the model generalizes to new,
unseen samples. The split does not prevent overfitting by itself, but it reveals it: an overfit
model performs well on the training data but poorly on the held-out data. The validation
results were then used to recalibrate the model for better performance on unseen data.
● The features were then standardized to zero mean and unit variance using scikit-learn's
StandardScaler, so that features with large numeric ranges, such as area, do not dominate the SVM.
● A model is then trained on the augmented/preprocessed dataset to predict the
best classifier for each row, with the best-classifier column as the dependent
variable (Y/target variable).
● The proposed plant leaf identifier model is evaluated on the original
cleaned, preprocessed test set (“X_test_scaled”, “Y_test”).
● Parameter tuning of the model was then performed over the kernel, gamma, and C hyperparameters,
using the grid parameters = [{'kernel': ['rbf'], 'gamma': [1e-4, 1e-3, 0.01, 0.1, 0.2, 0.5], 'C': [1, 10, 100, 1000]},
{'kernel': ['linear'], 'C': [1, 10, 100, 1000]}].
● The mean cross-validation score of each parameter combination was stored in “mean_test_score”, and its
standard deviation in “std_test_score”.
● Dimensionality reduction using PCA was applied to further improve performance.
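The image-processing steps in the list above are typically done with OpenCV, for example cv2.cvtColor with cv2.COLOR_BGR2GRAY, cv2.GaussianBlur(gray, (25, 25), 0), cv2.threshold with the cv2.THRESH_OTSU flag, and cv2.morphologyEx with cv2.MORPH_CLOSE. To show what Otsu's method actually computes, here is a dependency-free NumPy sketch of the grayscale conversion and threshold selection only; smoothing, closing, and contour extraction are omitted, and the function names and synthetic test image are illustrative.

```python
import numpy as np


def rgb_to_gray(rgb):
    """Luma conversion with the ITU-R BT.601 weights (R 0.299, G 0.587, B 0.114)."""
    rgb = rgb.astype(float)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray):
    """Threshold t maximizing between-class variance; class 0 is gray <= t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))     # first moment of class 0 for each t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Where one class is empty the variance is undefined; treat it as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


# Usage on a synthetic image: a dark green "leaf" square on a light background.
img = np.full((64, 64, 3), 230, dtype=np.uint8)
img[16:48, 16:48] = (40, 120, 40)
gray = rgb_to_gray(img)
t = otsu_threshold(gray)
leaf_mask = gray <= t   # the leaf is darker than the background, so it falls in class 0
print(t, int(leaf_mask.sum()))
```

Because the leaf is usually darker than its background, the mask keeps the pixels at or below the threshold; with OpenCV the same effect comes from combining cv2.THRESH_BINARY_INV with cv2.THRESH_OTSU.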
RESULTS