Classification of Mushroom Fungi Using Machine Lea
ABSTRACT

Mushrooms are an edible type of fungus and are among the foods with the most potent nutrients. Mushrooms have major medical advantages, such as killing cancer cells. This study aims to find the most appropriate technique for mushroom classification, where mushrooms are classified into two categories: poisonous and nonpoisonous. The proposed approach applies different techniques and algorithms, namely Neural Network (NN), Support Vector Machines (SVM), Decision Tree, and k-Nearest Neighbors (KNN), to a dataset of mushroom images, where the dataset contains images with and without background. The experimental results show that the best technique for classifying mushroom images is KNN, with an accuracy of 94% based on features extracted from images together with the real dimensions of the mushroom types, and 87% based on features extracted from images only.

Keywords: Machine Learning, Mushroom Classification, Supervised Learning.

1. INTRODUCTION

Nowadays, developing systems that analyze huge and complex data to make better decisions poses various challenges. This study aims to find a new approach to classifying mushroom images based on different features, using different Machine Learning (ML) techniques. The purpose of the classification process is to predict categorical class labels or the target value [1], for example with a feed-forward Artificial Neural Network (ANN), and the purpose of a classifier is to map data to predefined classes or groups [2]. In the proposed approach, we used a training dataset containing mushroom images to classify them as poisonous or nonpoisonous. Our approach aims to predict the class (group) of a mushroom when its features are submitted to different machine learning techniques.

A mushroom is an edible type of fungus and is among the foods with the most potent nutrients. Mushrooms have major advantages, such as killing cancer cells and viruses and enhancing the human immune system. Currently, mushroom selection is a process performed by robots in the food industry. This technique has been limited to features such as color. Recently, mushroom selection systems have used specific characteristics that improve the selection process. Such a system depends on analyzing and investigating the features in order to obtain a better classification based on well-known features [3].

1.1 Machine Learning

Machine learning (ML) falls under the umbrella of artificial intelligence [4]. It allows computer systems to learn from previous history, experience, examples, and data, and it has been making great progress in many directions. Machine learning involves the study of computational learning and pattern recognition theory in artificial intelligence. In addition, machine learning focuses on the construction of techniques that can learn from and make predictions on available data. Example applications include network intrusion detection, email filtering, optical character recognition (OCR), and computer vision [5].

1.2 Algorithms and Techniques

In this study, we use different machine learning algorithms and techniques for mushroom classification, some of which are listed below:

Neural Network (NN): a distributed matrix structure used in different applications, such as classifying data and patterns, predicting new cases or examples, and pattern recognition. An NN simulates human biological cells and the human capability of thinking and learning [5][6].
Decision Tree: one of the most popular classification techniques in machine learning, used in decision support systems. A decision tree classifies objects (instances) by finding a path from the root (topmost parent) node [7][8].
KNN: a machine learning algorithm with a low number of training parameters, low computational complexity, and satisfactory performance [7].
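As a minimal sketch of how a KNN classifier assigns a class, the example below labels a query point by majority vote among its nearest training points. It assumes Euclidean distance and k = 3; the source does not state which distance metric or value of k was used. The function name `knn_predict` and the two-dimensional feature values are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance (an assumption)."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical feature vectors; these are not values from the paper's dataset.
train_features = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (4.0, 4.2), (4.1, 3.9), (3.8, 4.0),
]
train_labels = [
    "poisonous", "poisonous", "poisonous",
    "nonpoisonous", "nonpoisonous", "nonpoisonous",
]

prediction = knn_predict(train_features, train_labels, (1.1, 0.9), k=3)
```

Because KNN stores the training set and defers all work to prediction time, it has no training phase to speak of, which is consistent with the low number of training parameters noted above.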
Mohammad Ashraf Ottom et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(5), September - October 2019, 2378-2385
dataset. We applied the NN, DT, SVM, and KNN algorithms in Orange 3, where the best result was obtained by KNN, with an accuracy of 80%.

3.6 Machine learning model

After the feature extraction phase, we used Orange3 and Knime to build a machine learning model and applied different algorithms: SVM, Neural Network, Decision Tree, and KNN. We first used random sampling with a training set size of 66%, but this did not give good results because of the small size of the dataset (380 instances). Therefore, we used cross-validation with 10 folds. After building the trained model, we evaluated the results in terms of accuracy, F-measure, precision, and recall. The confusion matrix is used to determine the percentage of wrongly classified instances. Figures 3 and 4 show the training model in Orange3 and Knime, respectively.

steps in the previous experiment to calculate the height and width according to the detected edges. We built a new feature matrix for the dataset. The experimental results for histogram features showed that the accuracy reached 87%. To improve our results, we built an algorithm that extracts more features. We call this group parametric features, which are:

Local Contrast Normalization (LCN): used to contrast features within a feature map, as well as across feature maps at the same spatial location; it is inspired by computational neuroscience [16].
Skewness, Standard deviation, and Kurtosis: statistical meta-features, calculated for all numeric attributes and then averaged [17].
Entropy: a concept with a complex history that has been the subject of diverse reconstructions and interpretations; it is defined as the average amount of information produced by a stochastic source of data [18].
Mean: useful in assessing expected losses and benefits. In the proposed approach, we used the mean in the feature matrix, especially to calculate the height and width of images.
Correlation: one of the most widely used measures; the term "correlation" refers to a mutual relationship or association between quantities [19].
Homogeneity: one of the broad categories of distributed data mining; it refers to the process of mining the same set of attributes over all the participating nodes.
Diameter: the real or virtual diameter of the mushroom stem along its height.
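Several of the statistical features listed above (mean, standard deviation, skewness, kurtosis, and entropy) can be computed directly from pixel intensities. The sketch below is illustrative only and is not the authors' feature extractor. It assumes a flat list of grayscale values, population (not sample) moments, non-excess kurtosis, and Shannon entropy over the intensity histogram; the source does not specify any of these choices. The function name `statistical_features` is hypothetical.

```python
import math
from collections import Counter


def statistical_features(pixels):
    """Return mean, standard deviation, skewness, kurtosis, and Shannon
    entropy (in bits) for a flat list of grayscale pixel intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        # Constant image: shape statistics are set to 0 by convention here.
        skewness = kurtosis = 0.0
    else:
        skewness = sum(((p - mean) / std) ** 3 for p in pixels) / n
        kurtosis = sum(((p - mean) / std) ** 4 for p in pixels) / n
    # Entropy of the empirical intensity distribution.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


# Toy four-pixel "image" with two equally frequent intensities.
features = statistical_features([0, 0, 255, 255])
```

For the toy input, the two intensities are equally likely, so the entropy is exactly one bit and the distribution is symmetric (zero skewness).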
Figure 5: Evaluation results for Eigen features with real dimensions
The experimental results show that the KNN technique produced the highest accuracy (0.944) for Eigen features with real dimensions.

In the next step, we extracted more features, namely histogram features, and applied them to the selected ML algorithms. The best accuracy was obtained by the KNN technique, reaching 87%. Figure 8 shows the results for histogram features.
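The evaluation protocol described in Section 3.6 (10-fold cross-validation, then accuracy, precision, recall, and F-measure derived from the confusion matrix) can be sketched as follows. This is a hedged illustration, not the Orange3 or Knime workflow used in the study: the fold assignment is a simple shuffled split without stratification, "poisonous" is assumed to be the positive class, and the function names `k_fold_indices` and `binary_metrics` are hypothetical. Only the instance count (380) and the number of folds (10) come from the source.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Shuffle instance indices and split them into k nearly equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(true_labels, predicted_labels, positive="poisonous"):
    """Compute accuracy, precision, recall, and F-measure from the
    confusion-matrix counts of a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# 380 instances split into 10 folds, as reported in Section 3.6.
folds = k_fold_indices(380, k=10)

# Each fold serves once as the test set; the other nine form the training set.
splits = []
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    splits.append((train_idx, test_idx))

# Toy labels (invented) to show the metric computation.
example = binary_metrics(
    ["poisonous", "poisonous", "nonpoisonous", "nonpoisonous"],
    ["poisonous", "nonpoisonous", "nonpoisonous", "nonpoisonous"],
)
```

With 380 instances, every test fold holds 38 instances and every training set holds 342, so each model is trained on 90% of the data rather than the 66% available under the random sampling that was tried first.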
[Figure: bar chart comparing NN, SVM, Tree, and KNN; y-axis from 0.65 to 0.9. The caption and the assignment of bar values are not recoverable from the extraction.]
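[Figure: bar chart over feature sets (histogram, parametric, eigen, eigen&histogram, eigen&parametric); y-axis from 0 to 0.9. Bar values 0.841, 0.809, 0.797, 0.793, 0.748, and 0.723 appear in the chart, but their assignment to feature sets and the caption are not recoverable from the extraction.]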