Final Report Corrected
INSTITUTE OF ENGINEERING
THAPATHALI CAMPUS
Submitted By:
Chhabi Raman Adhikari (THA074BEX008)
Gaurav Bhattarai (THA074BEX012)
Gaurav Sharma (THA074BEX013)
Pratik Thapa (THA074BEX023)
Submitted To:
Department of Electronics and Computer Engineering
Thapathali Campus
Kathmandu, Nepal
In partial fulfillment for the award of the Bachelor’s Degree in Electronics and
Communication Engineering.
DECLARATION
We hereby declare that the report of the project entitled “AI Powered Shopping List
Generation And Recommendation System”, which is being submitted to the
Department of Electronics and Computer Engineering, IOE, Thapathali Campus,
in partial fulfillment of the requirements for the award of the Degree of Bachelor of
Engineering in Electronics and Communication Engineering, is a bona fide report of
the work carried out by us. The material contained in this report has not been
submitted to any university or institution for the award of any degree, we are the
sole authors of this work, and no sources other than those listed here have been
used in this work.
CERTIFICATE OF APPROVAL
The undersigned certify that they have read and recommended to the Department of
Electronics and Computer Engineering, IOE, Thapathali Campus, a major project
work entitled “AI Powered Shopping List Generation And Recommendation
System” submitted by Chhabi Raman Adhikari, Gaurav Bhattarai, Gaurav
Sharma and Pratik Thapa in partial fulfillment for the award of Bachelor’s Degree in
Electronics and Communication Engineering. The project was carried out under special
supervision and within the time frame prescribed by the syllabus.
We found the students to be hardworking, skilled, and ready to undertake any work
related to their field of study, and hence we recommend the award of partial fulfillment
of the Bachelor’s degree in Electronics and Communication Engineering.
Project Supervisor
Er. Rama Bastola
Department of Electronics and Computer Engineering, Thapathali Campus
External Examiner
Dr. Shailesh Pandey
Martin Chautari, Thapathali, Kathmandu, Nepal
Project Coordinator
Er. Umesh Kanta Ghimire
Department of Electronics and Computer Engineering, Thapathali Campus
Head of Department
Er. Kiran Chandra Dahal
Department of Electronics and Computer Engineering, Thapathali Campus
The author has agreed that the library, Department of Electronics and Computer
Engineering, Thapathali Campus, may make this report freely available for inspection.
Moreover, the author has agreed that the permission for extensive copying of this
project work for the scholarly purpose may be granted by the professor/lecturer, who
supervised the project work recorded herein or, in their absence, by the head of the
department. It is understood that the recognition will be given to the author of this report
and the Department of Electronics and Computer Engineering, IOE, Thapathali
Campus in any use of the material of this report. Copying or publication or other use of
this report for financial gain without the approval of the Department of Electronics and
Computer Engineering, IOE, Thapathali Campus, and author’s written permission is
prohibited.
Request for permission to copy or to make any use of the material in this project in
whole or part should be addressed to the Department of Electronics and Computer
Engineering, IOE, Thapathali Campus.
ACKNOWLEDGEMENT
It gives us immense pleasure to express our deepest sense of gratitude and sincere
thanks to our highly respected and esteemed guide Er. Rama Bastola for her valuable
guidance, encouragement and help for getting us involved in this project. Her useful
suggestions for this project and co-operative behavior are sincerely acknowledged.
We would like to express our sincere thanks to Er. Saroj Shakya for the guidance in
selecting this project and drafting the respective report.
Finally, we would like to express our sincere thanks to all our friends and others who
helped us directly or indirectly during this project and made us realize the real-world
problems that can, to some extent, be solved through this project.
ABSTRACT
In our day-to-day activities, where we have more important work to remember and
accomplish, many of us may not have the time, or may simply forget, to maintain a
tedious shopping list. Less important items can be bought on the next shopping trip,
but forgetting to buy essential products can sometimes create difficult situations.
The main aim of this project is to help people generate a shopping list easily and
efficiently. The user is able to view the list of shopping items on the website.

The project processes and identifies items with the help of barcode scanning,
image recognition and speech recognition. Items that carry a barcode are scanned
with the barcode scanner, while items that lack a barcode are listed using CNN-based
image recognition. Items that are difficult to recognize from images are listed with
the help of speech recognition. The website provides the user interface for interacting
with the items on the shopping list. Purchased items can be removed manually from
the list, making it more efficient. Users are also given the option to order listed items
instantly through shopping sites, along with item recommendations based on the
user’s purchase history.
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
5. SYSTEM ARCHITECTURE AND METHODOLOGY
6.5.1 Barcode
6.5.2 Barcode API
6.5.3 Barcode Generator
6.5.4 Barcode information
6.6 TEXTUAL CLUSTERING-BASED RECOMMENDATION SYSTEM (K-MEANS ALGORITHM)
6.6.1 Dataset
6.6.2 Libraries imported for Recommendation
6.6.3 Text conversion
6.6.4 Elbow method for optimum value of cluster
6.6.5 Silhouette Plot for validation of cluster size
6.6.6 Fitting K-means to the dataset
6.6.7 Predicting the cluster based on keywords given
6.7 SPEECH RECOGNITION
7.5.8 Precision Recall Curve (PR curve)
7.6 RECOMMENDATION SYSTEM RESULTS
7.6.1 Input Dataset
7.6.2 Data conversion
7.6.3 Elbow diagram
7.6.4 Visualization of cluster in linear diagram
7.6.5 Silhouette Plot for cluster size obtained from Elbow method
7.6.6 Top keywords of cluster
7.6.7 Recommendation of cluster
7.7 SPEECH RECOGNITION
7.7.1 Output from Vosk
7.7.2 Generate CSV file
7.7.3 Shopping list generation
9. CONCLUSION
REFERENCE
LIST OF FIGURES
Figure 7-20: Sparse matrix of text data
Figure 7-21: Elbow Diagram for K-means algorithm
Figure 7-22: Cluster visualization
Figure 7-23: Silhouette Score plot for clustering
Figure 7-24: Top keywords of cluster
Figure 7-25: Recommendation sample
Figure 7-26: Recommended items
Figure 7-27: Output from Vosk
Figure 7-28: CSV file
Figure 7-29: Grocery item added to the shopping list
Figure 10-1: Value of original data
Figure 10-2: Value of augmented data
Figure 10-3: Training result of VGG19 model
LIST OF TABLES
LIST OF ABBREVIATIONS
AI Artificial Intelligence
CNN Convolution Neural Network
CV Computer Vision
MVC Model View Controller
NLP Natural Language Processing
IoT Internet of Things
UI User Interface
ReLU Rectified Linear Unit
VGG Visual Geometry Group
TF-IDF Term-Frequency/Inverse Document Frequency
WCSS Within-Cluster-Sum-of-Squares
PR Precision Recall
1. INTRODUCTION
Computer vision has been started in the areas where there requires the computer
analysis for image identification and classification. Image recognition is a part of
computer vision that has further enlighten the process of digital image detection, useful
pattern extraction and most importantly supported decision making and automation.
Similarly, recommendation system has also been helpful in various e-commerce sites
which ease user interface. There has been a lot of projects regarding the image
recognition and its analysis. This project has also been conceptualized from the same
with certain variation in it and some added functionality with a hope to inject its
usefulness in real-life.
1
Speech recognition, i.e., the computer processing of speech and speech-to-text
translation, has been employed in various projects. It involves converting human
speech into written form and finally recognizing specific utterances. Voice recognition
is slightly different from speech recognition: it only identifies the user’s voice, and no
further processing is performed. Speech processing finds application in many places,
such as generating a product name from the user’s speech.
1.2 Motivation
Since the introduction of computer vision and image processing, various projects have
been developed to eliminate tedious and time-consuming tasks. This project is
developed to provide a better shopping experience through automatic shopping list
generation and other integrated features. Prevailing shopping list generation systems
focus on listing the items only. The existing shopping list generators also lack features
such as a recommendation system and e-commerce site integration, so the user has to
put in extra effort to find an appropriate e-commerce website and, in the absence of a
recommendation system, must rely on the input system every time a shopping list is
generated. Considering these shortcomings of the existing systems, we have worked on
this project to provide a better solution by incorporating e-commerce websites and a
recommendation system. This project aims to give users a better shopping experience
and relieve them of the troublesome process of making a manual shopping list. The
user is not required to scroll through a number of websites just to find a suitable option,
and, with the help of the recommendation system, the user can instantly add items
without using the provided input methods.

Every project is made with certain objectives in mind. Similarly, this project also has
some specific objectives, which are listed below.
1.4 Project Applications
This project finds its application in different fields with few or no modifications. The
applications are listed as follows.
This project also finds applications in various other fields, such as online shopping from
the list, an expiry-date alert system, etc.
The developed system is a convenient method for generating a shopping list. For
generating a list of products, the system is equipped with three different recognition
methods. The objects thus recognized are listed, and the user can access this list later
and specify the objects required. The system then generates the best match for the given
item and provides an interface for buying the product. The system is also equipped
with a recommendation system that is handy in case the user needs to order more
items.
Regarding the report organization, the whole report is divided into several chapters.
Chapter 1 introduces the project topic along with its specific objectives, applications
and scope. Chapter 2 covers the background and literature review related to the project.
Requirement analysis of the different software tools is done in Chapter 3. The dataset
associated with the project and its analysis are described in Chapter 4. The different
algorithms and the methodology used in this project, along with the block diagram and
its working principle, are described in Chapter 5. The implementation details of these
algorithms and methods are given in Chapter 6. The output is discussed in Chapter 7.
Chapter 8 contains the future enhancements, Chapter 9 the conclusion of the project,
and Chapter 10 the appendices related to the project.
2. LITERATURE REVIEW
There have been many technological developments in internet-assisted shopping within
the past few decades. Today’s reach of the web and the availability of mobile
development technology have contributed to these innovative advancements. Various
other projects related to this one have been carried out in the past. One of these projects
is the hybrid shopping list: Heinrichs F., Schreiber D. and Schöning J., under the
supervision of Prof. Dr. Antonio Krüger, worked on a project to build a prototype of a
hybrid mobile application combining the benefits of paper and electronic shopping
lists [1]. Similarly, Marcus Liwicki and his three team members (Sandra Thieme,
Gerrit Kahl, and Andreas Dengel) developed a system that automatically extracts the
intended items for purchase from a handwritten shopping list, termed the intelligent
shopping list [2]. Another interesting development is the work on a prototype for
creating a shopping list from multiple source devices such as desktops, smartphones,
and landline or mobile phones, in several formats (structured text, audio, still pictures,
video, unstructured text and annotated media), titled the multimodal shopping list [3].
Similar to Marcus Liwicki’s study, Nurmi and his group introduced a product retrieval
system that maps the content of free-form shopping lists onto the relevant real-world
products in a grocery store [4]. The GeniCan project was also carried out previously,
using a barcode scanner and speech recognition to generate a shopping list: products
were recognized before being thrown into the dustbin, the list was generated using a
mobile app, and the device with the camera was attached to the dustbin [5].
Apart from the above projects, we added a few more features to make our project more
efficient and interactive. A recommendation system is used in this project for the benefit
of the users; it recommends items similar to the items that the user has just bought. A
comparative analysis of the different shopping sites for the respective item is also
available, with the best options recommended, so that the user can find the best among
the different alternatives. We also emphasize image recognition for some items that
lack a barcode.
Collaborative filtering recommendation technology can be divided into user-based and
item-based recommendation. User-based collaborative filtering predicts item ratings
based on the ratings of other users in order to generate item recommendations.
However, its recommendation quality suffers strongly from the sparsity of user rating
data. Content-based recommendation technology analyzes the characteristics of the
item content information and calculates its matching degree with the user’s interests to
recommend items. Therefore, compared with collaborative filtering recommendation,
content-based recommendation is less dependent on rating data. [6]
Since the spread of the MVC (Model View Controller) pattern into web development,
Python has provided quite a few choices when it comes to web frameworks, such as
Django, TurboGears and Zope. Though selecting one out of the many may be confusing
at first, having many competing frameworks can only be a good thing for the Python
community, because it drives the development of all frameworks further and provides
a rich set of options to choose from. [7]
There are four main ways in which recommender systems produce a list of
recommendations for a user: content-based, collaborative, demographic and hybrid
filtering. In content-based filtering, the model uses the specifications of an item in
order to suggest further items with similar properties. Collaborative filtering uses the
past behavior of the user, such as items that the user previously viewed or purchased,
in addition to any ratings the user gave those items and similar conclusions drawn from
other users’ item lists, to predict items that the user may find interesting. Demographic
filtering reads user profile data such as age group, gender, education and living area
to find similarities with other profiles and derive a new recommendation list. Hybrid
filtering combines all three filtering techniques. This project proceeds using
collaborative filtering [9]. Amazon.com launched item-based collaborative filtering in
1998, enabling recommendations at a previously unseen scale for millions of customers
and a catalog of millions of items. [10]
We use speech recognition algorithms daily with our phones, computers, home
assistants, and more. Each of these systems uses algorithms to convert the sound waves
into useful data for processing, which is then interpreted by the machine. Some of these
systems use older algorithms, whereas newer systems use neural networks to interpret
this data. The systems then produce an output in the form of text to be used. A large
quantity of training data is required to make these algorithms and neural networks
perform effectively. One of the more practical approaches is the Hidden Markov Model
(HMM) with an endpoint detection algorithm for pre-processing to remove unwanted
noise. The HMM requires the addition of other tools to properly interpret speech. [11]
Data augmentation of input features derived from the Short-Time Fourier Transform
(STFT) has become a popular approach. However, for several speech processing tasks,
there is evidence that combining STFT-based and Hilbert–Huang Transform (HHT)-based
features improves performance. The Hilbert spectrum can be obtained with
adaptive mode decomposition (AMD) techniques, which are noise-robust and
appropriate for non-linear and non-stationary signal analysis. [12]
The Caffe framework and the AlexNet model were used to extract feature data from
pictures. AlexNet, as a representative deep neural network, is an 8-layer model. It
contains five convolutional layers and three fully connected layers. Because of its deep
structure and large number of parameters, it extracts more features from the original
data than a traditional CNN. [13] The K-means algorithm is applied to determine
clusters of sentences for the formation of the final summary, and the value of ‘K’ is set
using the Elbow method [14]. The scores for each sentence are computed using the
Term-Frequency/Inverse Document Frequency (TF-IDF) of its constituent words, the
overlap with the title of the story, and its position value. Deep learning has ushered in
a new era of machine learning, beyond vision alone. Convolutional neural networks
have been implemented in image classification, segmentation and object detection. [15]
VGG19 is a similar model architecture to VGG16 with three extra convolutional layers;
it consists of a total of sixteen convolutional layers and three dense layers. The
architecture of the VGG19 model follows. In VGG networks, the use of 3x3
convolutions with stride one provides an effective receptive field equivalent to 7x7.
This implies that there are far fewer parameters to estimate. Every year the ImageNet
competition is hosted, in which a smaller version of the ImageNet dataset (with one
thousand categories) is used with the aim of classifying the images accurately. Several
winning solutions of the ImageNet Challenge have used state-of-the-art convolutional
neural network architectures to beat the best possible accuracy thresholds. [16]
3. REQUIREMENT ANALYSIS
3.1.1 Django
Django is a Python web framework that allows developers to create modern
websites. The pre-existing framework allows users to create a website without
starting from scratch. Django has an active community with experienced developers,
and its professional documentation contributes a lot during the development of
websites. It offers options for free and paid-for support. Being written in Python,
Django is supported on a number of platforms, such as Linux, Windows and macOS,
to name a few. Web hosting providers also offer the resources necessary to host
websites developed using Django. In this project, Django is used for website
development, which is the major platform for the user interface. Django serves as our
backend to perform the CRUD (Create, Retrieve, Update, Delete) operations on the
tables in the database, and it also helped with the frontend work of our web
development.
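As an illustration of these CRUD operations, the minimal sketch below uses Django's ORM on a hypothetical ShoppingItem model; the model and field names are assumptions for illustration and not the project's actual schema.

```python
# Minimal sketch of Django ORM CRUD operations on a hypothetical model.
# The model and field names (ShoppingItem, name, description) are illustrative only.
from myapp.models import ShoppingItem  # assumed app and model

# Create: add a new row to the table
item = ShoppingItem.objects.create(name="Instant Noodles", description="75 g packet")

# Retrieve: query rows back from the database
all_items = ShoppingItem.objects.all()
noodles = ShoppingItem.objects.filter(name__icontains="noodles")

# Update: modify a field and save the change
item.description = "75 g packet, spicy"
item.save()

# Delete: remove the row from the table
item.delete()
```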
3.1.2 TensorFlow
TensorFlow is one of the most widely used platforms for machine learning, owing to its
abundance of community resources, tools and libraries. Data scientists and developers
can efficiently build and deploy ML-powered applications with it. Keras is an API
running on top of the TensorFlow platform and is written in Python. The TensorFlow
libraries are used for image recognition and speech recognition in this project.
3.1.3 Kaggle
Kaggle is a platform where the users can find relevant data sets, publish their own
datasets and work with the professionals and take part in various challenges related to
data science. The simplest and best-supported file type available for handling of tabular
data is the “Comma-Separated Value” abbreviated as CSV. Kaggle hosts competitions
aiming for development of machine learning. It is python base framework which got
lot of codes and libraries that will be helpful in various deep learning projects. Since
our personal computer lacked high performance graphics and other processing units,
Kaggle provides bunch of CPUS, GPU, memory (RAM) and other essential
9
components to develop the computationally complex and deep CNN models. We were
able to use Kaggle’s GPU for 30 hours in a week. Our project is also complex
processing type which is performed by using Kaggle.
10
4. DATASET ANALYSIS
Image classification deals with observing patterns in the dataset to extract features
out of the images. We use filters in CNNs; filters help us exploit the spatial structure
of the images.
The images in the dataset were collected from the following sources:
➢ Google
The images with proper size and orientation were collected from Google. Very few of
those images matched our image criteria, and we had to filter out many images that
were not fit for our project.
➢ Kaggle
➢ Camera
We clicked photos of the items we needed in various stores and resized them according
to our needs to add them to the dataset manually.
The dataset contains various images of the grocery items (such as apple, banana, maize,
cauliflower, etc.) that are not recognized by the barcode scanner. There are 19 classes
of items, each with many images, as shown in Table 4-1. The dataset contains 23,927
images in total, of which 19,142 images are used for training and 4,785 are used for
validation.
The various steps that we followed for preparing the CNN dataset are as follows:

The dataset of interest for image classification was prepared by us. JPEG and PNG
images were collected, grouped and labelled according to their folder names.
Altogether, about 19 classes were collected for the dataset and zipped in order to make
it available for training the CNN. The items collected are grocery items, and each
folder contains the images required for identifying that item after training.

The dataset thus obtained is made ready for training: the images are resized and
categorized or labelled. Resizing to 224×224 gives the best output. An array maintains
the image pixel values along with the index of each image in the list.

The dataset is then shuffled, which gives better results; random shuffling helps
maintain the score of the models.

The data being forwarded is scaled to standardize the input to a layer. This helps
stabilize the learning process and leads to faster convergence.

The dataset is then used to train the CNN model. Depending on the volume and quality
of the dataset, the model may or may not fit well.

Finally, the score and accuracy of the model are obtained. This can be analyzed many
times to fine-tune the performance of the CNN model.
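A minimal preprocessing sketch along these lines is shown below; the directory layout, constants and variable names are assumptions for illustration, not the exact code used in the project.

```python
# Sketch: load images from per-class folders, resize to 224x224, shuffle and scale.
import os
import random
import numpy as np
import cv2

DATASET_DIR = "dataset"   # assumed layout: one sub-folder per class, e.g. dataset/apple
IMG_SIZE = 224

data = []
classes = sorted(os.listdir(DATASET_DIR))
for label, class_name in enumerate(classes):
    class_dir = os.path.join(DATASET_DIR, class_name)
    for file_name in os.listdir(class_dir):
        img = cv2.imread(os.path.join(class_dir, file_name))
        if img is None:                      # skip unreadable or corrupt files
            continue
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        data.append((img, label))

random.shuffle(data)                         # random shuffling before training

X = np.array([img for img, _ in data], dtype="float32") / 255.0   # scale to [0, 1]
y = np.array([label for _, label in data])
print(X.shape, y.shape)
```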
Table 4-1: Dataset
5. SYSTEM ARCHITECTURE AND METHODOLOGY
5.2 Elaboration of working Principle
The system is able to generate a list of items by recognizing a given item. In order to
recognize different types of items, the objects are classified as objects with a barcode
and objects without a barcode. The objects without a barcode are further classified into
objects that are recognizable through image recognition and objects that cannot be
recognized through image recognition.

There are three different methods applied to determine the objects based on this
classification. Objects with a barcode are identified with the help of barcode scanning.
For the objects which do not contain a barcode but are recognizable through the image
recognition process (for example, notebooks), an image is taken and processed for
image recognition. Finally, for the objects which neither contain a barcode nor are
recognizable through an image (for example, sugar), the system allows the user to
enter the product name through voice recognition.

When an object is recognized, it is added to the shopping list, which is updated every
time a new object is entered. If the user needs to order an item from the list, the system
provides an interface to select the item that needs to be ordered. Once the system knows
what the user is willing to buy, it explores the best alternatives among the available
options and generates an interface to allow the user to buy the item.

When the user is done purchasing the item, the recommendation system prompts the
user with items that are frequently bought together with the item the user has recently
bought.
Neural networks are used in areas such as production process control, risk management
and mitigation, validation, target marketing, and customer research. Highly specialized
uses of neural networks include the detection of mines under the ocean, package
recovery, diagnosis of diseases, 3D vision, face and speech recognition, handwriting
recognition, etc.

A neuron is the basic unit that receives input from an external source or from other
nodes. Each node is connected to nodes of the subsequent layer, and each such
connection has a specific weight. Weights are assigned to a neuron based on its relative
importance against other inputs. When all the node values from the input layer are
multiplied by their weights and summed, a value for the first hidden layer is generated.
Based on this summed value, the first hidden layer has a predefined “activation”
function that determines whether or not a node is “activated” and how “active” it will
be. A similar calculation is done in this layer, and it generates the values for the next
hidden layer. After passing through multiple hidden layers, the output is finally
obtained in the output layer. Most of the computation of a neural network takes place
in the hidden layers: the hidden layers take all the inputs from the input layer and
perform the required calculations to generate a result, which is then forwarded to the
output layer so that the user can read the result of the computation.
In a Neural Network, the learning (or training) process is initiated by dividing the data
into three different sets:
• Training dataset – This dataset allows the Neural Network to learn the
weights between nodes.
• Validation dataset – This dataset is used for fine-tuning the performance of the
Neural Network.
• Test dataset – This dataset is used to determine the accuracy and margin of error
of the Neural Network.
Once the data is segmented into these three parts, Neural Network algorithms are
applied to them for training the Neural Network.
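As a small illustration of such a split, the sketch below divides the data 70/15/15 into training, validation and test sets; the proportions and the arrays X and y (as prepared in Chapter 4) are assumptions for illustration.

```python
# Sketch: split the data into training, validation and test sets.
from sklearn.model_selection import train_test_split

# First carve out 30% of the data, then split that part half-and-half
# into validation and test sets (proportions are illustrative).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)
```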
CNN is a feed-forward neural network that processes structured arrays of data such as
images. Convolutional neural networks pick up patterns in the input image, such as
lines, gradients, circles, or even eyes and faces. They contain many convolutional
layers stacked on top of each other, each capable of recognizing more sophisticated
shapes. A CNN applies multiple layers to images and uses filtering to analyze the
image inputs.

In a deep, fully connected neural network, each neuron in a given layer is connected to
all the neurons in the previous layer. Because of the large number of connections, the
number of parameters to be learned increases, and the network becomes more complex;
this added complexity leads to overfitting. Especially in the case of image data, where
the pixel values of the images are the features, the number of input features is of very
high dimension, and most of the pixels may not contribute to predicting the output. To
overcome these challenges, Convolutional Neural Networks were introduced. Here,
the input image data is subjected to a set of convolution operations such as filtering and
max pooling. The resultant data, which is of much lower dimension than the original
image data, is then fed to fully connected layers to predict the output. By performing
the convolution operations, the dimensionality of the data shrinks significantly, which
decreases the number of parameters to be learned. Hence, the network complexity
decreases, which leads to a lower chance of overfitting.
1. Convolutional layers make use of inherent properties of images.
2. CNN would give better performance when trained on shuffled images.
3. CNNs take advantage of the local spatial coherence of images. This means that
they can dramatically reduce the number of operations needed to process an
image by using convolution on patches of adjacent pixels, because adjacent
pixels together are meaningful. We also call this local connectivity. Each map
is then filled with the result of the convolution of a small patch of pixels, slid
with a window over the whole image.
4. There are also the pooling layers, which downscale the image. This is possible
because we retain throughout the network, features that are organized spatially
like an image, and thus downscaling them makes sense as reducing the size of
the image. On classic inputs you cannot downscale a vector, as there is no
coherence between an input and the one next to it.
As CNNs provide us with these various features, we used them in our project rather than
other neural networks like DNNs and RNNs, which are not feasible here and are more
complex than a CNN.

The various learning processes that a computer uses to learn are supervised,
unsupervised and semi-supervised. We used supervised learning for our CNN model.
In supervised learning, all the observations within the dataset are labelled, and the
algorithms learn to predict the output from the input data. Likewise, we have the input
images and the output labels, and we trained our model to obtain the output from those
inputs using a CNN.
Figure 5-3: Convolutional Neural Network (CNN)
A CNN is a combination of layers which transform an image into an output that is
understood by the model. A CNN consists of the following layers:

Convolutional layers produce a feature map by applying a filter that scans the image a
few pixels at a time. The most common type of convolution used is the 2D convolution
layer, abbreviated as conv2D. A filter slides over the 2D input data, performing
element-wise multiplication and summing up the results into a single pixel. The same
operation is performed for each location it slides over, transforming a 2D matrix of
features into a different 2D matrix of features.
Feature map:
Feature maps are generated by applying filters or feature detectors to the input image
or to the feature map output of the previous layers. Visualizing a feature map gives
insight into the internal representations for a specific input at each of the convolutional
layers in the model.
1. Define a new model that takes an image as the input. The outputs of the
model will be the feature maps, i.e., the intermediate representations for all
layers after the first layer. This is based on the model we have used for
training.
2. Load the input image for which we want to view the feature map, to
understand which features were prominent in classifying the image.
5. Run the input image through the visualization model to obtain all
intermediate representations for the input image.
6. Create the plot for all of the convolutional layers and the max pool layers
but not for the fully connected layer.
Zero padding: Zero-padding refers to the process of symmetrically adding zeroes to the
input matrix. It is a commonly used modification that allows the size of the input
to be adjusted to our requirement. It is used in designing the CNN layers when the
dimensions of the input volume should be preserved in the output volume.
Stride: Stride is the number of pixels by which the filter shifts over the input matrix.
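In Keras terms, zero padding and stride are simply arguments of the convolution layer; the snippet below is a generic illustration of those two options, not the project's exact configuration.

```python
# Sketch: 'same' zero-padding preserves spatial size, strides control the shift.
import tensorflow as tf

conv = tf.keras.layers.Conv2D(
    filters=32,
    kernel_size=(3, 3),
    strides=(1, 1),      # the filter moves one pixel at a time
    padding="same",      # zero-pad so the output keeps the input's height and width
    activation="relu",
)
```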
The pooling layer operates on each feature map separately to create a new set of the
same number of pooled feature maps. A pooling layer is added after the convolutional
layer, once a non-linearity has been applied to the feature maps output by the
convolution layer.

Pooling involves selecting a pooling operation, much like a filter, to be applied to the
feature maps. The size of the pooling filter is smaller than the size of the feature map.
This means that the pooling layer will always reduce the size of each feature map; with
a 2×2 filter, each dimension is halved, reducing the number of pixels or values in each
feature map to one quarter of its original size.

The common pooling functions used in the pooling operation are average pooling,
where the average value of each patch of the feature map is calculated, and max
pooling, where the maximum value of each patch of the feature map is taken. Max
pooling simply takes the maximum of a region and helps retain the most important
features of the image. The result of using a pooling layer and creating down-sampled
or pooled feature maps is a summarized version of the features detected in the input.
They are useful because small changes in the location of a feature in the input, as
detected by the convolutional layer, will still result in a pooled feature map with the
feature in the same location. This capability added by pooling is called the model’s
invariance to local translation.
5.3.1.3 Flattening:
Output from the previous layers is flattened to a single vector so that they can be input
to the next layer.
Figure 5-6: Flattening Operation
Fully connected layers in neural networks are those layers where all the inputs from
one layer are connected to every activation unit of the next layer. In most machine
learning models, the last few layers are fully connected layers, which compile the
information extracted by the previous layers to form the final output. It is the second
most time-consuming layer after the convolution layer.

After feature extraction, we want to classify the data into various categories, and this
can be done using a fully connected neural network. Adding fully connected layers
makes the model end-to-end trainable. The fully connected layers learn a function
between the high-level features given as output from the convolutional layers.
Figure 5-7: Fully Connected Layer
5.3.1.5 Dropout
Once the features are connected to the fully connected layer, the model can overfit the
dataset. Overfitting happens when a particular model performs so well on the training
data that it causes a negative impact on the model’s performance when used on new
data.

One of the most important parameters of a CNN model is the activation function.
Activation functions are used to learn and approximate any kind of continuous and
complex relationship between the variables of the network. In simple words, they
decide which information of the model should fire in the forward direction and which
should not.
It adds non-linearity to the network. There are commonly used activation functions
such as ReLU, Softmax, tanh and Sigmoid. Each of these functions has a specific usage.
For a binary classification CNN model, the sigmoid and softmax functions are
preferred, while for multi-class classification softmax is typically used. We have used
ReLU and softmax in our project.
5.3.2 AlexNet Algorithm
Figure 5-10: AlexNet Architecture
5.3.3 VGG-16 Algorithm
VGG16 is a CNN model proposed by K. Simonyan and A. Zisserman. ‘VGG’ is the
abbreviation for Visual Geometry Group, the group of researchers at the University
of Oxford who developed this architecture, and ‘16’ implies that this architecture has
sixteen layers. It is an advance over AlexNet owing to the replacement of large
kernel-sized filters with multiple 3×3 kernel-sized filters one after another.
The main idea of the VGG19 model is the same as VGG16, except that it has 19 layers.
This means that VGG19 has three more convolutional layers.
A fixed-size (224 × 224) RGB image is given as input to the network. The sole
preprocessing that is done is subtracting the mean RGB value, computed over the whole
training set, from every pixel. Kernels of size (3 × 3) with a stride of one pixel are
applied to cover the whole notion of the image, and spatial padding is used to preserve
the spatial resolution of the image. Max pooling is performed over 2×2-pixel windows
with stride 2. This is followed by ReLU to introduce non-linearity, which improves
classification and computation time compared with the previous models that used tanh
or sigmoid.
VGG-19 uses three fully connected layers, where the first two are of size 4096 and the
third has 1000 channels for 1000-way ILSVRC classification. The final layer is a
softmax function.
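A hedged sketch of building such a VGG-19 classifier with Keras is shown below; reusing the pre-trained ImageNet weights and the 19-class output head are assumptions about how it could be wired up, not a verbatim copy of the project code.

```python
# Sketch: VGG19 backbone with a custom 19-class softmax head.
import tensorflow as tf

base = tf.keras.applications.VGG19(
    weights="imagenet",            # pre-trained ImageNet weights (assumption)
    include_top=False,             # drop the original 1000-way classifier
    input_shape=(224, 224, 3),
)
base.trainable = False             # optionally freeze the convolutional base

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(19, activation="softmax"),  # 19 grocery classes
])
model.summary()
```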
Figure 5-12: VGG-19 Flowchart
5.4 Recommendation System
The recommendation system used in this project is based on textual clustering with the
K-means algorithm. Text clustering is the task of grouping a set of unlabeled texts in
such a way that texts in the same cluster are more similar to each other than to texts in
other clusters. Text clustering algorithms process the text and determine whether
natural clusters exist in the data. In this project, the K-means clustering algorithm is
employed.
Figure 5-13: K-means algorithm flowchart
6. IMPLEMENTATION DETAILS
Web scraping is done for three e-commerce sites, namely Alibaba, Sastodeal and
Amazon India. Python libraries are used for web scraping. The information obtained
includes the description, price, image URL and volume of the first three items of the
searched result. The Barcode Monster API is used to look up items with a barcode.
Website development for the user interface is done with the help of Django. The
website includes the shopping list and instant ordering for added items.
➢ tf.keras.layers.Conv2D
This layer creates a convolution kernel that is convolved with the layer input to
produce a tensor of outputs.
➢ tf.keras.layers.MaxPooling2D
This layer performs down-sampling of the input along its spatial dimensions by
taking the maximum value over an input window. The window is shifted by
strides along each dimension, and the max pooling is performed within each
window.
➢ tf.keras.layers.Dropout
Overfitting is prevented by the dropout layer. This layer randomly sets input
units to 0 with a given rate at each step during training.
➢ tf.keras.layers.Flatten
It flattens the output obtained from the max pooling layer, converting the
3-dimensional feature maps into a single vector.
➢ tf.keras.layers.Dense
The dense layer is used in the final stage of the neural network. It accepts the
input and produces the output using an activation function (such as ReLU) and
a bias, which helps to classify the image.
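Putting these layers together, a minimal custom CNN in Keras might look like the sketch below; the layer sizes are illustrative, not the exact architecture used in the project.

```python
# Sketch: a small CNN combining the layers listed above.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),   # randomly zero some activations to curb overfitting
    tf.keras.layers.Flatten(),       # collapse the feature maps into a single vector
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(19, activation="softmax"),  # one output per grocery class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```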
1. Data collection
2. Data preprocessing
The data collected was preprocessed to make it a fit input for machine learning.
We built the image dataset by resizing all the collected images to the same scale
and balancing the number of images across classes, thereby making it easier for
the machine learning model to process the data.
The dataset is divided into training and testing sets. The training dataset is used to
train the model, while the testing dataset is used to test the result (accuracy) of the
model.
3. Model selection
Since we are trying to identify grocery items, we used a CNN classification
model.
4. Model training
The selected model is trained on the training dataset.
5. Evaluation
After training, the test dataset is fed as input to the model and evaluated. If the
evaluation is poor, we change the parameters and train the model again. We used
K-fold cross validation, the confusion matrix, precision-recall graphs and ROC
curves to evaluate our model.
6. Performance tuning
Performance tuning is done to achieve high accuracy in the output. We tune the
performance of the model by various methods, such as increasing the number of
epochs, the batch size or the image resolution, choosing the learning rate with the
help of graphs plotted between learning rate and loss, finding and deleting
corrupt data in the dataset, or changing the hyperparameters.
7. Prediction
We predict the outcome by taking real-time input after performance tuning.
➢ NumPy
NumPy is a library for handling the multidimensional arrays.
➢ Matplotlib
Matplotlib is a library for visualization of plots and graphs for
analyzing data.
The learning rate changes the model according to the error while the weights and biases
are updated. The learning rate should be tuned properly; a value that is too high or too
low may result in improper training.
6.2.3 Confusion matrix
A confusion matrix contains the correctly classified and misclassified counts for each
class. The model’s confusion is measured with the help of this matrix: high values
along the diagonal indicate good performance of the model, while off-diagonal entries
show where the model confuses items into the wrong class when making predictions.
The overall performance can be monitored with the help of this matrix.
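With scikit-learn, the matrix can be produced as in the sketch below; the model and the validation arrays X_val, y_val (integer labels) are assumed names carried over from the earlier sketches.

```python
# Sketch: confusion matrix from true vs. predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(X_val), axis=1)  # class with the highest softmax score
cm = confusion_matrix(y_val, y_pred)
print(cm)  # high counts on the diagonal indicate correct classification
```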
A PR curve is a graph with precision values on the y-axis and recall values on the
x-axis. Precision is also called the Positive Predictive Value (PPV), and recall is also
called sensitivity or the True Positive Rate (TPR).
Precision = TruePositives / (TruePositives + FalsePositives)
Recall = TruePositives / (TruePositives + FalseNegatives)
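For a single class, these quantities can be computed directly, or scikit-learn can trace the whole PR curve, as in this illustrative sketch; y_true_bin (binary ground-truth labels for one class) and y_score (the predicted probability for that class) are assumed names.

```python
# Sketch: per-class precision, recall and PR curve with scikit-learn.
from sklearn.metrics import precision_score, recall_score, precision_recall_curve

y_pred_bin = (y_score > 0.5).astype(int)          # threshold the predicted probabilities
precision = precision_score(y_true_bin, y_pred_bin)
recall = recall_score(y_true_bin, y_pred_bin)

# Full precision-recall curve over all thresholds
precisions, recalls, thresholds = precision_recall_curve(y_true_bin, y_score)
print(precision, recall)
```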
6.2.5 Regularization
6.2.6 Optimizers
Optimizers are algorithms for changing the attributes of a neural network, such as the
weights and the learning rate, in order to reduce the losses. Optimization algorithms
are responsible for providing the most accurate results possible. There are many
optimizers, among them ADAM, Gradient Descent and Momentum; for this project the
ADAM optimizer is used.
Adaptive Moment Estimation (ADAM) is an optimization algorithm for gradient
descent. Dealing with big problems involving a lot of data becomes easier with this
method, and it requires less memory and is efficient.
6.2.7 Hyper-parameter Tuning
Website development is performed using Django, a free and open-source web
framework based on Python. The various progressive tasks performed are listed below:
The project is created using the “django-admin startproject <project name>” command.
The project thus created can be viewed in the browser.
SQLite is used as the default database in Django. Django uses an Object Relational
Mapper (ORM), which makes it really easy to work with the database. Inside
'blog/models.py', we need to create a new model: a class that will become a database
table and that inherits from 'models.Model'.
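A minimal sketch of such a model is shown below; the class and field names are hypothetical stand-ins, not the project's actual model.

```python
# blog/models.py -- sketch of a model class that becomes a database table.
from django.db import models

class ShoppingItem(models.Model):          # hypothetical model for illustration
    name = models.CharField(max_length=200)
    description = models.TextField(blank=True)
    image_url = models.URLField(blank=True)
    added_on = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name
```

Running "python manage.py makemigrations" followed by "python manage.py migrate" would then create the corresponding table in the SQLite database.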
6.4 Web scraping
In order to access the data from the e-commerce sites, we used web scraping. The web
scraping is done using Python code; three libraries in particular are used for this
purpose: Beautiful Soup, pandas and requests. These libraries help with web scraping
in Python. To scrape a page, we first make an HTTP call and then extract the elements
using Beautiful Soup.

The Python library Beautiful Soup extracts data from HTML and XML files. It relies
on a parser; the parser used here is lxml. It provides idiomatic ways of navigating,
searching and modifying the parse tree.

The scraping is done for three e-commerce websites: Alibaba, Amazon India and
Sastodeal. From all three websites, the first three results for the searched product are
taken into consideration. The URL of the product, the title, the price and the image
URL are taken as information from web scraping. This information is used for
shopping list generation. The user can also shop directly from within the website,
which redirects them to the original e-commerce shopping site.
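A hedged sketch of this request-then-parse flow is given below; the URL and the CSS selectors are placeholders, since every e-commerce site uses its own markup.

```python
# Sketch: fetch a search-results page and pull out title/price/image of the first items.
# The URL and selectors below are placeholders, not the selectors of any specific site.
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/search?q=noodles"   # placeholder search URL
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(response.text, "lxml")        # parse the HTML with the lxml parser

products = []
for card in soup.select("div.product-card")[:3]:   # first three results (placeholder selector)
    products.append({
        "title": card.select_one("h2.title").get_text(strip=True),
        "price": card.select_one("span.price").get_text(strip=True),
        "image_url": card.select_one("img")["src"],
    })
print(products)
```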
Items having a barcode can be identified by reading the barcode. Barcode lookup is
handled with the help of Barcode Monster. The barcode analysis is done in the
following steps:
6.5.1 Barcode
There are two types of barcodes supported by Barcode Monster, namely EAN-8 and
EAN-13. EAN is the standard barcode used on most available products and is
compatible with UPC and JAN. A UPC contains a machine-readable barcode with a
twelve-digit number underneath. In this project, the EAN-13 format is used.
6.5.2 Barcode API
An API defines the set of rules that enables programs to communicate with each other
while exposing data and functionality. Barcode Monster, used in this project, provides
REST APIs for looking up barcode information. This API is accessed via the HTTP
protocol.
6.5.3 Barcode Generator
A Python library is used to handle barcodes in the EAN format. This involves two
steps: scanning the barcode and decoding it. The cv2 library (imported as ‘cv2 as cv’)
is used to access the computer camera, and the barcode is scanned with the camera
built into the laptop. Then ‘decode’ from ‘pyzbar.pyzbar’ decodes the scanned barcode.
In this way we obtain the numerical digits, which can then be looked up using the
API.
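A minimal sketch of this camera-scan-and-decode loop is shown below, assuming the default webcam at index 0; it is an illustration of the approach, not the project's exact script.

```python
# Sketch: read frames from the laptop camera and decode any EAN barcode in view.
import cv2 as cv
from pyzbar.pyzbar import decode

cap = cv.VideoCapture(0)           # default laptop camera
code = None
while code is None:
    ok, frame = cap.read()
    if not ok:
        break
    for barcode in decode(frame):  # returns a list of detected barcodes
        code = barcode.data.decode("utf-8")   # e.g. '1234567890123'
        print("Scanned:", code, barcode.type)
    cv.imshow("Barcode scanner", frame)
    if cv.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv.destroyAllWindows()
```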
6.5.4 Barcode information
Information is provided both in HTML and in JSON format. The HTML format works
by inserting the code after the /code URL path. To use the API and get JSON-formatted
content, we can call /api/1234567890123, where 1234567890123 is the code we are
looking for. The barcode lookup gives the company, description and image URL
of the item.
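Using the /api/&lt;code&gt; path described above, the lookup could be sketched as follows; the base URL is written here as an assumption of the service's address, and the field names are illustrative.

```python
# Sketch: look up a scanned EAN code via the Barcode Monster REST API.
# The base URL is assumed; the /api/<code> path follows the description above.
import requests

code = "1234567890123"
response = requests.get(f"https://barcode.monster/api/{code}", timeout=10)
info = response.json()   # JSON with fields such as company, description and image URL
print(info.get("company"), info.get("description"))
```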
6.6 Textual Clustering-Based Recommendation System (K-means Algorithm)
The project has no user history that records user behavior, such as ratings given to
the items listed in the shopping list. For this reason, textual clustering-based
recommendation finds its application in this project: the descriptions of the items listed
in the shopping list are used to form the clusters and recommend items accordingly.
6.6.1 Dataset
A CSV file is exported from the Django database in order to be fed to the
recommendation model as the dataset. The information from the Django database is
exported dynamically and contains the user information and the product description in
separate columns. The words in the product description column are used for the
textual clustering.
6.6.2 Libraries imported for Recommendation
Different libraries are imported for the recommendation model. They can be listed as:
➢ Sklearn
Sklearn (scikit-learn) is a popular Python library providing diverse algorithms
for classification. It also performs clustering and dimensionality reduction.
➢ TfidfVectorizer
It converts a collection of raw documents to a matrix of TF-IDF features.
TF-IDF (term frequency-inverse document frequency) is a statistical measure
that evaluates how relevant a word is to a document in a collection of
documents.
➢ CountVectorizer
CountVectorizer is a useful class provided by the scikit-learn library in Python.
Its function is to transform a given text into a vector on the basis of the
frequency (count) of each word that occurs in the entire text. This is useful
when we have multiple such texts and want to convert each word in each text
into a vector.
6.6.3 Text conversion
The texts are converted into numerical values for feature extraction and further
analysis. The resultant output of the text conversion is a sparse matrix, which stores
only the positions and values of the non-zero entries and therefore consumes less
memory. Stop words are also removed during this process: the words which are
generally filtered out before processing natural language are called stop words, and
they are the most common words in any language.
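This conversion can be sketched with scikit-learn as follows; the CSV filename and column name are assumptions standing in for the export from the Django database.

```python
# Sketch: turn product descriptions into a TF-IDF sparse matrix with stop words removed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("shopping_items.csv")              # assumed export from the Django database
descriptions = df["description"].fillna("").tolist()

vectorizer = TfidfVectorizer(stop_words="english")  # drop common English stop words
X = vectorizer.fit_transform(descriptions)          # sparse matrix (documents x vocabulary)
print(X.shape)
```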
6.6.4 Elbow method for optimum value of cluster
The Elbow method is used for determining the number of clusters. The sum of squared
distances from each point to the center of its cluster is calculated for different cluster
counts. The shape of the plot looks like an elbow, and the point where the elbow bends
gives the value of the cluster size. The resources required to study and implement the
Elbow method were easily available, and after our research it was best suited for the
project.
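A sketch of the Elbow computation, reusing the TF-IDF matrix X from the previous step, is given below; the tested range of k is an illustrative assumption.

```python
# Sketch: plot WCSS (inertia) against k and look for the "elbow".
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

wcss = []
k_values = range(1, 11)                      # candidate cluster counts (illustrative)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    wcss.append(km.inertia_)                 # within-cluster sum of squares

plt.plot(k_values, wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()
```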
6.6.5 Silhouette Plot for validation of cluster size
The cluster size obtained from the Elbow method is used to generate the Silhouette
plot. Silhouette analysis is used to study the separation distance between the resulting
clusters. The silhouette plot displays a measure of how close each point in one cluster
is to points in the neighboring clusters, and thus provides a way to assess parameters
like the number of clusters visually. This measure has a range of [-1, 1]. Silhouette
coefficients near +1 indicate that the sample is far away from the neighboring clusters.
A value of zero indicates that the sample is on or very close to the decision boundary
between two neighboring clusters, and negative values indicate that those samples
might have been assigned to the wrong cluster.
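The average silhouette score for the chosen k can be checked as in this small sketch, again reusing the TF-IDF matrix X; the value of k is illustrative.

```python
# Sketch: validate the chosen cluster count with the mean silhouette coefficient.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

k = 5                                         # value suggested by the elbow (illustrative)
labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
print("Mean silhouette score:", silhouette_score(X, labels))  # lies in [-1, 1]
```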
6.6.6 Fitting K-means to the dataset
After getting the optimum value of K from the Elbow method, K-means is fitted to the
dataset. It computes the cluster centers and predicts the cluster index for each sample.
6.6.7 Predicting the cluster based on keywords given
Finally, keywords are given to the model, which then predicts the cluster. The texts of
the predicted cluster are then recommended to the user.
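A sketch of fitting the final model and predicting the cluster for new keywords might look like this, reusing the fitted vectorizer, the dataframe df and the value of k from the sketches above; the query string is illustrative.

```python
# Sketch: fit K-means on the TF-IDF matrix and predict the cluster of new keywords.
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
item_clusters = kmeans.fit_predict(X)                 # cluster index for every item

keywords = ["instant noodles spicy"]                  # illustrative query
query_vec = vectorizer.transform(keywords)            # same TF-IDF vocabulary as training
cluster_id = kmeans.predict(query_vec)[0]

# Recommend other items that fall in the same cluster
recommended = df.loc[item_clusters == cluster_id, "description"].head(5)
print(recommended.tolist())
```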
6.7 Speech Recognition
Vosk is an open-source and free Python toolkit used for offline speech recognition. It
supports speech recognition in many languages and works offline on devices such as
the Raspberry Pi, Android and iOS. It is installed with “pip3 install vosk”.
Vosk provides portable per-language models (about 50 MB each), but there are much
bigger server models available for languages including English, Indian English,
French, Spanish, Portuguese, Chinese, Russian, Turkish, German and Catalan. The user
experience is enhanced by its streaming API, and it also supports speaker
identification. The libraries used alongside Vosk are as follows:
• Librosa
It is a library for retrieving information from music and audio.
• Scipy
It is used to manipulate and visualize the data. It is built as an extension of NumPy.
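A hedged sketch of offline recognition with Vosk is given below; the model directory path and the WAV file are placeholders, and Vosk expects 16-bit mono PCM audio.

```python
# Sketch: transcribe a mono 16-bit PCM WAV file with a downloaded Vosk model.
import json
import wave
from vosk import Model, KaldiRecognizer

wf = wave.open("speech.wav", "rb")               # placeholder audio file
model = Model("model")                           # path to an unpacked Vosk model directory
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)                     # feed audio chunks to the recognizer

result = json.loads(rec.FinalResult())
print(result.get("text"))                        # e.g. "add two packets of sugar"
```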
7. RESULT AND ANALYSIS
Items having a barcode can easily be entered in the shopping list, as we can interface
the product through barcode scanning. The Python libraries play the vital role in
scanning and decoding the barcode information. The items are scanned using the
camera of the laptop.
The name of the item, the company name, its description and the image URL are shown
after scanning the item, as shown in the figure above. This information is used to
generate the shopping list with the image of the product and its price.
After web scraping using the Python libraries, we are able to access three e-commerce
websites: Sastodeal, Alibaba and Amazon. The following information is obtained from
web scraping:
➢ Title of the product
➢ Price of the product
➢ Image URL
This information is collected from the first three products on the respective e-commerce
website for the searched items. The result thus obtained is shown in the figure
below.
The user is presented with a screen where they are free to choose among the three
methods of shopping list generation. When 1 is pressed on the keyboard, the barcode
scanning interface opens. Similarly, pressing 2 or 3 lets them use the image recognition
or the speech recognition method respectively.
Figure 7-3: User Interface
The website for this project is built with the help of Django. The appearance of the
website, along with the shopping list and instant order sections, is obtained. The whole
website can be divided into two sections.
The items that are scanned with a barcode are added to the shopping list along with the
following information:
➢ Added Date
➢ Added time
➢ Name of item
➢ Item description (volume)
➢ Image obtained from URL
Figure 7-4: Shopping list
The items, along with their descriptions, are presented in the instant order section, from
which the users can place an order on the respective website. This section includes the
item’s price, image and description with an ‘Order now’ option. Clicking the ‘Order
now’ option redirects to the original website, where the order can be placed.
Figure 7-5: Instant Order
The user database of the website is created as soon as the user logs in to the website. It
includes:
➢ Username
➢ Company name
➢ Description of the product
➢ Image URL
➢ Date and time the product was added
This database differs from user to user: a product added is stored under the user ID
used during the login process.
After generating the dataset of interest, we feed it to the CNN models for training. The
dataset is normalized and shuffled before being passed to the model, and it is split
70%/30% into training and validation sets respectively. The results were obtained with
a batch size of 16 and 60 epochs, and these runs were used to tune the model
hyperparameters. Results are reported for three architectures: VGG16, VGG19 and
AlexNet.
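A training-setup sketch in Keras is given below; the dataset directory, image size, ImageNet weights and the Adam optimizer are assumptions, while the 70/30 split, batch size 16, 60 epochs and the VGG19 backbone follow the values reported in this section:

# Keras training-setup sketch (directory, image size, weights and optimizer are assumptions).
import tensorflow as tf

IMG_SIZE, BATCH, EPOCHS = (224, 224), 16, 60

# 70/30 train/validation split with pixel normalization to [0, 1].
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255, validation_split=0.3)
train = datagen.flow_from_directory("dataset/", target_size=IMG_SIZE, batch_size=BATCH,
                                    subset="training", shuffle=True)
val = datagen.flow_from_directory("dataset/", target_size=IMG_SIZE, batch_size=BATCH,
                                  subset="validation", shuffle=False)

# VGG19 backbone with a small classification head (one of the three architectures compared).
base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                   input_shape=IMG_SIZE + (3,), pooling="avg")
model = tf.keras.Sequential([base, tf.keras.layers.Dense(train.num_classes, activation="softmax")])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # tuned value reported later
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(train, validation_data=val, epochs=EPOCHS)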
The model is also trained on the dataset without augmentation to check its accuracy.
The result thus obtained is shown in the figure below.
Figure 7-7: Graph of non-augmented data
As the graph shows, the spikes indicate that the model is overfitting. We therefore
augmented the available data, expecting to obtain a model with better generalization
capability.
Image data augmentation is a technique for artificially expanding the size of a training
dataset by creating modified versions of its images. It is used here to enlarge the training
dataset and thereby improve the performance and generalization ability of the model:
training deep neural networks on more data can produce more skillful models, and the
augmented variations of the images help the fitted models generalize what they have
learned to new images. The training curves obtained with augmentation are shown in
Figure 7-8 below.
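An augmentation sketch with the Keras ImageDataGenerator is shown below; the specific transform ranges are illustrative assumptions:

# Image-augmentation sketch (transform ranges are illustrative assumptions).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,          # random rotations
    width_shift_range=0.1,      # small horizontal shifts
    height_shift_range=0.1,     # small vertical shifts
    zoom_range=0.2,             # random zoom
    horizontal_flip=True,       # mirror images
    validation_split=0.3,
)
train_aug = augmenter.flow_from_directory("dataset/", target_size=(224, 224),
                                          batch_size=16, subset="training", shuffle=True)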
Figure 7-8: Graph of augmented data
The amount of data was still not enough to build a model with acceptable accuracy, so
we focused on collecting more images. After further study, we reduced the number of
classes and increased the total number of images in the dataset to 23,927. Beyond that,
finding an optimal learning rate could further improve the model.
The learning rate is a key hyperparameter when configuring a neural network, so it is
important to investigate its effect on model performance and to build an intuition about
how it influences training dynamics. A curve is plotted with the learning rate on the
horizontal axis and the loss on the vertical axis. The learning rate was tuned for the
VGG-19 model, and the value that fit best was 0.0001; with a poorly chosen learning
rate, the model accuracy was found to be very low.
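A simple learning-rate sweep in the spirit of this tuning is sketched below; it reuses the train/val generators from the earlier training sketch and probes a few candidate rates with short runs, which is an assumption about the procedure rather than the project's exact method:

# Learning-rate sweep sketch: short probe runs at several candidate rates, then loss vs rate is plotted.
import matplotlib.pyplot as plt
import tensorflow as tf

def build_vgg19(num_classes):
    # Same VGG19 head as in the training sketch above.
    base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
    return tf.keras.Sequential([base, tf.keras.layers.Dense(num_classes, activation="softmax")])

candidate_rates = [1e-2, 1e-3, 1e-4, 1e-5]
final_losses = []
for lr in candidate_rates:
    model = build_vgg19(train.num_classes)            # `train`/`val` are the generators defined earlier
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    hist = model.fit(train, validation_data=val, epochs=3, verbose=0)   # short probe run per rate
    final_losses.append(hist.history["val_loss"][-1])

plt.plot(candidate_rates, final_losses, marker="o")
plt.xscale("log")
plt.xlabel("learning rate")
plt.ylabel("validation loss")
plt.show()                                            # 0.0001 works best for VGG19 in this project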
Figure 7-9: Learning rate for VGG19
We trained three different models: AlexNet, VGG-16 and VGG-19, and found that the
best architecture for our application was VGG-19, as suggested by the plots below.
These accuracy and loss curves were obtained with 60 epochs, a batch size of 16 and a
learning rate of 0.0001.
Figure 7-10: Training and validation accuracy and loss curve for AlexNet model
From the loss-versus-epoch graph of the AlexNet model, it can be concluded that the
loss decreases sharply as the epochs increase. The validation and training loss curves
are similar to each other, which suggests a well-fitted model. The accuracy also rises
sharply in the first half of the training period, i.e., learning is fast; after that, learning
slows and the validation curve roughly follows the training curve.
The graph of the VGG16 model shows that accuracy rises quickly during the first
quarter of the training period, while the loss curve suggests that the model is somewhat
overfitted. The VGG19 model is better fitted than VGG16 and learns slightly faster. In
all of the graphs above, the model accuracy increases only slowly after a certain number
of epochs.
Figure 7-12: Accuracy and loss curve for VGG19 model
The confusion matrices for the different models are shown below. The models can be
analyzed using these matrix elements.
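A sketch of how such a confusion matrix can be computed with scikit-learn is given below; `model` and `val` refer to the trained network and validation generator from the earlier training sketch:

# Confusion-matrix sketch with scikit-learn (reuses `model` and `val` from the training sketch).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

probs = model.predict(val)                      # class probabilities for each validation image
y_pred = np.argmax(probs, axis=1)
y_true = val.classes                            # true labels in generator order (val uses shuffle=False)

cm = confusion_matrix(y_true, y_pred)
print("correctly classified:", np.trace(cm))    # sum of the diagonal, e.g. 4473 for VGG16
ConfusionMatrixDisplay(cm).plot()
plt.show()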
The AlexNet model performed best on class labels 5 and 14, which are the carrot and
orange classes respectively, and worst on class label 13, the onion class. This model
classified 4389 images correctly; apple, capsicum and orange were classified most
correctly by this model.
Figure 7-13: Confusion matrix from AlexNet model
Using the VGG16 model, garlic was misclassified the most, while apple, banana,
capsicum and orange were classified most correctly. It performed best on class labels 5
and 14, which are the carrot and orange classes respectively, and worst on class label
13, the onion class. This model classified 4473 images correctly.
Figure 7-14: Confusion matrix from VGG16 model
Using the VGG19 model, chili was misclassified the most, while apple, banana,
capsicum and orange were classified most correctly. It performed best on class label 5,
the carrot class, and worst on class label 13, the onion class. This model classified 4425
images correctly.
Figure 7-15: Confusion matrix from VGG19 model
The precision-recall (PR) curve shows the trade-off between precision and recall. A
high area under the curve represents both high recall and high precision, where high
precision relates to a low false-positive rate and high recall relates to a low false-negative
rate. High scores for both indicate that the classifier is returning accurate results as well
as a majority of all positive results.
Figure 7-18: PR curve for VGG19 model
The curves obtained from the different models all resemble that of an excellent
classifier: they run horizontally from the top left and then drop vertically, giving scores
close to 1. This suggests that the models fit the dataset well when analyzed with the
precision-recall curve.
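A one-vs-rest precision-recall sketch with scikit-learn is shown below; it reuses `model` and `val` from the training sketch and is an assumed way of producing such curves for a multi-class classifier:

# Per-class precision-recall curves (one-vs-rest), reusing `model` and `val` from the training sketch.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve
from sklearn.preprocessing import label_binarize

probs = model.predict(val)
y_true = label_binarize(val.classes, classes=list(range(probs.shape[1])))

for c in range(probs.shape[1]):
    precision, recall, _ = precision_recall_curve(y_true[:, c], probs[:, c])
    plt.plot(recall, precision, label=f"class {c}")

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend(fontsize=6)
plt.show()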
Various results are obtained from the textual clustering with K-means; the model can
be analyzed on the basis of these outputs.
The CSV dataset for the recommendation is obtained as follows. It contains two
columns: one holds the website user and the other the description of the item added to
the shopping list.
Figure 7-19: Sample of CSV dataset generated from Django database
The text data is converted into numerical values for feature extraction and analysis. The
generated sparse matrix is shown below; its size is 5000 × 167, and it holds the positions
and values of the vectorized text data.
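A vectorization sketch is given below; the report does not restate the exact vectorizer or column names, so TF-IDF and a "description" column in the exported CSV are assumptions:

# Text-vectorization sketch (TF-IDF and the column name are assumptions).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("shopping_list.csv")                # hypothetical export: user, item description
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(df["description"])      # sparse document-term matrix (reported as 5000 x 167)
print(X.shape, X.nnz)                                # matrix size and number of non-zero entries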
The elbow diagram plots the WCSS against the cluster size. The cluster size is selected
at the elbow point, beyond which increasing the number of clusters no longer reduces
the WCSS significantly. From the curve plotted, we conclude that the value of k (cluster
size) is 25.
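An elbow-method sketch, reusing the matrix X from the vectorization sketch above:

# Elbow-method sketch: WCSS (KMeans inertia_) over a range of cluster sizes.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

k_values = range(2, 41)
wcss = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)                  # within-cluster sum of squares

plt.plot(list(k_values), wcss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("WCSS")
plt.show()                                    # the bend ("elbow") suggests k = 25 in this project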
Figure 7-21: Elbow Diagram for K-means algorithm
The data fitted with K-means is then plotted on a graph using “.” as the marker. The
result thus obtained is shown below; it visualizes the different clusters with their
respective cluster numbers.
Figure 7-22: Cluster visualization
7.6.5 Silhouette Plot for cluster size obtained from Elbow method
On plotting the silhouette graph, the average value obtained is 0.75, which is close to
1. From this we can conclude that the cluster size obtained from the elbow method fits
the given data well; in other words, the score obtained in this project is good and
acceptable.
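A silhouette-score sketch for the chosen cluster size, reusing X from the earlier vectorization sketch:

# Average silhouette score for k = 25 clusters.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans = KMeans(n_clusters=25, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("average silhouette score:", silhouette_score(X, labels))   # about 0.75 reported in this project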
Figure 7-23: Silhouette Score plot for clustering
The top keywords of each cluster can also be visualized to analyze how well the clusters
were formed. The keywords per cluster are shown below; for simplicity, only 10
keywords per cluster are printed.
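A sketch of how the top keywords per cluster can be extracted from the fitted model, reusing `vectorizer` and `kmeans` from the sketches above:

# Top-keywords sketch: the largest centroid weights per cluster name its most representative terms.
import numpy as np

terms = vectorizer.get_feature_names_out()        # older scikit-learn versions use get_feature_names()
order = np.argsort(kmeans.cluster_centers_, axis=1)[:, ::-1]   # term indices sorted by weight, descending

for cluster_id, idx in enumerate(order):
    top10 = [terms[i] for i in idx[:10]]
    print(f"cluster {cluster_id}: {', '.join(top10)}")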
Figure 7-24: Top keywords of cluster
Based on the keyword given to the model, the corresponding cluster is recommended.
In the figure below, the keyword is “tropical food”; with this text as the keyword, the
system predicts cluster 4, whose top keywords are frankfurter, fruit, tropical, citrus,
meat, pork, beef, soda, berries and canned. This is how texts are recommended to the
website user according to their shopping-list entries.
Figure 7-25: Recommendation sample
The recommendation of items from the three e-commerce sites, together with the
“Order now” option, is shown below:
Figure 7-26: Recommended items
7.7 Speech Recognition
The Vosk API is triggered by the keyword “Okay”, after which the user is given the
opportunity to add an item to the shopping list.
The grocery item that the user wants to add is identified using speech recognition and
then written to a CSV file.
The data from the CSV file is then transferred to the Django database, and the item is
listed in the shopping list by reading it back from the database.
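A simplified wake-word and CSV-logging sketch is given below; the file name, the wake-word handling and the assumption that the remaining words name the item are illustrative, and the transfer into the Django database is not shown:

# Wake-word and CSV-logging sketch (file name and parsing are simplified assumptions).
import csv
from datetime import datetime

WAKE_WORD = "okay"

def handle_utterance(text: str, csv_path: str = "spoken_items.csv") -> None:
    """Append the spoken item to a CSV file once the wake word is heard."""
    text = text.lower().strip()
    if not text.startswith(WAKE_WORD):
        return                                       # ignore speech without the trigger word
    item = text[len(WAKE_WORD):].strip()             # remaining words are treated as the item name
    if item:
        with open(csv_path, "a", newline="") as f:
            csv.writer(f).writerow([datetime.now().isoformat(), item])

handle_utterance("okay add brown bread")             # logs "add brown bread" with a timestamp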
Figure 7-29: Grocery item added to the shopping list
8. FUTURE ENHANCEMENTS
No project can cover every aspect of its field, and this one can likewise be improved in
several areas. One of them is the dataset: the dataset used here contains grocery items
only and could be extended to other categories such as clothing and electronics. The
website interface can be made more user friendly, and the shopping list could be
maintained more intelligently, for example by removing an item automatically when
the user orders it instantly, or by flushing the list periodically (weekly, monthly and so
on). The scanning of products could be integrated directly into the website to make it
more effective. Furthermore, a mobile app could be developed alongside the project to
reduce the users' time and effort. More CNN models could still be explored and the
number of dataset classes increased. Recommendation of items could also be upgraded
with other filtering approaches: the recommendation system integrated in this project
is still relatively immature, and better results could be obtained with an ontology-driven
recommendation system, which would reduce the cases of unrelated recommendations
to a great extent.
9. CONCLUSION
10. APPENDICES
Appendix A: Project schedule chart (tasks such as Brainstorming plotted against dates)
Appendix B: Value of original data
Appendix C: Value of augmented data
Appendix D: Training result of VGG19 model
Figure 10-3: Training result of VGG19 model