
Group-Project Final Documentation2

This document is a project report submitted for the degree of Bachelor of Technology in Computer Science and Engineering. It discusses predicting restaurant reviews using natural language processing techniques. The report includes an introduction, literature survey, system analysis of existing and proposed systems, system design including system architecture and UML diagrams, system implementation details, system testing methodology and results, screenshots, and conclusions.

Uploaded by

tarun nanduri

PREDICTING THE REVIEWS OF THE RESTAURANT

USING NATURAL LANGUAGE PROCESSING TECHNIQUE

A project report submitted in partial fulfillment of the requirements for the

award of the degree of


BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

by
D.NAMRATHA (15A31A0566)

CH.MEGHANA (15A31A0564)

S.MOUNICA (15A31A0590)

K.SRUTHI (15A31A0575)

G.POOJA (15A31A0570)

Under the Esteemed Guidance of

Internal Guide: Mr. T. Soma Sekhar, Professor

Head of the Department: Dr. M. Radhika Mani, Professor & HOD

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


PRAGATI ENGINEERING COLLEGE
(Approved by AICTE & Permanently Affiliated to JNTUK & Accredited by NBA and NAAC)
1-378, ADB Road, Surampalem, E.G.Dist., A.P, Pin-533437.

2018-2019

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

PRAGATI ENGINEERING COLLEGE


(Approved by AICTE & Permanently Affiliated to JNTUK & Accredited by NBA and NAAC)
1-378, ADB Road, Surampalem, E.G.Dist., A.P, Pin-533437.

CERTIFICATE

This is to certify that the Project Report entitled “Predicting The Reviews Of The
Restaurant Using Natural Language Processing Technique”, that is being
submitted by D.NAMRATHA (15A31A0566), CH.MEGHANA (15A31A0564),
S.MOUNICA (15A31A0590), K.SRUTHI (15A31A0575), G.POOJA (15A31A0570)
in partial fulfillment for the award of the Degree of Bachelor of Technology in
Computer Science and Engineering, Pragati Engineering College, is a record of
bonafide work carried out by them.

Internal Guide: Mr. T. Soma Sekhar, Professor

Head of the Department: Dr. M. Radhika Mani, Professor & HOD

External Examiner

ACKNOWLEDGEMENTS

Undertaking the project “Predicting The Reviews Of The Restaurant Using Natural Language
Processing Technique” gives us the opportunity to express our special thanks to
Dr. P. Krishna Rao, Chairman of Pragati Engineering College, Surampalem.

We are extremely thankful to our honorable principal, Dr. S. Sambhu Prasad, who has shown keen
interest in us and encouraged us by providing all the facilities to complete our project successfully.

We owe our gratitude to our beloved Head of the Department of CSE, Dr. M. Radhika Mani, for
assisting us in completing our project work.

We express our sincere thanks to our guide, Mr. T. Soma Sekhar, who has been a source of inspiration
for us throughout our project, for his valuable advice in making our project a success.

We wish to express our sincere thanks to all teaching and non-teaching staff of the Computer Science
and Engineering Department.

D.NAMRATHA (15A31A0566)

CH.MEGHANA (15A31A0564)

S.MOUNICA (15A31A0590)

K.SRUTHI (15A31A0575)

G.POOJA (15A31A0570)

ABSTRACT

In the era of the web, a huge amount of information flows over the network. Since web content
covers subjective opinion as well as objective information, it is now common for people to gather
information about products and services that they want to buy. However, since a considerable amount
of this information exists as text fragments without any numerical scale, it is hard to classify
the evaluations efficiently without reading the full text. Here we focus on extracting scored
ratings from text fragments on the web and suggest various experiments to improve the quality of a
classifier. Methodologies such as sentiment analysis as a text classification problem and sentiment
analysis as feature classification, with mathematical treatment, are explored. Word-of-mouth
opinions expressed online have become increasingly valuable, as people choose which restaurant to
visit by reading reviews.

Keywords: Sentiment Analysis, Naive Bayes, Support Vector Machine

CONTENTS

S.NO DESCRIPTION

ACKNOWLEDGEMENTS
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES

1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.2 PROPOSED SYSTEM
4. SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE
4.2 UML REPRESENTATION
5. SYSTEM IMPLEMENTATION
5.1 MODULES
5.2 SYSTEM REQUIREMENTS
5.3 SOFTWARE ENVIRONMENT
6. SYSTEM TESTING
6.1 TESTING OBJECTIVES
6.2 TEST PLAN
6.3 TEST CASES
6.4 EXPERIMENTAL RESULTS
7. SCREENSHOTS
8. CONCLUSION AND FUTURE WORK
9. REFERENCES
10. SOURCE CODE

LIST OF FIGURES

S.NO DESCRIPTION

Figure 3-1 Example for bloom filter
Figure 4-1 System model architecture
Figure 4-2 Use Case Diagram for end user
Figure 4-3 Use Case Diagram for data consumer
Figure 4-4 Use Case Diagram for attribute
Figure 4-5 Use Case Diagram for cloud server
Figure 4-6 Class Diagram
Figure 4-7 Sequence Diagram for end user
Figure 4-8 Sequence Diagram for data consumer
Figure 4-9 Sequence Diagram attribute
Figure 4-10 Sequence Diagram for cloud server

LIST OF TABLES

S.NO DESCRIPTION

Table 6-1 End user login page Test Cases
Table 6-2 User registration form Test Cases
Table 6-3 User file uploading Test Cases
Predicting The Reviews Of The Restaurant Using Natural Language Processing Technique

1. INTRODUCTION

Businesses often want to know what customers think about the quality of their services in order
to improve and make more profit. Restaurant goers may want to learn from others’ experience
using a variety of criteria such as food quality, service, ambience, discounts and worthiness. Users
may post their reviews and ratings of businesses and services or simply express their thoughts on
other reviews. Bad (negative) reviews may influence the decisions of potential customers, e.g., a
potential customer may cancel a service and persuade others to do the same. The question is how to
quantify the way customers and businesses are influenced and how business ratings change in
response to recent feedback.

In this project we use the Naive Bayes algorithm. Naive Bayes is a simple technique for constructing
classifiers: models that assign class labels to problem instances, represented as vectors
of feature values, where the class labels are drawn from some finite set. There is not a
single algorithm for training such classifiers, but a family of algorithms based on a common
principle: all Naive Bayes classifiers assume that the value of a particular feature is independent of
the value of any other feature, given the class variable. For example, a fruit may be considered to be
an apple if it is red, round, and about 10 cm in diameter. A Naive Bayes classifier considers each of
these features to contribute independently to the probability that this fruit is an apple, regardless of
any possible correlations between the color, roundness, and diameter features.
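The fruit example can be sketched in a few lines of Python. This is only an illustration of the independence assumption: the prior and per-feature probabilities below are invented for the example and do not come from this project's data, and the names `priors`, `likelihoods` and `posterior` are hypothetical.

```python
# Illustrative numbers only: none of these probabilities come from the report.
priors = {"apple": 0.5, "other": 0.5}

# P(feature is present | class), treated as independent given the class.
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_10cm": 0.7},
    "other": {"red": 0.3, "round": 0.4, "about_10cm": 0.2},
}


def posterior(observed):
    """Return P(class | observed features) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            # Each feature contributes its own factor, ignoring correlations.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


result = posterior(["red", "round", "about_10cm"])
print(result)
```

Because every feature contributes one multiplicative factor, adding or removing a feature only changes one term in the product, which is what keeps the classifier simple to train.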

In this project we used Natural Language Processing (NLP) techniques for pre-processing the
text. NLP is an area of computer science and artificial intelligence concerned with the interactions
between computers and human (natural) languages, in particular how to program computers to
process and analyze large amounts of natural language data. Here it is combined with machine
learning to analyze review text and make predictions from it.

Scikit-learn is a free software machine learning library for the Python programming language.
Scikit-learn is largely written in Python, with some core algorithms written in Cython to achieve
performance. Cython is a superset of the Python programming language, designed to give C-like
performance with code that is written mostly in Python.

Pragati Engineering College Page 1



Here we focus on the task of sentiment categorization, which takes a segment of unlabeled text and
attempts to classify the text according to overall sentiment. In this project, we apply natural
language processing techniques to classify a set of restaurant reviews based on the number of stars
that each review received. More specifically:

• We develop a classifier to categorize each review from 1-star to 5-stars.
• We implement a set of features that we believe to be relevant to the sentiment
expressed in reviews and analyze their effect on performance, providing insights into what
works and why sentiment categorization can be so difficult.
• We analyze how a review’s conformance to a particular language model can be affected by
the sentiment of the review.
• We experiment with different linguistically motivated models of sentiment expression, again
using the results to improve the performance of our classifier.
• We examine the effects of part-of-speech tagging on our ability to predict sentiment.


2. LITERATURE SURVEY

This section reviews literature on machine learning. In machine learning, naive Bayes classifiers
are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong
(naive) independence assumptions between the features.

Naive Bayes has been studied extensively since the 1960s. It was introduced (though not under that
name) into the text retrieval community in the early 1960s, and remains a popular (baseline) method
for text categorization, the problem of judging documents as belonging to one category or the other
(such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With
appropriate pre-processing, it is competitive in this domain with more advanced methods including
support vector machines.

Bo Pang et al. [5] used machine learning techniques to investigate the effectiveness of
classifying documents by overall sentiment. The experimental setup consisted of a movie-review
corpus with 700 randomly selected positive-sentiment and 700 negative-sentiment reviews. Features
based on unigrams and bigrams were used for classification, and the learning methods employed were
Naïve Bayes, maximum entropy classification and support vector machines. The experiments
demonstrated that machine learning techniques outperform human-produced baselines for sentiment
analysis on movie review data. However, the accuracy achieved in sentiment classification was much
lower than that of topic-based categorization.

Zhu et al. [6] proposed aspect-based opinion polling from free-form textual customer reviews.
The aspect-related terms used for aspect identification were learnt using a multi-aspect
bootstrapping method. A proposed aspect-based segmentation model segments a multi-aspect sentence
into single-aspect units, which are then used for opinion polling. Using an opinion polling
algorithm, they tested on real Chinese restaurant reviews, achieving 75.5 percent accuracy in
aspect-based opinion polling tasks. This method is easy to implement and is applicable to other
domains such as product or movie reviews.

Jeonghee Yi et al. [7] proposed a Sentiment Analyzer to extract opinions about a subject from
online data documents. The Sentiment Analyzer uses natural language processing techniques: it finds
all the references to the subject and determines the sentiment polarity of each reference. The
sentiment analysis conducted by the researchers utilized a sentiment lexicon and a sentiment
pattern database for extraction and association purposes. Online product review articles for
digital cameras and music were analyzed using the system with good results.


3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Many researchers have previously conducted experiments to classify customer sentiment on different
datasets. Turney (2002) used a semantic orientation algorithm to classify reviews based
on the number of positively oriented and negatively oriented phrases in each review. Pang et al.
(2002) used machine learning tools such as Maximum Entropy and Support Vector Machine (SVM)
classifiers to classify movie reviews using a number of simple textual features.

3.1.1 Algorithms used in existing system

3.1.1.1 Semantic Orientation

The classification of a review is predicted by the average semantic orientation of the
phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic
orientation when it has good associations (e.g., "subtle nuances") and a negative semantic
orientation when it has bad associations (e.g., "very cavalier"). In Turney's approach, the semantic
orientation of a phrase is calculated as the mutual information between the given phrase and the
word "excellent" minus the mutual information between the given phrase and the word "poor". A review
is classified as recommended if the average semantic orientation of its phrases is positive.
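The calculation described above can be sketched as follows. This is a minimal illustration that assumes co-occurrence counts are already available; the function names, the example counts and the small `smoothing` constant are assumptions made for this sketch, not part of the report. When the two pointwise mutual information terms are subtracted, the phrase's own frequency cancels, leaving a log ratio of counts.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    near_excellent / near_poor: how often the phrase occurs near each word.
    hits_excellent / hits_poor: how often each reference word occurs overall.
    smoothing is an assumed constant that avoids division by zero.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_scores):
    """Label a review by the average semantic orientation of its phrases."""
    average = sum(phrase_scores) / len(phrase_scores)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical counts for two phrases taken from one review.
scores = [
    semantic_orientation(near_excellent=20, near_poor=5,
                         hits_excellent=100, hits_poor=100),
    semantic_orientation(near_excellent=4, near_poor=6,
                         hits_excellent=100, hits_poor=100),
]
print(classify_review(scores))
```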

3.1.1.2 Maximum Entropy

The Max Entropy classifier is a probabilistic classifier which belongs to the class of
exponential models. Unlike the Naive Bayes classifier, the Max Entropy classifier does not assume
that the features are conditionally independent of each other. MaxEnt is based on the Principle of
Maximum Entropy: from all the models that fit the training data, it selects the one which has the
largest entropy. The Max Entropy classifier can be used to solve a large variety of text
classification problems such as language detection, topic classification, sentiment analysis and
more.


3.1.1.3 Support Vector Machine(SVM)

“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can
be used for both classification and regression problems. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features) with the value of each feature being the value of a
particular coordinate. Then, we perform classification by finding the hyperplane that best
separates the two classes.

3.1.2 Drawbacks of the existing system

• This type of classification is only done when the classifier has to work on binary data,
which is not the case with restaurant reviews.

• From a practical point of view, perhaps the most serious problem with SVMs is the
high algorithmic complexity and extensive memory requirements of the required quadratic
programming in large-scale tasks.

• If a categorical variable has a category in the test data set that was not observed in the
training data set, the model will assign it a probability of 0 (zero) and will be unable to make a
prediction. This is often known as “Zero Frequency”.
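The zero-frequency problem in the last point can be seen with a few toy counts. The sketch below is an illustration only: the word lists are made up, and add-one (Laplace) smoothing is shown as a commonly used remedy, not as something this report states it applied.

```python
from collections import Counter

# Made-up training words per class; "bland" appears only at test time.
training_words = {
    "positive": ["tasty", "tasty", "fresh", "friendly"],
    "negative": ["cold", "slow", "rude", "cold"],
}
vocabulary = sorted(
    {word for words in training_words.values() for word in words} | {"bland"}
)


def word_probability(word, label, alpha=0.0):
    """Estimate P(word | label); alpha > 0 applies additive smoothing."""
    counts = Counter(training_words[label])
    total = len(training_words[label])
    return (counts[word] + alpha) / (total + alpha * len(vocabulary))


unsmoothed = word_probability("bland", "negative")           # exactly zero
smoothed = word_probability("bland", "negative", alpha=1.0)  # small but non-zero
print(unsmoothed, smoothed)
```

With the unsmoothed estimate, a single unseen word multiplies the whole class score by zero; with `alpha=1.0` the unseen word only lowers the score.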

3.2 PROPOSED SYSTEM

Our proposed system applies natural language processing techniques to classify a set of
restaurant reviews based on the number of stars that each review received. We develop a
classifier to categorize each review from 1-star to 5-stars. We implement a set of features
that we believe to be relevant to the sentiment expressed in reviews and analyze their effect on
performance, providing insights into what works and why sentiment categorization can be so
difficult. We analyze how a review’s conformance to a particular language model can be affected by
the sentiment of the review.

We experiment with different linguistically motivated models of sentiment expression, again using
the results to improve the performance of our classifier. We examine the effects of part-of-speech
tagging on our ability to predict sentiment. We experimented with different methods of
preprocessing the data. Because the reviews are unstructured in terms of user input, reviews can
look like anything from a paragraph of well-formatted text to a jumble of seemingly unrelated
words to a run-on sentence with no apparent regard for grammar or punctuation. Our initial pass
over the data simply tokenized the reviews based on whitespace and treated each token as a
unigram, but we were able to improve performance by removing punctuation in addition to the
whitespace and converting all letters to lowercase. In this way, we treat the occurrences of
“good”, “Good”, and “good.” as the same token, which gives better predictive power for any test
set review containing any of these three forms.

Before converting the text into unigrams, stemming was also performed, which means the various
forms of a word (such as different tenses) were reduced to and treated as a single word. After the
matrix is built, infrequent words are removed by setting a threshold in order to improve accuracy.
Our matrix therefore includes the relevant unigrams as well as bigrams that occur more than the
threshold number of times.
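The preprocessing steps described above (lowercasing, punctuation removal, stemming, unigram and bigram extraction, and frequency thresholding) could look roughly like the sketch below. It uses only the Python standard library; `crude_stem` is a deliberately simplistic stand-in for a real stemmer, and all function names and the `threshold` value are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(review):
    """Lowercase the text and keep only alphabetic runs, dropping punctuation."""
    return re.findall(r"[a-z]+", review.lower())


def crude_stem(word):
    """Toy stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def features(review):
    """Return stemmed unigrams plus bigrams of adjacent stems for one review."""
    stems = [crude_stem(token) for token in tokenize(review)]
    bigrams = [" ".join(pair) for pair in zip(stems, stems[1:])]
    return stems + bigrams


def build_vocabulary(reviews, threshold=1):
    """Keep only features that occur more than `threshold` times in the corpus."""
    counts = Counter(f for review in reviews for f in features(review))
    return sorted(f for f, count in counts.items() if count > threshold)


reviews = ["Good food.", "good FOOD!", "bad service"]
print(build_vocabulary(reviews))
```

Note that “Good”, “good” and “good.” all map to the same token here, which is the effect the paragraph above describes.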

3.2.1 Algorithm used in proposed system

3.2.1.1 Naive Bayes

The proposed system uses Naive Bayes, a classification technique based on Bayes’
Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is unrelated to the presence of
any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter. Even if these features depend on each other or upon the existence of the other
features, all of these properties are treated as contributing independently to the probability that
this fruit is an apple, and that is why the method is known as ‘Naive’.

The Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its
simplicity, Naive Bayes can be competitive with more sophisticated classification methods on text
categorization tasks.
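As a rough illustration of how such a classifier can be applied to tokenized reviews, the sketch below implements a small multinomial Naive Bayes model with the standard library. It is not the project's actual implementation: the class name `NaiveBayesText`, the toy reviews, the star labels and the use of add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Minimal multinomial Naive Bayes over lists of tokens (illustrative only)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Work in log space so many small probabilities do not underflow.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                # Add-one smoothing (an assumption of this sketch).
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data with hypothetical star ratings.
train_docs = [
    ["great", "food", "friendly", "staff"],
    ["loved", "the", "food"],
    ["terrible", "service", "cold", "food"],
    ["rude", "staff", "never", "again"],
]
train_stars = [5, 4, 1, 1]

model = NaiveBayesText().fit(train_docs, train_stars)
print(model.predict(["great", "friendly"]))
print(model.predict(["rude", "cold", "service"]))
```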

3.2.2 Advantages of proposed system

• Good at pattern recognition problems
• Data-driven, and performance is high in many problems
• End-to-end training: little or no domain knowledge is needed in system construction
• Learning of representations: cross-modal processing is possible
• Gradient-based learning: learning algorithm is simple
• Mainly supervised learning methods


4. SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE


The system architecture is shown in Fig 4-1. It comprises two main modules: an offline
processing module, where the user profiles are generated and the feature extraction
and rating take place, and an online module that generates real-time
recommendations. The prototype uses user review data from restaurants. The dataset
contains user information, business information and user reviews. These objects are stored
in an SQLite3 database. A brief overview of the system is provided in what follows.
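Storing users, businesses and reviews in SQLite3 could be sketched as below using Python's built-in `sqlite3` module. The report does not give its schema, so the table names, column names and sample rows here are entirely hypothetical and serve only to show the idea.

```python
import sqlite3

# In-memory database for illustration; a real system would use a file path.
connection = sqlite3.connect(":memory:")

# Hypothetical schema: the report does not specify its tables or columns.
connection.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE businesses (business_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (
        review_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        business_id INTEGER REFERENCES businesses(business_id),
        stars INTEGER,
        text TEXT
    );
""")

connection.execute("INSERT INTO users VALUES (1, 'sample user')")
connection.execute("INSERT INTO businesses VALUES (1, 'sample restaurant')")
connection.execute(
    "INSERT INTO reviews VALUES (1, 1, 1, 5, 'Great food and friendly staff')"
)

# Fetch each review together with the name of the business it refers to.
rows = connection.execute(
    "SELECT b.name, r.stars, r.text "
    "FROM reviews r JOIN businesses b USING (business_id)"
).fetchall()
print(rows)
```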

Fig 4-1: System Model

4.2 UML REPRESENTATION


The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing
and documenting the artifacts of software systems, as well as for business modeling and other
non-software systems.

The following are the UML diagrams used in this project:

4.2.1 Use case diagrams:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a use-case analysis. A use case diagram at its simplest is a
representation of a user's interaction with the system that shows the relationship between the user
and the different use cases in which the user is involved. A use case diagram can identify the
different types of users of a system and the different use cases and will often be accompanied by
other types of diagrams as well. The use cases are represented by either circles or ellipses.

Use case:

In software and systems engineering, a use case is a list of actions or event steps typically defining
the interactions between a role (known in the Unified Modeling Language as an actor) and a system
to achieve a goal. The actor can be a human or other external system. In systems engineering, use
cases are used at a higher level than within software engineering, often representing missions
or stakeholder goals. The detailed requirements may then be captured in the Systems Modeling
Language (SysML) or as contractual statements.


Fig 4-2: Use case diagram for restaurant reviews

Fig 4-2 shows the use case diagram, in which the actor is the end user, who can import the data. The
use cases for the end user are splitting the data, training on the data, predicting,
constructing the confusion matrix and calculating the accuracy score. The system boundary limits
the end user to these tasks; the actor is not permitted to perform any others.
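The splitting, confusion-matrix and accuracy steps named in these use cases can be sketched with plain Python as follows. The function names mirror common library names but are small stand-alone implementations written for this illustration; the example labels and the `test_fraction` and `seed` parameters are assumptions.

```python
import random


def train_test_split(items, labels, test_fraction=0.25, seed=0):
    """Shuffle the indices reproducibly and split items and labels into two parts."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_fraction))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [items[i] for i in train_idx],
        [items[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix


def accuracy_score(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels: 1 = liked, 0 = not liked.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1]
matrix = confusion_matrix(actual, predicted, classes=[0, 1])
accuracy = accuracy_score(actual, predicted)
print(matrix, accuracy)
```

The diagonal of the matrix counts correct predictions, so the accuracy equals the sum of the diagonal divided by the number of test reviews.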

4.2.2 Sequence diagrams:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows
how processes operate with one another and in what order. A sequence diagram shows object
interactions arranged in time sequence. It depicts the objects and classes involved in the scenario
and the sequence of messages exchanged between the objects needed to carry out the functionality of
the scenario. Sequence diagrams are typically associated with use case realizations in the Logical
View of the system under development. Sequence diagrams are sometimes called event diagrams or
event scenarios.

Fig 4-3: Sequence diagram for restaurant reviews

Fig 4-3 shows the sequence diagram, in which the actor is the end user. Here there is a synchronous
process in which the end user can perform all the functions of importing, data cleaning, classifying
and splitting. Moreover, the actor also receives a response for each action performed.


4.2.3 Communication diagrams:

A communication diagram models the interactions between objects or parts in terms of
sequenced messages. Communication diagrams represent a combination of information taken
from Class, Sequence, and Use Case Diagrams describing both the static structure and dynamic
behavior of a system.

However, communication diagrams use the free-form arrangement of objects and links as used in
Object diagrams. In order to maintain the ordering of messages in such a free-form diagram,
messages are labeled with a chronological number and placed near the link the message is sent over.
Reading a communication diagram involves starting at message 1.0, and following the messages
from object to object.

Fig 4-4: Communication diagram for restaurant reviews

Fig 4-4 shows the communication diagram, in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.

4.2.4 Deployment diagrams:

A deployment diagram in the Unified Modeling Language models the physical deployment
of artifacts on nodes. To describe a web site, for example, a deployment diagram would show what
hardware components ("nodes") exist (e.g., a web server, an application server, and a database
server), what software components ("artifacts") run on each node (e.g., web application, database),
and how the different pieces are connected (e.g. JDBC, REST, RMI).

The nodes appear as boxes, and the artifacts allocated to each node appear as rectangles within the
boxes. Nodes may have subnodes, which appear as nested boxes. A single node in a deployment
diagram may conceptually represent multiple physical nodes, such as a cluster of database servers.

Fig 4-5: Deployment diagram for restaurant reviews

Fig 4-5 shows the deployment diagram, in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.

4.2.5 Component diagrams:

In Unified Modeling Language (UML), a component diagram depicts how components are
wired together to form larger components or software systems. They are used to illustrate the
structure of arbitrarily complex systems.

A component is something required to execute a stereotype function. Examples of stereotypes in
components include executables, documents, database tables, files, and library files. Components
are wired together by using an assembly connector to connect the required interface of one
component with the provided interface of another component.

Fig 4-6: Component diagram for restaurant reviews

Fig 4-6 shows the component diagram, in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.

4.2.6 Activity diagrams:

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams are intended to model both computational and organizational processes (i.e., workflows),
as well as the data flows intersecting with the related activities. Although activity diagrams
primarily show the overall flow of control, they can also include elements showing the flow of data
between activities through one or more data stores.

Fig 4-7: Activity diagram for restaurant reviews


Fig 4-7 shows the activity diagram, in which the actor is the end user. The process is synchronous:
the end user can perform all the functions of importing, data cleaning, classifying and splitting.
Moreover, the actor receives a response for each action performed.


5. SYSTEM IMPLEMENTATION

5.1 SYSTEM REQUIREMENTS

5.1.1 HARDWARE REQUIREMENTS

• RAM : 4 GB or higher
• Processor : Intel i3 or above
• Hard Disk : 500 GB minimum

5.1.2 SOFTWARE REQUIREMENTS

• Operating System : Windows family
• Python IDE : Python (2.7.x and above) and PyCharm IDE
• setuptools and pip to be installed for Python 3.6.x and above

5.2 SOFTWARE ENVIRONMENT

5.2.1 Python

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is
designed to be highly readable. It frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than other languages.


5.2.1.1 Features of Python

• Python is Interpreted: Python is processed at runtime by the interpreter. You do not need
to compile your program before executing it. This is similar to Perl and PHP.

• Python is Interactive: You can sit at a Python prompt and interact with the interpreter
directly to write your programs.

• Python is Object-Oriented: Python supports the object-oriented style or technique of
programming that encapsulates code within objects.

• Python is a Beginner's Language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from simple
text processing to WWW browsers to games.

5.2.2 Django Framework

Django is a Python-based free and open-source web framework, which follows the
model-view-template (MVT) architectural pattern. It is maintained by the Django Software
Foundation (DSF), an independent organization established as a non-profit.

Django's primary goal is to ease the creation of complex, database-driven websites. The framework
emphasizes reusability and "pluggability" of components, less code, low coupling, rapid
development, and the principle of don't repeat yourself. Python is used throughout, even for
settings files and data models. Django also provides an optional administrative create, read, update
and delete (CRUD) interface that is generated dynamically through introspection and configured
via admin models.


5.2.2.1 Features of Django Framework

• a lightweight and standalone web server for development and testing

• a form serialization and validation system that can translate between HTML forms and
values suitable for storage in the database

• a template system that utilizes the concept of inheritance borrowed from object-oriented
programming

• a caching framework that can use any of several cache methods

• support for middleware classes that can intervene at various stages of request processing and
carry out custom functions

• an internal dispatcher system that allows components of an application to communicate
events to each other via pre-defined signals

• an internationalization system, including translations of Django's own components into a
variety of languages

• a system for extending the capabilities of the template engine

• an interface to Python's built-in unit test framework


6. SYSTEM TESTING

6.1 TESTING OBJECTIVES

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of test. Each
test type addresses a specific testing requirement.

• Identification of defects: defects must first be identified in the product.
• Isolating the defects: after identification, defects must be recorded. Isolation means
separation; the actual separation is done by the developer.
• Submitting for correction: it is the responsibility of the test engineer (TE) to send the list of
defects for correction.
• Ensuring that the product is defect-free: confirm that the defects have really been corrected
and that the product is free of defects.

6.2 Test Plan

The test plan is the key document that explains the overall strategy for testing an
application in an effective, efficient and optimized way. The following testing techniques are
performed.

6.2.1 What is Web Testing?

Web testing is a software testing practice that tests websites or web applications for potential
bugs. It is a complete testing of a web-based application before it goes live.


A web-based system needs to be checked completely from end-to-end before it goes live for end
users.

By performing website testing, an organization can make sure that the web-based system is
functioning properly and can be accepted by real-time users.

UI design and functionality are the main focus of website testing.

6.2.2 Web testing checklists

1) Functionality Testing
2) Usability testing
3) Interface testing
4) Compatibility testing
5) Performance testing
6) Security testing

1) Functionality Testing

Test all the links in web pages, the database connection, the forms used for submitting or getting
information from the user in the web pages, cookies, etc.

Check all the links:

• Test the outgoing links from all the pages to the specific domain under test.
• Test all internal links.
• Test links jumping within the same page.
• Test links used to send email to admin or other users from web pages.
• Test to check if there are any orphan pages.
• Finally, check for broken links in all of the above-mentioned links.
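
The link checks listed above can be partly automated. The following is a minimal sketch, assuming the page HTML is already available as a string. The `LinkCollector` class, the `classify_links` helper and the `example.com` addresses are hypothetical names used only for illustration; only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html, base_url):
    """Group links into internal, external, mail and same-page anchors."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    groups = {"internal": [], "external": [], "mail": [], "anchor": []}
    for href in collector.hrefs:
        if href.startswith("mailto:"):
            groups["mail"].append(href)
        elif href.startswith("#"):
            groups["anchor"].append(href)
        else:
            # Resolve relative links against the page URL before comparing hosts.
            absolute = urljoin(base_url, href)
            same_host = urlparse(absolute).netloc == base_host
            groups["internal" if same_host else "external"].append(absolute)
    return groups


# Hypothetical page content for illustration.
page = """
<a href="/result/">Result</a>
<a href="https://other.example.org/">Partner</a>
<a href="mailto:admin@example.com">Mail admin</a>
<a href="#top">Back to top</a>
"""
links = classify_links(page, "https://example.com/home/")
```

Each collected URL could then be requested to detect broken links; that step is left out here because it needs network access.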


Test forms on all pages:

Forms are an integral part of any website. Forms are used to receive information from users and
to interact with them. The following should be checked in these forms:

• First, check all the validations on each field.
• Check the default values of the fields.
• Enter wrong inputs in the form fields.
• Test the options to create, delete, view or modify forms, if any.

For example, consider a search engine project that has advertiser and affiliate sign-up steps. Each
sign-up step is different, but it depends on the other steps, so the sign-up flow should execute
correctly. There are different field validations, such as email IDs and user financial information.
All these validations should be checked in manual or automated web testing.
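
As a rough illustration of automated field validation, the sketch below checks one mandatory field and one format-constrained field. The `validate_signup` function and the field names `username` and `email` are hypothetical and are not taken from this project; the e-mail pattern is deliberately simple.

```python
import re

# Deliberately simple pattern for illustration; not a full RFC 5322 check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form):
    """Return a dict mapping field name -> error message (empty if valid)."""
    errors = {}
    # Mandatory field: must be present and non-blank.
    if not form.get("username", "").strip():
        errors["username"] = "Username is required."
    # Format check on the e-mail field.
    if not EMAIL_PATTERN.match(form.get("email", "")):
        errors["email"] = "Enter a valid e-mail address."
    return errors


valid = validate_signup({"username": "asha", "email": "asha@example.com"})
invalid = validate_signup({"username": "  ", "email": "not-an-email"})
```

A test case then asserts that valid input produces no errors and that each wrong input produces an error message for the right field.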

Cookies Testing:

Cookies are small files stored on the user's machine. They are mainly used to maintain sessions,
chiefly login sessions. Test the application by enabling and disabling cookies in the browser
options.

Test whether the cookies are encrypted before being written to the user's machine. When testing
session cookies (i.e. cookies that expire after the session ends), check the login sessions and user
stats after the session ends. Check the effect on application security of deleting the cookies.
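
One part of cookie testing that is easy to script is checking the attributes a server sets on its cookies. The sketch below is a minimal example using the standard library's `http.cookies` module. The `inspect_cookie` helper, the cookie name `sessionid` and the header value are assumptions made for illustration. It checks the `Secure` and `HttpOnly` flags and whether the cookie is session-only; it does not check whether the cookie value is encrypted.

```python
from http.cookies import SimpleCookie


def inspect_cookie(set_cookie_header):
    """Report security-relevant attributes of each cookie in a Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    report = {}
    for name, morsel in jar.items():
        report[name] = {
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
            # A cookie with neither Expires nor Max-Age ends with the browser session.
            "session_only": not (morsel["expires"] or morsel["max-age"]),
        }
    return report


# Hypothetical Set-Cookie value for illustration.
report = inspect_cookie("sessionid=abc123; Path=/; HttpOnly; Secure")
```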

Validate your HTML/CSS:

If the site is being optimized for search engines, HTML/CSS validation is the most important
check. Mainly validate the site for HTML syntax errors. Check that the site can be crawled by
different search engines.


Database testing:

Data consistency is also very important in a web application. Check for data integrity and errors
when you edit, delete or modify forms or perform any DB-related functionality.

Check that all the database queries execute correctly and that data is retrieved and updated
correctly. Database load is also part of database testing; it is addressed under performance
testing below.

In testing the functionality of the websites the following should be tested:

Links
i. Internal Links
ii. External Links
iii. Mail Links
iv. Broken Links

Forms
i. Field validation
ii. Error message for wrong input
iii. Optional and Mandatory fields

Database
Testing will be done on the database integrity.

2) Usability Testing

Usability testing is the process by which the human-computer interaction characteristics of a system
are measured, and weaknesses are identified for correction.

• Ease of learning
• Navigation
• Subjective user satisfaction
• General appearance

Test for navigation:

Navigation means how a user surfs the web pages: how the user uses the different controls, such
as buttons and boxes, and the links on the pages to move between pages.

Usability testing includes the following:

• The website should be easy to use.
• Instructions provided should be very clear.
• Check that the instructions provided serve their purpose.
• The main menu should be provided on each page.
• The website should be consistent across pages.

Content checking:

Content should be logical and easy to understand. Check for spelling errors. Dark colors annoy
users and should not be used in the site theme.

Follow the standard colors that are commonly accepted for web page and content building, along
with the accepted standards for fonts, frames, etc.

Content should be meaningful. All the anchor text links should work properly. Images should
be placed properly with proper sizes.

These are some of the basic standards that should be followed in web development. The tester's
task is to validate all of them in UI testing.

Other user information for user help:

Help features such as a search option, a sitemap and help files assist the user. The sitemap should
list all the links in the website with a proper tree view of navigation. Check all links on the
sitemap.

A “Search on the site” option helps users find the content pages they are looking for easily and
quickly. These are all optional items; if present, they should be validated.

3) Interface Testing

In web testing, the server side interface should be tested. This is done by verifying that
communication is done properly. Compatibility of the server with software, hardware, network, and
the database should be tested.

The main interfaces are:

• Web server and application server interface
• Application server and database server interface

Check that all the interactions between these servers are executed and that errors are handled
properly. If the database or web server returns an error message for any query by the application
server, the application server should catch these error messages and display them appropriately to
the users.

Check what happens if the user interrupts a transaction midway, and what happens if the
connection to the web server is reset midway.
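
The error-handling requirement above can be sketched with an in-memory SQLite database standing in for the database server. This is only an illustration under stated assumptions: the `fetch_review_count` function and the `reviews` table are hypothetical, and the project's own view reads a TSV file rather than a database.

```python
import sqlite3


def fetch_review_count(connection):
    """Return (count, error_message); exactly one of the two is None."""
    try:
        row = connection.execute("SELECT COUNT(*) FROM reviews").fetchone()
        return row[0], None
    except sqlite3.Error as exc:
        # The application layer catches the database error and converts it
        # into a message suitable for display to the user.
        return None, "Could not load reviews: {}".format(exc)


conn = sqlite3.connect(":memory:")

# The table does not exist yet, so the query fails and the error is reported.
count_before, error_before = fetch_review_count(conn)

# After the table is created and filled, the same call succeeds.
conn.execute("CREATE TABLE reviews (text TEXT, liked INTEGER)")
conn.execute("INSERT INTO reviews VALUES ('Wow... Loved this place.', 1)")
count_after, error_after = fetch_review_count(conn)
```

An interface test would assert that the failing call yields a readable message instead of an unhandled exception, and that the successful call yields data with no error.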

4) Compatibility Testing

Compatibility of the website is a very important testing aspect. The compatibility tests to be
executed are:

• Browser compatibility
• Operating system compatibility
• Mobile browsing
• Printing options


Browser compatibility:

Browser compatibility is often the most influential part of website testing. Some applications are
very dependent on browsers. Different browsers have different configurations and settings that
the web page should be compatible with.

The website code should be cross-browser compatible. If JavaScript or AJAX calls are used for
UI functionality, security checks or validations, give more emphasis to browser compatibility
testing of the web application.

Test the web application on different browsers such as Internet Explorer, Firefox, Netscape
Navigator, AOL, Safari and Opera, with different versions.

OS compatibility:

Some functionality in the web application may not be compatible with all operating systems.
New technologies used in web development, such as graphic designs and interface calls to
different APIs, may not be available on all operating systems.

Hence, test the web application on different operating systems such as Windows, Unix, Mac,
Linux and Solaris, with different OS flavors.

Mobile browsing:

Mobile browsing is expected to keep growing. Test the web pages on mobile browsers, as
compatibility issues may exist on mobile devices as well.

Printing options:

If you are giving page-printing options then make sure fonts, page alignment, page graphics etc., are
getting printed properly. Pages should fit the paper size or as per the size mentioned in the printing
option.


5) Performance testing

The web application should sustain heavy load. Web performance testing should include:

• Web Load Testing
• Web Stress Testing

Test application performance at different internet connection speeds.

Web load testing: Test what happens when many users access or request the same page. Can the
system sustain peak load times? The site should handle many simultaneous user requests, large
input data from users, simultaneous connections to the DB, heavy load on specific pages, etc.

Web stress testing: Stress generally means stretching the system beyond its specified limits. Web
stress testing is performed to break the site by applying stress, and it checks how the system
reacts to stress and how it recovers from crashes. Stress is generally applied to input fields, login
and sign-up areas.
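
The idea behind a load test can be sketched as issuing many requests concurrently and summarising the response times. The example below is a minimal sketch, not a real load-testing tool: `handle_request` is a hypothetical stand-in that simulates server work with a short sleep instead of contacting a web server, and the user and worker counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(user_id):
    """Hypothetical stand-in for fetching one page; returns elapsed seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated server work
    return time.perf_counter() - start


def run_load_test(num_users, workers):
    """Issue num_users simulated requests concurrently and summarise latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(handle_request, range(num_users)))
    return {
        "requests": len(latencies),
        "max_latency": max(latencies),
        "mean_latency": sum(latencies) / len(latencies),
    }


summary = run_load_test(num_users=50, workers=10)
```

In a real test, `handle_request` would send an HTTP request to the page under test, and the number of users would be raised step by step towards and beyond the expected peak load.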

In web performance testing, website functionality is checked on different operating systems and
different hardware platforms for software and hardware memory leakage errors.

Performance testing can be applied to understand the website's scalability or to benchmark the
performance in the environment of third-party products, such as servers and middleware, for
potential purchase.

Connection Speed
Tested on various networks like Dial-Up, ISDN etc.

Load
i. What is the number of users per unit time?
ii. Check for peak loads and how the system behaves
iii. A large amount of data accessed by the user

Stress
i. Continuous Load
ii. Performance of memory, CPU, file handling etc.

6) Security Testing

The following are some of the test cases for web security testing:

• Test by pasting an internal URL directly into the browser address bar without logging in.
Internal pages should not open.
• If you are logged in with a username and password and browsing internal pages, try
changing URL options directly. For example, if you are checking publisher site statistics
with publisher site ID = 123, try changing the site ID parameter in the URL to a different
site ID that is not related to the logged-in user. Access to other users' statistics should be
denied.
• Try some invalid inputs in input fields such as login username, password and input text
boxes. Check the system's reaction to all invalid inputs.
• Web directories or files should not be accessible directly unless a download option is
provided for them.
• Test that the CAPTCHA prevents automated script logins.
• Test whether SSL is used for security. If it is used, the proper message should be
displayed when the user switches from non-secure HTTP:// pages to secure HTTPS:// pages
and vice versa.
• All transactions, error messages and security breach attempts should be logged in log files
somewhere on the web server.
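
The first two test cases above amount to an access-control check. The sketch below illustrates them under stated assumptions: the `OWNED_SITES` table, the user names and the `can_view_stats` function are hypothetical and only model the rule that a logged-in user may view statistics for their own site IDs.

```python
# Hypothetical ownership table: which site IDs each logged-in user may view.
OWNED_SITES = {"alice": {123}, "bob": {456}}


def can_view_stats(username, site_id):
    """Allow access only to a logged-in user who owns the requested site ID."""
    if username is None:  # not logged in
        return False
    return site_id in OWNED_SITES.get(username, set())


# Test cases mirroring the checklist above.
results = {
    "anonymous_internal_page": can_view_stats(None, 123),
    "owner_views_own_stats": can_view_stats("alice", 123),
    "owner_changes_site_id": can_view_stats("alice", 456),
}
```

Only the second case should be allowed; the anonymous request and the request with a changed site ID should both be denied.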

The primary reason for testing the security of a web application is to identify potential
vulnerabilities and subsequently repair them. Common techniques include:

• Network Scanning
• Vulnerability Scanning
• Password Cracking


7. SCREENSHOTS


8. CONCLUSION AND FUTURE WORK

Humans are the "gold standard" of sentiment analysis, yet the raters in a group always disagree
to some extent on sentiment. Humans generally agree only about 80% of the time. Automatic
sentiment analysis can strive towards this level but cannot exceed it.

People and automatic systems both have a place in the process. Automated systems can go
through huge quantities of data, while humans can do a higher-quality job on a smaller
sample. Saying "people are no good because they are not scalable" is just as misguided as saying
"automatic systems are no good because they are not as accurate".

Each should be used for its strengths, as the particular situation requires. Future work has a lot
to do with social forums and platforms where people express opinions freely. At present, tweets
are one such open medium. If Facebook at some point chooses to make timeline updates and
status messages open to search (possibly through a minor update to its privacy policy), they
would be a gold mine of real-time sentiment.

Present sentiments hold a key to future events. Put more technically, sentiments represent the
"present value of future events". This value can have deep social, political and monetary
significance. An expression of opinion about a public figure, opinions expressed through tweets
before elections, or the buzz before a movie release can all be strong cues for things to come.

Therefore, when people comment on present news stories, sentiment analysis can offer a key to
predicting future outcomes, or at least to anticipating them better.


REFERENCES.

Pragati Engineering College Page 41


Predicting The Reviews Of The Restaurant Using Natural Language Processing Technique

9. REFERENCES

1] Ariyasriwatana, W., Buente, W., Oshiro, M., & Streveler, D. (2014). Categorizing
health-related cues to action: using Yelp reviews of restaurants in Hawaii. New
Review of Hypermedia and Multimedia, 20(4), 317-340.

[2] Byers, J. W., Mitzenmacher, M., & Zervas, G. (2012, June). The groupon effect on
yelp ratings: a root cause analysis. In Proceedings of the 13th ACM conference on
electronic commerce (pp. 248-265). ACM.

[3] Hicks, A., Comp, S., Horovitz, J., Hovarter, M., Miki, M., & Bevan, J. L. (2012).
Why people use Yelp. com: An exploration of uses and gratifications. Computers in
Human Behavior, 28(6), 2274-2279.

[4] Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. S. (2013, July). What yelp
fake review filter might be doing?. In ICWSM. 6

[5] dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for
Sentiment Analysis of Short Texts. In COLING (pp. 69-78).

[6] Mullen, T., & Collier, N. (2004, July). Sentiment Analysis using Support Vector
Machines with Diverse Information Sources. In EMNLP (Vol. 4, pp. 412-418).

[7] Kiritchenko, S., Zhu, X., Cherry, C., & Mohammad, S. (2014, August). NRC-
Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the
8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 437-442).
Dublin, Ireland: Association for Computational Linguistics and Dublin City University.

[8] Huang, J., Rogers, S., & Joo, E. (2014). Improving restaurants by extracting

Pragati Engineering College Page 42


Predicting The Reviews Of The Restaurant Using Natural Language Processing Technique

subtopics from yelp reviews. iConference 2014 (Social Media Expo).

[9] Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal
estimated sub-gradient solver for svm. Mathematical programming, 127(1), 3-30.

[10] Saif, Hassan, et al. "On stopwords, filtering and data sparsity for sentiment
analysis of Twitter." (2014): 810-817.

APPENDIX


10. SOURCE CODE

VIEWS.PY:

import re

import nltk
import pandas as pd
from django.shortcuts import render
from django.views.generic import TemplateView
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def result(request):
    # Importing the dataset (tab-separated; quoting=3 is csv.QUOTE_NONE)
    dataset = pd.read_csv('static/Restaurant_Reviews.tsv', delimiter='\t', quoting=3)

    # Cleaning the texts: keep letters only, lowercase, drop stopwords, stem
    nltk.download('stopwords')
    stop_words = set(stopwords.words('english'))  # built once, not per word
    ps = PorterStemmer()
    corpus = []
    for text in dataset['Review']:
        review = re.sub('[^a-zA-Z]', ' ', text).lower().split()
        review = [ps.stem(word) for word in review if word not in stop_words]
        corpus.append(' '.join(review))

    # Creating the Bag of Words model
    cv = CountVectorizer(max_features=1500)
    X = cv.fit_transform(corpus).toarray()
    y = dataset.iloc[:, 1].values

    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

    # Fitting Naive Bayes to the Training set
    classifier = GaussianNB()
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Accuracy (fraction of correct predictions) and confusion matrix for the template
    d = {'i': accuracy_score(y_test, y_pred), 'j': confusion_matrix(y_test, y_pred)}
    return render(request, 'restaurant.html', context=d)


class Home(TemplateView):
    template_name = 'home.html'
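
The view above reports an accuracy and a confusion matrix. As a rough illustration of how the two relate, the sketch below computes both in plain Python for a small set of labels. The helper names `confusion_counts` and `accuracy_from_matrix` are hypothetical, and the label values are made up for the example; they are not results from this project. The matrix layout (rows are actual classes, columns are predicted classes) follows the convention used by scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only (1 = liked, 0 = not liked).
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

cm = confusion_counts(y_true, y_pred)   # [[3, 1], [1, 3]]
acc = accuracy_from_matrix(cm)          # 6 correct out of 8 = 0.75
```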

restaurant.html:

<!DOCTYPE html>
{% extends 'base.html' %}
{% load staticfiles %}

{% block body_block %}
<title>Restaurant Reviews</title>
<style>

h1{
color: green;
text-align: center;


}
h2{
color: blue;
}
p{
font-family: Arial;
}
</style>
<div class="container">
<div class="jumbotron">
<h1><em>This is Result page!!!</em></h1><br><br>
<h2>Accuracy :{{ i }}</h2><br>
<h2>Confusion Matrix :{{ j }}</h2><br><br>
<p><img src="{% static 'images/restaurant.jpg' %}" align="center"></p>
</div>
</div>
{% endblock %}

home.html:

{% extends 'base.html' %}
{% load staticfiles %}
{% block body_block %}
<title>Restaurant Reviews</title>
<style>
h1{
color: green;
text-align: center;
}


h2{
color: blue;
}
p{
font-family: Arial;
}
</style>

<div class="container">
<div class="jumbotron">
<h1>This is about the restaurant reviews!!</h1><br><br>
<img src="{% static 'images/restaurant1.jpg' %}" align="center"><br><br>

<p>
The purpose of this analysis is to build a prediction model to predict whether a review of the
restaurant is positive or negative.
To do so, we will work on the Restaurant Review dataset and load it into the predictive algorithms
Multinomial Naive Bayes,
Bernoulli Naive Bayes and Logistic Regression. In the end, we hope to find a "best" model for
predicting the review's sentiment.
</p>
<p>
However, since a considerable amount of information exists as text fragments without
any kind of numerical scale, it is hard to classify their evaluation efficiently without reading
the full text. Here we
focus on extracting scored ratings from text fragments on the web and suggest various
experiments in order to improve the
quality of a classifier.
</p>
</div>


</div>
{% endblock %}

base.html:

<!DOCTYPE html>
{% load staticfiles %}
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO"
crossorigin="anonymous">
</head>
<body>
<div class="container">

<nav class="navbar navbar-expand-lg navbar-light bg-light">


<a class="navbar-brand" href="{% url 'app1:home' %}">Home</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>

<div class="collapse navbar-collapse" id="navbarSupportedContent">


<ul class="navbar-nav mr-auto">
<li class="nav-item active">

<a class="nav-link" href="{% url 'app1:result' %}">Restaurant Reviews<span class="sr-only">(current)</span></a>
</li>
</ul>
</div>
</nav>
</div>
{% block body_block %}
{% endblock %}
</body>
</html>

Restaurant_Reviews.tsv:

/* The .tsv file consists of a thousand tab-separated lines; some of them are shown below */

Review	Liked
Wow... Loved this place.	1
Crust is not good.	0
Not tasty and the texture was just nasty.	0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.	1
The selection on the menu was great and so were the prices.	1
Now I am getting angry and I want my damn pho.	0
Honeslty it didn't taste THAT fresh.)	0
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.	0
The fries were great too.	1
A great touch.	1
