Keywords: Sentiment Analysis, Naive Bayes, Support Vector Machine
In the era of the web, a huge amount of information flows over the network. Since web content covers subjective opinion as well as objective information, it is now common for people to gather information about products and services that they want to buy. However, since a considerable amount of this information exists as text fragments without any kind of numerical scale, it is hard to classify the evaluations efficiently without reading the full text. Here we focus on extracting scored ratings from text fragments on the web and suggest various experiments to improve the quality of a classifier. Methodologies such as sentiment analysis as a text-classification problem and sentiment analysis as feature classification with mathematical treatment are explored. Of late, word-of-mouth opinions expressed online have become more valuable, as people decide whether to visit a restaurant based on its reviews.
Contents
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
4. SYSTEM SPECIFICATIONS
5. SYSTEM ARCHITECTURE
6. SYSTEM DESIGN
7. SYSTEM IMPLEMENTATION
8. SAMPLE CODE
9. SOFTWARE ENVIRONMENT
9.1 Python
10. SYSTEM TESTING
11. SYSTEM STUDY
11.3 Operational Feasibility
12. CONCLUSION
13. REFERENCES
LIST OF FIGURES
1. INTRODUCTION
Businesses often want to know how customers think about the quality of their services in order to improve
and make more profits. Restaurant goers may want to learn from others’ experience using a variety of criteria
such as food quality, service, ambience, discounts and worthiness. Users may post their reviews and ratings
on businesses and services or simply express their thoughts on other reviews. Bad (negative) reviews from
one's perspective may affect potential customers' decisions, e.g., a potential customer
may cancel a service and persuade others to do the same. The question is how to quantify the way customers and
businesses are influenced and how business ratings change in response to recent feedback.
In this project we use Naïve Bayes algorithm. Naive Bayes is a simple technique for constructing classifiers:
models that assign class labels to problem instances, represented as vectors of feature values, where the class
labels are drawn from some finite set. There is not a single algorithm for training such classifiers, but a
family of algorithms based on a common principle: all Naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other feature, given the class variable. For example, a
fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter. A Naive Bayes
classifier considers each of these features to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the color, roundness, and diameter features.
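As a small illustration of this independence assumption (the priors and per-feature likelihoods below are invented for the fruit example, not estimated from any dataset), the class score is simply the prior multiplied by each feature's conditional probability:

```python
# Invented priors and per-class feature likelihoods for the fruit example.
priors = {'apple': 0.5, 'other': 0.5}
likelihoods = {
    'apple': {'red': 0.8, 'round': 0.9, 'about_10cm': 0.7},
    'other': {'red': 0.3, 'round': 0.5, 'about_10cm': 0.2},
}

def posterior_scores(features):
    scores = {}
    for cls in priors:
        score = priors[cls]
        for f in features:
            # independence assumption: per-feature probabilities multiply
            score *= likelihoods[cls][f]
        scores[cls] = score
    total = sum(scores.values())          # normalise so scores sum to 1
    return {cls: s / total for cls, s in scores.items()}

print(posterior_scores(['red', 'round', 'about_10cm']))
```

With these made-up numbers, a red, round, roughly 10 cm fruit comes out overwhelmingly "apple", even though each feature was scored in isolation.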
In this project we used the Natural Language Processing Technique(NLP) for pre-processing the text. NLP
is an area of computer science and artificial intelligence concerned with the interactions between computers
and human (natural) languages, in particular how to program computers to process and analyze large
amounts of natural language data. Closely related to machine learning, it is used here to analyze the review text and support the predictive analysis.
Scikit-learn is a free machine learning library for the Python programming language. Scikit-learn is
largely written in Python, with some core algorithms written in Cython for performance. Cython is a
superset of the Python programming language, designed to give C-like performance to code that is written
mostly in Python.
Here we focus on the task of sentiment categorization, which takes a segment of unlabeled text and attempts
to classify it according to its overall sentiment. In this project, we apply natural language processing
techniques to classify a set of restaurant reviews based on the number of stars that each review received.
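A minimal sketch of such a sentiment-categorization pipeline with scikit-learn, using four invented toy reviews rather than our actual dataset, looks like this:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# four invented toy reviews; 1 = positive, 0 = negative
reviews = ["the food was great", "loved the service",
           "the food was awful", "terrible service and rude staff"]
labels = [1, 1, 0, 0]

cv = CountVectorizer(lowercase=True)    # bag-of-words counts
X = cv.fit_transform(reviews)
clf = MultinomialNB().fit(X, labels)    # train the Naive Bayes classifier

print(clf.predict(cv.transform(["great service"])))   # -> [1] (positive)
```

The same two steps (vectorize, then fit a classifier) are what the project applies to the full review dataset.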
2. LITERATURE SURVEY
This section reviews literature on machine learning. In machine learning, naive Bayes classifiers are a family
of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence
assumptions between the features.
Naive Bayes has been studied extensively since the 1960s. It was introduced (though not under that name)
into the text retrieval community in the early 1960s, and remains a popular (baseline) method for text
categorization, the problem of judging documents as belonging to one category or the other (such as spam or
legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate pre-processing, it
is competitive in this domain with more advanced methods including support vector machines.
Bo Pang et al. used machine learning techniques to investigate the effectiveness of classifying
documents by overall sentiment. Their experiments demonstrated that machine learning techniques
outperform human-produced baselines for sentiment analysis on movie-review data. The experimental setup
consists of a movie-review corpus with 700 randomly selected positive and 700 negative reviews.
Features based on unigrams and bigrams are used for classification, and the learning methods employed
were Naive Bayes, maximum entropy classification and support vector machines. The inference drawn by
Pang et al. is that machine learning techniques beat human baselines for sentiment classification,
although the accuracy achieved in sentiment classification is much lower than in topic-based
categorization.
Zhu et al. proposed aspect-based opinion polling from free-form textual customer reviews. The aspect-related
terms used for aspect identification were learnt using a multi-aspect bootstrapping method. A proposed
aspect-based segmentation model segments each multi-aspect sentence into single-aspect units, which were used
for opinion polling. Using an opinion-polling algorithm tested on real Chinese restaurant reviews, they
achieved 75.5 percent accuracy in aspect-based opinion-polling tasks. The method is easy to implement and
applicable to other domains such as product or movie reviews.
Jeonghee Yi et al. proposed a Sentiment Analyzer to extract opinions about a subject from online text
documents using natural language processing techniques. The Sentiment Analyzer finds all references to
the subject and determines the sentiment polarity of each reference. Their analysis utilized a sentiment
lexicon and a sentiment-pattern database for extraction and association purposes. Online product-review
articles for digital cameras and music were analyzed with the system, with good results.
3. SYSTEM ANALYSIS
Many researchers have previously run experiments to classify customer sentiment on different datasets.
Turney (2002) used a semantic-orientation algorithm to classify reviews based on the numbers of
positively oriented and negatively oriented phrases in each review. Pang et al. (2002) used machine learning
tools such as Maximum Entropy and Support Vector Machine (SVM) classifiers to classify movie reviews
using a number of simple textual features.
The classification of a review is predicted by the average semantic orientation of the phrases in the review
that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations
(e.g.,"subtle nuances") and a negative semantic orientation when it has bad associations (e.g.,"very
cavalier"). The semantic orientation of a phrase is calculated as the mutual information between the given
phrase and the word "excellent" minus the mutual information between the given phrase and the word
"poor". A review is classified as recommended if the average semantic orientation of its phrases is positive.
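Turney's score can be sketched as follows. The hit counts passed in are invented, and the function relies on the fact that the corpus-size and hits(phrase) terms cancel when the two PMI values are subtracted:

```python
import math

# SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").
# Expanding both PMI terms, hits(phrase) and the corpus size cancel,
# leaving the log-ratio below.
def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor):
    return math.log2((hits_near_excellent * hits_poor) /
                     (hits_near_poor * hits_excellent))

# A phrase co-occurring far more often with "excellent" than with "poor"
# receives a positive orientation (counts are invented):
print(semantic_orientation(120, 10, 5000, 5000))   # log2(12), about 3.58
```

A review is then scored by averaging this orientation over its extracted phrases.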
The Max Entropy classifier is a probabilistic classifier belonging to the class of exponential models.
Unlike the Naive Bayes classifier discussed above, Max Entropy does not assume
that the features are conditionally independent of each other. MaxEnt is based on the Principle of
Maximum Entropy: of all the models that fit the training data, it selects the one with the largest
entropy. The Max Entropy classifier can be used to solve a large variety of text-classification problems such
as language detection, topic classification, sentiment analysis and more.
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both
classification and regression challenges, though it is mostly used for classification. In this
algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features)
with the value of each feature being the value of a particular coordinate. Then, we perform classification by
finding the hyper-plane that best separates the two classes (see Fig 3-1).
FIG 3-1: SUPPORT VECTOR MACHINE
This type of classification is only performed when the classifier has to work on binary data, which is not
the case with Restaurant Reviews.
However, from a practical point of view perhaps the most serious problem with SVMs is the high
algorithmic complexity and extensive memory requirements of the required quadratic programming
in large-scale tasks.
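The hyper-plane idea can be sketched with scikit-learn's SVC on a handful of invented two-dimensional points (each point is one data item, its coordinates the feature values):

```python
from sklearn.svm import SVC

# invented 2-D points: three per class, linearly separable
X = [[0, 0], [1, 1], [1, 0],     # class 0
     [4, 4], [5, 5], [4, 5]]     # class 1
y = [0, 0, 0, 1, 1, 1]

# fit a linear SVM, i.e. find a separating hyper-plane (here, a line)
clf = SVC(kernel='linear').fit(X, y)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))   # -> [0 1]
```

The quadratic-programming cost mentioned above is paid inside `fit`, which is why SVMs become expensive as the training set grows.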
If a categorical variable has a category in the test data set that was not observed in the training data set,
the model will assign it a 0 (zero) probability and will be unable to make a prediction. This is often known as
the "Zero Frequency" problem.
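The usual remedy is additive (Laplace) smoothing, exposed in scikit-learn as the `alpha` parameter of `MultinomialNB`; the tiny count table below is invented purely to force the zero-frequency situation:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X_train = np.array([[2, 0], [3, 0],    # class 0 never shows feature 1
                    [0, 2], [0, 3]])   # class 1 never shows feature 0
y_train = [0, 0, 1, 1]

# alpha > 0 adds a pseudo-count to every feature/class pair, so a
# feature unseen for a class no longer forces a zero probability.
clf = MultinomialNB(alpha=1.0).fit(X_train, y_train)
print(clf.predict_proba([[1, 1]]))   # both classes keep non-zero probability
```

With `alpha=0` the same query would collapse to zero likelihood for both classes; smoothing keeps the model able to predict.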
We experiment with different linguistically motivated models of sentiment expression, again using the
results to improve the performance of our classifier, and we examine the effects of part-of-speech tagging on
our ability to predict sentiment. We also experimented with different methods of preprocessing the data.
Because the reviews are unstructured in terms of user input, a review can look like anything from a paragraph
of well-formatted text to a jumble of seemingly unrelated words to a run-on sentence with no apparent regard
for grammar or punctuation. Our initial pass over the data simply tokenized the reviews on whitespace and
treated each token as a unigram, but we were able to improve performance by also removing punctuation
and converting all letters to lowercase.
In this way, we treat the occurrences of "good", "Good", and "good." as the same token, which gives better
predictive power on any test review containing any of these three forms. Before converting to unigrams,
stemming was also applied, meaning the various inflected forms (tenses, plurals) of a word were collapsed
and treated as a single word. After the matrix is built, infrequent words are removed by setting a
threshold in order to improve accuracy. Our matrix therefore includes the relevant unigrams and bigrams
that occur more than the threshold number of times.
The proposed system uses Naive Bayes, a classification technique based on Bayes' Theorem with an
assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a
fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these
features depend on each other or upon the existence of the other features, all of these properties
independently contribute to the probability that this fruit is an apple, which is why it is known as 'Naive'.
A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity,
Naive Bayes is known to outperform even highly sophisticated classification methods.
4. SYSTEM SPECIFICATIONS
Django
5. SYSTEM ARCHITECTURE
The system architecture is shown in Figure. It comprises two main modules, anoffline processing
module, where the user profiles are being generated and the feature extraction and rating happens, as
well as an online module, that generates real-time recommendations.The prototype uses user review data
from restaurant. The dataset contains user information, business information and user reviews. These
objects are stored on Sqlite3 database.A brief overview of the system is provided in what follows.
8
6. SYSTEM DESIGN
The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing and
documenting the artifacts of a software system, as well as for business modeling and other non-software
systems.
A use case diagram in UML is a type of behavioral diagram defined by
and created from a use-case analysis. At its simplest, a use case diagram is a representation of a user's
interaction with the system that shows the relationship between the user and the different use cases in which
the user is involved. A use case diagram can identify the different types of users of a system and the
different use cases, and will often be accompanied by other types of diagrams as well. The use cases are
represented by either circles or ellipses.
Use case:
In software and systems engineering, a use case is a list of actions or event steps, typically defining the
interactions between a role (known in the Unified Modeling Language as an actor) and a system, to achieve a
goal. The actor can be a human or another external system. In systems engineering, use cases are used at a
higher level than within software engineering, often representing missions or stakeholder goals. The detailed
requirements may then be captured in the Systems Modeling Language (SysML) or as contractual
statements.
Fig 6-1: Use case diagram for restaurant reviews
Fig 6-1 shows the use case diagram, in which the actor is the end user who can import the data. The use
cases for the end user are splitting the data, training the data, predicting, constructing the
confusion matrix and calculating the accuracy score. The end user is confined to this system boundary:
the actor can perform only these tasks and is not given permission beyond them.
6.1.2 Sequence Diagram: A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what order. A sequence
diagram shows object interactions arranged in time sequence. It depicts the objects and classes
involved in the scenario and the sequence of messages exchanged between the objects needed to
carry out the functionality of the scenario. Sequence diagrams are typically associated with use case
realizations in the Logical View of the system under development. Sequence diagrams are sometimes
called event diagrams or event scenarios.
Fig 6-2 shows the sequence diagram, in which the actor is the end user. There is a synchronous process
in which the end user can perform all the functions of importing, data cleaning, classifying and
splitting. Moreover, the actor also receives a response for the actions performed.
A communication diagram models the interactions between objects or parts in terms of sequenced messages.
Communication diagrams represent a combination of information taken from Class, Sequence, and Use Case
diagrams, describing both the static structure and dynamic behavior of a system.
However, communication diagrams use the free-form arrangement of objects and links as used in Object
diagrams. In order to maintain the ordering of messages in such a free-form diagram, messages are labeled
with a chronological number and placed near the link the message is sent over. Reading a communication
diagram involves starting at message 1.0, and following the messages from object to object.
Fig 6-3: Communication diagram for restaurant reviews
Fig 6-3 shows the communication diagram, in which the actor is the end user. There is a synchronous
process in which the end user can perform all the functions of importing, data cleaning, classifying and
splitting. Moreover, the actor also receives a response for the actions performed.
The nodes appear as boxes, and the artifacts allocated to each node appear as rectangles within the boxes.
Nodes may have subnodes, which appear as nested boxes. A single node in a deployment diagram may
conceptually represent multiple physical nodes, such as a cluster of database servers.
Fig 6-4: Deployment diagram for restaurant reviews (nodes include user, system, libraries, dataset, data cleaning, bag of words, features and labels, and accuracy)
Fig 6-4 shows the deployment diagram, in which the actor is the end user. There is a synchronous
process in which the end user can perform all the functions of importing, data cleaning, classifying and
splitting. Moreover, the actor also receives a response for the actions performed.
In Unified Modeling Language (UML), a component diagram depicts how components are wired together to
form larger components or software systems. Such diagrams are used to illustrate the structure of arbitrarily
complex systems. A component is something required to execute a stereotype function. Examples of stereotypes
in components include executables, documents, database tables, files, and library files. Components are wired
together using an assembly connector that connects the required interface of one component with the
provided interface of another component.
Fig 6-5: Component diagram for restaurant reviews (components: user interface, algorithm)
Fig 6-5 shows the component diagram, in which the actor is the end user. There is a synchronous process
in which the end user can perform all the functions of importing, data cleaning, classifying and
splitting. Moreover, the actor also receives a response for the actions performed.
6.1.6 Activity diagrams: Activity diagrams are graphical representations of workflows of stepwise
activities and actions, with support for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams are intended to model both computational and organizational processes (i.e., workflows),
as well as the data flows intersecting with the related activities. Although activity diagrams primarily show
the overall flow of control, they can also include elements showing the flow of data between activities
through one or more data stores.
Fig 6-6 shows the activity diagram, in which the actor is the end user. There is a synchronous process in
which the end user can perform all the functions of importing, data cleaning, classifying and
splitting. Moreover, the actor also receives a response for the actions performed.
7. SYSTEM IMPLEMENTATION
CODE:
VIEWS.PY:
import pandas as pd
from django.shortcuts import render
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn import metrics

def result(request):
    # Importing the dataset
    dataset = pd.read_csv('static/Restaurant_Reviews.tsv', delimiter='\t', quoting=3)

    # (The text-cleaning loop that builds `corpus`, and the classifier that
    # produces `y_pred`, are omitted at this point in the report.)
    cv = CountVectorizer()
    X = cv.fit_transform(corpus).toarray()
    y = dataset.iloc[:, 1].values

    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

    # d = {'i': accuracy, 'j': cm}
    d = {'i': metrics.accuracy_score(y_test, y_pred), 'j': metrics.confusion_matrix(y_test, y_pred)}
    return render(request, 'restaurant.html', context=d)
################################################################################
from django.views.generic import TemplateView

class Home(TemplateView):
    template_name = 'home.html'
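The report does not include the URL configuration, but the templates refer to the names 'app1:home' and 'app1:result', so a plausible app1/urls.py (an assumption, not code taken from the project) would be:

```python
# app1/urls.py -- hypothetical wiring for the views above; the exact
# routes are an assumption, since the report omits this file.
from django.urls import path
from . import views

app_name = 'app1'   # enables the {% url 'app1:...' %} names in the templates

urlpatterns = [
    path('', views.Home.as_view(), name='home'),
    path('result/', views.result, name='result'),
]
```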
restaurant.html:
{% extends 'base.html' %}
{% load staticfiles%}
{% block body_block%}
<title>Restaurant Reviews</title>
<style>
h1{
color: green;
text-align: center;
}
h2{
color: blue;
}
p{
font-family: Arial;
}
</style>
<div class="container">
<div class="jumbotron">
<h1><em>This is Result page!!!</em></h1><br><br>
<h2>Accuracy: {{ i }}</h2><br>
<h2>Confusion Matrix: {{ j }}</h2><br><br>
<p><img src="{% static 'images/restaurant.jpg' %}" align="center"></p>
</div>
</div>
{% endblock%}
home.html:
{% extends 'base.html' %}
{% load staticfiles%}
{% block body_block%}
<title>Restaurant Reviews</title>
<style>
h1{
color: green;
text-align: center;
}
h2{
color: blue;
}
p{
font-family: Arial;
}
</style>
<div class="container">
<div class="jumbotron">
<h1>This is about the restaurant reviews!!</h1><br><br>
<img src="{% static 'images/restaurant1.jpg' %}" align="center"><br><br>
<p>
The purpose of this analysis is to build a prediction model to predict whether a review on the restaurant is
positive or negative.
To do so, we will work on the Restaurant Review dataset, loading it into the predictive algorithms
Multinomial Naive Bayes, Bernoulli Naive Bayes and Logistic Regression. In the end, we hope to find the
"best" model for predicting the review's sentiment.
</p>
<p>
However, since a considerable amount of information exists as text fragments without
any kind of numerical scale, it is hard to classify the evaluations efficiently without reading the full text.
Here we will focus on extracting scored ratings from text fragments on the web and suggest various
experiments to improve the quality of a classifier.
</p>
</div>
</div>
{% endblock%}
base.html:
<!DOCTYPE html>
{% load staticfiles%}
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPM
O" crossorigin="anonymous">
</head>
<body>
<div class="container">
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<a class="navbar-brand" href="{% url 'app1:home' %}">Home</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-
target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-
label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item active">
<a class="nav-link" href="{% url 'app1:result'%}">Restaurant Reviews<span class="sr-
only">(current)</span></a>
</li>
</ul>
</div>
</nav>
</div>
{% block body_block%}
{% endblock%}
</body>
</html>
Restaurant_Reviews.tsv:
/* The .tsv file consists of a thousand lines, with fields separated by a tab; some of them are shown
below */
Review	Liked
Wow... Loved this place. 1
Crust is not good. 0
Not tasty and the texture was just nasty. 0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it. 1
The selection on the menu was great and so were the prices. 1
Now I am getting angry and I want my damn pho. 0
Honeslty it didn't taste THAT fresh.) 0
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer. 0
The fries were great too. 1
A great touch. 1
8. SAMPLE CODE
#!/usr/bin/env python
import os
import sys

if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'restaurant.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)
"""
Django settings for restaurant project.
"""
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
ALLOWED_HOSTS = []
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'app1',
]
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'restaurant.urls'
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'restaurant.wsgi.application'
# Database
# https://docs.djangoproject.com/en/2.1/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}
# Password validation
# https://docs.djangoproject.com/en/2.1/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/2.1/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'static')
9. SOFTWARE ENVIRONMENT
9.1 Python
Python is interpreted: Python is processed at runtime by the interpreter, so you do not need to compile
your program before executing it. This is similar to Perl and PHP.
Django is a Python-based free and open-source web framework, which follows the model-view-
template (MVT) architectural pattern. It is maintained by the Django Software Foundation (DSF), an
independent organization established as a non-profit.
Django's primary goal is to ease the creation of complex, database-driven websites. The framework
emphasizes reusability and "pluggability" of components, less code, low coupling, rapid
development, and the principle of don't repeat yourself. Python is used throughout, even for settings
files and data models. Django also provides an optional administrative create, read, update and
delete interface that is generated dynamically through introspection and configured via admin
models.
Django also includes:
• a form serialization and validation system that can translate between HTML forms and values suitable for storage in the database
• a template system that utilizes the concept of inheritance borrowed from object-oriented programming
• support for middleware classes that can intervene at various stages of request processing and carry out custom functions
10. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to uncover every conceivable
fault or weakness in a work product. It provides a way to check the functionality of components,
sub-assemblies, assemblies and/or the finished product. It is the process of exercising software with the
intent of ensuring that the software system meets its requirements and user expectations and does not fail
in an unacceptable manner. There are various types of test, and each test type addresses a specific testing
requirement.
The test plan is the key document that explains the overall strategy for testing an application in an
effective, efficient and optimized way. The following testing techniques are performed.
A web-based system needs to be checked completely from end-to-end before it goes live for end users.
By performing website testing, an organization can make sure that the web-based system is functioning
properly and can be accepted by real-time users.
1) Functionality Testing
2) Usability testing
3) Interface testing
4) Compatibility testing
5) Performance testing
6) Security testing
1) Functionality Testing
Test all the links in web pages, the database connection, the forms used for submitting or getting
information from the user, cookies, etc.
Test the outgoing links from all the pages to the specific domain under test.
Test all internal links.
Test links jumping on the same pages.
Test links used to send email to admin or other users from web pages.
Test to check if there are any orphan pages.
Finally, link checking includes checking for broken links in all the above-mentioned links.
Test forms on all pages:
Forms are an integral part of any website. Forms are used for receiving information from users and to
interact with them. So what should be checked in these forms?
The sign-up flow should execute correctly, and there are field validations such as email IDs and user
financial information. All these validations should be checked in manual or automated web testing.
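One such field validation can be automated with a simple check; the pattern below is a deliberately simplified email-format test for illustration, not a complete validator:

```python
import re

# crude email-format check: something@something.tld, no spaces or extra '@'
EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$')

def looks_like_email(value):
    return bool(EMAIL_RE.match(value))

assert looks_like_email('user@example.com')      # accepted
assert not looks_like_email('user@@example')     # rejected
```

An automated test suite would run checks like this against each form field with both valid and invalid inputs.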
Cookies Testing:
Cookies are small files stored on the user machine. These are basically used to maintain the session- mainly
the login sessions. Test the application by enabling or disabling the cookies in your browser options.
Test if the cookies are encrypted before writing to the user machine. If you are testing the session cookies
(i.e. cookies that expire after the session ends) check for login sessions and user stats after the session ends.
Check the effect on application security of deleting the cookies.
HTML/CSS validation: if you are optimizing your site for search engines, then HTML/CSS validation is
especially important. Mainly validate the site for HTML syntax errors, and check whether the site is
crawlable by different search engines.
Database testing:
Data consistency is also very important in a web application. Check for data integrity and errors while you
edit, delete, or modify forms or perform any other DB-related functionality.
Check whether all database queries execute correctly and data is retrieved and updated correctly. Database
load is addressed under web load and performance testing below.
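These update and integrity checks can be sketched against an in-memory SQLite database. The restaurant/review schema below is a hypothetical stand-in for the application's actual store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE review (id INTEGER PRIMARY KEY, "
             "restaurant_id INTEGER REFERENCES restaurant(id), "
             "text TEXT, rating INTEGER CHECK (rating BETWEEN 1 AND 5))")
conn.execute("INSERT INTO restaurant (id, name) VALUES (1, 'Cafe One')")
conn.execute("INSERT INTO review (restaurant_id, text, rating) "
             "VALUES (1, 'Great food', 5)")

# Update path: the edit must be reflected when the row is read back.
conn.execute("UPDATE review SET rating = 4 WHERE id = 1")
rating = conn.execute("SELECT rating FROM review WHERE id = 1").fetchone()[0]

# Integrity path: the CHECK constraint should reject an out-of-range rating.
try:
    conn.execute("INSERT INTO review (restaurant_id, text, rating) "
                 "VALUES (1, 'Bad', 9)")
    check_ok = False
except sqlite3.IntegrityError:
    check_ok = True
```

The same pattern, write through the application, read back through the database, applies to edit, delete, and form-driven operations.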
Links
i. Internal Links
ii. External Links
iii. Mail Links
iv. Broken Links
Forms
i. Field validation
ii. Error message for wrong input
iii. Optional and Mandatory fields
Database
Testing will be done on the database integrity.
2) Usability Testing
Usability testing is the process by which the human-computer interaction characteristics of a system are
measured, and weaknesses are identified for correction.
• Ease of learning
• Navigation
• Subjective user satisfaction
• General appearance
Navigation means how a user surfs the web pages and uses controls such as buttons, boxes, and links to
move between pages.
Content should be logical and easy to understand. Check for spelling errors. Dark colors annoy users and
should not be used in the site theme.
You can follow the standard colors that are commonly used for web page and content building, along with
the commonly accepted conventions about colors, fonts, frames, etc., mentioned above.
Content should be meaningful. All anchor-text links should work properly. Images should be placed
properly and sized appropriately.
These are some of the basic standards that should be followed in web development. Your task is to
validate them all during UI testing.
Other aids for user help:
These include a search option, a sitemap, help files, etc. The sitemap should be present, listing all the links
on the website with a proper tree view of navigation. Check all the links on the sitemap.
A "search the site" option helps users find the content pages they are looking for easily and quickly.
These are all optional items, but if present they should be validated.
3) Interface Testing
In web testing, the server side interface should be tested. This is done by verifying that communication is
done properly. Compatibility of the server with software, hardware, network, and the database should be
tested.
Check what happens if the user interrupts a transaction midway, and what happens if the connection to the
web server is reset in between.
4) Compatibility Testing
Compatibility of your website is a very important testing aspect. The following compatibility tests should be executed:
Browser compatibility
Operating system compatibility
Mobile browsing
Printing options
Browser compatibility:
In my web-testing career, I have experienced this as the most influential part of website testing.
Some applications are very dependent on browsers. Different browsers have different configurations and
settings that your web pages should be compatible with.
Your website code should be cross-browser compatible. If you are using JavaScript or AJAX calls for UI
functionality, or performing security checks or validations, then place extra emphasis on browser
compatibility testing of your web application.
Test web application on different browsers like Internet Explorer, Firefox, Netscape Navigator, AOL, Safari,
Opera browsers with different versions.
OS compatibility:
Some functionality in your web application may not be compatible with all operating systems. New
technologies used in web development, such as graphic designs and interface calls to different APIs, may
not be available on all operating systems. Hence, test your web application on different operating systems
such as Windows, Unix, Mac, Linux, and Solaris, with different OS flavors.
Mobile browsing:
We are in a new technology era, and mobile browsing will only grow. Test your web pages on mobile
browsers; compatibility issues may exist on mobile devices as well.
Printing options:
If you provide page-printing options, then make sure that fonts, page alignment, page graphics, etc., print
properly. Pages should fit the paper size or the size specified in the printing options.
5) Performance testing
The web application should sustain heavy load. Web performance testing should include:
Web load testing: Test what happens when many users access or request the same page. Can the system
sustain peak load times? The site should handle many simultaneous user requests, large input data from
users, simultaneous connections to the DB, heavy load on specific pages, etc.
Web stress testing: Generally, stress means stretching the system beyond its specified limits. Web stress
testing is performed to try to break the site by applying stress, then checking how the system reacts to the
stress and how it recovers from crashes. Stress is generally applied to input fields, login, and sign-up areas.
Web performance testing also checks website functionality on different operating systems and hardware
platforms for software and hardware memory-leak errors.
Performance testing can be applied to understand the web site’s scalability or to benchmark the performance
in the environment of third-party products such as servers and middleware for potential purchase.
Connection Speed
Tested on various networks like Dial-Up, ISDN, etc.
Load
Stress
i. Continuous load
ii. Performance of memory, CPU, file handling, etc.
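A minimal load-test sketch using a thread pool to fire simultaneous requests at one page. The handler below is a stand-in for a real HTTP GET, which an actual test would issue with a client library against the deployed site:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(page):
    """Stand-in for requesting a page; a real test would do an HTTP GET here."""
    time.sleep(0.01)  # simulated server work
    return 200        # simulated HTTP status code

def load_test(page, n_users):
    """Send n_users simultaneous requests for the same page; collect status codes."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        statuses = list(pool.map(handle_request, [page] * n_users))
    elapsed = time.time() - start
    return statuses, elapsed

# Hypothetical peak-load scenario: 20 users hitting the reviews page at once.
statuses, elapsed = load_test("/reviews", 20)
```

A real load test would then assert that every request succeeded (status 200) and that the total response time stays within the site's performance target; dedicated tools such as JMeter scale this same idea to thousands of virtual users.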
6) Security Testing
Following are some of the test cases for web security testing:
Test by pasting an internal URL directly into the browser address bar without logging in. Internal pages
should not open.
If you are logged in with a username and password and browsing internal pages, try changing URL
parameters directly. For example, if you are viewing statistics for a publisher site with site ID 123, try
changing the site ID parameter in the URL to a different site ID that is not related to the logged-in user.
Access should be denied; this user must not be able to view another publisher's stats.
Try some invalid inputs in input fields such as the login username, password, and other text boxes, and
check the system's reaction to each invalid input.
Web directories and files should not be directly accessible unless a download option is provided.
Test the CAPTCHA against automated script logins.
Test whether SSL is used as a security measure. If it is, a proper message should be displayed when the
user switches from non-secure http:// pages to secure https:// pages and vice versa.
All transactions, error messages, and security-breach attempts should be logged in log files somewhere on
the web server.
The primary reason for testing the security of a web is to identify potential vulnerabilities and subsequently
repair them.
Network Scanning
Vulnerability Scanning
Password Cracking
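The URL-tampering test case above reduces to a check on the authorization rule. The function below is a hypothetical stand-in for the application's access-control logic; a real test would issue the two HTTP requests and compare status codes:

```python
def can_view_stats(session_site_id, requested_site_id):
    """Access-control rule under test: a publisher may only view its own statistics."""
    return session_site_id == requested_site_id

def url_tampering_check():
    """Emulate changing the site ID parameter in the URL while logged in as site 123."""
    results = {}
    # Legitimate request: the site ID in the URL matches the logged-in publisher.
    results["own_stats"] = can_view_stats(123, 123)
    # Tampered request: URL parameter changed to another publisher's site ID.
    results["others_stats"] = can_view_stats(123, 456)
    return results

results = url_tampering_check()
```

The expected outcome is that the legitimate request is allowed and the tampered one is denied; in a live test, denial would mean an HTTP 403 or a redirect to the login page.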
11. SYSTEM STUDY
11.1 Technical Feasibility
Data Availability: There is a vast amount of publicly available restaurant review data that can be scraped or accessed
via APIs.
Tools and Technologies: NLP libraries (such as NLTK, spaCy, and Hugging Face), machine learning frameworks
(such as scikit-learn, TensorFlow, and PyTorch), and database systems (such as MySQL, MongoDB) are readily
available.
Expertise: The project requires expertise in NLP, machine learning, and software development, which are accessible
skills in the current technological landscape.
Cost of Development: The primary costs involve data acquisition, computational resources for model training, and
development time. Open-source tools and cloud services can help minimize costs.
Potential Benefits: The system can provide valuable insights to restaurant owners, improving customer satisfaction and
potentially increasing revenue.
Integration: The system can be integrated with existing restaurant management systems or review platforms.
Maintenance: Regular updates to the model with new data will be required to maintain accuracy.
12. CONCLUSION
Humans are the "gold standard" of sentiment analysis, yet there is always disagreement within a group of
raters on sentiment. Humans generally agree only about 80% of the time. Automatic sentiment analysis can
strive towards this level but obviously cannot exceed it.
People and automatic systems both have a place in the process. Automated systems can go through huge
quantities of data, while humans can do a higher-quality job on a smaller sample. Saying "people are no
good because they are not scalable" is probably just as silly as saying "automatic systems are no good
because they are not as accurate".
Focus on and use the strengths of each as needed for your particular situation. Sentiment analysis will have
a lot to do with social forums and platforms where people express free opinions. Presently, tweets are one
such open medium; if Facebook at some point chooses to make timeline updates and status messages open
to search (perhaps someday through a minuscule-sounding update to its "privacy policy"), it will be a gold
mine of real-time sentiment.
Present sentiments hold a key to future events. To put it a bit technically, the sentiments represent the
"present value of future events". This value can have deep social, political, and monetary significance.
Whether it is an expression of opinion about a public figure, opinions expressed through tweets before an
election, or the buzz before a movie release, all of these can be great cues for things to come. Therefore,
when people comment on present news stories, sentiment analysis can actually offer a key to predicting
future outcomes, or at least anticipating them better!
13. REFERENCES
[1] Ariyasriwatana, W., Buente, W., Oshiro, M., & Streveler, D. (2014). Categorizing health-related cues
to action: using Yelp reviews of restaurants in Hawaii. New Review of Hypermedia and Multimedia, 20(4),
317-340.
[2] Byers, J. W., Mitzenmacher, M., & Zervas, G. (2012, June). The groupon effect on yelp ratings: a
root cause analysis. In Proceedings of the 13th ACM conference on electronic commerce (pp. 248-265).
ACM.
[3] Hicks, A., Comp, S., Horovitz, J., Hovarter, M., Miki, M., & Bevan, J. L. (2012). Why people use
Yelp. com: An exploration of uses and gratifications. Computers in Human Behavior, 28(6), 2274-2279.
[4] Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. S. (2013, July). What Yelp fake review
filter might be doing? In ICWSM.
[5] dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment Analysis
of Short Texts. In COLING (pp. 69-78).
[6] Mullen, T., & Collier, N. (2004, July). Sentiment Analysis using Support Vector Machines with
Diverse Information Sources. In EMNLP (Vol. 4, pp. 412-418).
[7] Kiritchenko, S., Zhu, X., Cherry, C., & Mohammad, S. (2014, August). NRC-Canada-2014:
Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on
Semantic Evaluation (SemEval 2014) (pp. 437-442).Dublin, Ireland: Association for Computational
Linguistics and Dublin City University.
[8] Huang, J., Rogers, S., & Joo, E. (2014). Improving restaurants by extracting subtopics from yelp
reviews. iConference 2014 (Social Media Expo).
[9] Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal estimated
sub-gradient solver for SVM. Mathematical Programming, 127(1), 3-30.
[10] Saif, Hassan, et al. "On stopwords, filtering and data sparsity for sentiment analysis of
Twitter." (2014): 810-817.
14. SCREENSHOTS
Result page