Team14 Mini Report FINAL
AUTOMATED REVIEW
CLASSIFICATION USING ML
Submitted in partial fulfilment for the award of
BACHELOR OF TECHNOLOGY
in
Computer Science and Engineering
Under the guidance of
Mrs D. SRIVENI
Assistant Professor
Department of CSE
CERTIFICATE
This is to certify that the project entitled AUTOMATED REVIEW CLASSIFICATION USING ML
being submitted by
G. Prathyusha (18BD1A054L)
M. Sindhuja (18BD1A0554)
in partial fulfilment for the award of Bachelor of Technology in Computer Science and Engineering
affiliated to the Jawaharlal Nehru Technological University, Hyderabad during the year 2021-22.
External Examiner
Vision of KMIT
Producing quality graduates trained in the latest technologies and related tools, and striving to make India a
world leader in software and hardware products and services. To achieve academic excellence by imparting
in-depth knowledge to the students, facilitating research activities, and catering to the fast-growing and ever-
changing industrial demands and societal needs.
Mission of KMIT
To provide a learning environment that inculcates problem-solving skills, professional and ethical
responsibilities, and lifelong learning through multi-modal platforms, and prepares students to become
successful professionals.
To establish industry-institute interaction to make students ready for the industry.
To provide exposure to students on latest hardware and software tools.
To promote research based projects/activities in the emerging areas of technology convergence.
To encourage and enable students to not merely seek jobs from the industry but also to create new
enterprises.
To induce a spirit of nationalism which will enable the student to develop an understanding of India's
challenges and to encourage them to develop effective solutions.
To support the faculty to accelerate their learning curve to deliver excellent service to students.
Vision & Mission of CSE
Vision of the CSE
To be among the region's premier teaching and research Computer Science and Engineering departments
producing globally competent and socially responsible graduates in the most conducive academic
environment.
Mission of the CSE
To provide faculty with state-of-the-art facilities for continuous professional development and
research, both in foundational aspects and in areas of relevance to emerging computing trends.
To impart skills that transform students to develop technical solutions for societal needs and
inculcate entrepreneurial talents.
To inculcate an ability in students to pursue the advancement of knowledge in various specializations
of Computer Science and Engineering and make them industry-ready.
To engage in collaborative research with academia and industry and generate adequate resources for
research activities for seamless transfer of knowledge resulting in sponsored projects and
consultancy.
To cultivate responsibility through sharing of knowledge and innovative computing solutions that
benefit the society-at-large.
To collaborate with academia, industry and community to set high standards in academic excellence
and in fulfilling societal responsibilities.
PROGRAM OUTCOMES (POs)
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and
an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyse complex engineering
problems, reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design system
components or processes that meet the specified needs with appropriate consideration for public health
and safety, and cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering
and IT tools including prediction and modelling to complex engineering activities with an understanding of
the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to societal, health,
safety, legal and cultural issues and the consequent responsibilities relevant to professional engineering
practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse
teams and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one's own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
PSO1: An ability to analyse the common business functions to design and develop appropriate Information
Technology solutions for social upliftment.
PSO2: Shall have expertise on the evolving technologies like Mobile Apps, CRM, ERP, Big Data, etc.
PEO1: Graduates will have successful careers in computer related engineering fields or will be able to
successfully pursue advanced higher education degrees.
PEO2: Graduates will provide solutions to challenging problems in their profession by applying
computer engineering principles.
PEO3: Graduates will engage in life-long learning and professional development by rapidly adapting to
changing work environments.
PEO4: Graduates will communicate effectively, work collaboratively and exhibit high levels of
professionalism and ethical responsibility.
PROJECT OUTCOMES
P1: To pick current-day problems and simplify the process using modern tools and technologies.
P2: One comes to know which data pre-processing technique to select when several techniques
are available.
P3: One can learn how to choose a particular algorithm to meet the project requirements.
P4: Among all the scales used to judge a model's performance, one can learn which metric to use.
Mapping strength: LOW = 1, MEDIUM = 2, HIGH = 3

PROJECT OUTCOMES MAPPING PROGRAM OUTCOMES (PO1-PO12)
P1: 1 3 3 1 2 1 3 1
P2: 3 2 1 2 1 2 2
P3: 3 3 2 1 1 2 2
P4: 2 1 3 1 2

PROJECT OUTCOMES MAPPING PROGRAM SPECIFIC OUTCOMES (PSO1-PSO2)
P1: 3 1
P2: 2
P3: 1
P4: 1

PROJECT OUTCOMES MAPPING PROGRAM EDUCATIONAL OBJECTIVES (PEO1-PEO4)
P1: 3 3 1 2
P2: 3 3 1
P3: 1 3 2
P4: 2
Team Number: 14
Team details:
We render our thanks to Dr. Maheshwar Dutta, B.E., M.Tech., Ph.D., Principal, for his support.
We are grateful to Mr. Neil Gogte, Director, for facilitating all the amenities required.
We express our sincere gratitude to Mr. S. Nitin, Director, and Mrs. Anuradha, Dean.
We are also thankful to Dr. S. Padmaja, Head of the Department, for providing us
with both time and amenities to make this project a success within the given schedule.
We are also thankful to our guide, Mrs. D. Sriveni, for her valuable guidance and support.
We would like to thank the entire CSE Department faculty, who helped us directly
and indirectly in the completion of the project. We sincerely thank our friends and family
for their support.
ABSTRACT i
LIST OF FIGURES ii
CHAPTERS
1. INTRODUCTION 1-3
1.1. Purpose of the Project 2
1.2. Problem with the Existing System 2
1.3. Proposed System 2
1.4. Scope of the Project 3
1.5. Architecture Diagram 4
2. SOFTWARE REQUIREMENTS SPECIFICATION 6-9
2.1. What is SRS 6
2.2. Role of SRS 6
2.3. Requirements Specification Document 6
2.4. Functional Requirements 7
2.5. Non-Functional Requirements 7
2.6. Software Requirements 9
2.7. Hardware Requirements 9
3. LITERATURE SURVEY 11-15
3.1. Technologies Used 12
4. SYSTEM DESIGN 17-21
4.1. Introduction to UML 17
4.2. UML Diagrams 17
4.2.1. Use Case diagram 17
4.2.2. Sequence diagram 19
4.2.3. Class diagram 20
5. IMPLEMENTATION 23-26
5.1. Pseudo code 23
5.2. Code Snippets 24
6. TESTING 28-30
6.1. Introduction to Testing 28
6.2. Software Test Lifecycle 29
6.3. Test Cases 30
7. SCREENSHOTS 32-34
7.1. Mounting data and loading required libraries 32
7.2. load text from training and test data 32
7.3. Vectorization and fitting data 33
7.4. Plotting confusion matrix and testing 33
7.5. Testing results 34
FUTURE ENHANCEMENTS 36
CONCLUSION 38
REFERENCES 40
ABSTRACT
Social media has given ample opportunity to the consumer in terms of gauging the
quality of the products by reading and examining the reviews posted by the users of
online shopping platforms. Moreover, online platforms such as "Amazon.com" provide
an option to the users to label a review as 'Helpful' if they find the content of the review
valuable. This helps both consumers and manufacturers to evaluate general preferences
in an efficient manner by focusing mainly on the selected helpful reviews. However, the
recently posted reviews get comparatively fewer votes, and the higher-voted reviews get
onto the users' radars first. This study deals with these issues by building an automated
text classification system to predict the helpfulness of online reviews irrespective of the
time they are posted. The study is conducted on the data collected from Amazon.com
consisting of the reviews on fine food. The focus of previous research has mostly
remained on finding a correlation between the review helpfulness measure and review
content based features. In addition to finding significant content-based features, this
study uses three different approaches to predict the review helpfulness, which include
vectorized features, review and summary centric features, and word embedding-based
features. Moreover, the conventional classifiers used for text classification, such as
Support Vector Machine, Logistic Regression, and Multinomial Naive Bayes, are compared
with a decision-tree-based ensemble classifier, namely Extremely Randomized Trees. It is
found that the extremely randomized trees classifier outperforms the conventional
classifiers except in the case of vectorized features with unigrams and bigrams. Among
the features, vectorized features perform much better compared to other features. This
study also found that content-based features such as review polarity, review
subjectivity, review character and word count, review average word length, and
summary character count are significant predictors of the review helpfulness.
LIST OF FIGURES
1.4 Proposed System 3
1.5 Architecture Diagram 4
4.3.1 Use Case Diagram 18
4.3.2 Sequence Diagram 20
4.3.3 Class Diagram 21
REVIEW CLASSIFICATION USING ML
CHAPTER -1
1. INTRODUCTION
KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY
1.1 Purpose of the Project
The purpose of this project is to provide review classification for any product. Reviews
are important because they help boost customer loyalty towards a brand. A person who
takes the time to leave a positive review on Yelp about a particular brand or product is
likely to come back for more business should the need arise. There are a number of
products available in the market. Customers need to read the positive and negative
reviews for each product and choose the best one. The proposed system determines
whether the reviews are positive or negative and helps in calculating how good a
product is.
A sentiment model predicts whether the opinion given in a piece of text is positive,
negative, or neutral. Text classification is the process of categorizing text into
organized groups. By using NLP, text classification can automatically analyze text and
then assign a set of predefined tags or categories based on its context. NLP is used for
sentiment analysis, topic detection, and language detection. In a bag-of-words model, a
vector represents the frequency of words in a predefined dictionary of a word list.
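The bag-of-words representation described above can be illustrated with a minimal sketch using scikit-learn's CountVectorizer (the sentences here are invented examples, not data from this project):

```python
# Minimal bag-of-words sketch: each sentence becomes a vector of
# word counts over the vocabulary learned from the corpus.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["this product is great", "this product is terrible"]

vec = CountVectorizer()
X = vec.fit_transform(docs)      # sparse matrix of shape (2, vocab_size)

print(sorted(vec.vocabulary_))   # the bag-of-words dictionary
print(X.toarray())               # per-document word frequencies
```

Each row of the matrix is the frequency vector for one document over the shared dictionary, exactly as described in the paragraph above.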
Of all data, text is the most unstructured form, which means there is a lot of cleaning
to do. These pre-processing steps help reduce noise by converting high-dimensional features
to a low-dimensional space, so as to obtain information from the text that is as accurate as
possible. Data classification is the process of classifying or categorizing the raw texts into
1.4 Scope of the Project
The scope of text classification using sentiment analysis is mainly divided
into two categories. A bag-of-words model: in this case, all the sentences in our dataset
are tokenized to form a bag of words that denotes our vocabulary.
The data or review set with which the model will be trained is converted to vectors, and
a machine learning algorithm processes them. The algorithm is trained for maximum
accuracy. A predictive model is generated to classify the reviews or data on its own.
When we give vectorized documents of test data, this predictive model should classify
them as positive or negative.
CHAPTER -2
NFR 1)Performance Requirements:
● The application must have a minimum processor-speed requirement, which places some
restrictions on what type of computer can use it. However, this will be kept as small as
possible to enable a broad range of clients to use the application.
● From studies it can be seen that speed is a common issue when dealing with large
sets of data/reviews.
● The system must also aim to use minimum hard disk space yet keep the quality of the
available facility as high as possible.
● The system must automatically log out all customers after a period of inactivity.
● The system should not leave any cookies on the customer’s computer containing the
user’s password.
● Information of users such as IP addresses will be kept private so that third parties
cannot gain access to this personal information in order to keep within the Data
Protection Act.
NFR 4) Maintainability:
● We can save the information in our Google Drive and use the tools that are
available on Google Colab.
NFR 6) Portability:
● The end-user part is fully portable, and any system using any web browser should
be able to use the features of the system, including any hardware platform that is
available or will be available in the future.
● An end user can use this system on any OS.
● The system shall run on PCs and laptops.
NFR 7) Performance:
The performance of the developed application can be assessed using the following
methods: measuring enables you to identify how the performance of your application
stands in relation to your defined performance goals and helps you to identify the
bottlenecks that affect your application performance.
It helps you identify whether your application is moving toward or away from your
performance goals.
Defining what you will measure, that is, your metrics, and defining the objectives
for each metric is a critical part of your testing plan.
Performance objectives include the following:
Response time, latency, throughput, and resource utilization.
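Latency and throughput of this kind can be measured with the standard library alone; a minimal sketch, where predict is a stand-in for any model's prediction call (not this project's actual model):

```python
# Sketch: measuring average latency and throughput of a prediction call.
# `predict` is a placeholder standing in for a real fitted model.
import time

def predict(batch):
    return ["Positive" for _ in batch]   # stand-in for real inference

batch = ["sample review"] * 1000

start = time.perf_counter()
results = predict(batch)
elapsed = time.perf_counter() - start

latency_ms = 1000 * elapsed / len(batch)   # average time per review
throughput = len(batch) / elapsed          # reviews processed per second
print(f"{latency_ms:.4f} ms/review, {throughput:.0f} reviews/s")
```

Comparing these numbers against the defined performance goals is what identifies the bottlenecks mentioned above.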
CHAPTER -3
3. LITERATURE SURVEY
In this paper [3], proposed by Ari Aulia Hakim et al., the proposed system employs
term frequency and inverse document frequency to prioritize the keywords by
weighting them. The keywords are weighted using term frequency (TF) in order to
find the number of times the keyword has occurred in the document. Then inverse
document frequency (IDF) is used to find the number of documents in which the
keyword is found. This method is found to be very efficient, but there seems to be a
small defect: since the method considers all the keywords and weights them, the
keyword with the highest frequency might not relate to the corresponding category.
Sometimes this system may give a wrong output.
SVM is a very good classifier. It classifies documents based on the structural risk
minimization principle and the creation of hyperplanes. The main disadvantage of this
system is that the SVM uses all the keywords regardless of whether they are
important or not. Sometimes this might give a wrong output.
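A minimal linear-SVM text classifier along these lines can be sketched as follows (toy reviews and labels invented for illustration):

```python
# Sketch: TF-IDF features fed to a linear SVM via a scikit-learn pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["great product, loved it", "awful, broke in a day",
         "excellent quality", "terrible experience"]
labels = ["Positive", "Negative", "Positive", "Negative"]

# The vectorizer uses every keyword it sees, important or not,
# which is the limitation noted above.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["loved the quality"]))
```

The separating hyperplane is learned over all keyword dimensions, so uninformative words still influence the decision boundary.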
PYTHON
Python is the most widely used programming language for AI and ML. It leads in this
area because of its simple syntax and versatility, and also due to its open-source
availability. It includes many built-in libraries just for AI and ML.
Examples:
• Scikit-Learn
• Tensorflow
• Keras
Key Features:
• Python is easy to learn.
• There is no need to recompile the source code. Make a modification and the results can
be seen easily.
• It is independent of the operating system.
PANDAS
Pandas is an open-source library that provides high-performance data manipulation in
Python. A lot of processing is required for data analysis, such as cleaning, restructuring,
merging, etc. There are different tools for this purpose, but Pandas is mostly preferred
due to its simplicity and speed.
Key Features:
• It works on DataFrame objects, which are fast and efficient.
• Used for dataset reshaping and pivoting.
• Alignment and Integration of data is done in case of any missing data.
• Provides the functionality of Time Series.
• Datasets can be handled in different formats, such as matrix, tabular,
heterogeneous, and time-series data.
• It can be integrated with other libraries such as SciPy and scikit-learn.
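The cleaning and missing-data handling mentioned above can be sketched on a toy frame (invented data, not this project's dataset):

```python
# Sketch: pandas DataFrame cleaning with missing values.
import pandas as pd
import numpy as np

df = pd.DataFrame({"review": ["good", "bad", None],
                   "score": [5, 1, np.nan]})

clean = df.dropna()               # drop rows containing missing values
filled = df.fillna({"score": 0})  # or fill the missing score instead

print(len(clean))
print(filled["score"].tolist())
```

Dropping or filling are the two basic strategies; which one is appropriate depends on how much data can be sacrificed.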
NUMPY
NumPy stands for Numeric Python; it is a Python library used to work with single and
multi-dimensional arrays. It has a very powerful data structure. Multi-dimensional
arrays and matrices are therefore computed in an optimal way. It can handle
vast amounts of data in a convenient and efficient way.
Advantages:
• Used for array computation.
• Multidimensional arrays are implemented efficiently.
• It has the ability to perform scientific calculations.
• Used to reshape the data stored in multidimensional arrays.
• Contains built-in functions for linear algebra and random number generation.
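For instance, reshaping and vectorized arithmetic on a small multi-dimensional array:

```python
# Sketch: NumPy array creation, reshaping, and vectorized computation.
import numpy as np

a = np.arange(6)        # [0 1 2 3 4 5]
m = a.reshape(2, 3)     # reshape into a 2 x 3 matrix

print(m.sum(axis=0))    # column sums
print((m * 2).tolist()) # elementwise arithmetic, no explicit loop
```

The absence of explicit Python loops is what makes such operations efficient on large arrays.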
MATPLOTLIB
Matplotlib is a Python package used for data visualization. It can be used in Jupyter
notebooks and web applications. It has a procedural interface named Pylab, which is
designed to resemble MATLAB, a proprietary programming language developed by
MathWorks.
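A minimal plotting sketch (the Agg backend is assumed here so the figure renders to a file even outside a notebook):

```python
# Sketch: a simple Matplotlib line plot saved to a file.
import matplotlib
matplotlib.use("Agg")   # non-interactive backend, renders to file
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [v * v for v in x]

plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("plot.png")
plt.close()
```

In a Colab or Jupyter notebook, plt.show() would display the figure inline instead.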
SCIKIT LEARN
Sklearn is a robust Python library for machine learning. It provides a selection of
efficient tools for statistical modeling, including classification, regression, clustering,
and dimensionality reduction, via a consistent interface. This library is built upon
NumPy, SciPy, and Matplotlib. It focuses not only on loading, manipulating, and
summarizing the data, but also on modeling it.
TENSORFLOW
• Graph: TensorFlow uses a graph framework. All the series of computations done
during training are gathered and described by the graph.
KERAS
Keras is an open-source, high-level neural network library. It can be run on Theano,
TensorFlow, and CNTK. Francois Chollet, one of the Google engineers, developed this
library. It supports convolutional networks, recurrent networks, and their combination. It
is made user-friendly, extensible, and modular to facilitate faster experimentation with
deep neural networks. It delegates low-level computations, which it cannot handle
itself, to a backend library.
Features:
• It is multi-backend and supports multiple platforms, which helps developers come
together for coding.
• All concepts can be easily grasped.
• Supports fast prototyping.
• It runs on both CPU and GPU
• Models can be produced easily using keras.
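A tiny Keras model in this style (shapes and data here are illustrative stand-ins, not this project's features):

```python
# Sketch: a minimal Keras binary classifier over fixed-length vectors.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential([
    Input(shape=(100,)),             # fixed-length feature vector
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),  # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(8, 100)           # toy features
y = np.random.randint(0, 2, size=8)  # toy binary labels
model.fit(X, y, epochs=1, verbose=0)

preds = model.predict(X, verbose=0)
print(preds.shape)
```

The Sequential/compile/fit pattern is what makes fast prototyping possible, as noted in the features above.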
GOOGLE COLAB
• Google Colaboratory, or Colab for short, is a Google Research product which allows
developers to write and execute Python code through their browser. Google Colab is an
excellent tool for deep learning tasks.
CHAPTER -4
4. SYSTEM DESIGN
The Unified Modeling Language allows the software engineer to express an analysis
model using a modeling notation that is governed by a set of syntactic, semantic, and
pragmatic rules. A UML system is represented using five different views that describe
the system from distinctly different perspectives. Each view is defined by a set of
diagrams, as follows:
1. User Model View
This view represents the system from the users’ perspective. The analysis
representation describes a usage scenario from the end-users’ perspective.
2. Structural Model View
In this model, the data and functionality are viewed from inside the system. This
model view models the static structures.
3. Behavioural Model View
It represents the dynamic (behavioural) parts of the system, depicting the
interactions among the various structural elements described in the
user model and structural model views.
4. Implementation Model View
In this view, the structural and behavioural parts of the system are represented
as they are to be built.
5. Environmental Model View
In this view, the structural and behavioural aspects of the environment in which
the system is to be implemented are represented.
4.2.1 Use Case Diagram
To model a system, the most important aspect is to capture its dynamic behaviour.
To clarify in a bit more detail, dynamic behaviour means the behaviour of the system when it
is running/operating.
So static behaviour alone is not sufficient to model a system; dynamic behaviour is
more important than static behaviour. In UML there are five diagrams available to model
the dynamic nature, and the use case diagram is one of them. Since the
use case diagram is dynamic in nature, there should be some internal or external factors
for making the interaction.
These internal and external agents are known as actors. Use case diagrams thus
consist of actors, use cases, and their relationships. The diagram is used to model the
system/subsystem of an application. A single use case diagram captures a particular
functionality of a system, so to model the entire system a number of use case diagrams
are used.
Use case diagrams are used to gather the requirements of a system, including internal
and external influences. These requirements are mostly design requirements. So when a
system is analysed to gather its functionalities, use cases are prepared and actors are
identified. In brief, the purposes of use case diagrams are as follows:
a. Used to gather requirements of a system.
b. Used to get an outside view of a system.
c. Identify external and internal factors influencing the system.
d. Show the interactions among the requirements and actors.
Destroying Objects
Objects can be terminated early using an arrow labelled "<< destroy >>" that
points to an X; the object is then removed from memory. When that object's lifeline
ends, you can place an X at the end of its lifeline to denote a destruction
occurrence.
Loops
A repetition or loop within a sequence diagram is depicted as a rectangle. Place
the condition for exiting the loop at the bottom left corner in square brackets [].
Guards
When modelling object interactions, there will be times when a condition must be met
for a message to be sent to an object. Guards are conditions that need to be used
throughout UML diagrams to control flow.
Class diagrams are the main building blocks of every object-oriented method. The class
diagram can be used to show classes, relationships, interfaces, association, and
collaboration. Class diagrams are standardized in UML. Since classes are the building
blocks of an application based on OOP, the class diagram has an appropriate
structure to represent classes, inheritance, relationships, and everything else that OOP
has in its context. It describes various kinds of objects and the static relationships
between them.
CHAPTER -5
5. IMPLEMENTATION
Step 1: Mount the training and testing dataset into Colab in order to get usage accessibility.
Step 2: Load the required libraries.
Step 3: Create a function to load the text and labels from the train and test sets.
Step 4: Import the required libraries for text preprocessing.
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.python.keras import models, layers, optimizers
import tensorflow
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences
import bz2
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
import re

# Creating a function to load the text and labels from the train and test sets
def get_labels_and_texts(file):
    target = {0: 'Negative', 1: 'Positive'}
    with open(file, 'r') as f:
        lines = f.readlines()
    # Each line starts with "__label__1 " or "__label__2 "; index 9 holds the digit
    labels = [target[int(line[9]) - 1] for line in lines]
    reviews = [line[11:] for line in lines]
    df = pd.DataFrame(data={"label": labels, "reviews": reviews})
    return df

train = get_labels_and_texts('/content/drive/MyDrive/data/train.ft.txt').sample(140000, random_state=42)
test = get_labels_and_texts('/content/drive/MyDrive/data/test.ft.txt')

# Text pre-processing
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import plot_confusion_matrix, plot_precision_recall_curve

# Vectorization
vec = TfidfVectorizer(ngram_range=(1, 2), min_df=3, max_df=0.9, strip_accents='unicode',
                      use_idf=1, smooth_idf=1, sublinear_tf=1)
encoder = LabelEncoder()
X_train = vec.fit_transform(train['reviews'])
X_test = vec.transform(test['reviews'])
Y_train = encoder.fit_transform(train['label'])
Y_test = encoder.transform(test['label'])

log_model = LogisticRegression(C=4, dual=True, solver='liblinear', random_state=42)
log_model.fit(X_train, Y_train)
predicts = log_model.predict_proba(X_test)
plot_confusion_matrix(log_model, X_test, Y_test)
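The probabilities returned by predict_proba can then be scored with the metrics imported earlier; a self-contained sketch with toy values (the probs array stands in for the model's actual output):

```python
# Sketch: turning predicted class probabilities into labels and scoring them.
# `probs` stands in for the output of log_model.predict_proba(X_test).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
y_true = np.array([0, 1, 1, 0])

y_pred = probs.argmax(axis=1)   # pick the class with the higher probability

print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, probs[:, 1]))
```

Accuracy and F1 score work on the hard labels, while ROC AUC is computed directly from the positive-class probabilities.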
CHAPTER -6
6. TESTING
Testing is the process of evaluating a system or its component(s) with the intent to
find whether it satisfies the specified requirements or not. Testing means executing a
system in order to identify any gaps, errors, or missing requirements with respect to
the actual requirements.
It depends on the process and the associated stakeholders of the project(s). In the IT
industry, large companies have a team with responsibilities to evaluate the developed
software in context of the given requirements. Moreover, developers also conduct testing
which is called Unit Testing. In most cases, the following professionals are involved in
testing a system within their respective capacities:
● Software Tester
● Software Developer
● Project Lead/Manager
● End User
Levels of testing include different methodologies that can be used while conducting
software testing. The main levels of software testing are:
● Functional Testing
● Non-functional Testing
Functional Testing
This is a type of black-box testing that is based on the specifications of the software
that is to be tested. The application is tested by providing input, and then the results are
examined, which need to conform to the functionality it was intended for. Functional testing
of software is conducted on a complete, integrated system to evaluate the system.
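A black-box functional test in this style checks only inputs against expected outputs; for instance, pytest-style checks on a hypothetical predict_sentiment wrapper (both the function and the expected labels are illustrative, not this project's real interface):

```python
# Sketch: black-box functional tests for a sentiment classifier.
# `predict_sentiment` is a hypothetical stand-in for the trained model.
def predict_sentiment(text):
    # Toy implementation used only so the tests below can run.
    return "Negative" if "bad" in text.lower() else "Positive"

def test_positive_review():
    assert predict_sentiment("Great taste, will buy again") == "Positive"

def test_negative_review():
    assert predict_sentiment("Bad packaging and stale food") == "Negative"

test_positive_review()
test_negative_review()
print("all functional checks passed")
```

The tests never inspect the model's internals, only its observable behaviour, which is the defining property of functional testing.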
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
● Requirements Analysis
In this phase testers analyze the customer requirements and work with developers
during the design phase to see which requirements are testable and how they are
going to test those requirements.
It is very important to start testing activities from the requirements phase itself,
because the cost of fixing a defect is much lower if it is found in the requirements
phase rather than in later phases.
● Test Planning
In this phase all the planning about testing is done: what needs to be tested,
how the testing will be done, the test strategy to be followed, what the test
environment will be, what test methodologies will be followed, hardware and software
availability, resources, risks, etc. A high-level test plan document is created, which
includes all the planning inputs mentioned above, and is circulated to the
stakeholders.
● Test Analysis
After the test planning phase is over, the test analysis phase starts. In this phase we need
to dig deeper into the project and figure out what testing needs to be carried out in
each SDLC phase. Automation activities are also decided in this phase:
if automation needs to be done for the software product, how will the automation be
done, how much time will it take to automate, and which features need to be
automated. Non-functional testing areas (stress and performance testing) are also
analyzed and defined in this phase.
● Test Design
In this phase various black-box and white-box test design techniques are used to
design the test cases, and testers start writing test cases by following those
design techniques. If automation testing needs to be done, then automation scripts
also need to be written in this phase.
CHAPTER -7
7. SCREENSHOTS
7.2 Creating a function to load text from training and test data
Fig 7.2 – Creating a function to load text from training and test data
CHAPTER -8
FUTURE ENHANCEMENTS
Sentiment analysis is a uniquely powerful tool for businesses that are looking to measure
attitudes, feelings and emotions regarding their brand. To date, the majority of sentiment
analysis projects have been conducted almost exclusively by companies and brands
through the use of social media data, survey responses and other hubs of user-generated
content. By investigating and analyzing customer sentiments, these brands are able to get
an inside look at consumer behaviors and, ultimately, better serve their audiences with
the products, services and experiences they offer.
Algorithms have long been at the foundation of most forms of analytics, including social
media and sentiment analysis. With recent years bringing big leaps in machine learning
and artificial intelligence, many analytics solutions are looking to these technologies to
replace algorithms. Unfortunately for organizations looking to leverage sentiment
analysis to measure audience emotions, machine learning isn’t yet ready to tackle the
complex nuances of text and how we talk, especially on social media channels that are
rife with slang, sarcasm, double meanings and misspellings. These make it difficult for
artificial intelligence systems to accurately sort and classify sentiments on social media.
And, with any analysis project, accuracy is crucial. It is uncertain if machine learning
will progress to the point that it is capable of accurately analyzing text, or if sentiment
analysis projects will have to find a new basis to avoid the current plateau of algorithms.
CHAPTER -9
CONCLUSION
CHAPTER -10
REFERENCES
Kaburlasos V., Fragkou P., "A Comparison of Word- and