Sentimental Analysis Project Documentation
ON
SENTIMENTAL ANALYSIS USING AI-DEEP LEARNING
A project report submitted in partial fulfillment of the requirements for the award of
the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
By
A. Supraja 172P1A0505
D. Indira 172P1A0517
H. Neelesh 172P1A0534
G. Yashaswini 172P1A0523
C. Vyshnavi 172P1A0513
CERTIFICATE
This is to certify that the project work entitled “SENTIMENTAL ANALYSIS USING AI-DEEP LEARNING” is a bonafide work of A. SUPRAJA (172P1A0505), D. INDIRA (172P1A0517), H. NEELESH (172P1A0534), G. YASHASWINI (172P1A0523) and C. VYSHNAVI (172P1A0513), submitted to Chaitanya Bharathi Institute of Technology, Proddatur in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING. The work reported herein does not form part of any other thesis on which a degree has been awarded earlier.
This is to further certify that they have worked for a period of one semester to prepare this work under our supervision and guidance.
PROJECT CO-ORDINATOR
N. SRINIVASAN, M.Tech., (Ph.D.)
A. SUPRAJA 172P1A0505
D. INDIRA 172P1A0517
H. NEELESH 172P1A0534
G. YASHASWINI 172P1A0523
C. VYSHNAVI 172P1A0513
Vidyanagar, Proddatur, Y.S.R. (Dist.)
ACKNOWLEDGEMENT
An endeavor over a long period can be successful only with the advice and support of many well-wishers. We take this opportunity to express our gratitude and appreciation to all of them.
We are extremely thankful to our beloved Chairman, Sri V. Jayachandra Reddy, who took a keen interest and encouraged us in every effort throughout this course.
We express our heartfelt thanks to G. Sreenivasa Reddy, B.Tech., Ph.D., Head of the Department of CSE, for his kind attention and valuable guidance to us throughout this course.
We also express our deep sense of gratitude towards N. Srinivasan, Project Co-Ordinator,
Dept. of COMPUTER SCIENCE AND ENGINEERING for her support and guidance in
completing our project.
We express our profound respect and gratitude to our project guide P. Narasimhaiah,
B.Tech., for her valuable support and guidance in completing the project successfully.
We are highly thankful to Mr. M. Naresh Raju of Try Logic Soft Solutions AP Pvt. Limited, Hyderabad, who has been kind enough to guide us in the preparation and execution of this project.
We also thank all the teaching and non-teaching staff of the Dept. of COMPUTER SCIENCE AND ENGINEERING for their support throughout our B.Tech. course.
We express our heartfelt thanks to our parents for their valuable support and encouragement in the completion of this course. We also express our heartfelt regards to our friends for being supportive in the completion of this project.
TABLE OF CONTENTS
CONTENT
ABSTRACT
TABLE OF CONTENTS
1. INTRODUCTION
1.3 Objective
3. SYSTEM ANALYSIS
Specifications
Software Requirements
Hardware Requirements
Module Description
4. DESIGN
Block Diagram
Sequence Diagram
Collaboration Diagram
Activity Diagram
5. IMPLEMENTATION
6. TESTING
7. OUTPUT SCREENS
8. CONCLUSION
9. FUTURE ENHANCEMENT
10. BIBLIOGRAPHY
ABSTRACT
Sentiment analysis, or opinion mining, is performed using machine learning models and deep learning models. Deep learning has made great breakthroughs in the fields of speech and image recognition.
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
1.1 DOMAIN
What is Deep Learning?
Deep learning has emerged as a powerful machine learning technique that learns
multiple layers of representations or features of the data and produces state-of-the-art
prediction results. Along with the success of deep learning in many other application domains,
deep learning is also popularly used in sentiment analysis in recent years.
What is Machine Learning?
Machine Learning (ML) is coming into its own, with a growing recognition that ML
can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems. ML provides potential solutions
in all these domains and more, and is set to be a pillar of our future civilization.
“A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves with
experience E.” -- Tom Mitchell, Carnegie Mellon University
1.Computer Vision
High-end gamers interact with deep learning modules on a very frequent basis.
Deep neural networks power bleeding-edge object detection, image classification, image
restoration, and image segmentation.
So much so, they even power the recognition of hand-written digits on a computer system. In short, deep learning relies on extraordinary neural networks to empower machines to replicate the mechanism of the human visual system.
2.Autonomous Vehicles
The next time you are lucky enough to witness an autonomous vehicle driving down the road, understand that there are several AI models working simultaneously. While some models pinpoint pedestrians, others are adept at identifying street signs. A single car can be informed by millions of AI models while driving down the road. Many consider AI-powered driving to be safer than human driving.
3.Automated Translation
Automated translations did exist before the addition of deep learning. But deep learning is helping machines make enhanced translations with an accuracy that was missing in the past. Plus, deep learning also helps in translation derived from images –
something totally new that could not have been possible using traditional text-based
interpretation.
4. Robotics
Take a moment to digest this – Nvidia researchers have developed an AI system that helps robots learn from human demonstrative actions. Housekeeping robots that perform actions based on artificial intelligence inputs from several sources are rather common. Just as human brains process actions based on past experiences and sensory inputs, deep-learning infrastructures help robots execute tasks depending on varying AI opinions.
Fig 1.1.5 Image for Bots based on Deep Learning
Carolyn Gregoire writes in her Huffington Post piece: “the world isn’t falling apart, but it can sure feel like it.” And we couldn’t agree more. We are not naming names here, but you cannot scroll through any of your social media feeds without stumbling across a couple of global disasters – with the exception of Instagram, perhaps.
News aggregators are now using deep learning modules to filter out negative news and show you only the positive stuff happening around you. This is especially helpful given how blatantly sensationalist a section of our media has been of late.
1. Image Recognition
Image recognition is one of the most common applications of machine learning in the real world. It can identify an object in a digital image, based on the intensity of the pixels in black-and-white images or colour images.
2. Healthcare
Machine learning can help physicians with the diagnosis of diseases. In the case of rare diseases, the joint use of facial recognition software and machine learning helps scan patient photos and identify phenotypes that correlate with rare genetic diseases.
3. Sentimental Analysis
It covers sentiment classification, opinion mining and analyzing emotions. Using this model, machines train themselves to analyze sentiments based on words. They can identify whether the words are used in a positive, negative or neutral sense, and they can also determine the magnitude of these words.
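As a small illustration of this idea (this is not the method used in this project), a ready-made lexicon-based scorer such as NLTK's VADER can mark a sentence as positive, negative or neutral and report a magnitude. The sample sentences below are invented.

# Illustrative only: VADER is a simple lexicon-based scorer, not this project's model.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time download of the lexicon

sia = SentimentIntensityAnalyzer()
for text in ["Food is good", "Food is bad", "Food is okay"]:
    print(text, "->", sia.polarity_scores(text))   # pos/neg/neu scores plus a compound magnitude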
The growth of the internet due to social networks such as Facebook, Twitter, LinkedIn, Instagram, etc. has led to significant user interaction and has empowered users to express their opinions about products, services, events and their preferences, among others. It has also provided opportunities for users to share their wisdom and experiences with each other. The rapid development of social networks is causing explosive growth of digital content. It has turned online opinions, blogs, tweets, and posts into a very valuable asset for corporates to get insights from the data and plan their strategy. Business organizations need to process and study these sentiments to investigate data and to gain business insights. The traditional approach of manually extracting complex features, identifying which features are relevant, and deriving patterns from this huge information is very time consuming and requires significant human effort. However, Deep Learning can exhibit excellent performance via Natural Language Processing (NLP) techniques to perform sentiment analysis on this massive information. The core idea of Deep Learning techniques is to identify complex features extracted from this vast amount of data without much external intervention, using deep neural networks. These algorithms automatically learn new complex features. Both automatic feature extraction and availability of resources are very important when comparing the traditional machine learning approach with deep learning techniques. Here the goal is to classify the opinions and sentiments expressed by users.
The online medium has become a significant way for people to express their opinions, and with social media there is an abundance of opinion information available. Using sentiment analysis, the polarity of an opinion, such as positive, negative, or neutral, can be found by analyzing the text of the opinion. Sentiment analysis has been useful for companies to get their customers' opinions on their products, to predict outcomes of elections, and to gather opinions from movie reviews. The information gained from sentiment analysis is useful for companies making future decisions. Many traditional approaches in sentiment analysis use the bag-of-words method. The bag-of-words technique does not consider language morphology, and it can incorrectly classify two phrases as having the same meaning because they have the same bag of words. The relationship between the collection of words is considered instead of the relationship between individual words. When determining the overall sentiment, the sentiment of each word is determined and combined using a function. Bag of words also ignores word order, which leads phrases containing negation to be incorrectly classified.
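The sketch below makes the bag-of-words limitation concrete: two phrases with opposite meanings produce identical vectors because word order is ignored. It uses scikit-learn's CountVectorizer purely as an illustration.

# Two phrases with opposite meanings get the same bag-of-words representation.
from sklearn.feature_extraction.text import CountVectorizer

phrases = ["the food was good not bad", "the food was bad not good"]
vectorizer = CountVectorizer()
bags = vectorizer.fit_transform(phrases).toarray()

print(vectorizer.get_feature_names_out())   # vocabulary
print(bags[0])                              # counts for the first phrase
print(bags[1])                              # counts for the second phrase (identical)
print((bags[0] == bags[1]).all())           # True: the representation cannot tell them apart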
1.3 OBJECTIVE
To address this problem, we should collect the data from various sources such as different websites, PDFs and Word documents. After collecting the data, we convert it into a CSV file and then break the data into individual sentences. Then, by using Natural Language Processing (NLP), we eliminate stop words. Stop words are words that are regarded as useless words in a sentence, or extra data that is of no use. For example, "the", "a", "an" and "in" are some examples of stop words in English. After that, the Naïve Bayes algorithm is used to train the model. An ANN algorithm works in the backend to generate a pickle file. A confusion matrix is used as the validation technique, and accuracy is used to evaluate the model.
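A minimal sketch of this pipeline is given below. The file name reviews.csv and its columns review and sentiment are placeholders, MultinomialNB stands in for the Naïve Bayes step, and the NLTK stop-word corpus is assumed to have been downloaded with nltk.download('stopwords').

import pickle
import pandas as pd
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv("reviews.csv")                       # collected data converted to CSV
stop_words = stopwords.words("english")                 # "the", "a", "an", "in", ...

vectorizer = CountVectorizer(stop_words=stop_words)     # stop words are eliminated here
X = vectorizer.fit_transform(data["review"])
y = data["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultinomialNB().fit(X_train, y_train)           # Naive Bayes training step
predictions = model.predict(X_test)

print(confusion_matrix(y_test, predictions))            # validation technique
print(accuracy_score(y_test, predictions))              # evaluation metric

with open("sentiment_model.pkl", "wb") as f:            # persist the model as a pickle file
    pickle.dump((vectorizer, model), f)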
CHAPTER 2
Without NLP and access to the right data, it is difficult to discover and collect the insights necessary for driving business decisions. Deep Learning algorithms are used to build a model.
Advanced techniques like natural language processing are used for the sentiment analysis, which makes our project very accurate.
NLP defines a relation between a user-posted tweet, the opinion it expresses and, in addition, the suggestions of people.
NLP is a good way to understand the natural language used by people and uncover the sentiment behind it. NLP makes speech analysis easier.
2.5.2.2 Feasibility of Technology:
For our project, from Machine Learning (ML), we have chosen an unsupervised machine learning task to train our data on GloVe, i.e., the Global Vectors for Word Representation dataset. After training on this dataset, we then give our inputs to the model and it displays the top N sentences.
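A rough sketch of this idea follows. The GloVe file name glove.6B.50d.txt, the candidate sentences and the simple word-vector averaging are all assumptions used only to show how the top-N sentences could be ranked.

import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:              # pre-trained GloVe word vectors
        for line in f:
            parts = line.split()
            vectors[parts[0]] = np.array(parts[1:], dtype=float)
    return vectors

def sentence_vector(sentence, vectors, dim=50):
    words = [vectors[w] for w in sentence.lower().split() if w in vectors]
    return np.mean(words, axis=0) if words else np.zeros(dim)

def top_n(query, sentences, vectors, n=3):
    q = sentence_vector(query, vectors)
    scored = []
    for s in sentences:
        v = sentence_vector(s, vectors)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, s))                           # cosine similarity, sentence
    return sorted(scored, reverse=True)[:n]

glove = load_glove("glove.6B.50d.txt")                    # path is an assumption
candidates = ["The medicine worked well", "Terrible side effects", "No change at all"]
print(top_n("the drug was effective", candidates, glove))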
CHAPTER 3
SYSTEM ANALYSIS
3.1 SPECIFICATION
Functional requirements
One of the most difficult tasks is the selection of the software: once the system requirements are known, it must be determined whether a particular software package fits those requirements.
TECHNOLOGY: PyCharm
BROWSER: Google Chrome
Table 3.2.1 Software Requirements
HARD DISK: 1 TB
I/O: Keyboard, Monitor, Mouse
For predicting the literacy rate of India, our project has been divided into
the following modules:
3. Accuracy Measures
Pandas:
In order to be able to work with the data in Python, we'll need to read the CSV file into a pandas DataFrame. A DataFrame is a way to represent and work with tabular data. Tabular data has rows and columns, just like our CSV file.
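A small example of this step is shown below; the file name data.csv and its contents are placeholders rather than the project's actual dataset.

import pandas as pd

df = pd.read_csv("data.csv")        # read the CSV file into a DataFrame
print(df.head())                    # first five rows
print(df.shape)                     # (number of rows, number of columns)
print(df.columns.tolist())          # column names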
Simple Linear Regression is an approach for predicting a response using a single feature. It is
assumed that the two variables are linearly related. Hence, we try to find a linear function that
predicts the response value (y) as accurately as possible as a function of the feature or
independent variable(x).
For predicting the literacy rate of any given year, first we need to predict the population for that year. The predicted population is then given as input to the model which predicts the literacy rate. For the algorithm that predicts the population, the year is taken as the independent variable, and the predicted population is taken as the independent variable for the literacy prediction algorithm.
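A minimal sketch of this two-stage idea is shown below with invented numbers: one model predicts the population from the year, and its output feeds a second model that predicts the literacy rate.

import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[2001], [2006], [2011], [2016]])
population = np.array([1.03, 1.12, 1.21, 1.29])           # synthetic figures (billions)
literacy = np.array([64.8, 68.0, 74.0, 77.7])             # synthetic figures (percent)

pop_model = LinearRegression().fit(years, population)                      # year -> population
lit_model = LinearRegression().fit(population.reshape(-1, 1), literacy)    # population -> literacy

future_pop = pop_model.predict([[2021]])
print("Predicted literacy rate:", lit_model.predict(future_pop.reshape(-1, 1)))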
Testing:
In testing, we now predict the data. Here we have two steps: predict the literacy rate and plot it to compare with the real results. We use fit_transform to scale the data and then reshape it for prediction. We predict the data and rescale the predicted data to match its real values. Then we plot the real and predicted literacy rates on a graph and calculate the accuracy.
We use the scikit-learn and NumPy Python modules for training and testing.
Sklearn:
Scikit-learn is a machine learning library for Python; it provides ready-made implementations of classification, regression, clustering and evaluation routines used for training and testing.
NumPy:
NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. It is used for numerical calculations.
3. Accuracy Measures
Using the proposed model, predictions are made for the coming years. Graphs are used to visualize state-wise literacy rate predictions. We use the Matplotlib Python module for visualization.
Matplotlib:
Matplotlib is a Python plotting library used to draw line charts, bar charts, scatter plots and other visualizations of the results.
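The short example below shows the kind of visualization described here; the data points are invented purely to put real and predicted values on the same chart.

import matplotlib.pyplot as plt

years = [2001, 2006, 2011, 2016]
real = [64.8, 68.0, 74.0, 77.7]          # invented "real" values
predicted = [65.5, 69.0, 73.2, 78.1]     # invented model output

plt.plot(years, real, marker="o", label="Real")
plt.plot(years, predicted, marker="x", label="Predicted")
plt.xlabel("Year")
plt.ylabel("Literacy rate (%)")
plt.legend()
plt.show()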
CHAPTER 4
DESIGN
A DFD shows what kinds of information will be input to and output from the
system, where the data will come from and go to, and where the data will be stored. It
doesn’t show information about the timing of processes, or about whether processes will operate in sequence or in parallel. A DFD is also called a “bubble chart”.
DFD Symbols:
Process: People, procedures or devices that use or produce (Transform) data. The
physical component is not identified.
In our project, we had built the data flow diagrams at the very beginning of
business process modelling in order to model the functions that our project has to carry
out and the interaction between those functions together with focusing on data
exchanges between processes.
A context-level data flow diagram is created using the structured systems analysis and design method (SSADM). This level shows the overall context of the
system and its operating environment and shows the whole system as just one process. It
does not usually show data stores, unless they are “owned” by external systems, e.g. are
accessed by but not maintained by this system, however, these are often shown as
external entities. The Context level DFD is shown in fig.3.2.1
The Context Level Data Flow Diagram shows the data flow from the application
to the database and to the system.
After starting and executing the application, training and testing the dataset can
be done as shown in the above figure
This level explains each process of the system in a detailed manner. The first detailed-level DFD (generation of individual fields) shows how data flows through the individual processes/fields in it, forming a detailed description of the individual processes.
Figure 4.2.3.1 Detailed level DFD for Sentimental Analysis
After starting and executing the application, training on the dataset is done by dividing it into a 2D array and scaling it using normalization algorithms, and then testing is done.
After starting and executing the application, training on the dataset is done using linear regression, and then testing is done.
ii. The analysis representation describes a usage scenario from the end-user's perspective.
• Structural Model View
i. In this model the data and functionality are derived from inside the system.
Use case diagrams are one of the five diagrams in the UML for modeling the
dynamic aspects of the systems (activity diagrams, sequence diagram, state chart
diagram, collaboration diagram are the four other kinds of diagrams in the UML for
modeling the dynamic aspects of systems).Use case diagrams are central to modeling
the behavior of the system, a sub-system, or a class. Each one shows a set of use cases
and actors and relations.
Figure 4.3.1 USECASE DIAGRAM
4.3.3 Sequence Diagram:
A sequence diagram is an interaction diagram that focuses on the time ordering of messages. It shows a set of objects and the messages exchanged between these objects. This diagram illustrates the dynamic view of a system.
CHAPTER 5
IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned out
into a working system. Thus it can be considered to be the most critical stage in achieving a
successful new system and in giving the user confidence that the new system will work and
be effective.
The project is implemented by accessing simultaneously from more than one system
and more than one window in one system. The application is implemented on the Internet Information Services 5.0 web server under Windows XP and accessed from various clients.
5.1 TECHNOLOGIES USED
What is Python?
Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open-source software and has a community-based development
model, as do nearly all of its variant implementations. Python is managed by the non-profit
Python Software Foundation.
Python is a general purpose, dynamic, high level and interpreted programming
language. It supports the object-oriented programming approach to develop applications. It is simple and easy to learn and provides lots of high-level data structures.
Windows XP
Python Programming
Python Versions
Python 2.0 was released on 16 October 2000 and had many major new features, including a cycle-detecting garbage collector and support for Unicode. With this release, the
development process became more transparent and community-backed.
Python 3.0 (initially called Python 3000 or py3k) was released on 3 December 2008
after a long testing period. It is a major revision of the language that is not completely
backward-compatible with previous versions. However, many of its major features have been
backported to the Python 2.6.x and 2.7.x version series, and releases of Python 3 include the
2to3 utility, which automates the translation of Python 2 code to Python 3.
Python 2.7's end-of-life date (a.k.a. EOL, sunset date) was initially set at 2015, then
postponed to 2020 out of concern that a large body of existing code could not easily be
forward-ported to Python 3. In January 2017, Google announced work on a Python 2.7 to Go transcompiler to improve performance under concurrent workloads.
Python 3.6 had changes regarding UTF-8 (in Windows, PEP 528 and PEP 529) and
Python 3.7.0b1 (PEP 540) adds a new "UTF-8 Mode" (and overrides POSIX locale).
Why Python?
Python is a scripting language like PHP, Perl, and Ruby.
Excellent documentation
Thriving developer community
Libraries Of python:
Some parts of the standard library are covered by specifications (for example, the Web
Server Gateway Interface (WSGI) implementation wsgiref follows PEP 333), but most
modules are not.
They are specified by their code, internal documentation, and test suites (if supplied).
However, because most of the standard library is cross-platform Python code, only a few
modules need altering or rewriting for variant implementations.
As of March 2018, the Python Package Index (PyPI), the official repository for
third-party Python software, contains over 130,000 packages with a wide range of
functionality, including:
• Web frameworks
• Multimedia
• Databases
• Networking
• Test frameworks
• Automation
• Web scraping
• Documentation
• System administration
• You'll know how to use Python and its libraries to explore your data with the help of
matplotlib and Principal Component Analysis (PCA).
• And you'll preprocess your data with normalization and you'll split your data into training
and test sets.
• Next, you'll work with the well-known K-Means algorithm to construct an unsupervised
model, fit this model to your data, predict values, and validate the model that you have
built.
• As an extra, you'll also see how you can use Support Vector Machines (SVM) to construct another model to classify your data (a brief sketch of these steps follows this list).
• Machine learning was born from pattern recognition and the theory that computers can learn without being programmed for specific tasks.
• It is a method of data analysis that automates analytical model building.
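The brief sketch below runs the steps from the list above on scikit-learn's built-in Iris data; it is only an illustration of the workflow, not this project's data or model.

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)              # normalization step
X_2d = PCA(n_components=2).fit_transform(X)        # PCA view for exploratory plots

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)   # unsupervised model
print("Cluster assignments:", kmeans.predict(X_test)[:10])

svm = SVC().fit(X_train, y_train)                  # SVM as the extra classifier
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))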
Machine learning tasks are typically classified into broad categories, depending on whether there is a learning "signal" or "feedback" available to the learning system. They are:
Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. As special cases, the input signal can be only partially available, or restricted to special feedback.
Semi-supervised learning: the computer is given only an incomplete training signal: a
training set with some (often many) of the target outputs missing.
Active learning:the computer can only obtain training labels for a limited set of instances
(based on a budget), and also has to optimize its choice of objects to acquire labels for. When
used interactively, these can be presented to the user for labelling.
Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or
playing a game against an opponent.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).
In regression, also a supervised problem, the outputs are continuous rather than
discrete.
Regression: The analysis or measure of the association between one variable (the dependent
variable) and one or more other variables (the independent variables), usually formulated in
an equation in which the independent variables have parametric coefficients, which may
enable future values of the dependent variable to be predicted.
Types of Regression:
1. Linear Regression
2. Logistic Regression
3. Polynomial Regression
4. Stepwise Regression
5. Ridge Regression
6. Lasso Regression
7. Elastic Net Regression
1. Linear Regression: -It is one of the most widely known modelling techniques. Linear
regression is usually among the first few topics which people pick while learning predictive
modelling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
Linear Regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line):
y = a + b*x
4. Stepwise Regression: -This form of regression is used when we deal with multiple
independent variables. In this technique, the selection of independent variables is done with
the help of an automatic process, which involves no human intervention.
This feat is achieved by observing statistical values like R-square, t-stats and AIC
metric to discern significant variables. Stepwise regression basically fits the regression model
by adding/dropping covariates one at a time based on a specified criterion. Some of the most
commonly used Stepwise regression methods are listed below:
• Standard stepwise regression does two things. It adds and removes predictors as
needed for each step.
• Forward selection starts with the most significant predictor in the model and adds a variable at each step.
• Backward elimination starts with all predictors in the model and removes the least significant variable at each step.
The aim of this modelling technique is to maximize the prediction power with
minimum number of predictor variables. It is one of the methods to handle higher
dimensionality of data set.
5. Ridge Regression: -Ridge Regression is a technique used when the data suffers from multicollinearity (the independent variables are highly correlated). Under multicollinearity, even though the least squares (OLS) estimates are unbiased, their variances are large, which deviates the observed value far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
Above, we saw the equation for linear regression. Remember? It can be represented as:
y = a + b*x
This equation also has an error term. The complete equation becomes:
y = a + b*x + e (error term), [the error term is the value needed to correct for a prediction error between the observed and predicted value]
In ridge regression, the objective being minimized has two components. The first is the least squares term, and the other is lambda times the summation of β² (beta squared), where β is the coefficient. This penalty is added to the least squares term in order to shrink the parameters so that they have a very low variance.
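A brief illustration of this shrinkage is sketched below; the nearly collinear columns and the alpha value are arbitrary choices made only to contrast ordinary least squares with ridge.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)          # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_)    # large and unstable
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)      # shrunk and stable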
Important Points:
• Ridge regression shrinks the value of the coefficients but does not reduce them to exactly zero.
6. Lasso Regression:
Important Points:
• If a group of predictors is highly correlated, lasso picks only one of them and shrinks the others to zero
7. Elastic Net Regression: -Elastic Net is a hybrid of the Lasso and Ridge Regression techniques. It is trained with both L1 and L2 priors as regularizers. Elastic-net is useful when there are multiple features which are correlated. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.
A practical advantage of trading off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge’s stability under rotation.
Important Points:
Beyond these 7 most commonly used regression techniques, you can also look at other models like Bayesian, Ecological and Robust regression.
Classification
1. Logistic Regression
2. Decision Tree
3. Random Forest
4. Naive Bayes
1. Logistic Regression
Logistic regression is a classification technique that estimates the probability that an observation belongs to a class using the logistic (sigmoid) function; despite its name, it is used for classification rather than regression.
2. Decision Tree
Decision Trees are a type of Supervised Machine Learning (that is you explain what
the input is and what the corresponding output is in the training data) where the data is
continuously split according to a certain parameter. The tree can be explained by two entities,
namely decision nodes and leaves.
3. Random Forest
Random Forest is an ensemble method that builds many decision trees and combines their predictions. A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
The below diagram explains the working of the Random Forest algorithm:
Fig 5.2.1 Image for Random Forest Algorithm
4. Naive Bayes
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:
P(c|x) = P(x|c) * P(c) / P(x)
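A tiny worked example of the formula with made-up numbers: the prior P(c), the likelihood P(x|c) and the evidence P(x) below are invented for illustration only.

p_c = 0.4                                  # prior probability of class c (e.g. "positive")
p_x_given_c = 0.7                          # likelihood of feature x given class c
p_x = 0.5                                  # overall probability of feature x

p_c_given_x = (p_x_given_c * p_c) / p_x    # Bayes' theorem: the posterior
print(p_c_given_x)                         # 0.56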
5.3 Deep learning
Deep learning has emerged as a powerful machine learning technique that learns
multiple layers of representations or features of the data and produces state-of-the-art
prediction results. Along with the success of deep learning in many other application domains,
deep learning is also popularly used in sentiment analysis in recent years.
1.Structured Algorithm
2.Unstructured Algorithm
1. Structured Algorithm
Artificial Neural Networks are computational models inspired by the human brain. Many of the recent advancements in the field of Artificial Intelligence, including voice recognition, image recognition and robotics, have been made using Artificial Neural Networks.
Artificial Neural Networks are the biologically inspired simulations performed on the
computer to perform certain specific tasks like –
Clustering
Classification
Pattern Recognition
2. Unstructured Algorithm
A deep neural network (DNN) is an artificial neural network (ANN) with multiple
layers between the input and output layers. There are different types of neural networks but
they always consist of the same components: neurons, synapses, weights, biases, and
functions. These components function similarly to the human brain and can be trained like any other ML algorithm.
For example, a DNN that is trained to recognize dog breeds will go over the given
image and calculate the probability that the dog in the image is a certain breed. The user can
review the results and select which probabilities the network should display (above a certain
threshold, etc.) and return the proposed label. Each mathematical manipulation as such is considered a layer, and complex DNNs have many layers, hence the name "deep" networks.
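A minimal sketch of a small multi-layer network is given below. It uses scikit-learn's MLPClassifier because that model can be saved to a pickle file, as described in the objective; the layer sizes and the toy data are assumptions, not this project's actual network.

import pickle
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(200, 10)                   # 200 toy samples with 10 input features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # synthetic labels

model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(X, y)                               # hidden layers sit between input and output

with open("ann_model.pkl", "wb") as f:        # persist the trained network to a pickle file
    pickle.dump(model, f)

print(model.predict(X[:5]))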
Modules in python
Pandas: -
• Robust IO tools for loading data from flat files (CSV and delimited), Excel files,
databases, and saving / loading data from the ultrafast HDF5 format
• Time series-specific functionality: date range generation and frequency
conversion, moving window statistics, moving window linear regressions, date
shifting and lagging, etc.
• pandas is fast. Many of the low-level algorithmic bits have been extensively improved in Cython code. However, as with anything else, generalization usually
sacrifices performance. So, if you focus on one feature for your application you
may be able to create a faster specialized tool.
• pandas is a dependency of statsmodels, making it an important part of the statistical computing ecosystem in Python.
• pandas has been used extensively in production in financial applications.
NumPy: -
NumPy is a Python package. It stands for 'Numerical Python'. It is a library consisting of multidimensional array objects and a collection of routines for processing arrays. Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package, Numarray, was also developed, having some additional functionalities. In 2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray into Numeric. There are many contributors to this open-source project.
• Operations related to linear algebra. NumPy has in-built functions for linear
algebra and random number generation.
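A short example of the array features mentioned above: a multidimensional array, random number generation, and the linear algebra routines.

import numpy as np

a = np.array([[1, 2], [3, 4]])       # 2x2 multidimensional array
b = np.random.rand(2, 2)             # random number generation

print(a.T)                           # transpose
print(a @ b)                         # matrix multiplication
print(np.linalg.inv(a))              # matrix inverse from the linear algebra routines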
Scikit-learn: -
The original codebase was later rewritten by other developers. In 2010, Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, all from INRIA, took leadership of the project and made the first public release on 1 February 2010. Of
the various scikits, scikit-learn as well as scikit-image were described as “well-
maintained and popular” in November 2012. Scikit-learn is largely written in Python,
with some core algorithms written in Cython to achieve performance. Support vector
machines are implemented by a Cython wrapper around LIBSVM; logistic regression
and linear support vector machines by a similar wrapper around LIBLINEAR.
CHAPTER 6
TESTING
• Defect Detection
• Reliability estimation
The base of the black box testing strategy lies in the selection of appropriate data
as per functionality and testing it against the functional specifications in order to check
for normal and abnormal behavior of the system. Nowadays, it is common to route the testing work to a third party, as the developer of the system knows too much of the internal logic and coding of the system, which makes the application unfit to be tested by its developer. The following are the different types of techniques involved in black box testing.
They are:
• Equivalence Partitioning
White box testing [10] requires access to source code. Though white box testing [10] can
be performed any time in the life cycle after the code is developed, it is a good practice to
perform white box testing [10] during the unit testing phase.
Here, the flow of specific inputs through the code, the expected outputs and the functionality of conditional loops are tested.
CHAPTER 7
OUTPUT SCREENS
[Output screenshots: the predicted sentiment displayed for sample reviews, for example the review “Food is Bad”.]
CHAPTER 8
CONCLUSION
Sentiment analysis of data collected from social media such as Twitter, Facebook and Instagram is beneficial for mankind in providing better health care. The workflow of analyzing healthcare content in social media helps to overcome the limitations of large-scale data analysis and manual analysis of user-generated textual content in social media.
This work can help users stay updated on the effectiveness of medicines, and it can even suggest a few better medications that are available. This project can provide feedback to healthcare organizations and pharmaceutical companies on the available treatments and medicines. With the help of this project, pharmaceutical companies and healthcare providers can act on the feedback and try to come up with improved medicines and treatments for diabetes. Users are provided with the resources of social media for the corresponding field of healthcare.
Opinion mining and sentiment analysis have a wide area of applications, and they also face many research challenges. With the fast growth of the internet and internet-related applications, opinion mining and sentiment analysis have become a most interesting research area in the natural language processing community. More innovative and effective techniques need to be invented to overcome the current challenges faced by opinion mining and sentiment analysis.
CHAPTER 9
FUTURE ENHANCEMENT
In the future, one can collect large healthcare-related data from multiple social networking sites, which may provide better results by overcoming the limitations of the project.
In the future, one can even collect data that includes videos and images for analysis, and more deep learning and neural network techniques can be used to implement the project.
CHAPTER 10
BIBLIOGRAPHY
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics, 2009.
[3] G. Szabo and B. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8), 2010.
[4] A Machine Learning Model for Stock Market Prediction, article by Osman Hegazy and Mustafa Abdul Salam.
[5] The Unified Modeling Language User Guide, Low Price Edition, Grady Booch, James Rumbaugh, Ivar Jacobson, ISBN: 81-7808-769-5, 1997.
[9] Black-Box Testing: Techniques for Functional Testing of Software and Systems, Boris Beizer, Wiley Publications, ISBN: 978-0-471-120-940, 1995.