Research Proposal
Project Title: Deep Convolutional Neural Networks for Twitter Sentiment Analysis
Introduction: Twitter, with over 319 million monthly active users, has become a goldmine for organizations that have a strong political, social, and economic interest in maintaining and enhancing their clout and reputation. Sentiment analysis gives these organizations the ability to survey various social media sites in real time. Text sentiment analysis is the automatic process of determining whether a text segment contains objective or opinionated content and, furthermore, of determining the text's sentiment polarity. The goal of Twitter sentiment classification is to automatically determine whether a tweet's sentiment polarity is negative or positive.
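To make the classification task concrete, the following is a minimal sketch, in Python with Keras, of the kind of convolutional model the proposal title refers to; the vocabulary size, sequence length, layer sizes, and preprocessed inputs are illustrative assumptions rather than part of the proposal.

    from tensorflow.keras import layers, models

    VOCAB_SIZE = 20000   # assumed vocabulary size (hypothetical)
    MAX_LEN = 50         # assumed maximum tweet length in tokens (hypothetical)

    # A small 1-D convolutional network over embedded tweet tokens that
    # outputs the probability that a tweet's polarity is positive.
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 128),
        layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(padded_token_ids, labels, epochs=5)  # hypothetical preprocessed inputs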
Literature Review:
Prior work on change classification provides a description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features. It also introduces a new bug prediction technique that works at the granularity of an individual file-level change and has accuracy comparable to the best existing bug prediction techniques in the literature (78 percent on average).
Defect prediction models are necessary to help project managers make better use of valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model is only as good as the quality of the software measurement data. The authors of a second study examined seven filter-based feature ranking techniques and three filter-based subset selection search algorithms; the search space for the subset selection algorithms is reduced to make the proposed approach more practical to apply.
Objectives:
Methodology:
Creating Corpus
1. File-level change deltas are extracted from the revision history of a project, as stored in its SCM repository.
2. The bug fix changes for each file are identified by examining keywords in SCM change log messages (a sketch of this labeling step follows the list).
3. The bug-introducing and clean changes at the file level are identified by tracing backward in the
revision history from bug fix changes.
4. Features are extracted from all changes, both buggy and clean. Features include all terms in the complete source code, the lines modified in each change (delta), and change metadata such as author and change time. Complexity metrics, if available, are computed at this step. At the end, a project-specific corpus has been created: a set of labeled changes with a set of features associated with each change.
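As a rough illustration of steps 2 and 3, the following Python sketch separates file-level changes into bug-fix changes and remaining candidates by keyword-matching SCM log messages; the log-entry format and keyword list are assumptions, and the backward tracing from a fix to the bug-introducing change is omitted.

    import re

    # Keywords indicating a bug-fix commit; the list is an assumption.
    BUGFIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|patch)\b", re.IGNORECASE)

    def label_changes(log_entries):
        """Split file-level changes into bug-fix changes and remaining candidates
        by keyword-matching SCM log messages (step 2). Each log entry is a
        hypothetical (commit_id, message, changed_files) tuple."""
        bug_fix, candidates = [], []
        for commit_id, message, changed_files in log_entries:
            bucket = bug_fix if BUGFIX_PATTERN.search(message) else candidates
            for path in changed_files:
                bucket.append((commit_id, path))
        return bug_fix, candidates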
Feature Selection
5. Perform a feature selection process that employs a combination of wrapper and filter methods to compute a reduced set of features. The filter methods used are the Gain Ratio, Chi-Squared, Significance, and Relief-F feature rankers. The wrapper methods are based on the Naive Bayes and SVM classifiers. Feature selection is performed iteratively until only one feature is left. At the end, there is a reduced feature set that performs optimally for the chosen classifier metric (the selection loop is sketched below).
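The following Python sketch, using scikit-learn, shows one way the filter-plus-wrapper loop in step 5 could be realized with a Chi-Squared ranker and a Naive Bayes wrapper; the feature matrix X (non-negative term counts) and the labels y are assumed inputs, and the other rankers and the SVM wrapper would be substituted analogously.

    import numpy as np
    from sklearn.feature_selection import chi2
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB

    def select_features(X, y):
        """Rank features once with the Chi-Squared filter, then shrink the set
        one feature at a time, keeping the subset with the best cross-validated
        accuracy of the Naive Bayes wrapper."""
        scores, _ = chi2(X, y)                      # filter step: score every feature
        ranked = np.argsort(scores)[::-1]           # best-scoring features first
        best_subset, best_score = ranked, -np.inf
        for k in range(X.shape[1], 0, -1):          # iterate until one feature is left
            subset = ranked[:k]
            score = cross_val_score(MultinomialNB(), X[:, subset], y, cv=5).mean()
            if score > best_score:
                best_subset, best_score = subset, score
        return best_subset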
Classification
6. Using the reduced feature set, a classification model is trained.
7. Once a classifier has been trained, it is ready to use. New code changes can now be fed to the classifier, which determines whether a new change is more similar to a buggy change or to a clean change. Classification is performed at the code change level, using file-level change deltas as input to the classifier (see the sketch below).
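A minimal sketch of steps 6 and 7, again assuming scikit-learn, a Naive Bayes classifier, and hypothetical matrices of training changes and new changes restricted to the previously selected feature indices:

    from sklearn.naive_bayes import MultinomialNB

    def train_and_classify(X_train, y_train, X_new, selected):
        """Train on the reduced feature set (step 6) and label new file-level
        change deltas as buggy or clean (step 7)."""
        clf = MultinomialNB()
        clf.fit(X_train[:, selected], y_train)
        return clf.predict(X_new[:, selected])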
Project Plan: