Preprints202008 0265 v2
Abstract – In software development, developers receive bug reports that describe software bugs.
To find the cause of a bug, developers review the code and reproduce the abnormal behavior, which is
tedious and time-consuming. Developers need an automated system that incorporates broad domain
knowledge and recommends solutions for these bugs, rather than spending more manual effort on fixing
them or waiting on Q&A websites for other users to reply. Stack Overflow is a popular question-answer
site focused on programming issues, so we can benefit from the knowledge available on this rich platform.
This paper presents a survey covering methods in the field of mining software repositories. We propose
an architecture for building a recommender system using the learning to rank approach. Deep learning is
used to construct a model that solves the learning to rank problem using Stack Overflow data. Text mining
techniques are used to extract, evaluate, and recommend the answers most relevant to the solution of a
given bug report.
Keywords – Recommender System, learning to rank, Mining software repositories, Text Mining, Deep learning,
Stack Overflow.
related knowledge. Extracting knowledge from Stack Overflow by using data mining will be of major interest for the community. In the next section, we will outline and clarify the concept of mining software repositories.

2.1.4 What is Stack Overflow?

Ahasanuzzaman et al. [4] define Stack Overflow as “a popular question answering site that is focused on programming problems”. In this community, users can ask questions, provide answers to the questions asked, mark questions as favorites, upvote or downvote an answer, tag questions, and carry out other community-related tasks. Programmers have actively used it to ask questions since January 2009. Today there are more than 18 million questions, 27 million answers, and 10 million users.

In addition to the above, Stack Overflow provides mechanisms for tagging and for earning reputation and badges, which unlock more privileges in the community. These mechanisms are explained below:

Tags: Stack Overflow allows users to tag each question with up to five tags. Users can select an existing tag provided in the autocomplete text box or create a new one. To create a new tag, users need a minimum level of reputation on Stack Overflow. This ensures that only expert users can create new tags, which maintains consistency among the tags found on Stack Overflow. Expert users can also change a question's tags if they find them incorrect.

User Reputation: Stack Overflow provides a metric called Reputation to rank its users. Reputation is an approximate measurement of how much the community trusts a user; it is earned when peers appreciate what a user contributes. Users do not need reputation for basic site functionality such as asking questions and providing answers; however, users with a high reputation score gain more privileges. The primary way to gain reputation is by posting good questions and useful answers. Votes on these posts cause the user to gain (or sometimes lose) reputation. The maximum number of reputation points that can be earned in a day is 200, which ensures that users gain reputation by actively and consistently participating in site activities.

Privileges: these are the access rights that a user unlocks each time they reach a certain reputation. For example, a user can downvote a post only with at least 15 reputation points. The last privilege is the one that gives the user access to the Google Analytics data of the site.

Badges: a user earns a badge by performing a specific predefined operation. For example, the Supporter badge is earned when a user upvotes for the first time.

3. Mining Software repositories (MSR)

The field of mining software repositories aims at examining and analyzing “the rich data available in software repositories to uncover interesting and actionable information about software projects and systems” [5].

3.1. Application of MSR

Prediction and identifying bugs: Predicting the occurrence of bugs remains one of the most active areas in software engineering research. By using MSR, it is possible to predict and localize bugs, so that managers can allocate testing resources appropriately, developers can review risky code more closely, and testers can prioritize their testing efforts.

Understanding Software Systems: Understanding large software systems remains a challenge for most software organizations. Documentation for large systems rarely exists and, if it exists, is often not up to date. Information stored in historical software repositories, such as mailing lists and bug repositories, represents a group memory for a project. Such information is very valuable for current members of a project.

Understanding Team Dynamics: Many large projects communicate through mailing lists, IRC channels, or instant messaging. These discussions cover many important topics
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 4 September 2020 doi:10.20944/preprints202008.0265.v2
such as plans, design decisions, project policies, and code or patch reviews. These discussions represent a rich source of historical information about the inner workings of large projects. Mining these discussions can help better understand the dynamics of large software development teams.

Propagating Changes: Change propagation is the process of propagating code changes to other entities of a software system to ensure the consistency of assumptions in the system after changing an entity. For example, a change to an interface may need to propagate to all the components that use that interface. Instead of using traditional dependency graphs to propagate changes, we could make use of historical co-changes. The intuition is that entities that co-changed frequently in the past are very likely to co-change in the future.

3.2. Data mining for Software Engineering

Data mining is the science of extracting useful knowledge from huge data repositories in order to use this knowledge in the decision process [6]. Data mining aims to discover hidden and useful patterns, as well as unsuspected and unknown relationships, in huge data sets. It uses machine learning, statistics, AI and database technology to provide reliable results.

3.2.1 Machine Learning

Tom Mitchell in his book Machine Learning [7] provides another definition: “The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience”. Just as humans learn from experience, machines equipped with machine learning can do the same. The goal of a machine learning process is to output a predictive model based on the data used in training. Depending on the nature of the business problem being addressed, there are different approaches based on the type and volume of the data: supervised learning, unsupervised learning and reinforcement learning [8].

3.2.2 Deep Learning

Deep learning focuses on a specific category of machine learning called Artificial Neural Networks, which is inspired by the functionality of the human brain. Modern deep learning provides a very powerful framework for supervised learning. By adding more layers and more units within a layer, a deep network can represent functions of increasing complexity. Most tasks that consist of mapping an input vector to an output vector, and that are easy for a person to do rapidly, can be accomplished via deep learning, given sufficiently large models and sufficiently large datasets of labeled training examples [9].

3.2.3 NLP Techniques for Data Preprocessing

NLP is the capacity of a computer program to comprehend human language [10]. Researchers use NLP techniques to preprocess data before applying information retrieval (IR) models to the data. Preprocessing steps include tokenization, splitting, stop word removal, stemming and pruning. IR overlaps with other fields, especially database technology and natural language processing (NLP). Information retrieval techniques and algorithms are also used in the recommender systems research field.

4. Related Works

Many researchers have discussed the effectiveness of using data mining techniques to facilitate the debugging process for software developers. This section presents an overview of the state of the art in this research area.

Xin et al. [11] presented a ranking approach that simulates the bug locating process used by developers. The ranking model benefits from domain knowledge such as API specifications, bug-fix history, and code-change history. "The ranking score of each source file is calculated as a weighted combination of an array of features". The experimental evaluation was done on six open source Java projects: Eclipse, JDT, Birt, SWT, Tomcat and AspectJ. Results showed that the ranking approach performs better than the BugLocator, VSM, and Usual Suspects approaches. Their method ranks the relevant files for over 70% of the bug reports
within the top 10 recommendations in the Eclipse and Tomcat projects.

Rafi et al. [12] proposed an automated approach for finding and ranking potentially relevant classes for bug reports. Their approach used a multi-objective optimization algorithm to find a balance between minimizing the number of recommended classes and maximizing the correctness of the proposed solution. The correctness of the recommended classes was estimated from the history of changes and bug fixes, and from the lexical similarity between the bug report description and the API documentation. They evaluated their system on six large open-source Java projects. The experimental results showed that the search-based approach was better than mono-objective formulations (LS and HS). Their search-based approach can find the true buggy methods for over 87% of the bug reports within the top 10 recommendations.

Lam et al. [13] presented an approach that integrates a deep neural network (DNN) with rVSM, an information retrieval (IR) technique, to locate and rank potentially relevant classes for bug reports. rVSM gathers the textual similarity feature between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files. Their approach was evaluated on real-world bug reports from open-source projects. Combining the DNN with rVSM achieved higher bug localization accuracy than the state-of-the-art IR and machine learning techniques.

An approach that benefits from the "crowd knowledge" available in Stack Overflow to aid developers in their activities was presented in [14]. This strategy recommends a ranked list of question-answer pairs from Stack Overflow based on a query. The ranking criteria are based on the textual similarity of the pairs with respect to the query, the quality of the pairs, and a filtering mechanism that considers only “how-to” posts. The authors conducted an experiment on programming problems from three different topics (Swing, Boost and LINQ) frequently used by the software development community. The results showed that for the Lucene+Score+How-to approach, 77.14% of the assessed activities have at least one recommended pair that proved useful for solving a programming problem.

Fabio et al. [15] observed that as developers abandon legacy developer forums for the Stack Overflow platform, a lot of crowd-sourced knowledge is at risk of being left behind. They aimed to add to the body of evidence of existing research on best-answer prediction. They ran an experiment using data from Stack Overflow to train a binary classifier, and then tested the classifier on a dataset retrieved from the legacy Docusign support forum. The findings showed that their model could find best answers with good accuracy when all features are enabled, e.g. answer upvotes, number of sentences and answer length. The results gave positive evidence towards the automatic migration of crowd-sourced knowledge from legacy forums to modern Q&A sites.

Jacob Perricone [16] utilizes the network structure of Stack Overflow to recommend a set of related questions for a given input question. In particular, the project employs a modified guided Personalized PageRank algorithm to generate candidate recommendations and compares the results to those recommended by Stack Overflow. Semantic similarity and tag overlap were used to assess candidate recommendations. For a given recommendation set, the average text, title, and tag-overlap scores were calculated. These scores were then averaged across all trials of the experiment to yield a final score.

Daniel S. Weld et al. [17] investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). They formulated the problem as predicting whether a code snippet is a standalone solution to a question. They proposed a novel Bi-View Hierarchical Neural Network that can capture both the programming content and the textual context of a code snippet (i.e., two views) to make a prediction. On two manually annotated datasets in the Python and SQL domains, the framework substantially outperforms heuristic methods with at least 15% higher F1 and accuracy. Furthermore, they presented StaQC (Stack Overflow Question-Code pairs), the largest dataset to date of ∼148K Python and ∼120K SQL
question-code pairs, automatically mined from SO using this framework.

Stefanie Beyer et al. [18] aim to automate the classification of SO posts into seven question categories. As a first step, they manually created a curated data set of 500 SO posts, classified into the seven categories. Using this data set, they applied machine-learning algorithms (Random Forest and Support Vector Machines) to build a classification model for SO questions. They then experimented with 82 different configurations regarding the preprocessing of the text and the representation of the input data. The best performing models classify posts into the correct question category with an average precision and recall of 0.88 and 0.87, respectively, when using Random Forest with the phrases indicating a question category as input data for training. The obtained model can be used to aid developers in browsing SO discussions or researchers in building recommenders based on SO.

Yun Zhang et al. [19] proposed a novel approach named RFEB, which recommends frequently encountered bugs (FEBugs) that may affect many other developers. RFEB analyzes Stack Overflow, the largest software engineering-specific Q&A community. Many of the questions posted on Stack Overflow describe different kinds of bugs and their solutions. Unfortunately, the search engine that comes with Stack Overflow is not able to identify FEBugs well. To address this limitation, RFEB uses an integrated and iterative approach that considers both the relevance and the popularity of Stack Overflow questions to identify FEBugs. To evaluate the performance of RFEB, they performed experiments on a dataset from Stack Overflow that contains more than ten million posts. Finally, they compared this model with Stack Overflow’s search engine on 10 domains; the experimental results show that RFEB achieves an average NDCG@10 score of 0.96, which improves on Stack Overflow’s search engine by 20%.
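For readers unfamiliar with the NDCG@10 metric quoted above, the following is a minimal sketch of how such a score can be computed. It assumes the common linear-gain formulation (relevance divided by log2 of rank + 1); the exact variant used in [19] is not stated here, and the relevance labels below are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels for one ranked result list
# (3 = highly relevant, 0 = irrelevant), in the order the system returned them.
ranked_labels = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranked_labels, k=10)
print(round(score, 3))
```

A score of 1.0 means the returned order already matches the ideal order; the average over many queries gives the figure reported for a whole system.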
| Ref | Year | Method used | Data set | Results | Performance evaluation |
|-----|------|-------------|----------|---------|------------------------|
| Xin et al. [11] | 2016 | Ranking model; VSM | Benchmark datasets from open source projects: Eclipse, JDT, Birt, SWT | The learning to rank approach achieved better results than BugLocator, VSM, and Usual Suspects on all six projects. | 1) Eclipse: accuracy = 80%, MAP = 0.44, MRR = 0.51. 2) JDT: accuracy = 80%, MAP = 0.39, MRR = 0.51. 3) Birt: accuracy = 50%, MAP = 0.16, MRR = 0.21. |
| Fabio et al. [15] | 2016 | Alternating Decision Trees (ADT) classifier + information gain (IG) | 1) Dataset from Docusign, a legacy forum. 2) Dataset from Stack Overflow. | The model can find best answers with good performance when all features are enabled. | Accuracy ~90%, F = 0.86, AUC = 0.71. |
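The MAP and MRR figures reported in the table are standard ranking metrics. As a rough illustration only, the sketch below computes both for binary relevance judgments. The rankings are hypothetical, and average precision is normalized here by the number of relevant items found in each list, which assumes every relevant item appears in the list.

```python
def reciprocal_rank(relevant_flags):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def average_precision(relevant_flags):
    """Mean of precision@rank taken at each rank that holds a relevant item."""
    hits, precisions = 0, []
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_over_queries(metric, rankings):
    """Average a per-query metric over all queries (e.g. all bug reports)."""
    return sum(metric(r) for r in rankings) / len(rankings)

# Hypothetical results for two bug reports: True marks a relevant file.
rankings = [
    [False, True, False, True],   # RR = 1/2, AP = (1/2 + 2/4) / 2 = 0.5
    [True, False, False, False],  # RR = 1,   AP = 1
]
mrr = mean_over_queries(reciprocal_rank, rankings)
mean_ap = mean_over_queries(average_precision, rankings)
print(mrr, mean_ap)
```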
The data mining techniques used in these studies differ: [15] and [18] used classification, [16] used a modified version of PageRank, [17] used a novel Bi-View Hierarchical Neural Network, and [19] used the novel RFEB approach to recommend frequently encountered bugs. The researchers in [13] and [14] suggested improving their approaches by leveraging additional types of domain knowledge and by using SVM ranking with nonlinear kernels. They evaluated their approaches on different datasets of programming language code. In this paper, we tried to benefit from these suggestions in the design and development of our proposed framework. Table 1 shows a summary of all these related studies.

5. Proposed Approach

Our framework uses the concept of recommender systems to model the problem of mining Stack Overflow in order to find a solution for a bug report.

5.1. Overall Framework

In the architecture of our model, a bug report is issued as input and a ranked list of related best answers is recommended as output. First, when a new bug report is received, the preprocessing step is started. The information extracted from the bug report is then formulated as a query and issued to the index of questions. A similarity measure between the query and a set of answers is calculated. The N best answers are then passed to the kernel of our model, and the list of proposed solutions is re-ranked by score. The learning to rank approach is applied to train a ranking model that uses many features extracted from the Stack Overflow dataset. Figure 2 shows the overall architecture of our model framework.
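The re-ranking stage described above can be illustrated with a small point-wise learning to rank sketch. This is not the paper's model: a plain logistic regression trained by stochastic gradient descent stands in for the deep network, and the three features (text similarity, normalized votes, accepted flag), the training rows, and the candidate answers are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, features):
    """Relevance score in (0, 1) for one (bug report, answer) feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_pointwise(examples, epochs=500, lr=0.5, seed=0):
    """Fit a logistic scorer on (features, label) pairs with plain SGD."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            error = score(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def rerank(weights, bias, candidates):
    """Sort (answer_id, features) pairs by predicted relevance, best first."""
    return sorted(candidates,
                  key=lambda c: score(weights, bias, c[1]), reverse=True)

# Hypothetical features: [text similarity, normalized votes, accepted flag];
# label 1 means the answer solved the bug report, 0 means it did not.
training = [
    ([0.9, 0.8, 1.0], 1), ([0.7, 0.6, 1.0], 1), ([0.6, 0.7, 1.0], 1),
    ([0.8, 0.1, 0.0], 0), ([0.2, 0.9, 0.0], 0), ([0.3, 0.2, 0.0], 0),
]
weights, bias = train_pointwise(training)

# Top-N candidates from the retrieval stage, re-ranked by the learned scorer.
candidates = [("a1", [0.9, 0.1, 0.0]), ("a2", [0.6, 0.7, 1.0]),
              ("a3", [0.2, 0.3, 0.0])]
reranked = rerank(weights, bias, candidates)
print([answer_id for answer_id, _ in reranked])
```

The point of the two-stage design is that the cheap similarity search narrows the candidates, while the learned scorer can weigh signals beyond text overlap, such as votes or acceptance.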
Stemming refers to the procedure of transforming different variations of the same word to their stem, usually by stripping suffixes and prefixes (such as ed, ize, s, ing in English). Although stemming is a very commonly used process in information retrieval, it might cause false matching of some words with different stems to the same root.

Based on the cosine similarity between the bug report and each question, the top N questions are ranked. The system collects the answers that are directly related to these questions from the SO database and returns the whole set of answers, which is used in turn as input to the ranking model. The learning to rank process generates a new ordered set of answers by re-ranking them by their relevance score with respect to the original bug report.
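The retrieval step, ranking questions by cosine similarity to the bug report, can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenizer, the smoothed IDF formula, and the example questions are hypothetical, and stop word removal and stemming are omitted for brevity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unknown to the index are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def build_tfidf_index(documents):
    """Return the IDF table and one TF-IDF vector per indexed question."""
    tokenized = [tokenize(d) for d in documents]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n_docs = len(tokenized)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n_questions(bug_report, documents, n=2):
    """Rank indexed questions by similarity to the bug report; keep the top n."""
    idf, vectors = build_tfidf_index(documents)
    query = tfidf_vector(tokenize(bug_report), idf)
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical question titles standing in for the Stack Overflow index.
questions = [
    "NullPointerException when calling method on null object in Java",
    "How to center a div with CSS",
    "Java ArrayList IndexOutOfBoundsException when removing items in loop",
]
bug = "App crashes with NullPointerException when calling method on a null object"
top = top_n_questions(bug, questions, n=2)
print(top)
```

In a full pipeline, the answers attached to the returned question indices would then be fetched and handed to the ranking model.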
5.3. Building TF-IDF Index of questions
problem is defined as a derivation of ordering over a list of examples that maximizes the utility of the entire list [21]. This approach may look very similar to a classification or regression problem, but ranking problems are fundamentally different. While the goal of classification or regression is to predict a label or a value for each individual document, the goal of ranking is to optimally sort the entire example list so that the examples with the highest relevance are presented first.

In this way, we can naturally apply learning to rank to build ranking models that recommend solutions for bug reports based on Stack Overflow data. A learning to rank algorithm has two stages: the learning stage and the deployment (or testing) stage. The main task in this recommendation system is to train a ranking model f(Br_i, A_j), where Br_i represents a bug report and A_j represents an associated answer from the Stack Overflow community.

Recently, deep learning approaches have achieved better results than previous machine learning algorithms on tasks such as image classification, natural language processing, face recognition and text mining. Deep learning techniques have a high capacity to solve complex, non-linear problems. Neural networks can effectively incorporate sparse features like query or document text. Based on these advantages, we opted for a deep learning model to solve our ranking problem by training it on the textual data of the Stack Overflow dataset.

6. Conclusion

This work aims to develop and propose a recommender system based on the learning to rank approach and deep learning techniques. Our main idea was to investigate the use of data mining on Stack Overflow to automatically suggest relevant solutions that fix software bugs and programming errors. This system will decrease the time developers generally spend on manual bug fixing or on consulting Q&A websites.

We started our research by exploring a new area of data mining: mining software repositories. This new field will have an important impact on the future of software development. We also provided an overview of text mining, NLP, recommender systems and deep learning techniques.

As our contribution, we proposed an architecture for a recommender system that contains a baseline stage based on a TF-IDF index and a learning to rank model based on deep learning to recommend relevant solutions for programming errors and bug reports.

For future work, we will test the learning to rank approach using pair-wise or list-wise techniques. We will also try to improve our model's performance using more specific features in the training phase. Other deep learning algorithms and techniques such as CNN and LSTM will also be considered for future testing and comparison.

7. REFERENCES

[1] A. E. Hassan, “The road ahead for Mining Software Repositories,” in Frontiers of Software Maintenance, Oct 2008, pp. 48–57.
[2] A. T. Nguyen, T. T. Nguyen, J. Al-Kofahi, H. V. Nguyen, and T. N. Nguyen. A topic-based approach for narrowing the search space of buggy files from a bug report. In Proceedings of the 26th International Conference on Automated Software Engineering, pages 263–272, 2011.
[3] Crowdsourcing. Wikipedia [Online]. 2020. [Accessed July 17, 2020]. Available on: https://en.wikipedia.org/wiki/Crowdsourcing.
[4] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider. Mining Duplicate Questions in Stack Overflow. In Proc. of MSR 2016.
[5] Mining Software Repositories. Wikipedia [Online]. 2020. [Accessed July 17, 2020]. Available on: https://en.wikipedia.org/wiki/Mining_software_repositories
[6] Data Mining Curriculum: A Proposal. KDD [Online]. 2020. [Accessed July 17, 2020]. Available on: https://www.kdd.org/curriculum/index.html
[7] M. I. Jordan, T. M. Mitchell, Machine learning: Trends, perspectives, and prospects. Science 349, 255–260 (2015).
[8] Data science and machine learning. IBM Analytics [Online]. 2020. [Accessed July 17, 2020]. Available on: https://www.ibm.com/analytics/machine-learning.
[9] Jeff Heaton. “Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning - The MIT Press, 2016, 800 pp, ISBN: 0262035618”. In: Genetic Programming and Evolvable Machines 19.1-2 (2018).
[10] Manning, C. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing, MIT Press. May 1999.
[11] Ye, Xin & Shen, Hui & Ma, Xiao & Bunescu, Razvan & Liu, Chang. (2016). From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering. The 38th International Conference on Software Engineering, ICSE ’16, May 14-22, 2016, Austin, TX, USA.
[12] Almhana, Rafi & Mkaouer, Mohamed Wiem & Kessentini, Marouane & Ouni, Ali. (2016). Recommending relevant classes for bug reports using multi-objective search. ASE 2016: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, August 2016, Pages 286–295, Singapore.
[13] N. Lam, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, “Bug localization with combination of deep learning and information retrieval,” in Program Comprehension (ICPC), 2017 IEEE/ACM 25th International Conference on. IEEE, 2017, pp. 218–229.
[19] Y. Zhang, D. Lo, X. Xia, J. Jiang, and J. Sun. Recommending frequently encountered bugs. In International Conference on Program Comprehension, Gothenburg, Sweden, 2018.
[20] T. Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
[21] H. Li, Learning to Rank for Information Retrieval and Natural Language Processing, San Mateo, CA, USA: Morgan & Claypool Publishers, 2011.