Preprints202008 0265 v2

This paper proposes a recommender system model for software bug resolution using data from Stack Overflow, aiming to automate and streamline the debugging process for developers. It discusses the importance of mining software repositories and the application of machine learning and deep learning techniques to extract relevant information and improve bug-fixing efficiency. The study highlights the potential of leveraging unstructured data from platforms like Stack Overflow to enhance software development practices.


Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 4 September 2020 doi:10.20944/preprints202008.0265.v2

Mining Stack Overflow: a Recommender Systems-Based Model

Fouzi Harrag
Computer Sciences Department, College of Sciences, Ferhat Abbas University, Setif, Algeria

Mokdad Khamliche
Computer Sciences Department, College of Sciences, Ferhat Abbas University, Setif, Algeria

Abstract – In software development, developers receive bug reports that describe software bugs. Developers find the cause of a bug by reviewing the code and reproducing the abnormal behavior, which are tedious and time-consuming processes. Developers need an automated system that incorporates broad domain knowledge and recommends solutions for those bugs, sparing them the manual effort of fixing the bugs or of waiting on Q&A websites for other users to reply. Stack Overflow is a popular question-answer site focused on programming issues, so we can benefit from the knowledge available on this rich platform. This paper presents a survey covering the methods in the field of mining software repositories. We propose an architecture to build a recommender system using the learning to rank approach. Deep learning is used to construct a model that solves the learning to rank problem using Stack Overflow data. Text mining techniques were employed to extract, evaluate and recommend the answers that are most relevant to solving the bug report.

Keywords – Recommender System, learning to rank, Mining software repositories, Text Mining, Deep learning,
Stack Overflow.

1. INTRODUCTION

In the software development area, there is a huge amount of unstructured data that grows rapidly every day. This data exists at different levels and in the various systems used in the software development process, such as versioning systems, issue trackers, archived communication systems and many other repositories. Mining this rich software engineering data is a modern field of the data mining domain that opens big doors for researchers and developers. Indeed, investigating such data could revolutionize development and maintenance activities.

This paper presents the state of the art for this research topic. Section 2 covers the definitions and an overview of software repositories and describes each type of repository. We also give a global presentation of the Stack Overflow platform, which we consider in our case study as a knowledge repository. Section 3 summarizes the state of the modern mining software repositories area and gives definitions of data mining, recommender systems and NLP techniques. Section 4 presents an overview of related works in the field of mining software repositories, with a discussion of the different proposed approaches. Section 5 is dedicated to the presentation of our proposed approach. We focus on the concepts and techniques of machine and deep learning that will be used in the construction of our framework.

2. Software data

Unstructured data refers to information that is not organized according to a precise schema or structure. Such data often include text (e.g., email messages, software documentation, and code comments) and multimedia (e.g., video tutorials, presentations) content. These kinds of data are estimated to represent 80% of the overall information created and used by enterprises in software projects [1]. A large number of artifacts are generated during the development process, and this data keeps growing over time. The stores that hold these artifacts are called software repositories.

© 2020 by the author(s). Distributed under a Creative Commons CC BY license.

2.1. Software repositories

Ahmed E. Hassan [1] defines a software repository as a record-keeping database that stores data about the artifacts of a complex computer-based system. It tracks the changes applied to the artifacts and stores the corresponding metadata. Source control repositories, bug repositories, archived communications, deployment logs, and code repositories are examples of software repositories that are commonly available for most software projects.

2.1.1 Types of software repositories

Based on the type of information they hold and their purpose, software repositories can be divided into the following types:
• Historical repositories: record information about the evolution and progress of a project. This type of repository can contain traces of all changes to the source code, the historical bug reports, and the communication within the development team, for example: version control systems (CVS, SVN, Git, Mercurial), bug repositories (Bugzilla, JIRA), mailing lists (e-mails, wiki pages) and development collaboration sites (Stack Overflow).
• Code repositories: contain the source code of various applications developed by several development teams, for example code bases (SourceForge, Google Code) and project ecosystems (GitHub).
• Runtime repositories: contain information about the execution and usage of an application, for example crash reports, field logs, and execution traces.
• Mobile application repositories: contain the logs and bug reports of mobile applications and user feedback, for example app stores (Google Play Store, Apple App Store) and mobile app user feedback (reviews, ratings).

Figure 1 shows examples of the current and historical artifacts and interactions that are recorded in software repositories.

Figure 1: Types of Software Repositories [2]

2.1.2 Importance of Software repositories

• Software repositories contain a wealth of valuable information about software projects, which is why they are considered a first-class source of information.
• Software engineering is becoming an increasingly data-driven discipline, lowering the dependency on the intuition and experience of developers.
• Applying mining techniques to software repositories saves a lot of time and cost and increases the productivity of software development and maintenance.

2.1.3 Stack Overflow as a Knowledge Repository

Social media, unlike traditional media, gives people an easy way to communicate, collaborate and share information with each other. There are many types of social media, such as blogs, microblogs, bookmarking sites, social news, media sharing and social networks. Nowadays, social media is turning into social productivity through individuals contributing their ideas, experiences and skills to generate content. This collective work is called crowdsourcing. D. Brayvold [3] defined crowdsourcing as "the process of getting work or funding, usually online, from a crowd of people". It generates content through the participation of a large group of people who contribute their skills, ideas, and knowledge. Crowdsourcing can help companies and individuals carry out their tasks at a low cost compared to employing specialists, while also providing a large number of people who are ready to work at any time. One of the most important crowdsourcing platforms is Stack Overflow.
Stack Overflow relies on the crowd to accumulate and construct quality developer-related knowledge. Extracting knowledge from Stack Overflow using data mining is therefore of major interest to the community. In the next section, we outline and clarify the concept of mining software repositories.

2.1.4 What is Stack Overflow?

Ahasanuzzaman et al. [4] define Stack Overflow as "a popular question answering site that is focused on programming problems". In this community, users can ask questions, provide answers to the questions asked, mark questions as favorites, upvote or downvote an answer, tag questions, and carry out other community-related tasks. Programmers have actively used it to ask questions since January 2009. Today there are more than 18 million questions, 27 million answers, and 10 million users.

In addition, Stack Overflow provides mechanisms for tagging and for earning reputation and badges, which unlock more privileges in the community. These mechanisms are explained below:

Tags: Stack Overflow allows users to tag each question with up to five tags. Users can select an existing tag offered in the autocomplete text box or create a new one. To create a new tag, users need a minimum level of reputation on Stack Overflow. This ensures that only experienced users can create new tags, which maintains consistency among the tags found on Stack Overflow. Experienced users can also change a question's tags if they find it incorrectly tagged.

User Reputation: Stack Overflow provides a metric called reputation to rank its users. Reputation is an approximate measure of how much the community trusts a user; it is earned when peers appreciate what a user contributes. Users do not need reputation for basic site functionality such as asking questions and providing answers; however, users with a high reputation score gain more privileges. The primary way to gain reputation is by posting good questions and useful answers. Votes on these posts cause the user to gain (or sometimes lose) reputation. The maximum number of reputation points that can be earned in a day is 200, which ensures that the reputation a user gains reflects active and consistent participation in the site's activities.

Privileges: are the accesses that a user unlocks each time they reach a certain reputation. For example, a user can upvote a post only after reaching 15 reputation points. The highest privilege gives the user access to the Google Analytics data of the site.

Badges: a user earns badges when they perform a specific predefined action. For example, the Supporter badge is earned when a user upvotes for the first time.

3. Mining Software repositories (MSR)

The field of mining software repositories aims at examining and analyzing "the rich data available in software repositories to uncover interesting and actionable information about software projects and systems" [5].

3.1. Application of MSR

• Predicting and Identifying Bugs: predicting the occurrence of bugs remains one of the most active areas of software engineering research. Using MSR, it is possible to predict and localize bugs, so that managers can allocate testing resources appropriately, developers can review risky code more closely, and testers can prioritize their testing efforts.
• Understanding Software Systems: understanding large software systems remains a challenge for most software organizations. Documentation for large systems rarely exists, and when it does, it is often not up to date. Information stored in historical software repositories, such as mailing lists and bug repositories, represents a group memory for a project. Such information is very valuable for current members of a project.
• Understanding Team Dynamics: many large projects communicate through mailing lists, IRC channels, or instant messaging. These discussions cover many important topics
such as plans, design decisions, project policies, and code or patch reviews. These discussions represent a rich source of historical information about the inner workings of large projects. Mining these discussions can help to better understand the dynamics of large software development teams.
• Propagating Changes: change propagation is the process of propagating code changes to other entities of a software system to ensure the consistency of assumptions in the system after an entity is changed. For example, a change to an interface may require the change to propagate to all the components that use that interface. Instead of using traditional dependency graphs to propagate changes, we could make use of historical co-changes. The intuition is that entities that frequently co-changed in the past are very likely to co-change in the future.

3.2. Data mining for Software Engineering

Data mining is the science of extracting useful knowledge from huge data repositories in order to use this knowledge in the decision process [6]. Data mining aims to discover hidden and useful patterns in huge data sets, that is, unsuspected and previously unknown relationships among the data. Data mining uses machine learning, statistics, AI and database technology to provide reliable results.

3.2.1 Machine Learning

Tom Mitchell, in his book Machine Learning [7], provides the following definition: "The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience". Just as humans can learn from experience, machines can do the same through machine learning. The goal of a machine learning process is to output a predictive model based on the data used in training. Depending on the nature of the business problem being addressed, and on the type and volume of the data, there are different approaches: supervised learning, unsupervised learning and reinforcement learning [8].

3.2.2 Deep Learning

Deep learning focuses on a specific category of machine learning called artificial neural networks, which are inspired by the functioning of the human brain. Modern deep learning provides a very powerful framework for supervised learning. By adding more layers and more units within a layer, a deep network can represent functions of increasing complexity. Most tasks that consist of mapping an input vector to an output vector, and that are easy for a person to do rapidly, can be accomplished via deep learning, given sufficiently large models and sufficiently large datasets of labeled training examples [9].

3.2.3 NLP Techniques for Data Preprocessing

NLP is the capacity of a computer program to comprehend human language [10]. Researchers use NLP techniques to preprocess data before applying information retrieval (IR) models to it. Preprocessing steps include tokenization, splitting, stop word removal, stemming and pruning. IR overlaps with other fields, especially database technology and natural language processing (NLP). IR techniques and algorithms are also used in the recommender systems research field.

4. Related Works

Many researchers have discussed the effectiveness of using data mining techniques to facilitate the debugging process for software developers. This section presents an overview of the state of the art of this research area.

Xin et al. [11] presented a ranking approach that simulates the bug-locating process used by developers. The ranking model benefits from domain knowledge such as API specifications, bug-fix history, and code-change history. "The ranking score of each source file is calculated as a weighted combination of an array of features". The experimental evaluation was done on six open-source Java projects: Eclipse, JDT, Birt, SWT, Tomcat and AspectJ. Results showed that the ranking approach is better than the BugLocator, VSM, and Usual Suspects approaches. Their method assigns the relevant files for over 70% of the bug reports
within the top 10 recommendations in the Eclipse and Tomcat projects.

Rafi et al. [12] proposed an automated approach for finding and ranking potentially relevant classes for bug reports. Their approach used a multi-objective optimization algorithm to find a balance between minimizing the number of recommended classes and maximizing the correctness of the proposed solution. The correctness of the recommended classes was estimated from the history of changes and bug fixes, and from the lexical similarity between the bug report description and the API documentation. They evaluated their system on six large open-source Java projects. The experimental results showed that the search-based approach was better than the mono-objective formulations (LS and HS). Their search-based approach can find the true buggy methods for over 87% of the bug reports within the top 10 recommendations.

Lam et al. [13] presented an approach integrating a deep neural network (DNN) with rVSM, an information retrieval (IR) technique, to locate and rank potentially relevant classes for bug reports. rVSM captures the textual similarity feature between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files. Their approach was evaluated on real-world bug reports from open-source projects. Combining the DNN with rVSM achieved higher bug localization accuracy than state-of-the-art IR and machine learning techniques.

An approach that benefits from the "crowd knowledge" available in Stack Overflow to aid developers in their activities was presented in [14]. This strategy recommends a ranked list of question-answer pairs from Stack Overflow based on a query. The ranking criteria are based on the textual similarity of the pairs with respect to the query, the quality of the pairs, and a filtering mechanism that considers only "how-to" posts. The authors conducted an experiment on programming problems in three different topics (Swing, Boost and LINQ) frequently used by the software development community. The results showed that, for the Lucene+Score+How-to approach, 77.14% of the assessed activities have at least one recommended pair that proved useful for solving a programming problem.

Fabio et al. [15] discussed how, as developers abandon legacy developer forums for the Stack Overflow platform, a lot of crowd-sourced knowledge is at risk of being left behind. They aimed to add to the body of evidence of existing research on best-answer prediction. They ran an experiment using data from Stack Overflow to train a binary classifier, and then tested the classifier on a dataset retrieved from the legacy DocuSign support forum. The findings showed that their model could find best answers with good accuracy when all features are enabled, e.g., answer upvotes, number of sentences and answer length. The results gave positive evidence towards the automatic migration of crowd-sourced knowledge from legacy forums to modern Q&A sites.

Jacob Perricone [16] utilizes the network structure of Stack Overflow to recommend a set of related questions for a given input question. In particular, the project employs a modified guided Personalized PageRank algorithm to generate candidate recommendations and compares the results to those recommended by Stack Overflow. Semantic similarity and tag overlap were used to assess the candidate recommendations. For a given recommendation set, the average text, title, and tag-overlap scores were calculated. These scores were then averaged across all trials of the experiment to yield a final score.

Daniel S. Weld et al. [17] investigate a new problem: systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). They formulated the problem as predicting whether a code snippet is a standalone solution to a question. They proposed a novel Bi-View Hierarchical Neural Network that can capture both the programming content and the textual context of a code snippet (i.e., two views) to make a prediction. On two manually annotated datasets in the Python and SQL domains, the framework substantially outperforms heuristic methods, with at least 15% higher F1 and accuracy. Furthermore, they presented StaQC (Stack Overflow Question-Code pairs), the largest dataset to date of ∼148K Python and ∼120K SQL
question-code pairs, automatically mined from SO using this framework.

Stefanie Beyer et al. [18] aim to automate the classification of SO posts into seven question categories. As a first step, they manually created a curated data set of 500 SO posts classified into the seven categories. Using this data set, they applied machine learning algorithms (Random Forest and Support Vector Machines) to build a classification model for SO questions. They then experimented with 82 different configurations regarding the preprocessing of the text and the representation of the input data. Their best-performing models classify posts into the correct question category with an average precision and recall of 0.88 and 0.87 when using Random Forest and the phrases indicating a question category as input data for training. The obtained model can be used to aid developers in browsing SO discussions or researchers in building recommenders based on SO.

Yun Zhang et al. [19] proposed a novel approach named RFEB, which recommends frequently encountered bugs (FEBugs) that may affect many other developers. RFEB analyzes Stack Overflow, the largest software-engineering-specific Q&A community. Many of the questions posted on Stack Overflow provide descriptions and solutions of different kinds of bugs. Unfortunately, the search engine that comes with Stack Overflow is not able to identify FEBugs well. To address this limitation, they propose RFEB, an integrated and iterative approach that considers both the relevance and the popularity of Stack Overflow questions to identify FEBugs. To evaluate the performance of RFEB, they performed experiments on a Stack Overflow dataset containing more than ten million posts. Finally, they compared their model with Stack Overflow's search engine on 10 domains; the experimental results show that RFEB achieves an average NDCG@10 score of 0.96, improving on Stack Overflow's search engine by 20%.
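RFEB and [14] report their results as NDCG. As a rough illustration of what an NDCG@10 score measures, the Python sketch below computes NDCG@k for a single ranked list of graded relevance labels. It assumes the common formulation with gain equal to the raw grade and a log2(rank + 1) discount; the cited studies may use a different gain (for example 2^rel - 1), so its numbers are not comparable to the reported scores, and the labels in the example are invented.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items; the item at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering; 0.0 when nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical graded labels for one ranked list (3 = highly relevant, 0 = irrelevant).
    ranked_labels = [3, 2, 0, 1, 0]
    print(f"NDCG@10 = {ndcg_at_k(ranked_labels, 10):.4f}")
```

A perfectly ordered list scores exactly 1.0; in the example, placing one irrelevant result above a relevance-1 result lowers the score to about 0.985, showing how NDCG rewards putting the most relevant results first. An average NDCG@10 such as RFEB's is the mean of this per-query value over all evaluated queries.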

Table 1: Summary of related studies

Ref | Year | Method used | Data set | Results | Performance evaluation
Xin et al. [11] | 2016 | Ranking model; VSM | Benchmark datasets from open-source projects: Eclipse, JDT, Birt, SWT | The learning to rank approach achieved better results than BugLocator, VSM and Usual Suspects on all six projects | 1) Eclipse: accuracy = 80%, MAP = 0.44, MRR = 0.51; 2) JDT: accuracy = 80%, MAP = 0.39, MRR = 0.51; 3) Birt: accuracy = 50%, MAP = 0.16, MRR = 0.21
Rafi et al. [12] | 2016 | Multi-objective algorithm called NSGA-II; search-based algorithms | Benchmark datasets from open-source systems: EclipseIU, Tomcat, AspectJ, Birt, SWT, JDT | NSGA-II is better than random search and the three mono-objective formulations (lexical-based similarity (LS), history-based similarity (HS), and GA) on all six systems | 1) EclipseIU: precision = 82%, recall = 79%, accuracy = 80%; 2) Tomcat: precision = 91%, recall = 81%, accuracy = 90%; 3) AspectJ: precision = 79%, recall = 86%, accuracy = 88%
Lam et al. [13] | 2017 | Revised Vector Space Model (rVSM) + Deep Neural Network (DNN) | Benchmark datasets from open-source projects: AspectJ, Birt, Eclipse Platform, JDT, SWT, Tomcat | DNNLOC achieves the highest accuracy with the combination of relevancy via DNN, textual similarity via rVSM, and the metadata features | 1) Tomcat: accuracy = 80.4%, MRR = 0.60, MAP = 0.52; 2) AspectJ: accuracy = 85%, MRR = 0.52, MAP = 0.32
Eduardo et al. [14] | 2016 | Logistic regression (LR) classifier + Normalized Discounted Cumulative Gain (NDCG) | Dataset from Stack Overflow (March 2013 version) | The Lucene+Score+How-to approach achieved better performance than Google on Boost | NDCG_Relev = 0.3583, NDCG_Reprod = 0.5243
Fabio et al. [15] | 2016 | Alternating Decision Trees (ADT) classifier + information gain (IG) | 1) Dataset from DocuSign, a legacy forum; 2) Dataset from Stack Overflow | The model can find best answers with good performance when all features are enabled | accuracy ≈ 90%, F = 0.86, AUC = 0.71
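Table 1 compares the studies mainly by accuracy, MAP and MRR. The Python sketch below shows how these metrics are commonly computed from ranked results. It assumes that "accuracy" means top-k accuracy (the share of bug reports with at least one relevant file among the top k results), which is typical in bug localization work but is not stated in the table, and it uses the standard definition of average precision normalized by the number of relevant items. The file names are invented for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One evaluated query: (ranked list of retrieved items, set of truly relevant items).
Query = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if no relevant item was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@rank at each relevant item's rank, divided by |relevant|."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant item appears among the top k results."""
    return any(item in relevant for item in ranked[:k])


def evaluate(queries: List[Query], k: int = 10) -> Dict[str, float]:
    """Top-k accuracy, MAP and MRR averaged over all queries."""
    n = len(queries)
    return {
        f"accuracy@{k}": sum(hit_at_k(r, rel, k) for r, rel in queries) / n,
        "MAP": sum(average_precision(r, rel) for r, rel in queries) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in queries) / n,
    }


if __name__ == "__main__":
    # Hypothetical bug reports: ranked source files vs. the files actually fixed.
    runs: List[Query] = [
        (["a.java", "b.java", "c.java"], {"b.java"}),
        (["d.java", "e.java"], {"d.java", "e.java"}),
        (["f.java"], {"g.java"}),
    ]
    print(evaluate(runs, k=10))
```

MRR only looks at the first relevant hit, while MAP rewards retrieving all relevant files early, which is why a study can report a relatively high MRR alongside a lower MAP, as several rows of Table 1 do.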

4.1. Discussion

We noted from the preceding state of the art that most studies, such as [11], [12] and [13], provide a ranking approach that leverages domain knowledge to locate a bug by ranking all the source files likely to contain its cause. The researchers used the same benchmark datasets for evaluation in [11], [12] and [13]. There is a similarity between [13] and [14]: both of them used a vector space model for ranking, whereas [12] used a multi-objective algorithm called NSGA-II and [13] used a Revised Vector Space Model (rVSM) and a Deep Neural Network (DNN).
The data mining techniques used in these studies differ: [15] and [18] used classification, [16] used a modified version of PageRank, [17] used a novel Bi-View Hierarchical Neural Network, and [19] used a novel approach, RFEB, to recommend frequently encountered bugs. The researchers in [13] and [14] suggested improving their approaches by leveraging additional types of domain knowledge and by using SVM ranking with nonlinear kernels. They evaluated their approaches on datasets of code in different programming languages. In this paper, we tried to benefit from these suggestions in the design and development of our proposed framework. Table 1 shows a summary of all these related studies.

5. Proposed Approach

Our framework uses the concept of recommender systems to model the problem of mining Stack Overflow in order to find a solution for a bug report.

5.1. Overall Framework

In the architecture of our model, a bug report is issued as input, and a ranked list of related best answers is recommended as output. First, when a new bug report is received, the preprocessing step starts. The information extracted from the bug report is then formulated as a query and issued to the index of questions. A similarity measure between the query and a set of answers is calculated. The N best selected answers are then passed to the kernel of our model, and the list of proposed solutions is re-ranked by score. The learning to rank approach is applied to train a ranking model that uses many features extracted from the Stack Overflow dataset. Figure 2 shows the overall architecture of our model framework.

Figure 2: The overall architecture of our model framework
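The retrieval stage of this architecture, detailed in Sections 5.2 to 5.4, can be sketched in a few dozen lines of Python. The sketch below preprocesses text, builds a TF-IDF index over questions with equations (1) and (2), scores each question against the bug report with cosine similarity (3), keeps the top N questions and re-ranks their answers. Several parts are assumptions rather than the authors' implementation: the tiny stop-word list, the toy `naive_stem` (not a real Porter stemmer), the `(question_id, text, votes)` answer layout, and above all the hand-set `WEIGHTS`, which are only a placeholder for the deep learning to rank model the paper trains.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative stop-word list (assumption); a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "for", "and",
              "with", "when", "how", "it", "i", "my"}
TOKEN_RE = re.compile(r"[a-z0-9_]+")
# Hypothetical hand-set weights standing in for the learned ranking model (Section 5.6).
WEIGHTS = {"question_sim": 0.5, "answer_sim": 0.3, "log_votes": 0.2}


def naive_stem(word: str) -> str:
    """Toy suffix stripper (not Porter): drops ing/ize/ed/s if at least 3 characters remain."""
    for suffix in ("ing", "ize", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Lower-case, tokenize (punctuation is dropped), remove stop words, then stem."""
    return [naive_stem(t) for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Equation (2): IDF(t, D) = log(|D| / DF(t, D)), computed over the question index."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Equation (1) as a sparse vector; terms absent from the index get weight 0."""
    return {t: n * idf.get(t, 0.0) for t, n in Counter(tokens).items()}


def cosine(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Equation (3): dot product divided by the product of the two vector norms."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def recommend(bug_report: str, questions: Dict[int, str],
              answers: List[Tuple[int, str, int]], n: int = 2) -> List[Tuple[float, int, str]]:
    """Retrieve the top-n questions for a bug report, then re-rank their answers."""
    q_tokens = {qid: preprocess(text) for qid, text in questions.items()}
    idf = idf_table(list(q_tokens.values()))          # TF-IDF index of questions
    query = tfidf(preprocess(bug_report), idf)         # the bug report is the query
    q_sim = {qid: cosine(query, tfidf(toks, idf)) for qid, toks in q_tokens.items()}
    top_n = set(sorted(q_sim, key=lambda qid: q_sim[qid], reverse=True)[:n])
    ranked = []
    for qid, text, votes in answers:                   # answers: (question_id, text, votes)
        if qid not in top_n:
            continue
        features = {"question_sim": q_sim[qid],
                    "answer_sim": cosine(query, tfidf(preprocess(text), idf)),
                    "log_votes": math.log1p(max(votes, 0))}
        score = sum(WEIGHTS[name] * value for name, value in features.items())
        ranked.append((score, qid, text))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    questions = {1: "NullPointerException when calling a method on a null object in Java",
                 2: "How to center a div with CSS",
                 3: "Java NullPointerException thrown in ArrayList loop"}
    answers = [(1, "Check the object for null before calling the method", 12),
               (1, "Use Optional to avoid null references", 3),
               (2, "Use flexbox with justify-content center", 40),
               (3, "Initialize the ArrayList before the loop to avoid the NullPointerException", 7)]
    bug = "App crashes with NullPointerException in Java when calling method"
    for score, qid, text in recommend(bug, questions, answers, n=2):
        print(f"{score:.3f}  q{qid}  {text}")
```

On the toy data, the CSS question shares no indexed terms with the bug report, so its highly voted answer never reaches the re-ranking step. Among the remaining answers, the fixed vote weight strongly influences the order, which illustrates why the feature weights should be learned from training data rather than set by hand.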

5.2. Data Preprocessing

Data preprocessing is the most important task in the data mining process, particularly in the text mining field, where we deal with a huge amount of raw textual data. This phase has a great impact on the performance of the model. In our framework, text mining and NLP techniques are applied to two kinds of data:
• SO data: contains a large amount of textual data presented as posts (questions and answers) and metadata about users. The goal of this preprocessing stage is to prepare the post data for the task of index building. This index will be used to train the ranking model.
• Bug report: contains a lot of information about the software bug, provided by a developer or programmer. In our system, this data represents what we call the context (or query).

5.2.1 Dataset Cleaning

Before performing any type of preprocessing on the dataset, it is necessary to clean the data. We filter our data by deleting the unwanted parts.

5.2.2 Tokenization

This process splits the sequence of strings into words. It removes all punctuation marks from the input text and returns words as tokens.

5.2.3 Stop word elimination

One of the fundamental ideas in information retrieval is the removal of common words that appear frequently in documents but do not provide helpful information for the users' needs. If kept, these words, called stop words, can reduce retrieval effectiveness.

5.2.4 Stemming

Stemming refers to the procedure of transforming different variations of the same word into their stem, usually by stripping suffixes and prefixes (such as ed, ize, s, ing in English). Although stemming is a very commonly used process in information retrieval, it might cause false matches by reducing words with different meanings to the same root.

5.3. Building TF-IDF Index of questions

The vector-space representation is a framework for representing raw or unstructured documents as vectors of terms. Using this idea, the vector of term weights of each bug report or question in the Stack Overflow dataset can be represented as a one-row matrix of its textual features. The goal of building this index is to calculate the similarity between the bug report, as a query, and a set of questions from the SO dataset. The index of documents is searched to retrieve and rank the first N questions with the highest similarity. The vast majority of work using the vector-space representation makes use of the TF-IDF representation of documents. TF-IDF is a heuristic metric used as a weight for each term feature of a given document:

$$\mathrm{TFIDF}(t, D) = \mathrm{TF}(t) \cdot \mathrm{IDF}(t, D) \qquad (1)$$

TF(t) refers to the frequency of the term t in the current document d.

$$\mathrm{IDF}(t, D) = \log \frac{|D|}{\mathrm{DF}(t, D)} \qquad (2)$$

DF(t, D) refers to the number of documents in which the term t appears, |D| is the total number of documents, and IDF(t, D) is the inverse document frequency, which measures how rare the term t is across the whole set of documents D.

5.4. Similarity Calculation

Once a bug report is submitted, the system searches for the N most similar documents in the knowledge base. To rank the results, the system needs a metric that measures how similar a pair of documents (a bug report and a question) is. To score the similarity between a bug report and a question, we used cosine similarity, a standard measure of document similarity in the vector space model. It is the cosine of the angle between two vectors; since TF-IDF weights are non-negative, it ranges from 0 (orthogonal vectors, no shared terms) to 1 (vectors pointing in the same direction). The cosine similarity between two vectors is their dot product divided by the product of their magnitudes:

$$\mathrm{Cosine}(Q, D) = \frac{Q \cdot D}{\lVert Q \rVert \, \lVert D \rVert} \qquad (3)$$

Based on the cosine similarity between the bug report and each question, the top N questions are ranked. The system collects the answers that are directly related to these questions from the SO database and returns the whole set of answers, which is then used as input to the ranking model. The learning to rank process generates a new ordered set of answers by re-ranking them according to their score of pertinence to the original bug report.

5.5. Features Extraction

To select the N most relevant answers, our learning model ranks answers based on a heterogeneous set of dense and sparse features. In our model, we focus on textual/embedding features of question-answer pairs and on dense features based on three kinds of aspects: textual, community and affective.

5.6. Learning to rank model

Learning to rank in the context of Information Retrieval (IR) is the task of automatically constructing a ranking model from training data. This model is generally used to sort new objects according to their degrees of relevance, preference, or importance [20]. The ranking
problem is defined as deriving an ordering over a list of examples that maximizes the utility of the entire list [21]. Learning to rank resembles classification and regression, but ranking problems are fundamentally different: while classification or regression predicts a label or a value for each individual document, ranking aims to sort the entire list so that the most relevant examples are presented first.

Learning to rank can therefore be applied naturally to build ranking models that recommend solutions for bug reports from Stack Overflow data. A learning-to-rank algorithm has two stages: the learning stage and the deployment (or testing) stage. The main task of this recommendation system is to train a ranking model $f(Br_i, A_j)$, where $Br_i$ represents a bug report and $A_j$ an associated answer from the Stack Overflow community.

Recently, deep learning approaches have achieved better results than earlier machine learning algorithms on tasks such as image classification, natural language processing, face recognition and text mining. Deep learning techniques have a high capacity to solve complex, non-linear problems, and neural networks can effectively incorporate sparse features such as query or document text. Given these advantages, we opted for a deep learning model to solve our ranking problem, trained on the textual data of the Stack Overflow dataset.

6. Conclusion

This work proposes a recommender system based on a learning-to-rank approach and deep learning techniques. Our main idea was to investigate the use of data mining on Stack Overflow to automatically suggest relevant solutions that fix software bugs and programming errors. Such a system aims to reduce the time developers generally spend fixing bugs manually or consulting Q&A websites.

We began our research by exploring a relatively new area of data mining, mining software repositories, a field likely to have an important impact on the future of software development. We also provided an overview of text mining, NLP, recommender systems and deep learning techniques.

As our contribution, we proposed a recommender system architecture that contains a baseline stage based on a TF-IDF index and a learning-to-rank model based on deep learning to recommend relevant solutions for programming errors and bug reports.

In future work, we will test pairwise and listwise learning-to-rank techniques. We will also try to improve the model's performance by using more specific features in the training phase. Other deep learning algorithms and techniques, such as CNNs and LSTMs, will also be considered for future testing and comparison.

7. REFERENCES

[1] A. E. Hassan, "The road ahead for Mining Software Repositories," in Frontiers of Software Maintenance, Oct. 2008, pp. 48–57.
[2] A. T. Nguyen, T. T. Nguyen, J. Al-Kofahi, H. V. Nguyen, and T. N. Nguyen, "A topic-based approach for narrowing the search space of buggy files from a bug report," in Proceedings of the 26th International Conference on Automated Software Engineering, 2011, pp. 263–272.
[3] "Crowdsourcing," Wikipedia [Online], 2020. Accessed July 17, 2020. Available: https://en.wikipedia.org/wiki/Crowdsourcing
[4] M. Ahasanuzzaman, M. Asaduzzaman, C. K. Roy, and K. A. Schneider, "Mining Duplicate Questions in Stack Overflow," in Proc. of MSR, 2016.
[5] "Mining Software Repositories," Wikipedia [Online], 2020. Accessed July 17, 2020. Available: https://en.wikipedia.org/wiki/Mining_software_repositories
[6] "Data Mining Curriculum: A Proposal," KDD [Online], 2020. Accessed July 17, 2020. Available: https://www.kdd.org/curriculum/index.html
[7] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," Science, vol. 349, pp. 255–260, 2015.
[8] "Data science and machine learning," IBM Analytics [Online], 2020. Accessed July 17, 2020. Available: https://www.ibm.com/analytics/machine-learning
[9] J. Heaton, "Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning. The MIT Press, 2016, 800 pp, ISBN: 0262035618," Genetic Programming and Evolvable Machines, vol. 19, no. 1–2, 2018.
[10] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, May 1999.
[11] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, "From Word Embeddings To Document Similarities for Improved Information Retrieval in Software Engineering," in Proceedings of the 38th International Conference on Software Engineering (ICSE '16), Austin, TX, USA, May 14–22, 2016.
[12] R. Almhana, M. W. Mkaouer, M. Kessentini, and A. Ouni, "Recommending relevant classes for bug reports using multi-objective search," in ASE 2016: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, Aug. 2016, pp. 286–295.
[13] N. Lam, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen, "Bug localization with combination of deep learning and information retrieval," in 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), IEEE, 2017, pp. 218–229.
[14] E. Campos, L. de Souza, and M. Maia, "Searching Crowd Knowledge to Recommend Solutions for API Usage Tasks," Journal of Software: Evolution and Process, vol. 28, pp. 1–32, 2016.
[15] F. Calefato, F. Lanubile, and N. Novielli, "Moving to Stack Overflow: Best-Answer Prediction in Legacy Developer Forums," in ESEM '16, Ciudad Real, Spain, Sept. 8–9, 2016, pp. 1–10.
[16] J. Perricone, "Question Recommendation On the Stack Overflow Network," Stanford University, 2017.
[17] Z. Yao, D. S. Weld, W.-P. Chen, and H. Sun, "StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow," arXiv preprint arXiv:1803.09371, 2018.
[18] S. Beyer, C. Macho, M. Pinzger, and M. Di Penta, "Automatically classifying posts into question categories on Stack Overflow," in the 26th Conference, ACM, Gothenburg, Sweden, 2018, pp. 211–221.
[19] Y. Zhang, D. Lo, X. Xia, J. Jiang, and J. Sun, "Recommending frequently encountered bugs," in International Conference on Program Comprehension, Gothenburg, Sweden, 2018.
[20] T. Y. Liu, "Learning to rank for information retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
[21] H. Li, Learning to Rank for Information Retrieval and Natural Language Processing. San Mateo, CA, USA: Morgan & Claypool Publishers, 2011.
