Automatic Learning Path Recommendation For Open Source Projects Using Deep Learning On Knowledge Graphs
Abstract—Open source is an important way for developers to collaborate on software development, and more and more developers are beginning to contribute to open source projects. When a developer begins to contribute to an existing open source project, the first thing to do is to read and understand the project code. However, most current open source projects only provide API documentation, not project design documents for new developers. Developers can only understand the code from scattered comments in it, which is difficult for newcomers. Therefore, developers need a learning path that helps them understand the project and finish their contribution tasks quickly. To help developers find such a learning path easily and quickly, this paper proposes a method that automatically recommends learning paths for open source projects. It uses multiple data sources in an open source community to extract knowledge and build knowledge graphs for open source projects. Then, based on a deep-learning-based knowledge graph embedding model and a path recommendation algorithm, the method recommends proper learning paths for developers. We select three well-known open source projects, Lua, Memcached and TensorFlow, according to their language, scope and community activity, as cases to verify our method, and we compare the learning paths found by real developers with those recommended by the method. Experiment results show that our method saves developers a lot of time while ensuring the accuracy of the recommended learning paths.

Keywords—Learning path recommendation, Open source project analysis, Deep learning, Knowledge graph

I. INTRODUCTION

Open source means that multiple developers develop software together through freely available source code. It is an important way for developers to collaborate on software development. Open source projects can not only improve the robustness and safety of software, but also help participating developers improve their development abilities. With the rapid development of open source communities in recent years, popular open source projects are used not only by individual developers but also by enterprises as part of their technology stacks. Since software analysis and design documents are seldom available in open source communities, it is a huge challenge for new developers who want to contribute to open source projects to learn and get familiar with the source code.

When developers are trying to contribute to an existing open source project, the first thing they need to do is to read and understand the project code according to their contribution goals. However, it is difficult for developers to directly find the source code related to their goals. The main features of current open source communities (such as GitHub [1]) focus on version control and project management, such as discussing issues and reviewing pull requests. These features do not include the project code knowledge management that new developers need. At the same time, most open source projects only maintain user documentation and lack documentation for developers, so new developers can only learn the project code by gradually reading it.

Functions are the core part of the source code. Most of the source code that developers would like to contribute to is related to specific functions, so developers always need to learn and understand specific functions. However, locating and reading a function is not enough, because functions are always embedded in a function call path that implements a feature. Usually, the developer needs to start from the outermost layer of the program, gradually read deeper along the function calls down to the target function, and learn the entire function and its position in the program through the call path. The function call path that the developer goes through during this learning process can be considered a learning path for the open source project. For example, the developer can start from a relevant function in a unit test, go deeper into the specific function being tested, and understand the details of the functions called by the test case. The quality of such a learning path matters greatly: to help developers learn the project more quickly, a proper learning path should not contain functions that are only weakly related to the main functionality of the project. Recommending proper learning paths is therefore a critical task, but so far there is no sufficiently good solution that helps developers find them.

To solve the problem above, we propose a learning path recommendation method for open source projects that helps developers more easily understand the knowledge they need for development, so that they can contribute more quickly. The method is composed of three parts. Firstly, the method analyzes and constructs the knowledge graph of an open source project based on information including the code, documents, discussions and comments of the project. Secondly, the method
* Corresponding author.
825
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on October 14,2024 at 11:41:37 UTC from IEEE Xplore. Restrictions apply.
of the program itself, such as risk analysis and defect analysis. Gascon et al. propose an embedding method based on call graphs and use machine learning to extract a structure graph that identifies the code structure of malware [9]. Trapp et al. divide a program into different parts and assign permissions according to functions, based on static analysis results and call graphs, to improve the security of the program [10].

The studies on source code analysis mentioned above mainly focus on the source code itself, and the granularity of analysis is usually the code segment. These studies mainly use source code analysis to find errors or to find the code logic of specific functions. However, for an open source project, making the project easier for developers to learn and use is more important. In particular, making it easier for developers in an open source community to participate and contribute source code is key to maintaining the long-term development of the project. At present, relatively few studies focus on how developers learn open source projects.

B. Open Source Project Analysis

As an open source community grows, the community itself accumulates a wealth of development data. By analyzing and mining this data, researchers can help developers learn and develop projects better. At present, there are many studies on how to better use the information in open source projects to assist developers.

Many studies treat the source code in an open source community as documents to be searched and assist developers in searching the source code of open source projects. Zou et al. propose a novel approach based on graph embedding to search source code in an open source project [11]. They extract properties and definitions from the source code to build a code graph and use the embedding vectors of the code graph for code search. CodeHow is an approach that recognizes potential APIs and understands the potentially relevant APIs [12]. It expands queries with these APIs and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search. Lin et al. find that in large-scale open source projects, developers who want to learn APIs face a gap between the words used in their queries and those used in the documents. They propose an approach that leverages software-specific knowledge to improve the search for API learning content [13].

On the other hand, many studies assist developers in developing projects from different aspects. The most common scenario is assisting developers in programming. Nowadays, IDEs (Integrated Development Environments) such as IntelliJ can only complete and check developers' code in a limited way [14]. Linn et al. propose an approach to detecting reusable source code in source code projects and helping developers directly fill in the detected code using templates [15]. The experiment results show the efficiency of the approach. SLAMPA is a tool that uses neural networks to write source code snippets [16]. It first uses neural language models to infer developers' programming intent, then retrieves source code snippets from code bases and recommends them to developers.

There are also many other studies focusing on learning-related recommendation for developers, which can improve developers' learning efficiency. Sun et al. propose a method for building code knowledge graphs based on open source projects and recommending learning paths to developers [17]. Prabhakar et al. propose a recommendation system that matches developers who are potentially interested in each other [18]; the system can improve communication between different developers. Dai et al. propose a supporting system that recommends personal learning paths for the different learning objectives of individual learners [19]. Chang et al. develop a data-driven learning interest recommendation system [20]. Pang et al. propose a new recommendation model with learner neighbors and learning series, called RLNLS [21]. These studies currently focus mainly on assisting smart development and collaborative development based on the open source community. However, few studies help developers learn open source projects; the data sources used in these studies are relatively limited and only contain single-version project information, and their support for developer learning is also very limited.

C. Knowledge Graph Analysis

In recent years, the construction and application of knowledge graphs have grown rapidly. A large number of knowledge graphs have been created and successfully applied in practice, such as Freebase [22], DBpedia [23], YAGO [24] and NELL [25].

Open source projects contain a lot of knowledge that greatly helps developers learn them, such as source code, issues and commits. This paper uses the knowledge of a specific open source project in an open source community to build a knowledge graph for that project, and then generates an embedding vector for each node in the knowledge graph using a knowledge graph embedding model for later learning path recommendation. This type of embedding model is called a translational distance model. TransE is a representative translational distance model [26]; it is based on a distributed vector representation of entities and relationships. TransH [27], TransR [28] and TranSparse [29] are a series of improved models based on TransE. In addition, Perozzi et al. propose an embedding approach called DeepWalk, which applies the SkipGram model to random walks over the graph and learns the embedding vector of each node from its adjacent nodes [30].

In summary, in the field of source code analysis in open source communities, most current studies focus on analyzing the code itself, and the analysis methods mainly use information such as syntax trees and other static analysis results.
The studies on learning-related recommendation for developers mostly focus on helping developers program, such as recommending source code snippets, understanding developers' intentions and building intelligent programming tools. However, the biggest obstacle preventing developers from contributing to open source communities is not how to write code, but how to understand open source projects. It is also important to note that open source communities contain a lot of code-related knowledge. Integrating this knowledge and recommending learning paths for open source projects can help developers get familiar with open source projects more effectively.

III. METHOD OVERVIEW

We propose a method that recommends learning paths of open source projects for developers based on knowledge graphs. We introduce the details of the method below.

A. Method Framework

Our method recommends a proper learning path for developers to help them better understand the specific functions they want to learn. Fig. 1 shows the framework of the method. The method is composed of three main parts: open source project knowledge graph construction, open source project knowledge graph analysis, and automatic learning path recommendation.

B. Open Source Project Knowledge Graph Construction

To implement the proposed method, we first need to construct the knowledge graph of an open source project. We construct the knowledge graph from multiple data sources, which helps model the relationships between functions better. Graph construction includes two parts: data collection from the open source project, and the definition and construction of the project's knowledge graph.

1) Data collection: Open source projects in an open source community have accumulated huge amounts of data. This data contains not only the code and documents but also a lot of data generated by developers during development. Taking the widely used open source community GitHub as an example, the development information in its projects can mainly be divided into Commits, Issues and Pull Requests. We collect all of this information to help recommend learning paths.

On the other hand, we also need to extract knowledge from the source code of the open source project. As analyzed in Section II-A, static analysis is fast, convenient and has fewer dependencies than dynamic analysis, so it is more suitable for our method. This paper therefore uses static code analysis to extract knowledge from the project code. Static analysis yields information about the program code itself, including the functions, files and classes in the project code, as well as the relationships between them, namely function calls and inclusion relationships. We perform the static analysis with Doxygen, a popular, cross-platform tool that supports multiple languages. Doxygen takes the source code of a project as input and extracts static analysis information such as function call relationships, the file structure and function attributes. Taking function call relationships as an example, Doxygen analyzes the code of each file and each project module and generates a partial function call subgraph for each of them. These subgraphs are output in the Dot language [31], which provides a simple way to describe graphs.

2) The definition and construction of knowledge graphs: To build the knowledge graph of an open source project, we must first define the schema of the graph. The schema of a knowledge graph is the specification of the knowledge in the graph. Designing the schema in advance helps standardization, which in turn facilitates the subsequent processing and querying of knowledge. Fig. 2 shows the schema designed for the knowledge graph of an open source project in the method.

Fig. 2. Schema of a knowledge graph

Since this paper is mainly aimed at developers who want to learn and understand open source projects, the entities in the knowledge graph mainly contain development-related knowledge, such as functions, files, commits, issues and pull requests. The relationships between different kinds of entities are also extracted. Table I shows the entities extracted from open source projects in this paper and their descriptions. Similarly, Table II shows the relationships and their descriptions. The relationships are written as Subject-Predicate-Object (SPO) triples, in which the Subject has the Predicate relationship to the Object; the Subject is an entity, and the Object can be an entity or an attribute.
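As an illustration of how the per-file call subgraphs that Doxygen emits in the Dot language could be turned into the (sub, func_call, obj) triples of Table II, the following minimal Python sketch parses a simplified DOT subset: node statements that carry a `label` attribute, and `->` edges. The sample graph, the function names `dot_to_triples` and `merge_triples`, and the regular expressions are our own assumptions rather than the paper's implementation; real Doxygen output carries more attributes and would be better handled by a full DOT parser.

```python
import re

# Hypothetical DOT fragment in the style of a call graph (names illustrative):
# each node has a label holding the function name, edges go caller -> callee.
SAMPLE_DOT = '''
digraph "item_get" {
  Node1 [label="item_get", shape=box];
  Node2 [label="item_lock", shape=box];
  Node3 [label="do_item_get", shape=box];
  Node1 -> Node2 [color="midnightblue"];
  Node1 -> Node3 [color="midnightblue"];
}
'''

# Simplified patterns: one statement per line, no quoted ids or subgraphs.
NODE_RE = re.compile(r'^\s*(\w+)\s*\[[^\]]*\blabel="([^"]*)"')
EDGE_RE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)')


def dot_to_triples(dot_text):
    """Return (sub, func_call, obj) triples from a simplified DOT call graph."""
    labels = {}
    edges = []
    for line in dot_text.splitlines():
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
            continue
        node = NODE_RE.match(line)
        if node:
            labels[node.group(1)] = node.group(2)
    # Resolve DOT node ids to function names; fall back to the raw id.
    return [(labels.get(a, a), "func_call", labels.get(b, b)) for a, b in edges]


def merge_triples(*triple_lists):
    """Union several partial subgraphs into one de-duplicated, sorted triple list."""
    merged = set()
    for triples in triple_lists:
        merged.update(triples)
    return sorted(merged)


if __name__ == "__main__":
    for triple in dot_to_triples(SAMPLE_DOT):
        print(triple)
```

Merging as a set union mirrors the idea of combining the per-file partial subgraphs into one project-level relation set; the other relations in Table II would be added from the file structure and the GitHub data in the same triple form.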
When developers read the learning materials of open source projects, such as blogs and discussions, they often find that the learning materials do not match the actual project versions. A common reason is that the project is updated so fast that the learning materials cannot keep up with new versions. Another reason is that a specific scenario requires referring to an old version of the project. This mismatch makes it harder for developers to learn the project. Therefore, this paper extracts different stable versions of an open source project during data extraction. Open source projects use the Tag function of the version control system to mark the commit of each release in the commit history. We can therefore roll the project back to a specific version and run the data extraction process separately for each version to obtain the knowledge graphs of different versions for subsequent analysis.

TABLE I. TYPE OF ENTITIES

Entity         Description
func           A function in the code of an open source project
file           A file in an open source project
commit         A submission record in the submission history of an open source project
issue          A collection of questions and comments about an open source project in the open source community
pull request   A merge request for an open source project in the open source community

TABLE II. TYPE OF RELATIONS

Relation                          Description
(sub, func_call, obj)             The sub function calls the obj function
(sub, file_contain_func, obj)     The sub file contains the obj function
(sub, commit_change_file, obj)    The sub submission record modified the obj file
(sub, issue_relate_commit, obj)   The sub issue involves the obj submission record
(sub, issue_relate_issue, obj)    The sub issue involves the obj issue
(sub, issue_relate_pr, obj)       The sub issue involves the obj pull request
(sub, pr_relate_commit, obj)      The sub pull request contains the obj commit
(sub, pr_relate_file, obj)        The sub pull request contains the obj file

C. Open Source Project Knowledge Graph Analysis

After the above data extraction and knowledge graph construction, we have generated original knowledge graphs for different versions of the open source project. These knowledge graphs contain all knowledge triples from the different data sources. Next, we analyze the open source project knowledge graph and train a graph embedding model based on deep learning. Such a model embeds every entity into a low-dimensional vector, which the recommendation algorithm proposed later uses to calculate distances between knowledge graph nodes.

1) Knowledge graph embedding based on deep learning: A knowledge graph is a multi-relational graph composed of entities (nodes) and relationships (different types of edges). Each edge is represented as a triple (subject, relationship, object). Although this representation is effective for structured data, the underlying symbolic nature of such triples often makes knowledge graphs difficult to manipulate.

To solve this problem, researchers have proposed a new research direction: knowledge graph embedding. The key idea is to embed the components of a knowledge graph, including entities and relationships, into a continuous vector space. This simplifies operations on the knowledge graph while preserving its original structure. The embeddings of entities and relationships can further be used in various tasks, such as knowledge graph completion [32], relationship extraction [33] [34], entity classification [35] [36] and entity resolution [36] [37]. In our method, the embeddings are used to calculate the semantic distances between functions and help recommend learning paths. One way to represent a knowledge graph with vectors is to use One-Hot vectors. However, the dimension of One-Hot vectors is too high, and they cannot express the similarity between similar entities or relationships. Therefore, researchers use distributed representations of the entities and relationships in a knowledge graph. The TransE model is a graph embedding model that represents each triple (h, r, t) so that the embedding of the head entity plus the embedding of the relationship lies close to the embedding of the tail entity, h + r ≈ t.

In our method, the graph embedding model TransE is trained to generate an embedded representation of the knowledge graph of the open source project, producing an embedding vector for each node. These embedding vectors represent the positions of the entities in the embedding space of the project's knowledge graph. The subsequent algorithms are based on the distances between entities in the embedding space, which serve as the distance weights of the relationships between the entities. Although the learning paths recommended by our method contain only functions, other entities are also embedded so that the semantic distances between each pair of functions can be calculated more accurately. This paper uses the OpenKE framework for model training. OpenKE is an open source framework for knowledge embedding developed by THUNLP based on the TensorFlow toolkit [38] [39] [40]. It provides a fast and stable toolkit that includes the most popular KRL (knowledge representation learning) methods [41].
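The TransE objective described above can be illustrated with a small, dependency-free Python sketch. A triple (h, r, t) is scored by its distance from the translation h + r ≈ t, and training lowers a margin ranking loss between each observed triple and a corrupted copy whose tail is replaced by a random entity. This is not OpenKE's API: the toy triples, dimension, margin, learning rate and epoch count are illustrative assumptions, corrupting only the tail is a simplification, and the squared L2 distance is used to keep the gradient simple, whereas TransE as published uses the L1 or L2 norm.

```python
import math
import random


def score(h, r, t):
    """Squared L2 distance ||h + r - t||^2; lower means more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def margin_loss(pos_score, neg_score, margin):
    """Hinge loss: the true triple should beat the corrupted one by `margin`."""
    return max(0.0, margin + pos_score - neg_score)


def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def train_transe(triples, dim=8, margin=1.0, lr=0.05, epochs=200, seed=0):
    """Fit toy TransE embeddings by SGD. Assumes >= 2 distinct entities."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})

    def rand_unit():
        return normalize([rng.uniform(-1.0, 1.0) for _ in range(dim)])

    E = {e: rand_unit() for e in entities}
    R = {r: rand_unit() for r in relations}
    for _ in range(epochs):
        for h, r, t in triples:
            # Corrupt the tail with a random entity other than the true tail.
            t_neg = rng.choice([e for e in entities if e != t])
            h_v, r_v, t_v, n_v = E[h], R[r], E[t], E[t_neg]
            if margin_loss(score(h_v, r_v, t_v), score(h_v, r_v, n_v), margin) == 0.0:
                continue
            g_pos = [2.0 * (a + b - c) for a, b, c in zip(h_v, r_v, t_v)]
            g_neg = [2.0 * (a + b - c) for a, b, c in zip(h_v, r_v, n_v)]
            # Gradient step on (score_pos - score_neg) for each involved vector.
            E[h] = [x - lr * (gp - gn) for x, gp, gn in zip(h_v, g_pos, g_neg)]
            R[r] = [x - lr * (gp - gn) for x, gp, gn in zip(r_v, g_pos, g_neg)]
            E[t] = [x + lr * gp for x, gp in zip(t_v, g_pos)]
            E[t_neg] = [x - lr * gn for x, gn in zip(n_v, g_neg)]
            # TransE keeps entity vectors on the unit sphere.
            for e in {h, t, t_neg}:
                E[e] = normalize(E[e])
    return E, R


def entity_distance(E, a, b):
    """Euclidean distance between two entity embeddings."""
    return math.dist(E[a], E[b])


if __name__ == "__main__":
    # Hypothetical triples in the form of Table II.
    toy = [
        ("main", "func_call", "item_get"),
        ("item_get", "func_call", "item_lock"),
        ("thread.c", "file_contain_func", "item_lock"),
    ]
    E, R = train_transe(toy)
    print(round(entity_distance(E, "item_get", "item_lock"), 3))
```

In the setting described above, distances such as `entity_distance(E, a, b)` between function embeddings would play the role of the edge weights used by the path search.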
2) Multi-version knowledge fusion: There are many data sources in an open source community, and a lot of knowledge in the field of source code. As the project code is developed and iterated, the expression, data format and consistency of the knowledge may change. The multi-source knowledge needs to be extracted, disambiguated and integrated. At the same time, after the program has been developed and iterated, the latent knowledge in the community may no longer map to the latest version of the code. This prevents novice developers from mapping the information to the actual code, and development experience cannot be shared and preserved. For knowledge from different versions of the same project, entity names may not be aligned. Therefore, before training the graph embeddings, we need to integrate the generated knowledge graphs of the multi-version open source project by fusing the knowledge graphs of different versions.

We fuse the knowledge graphs of different versions with a heuristic method. Since the entities in the knowledge graphs are all programming-related and are extracted directly from open source projects, the semantic deviations among the same knowledge in different versions are not as large as those in general knowledge graphs. We assign entity nodes with the same name in graphs of different versions the same node id, and link knowledge entities with the same id during the fusion of all versions of the knowledge graphs. In this way, old knowledge is linked to new knowledge. For developers who want to learn from the knowledge graph, old knowledge still has learning value, so the fused knowledge graph contains the knowledge of all versions. In addition, for each knowledge entity and relationship unique to a version, we add a 'gVersion' attribute indicating which version of the open source project it comes from.

D. Automatic Recommendation of Learning Path

After the above steps, we have constructed and processed the knowledge graph of an open source project. We then implement learning path recommendation based on this knowledge graph. The recommendation method is divided into two parts: learning entrance analysis and the learning path recommendation algorithm.

Learning path: When a developer wants to understand a specific function in an open source project, he or she needs to start reading from the outermost layer of the program's call relationships and follow the function calls step by step down to the target function, learning the entire function and its position in the program through the call path. The function call path read by the developer in this process is the learning path, namely $(N_i, N_{i+1}, \cdots, N_{i+n})$ with every $N \in G$, where $N$ is a function node in the function call graph $G$.

1) Learning entrance analysis: Developers who want to understand a specific function usually need to start reading the source code from a proper learning entrance. Therefore, we first need to identify the entrances from which developers can start learning. Since an entrance needs to be at the outermost layer of the project code structure, its node should have an in-degree of 0 in the knowledge graph. However, the nodes with an in-degree of 0 may also include nodes that are not related to the main functionality of the project and cannot reach many other nodes. Using such nodes as entrances may lead to improper learning paths. To distinguish these nodes, we further divide the 0-in-degree nodes according to the characteristics of an entrance. When learning repeatedly, developers usually hope to learn more through a single learning entrance to reduce their learning effort. In addition, since the extracted entrances are the input of the subsequent learning path recommendation algorithm, the more nodes an entrance can reach, the more likely it can be reused when recommending learning paths for other target functions. For these reasons, an entrance node should be able to reach as many nodes as possible. At the same time, the total number of recommended entrances should not be too large, because too many entrances hurt the readability of the recommendation results. Therefore, to keep the number of entrances from affecting readability, we count and select the nodes that meet the two conditions above and take the highest-ranked nodes as the learning entrance nodes. In this way, the method generates a list of entrances for the subsequent algorithm.

2) Learning path recommendation algorithm: To help developers understand the project code more easily, we propose a learning path recommendation algorithm for project code. Algorithm 1 shows its pseudo-code.

First, we abstract the learning path analysis problem into a path search problem from multiple source points to a single target point. When developers learn the target function through a learning path, they save more learning time if the path covers the main logic of the program and links more functions. Therefore, we choose the most extensive path among the multiple reachable paths as the final learning path. The algorithm first uses Dijkstra's algorithm to search for a path from each entrance to the target function. Then, among these paths, the one whose entrance has the most nodes in its coverage tree (the depth-first search tree rooted at the entrance) is selected as the recommended learning path.

Algorithm 1 Learning path recommendation algorithm
Input: entrance set A, target function B, graph G = (V, E)
Output: path $(N_i, N_{i+1}, \cdots, N_{i+n})$ with every $N \in G$
1.  path_list = [ ]
2.  For entrance in A:
3.      path_list.append(dijkstra_path(G, entrance, B))
4.  End
5.  last_dominate_num = 0
6.  For cur_path in path_list:
7.      source = cur_path.first
8.      current = dfs_tree(G, source).number_of_nodes()
9.      If current > last_dominate_num:
10.         last_dominate_num = current
11.         path = cur_path
12.     End
13. End
14. return path
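To make learning entrance analysis and Algorithm 1 concrete, the following Python sketch runs both on a toy call graph. The names `dijkstra_path` and `dfs_tree` in Algorithm 1 correspond to functions in the NetworkX library; to stay dependency-free, the sketch reimplements them with the standard library. Several details are our assumptions, not the paper's: the graph is a dict of caller-to-callee edges whose weights stand in for embedding distances, in-degrees are computed on these call edges only, the number of entrances kept (`top_k`) is a free parameter, and entrances that cannot reach the target are skipped instead of raising an error.

```python
import heapq


def all_nodes(adj):
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    return nodes


def reachable_count(adj, source):
    """Number of nodes in the DFS tree rooted at source (source included)."""
    seen, stack = {source}, [source]
    while stack:
        for nxt in adj.get(stack.pop(), {}):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


def select_entrances(adj, top_k=3):
    """0-in-degree nodes, ranked by how many nodes they can reach."""
    indegree = {n: 0 for n in all_nodes(adj)}
    for targets in adj.values():
        for v in targets:
            indegree[v] += 1
    roots = [n for n, d in indegree.items() if d == 0]
    roots.sort(key=lambda n: (-reachable_count(adj, n), n))
    return roots[:top_k]


def dijkstra_path(adj, source, target):
    """Shortest weighted path as a list of nodes, or None if unreachable."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def recommend_path(adj, entrances, target):
    """Algorithm 1: among shortest entrance-to-target paths, keep the one
    whose entrance has the largest DFS coverage tree."""
    path_list = []
    for entrance in entrances:
        path = dijkstra_path(adj, entrance, target)
        if path is not None:  # assumption: skip unreachable entrances
            path_list.append(path)
    best, last_dominate_num = None, 0
    for cur_path in path_list:
        current = reachable_count(adj, cur_path[0])
        if current > last_dominate_num:
            last_dominate_num, best = current, cur_path
    return best


if __name__ == "__main__":
    # Hypothetical call graph; function names and weights are illustrative.
    calls = {
        "main": {"server_init": 0.4, "drive_machine": 0.9},
        "server_init": {"item_init": 0.3},
        "drive_machine": {"process_command": 0.5},
        "process_command": {"item_get": 0.6},
        "item_get": {"item_lock": 0.2},
        "unit_test_item": {"item_get": 0.7},
    }
    entrances = select_entrances(calls, top_k=2)
    print(entrances)
    print(recommend_path(calls, entrances, "item_lock"))
```

Here `main` reaches seven nodes and `unit_test_item` only three, so the path starting at `main` is returned even though the path from the unit test is shorter. In Algorithm 1 the selection criterion depends only on the entrance: the Dijkstra weights decide which route is taken from a given entrance, while the size of the entrance's reachable set decides which entrance wins, and the strict comparison keeps the earlier entrance on ties.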
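IV. EXPERIMENTS

This paper uses the open source community GitHub as the data source for analysis and selects three representative, well-known open source projects from GitHub as experimental cases: Lua [43], Memcached [44] and TensorFlow [45]. These three projects differ in code size, in the number of discussions in the open source community, and in language features. All of them are well-known, widely used open source projects that are commonly recommended for learning among novice developers and open source community contributors.

TABLE IV. ENTRANCE ANALYSIS EXPERIMENT RESULTS
Project    Memcached    Lua    TensorFlow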
recommended first, so the learning path with the best semantics is not recommended. This paper also classifies this situation as inaccurate.

C. Comparative Experiments

TABLE VI. SURVEY QUESTIONS
Question ID    Question content
Q1             I am familiar with C++

abilities by learning the source code of open source projects. At the same time, 100% of them are willing to learn and participate in the contribution of open source projects if possible, which reflects the fact that open source projects are valuable not only as open source programs but also as important learning resources. They are widely used to help developers learn and improve their programming skills.
version of the project. It can be seen that linking different versions of knowledge is very helpful for developers’ learning, because it greatly expands the range of learning materials available to them. This result shows the value of our practice of fusing different versions of knowledge for open source projects.

2) Learning-path-finding comparative experiment: After the survey, we conduct a comparative experiment consisting of two sets of tasks on these developers: a learning-path-finding task without the multi-version problem and a learning-path-finding task with the multi-version problem. The first purpose of this comparative experiment is to verify whether our recommendation algorithm helps developers understand the project code faster and better than their personal learning methods, that is, how much time developers can save. The second purpose is to verify whether our method helps developers link different versions of knowledge in a multi-version environment, thereby improving the accuracy of learning the correct target.

The first set of experiments asks the developers to get familiar with a function named “item_lock” in the Memcached project and records the time they spend on the task. They are asked not only to find the function, but also to find a function call path from the outermost layer of the project to the target function, which is considered to be the learning path. The second set of experiments asks them to do the same for a function named “try_read_command”. However, this second target function does not exist in the latest version of Memcached, so we can measure the developers’ learning effort when they face a multi-version environment. During the experiments, the developers can use any tools they are familiar with, including search engines, IDEs, and related documents. None of the developers had heard of Memcached before. Table VII and Table VIII show the results of the two sets of experiments. We calculate the accuracy of the learning paths using the same criterion as in the experiment in Section IV-B. The results show that our method can greatly reduce the time developers spend reading and studying the project code and, when cross-version knowledge links are involved, can improve the accuracy with which developers learn the correct knowledge.

TABLE VII. EXPERIMENT RESULTS WITHOUT MULTI-VERSION PROBLEMS

acceptance of these developers. 80% of them believe that the learning path we propose is a good way to learn open source projects. The experiment results show the overall feasibility and effectiveness of the proposed method.

Fig. 6. The acceptance of the concept of learning path

V. CONCLUSION AND FUTURE WORK
This paper puts forward a method that automatically recommends learning paths of open source projects for developers using deep learning on knowledge graphs. It uses multiple data sources in an open source community to build a knowledge graph of an open source project. Based on a knowledge graph embedding model and a learning path recommendation algorithm, the method recommends suitable learning paths for developers. We select three well-known open source projects, Lua, Memcached, and TensorFlow, as cases to verify our method. The experiments verify that our method can greatly reduce the time developers need to learn the project code, and that it helps developers better understand open source projects. The method makes it easier for developers to contribute to open source projects and thereby enhances the vitality of open source communities. In the future, we can further improve the knowledge graphs by extracting more entities and relationships from the project code.

ACKNOWLEDGMENT
This effort is sponsored by the Beijing Outstanding Young Scientist Program under grant number BJJWZYJH01201910001004 and the National Natural Science Foundation of China under Grant No. 61421091.
[6] J. Tu, X. Xie, Y. Zhou, B. Xu, & L. Chen. 2016. A Search Based Context-Aware Approach for Understanding and Localizing the Fault via Weighted Call Graph. In 2016 Third International Conference on Trustworthy Systems and their Applications (TSA), IEEE, 64-72.
[7] K. Ali, X. Lai, Z. Luo, O. Lhoták, J. Dolby, & F. Tip. 2019. A Study of Call Graph Construction for JVM-Hosted Languages. IEEE Transactions on Software Engineering.
[8] G. Gharibi, R. Tripathi, & Y. Lee. 2018. Code2graph: automatic generation of static call graphs for python source code. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 880-883.
[9] H. Gascon, F. Yamaguchi, D. Arp, & K. Rieck. 2013. Structural detection of android malware using embedded call graphs. In Proceedings of the 2013 ACM workshop on Artificial intelligence and security, ACM, 45-54.
[10] M. Trapp, M. Rossberg, & G. Schaefer. 2015. Program partitioning based on static call graph analysis for privilege separation. In 2015 IEEE Symposium on Computers and Communication (ISCC), IEEE, 613-618.
[11] Y. Zou, C. Ling, Z. Lin, & B. Xie. 2018. Graph Embedding based Code Search in Software Project. In Proceedings of the Tenth Asia-Pacific Symposium on Internetware, ACM, 1
[12] F. Lv, H. Zhang, J. G. Lou, S. Wang, D. Zhang, & J. Zhao. 2015. Codehow: Effective code search based on api understanding and extended boolean model (e). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 260-270.
[13] Z. Lin, Y. Zou, J. Zhao, et al. 2017. Improving software text retrieval using conceptual knowledge in source code. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 123-134.
[14] https://www.jetbrains.com/idea/
[15] Y. Lin, G. Meng, Y. Xue, Z. Xing, J. Sun, X. Peng, ... & J. Dong. 2017. Mining implicit design templates for actionable code reuse. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, IEEE Press, 394-404.
[16] S. Zhou, H. Zhong, & B. Shen. 2018. SLAMPA: Recommending Code Snippets with Statistical Language Model. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), IEEE, 79-88.
[17] Z. Sun, F. Peng, J. Guan, et al. 2019. An approach to helping developers learn open source projects based on machine learning. In Proceedings of the 11th Asia-Pacific Symposium on Internetware, 1-10.
[18] S. Prabhakar, G. Spanakis, & O. Zaiane. 2017. Reciprocal recommender system for learners in massive open online courses (moocs). In International Conference on Web-Based Learning, Springer, Cham, 157-167.
[19] Y. Dai, Y. Asano, & M. Yoshikawa. 2016. Course Content Analysis: An Initiative Step toward Learning Object Recommendation Systems for MOOC Learners. International Educational Data Mining Society.
[20] H. M. Chang, T. M. L. Kuo, S. C. Chen, C. A. Li, Y. W. Huang, Y. C. Cheng, ... & J. W. Tzeng. 2016. Developing a data-driven learning interest recommendation system to promoting self-paced learning on MOOCs. In 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT), IEEE, 23-25.
[21] Y. Pang, C. Liao, W. Tan, Y. Wu, & C. Zhou. 2018. Recommendation for MOOC with Learner Neighbors and Learning Series. In International Conference on Web Information Systems Engineering, Springer, Cham, 379-394.
[22] K. Bollacker, C. Evans, P. Paritosh, et al. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 1247-1250.
[23] J. Lehmann, R. Isele, M. Jakob, et al. 2015. DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6(2), 167-195.
[24] F. M. Suchanek, G. Kasneci, & G. Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, 697-706.
[25] A. Carlson, J. Betteridge, B. Kisiel, et al. 2010. Toward an architecture for never-ending language learning. In Twenty-Fourth AAAI Conference on Artificial Intelligence.
[26] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, & O. Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, 2787-2795.
[27] Z. Wang, J. Zhang, J. Feng, & Z. Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI conference on artificial intelligence.
[28] Y. Lin, Z. Liu, M. Sun, Y. Liu, & X. Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence.
[29] G. Ji, K. Liu, S. He, & J. Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In Thirtieth AAAI Conference on Artificial Intelligence.
[30] B. Perozzi, R. Al-Rfou, & S. Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 701-710.
[31] Dot, May 2020, [online] Available: https://www.graphviz.org/doc/info/lang.html
[32] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, & O. Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, 2787-2795.
[33] J. Weston, A. Bordes, O. Yakhnenko, et al. 2013. Connecting language and knowledge bases with embedding models for relation extraction. arXiv preprint arXiv:1307.7973.
[34] S. Riedel, L. Yao, A. McCallum, et al. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 74-84.
[35] M. Nickel, V. Tresp, & H. P. Kriegel. 2012. Factorizing yago: scalable machine learning for linked data. In Proceedings of the 21st international conference on World Wide Web, 271-280.
[36] M. Nickel, V. Tresp, & H. P. Kriegel. 2011. A three-way model for collective learning on multi-relational data. In ICML, 11, 809-816.
[37] A. Bordes, N. Usunier, A. Garcia-Duran, et al. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, 2787-2795.
[38] X. Han, S. Cao, X. Lv, Y. Lin, Z. Liu, M. Sun, & J. Li. 2018. Openke: An open toolkit for knowledge embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 139-144.
[39] M. Nickel, L. Rosasco, & T. Poggio. 2016. Holographic embeddings of knowledge graphs. In Thirtieth AAAI conference on artificial intelligence.
[40] T. Trouillon, J. Welbl, S. Riedel, et al. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning (ICML).
[41] Q. Wang, Z. Mao, B. Wang, et al. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12), 2724-2743.
[42] V. D. Blondel, J. L. Guillaume, R. Lambiotte, et al. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
[43] Lua, May 2020, [online] Available: https://github.com/lua/lua
[44] Memcached, May 2020, [online] Available: https://github.com/memcached/memcached
[45] TensorFlow, May 2020, [online] Available: https://github.com/tensorflow/tensorflow
[46] Gephi, May 2020, [online] Available: https://gephi.org/