Working Session:

Information Retrieval Based Approaches in Software Evolution

Andrian Marcus¹, Andrea De Lucia², Jane Huffman Hayes³, Denys Poshyvanyk¹

¹ Department of Computer Science, Wayne State University, Detroit, MI 48202; 313 577 5408; [email protected], [email protected]
² Dipart. di Matem. e Informatica, Università di Salerno, Via ponte don Melillo, 84084, Fisciano (SA), Italy; +39 089 963376; [email protected]
³ Department of Computer Science, University of Kentucky, 301 Rose Street, Lexington, KY 40506; 859 257 3171; [email protected]

Abstract

During software evolution, a collection of related artifacts with different representations is created. Some of these are composed of structured data (e.g., analysis data), some contain semi-structured information (e.g., source code), and many include unstructured information (e.g., text). Research efforts exist that try to extract, represent, and analyze the unstructured information in software. Information retrieval (IR) techniques have been used quite successfully in recent years to represent and extract textual information from software artifacts, with application to many maintenance tasks.

This working session will focus on the state of the art in the application of IR-based techniques to support software maintenance activities. The session aims to identify the main research and practical issues in the field, to determine future work directions, and to foster collaborations among the participants.

1. Introduction and Rationale

Software comprises a multitude of artifacts; some of them are intended to be read by the compiler, while many others are intended to be read by developers. This is especially true during software evolution, when developers have to deal with large software, often written by others.

The user-centric information is often expressed in natural language and is embedded in documentation and source code. This information is very important for developers to understand a great deal of the why and what of the software system, just as the source code is useful to understand the how of the software. Natural language external documentation (e.g., requirements, design documents, user manuals), comments, and identifiers in the source code encode to a large degree the domain of the software and capture design decisions, change requests, developer information, etc. This unstructured information is referred to as semantic, as opposed to structural information, which is expressed mainly by the source code and other data-intensive artifacts, such as analysis information.

The single developer/maintainer development model did not require capturing much of this information, as the working and long-term memory of the developer often sufficed to store it. Today, the increasing size and complexity of software require large development groups, often distributed geographically, so storing and sharing the semantic information is much needed. Moreover, given the large amount of it, tools are necessary for its storage, retrieval, and analysis before it is delivered to the users.

2. State of the Art

In the past decade, researchers proposed information retrieval (IR) models to address these problems related to the semantic information in existing software. Early models were used to construct software libraries [13], and more recent work focused on specific software maintenance or development tasks such as:
• Traceability link recovery [1, 5, 8, 12, 15]
• Concept location [17, 19, 24]
• Software and web site modularization and reverse engineering [9, 10, 14, 21]
• Requirements engineering [3, 18]
• Software reuse [7, 13, 23]
• Impact analysis [2]
• Quality assessment and software measurement [11, 16, 20], etc.
These IR-based approaches to software engineering problems differ not only in their scope, but also in their underlying indexing mechanism, corpus construction, or
data analysis method. A general model can be described with the following steps:
1. A corpus is created using the source code and other linguistic software artifacts, such as the external documentation. Various processing methods are employed in the corpus construction, some based on natural language processing techniques, such as word stemming. Each document in the corpus corresponds to a specific software element, such as a file, a class, or a method.
2. An IR method, such as vector space models [22], Latent Semantic Indexing [6], Bayes classifiers, or other probabilistic models [4], is used to index the corpus. A semantic space of the software system is created.
3. A similarity measure between the documents in the corpus is defined and similarities are computed among the corresponding software elements. These measures are commonly referred to as semantic similarities.
4. The semantic similarities are used to solve the maintenance or development task at hand. Some approaches combine these measures with additional data extracted with structural software analysis tools, such as dependencies, software change data, execution traces, test cases, etc.

3. Open Issues and Problems

The working session has several complementary goals. First, it aims at clearly defining the state of the art in the field, briefly described above. As the field grows, researchers and practitioners need to agree on a common terminology, as the current work by different groups is somewhat incoherent. We need to assess how far this field has come to date and how far it can go in the future.

In addition, we want to identify which issues are already answered by research and ready for practical applications and which are still open or unaddressed. Several questions will be directly addressed during the working session and many more will be raised on the spot:
• How can we refine and improve the general model presented above? Does the model suit all current and future applications?
• Do certain IR methods suit specific software maintenance problems, or can we use any of them for any task?
• Is the field mature enough to talk about benchmarking?
• What new applications in software evolution exist for the IR-based approaches?
• What are the major practical problems with the current state of the art: efficiency, scalability, recall and precision, etc.? Are there specific problems associated with different IR methods?
• Who among the current researchers can collaborate on future projects?
• Is there available software produced by any research group? Can we initiate and maintain an open source effort in the area?
• How can we best integrate IR methods with other techniques for the analysis of unstructured information (e.g., natural language processing)? What is the trade-off?
• How can we bridge the work of the software maintenance community and other groups from areas like requirements engineering, programming languages, etc.?
• Is there a need for future, organized meetings like this working session?

4. Session Format

The working session will last 90 minutes and will consist of three parts.

It will start with short interactive presentations given by some of the participants, which will be solicited in advance and selected by the organizers. These presentations will focus on existing approaches and techniques.

Following these presentations, all the participants will take part in an open brainstorming session, which will focus on identifying open issues in the field, new challenges, etc. Questions will be asked and answers provided by the participants.

The final part will be devoted to recapitulating the unanswered items from the previous two parts and to building a roadmap for future events, research, and collaborations among the participants.

5. Expected Outcome of the Session

A website for the working session will be developed and maintained by the organizers. The discussions and presentations from the session will be summarized and publicized on the website and other appropriate venues.

We expect that this session will be the first in a succession of future events that will focus on this research area and will also include related fields.

6. References

[1] Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., and Merlo, E., "Recovering Traceability Links between Code and Documentation", IEEE Transactions on Software Engineering, 28, 10, October 2002, pp. 970-983.

[2] Antoniol, G., Canfora, G., Casazza, G., and De Lucia, A., "Identifying the Starting Impact Set of a Maintenance Request:
A Case Study", in Proceedings 4th European Conference on Software Maintenance and Reengineering (CSMR'00), Zurich, Switzerland, February 29 - March 03 2000, pp. 227-230.

[3] Cleland-Huang, J., Settimi, R., Duan, C., and Zou, X., "Utilizing Supporting Evidence to Improve Dynamic Requirements Traceability", in Proceedings International Requirements Engineering Conference (RE'05), Paris, France, 2005, pp. 135-144.

[4] Crestani, F., Lalmas, M., Van Rijsbergen, C. J., and Campbell, I., "Is this document relevant?…probably: a survey of probabilistic models in information retrieval", ACM Computing Surveys, 30, 4, 1998, pp. 528-552.

[5] De Lucia, A., Fasano, F., Oliveto, R., and Tortora, G., "Enhancing an Artefact Management System with Traceability Recovery Features", in Proceedings IEEE International Conference on Software Maintenance (ICSM'04), Chicago, IL, September 11-17 2004, pp. 306-315.

[6] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R., "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, 41, 1990, pp. 391-407.

[7] Frakes, W., "Software Reuse Through Information Retrieval", in Proceedings 20th Hawaii International Conference On System Sciences (HICSS'87), Kona, HI, January 1987, pp. 530-535.

[8] Hayes, J. H., Dekhtyar, A., and Sundaram, S. K., "Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods", IEEE Transactions on Software Engineering, 32, 1, January 2006, pp. 4-19.

[9] Kawaguchi, S., Garg, P. K., Matsushita, M., and Inoue, K., "Mudablue: An automatic categorization system for open source repositories", in Proceedings 11th Asia-Pacific Software Engineering Conference (APSEC'04), 2004, pp. 184-193.

[10] Kuhn, A., Ducasse, S., and Girba, T., "Enriching Reverse Engineering with Semantic Clustering", in Proceedings IEEE Working Conference On Reverse Engineering (WCRE'05), Pittsburgh, PA, November 8-11 2005, pp. 113-122.

[11] Lawrie, D., Feild, H., and Binkley, D., "Leveraged Quality Assessment Using Information Retrieval Techniques", in Proceedings 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, June 14-16 2006, pp. 149-158.

[12] Lormans, M. and Van Deursen, A., "Can LSI help Reconstructing Requirements Traceability in Design and Test?", in Proceedings 10th European Conference on Software Maintenance and Reengineering (CSMR'06), Bari, Italy, March 12 2006, pp. 47-56.

[13] Maarek, Y. S., Berry, D. M., and Kaiser, G. E., "An Information Retrieval Approach for Automatically Constructing Software Libraries", IEEE Transactions on Software Engineering, 17, 8, 1991, pp. 800-813.

[14] Maletic, J. I. and Marcus, A., "Supporting Program Comprehension Using Semantic and Structural Information", in Proceedings 23rd International Conference on Software Engineering (ICSE'01), Toronto, Ontario, Canada, May 12-19 2001, pp. 103-112.

[15] Marcus, A., Maletic, J. I., and Sergeyev, A., "Recovery of Traceability Links Between Software Documentation and Source Code", International Journal of Software Engineering and Knowledge Engineering, 15, 5, October 2005, pp. 811-836.

[16] Marcus, A. and Poshyvanyk, D., "The Conceptual Cohesion of Classes", in Proceedings IEEE International Conference on Software Maintenance (ICSM'05), Budapest, Hungary, September 25-30 2005, pp. 133-142.

[17] Marcus, A., Sergeyev, A., Rajlich, V., and Maletic, J., "An Information Retrieval Approach to Concept Location in Source Code", in Proceedings 11th IEEE Working Conference on Reverse Engineering (WCRE'04), Delft, The Netherlands, November 9-12 2004, pp. 214-223.

[18] och Dag, J. N., Gervasi, V., Brinkkemper, S., and Regnell, B., "A Linguistic-Engineering Approach to Large-Scale Requirements Management", IEEE Software, 22, 1, 2005, pp. 32-39.

[19] Poshyvanyk, D., Guéhéneuc, Y.-G., Marcus, A., Antoniol, G., and Rajlich, V., "Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification", in Proceedings 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, June 14-16 2006, pp. 137-148.

[20] Poshyvanyk, D. and Marcus, A., "The Conceptual Coupling Metrics for Object-Oriented Systems", in Proceedings 22nd IEEE International Conference on Software Maintenance (ICSM'06), Philadelphia, PA, September 25-27 2006, to appear.

[21] Ricca, F., Tonella, P., Girardi, C., and Pianta, E., "An Empirical Study on Keyword-based Web Site Clustering", in Proceedings 12th IEEE International Workshop on Program Comprehension (IWPC'04), Bari, Italy, 2004, pp. 204-213.

[22] Salton, G. and McGill, M., Introduction to Modern Information Retrieval, McGraw-Hill, 1983.

[23] Ye, Y. and Fischer, G., "Supporting Reuse by Delivering Task-Relevant and Personalized Information", in Proceedings IEEE/ACM International Conference on Software Engineering (ICSE'02), Orlando, FL, May 19-25 2002, pp. 513-523.

[24] Zhao, W., Zhang, L., Liu, Y., Sun, J., and Yang, F., "SNIAFL: Towards a Static Non-Interactive Approach to Feature Location", ACM Transactions on Software Engineering and Methodology, 2006, to appear.
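
The four-step general model outlined in Section 2 can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the method of any cited approach: the toy corpus `ELEMENTS`, the stop-word list, and all function names are hypothetical; a plain TF-IDF vector space model stands in for step 2 (Latent Semantic Indexing or a probabilistic model could be substituted); word stemming is omitted for brevity; and concept location is used as the example task for step 4.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: one "document" per software element (here, a method),
# built from its source text and comments.
ELEMENTS = {
    "AccountManager.deposit":
        "void deposit(double depositAmount) { accountBalance += depositAmount; } "
        "// add money to the account",
    "AccountManager.withdraw":
        "void withdraw(double withdrawAmount) { accountBalance -= withdrawAmount; } "
        "// remove money from the account",
    "ReportPrinter.printSummary":
        "void printSummary(Report summaryReport) { printer.send(summaryReport); } "
        "// format the report and send it to the printer",
}

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"the", "to", "from", "into", "and", "it", "a", "of"}


def tokenize(text):
    """Step 1: split identifiers (camelCase) and comments into lowercase terms."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)  # camelCase -> camel Case
    terms = re.findall(r"[A-Za-z]+", text)
    return [t.lower() for t in terms if t.lower() not in STOP_WORDS]


def build_index(elements):
    """Step 2: index the corpus as sparse TF-IDF vectors (vector space model)."""
    term_counts = {name: Counter(tokenize(text)) for name, text in elements.items()}
    n_docs = len(term_counts)
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in counts.items()}
        for name, counts in term_counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Step 3: cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def locate_concept(query, vectors, idf):
    """Step 4: rank software elements by similarity to a natural language query."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {term: tf * idf[term] for term, tf in counts.items()}
    ranked = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


vectors, idf = build_index(ELEMENTS)
ranking = locate_concept("withdraw money", vectors, idf)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

For the query "withdraw money", the `withdraw` method should rank first because it is the only element that contains both query terms, while `printSummary` shares no terms with the query and scores zero. In a realistic setting the ranking would typically be combined with structural data such as dependencies or execution traces, as noted in step 4.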
