
Working Session:

Information Retrieval Based Approaches in Software Evolution

Andrian Marcus1, Andrea De Lucia2, Jane Huffman Hayes3, Denys Poshyvanyk1


1 Department of Computer Science, Wayne State University, Detroit, MI 48202, 313 577 5408, [email protected], [email protected]
2 Dipart. di Matem. e Informatica, Università di Salerno, Via ponte don Melillo, 84084, Fisciano (SA), Italy, +39 089 963376, [email protected]
3 Department of Computer Science, University of Kentucky, 301 Rose Street, Lexington, KY 40506, 859 257 3171, [email protected]

Abstract

During software evolution a collection of related artifacts with different representations is created. Some of these are composed of structured data (e.g., analysis data), some contain semi-structured information (e.g., source code), and many include unstructured information (e.g., text). Research efforts exist that try to extract, represent, and analyze the unstructured information in software. Information retrieval (IR) techniques have been used quite successfully in recent years to represent and extract textual information from software artifacts, with application to many maintenance tasks.

This working session will focus on the state of the art in the application of IR-based techniques to support software maintenance activities. The session aims to identify the main research and practical issues in the field, to determine future work directions, and to foster collaborations among the participants.

1. Introduction and Rationale

Software is comprised of a multitude of artifacts; some of them are intended to be read by the compiler, while many others are intended to be read by developers. This is especially true during software evolution, when developers have to deal with large software, often written by others.

The user-centric information is often expressed in natural language and is embedded in documentation and source code. This information is very important for the developers to understand a great deal of the why and what of the software system, as much as the source code is useful to understand the how of the software. Natural language external documentation (e.g., requirements, design documents, user manuals), comments, and identifiers in the source code encode to a large degree the domain of the software and capture design decisions, change requests, developer information, etc. This unstructured information is referred to as semantic, as opposed to structural, which is expressed mainly by the source code and other data-intensive artifacts, such as analysis information.

The single developer/maintainer development model did not require capturing much of this information, as the working and long-term memory of the developer often sufficed to store it. Today, the increasing size and complexity of software call for large development groups, often distributed geographically. Storing and sharing the semantic information is much needed today. More than that, given the large amount of it, tools are necessary for its storage, retrieval, and analysis, before it is delivered to the users.

2. State of the Art

In the past decade, researchers proposed information retrieval (IR) models to address these problems related to the semantic information in existing software. Early models were used to construct software libraries [13], and more recent work focused on specific software maintenance or development tasks such as:
• Traceability link recovery [1, 5, 8, 12, 15]
• Concept location [17, 19, 24]
• Software and web site modularization and reverse engineering [9, 10, 14, 21]
• Requirements engineering [3, 18]
• Software reuse [7, 13, 23]
• Impact analysis [2]
• Quality assessment and software measurement [11, 16, 20], etc.

These IR-based approaches to software engineering problems differ not only in their scope, but also in their underlying indexing mechanism, corpus construction, or
data analysis method. A general model can be described with the following steps:
1. A corpus is created using the source code and other linguistic software artifacts, such as the external documentation. Various processing methods are employed in the corpus construction, some based on natural language processing techniques, such as word stemming. Each document in the corpus corresponds to a specific software element, such as a file, a class, or a method.
2. An IR method is used to index the corpus, such as vector space models [22], Latent Semantic Indexing [6], Bayes classifiers, or other probabilistic models [4]. A semantic space of the software system is created.
3. A similarity measure between the documents in the corpus is defined and similarities are computed among the corresponding software elements. These measures are commonly referred to as semantic similarities.
4. The semantic similarities are used to solve the maintenance or development task at hand. Some approaches combine these measures with additional data extracted with structural software analysis tools, such as dependencies, software change data, execution traces, test cases, etc.

3. Open Issues and Problems

The working session has several complementary goals. First, it aims at clearly defining the state of the art in the field, briefly described above. As the field grows, researchers and practitioners need to agree on a common terminology, as the current work by different groups is somewhat incoherent. We need to assess how far this field has come to date and how far it can go in the future.

In addition, we want to identify which issues are already answered by research and ready for practical applications and which are still open or unaddressed. Several questions will be directly addressed during the working session and many more will be raised on the spot:
• How can we refine and improve the general model presented above? Does the model suit all current and future applications?
• Do certain IR methods suit specific software maintenance problems, or can we use any of them for any task?
• Is the field mature enough to talk about benchmarking?
• What new applications in software evolution exist for the IR-based approaches?
• What are the major practical problems with the current state of the art: efficiency, scalability, recall and precision, etc.? Are there specific problems associated with different IR methods?
• Who among the current researchers can collaborate on future projects?
• Is there available software produced by any research group? Can we initiate and maintain an open source effort in the area?
• How can we best integrate IR methods with other techniques for the analysis of unstructured information (e.g., natural language processing)? What is the trade-off?
• How can we bridge the work of the software maintenance community and other groups from areas like requirements engineering, programming languages, etc.?
• Is there a need for future, organized meetings like this working session?

4. Session Format

The working session will last 90 minutes and will consist of three parts.

It will start with short interactive presentations given by some of the participants, which will be solicited in advance and selected by the organizers. These presentations will focus on existing approaches and techniques.

Following these presentations, all the participants will take part in an open brainstorming session, which will focus on identifying open issues in the field, new challenges, etc. Questions will be asked and answers provided by the participants.

The final part will be devoted to recapitulating and reiterating the unanswered items from the previous two parts and to building a roadmap for future events, research, and collaborations among the participants.

5. Expected Outcome of the Session

A website for the working session will be developed and maintained by the organizers. The discussions and presentations from the session will be summarized and publicized on the website and other appropriate venues. We expect that this session will be the first in a succession of future events that will focus on this research area and will also include related fields.

6. References

[1] Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., and Merlo, E., "Recovering Traceability Links between Code and Documentation", IEEE Transactions on Software Engineering, 28, 10, October 2002, pp. 970-983.
[2] Antoniol, G., Canfora, G., Casazza, G., and De Lucia, A., "Identifying the Starting Impact Set of a Maintenance Request:
A Case Study", in Proceedings 4th European Conference on Software Maintenance and Reengineering (CSMR'00), Zurich, Switzerland, February 29 - March 03 2000, pp. 227-230.
[3] Cleland-Huang, J., Settimi, R., Duan, C., and Zou, X., "Utilizing Supporting Evidence to Improve Dynamic Requirements Traceability", in Proceedings International Requirements Engineering Conference (RE'05), Paris, France, 2005, pp. 135-144.
[4] Crestani, F., Lalmas, M., Van Rijsbergen, C. J., and Campbell, I., "Is this document relevant?…probably: a survey of probabilistic models in information retrieval", ACM Computing Surveys, 30, 4, 1998, pp. 528-552.
[5] De Lucia, A., Fasano, F., Oliveto, R., and Tortora, G., "Enhancing an Artefact Management System with Traceability Recovery Features", in Proceedings IEEE International Conference on Software Maintenance (ICSM'04), Chicago, IL, September 11-17 2004, pp. 306-315.
[6] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R., "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, 41, 1990, pp. 391-407.
[7] Frakes, W., "Software Reuse Through Information Retrieval", in Proceedings 20th Hawaii International Conference On System Sciences (HICSS'87), Kona, HI, January 1987, pp. 530-535.
[8] Hayes, J. H., Dekhtyar, A., and Sundaram, S. K., "Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods", IEEE Transactions on Software Engineering, 32, 1, January 2006, pp. 4-19.
[9] Kawaguchi, S., Garg, P. K., Matsushita, M., and Inoue, K., "Mudablue: An automatic categorization system for open source repositories", in Proceedings the 11th Asia-Pacific Software Engineering Conference (APSEC'04), 2004, pp. 184-193.
[10] Kuhn, A., Ducasse, S., and Girba, T., "Enriching Reverse Engineering with Semantic Clustering", in Proceedings IEEE Working Conference On Reverse Engineering (WCRE'05), Pittsburgh, PA, November 8-11 2005, pp. 113-122.
[11] Lawrie, D., Feild, H., and Binkley, D., "Leveraged Quality Assessment Using Information Retrieval Techniques", in Proceedings 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, June 14-16 2006, pp. 149-158.
[12] Lormans, M. and Van Deursen, A., "Can LSI help Reconstructing Requirements Traceability in Design and Test?", in Proceedings 10th European Conference on Software Maintenance and Reengineering (CSMR'06), Bari, Italy, March 12 2006, pp. 47-56.
[13] Maarek, Y. S., Berry, D. M., and Kaiser, G. E., "An Information Retrieval Approach for Automatically Constructing Software Libraries", IEEE Transactions on Software Engineering, 17, 8, 1991, pp. 800-813.
[14] Maletic, J. I. and Marcus, A., "Supporting Program Comprehension Using Semantic and Structural Information", in Proceedings 23rd International Conference on Software Engineering (ICSE'01), Toronto, Ontario, Canada, May 12-19 2001, pp. 103-112.
[15] Marcus, A., Maletic, J. I., and Sergeyev, A., "Recovery of Traceability Links Between Software Documentation and Source Code", International Journal of Software Engineering and Knowledge Engineering, 15, 5, October 2005, pp. 811-836.
[16] Marcus, A. and Poshyvanyk, D., "The Conceptual Cohesion of Classes", in Proceedings IEEE International Conference on Software Maintenance (ICSM'05), Budapest, Hungary, September 25-30 2005, pp. 133-142.
[17] Marcus, A., Sergeyev, A., Rajlich, V., and Maletic, J., "An Information Retrieval Approach to Concept Location in Source Code", in Proceedings 11th IEEE Working Conference on Reverse Engineering (WCRE'04), Delft, The Netherlands, November 9-12 2004, pp. 214-223.
[18] och Dag, J. N., Gervasi, V., Brinkkemper, S., and Regnell, B., "A Linguistic-Engineering Approach to Large-Scale Requirements Management", IEEE Software, 22, 1, 2005, pp. 32-39.
[19] Poshyvanyk, D., Guéhéneuc, Y.-G., Marcus, A., Antoniol, G., and Rajlich, V., "Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification", in Proceedings 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, June 14-16 2006, pp. 137-148.
[20] Poshyvanyk, D. and Marcus, A., "The Conceptual Coupling Metrics for Object-Oriented Systems", in Proceedings 22nd IEEE International Conference on Software Maintenance (ICSM'06), Philadelphia, PA, September 25-27 2006, to appear.
[21] Ricca, F., Tonella, P., Girardi, C., and Pianta, E., "An Empirical Study on Keyword-based Web Site Clustering", in Proceedings 12th IEEE International Workshop on Program Comprehension (IWPC'04), Bari, Italy, 2004, pp. 204-213.
[22] Salton, G. and McGill, M., Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
[23] Ye, Y. and Fischer, G., "Supporting Reuse by Delivering Task-Relevant and Personalized Information", in Proceedings IEEE/ACM International Conference on Software Engineering (ICSE'02), Orlando, FL, May 19-25 2002, pp. 513-523.
[24] Zhao, W., Zhang, L., Liu, Y., Sun, J., and Yang, F., "SNIAFL: Towards a Static Non-Interactive Approach to Feature Location", ACM Transactions on Software Engineering and Methodology, 2006, to appear.
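
As an illustration of the general model outlined in Section 2 (corpus construction, indexing, similarity computation, and use of the similarities for a task), the following is a minimal sketch in Python. It is not taken from any of the cited approaches: it assumes a plain TF-IDF vector space model with cosine similarity, it omits stemming and Latent Semantic Indexing, and the corpus entries, element names, and function names (tokenize, build_index, cosine, rank) are hypothetical. The task shown is a simple concept-location style query over three made-up methods.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Step 1 helper: lowercase terms, splitting camelCase and snake_case identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "saveFile" -> "save", "File"; "HTMLParser" -> "HTML", "Parser"
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return [t.lower() for t in terms]


def build_index(corpus):
    """Step 2: index the corpus as TF-IDF vectors (one sparse vector per element)."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n_docs = len(counts)
    # Smoothed inverse document frequency; an assumption, other weightings exist.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Step 3: cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, vectors, idf):
    """Step 4: rank software elements by similarity to a natural language query."""
    q_counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Step 1: a hypothetical corpus, one document per method (identifier plus comment).
corpus = {
    "FileStore.saveDocument": "saveDocument: write the document contents to a file on disk",
    "Parser.parseRequest": "parseRequest: parse the incoming HTTP request headers",
    "Cache.evictEntry": "evictEntry: remove the oldest entry from the memory cache",
}

vectors, idf = build_index(corpus)
ranking = rank("save file to disk", vectors, idf)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

In this toy corpus only FileStore.saveDocument shares terms with the query, so it is ranked first and the other two elements score zero. A real system would add the preprocessing and indexing choices discussed in Section 2, and possibly structural data as mentioned in step 4.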
