Unit 1
Introduction
to Mining Software Repositories
Even Sem Jan 2024
Dr. Kamaldeep Kaur
Software Engineering
• The establishment and use of sound engineering
principles in order to obtain economically
developed software that is reliable and works
efficiently on real machines.
• Software engineering is defined by IEEE
Computer Society as (Abran et al. 2004):
– The application of a systematic, disciplined,
quantifiable approach to the development, operation
and maintenance of software, and the study of these
approaches, that is, the application of engineering to
software.
Empirical Software Engineering
• “Empirical” is typically used to define any statement about the
world that is related to observation or experience.
• Empirical software engineering (ESE) is an area of research
that emphasizes the use of empirical methods in the field of
software engineering.
It involves methods for evaluating, assessing, predicting,
monitoring, and controlling the existing artifacts of software
development.
• ESE applies quantitative methods to the software engineering
phenomenon to understand software development better.
• ESE has been gaining importance over the past few decades
because of the ability to mine data from open source
software repositories that contain information about software
requirements, bugs, and changes.
What are Software Repositories?
• Software repositories, also known as code repositories, are
centralized hubs that help developers create, maintain, and
track software packages.
• Repository management software controls access to
software packages, tracks package deployments, and
includes or integrates with version control systems. The
repositories work with package managers and build tools.
• Advanced repository features include pipeline workflow
tools, static code analysis, vulnerability testing, and
ensuring developers have access to the latest versions of
public artifacts.
Public and Private Repositories
• Public repositories securely store, publish, and
freely share open-source software.
• Organizations use private repositories to
manage their proprietary software resources.
They can publish that software and charge
fees through licensing arrangements.
• Code repositories support the discovery of
software assets and promote code reuse.
Software Repositories Features
• Step 4:
– Create a local directory using the following command:
– $ mkdir test
– $ cd test
• Step 5:
– The next step is to initialize the directory:
– $ git init
How to use GIT?
• Step 6:
– Go to the folder where "test" is created and create a text
document named "demo." Open "demo" and put any content,
like "Hello Simplilearn." Save and close the file.
• Step 7:
– Enter the Git bash interface and type in the following command
to check the status:
– $ git status
• Step 8:
– Stage "demo.txt" for the next commit using the following command:
– $ git add demo.txt
• Step 9:
– Next, make a commit using the following command:
– $ git commit -m "committing a text file"
How to use GIT?
• Step 10:
– Configure the user name that Git records in your commits:
– $ git config --global user.name "<your name>"
• Step 11:
– Open your Github account and create a new repository
with the name "test_demo" and click on "Create
repository." This is the remote repository. Next, copy the
link of "test_demo."
• Step 12:
– Go back to Git bash and link the remote and local
repository using the following command:
– $ git remote add origin <link>
• Step 13:
– Push the local file onto the remote repository
using the following command:
– $ git push origin master
• Step 14:
– Move back to Github and click on "test_demo"
and check if the local file "demo.txt" is pushed to
this repository.
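Once a repository has commits, its history can be read back as data, which is the starting point for mining it. The Python sketch below parses text in the shape produced by `git log --pretty=format:'%H|%an|%ad|%s' --date=short` and counts commits per author. The sample log lines, author names, and the `parse_log` helper are illustrative assumptions, not output from the "test" repository above.

```python
from collections import Counter

# Hypothetical sample in the shape of:
#   git log --pretty=format:'%H|%an|%ad|%s' --date=short
SAMPLE_LOG = """\
a1b2c3|Asha|2024-01-10|committing a text file
d4e5f6|Ravi|2024-01-11|Fix crash when demo.txt is empty
0a9b8c|Asha|2024-01-12|Update demo text
"""


def parse_log(text):
    """Turn each 'hash|author|date|subject' line into a dict."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any '|' inside the commit subject intact
        commit_hash, author, date, subject = line.split("|", 3)
        commits.append(
            {"hash": commit_hash, "author": author, "date": date, "subject": subject}
        )
    return commits


commits = parse_log(SAMPLE_LOG)
commits_per_author = Counter(c["author"] for c in commits)
print(commits_per_author.most_common())
```

In a real study the sample string would be replaced by the actual output of `git log` run inside the repository.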
GIT Features in Detail
• Study GIT features in detail from the GIT
document provided to you (attached with
this post).
Quiz
• Empirical Software Engineering is not based
upon
– Experience
– Intuition
– Evidence from Data
– Observation
• The potential benefit of MSR is
– Informed decision making
– Creating jobs
– Creating network of developers
– None of the above three
– Some of the above three
• Write three examples of metadata available
in a source control repository
Bug Tracking Systems
• A bug tracking system (also known as a defect tracking
system) is a software application built to keep a
record of the defects, bugs, or issues found during the
software development life cycle. It is a type of issue
tracking system.
• Bug tracking systems are commonly employed by a
large number of OSS systems and most of these
tracking systems allow the users to generate various
types of defect reports directly.
• Typical bug tracking systems are integrated with other
software project management tools and
methodologies.
Bug Information
The information about a bug typically includes the
following:
• The time when the bug was reported in the software
system
• Severity of the reported bug
• Behavior of the source program/module in which the
bug was encountered.
• Details on how to reproduce that bug
• Information about the person who reported that bug
• Developers who are possibly working to fix that bug, or
will be assigned the job to do so
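As a minimal sketch, the fields listed above can be held in one record per bug. The class name `BugReport`, the field names, and the sample values are hypothetical and do not follow the schema of any particular bug tracking system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class BugReport:
    bug_id: int
    reported_at: datetime            # when the bug was reported
    severity: str                    # severity of the reported bug
    observed_behavior: str           # behavior of the module where it occurred
    steps_to_reproduce: List[str]    # how to reproduce the bug
    reporter: str                    # who reported it
    assignee: Optional[str] = None   # developer fixing it, if already assigned


report = BugReport(
    bug_id=101,
    reported_at=datetime(2024, 1, 15, 10, 30),
    severity="major",
    observed_behavior="Parser module crashes on empty input",
    steps_to_reproduce=["Create an empty file", "Run the parser on it"],
    reporter="tester@example.org",
)

# A freshly reported bug has no developer assigned yet.
print(report.assignee is None)
```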
Components of BTS
• A database is a crucial component of a bug
tracking system, which stores and maintains
information regarding the bugs reported by
the users and/or developers.
• Many bug tracking systems also track the
status of each bug as it changes over time.
The sequence of statuses a bug passes through
is known as the bug life cycle.
Bug Life cycle
1. New: When any new defect is identified by the tester, it falls in the
‘New’ state. It is the first state of the Bug Life Cycle. The tester
provides a proper Defect document to the Development team so that
the development team can refer to Defect Document and can fix the
bug accordingly.
2. Assigned: Once a ‘New’ defect is approved, it is assigned to the
development team to work on and resolve, and the status of the bug
changes to the ‘Assigned’ state.
3. Open: In the ‘Open’ state the developer team is working on a fix for
the defect. If the team has a specific reason to consider the defect
not appropriate, it is transferred to either the ‘Rejected’ or
‘Deferred’ state.
BLC
4. Fixed: After making the necessary code changes to fix the
identified bug, the developer team marks the state as ‘Fixed’.
5. Pending Retest: Once the fix is complete, the developer
team passes the new code to the testing team. While the
code/application is waiting to be retested on the tester
side, the status is ‘Pending Retest’.
6. Retest: At this stage, the tester starts work of retesting
the defect to check whether the defect is fixed by the
developer or not, and the status is marked as ‘Retesting’.
BLC
7. Reopen: If, after retesting, the tester finds that the bug
still occurs even though the developer team has fixed it, the
status of the bug is changed to ‘Reopened’. The bug returns to
the ‘Open’ state and goes through the life cycle again, that
is, it goes back to the developer team for re-fixing.
8. Verified: The tester re-tests the bug after it got fixed by the
developer team and if the tester does not find any kind of
defect/bug then the bug is fixed and the status assigned is
‘Verified’.
9. Closed: This is the final state of the defect cycle. Once the
developer team has fixed the defect and testing confirms that
the bug no longer persists, the defect is marked as ‘Closed’.
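The life cycle above can be sketched as a small state machine in Python. The transition table is read directly from the nine states described; treating ‘Rejected’, ‘Deferred’, and ‘Closed’ as terminal is an assumption, since the slides do not say what follows them. The names `ALLOWED`, `advance`, and `replay` are hypothetical.

```python
# Allowed status changes, as described in the bug life cycle above.
ALLOWED = {
    "New": {"Assigned"},
    "Assigned": {"Open"},
    "Open": {"Fixed", "Rejected", "Deferred"},
    "Fixed": {"Pending Retest"},
    "Pending Retest": {"Retest"},
    "Retest": {"Verified", "Reopened"},
    "Reopened": {"Open"},
    "Verified": {"Closed"},
    # Assumption: no further transitions from these states.
    "Rejected": set(),
    "Deferred": set(),
    "Closed": set(),
}


def advance(current, new):
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


def replay(history):
    """Check a whole status history and return its final status."""
    state = history[0]
    for nxt in history[1:]:
        state = advance(state, nxt)
    return state


# A bug that is reopened once before finally being closed.
history = [
    "New", "Assigned", "Open", "Fixed", "Pending Retest", "Retest",
    "Reopened", "Open", "Fixed", "Pending Retest", "Retest",
    "Verified", "Closed",
]
print(replay(history))
```

A check like this is also useful when mining bug data, because status histories that violate the expected life cycle often point to data-quality problems.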
Bug Severity and Bug Priority
• Severity is basically a parameter that denotes
the total impact of a given defect on any
software.
• Priority is basically a parameter that decides
the order in which we should fix the defects.
• Severity relates to the standards of quality.
• Priority relates to the scheduling of defects to
resolve them in software.
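A short Python sketch of the difference: severity describes impact, while priority decides the fixing order. The severity scale, the convention that priority 1 is the most urgent, and the sample bugs are all assumptions made for illustration.

```python
# Assumed scales: lower rank = more severe; priority 1 = fix first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}

bugs = [
    {"id": 1, "severity": "minor", "priority": 1},
    {"id": 2, "severity": "critical", "priority": 3},
    {"id": 3, "severity": "major", "priority": 1},
]

# Schedule by priority first; severity only breaks ties.
fix_order = sorted(
    bugs, key=lambda b: (b["priority"], SEVERITY_RANK[b["severity"]])
)
print([b["id"] for b in fix_order])
```

Here the critical bug (id 2) is scheduled last because its priority is lowest, even though its impact is the highest.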
How BTS is Used by admins and Devs?
• Ideally, the administrators of a bug tracking system are
allowed to manipulate the bug information, such as
determining the possible values of bug status, and
hence the bug life cycle states, configuring the
permissions based on bug status, changing the status
of a bug, or even removing the bug information from the
database.
• Many systems also update the administrators and
developers associated with a bug through emails or
other means, whenever new information is added in
the database corresponding to the bug, or when the
status of the bug changes.
Advantages of BTS
• The primary advantage of a bug tracking system is
that it provides a clear, concise, and centralized
overview of the bugs reported in any phase of the
software development life cycle, and their state.
• The information provided is valuable for defining
the product road map and plan of action, or even
planning the next release of a software system.
• Bugzilla is one of the most widely used bug
tracking systems. Several open source projects,
including Mozilla, use Bugzilla.
Mailing List Analysis
• Most open source developers communicate
through mailing lists.
• This style of communication makes mailing lists a
rich source of information which researchers can
use to understand software processes and
improve development practices.
• Mailing lists have been used to infer social
structure, identify architectural changes, and
also to study the code review process.
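One simple way to approximate social structure is to count who replies to whom. The Python sketch below does this with the standard `email` module, linking each message to its parent through the `Message-ID` and `In-Reply-To` headers. The four messages and their addresses are made up for illustration; a real study would read them from a mailing list archive.

```python
from collections import Counter
from email import message_from_string

# Hypothetical messages; only the headers matter for this sketch.
RAW_MESSAGES = [
    "From: asha@example.org\nMessage-ID: <1@list>\n"
    "Subject: Proposal: new parser\n\nInitial proposal.\n",
    "From: ravi@example.org\nMessage-ID: <2@list>\nIn-Reply-To: <1@list>\n"
    "Subject: Re: Proposal: new parser\n\nLooks good.\n",
    "From: meera@example.org\nMessage-ID: <3@list>\nIn-Reply-To: <1@list>\n"
    "Subject: Re: Proposal: new parser\n\nOne concern.\n",
    "From: asha@example.org\nMessage-ID: <4@list>\nIn-Reply-To: <2@list>\n"
    "Subject: Re: Proposal: new parser\n\nThanks.\n",
]

messages = [message_from_string(raw) for raw in RAW_MESSAGES]

# Map each message id to its sender so replies can be resolved.
sender_of = {m["Message-ID"]: m["From"] for m in messages}

# Each edge (replier, original author) is one reply.
reply_edges = Counter()
for m in messages:
    parent_id = m["In-Reply-To"]  # None for thread-starting messages
    if parent_id in sender_of:
        reply_edges[(m["From"], sender_of[parent_id])] += 1

for (replier, author), count in sorted(reply_edges.items()):
    print(f"{replier} -> {author}: {count}")
```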
Mailing List Analysis
• Developers use mailing lists to discuss a
variety of issues and project decisions.
• Many of these issues and decisions are related
to and affect the source code. These issues are
often driven by external factors such as the
introduction of new features in competing
products.
Role of Mailing Lists in OSS
Extracting Data from Software
Repositories
• The procedure for extracting data from software
repositories is depicted in the figure on the next slide.
• The figure shows the data-collection process for extracting
defect/change reports.
• The first step in the data-collection procedure is to extract
metrics using metrics-collection tools such as Understand
and Chidamber and Kemerer Java Metrics (CKJM).
• The second step involves collection of bug information to
the desired level of detail (file, method, or class) from the
defect report and source control repositories.
• Finally, the report containing the software metrics and the
defects extracted from the repositories is generated and
can be used by the researchers for further analysis.
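The final step, combining metrics with defects, can be sketched in Python as a join at class level. The metric values, class names, and bug ids below are invented; WMC and CBO are two of the Chidamber and Kemerer metrics, and the CSV layout is an assumption rather than the actual output format of CKJM or Understand.

```python
import csv
import io
from collections import Counter

# Step 1 (assumed already done): per-class metrics exported by a tool.
METRICS_CSV = """class,wmc,cbo
Parser,12,4
Lexer,7,2
Cache,3,1
"""

# Step 2 (assumed already done): bugs traced to the class that was fixed.
DEFECT_RECORDS = [
    ("BUG-1", "Parser"),
    ("BUG-2", "Parser"),
    ("BUG-3", "Cache"),
]

defects_per_class = Counter(cls for _bug_id, cls in DEFECT_RECORDS)

# Step 3: one report row per class, metrics plus defect count.
report = []
for row in csv.DictReader(io.StringIO(METRICS_CSV)):
    report.append(
        {
            "class": row["class"],
            "wmc": int(row["wmc"]),
            "cbo": int(row["cbo"]),
            "defects": defects_per_class.get(row["class"], 0),
        }
    )

for line in report:
    print(line)
```

Classes with no recorded defects keep a count of 0, so the report covers every class that has metrics.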
Extracting Data from Software
Repositories