Unit 1

UNIT-I

Introduction
to Mining Software Repositories
Even Sem Jan 2024
Dr. Kamaldeep Kaur
Software Engineering
• The establishment and use of sound engineering
principles in order to obtain economically
developed software that is reliable and works
efficiently on real machines.
• Software engineering is defined by IEEE
Computer Society as (Abran et al. 2004):
– The application of a systematic, disciplined,
quantifiable approach to the development, operation
and maintenance of software, and the study of these
approaches, that is, the application of engineering to
software.
Empirical Software Engineering
• “Empirical” is typically used to define any statement about the
world that is related to observation or experience.
• Empirical software engineering (ESE) is an area of research
that emphasizes the use of empirical methods in the field of software engineering.
It involves methods for evaluating, assessing, predicting,
monitoring, and controlling the existing artifacts of software
development.
• ESE applies quantitative methods to the software engineering
phenomenon to understand software development better.
• ESE has been gaining importance over the past few decades
because of the ability to mine data from open source
software repositories that contain information about software
requirements, bugs, and changes.
What are Software Repositories?
• Software repositories, also known as code repositories, are
centralized hubs that help developers create, maintain, and
track software packages.
• Repository management software controls access to
software packages, tracks package deployments, and
includes or integrates with version control systems. The
repositories work with package managers and build tools.
• Advanced repository features include pipeline workflow
tools, static code analysis, vulnerability testing, and
ensuring developers have access to the latest versions of
public artifacts.
Public and Private Repositories
• Public repositories securely store, publish, and
freely share open-source software.
• Organizations use private repositories to
manage their proprietary software resources.
They can publish that software and charge
fees through licensing arrangements.
• Code repositories support the discovery of
software assets and promote code reuse.
Software Repositories Features

• Build, maintain, store, or source software packages and containers
from public and private feeds
• Package management
• Deployment tracking
• Version control
• Access controls
• Encrypted storage and backup
• Asset discovery
• Pipeline workflow tools
• Release management tools
• Static code analysis
• Vulnerability testing
• Dashboards, statistics, and reporting
What are the benefits of using
software repositories?
• Productivity: Software repositories facilitate
code reuse, expediting product development.
• Efficiency: A centralized software repository
fosters collaboration. Repository management
tools streamline the tracking and deployment
of software packages.
• Security: Encrypted storage, access controls,
and backups secure the software packages.
Importance of Mining Software
Repositories (MSR)
• Software repositories usually provide a vast array
of varied and valuable information regarding
software projects.
• By utilizing the information mined from these
repositories, software engineering researchers
and practitioners do not need to depend
primarily on their intuition and experience, but
more on real data for informed decision making.
• Effective mining techniques can extract the right
kind of information from these repositories in the
right form.
Introduction to MSR
• Until 2005, software repositories were used mainly as
historical records supporting activities such as
retrieving old versions of the source code or
examining the status of a defect, and were
rarely used to facilitate decision-making
processes (Hassan 2008).
• MSR aims to transform these repositories from static
record-keeping repositories into active ones that
guide the decision-making process of modern
software project management.
Potential Benefits of MSR
• Mining data from software repositories is
expected to bring the following potential
benefits:
• Support maintenance of the software system
• Empirical validation of techniques and
methods
• Supporting software reuse
• Proper allocation of testing and maintenance
resources
Data Analysis Procedure After Mining
Types of Software repositories
• Historical
– Source Control
– Bug Repositories
– Archived Communications
• Run-time repositories or Deployment logs
• Source Code repositories.
Source control repositories
• Source control repositories record and maintain the development
trail of a project.
• They track each and every change incurred in any of the artifacts of
a software system, such as the source code, documentation
manuals, and so on.
• Additionally, they also maintain the metadata regarding each
change, for instance, the developer or project member who carried
out the change, the timestamp when the change was performed,
and a short description of the change.
• These are the most readily available repositories, and also the most
widely used in software projects.
• Git, CVS, Subversion (SVN), Perforce, and ClearCase are some of the
popular source control repositories that are used in practice.
• Source control repositories are also known as version control systems (VCS).
Bug Repositories
• These repositories track and maintain the
resolution history of defect/bug reports,
which provide valuable information regarding
the bugs that were reported by the users of a
large software project, as well as the
developers of that project.
• Bugzilla and Jira are the commonly used bug
repositories.
Archived Communications
• Discussions regarding the various aspects of a
software project during its lifecycle, such as
mailing lists, emails, instant messages, and
internet relay chats (IRCs) are recorded in the
archived communications.
Run-Time Repositories
• Run-time repositories, also known as deployment logs,
record information regarding the execution of a single
deployment, or different deployments of a software
system. For example, run-time repositories may record the
error messages reported by a software application at varied
deployment sites.
• Run-time repositories can possibly be employed to
determine the execution anomalies by discovering
dominant execution or usage patterns across various
deployments, and recording the deviations observed from
such patterns.
• Sarbanes-Oxley Act of 2002 states that it is mandatory to
log the execution of every commercial, financial, and
telecommunication application in these repositories.
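The idea in the second bullet (find the dominant execution pattern, then record deviations from it) can be sketched as follows. This is a minimal illustration, not a production log-mining technique: the deployment names and event sequences are hypothetical, and a real analysis would work on far noisier logs.

```python
from collections import Counter

def dominant_pattern(logs):
    """Return the event sequence seen at the largest number of deployments.

    `logs` maps a deployment name to its ordered list of logged events.
    """
    counts = Counter(tuple(events) for events in logs.values())
    pattern, _ = counts.most_common(1)[0]
    return pattern

def deviations(logs):
    """Return the deployments whose event sequence differs from the dominant one."""
    pattern = dominant_pattern(logs)
    return sorted(site for site, events in logs.items() if tuple(events) != pattern)

# Hypothetical deployment logs, invented for illustration only.
logs = {
    "site-a": ["start", "load-config", "serve"],
    "site-b": ["start", "load-config", "serve"],
    "site-c": ["start", "load-config", "error:timeout"],
}
print(dominant_pattern(logs))
print(deviations(logs))
```

Here the anomaly is simply "any sequence that is not the most common one"; stricter or fuzzier definitions are equally possible.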
Source-Code Repositories
• Source code repositories maintain the source
code for a large number of OSS projects.
• SourceForge.net and Google Code are among
the most commonly employed code
repositories, and host the source code for a
large number of open source systems, such
as Android OS and Apache Foundation projects, among others.
Version Control Systems (VCS)
• VCS, also known as source control systems or
simply versioning systems, are systems that
track and record changes incurred to a single
artifact or a set of artifacts of a software
system.
• Each and every change, no matter how big or
small, is recorded over time so that we may
recall specific revisions or versions of the
system artifacts later.
Three Types of VCS
• Local VCS (e.g., Revision Control System)
• Centralized Version Control System (CVCS) (e.g.,
SVN)
• Distributed/Decentralized Version Control
System (DVCS) (e.g., Git)
Local VCS
• Local VCS employ a simple database that records and maintains all the
changes to artifacts of the software project under revision control.
• A system named revision control system (RCS) was a very popular local
versioning system.
• This tool operates by simply recording the patch sets (i.e., the differences
between two revisions of an artifact) while moving from one revision to the other in a
specific format on the user’s system.
• It can then recreate the image of a project artifact at any point in
time by applying all the recorded patches in order.
• However, the user cannot collaborate with other users on other systems,
as the database is local and not maintained centrally.
• Each user has his/her own copy of the different revisions of project
artifacts, and thus there are consistency and data sharing problems.
• Moreover, if a user loses the versioning data, recovering it is impossible
unless a backup has been maintained from time to time.
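The patch-set idea described above can be sketched in a few lines. This is only an illustration of "store the first revision plus the differences, then rebuild any revision by applying the patches in order"; the names `make_patch`, `apply_patch`, and `LocalStore` are hypothetical, and the real RCS file format and delta direction are not modelled here.

```python
from difflib import SequenceMatcher

def make_patch(old, new):
    """Return the edits that turn the list of lines `old` into `new`."""
    patch = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
        if tag != "equal":
            # Replace old[i1:i2] by new[j1:j2]; this covers insert, delete and replace.
            patch.append((i1, i2, new[j1:j2]))
    return patch

def apply_patch(old, patch):
    """Apply one patch produced by make_patch to `old`."""
    result, pos = [], 0
    for i1, i2, lines in patch:
        result.extend(old[pos:i1])  # unchanged lines before this edit
        result.extend(lines)        # replacement lines
        pos = i2
    result.extend(old[pos:])        # unchanged tail
    return result

class LocalStore:
    """Keeps revision 1 in full and every later revision as a patch."""

    def __init__(self, first_revision):
        self.base = list(first_revision)
        self.patches = []
        self._latest = list(first_revision)

    def commit(self, new_revision):
        self.patches.append(make_patch(self._latest, new_revision))
        self._latest = list(new_revision)
        return len(self.patches) + 1  # number of the revision just stored

    def checkout(self, revision):
        lines = list(self.base)
        for patch in self.patches[:revision - 1]:
            lines = apply_patch(lines, patch)
        return lines

store = LocalStore(["a", "b", "c"])
store.commit(["a", "B", "c", "d"])
store.commit(["B", "c", "d"])
print(store.checkout(2))
```

Because everything lives in one local object, the sketch also shows the weakness noted above: if this store is lost, every revision is lost with it.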
Diagram of Local VCS
Centralized VCS (CVCS)
• The main aim of CVCS is to allow the user to easily
collaborate with different users on other systems.
• These systems, such as CVS, Perforce, and SVN, employ
a single centralized server that records and maintains
all the versioned artifacts of a software project under
revision control, and there are a number of clients or
users that check out (obtain) the project artifacts from
that central server.
• However, if the central server fails or the data stored on
the central server is corrupted or lost, there is no chance
of recovery unless periodic backups are maintained.
Diagram of Centralized VCS
Distributed VCS (DVCS)
• As opposed to CVCS, a DVCS (such as Bazaar, Darcs, Git,
and Mercurial) ensures that the clients or users do not
just obtain or check out the latest revision or snapshot
of the project artifacts, but clone, mirror, or download
the entire software project repository to obtain the
artifacts.
• If any server of the DVCS fails or its data is corrupted or
lost, any of the software project repositories stored on
a client machine can be uploaded as a backup to the
server to restore it. Therefore, every checkout (clone) carried
out by a client is essentially a complete backup of the
entire software project data.
Diagram of Distributed VCS
Some Important Terms-Build, Revision
• Build - a binary that is produced from the source code committed
to the VCS (Version Control System).
• Version - an identifier assigned to a build so that it can be
identified.
• Revision - an identifier in the VCS that tells
which state of the source code existed at which
point in time.
• Release - a build (or its version) that goes to the
production environment or is announced to the
public.
Example
• Consider a complete version number like 8.3.2.37.
• The first group (before the first period) is the release number. A release introduces
major new features, may involve significant internal rework/re-architecting, and
may break compatibility with previous releases or previously-supported platforms.
• The second group is the version number. This is a smaller update within a release.
It might add some new features and fix assorted bugs. It generally does not
include major internal design or architecture changes and should be (mostly)
backwards-compatible with other versions of the same release.
• The third group is the revision number. A revision usually contains bug fixes and
tiny enhancements.
• The fourth group is the build number, a sequential build number within a release. In the example
above, this is the 37th build of revision 8.3.2. The build number is generally
automatically incremented by a Continuous Integration (CI) build process. Only one
build will be the official build for a given revision. If 8.3.2.37 is released and
then a critical bug is found, the fix would be in 8.3.3.x.
• Not all projects use this scheme but it is fairly common.
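The four-part scheme above can be made concrete with a small parser. This is a sketch under the assumption that a version string always has exactly four dot-separated integer groups; `VersionNumber` and `parse_version` are illustrative names, not part of any tool.

```python
from typing import NamedTuple

class VersionNumber(NamedTuple):
    release: int   # first group: major new features, may break compatibility
    version: int   # second group: smaller update within a release
    revision: int  # third group: bug fixes and tiny enhancements
    build: int     # fourth group: sequential build number

def parse_version(text):
    """Split a string such as '8.3.2.37' into its four named groups."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four groups, got {len(parts)}: {text!r}")
    return VersionNumber(*(int(part) for part in parts))

v = parse_version("8.3.2.37")
print(v.release, v.version, v.revision, v.build)
```

Since a `NamedTuple` compares field by field from left to right, two parsed version numbers can be ordered directly, for example to check that a fix in revision 8.3.3 is newer than build 8.3.2.37.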
VCS Terminology Generic
• Repository: A repository is the heart of any version control system. It is
the central place where developers store all their work. A repository stores
not only files but also their history. It is accessed over a
network, with the repository acting as a server and the version control tool acting as a client.
Clients can connect to the repository and then store their changes in it or
retrieve changes from it. By storing changes, a client makes these
changes available to other people, and by retrieving changes, a client brings
other people's changes into its working copy.
• Trunk: The trunk is a directory where all the main development happens
and is usually checked out by developers to work on the project.
• Tags: The tags directory is used to store named snapshots of the project.
The tag operation gives descriptive and memorable names to a specific
version in the repository.
• For example, LAST_STABLE_CODE_BEFORE_EMAIL_SUPPORT is more
memorable than Repository UUID: 7ceef8cb-3799-40dd-a067-c216ec2e5247
and Revision: 13
VCS Terminology Continued
• Branches: Branch operation is used to create another line of development.
It is useful when you want your development process to fork off into two
different directions. For example, when you release version 5.0, you might
want to create a branch so that development of 6.0 features can be kept
separate from 5.0 bug-fixes.
• Working copy: A working copy is a snapshot of the repository. The
repository is shared by all the teams, but people do not modify it directly.
Instead, each developer checks out a working copy. The working copy is a
private workplace where developers can do their work in isolation
from the rest of the team.
• Commit changes: Commit is the process of storing changes from the private
workplace to the central server. After a commit, the changes are available to
the whole team. Other developers can retrieve these changes by updating
their working copies. Commit is an atomic operation: either the whole
commit succeeds or it is rolled back. Users never see a half-finished commit.
• Head: Refers to the commit that has been made most recently, either to a
branch or to the trunk.
Concurrent Versions System (CVS)
• Concurrent Versions System (CVS) is a
particular VCS of the centralized type.
• CVS is used for two apparently unrelated
purposes: record keeping and collaboration. It
turns out, however, that these two functions
are closely connected.
CVS
• Record keeping became necessary because people wanted
to compare a program's current state with how it was at
some point in the past.
• For example, in the normal course of implementing a new
feature, a developer may bring the program into a
thoroughly broken state, where it will probably remain until
the feature is mostly finished.
• Unfortunately, this is just the time when someone usually
calls to report a bug in the last publicly released version.
• To debug the problem (which may also exist in the current
version of the sources), the program has to be brought back
to a useable state.
CVS Specifics
• Revision: A committed change in the history of a file or set of files. A
revision is one "snapshot" in a constantly changing project.
• Repository: The master copy where CVS stores a project's full
revision history. Each project has exactly one repository.
• Working copy: The copy in which you actually make changes to a
project. There can be many working copies of a given project;
generally each developer has his or her own copy.
• Check out: To request a working copy from the repository. Your
working copy reflects the state of the project as of the moment you
checked it out; when you and other developers make changes, you
must use commit and update to "publish" your changes and view
others' changes.
• Commit: To send changes from your working copy into the central
repository. Also known as check-in.
CVS Specifics
• Log message: A comment you attach to a revision when you
commit it, describing the changes. Others can page through
the log messages to get a summary of what's been going on
in a project.
• Update: To bring others' changes from the repository into
your working copy and to show if your working copy has
any uncommitted changes. Be careful not to confuse this
with commit; they are complementary operations.
Mnemonic: update brings your working copy up to date
with the repository copy.
• Conflict: The situation when two developers try to commit
changes to the same region of the same file. CVS notices
and points out conflicts, but the developers must resolve
them.
CVS features in Detail
• CVS is a popular CVCS that hosts a large number of OSS
systems (Cederqvist et al. 1992).
• CVS has been developed with the primary goal to
handle different revisions of various software project
artifacts by storing the changes between two
subsequent revisions of these artifacts in the
repository.
• Thus, CVS predominantly stores the change logs rather
than the actual artifacts such as binary files.
• This does not imply that CVS cannot store binary files. It
can, but they are not handled efficiently.
CVS Revision Numbers
• Revision numbers: Each new revision or version of a project artifact
stored in the CVS repository is assigned a unique revision number
by the CVS itself.
• For example, the first version of a checked in artifact is assigned
the revision number 1.1.
• After the artifacts are modified (updated) and the changes are
committed (permanently recorded) to the CVS repository, the
revision number of each modified artifact is incremented by one (for example, from 1.1 to 1.2).
• Since some artifacts are changed more often
than others, the revision numbers of the artifacts in a project generally
differ from one another.
• Therefore, a release of the software project, which is basically a
snapshot of the CVS repository, comprises all the artifacts under
version control, where the artifacts can have individual revision
numbers.
CVS Branching and Merging
• CVS supports almost all of the functionalities pertaining to branches in a
VCS.
• The user can create his/her own branch for development, and view,
modify, or delete a branch created by the user as well as other users,
provided the user is authorized to access those branches in the repository.
• To create a new branch, CVS chooses the first unused even integer,
starting with 2, and appends it to the revision number of each artifact from which
the branch is forked off, that is, the artifacts on which the user who created
the branch wishes to work.
• For example, the first branch, which is created at the revision number 1.2
of an artifact, receives the branch number 1.2.2, but CVS internally stores it
as 1.2.0.2.
• However, the main issue with branches is that the detection of branch
merges is not supported by CVS. Consequently, CVS lacks
adequate mechanisms for tracking the evolution of typically large-sized
software systems as well as their particular products.
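The branch numbering rule above can be sketched as follows. The function names are hypothetical and the code only mirrors the rule as stated on this slide (first unused even integer, starting with 2, appended to the fork revision; a 0 inserted before it in the internal form); it does not call CVS.

```python
def next_branch_number(revision, existing_branches=()):
    """Return the number CVS would give a new branch forked at `revision`.

    `existing_branches` lists branch numbers already forked somewhere,
    e.g. ["1.2.2"]; only those forked at the same revision matter.
    """
    used = set()
    for branch in existing_branches:
        prefix, _, last = branch.rpartition(".")
        if prefix == revision:
            used.add(int(last))
    number = 2
    while number in used:  # first unused even integer, starting with 2
        number += 2
    return f"{revision}.{number}"

def internal_form(branch_number):
    """Return the internally stored form, e.g. '1.2.2' -> '1.2.0.2'."""
    prefix, _, last = branch_number.rpartition(".")
    return f"{prefix}.0.{last}"

print(next_branch_number("1.2"))             # first branch at revision 1.2
print(internal_form("1.2.2"))                # how that branch is stored
print(next_branch_number("1.2", ["1.2.2"]))  # second branch at the same revision
```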
CVS Version control Data
• For each artifact under the repository’s
version control, CVS generates detailed version
control data and saves it in a change log, or
simply log file.
• The recorded log information can be easily
retrieved by using the cvs log command.
• Moreover, we can specify some additional
parameters so as to allow the retrieval of
information regarding a particular artifact or even
the complete project directory.
Example Log File from CVS
CVS Shortcoming
• A major shortcoming of CVS that troubles most
developers is the lack of appropriate
mechanisms for linking
detailed modification reports and for classifying
changes.
CVS other details
• RCS file: This field contains the path information to identify an artifact in the
repository.
• Locks and AccessList: These are file content access and security options set by
the developer when committing the file to CVS. These may be
used to prevent unauthorized modification of the file: users may
download certain files but are not allowed to commit protected or locked files
to the CVS repository.
• Symbolic names: This field contains the revision numbers assigned to tag names.
The assignment of revision numbers to the tag names is carried out individually
for each artifact because the revision numbers might be different.
CVS Details
• Description: This field contains the modification reports that describe the change
history of the artifact, beginning from the first commit until the current version.
Apart from the changes incurred in the head or main trunk, changes in all the
branches are also recorded there. The revisions are separated by lines of
“-” characters.
• Revision number: This field is used to identify the revision of source code artifact
(main trunk, branch) that has been subject to change(s).
• Date: This field records the date and time of the check in.
• Author: This field provides the information of the person who committed the
change.
• State: This field provides information about the state of the committed artifact and
generally assumes one of these values: “Exp” (experimental) and “dead” (file has
been removed).
CVS Details
• Lines: This field counts the lines added and/or
deleted in the newly checked-in revision
compared with the previous version of a file. If
the current revision is also a branch point, a list of
branches derived from this revision is given in
the branches field. In the above example, the
branches field is blank, indicating that the current
revision is not a branch point.
• Free Text: This field provides the comments
entered by the author while committing the
artifact.
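The per-revision fields described above (revision number, date, author, state, lines, free text) can be pulled out of `cvs log` output with a few regular expressions. The sketch below assumes the common layout in which entries are separated by a line of "-" characters and the second line of each entry holds `name: value;` pairs; the sample log text, author names, and messages are hypothetical.

```python
import re

# Hypothetical excerpt in the usual `cvs log` layout, for illustration only.
SAMPLE_LOG = """\
----------------------------
revision 1.2
date: 2003/05/14 09:21:07;  author: jdoe;  state: Exp;  lines: +12 -3
Fix null check in parser
----------------------------
revision 1.1
date: 2003/05/10 16:02:45;  author: asmith;  state: Exp;
Initial revision
"""

SEPARATOR = re.compile(r"^-{20,}$", re.MULTILINE)
FIELD = re.compile(r"(\w+): ([^;]+)")

def parse_revisions(log_text):
    """Return one dict per revision entry found in `log_text`."""
    entries = []
    for chunk in SEPARATOR.split(log_text):
        lines = chunk.strip().splitlines()
        if len(lines) < 2 or not lines[0].startswith("revision "):
            continue  # header material or an empty chunk
        entry = {"revision": lines[0].split()[1]}
        entry.update(FIELD.findall(lines[1]))    # date, author, state, lines
        entry["message"] = "\n".join(lines[2:])  # free text entered by the author
        entries.append(entry)
    return entries

revisions = parse_revisions(SAMPLE_LOG)
for rev in revisions:
    print(rev["revision"], rev["author"], rev["state"], rev.get("lines", "n/a"))
```

A log message that itself contains a long line of "-" characters would confuse this simple splitter, so a study on real repositories would need a more careful parser.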
SVN
• Apache Subversion, which is often abbreviated
as SVN, is a software versioning and
revision control system distributed under an
open source license.
• Subversion was created by CollabNet Inc. in
2000, but now it is developed as a project of
the Apache Software Foundation, and as such
is part of a rich community of developers and
users.
SVN Continued
• SVN hosts a large number of OSS systems, such as Tomcat
and other Apache projects.
• Being a CVCS, SVN has the capability to operate across
various networks, because of which people working on
different locations and devices can use SVN.
• Similar to other VCS, SVN also conceptualizes and
implements a version control database or repository in the
same manner.
• However, unlike a working copy, an SVN repository
can be considered an abstract entity, which
is accessed and operated upon almost exclusively
through tools and libraries, such as TortoiseSVN.
SVN Specifics
Revision numbers: Each revision of a project artifact stored in the SVN
repository is assigned a unique natural number, which is one more than the
number assigned to the previous revision.
• The initial revision of a newly created repository is typically assigned the
number “0,” indicating that it consists of nothing other than an empty
trunk or main directory.
• Unlike most of the VCS (including CVS), the revision numbers assigned by
SVN apply to the entire repository tree of a project, not the individual
project artifacts.
• Each revision number represents an entire tree, or a specific state of the
repository after a change is committed. In other words, revision “i” means
the state of the SVN repository after the “ith” commit.
• Because a revision number applies to the whole tree, two revisions of a
single file may be identical: even if only one file is changed, the revision
number of each and every artifact is incremented by one.
• Therefore, every artifact has the same revision number for a given version
of the entire project.
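The difference between per-artifact revision numbers (CVS) and repository-wide revision numbers (SVN) can be seen in a toy model. Both classes below are hypothetical simplifications that only count revisions; they store no file content and talk to no real repository.

```python
class CvsLikeRepo:
    """Every file carries its own revision number: 1.1, 1.2, ..."""

    def __init__(self):
        self.revisions = {}

    def commit(self, changed_files):
        for name in changed_files:
            if name not in self.revisions:
                self.revisions[name] = "1.1"  # first check-in of this file
            else:
                major, minor = self.revisions[name].split(".")
                self.revisions[name] = f"{major}.{int(minor) + 1}"

class SvnLikeRepo:
    """One revision number for the whole tree, starting at 0 (empty repository)."""

    def __init__(self):
        self.revision = 0
        self.files = set()

    def commit(self, changed_files):
        self.files.update(changed_files)
        self.revision += 1  # the i-th commit produces revision i
        return self.revision

cvs, svn = CvsLikeRepo(), SvnLikeRepo()
for change in (["a.c", "b.c"], ["a.c"], ["a.c"]):
    cvs.commit(change)
    svn.commit(change)

print(cvs.revisions)  # a.c and b.c end up with different numbers
print(svn.revision)   # every file in the tree is at the same revision
```

In the SVN-like model `b.c` is unchanged between revisions 1 and 3, which is exactly the case where two revisions of a single file are identical.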
SVN Specifics
Branching and merging:
• SVN fully provides the developers with various options to maintain parallel
branches of their project artifacts and directories.
• It permits them to create branches by simply replicating or copying their
data, and remembers that the copies which are created are related among
themselves.
• It also supports the duplication of changes from a given branch to another.
• SVN’s repository is specifically designed to support efficient branching.
• When we duplicate or copy any directory to create a branch, we need not
worry that the entire SVN repository will grow in size.
• SVN does not actually copy any data. It simply creates a new
directory entry, pointing to an existing tree in the repository.
• Owing to this mechanism, branches in SVN exist as normal directories.
• This is in contrast to many other VCS, where branches are typically
identified by specific “labels” or identifiers attached to the concerned
artifacts.
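The "new directory entry pointing to an existing tree" idea can be illustrated with plain dictionaries. This is a deliberately simplified model, not SVN's actual storage layer: `copy_dir` and `write_file` are hypothetical helpers, and a directory is just a dict from file name to content.

```python
def copy_dir(root, src, dst):
    """Create a branch: `dst` becomes a new entry that refers to the same tree as `src`."""
    new_root = dict(root)
    new_root[dst] = root[src]  # no file data is duplicated
    return new_root

def write_file(root, directory, name, content):
    """Change one file; only the directory being modified is copied."""
    new_tree = dict(root[directory])
    new_tree[name] = content
    new_root = dict(root)
    new_root[directory] = new_tree
    return new_root

r0 = {"trunk": {"main.c": "v1"}}
r1 = copy_dir(r0, "trunk", "branches/5.x")
print(r1["branches/5.x"] is r1["trunk"])  # the branch shares the trunk's tree

r2 = write_file(r1, "branches/5.x", "main.c", "v2")
print(r2["trunk"]["main.c"], r2["branches/5.x"]["main.c"])
```

The branch shows up as an ordinary directory entry, and the trunk stays untouched when the branch is modified because the changed directory is copied only at the moment of the write.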
SVN Specifics
Merging of different branches.
• SVN 1.5 introduced the feature of
merge tracking.
• Before this feature, a great deal of
manual effort and the application of external
tools were required to keep track of merges.
SVN Specifics
Version control data:
• For each artifact, which is under version control in the
repository, SVN also generates detailed version control
data and stores it to change log or simply log files.
• The recorded log information can be easily retrieved by
using an SVN client, such as TortoiseSVN, and also
by the “svn log” command.
• Moreover, we can also specify some additional
parameters so as to allow the retrieval of information
regarding a particular artifact or even the complete
project directory.
• Although SVN classifies changes to the files as
modified, added, or deleted, it does not directly provide
any other classification of the incurred changes,
such as classifying
changes as enhancements, bug fixes, and so on.
• Even though there is a “Bugzilla-ID” field, it is
optional and the developer committing the
change is not bound to specify it, even if the change
fixes a bug already reported in the Bugzilla
database.
SVN Commit Record
• Revision number: This field identifies the source code revision (main trunk,
branch) that has been modified.
• Actions: This field specifies the type of operation(s) performed with the file(s)
being changed in the current commit. Possible values include
“Modified” (if a file has been changed), “Deleted” (if a file has been
deleted), and “Added” (if a file has been added). A combination of these values is also
possible when multiple files are affected in the current commit.
• Author: This field identifies the person who did the check in.
• Date: Date and time of the check in, that is, permanently recording changes with
the SVN, are recorded in the date field.
• Bugzilla ID (optional): This field contains the ID of a bug (if the current commit
fixes a bug) that has also been reported in the Bugzilla database. If specified, then
this field may be used to link the two repositories: SVN and Bugzilla, together.
SVN Commit Record Example
We may obtain change logs from the SVN (through
version control data) and bug details from the Bugzilla.
• Modified: This field lists the source code files that were
modified in the current commit. In the above log file, the
file “mbeans-descriptors.dtd” was modified.
• Added: This field lists the source code files that were
added to the project in the current commit. In the above
log file, this field is not specified, indicating that no
files have been added.
• Message: The message field contains informal
data entered by the author during the check-in process.
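Linking the two repositories can be sketched as follows. The explicit Bugzilla ID is used when the commit record has one; otherwise the sketch falls back to searching the free-text message for references such as "Bug #4512". That fallback is a heuristic assumed here for illustration, not something SVN or Bugzilla provides, and the commit records below are hypothetical.

```python
import re

# Matches "bug 4512", "Bug #4512", "BUG#4512"; a heuristic, not an official format.
BUG_REFERENCE = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)

def bug_ids_in_message(message):
    """Return the bug IDs mentioned in a commit message, in ascending order."""
    return sorted({int(number) for number in BUG_REFERENCE.findall(message)})

def link_commits_to_bugs(commits):
    """Map each bug ID to the list of revisions that refer to it."""
    links = {}
    for commit in commits:
        explicit = commit.get("bugzilla_id")
        ids = [explicit] if explicit is not None else bug_ids_in_message(commit["message"])
        for bug_id in ids:
            links.setdefault(bug_id, []).append(commit["revision"])
    return links

# Hypothetical commit records, invented for illustration only.
commits = [
    {"revision": 101, "bugzilla_id": 4512, "message": "Fix NPE in descriptor loading"},
    {"revision": 102, "bugzilla_id": None, "message": "Follow-up for Bug #4512 and bug 4600"},
    {"revision": 103, "bugzilla_id": None, "message": "Update documentation"},
]
print(link_commits_to_bugs(commits))
```

Commits that mention no bug at all (revision 103 here) stay unlinked, which is one reason why the optional nature of the Bugzilla ID field limits how complete such a linkage can be.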
GIT
• Git is a version control system for tracking changes in
computer files. It helps in coordinating work amongst
several people in a project and tracks progress over
time. Unlike in centralized version control systems, Git
branches can be easily merged. A new branch is
created every time a developer wants to start working
on something. This helps ensure that the master branch
always has production-quality code.
• Git is a distributed version control system, so
every developer gets their own local repository with the full
commit history. Having the history locally makes Git fast, as
a network connection is not needed to create
commits or perform diffs between commits.
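Because the full commit history is stored locally, it can be mined without contacting a server. The sketch below parses the output of `git log` with a tab-separated format string (`%H` commit hash, `%an` author name, `%aI` author date in strict ISO 8601, `%s` subject). The sample lines are hypothetical, with shortened hashes; in a real repository the text would come from running `LOG_COMMAND`.

```python
from collections import Counter
from datetime import datetime

# Command whose output parse_log expects; %x09 inserts a tab between fields.
LOG_COMMAND = ["git", "log", "--pretty=format:%H%x09%an%x09%aI%x09%s"]

# Hypothetical output, for illustration only (real %H hashes have 40 characters).
SAMPLE_LOG = (
    "a1b2c3d\tAsha\t2024-01-15T10:30:00+05:30\tAdd parser\n"
    "e4f5a6b\tRavi\t2024-01-14T18:05:12+05:30\tFix typo in README\n"
    "c7d8e9f\tAsha\t2024-01-12T09:00:00+05:30\tInitial commit\n"
)

def parse_log(text):
    """Turn tab-separated `git log` lines into a list of commit dicts."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, author, date, subject = line.split("\t", 3)
        commits.append({
            "sha": sha,
            "author": author,
            "date": datetime.fromisoformat(date),
            "subject": subject,
        })
    return commits

def commits_per_author(commits):
    """Count how many commits each author made."""
    return Counter(commit["author"] for commit in commits)

history = parse_log(SAMPLE_LOG)
print(commits_per_author(history))
```

To run it on a real repository, the text could be obtained with `subprocess.run(LOG_COMMAND, capture_output=True, text=True, check=True).stdout` from inside the repository directory.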
What Is GitHub?

• GitHub is a Git repository hosting service that provides a
web-based graphical interface (GUI). It helps every team
member work together on a project from anywhere,
making it easy to collaborate.
• GitHub is one place where project managers and
developers coordinate, track, and update their work, so
projects stay transparent and on schedule. The packages
can be published privately, within the team, or publicly for
the open-source community. Downloading packages from
GitHub enables them to be used and reused. GitHub helps
all team members stay on the same page and stay
organized. Moderation tools, like issue and pull request
locking, help the team focus on the code.
How to use GIT?
• Git Installation on Windows
• Step 1:
– Download the latest version of Git and choose the
64/32 bit version. After the file is downloaded, install
it in the system. Once installed, select Launch the Git
Bash, then click on finish. The Git Bash is now
launched.
• Step 2:
– Check the Git version:
– $ git --version
How to use GIT?
• Step 3:
– For any help, use the following command:
– $ git help config

• Step 4:
– Create a local directory using the following command:
– $ mkdir test
– $ cd test

• Step 5:
– The next step is to initialize the directory:
– $ git init
How to use GIT?
• Step 6:
– Go to the folder where "test" is created and create a text
document named "demo." Open "demo" and put any content,
like "Hello Simplilearn." Save and close the file.
• Step 7:
– Enter the Git bash interface and type in the following command
to check the status:
– $ git status
• Step 8:
– Stage "demo.txt" for the next commit using the following command:
– $ git add demo.txt
• Step 9:
– Next, make a commit using the following command:
– $ git commit -m "committing a text file"
How to use GIT?
• Step 10:
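– Set the name and e-mail address that Git records in your commits; use the e-mail address of your GitHub account so that GitHub can link the commits to it:
– $ git config --global user.name "<your name>"
– $ git config --global user.email "<your e-mail>"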
• Step 11:
– Open your Github account and create a new repository
with the name "test_demo" and click on "Create
repository." This is the remote repository. Next, copy the
link of "test_demo."
• Step 12:
– Go back to Git bash and link the remote and local
repository using the following command:
– $ git remote add origin <link>
• Step 13:
– Push the local file onto the remote repository
using the following command:
– $ git push origin master
• Step 14:
– Move back to Github and click on "test_demo"
and check if the local file "demo.txt" is pushed to
this repository.
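The Git commands from the steps above can also be collected in a small script. This is a sketch only: `tutorial_commands` and `run_all` are hypothetical helpers, the remote URL stays the `<link>` placeholder from Step 12, and by default nothing is executed (a dry run just prints the commands).

```python
import subprocess

def tutorial_commands(filename, message, remote_url, branch="master"):
    """Return the Git commands of Steps 5 to 13 as argument lists, in order."""
    return [
        ["git", "init"],                                  # Step 5
        ["git", "status"],                                # Step 7
        ["git", "add", filename],                         # Step 8
        ["git", "commit", "-m", message],                 # Step 9
        ["git", "remote", "add", "origin", remote_url],   # Step 12
        ["git", "push", "origin", branch],                # Step 13
    ]

def run_all(commands, cwd=".", dry_run=True):
    """Print each command; execute it in `cwd` only when dry_run is False."""
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)

commands = tutorial_commands("demo.txt", "committing a text file", "<link>")
run_all(commands)  # dry run: prints the six commands without executing them
```

Passing `dry_run=False` would really run the commands, which requires Git to be installed, the file to exist in `cwd`, and `<link>` to be replaced by the actual repository link.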
GIT Features in Detail
• Study GIT features in Detail from the GIT
Document provided to you.(Attached with
this post)
Quiz
• Empirical Software Engineering is not based
upon
– Experience
– Intuition
– Evidence from Data
– Observation
• The potential benefit of MSR is
– Informed decision making
– Creating jobs
– Creating network of developers
– None of the above three
– Some of the above three
• Write three examples of metadata available in
a source control repository
Bug Tracking Systems
• A bug tracking system (also known as defect tracking
system) is a software system/ application that is built
with the intent of keeping a track record of various
defects, bugs, or issues in software development life
cycle. It is a type of issue tracking system.
• Bug tracking systems are commonly employed by a
large number of OSS systems and most of these
tracking systems allow the users to generate various
types of defect reports directly.
• Typical bug tracking systems are integrated with other
software project management tools and
methodologies.
Bug Information
The information about a bug typically includes the
following:
• The time when the bug was reported in the software
system
• Severity of the reported bug
• Behavior of the source program/module in which the
bug was encountered.
• Details on how to reproduce that bug
• Information about the person who reported that bug
• Developers who are possibly working to fix that bug, or
will be assigned the job to do so
Components of BTS
• A database is a crucial component of a bug
tracking system, which stores and maintains
information regarding the bugs reported by
the users and/or developers.
• Many bug tracking systems also track how the
status of a bug changes over time. The sequence
of states a bug passes through is known as the
bug life cycle.
Bug Life cycle
1. New: When a tester identifies a new defect, it enters the
‘New’ state, the first state of the bug life cycle. The tester
provides a defect document to the development team so that the
team can refer to it and fix the bug accordingly.
2. Assigned: Once a defect in the ‘New’ state is approved, it is
assigned to the development team for resolution, and its status
changes to ‘Assigned’.
3. Open: In the ‘Open’ state, the development team is working on
fixing the defect. If the team decides, for a specific reason, that
the defect is not appropriate, it is moved to either the ‘Rejected’
or the ‘Deferred’ state.
BLC
4. Fixed: After making the necessary code changes to fix the
identified bug, the developer team marks the state as ‘Fixed’.
5. Pending Retest: Once the fix is complete, the developer
team passes the new code to the testing team for retesting.
While the code/application is waiting to be retested on the
tester side, the status is ‘Pending Retest’.
6. Retest: At this stage, the tester retests the defect to
check whether the developer has fixed it, and the status is
marked as ‘Retest’.
BLC
7. Reopen: If, after retesting, the tester team finds that the
bug persists even after the developer team has fixed it, the
status of the bug is changed to ‘Reopen’. The bug then returns
to the ‘Open’ state and goes through the life cycle again, that
is, it goes back to the developer team for re-fixing.
8. Verified: If the tester retests the bug after the developer
team has fixed it and no longer finds the defect, the fix is
confirmed and the status is set to ‘Verified’.
9. Closed: This is the final state of the defect cycle. Once
testing confirms that the bug has been resolved and no longer
persists, the defect is marked as ‘Closed’.
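The nine states above behave like a small state machine. The sketch below is one possible reading of them, not the workflow of any specific tool: the ALLOWED table, the helper names advance and replay, and the choice to treat ‘Rejected’ and ‘Deferred’ as end states are all assumptions made for illustration.

```python
# Allowed transitions, read off the life-cycle description above.
# Treating 'Rejected' and 'Deferred' as end states is an assumption.
ALLOWED = {
    "New": {"Assigned"},
    "Assigned": {"Open"},
    "Open": {"Fixed", "Rejected", "Deferred"},
    "Fixed": {"Pending Retest"},
    "Pending Retest": {"Retest"},
    "Retest": {"Reopen", "Verified"},
    "Reopen": {"Open"},
    "Verified": {"Closed"},
    "Closed": set(),
    "Rejected": set(),
    "Deferred": set(),
}


class InvalidTransition(ValueError):
    """Raised when a status change skips or reverses the life cycle."""


def advance(current, new):
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current!r} -> {new!r} is not allowed")
    return new


def replay(history):
    """Check a whole status history and return the final status."""
    current = history[0]
    for nxt in history[1:]:
        current = advance(current, nxt)
    return current


# A bug that is fixed on the first attempt.
final_status = replay([
    "New", "Assigned", "Open", "Fixed",
    "Pending Retest", "Retest", "Verified", "Closed",
])
```

A history that contains ‘Reopen’ followed by ‘Open’ passes through the same table, which mirrors the loop described in item 7.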
Bug Severity and Bug Priority
• Severity is a parameter that denotes the
impact of a given defect on the
software.
• Priority is a parameter that decides
the order in which the defects should be fixed.
• Severity relates to the standards of quality.
• Priority relates to the scheduling of defect
fixes in the software.
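Because priority drives scheduling while severity measures impact, the two can disagree. The sketch below orders a small defect list by priority first and uses severity only to break ties. The numeric priority scale (1 is fixed first), the severity labels, and the sample defects are all hypothetical.

```python
# Hypothetical severity labels, ranked from most to least impact.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "trivial": 3}

# Hypothetical defects; priority 1 means "fix first".
defects = [
    {"id": 1, "severity": "critical", "priority": 2},
    {"id": 2, "severity": "minor", "priority": 1},   # low impact, but urgent
    {"id": 3, "severity": "major", "priority": 2},
    {"id": 4, "severity": "trivial", "priority": 3},
]


def fix_order(defects):
    """Sort by priority; among equal priorities, higher severity first."""
    return sorted(
        defects,
        key=lambda d: (d["priority"], SEVERITY_RANK[d["severity"]]),
    )


ordered_ids = [d["id"] for d in fix_order(defects)]
```

Defect 2 is scheduled before defect 1 even though its severity is lower, which is the distinction the bullets above draw.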
How is a BTS Used by Admins and Devs?
• Ideally, the administrators of a bug tracking system are
allowed to manipulate the bug information, such as
determining the possible values of bug status (and
hence the bug life cycle states), configuring the
permissions based on bug status, changing the status
of a bug, or even removing the bug information from the
database.
• Many systems also update the administrators and
developers associated with a bug through emails or
other means, whenever new information is added in
the database corresponding to the bug, or when the
status of the bug changes.
Advantages of BTS
• The primary advantage of a bug tracking system is
that it provides a clear, concise, and centralized
overview of the bugs reported in any phase of the
software development life cycle, and their state.
• The information provided is valuable for defining
the product road map and plan of action, or even
planning the next release of a software system.
• Bugzilla is one of the most widely used bug
tracking systems. Several open source projects,
including Mozilla, employ Bugzilla.
Mailing List Analysis
• Most open source developers communicate
through mailing lists.
• This style of communication makes mailing lists a
rich source of information which researchers can
use to understand software processes and
improve development practices.
• Mailing lists have been used to infer social
structure, identify architectural changes, and
also to study the code review process.
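As a rough illustration of how social structure might be inferred from a mailing list, the sketch below counts who replies to whom using the standard From, Message-ID, and In-Reply-To mail headers. The four sample messages, the addresses, and the function name reply_edges are invented for this example. A real study would read a full list archive and would need to handle aliases and missing headers.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

# Hypothetical messages standing in for a mailing list archive.
RAW_MESSAGES = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@example.org>\n"
    "Subject: Proposal: new parser\n\nWhat do you think?\n",
    "From: Bob <bob@example.org>\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\nSubject: Re: Proposal: new parser\n\nLooks good.\n",
    "From: Carol <carol@example.org>\nMessage-ID: <3@example.org>\n"
    "In-Reply-To: <1@example.org>\nSubject: Re: Proposal: new parser\n\nI have concerns.\n",
    "From: Alice <alice@example.org>\nMessage-ID: <4@example.org>\n"
    "In-Reply-To: <2@example.org>\nSubject: Re: Proposal: new parser\n\nThanks.\n",
]


def reply_edges(raw_messages):
    """Count (replier, original author) pairs from reply headers."""
    messages = [message_from_string(raw) for raw in raw_messages]
    # Map each message ID to the address of its author.
    author_of = {m["Message-ID"]: parseaddr(m["From"])[1] for m in messages}
    edges = Counter()
    for m in messages:
        parent = m["In-Reply-To"]  # None when the message starts a thread
        if parent in author_of:
            replier = parseaddr(m["From"])[1]
            edges[(replier, author_of[parent])] += 1
    return edges


edges = reply_edges(RAW_MESSAGES)
```

Each key in edges is a directed link between two participants, so the counter can be read as a weighted reply graph.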
Mailing List Analysis
• Developers use mailing lists to discuss a
variety of issues and project decisions.
• Many of these issues and decisions are related
to and affect the source code. These issues are
often driven by external factors such as the
introduction of new features in competing
products.
Role of Mailing Lists in OSS
Extracting Data from Software
Repositories
• The procedure for extracting data from software
repositories is depicted in the figure on the next slide.
• The figure shows the data-collection process of extracting
defect/change reports.
• The first step in the data-collection procedure is to extract
metrics using metrics-collection tools such as Understand
and Chidamber and Kemerer Java Metrics (CKJM).
• The second step involves collection of bug information to
the desired level of detail (file, method, or class) from the
defect report and source control repositories.
• Finally, the report containing the software metrics and the
defects extracted from the repositories is generated and
can be used by the researchers for further analysis.
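The final step, combining metrics and defects into one report, can be sketched at class level as follows. This is not the output format of Understand or CKJM: the metric values, the bug identifiers, and the function names are hypothetical, and the mapping from bugs to classes is assumed to have been recovered already from the defect reports and the source control repository.

```python
import csv
import io
from collections import Counter

# Step 1 (assumed already done): per-class metrics from a metrics tool.
# WMC and CBO are two of the Chidamber and Kemerer metrics; values are made up.
class_metrics = {
    "Parser": {"wmc": 12, "cbo": 5},
    "Lexer": {"wmc": 7, "cbo": 2},
    "Cache": {"wmc": 4, "cbo": 1},
}

# Step 2 (assumed already done): classes changed to fix each reported bug.
bug_fixes = {
    "BUG-1": ["Parser"],
    "BUG-2": ["Parser", "Lexer"],
}


def count_defects(bug_fixes):
    """Count, for each class, the number of bugs whose fix touched it."""
    counts = Counter()
    for classes in bug_fixes.values():
        counts.update(set(classes))  # count a class once per bug
    return counts


def build_report(class_metrics, defect_counts):
    """Step 3: one row per class with its metrics and defect count."""
    rows = []
    for name in sorted(class_metrics):
        row = {"class": name}
        row.update(class_metrics[name])
        row["defects"] = defect_counts.get(name, 0)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the report so it can be used for further analysis."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report_rows = build_report(class_metrics, count_defects(bug_fixes))
report_csv = to_csv(report_rows)
```

Classes with no recorded fix, such as Cache here, keep a defect count of zero so that they remain in the data set.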
Extracting Data from Software
Repositories
[Figure: data-collection process of extracting defect/change reports]