
Introduction To Data Repositories - Unlocked

Introduction to Data Repositories
Shweta Meena
Department of Software Engineering, Delhi Technological University

Which repositories are available for extracting software engineering data?
+ Software repositories can be mined to collect data that can be used to provide empirical results by validating various techniques or methods.
+ This evidence allows software researchers to establish well-formed and generalized theories.
+ By applying the information mined from these repositories, software engineering researchers and practitioners do not need to depend primarily on their intuition and experience, but more on field and historical data.

Empirical Research in Software Engineering: Concepts, Analysis, and Applications by Ruchika Malhotra, CRC Press, Taylor & Francis Group

What type of questions can be answered from data mined from software repositories?
+ Is design A better than design B?
+ Is process/method A better than process/method B?
+ What is the probability of occurrence of a defect or change in a module?
+ Is the effort estimation process accurate?
+ What is the time taken to correct a bug?
+ Is testing technique A better than testing technique B?

What are the various data collection sources?
+ Data can be collected from proprietary software, open source software (OSS), or university software.
+ Data collection from proprietary software is extremely difficult due to privacy concerns.
+ Data from programs developed by students is generally not considered, because its accuracy and applicability cannot be determined.
+ Data collection from university software is not recommended, because inexperienced programmers are involved and the results have limited applicability to real-life settings.

Configuration Management Systems
+ Configuration management systems are central to almost all software projects developed by organizations.
+ The aim of a configuration management system is to control and manage changes that occur in all the artifacts produced during the software development life cycle.
+ The artifacts (also known as deliverables) produced during the software development life cycle include the software requirement specification, software design document, source code listings, user manuals, and so on.

Configuration management system: Types of activities
[Figure: types of configuration management activities (label recovered: Configuration Accounting)]

Configuration Identification
+ Every software project artifact produced during the software development life cycle is uniquely named.
+ Release:
  + The first issue of a software artifact is called a release.
  + It usually provides most of the functionalities of a product, but may contain a large number of bugs and is therefore prone to fixes and enhancements.
+ Versions:
  + Significant changes made to the software project's artifacts are called versions.
  + Each version tends to enhance the functionalities of a product, or fix some critical bugs reported in the previous version.
  + New functionalities may or may not be added.
+ Editions:
  + Minor changes or revisions made to the software artifacts are termed editions.
  + As opposed to a version, an edition may not introduce significant enhancements or fix critical issues reported in the previous version.
  + Small fixes and patches are introduced.

Configuration Control
+ Configuration control is a critical process of versioning or configuration management activities.
+ Configuration control incorporates the approval, control, and implementation of changes to the software project artifact(s), or to the software project itself.
[Figure: change cycle]
+ Its primary purpose is to ensure that every change made to any software artifact is carried out with the knowledge and approval of the software project management team.
+ A change request includes important fields such as severity (impact of failure on software operation) and priority (speed with which the defect must be addressed).
+ The change control board (CCB) is responsible for the approval and tracking of changes.

Change request form
  Change Request ID:
  Type of Change Request: [ ] Enhancement  [ ] Defect Fixing  [ ] Other (Specify)
  Project:
  Requested By: Project team member name
  Brief Description of the Change Request: Description of the change being requested
  Date Submitted:
  Date Required:
  Priority: [ ] Low  [ ] Medium  [ ] High  [ ] Mandatory
  Severity: [ ] Trivial  [ ] Moderate  [ ] Serious  [ ] Critical
  Reason for Change: Description of why the change is being requested
  Estimated Cost of Change: Estimates for the cost of incurring the change
  Other Artifacts Impacted: List of other artifacts affected by this change
  Signature:

Change notice form
  Change Request ID:
  Type of Change Request: [ ] Defect Fixing
  Project:
  Module in which change is made:
  Change Implemented By: Project team member name
  Date and time of change implementation:
  Change Approved By: CCB member who approved the change
  Brief Description of the Change Request: Description of the change incurred
  Decision: [ ] Approved  [ ] Approved with Conditions
  Decision Date:
  Conditions: Conditions imposed by the CCB
  Approval Signature:

Configuration Control
+ The CCB carefully reviews every change before approval.
+ After the changes are successfully implemented and documented, they must be notified so that they are tracked and recorded in the software repository hosted on a version control system (VCS).
+ This repository is also known as the software library or archive; it holds all the official artifacts (documents and source code) maintained during the software development life cycle.
+ The changes are notified through a software change notice.
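
The change request form above can be modeled as a small record type. The following is a minimal sketch, assuming the priority and severity options read from the form; the numeric ranks, the field names, and the `review_order` helper are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class ChangeType(Enum):
    ENHANCEMENT = "Enhancement"
    DEFECT_FIXING = "Defect Fixing"
    OTHER = "Other"


class Priority(Enum):
    # Numeric ranks are an assumption used only for sorting.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    MANDATORY = 4


class Severity(Enum):
    # Numeric ranks are an assumption used only for sorting.
    TRIVIAL = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


@dataclass
class ChangeRequest:
    request_id: str
    change_type: ChangeType
    project: str
    requested_by: str
    description: str
    date_submitted: date
    priority: Priority
    severity: Severity
    reason: str = ""
    impacted_artifacts: list = field(default_factory=list)
    approved: bool = False  # decision recorded by the CCB


def review_order(requests):
    """Hypothetical helper: list pending requests, most urgent and severe first."""
    pending = [r for r in requests if not r.approved]
    return sorted(pending, key=lambda r: (-r.priority.value, -r.severity.value))
```

A CCB tool built on such a record could, for instance, present `review_order(all_requests)` as its review queue; the ordering rule itself is a design choice, not something the form prescribes.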

Configuration Accounting
+ Configuration accounting is the process responsible for keeping track of every activity, including changes, and any action that affects the configuration of a software product artifact, or of the software product itself.
+ Generally, all the data corresponding to every change is maintained in the VCS.
+ Configuration accounting also incorporates recording and reporting all the information required for versioning or configuration management of a software project.
+ This information includes the status of software artifacts under versioning control, metadata and other related information for the proposed changes, and the implementation status of the changes that were approved in the configuration control process.
+ A typical configuration status report includes:
  + A list of software artifacts under versioning. These comprise a baseline.
  + Version-wise dates on which the baseline of each version was established.
  + Specifications that describe each artifact under versioning.
  + History of changes incurred in the baseline.
  + Open change requests for a given artifact.
  + Deficiencies discovered by reviews and audits.
  + The status of approved changes.

Importance of Mining Software Repositories
+ Software repositories usually provide a vast array of varied and valuable information regarding software projects.
+ By applying the information mined from these repositories, software engineering researchers and practitioners do not need to depend primarily on their intuition and experience, but more on field and historical data.
+ A major reason why the value of the information in software engineering repositories goes unrecognized is perhaps the lack of effective mining techniques that can extract the right kind of information from these repositories in the right form.
+ Recognizing the need for effective mining techniques, software engineering practitioners have developed the mining software repositories (MSR) field.
+ The MSR field analyzes and cross-links the rich and valuable data stored in software repositories to discover interesting and applicable information about various software systems as well as projects.
+ MSR researchers aim to transform these repositories from static record-keeping stores into active ones that guide the decision-making process of modern software projects.
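
One common way to cross-link repositories is to look for defect identifiers in commit messages. The sketch below is an illustration under assumptions, not a method given in the slides: the regular expression, the `link_commits_to_bugs` name, and the input shape (pairs of commit ID and message) are all hypothetical, and real projects follow their own message conventions.

```python
import re
from collections import defaultdict

# Assumed convention: messages mention defects as "bug 123", "issue #45", "fixes #7", etc.
BUG_ID_PATTERN = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits):
    """Map each defect identifier to the commits whose message mentions it.

    `commits` is an iterable of (commit_id, message) pairs.
    """
    links = defaultdict(list)
    for commit_id, message in commits:
        # Deduplicate identifiers mentioned more than once in the same message.
        for bug_id in sorted(set(BUG_ID_PATTERN.findall(message)), key=int):
            links[int(bug_id)].append(commit_id)
    return dict(links)
```

The resulting mapping can then be joined with records from the bug repository, for example to count how many changes each module needed per defect.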

Data analysis procedure after mining software repositories
[Figure: data analysis procedure, with the following stages]
+ Mining repositories
  + Source code
  + Change records: timestamp, author, logs
  + Bug fixes: defect identifier, fixed by, date and time, found in (component/module), description, severity, priority
  + Web archives: mails, chats, messages
+ Preprocessed data (defects and changes)
+ Metrics
+ Learning techniques: statistical models, machine learning
+ Results: obtain, validate, analyze, interpret

Importance of Mining Software Repositories
+ After mining the relevant information from software repositories, data mining techniques can be applied and useful results can be obtained, analyzed, and interpreted.
+ These results guide practitioners in decision making. Hence, mining data from software repositories offers the following potential benefits:
  + Enhanced maintenance of the software system
  + Empirical validation of techniques and methods
  + Support for software reuse
  + Proper allocation of testing and maintenance resources

Commonly used Software Repositories
[Figure: commonly used software repositories]

Types of Software Repositories
[Figure: types of software repositories]

Historical repositories
+ Historical repositories record varied information regarding the evolution and progress of a software project.
+ They capture significant historical dependencies between various artifacts of a project, such as functions (in the source code), documentation files, or configuration files.
+ Historical repositories include:
  + Source control repositories
  + Bug repositories
  + Archived communications

Historical Repositories: Source Control Repositories
+ They record and maintain the development trail of a project and track every change made to any of the artifacts of a software system, such as the source code, documentation manuals, and so on.
+ They maintain metadata regarding each change, for instance, the developer or project member who carried out the change, the timestamp when the change was performed, and a short description of the change.
+ Examples: Git, CVS, Subversion (SVN), Perforce, and ClearCase.

Example Log File from Apache Log4j at Git
[Figure: Git log entry for an Apache Log4j commit, listing the commit hash, author, date, commit message, and the number of changed lines per file]
[Figure, continued: the log entry ends with the summary "27 files changed, 72 insertions(+), 85 deletions(-)"]

Historical Repositories: Bug Repositories
+ These repositories track and maintain the resolution history of defect/bug reports, which provide valuable information regarding the bugs that were reported by the users or developers of that project.
+ Examples: Bugzilla, JIRA.

Historical Repositories: Archived Communication Repositories
+ Discussions regarding the various aspects of a software project during its life cycle are recorded in the archived communications:
  + Mailing lists
  + Emails
  + Instant messages
  + Internet Relay Chat (IRC) chats

Run Time Repositories
+ Run-time repositories, also known as deployment logs, record information regarding the execution of a single deployment, or different deployments, of a software system.
+ For example, run-time repositories may record the error messages reported by a software application at various deployment sites.
+ They can be used to detect execution anomalies by discovering dominant execution or usage patterns across various deployments, and recording the deviations observed from such patterns.

Source Code Repositories
+ Source code repositories maintain the source code for a large number of OSS projects.
+ Examples: Sourceforge.net and Google Code.
+ They host the source code for a large number of OSS systems, such as Android OS, Apache Foundation projects, and many more.

Understanding Systems
+ Understanding large software systems remains a challenging process for most software organizations.
+ Most importantly, documentation manuals and files pertaining to large systems rarely exist, and even when they do, they are often out of date.
+ In addition, system experts are usually too preoccupied to guide novice developers, or may no longer be a part of the organization.
+ Evaluating the system characteristics and tracing its evolution history have thus become important techniques for gaining an understanding of the system.

System Characteristics
+ A software system may be analyzed by the following general characteristics, which can help decide whether data should be collected from that system and used in research-centric applications.
+ Programming language(s):
  + The computer language(s) in which a software system has been written and developed. Java remains the most popular programming language for many OSS systems, such as Apache projects, Android OS, and many more.
+ Number of source files:
  + This attribute gives the total number of source code files contained in a software system.
  + In some cases, this measure may be used to depict the complexity of a software system.
  + A system with a greater number of source files tends to be more complex than one with fewer source files.
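
As an illustration of how such a size characteristic might be collected, the sketch below walks a source tree and counts files with a given extension, along with their physical lines. The function name, the default `.java` extension, and the choice to count every physical line (blank lines and comments included) are assumptions; dedicated metrics tools apply more careful definitions.

```python
import os


def count_source_files(root, extensions=(".java",)):
    """Return (number of source files, total physical lines) under `root`.

    Every physical line is counted, including blank lines and comments,
    so the second value is only a rough size indicator.
    """
    file_count = 0
    line_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(extensions)):
                file_count += 1
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    line_count += sum(1 for _ in handle)
    return file_count, line_count
```

For a checked-out project, `count_source_files("path/to/checkout")` gives a quick first impression of system size before a full metrics tool is run; the path here is a placeholder.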
+ Number of lines of code (LOC):
  + This is an important size metric that indicates the total number of LOC of the system. Many software systems are classified on the basis of their LOC as small-, medium-, or large-scale systems.
  + This attribute also gives an indication of the complexity of a software system.
  + Generally, systems with larger size, that is, more LOC, tend to be more complex than those with smaller size.
+ Platform:
  + This attribute indicates the hardware and software environment (predominantly the software environment) that a particular software system requires in order to function.
  + For example, some software systems are meant to work only on Windows OS.
+ Company:
  + This attribute provides information about the organization that has developed, or contributed to the development of, a software system.
+ Versions and editions:
  + A software system is typically released in versions, with each version rolled out to incorporate some significant changes to the previous version of that software system.
  + Even for a given version, several editions may be released to incorporate some minor changes in the software system.
+ Application/domain:
  + A software system usually serves a fundamental purpose or application, along with some optional or secondary features.
  + Open source systems typically belong to one of these domains: graphics/media/3D, IDE, SDK, database, diagram/visualization, games, middleware, parsers/generators, programming language, testing, and general purpose tools that combine multiple such domains.

Version Control Systems
+ Version control systems (VCS), also known as source control systems or simply versioning systems, are systems that track and record changes made to a single artifact or a set of artifacts of a software system. Every change, no matter how big or small, is recorded over time so that specific revisions or versions of the system artifacts can be recalled later.
[Figure: successive versions of an artifact (Version 1, Version 2, Version 3)]

Basic Terminology used for VCS
+ Revision numbers:
  + A VCS distinguishes between the different versions of the software artifacts by number. These version numbers are usually called revision numbers and indicate the various versions of an artifact.
+ Release numbers:
  + With respect to software products, revision numbers are termed release numbers, and these indicate different releases of the software product.
+ Baseline or trunk:
  + A baseline is the approved version or revision of a software artifact from which changes can be made subsequently.
  + It is also called the trunk or master.
[Figure: baseline (original line of development) with Branch 1, Branch 2, and Branch 3 forking from it]
+ Tag:
  + Whenever a new version of a software product is released, a symbolic name, called the tag, is assigned to the revision numbers of the current software artifacts.
  + The tag indicates the release number.
  + In the header section of every tagged artifact, the mapping from tag (symbolic name) to revision number is stored.
+ Branch:
  + Branches are very common in a VCS, and a single branch indicates a self-maintained line of development.
  + A developer may create a copy of some project artifacts for his or her own use, and give an appropriate identification to the new line of development.
  + This new line of development, created from the originally stored software artifacts, is referred to as a branch. Multiple copies of a file may be created independently of each other.
  + Each branch is characterized by its branch number or identification.
+ Head:
  + The head (sometimes also called the "tip") refers to the commit that has been made most recently, either to a branch or to the trunk.
  + The trunk and every branch have their individual heads.
  + The term head is also sometimes used to refer to the trunk.

Functionalities provided by VCS
+ Revert project artifacts back to a previously recorded and maintained state.
+ Revert the entire software project back to a previously recorded state.
+ Review any change made over time to any of the project artifacts.
+ Retrieve metadata about any change, such as the developer or project member who last modified an artifact that might be causing a problem, and more.

Classification of VCS

Local VCS
+ A local VCS employs a simple database that records and maintains all the changes to the artifacts of the software project under revision control.
+ The Revision Control System (RCS) was a very popular local versioning system, and it is still used by many organizations as well as end users.
+ RCS operates by recording the patch sets (i.e., the differences between two revisions of an artifact) in a specific format on the user's system while moving from one revision to the next.
+ It can then recreate the image of a project artifact at any point in time by summing up all the maintained patches.
+ However, the user cannot collaborate with other users on other systems, as the database is local and not maintained centrally.
+ Each user has his or her own copy of the different revisions of project artifacts, and thus there are consistency and data sharing problems.
+ If a user loses the versioning data, recovering it is impossible unless a backup is maintained from time to time.

Centralized VCS (CVCS)
+ The main aim of a CVCS is to allow the user to easily collaborate with different users on other systems.
+ These systems, such as CVS, Perforce, and Subversion (SVN), employ a single centralized server that records and maintains all the versioned artifacts of a software project under revision control, and a number of clients or users check out (obtain) the project artifacts from that central server.
+ For several years, this has been the standard methodology followed in various organizations for version control.
+ However, if the central server fails, or the data stored at the central server is corrupted or lost, there is no chance of recovery unless periodic backups are maintained.
[Figure: centralized version control system, with client systems connected to a central server that holds the versioning database]

Distributed VCS
+ To overcome the limitations of CVCS, distributed VCS (DVCS) were introduced.
+ As opposed to a CVCS, a DVCS (such as Bazaar, Darcs, Git, and Mercurial) ensures that the clients or users do not just obtain or check out the latest revision or snapshot of the project artifacts, but clone, mirror, or download the entire software project repository to obtain the artifacts.
+ If any server of the DVCS fails, or its data is corrupted or lost, any of the software project repositories stored at a client machine can be uploaded to the server as a backup to restore it.
+ Therefore, every checkout carried out by a client is essentially a complete backup of the entire software project data.
+ Nowadays, DVCS have earned the attention of various organizations across the globe, and these organizations rely on them for maintaining their software project repositories.
+ Git is the most popular DVCS employed in practice and is used for a large number of software project repositories.
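
Because a Git clone carries the full history, change metadata (author, timestamp, short description) can be read locally. The sketch below parses text produced by `git log` with a tab-separated format string; the `Commit` record and `parse_git_log` function are illustrative names, and the sample line in the usage note is made up. The placeholders `%H`, `%an`, `%aI`, `%s`, and `%x09` are standard `git log` format specifiers.

```python
from dataclasses import dataclass
from datetime import datetime

# Pass this option to `git log` to get one tab-separated line per commit:
# full hash, author name, author date (strict ISO 8601), subject line.
GIT_LOG_FORMAT = "--pretty=format:%H%x09%an%x09%aI%x09%s"


@dataclass
class Commit:
    commit_id: str
    author: str
    timestamp: datetime
    summary: str


def parse_git_log(text):
    """Parse the output of `git log` run with GIT_LOG_FORMAT into Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most three times so tabs inside the subject are preserved.
        commit_id, author, when, summary = line.split("\t", 3)
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        if when.endswith("Z"):
            when = when[:-1] + "+00:00"
        commits.append(Commit(commit_id, author, datetime.fromisoformat(when), summary))
    return commits
```

For example, `parse_git_log("abc123\tAlice\t2020-01-02T03:04:05+00:00\tFix layout test")` yields one `Commit` whose `author` is `"Alice"`. In practice the text would come from running `git log` with `GIT_LOG_FORMAT` inside a cloned repository.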
+ Google and the Apache Software Foundation also employ Git to maintain the source code and change control data for their various projects, including the following:
  + Android OS (https://android.googlesource.com)
  + Chromium OS
  + Chrome browser (https://chromium.googlesource.com)
  + Open Office
  + log4j
  + PDFBox
  + Apache-Ant (https://apache.googlesource.com)
[Figure: distributed version control system, in which the server and each client hold the versioning database together with the revisions of the project artifacts]

Bug Tracking System
+ A bug tracking system (also known as a defect tracking system) is a software system/application built to keep a track record of the various defects, bugs, or issues in the software development life cycle.
+ It is a type of issue tracking system.
+ Bug tracking systems are commonly employed by a large number of OSS systems, and most of these tracking systems allow users to generate various types of defect reports directly.
+ Typical bug tracking systems are integrated with other software project management tools and methodologies. Some systems are also used internally by organizations.
+ A database is a crucial component of a bug tracking system; it stores and maintains information regarding the bugs reported by the users and/or developers.
+ These bugs are generally referred to as known bugs.
+ The information about a bug typically includes the following:
  + The time when the bug was reported in the software system
  + Severity of the reported bug
  + Behavior of the source program/module in which the bug was encountered
  + Details on how to reproduce that bug
  + Information about the person who reported that bug
  + Developers who are possibly working to fix that bug, or will be assigned the job
+ Many bug tracking systems also track the status of a bug, which gives rise to the concept of the bug life cycle.
+ Ideally, the administrators of a bug tracking system are allowed to manipulate the bug information: determining the possible values of bug status (and hence the bug life cycle states), configuring the permissions based on bug status, changing the status of a bug, or even removing the bug information from the database.
+ Many systems also update the administrators and developers associated with a bug through emails or other means whenever new information about the bug is added to the database, or when the status of the bug changes.
+ The primary advantage of a bug tracking system is that it provides a clear, concise, and centralized overview of the bugs reported in any phase of the software development life cycle, and of their state.
+ The information provided is valuable for defining the product road map and plan of action, or even planning the next release of a software system.
+ Bugzilla is one of the most widely used bug tracking systems.
+ Several open source projects, such as Mozilla, employ the Bugzilla repository.
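
The bug database described in this section can be sketched with an in-memory SQLite table. This is a minimal illustration, not the schema of Bugzilla or any real tracker: the column names follow the information listed above, while the status values (`NEW`, `ASSIGNED`, `FIXED`) and all function names are assumptions.

```python
import sqlite3


def create_bug_database():
    """Create an in-memory database with one table of known bugs."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        """CREATE TABLE bugs (
               bug_id INTEGER PRIMARY KEY,
               reported_at TEXT NOT NULL,
               severity TEXT NOT NULL,
               module TEXT NOT NULL,
               steps_to_reproduce TEXT,
               reporter TEXT NOT NULL,
               assignee TEXT,
               status TEXT NOT NULL DEFAULT 'NEW'
           )"""
    )
    return connection


def report_bug(connection, reported_at, severity, module, steps, reporter):
    """Insert a newly reported bug and return its identifier."""
    cursor = connection.execute(
        "INSERT INTO bugs (reported_at, severity, module, steps_to_reproduce, reporter) "
        "VALUES (?, ?, ?, ?, ?)",
        (reported_at, severity, module, steps, reporter),
    )
    return cursor.lastrowid


def change_status(connection, bug_id, new_status, assignee=None):
    """Move a bug to another life cycle state, optionally assigning a developer."""
    connection.execute(
        "UPDATE bugs SET status = ?, assignee = COALESCE(?, assignee) WHERE bug_id = ?",
        (new_status, assignee, bug_id),
    )


def open_bugs_by_severity(connection):
    """Count bugs that are not yet fixed, grouped by severity."""
    rows = connection.execute(
        "SELECT severity, COUNT(*) FROM bugs WHERE status != 'FIXED' GROUP BY severity"
    )
    return dict(rows.fetchall())
```

A summary such as `open_bugs_by_severity(connection)` is the kind of centralized overview that can feed release planning.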

Extracting Data from Software Repositories
+ 1. The first step in the data-collection procedure is to extract metrics using metrics-collection tools such as Understand and Chidamber and Kemerer Java Metrics (CKJM).
+ 2. The second step involves collecting bug information at the desired level of detail (file, method, or class) from the defect report and source control repositories.
+ 3. Finally, the report containing the software metrics and the defects extracted from the repositories is generated and can be used by researchers for further analysis.

Procedure to extract data from software repositories
+ The data is kept in software repositories of various types, such as CVS, Git, SVN, ClearCase, Perforce, Mercurial, Veracity, and Fossil.
+ These repositories are used for the management of software content and changes, including documents, programs, user documentation, and other related information.
[Figure: flowchart of the extraction procedure, with boxes for source control repositories, collecting software metrics using metrics calculator tools such as Understand and CKJM, and collecting defect/change data using a bug/change collection tool]

Concurrent Versions System (CVS)
+ CVS is a popular CVCS that hosts a large number of OSS systems.
+ CVS was developed with the primary goal of handling different revisions of various software project artifacts by storing the changes between two subsequent revisions of these artifacts in the repository.
+ Thus, CVS predominantly stores the change logs rather than the actual artifacts, such as binary files.
+ CVS can also store binary files, but they are not handled efficiently.
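
The idea of keeping the differences between subsequent revisions, rather than full copies, can be illustrated with Python's standard `difflib` module. This is a sketch of the principle only and does not reproduce the storage format CVS actually uses; `make_delta` and `restore_revision` are hypothetical names. Note that an `ndiff` delta also carries the unchanged lines, so it is not a space-efficient encoding.

```python
import difflib


def make_delta(old_text, new_text):
    """Record the line-level differences between two revisions of a text artifact."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(difflib.ndiff(old_lines, new_lines))


def restore_revision(delta, which):
    """Rebuild the older (which=1) or newer (which=2) revision from a delta."""
    return "".join(difflib.restore(delta, which))
```

Given `delta = make_delta(rev1, rev2)`, calling `restore_revision(delta, 1)` returns `rev1` and `restore_revision(delta, 2)` returns `rev2`, which is the sense in which a revision can be recreated from recorded changes.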
What are the various features provided by CVS?
+ Revision numbers:
  + Each new revision or version of a project artifact stored in the CVS repository is assigned a unique revision number by the VCS itself.
  + For example, the first version of a checked-in artifact is assigned the revision number 1.1. After the artifacts are modified (updated) and the changes are committed (permanently recorded) to the CVS repository, the revision number of each modified artifact is incremented by one (for example, from 1.1 to 1.2).
  + Because some artifacts are changed more often than others, the artifacts of a project generally do not share the same revision number.
  + The final release of a software project comprises all the artifacts under version control, each of which can have its own revision number.
+ Branching and merging:
  + The user can create his/her own branch for development, and view, modify, or delete a branch created by the user as well as other users, provided the user is authorized to access those branches in the repository.
  + To create a new branch, CVS chooses the first unused even integer, starting with 2, and appends it to the revision number of the artifact from which the branch is forked off, that is, the artifact the user who created the branch wishes to work on. For example, the first branch created at revision 1.2 of an artifact receives the branch number 1.2.2, but CVS internally stores it as 1.2.0.2.
  + However, the main issue with branches is that the detection of branch merges is not supported by CVS.
  + Consequently, CVS lacks mechanisms for tracking the evolution of typically large-sized software systems and of their particular products.
+ Drawback of CVS: it lacks appropriate mechanisms for linking detailed modification reports and classifying changes.
+ Version control data:
  + For each artifact under the repository's version control, CVS generates detailed version control data and saves it in a change log, or simply log file.
  + The recorded log information can be easily retrieved using the CVS log command. Additional parameters can be specified to retrieve the information for a particular artifact or even the complete project directory.

Example log file from Mozilla project at CVS
[Figure: excerpt of a CVS log file]
+ The example shows the versioning data for the source file "nsCSSFrameConstructor.cpp," which is taken from the Mozilla project.
+ The CVS change log file typically comprises several sections, and each section presents the version history of an artifact (a source file in the given example).
+ Different sections are always separated by a single line of "=" characters.
+ RCS file: This field contains the path information to identify an artifact in the repository.
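The section structure just described (one section per artifact, sections separated by a line of "=" characters, each starting with an "RCS file:" field) can be split apart with a few lines of code. This is a minimal sketch: the function names are hypothetical, the sample paths are invented for illustration, and the separator is assumed to be any line of ten or more "=" characters.

```python
def is_section_rule(line: str) -> bool:
    """True for a separator line made up only of '=' characters."""
    stripped = line.strip()
    return len(stripped) >= 10 and set(stripped) == {"="}


def split_sections(log_text: str) -> list:
    """Split the text of a CVS log into one list of lines per artifact."""
    sections, current = [], []
    for line in log_text.splitlines():
        if is_section_rule(line):
            if any(part.strip() for part in current):
                sections.append(current)
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        sections.append(current)
    return sections


def rcs_file(section: list):
    """Return the repository path from the 'RCS file:' field, if present."""
    for line in section:
        if line.startswith("RCS file:"):
            return line[len("RCS file:"):].strip()
    return None


# Invented sample log with two artifacts (hypothetical paths).
SAMPLE_LOG = """\
RCS file: /cvsroot/demo/src/a.c,v
head: 1.2
=============================================================================

RCS file: /cvsroot/demo/src/b.c,v
head: 1.1
=============================================================================
"""

paths = [rcs_file(section) for section in split_sections(SAMPLE_LOG)]
```

Splitting by artifact first keeps the later per-field parsing simple, because every section can then be handled independently.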
+ Locks and access list:
  + These are file content access and security options set by the developer when committing the file to CVS.
  + They may be used to prevent unauthorized modification of the file: users can download certain files but cannot commit protected or locked files to the CVS repository.
+ Symbolic names:
  + This field contains the revision numbers assigned to tag names.
  + The assignment of revision numbers to tag names is carried out individually for each artifact because the revision numbers might be different.
+ Description:
  + This field contains the modification reports that describe the change history of the artifact, beginning from the first commit until the current version.
  + Apart from the changes made in the head or main trunk, changes in all the branches are also recorded there. The revisions are separated by a line of "-" characters.
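The branch numbering rule from the CVS features slides (first unused even integer, starting with 2, stored internally with an extra 0, as in 1.2.0.2 for branch 1.2.2) matters when reading the symbolic names field, because branch tags can appear there in the internal form. A minimal sketch follows; the function names and the tag names in `symbolic_names` are hypothetical.

```python
def external_branch_number(stored: str):
    """Convert the internal form (e.g. 1.2.0.2) to the branch number (1.2.2).

    Returns None when `stored` is an ordinary revision number.
    """
    parts = stored.split(".")
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
        return ".".join(parts[:-2] + parts[-1:])
    return None


def next_branch_number(revision: str, existing: set) -> str:
    """Pick the first unused even integer, starting with 2, for a new branch."""
    number = 2
    while f"{revision}.{number}" in existing:
        number += 2
    return f"{revision}.{number}"


# Invented symbolic names of one artifact: tag name -> number shown in the log.
symbolic_names = {"RELEASE_1_0": "1.3", "FEATURE_BRANCH": "1.2.0.2"}

branches = {
    tag: external_branch_number(number)
    for tag, number in symbolic_names.items()
    if external_branch_number(number) is not None
}
```

Here `branches` keeps only the tags that denote branches, which is a common first step before assigning revisions to the trunk or to a branch.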
+ Revision number: This field identifies the revision of the source code artifact (main trunk, branch) that has been subject to change(s).
+ Date: This field records the date and time of the check-in.
+ Author: This field identifies the person who committed the change.
+ State: This field provides information about the state of the committed artifact and generally assumes one of these values: "Exp" (experimental) and "dead" (file has been removed).
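The per-revision fields described above (revision number, date, author, state, and the changed line counts) can be pulled out of one log section with regular expressions. The sketch below assumes the commonly seen layout of one "revision" line followed by one "date: ...; author: ...; state: ...;" line; real logs may carry extra fields, so the patterns, the function names, and the invented sample section are assumptions rather than a complete parser.

```python
import re

REVISION_RE = re.compile(r"^revision (\d+(?:\.\d+)+)")
META_RE = re.compile(
    r"^date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);"
    r"\s+state: (?P<state>[^;]+);"
    r"(?:\s+lines: \+(?P<added>\d+) -(?P<removed>\d+))?"
)


def is_rule(line: str, char: str) -> bool:
    """True for a separator line made up only of `char`."""
    stripped = line.strip()
    return len(stripped) >= 10 and set(stripped) == {char}


def parse_chunk(chunk: list):
    """Turn the lines between two separators into a dict, or None."""
    if len(chunk) < 2:
        return None
    revision = REVISION_RE.match(chunk[0])
    meta = META_RE.match(chunk[1])
    if not (revision and meta):
        return None  # e.g. the header part of the section
    entry = {"revision": revision.group(1)}
    entry.update(meta.groupdict())
    entry["message"] = "\n".join(chunk[2:]).strip()
    return entry


def parse_revisions(section_text: str) -> list:
    """Return one dict per revision entry found in a log section."""
    entries, chunk = [], []
    # The trailing rule is a sentinel that flushes the last chunk.
    for line in section_text.splitlines() + ["-" * 28]:
        if is_rule(line, "-") or is_rule(line, "="):
            entry = parse_chunk(chunk)
            if entry:
                entries.append(entry)
            chunk = []
        else:
            chunk.append(line)
    return entries


# Invented sample section (hypothetical path, authors, and messages).
SAMPLE_SECTION = """\
RCS file: /cvsroot/demo/src/a.c,v
head: 1.2
description:
----------------------------
revision 1.2
date: 2002/12/13 20:13:16;  author: alice;  state: Exp;  lines: +15 -47
Fix null check.
----------------------------
revision 1.1
date: 2002/12/01 09:00:00;  author: bob;  state: Exp;
Initial revision
=============================================================================
"""

entries = parse_revisions(SAMPLE_SECTION)
```

Each resulting dictionary can be written to a table row, which is the form usually needed when change data is combined with software metrics.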
[Figure: example Git commit log entry. Recoverable details: Date: Thu Jul 11 09:32:28 2013 +0900; Bug: 9767739; subject "fix read EF Image Instance"; diffstat: 2 files changed, 38 insertions(+), 120 deletions(-).]
+ Git also maintains integrity (no change can be made without the knowledge of Git) and, generally, Git only adds data.
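The integrity property can be made concrete with a small sketch. As background that goes beyond the slide, Git identifies stored content by a checksum of that content, so any modification yields a different identifier. The code below reproduces the identifier of a file ("blob") under Git's SHA-1 object format; the function name `git_blob_id` is hypothetical, and repositories created with the SHA-256 object format would produce different identifiers.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to file content (a blob)."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) and the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"int a;\n")
modified = git_blob_id(b"int b;\n")
# Different content gives a different id, so a silent change is detectable.
ids_differ = original != modified
```

Because the identifier is derived from the content itself, a stored object cannot be altered without its name changing, which is the sense in which no change can be made without the knowledge of Git.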
