Uploaded by shezrisa
Copyright © All Rights Reserved
Module_4

Identification of Different Version Control Systems
Module Objectives
• Identify how local version control works
• Discuss the nature and characteristics of CVCS
• Discuss the nature and characteristics of DVCS
• Identify the workspace
• Understand merging of source code
• Understand scaling in version control systems
• Compare CVCS and DVCS
• Identify the multiple repositories model
Identify the types of repositories:
• Local repositories reside on the computers of team members. In contrast, remote repositories are hosted on a server that is accessible to all team members, most likely on the internet or on a local network.
• Local version control keeps track of files within the local system.
• Local version control systems save a series of patches, but collaboration or branching is almost impossible with local repositories.
• We now look at local repositories from a DVCS viewpoint. Here, a local repository is a collection of files that originate from a certain version of the repository. This collection of files is called the working tree, the working copy, or the checkout.
• There are two kinds of repositories in a DVCS:
• Local repository: a copy of the central repository that is available on the local computer
• Remote repository: the repository available on the central server
• The primary advantage of a DVCS like Git is that the local repository mirrors the central repository. Developers make the necessary changes in the local repository.
• From the working directory, changes are added to the staging area and then committed to the local repository.
• From the local repository, changes are then pushed to the remote repository.
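The flow from working directory to staging area to local repository to remote repository can be sketched as a small in-memory Python model. This is a hypothetical toy: the `ToyLocalRepo` class and its methods are illustrative names, not Git's API, and the remote is just a Python list. It only shows which step moves a change where.

```python
# Toy model of the flow: working tree -> staging area -> local repo -> remote.
# ToyLocalRepo and its methods are hypothetical names, not Git's API.


class ToyLocalRepo:
    def __init__(self):
        self.working_tree = {}  # file name -> content currently on disk
        self.staging = {}       # files marked for the next commit
        self.commits = []       # local history: one full snapshot per commit
        self.pushed = 0         # number of local commits the remote already has

    def edit(self, name, content):
        # Change a file in the working tree only.
        self.working_tree[name] = content

    def add(self, name):
        # Stage the current working-tree version of one file.
        self.staging[name] = self.working_tree[name]

    def commit(self, message):
        # Record staged files on top of the previous snapshot, locally only.
        snapshot = dict(self.commits[-1]["files"]) if self.commits else {}
        snapshot.update(self.staging)
        self.commits.append({"message": message, "files": snapshot})
        self.staging = {}

    def push(self, remote):
        # Send every local commit the remote does not have yet.
        new = self.commits[self.pushed:]
        remote.extend(new)
        self.pushed = len(self.commits)
        return len(new)


remote = []
repo = ToyLocalRepo()
repo.edit("app.py", "print('v1')")
repo.add("app.py")
repo.commit("first version")
print(len(remote))        # 0: the commit exists only in the local repository
print(repo.push(remote))  # 1
```

The point of the sketch is that `commit` never touches `remote`; only `push` does, which mirrors the two separate steps described above.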
Centralized Version Control System
• In a centralized version control system (CVCS), a server acts as the main repository, which stores every version of the code.
• With centralized source control, every user commits directly to the main branch, so this type of version control often works well for small teams: team members can communicate quickly to ensure that no two developers work on the same piece of code at the same time.
• Strong communication and collaboration are important to ensure a centralized workflow is successful.
• Centralized version control systems are based on the idea that there is a single “central” copy of your project somewhere (probably on a server), and programmers “commit” their changes to this central copy. “Committing” a change simply means recording the change in the central system.
• In centralized source control, there is a server and a client.
• The server is the master repository that contains all of the versions of the code. To work on any project, the user (client) first needs to get the code from the master repository or server.
• So the client communicates with the server and pulls all the code, or the current version of the code, from the server to their local machine. In other words, you need to take an update from the master repository, and then you get a local copy of the code on your system.
• Once you have the latest version of the code, you start making your own changes, and after that, you simply commit those changes straight into the master repository.
• Committing a change simply means merging your own code into the master repository, creating a new version of the source code. So everything is centralized in this model.
• There is just one repository, and it contains the entire history (all the versions) of the code and its different branches. So the basic workflow involved in centralized source control is getting the latest version of the code from a central repository that contains other people’s code as well, making your own changes to the code, and then committing or merging those changes into the central repository.
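The centralized workflow above (update, change, commit to the one central copy) can be sketched as a toy Python model in which only the server holds history. The `CentralServer` class is hypothetical, and the rule that a commit is rejected unless it starts from the latest revision is a simplifying assumption; tools such as Subversion only check whether the files being changed are out of date.

```python
# Toy model of a centralized workflow: one server holds the only history.
# CentralServer is hypothetical; real CVCS tools differ in detail.


class CentralServer:
    def __init__(self):
        self.revisions = [{}]  # revision 0 is an empty project

    @property
    def latest(self):
        return len(self.revisions) - 1

    def update(self):
        # "Take an update": latest revision number plus a copy of its files.
        return self.latest, dict(self.revisions[-1])

    def commit(self, base_revision, files):
        # Simplifying assumption: accept the change only if the client started
        # from the latest revision; otherwise it must update (and merge) first.
        if base_revision != self.latest:
            raise RuntimeError("out of date: update before committing")
        self.revisions.append(dict(files))
        return self.latest


server = CentralServer()
rev_a, copy_a = server.update()  # developer A gets revision 0
rev_b, copy_b = server.update()  # developer B also gets revision 0

copy_a["main.c"] = "int main(void) { return 0; }"
print(server.commit(rev_a, copy_a))  # 1: A's change becomes revision 1

copy_b["README"] = "hello"
try:
    server.commit(rev_b, copy_b)     # B is still based on revision 0
except RuntimeError as err:
    print(err)                       # out of date: update before committing
```

Every step in this sketch talks to `server`; there is no local history, which is why an unreachable server stops all commits in this model.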
CVCS: What are the advantages of a centralized version control system?
Works well with binary files
• Binary files, such as graphic assets and other media, require a large amount of space, so software developers turn to centralized version control systems to store this data. With a centralized server, teams can pull a few lines of code without saving the entire history on their local machine. Users of distributed systems have to download the entire project, which takes up time and space and prevents them from doing diffs. If a team works with binary files regularly, a centralized system offers the most efficient approach to code development.
Offers full visibility
• With a centralized location, every team member has full visibility into what code is currently worked on and what changes
are made. This knowledge helps software development teams understand the state of a project and provides a foundation
for collaboration, since developers share work on the central server. A centralized version control system only has two data
repositories that users have to monitor: the local copy and the central server. Distributed version control systems, like Git,
use multiple repositories, which can decrease insight into projects.
Decreases the learning curve
• Centralized version control is easy to understand and use, so developers of any skill level can push changes and start
contributing to the code quickly. Setting up the system and the workflow is also simple and doesn’t require a significant
amount of time investment to establish how the software development team should use the tool. When developers can
navigate a workflow quickly and easily, they’re able to focus on feature development rather than memorizing a series of
complicated steps to merge versioned changes. Decreasing the learning curve also helps new developers make an impact
as soon as possible.
• Access control can be granted at the directory level.
What are the disadvantages of a centralized version control system?
• A single point of failure risks data
• The biggest disadvantage is the single point of failure embedded within the centralized server. If the remote server
goes down, then no one can work on the code or push changes. The lack of offline access means that any disruption
can significantly impact code development and even result in code loss. The entire project and team comes to a
standstill during an outage. In the event of hard disk corruption, software development teams must rely on backups
to retrieve the running history of a project. If backups haven’t been kept properly, then the team loses everything.
When storing all versions on a central server, teams risk losing their source code at any time. Only the snapshots on
local machines are retrievable, but that is a small amount of code compared to the entire history of a project.
• Unlike a centralized VCS, a distributed version control system enables every user to have a local copy of the running history on their machine, so if there’s an outage, every local copy becomes a backup copy and team members can continue development offline.
• Slow speed delays development
• Centralized version control system users often have a difficult time branching quickly, because users must
communicate with the remote server for every command, which slows down code development. Branching becomes
a time-consuming task and allows merge conflicts to appear, because developers can’t push their changes to the
repository fast enough for others to view. If team members have slow network connections, the code development
process becomes even more tedious when trying to connect with the remote server.
• The speed at which software development teams operate has a direct impact on how quickly they can ship features and
deliver business value. If teams are slow to develop, iteration and innovation stall and developers can become frustrated
with how long it takes to see their changes in the application. Missed releases are possible if the remote server or
networks are down, and team members won’t be able to make up for lost time and quickly push changes.
• Few stable moments to push changes
• A centralized workflow is easy for small teams to utilize, but there are limitations when larger teams try to
collaborate. When multiple developers want to work on the same piece of code, it becomes difficult to find a stable
moment to push changes. Unstable changes cannot be pushed to the main repository so developers have to keep
them local until they’re ready for release.
• When users delay pushing changes, software development projects can be delayed and merge conflicts can arise, because the rest of the team doesn’t have visibility into changes that exist only on a user’s machine. Once changes are finally pushed to the central repository (after dealing with stability and speed issues), users have to resolve conflicts quickly when merging to ensure the rest of the team can contribute to the code. This lack of stability is what leads many teams to migrate to different version control systems, such as Git.
Distributed Version Control System :
• A distributed version control system (DVCS), also known as distributed revision control, is a type of version control in which the complete codebase, including its full version history, is mirrored on every developer's computer.[1]
• Changes to files are tracked between computers, for example, between my workstation and yours. In the beginning, this required specific coordination strategies to maintain consistency in projects, so that all the developers could keep track of what was happening to files at any given time.
• Compared to centralized version control, mirroring the full history enables automatic management of branching and merging, speeds up most operations (except pushing and pulling), improves the ability to work offline, and does not rely on a single location for backups.
• What is true about distributed source control systems?
• Here is what many cite as distributed source control system advantages compared to other systems like centralized
version control:
• Branching and merging can happen automatically and quickly
• Developers have the ability to work offline
• Multiple copies of the software eliminate reliance on a single backup
• Another cited benefit is the increase in developer productivity. Because all the code is on your own workstation, it
makes common activities quick: check in, check out, and commit. This was vital because back in the “olden days,”
server access, and even workstations, were slower than today.
• These systems work on a peer-to-peer model: the code base is distributed amongst the individual developers’ computers.
In fact, the entire history of the code is mirrored on each system.
• There is still a master copy of the code base, but it’s kept on a client machine rather than a server. There is no locking of
parts of the code; developers make changes in their local copy and then, once they’re ready to integrate their changes into
the master copy, they issue a request to the owner of the master copy to merge their changes into the master copy.
• With a DVCS, the emphasis switches from versions to changes, so a new version of the code is simply a combination of a number of different sets of changes. That’s quite a fundamental change in the way many developers work, which is why DVCSs are sometimes considered harder to understand than centralized systems.
The Evolution of Distributed Version Control Systems
• The nature of software development activities has changed a lot since 2005. That was when distributed version control
began its climb in popularity. The original Linux hackers drove the need for a new, free version control system to support
their work on the kernel. It was great because developers were working mostly from home and contributing to the same
project.
• Today, there are far more challenging requirements. An enterprise has numerous projects going at the same time, each
potentially made up of thousands of files. It is an explosion of code. Projects often include code and non-code assets,
artifacts, containers, and even graphics, audio, movies, and other binaries. Distributed version control systems just can’t
keep up with the demands.
• Speed and Security Shortfalls
• Let’s start with performance. It is no longer a handful of developers working from home on a project. Now, it can be
hundreds or thousands of contributors. For example, Android OS code consists of upwards of 1,100 repositories.
Continuous Integration (CI) builds can take forever. And the result — developers spending time waiting for pass/fail
results — is costly.
• Security is now a far greater concern than it used to be. Developers downloading an entire code base onto a laptop is
problematic even without the risk of attacks, which are increasingly prevalent. From a security standpoint, you want
to follow the principle of least privilege with all your systems, and this includes the version control system. This is
cumbersome or impossible to do when the unit of granularity is an entire repo.
• Redundant Backup Plans
• When it comes to backups, the redundancy of having your code on individual workstations isn’t going to help today.
What you need is a single source of truth that is backed up in a way to ensure that you can maintain developer
productivity without interruption. That doesn’t mean resuming work off someone else’s workstation copy of the
software. It means defining a recovery point objective (RPO) and a recovery time objective (RTO) in your business
continuity and disaster recovery (BC/DR) plan.
Advantages Over Centralized Version Control
• The act of cloning an entire repository gives distributed version control tools several advantages over centralized systems:
• Actions other than pushing and pulling changesets are extremely fast, because the tool only needs to access the local hard drive, not a remote server.
• New changesets can be committed to the local repository without anyone else seeing them. Once a group of changesets is ready, all of them can be pushed to the central repository in a single shot.
• Everything except pushing and pulling can be done without an internet connection. So you can work on a plane, and you won’t be forced to commit several bug fixes as one big changeset.
• Since each programmer has a full copy of the project repository, they can share changes with one or two other people at a time if they want to get some feedback before showing the changes to everyone.
• Because every developer has a complete copy of the entire repository, the impact of any change can be checked locally before the code is pushed to the central repository.
• DVCS is built to handle changes efficiently, since every change has a globally unique identifier (GUID) that makes it easy to track.
• Tasks like branching and merging can be done with ease, since every developer effectively has their own branch and every shared change is like reverse integration.
• DVCS is very easy to manage compared to CVCS.
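Two of these points, offline local commits that are later pushed as one batch and a unique identifier for every change, can be illustrated with a toy Python model. The identifiers here are SHA-1 hashes of the parent ID, author, and change text; this is loosely in the spirit of Git's content hashes but is not Git's actual object format, and `ToyClone` and `change_id` are hypothetical names.

```python
import hashlib


def change_id(parent_id, author, diff):
    # Hypothetical ID scheme: hash the parent ID, author, and change together,
    # so the same change on a different parent gets a different ID.
    data = f"{parent_id}\n{author}\n{diff}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


class ToyClone:
    def __init__(self, history):
        self.history = list(history)  # full local copy of (id, diff) pairs

    def commit(self, author, diff):
        # Works offline: only the local history is touched.
        parent = self.history[-1][0] if self.history else ""
        cid = change_id(parent, author, diff)
        self.history.append((cid, diff))
        return cid

    def push(self, remote_history):
        # Send every local changeset the remote lacks, in one batch.
        # Assumes nobody else pushed to the remote in the meantime.
        known = {cid for cid, _ in remote_history}
        batch = [c for c in self.history if c[0] not in known]
        remote_history.extend(batch)
        return len(batch)


central = []
laptop = ToyClone(central)
laptop.commit("dev", "fix bug 1")  # no network needed
laptop.commit("dev", "fix bug 2")  # kept as a separate changeset
print(laptop.push(central))        # 2: both changesets pushed together
```

Because the IDs are derived from content rather than handed out by a server, two clones can create changesets independently without coordinating on revision numbers.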
Disadvantages Compared to Centralized Version Control
• In many projects, large binary files that are difficult to compress occupy more space.
• Projects with a long history, i.e., a large number of changesets, may take a long time to clone and occupy more disk space.
• With DVCS, a backup is still needed, since the latest updated version may not be available to all the developers.
• Though DVCS doesn’t prevent having a central server, not having one might cause confusion in identifying the right recent version.
• Since every repo has its own revision numbers, releases have to be tagged with appropriate names to avoid confusion.
Distributed Version Control (DVCS) vs Centralized Version Control (CVCS)
• DVCS
• DVCS focuses on sharing changes; every change has a guid or unique id.
• Every developer has one local copy of the source code repository, in addition to the central source code repository.
• Distributed systems have no forced structure. You can create “centrally administered” locations or keep everyone as
peers.
• DVCS enables working offline. Apart from push and pull actions, everything is done locally.
• CVCS
• CVCS focuses on synchronizing, tracking, and backing up files.
• CVCS works based on a client-server relationship, with the source repository located on one single server, providing
access to developers across the globe.
• In a centralized system, recording/downloading and applying a change happen together; in a DVCS, they are separate steps.
• CVCS relies on internet connectivity for access to the server.
Private Workspace
• A DVCS offers each developer a private copy of the complete repository.
• By giving developers a private copy of the entire repository, a DVCS opens up much more flexibility for the kinds of things they can do in their private workspace.
• With a DVCS, developers can commit as often as they want to.
• This option proves advantageous for a solitary developer, who never has to worry about coordinating with others and managing the maintenance overhead.
• With a private workspace, one can regularly commit incomplete functionality to the local repository as a checkpoint without affecting other users.
• Almost all version control tools offer a private workspace. In a CVCS, developers get a working copy of the files, which acts as the private space. With a DVCS, developers get the complete repository as a private copy, which is the most important point to note about DVCS.
• This private workspace provides an added advantage in the sense that developers never have to think about coordinating with others during development.
• When there are multiple developers in a team, the situation becomes complex. Normally, the version control system takes on the responsibility of managing these complexities.
• With the private workspace in a DVCS, a developer gets a feel that he/she is working alone on the project, for at least a while.
• Developers have the flexibility to do anything within their private workspace without affecting the workflow of other developers.
Easier Merging
• Branching is generally easy to do, but merging is not. Branching is like two people going off in their own directions and not collaborating.
• People using a CVCS tend to avoid branching because most of those centralized tools aren't very good at merging. When they switch to a DVCS, they tend to bring that attitude with them, even though it's not really necessary anymore. Decentralized tools are much better at merging.
Reasons:
• They're built on a Directed Acyclic Graph (DAG). Merge algorithms need good information about history and common
ancestors. A DAG is a better way to represent that kind of information than the techniques used by most centralized tools.
• They keep the developer's intended changes distinct from the merge she had to do in order to get those changes
committed. This approach is less error-prone at commit time, since the developer's changes are already cleanly tucked
away in an immutable changeset. The only thing that needs to be done is the merge itself, so it gets all the attention it
needs. Later, when tracking down a problem, it is easy to figure out if the problem happened during the intended changes
or the merge, since those two things are distinct in the history.
• They deal with whole-tree branches, not directory branches. The path names in the tree are independent of the branch.
This improves interoperability with other tooling.
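The first two reasons can be made concrete with a toy Python sketch: finding the common ancestor (merge base) of two branches in a commit DAG, then doing a three-way merge against it. The DAG, commit names, and the line-by-line merge of equal-length files are simplifying assumptions made for illustration; real tools merge diff hunks and handle inserted and deleted lines.

```python
# Toy commit DAG: each commit maps to its list of parent commits.
# Commit names and file contents are made up for illustration.


def ancestors(dag, commit):
    # All commits reachable from `commit`, including itself.
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(dag[c])
    return seen


def merge_base(dag, a, b):
    # Best common ancestors: common ancestors that are not themselves
    # ancestors of another common ancestor (usually exactly one).
    common = ancestors(dag, a) & ancestors(dag, b)
    return {c for c in common
            if not any(c != o and c in ancestors(dag, o) for o in common)}


def three_way(base, ours, theirs):
    # Slot-by-slot three-way merge of equal-length line lists.
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)  # same change on both sides, or only we changed it
        elif o == b:
            merged.append(t)  # only they changed it
        else:
            merged.append(f"<<<{o}|||{t}>>>")  # both changed it: conflict
    return merged


dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(merge_base(dag, "C", "D"))  # {'B'}

base = ["x = 1", "y = 2"]
ours = ["x = 10", "y = 2"]
theirs = ["x = 1", "y = 20"]
print(three_way(base, ours, theirs))  # ['x = 10', 'y = 20']
```

Knowing the common ancestor is what lets the merge tell "only one side changed this line" apart from "both sides changed it", which is the history information the DAG provides.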
Easy to Scale Horizontally
• A DVCS has much more modest hardware requirements for the central server.
• Users don’t interact with the server unless they need to push or pull.
• All the heavy lifting happens on the client side, so the server hardware can be very simple indeed.
• With a DVCS, it is also possible to scale the central server by turning it into a server farm.
• Instead of one large server machine, you can add capacity by adding more small server machines, using scripts to keep them all in sync with each other.
• With a CVCS, by contrast, the server holding the central repository needs to be powerful enough to serve the needs of the entire team. For a team of 10 people, this is not an issue. For larger teams, the hardware limitations of the server can become a performance bottleneck. Some systems expect the server to do a lot of work, and it can be challenging and expensive to set up a server that supports thousands of users.
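Keeping a farm of small mirror servers in sync can be sketched as follows, assuming each server stores immutable commits keyed by ID. Under that assumption, syncing reduces to a set union; the `sync` function and the server names are hypothetical, and a real setup would also need to reconcile branch pointers and handle concurrent pushes.

```python
# Toy sketch: keep several small mirror servers in sync.
# Assumes commits are immutable and keyed by ID, so syncing is a set union.
# Server names and the sync function are hypothetical.


def sync(servers):
    # Collect every commit any server has, then copy it to all servers.
    union = {}
    for commits in servers.values():
        union.update(commits)
    for commits in servers.values():
        commits.update(union)


farm = {
    "mirror-1": {"a1": "initial import"},
    "mirror-2": {"a1": "initial import"},
    "mirror-3": {"a1": "initial import"},
}
farm["mirror-2"]["b7"] = "fix login bug"  # a developer pushed to mirror-2
sync(farm)
print(sorted(farm["mirror-3"]))           # ['a1', 'b7']
```

Since no mirror ever needs to rewrite a commit it already has, adding capacity means adding another dictionary to `farm` in this model rather than buying a bigger machine.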
Multi-Repositories Model
• With DVCS, the multiple repositories model can be approached in two ways.
• By understanding how a DVCS works
• The flexibility offered by DVCS of having different repositories for different services within the same organization
• As we know, a DVCS has multiple repositories: one central repository and multiple local repositories. Although it is widely claimed that a DVCS has no single authoritative central repository, any repository that has the complete, up-to-date revision history can always be labeled as the central repository.
• From the central repository, any number of local repositories can be cloned and kept on the developers’ local computers. These local repositories are exact replicas of the central repository, with the entire revision history.
• Developers make changes in their local repository, and once the changes are done, there is room to test whether they impact the rest of the code.
• Once tested, changes from the local repository are merged into the central repository. One of the major advantages of this approach is that there is no single point of failure at any point in time. Even if the central repository is broken, developers can continue to commit changes to their local repository and push them to the central repository once it is fixed.
• Since every local repository mirrors the central repository, any of them can be made the central repository if irreparable damage happens to the original. Let's now look at how a DVCS like Git allows multiple repositories within the same organization.
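Because every clone carries the full revision history, promoting a clone to be the new central repository can be sketched in a few lines of Python. Histories are modeled as plain lists of commit messages, an assumption made purely for illustration; the repository names are hypothetical.

```python
import copy

# Toy sketch: every clone has the full history, so any clone can replace a
# lost central repository. Names and data are hypothetical.

central = {"history": ["init", "add login", "fix typo"]}
clone_a = copy.deepcopy(central)  # developer A's full clone
clone_b = copy.deepcopy(central)  # developer B's full clone

central = None                    # the central server's disk fails

clone_a["history"].append("add search")  # A keeps committing locally

# The team promotes A's clone to be the new central repository ...
central = clone_a
# ... and B catches up from it (B has no extra local commits in this example).
missing = central["history"][len(clone_b["history"]):]
clone_b["history"].extend(missing)
print(clone_b["history"][-1])     # add search
```

No history is lost here because `clone_a` already held everything the failed central repository had, plus A's newer local commit.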
• In the mono-repository model, an organization's source code is placed in a single repository, and developers have access to all the code in a single shot.
• The second approach, the multiple repository model, refers to organizing a project into separate repositories. With a DVCS like Git, there is always the option of having different repositories for different services in the same project, which makes it possible to give access to subsets of repositories on a need basis.
• Products with multiple services, in particular, can be handled efficiently by having a separate repository for each service.
• Especially when developers work from different locations across the globe, multiple repositories give us the freedom to share only those repositories that a developer needs at that point in time.
• Setting up continuous deployment becomes easier to handle, since each repository can have its own process for getting deployed.
• Having multiple repositories has its own disadvantages, especially when there are many of them. Managing the repositories becomes a tedious task, and leveraging the full advantage of DVCS may become difficult. With multiple repositories, developers may find it cumbersome to git clone a large number of different repositories at a time.
Resetting the local environment
• The reset command moves both HEAD and the current branch ref to a specific commit.
• Reset is one of the operations used to undo changes made to the local repository.
• We can understand this with the help of the reset options in Git.
• There are three basic modes of invoking the reset operation, selected with three arguments:
• --soft - Does not touch the index file or the working tree at all, but resets HEAD to <commit>, just as all the modes do. This leaves all your changed files as "Changes to be committed", as git status would put it.
• --mixed - Resets the index but not the working tree (the changed files are preserved but not marked for commit) and reports what has not been updated. This is the default action. If -N is specified, removed paths are marked as intent-to-add.
• --hard - Resets both the index and the working tree. Any changes to tracked files in the working tree since <commit> are discarded, so uncommitted changes are wiped out. Use this option with caution, as the discarded changes cannot be recovered.
• These arguments correspond to Git's three state-management mechanisms, namely the Commit Tree (HEAD), the Staging Index, and the Working Directory.
• The reset operation behaves in a way similar to the checkout operation. git reset moves both HEAD and the branch ref to a specific commit and modifies the state of the three trees as described above; the command-line arguments --soft, --mixed, and --hard determine how the Staging Index and Working Directory trees are modified.
• There is a risk of losing work with the reset operation, whereas revert has always been a safe way to undo a commit. git reset does not delete commits, but commits can become orphaned, i.e., there is no longer any direct path from a ref to access them.
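The effect of the three reset modes on HEAD, the Staging Index, and the Working Directory can be sketched with a toy Python model. `ToyGit` is a hypothetical illustration of the documented behavior of `git reset --soft`, `--mixed`, and `--hard` on a single branch, not a description of how Git actually stores data.

```python
# Toy model of Git's three trees and the three reset modes.
# ToyGit is hypothetical; it mimics the documented effect of
# `git reset --soft/--mixed/--hard <commit>` on a single branch.


class ToyGit:
    def __init__(self):
        self.commits = [{}]  # commit 0: empty snapshot
        self.head = 0        # commit the branch (and HEAD) points to
        self.index = {}      # Staging Index
        self.work = {}       # Working Directory

    def commit_all(self, files):
        # Like editing files, then `git add` and `git commit`.
        self.work.update(files)
        self.index = dict(self.work)
        self.commits.append(dict(self.index))
        self.head = len(self.commits) - 1

    def reset(self, target, mode="mixed"):  # mixed is the default, as in Git
        snapshot = self.commits[target]
        self.head = target                   # every mode moves HEAD/branch
        if mode in ("mixed", "hard"):
            self.index = dict(snapshot)      # mixed and hard reset the index
        if mode == "hard":
            self.work = dict(snapshot)       # only hard touches the working tree
        # The later commit stays in self.commits: orphaned, not deleted.


for mode in ("soft", "mixed", "hard"):
    g = ToyGit()
    g.commit_all({"a.txt": "v1"})
    g.commit_all({"a.txt": "v2"})
    g.reset(1, mode)
    print(mode, g.index["a.txt"], g.work["a.txt"])
# soft v2 v2   -> v2 is still staged ("Changes to be committed")
# mixed v1 v2  -> v2 is kept in the working tree but unstaged
# hard v1 v1   -> v2 is discarded from index and working tree
```

In every mode the second commit is still present in `commits` but no longer reachable from `head`, which is the "orphaned commit" situation described above.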
Revert - Cancelling out the Changes
• Revert is also one of the options for undoing the changes, but works in a different way compared to the traditional undo
operation.
• The revert option doesn't remove the commit from the project history; instead, it works out how to invert the changes introduced by the commit and appends a new commit with the resulting inverse content.
• The revert operation prevents Git from losing history, which is very important for maintaining the integrity of the version history.
• The revert option is useful when the inverse of a commit has to be applied to the project.
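Revert can be sketched as appending an inverse change rather than deleting anything. In this hypothetical Python model, each change maps a file to its (before, after) contents and `None` means the file does not exist. The sketch assumes no later change touched the same files; Git handles that case with a merge, which may report conflicts.

```python
# Toy sketch of revert: history is a list of changes, each mapping a file to
# (before, after) contents; None means "file absent". Names are hypothetical.


def replay(history):
    # Rebuild the current files by applying every change in order.
    files = {}
    for change in history:
        for name, (_before, after) in change["diff"].items():
            if after is None:
                files.pop(name, None)  # the change deleted the file
            else:
                files[name] = after
    return files


def revert(history, index):
    # Append a new change that undoes history[index]; nothing is removed.
    target = history[index]
    inverse = {name: (after, before)
               for name, (before, after) in target["diff"].items()}
    history.append({"message": f"Revert: {target['message']}", "diff": inverse})


history = [
    {"message": "add config", "diff": {"cfg": (None, "debug=false")}},
    {"message": "enable debug", "diff": {"cfg": ("debug=false", "debug=true")}},
]
revert(history, 1)
print(replay(history))  # {'cfg': 'debug=false'}
print(len(history))     # 3: the original commit is still in the history
```

Contrast this with the reset sketch idea: reset moves a pointer backwards, while revert only ever adds to the history, so other developers who already have the original commit are not affected.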