Version Control Systems - Emphasis On Distributed (BZR, HG, Git)

The various version control systems and their time frames. Emphasis on DVCS Git, Hg and Bzr programs. Pros and cons, development speed and future market share. Which DVCS is in use in which Open Source projects.

Uploaded by

jaalto
Copyright
© Attribution Non-Commercial (BY-NC)
Software Development and Version Control Systems
- Emphasis on Distributed -

By Jari Aalto

© Jari Aalto 1
Terminology

• The terms SCM, VCS and RCS are commonly used interchangeably; they
usually refer to the same thing:
– SCM = Source code management*
– VCS = Version control systems
– RCS = Revision control system
• Note that "RCS" was also the name of an early program that
provided revision control (see also SCCS).
• Glossary:
– Revision = a change identified by the system
– Change set = a set of changes in one commit
– Version = a product from the system; usually a numeric release.

(*) SCM is also used for Software Configuration Management
Collaboration: Basic Problems

• People working on the same files
– Person A makes a modification; Person B makes one too – who wins?
– The code does not work. "I'm waiting for you to fix it"
– The change is too big! "We only needed a bug fix, no new features"
– Copies easily float around – which is the latest?

Collaboration: Conventional Solutions

• Two people never work at the same time. When one person
finishes, they notify the others or send the work to them.
– Problems: Unsafe, Doesn’t scale, copies float around
• Shared directory (Windows network share; Unix/Linux NFS)
– Problems: Access restrictions, permissions etc. See above
• File is locked during edit (Windows; Unix/Linux *.lock files)
– Problems: Someone forgets to unlock, program left open

Serialized work, time management problems
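
The lock-file approach above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `acquire_lock` and `release_lock` and the file name `report.txt` are hypothetical, and real tools differ in detail. It shows why a forgotten unlock serializes everyone's work.

```python
import os
import tempfile


def acquire_lock(path):
    """Try to create '<path>.lock' atomically; return True on success."""
    try:
        # O_CREAT | O_EXCL makes the call fail if the lock file already exists.
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(path):
    """Remove the lock file so that the next person can start editing."""
    os.remove(path + ".lock")


workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "report.txt")  # hypothetical shared file

a_got_lock = acquire_lock(doc)  # Person A starts editing
b_got_lock = acquire_lock(doc)  # Person B is refused while A holds the lock
release_lock(doc)               # if A forgets this step, B can never proceed
b_retry = acquire_lock(doc)     # B succeeds only after A has unlocked
release_lock(doc)
```

Person B's first attempt fails for as long as the lock file exists, regardless of whether Person A is still actually working on the file.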

Benefits of Version Control

• Single developer
– Possibility to revert to a previous revision (backup)
– Code review between changes (revision differences)
• Team development
– Noticing changes immediately
– Separation of development lines (stable/devel branches)
– Improves project structure (directories, naming conventions)
– Easy sharing of code in new projects

Version Control Systems Compared*

• Centralized: Accurev, ClearCase, Perforce, MS VSS/TFS; Svn, Cvs
– Client–server: admin, security, access rights, participation problems
– Disk failure / repository corruption halts the whole project
– All code is on one server: a star-like system needs beefy hardware
– Branching and merging issues (Cvs, Svn)

• Distributed: BitKeeper; Bazaar, Mercurial, Git, [Monotone, Darcs]
– In DVCS, only disk space (http, sftp, ssh) is needed; easy to relocate
– All developers can have a complete copy (sandboxes)
– Disk crash at some developer’s host does not necessarily affect project.
– Fast: "offline", no network lag / communication only when needed
– Branching and private modifications
An example: Linux Kernel Project

• Not the biggest FOSS project, but probably the most active
• 10 MiB of code changes a month (in the form of patches)
• ≈ 20 000 files, 280 MiB of sources, approx. 5.5-7 million lines
of code.
• Many branches
– Short life: develop a feature, a fix. Merged when ready
– Long life: bigger features that need separate line of development
(ReiserFs4, Ext4 etc.)
– Test, debug: a modification goes through several phases before the feature
is accepted to mainline (*-mm trees etc.)

Centralized model difficult; need personal sandboxes

Version Control System Maturity
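
[Figure: systems plotted by features (vertical axis) against design age
(horizontal axis: old design, mature / stable, new design). Legend: commercial
and open source systems, each marked as star (centralized) or dvcs.]

Systems shown, with implementation language where given: Git (C/sh), Bzr (P),
Hg (P), BitKeeper, ClearCase, Accurev, Perforce, Darcs (H), Mtn (C), MS TFS,
Svn (C), Arch (C/sh), Cvs (C), MS VSS, GNU Rcs (C).

Programming languages: C, (H)askell, (P)ython, (sh)ell

Accurev: novel new ideas. Streams, not distributed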
Version Control Software Timelines

[Figure: features plotted against time, 1980 to 2009. Legend: commercial and
open source systems, each marked as star (centralized) or dvcs. YYYY = the year
of many projects moving to use the VCS.]

Entries as labeled in the figure:
– Legacy systems: Cvs (1986/90), ClearCase (1992, 2003 IBM), Perforce (1995)
– BitK (1998)
– Arch (2001/03 1.0), Svn (2001/04 1.0), Darcs (2002/04 1.0), Mtn (2003/?)
– Git (2005 1.0), Bzr (2005/07 1.0), Hg (2005/08 1.0)
– Linux kernel: patches/tarballs (cf. quilt) 1991, BitK 2002
Free Version Control Hosting
The start of the DVCS bandwagon (see table 1, table 2)

• Sourceforge.net (2001): 160 000/1.7M users
– Cvs, Svn (2006), Bzr (2009), Hg (2009), Git (2009). Semi-commercial
• Savannah.gnu.org and Gna.org (2001): 30 000/60K users
– Cvs, Svn (2005); Arch* (2005), Git (2007), Hg (2008). GNU ideology
• Launchpad.net (2004): 5000/1.5M users
– Cvs, Svn, Bzr. Ubuntu Linux development, PPA (2007)
• Github.com (2008), Gitorious.org (2008): rapid growth
– Git. GitHub is semi-commercial (see also Repo.or.cz)
• Code.google.com (2006): 200M users
– Svn, Hg (2009)

[*] = GNU Arch, the first FOSS DVCS; unused by 2006
DVCS Release Schedules

[Figure: release timeline of the open source DVCSs, 2005 to 2009, with release
months marked along the time axis. Annotations: "usable!" and, for Bzr,
"Speed".]

– Git (2005-04-07): 0.1 - 1.0, 1.5, 1.6, 1.6.4
– Hg (2005-05-27): 0.1 - 0.7, 0.9.3, 0.9.4, 0.9.5, 1.0, 1.3.1
– Bzr (2005-03-22): 0.1 - 0.6, 0.9, 1.0, (2.0)
Pace of Development (1/3)

[Figure: one chart each for Git, Bzr and Hg. Source: Gmane.org]

Pace of Development (2/3)

[Figure: one chart each for Git, Bzr and Hg. Source: www.ohloh.net]

Pace of Development (3/3)

[Figure: one chart each for Git, Bzr and Hg. Source: www.ohloh.net]
DVCS and FOSS projects

[Figure: FOSS projects using a DVCS; axis: million lines of code.]

Source: Ohloh.net (2009). FOSS = Free and Open Source Software

DVCS Popularity Estimates

[Figure: Darcs, (Monotone), Hg, Bzr and Git placed by current popularity
against predicted growth. N M = N million lines of code; for scale, W2K: 20 M.
Prediction source: personal gut feeling.]

- Darcs: Exotic. Scaling / memory issues
- Hg: Head start. Xen 10M, OpenJDK 6M, OpenSolaris 5M, Python, Mozilla, XEmacs
- Bzr: At the fringe. Future looks bright (launchpad.net, Ubuntu),
Emacs 1.7M, MySQL 1.5M
- Git: Rapid growth in user base and projects. Qt 24M, Kernel 11M, Perl 4M,
X.org 3M, Wine 2.5M, Android, Gnome, (Debian)

Git has technology advantages:
• merging: multiple strategies
• gateways: Cvs, Svn

Projects that will move:
- Cvs: OO 20M, FreeBSD (Hg), Eclipse
- Svn: Samba (git) 2M, Apache/TomC 1.5M, GCC 8K, Mono 8K, Kde (git)
State of DVCS: Performance

Compared to Git (average): Hg 6x, Bzr 7-8x

Operation   Hg      Bzr
init        1.6x    20x
ci          40x     70x
add         –90%    60%
clone       5x      3x

Linux 2.6.30 sources (ca. 28 000 files, 1700 dirs; 350 MiB)

Source: DVCS Benchmark results http://www.editgrid.com/user/jaalto/vc-test
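
A factor such as "6x" is read here as the tool's run time divided by Git's run time for the same operation; that reading is an assumption, since the slide does not spell it out. The sketch below shows the arithmetic. The timings are made-up placeholders and not the benchmark's data, and the function name `slowdown` is hypothetical.

```python
def slowdown(seconds, baseline_seconds):
    """Return how many times longer a run took than the baseline run."""
    return seconds / baseline_seconds


# Hypothetical wall-clock timings in seconds, NOT taken from the benchmark.
timings = {"git": 2.0, "hg": 12.0, "bzr": 15.0}

# Express every tool relative to Git, the baseline used on the slide.
factors = {tool: slowdown(t, timings["git"]) for tool, t in timings.items()}

for tool, factor in factors.items():
    print(f"{tool}: {factor:g}x")
```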


DVCS Space Requirements

[Figure: repository size per DVCS, as a percentage (%) bigger than the
original sources]

Source: DVCS Benchmark results http://www.editgrid.com/user/jaalto/vc-test
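
"Percentage bigger than the original sources" is taken here to mean the extra disk space relative to the plain source tree. That is an assumption, because the slide does not say whether the working copy is counted. The source size of 350 MiB comes from the benchmark description; the repository size is a made-up placeholder, and the function name `overhead_percent` is hypothetical.

```python
def overhead_percent(repo_mib, sources_mib):
    """Extra space used by the repository, as a percentage of the sources."""
    return (repo_mib - sources_mib) / sources_mib * 100


sources_mib = 350  # Linux 2.6.30 sources, as stated for the benchmark
repo_mib = 525     # hypothetical on-disk size, NOT a benchmark result

extra = overhead_percent(repo_mib, sources_mib)
print(f"{extra:.0f}% bigger than the original sources")
```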


Scope of DVCS Projects
• Git
– Features, features, more cool features
– No usability roadmap. 80/20* rule problem
– No bug tracker. "Decentralized": go and fix it yourself if you want
something and be prepared for harsh criticism. Good quality achieved by
mailing list patch reviews
– High rate of development, very lively community
• Hg
– Portability, ease of use
– Small development team
• Bzr
– Extensions, UI and speed are the primary focus; emphasis on usability
– Features are simple and serve the needs of the people well (80/20 ok!)
– TDD: well planned tests, development process and bug tracker (launchpad)
(*) "Version control and the 80%" by Ben Collins-Sussman 2007-10-16
Weaknesses of DVCS
• Git
– Highly complex, non-unified UI with 150 commands: plumbing API,
porcelain. Manual care needed for repository maintenance (garbage collect).
– Cross-platform issues: tightly tied to Unix/Linux-like operating systems.
– Revision numbers are very different: SHA1 abcd24132b8e65678f… vs. Cvs
1.1 or Svn r12343.
– Migration issues: centralized-emulation is not the easiest of the pack
– No plug-in features other than hooks (Due to C/sh).
• Hg
– Although quite fast, contains fewer features. Weak collaboration support
(email/receive). Limited network protocols: http.

• Bzr
– Overall slowness, branching efficiently is difficult (special setup).
– Branches are "directories". Good cross-platform line ending control.
– Cherry picks are just "merges" that are not tracked (cf. Git).
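
To make the SHA1 identifiers mentioned for Git above less mysterious: Git names every stored object by the SHA-1 hash of a short header plus the object's content. The sketch below covers only the simplest case, a file's contents (a "blob"); commit identifiers are hashed the same way over a commit object, which is not shown. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the 40-character hex ID Git assigns to a file's contents."""
    # Git hashes the header "blob <size in bytes>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
hello_id = git_blob_id(b"hello\n")
print(empty_id)
print(hello_id)
```

Because the identifier is derived from content rather than from a counter, it carries no ordering information, unlike Cvs 1.1 or Svn r12343.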
State of DVCS: Git

• Recent enhancements since 1.5.x
– Submodule support: super projects
– "git gui" – a graphical display of commits, merges
• TODO
– Conversion to C language continues (Windows OS; unofficial)
– UI unification needed for all commands: options naming, --long option
support for all etc.
– Directory versioning support (may never happen)
– Real rename support (may never happen; must be careful)
– Does not track file permissions: ACLs (may never happen)
– Extremely inefficient HTTP protocol (may never improve): 12-22x slower
than Hg

State of DVCS: Hg

• Recent enhancements since 1.x
– Symbolic link support, large file handling support
– Better performance
• TODO
– No directory versioning
– Can't diff by date
– EOL-handling needs more robust design

State of DVCS: Bzr

• Recent enhancements since 1.x
– Repository performance improvements
– Cherry picking (new repository format), almost git-style branch switching
– OS line ending control (cf. Svn properties), 1.15
• 2.x (2009)
– Performance gap to Hg leveled
– Branching speed is on par with Git: very fast with shared repositories.
• TODO
– More performance tuning (repository changes)
– Network communication bottlenecks need resolving
– Network protocols rsync, WebDAV and web interfaces (like bzrweb) need
to be moved into the core

Conclusions

• After the start of DVCS development in 2005, three strong
contenders are left. Others, like Darcs, have serious
technical problems*: scaling, disk consumption etc.

– Git will dominate technically and offer ”enough rope to hang oneself
multiple times”. On the other hand support for git is easy to find. Extremely
flexible but a complex system.
– Bzr will probably be the choice of corporations, because it has a clear
migration plan: 1) the same command set as centralized VCSs and 2) the user
can choose a centralized or a distributed model. Big shoulders: GNU and
Canonical backing. Speed is no longer an issue.
– Hg has too little development power to keep pace with the other two.

(*) "DVCS Round-up: One System to Rule Them All" by Robert Fendt 2009-01-19
