Git Tutorial
Version control systems are a category of software tools that help a software team manage changes
to source code over time. Version control software keeps track of every modification to the code in a
special kind of database. If a mistake is made, developers can turn back the clock and compare
earlier versions of the code to help fix the mistake while minimizing disruption to all team members.
For almost all software projects, the source code is like the crown jewels - a precious asset whose
value must be protected. For most software teams, the source code is a repository of the invaluable
knowledge and understanding about the problem domain that the developers have collected and
refined through careful effort. Version control protects source code from both catastrophe and the
casual degradation of human error and unintended consequences.
Software developers working in teams are continually writing new source code and changing existing
source code. The code for a project, app or software component is typically organized in a folder
structure or “file tree”. One developer on the team may be working on a new feature while another
developer fixes an unrelated bug by changing code; each developer may make their changes in
several parts of the file tree.
Version control helps teams solve these kinds of problems, tracking every individual change by each
contributor and helping prevent concurrent work from conflicting. Changes made in one part of the
software can be incompatible with those made by another developer working at the same time. This
problem should be discovered and solved in an orderly manner without blocking the work of the rest
of the team. Further, in all software development, any change can introduce new bugs on its own
and new software can’t be trusted until it’s tested. So testing and development proceed together
until a new version is ready.
Good version control software supports a developer's preferred workflow without imposing one
particular way of working. Ideally it also works on any platform, rather than dictate what operating
system or tool chain developers must use. Great version control systems facilitate a smooth and
continuous flow of changes to the code rather than the frustrating and clumsy mechanism of file
locking - giving the green light to one developer at the expense of blocking the progress of others.
Software teams that do not use any form of version control often run into problems like not knowing
which changes that have been made are available to users or the creation of incompatible changes
between two unrelated pieces of work that must then be painstakingly untangled and reworked. If
you’re a developer who has never used version control you may have added versions to your files,
perhaps with suffixes like “final” or “latest” and then had to later deal with a new final version.
Perhaps you’ve commented out code blocks because you want to disable certain functionality
without deleting the code, fearing that there may be a use for it later. Version control is a way out of
these problems.
Version control software is an essential part of the modern software team’s everyday
professional practice. Individual software developers who are accustomed to working with a
capable version control system in their teams typically recognize the value version control
also gives them even on small solo projects. Once accustomed to the benefits of version
control systems, many developers wouldn’t consider working without one, even for non-software
projects.
Version Control Systems (VCS) have seen great improvements over the past few decades and some
are better than others. VCS are sometimes known as SCM (Source Code Management) tools or RCS
(Revision Control System). One of the most popular VCS tools in use today is called Git. Git is
a distributed VCS, a category known as DVCS (more on that later). Like many of the most popular VCS
tools available today, Git is free and open source. Regardless of what they are called, or which
system is used, the primary benefits you should expect from version control are as follows.
1. A complete long-term change history of every file. This means every change made by many
individuals over the years. Changes include the creation and deletion of files as well as edits
to their contents. Different VCS tools differ on how well they handle renaming and moving of
files. This history should also include the author, date and written notes on the purpose of
each change. Having the complete history enables going back to previous versions to help in
root cause analysis for bugs and it is crucial when needing to fix problems in older versions
of software. If the software is being actively worked on, almost everything can be considered
an “older version” of the software.
2. Branching and merging. Having team members work concurrently is a no-brainer, but even
individuals working on their own can benefit from the ability to work on independent
streams of changes. Creating a “branch” in VCS tools keeps multiple streams of work
independent from each other while also providing the facility to merge that work back
together, enabling developers to verify that the changes on each branch do not conflict.
Many software teams adopt a practice of branching for each feature or perhaps branching
for each release, or both. There are many different workflows that teams can choose from
when they decide how to make use of branching and merging facilities in VCS.
3. Traceability. Being able to trace each change made to the software and connect it to project
management and bug tracking software such as JIRA, and being able to annotate each
change with a message describing the purpose and intent of the change can help
with root cause analysis and other forensics. Having the annotated history of the code at
your fingertips when you are reading the code, trying to understand what it is doing and why
it is so designed can enable developers to make correct and harmonious changes that are in
accord with the intended long-term design of the system. This can be especially important
for working effectively with legacy code and is crucial in enabling developers to estimate
future work with any accuracy.
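The history and traceability benefits listed above can be sketched with a few Git commands. This is only an illustration: the throwaway directory, the file name app.py, the author identity and the ticket IDs PROJ-1 and PROJ-2 are all made up for the example.

```shell
# Sketch only: file name, author and ticket IDs are hypothetical.
repo="$(mktemp -d)"                       # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Alice Example"      # placeholder identity for this repository
git config user.email "alice@example.com"

echo "print('hello')" > app.py
git add app.py
git commit -q -m "PROJ-1: add greeting script"

echo "print('goodbye')" >> app.py
git commit -q -a -m "PROJ-2: add farewell line"

# Who changed what, when, and why: hash, author, date and message per commit.
git log --format='%h %an %ad %s' --date=short

# Which commit last touched each line of the file.
git blame app.py
```

Putting the ticket ID in the commit message is one simple convention for connecting a change to an issue tracker; teams may prefer a different one.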
While it is possible to develop software without using any version control, doing so subjects the
project to a huge risk that no professional team would be advised to accept. So the question is not
whether to use version control but which version control system to use.
There are many choices, but here we are going to focus on just one, Git.
What is Git
By far, the most widely used modern version control system in the world today is Git. Git is a mature,
actively maintained open source project originally developed in 2005 by Linus Torvalds, the famous
creator of the Linux operating system kernel. A staggering number of software projects rely on Git
for version control, including commercial projects as well as open source. Developers who have
worked with Git are well represented in the pool of available software development talent and it
works well on a wide range of operating systems and IDEs (Integrated Development Environments).
Having a distributed architecture, Git is an example of a DVCS (hence Distributed Version Control
System). Rather than have only one single place for the full version history of the software as is
common in once-popular version control systems like CVS or Subversion (also known as SVN), in Git,
every developer's working copy of the code is also a repository that can contain the full history of all
changes.
In addition to being distributed, Git has been designed with performance, security and flexibility in
mind.
Performance
The raw performance characteristics of Git are very strong when compared to many alternatives.
Committing new changes, branching, merging and comparing past versions are all optimized for
performance. The algorithms implemented inside Git take advantage of deep knowledge about
common attributes of real source code file trees, how they are usually modified over time and what
the access patterns are.
Unlike some version control software, Git is not fooled by the names of the files when determining
what the storage and version history of the file tree should be; instead, Git focuses on the file
content itself. After all, source code files are frequently renamed, split, and rearranged. The object
format of Git's repository files uses a combination of delta encoding (storing content differences)
and compression, and explicitly stores directory contents and version metadata objects.
For example, say a developer, Alice, makes changes to source code, adding a feature for the
upcoming 2.0 release, then commits those changes with descriptive messages. She then works on a
second feature and commits those changes too. Naturally these are stored as separate pieces of
work in the version history. Alice then switches to the version 1.3 branch of the same software to fix
a bug that affects only that older version. The purpose of this is to enable Alice's team to ship a bug
fix release, version 1.3.1, before version 2.0 is ready. Alice can then return to the 2.0 branch to
continue working on new features for 2.0 and all of this can occur without any network access and is
therefore fast and reliable. She could even do it on an airplane. When she is ready to send all of the
individually committed changes to the remote repository, Alice can “push” them in one command.
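Alice's session might look roughly like the sketch below. The branch names release-1.3 and release-2.0, the file names and the commit messages are assumptions made for illustration, and a local bare repository stands in for the remote so that the whole sequence can be tried without a server.

```shell
# Sketch only: branch names, file names and messages are hypothetical.
work="$(mktemp -d)"
git init -q --bare "$work/central.git"        # stands in for the remote repository
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"

# Shared starting point, then a maintenance branch for the 1.3 line.
echo "v1.3" > version.txt
git add version.txt
git commit -q -m "Release 1.3"
git branch release-1.3

# Two separately committed features for the upcoming 2.0 release.
git checkout -q -b release-2.0
echo "feature one" > feature1.txt
git add feature1.txt
git commit -q -m "Add first 2.0 feature"
echo "feature two" > feature2.txt
git add feature2.txt
git commit -q -m "Add second 2.0 feature"

# Switch to the older line, fix the bug, and tag the 1.3.1 bug-fix release.
git checkout -q release-1.3
echo "v1.3.1" > version.txt
git commit -q -a -m "Fix bug that only affects 1.3"
git tag v1.3.1

# Return to 2.0. Everything up to here needed no network access.
git checkout -q release-2.0

# Send all of the individually committed changes in one command.
git push -q origin release-1.3 release-2.0 v1.3.1
```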
Security
Git has been designed with the integrity of managed source code as a top priority. The content of
the files as well as the true relationships between files and directories, versions, tags and commits
are all stored as objects in the Git repository and secured with a cryptographic hashing
algorithm called SHA-1. This protects the code and the change history against both accidental and
malicious change and ensures that the history is fully traceable.
With Git, you can be sure you have an authentic content history of your source code.
Some other version control systems have no protections against secret alteration at a later date. This
can be a serious information security vulnerability for any organization that relies on software
development.
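The idea that object names are derived from content can be observed directly. The following is a small sketch rather than a security audit; the file name and its contents are invented for the example.

```shell
# Sketch only: the file name and contents are made up for illustration.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original content" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Object IDs are computed from content: identical input gives an identical ID,
# and any change to the input gives a different one.
id_a="$(echo "original content" | git hash-object --stdin)"
id_b="$(echo "original content" | git hash-object --stdin)"
id_c="$(echo "tampered content" | git hash-object --stdin)"
echo "$id_a"
echo "$id_c"

# The committed file is stored under the same content-derived ID.
stored_id="$(git rev-parse HEAD:notes.txt)"
echo "$stored_id"

# Check that every object in the repository still matches its ID.
git fsck
```

Because a commit's ID covers its tree and its parent commits, altering any earlier content would change every ID that follows it.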
Flexibility
One of Git's key design objectives is flexibility. Git is flexible in several respects: in support for
various kinds of nonlinear development workflows, in its efficiency in both small and large projects
and in its compatibility with many existing systems and protocols.
Git has been designed to support branching and tagging as first-class citizens (unlike SVN) and
operations that affect branches and tags (such as merging or reverting) are also stored as part of the
change history. Not all version control systems feature this level of tracking.
Git is good
Git has the functionality, performance, security and flexibility that most teams and individual
developers need. These attributes of Git are detailed above. In side-by-side comparisons with most
other alternatives, many teams find that Git compares very favorably.
Vast numbers of developers already have Git experience and a significant proportion of college
graduates may have experience with only Git. While some organizations may need to climb the
learning curve when migrating to Git from another version control system, many of their existing and
future developers do not need to be trained on Git.
In addition to the benefits of a large talent pool, the predominance of Git also means that many
third party software tools and services are already integrated with Git including IDEs, and our own
tools like DVCS desktop client SourceTree, issue and project tracking software, JIRA, and code
hosting service, Bitbucket.
If you are an inexperienced developer wanting to build up valuable skills in software development
tools, when it comes to version control, Git should be on your list.
Git enjoys great community support and a vast user base. Documentation is excellent and plentiful,
including books, tutorials and dedicated web sites. There are also podcasts and video tutorials.
Being open source lowers the cost for hobbyist developers as they can use Git without paying a fee.
For use in open-source projects, Git is undoubtedly the successor to the previous generations of
successful open source version control systems, SVN and CVS.
Criticism of Git
One common criticism of Git is that it can be difficult to learn. Some of the terminology in Git will be
novel to newcomers, and for users of other systems the Git terminology may be different. For
example, revert in Git has a different meaning than in SVN or CVS. Nevertheless, Git is very capable
and provides a lot of power to its users. Learning to use that power can take some time, but
once it has been learned, that power can be used by the team to increase their development speed.
For those teams coming from a non-distributed VCS, having a central repository may seem like a
good thing that they don’t want to lose. However, while Git has been designed as a distributed
version control system (DVCS), with Git, you can still have an official, canonical repository where all
changes to the software must be stored. With Git, because each developer’s repository is complete,
their work doesn’t need to be constrained by the availability and performance of the “central”
server. During outages or while offline, developers can still consult the full project history. Because
Git is flexible as well as being distributed, you can work the way you are accustomed to but gain the
additional benefits of Git, some of which you may not even realise you’re missing.
Now that you understand what version control is, what Git is and why software teams should use it,
read on to discover the benefits Git can provide across the whole organization.
Why Git for your organization
Switching from a centralized version control system to Git changes the way your development team
creates software. And, if you’re a company that relies on its software for mission-critical
applications, altering your development workflow impacts your entire business.
In this article, we’ll discuss how Git benefits each aspect of your organization, from your
development team to your marketing team, and everything in between. By the end of this article, it
should be clear that Git isn’t just for agile software development—it’s for agile business.
Git for developers
Feature branches provide an isolated environment for every change to your codebase. When a
developer wants to start working on something—no matter how big or small—they create a new
branch. This ensures that the master branch always contains production-quality code.
Using feature branches is not only more reliable than directly editing production code, but it also
provides organizational benefits. They let you represent development work at the same granularity
as your agile backlog. For example, you might implement a policy where each JIRA ticket is
addressed in its own feature branch.
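A one-branch-per-ticket policy could be sketched as follows. The ticket ID PROJ-123, the branch naming scheme and the file names are hypothetical; the main branch name is read from the repository because it may be master or main depending on configuration.

```shell
# Sketch only: the ticket ID PROJ-123 and file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial production-quality commit"
main_branch="$(git symbolic-ref --short HEAD)"   # "master" or "main"

# One branch per ticket keeps the work isolated from the main branch.
git checkout -q -b feature/PROJ-123
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "PROJ-123: implement new feature"

# The main branch is untouched until the feature is merged.
git checkout -q "$main_branch"
ls                      # feature.txt is not present here yet
git merge -q --no-ff -m "Merge PROJ-123" feature/PROJ-123
ls                      # feature.txt is now part of the main branch
```

The --no-ff option is a design choice, not a requirement: it records an explicit merge commit so the history shows where the ticket's work was integrated.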
Distributed Development
In SVN, each developer gets a working copy that points back to a single central repository. Git,
however, is a distributed version control system. Instead of a working copy, each developer gets
their own local repository, complete with a full history of commits.
Having a full local history makes Git fast, since it means you don’t need a network connection to
create commits, inspect previous versions of a file, or perform diffs between commits.
Distributed development also makes it easier to scale your engineering team. If someone breaks the
production branch in SVN, other developers can’t check in their changes until it’s fixed. With Git, this
kind of blocking doesn’t exist. Everybody can continue going about their business in their own local
repositories.
And, similar to feature branches, distributed development creates a more reliable environment.
Even if a developer obliterates their own repository, they can simply clone someone else’s and start
anew.
Pull Requests
Many source code management tools such as Bitbucket enhance core Git functionality with pull
requests. A pull request is a way to ask another developer to merge one of your branches into their
repository. This not only makes it easier for project leads to keep track of changes, but also lets
developers initiate discussions around their work before integrating it with the rest of the codebase.
Since they’re essentially a comment thread attached to a feature branch, pull requests are extremely
versatile. When a developer gets stuck with a hard problem, they can open a pull request to ask for
help from the rest of the team. Alternatively, junior developers can be confident that they aren’t
destroying the entire project by treating pull requests as a formal code review.
Community
In many circles, Git has come to be the expected version control system for new projects. If your
team is using Git, odds are you won’t have to train new hires on your workflow, because they’ll
already be familiar with distributed development.
In addition, Git is very popular among open source projects. This means it’s easy to leverage 3rd-
party libraries and encourage others to fork your own open source code.
As you might expect, Git works very well with continuous integration and continuous delivery
environments. Git hooks allow you to run scripts when certain events occur inside of a repository,
which lets you automate deployment to your heart’s content. You can even build or deploy code
from specific branches to different servers.
For example, you might want to configure Git to deploy the most recent commit from the develop
branch to a test server whenever anyone merges a pull request into it. Combining this kind of build
automation with peer review means you have the highest possible confidence in your code as it
moves from development to staging to production.
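One way such a deployment could be wired up is with a post-receive hook in the central repository. The sketch below is an assumption-heavy illustration: a local directory named test-server stands in for the real test server, the repository paths are temporary, and the hook reacts to any push to develop rather than specifically to a merged pull request.

```shell
# Sketch only: directory names and the deployment target are hypothetical, and a
# local directory stands in for the test server.
work="$(mktemp -d)"
mkdir "$work/test-server"
git init -q --bare "$work/central.git"
mkdir -p "$work/central.git/hooks"

# post-receive runs in the central repository after a push has been accepted.
cat > "$work/central.git/hooks/post-receive" <<EOF
#!/bin/sh
while read oldrev newrev ref; do
  if [ "\$ref" = "refs/heads/develop" ]; then
    # Check out the latest develop commit into the test server directory.
    git --work-tree="$work/test-server" checkout -f develop
  fi
done
EOF
chmod +x "$work/central.git/hooks/post-receive"

# A developer pushes to develop, and the hook deploys the result.
git clone -q "$work/central.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout -q -b develop
echo "<h1>Hello</h1>" > index.html
git add index.html
git commit -q -m "Add landing page"
git push -q origin develop

ls "$work/test-server"      # lists the deployed files
```

A real setup would usually trigger a build or a CI job here instead of checking files out directly, but the hook mechanism is the same.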
The entire team is finishing up a game-changing feature that they’ve been working on for
the last 6 months.
Mary is implementing a smaller, unrelated feature that only impacts existing customers.
If you’re using a traditional development workflow that relies on a centralized VCS, all of these
changes would probably be rolled up into a single release. Marketing can only make one
announcement that focuses primarily on the game-changing feature, and the marketing potential of
the other two updates is effectively ignored.
The shorter development cycle facilitated by Git makes it much easier to divide these into individual
releases. This gives marketers more to talk about, more often. In the above scenario, marketing can
build out three campaigns that revolve around each feature, and thus target very specific market
segments.
For instance, they might prepare a big PR push for the game-changing feature, a corporate blog post
and newsletter blurb for Mary’s feature, and some guest posts about Rick’s underlying UX theory for
sending to external design blogs. All of these activities can be synchronized with a separate release.
This same functionality makes it easy to manage innovation projects, beta tests, and rapid
prototypes as independent codebases.
Encapsulating user interface changes like this makes it easy to present updates to other
stakeholders. For example, if the director of engineering wants to see what the design team has
been working on, all they have to do is tell the director to check out the corresponding branch.
Pull requests take this one step further and provide a formal place for interested parties to discuss
the new interface. Designers can make any necessary changes, and the resulting commits will show
up in the pull request. This invites everybody to participate in the iteration process.
Perhaps the best part of prototyping with branches is that it’s just as easy to merge the changes into
production as it is to throw them away. There’s no pressure to do either one. This encourages
designers and UI developers to experiment while ensuring that only the best ideas make it through
to the customer.
Git’s streamlined development cycle avoids postponing bug fixes until the next monolithic release. A
developer can patch the problem and push it directly to production. Faster fixes mean happier
customers and fewer repeat support tickets. Instead of being stuck with “Sorry, we’ll get right on
that,” your customer support team can start responding with “We’ve already fixed it!”
Employees are drawn to companies that provide career growth opportunities, and understanding
how to leverage Git in both large and small organizations is a boon to any programmer. By choosing
Git as your version control system, you’re making the decision to attract forward-looking developers.
But, don’t forget that these efficiencies also extend outside your development team. They prevent
marketing from pouring energy into collateral for features that aren’t popular. They let designers
test new interfaces on the actual product with little overhead. They let you react to customer
complaints immediately.
Being agile is all about finding out what works as quickly as possible, magnifying efforts that are
successful, and eliminating ones that aren’t. Git serves as a multiplier for all your business activities
by making sure every department is doing their job more efficiently.
Setting up a repository
This tutorial provides a succinct overview of the most important Git commands. First, the Setting Up
a Repository section explains all of the tools you need to start a new version-controlled project.
Then, the remaining sections introduce your everyday Git commands.
By the end of this module, you should be able to create a Git repository, record snapshots of your
project for safekeeping, and view your project’s history.
git init
The git init command creates a new Git repository. It can be used to convert an existing, unversioned
project to a Git repository or initialize a new empty repository. Most of the other Git commands are
not available outside of an initialized repository, so this is usually the first command you’ll run in a
new project.
Executing git init creates a .git subdirectory in the project root, which contains all of the necessary
metadata for the repo. Aside from the .git directory, an existing project remains unaltered (unlike
SVN, Git doesn't require a .git folder in every subdirectory).
Usage
git init
Transform the current directory into a Git repository. This adds a .git folder to the current directory
and makes it possible to start recording revisions of the project.
git init <directory>
Create an empty Git repository in the specified directory. Running this command will create a new
folder called <directory> containing nothing but the .git subdirectory.
git init --bare <directory>
Initialize an empty Git repository, but omit the working directory. Shared repositories should always
be created with the --bare flag (see discussion below). Conventionally, repositories initialized with
the --bare flag end in .git. For example, the bare version of a repository called my-project should be
stored in a directory called my-project.git.
Discussion
Compared to SVN, the git init command is an incredibly easy way to create new version-controlled
projects. Git doesn’t require you to create a repository, import files, and check out a working copy.
All you have to do is cd into your project folder and run git init, and you’ll have a fully functional Git
repository.
However, for most projects, git init only needs to be executed once to create a central repository—
developers typically don’t use git init to create their local repositories. Instead, they’ll usually use git
clone to copy an existing repository onto their local machine.
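As a minimal sketch of converting an existing, unversioned project, assuming a placeholder project directory that contains a single README.txt:

```shell
# Sketch only: the project directory and its file are placeholders.
project="$(mktemp -d)"
cd "$project"
echo "existing, unversioned file" > README.txt

git init -q             # creates the .git subdirectory in the project root
ls -a                   # README.txt is untouched; .git now sits next to it
git status --short      # existing files show up as untracked (??)
```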
Bare Repositories
The --bare flag creates a repository that doesn’t have a working directory, making it impossible to
edit files and commit changes in that repository. Central repositories should always be created as
bare repositories because pushing branches to a non-bare repository has the potential to overwrite
changes. Think of --bare as a way to mark a repository as a storage facility, as opposed to a
development environment. This means that for virtually all Git workflows, the central repository is
bare, and developers’ local repositories are non-bare.
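The difference between the two kinds of repository can be checked locally. In this sketch the temporary directory and the my-project name are only examples.

```shell
# Sketch only: paths are temporary and the project name is an example.
work="$(mktemp -d)"
git init -q --bare "$work/my-project.git"
git init -q "$work/my-project"

# Ask each repository whether it is bare.
git -C "$work/my-project.git" rev-parse --is-bare-repository   # prints "true"
git -C "$work/my-project" rev-parse --is-bare-repository       # prints "false"

# A bare repository keeps its metadata at the top level: HEAD, config, objects, refs.
ls "$work/my-project.git"
```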
Example
Since git clone is a more convenient way to create local copies of a project, the most common use
case for git init is to create a central repository:
ssh <user>@<host>
cd path/above/repo
git init --bare my-project.git
First, you SSH into the server that will contain your central repository. Then, you navigate to
wherever you’d like to store the project. Finally, you use the --bare flag to create a central storage
repository. Developers would then clone my-project.git to create a local copy on their
development machine.
git clone
The git clone command copies an existing Git repository. This is sort of like svn checkout, except the
“working copy” is a full-fledged Git repository—it has its own history, manages its own files, and is a
completely isolated environment from the original repository.
As a convenience, cloning automatically creates a remote connection called origin pointing back to
the original repository. This makes it very easy to interact with a central repository.
Usage
git clone <repo>
Clone the repository located at <repo> onto the local machine. The original repository can be
located on the local filesystem or on a remote machine accessible via HTTP or SSH.
git clone <repo> <directory>
Clone the repository located at <repo> into the folder called <directory> on the local machine.
Discussion
If a project has already been set up in a central repository, the git clone command is the most
common way for users to obtain a development copy. Like git init, cloning is generally a one-time
operation—once a developer has obtained a working copy, all version control operations and
collaborations are managed through their local repository.
Repo-To-Repo Collaboration
It’s important to understand that Git’s idea of a “working copy” is very different from the working
copy you get by checking out code from an SVN repository. Unlike SVN, Git makes no distinction
between the working copy and the central repository—they are all full-fledged Git repositories.
This makes collaborating with Git fundamentally different than with SVN. Whereas SVN depends on
the relationship between the central repository and the working copy, Git’s collaboration model is
based on repository-to-repository interaction. Instead of checking a working copy into SVN’s central
repository, you push or pull commits from one repository to another.
Of course, there’s nothing stopping you from giving certain Git repos special meaning. For example,
by simply designating one Git repo as the “central” repository, it’s possible to replicate a Centralized
workflow using Git. The point is, this is accomplished through conventions rather than being
hardwired into the VCS itself.
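Repository-to-repository collaboration can be tried on a single machine. In the sketch below, a bare repository is treated as "central" purely by convention, and the developer names, identity and file name are placeholders.

```shell
# Sketch only: names, paths and the file are placeholders.
work="$(mktemp -d)"
git init -q --bare "$work/central.git"          # "central" only by convention

git clone -q "$work/central.git" "$work/john" 2>/dev/null
git clone -q "$work/central.git" "$work/mary" 2>/dev/null

# John commits locally, then pushes the commit to the central repository.
cd "$work/john"
git config user.name "John Example"
git config user.email "john@example.com"
echo "shared notes" > notes.txt
git add notes.txt
git commit -q -m "Add shared notes"
branch="$(git symbolic-ref --short HEAD)"       # "master" or "main"
git push -q origin "$branch"

# Mary pulls the same commit into her own full repository.
cd "$work/mary"
git pull -q origin "$branch"
git log --oneline
```

Nothing in Git marks central.git as special; John could equally add Mary's repository as a remote and exchange commits with it directly.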
Example
The example below demonstrates how to obtain a local copy of a central repository stored on a
server accessible at example.com using the SSH username john:
git clone ssh://john@example.com/path/to/my-project.git
cd my-project
The first command initializes a new Git repository in the my-project folder on your local machine and
populates it with the contents of the central repository. Then, you can cd into the project and start
editing files, committing snapshots, and interacting with other repositories. Also note that
the .git extension is omitted from the cloned repository. This reflects the non-bare status of the local
copy.
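The same behaviour can be observed without a server. This sketch assumes a local bare repository as a stand-in for the central one, so the clone source is a filesystem path rather than an SSH URL.

```shell
# Sketch only: a local bare repository stands in for the central server.
work="$(mktemp -d)"
git init -q --bare "$work/my-project.git"
cd "$work"

git clone -q "$work/my-project.git" 2>/dev/null   # creates ./my-project, without the .git suffix
cd my-project
git remote -v                                     # origin points back at my-project.git
git rev-parse --is-bare-repository                # prints "false": the clone is non-bare
```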
git config
The git config command lets you configure your Git installation (or an individual repository) from the
command line. This command can define everything from user info to preferences to the behavior of
a repository. Several common configuration options are listed below.
Usage
git config user.name <name>
Define the author name to be used for all commits in the current repository. Typically, you’ll want to
use the --global flag to set configuration options for the current user.
git config --global user.name <name>
Define the author name to be used for all commits by the current user.
git config --global user.email <email>
Define the author email to be used for all commits by the current user.
git config --system core.editor <editor>
Define the text editor used by commands like git commit for all users on the current machine.
The <editor> argument should be the command that launches the desired editor (e.g., vi).
git config --global --edit
Open the global configuration file in a text editor for manual editing.
Discussion
All configuration options are stored in plaintext files, so the git config command is really just a
convenient command-line interface. Typically, you’ll only need to configure a Git installation the first
time you start working on a new development machine, and for virtually all cases, you’ll want to use
the --global flag.
Git stores configuration options in three separate files, which lets you scope options to individual
repositories, users, or the entire system:
<repo>/.git/config – Repository-specific settings.
~/.gitconfig – User-specific settings. This is where options set with the --global flag are stored.
$(prefix)/etc/gitconfig – System-wide settings.
When options in these files conflict, local settings override user settings, which override system-
wide. If you open any of these files, you’ll see something like the following:
[user]
name = <name>
email = <email>
[alias]
st = status
co = checkout
br = branch
up = rebase
ci = commit
[core]
editor = vim
You can manually edit these values to the exact same effect as git config.
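The override order can be checked with a short experiment. This sketch points HOME at a temporary directory so that it does not touch your real ~/.gitconfig; the two names are placeholders.

```shell
# Sketch only: HOME is redirected to a sandbox so the real ~/.gitconfig is not modified.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
git config --global user.name "Global Name"   # written to the sandboxed ~/.gitconfig

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Repo Name"              # written to this repository's .git/config

git config user.name                 # the local setting wins: prints "Repo Name"
git config --global user.name        # prints "Global Name"
git config --show-origin user.name   # also shows which file supplied the value
```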
Example
The first thing you’ll want to do after installing Git is tell it your name/email and customize some of
the default settings. A typical initial configuration might look something like the following:
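A sketch of such a configuration is shown here. The name and email are placeholders, and the first two lines only redirect HOME to a temporary directory so the example can be run safely; drop them to apply the settings to your own account.

```shell
# Sketch only: sandbox HOME so the example does not modify your real ~/.gitconfig.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Tell Git who you are (placeholder name and email).
git config --global user.name "John Smith"
git config --global user.email "john@example.com"

# Add some SVN-like aliases.
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.up rebase
git config --global alias.ci commit

# Select your favorite text editor.
git config --global core.editor vim

cat "$HOME/.gitconfig"
```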
This will produce the ~/.gitconfig file from the previous section.
Saving changes
git add
The git add command adds a change in the working directory to the staging area. It tells Git that you
want to include updates to a particular file in the next commit. However, git add doesn't really affect
the repository in any significant way—changes are not actually recorded until you run git commit.
In conjunction with these commands, you'll also need git status to view the state of the working
directory and the staging area.
Usage
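git add <file>
Stage all changes in <file> for the next commit.
git add <directory>
Stage all changes in <directory> for the next commit.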
git add -p
Begin an interactive staging session that lets you choose portions of a file to add to the next commit.
This will present you with a chunk of changes and prompt you for a command. Use y to stage the
chunk, n to ignore the chunk, s to split it into smaller chunks, e to manually edit the chunk, and q to
exit.
Discussion
The git add and git commit commands compose the fundamental Git workflow. These are the two
commands that every Git user needs to understand, regardless of their team’s collaboration model.
They are the means to record versions of a project into the repository’s history.
Developing a project revolves around the basic edit/stage/commit pattern. First, you edit your files
in the working directory. When you’re ready to save a copy of the current state of the project, you
stage changes with git add. After you’re happy with the staged snapshot, you commit it to the
project history with git commit.
The git add command should not be confused with svn add, which adds a file to the repository.
Instead, git add works on the more abstract level of changes. This means that git add needs to be
called every time you alter a file, whereas svn add only needs to be called once for each file. It may
sound redundant, but this workflow makes it much easier to keep a project organized.
The staging area is one of Git's more unique features, and it can take some time to wrap your head
around it if you’re coming from an SVN (or even a Mercurial) background. It helps to think of it as a
buffer between the working directory and the project history.
Instead of committing all of the changes you've made since the last commit, the stage lets you group
related changes into highly focused snapshots before actually committing them to the project history.
This means you can make all sorts of edits to unrelated files, then go back and split them up into
logical commits by adding related changes to the stage and committing them piece-by-piece. As in any
revision control system, it’s important to create atomic commits so that it’s easy to track down bugs
and revert changes with minimal impact on the rest of the project.
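The idea of splitting unrelated edits into focused commits can be sketched as follows. This is a minimal illustration, assuming a throwaway repository created with mktemp; the file contents, commit messages, and the "Demo User" identity are placeholders, not part of the tutorial's project.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

# Initial snapshot with two tracked files
echo 'print("hello")' > hello.py
echo 'print("main")' > main.py
git add .
git commit -q -m "Initial import"

# Edit both files in one sitting
echo 'print("hello again")' >> hello.py
echo 'print("main again")' >> main.py

# Stage and commit each change on its own
git add hello.py
git commit -q -m "Extend hello.py"
git add main.py
git commit -q -m "Extend main.py"

git log --oneline
```

Each of the last two commits touches exactly one file, even though both files were edited before anything was staged.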
Example
When you’re starting a new project, git add serves the same function as svn import. To create an
initial commit of the current directory, use the following two commands:
git add .
git commit
Once you’ve got your project up-and-running, new files can be added by passing the path to git add:
git add hello.py
git commit
The above commands can also be used to record changes to existing files. Again, Git doesn’t
differentiate between staging changes in new files vs. changes in files that have already been added
to the repository.
git commit
The git commit command commits the staged snapshot to the project history. Committed snapshots
can be thought of as "safe" versions of a project: Git will never change them unless you explicitly ask
it to. Along with git add, this is one of the most important Git commands.
While they share the same name, this command is nothing like svn commit. Snapshots are
committed to the local repository, and this requires absolutely no interaction with other Git
repositories.
Usage
git commit
Commit the staged snapshot. This will launch a text editor prompting you for a commit message.
After you've entered a message, save the file and close the editor to create the actual commit.
git commit -m "<message>"
Commit the staged snapshot, but instead of launching a text editor, use <message> as the commit
message.
git commit -a
Commit a snapshot of all changes in the working directory. This only includes modifications to
tracked files (those that have been added with git add at some point in their history).
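To make the "tracked files only" behavior of git commit -a concrete, here is a small sketch. It assumes a throwaway repository; tracked.txt and untracked.txt are hypothetical file names chosen for the illustration.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt                  # modify a tracked file
echo "new" > untracked.txt               # create a file that was never added

# -a picks up the modification, but leaves the untracked file alone
git commit -q -a -m "Update tracked.txt"
git status --short
```

After the commit, git status still reports untracked.txt as untracked, because it was never passed to git add.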
Discussion
Snapshots are always committed to the local repository. This is fundamentally different from SVN,
wherein the working copy is committed to the central repository. In contrast, Git doesn’t force you
to interact with the central repository until you’re ready. Just as the staging area is a buffer between
the working directory and the project history, each developer’s local repository is a buffer between
their contributions and the central repository.
This changes the basic development model for Git users. Instead of making a change and committing
it directly to the central repo, Git developers have the opportunity to accumulate commits in their
local repo. This has many advantages over SVN-style collaboration: it makes it easier to split up a
feature into atomic commits, keep related commits grouped together, and clean up local history
before publishing it to the central repository. It also lets developers work in an isolated environment,
deferring integration until they’re at a convenient break point.
Snapshots, Not Differences
Aside from the practical distinctions between SVN and Git, their underlying implementations also
follow entirely divergent design philosophies. Whereas SVN tracks differences of a file, Git's version
control model is based on snapshots. For example, an SVN commit consists of a diff compared to the
original file added to the repository. Git, on the other hand, records the entire contents of each file in
every commit (conceptually; a file that hasn't changed is stored once and simply referenced again).
This makes many Git operations much faster than SVN, since a particular version of a file doesn't
have to be "assembled" from its diffs: the complete revision of each file is immediately available
from Git's internal database.
Git's snapshot model has a far-reaching impact on virtually every aspect of its version control model,
affecting everything from its branching and merging tools to its collaboration workflows.
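One visible consequence of the snapshot model is that the complete contents of a file can be requested from any commit directly. The sketch below assumes a throwaway repository with two hypothetical versions of hello.py; it illustrates the user-facing behavior only and says nothing about how Git lays out its storage on disk.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo 'print("v1")' > hello.py
git add hello.py
git commit -q -m "First version"

echo 'print("v2")' > hello.py
git commit -q -a -m "Second version"

# Ask for the whole file as recorded in each snapshot
git show HEAD~1:hello.py
git show HEAD:hello.py
```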
Example
The following example assumes you’ve edited some content in a file called hello.py and are ready to
commit it to the project history. First, you need to stage the file with git add, then you can commit
the staged snapshot.
git add hello.py
git commit
This will open a text editor (customizable via git config) asking for a commit message, along with a
list of what’s being committed:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#       modified:   hello.py
Git doesn't require commit messages to follow any specific formatting constraints, but the canonical
format is to summarize the entire commit on the first line in fewer than 50 characters, leave a blank
line, then give a detailed explanation of what's been changed.
Note that many developers also like to use the present tense in their commit messages. This makes
them read more like actions on the repository, which makes many of the history-rewriting
operations more intuitive.
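As an illustration of the summary-line-plus-body format, the sketch below passes two -m options to git commit, which Git joins as separate paragraphs. The repository, file, and message text are all hypothetical placeholders.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo 'print("hi")' > hello.py
git add hello.py

# First -m is the summary line, second -m becomes the body after a blank line
git commit -q \
  -m "Change the message displayed by hello.py" \
  -m "Update the greeting so that it includes the user's name."

git log -1 --format=%s                   # summary line only
git log -1 --format=%b                   # body only
```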
Inspecting a repository
git status
The git status command displays the state of the working directory and the staging area. It lets you
see which changes have been staged, which haven’t, and which files aren’t being tracked by Git.
Status output does not show you any information regarding the committed project history. For this,
you need to use git log.
Usage
git status
List which files are staged, unstaged, and untracked. Sample output showing the three main
categories of a git status call is included below:
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   hello.py
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   main.py
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       hello.pyc
Ignoring Files
Untracked files typically fall into two categories. They're either files that have just been added to the
project and haven't been committed yet, or they're compiled binaries like .pyc, .obj, .exe, etc. While
it's definitely beneficial to include the former in the git status output, the latter can make it hard to
see what's actually going on in your repository.
For this reason, Git lets you completely ignore files by placing paths in a special file called .gitignore.
Each path or pattern that you'd like to ignore goes on a separate line, and the * symbol can be
used as a wildcard. For example, adding the following to a .gitignore file in your project root will
prevent compiled Python modules from appearing in git status:
*.pyc
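The effect of that pattern can be checked with a short sketch. It assumes a throwaway repository containing a hypothetical hello.py and an empty stand-in for its compiled hello.pyc.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q

echo 'print("hi")' > hello.py
touch hello.pyc                          # stand-in for a compiled module

git status --short                       # both files show up as untracked

echo '*.pyc' > .gitignore
git status --short                       # hello.pyc is no longer listed
```

Note that .gitignore itself now appears as an untracked file; it is usually committed along with the rest of the project.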
Example
It's good practice to check the state of your repository before committing changes so that you don't
accidentally commit something you don't mean to. This example displays the repository status
before and after staging and committing a snapshot:
# Edit hello.py
git status
# hello.py is listed under "Changes not staged for commit"
git add hello.py
git status
# hello.py is listed under "Changes to be committed"
git commit
git status
# nothing to commit (working directory clean)
The first status output will show the file as unstaged. The git add action will be reflected in the
second git status, and the final status output will tell you that there is nothing to commit: the
working directory matches the most recent commit. Some Git commands (e.g., git merge) require
the working directory to be clean so that you don't accidentally overwrite changes.
git log
The git log command displays committed snapshots. It lets you list the project history, filter it, and
search for specific changes. While git status lets you inspect the working directory and the staging
area, git log only operates on the committed history.
Log output can be customized in several ways, from simply filtering commits to displaying them in a
completely user-defined format. Some of the most common configurations of git log are presented
below.
Usage
git log
Display the entire commit history using the default formatting. If the output takes up more than one
screen, you can use Space to scroll and q to exit.
git log -n <limit>
Limit the number of commits by <limit>. For example, git log -n 3 will display only 3 commits.
git log --stat
Along with the ordinary git log information, include which files were altered and the relative number
of lines that were added or deleted from each of them.
git log -p
Display the patch representing each commit. This shows the full diff of each commit, which is the
most detailed view you can have of your project history.
git log --author="<pattern>"
Search for commits by a particular author. The <pattern> argument can be a plain string or a regular
expression.
git log --grep="<pattern>"
Search for commits with a commit message that matches <pattern>, which can be a plain string or a
regular expression.
git log <since>..<until>
Show only commits that occur between <since> and <until>. Both arguments can be either a commit
ID, a branch name, HEAD, or any other kind of revision reference.
git log <file>
Only display commits that include the specified file. This is an easy way to see the history of a
particular file.
git log --graph --decorate --oneline
A few useful options to consider. The --graph flag draws a text-based graph of the commits
on the left-hand side of the commit messages. --decorate adds the names of branches or tags of the
commits that are shown. --oneline shows the commit information on a single line, making it easier to
browse through commits at a glance.
Discussion
The git log command is Git's basic tool for exploring a repository’s history. It’s what you use when
you need to find a specific version of a project or figure out what changes will be introduced by
merging in a feature branch.
Each entry in the log output starts with a line like the following, where the 40-character string after
commit is an SHA-1 checksum of the commit's contents that serves as its unique ID:
commit 3157ee3718e180a9476bf2e5cab8e3f1e78a73b7
This ID can be used in commands like git log <since>..<until> to refer to specific commits. For
instance, git log 3157e..5ab91 will display everything between the commits with
IDs 3157e and 5ab91. Aside from checksums, branch names (discussed in the Branch Module) and
the HEAD keyword are other common methods for referring to individual commits. HEAD always
refers to the current commit, be it a branch or a specific commit.
The ~ character is useful for making relative references to the parent of a commit. For
example, 3157e~1 refers to the commit before 3157e, and HEAD~3 is the great-grandparent of the
current commit.
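The relative references described above can be tried out with a short sketch. It assumes a throwaway repository with four hypothetical commits named "Commit 1" through "Commit 4".

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

# Build a linear history of four commits
for i in 1 2 3 4; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

git log -1 --format=%s HEAD              # the current commit
git log -1 --format=%s HEAD~1            # its parent
git log -1 --format=%s HEAD~3            # its great-grandparent
```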
The idea behind all of these identification methods is to let you perform actions based on specific
commits. The git log command is typically the starting point for these interactions, as it lets you find
the commits you want to work with.
Example
The Usage section provides many examples of git log, but keep in mind that several options can be
combined into a single command:
git log --author="John Smith" -p hello.py
This will display a full diff of all the changes John Smith has made to the file hello.py.
The .. syntax is a very useful tool for comparing branches. The next example displays a brief overview
of all the commits that are in some-feature that are not in master.
git log --oneline master..some-feature
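A self-contained sketch of the branch comparison is shown below. It assumes a throwaway repository; the commit messages and the "Demo User" author are placeholders, and the initial branch is explicitly named master to match the tutorial.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # name the initial branch master, as the tutorial assumes
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "base" > hello.py
git add hello.py
git commit -q -m "Initial import"

# Two commits that exist only on some-feature
git checkout -q -b some-feature
echo "feature" >> hello.py
git commit -q -a -m "Start feature"
echo "more" >> hello.py
git commit -q -a -m "Finish feature"

# Commits in some-feature that are not in master
git log --oneline master..some-feature

# Options can be combined: filter by author and limit to one file
git log --author="Demo User" --format=%s -- hello.py
```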
git checkout
The git checkout command serves three distinct functions: checking out files, checking out commits,
and checking out branches. In this module, we’re only concerned with the first two configurations.
Checking out a commit makes the entire working directory match that commit. This can be used to
view an old state of your project without altering your current state in any way. Checking out a file
lets you see an old version of that particular file, leaving the rest of your working directory
untouched.
Usage
git checkout master
Return to the master branch. Branches are covered in depth in the next module, but for now, you
can just think of this as a way to get back to the "current" state of the project.
git checkout <commit> <file>
Check out a previous version of a file. This turns the <file> that resides in the working directory into
an exact copy of the one from <commit> and adds it to the staging area.
git checkout <commit>
Update all files in the working directory to match the specified commit. You can use either a commit
hash or a tag as the <commit> argument. This will put you in a detached HEAD state.
Discussion
The whole idea behind any version control system is to store “safe” copies of a project so that you
never have to worry about irreparably breaking your code base. Once you’ve built up a project
history, git checkout is an easy way to “load” any of these saved snapshots onto your development
machine.
Checking out an old commit is a read-only operation. It's impossible to harm your repository while
viewing an old revision. The "current" state of your project remains untouched in the master branch
(see the Branches Module for details). During the normal course of development, the HEAD usually
points to master or some other local branch, but when you check out a previous commit, HEAD no
longer points to a branch: it points directly to a commit. This is called a "detached HEAD" state.
On the other hand, checking out an old file does affect the current state of your repository. You can
re-commit the old version in a new snapshot as you would any other file. So, in effect, this usage of
git checkout serves as a way to revert back to an old version of an individual file.
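Both uses of git checkout discussed above can be sketched in one short session. This assumes a throwaway repository with two hypothetical versions of hello.py, and it names the initial branch master to match the tutorial.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # name the initial branch master, as the tutorial assumes
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "v1" > hello.py
git add hello.py
git commit -q -m "Create hello.py"
echo "v2" > hello.py
git commit -q -a -m "Change hello.py"
old="$(git rev-parse HEAD~1)"

# 1. Check out a commit: the whole working directory matches it (detached HEAD)
git checkout -q "$old"
detached_content="$(cat hello.py)"
git checkout -q master                   # back to the "current" state

# 2. Check out a file: only hello.py changes, and the old version is staged
git checkout -q "$old" -- hello.py
staged="$(git status --porcelain)"

# Decide not to keep the old version after all
git checkout -q HEAD -- hello.py
```

The first checkout leaves master untouched, while the second one produces a staged change on master that could be committed as a new snapshot.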
Example
Let's say your project history includes a commit with the ID a1e8fb5 and the message "Make some
import changes to hello.py". You can use git checkout to view that commit as follows:
git checkout a1e8fb5
This makes your working directory match the exact state of the a1e8fb5 commit. You can look at
files, compile the project, run tests, and even edit files without worrying about losing the current
state of the project. Nothing you do in here will be saved in your repository. To continue developing,
you need to get back to the "current" state of your project:
git checkout master
This assumes that you're developing on the default master branch, which will be thoroughly
discussed in the Branches Module.
Once you're back in the master branch, you can use either git revert or git reset to undo any
undesired changes.
If you're only interested in a single file, you can also use git checkout to fetch an old version of it.
For example, to see only the hello.py file from the old commit, you could use the following command:
git checkout a1e8fb5 hello.py
Remember, unlike checking out a commit, this does affect the current state of your project. The old
file revision will show up as a "Change to be committed," giving you the opportunity to revert back
to the previous version of the file. If you decide you don't want to keep the old version, you can
check out the most recent version with the following:
git checkout HEAD hello.py
Undoing changes
This tutorial provides all of the necessary skills to work with previous revisions of a software project.
First, it shows you how to explore old commits, then it explains the difference between reverting
public commits in the project history vs. resetting unpublished changes on your local machine.
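git revert
The git revert command undoes a committed snapshot. Rather than removing the commit from the
project history, it works out how to undo the changes introduced by the commit and appends a new
commit with the resulting content.
Usage
git revert <commit>
Generate a new commit that undoes all of the changes introduced in <commit>, then apply it to the
current branch.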
Discussion
Reverting should be used when you want to undo the effect of an entire commit. This
can be useful, for example, if you're tracking down a bug and find that it was introduced by a single
commit. Instead of manually going in, fixing it, and committing a new snapshot, you can use git
revert to automatically do all of this for you.
Reverting has two important advantages over resetting. First, it doesn't change the existing project
history, which makes it a "safe" operation for commits that have already been published to a shared
repository. Second, git revert is able to target an individual commit at an arbitrary point in the history,
whereas git reset can only work backwards from the current commit. For example, if you wanted to
undo an old commit with git reset, you would have to remove all of the commits that occurred after
the target commit, remove it, then re-commit all of the subsequent commits. Needless to say, this is
not an elegant undo solution.
Example
The following example is a simple demonstration of git revert. It commits a snapshot, then
immediately undoes it with a revert.
# Edit some tracked files
# Commit a snapshot
git commit -a -m "Make some changes that will be undone"
# Revert the commit we just created
git revert HEAD
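For a version of this demonstration that can be run from start to finish, the sketch below uses a throwaway repository and a hypothetical hello.py. The --no-edit flag is added so that git revert accepts its default message instead of opening an editor.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo 'print("hello")' > hello.py
git add hello.py
git commit -q -m "Create hello.py"

# Commit a snapshot that we will regret
echo 'print("oops")' >> hello.py
git commit -q -a -m "Make some changes that will be undone"

# Undo it with a new commit; the regretted commit stays in the history
git revert --no-edit HEAD

git log --oneline
cat hello.py
```

The history now contains three commits, and hello.py is back to its original contents.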
git reset
If git revert is a "safe" way to undo changes, you can think of git reset as the dangerous method.
When you undo with git reset (and the commits are no longer referenced by any ref or the reflog),
there is no way to retrieve the original copy: it is a permanent undo. Care must be taken when using
this tool, as it's one of the only Git commands that has the potential to lose your work.
Like git checkout, git reset is a versatile command with many configurations. It can be used to
remove committed snapshots, although it’s more often used to undo changes in the staging area and
the working directory. In either case, it should only be used to undo local changes—you should
never reset snapshots that have been shared with other developers.
Usage
git reset <file>
Remove the specified file from the staging area, but leave the working directory unchanged. This
unstages a file without overwriting any changes.
git reset
Reset the staging area to match the most recent commit, but leave the working directory
unchanged. This unstages all files without overwriting any changes, giving you the opportunity to
rebuild the staged snapshot from scratch.
git reset <commit>
Move the current branch tip backward to <commit>, reset the staging area to match, but leave the
working directory alone. All changes made since <commit> will reside in the working directory,
which lets you re-commit the project history using cleaner, more atomic snapshots.
git reset --hard <commit>
Move the current branch tip backward to <commit> and reset both the staging area and the working
directory to match. This obliterates not only the uncommitted changes, but all commits
after <commit>, as well.
Discussion
All of the above invocations are used to remove changes from a repository. Without the --hard
flag, git reset is a way to clean up a repository by unstaging changes or uncommitting a series of
snapshots and re-building them from scratch. The --hard flag comes in handy when an experiment
has gone horribly wrong and you need a clean slate to work with.
Whereas reverting is designed to safely undo a public commit, git reset is designed to
undo local changes. Because of their distinct goals, the two commands are implemented differently:
resetting completely removes a changeset, whereas reverting maintains the original changeset and
uses a new commit to apply the undo.
You should never use git reset <commit> when any snapshots after <commit> have been pushed to a
shared repository, because other developers may already have based their work on them.
As soon as you add new commits after the reset, Git will think that your local history has diverged
from origin/master, and the merge commit required to synchronize your repositories is likely to
confuse and frustrate your team.
The point is, make sure that you're using git reset <commit> on a local experiment that went wrong,
not on published changes. If you need to fix a public commit, the git revert command was
designed specifically for this purpose.
Examples
Unstaging a File
The git reset command is frequently encountered while preparing the staged snapshot. The next
example assumes you have two files called hello.py and main.py that you’ve already added to the
repository.
# Edit both hello.py and main.py, then stage everything
git add .
# Unstage main.py
git reset main.py
As you can see, git reset helps you keep your commits highly-focused by letting you unstage changes
that aren’t related to the next commit.
# Edit `foo.py` again and change some other tracked files, too
The git reset HEAD~2 command moves the current branch backward by two commits, effectively
removing the two snapshots we just created from the project history. Remember that this kind of
reset should only be used on unpublished commits. Never perform the above operation if you’ve
already pushed your commits to a shared repository.
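The removal of local commits described above can be sketched as follows. The example assumes a throwaway, unpublished repository with a hypothetical foo.py. It first uses git reset HEAD~2 without --hard, which keeps the edits in the working directory, and then git reset --hard to discard them.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "one" > foo.py
git add foo.py
git commit -q -m "Initial import"

# Two local commits that we later decide to scrap
echo "two" >> foo.py
git commit -q -a -m "Start developing a crazy feature"
echo "three" >> foo.py
git commit -q -a -m "Continue my crazy feature"

# Move the branch back two commits; the edits survive as unstaged changes
git reset -q HEAD~2
after_mixed="$(git status --porcelain)"

# Throw the leftover edits away as well
git reset -q --hard
cat foo.py
```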
git clean
The git clean command removes untracked files from your working directory. This is really more of a
convenience command, since it's trivial to see which files are untracked with git status and remove
them manually. Like an ordinary rm command, git clean is not undoable, so make sure you really
want to delete the untracked files before you run it.
The git clean command is often executed in conjunction with git reset --hard. Remember that
resetting only affects tracked files, so a separate command is required for cleaning up untracked
ones. Combined, these two commands let you return the working directory to the exact state of a
particular commit.
Usage
git clean -n
Perform a “dry run” of git clean. This will show you which files are going to be removed without
actually doing it.
git clean -f
Remove untracked files from the current directory. The -f (force) flag is required unless
the clean.requireForce configuration option is set to false (it's true by default). This will not remove
untracked folders or files specified by .gitignore.
git clean -f <path>
Remove untracked files, but limit the operation to the specified path.
git clean -df
Remove untracked files and untracked directories from the current directory.
git clean -xf
Remove untracked files from the current directory as well as any files that Git usually ignores.
Discussion
The git reset --hard and git clean -f commands are your best friends after you’ve made some
embarrassing developments in your local repository and want to burn the evidence. Running both of
them will make your working directory match the most recent commit, giving you a clean slate to
work with.
The git clean command can also be useful for cleaning up the working directory after a build. For
example, it can easily remove the .o and .exe binaries generated by a C compiler. This is occasionally
a necessary step before packaging a project for release. The -x option is particularly convenient for
this purpose.
Keep in mind that, along with git reset, git clean is one of the only Git commands that has the
potential to permanently delete your work, so be careful with it. In fact, it's so easy to lose important
additions that the Git maintainers require the -f flag for even the most basic operations. This
prevents you from accidentally deleting everything with a naive git clean call.
Example
The following example obliterates all changes in the working directory, including new files that have
been added. It assumes you've already committed a few snapshots and are experimenting with
some new developments.
# Edit some existing files and add some new files
# Undo changes in tracked files
git reset --hard
# Remove untracked files
git clean -df
Note that, unlike the second example in git reset, the new files were not added to the repository.
As a result, they could not be affected by git reset --hard, and git clean was required to delete them.
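The difference between the dry run, -f, and -df can be observed in a small sketch. It assumes a throwaway repository; scratch.txt and build/out.o are hypothetical untracked files created only for the demonstration.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo 'print("hello")' > hello.py
git add hello.py
git commit -q -m "Create hello.py"

# Create an untracked file and an untracked directory
echo "scratch" > scratch.txt
mkdir build
echo "binary" > build/out.o

git clean -n                             # dry run: reports, deletes nothing
files_after_dry_run="$(ls)"

git clean -f                             # removes untracked files, keeps untracked directories
files_after_f="$(ls)"

git clean -df                            # removes untracked directories as well
files_after_df="$(ls)"
```

Running the dry run first is a cheap way to confirm that nothing valuable is about to be deleted.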
Rewriting history
Intro
Git's main job is to make sure you never lose a committed change. But it's also designed to give you
total control over your development workflow. This includes letting you define exactly what your
project history looks like; however, it also creates the potential to lose commits. Git provides its
history-rewriting commands under the disclaimer that using them may result in lost content.
This tutorial discusses some of the most common reasons for overwriting committed snapshots and
shows you how to avoid the pitfalls of doing so.
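git commit --amend
The git commit --amend command is a convenient way to fix up the most recent commit. It lets you
combine staged changes with the previous commit instead of committing them as an entirely new
snapshot.
Usage
git commit --amend
Combine the staged changes with the previous commit and replace the previous commit with the
resulting snapshot. Running this when there is nothing staged lets you edit the previous commit's
message without altering its snapshot.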
Discussion
Premature commits happen all the time in the course of your everyday development. It’s easy to
forget to stage a file or to format your commit message the wrong way. The --amend flag is a
convenient way to fix these little mistakes.
Amended commits are actually entirely new commits, and the previous commit is removed from the
project history. This has the same consequences as resetting a public snapshot. If you amend a
commit that other developers have based their work on, it will look like the basis of their work
vanished from the project history. This is a confusing situation for developers to be in and it’s
complicated to recover from.
Example
The following example demonstrates a common scenario in Git-based development. We edit a few
files that we would like to commit in a single snapshot, but then we forget to add one of the files the
first time around. Fixing the error is simply a matter of staging the other file and committing with the
--amend flag:
# Edit hello.py and main.py
git add hello.py
git commit
# Realize you forgot to add the changes from main.py
git add main.py
git commit --amend
The editor will be populated with the message from the previous commit. You can change it if
necessary, otherwise just save and close the file as usual. Including the --no-edit flag instead lets you
make the amendment without opening the editor or changing the commit message. The resulting
commit will replace the incomplete one, and it will look like we committed the changes to hello.py and
main.py in a single snapshot.
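A runnable sketch of the same scenario follows. It assumes a throwaway repository with hypothetical contents for hello.py and main.py, and it uses --no-edit so that no editor is opened.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "hello" > hello.py
echo "main" > main.py

# Commit, but forget to stage main.py
git add hello.py
git commit -q -m "Add hello.py and main.py"
incomplete="$(git rev-parse HEAD)"

# Stage the forgotten file and fold it into the previous commit
git add main.py
git commit -q --amend --no-edit

git log --oneline
git ls-tree --name-only HEAD
```

The history still contains a single commit, but its ID differs from the incomplete one, which shows that the amended commit is an entirely new commit.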
git rebase
Rebasing is the process of moving a branch to a new base commit.
From a content perspective, rebasing really is just moving a branch from one commit to another. But
internally, Git accomplishes this by creating new commits and applying them to the specified base.
It's literally rewriting your project history. It's very important to understand that, even though the
branch looks the same, it's composed of entirely new commits.
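Usage
git rebase <base>
Rebase the current branch onto <base>, which can be any kind of commit reference (an ID, a branch
name, a tag, or a relative reference to HEAD).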
Discussion
The primary reason for rebasing is to maintain a linear project history. For example, consider a
situation where the master branch has progressed since you started working on a feature.
You have two options for integrating your feature into the master branch: merging directly or
rebasing and then merging. The former option results in a 3-way merge and a merge commit, while
the latter results in a fast-forward merge and a perfectly linear history, because rebasing onto master
places your feature commits directly on top of the tip of master.
[Figure: rebasing onto master, followed by a fast-forward merge]
Rebasing is a common way to integrate upstream changes into your local repository. Pulling in
upstream changes with git merge results in a superfluous merge commit every time you want to see
how the project has progressed. On the other hand, rebasing is like saying, “I want to base my
changes on what everybody has already done.”
Examples
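The example below combines git rebase with git merge to maintain a linear project history. This is a
quick and easy way to ensure that your merges will be fast-forwarded.
# Start a new feature
git checkout -b new-feature master
# Edit files
git commit -a -m "Start developing a feature"
In the middle of our feature, we realize there's a security hole in our project:
# Create a hotfix branch based off of master
git checkout -b hotfix master
# Edit files
git commit -a -m "Fix security hole"
# Merge back into master
git checkout master
git merge hotfix
git branch -d hotfix
After merging the hotfix into master, we have a forked project history. Instead of a plain git merge,
we'll integrate the feature branch with a rebase to maintain a linear history:
git checkout new-feature
git rebase master
This moves new-feature to the tip of master, which lets us do a standard fast-forward merge from
master:
git checkout master
git merge new-feature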
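The rebase-then-merge workflow can be condensed into a runnable sketch. It assumes a throwaway repository and simplifies the scenario by committing the hotfix directly on master instead of using a separate hotfix branch; the file names and commit messages are placeholders.

```shell
set -e
repo="$(mktemp -d)"                      # throwaway demo repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # name the initial branch master, as the tutorial assumes
git config user.name "Demo User"         # placeholder identity for the demo
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial import"

# Start a feature
git checkout -q -b new-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Start developing a feature"

# Meanwhile, master moves on (stands in for the merged hotfix)
git checkout -q master
echo "fix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Fix security hole"

# Replay the feature on top of the new master tip
git checkout -q new-feature
git rebase -q master

# The merge is now a plain fast-forward; --ff-only refuses anything else
git checkout -q master
git merge -q --ff-only new-feature

git log --oneline
```

The resulting history is linear: three commits and no merge commit.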
git rebase -i
Running git rebase with the -i flag begins an interactive rebasing session. Instead of blindly moving
all of the commits to the new base, interactive rebasing gives you the opportunity to alter individual
commits in the process. This lets you clean up history by removing, splitting, and altering an existing
series of commits. It’s like git commit --amend on steroids.
Usage
git rebase -i <base>
Rebase the current branch onto <base>, but use an interactive rebasing session. This opens an editor
where you can enter commands (described below) for each commit to be rebased. These commands
determine how individual commits will be transferred to the new base. You can also reorder the
commit listing to change the order of the commits themselves.
Discussion
Interactive rebasing gives you complete control over what your project history looks like. This affords
a lot of freedom to developers, as it lets them commit a “messy” history while they’re focused on
writing code, then go back and clean it up after the fact.
Most developers like to use an interactive rebase to polish a feature branch before merging it into
the main code base. This gives them the opportunity to squash insignificant commits, delete
obsolete ones, and make sure everything else is in order before committing to the “official” project
history. To everybody else, it will look like the entire feature was developed in a single series of well-
planned commits.
Examples
The example found below is an interactive adaptation of the one from the non-interactive git rebase
page.
# Start a new feature
git checkout -b new-feature master
# Edit files
git commit -a -m "Start developing a feature"
# Edit more files
git commit -a -m "Fix something from the previous commit"
# Add a commit directly to master
git checkout master
# Edit files
git commit -a -m "Fix security hole"
# Begin an interactive rebasing session
git checkout new-feature
git rebase -i master
The last command will open an editor populated with the two commits from new-feature, along
with some instructions:
pick 32618c4 Start developing a feature
pick 62eed47 Fix something from the previous commit
You can change the pick commands before each commit to determine how it gets moved during the
rebase. In our case, let's just combine the two commits with a squash command:
pick 32618c4 Start developing a feature
squash 62eed47 Fix something from the previous commit
Save and close the editor to begin the rebase. This will open another editor asking for the commit
message for the combined snapshot. After defining the commit message, the rebase is complete and
you should be able to see the squashed commit in your git log output.
Note that the squashed commit has a different ID than either of the original commits, which tells us
that it is indeed a brand new commit.
Finally, you can do a fast-forward merge to integrate the polished feature branch into the main code
base:
git checkout master
git merge new-feature
The real power of interactive rebasing can be seen in the history of the resulting master branch: the
extra 62eed47 commit is nowhere to be found. To everybody else, it looks like you're a brilliant
developer who implemented the new-feature with the perfect number of commits the first time
around. This is how interactive rebasing can keep a project's history clean and meaningful.
git reflog
Git keeps track of updates to the tip of branches using a mechanism called reflog. This allows you to
go back to changesets even though they are not referenced by any branch or tag. After rewriting
history, the reflog contains information about the old state of branches and allows you to go back to
that state if necessary.
Usage
git reflog
Discussion
Every time the current HEAD gets updated (by switching branches, pulling in new changes, rewriting
history or simply by adding new commits) a new entry will be added to the reflog.
Example
To understand git reflog, let's run through an example.
The reflog above shows a checkout from master to the 2.2 branch and back. From there, there's a
hard reset to an older commit. The latest activity is represented at the top labeled HEAD@{0}.
If it turns out that you accidentally moved back, the reflog will contain the commit master pointed
to (0254ea7) before you accidentally dropped 2 commits.
Using git reset, it is then possible to change master back to the commit it pointed to before. This
provides a safety net in case history was accidentally changed.
It's important to note that the reflog only provides a safety net if changes have been committed to
your local repository, and that it only tracks movements of HEAD and branch tips.
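A minimal sketch of this recovery, run in a throwaway repository, might look like the following. The three empty commits and the author identity are assumptions made for the demo; the hashes you see will differ from the ones quoted above.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
for n in 1 2 3; do
    git commit -q --allow-empty -m "Commit $n"
done
tip="$(git rev-parse HEAD)"                  # remember where master pointed

# Accidentally drop the two newest commits.
git reset -q --hard HEAD~2

# The reflog still records the previous position of HEAD.
git reflog

# HEAD@{1} is where HEAD pointed before the reset; move master back to it.
git reset -q --hard 'HEAD@{1}'
```

Because the dropped commits were already recorded locally, the second reset restores master to exactly the commit it pointed to before the mistake.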
Collaborating
Syncing
SVN uses a single central repository to serve as the communication hub for developers, and
collaboration takes place by passing changesets between the developers’ working copies and the
central repository. This is different from Git’s collaboration model, which gives every developer their
own copy of the repository, complete with its own local history and branch structure. Users typically
need to share a series of commits rather than a single changeset. Instead of committing a changeset
from a working copy to the central repository, Git lets you share entire branches between
repositories.
The commands presented below let you manage connections with other repositories, publish local
history by “pushing” branches to other repositories, and see what others have contributed by
“pulling” branches into your local repository.
git remote
The git remote command lets you create, view, and delete connections to other repositories.
Remote connections are more like bookmarks than direct links into other repositories. Instead
of providing real-time access to another repository, they serve as convenient names that can be
used to reference a not-so-convenient URL.
For example, the following diagram shows two remote connections from your repo into the central
repo and another developer’s repo. Instead of referencing them by their full URLs, you can pass the
origin and john shortcuts to other Git commands.
Usage
git remote
List the remote connections you have to other repositories.
git remote -v
Same as the above command, but include the URL of each connection.
git remote add <name> <url>
Create a new connection to a remote repository. After adding a remote, you’ll be able to use
<name> as a convenient shortcut for <url> in other Git commands.
Discussion
Git is designed to give each developer an entirely isolated development environment. This means
that information is not automatically passed back and forth between repositories. Instead,
developers need to manually pull upstream commits into their local repository or manually push
their local commits back up to the central repository. The git remote command is really just an easier
way to pass URLs to these “sharing” commands.
The origin Remote
When you clone a repository with git clone, it automatically creates a remote connection called
origin pointing back to the cloned repository. This is useful for developers creating a local copy of a
central repository, since it provides an easy way to pull upstream changes or publish local commits.
This behavior is also why most Git-based projects call their central repository origin.
Repository URLs
Git supports many ways to reference a remote repository. Two of the easiest ways to access a
remote repo are via the HTTP and the SSH protocols. HTTP is an easy way to allow anonymous, read-
only access to a repository. For example:
http://host/path/to/repo.git
But, it’s generally not possible to push commits to an anonymous HTTP address (you wouldn’t want to
allow anonymous pushes anyway). For read-write access, you should use SSH instead:
ssh://user@host/path/to/repo.git
You’ll need a valid SSH account on the host machine, but other than that, Git supports authenticated
access via SSH out of the box.
Examples
In addition to origin, it’s often convenient to have a connection to your teammates’ repositories. For
example, if your co-worker, John, maintained a publicly accessible repository on
dev.example.com/john.git, you could add a connection as follows:
Having this kind of access to individual developers’ repositories makes it possible to collaborate
outside of the central repository. This can be very useful for small teams working on a large project.
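As a rough illustration, the connection to John's repository could be registered like this. The scratch repository is an assumption made so the snippet runs on its own; the URL is the example address from the text, and adding a remote does not contact it.

```shell
set -e
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Register John's repository under the short name "john".
git remote add john http://dev.example.com/john.git

# List the configured connections together with their URLs.
git remote -v
```

From then on, john can be passed to commands such as git fetch in place of the full URL.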
git fetch
The git fetch command imports commits from a remote repository into your local repo. The resulting
commits are stored as remote branches instead of the normal local branches that we’ve been
working with. This gives you a chance to review changes before integrating them into your copy of
the project.
Usage
git fetch <remote>
Fetch all of the branches from the repository. This also downloads all of the required commits and
files from the other repository.
Discussion
Fetching is what you do when you want to see what everybody else has been working on. Since
fetched content is represented as a remote branch, it has absolutely no effect on your local
development work. This makes fetching a safe way to review commits before integrating them with
your local repository. It’s similar to svn update in that it lets you see how the central history has
progressed, but it doesn’t force you to actually merge the changes into your repository.
Remote Branches
Remote branches are just like local branches, except they represent commits from somebody else’s
repository. You can check out a remote branch just like a local one, but this puts you in a detached
HEAD state (just like checking out an old commit). You can think of them as read-only branches. To
view your remote branches, simply pass the -r flag to the git branch command. Remote branches are
prefixed by the remote they belong to so that you don’t mix them up with local branches. For
example, the next code snippet shows the branches you might see after fetching from the origin
remote:
git branch -r
# origin/master
# origin/develop
# origin/some-feature
Again, you can inspect these branches with the usual git checkout and git log commands. If you
approve the changes a remote branch contains, you can merge it into a local branch with a normal
git merge. So, unlike SVN, synchronizing your local repository with a remote repository is actually a
two-step process: fetch, then merge. The git pull command is a convenient shortcut for this process.
Examples
This example walks through the typical workflow for synchronizing your local repository with the
central repository's master branch.
The commits from these new remote branches are shown as squares instead of circles in the
diagram below. As you can see, git fetch gives you access to the entire branch structure of another
repository.
To see what commits have been added to the upstream master, you can run a git log using
origin/master as a filter.
To approve the changes and merge them into your local master branch, use the following
commands:
The origin/master and master branches now point to the same commit, and you are synchronized
with the upstream developments.
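The fetch, review, merge sequence described in this example can be sketched end to end with local stand-ins. Everything outside the three core commands is an assumption for the demo: a bare repository named central.git plays the central repo, and a second clone named alice plays a teammate who publishes new work.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# A bare repository stands in for the central repo (hypothetical local path).
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master

# A teammate seeds the central repo with one commit.
git clone -q central.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master
cd ..

# Your clone is made first, then the teammate publishes more work.
git clone -q central.git you
git -C alice commit -q --allow-empty -m "Upstream work"
git -C alice push -q origin master

cd you
git fetch -q origin                       # updates origin/master, not master
git log --oneline master..origin/master   # commits you do not have yet
git checkout -q master
git merge -q origin/master                # fast-forwards the local master
```

Until the final merge, the local master is untouched, which is what makes the intermediate git log a safe review step.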
git pull
Merging upstream changes into your local repository is a common task in Git-based collaboration
workflows. We already know how to do this with git fetch followed by git merge, but git pull rolls
this into a single command.
Usage
git pull <remote>
Fetch the specified remote’s copy of the current branch and immediately merge it into the local
copy. This is the same as git fetch <remote> followed by git merge <remote>/<current-branch>.
git pull --rebase <remote>
Same as the above command, but instead of using git merge to integrate the remote branch with
the local one, use git rebase.
Discussion
You can think of git pull as Git's version of svn update. It’s an easy way to synchronize your local
repository with upstream changes. The following diagram explains each step of the pulling process.
You start out thinking your repository is synchronized, but then git fetch reveals that origin's version
of master has progressed since you last checked it. Then git merge immediately integrates the
remote master into the local one:
In fact, pulling with --rebase is such a common workflow that there is a dedicated configuration
option for it:
After running that command, all git pull commands will integrate via git rebase instead of git merge.
Examples
The following example demonstrates how to synchronize with the central repository's master
branch:
This simply moves your local changes onto the top of what everybody else has already contributed.
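A self-contained sketch of pulling with --rebase is shown below. The bare central.git repository, the alice clone, and the file local.txt are assumptions standing in for a real central repository and real work; the branch name is passed explicitly so the command does not depend on tracking configuration.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Local stand-in for the central repository.
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master
cd ..

# You clone, then both you and your teammate add commits.
git clone -q central.git you
git -C alice commit -q --allow-empty -m "Upstream work"
git -C alice push -q origin master
cd you
git config user.name "You"
git config user.email "you@example.com"
echo "local change" > local.txt
git add local.txt
git commit -q -m "Local work"

# Replay the local commit on top of the upstream master instead of merging.
git checkout -q master
git pull -q --rebase origin master
git log --oneline
```

The resulting history is linear: the local commit sits on top of the upstream commit, with no merge commit in between. In current Git releases, setting pull.rebase to true with git config is one way to make this the default for git pull.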
git push
Pushing is how you transfer commits from your local repository to a remote repo. It's the
counterpart to git fetch, but whereas fetching imports commits to local branches, pushing exports
commits to remote branches. This has the potential to overwrite changes, so you need to be careful
how you use it. These issues are discussed below.
Usage
git push <remote> <branch>
Push the specified branch to <remote>, along with all of the necessary commits and internal objects.
This creates a local branch in the destination repository. To prevent you from overwriting commits,
Git won’t let you push when it results in a non-fast-forward merge in the destination repository.
git push <remote> --force
Same as the above command, but force the push even if it results in a non-fast-forward merge. Do
not use the --force flag unless you’re absolutely sure you know what you’re doing.
git push <remote> --tags
Tags are not automatically pushed when you push a branch or use the --all option. The --tags flag
sends all of your local tags to the remote repository.
Discussion
The most common use case for git push is to publish your local changes to a central repository. After
you’ve accumulated several local commits and are ready to share them with the rest of the team,
you (optionally) clean them up with an interactive rebase, then push them to the central repository.
The above diagram shows what happens when your local master has progressed past the central
repository’s master and you publish changes by running git push origin master. Notice how git push
is essentially the same as running git merge master from inside the remote repository.
Force Pushing
Git prevents you from overwriting the central repository’s history by refusing push requests when
they result in a non-fast-forward merge. So, if the remote history has diverged from your history,
you need to pull the remote branch and merge it into your local one, then try pushing again. This is
similar to how SVN makes you synchronize with the central repository via svn update before
committing a changeset.
The --force flag overrides this behavior and makes the remote repository’s branch match your local
one, deleting any upstream changes that may have occurred since you last pulled. The only time you
should ever need to force push is when you realize that the commits you just shared were not quite
right and you fixed them with a git commit --amend or an interactive rebase. However, you must be
absolutely certain that none of your teammates have pulled those commits before using the --force
option.
Examples
The following example describes one of the standard methods for publishing local contributions to
the central repository. First, it makes sure your local master is up-to-date by fetching the central
repository’s copy and rebasing your changes on top of them. The interactive rebase is also a good
opportunity to clean up your commits before sharing them. Then, the git push command sends all of
the commits on your local master to the central repository.
Since we already made sure the local master was up-to-date, this should result in a fast-forward
merge, and git push should not complain about any of the non-fast-forward issues discussed above.
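The publishing routine, including the rejected non-fast-forward push it is designed to avoid, can be sketched as follows. The repository names and the file local.txt are assumptions for the demo, and a plain git rebase is used in place of the interactive rebase so the script runs unattended.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Local stand-in for the central repository.
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master
cd ..

# Your history and the central history diverge.
git clone -q central.git you
git -C alice commit -q --allow-empty -m "Upstream work"
git -C alice push -q origin master
cd you
git config user.name "You"
git config user.email "you@example.com"
echo "local change" > local.txt
git add local.txt
git commit -q -m "Local work"

# Pushing now would not be a fast-forward, so Git refuses it.
if git push -q origin master 2>/dev/null; then rejected=no; else rejected=yes; fi
echo "first push rejected: $rejected"

# Bring the local master up to date, then publish.
git checkout -q master
git fetch -q origin
git rebase -q origin/master
git push -q origin master
```

The second push succeeds because, after the rebase, the central master is an ancestor of the local one.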
Pull requests are a feature that makes it easier for developers to collaborate using Bitbucket. They
provide a user-friendly web interface for discussing proposed changes before integrating them into
the official project.
In their simplest form, pull requests are a mechanism for a developer to notify team members that
they have completed a feature. Once their feature branch is ready, the developer files a pull request
via their Bitbucket account. This lets everybody involved know that they need to review the code
and merge it into the master branch.
But, the pull request is more than just a notification—it’s a dedicated forum for discussing the
proposed feature. If there are any problems with the changes, teammates can post feedback in the
pull request and even tweak the feature by pushing follow-up commits. All of this activity is tracked
directly inside of the pull request.
Compared to other collaboration models, this formal solution for sharing commits makes for a much
more streamlined workflow. SVN and Git can both automatically send notification emails with a
simple script; however, when it comes to discussing changes, developers typically have to rely on
email threads. This can become haphazard, especially when follow-up commits are involved. Pull
requests put all of this functionality into a friendly web interface right next to your Bitbucket
repositories.
How it works
Pull requests can be used in conjunction with the Feature Branch Workflow, the Gitflow Workflow,
or the Forking Workflow. But a pull request requires either two distinct branches or two distinct
repositories, so they will not work with the Centralized Workflow. Using pull requests with each of
these workflows is slightly different, but the general process is as follows:
After receiving the pull request, the project maintainer has to decide what to do. If the feature is
ready to go, they can simply merge it into master and close the pull request. But, if there are
problems with the proposed changes, they can post feedback in the pull request. Follow-up commits
will show up right next to the relevant comments.
It’s also possible to file a pull request for a feature that is incomplete. For example, if a developer is
having trouble implementing a particular requirement, they can file a pull request containing their
work-in-progress. Other developers can then provide suggestions inside of the pull request, or even
fix the problem themselves with additional commits.
In the Gitflow Workflow, features are generally merged into the develop branch, while release and
hotfix branches are merged into both develop and master. Pull requests can be used to formally
manage all of these merges.
The notification aspect of pull requests is particularly useful in the Forking Workflow because the
project maintainer has no way of knowing when another developer has added commits to their
Bitbucket repository.
Since each developer has their own public repository, the pull request’s source repository will differ
from its destination repository. The source repository is the developer’s public repository and the
source branch is the one that contains the proposed changes. If the developer is trying to merge the
feature into the main codebase, then the destination repository is the official project and the
destination branch is master.
Pull requests can also be used to collaborate with other developers outside of the official project.
For example, if a developer was working on a feature with a teammate, they could file a pull request
using the teammate’s Bitbucket repository for the destination instead of the official project. They
would then use the same feature branch for the source and destination branches.
The two developers could discuss and develop the feature inside of the pull request. When they’re
done, one of them would file another pull request asking to merge the feature into the official
master branch. This kind of flexibility makes pull requests a very powerful collaboration tool in the
Forking Workflow.
Example
The example below demonstrates how pull requests can be used in the Forking Workflow. It is
equally applicable to developers working in small teams and to a third-party developer contributing
to an open source project.
In the example, Mary is a developer, and John is the project maintainer. Both of them have their
own public Bitbucket repositories, and John’s contains the official project.
To start working in the project, Mary first needs to fork John’s Bitbucket repository. She can do this
by signing in to Bitbucket, navigating to John’s repository, and clicking the Fork button.
After filling out the name and description for the forked repository, she will have a server-side copy
of the project.
Next, Mary needs to clone the Bitbucket repository that she just forked. This will give her a working
copy of the project on her local machine. She can do this by running the following command:
Keep in mind that git clone automatically creates an origin remote that points back to Mary’s forked
repository.
Mary develops a new feature
Before she starts writing any code, Mary needs to create a new branch for the feature. This branch is
what she will use as the source branch of the pull request.
This makes her changes available to the project maintainer (or any collaborators who might need
access to them).
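Mary's side of the process, from cloning her fork to publishing the feature branch, might look roughly like this. Bitbucket is replaced by local stand-ins so the sketch can run anywhere: john-official, mary-fork.git, the branch name some-feature, and the file feature.txt are all hypothetical.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-ins: John's official project and Mary's server-side fork of it.
git init -q john-official
git -C john-official symbolic-ref HEAD refs/heads/master
git -C john-official -c user.name="John" -c user.email="john@example.com" \
    commit -q --allow-empty -m "Official project"
git clone -q --bare john-official mary-fork.git

# Mary clones her fork; origin now points at the fork, not at John's repo.
git clone -q mary-fork.git mary-local
cd mary-local
git config user.name "Mary"
git config user.email "mary@example.com"

# She develops the feature on a dedicated branch.
git checkout -q -b some-feature
echo "first draft" > feature.txt
git add feature.txt
git commit -q -m "Add first draft of some feature"

# Publishing the branch to her fork makes it available for a pull request.
git push -q origin some-feature
```

Only the feature branch is pushed; master in the fork is left as it was, so the pull request has a clean source branch to compare against the destination.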
After Bitbucket has her feature branch, Mary can create the pull request through her Bitbucket
account by navigating to her forked repository and clicking the Pull request button in the top-right
corner. The resulting form automatically sets Mary’s repository as the source repository, and it asks
her to specify the source branch, the destination repository, and the destination branch.
Mary wants to merge her feature into the main codebase, so the source branch is her feature
branch, the destination repository is John’s public repository, and the destination branch is master.
She’ll also need to provide a title and description for the pull request. If there are other people who
need to approve the code besides John, she can enter them in the Reviewers field.
After she creates the pull request, a notification will be sent to John via his Bitbucket feed and
(optionally) via email.
John can access all of the pull requests people have filed by clicking on the Pull request tab in his
own Bitbucket repository. Clicking on Mary’s pull request will show him a description of the pull
request, the feature’s commit history, and a diff of all the changes it contains.
If he thinks the feature is ready to merge into the project, all he has to do is hit the Merge button to
approve the pull request and merge Mary’s feature into his master branch.
But, for this example, let’s say John found a small bug in Mary’s code, and needs her to fix it before
merging it in. He can either post a comment to the pull request as a whole, or he can select a specific
commit in the feature’s history to comment on.
Mary adds a follow-up commit
If Mary has any questions about the feedback, she can respond inside of the pull request, treating it
as a discussion forum for her feature.
To correct the error, Mary adds another commit to her feature branch and pushes it to her Bitbucket
repository, just like she did the first time around. This commit is automatically added to the original
pull request, and John can review the changes again, right next to his original comment.
Using Branches
This tutorial is a comprehensive introduction to Git branches. First, we’ll take a look at creating
branches, which is like requesting a new project history. Then, we’ll see how git checkout can be
used to select a branch. Finally, we'll learn how git merge can integrate the history of independent
branches.
As you read, remember that Git branches aren't like SVN branches. Whereas SVN branches are only
used to capture the occasional large-scale development effort, Git branches are an integral part of
your everyday workflow.
git branch
A branch represents an independent line of development. Branches serve as an abstraction for the
edit/stage/commit process discussed in Git Basics, the first module of this series. You can think of
them as a way to request a brand new working directory, staging area, and project history. New
commits are recorded in the history for the current branch, which results in a fork in the history of
the project.
The git branch command lets you create, list, rename, and delete branches. It doesn’t let you switch
between branches or put a forked history back together again. For this reason, git branch is tightly
integrated with the git checkout and git merge commands.
Usage
git branch
List all of the branches in your repository.
git branch <branch>
Create a new branch called <branch>. This does not check out the new branch.
git branch -d <branch>
Delete the specified branch. This is a “safe” operation in that Git prevents you from deleting the
branch if it has unmerged changes.
git branch -D <branch>
Force delete the specified branch, even if it has unmerged changes. This is the command to use if
you want to permanently throw away all of the commits associated with a particular line of
development.
git branch -m <branch>
Rename the current branch to <branch>.
Discussion
In Git, branches are a part of your everyday development process. When you want to add a new
feature or fix a bug—no matter how big or how small—you spawn a new branch to encapsulate your
changes. This makes sure that unstable code is never committed to the main code base, and it gives
you the chance to clean up your feature’s history before merging it into the main branch.
For example, the diagram above visualizes a repository with two isolated lines of development, one
for a little feature, and one for a longer-running feature. By developing them in branches, it’s not
only possible to work on both of them in parallel, but it also keeps the main master branch free from
questionable code.
Branch Tips
The implementation behind Git branches is much more lightweight than SVN’s model. Instead of
copying files from directory to directory, Git stores a branch as a reference to a commit. In this
sense, a branch represents the tip of a series of commits—it's not a container for commits. The
history for a branch is extrapolated through the commit relationships.
This has a dramatic impact on Git's merging model. Whereas merges in SVN are done on a file-basis,
Git lets you work on the more abstract level of commits. You can actually see merges in the project
history as a joining of two independent commit histories.
Example
Creating Branches
It's important to understand that branches are just pointers to commits. When you create a branch,
all Git needs to do is create a new pointer—it doesn’t change the repository in any other way. So, if
you start with a repository that looks like this:
Then, you create a branch using the following command:
The repository history remains unchanged. All you get is a new pointer to the current commit:
Note that this only creates the new branch. To start adding commits to it, you need to select it with
git checkout, and then use the standard git add and git commit commands. Please see the git
checkout section of this module for more information.
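To see that a new branch really is just another pointer, the following sketch creates one in a scratch repository and compares the commits both names resolve to. The repository, the identity, and the branch name crazy-experiment are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create the branch; nothing else in the repository changes.
git branch crazy-experiment

# Both names resolve to the same commit, and master is still checked out.
git rev-parse master crazy-experiment
git branch
```

The two hashes printed by git rev-parse are identical, and the asterisk in the git branch listing stays on master.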
Deleting Branches
Once you’ve finished working on a branch and have merged it into the main code base, you’re free
to delete the branch without losing any history:
git branch -d crazy-experiment
However, if the branch hasn’t been merged, the above command will output an error message:
This protects you from losing your reference to those commits, which means you would effectively
lose access to that entire line of development. If you really want to delete the branch (e.g., it’s a
failed experiment), you can use the capital -D flag:
This deletes the branch regardless of its status and without warnings, so use it judiciously.
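The difference between the two delete flags can be sketched with a branch that holds one unmerged commit. The scratch repository and commit message are assumptions for illustration.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Add a commit that exists only on the experimental branch.
git checkout -q -b crazy-experiment
git commit -q --allow-empty -m "Try a risky idea"
git checkout -q master

# The safe delete refuses, because the branch has not been merged.
if git branch -d crazy-experiment 2>/dev/null; then
    safe_delete=ok
else
    safe_delete=refused
fi
echo "git branch -d: $safe_delete"

# The forced delete removes the branch regardless.
git branch -D crazy-experiment
```

After the forced delete, no branch refers to the experimental commit any more.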
git checkout
The git checkout command lets you navigate between the branches created by git branch. Checking
out a branch updates the files in the working directory to match the version stored in that branch,
and it tells Git to record all new commits on that branch. Think of it as a way to select which line of
development you’re working on.
In the previous module, we saw how git checkout can be used to view old commits. Checking out
branches is similar in that the working directory is updated to match the selected branch/revision;
however, new changes are saved in the project history—that is, it’s not a read-only operation.
Usage
git checkout <existing-branch>
Check out the specified branch, which should have already been created with git branch. This makes
<existing-branch> the current branch, and updates the working directory to match.
git checkout -b <new-branch>
Create and check out <new-branch>. The -b option is a convenience flag that tells Git to run git
branch <new-branch> before running git checkout <new-branch>.
git checkout -b <new-branch> <existing-branch>
Same as the above invocation, but base the new branch off of <existing-branch> instead of the
current branch.
Discussion
git checkout works hand-in-hand with git branch. When you want to start a new feature, you create
a branch with git branch, then check it out with git checkout. You can work on multiple features in a
single repository by switching between them with git checkout.
Having a dedicated branch for each new feature is a dramatic shift from the traditional SVN
workflow. It makes it ridiculously easy to try new experiments without the fear of destroying existing
functionality, and it makes it possible to work on many unrelated features at the same time. In
addition, branches also facilitate several collaborative workflows.
Detached HEADs
Now that we’ve seen the three main uses of git checkout we can talk about that “detached HEAD”
we encountered in the previous module.
Remember that the HEAD is Git’s way of referring to the current snapshot. Internally, the git
checkout command simply updates the HEAD to point to either the specified branch or commit.
When it points to a branch, Git doesn't complain, but when you check out a commit, it switches into
a “detached HEAD” state.
This is a warning telling you that everything you’re doing is “detached” from the rest of your
project’s development. If you were to start developing a feature while in a detached HEAD state,
there would be no branch allowing you to get back to it. When you inevitably check out another
branch (e.g., to merge your feature in), there would be no way to reference your feature:
The point is, your development should always take place on a branch—never on a detached HEAD.
This makes sure you always have a reference to your new commits. However, if you’re just looking at
an old commit, it doesn’t really matter if you’re in a detached HEAD state or not.
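The sketch below shows one way to observe a detached HEAD and to keep work that was started on it. It is an illustration under assumptions: the scratch repository, the commit messages, and the branch name rescued-experiment are all made up. Creating a branch before switching away is what gives the new commit a lasting reference.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

# Checking out a commit rather than a branch detaches HEAD.
git checkout -q HEAD~1
if ! git symbolic-ref -q HEAD >/dev/null; then
    echo "HEAD is detached"
fi

# A commit made now is not reachable from any branch...
git commit -q --allow-empty -m "Experiment made on a detached HEAD"

# ...until a branch is created to point at it.
git checkout -q -b rescued-experiment
git branch
```

Had the last step been a checkout of master instead, the experimental commit would have been left without any branch pointing to it.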
Example
The following example demonstrates the basic Git branching process. When you want to start
working on a new feature, you create a dedicated branch and switch into it:
Then, you can commit new snapshots just like we’ve seen in previous modules:
All of these are recorded in new-feature, which is completely isolated from master. You can add as
many commits here as necessary without worrying about what’s going on in the rest of your
branches. When it’s time to get back to the “official” code base, simply check out the master branch:
This shows you the state of the repository before you started your feature. From here, you have the
option to merge in the completed feature, branch off a brand new, unrelated feature, or do some
work with the stable version of your project.
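The branching process in this example can be sketched in a scratch repository as follows. The file feature.txt, the commit messages, and the identity are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create a dedicated branch and switch into it.
git branch new-feature
git checkout -q new-feature

# Record two snapshots on the feature branch.
echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "Started work on a new feature"
echo "step 2" >> feature.txt
git commit -q -a -m "Finished the feature"

# Return to the stable code base; feature.txt leaves the working directory.
git checkout -q master
ls
```

Both feature commits remain on new-feature, while master still shows the repository as it was before the feature was started.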
git merge
Merging is Git's way of putting a forked history back together again. The git merge command lets
you take the independent lines of development created by git branch and integrate them into a
single branch.
Note that all of the commands presented below merge into the current branch. The current branch
will be updated to reflect the merge, but the target branch will be completely unaffected. Again, this
means that git merge is often used in conjunction with git checkout for selecting the current branch
and git branch -d for deleting the obsolete target branch.
Usage
git merge <branch>
Merge the specified branch into the current branch. Git will determine the merge algorithm
automatically (discussed below).
git merge --no-ff <branch>
Merge the specified branch into the current branch, but always generate a merge commit (even if it
was a fast-forward merge). This is useful for documenting all merges that occur in your repository.
Discussion
Once you’ve finished developing a feature in an isolated branch, it's important to be able to get it
back into the main code base. Depending on the structure of your repository, Git uses one of two
distinct algorithms to accomplish this: a fast-forward merge or a 3-way merge.
A fast-forward merge can occur when there is a linear path from the current branch tip to the target
branch. Instead of “actually” merging the branches, all Git has to do to integrate the histories is
move (i.e., “fast forward”) the current branch tip up to the target branch tip. This effectively
combines the histories, since all of the commits reachable from the target branch are now available
through the current one. For example, a fast forward merge of some-feature into master would look
something like the following:
However, a fast-forward merge is not possible if the branches have diverged. When there is not a
linear path to the target branch, Git has no choice but to combine them via a 3-way merge. 3-way
merges use a dedicated commit to tie together the two histories. The nomenclature comes from the
fact that Git uses three commits to generate the merge commit: the two branch tips and their
common ancestor.
While you can use either of these merge strategies, many developers like to use fast-forward merges
(facilitated through rebasing) for small features or bug fixes, while reserving 3-way merges for the
integration of longer-running features. In the latter case, the resulting merge commit serves as a
symbolic joining of the two branches.
Resolving Conflicts
If the two branches you’re trying to merge both changed the same part of the same file, Git won’t be
able to figure out which version to use. When such a situation occurs, it stops right before the merge
commit so that you can resolve the conflicts manually.
The great part of Git's merging process is that it uses the familiar edit/stage/commit workflow to
resolve merge conflicts. When you encounter a merge conflict, running the git status command
shows you which files need to be resolved. For example, if both branches modified the same section
of hello.py, you would see something like the following:
# On branch master
# Unmerged paths:
# (use "git add/rm ..." as appropriate to mark resolution)
#
# both modified: hello.py
#
Then, you can go in and fix up the merge to your liking. When you're ready to finish the merge, all
you have to do is run git add on the conflicted file(s) to tell Git they're resolved. Then, you run a
normal git commit to generate the merge commit. It’s the exact same process as committing an
ordinary snapshot, which means it’s easy for normal developers to manage their own merges.
Note that merge conflicts will only occur in the event of a 3-way merge. It’s not possible to have
conflicting changes in a fast-forward merge.
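The edit/stage/commit cycle for a conflicted merge can be sketched like this. The contents of hello.py and the way the conflict is resolved (overwriting the file with a combined version) are assumptions chosen to keep the script unattended; in practice you would edit the conflict markers by hand.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
echo 'print("hello")' > hello.py
git add hello.py
git commit -q -m "Add hello.py"

# Both branches change the same line of hello.py.
git checkout -q -b new-feature
echo 'print("hello from the feature")' > hello.py
git commit -q -a -m "Change greeting on the feature branch"
git checkout -q master
echo 'print("hello from master")' > hello.py
git commit -q -a -m "Change greeting on master"

# The merge stops before creating the merge commit.
if git merge new-feature >/dev/null 2>&1; then conflict=no; else conflict=yes; fi
git status --short                           # hello.py is listed as unmerged

# Resolve the file, stage it, and commit to finish the merge.
echo 'print("hello from both")' > hello.py
git add hello.py
git commit -q -m "Merge new-feature, resolving hello.py"
```

The final commit has two parents, the tips of master and new-feature, which is what marks it as the merge commit.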
Example
Fast-Forward Merge
Our first example demonstrates a fast-forward merge. The code below creates a new branch, adds
two commits to it, then integrates it into the main line with a fast-forward merge.
This is a common workflow for short-lived topic branches that are used more as an isolated
development environment than as an organizational tool for longer-running features.
Also note that Git should not complain about the git branch -d, since new-feature is now accessible
from the master branch.
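A runnable sketch of the fast-forward scenario is given below. The scratch repository, the file feature.txt, and the commit messages are assumptions for the demo.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # hypothetical identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Start a new feature and add two commits to it.
git checkout -q -b new-feature master
echo "draft" > feature.txt
git add feature.txt
git commit -q -m "Start a feature"
echo "done" >> feature.txt
git commit -q -a -m "Finish a feature"

# master has not moved, so the merge only advances its pointer.
git checkout -q master
git merge new-feature
git branch -d new-feature
```

No merge commit is created: master simply ends up pointing at the last feature commit, and the safe delete succeeds because those commits are now reachable from master.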
3-Way Merge
The next example is very similar, but requires a 3-way merge because master progresses while the
feature is in-progress. This is a common scenario for large features or when several developers are
working on a project simultaneously.
Note that it’s impossible for Git to perform a fast-forward merge, as there is no way to move master
up to new-feature without backtracking.
For most workflows, new-feature would be a much larger feature that took a long time to develop,
which would be why new commits would appear on master in the meantime. If your feature branch
was actually as small as the one in the above example, you would probably be better off rebasing it
onto master and doing a fast-forward merge. This prevents superfluous merge commits from
cluttering up the project history.
Comparing Workflows
The array of possible workflows can make it hard to know where to begin when implementing Git in
the workplace. This page provides a starting point by surveying the most common Git workflows for
enterprise teams.
As you read through, remember that these workflows are designed to be guidelines rather than
concrete rules. We want to show you what’s possible, so you can mix and match aspects from
different workflows to suit your individual needs.
Centralized Workflow
Transitioning to a distributed version control system may seem like a daunting task, but you don’t
have to change your existing workflow to take advantage of Git. Your team can develop projects in
the exact same way as they do with Subversion.
However, using Git to power your development workflow presents a few advantages over SVN. First,
it gives every developer their own local copy of the entire project. This isolated environment lets
each developer work independently of all other changes to a project—they can add commits to their
local repository and completely forget about upstream developments until it's convenient for them.
Second, it gives you access to Git’s robust branching and merging model. Unlike SVN, Git branches
are designed to be a fail-safe mechanism for integrating code and sharing changes between
repositories.
How It Works
Like Subversion, the Centralized Workflow uses a central repository to serve as the single point-of-
entry for all changes to the project. Instead of trunk, the default development branch is called
master and all changes are committed into this branch. This workflow doesn’t require any other
branches besides master.
Developers start by cloning the central repository. In their own local copies of the project, they edit
files and commit changes as they would with SVN; however, these new commits are stored locally—
they’re completely isolated from the central repository. This lets developers defer synchronizing
upstream until they’re at a convenient break point.
To publish changes to the official project, developers “push” their local master branch to the central
repository. This is the equivalent of svn commit, except that it adds all of the local commits that
aren’t already in the central master branch.
Managing Conflicts
The central repository represents the official project, so its commit history should be treated as
sacred and immutable. If a developer’s local commits diverge from the central repository, Git will
refuse to push their changes because this would overwrite official commits.
Before the developer can publish their feature, they need to fetch the updated central commits and
rebase their changes on top of them. This is like saying, “I want to add my changes to what everyone
else has already done.” The result is a perfectly linear history, just like in traditional SVN workflows.
If local changes directly conflict with upstream commits, Git will pause the rebasing process and give
you a chance to manually resolve the conflicts. The nice thing about Git is that it uses the same git
status and git add commands for both generating commits and resolving merge conflicts. This makes
it easy for new developers to manage their own merges. Plus, if they get themselves into trouble, Git
makes it very easy to abort the entire rebase and try again (or go find help).
Example
Let’s take a step-by-step look at how a typical small team would collaborate using this workflow.
We’ll see how two developers, John and Mary, can work on separate features and share their
contributions via a centralized repository.
Central repositories should always be bare repositories (they shouldn’t have a working directory),
which can be created as follows:
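On a server this usually means logging in with ssh user@host and running git init --bare /path/to/repo.git. The sketch below runs the same git init --bare locally, with a temporary directory standing in for the server's filesystem (an assumption made so the example can run anywhere).

```shell
set -eu
server=$(mktemp -d)     # stands in for the server; on a real host, log in with ssh first
git init -q --bare "$server/repo.git"
# A bare repository holds only the Git database, with no working directory
git --git-dir="$server/repo.git" rev-parse --is-bare-repository   # prints true
```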
Be sure to use a valid SSH username for user, the domain or IP address of your server for host, and
the location where you'd like to store your repo for /path/to/repo.git. Note that the .git extension is
conventionally appended to the repository name to indicate that it’s a bare repository.
When you clone a repository, Git automatically adds a shortcut called origin that points back to the
“parent” repository, under the assumption that you'll want to interact with it further on down the
road.
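As a rough illustration, the sketch below clones a local bare repository and lists the remotes that Git set up. The local path central.git is an assumption standing in for a real SSH URL, and john is a hypothetical directory name for John's copy.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Stand-in for the central repository (normally reached over SSH)
git init -q --bare central.git

# Cloning creates a local repository with its own working directory;
# Git warns here that the repository is still empty
git clone -q central.git john

# origin was added automatically and points back to the parent repository
git -C john remote -v
```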
In his local repository, John can develop features using the standard Git commit process: edit, stage,
and commit. If you’re not familiar with the staging area, it’s a way to prepare a commit without
having to include every change in the working directory. This lets you create highly focused commits,
even if you’ve made a lot of local changes.
git status # View the state of the repo
git add <some-file> # Stage a file
git commit # Commit a file
Remember that since these commands create local commits, John can repeat this process as many
times as he wants without worrying about what’s going on in the central repository. This can be very
useful for large features that need to be broken down into simpler, more atomic chunks.
Meanwhile, Mary is working on her own feature in her own local repository using the same
edit/stage/commit process. Like John, she doesn’t care what’s going on in the central repository, and
she really doesn’t care what John is doing in his local repository, since all local repositories are
private.
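Once John's feature is ready, he publishes his local commits with git push origin master. A minimal sketch, assuming a local bare repository named central.git in place of the real server and a hypothetical file john.txt:

```shell
set -eu
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com
export GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com
work=$(mktemp -d)
cd "$work"
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git john 2>/dev/null
cd john
git symbolic-ref HEAD refs/heads/master

# Local work: edit, stage, commit
echo "John's feature" > john.txt
git add john.txt
git commit -q -m "Add John's feature"

# Publish the local master branch to the central repository
git push -q origin master
```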
Remember that origin is the remote connection to the central repository that Git created when John
cloned it. The master argument tells Git to try to make the origin’s master branch look like his local
master branch. Since the central repository hasn’t been updated since John cloned it, this won’t
result in any conflicts and the push will work as expected.
Let’s see what happens if Mary tries to push her feature after John has successfully published his
changes to the central repository. She can use the exact same push command:
But, since her local history has diverged from the central repository, Git will refuse the request with
a rather verbose error message:
This prevents Mary from overwriting official commits. She needs to pull John’s updates into her
repository, integrate them with her local changes, and then try again.
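The rejection can be reproduced locally. In this sketch the repositories central.git, seed, john, and mary are hypothetical local stand-ins, and the script records whether the push was accepted instead of reproducing Git's error text.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: a bare central repository seeded with one commit on master
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q central.git john
git clone -q central.git mary

# John publishes first
(cd john && echo john > john.txt && git add john.txt &&
  git commit -q -m "John's feature" && git push -q origin master)

# Mary commits locally, unaware of John's push
cd mary
echo mary > mary.txt
git add mary.txt
git commit -q -m "Mary's feature"

# Her history has diverged from the central master, so the push is refused
if git push origin master; then
  pushed=yes
else
  pushed=no      # expected: the update is not a fast-forward
fi
echo "push accepted: $pushed"
```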
Mary rebases on top of John’s commit(s)
Mary can use git pull to incorporate upstream changes into her repository. This command is sort of
like svn update—it pulls the entire upstream commit history into Mary’s local repository and tries to
integrate it with her local commits:
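A minimal sketch of Mary's pull, using git pull --rebase origin master. The local repositories central.git, seed, john, and mary and the file names are assumptions; because John and Mary touch different files, the rebase finishes without conflicts.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: seeded central repository, then John publishes a commit
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q central.git john
git clone -q central.git mary
(cd john && echo john > john.txt && git add john.txt &&
  git commit -q -m "John's feature" && git push -q origin master)

cd mary
echo mary > mary.txt
git add mary.txt
git commit -q -m "Mary's feature"

# Fetch the central master and replay Mary's local commit on top of it
git pull -q --rebase origin master
git --no-pager log --oneline   # Mary's commit now sits on top of John's
```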
The --rebase option tells Git to move all of Mary’s commits to the tip of the master branch after
synchronising it with the changes from the central repository.
The pull would still work if you forgot this option, but you would wind up with a superfluous “merge
commit” every time someone needed to synchronize with the central repository. For this workflow,
it’s always better to rebase instead of generating a merge commit.
Mary resolves a merge conflict
Rebasing works by transferring each local commit to the updated master branch one at a time. This
means that you catch merge conflicts on a commit-by-commit basis rather than resolving all of them
in one massive merge commit. This keeps your commits as focused as possible and makes for a clean
project history. In turn, this makes it much easier to figure out where bugs were introduced and, if
necessary, to roll back changes with minimal impact on the project.
If Mary and John are working on unrelated features, it’s unlikely that the rebasing process will
generate conflicts. But if it does, Git will pause the rebase at the current commit and output the
following message, along with some relevant instructions:
# Unmerged paths:
# (use "git reset HEAD <some-file>..." to unstage)
# (use "git add/rm <some-file>..." as appropriate to mark resolution)
#
# both modified: <some-file>
Then, she’ll edit the file(s) to her liking. Once she’s happy with the result, she can stage the file(s) in
the usual fashion and let git rebase do the rest:
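A sketch of that resolution, assuming John and Mary both changed the same line of a hypothetical file base.txt. The repositories are local stand-ins, and GIT_EDITOR=true is set only so that the script never waits for an editor when the rebase continues.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: seeded central repository; John and Mary edit the same line
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q central.git john
git clone -q central.git mary
(cd john && echo john > base.txt &&
  git commit -q -am "John edits base.txt" && git push -q origin master)
cd mary
echo mary > base.txt
git commit -q -am "Mary edits base.txt"

# The rebase pauses on the conflicting commit; the non-zero exit is expected
git pull --rebase origin master || true
git status --short       # base.txt is listed as unmerged

# Edit the file to its final contents, stage it, and let the rebase carry on
echo "john and mary" > base.txt
git add base.txt
GIT_EDITOR=true git rebase --continue
```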
And that’s all there is to it. Git will move on to the next commit and repeat the process for any other
commits that generate conflicts.
If you get to this point and realize you have no idea what’s going on, don’t panic. Just abort the
rebase with git rebase --abort and you’ll be right back to where you started before you ran
git pull --rebase.
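The effect of aborting can be checked with a small experiment. This sketch reuses the same hypothetical conflict on base.txt, records where Mary's master pointed before the pull, and runs git rebase --abort once the rebase has paused.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: seeded central repository; John and Mary edit the same line
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q central.git john
git clone -q central.git mary
(cd john && echo john > base.txt &&
  git commit -q -am "John edits base.txt" && git push -q origin master)
cd mary
echo mary > base.txt
git commit -q -am "Mary edits base.txt"
before=$(git rev-parse HEAD)

# The rebase stops on the conflict; the non-zero exit is expected
git pull --rebase origin master || true

# Give up and restore master to its state before the pull
git rebase --abort
```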
After she’s done synchronizing with the central repository, Mary will be able to publish her changes
successfully:
If your team is comfortable with the Centralized Workflow but wants to streamline its collaboration
efforts, it's definitely worth exploring the benefits of the Feature Branch Workflow. By dedicating an
isolated branch to each feature, it’s possible to initiate in-depth discussions around new additions
before integrating them into the official project.
Feature Branch Workflow
Once you've got the hang of the Centralized Workflow, adding feature branches to your
development process is an easy way to encourage collaboration and streamline communication
between developers.
The core idea behind the Feature Branch Workflow is that all feature development should take place
in a dedicated branch instead of the master branch. This encapsulation makes it easy for multiple
developers to work on a particular feature without disturbing the main codebase. It also means the
master branch will never contain broken code, which is a huge advantage for continuous integration
environments.
Encapsulating feature development also makes it possible to leverage pull requests, which are a way
to initiate discussions around a branch. They give other developers the opportunity to sign off on a
feature before it gets integrated into the official project. Or, if you get stuck in the middle of a
feature, you can open a pull request asking for suggestions from your colleagues. The point is, pull
requests make it incredibly easy for your team to comment on each other’s work.
How It Works
The Feature Branch Workflow still uses a central repository, and master still represents the official
project history. But, instead of committing directly on their local master branch, developers create a
new branch every time they start work on a new feature. Feature branches should have descriptive
names, like animated-menu-items or issue-#1061. The idea is to give a clear, highly-focused purpose
to each branch.
Git makes no technical distinction between the master branch and feature branches, so developers
can edit, stage, and commit changes to a feature branch just as they did in the Centralized
Workflow.
In addition, feature branches can (and should) be pushed to the central repository. This makes it
possible to share a feature with other developers without touching any official code. Since master is
the only “special” branch, storing several feature branches on the central repository doesn’t pose
any problems. Of course, this is also a convenient way to back up everybody’s local commits.
Pull Requests
Aside from isolating feature development, branches make it possible to discuss changes via pull
requests. Once someone completes a feature, they don’t immediately merge it into master. Instead,
they push the feature branch to the central server and file a pull request asking to merge their
additions into master. This gives other developers an opportunity to review the changes before they
become a part of the main codebase.
Code review is a major benefit of pull requests, but they’re actually designed to be a generic way to
talk about code. You can think of pull requests as a discussion dedicated to a particular branch. This
means that they can also be used much earlier in the development process. For example, if a
developer needs help with a particular feature, all they have to do is file a pull request. Interested
parties will be notified automatically, and they’ll be able to see the question right next to the
relevant commits.
Once a pull request is accepted, the actual act of publishing a feature is much the same as in the
Centralized Workflow. First, you need to make sure your local master is synchronized with the
upstream master. Then, you merge the feature branch into master and push the updated master
back to the central repository.
Pull requests can be facilitated by product repository management solutions like Bitbucket Cloud or
Bitbucket Server. View the Bitbucket Server pull requests documentation for an example.
Example
The example included below demonstrates a pull request as a form of code review, but remember
that they can serve many other purposes.
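Before she starts developing a feature, Mary needs an isolated branch to work on. A minimal sketch of creating it with git checkout -b, assuming a throwaway repository with a single commit on master:

```shell
set -eu
export GIT_AUTHOR_NAME=Mary GIT_AUTHOR_EMAIL=mary@example.com
export GIT_COMMITTER_NAME=Mary GIT_COMMITTER_EMAIL=mary@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Create marys-feature from master and switch to it in one step
git checkout -q -b marys-feature master
git branch      # the asterisk marks marys-feature as the current branch
```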
Running git checkout -b marys-feature master checks out a branch called marys-feature based on master; the -b flag tells Git to create the
branch first, and the command fails if a branch with that name already exists. On this branch, Mary edits, stages, and commits changes in the
usual fashion, building up her feature with as many commits as necessary:
git status
git add <some-file>
git commit
Mary adds a few commits to her feature over the course of the morning. Before she leaves for lunch,
it’s a good idea to push her feature branch up to the central repository. This serves as a convenient
backup, but if Mary was collaborating with other developers, this would also give them access to her
initial commits.
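One way to do this is git push -u origin marys-feature, where the -u flag records the remote branch as the upstream so that later pushes need no arguments. The sketch below assumes a local bare repository central.git in place of the real server and a hypothetical file feature.txt.

```shell
set -eu
export GIT_AUTHOR_NAME=Mary GIT_AUTHOR_EMAIL=mary@example.com
export GIT_COMMITTER_NAME=Mary GIT_COMMITTER_EMAIL=mary@example.com
work=$(mktemp -d)
cd "$work"

# Setup: a bare central repository seeded with one commit on master
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)

git clone -q central.git mary
cd mary
git checkout -q -b marys-feature master
echo "first draft" > feature.txt
git add feature.txt
git commit -q -m "Start Mary's feature"

# First push: publish the branch and remember origin/marys-feature as its upstream
git push -q -u origin marys-feature

# Later pushes from this branch can then be a bare "git push"
echo "finished" >> feature.txt
git commit -q -am "Finish Mary's feature"
git push -q
```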
When Mary gets back from lunch, she completes her feature. Before merging it into master, she
needs to file a pull request letting the rest of the team know she's done. But first, she should make
sure the central repository has her most recent commits:
git push
Then, she files the pull request in her Git GUI asking to merge marys-feature into master, and team
members will be notified automatically. The great thing about pull requests is that they show
comments right next to their related commits, so it's easy to ask questions about specific
changesets.
To make the requested changes, Mary uses the exact same process as she did to create the first iteration of her
feature. She edits, stages, commits, and pushes updates to the central repository. All her activity
shows up in the pull request, and Bill, the teammate reviewing it, can still make comments along the way.
If he wanted, Bill could pull marys-feature into his local repository and work on it on his own. Any
commits he added would also show up in the pull request.
Mary publishes her feature
Once Bill is ready to accept the pull request, someone needs to merge the feature into the stable
project (this can be done by either Bill or Mary):
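A sketch of that final merge from Mary's side: synchronize master, merge the feature, and push. The repositories are local stand-ins, and Bill's unrelated commit is an assumption added so that master has moved on and the merge produces a merge commit.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: Bill seeds the central repository; Mary clones it and pushes her feature
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git bill 2>/dev/null
(cd bill && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q -u origin master)
git clone -q central.git mary
cd mary
git checkout -q -b marys-feature master
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add Mary's feature"
git push -q -u origin marys-feature

# Meanwhile master moves on in the central repository
(cd ../bill && echo fix > fix.txt && git add fix.txt &&
  git commit -q -m "Unrelated fix" && git push -q origin master)

# Publish the accepted feature
git checkout -q master
git pull -q                             # synchronize local master with upstream
git merge -q --no-edit marys-feature    # records a merge commit, since master moved
git push -q                             # push the updated master
```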
This process often results in a merge commit. Some developers like this because it’s like a symbolic
joining of the feature with the rest of the code base. But, if you’re partial to a linear history, it’s
possible to rebase the feature onto the tip of master before executing the merge, resulting in a fast-
forward merge.
Some GUIs will automate the pull request acceptance process by running all of these commands when
you click an “Accept” button. If yours doesn’t, it should at least be able to automatically close the
pull request when the feature branch gets merged into master.
The Feature Branch Workflow is an incredibly flexible way to develop a project. The problem is,
sometimes it’s too flexible. For larger teams, it’s often beneficial to assign more specific roles to
different branches. The Gitflow Workflow is a common pattern for managing feature development,
release preparation, and maintenance.
Gitflow Workflow
The Gitflow Workflow section below is derived from Vincent Driessen at nvie.
The Gitflow Workflow defines a strict branching model designed around the project release. While
somewhat more complicated than the Feature Branch Workflow, this provides a robust framework
for managing larger projects.
This workflow doesn’t add any new concepts or commands beyond what’s required for the Feature
Branch Workflow. Instead, it assigns very specific roles to different branches and defines how and
when they should interact. In addition to feature branches, it uses individual branches for preparing,
maintaining, and recording releases. Of course, you also get to leverage all the benefits of the
Feature Branch Workflow: pull requests, isolated experiments, and more efficient collaboration.
How It Works
The Gitflow Workflow still uses a central repository as the communication hub for all developers.
And, as in the other workflows, developers work locally and push branches to the central repo. The
only difference is the branch structure of the project.
Historical Branches
Instead of a single master branch, this workflow uses two branches to record the history of the
project. The master branch stores the official release history, and the develop branch serves as an
integration branch for features. It's also convenient to tag all commits in the master branch with a
version number.
The rest of this workflow revolves around the distinction between these two branches.
Feature Branches
Each new feature should reside in its own branch, which can be pushed to the central repository for
backup/collaboration. But, instead of branching off of master, feature branches use develop as their
parent branch. When a feature is complete, it gets merged back into develop. Features should never
interact directly with master.
Note that combining feature branches with the develop branch gives you, for all intents and purposes, the
Feature Branch Workflow. But, the Gitflow Workflow doesn’t stop there.
Release Branches
Once develop has acquired enough features for a release (or a predetermined release date is
approaching), you fork a release branch off of develop. Creating this branch starts the next release
cycle, so no new features can be added after this point—only bug fixes, documentation generation,
and other release-oriented tasks should go in this branch. Once it's ready to ship, the release gets
merged into master and tagged with a version number. In addition, it should be merged back into
develop, which may have progressed since the release was initiated.
Using a dedicated branch to prepare releases makes it possible for one team to polish the current
release while another team continues working on features for the next release. It also creates well-defined
phases of development (e.g., it’s easy to say, “this week we’re preparing for version 4.0” and
to actually see it in the structure of the repository).
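Common conventions:
Branch off: develop
Merge into: master and develop
Naming convention: release-* (for example, release-0.1)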
Maintenance Branches
Maintenance or “hotfix” branches are used to quickly patch production releases. This is the only
branch that should fork directly off of master. As soon as the fix is complete, it should be merged
into both master and develop (or the current release branch), and master should be tagged with an
updated version number.
Having a dedicated line of development for bug fixes lets your team address issues without
interrupting the rest of the workflow or waiting for the next release cycle. You can think of
maintenance branches as ad hoc release branches that work directly with master.
Example
The example below demonstrates how this workflow can be used to manage a single release cycle.
We’ll assume you have already created a central repository.
This branch will contain the complete history of the project, whereas master will contain an abridged
version. Other developers should now clone the central repository and create a tracking branch for
develop:
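A minimal sketch of both steps, creating develop and then tracking it from a fresh clone. The local repository central.git and the directory names seed and john are assumptions that stand in for the real server and the developers' machines.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: a bare central repository seeded with one commit on master
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)

# One developer creates develop locally and publishes it
(cd seed && git branch develop && git push -q -u origin develop)

# Every other developer clones, then creates a local branch tracking origin/develop
git clone -q central.git john
cd john
git checkout -q -b develop origin/develop
git branch -vv      # lists master and develop with their upstream branches
```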
Everybody now has a local copy of the historical branches set up.
Our example starts with John and Mary working on separate features. They both need to create
separate branches for their respective features. Instead of basing it on master, they should both
base their feature branches on develop:
Both of them add commits to the feature branch in the usual fashion: edit, stage, commit:
git status
git add <some-file>
git commit
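The whole life of a feature branch, from branching off develop to merging back into it, might look like the sketch below. The branch name some-feature, the file names, and the local repository central.git are assumptions for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: central repository with master and develop
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master &&
  git branch develop && git push -q origin develop)
git clone -q central.git mary
cd mary
git checkout -q -b develop origin/develop

# Feature work happens on a branch based on develop, not master
git checkout -q -b some-feature develop
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add some feature"

# Finish the feature
git pull -q origin develop             # pick up anything new on the central develop
git checkout -q develop
git merge -q --no-edit some-feature    # develop has not moved, so this fast-forwards
git push -q
git branch -d some-feature
```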
When a feature is finished, first make sure the develop branch is up to date, then merge the
feature into it. Note that features should never be merged directly into master. Conflicts can be resolved in
the same way as in the Centralized Workflow.
While John is still working on his feature, Mary starts to prepare the first official release of the
project. Like feature development, she uses a new branch to encapsulate the release preparations.
This step is also where the release’s version number is established:
git checkout -b release-0.1 develop
This branch is a place to clean up the release, test everything, update the documentation, and do
any other kind of preparation for the upcoming release. It’s like a feature branch dedicated to
polishing the release.
As soon as Mary creates this branch and pushes it to the central repository, the release is feature-
frozen. Any functionality that isn’t already in develop is postponed until the next release cycle.
Once the release is ready to ship, Mary merges it into master and develop, then deletes the release
branch. It’s important to merge back into develop because critical updates may have been added to
the release branch and they need to be accessible to new features. Again, if Mary’s organization
stresses code review, this would be an ideal place for a pull request.
Release branches act as a buffer between feature development (develop) and public releases
(master). Whenever you merge something into master, you should tag the commit for easy
reference:
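A sketch of shipping and tagging a release. The branch name release-0.1 comes from the example above; the tag name 0.1, its message, the file NOTES.txt, and the local repository central.git are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: central repository with master and develop
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master &&
  git branch develop && git push -q origin develop)
git clone -q central.git mary
cd mary
git checkout -q -b develop origin/develop

# Prepare the release on its own branch
git checkout -q -b release-0.1 develop
echo "release notes" > NOTES.txt
git add NOTES.txt
git commit -q -m "Prepare release 0.1"

# Ship: merge into master, merge back into develop, drop the release branch
git checkout -q master
git merge -q --no-edit release-0.1
git push -q
git checkout -q develop
git merge -q --no-edit release-0.1
git push -q
git branch -d release-0.1

# Tag the commit on master for easy reference and publish the tag
git tag -a 0.1 -m "Initial public release" master
git push -q --tags
```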
Maintenance Branch
After shipping the release, Mary goes back to developing features for the next release with John.
That is, until an end-user opens a ticket complaining about a bug in the current release. To address
the bug, Mary (or John) creates a maintenance branch off of master, fixes the issue with as many
commits as necessary, then merges it directly back into master.
Like release branches, maintenance branches contain important updates that need to be included in
develop, so Mary needs to perform that merge as well. Then, she’s free to delete the branch:
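A sketch of the maintenance cycle, assuming a hypothetical branch name hotfix-001 and local stand-in repositories. Here develop has already moved on with new feature work, so merging the fix into it produces a merge commit.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: central repository with master and develop
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master
git clone -q central.git seed 2>/dev/null
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master &&
  git branch develop && git push -q origin develop)
git clone -q central.git mary
cd mary
git checkout -q -b develop origin/develop
echo "next feature" > next.txt
git add next.txt
git commit -q -m "Work on the next release"
git push -q

# Branch the fix off master, not develop
git checkout -q -b hotfix-001 master
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix the reported bug"

# Merge the fix into master first, then into develop
git checkout -q master
git merge -q --no-edit hotfix-001
git push -q
git checkout -q develop
git merge -q --no-edit hotfix-001
git push -q
git branch -d hotfix-001
```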
By now, you’re hopefully quite comfortable with the Centralized Workflow, the Feature Branch
Workflow, and the Gitflow Workflow. You should also have a solid grasp on the potential of local
repositories, the push/pull pattern, and Git's robust branching and merging model.
Remember that the workflows presented here are merely examples of what’s possible—they are not
hard-and-fast rules for using Git in the workplace. So, don't be afraid to adopt some aspects of a
workflow and disregard others. The goal should always be to make Git work for you, not the other
way around.
Forking Workflow
The Forking Workflow is fundamentally different from the other workflows discussed in this tutorial.
Instead of using a single server-side repository to act as the “central” codebase, it gives every
developer a server-side repository. This means that each contributor has not one, but two Git
repositories: a private local one and a public server-side one.
The main advantage of the Forking Workflow is that contributions can be integrated without the
need for everybody to push to a single central repository. Developers push to their own server-side
repositories, and only the project maintainer can push to the official repository. This allows the
maintainer to accept commits from any developer without giving them write access to the official
codebase.
The result is a distributed workflow that provides a flexible way for large, organic teams (including
untrusted third-parties) to collaborate securely. This also makes it an ideal workflow for open source
projects.
How It Works
As in the other Git workflows, the Forking Workflow begins with an official public repository stored
on a server. But when a new developer wants to start working on the project, they do not directly
clone the official repository.
Instead, they fork the official repository to create a copy of it on the server. This new copy serves as
their personal public repository—no other developers are allowed to push to it, but they can pull
changes from it (we’ll see why this is important in a moment). After they have created their server-
side copy, the developer performs a git clone to get a copy of it onto their local machine. This serves
as their private development environment, just like in the other workflows.
When they're ready to publish a local commit, they push the commit to their own public repository
—not the official one. Then, they file a pull request with the main repository, which lets the project
maintainer know that an update is ready to be integrated. The pull request also serves as a
convenient discussion thread if there are issues with the contributed code.
To integrate the feature into the official codebase, the maintainer pulls the contributor’s changes
into their local repository, checks to make sure it doesn’t break the project, merges it into their local
master branch, then pushes the master branch to the official repository on the server. The
contribution is now part of the project, and other developers should pull from the official repository
to synchronize their local repositories.
Example
As with any Git-based project, the first step is to create an official repository on a server accessible
to all of the team members. Typically, this repository will also serve as the public repository of the
project maintainer.
Public repositories should always be bare, regardless of whether they represent the official codebase
or not. So, the project maintainer should run something like the following to set up the official
repository:
ssh user@host
git init --bare /path/to/repo.git
Bitbucket also provides a convenient GUI alternative to the above commands. This is the exact same
process as setting up a central repository for the other workflows in this tutorial. The maintainer
should also push the existing codebase to this repository, if necessary.
Next, all of the other developers need to fork this official repository. It’s possible to do this by
SSH’ing into the server and running git clone to copy it to another location on the server—yes,
forking is basically just a server-side clone. But again, Bitbucket lets developers fork a repository with
the click of a button.
After this step, every developer should have their own server-side repository. Like the official
repository, all of these should be bare repositories.
Our example assumes the use of Bitbucket to host these repositories. Remember, in this situation,
each developer should have their own Bitbucket account and they should clone their server-side
repository using:
Whereas the other workflows in this tutorial use a single origin remote that points to the central
repository, the Forking Workflow requires two remotes—one for the official repository, and one for
the developer’s personal server-side repository. While you can call these remotes anything you
want, a common convention is to use origin as the remote for your forked repository (this will be
created automatically when you run git clone) and upstream for the official repository.
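The two remotes can be set up as in the sketch below. The local paths official.git and fork.git are assumptions standing in for the hosted repositories, and maintainer and dev are hypothetical directory names.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: the official repository, seeded by the maintainer
git init -q --bare official.git
git --git-dir=official.git symbolic-ref HEAD refs/heads/master
git clone -q official.git maintainer 2>/dev/null
(cd maintainer && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)

# Forking is essentially a server-side clone of the official repository
git clone -q --bare official.git fork.git

# The developer clones their fork; origin is created automatically
git clone -q fork.git dev
cd dev
# upstream has to be added by hand and points at the official repository
git remote add upstream "$work/official.git"
git remote -v
```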
You’ll need to create the upstream remote yourself with git remote add. This will let you easily
keep your local repository up-to-date as the official project progresses. Note that if your upstream
repository has authentication enabled (i.e., it’s not open source), you’ll need to supply a username,
like so:
This requires users to supply a valid password before cloning or pulling from the official codebase.
Since developers should be working in a dedicated feature branch, pulling upstream changes into
their local master should generally result in a fast-forward merge.
Once a developer is ready to share their new feature, they need to do two things. First, they have to
make their contribution accessible to other developers by pushing it to their public repository. Their
origin remote should already be set up, so all they should have to do is the following:
git push origin feature-branch
This diverges from the other workflows in that the origin remote points to the developer’s personal
server-side repository, not the main codebase.
Second, they need to notify the project maintainer that they want to merge their feature into the
official codebase. Bitbucket provides a “Pull request” button that leads to a form asking you to
specify which branch you want to merge into the official repository. Typically, you’ll want to
integrate your feature branch into the upstream remote’s master branch.
When the project maintainer receives the pull request, their job is to decide whether or not to
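integrate it into the official codebase. They can do this in one of two ways:
Inspect the code directly in the pull request
Pull the code into their local repository and merge it manually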
The first option is simpler, as it lets the maintainer view a diff of the changes, comment on it, and
perform the merge via a graphical user interface. However, the second option is necessary if the pull
request results in a merge conflict. In this case, the maintainer needs to fetch the feature branch
from the developer’s server-side repository, merge it into their local master branch, and resolve any
conflicts:
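A sketch of the maintainer's local merge. The repositories official.git and fork.git are local stand-ins for the hosted ones, and feature-branch matches the branch name used in the push example above; in this sketch the merge applies cleanly, so there are no conflicts to resolve.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: official repository, a contributor's fork, and a feature pushed to the fork
git init -q --bare official.git
git --git-dir=official.git symbolic-ref HEAD refs/heads/master
git clone -q official.git maintainer 2>/dev/null
(cd maintainer && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q --bare official.git fork.git
git clone -q fork.git contributor
(cd contributor && git checkout -q -b feature-branch master &&
  echo feature > feature.txt && git add feature.txt &&
  git commit -q -m "Add feature" && git push -q origin feature-branch)

cd maintainer
# Fetch the contributor's branch straight from their public repository
git fetch -q "$work/fork.git" feature-branch
# Inspect the incoming commits before merging
git --no-pager log --oneline master..FETCH_HEAD
git checkout -q master
git merge -q --no-edit FETCH_HEAD
```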
Once the changes are integrated into their local master, the maintainer needs to push it to the
official repository on the server so that other developers can access it:
Since the main codebase has moved forward, other developers should synchronize with the official
repository:
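Both halves of this step, the maintainer's push and a developer's pull from upstream, can be sketched as follows. The repositories are local stand-ins, and the maintainer's commit here is an assumption that represents the newly integrated contribution.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Setup: official repository, one fork, and a developer clone with an upstream remote
git init -q --bare official.git
git --git-dir=official.git symbolic-ref HEAD refs/heads/master
git clone -q official.git maintainer 2>/dev/null
(cd maintainer && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt &&
  git commit -q -m "Initial commit" && git push -q origin master)
git clone -q --bare official.git fork.git
git clone -q fork.git dev
git -C dev remote add upstream "$work/official.git"

# The maintainer pushes the integrated work to the official repository
(cd maintainer && echo merged > merged.txt && git add merged.txt &&
  git commit -q -m "Integrate contribution" && git push -q origin master)

# Other developers synchronize their local master with the official repository
cd dev
git pull -q upstream master
```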
This article explained how a contribution flows from one developer into the official master branch,
but the same methodology can be used to integrate a contribution into any repository. For example,
if one part of your team is collaborating on a particular feature, they can share changes amongst
themselves in the exact same manner—without touching the main repository.
This makes the Forking Workflow a very powerful tool for loosely-knit teams. Any developer can
easily share changes with any other developer, and any branch can be efficiently merged into the
official codebase.
Migrating to Git
SVN to Git - prepping for the migration
In Why Git?, we discussed the many ways that Git can help your team become more agile. Once
you’ve decided to make the switch, your next step is to figure out how to migrate your existing
development workflow to Git.
This article explains some of the biggest changes you’ll encounter while transitioning your team from
SVN to Git. The most important thing to remember during the migration process is that Git is not
SVN. To realize the full potential of Git, try your best to open up to new ways of thinking about
version control.
For administrators
Adopting Git can take anywhere from a few days to several months depending on the size of your
team. This section addresses some of the main concerns for engineering managers when it comes to
training employees on Git and migrating repositories from SVN to Git.
Git once had a reputation for a steep learning curve. However, the Git maintainers have been steadily
releasing new improvements like sensible defaults and contextual help messages that have made
the on-boarding process a lot more pleasant.
Atlassian offers a comprehensive series of self-paced Git tutorials, as well as webinars and live
training sessions. Together, these should provide all the training options your team needs to get
started with Git. To get you going, here is a list of some basic Git commands:
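As a small, non-exhaustive sample, the script below walks through a few everyday commands in a throwaway repository. The user name, email address, file name, and branch name feature_x are placeholders chosen for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q                             # create a new local repository
git config user.name "Sam Smith"        # tell Git who you are (this repository only)
git config user.email sam@example.com
echo "hello" > README
git status --short                      # list changed and untracked files
git add README                          # stage a file
git commit -q -m "Commit message"       # commit the staged changes
git branch feature_x                    # create a branch
git checkout -q feature_x               # switch to it
git --no-pager log --oneline            # view the history
```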
There are a number of tools available to help you migrate your existing projects from SVN to Git, but
before you decide what tools to use, you need to figure out how you want to migrate your code.
Your options are:
Migrate your entire codebase to Git and stop using SVN altogether.
Don’t migrate any existing projects to Git, but use Git for all new projects.
Migrate some of your projects to Git while continuing to use SVN for other projects.
Use SVN and Git simultaneously on the same projects.
A complete transition to Git limits the complexity in your development workflow, so this is the
preferred option. However, this isn’t always possible in larger companies with dozens of
development teams and potentially hundreds of projects. In these situations, a hybrid approach is a
safer option.
Your choice of migration tool(s) depends largely on which of the above strategies you choose. Some
of the most common SVN-to-Git migration tools are introduced below.
We’ve provided a complete technical walkthrough for using Atlassian’s migration scripts to convert
your entire codebase to a collection of Git repositories. This walkthrough explains everything from
extracting SVN author information to re-organizing non-standard SVN repository structures.
SVN Mirror for Stash (now Bitbucket Server) plugin
SVN Mirror for Stash is a Bitbucket Server plugin that lets you easily maintain a hybrid codebase that
works with both SVN and Git. Unlike Atlassian’s migration scripts, SVN Mirror for Stash lets you use
Git and SVN simultaneously on the same project for as long as you like.
This compromise solution is a great option for larger companies. It enables incremental Git adoption
by letting different teams migrate workflows at their convenience.
Git-SVN
The git svn tool that comes with Git serves as an interface between a local Git repository and a
remote SVN repository. It lets developers write code and create commits locally with Git, then push
them up to a central SVN repository with svn commit-style behavior.
git svn is a good option if you’re not sure about making the switch to Git and want to let some of
your developers explore Git commands without committing to a full-on migration. It’s also perfect
for the training phase—instead of an abrupt transition, your team can ease into it with local Git
commands before worrying about collaboration workflows.
Note that git svn should only be a temporary phase of your migration process. Since it still depends
on SVN for the “backend,” it can’t leverage the more powerful Git features like branching or
advanced collaboration workflows.
Rollout Strategies
Migrating your codebase is only one aspect of adopting Git. You also need to consider how to
introduce Git to the people behind that codebase. External consultants, internal Git champions, and
pilot teams are the three main strategies for moving your development team over to Git.
On the other hand, designing and implementing a Git workflow on your own is a great way for your
team to understand the inner workings of their new development process. This avoids the risk of
your team being left in the dark when your consultant leaves.
Internal Git Champions
A Git champion is a developer inside of your company who’s excited to start using Git. Leveraging a
Git champion is a good option for companies with a strong developer culture and eager
programmers comfortable being early adopters. The idea is to enable one of your engineers to
become a Git expert so they can design a Git workflow tailored to your company and serve as an
internal consultant when it’s time to transition the rest of the team to Git.
Compared to an external consultant, this has the advantage of keeping your Git expertise in-house.
However, it requires a larger time investment to train that Git champion, and it runs the risk of
choosing the wrong Git workflow or implementing it incorrectly.
Pilot Teams
The third option for transitioning to Git is to test it out on a pilot team. This works best if you have a
small team working on a relatively isolated project. This could work even better by combining
external consultants with internal Git champions in the pilot team for a winning combo.
This has the advantage of requiring buy-in from your entire team, and also limits the risk of choosing
the wrong workflow, since it gets input from the entire team while designing the new development
process. In other words, it ensures any missing pieces are caught sooner than when a consultant or
champion designs the new workflow on their own.
On the other hand, using a pilot team means more initial training and setup time: instead of one
developer figuring out a new workflow, there’s a whole team that could potentially be temporarily
less productive while they’re getting comfortable with their new workflow. However, this short-term
pain is worth the long-term gain.
In SVN, you typically store your entire codebase in a single central repository, then limit access to
different teams or individuals by folder. In Git, this is not possible: developers must retrieve the
entire repository to work with it. You typically cannot retrieve a subset of the repository, as you can
with SVN. Permissions can only be granted to entire Git repositories.
This means you have to split up your large, monolithic SVN repository into several small Git
repositories. We actually experienced this first hand here at Atlassian when our JIRA development
team migrated to Git. All of our JIRA plugins used to be stored in a single SVN repository, but after
the migration, each plugin ended up in its own repository.
Keep in mind that Git was designed to securely integrate code contributions from thousands of
independent Linux developers, so it definitely provides some way to set up whatever kind of access
control your team needs. This may, however, require a fresh look at your build cycle.
If you’re concerned about maintaining dependencies between your new collection of Git
repositories, you may find a dependency management layer on top of Git helpful. A dependency
management layer will help with build times because as a project grows, you need “caching” in
order to speed up your build time. A list of recommended dependency management layer tools for
every technology stack can be found in this helpful article: “Git and project dependencies”.
For developers
Instead of checking out an SVN repository with svn checkout and getting a working copy, you clone
the entire Git repository to your local machine with git clone.
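To make the difference concrete, here is a small sketch that clones a repository and inspects its history locally. The shared repository is a local stand-in created just for the demonstration; in practice you would pass a URL on your server.

```shell
demo=$(mktemp -d)
cd "$demo"

# Stand-in for a shared repository (placeholder; normally a URL on a server).
git init -q shared
git -C shared config user.name "Demo User"
git -C shared config user.email "demo@example.com"
echo "v1" > shared/app.txt
git -C shared add app.txt
git -C shared commit -q -m "First commit"
echo "v2" > shared/app.txt
git -C shared commit -q -a -m "Second commit"

# SVN: svn checkout gives you a working copy of one revision.
# Git: git clone copies the whole repository, history included.
git clone -q shared my-clone
cd my-clone

# The full history is available locally, with no server round-trip.
git log --oneline
```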
The other big adjustment for SVN users is the notion of “local” and “remote” repositories. Local
repositories reside on your local machine, and they are where you do all of your software
development. All other repositories are referred to as remote repositories. The main purpose of a
remote repository is to make your code accessible to the rest of the team, so no active
development takes place in them.
Collaboration occurs by moving branches between repositories with git push, git fetch, or git
pull. Sharing is commonly done on the branch level in Git but can be done on the commit level,
similar to SVN. But in Git, a commit represents the entire state of the whole project rather than a
set of file modifications. Since you can use branches in both Git and SVN, the important distinction
here is that you can commit locally with Git, without sharing your work. This enables you to
experiment more freely and work more effectively offline, and it speeds up almost all version
control related commands.
However, it’s important to understand that a remote repository is not a direct link into somebody
else’s repository. It’s simply a bookmark that saves you from having to re-type the full URL each
time you interact with a remote repository. Until you explicitly pull or push a branch to a remote
repository, you’re working in an isolated environment.
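The "bookmark" idea can be checked directly. In this sketch a bare repository on the local disk plays the role of the server (an assumption made so the example runs anywhere), and the remote turns out to be nothing more than a named URL in the local configuration.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q --bare team.git            # stand-in for a repository on a server
git init -q local
cd local

# A remote is just a name that stores a URL (here, a local path).
git remote add origin "$demo/team.git"
git remote -v                          # lists the name with its fetch and push URLs
git config --get remote.origin.url     # the bookmark is a plain configuration entry

# Nothing has been transferred: the remote repository has no branches
# until someone explicitly pushes to it, so this prints nothing.
git ls-remote origin
```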
Git’s basic development workflow is much different. Instead of being bound to a single line of
development (e.g., trunk/), life revolves around branching and merging.
When you want to start working on anything in Git, you create and check out a new branch with git
checkout -b <branch-name>. This gives you a dedicated line of development where you can write
code without worrying about affecting anyone else on your team. If you break something beyond
repair, you simply throw the branch away with git branch -D <branch-name> (the lowercase -d form
refuses to delete a branch whose commits have not been merged). If you build something
useful, you file a pull request asking to merge it into the master branch.
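The sketch below walks through that cycle in a throwaway repository: create a branch, commit a failed experiment, then discard it. Branch and file names are made up for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Start a dedicated line of development.
git checkout -q -b experiment
echo "broken idea" >> app.txt
git commit -q -a -m "Try something risky"

# The experiment did not work out: switch back and discard it.
git checkout -q master
git branch -d experiment || echo "-d refused: the branch has unmerged commits"
git branch -D experiment                  # force-delete the unmerged branch

git branch --list                         # only master remains
cat app.txt                               # master was never affected
```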
A centralized workflow provides the closest match to common SVN processes, so it's a good option
to get started.
Building on that idea, using a feature branch workflow lets developers keep their work in progress
isolated and important shared branches protected. Feature branches also form the basis for
managing changes via pull requests.
A Gitflow workflow is a more formal, structured extension to feature branching, making it a great
option for larger teams with well-defined release cycles.
Finally, consider a forking workflow if you need maximum isolation and control over changes, or
have many developers contributing to one repository.
But, if you really want to get the most out of Git as a professional team, you should consider the
feature branch workflow. This is a truly distributed workflow that is highly secure, incredibly
scalable, and quintessentially agile.
Conclusion
Transitioning your team to Git can be a daunting task, but it doesn’t have to be. This article
introduced some of the common options for migrating your existing codebase, rolling out Git to your
development teams, and dealing with security and permissions. We also introduced the biggest
challenges that your developers should be prepared for during the migration process.
Hopefully, you now have a solid foundation for introducing distributed development to your
company, regardless of its size or current development practices.
Migrate to Git from SVN
We’ve broken down the SVN-to-Git migration process into 5 simple steps: prepare, convert,
synchronize, share, and migrate.
The prepare, convert, and synchronize steps take an SVN commit history and turn it into a Git
repository. The best way to manage these first 3 steps is to designate one of your team members as
the migration lead (if you’re reading this guide, that person is probably you). All 3 of these steps
should be performed on the migration lead’s local computer.
After the synchronize phase, the migration lead should have no trouble keeping a local Git repository
up-to-date with an SVN counterpart. The migration lead can then share this local Git repository
with other developers by pushing it to Bitbucket, a Git hosting service.
Once it’s on Bitbucket, other developers can clone the converted Git repository to their local
machines, explore its history with Git commands, and begin integrating it into their build processes.
However, we advocate a one-way synchronization from SVN to Git until your team is ready to switch
to a pure Git workflow. This means that everybody should treat their Git repository as read-only and
continue committing to the original SVN repository. The only changes to the Git repository should
happen when the migration lead synchronizes it and pushes the updates to Bitbucket.
This provides a clear-cut transition period where your team can get comfortable with Git without
interrupting your existing SVN-based workflow. Once you’re confident that your developers are
ready to make the switch, the final step in the migration process is to freeze your SVN repository and
begin committing with Git instead.
This switch should be a very natural process, as the entire Git workflow is already in place and your
developers have had all the time they need to get comfortable with it. By this point, you have
successfully migrated your project from SVN to Git.
Prepare
The first step to migrating a project from SVN to Git-based version control is to prepare the
migration lead’s local machine. In this phase, you’ll download a convenient utility script, mount a
case-sensitive filesystem (if necessary), and map author information from SVN to Git.
All of the following steps should be performed on the migration lead’s local machine.
Once you’ve downloaded svn-migration-scripts.jar, it’s a good idea to verify that you have the Java
Runtime Environment, Git, Subversion, and the git-svn utility installed. Open a command prompt
and run the following (this assumes the JAR file was saved to your home directory):
java -jar ~/svn-migration-scripts.jar verify
This will display an error message in the console if you don’t have the necessary programs for the
migration process. Make sure that any missing software is installed before moving on.
If you get a warning about being unable to determine a version, run export LANG=C (*nix) or SET
LANG=C (Windows) and try again.
If you’re performing the migration on a computer running OS X, you’ll also see the following
warning:
You appear to be running on a case-insensitive file-system. This is unsupported, and can result in
data loss.
If you’re not running OS X, all you need to do is create a directory on your local machine called
~/GitMigration. This is where you will perform the conversion. After that, you can skip to the next
section.
If you are running OS X, you need to mount a case-sensitive disk image with the create-disk-image
script included in svn-migration-scripts.jar. It takes two parameters:
1. The size of the disk image to create in gigabytes. You can use any size you like, as long as it’s
bigger than the SVN repository that you’re trying to migrate.
2. The name of the disk image. This guide uses GitMigration for this value.
For example, the following command creates a 5GB disk image called GitMigration:
java -jar ~/svn-migration-scripts.jar create-disk-image 5 GitMigration
The disk image is mounted in your home directory, so you should now see a directory called
~/GitMigration on your local machine. This serves as a virtual case-sensitive filesystem, and it’s
where you’ll store the converted Git repository.
SVN only records the username of the author for each revision. Git, however, stores the full name
and email address of the author. This means that you need to create a text file that maps SVN
usernames to their Git counterparts.
Run the following commands to automatically generate this text file:
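cd ~/GitMigration
java -jar ~/svn-migration-scripts.jar authors <svn-repo> > authors.txt
Be sure to replace <svn-repo> with the URI of the SVN repository that you want to migrate. For
example, if your repository resided at https://svn.example.com, you would run the following:
java -jar ~/svn-migration-scripts.jar authors https://svn.example.com > authors.txt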
This creates a text file called authors.txt that contains the username of every author in the SVN
repository along with a generated name and email address. It should look something like this:
Change the portion to the right of the equal sign to the full name and email address of the
corresponding user. For example, you might change the above authors to:
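As an illustration of the file format described above, the sketch below writes a sample authors.txt and then fills in real names. The usernames, names, and email domains are hypothetical, and editing the file by hand works just as well; sed is only used to keep the example scriptable.

```shell
cd "$(mktemp -d)"

# What a generated file might look like (usernames and domain are placeholders).
cat > authors.txt <<'EOF'
j.doe = j.doe <j.doe@mycompany.com>
m.smith = m.smith <m.smith@mycompany.com>
EOF

# Replace everything to the right of the equal sign with the person's
# full name and email address.
sed -e 's|^j\.doe = .*|j.doe = John Doe <john.doe@example.com>|' \
    -e 's|^m\.smith = .*|m.smith = Mary Smith <mary.smith@example.com>|' \
    authors.txt > authors.tmp && mv authors.tmp authors.txt

cat authors.txt
```

The left-hand side must stay identical to the SVN username, since that is the key used to look up each author during the conversion.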
Summary
Now that you have your migration scripts, disk image (OS X only), and author information, you’re
ready to import your SVN history into a new Git repository. The next phase explains how this
conversion works.
Convert
The next step in the migration from SVN to Git is to import the contents of the SVN repository into a
new Git repository. We’ll do this with the git svn utility that is included with most Git distributions,
then we’ll clean up the results with svn-migration-scripts.jar.
Beware that the conversion process can take a significant amount of time for larger repositories,
even when cloning from a local SVN repository. As a benchmark, converting a 400MB repository with
33,000 commits on master took around 12 hours to complete.
For reasonably sized repositories, the following steps should be run on the migration lead’s local
computer. However, if you have a very large SVN repository and want to cut down on the conversion
time, you can run git svn clone on the SVN server instead of on the migration lead’s local machine.
This will avoid the overhead of cloning via a network connection.
If your SVN project uses the standard /trunk, /branches, and /tags directory layout, you can use the
--stdlayout option instead of manually specifying the repository’s structure. Run the following
command in the ~/GitMigration directory:
git svn clone --stdlayout --authors-file=authors.txt <svn-repo>/<project> <git-repo-name>
Where <svn-repo> is the URI of the SVN repository that you want to migrate, <project> is the
name of the project that you want to import, and <git-repo-name> is the directory name of the new
Git repository.
For example, if you were migrating a project called Confluence, hosted on https://svn.atlassian.com,
you might run the following:
git svn clone --stdlayout --authors-file=authors.txt https://svn.atlassian.com/Confluence ConfluenceAsGit
If your SVN repository doesn’t have a standard layout, you need to provide the locations of your
trunk, branches, and tags using the --trunk, --branches, and --tags command line options. For
example, if you have branches stored in both the /branches and /bugfixes directories,
you would use the following command:
git svn clone --trunk=/trunk --branches=/branches --branches=/bugfixes --tags=/tags \
  --authors-file=authors.txt <svn-repo>/<project> <git-repo-name>
Branches and tags are not imported into the new Git repository as you might expect. You won’t find
any of your SVN branches in the git branch output, nor will you find any of your SVN tags in the git
tag output. But, if you run git branch -r, you’ll find all of the branches and tags from your SVN
repository. The git svn clone command imports your SVN branches as remote branches and imports
your SVN tags as remote branches prefixed with tags/.
This behavior makes certain two-way synchronization procedures easier, but it can be very confusing
when trying to make a one-way migration to Git. That’s why our next step will be to convert these
remote branches to local branches and actual Git tags.
If you’re following this migration guide, this isn’t a problem, as it advocates a one-way sync from
SVN to Git (the Git repository is considered read-only until after the Migrate step). However, if
you’re planning on committing to the Git repository and the SVN repository during the migration
process, you should not perform the following commands. This is an advanced task, and is not
recommended for the typical project.
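To see what can be cleaned up, run the following command in ~/GitMigration/<git-repo-name>:
java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git
This will output all of the changes the script wants to make, but it won’t actually make any of them.
To execute these changes, you need to use the --force option, like so:
java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git --force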
You should now see all of your SVN branches in the git branch output, along with your SVN tags in
the git tag output. This means that you’ve successfully converted your SVN project to a Git
repository.
Summary
In this step, you turned an SVN repository into a new Git repository with the git svn clone command,
then cleaned up the structure of the resulting repository with svn-migration-scripts.jar. In the next
step, you’ll learn how to keep this new Git repo in sync with any new commits to the SVN repository.
This will be a similar process to the conversion, but there are some important workflow
considerations during this transition period.
Synchronize
It’s very easy to synchronize your Git repository with new commits in the original SVN repository.
This makes for a comfortable transition period in the migration process where you can continue to
use your existing SVN workflow, but begin to experiment with Git.
It’s possible to synchronize in both directions. However, we recommend a one-way sync from SVN to
Git. During your transition period, you should only commit to your SVN repository, not your Git repo.
Once you’re confident that your team is ready to make the switch, you can complete the migration
process and begin to commit changes with Git instead of SVN.
In the meantime, you should continue to commit to your SVN repository and synchronize your Git
repository whenever necessary. This process is similar to the Convert phase, but since you’re only
dealing with incremental changes, it should be much more efficient.
If new developers have committed to the SVN repository since the last sync (or the initial clone), the
authors file needs to be updated accordingly. You can do this by manually appending new users to
authors.txt, or you can use the --authors-prog option, as discussed in the next section.
For one-off synchronizations it’s often easier to directly edit the authors file; however, the
--authors-prog option is preferred if you’re performing unsupervised syncs (i.e., in a scheduled task).
If you want to use the --authors-prog option, create a file called authors.sh in ~/GitMigration.
Add the following line to authors.sh to return a dummy Git name and email for any authors that
aren’t found in authors.txt:
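A minimal sketch of such a script follows. It relies on git svn passing the SVN username as the first argument and reading a "Name <email>" line from standard output; the example.com domain and the j.doe username are placeholders.

```shell
cd "$(mktemp -d)"

# Create the helper script. git svn runs it with the SVN username as $1
# and expects "Name <email>" on standard output.
cat > authors.sh <<'EOF'
#!/bin/sh
echo "$1 <$1@example.com>"
EOF
chmod +x authors.sh

# Try it with a made-up username.
./authors.sh j.doe
```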
Again, this will only generate a dummy name and email based on the SVN username, so feel free to
alter it if you can provide a more meaningful mapping.
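Once the authors file is up to date, import the new SVN commits with git svn fetch. This is similar
to the git svn clone command from the previous phase in that it only updates the Git
repository’s remote branches; the local branches will not reflect any of the updates yet. Your
remote branches, on the other hand, should exactly match your SVN repo’s history.
If you’re using the --authors-prog option, you need to include it in the fetch command, like so:
git svn fetch --authors-prog=authors.sh
To apply the fetched commits to your local branches, run the sync-rebase command from
svn-migration-scripts.jar:
java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar sync-rebase
This will rebase the fetched commits onto your local branches so that they match their remote
counterparts. You should now be able to see the new commits in your git log output.
It’s also a good idea to run the clean-git script again to remove any obsolete tags or branches that
were deleted from the original SVN repository since the last sync:
java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git --force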
Your local Git repository should now be synchronized with your SVN repository.
Summary
During this transition period, it’s very important that your developers only commit to the original
SVN repository. The only time the Git repository should be updated is via the synchronization
process discussed above. This is much easier than managing a two-way synchronization workflow,
but it still allows you to start integrating Git into your build process.
Share
In SVN, developers share contributions by committing changes from a working copy on their local
computer to a central repository. Then, other developers pull these updates from the central repo
into their own local working copies.
Git’s collaboration workflow is much different. Instead of differentiating between working copies
and the central repository, Git gives each developer their own local copy of the entire repository.
Changes are committed to this local repository instead of a central one. To share updates with other
developers, you need to push these local changes to a public Git repository on a server. Then, the
other developers can pull your new commits from the public repo into their own local repositories.
Giving each developer their own complete repository is the heart of distributed version control, and
it opens up a wide array of potential workflows. You can read more about these workflows from our
Git Workflows section.
So far, you’ve only been working with a local Git repository. This page explains how to push this local
repo to a public repository hosted on Bitbucket. Sharing the Git repository during the migration
allows your team to experiment with Git commands without affecting their active SVN development.
Until you’re ready to make the switch, it’s very important to treat the shared Git repositories as
read-only. All development should continue to be committed to the original SVN repository.
In Bitbucket’s repository creation form, add a name and description for your repository. If your project is private, keep
the Access level option checked so that only designated developers are allowed to clone it. For the
Forking field, use Allow only private forks. Use Git for the Repository type, select any project
management tools you want to use, and select the primary programming language of your project in
the Language field.
To create the hosted repository, submit the form by clicking the Create repository button. After your
repository is set up, you’ll see a Next steps page that describes some useful commands for importing
an existing project. The rest of this page will walk you through those instructions step-by-step.
Add the Bitbucket repository as a remote with git remote add origin <repo-url>. After running this
command, you can use origin in other Git commands to refer to your Bitbucket repository.
Then push your local branches with git push -u origin --all. The -u option tells Git to track the
upstream branches. This enables Git to tell you if the remote repo’s commit history is ahead of or
behind your local one. The --all option pushes all of the local branches to the remote repository.
You also need to push your local tags to the Bitbucket repository with the --tags option (git push
origin --tags).
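The whole sequence can be rehearsed without a Bitbucket account. In this sketch a bare repository on the local disk stands in for the hosted one (an assumption for the sake of a runnable example), and the branch and tag names are invented.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q --bare hosted.git             # local stand-in for the Bitbucket repository
git init -q converted
cd converted
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "code" > app.txt
git add app.txt
git commit -q -m "Imported history"
git branch release-1.x                    # a second local branch
git tag v1.0                              # a tag, like those produced by the conversion

git remote add origin "$demo/hosted.git"  # with Bitbucket this would be the repository URL
git push -q -u origin --all               # push every local branch and set upstream tracking
git push -q origin --tags                 # tags are not covered by --all; push them separately

git ls-remote origin                      # lists master, release-1.x, and v1.0 on the remote
```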
If your repository is private, you’ll also need to grant access to your team members in the
Administration tab of the Bitbucket web interface. Users and groups can be managed by clicking the
Access management link in the left sidebar.
As an alternative, you can use Bitbucket’s built-in invitation feature to invite other developers to fork
the repository. The invited users will automatically be given access to the repository, so you don’t
need to worry about granting permissions.
Once they have the URL of your repository, another developer can copy the repository to their local
machine with git clone and begin working with the project. For example, after running
git clone <repo-url> <destination> on their local machine, another developer would find a new Git
repository containing the project in the <destination> directory.
You should now be able to push your local project to a remote repository, and your team should be
able to use that remote repository to clone the project onto their local machines. These are all the
tools you need to start collaborating with Git. However, you and your team should continue to
commit changes using SVN until everybody is ready to make the switch.
The only changes to the Git repository should come from the original SVN repository using the
synchronization process discussed on the previous page. For all intents and purposes, this means
that all of your Git repositories (both local and remote) are read-only. Your developers can
experiment with them, and you can begin to integrate them into your build process, but you should
avoid committing any permanent changes using Git.
Summary
In this step, you set up a Bitbucket repository to share your converted Git repository with other
developers. You should now have all the tools you need to implement any of the Git workflows
described in Git Workflows. You can continue synchronizing with the SVN repository and sharing the
resulting Git commits via Bitbucket for as long as it takes to get your development team comfortable
with Git. Then, you can complete the migration process by retiring your SVN repository.
Migrate
This migration guide advocates a one-way synchronization from SVN to Git during the transition
period. This means that while your team is getting comfortable with Git, they should still only be
committing to the original SVN repository. When you’re ready to make the switch, the SVN
repository should freeze at whatever state it’s in. Then, developers should begin committing to their
local Git repositories and sharing them via Bitbucket.
The discrete switch from SVN to Git makes for a very intuitive migration. All of your developers
should already understand the new Git workflows that they’ll be using, and they should have had
plenty of time to practice using Git commands on the local repositories they cloned from Bitbucket.
This page guides you through the final step of the migration.
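An easy way to back up the SVN repository is to dump it to a compressed file:
svnadmin dump <svn-repo> | gzip -9 > <backup-file>
Replace <svn-repo> with the file path of the SVN repository that you’re backing up, and replace
<backup-file> with the file path of the compressed file containing the backup.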
Make the SVN repository read-only
All of your developers should now be committing with Git. To enforce this convention, you can make
your SVN repository read-only. This process can vary depending on your server setup, but if you’re
using the svnserve daemon, you can accomplish this by editing your SVN repo’s conf/svnserve.conf
file. Its [general] section should contain the following lines:
anon-access = read
auth-access = read
This tells svnserve that both anonymous and authenticated users only have read permissions.
Summary
And that’s all there is to migrating a project to Git. Your team should now be developing with a pure
Git workflow and enjoying all of the benefits of distributed development. Good job!
Advanced Tips
Atlassian’s Git tutorials introduce the most common Git commands, and our Git Workflows modules
discuss how these commands are typically used to facilitate collaboration. Alone, these are enough
to get a development team up and running with Git. But, if you really want to leverage the full power
of Git, you’re ready to dive into our Advanced Git articles.
Each of these articles provides an in-depth discussion of an advanced feature of Git. Instead of
presenting new commands and concepts, they refine your existing Git skills by explaining what’s
going on under the hood. Armed with this knowledge, you’ll be able to use familiar Git commands
more effectively. More importantly, you’ll never be scared of breaking your Git repository because
you’ll understand why it broke and how to fix it.
Merging vs. Rebasing
Git is all about working with divergent history. Its git merge and git rebase commands offer
alternative ways to integrate commits from different branches, and both options come with their
own advantages. In this article, we’ll discuss how and when a basic git merge operation can be
replaced with a rebase.
The git rebase command has a reputation for being magical Git voodoo that beginners should stay
away from, but it can actually make life much easier for a development team when used with care.
In this article, we’ll compare git rebase with the related git merge command and identify all of the
potential opportunities to incorporate rebasing into the typical Git workflow.
Conceptual Overview
The first thing to understand about git rebase is that it solves the same problem as git merge. Both
of these commands are designed to integrate changes from one branch into another branch—they
just do it in very different ways.
Consider what happens when you start working on a new feature in a dedicated branch, then
another team member updates the master branch with new commits. This results in a forked
history, which should be familiar to anyone who has used Git as a collaboration tool.
Now, let’s say that the new commits in master are relevant to the feature that you’re working on. To
incorporate the new commits into your feature branch, you have two options: merging or rebasing.
Merging master into the feature branch (git checkout feature, then git merge master) creates a new
“merge commit” in the feature branch that ties together the histories of both branches.
Merging is nice because it’s a non-destructive operation. The existing branches are not changed in
any way. This avoids all of the potential pitfalls of rebasing (discussed below).
On the other hand, this also means that the feature branch will have an extraneous merge commit
every time you need to incorporate upstream changes. If master is very active, this can pollute your
feature branch’s history quite a bit. While it’s possible to mitigate this issue with advanced git log
options, it can make it hard for other developers to understand the history of the project.
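The merge option can be reproduced in a scratch repository. The sketch below builds the forked history described above (file names and commit messages are invented) and then merges master into feature.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature                # start the feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master                    # meanwhile, master moves on
echo "hotfix" > fix.txt
git add fix.txt
git commit -q -m "Fix bug on master"

git checkout -q feature
git merge -q --no-edit master             # bring master's commits into feature

git log --oneline --graph                 # the merge commit ties both histories together
git rev-list --parents -n 1 HEAD          # the merge commit is followed by two parent hashes
```

Neither branch's existing commits are altered; the only new object is the merge commit at the tip of feature.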
Rebasing the feature branch onto master (git checkout feature, then git rebase master) moves the
entire feature branch to begin on the tip of the master branch, effectively incorporating all of the
new commits in master. But, instead of using a merge commit, rebasing re-writes the project history
by creating brand new commits for each commit in the original branch.
The major benefit of rebasing is that you get a much cleaner project history. First, it eliminates the
unnecessary merge commits required by git merge. Second, rebasing results in a perfectly linear
project history: you can follow the tip of feature all the way to the beginning of the project without
any forks. This makes it easier to navigate your project with commands like git log, git bisect, and
gitk.
But, there are two trade-offs for this pristine commit history: safety and traceability. If you don’t
follow the Golden Rule of Rebasing, re-writing project history can be potentially catastrophic for
your collaboration workflow. And, less importantly, rebasing loses the context provided by a merge
commit—you can’t see when upstream changes were incorporated into the feature.
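For comparison, here is the same forked history handled with a rebase instead of a merge. As before, this is a sketch in a scratch repository with invented file names; it also records the feature commit's hash before and after to show that the commit is re-created rather than moved.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
echo "hotfix" > fix.txt
git add fix.txt
git commit -q -m "Fix bug on master"

git checkout -q feature
before=$(git rev-parse HEAD)              # hash of the feature commit before rebasing
git rebase -q master                      # replay feature's commits on top of master
after=$(git rev-parse HEAD)

git log --oneline --graph                 # a single straight line, no merge commit
echo "before: $before"
echo "after:  $after"                     # a different hash: the commit was re-created
```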
Interactive Rebasing
Interactive rebasing gives you the opportunity to alter commits as they are moved to the new
branch. This is even more powerful than an automated rebase, since it offers complete control over
the branch’s commit history. Typically, this is used to clean up a messy history before merging a
feature branch into master.
To begin an interactive rebasing session, pass the -i option to the git rebase command:
git checkout feature
git rebase -i master
This will open a text editor listing all of the commits that are about to be moved:
This listing defines exactly what the branch will look like after the rebase is performed. By changing
the pick command and/or re-ordering the entries, you can make the branch’s history look like
whatever you want. For example, if the 2nd commit fixes a small problem in the 1st commit, you can
condense them into a single commit with the fixup command:
When you save and close the file, Git will perform the rebase according to your instructions,
resulting in project history that looks like the following:
Eliminating insignificant commits like this makes your feature’s history much easier to understand.
This is something that git merge simply cannot do.
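The fixup step can also be reproduced without opening an editor. This sketch assumes a throwaway repository and uses the GIT_SEQUENCE_EDITOR environment variable to stand in for the text editor; the helper script name, file names, and commit messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo "form" > login.txt && git add login.txt && git commit -q -m "Add login form"
echo "form, fixed" > login.txt && git commit -q -am "Fix typo in login form"
echo "link" > logout.txt && git add logout.txt && git commit -q -m "Add logout link"

# Stand-in for the text editor: change the 2nd "pick" line of the todo
# list to "fixup", folding that commit into the one before it.
cat > "$repo/.git/todo-editor.sh" <<'EOF'
awk '/^pick / && ++n == 2 { sub(/^pick/, "fixup") } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF
GIT_SEQUENCE_EDITOR="sh '$repo/.git/todo-editor.sh'" git rebase -q -i master

commit_count=$(git rev-list --count master..feature)
subjects=$(git log --format=%s master..feature)
login_content=$(git show feature~1:login.txt)
```

In everyday use you would simply edit the list by hand; the scripted editor is only there so that the example runs unattended.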
The Golden Rule of Rebasing
The golden rule of git rebase is to never use it on public branches. For example, think about what
would happen if you rebased master onto your feature branch:
The rebase moves all of the commits in master onto the tip of feature. The problem is that this only
happened in your repository. All of the other developers are still working with the original master.
Since rebasing results in brand new commits, Git will think that your master branch’s history has
diverged from everybody else’s.
The only way to synchronize the two master branches is to merge them back together, resulting in
an extra merge commit and two sets of commits that contain the same changes (the original ones,
and the ones from your rebased branch). Needless to say, this is a very confusing situation.
So, before you run git rebase, always ask yourself, “Is anyone else looking at this branch?” If the
answer is yes, take your hands off the keyboard and start thinking about a non-destructive way to
make your changes (e.g., the git revert command). Otherwise, you’re safe to re-write history as
much as you like.
Force-Pushing
If you try to push the rebased master branch back to a remote repository, Git will prevent you from
doing so because it conflicts with the remote master branch. But, you can force the push to go
through by passing the --force flag, like so:
git push --force
This overwrites the remote master branch to match the rebased one from your repository and
makes things very confusing for the rest of your team. So, be very careful to use this command only
when you know exactly what you’re doing.
One of the few times you should be force-pushing is when you’ve performed a local cleanup after
you’ve pushed a private feature branch to a remote repository (e.g., for backup purposes). This is
like saying, “Oops, I didn’t really want to push that original version of the feature branch. Take the
current one instead.” Again, it’s important that nobody is working off of the commits from the
original version of the feature branch.
Workflow Walkthrough
Rebasing can be incorporated into your existing Git workflow as much or as little as your team is
comfortable with. In this section, we’ll take a look at the benefits that rebasing can offer at the
various stages of a feature’s development.
The first step in any workflow that leverages git rebase is to create a dedicated branch for each
feature. This gives you the necessary branch structure to safely utilize rebasing.
Local Cleanup
One of the best ways to incorporate rebasing into your workflow is to clean up local, in-progress
features. By periodically performing an interactive rebase, you can make sure each commit in your
feature is focused and meaningful. This lets you write your code without worrying about breaking it
up into isolated commits—you can fix it up after the fact.
When calling git rebase, you have two options for the new base: the feature’s parent branch (e.g.,
master), or an earlier commit in your feature. We saw an example of the first option in the
Interactive Rebasing section. The latter option is nice when you only need to fix up the last few
commits. For example, the following commands begin an interactive rebase of only the last 3
commits:
git checkout feature
git rebase -i HEAD~3
If you want to re-write the entire feature using this method, the git merge-base command can be
useful to find the original base of the feature branch. The following returns the commit ID of the
original base, which you can then pass to git rebase:
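As a rough sketch of that lookup, assuming a throwaway repository with hypothetical file names and commit messages (GIT_SEQUENCE_EDITOR=true accepts the rebase list unchanged so the example runs unattended):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
fork_point=$(git rev-parse HEAD)

git checkout -q -b feature
for n in 1 2 3; do
  echo "step $n" >> feature.txt && git add feature.txt && git commit -q -m "Feature step $n"
done

git checkout -q master
echo upstream > upstream.txt && git add upstream.txt && git commit -q -m "Upstream change"
git checkout -q feature

# Commit where feature originally forked off of master.
base=$(git merge-base feature master)

# Using that commit as the base lists every feature commit for editing,
# without moving the branch onto the new tip of master.
GIT_SEQUENCE_EDITOR=true git rebase -q -i "$base"
feature_commits=$(git rev-list --count "$base"..feature)
```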
This use of interactive rebasing is a great way to introduce git rebase into your workflow, as it only
affects local branches. The only thing other developers will see is your finished product, which
should be a clean, easy-to-follow feature branch history.
But again, this only works for private feature branches. If you’re collaborating with other developers
via the same feature branch, that branch is public, and you’re not allowed to re-write its history.
There is no git merge alternative for cleaning up local commits with an interactive rebase.
You can also use git rebase to incorporate upstream changes from master into a feature branch.
This use of git rebase is similar to a local cleanup (and can be performed simultaneously), but in the
process it incorporates the upstream commits from master.
Keep in mind that it’s perfectly legal to rebase onto a remote branch instead of master. This can
happen when collaborating on the same feature with another developer and you need to
incorporate their changes into your repository.
For example, if you and another developer named John added commits to the feature branch, your
repository might look like the following after fetching the remote feature branch from John’s
repository:
You can resolve this fork the exact same way as you integrate upstream changes from master: either
merge your local feature with john/feature, or rebase your local feature onto the tip of john/feature.
Note that this rebase doesn’t violate the Golden Rule of Rebasing because only your local feature
commits are being moved—everything before that is untouched. This is like saying, “add my changes
to what John has already done.” In most circumstances, this is more intuitive than synchronizing
with the remote branch via a merge commit.
By default, the git pull command performs a merge, but you can force it to integrate the remote
branch with a rebase by passing it the --rebase option.
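If you use pull requests as part of your code review process, avoid using git rebase once the pull
request has been created, because other developers are then looking at your commits and the branch is
effectively public. Any changes from other developers need to be incorporated with git merge instead
of git rebase.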
For this reason, it’s usually a good idea to clean up your code with an interactive rebase before
submitting your pull request.
Integrating an approved feature into master is a similar situation to incorporating upstream changes
into a feature branch, but since you’re not allowed to re-write commits in the master branch, you
have to eventually use git merge to
integrate the feature. However, by performing a rebase before the merge, you’re assured that the
merge will be fast-forwarded, resulting in a perfectly linear history. This also gives you the chance to
squash any follow-up commits added during a pull request.
If you’re not entirely comfortable with git rebase, you can always perform the rebase in a temporary
branch. That way, if you accidentally mess up your feature’s history, you can check out the original
branch and try again. For example:
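One way this could look is sketched below. The branch name temporary-branch, the file names, and the commit messages are assumptions; GIT_SEQUENCE_EDITOR=true accepts the rebase list unchanged so the example runs without an editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo a > a.txt && git add a.txt && git commit -q -m "Add a"
echo b > b.txt && git add b.txt && git commit -q -m "Add b"
original_tip=$(git rev-parse feature)

# Do the risky work on a copy of the branch.
git checkout -q -b temporary-branch
GIT_SEQUENCE_EDITOR=true git rebase -q -i master   # clean up the history here

# feature itself is untouched, so a botched rebase can be abandoned with:
#   git checkout feature && git branch -D temporary-branch
# If the result looks right, integrate the temporary branch instead.
git checkout -q master
git merge -q temporary-branch
```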
Summary
And that’s all you really need to know to start rebasing your branches. If you would prefer a clean,
linear history free of unnecessary merge commits, you should reach for git rebase instead of git
merge when integrating changes from another branch.
On the other hand, if you want to preserve the complete history of your project and avoid the risk of
re-writing public commits, you can stick with git merge. Either option is perfectly valid, but at least
now you have the option of leveraging the benefits of git rebase.
Resetting, Checking Out, and Reverting
The git reset, git checkout, and git revert commands are all similar in that they undo some type of
change in your repository. But, they all affect different combinations of the working directory, staged
snapshot, and commit history. This article clearly defines how these commands differ and when each
of them should be used in the standard Git workflows.
The git reset, git checkout, and git revert commands are some of the most useful tools in your Git
toolbox. They all let you undo some kind of change in your repository, and the first two commands
can be used to manipulate either commits or individual files.
Because they’re so similar, it’s very easy to mix up which command should be used in any given
development scenario. In this article, we’ll compare the most common configurations of git reset, git
checkout, and git revert. Hopefully, you’ll walk away with the confidence to navigate your repository
using any of these commands.
It helps to think about each command in terms of its effect on the three main components of a Git
repository: the working directory, the staged snapshot, and the commit history. Keep these
components in mind as you read through this article.
Commit-level Operation
The parameters that you pass to git reset and git checkout determine their scope. When you don’t
include a file path as a parameter, they operate on whole commits. That’s what we’ll be exploring in
this section. Note that git revert has no file-level counterpart.
Reset
On the commit-level, resetting is a way to move the tip of a branch to a different commit. This can be
used to remove commits from the current branch. For example, the following commands move the
hotfix branch backwards by two commits:
git checkout hotfix
git reset HEAD~2
The two commits that were on the end of hotfix are now dangling commits, which means they will
be deleted the next time Git performs a garbage collection. In other words, you’re saying that you
want to throw away these commits. This can be visualized as the following:
This usage of git reset is a simple way to undo changes that haven’t been shared with anyone else.
It’s your go-to command when you’ve started working on a feature and find yourself thinking, “Oh
crap, what am I doing? I should just start over.”
In addition to moving the current branch, you can also get git reset to alter the staged snapshot
and/or the working directory by passing it one of the following flags:
--soft – The staged snapshot and working directory are not altered in any way.
--mixed – The staged snapshot is updated to match the specified commit, but the working
directory is not affected. This is the default option.
--hard – The staged snapshot and the working directory are both updated to match the
specified commit.
It’s easier to think of these modes as defining the scope of a git reset operation:
These flags are often used with HEAD as the parameter. For instance, git reset --mixed HEAD has the
effect of unstaging all changes, but leaves them in the working directory. On the other hand, if you
want to completely throw away all your uncommitted changes, you would use git reset --hard HEAD.
These are two of the most common uses of git reset.
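The three modes can be compared side by side in a scratch repository. This is a minimal sketch; the file name and commit messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo one > notes.txt && git add notes.txt && git commit -q -m "First"
echo two > notes.txt && git commit -q -am "Second"
second=$(git rev-parse HEAD)

# --soft: only the branch tip moves; the "two" change stays staged.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)

# --mixed (the default): the stage is reset as well, so the change
# survives only in the working directory.
git reset -q --soft "$second"          # put the tip back first
git reset -q --mixed HEAD~1
mixed_staged=$(git diff --cached --name-only)
mixed_worktree=$(cat notes.txt)

# --hard: the stage and the working directory both match the commit.
git reset -q --hard HEAD
hard_worktree=$(cat notes.txt)
```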
Be careful when passing a commit other than HEAD to git reset, since this re-writes the current
branch’s history. As discussed in The Golden Rule of Rebasing, this is a big problem when working on a
public branch.
Checkout
By now, you should be very familiar with the commit-level version of git checkout. When passed a
branch name, it lets you switch between branches:
git checkout hotfix
Internally, all the above command does is move HEAD to a different branch and update the working
directory to match. Since this has the potential to overwrite local changes, Git forces you to commit
or stash any changes in the working directory that will be lost during the checkout operation. Unlike
git reset, git checkout doesn’t move any branches around.
You can also check out arbitrary commits by passing in the commit reference instead of a branch.
This does the exact same thing as checking out a branch: it moves the HEAD reference to the
specified commit. For example, the following command will check out the grandparent of the
current commit:
git checkout HEAD~2
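To see that checking out a commit moves only HEAD and leaves every branch where it was, here is a small sketch in a throwaway repository (file name and commit messages are hypothetical):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
for n in 1 2 3; do
  echo "v$n" > file.txt && git add file.txt && git commit -q -m "Commit $n"
done
master_before=$(git rev-parse master)

# Check out the grandparent of the current commit.
git checkout -q HEAD~2

head_subject=$(git log -1 --format=%s)
master_after=$(git rev-parse master)
# symbolic-ref fails when HEAD points directly at a commit, not a branch.
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi
```

Running git checkout master afterwards returns HEAD to the branch.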
Revert
Reverting undoes a commit by creating a new commit. This is a safe way to undo changes, as it has
no chance of re-writing the commit history. For example, the following command will figure out the
changes contained in the 2nd to last commit, create a new commit undoing those changes, and tack
the new commit onto the existing project.
You can also think of git revert as a tool for undoing committed changes, while git reset HEAD is for
undoing uncommitted changes.
Like git checkout, git revert has the potential to overwrite files in the working directory, so it will ask
you to commit or stash changes that would be lost during the revert operation.
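A revert can be sketched as follows. The repository, file names, and commit messages are assumptions, and HEAD~1 is used here as the second-to-last commit; --no-edit simply accepts the default revert message.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
echo bug > bug.txt && git add bug.txt && git commit -q -m "Introduce bug"
echo more > more.txt && git add more.txt && git commit -q -m "Unrelated work"
before=$(git rev-list --count HEAD)

# Undo the second-to-last commit by adding a new commit on top.
git revert --no-edit HEAD~1 >/dev/null

after=$(git rev-list --count HEAD)
tip_subject=$(git log -1 --format=%s)
```

The history grows by one commit instead of shrinking, and the later, unrelated commit is left intact.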
File-level Operations
The git reset and git checkout commands also accept an optional file path as a parameter. This
dramatically alters their behavior. Instead of operating on entire snapshots, this forces them to limit
their operations to a single file.
Reset
When invoked with a file path, git reset updates the staged snapshot to match the version from the
specified commit. For example, this command will fetch the version of foo.py in the 2nd-to-last
commit and stage it for the next commit:
As with the commit-level version of git reset, this is more commonly used with HEAD rather than an
arbitrary commit. Running git reset HEAD foo.py will unstage foo.py. The changes it contains will still
be present in the working directory.
The --soft, --mixed, and --hard flags do not have any effect on the file-level version of git reset, as the
staged snapshot is always updated, and the working directory is never updated.
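Both file-level uses of git reset can be tried in a scratch repository. This is a minimal sketch with hypothetical file contents; HEAD~1 stands in for an arbitrary older commit.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo v1 > foo.py && git add foo.py && git commit -q -m "foo v1"
echo v2 > foo.py && git commit -q -am "foo v2"
echo v3 > foo.py && git add foo.py      # staged but uncommitted edit

# Unstage foo.py: the stage goes back to the version in HEAD...
git reset -q HEAD -- foo.py
staged_after_unstage=$(git diff --cached --name-only)
worktree_after_unstage=$(cat foo.py)    # ...but the edit stays on disk

# Stage the version from an older commit; the working directory is untouched.
git reset -q HEAD~1 -- foo.py
staged_version=$(git show :foo.py)
worktree_version=$(cat foo.py)
```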
Checkout
Checking out a file is similar to using git reset with a file path, except it updates the working
directory as well as the stage. Unlike the commit-level version of this command, this does not move
the HEAD reference, which means that you won’t switch branches.
For example, the following command makes foo.py in the working directory match the one from the
2nd-to-last commit:
Just like the commit-level invocation of git checkout, this can be used to inspect old versions of a
project—but the scope is limited to the specified file.
If you commit the checked-out file (checking it out from a commit also stages it), this has the effect
of “reverting” to the old version of that file. Note that this removes all of the subsequent changes to
the file, whereas the git revert command undoes only the changes introduced by the specified commit.
Like git reset, this is commonly used with HEAD as the commit reference. For instance, git checkout
HEAD foo.py has the effect of discarding uncommitted changes to foo.py. This is similar behavior to git
reset HEAD --hard, but it operates only on the specified file.
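The sketch below tries both file-level uses of git checkout in a throwaway repository. File names and contents are hypothetical, and HEAD~1 stands in for an arbitrary older commit.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo v1 > foo.py && echo other > bar.py && git add . && git commit -q -m "v1"
echo v2 > foo.py && git commit -q -am "v2"
branch_before=$(git symbolic-ref --short HEAD)

# Discard an uncommitted edit to foo.py.
echo scratch > foo.py
git checkout -q HEAD -- foo.py
after_discard=$(cat foo.py)

# Bring back the version of foo.py from the previous commit.
git checkout -q HEAD~1 -- foo.py
old_version=$(cat foo.py)
staged=$(git diff --cached --name-only)   # the old version is staged too
branch_after=$(git symbolic-ref --short HEAD)
```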
Summary
You should now have all the tools you could ever need to undo changes in a Git repository. The git
reset, git checkout, and git revert commands can be confusing, but when you think about their
effects on the working directory, staged snapshot, and commit history, it should be easier to discern
which command fits the development task at hand.
The table below sums up the most common use cases for all of these commands. Be sure to keep
this reference handy, as you’ll undoubtedly need to use at least some of them during your Git career.
Command | Scope | Common use cases
git reset | Commit-level | Discard commits in a private branch or throw away uncommitted changes
git reset | File-level | Unstage a file
git checkout | Commit-level | Switch between branches or inspect old snapshots
git checkout | File-level | Discard changes in the working directory
git revert | Commit-level | Undo commits in a public branch
git revert | File-level | (N/A)
Advanced Git Log
The git log command is what makes your project history useful. Without it, you wouldn’t be able to
access any of your commits. But, if you’re like most aspiring Git users, you’ve probably only
scratched the surface of what’s possible with git log. This article walks you through its advanced
formatting and filtering options, giving you the power to extract all sorts of interesting information
from your Git repository.
The purpose of any version control system is to record changes to your code. This gives you the
power to go back into your project history to see who contributed what, figure out where bugs were
introduced, and revert problematic changes. But, having all of this history available is useless if you
don’t know how to navigate it. That’s where the git log command comes in.
By now, you should already know the basic git log command for displaying commits. But, you can
alter this output by passing many different parameters to git log.
The advanced features of git log can be split into two categories: formatting how each commit is
displayed, and filtering which commits are included in the output. Together, these two skills give you
the power to go back into your project and find any information that you could possibly need.
Formatting Log Output
First, this article will take a look at the many ways in which git log’s output can be formatted. Most of
these come in the form of flags that let you request more or less information from git log.
If you don’t like the default git log format, you can use git config’s aliasing functionality to create a
shortcut for any of the formatting options discussed below. Please see The git config Command for
how to set up an alias.
Oneline
The --oneline flag condenses each commit to a single line. By default, it displays only the commit ID
and the first line of the commit message. Your typical git log --oneline output will look something like
this:
Decorating
Many times it’s useful to know which branch or tag each commit is associated with. The --decorate
flag makes git log display all of the references (e.g., branches, tags, etc.) that point to each commit.
This can be combined with other configuration options. For example, running
git log --oneline --decorate will format the commit history like so:
This lets you know that the top commit is also checked out (denoted by HEAD) and that it is also the
tip of the master branch. The second commit has another branch pointing to it called feature, and
finally the 4th commit is tagged as v0.9.
Branches, tags, HEAD, and the commit history are almost all of the information contained in your Git
repository, so this gives you a more complete view of the logical structure of your repository.
Diffs
The git log command includes many options for displaying diffs with each commit. Two of the most
common options are --stat and -p.
The --stat option displays the number of insertions and deletions to each file altered by each commit
(note that modifying a line is represented as 1 insertion and 1 deletion). This is useful when you
want a brief summary of the changes introduced by each commit. For example, the following
commit added 67 lines to the hello.py file and removed 38 lines:
commit f2a238924e89ca1d4947662928218a06d39068c3
The number of + and - signs next to the file name shows the relative number of changes to each file
altered by the commit. This gives you an idea of where the changes for each commit can be found.
If you want to see the actual changes introduced by each commit, you can pass the -p option to git
log. This outputs the entire patch representing that commit:
commit 16b36c697eb2d24302f89aa22d9170dfe609855b
--- a/hello.py
+++ b/hello.py
@@ -13,14 +13,14 @@ B
-print("Hello, World!")
+print("Hello, Git!")
For commits with a lot of changes, the resulting output can become quite long and unwieldy. More
often than not, if you’re displaying a full patch, you’re probably searching for a specific change. For
this, you want to use the pickaxe option.
The Shortlog
The git shortlog command is a special version of git log intended for creating release
announcements. It groups each commit by author and displays the first line of each commit
message. This is an easy way to see who’s been working on what.
For example, if two developers have contributed 5 commits to a project, the git shortlog output
might look like the following:
Mary (2):
John (3):
By default, git shortlog sorts the output by author name, but you can also pass the -n option to sort
by the number of commits per author.
Graphs
The --graph option draws an ASCII graph representing the branch structure of the commit history.
This is commonly used in conjunction with the --oneline and --decorate options to make it easier
to see which commit belongs to which branch:
git log --graph --oneline --decorate
For a simple repository with just 2 branches, this will produce the following:
|\
|/
The asterisk shows which branch the commit was on, so the above graph tells us that the 23ad9ad
and 16b36c6 commits are on a topic branch and the rest are on the master branch.
While this is a nice option for simple repositories, you’re probably better off with a more full-
featured visualization tool like gitk or SourceTree for projects that are heavily branched.
Custom Formatting
For all of your other git log formatting needs, you can use the --pretty=format:"<string>" option. This
lets you display each commit however you want using printf-style placeholders.
For example, the %cn, %h and %cd placeholders in the following command are replaced with the
committer name, abbreviated commit hash, and the committer date, respectively.
The complete list of placeholders can be found in the Pretty Formats section of the git log manual
page.
Aside from letting you view only the information that you’re interested in, the
--pretty=format:"<string>" option is particularly useful when you’re trying to pipe git log output into
another command.
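As an illustration of these placeholders, the sketch below formats one commit and then pipes a one-field format into another tool. The format string, the committer name, the fixed commit date, and the --date=short setting are all choices made for this example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "mary@example.com" && git config user.name "Mary"
echo 'print("hi")' > hello.py && git add hello.py
# Fix the committer date so that the output is reproducible.
GIT_COMMITTER_DATE="2014-07-02 12:00:00 +0000" git commit -q -m "Add hello"

# %cn = committer name, %h = abbreviated hash, %cd = committer date
line=$(git log -1 --date=short --pretty=format:"%cn committed %h on %cd")
short_hash=$(git rev-parse --short HEAD)

# A fixed layout is easy to feed to other commands, e.g. listing the
# distinct committers in the history.
committers=$(git log --pretty=format:"%cn" | sort -u)
```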
Filtering the Commit History
Formatting how each commit gets displayed is only half the battle of learning git log. The other half
is understanding how to navigate the commit history. The rest of this article introduces some of the
advanced ways to pick out specific commits in your project history using git log. All of these can be
combined with any of the formatting options discussed above.
By Amount
The most basic filtering option for git log is to limit the number of commits that are displayed. When
you’re only interested in the last few commits, this saves you the trouble of viewing all the commits
in a pager.
You can limit git log’s output by including the -<n> option. For example, the following command will
display only the 3 most recent commits.
git log -3
By Date
If you’re looking for a commit from a specific time frame, you can use the --after or --before flags for
filtering commits by date. These both accept a variety of date formats as a parameter. For example,
the following command only shows commits that were created after July 1st, 2014 (inclusive):
git log --after="2014-7-1"
You can also pass in relative references like "1 week ago" and "yesterday":
git log --after="yesterday"
To search for commits that were created between two dates, you can provide both a --before and
--after date. For instance, to display all the commits added between July 1st, 2014 and July 4th,
2014, you would use the following:
git log --after="2014-7-1" --before="2014-7-4"
Note that the --since and --until flags are synonymous with --after and --before, respectively.
By Author
When you’re only looking for commits created by a particular user, use the --author flag. This
accepts a regular expression, and returns all commits whose author matches that pattern. If you
know exactly who you’re looking for, you can use a plain old string instead of a regular expression:
git log --author="John"
This displays all commits whose author includes the name John. The author name doesn’t need to be
an exact match—it just needs to contain the specified phrase.
You can also use regular expressions to create more complex searches. For example, the following
command searches for commits by either Mary or John:
git log --author="John\|Mary"
Note that the author’s email is also included with the author’s name, so you can use this option to
search by email, too.
If your workflow separates committers from authors, the --committer flag operates in the same
fashion.
By Message
To filter commits by their commit message, use the --grep flag. This works just like the --author flag
discussed above, but it matches against the commit message instead of the author.
For example, if your team includes relevant issue numbers in each commit message, you can use
something like the following to pull out all of the commits related to that issue:
You can also pass in the -i parameter to git log to make it ignore case differences while pattern
matching.
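A short sketch of message filtering, assuming a team that prefixes commit messages with an issue key. The key JRA-224, the file names, and the messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo a > a.txt && git add a.txt && git commit -q -m "JRA-224: Fix login redirect"
echo b > b.txt && git add b.txt && git commit -q -m "JRA-301: Update docs"
echo c > c.txt && git add c.txt && git commit -q -m "Follow-up for jra-224"

# Case-sensitive match against the commit message.
exact=$(git log --format=%s --grep="JRA-224:")

# -i ignores case, so the lower-case mention is found as well.
any_case=$(git log --format=%s -i --grep="JRA-224")
```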
By File
Many times, you’re only interested in changes that happened to a particular file. To show the history
related to a file, all you have to do is pass in the file path. For example, the following returns all
commits that affected either the foo.py or the bar.py file:
git log -- foo.py bar.py
The -- parameter is used to tell git log that subsequent arguments are file paths and not branch
names. If there’s no chance of mixing it up with a branch, you can omit the --.
By Content
It’s also possible to search for commits that introduce or remove a particular line of source code.
This is called a pickaxe, and it takes the form of -S"<string>". For example, if you want to know when
the string Hello, World! was added to any file in the project, you would use the following command:
git log -S"Hello, World!"
If you want to search using a regular expression instead of a string, you can use the -G"<regex>" flag
instead.
This is a very powerful debugging tool, as it lets you locate all of the commits that affect a particular
line of code. It can even show you when a line was copied or moved to another file.
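The pickaxe can be tried on a tiny history. This is a sketch with hypothetical file contents and commit messages; single quotes are used around the search string only to keep the shell from interpreting the exclamation mark.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo 'print("setup")' > hello.py && git add hello.py && git commit -q -m "Add script"
echo 'print("Hello, World!")' >> hello.py && git commit -q -am "Print greeting"
echo '# trailing comment' >> hello.py && git commit -q -am "Add comment"
grep -v 'Hello, World!' hello.py > hello.tmp && mv hello.tmp hello.py
git commit -q -am "Remove greeting"

# Commits that added or removed an occurrence of the string.
touched=$(git log --format=%s -S'Hello, World!')

# The same search expressed as a regular expression.
regex_hits=$(git log --format=%s -G'Hello, [A-Z]')
```

Only the commits that introduced and removed the line are reported; the commit that merely edited another part of the file is skipped.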
By Range
You can pass a range of commits to git log to show only the commits contained in that range. The
range is specified in the following format, where <since> and <until> are commit references:
git log <since>..<until>
This command is particularly useful when you use branch references as the parameters. It’s a simple
way to show the differences between 2 branches. Consider the following command:
git log master..feature
The master..feature range contains all of the commits that are in the feature branch, but aren’t in
the master branch. In other words, this is how far feature has progressed since it forked off of
master. You can visualize this as follows:
Note that if you switch the order of the range (feature..master), you will get all of the commits in
master, but not in feature. If git log outputs commits for both versions, this tells you that your
history has diverged.
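The two directions of a range can be compared in a scratch repository. This is a minimal sketch with hypothetical file names and commit messages.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo 1 > f1.txt && git add f1.txt && git commit -q -m "Feature work 1"
echo 2 > f2.txt && git add f2.txt && git commit -q -m "Feature work 2"

git checkout -q master
echo fix > fix.txt && git add fix.txt && git commit -q -m "Hotfix on master"

# Commits in feature that are not in master.
ahead=$(git log --format=%s master..feature)

# Commits in master that are not in feature.
behind=$(git log --format=%s feature..master)
```

Because both ranges are non-empty here, the two branches have diverged.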
Filtering Merge Commits
By default, git log includes merge commits in its output. But, if your team has an always-merge
policy (that is, you merge upstream changes into topic branches instead of rebasing the topic branch
onto the upstream branch), you’ll have a lot of extraneous merge commits in your project history.
You can prevent git log from displaying these merge commits by passing the --no-merges flag:
git log --no-merges
On the other hand, if you’re only interested in the merge commits, you can use the --merges flag:
git log --merges
Summary
You should now be fairly comfortable using git log’s advanced parameters to format its output and
select which commits you want to display. This gives you the power to pull out exactly what you
need from your project history.
These new skills are an important part of your Git toolkit, but remember that git log is often used in
conjunction with other Git commands. Once you’ve found the commit you’re looking for, you typically
pass it off to git checkout, git revert, or some other tool for manipulating your commit history. So, be
sure to keep on learning about Git’s advanced features.
Git Hooks
If you want to perform custom actions when a certain event takes place in a Git repository, hooks
are your tool of choice. They let you normalize commit messages, automate testing suites, notify
continuous integration systems, and much more. After this article, you’ll understand the many ways
in which Git hooks can streamline your workflow.
Git hooks are scripts that run automatically every time a particular event occurs in a Git repository.
They let you customize Git’s internal behavior and trigger customizable actions at key points in the
development life cycle.
Common use cases for Git hooks include encouraging a commit policy, altering the project
environment depending on the state of the repository, and implementing continuous integration
workflows. But, since scripts are infinitely customizable, you can use Git hooks to automate or
optimize virtually any aspect of your development workflow.
In this article, we’ll start with a conceptual overview of how Git hooks work. Then, we’ll survey some
of the most popular hooks for use in both local and server-side repositories.
Conceptual Overview
All Git hooks are ordinary scripts that Git executes when certain events occur in the repository. This
makes them very easy to install and configure.
Hooks can reside in either local or server-side repositories, and they are only executed in response
to actions in that repository. We’ll take a concrete look at categories of hooks later in this article.
The configuration discussed in the rest of this section applies to both local and server-side hooks.
Installing Hooks
Hooks reside in the .git/hooks directory of every Git repository. Git automatically populates this
directory with example scripts when you initialize a repository. If you take a look inside .git/hooks,
you’ll find the following files:
applypatch-msg.sample pre-push.sample
commit-msg.sample pre-rebase.sample
post-update.sample prepare-commit-msg.sample
pre-applypatch.sample update.sample
pre-commit.sample
These represent most of the available hooks, but the .sample extension prevents them from
executing by default. To “install” a hook, all you have to do is remove the .sample extension. Or, if
you’re writing a new script from scratch, you can simply add a new file matching one of the above
filenames, minus the .sample extension.
As an example, try installing a simple prepare-commit-msg hook. Remove the .sample extension
from this script, and add the following to the file:
#!/bin/sh
echo "# Please include a useful commit message!" > "$1"  # example text; $1 is the message file
Hooks need to be executable, so you may need to change the file permissions of the script if you’re
creating it from scratch. For example, to make sure that prepare-commit-msg is executable, you
would run the following command:
chmod +x prepare-commit-msg
You should now see this message in place of the default commit message every time you run git
commit. We’ll take a closer look at how this actually works in the Prepare Commit Message section.
For now, let’s just revel in the fact that we can customize some of Git’s internal functionality.
The built-in sample scripts are very useful references, as they document the parameters that are
passed in to each hook (they vary from hook to hook).
Scripting Languages
The built-in scripts are mostly shell and Perl scripts, but you can use any scripting language you like
as long as it can be run as an executable. The shebang line (#!/bin/sh) in each script defines how
your file should be interpreted. So, to use a different language, all you have to do is change it to the
path of your interpreter.
For instance, we can write an executable Python script in the prepare-commit-msg file instead of
using shell commands. The following hook will do the same thing as the shell script in the previous
section.
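#!/usr/bin/env python
import sys
commit_msg_filepath = sys.argv[1]  # path of the commit message file
# Overwrite the message file; the message text is only an example.
with open(commit_msg_filepath, "w") as f:
    f.write("# Please include a useful commit message!")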
Notice how the first line changed to point to the Python interpreter. And, instead of using $1 to
access the first argument passed to the script, we used sys.argv[1] (again, more on this in a
moment).
This is a very powerful feature for Git hooks because it lets you work in whatever language you’re
most comfortable with.
Scope of Hooks
Hooks are local to any given Git repository, and they are not copied over to the new repository when
you run git clone. And, since hooks are local, they can be altered by anybody with access to the
repository.
This has an important impact when configuring hooks for a team of developers. First, you need to
find a way to make sure hooks stay up-to-date amongst your team members. Second, you can’t
force developers to create commits that look a certain way—you can only encourage them to do so.
Maintaining hooks for a team of developers can be a little tricky because the .git/hooks directory
isn’t cloned with the rest of your project, nor is it under version control. A simple solution to both of
these problems is to store your hooks in the actual project directory (above the .git directory). This
lets you edit them like any other version-controlled file. To install the hook, you can either create a
symlink to it in .git/hooks, or you can simply copy and paste it into the .git/hooks directory whenever
the hook is updated.
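As a rough sketch of the symlink approach, the helper below links every file from a version-controlled folder into a hooks directory. The folder name hooks and the function name install_hooks are assumptions made for this illustration, and the demonstration runs in a throwaway directory rather than a real repository.

```python
import os
import tempfile


def install_hooks(source_dir, hooks_dir):
    """Symlink every file in source_dir into hooks_dir; return the names installed."""
    installed = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.abspath(os.path.join(source_dir, name))
        if not os.path.isfile(source):
            continue
        target = os.path.join(hooks_dir, name)
        # Replace a stale copy or link so the hook always tracks the versioned file.
        if os.path.lexists(target):
            os.remove(target)
        os.symlink(source, target)
        # Git only runs hooks that are executable.
        os.chmod(source, 0o755)
        installed.append(name)
    return installed


# Demonstration in a throwaway directory standing in for a project.
project = tempfile.mkdtemp()
versioned = os.path.join(project, "hooks")          # hypothetical tracked folder
git_hooks = os.path.join(project, ".git", "hooks")
os.makedirs(versioned)
os.makedirs(git_hooks)
with open(os.path.join(versioned, "pre-commit"), "w") as f:
    f.write("#!/bin/sh\nexit 0\n")

installed_names = install_hooks(versioned, git_hooks)
print(installed_names)
```

Because the entries in the hooks directory are links, editing the tracked file updates the installed hook as well. Each team member still has to run the installer once in their own clone.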
As an alternative, Git also provides a Template Directory mechanism that makes it easier to install
hooks automatically. All of the files and directories contained in this template directory are copied
into the .git directory every time you use git init or git clone.
All of the local hooks described below can be altered—or completely un-installed—by the owner of a
repository. It’s entirely up to each team member whether or not they actually use a hook. With this
in mind, it’s best to think of Git hooks as a convenient developer tool rather than a strictly enforced
development policy.
That said, it is possible to reject commits that do not conform to some standard using server-side
hooks. We’ll talk more about this later in the article.
Local Hooks
Local hooks affect only the repository in which they reside. As you read through this section,
remember that each developer can alter their own local hooks, so you can’t use them as a way to
enforce a commit policy. They can, however, make it much easier for developers to adhere to
certain guidelines. This section covers six of the most useful local hooks:
pre-commit
prepare-commit-msg
commit-msg
post-commit
post-checkout
pre-rebase
The first 4 hooks let you plug into the entire commit life cycle, and the final 2 let you perform some
extra actions or safety checks for the git checkout and git rebase commands, respectively.
All of the pre- hooks let you alter the action that’s about to take place, while the post- hooks are
used only for notifications.
We’ll also see some useful techniques for parsing hook arguments and requesting information about
the repository using lower-level Git commands.
Pre-Commit
The pre-commit script is executed every time you run git commit before Git asks the developer for a
commit message or generates a commit object. You can use this hook to inspect the snapshot that is
about to be committed. For example, you may want to run some automated tests that make sure
the commit doesn’t break any existing functionality.
No arguments are passed to the pre-commit script, and exiting with a non-zero status aborts the
entire commit. Let’s take a look at a simplified (and more verbose) version of the built-in pre-commit
hook. This script aborts the commit if it finds any whitespace errors, as defined by the git diff-index
command (trailing whitespace, lines with only whitespace, and a space followed by a tab inside the
initial indent of a line are considered errors by default).
#!/bin/sh
The git diff-index --cached command compares a commit against the index. By passing the --check
option, we’re asking it to warn us if the changes introduce whitespace errors. If it does, we abort
the commit by returning an exit status of 1, otherwise we exit with 0 and the commit workflow
continues as normal.
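The check described above can be sketched in Python as well. This is an illustration under a few assumptions, not the built-in sample: it targets Python 3, the function names are invented here, and it only runs its logic when the file is installed under the name pre-commit. Before the first commit there is no HEAD to compare against, so the sketch falls back to the hash of Git's empty tree.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

# SHA-1 of Git's empty tree, used as the comparison base before the first commit.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def comparison_base(has_head):
    """Return what the index should be compared against."""
    return "HEAD" if has_head else EMPTY_TREE


def main():
    # rev-parse --verify fails when HEAD does not exist yet (initial commit).
    has_head = subprocess.call(
        ["git", "rev-parse", "--verify", "--quiet", "HEAD"],
        stdout=subprocess.DEVNULL,
    ) == 0
    against = comparison_base(has_head)
    # diff-index --check exits with a non-zero status when it finds whitespace errors.
    status = subprocess.call(["git", "diff-index", "--check", "--cached", against])
    if status != 0:
        print("pre-commit: Aborting commit due to whitespace errors")
        return 1
    return 0


# Only act when installed under the hook's name, e.g. .git/hooks/pre-commit.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```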
This is just one example of the pre-commit hook. It happens to use existing Git commands to run
tests on the changes introduced by the proposed commit, but you can do anything you want in pre-
commit including executing other scripts, running a 3rd-party test suite, or checking code style with
Lint.
Prepare Commit Message
The prepare-commit-msg hook is called after the pre-commit hook, before the text editor is opened, so
that it can populate the commit message. One to three arguments are passed to the script:
1. The name of a temporary file that contains the message. You change the commit message
by altering this file in-place.
2. The type of commit. This can be message (-m or -F option), template (-t option), merge (if
the commit is a merge commit), or squash (if the commit is squashing other commits).
3. The SHA1 hash of the relevant commit. Only given if -c, -C, or --amend option was given.
We already saw a simple example that edited the commit message, but let’s take a look at a more
useful script. When using an issue tracker, a common convention is to address each issue in a
separate branch. If you include the issue number in the branch name, you can write a prepare-
commit-msg hook to automatically include it in each commit message on that branch.
#!/usr/bin/env python
First, the above prepare-commit-msg hook shows you how to collect all of the parameters that are
passed to the script. Then, it calls git symbolic-ref --short HEAD to get the branch name that
corresponds to HEAD. If this branch name starts with issue-, it re-writes the commit message file
contents to include the issue number in the first line. So, if your branch name is issue-224, this will
generate the following commit message.
ISSUE-224
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch issue-224
# Changes to be committed:
# modified: test.txt
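The hook described above could be sketched as follows. It assumes Python 3, the helper name add_issue_prefix is invented for this illustration, and the script only acts when it is installed under the name prepare-commit-msg. Unlike the description above, it also leaves the message alone when HEAD is detached, since there is no branch name to inspect in that case.

```python
#!/usr/bin/env python3
import os
import re
import subprocess
import sys


def add_issue_prefix(branch, message):
    """Prefix message with ISSUE-<n> when the branch is named issue-<n>."""
    match = re.match(r"issue-(.+)", branch)
    if match is None:
        return message
    return "ISSUE-%s %s" % (match.group(1), message)


def main(argv):
    # Collect the one to three parameters that Git passes to the hook.
    commit_msg_filepath = argv[1]
    commit_type = argv[2] if len(argv) > 2 else ""
    commit_hash = argv[3] if len(argv) > 3 else ""
    print("prepare-commit-msg: type=%r hash=%r" % (commit_type, commit_hash))

    # Ask Git which branch HEAD currently points to.
    try:
        branch = subprocess.check_output(
            ["git", "symbolic-ref", "--short", "HEAD"]
        ).decode().strip()
    except subprocess.CalledProcessError:
        return 0  # detached HEAD: leave the message untouched

    # Rewrite the message file in place.
    with open(commit_msg_filepath, "r+") as f:
        content = f.read()
        f.seek(0)
        f.write(add_issue_prefix(branch, content))
    return 0


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "prepare-commit-msg":
    sys.exit(main(sys.argv))
```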
One thing to keep in mind when using prepare-commit-msg is that it runs even when the user passes
in a message with the -m option of git commit. This means that the above script will automatically
insert the ISSUE-[#] string without letting the user edit it. You can handle this case by seeing if the
2nd parameter (commit_type) is equal to message.
However, without the -m option, the prepare-commit-msg hook does allow the user to edit the
message after it’s generated, so this is really more of a convenience script than a way to enforce a
commit message policy. For that, you need the commit-msg hook discussed in the next section.
Commit Message
The commit-msg hook is much like the prepare-commit-msg hook, but it’s called after the user
enters a commit message. This is an appropriate place to warn developers that their message
doesn’t adhere to your team’s standards.
The only argument passed to this hook is the name of the file that contains the message. If it doesn’t
like the message that the user entered, it can alter this file in-place (just like with prepare-commit-
msg) or it can abort the commit entirely by exiting with a non-zero status.
For example, the following script checks to make sure that the user didn’t delete the ISSUE-[#] string
that was automatically generated by the prepare-commit-msg hook in the previous section.
#!/usr/bin/env python
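A sketch of such a check, assuming Python 3 and the issue-[#] branch naming used in the previous section. The function names are invented for this illustration, and the script only acts when installed under the name commit-msg.

```python
#!/usr/bin/env python3
import os
import re
import subprocess
import sys


def required_prefix(branch):
    """Return the prefix a message must carry on an issue branch, else None."""
    match = re.match(r"issue-(.+)", branch)
    return "ISSUE-%s" % match.group(1) if match else None


def message_is_valid(branch, message):
    """Accept any message on ordinary branches; require the prefix on issue branches."""
    prefix = required_prefix(branch)
    return prefix is None or message.startswith(prefix)


def main(argv):
    commit_msg_filepath = argv[1]
    try:
        branch = subprocess.check_output(
            ["git", "symbolic-ref", "--short", "HEAD"]
        ).decode().strip()
    except subprocess.CalledProcessError:
        return 0  # detached HEAD: nothing to check
    with open(commit_msg_filepath) as f:
        message = f.read()
    if not message_is_valid(branch, message):
        print("commit-msg: ERROR! The commit message must start with '%s'"
              % required_prefix(branch))
        return 1  # a non-zero status aborts the commit
    return 0


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "commit-msg":
    sys.exit(main(sys.argv))
```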
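Post-Commit
The post-commit hook is called immediately after the commit has been created. It can’t change the
outcome of the git commit operation, so it’s used primarily for notification purposes.
The script takes no parameters and its exit status does not affect the commit in any way. For most
post-commit scripts, you’ll want access to the commit that was just created. You can use git
rev-parse HEAD to get the new commit’s SHA1 hash, or you can use git log -1 HEAD to get all of its
information.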
For example, if you want to email your boss every time you commit a snapshot (probably not the
best idea for most workflows), you could add the following post-commit hook.
#!/usr/bin/env python
import smtplib
from email.mime.text import MIMEText
from subprocess import check_output
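One way the rest of such a hook might look is sketched below, assuming Python 3. The addresses, the SMTP server, and the function name build_notification are placeholders invented for this illustration, and the script only acts when installed under the name post-commit.

```python
#!/usr/bin/env python3
import os
import smtplib
import subprocess
import sys
from email.mime.text import MIMEText

# Hypothetical addresses and server; replace them with real values.
SENDER = "mary@example.com"
RECIPIENT = "boss@example.com"
SMTP_SERVER = "smtp.example.com"
SMTP_PORT = 587


def build_notification(log_text):
    """Wrap the log entry of the new commit in a plain-text email."""
    msg = MIMEText("A new commit was created:\n\n%s" % log_text)
    msg["Subject"] = "Git post-commit hook notification"
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    return msg


def main():
    # git log -1 --stat HEAD describes the commit that was just created.
    log_text = subprocess.check_output(
        ["git", "log", "-1", "--stat", "HEAD"]
    ).decode(errors="replace")
    msg = build_notification(log_text)
    with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as session:
        session.starttls()
        # Authentication is omitted here; many servers require session.login(...).
        session.send_message(msg)
    return 0


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "post-commit":
    sys.exit(main())
```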
It’s possible to use post-commit to trigger a local continuous integration system, but most of the
time you’ll want to be doing this in the post-receive hook. This runs on the server instead of the
user’s local machine, and it also runs every time any developer pushes their code. This makes it a
much more appropriate place to perform your continuous integration.
Post-Checkout
The post-checkout hook works a lot like the post-commit hook, but it’s called whenever you
successfully check out a reference with git checkout. This is nice for clearing out your working
directory of generated files that would otherwise cause confusion.
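This hook accepts three parameters, and its exit status has no effect on the git checkout command.
The parameters are the ref of the previous HEAD, the ref of the new HEAD, and a flag that is 1 for a
branch checkout and 0 for a file checkout.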
A common problem with Python developers occurs when generated .pyc files stick around after
switching branches. The interpreter sometimes uses these .pyc instead of the .py source file. To
avoid any confusion, you can delete all .pyc files every time you check out a new branch using the
following post-checkout script:
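#!/usr/bin/env python
import sys
# Third argument: "1" for a branch checkout, "0" for a file checkout
is_branch_checkout = sys.argv[3]
if is_branch_checkout == "0":
    print("post-checkout: This is a file checkout. Nothing to do.")
    sys.exit(0)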
The current working directory for hook scripts is always set to the root of the repository, so the
os.walk('.') call iterates through every file in the repository. Then, we check its extension and delete
it if it’s a .pyc file.
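The walk-and-delete step described above might be sketched like this. The function name delete_pyc_files is an assumption, and the demonstration uses a temporary directory in place of a real working tree.

```python
import os
import tempfile


def delete_pyc_files(top):
    """Delete every .pyc file below top and return the paths removed."""
    removed = []
    for root, dirs, files in os.walk(top):
        for filename in files:
            # Check the extension and delete the file if it is a .pyc file.
            if os.path.splitext(filename)[1] == ".pyc":
                path = os.path.join(root, filename)
                os.unlink(path)
                removed.append(path)
    return removed


# Demonstration in a throwaway directory rather than a real working tree.
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "pkg"))
for name in ("pkg/module.py", "pkg/module.pyc", "stale.pyc"):
    open(os.path.join(workdir, name), "w").close()

removed = delete_pyc_files(workdir)
remaining = sorted(os.listdir(os.path.join(workdir, "pkg")))
print(len(removed), remaining)
```

In the hook itself, the call would be delete_pyc_files('.') after the file-checkout test shown earlier.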
You can also use the post-checkout hook to alter your working directory based on which branch you
have checked out. For example, you might use a plugins branch to store all of your plugins outside of
the core codebase. If these plugins require a lot of binaries that other branches do not, you can
selectively build them only when you’re on the plugins branch.
Pre-Rebase
The pre-rebase hook is called before git rebase changes anything, making it a good place to make
sure something terrible isn’t about to happen.
This hook takes 2 parameters: the upstream branch that the series was forked from, and the branch
being rebased. The second parameter is empty when rebasing the current branch. To abort the
rebase, exit with a non-zero status.
For example, if you want to completely disallow rebasing in your repository, you could use the
following pre-rebase script:
#!/bin/sh
Now, every time you run git rebase, the hook prints its message and Git refuses to rebase.
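A Python sketch of a hook that blocks every rebase is shown here. The message text is made up for this illustration, Python 3 is assumed, and the script only acts when installed under the name pre-rebase.

```python
#!/usr/bin/env python3
import os
import sys


def main(argv):
    # argv[1] is the upstream branch; argv[2], when present, is the branch being rebased.
    upstream = argv[1] if len(argv) > 1 else ""
    print("pre-rebase: Rebasing onto '%s' is disabled in this repository." % upstream)
    return 1  # any non-zero status aborts the rebase


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-rebase":
    sys.exit(main(sys.argv))
```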
For a more in-depth example, take a look at the included pre-rebase.sample script. This script is a
little more intelligent about when to disallow rebasing. It checks to see if the topic branch that
you’re trying to rebase has already been merged into the next branch (which is assumed to be the
mainline branch). If it has, you’re probably going to get into trouble by rebasing it, so the script
aborts the rebase.
Server-side Hooks
Server-side hooks work just like local ones, except they reside in server-side repositories (e.g., a
central repository, or a developer’s public repository). When attached to the official repository,
some of these can serve as a way to enforce policy by rejecting certain commits.
There are 3 server-side hooks that we’ll be discussing in the rest of this article:
pre-receive
update
post-receive
All of these hooks let you react to different stages of the git push process.
The output from server-side hooks is piped to the client’s console, so it’s very easy to send
messages back to the developer. But, you should also keep in mind that these scripts don’t return
control of the terminal until they finish executing, so you should be careful about performing long-
running operations.
Pre-Receive
The pre-receive hook is executed every time somebody uses git push to push commits to the
repository. It should always reside in the remote repository that is the destination of the push, not in
the originating repository.
The hook runs before any references are updated, so it’s a good place to enforce any kind of
development policy that you want. If you don’t like who is doing the pushing, how the commit
message is formatted, or the changes contained in the commit, you can simply reject it. While you
can’t stop developers from making malformed commits, you can prevent these commits from
entering the official codebase by rejecting them with pre-receive.
The script takes no parameters, but each ref that is being pushed is passed to the script on a
separate line on standard input in the following format:
<old-value> <new-value> <ref-name>
You can see how this hook works using a very basic pre-receive script that simply reads in the
pushed refs and prints them out.
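#!/usr/bin/env python
import sys
import fileinput
# Echo each "<old-value> <new-value> <ref-name>" line that Git sends on standard input
for line in fileinput.input():
    sys.stdout.write(line)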
Again, this is a little different than the other hooks because information is passed to the script via
standard input instead of as command-line arguments. After placing the above script in the
.git/hooks directory of a remote repository and pushing the master branch, you’ll see something like
the following in your console:
b6b36c697eb2d24302f89aa22d9170dfe609855b 85baa88c22b52ddd24d71f05db31f4e46d579095 refs/heads/master
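You can use these SHA1 hashes, along with some lower-level Git commands, to inspect the changes
that are going to be introduced. Common use cases include rejecting non-fast-forward updates and
checking that the user has permission to make the intended changes.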
If multiple refs are pushed, returning a non-zero status from pre-receive aborts all of them. If you
want to accept or reject branches on a case-by-case basis, you need to use the update hook instead.
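To make the all-or-nothing behaviour concrete, here is a sketch of a pre-receive hook that parses its standard input and applies one policy. The policy itself (no branch deletions) and the function names are assumptions chosen for this illustration. The sketch also assumes Python 3 and SHA-1 object names, where a missing old or new value is written as forty zeros.

```python
#!/usr/bin/env python3
import os
import sys

# Git reports a ref that does not exist (being created or deleted) as forty zeros.
ZERO = "0" * 40


def parse_updates(lines):
    """Turn '<old-value> <new-value> <ref-name>' lines into tuples."""
    updates = []
    for line in lines:
        line = line.strip()
        if line:
            old, new, ref = line.split(" ", 2)
            updates.append((old, new, ref))
    return updates


def rejected_refs(updates):
    """Hypothetical policy: refuse any push that deletes a branch."""
    return [ref for old, new, ref in updates
            if new == ZERO and ref.startswith("refs/heads/")]


def main(stream):
    bad = rejected_refs(parse_updates(stream))
    for ref in bad:
        print("pre-receive: deleting %s is not allowed" % ref)
    # A non-zero status rejects every ref in the push, not just the offending ones.
    return 1 if bad else 0


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-receive":
    sys.exit(main(sys.stdin))
```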
Update
The update hook is called after pre-receive, and it works much the same way. It’s still called before
anything is actually updated, but it’s called separately for each ref that was pushed. That means if
the user tries to push 4 branches, update is executed 4 times. Unlike pre-receive, this hook doesn’t
need to read from standard input. Instead, it accepts the following 3 arguments:
1. The name of the ref being updated
2. The old object name stored in the ref
3. The new object name to be stored in the ref
This is the same information passed to pre-receive, but since update is invoked separately for each
ref, you can reject some refs while allowing others.
#!/usr/bin/env python
import sys
branch = sys.argv[1]
old_commit = sys.argv[2]
new_commit = sys.argv[3]
print("Moving '%s' from %s to %s" % (branch, old_commit, new_commit))
The above update hook simply outputs the branch and the old/new commit hashes. When pushing
more than one branch to the remote repository, you’ll see the print statement execute for each
branch.
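Because update runs once per ref, it can reject one ref while letting the others through. The sketch below illustrates this with a made-up policy: the PROTECTED set and the is_allowed function are assumptions for this example, Python 3 and SHA-1 object names are assumed, and the script only acts when installed under the name update.

```python
#!/usr/bin/env python3
import os
import sys

# A new value of forty zeros means the ref is being deleted.
ZERO = "0" * 40
PROTECTED = {"refs/heads/master"}  # hypothetical list of protected refs


def is_allowed(ref, old_commit, new_commit):
    """Allow every update except the deletion of a protected ref."""
    return not (ref in PROTECTED and new_commit == ZERO)


def main(argv):
    ref, old_commit, new_commit = argv[1], argv[2], argv[3]
    if not is_allowed(ref, old_commit, new_commit):
        print("update: refusing to delete %s" % ref)
        return 1  # rejects this ref only; other refs in the same push are unaffected
    return 0


# Only act when installed under the hook's name.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "update":
    sys.exit(main(sys.argv))
```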
Post-Receive
The post-receive hook gets called after a successful push operation, making it a good place to
perform notifications. For many workflows, this is a better place to trigger notifications than post-
commit because the changes are available on a public server instead of residing only on the user’s
local machine. Emailing other developers and triggering a continuous integration system are
common use cases for post-receive.
The script takes no parameters, but is sent the same information as pre-receive via standard input.
Summary
In this article, we learned how Git hooks can be used to alter internal behavior and receive
notifications when certain events occur in a repository. Hooks are ordinary scripts that reside in
the .git/hooks directory, which makes them very easy to install and customize.
We also looked at some of the most common local and server-side hooks. These let us plug in to the
entire development life cycle. We now know how to perform customizable actions at every stage in
the commit creation process, as well as the git push process. With a little bit of scripting knowledge,
this lets you do virtually anything you can imagine with a Git repository.
Refs and the Reflog
Git is all about commits: you stage commits, create commits, view old commits, and transfer
commits between repositories using many different Git commands. The majority of these commands
operate on a commit in some form or another, and many of them accept a commit reference as a
parameter. For example, you can use git checkout to view an old commit by passing in a commit
hash, or you can use it to switch branches by passing in a branch name.
By understanding the many ways to refer to a commit, you make all of these commands that much
more powerful. In this chapter, we’ll shed some light on the internal workings of common
commands like git checkout, git branch, and git push by exploring the many methods of referring to
a commit.
We’ll also learn how to revive seemingly “lost” commits by accessing them through Git’s reflog
mechanism.
Hashes
The most direct way to reference a commit is via its SHA-1 hash. This acts as the unique ID for each
commit. You can find the hash of all your commits in the git log output.
commit 0c708fdec272bc4446c6cabea4f0022c2b616eba
When passing the commit to other Git commands, you only need to specify enough characters to
uniquely identify the commit. For example, you can inspect the above commit with git show by
running the following command:
git show 0c708f
It’s sometimes necessary to resolve a branch, tag, or another indirect reference into the
corresponding commit hash. For this, you can use the git rev-parse command. The following returns
the hash of the commit pointed to by the master branch:
git rev-parse master
This is particularly useful when writing custom scripts that accept a commit reference. Instead of
parsing the commit reference manually, you can let git rev-parse normalize the input for you.
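In a Python script, that normalization step might be wrapped as shown below. The function name resolve_commit and its run parameter are assumptions made for this sketch; run exists only so the command can be swapped out when no repository is at hand.

```python
import subprocess


def resolve_commit(reference, run=subprocess.check_output):
    """Return the full hash of the commit that reference (branch, tag, short hash) names."""
    # --verify makes rev-parse fail unless the argument names exactly one object;
    # the ^{commit} suffix peels annotated tags down to the commit they point to.
    output = run(["git", "rev-parse", "--verify", reference + "^{commit}"])
    return output.decode().strip()
```

Inside a repository, resolve_commit("master") returns the same hash that git rev-parse master prints, and an unknown reference raises subprocess.CalledProcessError.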
Refs
A ref is an indirect way of referring to a commit. You can think of it as a user-friendly alias for a
commit hash. This is Git’s internal mechanism of representing branches and tags.
Refs are stored as normal text files in the .git/refs directory. To
explore the refs in one of your repositories, navigate to .git/refs. You should see the following
structure, but it will contain different files depending on what branches, tags, and remotes you have
in your repo:
.git/refs/
heads/
master
some-feature
remotes/
origin/
master
tags/
v0.9
The heads directory defines all of the local branches in your repository. Each filename matches the
name of the corresponding branch, and inside the file you’ll find a commit hash. This commit hash is
the location of the tip of the branch. To verify this, try running the following two commands from
the root of the Git repository:
cat .git/refs/heads/master
git log -1 master
The commit hash returned by the cat command should match the commit ID displayed by git log.
To change the location of the master branch, all Git has to do is change the contents of the
refs/heads/master file. Similarly, creating a new branch is simply a matter of writing a commit hash
to a new file. This is part of the reason why Git branches are so lightweight compared to SVN.
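The idea that a branch is just a small file holding a hash can be mimicked in a few lines. This sketch works on a temporary directory that stands in for .git and is not a real repository; the helper names are invented here. In an actual repository you would use git branch or git update-ref rather than writing these files by hand.

```python
import os
import tempfile


def read_branch(git_dir, branch):
    """Return the commit hash stored in refs/heads/<branch>."""
    with open(os.path.join(git_dir, "refs", "heads", branch)) as f:
        return f.read().strip()


def write_branch(git_dir, branch, commit_hash):
    """Point refs/heads/<branch> at commit_hash by rewriting one small file."""
    with open(os.path.join(git_dir, "refs", "heads", branch), "w") as f:
        f.write(commit_hash + "\n")


# A throwaway directory stands in for .git; it is not a real repository.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
tip = "0e25143693cfe9d5c2e83944bbaf6d3c4505eb17"

write_branch(git_dir, "master", tip)        # the tip of master
write_branch(git_dir, "some-feature", tip)  # "creating" a branch at the same commit
print(read_branch(git_dir, "some-feature") == read_branch(git_dir, "master"))
```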
The tags directory works the exact same way, but it contains tags instead of branches. The remotes
directory lists all remote repositories that you created with git remote as separate subdirectories.
Inside each one, you’ll find all the remote branches that have been fetched into your repository.
Specifying Refs
When passing a ref to a Git command, you can either define the full name of the ref, or use a short
name and let Git search for a matching ref. You should already be familiar with short names for refs,
as this is what you’re using each time you refer to a branch by name.
git show some-feature
The some-feature argument in the above command is actually a short name for the branch. Git
resolves this to refs/heads/some-feature before using it. You can also specify the full ref on the
command line, like so:
git show refs/heads/some-feature
This avoids any ambiguity regarding the location of the ref. This is necessary, for instance, if you had
both a tag and a branch called some-feature. However, if you’re using proper naming conventions,
ambiguity between tags and branches shouldn’t generally be a problem.
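The search for a matching ref can be sketched as a walk over candidate names. The order below follows the rules listed in the gitrevisions documentation; the function name and the sample refs are assumptions for this illustration, and individual commands may apply their own preferences on top of these rules.

```python
# Candidate names in the order given by gitrevisions; the first existing one wins.
SEARCH_ORDER = [
    "{name}",
    "refs/{name}",
    "refs/tags/{name}",
    "refs/heads/{name}",
    "refs/remotes/{name}",
    "refs/remotes/{name}/HEAD",
]


def expand_short_name(name, existing_refs):
    """Return the first full ref name that name could stand for, or None."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(name=name)
        if candidate in existing_refs:
            return candidate
    return None


refs = {"HEAD", "refs/heads/master", "refs/heads/some-feature", "refs/tags/v0.9"}
print(expand_short_name("some-feature", refs))
print(expand_short_name("v0.9", refs))
```

Note that tags are tried before branches in this order, which is why a tag and a branch sharing one name is worth avoiding.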
Packed Refs
For large repositories, Git will periodically perform a garbage collection to remove unnecessary
objects and compress refs into a single file for more efficient performance. You can force this
compression with the garbage collection command:
git gc
This moves all of the individual branch and tag files in the refs folder into a single file called packed-
refs located in the top of the .git directory. If you open up this file, you’ll find a mapping of commit
hashes to refs:
00f54250cf4e549fdfcafe2cf9a2c90bc3800285 refs/heads/feature
0e25143693cfe9d5c2e83944bbaf6d3c4505eb17 refs/heads/master
bb883e4c91c870b5fed88fd36696e752fb6cf8e6 refs/tags/v0.9
On the outside, normal Git functionality won’t be affected in any way. But, if you’re wondering why
your .git/refs folder is empty, this is where the refs went.
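Reading this file back is straightforward. The sketch below parses the three entries shown above; the function name is an assumption, and the header line in the sample is typical of what Git writes but may differ between versions.

```python
def parse_packed_refs(text):
    """Map each ref name in a packed-refs file to its commit hash."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the '#' header line, and '^' lines (peeled annotated tags).
        if not line or line[0] in "#^":
            continue
        commit_hash, ref = line.split(" ", 1)
        refs[ref] = commit_hash
    return refs


sample = """\
# pack-refs with: peeled fully-peeled sorted
00f54250cf4e549fdfcafe2cf9a2c90bc3800285 refs/heads/feature
0e25143693cfe9d5c2e83944bbaf6d3c4505eb17 refs/heads/master
bb883e4c91c870b5fed88fd36696e752fb6cf8e6 refs/tags/v0.9
"""
packed = parse_packed_refs(sample)
print(packed["refs/heads/master"])
```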
Special Refs
In addition to the refs directory, there are a few special refs that reside in the top-level .git directory.
They are listed below:
HEAD: the currently checked-out commit or branch.
FETCH_HEAD: the most recently fetched branch from a remote repo.
ORIG_HEAD: a backup reference to HEAD before drastic changes to it.
MERGE_HEAD: the commit(s) that you’re merging into the current branch with git merge.
CHERRY_PICK_HEAD: the commit that you’re cherry-picking.
These refs are all created and updated by Git when necessary. For example, the git pull command
first runs git fetch, which updates the FETCH_HEAD reference. Then, it runs git merge FETCH_HEAD
to finish pulling the fetched branches into the repository. Of course, you can use all of these like any
other ref, as I’m sure you’ve done with HEAD.
These files contain different content depending on their type and the state of your repository. The
HEAD ref can contain either a symbolic ref, which is simply a reference to another ref instead of a
commit hash, or a commit hash. For example, take a look at the contents of HEAD when you’re on
the master branch:
cat .git/HEAD
This will output ref: refs/heads/master, which means that HEAD points to the refs/heads/master ref.
This is how Git knows that the master branch is currently checked out. If you were to switch to
another branch, the contents of HEAD would be updated to reflect the new branch. But, if you were
to check out a commit instead of a branch, HEAD would contain a commit hash instead of a symbolic
ref. This is how Git knows that it’s in a detached HEAD state.
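The two cases can be told apart by looking at the file contents. A small sketch, with a function name invented for this illustration:

```python
def describe_head(head_contents):
    """Classify the contents of the .git/HEAD file."""
    text = head_contents.strip()
    if text.startswith("ref: "):
        # Symbolic ref: HEAD names another ref, so a branch is checked out.
        return ("branch", text[len("ref: "):])
    # Otherwise HEAD holds a commit hash directly: a detached HEAD.
    return ("detached", text)


print(describe_head("ref: refs/heads/master\n"))
print(describe_head("0e25143693cfe9d5c2e83944bbaf6d3c4505eb17\n"))
```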
For the most part, HEAD is the only reference that you’ll be using directly. The others are generally
only useful when writing lower-level scripts that need to hook into Git’s internal workings.
Refspecs
A refspec maps a branch in the local repository to a branch in a remote repository. This makes it
possible to manage remote branches using local Git commands and to configure some advanced git
push and git fetch behavior.
A refspec is specified as [+]<src>:<dst>. When pushing, the <src> parameter is the source branch in
the local repository, and the <dst> parameter is the destination branch in the remote repository;
when fetching, the roles are reversed. The optional + sign forces the update even when it is not a
fast-forward.
Refspecs can be used with the git push command to give a different name to the remote branch. For
example, the following command pushes the master branch to the origin remote repo like an
ordinary git push, but it uses qa-master as the name for the branch in the origin repo. This is useful
for QA teams that need to push their own branches to a remote repo.
git push origin master:refs/heads/qa-master
You can also use refspecs for deleting remote branches. This is a common situation for feature-
branch workflows that push the feature branches to a remote repo (e.g., for backup purposes). The
remote feature branches still reside in the remote repo after they are deleted from the local repo, so
you get a build-up of dead feature branches as your project progresses. You can delete them by
pushing a refspec that has an empty <src> parameter, like so:
git push origin :some-feature
This is very convenient, since you don’t need to log in to your remote repository and manually delete
the remote branch. Note that as of Git v1.7.0 you can use the --delete flag instead of the above
method. The following will have the same effect as the above command:
git push origin --delete some-feature
By adding a few lines to the Git configuration file, you can use refspecs to alter the behavior of git
fetch. By default, git fetch fetches all of the branches in the remote repository. The reason for this is
the following section of the .git/config file:
[remote "origin"]
url = https://[email protected]:mary/example-repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
The fetch line tells git fetch to download all of the branches from the origin repo. But, some
workflows don’t need all of them. For example, many continuous integration workflows only care
about the master branch. To fetch only the master branch, change the fetch line to match the
following:
[remote "origin"]
url = https://[email protected]:mary/example-repo.git
fetch = +refs/heads/master:refs/remotes/origin/master
You can also configure git push in a similar manner. For instance, if you want to always push the
master branch to qa-master in the origin remote (as we did above), you would change the config file
to:
[remote "origin"]
url = https://[email protected]:mary/example-repo.git
fetch = +refs/heads/master:refs/remotes/origin/master
push = refs/heads/master:refs/heads/qa-master
Refspecs give you complete control over how various Git commands transfer branches between
repositories. They let you rename and delete branches from your local repository, fetch/push to
branches with different names, and configure git push and git fetch to work with only the branches
that you want.
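To see how the pieces of a refspec fit together, the sketch below splits one into its parts and applies it to a ref name. It is a simplified model written for this illustration, not how Git implements refspecs: the names Refspec, parse_refspec and map_ref are assumptions, and only a single * wildcard is handled.

```python
from collections import namedtuple

Refspec = namedtuple("Refspec", "force src dst")


def parse_refspec(spec):
    """Split a [+]<src>:<dst> refspec into its parts."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return Refspec(force, src, dst)


def map_ref(refspec, ref):
    """Return the destination for ref under refspec, or None if it does not match."""
    if "*" in refspec.src:
        prefix, suffix = refspec.src.split("*", 1)
        long_enough = len(ref) >= len(prefix) + len(suffix)
        if long_enough and ref.startswith(prefix) and ref.endswith(suffix):
            # Whatever the wildcard matched is copied into the destination.
            middle = ref[len(prefix):len(ref) - len(suffix)]
            return refspec.dst.replace("*", middle, 1)
        return None
    return refspec.dst if ref == refspec.src else None


fetch_all = parse_refspec("+refs/heads/*:refs/remotes/origin/*")
fetch_master = parse_refspec("+refs/heads/master:refs/remotes/origin/master")
delete = parse_refspec(":some-feature")
print(map_ref(fetch_all, "refs/heads/some-feature"))
print(map_ref(fetch_master, "refs/heads/some-feature"))
print(delete)
```

The empty src of the last refspec is what tells git push to delete the destination branch.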
Relative Refs
You can also refer to commits relative to another commit. The ~ character lets you reach parent
commits. For example, the following displays the grandparent of HEAD:
git show HEAD~2
But, when working with merge commits, things get a little more complicated. Since merge commits
have more than one parent, there is more than one path that you can follow. For 3-way merges, the
first parent is from the branch that you were on when you performed the merge, and the second
parent is from the branch that you passed to the git merge command.
The ~ character will always follow the first parent of a merge commit. If you want to follow a
different parent, you need to specify which one with the ^ character. For example, if HEAD is a
merge commit, the following returns the second parent of HEAD.
git show HEAD^2
You can use more than one ^ character to move more than one generation. For instance, this
displays the grandparent of HEAD (assuming it’s a merge commit) that rests on the second parent.
git show HEAD^2^1
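Relative refs can be used with the same commands that accept a normal ref. For example, git reset
HEAD~3 removes the last 3 commits from the current branch, git rebase -i HEAD~3 interactively
rebases them, and the following command uses a relative ref with git log:
# List only the commits reachable from the second parent of a merge commit
git log HEAD^2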
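The difference between ~ and ^ is easier to see on a toy history. The sketch below resolves relative refs against a hand-written parent table; the commit names, the table, and the function resolve_relative are all hypothetical, and only the ~ and ^ operators are modelled.

```python
import re


def resolve_relative(spec, parents):
    """Resolve a spec such as 'M~2' or 'M^2^1' against a parent table.

    parents maps each commit name to the ordered list of its parents.
    """
    match = re.match(r"([^~^]+)((?:[~^]\d*)*)$", spec)
    if match is None:
        raise ValueError("unsupported spec: %r" % spec)
    commit, steps = match.group(1), match.group(2)
    for op, count in re.findall(r"([~^])(\d*)", steps):
        n = int(count) if count else 1
        if op == "~":
            # ~n: follow the first parent n times.
            for _ in range(n):
                commit = parents[commit][0]
        elif n > 0:
            # ^n: step once, to the n-th parent (^0 is the commit itself).
            commit = parents[commit][n - 1]
    return commit


# Hypothetical history: M merges F2 (from a feature branch) into A2.
history = {
    "M": ["A2", "F2"],
    "A2": ["A1"], "A1": [],
    "F2": ["F1"], "F1": ["A1"],
}
print(resolve_relative("M~2", history))    # first parent, twice
print(resolve_relative("M^2", history))    # second parent
print(resolve_relative("M^2^1", history))  # first parent of the second parent
```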
The Reflog
The reflog is Git’s safety net. It records almost every change you make in your repository, regardless
of whether you committed a snapshot or not. You can think of it as a chronological history of
everything you’ve done in your local repo. To view the reflog, run the git reflog command. It should
output something that looks like the following:
0e25143 HEAD@{1}: commit (amend): Integrate some awesome feature into `master`
The HEAD@{<n>} syntax lets you reference commits stored in the reflog. It works a lot like the
HEAD~<n> references from the previous section, but the <n> refers to an entry in the reflog instead
of the commit history.
You can use this to revert to a state that would otherwise be lost. For example, let’s say you just
scrapped a new feature with git reset. The reflog then shows the reset as its newest entry,
HEAD@{0}, followed by the entries for the commits that were thrown away.
The three commits before the git reset are now dangling, which means that there is no way to
reference them—except through the reflog. Now, let’s say you realize that you shouldn’t have
thrown away all of your work. All you have to do is check out the HEAD@{1} commit to get back to
the state of your repository before you ran git reset.
git checkout HEAD@{1}
This puts you in a detached HEAD state. From here, you can create a new branch and continue
working on your feature.
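Finding the right reflog entry can also be scripted. The sketch below parses git reflog style output and reports which HEAD@{n} selector holds the state just before the most recent reset. The sample reflog, including its abbreviated hashes and messages, is made up for this illustration, and the function names are assumptions.

```python
import re

# One reflog line: "<hash> HEAD@{<n>}: <action>: <message>"
ENTRY = re.compile(
    r"^(?P<hash>[0-9a-f]+) HEAD@\{(?P<index>\d+)\}: (?P<action>[^:]+): (?P<message>.*)$"
)


def parse_reflog(text):
    """Return (index, hash, action, message) tuples, newest entry first."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append((int(match.group("index")), match.group("hash"),
                            match.group("action"), match.group("message")))
    return entries


def state_before_last_reset(entries):
    """Return the HEAD@{n} selector for the entry just before the newest reset."""
    for index, commit_hash, action, message in entries:
        if action == "reset":
            return "HEAD@{%d}" % (index + 1)
    return None


# Hypothetical reflog: hashes and messages are invented for illustration.
sample = """\
ad8621a HEAD@{0}: reset: moving to HEAD~3
298eb9f HEAD@{1}: commit: Some other commit message
bbe9012 HEAD@{2}: commit: Continue the feature
9cb79fa HEAD@{3}: commit: Start a new feature
"""
entries = parse_reflog(sample)
print(state_before_last_reset(entries))
```

The selector it reports is the one you would pass to git checkout, and git checkout -b with a new branch name then gives the recovered commits a branch again.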
Summary
You should now be quite comfortable referring to commits in a Git repository. We learned how
branches and tags were stored as refs in the .git subdirectory, how to read a packed-refs file, how
HEAD is represented, how to use refspecs for advanced pushing and fetching, and how to use the
relative ~ and ^ operators to traverse a branch hierarchy.
We also took a look at the reflog, which is a way to reference commits that are not available through
any other means. This is a great way to recover from those little “Oops, I shouldn’t have done that”
situations.
The point of all this was to be able to pick out exactly the commit that you need in any given
development scenario. It’s very easy to leverage the skills you learned in this article against your
existing Git knowledge, as some of the most common commands accept refs as arguments, including
git log, git show, git checkout, git reset, git revert, git rebase, and many others.
Git Workflow