
Git Tutorial

Version control systems are essential tools for managing changes to source code, enabling developers to track modifications, collaborate without conflicts, and maintain a history of changes. Git, a widely-used distributed version control system, offers performance, security, and flexibility, making it the preferred choice for many software teams. The adoption of Git enhances development workflows, facilitates feature branching, and supports agile practices across organizations.

Getting-Started

What is version control

Version control systems are a category of software tools that help a software team manage changes
to source code over time. Version control software keeps track of every modification to the code in a
special kind of database. If a mistake is made, developers can turn back the clock and compare
earlier versions of the code to help fix the mistake while minimizing disruption to all team members.

For almost all software projects, the source code is like the crown jewels - a precious asset whose
value must be protected. For most software teams, the source code is a repository of the invaluable
knowledge and understanding about the problem domain that the developers have collected and
refined through careful effort. Version control protects source code from both catastrophe and the
casual degradation of human error and unintended consequences.

Software developers working in teams are continually writing new source code and changing existing
source code. The code for a project, app or software component is typically organized in a folder
structure or “file tree”. One developer on the team may be working on a new feature while another
developer fixes an unrelated bug by changing code. Each developer may make their changes in
several parts of the file tree.

Version control helps teams solve these kinds of problems, tracking every individual change by each
contributor and helping prevent concurrent work from conflicting. Changes made in one part of the
software can be incompatible with those made by another developer working at the same time. This
problem should be discovered and solved in an orderly manner without blocking the work of the rest
of the team. Further, in all software development, any change can introduce new bugs on its own
and new software can’t be trusted until it’s tested. So testing and development proceed together
until a new version is ready.

Good version control software supports a developer's preferred workflow without imposing one
particular way of working. Ideally it also works on any platform, rather than dictating what operating
system or tool chain developers must use. Great version control systems facilitate a smooth and
continuous flow of changes to the code rather than the frustrating and clumsy mechanism of file
locking - giving the green light to one developer at the expense of blocking the progress of others.

Software teams that do not use any form of version control often run into problems like not knowing
which changes that have been made are available to users or the creation of incompatible changes
between two unrelated pieces of work that must then be painstakingly untangled and reworked. If
you’re a developer who has never used version control you may have added versions to your files,
perhaps with suffixes like “final” or “latest” and then had to later deal with a new final version.
Perhaps you’ve commented out code blocks because you want to disable certain functionality
without deleting the code, fearing that there may be a use for it later. Version control is a way out of
these problems.

Version control software is an essential part of the everyday professional practice of the modern
software team. Individual software developers who are accustomed to working with a
capable version control system in their teams typically recognize the incredible value version control
also gives them even on small solo projects. Once accustomed to the powerful benefits of version
control systems, many developers wouldn’t consider working without them even for non-software
projects.

Benefits of version control


Developing software without using version control is risky, like not having backups. Version control
can also enable developers to move faster and it allows software teams to preserve efficiency and
agility as the team scales to include more developers.

Version Control Systems (VCS) have seen great improvements over the past few decades and some
are better than others. VCS are sometimes known as SCM (Source Code Management) tools or RCS
(Revision Control System). One of the most popular VCS tools in use today is called Git. Git is
a Distributed VCS, a category known as DVCS (more on that later). Like many of the most popular VCS
systems available today, Git is free and open source. Regardless of what they are called, or which
system is used, the primary benefits you should expect from version control are as follows.

1. A complete long-term change history of every file. This means every change made by many
individuals over the years. Changes include the creation and deletion of files as well as edits
to their contents. Different VCS tools differ on how well they handle renaming and moving of
files. This history should also include the author, date and written notes on the purpose of
each change. Having the complete history enables going back to previous versions to help in
root cause analysis for bugs and it is crucial when needing to fix problems in older versions
of software. If the software is being actively worked on, almost everything can be considered
an “older version” of the software.
2. Branching and merging. Having team members work concurrently is a no-brainer, but even
individuals working on their own can benefit from the ability to work on independent
streams of changes. Creating a “branch” in VCS tools keeps multiple streams of work
independent from each other while also providing the facility to merge that work back
together, enabling developers to verify that the changes on each branch do not conflict.
Many software teams adopt a practice of branching for each feature or perhaps branching
for each release, or both. There are many different workflows that teams can choose from
when they decide how to make use of branching and merging facilities in VCS.

3. Traceability. Being able to trace each change made to the software and connect it to project
management and bug tracking software such as JIRA, and being able to annotate each
change with a message describing the purpose and intent of the change, can help not only
with root cause analysis and other forensics but also with everyday development. Having the
annotated history of the code at your fingertips when you are reading the code, trying to
understand what it is doing and why it is so designed, can enable developers to make correct
and harmonious changes that are in accord with the intended long-term design of the system.
This can be especially important for working effectively with legacy code and is crucial in
enabling developers to estimate future work with any accuracy.
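
As a rough illustration of the change history described in the list above, the sketch below builds a throwaway repository and prints who changed what, when and why. The directory name, file name and author identity are placeholders invented for this example.

```shell
# Create a throwaway repository in a temporary directory.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"

# Record two changes, each with a note describing its purpose.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

echo "second draft" > notes.txt
git commit -q -am "Revise notes after review"

# One line per change: short id, author, date, message (newest first).
git log --format='%h  %an  %ad  %s' --date=short
```

Each line of the log output corresponds to one recorded change, which is the raw material for the root cause analysis and traceability mentioned above.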

While it is possible to develop software without using any version control, doing so subjects the
project to a huge risk that no professional team would be advised to accept. So the question is not
whether to use version control but which version control system to use.

There are many choices, but here we are going to focus on just one, Git.

What is Git

By far, the most widely used modern version control system in the world today is Git. Git is a mature,
actively maintained open source project originally developed in 2005 by Linus Torvalds, the famous
creator of the Linux operating system kernel. A staggering number of software projects rely on Git
for version control, including commercial projects as well as open source. Developers who have
worked with Git are well represented in the pool of available software development talent and it
works well on a wide range of operating systems and IDEs (Integrated Development Environments).

Having a distributed architecture, Git is an example of a DVCS (hence Distributed Version Control
System). Rather than have only one single place for the full version history of the software as is
common in once-popular version control systems like CVS or Subversion (also known as SVN), in Git,
every developer's working copy of the code is also a repository that can contain the full history of all
changes.

In addition to being distributed, Git has been designed with performance, security and flexibility in
mind.

Performance
The raw performance characteristics of Git are very strong when compared to many alternatives.
Committing new changes, branching, merging and comparing past versions are all optimized for
performance. The algorithms implemented inside Git take advantage of deep knowledge about
common attributes of real source code file trees, how they are usually modified over time and what
the access patterns are.

Unlike some version control software, Git is not fooled by the names of the files when determining
what the storage and version history of the file tree should be. Instead, Git focuses on the file
content itself. After all, source code files are frequently renamed, split, and rearranged. The object
format of Git's repository files uses a combination of delta encoding (storing content differences)
and compression, and it explicitly stores directory contents and version metadata objects.
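
One way to see that Git identifies files by content rather than by name is to ask it for the object ID it would assign to a few files. This is a minimal sketch; the file names and contents are made up for the example, and git hash-object only computes the IDs here without storing anything.

```shell
work=$(mktemp -d)
cd "$work"

printf 'int main(void) { return 0; }\n' > main.c
cp main.c renamed.c                                  # same content, different name
printf 'int main(void) { return 1; }\n' > changed.c  # different content

# Compute the object ID Git would use for each file's content.
id_main=$(git hash-object main.c)
id_renamed=$(git hash-object renamed.c)
id_changed=$(git hash-object changed.c)

echo "main.c     $id_main"
echo "renamed.c  $id_renamed"
echo "changed.c  $id_changed"
```

The first two IDs are identical because the contents are identical, while the third differs.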

Being distributed enables significant performance benefits as well.

For example, say a developer, Alice, makes changes to source code, adding a feature for the
upcoming 2.0 release, then commits those changes with descriptive messages. She then works on a
second feature and commits those changes too. Naturally these are stored as separate pieces of
work in the version history. Alice then switches to the version 1.3 branch of the same software to fix
a bug that affects only that older version. The purpose of this is to enable Alice's team to ship a bug
fix release, version 1.3.1, before version 2.0 is ready. Alice can then return to the 2.0 branch to
continue working on new features for 2.0 and all of this can occur without any network access and is
therefore fast and reliable. She could even do it on an airplane. When she is ready to send all of the
individually committed changes to the remote repository, Alice can “push” them in one command.
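
Alice's session could look roughly like the sketch below. The branch names release-1.3 and release-2.0, the tag v1.3.1, the file app.txt and the local bare repository standing in for the remote are all assumptions made for this example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"          # stands in for the team's remote repository
git clone -q "$work/remote.git" "$work/alice"  # Git warns that the clone is empty; that is fine
cd "$work/alice"
git config user.name "Alice Example"           # placeholder identity
git config user.email "alice@example.com"

# Starting point: the 1.3 release, with 2.0 development branching off from it.
git checkout -q -b release-1.3
echo "version 1.3" > app.txt
git add app.txt
git commit -q -m "Release 1.3"
git checkout -q -b release-2.0

# Two features for 2.0, stored as two separate commits.
echo "feature A" >> app.txt
git commit -q -am "Add feature A for 2.0"
echo "feature B" >> app.txt
git commit -q -am "Add feature B for 2.0"

# Switch to the 1.3 branch to fix a bug that affects only that version.
git checkout -q release-1.3
echo "bug fix" >> app.txt
git commit -q -am "Fix bug in 1.3"
git tag v1.3.1

# Return to 2.0 work. Nothing so far needed network access.
git checkout -q release-2.0

# Send all of the individually committed changes in one command.
git push -q origin release-1.3 release-2.0 v1.3.1
```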

Security
Git has been designed with the integrity of managed source code as a top priority. The content of
the files, as well as the true relationships between files and directories, versions, tags and commits,
is stored as objects in the Git repository, and every one of these objects is identified by a
cryptographic hash, by default SHA-1 (recent versions of Git can optionally use SHA-256 instead).
This protects the code and the change history against both accidental and malicious change and
ensures that the history is fully traceable.
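
The hash chain can be inspected directly. In this sketch (repository and file names are placeholders), the second commit names its directory snapshot and its parent commit by hash, and git fsck re-checks that the stored objects are intact.

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)                 # ID of the first commit

echo "world" >> file.txt
git commit -q -am "Second commit"

# A commit object records its tree (directory snapshot) and its parent by hash.
git cat-file -p HEAD

# Verify the connectivity and validity of the objects in the repository.
git fsck
```

Because the parent's ID is part of the newer commit, changing an old commit would change the ID of every commit that follows it.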

With Git, you can be sure you have an authentic content history of your source code.

Some other version control systems have no protections against secret alteration at a later date. This
can be a serious information security vulnerability for any organization that relies on software
development.

Flexibility
One of Git's key design objectives is flexibility. Git is flexible in several respects: in support for
various kinds of nonlinear development workflows, in its efficiency in both small and large projects
and in its compatibility with many existing systems and protocols.

Git has been designed to support branching and tagging as first-class citizens (unlike SVN) and
operations that affect branches and tags (such as merging or reverting) are also stored as part of the
change history. Not all version control systems feature this level of tracking.
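
A small sketch of how a merge and a tag end up in the recorded history. The branch name topic, the tag v1.0 and the file name are invented for the example; --no-ff is used so that a merge commit is always created.

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on configuration

git checkout -q -b topic
echo "topic work" >> app.txt
git commit -q -am "Work on topic branch"

# Merge back; the merge itself becomes a commit with two parents.
git checkout -q "$main_branch"
git merge -q --no-ff -m "Merge branch 'topic'" topic

# An annotated tag is stored as its own object with a message.
git tag -a v1.0 -m "Version 1.0"

git log --graph --oneline --decorate
```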

Version control with Git


Git is the best choice for most software teams today. While every team is different and should do
their own analysis, here are the main reasons why version control with Git is preferred over
alternatives:

Git is good
Git has the functionality, performance, security and flexibility that most teams and individual
developers need. These attributes of Git are detailed above. In side-by-side comparisons with most
other alternatives, many teams find that Git is very favorable.

Git is a de facto standard


Git is the most broadly adopted tool of its kind. This makes Git attractive for the following reasons.
At Atlassian, nearly all of our project source code is managed in Git.

Vast numbers of developers already have Git experience and a significant proportion of college
graduates may have experience with only Git. While some organizations may need to climb the
learning curve when migrating to Git from another version control system, many of their existing and
future developers do not need to be trained on Git.

In addition to the benefits of a large talent pool, the predominance of Git also means that many
third party software tools and services are already integrated with Git, including IDEs and our own
tools such as the DVCS desktop client SourceTree, the issue and project tracking software JIRA, and
the code hosting service Bitbucket.

If you are an inexperienced developer wanting to build up valuable skills in software development
tools, when it comes to version control, Git should be on your list.

Git is a quality open source project


Git is a very well supported open source project with over a decade of solid stewardship. The project
maintainers have shown balanced judgment and a mature approach to meeting the long term needs
of its users with regular releases that improve usability and functionality. The quality of the open
source software is easily scrutinized and countless businesses rely heavily on that quality.

Git enjoys great community support and a vast user base. Documentation is excellent and plentiful,
including books, tutorials and dedicated web sites. There are also podcasts and video tutorials.

Being open source lowers the cost for hobbyist developers as they can use Git without paying a fee.
For use in open-source projects, Git is undoubtedly the successor to the previous generations of
successful open source version control systems, SVN and CVS.

Criticism of Git
One common criticism of Git is that it can be difficult to learn. Some of the terminology in Git will be
novel to newcomers, and for users of other systems the Git terminology may be different. For
example, revert in Git has a different meaning than in SVN or CVS. Nevertheless, Git is very capable
and provides a lot of power to its users. Learning to use that power can take some time, but
once it has been learned, that power can be used by the team to increase their development speed.

For those teams coming from a non-distributed VCS, having a central repository may seem like a
good thing that they don’t want to lose. However, while Git has been designed as a distributed
version control system (DVCS), with Git, you can still have an official, canonical repository where all
changes to the software must be stored. With Git, because each developer’s repository is complete,
their work doesn’t need to be constrained by the availability and performance of the “central”
server. During outages or while offline, developers can still consult the full project history. Because
Git is flexible as well as being distributed, you can work the way you are accustomed to but gain the
additional benefits of Git, some of which you may not even realise you’re missing.

Now that you understand what version control is, what Git is and why software teams should use it,
read on to discover the benefits Git can provide across the whole organization.

Why Git for your organization

Switching from a centralized version control system to Git changes the way your development team
creates software. And, if you’re a company that relies on its software for mission-critical
applications, altering your development workflow impacts your entire business.

In this article, we’ll discuss how Git benefits each aspect of your organization, from your
development team to your marketing team, and everything in between. By the end of this article, it
should be clear that Git isn’t just for agile software development—it’s for agile business.
Git for developers

Feature Branch Workflow


One of the biggest advantages of Git is its branching capabilities. Unlike branches in centralized
version control systems, Git branches are cheap and easy to merge. This facilitates the feature
branch workflow popular with many Git users.

Feature branches provide an isolated environment for every change to your codebase. When a
developer wants to start working on something—no matter how big or small—they create a new
branch. This ensures that the master branch always contains production-quality code.

Using feature branches is not only more reliable than directly editing production code, but it also
provides organizational benefits. They let you represent development work at the same granularity
as your agile backlog. For example, you might implement a policy where each JIRA ticket is
addressed in its own feature branch.
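
A minimal sketch of such a policy in practice, assuming a hypothetical ticket PROJ-123 and a made-up branch naming scheme (feature/<ticket>-<summary>). The main branch stays untouched until the feature is merged.

```shell
work=$(mktemp -d)
git init -q "$work/shop"
cd "$work/shop"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"

echo "checkout page" > checkout.txt
git add checkout.txt
git commit -q -m "Initial production code"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on configuration

# Start isolated work for one ticket.
git checkout -q -b feature/PROJ-123-coupon-codes
echo "coupon field" >> checkout.txt
git commit -q -am "PROJ-123: add coupon code field"

# Back on the main branch, the file is still in its original state.
git checkout -q "$main_branch"
before=$(cat checkout.txt)
echo "main branch before merge: $before"

# Integrate the finished feature and remove the branch.
git merge -q --no-ff -m "Merge PROJ-123 coupon codes" feature/PROJ-123-coupon-codes
git branch -d feature/PROJ-123-coupon-codes
```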

Distributed Development
In SVN, each developer gets a working copy that points back to a single central repository. Git,
however, is a distributed version control system. Instead of a working copy, each developer gets
their own local repository, complete with a full history of commits.

Having a full local history makes Git fast, since it means you don’t need a network connection to
create commits, inspect previous versions of a file, or perform diffs between commits.
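
The operations named above all run against the local repository. The sketch below, with a placeholder file config.txt, creates two commits, reads the older version of the file and diffs the two commits, without contacting any server.

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"

echo "timeout=30" > config.txt
git add config.txt
git commit -q -m "Add default timeout"

echo "timeout=60" > config.txt
git commit -q -am "Raise timeout"

# Inspect the previous version of the file.
git show HEAD~1:config.txt

# Compare the last two commits.
git diff HEAD~1 HEAD -- config.txt
```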

Distributed development also makes it easier to scale your engineering team. If someone breaks the
production branch in SVN, other developers can’t check in their changes until it’s fixed. With Git, this
kind of blocking doesn’t exist. Everybody can continue going about their business in their own local
repositories.

And, similar to feature branches, distributed development creates a more reliable environment.
Even if a developer obliterates their own repository, they can simply clone someone else’s and start
anew.

Pull Requests
Many source code management tools such as Bitbucket enhance core Git functionality with pull
requests. A pull request is a way to ask another developer to merge one of your branches into their
repository. This not only makes it easier for project leads to keep track of changes, but also lets
developers initiate discussions around their work before integrating it with the rest of the codebase.
Since they’re essentially a comment thread attached to a feature branch, pull requests are extremely
versatile. When a developer gets stuck with a hard problem, they can open a pull request to ask for
help from the rest of the team. Alternatively, junior developers can be confident that they aren’t
destroying the entire project by treating pull requests as a formal code review.

Community
In many circles, Git has come to be the expected version control system for new projects. If your
team is using Git, odds are you won’t have to train new hires on your workflow, because they’ll
already be familiar with distributed development.

In addition, Git is very popular among open source projects. This means it’s easy to leverage
third-party libraries and encourage others to fork your own open source code.

Faster Release Cycle


The ultimate result of feature branches, distributed development, pull requests, and a stable
community is a faster release cycle. These capabilities facilitate an agile workflow where developers
are encouraged to share smaller changes more frequently. In turn, changes can get pushed down
the deployment pipeline faster than the monolithic releases common with centralized version
control systems.

As you might expect, Git works very well with continuous integration and continuous delivery
environments. Git hooks allow you to run scripts when certain events occur inside of a repository,
which lets you automate deployment to your heart’s content. You can even build or deploy code
from specific branches to different servers.

For example, you might want to configure Git to deploy the most recent commit from the develop
branch to a test server whenever anyone merges a pull request into it. Combining this kind of build
automation with peer review means you have the highest possible confidence in your code as it
moves from development to staging to production.
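
One possible shape for such automation is a post-receive hook in the shared repository. The sketch below is an assumption-laden illustration, not a production setup: a local directory named test-server stands in for the test server, central.git and index.html are placeholder names, and the hook simply checks out the develop branch into that directory whenever develop is updated.

```shell
work=$(mktemp -d)
git init -q --bare "$work/central.git"
mkdir "$work/test-server"                     # stands in for the test server's web root
mkdir -p "$work/central.git/hooks"

# The hook receives one "old new refname" line per updated ref on standard input.
cat > "$work/central.git/hooks/post-receive" <<EOF
#!/bin/sh
while read oldrev newrev refname; do
  if [ "\$refname" = "refs/heads/develop" ]; then
    # Write the latest develop snapshot into the deployment directory.
    git --work-tree="$work/test-server" checkout -f develop
  fi
done
EOF
chmod +x "$work/central.git/hooks/post-receive"

# A developer pushes to develop, which triggers the hook.
git clone -q "$work/central.git" "$work/dev"
cd "$work/dev"
git config user.name "Alice Example"          # placeholder identity
git config user.email "alice@example.com"
git checkout -q -b develop
echo "v1" > index.html
git add index.html
git commit -q -m "Add landing page"
git push -q origin develop

ls "$work/test-server"
```

Hosting services usually offer their own pipeline or webhook mechanisms for this, so a hand-written hook like this mainly applies to repositories you host yourself.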

Git for marketing


To understand how switching to Git affects your company’s marketing activities, imagine your
development team has three distinct changes scheduled for completion in the next few weeks:

• The entire team is finishing up a game-changing feature that they’ve been working on for
the last 6 months.

• Mary is implementing a smaller, unrelated feature that only impacts existing customers.

• Rick is making some much-needed updates to the user interface.

If you’re using a traditional development workflow that relies on a centralized VCS, all of these
changes would probably be rolled up into a single release. Marketing can only make one
announcement that focuses primarily on the game-changing feature, and the marketing potential of
the other two updates is effectively ignored.

The shorter development cycle facilitated by Git makes it much easier to divide these into individual
releases. This gives marketers more to talk about, more often. In the above scenario, marketing can
build out three campaigns that revolve around each feature, and thus target very specific market
segments.

For instance, they might prepare a big PR push for the game-changing feature, a corporate blog post
and newsletter blurb for Mary’s feature, and some guest posts about Rick’s underlying UX theory for
sending to external design blogs. Each of these activities can be synchronized with its own release.

Git for product management


The benefits of Git for product management are much the same as for marketing. More frequent
releases mean more frequent customer feedback and faster updates in reaction to that feedback.
Instead of waiting for the next release 8 weeks from now, you can push a solution out to customers
as quickly as your developers can write the code.

The feature branch workflow also provides flexibility when priorities change. For instance, if you’re
halfway through a release cycle and you want to postpone one feature in favor of another time-critical
one, it’s no problem. That initial feature can sit around in its own branch until engineering has time
to come back to it.

This same functionality makes it easy to manage innovation projects, beta tests, and rapid
prototypes as independent codebases.

Git for designers


Feature branches lend themselves to rapid prototyping. Whether your UX/UI designers want to
implement an entirely new user flow or simply replace some icons, checking out a new branch gives
them a sandboxed environment to play with. This lets designers see how their changes will look in a
real working copy of the product without the threat of breaking existing functionality.

Encapsulating user interface changes like this makes it easy to present updates to other
stakeholders. For example, if the director of engineering wants to see what the design team has
been working on, all the designers have to do is tell the director to check out the corresponding branch.

Pull requests take this one step further and provide a formal place for interested parties to discuss
the new interface. Designers can make any necessary changes, and the resulting commits will show
up in the pull request. This invites everybody to participate in the iteration process.

Perhaps the best part of prototyping with branches is that it’s just as easy to merge the changes into
production as it is to throw them away. There’s no pressure to do either one. This encourages
designers and UI developers to experiment while ensuring that only the best ideas make it through
to the customer.

Git for customer support


Customer support and customer success often have a different take on updates than product
managers. When a customer calls them up, they’re usually experiencing some kind of problem. If
that problem is caused by your company’s software, a bug fix needs to be pushed out as soon as
possible.

Git’s streamlined development cycle avoids postponing bug fixes until the next monolithic release. A
developer can patch the problem and push it directly to production. Faster fixes mean happier
customers and fewer repeat support tickets. Instead of being stuck with “Sorry, we’ll get right on
that,” your customer support team can start responding with “We’ve already fixed it!”

Git for human resources


To a certain extent, your software development workflow determines who you hire. It always helps
to hire engineers that are familiar with your technologies and workflows, but using Git also provides
other advantages.

Employees are drawn to companies that provide career growth opportunities, and understanding
how to leverage Git in both large and small organizations is a boon to any programmer. By choosing
Git as your version control system, you’re making the decision to attract forward-looking developers.

Git for anyone managing a budget


Git is all about efficiency. For developers, it eliminates everything from the time wasted passing
commits over a network connection to the man hours required to integrate changes in a centralized
version control system. It even makes better use of junior developers by giving them a safe
environment to work in. All of this affects the bottom line of your engineering department.

But, don’t forget that these efficiencies also extend outside your development team. They prevent
marketing from pouring energy into collateral for features that aren’t popular. They let designers
test new interfaces on the actual product with little overhead. They let you react to customer
complaints immediately.

Being agile is all about finding out what works as quickly as possible, magnifying efforts that are
successful, and eliminating ones that aren’t. Git serves as a multiplier for all your business activities
by making sure every department is doing its job more efficiently.

Setting up a repository
This tutorial provides a succinct overview of the most important Git commands. First, the Setting Up
a Repository section explains all of the tools you need to start a new version-controlled project.
Then, the remaining sections introduce your everyday Git commands.

By the end of this module, you should be able to create a Git repository, record snapshots of your
project for safekeeping, and view your project’s history.

git init
The git init command creates a new Git repository. It can be used to convert an existing, unversioned
project to a Git repository or initialize a new empty repository. Most of the other Git commands are
not available outside of an initialized repository, so this is usually the first command you’ll run in a
new project.

Executing git init creates a .git subdirectory in the project root, which contains all of the necessary
metadata for the repo. Aside from the .git directory, an existing project remains unaltered (unlike
older versions of SVN, which kept a .svn folder in every subdirectory, Git needs only this one .git
folder).

Usage
git init

Transform the current directory into a Git repository. This adds a .git folder to the current directory
and makes it possible to start recording revisions of the project.

git init <directory>

Create an empty Git repository in the specified directory. Running this command will create a new
folder called <directory> containing nothing but the .git subdirectory.

git init --bare <directory>

Initialize an empty Git repository, but omit the working directory. Shared repositories should always
be created with the --bare flag (see discussion below). Conventionally, repositories initialized with
the --bare flag end in .git. For example, the bare version of a repository called my-project should be
stored in a directory called my-project.git.
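
The three forms can be tried side by side in a scratch directory. In this sketch the directory names existing-project and new-project and the file app.py are placeholders; my-project.git follows the naming convention described above.

```shell
work=$(mktemp -d)

# 1. Turn an existing, unversioned project into a repository.
mkdir "$work/existing-project"
echo "print('hi')" > "$work/existing-project/app.py"
cd "$work/existing-project"
git init -q
ls -A                          # app.py plus the new .git folder

# 2. Create a new, empty repository in a named directory.
git init -q "$work/new-project"

# 3. Create a bare repository for sharing, named with the .git suffix.
git init -q --bare "$work/my-project.git"
ls "$work/my-project.git"      # repository metadata only, no working files
```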

Discussion
Compared to SVN, the git init command is an incredibly easy way to create new version-controlled
projects. Git doesn’t require you to create a repository, import files, and check out a working copy.
All you have to do is cd into your project folder and run git init, and you’ll have a fully functional Git
repository.

However, for most projects, git init only needs to be executed once to create a central repository;
developers typically don’t use git init to create their local repositories. Instead, they’ll usually use git
clone to copy an existing repository onto their local machine.

Bare Repositories

The --bare flag creates a repository that doesn’t have a working directory, making it impossible to
edit files and commit changes in that repository. Central repositories should always be created as
bare repositories because pushing branches to a non-bare repository has the potential to overwrite
changes. Think of --bare as a way to mark a repository as a storage facility, as opposed to a
development environment. This means that for virtually all Git workflows, the central repository is
bare, and developers’ local repositories are non-bare.
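
The difference is easy to check from the command line. This sketch creates one bare and one non-bare repository (central.git and dev are placeholder names) and asks Git which is which.

```shell
work=$(mktemp -d)
git init -q --bare "$work/central.git"    # storage only
git init -q "$work/dev"                   # has a working directory

git -C "$work/central.git" rev-parse --is-bare-repository   # prints: true
git -C "$work/dev" rev-parse --is-bare-repository            # prints: false

# Commands that need a working directory refuse to run in the bare repository.
if ! git -C "$work/central.git" status >/dev/null 2>&1; then
  echo "central.git has no working directory"
fi
```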

Example
Since git clone is a more convenient way to create local copies of a project, the most common use
case for git init is to create a central repository:

ssh <user>@<host>
cd path/above/repo
git init --bare my-project.git

First, you SSH into the server that will contain your central repository. Then, you navigate to
wherever you’d like to store the project. Finally, you use the --bare flag to create a central storage
repository. Developers would then clone my-project.git to create a local copy on their development
machine.

git clone
The git clone command copies an existing Git repository. This is sort of like svn checkout, except the
“working copy” is a full-fledged Git repository—it has its own history, manages its own files, and is a
completely isolated environment from the original repository.

As a convenience, cloning automatically creates a remote connection called origin pointing back to
the original repository. This makes it very easy to interact with a central repository.

Usage
git clone <repo>

Clone the repository located at <repo> onto the local machine. The original repository can be
located on the local filesystem or on a remote machine accessible via HTTP or SSH.

git clone <repo> <directory>

Clone the repository located at <repo> into the folder called <directory> on the local machine.
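
To try cloning without a server, a repository on the local filesystem can serve as <repo>. In this sketch the names original and copy are placeholders; the clone ends up with the same history and a remote called origin pointing back at the source.

```shell
work=$(mktemp -d)

# Prepare a small repository to clone from.
git init -q "$work/original"
cd "$work/original"
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"
echo "My project" > README.txt
git add README.txt
git commit -q -m "Add README"

# Clone it into a folder called "copy".
cd "$work"
git clone -q "$work/original" copy
cd copy

git remote -v            # origin points back to the original repository
git log --oneline        # the clone carries the full history
```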

Discussion
If a project has already been set up in a central repository, the git clone command is the most
common way for users to obtain a development copy. Like git init, cloning is generally a one-time
operation—once a developer has obtained a working copy, all version control operations and
collaborations are managed through their local repository.

Repo-To-Repo Collaboration

It’s important to understand that Git’s idea of a “working copy” is very different from the working
copy you get by checking out code from an SVN repository. Unlike SVN, Git makes no distinction
between the working copy and the central repository—they are all full-fledged Git repositories.

This makes collaborating with Git fundamentally different than with SVN. Whereas SVN depends on
the relationship between the central repository and the working copy, Git’s collaboration model is
based on repository-to-repository interaction. Instead of checking a working copy into SVN’s central
repository, you push or pull commits from one repository to another.

Of course, there’s nothing stopping you from giving certain Git repos special meaning. For example,
by simply designating one Git repo as the “central” repository, it’s possible to replicate a Centralized
workflow using Git. The point is, this is accomplished through conventions rather than being
hardwired into the VCS itself.

Example
The example below demonstrates how to obtain a local copy of a central repository stored on a
server accessible at example.com using the SSH username john:

git clone ssh://john@example.com/path/to/my-project.git

cd my-project

# Start working on the project

The first command initializes a new Git repository in the my-project folder on your local machine and
populates it with the contents of the central repository. Then, you can cd into the project and start
editing files, committing snapshots, and interacting with other repositories. Also note that
the .git extension is omitted from the cloned repository. This reflects the non-bare status of the local
copy.
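
The folder naming and the origin remote can be observed locally. This is a rough sketch, assuming a local bare repository in a temporary directory stands in for the central server; the paths are illustrative only.

```shell
# Assumption: a local bare repository stands in for the central server.
work=$(mktemp -d)
cd "$work"
git init -q --bare my-project.git

# Clone it. Git derives the folder name by dropping the .git suffix.
# (Git warns that the repository is empty; that is expected here.)
git clone -q "$work/my-project.git"

cd my-project
# The clone records where it came from under the remote name "origin".
origin_url=$(git config --get remote.origin.url)
echo "origin -> $origin_url"
```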

git config
The git config command lets you configure your Git installation (or an individual repository) from the
command line. This command can define everything from user info to preferences to the behavior of
a repository. Several common configuration options are listed below.

Usage
git config user.name <name>

Define the author name to be used for all commits in the current repository. Typically, you’ll want to
use the --global flag to set configuration options for the current user.

git config --global user.name <name>

Define the author name to be used for all commits by the current user.

git config --global user.email <email>

Define the author email to be used for all commits by the current user.

git config --global alias.<alias-name> <git-command>

Create a shortcut for a Git command.

git config --system core.editor <editor>

Define the text editor used by commands like git commit for all users on the current machine.
The <editor> argument should be the command that launches the desired editor (e.g., vi).

git config --global --edit

Open the global configuration file in a text editor for manual editing.

Discussion
All configuration options are stored in plaintext files, so the git config command is really just a
convenient command-line interface. Typically, you’ll only need to configure a Git installation the first
time you start working on a new development machine, and for virtually all cases, you’ll want to use
the --global flag.

Git stores configuration options in three separate files, which lets you scope options to individual
repositories, users, or the entire system:

• <repo>/.git/config – Repository-specific settings.
• ~/.gitconfig – User-specific settings. This is where options set with the --global flag are stored.
• $(prefix)/etc/gitconfig – System-wide settings.

When options in these files conflict, local settings override user settings, which override system-
wide. If you open any of these files, you’ll see something like the following:

[user]
name = John Smith
email = john@example.com
[alias]
st = status
co = checkout
br = branch
up = rebase
ci = commit
[core]
editor = vim

You can manually edit these values to the exact same effect as git config.
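
The override order can be checked with a quick experiment. The following is a sketch under stated assumptions: HOME is pointed at a throwaway directory so that your real ~/.gitconfig is never touched, and the names "Global Name" and "Local Name" are placeholders.

```shell
# Assumption: redirect HOME to a throwaway directory so the sketch never
# touches your real ~/.gitconfig.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

repo=$(mktemp -d)
cd "$repo"
git init -q

# User-level setting, written to $HOME/.gitconfig.
git config --global user.name "Global Name"
# Repository-level setting, written to .git/config.
git config user.name "Local Name"

# Inside the repository, the local value wins.
effective=$(git config user.name)
echo "effective: $effective"

# The user-level value is still there when asked for explicitly.
global=$(git config --global user.name)
echo "global: $global"
```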

Example
The first thing you’ll want to do after installing Git is tell it your name/email and customize some of
the default settings. A typical initial configuration might look something like the following:

# Tell Git who you are
git config --global user.name "John Smith"
git config --global user.email john@example.com

# Select your favorite text editor
git config --global core.editor vim

# Add some SVN-like aliases
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.up rebase
git config --global alias.ci commit

This will produce the ~/.gitconfig file from the previous section.

Saving changes

git add
The git add command adds a change in the working directory to the staging area. It tells Git that you
want to include updates to a particular file in the next commit. However, git add doesn't really affect
the repository in any significant way—changes are not actually recorded until you run git commit.

In conjunction with these commands, you'll also need git status to view the state of the working
directory and the staging area.

Usage
git add <file>

Stage all changes in <file> for the next commit.

git add <directory>

Stage all changes in <directory> for the next commit.

git add -p

Begin an interactive staging session that lets you choose portions of a file to add to the next commit.
This will present you with a chunk of changes and prompt you for a command. Use y to stage the
chunk, n to ignore the chunk, s to split it into smaller chunks, e to manually edit the chunk, and q to
exit.

Discussion
The git add and git commit commands compose the fundamental Git workflow. These are the two
commands that every Git user needs to understand, regardless of their team’s collaboration model.
They are the means to record versions of a project into the repository’s history.

Developing a project revolves around the basic edit/stage/commit pattern. First, you edit your files
in the working directory. When you’re ready to save a copy of the current state of the project, you
stage changes with git add. After you’re happy with the staged snapshot, you commit it to the
project history with git commit.

The git add command should not be confused with svn add, which adds a file to the repository.
Instead, git add works on the more abstract level of changes. This means that git add needs to be
called every time you alter a file, whereas svn add only needs to be called once for each file. It may
sound redundant, but this workflow makes it much easier to keep a project organized.

The Staging Area

The staging area is one of Git's more distinctive features, and it can take some time to wrap your head
around it if you’re coming from an SVN (or even a Mercurial) background. It helps to think of it as a
buffer between the working directory and the project history.

Instead of committing all of the changes you've made since the last commit, the stage lets you group
related changes into highly focused snapshots before actually committing them to the project history.
This means you can make all sorts of edits to unrelated files, then go back and split them up into
logical commits by adding related changes to the stage and committing them piece by piece. As in any
revision control system, it’s important to create atomic commits so that it’s easy to track down bugs
and revert changes with minimal impact on the rest of the project.
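
The splitting described above might look like the following in practice. This is a minimal sketch, assuming a throwaway repository in a temporary directory, a placeholder identity, and two hypothetical files, hello.py and main.py.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

# Start from a committed baseline with two hypothetical files.
echo "v1" > hello.py
echo "v1" > main.py
git add .
git commit -q -m "Initial import"

# Make two unrelated edits...
echo "v2" > hello.py
echo "v2" > main.py

# ...then stage and commit them one at a time.
git add hello.py
git commit -q -m "Update hello.py"
git add main.py
git commit -q -m "Update main.py"

# Each of the last two commits touches exactly one file.
git log --oneline --stat
```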

Example
When you’re starting a new project, git add serves the same function as svn import. To create an
initial commit of the current directory, use the following two commands:

git add .

git commit

Once you’ve got your project up and running, new files can be added by passing the path to git add:

git add hello.py

git commit

The above commands can also be used to record changes to existing files. Again, Git doesn’t
differentiate between staging changes in new files vs. changes in files that have already been added
to the repository.

git commit
The git commit command commits the staged snapshot to the project history. Committed snapshots
can be thought of as “safe” versions of a project, since Git will never change them unless you explicitly ask
it to. Along with git add, this is one of the most important Git commands.

While they share the same name, this command is nothing like svn commit. Snapshots are
committed to the local repository, and this requires absolutely no interaction with other Git
repositories.

Usage
git commit

Commit the staged snapshot. This will launch a text editor prompting you for a commit message.
After you’ve entered a message, save the file and close the editor to create the actual commit.

git commit -m "<message>"

Commit the staged snapshot, but instead of launching a text editor, use <message> as the commit
message.

git commit -a

Commit a snapshot of all changes in the working directory. This only includes modifications to
tracked files (those that have been added with git add at some point in their history).
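
The tracked-files restriction is easy to see in a small experiment. The sketch below assumes a throwaway repository, a placeholder identity, and a hypothetical untracked file named notes.txt.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

echo "print('hello')" > hello.py
git add hello.py
git commit -q -m "Create hello.py"

# Modify the tracked file and create a brand-new, untracked one.
echo "print('hello again')" > hello.py
echo "scratch" > notes.txt

# -a stages modifications to tracked files only, then commits.
git commit -q -a -m "Update hello.py"

# notes.txt was never added, so it is still reported as untracked.
git status --short
```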

Discussion
Snapshots are always committed to the local repository. This is fundamentally different from SVN,
wherein the working copy is committed to the central repository. In contrast, Git doesn’t force you
to interact with the central repository until you’re ready. Just as the staging area is a buffer between
the working directory and the project history, each developer’s local repository is a buffer between
their contributions and the central repository.

This changes the basic development model for Git users. Instead of making a change and committing
it directly to the central repo, Git developers have the opportunity to accumulate commits in their
local repo. This has many advantages over SVN-style collaboration: it makes it easier to split up a
feature into atomic commits, keep related commits grouped together, and clean up local history
before publishing it to the central repository. It also lets developers work in an isolated environment,
deferring integration until they’re at a convenient break point.

Snapshots, Not Differences

Aside from the practical distinctions between SVN and Git, their underlying implementations also
follow entirely divergent design philosophies. Whereas SVN tracks differences of a file, Git’s version
control model is based on snapshots. For example, an SVN commit consists of a diff compared to the
original file added to the repository. Git, on the other hand, records the entire contents of each file in
every commit.

This makes many Git operations much faster than SVN, since a particular version of a file doesn’t
have to be “assembled” from its diffs—the complete revision of each file is immediately available
from Git's internal database.

Git's snapshot model has a far-reaching impact on virtually every aspect of its version control model,
affecting everything from its branching and merging tools to its collaboration workflows.
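
The snapshot model can be inspected with Git's low-level commands. This is a sketch rather than a full account of Git's storage: it assumes a throwaway repository, a placeholder identity, and two hypothetical files. It shows that every commit lists every file, and that a file left unchanged between two commits is recorded under the same object ID in both.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

echo "one" > a.txt
echo "two" > b.txt
git add .
git commit -q -m "Add a.txt and b.txt"

# Change only b.txt in the second commit.
echo "two, edited" > b.txt
git commit -q -a -m "Edit b.txt"

# The commit's tree lists every file in the project, not just b.txt.
git ls-tree HEAD

# a.txt did not change, so both snapshots refer to the same stored object.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
# b.txt changed, so its object ID differs between the two snapshots.
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)
echo "a.txt: $a_old -> $a_new"
echo "b.txt: $b_old -> $b_new"
```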

Example
The following example assumes you’ve edited some content in a file called hello.py and are ready to
commit it to the project history. First, you need to stage the file with git add, then you can commit
the staged snapshot.

git add hello.py

git commit

This will open a text editor (customizable via git config) asking for a commit message, along with a
list of what’s being committed:

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
#       modified:   hello.py

Git doesn't require commit messages to follow any specific formatting constraints, but the canonical
format is to summarize the entire commit on the first line in fewer than 50 characters, leave a blank
line, then give a detailed explanation of what’s been changed. For example:

Change the message displayed by hello.py

- Update the sayHello() function to output the user's name
- Change the sayGoodbye() function to a friendlier message

Note that many developers also like to use present tense in their commit messages. This makes
them read more like actions on the repository, which makes many of the history-rewriting
operations more intuitive.

Inspecting a repository

git status
The git status command displays the state of the working directory and the staging area. It lets you
see which changes have been staged, which haven’t, and which files aren’t being tracked by Git.
Status output does not show you any information regarding the committed project history. For this,
you need to use git log.

Usage
git status

List which files are staged, unstaged, and untracked.


Discussion
The git status command is a relatively straightforward command. It simply shows you what's been
going on with git add and git commit. Status messages also include relevant instructions for
staging/unstaging files. Sample output showing the three main categories of a git status call is
included below:

# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
#       modified:   hello.py
#
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   main.py
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
#       hello.pyc

Ignoring Files

Untracked files typically fall into two categories. They're either files that have just been added to the
project and haven’t been committed yet, or they're compiled binaries like .pyc, .obj, .exe, etc. While
it's definitely beneficial to include the former in the git status output, the latter can make it hard to
see what’s actually going on in your repository.

For this reason, Git lets you completely ignore files by placing paths in a special file called .gitignore.
Any files that you'd like to ignore should be included on a separate line, and the * symbol can be
used as a wildcard. For example, adding the following to a .gitignore file in your project root will
prevent compiled Python modules from appearing in git status:

*.pyc
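
The effect of that pattern can be checked as follows. This is a small sketch, assuming a throwaway repository in a temporary directory and two hypothetical files: a source file and a stand-in for a compiled module.

```shell
# Assumption: a throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical files: one source file and one compiled artifact.
echo "print('hello')" > hello.py
touch hello.pyc

# Before: both files show up as untracked.
git status --short

# Ignore compiled Python modules, as described above.
echo "*.pyc" > .gitignore

# After: hello.pyc no longer appears (.gitignore itself is a new file).
git status --short
```

The git check-ignore command can confirm whether a given path matches an ignore rule.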

Example

It's good practice to check the state of your repository before committing changes so that you don’t
accidentally commit something you don't mean to. This example displays the repository status
before and after staging and committing a snapshot:

# Edit hello.py
git status
# hello.py is listed under "Changes not staged for commit"
git add hello.py
git status
# hello.py is listed under "Changes to be committed"
git commit
git status
# nothing to commit (working directory clean)

The first status output will show the file as unstaged. The git add action will be reflected in the
second git status, and the final status output will tell you that there is nothing to commit—the
working directory matches the most recent commit. Some Git commands (e.g., git merge) require
the working directory to be clean so that you don't accidentally overwrite changes.

git log
The git log command displays committed snapshots. It lets you list the project history, filter it, and
search for specific changes. While git status lets you inspect the working directory and the staging
area, git log only operates on the committed history.

Log output can be customized in several ways, from simply filtering commits to displaying them in a
completely user-defined format. Some of the most common configurations of git log are presented
below.

Usage
git log

Display the entire commit history using the default formatting. If the output takes up more than one
screen, you can use Space to scroll and q to exit.

git log -n <limit>

Limit the number of commits by <limit>. For example, git log -n 3 will display only 3 commits.

git log --oneline


Condense each commit to a single line. This is useful for getting a high-level overview of the project
history.

git log --stat

Along with the ordinary git log information, include which files were altered and the relative number
of lines that were added or deleted from each of them.

git log -p

Display the patch representing each commit. This shows the full diff of each commit, which is the
most detailed view you can have of your project history.

git log --author="<pattern>"

Search for commits by a particular author. The <pattern> argument can be a plain string or a regular
expression.

git log --grep="<pattern>"

Search for commits with a commit message that matches <pattern>, which can be a plain string or a
regular expression.

git log <since>..<until>

Show only commits that occur between <since> and <until>. Both arguments can be either a commit
ID, a branch name, HEAD, or any other kind of revision reference.

git log <file>

Only display commits that include the specified file. This is an easy way to see the history of a
particular file.

git log --graph --decorate --oneline

A few useful options to consider. The --graph flag draws a text-based graph of the commits
on the left-hand side of the commit messages. --decorate adds the names of branches or tags of the
commits that are shown. --oneline shows the commit information on a single line, making it easier to
browse through commits at a glance.

Discussion
The git log command is Git's basic tool for exploring a repository’s history. It’s what you use when
you need to find a specific version of a project or figure out what changes will be introduced by
merging in a feature branch.

By default, each entry in the output begins like this:

commit 3157ee3718e180a9476bf2e5cab8e3f1e78a73b7
Author: John Smith


Most of this is pretty straightforward; however, the first line warrants some explanation. The 40-
character string after commit is an SHA-1 checksum of the commit’s contents. This serves two
purposes. First, it ensures the integrity of the commit—if it was ever corrupted, the commit would
generate a different checksum. Second, it serves as a unique ID for the commit.

This ID can be used in commands like git log <since>..<until> to refer to specific commits. For
instance, git log 3157e..5ab91 will display everything between the commits with
IDs 3157e and 5ab91. Aside from checksums, branch names (discussed in the Branch Module) and
the HEAD keyword are other common methods for referring to individual commits. HEAD always
refers to the current commit, be it a branch or a specific commit.

The ~ character is useful for making relative references to the parent of a commit. For
example, 3157e~1 refers to the commit before 3157e, and HEAD~3 is the great-grandparent of the
current commit.
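
Relative references are easiest to understand on a tiny history. The following sketch assumes a throwaway repository, a placeholder identity, and four hypothetical commits whose messages are simply numbered.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

# Build a linear history of four commits.
for n in 1 2 3 4; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

# HEAD is the newest commit; each ~ step walks one parent back.
git log -1 --format=%s HEAD      # commit 4
git log -1 --format=%s HEAD~1    # commit 3
git log -1 --format=%s HEAD~3    # commit 1 (the great-grandparent)

# A range: commits reachable from HEAD but not from HEAD~2.
git log --oneline HEAD~2..HEAD
```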

The idea behind all of these identification methods is to let you perform actions based on specific
commits. The git log command is typically the starting point for these interactions, as it lets you find
the commits you want to work with.

Example
The Usage section provides many examples of git log, but keep in mind that several options can be
combined into a single command:

git log --author="John Smith" -p hello.py

This will display a full diff of all the changes John Smith has made to the file hello.py.

The .. syntax is a very useful tool for comparing branches. The next example displays a brief overview
of all the commits that are in some-feature that are not in master.

git log --oneline master..some-feature
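
To try this comparison on a scratch repository, something like the following could be used. It is a sketch with several assumptions: a temporary directory, a placeholder identity, a first branch explicitly named master to match the text, and two hypothetical commits on some-feature.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the text
git config user.name "John Smith"
git config user.email john@example.com

echo "base" > hello.py
git add hello.py
git commit -q -m "Initial import"

# Create the feature branch and add two commits to it.
git checkout -q -b some-feature
echo "step 1" >> hello.py
git commit -q -a -m "Start feature"
echo "step 2" >> hello.py
git commit -q -a -m "Finish feature"

# Lists the two feature commits; "Initial import" is shared, so it is omitted.
git log --oneline master..some-feature
```

Swapping the two names (some-feature..master) would list commits that exist only on master, which in this sketch is none.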


Viewing old commits

git checkout
The git checkout command serves three distinct functions: checking out files, checking out commits,
and checking out branches. In this module, we’re only concerned with the first two configurations.

Checking out a commit makes the entire working directory match that commit. This can be used to
view an old state of your project without altering your current state in any way. Checking out a file
lets you see an old version of that particular file, leaving the rest of your working directory
untouched.

Usage

git checkout master

Return to the master branch. Branches are covered in depth in the next module, but for now, you
can just think of this as a way to get back to the “current” state of the project.

git checkout <commit> <file>

Check out a previous version of a file. This turns the <file> that resides in the working directory into
an exact copy of the one from <commit> and adds it to the staging area.

git checkout <commit>

Update all files in the working directory to match the specified commit. You can use either a commit
hash or a tag as the <commit> argument. This will put you in a detached HEAD state.

Discussion
The whole idea behind any version control system is to store “safe” copies of a project so that you
never have to worry about irreparably breaking your code base. Once you’ve built up a project
history, git checkout is an easy way to “load” any of these saved snapshots onto your development
machine.

Checking out an old commit is a read-only operation. It’s impossible to harm your repository while
viewing an old revision. The “current” state of your project remains untouched in the master branch
(see the Branches Module for details). During the normal course of development, the HEAD usually
points to master or some other local branch, but when you check out a previous commit, HEAD no
longer points to a branch—it points directly to a commit. This is called a “detached HEAD” state, and
it can be visualized as the following:
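
From the command line, the same state can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory, a placeholder identity, and a first branch explicitly named master as in the text.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the text
git config user.name "John Smith"
git config user.email john@example.com

echo "v1" > hello.py
git add hello.py
git commit -q -m "Create hello.py"
echo "v2" > hello.py
git commit -q -a -m "Make some important changes to hello.py"

# On a branch, HEAD is a symbolic reference to that branch.
git symbolic-ref --short HEAD          # prints: master

# Check out the older commit directly.
git checkout -q HEAD~1

# Now HEAD holds a commit ID, so asking for a branch name fails.
if git symbolic-ref -q HEAD >/dev/null; then
  state="attached"
else
  state="detached"
fi
echo "HEAD is $state"

# Return to the branch to re-attach HEAD.
git checkout -q master
```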

On the other hand, checking out an old file does affect the current state of your repository. You can
re-commit the old version in a new snapshot as you would any other file. So, in effect, this usage of
git checkout serves as a way to revert back to an old version of an individual file.

Example

Viewing an Old Revision


This example assumes that you’ve started developing a crazy experiment, but you’re not sure if you
want to keep it or not. To help you decide, you want to take a look at the state of the project before
you started your experiment. First, you’ll need to find the ID of the revision you want to see.

git log --oneline

Let’s say your project history looks something like the following:

b7119f2 Continue doing crazy things
872fa7e Try something crazy
a1e8fb5 Make some important changes to hello.py
435b61d Create hello.py
9773e52 Initial import

You can use git checkout to view the “Make some important changes to hello.py” commit as follows:

git checkout a1e8fb5

This makes your working directory match the exact state of the a1e8fb5 commit. You can look at
files, compile the project, run tests, and even edit files without worrying about losing the current
state of the project. Nothing you do here is recorded on any branch. To continue developing,
you need to get back to the “current” state of your project:

git checkout master

This assumes that you're developing on the default master branch, which will be thoroughly
discussed in the Branches Module.

Once you’re back in the master branch, you can use either git revert or git reset to undo any
undesired changes.

Checking Out a File


If you’re only interested in a single file, you can also use git checkout to fetch an old version of it. For
example, if you only wanted to see the hello.py file from the old commit, you could use the following
command:

git checkout a1e8fb5 hello.py

Remember, unlike checking out a commit, this does affect the current state of your project. The old
file revision will show up as a “Change to be committed,” giving you the opportunity to revert back
to the previous version of the file. If you decide you don’t want to keep the old version, you can
check out the most recent version with the following:

git checkout HEAD hello.py
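
The staging behavior of a file checkout can be confirmed on a scratch repository. This is a sketch under stated assumptions: a temporary directory, a placeholder identity, and a hypothetical hello.py with two committed versions.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

echo "v1" > hello.py
git add hello.py
git commit -q -m "Create hello.py"
echo "v2" > hello.py
git commit -q -a -m "Make some important changes to hello.py"
old=$(git rev-parse HEAD~1)

# Pull the older version of one file into the working directory.
git checkout "$old" hello.py
cat hello.py                  # v1

# The old content is already staged ("M" in the first status column).
staged=$(git status --porcelain)
echo "$staged"

# Changed your mind? Restore the version from the latest commit.
git checkout HEAD hello.py
restored=$(cat hello.py)
echo "$restored"
```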


Undoing Changes

This tutorial provides all of the necessary skills to work with previous revisions of a software project.
First, it shows you how to explore old commits, then it explains the difference between reverting
public commits in the project history vs. resetting unpublished changes on your local machine.

git revert
The git revert command undoes a committed snapshot. But, instead of removing the commit from
the project history, it figures out how to undo the changes introduced by the commit and appends
a new commit with the resulting content. This prevents Git from losing history, which is important for
the integrity of your revision history and for reliable collaboration.

Usage

git revert <commit>

Generate a new commit that undoes all of the changes introduced in <commit>, then apply it to the
current branch.

Discussion
Reverting should be used when you want to undo the effect of an entire commit in your project history. This
can be useful, for example, if you’re tracking down a bug and find that it was introduced by a single
commit. Instead of manually going in, fixing it, and committing a new snapshot, you can use git
revert to automatically do all of this for you.

Reverting vs. Resetting


It's important to understand that git revert undoes a single commit—it does not “revert” back to the
previous state of a project by removing all subsequent commits. In Git, this is actually called a reset,
not a revert.

Reverting has two important advantages over resetting. First, it doesn’t change the project history,
which makes it a “safe” operation for commits that have already been published to a shared
repository. For details about why altering shared history is dangerous, please see the git reset page.

Second, git revert is able to target an individual commit at an arbitrary point in the history,
whereas git reset can only work backwards from the current commit. For example, if you wanted to
undo an old commit with git reset, you would have to remove all of the commits that occurred after
the target commit, remove it, then re-commit all of the subsequent commits. Needless to say, this is
not an elegant undo solution.

Example
The following example is a simple demonstration of git revert. It commits a snapshot, then
immediately undoes it with a revert.

# Edit some tracked files

# Commit a snapshot (-a stages the edits to tracked files first)
git commit -a -m "Make some changes that will be undone"

# Revert the commit we just created
git revert HEAD

This can be visualized as the following:


Note that the 4th commit is still in the project history after the revert. Instead of deleting it, git
revert added a new commit to undo its changes. As a result, the 3rd and 5th commits represent the
exact same code base, and the 4th commit is still in our history just in case we want to go back to it
down the road.
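
The claim that the commits on either side of the undone one hold the same code can be verified by comparing their trees. The sketch below assumes a throwaway repository, a placeholder identity, and a hypothetical hello.py. It uses --no-edit so that git revert accepts its default message instead of opening an editor.

```shell
# Assumption: a throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Smith"
git config user.email john@example.com

echo "v1" > hello.py
git add hello.py
git commit -q -m "Create hello.py"
echo "v2" > hello.py
git commit -q -a -m "Make some changes that will be undone"

# Revert the latest commit; --no-edit accepts the default message.
git revert --no-edit HEAD >/dev/null

# History grew by one commit instead of shrinking.
git log --oneline

# The snapshot before the undone commit and the one after the revert match.
before=$(git rev-parse 'HEAD~2^{tree}')
after=$(git rev-parse 'HEAD^{tree}')
echo "before: $before"
echo "after:  $after"
```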

git reset
If git revert is a “safe” way to undo changes, you can think of git reset as the dangerous method.
When you undo with git reset (and the commits are no longer referenced by any ref or the reflog),
there is no way to retrieve the original copy; it is a permanent undo. Care must be taken when using
this tool, as it’s one of the only Git commands that has the potential to lose your work.

Like git checkout, git reset is a versatile command with many configurations. It can be used to
remove committed snapshots, although it’s more often used to undo changes in the staging area and
the working directory. In either case, it should only be used to undo local changes—you should
never reset snapshots that have been shared with other developers.

Usage

git reset <file>

Remove the specified file from the staging area, but leave the working directory unchanged. This
unstages a file without overwriting any changes.

git reset

Reset the staging area to match the most recent commit, but leave the working directory
unchanged. This unstages all files without overwriting any changes, giving you the opportunity to re-
build the staged snapshot from scratch.

git reset --hard


Reset the staging area and the working directory to match the most recent commit. In addition to
unstaging changes, the --hard flag tells Git to overwrite all changes in the working directory, too. Put
another way: this obliterates all uncommitted changes, so make sure you really want to throw away
your local developments before using it.

git reset <commit>

Move the current branch tip backward to <commit>, reset the staging area to match, but leave the
working directory alone. All changes made since <commit> will reside in the working directory,
which lets you re-commit the project history using cleaner, more atomic snapshots.

git reset --hard <commit>

Move the current branch tip backward to <commit> and reset both the staging area and the working
directory to match. This obliterates not only the uncommitted changes, but all commits
after <commit>, as well.
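
The difference between git reset <commit> and git reset --hard <commit> can be checked in a throwaway repository. The following is a minimal sketch, assuming a POSIX shell and a scratch directory from mktemp; the file name notes.txt and the commit messages are made up for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # use the tutorial's branch name
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Three commits, each appending one line to notes.txt (hypothetical file)
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# git reset <commit>: move the branch tip back, leave the working directory alone
git reset -q HEAD~1
mixed_count=$(git rev-list --count HEAD)       # commits left on the branch
mixed_lines=$(wc -l < notes.txt | tr -d ' ')   # lines still present in the file

# git reset --hard <commit>: move the tip back and overwrite the working directory
git reset -q --hard HEAD~1
hard_count=$(git rev-list --count HEAD)
hard_lines=$(wc -l < notes.txt | tr -d ' ')

echo "after git reset:        $mixed_count commits, $mixed_lines lines in notes.txt"
echo "after git reset --hard: $hard_count commits, $hard_lines lines in notes.txt"
```

After the first reset the file still holds the edits from the removed commit; after the second, both the commit and its edits are gone from the working directory.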

Discussion
All of the above invocations are used to remove changes from a repository. Without the --hard flag,
git reset is a way to clean up a repository by unstaging changes or uncommitting a series of
snapshots and re-building them from scratch. The --hard flag comes in handy when an experiment
has gone horribly wrong and you need a clean slate to work with.

Whereas reverting is designed to safely undo a public commit, git reset is designed to
undo local changes. Because of their distinct goals, the two commands are implemented differently:
resetting completely removes a changeset, whereas reverting maintains the original changeset and
uses a new commit to apply the undo.

Don’t Reset Public History


You should never use git reset <commit> when any snapshots after <commit> have been pushed to a
public repository. After publishing a commit, you have to assume that other developers are reliant
upon it.

Removing a commit that other team members have continued developing poses serious problems
for collaboration. When they try to sync up with your repository, it will look like a chunk of the
project history abruptly disappeared. The sequence below demonstrates what happens when you
try to reset a public commit. The origin/master branch is the central repository’s version of your
local master branch.

As soon as you add new commits after the reset, Git will think that your local history has diverged
from origin/master, and the merge commit required to synchronize your repositories is likely to
confuse and frustrate your team.

The point is, make sure that you’re using git reset <commit> on a local experiment that went wrong,
not on published changes. If you need to fix a public commit, the git revert command was
designed specifically for this purpose.

Examples

Unstaging a File
The git reset command is frequently encountered while preparing the staged snapshot. The next
example assumes you have two files called hello.py and main.py that you’ve already added to the
repository.

# Edit both hello.py and main.py

# Stage everything in the current directory
git add .

# Realize that the changes in hello.py and main.py
# should be committed in different snapshots

# Unstage main.py
git reset main.py

# Commit only hello.py
git commit -m "Make some changes to hello.py"

# Commit main.py in a separate snapshot
git add main.py
git commit -m "Edit main.py"

As you can see, git reset helps you keep your commits highly-focused by letting you unstage changes
that aren’t related to the next commit.
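
To confirm what is staged at each step, git diff --cached --name-only lists the files in the staged snapshot. The sketch below replays the unstaging step in a scratch repository; the file contents are placeholders, and a POSIX shell with mktemp is assumed.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Both files are already tracked, as the example assumes
echo 'print("hello")' > hello.py
echo 'print("main")' > main.py
git add .
git commit -q -m "Add hello.py and main.py"

# Edit both files and stage everything
echo '# tweak' >> hello.py
echo '# tweak' >> main.py
git add .

# Unstage main.py, then list what is still staged
git reset -q main.py
staged=$(git diff --cached --name-only)
echo "staged: $staged"

# main.py keeps its edit in the working directory, it is just no longer staged
unstaged=$(git diff --name-only)
echo "modified but unstaged: $unstaged"
```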

Removing Local Commits


The next example shows a more advanced use case. It demonstrates what happens when you’ve
been working on a new experiment for a while, but decide to completely throw it away after
committing a few snapshots.

# Create a new file called `foo.py` and add some code to it

# Commit it to the project history
git add foo.py
git commit -m "Start developing a crazy feature"

# Edit `foo.py` again and change some other tracked files, too

# Commit another snapshot
git commit -a -m "Continue my crazy feature"

# Decide to scrap the feature and remove the associated commits
git reset --hard HEAD~2

The git reset --hard HEAD~2 command moves the current branch backward by two commits, effectively
removing the two snapshots we just created from the project history. Remember that this kind of
reset should only be used on unpublished commits. Never perform the above operation if you’ve
already pushed your commits to a shared repository.

git clean
The git clean command removes untracked files from your working directory. This is really more of a
convenience command, since it’s trivial to see which files are untracked with git status and remove
them manually. Like an ordinary rm command, git clean is not undoable, so make sure you really
want to delete the untracked files before you run it.

The git clean command is often executed in conjunction with git reset --hard. Remember that
resetting only affects tracked files, so a separate command is required for cleaning up untracked
ones. Combined, these two commands let you return the working directory to the exact state of a
particular commit.

Usage

git clean -n
Perform a “dry run” of git clean. This will show you which files are going to be removed without
actually doing it.

git clean -f

Remove untracked files from the current directory. The -f (force) flag is required unless
the clean.requireForce configuration option is set to false (it's true by default). This will not remove
untracked folders or files specified by .gitignore.

git clean -f <path>

Remove untracked files, but limit the operation to the specified path.

git clean -df

Remove untracked files and untracked directories from the current directory.

git clean -xf

Remove untracked files from the current directory as well as any files that Git usually ignores.

Discussion
The git reset --hard and git clean -f commands are your best friends after you’ve made some
embarrassing developments in your local repository and want to burn the evidence. Running both of
them will make your working directory match the most recent commit, giving you a clean slate to
work with.

The git clean command can also be useful for cleaning up the working directory after a build. For
example, it can easily remove the .o and .exe binaries generated by a C compiler. This is occasionally
a necessary step before packaging a project for release. The -x option is particularly convenient for
this purpose.

Keep in mind that, along with git reset, git clean is one of the only Git commands that has the
potential to permanently delete your work (here, untracked files that Git has no copy of), so be
careful with it. In fact, it’s so easy to lose important additions that the Git maintainers require the
-f flag for even the most basic operations. This prevents you from accidentally deleting everything
with a naive git clean call.

Example
The following example obliterates all changes in the working directory, including new files that have
been added. It assumes you’ve already committed a few snapshots and are experimenting with
some new developments.

# Edit some existing files
# Add some new files
# Realize you have no idea what you're doing

# Undo changes in tracked files
git reset --hard

# Remove untracked files
git clean -df

After running this reset/clean sequence, the working directory and the staging area will look exactly
like the most recent commit, and git status will report a clean working directory. You're now ready
to begin again.

Note that, unlike the second example in git reset, the new files were not added to the repository.
As a result, they could not be affected by git reset --hard, and git clean was required to delete them.
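
The reset/clean sequence can be rehearsed safely with a dry run first. This is a minimal sketch in a scratch repository, assuming a POSIX shell; tracked.txt, scratch.txt and the build directory are hypothetical names.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# Edit a tracked file and add untracked files (one inside a new directory)
echo "experiment" >> tracked.txt
echo "scratch" > scratch.txt
mkdir build
echo "output" > build/out.o

# Dry run: list what would be removed without deleting anything
preview=$(git clean -dn)
echo "$preview"

# Undo changes in tracked files, then remove untracked files and directories
git reset -q --hard
git clean -qdf

remaining=$(git status --porcelain)   # empty when the working directory is clean
echo "status after cleanup: '${remaining}'"
```

Running git clean -n (or -dn) before the real command is a cheap way to avoid deleting a file you meant to keep.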

Rewriting history

Intro
Git’s main job is to make sure you never lose a committed change. But, it’s also designed to give you
total control over your development workflow. This includes letting you define exactly what your
project history looks like; however, it also creates the potential to lose commits. Git provides its
history-rewriting commands under the disclaimer that using them may result in lost content.

This tutorial discusses some of the most common reasons for overwriting committed snapshots and
shows you how to avoid the pitfalls of doing so.

git commit --amend


The git commit --amend command is a convenient way to fix up the most recent commit. It lets you
combine staged changes with the previous commit instead of committing it as an entirely new
snapshot. It can also be used to simply edit the previous commit message without changing its
snapshot.

But, amending doesn’t just alter the most recent commit; it replaces it entirely. To Git, it will look
like a brand new commit, which is visualized with an asterisk (*) in the diagram above. It’s important
to keep this in mind when working with public repositories.

Usage

git commit --amend

Combine the staged changes with the previous commit and replace the previous commit with the
resulting snapshot. Running this when there is nothing staged lets you edit the previous commit’s
message without altering its snapshot.

Discussion
Premature commits happen all the time in the course of your everyday development. It’s easy to
forget to stage a file or to format your commit message the wrong way. The --amend flag is a
convenient way to fix these little mistakes.

Don’t Amend Public Commits


On the git reset page, we talked about how you should never reset commits that have been shared
with other developers. The same goes for amending: never amend commits that have been pushed
to a public repository.

Amended commits are actually entirely new commits, and the previous commit is removed from the
project history. This has the same consequences as resetting a public snapshot. If you amend a
commit that other developers have based their work on, it will look like the basis of their work
vanished from the project history. This is a confusing situation for developers to be in and it’s
complicated to recover from.

Example
The following example demonstrates a common scenario in Git-based development. We edit a few
files that we would like to commit in a single snapshot, but then we forget to add one of the files the
first time around. Fixing the error is simply a matter of staging the other file and committing with the
--amend flag:

# Edit hello.py and main.py
git add hello.py
git commit

# Realize you forgot to add the changes from main.py
git add main.py
git commit --amend --no-edit

Including the --no-edit flag amends the commit without changing its commit message, so no editor
is opened. If you leave the flag out, the editor will be populated with the message from the previous
commit; you can change it if necessary, otherwise just save and close the file as usual. The resulting
commit will replace the incomplete one, and it will look like we committed the changes to hello.py
and main.py in a single snapshot.
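
That an amended commit is a replacement rather than an edit can be seen by comparing commit IDs before and after. The sketch below assumes a POSIX shell and a scratch repository; the file contents and commit messages are placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# Commit hello.py but forget main.py
echo 'print("hello")' > hello.py
echo 'print("main")' > main.py
git add hello.py
git commit -q -m "Add hello.py and main.py"
before=$(git rev-parse HEAD)

# Stage the forgotten file and fold it into the previous commit
git add main.py
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)                            # still two commits
files=$(git diff-tree --no-commit-id --name-only -r HEAD)     # files in the amended commit

echo "commit before amend: $before"
echo "commit after amend:  $after"
echo "commits on branch:   $count"
echo "files in last commit:"
echo "$files"
```

The commit count is unchanged, but the ID differs, which is why amending a commit that others have already pulled causes the problems described above.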

git rebase
Rebasing is the process of moving a branch to a new base commit. The general process can be
visualized as the following:

From a content perspective, rebasing really is just moving a branch from one commit to another. But
internally, Git accomplishes this by creating new commits and applying them to the specified base—
it’s literally rewriting your project history. It’s very important to understand that, even though the
branch looks the same, it’s composed of entirely new commits.

Usage

git rebase <base>


Rebase the current branch onto <base>, which can be any kind of commit reference (an ID, a branch
name, a tag, or a relative reference to HEAD).

Discussion
The primary reason for rebasing is to maintain a linear project history. For example, consider a
situation where the master branch has progressed since you started working on a feature:

[Figure: Git Rebase Branch onto Master]

You have two options for integrating your feature into the master branch: merging directly or
rebasing and then merging. The former option results in a 3-way merge and a merge commit, while
the latter results in a fast-forward merge and a perfectly linear history. The following diagram
demonstrates how rebasing onto master facilitates a fast-forward merge.

[Figure: Fast-forward merge]

Rebasing is a common way to integrate upstream changes into your local repository. Pulling in
upstream changes with git merge results in a superfluous merge commit every time you want to see
how the project has progressed. On the other hand, rebasing is like saying, “I want to base my
changes on what everybody has already done.”

Don’t Rebase Public History


As we’ve discussed with git commit --amend and git reset, you should never rebase commits that
have been pushed to a public repository. The rebase would replace the old commits with new ones,
and it would look like that part of your project history abruptly vanished.

Examples
The example below combines git rebase with git merge to maintain a linear project history. This is a
quick and easy way to ensure that your merges will be fast-forwarded.

# Start a new feature
git checkout -b new-feature master
# Edit files
git commit -a -m "Start developing a feature"

In the middle of our feature, we realize there’s a security hole in our project:

# Create a hotfix branch based off of master
git checkout -b hotfix master
# Edit files
git commit -a -m "Fix security hole"
# Merge back into master
git checkout master
git merge hotfix
git branch -d hotfix

After merging the hotfix into master, we have a forked project history. Instead of a plain git merge,
we’ll integrate the feature branch with a rebase to maintain a linear history:

git checkout new-feature
git rebase master

This moves new-feature to the tip of master, which lets us do a standard fast-forward merge from
master:

git checkout master
git merge new-feature
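
The whole sequence can be run end to end in a scratch repository to confirm that the final merge is a fast-forward and that no merge commit appears. This is a sketch under stated assumptions: a POSIX shell, mktemp, and made-up file names chosen so that the two branches never touch the same file. New files are staged with git add, since git commit -a only picks up files that are already tracked.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # use the tutorial's branch name
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Start a new feature
git checkout -q -b new-feature master
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Start developing a feature"

# Hotfix branch based off of master, merged straight back
git checkout -q -b hotfix master
echo "patched" > hotfix.txt
git add hotfix.txt
git commit -q -m "Fix security hole"
git checkout -q master
git merge -q hotfix
git branch -q -d hotfix

# Rebase the feature onto the updated master, then fast-forward master
git checkout -q new-feature
git rebase -q master
git checkout -q master
git merge -q --ff-only new-feature    # fails if a fast-forward is not possible

merges=$(git rev-list --merges --count master)
total=$(git rev-list --count master)
tip_subject=$(git log -1 --format=%s master)
git --no-pager log --oneline master
```

Passing --ff-only is optional; it is used here only so the script stops if the history is not linear.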

git rebase -i
Running git rebase with the -i flag begins an interactive rebasing session. Instead of blindly moving
all of the commits to the new base, interactive rebasing gives you the opportunity to alter individual
commits in the process. This lets you clean up history by removing, splitting, and altering an existing
series of commits. It’s like git commit --amend on steroids.
Usage

git rebase -i <base>

Rebase the current branch onto <base>, but use an interactive rebasing session. This opens an editor
where you can enter commands (described below) for each commit to be rebased. These commands
determine how individual commits will be transferred to the new base. You can also reorder the
commit listing to change the order of the commits themselves.

Discussion
Interactive rebasing gives you complete control over what your project history looks like. This affords
a lot of freedom to developers, as it lets them commit a “messy” history while they’re focused on
writing code, then go back and clean it up after the fact.

Most developers like to use an interactive rebase to polish a feature branch before merging it into
the main code base. This gives them the opportunity to squash insignificant commits, delete
obsolete ones, and make sure everything else is in order before committing to the “official” project
history. To everybody else, it will look like the entire feature was developed in a single series of well-
planned commits.

Examples
The example found below is an interactive adaptation of the one from the non-interactive git rebase
page.

# Start a new feature
git checkout -b new-feature master
# Edit files
git commit -a -m "Start developing a feature"
# Edit more files
git commit -a -m "Fix something from the previous commit"

# Add a commit directly to master
git checkout master
# Edit files
git commit -a -m "Fix security hole"

# Begin an interactive rebasing session
git checkout new-feature
git rebase -i master

The last command will open an editor populated with the two commits from new-feature, along
with some instructions:

pick 32618c4 Start developing a feature
pick 62eed47 Fix something from the previous commit

You can change the pick commands before each commit to determine how it gets moved during the
rebase. In our case, let’s just combine the two commits with a squash command:

pick 32618c4 Start developing a feature
squash 62eed47 Fix something from the previous commit

Save and close the editor to begin the rebase. This will open another editor asking for the commit
message for the combined snapshot. After defining the commit message, the rebase is complete and
you should be able to see the squashed commit in your git log output. This entire process can be
visualized as follows:

Note that the squashed commit has a different ID than either of the original commits, which tells us
that it is indeed a brand new commit.

Finally, you can do a fast-forward merge to integrate the polished feature branch into the main code
base:

git checkout master
git merge new-feature

The real power of interactive rebasing can be seen in the history of the resulting master branch: the
extra 62eed47 commit is nowhere to be found. To everybody else, it looks like you’re a brilliant
developer who implemented the new-feature with the perfect number of commits the first time
around. This is how interactive rebasing can keep a project’s history clean and meaningful.
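
For experimenting without an editor, the same squash can be scripted. The sketch below is one possible approach, not part of the workflow described above: it uses Git's GIT_SEQUENCE_EDITOR environment variable to rewrite the todo list with sed, and GIT_EDITOR=true to accept the default combined commit message. The file names are made up, and sed -i.bak is assumed to be available (it is accepted by both GNU and BSD sed).

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # use the tutorial's branch name
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Two commits on the feature branch
git checkout -q -b new-feature master
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Start developing a feature"
echo "fix" >> feature.txt
git commit -q -a -m "Fix something from the previous commit"

# One commit directly on master (different file, so no conflicts)
git checkout -q master
echo "patched" > security.txt
git add security.txt
git commit -q -m "Fix security hole"

# Interactive rebase with scripted "editors":
# the sed expression turns the second pick into squash in the todo list
git checkout -q new-feature
GIT_SEQUENCE_EDITOR='sed -i.bak -e "s/^pick \(.* Fix something\)/squash \1/"' \
GIT_EDITOR=true \
git rebase -q -i master

feature_commits=$(git rev-list --count master..new-feature)
squashed_subject=$(git log -1 --format=%s new-feature)
feature_lines=$(wc -l < feature.txt | tr -d ' ')
echo "commits on new-feature that are not on master: $feature_commits"
echo "subject of squashed commit: $squashed_subject"
```

Both original commits' changes survive in feature.txt, but only one commit remains on top of master.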

git reflog
Git keeps track of updates to the tip of branches using a mechanism called reflog. This allows you to
go back to changesets even though they are not referenced by any branch or tag. After rewriting
history, the reflog contains information about the old state of branches and allows you to go back to
that state if necessary.
Usage

git reflog

Show the reflog for the local repository.

git reflog --relative-date


Show the reflog with relative date information (e.g. 2 weeks ago).

Discussion
Every time the current HEAD gets updated (by switching branches, pulling in new changes, rewriting
history or simply by adding new commits) a new entry will be added to the reflog.

Example
To understand git reflog, let's run through an example.

0a2e358 HEAD@{0}: reset: moving to HEAD~2
0254ea7 HEAD@{1}: checkout: moving from 2.2 to master
c10f740 HEAD@{2}: checkout: moving from master to 2.2

The reflog above shows a checkout from master to the 2.2 branch and back. From there, there's a
hard reset to an older commit. The latest activity is represented at the top labeled HEAD@{0}.

If it turns out that you accidentally moved back, the reflog will contain the commit master pointed
to (0254ea7) before you accidentally dropped 2 commits.

git reset --hard 0254ea7

Using git reset it is then possible to change master back to the commit it was before. This provides a
safety net in case history was accidentally changed.

It's important to note that the reflog only provides a safety net if changes have been committed to
your local repository, and that it only tracks movements of HEAD and branch tips. Changes that were
never committed cannot be recovered this way.

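
The recovery can be tried in a scratch repository. Instead of copying a hash from the reflog output, this sketch uses the HEAD@{1} notation, which refers to where HEAD pointed one movement ago. A POSIX shell and mktemp are assumed, and the file name is a placeholder.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Three commits to experiment with
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done
tip=$(git rev-parse HEAD)

# Accidentally drop two commits
git reset -q --hard HEAD~2
git --no-pager reflog

# HEAD@{1} is the position of HEAD just before the reset
previous=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$previous"
restored=$(git rev-parse HEAD)

echo "original tip: $tip"
echo "restored tip: $restored"
```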
Collaborating
Syncing

SVN uses a single central repository to serve as the communication hub for developers, and
collaboration takes place by passing changesets between the developers’ working copies and the
central repository. This is different from Git’s collaboration model, which gives every developer their
own copy of the repository, complete with its own local history and branch structure. Users typically
need to share a series of commits rather than a single changeset. Instead of committing a changeset
from a working copy to the central repository, Git lets you share entire branches between
repositories.

The commands presented below let you manage connections with other repositories, publish local
history by “pushing” branches to other repositories, and see what others have contributed by
“pulling” branches into your local repository.

git remote
The git remote command lets you create, view, and delete connections to other repositories.
Remote connections are more like bookmarks than direct links into other repositories. Instead
of providing real-time access to another repository, they serve as convenient names that can be
used to reference a not-so-convenient URL.

For example, the following diagram shows two remote connections from your repo into the central
repo and another developer’s repo. Instead of referencing them by their full URLs, you can pass the
origin and john shortcuts to other Git commands.
Usage
git remote

List the remote connections you have to other repositories.

git remote -v

Same as the above command, but include the URL of each connection.

git remote add <name> <url>

Create a new connection to a remote repository. After adding a remote, you’ll be able to use
<name> as a convenient shortcut for <url> in other Git commands.

git remote rm <name>

Remove the connection to the remote repository called <name>.

git remote rename <old-name> <new-name>

Rename a remote connection from <old-name> to <new-name>.

Discussion
Git is designed to give each developer an entirely isolated development environment. This means
that information is not automatically passed back and forth between repositories. Instead,
developers need to manually pull upstream commits into their local repository or manually push
their local commits back up to the central repository. The git remote command is really just an easier
way to pass URLs to these “sharing” commands.
The origin Remote
When you clone a repository with git clone, it automatically creates a remote connection called
origin pointing back to the cloned repository. This is useful for developers creating a local copy of a
central repository, since it provides an easy way to pull upstream changes or publish local commits.
This behavior is also why most Git-based projects call their central repository origin.

Repository URLs
Git supports many ways to reference a remote repository. Two of the easiest ways to access a
remote repo are via the HTTP and the SSH protocols. HTTP is an easy way to allow anonymous,
read-only access to a repository. For example:

http://host/path/to/repo.git

But, you generally can’t push commits to an anonymous HTTP address (you wouldn’t want to allow
anonymous pushes anyways); pushing over HTTP or HTTPS only works when the server is set up for
authenticated access. For read-write access, SSH is a common choice:

ssh://user@host/path/to/repo.git

You’ll need a valid SSH account on the host machine, but other than that, Git supports authenticated
access via SSH out of the box.

Examples
In addition to origin, it’s often convenient to have a connection to your teammates’ repositories. For
example, if your co-worker, John, maintained a publicly accessible repository on
dev.example.com/john.git, you could add a connection as follows:

git remote add john http://dev.example.com/john.git

Having this kind of access to individual developers’ repositories makes it possible to collaborate
outside of the central repository. This can be very useful for small teams working on a large project.
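
Because a remote is only a named URL stored in the repository's configuration, the commands above can be tried without any network access. The following sketch uses two empty bare repositories on the local disk as stand-ins for the central repo and John's repo; the paths and the name johnny are made up for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Local stand-ins for the central repository and a teammate's repository
git init -q --bare central.git
git init -q --bare john.git

git init -q mine
cd mine

# Create two connections and list them with their URLs
git remote add origin "$work/central.git"
git remote add john "$work/john.git"
git remote -v

# Rename one connection, then remove it again
git remote rename john johnny
git remote rm johnny

remotes=$(git remote)
origin_url=$(git config --get remote.origin.url)   # the bookmark is a plain config entry
echo "remaining remotes: $remotes"
echo "origin points to:  $origin_url"
```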

git fetch
The git fetch command imports commits from a remote repository into your local repo. The resulting
commits are stored as remote branches instead of the normal local branches that we’ve been
working with. This gives you a chance to review changes before integrating them into your copy of
the project.

Usage

git fetch <remote>

Fetch all of the branches from the repository. This also downloads all of the required commits and
files from the other repository.

git fetch <remote> <branch>


Same as the above command, but only fetch the specified branch.

Discussion
Fetching is what you do when you want to see what everybody else has been working on. Since
fetched content is represented as a remote branch, it has absolutely no effect on your local
development work. This makes fetching a safe way to review commits before integrating them with
your local repository. It’s similar to svn update in that it lets you see how the central history has
progressed, but it doesn’t force you to actually merge the changes into your repository.

Remote Branches
Remote branches are just like local branches, except they represent commits from somebody else’s
repository. You can check out a remote branch just like a local one, but this puts you in a detached
HEAD state (just like checking out an old commit). You can think of them as read-only branches. To
view your remote branches, simply pass the -r flag to the git branch command. Remote branches are
prefixed by the remote they belong to so that you don’t mix them up with local branches. For
example, the next code snippet shows the branches you might see after fetching from the origin
remote:

git branch -r
# origin/master
# origin/develop
# origin/some-feature

Again, you can inspect these branches with the usual git checkout and git log commands. If you
approve the changes a remote branch contains, you can merge it into a local branch with a normal
git merge. So, unlike SVN, synchronizing your local repository with a remote repository is actually a
two-step process: fetch, then merge. The git pull command is a convenient shortcut for this process.

Examples
This example walks through the typical workflow for synchronizing your local repository with the
central repository's master branch.

git fetch origin

This will display the branches that were downloaded:

a1e8fb5..45e66a4 master -> origin/master
a1e8fb5..9e8ab1c develop -> origin/develop
* [new branch] some-feature -> origin/some-feature

The commits from these new remote branches are shown as squares instead of circles in the
diagram below. As you can see, git fetch gives you access to the entire branch structure of another
repository.

To see what commits have been added to the upstream master, you can run a git log using
origin/master as a filter:

git log --oneline master..origin/master

To review the changes before merging them into your local master branch, check out master and
inspect origin/master:

git checkout master
git log origin/master

If you approve the changes, merge them with git merge origin/master:

git merge origin/master

The origin/master and master branches now point to the same commit, and you are synchronized
with the upstream developments.
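
The fetch-then-merge workflow can be reproduced locally with a bare repository standing in for the central server. This is a minimal sketch, assuming a POSIX shell; the directory names central.git, teammate and mine, as well as the file shared.txt, are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical central repository on the local disk
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master

# A teammate publishes the first commit
git init -q teammate
cd teammate
git symbolic-ref HEAD refs/heads/master
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "one" > shared.txt
git add shared.txt
git commit -q -m "First commit"
git remote add origin "$work/central.git"
git push -q origin master

# You clone the central repository, then the teammate pushes again
cd "$work"
git clone -q central.git mine
cd teammate
echo "two" >> shared.txt
git commit -q -a -m "Second commit"
git push -q origin master

# Step 1: fetch. Only origin/master moves; your local master is untouched
cd "$work/mine"
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
git --no-pager log --oneline master..origin/master

# Step 2: merge the reviewed commits into the local branch
git merge -q origin/master
local_tip=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)
echo "commits fetched but not yet merged: $behind"
```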

git pull
Merging upstream changes into your local repository is a common task in Git-based collaboration
workflows. We already know how to do this with git fetch followed by git merge, but git pull rolls
this into a single command.

Usage

git pull <remote>

Fetch the specified remote’s copy of the current branch and immediately merge it into the local
copy. This is the same as git fetch <remote> followed by git merge <remote>/<current-branch>.

git pull --rebase <remote>

Same as the above command, but instead of using git merge to integrate the remote branch with
the local one, use git rebase.

Discussion
You can think of git pull as Git's version of svn update. It’s an easy way to synchronize your local
repository with upstream changes. The following diagram explains each step of the pulling process.

You start out thinking your repository is synchronized, but then git fetch reveals that origin's version
of master has progressed since you last checked it. Then git merge immediately integrates the
remote master into the local one:

Pulling via Rebase


The --rebase option can be used to ensure a linear history by preventing unnecessary merge
commits. Many developers prefer rebasing over merging, since it’s like saying, “I want to put my
changes on top of what everybody else has done.” In this sense, using git pull with the --rebase flag
is even more like svn update than a plain git pull.

In fact, pulling with --rebase is such a common workflow that there is a dedicated configuration
option for it:

git config --global branch.autosetuprebase always

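After running that command, newly created tracking branches are configured so that git pull
integrates via git rebase instead of git merge. Branches that already exist are not affected; setting
the pull.rebase option to true applies the same behavior to every git pull.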

Examples
The following example demonstrates how to synchronize with the central repository's master
branch:

git checkout master
git pull --rebase origin

This simply moves your local changes onto the top of what everybody else has already contributed.
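
To see the effect on history, the sketch below sets up a local commit and an upstream commit that have diverged, then pulls with --rebase. It assumes a POSIX shell and uses a bare repository on disk as a stand-in for the central repository; all directory and file names are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical central repository on the local disk
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master

# A teammate publishes the first commit
git init -q teammate
cd teammate
git symbolic-ref HEAD refs/heads/master
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "one" > shared.txt
git add shared.txt
git commit -q -m "First commit"
git remote add origin "$work/central.git"
git push -q origin master

# You clone, then both sides add a commit (to different files)
cd "$work"
git clone -q central.git mine
cd teammate
echo "two" >> shared.txt
git commit -q -a -m "Upstream change"
git push -q origin master

cd "$work/mine"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "local" > local.txt
git add local.txt
git commit -q -m "Local change"

# Replay the local commit on top of the upstream commit
git pull -q --rebase origin

merges=$(git rev-list --merges --count HEAD)
top=$(git log -1 --format=%s)
below=$(git log -1 --format=%s HEAD~1)
git --no-pager log --oneline
```

A plain git pull in the same situation would have produced a merge commit instead of a linear history.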

git push
Pushing is how you transfer commits from your local repository to a remote repo. It's the
counterpart to git fetch, but whereas fetching imports commits into your local repository, pushing
exports commits to branches in the remote repository. This has the potential to overwrite changes,
so you need to be careful how you use it. These issues are discussed below.

Usage

git push <remote> <branch>

Push the specified branch to <remote>, along with all of the necessary commits and internal objects.
This creates a local branch in the destination repository. To prevent you from overwriting commits,
Git won’t let you push when it results in a non-fast-forward merge in the destination repository.

git push <remote> --force

Same as the above command, but force the push even if it results in a non-fast-forward merge. Do
not use the --force flag unless you’re absolutely sure you know what you’re doing.

git push <remote> --all

Push all of your local branches to the specified remote.

git push <remote> --tags

Tags are not automatically pushed when you push a branch or use the --all option. The --tags flag
sends all of your local tags to the remote repository.

Discussion
The most common use case for git push is to publish your local changes to a central repository. After
you’ve accumulated several local commits and are ready to share them with the rest of the team,
you (optionally) clean them up with an interactive rebase, then push them to the central repository.

The above diagram shows what happens when your local master has progressed past the central
repository’s master and you publish changes by running git push origin master. Notice how git push
is essentially the same as running git merge master from inside the remote repository.

Force Pushing
Git prevents you from overwriting the central repository’s history by refusing push requests when
they result in a non-fast-forward merge. So, if the remote history has diverged from your history,
you need to pull the remote branch and merge it into your local one, then try pushing again. This is
similar to how SVN makes you synchronize with the central repository via svn update before
committing a changeset.

The --force flag overrides this behavior and makes the remote repository’s branch match your local
one, deleting any upstream changes that may have occurred since you last pulled. The only time you
should ever need to force push is when you realize that the commits you just shared were not quite
right and you fixed them with a git commit --amend or an interactive rebase. However, you must be
absolutely certain that none of your teammates have pulled those commits before using the --force
option.

Only Push to Bare Repositories


In addition, you should only push to repositories that have been created with the --bare flag. Since
pushing messes with the remote branch structure, it’s important to never push to another
developer’s repository. But because bare repos don’t have a working directory, it’s impossible to
interrupt anybody’s developments.

Examples
The following example describes one of the standard methods for publishing local contributions to
the central repository. First, it makes sure your local master is up-to-date by fetching the central
repository’s copy and rebasing your changes on top of them. The interactive rebase is also a good
opportunity to clean up your commits before sharing them. Then, the git push command sends all of
the commits on your local master to the central repository.

git checkout master
git fetch origin master
git rebase -i origin/master
# Squash commits, fix up commit messages etc.
git push origin master

Since we already made sure the local master was up-to-date, this should result in a fast-forward
merge, and git push should not complain about any of the non-fast-forward issues discussed above.
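
The non-fast-forward rejection, and the fetch/rebase/push sequence that resolves it, can be reproduced with a bare repository on the local disk. This is a sketch under stated assumptions: a POSIX shell, made-up directory and file names, and a plain git rebase in place of the interactive one so that the script runs unattended.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical central repository (bare, as recommended above)
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/master

# A teammate publishes the first commit
git init -q teammate
cd teammate
git symbolic-ref HEAD refs/heads/master
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "one" > shared.txt
git add shared.txt
git commit -q -m "First commit"
git remote add origin "$work/central.git"
git push -q origin master

# You clone, then the teammate pushes another commit
cd "$work"
git clone -q central.git mine
cd teammate
echo "two" >> shared.txt
git commit -q -a -m "Upstream change"
git push -q origin master

# Your local commit is based on the older history
cd "$work/mine"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "local" > local.txt
git add local.txt
git commit -q -m "Local change"

# Pushing now would not be a fast-forward, so Git refuses
if git push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Bring the local branch up to date, then push again
git fetch -q origin
git rebase -q origin/master
git push -q origin master

central_tip=$(git --git-dir="$work/central.git" rev-parse master)
local_tip=$(git rev-parse master)
echo "first push: $first_push"
```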

Making a Pull Request

Pull requests are a feature that makes it easier for developers to collaborate using Bitbucket. They
provide a user-friendly web interface for discussing proposed changes before integrating them into
the official project.

In their simplest form, pull requests are a mechanism for a developer to notify team members that
they have completed a feature. Once their feature branch is ready, the developer files a pull request
via their Bitbucket account. This lets everybody involved know that they need to review the code
and merge it into the master branch.

But, the pull request is more than just a notification—it’s a dedicated forum for discussing the
proposed feature. If there are any problems with the changes, teammates can post feedback in the
pull request and even tweak the feature by pushing follow-up commits. All of this activity is tracked
directly inside of the pull request.

Compared to other collaboration models, this formal solution for sharing commits makes for a much
more streamlined workflow. SVN and Git can both automatically send notification emails with a
simple script; however, when it comes to discussing changes, developers typically have to rely on
email threads. This can become haphazard, especially when follow-up commits are involved. Pull
requests put all of this functionality into a friendly web interface right next to your Bitbucket
repositories.

Anatomy of a Pull Request


When you file a pull request, all you’re doing is requesting that another developer (e.g., the project
maintainer) pulls a branch from your repository into their repository. This means that you need to
provide 4 pieces of information to file a pull request: the source repository, the source branch, the
destination repository, and the destination branch.
Many of these values will be set to a sensible default by Bitbucket. However, depending on your
collaboration workflow, your team may need to specify different values. The above diagram shows a
pull request that asks to merge a feature branch into the official master branch, but there are many
other ways to use pull requests.

How it works
Pull requests can be used in conjunction with the Feature Branch Workflow, the Gitflow Workflow,
or the Forking Workflow. But a pull request requires either two distinct branches or two distinct
repositories, so they will not work with the Centralized Workflow. Using pull requests with each of
these workflows is slightly different, but the general process is as follows:

1. A developer creates the feature in a dedicated branch in their local repo.
2. The developer pushes the branch to a public Bitbucket repository.
3. The developer files a pull request via Bitbucket.
4. The rest of the team reviews the code, discusses it, and alters it.
5. The project maintainer merges the feature into the official repository and closes the pull
request.

The rest of this section describes how pull requests can be leveraged against different
collaboration workflows.

Feature Branch Workflow With Pull Requests


The Feature Branch Workflow uses a shared Bitbucket repository for managing collaboration, and
developers create features in isolated branches. But, instead of immediately merging them into
master, developers should open a pull request to initiate a discussion around the feature before it
gets integrated into the main codebase.
There is only one public repository in the Feature Branch Workflow, so the pull request’s destination
repository and the source repository will always be the same. Typically, the developer will specify
their feature branch as the source branch and the master branch as the destination branch.

After receiving the pull request, the project maintainer has to decide what to do. If the feature is
ready to go, they can simply merge it into master and close the pull request. But, if there are
problems with the proposed changes, they can post feedback in the pull request. Follow-up commits
will show up right next to the relevant comments.

It’s also possible to file a pull request for a feature that is incomplete. For example, if a developer is
having trouble implementing a particular requirement, they can file a pull request containing their
work-in-progress. Other developers can then provide suggestions inside of the pull request, or even
fix the problem themselves with additional commits.

Gitflow Workflow With Pull Requests


The Gitflow Workflow is similar to the Feature Branch Workflow, but defines a strict branching
model designed around the project release. Adding pull requests to the Gitflow Workflow gives
developers a convenient place to talk about a release branch or a maintenance branch while they’re
working on it.
The mechanics of pull requests in the Gitflow Workflow are the exact same as the previous section: a
developer simply files a pull request when a feature, release, or hotfix branch needs to be reviewed,
and the rest of the team will be notified via Bitbucket.

Features are generally merged into the develop branch, while release and hotfix branches are
merged into both develop and master. Pull requests can be used to formally manage all of these
merges.

Forking Workflow With Pull Requests


In the Forking Workflow, a developer pushes a completed feature to their own public repository
instead of a shared one. After that, they file a pull request to let the project maintainer know that it’s
ready for review.

The notification aspect of pull requests is particularly useful in this workflow because the project
maintainer has no way of knowing when another developer has added commits to their Bitbucket
repository.
Since each developer has their own public repository, the pull request’s source repository will differ
from its destination repository. The source repository is the developer’s public repository and the
source branch is the one that contains the proposed changes. If the developer is trying to merge the
feature into the main codebase, then the destination repository is the official project and the
destination branch is master.

Pull requests can also be used to collaborate with other developers outside of the official project.
For example, if a developer was working on a feature with a teammate, they could file a pull request
using the teammate’s Bitbucket repository for the destination instead of the official project. They
would then use the same feature branch for the source and destination branches.

The two developers could discuss and develop the feature inside of the pull request. When they’re
done, one of them would file another pull request asking to merge the feature into the official
master branch. This kind of flexibility makes pull requests a very powerful collaboration tool in the
Forking Workflow.

Example
The example below demonstrates how pull requests can be used in the Forking Workflow. It is
equally applicable to developers working in small teams and to a third-party developer contributing
to an open source project.

In the example, Mary is a developer, and John is the project maintainer. Both of them have their
own public Bitbucket repositories, and John’s contains the official project.

Mary forks the official project

To start working in the project, Mary first needs to fork John’s Bitbucket repository. She can do this
by signing in to Bitbucket, navigating to John’s repository, and clicking the Fork button.
After filling out the name and description for the forked repository, she will have a server-side copy
of the project.

Mary clones her Bitbucket repository

Next, Mary needs to clone the Bitbucket repository that she just forked. This will give her a working
copy of the project on her local machine. She can do this by running the following command:

git clone https://user@bitbucket.org/user/repo.git

Keep in mind that git clone automatically creates an origin remote that points back to Mary’s forked
repository.

Mary develops a new feature

Before she starts writing any code, Mary needs to create a new branch for the feature. This branch is
what she will use as the source branch of the pull request.

git checkout -b some-feature
# Edit some code
git commit -a -m "Add first draft of some feature"

Mary can use as many commits as she needs to create the feature. And, if the feature’s history is
messier than she would like, she can use an interactive rebase to remove or squash unnecessary
commits. For larger projects, cleaning up a feature’s history makes it much easier for the project
maintainer to see what’s going on in the pull request.
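
One way to tidy a feature branch without hand-editing the rebase to-do list is to record corrections as fixup commits and let an autosquash rebase fold them in. This is only a sketch of that alternative, not the procedure the example above prescribes; the file name, commit identity, and temporary directory are assumptions.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b some-feature
echo draft >> app.txt && git commit -q -a -m "Add first draft of some feature"
# Record a small correction as a fixup of the previous commit.
echo fix >> app.txt && git commit -q -a --fixup HEAD

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# generated to-do list unchanged instead of opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
feature_commits=$(git rev-list --count master..some-feature)
echo "commits on the feature branch: $feature_commits"
```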

Mary pushes the feature to her Bitbucket repository


After her feature is complete, Mary pushes the feature branch to her own Bitbucket repository (not
the official repository) with a simple git push:

git push origin some-feature

This makes her changes available to the project maintainer (or any collaborators who might need
access to them).

Mary creates the pull request

After Bitbucket has her feature branch, Mary can create the pull request through her Bitbucket
account by navigating to her forked repository and clicking the Pull request button in the top-right
corner. The resulting form automatically sets Mary’s repository as the source repository, and it asks
her to specify the source branch, the destination repository, and the destination branch.

Mary wants to merge her feature into the main codebase, so the source branch is her feature
branch, the destination repository is John’s public repository, and the destination branch is master.
She’ll also need to provide a title and description for the pull request. If there are other people who
need to approve the code besides John, she can enter them in the Reviewers field.
After she creates the pull request, a notification will be sent to John via his Bitbucket feed and
(optionally) via email.

John reviews the pull request

John can access all of the pull requests people have filed by clicking on the Pull request tab in his
own Bitbucket repository. Clicking on Mary’s pull request will show him a description of the pull
request, the feature’s commit history, and a diff of all the changes it contains.
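
The same information can also be inspected from the command line. The sketch below assumes two local directories standing in for the Bitbucket repositories ("official" for John's, "fork" for Mary's); those names, the file, and the commit identity are illustrative only.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# "official" stands in for John's repository, "fork" for Mary's.
git init -q official && cd official
git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
cd .. && git clone -q official fork && cd fork
git checkout -q -b some-feature
echo feature >> app.txt && git commit -q -a -m "Add some feature"

# In the official repository, fetch the proposed branch and inspect it.
cd ../official
git fetch -q ../fork some-feature
pr_commits=$(git log --oneline master..FETCH_HEAD)    # commits the request would add
pr_files=$(git diff --name-only master...FETCH_HEAD)  # files changed since the fork point
echo "$pr_commits"
echo "$pr_files"
```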

If he thinks the feature is ready to merge into the project, all he has to do is hit the Merge button to
approve the pull request and merge Mary’s feature into his master branch.

But, for this example, let’s say John found a small bug in Mary’s code, and needs her to fix it before
merging it in. He can either post a comment to the pull request as a whole, or he can select a specific
commit in the feature’s history to comment on.

Mary adds a follow-up commit

If Mary has any questions about the feedback, she can respond inside of the pull request, treating it
as a discussion forum for her feature.

To correct the error, Mary adds another commit to her feature branch and pushes it to her Bitbucket
repository, just like she did the first time around. This commit is automatically added to the original
pull request, and John can review the changes again, right next to his original comment.

John accepts the pull request


Finally, John accepts the changes, merges the feature branch into master, and closes the pull
request. The feature is now integrated into the project, and any other developers working on it can
pull it into their own local repositories using the standard git pull command.

Where to go from here


You should now have all of the tools you need to start integrating pull requests into your existing
workflow. Remember, pull requests are not a replacement for any of the Git-based collaboration
workflows, but rather a convenient addition to them that makes collaboration more accessible to all
of your team members.

Using Branches
This tutorial is a comprehensive introduction to Git branches. First, we’ll take a look at creating
branches, which is like requesting a new project history. Then, we’ll see how git checkout can be
used to select a branch. Finally, we'll learn how git merge can integrate the history of independent
branches.

As you read, remember that Git branches aren't like SVN branches. Whereas SVN branches are only
used to capture the occasional large-scale development effort, Git branches are an integral part of
your everyday workflow.

git branch

A branch represents an independent line of development. Branches serve as an abstraction for the
edit/stage/commit process discussed in Git Basics, the first module of this series. You can think of
them as a way to request a brand new working directory, staging area, and project history. New
commits are recorded in the history for the current branch, which results in a fork in the history of
the project.

The git branch command lets you create, list, rename, and delete branches. It doesn’t let you switch
between branches or put a forked history back together again. For this reason, git branch is tightly
integrated with the git checkout and git merge commands.

Usage

git branch

List all of the branches in your repository.

git branch <branch>

Create a new branch called <branch>. This does not check out the new branch.

git branch -d <branch>

Delete the specified branch. This is a “safe” operation in that Git prevents you from deleting the
branch if it has unmerged changes.

git branch -D <branch>

Force delete the specified branch, even if it has unmerged changes. This is the command to use if
you want to permanently throw away all of the commits associated with a particular line of
development.

git branch -m <branch>

Rename the current branch to <branch>.
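
The invocations above can be tried in a scratch repository. The following is a rough sketch; the branch names, file, and commit identity are placeholders, and it uses the two-argument form of -m (old name, new name) to rename a branch that is not checked out.

```shell
# Placeholder identity so the commit works on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git branch crazy-experiment                       # create; HEAD stays on master
git branch -m crazy-experiment tamer-experiment   # rename a branch by name
git branch                                        # list; the current branch is starred
git branch -d tamer-experiment                    # safe delete: nothing unmerged
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "$remaining"
```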

Discussion
In Git, branches are a part of your everyday development process. When you want to add a new
feature or fix a bug—no matter how big or how small—you spawn a new branch to encapsulate your
changes. This makes sure that unstable code is never committed to the main code base, and it gives
you the chance to clean up your feature’s history before merging it into the main branch.

For example, the diagram above visualizes a repository with two isolated lines of development, one
for a little feature, and one for a longer-running feature. By developing them in branches, it’s not
only possible to work on both of them in parallel, but it also keeps the main master branch free from
questionable code.

Branch Tips
The implementation behind Git branches is much more lightweight than SVN’s model. Instead of
copying files from directory to directory, Git stores a branch as a reference to a commit. In this
sense, a branch represents the tip of a series of commits—it's not a container for commits. The
history for a branch is extrapolated through the commit relationships.

This has a dramatic impact on Git's merging model. Whereas merges in SVN are done on a file-basis,
Git lets you work on the more abstract level of commits. You can actually see merges in the project
history as a joining of two independent commit histories.

Example

Creating Branches
It's important to understand that branches are just pointers to commits. When you create a branch,
all Git needs to do is create a new pointer—it doesn’t change the repository in any other way. So, if
you start with a repository that looks like this:
Then, you create a branch using the following command:

git branch crazy-experiment

The repository history remains unchanged. All you get is a new pointer to the current commit:

Note that this only creates the new branch. To start adding commits to it, you need to select it with
git checkout, and then use the standard git add and git commit commands. Please see the git
checkout section of this module for more information.
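
To see that a new branch is only a pointer, you can compare what the two branch names resolve to. This is a small sketch in a scratch repository; the file name and commit identity are assumptions.

```shell
# Placeholder identity so the commit works on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git branch crazy-experiment

# Both names resolve to the same commit, and no commits were added or copied.
master_tip=$(git rev-parse master)
branch_tip=$(git rev-parse crazy-experiment)
commit_count=$(git rev-list --count --all)
echo "$master_tip"
echo "$branch_tip"
echo "commits in repository: $commit_count"
```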

Deleting Branches
Once you’ve finished working on a branch and have merged it into the main code base, you’re free
to delete the branch without losing any history:

git branch -d crazy-experiment

However, if the branch hasn’t been merged, the above command will output an error message:

error: The branch 'crazy-experiment' is not fully merged.
If you are sure you want to delete it, run 'git branch -D crazy-experiment'.

This protects you from losing your reference to those commits, which means you would effectively
lose access to that entire line of development. If you really want to delete the branch (e.g., it’s a
failed experiment), you can use the capital -D flag:

git branch -D crazy-experiment

This deletes the branch regardless of its status and without warnings, so use it judiciously.
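
The difference between the two flags can be reproduced in a scratch repository. In this sketch (file name and commit identity are placeholders), the branch carries one commit that master does not have, so -d refuses and -D succeeds.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b crazy-experiment
echo wild >> app.txt && git commit -q -a -m "Try something wild"
git checkout -q master

# -d refuses because the commit above is not reachable from master.
if git branch -d crazy-experiment 2>/dev/null; then
    safe_delete="deleted"
else
    safe_delete="refused"
fi
echo "git branch -d: $safe_delete"

# -D removes the branch regardless of its merge status.
git branch -q -D crazy-experiment
```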

git checkout
The git checkout command lets you navigate between the branches created by git branch. Checking
out a branch updates the files in the working directory to match the version stored in that branch,
and it tells Git to record all new commits on that branch. Think of it as a way to select which line of
development you’re working on.

In the previous module, we saw how git checkout can be used to view old commits. Checking out
branches is similar in that the working directory is updated to match the selected branch/revision;
however, new changes are saved in the project history—that is, it’s not a read-only operation.

Usage

git checkout <existing-branch>

Check out the specified branch, which should have already been created with git branch. This makes
<existing-branch> the current branch, and updates the working directory to match.

git checkout -b <new-branch>

Create and check out <new-branch>. The -b option is a convenience flag that tells Git to run git
branch <new-branch> before running git checkout <new-branch>.

git checkout -b <new-branch> <existing-branch>

Same as the above invocation, but base the new branch off of <existing-branch> instead of the
current branch.

Discussion
git checkout works hand-in-hand with git branch. When you want to start a new feature, you create
a branch with git branch, then check it out with git checkout. You can work on multiple features in a
single repository by switching between them with git checkout.

Having a dedicated branch for each new feature is a dramatic shift from the traditional SVN
workflow. It makes it ridiculously easy to try new experiments without the fear of destroying existing
functionality, and it makes it possible to work on many unrelated features at the same time. In
addition, branches also facilitate several collaborative workflows.

Detached HEADs
Now that we’ve seen the three main uses of git checkout, we can talk about that “detached HEAD”
we encountered in the previous module.

Remember that the HEAD is Git’s way of referring to the current snapshot. Internally, the git
checkout command simply updates the HEAD to point to either the specified branch or commit.
When it points to a branch, Git doesn't complain, but when you check out a commit, it switches into
a “detached HEAD” state.

This is a warning telling you that everything you’re doing is “detached” from the rest of your
project’s development. If you were to start developing a feature while in a detached HEAD state,
there would be no branch allowing you to get back to it. When you inevitably check out another
branch (e.g., to merge your feature in), there would be no way to reference your feature:

The point is, your development should always take place on a branch—never on a detached HEAD.
This makes sure you always have a reference to your new commits. However, if you’re just looking at
an old commit, it doesn’t really matter if you’re in a detached HEAD state or not.
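
A short sketch of how to recognize a detached HEAD and get back onto a branch follows. The branch name rescue-work, the file, and the commit identity are made up for the example.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "First commit"
first=$(git rev-parse HEAD)
echo more >> app.txt && git commit -q -a -m "Second commit"

# Checking out a commit rather than a branch detaches HEAD:
# HEAD is no longer a symbolic reference to a branch.
git checkout -q "$first"
if git symbolic-ref -q HEAD >/dev/null; then
    head_state="on a branch"
else
    head_state="detached"
fi
echo "$head_state"

# Creating a branch at this point gives any new commits a reference.
git checkout -q -b rescue-work
current=$(git symbolic-ref --short HEAD)
echo "now on: $current"
```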

Example
The following example demonstrates the basic Git branching process. When you want to start
working on a new feature, you create a dedicated branch and switch into it:

git branch new-feature
git checkout new-feature

Then, you can commit new snapshots just like we’ve seen in previous modules:

# Edit some files
git add <file>
git commit -m "Started work on a new feature"
# Repeat

All of these are recorded in new-feature, which is completely isolated from master. You can add as
many commits here as necessary without worrying about what’s going on in the rest of your
branches. When it’s time to get back to the “official” code base, simply check out the master branch:

git checkout master

This shows you the state of the repository before you started your feature. From here, you have the
option to merge in the completed feature, branch off a brand new, unrelated feature, or do some
work with the stable version of your project.

git merge
Merging is Git's way of putting a forked history back together again. The git merge command lets
you take the independent lines of development created by git branch and integrate them into a
single branch.

Note that all of the commands presented below merge into the current branch. The current branch
will be updated to reflect the merge, but the target branch will be completely unaffected. Again, this
means that git merge is often used in conjunction with git checkout for selecting the current branch
and git branch -d for deleting the obsolete target branch.

Usage

git merge <branch>

Merge the specified branch into the current branch. Git will determine the merge algorithm
automatically (discussed below).

git merge --no-ff <branch>

Merge the specified branch into the current branch, but always generate a merge commit (even if it
was a fast-forward merge). This is useful for documenting all merges that occur in your repository.
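
The effect of --no-ff can be checked by counting the parents of the resulting commit. The sketch below sets up a case where a fast-forward would be possible; the file name, merge message, and commit identity are assumptions.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b some-feature
echo feature >> app.txt && git commit -q -a -m "Add some feature"
git checkout -q master

# master has not moved, so a fast-forward is possible, but --no-ff
# records a merge commit anyway.
git merge -q --no-ff -m "Merge some-feature" some-feature

# rev-list --parents prints the commit followed by each of its parents.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the new commit: $parent_count"
```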

Discussion
Once you’ve finished developing a feature in an isolated branch, it's important to be able to get it
back into the main code base. Depending on the structure of your repository, Git uses one of two
approaches to accomplish this: a fast-forward merge or a 3-way merge.

A fast-forward merge can occur when there is a linear path from the current branch tip to the target
branch. Instead of “actually” merging the branches, all Git has to do to integrate the histories is
move (i.e., “fast forward”) the current branch tip up to the target branch tip. This effectively
combines the histories, since all of the commits reachable from the target branch are now available
through the current one. For example, a fast forward merge of some-feature into master would look
something like the following:

However, a fast-forward merge is not possible if the branches have diverged. When there is not a
linear path to the target branch, Git has no choice but to combine them via a 3-way merge. 3-way
merges use a dedicated commit to tie together the two histories. The nomenclature comes from the
fact that Git uses three commits to generate the merge commit: the two branch tips and their
common ancestor.

While you can use either of these merge strategies, many developers like to use fast-forward merges
(facilitated through rebasing) for small features or bug fixes, while reserving 3-way merges for the
integration of longer-running features. In the latter case, the resulting merge commit serves as a
symbolic joining of the two branches.

Resolving Conflicts
If the two branches you’re trying to merge both changed the same part of the same file, Git won’t be
able to figure out which version to use. When such a situation occurs, it stops right before the merge
commit so that you can resolve the conflicts manually.

The great part of Git's merging process is that it uses the familiar edit/stage/commit workflow to
resolve merge conflicts. When you encounter a merge conflict, running the git status command
shows you which files need to be resolved. For example, if both branches modified the same section
of hello.py, you would see something like the following:

# On branch master
# Unmerged paths:
# (use "git add/rm <file>..." as appropriate to mark resolution)
#
# both modified: hello.py
#

Then, you can go in and fix up the merge to your liking. When you're ready to finish the merge, all
you have to do is run git add on the conflicted file(s) to tell Git they're resolved. Then, you run a
normal git commit to generate the merge commit. It’s the exact same process as committing an
ordinary snapshot, which means it’s easy for normal developers to manage their own merges.

Note that merge conflicts will only occur in the event of a 3-way merge. It’s not possible to have
conflicting changes in a fast-forward merge.
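
The whole conflict cycle can be rehearsed in a scratch repository. This sketch deliberately changes the same line of hello.py on two branches; the file contents, branch name, and commit identity are invented for the demonstration.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
printf 'print("hello")\n' > hello.py
git add hello.py && git commit -q -m "Initial commit"

git checkout -q -b some-feature
printf 'print("hello from feature")\n' > hello.py
git commit -q -a -m "Feature greeting"
git checkout -q master
printf 'print("hello from master")\n' > hello.py
git commit -q -a -m "Master greeting"

# The merge stops before committing because both sides changed the same line.
git merge some-feature >/dev/null 2>&1 || true
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve by writing the desired content, then stage and commit as usual.
printf 'print("hello from both")\n' > hello.py
git add hello.py
git commit -q --no-edit
```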

Example
Fast-Forward Merge

Our first example demonstrates a fast-forward merge. The code below creates a new branch, adds
two commits to it, then integrates it into the main line with a fast-forward merge.

# Start a new feature
git checkout -b new-feature master

# Edit some files
git add <file>
git commit -m "Start a feature"

# Edit some files
git add <file>
git commit -m "Finish a feature"

# Merge in the new-feature branch
git checkout master
git merge new-feature
git branch -d new-feature

This is a common workflow for short-lived topic branches that are used more as an isolated
development than an organizational tool for longer-running features.

Also note that Git should not complain about the git branch -d, since new-feature is now accessible
from the master branch.

3-Way Merge

The next example is very similar, but requires a 3-way merge because master progresses while the
feature is in progress. This is a common scenario for large features or when several developers are
working on a project simultaneously.

# Start a new feature
git checkout -b new-feature master

# Edit some files
git add <file>
git commit -m "Start a feature"

# Edit some files
git add <file>
git commit -m "Finish a feature"

# Develop the master branch
git checkout master

# Edit some files
git add <file>
git commit -m "Make some super-stable changes to master"

# Merge in the new-feature branch
git merge new-feature
git branch -d new-feature

Note that it’s impossible for Git to perform a fast-forward merge, as there is no way to move master
up to new-feature without backtracking.

For most workflows, new-feature would be a much larger feature that took a long time to develop,
which would be why new commits would appear on master in the meantime. If your feature branch
was actually as small as the one in the above example, you would probably be better off rebasing it
onto master and doing a fast-forward merge. This prevents superfluous merge commits from
cluttering up the project history.
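
The rebase-then-fast-forward alternative might look like the following sketch. It recreates the diverged history from the 3-way example in a scratch repository (file names and commit identity are assumptions) and uses --ff-only so the merge fails rather than silently creating a merge commit.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b new-feature
echo feature > feature.txt && git add feature.txt
git commit -q -m "Start a feature"
git checkout -q master
echo stable >> app.txt
git commit -q -a -m "Make some super-stable changes to master"

# Replay the feature on top of the updated master, then fast-forward.
git checkout -q new-feature
git rebase -q master
git checkout -q master
git merge -q --ff-only new-feature
merge_commits=$(git rev-list --merges --count HEAD)
echo "merge commits in history: $merge_commits"
```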
Comparing Workflows

The array of possible workflows can make it hard to know where to begin when implementing Git in
the workplace. This page provides a starting point by surveying the most common Git workflows for
enterprise teams.

As you read through, remember that these workflows are designed to be guidelines rather than
concrete rules. We want to show you what’s possible, so you can mix and match aspects from
different workflows to suit your individual needs.

Centralized Workflow
Transitioning to a distributed version control system may seem like a daunting task, but you don’t
have to change your existing workflow to take advantage of Git. Your team can develop projects in
the exact same way as they do with Subversion.

However, using Git to power your development workflow presents a few advantages over SVN. First,
it gives every developer their own local copy of the entire project. This isolated environment lets
each developer work independently of all other changes to a project—they can add commits to their
local repository and completely forget about upstream developments until it's convenient for them.

Second, it gives you access to Git’s robust branching and merging model. Unlike SVN, Git branches
are designed to be a fail-safe mechanism for integrating code and sharing changes between
repositories.

How It Works
Like Subversion, the Centralized Workflow uses a central repository to serve as the single point-of-
entry for all changes to the project. Instead of trunk, the default development branch is called
master and all changes are committed into this branch. This workflow doesn’t require any other
branches besides master.

Developers start by cloning the central repository. In their own local copies of the project, they edit
files and commit changes as they would with SVN; however, these new commits are stored locally—
they’re completely isolated from the central repository. This lets developers defer synchronizing
upstream until they’re at a convenient break point.

To publish changes to the official project, developers “push” their local master branch to the central
repository. This is the equivalent of svn commit, except that it adds all of the local commits that
aren’t already in the central master branch.

Managing Conflicts
The central repository represents the official project, so its commit history should be treated as
sacred and immutable. If a developer’s local commits diverge from the central repository, Git will
refuse to push their changes because this would overwrite official commits.

Before the developer can publish their feature, they need to fetch the updated central commits and
rebase their changes on top of them. This is like saying, “I want to add my changes to what everyone
else has already done.” The result is a perfectly linear history, just like in traditional SVN workflows.

If local changes directly conflict with upstream commits, Git will pause the rebasing process and give
you a chance to manually resolve the conflicts. The nice thing about Git is that it uses the same git
status and git add commands for both generating commits and resolving merge conflicts. This makes
it easy for new developers to manage their own merges. Plus, if they get themselves into trouble, Git
makes it very easy to abort the entire rebase and try again (or go find help).
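
Aborting a paused rebase is done with git rebase --abort. The sketch below provokes a conflict between two local branches (standing in for local and upstream work; names, file, and commit identity are placeholders) and then backs out.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo original > notes.txt && git add notes.txt && git commit -q -m "Initial commit"

git checkout -q -b local-work
echo mine > notes.txt && git commit -q -a -m "Local edit"
git checkout -q master
echo theirs > notes.txt && git commit -q -a -m "Upstream edit"
git checkout -q local-work
before=$(git rev-parse HEAD)

# The rebase pauses on the conflicting commit...
git rebase master >/dev/null 2>&1 || true
# ...and --abort restores the branch to where it started.
git rebase --abort
after=$(git rev-parse HEAD)
echo "$before"
echo "$after"
```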

Example
Let’s take a step-by-step look at how a typical small team would collaborate using this workflow.
We’ll see how two developers, John and Mary, can work on separate features and share their
contributions via a centralized repository.

Someone initializes the central repository


First, someone needs to create the central repository on a server. If it’s a new project, you can
initialize an empty repository. Otherwise, you’ll need to import an existing Git or SVN repository.

Central repositories should always be bare repositories (they shouldn’t have a working directory),
which can be created as follows:

ssh user@host git init --bare /path/to/repo.git

Be sure to use a valid SSH username for user, the domain or IP address of your server for host, and
the location where you'd like to store your repo for /path/to/repo.git. Note that the .git extension is
conventionally appended to the repository name to indicate that it’s a bare repository.

Everybody clones the central repository


Next, each developer creates a local copy of the entire project. This is accomplished via the git clone
command:

git clone ssh://user@host/path/to/repo.git

When you clone a repository, Git automatically adds a shortcut called origin that points back to the
“parent” repository, under the assumption that you'll want to interact with it further on down the
road.

John works on his feature

In his local repository, John can develop features using the standard Git commit process: edit, stage,
and commit. If you’re not familiar with the staging area, it’s a way to prepare a commit without
having to include every change in the working directory. This lets you create highly focused commits,
even if you’ve made a lot of local changes.

git status # View the state of the repo
git add <some-file> # Stage a file
git commit # Commit a file

Remember that since these commands create local commits, John can repeat this process as many
times as he wants without worrying about what’s going on in the central repository. This can be very
useful for large features that need to be broken down into simpler, more atomic chunks.
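
As an illustration of a focused commit, the sketch below edits two files but stages and commits only one of them. The file names and commit identity are hypothetical.

```shell
# Placeholder identity so the commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
echo a > feature.txt && echo b > notes.txt
git add feature.txt notes.txt && git commit -q -m "Initial commit"

# Two unrelated local edits.
echo "feature work" >> feature.txt
echo "unrelated scribbles" >> notes.txt

# Stage and commit only the file that belongs to this change.
git add feature.txt
git commit -q -m "Extend the feature"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git diff --name-only)
echo "committed: $committed"
echo "still modified: $pending"
```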

Mary works on her feature

Meanwhile, Mary is working on her own feature in her own local repository using the same
edit/stage/commit process. Like John, she doesn’t care what’s going on in the central repository, and
she really doesn’t care what John is doing in his local repository, since all local repositories are
private.

John publishes his feature


Once John finishes his feature, he should publish his local commits to the central repository so other
team members can access it. He can do this with the git push command, like so:

git push origin master

Remember that origin is the remote connection to the central repository that Git created when John
cloned it. The master argument tells Git to try to make the origin’s master branch look like his local
master branch. Since the central repository hasn’t been updated since John cloned it, this won’t
result in any conflicts and the push will work as expected.

Mary tries to publish her feature

Let’s see what happens if Mary tries to push her feature after John has successfully published his
changes to the central repository. She can use the exact same push command:

git push origin master

But, since her local history has diverged from the central repository, Git will refuse the request with
a rather verbose error message:

error: failed to push some refs to '/path/to/repo.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Merge the remote changes (e.g. 'git pull')
hint: before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

This prevents Mary from overwriting official commits. She needs to pull John’s updates into her
repository, integrate them with her local changes, and then try again.

Mary rebases on top of John’s commit(s)

Mary can use git pull to incorporate upstream changes into her repository. This command is sort of
like svn update—it pulls the entire upstream commit history into Mary’s local repository and tries to
integrate it with her local commits:

git pull --rebase origin master

The --rebase option tells Git to move all of Mary’s commits to the tip of the master branch after
synchronizing it with the changes from the central repository.

The pull would still work if you forgot this option, but you would wind up with a superfluous “merge
commit” every time someone needed to synchronize with the central repository. For this workflow,
it’s always better to rebase instead of generating a merge commit.
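The whole exchange can be rehearsed locally. The sketch below is only an approximation of the scenario: a bare repository in a temporary directory stands in for the central server, and the directory names (central.git, seed, john, mary) and the demo identity are assumptions. It shows Mary's push being rejected and then succeeding after git pull --rebase, without a merge commit.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the central server.
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master

# Seed it with one commit so that clones have a master branch to check out.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed push -q "$work/central.git" master

git clone -q "$work/central.git" john
git clone -q "$work/central.git" mary

# John commits and publishes first.
(
  cd john
  echo john > john.txt
  git add john.txt
  git commit -q -m "John's feature"
  git push -q origin master
)

# Mary commits on the older master, so her push is refused.
cd mary
echo mary > mary.txt
git add mary.txt
git commit -q -m "Mary's feature"
if git push -q origin master 2>/dev/null; then
  echo "unexpected: push succeeded"
else
  echo "push rejected: local and central histories have diverged"
fi

# Replay Mary's commit on top of John's, then publish.
git pull -q --rebase origin master
git push -q origin master

merges=$(git rev-list --merges --count master)
echo "merge commits on master: $merges"
```

Running the same steps with a plain git pull instead would leave one merge commit in the history for this single synchronization.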
Mary resolves a merge conflict

Rebasing works by transferring each local commit to the updated master branch one at a time. This
means that you catch merge conflicts on a commit-by-commit basis rather than resolving all of them
in one massive merge commit. This keeps your commits as focused as possible and makes for a clean
project history. In turn, this makes it much easier to figure out where bugs were introduced and, if
necessary, to roll back changes with minimal impact on the project.

If Mary and John are working on unrelated features, it’s unlikely that the rebasing process will
generate conflicts. But if it does, Git will pause the rebase at the current commit and output the
following message, along with some relevant instructions:

CONFLICT (content): Merge conflict in <some-file>


The great thing about Git is that anyone can resolve their own merge conflicts. In our example, Mary
would simply run a git status to see where the problem is. Conflicted files will appear in the
Unmerged paths section:

# Unmerged paths:
# (use "git reset HEAD <some-file>..." to unstage)
# (use "git add/rm <some-file>..." as appropriate to mark resolution)
#
# both modified: <some-file>

Then, she’ll edit the file(s) to her liking. Once she’s happy with the result, she can stage the file(s) in
the usual fashion and let git rebase do the rest:

git add <some-file>
git rebase --continue

And that’s all there is to it. Git will move on to the next commit and repeat the process for any other
commits that generate conflicts.
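To practice the resolution steps without a server, the following sketch provokes a conflict inside a single throwaway repository. It is a simplification under stated assumptions: a local branch named mary stands in for Mary's unpublished work, a plain git rebase master replaces git pull --rebase, and the file name and identity are invented for the demo.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base"

# Mary's change, standing in for her unpublished local commit.
git checkout -q -b mary
echo "mary's line" > notes.txt
git commit -q -am "Mary's change"

# John's change lands on master and touches the same line.
git checkout -q master
echo "john's line" > notes.txt
git commit -q -am "John's change"

# Replaying Mary's commit on top of master stops with a conflict.
git checkout -q mary
git rebase master >/dev/null 2>&1 || echo "rebase paused on a conflict"

# Resolve by editing the file, stage it, and let the rebase continue.
printf "john's line\nmary's line\n" > notes.txt
git add notes.txt
GIT_EDITOR=true git rebase --continue >/dev/null   # accept the commit message as is

git log --oneline
```

Setting GIT_EDITOR=true only keeps the demo non-interactive; in normal use Git opens your editor so you can confirm the commit message.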

If you get to this point and realize you have no idea what’s going on, don’t panic. Just execute
the following command and you’ll be right back to where you started before you ran
git pull --rebase:

git rebase --abort

Mary successfully publishes her feature

After she’s done synchronizing with the central repository, Mary will be able to publish her changes
successfully:

git push origin master


Where To Go From Here
As you can see, it’s possible to replicate a traditional Subversion development environment using
only a handful of Git commands. This is great for transitioning teams off of SVN, but it doesn’t
leverage the distributed nature of Git.

If your team is comfortable with the Centralized Workflow but wants to streamline its collaboration
efforts, it's definitely worth exploring the benefits of the Feature Branch Workflow. By dedicating an
isolated branch to each feature, it’s possible to initiate in-depth discussions around new additions
before integrating them into the official project.

Feature Branch Workflow

Once you've got the hang of the Centralized Workflow, adding feature branches to your
development process is an easy way to encourage collaboration and streamline communication
between developers.

The core idea behind the Feature Branch Workflow is that all feature development should take place
in a dedicated branch instead of the master branch. This encapsulation makes it easy for multiple
developers to work on a particular feature without disturbing the main codebase. It also means the
master branch should never contain broken code, which is a huge advantage for continuous integration
environments.

Encapsulating feature development also makes it possible to leverage pull requests, which are a way
to initiate discussions around a branch. They give other developers the opportunity to sign off on a
feature before it gets integrated into the official project. Or, if you get stuck in the middle of a
feature, you can open a pull request asking for suggestions from your colleagues. The point is, pull
requests make it incredibly easy for your team to comment on each other’s work.
How It Works
The Feature Branch Workflow still uses a central repository, and master still represents the official
project history. But, instead of committing directly on their local master branch, developers create a
new branch every time they start work on a new feature. Feature branches should have descriptive
names, like animated-menu-items or issue-#1061. The idea is to give a clear, highly-focused purpose
to each branch.

Git makes no technical distinction between the master branch and feature branches, so developers
can edit, stage, and commit changes to a feature branch just as they did in the Centralized
Workflow.

In addition, feature branches can (and should) be pushed to the central repository. This makes it
possible to share a feature with other developers without touching any official code. Since master is
the only “special” branch, storing several feature branches on the central repository doesn’t pose
any problems. Of course, this is also a convenient way to back up everybody’s local commits.

Pull Requests
Aside from isolating feature development, branches make it possible to discuss changes via pull
requests. Once someone completes a feature, they don’t immediately merge it into master. Instead,
they push the feature branch to the central server and file a pull request asking to merge their
additions into master. This gives other developers an opportunity to review the changes before they
become a part of the main codebase.

Code review is a major benefit of pull requests, but they’re actually designed to be a generic way to
talk about code. You can think of pull requests as a discussion dedicated to a particular branch. This
means that they can also be used much earlier in the development process. For example, if a
developer needs help with a particular feature, all they have to do is file a pull request. Interested
parties will be notified automatically, and they’ll be able to see the question right next to the
relevant commits.

Once a pull request is accepted, the actual act of publishing a feature is much the same as in the
Centralized Workflow. First, you need to make sure your local master is synchronized with the
upstream master. Then, you merge the feature branch into master and push the updated master
back to the central repository.

Pull requests can be facilitated by product repository management solutions like Bitbucket Cloud or
Bitbucket Server. View the Bitbucket Server pull requests documentation for an example.

Example
The example included below demonstrates a pull request as a form of code review, but remember
that they can serve many other purposes.

Mary begins a new feature


Before she starts developing a feature, Mary needs an isolated branch to work on. She can request a
new branch with the following command:

git checkout -b marys-feature master

This checks out a branch called marys-feature based on master. The -b flag tells Git to create the
branch first (the command fails if a branch with that name already exists). On this branch, Mary
edits, stages, and commits changes in the usual fashion, building up her feature with as many
commits as necessary:

git status
git add <some-file>
git commit

Mary goes to lunch

Mary adds a few commits to her feature over the course of the morning. Before she leaves for lunch,
it’s a good idea to push her feature branch up to the central repository. This serves as a convenient
backup, but if Mary was collaborating with other developers, this would also give them access to her
initial commits.

git push -u origin marys-feature


This command pushes marys-feature to the central repository (origin), and the -u flag adds it as a
remote tracking branch. After setting up the tracking branch, Mary can call git push without any
parameters to push her feature.
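The effect of the -u flag can be checked directly. In this sketch a bare repository in a temporary directory stands in for the central server; the paths and the demo identity are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=Mary GIT_AUTHOR_EMAIL=mary@example.com
export GIT_COMMITTER_NAME=Mary GIT_COMMITTER_EMAIL=mary@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare central.git          # stand-in for the central repository

git init -q mary
cd mary
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/central.git"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master

git checkout -q -b marys-feature master
git commit -q --allow-empty -m "Start the feature"
git push -q -u origin marys-feature     # -u records origin/marys-feature as the upstream

upstream=$(git rev-parse --abbrev-ref 'marys-feature@{upstream}')
echo "upstream: $upstream"

git commit -q --allow-empty -m "More work"
git push -q                             # no arguments needed now
```

After the first push, a bare git push from this branch sends new commits to origin/marys-feature.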

Mary finishes her feature

When Mary gets back from lunch, she completes her feature. Before merging it into master, she
needs to file a pull request letting the rest of the team know she's done. But first, she should make
sure the central repository has her most recent commits:

git push

Then, she files the pull request in her Git GUI asking to merge marys-feature into master, and team
members will be notified automatically. The great thing about pull requests is that they show
comments right next to their related commits, so it's easy to ask questions about specific
changesets.

Bill receives the pull request


Bill gets the pull request and takes a look at marys-feature. He decides he wants to make a few
changes before integrating it into the official project, and he and Mary have some back-and-forth via
the pull request.

Mary makes the changes

To make the changes, Mary uses the exact same process as she did to create the first iteration of her
feature. She edits, stages, commits, and pushes updates to the central repository. All her activity
shows up in the pull request, and Bill can still make comments along the way.

If he wanted, Bill could pull marys-feature into his local repository and work on it on his own. Any
commits he added would also show up in the pull request.

Mary publishes her feature

Once Bill is ready to accept the pull request, someone needs to merge the feature into the stable
project (this can be done by either Bill or Mary):

git checkout master
git pull
git pull origin marys-feature
git push

First, whoever’s performing the merge needs to check out their master branch and make sure it’s up
to date. Then, git pull origin marys-feature merges the central repository’s copy of marys-feature.
You could also use a simple git merge marys-feature, but the command shown above makes sure
you’re always pulling the most up-to-date version of the feature branch. Finally, the updated master
needs to get pushed back to origin.

This process often results in a merge commit. Some developers like this because it’s like a symbolic
joining of the feature with the rest of the code base. But, if you’re partial to a linear history, it’s
possible to rebase the feature onto the tip of master before executing the merge, resulting in a fast-
forward merge.
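For readers who prefer the linear variant, here is a minimal local sketch of rebasing the feature onto master and then merging with --ff-only, which refuses to create a merge commit. The repository, file names, and identity are assumptions made for the demo; no remote is involved.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

# The feature branch and master each gain a commit, so they diverge.
git checkout -q -b marys-feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "Unrelated work on master"

# Move the feature onto the tip of master, then fast-forward master to it.
git checkout -q marys-feature
git rebase -q master
git checkout -q master
git merge -q --ff-only marys-feature

merges=$(git rev-list --merges --count master)
echo "merge commits on master: $merges"
```

Because --ff-only fails instead of merging when a fast-forward is impossible, it doubles as a check that the rebase was done first.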

Some GUIs will automate the pull request acceptance process by running all of these commands just
by clicking an “Accept” button. If yours doesn’t, it should at least be able to automatically close the
pull request when the feature branch gets merged into master.

Meanwhile, John is doing the exact same thing


While Mary and Bill are working on marys-feature and discussing it in her pull request, John is doing
the exact same thing with his own feature branch. By isolating features into separate branches,
everybody can work independently, yet it’s still trivial to share changes with other developers when
necessary.

Where To Go From Here


For a walkthrough of feature branching on Bitbucket, check out the Using Git Branches
documentation. By now, you can hopefully see how feature branches are a way to quite literally
multiply the functionality of the single master branch used in the Centralized Workflow. In addition,
feature branches also facilitate pull requests, which makes it possible to discuss specific commits
right inside of your version control GUI.

The Feature Branch Workflow is an incredibly flexible way to develop a project. The problem is,
sometimes it’s too flexible. For larger teams, it’s often beneficial to assign more specific roles to
different branches. The Gitflow Workflow is a common pattern for managing feature development,
release preparation, and maintenance.

Gitflow Workflow

The Gitflow Workflow section below is derived from Vincent Driessen at nvie.

The Gitflow Workflow defines a strict branching model designed around the project release. While
somewhat more complicated than the Feature Branch Workflow, this provides a robust framework
for managing larger projects.

This workflow doesn’t add any new concepts or commands beyond what’s required for the Feature
Branch Workflow. Instead, it assigns very specific roles to different branches and defines how and
when they should interact. In addition to feature branches, it uses individual branches for preparing,
maintaining, and recording releases. Of course, you also get to leverage all the benefits of the
Feature Branch Workflow: pull requests, isolated experiments, and more efficient collaboration.

How It Works
The Gitflow Workflow still uses a central repository as the communication hub for all developers.
And, as in the other workflows, developers work locally and push branches to the central repo. The
only difference is the branch structure of the project.

Historical Branches
Instead of a single master branch, this workflow uses two branches to record the history of the
project. The master branch stores the official release history, and the develop branch serves as an
integration branch for features. It's also convenient to tag all commits in the master branch with a
version number.
The rest of this workflow revolves around the distinction between these two branches.

Feature Branches
Each new feature should reside in its own branch, which can be pushed to the central repository for
backup/collaboration. But, instead of branching off of master, feature branches use develop as their
parent branch. When a feature is complete, it gets merged back into develop. Features should never
interact directly with master.

Note that feature branches combined with the develop branch are, for all intents and purposes, the
Feature Branch Workflow. But, the Gitflow Workflow doesn’t stop there.

Release Branches
Once develop has acquired enough features for a release (or a predetermined release date is
approaching), you fork a release branch off of develop. Creating this branch starts the next release
cycle, so no new features can be added after this point—only bug fixes, documentation generation,
and other release-oriented tasks should go in this branch. Once it's ready to ship, the release gets
merged into master and tagged with a version number. In addition, it should be merged back into
develop, which may have progressed since the release was initiated.

Using a dedicated branch to prepare releases makes it possible for one team to polish the current
release while another team continues working on features for the next release. It also creates
well-defined phases of development (e.g., it’s easy to say, “this week we’re preparing for version 4.0” and
to actually see it in the structure of the repository).

Common conventions:

• branch off: develop
• merge into: master
• naming convention: release-* or release/*

Maintenance Branches
Maintenance or “hotfix” branches are used to quickly patch production releases. This is the only
branch that should fork directly off of master. As soon as the fix is complete, it should be merged
into both master and develop (or the current release branch), and master should be tagged with an
updated version number.

Having a dedicated line of development for bug fixes lets your team address issues without
interrupting the rest of the workflow or waiting for the next release cycle. You can think of
maintenance branches as ad hoc release branches that work directly with master.

Example
The example below demonstrates how this workflow can be used to manage a single release cycle.
We’ll assume you have already created a central repository.

Create a develop branch


The first step is to complement the default master with a develop branch. A simple way to do this is
for one developer to create an empty develop branch locally and push it to the server:

git branch develop
git push -u origin develop

This branch will contain the complete history of the project, whereas master will contain an abridged
version. Other developers should now clone the central repository and create a tracking branch for
develop:

git clone ssh://user@host/path/to/repo.git
cd repo
git checkout -b develop origin/develop

Everybody now has a local copy of the historical branches set up.

Mary and John begin new features

Our example starts with John and Mary working on separate features. They both need to create
separate branches for their respective features. Instead of basing it on master, they should both
base their feature branches on develop:

git checkout -b some-feature develop

Both of them add commits to the feature branch in the usual fashion: edit, stage, commit:

git status
git add <some-file>
git commit

Mary finishes her feature


After adding a few commits, Mary decides her feature is ready. If her team is using pull requests, this
would be an appropriate time to open one asking to merge her feature into develop. Otherwise, she
can merge it into her local develop and push it to the central repository, like so:

git checkout develop
git pull origin develop
git merge some-feature
git push
git branch -d some-feature

The first two commands check out develop and make sure it is up to date before trying to merge in
the feature. Note that features should never be merged directly into master. Conflicts can be
resolved in the same way as in the Centralized Workflow.

Mary begins to prepare a release

While John is still working on his feature, Mary starts to prepare the first official release of the
project. Like feature development, she uses a new branch to encapsulate the release preparations.
This step is also where the release’s version number is established:

git checkout -b release-0.1 develop

This branch is a place to clean up the release, test everything, update the documentation, and do
any other kind of preparation for the upcoming release. It’s like a feature branch dedicated to
polishing the release.

As soon as Mary creates this branch and pushes it to the central repository, the release is feature-
frozen. Any functionality that isn’t already in develop is postponed until the next release cycle.

Mary finishes the release

Once the release is ready to ship, Mary merges it into master and develop, then deletes the release
branch. It’s important to merge back into develop because critical updates may have been added to
the release branch and they need to be accessible to new features. Again, if Mary’s organization
stresses code review, this would be an ideal place for a pull request.

git checkout master
git merge release-0.1
git push
git checkout develop
git merge release-0.1
git push
git branch -d release-0.1

Release branches act as a buffer between feature development (develop) and public releases
(master). Whenever you merge something into master, you should tag the commit for easy
reference:

git tag -a 0.1 -m "Initial public release" master
git push --tags

Git comes with several hooks, which are scripts that execute whenever a particular event occurs
within a repository. It’s possible to configure a hook to automatically build a public release whenever
you push the master branch to the central repository or push a tag.
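One way such a hook might look is sketched below as a server-side post-receive script. Git passes one line per updated ref on standard input in the form "old-value new-value ref-name". The build_release function is a hypothetical placeholder for your own build step, and the sample input at the end (with shortened placeholder hashes) exists only to make the sketch runnable on its own.

```shell
#!/bin/sh
# Sketch of hooks/post-receive in the central (bare) repository.

# Hypothetical placeholder: replace with your real build or deploy command.
build_release() {
    echo "building release for $1"
}

# Read "<old> <new> <ref>" lines and react only to master and to tags.
handle_updates() {
    while read -r old new ref; do
        case "$ref" in
            refs/heads/master|refs/tags/*) build_release "$ref" ;;
        esac
    done
}

# In a real hook this would be a bare call to handle_updates, reading Git's stdin.
# Here it is fed sample lines instead.
output=$(handle_updates <<'EOF'
1111111 2222222 refs/heads/develop
1111111 2222222 refs/heads/master
1111111 2222222 refs/tags/0.1
EOF
)
echo "$output"
```

Pushes to develop are ignored, so only updates to master or a new tag trigger the build step.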

End-user discovers a bug

After shipping the release, Mary goes back to developing features for the next release with John.
That is, until an end-user opens a ticket complaining about a bug in the current release. To address
the bug, Mary (or John) creates a maintenance branch off of master, fixes the issue with as many
commits as necessary, then merges it directly back into master.

git checkout -b issue-#001 master
# Fix the bug
git checkout master
git merge issue-#001
git push

Like release branches, maintenance branches contain important updates that need to be included in
develop, so Mary needs to perform that merge as well. Then, she’s free to delete the branch:

git checkout develop
git merge issue-#001
git push
git branch -d issue-#001

Where To Go From Here

By now, you’re hopefully quite comfortable with the Centralized Workflow, the Feature Branch
Workflow, and the Gitflow Workflow. You should also have a solid grasp on the potential of local
repositories, the push/pull pattern, and Git's robust branching and merging model.

Remember that the workflows presented here are merely examples of what’s possible—they are not
hard-and-fast rules for using Git in the workplace. So, don't be afraid to adopt some aspects of a
workflow and disregard others. The goal should always be to make Git work for you, not the other
way around.

Forking Workflow
The Forking Workflow is fundamentally different from the other workflows discussed in this tutorial.
Instead of using a single server-side repository to act as the “central” codebase, it gives every
developer a server-side repository. This means that each contributor has not one, but two Git
repositories: a private local one and a public server-side one.

The main advantage of the Forking Workflow is that contributions can be integrated without the
need for everybody to push to a single central repository. Developers push to their own server-side
repositories, and only the project maintainer can push to the official repository. This allows the
maintainer to accept commits from any developer without giving them write access to the official
codebase.

The result is a distributed workflow that provides a flexible way for large, organic teams (including
untrusted third-parties) to collaborate securely. This also makes it an ideal workflow for open source
projects.

How It Works
As in the other Git workflows, the Forking Workflow begins with an official public repository stored
on a server. But when a new developer wants to start working on the project, they do not directly
clone the official repository.

Instead, they fork the official repository to create a copy of it on the server. This new copy serves as
their personal public repository—no other developers are allowed to push to it, but they can pull
changes from it (we’ll see why this is important in a moment). After they have created their server-
side copy, the developer performs a git clone to get a copy of it onto their local machine. This serves
as their private development environment, just like in the other workflows.

When they're ready to publish a local commit, they push the commit to their own public repository
—not the official one. Then, they file a pull request with the main repository, which lets the project
maintainer know that an update is ready to be integrated. The pull request also serves as a
convenient discussion thread if there are issues with the contributed code.

To integrate the feature into the official codebase, the maintainer pulls the contributor’s changes
into their local repository, checks to make sure it doesn’t break the project, merges it into their local
master branch, then pushes the master branch to the official repository on the server. The
contribution is now part of the project, and other developers should pull from the official repository
to synchronize their local repositories.

The Official Repository


It’s important to understand that the notion of an “official” repository in the Forking Workflow is
merely a convention. From a technical standpoint, Git doesn’t see any difference between each
developer’s public repository and the official one. In fact, the only thing that makes the official
repository so official is that it’s the public repository of the project maintainer.

Branching in the Forking Workflow


All of these personal public repositories are really just a convenient way to share branches with
other developers. Everybody should still be using branches to isolate individual features, just like in
the Feature Branch Workflow and the Gitflow Workflow. The only difference is how those branches
get shared. In the Forking Workflow, they are pulled into another developer’s local repository, while
in the Feature Branch and Gitflow Workflows they are pushed to the official repository.

Example

The project maintainer initializes the official repository


As with any Git-based project, the first step is to create an official repository on a server accessible
to all of the team members. Typically, this repository will also serve as the public repository of the
project maintainer.

Public repositories should always be bare, regardless of whether they represent the official codebase
or not. So, the project maintainer should run something like the following to set up the official
repository:

ssh user@host
git init --bare /path/to/repo.git

Bitbucket also provides a convenient GUI alternative to the above commands. This is the exact same
process as setting up a central repository for the other workflows in this tutorial. The maintainer
should also push the existing codebase to this repository, if necessary.

Developers fork the official repository

Next, all of the other developers need to fork this official repository. It’s possible to do this by
SSH’ing into the server and running git clone to copy it to another location on the server—yes,
forking is basically just a server-side clone. But again, Bitbucket lets developers fork a repository with
the click of a button.

After this step, every developer should have their own server-side repository. Like the official
repository, all of these should be bare repositories.

Developers clone their forked repositories


Next, each developer needs to clone their own public repository. They can do this with the familiar
git clone command.

Our example assumes the use of Bitbucket to host these repositories. Remember, in this situation,
each developer should have their own Bitbucket account and they should clone their server-side
repository using:

git clone https://<username>@bitbucket.org/user/repo.git

Whereas the other workflows in this tutorial use a single origin remote that points to the central
repository, the Forking Workflow requires two remotes—one for the official repository, and one for
the developer’s personal server-side repository. While you can call these remotes anything you
want, a common convention is to use origin as the remote for your forked repository (this will be
created automatically when you run git clone) and upstream for the official repository.

git remote add upstream https://bitbucket.org/maintainer/repo

You’ll need to create the upstream remote yourself using the above command. This will let you easily
keep your local repository up-to-date as the official project progresses. Note that if your upstream
repository has authentication enabled (i.e., it’s not open source), you’ll need to supply a username,
like so:

git remote add upstream https://<username>@bitbucket.org/maintainer/repo.git

This requires users to supply a valid password before cloning or pulling from the official codebase.
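The two-remote setup can be tried out without a hosting service. In the sketch below, bare repositories in a temporary directory stand in for the official repository and the developer's fork; their names (official.git, fork.git, local) are assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare official.git                          # stand-in for the official repository
git clone -q --bare official.git fork.git 2>/dev/null    # server-side fork
git clone -q fork.git local 2>/dev/null                  # cloning creates the origin remote

cd local
git remote add upstream "$work/official.git"             # added by hand, as described above

git remote -v
remotes=$(git remote | sort | xargs)
```

The listing from git remote -v should show origin pointing at the fork and upstream pointing at the official repository.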

Developers work on their features


In the local repositories that they just cloned, developers can edit code, commit changes, and create
branches just like they did in the other workflows:

git checkout -b some-feature
# Edit some code
git commit -a -m "Add first draft of some feature"

All of their changes will be entirely private until they push them to their public repository. And, if the
official project has moved forward, they can access new commits with git pull:

git pull upstream master

Since developers should be working in a dedicated feature branch, their local master carries no
commits of its own, so pulling into master should generally result in a fast-forward merge.

Developers publish their features

Once a developer is ready to share their new feature, they need to do two things. First, they have to
make their contribution accessible to other developers by pushing it to their public repository. Their
origin remote should already be set up, so all they should have to do is the following:

git push origin feature-branch

This diverges from the other workflows in that the origin remote points to the developer’s personal
server-side repository, not the main codebase.

Second, they need to notify the project maintainer that they want to merge their feature into the
official codebase. Bitbucket provides a “Pull request” button that leads to a form asking you to
specify which branch you want to merge into the official repository. Typically, you’ll want to
integrate your feature branch into the upstream remote’s master branch.

The project maintainer integrates their features

When the project maintainer receives the pull request, their job is to decide whether or not to
integrate it into the official codebase. They can do this in one of two ways:

1. Inspect the code directly in the pull request
2. Pull the code into their local repository and manually merge it

The first option is simpler, as it lets the maintainer view a diff of the changes, comment on it, and
perform the merge via a graphical user interface. However, the second option is necessary if the pull
request results in a merge conflict. In this case, the maintainer needs to fetch the feature branch
from the developer’s server-side repository, merge it into their local master branch, and resolve any
conflicts:

git fetch https://bitbucket.org/user/repo feature-branch
# Inspect the changes
git checkout master
git merge FETCH_HEAD

Once the changes are integrated into their local master, the maintainer needs to push it to the
official repository on the server so that other developers can access it:

git push origin master


Remember that the maintainer's origin points to their public repository, which also serves as the
official codebase for the project. The developer's contribution is now fully integrated into the
project.
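The full round trip, from the contributor's fork to the official repository, can be simulated with local directories. This is only a sketch under stated assumptions: the paths (official.git, fork.git, maintainer, dev), the file name, and the demo identity are invented, and local paths take the place of the hosted URLs.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/official.git"
git -C "$work/official.git" symbolic-ref HEAD refs/heads/master

# The maintainer seeds the official repository.
git init -q "$work/maintainer"
cd "$work/maintainer"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/official.git"
git commit -q --allow-empty -m "Initial commit"
git push -q origin master

# A contributor forks, clones the fork, and publishes a feature branch to it.
git clone -q --bare "$work/official.git" "$work/fork.git"
git clone -q "$work/fork.git" "$work/dev"
(
  cd "$work/dev"
  git checkout -q -b feature-branch
  echo "new feature" > feature.txt
  git add feature.txt
  git commit -q -m "Add feature"
  git push -q origin feature-branch
)

# The maintainer fetches from the contributor's public repository and integrates.
git fetch -q "$work/fork.git" feature-branch
git checkout -q master
git merge -q FETCH_HEAD
git push -q origin master
```

Note that the contributor never pushes to official.git; only the maintainer's final push changes it.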

Developers synchronize with the official repository

Since the main codebase has moved forward, other developers should synchronize with the official
repository:

git pull upstream master

Where To Go From Here


If you’re coming from an SVN background, the Forking Workflow may seem like a radical paradigm
shift. But don’t be afraid—all it’s really doing is introducing another level of abstraction on top of the
Feature Branch Workflow. Instead of sharing branches directly through a single central repository,
contributions are published to a server-side repository dedicated to the originating developer.

This article explained how a contribution flows from one developer into the official master branch,
but the same methodology can be used to integrate a contribution into any repository. For example,
if one part of your team is collaborating on a particular feature, they can share changes amongst
themselves in the exact same manner—without touching the main repository.

This makes the Forking Workflow a very powerful tool for loosely-knit teams. Any developer can
easily share changes with any other developer, and any branch can be efficiently merged into the
official codebase.

Migrating to Git
SVN to Git - prepping for the migration

In Why Git?, we discussed the many ways that Git can help your team become more agile. Once
you’ve decided to make the switch, your next step is to figure out how to migrate your existing
development workflow to Git.

This article explains some of the biggest changes you’ll encounter while transitioning your team from
SVN to Git. The most important thing to remember during the migration process is that Git is not
SVN. To realize the full potential of Git, try your best to open up to new ways of thinking about
version control.

For administrators
Adopting Git can take anywhere from a few days to several months depending on the size of your
team. This section addresses some of the main concerns for engineering managers when it comes to
training employees on Git and migrating repositories from SVN to Git.

Basic Git commands

Git once had a reputation for a steep learning curve. However, the Git maintainers have been steadily
releasing new improvements like sensible defaults and contextual help messages that have made
the on-boarding process a lot more pleasant.

Atlassian offers a comprehensive series of self-paced Git tutorials, as well as webinars and live
training sessions. Together, these should provide all the training options your team needs to get
started with Git. Here is a list of some basic Git commands to get you going:

Git task | Notes | Git commands
Tell Git who you are | Configure the author name and email address to be used with your commits. Note that Git strips some characters (for example trailing periods) from user.name. | git config --global user.name "Sam Smith"
 | | git config --global user.email [email protected]
Create a new local repository | | git init
Check out a repository | Create a working copy of a local repository: | git clone /path/to/repository
 | For a remote server, use: | git clone username@host:/path/to/repository
Add files | Add one or more files to staging (index): | git add <filename>
 | | git add *
Commit | Commit changes to head (but not yet to the remote repository): | git commit -m "Commit message"
 | Commit any files you've added with git add, and also commit any files you've changed since then: | git commit -a
Push | Send changes to the master branch of your remote repository: | git push origin master
Status | List the files you've changed and those you still need to add or commit: | git status
Connect to a remote repository | If you haven't connected your local repository to a remote server, add the server to be able to push to it: | git remote add origin <server>
 | List all currently configured remote repositories: | git remote -v
Branches | Create a new branch and switch to it: | git checkout -b <branchname>
 | Switch from one branch to another: | git checkout <branchname>
 | List all the branches in your repo, and also tell you what branch you're currently in: | git branch
 | Delete the feature branch: | git branch -d <branchname>
 | Push the branch to your remote repository, so others can use it: | git push origin <branchname>
 | Push all branches to your remote repository: | git push --all origin
 | Delete a branch on your remote repository: | git push origin :<branchname>
Update from the remote repository | Fetch and merge changes on the remote server to your working directory: | git pull
 | To merge a different branch into your active branch: | git merge <branchname>
 | View all the merge conflicts: | git diff
 | View the conflicts against the base file: | git diff --base <filename>
 | Preview changes, before merging: | git diff <sourcebranch> <targetbranch>
 | After you have manually resolved any conflicts, you mark the changed file: | git add <filename>
Tags | You can use tagging to mark a significant changeset, such as a release: | git tag 1.0.0 <commitID>
 | CommitId is the leading characters of the changeset ID, up to 10, but must be unique. Get the ID using: | git log
 | Push all tags to remote repository: | git push --tags origin
Undo local changes | If you mess up, you can replace the changes in your working tree with the last content in head. Changes already added to the index, as well as new files, will be kept. | git checkout -- <filename>
 | Instead, to drop all your local changes and commits, fetch the latest history from the server and point your local master branch at it, do this: | git fetch origin
 | | git reset --hard origin/master
Search | Search the working directory for foo(): | git grep "foo()"
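As a rough illustration of how several of the basic commands in this section fit together, the following sketch runs them in a throwaway repository. The directory, file name, branch name, and identity values are placeholders invented for this example, and the identity is set for this repository only (without --global) so that your own Git configuration is left untouched.

```shell
#!/bin/sh
set -eu

# Throwaway working area; every name below is a placeholder for this sketch.
demo=$(mktemp -d)
cd "$demo"

git init -q .                                # create a new local repository
git symbolic-ref HEAD refs/heads/master      # call the first branch "master" on any Git version
git config user.name "Sam Smith"             # identity for this repository only
git config user.email "sam@example.invalid"

echo "hello" > readme.txt
git add readme.txt                           # stage the file
git commit -q -m "Add readme"                # commit to the local repository

git checkout -q -b feature                   # create a branch and switch to it
echo "more" >> readme.txt
git commit -q -a -m "Extend readme"          # commit all tracked changes

git checkout -q master
git merge -q feature                         # merge the branch back (a fast-forward here)
git branch -d feature                        # safe delete: the branch is fully merged
git tag 1.0.0                                # tag the current commit as a release

commit_count=$(git rev-list --count HEAD)
git status --short                           # prints nothing when the working tree is clean
```

Nothing in this sketch touches a remote server; pushing and pulling only come into play once a remote has been added.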

Git Migration Tools

There are a number of tools available to help you migrate your existing projects from SVN to Git, but
before you decide what tools to use, you need to figure out how you want to migrate your code.
Your options are:

• Migrate your entire codebase to Git and stop using SVN altogether.
• Don’t migrate any existing projects to Git, but use Git for all new projects.
• Migrate some of your projects to Git while continuing to use SVN for other projects.
• Use SVN and Git simultaneously on the same projects.

A complete transition to Git limits the complexity in your development workflow, so this is the
preferred option. However, this isn’t always possible in larger companies with dozens of
development teams and potentially hundreds of projects. In these situations, a hybrid approach is a
safer option.

Your choice of migration tool(s) depends largely on which of the above strategies you choose. Some
of the most common SVN-to-Git migration tools are introduced below.

Atlassian’s migration scripts


If you’re interested in making an abrupt transition to Git, Atlassian’s migration scripts are a good
choice for you. These scripts provide all the tools you need to reliably convert your existing SVN
repositories to Git repositories. The resulting native-Git history ensures you won’t need to deal with
any SVN-to-Git interoperability issues after the conversion process.

We’ve provided a complete technical walkthrough for using these scripts to convert your entire
codebase to a collection of Git repositories. This walkthrough explains everything from extracting
SVN author information to re-organizing non-standard SVN repository structures.
SVN Mirror for Stash (now Bitbucket Server) plugin
SVN Mirror for Stash is a Bitbucket Server plugin that lets you easily maintain a hybrid codebase that
works with both SVN and Git. Unlike Atlassian’s migration scripts, SVN Mirror for Stash lets you use
Git and SVN simultaneously on the same project for as long as you like.

This compromise solution is a great option for larger companies. It enables incremental Git adoption
by letting different teams migrate workflows at their convenience.

Git-SVN
The git svn tool that comes with Git serves as an interface between a local Git repository and a
remote SVN repository. It lets developers write code and create commits locally with Git, then push
them up to a central SVN repository with svn commit-style behavior.

git svn is a good option if you’re not sure about making the switch to Git and want to let some of
your developers explore Git commands without committing to a full-on migration. It’s also perfect
for the training phase—instead of an abrupt transition, your team can ease into it with local Git
commands before worrying about collaboration workflows.

Note that git svn should only be a temporary phase of your migration process. Since it still depends
on SVN for the “backend,” it can’t leverage the more powerful Git features like branching or
advanced collaboration workflows.

Rollout Strategies
Migrating your codebase is only one aspect of adopting Git. You also need to consider how to
introduce Git to the people behind that codebase. External consultants, internal Git champions, and
pilot teams are the three main strategies for moving your development team over to Git.

External Git Consultants


Git consultants can essentially handle the migration process for you for a nominal fee. This has the
advantage of creating a Git workflow that’s perfectly suited to your team without investing the time
to figure it out on your own. It also makes expert training resources available to you while your team
is learning Git. Atlassian Experts are pros when it comes to SVN to Git migration and are a good
resource for sourcing a Git consultant.

On the other hand, designing and implementing a Git workflow on your own is a great way for your
team to understand the inner workings of their new development process. This avoids the risk of
your team being left in the dark when your consultant leaves.
Internal Git Champions
A Git champion is a developer inside of your company who’s excited to start using Git. Leveraging a
Git champion is a good option for companies with a strong developer culture and eager
programmers comfortable being early adopters. The idea is to enable one of your engineers to
become a Git expert so they can design a Git workflow tailored to your company and serve as an
internal consultant when it’s time to transition the rest of the team to Git.

Compared to an external consultant, this has the advantage of keeping your Git expertise in-house.
However, it requires a larger time investment to train that Git champion, and it runs the risk of
choosing the wrong Git workflow or implementing it incorrectly.

Pilot Teams
The third option for transitioning to Git is to test it out on a pilot team. This works best if you have a
small team working on a relatively isolated project. This could work even better by combining
external consultants with internal Git champions in the pilot team for a winning combo.

This has the advantage of requiring buy-in from your entire team, and also limits the risk of choosing
the wrong workflow, since it gets input from the entire team while designing the new development
process. In other words, it ensures any missing pieces are caught sooner than when a consultant or
champion designs the new workflow on their own.

On the other hand, using a pilot team means more initial training and setup time: instead of one
developer figuring out a new workflow, there’s a whole team that could potentially be temporarily
less productive while they’re getting comfortable with their new workflow. However, this short term
pain is absolutely worth the long term gain.

Security and Permissions


Access control is an aspect of Git where you need to fundamentally re-think how you manage your
codebase.

In SVN, you typically store your entire codebase in a single central repository, then limit access to
different teams or individuals by folder. In Git, this is not possible: developers must retrieve the
entire repository to work with it. You typically cannot retrieve a subset of the repository, as you can
with SVN. Permissions can only be granted to entire Git repositories.

This means you have to split up your large, monolithic SVN repository into several small Git
repositories. We actually experienced this first hand here at Atlassian when our JIRA development
team migrated to Git. All of our JIRA plugins used to be stored in a single SVN repository, but after
the migration, each plugin ended up in its own repository.

Keep in mind that Git was designed to securely integrate code contributions from thousands of
independent Linux developers, so it definitely provides some way to set up whatever kind of access
control your team needs. This may, however, require a fresh look at your build cycle.
If you’re concerned about maintaining dependencies between your new collection of Git
repositories, you may find a dependency management layer on top of Git helpful. A dependency
management layer will help with build times because as a project grows, you need “caching” in
order to speed up your build time. A list of recommended dependency management layer tools for
every technology stack can be found in this helpful article: “Git and project dependencies”.

For developers

A Repository for Every Developer


As a developer, the biggest change you’ll need to adjust to is the distributed nature of Git. Instead of
a single central repository, every developer has their own copy of the entire repository. This
dramatically changes the way you collaborate with your fellow programmers.

Instead of checking out an SVN repository with svn checkout and getting a working copy, you clone
the entire Git repository to your local machine with git clone.

Collaboration occurs by moving branches between repositories with either git push, git fetch, or git
pull. Sharing is commonly done on the branch level in Git but can be done on the commit level,
similar to SVN. But in Git, a commit represents the entire state of the whole project rather than
individual file modifications. Since you can use branches in both Git and SVN, the important distinction
here is that you can commit locally with Git, without sharing your work. This enables you to
experiment more freely and work more effectively offline, and it speeds up almost all version
control commands.

However, it’s important to understand that a remote repository is not a direct link into somebody
else’s repository. It’s simply a bookmark that prevents you from having to re-type the full URL each
time you interact with a remote repository. Until you explicitly pull or push a branch to a remote
repository, you’re working in an isolated environment.
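The isolation described here can be seen in a small experiment. The sketch below is only an illustration: a bare repository in a temporary directory stands in for the shared server, and all paths, file names, and identity values are made up for the example.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q --bare "$work/server.git"        # stands in for the shared server
git clone -q "$work/server.git" "$work/local" 2>/dev/null
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Sam Smith"
git config user.email "sam@example.invalid"

git remote -v                                # "origin" is just a stored URL

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Local work"

# The commit exists only locally; the server still has no branches.
before_push=$(git ls-remote --heads origin | wc -l | tr -d ' ')

git push -q origin master                    # sharing is an explicit step
after_push=$(git ls-remote --heads origin | wc -l | tr -d ' ')

echo "branches on server before push: $before_push, after push: $after_push"
```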

The other big adjustment for SVN users is the notion of “local” and “remote” repositories. Local
repositories are on your local machine, and all other repositories are referred to as remote
repositories. The main purpose of a remote repository is to make your code accessible to the rest of
the team, and thus no active development takes place in it. Your local repository is where you do
all of your software development.

Don’t Be Scared of Branching or Merging


In SVN, you commit code by editing files in your working copy, then running svn commit to send the
code to the central repository. Everybody else can then pull those changes into their own working
copies with svn update. SVN branches are usually reserved for large, long-running aspects of a
project because merging is a dangerous procedure that has the potential to break the project.

Git’s basic development workflow is much different. Instead of being bound to a single line of
development (e.g., trunk/), life revolves around branching and merging.
When you want to start working on anything in Git, you create and check out a new branch with git
checkout -b <branch-name>. This gives you a dedicated line of development where you can write
code without worrying about affecting anyone else on your team. If you break something beyond
repair, you simply throw the branch away with git branch -D <branch-name> (the lowercase -d
form refuses to delete a branch that has not been fully merged). If you build something
useful, you file a pull request asking to merge it into the master branch.
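A minimal sketch of the "throw the branch away" case, run in a temporary repository with invented file and branch names. It also shows that the safe form, git branch -d, declines to delete a branch whose commits were never merged, while git branch -D discards it regardless.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Sam Smith"
git config user.email "sam@example.invalid"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b experiment                # dedicated line of development
echo "broken" > app.txt
git commit -q -a -m "Risky change"

git checkout -q master                       # master never saw the risky change
master_content=$(cat app.txt)

# -d declines because "experiment" has unmerged commits; -D discards it anyway.
if git branch -d experiment 2>/dev/null; then
  safe_delete=allowed
else
  safe_delete=refused
fi
git branch -q -D experiment
remaining=$(git branch --list experiment)

echo "master still contains: $master_content (safe delete was $safe_delete)"
```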

Potential Git Workflows


When choosing a Git workflow it is important to consider your team's needs. A simple workflow can
maximise development speed and flexibility, while a more complex workflow can ensure greater
consistency and control of work in progress. You can adapt and combine the general approaches
listed below to suit your needs and the different roles on your team. A core developer might use
feature branches while a contractor works from a fork, for example.

A centralized workflow provides the closest match to common SVN processes, so it's a good option
to get started.

Building on that idea, using a feature branch workflow lets developers keep their work in progress
isolated and important shared branches protected. Feature branches also form the basis for
managing changes via pull requests.

A Gitflow workflow is a more formal, structured extension to feature branching, making it a great
option for larger teams with well-defined release cycles.

Finally, consider a forking workflow if you need maximum isolation and control over changes, or
have many developers contributing to one repository.

But, if you really want to get the most out of Git as a professional team, you should consider the
feature branch workflow. This is a truly distributed workflow that is highly secure, incredibly
scalable, and quintessentially agile.

Conclusion
Transitioning your team to Git can be a daunting task, but it doesn’t have to be. This article
introduced some of the common options for migrating your existing codebase, rolling out Git to your
development teams, and dealing with security and permissions. We also introduced the biggest
challenges that your developers should be prepared for during the migration process.

Hopefully, you now have a solid foundation for introducing distributed development to your
company, regardless of its size or current development practices.
Migrate to Git from SVN

We’ve broken down the SVN-to-Git migration process into 5 simple steps:

1. Prepare your environment for the migration.
2. Convert the SVN repository to a local Git repository.
3. Synchronize the local Git repository when the SVN repository changes.
4. Share the Git repository with your developers via Bitbucket.
5. Migrate your development efforts from SVN to Git.

The prepare, convert, and synchronize steps take an SVN commit history and turn it into a Git
repository. The best way to manage these first 3 steps is to designate one of your team members as
the migration lead (if you’re reading this guide, that person is probably you). All 3 of these steps
should be performed on the migration lead’s local computer.
After the synchronize phase, the migration lead should have no trouble keeping a local Git repository
up-to-date with an SVN counterpart. The migration lead can then share this local Git repository
with other developers by pushing it to Bitbucket, a Git hosting service.

Once it’s on Bitbucket, other developers can clone the converted Git repository to their local
machines, explore its history with Git commands, and begin integrating it into their build processes.
However, we advocate a one-way synchronization from SVN to Git until your team is ready to switch
to a pure Git workflow. This means that everybody should treat their Git repository as read-only and
continue committing to the original SVN repository. The only changes to the Git repository should
happen when the migration lead synchronizes it and pushes the updates to Bitbucket.

This provides a clear-cut transition period where your team can get comfortable with Git without
interrupting your existing SVN-based workflow. Once you’re confident that your developers are
ready to make the switch, the final step in the migration process is to freeze your SVN repository and
begin committing with Git instead.

This switch should be a very natural process, as the entire Git workflow is already in place and your
developers have had all the time they need to get comfortable with it. By this point, you have
successfully migrated your project from SVN to Git.

Prepare

The first step to migrating a project from SVN to Git-based version control is to prepare the
migration lead’s local machine. In this phase, you’ll download a convenient utility script, mount a
case-sensitive filesystem (if necessary), and map author information from SVN to Git.

All of the following steps should be performed on the migration lead’s local machine.

Download the migration script


Git comes with most of the necessary tools for importing an SVN repository; however, there are a
few missing bits of functionality that Atlassian has rolled into a handy JAR file. This file will be
integral to the migration, so be sure to download svn-migration-scripts.jar from Atlassian’s Bitbucket
account. This guide assumes that you’ve saved it in your home directory.

Once you’ve downloaded it, it’s a good idea to verify the scripts to make sure you have the Java
Runtime Environment, Git, Subversion, and the git-svn utility installed. Open a command prompt
and run the following:

java -jar ~/svn-migration-scripts.jar verify

This will display an error message in the console if you don’t have the necessary programs for the
migration process. Make sure that any missing software is installed before moving on.

If you get a warning about being unable to determine a version, run export LANG=C (*nix) or SET
LANG=C (Windows) and try again.

If you’re performing the migration on a computer running OS X, you’ll also see the following
warning:
You appear to be running on a case-insensitive file-system. This is unsupported, and can result in
data loss.

We’ll address this in the next section.

Mount a case-sensitive disk image


Migrating to Git should be done on a case-sensitive file system to avoid corrupting the repository.
This is a problem if you’re performing the migration on an OS X computer, as the OS X filesystem
isn’t case-sensitive.

If you’re not running OS X, all you need to do is create a directory on your local machine called
~/GitMigration. This is where you will perform the conversion. After that, you can skip to the next
section.

If you are running OS X, you need to mount a case-sensitive disk image with the create-disk-image
script included in svn-migration-scripts.jar. It takes two parameters:

1. The size of the disk image to create in gigabytes. You can use any size you like, as long as it’s
bigger than the SVN repository that you’re trying to migrate.
2. The name of the disk image. This guide uses GitMigration for this value.

For example, the following command creates a 5GB disk image called GitMigration:

java -jar ~/svn-migration-scripts.jar create-disk-image 5 GitMigration

The disk image is mounted in your home directory, so you should now see a directory called
~/GitMigration on your local machine. This serves as a virtual case-sensitive filesystem, and it’s
where you’ll store the converted Git repository.

Extract the author information

SVN only records the username of the author for each revision. Git, however, stores the full name
and email address of the author. This means that you need to create a text file that maps SVN
usernames to their Git counterparts.
Run the following commands to automatically generate this text file:

cd ~/GitMigration

java -jar ~/svn-migration-scripts.jar authors <svn-repo> > authors.txt

Be sure to replace <svn-repo> with the URI of the SVN repository that you want to migrate. For
example, if your repository resided at https://svn.example.com, you would run the following:

java -jar ~/svn-migration-scripts.jar authors https://svn.example.com > authors.txt

This creates a text file called authors.txt that contains the username of every author in the SVN
repository along with a generated name and email address. It should look something like this:

j.doe = j.doe <[email protected]>

m.smith = m.smith <[email protected]>

Change the portion to the right of the equal sign to the full name and email address of the
corresponding user. For example, you might change the above authors to:

j.doe = John Doe <[email protected]>

m.smith = Mary Smith <[email protected]>
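Before starting the conversion, it can help to check that every line of the edited file still follows the username = Full Name <email> pattern. The following is a minimal sketch of such a check, not part of the migration scripts; the sample entries and the example.invalid addresses are invented for the illustration, and in practice the file would be ~/GitMigration/authors.txt.

```shell
#!/bin/sh
set -eu

# Hypothetical sample file standing in for ~/GitMigration/authors.txt.
authors=$(mktemp)
cat > "$authors" <<'EOF'
j.doe = John Doe <john.doe@example.invalid>
m.smith = Mary Smith <mary.smith@example.invalid>
b.jones = b.jones
EOF

# Collect every line that is not of the form: username = Full Name <email>
bad_lines=$(grep -v -E '^[^ =]+ = [^<>]+ <[^<> ]+@[^<> ]+>$' "$authors" || true)

echo "Entries needing attention: $bad_lines"
```

Here the third entry is reported because it has no email address in angle brackets.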

Summary
Now that you have your migration scripts, disk image (OS X only), and author information, you’re
ready to import your SVN history into a new Git repository. The next phase explains how this
conversion works.
Convert

The next step in the migration from SVN to Git is to import the contents of the SVN repository into a
new Git repository. We’ll do this with the git svn utility that is included with most Git distributions,
then we’ll clean up the results with svn-migration-scripts.jar.

Beware that the conversion process can take a significant amount of time for larger repositories,
even when cloning from a local SVN repository. As a benchmark, converting a 400MB repository with
33,000 commits on master took around 12 hours to complete.

For reasonably sized repositories, the following steps should be run on the migration lead’s local
computer. However, if you have a very large SVN repository and want to cut down on the conversion
time, you can run git svn clone on the SVN server instead of on the migration lead’s local machine.
This will avoid the overhead of cloning via a network connection.

Clone the SVN repository


The git svn clone command transforms the trunk, branches, and tags in your SVN repository into a
new Git repository. Depending on the structure of your SVN repo, the command needs to be
configured differently.
Standard SVN layouts

If your SVN project uses the standard /trunk, /branches, and /tags directory layout, you can use the
--stdlayout option instead of manually specifying the repository’s structure. Run the following
command in the ~/GitMigration directory:

git svn clone --stdlayout --authors-file=authors.txt \
    <svn-repo>/<project> <git-repo-name>

Where <svn-repo> is the URI of the SVN repository that you want to migrate, <project> is the
name of the project that you want to import, and <git-repo-name> is the directory name of the new
Git repository.

For example, if you were migrating a project called Confluence, hosted on https://svn.atlassian.com,
you might run the following:

git svn clone --stdlayout --authors-file=authors.txt \
    https://svn.atlassian.com/Confluence ConfluenceAsGit

Non-standard SVN layouts

If your SVN repository doesn’t have a standard layout, you need to provide the locations of your
trunk, branches, and tags using the --trunk, --branches, and --tags command line options. For
example, if you have branches stored in both the /branches directory and the /bugfixes directories,
you would use the following command:

git svn clone --trunk=/trunk --branches=/branches \
    --branches=/bugfixes --tags=/tags --authors-file=authors.txt \
    <svn-repo>/<project> <git-repo-name>

Inspect the new Git repository


After git svn clone has finished (this might take a while), you’ll find a new directory called
<git-repo-name> in ~/GitMigration. This is the converted Git repository. You should be able to switch
into <git-repo-name> and run any of the standard Git commands to explore your project.

Branches and tags are not imported into the new Git repository as you might expect. You won’t find
any of your SVN branches in the git branch output, nor will you find any of your SVN tags in the git
tag output. But, if you run git branch -r, you’ll find all of the branches and tags from your SVN
repository. The git svn clone command imports your SVN branches as remote branches and imports
your SVN tags as remote branches prefixed with tags/.

This behavior makes certain two-way synchronization procedures easier, but it can be very confusing
when trying to make a one-way migration to Git. That’s why our next step will be to convert these
remote branches to local branches and actual Git tags.

Clean the new Git repository


The clean-git script included in svn-migration-scripts.jar turns the SVN branches into local Git
branches and the SVN tags into full-fledged Git tags. Note that this is a destructive operation, and
you will not be able to move commits from the Git repository back into the SVN repository.

If you’re following this migration guide, this isn’t a problem, as it advocates a one-way sync from
SVN to Git (the Git repository is considered read-only until after the Migrate step). However, if
you’re planning on committing to the Git repository and the SVN repository during the migration
process, you should not perform the following commands. This is an advanced task, and is not
recommended for the typical project.

To see what can be cleaned up, run the following command in ~/GitMigration/<git-repo-name>:

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git

This will output all of the changes the script wants to make, but it won’t actually make any of them.
To execute these changes, you need to use the --force option, like so:

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git --force

You should now see all of your SVN branches in the git branch output, along with your SVN tags in
the git tag output. This means that you’ve successfully converted your SVN project to a Git
repository.
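To make the effect of this cleanup more concrete, the following sketch imitates it with plain Git commands in a throwaway repository. It is only an approximation of the idea and not the script's actual implementation. The ref names (origin/trunk, origin/bugfix, origin/tags/1.0) are assumptions about how an import might lay out its remote refs, and they are faked here with git update-ref so that no SVN server is needed.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Sam Smith"
git config user.email "sam@example.invalid"
echo "code" > main.c
git add main.c
git commit -q -m "Imported revision"

# Fake the remote refs that an import might leave behind (assumed names).
git update-ref refs/remotes/origin/trunk HEAD
git update-ref refs/remotes/origin/bugfix HEAD
git update-ref refs/remotes/origin/tags/1.0 HEAD

# Remote refs under tags/ become real Git tags.
git for-each-ref --format='%(refname)' refs/remotes/origin/tags |
while read -r ref; do
  git tag "${ref#refs/remotes/origin/tags/}" "$ref"
done

# Every other remote ref except trunk becomes a local branch.
git for-each-ref --format='%(refname)' refs/remotes/origin |
while read -r ref; do
  name=${ref#refs/remotes/origin/}
  case "$name" in
    trunk|tags/*) ;;                         # trunk is already master in this sketch
    *) git branch --no-track "$name" "$ref" ;;
  esac
done

git branch                                   # now lists bugfix and master
git tag                                      # now lists 1.0
```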

Summary
In this step, you turned an SVN repository into a new Git repository with the git svn clone command,
then cleaned up the structure of the resulting repository with svn-migration-scripts.jar. In the next
step, you’ll learn how to keep this new Git repo in sync with any new commits to the SVN repository.
This will be a similar process to the conversion, but there are some important workflow
considerations during this transition period.

Synchronize

It’s very easy to synchronize your Git repository with new commits in the original SVN repository.
This makes for a comfortable transition period in the migration process where you can continue to
use your existing SVN workflow, but begin to experiment with Git.
It’s possible to synchronize in both directions. However, we recommend a one-way sync from SVN to
Git. During your transition period, you should only commit to your SVN repository, not your Git repo.
Once you’re confident that your team is ready to make the switch, you can complete the migration
process and begin to commit changes with Git instead of SVN.

In the meantime, you should continue to commit to your SVN repository and synchronize your Git
repository whenever necessary. This process is similar to the Convert phase, but since you’re only
dealing with incremental changes, it should be much more efficient.

Update the authors file


The authors.txt file that we used to map SVN usernames to full names and email addresses is
essential to the synchronization process. If it has been moved from the ~/GitMigration/authors.txt
location that we’ve been using thus far, you need to update its location with:

git config svn.authorsfile <path-to-authors-file>

If new developers have committed to the SVN repository since the last sync (or the initial clone), the
authors file needs to be updated accordingly. You can do this by manually appending new users to
authors.txt, or you can use the --authors-prog option, as discussed in the next section.

For one-off synchronizations it’s often easier to directly edit the authors file; however, the
--authors-prog option is preferred if you’re performing unsupervised syncs (i.e. in a scheduled task).

Automatically generating Git authors


If your authors file doesn’t need to be updated, you can skip to the next section.
The git svn command includes an option called --authors-prog, which points to a script that
automatically transforms SVN usernames into Git authors. You’ll need to configure this script to
accept the SVN username as its only argument and return a single line in the form of Name <email>
(just like the right hand side of the existing authors file). This option can be very useful if you need to
periodically add new developers to your project.

If you want to use the --authors-prog option, create a file called authors.sh in ~/GitMigration.
Add the following line to authors.sh to return a dummy Git name and email for any authors that
aren’t found in authors.txt:

echo "$1 <[email protected]>"

Again, this will only generate a dummy name and email based on the SVN username, so feel free to
alter it if you can provide a more meaningful mapping.
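One possible way to provide a more meaningful mapping is sketched below, for setups that rely on the script alone: it consults a mapping file first and only falls back to a placeholder for unknown usernames. The function name git_author, the sample mapping entry, and the example.invalid domain are all hypothetical and chosen for this illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical mapping file; in practice this would be ~/GitMigration/authors.txt.
authors_file=$(mktemp)
printf '%s\n' 'j.doe = John Doe <john.doe@example.invalid>' > "$authors_file"

# Print "Name <email>" for an SVN username: use the mapping when one exists,
# otherwise fall back to a placeholder built from the username.
git_author() {
  match=$(awk -F ' = ' -v user="$1" '$1 == user { print $2; exit }' "$authors_file")
  if [ -n "$match" ]; then
    echo "$match"
  else
    echo "$1 <$1@example.invalid>"
  fi
}

known=$(git_author j.doe)
unknown=$(git_author new.dev)
echo "$known"
echo "$unknown"
```

In an actual authors.sh, the body of git_author would be the script itself, with "$1" being the SVN username that git svn passes in.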

Fetch the new SVN commits


Unlike SVN, Git makes a distinction between downloading upstream commits and integrating them
into the project. The former is called “fetching”, while the latter can be done via merging or
rebasing. In the ~/GitMigration directory, run the following command to fetch any new commits
from the original SVN repository.

git svn fetch

This is similar to the git svn clone command from the previous phase in that it only updates the Git
repository’s remote branches—the local branches will not reflect any of the updates yet. Your
remote branches, on the other hand, should exactly match your SVN repo’s history.

If you’re using the --authors-prog option, you need to include it in the above command, like so:

git svn fetch --authors-prog=authors.sh

Synchronize with the fetched commits


To apply the downloaded commits to the repository, run the following command:

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar sync-rebase

This will rebase the fetched commits onto your local branches so that they match their remote
counterparts. You should now be able to see the new commits in your git log output.

Clean up the Git repo (again)

It’s also a good idea to run the clean-git script again to remove any obsolete tags or branches that
were deleted from the original SVN repository since the last sync:

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git --force

Your local Git repository should now be synchronized with your SVN repository.

Summary
During this transition period, it’s very important that your developers only commit to the original
SVN repository. The only time the Git repository should be updated is via the synchronization
process discussed above. This is much easier than managing a two-way synchronization workflow,
but it still allows you to start integrating Git into your build process.

Share

In SVN, developers share contributions by committing changes from a working copy on their local
computer to a central repository. Then, other developers pull these updates from the central repo
into their own local working copies.

Git’s collaboration workflow is much different. Instead of differentiating between working copies
and the central repository, Git gives each developer their own local copy of the entire repository.
Changes are committed to this local repository instead of a central one. To share updates with other
developers, you need to push these local changes to a public Git repository on a server. Then, the
other developers can pull your new commits from the public repo into their own local repositories.
Giving each developer their own complete repository is the heart of distributed version control, and
it opens up a wide array of potential workflows. You can read more about these workflows from our
Git Workflows section.

So far, you’ve only been working with a local Git repository. This page explains how to push this local
repo to a public repository hosted on Bitbucket. Sharing the Git repository during the migration
allows your team to experiment with Git commands without affecting their active SVN development.
Until you’re ready to make the switch, it’s very important to treat the shared Git repositories as
read-only. All development should continue to be committed to the original SVN repository.

Create a Bitbucket account


If you don’t already have a Bitbucket account, you’ll need to create one. Hosting is free for up to 5
users, so you can start experimenting with new Git workflows right away.

Create a Bitbucket repository


Next, you’ll need to create a Bitbucket repository. Bitbucket makes it very easy to administer your
hosted repositories via a web interface. All you have to do is click the Create repository button after
you’ve logged in.

In the resulting form, add a name and description for your repository. If your project is private, keep
the Access level option checked so that only designated developers are allowed to clone it. For the
Forking field, use Allow only private forks. Use Git for the Repository type, select any project
management tools you want to use, and select the primary programming language of your project in
the Language field.

To create the hosted repository, submit the form by clicking the Create repository button. After your
repository is set up, you’ll see a Next steps page that describes some useful commands for importing
an existing project. The rest of this page will walk you through those instructions step-by-step.

Add an origin remote


To make it easier to push commits from your local Git repository to the Bitbucket repository you just
created, you should record the Bitbucket repo’s URL in a remote. A remote is just a convenient
shortcut for a URL. Technically, you can use anything you like for the shortcut, but if the remote
repository serves as the official codebase for the project, it’s conventionally referred to as origin.
Run the following in your local Git repository to add your new Bitbucket repository as the origin
remote.

git remote add origin https://<user>@bitbucket.org/<user>/<repo>.git


Be sure to change <user> to your Bitbucket username and <repo> to the name of the Bitbucket
repository. You should also be able to copy and paste the complete URL from the Bitbucket web
interface.

After running the above command, you can use origin in other Git commands to refer to your
Bitbucket repository.
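To double-check what was recorded, you can list the remotes afterwards. The sketch below uses a throwaway repository and a hypothetical user (jdoe) and repository name (my-project); no network access is needed just to record a URL:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q

# Hypothetical user and repository names; substitute your own
git remote add origin https://jdoe@bitbucket.org/jdoe/my-project.git

# List every configured remote with its fetch and push URLs
git remote -v

# Print only the URL that the origin shortcut stands for
git remote get-url origin
```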

Push the local repository to Bitbucket


Next, you need to populate your Bitbucket repository with the contents of your local Git repository.
This is called “pushing,” and can be accomplished with the following command:

git push -u origin --all

The -u option tells Git to track the upstream branches. This enables Git to tell you if the remote
repo’s commit history is ahead or behind your local ones. The --all option pushes all of the local
branches to the remote repository.

You also need to push your local tags to the Bitbucket repository with the --tags option:

git push --tags


Your Bitbucket repository is now essentially a clone of your local repository. In the Bitbucket web
interface, you should be able to explore the entire commit history of all of your branches.
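The two push commands can be tried out offline. In this sketch a local bare repository stands in for the Bitbucket repository (an assumption made so the example runs without a network), and the branch and tag names are made up:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work="$(mktemp -d)"

# Stand-in for the hosted repository (assumption: local bare repo)
git -c init.defaultBranch=master init -q --bare "$work/hosted.git"

# A small local repository with two branches and one tag
git -c init.defaultBranch=master init -q "$work/local"
cd "$work/local"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch feature
git tag v1.0

git remote add origin "$work/hosted.git"
git push -q -u origin --all   # all local branches, with upstream tracking
git push -q --tags            # tags are pushed separately

# The remote should now list both branches and the tag
git ls-remote origin
# With tracking configured, status can report ahead/behind information
git status -sb
```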

Share the repository with your team


All you have to do now is share the URL of your Bitbucket repository with any other developers that
need access to the repository. The URL for any Git repository can be copied and pasted from the
repository home page on Bitbucket.

If your repository is private, you’ll also need to grant access to your team members in the
Administration tab of the Bitbucket web interface. Users and groups can be managed by clicking the
Access management link in the left sidebar.

As an alternative, you can use Bitbucket’s built-in invitation feature to invite other developers to fork
the repository. The invited users will automatically be given access to the repository, so you don’t
need to worry about granting permissions.

Once they have the URL of your repository, another developer can copy the repository to their local
machine with git clone and begin working with the project. For example, after running the following
command on their local machine, another developer would find a new Git repository containing the
project in the <destination> directory.

git clone https://<user>@bitbucket.org/<user>/<project>.git <destination>

You should now be able to push your local project to a remote repository, and your team should be
able to use that remote repository to clone the project onto their local machines. These are all the
tools you need to start collaborating with Git. However, you and your team should continue to
commit changes using SVN until everybody is ready to make the switch.

The only changes to the Git repository should come from the original SVN repository using the
synchronization process discussed on the previous page. For all intents and purposes, this means
that all of your Git repositories (both local and remote) are read-only. Your developers can
experiment with them, and you can begin to integrate them into your build process, but you should
avoid committing any permanent changes using Git.

Summary
In this step, you set up a Bitbucket repository to share your converted Git repository with other
developers. You should now have all the tools you need to implement any of the Git workflows
described in Git Workflows. You can continue synchronizing with the SVN repository and sharing the
resulting Git commits via Bitbucket for as long as it takes to get your development team comfortable
with Git. Then, you can complete the migration process by retiring your SVN repository.

Migrate

This migration guide advocates a one-way synchronization from SVN to Git during the transition
period. This means that while your team is getting comfortable with Git, they should still only be
committing to the original SVN repository. When you’re ready to make the switch, the SVN
repository should freeze at whatever state it’s in. Then, developers should begin committing to their
local Git repositories and sharing them via Bitbucket.

The discrete switch from SVN to Git makes for a very intuitive migration. All of your developers
should already understand the new Git workflows that they’ll be using, and they should have had
plenty of time to practice using Git commands on the local repositories they cloned from Bitbucket.

This page guides you through the final step of the migration.

Synchronize the Git repository


Before finalizing your migration to Git, you should make sure that your Git repository contains any
new changes that have been committed to your SVN repository. You can do this with the same
process described in the Synchronize phase.

git svn fetch

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar sync-rebase

java -Dfile.encoding=utf-8 -jar ~/svn-migration-scripts.jar clean-git --force

Back up the SVN repository


While you can still see your pre-Git project history in the migrated repository, it’s a good idea to
back up the SVN repository just in case you ever need to explore the raw SVN data. An easy way to
back up an SVN repo is to run the following on the machine that hosts the central SVN repository
(this example assumes a Linux host):

svnadmin dump <svn-repo> | gzip -9 > <backup-file>

Replace <svn-repo> with the file path of the SVN repository that you’re backing up, and replace
<backup-file> with the file path of the compressed file containing the backup.

Make the SVN repository read-only

All of your developers should now be committing with Git. To enforce this convention, you can make
your SVN repository read-only. This process can vary depending on your server setup, but if you’re
using the svnserve daemon, you can accomplish this by editing your SVN repo’s conf/svnserve.conf
file. Its [general] section should contain the following lines:

anon-access = read

auth-access = read

This tells svnserve that both anonymous and authenticated users only have read permissions.

Summary
And that’s all there is to migrating a project to Git. Your team should now be developing with a pure
Git workflow and enjoying all of the benefits of distributed development. Good job!

Advanced Tips

Atlassian’s Git tutorials introduce the most common Git commands, and our Git Workflows modules
discuss how these commands are typically used to facilitate collaboration. Alone, these are enough
to get a development team up and running with Git. But, if you really want to leverage the full power
of Git, you’re ready to dive into our Advanced Git articles.

Each of these articles provides an in-depth discussion of an advanced feature of Git. Instead of
presenting new commands and concepts, they refine your existing Git skills by explaining what’s
going on under the hood. Armed with this knowledge, you’ll be able to use familiar Git commands
more effectively. More importantly, you’ll never be scared of breaking your Git repository because
you’ll understand why it broke and how to fix it.

Merging vs. Rebasing

Git is all about working with divergent history. Its git merge and git rebase commands offer
alternative ways to integrate commits from different branches, and both options come with their
own advantages. In this article, we’ll discuss how and when a basic git merge operation can be
replaced with a rebase.

The git rebase command has a reputation for being magical Git voodoo that beginners should stay
away from, but it can actually make life much easier for a development team when used with care.
In this article, we’ll compare git rebase with the related git merge command and identify all of the
potential opportunities to incorporate rebasing into the typical Git workflow.

Conceptual Overview
The first thing to understand about git rebase is that it solves the same problem as git merge. Both
of these commands are designed to integrate changes from one branch into another branch—they
just do it in very different ways.

Consider what happens when you start working on a new feature in a dedicated branch, then
another team member updates the master branch with new commits. This results in a forked
history, which should be familiar to anyone who has used Git as a collaboration tool.

Now, let’s say that the new commits in master are relevant to the feature that you’re working on. To
incorporate the new commits into your feature branch, you have two options: merging or rebasing.

The Merge Option


The easiest option is to merge the master branch into the feature branch using something like the
following:

git checkout feature

git merge master

Or, you can chain the two commands on a single line:

git checkout feature && git merge master

This creates a new “merge commit” in the feature branch that ties together the histories of both
branches.

Merging is nice because it’s a non-destructive operation. The existing branches are not changed in
any way. This avoids all of the potential pitfalls of rebasing (discussed below).

On the other hand, this also means that the feature branch will have an extraneous merge commit
every time you need to incorporate upstream changes. If master is very active, this can pollute your
feature branch’s history quite a bit. While it’s possible to mitigate this issue with advanced git log
options, it can make it hard for other developers to understand the history of the project.
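The text does not name the git log options it has in mind. As one possibility, --no-merges and --first-parent are two standard options that hide merge noise. The sketch below builds the forked history in a throwaway repository (file names and commit messages are made up) and compares the views:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

# Start a feature, then let master move ahead so the history forks
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Merge master into feature; --no-edit accepts the default merge message
git checkout -q feature
git merge -q --no-edit master

git log --oneline --graph          # full history, including the merge commit
git log --oneline --no-merges      # same commits with merge commits hidden
git log --oneline --first-parent   # only feature's own line of development
```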

The Rebase Option


As an alternative to merging, you can rebase the feature branch onto the master branch using the
following commands:

git checkout feature

git rebase master

This moves the entire feature branch to begin on the tip of the master branch, effectively
incorporating all of the new commits in master. But, instead of using a merge commit, rebasing
re-writes the project history by creating brand new commits for each commit in the original branch.

The major benefit of rebasing is that you get a much cleaner project history. First, it eliminates the
unnecessary merge commits required by git merge. Second, rebasing also results in a perfectly
linear project history: you can follow the tip of feature all the way to the beginning of the project
without any forks. This makes it easier to navigate your project with commands like git log, git
bisect, and gitk.

But, there are two trade-offs for this pristine commit history: safety and traceability. If you don’t
follow the Golden Rule of Rebasing, re-writing project history can be potentially catastrophic for
your collaboration workflow. And, less importantly, rebasing loses the context provided by a merge
commit—you can’t see when upstream changes were incorporated into the feature.
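To see the "brand new commits" for yourself, compare the tip of the feature branch before and after a rebase. This is a minimal sketch in a throwaway repository with made-up file names:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

# Fork the history: one commit on feature, one on master
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Rebase feature onto master and compare the commit IDs
git checkout -q feature
old_tip="$(git rev-parse feature)"
git rebase -q master
new_tip="$(git rev-parse feature)"

echo "feature tip before rebase: $old_tip"
echo "feature tip after rebase:  $new_tip"
git log --oneline --graph   # a single straight line, no merge commit
```

The commit message and changes are the same, but the ID differs because the commit now has a different parent.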

Interactive Rebasing
Interactive rebasing gives you the opportunity to alter commits as they are moved to the new
branch. This is even more powerful than an automated rebase, since it offers complete control over
the branch’s commit history. Typically, this is used to clean up a messy history before merging a
feature branch into master.

To begin an interactive rebasing session, pass the -i option to the git rebase command:

git checkout feature

git rebase -i master

This will open a text editor listing all of the commits that are about to be moved:

pick 33d5b7a Message for commit #1

pick 9480b3d Message for commit #2


pick 5c67e61 Message for commit #3

This listing defines exactly what the branch will look like after the rebase is performed. By changing
the pick command and/or re-ordering the entries, you can make the branch’s history look like
whatever you want. For example, if the 2nd commit fixes a small problem in the 1st commit, you can
condense them into a single commit with the fixup command:

pick 33d5b7a Message for commit #1

fixup 9480b3d Message for commit #2

pick 5c67e61 Message for commit #3

When you save and close the file, Git will perform the rebase according to your instructions,
resulting in a project history in which commits #1 and #2 are combined into a single commit.

Eliminating insignificant commits like this makes your feature’s history much easier to understand.
This is something that git merge simply cannot do.
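The fixup edit can also be reproduced without opening an editor, which is handy for experimenting. In the sketch below, a small helper script (todo-editor.sh, a hypothetical name) stands in for the text editor by way of the GIT_SEQUENCE_EDITOR environment variable, and changes the second pick to fixup:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work="$(mktemp -d)"
git -c init.defaultBranch=master init -q "$work/repo"
cd "$work/repo"
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

# Three commits on feature; the 2nd only fixes a problem in the 1st
git checkout -q -b feature
echo "first draft" > a.txt
git add a.txt
git commit -q -m "Message for commit #1"
echo "typo fixed" > a.txt
git add a.txt
git commit -q -m "Message for commit #2"
echo "more work" > b.txt
git add b.txt
git commit -q -m "Message for commit #3"

# Stand-in for the text editor: rewrite line 2 of the todo list
cat > "$work/todo-editor.sh" <<'EOF'
#!/bin/sh
# $1 is the todo file that Git asks the "editor" to modify
awk 'NR == 2 { sub(/^pick/, "fixup") } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF
chmod +x "$work/todo-editor.sh"

GIT_SEQUENCE_EDITOR="$work/todo-editor.sh" git rebase -q -i master

# Two commits remain; the first one now contains the fix
git log --oneline master..feature
```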

The Golden Rule of Rebasing


Once you understand what rebasing is, the most important thing to learn is when not to do it. The
golden rule of git rebase is to never use it on public branches.

For example, think about what would happen if you rebased master onto your feature branch.

The rebase moves all of the commits in master onto the tip of feature. The problem is that this only
happened in your repository. All of the other developers are still working with the original master.
Since rebasing results in brand new commits, Git will think that your master branch’s history has
diverged from everybody else’s.

The only way to synchronize the two master branches is to merge them back together, resulting in
an extra merge commit and two sets of commits that contain the same changes (the original ones,
and the ones from your rebased branch). Needless to say, this is a very confusing situation.

So, before you run git rebase, always ask yourself, “Is anyone else looking at this branch?” If the
answer is yes, take your hands off the keyboard and start thinking about a non-destructive way to
make your changes (e.g., the git revert command). Otherwise, you’re safe to re-write history as
much as you like.

Force-Pushing
If you try to push the rebased master branch back to a remote repository, Git will prevent you from
doing so because it conflicts with the remote master branch. But, you can force the push to go
through by passing the --force flag, like so:

# Be very careful with this command!

git push --force

This overwrites the remote master branch to match the rebased one from your repository and
makes things very confusing for the rest of your team. So, be very careful to use this command only
when you know exactly what you’re doing.
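The text above only covers --force. As a side note, Git also offers --force-with-lease, which refuses to overwrite the remote branch if it has moved since you last fetched it. The sketch below is an illustration rather than a recommendation from the original article; it uses a local bare repository as a stand-in for the remote and made-up branch contents:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work="$(mktemp -d)"
git -c init.defaultBranch=master init -q --bare "$work/hosted.git"
git -c init.defaultBranch=master init -q "$work/local"
cd "$work/local"
git remote add origin "$work/hosted.git"
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git push -q -u origin master

# Push a private feature branch (e.g., for backup purposes)
git checkout -q -b feature
echo draft > feature.txt
git add feature.txt
git commit -q -m "Draft feature"
git push -q -u origin feature

# Local cleanup re-writes the commit that was already pushed
git commit -q --amend -m "Finished feature"

# A plain push is rejected because the histories have diverged
if git push -q origin feature 2>/dev/null; then
    echo "unexpected: plain push succeeded"
else
    echo "plain push rejected"
fi

# Overwrite the remote branch only if it still matches what we last saw
git push -q --force-with-lease origin feature
```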

One of the only times you should be force-pushing is when you’ve performed a local cleanup after
you’ve pushed a private feature branch to a remote repository (e.g., for backup purposes). This is
like saying, “Oops, I didn’t really want to push that original version of the feature branch. Take the
current one instead.” Again, it’s important that nobody is working off of the commits from the
original version of the feature branch.

Workflow Walkthrough
Rebasing can be incorporated into your existing Git workflow as much or as little as your team is
comfortable with. In this section, we’ll take a look at the benefits that rebasing can offer at the
various stages of a feature’s development.

The first step in any workflow that leverages git rebase is to create a dedicated branch for each
feature. This gives you the necessary branch structure to safely utilize rebasing.

Local Cleanup
One of the best ways to incorporate rebasing into your workflow is to clean up local, in-progress
features. By periodically performing an interactive rebase, you can make sure each commit in your
feature is focused and meaningful. This lets you write your code without worrying about breaking it
up into isolated commits—you can fix it up after the fact.

When calling git rebase, you have two options for the new base: the feature’s parent branch (e.g.,
master), or an earlier commit in your feature. We saw an example of the first option in the
Interactive Rebasing section. The latter option is nice when you only need to fix up the last few
commits. For example, the following command begins an interactive rebase of only the last 3
commits.

git checkout feature

git rebase -i HEAD~3


By specifying HEAD~3 as the new base, you’re not actually moving the branch—you’re just
interactively re-writing the 3 commits that follow it. Note that this will not incorporate upstream
changes into the feature branch.

If you want to re-write the entire feature using this method, the git merge-base command can be
useful to find the original base of the feature branch. The following returns the commit ID of the
original base, which you can then pass to git rebase:

git merge-base feature master
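Put together, the commit ID can be captured in a variable and handed straight to git rebase. In this sketch the repository contents are made up, and GIT_SEQUENCE_EDITOR=true is used only so the example runs unattended (it accepts the todo list unchanged); in a real session you would omit it and edit the list yourself:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
fork_point="$(git rev-parse HEAD)"   # where feature is about to branch off

git checkout -q -b feature
echo one > one.txt
git add one.txt
git commit -q -m "Feature commit 1"
echo two > two.txt
git add two.txt
git commit -q -m "Feature commit 2"

# master moves on in the meantime
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Find the original base and re-write everything that follows it
git checkout -q feature
base="$(git merge-base feature master)"
echo "original base of feature: $base"
GIT_SEQUENCE_EDITOR=true git rebase -q -i "$base"
```

Because the new base is the original one, the upstream commit on master is still not part of feature afterwards.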

This use of interactive rebasing is a great way to introduce git rebase into your workflow, as it only
affects local branches. The only thing other developers will see is your finished product, which
should be a clean, easy-to-follow feature branch history.

But again, this only works for private feature branches. If you’re collaborating with other developers
via the same feature branch, that branch is public, and you’re not allowed to re-write its history.

There is no git merge alternative for cleaning up local commits with an interactive rebase.

Incorporating Upstream Changes Into a Feature


In the Conceptual Overview section, we saw how a feature branch can incorporate upstream
changes from master using either git merge or git rebase. Merging is a safe option that preserves the
entire history of your repository, while rebasing creates a linear history by moving your feature
branch onto the tip of master.

This use of git rebase is similar to a local cleanup (and can be performed simultaneously), but in the
process it incorporates those upstream commits from master.

Keep in mind that it’s perfectly legal to rebase onto a remote branch instead of master. This can
happen when collaborating on the same feature with another developer and you need to
incorporate their changes into your repository.

For example, if you and another developer named John added commits to the feature branch, your
repository might contain two diverging versions of that branch (your local feature and
john/feature) after fetching the remote feature branch from John’s repository.

You can resolve this fork the exact same way as you integrate upstream changes from master: either
merge your local feature with john/feature, or rebase your local feature onto the tip of john/feature.

Note that this rebase doesn’t violate the Golden Rule of Rebasing because only your local feature
commits are being moved—everything before that is untouched. This is like saying, “add my changes
to what John has already done.” In most circumstances, this is more intuitive than synchronizing
with the remote branch via a merge commit.

By default, the git pull command performs a merge, but you can force it to integrate the remote
branch with a rebase by passing it the --rebase option.
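As an illustration, the sketch below simulates two developers sharing one repository and uses git pull --rebase to resolve the resulting fork. The shared repository is a local bare repo, the directory names john and you are made up, and for brevity both developers commit to master rather than a feature branch:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work="$(mktemp -d)"
git -c init.defaultBranch=master init -q --bare "$work/shared.git"

# John publishes the first commit
git -c init.defaultBranch=master init -q "$work/john"
cd "$work/john"
git remote add origin "$work/shared.git"
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git push -q -u origin master

# You clone the shared repository
git clone -q "$work/shared.git" "$work/you"

# John pushes another commit
echo john > john.txt
git add john.txt
git commit -q -m "John's change"
git push -q origin master

# Meanwhile, you commit locally, so the two histories fork
cd "$work/you"
echo you > you.txt
git add you.txt
git commit -q -m "Your change"

# Fetch John's commit and replay yours on top of it instead of merging
git pull -q --rebase
git log --oneline --graph
```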

Reviewing a Feature With a Pull Request


If you use pull requests as part of your code review process, you need to avoid using git rebase after
creating the pull request. As soon as you make the pull request, other developers will be looking at
your commits, which means that it’s a public branch. Re-writing its history will make it impossible for
Git and your teammates to track any follow-up commits added to the feature.

Any changes from other developers need to be incorporated with git merge instead of git rebase.

For this reason, it’s usually a good idea to clean up your code with an interactive rebase before
submitting your pull request.

Integrating an Approved Feature


After a feature has been approved by your team, you have the option of rebasing the feature onto
the tip of the master branch before using git merge to integrate the feature into the main code base.

This is a similar situation to incorporating upstream changes into a feature branch, but since you’re
not allowed to re-write commits in the master branch, you have to eventually use git merge to
integrate the feature. However, by performing a rebase before the merge, you’re assured that the
merge will be fast-forwarded, resulting in a perfectly linear history. This also gives you the chance to
squash any follow-up commits added during a pull request.

If you’re not entirely comfortable with git rebase, you can always perform the rebase in a temporary
branch. That way, if you accidentally mess up your feature’s history, you can check out the original
branch and try again. For example:

git checkout feature

git checkout -b temporary-branch

git rebase -i master

# [Clean up the history]

git checkout master

git merge temporary-branch

Summary
And that’s all you really need to know to start rebasing your branches. If you would prefer a clean,
linear history free of unnecessary merge commits, you should reach for git rebase instead of git
merge when integrating changes from another branch.

On the other hand, if you want to preserve the complete history of your project and avoid the risk of
re-writing public commits, you can stick with git merge. Either option is perfectly valid, but at least
now you have the option of leveraging the benefits of git rebase.

Resetting, Checking Out, and Reverting

The git reset, git checkout, and git revert commands are all similar in that they undo some type of
change in your repository. But, they all affect different combinations of the working directory, staged
snapshot, and commit history. This article clearly defines how these commands differ and when each
of them should be used in the standard Git workflows.

The git reset, git checkout, and git revert commands are some of the most useful tools in your Git
toolbox. They all let you undo some kind of change in your repository, and the first two commands
can be used to manipulate either commits or individual files.

Because they’re so similar, it’s very easy to mix up which command should be used in any given
development scenario. In this article, we’ll compare the most common configurations of git reset, git
checkout, and git revert. Hopefully, you’ll walk away with the confidence to navigate your repository
using any of these commands.

It helps to think about each command in terms of their effect on the three main components of a Git
repository: the working directory, the staged snapshot, and the commit history. Keep these
components in mind as you read through this article.

Commit-level Operations
The parameters that you pass to git reset and git checkout determine their scope. When you don’t
include a file path as a parameter, they operate on whole commits. That’s what we’ll be exploring in
this section. Note that git revert has no file-level counterpart.

Reset
On the commit-level, resetting is a way to move the tip of a branch to a different commit. This can be
used to remove commits from the current branch. For example, the following command moves the
hotfix branch backwards by two commits.

git checkout hotfix

git reset HEAD~2

The two commits that were on the end of hotfix are now dangling commits, which means they will
eventually be deleted when Git performs a garbage collection (until then, they can still be found
through git reflog). In other words, you’re saying that you want to throw away these commits.

This usage of git reset is a simple way to undo changes that haven’t been shared with anyone else.
It’s your go-to command when you’ve started working on a feature and find yourself thinking, “Oh
crap, what am I doing? I should just start over.”

In addition to moving the current branch, you can also get git reset to alter the staged snapshot
and/or the working directory by passing it one of the following flags:

• --soft – The staged snapshot and working directory are not altered in any way.
• --mixed – The staged snapshot is updated to match the specified commit, but the working
directory is not affected. This is the default option.
• --hard – The staged snapshot and the working directory are both updated to match the
specified commit.

It’s easier to think of these modes as defining the scope of a git reset operation.

These flags are often used with HEAD as the parameter. For instance, git reset --mixed HEAD has the
effect of unstaging all changes, but leaves them in the working directory. On the other hand, if you
want to completely throw away all your uncommitted changes, you would use git reset --hard HEAD.
These are two of the most common uses of git reset.
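The difference between the three modes is easiest to see in the output of git status --short, where the first column shows staged changes and the second shows unstaged ones. This is a minimal sketch in a throwaway repository with made-up file names:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q

echo one > a.txt
echo one > b.txt
git add a.txt b.txt
git commit -q -m "First commit"
echo two > a.txt
git add a.txt
git commit -q -m "Second commit"

# --soft: move the branch back one commit; that commit's change stays staged
git reset -q --soft HEAD~1
after_soft="$(git status --short)"

# Add an unstaged edit, then --mixed (the default): everything ends up unstaged
echo two > b.txt
git reset -q --mixed HEAD
after_mixed="$(git status --short)"

# --hard: the stage and the working directory both match HEAD again
git reset -q --hard HEAD
after_hard="$(git status --short)"

printf 'after --soft:\n%s\n' "$after_soft"
printf 'after --mixed:\n%s\n' "$after_mixed"
printf 'after --hard:\n%s\n' "$after_hard"
```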

Be careful when passing a commit other than HEAD to git reset, since this re-writes the current
branch’s history. As discussed in The Golden Rule of Rebasing, this is a big problem when working on a
public branch.

Checkout
By now, you should be very familiar with the commit-level version of git checkout. When passed a
branch name, it lets you switch between branches.

git checkout hotfix

Internally, all the above command does is move HEAD to a different branch and update the working
directory to match. Since this has the potential to overwrite local changes, Git forces you to commit
or stash any changes in the working directory that will be lost during the checkout operation. Unlike
git reset, git checkout doesn’t move any branches around.

You can also check out arbitrary commits by passing in the commit reference instead of a branch.
This does the exact same thing as checking out a branch: it moves the HEAD reference to the
specified commit. For example, the following command will check out the grandparent of the
current commit:

git checkout HEAD~2


This is useful for quickly inspecting an old version of your project. However, since there is no branch
reference to the current HEAD, this puts you in a detached HEAD state. This can be dangerous if you
start adding new commits because there will be no way to get back to them after you switch to
another branch. For this reason, you should always create a new branch before adding commits to a
detached HEAD.
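A short sketch of that advice, using a throwaway repository and a hypothetical branch name (experiment). It relies on git symbolic-ref, which fails when HEAD does not point at a branch, to show the detached state:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
for n in 1 2 3; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# Inspect the grandparent of the current commit
git checkout -q HEAD~2

# symbolic-ref -q exits non-zero while HEAD is detached
if git symbolic-ref -q HEAD >/dev/null; then
    state="attached"
else
    state="detached"
fi
echo "HEAD is $state"

# Create a branch first so that new commits stay reachable
git checkout -q -b experiment
echo "trying something" > try.txt
git add try.txt
git commit -q -m "Experiment on top of an old snapshot"
git branch --contains HEAD
```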

Revert
Reverting undoes a commit by creating a new commit. This is a safe way to undo changes, as it has
no chance of re-writing the commit history. For example, the following command will figure out the
changes contained in the third-to-last commit (HEAD~2), create a new commit undoing those
changes, and tack the new commit onto the existing project.

git checkout hotfix

git revert HEAD~2

Contrast this with git reset, which does alter the existing commit history. For this reason, git revert
should be used to undo changes on a public branch, and git reset should be reserved for undoing
changes on a private branch.

You can also think of git revert as a tool for undoing committed changes, while git reset HEAD is for
undoing uncommitted changes.

Like git checkout, git revert has the potential to overwrite files in the working directory, so it will ask
you to commit or stash changes that would be lost during the revert operation.
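The sketch below shows that a revert adds to the history rather than re-writing it. The repository, file names, and commit messages are made up; --no-edit simply accepts the default revert message so the example runs unattended:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b hotfix
for name in a b c; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "Add $name.txt"
done
old_tip="$(git rev-parse HEAD)"

# Undo the changes from HEAD~2 ("Add a.txt") with a new commit
git revert --no-edit HEAD~2

# The old tip is still in the history, directly below the revert commit
git log --oneline
```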

File-level Operations

The git reset and git checkout commands also accept an optional file path as a parameter. This
dramatically alters their behavior. Instead of operating on entire snapshots, this forces them to limit
their operations to a single file.

Reset
When invoked with a file path, git reset updates the staged snapshot to match the version from the
specified commit. For example, this command will fetch the version of foo.py in the third-to-last
commit (HEAD~2) and stage it for the next commit:

git reset HEAD~2 foo.py

As with the commit-level version of git reset, this is more commonly used with HEAD rather than an
arbitrary commit. Running git reset HEAD foo.py will unstage foo.py. The changes it contains will still
be present in the working directory.

The --soft, --mixed, and --hard flags do not apply to the file-level version of git reset (Git refuses
to combine --soft or --hard with a file path), as the staged snapshot is always updated, and the
working directory is never updated.
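A quick way to watch a file being unstaged is to compare git status --short before and after the reset. In this sketch the repository and the contents of foo.py are made up:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
echo "# version 1" > foo.py
git add foo.py
git commit -q -m "Add foo.py"

# Edit the file and stage the edit
echo "# version 2" > foo.py
git add foo.py
before="$(git status --short)"   # change shown in the staged column

# Unstage it; the edit itself stays in the working directory
git reset -q HEAD foo.py
after="$(git status --short)"    # change shown in the unstaged column

echo "before reset: '$before'"
echo "after reset:  '$after'"
cat foo.py
```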

Checkout
Checking out a file is similar to using git reset with a file path, except it updates the working
directory as well as the stage. Unlike the commit-level version of this command, this does not move
the HEAD reference, which means that you won’t switch branches.

For example, the following command makes foo.py in the working directory match the one from the
third-to-last commit (HEAD~2):

git checkout HEAD~2 foo.py

Just like the commit-level invocation of git checkout, this can be used to inspect old versions of a
project—but the scope is limited to the specified file.

If you stage and commit the checked-out file, this has the effect of “reverting” to the old version of
that file. Note that this removes all of the subsequent changes to the file, whereas the git revert
command undoes only the changes introduced by the specified commit.

Like git reset, this is commonly used with HEAD as the commit reference. For instance, git checkout
HEAD foo.py has the effect of discarding uncommitted changes to foo.py. This is similar behavior to git
reset --hard HEAD, but it operates only on the specified file.
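Both uses of a file-level checkout can be tried in a throwaway repository. The sketch below (made-up contents for foo.py) first pulls an old version of the file into the working directory, then restores the version from HEAD:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
for n in 1 2 3; do
    echo "# version $n" > foo.py
    git add foo.py
    git commit -q -m "foo.py version $n"
done

# Bring back foo.py as it was two commits before HEAD
git checkout -q HEAD~2 foo.py
old_content="$(cat foo.py)"
old_status="$(git status --short)"   # Git reports the old version as staged
branch="$(git symbolic-ref --short HEAD)"   # still on master: HEAD did not move

# Discard that change again by checking out the file from HEAD
git checkout -q HEAD foo.py
new_content="$(cat foo.py)"

echo "old version: $old_content ($old_status)"
echo "restored:    $new_content"
```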

Summary

You should now have all the tools you could ever need to undo changes in a Git repository. The git
reset, git checkout, and git revert commands can be confusing, but when you think about their
effects on the working directory, staged snapshot, and commit history, it should be easier to discern
which command fits the development task at hand.

The table below sums up the most common use cases for all of these commands. Be sure to keep
this reference handy, as you’ll undoubtedly need to use at least some of them during your Git career.

Command        Scope          Common use cases
git reset      Commit-level   Discard commits in a private branch or throw away uncommitted changes
git reset      File-level     Unstage a file
git checkout   Commit-level   Switch between branches or inspect old snapshots
git checkout   File-level     Discard changes in the working directory
git revert     Commit-level   Undo commits in a public branch
git revert     File-level     (N/A)

Advanced Git Log

The git log command is what makes your project history useful. Without it, you wouldn’t be able to
access any of your commits. But, if you’re like most aspiring Git users, you’ve probably only
scratched the surface of what’s possible with git log. This article walks you through its advanced
formatting and filtering options, giving you the power to extract all sorts of interesting information
from your Git repository.

The purpose of any version control system is to record changes to your code. This gives you the
power to go back into your project history to see who contributed what, figure out where bugs were
introduced, and revert problematic changes. But, having all of this history available is useless if you
don’t know how to navigate it. That’s where the git log command comes in.

By now, you should already know the basic git log command for displaying commits. But, you can
alter this output by passing many different parameters to git log.

The advanced features of git log can be split into two categories: formatting how each commit is
displayed, and filtering which commits are included in the output. Together, these two skills give you
the power to go back into your project and find any information that you could possibly need.
Formatting Log Output
First, this article will take a look at the many ways in which git log’s output can be formatted. Most of
these come in the form of flags that let you request more or less information from git log.

If you don’t like the default git log format, you can use git config’s aliasing functionality to create a
shortcut for any of the formatting options discussed below. See The git config Command for
how to set up an alias.

Oneline
The --oneline flag condenses each commit to a single line. By default, it displays only the commit ID
and the first line of the commit message. Your typical git log --oneline output will look something like
this:

0e25143 Merge branch 'feature'
ad8621a Fix a bug in the feature
16b36c6 Add a new feature
23ad9ad Add the initial code base

This is very useful for getting a high-level overview of your project.

Decorating
Many times it’s useful to know which branch or tag each commit is associated with. The --decorate
flag makes git log display all of the references (e.g., branches, tags, etc) that point to each commit.

This can be combined with other configuration options. For example, running
git log --oneline --decorate will format the commit history like so:

0e25143 (HEAD, master) Merge branch 'feature'
ad8621a (feature) Fix a bug in the feature
16b36c6 Add a new feature
23ad9ad (tag: v0.9) Add the initial code base

This lets you know that the top commit is also checked out (denoted by HEAD) and that it is also the
tip of the master branch. The second commit has another branch pointing to it called feature, and
finally the 4th commit is tagged as v0.9.
Branches, tags, HEAD, and the commit history are almost all of the information contained in your Git
repository, so this gives you a more complete view of the logical structure of your repository.

Diffs
The git log command includes many options for displaying diffs with each commit. Two of the most
common options are --stat and -p.

The --stat option displays the number of insertions and deletions to each file altered by each commit
(note that modifying a line is represented as 1 insertion and 1 deletion). This is useful when you
want a brief summary of the changes introduced by each commit. For example, the following
commit added 67 lines to the hello.py file and removed 38 lines:

commit f2a238924e89ca1d4947662928218a06d39068c3
Author: John <[email protected]>
Date:   Fri Jun 25 17:30:28 2014 -0500

    Add a new feature

 hello.py | 105 ++++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 38 deletions(-)

The number of + and - signs next to the file name shows the relative number of changes to each file
altered by the commit. This gives you an idea of where the changes for each commit can be found.
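
If a script needs the totals from that summary line, a small parser is enough. The following is only a sketch: the function name parse_stat_summary is hypothetical, and it assumes the English wording of the summary shown above.

```python
import re

def parse_stat_summary(line):
    """Return (files, insertions, deletions) from a --stat summary line."""
    files = re.search(r"(\d+) files? changed", line)
    ins = re.search(r"(\d+) insertions?\(\+\)", line)
    dels = re.search(r"(\d+) deletions?\(-\)", line)
    # Git leaves out the insertions or deletions part when its count is zero
    return (
        int(files.group(1)) if files else 0,
        int(ins.group(1)) if ins else 0,
        int(dels.group(1)) if dels else 0,
    )

print(parse_stat_summary(" 1 file changed, 67 insertions(+), 38 deletions(-)"))
```

For the commit above, this yields 1 file, 67 insertions, and 38 deletions.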

If you want to see the actual changes introduced by each commit, you can pass the -p option to git
log. This outputs the entire patch representing that commit:

commit 16b36c697eb2d24302f89aa22d9170dfe609855b
Author: Mary <[email protected]>
Date:   Fri Jun 25 17:31:57 2014 -0500

    Fix a bug in the feature

diff --git a/hello.py b/hello.py
index 18ca709..c673b40 100644
--- a/hello.py
+++ b/hello.py
@@ -13,14 +13,14 @@ B
-print("Hello, World!")
+print("Hello, Git!")

For commits with a lot of changes, the resulting output can become quite long and unwieldy. More
often than not, if you’re displaying a full patch, you’re probably searching for a specific change. For
this, you want to use the pickaxe option.

The Shortlog
The git shortlog command is a special version of git log intended for creating release
announcements. It groups each commit by author and displays the first line of each commit
message. This is an easy way to see who’s been working on what.

For example, if two developers have contributed 5 commits to a project, the git shortlog output
might look like the following:

John (3):
      Add the initial code base
      Add a new feature
      Merge branch 'feature'

Mary (2):
      Fix a bug in the feature
      Fix a serious security hole in our framework

By default, git shortlog sorts the output by author name, but you can also pass the -n option to sort
by the number of commits per author.
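
The grouping that git shortlog performs can be pictured with a few lines of Python. This is an illustration of the idea, not how Git implements it; the function name and the sample data are made up for the example.

```python
from collections import defaultdict

def shortlog(commits, by_count=False):
    """Group (author, subject) pairs by author, similar to git shortlog."""
    groups = defaultdict(list)
    for author, subject in commits:
        groups[author].append(subject)
    if by_count:
        # Most commits first, in the spirit of the -n option
        return sorted(groups.items(), key=lambda item: -len(item[1]))
    # Default: order by author name
    return sorted(groups.items())

# Hypothetical history with five commits from two developers
history = [
    ("Mary", "Fix a bug in the feature"),
    ("John", "Add the initial code base"),
    ("John", "Add a new feature"),
    ("Mary", "Fix a serious security hole in our framework"),
    ("John", "Merge branch 'feature'"),
]

for author, subjects in shortlog(history):
    print("%s (%d):" % (author, len(subjects)))
    for subject in subjects:
        print("      " + subject)
```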

Graphs
The --graph option draws an ASCII graph representing the branch structure of the commit history.
This is commonly used in conjunction with the --oneline and --decorate flags to make it easier
to see which commit belongs to which branch:

git log --graph --oneline --decorate

For a simple repository with just 2 branches, this will produce the following:

* 0e25143 (HEAD, master) Merge branch 'feature'
|\
| * 16b36c6 Fix a bug in the new feature
| * 23ad9ad Start a new feature
* | ad8621a Fix a critical security issue
|/
* 400e4b7 Fix typos in the documentation
* 160e224 Add the initial code base

The asterisk shows which branch the commit was on, so the above graph tells us that the 23ad9ad
and 16b36c6 commits are on a topic branch and the rest are on the master branch.

While this is a nice option for simple repositories, you’re probably better off with a more full-
featured visualization tool like gitk or SourceTree for projects that are heavily branched.

Custom Formatting
For all of your other git log formatting needs, you can use the --pretty=format:"<string>" option. This
lets you display each commit however you want using printf-style placeholders.

For example, the %cn, %h and %cd characters in the following command are replaced with the
committer name, abbreviated commit hash, and the committer date, respectively.

git log --pretty=format:"%cn committed %h on %cd"

This results in the following format for each commit:

John committed 400e4b7 on Fri Jun 24 12:30:04 2014 -0500
John committed 89ab2cf on Thu Jun 23 17:09:42 2014 -0500
Mary committed 180e223 on Wed Jun 22 17:21:19 2014 -0500
John committed f12ca28 on Wed Jun 22 13:50:31 2014 -0500

The complete list of placeholders can be found in the Pretty Formats section of the git log manual
page.

Aside from letting you view only the information that you’re interested in, the
--pretty=format:"<string>" option is particularly useful when you’re trying to pipe git log output
into another command.
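
As a rough sketch of that kind of piping, the Python below asks git log for one delimited line per commit and splits each line into fields. The format string, the function names, and the choice of | as a separator are assumptions made for this example; a committer name containing | would need a different separator.

```python
import subprocess

LOG_FORMAT = "%h|%cn|%cd"  # abbreviated hash, committer name, committer date

def parse_log(text):
    """Split each formatted line into a dict with hash, committer and date."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit_hash, committer, date = line.split("|", 2)
        records.append({"hash": commit_hash, "committer": committer, "date": date})
    return records

def read_log(repo_path="."):
    """Run git log in repo_path and return the parsed records."""
    output = subprocess.check_output(
        ["git", "log", "--pretty=format:" + LOG_FORMAT], cwd=repo_path
    )
    return parse_log(output.decode("utf-8"))
```

Calling read_log() inside a repository returns a list of dictionaries that other code can filter or count.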
Filtering the Commit History

Formatting how each commit gets displayed is only half the battle of learning git log. The other half
is understanding how to navigate the commit history. The rest of this article introduces some of the
advanced ways to pick out specific commits in your project history using git log. All of these can be
combined with any of the formatting options discussed above.

By Amount
The most basic filtering option for git log is to limit the number of commits that are displayed. When
you’re only interested in the last few commits, this saves you the trouble of viewing all the commits
in a pager.

You can limit git log’s output by including the -<n> option. For example, the following command will
display only the 3 most recent commits.

git log -3

By Date

If you’re looking for a commit from a specific time frame, you can use the --after or --before flags for
filtering commits by date. These both accept a variety of date formats as a parameter. For example,
the following command only shows commits that were created after July 1st, 2014 (inclusive):

git log --after="2014-7-1"

You can also pass in relative references like "1 week ago" and "yesterday":

git log --after="yesterday"

To search for commits that were created between two dates, you can provide both a --before and
--after date. For instance, to display all the commits added between July 1st, 2014 and July 4th,
2014, you would use the following:

git log --after="2014-7-1" --before="2014-7-4"

Note that the --since and --until flags are synonymous with --after and --before, respectively.

By Author
When you’re only looking for commits created by a particular user, use the --author flag. This
accepts a regular expression, and returns all commits whose author matches that pattern. If you
know exactly who you’re looking for, you can use a plain old string instead of a regular expression:
git log --author="John"

This displays all commits whose author includes the name John. The author name doesn’t need to be
an exact match—it just needs to contain the specified phrase.

You can also use regular expressions to create more complex searches. For example, the following
command searches for commits by either Mary or John.

git log --author="John\|Mary"

Note that the author’s email is also included with the author’s name, so you can use this option to
search by email, too.

If your workflow separates committers from authors, the --committer flag operates in the same
fashion.

By Message
To filter commits by their commit message, use the --grep flag. This works just like the --author flag
discussed above, but it matches against the commit message instead of the author.

For example, if your team includes relevant issue numbers in each commit message, you can use
something like the following to pull out all of the commits related to that issue:

git log --grep="JRA-224:"

You can also pass in the -i parameter to git log to make it ignore case differences while pattern
matching.

By File
Many times, you’re only interested in changes that happened to a particular file. To show the history
related to a file, all you have to do is pass in the file path. For example, the following returns all
commits that affected either the foo.py or the bar.py file:

git log -- foo.py bar.py

The -- parameter is used to tell git log that subsequent arguments are file paths and not branch
names. If there’s no chance of mixing it up with a branch, you can omit the --.

By Content
It’s also possible to search for commits that introduce or remove a particular line of source code.
This is called a pickaxe, and it takes the form of -S"<string>". For example, if you want to know when
the string Hello, World! was added to any file in the project, you would use the following command:
git log -S"Hello, World!"

If you want to search using a regular expression instead of a string, you can use the -G"<regex>" flag
instead.

This is a very powerful debugging tool, as it lets you locate all of the commits that affect a particular
line of code. It can even show you when a line was copied or moved to another file.
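
It helps to know what the pickaxe actually tests. The git log documentation describes -S as looking for changes in the number of occurrences of the string in a file. The sketch below models that rule for a single file; the function name is hypothetical and the real implementation works on Git's internal diff machinery.

```python
def pickaxe_matches(before, after, needle):
    """True if the number of occurrences of needle differs between versions."""
    return before.count(needle) != after.count(needle)

old = 'print("Hello, World!")\n'
new = 'print("Hello, Git!")\n'

# The string disappeared, so a commit making this change would be reported
print(pickaxe_matches(old, new, "Hello, World!"))

# Moving the line inside the same file keeps the count unchanged
moved = "\n" + old
print(pickaxe_matches(old, moved, "Hello, World!"))
```

One consequence is that a commit which only moves the string within the same file is not reported by -S, because the count stays the same.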

By Range
You can pass a range of commits to git log to show only the commits contained in that range. The
range is specified in the following format, where <since> and <until> are commit references:

git log <since>..<until>

This command is particularly useful when you use branch references as the parameters. It’s a simple
way to show the differences between 2 branches. Consider the following command:

git log master..feature

The master..feature range contains all of the commits that are in the feature branch, but aren’t in
the master branch. In other words, this is how far feature has progressed since it forked off of
master.

Note that if you switch the order of the range (feature..master), you will get all of the commits in
master, but not in feature. If git log outputs commits for both versions, this tells you that your
history has diverged.
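
The two-dot range can be thought of as a set difference over reachable commits. The following sketch models this with a small, made-up commit graph; the commit names and helper functions are hypothetical and only illustrate the idea.

```python
def ancestors(parents, commit):
    """Return the set of commits reachable from commit, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def commit_range(parents, since, until):
    """Commits reachable from until but not from since (since..until)."""
    return ancestors(parents, until) - ancestors(parents, since)

# Hypothetical history: A - B - C (master), with D - E (feature) forked from B
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
master, feature = "C", "E"

print(sorted(commit_range(parents, master, feature)))  # ['D', 'E']
print(sorted(commit_range(parents, feature, master)))  # ['C']
```

Both ranges are non-empty here, which is the diverged case described above.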
Filtering Merge Commits
By default, git log includes merge commits in its output. But, if your team has an always-merge
policy (that is, you merge upstream changes into topic branches instead of rebasing the topic branch
onto the upstream branch), you’ll have a lot of extraneous merge commits in your project history.

You can prevent git log from displaying these merge commits by passing the --no-merges flag:

git log --no-merges

On the other hand, if you’re only interested in the merge commits, you can use the --merges flag:

git log --merges

This returns all commits that have at least two parents.
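
Since the filters in this section can be combined, a script that runs git log often ends up assembling the argument list piece by piece. Here is one possible sketch; the function and parameter names are invented for the example, while the flags are the ones covered above.

```python
def build_log_command(max_count=None, after=None, before=None,
                      author=None, grep=None, no_merges=False, paths=()):
    """Return an argument list for git log using the filters described above."""
    command = ["git", "log"]
    if max_count is not None:
        command.append("-%d" % max_count)
    if after:
        command.append("--after=" + after)
    if before:
        command.append("--before=" + before)
    if author:
        command.append("--author=" + author)
    if grep:
        command.append("--grep=" + grep)
    if no_merges:
        command.append("--no-merges")
    if paths:
        # The -- separator keeps file paths from being read as revisions
        command.append("--")
        command.extend(paths)
    return command

print(build_log_command(max_count=3, author="John", no_merges=True, paths=["foo.py"]))
```

The resulting list can be handed to subprocess.check_output. Passing a list rather than a single string avoids shell quoting problems with patterns such as John\|Mary.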

Summary
You should now be fairly comfortable using git log’s advanced parameters to format its output and
select which commits you want to display. This gives you the power to pull out exactly what you
need from your project history.

These new skills are an important part of your Git toolkit, but remember that git log is often used in
conjunction with other Git commands. Once you’ve found the commit you’re looking for, you typically
pass it off to git checkout, git revert, or some other tool for manipulating your commit history. So, be
sure to keep on learning about Git’s advanced features.

Git Hooks
If you want to perform custom actions when a certain event takes place in a Git repository, hooks
are your tool of choice. They let you normalize commit messages, automate testing suites, notify
continuous integration systems, and much more. After this article, you’ll understand the many ways
in which Git hooks can streamline your workflow.

Git hooks are scripts that run automatically every time a particular event occurs in a Git repository.
They let you customize Git’s internal behavior and trigger customizable actions at key points in the
development life cycle.

Common use cases for Git hooks include encouraging a commit policy, altering the project
environment depending on the state of the repository, and implementing continuous integration
workflows. But, since scripts are infinitely customizable, you can use Git hooks to automate or
optimize virtually any aspect of your development workflow.

In this article, we’ll start with a conceptual overview of how Git hooks work. Then, we’ll survey some
of the most popular hooks for use in both local and server-side repositories.

Conceptual Overview

All Git hooks are ordinary scripts that Git executes when certain events occur in the repository. This
makes them very easy to install and configure.

Hooks can reside in either local or server-side repositories, and they are only executed in response
to actions in that repository. We’ll take a concrete look at categories of hooks later in this article.
The configuration discussed in the rest of this section applies to both local and server-side hooks.
Installing Hooks
Hooks reside in the .git/hooks directory of every Git repository. Git automatically populates this
directory with example scripts when you initialize a repository. If you take a look inside .git/hooks,
you’ll find the following files:

applypatch-msg.sample       pre-push.sample
commit-msg.sample           pre-rebase.sample
post-update.sample          prepare-commit-msg.sample
pre-applypatch.sample       update.sample
pre-commit.sample

These represent most of the available hooks, but the .sample extension prevents them from
executing by default. To “install” a hook, all you have to do is remove the .sample extension. Or, if
you’re writing a new script from scratch, you can simply add a new file matching one of the above
filenames, minus the .sample extension.

As an example, try installing a simple prepare-commit-msg hook. Remove the .sample extension
from this script, and add the following to the file:

#!/bin/sh

echo "# Please include a useful commit message!" > "$1"

Hooks need to be executable, so you may need to change the file permissions of the script if you’re
creating it from scratch. For example, to make sure that prepare-commit-msg is executable, you
would run the following command:

chmod +x prepare-commit-msg

You should now see this message in place of the default commit message every time you run git
commit. We’ll take a closer look at how this actually works in the Prepare Commit Message section.
For now, let’s just revel in the fact that we can customize some of Git’s internal functionality.

The built-in sample scripts are very useful references, as they document the parameters that are
passed in to each hook (they vary from hook to hook).

Scripting Languages
The built-in scripts are mostly shell and Perl scripts, but you can use any scripting language you like
as long as it can be run as an executable. The shebang line (#!/bin/sh) in each script defines how
your file should be interpreted. So, to use a different language, all you have to do is change it to the
path of your interpreter.
For instance, we can write an executable Python script in the prepare-commit-msg file instead of
using shell commands. The following hook will do the same thing as the shell script in the previous
section.

#!/usr/bin/env python

import sys

# Git passes the path of the commit message file as the first argument
commit_msg_filepath = sys.argv[1]

with open(commit_msg_filepath, 'w') as f:
    f.write("# Please include a useful commit message!")

Notice how the first line changed to point to the Python interpreter. And, instead of using $1 to
access the first argument passed to the script, we used sys.argv[1] (again, more on this in a
moment).

This is a very powerful feature for Git hooks because it lets you work in whatever language you’re
most comfortable with.

Scope of Hooks
Hooks are local to any given Git repository, and they are not copied over to the new repository when
you run git clone. And, since hooks are local, they can be altered by anybody with access to the
repository.

This has an important impact when configuring hooks for a team of developers. First, you need to
find a way to make sure hooks stay up-to-date amongst your team members. Second, you can’t
force developers to create commits that look a certain way—you can only encourage them to do so.

Maintaining hooks for a team of developers can be a little tricky because the .git/hooks directory
isn’t cloned with the rest of your project, nor is it under version control. A simple solution to both of
these problems is to store your hooks in the actual project directory (above the .git directory). This
lets you edit them like any other version-controlled file. To install the hook, you can either create a
symlink to it in .git/hooks, or you can simply copy and paste it into the .git/hooks directory whenever
the hook is updated.
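
The symlink approach is easy to script. The sketch below assumes the hooks are kept in a version-controlled directory of your choosing (its name is up to you) and that each file there is already executable; install_hooks is a hypothetical helper, not something Git provides. Creating symlinks may require extra privileges on Windows.

```python
import os

def install_hooks(source_dir, repo_dir):
    """Symlink every file in source_dir into repo_dir/.git/hooks."""
    hooks_dir = os.path.join(repo_dir, ".git", "hooks")
    installed = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.abspath(os.path.join(source_dir, name))
        if not os.path.isfile(source):
            continue
        target = os.path.join(hooks_dir, name)
        if os.path.lexists(target):
            os.remove(target)  # replace an older copy or a stale link
        os.symlink(source, target)
        installed.append(name)
    return installed
```

Because the links point back at the tracked files, pulling an updated hook takes effect without reinstalling it.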
As an alternative, Git also provides a Template Directory mechanism that makes it easier to install
hooks automatically. All of the files and directories contained in this template directory are copied
into the .git directory every time you use git init or git clone.

All of the local hooks described below can be altered—or completely un-installed—by the owner of a
repository. It’s entirely up to each team member whether or not they actually use a hook. With this
in mind, it’s best to think of Git hooks as a convenient developer tool rather than a strictly enforced
development policy.

That said, it is possible to reject commits that do not conform to some standard using server-side
hooks. We’ll talk more about this later in the article.

Local Hooks
Local hooks affect only the repository in which they reside. As you read through this section,
remember that each developer can alter their own local hooks, so you can’t use them as a way to
enforce a commit policy. They can, however, make it much easier for developers to adhere to
certain guidelines.

In this section, we’ll be exploring 6 of the most useful local hooks:

- pre-commit
- prepare-commit-msg
- commit-msg
- post-commit
- post-checkout
- pre-rebase
The first 4 hooks let you plug into the entire commit life cycle, and the final 2 let you perform some
extra actions or safety checks for the git checkout and git rebase commands, respectively.

All of the pre- hooks let you alter the action that’s about to take place, while the post- hooks are
used only for notifications.

We’ll also see some useful techniques for parsing hook arguments and requesting information about
the repository using lower-level Git commands.

Pre-Commit
The pre-commit script is executed every time you run git commit before Git asks the developer for a
commit message or generates a commit object. You can use this hook to inspect the snapshot that is
about to be committed. For example, you may want to run some automated tests that make sure
the commit doesn’t break any existing functionality.

No arguments are passed to the pre-commit script, and exiting with a non-zero status aborts the
entire commit. Let’s take a look at a simplified (and more verbose) version of the built-in pre-commit
hook. This script aborts the commit if it finds any whitespace errors, as defined by the git diff-index
command (trailing whitespace, lines with only whitespace, and a space followed by a tab inside the
initial indent of a line are considered errors by default).

#!/bin/sh

# Check if this is the initial commit
if git rev-parse --verify HEAD >/dev/null 2>&1
then
    echo "pre-commit: About to create a new commit..."
    against=HEAD
else
    echo "pre-commit: About to create the first commit..."
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Use git diff-index to check for whitespace errors
echo "pre-commit: Testing for whitespace errors..."
if ! git diff-index --check --cached "$against"
then
    echo "pre-commit: Aborting commit due to whitespace errors"
    exit 1
else
    echo "pre-commit: No whitespace errors :)"
    exit 0
fi

In order to use git diff-index, we need to figure out which commit reference we’re comparing the
index to. Normally, this is HEAD; however, HEAD doesn’t exist when creating the initial commit, so
our first task is to account for this edge case. We do this with git rev-parse --verify, which simply
checks whether or not the argument (HEAD) is a valid reference. The >/dev/null 2>&1 portion
silences any output from git rev-parse. Either HEAD or an empty tree object is stored in the
against variable for use with git diff-index. The 4b825d... hash is the well-known ID of Git’s
empty tree, which gives git diff-index something to compare against when no commit exists yet.

The git diff-index --cached command compares a commit against the index. By passing the --check
option, we’re asking it to warn us if the changes introduce whitespace errors. If they do, we abort
the commit by returning an exit status of 1, otherwise we exit with 0 and the commit workflow
continues as normal.
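
To make the whitespace rules more concrete, here is a rough Python approximation of the checks described above. It is not Git's implementation (Git only inspects the lines changed by the commit and its rules are configurable); the function name is hypothetical and the logic follows the prose description only.

```python
def whitespace_errors(line):
    """Return a list of whitespace problems found in one line of text."""
    errors = []
    text = line.rstrip("\n")
    if text != text.rstrip(" \t"):
        # Covers trailing whitespace and lines made only of whitespace
        errors.append("trailing whitespace")
    indent = text[:len(text) - len(text.lstrip(" \t"))]
    if " \t" in indent:
        errors.append("space before tab in indent")
    return errors

for sample in ["x = 1  \n", "  \tx = 1\n", "    x = 1\n"]:
    print(repr(sample), whitespace_errors(sample))
```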

This is just one example of the pre-commit hook. It happens to use existing Git commands to run
tests on the changes introduced by the proposed commit, but you can do anything you want in pre-
commit including executing other scripts, running a 3rd-party test suite, or checking code style with
Lint.

Prepare Commit Message


The prepare-commit-msg hook is called after the pre-commit hook to populate the text editor with a
commit message. This is a good place to alter the automatically generated commit messages for
squashed or merged commits.

One to three arguments are passed to the prepare-commit-msg script:

1. The name of a temporary file that contains the message. You change the commit message
by altering this file in-place.
2. The type of commit. This can be message (-m or -F option), template (-t option), merge (if
the commit is a merge commit), or squash (if the commit is squashing other commits).
3. The SHA1 hash of the relevant commit. Only given if -c, -C, or --amend option was given.

As with pre-commit, exiting with a non-zero status aborts the commit.

We already saw a simple example that edited the commit message, but let’s take a look at a more
useful script. When using an issue tracker, a common convention is to address each issue in a
separate branch. If you include the issue number in the branch name, you can write a prepare-
commit-msg hook to automatically include it in each commit message on that branch.
#!/usr/bin/env python

import sys, re
from subprocess import check_output

# Collect the parameters
commit_msg_filepath = sys.argv[1]
if len(sys.argv) > 2:
    commit_type = sys.argv[2]
else:
    commit_type = ''
if len(sys.argv) > 3:
    commit_hash = sys.argv[3]
else:
    commit_hash = ''

print("prepare-commit-msg: File: %s\nType: %s\nHash: %s" % (commit_msg_filepath, commit_type, commit_hash))

# Figure out which branch we're on
branch = check_output(['git', 'symbolic-ref', '--short', 'HEAD']).decode('utf-8').strip()
print("prepare-commit-msg: On branch '%s'" % branch)

# Populate the commit message with the issue #, if there is one
if branch.startswith('issue-'):
    print("prepare-commit-msg: Oh hey, it's an issue branch.")
    result = re.match('issue-(.*)', branch)
    issue_number = result.group(1)

    with open(commit_msg_filepath, 'r+') as f:
        content = f.read()
        f.seek(0, 0)
        f.write("ISSUE-%s %s" % (issue_number, content))


First, the above prepare-commit-msg hook shows you how to collect all of the parameters that are
passed to the script. Then, it calls git symbolic-ref --short HEAD to get the branch name that
corresponds to HEAD. If this branch name starts with issue-, it re-writes the commit message file
contents to include the issue number in the first line. So, if your branch name is issue-224, this will
generate the following commit message.

ISSUE-224
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch issue-224
# Changes to be committed:
# modified: test.txt

One thing to keep in mind when using prepare-commit-msg is that it runs even when the user passes
in a message with the -m option of git commit. This means that the above script will automatically
insert the ISSUE-[#] string without letting the user edit it. You can handle this case by seeing if the
2nd parameter (commit_type) is equal to message.

However, without the -m option, the prepare-commit-msg hook does allow the user to edit the
message after it’s generated, so this is really more of a convenience script than a way to enforce a
commit message policy. For that, you need the commit-msg hook discussed in the next section.
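
One way to handle the -m case mentioned above is to move the decision into a small function that looks at the commit type before touching the message. This is a sketch under the assumption that you would rather leave user-supplied messages alone; prefix_issue is a hypothetical helper, and a real hook would pass it sys.argv[2] and the contents of the message file.

```python
import re

def prefix_issue(branch, commit_type, content):
    """Return the commit message with an ISSUE-<n> prefix when appropriate."""
    if commit_type == "message":
        # The user supplied the message with -m or -F, so leave it alone
        return content
    match = re.match(r"issue-(.+)", branch)
    if match is None:
        return content
    return "ISSUE-%s %s" % (match.group(1), content)

print(prefix_issue("issue-224", "", "\n# Please enter the commit message"))
print(prefix_issue("issue-224", "message", "Fix the login form"))
```

Whether to skip the prefix or to insert it anyway is a team decision; the point is only that the second argument lets the hook tell the two situations apart.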

Commit Message
The commit-msg hook is much like the prepare-commit-msg hook, but it’s called after the user
enters a commit message. This is an appropriate place to warn developers that their message
doesn’t adhere to your team’s standards.

The only argument passed to this hook is the name of the file that contains the message. If it doesn’t
like the message that the user entered, it can alter this file in-place (just like with prepare-commit-
msg) or it can abort the commit entirely by exiting with a non-zero status.

For example, the following script checks to make sure that the user didn’t delete the ISSUE-[#] string
that was automatically generated by the prepare-commit-msg hook in the previous section.

#!/usr/bin/env python

import sys, re
from subprocess import check_output

# Collect the parameters
commit_msg_filepath = sys.argv[1]

# Figure out which branch we're on
branch = check_output(['git', 'symbolic-ref', '--short', 'HEAD']).decode('utf-8').strip()
print("commit-msg: On branch '%s'" % branch)

# Check the commit message if we're on an issue branch
if branch.startswith('issue-'):
    print("commit-msg: Oh hey, it's an issue branch.")
    result = re.match('issue-(.*)', branch)
    issue_number = result.group(1)
    required_message = "ISSUE-%s" % issue_number

    with open(commit_msg_filepath, 'r') as f:
        content = f.read()
        if not content.startswith(required_message):
            print("commit-msg: ERROR! The commit message must start with '%s'" % required_message)
            sys.exit(1)

While this script is called every time the user creates a commit, you should avoid doing much outside
of checking the commit message. If you need to notify other services that a snapshot was
committed, you should use the post-commit hook instead.
Post-Commit
The post-commit hook is called immediately after the commit-msg hook. It can’t change the
outcome of the git commit operation, so it’s used primarily for notification purposes.

The script takes no parameters and its exit status does not affect the commit in any way. For most
post-commit scripts, you’ll want access to the commit that was just created. You can use git rev-
parse HEAD to get the new commit’s SHA1 hash, or you can use git log -1 HEAD to get all of its
information.

For example, if you want to email your boss every time you commit a snapshot (probably not the
best idea for most workflows), you could add the following post-commit hook.

#!/usr/bin/env python

import smtplib
from email.mime.text import MIMEText
from subprocess import check_output

# Get the git log --stat entry of the new commit
log = check_output(['git', 'log', '-1', '--stat', 'HEAD']).decode('utf-8')

# Create a plaintext email message
msg = MIMEText("Look, I'm actually doing some work:\n\n%s" % log)

msg['Subject'] = 'Git post-commit hook notification'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'

# Send the message
SMTP_SERVER = 'smtp.example.com'
SMTP_PORT = 587

session = smtplib.SMTP(SMTP_SERVER, SMTP_PORT)
session.ehlo()
session.starttls()
session.ehlo()
session.login(msg['From'], 'secretPassword')

session.sendmail(msg['From'], msg['To'], msg.as_string())
session.quit()

It’s possible to use post-commit to trigger a local continuous integration system, but most of the
time you’ll want to be doing this in the post-receive hook. This runs on the server instead of the
user’s local machine, and it also runs every time any developer pushes their code. This makes it a
much more appropriate place to perform your continuous integration.
Post-Checkout
The post-checkout hook works a lot like the post-commit hook, but it’s called whenever you
successfully check out a reference with git checkout. This is nice for clearing out your working
directory of generated files that would otherwise cause confusion.

This hook accepts three parameters, and its exit status has no effect on the git checkout command.

1. The ref of the previous HEAD
2. The ref of the new HEAD
3. A flag telling you if it was a branch checkout or a file checkout. The flag will be 1 and 0,
respectively.

A common problem with Python developers occurs when generated .pyc files stick around after
switching branches. The interpreter sometimes uses these .pyc instead of the .py source file. To
avoid any confusion, you can delete all .pyc files every time you check out a new branch using the
following post-checkout script:

#!/usr/bin/env python

import sys, os

# Collect the parameters
previous_head = sys.argv[1]
new_head = sys.argv[2]
is_branch_checkout = sys.argv[3]

if is_branch_checkout == "0":
    print("post-checkout: This is a file checkout. Nothing to do.")
    sys.exit(0)

print("post-checkout: Deleting all '.pyc' files in working directory")
for root, dirs, files in os.walk('.'):
    for filename in files:
        ext = os.path.splitext(filename)[1]
        if ext == '.pyc':
            os.unlink(os.path.join(root, filename))

The current working directory for hook scripts is always set to the root of the repository, so the
os.walk('.') call iterates through every file in the repository. Then, we check its extension and delete
it if it’s a .pyc file.

You can also use the post-checkout hook to alter your working directory based on which branch you
have checked out. For example, you might use a plugins branch to store all of your plugins outside of
the core codebase. If these plugins require a lot of binaries that other branches do not, you can
selectively build them only when you’re on the plugins branch.
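
As a rough sketch of that branch-specific idea, the following post-checkout logic only triggers a build when the plugins branch has just been checked out. The build callable is a hypothetical placeholder for whatever actually compiles your plugins, and the run parameter is an assumption added so the Git call can be swapped out when experimenting.

```python
import subprocess


def current_branch(run=subprocess.check_output):
    """Return the checked-out branch name, or None in a detached HEAD state."""
    try:
        # With --quiet, symbolic-ref exits non-zero (silently) when HEAD is detached.
        out = run(["git", "symbolic-ref", "--quiet", "--short", "HEAD"])
    except subprocess.CalledProcessError:
        return None
    return out.decode("utf-8").strip()


def post_checkout(argv, build, run=subprocess.check_output):
    """Call build() only after a branch checkout that lands on 'plugins'."""
    is_branch_checkout = argv[3] == "1"
    if is_branch_checkout and current_branch(run) == "plugins":
        build()
    # The exit status of post-checkout does not affect git checkout.
    return 0

# In a real hook file you would end with something like:
#   sys.exit(post_checkout(sys.argv, build_plugins))   # build_plugins is hypothetical
```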

Pre-Rebase
The pre-rebase hook is called before git rebase changes anything, making it a good place to make
sure something terrible isn’t about to happen.

This hook takes 2 parameters: the upstream branch that the series was forked from, and the branch
being rebased. The second parameter is empty when rebasing the current branch. To abort the
rebase, exit with a non-zero status.

For example, if you want to completely disallow rebasing in your repository, you could use the
following pre-rebase script:

#!/bin/sh

# Disallow all rebasing


echo "pre-rebase: Rebasing is dangerous. Don't do it."
exit 1

Now, every time you run git rebase, you’ll see this message:

pre-rebase: Rebasing is dangerous. Don't do it.

The pre-rebase hook refused to rebase.

For a more in-depth example, take a look at the included pre-rebase.sample script. This script is a
little more intelligent about when to disallow rebasing. It checks to see if the topic branch that
you’re trying to rebase has already been merged into the next branch (which is assumed to be the
mainline branch). If it has, you’re probably going to get into trouble by rebasing it, so the script
aborts the rebase.
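
The core idea of that check can be sketched in Python. This is a simplified illustration, not the logic of pre-rebase.sample itself: it assumes the mainline branch is called next and relies on git merge-base --is-ancestor, which exits with status 0 when the first commit is an ancestor of the second. The run parameter is an assumption added so the Git call can be replaced while experimenting.

```python
import subprocess


def is_merged(topic, mainline, run=subprocess.call):
    """Return True if the tip of `topic` is already reachable from `mainline`."""
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B.
    return run(["git", "merge-base", "--is-ancestor", topic, mainline]) == 0


def pre_rebase(argv, mainline="next", run=subprocess.call):
    """Return the hook's exit status: 0 allows the rebase, 1 aborts it."""
    # argv[1] is the upstream. argv[2] (the branch being rebased) is missing
    # when rebasing the current branch, so fall back to HEAD.
    topic = argv[2] if len(argv) > 2 and argv[2] else "HEAD"
    if is_merged(topic, mainline, run):
        print("pre-rebase: %s is already merged into %s. Refusing to rebase."
              % (topic, mainline))
        return 1
    return 0

# In a real hook file you would end with: sys.exit(pre_rebase(sys.argv))
```

Any other exit status from Git (for example, when the next branch does not exist) is treated here as "not merged", so the rebase is allowed.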

Server-side Hooks
Server-side hooks work just like local ones, except they reside in server-side repositories (e.g., a
central repository, or a developer’s public repository). When attached to the official repository,
some of these can serve as a way to enforce policy by rejecting certain commits.

There are 3 server-side hooks that we’ll be discussing in the rest of this article:

• pre-receive
• update
• post-receive

All of these hooks let you react to different stages of the git push process.

The output from server-side hooks is piped to the client’s console, so it’s very easy to send
messages back to the developer. But, you should also keep in mind that these scripts don’t return
control of the terminal until they finish executing, so you should be careful about performing
long-running operations.

Pre-Receive
The pre-receive hook is executed every time somebody uses git push to push commits to the
repository. It should always reside in the remote repository that is the destination of the push, not in
the originating repository.

The hook runs before any references are updated, so it’s a good place to enforce any kind of
development policy that you want. If you don’t like who is doing the pushing, how the commit
message is formatted, or the changes contained in the commit, you can simply reject it. While you
can’t stop developers from making malformed commits, you can prevent these commits from
entering the official codebase by rejecting them with pre-receive.

The script takes no parameters, but each ref that is being pushed is passed to the script on a
separate line on standard input in the following format:

<old-value> <new-value> <ref-name>

You can see how this hook works using a very basic pre-receive script that simply reads in the
pushed refs and prints them out.

#!/usr/bin/env python

import sys
import fileinput

# Read in each ref that the user is trying to update
for line in fileinput.input():
    # Each line has the form "<old-value> <new-value> <ref-name>"
    print("pre-receive: Trying to push ref: %s" % line.strip())

# Abort the push
# sys.exit(1)

Again, this is a little different than the other hooks because information is passed to the script via
standard input instead of as command-line arguments. After placing the above script in the
.git/hooks directory of a remote repository and pushing the master branch, you’ll see something like
the following in your console:

b6b36c697eb2d24302f89aa22d9170dfe609855b 85baa88c22b52ddd24d71f05db31f4e46d579095 refs/heads/master

You can use these SHA1 hashes, along with some lower-level Git commands, to inspect the changes
that are going to be introduced. Some common use cases include:

• Rejecting changes that involve an upstream rebase
• Preventing non-fast-forward merges
• Checking that the user has the correct permissions to make the intended changes (mostly
used for centralized Git workflows)

If multiple refs are pushed, returning a non-zero status from pre-receive aborts all of them. If you
want to accept or reject branches on a case-by-case basis, you need to use the update hook instead.
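
As one possible illustration of these use cases, the sketch below rejects a push if any ref would be updated in a non-fast-forward way. It assumes SHA-1 object names (40 hex digits), where Git passes an all-zero hash to mark a ref that is being created or deleted. Letting creations and deletions through is a policy choice made for this example, and the run parameter is an assumption added so the Git call can be replaced while experimenting.

```python
import subprocess
from collections import namedtuple

ZERO = "0" * 40  # all-zero hash: the ref is being created (old) or deleted (new)

RefUpdate = namedtuple("RefUpdate", "old new ref")


def parse_updates(lines):
    """Turn the hook's standard input into RefUpdate tuples."""
    return [RefUpdate(*line.split()) for line in lines if line.strip()]


def is_fast_forward(update, run=subprocess.call):
    """True when the old tip is an ancestor of the new tip."""
    if ZERO in (update.old, update.new):
        return True  # creating or deleting a ref does not rewrite history
    cmd = ["git", "merge-base", "--is-ancestor", update.old, update.new]
    return run(cmd) == 0


def pre_receive(lines, run=subprocess.call):
    """Return 1 (reject the whole push) if any ref would be rewound."""
    for update in parse_updates(lines):
        if not is_fast_forward(update, run):
            print("pre-receive: non-fast-forward update of %s rejected" % update.ref)
            return 1
    return 0

# In a real hook file you would end with: sys.exit(pre_receive(sys.stdin))
```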

Update
The update hook is called after pre-receive, and it works much the same way. It’s still called before
anything is actually updated, but it’s called separately for each ref that was pushed. That means if
the user tries to push 4 branches, update is executed 4 times. Unlike pre-receive, this hook doesn’t
need to read from standard input. Instead, it accepts the following 3 arguments:

• The name of the ref being updated
• The old object name stored in the ref
• The new object name stored in the ref

This is the same information passed to pre-receive, but since update is invoked separately for each
ref, you can reject some refs while allowing others.

#!/usr/bin/env python

import sys

branch = sys.argv[1]
old_commit = sys.argv[2]
new_commit = sys.argv[3]

print("Moving '%s' from %s to %s" % (branch, old_commit, new_commit))

# Abort pushing only this branch


# sys.exit(1)

The above update hook simply outputs the branch and the old/new commit hashes. When pushing
more than one branch to the remote repository, you’ll see the print statement execute for each
branch.
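
To show how update can reject one ref while letting others through, here is a minimal sketch of a policy that blocks direct pushes to a protected branch. The PROTECTED set is a hypothetical policy invented for this example. Note that the first argument is the full ref name, such as refs/heads/master.

```python
PROTECTED = {"refs/heads/master"}  # hypothetical policy for this example


def update_hook(argv, protected=PROTECTED):
    """Return 1 to reject this one ref, or 0 to accept it."""
    ref_name, old_commit, new_commit = argv[1:4]
    if ref_name in protected:
        print("update: direct pushes to %s are not allowed" % ref_name)
        return 1
    print("update: accepting %s (%s -> %s)"
          % (ref_name, old_commit[:7], new_commit[:7]))
    return 0

# In a real hook file you would end with: sys.exit(update_hook(sys.argv))
```

Because the hook runs once per ref, pushing master and some-feature together would reject only master under this policy.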

Post-Receive
The post-receive hook gets called after a successful push operation, making it a good place to
perform notifications. For many workflows, this is a better place to trigger notifications than post-
commit because the changes are available on a public server instead of residing only on the user’s
local machine. Emailing other developers and triggering a continuous integration system are
common use cases for post-receive.

The script takes no parameters, but is sent the same information as pre-receive via standard input.
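
A notification script might start by turning that input into readable text. The sketch below only builds the message. How it is delivered (email, chat, a CI trigger) is left out. It assumes SHA-1 object names, where an all-zero hash marks a ref that was created or deleted.

```python
ZERO = "0" * 40  # all-zero hash: the ref did not exist before, or no longer exists


def describe_update(line):
    """Describe one '<old-value> <new-value> <ref-name>' line in plain words."""
    old, new, ref = line.split()
    if old == ZERO:
        return "%s was created at %s" % (ref, new[:7])
    if new == ZERO:
        return "%s was deleted (was %s)" % (ref, old[:7])
    return "%s moved from %s to %s" % (ref, old[:7], new[:7])


def build_notification(lines):
    """Join the descriptions of every pushed ref into one message body."""
    return "\n".join(describe_update(line) for line in lines if line.strip())

# In a real hook file you might call: build_notification(sys.stdin)
```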

Summary

In this article, we learned how Git hooks can be used to alter internal behavior and receive
notifications when certain events occur in a repository. Hooks are ordinary scripts that reside in
the .git/hooks directory, which makes them very easy to install and customize.

We also looked at some of the most common local and server-side hooks. These let us plug in to the
entire development life cycle. We now know how to perform customizable actions at every stage in
the commit creation process, as well as the git push process. With a little bit of scripting knowledge,
this lets you do virtually anything you can imagine with a Git repository.

Refs and the Reflog


A ref is Git’s internal way of referring to a commit. You’re already familiar with many categories of
refs, including commit hashes and branch names. But, there are many other types of refs, and
virtually every Git command utilizes them in some form or another. You’ll walk away from this article
with an intimate knowledge of Git’s inner workings.

Git is all about commits: you stage commits, create commits, view old commits, and transfer
commits between repositories using many different Git commands. The majority of these commands
operate on a commit in some form or another, and many of them accept a commit reference as a
parameter. For example, you can use git checkout to view an old commit by passing in a commit
hash, or you can use it to switch branches by passing in a branch name.

By understanding the many ways to refer to a commit, you make all of these commands that much
more powerful. In this chapter, we’ll shed some light on the internal workings of common
commands like git checkout, git branch, and git push by exploring the many methods of referring to
a commit.

We’ll also learn how to revive seemingly “lost” commits by accessing them through Git’s reflog
mechanism.

Hashes

The most direct way to reference a commit is via its SHA-1 hash. This acts as the unique ID for each
commit. You can find the hash of all your commits in the git log output.

commit 0c708fdec272bc4446c6cabea4f0022c2b616eba
Author: Mary Johnson <[email protected]>
Date:   Wed Jul 9 16:37:42 2014 -0500

    Some commit message

When passing the commit to other Git commands, you only need to specify enough characters to
uniquely identify the commit. For example, you can inspect the above commit with git show by
running the following command:

git show 0c708f

It’s sometimes necessary to resolve a branch, tag, or another indirect reference into the
corresponding commit hash. For this, you can use the git rev-parse command. The following returns
the hash of the commit pointed to by the master branch:

git rev-parse master

This is particularly useful when writing custom scripts that accept a commit reference. Instead of
parsing the commit reference manually, you can let git rev-parse normalize the input for you.
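
For instance, a custom script could wrap git rev-parse in a small helper like the one sketched here. The function name and the run parameter are assumptions made for illustration. The ^{commit} suffix asks Git to resolve the reference all the way down to a commit.

```python
import subprocess


def resolve_commit(ref, run=subprocess.check_output):
    """Return the full commit hash for `ref`, or None if Git cannot resolve it."""
    try:
        # --verify checks that the argument names exactly one object;
        # --quiet suppresses the error message and just exits non-zero.
        out = run(["git", "rev-parse", "--verify", "--quiet", ref + "^{commit}"])
    except subprocess.CalledProcessError:
        return None
    return out.decode("ascii").strip()

# Usage inside a repository: resolve_commit("master"), resolve_commit("0c708f")
```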

Refs
A ref is an indirect way of referring to a commit. You can think of it as a user-friendly alias for a
commit hash. This is Git’s internal mechanism of representing branches and tags.

Refs are stored as normal text files in the .git/refs directory. To explore the refs in one of
your repositories, navigate to .git/refs. You should see the following structure, but it will
contain different files depending on what branches, tags, and remotes you have in your repo:
.git/refs/
    heads/
        master
        some-feature
    remotes/
        origin/
            master
    tags/
        v0.9

The heads directory defines all of the local branches in your repository. Each filename matches the
name of the corresponding branch, and inside the file you’ll find a commit hash. This commit hash is
the location of the tip of the branch. To verify this, try running the following two commands from
the root of the Git repository:

# Output the contents of `refs/heads/master` file:

cat .git/refs/heads/master

# Inspect the commit at the tip of the `master` branch:

git log -1 master

The commit hash returned by the cat command should match the commit ID displayed by git log.

To change the location of the master branch, all Git has to do is change the contents of the
refs/heads/master file. Similarly, creating a new branch is simply a matter of writing a commit hash
to a new file. This is part of the reason why Git branches are so lightweight compared to SVN.

The tags directory works the exact same way, but it contains tags instead of branches. The remotes
directory lists all remote repositories that you created with git remote as separate subdirectories.
Inside each one, you’ll find all the remote branches that have been fetched into your repository.
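
Because these refs are plain text files, a few lines of Python are enough to list them. The sketch below walks a refs directory and maps each full ref name to the contents of its file. It only sees loose ref files, so refs that Git has packed into a single file will not appear, and a symbolic entry such as a remote HEAD file will show its "ref: ..." text rather than a hash.

```python
import os


def read_loose_refs(git_dir):
    """Map full ref names (e.g. refs/heads/master) to the text in their files."""
    refs = {}
    refs_root = os.path.join(git_dir, "refs")
    for root, _dirs, files in os.walk(refs_root):
        for filename in files:
            path = os.path.join(root, filename)
            # Build the ref name relative to the .git directory, using "/"
            name = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path) as handle:
                refs[name] = handle.read().strip()
    return refs

# Usage from the root of a repository: read_loose_refs(".git")
```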

Specifying Refs
When passing a ref to a Git command, you can either define the full name of the ref, or use a short
name and let Git search for a matching ref. You should already be familiar with short names for refs,
as this is what you’re using each time you refer to a branch by name.
git show some-feature

The some-feature argument in the above command is actually a short name for the branch. Git
resolves this to refs/heads/some-feature before using it. You can also specify the full ref on the
command line, like so:

git show refs/heads/some-feature

This avoids any ambiguity regarding the location of the ref. This is necessary, for instance, if you had
both a tag and a branch called some-feature. However, if you’re using proper naming conventions,
ambiguity between tags and branches shouldn’t generally be a problem.
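
The search that Git performs for a short name can be modeled with a simple lookup. The sketch below follows the order described in the gitrevisions documentation. The function name and the set of existing refs are made up for illustration. Notice that in this order a tag is found before a branch of the same name, which is one reason the full ref name is the safest way to be explicit.

```python
# Patterns tried in order when resolving a short name to a full ref name
SEARCH_ORDER = [
    "%s",                    # e.g. HEAD, or an already-full name
    "refs/%s",
    "refs/tags/%s",
    "refs/heads/%s",
    "refs/remotes/%s",
    "refs/remotes/%s/HEAD",
]


def expand_short_name(name, existing_refs):
    """Return the first full ref name that exists for `name`, or None."""
    for pattern in SEARCH_ORDER:
        candidate = pattern % name
        if candidate in existing_refs:
            return candidate
    return None
```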

We’ll see more full ref names in the Refspecs section.

Packed Refs
For large repositories, Git will periodically perform a garbage collection to remove unnecessary
objects and compress refs into a single file for more efficient performance. You can force this
compression with the garbage collection command:

git gc

This moves all of the individual branch and tag files in the refs folder into a single file called packed-
refs located in the top of the .git directory. If you open up this file, you’ll find a mapping of commit
hashes to refs:

00f54250cf4e549fdfcafe2cf9a2c90bc3800285 refs/heads/feature

0e25143693cfe9d5c2e83944bbaf6d3c4505eb17 refs/heads/master

bb883e4c91c870b5fed88fd36696e752fb6cf8e6 refs/tags/v0.9

On the outside, normal Git functionality won’t be affected in any way. But, if you’re wondering why
your .git/refs folder is empty, this is where the refs went.
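
If a script needs to read this file directly, the mapping is easy to parse. The following is a minimal sketch. It skips the header lines that start with # and the lines that start with ^, which record the commit that an annotated tag on the previous line points to.

```python
def parse_packed_refs(text):
    """Return a dict mapping ref names to the hashes listed in packed-refs."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # '#' lines are headers; '^' lines hold the peeled target of a tag
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        commit_hash, ref_name = line.split(" ", 1)
        refs[ref_name] = commit_hash
    return refs

# Usage from the root of a repository:
#   with open(".git/packed-refs") as handle:
#       refs = parse_packed_refs(handle.read())
```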
Special Refs

In addition to the refs directory, there are a few special refs that reside in the top-level .git directory.
They are listed below:

• HEAD – The currently checked-out commit/branch.
• FETCH_HEAD – The most recently fetched branch from a remote repo.
• ORIG_HEAD – A backup reference to HEAD before drastic changes to it.
• MERGE_HEAD – The commit(s) that you’re merging into the current branch with git merge.
• CHERRY_PICK_HEAD – The commit that you’re cherry-picking.

These refs are all created and updated by Git when necessary. For example, the git pull command
first runs git fetch, which updates the FETCH_HEAD reference. Then, it runs git merge FETCH_HEAD
to finish pulling the fetched branches into the repository. Of course, you can use all of these like any
other ref, as I’m sure you’ve done with HEAD.

These files contain different content depending on their type and the state of your repository. The
HEAD ref can contain either a symbolic ref, which is simply a reference to another ref instead of a
commit hash, or a commit hash. For example, take a look at the contents of HEAD when you’re on
the master branch:

git checkout master

cat .git/HEAD

This will output ref: refs/heads/master, which means that HEAD points to the refs/heads/master ref.
This is how Git knows that the master branch is currently checked out. If you were to switch to
another branch, the contents of HEAD would be updated to reflect the new branch. But, if you were
to check out a commit instead of a branch, HEAD would contain a commit hash instead of a symbolic
ref. This is how Git knows that it’s in a detached HEAD state.
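
The same distinction can be made in a script by looking at the contents of the HEAD file. This is a small sketch with a made-up function name. It only inspects the text and does not check that the ref or commit actually exists.

```python
def describe_head(contents):
    """Classify the text of .git/HEAD as a symbolic ref or a detached commit."""
    contents = contents.strip()
    prefix = "ref: "
    if contents.startswith(prefix):
        # HEAD points at another ref, so a branch is checked out
        return ("symbolic", contents[len(prefix):])
    # Otherwise HEAD holds a commit hash directly: detached HEAD state
    return ("detached", contents)

# Usage from the root of a repository:
#   with open(".git/HEAD") as handle:
#       kind, target = describe_head(handle.read())
```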

For the most part, HEAD is the only reference that you’ll be using directly. The others are generally
only useful when writing lower-level scripts that need to hook into Git’s internal workings.

Refspecs

A refspec maps a branch in the local repository to a branch in a remote repository. This makes it
possible to manage remote branches using local Git commands and to configure some advanced git
push and git fetch behavior.

A refspec is specified as [+]<src>:<dst>. When pushing, the <src> parameter is the source branch in
the local repository, and the <dst> parameter is the destination branch in the remote repository;
when fetching, the roles are reversed. The optional + sign allows the update to proceed even when it
is not a fast-forward.
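
To make the three parts concrete, here is a minimal sketch that splits a refspec string into its force flag, source, and destination. The Refspec tuple and the function name are illustrative assumptions. A refspec written without a colon is reported with a destination of None, since what Git does in that case depends on the command.

```python
from collections import namedtuple

Refspec = namedtuple("Refspec", "force src dst")


def parse_refspec(spec):
    """Split '[+]<src>:<dst>' into a Refspec(force, src, dst) tuple."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, colon, dst = spec.partition(":")
    if not colon:
        dst = None  # no explicit destination was given
    return Refspec(force, src, dst)
```

An empty src, as in ":some-feature", is the form used to delete a remote branch.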
Refspecs can be used with the git push command to give a different name to the remote branch. For
example, the following command pushes the master branch to the origin remote repo like an
ordinary git push, but it uses qa-master as the name for the branch in the origin repo. This is useful
for QA teams that need to push their own branches to a remote repo.

git push origin master:refs/heads/qa-master

You can also use refspecs for deleting remote branches. This is a common situation for feature-
branch workflows that push the feature branches to a remote repo (e.g., for backup purposes). The
remote feature branches still reside in the remote repo after they are deleted from the local repo, so
you get a build-up of dead feature branches as your project progresses. You can delete them by
pushing a refspec that has an empty <src> parameter, like so:

git push origin :some-feature

This is very convenient, since you don’t need to log in to your remote repository and manually delete
the remote branch. Note that as of Git v1.7.0 you can use the --delete flag instead of the above
method. The following will have the same effect as the above command:

git push origin --delete some-feature

By adding a few lines to the Git configuration file, you can use refspecs to alter the behavior of git
fetch. By default, git fetch fetches all of the branches in the remote repository. The reason for this is
the following section of the .git/config file:

[remote "origin"]
    url = https://[email protected]:mary/example-repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*

The fetch line tells git fetch to download all of the branches from the origin repo. But, some
workflows don’t need all of them. For example, many continuous integration workflows only care
about the master branch. To fetch only the master branch, change the fetch line to match the
following:

[remote "origin"]
    url = https://[email protected]:mary/example-repo.git
    fetch = +refs/heads/master:refs/remotes/origin/master
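
The difference between the two fetch lines can be modeled by asking where a given remote ref would land locally. The sketch below is a simplified model with a made-up function name. It handles a single * wildcard in the way the refspecs in this section use it and returns None when the refspec does not match the ref at all.

```python
def map_ref(refspec, ref_name):
    """Return the destination for `ref_name` under `refspec`, or None."""
    if refspec.startswith("+"):
        refspec = refspec[1:]  # the force flag does not affect the mapping
    src, _, dst = refspec.partition(":")
    if "*" not in src:
        return dst if ref_name == src else None
    prefix, suffix = src.split("*", 1)
    long_enough = len(ref_name) >= len(prefix) + len(suffix)
    if not (long_enough and ref_name.startswith(prefix) and ref_name.endswith(suffix)):
        return None
    # The part matched by '*' is substituted into the destination pattern
    matched = ref_name[len(prefix):len(ref_name) - len(suffix)]
    return dst.replace("*", matched, 1)
```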
You can also configure git push in a similar manner. For instance, if you want to always push the
master branch to qa-master in the origin remote (as we did above), you would change the config file
to:

[remote "origin"]
    url = https://[email protected]:mary/example-repo.git
    fetch = +refs/heads/master:refs/remotes/origin/master
    push = refs/heads/master:refs/heads/qa-master

Refspecs give you complete control over how various Git commands transfer branches between
repositories. They let you rename and delete branches from your local repository, fetch/push to
branches with different names, and configure git push and git fetch to work with only the branches
that you want.

Relative Refs
You can also refer to commits relative to another commit. The ~ character lets you reach parent
commits. For example, the following displays the grandparent of HEAD:

git show HEAD~2

But, when working with merge commits, things get a little more complicated. Since merge commits
have more than one parent, there is more than one path that you can follow. For 3-way merges, the
first parent is from the branch that you were on when you performed the merge, and the second
parent is from the branch that you passed to the git merge command.

The ~ character will always follow the first parent of a merge commit. If you want to follow a
different parent, you need to specify which one with the ^ character. For example, if HEAD is a
merge commit, the following returns the second parent of HEAD.

git show HEAD^2

You can use more than one ^ character to move more than one generation. For instance, this
displays the grandparent of HEAD (assuming it’s a merge commit) that rests on the second parent.

git show HEAD^2^1


To clarify how ~ and ^ work, the following figure shows you how to reach any commit from A using
relative references. In some cases, there are multiple ways to reach a commit.
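
The rules for ~ and ^ can also be tried out on a toy commit graph. In the sketch below, GRAPH is a hypothetical history invented for this example, where each commit lists its parents with the first parent first. The resolver is a simplified model of the two operators, not Git's own parser.

```python
import re

# Hypothetical history: commit -> list of parents (first parent first)
GRAPH = {
    "A": ["B", "C"],  # A is a merge commit: B is its first parent, C its second
    "B": ["D"],
    "C": ["E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

STEP = re.compile(r"([~^])(\d*)")


def resolve(graph, expression):
    """Resolve an expression such as 'A~2' or 'A^2^1' to a commit name."""
    start = re.match(r"[^~^]+", expression)
    commit = start.group(0)
    for operator, digits in STEP.findall(expression[start.end():]):
        count = int(digits) if digits else 1
        if operator == "~":
            # ~n: follow the first parent n times
            for _ in range(count):
                commit = graph[commit][0]
        elif count:
            # ^n: move to the n-th parent (^0 leaves the commit unchanged)
            commit = graph[commit][count - 1]
    return commit
```

With this graph, resolve(GRAPH, "A~2") follows first parents only, while resolve(GRAPH, "A^2^1") steps onto the second parent of the merge before continuing.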

Relative refs can be used with any command that accepts a normal ref. For example, all of
the following commands use a relative reference:

# Only list commits that are reachable from the second parent of a merge commit

git log HEAD^2

# Remove the last 3 commits from the current branch

git reset HEAD~3

# Interactively rebase the last 3 commits on the current branch

git rebase -i HEAD~3

The Reflog

The reflog is Git’s safety net. It records almost every change you make in your repository, regardless
of whether you committed a snapshot or not. You can think of it as a chronological history of
everything you’ve done in your local repo. To view the reflog, run the git reflog command. It should
output something that looks like the following:

400e4b7 HEAD@{0}: checkout: moving from master to HEAD~2

0e25143 HEAD@{1}: commit (amend): Integrate some awesome feature into `master`

00f5425 HEAD@{2}: commit (merge): Merge branch 'feature'

ad8621a HEAD@{3}: commit: Finish the feature

This can be translated as follows:

• You just checked out HEAD~2
• Before that you amended a commit message
• Before that you merged the feature branch into master
• Before that you committed a snapshot

The HEAD@{<n>} syntax lets you reference commits stored in the reflog. It works a lot like the
HEAD~<n> references from the previous section, but the <n> refers to an entry in the reflog instead
of the commit history.

You can use this to revert to a state that would otherwise be lost. For example, let’s say you just
scrapped a new feature with git reset. Your reflog might look something like this:

ad8621a HEAD@{0}: reset: moving to HEAD~3

298eb9f HEAD@{1}: commit: Some other commit message

bbe9012 HEAD@{2}: commit: Continue the feature

9cb79fa HEAD@{3}: commit: Start a new feature

The three commits before the git reset are now dangling, which means that there is no way to
reference them—except through the reflog. Now, let’s say you realize that you shouldn’t have
thrown away all of your work. All you have to do is check out the HEAD@{1} commit to get back to
the state of your repository before you ran git reset.

git checkout HEAD@{1}

This puts you in a detached HEAD state. From here, you can create a new branch and continue
working on your feature.
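
The same lookup can be automated. The sketch below parses lines in the format that git reflog prints and finds the entry recorded just before the most recent reset. The function names are made up for illustration, and the parser only understands the simple "hash selector: action: message" layout shown in this section.

```python
import re

ENTRY = re.compile(
    r"^(?P<hash>[0-9a-f]+) (?P<selector>HEAD@\{\d+\}): "
    r"(?P<action>[^:]+): (?P<message>.*)$"
)


def parse_reflog(lines):
    """Turn `git reflog` output lines into a list of dicts, newest first."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def state_before_last_reset(entries):
    """Return the entry recorded just before the most recent reset, if any."""
    for index, entry in enumerate(entries):
        if entry["action"] == "reset" and index + 1 < len(entries):
            return entries[index + 1]
    return None
```

The selector of the returned entry is what you would pass to git checkout to get back to the earlier state.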
Summary
You should now be quite comfortable referring to commits in a Git repository. We learned how
branches and tags are stored as refs in the .git subdirectory, how to read a packed-refs file, how
HEAD is represented, how to use refspecs for advanced pushing and fetching, and how to use the
relative ~ and ^ operators to traverse a branch hierarchy.

We also took a look at the reflog, which is a way to reference commits that are not available through
any other means. This is a great way to recover from those little “Oops, I shouldn’t have done that”
situations.

The point of all this was to be able to pick out exactly the commit that you need in any given
development scenario. It’s very easy to leverage the skills you learned in this article against your
existing Git knowledge, as some of the most common commands accept refs as arguments, including
git log, git show, git checkout, git reset, git revert, git rebase, and many others.

Git Workflow