Git For Authors
David Farmer
American Institute of Mathematics
The essential concepts of git are treated in the first four chapters; they should
be read thoroughly and carefully, in order. Chapter 2, Chapter 3, and Chapter 4,
treat the cases, respectively, of one, several, and many collaborators, with very
different organizational models, but the concepts are universal to all three
models.
Eventually you will find the more intricate topics in Chapter 5 and Chapter 6
necessary or useful, and maybe both. Chapter 7 explains ways to back up
and correct mistakes. Chapter 8 is a grab bag of short topics that do not fit
naturally in the early material and would have just been a distraction. At some
point familiarize yourself with what is on offer there. The appendices contain
technical information that might change over time, in addition to reference
material.
Contents
1 Introduction
2 All By Yourself
    2.1 Commits
    2.2 Branches
    2.3 Commit Hashes
4 In Control
    4.1 Creating a Pull Request
    4.2 Reviewing and Accepting a Pull Request
5 Merge Conflicts
7 (∗) Oops!
    7.1 That is So Messed Up
8 Git Miscellany
    8.1 (∗) Word Diff
    8.2 (∗) Impersonating a Committer
    8.3 The Stash
    8.4 (∗) Tagging Releases, Signing a Repository
    8.5 (∗) Who Did What, and When?
    8.6 (∗) Where Did it All Go Wrong?
    8.7 (∗) File Management
    8.8 (∗) Binary Files
9 Parting Shot
Appendices
    C Quick Reference
    F List of Principles
Back Matter
    Resources
Chapter 1
Introduction
That example is a bit contrived, but basically realistic. And there is
necessarily some hand-waving in how git helps you. We will guide you carefully
through a similar application of git in Checkpoint 2.2.1.
To use git you need to author with source files that are simple text. So
that likely means using some markup language like LaTeX, MathBook XML,
Markdown, reStructuredText, or a similar language such as those supported
by pandoc. XML formats for Word files might work out acceptably, but will
be less than perfect. We are also going to show you how to use git at the
command-line in a terminal, so you might need to consult a tutorial on using a
command-line and file management. We have some quickstart information on
git itself in Appendix A.
git provides some powerful tools, and perhaps as a corollary, has a steep
learning curve. There is no right way to use git, and the way you choose to
employ it may in large part depend on social aspects of your project and its
organization. We will organize our initial material according to the number
of contributors and the organizational structure of the people in your project:
solo contributor (Chapter 2), a few collegial contributors (Chapter 3), or many
contributors with a central authority (Chapter 4). Read all of these chapters,
and read them in order, even if you know which model you might want to
eventually employ. The techniques, concepts and principles build on each other,
and there are no firm rules about numbers, procedures or organizational models.
It is even possible they will fluctuate over the lifetime of a project.
We will distill some of the essential concepts of git into a short series of
eight Principles, summarized in Appendix F. Review them often, until you
feel you understand them well, and it will be much easier to grasp the finer
details of git.
Principle 1.0.2 Git is a Tool. git is a tool that can be adapted to many
purposes and projects, in many different ways.
Discussions on the Internet about how best to use git can generate a lot of
heat, but not much light. Ignore them. Understand the basic principles of how
the tool functions, learn the commands that work for your projects, and employ
it to make your real work more productive, efficient and enjoyable. Simply
becoming a git expert is not the object.
You will find that git is like a garage door opener. Without one, you wonder
why you might ever need one, but once you start using one, you decide you
could not live without it. Or to extend the automotive motif, it is like a seat
belt: you feel unsafe without it. Enjoy getting to know git.
Chapter 2
All By Yourself
git is at its best when people collaborate, but it can still be valuable to an
individual, and you may gain collaborators or contributors in the later stages
of a project. We can understand some basic concepts by considering first the
simple situation of a single, solo contributor.
You will not fully appreciate all of our Principles on a first reading, but if
you come back to review them, they may be more useful on each reading. Here
is one such.
Principle 2.0.1 Git Manages Changes. git manages collections of changes
to your files. It does so in linear sequences of incremental changes that are
always consistent.
You may be very comfortable with organizing your writing as a collection
of files, perhaps further organized in a series of directories or folders, perhaps
even nested several levels deep. git works on, and manipulates, your files for
you, which can be disconcerting at some point. The objects that git stores
and tracks are collections of changes to your files. One such collection might
make several small changes to several different files (perhaps you renamed a
character throughout your novel), or the change may be to add a new file (an
entire new chapter, say). As you instruct git to move between branches, you
might see your character’s name change back and forth, or you might see your
new chapter entirely disappear, only to reappear later. The files in your project
are the cumulative result of many changes applied in sequence, not some final
state that never regresses to an earlier state. But don’t panic: git has all your
changes stored away safely.
git manages changes to your files, and those changes are accumulated
in files, whose state may change in ways you are not accustomed to.
If that sounds scary, realize that RAB is self-taught when it comes to git
and has never lost any work. He has panicked. But he has gained valuable
experience puzzling his way out of some jams (see Chapter 7). And once he
ended up applying the same collection of changes twice, mysteriously getting
two of everything in a chapter (see Section 8.3).
2.1 Commits
Collections of changes can be called changesets, but we will be more likely to
call each such collection a commit. That is a noun, not a verb. If you have
some experience with other revision control systems, then you might be familiar
with the notion of “committing”, or “checking in.” Try to avoid confusing the
new noun with your old verbs.
How do you make a commit? Roughly, you edit your files, so that your
directory of files (your working directory) is dirty. The dirty or clean
directory is a good mental image as you start working with git. Edit your
files, and save your files. Normally, you feel pretty secure at this point. You
have made changes, and by saving the edited files, you feel like you have saved
your changes. But from git’s perspective, your files are dirty and you have not
made your changes known to git yet. Here is the drill, using two commands at
the command line in a terminal.
List 2.1.1 Making a Commit
1. Edit some files and save them, making your working directory
dirty.
You will get no reaction (output) from the git add command, but when
you actually make the commit, you should get a response like
[master c0f19a2] Add the incident at the train station
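The whole recipe might look like the following in a terminal. This is only a sketch: the scratch directory, the three file names, their contents, and the author identity are all made up for illustration, and only the git add and git commit commands and the commit message come from the discussion here.

```shell
# Scratch repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"

# 1. Edit and save some files, making the working directory dirty.
echo "The train pulled in late." > chapter1.txt
echo "Nobody noticed the suitcase." > chapter2.txt
echo "Notes on the station." > notes.txt

# 2. Stage the changes in the index (no output is expected here).
git add chapter1.txt chapter2.txt notes.txt

# 3. Form the commit, with the message given on the command line.
git commit -m "Add the incident at the train station"
```

Because this is the very first commit in a fresh repository, the response will also contain (root-commit), and your hash will differ from c0f19a2.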
OK, that is a basic recipe, but what actually happened? In the add command
you would have listed some, or all, of the files you had edited and saved. If you
only listed some, the commit would only contain some of your changes, and the
remaining changes would contribute to keeping your working directory dirty.
The add command moves your changes in the indicated files to a staging area,
a sort of purgatory, called the index. We say those changes are staged. You
can incrementally add changes to the index to form a coherent set of changes
that will eventually become a commit. For example, above you could have run
the add command three times, once for each file, to stage the same collection of
changes. If you further edit a file after git add, you can add that file again to
move the subsequent edits into the index.
Realize that git add does two similar things. If git is unaware of some file,
then add will make it one of the files that git tracks and will put the current
contents of that file into the index. And from now on, git will include relevant
details about this file in reports. For example if the file is dirty, then certain
reports will show the changes (see next paragraph). But “tracking” a file does
not mean git automatically packages up changes. That is your job. You have
control of exactly which changes git will manage, and when you want git to
become aware of those changes. Subsequently, git add moves changes from a
file into the index, and you can do this repeatedly to update which collection
of changes are staged in the index.
With all this talk of a dirty directory, how can you tell if your directory is
even dirty at all? The command is git diff. It takes no action and is merely
informative. You can run it anytime you like and it is wise to do it often,
especially when getting started. RAB often walks away from his writing with
a dirty directory (not best practice), so he has made a habit of always
running git diff when first returning to a project. The output of git diff is all
of the changes in your working directory that are not staged into the index.
It is organized by file (given in yellow on my computer), with red text being
removed and green text being added. White text is unchanged and provides
context for changed text, in order to help git apply changes in the right places.
Solid red squares or bars are extraneous whitespace that serves no purpose
other than to potentially confuse git. It is a good idea to become comfortable
understanding this information. When all your changes are in the index, your
working directory is now clean, and git diff reports nothing.
git diff drops you into a simple program known as a pager. The down
and up arrows work to scroll through the output, the spacebar advances by a
screenful, and the b key takes you back a screenful. Press h for help on more
commands, and use q to quit and exit.
As you add changes to the index, you can see what your future commit
looks like by running git diff --cached, which will report the accumulated
changes in the index, using the same format.
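Here is a small sketch of the two reports side by side. The file name and its contents are invented, and we add --no-pager only so that the commands print directly instead of dropping into the pager when run as a script.

```shell
# Scratch repository with one committed file (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "The hero boards the train." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add opening scene"

# Edit and save: the working directory is now dirty.
echo "The heroine is already aboard." >> chapter1.txt
git --no-pager diff            # reports the new line as an unstaged change

# Stage the change.
git add chapter1.txt
git --no-pager diff            # reports nothing: no unstaged changes remain
git --no-pager diff --cached   # reports the same change, now waiting in the index
```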
After all this add’ing and diff’ing, making the commit itself is straightforward.
git commit will do the job, moving changes from the index into a single
collection of changes, a changeset, to be stored, managed and manipulated by
git. Technically, this is an irreversible action, but in practice there are many
ways to back up and have a do-over, especially when you are solo. So don’t
panic.
The -m switch allows you to make a commit message on the command
line, which you should enclose in quotation marks (single or double, allowing
use of the other kind in your message, if needed). Without it, git will dump
you in your editor, a step we prefer to avoid. Either way, you will always want
to include a commit message. They can have multiple lines, but in practice we
like to keep them to one concise line, leading with a capitalized action verb,
and not more than about sixty characters. These messages will help you find
your way in your git repository, and they will be the first thing others see if
they peruse your repository. We think they are worth some thought toward
making them informative and helpful, rather than sloppy and uninformative.
You are an author, no? Treat your commit messages much like the entries that
form a Table of Contents.
information on every commit on your current branch. And you will see some
huge commit hashes. There are 268 435 456 different possibilities for seven
hexadecimal digits. A full 40-digit commit hash has about 10^48 possibilities.
This is not a technical aside; we will see soon enough the critical role commit
hashes play in a git repository (see Section 2.3). Even though your commit
about the train station incident might often be shown shorthand as c0f19a2, it
may in reality be
c0f19a223404c394d592661532747527038754e
which you would see in the log.
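If you would like to see both forms of a hash for yourself, the following sketch may help. The repository and file are made up for illustration; git rev-parse is a standard command that prints the hash a name refers to, though it is not one we rely on elsewhere.

```shell
# Scratch repository with a single commit (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "The train pulled in late." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add the incident at the train station"

git --no-pager log --oneline   # one line per commit: short hash, then message
git rev-parse --short HEAD     # the abbreviated hash on its own
git rev-parse HEAD             # the full hash, 40 hexadecimal digits by default

# Seven hexadecimal digits allow 16^7 different values.
echo $((16 * 16 * 16 * 16 * 16 * 16 * 16))
```

The last line prints 268435456, the count quoted above. Your hashes will of course be different from ours.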
Here are two more useful diagnostic commands. git status will tell you
which files are dirty, which files have changes staged in the index and destined for
the next commit, and which files are lurking about in your directory, but which
you have not ever told git about. This is a good command to run frequently,
especially when you are beginning. Finally git ls-files will output all the
files git has changes for. This one is interesting, but less useful day-to-day.
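A sketch of a directory in all three conditions at once might make these reports easier to read. The file names are hypothetical, chosen only to describe the condition each file is in.

```shell
# Scratch repository with one committed file (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "First draft." > tracked.txt
git add tracked.txt
git commit -q -m "Add first draft"

echo "A second thought." >> tracked.txt   # dirty: edited but not staged
echo "A new chapter." > staged.txt
git add staged.txt                        # staged: destined for the next commit
echo "Scribbles." > untracked.txt         # lurking: git has never been told about it

git status      # should sort the three files into three groups
git ls-files    # lists tracked.txt and staged.txt, but not untracked.txt
```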
We will primarily teach by guiding you through exercises. They almost
always have extra information, so read them just for that. Experiment with
a scratch repository where you can try different things without the inevitable
mistakes also being worrisome disasters. And realize you can always start over.
Checkpoint 2.1.3 My First Repository.
1. Consult Appendix A for instructions, set up git, and init an empty
repository.
2. Make several commits, creating and adding at least three files into the
repository. Use git diff, git diff --cached, git status, and git log
liberally in the process.
Put some non-trivial content into each file (though it does not need to be
excessive). We will use this repository for future exercises, so do not get
rid of it. Put a typographical mistake into one of your three files.
3. Do not experiment with branches; we will do that next.
4. Once completed, use git show master to see the changes in your last
commit, in the diff format. Use git show master~1 to see the changes in
the commit just prior to that one. And git show master~2 for the one
before that. Try replacing the references (master~N) by the first seven or
eight digits of the commit hash, which you can get from the output of
git log, and see that this is the functional equivalent of using branch
names with relative references.
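The last step of the exercise could be tried along these lines. The three files and commit messages are invented stand-ins for your own, and git rev-parse --short is used here only as a convenient way to fetch a short hash without reading it off the log by eye.

```shell
# Scratch repository with three commits (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
for n in 1 2 3; do
    echo "Text of chapter $n." > "chapter$n.txt"
    git add "chapter$n.txt"
    git commit -q -m "Add chapter $n"
done

git --no-pager show master      # the changes in the last commit
git --no-pager show master~1    # the commit just prior to that one
git --no-pager show master~2    # and the one before that

# The same commit, named by the leading digits of its hash instead.
short=$(git rev-parse --short master~1)
git --no-pager show "$short"
```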
When you are done, the logical arrangement of your commits might look like
the following diagram. We list older commits at the bottom and do not include
commit messages. We use a 4-digit hash, which will uniquely identify each
commit. The name with the arrow points to the tip of the branch with that
name. The commit at the bottom, the first commit ever, is known as the root
commit. As your repository gains more branches, it will look more and more
like a tree than a twig. This diagram should be similar in spirit to what git
log reports for this simple first exercise.
2.2 Branches
The term branch gets used many ways in git, as a verb and a noun, likely
because it is so fundamental. We will introduce you to this concept with an
exercise that will also give you some idea of the power of the idea. So even if
you do not work the example right away, read through it, since it contains a lot
of explanation.
Checkpoint 2.2.1 Hero and Heroine on Branches. Make sure your
practice repository has at least three files in it, with one typographic mistake.
So go ahead and create a new file and a new commit if you need to. We will
wait.
We are going to pretend that we are writing a novel and we want to design
two possible endings. In one, the hero dies, and in the other, the heroine dies.
Make sure your working directory is clean and type git branch heroine.
Not much will appear to have just happened. So here is a new diagnostic
command you will want to use all the time. Type git show-branch. This shows
all of your possible branches, with an asterisk (“*”) indicating the branch you
are “on” and an exclamation mark indicating other branches (“!”). It is not
very informative yet, since your repository is just beginning to take shape and
you have only just created a branch. Be sure to keep this output visible in your
terminal for an upcoming comparison.
Now we are going to move to your new branch. Ready? Type git checkout
heroine. Now run git show-branch again and compare to the previous output.
The big change is that the asterisk has moved to indicate the heroine branch.
We say you are now on the heroine branch. If you have experience with other
revision control systems, the word “checkout” is fraught with other meanings
that are not accurate. Sorry, just get over it. In git the checkout command
will change the files in your working directory to some possibly different state,
based on a different sequence of changes in a sequence of commits constituting
a different branch. (Remember, commits are collections of changes.)
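Condensed into a script, the steps so far might look like this sketch. The single file and its commit are placeholders for your practice repository; the three git commands are the ones just described.

```shell
# Scratch repository with one commit on master (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "Chapter one." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add first chapter"

git branch heroine        # create the branch; nothing else appears to happen
git show-branch           # the asterisk should mark master

git checkout -q heroine   # move to the new branch
git show-branch           # the asterisk should now mark heroine
```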
Choose one of your two files without a typo and edit it to include some
words about the heroine dying, and also remove some existing words at the
same time. Use git diff, git add, git commit to form a new commit that
is the changes you just made, and with a commit message about creating an
ending where the heroine dies. You should have a clean working directory after
the commit.
Now show-branch will show something interesting:
open your files. git has manipulated your files and applied the changes with
the ending, so now your novel should contain the new ending.
Perhaps you now see why git can feel a bit foreign and why at some point
you will have that “Oh, &#*!” feeling. Don’t panic. git manipulates the
state of the files in your working directory, so in a way those files are ethereal.
git stores your commits (collections of changes) safely and can rewind and
replay them in ways consistent with your writing while on various branches,
using your working directory as a sort of sandbox or laboratory.
Switch back and forth several times between the master and heroine
branches with git checkout. Close and open your files in your editor as
you let git manipulate your working directory. Run git show-branch liberally.
When you are ready, leave a file open when you know a checkout is going to
change it. How does your editor react? We don’t know, so can’t help. But
a good editor might say something like “File on disk changed, do you want
to reload it?” We use an editor that just silently reloads the file, unless there
are unsaved edits in it by mistake. If a git checkout results in a file not even
being present, our editor leaves it in a state we find very confusing. Sometimes
this sort of behavior can be configured in an editor. Experiment until this is not
confusing and in the meantime as you are learning, close and reopen files as
you instruct git to manipulate the state of your working directory.
Ready for another branch? Make sure you are on master (by running
git checkout master). Then make a new branch with git branch hero, and
switch to it with git checkout hero. Run git show-branch as you go, studying
carefully how the output is changing.
Open the file that is different from the one you edited before, and that is
not the one with the typo. Remove some words, and author an ending where
the hero dies. Create a commit with these changes, using a commit message
about the hero dying. (We might now use “commit” as a verb, and say commit
your changes.) Now show-branch will produce something like:
has advanced one commit, so that the hero branch still splits off prematurely
from master, but the heroine branch does not.
Running git log you will see that the second commit (the typo fix) was
reported by the merge as a short commit hash in the output (XXXX) and the first
commit (the heroine ending) is also reported by the merge via a short commit
hash (XXXX). This is the movement of the master branch pointer, indicated by
.. between the hashes.
This is actually a very special type of merge, known as a fast-forward
merge, as reported in the output. Since the heroine branch splits off from the
tip of the master branch, it is trivial to rewind all the heroine commits back
to master and then replay them onto master. Think about that for a minute.
Not only is it trivial, it borders on silly.
Here is what really happens in a fast-forward merge. No commits are
rewound and replayed. git simply moves the master branch pointer from its
original location, so that it points to the same commit as the one that heroine
points to. This has two implications. First, none of the commits on the heroine
branch change in any way. Second, once the merge is completed master and
heroine are redundant, as they point to the same commit, as you can see in
the top half of the output from show-branch where the commit message is
duplicated.
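A minimal sketch of a fast-forward merge, under simplified assumptions: one invented file, a single commit on heroine, and no hero branch. The point is only to see the two branch names end up at the same commit.

```shell
# Scratch repository with one commit on master (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "Chapter one." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add first chapter"

# One commit on a topic branch that splits off from the tip of master.
git branch heroine
git checkout -q heroine
echo "The heroine dies." >> chapter1.txt
git add chapter1.txt
git commit -q -m "Add ending where the heroine dies"

# Merge into the current branch (master) from the branch we name (heroine).
git checkout -q master
git merge heroine              # the report should mention a fast-forward
git show-branch                # both branches should list the same commit
git rev-parse master heroine   # the same hash, printed twice
```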
The previous paragraph is an important realization, but you might be wise
to forget about it. In the long run, it is better to think of this merge as
bringing (merging) the changes on heroine into master. That is the more
general situation, and we remember how the simpler version of describing a
fast-forward merge confused more general concepts later.
Now we have a bit of a mess on our hands. The heroine branch pointer is
redundant, but really it is obsolete. Our trial death of the heroine is no longer
experimental; we have decided that is the ending we want and now it is part of
master. Let us kill the heroine branch pointer too.
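The usual command for removing a branch pointer is git branch -d. The sketch below rebuilds a tiny merged repository (all names and contents are made up) and then deletes the pointer; the commits themselves stay put, since master still leads to them.

```shell
# Scratch repository where heroine has already been merged (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "Chapter one." > chapter1.txt
git add chapter1.txt
git commit -q -m "Add first chapter"
git branch heroine
git checkout -q heroine
echo "The heroine dies." >> chapter1.txt
git add chapter1.txt
git commit -q -m "Add ending where the heroine dies"
git checkout -q master
git merge -q heroine

# Delete the obsolete pointer; -d refuses if the branch has unmerged commits.
git branch -d heroine
git --no-pager branch          # only master should remain
git --no-pager log --oneline   # the heroine commit is still here, on master
```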
will garbage collect them when it does some automatic spring cleaning of your
repository. For us, there is no real harm in leaving the hero branch in place,
and it is instructive to wait a while before killing it. It is doing no damage
where it is, and we never need to check it out, though eventually we will grow
tired of seeing it in all our diagnostic reports.
Realize that the decision above to merge heroine into master should not be
taken lightly, as it is quite difficult to reverse it, and not really in the spirit of
how you use git. Branches like hero and heroine are sometimes called topic
branches. Or in a nod to software development they may be called feature
branches, since they might be used to create and test a new feature for a
larger piece of software. I like to refer to a branch like master as the mainline.
An essential aspect of working with git is to always work on a branch, and
realize that there is little cost to making a branch, merging it, and then deleting
the (temporary) branch pointer. I have made branches that only had a lifetime
of five minutes. That is a principle.
Principle 2.2.3 Always Work on a Branch. Always make a topic branch
off your mainline when starting new work, and only merge once satisfied.
the rewound branch. The previous hash(es) are meaningless (and lost to time).
Notice that git updates the branch pointer hero to use a new hash from the
tip of the replayed branch.
This is a principle that will be important once we get social and work with
others.
Principle 2.3.1 A Rebase Changes Hashes, a Merge Does Not. A
rebase will always change some commit hashes, while a merge will never change
any commit hashes.
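You can watch this principle at work with a sketch like the one below. It is simplified: the file names are invented, and a commit made directly on master stands in for the merged heroine work. We record the hash at the tip of hero before and after a rebase, and again after a merge.

```shell
# Scratch repository with one commit on master (hypothetical setup).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Example Author"     # placeholder identity
git config user.email "author@example.com"
echo "Chapter one." > chapter1.txt
echo "Chapter two." > chapter2.txt
git add chapter1.txt chapter2.txt
git commit -q -m "Add two chapters"

# A commit on the hero branch...
git branch hero
git checkout -q hero
echo "The hero dies." >> chapter2.txt
git add chapter2.txt
git commit -q -m "Add ending where the hero dies"

# ...and, meanwhile, master advances (stand-in for the heroine merge).
git checkout -q master
echo "The heroine dies." >> chapter1.txt
git add chapter1.txt
git commit -q -m "Add ending where the heroine dies"

# Rebase hero onto master: the hero commit is replayed and gets a new hash.
git checkout -q hero
before=$(git rev-parse hero)
git rebase -q master
after=$(git rev-parse hero)
echo "hero before rebase: $before"
echo "hero after rebase:  $after"

# Merge hero into master: a fast-forward, and no hash changes.
git checkout -q master
git merge -q hero
echo "hero after merge:   $(git rev-parse hero)"
```

The first two hashes printed should differ, while the last two should be identical.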
You now know, and have experience with, four of the six important concepts of
working with git: committing, branching, rebasing and fast-forward merging.
Only (general) merging and pull requests remain. But first, let us get social and
begin collaborating with others, to realize some of the most powerful aspects of
git.
Chapter 3
With a Few Friends
One of the real benefits of git is the ability to collaborate easily with others in a
de-centralized manner across time and space. In this chapter we will experiment
with a model that is appropriate for a small group of collegial collaborators.
My dictionary (WordNet) defines “collegial” as “characterized by or having
authority vested equally among colleagues,” which is precisely the situation we
will simulate. So, in this chapter we will explore the scenario where a small
group of equals works on a writing project, on the assumption that everyone in
the group is trusted to make any sort of change at any time.
We will not be using email attachments or Google Docs to communicate.
Rather, an easy way to collaborate with git is to place a copy of the repository
on a server where each collaborator has the right privileges to interact with
the repository. If your project is secret and sensitive, this might be a server at
your workplace, or a web host you control and trust. Or maybe the project is
not so sensitive and a private account at GitHub is appropriate and easy to
set-up. Or maybe your project has an open license and an open repository will
eventually allow total strangers to contribute to your project (see Chapter 4).
For exercises in this chapter, we will use GitHub1 , a (free) site that hosts git
repositories along with tools supporting collaboration around a git repository.
Principle 3.0.1 Merge into Your Current Branch. A merge integrates
changes into your current branch, from a branch you specify.
CHAPTER 3. WITH A FEW FRIENDS 15
research, they decide to write their paper openly as a public GitHub repository,
and they decide to host the repository in Alice’s account. Everything else will
be discussed on GitHub.
Work this exercise playing the role of Alice. If you have a friend who can
be Bob, all the better, but you can also play both sides of the collaboration
yourself and get almost as much out of the exercise (if Bob is somebody else,
then he needs a GitHub account, but if you are playing both sides, then your one
GitHub account is enough). Alice (you!) will log into her GitHub account and
initiate a new repository. Recall that in Chapter 2 we created a new repository
on our local computer at the command-line with git init. Now Alice will let
GitHub do that step since GitHub will automatically configure the repository
for subsequent communication.
See Section B.2 and Section B.3 for instructions on the steps in this para-
graph. Alice will create a new repository and name it banking-paper. She will
make Bob a collaborator on the repository since she knows Bob’s username on
GitHub from their previous collaborations. So there is now a fresh repository
on GitHub, which Alice and Bob can manipulate. We are going to call this the
definitive repository, as it will hold the “official” version of their paper. In
a minute we will set up Alice and Bob with local copies, but they have agreed
that those are just their local workspaces and the repository on GitHub always
holds the latest, and presumably best, version of their paper.
Section B.4 contains the necessary instructions for this paragraph, but they are
more general, so read them and this paragraph through completely before
doing anything. In particular, ignore any discussion of “forks” until Chapter 4.
Alice should make a copy of the fresh repository onto her work computer,
and Bob should do the same. If you are playing both sides of this exercise
yourself, copy the repository once, and then rename the banking-paper directory
to alice-banking. Then copy again and rename the resulting directory as
bob-banking. These changes have zero effect on how your repository behaves,
but you will need to mentally figure out which files you should be working with
in the remainder of the exercise.
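If you want to rehearse the copy-and-rename step without touching GitHub, a sketch like this one may help. It assumes a local bare repository named banking-paper.git as a stand-in for the repository on GitHub; with the real thing you would clone from the address GitHub gives you, as described in Section B.4.

```shell
# A local bare repository stands in for the one on GitHub (an assumption
# made only so this sketch runs without a network or an account).
work=$(mktemp -d) && cd "$work"
git init -q --bare banking-paper.git

# Copy once and rename the resulting directory for Alice...
git clone -q banking-paper.git
mv banking-paper alice-banking

# ...then copy again and rename for Bob.
git clone -q banking-paper.git
mv banking-paper bob-banking

ls    # alice-banking, banking-paper.git, bob-banking
```

Cloning a repository with no commits yet will likely produce a warning that it is empty, which is harmless here.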
In principle, Alice and Bob are totally set up and organized, and never even
need to visit the GitHub site ever again. But GitHub has some nice tools and
Alice and Bob have decided to be 100% transparent in their work. A GitHub
issue is like a topic on an online discussion forum. Issues are designed mostly for
reporting and discussing bugs in software, or requesting and implementing new
features in software, but they can also be used for planning and discussion.
Alice and Bob would like to plan their writing as an open discussion on GitHub,
deciding that Alice will concentrate on the introduction since she is the better
overall writer, and Bob will therefore get started on the section with the details
of the vulnerability. They will work more closely on the final section containing
recommendations.
So in our exercise, Alice should create a branch off of master named intro,
create a file introduction.txt, add it to her branch, make some edits, commit
the changes, and so on. Bob should do similarly but make a branch off master
named vulnerable where he adds and edits a file vulnerability.txt as a series
of commits. Recall that Principle 2.2.3 says Alice and Bob should do all of
their work on branches.
Alice had the simpler task, so let us assume she finishes the introduction
first. She does not know she is first, she does not even have any idea where Bob
is in his writing. She has been doing her best to get the introduction right, and
to not disturb Bob, who is presumably also working hard. So Alice suspects
there are no new commits on the master branch, but does not really know. OK,
Alice is going to update master with a pull, see no new commits there, do a
fast-forward merge of her intro branch into master locally, and then push her
master branch to GitHub. We will do the details carefully, but recognize that
the push and pull are the only new concepts we did not cover in Chapter 2.
But first, a bit of diagnostic work. Alice’s repository was copied from GitHub
and therefore is aware of its heritage.
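Here is a sketch of the diagnostic, followed by the pull, merge and push sequence Alice has in mind. It is not the exact session from the exercise: a local bare repository stands in for GitHub, the file contents and commit messages are invented, and we name the remote and branch explicitly (origin and master) on each pull and push.

```shell
# A local bare repository stands in for the definitive repository on GitHub.
work=$(mktemp -d) && cd "$work"
git init -q --bare banking-paper.git
git clone -q banking-paper.git alice-banking
cd alice-banking
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as in the text
git config user.name "Alice"              # placeholder identity
git config user.email "alice@example.com"
echo "Banking paper" > README.txt
git add README.txt
git commit -q -m "Start the paper"
git push -q origin master

# Alice's work, on her topic branch.
git branch intro
git checkout -q intro
echo "Online banking is everywhere." > introduction.txt
git add introduction.txt
git commit -q -m "Draft the introduction"

# Diagnostic: the repository remembers where it was copied from.
git remote -v

# Update master, merge the topic branch, and publish.
git checkout -q master
git pull origin master    # should report nothing new
git merge intro           # a fast-forward merge
git push origin master    # the definitive repository now has the introduction
```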
So Alice has placed her introduction in the definitive repository without any
additional coordination with Bob. Time is important, so Alice initiates an
issue on GitHub to discuss their recommendations, which will now interrupt
Bob, but they need to form a plan. However, without waiting for Bob’s reply,
Alice creates a new branch off master named last-chapter where she simply
adds an empty file named recommendations.txt. She repeats the steps above
and with a fast-forward merge, updates her local master and pushes it to the
definitive repository on GitHub. Alice can now clean up by deleting her intro
and last-chapter branch pointers that have become obsolete.
Now that Alice has communicated with a public repository, known to
the world, it is the right time to introduce a principle that we will illustrate
subsequently.
Principle 3.1.2 Never Alter a Public Commit. Never, ever, alter in any
way a commit that has been made available to anybody else.
We discussed the nature of commit hashes in Section 2.3. We have seen how a
local rebase changes commit hashes in Checkpoint 2.2.1. And we have a principle
about a rebase changing hashes, while a merge does not (Principle 2.3.1). Alice
can rebase her branch all she wants within her repository on her local computer,
but the instant she pushes commits to the definitive repository, they become
available to her co-author (Bob), and to the entire world. The commit hash
for each of these commits is a globally unique ID (GUID). There are ways
to modify these public commits in the definitive repository, but this would be
tantamount to chopping off somebody’s finger and replacing it with a new one
with a different fingerprint. Don’t do it!
Why not? All of git’s coordination is predicated on identical commits
having identical IDs (the commit hash). In Chapter 4 we will expand our circle
of contributors to anybody in the world (don’t panic, we will have a procedure for
approving changes before they go into the definitive repository). Manipulating
a public commit will totally confuse git, make a big mess, and infuriate your
collaborators, whose copies of the repository are no longer consistent with
your ill-advised action. This may be the only advice that all the Internet git
commentators can agree on. If you follow the procedures we are describing, this
will never be a danger. But when you push a commit to the master branch of
the definitive repository and feel like you made a mistake, resist the temptation
to go backwards locally and then do a “forced push” to the definitive repository.
You might get away with it for a while, but eventually you will regret it. Just
live with it (a misspelled commit message), or add another commit to fix your
mistake (a grammatically poor sentence). And forget we even mentioned the
possibility of changing public commits.
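The difference between repairing a mistake with a new commit and rewriting an existing commit can be seen in a throwaway repository. What follows is only a sketch, not part of the exercise: the directory, file name, identity, and commit messages are all invented for illustration.

```shell
# Scratch repository; every name below is hypothetical.
export GIT_AUTHOR_NAME="Alice Example" GIT_AUTHOR_EMAIL="alice@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/paper"
cd "$work/paper"
git symbolic-ref HEAD refs/heads/master     # call the first branch master, as in this guide

echo "Banks are importent." > intro.txt
git add intro.txt
git commit -q -m "Add introduction"
published="$(git rev-parse HEAD)"           # pretend this commit has been pushed

# The safe repair: a NEW commit on top. The published commit is untouched.
echo "Banks are important." > intro.txt
git commit -q -a -m "Fix typo in introduction"
parent_of_fix="$(git rev-parse HEAD~1)"

# The unsafe repair, tried here only on a private scratch branch: amending
# replaces the commit with a different commit that has a different hash.
git checkout -q -b scratch "$published"
git commit -q --amend -m "Add introduction (reworded)"
amended="$(git rev-parse HEAD)"

echo "published: $published"
echo "amended:   $amended"
```

The two hashes printed at the end differ, while the parent of the repair commit is still the published commit. Anybody holding the published commit stays consistent with the first approach, but not with the second.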
Back to our exercise. Bob has now finished up his section, so he wants to
make it part of the definitive repository. He suspects Alice has finished the
introduction, since she was eager to discuss the recommendations. So, just like
Alice, he is going to update his master branch.
What does “private” mean here? We have seen that commit hashes form
a chain of repeated hashes all the way back to the root commit (Section 2.3),
and that we can identify commits by the leading digits of a commit hash. Also,
a rebase will always change some commit hashes (Principle 2.3.1). So while
the present principle advocates frequent rebases, never perform a rebase that
changes a commit hash that has been made available to somebody else. This is
the advice contained in Principle 3.1.2, and now could be a good time to go back
and re-read the discussion that follows it.
We have seen how to pull from the master branch of the definitive repository,
and how to push commits to the master branch of the definitive repository. So
we have the tools for two-way communication between repositories. We can
pull from public repositories at will, but need permission to push to repositories
where we are trusted to make unilateral changes.
When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
Boom! The scariest thing that can happen in git. But you know we
engineered this conflict to happen and we are not going to panic. (If you
do want to panic, try git rebase --abort as suggested and when you collect
yourself and settle down, come back to try the rebase again.)
Your hint about where the conflict lies is in the line
CONFLICT (content): Merge conflict in recommendations.txt
Details on resolving conflicts are in Chapter 5 so head there for details on
what to do now, specifically see the instructions for a “rebase conflict” in
List 5.0.1. Briefly, Bob will open recommendations.txt in his editor and see
the two different second paragraphs adjacent to each other, with Alice’s official
second paragraph first, and his proposed second paragraph afterwards. Now
there is no notion of official or not, Bob and Alice are equals. But Alice got
there first, so Bob has the responsibility to decide which paragraph to keep, or
to keep both, and in what order. If Alice does not like Bob’s decisions, they
can do another round of changes (perhaps after some discussion on a GitHub
issue, including commit hashes in the discussion to reference the details of any
disagreement).
Done resolving the conflict, Bob’s recommend branch now comes off the tip
of master and is positioned for a fast-forward merge that he can then push to
origin/master.
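Bob's final moves can be sketched with stand-ins. This is an illustration under stated assumptions: a local bare repository plays the definitive repository on GitHub, and the directory names, identity, and file contents are hypothetical.

```shell
# A local bare repository stands in for the definitive repository on GitHub.
export GIT_AUTHOR_NAME="Bob Example" GIT_AUTHOR_EMAIL="bob@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "First paragraph." > recommendations.txt
git add recommendations.txt
git commit -q -m "Start recommendations"
git clone -q --bare "$work/seed" "$work/definitive.git"
git clone -q "$work/definitive.git" "$work/bob"      # origin = definitive repository
cd "$work/bob"

# Bob's finished work sits on a branch that comes off the tip of master.
git checkout -q -b recommend
echo "Bob's second paragraph." >> recommendations.txt
git commit -q -a -m "Add second paragraph"

# Fast-forward: the master pointer simply moves up to the tip of recommend.
git checkout -q master
git merge -q --ff-only recommend
git push -q origin master
git branch -d recommend       # this branch pointer is now obsolete
```

The --ff-only option makes git refuse anything other than a fast-forward, so if the merge succeeds you know no merge commit was created.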
CHAPTER 3. WITH A FEW FRIENDS 22
In Control
We now turn to the situation where your book may have numerous collaborators.
Maybe your book is an anthology of short stories with thirty contributors, or
maybe it is a human anatomy textbook with the potential for students to
discover small errors for many, many years down the road. If the anatomy
textbook has four authors, you might begin as colleagues as in Chapter 3, but
later may have specialists reviewing and editing select chapters, until finally
the book is published and students suggest changes and corrections for the
inevitable Second Edition. git is flexible enough to help you at every step.
So the model here is a select few that are in control of the repository. They
might be called authors or editors, but we will refer to them generically as
overlords (WordNet: “a person who has general authority over others”). In
our opinion many projects should be large, because you want to encourage
people to contribute.
CHAPTER 4. IN CONTROL 24
• Everyone who wants to contribute to the project also has a copy of their
fork on their own computer. (We now know of three copies of the project:
definitive, your fork on GitHub, your fork on your computer.) We will
refer to this as “the version on your laptop,” although your computer
might be a desktop machine, or something in the cloud, or something else.
(The technical term for “version on your laptop” is clone, but we will
not use that terminology. Keep in mind that the version on your laptop
was copied from your fork, not from the definitive repository.)
• The repository on each contributor’s individual laptop knows about two
repositories on GitHub:
◦ Their own fork, which is called origin.
◦ The definitive repository, which is called upstream.
Both of these repositories are known and managed as remotes within
the repository on the laptop.
To summarize, everyone has a repository on their laptop. Everyone has the same
upstream remote: it is the definitive repository. Everyone has a different origin,
which is their own personal fork (a repository) in their account on GitHub.
Notice that in Chapter 3 the origin remote was the definitive repository, but
now origin is a sort of intermediary between the repository on your laptop
and upstream, the definitive repository. (See Appendix B for details on forking
a project on GitHub and setting up your laptop with a copy.)
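The arrangement of the two remotes can be sketched locally. Everything here is a stand-in chosen for illustration: bare repositories in a scratch directory play the definitive repository and your fork, and the paths and identity are hypothetical. On GitHub the fork is made with a button, not with a command.

```shell
export GIT_AUTHOR_NAME="Carol Example" GIT_AUTHOR_EMAIL="carol@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Calculus" > book.txt
git add book.txt
git commit -q -m "Start the book"

git clone -q --bare "$work/seed" "$work/definitive.git"        # the definitive repository
git clone -q --bare "$work/definitive.git" "$work/fork.git"    # stand-in for forking

# The version on your laptop is copied from your fork, so origin is the fork.
git clone -q "$work/fork.git" "$work/laptop"
cd "$work/laptop"

# Tell the laptop about the definitive repository as well.
git remote add upstream "$work/definitive.git"
git fetch -q upstream
git remote -v                 # lists origin (your fork) and upstream (definitive)
```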
Once all of that is set up, you go through the exact same cycle every time
you want to contribute changes to the project. You already know how to work
with branches, and their usefulness in a personal project, or with a small group.
Chapter 6 is devoted entirely to how you can work effectively with a branch.
In a big complicated project, like a calculus textbook, there are lots of
different things that may need attention. Maybe you need to put in the solutions
to the problems from Chapter 6, or maybe you need to add a new section on
the chain rule to Chapter 4, or maybe you need to edit the introduction to
Section 5.3.
With branches, you know that you could use your repository to productively
work on all three tasks, switching between them at will. You would have three
different branches, solutions-chap-6, chain-rule-section, intro-section5-3,
and in each branch you would be doing something a bit different.
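As a small sketch of juggling those three tasks, the commands below create the branches in a scratch repository and switch between two of them. The repository, identity, and file names are hypothetical; only the branch names come from the text.

```shell
export GIT_AUTHOR_NAME="Carol Example" GIT_AUTHOR_EMAIL="carol@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/calculus"
cd "$work/calculus"
git symbolic-ref HEAD refs/heads/master
echo "Calculus" > book.txt
git add book.txt
git commit -q -m "Start the book"

# One branch per task, each starting from the current master.
git branch solutions-chap-6
git branch chain-rule-section
git branch intro-section5-3

# Work on the chain rule for a while...
git checkout -q chain-rule-section
echo "The chain rule." > chain-rule.txt
git add chain-rule.txt
git commit -q -m "Draft chain rule section"

# ...then switch tasks. The chain rule draft stays behind on its own branch.
git checkout -q solutions-chap-6
ls
```

After the last checkout the working directory holds only book.txt; the chain rule draft reappears when you check out chain-rule-section again.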
Why is this good? There are many reasons, two of which are: (i) You do not
need to mess up the working version of the book, because you are just working
on a copy, and (ii) independent changes can be evaluated independently. Item
(ii) suggests the following principle.
Principle 4.1.1 Pull Requests Separate Creation and Approval. The
process of suggesting changes to the definitive repository of a project is separate
from the process of accepting those changes.
This may seem silly if you are thinking in terms of a single person writing
a book. But if the writing and editing is a group effort, and many people
are contributing, it makes perfect sense for all changes to require at least two
people: the person who wrote the new material, and the person who agreed
to add that material to the definitive version. Notice that the sections of this
chapter are organized exactly according to this principle.
So, you make a branch when you are about to start working on some aspect
of the document (Principle 2.2.3). You may work on that branch for just an
hour, or for several days, or off-and-on for months. You may switch to working
on other branches. If it sounds confusing at first, it will not be confusing once
you start doing it, and it is totally worth it. For example, suppose you have
finished making the solutions to the problems for Chapter 6. Good, because
now you can propose those changes to be included in the definitive repository,
and it is no problem that you have not yet finished the other work you are
doing, because those tasks are on different branches. And if the overlords review
your work promptly, that is nice, but if it takes them some time to do that,
you are not hung up waiting for their feedback before you can go back to
editing your new section on the chain rule. Juggling several writing tasks has
just become a whole lot easier.
One last bit of terminology: proposing that the changes in a branch go
into the definitive version is called a pull request. As in, “I request that the
overlords pull my changes into the definitive repository.” Once the authorities
accept your pull request, the changes from your branch are now incorporated
into the definitive repository and part of the official version of the project.
Now that we know the purpose and workflow of a pull request, and how to
set up our laptop, let us describe the procedure of making a pull request. Notice
that we are leveraging what you already know about branches from Chapter 2
and what you know about pushing and pulling changes from Chapter 3. You
will go through the following recipe every time you want to contribute to a
project where an overlord will review your contribution.
What might happen next? Hopefully someone with control over the
definitive repository will accept your pull request. But maybe they want you
to make some changes; perhaps they found a typo or other error. No problem:
just checkout your branch again on your laptop, edit to make the corrections,
save those changes as a new commit on the branch, and push your branch to
your fork (origin). This is the same recipe as in List 4.1.2, except that you
skip over the command that actually creates branch_name, since that branch
already exists.
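A round of corrections might look like the following sketch. It assumes a local bare repository standing in for your fork on GitHub, and uses solutions-chap-6 as a hypothetical value of branch_name; the file and its contents are invented.

```shell
export GIT_AUTHOR_NAME="Carol Example" GIT_AUTHOR_EMAIL="carol@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Calculus" > book.txt
git add book.txt
git commit -q -m "Start the book"
git clone -q --bare "$work/seed" "$work/fork.git"    # stand-in for your fork on GitHub
git clone -q "$work/fork.git" "$work/laptop"         # origin = your fork
cd "$work/laptop"

# The original proposal, already pushed when the pull request was made.
git checkout -q -b solutions-chap-6
echo "Solution 1: x = 3" > solutions-6.txt
git add solutions-6.txt
git commit -q -m "Add solutions for Chapter 6"
git push -q origin solutions-chap-6
original="$(git rev-parse HEAD)"
git checkout -q master

# The overlords ask for a correction: same recipe, minus creating the branch.
git checkout -q solutions-chap-6
echo "Solution 1: x = 2" > solutions-6.txt
git commit -q -a -m "Correct solution 1 after review"
git push -q origin solutions-chap-6
```

The original commit is still there, as the parent of the correction, so nothing public has been altered.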
Your pull request will be automatically updated and the overlords notified.
Notice that we are not changing any commits, just adding to, and extending
the branch. And anybody can look and see the record of your original proposal
Locate a Pull Request on GitHub. GitHub has likely sent you a notification
immediately after a contributor has created a pull request, and maybe
that email has a link to take you directly there. But you also need to know
how to find any given pull request later.
Go to the project’s home on GitHub and see if there are any pending pull
requests. Note that you are going to the definitive repository, not your own
personal fork of the project. You have your manager-hat on now, not your
contributor-hat.
It is common to see seven tabs across the top of the repository on GitHub:
Code Issues Pull requests Wiki Pulse Graphs Settings
The first three and the last one are used most often. For Issues and Pull
requests there will be a little number telling you how many items need your
attention. Click on Pull requests. Each pull request will have a title, and it
will tell you who made the pull request and when they created it. Click on one
of the pull requests.
Initial Review of a Pull Request. If the contributor did a good job, there
will be a few sentences describing what they did, and an indication of how
their proposed changes appear in the final product, such as “New exercises for
Chapter 6, see page 88.” Let us assume the contributor gave a clear description,
and what they describe sounds like something beneficial to your project.
Below the description you should see a check mark in a green disk, and the
phrase
In-depth Review of a Pull Request. Now you are looking at a pull request
with a description that sounds useful, and there are no conflicts. The next step
is to look at the actual changes the contributor made and their effect on your
project. There are three tabs below the title of the pull request:
Conversation Commits Files changed
Click on Files changed. Highlighted in red and green will be the lines which
were deleted and added (respectively) as part of this pull request. You need
to look carefully at what was written, because this is destined to become an
official part of your project. All large successful projects have standards for
writing the source material, and you should check that the author has done
a good job. Suppose, for example, you see the following line added to your
calculus textbook:
When finding a maximum, be sure to check {\em both} end points.
The author has not used the LaTeX markup language in the best way, so it
would be reasonable to click back to the Conversation tab, and in the comments
box put something like
Use \emph{...} instead of {\em ...} for emphasis.
It is a good idea to look through all of their changes and submit multiple
comments (in the same comment box), otherwise both of you will become
annoyed if you have to go back and forth several times.
Assuming you have looked through the contributor’s changes and the format
and content seem okay, now you need to actually check that their
contribution performs as claimed. If the project is code, that means you
need to compile and run their code. If it is a book, you need to produce
the book with their changes and see that the output looks good. Here is the
procedure for doing a preliminary review within your fork on your laptop (this
is your personal sandbox) and then actually incorporating the changes into the
definitive repository.
As a result you will now have a branch in your laptop’s fork named
fredstro-solutions-chap-6 with all of the changes fredstro is
proposing. Now you can do a thorough check on the pull request.
Produce your book, or run code, or whatever is appropriate for
your project. Examine the output to see if the changes performed
as expected. Notice that you have not endangered the definitive
repository in any way, and eventually you can just delete the
branch from your fork.
3. If the contribution fails to pass your discriminating review, then
leave a comment under Conversation and the person will respond
with further changes. That scenario is quite common: people often
forget to actually verify that their changes behave as claimed before
submitting a pull request. It can also happen that the overlord
evaluating the pull request fails to actually check everything before
accepting it. Then the real blame lies with the overlord who
accepted the request, because they are in a position of responsibility
for the project. So be very careful if you accept a pull request after
only glancing at the Files changed on GitHub. Once the changes
go into the definitive repository, you and your collaborators are
responsible for them.
5. Note that your laptop is still on the temporary branch you created
to check that pull request. So you should checkout master and
git pull upstream master. The master branch on your fork on
your laptop will now have the changes from the pull request you
just accepted.
If you handle a lot of pull requests you will acquire a lot of branches
which are no longer needed. We have seen earlier how to delete
branches. The temporary branches you make when you evaluate a
pull request have distinctive names, which makes it easy to identify
and delete them.
That is it! More than ninety percent of pull requests can be handled with
those simple steps. If you encounter a complicated situation, seek help from an
expert.
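The cleanup of temporary review branches can be sketched as follows. This is an illustration in a scratch repository: the fast-forward merge stands in for pulling the accepted request from upstream, and the identity and file are hypothetical.

```shell
export GIT_AUTHOR_NAME="Olive Overlord" GIT_AUTHOR_EMAIL="olive@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/calculus"
cd "$work/calculus"
git symbolic-ref HEAD refs/heads/master
echo "Calculus" > book.txt
git add book.txt
git commit -q -m "Start the book"

# A temporary branch that was used to evaluate a pull request.
git checkout -q -b fredstro-solutions-chap-6
echo "Solution 1: x = 2" > solutions-6.txt
git add solutions-6.txt
git commit -q -m "Add solutions for Chapter 6"

# Stand-in for accepting the request and pulling master from upstream.
git checkout -q master
git merge -q --ff-only fredstro-solutions-chap-6

# List branches whose work is already contained in master, then delete.
git branch --merged master
git branch -d fredstro-solutions-chap-6
```

The lowercase -d is the cautious form: git refuses to delete a branch whose commits have not been merged, so an unreviewed contribution is not thrown away by accident.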
Large projects use a model similar to this one. Sage, an open source computer
algebra system, has hundreds of contributors with a single release manager
as the overlord. Reviews of proposed branches are distributed among the
contributors themselves, so if the release manager is familiar with the reputations
and expertise of members of the community, it can be feasible for a single
person to roll up fifty or so contributions on a weekly basis and still maintain
the integrity of the project, and their sanity.
1 xkcd.com/1597/
Chapter 5
Merge Conflicts
1. In the output announcing the conflict, git will list exactly which
files have merge conflicts. Open these files in your editor and
search for =======. Above and below this line of equal signs
you will see the text that git cannot resolve itself delimited with
hints on where it comes from. Edit freely to make the text look
the way you want it, and remove the markers: <<<<<<<, =======,
>>>>>>>.
2. git add <file1> <file2> <file3>
You need to stage your changes in the index in preparation for a
commit, as usual. At this point, git status will include
(all conflicts fixed: run "git rebase --continue")
so go ahead. Notice that you are not given an opportunity to
change the commit message, and maybe your edits make this
desirable. This can be done later, see interactive rebasing in
Chapter 6.
3. git rebase --continue
You have modified the commit that did not rewind smoothly, and
so this command tells git to continue replaying commits from
the branch, including the one it is in the midst of replaying. The
remaining commits may apply smoothly, or they may present new
conflicts. So you may go through this recipe several times.
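The recipe above can be rehearsed safely in a scratch repository. The sketch below engineers a conflict like the one between Alice and Bob and then resolves it; the paragraph texts and identity are invented, and printf stands in for editing the file by hand. Setting GIT_EDITOR=true is an assumption made so the sketch runs unattended, since some versions of git open an editor for the commit message at the last step.

```shell
export GIT_AUTHOR_NAME="Bob Example" GIT_AUTHOR_EMAIL="bob@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/paper"
cd "$work/paper"
git symbolic-ref HEAD refs/heads/master
printf 'First paragraph.\n\nPlaceholder.\n' > recommendations.txt
git add recommendations.txt
git commit -q -m "Start recommendations"

# Bob and Alice both replace the same paragraph, on different branches.
git checkout -q -b recommend
printf 'First paragraph.\n\nBob proposes this.\n' > recommendations.txt
git commit -q -a -m "Bob's second paragraph"
git checkout -q master
printf 'First paragraph.\n\nAlice wrote this.\n' > recommendations.txt
git commit -q -a -m "Alice's second paragraph"

# The rebase stops with a conflict, which is the point of the exercise.
git checkout -q recommend
git rebase master || true

# Step 1: edit the file and remove the markers (here: keep both, Alice first).
printf 'First paragraph.\n\nAlice wrote this.\n\nBob proposes this.\n' > recommendations.txt
# Step 2: stage the resolution.
git add recommendations.txt
# Step 3: continue replaying commits.
GIT_EDITOR=true git rebase --continue
```

Afterwards recommend comes off the tip of master, ready for a fast-forward merge.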
CHAPTER 5. MERGE CONFLICTS 33
you will see the text that git cannot resolve itself delimited with
hints on where it comes from. Edit freely to make the text look
the way you want it, and remove the markers: <<<<<<<, =======,
>>>>>>>.
2. git add <file1> <file2> <file3>
You need to stage your changes in the index in preparation for a
commit, as usual.
3. git commit -m "Fixing merge conflict by..."
You now create one new commit, a merge commit. Remember,
in a merge no commits change in any way. But here you do create
an additional new commit that is slightly different in nature. It
holds the changes you made to resolve the conflicts, but unlike
other commits, it has two parents, not one. These are the tips of
the two branches that are being merged. This is what makes a regular
merge different from a fast-forward merge. In show-branch it will
be shown distinctively as a dash, not an asterisk or exclamation
point.
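The two parents of a merge commit can be inspected directly. The following sketch engineers a merge conflict in a scratch repository, resolves it, and looks at the result; the paragraph texts and identity are invented, and printf stands in for editing the file by hand.

```shell
export GIT_AUTHOR_NAME="Bob Example" GIT_AUTHOR_EMAIL="bob@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/paper"
cd "$work/paper"
git symbolic-ref HEAD refs/heads/master
printf 'First paragraph.\n\nPlaceholder.\n' > recommendations.txt
git add recommendations.txt
git commit -q -m "Start recommendations"

git checkout -q -b recommend
printf 'First paragraph.\n\nBob proposes this.\n' > recommendations.txt
git commit -q -a -m "Bob's second paragraph"
git checkout -q master
printf 'First paragraph.\n\nAlice wrote this.\n' > recommendations.txt
git commit -q -a -m "Alice's second paragraph"
before="$(git rev-parse master)"

# The merge stops with a conflict in recommendations.txt.
git merge recommend || true

# Resolve, stage, and create the merge commit.
printf 'First paragraph.\n\nAlice wrote this.\n\nBob proposes this.\n' > recommendations.txt
git add recommendations.txt
git commit -q -m "Fixing merge conflict by keeping both paragraphs"

# Prints three hashes: the merge commit, then its two parents.
git rev-list --parents -n 1 HEAD
```

The first parent is the old tip of master and the second is the tip of recommend; neither of those commits changed.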
Chapter 6
Chapter 7
(∗) Oops!
Chapter 8
Git Miscellany
This final chapter is a loose collection of tidbits that you may find useful,
without getting too technical or arcane. We have not tried to cover everything,
and we may have told you a few white lies along the way. When you exhaust
what we have here, re-read Chapter 9, and send us a pull request if you have
something useful (and not too arcane) for this chapter.
• git stash show will show you the diff of the changeset on top.
This is like looking real closely at the top plate.
• git stash list will show you all the changesets currently in the stash,
so it helps to provide good messages if you have many.
This is like reading the notes on all the plates in the warmer.
• git stash pop will put your changes back into the working directory and
remove the entry from the stack. It is up to you to make sure you are
on the branch you want to be, and that your working directory is in the
right state to accept these changes (or you may end up starting a merge
you may not want).
This is like taking a plate out of the warmer.
• git stash apply will take the changes on the top of the stack and put
them back into the working directory, but it will leave the original entry
on top of the stash. RAB once ended up with duplicate copies of a set
of changes in his working directory. He thinks he did an apply and
subsequently did a pop. Maybe.
This is like magically duplicating the plate on top, and removing it, leaving
the original still in the warmer.
• git stash drop will remove the changes on top of the stack and throw
them away. This can actually be very useful. You may add and commit
a variety of changes to your branch, but still have some paragraphs you
do not really like, and decided not to use, still polluting your working
directory. Simple. Move the changes to the stash and immediately delete
them, no message needed. Careful, think twice, you really are deleting
changes, though they may be recoverable with advanced techniques.
This is exactly like taking the top plate out and dropping it on the floor
so it breaks into many unusable pieces.
• There are many more actions you can take with the stash, but the above
should be sufficient for intermediate git use. Consult the usual sources
for more advanced use. Note that prior to the time around git version
1.6.0 changesets in the stash expired, but it appears that behavior has
changed. So don’t panic if you see older posts that speak of expiration.
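The stash commands above can be tried in sequence in a scratch repository. This is a sketch with an invented file, message, and identity; note that git stash push with -m is the spelling in reasonably recent versions of git, so treat it as an assumption about your installation.

```shell
export GIT_AUTHOR_NAME="Rob Example" GIT_AUTHOR_EMAIL="rob@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/book"
cd "$work/book"
git symbolic-ref HEAD refs/heads/master
echo "Chapter one." > chapter.txt
git add chapter.txt
git commit -q -m "Start chapter"

echo "A half-finished paragraph." >> chapter.txt
git stash push -m "half-finished paragraph"   # put a plate in the warmer
git stash list                                # read the notes on the plates
git stash show                                # look closely at the top plate

git stash apply                               # changes return, entry stays in the stash
kept="$(git stash list)"
git checkout -- chapter.txt                   # discard the working copy again
git stash pop                                 # changes return, entry is removed

git stash                                     # move unwanted changes to the stash...
git stash drop                                # ...and drop the plate on the floor
```

Discarding the working copy between the apply and the pop avoids the duplicate-changes confusion described above.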
Parting Shot
There are often many ways to accomplish the same thing in git, and some will
be easier than others. You can find lots of advice on the Internet, some of it is
even good. Sites where answers get upvoted or downvoted are often useful. As
you gain a good understanding of the basic principles of how git works and
the job it has been designed for, you will get better at locating and evaluating
suggestions. We have tried to give you a head start on that basic understanding.
Take notes when you find good stuff that works for you. (Much of the later
bits and pieces here are inspired by our own notes accumulated through our
first three years of gaining valuable experience with git.)
Beware of dogma. Some will say you should never rewrite history–we do
it all the time. Others will say you should always do fast-forward merges
and obtain a perfectly linear history–impossible on a big public project. Use
your independent judgement, and always remember the first of our principles,
Principle 1.0.2, git is just a tool.
But most of all, have fun and don’t panic!
Appendix A
We generally prefer to use a large collection of very sharp tools to get our
work done, and it is no different with git. There are pretty point-and-click
programs that are supposed to make it easier to use and understand git, and
even GitHub will offer to do certain tasks for you. In our limited experience,
we just find these interfaces at best confusing, and downright impossible when
things go south. Thus our decision to teach you how to interact with git at
the command-line.
As it is, command-line git is already an interface to even more primitive
commands. What you and I are using in these exercises are known as the
porcelain commands, while the primitive versions are the plumbing commands.
You can work out the metaphor. You can find tutorials that walk
you through a sequence of plumbing commands to effect one of our porcelain
commands like a commit or merge, and you can learn a lot from the exercise.
(See [3, Chapter 10].) Don’t panic if you see some terms below that you are not
yet familiar with; they will be explained eventually.
The information in this appendix is accurate as of 2016-04-10. Corrections
and updates are greatly appreciated. How about as a pull request?
APPENDIX A. GETTING STARTED WITH GIT 41
collaborators, and you would rather not go back and edit all of your commits.
Fortunately, git makes it easy. Use your real name, and use an email address
that you expect to own for a long time. Together, these two pieces of information
should identify you across all the repositories you will ever contribute to, and
across all the repositories ever made.
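Recording your identity might look like the sketch below. The name and email address are placeholders, and the first line is there for demonstration only: it points HOME at a scratch directory so that trying the commands does not touch your real settings. Omit that line when you configure your actual account.

```shell
# Demonstration only: keep this sketch away from your real configuration.
export HOME="$(mktemp -d)"

# Placeholder identity; substitute your real name and a long-lived address.
git config --global user.name "Alice Example"
git config --global user.email "alice@example.com"

# Show what has been recorded.
git config --global --list
```

The --global option stores these values once for your account on this computer, so every repository you work in there picks them up.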
You can find lots of examples on the Internet (in addition to lots of custom
configurations). I prefer not to use too much extra configuration, so that if I
end up on an unfamiliar computer, commands continue to work as I expect.
Appendix B
GitHub is a site that lets users host, and communicate with, their git repositories
and the repositories of others. There are similar services, but GitHub
appears to be the most popular, and so benefits from the network effects of a
large number of users. You can make an unlimited number of public repositories,
but must pay for private repositories. Here “public” means anybody can see
the content of your repository and copy it, but you retain control over who can
modify the repository.
The information in this appendix is accurate as of 2016-04-10. Corrections
and updates are greatly appreciated. How about as a pull request?
APPENDIX B. GETTING STARTED WITH GITHUB 44
which would fit our first GitHub example. Copy the URL to your clipboard.
Now open a terminal on your local machine where you can use the command-
line. Navigate to a directory where a new repository-specific directory makes
sense for your work. In our first example, Alice might navigate to her existing
~/papers/ directory, where she expects to soon have a ~/papers/banking-paper
directory. OK, all set. Making a copy is known as a clone. (And a fork is just
a special type of clone.)
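Making the clone can be rehearsed without a network connection. In this sketch a local bare repository named banking-paper.git stands in for the URL you copied from GitHub, and a scratch papers directory stands in for Alice's ~/papers/; both are assumptions made for illustration.

```shell
export GIT_AUTHOR_NAME="Alice Example" GIT_AUTHOR_EMAIL="alice@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Banking paper" > README.txt
git add README.txt
git commit -q -m "Start the paper"
git clone -q --bare "$work/seed" "$work/banking-paper.git"   # stand-in for the GitHub URL

# Navigate to where the new directory should live, then clone.
mkdir "$work/papers"
cd "$work/papers"
git clone -q "$work/banking-paper.git"     # creates papers/banking-paper
cd banking-paper
git remote -v                              # origin points back at the source
```

With a real project you would paste the copied URL in place of the local path; the new directory is named after the repository in the same way.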
Quick Reference
Appendix D
You have forked a repository and cloned your fork to your laptop. You have
origin as a remote to your fork, and upstream as a remote to the official
repository for the project. Below are all the steps you typically need for
contributing to the project.
List D.0.1
Appendix E
You are one of the people in charge of a GitHub repository. Below are the
typical steps to evaluate pull requests.
List E.0.1
1. Click on the Pull requests tab, and then choose a pull request.
2. If there are merge conflicts, write a brief message and click Comment.
Now there is nothing else to do until you get an email either
replying to your comment, or telling you that the pull request has
been updated.
3. Click on the Files changed sub-tab, and carefully look at all the
changes. If there are any errors or violations of the official style
for the project, click back to Conversation and leave a helpful
Comment.
4. If the changes look reasonable, then you have to actually check
that the new material functions as advertised. Go to your laptop
and do:
git checkout master
git pull upstream master
Then click back to Conversation and click on command line
instructions. Copy and paste those into the command line.
They will look something like this:
git checkout -b fredstro-solutions-chap-6 master
git pull https://github.com/fredstro/calculus.git solutions-chap-6
5. Check their contribution: run latex, or compile the code, or
whatever is appropriate.
6. If you aren’t happy with what you see, leave a helpful comment.
But if everything looks good, click Merge pull request, leave a
comment if appropriate, and then click Confirm merge.
7. You are done, except don’t forget to git checkout master and
then pull in the new material.
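The laptop portion of these steps can be practiced offline. The sketch below uses local bare repositories as stand-ins for the definitive repository and for fredstro's fork, with a hypothetical solutions file. As a simplification, the overlord's laptop is cloned straight from the definitive repository with that remote named upstream, whereas in the setup described earlier origin would be the overlord's own fork.

```shell
export GIT_AUTHOR_NAME="Olive Overlord" GIT_AUTHOR_EMAIL="olive@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work="$(mktemp -d)"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "Calculus" > book.txt
git add book.txt
git commit -q -m "Start the book"
git clone -q --bare "$work/seed" "$work/definitive.git"          # definitive repository
git clone -q --bare "$work/definitive.git" "$work/fredstro.git"  # contributor's fork

# The contributor's side: a branch pushed to their fork.
git clone -q "$work/fredstro.git" "$work/contributor"
cd "$work/contributor"
git checkout -q -b solutions-chap-6
echo "Solution 1: x = 2" > solutions-6.txt
git add solutions-6.txt
git commit -q -m "Add solutions for Chapter 6"
git push -q origin solutions-chap-6

# The overlord's laptop (simplified: the only remote is upstream).
git clone -q -o upstream "$work/definitive.git" "$work/overlord"
cd "$work/overlord"
git checkout -q master
git pull -q upstream master

# Step 4: a temporary branch holding the proposed changes.
git checkout -q -b fredstro-solutions-chap-6 master
git pull -q "$work/fredstro.git" solutions-chap-6

# Step 5: check the contribution; here we just look at the new file.
cat solutions-6.txt
```

Nothing has touched master or the definitive repository at this point, so the review branch can simply be deleted if the contribution does not pass.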
Appendix F
List of Principles
Resources