Gitting Things Done - A Visual and Practical Guide To Git (Full Book)
Omer Rosenbaum
Introduction
Git is awesome.
Most software developers use Git on a daily basis. But how many truly
understand Git? Do you feel like you know what's going on under the hood
as you use Git to perform various tasks?
For example, what happens when you use git commit ? What is stored
between commits? Is it just a diff between the current and previous
commit? If so, how is the diff encoded? Or is an entire snapshot of the
repository stored each time?
Most people who use Git don't know the answers to the questions posed
above. But does it really matter? Do you really have to know all of those
things?
So many times have I received questions about Git from experienced, highly
skilled software engineers. I have seen wonderful developers react in fear
when something happened in their commit history, and they just didn't
know what to do. It doesn't have to be this way.
By reading this book, you will gain a new perspective on Git. You will feel
confident when working with Git, and you will understand Git's underlying
mechanisms, at least those that are useful to understand. You will Git it. You
will be Gitting things done.
Table of Contents
Introduction
Chapter 11 - Exercises
Summary
Appendixes
If you are experienced with Git - I am sure you will be able to deepen your
knowledge. Even if you are new to Git - I will start with an overview of the
mechanisms of Git, and the terms used throughout this book.
This book is for you. I wrote it so you can learn more about Git, and also
come to appreciate, or even love Git.
You will also notice that I use a casual style throughout the book. I believe
that learning Git should be insightful and fun. Learning new things is always
hard, and I felt like writing in a less casual style wouldn't really do you a good
service. And as I already mentioned - this book is for you.
Who Am I?
This book is about you, and your journey with Git. But I would like to tell you
a bit about why I think I can contribute to your journey.
Accompanying Videos
I have covered many topics from this book on my YouTube channel - Brief
(https://www.youtube.com/@BriefVid). You are welcome to check them out
as well.
Git's Feelings
Throughout the book, I sometimes refer to Git with words such as
"believes", "thinks", or "wants". As you may argue, Git is not a human, and it
doesn't have feelings or beliefs. Well, that's true, but in order for us to enjoy
playing around with Git, and to help you enjoy reading (and me writing) this
book, I feel like referring to Git as more than just code makes it all so much
more enjoyable.
My Setup
I will include screenshots. There's no need for your setup to match mine, but
if you're curious about my setup, then:
I also use plugins for Oh My Zsh. You can follow this tutorial on
freeCodeCamp.
Right from the beginning, I asked for feedback and was lucky to receive it
from great people (see Acknowledgments) to make sure the book achieves
these goals. If you liked something about this book, felt that something was
missing, or that something needed improvement - I would love to hear from
you. Please reach out at: [email protected].
Note
This book is provided for free on freeCodeCamp as described above and
according to Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International.
If you would like to support this book, you are welcome to buy the
Paperback version, an E-Book version, or buy me a coffee. Thank you!
Blobs
In Git, the contents of files are stored in objects called blobs, short for
binary large objects.
The difference between blobs and files is that files also contain meta-data.
For example, a file "remembers" when it was created, so if you move that file
from one directory into another directory, its creation time remains the
same.
Blobs, in contrast, are just binary streams of data, like a file's contents. A
blob does not register its creation date, its name, or anything other than its
contents.
Every blob in Git is identified by its SHA-1 hash. SHA-1 hashes consist of 20
bytes, usually represented by 40 characters in hexadecimal form.
Throughout this book I will sometimes show just the first characters of that
hash. As hashes, and specifically SHA-1 hashes, are so ubiquitous within Git,
it helps to understand the basic properties of hash functions.
Deterministic means that the same input will provide the same output. That
is - you take a stream of data, run a hash function on that stream, and you
get a result.
For example, if you provide the SHA-1 hash function with the stream
hello , you will get 0xaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d . If you
run the SHA-1 hash function again, from a different machine, and provide it
the same data ( hello ), you will get the same value.
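You can check the determinism claim yourself. The following is a minimal sketch, assuming a system where the sha1sum utility is available (on macOS, shasum plays the same role). It hashes the raw bytes hello with no trailing newline, which is why it uses printf rather than echo. The variable names are mine, chosen for illustration.

```shell
# Hash the five bytes "hello" twice; a deterministic function must agree with itself.
first=$(printf 'hello' | sha1sum | cut -d' ' -f1)
second=$(printf 'hello' | sha1sum | cut -d' ' -f1)

echo "$first"
echo "$second"
```

Both lines should print aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d , the value quoted above.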
Git uses SHA-1 as its hash function in order to identify objects. It relies on it
being deterministic, such that an object will always have the same identifier.
A one-way function is a function that is hard to invert given an output. That
is, it is impossible (or at least, very hard) to tell, given the result of the hash
function (for example 0xaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d ),
what input yielded that result (in this example, hello ).
Back to Git
Back to Git - Blobs, just like other Git objects, have SHA-1 hashes
associated with them.
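One detail worth knowing, offered here as background rather than something this chapter relies on: Git does not hash the raw contents alone. It first prepends a short header, the word blob, the size in bytes, and a NUL byte. The sketch below, assuming git and sha1sum are installed, computes the identifier both ways for the stream hello . The variable names are illustrative.

```shell
# Ask Git for the blob ID of the five bytes "hello" (nothing is written to disk).
via_git=$(printf 'hello' | git hash-object --stdin)

# Reproduce it by hand: SHA-1 over the header "blob 5", a NUL byte, then the contents.
by_hand=$(printf 'blob 5\0hello' | sha1sum | cut -d' ' -f1)

echo "git hash-object: $via_git"
echo "manual SHA-1:    $by_hand"
```

The two values should match each other, and they differ from the plain SHA-1 of hello shown earlier because of that header.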
Trees
In Git, the equivalent of a directory is a tree. A tree is basically a directory
listing, referring to blobs, as well as other trees.
Consider the drawing above. Note that the tree CAFE7 refers to the blob
F92A0 as the file pic.png . In another tree, that same blob may have
another name - but as long as the contents are the same, it will still be the
same blob object, and still have the same SHA-1 value.
Commits
Now it's time to take a snapshot of that file system—and store all the files
that existed at that time, along with their contents.
In most cases, a commit also has one or more parent commits—the previous
snapshot (or snapshots). Of course, commit objects are also identified by
their SHA-1 hashes. These are the hashes you are probably used to seeing
when you use commands such as git log .
A commit is a snapshot in time. It refers to the root tree. As this is the first commit, it has no
parents
Every commit holds the entire snapshot, not just differences between itself
and its parent commit or commits.
How can that work? Doesn't that mean that Git has to store a lot of data for
every commit?
Examine what happens if you change the contents of a file. Say that you edit
the file 1.txt , and add an exclamation mark—that is, you changed the
content from HELLO WORLD , to HELLO WORLD! .
Well, this change means that Git creates a new blob object, with a new SHA-
1 hash. This makes sense, as sha1("HELLO WORLD") is different from
sha1("HELLO WORLD!") .
Since you have a new hash, then the tree's listing should also change. After
all, your tree no longer points to blob 73D8A , but rather blob 62E7A instead.
Since you change the tree's contents, you also change its hash.
The tree that points to the changed blob needs to change as well
And now, since the hash of that tree is different, you also need to change
the parent tree—as the latter no longer points to tree CAFE7 , but rather to
tree 24601 . Consequently, the parent tree will also have a new hash.
Almost ready to create a new commit object, and it seems like you are going
to store a lot of data—the entire file system, once more! But is that really
necessary?
Actually, some objects, specifically blob objects, haven't changed since the
previous commit—the blob F92A0 remained intact, and so did the blob
F00D1 .
So this is the trick—as long as an object doesn't change, Git doesn't store it
again. In this case, Git doesn't need to store blob F92A0 or blob F00D1 once
more. Git can refer to them using only their hash values. You can then
create your commit object.
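If you want to see this reuse in a real repository, here is a small sketch. It assumes git is installed and works in a throwaway directory. The file 2.txt and the Demo identity are placeholders I made up for the experiment; they are not the files from the drawings.

```shell
# Throwaway repository; the identity below is a placeholder for this demo only.
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# First snapshot: two files.
printf 'HELLO WORLD\n' > 1.txt
printf 'unchanged\n' > 2.txt
git add 1.txt 2.txt && git commit -q -m "Commit 1"

# Second snapshot: only 1.txt changes.
printf 'HELLO WORLD!\n' > 1.txt
git add 1.txt && git commit -q -m "Commit 2"

# <commit>:<path> resolves to the blob recorded for that path in that commit.
old_1=$(git rev-parse HEAD~1:1.txt); new_1=$(git rev-parse HEAD:1.txt)
old_2=$(git rev-parse HEAD~1:2.txt); new_2=$(git rev-parse HEAD:2.txt)
echo "1.txt: $old_1 -> $new_1"
echo "2.txt: $old_2 -> $new_2"
```

The blob for 1.txt changes between the two commits, while both commits refer to the very same blob for 2.txt .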
Considering Hashes
After introducing blobs, trees, and commits - consider the hashes of these
objects. Assume I wrote the string Git is awesome! , and created a blob
object from it. You did the same on your system. Would we have the same
hash?
The answer is—Yes. Since the blobs consist of the same data, they'll have
the same SHA-1 values.
What if I made a tree that references the blob of Git is awesome! , and
gave it a specific name and metadata, and you did exactly the same on your
system. Would we have the same hash?
Again, yes. Since the tree objects are the same, they would have the same
hash.
What if I created a commit pointing to that tree with the commit message
Hello , and you did the same on your system? Would we have the same
hash?
In this case, the answer is—No. Even though our commit objects refer to the
same tree, they have different commit details—time, committer, and so on.
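You can try this thought experiment on a single machine. The sketch below assumes git is installed; it uses low-level commands that later chapters explain, and it pins the timestamps through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables so the two commits differ by exactly one second. The Demo identity is a placeholder.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Same contents give the same blob, and the same listing gives the same tree.
blob=$(printf 'Git is awesome!' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" awesome.txt
tree=$(git write-tree)

# Two commits with the same tree and message, created one second apart.
commit_a=$(GIT_AUTHOR_DATE='1700000000 +0000' GIT_COMMITTER_DATE='1700000000 +0000' \
  git commit-tree "$tree" -m "Hello")
commit_b=$(GIT_AUTHOR_DATE='1700000001 +0000' GIT_COMMITTER_DATE='1700000001 +0000' \
  git commit-tree "$tree" -m "Hello")

echo "tree:     $tree"
echo "commit A: $commit_a"
echo "commit B: $commit_b"
```

Both commits point to the same tree, yet their own hashes differ, because the time is part of what gets hashed.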
Blob—contents of a file.
One of the wonders of Git is that it enables multiple people to work on that
file system, in parallel, (mostly) without interfering with each other's work.
Most people would say that they are "working on branch X ." But what does
that actually mean?
You can always reference a commit by its SHA-1 hash, but humans usually
prefer other ways to name objects. A branch is one way to reference a
commit, but it's really just that.
Typically, the branch points to the latest commit in the line of development
you are currently working on.
To create another branch, you can use the git branch command. When
you do that, Git creates another pointer. If you created a branch called
test , by using git branch test , you would be creating another pointer
that points to the same commit as the branch you are on:
Using git branch creates another pointer
How does Git know which branch you're currently on? It keeps another
designated pointer, called HEAD . Usually, HEAD points to a branch, which in
turn points to a commit. In the case described, HEAD might point to main ,
which in turn points to commit B2424 . In some cases, HEAD can also point
to a commit directly.
To switch the active branch to be test , you can use the command git
checkout test , or git switch test . Now you can already guess what this
command actually does—it just changes HEAD to point to test .
git checkout test changes where HEAD points
If the test branch did not exist yet, you could also use git checkout -b test ,
which is the equivalent of running git branch test to create the
branch, and then git checkout test to move HEAD to point to the new
branch.
At the point represented in the drawing above, what would happen if you
made some changes and created a new commit using git commit ? Which
branch will the new commit be added to?
The answer is the test branch, as this is the active branch (since HEAD
points to it). Afterwards, the test pointer will move to the newly added
commit. Note that HEAD still points to test .
Every time we use git commit , the branch pointer moves to the newly created commit
If you go back to main by using git checkout main , Git will move HEAD to
point to main again.
Now, if you create another commit, which branch will it be added to?
That's right, it will be added to the main branch (and its parent would be
commit B2424 ).
The resulting state after creating another commit on the main branch
When you use git commit , Git creates a commit object, and moves
the branch to point to the newly created commit.
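The pointer movements described in this section can be replayed in a scratch repository. This is a sketch, assuming git is installed. The initial branch may be called main or something else depending on your configuration, so the script reads its name into a variable ( start , my own name for it) instead of hard-coding it. Empty commits are used only to keep the example short.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "Commit 1"
start=$(git symbolic-ref --short HEAD)   # name of the initial branch

git branch test                          # a second pointer to the same commit
git checkout -q test                     # HEAD now points to test
git commit -q --allow-empty -m "Commit 2"

# Only the active branch moved; the initial branch still points to "Commit 1".
echo "HEAD   -> $(git symbolic-ref HEAD)"
echo "$start -> $(git rev-parse "refs/heads/$start")"
echo "test   -> $(git rev-parse refs/heads/test)"
```

The hash printed for the initial branch is the parent of the hash printed for test , which is exactly the picture drawn above.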
In the next chapters, you will learn how to introduce changes to Git. You will
create a repository from scratch — without using git init , git add , or
git commit . This will allow you to deepen your understanding of what is
happening under the hood when you work with Git. You will also create new
branches, switch branches, and create additional commits — all without
using git branch or git checkout . I don't know about you, but I am
excited already!
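1. Blob - contents of a file.
2. Tree - a directory listing (of blobs and trees).
3. Commit - a snapshot of the working tree.
4. Branch - a named reference to a commit.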
The first three are objects, whereas the fourth is one way to refer to objects
(specifically, commits).
When you work on your source code, you work from a working dir. A
working dir(ectory) (also called "working tree") is any directory on your file
system which has a repository associated with it. It contains the folders and
files of your project, and also a directory called .git that we will talk more
about later. Remember that we said that Git is a system to maintain a file
system. The working directory is the root of the file system for Git.
After you make some changes, you might want to record them in your
repository. A repository (in short: "repo") is a collection of commits, each of
which is an archive of what the project's working tree looked like at a past
date, whether on your machine or someone else's. That is, as I said before, a
commit is a snapshot of the working tree.
A repository also includes things other than your code files, such as HEAD
and branches .
Note regarding the drawing conventions I use: I include .git within the
working directory, to remind you that it is a folder within the project's
folder on the filesystem. The .git folder actually contains the objects of
the repository, as we will see in chapter 4.
There are other version control systems where changes are committed
directly from the working dir to the repository. In Git, this is not the case.
Instead, changes are first registered in something called the index, or the
staging area.
Both of these terms refer to the same thing, and they are used often in Git's
documentation.
When you check out a branch, Git populates the index and the working dir
with the contents of the files as they exist in the commit that branch is
pointing to. When you use git commit , Git creates a new commit object
based on the state of the index.
Using the index allows you to carefully prepare each commit. For example,
you may have two files with changes in your working dir:
For example, assume these two files are 1.txt and 2.txt . It is possible to
only add one of them (for instance, 1.txt ) to the index, by using git add
1.txt :
As a result, the state of the index matches the state of HEAD (in this case,
"Commit 2"), with the exception of the file 1.txt , which matches the state
of 1.txt in the working directory. Since you did not stage 2.txt , the index
does not include the updated version of 2.txt . So the state of 2.txt in the
index matches the state of 2.txt in "Commit 2".
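Here is a way to observe those three states from the command line. It is a sketch, assuming git is installed; the file contents and the Demo identity are placeholders. git diff --cached compares the index with HEAD , while plain git diff compares the working dir with the index.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A starting commit that contains both files.
printf 'one\n' > 1.txt
printf 'two\n' > 2.txt
git add 1.txt 2.txt && git commit -q -m "Starting point"

# Change both files in the working dir, but stage only 1.txt.
printf 'one, edited\n' > 1.txt
printf 'two, edited\n' > 2.txt
git add 1.txt

staged=$(git diff --cached --name-only)   # index vs. HEAD
unstaged=$(git diff --name-only)          # working dir vs. index
echo "staged:   $staged"
echo "unstaged: $unstaged"
```

Only 1.txt shows up as staged, and only 2.txt shows up as changed but not staged.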
Behind the scenes - once you stage a version of a file, Git creates a blob
object with the file's contents. This blob object is then added to the index.
As long as you only modify the file on the working directory, without staging
it, the changes you make are not recorded in blob objects.
When considering the previous figure, note that I do not draw the staged
version of the file as part of the "repository", as in this representation, the
"repository" refers to a tree of commits and their references, and this blob
has not been a part of any commit.
Now, you can use git commit to record the change to 1.txt only:
The state after using git commit
1. It creates a new commit object. This commit object reflects the state
of the index when you ran the git commit command.
Initialize a new repository using git init my_repo , and then change your
directory to that of the repository using cd my_repo :
git init
By using tree -f .git you can see that running git init my_repo
resulted in quite a few sub-directories inside .git . (The flag -f makes
tree print the full path of each entry.)
Creating f.txt
This file is within your working directory. If you run git status , you'll see
this file is untracked:
The result of git status
Tracked files are files that Git "knows" about. They either were in the last
commit, or they are staged now (that is, they are in the staging area).
Untracked files are everything else—any files in your working directory that
were not in your last commit, and are not in your staging area.
The new file ( f.txt ) is currently untracked, as you haven't added it to the
staging area, and it hasn't been included in a previous commit.
You can now add this file to the staging area (also referred to as staging this
file) by using git add f.txt . You can verify that it has been staged by
running git status :
Adding the new file to the staging area
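If you prefer text over screenshots, the same transition can be watched with git status --porcelain , which prints a compact, script-friendly status. This is a sketch assuming git is installed; the contents written to f.txt are arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .

printf 'some contents\n' > f.txt
before=$(git status --porcelain)   # "??" marks an untracked file

git add f.txt
after=$(git status --porcelain)    # "A " marks a file added to the index

echo "before staging: $before"
echo "after staging:  $after"
```

The first line reports ?? f.txt (untracked), and the second reports f.txt with an A in the first column (staged as a new file).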
So now the state of the index matches that of the working dir:
If you run git status again, you'll see that the status is clean - that is, the
state of HEAD (which points to your initial commit) equals the state of the
index, and also the state of the working dir. By using git log you will see
indeed that HEAD points to main which in turn points to your new commit:
The output of `git log` after introducing the first commit
Has something changed within the .git directory? Run tree -f .git to
check:
Apparently, quite a lot has changed. It's time to dive deeper into the
structure of .git and understand what is going on under the hood when
you run git init , git add or git commit . That's exactly what the next
chapter covers.
When you introduce changes in Git, you almost always follow this order:
2. Then you stage these changes (or some of them) to the index
In order to deeply understand how Git works, you will create a repository,
but this time — you will build it from scratch. As in other chapters, I
encourage you to try out the commands alongside this chapter.
How to Set Up .git
Create a new directory, and run git status within it:
Alright, so Git seems unhappy as you don't yet have a .git folder. The
natural thing to do would be to create that directory and try again:
Apparently, creating a .git directory is just not enough. You need to add
some content to that directory.
A repository may also contain other things, such as hooks, but at the very
least — it must include objects and references.
Create a directory for the objects at .git/objects , and a directory for the
references (in short: "refs") at .git/refs (on Windows systems: .git\objects
and .git\refs , respectively).
How does Git know where to start when looking for a commit in the
repository? As I explained earlier, it looks for HEAD , which points to the
current active branch (or commit, in some cases).
So, you need to create HEAD , which is just a file residing at .git/HEAD . You
can apply the following:
On UNIX:
On Windows:
So you now know how HEAD is implemented — it is simply a file, and its
contents describe what it points to.
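Put together, the setup so far can be sketched as follows on a UNIX shell, assuming git is installed. The temporary directory is my own addition so the experiment does not touch an existing project. The line written into .git/HEAD uses Git's notation for a pointer to a branch: the word ref, a colon, and the path of the branch under .git .

```shell
repo=$(mktemp -d) && cd "$repo"

# The bare minimum: a place for objects, a place for references, and HEAD.
mkdir -p .git/objects .git/refs
echo "ref: refs/heads/main" > .git/HEAD

# Git now treats the directory as a repository.
git status
git symbolic-ref HEAD
```

The last command simply reads HEAD back, so it prints refs/heads/main .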
Following the command above, git status seems to change its mind:
Notice that Git "believes" you are on a branch called main , even though you
haven't created this branch. main is just a name. You can also make Git
believe you are on a branch called banana if you wish:
Creating a branch named banana
Switch back to main , as you will keep working from (mostly) there
throughout this chapter, just to adhere to the regular convention:
Now that you have your .git directory ready, you can work your way to
make a commit (again, without using git add or git commit ).
So far, you have dealt with porcelain commands — git init , git add or
git commit . It's time to go deeper, and get yourself acquainted with some
plumbing commands.
On UNIX:
On Windows:
By using --stdin you are instructing git hash-object to take its input
from the standard input. This will provide you with the relevant hash value:
What Git did here is take the first two characters of the SHA-1 hash, and use
them as the name of a directory. The remaining characters are used as the
filename for the file that actually contains the blob.
Why is that so? Consider a fairly big repository, one that has 400,000
objects (blobs, trees, and commits) in its database. Looking up a hash inside
that list of 400,000 hashes might take a while. Thus, Git simply divides that
problem by 256 .
To look up the hash above, Git would first look for the directory named 7a
inside the directory .git/objects , which may have up to 256 directories
( 00 through FF ). Then, it will search within that directory, narrowing down
the search as it goes.
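To see the two-character split on disk, you can store a blob and then look for its file. The sketch below assumes git is installed and uses git init in a temporary directory for brevity. The -w flag tells git hash-object to actually write the object into the database rather than only print its hash. The variable names are illustrative.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .

# Store a blob and capture its hash.
hash=$(echo "Git is awesome" | git hash-object -w --stdin)

# First two characters: the directory. Remaining characters: the file name.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)

echo "object $hash is stored at .git/objects/$dir/$file"
ls ".git/objects/$dir"
```

The ls output should list a single file whose name is the hash minus its first two characters.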
Back to the process of generating a commit. You have just created an object.
What is the type of that object? You can use another plumbing command,
git cat-file -t ( -t stands for "type"), to check that out:
Using git cat-file -t <object_sha> reveals the type of the Git object
Not surprisingly, this object is a blob. You can also use git cat-file -p
( -p stands for "pretty-print") to see its contents:
git cat-file -p
Remember that Git creates a blob of the entire file that is staged. Even if a
single character is modified or added, the file has a new blob with a new
hash (as in the example in chapter 1 where you added ! at the end of a
line).
Apparently, no. Adding a blob object to Git's internal database does not
change the status, as Git does not know of any tracked (or untracked) files
at this stage.
You need to track this file— add it to the staging area. To do that, you can
use another plumbing command, git update-index , like so:
Note: The first value given to --cacheinfo is a 16-bit file mode as stored by
Git, following the layout of POSIX types and modes. This is not within the
scope of this book, as it is really not important for you to Git things done.
Running the command above will result in a change to .git 's contents:
Can you spot the change? A new file by the name of index has been
created. This is it — the famous index (or staging area), is basically a file that
resides within .git/index .
So now that your blob has been added to the index, do you expect git
status to look different?
First, you can see that awesome.txt appears in green, in the "Changes to be
committed" area. That is so because the index now includes awesome.txt ,
waiting to be committed.
Second, we can see that awesome.txt appears in red — because Git believes
the file awesome.txt has been deleted, and the fact that the file has been
deleted is not staged.
(Note: You may have noticed that I sometimes refer to Git with words such
as "believes", "thinks", or "wants". As I explained in the introduction of this
book - in order for us to enjoy playing around with Git, and reading (and
writing) this book, I feel like referring to Git as more than just code makes it
all so much more enjoyable.)
This happens as you added the blob with the contents Git is awesome to
the objects' database, and recorded in the index that the file awesome.txt
holds the contents of that blob, but you never actually created that file on
disk.
You can easily solve this by taking the contents of the blob and writing them
to our file system, to a file called awesome.txt :
So now it's time to create a commit object from your staging area. As
explained in chapter 1, a commit object has a reference to a tree, so you
need to create a tree.
You can achieve this using the command git write-tree , which records
the contents of the index in a tree object. Of course, you can use
git cat-file -t to see that it's indeed a tree:
Creating a tree object with the contents of the index
Great, so you created a tree, and now you need to create a commit object
that references this tree. To do that, you can use the command:
You should now feel comfortable with the commands used to check the
created object's type, and print its contents:
Well, to know that your file has been committed, Git needs to know about
the latest commit. How does Git do that? It goes to the HEAD :
Creating main
Now, finally, git status and git log seem to appreciate our efforts:
git status
git log
By using echo and low-level commands such as git hash-object , you were
able to create a blob, add it to the index, create a tree of the index, and
create a commit object pointing to that tree.
You also learned that HEAD is a file, located in .git/HEAD . Branches are also
files, located under .git/refs/heads . When you understand how Git
operates, those abstract notions of HEAD or "branches" become very
tangible.
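For reference, the whole sequence from this chapter can be sketched as one UNIX shell script, assuming git is installed. It follows the steps described above; the temporary directory, the variable names, and the Demo identity (supplied through environment variables so that git commit-tree has an author to record) are my own additions.

```shell
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A hand-made .git: objects, references, and HEAD pointing to main.
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

blob=$(echo "Git is awesome" | git hash-object -w --stdin)     # 1. create the blob
git update-index --add --cacheinfo 100644 "$blob" awesome.txt  # 2. add it to the index
git cat-file -p "$blob" > awesome.txt                          # 3. write the file to disk
tree=$(git write-tree)                                         # 4. tree from the index
commit=$(git commit-tree "$tree" -m "Commit 1")                # 5. commit for that tree
echo "$commit" > .git/refs/heads/main                          # 6. create the main branch

git log --oneline
git status --short   # prints nothing when everything is committed
```

Step 6 is the whole story of creating a branch here: a file named main whose contents are a commit hash.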
The next chapter will deepen your understanding of how branches work
under the hood.
Continuing from the previous chapter - you only have one branch, named
main . To create another one with the name test (as the equivalent of git
branch test ), you would need to create a file named test within
.git/refs/heads , and the contents of that file would be the same commit's
hash as the main branch points to.
If you use git log , you can see that this is indeed the case — both main
and test point to this commit:
(Note: if you run this command and don't see a valid output, you may have
written something other than the commit's hash into
.git/refs/heads/test .)
Next, switch to our newly created branch (the equivalent of git checkout
test ). How would you do that? Try to answer for yourself before moving on
to the next paragraph.
To change the active branch, you should change HEAD to point to your new
branch:
You can now use the commands you have already used in the previous
chapter to create another file and add it to the index:
Create a blob with the content of Another File (using
git hash-object ).
It's now time to create a commit referencing this tree. This time, you should
also specify the parent of this commit — which would be the previous
commit. You specify the parent using the -p switch of git commit-tree :
We have just created a commit, with a tree as well as a parent, as you can
see:
As you can see, git log doesn't show anything new. Why is that?
Remember that git log traces the branches to find relevant commits to
show. It shows us now test and the commit it points to, and it also shows
main which points to the same commit.
That's right — you need to change test to point to the new commit object.
You can do that by changing the contents of .git/refs/heads/test :
It worked!
git log goes to HEAD , which tells Git to go to the branch test , which
points to commit 222..3d , which links back to its parent commit b6d..07 .
By inspecting your repository's folder, you can see that you have six
different objects under the folder .git/objects - these are the two blobs
you created (one for awesome.txt and one for file.txt ), two commit
objects ("Commit 1" and "Commit 2"), and two tree objects - each pointed to
by one of the commit objects.
You also have .git/HEAD that points to the active branch or commit, and
two branches - within .git/refs/heads .
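The steps of this chapter can be replayed end to end with the sketch below, assuming git is installed and a UNIX shell. It first rebuilds "Commit 1" as in the previous chapter, then creates and switches to test by editing files, and finally adds "Commit 2". The temporary directory, variable names, and Demo identity are placeholders of mine.

```shell
repo=$(mktemp -d) && cd "$repo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
mkdir -p .git/objects .git/refs/heads
echo "ref: refs/heads/main" > .git/HEAD

# "Commit 1" on main, built with plumbing commands.
blob1=$(echo "Git is awesome" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob1" awesome.txt
commit1=$(git commit-tree "$(git write-tree)" -m "Commit 1")
echo "$commit1" > .git/refs/heads/main

# Equivalent of git branch test, then git checkout test.
cp .git/refs/heads/main .git/refs/heads/test
echo "ref: refs/heads/test" > .git/HEAD

# "Commit 2": a second blob, a new tree, and a commit whose parent is Commit 1.
blob2=$(echo "Another File" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob2" file.txt
commit2=$(git commit-tree "$(git write-tree)" -p "$commit1" -m "Commit 2")
echo "$commit2" > .git/refs/heads/test   # move the branch pointer by hand

git log --oneline --decorate
find .git/objects -type f | wc -l        # counts the stored objects
```

The object count at the end should be six, matching the two blobs, two trees, and two commits listed above.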
Part 1 - Summary
This part introduced you to the internals of Git. We started by covering the
basic objects—blobs, trees, and commits.
You learned that a blob holds the contents of a file. A tree is a directory-
listing, containing blobs and/or sub-trees. A commit is a snapshot of our
working directory, with some meta-data such as the time or the commit
message.
You learned about branches, seeing that they are nothing but a named
reference to a commit.
You learned the process of recording changes in Git, and that it involves the
working directory, a directory that has a repository associated with it, the
staging area (index) which holds the tree for the next commit, and the
repository, which is a collection of commits and references.
Then you created a new repository from scratch, by using echo and low-
level commands such as git hash-object . You created a blob, added it to
the index, created a tree object representing the index, and even created a
commit object pointing to that tree.
You were also able to create and switch between branches by modifying
files directly. Kudos to those of you who tried this on your own!
All together, after following along through this part, you should feel that
you understand what happens under the hood when you work with Git.
The next part will explore different strategies for integrating changes when
working in different branches in Git - specifically, merge and rebase.
When teams work with Git, they introduce sequences of changes, usually in
branches, and then they need to combine different change histories
together. To really understand how this is achieved, you should learn how
Git treats diffs and patches. You will then apply your knowledge to
understand the process of merge and rebase.
So, what do I mean when I say "diff"? Let's start with some history.
Git Diff's History
Git's diff is based on the diff utility on UNIX systems. diff was
developed in the early 1970s on the Unix operating system. The first
released version shipped with the Fifth Edition of Unix in 1974.
git diff is a command that takes two inputs, and computes the difference
between them. Inputs can be commits, but also files, and even files that have
never been introduced to the repository.
This is important - git diff computes the difference between two strings,
which most of the time happen to consist of code, but not necessarily.
https://github.com/Omerr/gitting_things_repo.git
You can clone it locally and have the same starting point I am using for this
chapter.
Consider this short text file on my machine, called file.txt , which consists
of 6 lines:
file.txt consists of six lines
Now, modify this file a bit. Remove the second line, and insert a new line as
the fourth line. Add an exclamation mark ( ! ) to the end of the last line, so
you get this result:
Now you can run git diff to compute the difference between the files like
so:
(I will explain the --no-index switch of this command later. For now it's
enough to understand it allows us to compare between two files that are
not part of a Git repository.)
The output of git diff --no-index file.txt new_file.txt
Focus on the part starting with This is a file . You can see that the added
line ( // new test ) is preceded by a + sign. The deleted line is preceded by
a - sign.
Addition lines are preceded by + , deletion lines by - , and modification lines are sequences of
deletions and additions
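If you do not have the repository at hand, you can reproduce a similar diff with two small files. In the sketch below the file contents are hypothetical stand-ins, since the actual lines of file.txt are not reproduced here; only the first line and the // new test line come from the text above. The edits follow the same recipe: remove the second line, insert a new fourth line, and add ! to the last line.

```shell
work=$(mktemp -d) && cd "$work"

# Hypothetical six-line file; the book's actual file.txt may differ.
printf '%s\n' 'This is a file' 'It has a second line' 'It has a third line' \
  'It has a fourth line' 'It has a fifth line' 'It has a last line' > file.txt

# Same recipe: drop line 2, insert a new fourth line, add "!" to the last line.
printf '%s\n' 'This is a file' 'It has a third line' 'It has a fourth line' \
  '// new test' 'It has a fifth line' 'It has a last line!' > new_file.txt

# git diff exits with status 1 when the files differ, so that is not an error here.
patch=$(git diff --no-index file.txt new_file.txt || true)
printf '%s\n' "$patch"
```

In the output, the removed second line carries a - sign, the new line carries a + sign, and the modified last line appears as a deletion followed by an addition.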
Now would be a good time to discuss the terms "patch" and "diff". These
two are often used interchangeably, although there is a distinction, at least
historically.
A diff shows the differences between two files, or snapshots, and can be
quite minimal in doing so. A patch is an extension of a diff, augmented with
further information such as context lines and filenames, which allow it to be
applied more widely. It is a text document that describes how to alter an
existing file or codebase.
These days, the Unix diff program, and git diff , can produce patches of
various kinds.
Try it out:
The patch format uses context, as well as line numbers, to locate differing
file regions. This allows a patch to be applied to a somewhat earlier or later
version of the first file than the one from which it was derived, as long as the
applying program can still locate the context of the change. We will see
exactly how these are used.
The first line introduces the compared files. Git always gives one file the
name a , and the other the name b . So in this case file.txt is called a ,
whereas new_file.txt is called b .
The first line in diff 's output introduces the files being compared
Then the second line, starting with index , includes the blob SHAs of these
files. So even though in our case they are not even stored within a Git repo,
Git shows their corresponding SHA-1 values.
The third value in this line, 100644 , is the "mode bits", indicating that this is
a "regular" file: not executable and not a symbolic link.
The use of two dots ( .. ) here between the blob SHAs is just as a separator
(unlike other cases where it's used within Git).
The second line in diff 's output includes the blob SHAs of the compared files, as well as the
mode bits
Other header lines might indicate the old and new mode bits if they've
changed, old and new filenames if the files were being renamed, and so on.
The blob SHAs (also called "blob IDs") are helpful if this patch is later
applied by Git to the same project and there are conflicts while applying it.
You will better understand what this means when you learn about the
merges in the next chapter.
After the blob IDs, we have two lines: one starting with - signs, and the
other starting with + signs. This is the traditional "unified diff" header,
again showing the files being compared and the direction of the changes: -
signs show lines in the A version that are missing from the B version, and +
signs show lines missing in the A version but present in B.
If the patch were of this file being added or deleted in its entirety, then one
of these would be /dev/null to signal that.
- signs show lines in the A version but missing from the B version, and + signs, lines missing
in A version but present in B
rm awesome.txt
For now, undo the deleting (more on undoing changes in Part 3):
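A minimal sketch of the whole round trip, assuming a throwaway repository in which awesome.txt holds a single made-up line. It deletes the file, shows that the + side of the header becomes /dev/null, and then restores the file from the index. I use git checkout -- here because it works on older versions; on Git 2.23 or later, git restore awesome.txt does the same thing.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical content for the file.
printf 'Git is awesome\n' > awesome.txt
git add awesome.txt
git commit -q -m "Add awesome.txt"

# Delete the file in the working dir only, and look at the diff.
rm awesome.txt
deletion_patch=$(git diff)
printf '%s\n' "$deletion_patch"

# Undo the deletion by copying the file back from the index.
git checkout -- awesome.txt
```

In the printed patch, the line starting with +++ names /dev/null rather than b/awesome.txt, and an extra header line reports the deleted file's mode.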
After this unified diff header, we get to the main part of the diff, consisting of one or more hunks, also referred to as chunks.
Every hunk begins with a single line, starting with two @ signs. These signs
are followed by at most four numbers, and then a header for the chunk -
which is an educated guess by Git. Usually, it will include the beginning of a
function or a class, when possible.
When possible, Git includes a header for each hunk, for example a function or class definition
In the image above, the hunk's header includes the beginning of the
function that includes the changed lines - def example_function(x) .
The first numbers are preceded by a - sign as they refer to file A . The
first number represents the line number corresponding to the first line in
file A that this hunk refers to. In the example above, it is 1 , meaning that
the line This is a file corresponds to line number 1 in file A .
This number is followed by a comma ( , ), and then the number of lines this
chunk consists of in file A . This number includes all context lines (the
lines preceded with a space in the diff ), as well as lines marked with a - sign, as
they are part of file A , but not lines marked with a + sign, as they do not
exist in file A .
As you can see, the lines beginning with a space character are context lines.
Then, we have a + sign to mark the two numbers that refer to file B .
First, there's the line number corresponding to the first line in file B ,
followed by the number of lines this chunk consists of in file B .
This number includes all context lines, as well as lines marked with the +
sign, as they are part of file B , but not lines marked with a - sign.
After the header of the chunk, we get the actual lines - either context, - , or
+ lines.
Typically and by default, a hunk starts and ends with three context lines. For
example, if you modify lines 4-5 in a file with ten lines, the hunk header reads
@@ -1,8 +1,8 @@ : the hunk covers lines 1-8,
that is, three lines before and three lines after the modified lines.
If that file doesn't have ten lines, but rather six lines - then the hunk will
contain only one context line after the changed lines, and not three.
Similarly, if you change the second line of a file, then there would be only
one line of context before the changed lines.
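You can check these numbers yourself. The sketch below uses made-up files (ten_a.txt and friends are hypothetical names, as is the hunk_header helper) and prints only the hunk header of each diff.

```shell
set -eu
cd "$(mktemp -d)"

# A ten-line file, and a copy with lines 4-5 modified.
seq 1 10 | sed 's/^/line /' > ten_a.txt
sed '4,5s/$/ (changed)/' ten_a.txt > ten_b.txt

# The same experiment on a six-line file.
head -n 6 ten_a.txt > six_a.txt
head -n 6 ten_b.txt > six_b.txt

# A ten-line file where only the second line is modified.
sed '2s/$/ (changed)/' ten_a.txt > second_b.txt

# Print only the hunk header of the diff between two files.
# git diff exits with 1 when the files differ; the pipeline's status is grep's.
hunk_header() { git diff --no-index "$1" "$2" | grep '^@@'; }

header_ten=$(hunk_header ten_a.txt ten_b.txt)
header_six=$(hunk_header six_a.txt six_b.txt)
header_second=$(hunk_header ten_a.txt second_b.txt)

echo "$header_ten"      # @@ -1,8 +1,8 @@  (3 context + 2 changed + 3 context)
echo "$header_six"      # @@ -1,6 +1,6 @@  (only 1 context line after the change)
echo "$header_second"   # @@ -1,5 +1,5 @@  (only 1 context line before the change)
```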
The patch format by git diff
Often, you will see the output of git diff showing two versions of the
same file and the difference between them.
Again, I encourage you to run the commands with me - make sure you clone
the repository from:
https://github.com/Omerr/gitting_things_repo.git
At the current state, the active directory is a Git repository, with a clean
status:
git status
Save your changes, but don't stage or commit them. Next, run git diff :
The output of git diff for my_file.py after changing it
The output of git diff shows the difference between my_file.py 's
version in the staging area, which in this case is the same as the last commit
( HEAD ), and the version in the working directory.
I covered the terms "working directory", "staging area", and "commit" in the
Git objects chapter, so check it out in case you would like to refresh your
memory. As a reminder, the terms "staging area" and "index" are
interchangeable, and both are widely used.
At this state, the status of the working dir is different from the status of the index. The status
of the index is the same as that of HEAD
To see the difference between the working dir and the staging area, use
git diff , without any additional flags.
Without switches, git diff shows the difference between the staging area and the working
directory
As you can see, git diff lists here both file A and file B pointing to
my_file.py . file A here refers to the version of my_file.py in the
staging area, whereas file B refers to its version in the working dir.
Note that if you modify my_file.py in a text editor, and don't save the file,
then git diff will not be aware of the changes you've made. This is
because they haven't been saved to the working dir.
We can provide a few switches to git diff to get the diff between the
working dir and a specific commit, or between the staging area and the
latest commit, or between two commits, and so on.
Currently the file is in the working dir, and it is actually untracked in Git.
A new, untracked file
Now, the state of HEAD is the same as the state of the staging area, as well
as the working tree:
The state of HEAD is the same as the index and the working dir
Next, edit new_file.txt by adding a new line at the beginning and another
new line at the end:
Modifying new_file.txt by adding a line in the beginning and another in the end
After saving, the state in the working dir is different than that of the index or HEAD
A nice trick would be to use git add -p , which allows you to split the
changes even within a file, and consider which ones you'd like to stage.
In this case, add the first line to the index, but not the last line. To do that,
you can split the hunk using s , then accept to stage the first hunk (using
y ), and not the second part (using n ).
If you are not sure what each letter stands for, you can always use a ? and
Git will tell you.
Using git add -p , you can stage only the first change
So now the state in HEAD is without either of those new lines. In the staging
area you have the first line but not the last line, and in the working dir you
have both new lines.
git diff shows the difference between the index and the working dir
Well, as stated before, you get the diff between the staging area and the
working tree.
What happens if you want to get the diff between HEAD and the staging
area? For that, you can use git diff --cached :
git diff --cached shows the difference between HEAD and the index
And what if you want the difference between HEAD and the working tree?
For that you can run git diff HEAD :
git diff HEAD shows the difference between HEAD and the working dir
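Here is a minimal sketch that reproduces the three states in a scratch repository and runs the three variants side by side. Since git add -p is interactive, the sketch stages an intermediate version of the file with a plain git add instead; that is my shortcut, and the lines START and END are made up.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# HEAD: the file without either new line.
printf 'An existing line\n' > new_file.txt
git add new_file.txt
git commit -q -m "Add new_file.txt"

# Index: the first new line only (a stand-in for answering y / n in git add -p).
printf 'START\nAn existing line\n' > new_file.txt
git add new_file.txt

# Working dir: both new lines.
printf 'START\nAn existing line\nEND\n' > new_file.txt

index_vs_workdir=$(git diff)           # shows only +END
head_vs_index=$(git diff --cached)     # shows only +START
head_vs_workdir=$(git diff HEAD)       # shows both +START and +END

printf '%s\n\n' "$index_vs_workdir" "$head_vs_index" "$head_vs_workdir"
```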
To summarize the different switches for git diff we have seen so far, here's a
diagram:
As a reminder, at the beginning of this chapter you used git diff --no-index .
With the --no-index switch, you can compare two files that are not
part of the repository - or of any staging area.
By the way, you can omit the 1 above and write HEAD~ , and get the same
result. Using 1 is the explicit way to state you are referring to the first
parent of the commit.
Note that writing the parent commit here, HEAD~1 , first results in a diff
showing how to get from the parent commit to the current commit. Of
course, I could also generate the reverse diff by writing:
The output of git diff HEAD HEAD~1 generates the reverse patch
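To see the direction of the diff in action, here is a small sketch with two made-up commits in a scratch repository. It compares the parent with the current commit in both orders, and also checks that HEAD~ and HEAD~1 name the same commit.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Two hypothetical commits: the second one adds a line.
printf 'first\n' > notes.txt
git add notes.txt
git commit -q -m "Commit 1"
printf 'first\nsecond\n' > notes.txt
git commit -q -am "Commit 2"

forward=$(git diff HEAD~1 HEAD)   # parent -> current: the line is added (+second)
reverse=$(git diff HEAD HEAD~1)   # current -> parent: the line is removed (-second)

printf '%s\n\n%s\n' "$forward" "$reverse"

# HEAD~ is shorthand for HEAD~1.
git rev-parse HEAD~ HEAD~1
```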
To summarize all the different switches for git diff we covered in this
section, see this diagram:
A short way to view the diff between a commit and its parent is by using git
show , for example:
git diff HEAD~ HEAD is used to show the difference between commits
As you learned in the Git Objects chapter, Git stores entire snapshots.
The diff is dynamically generated from the snapshot data - by comparing
the root trees of the commit and its parent.
Of course, Git can compare any two snapshots in time, not just adjacent
commits, and also generate a diff of files not included in a repository.
Historical Note
Actually, sharing patches used to be the main way to share code in the early
days of open source. But now - virtually all projects have moved to sharing
Git commits directly through pull requests (called "merge requests" on
some platforms).
The biggest problem with using patches is that it is hard to apply a patch
when your working directory does not match the sender's previous commit.
Losing the commit history makes it difficult to resolve conflicts. You will
better understand this as you dive deeper into the process of git apply ,
especially in the next chapter where we cover merges.
A Simple Patch
What does it mean to apply a patch? It's time to try it out!
Don't worry about the last command - I'll explain it in detail in Part 3, where
we discuss undoing changes. In short, it allows us to "reset" the state of
where HEAD is pointing to, as well as the state of the index and of the
working dir. In the example above, they are all set to the state of HEAD~1 , or
"Commit 3" in the diagram.
So after running the reset command, the contents of the file are as follows
(the state from "Commit 3"):
nano new_file.txt
new_file.txt
And you will apply this patch that you've just saved:
nano my_patch.patch
The patch you are about to apply, as generated by git diff
And as a result, you get this version of your file, just like the commit you
have created before:
nano new_file.txt
The contents of new_file.txt after applying the patch
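Here is the whole round trip as a minimal sketch in a scratch repository. The file contents are made up; the flow (save the output of git diff to a file, reset to HEAD~1, then git apply) follows what is described above.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# "Commit 3": the file before the change (hypothetical content).
printf 'An existing line\n' > new_file.txt
git add new_file.txt
git commit -q -m "Commit 3"

# "Commit 4": the file after the change.
printf 'START\nAn existing line\nEND\n' > new_file.txt
git commit -q -am "Commit 4"

# Save the difference between the two commits as a patch file.
# The file is untracked, so the reset below leaves it alone.
git diff HEAD~1 HEAD > my_patch.patch

# Go back to the state of "Commit 3".
git reset -q --hard HEAD~1

# Apply the patch to the working dir.
git apply my_patch.patch
cat new_file.txt
```

Note that git apply only changes the working dir here; nothing is staged or committed.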
nano test.txt
Now, change this file by adding a new line, and also erasing the line before
the last one:
Changes to test.txt
Observe the difference between the original version of the file and the
version including your changes:
(Using -- test.txt tells Git to run the command diff , taking into
consideration only test.txt , so you don't get the diff for other files.)
As a result, the line numbers are different from the original version where
the patch has been created. Consider the patch you created before:
new_patch.patch
It assumes that the line With more text is the second line in test.txt ,
which is no longer the case. So...will git apply work?
It worked!
By default, Git looks for 3 lines of context before and after each change
introduced in the patch - as you can see, they are included in the patch file.
If you take three lines before and after the added line, and three lines
before and after the deleted line (actually only one line after, as no other
lines exist) - you get to the patch file. If these lines all exist - then applying
the patch works, even if the line numbers changed.
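A minimal sketch of this experiment, assuming a made-up seven-line test.txt (only some of the lines appear in the example above) and a patch that adds a single line. After the patch is created, two extra lines are added at the top of the file, so every line number in the patch is off by two.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical original version of test.txt.
printf '%s\n' 'This is a test file' 'With more text' \
  'It has some really nice lines' 'How wonderful' \
  'So we are writing an example' 'Git is awesome!' \
  'That is the last line' > test.txt
git add test.txt
git commit -q -m "Add test.txt"

# Add one line after "How wonderful" and save the change as a patch.
printf '%s\n' 'This is a test file' 'With more text' \
  'It has some really nice lines' 'How wonderful' 'A brand new line' \
  'So we are writing an example' 'Git is awesome!' \
  'That is the last line' > test.txt
git diff -- test.txt > new_patch.patch

# Back to the original, then shift everything down by two lines.
git checkout -- test.txt
printf 'Extra line A\nExtra line B\n' | cat - test.txt > shifted.txt
mv shifted.txt test.txt

# The line numbers in the patch are now wrong, but the context still matches.
git apply new_patch.patch
cat test.txt
```

In this sketch the new line ends up as line 7 rather than line 5, right after How wonderful, which is where its context lines now live.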
What happens if you change one of the context lines? Try it out by changing
the line With more text to With more text! :
And now:
Well, no. The patch does not apply. If you are not sure why, or just want to
better understand the process Git is performing, you can add the
--verbose flag to git apply , like so:
git apply --verbose shows the process Git is taking to apply the patch
It seems that Git searched lines from the file, including the line "With more
text", right before the line "It has some really nice lines". This sequence of
lines no longer exists in the file. As Git cannot find this sequence, it cannot
apply the patch.
As mentioned earlier, by default, Git looks for 3 lines of context before and
after each change introduced in the patch. More precisely, git apply requires
every context line recorded in the patch to match, and git diff records 3 of
them by default. If the surrounding lines do not exist, Git cannot apply the patch.
You can ask Git to rely on fewer lines of context, using the -C argument.
For example, to ask Git to look for 1 line of the surrounding context, run the
following command:
new_patch.patch
When applying the patch with the -C1 option, Git is looking for the lines:
How wonderful
So we are writing an example
Git is awesoome!
As Git can find these lines, Git can erase the middle one.
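Here is a minimal sketch of both outcomes, assuming a made-up seven-line test.txt and a patch that adds one line (so it is not the exact patch from the images). One context line is edited after the patch is created: a plain git apply refuses, while git apply -C1 is allowed to drop context lines until the remaining ones match.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical original version of test.txt.
printf '%s\n' 'This is a test file' 'With more text' \
  'It has some really nice lines' 'How wonderful' \
  'So we are writing an example' 'Git is awesome!' \
  'That is the last line' > test.txt
git add test.txt
git commit -q -m "Add test.txt"

# Add one line after "How wonderful" and save the change as a patch.
printf '%s\n' 'This is a test file' 'With more text' \
  'It has some really nice lines' 'How wonderful' 'A brand new line' \
  'So we are writing an example' 'Git is awesome!' \
  'That is the last line' > test.txt
git diff -- test.txt > new_patch.patch
git checkout -- test.txt

# Change one of the patch's context lines.
sed 's/^With more text$/With more text!/' test.txt > edited.txt
mv edited.txt test.txt

# Without -C, every context line in the patch must match, so this fails.
if git apply new_patch.patch 2>/dev/null; then
  default_result=applied
else
  default_result=rejected
fi
echo "default: $default_result"

# With -C1, Git may ignore context lines as long as at least one remains.
git apply -C1 new_patch.patch
cat test.txt
```

A failed git apply leaves the file untouched, which is why the second attempt starts from the same state.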
Notice that commit 54a9d is also "on" this branch, as it is the parent commit
of ba0d2 . So if you start from the pointer of feature_1 , you get to ba0d2 ,
which then points to 54a9d . You can go on the chain of parents, and all
these reachable commits are considered to be "on" feature_1 .
When you merge with Git, you merge commits. Almost always, we merge
two commits by referring to them with the branch names that point to
them. Thus we say we "merge branches" - though under the hood, we
actually merge commits.
https://github.com/Omerr/gitting_things_merge.git
OK, so let's say I have this simple repository here, with a branch called
main , and a few commits with the commit messages of "Commit 1",
"Commit 2", and "Commit 3":
A simple repository with three commits
And switch HEAD to point to this new branch, by using git checkout
new_feature (or git switch new_feature ). You can look at the outcome by
using git log:
If you need a reminder about branches and how they're implemented under
the hood, please check out chapter 2. Yes, check out. Pun intended.
Implementing new_feature
Looking at the history, you have the branch new_feature , now pointing to
"Commit 4", which points to its parent, "Commit 3". The branch main is also
pointing to "Commit 3".
Time to merge the new feature! That is, merge these two branches, main
and new_feature . Or, in Git's lingo, merge new_feature into main . This
means merging "Commit 4" and "Commit 3". This is pretty trivial, as after all,
"Commit 3" is an ancestor of "Commit 4".
Check out the main branch (with git checkout main ), and perform the
merge by using git merge new_feature :
Since new_feature never really diverged from main, Git could just perform
a fast-forward merge. So what happened here? Consider the history:
Even though you used git merge , there was no actual merging here.
Actually, Git did something very simple - it reset the main branch to point
to the same commit as the branch new_feature .
In case you don't want that to happen, but rather you want Git to really
perform a merge, you could either change Git's configuration, or run the
merge command with the --no-ff flag.
Reminder: if this way of using reset is not clear to you, don't worry - we will
cover it in detail in Part 3. It is not crucial for this introduction of merge,
though. For now, it's important to understand that it basically undoes the
merge operation.
Next, perform the merge with the --no-ff flag (short for "no fast-forward"):
(Reminder: git lol is an alias I added to Git to visibly see the history in a
graphical manner. You can find it, along with the other components of my
Considering this history, you can see Git created a new commit, a merge
commit.
You will see that this commit actually has two parents - "Commit 4", which
was the commit that new_feature pointed to when you ran git merge , and
"Commit 3", which was the commit that main pointed to.
The merge commit shows us the concept of merge quite well. Git takes two
commits, usually referenced by two different branches, and merges them
together.
After the merge, as you started the process from main , you are still on
main , and the history from new_feature has been merged into this branch.
Since you started with main , then "Commit 3", which main pointed to, is
the first parent of the merge commit, whereas "Commit 4", which you
merged into main , is the second parent of the merge commit.
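If you don't have the repository at hand, here is a minimal sketch of both merges in a scratch repository. The file names and contents are made up; the branch names and commit messages follow the example above.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name "Example User"
git config user.email "user@example.com"

# "Commit 1" to "Commit 3" on main.
for n in 1 2 3; do
  echo "content $n" > "file_$n.txt"
  git add "file_$n.txt"
  git commit -q -m "Commit $n"
done

# "Commit 4" on new_feature.
git checkout -q -b new_feature
echo "a new feature" > new_feature.txt
git add new_feature.txt
git commit -q -m "Commit 4"

# Fast-forward: main simply moves to where new_feature points.
git checkout -q main
git merge -q new_feature
tip_after_ff=$(git rev-parse main)

# Undo, then ask for a real merge commit.
git reset -q --hard HEAD~1
git merge -q --no-ff -m "Merge new_feature into main" new_feature

first_parent=$(git log -1 --format=%s HEAD^1)    # Commit 3
second_parent=$(git log -1 --format=%s HEAD^2)   # Commit 4
echo "parents: $first_parent, $second_parent"
```

HEAD^1 and HEAD^2 are the first and second parents of the merge commit, in the order described above.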
Notice that you started on main when it pointed to "Commit 3", and Git
went quite a long way for you. It changed the working tree, the index, and
also HEAD and created a new commit object. At least when you use git
merge without the --no-commit flag and when it's not a fast-forward
merge, Git does all of that.
This was a super simple case, where the branches you merged didn't diverge
at all. We will soon consider more interesting cases.
By the way, you can use git merge to merge more than two commits -
actually, any number of commits. This is rarely done, and to adhere to the
practicality principle of this book, I won't delve into it.
Assume we have two people working on this repo now, John and Paul.
While John was working on this song, Paul was also writing, on another
branch. Paul had started from main:
And Paul wrote his song into a file called penny_lane.md . Paul staged and
committed this file:
git add penny_lane.md
git commit -m "Commit 6"
So now our history looks like this - where we have two different branches,
branching out from main , with different histories:
John is happy with his branch (that is, his song), so he decides to merge it
into the main branch:
You can validate that by looking at the history (using git lol , for example):
Merging john_branch into main results in a fast-forward merge
At this point, Paul also wants to merge his branch into main , but now a fast-
forward merge is no longer relevant - there are two different histories here:
the history of main and that of paul_branch . It's not that paul_branch
only adds commits on top of the main branch or vice versa.
First, let Git do the hard work for you. After that, we will understand what's
actually happening under the hood.
What you have is a new commit, with two parents - "Commit 5" and
"Commit 6".
In the working dir, you can see that both John's song as well as Paul's song
are there (if you use ls , you will see both files in the working dir).
Nice, Git really did merge the changes for you. But how does that happen?
What Git has done here is called a 3-way merge. In outlining the process
of a 3-way merge, I will use the term "branch" for simplicity, but you should
remember you could also merge two (or more) commits that are not
referenced by a branch.
First, Git locates the common ancestor of the two branches. That is, the
common commit from which the merging branches most recently diverged.
Technically, this is actually the first commit that is reachable from both
branches. This commit is then called the merge base.
Second, Git calculates two diffs - one diff from the merge base to the first
branch, and another diff from the merge base to the second branch. Git
generates patches based on those diffs.
Third, Git applies both patches to the merge base using a 3-way merge
algorithm. The result is the state of the new merge commit.
The three steps of the 3-way merge algorithm: (1) locate the common ancestor (2) calculate
diffs from the merge base to the first branch, and from the merge base to the second branch (3)
apply both patches together
In the first step, Git looks from both branches - main and paul_branch -
and traverses the history to find the first commit that is reachable from
both. In this case, this would be… which commit?
Correct, the merge commit (the one with "Commit 3" and "Commit 4" as its
parents).
If you are not sure, you can always ask Git directly:
The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents. Note: the
previous commit merge is blurred as it is not reachable via the current history following the
reset command
By the way, this is the most common and simple case, where we have a
single obvious choice for the merge base. In more complicated cases, there
may be multiple possibilities for a merge base, but this is not within our
focus.
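For reference, here is a minimal sketch of asking Git for the merge base with git merge-base. The history is a simplified stand-in for the one in the images: a single made-up base commit, with john_branch and paul_branch each adding one commit on top of it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# The commit both branches start from (hypothetical content).
echo "# Songs" > README.md
git add README.md
git commit -q -m "Base commit"
base_commit=$(git rev-parse HEAD)

# John's branch adds one commit, and main fast-forwards to it.
git checkout -q -b john_branch
echo "John's song" > johns_song.md
git add johns_song.md
git commit -q -m "Commit 5"
git checkout -q main
git merge -q john_branch

# Paul's branch starts from the base commit and adds another commit.
git checkout -q -b paul_branch "$base_commit"
echo "Paul's song" > penny_lane.md
git add penny_lane.md
git commit -q -m "Commit 6"

# The most recent commit reachable from both main and paul_branch.
merge_base=$(git merge-base main paul_branch)
git log -1 --format='%h %s' "$merge_base"
```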
In the second step, Git calculates the diffs. So it first calculates the diff
between the merge commit and "Commit 5":
If you don't feel comfortable with the output of git diff , you can read the
previous chapter where I described it in detail.
Next, Git calculates the diff between the merge commit and "Commit 6":
First, try that out directly - just apply the patches (I will walk you through it
in a moment). This is not what Git really does under the hood, but it will help
you gain a better understanding of why Git needs to do something different.
Checkout the merge base first, that is, the merge commit:
And apply John's patch first (as a reminder, this is the patch shown in the
image with the caption "The diff between the merge commit and "Commit
5""):
So now John's new song is incorporated into the index. Apply the other
patch:
Now it's time to commit your merge. Since the porcelain command git
commit always generates a commit with a single parent, you would need the
underlying plumbing command - git commit-tree .
Remember that every Git commit object points to a single tree. So you need
to record the contents of the index in a tree:
git write-tree
Now you get the SHA-1 value of the created tree, and you can create a
commit object using git commit-tree :
Recall that git merge also changes HEAD to point to the new merge commit.
This is almost what we wanted. Remember that when you ran git merge ,
the result was HEAD pointing to main which pointed to the newly created
commit (as shown in the image with the caption "When you merge
paul_branch , you get a new merge commit"). What should you do then?
Well, what you want is to modify main , so you can just point it to the new
commit:
And now you have the same result as when you ran git merge : main
points to the new commit, which has "Commit 5" and "Commit 6" as its
parents. You can use git lol to verify that.
So this is exactly the same result as the merge done by Git, with the
exception of the timestamp and thus the SHA-1 value, of course.
Overall, you got to merge both the contents of the two commits - that is, the
state of the files, and also the history of those commits - by creating a merge
commit that points to both histories.
In this simple case, you could actually just apply the patches using git
apply , and everything works quite well.
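The manual merge above can be sketched end to end as follows. This is a simplified stand-in, not the exact history from the images: the base is a single made-up commit, the file contents are hypothetical, and the patches are kept in a temporary directory outside the repository.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Songs" > README.md
git add README.md
git commit -q -m "Base commit"

git checkout -q -b john_branch
echo "John's song" > johns_song.md
git add johns_song.md
git commit -q -m "Commit 5"

git checkout -q -b paul_branch main
echo "Paul's song" > penny_lane.md
git add penny_lane.md
git commit -q -m "Commit 6"

git checkout -q main
git merge -q john_branch            # fast-forward: main now points to "Commit 5"
john_tip=$(git rev-parse main)
paul_tip=$(git rev-parse paul_branch)

# Step 1: locate the merge base.
base=$(git merge-base main paul_branch)

# Step 2: one diff from the merge base to each side.
patches=$(mktemp -d)
git diff "$base" main > "$patches/john.patch"
git diff "$base" paul_branch > "$patches/paul.patch"

# Step 3: apply both patches on top of the merge base, staging the results.
git checkout -q --detach "$base"
git apply --index "$patches/john.patch"
git apply --index "$patches/paul.patch"

# Record the index as a tree, and create a commit with two parents.
tree=$(git write-tree)
merge_commit=$(git commit-tree "$tree" -p main -p paul_branch -m "Merge paul_branch")

# Point main (and HEAD, through it) at the new commit.
git checkout -q main
git reset -q --hard "$merge_commit"
git log --oneline --graph
```

The order of the -p arguments decides which commit is the first parent and which is the second.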
First, Git locates the merge base - the common ancestor of the two
branches. That is, the first commit that is reachable from both
branches.
Second, Git calculates two diffs - one diff from the merge base to the
first branch, and another diff from the merge base to the second
branch.
Third, Git applies both patches to the merge base, using a 3-way
merge algorithm. I haven't explained the 3-way merge yet, but I will
elaborate on that later. The result is the state of the new merge
commit.
You can also understand why it's called a "3-way merge": Git merges three
different states - that of the first branch, that of the second branch, and
their common ancestor. In our previous example, main , paul_branch , and
the merge commit (with "Commit 3" and "Commit 4" as parents),
respectively.
This is unlike, say, the fast-forward examples we saw before. The fast-
forward examples are actually a case of a two-way merge, as Git only
compares two states - for example, where main pointed to, and where
john_branch pointed to.
Moving on
Still, this was a simple case of a 3-way merge. John and Paul created
different songs, so each of them touched a different file. It was pretty
straightforward to execute the merge.
Let's assume that now John and Paul are co-authoring a new song.
So, John checked out main branch and started writing the song:
Of course, the original song does not include the title "Paul's Verse", but I
added it here for clarity.
John also branches out from main and adds an additional two lines at the
end:
So, both Paul and John modified the same file on different branches. Will Git
Say now we don't go through main , but John will try to merge Paul's new
branch into his branch:
So, first, Git needs to find the merge base. Can you see which commit that
would be?
Correct, it would be the last commit on the main branch, where the two
diverged - that is, "Commit 7".
Checkout the merge base so you can later apply the patches you will create:
Will applying this patch succeed? Well, no problem, Git has all the context
lines in place.
Now, compute the diff between John's new branch and the merge base.
Notice that you haven't committed the applied changes, so john_branch_2
still points at the same commit as before, "Commit 9":
Well, indeed, yes. Notice that even though the line numbers have changed
on the current version of the file, thanks to the context lines Git is able to
locate where it needs to add these lines…
git write-tree
See how I used the branch names here? After all, they are just pointers to
the commits we want.
You can also let Git perform the job for you. You can checkout
john_branch_2 , which you haven't moved - so it still points to the same
commit as it did before the merge. So all you need to do is run:
Just as before, you have a merge commit pointing to "Commit 8" and
"Commit 9" as its parents. "Commit 9" is the first parent since you merged
into it.
But this was still quite simple… John and Paul worked on the same file, but
on very different parts. You could also directly apply Paul's changes to
John's branch. If you go back to John's branch before the merge:
But what happens when the two branches include changes on the same
files, in the same locations?
In this case, John creates the first version of this song in the main branch:
By the way, this text is indeed taken from the version that John Lennon
recorded for a demo in 1968. But this isn't a book about the Beatles. If
you're curious about the process the Beatles underwent while writing this
song, you can follow the links in the end of this chapter.
Now John and Paul split. Paul creates a new verse in the beginning:
Also, while talking to John, they decided to change the word "feet" to "foot",
so Paul adds this change as well.
You can observe Paul's changes, by comparing this branch's state to the
state of branch main :
And he replaces the line "Everyone had the boot in" with the line "Everyone
had a wet dream". In addition, John changed the word "feet" to "foot",
following his talk with Paul.
This also applies to commits that are no longer reachable from any named
reference, such as "Commit 8" or "Commit 9". Since they are not reachable
from any named reference via the parents' chain, they will not be included
in the output of commands such as git log .
Back to our story - Paul told John he had added a new verse, so John would
like to merge Paul's changes.
As explained earlier, git merge uses a 3-way merge algorithm, and this can
come in handy here. What would be the first step of this algorithm?
Well, first, Git would find the merge base - that is, the common ancestor of
Paul's branch and John's branch. Consider the history:
The history after introducing "Commit 12"
So the common ancestor of "Commit 11" and "Commit 12" is "Commit 10".
You can verify this by running the command:
Now we can take the patches we generated from the diffs on both branches,
and apply them to main . Would that work?
nano everyone.md
Now, can Git apply Paul's patch? To remind you, this is the patch:
What you tried to do now, applying Paul's patch on the main branch after
applying John's patch, is the same as being on john_branch_3 , and
attempting to apply the patch. That is, running:
Well, no! Again, if you are not sure what happened, you can always ask git
apply to be a bit more verbose:
Git is looking for "Everyone put the feet down", but John has already
changed this line so it now consists of the word "foot" instead of "feet". As a
result, applying this patch fails.
Notice that changing the number of context lines here (that is, using git
apply with the -C flag, as discussed in the previous chapter) is irrelevant -
Git is unable to locate the actual line that the patch is trying to erase.
But actually, Git can make this work, if you just add a flag to apply, telling it
to perform a 3-way merge under the hood:
Exactly what we wanted! You have Paul's verse, and both of John's changes!
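Here is a minimal sketch of that scenario using git apply -3 (the short form of --3way ). The lyrics below are a shortened, partly made-up stand-in for everyone.md ; the branch names and commit messages follow the example. Paul adds two lines at the top and changes "feet" to "foot"; John replaces one line and makes the same "feet" to "foot" change.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# "Commit 10" on main: the merge base.
printf '%s\n' 'Everyone had a hard year' 'Everyone had a good time' \
  'Everyone had the boot in' 'Everyone saw the sunshine' \
  'Everyone let their hair down' 'Everyone put their feet down' > everyone.md
git add everyone.md
git commit -q -m "Commit 10"

# "Commit 11": Paul adds a verse at the top and changes feet -> foot.
git checkout -q -b paul_branch_3
printf '%s\n' 'Oh yeah, oh yeah' 'Everyone is singing along' \
  'Everyone had a hard year' 'Everyone had a good time' \
  'Everyone had the boot in' 'Everyone saw the sunshine' \
  'Everyone let their hair down' 'Everyone put their foot down' > everyone.md
git commit -q -am "Commit 11"

# "Commit 12": John replaces one line and also changes feet -> foot.
git checkout -q -b john_branch_3 main
printf '%s\n' 'Everyone had a hard year' 'Everyone had a good time' \
  'Everyone had a wet dream' 'Everyone saw the sunshine' \
  'Everyone let their hair down' 'Everyone put their foot down' > everyone.md
git commit -q -am "Commit 12"

# Paul's patch, relative to the merge base.
patch_file=$(mktemp)
git diff main paul_branch_3 > "$patch_file"

# A plain apply fails on John's branch: the lines it expects are gone.
if git apply "$patch_file" 2>/dev/null; then
  plain_result=applied
else
  plain_result=rejected
fi
echo "plain apply: $plain_result"

# With -3, Git uses the blob IDs recorded in the patch to run a 3-way merge.
git apply -3 "$patch_file"
cat everyone.md
```

The -3 option needs the blobs named on the patch's index line to exist in the repository, and it stages the result in the index as well as writing it to the working dir.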
Well, as I mentioned, Git really did a 3-way merge, and with this example, it
will be a good time to dive into what this actually means.
You have now three versions: the merge base, which is "Commit 10", Paul's
branch, and John's branch. In general terms, we can say these are the merge
base , commit A and commit B . Notice that the merge base is by definition
an ancestor of both commit A and commit B .
To perform the merge, Git looks at the diff between the three different
versions of the file in question on these three revisions. In your case, it's the
file everyone.md, and the revisions are "Commit 10", Paul's branch - that is,
"Commit 11", and John's branch, that is, "Commit 12".
Git makes the merging decision based on the status of each line in each of
these versions.
This means that the state of John's branch is equal to the state of the merge
base. So the 3-way merge goes with Paul's version.
In general, if the state of the merge base is the same as A , the algorithm
goes with B . The reason is that since the merge base is the ancestor of both
A and B , Git assumes that this line hasn't changed in A , and it has changed
in B , which is the most recent version for that line, and should thus be
taken into account.
If the state of the merge base is the same as A , and this state is different from B , the
algorithm goes with B
Next, you can see lines where all three versions agree - they exist on the
merge base, A and B , with equal data.
In this case the algorithm has a trivial choice - just take that version.
In case all three versions agree, the algorithm goes with that single version
In a previous example, we saw that if the merge base and A agree, and B 's
version is different, the algorithm picks B . This works in the other direction
too - for example, here you have a line that exists on John's branch, different
than that on the merge base and Paul's branch.
A line where Paul's version matches the merge base's version, and John has a different version
If the state of the merge base is the same as B , and this state is different from A , the
algorithm goes with A
Now consider another case, where both A and B agree on a line, but the
value they agree upon is different from the merge base: both John and Paul
agreed to change the line "Everyone put their feet down" to "Everyone put
their foot down":
A line where Paul's version matches John's version, yet the merge base has a different version
In case A and B agree on a version which is different from the merge base's version, the
algorithm picks the version on both A and B
Notice this is not a democratic vote. In the previous case, the algorithm
picked the minority version, as it resembled the newest version of this line.
In this case, it happens to pick the majority - but only because A and B agree with each other.
You will see that the merge commit indeed has two parents: the first is
"Commit 11", that is, where paul_branch_3 pointed to before the merge.
The second is "Commit 12", where john_branch_3 pointed to, and still
points to now.
What will happen if you now merge from main ? That is, switch to the main
branch, which is pointing to "Commit 10":
git checkout main
A fast-forward merge
So, this is a 3-way merge. In general, if all versions agree on a line, then this
line is used. If A and the merge base match, and B has another version, B
is taken. In the opposite case, where the merge base and B match, the A
version is selected. If A and B match, this version is taken, whether the
merge base agrees or not.
This description leaves one open question though: What happens in cases
where all three versions disagree?
Well, that's a conflict that Git does not resolve automatically. In these cases,
Git calls for a human's help.
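To make the rule concrete, here is a minimal Python sketch of the per-line decision described above. It assumes the three versions were already aligned line by line (real Git works on hunks found by diffing, which this sketch skips). The names merge_line and CONFLICT are made up for illustration.

```python
from typing import Optional

# Sentinel meaning "the rule cannot decide; a human has to".
CONFLICT = object()


def merge_line(base: Optional[str], a: Optional[str], b: Optional[str]):
    """Pick a version of one aligned line, or return CONFLICT."""
    if a == b:
        # A and B agree, whether or not the merge base agrees with them.
        return a
    if a == base:
        # Only B changed this line, so B's version is taken.
        return b
    if b == base:
        # Only A changed this line, so A's version is taken.
        return a
    # All three versions differ from one another.
    return CONFLICT


base = "Everyone put their feet down"
changed = "Everyone put their foot down"
print(merge_line(base, changed, changed))        # both sides made the same change
print(merge_line(base, "Everybody", "Yeah") is CONFLICT)
```

Notice that the function never counts votes: it only asks which side, if any, left the line as it was on the merge base.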
And he decides to add some "Yeah"s to the song, so he changes this verse as
follows:
Paul's additions
You can see that the history diverges from main , to two different branches
- paul_branch_4 , and john_branch_4 .
At this point, John would like to merge the changes introduced by Paul.
A merge conflict
We have a conflict!
Git cannot merge these branches on its own. You can get an overview of the
merge state, using git status :
The changes that Git had no problem resolving are staged for commit. And
there is a separate section for "unmerged paths" - these are files with
conflicts that Git could not resolve on its own.
It's time to understand why and when these conflicts happen, how to
resolve them, and also how Git handles them under the hood.
First, Git will look for the merge base - the common ancestor of
john_branch_4 and paul_branch_4 . Which commit would that be?
It would be the tip of the main branch, the commit in which we merged
john_branch_3 into paul_branch_3 .
Again, if you are not sure, you can verify that by running:
git merge-base john_branch_4 paul_branch_4
And at the current state, git status knows which files are staged and
which aren't.
Consider the process for each file, which is the same as the 3-way merge
algorithm we considered per line, but on a file's level:
What about everyone.md ? Well, here we have three different states of this
file: its state on the merge base, its state on John's branch, and its state on
Paul's branch. While performing a merge, Git stores all of these versions on
the index.
Let's observe that by looking directly at the index with the command git
ls-files :
You can see that everyone.md has three different entries. Git assigns each
version a number that represents the "stage" of the file, and this is a distinct
property of an index entry, alongside the file's name and the mode bits.
Stage 2 - which is "your" version. That is, the version of the file on
the branch you are merging into. In our example, this would be
john_branch_4 .
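If you'd like to inspect those stage numbers from a script, here is a small Python sketch that parses the text printed by git ls-files -s. It assumes the usual layout of that output (mode, object ID, stage number, then a tab and the path). The sample text and its object IDs are made up, and parse_stages and unmerged_paths are hypothetical helper names.

```python
from collections import defaultdict

# Made-up sample of `git ls-files -s` output during a conflicted merge.
# The 40-character object IDs are placeholders, not real blobs.
SAMPLE = "\n".join([
    "100644 " + "a" * 40 + " 0\tlet_it_be.md",
    "100644 " + "1" * 40 + " 1\teveryone.md",
    "100644 " + "2" * 40 + " 2\teveryone.md",
    "100644 " + "3" * 40 + " 3\teveryone.md",
])

# Stage 0 is a regular entry; stages 1 to 3 only appear for unmerged paths.
STAGE_NAMES = {0: "merged", 1: "merge base", 2: "ours", 3: "theirs"}


def parse_stages(output):
    """Map each path to a dict of {stage number: object ID}."""
    entries = defaultdict(dict)
    for line in output.splitlines():
        meta, path = line.split("\t", 1)
        _mode, object_id, stage = meta.split()
        entries[path][int(stage)] = object_id
    return dict(entries)


def unmerged_paths(entries):
    """Paths that have at least one entry with a non-zero stage."""
    return sorted(path for path, stages in entries.items()
                  if any(stage != 0 for stage in stages))


entries = parse_stages(SAMPLE)
for stage in sorted(entries["everyone.md"]):
    print(stage, STAGE_NAMES[stage])
```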
To observe the file's contents in a specific stage, you can use a command I
introduced in a previous chapter, git cat-file , and provide the blob's SHA:
And indeed, this is the content we expected - from John's branch, where the
lines start with "Everybody" rather than "Everyone".
A nice trick that allows you to see the content quickly without providing the
blob's SHA-1 value, is by using git show , like so:
For example, to get the content of the same version as with git cat-file -p
<BLOB_SHA_FOR_STAGE_2>, you can write git show :2:everyone.md .
Git records the three states of the three commits into the index in this way
at the start of the merge. It then follows the three-way merge algorithm to
quickly resolve the simple cases:
If one side made a change while the other did nothing - that is, stage 1
matches stage 2 - then we choose stage 3 , or vice versa. That's exactly
what happened with let_it_be.md and across_the_universe.md .
In case of a deletion on the incoming branch, for example, and given there
were no changes on the current branch, then we would see that stage 1
matches stage 2 , but there is no stage 3 . In this case, git merge removes
the file for the merged version.
What's really cool here is that for matching, Git doesn't need the actual
files. Rather, it can rely on the SHA-1 values of the corresponding blobs.
This way, Git can easily detect the state a file is in.
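Here is a hedged Python sketch of that file-level decision, working only with blob IDs. The blob_id function follows the way Git hashes a blob in a SHA-1 repository (a "blob <size>" header, a NUL byte, then the content). The resolve_file function and its return values are my own simplification, not Git's actual code.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """The SHA-1 Git assigns to a blob holding exactly this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def resolve_file(base, ours, theirs):
    """Decide a file's fate from the blob IDs in stage 1, 2 and 3.

    Each argument is a blob ID, or None when the file is absent in that
    stage. Returns ("take", blob ID or None) or ("conflict", None).
    """
    if ours == theirs:
        return ("take", ours)
    if base == ours:
        # Only the incoming branch changed (None means it deleted the file).
        return ("take", theirs)
    if base == theirs:
        # Only the current branch changed.
        return ("take", ours)
    # All three differ: the file contents need a closer look.
    return ("conflict", None)


old, new = blob_id(b"Let it be\n"), blob_id(b"Let it be, let it be\n")
print(resolve_file(old, old, new) == ("take", new))
print(resolve_file(old, old, None))  # deleted on the incoming branch
```

No file content is compared after hashing. Two equal IDs mean two equal files, which is what makes this step cheap.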
For everyone.md you have this special case - where stage 1 , stage 2 and
stage 3 are all different from one another. That is, they have different blob
SHAs. It's time to go deeper and understand the merge conflict.
One way to do that would be to simply use git diff . In a previous chapter,
we examined git diff in detail, and saw that it shows the differences
between various combinations of the working tree, index or commits.
But git diff also has a special mode for helping with merge conflicts:
git diff
The output of git diff during a merge conflict
This output may be confusing at first, but once you get used to it, it's pretty
clear. Let's start by understanding it, and then see how you can resolve
conflicts with other, more visual tools.
So git diff without any special flags shows changes between the working
tree and the index - which in this case are the conflicts yet to be resolved.
The output doesn't include staged changes, which is very convenient for
resolving the conflict.
For Git, Paul and John made different changes to the same line, for a few
lines. John changed it to one thing, and Paul changed it to another thing. Git
cannot decide which one is correct.
This is not the case for the last lines, like the line that used to be "Everyone
had a hard year" on the merge base. Paul hasn't changed this line, or the
lines surrounding it, so its version on paul_branch_4, or "theirs" in our case,
agrees with the merge base. Yet John's version, "ours", is different. Thus
git merge can easily decide to take this version.
In this case, I know what I want, and that is actually a combination of these
lines. I want the lines to start with "Everybody", following John's change, but
also to include Paul's "yeah"s. So go ahead and create the desired version by
editing everyone.md:
nano everyone.md
To compare the result file to what you had in the branch prior to the merge,
you can run:
Similarly, if you wish to see how the result of the merge differs from the
branch you merged into our branch, you can run:
You can even see how the result is different from both sides using:
After staging, if you look at git status , you will see no conflicts:
After staging the fixed version everyone.md , there are no conflicts
You can now simply use git commit , and Git will present you with a commit
message containing details about the merge. You can modify it if you like, or
leave it as is. Regardless of the commit message, Git will create a "merge
commit" - that is, a commit with more than one parent.
john_branch_4 now points to the new merge commit. The incoming branch,
"theirs", in this case, paul_branch_4 , stays where it was.
VS Code marks the different versions with "Current Change" - which is the
"ours" version, the current HEAD , and "Incoming Change" for the branch we
are merging into the active branch. You can accept one of the changes (or
both) by clicking on one of the options.
If you clicked on Resolve in Merge editor , you'll get a more visual view of
the state. VS Code shows the status of each line:
If you look closely, you will see that VS Code shows changes within words -
for example, showing that "Everyone" was changed to "Everybody",
marking the changed parts.
You can accept either version, or you can accept a combination. In this case,
if you click on "Accept Combination", you get this result:
VS Code's Merge Editor after clicking on "Accept Combination"
VS Code did a really good job! The same three way merge algorithm was
implemented here and used on the word level rather than the line level. So
VS Code was able to actually resolve this conflict in a rather impressive way.
Of course, you can modify VS Code's suggestion, but it provided a very good
start.
command line and learn a tool that can come in handy in more complicated
cases.
And merge:
And say, you are not exactly sure what happened. Why is there a conflict?
One very useful command would be:
git log -p --merge
As a reminder, git log shows the history of commits that are reachable
from HEAD . Adding -p tells git log to show the commits along with the
diffs they introduced. The --merge switch makes the command show all
commits containing changes relevant to any unmerged files, on either
branch, together with their diffs.
This can help you identify the changes in history that led to the conflicts. So
in this example, you'd see:
Notice that git log --merge did not mention previous commits that
changed everyone.md before "Commit 13", as they didn't affect the current
conflict.
This way, git log tells you all you need to know to understand the process
that got you into the current conflicting state. Cool!
Using the command line, you can also ask Git to take only one side of the
changes - either "ours" or "theirs", even for a specific file.
You can also instruct Git to take some parts of the diffs of one file and
another from another file. I will provide links that describe how to do that in
the additional resources of this chapter in the appendix.
For the most part, you can accomplish that pretty easily, either manually or
from the UI of your favorite IDE.
First, Git locates the merge base. As a reminder, this is the most recent
commit that is reachable from both branches.
Second, Git calculates two diffs - one diff from the merge base to the
first branch, and another diff from the merge base to the second
branch. Git generates patches based on those diffs.
Third and last, Git applies both patches to the merge base using a 3-
way merge algorithm. The result is the state of the new merge
commit.
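The first of these steps can be sketched in Python. This is a simplified model, not Git's implementation: the history is a plain dictionary from each commit to its parents, the commit names are hypothetical, and ancestors and merge_bases are illustrative helpers. The second and third steps (diffing and applying) are left out.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        candidate for candidate in common
        if not any(candidate != other and candidate in ancestors(parents, other)
                   for other in common)
    }


# Hypothetical history: main is C1 <- C2 <- C3, and two branches leave C3.
parents = {
    "C1": [], "C2": ["C1"], "C3": ["C2"],
    "J1": ["C3"], "J2": ["J1"],   # John's commits
    "P1": ["C3"],                 # Paul's commit
}
print(merge_bases(parents, "J2", "P1"))
```

In a simple history like this one there is exactly one merge base. The function returns a set because more tangled histories can have several.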
We dove deeper into the process of a 3-way merge, whether at a file level or
a hunk level. We considered when Git is able to rely on a 3-way merge to
automatically resolve conflicts, and when it just can't.
You saw the output of git diff when we are in a conflicting state, and how
to resolve conflicts either manually or with VS Code.
Beatles-Related Resources
https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/
https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/
http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html
The truth is, if you understand what it actually does, git rebase is a very
elegant, and straightforward tool to achieve so many different things in Git.
In the previous chapters in this part, you learned what Git diffs are, what a
merge is, and how Git resolves merge conflicts. In this chapter, you will
understand what Git rebase is, why it's different from merge, and how to
rebase with confidence.
In the previous chapter, we considered the example where John and Paul (of
the Beatles) were co-authoring a new song. They started from the main
branch, and then each diverged, modified the lyrics, and committed their
changes.
Then, the two wanted to integrate their changes, which is something that
happens very frequently when working with Git.
In the previous chapter, we got to know git merge pretty well. We saw
that when performing a merge, we create a merge commit - where the
contents of this commit are a combination of the two branches, and it also
has two parents, one in each branch.
So, say you are on the branch john_branch (assuming the history depicted
in the drawing above), and you run git merge paul_branch . You will get to
this state - where on john_branch , there is a new commit with two parents.
The first one will be the commit on the john_branch branch that HEAD
was pointing to before performing the merge - in this case, "Commit
6". The second will be the commit pointed to by paul_branch , "Commit 9".
The result of running git merge paul_branch : a new Merge Commit with two parents
Look again at the history graph: you created a diverged history. You can
actually see where it branched and where it merged again.
So when using git merge , you do not rewrite history - but rather, you add a
commit to the existing history. And specifically, a commit that creates a
diverged history.
Let's start with the big picture: if you are on paul_branch , and use git
rebase john_branch , Git goes to the common ancestor of John's branch and
Paul's branch. Then it takes the patches introduced in the commits on Paul's
branch, and applies those changes to John's branch.
So here, you use rebase to take the changes that were committed on one
branch - Paul's branch - and replay them on a different branch,
john_branch .
The result of running `git rebase john_branch`: the commits on `paul_branch` were "replayed"
on top of john_branch
We will now take this bit by bit to make sure you fully understand what's
happening under the hood.
As always, you are encouraged to run the commands yourself while reading
this chapter. Unless noted otherwise, I will use the following repository:
https://github.com/Omerr/rebase_playground.git
I recommend you clone it locally and have the same starting point I am using
for this chapter.
You can see that in this commit, John started working on a song called "Lucy
in the Sky with Diamonds":
The output of git diff - the patch introduced by "Commit 5"
As a reminder, you can also use the command git show to get the same
output:
Now, if you cherry-pick this commit, you will introduce this change
specifically, on the active branch. Switch to main first:
It seems like you copy-pasted "Commit 5". Remember that even though it has
the same commit message, and introduces the same changes, and even
points to the same tree object as the original "Commit 5" in this case - it is
still a different commit object, as it was created with a different timestamp.
Cool!
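To see why a different timestamp is enough to produce a different commit object, here is a rough Python sketch of how a commit's ID is derived in a SHA-1 repository: Git hashes a small text that lists the tree, the parent, the author and the committer with their timestamps, and the message. The tree and parent IDs, the name, and the timestamps below are all made up. I also assume that a cherry-pick keeps the author date and records a new committer date, which is the usual behavior.

```python
import hashlib


def commit_id(tree, parent, who, author_time, committer_time, message):
    """Hash a simplified commit body the way Git hashes commit objects."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who} {author_time} +0000\n"
        f"committer {who} {committer_time} +0000\n"
        f"\n{message}\n"
    ).encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder values, for illustration only.
tree, parent = "a" * 40, "b" * 40
who = "John <john@example.com>"

original = commit_id(tree, parent, who, 1700000000, 1700000000, "Commit 5")
# Same tree, same parent, same message, committed a few minutes later:
copied = commit_id(tree, parent, who, 1700000000, 1700000500, "Commit 5")
print(original == copied)
```

Any difference in that text, be it the timestamp, the parent, or the message, yields a different ID, and therefore a different commit object.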
You can now remove the new branch so it doesn't appear on your history
every time:
To understand the process, I will provide the high level view, and then dive
deeper into each step. The process of rebasing one branch on top of
another branch is as follows:
The process of making new commits with the same change sets as existing
ones is also called "replaying" those commits, a term we have already used.
With git merge you added to the history, while with git rebase you
rewrite history. You create new commit objects. In addition, the result is a
linear history graph - rather than a diverging graph.
In essence, you "copied" the commits that were on paul_branch and that
were introduced after "Commit 4", and "pasted" them on top of
john_branch .
The command is called "rebase", because it changes the base commit of the
branch it's run from. That is, in your case, before running git rebase , the
base of paul_branch was "Commit 4" - as this is where the branch was
"born" (from main ). With rebase , you asked Git to give it another base -
that is, pretend as if it had been born from "Commit 6".
To do that, Git took what used to be "Commit 7", and "replayed" the changes
introduced in this commit onto "Commit 6". Then it created a new commit
object. This object differs from the original "Commit 7" in three aspects:
Notice the last commit here, "Commit 9'". The snapshot it represents (that
is, the tree that it points to) is exactly the same tree you would get by
merging the two branches. The state of the files in your Git repository
would be the same as if you used git merge . It's only the history that is
different, and the commit objects of course.
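You can convince yourself of this with a toy model in Python. It is a sketch under strong assumptions: a snapshot is a dictionary from file name to content, a patch replaces whole files (real patches work on lines), and John and Paul touch different files so no conflicts arise. All file names and contents are hypothetical.

```python
def apply_patch(snapshot, patch):
    """Return a new snapshot with the patch's file changes applied."""
    result = dict(snapshot)
    for path, content in patch.items():
        if content is None:
            result.pop(path, None)   # None marks a deleted file
        else:
            result[path] = content
    return result


def replay(base_snapshot, patches):
    """Replay patches in order; return the snapshot of each new commit."""
    snapshots, current = [], base_snapshot
    for patch in patches:
        current = apply_patch(current, patch)
        snapshots.append(current)
    return snapshots


commit_4 = {"lyrics.md": "verse 1"}
john_patches = [{"john.md": "a"}, {"john.md": "a b"}]                       # Commits 5, 6
paul_patches = [{"paul.md": "x"}, {"paul.md": "x y"}, {"notes.md": "hi"}]  # Commits 7, 8, 9

commit_6 = replay(commit_4, john_patches)[-1]

# Rebase: replay Paul's three patches on top of John's tip (7', 8', 9').
rebased = replay(commit_6, paul_patches)

# Merge: apply Paul's overall change (merge base to his tip) to John's tip.
pauls_overall_change = {path: content for patch in paul_patches
                        for path, content in patch.items()}
merged = apply_patch(commit_6, pauls_overall_change)

print(rebased[-1] == merged)
```

The rebase produced three new snapshots and the merge produced one, yet the last rebased snapshot and the merged snapshot hold the same files.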
Hm.... What would happen if you ran this last command? Consider the
commit history again, after checking out main :
In the previous example, when you only used rebase (without additional
switches), Git replayed all the commits from the common ancestor to the tip
of the current branch.
Undo the last merge by making main point to "Commit 4" again:
Notice that you got to exactly the same history you used to have:
To be clear, "Commit 9" doesn't just disappear when it's not reachable from
the current HEAD . Rather, it's still stored in the object database. And as you
used git reset now to change HEAD to point to this commit, you were able
to retrieve it, and also its parent commits since they are also stored in the
database. Pretty cool, huh?
You will learn more about git reset in the next part, where we discuss
undoing changes in Git.
View the changes that Paul introduced:
git show HEAD~ (same as git show HEAD~1 ) shows the patch introduced by "Commit 8"
And one commit further:
Perhaps Paul doesn't want this kind of history. Rather, he wants it to seem
as if he introduced the changes in "Commit 7" and "Commit 8" as a single
commit.
For that, you can use an interactive rebase. To do that, we add the -i (or
--interactive ) switch to the rebase command:
By running this command, you tell Git to use a new base, "Commit 4". So you
are asking Git to go back to all commits that were introduced after "Commit
4" and that are reachable from the current HEAD , and replay those commits.
For every commit that is replayed, Git asks us what we'd like to do with it:
git rebase -i main prompts you to select what to do with each commit
In this context it's useful to think of a commit as a patch. That is, "Commit 7",
as in "the patch that "Commit 7" introduced on top of its parent".
One option is to use pick . This is the default behavior, which tells Git to
replay the changes introduced in this commit. In this case, if you just leave it
as is - and pick all commits - you will get the same history, and Git won't
even create new commit objects.
As you can see, git rebase -i provides additional options, but we won't
go into all of them in this chapter. If you allow the rebase to run, you will get
prompted to select a commit message for the newly created commit (that is,
the one that introduced the changes of both "Commit 7" and "Commit 8"):
Providing the commit message: Commits 7+8
git rebase grants you unlimited control over the shape of any branch. You
can use it to reorder commits, or to remove incorrect changes, or modify a
change in retrospect. Alternatively, you could perhaps move the base of
your branch onto another commit, any commit that you wish.
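If it helps to think of the interactive rebase as a tiny program, here is a Python sketch of how a list of pick and squash instructions could be processed. It is a simplified model, not how Git is implemented: patches replace whole files, only two actions are supported, and a squashed message is built by joining the two messages with "+" (in practice Git opens your editor instead). The run_todo name is hypothetical.

```python
def run_todo(base, todo):
    """Process (action, message, patch) entries in order.

    Returns the new commits as a list of (message, snapshot) pairs.
    """
    commits = []
    current = dict(base)
    for action, message, patch in todo:
        # Replaying a commit means applying its patch to the current state.
        current = {**current, **patch}
        if action == "pick":
            commits.append((message, current))
        elif action == "squash":
            if not commits:
                raise ValueError("cannot squash without a previous commit")
            # Fold this patch into the previous commit instead of adding one.
            previous_message, _ = commits.pop()
            commits.append((previous_message + "+" + message, current))
        else:
            raise ValueError(f"unsupported action: {action}")
    return commits


base = {"a.txt": "1"}
todo = [
    ("pick", "Commit 7", {"b.txt": "2"}),
    ("squash", "Commit 8", {"b.txt": "3"}),
]
for message, snapshot in run_todo(base, todo):
    print(message, sorted(snapshot))
```

Reordering commits is just reordering the entries of the list, since they are replayed from top to bottom.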
How to Use the --onto Switch of git rebase
Let's consider one more example. Get to main again:
And delete the pointers to paul_branch and john_branch so you don't see
them in the commit graph anymore:
Now, change the file code.py (for example, add a new function) and commit
your changes:
nano code.py
Oh wait, now I realize that I wanted you to make the changes introduced in
"Commit 11" as a part of the new_branch . Ugh. What can you do?
Instead of having "Commit 11" reside only on the main branch, I want it to
be on both the main branch as well as new_branch . Visually, I would want to
move it down the graph here:
To do that, you can use other arguments of git rebase . Specifically, you
can use git rebase --onto , which optionally takes three parameters:
git rebase --onto <new_parent> <old_parent> <until>
That is, you take all commits between old_parent and until , and you
"cut" and "paste" them onto new_parent .
In this case, you'd tell Git that you want to take all the history introduced
between the common ancestor of main and new_branch , which is "Commit
4", and have the new base for that history be "Commit 11". To do that, use:
The history before and after the rebase, "Commit 10" has been "pushed"
Now branch out from feature_branch_1 (this is the mistake you will later
fix):
Modifying 2.py
Modifying 1.py
Try to think about it given the history graph and what you've learned about
the --onto flag for the rebase command.
This tells Git to take the history with "Commit 13" as a base, and change
that base to be "Commit 12" (pointed to by main ) instead.
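The "cut" and "paste" idea behind --onto can be sketched in a few lines of Python. This is a toy model with stated assumptions: the history is linear, each commit has a single parent, the commit names c1 to c5 are hypothetical, and a replayed copy is named by adding an apostrophe. It only tracks which commit sits on top of which, not the file contents.

```python
def commits_between(parent, old_parent, until):
    """Commits after old_parent, up to and including until, oldest first."""
    chain = []
    current = until
    while current != old_parent:
        if current is None:
            raise ValueError("old_parent is not an ancestor of until")
        chain.append(current)
        current = parent.get(current)
    return list(reversed(chain))


def rebase_onto(parent, new_parent, old_parent, until):
    """Return the parent of each replayed copy, starting from new_parent."""
    new_parents = {}
    previous = new_parent
    for commit in commits_between(parent, old_parent, until):
        copy = commit + "'"
        new_parents[copy] = previous
        previous = copy
    return new_parents


# Hypothetical linear history: c1 <- c2 <- c3 <- c4 <- c5
parent = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3", "c5": "c4"}

# Take everything after c3, up to c5, and paste it on top of c1.
print(rebase_onto(parent, "c1", "c3", "c5"))
```

The original commits are left untouched in this model, which mirrors the fact that a rebase creates new commit objects rather than editing old ones.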
It isn't really nice, is it? I mean, I have two commits that are related to one
another, "Commit 17" and "Commit 19" (turning ' s into " s), but they are
split by the unrelated "Commit 18" (where I added a new function). What
can we do? Can you help me?
Intuitively, I want to edit the history here:
I can rebase the history from "Commit 17" to "Commit 19", on top of
"Commit 15". To do that:
After following your advice and running the rebase command (thanks!)
I get the following screen:
Interactive rebase
So what would I do? I want to put "Commit 19" before "Commit 18", so it
comes right after "Commit 17". I can go further and squash them together,
like so:
Now when I get prompted for a commit message, I can provide the message
"Commit 17+19":
Thanks again!
With the upcoming use cases, I strongly suggest you stop reading after I've
introduced each use case, and then try to solve it on your own.
How to Exclude Commits
Say you have this history on another repo:
Before playing around with it, store a tag to "Commit F" so you can get back
to it later:
Now, you actually don't want the changes in "Commit C" and "Commit D" to
be included. You could use an interactive rebase like before and remove
their changes. Or, you could use git rebase --onto again. How would you
You can rebase HEAD on top of "Commit B", where the old parent was
actually "Commit D", and now it should be "Commit B". Consider the history
again:
The history again
Rebasing so that "Commit B" is the base of "Commit E" means "moving"
both "Commit E" and "Commit F", and giving them another base - "Commit
B". Can you come up with the command yourself?
Notice that using the syntax above (exactly as provided) would not move
main to point to the new commit, so the result is a "detached" HEAD . If you
use gg or another tool that displays the history reachable from branches, it
might confuse you:
But if you simply use git log (or my alias git lol ), you will see the
desired history:
The resulting history
I don't know about you, but these kinds of things make me really happy.
By the way, you could omit HEAD from the previous command as this is the
default value for the third parameter. So just using:
Would have the same effect. The last parameter actually tells Git where the
end of the current sequence of commits to rebase is. So the syntax of git
rebase --onto with three arguments is:
git rebase --onto <new_parent> <old_parent> <until>
So, what does this mean in terms of rebase ? Consider the image above.
What commit (or commits) should I rebase, and which commit would be the
new base?
What I want is to take "Commit E", and this commit only, and change its base
to be "Commit B". In other words, to replay the changes introduced in
"Commit E" onto "Commit B".
Notice that rebase moved HEAD , but not any other named reference (such
as a branch or a tag). In other words, you are in a detached HEAD state. So
here too, using gg or another tool that displays the history reachable from
branches and tags might confuse you. You can use git log (or my alias git
lol ) to display the reachable history from HEAD .
Awesome!
```
This is a sample file
```
def new_feature():
    print('new feature')
Say you are trying to rebase "Commit 12" onto another commit. If, for some
reason, these context lines don't exist as they do in the patch on the commit
you are rebasing onto, then you will have a conflict.
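Here is a minimal Python sketch of why missing context lines lead to a conflict. It is a deliberate simplification: a patch is reduced to a single hunk that inserts lines between some context lines, and the hunk is rejected if that exact context is not found. Real Git is more forgiving than this, so treat apply_hunk and PatchConflict as hypothetical names for illustration only.

```python
class PatchConflict(Exception):
    """Raised when the context lines of a hunk cannot be found."""


def apply_hunk(lines, context_before, added, context_after):
    """Insert `added` between the given context lines of `lines`."""
    needle = context_before + context_after
    for start in range(len(lines) - len(needle) + 1):
        if lines[start:start + len(needle)] == needle:
            cut = start + len(context_before)
            return lines[:cut] + added + lines[cut:]
    raise PatchConflict("context lines not found in the target")


added = ["def new_feature():", "    print('new feature')"]
context = ["This is a sample file"]

# The target still contains the context line, so the hunk applies cleanly.
print(apply_hunk(["This is a sample file"], context, added, []))

# The target was changed in the meantime, so the context no longer matches.
try:
    apply_hunk(["This is another file"], context, added, [])
except PatchConflict as error:
    print("conflict:", error)
```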
But, as you now know, they are very different in how they operate. While
merging results in a diverged history, rebasing results in a linear history.
Conflicts are possible in both cases. And there is one more column
described in the table above that requires some close attention.
Now that you know what "Git rebase" is, and how to use interactive rebase
or rebase --onto , I hope you agree that git rebase is a super powerful tool.
Yet, it has one huge drawback when compared with merging.
This means that you should not rebase commits that exist outside your local
copy of the repository, and that other people may have based their commits
on.
In other words, if the only commits in question are those you created locally
- go ahead, use rebase, go wild.
But if the commits have been pushed, this can lead to a huge problem - as
someone else may rely on these commits that you later overwrite, and then
you and they will have different versions of the repository.
This is unlike merge which, as we have seen, does not modify history.
For example, consider the last case where we rebased and resulted in this
history:
Now, assume that I have already pushed this branch to the remote. And
after I had pushed the branch, another developer pulled it and branched out
from "Commit C". The other developer didn't know that meanwhile, I was
locally rebasing my branch, and would later push it again.
I will not elaborate on what exactly this causes in this book, as my main
message is that you should definitely avoid such cases. If you're interested
in what would actually happen, I'll leave a link to a useful resource in the
additional references. For now, let's summarize what we have covered.
I hope I was able to convince you that git rebase is powerful - but also
that it is quite simple once you get the gist. It is a tool you can use to "copy-
paste" commits (or, more accurately, patches). And it's a useful tool to have
under your belt. In essence, git rebase takes the patches introduced by
commits, and replays them on another commit. As described in this chapter,
this is useful in many different scenarios.
Part 2 - Summary
In this part you learned about branching and integrating changes in Git.
You learned what a diff is, and the difference between a diff and a patch.
You also learned how the output of git diff is constructed.
Then, you got an extensive overview of merging with Git. You learned that
merging is the process of combining the recent changes from several
branches into a single new commit. The new commit has multiple parents -
those commits which had been the tips of the branches that were merged.
In most cases, merging combines the changes from two branches, and the
resulting merge commit then has two parents - one from each branch.
First, Git locates the merge base. As a reminder, this is the most recent
commit that is reachable from both branches.
Second, Git calculates two diffs - one diff from the merge base to the
first branch, and another diff from the merge base to the second
branch. Git generates patches based on those diffs.
Third and last, Git applies both patches to the merge base using a 3-
way merge algorithm. The result is the state of the new merge
commit.
You saw the output of git diff when we are in a conflicting state, and how
to resolve conflicts either manually or with VS Code.
Ultimately, you got to know Git rebase. You saw that git rebase is
powerful - but also that it is quite simple once you understand what it does.
It is a tool to "copy-paste" commits (or, more accurately, patches).
Both git merge and git rebase are used to integrate changes introduced
in different histories.
Yet, they differ in how they operate. While merging results in a diverged
history, rebasing results in a linear history. git rebase changes the history,
whereas git merge adds to the existing history.
With this deep understanding of diffs, patches, merge and rebase, you
should feel confident introducing changes to a git repository.
The next part will focus on what happens when things go wrong - how you
can change history (with or without git rebase ), or find "lost" commits.
Perhaps you committed to the wrong branch. Perhaps you lost some code
that you had written. Perhaps you committed something that you didn't
mean to.
This part will give you the tools to rewrite history with confidence, thereby
"undoing" all kinds of changes in Git.
Just like the other parts of the book, this part will be practical yet in-depth -
so instead of providing you with a list of things to do when things go wrong,
2. The staging area (index) which holds the tree for the next commit.
Note regarding the drawing conventions I use: I include .git within the
working directory, to remind you that it is a folder within the project's
folder on the filesystem. The .git folder actually contains the objects and
references of the repository, as explained in chapter 4.
Hands-on Demonstration
Use git init to initialize a new repository. Write some text into a file
called 1.txt :
mkdir my_repo
cd my_repo
git init
echo Hello world > 1.txt
Out of the three tree states described above, where is 1.txt now?
Notice that once you stage 1.txt , Git creates a blob object with the
content of this file, and adds it to the internal object database (within .git
folder), as covered in chapter 3 and chapter 4. I do not draw it as part of the
"repository" as in this representation, the "repository" refers to a tree of
commits and their references, and this blob has not been a part of any
commit.
When considering the diagrams, notice that we only have a single copy of
the file 1.txt on disk, and a corresponding blob object in Git's object
database. The "repository" tree now shows this file as it is part of the active
commit - that is, the commit object "Commit 1" points to a tree that points
to the blob with the contents of 1.txt , the same blob that the index is
pointing to.
For more information about the objects in Git (such as commits and trees),
refer to chapter 1.
The file 2.txt is in the working dir and the index after staging it with git add
Next, commit:
git commit -m "Commit 2"
A new commit object has been created, at first - main still points to the previous commit
Second, git commit moves the pointer of the active branch — in our case,
that would be main , to point to the newly created commit object.
git commit also updates the active branch to point to the newly created commit object
The syntax HEAD~1 refers to the first parent of HEAD . Consider a case
where I had more than one commit in the commit-graph, say "Commit 3"
This command asks Git to change whatever HEAD is pointing to. (Note: In
the diagrams below, I use *HEAD for "whatever HEAD is pointing to".) In our
example, HEAD is pointing to main . So Git will only change the pointer of
main to point to HEAD~1 . That is, main will point to "Commit 1".
However, this command did not affect the state of the index or the working
tree. So if you use git status you will see that 2.txt is staged, just like
before you ran git commit :
git status shows that 2.txt is in the index, but not in the active commit
What about git log ? It will start from HEAD , go to main , and then to
"Commit 1":
Notice that this means that "Commit 2" is no longer reachable from our
history.
No, it's not deleted. It still resides within Git's internal object
database.
If you push the current history now, by using git push , Git will not push
"Commit 2" to the remote server (as it is not reachable from the current
HEAD ), but the commit object still exists on your local copy of the repository.
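The difference between "exists" and "reachable" can be shown with a short Python sketch. It models the object database as a dictionary from each commit to its parents, and a branch as a name pointing to a commit. This is only an illustration of the idea, with hypothetical helper names, and it moves the branch by hand the way git reset --soft HEAD~1 would.

```python
def reachable(parents, start):
    """All commits you can get to from `start` by following parents."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# The "object database": every commit ever created, with its parents.
parents = {"Commit 1": [], "Commit 2": ["Commit 1"]}
branches = {"main": "Commit 2"}
head = "main"   # HEAD points to a branch, which points to a commit

# Move the active branch to the first parent of its current commit.
branches[head] = parents[branches[head]][0]

history = reachable(parents, branches[head])
print("Commit 2" in parents)   # still stored
print("Commit 2" in history)   # no longer part of the history
```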
Now, commit again - and use the commit message of "Commit 2.1" to
I omitted "Commit 2" as it is not reachable from HEAD , even though its
object exists in Git's internal object database.
Why are "Commit 2" and "Commit 2.1" different? Even if we used the same
commit message, and even though they point to the same tree object (of the
root folder consisting of 1.txt and 2.txt ), they still have different
timestamps, as they were created at different times. Both "Commit 2" and
"Commit 2.1" now point to "Commit 1", but only "Commit 2.1" is reachable
from HEAD .
This command starts the same as git reset --soft HEAD~1 . That is, the
command takes the pointer of whatever HEAD is pointing to now, which is
the main branch, and sets it to HEAD~1 , in our example - "Commit 1".
The first step of git reset --mixed is the same as git reset --soft
Next, Git goes further, effectively undoing the changes we made to the
index. That is, changing the index so that it matches with the current HEAD ,
the new HEAD after setting it in the first step.
If we ran git reset --mixed HEAD~1 , then HEAD ( main ) would be set to
HEAD~1 ("Commit 1"), and then Git would match the index to the state of
"Commit 1" - in this case, it means that 2.txt would no longer be part of
the index.
The second step of git reset --mixed is to match the index with the new HEAD
It's time to create a new commit with the state of the original "Commit 2".
This time you need to stage 2.txt again before creating it:
Again, Git starts with the --soft stage, setting whatever HEAD is pointing
to ( main ), to HEAD~1 ("Commit 1").
The first step of git reset --hard is the same as git reset --soft
Next, moving on to the --mixed stage, matching the index with HEAD . That
is, Git undoes the staging of 2.txt .
The second step of git reset --hard is the same as git reset --mixed
Next comes the --hard step, where Git goes even further and matches the
working dir with the state of the index. In this case, it means removing
2.txt from the working dir as well.
The third step of git reset --hard matches the state of the working dir with that of the
index
So to introduce a change to Git, you have three steps: you change the
working dir, then the index (the staging area), and then you commit a new
snapshot with those changes. To undo these changes:
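The three modes of git reset each undo one more of these steps. The following is a minimal sketch in a throwaway repository; the tag name commit-2 is hypothetical and only serves to get back to "Commit 2" between the experiments.

```shell
# Throwaway repository; file names and the tag name are assumptions for this demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo two > 2.txt && git add 2.txt && git commit -q -m "Commit 2"
git tag commit-2

git reset -q --soft HEAD~1          # undo the commit only
after_soft=$(git status --short)
echo "$after_soft"                  # A  2.txt  (still staged)

git reset -q --soft commit-2        # back to "Commit 2"
git reset -q --mixed HEAD~1         # undo the commit and the staging
after_mixed=$(git status --short)
echo "$after_mixed"                 # ?? 2.txt  (untracked, but still on disk)

git reset -q --hard commit-2        # back to "Commit 2" again
git reset -q --hard HEAD~1          # undo the commit, the staging and the working dir change
ls                                  # only 1.txt is left
```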
Also, save a tag so that you can get back to this commit later if needed:
Oh, oops!
What I actually wanted you to do is write some more love words in this file
before committing it.
What can you do?
Well, one way to overcome this would be to use git reset --mixed
HEAD~1 , effectively undoing both the committing and the staging actions
you took:
Well done!
You got this clear, nice history of "Commit 2.4" pointing to "Commit 1".
This tool is super, super useful, and you can accomplish almost anything
with it. It's not always the most convenient tool to use, but it's capable of
solving almost any rewriting-history scenario if you use it carefully.
For beginners, I recommend using only git reset almost any time you
want to undo something in Git. Once you feel comfortable with it, move on
to other tools.
Scenario #2
Let us consider another case.
(Note: In the drawing I omitted the files from the repository to avoid clutter.
Commit 3 includes 1.txt , love.txt and new.txt at this stage).
Oops. Actually, that's a mistake. You were on main , and I wanted you to
create this commit on a feature branch. My bad.
There are two important tools I want you to take away from this chapter.
The second is git reset . The first, and by far the more important one, is to
whiteboard the current state versus the state you want to be in.
For this scenario, the current state and the desired state look like so:
Scenario #2: current-vs-desired states
(Note: In following diagrams, I will refer to the current state as the "original"
state - before starting the process of rewriting history.)
1. main points to "Commit 3" (the blue one) in the current state, but to
"Commit 2.4" in the desired state.
If you can draw this and you know how to use git reset , you can definitely
get yourself out of this situation.
So again, the most important thing is to take a breath and draw this out.
Observing the drawing above, how do you get from the current state to the
desired one?
There are a few different ways of course, but I will present one option only
for each scenario. Feel free to play around with other options as well.
You can start by using git reset --soft HEAD~1 . This would set main to
point to the previous commit, "Commit 2.4":
Changing main : "Commit 3" is still there, just not reachable from HEAD
Peeking at the current-vs-desired diagram again, you can see that you need
a new branch, right? You can use git switch -c feature_branch for it, or
git checkout -b feature_branch (which does the same thing):
Since you used git reset --soft , you didn't change the index, so it
currently has exactly the state you want to commit - how convenient! You
can simply commit to feature_branch :
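Put together, Scenario #2 can be sketched as follows. This assumes a throwaway repository that recreates the three commits; the file contents and the message "Commit 3.1" for the re-created commit are illustrative.

```shell
# Throwaway repository that mimics the scenario; contents are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo love > love.txt && git add love.txt && git commit -q -m "Commit 2.4"
echo new > new.txt && git add new.txt && git commit -q -m "Commit 3"   # made on main by mistake

git reset -q --soft HEAD~1        # main goes back to "Commit 2.4", new.txt stays staged
git switch -q -c feature_branch   # or: git checkout -b feature_branch
git commit -q -m "Commit 3.1"     # the index already holds what we want to commit

git log --oneline --all --decorate
```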
Scenario #3
Ready to apply your knowledge to additional cases?
The history, as well as the state of the index and the working dir after creating "Commit 4"
Oh, oops, actually I wanted you to create two separate commits, one with
each change...
Following this command, the index no longer includes those two changes,
but they're both still in your file system:
Committing separately
Nice!
Scenario #4
To clear up the state, switch to main and use reset --hard to make it point
to "Commit 3.1", while setting the index and the working dir to the state of
"Commit 3.1":
A new commit
Oops...
I'll give you a hint. The answer is really short and really easy. What do we do
first?
No, not reset . We draw. That's the first thing to do, as it would make
everything else so much easier. So this is the current state:
The new commit on main appears blue
What would be the easiest way to get from the current state to the desired
state?
One way would be to use git reset as you did before, but there is another
way that I would like you to try.
Note that the following commands assume that a branch named existing
exists in your repository, yet you haven't created it earlier. To match a state
where this branch actually exists, you can use the following commands:
Now your history should match the one shown in the picture with the
caption "We want the "blue" commit to be on another, existing , branch".
To ask Git to take the changes introduced between a commit and its parent
commit and just apply these changes on the active branch, you can use git
cherry-pick , a command we introduced in chapter 8. This command takes
the changes introduced in the specified revision and applies them to the
state of the active commit. Run:
You can specify the SHA-1 identifier of the desired commit, but you can also
use git cherry-pick main , as the commit whose changes you are applying
is the one main is pointing to.
git cherry-pick also creates a new commit object, and updates the active
branch to point to this new object, so the resulting state would be:
The result after using git cherry-pick
You made good progress - the desired commit is now on the existing
branch! But we don't want these changes to exist on the main branch. git
cherry-pick only applied the changes to the existing branch. How can you
remove them from main ?
One way would be to switch back to main , and then reset it:
Also, note that you can ask Git to cherry-pick the changes introduced in
any commit, not only commits referenced by a branch.
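The cherry-pick-then-reset flow of this scenario can be sketched like so. This is a sketch under assumptions: a throwaway repository, made-up file names, and a branch called existing that already carries a commit of its own.

```shell
# Throwaway repository; file names and commit messages are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > 1.txt && git add 1.txt && git commit -q -m "Commit 1"

git switch -q -c existing
echo other > other.txt && git add other.txt && git commit -q -m "Commit on existing"
git switch -q main
echo fix > fix.txt && git add fix.txt && git commit -q -m "Blue commit"  # made on main by mistake

git switch -q existing
git cherry-pick main > /dev/null   # replay the change introduced by main's tip
git switch -q main
git reset -q --hard HEAD~1         # remove the commit from main

git log --oneline --all --decorate
```

After this, fix.txt exists only on existing, and main points to "Commit 1" again.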
git reset --hard <commit> , which goes through the --soft and
--mixed stages, and then sets the state of the working dir to match
that of the index.
You then applied your knowledge about git reset to solve some real-life
issues that arise when using Git.
In future chapters, we will cover additional Git commands and how they
can help us solve all kinds of undesired situations.
Chapter 10 - Additional Tools for Undoing
Changes
In the previous chapter, you met git reset . Indeed, git reset is a super
powerful tool, and I highly recommend using it until you feel completely
comfortable with it.
Yet, git reset is not the only tool at our disposal. Sometimes it is not
the most convenient tool to use. At other times, it's just not enough. This
short chapter touches on a few tools that are helpful for undoing changes in
Git.
And then I realized I didn't want you to commit it at that state, but rather -
write some more love words in this file before committing it.
To match this state, simply check out the tag you created, which points to
"Commit 2.3":
git checkout scenario-1
In the previous chapter, when we introduced git reset , you solved this
issue by using git reset --mixed HEAD~1 , effectively undoing both the
committing and the staging actions you took.
Now I would like you to consider another approach. Keep working at the
state of the last introduced commit ("Commit 2.3", referenced by the tag
"scenario-1"), and make the changes you want:
Now, you can use git commit with the --amend switch, which tells it to
override the commit HEAD is pointing to. Actually, it will create another,
new commit, pointing to HEAD~1 ("Commit 1" in our example), and make
HEAD point to this newly created commit. By providing the -m argument
you can specify a new commit message as well:
The state after using git commit --amend (Commit "2.3" is unreachable and thus not included
in the drawing)
This tool is useful when you want to quickly override the last commit you
created. Indeed, you could use git reset to accomplish the same thing,
but you can view git commit --amend as a more convenient shortcut.
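Here is a minimal sketch of the amend flow, assuming a throwaway repository; the file contents are made up for the demo.

```shell
# Throwaway repository; contents are assumptions for this demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo one > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 2.3"
before=$(git rev-parse HEAD)

echo "I really love Git" >> love.txt    # some more love words
git add love.txt
git commit -q --amend -m "Commit 2.4"   # replaces "Commit 2.3" as the tip of main
after=$(git rev-parse HEAD)

git log --oneline                       # "Commit 2.4" on top of "Commit 1"
```

Note that before and after are different commit objects with the same parent, which matches the description above: nothing is modified in place.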
git revert
Okay, so another day, another problem.
Um, oops …
I just noticed something. I had a typo there. I wrote "This is more tezt"
instead of "This is more text". Whoops. So what's the big problem now? I
pushed, which means that someone else might have already pulled those
changes.
Once you push the change, you need to be certain no one else has fetched
those changes if you are going to rewrite history.
Alternatively, you can use another tool called git revert . This command
takes the commit you're providing it with and computes the diff from its
parent commit, just like git cherry-pick , but this time, it computes the
reverse changes. That is, if in the specified commit you added a line, the
reverse would delete the line, and vice versa.
In our case we are reverting "Commit 3", so the reverse would be to delete
the line "This is more tezt" from love.txt . Since "Commit 3" is referenced
by main and HEAD , we can use any of these named references in this
command:
git revert created a new commit object, which means it's an addition to
the history. By using git revert , you didn't rewrite history. You admitted
your past mistake, and this commit is an acknowledgment that you made a
mistake and now you fixed it.
Some would say it's the more mature way. Some would say it's not as clean a
history as you would get if you used git reset to rewrite the previous
commit. But this is a way to avoid rewriting history.
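As a rough sketch, assuming a throwaway repository with the typo commit on top, reverting looks like this (the first line of love.txt is made up):

```shell
# Throwaway repository; the initial file content is an assumption.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "This is some text" > love.txt && git add love.txt && git commit -q -m "Commit 2"
echo "This is more tezt" >> love.txt && git commit -q -am "Commit 3"

git revert --no-edit HEAD > /dev/null   # HEAD and main name the same commit here
git log --oneline                       # three commits: the revert sits on top of "Commit 3"
cat love.txt                            # the line with the typo is gone
```

The --no-edit switch simply accepts the default revert message instead of opening an editor.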
You can use git revert to revert a commit other than HEAD . Say that you
want to revert the parent of HEAD - you can use:
Notice that since Git will apply the reverse patch of the previous patch - this
operation might fail, as the patch may no longer apply and you might get a
conflict.
For that, you would usually rebase on a single branch, and use interactive
rebase. Consider again this example covered in chapter 8, where I worked
from feature_branch_2 , and specifically edited the file code.py . I started
by changing all strings to be wrapped by double quotes rather than single
quotes:
And now I realized I actually forgot to change the single quotes to double
quotes wrapping the __main__ (as you might have noticed), so I did that
too:
As explained in chapter 8, I got to a state with two commits that are related
to one another, "Commit 17" and "Commit 19" (turning ' s into " s), but
they are split by the unrelated "Commit 18" (where I added a new function).
This is a classic case where git rebase would come in handy, to undo the
local changes before pushing a clean history.
I can rebase the history from "Commit 17" to "Commit 19", on top of
"Commit 15". To do that:
Interactive rebase
So what would I do? I want to put "Commit 19" before "Commit 18", so it
comes right after "Commit 17". I can go further and squash them together,
like so:
Now when I get prompted for a commit message, I can provide the message
"Commit 17+19":
The syntax used above, git rebase --interactive --onto <COMMIT X>
<COMMIT X> would be the most commonly used syntax by those who use
rebase regularly. The state of mind these developers usually have is to
create atomic commits while working, all the time, without being scared to
change them later. Then, before pushing their changes, they would rebase
the entire set of changes since the last push , and rearrange it so the history
becomes coherent.
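If you want to replay this reordering without typing into an editor, here is one possible sketch. It makes several assumptions: a throwaway repository, simplified file contents instead of the real code.py , the unrelated "Commit 18" touching a separate file so that no conflict arises, and a prepared todo list that is copied in through the GIT_SEQUENCE_EDITOR environment variable. In everyday work you would just edit the todo list by hand.

```shell
# Throwaway repository; contents are simplified stand-ins for the example in the text.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
printf 'alpha\nbeta\n' > code.py && git add code.py && git commit -q -m "Commit 15"
printf 'ALPHA\nbeta\n' > code.py && git commit -q -am "Commit 17"     # first half of the cleanup
echo 'def new_function(): pass' > other.py && git add other.py && git commit -q -m "Commit 18"
printf 'ALPHA\nBETA\n' > code.py && git commit -q -am "Commit 19"     # second half of the cleanup

c15=$(git rev-parse HEAD~3); c17=$(git rev-parse HEAD~2)
c18=$(git rev-parse HEAD~1); c19=$(git rev-parse HEAD)

# The todo list we would otherwise type in the editor: 19 moves up and is squashed into 17.
todo=$(mktemp)
printf 'pick %s\nsquash %s\npick %s\n' "$c17" "$c19" "$c18" > "$todo"

# GIT_SEQUENCE_EDITOR replaces the todo editor; GIT_EDITOR=true accepts the combined message.
GIT_SEQUENCE_EDITOR="cp $todo" GIT_EDITOR=true \
  git rebase -q --interactive --onto "$c15" "$c15"

git log --oneline      # "Commit 18" on top of the squashed commit, on top of "Commit 15"
```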
git reflog
Time to consider a more startling case.
Get some work done, write some code, and add it to love.txt . Stage this
change, and commit it:
echo lots of work >> love.txt
git add love.txt
git commit -m "Commit 3.2"
(I'm using "Commit 3.2" to indicate that this is not the same commit as
"Commit 3" we used when explaining git revert .)
I did the same on my machine, and I used the Up arrow key on my keyboard
to scroll back to previous commands, and then I hit Enter , and… Wow.
Whoops.
What actually happened? As you learned in the previous chapter, Git moved
the pointer to HEAD~1 , so the last commit, with all of my precious work, is
not reachable from the current history. Git also removed all the changes
from the staging area, and then matched the working dir to the state of the
staging area.
That is, everything matches this state where my work is… gone.
But, really, is there a reason to freak out? Not really… We're relaxed people.
What do we do? Well, intuitively, is the commit really, really gone?
No. Why not? It still exists inside the internal database of Git.
If I only knew where that is, I would know the SHA-1 value that identifies
this commit, and we could restore it. I could even undo the undoing, and
reset back to this commit.
Actually, the only thing I really need here is the SHA-1 of the "deleted"
commit.
Now the question is, how do I find it? Would git log be useful?
Well, not really. git log would go to HEAD , which points to main , which
points to the parent commit of the commit we are looking for. Then, git
log would trace back through the parent chain, which does not include the
commit with my precious work.
git log doesn't help in this case
Thankfully, the very smart people who created Git also created a backup
plan for us, and that is called the reflog .
While you work with Git, whenever you change HEAD , which you can do by
using git reset , but also other commands like git switch or git
checkout , Git adds an entry to the reflog .
We can also refer to it by its "nickname" - HEAD@{1} . Similar to the way Git
uses HEAD~1 to get to the first parent of HEAD , and HEAD~2 to refer to the
grandparent of HEAD (the first parent of its first parent) and so on, Git uses
HEAD@{1} to refer to the previous reflog entry of HEAD , that is, where HEAD
pointed to in the previous step.
Note: In case you are using Windows, you may need to wrap it with
quotation marks - like so:
Another way to view the reflog is by using git log -g , which asks git
log to actually consider the reflog :
You can see in the output of git log -g that the reflog 's entry HEAD@{0} ,
just like HEAD , points to main , which points to "Commit 2". But the parent
of that entry in the reflog points to "Commit 3".
So to get back to "Commit 3", you can just use git reset --hard HEAD@{1}
(or the SHA-1 value of "Commit 3"):
What would happen if I used this command again, and ran
git reset --hard HEAD@{1} ?
Git would set HEAD to where HEAD was pointing before the last reset ,
meaning to "Commit 2". We can keep going all day:
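The whole accident and recovery can be sketched in a throwaway repository. The file contents are illustrative, and the variable names are only there to make the result easy to inspect.

```shell
# Throwaway repository; contents and variable names are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 2"
echo "lots of work" >> love.txt && git add love.txt && git commit -q -m "Commit 3.2"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # the accident: "Commit 3.2" is no longer reachable
git reflog                        # HEAD@{1} is where HEAD pointed before the reset
git reset -q --hard "HEAD@{1}"    # undo the undoing
restored=$(git rev-parse HEAD)

git reset -q --hard "HEAD@{1}"    # once more: back to "Commit 2"
toggled=$(git log -1 --format=%s)
git reset -q --hard "HEAD@{1}"    # and forward again to "Commit 3.2"
git log --oneline
```

Each reset adds a new reflog entry, which is why HEAD@{1} keeps alternating between the two commits.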
In this chapter, you extended your toolbox for undoing changes in Git with a
few new commands:
git commit --amend - which "overrides" the last commit with the
state of the index. Mostly useful when you just committed
something and want to modify that last commit.
git rebase - which you already know from chapter 8, and is useful
for rewriting the history of multiple commits, especially before
pushing them.
git reflog (and git log -g ) - which tracks all changes to HEAD , so
you might find the SHA-1 value of a commit you need to get back to.
The most important tool, even more important than the tools I just listed, is
to whiteboard the current situation vs the desired one. Trust me on this, it
will make every situation seem less daunting and the solution more clear.
There are additional tools that allow you to reverse changes in Git (I will
provide links in the appendix), but the collection of tools covered here
should prepare you to tackle any challenge with confidence.
Chapter 11 - Exercises
This chapter includes a few exercises to deepen your understanding of the
tools you learned in Part 3. The full version of this book also includes
detailed solutions for each.
https://github.com/Omerr/undo-exercises.git
Each exercise exists on a branch with the name exercise_XX , so Exercise 1
is found on branch exercise_01 , Exercise 2 is found on branch
exercise_02 and so on.
Note: As explained in previous chapters, if you work with commits that can
be found on a remote server (which you are in this case, as you are using my
repository "undo-exercises"), you should probably use git revert instead
of git reset . Similar to git rebase , the command git reset also
rewrites history - and thus you should refrain from using it on commits that
others may have relied on.
For the purposes of these exercises, you can assume no one else has cloned
or pulled code from the remote repository. Just remember - in real life, you
should probably use git revert instead of commands that rewrite history
in such cases.
Exercise 1
On branch exercise_01 , consider the file hello.txt :
This file includes a typo (in the last character). Find the commit that
introduced this typo.
Exercise (1a)
Remove this commit from the reachable history using git reset (with the
right arguments), fix the typo, and commit again. Consider your history.
Exercise (1b)
Remove the faulty commit using git commit --amend , and get to the same
state of the history as in the end of exercise (1a).
Exercise (1c)
Revert the faulty commit using git revert and fix the typo. Consider
your history.
Exercise (1d)
Using git rebase , get to the same state as in the end of exercise (1a).
Exercise 2
Switch to exercise_02 , and consider the contents of exercise_02.txt :
git lol
Use the tools you've acquired to create a history where the creation of
exercise_02.txt is all done in a single commit.
Exercise 3
Consider the history on branch exercise_03 :
Fix these issues, but rely on the changes of each original commit. The
resulting history should look like so:
Exercise 4
This exercise actually consists of three branches: exercise_04 ,
exercise_04_a , and exercise_04_b .
To see the history of these branches without others, use the following
syntax:
Good luck!
This part relies on the basics you acquired in the previous parts, and covers
specific commands and options that you may find useful. Given your
understanding of how Git works, having these small tools can make you a
real pro in Gitting things done.
Most developers use git log , but few use it effectively. In this chapter you will
learn useful tweaks for making the most of git log . Once you feel
comfortable with the different switches of this command, it will be a game
changer in your day to day work with Git.
Thinking about it, git log encompasses the essence of every version
control system - that is, to record changes in versions. You record versions
so that you can consider the history of your project - perhaps revert or
apply specific changes, or switch to a different point in time and test
things there. Perhaps you would like to know who contributed a certain
piece of code or when they did that.
While Git does preserve this information by using commit objects, which
also point to their parent commits, and references to commit objects (such
as branches or HEAD ), this storing of versions is not enough. Without being
able to find the relevant commit you would like to consider, or gather the
relevant information about it, having this data stored is pretty useless.
You can think of your commit objects as different books that pile up in a
huge stack, or in a library, filling long shelves. The information you might
need is in these books, but if you don't have an index - a way to know in
which book the information you seek lies, or where this book is located
within the library - you wouldn't be able to make much use of it. git log is
this indexing of your library - it's a way to find the relevant commits and the
information about them.
The useful arguments for git log that you will learn in this chapter either
format how commits are displayed in the log, or filter specific commits.
git lol , an alias which I have used throughout the book, uses some of
these switches, as I will demonstrate. Feel free to tweak this alias (or create
another from scratch) after reading this chapter.
Filtering Commits
Consider the default output of git log :
The log starts from HEAD , and follows the parent chain.
You can specify multiple revisions for git log - if you write git log
branch_1 branch_2 , you ask git log to include every commit that is
reachable from branch_1 or branch_2 (or both).
git log will exclude any commits that are reachable from revisions
preceded by a ^ .
asks git log to include every commit that is reachable from branch_1 , but
not those reachable from branch_2 .
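A small sketch of both forms, assuming a throwaway repository with two diverging branches (the branch names follow the text, the commits are made up):

```shell
# Throwaway repository; commits and file names are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "Commit 1"
git branch branch_2
git switch -q -c branch_1
echo a > a.txt && git add a.txt && git commit -q -m "Only on branch_1"
git switch -q branch_2
echo b > b.txt && git add b.txt && git commit -q -m "Only on branch_2"

git log --oneline branch_1 branch_2     # union: all three commits
git log --oneline branch_1 ^branch_2    # only the commit that is not reachable from branch_2
```

Depending on your shell, you may need to quote the ^ character.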
Consider the history when I use git log feature_branch_1 on this repo:
Indeed, git log outputs only "Commit 13" and "Commit 16", which are
reachable from feature_branch_1 but not from main .
By Author
If you know you are looking for a commit that a specific person has
authored, you can filter these commits by using that user's name or email,
like so:
git log --author="Name"
You can use regular expressions to look for author names that match a
specific pattern, for example:
By Date
When you know that the change you are looking for has been committed
within a specific timeframe, you can use --before or --after to filter
commits from that timeframe.
For example, to get all commits introduced after April 12th, 2023
(inclusive), use:
By Paths
You can ask git log to only show commits where changes to files in specific
paths have been introduced. Notice that this does not mean any commit
that points to a tree that includes the files in question, but rather that if we
compute the difference between the commit in question and its parent, we
would see that at least one of the paths has been modified.
to find all commits that are reachable from any named pointer, or HEAD , and
introduce a change to 1.py . You can specify multiple paths:
The previous command will make git log include reachable commits that
introduced a change to 1.py or 2.py (or both).
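Here is a short sketch of path filtering in a throwaway repository. The third file, README , is an assumption added only to show a commit that gets filtered out.

```shell
# Throwaway repository; README and the commit messages are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo 1 > 1.py && git add 1.py && git commit -q -m "Add 1.py"
echo 2 > 2.py && git add 2.py && git commit -q -m "Add 2.py"
echo docs > README && git add README && git commit -q -m "Add README"

git log --oneline -- 1.py         # only "Add 1.py"
git log --oneline -- 1.py 2.py    # "Add 2.py" and "Add 1.py", but not "Add README"
```

The -- separates revisions from paths, so Git never confuses a file name with a branch name.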
will include commits reachable from HEAD that include a change to any file
in the root directory whose name ends with a .py . To look for any file
whose name ends with .py , you can use:
git log -- **/*.py
By Commit Message
If you know the commit message (or parts of it) of the commit you are
searching for, you can use the --grep switch of git log , for example:
By Diff Content
This one is super useful, and it saved me countless times. By using git log
-S , you can search for commits that introduce or remove a particular line of
source code.
This comes in handy, for example, when you know you have created
something in the repo, but you don't know where it is now. You can't find it
anywhere on your filesystem (it's not in HEAD ), and you know it must be
there - lurking somewhere in this library (bunch of commits) that you have.
Say I remember I wrote a line with the text Git is awesome , but I can't find
it now. I could run:
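As a sketch, assuming a throwaway repository where the line was added in one commit and removed in a later one (file name and messages are made up):

```shell
# Throwaway repository; notes.txt and the commit messages are assumptions.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "Git is awesome" > notes.txt && git add notes.txt && git commit -q -m "Add notes"
echo "Something else" > notes.txt && git commit -q -am "Rewrite notes"   # the line disappears here
echo "Even more" >> notes.txt && git commit -q -am "Extend notes"

git log --oneline -S "Git is awesome"   # lists "Rewrite notes" and "Add notes"
```

"Extend notes" is not listed, because it neither adds nor removes an occurrence of the text.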
Formatting Log
Consider the default output of git log again:
The log starts from HEAD , and follows the parent chain.
Each log entry begins with a line starting with commit and then the SHA-1
of the commit, perhaps followed by additional pointers that point to this
commit.
It is then followed by the author, date, and commit message.
--oneline
The main difficulty with the default output of git log is that it is hard to
understand a history with more than a few commits, as you simply don't see
them all.
In the output of git log shown before, only four commit objects appeared
on my screen. Using git log --oneline provides a more concise view,
showing the SHA-1 of the commit, next to its message, and named
references if relevant:
If you wish to omit the named references, you can add the --no-decorate
switch:
To explicitly ask for git log to show decorations, you can use
git log --decorate .
--graph
git log --oneline shows a compact representation. That is great when
we have a linear history, perhaps on a single branch. But what happens
when we have multiple branches, that may diverge from one another?
You can actually see that feature_branch_1 branched from main (as
"Commit 12", main , is the parent of "Commit 13"), and also that
feature_branch_2 branched from main (as the parent of "Commit 14" is
also "Commit 12").
The * symbol tells us which branch a certain commit is "on", so you can
know for sure that "Commit 13" is on feature_branch_1 , and not
feature_branch_2 .
--pretty=format
The above result is already very useful! Yet, it lacks a few things. We don't
know the author or the time of the commit. These two details
were included in the default output of git log , which was very long.
Perhaps we can add them in a more compact way?
In the following command, the %s , %an and %cd placeholders are replaced
by the commit's subject (message), author name, and the commit's date,
respectively.
git log --oneline --graph feature_branch_1 feature_branch_2 --pretty=format:"%s
That's useful, but not really great to look at. We can then use other
formatting tricks, specifically %C(color) that will switch the color to
color , until reaching a %Creset that resets the color. To make the author's
name yellow, you can use:
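One way such a command could look is sketched below. The exact format string is my own guess at a reasonable layout, not necessarily the one used for the screenshots in this book, and the repository is a throwaway one.

```shell
# Throwaway repository; the format string is an assumption, tweak it freely.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "Commit 1"

# %h: short hash, %s: subject, %an: author name (in yellow), %cd: commit date
git log --graph --date=short --pretty=format:"%h %s %C(yellow)%an%Creset %cd"
```

The color only shows up when Git considers colors enabled, which is typically the case when the output goes to a terminal.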
abbrev-commit
You already know --graph , which makes the output include an ASCII
graph.
--abbrev-commit uses a short prefix from the full SHA-1 of the commit (in
my configuration, the first seven characters).
I like this output because I find it clear. It gives me the information I need,
with enough coloring so that every detail stands out without hurting my
eyes. But if you prefer other information, other colors, a different order, or
anything else - go ahead and tweak it to your liking.
Setting an alias
As you know, I set git lol as an alias - that is, when I run git lol , it
executes the long command I provided previously.
I have a bug.
I can tell that two weeks ago, this didn't happen. Luckily for me, I have been
using Git (obviously, I know...), so I can go back in time and test a past
version of my code. Indeed, in this version - everything worked fine.
But... I have made many changes in these two weeks. Alas, not just me - my
entire team has contributed commits that add, delete, or modify parts of
the code base. Where do I begin? Should I go over every change introduced
in those two weeks?
The goal of git bisect is to help you find the commit where a bug was
introduced, in an efficient manner.
The key here is using binary search - by looking at the halfway point and
deciding if it is the new top or bottom of the list of commits, you can find the
right commit efficiently. Even if you have 10,000 commits to hunt through,
it takes only about 13 or 14 steps to find the first commit that introduced
the bug, since each step halves the search space.
In this repository, we have a single Python file that is used to compute the
value of pi (which is approximately 3.14 ). If you run python3 get_pi.py on
main , however, you will get a wrong result:
If you check out this commit and run python3 get_pi.py again, the
result is correct:
To find it using git bisect , start the bisect process, and mark this
commit as "good":
git bisect start
git bisect good
By default, git bisect good would take HEAD as the "good" commit. To
mark main as "bad", you can use git bisect bad main :
git bisect checked out commit number 251 , the "middle point" of main
branch. Does the state in this commit produce the right or wrong output?
Trying again...
We still get the wrong output, which means we can discard commits 252
through 500 (and additional commits after that), and narrow our search to
commits 2 through 251 . Mark this as bad :
Mark as bad
git bisect checked out the "middle" commit (number 126 ), and running
the code again results in the right answer! This means that this commit is
"good", and that the first "bad" commit is somewhere between 127 and
251 . Mark it as "good":
Mark as good
Nice, git bisect takes us to commit 188 , as this is the "middle" commit
between 127 and 251 . By running the code again, you can see that the
result is wrong, so this is actually a "bad" commit, which means the first
faulty commit is somewhere between 127 and 188 . As you can see, git
bisect narrows down the search space by half on each iteration.
Come on, now it's your turn - keep going from here! Test the result of
python3 get_pi.py and use git bisect good or git bisect bad to mark
the commit accordingly. What is the faulty commit?
When you are done, use git bisect reset to stop the bisect process.
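If you would like to rehearse the whole good/bad loop on something tiny first, here is a sketch in a throwaway repository. It is built on assumptions: ten made-up commits, a file called result.txt that stands in for the output of get_pi.py , and a grep check that plays the role of you looking at that output.

```shell
# Throwaway repository; result.txt stands in for running the real program.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
i=1
while [ "$i" -le 10 ]; do
  # From "Commit 6" onward the stored value is wrong.
  if [ "$i" -ge 6 ]; then echo "3.41 (run $i)" > result.txt; else echo "3.14 (run $i)" > result.txt; fi
  git add result.txt && git commit -q -m "Commit $i"
  i=$((i + 1))
done
good=$(git rev-list --max-parents=0 HEAD)   # the very first commit, known to be fine

git bisect start > /dev/null
git bisect bad main > /dev/null
out=$(git bisect good "$good")              # Git checks out a commit in the middle
steps=0
until printf '%s\n' "$out" | grep -q "is the first bad commit"; do
  # Inspect the checked-out state and tell Git whether it is good or bad.
  if grep -q "^3.14" result.txt; then out=$(git bisect good); else out=$(git bisect bad); fi
  steps=$((steps + 1))
  if [ "$steps" -gt 20 ]; then break; fi    # safety net for the sketch
done
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad"         # the commit that introduced the wrong value
git bisect reset > /dev/null 2>&1
```

With ten commits, the loop needs only a handful of iterations before Git reports the first bad commit.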
As this book is not about programming and doesn't assume you know a
specific programming language, I will not show an example of implementing
my_script . The README.md file in the repository used in this chapter
(https://github.com/Omerr/bisect-exercise.git) includes an example for a
script that you can run with git bisect run to automatically find the
faulty commit for the previous example.
git cherry-pick
Introduced in chapter 8, this command takes a given commit, computes the
patch this commit introduces by computing the difference between the
parent's commit and the commit itself, and then cherry-pick "replays" this
difference. It is like "copy-pasting" a commit, that is, the diff this commit
introduced.
You can see that in this commit, John started working on a song called "Lucy
in the Sky with Diamonds":
As a reminder, you can also use the command git show to get the same
output:
git show <SHA_OF_COMMIT_5>
Now, if you cherry-pick this commit, you will introduce this change
specifically, on the active branch. You can switch to main branch:
Using cherry-pick to apply the changes introduced in "Commit 5" onto main
It seems like you copy-pasted "Commit 5". Remember that even though it has
the same commit message, and introduces the same changes, and even
points to the same tree object as the original "Commit 5" in this case - it is
still a different commit object, as it was created with a different timestamp.
git revert
git revert is essentially the reverse of git cherry-pick , introduced in
chapter 10. This command takes the commit you're providing it with and
computes the diff from its parent commit, just like git cherry-pick , but
this time, it computes the reverse changes. That is, if in the specified commit
you added a line, the reverse would delete the line, and vice versa.
git add -p
Staging changes is an integral part of introducing changes to Git.
Sometimes, you wish to stage all changes together (with git add . ), or
perhaps stage all changes of a specific file (using git add <file_path> ). Yet
there are times where it would be convenient to stage only certain parts of
modified files.
my_file.py
You then modify this file - by changing text within function_1 , and also
adding a new function, function_5 :
If you used git add my_file.py at this point, you would stage both of
these changes together. In case you want to separate them into different
commits, you could use git add -p , which splits these two changes and
asks you about each one as a standalone hunk:
git add -p
By typing ? , you can see what the different options stand for:
In this case, say we only want to stage the change introducing function_5 .
We do not want to stage the change of function_1 , so we select n :
Not staging the change to function_1
Next, we are prompted for the second change - the one introducing
function_5 . We do want to stage this hunk, so we type
y .
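If you are wondering why the two changes show up as separate hunks, here is a rough sketch. It assumes hypothetical contents for my_file.py (the function bodies below are made up) and uses Python's difflib rather than Git's own diff machinery. Because the two edits are far apart in the file, the diff is split into two hunks, and answering n and then y keeps only the second one:

```python
import difflib

# Hypothetical contents for my_file.py: four small functions.
before = []
for i in range(1, 5):
    before += [f"def function_{i}():", f"    print('function {i}')", ""]

after = list(before)
after[1] = "    print('changed text in function 1')"    # edit inside function_1
after += ["def function_5():", "    print('function 5')", ""]  # new function

diff = list(difflib.unified_diff(
    before, after, fromfile="a/my_file.py", tofile="b/my_file.py", lineterm=""
))

# Every hunk starts with a line beginning with "@@".
hunks = [line for line in diff if line.startswith("@@")]

# Simulate the answers given at the prompt: "n" for function_1, "y" for function_5.
answers = ["n", "y"]
staged = [hunk for hunk, answer in zip(hunks, answers) if answer == "y"]

print(len(hunks))   # 2
print(len(staged))  # 1
```

Had the two edits been only a few lines apart, they would have landed in a single hunk, which is when the s (split) option listed under ? becomes handy.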
Summary
Well, this was FUN!
You then learned about branches, seeing that they are nothing but a named
reference to a commit.
You learned the process of recording changes in Git, and that it involves the
working directory, the staging area (index), and the repository.
Then - you created a new repository from scratch, by using echo and low-
level commands such as git hash-object . You created a blob, a tree, and a
commit object pointing to that tree.
You learned what a diff is, and the difference between a diff and a patch.
You also learned how the output of git diff is constructed.
You saw that git rebase is powerful - but also that it is quite simple once
you understand what it does. You understood the differences between
merging and rebasing, and when you should use each.
In Part 3 you learned how to undo changes in Git - especially when things
go wrong. You learned how to use a bunch of tools, like git reset , git
commit --amend , git revert , git reflog (and git log -g ).
The most important tool, even more important than the tools I just listed, is
to whiteboard the current situation vs the desired one. Trust me on this, it
will make every situation seem less daunting and the solution more clear.
If you want to read more of my Git articles and handbooks, here they are:
Acknowledgements
Many people helped make this book the best it can be. Among them, I was
lucky to have many beta readers that provided me with feedback so that I
can improve the book. Specifically, I would like to thank Jason S. Shapiro,
Anna Łapińska, C. Bruce Hilbert, and Jonathon McKitrick for their thorough
reviews.
Abbey Rennemeyer has been a wonderful editor. After she had reviewed my
posts for freeCodeCamp for over three years, it was clear that I wanted
to ask her to be the editor of this book as well. She helped me improve the
book in many ways, and I am grateful for her help.
Contact Me
This book has been created to help you and people like you learn and
understand Git, and apply that knowledge in real life.
Right from the beginning, I asked for feedback and was lucky to receive it
from great people (mentioned in the Acknowledgements) to make sure the
book achieves these goals. If you liked something about this book, felt that
something was missing or needed improvement - I would love to hear from
you. Please reach out at: [email protected] .
- Omer Rosenbaum
Appendixes
Additional References - By Part
(Note - this is a short list. You can find a longer list of references on the E-
Book or printed version.)
Part 1
Git Internals YouTube playlist - by Brief:
https://www.youtube.com/playlist?list=PL9lx0DXCC4BNUby5H58y6s2TQVLadV8v7
Part 2
Diffs and Patches
Git Diffs algorithms:
https://en.wikipedia.org/wiki/Diff
https://www.nathaniel.ai/myers-diff/
https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
https://blog.robertelder.org/diff-algorithm/
Git Merge
https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging
https://blog.plasticscm.com/2010/11/live-to-merge-merge-to-live.html
Git Rebase
https://jwiegley.github.io/git-from-the-bottom-up/1-Repository/7-branching-and-the-power-of-rebase.html
https://git-scm.com/book/en/v2/Git-Branching-Rebasing
Beatles-Related Resources
https://www.the-paulmccartney-project.com/song/ive-got-a-feeling/
https://www.cheatsheet.com/entertainment/did-john-lennon-or-paul-mccartney-write-the-classic-a-day-in-the-life.html/
http://lifeofthebeatles.blogspot.com/2009/06/ive-got-feeling-lyrics.html
Part 3
https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified
https://www.edureka.co/blog/common-git-mistakes/