Lecture 10 - Version Control
> Version Control
➢Why track/manage different versions of code?
> The essence of version control
➢A system that records snapshots of a project
➢Implements branching:
• You can work on several feature branches and switch between them
• Different people can work on the same code/project without interfering
• You can experiment with an idea and discard it if it turns out to be a bad idea
➢Implements merging:
• The simultaneous work of persons A and B can be combined easily
> What do we typically snapshot?
➢Software (this is how it started, but Git/GitHub can track a lot more)
➢Scripts
➢Documents (plain text files are much better suited than Word documents)
➢Manuscripts (Git is great for collaborating on and sharing LaTeX manuscripts)
➢Configuration files
➢Website sources
➢Data
> Why version control?
➢Roll-back functionality
• Mistakes happen - without recorded snapshots you cannot easily undo
mistakes and go back to a working version.
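A minimal sketch of rolling back with `git revert`, assuming a throwaway repository, a hypothetical file `analysis.txt`, and a placeholder author identity. The revert records a new commit that undoes the broken one, so the history (mistake included) is kept rather than rewritten.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master   # call the first branch "master", as in the slides
echo "result = 42" > analysis.txt
git add analysis.txt
git commit -q -m "Working version"
echo "result = oops" > analysis.txt
git commit -q -am "Broken change"
# Undo the last commit by adding a new commit that reverses it
git revert --no-edit HEAD
cat analysis.txt      # back to the working content
git log --oneline     # Working version, Broken change, and the revert on top
```

To bring back a single file instead, `git restore --source=<commit> <file>` (Git 2.23 or later) copies that file's content from an earlier commit into the working directory.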
> Why version control?
➢Branching
• Often you need to work on several issues/features in the same code; without branching, this can get messy and confusing.
• You can simulate branching by copying the entire code to multiple places, but this, too, gets messy and confusing.
> Why version control?
➢Collaboration
• With version control, none of these are needed anymore (or they have much simpler answers):
• “I will just finish my work and then you can start with your changes.”
• “Can you please send me the latest version?”
• “Where is the latest version?”
• “Which version are you using?”
• “Which version did the authors use in the paper I am trying to reproduce?”
> Why version control?
➢Reproducibility
• How do you indicate which version of your code you used in your paper?
• When you find a bug, how do you find out precisely when it was introduced? (Are published results affected? Do you need to inform collaborators or users of your code?)
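Git answers both questions directly. The sketch below assumes a throwaway repository, a hypothetical file `model.txt`, and a hypothetical tag name `paper-v1`: it tags the commit used for a paper, then lets `git bisect run` search the later commits for the first one containing a bug. The "test" here is just a `grep` for the word `bug`; in practice it would be your own test script (exit status 0 means good, 1 means bad).

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "step $i" >> model.txt
  git add model.txt
  git commit -q -m "Step $i"
done
# Mark exactly the version that produced the published results
git tag -a paper-v1 -m "Version used in the paper"
echo "bug" >> model.txt && git commit -q -am "Introduce bug"
bug_commit=$(git rev-parse HEAD)
for i in 4 5; do
  echo "step $i" >> model.txt
  git commit -q -am "Step $i"
done
# Binary search between a known-bad commit (HEAD) and a known-good one (the tag)
git bisect start HEAD paper-v1
git bisect run sh -c '! grep -q bug model.txt'
first_bad=$(git rev-parse refs/bisect/bad)   # the first bad commit found
git bisect reset                             # return to master
echo "bug introduced in: $(git log -1 --format='%h %s' "$first_bad")"
```

Anyone can later check out `paper-v1` to get exactly the code behind the published results.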
> Compare with Dropbox or Google Drive
➢Documents/code are in one place; there is no need to email snapshots.
> Version Control Systems (VCSs)
➢Help you track/manage/distribute revisions
➢Examples (from older to newer):
• Revision Control System (RCS)
• Concurrent Versions System (CVS)
• Subversion (SVN)
• Git (our focus)
> Version Control Hosting Services
➢Enable sharing version control repos
➢Internet/Web based
➢Examples:
• SourceForge
• Bitbucket
• GitLab
• GitHub (our focus)
> About Git
➢Created by Linus Torvalds, creator of Linux, in 2005
• Came out of the Linux development community
• Designed to do version control on the Linux kernel
➢Goals of Git:
• Speed
• Support for non-linear development (thousands of parallel branches)
• Fully distributed
• Able to handle large projects efficiently
> Centralized VCS
• A central server repository (repo) holds the "official copy" of the code
– the server maintains the sole version history of the repo
• You make "checkouts" of it to your local copy
– you make local modifications
– your changes are not versioned
• When you're done, you "check in" back to the server
– your check-in increments the repo's version
> Distributed VCS (Git)
➢Your local repo is a complete copy of everything on the remote server
• yours is "just as good" as theirs
➢Many operations are local:
– check in/out from local repo
– commit changes to local repo
– local repo keeps version history
➢When you're ready, you can "push" changes back to the server
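A minimal sketch of the distributed model, with a local bare repository (hypothetical path `server.git`) standing in for the remote server and a hypothetical first contributor `alice`. The clone receives the full history, and committing and `git log` work without contacting the server until you push.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"
# A bare repository stands in for the remote server
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master
# Someone starts the project and pushes two commits
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
echo "v1" > notes.txt && git add notes.txt && git commit -q -m "First"
echo "v2" >> notes.txt && git commit -q -am "Second"
git push -q ../server.git master
cd ..
# Cloning copies the complete history, not just the latest files
git clone -q server.git mine && cd mine
git log --oneline          # both commits are already available locally
# Commit locally; nothing is sent to the server yet
echo "v3" >> notes.txt && git commit -q -am "Third"
git status -sb             # master is ahead of origin/master by 1
# When ready, push the new commit to the server
git push -q origin master
```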
> Snapshots
➢Centralized VCSs like Subversion track version data on each individual file.
> Snapshots
➢Git keeps "snapshots" of the entire state of the project.
• Each check-in version of the overall code contains every file.
• Some files change on a given check-in, some do not; an unchanged file is not stored again, only linked to the identical version already stored.
• More redundancy, but faster.
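You can see the snapshot model, including the reuse of unchanged files, by asking Git for the ID of the stored file content (the "blob") in two consecutive commits. A sketch with two hypothetical files, `a.txt` and `b.txt`, in a throwaway repository, where only `a.txt` changes:

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "alpha" > a.txt
echo "beta" > b.txt
git add a.txt b.txt && git commit -q -m "Snapshot 1"
echo "alpha, edited" > a.txt
git commit -q -am "Snapshot 2"
# Each commit points to a tree that lists every file in the project
git cat-file -p 'HEAD^{tree}'
# The changed file gets a new blob ID; the unchanged file reuses the old one
git rev-parse HEAD~1:a.txt HEAD:a.txt
git rev-parse HEAD~1:b.txt HEAD:b.txt
```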
> GitHub-User Perspective
[Figure: You, with a Working Dir and Local Repos; GitHub, with Remote Repos]
> Using GitHub to Collaborate
[Figure: two users, each with a Working Dir and Local Repos, connected through GitHub]
> Questions to answer
How organized?
[Figure: You, with a Working Dir and Local Repos; GitHub, with Remote Repos]
What operations?
> Local git areas
> Git staging and committing
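The three local areas (working directory, staging area, repository) show up directly in `git status`. A sketch, assuming a throwaway repository and hypothetical files `README.md` and `scratch.txt`: only what is staged with `git add` goes into the next commit.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "# Project" > README.md
echo "temporary notes" > scratch.txt
git status --short          # both files are untracked (??)
git add README.md           # stage only README.md
git status --short          # README.md staged (A), scratch.txt still untracked
git diff --cached --stat    # what the next commit will contain
git commit -q -m "Add README"
git ls-files                # the repository now tracks README.md only
```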
> Git Commit checksums
➢In Subversion, each modification to the central repo increments the version number of the overall repo.
• In Git, each user has their own copy of the repo and commits changes to that local copy before pushing to the central server.
• Since there is no single place to hand out sequential version numbers, Git instead generates a unique SHA-1 hash (a 40-character string of hex digits) for every commit.
• Git refers to commits by this ID rather than by a version number.
• Often we only see the first 7 characters:
• 1677b2d Edited first line of readme
• 258efa7 Added line to readme
• 0e52da7 Initial commit
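A sketch, assuming a throwaway repository, that prints the full 40-character ID of a commit next to the abbreviated form used by `git log --oneline`. Git lengthens the abbreviation automatically when 7 characters are not enough to be unique, so larger repositories may show more.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "Hello" > README
git add README && git commit -q -m "Initial commit"
echo "World" >> README && git commit -q -am "Added line to readme"
git rev-parse HEAD           # full 40-hex-digit ID
git rev-parse --short HEAD   # abbreviated ID
git log --oneline            # abbreviated IDs with commit messages
# Any unique prefix works wherever Git expects a commit
git show -s --format=%s "$(git rev-parse --short HEAD)"
```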
> Git Commit
➢WHAT INFORMATION IS IN ONE COMMIT?
• Commit hash
• Uniquely identifies a commit. Calculated from the other contents of the commit, including the hash of the parent
• Author
• Timestamp
• Commit message
• What changed, and why?
• Changes (stored as a snapshot of the project tree)
• Parent commit(s)
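The raw commit object shows these fields directly. A sketch, assuming a throwaway repository and a placeholder identity: `git cat-file -p` prints the tree (the snapshot), the parent commit, author and committer with timestamps, and the message. Because the parent's ID is part of this text, changing any earlier commit would change every later ID.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "v1" > file.txt && git add file.txt && git commit -q -m "First"
echo "v2" > file.txt && git commit -q -am "Second: explain what and why"
# tree, parent, author, committer, blank line, message
git cat-file -p HEAD
# The parent line names the previous commit
git rev-parse HEAD~1
```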
> Repo Organization
[Figure: commits from oldest to newest, with hashes as commit IDs, and a branch pointing to the last commit]
http://git-scm.com/book/
> Local repos also have...
[Figure: HEAD, marking the current version in the Working Dir]
http://git-scm.com/book/
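HEAD is normally a reference to the currently checked-out branch. A sketch, assuming a throwaway repository and a hypothetical file `f.txt`, that shows what HEAD points to and what happens when you check out an older commit directly ("detached HEAD"); `git switch` needs Git 2.23 or later.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "one" > f.txt && git add f.txt && git commit -q -m "One"
echo "two" > f.txt && git commit -q -am "Two"
git symbolic-ref HEAD            # refs/heads/master: HEAD follows the branch
git switch -q --detach HEAD~1    # point HEAD at the older commit itself
cat f.txt                        # the working dir now shows "one"
git switch -q master             # reattach HEAD to the branch
cat f.txt                        # "two" again
```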
> Local Repo Operations
[Figure: You, with a Working Dir and Local Repos]
• init
• add/commit
• log
• switch/checkout
• branch
• merge
• …
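A short session that uses each of these operations once, assuming a throwaway repository; the file `hello.txt` and the branch name `idea` are hypothetical. `git switch` needs Git 2.23 or later; `git checkout idea` does the same on older versions.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"
git init -q demo && cd demo                         # init
git symbolic-ref HEAD refs/heads/master
echo "hello" > hello.txt
git add hello.txt && git commit -q -m "Add hello"   # add/commit
git branch idea                                     # branch
git switch -q idea                                  # switch/checkout
echo "hello, world" > hello.txt
git commit -q -am "Try an idea"
git switch -q master
git merge -q idea                                   # merge (a fast-forward here)
git log --oneline                                   # log
```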
> How commit works... (before and after)
[Figure: position of HEAD before and after a commit]
http://git-scm.com/book/
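A sketch, assuming a throwaway repository and a hypothetical file `f.txt`, that checks the before/after picture: committing creates a new commit whose parent is the old tip, moves the current branch to it, and HEAD follows along because it refers to the branch.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "1" > f.txt && git add f.txt && git commit -q -m "Before"
before=$(git rev-parse HEAD)
echo "2" > f.txt && git commit -q -am "After"
after=$(git rev-parse HEAD)
echo "before: $before"
echo "after:  $after"
git rev-parse HEAD~1     # the new commit's parent is the old tip
git symbolic-ref HEAD    # HEAD still refers to refs/heads/master
```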
> Remote Repo Operations
[Figure: operations between You and GitHub]
> Git in a nutshell
➢https://learngitbranching.js.org/
> Branches
➢A branch is just a pointer to a commit
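You can confirm that a branch is only a named pointer: resolving it yields a single commit ID, and in a freshly created repository the branch is stored as a tiny file under `.git/refs/heads/` holding that ID. Git may later move refs into `.git/packed-refs` (or use a different ref storage backend), so reading the file directly is for illustration only; `git rev-parse` is the reliable way. A sketch in a throwaway repository:

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "x" > f.txt && git add f.txt && git commit -q -m "Only commit"
git rev-parse master           # the commit the branch points to
cat .git/refs/heads/master     # the same 40-character ID, stored in a file
```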
>
➢Creating a new branch just adds a pointer to a commit:
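A sketch, assuming a throwaway repository and the branch name `testing` from the slides: creating the branch copies no files and creates no commit; it only adds a second name for the current commit.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "x" > f.txt && git add f.txt && git commit -q -m "Initial"
git branch testing
git rev-parse master testing   # two names, one commit ID
git rev-list --count --all     # still only one commit in the repository
git branch                     # HEAD is still on master (marked with *)
```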
>
➢If you add commits on both master and testing, the code can diverge:
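A sketch of the divergence, assuming a throwaway repository with hypothetical files: one commit on each branch after the split point. `git merge-base` reports the shared ancestor, and a merge then joins the two lines of history with a merge commit that has two parents.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt && git add base.txt && git commit -q -m "Common base"
base=$(git rev-parse HEAD)
git branch testing
git switch -q testing
echo "t" > testing.txt && git add testing.txt && git commit -q -m "Work on testing"
git switch -q master
echo "m" > master.txt && git add master.txt && git commit -q -m "Work on master"
git log --oneline --graph --all       # two branches diverging from "Common base"
mb=$(git merge-base master testing)   # the shared ancestor
echo "merge base: $mb"
git merge -q --no-edit testing        # combine both lines of work
git log --oneline --graph
```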
> BRANCH WORKFLOWS
➢https://www.atlassian.com/git/tutorials/comparing-workflows
> 1. Centralized workflow
➢A central repository serves as the single point of entry for all changes to the project
➢The default development branch is called master
– all changes are committed into master
– no other branches are required
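A minimal local simulation of the centralized workflow. It assumes a bare repository `central.git` as the single point of entry and two hypothetical developers, `john` and `mary`; all names are illustrative. Each clones, commits on master, and pushes; Mary pulls John's work before publishing her own.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"
# The central repository
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master
# Seed it with an initial commit on master
git init -q seed && cd seed
git symbolic-ref HEAD refs/heads/master
echo "# Project" > README.md && git add README.md && git commit -q -m "Initial commit"
git push -q ../central.git master && cd ..
# Each developer clones the central repository
git clone -q central.git john
git clone -q central.git mary
# John commits on master and publishes
cd john
echo "John's part" > john.txt && git add john.txt && git commit -q -m "Add John's part"
git push -q origin master && cd ..
# Mary first pulls John's work, then commits on master and publishes
cd mary
git pull -q origin master
echo "Mary's part" > mary.txt && git add mary.txt && git commit -q -m "Add Mary's part"
git push -q origin master && cd ..
git -C central.git log --oneline master
```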
>
git push origin master
error: failed to push some refs to '/path/to/repo.git'
hint: Updates were rejected because the tip of your current branch is behind its remote counterpart. Merge the remote changes (e.g. 'git pull') before pushing again. See the 'Note about fast-forwards' in 'git push --help' for details.
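This rejection can be reproduced locally. In the sketch below (a bare `central.git` and two hypothetical clones, `john` and `mary`), Mary pushes first; John's push is then refused because his master lacks her commit. Pulling (here as a merge, requested explicitly with `--no-rebase`) and pushing again succeeds. `git pull --rebase` is a common alternative that replays John's commit on top instead.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master
git init -q seed && cd seed && git symbolic-ref HEAD refs/heads/master
echo "start" > README.md && git add README.md && git commit -q -m "Initial commit"
git push -q ../central.git master && cd ..
git clone -q central.git john
git clone -q central.git mary
# Mary publishes first
(cd mary && echo "m" > mary.txt && git add mary.txt && git commit -q -m "Mary's change" && git push -q origin master)
# John has a local commit; his push is rejected (not a fast-forward)
cd john
echo "j" > john.txt && git add john.txt && git commit -q -m "John's change"
if git push origin master; then first_push=accepted; else first_push=rejected; fi
echo "first push: $first_push"
# Integrate the remote changes with a merge, then push again
git pull -q --no-rebase --no-edit origin master
git push -q origin master
```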
> 2. Git Feature Branch Workflow
➢All feature development should take place in a dedicated branch
instead of the master branch
>
git status
git add <some-file>
git commit
>
git push
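A sketch of one round of the feature branch workflow, assuming a bare `central.git` as the shared repository, a hypothetical clone `mary`, and a hypothetical branch name `marys-feature`. On GitHub the last step would be a pull request reviewed and merged in the web interface; here a local `git merge --no-ff` stands in for it.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master
git init -q seed && cd seed && git symbolic-ref HEAD refs/heads/master
echo "start" > README.md && git add README.md && git commit -q -m "Initial commit"
git push -q ../central.git master && cd ..
git clone -q central.git mary && cd mary
git switch -q -c marys-feature          # dedicated branch, started from master
echo "feature" > feature.txt
git status --short
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin marys-feature     # publish the branch; -u sets its upstream
# Stand-in for "Merge pull request": merge into master and publish
git switch -q master
git merge -q --no-ff --no-edit marys-feature
git push -q origin master
```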
> Merge pull request
> 3. Gitflow Workflow
➢Strict branching model designed around the project release
• Suitable for projects that have a scheduled release cycle
➢Branches have specific roles and interactions
➢Uses two long-lived main branches
• Master stores the official release history; all commits in the master branch are tagged with a version number
• Develop serves as an integration branch for features
> GitFlow feature branches (from develop)
> GitFlow release branches (eventually into master)
> GitFlow hotfix branches
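A compressed sketch of the Gitflow branch roles using plain Git commands in a throwaway repository; the branch names `feature/login`, `release/1.0`, `hotfix/1.0.1` and the tags `v1.0`, `v1.0.1` are hypothetical. Every merge uses `--no-ff`, so each branch remains visible as a merge commit in the history.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
echo "v0" > app.txt && git add app.txt && git commit -q -m "Initial commit"
git switch -q -c develop                          # integration branch
# Feature branches start from develop and merge back into it
git switch -q -c feature/login develop
echo "login" > login.txt && git add login.txt && git commit -q -m "Add login"
git switch -q develop && git merge -q --no-ff --no-edit feature/login
# A release branch is cut from develop, merged into master and tagged
git switch -q -c release/1.0 develop
echo "1.0" > VERSION && git add VERSION && git commit -q -m "Bump version to 1.0"
git switch -q master && git merge -q --no-ff --no-edit release/1.0
git tag -a v1.0 -m "Release 1.0"
git switch -q develop && git merge -q --no-ff --no-edit release/1.0
# Hotfix branches start from master and go back into master and develop
git switch -q -c hotfix/1.0.1 master
echo "1.0.1" > VERSION && git commit -q -am "Fix bug, bump version to 1.0.1"
git switch -q master && git merge -q --no-ff --no-edit hotfix/1.0.1
git tag -a v1.0.1 -m "Release 1.0.1"
git switch -q develop && git merge -q --no-ff --no-edit hotfix/1.0.1
git log --oneline --graph --all
```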
> Let us explore an existing Git repository
➢History
• Explore the repository.
• Explore the history.
• Note that there are branches.
➢Reproducibility
• Discuss the enormous value of the annotation (blame) feature, using an example file.
➢Collaboration
• You can refer to code portions (sending a link is much simpler than describing which file to open and where to scroll to).
• Browse the forks.
• See contributors.
➢Releases
• Explore the release history.
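Most of what is explored above in GitHub's web interface also exists on the command line. A sketch, assuming a small throwaway repository with a hypothetical file `analysis.py`, branch `experiment`, and tag `v1.0`: `git log` for history, `git branch -a` for branches, `git blame` for the line-by-line annotation, `git shortlog` for contributors, and `git tag` for releases.

```shell
set -e
# Hypothetical identity and a scratch directory, for this demo only
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
printf 'x = 1\ny = 2\n' > analysis.py
git add analysis.py && git commit -q -m "Add analysis"
git tag -a v1.0 -m "First release"
printf 'x = 1\ny = 3\n' > analysis.py
git commit -q -am "Fix y"
git branch experiment
git log --oneline --graph --all   # history
git branch -a                     # branches
git blame analysis.py             # which commit last changed each line
git shortlog -sn HEAD             # commits per contributor (HEAD avoids reading stdin)
git tag -n                        # releases (tags) with their messages
```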