---
title: Git Rev News Edition 48 (February 27th, 2019)
layout: default
date: 2019-02-27 12:06:51 +0100
author: chriscool
categories: news
navbar: false
---

Git Rev News: Edition 48 (February 27th, 2019)

Welcome to the 48th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of January 2019. It also covers the Git Contributor Summit and the Git Merge conference that took place on January 31st and February 1st.

Discussions

General

  • Git Merge 2019 — General Sessions

    The Git Merge 2019 conference took place in Brussels, Belgium on January 31st (workshops and contributor summit) and February 1st (main conference day).

    • This year a big theme was handling large Git repositories, from both a technical and an organizational point of view:

      • Ivan Frade and Minh Thai in "Tales in scalability: how Google has seen users break Git" talked about solving problems with Android (many repos, huge binary assets, many commits) and the Chromium monorepo (many unique committers). Some of the problems were caused by the legacy practice of trying to keep a Subversion-like monotonic version number; attempts to provide it ran into trouble and caused a lot of churn. Another problem was a change in Gerrit, which now stores patch history in the Git repository, resulting in a "forest of tiny bushes" commit graph; the solution here was moving to protocol v2. They also talked about making the negotiation phase during fetch faster at the cost of a somewhat bigger data transfer, e.g. by skipping commits using Fibonacci number gaps.

      • Johan Abildskov, a consultant at Praqma, in "The what, how and why of scaling repositories" talked about how to choose between monorepos and many-repos (and how to split the codebase into repositories). The major idea was not to ignore the real problems (like having to create multiple commits to handle a single bug), and to base the decision on data:

        Our conclusions are not better than our data

        For this reason the git-metrics tool was created, a set of utility scripts that scrape data from Git repositories to help teams improve.

      • Brandon Williams from Facebook gave a lightning talk "Git protocols: still tinkering after all these years?" focusing on the introduction of protocol v2 to reduce communication overhead (especially important for repositories with a large number of branches and tags) and increase extensibility, and on the trouble of adding it while maintaining all-important backwards compatibility.

      • Terry Parker from Google gave a lightning talk "Native Git support for large objects" explaining how Git’s new partial clone feature (where only a subset of objects, selected by an initial filter, e.g. --filter=blob:limit=1m, is downloaded on clone; the rest are fetched on demand, as needed) and the new proposal to use content distribution networks (CDNs) can help with handling repositories with large files.

      • John Briggs from Microsoft in "Technical contributions towards scaling for Windows" talked about technical improvements in Git, like the serialized commit graph (with generation numbers), the multi-pack index (*.midx), and the "sparse" object walk during push that is being worked on (see the "Reviews" section), as well as improvements in VFS for Git (formerly called GVFS), like background prefetching and git status serialization. He also announced that VFS for Git will be ported to other platforms, macOS and Linux (to handle MS Office, which is itself a cross-platform project).

    • John Austin, game studio technical lead from A Stranger Gravity and Funomena, in "Git for games: current problems and solutions" talked about a major problem with using Git in game development workflows, namely many large binary files, for which file conflicts mean lost work (a minor change, like adding a voiceover or changing equalizer settings, results in large changes to files). File locking is one possibility, but it doesn't play nicely with Git because it is inherently centralized. He introduced a new tool, Git Global Graph (a work in progress), which can be used to check at commit time whether a commit would create a divergent version of a file. The idea is that there should be only a single path through the commit graph with changes to binary files.

    • Javier Fontan from source{d} gave a lightning talk "Gitbase, SQL interface to Git repositories" about the gitbase tool, which provides a read-only SQL interface to Git repositories (with Abstract Syntax Tree support).

    • Brian M. Carlson, Git Ecosystem Engineer at GitHub, in "Bridging the gap: transitioning Git to SHA-256" talked about the ongoing work to transition from SHA-1, which is considered weak, to SHA-256, which is more secure: the transition plan, where we are with it, and how to provide interoperability between versions of Git using different hash algorithms.

    • Belén Barros Pena, PhD student and interaction designer, gave the talk "The art of patience: why you should bother teaching Git to designers", where she also described how to do it in a way that gives good retention, namely:

      1. Show things on a need-to-know basis
      2. Avoid the Git jargon
      3. Don't bother too much with the concepts; they will be grasped through practice
      4. Do things with, never for, your designer
      5. The designer should take notes and keep a cheat sheet
      6. Teach command-line Git
    • Veronica Hanus in "Version control for visual learners" talked about how to enter visual representations of recently-changed elements into version control in the form of screenshot diffing.
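
The Fibonacci-gap idea mentioned in the Google scalability talk above can be illustrated with a toy model. The sketch below assumes a purely linear history and hypothetical helper names (`haves_with_fibonacci_gaps`, `first_common`); it is not Git's actual negotiation code, only a minimal illustration of why skipping reduces the number of "have" lines while possibly making the server resend some commits.

```python
def haves_with_fibonacci_gaps(history):
    """Pick commits to advertise from `history` (newest first).

    The distance between consecutive picks grows like the Fibonacci
    sequence (1, 1, 2, 3, 5, ...), so old history is sampled sparsely.
    Hypothetical helper for illustration only.
    """
    picked = []
    index, gap, next_gap = 0, 1, 1
    while index < len(history):
        picked.append(history[index])
        index += gap
        gap, next_gap = next_gap, gap + next_gap
    return picked


def first_common(haves, server_has):
    """Return (commit, number of haves sent) for the first commit the server knows."""
    for sent, commit in enumerate(haves, start=1):
        if commit in server_has:
            return commit, sent
    return None, len(haves)


# Toy scenario: the client has commits c1..c100, the server knows c1..c60.
history = [f"c{n}" for n in range(100, 0, -1)]   # newest first
server_has = {f"c{n}" for n in range(1, 61)}

dense = first_common(history, server_has)
skipping = first_common(haves_with_fibonacci_gaps(history), server_has)

print("every commit advertised:", dense)
print("Fibonacci gaps:         ", skipping)
```

In this toy scenario, advertising every commit finds the exact common point `c60` after 41 "have" lines, while the skipping variant settles on `c46` after 9. The price is that the 14 commits between `c47` and `c60` would be treated as unknown to the client and transferred again.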

Reviews

Last November Derrick Stolee, who prefers to be called just Stolee, sent a patch series to the mailing list to speed up git push operations by implementing and using a new "sparse" tree walk algorithm.

Stefan Beller wondered how users can know about this new algorithm and if it should be turned on by default for users. Stolee replied that indeed "we should actually make the config setting true by default, and recommend that servers opt-out".

Junio Hamano, the Git maintainer, disagreed, saying that we should wait until "enough users complain that they have to turn it on" before we turn it on by default.

Stolee later sent a version 2 of the patch series improving the tests, then a version 3 improving the documentation, and a version 4 with a few code and commit message improvements.

Junio and Stolee discussed how the mark_trees_uninteresting_sparse() function is implemented in the first patch, and how a variable is named in this function.

They also discussed the purpose of patches 2 and 3, and agreed that they should be merged and on what the related tests should do.

Additionally, Junio suggested a number of small code improvements in the last patch. In particular, he suggested getting rid of a global variable that was unused. Ramsay Jones, who regularly uses the sparse tool and his own static-check.pl script on the Git code base to find errors, had also found this unused variable independently.

Ævar Arnfjörð Bjarmason chimed in to ask for clarification about which step the patch speeds up, whether a progress bar should be added while the user is waiting during this step, and how this step should be named on the command line interface. It seems, though, that some preliminary work would be needed to untangle the steps during which a progress bar is already displayed.

Stolee eventually sent a version 5 of the patch series on January 16th, which has since been merged and is in the recently released Git v2.21.0.
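
To give a rough feel for what a "sparse" tree walk does, here is a small sketch on a toy object model. Everything in it is an assumption made for illustration: objects live in a plain dictionary, trees are dictionaries mapping names to object ids, and the function names are hypothetical. Git's real `mark_trees_uninteresting_sparse()` is written in C and works on Git's own object structures, so this should be read as an approximation of the idea, not as the implementation.

```python
from collections import defaultdict


def mark_uninteresting_sparse(store, tree_ids, uninteresting):
    """Toy sparse walk over trees that all sit at the same path.

    store:         oid -> dict (tree: name -> child oid) or str (blob)
    tree_ids:      set of tree oids found at one path
    uninteresting: set of oids the receiver already has (updated in place)
    """
    has_uninteresting = any(t in uninteresting for t in tree_ids)
    has_interesting = any(t not in uninteresting for t in tree_ids)
    # Only descend where both kinds of trees meet at the same path.
    if not (has_uninteresting and has_interesting):
        return

    children_by_name = defaultdict(set)
    for tree_id in tree_ids:
        for name, child in store[tree_id].items():
            if tree_id in uninteresting:
                uninteresting.add(child)      # receiver already has this child
            if isinstance(store[child], dict):
                children_by_name[name].add(child)

    for subtrees in children_by_name.values():
        mark_uninteresting_sparse(store, subtrees, uninteresting)


def objects_to_send(store, root, uninteresting):
    """Collect objects reachable from `root`, stopping at uninteresting ones."""
    to_send, stack = set(), [root]
    while stack:
        oid = stack.pop()
        if oid in uninteresting or oid in to_send:
            continue
        to_send.add(oid)
        if isinstance(store[oid], dict):
            stack.extend(store[oid].values())
    return to_send


# Toy repository: only src/b.txt changed between the old and the new root tree.
store = {
    "b1": "hello", "b2": "world", "b3": "world!", "b4": "docs",
    "S_old": {"a.txt": "b1", "b.txt": "b2"},
    "S_new": {"a.txt": "b1", "b.txt": "b3"},
    "D": {"readme": "b4"},
    "T_old": {"src": "S_old", "docs": "D"},
    "T_new": {"src": "S_new", "docs": "D"},
}
uninteresting = {"T_old"}
mark_uninteresting_sparse(store, {"T_old", "T_new"}, uninteresting)
print(sorted(objects_to_send(store, "T_new", uninteresting)))
```

In this toy repository the walk never opens the unchanged `docs` tree: it is marked as already known, and since no interesting tree sits at that path there is no reason to look inside it. Only the new root tree, the new `src` tree and the changed blob end up in the set to send.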

Releases

Other News

Various

Light reading

Git tools and sites

Credits

This edition of Git Rev News was curated by Christian Couder, Jakub Narębski, Markus Jansen and Gabriel Alcaras with help from David Pursehouse and Luca Milanesio.