Scaling Git’s garbage collection (GitHub blog)
The GitHub blog has a detailed look at garbage collection in Git and the work that has been done to make it faster.
To solve this problem, we turned to a long-discussed idea on the Git mailing list: cruft packs. The idea is simple: store an auxiliary list of mtime data alongside a pack containing just unreachable objects. To garbage collect a repository, Git places the unreachable objects in a pack. That pack is designated as a "cruft pack" because Git also writes the mtime data corresponding to each object in a separate file alongside that pack. This makes it possible to update the mtime of a single unreachable object without changing the mtimes of any other unreachable object.
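
The idea in the quoted passage can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `ToyCruftPack` and its methods `add`, `freshen`, and `prune` are hypothetical names invented for illustration, and the in-memory dictionaries stand in for the pack and its companion mtime file. It does not reproduce Git's on-disk format or its actual pruning rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCruftPack:
    """Hypothetical stand-in for a cruft pack (not Git's real format).

    `objects` plays the role of the pack holding unreachable objects;
    `mtimes` plays the role of the separate file of per-object mtimes.
    """
    objects: Dict[str, bytes] = field(default_factory=dict)
    mtimes: Dict[str, int] = field(default_factory=dict)

    def add(self, oid: str, data: bytes, mtime: int) -> None:
        # Store the object and record its own mtime in the side table.
        self.objects[oid] = data
        self.mtimes[oid] = mtime

    def freshen(self, oid: str, now: int) -> None:
        # Update one object's mtime; every other entry is left untouched.
        if oid not in self.objects:
            raise KeyError(oid)
        self.mtimes[oid] = max(self.mtimes[oid], now)

    def prune(self, cutoff: int) -> List[str]:
        # Drop objects whose mtime is older than the cutoff (toy rule).
        expired = sorted(o for o, m in self.mtimes.items() if m < cutoff)
        for oid in expired:
            del self.objects[oid]
            del self.mtimes[oid]
        return expired


pack = ToyCruftPack()
pack.add("aaa", b"old blob", mtime=100)
pack.add("bbb", b"other blob", mtime=100)
pack.freshen("aaa", now=500)      # only "aaa" gets a newer mtime
removed = pack.prune(cutoff=200)  # "bbb" is still at 100, so it expires
```

Because each object carries its own entry in the side table, freshening one object does not extend the lifetime of the others stored in the same pack, which is the property the excerpt describes.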