Exponential Backup
The first day of a new job is always an adjustment. There's a fine line between explaining that you're unused to a procedure and constantly saying "At my old company...". After all, nobody wants to be that guy, right? So you proceed with caution, trying to learn before giving advice.
But some things warrant going the extra mile. When Samantha began her tenure at a mid-sized firm, everything started out fine. She got a computer right away, which is a nice plus. She met the team, got settled into a desk, and was given a list of passwords and important URLs to get situated. The usual stuff.
After changing her Windows password, she decided to start by browsing the source code repository. The company used Subversion, so she checked out the whole repo to get a sense of its structure. It took a while, so she went to get some coffee; when she got back, the checkout had finished, and she could see the total size: 300 GB. That's... weird. Really weird. Weirder still, a glance at the commit history showed that it only went back a year or so.
What could be taking up so much space? Were they storing some huge binaries tucked away someplace that the code depended on? She didn't want to make waves, but this just seemed so... inefficiently huge. Now curious, she opened the repo and began browsing the folder structure.
Subversion bases everything on folder structure: in Git's terms there is really only one "branch", but you can check out any subfolder without pulling down the whole repository. Inside each project directory was the layout common to SVN repos: a folder called "branches", a folder called "tags", and a folder called "trunk" (by convention, the main line of development). The branches directory held folders called "fix" and "feature", and each of those held copies of the source code in folders named after the individual branches. In normal use, she'd start her checkout from one of those branch folders, pulling down only the code for her branch, and merge into the "trunk" copy when she was done.
But there was one folder she didn't anticipate: "backups". Backups? But... this is version control. We can revert to an earlier version any time we want. What are the backups for? I must be misunderstanding. She opened one and was promptly horrified to find a series of zip files, dated monthly, all at revision 1.
Now morbidly curious, Samantha opened one of these zips. The top-level folder inside the zip was the name of the project; under that, she found branches, tags, trunk. No way. They can't have-- She clicked in, and there it was, plain as day: another backups folder. And inside? Every backup older than the one she'd clicked. Each backup included, presumably, every backup prior to that, and each of those included its own predecessors, so every month's backup was roughly twice the size of the one before. The backup for October contained the January backup not once but 256 times, the February backup 128 times, and so on and so forth. Within two years, a floppy disk's worth of code would fill a terabyte drive.
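To check the arithmetic, here is a minimal sketch of the nesting, assuming one backup per month (January is month 1), that each backup stores the code plus every earlier backup verbatim with no compression, and that the code itself never grows. The function names `copies_of` and `backup_sizes` are hypothetical, made up for this illustration.

```python
def copies_of(target: int, month: int) -> int:
    """Count copies of backup `target` nested inside backup `month`.

    Assumes (hypothetically) that each monthly backup zips the whole repo,
    including a 'backups' folder holding every earlier backup.
    """
    if month <= target:
        return 0
    # One copy sits directly in this backup's 'backups' folder...
    total = 1
    # ...plus every copy nested inside the later backups it also holds.
    for k in range(target + 1, month):
        total += copies_of(target, k)
    return total


def backup_sizes(code_size: int, months: int) -> list[int]:
    """Size of each month's backup, assuming no compression and constant code size."""
    sizes: list[int] = []
    for _ in range(months):
        # This month's zip = the code plus every earlier backup, stored verbatim.
        sizes.append(code_size + sum(sizes))
    return sizes


FLOPPY = 1_474_560   # bytes on a 3.5" "1.44 MB" floppy
TERABYTE = 10**12    # decimal terabyte, as drive makers count it

sizes = backup_sizes(FLOPPY, 24)
first_full = next(m for m, s in enumerate(sizes, start=1) if s >= TERABYTE)

print(copies_of(1, 10))  # January copies inside October's backup: 256
print(first_full)        # month in which one backup passes 1 TB: 21
```

Because each backup is the code plus the sum of everything before it, the size doubles every month (2^(n-1) times the code size in month n), which is why a floppy's worth of code passes a terabyte in month 21, comfortably inside two years.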
Samantha asked her boss, "What will you do when the repo gets too big to be downloaded onto your hard drive?"
His response was quick and entirely serious: "Well, we back it up, then we make a new one."