This post originated from an RSS feed registered with Java Buzz
by Mathias Bogaert.
Original Post: How to tear apart a repository: the Git way
Feed Title: Scuttlebutt
Feed URL: http://feeds.feedburner.com/AtlassianDeveloperBlog
Feed Description: tech gossip by mathias
How do you divide a Git repo without squashing its history or breaking the original repo? I’ll show you how to do it with as little pain as possible by splitting the main repository, making your team and Git happy at the same time!

In the Hosted Operations team, we have many small repositories, some of which contain just a single script. Over time, this approach produced duplicated code and effort, which led to considerable maintenance issues. We then decided to create a repo in which we could consolidate many of those scripts and concentrate our refactoring efforts.

After this refactoring, we ended up with a pretty big repo that was naturally divided into binaries and libraries. Every script that used these libraries was included in this repository, to maximize code reuse. Meanwhile, other big projects wanted to use that mighty pool of awesome libraries without carrying along the binaries included in the repo. We finally decided to do the only logical thing: separate the libraries from the main repo and maintain them in their own space.

Scenario & Goals

We will call the original repository of the story by the codename base. This is the repository that will be split into two:

- scripts: this will hold the binaries only
- libraries: this will hold the libraries that many projects will end up using

The challenge is that the history of these “wanna-be repositories” is mixed together in that one big repo we called base. In our case, all the scripts lived in the bin directory and all the libraries in lib inside the base repo.

The first step is to clone base twice, once for each new repository:

    work$ git clone base scripts
    Cloning into 'scripts'...
    done.
    work$ git clone base libraries
    Cloning into 'libraries'...
    done.

The next step is to filter out the unwanted history from each of the two repos. Instead of tracking down individual files, we can use an amazing filter-branch switch: --subdirectory-filter.
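To try these steps without touching a real repository, here is a minimal sketch that builds a throwaway base with a bin/ and a lib/ directory and then performs the two clones. The file names (bin/deploy.sh, lib/utils.sh), the commit messages and the demo identity are hypothetical placeholders; the branch is pinned to master to match the commands in this post. It assumes Bash and Git 1.8.5 or later (for git -C).

```shell
#!/usr/bin/env bash
# Minimal sketch: build a throwaway "base" repo with bin/ and lib/,
# then clone it once per future repository.
# File names and identity values are hypothetical placeholders.
set -euo pipefail

# Identity for the demo commits (placeholder values).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Work in a fresh temporary directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Create the original repository and name its branch master.
git init -q base
git -C base symbolic-ref HEAD refs/heads/master
mkdir -p base/bin base/lib

# First commit: a shared library file.
echo 'greet() { echo "hello"; }' > base/lib/utils.sh
git -C base add lib/utils.sh
git -C base commit -q -m "Add shared library"

# Second commit: a script that uses the library.
echo '. ../lib/utils.sh && greet' > base/bin/deploy.sh
git -C base add bin/deploy.sh
git -C base commit -q -m "Add deploy script"

# One full clone per future repository; base itself stays untouched.
git clone -q base scripts
git clone -q base libraries
```

Both clones start out with the complete mixed history, and base itself never changes; each clone can then be filtered independently with the --subdirectory-filter switch.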
This switch rewrites the repo history, picking up only those commits that actually affect the content of a specific subdirectory. Note that it also makes that subdirectory the root of the whole rewritten repo. The following commands rewrite the master branch of each clone, keeping only the history that belongs to the wanted folder:

    scripts$ git filter-branch --subdirectory-filter bin/ -- master
    Rewrite c97684d3120b82e42e99ccb711627ea877c3bf0c (128/128)
    Ref 'refs/heads/master' was rewritten
    scripts$ cd ../libraries/
    libraries$ git filter-branch --subdirectory-filter lib/ -- master
    Rewrite 6bb1f8ef53094cd5f05379fced9413b5d7f8e018 (90/90)
    Ref 'refs/heads/master' was rewritten
    libraries$

Instead of rewriting a single branch (master in this case), you can also rewrite multiple branches and even tags. Not every tag can be carried over to the new history, though: a tag is reapplied only if the commit it points to is among the rewritten ones.

As you might imagine, this operation can be destructive. For this reason, filter-branch creates a backup copy of every ref it modifies under refs/original/ (for example, refs/original/refs/heads/master). Git will […]
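As a hedged illustration of rewriting more than master, the sketch below builds its own throwaway base (all repository, branch, tag and file names are hypothetical), clones it as libraries, creates a local copy of a second branch, and removes the origin remote so that --all covers only local branches and tags. It then runs filter-branch with --tag-name-filter cat, the form the Git documentation gives for carrying tag names over to rewritten commits. It assumes a recent Git, where FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning pause before filter-branch runs; older versions ignore the variable.

```shell
#!/usr/bin/env bash
# Minimal sketch: extract lib/ into its own history, rewriting every
# branch and tag, and keep filter-branch's backup refs for inspection.
# Repository, branch, tag and file names are hypothetical placeholders.
set -euo pipefail

export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Recent Git versions print a warning and pause before filter-branch runs;
# this variable skips the pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Throwaway base repo: one commit touching lib/, one touching only bin/.
git init -q base
git -C base symbolic-ref HEAD refs/heads/master
mkdir -p base/bin base/lib
echo 'greet() { echo "hello"; }' > base/lib/utils.sh
git -C base add lib/utils.sh
git -C base commit -q -m "Add shared library"
git -C base tag v1.0                  # tags a commit that touches lib/
echo '. ../lib/utils.sh && greet' > base/bin/deploy.sh
git -C base add bin/deploy.sh
git -C base commit -q -m "Add deploy script"
git -C base branch feature            # a second branch to rewrite

git clone -q base libraries
cd libraries

# A clone only has master as a local branch: create the other one locally,
# then drop the remote so --all covers just local branches and tags.
git branch -q feature origin/feature
git remote rm origin

# Rewrite all branches and tags; "cat" keeps every tag name unchanged.
git filter-branch --tag-name-filter cat --subdirectory-filter lib -- --all

# List the pre-rewrite refs that filter-branch kept under refs/original/.
git for-each-ref --format='%(refname)' refs/original/
```

In this toy history, v1.0 points at a commit that touches lib/, so it ends up on the rewritten copy of that commit. Once the result looks right, each backup can be removed with git update-ref -d followed by the backup ref's name.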