This post originated from an RSS feed registered with Ruby Buzz
by Christian Neukirchen.
Original Post: Tracking the Ruby CVS with Git
Feed Title: chris blogs: Ruby stuff
Feed URL: http://chneukirchen.org/blog/category/ruby.atom
Feed Description: a weblog by christian neukirchen - Ruby stuff
$ du -h ~/projects/Git/ruby.git/
29M /Users/chris/projects/Git/ruby.git/
Amazing, but true: the directory above contains the whole history of the
Ruby CVS, from January 1998 until today, in less than 30 megabytes.
That’s 9325 commits and about 44332 different file versions.
How is this possible? I used Git, the version control system that was
written to manage the Linux kernel source and is “designed to handle
absolutely massive projects with speed and efficiency”. And most of its
parts actually are pretty efficient and fast.
Not among them is importing from CVS. Not yet, at least. Git includes
a Perl script, git-cvsimport, which essentially works like this:
check out each revision from CVS, commit it to Git, check out the next
revision, commit again, lather, rinse, repeat.
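The loop just described can be sketched in a few lines of shell. This is a hypothetical illustration, not git-cvsimport’s actual code: the function name replay_snapshots is made up, and each revision is assumed to be already checked out into its own directory, standing in for a CVS checkout.

```sh
# Hypothetical sketch of the "check out, commit, repeat" loop, not
# git-cvsimport's actual code. Arguments: the Git repository to create,
# then one directory per checked-out revision (a stand-in for a CVS
# checkout), oldest first. Commit identity comes from your Git config
# or the GIT_AUTHOR_*/GIT_COMMITTER_* environment variables.
replay_snapshots() {
  repo=$1
  shift
  git init -q "$repo" || return 1
  for snap in "$@"; do
    # Empty the work tree (keeping .git) so that files deleted in this
    # revision also disappear from the next commit.
    find "$repo" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
    # Copy the revision in, including dotfiles.
    cp -R "$snap"/. "$repo"/ || return 1
    # Stage additions, modifications and deletions, then commit.
    git -C "$repo" add -A
    git -C "$repo" commit -q --allow-empty -m "import $(basename "$snap")" || return 1
  done
}
```

For example, replay_snapshots ruby-git rev1 rev2 rev3 creates three commits. Every revision costs a round of checking out and committing, which is why this approach crawls when the CVS is remote.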
Hopelessly slow, especially if the CVS is remote. So let’s fix that by
making a local CVS mirror first. Luckily, the Ruby CVS supports
cvsup, which is essentially like a fast rsync for CVS repositories but
can also be used to mirror complete CVSROOTs. Unluckily, this is not
documented on the Ruby CVS page. However, with help from Shugo Maeda,
I was able to mirror the Ruby CVS locally. You need a cvsup file like
this:
*default base=/Users/chris/mess/current/cstest/sups
*default compress delete use-rel-suffix
*default release=cvs
*default host=cvs.ruby-lang.org
*default prefix=/Users/chris/mess/current/cstest/ruby
# Ruby and other modules
cvs-src
Adjust the paths to your local needs, of course. Then you need to
fetch cvsup. If you are lucky, your distribution has it packaged;
otherwise you need to bootstrap a Modula-3 compiler (!) to compile
it. Have fun. *sigh* (The compiler is pretty quick, though.)
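Because the supfile above hard-codes the author’s paths, the sketch below writes an equivalent one under a directory of your choice and then runs cvsup. The function names and the default directory ~/ruby-cvs are hypothetical; -g (no GUI) and -L 2 (verbose log) are the usual cvsup command-line options, but check your cvsup manual page.

```sh
# Hypothetical helper: print a supfile equivalent to the one above,
# rooted at a directory of your choice ($1, default ~/ruby-cvs).
write_ruby_supfile() {
  root=${1:-$HOME/ruby-cvs}
  cat <<EOF
*default base=$root/sups
*default compress delete use-rel-suffix
*default release=cvs
*default host=cvs.ruby-lang.org
*default prefix=$root/ruby
# Ruby and other modules
cvs-src
EOF
}

# Create the directories, write the supfile and start the mirror run.
mirror_ruby_cvs() {
  root=${1:-$HOME/ruby-cvs}
  mkdir -p "$root/sups" "$root/ruby" || return 1
  write_ruby_supfile "$root" > "$root/ruby-supfile"
  # -g: run without the GUI, -L 2: log every file that is updated.
  cvsup -g -L 2 "$root/ruby-supfile"
}
```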
Anyway, at the end of the day, I had my local CVS mirror—let the
experiments start.
git-cvsimport depends on cvsps, a tool that analyzes CVSROOTs and
figures out the actual revisions. This is needed because CVS is a bunch
of clunky shit that versions each file separately and has no notion of
commits. After that, an almost endless loop of checkout and commit
starts. If you want to try it yourself, get a fast computer, a fast,
big disk and an efficient file system. No, doing it on an iBook with
only a few gigs free and HFS+ is not a good idea. In fact, it took four
days, and I had to do it in steps.
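The grouping cvsps has to do can be illustrated with a simplified awk sketch: file revisions with the same author and log message that lie close together in time are merged into one changeset. The tab-separated input format and the function name are assumptions for illustration; the 300-second window mirrors cvsps’s default time fuzz (its -z option), while real cvsps handles many more cases, such as branches and tags.

```sh
# Simplified illustration of cvsps-style patchset grouping (hypothetical).
# Input on stdin, one file revision per line, tab-separated and sorted
# by time:  epoch-seconds  author  log-message  file:revision
# Output: "patchset-number<TAB>file:revision".
# A new patchset starts when the author or the log message changes, or
# when more than $1 seconds (default 300) passed since the previous
# file revision.
group_patchsets() {
  awk -F '\t' -v fuzz="${1:-300}" '
    $2 != author || $3 != msg || $1 - last > fuzz { ps++; author = $2; msg = $3 }
    { last = $1; print ps "\t" $4 }
  '
}
```

Two revisions by the same author with the same message, ten seconds apart, end up in one patchset; the same message an hour later starts a new one.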
A better solution may come in the future: parsecvs, by Keith Packard
of X.org fame. It’s at a very early alpha stage and currently needs
even more disk space, but it ought to be a lot faster eventually. At
least one can hope.
After this, you’ll have a Git-controlled tree full of the actual file
revisions, each stored as a separate object; it’s hard to estimate how
big that is. To get the compact repository shown above, you need to
pack it. For this, run:
git repack -d
This will compute for a few minutes/hours/days and spit out a nice pack
file of about 70 megabytes. If you want the compact version above, you
either need to figure out how to patch git repack to pass the
optimization options --window=50 --depth=50 to git-pack-objects, or
call that low-level tool directly. Higher values slow the process down
a lot while shrinking the pack by maybe half a megabyte at most. I
tried.
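Calling the low-level tool directly can be sketched as below: every object reachable from any ref goes into git pack-objects with the larger delta window and chain depth, and git prune-packed then deletes the loose copies. The function name pack_tight is hypothetical. Current Git releases accept these options on git repack itself, as git repack -a -d -f --window=50 --depth=50, where -f recomputes deltas instead of reusing existing ones.

```sh
# Hypothetical helper: pack all reachable objects with a larger delta
# search window ($1) and maximum delta-chain depth ($2), 50/50 by default.
# Run it inside the repository. Unlike git repack -d, it leaves any
# older packs in place.
pack_tight() {
  window=${1:-50}
  depth=${2:-50}
  packdir="$(git rev-parse --git-dir)/objects/pack" || return 1
  mkdir -p "$packdir"
  # --objects lists every commit plus the trees and blobs it references;
  # pack-objects writes pack-<sha1>.pack and .idx under the base name.
  git rev-list --objects --all |
    git pack-objects --window="$window" --depth="$depth" "$packdir/pack" >/dev/null ||
    return 1
  # Delete loose objects that are now stored in a pack.
  git prune-packed
}
```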
The great thing about git-cvsimport is that it can work
incrementally, so once we have the pack, we can update directly from
the Ruby CVS; the changes are small if you do that regularly. For this,
I included a small script in the pack, update-ruby-git:
Run this script regularly to keep your tree up to date. You don’t need
the mirrored CVSROOT or cvsup anymore.
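An incremental update boils down to re-running git cvsimport on the existing tree. The sketch below is a hypothetical stand-in, not the update-ruby-git script from the pack: the function name, the RUBY_CVSROOT and DRY_RUN variables, the anonymous pserver root :pserver:anonymous@cvs.ruby-lang.org:/src and the module name ruby are all assumptions to verify before use.

```sh
# Hypothetical stand-in for an incremental update (not the author's
# update-ruby-git). $1 is the existing Git tree (default ruby.git).
# RUBY_CVSROOT overrides the assumed anonymous pserver root, e.g. to
# point at a local mirror; DRY_RUN=1 only prints the command.
update_ruby_git() {
  dir=${1:-ruby.git}
  root=${RUBY_CVSROOT:-:pserver:anonymous@cvs.ruby-lang.org:/src}
  # -d: CVS root to read from, -C: Git tree to update, -v: be verbose.
  # Patchsets that were already imported are skipped.
  set -- git cvsimport -v -d "$root" -C "$dir" ruby
  if [ -n "$DRY_RUN" ]; then
    echo "$@"
  else
    "$@"
  fi
}
```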
Now, how is all of this useful? Obviously, you enjoy all the
benefits Git provides for your daily hacking: atomic commits,
distributed development, (almost!) zero-cost branches and good merges.
Also, you have the nice gitk repository browser that lets you keep
track of recent development. Since you can easily fetch every file at
every revision, it’s just a matter of time until someone starts
datamining… “what percentage of Ruby is really written by matz?”
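A crude first stab at the matz question counts commits per author. This sketch makes assumptions: it measures commits rather than lines of code (a per-line answer would need git blame), it relies on git-cvsimport having recorded CVS user names as author names, and the function name commit_share is hypothetical.

```sh
# Hypothetical helper: print each author's share of the commits
# reachable from a revision ($1, default HEAD), largest share first.
commit_share() {
  git log --format='%an' "${1:-HEAD}" |
    awk '{ count[$0]++; total++ }
         END { for (a in count) printf "%6.2f%% %s\n", 100 * count[a] / total, a }' |
    sort -rn
}
```

Run it inside the imported tree; each output line shows a percentage followed by an author name.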
You can use git bisect to find bugs in Ruby by marking some revision
as good and some as bad, and letting Git figure out which revision to
try next until it finds the faulty patch.
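When the check can be scripted, git bisect run automates the whole search: Git checks out a candidate revision, runs the command, and treats exit status 0 as good, 125 as skip, and other statuses from 1 to 127 as bad. The wrapper first_bad_commit below is a hypothetical helper; the test command is whatever reproduces your Ruby bug.

```sh
# Hypothetical wrapper around git bisect: print the first bad commit
# between a known-good ($1) and a known-bad ($2) revision. The remaining
# arguments form the test command (exit 0 = good, 125 = skip,
# other codes from 1 to 127 = bad).
first_bad_commit() {
  good=$1
  bad=$2
  shift 2
  git bisect start "$bad" "$good" >/dev/null 2>&1 || return 1
  if git bisect run "$@" >/dev/null 2>&1; then
    # When the run succeeds, refs/bisect/bad names the first bad commit.
    git rev-parse refs/bisect/bad
    status=0
  else
    status=1
  fi
  # Return to the branch that was checked out before bisecting.
  git bisect reset >/dev/null 2>&1
  return $status
}
```

For example: first_bad_commit GOOD_REV HEAD sh -c 'make && ./ruby bug.rb', where GOOD_REV and bug.rb are placeholders.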
And if you really want to use CVS, you can even emulate a CVS server
(read and write!) with git-cvsserver. Isn’t that impressive?
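To serve the repository that way, git-cvsserver has to be switched on per repository through the gitcvs.enabled setting; CVS clients then connect over SSH with CVS_SERVER set to git cvsserver and check out a branch name as the module. The helper name is hypothetical, and the user, server and path in the client-side comments are placeholders.

```sh
# Server side, hypothetical helper: allow CVS clients to access a
# repository ($1 = bare repository or .git directory, default .git).
enable_cvsserver() {
  git --git-dir="${1:-.git}" config gitcvs.enabled true
}

# Client side (user, server and path are placeholders):
#   export CVSROOT=:ext:user@server:/var/git/ruby.git
#   export CVS_SERVER="git cvsserver"
#   cvs co -d ruby-master master
```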
I will probably make the pack available on the net, but I haven’t yet
found a good way to let others fetch it efficiently (and
incrementally)… hopefully more about that later.