Python Buzz Forum - Centralized vs. Decentralized 2

After thinking about the comments on my last post, I'm starting to see some of the technical advantages of distributed version control (even though I'm not any more enamored of the development process it so often presumes).

Really the thing that keeps me from just opening up large swathes of a Subversion repository to anonymous access is the security concern. I just don't trust Subversion in that way, and I doubt the Subversion developers trust it that way either. After all, it's written in C, one of the Least Secure Languages Ever. (PHP is giving it a run for its money with its own take on How To Be Insecure, but C has a much deeper and richer legacy of insecurity.)

But a lot of the problems are hard to really imagine fixing in Subversion. What if someone uploads 10 GB of asdfasdfasdf into the repository? Sure, you can delete anything, but stuff still Lives Forever in the history. Or less maliciously, someone is sure to start uploading core dumps, or giant PDFs, or something. So even though Subversion is much less prone to mistakes than CVS, because operations can generally be "undone", there's still cruft left behind. Not enough to bother me now, but enough that I suspect I'd be bothered if I gave access to the public. (I still plan to give access to more people once I get the permission thing figured out, just not self-signup.)
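A server-side check that runs before a commit is accepted (Subversion does offer a pre-commit hook for this kind of thing) could at least blunt the giant-file problem, though it does nothing for cruft that is already in the history. Here is a minimal sketch of the policy part in Python. It assumes the hook can somehow produce the path and size of each file in the pending commit; the function names and the 5 MiB cap are made up for illustration and are not part of Subversion.

```python
# Hypothetical size policy for a pending commit. The names and the limit
# are illustrative assumptions, not anything Subversion provides.
MAX_FILE_BYTES = 5 * 1024 * 1024  # assumed cap: 5 MiB per file


def oversized_files(changes, limit=MAX_FILE_BYTES):
    """Return the paths whose size exceeds the limit.

    `changes` is an iterable of (path, size_in_bytes) pairs describing
    the files a pending commit would add or modify.
    """
    return [path for path, size in changes if size > limit]


def check_commit(changes, limit=MAX_FILE_BYTES):
    """Return (ok, message); ok is False if any file is over the limit."""
    rejected = oversized_files(changes, limit)
    if rejected:
        return False, "files over size limit: " + ", ".join(rejected)
    return True, "ok"
```

A real hook would exit non-zero when `check_commit` returns `False`, so the commit is refused before it ever reaches the history.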

Also, because lots of the logic lives on the server with a centralized tool like Subversion, there's a lot more to worry about in terms of remote exploits. If most of the logic is in the client then they can only exploit their own machine. Though on reflection this might be worse, since it means that checking out a repository could itself be a security risk. Well... let's just hope we're working with environments where security is valued and attainable.

Another issue is backend management. One of Subversion's benefits and drawbacks compared with CVS is that you can't "maintain" the repository, meaning you can't go in and fiddle with files on the server. This means you can't break the repository, but also that you can't fix it (like removing those core dumps someone uploaded, or completely eliminating defunct branches). Distributed systems leave room to meaningfully modify the "repository" using file commands, where the "repository" is really a whole set of repositories, which together form something equivalent to the more inclusive repository that Subversion expects.

So maybe a distributed version control system would be a good basis for an open centralized repository, where that open repository is primarily a file share. A usable system would actually handle the file sharing internally, since relying on scp, rsync, or an OS-level WebDAV client implementation is too error-prone at this time.

From this perspective, centralizing the files is still very important to me. In the model I would prefer, there is a limited set of privileged (and presumably somewhat trusted) branches (like trunk or HEAD, tagged releases, stable branches). And then there are all the other branches, and anyone can edit any of them. That means that the default is open, which I feel strongly is the right default. The current default of distributed systems is private, with author-only editing. This feels like an unnecessary restriction imposed mostly because they are avoiding the technical issues of sharing. I think the way these systems rely on email, rsync, ssh, etc., is simply avoidance, an unwillingness to address the whole experience. But that can certainly be resolved.
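To make the "open by default" rule concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the branch naming (`trunk`, `tags/...`, `branches/stable...`), the function names, and the idea of a simple set of trusted users. It is not how any existing system does it.

```python
# Hypothetical access policy: a few privileged branches, everything else open.
# The branch layout and names here are illustrative assumptions.
PROTECTED_EXACT = ("trunk", "HEAD")
PROTECTED_PREFIXES = ("tags/", "branches/stable")


def is_protected(branch):
    """True for the limited set of privileged branches."""
    return branch in PROTECTED_EXACT or branch.startswith(PROTECTED_PREFIXES)


def can_write(user, branch, trusted_users):
    """Anyone may edit an ordinary branch; only trusted users may edit
    a protected one."""
    if is_protected(branch):
        return user in trusted_users
    return True
```

For example, with `trusted_users = {"ian"}`, a stranger could commit to `branches/experiment` but not to `trunk`, while `ian` could commit to both.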

