This post originated from an RSS feed registered with Java Buzz by Michael Cote.
Original Post: PeopleOverProcess.com: Levanta and Punk IT
Feed Title: Cote's Weblog: Coding, Austin, etc.
Feed URL: https://cote.io/feed/
Feed Description: Using Java to get to the ideal state.
I spoke with David Dennis of Levanta (formerly LinuxCare, in "a past life") today, primarily about their Intrepid-M Linux management appliance. Their approach to Linux box management is, as it was most recently called, "punk IT": a bunch of easily managed boxes instead of a handful (or just one) of high-horsepower servers that you've virtualized. Of course, both of those differ from the ad hoc approach, where your "tools" are SSH and vi.
As David and I discussed, virtualization is the current golden boy of the approaches. But folks who don't have high-powered servers to sell, or the cash to buy said servers, increasingly speak in hushed, reverent tones about the "trash and re-provision" magic pixie dust (no pun intended...?) that Google practices. Almost every conversation I have with a Linux-head turns to The Google School of sysadmin'ing within about 5 minutes.
Given that trash-and-re-provision mindset, Levanta's story and problem space are "sort of like" Qlusters, as both parties put it when I asked. Levanta's approach is a bit different, though: it's more fine-grained and transactional in how it re-provisions. They haven't taken on monitoring, though their customers do tend to integrate the Intrepid M with other systems management platforms (via the command line, in the case we went over).
What it Does
In a nutshell, the Intrepid M works like this:
You have a bunch of "naked" boxes.
They boot up and bootstrap via PXE, connecting to the Intrepid M.
The Intrepid M maintains several "masters" (my term) of server software and default/base configuration. These are in the "Repository" above. These masters might be an email server, a web server, or whatever you might run on a server.
The masters act as prototypes for actual server instances. You might have one web server host your static content, while another load balances your PHP. One server could host the database for your employee records, while another one hosts the database for your inventory. In both cases, the majority of the software is the same, while the configuration and data are different.
The "diffs" between the masters/prototypes and the customizations are called "Overlays."
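The split between a shared master and per-server customization can be sketched as a layered lookup: check the Overlay first, fall back to the master. This is a simplified Python model, not Levanta's design; the master names, settings, and the `effective_config` helper are all hypothetical, and it treats an Overlay as a few settings rather than the binary diff the Intrepid M actually stores.

```python
from collections import ChainMap

# Hypothetical "masters": shared software plus default/base configuration.
MASTERS = {
    "web": {"packages": ("httpd", "php"), "listen_port": 80, "docroot": "/var/www"},
    "db": {"packages": ("postgresql",), "listen_port": 5432, "database": None},
}


def effective_config(master_name, overlay):
    """Look settings up in the Overlay first, falling back to the master."""
    return ChainMap(overlay, MASTERS[master_name])


# Two instances of the same "db" master: same software, different data.
employees = effective_config("db", {"database": "employee_records"})
inventory = effective_config("db", {"database": "inventory"})

print(employees["packages"] == inventory["packages"])  # True: shared software
print(employees["database"], inventory["database"])    # employee_records inventory
```

Because `ChainMap` only reads through to the master, every instance shares one copy of the defaults, and the master itself is never modified.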
So, as you recall, all your machines have PXE booted and connected to the Intrepid M, awaiting the command to download the software for whatever type of server each one will become. One "naked" server that comes up will be assigned as the web server that hosts images; another will be assigned the inventory database. The Intrepid M handles this assignment and downloads the software to each server. (If you want to be dork-cool in your word choice, you could say "the bytes" or "the bits" instead of "the software.")
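The post doesn't say what policy the Intrepid M uses to hand roles out, so here is a minimal first-come, first-served sketch. It assumes each box is identified by its network card's MAC address (which PXE clients present when they boot); the role list, the `on_pxe_boot` hook, and the addresses are hypothetical.

```python
from collections import deque

# Hypothetical roles still waiting for hardware: (server name, master).
PENDING_ROLES = deque([
    ("image-web", "web"),
    ("inventory-db", "db"),
])

assignments = {}  # MAC address of a PXE-booted box -> (server name, master)


def on_pxe_boot(mac):
    """Assign a newly booted "naked" box to the next unfilled role, if any."""
    if mac in assignments:  # the box rebooted; it keeps the role it already has
        return assignments[mac]
    if not PENDING_ROLES:   # nothing left to fill; the box stays a spare
        return None
    assignments[mac] = PENDING_ROLES.popleft()
    return assignments[mac]


print(on_pxe_boot("00:16:3e:00:00:01"))  # ('image-web', 'web')
print(on_pxe_boot("00:16:3e:00:00:02"))  # ('inventory-db', 'db')
print(on_pxe_boot("00:16:3e:00:00:03"))  # None
```

Boxes left over as spares become the obvious targets for re-provisioning when an assigned server dies.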
Now, all of these servers that previously had nothing installed on them are set up to run as their different types of servers. They're provisioned.
This is where the interesting stuff starts happening. At set intervals, each server takes a "snap-shot" of itself and sends it up to the Intrepid M. This snap-shot becomes the new Overlay for the server in question. The point is that, should a server crap out or catch fire (as so often happens in these scenarios), the Intrepid M has a recent copy of the machine's state and can re-provision it onto a non-smoldering server. That is, the Intrepid M backs up each server and can restore those backups on different machines.
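Here is the snapshot-and-failover cycle as appliance-side bookkeeping, in a minimal sketch. The `OverlayStore` class, its methods, and the box names are hypothetical. It treats Overlays as opaque bytes and only tracks which one is newest and which box each logical server currently lives on.

```python
class OverlayStore:
    """Hypothetical appliance-side bookkeeping; not Levanta's actual design."""

    def __init__(self):
        self.latest = {}  # server name -> (snapshot time, Overlay bytes)
        self.hosts = {}   # server name -> box it currently runs on

    def snapshot(self, server, host, taken_at, overlay):
        """Record the Overlay a server sent up at the end of an interval."""
        self.latest[server] = (taken_at, overlay)
        self.hosts[server] = host

    def reprovision(self, server, spare_host):
        """Move a server onto a spare box using its most recent Overlay.

        Returns the Overlay to lay down on top of the server's master.
        """
        if server not in self.latest:
            raise KeyError(f"no snapshot recorded for {server!r}")
        _, overlay = self.latest[server]
        self.hosts[server] = spare_host
        return overlay


store = OverlayStore()
store.snapshot("inventory-db", "box-07", "13:00", b"overlay-1300")
store.snapshot("inventory-db", "box-07", "13:15", b"overlay-1315")

# box-07 catches fire; bring the inventory database back up on box-12.
overlay = store.reprovision("inventory-db", "box-12")
print(overlay, store.hosts["inventory-db"])  # b'overlay-1315' box-12
```

The key point is that the server's identity lives on the appliance, not on the hardware, so "restoring" is just provisioning a different box with the same master and the latest Overlay.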
What's interesting, at the dork level, is how the Intrepid M does this. It doesn't just copy the entire server; that would take way too much space. Instead, the Overlays are binary diffs between the current state of the server and its master/prototype in the repository. This means the Intrepid M is only "backing up" the bits that have changed, not all of the bits on the server (there I go being cool with my word choice). This is how CVS and, I believe, Subversion work: when you check in a file, you're not actually checking in a full copy of it, just the things that have changed.
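The post doesn't describe Levanta's actual diff format, so the sketch below swaps in one simple technique: fixed-size block comparison. It cuts both the master and the current server image into equal-sized blocks and stores only the blocks that differ. `BLOCK_SIZE`, `make_overlay`, and `apply_overlay` are hypothetical names, and the byte strings stand in for real disk images.

```python
BLOCK_SIZE = 4  # tiny so the demo is readable; real images would use far larger blocks


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_overlay(master, current, size=BLOCK_SIZE):
    """Keep only the blocks of `current` that differ from `master`."""
    m, c = split_blocks(master, size), split_blocks(current, size)
    changed = {i: blk for i, blk in enumerate(c) if i >= len(m) or m[i] != blk}
    return {"length": len(current), "blocks": changed}


def apply_overlay(master, overlay, size=BLOCK_SIZE):
    """Rebuild the server's bytes from the master plus the stored blocks."""
    m = split_blocks(master, size)
    n_blocks = -(-overlay["length"] // size)  # ceiling division
    # Any index not stored in the Overlay is, by construction, unchanged from the master.
    out = [overlay["blocks"][i] if i in overlay["blocks"] else m[i]
           for i in range(n_blocks)]
    return b"".join(out)


master = b"kernel--libc----httpd---config-A"
current = b"kernel--libc----httpd---config-B"

overlay = make_overlay(master, current)
stored = sum(len(b) for b in overlay["blocks"].values())
print(f"stored {stored} of {len(current)} bytes")  # stored 4 of 32 bytes
print(apply_overlay(master, overlay) == current)   # True
```

Fixed-size blocks only catch in-place changes: inserting a byte near the start shifts every later block, so all of them look changed. Real delta tools use smarter matching for that case.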
This is where the ability to "roll back" a server comes from. You can keep 1 to n snap-shots/Overlays stored (depending on how much storage you have), so you could have the Overlays from 1:15PM, 1:00PM, 12:45PM, and so on. If your server gets hacked or otherwise "goes bad," you can roll back to the last known good state. Perhaps the server got hacked at 12:58PM, so you'd roll back to the 12:45PM Overlay.
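Picking the roll-back target is a small search: walk the retained Overlays from newest to oldest and take the first one taken before the compromise. This is a minimal sketch in which `MAX_OVERLAYS`, `record`, and `last_good` are hypothetical. A bounded `deque` stands in for the "depending on how much storage you have" limit, and Overlays are assumed to be recorded in chronological order.

```python
from collections import deque
from datetime import time

MAX_OVERLAYS = 4  # hypothetical retention limit, set by available storage

# Oldest first; once full, appending drops the oldest Overlay automatically.
history = deque(maxlen=MAX_OVERLAYS)


def record(taken_at, overlay):
    """Store an Overlay; assumes calls arrive in chronological order."""
    history.append((taken_at, overlay))


def last_good(compromised_at):
    """Return the newest Overlay taken strictly before the compromise, or None."""
    for taken_at, overlay in reversed(history):
        if taken_at < compromised_at:
            return overlay
    return None


for hh, mm in [(12, 15), (12, 30), (12, 45), (13, 0), (13, 15)]:
    record(time(hh, mm), f"overlay@{hh}:{mm:02d}")

print(len(history))             # 4 (the 12:15 Overlay was dropped)
print(last_good(time(12, 58)))  # overlay@12:45
```

Using a strict `<` is the conservative choice: a snapshot taken at the exact moment of the break-in may already include the damage.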
Customers
We haven't spoken with a customer reference yet (I neglected to ask, so I'll have to follow up), but the CUNY case is compelling. In discussing the history of the company, which used to be LinuxCare, one bullet point mentioned that they "[g]ot more customers in 12 months than last 2 years."