Summary
Eric Armstrong's recent article on Artima caught my eye and made me
think a bit about the evolution of distributed programming techniques.
In the beginning, there was the message. But what now?
In the beginning...
... there was the message. Eric Armstrong's recent article on Artima,
"Messaging is the Right Way to Build a Distributed System,"
caught my eye the other day. I've not had a chance to write anything for
Artima in quite a while and perhaps responding to this piece is as
good an excuse as any to get back in the habit.
Eric's article made me begin thinking about the evolution of
the distributed programming techniques I've seen and done in
the last two decades. While a message passing paradigm started
the ball rolling, I believe we've progressed far from that mark.
CSPs
I first started thinking about distributed computing using
messaging in about 1985 while working on a project that was
trying to make a loose collection of
devices communicate over a twisted pair network.
It was pretty primitive compared to what we do today, but the
problems were all the same: remoteness implies longer latencies,
the possibilities of partial failure, managing parallelism,
and dealing with the pure tedium of message formats and the
protocols that carry those messages.
The model we used then was reflected in Tony Hoare's book
Communicating Sequential Processes. CSPs are indeed
an excellent way to organize distributed components with simplicity.
Each service needs only a single thread (it could have more, but
more aren't required for this model): it takes a message, performs an
action, and sends a reply message to report the results to the
initiator.
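To make that shape concrete, here's a minimal sketch of such a
single-threaded service loop in Java, with a BlockingQueue standing in
for the network transport (the Message and Reply types are hypothetical
stand-ins, not anything from that project):

import java.util.concurrent.BlockingQueue;

// Hypothetical message shapes; a real system would define a wire format.
class Message {
    final long id;
    final String action;
    Message(long id, String action) { this.id = id; this.action = action; }
}

class Reply {
    final long id;
    final int status;
    Reply(long id, int status) { this.id = id; this.status = status; }
}

// One thread per service: take a message, perform the action, reply.
final class CspStyleService implements Runnable {
    private final BlockingQueue<Message> inbox;
    private final BlockingQueue<Reply> replies;

    CspStyleService(BlockingQueue<Message> inbox, BlockingQueue<Reply> replies) {
        this.inbox = inbox;
        this.replies = replies;
    }

    public void run() {
        try {
            while (true) {
                Message m = inbox.take();              // block for the next request
                int status = perform(m);               // perform the action
                replies.put(new Reply(m.id, status));  // report back to the initiator
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();        // asked to shut down
        }
    }

    private int perform(Message m) { return 0; }       // stand-in for real work
}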
Message paradigm wrapped in an RPC
A few years later, I was running a project building a piece of
semiconductor fabrication equipment and I employed the same
approach. The machine had a large number of moving parts and
several racks of computers providing the computing power to handle
the motion control and supervisory functions. The CSP approach
made managing parallelism about as easy as it was going to be.
The model was so intuitive that I constructed a programming language
called MIL (the Machine Interface Language)
which I had hoped technicians might be able to use to calibrate
and diagnose problems with the machine.
MIL could format messages and send them on the machine's message bus.
One of the mantras I'd chant was, "everything has to be on the bus!"
While this probably drove the engineers nuts, the approach ensured that
the power of the network and this message passing paradigm extended into
all of the nooks and crannies of the machine. If it works for the big
stuff, it works for the small stuff, too.
The message handling primitives of send and receive were
quickly wrapped into procedure forms: one form for a regular RPC
(send, then wait for the reply), and a call-no-wait form. (The majority
of the code written for the machine was in C, and the C programmers
likewise wrapped their messaging in RPC wrappers.) The RPC form looked
something like this in MIL:
status := ssl_move(x:=5, y:=10, timeout:=40000)
if (status != 0) then
...
end if
To do things in parallel, you would use the no wait
form of the subroutine call which would just send the message
and not wait for the reply. Waiting for the reply was done in
a separate step. MIL's syntax for using the no wait
form looked something like this:
// initialize the substrate stage
x := ssl_initialize_no_wait(param := 5)
// initialize the reticle stage
y := ret_initialize_no_wait(foo := 6)
// now do the rendezvous
wait(x, 30000)
wait(y, 30000)
if (waiting(x) or waiting(y)) then
...
end if
In the above code, we sent messages to both the
left substrate stage (ssl) and the reticle stage
to have them initialize. The _no_wait
forms of these routines just sent the messages to
the respective subsystems and returned a
handle that could be used later to
recognize the arrival of the return message. So,
in this example, we started the initialization
process for two of the machine's subsystems and
then, once all this activity was started (in
parallel), we could wait for their completion with
the wait command. The
wait command takes as its first parameter
the handle returned by the no-wait call, and
a timeout value (in milliseconds) as its second. In
this case, we just start waiting for the return
messages. In the normal case, both return messages
arrive and both wait calls
succeed before their timeouts kick in. The
if statement after the two
wait calls asks whether either of those
two waits failed, by seeing if we
are still waiting. All of this is
pretty simple, and it provided a very powerful,
intuitive model that most everybody (engineers,
technicians, test designers, and so on) could use.
Notice, though, that the underlying mechanism, message passing,
eventually gathered another layer of abstraction on top of it.
This wasn't to obscure the remoteness of the various
subsystems inside the machine; it was to provide some syntactic
sugar that made the concepts clearer and more palatable to
the developers.
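For what it's worth, that handle-then-rendezvous pattern maps almost
directly onto Java's ExecutorService and Future today. Here's a rough
modern analogue of the MIL fragment above (the two initialize methods
are hypothetical stand-ins for the real subsystem calls):

import java.util.concurrent.*;

class Rendezvous {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Start both initializations in parallel; each Future is the "handle".
        Future<Integer> x = pool.submit(() -> substrateStageInitialize(5));
        Future<Integer> y = pool.submit(() -> reticleStageInitialize(6));

        try {
            // The rendezvous: wait up to 30 seconds for each reply.
            int xStatus = x.get(30, TimeUnit.SECONDS);
            int yStatus = y.get(30, TimeUnit.SECONDS);
            System.out.println("statuses: " + xStatus + ", " + yStatus);
        } catch (TimeoutException e) {
            System.err.println("a subsystem did not reply in time");
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-ins for the real subsystem calls.
    static int substrateStageInitialize(int param) { return 0; }
    static int reticleStageInitialize(int foo)     { return 0; }
}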
Moving more than data in our messages
After this project it was time to waste some career cycles doing
stuff for the web. I'd done some SQL stuff in previous jobs
but it was interesting to me how much programming was being
done in the database layer. It wasn't just all the SQL query
strings that were being passed around, though there were
plenty of those, it was also the number of stored procedures
that were being used. It seemed like the combination of SQL
and stored procedures was a programming platform in its
own right. The line between when code was installed and when it
was used was starting to blur in my mind. Were SQL statements
data or code? How about stored procedures? If you install
stored procedures and then exercise them along with a bunch
of other SQL code, is that mobile code? Or is it
just code that was subject to a lightweight
or temporary install? Is there a difference?
Is a connection to a database, and the passing of SQL
statements over it, just business as usual in Eric's messaging
model? Or is it more along the lines of mobile code?
Clearly, the purpose of moving the SQL statements to the
database is to have the processing done closer to the
data with only the results passed back to the initiator.
That sounds like mobile code to me. In fact, the whole
point of the database connector is so I don't need to
know how the magic of the transport and messaging
between my program and the database works.
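A plain JDBC fragment makes the point: the SQL string is a little
program shipped to the database, and only the results come back over
the wire (the connection URL and the orders table here are
hypothetical):

import java.sql.*;

class MoveTheQuestionNotTheData {
    public static void main(String[] args) throws SQLException {
        // The SELECT below runs next to the data; only matching rows
        // cross the network back to us.
        try (Connection c = DriverManager.getConnection("jdbc:postgresql://dbhost/sales");
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                 "SELECT customer, SUM(amount) FROM orders GROUP BY customer")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getBigDecimal(2));
            }
        }
        // How the driver talks to the server (transport, framing, protocol)
        // is a private matter between the vendor and the driver writer.
    }
}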
Why should I care about the protocol?
When I'm trying to talk to a service like a database
I really don't want to care about the underlying
transport or how they organize the messaging beneath.
I've been given a different, higher-level view than
all that and I'm happy for it. My program has to be
able to handle database errors (including a lost
connection to the database), but I don't need to
understand any of that stuff the database connector
provided.
In fact, the particular transport and protocol for
the interactions with the database are a private
problem between the database vendor and the provider
of the connector technology. If they release a new
version of the database and connector, they could
even change this protocol completely and I, after a
quick recompile and relink, shouldn't care a bit.
Wouldn't it be nice if more stuff worked like that?
More stuff does work like that
The idea of moving code isn't new. Back in 1993
and 1994 General Magic carved out a nifty idea of
having a scripting language regular users could use
to help organize their personal communication. Here's
an example of what this could have done.
Let's say you set up some rules for how you wish to be contacted:
If email is received during 8-6, M-F, route it to office mail.
If email is marked urgent, route to office mail and copy to
my Blackberry.
If I haven't picked up a message during normal business hours
and a message is 4 hours old (or older), send a message to my
Blackberry with the subject only.
If I receive a message from "boss" during normal business hours
that is not picked up in 30 minutes or less, call me on my
cell phone and do a text-to-speech of the message
(so I can pick it up at the ballpark... Hey, I don't want to
be fired...)
Telescript was going to do stuff like this and much more.
Here's the blurb from a
Byte magazine article
from that era:
"Telescript, a communications-oriented programming
language comparable to C or Pascal, will let developers
create network-independent intelligent agents and distributed
applications. Some of these applications, in turn, may be tools
that let ordinary users create intelligent agents without
programming. What PostScript did for cross-platform,
device-independent documents, Telescript aims to do for
cross-platform, network-independent messaging. General
Magic hopes Telescript will become a lingua franca for
communications."
We didn't get a world filled with Telescript. But, we do
have a world filled with Java and we've been moving Java
code since the beginning of the Java movement. Applets,
those sometimes cutesy and occasionally even useful pieces
of Java code that get loaded into web browsers, are an
example of mobile code. Sometimes those applets talk back
to the computer that served them up to our web browser.
Again, the browser provides a place for the applet to
run and the applet provides a service within the page,
but whatever protocols or messages it exchanges with
the remote computer are invisible, and uninteresting,
to us. Like the database connection utilities, I don't
really care how it does its job.
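The machinery involved needn't be big, either. A toy applet that
phones home to the server that delivered it might look something like
this (the status.txt path is hypothetical):

import java.applet.Applet;
import java.awt.Graphics;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PhoneHomeApplet extends Applet {
    private String status = "loading...";

    public void start() {
        try {
            // Talk back to the host that served this applet up.
            URL home = new URL(getCodeBase(), "status.txt");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(home.openStream()));
            status = in.readLine();
            in.close();
        } catch (Exception e) {
            status = "unreachable";
        }
        repaint();
    }

    public void paint(Graphics g) {
        // Whatever protocol the fetch used is invisible to the page author.
        g.drawString("server says: " + status, 20, 20);
    }
}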
More stuff should work like this
Moving code around, whether it be SQL commands,
Telescript, or Java code, has much to recommend it.
There are a couple of things that moving code
accomplishes:
Computation happens in the right place, and
details like protocol specifics and message formats can be hidden.
Jini, a distributed computing model
for Java, does move code around, and for both of these reasons. But
there is some reluctance, even in this forward-looking group, to fully
embrace moving code to solve some problems. I had a long-running
conversation on the JavaSpaces list where I lamented that some of
the proposed changes to the JavaSpaces specification made this spring
were trying to move data when moving code made more sense to me.
You can read my prattling
here.
Who knows, maybe I was off my rocker--but it seemed like if moving SQL
commands to the database for data selection made sense, moving code to a JavaSpace
for data selection made at least as much sense.
The relationship between the JavaSpaces client and the JavaSpaces
service is managed by the mobile code of the service's proxy.
Just as the database connector technology provided a local proxy
for the facilities supplied by the database, the JavaSpaces proxy
supplies a facade for the JavaSpaces service. The only difference
between these two cases is when the linkage-editor functionality
happens. If you believe "later is better", you're gonna love Jini
and JavaSpaces.
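For readers who haven't seen it: the client's entire view of a
JavaSpaces service is the JavaSpace interface implemented by that
downloaded proxy. A minimal sketch (service lookup elided; the Task
entry type is hypothetical):

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// Entries are plain objects with public fields; matching is by field
// value, and a null field acts as a wildcard in a template.
public class Task implements Entry {
    public String kind;
    public Integer input;

    public Task() {}                   // entries need a public no-arg constructor
    public Task(String kind, Integer input) {
        this.kind = kind;
        this.input = input;
    }
}

class SpaceClient {
    static void useSpace(JavaSpace space) throws Exception {
        // "space" is the proxy that arrived as mobile code; whatever protocol
        // write() and take() speak underneath is its private business.
        space.write(new Task("fft", 42), null, Lease.FOREVER);

        Task template = new Task("fft", null);              // match any "fft" task
        Task t = (Task) space.take(template, null, 10000);  // wait up to 10 seconds
        if (t != null) {
            System.out.println("took task with input " + t.input);
        }
    }
}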
The evolution from the messaging paradigm
We've come a long way since the simple exchange of messages.
We started with just simple messages, protocols that had
to be agreed upon up-front, and Hoare's CSP model. But
as time progressed, there was pressure on that model.
Pressure to move the computation closer to the data
The SQL commands you send in every database query are an
example of moving the computation closer to the data. You
can buy more computing power; it is hard to buy more network
bandwidth.
Pressure to keep a familiar program model like subroutine calls
Messages are fine but the RPC programming model is often what
you're trying to do anyway: make a call, get an answer, move on.
Why not use something that looks like a procedure or
function to do that?
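Java's dynamic proxies show how little it takes to satisfy this
pressure: hand callers an ordinary interface and translate each
invocation into a send-and-wait-for-reply underneath. A minimal
sketch, with a hypothetical MessageBus standing in for the transport:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// The familiar subroutine-call view that callers get to keep.
interface MotionControl {
    int move(int x, int y);
}

// Hypothetical transport: send a request message, block for the reply.
interface MessageBus {
    Object call(String operation, Object[] args);
}

class RpcOverMessages {
    static MotionControl bind(MessageBus bus) {
        InvocationHandler handler = (proxy, method, args) ->
            // Each method call becomes one message exchange on the bus.
            bus.call(method.getName(), args);
        return (MotionControl) Proxy.newProxyInstance(
            MotionControl.class.getClassLoader(),
            new Class<?>[] { MotionControl.class },
            handler);
    }
}

// Usage: int status = RpcOverMessages.bind(bus).move(5, 10);
// It reads like a procedure call but travels as a message.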
Pressure to hide unnecessary details
I don't care about the protocol between the database and
whatever connector technology I'm using. In fact, the
less I know the better off everybody is. If they want to
change it in their next release, that's fine with me.
Pressure to control both ends of the connection
If you have to declare message formats and protocols up-front,
that means a lock-in that could be uncomfortable later. If
instead you simply offer an API for the service (as the database
connection utility or Jini service would do), you can control
both ends of the protocol and the message formats because both
of these things are implementation details below the visibility
of your end-users.
Distributed computing's growth over the last half decade has
been astonishing to me. And, while I'm still a fan of Hoare's
simple model, I recognize that pressures have encouraged an
evolution of tools, techniques, and attitudes that have made
distributed system development easier.
It may have all started with the message... but we've come a long
way since then.
Resources
Communicating Sequential Processes, C. A. R. Hoare, Prentice-Hall
International Series in Computer Science, 1985.
This was one of the first texts I had seen on the topic.
It was a bit heavy on the symbology and the math, but once I got it,
I was hooked. I still have the hard-bound edition on my shelf.
General Magic invented Telescript, an idea out of the blue and far
ahead of its time.
Scott- As you may know, I'm all for allowing code to be moved to the space, perhaps going further with the idea than most people would be comfortable with:
Indeed, Neon (http://neon.jini.org) works on the principle of having code move around the network, in order to allow applications (by this I mean the applications that use your services) to wire their various components together seamlessly at runtime. As you set out in your post, I ended up writing a messaging system underpinning Neon, one that translates method invocations (via dynamic proxies) into messages (objects, not XML) and sends them between nodes - and it uses Jini to send the messages.
My issue with Eric's original post is the claim that XML gives you evolvability and immunity from changes; my point is that to get that, you give up automatic validation of the XML in your process. With serialization, your marshalled object is compared against the class definition before you can get to it; in other words, the byte stream is implicitly verified. And of course you can have multiple versions, either through careful, non-serialization-breaking changes to your class format, through readExternal, or, if using Jini, through PreferredClassLoaders.
However, with XML, suppose you have an XML Schema (S1) that describes the types and formats your parser expects from data purporting to conform to that schema (X1), and you have another XML file (X2) that conforms to an 'evolved' version of that schema (S1.1). For a system using the first schema (S1) to read the new file (X2), the existing schema must be updated to recognize the new element definitions (even though processing further on may ignore them). The only advantage here is that with XML you can turn off implicit validation and verification of the XML stream.
BTW, it's good to see some context applied to how distributed systems have evolved over the years.
Nice historical commentary. What I find interesting is that people seem to mix up the whole synchronous vs. asynchronous vs. reliability thing with whether something is invocation based or message based. They seem to miss the fact that it's an exercise in abstraction - I can use either approach to implement synchronous or asynchronous behaviour as I require. I'd have expected all this to be obvious given the behaviour/characteristics of TCP/IP and what's been built on top of it.
If it matters, I played around with the code moving approach and prototyped an interface for Blitz that allowed a client to move selected code to the server for more rapid execution without the round-trips.
A few people have played with it but not many, and feedback is scarce. I think most people simply have problems with any system where the location of the code changes - maybe it's because they can't so easily model it in their heads, or maybe it's some other issue. I'd love to know.
I have had a very positive experience of passing objects instead of messages, particularly using JavaSpaces, which is part of Jini. The problem with message passing is that it is too static: if you want to add a new message, then everything (all machines) needs updating. With object passing you can distribute the code on the fly, not just the values of the objects, which is much more flexible.
If you want to pass the objects' values using XML, then you can, at least in Java. You can serialize to XML or use one of the binding packages, e.g. JAXB.
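For example, a minimal JAXB sketch (the Account class here is hypothetical):

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringWriter;

// Hypothetical value class; JAXB serializes its state, not its code.
@XmlRootElement
public class Account {
    public String name;
    public String number;
    public double amount;
}

class XmlBindingExample {
    public static void main(String[] args) throws Exception {
        Account a = new Account();
        a.name = "Alice";
        a.number = "42";
        a.amount = 10.0;

        Marshaller m = JAXBContext.newInstance(Account.class).createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        StringWriter out = new StringWriter();
        m.marshal(a, out);   // the object's values go out as XML text
        System.out.println(out);
    }
}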
Great article. (I wanted to respond to "Eric's" original, but thank you for posting such an in-depth analysis here.)
First off I heartily agree with the idea "Develop an API on top of message passing" if your system requires such message passing.
Having implemented a "Message passing system" in Java I wanted to respond with my experience and throw out some ideas and questions.
Many people have pointed out that Object passing systems are inherently flawed because all of the "clients" and the "server" need to stay in sync (i.e. new Stubs/Skeletons for EJBs, WSDL and proxies). I'm trying to understand how a message passing system resolves this issue.
Assume the "interface/syntax" of Service/message changes (on the server), and the client sends a message with the old "Payload/Syntax"? What actions should the server take? How would the message be deciphered on the (distributed) server to know which method to be called? (what if the message is "overloaded") And what does each "field" in the message mean (symantically) as it is being invoked on the server?
One (generally bad) way to solve this would be to define a generic interface (i.e. doAction(Map fields)), which pawns the problem off on the server process. (I don't want to write AI on top of my message parser.) Well-defined interfaces are my preferred approach to distributed systems. And yes, you have to redeploy to all clients each time you make a change.
Then there is the idea of passing "agents" which holds a good deal of promise, it sounds really cool. But at the point we start talking about doing this (or "pushing code" to the server), I think, "Well if it's that easy (just push it code to the server)" why not just physically put the entire App on the server and forget this distributed nonsense?". We can still scale (cluster) the app horizontally, and we bypass all of these other distributed problems.
> Then there is the idea of passing "agents", which holds a good deal of promise; it sounds really cool. But at the point we start talking about doing this (or "pushing code" to the server), I think, "Well, if it's that easy (just push the code to the server), why not just physically put the entire app on the server and forget this distributed nonsense?" We can still scale (cluster) the app horizontally, and we bypass all of these other distributed problems.
Actually, you wouldn't push all your code to the server. Typically you push a proxy object, which communicates with your own machine, where the real work is done. The advantage of this approach is that you control both the proxy and your machine, so the communication protocol is completely private to your software. You can use RMI, XML or pigeons to handle your communication, and the other software on the server wouldn't care as long as your proxy implements an interface it can talk to.
For software evolution, you can create new versions of the interface, have your proxy implement those as well, and two clients using different versions of the interface can both talk to your software.
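A minimal sketch of that versioning idea (all names hypothetical):

// Version 2 extends version 1 with a new operation.
interface AccountServiceV1 {
    double balance(String account);
}

interface AccountServiceV2 extends AccountServiceV1 {
    double balanceOn(String account, java.util.Date date);
}

// The downloaded proxy implements both; an old client casts to V1,
// a new client casts to V2, and both reach the same backend.
class AccountProxy implements AccountServiceV2 {
    public double balance(String account) {
        return callHome("balance", account);
    }
    public double balanceOn(String account, java.util.Date date) {
        return callHome("balanceOn", account, date);
    }
    private double callHome(String op, Object... args) {
        return 0.0; // stand-in for the private wire protocol back home
    }
}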
I suppose the underlying message, which I did not effectively communicate in my post, is: "How much complexity and overhead are we adding to the system when we decide to make it a distributed system?" The idea that we can just "push code" to a server makes me wonder whether the system should be distributed in the first place.
I've never had the opportunity to use "agent passing", which sounds extremely interesting... I would question, though, how "safe" it would be to allow this in a real business-class system. (Am I really going to open up my system to let agents do what they want? What if an agent decides it needs all of my system resources to accomplish something, like a full table scan on every table?)
Adding this "agent-passing" to a system (as you describe) makes sense, and it does solve the (immutable interface) problem, however, is the juice worth the squeeze? (IMO it seems complicated, and just opens up another level of complexities and security issues).
By the way when I refer to "Distributed System" I'm referring to a system which is more than the standard "client/server" type (I just want to make sure we're discussing the same thing.)
The primary barrier to an agent consuming massive system resources is not technical, but social/organizational.
Like you said, these systems are not simple, small client-server setups, but serious enterprise systems. You don't open up these systems to code/agents from just anyone. Any agent that needs to be pushed onto this system will go through the same testing and security procedures that other software on such systems goes through.
Like so many technologies, it's a balancing game, trading one type of complexity for another. It entirely depends on your own environment which sort of complexity is easier to manage.
A closely related topic is the form of consulting that a software product company might offer. I founded and ran a small software company for 20 years, and we were able to field 5 or 6 good engineers as consultants and charge consulting level rates. And the customers were very pleased with the value delivered for the monies paid.
The reason these 5 or 6 engineers were perceived as good value for the customers is that they really did know their stuff. And the reason they really knew their stuff was that when they were not consulting, they were doing actual development on the product itself. So they had a depth of knowledge far greater than any outside consultant could have.
But this model, of course, is not at all scalable, and my company eventually went down the tubes because it never successfully scaled up. Had we scaled up, we would have invariably started fielding "consultants" that did nothing but consulting, their in-depth knowledge would have declined, and their perceived value to the customer would have declined. Sad but true.
I agree with the statements made on the thread about the future being smaller companies with great, in-depth expertise. But there is always a large amount of skepticism from a large company when dealing with a small company. That is why large companies throw such money at large companies like Anderson only to get mediocre quality and poor ROI.
Scott, I don't know if you are aware, but I started a project on jini.org called 'fspace' (flexible space) where I put some code that codified all the issues I thought were important about iterators versus executors, etc. It's still a work in progress, as I haven't got transactions in there yet, but the interfaces are visible, and there is code there that runs.
I mainly wanted to start to see what type of latencies would be involved with an executor, for frequent changes in the executor and similar things with the iterator concept.
One of the primary things that I did was to separate the data stored from the keys used for selection. This allows some data to be marshalled and other data not. I also eliminated the use of MarshalledObject in the space's implementation so that you could send whatever kind of data you wanted into the space, optimize access in your executor, and then potentially recast that data to a different form for return to the client.
Fascinating. Google is using an interesting technology known as AJAX that basically consists of moving JavaScript to the web browser. AJAX is the magic behind Google Maps and GMail. So in this case, we have code moving from server to client. Also, some of the p2p apps, such as Napster, utilize code at each peer.
I think the "marshall by text" approach that is XML is the way to go. Do you really need to pass an object or just its state? Do I need an Account object or do I just need a Name, Number and Amount? If the Account object provides more functionality than just properties, then I have a big problem in terms of coupling the design between the producer and the plethora of consumers of the Account object.
I guess "decoupled" object RPC is possible. With reflection and all the dynamic what-not, maybe collaborating systems can figure out how to work with strange objects that may appear on thier doorsteps.
> Fascinating. Google is using an interesting technology known as AJAX that basically consists of moving JavaScript to the web browser. AJAX is the magic behind Google Maps and GMail. So in this case, we have code moving from server to client. Also, some of the p2p apps, such as Napster, utilize code at each peer.
This is just a heavy client application in that the client is doing the work instead of the server. There is mobile code here as well.
> I think the "marshall by text" approach that is XML is the > way to go. Do you really need to pass an object or just > its state? Do I need an Account object or do I just need > a Name, Number and Amount? If the Account object provides > more functionality than just properties, then I have a big > problem in terms of coupling the design between the > producer and the plethora of consumers of the Account > object.
Well, there are two pieces here. There is the application, which is what the users care about and then there is the underlying remote communications that creates the app.
Because the JavaScript is downloaded code, it can be changed to optimize the client. Google is getting ready to upgrade, or already has upgraded, the client to revision 17, which will include some rendering optimizations that some are saying are great improvements for large applications.
What we see happening here is interesting. Google Maps doesn't demand anything about the data that is used to create the application additions to the basic map. You can do whatever you want. Many are using XML documents (I am at http://www.w5ias.com/status), but others are using a database or other backend store, plus some static data in the downloaded HTML page.
> I guess "decoupled" object RPC is possible. With > reflection and all the dynamic what-not, maybe > collaborating systems can figure out how to work with > strange objects that may appear on thier doorsteps.
I am still very skeptical about how productive this can be without some high-level agreements on certain aspects of the data and its use.