Summary
Eric Armstrong's recent article on Artima caught my eye and made me
think a bit about the evolution of distributed programming techniques.
In the beginning, there was the message. But what now?
In the beginning...
... there was the message. Eric Armstrong's recent article on Artima,
"Messaging is the Right Way to Build a Distributed System,"
caught my eye the other day. I've not had a chance to write anything for
Artima in quite a while and perhaps responding to this piece is as
good an excuse as any to get back in the habit.
Eric's article made me begin thinking about the evolution of
the distributed programming techniques I've seen and done in
the last two decades. While a message passing paradigm started
the ball rolling, I believe we've progressed far from that mark.
CSPs
I first started thinking about distributed computing using
messaging in about 1985 while working on a project that was
trying to make a loose collection of
devices communicate over a twisted pair network.
It was pretty primitive compared to what we do today, but the
problems were all the same: remoteness implies longer latencies,
the possibilities of partial failure, managing parallelism,
and dealing with the pure tedium of message formats and the
protocols that carry those messages.
The model we used then was reflected in Tony Hoare's book
Communicating Sequential Processes. CSPs are indeed
an excellent way to organize distributed components with simplicity.
Each service needs only a single thread (it could have more, but
more aren't required for this model): it takes a message, performs an
action, and sends a reply message to report the results to the
initiator.
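To make that shape concrete, here's a minimal sketch of such a
single-threaded service loop in Java, with a BlockingQueue standing in
for the network transport (the Message and Reply types are hypothetical
stand-ins, not anything from that project):

import java.util.concurrent.BlockingQueue;

// Hypothetical message shapes; a real system would define a wire format.
class Message {
    final long id;
    final String action;
    Message(long id, String action) { this.id = id; this.action = action; }
}

class Reply {
    final long id;
    final int status;
    Reply(long id, int status) { this.id = id; this.status = status; }
}

// One thread per service: take a message, perform the action, reply.
final class CspStyleService implements Runnable {
    private final BlockingQueue<Message> inbox;
    private final BlockingQueue<Reply> replies;

    CspStyleService(BlockingQueue<Message> inbox, BlockingQueue<Reply> replies) {
        this.inbox = inbox;
        this.replies = replies;
    }

    public void run() {
        try {
            while (true) {
                Message m = inbox.take();              // block for the next request
                int status = perform(m);               // perform the action
                replies.put(new Reply(m.id, status));  // report back to the initiator
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();        // asked to shut down
        }
    }

    private int perform(Message m) { return 0; }       // stand-in for real work
}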
Message paradigm wrapped in an RPC
A few years later, I was running a project building a piece of
semiconductor fabrication equipment and I employed the same
approach. The machine had a large number of moving parts and
several racks of computers providing the computing power to handle
the motion control and supervisory functions. The CSP approach
made managing parallelism about as easy as it was going to be.
The model was so intuitive that I constructed a programming language
called MIL (the Machine Interface Language)
which I had hoped technicians might be able to use to calibrate
and diagnose problems with the machine.
MIL could format messages and send them on the machine's message bus.
One of the mantras I'd chant was, "everything has to be on the bus!"
While this probably drove the engineers nuts, the approach ensured that
the power of the network and this message passing paradigm extended into
all of the nooks and crannies of the machine. If it works for the big
stuff, it works for the small stuff, too.
The message handling primitives of send and receive were
quickly wrapped into procedure forms: one form for a regular RPC
(send, then wait for the reply), and a call-no-wait form. (The majority
of the code written for the machine was in C, and the C programmers
likewise wrapped their messaging in RPC wrappers.) The RPC form looked
something like this in MIL:
status := ssl_move(x:=5, y:=10, timeout:=40000)
if (status != 0) then
...
end if
To do things in parallel, you would use the no wait
form of the subroutine call which would just send the message
and not wait for the reply. Waiting for the reply was done in
a separate step. MIL's syntax for using the no wait
form looked something like this:
// initialize the substrate stage
x := ssl_initialize_no_wait(param := 5)
// initialize the reticle stage
y := ret_initialize_no_wait(foo := 6)
// now do the rendezvous
wait(x, 30000)
wait(y, 30000)
if (waiting(x) or waiting(y)) then
...
end if
In the above code, we sent messages to both the
left substrate stage (ssl) and the reticle stage
to have them initialize. The _no_wait
forms of these routines just sent the messages to
the respective subsystems and returned a
handle that could be used later to
recognize the arrival of the return message. So,
in this example, we started the initialization
process for two of the machine's subsystems and
then, once all this activity was started (in
parallel), we could wait for their completion with
the wait command. The
wait command takes as its first parameter
the handle returned by the no-wait call, and
a timeout value (in milliseconds) as its second. In
this case, we just start waiting for the return
messages. In the normal case, both return messages
arrive and both wait calls
succeed before their timeouts kick in. The
if statement after the two
wait calls asks whether either of those
two waits failed, by seeing if we
are still waiting. All of this is
pretty simple, and it provided a very powerful,
intuitive model that most everybody (engineers,
technicians, test designers, and so on) could use.
Notice, though, that the underlying mechanism, message passing,
eventually gathered another layer of abstraction on top of it.
This wasn't to obscure the remoteness of the various
subsystems inside the machine; it was to provide some syntactic
sugar that made the concepts clearer and more palatable to
the developers.
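For what it's worth, that handle-then-rendezvous pattern maps almost
directly onto Java's ExecutorService and Future today. Here's a rough
modern analogue of the MIL fragment above (the two initialize methods
are hypothetical stand-ins for the real subsystem calls):

import java.util.concurrent.*;

class Rendezvous {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Start both initializations in parallel; each Future is the "handle".
        Future<Integer> x = pool.submit(() -> substrateStageInitialize(5));
        Future<Integer> y = pool.submit(() -> reticleStageInitialize(6));

        try {
            // The rendezvous: wait up to 30 seconds for each reply.
            int xStatus = x.get(30, TimeUnit.SECONDS);
            int yStatus = y.get(30, TimeUnit.SECONDS);
            System.out.println("statuses: " + xStatus + ", " + yStatus);
        } catch (TimeoutException e) {
            System.err.println("a subsystem did not reply in time");
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-ins for the real subsystem calls.
    static int substrateStageInitialize(int param) { return 0; }
    static int reticleStageInitialize(int foo)     { return 0; }
}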
Moving more than data in our messages
After this project it was time to waste some career cycles doing
stuff for the web. I'd done some SQL stuff in previous jobs
but it was interesting to me how much programming was being
done in the database layer. It wasn't just all the SQL query
strings that were being passed around, though there were
plenty of those, it was also the number of stored procedures
that were being used. It seemed like the combination of SQL
and stored procedures was a programming platform in its
own right. The line between when code was installed and when it
was used was starting to blur in my mind. Were SQL statements
data or code? How about stored procedures? If you install
stored procedures and then exercise them along with a bunch
of other SQL code, is that mobile code? Or is it
just code that was subject to a lightweight
or temporary install? Is there a difference?
Is a connection to a database, and the passing of SQL
statements over it, just business as usual in Eric's messaging
model? Or is it more along the lines of mobile code?
Clearly, the purpose of moving the SQL statements to the
database is to have the processing done closer to the
data with only the results passed back to the initiator.
That sounds like mobile code to me. In fact, the whole
point of the database connector is so I don't need to
know how the magic of the transport and messaging
between my program and the database works.
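A plain JDBC fragment makes the point: the SQL string is a little
program shipped to the database, and only the results come back over
the wire (the connection URL and the orders table here are
hypothetical):

import java.sql.*;

class MoveTheQuestionNotTheData {
    public static void main(String[] args) throws SQLException {
        // The SELECT below runs next to the data; only matching rows
        // cross the network back to us.
        try (Connection c = DriverManager.getConnection("jdbc:postgresql://dbhost/sales");
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                 "SELECT customer, SUM(amount) FROM orders GROUP BY customer")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getBigDecimal(2));
            }
        }
        // How the driver talks to the server (transport, framing, protocol)
        // is a private matter between the vendor and the driver writer.
    }
}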
Why should I care about the protocol?
When I'm trying to talk to a service like a database
I really don't want to care about the underlying
transport or how they organize the messaging beneath.
I've been given a different, higher-level view than
all that and I'm happy for it. My program has to be
able to handle database errors (including a lost
connection to the database), but I don't need to
understand any of that stuff the database connector
provided.
In fact, the particular transport and protocol for
the interactions with the database are a private
problem between the database vendor and the provider
of the connector technology. If they release a new
version of the database and connector, they could
even change this protocol completely and I, after a
quick recompile and relink, shouldn't care a bit.
Wouldn't it be nice if more stuff worked like that?
More stuff does work like that
The idea of moving code isn't new. Back in 1993
and 1994 General Magic carved out a nifty idea of
having a scripting language regular users could use
to help organize their personal communication. Here's
an example of what this could have done.
Let's say you set up some rules for how you wish to be contacted:
If email is received during 8-6, M-F, route it to office mail.
If email is marked urgent, route to office mail and copy to
my Blackberry.
If I haven't picked up a message during normal business hours
and a message is 4 hours old (or older), send a message to my
Blackberry with the subject only.
If I receive a message from "boss" during normal business hours
that is not picked up in 30 minutes or less, call me on my
cell phone and do a text-to-speech of the message
(so I can pick it up at the ballpark... Hey, I don't want to
be fired...)
Telescript was going to do stuff like this and much more.
Here's the blurb from a
Byte magazine article
from that era:
"Telescript, a communications-oriented programming
language comparable to C or Pascal, will let developers
create network-independent intelligent agents and distributed
applications. Some of these applications, in turn, may be tools
that let ordinary users create intelligent agents without
programming. What PostScript did for cross-platform,
device-independent documents, Telescript aims to do for
cross-platform, network-independent messaging. General
Magic hopes Telescript will become a lingua franca for
communications."
We didn't get a world filled with Telescript. But, we do
have a world filled with Java and we've been moving Java
code since the beginning of the Java movement. Applets,
those sometimes cutesy and occasionally even useful pieces
of Java code that get loaded into web browsers, are an
example of mobile code. Sometimes those applets talk back
to the computer that served them up to our web browser.
Again, the browser provides a place for the applet to
run and the applet provides a service within the page,
but whatever protocols or messages it exchanges with
the remote computer are invisible, and uninteresting,
to us. Like the database connection utilities, I don't
really care how it does its job.
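The machinery involved needn't be big, either. A toy applet that
phones home to the server that delivered it might look something like
this (the status.txt path is hypothetical):

import java.applet.Applet;
import java.awt.Graphics;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PhoneHomeApplet extends Applet {
    private String status = "loading...";

    public void start() {
        try {
            // Talk back to the host that served this applet up.
            URL home = new URL(getCodeBase(), "status.txt");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(home.openStream()));
            status = in.readLine();
            in.close();
        } catch (Exception e) {
            status = "unreachable";
        }
        repaint();
    }

    public void paint(Graphics g) {
        // Whatever protocol the fetch used is invisible to the page author.
        g.drawString("server says: " + status, 20, 20);
    }
}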
More stuff should work like this
Moving code around, whether it be SQL commands,
Telescript, or Java code, has much to recommend it.
There are a couple of things that moving code
accomplishes:
Computation happens in the right place, and
details like protocol specifics and message formats can be hidden.
Jini, a distributed computing model
for Java, does move code around, and for both of these reasons. But
there is some reluctance, even in this forward-looking group, to fully
embrace moving code to solve some problems. I had a long-running
conversation on the JavaSpaces list where I lamented that some of
the proposed changes to the JavaSpaces specification made this spring
were trying to move data when moving code made more sense to me.
You can read my prattling
here.
Who knows, maybe I was off my rocker--but it seemed like if moving SQL
commands to the database for data selection made sense, moving code to a JavaSpace
for data selection made at least as much sense.
The relationship between the JavaSpaces client and the JavaSpaces
service is managed by the mobile code of the service's proxy.
Just as the database connector technology provided a local proxy
for the facilities supplied by the database, the JavaSpaces proxy
supplies a facade for the JavaSpaces service. The only difference
between these two cases is when the linkage-editor functionality
happens. If you believe "later is better", you're gonna love Jini
and JavaSpaces.
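For readers who haven't seen it: the client's entire view of a
JavaSpaces service is the JavaSpace interface implemented by that
downloaded proxy. A minimal sketch (service lookup elided; the Task
entry type is hypothetical):

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// Entries are plain objects with public fields; matching is by field
// value, and a null field acts as a wildcard in a template.
public class Task implements Entry {
    public String kind;
    public Integer input;

    public Task() {}                   // entries need a public no-arg constructor
    public Task(String kind, Integer input) {
        this.kind = kind;
        this.input = input;
    }
}

class SpaceClient {
    static void useSpace(JavaSpace space) throws Exception {
        // "space" is the proxy that arrived as mobile code; whatever protocol
        // write() and take() speak underneath is its private business.
        space.write(new Task("fft", 42), null, Lease.FOREVER);

        Task template = new Task("fft", null);              // match any "fft" task
        Task t = (Task) space.take(template, null, 10000);  // wait up to 10 seconds
        if (t != null) {
            System.out.println("took task with input " + t.input);
        }
    }
}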
The evolution from the messaging paradigm
We've come a long way since the simple exchange of messages.
We started with just simple messages, protocols that had
to be agreed upon up-front, and Hoare's CSP model. But
as time progressed, there was pressure on that model.
Pressure to move the computation closer to the data
The SQL commands you send in every database query are an
example of moving the computation closer to the data. You
can buy more computing power; it is hard to buy more network
bandwidth.
Pressure to keep a familiar program model like subroutine calls
Messages are fine but the RPC programming model is often what
you're trying to do anyway: make a call, get an answer, move on.
Why not use something that looks like a procedure or
function to do that?
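Java's dynamic proxies show how little it takes to satisfy this
pressure: hand callers an ordinary interface and translate each
invocation into a send-and-wait-for-reply underneath. A minimal
sketch, with a hypothetical MessageBus standing in for the transport:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// The familiar subroutine-call view that callers get to keep.
interface MotionControl {
    int move(int x, int y);
}

// Hypothetical transport: send a request message, block for the reply.
interface MessageBus {
    Object call(String operation, Object[] args);
}

class RpcOverMessages {
    static MotionControl bind(MessageBus bus) {
        InvocationHandler handler = (proxy, method, args) ->
            // Each method call becomes one message exchange on the bus.
            bus.call(method.getName(), args);
        return (MotionControl) Proxy.newProxyInstance(
            MotionControl.class.getClassLoader(),
            new Class<?>[] { MotionControl.class },
            handler);
    }
}

// Usage: int status = RpcOverMessages.bind(bus).move(5, 10);
// It reads like a procedure call but travels as a message.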
Pressure to hide unnecessary details
I don't care about the protocol between the database and
whatever connector technology I'm using. In fact, the
less I know the better off everybody is. If they want to
change it in their next release, that's fine with me.
Pressure to control both ends of the connection
If you have to declare message formats and protocols up-front,
that means a lock-in that could be uncomfortable later. If
instead you simply offer an API for the service (as the database
connection utility or Jini service would do), you can control
both ends of the protocol and the message formats because both
of these things are implementation details below the visibility
of your end-users.
Distributed computing's growth over the last half decade has
been astonishing to me. And, while I'm still a fan of Hoare's
simple model, I recognize that pressures have encouraged an
evolution of tools, techniques, and attitudes that have made
distributed system development easier.
It may have all started with the message... but we've come a long
way since then.
Resources
Communicating Sequential Processes, C. A. R. Hoare, Prentice-Hall
International Series in Computer Science, 1985.
This was one of the first texts I had seen on the topic.
It was a bit heavy on the symbology and the math, but once I got it,
I was hooked. I still have the hard-bound edition on my shelf.
General Magic invented Telescript, an idea out of the blue and far
ahead of its time.
Scott- As you may know, I'm all for allowing code to be moved to the space, perhaps going further with the idea than most people would be comfortable with:
Indeed, Neon (http://neon.jini.org) works on the principle of having code move around the network, in order to allow applications (by this I mean the applications that use your services) to wire their various components together seamlessly at runtime. As you set out in your post, I ended up writing a messaging system underpinning Neon, one that translates method invocations (via dynamic proxies) into messages (objects, not XML) and sends them between nodes - and it uses Jini to send the messages.
My issue with Eric's original post is the claim that XML gives you evolvability and immunity from changes; my point is that to get that, you give up automatic validation of the XML in your process. With serialization, your marshalled object is compared against the class definition before you can get to it; in other words, the byte stream is implicitly verified. And of course you can have multiple versions, either through careful, non-serialization-breaking changes to your class format, through readExternal, or, if using Jini, through PreferredClassLoaders.
However, with XML, suppose you have an XML Schema (S1) that describes the types and formats your parser expects from data purporting to conform to that schema (X1), and you have another XML file (X2) that conforms to an 'evolved' version of that schema (S1.1). For a system using the first schema (S1) to read the new file (X2), the existing schema must be updated to recognize the new element definitions (even though processing further on may ignore them). The only advantage here is that with XML you can turn off implicit validation and verification of the XML stream.
BTW, it's good to see some context applied to how distributed systems have evolved over the years.
Nice historical commentary. What I find interesting is that people seem to mix up the whole synchronous vs. asynchronous vs. reliability thing with whether something is invocation based or message based. They seem to miss the fact that it's an exercise in abstraction - I can use either approach to implement synchronous or asynchronous behaviour as I require. I'd have expected all this to be obvious given the behaviour/characteristics of TCP/IP and what's been built on top of it.
If it matters, I played around with the code moving approach and prototyped an interface for Blitz that allowed a client to move selected code to the server for more rapid execution without the round-trips.
A few people have played with it but not many, and feedback is scarce. I think most people simply have problems with any system where the location of the code changes - maybe it's because they can't so easily model it in their heads, or maybe it's some other issue. I'd love to know.
I have had a very positive experience of passing objects instead of messages, particularly using JavaSpaces, which is part of Jini. The problem with message passing is that it is too static: if you want to add a new message, then everything (all machines) needs updating. With object passing you can distribute the code on the fly, not just the values of the objects, which is much more flexible.
If you want to pass the objects' values using XML, then you can, at least in Java. You can serialize to XML or use one of the binding packages, e.g. JAXB.
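For example, a minimal JAXB sketch (the Account class here is hypothetical):

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringWriter;

// Hypothetical value class; JAXB serializes its state, not its code.
@XmlRootElement
public class Account {
    public String name;
    public String number;
    public double amount;
}

class XmlBindingExample {
    public static void main(String[] args) throws Exception {
        Account a = new Account();
        a.name = "Alice";
        a.number = "42";
        a.amount = 10.0;

        Marshaller m = JAXBContext.newInstance(Account.class).createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        StringWriter out = new StringWriter();
        m.marshal(a, out);   // the object's values go out as XML text
        System.out.println(out);
    }
}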
Great article. (I wanted to respond to "Eric's" original, but thank you for posting such an in-depth analysis here.)
First off I heartily agree with the idea "Develop an API on top of message passing" if your system requires such message passing.
Having implemented a "Message passing system" in Java I wanted to respond with my experience and throw out some ideas and questions.
Many people have pointed out that Object passing systems are inherently flawed because all of the "clients" and the "server" need to stay in sync (i.e. new Stubs/Skeletons for EJBs, WSDL and proxies). I'm trying to understand how a message passing system resolves this issue.
Assume the "interface/syntax" of Service/message changes (on the server), and the client sends a message with the old "Payload/Syntax"? What actions should the server take? How would the message be deciphered on the (distributed) server to know which method to be called? (what if the message is "overloaded") And what does each "field" in the message mean (symantically) as it is being invoked on the server?
One (generally bad) way to solve this would be to define a generic interface (i.e. doAction(Map fields)), which pawns the problem off on the server process. (I don't want to write AI on top of my message parser.) Well-defined interfaces are my preferred approach to distributed systems. And yes, you have to redeploy to all clients each time you make a change.
Then there is the idea of passing "agents" which holds a good deal of promise, it sounds really cool. But at the point we start talking about doing this (or "pushing code" to the server), I think, "Well if it's that easy (just push it code to the server)" why not just physically put the entire App on the server and forget this distributed nonsense?". We can still scale (cluster) the app horizontally, and we bypass all of these other distributed problems.
> Then there is the idea of passing "agents", which holds a good deal of promise; it sounds really cool. But at the point we start talking about doing this (or "pushing code" to the server), I think, "Well, if it's that easy (just push the code to the server), why not just physically put the entire app on the server and forget this distributed nonsense?" We can still scale (cluster) the app horizontally, and we bypass all of these other distributed problems.
Actually, you wouldn't push all your code to the server. Typically you push a proxy object, which communicates with your own machine, where the real work is done. The advantage of this approach is that you control both the proxy and your machine, so the communication protocol is completely private to your software. You can use RMI, XML or pigeons to handle your communication, and the other software on the server wouldn't care as long as your proxy implements an interface it can talk to.
For software evolution, you can create new versions of the interface, have your proxy implement those as well, and two clients using different versions of the interface can both talk to your software.
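A minimal sketch of that versioning idea (all names hypothetical):

// Version 2 extends version 1 with a new operation.
interface AccountServiceV1 {
    double balance(String account);
}

interface AccountServiceV2 extends AccountServiceV1 {
    double balanceOn(String account, java.util.Date date);
}

// The downloaded proxy implements both; an old client casts to V1,
// a new client casts to V2, and both reach the same backend.
class AccountProxy implements AccountServiceV2 {
    public double balance(String account) {
        return callHome("balance", account);
    }
    public double balanceOn(String account, java.util.Date date) {
        return callHome("balanceOn", account, date);
    }
    private double callHome(String op, Object... args) {
        return 0.0; // stand-in for the private wire protocol back home
    }
}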
I suppose the underlying message, which I did not effectively communicate in my post, is: "How much complexity and overhead are we adding to the system when we decide to make it a distributed system?" The idea that we can just "push code" to a server makes me wonder whether the system should be distributed in the first place.
I've never had the opportunity to use "agent passing", which sounds extremely interesting... I would question, though, how "safe" it would be to allow this in a real business-class system. (Am I really going to open up my system to let agents do what they want? What if an agent decides it needs all of my system resources to accomplish something, like a full table scan on every table?)
Adding this "agent-passing" to a system (as you describe) makes sense, and it does solve the (immutable interface) problem, however, is the juice worth the squeeze? (IMO it seems complicated, and just opens up another level of complexities and security issues).
By the way when I refer to "Distributed System" I'm referring to a system which is more than the standard "client/server" type (I just want to make sure we're discussing the same thing.)
The primary barrier to an agent consuming massive system resources is not technical, but social/organizational.
Like you said, these systems are not simple, small client-server setups, but serious enterprise systems. You don't open up these systems to code/agents from just anyone. Any agent that needs to be pushed onto this system will go through the same testing and security procedures that other software on such systems goes through.
Like so many technologies, it's a balancing game, trading one type of complexity for another. It entirely depends on your own environment which sort of complexity is easier to manage.
A closely related topic is the form of consulting that a software product company might offer. I founded and ran a small software company for 20 years, and we were able to field 5 or 6 good engineers as consultants and charge consulting level rates. And the customers were very pleased with the value delivered for the monies paid.
The reason these 5 or 6 engineers were perceived as good value for the customers is that they really did know their stuff. And the reason they really knew their stuff was that when they were not consulting, they were doing actual development on the product itself. So they had a depth of knowledge far greater than any outside consultant could have.
But this model, of course, is not at all scalable, and my company eventually went down the tubes because it never successfully scaled up. Had we scaled up, we would have invariably started fielding "consultants" that did nothing but consulting, their in-depth knowledge would have declined, and their perceived value to the customer would have declined. Sad but true.
I agree with the statements made on the thread about the future being smaller companies with great, in-depth expertise. But there is always a large amount of skepticism from a large company when dealing with a small company. That is why large companies throw such money at large companies like Anderson only to get mediocre quality and poor ROI.
Scott, I don't know if you are aware, but I started a project on jini.org called 'fspace' (flexible space) where I put some code that codified all the issues I thought were important about iterators versus executors, etc. It's still a work in progress, as I haven't got transactions in there yet, but the interfaces are visible, and there is code there that runs.
I mainly wanted to start to see what type of latencies would be involved with an executor, for frequent changes in the executor and similar things with the iterator concept.
One of the primary things that I did was to separate the data stored from the keys used for selection. This allows some data to be marshalled and other data not. I also eliminated the use of MarshalledObject in the space's implementation so that you could send whatever kind of data you wanted into the space, optimize access in your executor, and then potentially recast that data to a different form for return to the client.
Fascinating. Google is using an interesting technology known as AJAX that basically consists of moving JavaScript to the web browser. AJAX is the magic behind Google Maps and GMail. So in this case, we have code moving from server to client. Also, some of the p2p apps, such as Napster, utilize code at each peer.
I think the "marshall by text" approach that is XML is the way to go. Do you really need to pass an object or just its state? Do I need an Account object or do I just need a Name, Number and Amount? If the Account object provides more functionality than just properties, then I have a big problem in terms of coupling the design between the producer and the plethora of consumers of the Account object.
I guess "decoupled" object RPC is possible. With reflection and all the dynamic what-not, maybe collaborating systems can figure out how to work with strange objects that may appear on thier doorsteps.
> Fascinating. Google is using an interesting technology known as AJAX that basically consists of moving JavaScript to the web browser. AJAX is the magic behind Google Maps and GMail. So in this case, we have code moving from server to client. Also, some of the p2p apps, such as Napster, utilize code at each peer.
This is just a heavy client application in that the client is doing the work instead of the server. There is mobile code here as well.
> I think the "marshall by text" approach that is XML is the > way to go. Do you really need to pass an object or just > its state? Do I need an Account object or do I just need > a Name, Number and Amount? If the Account object provides > more functionality than just properties, then I have a big > problem in terms of coupling the design between the > producer and the plethora of consumers of the Account > object.
Well, there are two pieces here. There is the application, which is what the users care about and then there is the underlying remote communications that creates the app.
Because the JavaScript is downloaded code, it can be changed to optimize the client. Google is getting ready to upgrade, or already has upgraded, the client to revision 17, which will include some rendering optimizations that some are saying are great improvements for large applications.
What we see happening here is interesting. Google Maps doesn't demand anything about the data that is used to create the application additions to the basic map. You can do whatever you want. Many are using XML documents (I am at http://www.w5ias.com/status), but others are using a database or other backend store, plus some static data in the downloaded HTML page.
> I guess "decoupled" object RPC is possible. With > reflection and all the dynamic what-not, maybe > collaborating systems can figure out how to work with > strange objects that may appear on thier doorsteps.
I am still very skeptical about how productive this can be without some high-level agreements on certain aspects of the data and its use.