Summary
A message-based design is fundamentally the right way to think about building a distributed system, as opposed to code sharing, remote procedure calls, and the like. This article explains why.
I read an article last year that said the complexity of J2EE resulted from
the basic architectural designs it's intended for--architectures that rely on
distributed code, where you pass objects around the network and invoke methods
on objects that reside on other systems. The article's premise was that the
best way to build a distributed system is to use messaging--to use a protocol-based
design, rather than an API-based design. If I ever find that article again,
I'll post a pointer to it. It was a good one. It got me thinking...
I knew J2EE was complex. For me, it was too complex. The simplifications
and ease-of-use improvements coming up in the near future will hide a lot of
that complexity so the developer doesn't have to deal with it (see J2EE
Ease of Use)--but the complexity will still exist. So I've been wondering
for a while now: What is it that makes a loosely-coupled, message-based design
so much better?
I recently spent some time with Bill Venners and Frank Sommers of Artima. At
the time, they were discussing that very question. Something about the conversation
made my thoughts gel, and things sort of clicked into place. This post shares
those thoughts.
To my mind, the factors that tend to make a message-based design clearly superior
most of the time are:
When you're sending messages, it's easy to inspect them--so it's easy to see
which messages are being passed. That makes debugging a whole lot easier.
To test a module, you send it a message. You can then inspect the message
coming back to compare the results. You can do your unit testing properly, in
other words, testing each module in isolation. In an object-sharing system,
on the other hand, you have two choices. One choice is to wait until everything
is running to start testing--which means you have a host of bugs waiting to
be found, all of which are deep in the code and complex. That's horrible.
The other choice is to build a unit testing framework that lets you plug in
a real module, surrounding it with a framework and feeding it mock objects you
can use to test with. Setting all that up can be nearly as much work as the
application itself.
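The contrast can be made concrete with a small sketch (the class name, method names, and message format here are all invented for illustration): a module that maps a request message to a reply message can be tested with nothing more than string comparison--no container, no mock-object framework.

```java
public class MessageTestSketch {
    /** The module under test: answers a price-lookup message. */
    public static String handle(String request) {
        if (request.equals("<lookup item='widget'/>")) {
            return "<price item='widget' amount='9.95'/>";
        }
        return "<error reason='unknown item'/>";
    }

    public static void main(String[] args) {
        // The whole "test harness": send a message, inspect the reply.
        String reply = handle("<lookup item='widget'/>");
        if (!reply.contains("amount='9.95'")) {
            throw new AssertionError("unexpected reply: " + reply);
        }
        System.out.println("reply inspected: " + reply);
    }
}
```

The module never sees a test framework; the test never sees the module's internals. Each side only sees messages.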
You can add additional content to an XML message at will. The receiving code
doesn't even need to know about it. So you can upgrade the sending end of the
system without necessarily affecting the receiving end. When you're passing
objects around, on the other hand, that's frequently impossible to do--unless
the "objects" are simply messages in disguise--as, for example, a hash
map that contains key/value pairs.
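A minimal illustration of that forward compatibility, using the standard DOM parser (the message format and method name are invented): the receiver reads only the attributes it knows about, so the sending end can add fields without breaking anything.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ForwardCompatSketch {
    /** A receiver that reads only the attributes it knows about. */
    public static String customerName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getDocumentElement().getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        String oldFormat = "<customer name='Acme'/>";
        // An upgraded sender added a field; the receiver is oblivious.
        String newFormat = "<customer name='Acme' region='EMEA'/>";
        System.out.println(customerName(oldFormat)); // Acme
        System.out.println(customerName(newFormat)); // Acme
    }
}
```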
You can even "immunize" the system against drastic change by providing
a wrapper that directs messages in the old format to an older version of the
module, and messages in a newer format to the new version of the module. That
kind of implementation makes the system even more "loosely coupled",
which lets you upgrade the system organically, one part at a time. That's an
important aspect of a system design, whether you're a wholesaler with 120 clients
scattered around the globe, or a corporate resource that's providing services
to a dozen different departments. When you update your main system to provide
new capabilities, you want to let your clients continue to access the old capabilities
until they get around to upgrading. That gives your system the ability to evolve
in a somewhat organic fashion.
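The wrapper might be sketched like this (module names and the version marker are hypothetical): a thin dispatcher inspects each incoming message and routes it to whichever version of the module understands that format, so old clients keep working unchanged.

```java
public class VersionRouterSketch {
    // Two coexisting versions of the same module (names invented).
    static String oldModule(String msg) { return "handled-by-v1"; }
    static String newModule(String msg) { return "handled-by-v2"; }

    /** The wrapper: route on a version marker carried in the message. */
    public static String dispatch(String msg) {
        return msg.contains("version='2'") ? newModule(msg) : oldModule(msg);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("<order id='7'/>"));             // handled-by-v1
        System.out.println(dispatch("<order id='8' version='2'/>")); // handled-by-v2
    }
}
```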
In a distributed-object system, on the other hand, you're pretty well forced
to upgrade everyone on the same timetable. Then you throw a giant switch and
hope the new system works as well as the old one did--and that people will like
it. (With the message-based system, they could always go back to the old way
of doing things, if they needed to. With the distributed-object implementation,
they don't get that choice.)
If you're passing XML messages around--which are essentially a glorified version
of ASCII text--there is no requirement whatsoever that the different parts of the system
be implemented in the same language, on the same operating system, or even on
the same platform. There is no particular need to make sure that the systems
are "compatible", beyond a minimal ability to communicate. That's
interoperability.
And if you decide you want to use a different language, operating system, or
platform, you're free to do so. You can convert part of the system and stop
there, or do the whole enchilada eventually--one module at a time. Once again,
you're free to evolve the system in directions you choose.
Object construction is relatively expensive. And an object, by its very nature,
preserves state. So a distributed-object design will tend to use state-smart
objects. But objects consume resources, and when you have a lot of state-smart
objects, most of the time they're just sitting around waiting to be poked so
they can take the next step, whatever that turns out to be. On top of that,
you have the problem of distributed garbage collection. Garbage collection by
itself is hard enough. Distributed garbage collection just has to be a nightmare.
If you're passing messages, on the other hand--XML messages, for instance--
the "state" information the receiving object needs amounts to a fistful
of extra bytes. So the objects doing the work on the receiving end don't have
to record any state information. That's statelessness.
Although transmission time is marginally longer with such a design, the overall
load on the system is drastically reduced. So the system tends to keep working
when the transaction volume is immense. That's scalability.
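Here's one way such a stateless handler might look (the message format is invented): everything the handler needs rides in the message itself, so no object holds state between calls, and any number of interchangeable handlers can share the load.

```java
public class StatelessStepSketch {
    /** All "state" rides in the message; the handler keeps none. */
    public static String advance(String msg) {
        // Pull the progress marker out of the message itself.
        int step = Integer.parseInt(msg.replaceAll(".*step='(\\d+)'.*", "$1"));
        // Return a new message carrying the advanced state.
        return msg.replace("step='" + step + "'", "step='" + (step + 1) + "'");
    }

    public static void main(String[] args) {
        // Any handler instance, on any machine, can process this message.
        System.out.println(advance("<job id='7' step='3'/>"));
    }
}
```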
For a distributed system, a protocol-based, message-based design is almost
uniformly an improvement over a distributed-object design, for good reason:
transparency, testability, immunity and evolution, as well as stateless scalability.
The NetBeans collaboration APIs described in JavaOne
2005, Day 3: Share the News! show how clean and simple a distributed system
can be when it's message based.
Agreed, there's an appeal to the transparency of text-based messages. But XML really isn't that human-readable, and the downside is that it's a lowest-common-denominator approach--IDL re-envisioned.
Yes, a very real downside of the remote-object approach is that it couples deployment of client and server. But the alternative is an approach that discourages refactoring and simplifying of service APIs--yet service APIs are in many ways the most important ones, and the ones we really should be able to refactor as we develop our understanding of the problem domain.
I think the XML messaging approach is wonderful for Amazon, or Ebay, where the service author has thousands of service users with disparate technology platforms. But when I come across this between systems in the same organization it is frequently more about firming up organizational boundaries than meeting any business or technical requirement. Conway's Law continues to be true, alas, http://www.jargon.net/jargonfile/c/ConwaysLaw.html
> Yes a very real downside of the remote object approach is that it couples
> deployment of client and server. But the alternative is an approach that
> discourages refactoring / simplifying of service APIs yet service APIs are
> in many ways the most important and the ones that we really should be able
> to refactor as we develop our understanding of the problem domain.

Interesting point. The next version of J2EE will allow refactoring across the application--including Java beans, since they become Plain Old Java Objects (POJOs) in that version. As you indicate, there is benefit to be derived from that.
It seems obvious that there must be some components of a system which *should* be strongly coupled. The alternative is a SmallTalk approach where everything is truly a "message". It's something like a stereo system. Any one module is a single unit (tightly coupled), but the modules are wired together (loosely coupled). Figuring out the module boundaries is the hard part. After that, it's messaging between modules and invocations within them.
As for XML--actually, it *is* human readable. That is its real strength. That's what makes it preferable to binary document formats (which tend to be proprietary, if only because they're impenetrable). It rapidly superseded the binary messaging formats that preceded it, largely because it is transparent both to humans (readable) and computers (parsable).
XML's one severe deficiency, from the standpoint of generalized document editing, is the inability to identify different entities as either "inline" or "structural", and to specify style-sheet properties for inline entities. Were that omission rectified, XML would soon replace plain text.
BTW: One of the interesting things in the Eclipse world that I never got around to reporting from JavaOne 2005 was the SSE initiative--the Standard for Structured Editing. Personally, I think that's an idea whose time is overdue, and I'm glad to see things moving in that direction.
Agreed. I've found that most people who dislike XML tend to write what I'm calling "inline XML" as opposed to properly formatted XML. They write the equivalent of obfuscated Perl code and then complain that it's not readable.
Protocols are the right way to build a distributed system, but APIs are the right way to use a distributed system. The first thing people will do is build an API over the protocol to make it easier for programmers to use. So I do both: make the protocol message based AND provide a programmer API.
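That layering might be sketched as follows (names and message formats are invented; the transport is a stand-in for a real socket or JMS connection): the wire level stays message based, while programmers see an ordinary method call.

```java
public class ClientApiSketch {
    /** The wire level: a message goes out, a message comes back.
        (Stand-in transport; a real one would use sockets or JMS.) */
    static String exchange(String msg) {
        return msg.equals("<ping/>") ? "<pong/>" : "<error/>";
    }

    /** The programmer-friendly API layered over the protocol. */
    public static boolean ping() {
        return exchange("<ping/>").equals("<pong/>");
    }

    public static void main(String[] args) {
        // Callers never see the messages at all.
        System.out.println(ping()); // true
    }
}
```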
While I agree that messaging is a very useful way to access remote functionality ("services" seems to be the buzzword at the moment), I think that using messaging brings its own problems.
The main problem that I come across has to do with transactions. I have been involved in a lot of systems that used messaging (as a designer/architect of those systems, as a reviewer of other people's designs, and, a while ago now, as a developer), and the bit that people always have difficulty understanding is how to deal properly with failure.
If you have a synchronous call (rpc or whatever) then it will either work or it won't (assuming no bugs - but that's another story). In Java it will throw an exception if it doesn't work. In the old days of C the return value would tell you. If it doesn't work then you have a choice - you can either retry or you can put the object that did the call into some sort of error state and leave it for someone or something else to sort out.
If, however, you are using an asynchronous messaging system, then the number of failure scenarios is much greater: maybe the message didn't get sent, maybe it got sent but you don't know it, maybe it got sent but not received, and so on and so on, right through the loop from sender to receiver back to sender. To deal with this in a sensible way you really need a state machine, so that your system can deal with resends, replies it didn't expect, duplicate replies, etc. And of course the receiving end (the service you are calling) needs one too. This is fine if the object making the call has a state machine, or is part of a workflow anyway (which it often will be), but it is an overhead that needs to be considered if not.
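A bare-bones version of that sender-side state machine might look like this (states, retry limit, and method names are all invented for illustration): it tolerates lost messages by resending, gives up after a few tries, and drops duplicate replies.

```java
import java.util.HashSet;
import java.util.Set;

public class SenderStateMachineSketch {
    enum State { AWAITING_REPLY, DONE, FAILED }

    State state = State.AWAITING_REPLY; // a message has just been sent
    int retries = 0;
    final Set<String> seenReplies = new HashSet<>();

    /** Timer fired with no reply yet: resend, or give up after 3 retries. */
    void onTimeout() {
        if (state != State.AWAITING_REPLY) return;
        if (++retries > 3) state = State.FAILED; // else: resend the message
    }

    /** Replies may arrive late or twice; duplicates are ignored. */
    void onReply(String replyId) {
        if (!seenReplies.add(replyId)) return; // duplicate reply: drop it
        if (state == State.AWAITING_REPLY) state = State.DONE;
    }
}
```

The receiving end needs the mirror image of this to handle duplicate requests, which is exactly the overhead the comment above describes.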
Another thing to consider: if you are using JMS as your messaging service, then you have all the problems involved with using an API to a remote service anyway.
I'm not saying that messaging is a bad thing and shouldn't be used. As I said before, I have both designed and reviewed systems that make extensive use of messaging. All I'm saying is that getting it right isn't easy, and making developers understand the complexities and pitfalls isn't easy either.
> If however you are using an asynchronous messaging system then the number
> of failure scenarios is much greater: Maybe the message didn't get sent,
> maybe it got sent but you don't know it, maybe it got sent but not
> received, and so on and so on right through the loop from sender to
> receiver back to sender. To deal with this in a sensible way you really
> need a state machine so that your system can deal with resends, replies it
> didn't expect, duplicate replies etc. And of course the receiving end (the
> service you are calling) needs one too. This is fine if your object making
> the call has a state machine, or is part of a workflow anyway (which it
> often will be) but it is an overhead that needs to be considered if not.
This is eased greatly by using transactional message services. If they accept a message, you can count on it being delivered eventually. You don't know exactly when, especially in the face of severe system failures, but unless your message-store database gets totally fubar-ed, it will be delivered. And if one of your databases does get wiped, that will leave you in deep trouble under any system architecture.
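The store-and-forward guarantee can be sketched with an in-memory stand-in for the message store (all names invented; a real service would persist to disk or a database): a message is acknowledged only once it is stored, and removed only once a consumer has processed it successfully, so a failed delivery attempt leaves it queued for redelivery.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

public class DurableQueueSketch {
    // Stand-in for the transactional message store.
    private final Deque<String> store = new ArrayDeque<>();

    /** Accepting means persisting; only after this returns may the
        sender forget the message. */
    public synchronized void accept(String msg) {
        store.addLast(msg);
    }

    /** The message is removed only if the consumer succeeds; otherwise
        it stays put and is redelivered on the next attempt. */
    public synchronized boolean deliverTo(Predicate<String> consumer) {
        String msg = store.peekFirst();
        if (msg == null) return false;
        if (consumer.test(msg)) { store.removeFirst(); return true; }
        return false;
    }

    public synchronized int pending() { return store.size(); }
}
```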
> Protocols are the right way to build a distributed system, but APIs are
> the right way to use a distributed system. The first thing people will do
> is build an API over the protocol to make it easier for programmers to
> use. So I do both. Make the protocol message based AND provide a
> programmer API.

What a fascinating idea. APIs and protocols, without actually distributing any objects. Brilliant.
> ...using an asynchronous messaging system then the number of failure
> scenarios is much greater: Maybe the message didn't get sent, maybe it got
> sent but you don't know it, maybe it got sent but not received, and so on
> and so on right through the loop from sender to receiver back to sender.

Todd Hoff's suggestion fits right in, here. The wrapper around the protocol does any receipt-verification and retrying that's required, and eventually generates an exception when it gives up.
> > If however you are using an asynchronous messaging system then the
> > number of failure scenarios is much greater...
>
> This is eased greatly by using transactional message services. If they
> accept a message, you can count on it being delivered eventually...

Aha. Which have you used, or look interesting enough to recommend?
> > Protocols are the right way to build a distributed system, but APIs are
> > the right way to use a distributed system. The first thing people will
> > do is build an API over the protocol to make it easier for programmers
> > to use. So I do both. Make the protocol message based AND provide a
> > programmer API.
>
> What a fascinating idea. APIs and protocols, without actually distributing
> any objects. Brilliant.
Isn't that the first law of distributed programming? Avoid distribution where possible. I have seen many more "over distributed" systems than I have under-distributed perhaps because some believe that an extra tier will magically "add scalability."
I really liked the concepts of immunity and organic evolution. I had never heard those terms before (I think I have to visit the ServerSide.com web site more frequently), but the ideas encapsulated in them make a lot of sense. Thanks for elevating my vocabulary!
Also, I found it great when you pointed out that some objects are in fact messages in disguise, like hash maps that contain key/value pairs. My experience at work is with exactly that kind of system, where all modules communicate by means of hash tables. You debug every logical unit (or service) just by comparing the input and output document objects. In other words, you don't need an environment debugger--a tool able to plug into the development engine and read the entity objects. Instead, any available way to intercept the messages is good enough! Just catch the messages (for real messages, how you do it depends on the underlying communication protocol; for messages in disguise, you can serialize the objects) and inspect them. No need for mock objects or container frameworks!
The only remark I have to add regards your comment: "In a distributed-object system, on the other hand, you're pretty well forced to upgrade everyone on the same timetable." Even for an API-based middleware, where the objects are fat, as long as the interfaces are respected you can still write wrappers to mold the objects to the old format.
While writing this comment down I thought about what I was writing, and I think I see your point: it is pretty easy to end up in a situation where you are forced to alter the interfaces for a given fat object. That is where the wrappers go to hell. Looking from that angle, messaging is a superior architectural choice.
Regards, Claudio Lyra bits and bytes cruncher, a.k.a. Software Engineer
After posting my own message I went through the other messages and I found very interesting points.
Two things: a) First, on the concerns about transactionality, retries, logging, and the like: messaging does need an "application server". It does not come for free; either you build your own or you choose one from a vendor. The benefits of messaging show up only after that. In a corporate scenario (Conway's law), keeping an inventory of existing API libraries and keeping one of message formats are similarly difficult. I don't know a golden formula to solve it.
b) Developing both an API layer and a protocol-based layer seems to be overkill. Still, each case is a case, and it might be necessary in specific situations. It sounds more like a job for a "middleware vendor" than for regular "IT department" teams.
finally {
  Someone has mentioned simplicity: it reminded me of Occam's razor: when faced with options, choose the simplest one. It also brought to mind one of the XP tenets: don't add bells and whistles unless truly necessary. Keep the solution at the minimum.
}
Using protocols instead of APIs is an easier way to publish contracts between a collection of objects.
For example, protocols are one of the techniques people use to design interactions between agents in a multi-agent system, and the approach really scales to distributed and open systems.
There are still theoretical issues to address, but using protocols will serve better than APIs, since you are dealing with autonomous entities, whether active objects or agents.
> finally {
>   Someone has mentioned simplicity: it reminded me of Occam's razor:
>   when faced with options, choose the simplest one. It also brought to
>   mind one of the XP tenets: don't add bells and whistles unless truly
>   necessary. Keep the solution at the minimum.
> }

Right you are. I owe the agile guys big time for making a mantra out of this one. If I had back all the time I've spent coding "somebody might need this" features that never got used--and for which the hardware no longer exists even to run the program--I'd have been able to do something useful with my life...