Ken Arnold
Posts: 9
Nickname: kcrca
Registered: Mar, 2002
Re: Designing Distributed Systems
Posted: Sep 29, 2002 11:44 AM
I can try. The join problem is this: Suppose you are asked to become a participant in a transaction. You talk to the TransactionManager and ask to join, but you get a remote exception. Are you in?
Maybe you are and maybe you aren't. But if you want to continue to do the work that requires the transaction, you can simply try to join again. The server will just consider redundant joins from the same participant to be successful. Repeat until you are ready to give up, or until you get in.
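A minimal Java sketch of that retry loop, under stated assumptions: `JoinService`, `IdempotentManager`, `LossyLink`, `JoinRetry`, and the unchecked `RemoteFailure` are hypothetical names made up for illustration, not the real TransactionManager API. The simulated link performs the join on the server and then loses the reply, which is exactly the "are you in?" case.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a remote exception (unchecked to keep the sketch short).
class RemoteFailure extends RuntimeException {
    RemoteFailure(String message) { super(message); }
}

// Hypothetical manager interface; an assumption, not the real API.
interface JoinService {
    void join(long txnId, String participantId);
}

// Manager that treats a redundant join from the same participant as success.
class IdempotentManager implements JoinService {
    private final Set<String> members = new HashSet<>();

    public void join(long txnId, String participantId) {
        members.add(txnId + "/" + participantId); // adding twice is harmless
    }

    boolean isMember(long txnId, String participantId) {
        return members.contains(txnId + "/" + participantId);
    }
}

// Simulated network: the call reaches the manager, but the first few replies are lost.
class LossyLink implements JoinService {
    private final JoinService target;
    private int repliesToLose;
    int callsSeen = 0;

    LossyLink(JoinService target, int repliesToLose) {
        this.target = target;
        this.repliesToLose = repliesToLose;
    }

    public void join(long txnId, String participantId) {
        callsSeen++;
        target.join(txnId, participantId); // the server did execute the call
        if (repliesToLose > 0) {
            repliesToLose--;
            throw new RemoteFailure("reply lost");
        }
    }
}

class JoinRetry {
    // Returns true once a join is acknowledged, false if the caller gives up.
    static boolean joinWithRetry(JoinService manager, long txnId,
                                 String participantId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                manager.join(txnId, participantId);
                return true;
            } catch (RemoteFailure e) {
                // Outcome unknown: retrying is safe because redundant joins succeed.
            }
        }
        return false;
    }
}
```

With two lost replies, `JoinRetry.joinWithRetry(new LossyLink(new IdempotentManager(), 2), 7L, "p1", 5)` returns true on the third call; the participant was already a member after the first one.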
So far so good, but to really make it work we have to consider another case. Suppose you joined the transaction successfully and then crashed before you could record that information. When you restart, you will not know anything about that transaction. This is OK by itself -- the TransactionManager will eventually ask you to vote on the transaction and you will reply "I don't know about that transaction", causing the manager to abort the transaction.
The problem comes when you are asked to perform another operation under the transaction and therefore innocently try to rejoin. A naive form of idempotency would allow you to do so, and you would then be a member that has forgotten its earlier (pre-crash) work. This would be bad.
So for that reason, the join call takes a "join state" identifier that you pass along. Whenever you are unsure whether you still have all previous join state (such as after a crash), you generate a new ID. The manager will then see a second join from the same participant, but with a different join state ID, and so it knows that something has gone wrong. It will refuse to let you in, and it will abort the transaction, rolling everyone back to a previous (consistent) state.
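The manager-side check might look roughly like the following Java sketch. It is an illustration only: `JoinStateManager`, its `Result` enum, and the single-transaction-per-instance simplification are assumptions, not the actual manager implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical manager for ONE transaction (a simplification for the sketch).
class JoinStateManager {
    enum Result { JOINED, ABORTED }

    // Join state ID first seen from each participant.
    private final Map<String, Long> joinStateByParticipant = new HashMap<>();
    private boolean aborted = false;

    Result join(String participantId, long joinStateId) {
        if (aborted) {
            return Result.ABORTED; // nobody may join an aborted transaction
        }
        Long known = joinStateByParticipant.get(participantId);
        if (known == null) {
            joinStateByParticipant.put(participantId, joinStateId);
            return Result.JOINED; // first join from this participant
        }
        if (known.longValue() == joinStateId) {
            return Result.JOINED; // redundant join with the same state: still success
        }
        // Same participant, different join state ID: it may have lost state.
        aborted = true;
        return Result.ABORTED;
    }

    boolean isAborted() {
        return aborted;
    }
}
```

A participant that retries with the same ID keeps getting `JOINED`. A participant that restarted and generated a new ID gets `ABORTED`, and in this sketch every later join on that transaction is refused as well.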
The important facts about this are: (a) the client's failure to succeed is something it can handle; (b) the server is unharmed by executing a call even if the client doesn't learn about the execution. In this case a client that hasn't crashed can retry or give up as it prefers, and a client that has crashed will be stopped from causing harm. And the server is unharmed because it can always abort the transaction in the future if things go weird.
Another example might be logging: I will tell the server when I do something. If that fails, I will repeat the call, but only a few times. The client knows what to do (retry a few times) and the server can live equally well by ignoring redundant logging and by not getting the logging that happens during a network failure.
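A rough Java sketch of the logging case, assuming the server recognizes a redundant delivery by a client-chosen entry ID. That ID, along with `LogServerSketch`, `BestEffortLogger`, and the injected failures, is an assumption for illustration; the post does not say how redundancy would be detected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical log server: keeps the first copy of each entry ID.
class LogServerSketch {
    private final Map<String, String> entries = new LinkedHashMap<>();
    int failuresToInject = 0; // number of upcoming replies to "lose"

    void record(String entryId, String message) {
        entries.putIfAbsent(entryId, message); // redundant deliveries are ignored
        if (failuresToInject > 0) {
            failuresToInject--;
            throw new IllegalStateException("reply lost"); // simulated network failure
        }
    }

    int size() {
        return entries.size();
    }
}

class BestEffortLogger {
    // Tries a few times, then gives up; returns whether an attempt was acknowledged.
    static boolean log(LogServerSketch server, String entryId,
                       String message, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                server.record(entryId, message);
                return true;
            } catch (IllegalStateException e) {
                // Unknown outcome: repeat the call, but only up to maxAttempts.
            }
        }
        return false; // the client moves on without the acknowledgment
    }
}
```

Whether the client sees an acknowledgment or gives up, the server ends up in a state it can live with: at most one copy of each entry, and possibly none if the call never reached it.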