Sometimes, being too aggressive about handling exceptions can be worse than not being aggressive enough. Over the last couple of months, Michael has been tweaking the NetResources library (it's in the public Store) to deal with some locking issues in the caching code. Last month, a change went in that caused a few problems: it started trying to handle exceptions that it shouldn't. Here's the relevant code snippet:
response := [client executeRequestDo: [:connection | client getNotifyingResponse: connection]]
	on: (self class httpExceptions, Error)
	do: [:ex | self behaviourForException: ex. nil].
response ifNil: [^self].
The #executeRequestDo: message send executes the HTTP request (or not; it actually implements a conditional GET). When you make an HTTP request, any number of bad things can happen - timeouts, other network errors, redirects - and if you don't get a response object, you'll get an exception instead. Some of those exceptions (like a redirect) can, and usually should, be resumed, while others (a server error) most certainly should not be. Still others - timeouts - can be resumed or restarted, but that decision is an application level issue.
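To make that distinction concrete, here's an illustrative sketch (not code from the library) - resuming a redirect hands control back to protocol code that signaled the exception expecting to continue, while blanket-resuming Error drops control back into code that has already failed:

"Safe: the redirect is signaled by code that expects to be resumed."
[client executeRequestDo: [:conn | client getNotifyingResponse: conn]]
	on: Net.HttpRedirectionError
	do: [:ex | ex resume].

"Not safe: resuming arbitrary errors restarts code in an unknown state."
[client executeRequestDo: [:conn | client getNotifyingResponse: conn]]
	on: Error
	do: [:ex | ex resume]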
Note the exceptions being caught - #httpExceptions is a class-side list of HTTP and network level exceptions, and Error is a catch-all.
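Something like this, as a hypothetical sketch - the actual method in NetResources names more classes; only the two below appear elsewhere in this post:

httpExceptions
	"Hypothetical sketch - an ExceptionSet built with #, . The real
	 list in NetResources names additional network level error classes."
	^Net.HttpRedirectionError, Security.SSLBadCertificate

The logic is in #behaviourForException:. The mistake was in the old version of that method, and in the old version of the handling code: in the previous version of the snippet above, the code was catching Error alone (i.e., pretty much everything, undifferentiated). Here's the handling code: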
behaviourForException: ex
	"Stuff bad SSL certificates into the registry so the request can resume,
	 trigger a timeout event for timeouts, resume only the exceptions on the
	 approved list, and just report everything else."

	ex class = Security.SSLBadCertificate
		ifTrue: [Security.X509Registry default addCertificate: ex parameter parameter].
	(self class possibleTimeoutExceptions includes: ex class)
		ifTrue: [^self class triggerTimeoutEvent: url].
	(ex isResumable and: [self class exceptionsWeShouldResume includes: ex class])
		ifTrue: [ex resume]
		ifFalse: [self reportTheErrorType: ex]
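The timeout check consults another class-side list, #possibleTimeoutExceptions. A hypothetical sketch only - Net.ConnectionTimeoutError here is a placeholder for whatever timeout errors the network classes in the image actually signal:

possibleTimeoutExceptions
	"Hypothetical sketch - Net.ConnectionTimeoutError is a placeholder class name."
	^Array with: Net.ConnectionTimeoutError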
What #behaviourForException: does is check the sort of exception we got, and then handle it based on that information. The old code just checked whether the exception could be resumed and then resumed it; that led to situations where BottomFeeder would report low level socket errors - the code being resumed was in no state to be resumed. The relevant check is not only whether the exception can be resumed, but whether it should be resumed. In this case, that's a simple check against this list:
exceptionsWeShouldResume
	"which ones should we actually resume?"

	^Array
		with: Security.SSLBadCertificate
		with: Net.HttpRedirectionError
Note that the handler stuffs the offending certificate into the default X509 registry, which is what makes resuming the SSL exception safe. In the original caller (excerpted at the top) there's logic for dealing with a redirect, so that gets dealt with. Other errors are presumed to be transient, and simply reported. For the purposes of an application like BottomFeeder, where we'll try to read a feed every hour (or whatever the interval has been set to), there's no reason not to ignore most errors. The only additional handling - which is done at the feed level - is to check the response to differentiate between a 404 (presumed to be transient) and a 410 (permanently gone). In the latter case, the app automatically disables the feed in question.
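That feed level check amounts to something like this - a sketch only; #code, #disableFeed, and #noteTransientFailure are placeholders, not the actual BottomFeeder selectors:

"Hypothetical sketch of the feed level handling."
response code = 410
	ifTrue: [self disableFeed]		"permanently gone: stop polling this feed"
	ifFalse: [self noteTransientFailure]	"404 and friends: just try again next interval"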
The bottom line is this - you shouldn't mindlessly resume exceptions. It's as sloppy as swallowing MNU errors, and it gets you into states that are really, really hard to diagnose.
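For comparison, the anti-pattern is depressingly compact (again a sketch; #updateFeed is a placeholder):

"Don't do this: blanket resume-or-swallow hides MNUs and leaves the
 resumed code in whatever state it happened to fail in."
[self updateFeed]
	on: Error
	do: [:ex | ex isResumable ifTrue: [ex resume] ifFalse: [ex return: nil]]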