Summary
About 88% of an iceberg's mass is below water. How much of your class is down there too?
There are many ways to end up with odd designs, and everyone has a different notion of oddness, but one thing that always strikes me as odd is classes that have fewer public methods than private methods. Classes like that remind me of icebergs: they have one or two public methods above water and a lot of mass below, lurking in the private methods. To me, it often means that there's some other abstraction hidden away inside the class that might be useful to pull out.
The funny thing about iceberg classes is that they superficially look like good design. After all, the argument goes, encapsulation is a hallmark of OO, so if we have more encapsulation we have better design. Two public methods and seven private methods? That must be good; we've encapsulated more... but is it really?
SpanFinder is part of a little Python IDE I was working on a while ago. You give it a list of character offsets which represent the starting and ending positions of method names in a program listing, and then you can use its find_span method to get back a pair of offsets (a span) that encloses a position you pass to it. The only methods here that are "public" are __init__ and find_span. All of the other methods are called from find_span (directly or indirectly). It's a typical iceberg, but does it have to be an iceberg at all?
Well, one of the things that I noticed when I was writing it was that the geq and leq methods were really misplaced. If I had a Span class, I could place geq and leq there along with first_of, second_of, and in_span. The segment method is used internally by those methods, so it could come along for the ride. With that move, I could cut the iceberg down to size: two "public" methods (__init__ and find_span) and one "private" method (fold).
class SpanFinder:
    def __init__(self, range_list): ...
    def find_span(self, index): ...
    def fold(self): ...
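The post only shows the signatures, so here is a minimal, hypothetical sketch of the public behavior. The representation of range_list (a flat list of offsets paired off into spans) and the linear search are my assumptions, not the original implementation:

```python
class SpanFinder:
    def __init__(self, range_list):
        # Assumption: range_list is a flat list of offsets where
        # consecutive pairs form (start, end) spans, e.g. [3, 9, 14, 20].
        self.spans = list(zip(range_list[::2], range_list[1::2]))

    def find_span(self, index):
        # Return the (start, end) pair that encloses index,
        # or None if no span contains it.
        for start, end in self.spans:
            if start <= index <= end:
                return (start, end)
        return None
```

Under these assumptions, SpanFinder([3, 9, 14, 20]).find_span(5) would return (3, 9).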
The Span class would have about six "public" methods and one "private" method (segment):
class Span:
    def __init__(self, span_representation): ...
    def contains_index(self, index): ...  # was in_span
    def geq(self, right): ...
    def leq(self, right): ...
    def first(self): ...
    def second(self): ...
    def segment(self, index_string, which): ...
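Again, only the signatures survive in the post. A hedged sketch of how the extracted Span might behave, assuming a (start, end) pair representation and assuming geq/leq compare starting offsets:

```python
class Span:
    def __init__(self, span_representation):
        # Assumption: the representation is a (start, end) pair.
        self._start, self._end = span_representation

    def contains_index(self, index):  # was in_span
        return self._start <= index <= self._end

    def first(self):
        return self._start

    def second(self):
        return self._end

    def geq(self, right):
        # Assumption: spans are ordered by their starting offsets.
        return self.first() >= right.first()

    def leq(self, right):
        return self.first() <= right.first()
```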
Would it be worth it? At the time I didn't think so (that's why I was able to find the code in this state), but I remember seeing the issue and thinking that if I ever went back and had to change that class, the refactoring might be worthwhile.
The thing that I'm pretty curious about, though, is whether it is always this way... whether every iceberg class has a decent, applicable extract class refactoring. I tend to feel this way about long methods: any method that has more than, say, ten lines really is doing more than one thing (unless it is just some long initializing method), and there is some decent method extraction that you can find if you hunt for it.
I have a feeling that there is something to this, that for any class that has, say, more than 60-70% private methods and more than, say, four or five methods, there's always a decent class extraction that can be done. Anyone have any counter-examples?
One way to look at this is as a question of the tradeoffs between surface complexity vs. volumetric complexity. IME, I find that most programmers are biased towards volumetric complexity while most designers are biased towards surface complexity. Interestingly (to me anyways :-), is that both groups tend to over-focus on one class or a few classes in their analysis and miss many of the consequences of their choices in the overall system.
I don't care much for such rules because they are too one-dimensional. For example, this rule could induce the creation of a lot of (public) classes such that the surface area of the overall system could actually go up. Thereby increasing the complexity of the system.
Also, IMHO, it's important to distinguish between the bulk due to direct, internal implementation and the bulk caused by the use of other facilities. I.e., the difference between "implements" and "uses" (and semi-public "is used by" for active objects (e.g., handlers, callbacks, etc.)). For example, I may write a class that has a very small public interface but which hides a ridiculous amount of pain in dealing with some other sub-system.
Joel Spolsky's "Law of Leaky Abstractions" (http://www.joelonsoftware.com/articles/LeakyAbstractions.html) is another good thing to keep in mind as a lot of arguments w.r.t. the tradeoffs of internal vs. external complexity hinge on the belief that it's an either/or thing rather than a spectrum.
The influence of unit testing and dependency injection containers can make you evolve a design in that direction.
If there's a coherent piece of functionality inside private methods that needs to be tested, you tend to move it out of private methods into its own class so you have a public interface to test against.
Dependency injection containers make you think about privacy as a local, dynamic, object-oriented issue: if you want or don't want a piece of functionality to be available to an object, you provide or don't provide it with a reference to an object that does the job. You tend to make almost everything public so the container can configure it and then narrow down through interface types the functionality provided to related objects.
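As an illustration of that pressure (all names here are hypothetical): a private parsing step buried in a class can be pulled into its own collaborator, which gives the tests a public seam and lets a container inject a replacement:

```python
class LineParser:
    """Extracted from what used to be a private _parse_line method."""
    def parse(self, line):
        name, _, value = line.partition("=")
        return name.strip(), value.strip()

class Report:
    def __init__(self, parser=None):
        # The parser can now be injected (e.g. by a container or a
        # test) or defaulted for normal use.
        self.parser = parser or LineParser()

    def summarize(self, lines):
        return dict(self.parser.parse(line) for line in lines)
```

LineParser can now be tested directly, without going through Report at all.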
> One way to look at this is as a question of the tradeoffs between surface complexity vs. volumetric complexity. IME, I find that most programmers are biased towards volumetric complexity while most designers are biased towards surface complexity. Interestingly (to me anyways :-), both groups tend to over-focus on one class or a few classes in their analysis and miss many of the consequences of their choices in the overall system.
I'd feel better about that idea if I felt that the people who favor volume were consciously applying it as a strategy. As it stands, I think it's just easier to add volume than it is to add surface area and I think that is why we end up with such puffy classes and methods.
> I don't care much for such rules because they are too one-dimensional. For example, this rule could induce the creation of a lot of (public) classes such that the surface area of the overall system could actually go up, thereby increasing the complexity of the system.
I think it depends on what we call complexity. I guess in an information theory sense increased surface is more complex, but to me it makes things simpler. Once I understand a Span, for instance, it's easier to understand the code that uses it, if nothing else because it's not interwoven, so the using code is clearer. But, I agree that many people see it differently.
> Also, IMHO, it's important to distinguish between the bulk due to direct, internal implementation and the bulk caused by the use of other facilities. I.e., the difference between "implements" and "uses" (and semi-public "is used by" for active objects (e.g., handlers, callbacks, etc.)). For example, I may write a class that has a very small public interface but which hides a ridiculous amount of pain in dealing with some other sub-system.
Yes, I'd see that as a good exception. Although, I'd probably wonder whether the code under the covers is worth factoring more finely. I can imagine cases where it may not be worth it.
> Joel Spolsky's "Law of Leaky Abstractions" (http://www.joelonsoftware.com/articles/LeakyAbstractions.html) is another good thing to keep in mind as a lot of arguments w.r.t. the tradeoffs of internal vs. external complexity hinge on the belief that it's an either/or thing rather than a spectrum.
I read it and I get the concept but I'm not sure whether increasing the number of abstractions makes leakiness better or worse. I think I could argue it both ways. The law reads: 'All non-trivial abstractions, to some degree, are leaky.' Maybe that means it's good to have more as long as we can make them as trivial as possible. And, I like that idea. :-)
I like the idea of refactoring a class that's hard to read into two that are easy to read. And I like the name "iceberg class" too. I've built a few of those.
But my next thought was almost exactly the same as John Mitchell's. Namespace clutter is a valid concern.
Here's my take on deciding between the two: If the refactoring is useful to read the iceberg class, but is *not* useful to anything else, just limit the visibility of the extracted class. Make it an inner class. Make it package-level/internal visibility. You can have both.
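Python has no package-private keyword, but the same trade can be expressed by convention (all names here are illustrative): a leading underscore marks the extracted class as module-private, and __all__ keeps it out of star-imports:

```python
__all__ = ["SpanFinder"]  # only the finder is advertised

class _Span:
    # Extracted for readability, but by convention not part of
    # the module's API (leading underscore).
    def __init__(self, start, end):
        self.start, self.end = start, end

    def contains_index(self, index):
        return self.start <= index <= self.end

class SpanFinder:
    def __init__(self, range_list):
        # Assumed representation: flat offsets paired into spans.
        self._spans = [_Span(s, e)
                       for s, e in zip(range_list[::2], range_list[1::2])]

    def find_span(self, index):
        for span in self._spans:
            if span.contains_index(index):
                return (span.start, span.end)
        return None
```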
[...]
> I'd feel better about that idea if I felt that the people who favor volume were consciously applying it as a strategy. As it stands, I think it's just easier to add volume than it is to add surface area and I think that is why we end up with such puffy classes and methods.
I've seen plenty of examples of laziness from both sides, so I don't think that's exclusive to one or the other.
[...]
> I think it depends on what we call complexity. I guess in an information theory sense increased surface is more complex, but to me it makes things simpler. Once I understand a Span, for instance, it's easier to understand the code that uses it, if nothing else because it's not interwoven, so the using code is clearer. But, I agree that many people see it differently.
Note that I said it could increase not that it had to. :-) The key point is that it is a tradeoff -- neither end of the spectrum is a priori "right".
[...]
> Yes, I'd see that as a good exception. Although, I'd probably wonder whether the code under the covers is worth factoring more finely. I can imagine cases where it may not be worth it.
The common case here is the hiding of e.g., third-party code that you cannot (easily) make major changes to.
[...]
> I read it and I get the concept but I'm not sure whether increasing the number of abstractions makes leakiness better or worse.
It's more that it shifts various cognitive burdens and so it becomes an open question as to whether there's any overall improvement.
> I think I could argue it both ways. The law reads: 'All non-trivial abstractions, to some degree, are leaky.' Maybe that means it's good to have more as long as we can make them as trivial as possible. And, I like that idea. :-)
Naw, it means that we have to actually make things simpler rather than just more simplistic.
> Note that I said it could increase not that it had to. :-) The key point is that it is a tradeoff -- neither end of the spectrum is a priori "right".
I agree, there are tradeoffs, but I just get this sense that there's an appropriate size for most abstractions... whether we shoot for it or not is another issue. For instance, I just did a rundown of the code in the example above. I mapped out the dependencies of each method... the vars and other methods that it depends on, and I got this:
("->" is depends upon, and range_list is an instance var)
When we draw out a DAG for this, it really looks like there is a natural encapsulation boundary there at in_span.
Whether we extract or not is another issue, but the neat thing is that the boundary is already there: it's not a matter of whether we're going to put it there, but rather whether we're going to recognize it by pulling out another class.
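The original dependency listing hasn't survived here, but the exercise itself is easy to sketch. The edges below are reconstructed from the method names mentioned earlier, purely for illustration; the point is that nothing reachable from in_span calls back above it:

```python
# Illustrative dependency map: method -> methods it depends on.
# These edges are an assumption, not the original listing.
deps = {
    "find_span": ["fold", "in_span"],
    "fold":      [],
    "in_span":   ["geq", "leq"],
    "geq":       ["first_of", "second_of"],
    "leq":       ["first_of", "second_of"],
    "first_of":  ["segment"],
    "second_of": ["segment"],
    "segment":   [],
}

def reachable(start):
    # Walk the dependency graph from one method.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen
```

Under these assumed edges, reachable("in_span") contains only the candidate Span methods, never find_span or fold; that kind of clean cut is exactly what suggests an extraction boundary.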
> <snip> If the refactoring is useful to read the iceberg class, but is *not* useful to anything else, just limit the visibility of the extracted class. Make it an inner class. Make it package-level/internal visibility. You can have both.
Alternatively, the helper class(es) can be put in a sub namespace. That way, they can be public, and thus easier to test, without cluttering the namespace.
> Alternatively, the helper class(es) can be put in a sub namespace. That way, they can be public, and thus easier to test, without cluttering the namespace.
Good idea, the only thing I would say, though, is: wait a while. Sometimes when you look at a package and see what it contains you may end up thinking that it is an appropriate place for the helper to go. You may even get to the point where you don't think about the helper as a helper any more, but rather a peer.
If you mechanically take classes you've extracted and put them someplace else, the design ends up with packaging criteria like "Well, this package contains the classes I thought should be in the design before January, and this package over here has all the stuff that I didn't know about before January but figured out in January, February and March."
We don't have to marry the first set of ideas that we have about a design, and we don't have to ingrain them in our packaging either. When people extract classes left and right and their conception of their design doesn't change, to me, that means they have some blindspots. Somehow, they thought the original design was "done", and that sort of thing makes entropy smile, because we're letting him win.
I see you've already commented in MethodsShouldBePublic. :>
I agree with "mme me"'s comment:
"If there's a coherent piece of functionality inside private methods that needs to be tested, you tend to move it out of private methods into its own class so you have a public interface to test against.".
EXCEPT I would say one shouldn't do this for the sake of testing, though that would be a side-benefit. The real reason is to improve cohesion. Yes, it increases the surface area, but the complexity of the surface area is managed by effectively dealing with coupling as the size grows.
> We don't have to marry the first set of ideas that we have about a design, and we don't have to ingrain them in our packaging either. When people extract classes left and right and their conception of their design doesn't change, to me, that means they have some blindspots. Somehow, they thought the original design was "done", and that sort of thing makes entropy smile, because we're letting him win.
I certainly don't disagree. I don't think I was increasing entropy, though: I think the difference in perspective is that I, being new to this type of idea, had in mind only the specific case in your example. I envisioned a scenario in which this new extracted class was very specific and not generally useful. That is to say, my understanding was that the single motivating factor was a cleaner, more readable design. On this very specific basis I combined your new extracted class with the encapsulation that was originally present.
So, I also took a look at http://www.c2.com/cgi/wiki?MethodsShouldBePublic. Wow, that's an eye-opening discussion. I find the idea of *all* private methods becoming public methods of extracted classes very appealing. (I can tell that I'm going to need better refactoring tools than what's built into VS05 before I can apply it, though.) And in that new light, I can very much see what you're getting at with just keeping every method and every class public. I also wonder if it might mesh very well with Uncle Bob's OOP metrics.
I think my namespace concerns are valid but very well addressed by Geert Baeyaert's introduction of a sub namespace.
I think there are two reasons why methods should NOT be public: 1) it becomes harder to learn to use a class, and 2) change becomes more difficult, because public methods are a contract which sometimes cannot be changed easily.
1 can be partially overcome with Extract Class. But in general, you have to think twice before Extract Method, which will probably lead to larger methods.
2 is really an issue about public vs. published. In other words, it is OK to have all methods public, as long as there is a way to tell users that some method signatures and semantics are not guaranteed to stay the same.
In Eclipse, there is a convention that public methods in packages whose names contain "internal" are not published. So you can define an interface in a non-internal package, implement it in an internal package, and still have all public methods.
Putting it another way: I think private methods should be accessible if you really want them to be (e.g. for testing). You should just be aware that you don't have any guarantees when doing so. But making all private methods public removes this red light; all of a sudden you expect all methods to be "safe". The "internal" package mechanism is just a convention to solve this problem; real language support would be better.
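A rough Python analogue of that convention (all names invented): publish a stable interface, and keep the implementation public but clearly marked as internal, so it stays testable without becoming a contract:

```python
from abc import ABC, abstractmethod

class SpanLookup(ABC):
    """Published contract: signature and semantics are guaranteed."""
    @abstractmethod
    def find_span(self, index):
        ...

class InternalSpanFinder(SpanLookup):
    """Would live in an `internal` subpackage: public and testable,
    but explicitly outside the compatibility promise."""
    def __init__(self, spans):
        self._spans = spans

    def find_span(self, index):
        for start, end in self._spans:
            if start <= index <= end:
                return (start, end)
        return None
```

Clients would depend only on SpanLookup; tests are free to reach InternalSpanFinder directly.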
I think this is a quite interesting topic: the relation between the public and the private size of a class.
I don't like the "iceberg" thing too much because what you try to hide is complexity, not volume. And an iceberg is something that ships collide with and sink.
If you could show how they collide, it would be easier for me to accept the iceberg metaphor.
I'd prefer to find another metaphor first. It's a class that is quite easy to use, like an old pocket radio with one button and one wheel, and that's no bad thing.
The "iceberg" class is fine for the user of that class, because a lot of responsibility is inside it. But it could be a nightmare for the developer of that class if he has internal problems to solve.
So, maybe those classes under-estimate the user? Or are they just greedy?
If they are difficult to maintain, they should be separated into easily maintainable classes, no matter whether someone else will use those classes or not.
That's what I think about it.
A parser could be such an example. It has just one method: nextToken()
But it may look very ugly inside. After separating out a good scanner, if that hasn't already been done, it may look much nicer. The trick is to recognize that "scanner" as a useful, separable thing.
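A sketch of that separation (names invented): the parser keeps its single public method, but tokenizing moves into a scanner it owns:

```python
class Scanner:
    """The extracted, separately testable piece."""
    def __init__(self, text):
        # Toy tokenizer: whitespace-separated words stand in for
        # a real lexer.
        self._tokens = iter(text.split())

    def next_token(self):
        return next(self._tokens, None)

class Parser:
    def __init__(self, text):
        self._scanner = Scanner(text)

    def next_token(self):
        # The parser's one public method now delegates the ugly part.
        return self._scanner.next_token()
```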
Having such classes could mean that the developers don't know the right patterns to separate things into. They should probably read more.