Summary
According to the Design Patterns book (aka GoF), the intent of the Iterator pattern is to "Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation." Or more simply, an iterator unifies access to containers.
Advertisement
As I observed in the previous weblog entry, the C++ STL does not try to treat all the containers as "all one type" by giving them a common interface. Instead, the C++ STL algorithms interface to both containers and arrays using the same mechanism: the iterator. You don't pass a container to a function, you pass iterators -- usually two of them, one to denote the beginning and one the end of a sequence (this way you can also denote subsequences).
The Java library designers, however, appeared to find commonality in List, Set, and with J2SE5, Queue; enough commonality to unify them under the Collection interface.
What does it mean to unify List, Set, and Queue? I would argue that what they appear to have in common is structural rather than functional: all three involve holding objects. However, what they do with those objects is very different. A List keeps elements in order -- this is the primary tool for storing objects for later use. A Set prevents duplicates, and is most often used to keep track of things, so you can ask, "is this object in the set?" And a Queue is primarily for communicating between processes (even if those processes happen to share the same thread): one process puts objects on the Queue, the other process consumes the objects. All Queues in the standard Java library keep the objects in the enqueueing order except for PriorityQueue. LinkedList implements Queue, but most of the queue implementations are for concurrency.
The problem with this approach is that classifying all these containers under Collection says that they are all the same type -- they are all basically the same thing. Or put another way, that they all happen to hold objects is all that's necessary to unify them under a common interface.
But almost all classes hold other objects, and we don't say that they are the same. Instead, we emphasize their behavior, their contract as expressed by the interface. And the behavior of Set, List and Queue are quite different.
On top of the alternative approach taken by C++, the Set, List, QueueandMap are all "container classes." They all hold objects. But the unification of "Collection" didn't extend to Map, because in the case of Map the lack of similarity is obvious, whereas in the case of the other three it can appear that they all have some kind of "sequence" behavior. However, "sequence" behavior is precisely what an Iterator expresses, and Maps also produce Iterators.
An argument for inheritance is the "Liskov Substitution Principle" where a derived type can be substituted for the base type. What you'll usually see (and this may be the precise definition of Liskov Substitutability; perhaps someone can clarify this in the comments) is a common interface, which is not extended but rather implemented to produce different behavior for different subtypes.
But the Collection hierarchy is strange. The base type Collection is duplicated exactly, without extending the interface, in the Set class, so a Set is perfectly substitutable for a Collection. However, List and further LinkedList add significantly important methods to the interface, and Queue adds different methods. In addition, the Queue methods could provide the Queue behavior by themselves, without the rest of the Collection interface.
I suggest that the appearance of the Collection interface is a result of "coincidental commonality," and that List, Set, and Queue are not the same basic type.
Examining the standard Java library code using the Python tool shown in my prior weblog posting (which is limited; it can only discover methods whose argument lists are on a single line), here is what I found. First, the number of examples I discovered using the following as arguments:
Iterable one of the newest standard interfaces (it's actually packaged in java.lang). It buys you more than passing an Iterator, because an Iterable is a class that will produce an Iterator, and therefore you could make multiple passes through a container -- this would seem to be a good solution for unifying access to containers. However, no classes in the standard Java library were detected that pass Iterable as an argument.
Much of the time, even when a specific type like Set is used as an argument, an Iterator is still taken. In this case, the Set is specified to restrict usage to that type, rather than trying to create a more general method.
In other cases the container is simply passed through to another method call, which doesn't give you information one way or the other, so these cases can be ignored (other than to note that, when they specify a subtype of Collection such as Set, they are placing a restriction on what can be passed).
Every once in awhile there's a call to a method like toArray() that is part of the Collection interface -- but again, this could be accomplished with an iterator.
In the standard Java libraries, Maps are most commonly used as "messengers" a.k.a. "data transfer objects," to move clumps of named data around. In these cases, they are really being used as untyped objects, so you see a lot of dynamic type checking going on. It's interesting to note that all Python objects are implemented as hash maps, so this behavior is built in and has better support than it does in Java. In many cases where a Map is used as a data transfer object in the libraries, a class could have been created instead, but clearly the author of that library saw some benefit to the dynamic nature of the "object made from a Map."
Iterators can also provide a safety mechanism (by using fail-fast iterators to more easily spot bugs) against concurrent modifications.
As an aside, I came across some strange code when doing this analysis. One file is src\java\security\cert\X509CertSelector.java
private static Set<GeneralNameInterface>
parseNames(Collection<List<?>> names) throws IOException {
// ...
Iterator<List<?>> i = names.iterator();
while (i.hasNext()) {
Object o = i.next();
if (!(o instanceof List)) {
throw new IOException("expected List");
}
List<Object> nameList = (List<Object>)o;
// ...
What's strange about this is that it appears at first to be only partially-rewritten code. Notice that the Collection that's passed in is parameterized on List<?>, and the Iterator that's taken from the Collection argument is also parameterized, but when next() is called the type is ignored and the result is upcast to Object. That result is then tested to see whether it's a List -- but presumably it must be, from the static type checking provided by generics. Finally, it's recast, but to a parameterized type. So at first you're thinking that maybe they just stopped using parameterized types in the rest of the method, but then they reappear after the hand-coded dynamic checking.
I saw this approach in a number of places. But I also saw a fair amount of library code that hadn't been generified, so it was puzzling. If you want to study the code examples yourself, run the Python program from the previous weblog entry (it's been rewritten a few times so it's improved a lot).
I'm guessing that this code was rewritten after generics was introduced. What I'd really like to know is the motivation for rewriting the code this way. Because one explanation is that the person rewriting it -- presumably someone at Sun -- was not clear about what the generic code was actually doing here, and so thought that maybe it was safer to leave the hand-coded dynamic check in addition to the static checks that came from using generics. If that's the case, it lends more credence to Ken Arnold's argument that generics are simply too complicated, if the library writers at Sun don't get them.
I haven't looked in the standard library; but I am surprised you haven't found lots of constructors of this form:
public MyList( final Collection c ) {
this( c.size() );
addAll( c );
}
I am surprised you haven't found these constructors because it is recommended that you provide a copy constructor for a Collection, it is the obvious method of writing a copy constructor, and they are very useful for converting between one type of collection and another. Whilst these could be written with an Iterable instead of a Collection the size method in Collection is very handy.
On a more general point I don't get why you are so against Collection. If you just want an Iterable then use one. If one of the other operations in Collection is useful then use Collection. From a type point of view you seem to be splitting hairs as to whether List etc. have a common base or not. At the very least Collection is a useful common adjective for related classes.
Your argument is a bit like saying that Object shouldn't be the common root of the types because hardly anyone ever calls the methods in it. If you search the standard library I bet there are considerable more calls to non-Object methods than to Object methods :)
PS I agree with you that the code snippet is strange, in adition to your points the net effect is to cast a List< ? > to a List< Object >. These are not the same, a List< ? > can contain any object type whereas List< Object > can contain only Object instances and not instences of classes derived from Object. It is difficult to say why they coded this snipet this way; it could be partially converted code, for example the method is called from pre-generic code and generic code and therefore the checks are still necessary because the genric compiler doesn't check the pre-generic code.
If you look at the code extraction tool you'll see that it explicitly excluded files that were simply implementing the various container classes. I wanted to see examples of usage rather than just implementing the container methods. Also, they aren't be in the form you show, but use generic type parameters.
It's not about being "against" Collection. It's about understanding it -- in particular, whether the design is appropriate, and possibly guessing what the motivation behind it is. There are many cases where design approaches taken in both the Java language and libraries were innapropriate and/or downright mistakes. Occasionally even the Java designers at Sun will admit this, but far too rarely, so we are left to our own devices to discover that "oh, you should never actually use or rely on finalize()" or "clone() only works occasionally." The list is fairly extensive, and if you are aware of it, you can navigate around these things and be a functional Java programmer.
What I don't get is why there's such a reaction to trying to analyze the language, so that we may understand its flaws, so that we may be better Java programmers. I guess it's probably because the marketing flaks at Sun have done such an effective job at getting the word out that Java is perfect, so that a large faction of people don't want to question it but instead accept it as gospel. This is counterproductive, and worse, will lead to the demise of the language. For example, it's only because people were willing to examine the possibility that threading in Java has always been broken that we've ended up with threading in J2SE5 which might not be. But if the "everything in Java is perfect" attitude had persisted, then, as multicore architectures become more prevalent, people would have eventually just decided that they couldn't ever get Java to work correctly on those platforms, and moved on to something else. Because someone was willing to question decisions made about threading in Java, it (probably) got fixed.
> At the very least Collection is a useful common adjective for related classes.
This is the very point I was trying to make in the post: the classes are not related. They only have structural similarity, but the essence of the classes are different and they don't belong in the same heirarchy.
> Your argument is a bit like saying that Object shouldn't be the common root of the types because hardly anyone ever calls the methods in it.
That's not my argument at all. "Everything is an Object" suggests a common root class. It has nothing to do with how often a method is called, it has to do with the appropriateness of the abstraction. All Objects have things in common; tragically in Java, Object isn't the common root of all types, since we have the primitives off by themselves, not being Objects.
My argument is that just because a class happens to hold other objects doesn't mean it is the same as other classes that happen to hold objects. And that arbitrarily shoehorning such classes into the same hierarchy confuses the issue.
> for example the method is called from pre-generic code and generic code and therefore the checks are still necessary because the genric compiler doesn't check the pre-generic code.
This is an interesting observation that deserves further thought. If it is true, it may open a significant can of worms, because the primary reason for the erasure implementation is "migration compatibility," so that new (generic) code can be called from old non-generic code. Your suggestion would mean that you would have to guard all your code this way, using generics but then remembering that erasure removes the type information. I notice we don't see this kind of code in, for example, ArrayList. Perhaps it's only for special cases, but I'm very skeptical about this. If the result is that we use generics AND we have to backfill in places by adding runtime type checks, it seems counterproductive. But I suspect that the person who rewrote that code may have just been lazy -- if there was an actual reason to leave it that way then it's not something I've seen in any of the generics discussions, and I will find the ensuing discussion very interesting.
I haven't taken the time to study the snippet, but a possible explanation may be that there is still in some cases a fair amount of defensive coding required within generic implementations to account for the possibility of its being used via a rawtype reference. The snippet is from the java.security.cert package, so a fair amount of caution/paranoia is fairly warranted, after all. If the calling code is executing this method via a rawtype reference (rather than a generic reference), then the implementer is potentially justified in double checking things, especially in order to throw something other than ClassCast. You probably also need to account for the possibility that the caller is casting the argument in order to get the compiler to accept it: I'm guessing if you're deliberately trying to abuse the security package, you're not too worried about compiler warnings.
This may not be true in this specific case: it's a private static method but the snippet doesn't include all of the callers of that method, so we don't know that the parm isn't coming from an instance method.
>> These are not the same, a List< ? > can contain any object type whereas List< Object > can contain only Object instances and not instences of classes derived from Object.
This is not correct:
- List<Object> can contain instances derived from Object - any such instances can be added to an existing list - A reference to a List<Object> can hold any List<Object> instance
OTOH
- a List<?> reference can hold a List<Object>, but not vice versa - nothing can be added to a list via a List<?> reference - you can't actually instantiate a List<?> instance: the wildcards only control what types of Lists can be assiged to particular List references.
> > At the very least Collection is a useful common > adjective for related classes. > > This is the very point I was trying to make in the post: > the classes are not related. They only have > structural similarity, but the essence of the classes are > different and they don't belong in the same > hierarchy.
But, Collection isn't a class, it's an interface, so the question isn't "are they related", it's looser, it's "is there a context in which it makes sense to use one of these as one of those?"
> But, Collection isn't a class, it's an interface, so the > question isn't "are they related", it's looser, it's "is > there a context in which it makes sense to use one of > these as one of those?"
My perspective is that a type is defined by its interface. Not the keyword interface in particular, but the methods that you can call for it in general. A class happens to be an implementation of an interface, but they both express a type.
I think what's relevant in this discussion is the relationship between types, and that's what inheritance expresses, regardless of whether you're talking about inheriting classes or interfaces.
But even when you say "one of these" and "one of those," I still think you're talking about type relationships. And "using one of these as one of those" is type-substitution, an "is-a" relationship.
Put another way, class inheritance is generally taken to mean implementation inheritance (although it can also pertain to type inheritance), whereas interface inheritance cannot be confused as implementation inheritance -- you are only talking about the abstraction of a type, not its implementation.
So I do not think you can say that because you are inheriting interfaces that you do not have to be expressing an is-a relationship. If anything, I think you are farm more constrained to expressing an is-a relationship with interface inheritance.
I would also put forward another question as an example of why Collection is a result of "coincidental commonality." Consider Queue, the latest addition to the Collections subinterfaces. The methods in Queue stand alone; you could have a completely functional Queue using the methods that are ONLY in the Queue interface, without needing the methods in Collection. Doesn't that suggest that Queue inherits from Collection because ... well, we already had List and Set so it would look funny if Queue didn't. Or worse, it would suggest that maybe Collection was a mistake.
And if the possibility that this could be another mistake in the design of Java offends you, I'll drag out that hidden-away insane uncle, Stack ... which inherits from Vector (which synchronizes some but not all of its methods, and let's just not go there). This was one of the original classes in Java (so need not worry about offending Josh Bloch, which I may have by questioning Collection), back when the library was small, and containers are so fundamental that I would suggest that Gosling himself must have seen and approved of it, and ... well, the implications are just too harrowing to consider.
Are there two questions here? A) should the interface Collection be the super-interface for both Sets and Lists, and B) being, shouldn't more methods take Iterable rather than Collection as their argument? I think B) is a slam-dunk suggestion that few have had time to move to, but I think A) is not.
Look at all the methods of the Collection interface - add(), addAll(Collection), contains(Object), containsAll(Collection<?>), clear(), remove(Object), removeAll(Collection<?>), retainAll(Collection<?>), size(). It certainly looks like a "collection" to me, and if both Set objects and List objects can satisfy this interface, why shouldn't they both extend it?
And it's also pretty clear that they are both collections of arbitrary objects, so I don't get your argument about some classes containing other classes but not extending the "Collection" interface. Seems like a bit of a strawman.
> > But, Collection isn't a class, it's an interface, so > the > > question isn't "are they related", it's looser, it's > "is > > there a context in which it makes sense to use one of > > these as one of those?" > > My perspective is that a type is defined by its interface. > Not the keyword interface in particular, but the methods > that you can call for it in general. A class happens to be > an implementation of an interface, but they both express a > type. > > I think what's relevant in this discussion is the > relationship between types, and that's what inheritance > expresses, regardless of whether you're talking about > inheriting classes or interfaces. > > But even when you say "one of these" and "one of those," I > still think you're talking about type relationships. And > "using one of these as one of those" is type-substitution, > an "is-a" relationship.
I agree, and that's what I was talking about. I'm not going to defend Collection as it's written, but here's something that is worth looking at. Take a look at Collection and strip away all of the operations which are not marked as optional (all the ones which make it an LSP nightmare). Here's what we have: contains, ContainsAll, equals, hashCode, isEmpty, iterator, size, and two forms of toArray.
To me, that's a perfectly decent interface and it depicts behavioral commonality. The problem is that they reached too far with all those "optional" operations.
I agree with some of the others that there is plenty of commonality in Set, List, etc. that justifies a Collection interface. I actually like it since it makes it so easy to change from List to Set when requirements/understanding changes, but we (and Sun) could and should make more use of Iterators.
BTW, Map and List are actually quite similar: the only difference is that the "index" is defined set of keys versus a continuous range of integers, also a set.
I am more concerned these issues:
The varying styles coming out of Sun, like Enumerators instead of Iterators, the optionally working methods like remove() on iterator, etc. Changing collection class may break code!!! Does anyone use the remove() while iterating over a collection?
Another concern is the lack of flexibility in Collections, e.g., I would like to see something like this: new Set<T>(Comparable<T>) so that I can use other equality checks than just T.equals()
The missing Trees like Red-Black Tree, etc. in the provided implementations.
Actually, Collection is the logical evolution of Iterator.
Because using Iterator affects its state, we want a seperate instance for each use. Iterable lets us do this.
The problem with using Iterable is that some common operations require checking all elements for an external class, but are orders of magnitude faster if performed by the Iterable instance itself: size checking is obvious, ArrayList can return an array of its elements with just an array copy, and HashSet and sorted collections can do fast inclusion checking. In order to have the implementation handle these cases where applicable, we need a subtype of Iterable, and we get Collection! All of Collection's non-optional methods could be done by a utility class working with Iterable, but would often be slower.
Thus, Collection fills the same role as Iterable, and the only reason Iterable is now seperate is the new for loop. Even in JSE5, nothing in the JSL implements Iterable directly. I'm not saying the JSL's Collection is or isn't done well, just that its niche is necessary and logical.
Actually, Collection is the logical evolution of Iterator.
Because using Iterator affects its state, we want a seperate instance for each use. Iterable lets us do this.
The problem with using Iterable is that some common operations require checking all elements for an external class, but are orders of magnitude faster if performed by the Iterable instance itself: size checking is obvious, ArrayList can return an array of its elements with just an array copy, and HashSet and sorted collections can do fast inclusion checking. In order to have the implementation handle these cases where applicable, we need a subtype of Iterable, and we get Collection! All of Collection's non-optional methods could be done by a utility class working with Iterable, but would often be slower.
Thus, Collection fills the same role as Iterable, and the only reason Iterable is seperate is the new for loop. Even in JSE5, nothing in the JSL implements Iterable directly. I'm not saying Collection is or isn't done well, just that it's logical.
Collections are Iterators, and Iterators are Collections
OK, maybe not in Java, but they are in some other languages, like LISP.
There's a breed of nerd called a SmugLISPWeenie that goes around tsk tsking other languages for not learning the lessons LISP learned 40 years ago. I must confess I have some sympathy for this point of view; for example, LISP had garbage collection more than three decades before it was fashionable.
In terms of understanding collections versus iterators, it would probably be profitable to take a long, hard look at the way LISP does it. It is extremely simple and clean, and LISP programmers seem to have been quite happy with the system for easily building and manipulating data structures for a long time. The simplicity is so...simple that it engenders in many a "ring of truth" feeling. For example, mathematical algorithms that are inductive in theory map directly to the way data is handled in practice, with CONS, CAR & CDR.
If you have a list, you can easily access it's first element with CAR. If you apply 'CDR' to the list, you get a smaller list missing the first element, or the empty list when you reach the end. If you apply CONS plus an argument to a list, you get a bigger list prepended with the argument. Other primitives let you destructively change elements in a list, allowing other sorts of operations one does with containers in other languages.
LISP, although it has an object system, really predates the OO era. Lists, for example, are not objects. I'm actually implementing a new language, Atom, using C++, that could be described as a 'fundamentally object-oriented LISP'. Fundamentally in that, unlike Java, C++, and even many other 'OO' languages, even things like integers really are objects. There are no non-objects.
The data representation scheme is based on the idea of generalizing the CONS/CAR/CDR system to any object, any type of collection. So far it is working out very well. I believe that any container-type operation could be added to the object interface without compromising the design philosophy, though only time will tell.
In short, it may be that the age-old container/iterator distinction in C++ & Java is simply the wrong way to think about collections and data. (After all, even in C, the address of an element in an array is also an array. Though without the concept of an 'empty array' at the end, that's not enough to use for iteration.)
With regard to the snippet: > private static Set<GeneralNameInterface> > parseNames(Collection<List<?>> names) throws > IOException { > // ... > Iterator<List<?>> i = names.iterator(); > while (i.hasNext()) { > Object o = i.next(); > if (!(o instanceof List)) { > throw new IOException("expected List"); > } > List<Object> nameList = (List<Object>)o; > // ...
The reason is most likely due to legacy class library compatibility.
Consider two cases: 1. someone who is writing new code (or recompiling old code) using the J2SE 1.5 compiler, and 2. someone who is using a class library (we'll call it Foo) which was compiled with the 1.4 JVM, and Foo makes a call to parseNames.
For (1), the person who is writing new code, the compiler will enforce the generic type contract in the method definition. In that case, the implementation itself is irrelevant.
For (2), the person who is using a class library (Foo) which was compiled with the 1.4 JVM, remember that when Foo was compiled, it did not have to enforce the generic type contract.
Let's say Foo has a method Bar.foo(Collection):
class Bar { public static void foo(Collection collection) throws IOException { X509CertSelector.parseNames(collection); ... }
If you're writing an application which uses Foo, and you can't wait for the nice folks at foo.com to publish a generics-friendly version of their library, there's no reason why you can't do this: Collection<Object> collection = new ArrayList<Object>(); collection.add("This isn't a list"); collection.add(Boolean.FALSE) collection.add(...); Bar.foo(collection)
Thankfully, if you compile this with the 1.5 compiler, you may not get a compile-time error, but you'll still get the run-time error you expect, and love!
Addendum: Of course, they can't change the type of exception which is thrown, even though the expected natural refactoring of that method converts the exception from IOException to ClassCastException, because the library's already in use!
Second Addendum: If the parseNames threw a ClassCastException instead of an IOException, then I bet parseNames would have the natural Tigerish look you would expect. (I bet some people are not happy with whomever wrote this method to throw IOException! :) )