Summary
The Java collection class library has some fine qualities, and having a standardized set of collections is certainly far better than not having it. But the library also has a number of flaws that should be discussed and understood, so that they don't get perpetuated in future libraries. This article discusses two of the more irritating issues.
Advertisement
Several years ago, I was on a committee of the College Board, planning the switch of the Advanced Placement Computer Science exam from C++ to Java. That exam is offered to high school students who seek to get college credit for their high school computer science classes. One of the tasks of the committee was to specify a subset of the Java library that was fair game for exam questions. When the time came to discuss the collection class library, we all sighed and muttered about "Bloch's blunders".
A few weeks later, I was at Java One, where the same library was awarded a prize for "best" (or at least, "best known") class library of the year.
How could two audiences have such a different opinion of the same library? When I asked Josh Bloch, he told me that we academics should stop grousing about the "optional operations". But that wasn't our complaint. We understood that the optional operations are necessary for limited capability views, such as the view returned by Arrays.asList. There is nothing wrong with trading away some purity for a substantial increase in functionality.
The problem is trading away purity when the only result is confusion and inconvenience. The first issue is a serious problem for educators. We try to teach our students about the performance differences between data structures, such as linked lists and arrays. For example, it would be a poor idea to execute binary search on a linked list. After all, linked lists don't have efficient random access. But in the collection class library, both LinkedList and ArrayList implement the List interface, which has operations for random access. Why not have two interfaces, one for an ordered collection such as a linked list, and another for an indexed collection, such as an array? I asked Josh Bloch, and he suggested that some programmers might have a linked list with a few elements, and they might want to sort it, and who was he to deny them that? Huh? When is the last time you needed to do that? And anyway, there are perfectly good algorithms for sorting linked lists.
You may well wonder what happens when you call the Collections.binarySearch method on a linked list. Rather than carry out a horrendously inefficient binary search, the method actually switches to a linear search for a linked list, its name nonwithstanding. Before SDK 1.4, the test for the linear search involved a very ugly hack. Now there is, somewhat belatedly, a tagging interface RandomAccess that ArrayList implements but LinkedList does not. Ugh. Smalltalk did a lot better than that, 20 years earlier.
The second problem with the Java collection classes is iterator inconvenience. In C++, iterators were simple. If you have a collection with n elements, there are n + 1 iterator positions: At the first, second, . . ., nth element, and past the nth element. You could think of an iterator as a cursor like the old terminals used to have (and Emacs still has). Alternatively, you can think of the iterator as new-fangled caret--a vertical line that is between elements. Then the n + 1 positions are before the first, between the first and second, . . . between the (n-1)st and nth, and past the nth element. Both interpretations are essentially equivalent, but the "cursor" view makes it a bit easier to interpret the element that the iterator currently visits.
What should happen when you add an element at the iterator position? Both C++ and Java do the right thing--which is the same thing your text editor does. The element is inserted before the iterator, and the iterator moves past the inserted element, like this:
a|bc => ax|bc (after inserting x)
How about deleting the iterator position? C++ does what your text editor would do when you press the Delete key (ok, except on the Mac, don't get me going...). The element under the cursor (or to the right of the caret) is deleted, like this:
ax|bc => ax|c (after deleting)
How about Java? Well, you can't tell. It depends if the iterator had just moved to the right, to the left, or not at all. Huh? In the last case, the remove method even throws an exception! If you want to delete two consecutive elements, then you have to move the iterator past the first one, invoke the remove method, and move the iterator over the adjacent element before calling remove again. If my text editor worked like that, it would drive me crazy.
I don't think I am the only one who is aggrieved by Java iterators. Douglas Dunn has a long-winded section in his splendid book "Java Rules", trying to explain the behavior of iterators. (He takes a different tack, taking the remove functionality for granted and then laboriously tries to explain the inconsistency of the add method.)
Of course, these flaws are not fatal. The collection class library has some fine qualities, and having a standardized set of collections is certainly far better than not having it. Other libraries, such as the STL library of C++, have their share of bizarre features. What bugs be about the Java collection class library is that it seems to be uncritically accepted, and that so few people talks about its flaws. If we can't recognize our mistakes, how can we learn from them?
Thanks for an interesting article. However, I do not agree with your idea that the behaviour of the Iterator is faulty. The documentation for the remove method of the class Iterator says: void remove() Removes from the underlying collection the last element returned by the iterator (optional operation).
From the above you can infer that you only delete items that have been iterated over ("nexted"). If you have not iterated over an item, you don't know enough about it so it makes no sense to delete it. If you know what the item is then perhaps you should not be using the iterator in the first place.
Secondly, the documentation for remove() is very succinct and I think that the behaviour you mentioned from C++ would require a bit more complicated documentation (ok a couple of more lines). So when you contrast the contract for remove() in Java and a similar operation in C++ I think the one in Java has greater advantages.
Thanks for an interesting article. However, I do not agree with your idea that the behaviour of the Iterator is faulty. The documentation for the remove method of the class Iterator says: void remove() Removes from the underlying collection the last element returned by the iterator (optional operation).
From the above you can infer that you only delete items that have been iterated over ("nexted"). If you have not iterated over an item, you don't know enough about it so it makes no sense to delete it. If you know what the item is then perhaps you should not be using the iterator in the first place.
Secondly, the documentation for remove() is very succinct and I think that the behaviour you mentioned from C++ would require a bit more complicated documentation (ok a couple of more lines). So when you contrast the contract for remove() in Java and a similar operation in C++ I think the one in Java has greater advantages.
<quote> C++ does what your text editor would do when you press the Delete key (ok, except on the Mac, don't get me going...). The element under the cursor (or to the right of the caret) is deleted, like this: </quote>
There's another key on your keyboard, usually a nice big key that gets used a lot more than the delete key. Have you noticed the "Backspace" key? And, it acts similarly to most iterator usage in java - it deletes the letter (object) you just typed. Which makes a whole lot more sense than deleting the letter you haven't gotten to yet.
Paul--it may well make sense to have "remove" equal to the "Backspace" key, but that's NOT what the Java library gives us. Hit "Backspace" twice, and you delete two items. Call "remove" twice, and the second call throws an IllegalStateException :-(
Stefan--unfortunately, you are wrong in your assessment that "you only delete items that have been iterated over ("nexted")." If you iterate backwards, you must have "previoused". The fact that you missed that shows to me that the documentation isn't all that clear after all.
And if you want to erase a range of items after you inspected them, you can't just delete what you now know needs to be removed. You need to reinspect each one of them before you can remove it.
> Stefan--unfortunately, you are wrong in your assessment > that "you only delete items that have been iterated over > ("nexted")." If you iterate backwards, you must have > "previoused". The fact that you missed that shows to me > that the documentation isn't all that clear after all.
ok perhaps by using the term ("nexted") my post was ambigious because it indicates a certain direction but if you dismiss this "nexted" term (or add "previoused") I think the post is accurate and "right" :) At least this is correct with respect to the documentation of the iterator interface that only provides the next operation.
The documentation succintly says that the last element iterated over is in some sense an active element and is eligible for removal, possibly in either direction (but that would be the behavour of another interface, possibly extended from iterator).
> And if you want to erase a range of items after you > inspected them, you can't just delete what you now know > needs to be removed. You need to reinspect each one of > them before you can remove it.
My point is that if you want to do some complex deletions of the items in the underlying collection that implements the iterator interface then perhaps you would be better of by accessing the same collection through another interface instead of the iterator.
I think it is a benefit that the operational semantics of the iterator in the collections framework is so simple. It should be as simple as it can be but not too simple (after all it is possible to delete at all).
But in the end this is perhaps more a matter of personal preferences and outlook on programming.
There are two key flaws in the collections classes that aren't addressed:
o The fact that methods are called 'size()' rather than 'getSize()' means that they can't be used by JavaBean objects easily. Had they been getSize, JavaBeans Introspection would have kicked in which meant that all data structures would be interrogable by JavaBean objects (probably more important in the scope of JSPs anyway). [Bloch himself attributed to the fact that he 'thought' they would be better, in much the same way in Effective Java he 'thought' it would be better to violate the equals contract]
o The fact that remove() is a pretty silly thing to have in an iterator anyway. If you want a read-only iterator, you have to explicitly wrap it first. Using an Enumerator would have been far more sensible.
I do understand why the remove() was put in in the first place, but that was to tackle a single issue. It would have been far better to create a more generic 'getPosition()' which could then be used to perform a whole bunch of operations, such as insertBefore and insertAfter on the data structure itself. [The position wouldn't have to be an integer; it could be a pointer directly to that part of the data structure itself, such as an element for the linked list node]
Most of what Bloch writes about in Effective Java makes sense, but you do get the feeling sometimes that he made a few blunders early on, and instead of accepting the errors tries to convince others he was right all along.