Summary
For Thinking in Java 4e, I'm trying to analyze the presence of the Collections interface in java.util.
Advertisement
There is no equivalent explicit common interface for sequences in C++, and my perception is that anytime a generic STL function is created, that function works with any sequence (via templates) but it turns around and pulls an iterator from the sequence every time (or almost every time).
My perception is that almost every time they pass a Collection in the Java library code, they pull an Iterator from it and use that -- therefore they could just pass the Iterator rather than a Collection, and create a more general-purpose piece of code as a result.
I wrote a Python program to extract the method bodies of methods that take various types of container and iterator arguments, to make it easier to evaluate this assertion (note: program updated July 14). Try running it on your own Java library source to see the results:
'''Searches through standard Java source code files looking
for examples where the classes in "searchFor" are method
arguments. Formats results into the "output" file'''
# Requires Python 2.4
import os, re
src = r"C:\ProgTools\Java\src" # Your Java source directory
output = file("CollectionVsIteratorArguments.txt", 'w')
headerWidth = 59
searchFor = [ "Collection", "List", "LinkedList", "Set",
"Map", "Iterator", "ListIterator", "Iterable", "Enumeration" ]
# Regular expressions to match argument types. Imperfect --
# only finds methods where the entire argument list is on
# a single line:
argSearch = {}
for srch in searchFor:
argSearch[srch] = re.compile("\W%s( |<)" % srch)
# Capture argument list:
arglist = re.compile("\([^)]+?\)", )
results = {}
for types in argSearch.keys():
results[types] = [0, ""]
# Don't include files that are implementing container classes
exclude = """CopyOnWriteArraySet.java
AbstractSet.java PriorityQueue.java AbstractList.java
LinkedList.java DefaultListModel.java RegularEnumSet.java
JumboEnumSet.java DelayQueue.java Vector.java HashSet.java
AbstractCollection.java ArrayList.java
PriorityBlockingQueue.java CopyOnWriteArrayList.java
LinkedHashSet.java IdentityHashMap.java AbstractQueue.java
AbstractSequentialList.java LinkedBlockingQueue.java
TreeSet.java Collections.java ArrayBlockingQueue.java
SynchronousQueue.java ConcurrentLinkedQueue.java
EnumSet.java Class.java BeanContextSupport.java
AbstractMap.java EnumMap.java HashMap.java Hashtable.java
LinkedHashMap.java TreeMap.java WeakHashMap.java
ConcurrentHashMap.java""".split()
def captureMethod(javaFile, firstLine, lines, index):
'''The caller has found a first line of a method. This
captures the rest of the method, assuming it can do a
simple brace count to detect the end of the method'''
method = ""
openBraces = 1
method += "-" * headerWidth + "\n" + javaFile + "\n"
method += firstLine
print firstLine, #############
while openBraces:
index += 1
line = lines[index]
method += line
openBraces += line.count('{') - line.count('}')
return method
# Walk the directory tree looking for appropriate Java files
for javaFile in (os.path.join(root, name)
for root, dirs, files in os.walk(src)
for name in files if name.endswith(".java")
and name not in exclude):
lines = file(javaFile).readlines()
for index, line in enumerate(lines):
methodArgs = arglist.search(line)
if methodArgs \
and line.strip().endswith("{") \
and not line.strip().startswith("for") \
and not line.strip().startswith("if") \
and not line.strip().startswith("*") \
and not line.strip().startswith("//"):
args = methodArgs.group(0)
for argType in argSearch.keys():
if argSearch[argType].search(args):
results[argType][0] += 1
results[argType][1] += captureMethod(
javaFile, line, lines, index)
for type, methods in results.items():
print >>output, "#" * headerWidth
print >>output, (type + " (%d instances)" %
methods[0]).center(headerWidth)
print "%s (%d instances)" % (type, methods[0])
print >>output, "#" * headerWidth
print >>output, methods[1]
C++ doesn't need the common interface because of the latent interface produced by templates.
However, if you are dealing with forward iterators, and this is the case for Iterator in Java, you may need to pass through the sequence more than once, in which case you need the actual container rather than a forward iterator.
But it still seems to me that there may be more cases where an Iterator could be passed to a method rather than Collection.
I also notice that the majority of the uses of Collection as an argument come as the implementation of the java.util library classes.
Chuck Allison commented:
Algorithms have no knowledge of the underlying container in C++. That's why they require two iterators to delimit the sequence. If they need to make multiple passes through the sequence, they make a local copy of the iterator (which is why iterators must be copyable).
To me, a Collection feels like more of a "fundamental" abstraction; a Collection is simply a "multiple Object". Whereas Iterators, although a smaller interface, are a more complicated concept to get your head around. Iterators have behaviour and state. You cannot observe the contents of an Iterator without altering the Iterator. Iterators should really be used for what they were intended; i.e. iterating over Collections, rather than as a way of passing data around.
> therefore they could just pass the Iterator rather than a Collection, and create a more general-purpose piece of code as a result.
My gut feel is that although true in theory, in practise this does not make much difference. The vast majority of Iterator instances are themselves created from the iterator() method of Collection, so whenever you have an Iterator you usually also have the Collection it came from. (Perhaps you could use your Python script to test this assertion?). Yes, you can create independent implementations of Iterator that have nothing to do with a Collection, but this is not frequently done in practice.
Using Collections as arguments rather than Iterators certainly makes methods easier to implement, as the Collection interface provides much greater functionality. You can find the size, combine two Collections etc. As you have pointed out, most uses of Collections just pull out an iterator and use that, but the important point is that you have the full functionality of a Collection if and when you need it. Perhaps the main thought guiding programmers to use Collections rather than Iterators is that it is easy to convert a Collection to an Iterator, but harder to convert an Iterator to a Collection? A kind of "just in case I need it" approach?
Taking a higher level view, the important thing to my mind is that you should pick one approach and use it consistently. Either take Collections as arguments in all your methods or take Iterators as arguments. This leads to a simpler and easier to understand API. I can't imagine it would be much fun to use an API which takes Collections in some places but then for no apparent reason (to the user) takes Iterators in others. I prefer using Collections, but it might be interesting to base an API off Iterators instead.
Having said that, I've recently started using the "Typesafe Collection" idiom. I read this idea in Elliotte Rusty Harold's excellent "XOM Design Principles" article, in particular the section "Lack of generics really hurts in the Collections API. Hence, dont use it." (http://www.cafeconleche.org/XOM/designprinciples.xhtml#d0e378). The idea is to create a custom collection Class for each important Class in your API. These classes do not implement Collection or List. I've found the concept very quick to implement and it makes an API much clearer and easier to understand.
An API that uses untyped Collections for arguments and return values is harder to understand, error prone and doesn't use of one of the central features of Java i.e. the type system. If you find yourself writing Javadoc like "@return a List of Document objects" or "@param documents A List of Document objects" all over the place, either consider using Typesafe Collections or perhaps switch to a language (like Python) that doesn't have typed arguments and return values anyway!
I suspect he meant Iterator, like he said, but using Iterable instead has its benefit. For one, you can replace method definitions that currently take Collection<?> and replace them with Iterable<?> keeping your library backward compatible, yet more clearly defining the role of your method (damn that Iterator.remove!)
I think Andreas is entirely correct. The concept of a Collection is much simpler than that of an Iterator and we must always strive for simplicity.
Most of us here have spent several years working with Java and it's easy to forget that programming concepts that we take for granted altually took years to formulate and clarify. These concepts are still new to very many (possibly the majority) of coders. So whilst using Iterators as parameters instead of Collections may 'simplfy' the code for some, it also introduces another interlectual barrier to be crossed for others.
Although this is not primarily a Java 1.5 issue, I can't help feeling that the language has reached a level of complexity both internally (in terms of its added syntax) and externally (in terms of the esoteric issues being addressed by the experts) where it in danger of losing its position as the 'language of choice' for beginners and, consequently, its primary strength as the lingua franca of software development.
> Although this is not primarily a Java 1.5 issue, I can't > help feeling that the language has reached a level of > complexity both internally (in terms of its added syntax) > and externally (in terms of the esoteric issues being > addressed by the experts) where it in danger of losing its > position as the 'language of choice' for beginners and, > consequently, its primary strength as the lingua franca of > software development.
I think that as Java, C# and (believe it or not VB) move to generics (with all their attendant complexities), people will switch to dynamically typed languages in droves.
I tend to agree with both sides somewhat -- I think it's important for a large segment of users/uses to use the simplest conceptual interface, so Collection is appropriate. OTOH, much of the algorithmic support built into my JGA library is implemented as extended Iterators, and I've done quite a lot of framework-work in the past that primarily used Iterator in the interfaces so I'm very comfortable with that.
One issue that no one has mentioned in favor of using Collection in the interface is the ownership issue: if you have a method that accepts Collection only so that the method can call the arguments iterator() method, the implementation of the method knows that the iterator will not be advanced by another caller. This isn't so much an issue for direct uses (ie, when the method in question obtains the iterator and consumes its contents before returning) as it is for indirect uses (when the method caches the iterator reference and consumes it in a later method call).
Either way, I think this is one of the cases where the assumptions made by the method ought to be clearly documented: if you accept an Iterator, you ought to clearly state what the condition of the Iterator will be when the method returns.
I find that I use the "List" or "Set" interfaces more than "Collection" to signify whether order matters. Using iterators implies an ordering (even though sets can create iterators).
To be really useful as an argument to a method, I think the 'Iterator" would need a "rewind()" method but in my experience (with geometry objects), that's a bad idea.
Another issue to consider is that of modification. Passing the reference to the collection itself allows the method to modify the collection. Depending on the implementation of the iterator - specifically, the "remove()" method - the iterator is a read-only view of the collection.
> <p>My perception is that almost every time they pass a > <b>Collection</b> in the Java library code, they pull an > <b>Iterator</b> from it and use that -- therefore they > could just pass the <b>Iterator</b> rather than a > <b>Collection</b>, and create a more general-purpose piece > of code as a result.
We wrote a library of basic higher-order functions on sequences in Java. A sequence is implemented by an iterator. Here is some JavaDoc:
> I find that I use the "List" or "Set" interfaces more than > "Collection" to signify whether order matters.
This is actually the question I was looking to answer. The meaning of List and Set are so disjoint that it seems arbitrary to try to unify the two with the Collection interface, just because they happen to have method names in common.
I guess what I was really looking for was how often this unification really happens, vs. the situations where you can use Iterator, List or Set instead.
The fact that the C++ STL algorithms work (almost?) exclusively with iterators is indicative, I think.
> Using > iterators implies an ordering (even though sets can create > iterators).
I think an iterator simply says that elements can be selected one at a time. The ordering is established by the underlying container, and the primary objective of the iterator is to decouple an algorithm from the underlying data structure.
"Iterable" does an even better job of this than Iterator -- closer to what the C++ STL accomplishes -- because you can use foreach on anything that is Iterable, and this includes arrays (unlike Iterator). But all code in the standard Java library uses Iterable or even Enumeration; I was unable to discover one example where Iterable was passed as an argument. This might be because an array isn't accepted as an Iterable argument:
[code] import java.util.*;
public class ArrayAsIterable { static <T> void test(Iterable<T> ib) { for(T t : ib) System.out.println(t); } public static void main(String[] args) { test(Arrays.asList(1, 2, 3)); test(new String[] { "A", "B", "C" }); // Won't compile } } [/code]
> To be really useful as an argument to a method, I think > the 'Iterator" would need a "rewind()" method but in my > experience (with geometry objects), that's a bad idea.
This is probably the sticky issue. In C++, I think I remember that it's easy to clone an iterator, so even if you only have a "forward" iterator you can keep a copy to enable multiple passes.
If you have to make multiple passes in Java with an Iterator (not a ListIterator), then you need the Collection Iterface so that you can ask for another Iterator.
I'm still improving the Python program to do a better job of hunting for relevant examples in the Java library source.
> Another issue to consider is that of modification. > Passing the reference to the collection itself allows the > e method to modify the collection. Depending on the > implementation of the iterator - specifically, the > "remove()" method - the iterator is a read-only view of > the collection.
Indeed, "remove()" is optional. Most of the examples I've seen have simply read the Collection, which could be accomplished by passing a read-only Iterator.
I think the concensus is that 'Iterator' would not be a good idea but that there is a case for 'Iterable', because:
1. The 'Iterator' you get from 'iterator()' is under you control
2. You can ask for multiple 'Iterator's
Further points are:
1. The interface 'Iterable' is new to J5 and therefore old code would have to have used 'Collection'
2. 'Collection' extends 'Iterable' therefore a 'Collection' is an 'Iterable' and if you don't want the extra features of 'Collection' you can use 'Iterable' instead, this should not be that confusing since people generally understand inheritance
3. The feature of 'Collection' that I find most useful is 'size()'
4. The heirarchy is well designed and simple; you can pick the level that suits your method, progressing cleanly through to more functionality:
Iterable -> Collection -> List -> Set -> SortedSet -> Queue -> BlockingQueue
In summary I think it is best bet to avoid 'Iterator' and instead choose the lowest level that suits your method from the above heirarchy and in searching code I suspect that you will not see a great deal of use of 'Iterable' because it is new to J5.
> I think Andreas is entirely correct. The concept > of a Collection is much simpler than that of an Iterator > and we must always strive for simplicity.
Hm, I can't see why a collection should be simpler than an iterator (rather the other way around). In the C++ STL, iterators are used as the "link" between collections (or, in general, sequences) and algorithms (i.e. collections are not used directly in algorithms) (*). This makes it easier to use subranges, for example (pass in an iterator modified by find(), for example). With collections, you need something like collection "views" for the same purpose, which presents a subset of the collection.
Besides, iterators may be considered more general, hence actually simpler or more fundamental, than collections. If you have an algorithm operating on iterators, then it can operate on anything that presents itself as a sequence: That may be a collection, an array, a file, a stream, or whatever.
You can always get iterators from collections, but iterators doesn't have to come from collections. Thus, iterators are arguably a more fundamental, simpler concept, than collections. And I agree, we should strive for simplicity. ;)
Regards,
Terje
(*) For those not familiar with STL, the algorithms take a pair of iterators, giving a range. Thus, there's no need for a rewind() method or some such (in fact, iterators don't have to be objects of user-defined types, but may well be pointers).
Yes, but note the constraint on many iterators in the C++ STL: they must often be copyable. There is no such support for Iterator in Java, so you have effectively a forward iterator that can't be copied. For that reason, going back to the Collection to get another iterator may be necessary. However, I haven't really seen that done in the Java library code that I've examined using my Python program.
The main reason in STL for using iterators is not good design, it is to support arrays. Since Java collections don't do any primitive types this isn't a consideration in Java and they went for a cleaner and simpler design of the 'Iterable' interface instead of an 'Iterator' that was cloneable etc. (see below for all the Iterator needs to do). One of the best example of when the Java design shines is with sort, in STL you have:
sort( RandomAccessIterator begin, RandomAccessIterator end )
with Java you have the simpler
sort( List unsorted )
For the STL version to work you need a 'RandomAccessIterator' which has the following methods:
Copy constructor Default constructor = swap == != * (dereference for both sides of =) -> (member access) ++ (both post and pree) *++ (post ++ combined with *) -- (both post and pre) += + (two versions with arguments reversed) -= - (two versions with different argument types) [] (array access for both sides of =)
As you can see this is considerably more complex than the Java 'Iterator' class. Not only is the 'RandomAccessIterator' necessarily much more complex; but it is typically used inefficiently, in particular a typical call would be:
sort( vector.begin(), vector.end() )
Then the first thing that 'sort' does is to make copies (clones) of its input arguments because it needs to guarantee that they aren't modified externally. Thus four 'RandomAccessIterator's are created before you even begin.
The STL sort can sort sub-ranges, but this typically isn't that useful and in anycase is easy in Java:
sort( list.subList( 1, 2 ) )
and STL still needs a subrange constructor in its 'vector' class.
Having said all this, the STL design is correct for STL because it gives support for arrays and the Java design is correct for Java because it is much simpler.
There seem to be four distinct questions that have collapsed onto one here:
1) Collection vs List, Set as parameter type 2) Collection vs Iterator as the parameter itself 3) Collections vs Iterable as parameter type 4) How C++ handles this
My view:
1) I never use Collection in methods I create and if I am editing other peoples code I frequently find myself replacing with the more specific List or Set interface. Collection is overgeneralized. 2) Iterator is the "right on paper" but ultimately wrong answer. It is at the wrong "level" and furthermore you sometimes need to lock a Collection when iterating. 3) Iterable isn't Idiomatic and thus wrong 4) I'm a C++ naif
Flat View: This topic has 24 replies
on 2 pages
[
12
|
»
]