Weblogs Forum - Collection overused as an argument in Java Libraries?

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Weblogs Forum
Collection overused as an argument in Java Libraries?

24 replies on 2 pages. Most recent reply: Feb 6, 2006 4:10 PM by Chris Chen

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 24 replies on 2 pages [ « | 1 2 ]

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 14, 2005 8:45 PM

> 1) I never use Collection in methods I create and if I am
> editing other
> peoples code I frequently find myself replacing with the
> more specific
> List or Set interface. Collection is overgeneralized.

That's the conclusion I'm tending to. See the next weblog posting.

> 2) Iterator is the "right on paper" but ultimately wrong
> answer.
> It is at the wrong "level" and furthermore you sometimes
> need to lock a Collection when iterating.

I think that sometimes Iterator works, and sometimes the more specific collection type works. That's what I found in the Java library code (although you'll see in the next weblog entry that I also found some very questionable, or at least cryptic, code, so just because they do it in the Java library doesn't necessarily mean it's a good practice!).

> 3) Iterable isn't Idiomatic and thus wrong

I don't understand what you mean by this. I actually think that passing Iterable is a fairly powerful idiom. Maybe you can explain further what you mean.

> 4) I'm a C++ naif

Which means "Lacking worldly experience and understanding." I take this to mean that you came to these conclusions without studying C++.

Terje Slettebø

Posts: 205
Nickname: tslettebo
Registered: Jun, 2004

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 15, 2005 1:19 AM

> The main reason in STL for using iterators is not good
> design, it is to support arrays.

Huh? Where have you got that from? I haven't heard that anywhere else.

STL was designed by Alexander Stepanov. At the time, as I understand, he was looking for a language to implement his idea, and found that C++ fit the bill. Iterator is a general abstraction (in fact a pattern) that allows you to treat any sequence the same way, like I said in my posting, it also works for streams, etc. Also, with things like iterator adaptors, you may for example perform jumps, say accessing every tenth element, traversing it backwards, and so on.

Thus, I haven't seen any evidence that iterators where used in STL to support arrays. Yes, it supports arrays, too, but that's just one of the many things that the iterator abstraction supports.

But don't take my word for it, have a look at what its creator have to say about the subject: http://www.stlport.org/resources/StepanovUSA.html. For example:

"I believe that iterator theories are as central to Computer Science as theories of rings or Banach spaces are central to Mathematics."

No mentioning of C++ arrays, here. :)

> Since Java collections
> don't do any primitive types this isn't a consideration in
> Java and they went for a cleaner and simpler design of the
> 'Iterable' interface instead of an 'Iterator' that was
> cloneable etc.

What is cleaner and simpler is apparently in the eye of the beholder. I find a design, where you're not forced to inherit from an interface, for it to be used in algorithms, to be a cleaner design. You remove a dependency, so it may work with types that were designed separately.

Not to mention operator overloading in C++, something I find particularly elegant, since you can then make user-defined types which behave the same way as built-in types (and you avoid the clumsy wrapping of built-in types in an object, like in Java, just to be able to use it in a container, and you also typically get a performance hit that way, from the memory management).

The Java generics (although I understand the design rationale - in particular to be backwards compatible) also ensures that this continues - a List<int> is still just a List (of Object) to the bytecode...

> (see below for all the Iterator needs to
> do). One of the best example of when the Java design
> shines is with sort, in STL you have:
>
> sort( RandomAccessIterator begin, RandomAccessIterator end
> )
>
> with Java you have the simpler
>
> sort( List unsorted )

*shrugs*

// Sort for random access container (like vector)

template<class RandomAccessContainer>
void sort(const RandomAccessContainer &c)
{
sort(c.begin(), c.end());
}

// Sort for list

template<class T>
void sort(const list<T> &c)
{
c.sort();
}

These functions might just as well have been in the standard library, so you could use sort(), regardless of container type.

Note that in the above, it chooses the most efficient sort function (random access container) possible, depending on the container type.

> For the STL version to work you need a
> 'RandomAccessIterator' which has the following methods:
>


> Copy constructor
> Default constructor
> =
> swap
> ==
> !=
> * (dereference for both sides of =)
> -> (member access)
> ++ (both post and pree)
> *++ (post ++ combined with *)
> -- (both post and pre)
> +=
> + (two versions with arguments reversed)
> -=
> - (two versions with different argument types)
> [] (array access for both sides of =)
>

That also makes the STL iterator more powerful than the Java one (allowing you to traverse forward and backward or in jumps, and accessing any element, so it may be used as arguments to a generic sort function, unlike the Java version).

In other words, there's a reason for the complexity. Remember that when you compare designs, complexity by itself is not a good measurement. Compexity compared to versatility, however, is a good measurement. And that's where _C++ and STL_ shines.

Before the STL, there were several collection libraries in C++ (similar to the Java design). However, when STL came, it blew all these away. That was simply because many people feel it's a better design (not just so that you can also use it for arrays).

> As you can see this is considerably more complex than the
> Java 'Iterator' class. Not only is the
> 'RandomAccessIterator' necessarily much more complex; but
> it is typically used inefficiently, in particular a
> typical call would be:
>
> sort( vector.begin(), vector.end() )
>
> Then the first thing that 'sort' does is to make copies
> (clones) of its input arguments because it needs to
> guarantee that they aren't modified externally.

Actually, no, that's not the reason: C++ has no notion of concurrency in the current standard (something that will likely change in the next version of it). Moreover, there's no requirement for it to copy the iterators internally. It may do it, depending on the implementation, so iterators are assumed to be "lightweight" objects (unlike containers).

> Thus four
> 'RandomAccessIterator's are created before you even
> begin.

Wrong conclusion. Besides, these are typically pointers, and therefore this operation is neglible compared to the actual sort. Besides, who's to say that Java doesn't perform its own copying? Perhaps do a benchmark between C++ and Java?

> The STL sort can sort sub-ranges, but this typically isn't
> that useful and in anycase is easy in Java:
>
> sort( list.subList( 1, 2 ) )
>
> and STL still needs a subrange constructor in its 'vector'
> class.

Really? It doesn't need that, to use a subrange of it. The point is to modify the iterators, not the container. With lists, you may get away with a few pointer operations, to get a sublist, but how will you be able to sort a subrange of a vector, if you need to pass the whole vector to sort? Then you need something like a "view", that provides such a subrange. So in exchange for iterator and iterator adapters, you may need a similar collection of views, which could be considered at least as complicated, and potentially less general.

> Having said all this, the STL design is correct for STL
> because it gives support for arrays and the Java design is
> correct for Java because it is much simpler.

You may say that the Java collections framework is simpler than STL, but that's because it doesn't try to cover all that STL covers (support for arrays is only a small part of it - the iterator abstraction is a general concept that extends far beyond this, as mentioned).

Thus, comparing the two is a bit like comparing apples to oranges.

However, as mentioned above (and as you also mention here), the complexity is there for a reason, and having a more complex language and library can actually make your own code simpler. The essential complexity has to go somewhere, and if it's not in the language or library, it will be in the programs. I have experience with both programming C++ and Java, and Java code (in particular because it treats built-in and user-defined types fundamentally different, and doesn't support operator overloading, unlike C++) tends to become more verbose.

Take the C++ code:

++my_map[key];

This increments the element in the map referenced to by the key. With Java code, the above typically requires 4-5 lines of code (although some of the casting and manual "boxing" disappears with generics and auto-boxing, so it's better than it used to be, in Java 5).

Regards,

Terje

Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 15, 2005 6:38 AM

> > The main reason in STL for using iterators is not good
> > design, it is to support arrays.
>
> Huh? Where have you got that from? I haven't heard that
> anywhere else.
>
> STL was designed by Alexander Stepanov. At the time, as I
> understand, he was looking for a language to implement his
> idea, and found that C++ fit the bill. Iterator is a
> general abstraction (in fact a pattern) that allows you to
> treat any sequence the same way, like I said in my
> posting, it also works for streams, etc. Also, with things
> like iterator adaptors, you may for example perform jumps,
> say accessing every tenth element, traversing it
> backwards, and so on.
>
> Thus, I haven't seen any evidence that iterators where
> used in STL to support arrays. Yes, it supports arrays,
> too, but that's just one of the many things that the
> iterator abstraction supports.

Yes, but the world didn't start with STL. I think I remember Andrew Koenig pushing for iterators as an abstraction very early on specifically because the abstraction is already there in C++. Iterators are, essentially, pointers. Now whether it was his idea or Stepanov's (or someone else's) originally, I don't know, but it was recognized that iterators were a very C++ thing to do; they fit perfectly in the language.

Now, on Java. I think I see why the library designers used Collection more than iterator, and I think it has to do with simplicity. It is simply easier for new programmers to "get" collections. They use the metaphor of a "bag": put things in and get things out. Iterators are more versatile, but as I said, they are essentially pointers. They are very easy for everyone on Artima to understand, but they could introduce a level of indirection in APIs (no pun intended) that would make the them a little less approachable.

The people designing the Java libraries have done this sort of thing before. Look at UnmodifiableList in the Collections library.. it isn't Liskov compliant, but the alternative, having separate interfaces for mutating and accessing operations on collections, would have cluttered the library and made it less approachable.

Vincent O'Sullivan

Posts: 724
Nickname: vincent
Registered: Nov, 2002

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 15, 2005 7:41 AM

> Hm, I can't see why a collection should be simpler than an
> iterator (rather the other way around).

As technical concepts, they are probably similar in complexity. However, beyond Java you'll find the word collection in all English dictionaries but not the word iterator. Everyone knows what a collection is before they come to Java but an iterator is a new concept that has to be learnt. Every new concept that has to be learnt, in order to use a programming language, is a barrier to new users of that language. One of Java's strengths was that it was a language that had a good balance of complexity and ease of use (excluding its UI capabilities, of course). That's why I say that a collection is simpler than an iterator.

V.

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 15, 2005 7:46 AM

I found this comment generally insightful, but a couple points of clarification:

> Not to mention operator overloading in C++, something I
> find particularly elegant, since you can then make
> user-defined types which behave the same way as built-in
> types (and you avoid the clumsy wrapping of built-in types
> in an object, like in Java, just to be able to use it in a
> container, and you also typically get a performance hit
> that way, from the memory management).

J2SE5 autoboxing automatically performs the wrapping for you (you may still argue that it's clumsy, however).

> The Java generics (although I understand the design
> rationale - in particular to be backwards compatible) also
> ensures that this continues - a List<int> is still just a
> List (of Object) to the bytecode...

You can't actually say List<int>, but rather List<Integer>, and the autoboxing performs the conversions back and forth to int.

> Remember that when you compare designs, complexity by
> itself is not a good measurement. Compexity compared to
> versatility, however, is a good measurement.

Nicely put.

> Before the STL, there were several collection libraries in
> C++ (similar to the Java design). However, when STL came,
> it blew all these away. That was simply because many
> people feel it's a better design (not just so that you can
> also use it for arrays).

I also think that the performance was very good for the STL; better than the alternative approaches. It seemed to take special advantage of template mechanisms.

> The essential complexity has to go
> somewhere, and if it's not in the language or library, it
> will be in the programs.

Also nicely put.

> ++my_map[key];

Always one of my favorites -- about as clean as you can hope to get, but the meaning is not obscured. Also note that this adds a new slot if the key is not there.

Howard Lovatt

Posts: 321
Nickname: hlovatt
Registered: Mar, 2003

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 17, 2005 8:35 PM

@Terje Slettebo

I think your sort example using templates shows what I mean, you gave two sort wrappers one for gneral collections and one for a list. You need a further one for arrays, arrays don't have begin and end methods. So sure the STL designers could have provided four sort methods, three of which were wrappers around the sort method they did provide. Ditto for many functions. But they already had something that was highly complex and intimidating and difficult enough to write so they presumably decided against providing the wrappers.

I don't get your assertion that STL is more powerful that the Java collections library. The two don't cover the same ground, STL in additions to collections does functional programming and supports primitives.

Functional programming in Java is the domain of third party libraries, only because in Java they have decided not to provide it - not because you can't write it.

Ditto primitives - there are third party libraries that support primitive collectionss, again it is a matter of choice as to whether to provide or not.

Sticking to collections of objects - I can't see much difference in the power of the STL and Java. The main differences between the two, in their area of common coverage, is that Java is much simpler to use.

I think you are been very hard on Java, in the light of the C++ experiance with STL they came up with an improvement. Just like, in the light of experiance with C++ in general, they came up with an improvement - Java.

Sure people who want primitive collections and functional programming have to use third party libraries. But, and this is probably the key point as to why Java is superior, the vast majority don't want these facilities and they aren't burdened with the necessary extra complexity needed to provide them.

Jeroen Wenting

Posts: 88
Nickname: jwenting
Registered: Mar, 2004

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 18, 2005 2:08 AM

By taking a Collection as an argument the method relieves the API user of having to do a lot of calls to retrieve Iterators (or Iterables) from his Collections.

If those methods would all take Iterators instead there'd be complaints (maybe even from you) that the methods are needlessly specific in the arguments that they require and therefore not generic enough.

DouglasDD

Posts: 7
Nickname: douglasdd
Registered: Oct, 2004

Re: Collection overused as an argument in Java Libraries?

Posted: Jul 26, 2005 9:18 AM

> I think that as Java, C# and (believe it or not VB) move
> to generics (with all their attendant complexities),
> people will switch to dynamically typed languages in
> droves.

At the risk of sounding like a snob, the above quote reminds me of the three buckets that I tend to sort C++ programmers into (in increasing order of skill):
- coders who don't grok pointers
- coders who grok pointers
- coders who grok pointer and STL

DouglasDD

Posts: 7
Nickname: douglasdd
Registered: Oct, 2004

Iterators vs. Collections

Posted: Jul 26, 2005 9:35 AM

In C the collection type available in the base language is the array.

Since C++ builds on C, the STL designers chose to design a collection library that was backwards compatible with plain C-arrays. Although vector<T> could have behavior, C-style arrays cannot. Therefore the only option available to STL was the iterator which was downwards compatible with pointers into C-arrays.

It has always seemed to me that the interfaces and algorithms designed to use iterators instead of collections, were forced into that (somewhat awkward) position by the burden of backwards compatibility with C-arrays.

...And iterator-based APIs are one thing that I definitely don't miss about C++ when I'm working in Java (or Perl, or ...)!!

Chris Chen

Posts: 3
Nickname: oistrakh
Registered: Feb, 2006

Re: Collection overused as an argument in Java Libraries?

Posted: Feb 6, 2006 4:10 PM

It always amazes me how few people understand what templates are for in C++. I wonder how many copies Austern has sold of "Generic Programming and the STL". Judging from people's comments about templates on the net, about five.

Consider the following example: I'm writing a piece of code that will save a list of Objects to the database. This might be the method signature:

public void saveObjects ( List objectList );

That's fine and dandy ... except what if the caller of this piece of code actually has the list of objects stored in a HashMap? The caller might call HashMap.valueSet( ) to get those objects out for saving, but then he/she would be getting back a java.util.Set, NOT a java.util.List. And due to the sheer brilliance of the well-thought-out Java Collections API, the two are incompatible type-wise, even though conceptually they are the same. The caller could instantiate a brand new List, and copy the Set into the List, but that's a pretty expensive cost to pay just so that the caller can use my API. So then I'd want to add another method in my API:

public void saveObjects ( List objectList );
public void saveObjects ( Set objectSet );

Well, now I've got two methods to do the same thing, we wouldn't want to have two implementations of the code that basically do the same thing ... Refactor, right? And how might I refactor this? Probably by creating an internal method that takes an Iterator as its parameter and forwarding all the other calls to it!!

public void saveObjects ( List objectList ) {
saveObjects( objectList.iterator( ) );
}
public void saveObjects ( Set objectList ) {
saveObjects( objectList.iterator( ) );
}
protected void saveObjects ( Iterator objects ) {
... // Save to DB
}

So why not just have Iterator in the interface in the first place??? It sure looks a lot cleaner!

public void saveObjects ( Iterator objects );

The more underlying concept that I think this blog entry is getting at is the fact that for the writer of the saveObjects() method, the only thing he/she cares about is obtaining an iteration over a group of objects. Whether that group of objects is a List or Set or some other not-yet-defined collection type is irrelevant. As the writer of the saveObjects() method, he/she wants to focus just on the algorithm that's required, not in cluttering the API with useless conversions. So an Iterator is all that the API writer should want as the input parameter. This is what the STL is getting at in C++ when it uses iterators everywhere, and this is what Sun's engineers could have done, with very little cost. But it's not "Java-like" to think about writing algorithms with generic types sent in, that's more of a C++ mentality, at least for now.

But Bruce, there is a problem with putting Iterators everywhere, which they may have fixed by now. As of JDK 1.4.x, most of the Iterator implementations in the Collections framework were not Serializable, even though the individual Collection implementations (List, Set, etc) are. So, for example, if you are writing an EJB (*shudder*) or other remote API, you cannot accept an Iterator because no one using the Sun implementations of the Collection objects will actually be able to send one to you. Very silly, hopefully they've fixed this.

Flat View: This topic has 24 replies on 2 pages [ « | 1 2 ]

Previous Topic

Next Topic