The Java Collections framework is one of the most frequently used libraries in Java applications. Treating the JDK Collections classes as just "a black box," however, is a mistake, according to Laurence Vanhelsuwé, founder of SoftwarePearls and lead developer of CollectionSpy, a new Java profiler focused on detecting collections-related programming errors.
In this interview with Artima, Vanhelsuwé talks about the kinds of problems developers often encounter with Java collections, and how CollectionSpy helps overcome those problems:
Laurence Vanhelsuwé:
The idea for CollectionSpy came from the recurring observation that a lot of enterprise Java developers struggle with Java collection containers. That may not be a problem for top programmers, but your average Java programmer does have issues with Hashtables, LinkedLists, ConcurrentHashMaps, and other containers. I have observed these problems in real-life software teams, and that led me to create CollectionSpy, a new kind of profiler, to help address those problems.
A lot of people are not aware, for example, of how careful you need to be when overriding equals() or hashCode() in a class. Do that the wrong way (and statistics show that the majority of developers do) and containers can misbehave. If you use an object with a hashCode() implementation that depends on mutable object state, and you use such objects as Map keys, you end up with mutable keys. I've seen a lot of mutable classes being used as keys. Our tool will instantly detect if a container is corrupted because one of its keys or elements gets mutated.
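The mutable-key problem described above can be reproduced in a few lines. The following is a minimal sketch, assuming a hypothetical Point class (not from the interview) whose hashCode() depends on fields that can change after the object has been used as a key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical key class: equals() and hashCode() both depend on mutable fields.
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Inserts an entry, mutates its key, then builds a map the caller can probe.
    static Map<Point, String> mapWithMutatedKey(Point key) {
        Map<Point, String> map = new HashMap<>();
        map.put(key, "value");
        key.x = 99; // the entry stays filed under the hash computed from the old state
        return map;
    }

    public static void main(String[] args) {
        Point key = new Point(1, 2);
        Map<Point, String> map = mapWithMutatedKey(key);

        System.out.println(map.containsKey(key));             // false: hash no longer matches
        System.out.println(map.containsKey(new Point(1, 2))); // false: equals() no longer matches
        System.out.println(map.size());                       // 1: the entry is still in there
    }
}
```

The entry is neither removed nor reachable: the map still reports one element, yet no lookup by key can find it. Making key classes immutable (final fields, no setters) avoids the situation altogether.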
In general, developers may think that a JDK container acts like a perfect, benign black box. That would lead one to think that container performance can't degrade. But, in fact, it can. For instance, when writing hashCode() functions, you need to be careful that your hash values are distributed properly. If that's not done, you can have a properly functioning Hashtable or HashSet, but the performance will be atrocious: You're going to leak precious performance, while the problem may stay well below the radar as far as top bottlenecks are concerned.
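To make the distribution point concrete, here is a small sketch with two hypothetical key classes, BadKey and GoodKey (both invented for this example). Both satisfy the equals()/hashCode() contract, but BadKey returns the same hash code for every instance, so every key collides:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSpreadDemo {
    // Hypothetical key with a legal but useless hashCode(): all instances collide.
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Hypothetical key whose hash code varies with its state.
    static final class GoodKey {
        final int id;

        GoodKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Builds n distinct keys of the requested kind.
    static List<Object> makeKeys(int n, boolean good) {
        List<Object> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(good ? new GoodKey(i) : new BadKey(i));
        }
        return keys;
    }

    // Counts how many different hash codes the keys produce.
    static int distinctHashCodes(List<?> keys) {
        Set<Integer> hashes = new HashSet<>();
        for (Object key : keys) {
            hashes.add(key.hashCode());
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        List<Object> bad = makeKeys(1000, false);
        List<Object> good = makeKeys(1000, true);

        System.out.println(distinctHashCodes(bad));   // 1
        System.out.println(distinctHashCodes(good));  // 1000
        System.out.println(new HashSet<>(bad).size()); // 1000: still correct, just slow
    }
}
```

The HashSet built from the BadKey instances still holds the right 1000 elements, which is why this kind of problem is easy to miss: nothing breaks, but every operation has to search a single overcrowded bucket.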
You can, in fact, have all sorts of unexpected behaviors in collections. The worst case I've seen was simply an infinite loop in HashMap.get() that resulted from corrupting the internal structures of the Map, which was, in turn, caused by multi-threaded access to the Map.
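Corruption of this kind depends on thread timing and is hard to reproduce on demand, so the sketch below shows one common remedy instead. This is an assumption on our part, not something the interview prescribes: sharing a ConcurrentHashMap rather than a plain HashMap when several threads update the same map. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedMapDemo {
    // Several threads increment one counter in a shared map.
    static int countHits(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // merge() is performed atomically on a ConcurrentHashMap
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 10_000)); // 40000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code has no defined outcome: updates can be lost and the map's internal structure can be damaged.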
Another classic collections-related problem is that you're not finding in a collection the object you'd expect. That sometimes happens because the content of the collection changed in some unexpected way. The question, then, is: How did the contents of that collection get to be that way? CollectionSpy addresses all those debugging nightmares.
Unlike conventional profilers, we restrict the kinds of objects we look at to collection framework containers. All the [JDK] containers are first-class citizens in CollectionSpy, and we track all sorts of information for each container. For example, we track the threads that access or use the container. If you have a non-thread-safe container, such as HashMap, being accessed by several threads, that's an indication that we need to take a closer look at how and when that container is accessed. CollectionSpy would flag that container in one of its analysis rules. We also track all code that accesses any container: You can view the stack trace for any access to any container, so you can always find out the root cause of unexpected or problematic accesses.
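The interview does not describe how CollectionSpy gathers this information. Purely as an illustration of the idea of tracking which threads touch a container, here is a hand-written sketch of a hypothetical wrapper, ThreadTrackingMap, that records the name of every thread calling put() or get() on a HashMap:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper for illustration only; not CollectionSpy's mechanism.
class ThreadTrackingMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    // Thread-safe set, so the bookkeeping itself cannot be corrupted.
    private final Set<String> accessingThreads = ConcurrentHashMap.newKeySet();

    private void recordAccess() {
        accessingThreads.add(Thread.currentThread().getName());
    }

    V put(K key, V value) {
        recordAccess();
        return delegate.put(key, value);
    }

    V get(K key) {
        recordAccess();
        return delegate.get(key);
    }

    // True when more than one thread has used this non-thread-safe map.
    boolean touchedByMultipleThreads() {
        return accessingThreads.size() > 1;
    }

    // Writes from a worker thread, then reads from the calling thread.
    static boolean demo() {
        ThreadTrackingMap<String, Integer> map = new ThreadTrackingMap<>();
        Thread worker = new Thread(() -> map.put("a", 1), "tracking-demo-worker");
        worker.start();
        try {
            worker.join(); // accesses are sequential here, so the HashMap itself is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        map.get("a");
        return map.touchedByMultipleThreads();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

A wrapper like this has to be wired in by hand at every use site, which is exactly the manual instrumentation a profiler is meant to spare you.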
CollectionSpy is a standalone tool: You just point it at your program, and it starts to produce data right away. You don't have to do any manual instrumentation or annotation: Just drag and drop your existing program onto the tool, launch the program, and CollectionSpy will start profiling. For server-based applications, it's a bit more involved, but that's also just a few steps, and we document that. CollectionSpy is the kind of power tool I've wished I had in my toolbox for years, so I'm sure it's going to save others many days of needless debugging frustration.
What kinds of Java collections-related problems do you frequently encounter in your code? Do you think profilers are an effective solution for detecting those problems?