This post originated from an RSS feed registered with Java Buzz
by Brian McCallister.
Original Post: Graph Manipulation vs Reporting
Feed Title: Waste of Time
Feed URL: http://kasparov.skife.org/blog/index.rss
Feed Description: A simple waste of time and weblog experiment
My last post on graph paging continued to confuse (con-fuse not boggle -- I really need a better word) two ideas somewhat. Let's see if I can do better =)
To go back to programming kindergarten: we have four typical operations on persistent state: create, read, update, and delete. Three of these typically involve working with a small object graph: create, update, and delete. They manipulate, mutate, tweak, work with, and generally implement confusing so-called business rules. The final one, read, has a big split.
The first type of read op is the presentation of small data: username, birthday, shopping cart contents. The second type of read op handles huge volumes of data. These are almost always done via named queries because they would break o/r graph mapping tools, or rather, break the JVM by generating the lovable java.lang.OutOfMemoryError if two got executed concurrently somehow.
The first type of op, the small graph op, is mostly satisfied by the current crop of o/r mapping tools. Support for the second type, the high-volume read, ranges from not too bad (OJB's persistence broker) to annoying (Hibernate's session) to not actually practicable (EJB CMP, for which they invented the "fast lane reader"). The strongest query language I know of for this is probably HQL. (Hey, I love OJB, and generally prefer it to Hibernate as it is lighter weight and lower level in my preferred form, but HQL is a very nice query language =) It isn't perfect, but it is probably the most useful one we have right now. Oddly enough, while its language constructs are optimized for writing reporting queries (lots of results), it is tied to a small-graph manipulation library (Hibernate).
So, this is completely different from the previous post. It doesn't touch on any of its ideas. The reason is that I think these two very different beasts should probably be separated, or at least handled very differently. Small graph manipulation is, I strongly suspect, much better served via a graph-paging system. Reporting is best served, I am completely convinced, by a result stream.
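To make "result stream" a bit more concrete, here is a minimal sketch in plain Java. It is not taken from OJB, Hibernate, or any other library: `ReportStream`, `Row`, `rows`, and `sumAmounts` are hypothetical names, and the row source is simulated rather than backed by a database cursor. The point is only that rows are produced one at a time and consumed immediately, so memory use does not grow with the size of the result.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration of a result stream; not a real o/r mapping API. */
final class ReportStream {

    /** One report row: plain data, no identity map, no transaction tracking. */
    static final class Row {
        final int orderId;
        final long amountCents;

        Row(int orderId, long amountCents) {
            this.orderId = orderId;
            this.amountCents = amountCents;
        }
    }

    /** Simulates a cursor over `count` rows; each row is built only when requested. */
    static Iterator<Row> rows(final int count) {
        return new Iterator<Row>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Row next() {
                if (next >= count) {
                    throw new NoSuchElementException();
                }
                Row row = new Row(next, next * 10L);
                next++;
                return row;
            }
        };
    }

    /** Consumes the stream row by row; nothing is retained after a row is read. */
    static long sumAmounts(Iterator<Row> rows) {
        long total = 0;
        while (rows.hasNext()) {
            total += rows.next().amountCents;
        }
        return total;
    }

    public static void main(String[] args) {
        // A million rows pass through without ever being held in a collection.
        System.out.println(sumAmounts(rows(1_000_000)));
    }
}
```

A graph-oriented tool would typically materialize and track every one of those rows as a managed object, which is exactly what you don't want for a report.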
The nice part here is that you should be able to re-use huge swathes of the code =)
The closest solution I have right now is probably using OJB's persistence broker (report query by iterator) for reporting and the OTM or ODMG (ick) for graph manipulation. This will change in 1.1, and already has in CVS, where full object-transaction graph manipulation is available from the PB when wanted, and you can use the same client interface for high level and low level ops :o) Using iBatis with Hibernate also seems very popular, and works pretty well: high volume reports go through iBatis, graph manipulation through Hibernate.
Speaking of large result sets, I also really want to be able to pipeline these puppies. A callback-based reader would be handy here: one that grabs non-object-transactional instances, which are recycled immediately after the callback returns to help with memory thrashing, and that can sit at the end of a pass-through coming right from a streaming result set. Luckily, I don't have to deal with multi-gigabyte result sets anymore (sometimes I miss it, though; unusual constraints are fun to work with).
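Here is a rough sketch of what such a callback-based reader could look like. Everything in it is an assumption made for illustration: `CallbackReader`, `Row`, `RowCallback`, `readAll`, and `evenIdsOnly` are made-up names, and a `String[][]` stands in for the streaming result set. It shows the two ideas from the paragraph above: a single row instance is recycled for every row, and callbacks can be chained so each row passes straight through the pipeline.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback-based reader; not part of OJB, Hibernate, or iBatis. */
final class CallbackReader {

    /** Mutable, non-transactional row holder, overwritten for every row. */
    static final class Row {
        int id;
        String name;
    }

    /** Invoked once per row; the Row is only valid until the call returns. */
    interface RowCallback {
        void onRow(Row row);
    }

    /** Pushes every source row through the callback, reusing one Row instance. */
    static int readAll(String[][] source, RowCallback callback) {
        Row recycled = new Row();
        int count = 0;
        for (String[] raw : source) {
            recycled.id = Integer.parseInt(raw[0]);
            recycled.name = raw[1];
            callback.onRow(recycled); // callback must copy anything it wants to keep
            count++;
        }
        return count;
    }

    /** Pipeline stage: forwards only rows with an even id to the next stage. */
    static RowCallback evenIdsOnly(final RowCallback downstream) {
        return row -> {
            if (row.id % 2 == 0) {
                downstream.onRow(row);
            }
        };
    }

    public static void main(String[] args) {
        String[][] data = { {"1", "ann"}, {"2", "bob"}, {"4", "cy"} };
        List<String> names = new ArrayList<>();
        // The final stage copies the value out of the recycled row.
        int read = readAll(data, evenIdsOnly(row -> names.add(row.name)));
        System.out.println(read + " rows read, kept " + names);
    }
}
```

The catch with recycling is visible in the last stage: anything the callback wants to keep has to be copied out before it returns, because the same `Row` is about to be overwritten.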