Summary
Java applications adhering to standards rightly pride themselves as independent not only of operating system platforms, but also of specific vendor products. Yet, stored procedures, a product-specific feature of many DBMSs, play an increasingly important role in highly performant enterprise applications.
Advertisement
Incorporating OS- or vendor-specific features in a Java application has been somewhat of a taboo in the developer community. Indeed, a key motivation for most Java standards has been the desire to enable Java applications to remain independent of operating system platforms, vendors, and specific products.
Java's progress has allowed developers to maintain the balance between platform independence and performance remarkably. J2EE applications, for the most part, can be swapped between app servers with no loss of functionality.
Yet, in the case of data management, complete platform- and vendor-independence may be harder to achieve. While JDBC and XML both provide rich platform-independent data access APIs, few developers need to read and present bulk, unprocessed data. Since data is often stored in a database management system, pushing as much data-related computation into the database as possible is not only efficient but is also very convenient. But it can also make an application reliant on a specific database product.
Although SQL is standardized across most database products, most products gain their unique power from vendor specific SQL extensions and a rich stored procedure language. Vendors spend considerable resources implementing such database programming languages in a highly optimized fashion. As a result, often nothing can beat the speed and efficiency of executing the right SQL query or stored procedure inside the database.
Yet, the notion of pushing computation into the database, and retrieving only the needed amount of data, runs counter to a long-running trend in the Java community, according to Roger Bjärevall, product manager of Minq Software's DbVisualizer product, a database inspection and editing tool. "Several years ago, there was a trend to try to move code into the middle tier, and perform data processing there," according to Bjärevall, who worked for Sun prior to developing DbVisualizer. "I don't see that that has happened. Lots of [DbVisualizer] users have had a need for stored procedures. This is a strong requirement in a product such as ours." Bjärevall sees a wide use of stored procedures among his enterprise customers, and DbVisualizer 5.0, which was released last week, features a stored-procedure editor specific to the Oracle database.
Still, many developers are still hesitant to take full advantage of SQL, let alone database product-specific extensions. In an earlier Artima Developer interview, Gavin King, founder of the Hibernate project, lamented that few developers take full advantage of what a relational database has to offer, and that behind a Hibernate application is a database, "in fact, a relational database," noted Gavin.
Minq's Bjärevall doesn't see conflict between JDBC's goal of platform independence and the use of stored procedures in a Java application. "JDBC is a really simple API used mostly to set up communication with a database, and is really not suited for anything beyond that. JDBC can be used to execute SQL and even stored procedures in the database, but JDBC doesn't try to supplant these facilities."
So how much data processing would you entrust to stored procedures, and how much would you perform in Java application logic code? If you use stored procedures, to what extend does your application depend on a specific database product?
It is quite easy to see how this poses a problem for the Java community, where you are trying to develop applications which are platform independent.
To some degree, your independence of platform is always going to be governed by the application and the data involved. If the application was relatively small, then you could achieve it through using a truely independent storage mechanism such as XML.
Unfortunately, in the majority of cases that isn't going to satisfy the requirements - so you'll always need to start looking for a RDBMS. As soon as that happens, the complexity of the problem increases and you are now bound by what platforms the RDBMS runs on. You can increase platform independence by choosing a database which might implement less features (ie, PL/SQL) or take the other path and choose a database which has thousands of features and less independence.
At the end of the day, the choice is going to fall squarely on the outcome of a simple question:
Does my application have a requirement to run on various, potentially incompatible databases?
If the answer to that question is a resounding yes, then you need to choose a database which provides the best multi-platform performance for your application. If the answer is a no, then you pick a database which might meet the same or very different requirements.
Alistair said, "then you could achieve it [data storage] through using a truely independent storage mechanism such as XML."
I believe that XML is a data transfer mechanism (albeit very verbose), not a data storage mechanism.
Using XML for data storage leads to duplication of information through repeated use of tags, problems of data integrity, and an emphasis on the physical aspects of data storage rather than the logical aspects (that is, you focus on where and how the data is stored rather than on what you want to do with it).
As such, I believe that it is an abomination to use XML for data storage.
I agree with Gavin's comment but let me count the ways I don't like stored procedures:
1) There is no standard language for stored procs - they are not database independent.
2) They are difficult to debug.
3) Where does the transaction management reside?
4) Granted, in smaller systems they can be useful. But never ever put business logic in a sproc.
5) Do you need to scale? If you have enterprise class applications, you'll do better by replicating your code across server/clusters/threads than having a number of instances hammering a stored procedure.
Basically the problem is not only with Java, but with other programming languages, too: using stored procedures may be proved to be faster than manipulating the data with a programming language.
The real reason behind the problem is the difference between DB systems and programming languages: although a DB system is a programming language itself, its datatypes are different than that of the language, and its data structures do not usually map directly to the data structures of the said language. In other words, the problem is due to having a separation between the data and the code.
Until someone comes up with a DB Java (i.e. Java + DB intergrated), the solution is to write complex processing routines in stored procedures, and leave the simplest ones to the programming language.
The two questions posted early are valid but a third question is missing.
Will it be required to change the application language during the lifetime of the application?[/1]
The database should not be used as a black box. All DBMS platforms that I have had the privilege to work with have rich functionality built in that most developers would find helpful in increasing productivity and performance. Adding additional hardware in the middle tier will not help you scale if the method in which you access data is expensive to begin with. All that approach gets you is a more expensive implementation of your application.
There are definitely times when you should use stored procedures because they are a signficant performance win.
But what about the case where the DBA dictates that all database accesses must go through stored procedures? You would end up with lots of simple stored procedures that consist of a single select or validation logic+insert/update. I gather that that some DBAs consider that to be a best practice.
My concern here is that this approach severely limits what you can do in the Java application without necessarly improving performance. For instance, stored procedures can make it difficult to use an object/relational mapping framework. This in turn makes it difficult to use a domain model, which is often the best way to implement complex business logic.
This is a pattern that I've seen many times and frankly I understand the DBA's frustration when poorly tested applications mess up 'their' data and the VP wants it fixed up over the weekend or else :)
Hopefully, with Agile/TDD aproaches to app construction DBA's can be convinced to join the dev team and make consensus decisions based on a common understanding of the merits and risks of each approach.
I tend to agree with Daniel and Alistair. To my mind the underlying problem boils down to where to solve what. Let's face it, the world we live in is nowhere near pure Java or any other platform -- even Windows. And, fortunately, it never will be.
Many developers seem unwilling to leave the OO domain and let stored procedures handle set-oriented problems. That, IMHO, is due to the fact that most attempts for a solution are made within the confines of the technical context the developer is familiar with, whether that's an OO domain like Java/C# or a relational database.
SPs are actually easy to develop and not very hard to debug because of their script-like nature. Even though there is a system barrier, i.e. no IDE integration. As Daniel pointed out, the power of SQL database does indeed increase productivity.
The issue about platform independence seems mute in an enterprise setting. Those systems are always heterogenous and full of legacy complications. I have never seen a customer switch databases or programming platforms that did not involve a large migration project accompanying this step.
Scalability problems are, again, from my experience, not solvable outside the database -- Daniel put this very well. That is, unless we are talking about distributed database clusters where the middleware needs to do the load-balancing. Shouldn't those systems utilize all the power of the relational database they can get? And doesn't that imply heavy optimization everywhere including the database?
Let's not forget that SP execution plans are pre-compiled (at least in Oracle and MS SQL Server) and the way table joins are written can be based on the relational structure which does not necessarily (mostly?) mirror the OO model.
> I agree with Eric. I try to avoid stored procedures up to > the point when the pure Java app does not scale.
We actually did a bit of early optimization with stored procedures. Maybe it was premature, maybe prudent. Either way, we did it before we had evidence that a pure Java approach wouldn't scale.
The situation was that we want to record a history record in a separate "history" table for each update to the main table. To do this we felt the need to not trust what Hibernate says was the previous version in an Interceptor, both because we just don't feel we understand it, and because we're using optimistic locking. We just didn't trust that data as much as we trusted the data in the database, which is "the truth" in our system. So we felt the need to grab the existing record (the one we're about to update) from the database and then use that data to construct a history record to turn around and immediately store. Then we do the update.
It wasn't so much that we were that worried about scalability, but it just seemed dumb to grab a record from the database only to pretty much store it back into a different table when the database itself could easily do that via a stored procedure. A stored procedure would save a round trip for the data every update of every record of every versionable entity in our system, data which could be arbitrarily large. I know that the use of the stored procedure increases the cost of change if we want to change the database, but we generate the stored procedures from our entity DSL, so if we someday want to change, we just change the code generator that generates the stored procedures and hit a button.
We actually did have a problem with this approach, though. We set it up as a trigger, to happen right before any update of a versionable table. It worked too well. In testing, it worked exactly as we expected it to. But in production, we seem to be getting a lot of spurious history records. Something is triggering the trigger besides an application update. It doesn't really hurt anything so we haven't taken the time to figure out what is causing it.
So my philosophy isn't informed by a lot of experience with stored procedures, but I just don't feel like I should be afraid to use them for things simple like creating history records.
I am not sure what Hibernate does by default, but creating a separate history object on the java side and saving it using JDBC batch mode would not incure you any additional network traffic. I would not go to a SP yet in this case.
David, I see it quite the other way around. Bills problem and its solution is a good example for a transparent way to use a stored procedure. To the Java system it is just a database call. Even though JDBC might provide the same functionality, this way you don't have to code the JDBC management. It can be done within the SP and if you need to be transactional you can easily add that as well.
> Even though > there is a system barrier, i.e. no IDE integration.
Well, both Oracle and DB2 provide IDEs for Stored Procedures and Triggers. Each is separate from a language IDE. At least so far.
And the issue of portability? SQL (ANSI versions) changes slowly, certainly versus FOTM languages. MySql's SP language (according to its website, anyway) uses DB2 syntax. It's much more likely that the programming language will go out of favor (who would build with Perl over PHP, Ruby, java?); COBOL? What to do when that Silver Bullet Language shows up? Rewrite your app? Go out of business when someone else leverages the SBL to obsolete your app?
Wouldn't it be better to leverage SQL and generate a UI using source generators? And you get to truly re-use your business logic. But that does mean taking both SQL and relational model seriously. Folks are doing this today. Makes sense to me.
Flat View: This topic has 23 replies
on 2 pages
[
12
|
»
]