Summary
In an article introducing Ruby on Rails' Active Record, Bruce Tate suggests that Java could enjoy some of the benefits of Rails by taking a wrapping rather than a mapping approach to persistence. I think this misses the point. What Rails really demonstrates is the benefit of code generation.
Bruce Tate has written an article, "Crossing borders: Exploring Active Record," published on IBM DeveloperWorks (see Resources), which introduces Active Record, the persistence layer of Ruby on Rails. In this article Tate compares Active Record's approach to Hibernate and JDO's approach by distinguishing "wrapping" from "mapping." He says:
In Hibernate, you'd usually begin development by working on your Java objects because Hibernate is a mapping framework. The object model becomes the center of your Hibernate universe. Active Record is a wrapping framework, so you start by creating a database table. The relational schema is the center of your Active Record universe.
Tate then provides a nice example introducing Active Record, and suggests:
The Java platform already boasts state-of-the-art mapping frameworks, but I now believe that it needs a groundbreaking wrapping framework. Active Record relies on language capabilities to extend Rails classes on the fly. A Java framework could possibly simulate some of what Active Record offers, but creating something like Active Record would be challenging, possibly breaking three existing Java conventions:
A persistence solution should work only on a Java POJO (plain old Java object). First and foremost, it would be difficult to create properties based on the contents of a database. A domain object might have a different API. Instead of calling person.get_name to set a property, you might use person.get(name) instead. At the cost of static type checking, you'd get a class built of metadata driven from a database.
A persistence solution should express configuration in XML or annotations. Rails bucks this trend through forcing naming conventions with meaningful defaults, saving the user an incredible amount of repetition. The cost is not great because you can override defaults as needed with additional configuration code. Java frameworks could easily adopt the Rails convention-over-configuration paradigm.
Schema migrations should be driven from the persistent domain model. Rails bucks this convention with migrations. The core benefit is the migration of both data and schema. Migrations also allow Rails to break the dependence on a relational database vendor. And the Rails strategy decouples the persistence strategy from the issue of schema migrations.
In each of these cases, Rails breaks long-standing conventions that Java framework designers have often held as sacred. Rails starts with a working schema and reflects on the schema to construct a model object. A Java wrapping framework might not take the same approach. Instead, to take advantage of Java's support for static typing (and the advantages of tools that recognize those types and provide features such as code completion), a Java framework would start with a working model and use Java's reflection and the excellent JDBC API to dynamically force that model out to the database.
The Real Lesson of Rails
My observation is that the main technique Rails uses to improve developer productivity is code generation, even though Ruby's dynamic nature makes the code generation less obvious. The technique is also called metaprogramming, which simply means writing programs that write programs. Whatever you call it, code generation is an old technique, and one I've used several times throughout my career to improve productivity.
For example, back in the late 1980s I was working on a project in C that used a proprietary API to send SQL to an Informix database. There was one C file that we used as a layer between our application and the database. That C file defined C structures for each table, and functions that used SQL to store and retrieve those structures to and from the database. We updated our database schema every time we did a new release, which was around twice a year. So twice a year I found myself editing the SQL schema file that created the tables and also editing that C file that served as the layer. It dawned on me that all the information needed to generate the C file was contained in the SQL schema file, so I wrote a Yacc/Lex program that parsed the SQL schema file and generated the C file (i.e., it generated the database layer for our application). We used this generator again and again over subsequent years, and in retrospect it was a good investment. It saved us time, because it made the changes to the C file, and once I got the bugs out of the generator, the generator never made a mistake.
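The idea is simple enough to sketch. Here, for illustration only (this is not the original Yacc/Lex program, and the DDL handling is deliberately naive), is a toy Java program that reads a simplified CREATE TABLE statement and prints a corresponding C struct:

    import java.util.regex.*;

    // Toy illustration of schema-driven code generation: parse a simplified
    // CREATE TABLE statement and emit a C struct declaration for it.
    public class StructGenerator {
        public static void main(String[] args) {
            String ddl = "CREATE TABLE person (id INTEGER, name CHAR(40), age INTEGER)";

            Matcher table = Pattern.compile("CREATE TABLE (\\w+) \\((.*)\\)").matcher(ddl);
            if (!table.matches()) throw new IllegalArgumentException("unrecognized DDL");

            StringBuilder out = new StringBuilder("struct " + table.group(1) + " {\n");
            for (String column : table.group(2).split(",")) {
                String[] parts = column.trim().split("\\s+", 2);
                String name = parts[0];
                String type = parts[1];
                if (type.startsWith("CHAR")) {
                    String len = type.replaceAll("\\D+", ""); // CHAR(40) -> 40
                    out.append("    char " + name + "[" + len + " + 1];\n");
                } else { // treat everything else as an integer in this toy
                    out.append("    long " + name + ";\n");
                }
            }
            out.append("};\n");
            System.out.print(out);
        }
    }

A real version would also emit the functions that move each struct to and from the database, but the principle is the same: the schema is written once, and everything that can be derived from it is derived mechanically.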
That old Yacc/Lex program demonstrated what Bruce Tate would refer to as a "wrapping" approach, because the C structures were based on the database schema. However, to me what is important is simply that I'm expressing my intention in one place, and using a tool to generate other pieces of my system that can be determined from that one specification. In the case of my Yacc/Lex tool, the specification was the SQL file that contained all the create table commands that defined our database schema.
Another place to express intent is in a Domain Specific Language (DSL) or "little language." In our new architecture at Artima, for example, as one step of our build we generate Java code from little programs we write in DSLs we created using JavaCC. We generate major portions of our controllers and entity layer that way. It minimizes the amount of code we write, because we express our controllers and entities in a concise DSL, and once we get the generator working, it never makes a mistake when writing the Java code. In the case of entities, we use Hibernate to do O/R mapping. Our entity generator creates POJOs, manager classes that have CRUD and other persistence methods, Hibernate XML mapping files, database triggers, and some SQL that generates database sequences. We use Hibernate's SchemaUpdate tool to actually synchronize the database so it matches the schema we indirectly specify in our DSL scripts.
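To give a feel for it, here is a hypothetical example of the kind of input such a generator works from and a fragment of what it might emit; the DSL syntax and the property names are invented for illustration, not our actual scripts:

    // A hypothetical entity declaration in a DSL (invented syntax):
    //
    //   entity Email {
    //       address:  String
    //       verified: boolean
    //   }
    //
    // From a declaration like that, the generator might emit a POJO such as:
    public class Email {

        private String address;
        private boolean verified;

        /** Gets this Email's address property. */
        public String getAddress() { return address; }

        /** Sets this Email's address property. */
        public void setAddress(String address) { this.address = address; }

        /** Gets this Email's verified property. */
        public boolean isVerified() { return verified; }

        /** Sets this Email's verified property. */
        public void setVerified(boolean verified) { this.verified = verified; }
    }
    // ...along with a manager class holding the CRUD methods, the Hibernate
    // mapping file, and the SQL for the sequences and triggers the entity needs.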
Static versus Dynamic Code Generation
Ruby's dynamic features make Active Record look a bit different from what I've done in the past with code generation in C, C++, and Java. In Rails, code is effectively generated at runtime rather than at pre-compile time. One thing we do in our controller generator, for example, is pull out the request parameters and put them in instance variables. This saves us time because we never have to write code to extract the parameters; they are already in instance variables that we can just use from our controllers. Rails does a similar thing by adding instance variables to the controller dynamically and initializing them with the parameters' values; it does this to each controller object it creates to handle a request, after the request comes in.
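For instance, the generated portion of a controller might look something like this simplified sketch (the class and parameter names are invented for illustration):

    import javax.servlet.http.HttpServletRequest;

    // Sketch of statically generated controller code: for each request
    // parameter declared for this controller, the generator emits a field
    // and one line that copies the parameter into it, so the hand-written
    // action code can simply use the fields.
    public abstract class GeneratedSubscribeController {

        protected String email;   // request parameter "email"
        protected String topicId; // request parameter "topicId"

        /** Copies the request parameters into instance variables. */
        protected void extractParameters(HttpServletRequest request) {
            email = request.getParameter("email");
            topicId = request.getParameter("topicId");
        }
    }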
One difference between static and dynamic code generation for the developer is that in the Rails case, you don't have to wait for static code generation before trying a change, and you don't have to look at the generated code. In my opinion, Rails applications feel like they require so much less code in great part because the generated code is hidden (but also because Ruby is concise in general). In our case at Artima, we see lots of generated .java files lying around in the midst of our project. On the other hand, we generate nice JavaDoc comments with our generated code, and so we get nice API documentation to look at if we want. If there's a problem or a question, we can go look at the generated code too. In Rails, it is just kind of magic. That makes it feel very lightweight, but if you ever encounter a problem with that magic, it might be more painful to solve.
The other main difference for the developer is that in Rails there is no code generation step in the build cycle. This could be a two-edged sword, though. When doing the example programs in Rails at least, everything is quite fast: I make a change, go to the browser, and immediately try it. (Doing simple examples is as far as I've gotten with Rails.) But as the number of database tables grows, there may be some perceptible lag each iteration while all the dynamic code generation takes place.
To me the important lesson for the Java community to take away from Rails is that you should consider using code generation where appropriate. I think doing code generation is something you have to be careful about, however. If you make a code generator, then you have to pay time up front building the generator, and you have to support the generator thereafter. It makes sense when you're going to get a good payback, which means you'll be using it regularly. It also helps if you can use an existing tool to generate code. For example, you could use Ruby on Rails. Or, you could use existing Hibernate tools to reverse engineer mapping files and POJOs from existing database schemas.
A database layer is often a good candidate for code generation, because these layers need to be updated whenever the database schema changes. If you only need to do something once, then writing a code generator is probably a bad idea. And even if you think you'll be doing this over and over, until you've done it several times by hand you probably don't know enough to automate. In our current architecture effort, we probably built about a dozen controllers by hand, and a dozen entities, before we felt we knew enough to automate.
I think programmers often don't like code generators that come from the outside, because like any framework they will often only take you 90% of the way you need to go. After that you start fighting the framework, which in a code generator often means you want to tweak the generated code. But that usually defeats the purpose of the code generator, which is to give you huge leverage. An exception is something like the scaffold generation in Rails, which is intended to be just a quick start that you edit and carry forward. Our in-house code generators are the kind where we aren't supposed to touch the generated code, and we solve the 90% problem because when we need to change the generated code, we change the code generator (since we wrote it, we can change it).
I think dynamic languages like Ruby and Python, because they make metaprogramming easy, push people in this direction of writing software that creates software, and that's a great thing. However, I think that with tools like JavaCC and ANTLR, it is relatively easy to define a code generator for Java. Authors of Java frameworks could use this technique, or you can use it yourself on individual projects. If you find yourself frustrated doing repetitive programming tasks in Java, the problem may not be with Java but with how you are using it. That frustration could be a signal that you should consider adding some automation via code generation.
If you can write code which takes some data and generates other code which then does the job, then you should absolutely be able to write code which takes the data and does the job directly. Why take the extra code generation step? http://talkinghub.com/forum/message/417.html#message
Great article. You actually turned me around on code generation.
You failed to mention one downside that bugs me, though -- since the source is generated by the build process, generated source is always "a version behind," so to speak. If I open up a model class and change one of the annotations, I've got to remember to re-run the build immediately, or risk confusing myself by later looking at an out-of-date Hibernate mapping file. (That, or just train myself to avoid looking at the generated stuff.)
Despite that, you managed to sell me that it's worth the learning curve to figure out XDoclet. :)
A technique I've seen a few times, first in Spec Bowers' <a href="http://members.aol.com/bowersdev/">AppMaker</a> and more recently in Jakarta <a href="http://jakarta.apache.org/turbine/">Turbine</a>, is to have the generator generate two classes: a base class that it is free to re-generate at any time, and a subclass of the base which is yours to mess with however you like. I think it's a critically important technique.
I don't think code-generator-generated comments are worth the pixels they occupy; they tend to say, in so many words: getFred is a getter for the field fred; it gets the value of the field fred. I'd prefer no comments to those.
This article, and Tate's comments, reminded me of some amazing extensions to Fox Software's FoxBase app written by a developer working for the JPL. As it turns out, this wasn't some bored scientist with some free time on his hands, but rather a sharp programmer, Ken Levy, who had previously done some work on embedded applications and also dBase.
These languages, and languages like PowerBuilder, Gupta SQL, etc., were called "4GLs," i.e., 4th-generation languages; in other words, not as general purpose as something like C. You wouldn't write an operating system in a 4GL, for example.
In FoxBase, designing a screen was a two-part process: first you used a screen editor, which would write all the coordinates and other details to a database file, then you ran another program to generate the screen program, and then that would be compiled. What was nice was that everything was written to standard .dbf files that FoxBase understood, and the generated code was in FoxBase, i.e., not some closed proprietary format like a Word .doc (more like a .PDF).
Now what Levy did was write a wrapper program, so that you could modify how the code was generated, but without rewriting the screen generator entirely.
He also built it with hooks so you could call other "driver" programs, which is where the amazing part came in - some folks wrote a 3D driver and a Tab driver, such that your windows could now achieve that chiseled 3D look, and tab pages, long before those features were in the commercial version of Fox.
Sure, nothing that hasn't been done before in other languages, but cool stuff.
Ken Levy - Jet Propulsion Laboratory
Extending The Screen Builder With GENSCRNX
Led by the author of GENSCRNX for FoxPro, this session will discuss the architecture and features of GENSCRNX, which is a pre- and post-processor extension to FoxPro's GENSCRN. GENSCRNX is a public domain program that allows extended control over the code generated from FoxPro's Screen Builder without modifying GENSCRN. The session will discuss how GENSCRNX extends Screen Builder development using its own set of built-in directives; how to create complete 3D-looking screens in FoxPro for Windows with just a few directives; and how to create complete drag/drop interfaces with pictures and text using Visual Basic-like syntax that is fully cross-platform ready (DOS/Windows/Mac).
> If you can write code which takes some data and generates other code
> which then does the job, then you should absolutely be able to write
> code which takes the data and does the job directly. Why take the
> extra code generation step?

I'm not sure I understand your question, but it is true that we could write by hand all the code we currently generate. In fact we wrote a bunch of it by hand before automating. The reason you would want to automate is the same as the reason you might want to replace human workers fitting doors onto cars on an assembly line with robots. The robots can do a more consistent job, work faster with fewer mistakes, and cost less over the long term. By writing a tool that generates code we would otherwise write by hand, in the future we can move much faster, handle more complexity (because the code we write is much smaller than the code we generate), and be confident the generated code works (because once we debug the generator, the robot, it never makes a coding mistake).
That's the benefit. The cost is that you have to pay to build the generator up front, and that slows you down in the short term even if it speeds you up in the long term. Also, it will only speed you up in the long term if you actually need to use the generator a lot. And you have to train people on the DSL and support the tool.
> You failed to mention one downside that bugs me, though -- since the
> source is generated by the build process, generated source is always
> "a version behind," so to speak. If I open up a model class and change
> one of the annotations, I've got to remember to re-run the build
> immediately, or risk confusing myself by later looking at an
> out-of-date Hibernate mapping file. (That, or just train myself to
> avoid looking at the generated stuff.)

Yes, we don't touch the generated code, because it isn't "source" code anymore. If we need to change the generated code, we have to remember to go to the DSL script. That's not usually a problem. But if the generator doesn't support the change we need to make, then we have to add support for it in the generator itself. Our current generators only support the functionality we need right now, so we fully expect to be adding new features over time as they are needed.
> A technique I've seen a few times, first in Spec Bowers'
> <a href="http://members.aol.com/bowersdev/">AppMaker</a> and more
> recently in Jakarta <a href="http://jakarta.apache.org/turbine/">Turbine</a>,
> is to have the generator generate two classes: a base class that it is
> free to re-generate at any time, and a subclass of the base which is
> yours to mess with however you like. I think it's a critically
> important technique.

We use that very technique in a couple of places as a way, in Java, to generate a class that is customizable. For example, our controller generator generates GeneratedAccountController, and we subclass that with AccountController. That works fine; however, it is really an artificial split between the generated superclass and the hand-written subclass. What we would conceptually like to generate is parts of AccountController itself.
I believe C# has a feature called partial classes that allows a class to be defined in multiple files, which would allow us to generate part of a class in one file and write the rest by hand in a different file. In Ruby, you can add "code" to an existing instantiated object dynamically at runtime, which means you can really write some of the object by hand (in code in a file) and add more to it at runtime.
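In code, the split looks roughly like this (a simplified sketch, not the actual generated classes):

    // GeneratedAccountController.java -- owned by the generator, which is
    // free to overwrite this file on every build, so we never edit it by hand.
    public abstract class GeneratedAccountController {

        /** Generated action: the boilerplate version of show. */
        public void show() {
            // ...generated code...
        }
    }

    // AccountController.java -- the hand-written subclass; the generator
    // never touches this file, so customizations survive regeneration.
    public class AccountController extends GeneratedAccountController {

        @Override
        public void show() {
            // custom behavior, then fall back on the generated version
            super.show();
        }
    }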
> I don't think code-generator-generated comments are worth the pixels
> they occupy; they tend to say, in so many words: getFred is a getter
> for the field fred; it gets the value of the field fred. I'd prefer no
> comments to those.
Yes, in our case, the getter for isVerified says:
Gets this Email's verified property.
So it doesn't add too much information, but it looks nice when you look at the JavaDoc documentation, which I like to use as a design visualization tool. The documentation for other methods whose purpose is not as obvious is perhaps more useful. For example, we have a makeTransientCopy method on our POJOs that has documentation like this:
Makes a transient copy of this Email entity. The transient copy will have null id, version, and entity database date properties. Given that this entity is versionable, it will also have a null creation date, modified date, actor ID, and ip address properties. In addition, its user property will be null. The subscriptions set property of the returned transient copy will be populated with transient copies of the Subscription entities contained in this object's subscriptions set. The other properties in the returned transient copy will have the same values as the corresponding properties in this object, the entity on which this method is invoked.
My goal was that our generators would generate APIs that we can then script against, and to me having good JavaDoc comments is an aspect of quality code that I didn't want to sacrifice. I think the generated code will also kind of set a quality tone for the code we write by hand--we'll want our hand-written code to look good in the JavaDoc view too.
I would like to share my personal, real-life experience with the lesson of code generation and Rails. We built an open source project called GenAndRun, which implements and adopts most of the features and ideas mentioned in your article, except that we use IBatis rather than Hibernate.
But it seems that most Java developers show no interest in this approach, or are even against it. Some developers think that code generation is bad and useless, and that code generation based on table metadata is HORRIBLE.
Even though the source code generated by GenAndRun has no dependency on GenAndRun (the generated code does not use any GenAndRun API), many Java developers think it is just a buzzword and not useful.
Richard Norman argues that Active Record is the worst idea to adopt into Java.
In summary, the general opinion is that the code generation and Rails approach is a waste of time, and that the correct approach is EJB 3.0, annotations, etc.
You are one of the minority of Java programmers who can really appreciate this approach. Personally, I would like to say thank you, since you give me strength. Thanks a lot.
I think in many cases the cravings for code generation can be satisfied by creative (or forgotten) use of the language.
For example, if we want a code generator to enable us to write "article.getAuthor()", why not write something like 'article.get("author")' instead, which can be implemented at runtime and doesn't require code generation? Granted, the former is faster and a bit more readable (only a bit), but we can specifically optimize the 3% or so of calls that need it, while leaving the rest totally dynamic.
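In Java terms, that dynamic style boils down to a map-backed accessor, something like this minimal sketch:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch of the dynamic alternative: property values live in a
    // map keyed by name, so no per-field getter code ever has to be generated.
    public class Record {

        private final Map<String, Object> properties = new HashMap<String, Object>();

        public Object get(String name) {
            return properties.get(name);
        }

        public void set(String name, Object value) {
            properties.put(name, value);
        }
    }

    // Usage:
    //   article.set("author", "Bruce Tate");
    //   String author = (String) article.get("author");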
> I think in many cases the cravings for code generation can be
> satisfied by creative (or forgotten) use of the language.
>
> For example, if we want a code generator to enable us to write
> "article.getAuthor()", why not write something like
> 'article.get("author")' instead, which can be implemented at runtime
> and doesn't require code generation? Granted, the
Bill, my assumption is very similar to Sasha's thoughts. Instead of writing a tool which generates code, you can think about writing a tool which does the actual job, and this tool can be adapted to changes and produce adapted results.
In your 1980s C example, instead of writing a C code generator, I would modify the C code to enable it to take the database schema and act according to it. "All the information needed to generate the C file was contained in the SQL schema file" is just a step away from "all the information needed to work with the database was contained in the SQL schema file."
> For example, if we want a code generator to enable us to write
> "article.getAuthor()", why not write something like
> 'article.get("author")' instead, which can be implemented at runtime
> and doesn't require code generation?
In addition to the issue of readability which you mention - and which I consider more important than you evidently do - there is the matter of finding errors earlier. I don't want to stir up the whole compiled/interpreted question - I'm pretty ambivalent on it myself - but if you're using a compiled language it's nice to be able to find out before you run it that something's wrong. 'article.get("athor")' is perfectly valid until it gets to the database; 'article.getAthor()' fails at compile time with errors that a good IDE will show you as soon as you type them in.
Of course, we're talking about code generation, and the generator won't make those mistakes, but if your code generation starts to serve as a model for the "other 10%" of your code... I think it's nice to have these safeguards in place.
> > For example, if we want a code generator to enable us to write
> > "article.getAuthor()", why not write something like
> > 'article.get("author")' instead, which can be implemented at runtime
> > and doesn't require code generation?
The main reason, as you mention, is readability, i.e., usability of the API. I also consider that very important. We're basically generating a clean API, with docs, etc. Generating code can mean generating an API, and it helps you tailor the generated API to your taste.
I've written code in the past that became so generic that even I had a hard time figuring out what it was doing a couple of months after I wrote it. That might be OK for a smaller system, but is a huge productivity handicap when we talk about hundreds of classes. With that amount of code, I like to have a clean API, i.e., something that makes it immediately obvious what each method does and, more important, how I can achieve a task.
> A technique I've seen a few times, first in Spec Bowers'
> <a href="http://members.aol.com/bowersdev/">AppMaker</a> and more
> recently in Jakarta <a href="http://jakarta.apache.org/turbine/">Turbine</a>,
> is to have the generator generate two classes: a base class that it is
> free to re-generate at any time, and a subclass of the base which is
> yours to mess with however you like. I think it's a critically
> important technique.
That's also something we do in the new Artima infrastructure: We generate some fairly rich superclasses, and then use subclasses in the actual code. Both our controller/MVC and data access/entity layers work this way. So while most frameworks take you from 0-90% real fast, this technique allows us to travel that last 10% via the subclassing mechanism fairly quickly also.
> Bill, my assumption is very similar to Sasha's thoughts. Instead of
> writing a tool which generates code, you can think about writing a
> tool which does the actual job, and this tool can be adapted to
> changes and produce adapted results.
>
> In your 1980s C example, instead of writing a C code generator, I
> would modify the C code to enable it to take the database schema and
> act according to it. "All the information needed to generate the C
> file was contained in the SQL schema file" is just a step away from
> "all the information needed to work with the database was contained
> in the SQL schema file."

Ah, now I understand what you meant by your earlier post. Yes, I believe that one should use code generation as a last resort. The first and preferred way to get rid of duplication and repetition should be to simply take common code and factor it into one method, or one class, and then have everything else use that. In the case of our manager classes with their CRUD methods, we could have one ueber-manager with generic CRUD methods.
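For example, a rough sketch with hypothetical signatures:

    // A hypothetical generic "ueber-manager": one class whose CRUD methods
    // accept any entity, at the cost of type safety and readability.
    public class EntityManager {
        public void save(Object entity) { /* ... */ }
        public Object get(Class entityClass, Long id) { /* ... */ return null; }
        public void delete(Object entity) { /* ... */ }
    }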
Instead what we do is generate a separate manager for each entity, which has method signatures like this:
    class UserManager {
        ...
        saveUser(User user) { ... }
    }

    class EmailManager {
        ...
        saveEmail(Email email) { ... }
    }
A few of the reasons we went with the code generation approach in this case were to get more readability (as Sasha mentioned), to leverage static type checking (as Carl Manaster mentioned), and because in many cases it was unwieldy to get all that stuff into one class.
The type checking thing is not just for finding errors early, though that's one useful thing. It also helps my refactoring IDE do refactorings, and it can help optimizers do a better job of optimizing. In cases where my refactoring IDE fails me, I can do a refactor by hand and run a compile to get a to-do list of places to fix. So I tend to use the static typing in Java as much as possible.
The unwieldy thing, though, is probably the main reason I turned to a generator. When I want to write something a certain way, because it is a good design, but it still requires a lot of repetitive coding, then instead of making the design worse by putting too much stuff into one class that is harder to use, I prefer to generate code that looks like what I would otherwise have written by hand. I.e., I still get the nice, well-factored, understandable design and code, but I don't have to write it.