Summary
In a recent blog post, Steve Yegge argues that the worst thing that can happen to a code base is increased size, and suggests that developers should consider programming languages on the basis of how concise a code base a language can facilitate.
Most enterprises have accepted large code bases as a fact of life, and believe that developer tools are the solution to living with huge amounts of code, argues Steve Yegge in a recent blog post, Code's Worst Enemy.
Yegge's point is that, on the contrary, large code bases are not inevitable, even for projects with very complex requirements. Rather, they result from the languages developers use, and even from some development techniques that are meant to reduce the complexity of large code bases.
Yegge writes that:
I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size...
The word "bloat" might be more accurate, since everyone knows that "bloat" is bad, but unfortunately most so-called experienced programmers do not know how to detect bloat, and they'll point at severely bloated code bases and claim they're skinny as a rail...
I say my opinion is hard-won because people don't really talk much about code base size; it's not widely recognized as a problem. In fact it's widely recognized as a non-problem.
Yegge notes that enterprises and developers alike have accepted large code bases as a fact of life, believing that IDEs can help them deal with large amounts of code:
People in the industry are very excited about various ideas that nominally help you deal with large code bases, such as IDEs that can manipulate code as "algebraic structures", and search indexes, and so on. These people tend to view code bases much the way construction workers view dirt: they want great big machines that can move the dirt this way and that...
My minority opinion is that a mountain of code is the worst thing that can befall a person, a team, a company. I believe that code weight wrecks projects and companies, that it forces rewrites after a certain size, and that smart teams will do everything in their power to keep their code base from becoming a mountain.
Many companies are faced with multiple million lines of code, and they view it as a simple tools issue, nothing more: lots of dirt that needs to be moved around occasionally.
Interestingly, Yegge points to several techniques aimed at dealing with large code bases as actually contributing to increased code size. Two such techniques Yegge mentions are refactoring and design patterns:
The problem with Refactoring as applied to languages like Java, and this is really quite central to my thesis today, is that Refactoring makes the code base larger. I'd estimate that fewer than 5% of the standard refactorings supported by IDEs today make the code smaller. Refactoring is like cleaning your closet without being allowed to throw anything away...
Design Patterns was a mid-1990s book that provided twenty-three fancy new boxes for organizing your closet, plus an extensibility mechanism for defining new types of boxes. It was really great for those of us who were trying to organize jam-packed closets with almost no boxes, bags, shelves or drawers. All we had to do was remodel our houses to make the closets four times bigger, and suddenly we could make them as clean as a Nordstrom merchandise rack...
A design pattern isn't a feature. A Factory isn't a feature, nor is a Delegate nor a Proxy nor a Bridge. They "enable" features in a very loose sense, by providing nice boxes to hold the features in. But boxes and bags and shelves take space. And design patterns – at least most of the patterns in the "Gang of Four" book – make code bases get bigger.
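To make the "boxes take space" point concrete, here is a minimal sketch comparing a Factory-style class hierarchy with passing a constructor directly as a first-class value. All names (`Circle`, `Square`, `ShapeFactory`, `build`) are hypothetical and chosen only for illustration; they do not come from Yegge's post.

```python
# Hypothetical illustration; none of these names appear in the original post.

class Circle:
    def __init__(self, size):
        self.size = size


class Square:
    def __init__(self, size):
        self.size = size


# Pattern-heavy version: one factory class per product type.
class ShapeFactory:
    def create(self, size):
        raise NotImplementedError


class CircleFactory(ShapeFactory):
    def create(self, size):
        return Circle(size)


class SquareFactory(ShapeFactory):
    def create(self, size):
        return Square(size)


def build_with_factory(factory, size):
    # The caller must instantiate a factory object first.
    return factory.create(size)


# First-class version: the class itself is already a callable "factory".
def build(make, size):
    return make(size)


circle_a = build_with_factory(CircleFactory(), 2)
circle_b = build(Circle, 2)
```

Both calls produce the same kind of object, but the second version needs no factory classes at all. Whether that trade is worthwhile in a given project is a judgment call; the sketch only shows where the extra lines come from.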
Finally, Yegge points out that certain languages almost always result in larger than necessary code bases. One language the blog post mentions is Java:
The core problem is duplication, and unfortunately there are patterns of duplication that cannot be eradicated from Java code. These duplication patterns are everywhere in Java; they're ubiquitous, but Java programmers quickly lose the ability to see them at all...
I'll give you the capsule synopsis, the one-sentence summary of the learnings I had from the Bad Thing that happened to me while writing my game in Java: if you begin with the assumption that you need to shrink your code base, you will eventually be forced to conclude that you cannot continue to use Java. Conversely, if you begin with the assumption that you must use Java, then you will eventually be forced to conclude that you will have millions of lines of code...
Java's game pieces don't permit code elimination because Java's static type system doesn't have any compression facilities – no macros, no lambdas, no declarative data structures, no templates, nothing that would permit the removal of the copy-and-paste duplication patterns that Java programmers think of as "inevitable boilerplate", but which are in fact easily factored out in dynamic languages.
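As a rough illustration of the copy-and-paste duplication described above, the sketch below shows two near-identical loops and then a single helper that factors them out using a lambda and a generator expression. The function names and sample data are hypothetical, and the example is written in Python rather than in any of the languages Yegge evaluated. Note that the quote predates Java 8, which later added lambdas to Java.

```python
# Hypothetical illustration; names and data are assumptions, not from the post.

def total_price_copy_paste(items):
    total = 0
    for item in items:
        if item["in_stock"]:
            total += item["price"]
    return total


def total_weight_copy_paste(items):
    # Same loop as above with one field name changed.
    total = 0
    for item in items:
        if item["in_stock"]:
            total += item["weight"]
    return total


def total_of(items, field, keep=lambda item: item["in_stock"]):
    """Sum one field over the items that pass the `keep` predicate."""
    return sum(item[field] for item in items if keep(item))


inventory = [
    {"price": 10, "weight": 2, "in_stock": True},
    {"price": 25, "weight": 5, "in_stock": False},
    {"price": 7, "weight": 1, "in_stock": True},
]

price_total = total_of(inventory, "price")
weight_total = total_of(inventory, "weight")
```

The single `total_of` helper replaces both hand-written loops, and a third or fourth variant would cost one call each rather than another copied function.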
Not surprisingly, Yegge started to look for another JVM language to use in his project:
Three years ago, I set out to figure out which JVM language would be the best code-compressing successor to Java. That took a lot longer than I expected, and the answer was far less satisfactory than I'd anticipated. Even now, three years later, the answer is still a year or two away from being really compelling...
Each of these languages [JRuby and Jython] (as does Perl 6) provides mechanisms that would permit compression of a well-engineered 500,000-line Java code base by 50% to 75%. Exactly where the dart lands (between 50% and 75%) remains to be seen, but I'm going to try it myself.
Yegge concludes his blog post by describing what language he chose and why.
Do you share Yegge's opinion that size is a code base's worst enemy?
Yeap, I totally agree. Java is good, but it does not provide mechanisms for abstracting away tasks, and thus the same code is being copied and pasted over and over. One example is iterating collections.
Interestingly enough, after a lengthy (and occasionally boring) read, the reader learns that Steve is going to solve his problem with Java with Rhino, a language written in - Java. Sounds a lot like the guy he bashes for listing the problem as a requirement. The blog post is an excellent confirmation of Stroustrup's saying that there are only two kinds of languages - those that everyone complains about and those that nobody uses.
> Interestingly enough, after a lengthy (and occasionally boring) read, the reader learns that Steve is going to solve his problem with Java with Rhino, a language written in - Java. Sounds a lot like the guy he bashes for listing the problem as a requirement.
Hmm, my read was that he was going to use EcmaScript, which is basically the next-generation JavaScript. While often maligned, and while it scores relatively low on prettiness in my book, EcmaScript is actually a nice language that supports fully functional programming constructs as well as OO, optional strong typing, and it also has a surprisingly large and capable API. I certainly like the version of EcmaScript that Adobe produced for its Flex environment. I didn't think of EcmaScript as a JVM language until reading Steve's post, though.
I'm in Steve's target demographic - young, hopefully not dumb as a post (though if I were, I'd never know it), and looking to be a better programmer.
Alex,
Stroustrup's saying is trivially true because people don't complain about tools they never use, and in every useful tool there are flaws. But you shouldn't let this mask the fact that languages do become obsolete, and better languages do appear.
I don't know much of anything, but I'll guarantee that the software of the future will be larger and more complex than the software of today. Language design is a challenge of managing this complexity and helping programmers deal with it. In my mind, this means that the languages of the future will necessarily be more readable. They will have to be. This implies concision, and writing code at a higher level (or at multiple levels) of abstraction.
As for his use of Java... You work with what you have *right now*. Steve is just using it to build himself something better. There's nothing wrong with that. It's just the bootstrapping problem. If he wants to use a dynamic language on the JVM, he has to write an interpreter on the JVM. The only candidate for that on the first iteration is Java. I suspect his next implementation of Rhino (with lessons learned) will be in Rhino ;)
> Stroustrup's saying is trivially true because people don't complain about tools they never use, and in every useful tool there are flaws. But you shouldn't let this mask the fact that languages do become obsolete, and better languages do appear.
I am not arguing that at all. JavaScript (or ECMAScript, to be exact) is a mighty fine language for doing certain things.
What I am missing is, since he claims that it is actually impossible to do anything in Java without code bloat (which, in turn, makes the final product an unmaintainable mess), why would he expect Rhino to be an exception to that rule? Correct me if I'm wrong, but Rhino is written in 100% Java. According to the line of thought in the blog, Rhino can turn out to be nothing but a bloated mess. On top of which he will build something that he says will be 'lean and mean'. To me, it sounds like building a super strength light framed house on top of a pile of sand (or maybe dung would be a better metaphor ;-).
I believe that the majority of bloat references were towards application code - not language implementations.
EcmaScript (vs 4? ) has a clear definition that probably won't grow in the same way that an application would.
The whole point of using Rhino is to gain all the cross platform / performance benefits that the jvm has to offer - java being used to parse and compile the javascript is of little consequence - just a means to an end.
There, I hope the pedantic part of the arguments can be put to rest now. ;)
> What I am missing is, since he claims that it is actually impossible to do anything in Java without code bloat (which, in turn, makes the final product an unmaintainable mess), why would he expect Rhino to be an exception to that rule? Correct me if I'm wrong, but Rhino is written in 100% Java. According to the line of thought in the blog, Rhino can turn out to be nothing but a bloated mess. On top of which he will build something that he says will be 'lean and mean'. To me, it sounds like building a super strength light framed house on top of a pile of sand (or maybe dung would be a better metaphor ;-).
"So taking for granted today that VMs are "good", and acknowledging that my game is pretty heavily tied to the JVM – not just for the extensive libraries and monitoring tools, but also for more subtle architectural decisions like the threading and memory models – the rational answer to code bloat is to use another JVM language."
Steve's interested in the JVM, not Java. The fact that the first implementation of a dynamic language on the JVM *has to* be done in Java (or bytecode directly ;-) isn't Steve's fault.
His other option would be to write a Java bytecode compiler in a non-JVM language of his choice. But that's just silliness.
> I believe that the majority of bloat references were towards application code - not language implementations.
>
> EcmaScript (vs 4? ) has a clear definition that probably won't grow in the same way that an application would.
It may grow slower, but it will grow nevertheless. And "... code always changes, always always always, ...". So why would a language implementation be an exception? Just look at Java and recent generics mess.
I don't really care one way or the other. To me, the whole blog is exactly what it says it is - a rant. Some time ago I read his blog on how horrible OO is and how functional is the way to go. Today I learn that he has spent the last decade writing half a million lines of Java. Without a single test.
Hmm well, you are certainly doing a good job clearing up the stigma that is normally associated with java developers.
> I don't really care one way or the other. To me, the whole blog is exactly what it says it is - a rant. Some time ago I read his blog on how horrible OO is and how functional is the way to go. Today I learn that he has spent the last decade writing half a million lines of Java. Without a single test.
>
> Hmm well, you are certainly doing a good job clearing up the stigma that is normally associated with java developers.
I am actually a devil's advocate in this discussion, since I am neither a Java developer nor do I care for the language. I'm just pointing out inconsistencies in the mentioned blog that don't make sense to me.
Then I apologize for calling you a java developer. ;)
I'm still confused by your remarks though. Taking a step back from the academic/theoretical language design discussions - the java virtual machine has to be one of the most robust / cross platform compatible VM's out there today. Did you have a better idea for how/what it should be executed in? (the javascript code)
With regard to the large game program written in java combined with thinking languages like java may have many downsides when compared with dynamic languages - I'm also confused as to your point. Wouldn't it be ridiculous of him to spout off all these opinions about java without ever having written anything real with it?
sorry, just trying to figure out if your arguments touch on pragmatic stuff as well as academic..
> > Hmm well, you are certainly doing a good job clearing up the stigma that is normally associated with java developers.
>
> I am actually a devil's advocate in this discussion, since I am neither a Java developer nor do I care for the language. I'm just pointing out inconsistencies in the mentioned blog that don't make sense to me.
> What I am missing is, since he claims that it is actually impossible to do anything in Java without code bloat (which, in turn, makes the final product an unmaintainable mess), why would he expect Rhino to be an exception to that rule? Correct me if I'm wrong, but Rhino is written in 100% Java. According to the line of thought in the blog, Rhino can turn out to be nothing but a bloated mess. On top of which he will build something that he says will be 'lean and mean'. To me, it sounds like building a super strength light framed house on top of a pile of sand (or maybe dung would be a better metaphor ;-).
For all intents and purposes, all modern languages in common use are built upon some other language. That doesn't mean that the higher-level languages necessarily inherit the limitations of the language they are built upon. Otherwise we'd have no reason to use anything other than machine code.
I also take issue with the assertion that building something to run on the JVM implies it must be written in Java. There are a number of languages that compile directly to bytecode.
Finally, I am not going to disagree with the basic point of this article, but I find that often the people who complain about Java demonstrate a lack of understanding of how to use it effectively. Don't get me wrong, it's limiting and there is an excessive amount of unavoidable boilerplate.
After many years of using, loving, and then becoming disenchanted with Java, I think writing interpreters is probably one of the best ways to use Java and writing application logic is probably one of the worst.
Steve's points are well taken. I just can't help but wonder how many wheels Steve has reinvented on his way to 500K.
Is it possible that Steve could have used the services of some infrastructure applications? Did Steve code up his own parsing engine when XML/DOM would have sufficed? Does Steve's application use a database or did he devise some type of custom file-based persistence scheme?
What about Java components such as Beans and RMI? Did Steve consider the use of these technologies?
Python, Perl and Ruby provide very clever and appealing syntaxes that minimize code size and keystrokes, but do not overlook the fact that these languages essentially consist of libraries of components. These libraries can be created by anyone with a mind to do so (witness CPAN) and can be based on the copious amounts of portable C code that abound the Unix-sphere.
In short, any JVM-based interpreted language must feature a simple component technology or program size would not be minimized to the extent we would hope for.
I think there's some selective reading going on here.
Steve Yegge said that code's biggest enemy is bloat, not Java. I think he makes an important point and it's much larger than a Java problem. One reason for the prevalence of bloat is that many intermediate programmers do not have a gut sense as to how large a given system *should* be.
For example: seven years ago I worked on a C++ derivative risk system at a broker/dealer. Two years ago I began working on a similar Java application at a competitor. This app was pure Java, though it was written in the style of C. The two applications had a similar functional scope but the first was 60kloc and the second is 750kloc. The developer of the second doesn't realize he has a bloat problem.