Summary:
Computers make life easier because they're so fast, right? Well, yes and no. Do you write efficient code? The author reveals some disconcerting inefficiencies lurking in commonly used software and development practices.
|
I really enjoyed this article. It's well written (witty and clear), and it achieves what its aim seems to be: shaking up the experienced developers frequenting Artima, who have a responsibility to teach the large crowd of slightly more ignorant software creators that performance is still something to consider. After all, we don't want the often-quoted credo (attributed to Intel) of 'Intel giveth, M$ taketh away' to be applicable to our software (be it Windows, BSD, Linux or otherwise based).
It's scary sometimes that most junior software engineers these days don't even know what big-O notation is; that there are algorithms underneath all those fancy libraries they are given, and that those algorithms have characteristics that should be taken into account when choosing whether to use them. And hardware - how it is controlled and utilized by software, and what that actually means - is so far from their minds that they might as well be working on an abacus for all they know...
It is up to the older and more experienced among the developer communities to help them discover that there is more to developing software than pretty GUIs and fancy developer tool sets (nice as those are).
Okay, I'll shut up now *blush*
|
> I believe that the main cause of slow (and resource-hungry) programs today is the use of prefabricated libraries and components. These components are often too general for the purpose they are used for, and are often built from other components.
>
> Each such layer will also add unnecessary conversions of data and data structures.
Not necessarily. That's where generic programming shines: providing abstraction without penalty.
> However, once you start stacking these components together, the inefficiencies will become apparent.
>
> Also, I believe that the (current) object-oriented programming paradigm is for the most part responsible for the situation.
I agree with you, at least in part. Blindly treating the OO design paradigm as a desirable end in itself, or simply not having the flexibility to think differently, can be a major problem. Much of the desktop software we run today has immense quantities of legacy code from the era when everything was supposed to be OO.

> In my opinion, object-oriented programs hide the fundamental data structures of the program in vast networks of little objects. This prevents understanding and optimization of these data structures, thus preventing usage of optimal algorithms (the whole refactoring thing is just accommodating different data structures to the program, so that better algorithms can be used).
Yes, it is interesting that programmers have spent years developing abstractions (like modules, functions, classes, etc.) to help us organize our code and manage complexity, yet have mostly overlooked the abstraction of "what the program is actually doing." That's an algorithm. Again, generic programming is a paradigm that actually allows us to focus on and clarify the abstract algorithm without loss of efficiency.

> There are ways out, in my opinion. One is to take a look at VHDL, for instance. It is a very modular language (for hardware design), yet the final design is optimized as a whole - there are no redundant data conversions after the synthesis.
It's comparatively easy to write such entirely declarative languages to address particular domains (DSLs), but I don't think anyone has really succeeded in making a general-purpose programming language that works that way.
> The second approach is to decouple the data structures used in the program from the actual program logic.
Bingo. Tease out the abstract algorithm so you can reason about its efficiency.
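To make "teasing out the abstract algorithm" concrete, here is a hypothetical Java sketch (mine, not from the article): the algorithm is written once against minimal abstractions, so its cost - one pass, n-1 comparisons - can be reasoned about independently of whichever container supplies the elements.

import java.util.Iterator;

public final class Algorithms {
    // Generic "maximum element" algorithm, expressed against
    // Iterable/Comparable rather than any concrete container.
    public static <T extends Comparable<? super T>> T max(Iterable<T> elements) {
        Iterator<T> it = elements.iterator();
        if (!it.hasNext()) {
            throw new IllegalArgumentException("empty sequence");
        }
        T best = it.next();
        while (it.hasNext()) {
            T candidate = it.next();
            if (candidate.compareTo(best) > 0) {
                best = candidate;
            }
        }
        return best;
    }
}

The same code runs unchanged over an ArrayList, a LinkedList or a TreeSet, and its complexity analysis doesn't change with the container.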
> Any program that uses a database essentially does that - the common point is the database schema, and how exactly (with what algorithms) it is implemented in the database is not relevant. The actual database implementation can then be optimized independently of the program, by choice of indexes for example.
Are you suggesting that programs that use databases don't have efficiency problems?
|
> One thing that I've long considered a flaw in the Java Map class is that it puts far too much burden on the implementor. Often I just want an interface that has a single method: get(key) -> value.
>
> It seems to me that this should be the base of a Map API and then there would be subclasses of this that would provide more functionality.

Yup .. that's why we built our own AbstractKeyBasedMap (com.tangosol.util package), with a total of two abstract methods:

public abstract Object get(Object oKey);
protected abstract Iterator iterateKeys();

There are obvious / simple implementations for the use cases that you described.

Peace,

Cameron Purdy | Oracle
http://www.oracle.com/technology/products/coherence/index.html
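For flavor, a hypothetical sketch (illustrative only, not actual com.tangosol.util code) of how small a concrete subclass of such a two-abstract-method base can be, assuming the base class derives everything else (size(), containsKey(), entrySet() and so on) from these two methods:

import java.util.Iterator;

// A read-only map view over the JVM's system properties.
public class SystemPropertyMap extends AbstractKeyBasedMap {
    public Object get(Object oKey) {
        return System.getProperty(String.valueOf(oKey));
    }
    protected Iterator iterateKeys() {
        return System.getProperties().keySet().iterator();
    }
}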
|
> There are a number of potential alternative designs for the collection classes. I sometimes wish for a separate interface for read only collections instead of just throwing an exception if a mutating method is called. The trouble is that most such alternatives greatly increase the number of interfaces and implementations.

I see three map interfaces (obviously the names would be better in a real implementation):

interface MapA<V> {
public V get(Object key);
}
interface MapB<K, V> extends MapA<V> {
public Set<K> keySet();
}
interface MapC<K, V> extends MapB<K, V> {
public void put(K key, V value);
}
I'm leaving out the other utility methods of Map for simplicity.

> The API we have is a compromise, where the classes/interfaces usually do rather more than we need in any one instance, but we don't have to remember so much. True this does cause more work for implementers, but most people merely use the provided classes, and for those that do implement collections the abstract collections do much of the work.

Personally, I don't think this is too much to remember. And we wouldn't need any more implementations than the current common reusable ones: each could implement MapC and thereby implement the other two at the same time. And if my method will only look up values and never write to the map, that is now clear from the method signature.

I do agree that most people use the provided classes, but what I question is whether people do this because it's complicated to implement a Map. There are a lot of powerful approaches to programming we lose out on in Java because of the attitude that Maps have to be standalone classes. Being able to return an anonymous inner class that allows properties to be retrieved by name allows for a lot of simple reusable code. Think about how much simpler JavaBeans would be if they were an interface that returned a Map over the object's properties. You could even implement them without any hard-coding.

And the Map interface is not simple to implement, even if you extend the AbstractMap class. That class forces you to implement an entry set, which requires implementing Map.Entry and an iterator.
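To make the anonymous-inner-class idea concrete, a hypothetical sketch using the MapA interface proposed above (class and method names are illustrative only):

import java.awt.Point;

public class PropertyViews {
    // Expose an object's properties as a read-only, key-based view,
    // without writing a standalone Map class.
    public static MapA<Object> propertiesOf(final Point p) {
        return new MapA<Object>() {
            public Object get(Object key) {
                if ("x".equals(key)) return p.x;
                if ("y".equals(key)) return p.y;
                return null;
            }
        };
    }
}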
|
> interface MapA<V> {
> public V get(Object key);
> }
>
> interface MapB<K, V> extends MapA<V> {
> public Set<K> keySet();
> }
>
> interface MapC<K, V> extends MapB<K, V> {
> public void put(K key, V value);
> }
I was thinking about this and the proper interface hierarchy would be for MapC to extend MapA. This would allow for a writable map with or without providing keys.
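In code, the revised hierarchy would presumably look like this (placeholder names as before):

// MapC now extends MapA directly, so a map can be writable
// without being obliged to enumerate its keys via keySet().
interface MapC<K, V> extends MapA<V> {
    public void put(K key, V value);
}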
|
Well, James, I know from my email that the engineers I respect the most enjoyed this, so I'd say I hit my audience. And I know from how slowly some programs run that engineers who should know all this don't always put it into practice.
|
> Well, James, I know from my email that the engineers I respect the most enjoyed this, so I'd say I hit my audience.
Did it tell them anything they didn't already know or did they think it was good for other people to read?
> And I know from how slowly some programs run that engineers who should know all this don't always put it into practice.
Do you believe telling them again (assuming they read this) will cause them to start putting these basic principles into practice?
|
Well I'd say they enjoyed being reminded humorously of what they should know, as they related funny stories of their own, and they mostly despaired of ever reaching those who believe Saint Moore will save them.
|
> Well I'd say they enjoyed being reminded humorously of what they should know, as they related funny stories of their own, and they mostly despaired of ever reaching those who believe Saint Moore will save them.
Well, if your goal was to just amuse people then maybe you hit your audience. I kind of got the feeling you were attempting to educate people.
I think this is a good article for junior developers; I just don't think this is the place where you will get those eyeballs. If you are reposting it in other places, however, you might. I'm just explaining the reaction you received from people here, not criticizing the article. I will point out, though, that your subtle dig at Java was pointless and needlessly partisan.
|
>> Often I just want an interface that has a single method: get(key) -> value.

> Yup .. that's why we built our own AbstractKeyBasedMap (com.tangosol.util package), with a total of two abstract methods:
>
> public abstract Object get(Object oKey);
> protected abstract Iterator iterateKeys();
Thanks, Cameron. Perhaps I'll download it and take a look.
Custom data structures aside... I made a happy discovery yesterday. I read that Java's GC runs in its own thread, and that gave me the idea of putting my own thread to sleep. When I make the loops that generate lots of garbage sleep for a bit, even just once in a while -- sleep(0.001) if rand(50).zero? -- my app is able to massage a much larger mound of data before thrashing for virtual memory.
The moral of this little episode: Read the fine print in the docs and books. For you might be able to help your built-in GC along without explicitly invoking it. (And explicitly invoking a full collection is usually not what you want anyway.)
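For readers on the Java side, a rough analogue of the Ruby one-liner above might look like the sketch below; the helper methods, the 1-in-50 ratio and the 1 ms pause are all hypothetical tuning knobs, not recommendations.

import java.util.Random;

public class GarbageFriendlyLoop {
    private static final Random RANDOM = new Random();

    // Occasionally pause a garbage-heavy loop so the collector
    // (and the OS) get a chance to catch up.
    static void churn() throws InterruptedException {
        while (moreWorkToDo()) {           // hypothetical helper
            produceLotsOfGarbage();        // hypothetical helper
            if (RANDOM.nextInt(50) == 0) {
                Thread.sleep(1);           // ~1 ms, roughly sleep(0.001)
            }
        }
    }

    static boolean moreWorkToDo() { return false; }  // stub
    static void produceLotsOfGarbage() {}            // stub
}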
|
Elizabeth - try Thread.yield(), which I think is slightly better than a short sleep. If there is a possibility that your calculation thread will be interrupted, call the following method at various times in your processing:

/**
 * Standard way to check for interrupts
 * @param message
 * @throws InterruptedException
 * @see http://www.forward.com.au/javaProgramming/HowToStopAThread.html
 */
public static void yieldForInterrupts(String message) throws InterruptedException {
    Thread.yield(); // let other threads have time
    if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException(message);
    }
}
|
Morgan Conrad wrote: "Elizabeth - try Thread.yield(), which I think is slightly better than a short sleep."
Thanks. This has prompted me to try Ruby's Thread.pass .
Greg Colvin wrote: "Well I'd say they enjoyed being reminded humorously of what they should know, as they related funny stories of their own, and they mostly despaired of ever reaching those who believe Saint Moore will save them."
I wonder if believing in Saint Moore is an age thing. I'm coming up on 50 and I've never believed in Saint Moore. Before computer hardware became downright disposable and could be purchased with disposable income, getting more memory, or a bigger hard drive, or a niftier graphics card, or a faster machine, or additional machines, was not an option. (Heck, I used to solder up RS-232 cables just to avoid forking out the dough for ready-made...)
|
Garbage article. There is no new insight whatsoever. All the examples are very tired. Anyone who gives two shits about the field already knows them all. What is the point of this?
I think someone is trying to meet "my publish-o-meter quota" in order to needlessly inflate their ego. The essay must be at least 5 pages long, even if you have only 1 sentence worth of ideas to convey. Academia BS mentality.
|
> I really enjoyed this article. It's well written (witty and clear), and it achieves what its aim seems to be: shaking up the experienced developers frequenting Artima, who have a responsibility to teach the large crowd of slightly more ignorant software creators that performance is still something to consider.
I would agree with this sentiment. Having a laundry list of issues like this could be a good starting point, especially since many of the more experienced developers here presumably need to advise newer developers, or at least point them to articles like these.
One thing I found particularly interesting was the section on thrashing. It's easy to forget low-level, machine-specific issues like cache lines when dealing with higher-level languages, and I was hoping to gather some different viewpoints on this.
From one perspective, having to worry about such issues seems to indicate a certain level of failure in our black-box abstraction and layering mechanisms. Whether we like it or not, it appears we can't escape the implementation details of the underlying layers. Are we therefore doomed to know all of them and, eventually, to buckle under all that mind-boggling complexity and information overload?
For example, padding a struct so it fits snugly in a cache line is good for performance, but it brings architecture-specific code into your program and complicates it. Obviously, it shouldn't be done unless the code is performance critical, but it still means we are exposed to the details of an underlying layer. What about the mantra taught in school to use black-box abstractions to shield yourself from complexity? Can we really blame people for being overwhelmed with all this stuff?
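Even in higher-level languages the same concern leaks through. A hedged sketch of the classic Java padding idiom, which keeps two write-hot counters out of the same cache line to avoid false sharing; it assumes 64-byte cache lines and a JVM that happens not to reorder these fields:

// Counters updated by different threads. Without the padding they
// could share a cache line, so every write by one thread would
// invalidate the line in the other core's cache (false sharing).
public class PaddedCounters {
    public volatile long counterA;
    // 7 longs = 56 bytes of padding between the hot fields.
    long p1, p2, p3, p4, p5, p6, p7;
    public volatile long counterB;
}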
|
> > Each such layer will also add unnecessary conversions of data and data structures.
>
> Not necessarily. That's where generic programming shines: providing abstraction without penalty.
Yeah, you're right: generic programming in C++ (along with the use of non-virtual classes) allows these layers to be removed during compilation, so those abstractions carry no penalty. But that's a special case among today's OO systems.
Probably it's just the whole static vs. dynamic thing. The more you make static, the more flexibility you lose, but the better you can optimize.
> > There are ways out, in my opinion. One is to take a look at VHDL, for instance. It is a very modular language (for hardware design), yet the final design is optimized as a whole - there are no redundant data conversions after the synthesis.
>
> It's comparatively easy to write such entirely declarative languages to address particular domains (DSLs), but I don't think anyone has really succeeded in making a general-purpose programming language that works that way.
The generic programming you mention is actually an example of a paradigm that comes close to that in this respect.
> > The second approach is to decouple the data structures used in the program from the actual program logic.
>
> Bingo. Tease out the abstract algorithm so you can reason about its efficiency.
Actually, I have an idea in mind based on that; see below.

> > Any program that uses a database essentially does that - the common point is the database schema, and how exactly (with what algorithms) it is implemented in the database is not relevant. The actual database implementation can then be optimized independently of the program, by choice of indexes for example.
>
> Are you suggesting that programs that use databases don't have efficiency problems?
Well, they may have efficiency problems, but part of these problems can be easily solved by hinting the database (by normalizing the schema and creating the correct indexes). So you don't need to touch the program logic at all.
And that's my point. The program logic and the data structure implementation are somehow "decoupled" in programs that use databases (actually, it's still muddy if you use stored procedures, but that's another story).
Let me tell you more about my pet research project (which unfortunately, for the time being, competes with my other pet research projects).
I imagine a language where every data structure (especially the global ones, which cannot be described in terms of standard data structures) would be designed the way a database schema is usually designed. The actual implementation of the schemas would thus be completely decoupled from the program logic.
I even imagine that the schema implementation would be created by the compiler, in two manual passes: in the first, the compiler creates a test implementation; then you run your program's tests and gather statistics about how the schemas are used; in the second pass, the compiler uses the gathered statistics, as well as further user input, to create an efficient implementation of the data structures.
I think this approach would have many advantages, most importantly the possibility to change the data structures used at will, as the requirements of the program logic change.
There are many interesting ideas in this direction. For example, the compiler could implement the schema in a completely different way and use the original schema as a view of it, so the original schema would be translated during compilation to the new one. You could even have older parts of the program reference the data structures in the old way, while newer parts reference them in the new way - like versioning of data structures.
Another idea is to have "virtual columns" in the schema, computed from values in other columns. The compiler would decide whether it is worth caching the value in the table, or whether it should be computed at each use (depending on how many inserts and reads you do, and how expensive the value is to obtain).
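In ordinary Java terms, the two implementations the compiler would choose between might look roughly like this (a hypothetical illustration of the trade-off, not proposed syntax):

// A "row" with a virtual column: total = price * quantity.

// Strategy 1: compute on every read - cheap writes, costlier reads.
class RowComputed {
    double price;
    int quantity;
    double total() { return price * quantity; }
}

// Strategy 2: cache in the row and maintain it on every write -
// costlier writes, cheap reads.
class RowCached {
    private double price;
    private int quantity;
    private double total;              // the cached virtual column
    void set(double price, int quantity) {
        this.price = price;
        this.quantity = quantity;
        this.total = price * quantity; // maintained eagerly
    }
    double total() { return total; }
}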
So you see, there are a bazillion possibilities, and also a lot of interesting problems to overcome. However, it contradicts the current OOP paradigm (read: languages like Java, C#, Ruby, Groovy) in two fundamental ways:
1. Centralization of data structures - obviously, since I want to emphasize them. This contradicts having networks of small objects implementing the data structures.
2. The data structures are static for the run of the program - if I want to optimize them, I need to know the usage patterns ahead of time. This contradicts having a dynamic language.
I generally find the idea of a compiler that selects data structures for you, depending on your actual needs and goals (speed, memory footprint), very intriguing - similar to the way current compilers select instructions and registers for you, instead of you doing it yourself in assembler.
|