Summary:
Computers make life easier because they're so fast, right? Well, yes and no. Do you write efficient code? The author reveals some disconcerting inefficiencies lurking in commonly used software and development practices.
|
When I try to make code that is functionally correct, I look up the definitions of the APIs I call, consider all possible code paths, and try to ensure that the correct value will always be returned.
When I try to make code fast, I measure it under certain (hopefully valid) scenarios and see what is taking the time. I generally reason about 'known to be slow' operations (like database operations), but overall performance is not something I try to guarantee the way I do functional correctness.
If we wanted guarantees of performance, libraries would document their performance characteristics, and we would reason about the time bounds.
I certainly think that tools to help us analyze performance would be a lot more valuable than tools to help with concurrent programming, which is currently an ultra-trendy and over-hyped optimization technique.
|
> If we wanted guarantees of performance, libraries would
> document their performance characteristics, and we would
> reason about the time bounds.
You're right. A lot of APIs out there don't spell out time (or space) performance characteristics.
Take Ruby, for example. The RDocs imply that Array#shift is costly and Array#pop is not. Meanwhile, there's no such thing as Hash#pop, but the RDocs don't hint that Hash#shift takes a tremendous amount of time on a large Hash. I tried shifting a Hash last week. Ugh. It seems to go through the trouble of rehashing the whole table with every shift, but I wouldn't know this from the docs.
help(list) and help(dict) don't say anything about Python list and dict performance, nor does the Python Library Reference, as far as I can tell.
So which APIs out there spell out performance characteristics? Is this information in docs limited to the world of C++ or something? Glancing at the JavaDocs for HashMap, Hashtable, and LinkedHashMap, I notice they're reasonably informative, much more so than Ruby's and Python's. And I notice that the Perldocs for array pop and shift are reasonably informative.
Perhaps if APIs were in the habit of spelling this stuff out, developers would be more conscious of performance and less inclined to rely on Saint Moore.
|
> So you see, there are bazillion possibilities, and also
> lot of interesting problems to overcome. However, it
> contradicts to current OOP paradigm (read: languages like
> Java, C#, Ruby, Groovy) in two fundamental ways:
>
> 1. Centralization of data structures - obviously, since I
> want to emphasize them. This contradicts having networks
> of small objects implementing the data structures.
As I already mentioned, this represents a misunderstanding of OO.
> 2. The data structures are static for the run of program -
> if I want to optimize them, I need to know the type of
> usage ahead. This contradicts having dynamic language.
Java and C# are static languages.
> I generally find the idea of compiler that selects data
> structures for you depending on your actual needs and
> goals (speed, memory footprint) very intriguing, similar
> to the way that current compilers select instructions and
> registers for you instead of using assembler and doing it
> yourself.
This is something that could be achieved through an OO approach. I don't know of anything that does this (I don't think runtime optimizations in the JVM or CLR, like method inlining, count). Because OO hides the underlying data structures from whatever is using them, the data structure can often be replaced without affecting the caller.
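To make that concrete, here is a minimal sketch of what such a replacement can look like - the Bag, ListBag, HashBag, and Bags names are invented for illustration, not taken from any library:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Callers only ever see this interface, so the backing data
// structure can be swapped without touching them.
interface Bag {
    void add(String item);
    boolean contains(String item);
}

class ListBag implements Bag {
    // Cheap to build, linear-time lookups; fine for small bags.
    private final Collection<String> items = new ArrayList<String>();
    public void add(String item) { items.add(item); }
    public boolean contains(String item) { return items.contains(item); }
}

class HashBag implements Bag {
    // Constant-time lookups on average; better for large bags.
    private final Collection<String> items = new HashSet<String>();
    public void add(String item) { items.add(item); }
    public boolean contains(String item) { return items.contains(item); }
}

class Bags {
    // A "compiler that selects data structures" would make this
    // choice automatically; here a hand-written factory makes it.
    static Bag forExpectedSize(int expectedSize) {
        return expectedSize < 32 ? new ListBag() : new HashBag();
    }
}

Code written against Bag never needs to change when the factory's selection policy does, which is the property being described above.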
|
This was fun :)
One remark I have is that, in general, it is not a good idea to write a recursive quicksort algorithm, at least not in an imperative language such as C++. Not only is it slow, it can also lead to a stack overflow.
|
> to write a recursive quicksort algorithm, at least not in
> an imperative language such as C++. Not only is it slow,
> it can also lead to a stack overflow.

Only if you get the implementation wrong. A reasonable implementation will provide a log(n) bound on the stack size (hint: recurse on the smaller partition).
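For what it's worth, here is one shape that hint can take - a sketch in Java rather than C++, with a plain Lomuto partition chosen for brevity: recurse only into the smaller partition and loop over the larger one, so the stack stays around log2(n) frames even in the worst case.

import java.util.Random;

class QuickSort {
    // Recurse on the smaller partition, iterate on the larger one,
    // so the call stack never grows beyond about log2(n) frames.
    static void sort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                sort(a, lo, p - 1);   // smaller side: recurse
                lo = p + 1;           // larger side: iterate
            } else {
                sort(a, p + 1, hi);   // smaller side: recurse
                hi = p - 1;           // larger side: iterate
            }
        }
    }

    // Plain Lomuto partition around the last element.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] a = new int[1000000];
        for (int i = 0; i < a.length; i++) a[i] = rnd.nextInt();
        sort(a, 0, a.length - 1);
        System.out.println("sorted: " + isSorted(a));
    }
}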
|
> help(list) and help(dict) don't say anything about Python
> list and dict performance, nor does the Python Library
> Reference, as far as I can tell.
>
> So which APIs out there spell out performance
> characteristics?

APIs? Hard to tell ;) For sorting you can consult the spec: http://svn.python.org/projects/python/trunk/Objects/listsort.txt

In this particular case people usually trust Tim Peters more than their own understanding. I argued with him once about prime number generators and I wouldn't do it again.
|
> Do you believe telling them again (assuming they read
> this) will cause them to start putting these basic
> principles into practice?
No matter how much I learn, I sometimes find it useful to go over the basic principles once again. This is especially true when dealing with a craft.
|
> > This contradicts having networks
> > of small objects implementing the data structures.
>
> As I already mentioned, this represents a misunderstanding
> of OO.
No, you don't understand. There are several layers of data structures in any large program. Let me explain with an example what I have in mind:
Imagine your data are like two tables of records, which are related. So you have classes A and B for the respective records in those tables. Class A contains (among other things) a string class S (points to an instance of it, to be precise). Then you have classes AA and BB for the tables themselves, which again point to all the records inside those tables. And finally, you have class C, which points to the AA and BB instances and contains some relation between the records in these two tables.
Now, my point is, even if all the algorithms in classes A, B, C, AA, BB were optimal, there could still be some suboptimal remnants in the whole scheme. For example, it may happen that class S always stores the length of the string, but I just don't need it anywhere in any of the other classes. It may happen that class AA is implemented as an array of pointers to the A instances, but it would be much more useful, from a cache standpoint, if it just kept the data of those instances together. It may happen that class C only ever requires a specific subset of records from classes AA and BB, so it would be useful if they also contained pointers to that subset of records, stored somewhere.
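To put that in code, here is a rough Java sketch of the layering being described (the field names are invented; only the shape matters), with the suspected waste marked in comments:

// A sketch of the "network of small objects" described above.
class S {                 // the string class
    int length;           // stored, but perhaps never needed by A, AA, or C
    char[] chars;
}

class A {                 // one record of the first table
    S name;               // pointer to a separately allocated string
    int someField;
}

class B {                 // one record of the second table
    int key;
}

class AA {                // the first table
    A[] records;          // array of pointers: records scattered in memory,
                          // poor cache locality compared to packed data
}

class BB {                // the second table, similar shape
    B[] records;
}

class C {                 // the relation between the two tables
    AA left;
    BB right;
    // C may only ever need a small subset of left/right records,
    // but nothing in the layers below knows that.
}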
If I had just a blank sheet of paper and could see the grand scheme of things from the start (all the classes), I would certainly define the structure of all the data in memory very differently, so that it could be fast for the problem at hand, class C.
The problem is thus that the top layer (class C) only sees the layer with classes AA and BB, that layer only sees the layer with classes A and B, and so on.
There are amazingly many possible inefficiencies in just these 5 classes, and consider that large OO programs have hundreds of classes and tens of such layers.
I believe that is the actual reason why today's programs are so slow and memory-hungry. There is no single simple piece of code that can be easily optimized; it's a big collection of little things like those described above, accumulated across the layers.
That's why I point to databases for inspiration - database schemas have only a few layers. There are basic datatypes, maybe arrays of them, then records (table rows), tables, and their relations. There are always these 5 layers, their structure and purpose are clear, and optimization is (relative to OOP) easy because of that.
I explain below why I think Java or C# cannot deal with this problem. I hope you see now why I feel there is a need for centralized management of data structures, and for compilers that are aware of the data structures in all their complexity.
> > 2. The data structures are static for the run of program -
> > if I want to optimize them, I need to know the type of
> > usage ahead. This contradicts having dynamic language.
>
> Java and C# are static languages.
Yes, they're statically typed, but they must preserve the class structure to be able to handle overloaded classes and reflection. That means they cannot optimize classes away the way C++ can (if virtual methods are not used, that is). In this sense they are more dynamic, because they cannot base their optimizations on the implementation of other classes. You see, it's a dilemma - either you optimize, but then you have to know everything in advance, or you take advantage of not knowing in advance, and then it's hard to optimize.
The cases that need dynamic features (like handling of unknown inherited classes, and reflection) are maybe 1% of all cases. It's wrong to slow down everything else (by making it impossible to optimize) just because these features are sometimes useful.
|
> > Java and C# are static languages.
>
> Yes, they're statically typed, but they must preserve the
> class structure to be able to handle overloaded classes
> and reflection. That means they cannot optimize classes
> away the way C++ can (if virtual methods are not used,
> that is). In this sense they are more dynamic,

They can and do optimise away virtual invocations if at a point in time there are no overriding methods. If a class is later loaded which overrides such a method then the optimisation is reversed. The JIT has complete knowledge of the classes actually present, and in principle can do any optimisation that such information makes possible.
|
> > > Java and C# are static languages.
> >
> > Yes, they're statically typed, but they must preserve the
> > class structure to be able to handle overloaded classes
> > and reflection. That means they cannot optimize classes
> > away the way C++ can (if virtual methods are not used,
> > that is). In this sense they are more dynamic,
>
> They can and do optimise away virtual invocations if at a
> point in time there are no overriding methods. If a class
> is later loaded which overrides such a method then the
> optimisation is reversed. The JIT has complete knowledge
> of the classes actually present, and in principle can do
> any optimisation that such information makes possible.
In addition, escape analysis has been implemented successfully in JVMs. It allows the VM to determine that an object reference never escapes the current method, and then 'extract' the members of the object directly onto the stack as variables. It doesn't take much imagination to see that this can be done recursively to eliminate many layers of indirection.
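As an illustration of the kind of code where that pays off (a sketch only; the class names are invented, and whether the allocations are actually eliminated depends on the particular VM):

class Vec2 {
    final double x, y;
    Vec2(double x, double y) { this.x = x; this.y = y; }
    Vec2 plus(Vec2 o) { return new Vec2(x + o.x, y + o.y); }
    double norm() { return Math.sqrt(x * x + y * y); }
}

class EscapeDemo {
    public static void main(String[] args) {
        double total = 0;
        for (int i = 0; i < 10000000; i++) {
            // Neither Vec2 instance is stored anywhere or returned,
            // so a JIT with escape analysis can replace them with
            // plain doubles on the stack (scalar replacement) and
            // skip the heap allocation entirely.
            Vec2 v = new Vec2(i, i + 1).plus(new Vec2(1, 2));
            total += v.norm();
        }
        System.out.println(total);
    }
}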
|
> > > This contradicts having networks
> > > of small objects implementing the data structures.
> >
> > As I already mentioned, this represents a misunderstanding
> > of OO.
>
> No, you don't understand. There are several layers of data
> structures in any large program. Let me explain with an
> example what I have in mind:
>
> [snip]
>
> If I had just a blank sheet of paper and could see the grand
> scheme of things from the start (all the classes), I would
> certainly define the structure of all the data in memory very
> differently, so that it could be fast for the problem at hand,
> class C.
>
> The problem is thus that the top layer (class C) only sees the
> layer with classes AA and BB, that layer only sees the layer
> with classes A and B, and so on.

This is exactly the misunderstanding I am referring to, and it's a misconception that a lot of people developing OO code believe. Taking your example, the main error is the assumption that because AA is defined as a set of A instances, AA must be implemented physically as an object that points to a number of instances of A. That is one possible solution, but it is not mandated in any way. Instead, A can be implemented as an indirection over AA. For example (Java code):

import java.util.AbstractList;
import java.util.List;

class AA {
    // One flat byte array holds every record; no per-record
    // objects are kept around.
    final byte[] data;
    final int recordLength;

    AA(byte[] data, int recordLength) {
        this.data = data;
        this.recordLength = recordLength;
    }

    List<A> getAs() {
        return new AbstractList<A>() {
            public A get(final int index) {
                // Each A is a lightweight view over the byte
                // array, created on demand.
                return new A() {
                    public String getValue() {
                        return new String(data, index * recordLength, recordLength);
                    }
                };
            }

            public int size() {
                return data.length / recordLength;
            }
        };
    }
}

interface A {
    String getValue();
}
|
I did not mean to bash library vendors for not documenting things.
What I do mean is that when I do:
root = (-b + sqrt(b*b - 4 * a * c)) / (2 * a)
I think about the fact that if 4 * a * c is bigger than b * b, my code will fail. And sometimes even about precision: if the values are very large or very small, that could cause a large error, or even a failure that would not happen in pure mathematics. I never think about what numbers will make sqrt slow.
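For concreteness, this is roughly the kind of corner-case handling being described, sketched in Java; the guard on the discriminant matches the point above, while the rearranged formula is a common trick for the precision issue (not something the formula above implies):

class Quadratic {
    // Returns the two real roots of a*x^2 + b*x + c = 0 (a != 0),
    // or null if the discriminant is negative (no real roots).
    static double[] realRoots(double a, double b, double c) {
        double disc = b * b - 4 * a * c;
        if (disc < 0) {
            return null;                        // the "4*a*c bigger than b*b" case
        }
        // Numerically stable form: compute the larger-magnitude root
        // first so we never subtract two nearly equal numbers.
        double q = -0.5 * (b + Math.copySign(Math.sqrt(disc), b));
        if (q == 0) {
            return new double[] { 0.0, 0.0 };   // degenerate case b == c == 0
        }
        return new double[] { q / a, c / q };
    }

    public static void main(String[] args) {
        // b*b >> 4*a*c: the naive (-b + sqrt(disc)) / (2*a) loses digits here.
        double[] roots = realRoots(1.0, 1e8, 1.0);
        System.out.println(roots[0] + " " + roots[1]);
    }
}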
I don't always expect the operation to complete in the same amount of time, or even within any known time bound.
Since I don't reason about the total time (or memory) used by my code, there is no reason to expect that my code will run within any given time bound.
All I do is reason about the big things (sorting was mentioned).
I feel it is like coding without reasoning about corner cases.
So why wouldn't there be corner cases that are horribly slow?
Why don't I write unit tests that explore those cases and make assertions about the number of operations my code takes?
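Nothing stops us from doing that crudely today. Here is a sketch (plain Java, invented names, and a comparison budget picked by hand) that counts the comparisons a library sort performs and fails if they exceed an O(n log n)-style budget:

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class SortBudgetTest {
    public static void main(String[] args) {
        int n = 100000;
        Integer[] data = new Integer[n];
        Random rnd = new Random(1);
        for (int i = 0; i < n; i++) data[i] = rnd.nextInt();

        // Count every comparison the sort performs.
        final int[] comparisons = { 0 };
        Arrays.sort(data, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                comparisons[0]++;
                return a.compareTo(b);
            }
        });

        // Assert an O(n log n)-ish budget; the factor of 3 is an
        // arbitrary slack constant chosen for this sketch.
        double budget = 3 * n * (Math.log(n) / Math.log(2));
        if (comparisons[0] > budget) {
            throw new AssertionError("sort used " + comparisons[0]
                    + " comparisons, budget was " + (long) budget);
        }
        System.out.println("comparisons: " + comparisons[0]);
    }
}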
|
> All I do is reason about the big things (sorting was mentioned).
It's hard to know what is big if the standard docs don't say. It's hard to know whether shifting (or popping) an element off a particular collection is big if the docs don't say.
If "shift"ing an arbitrary element off a Ruby Hash causes the entire hash table to get resized and rehashed, then shifting off all the elements one at a time is complexity O(nlogn) or O(n^2) (not sure off the top of my head). That puts what I tried to do a few days ago in the realm of "big things" like sorting. But I can't know ahead of time if the Hash#shift docs don't provide a clue.
|
> Instead, A can be implemented as an indirection over AA.
> For example (Java code):
Ok, so you have shown me it's possible to break layers in Java. However, I have two objections:
1. Is it still OOP? What happened to the encapsulation principle? (I know that the "correct" set of OOP principles is fuzzy and may vary from person to person.) I don't see this technique being the canonical way to program in Java, just like it's not canonical to have your program in one class, with the instance variables as globals (though it's certainly possible to program this Cobol way too).
2. How many programs out there are written like that? If the language doesn't make it easy or obvious to write programs this way, even if the technique may be superior, it won't help in practice.
In fact, I would probably rather deal with a program that has networks of small objects than with a program of the same complexity written this way.
I don't want to argue much. I simply believe it can be done better - essentially, that a compiler can select a good implementation for A (or even several different implementations) without much fuss on the programmer's side. If you don't believe that, fine; at least you won't compete with me in trying to figure it out. :)
P.S. Your example looks more like aspect-oriented programming to me (I don't know much about it, just a thought), having a set of big classes which are mostly accessed through interfaces that expose some part of them.
|
> 1. Is it still OOP? What happened to the encapsulation
> principle? (I know that the "correct" set of OOP

Sometimes, if efficiency is critical, you have to compromise.
I have been experimenting with some Java code for matrix multiplication. The classical algorithm with a fully encapsulated matrix achieves 90 MFLOPS, but code modified to take better advantage of the processor (the test matrices don't fit in the Level 2 cache, and it has 4 cores) manages 6000 MFLOPS. I don't think compilers for any language are yet able to make all the transformations involved in getting that performance gain automatically.
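For readers wondering what such modified code can look like, here is a generic cache-blocking sketch; the tile size of 64 and the single-threaded form are simplifications of mine, not the actual code behind the 6000 MFLOPS figure:

class MatMul {
    // Classical triple loop: walks b column-wise, which thrashes the
    // cache once the matrices no longer fit in it.
    static void multiplyNaive(double[][] a, double[][] b, double[][] c, int n) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) sum += a[i][k] * b[k][j];
                c[i][j] = sum;
            }
    }

    // Blocked (tiled) version: works on BLOCK x BLOCK tiles so the
    // operands stay resident in cache while they are being reused.
    static final int BLOCK = 64;
    static void multiplyBlocked(double[][] a, double[][] b, double[][] c, int n) {
        for (int ii = 0; ii < n; ii += BLOCK)
            for (int kk = 0; kk < n; kk += BLOCK)
                for (int jj = 0; jj < n; jj += BLOCK)
                    for (int i = ii; i < Math.min(ii + BLOCK, n); i++)
                        for (int k = kk; k < Math.min(kk + BLOCK, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + BLOCK, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
    }

    public static void main(String[] args) {
        int n = 512;
        double[][] a = new double[n][n], b = new double[n][n];
        double[][] c1 = new double[n][n], c2 = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = i + j; b[i][j] = i - j; }
        long t1 = System.nanoTime();
        multiplyNaive(a, b, c1, n);
        long t2 = System.nanoTime();
        multiplyBlocked(a, b, c2, n);
        long t3 = System.nanoTime();
        System.out.println("naive:   " + (t2 - t1) / 1000000 + " ms");
        System.out.println("blocked: " + (t3 - t2) / 1000000 + " ms");
        System.out.println("results match: " + (c1[n - 1][n - 1] == c2[n - 1][n - 1]));
    }
}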
|