Next up, after a coffee break: Code Optimization with Adriaan van Os of Soops. He's working on their rule-based data warehouse application - a decentralized coupling of the electricity markets (Netherlands) with VW. The latter is calculation intensive.
Optimization is hard, and it's hard to predict where the problems are. Sometimes optimization leaves an application harder to extend and maintain. You can also run into problems with platform-specific code (using tools like VW).
Advice: Concentrate on the design first, and optimize if you have to. To analyze:
Time millisecondsToRun: ["code here"]
(Multi)TimeProfiler (VW)
(Multi)AllocationProfiler (VW)
If you do timings, run the test multiple times and average. Things that will pop out: LargeInteger use, running into the allocation ceiling. It's still hard to get consistent results. When you do look at the profiler output, you need to filter out areas you don't care about.
Bear in mind as well - there are "special" selectors in VW (and other Smalltalks) which get optimized. You can actually change that - look in DefineOpcodePool for details. There are also optimized selectors that will be inlined - see MessageNode class.
Even with all this, trust the results you get from actual profiling and timing - don't make assumptions about what "should" be faster.
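The "run it multiple times and average" advice can be sketched in a workspace roughly like this. The workload and the run count are made up for illustration (they are not from the talk), and the sums are kept small enough to stay out of LargeInteger territory:

```smalltalk
"Hypothetical workload and run count, for illustration only"
| runs workload timings average |
runs := 10.
workload := [100 timesRepeat: [
    (1 to: 10000) inject: 0 into: [:sum :each | sum + each]]].

"Collect one timing per run instead of trusting a single measurement"
timings := OrderedCollection new: runs.
runs timesRepeat: [timings add: (Time millisecondsToRun: workload)].

"Average the collected timings"
average := (timings inject: 0 into: [:sum :each | sum + each]) / runs.
Transcript show: 'average ms: ', average asFloat printString; cr.
```

Even averaged numbers will move around between sessions, so treat them as a rough guide rather than a benchmark.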
It helps to keep Blocks "clean" - declare variables in the innermost scope, and don't return from inside them. With numbers, avoid fractions and large integers unless you really need them, and use Doubles. Also, this:
10.0 * 10
is faster than
10 * 10.0
Put the operand with the higher generality (here, the Double) first. With collections, avoid intermediate collections and repeated iterations. Pre-size them if you know roughly how big they will get.
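The collection advice might look something like this in practice. This is my own sketch, not one of the speaker's examples; the data and the pre-size of 500 are assumptions chosen for illustration:

```smalltalk
| numbers twoPass onePass |
numbers := (1 to: 1000) asOrderedCollection.

"Two iterations, plus an intermediate collection created by select:"
twoPass := (numbers select: [:n | n even]) collect: [:n | n * n].

"One iteration, into a result pre-sized to about the expected size"
onePass := OrderedCollection new: 500.
numbers do: [:n | n even ifTrue: [onePass add: n * n]].
```

Both versions produce the same elements; whether the second is actually faster on a given release is something to time rather than assume.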
Another thing to bear in mind: optimization "tricks" (like message A is faster than message B) can change across releases of your Smalltalk product. It also helps to pick the right collection (e.g., IdentitySet or Set, IdentityDictionary or Dictionary) based on what keys you plan to use.
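A small sketch of why the key type matters when picking between the identity and equality variants. The keys and values here are invented for illustration:

```smalltalk
| bySymbol byString byIdentity symbolHit stringHit identityHit |

"Symbols are unique, so identity comparison is enough for Symbol keys"
bySymbol := IdentityDictionary new.
bySymbol at: #price put: 42.
symbolHit := bySymbol at: #price.

"Equal Strings are usually distinct objects, so String keys need equality"
byString := Dictionary new.
byString at: 'price' put: 42.
stringHit := byString at: 'price' copy.

"The same lookup by identity misses, because the copy is a different object"
byIdentity := IdentityDictionary new.
byIdentity at: 'price' put: 42.
identityHit := byIdentity includesKey: 'price' copy.
```

Here `symbolHit` and `stringHit` are both 42, while `identityHit` is false: the identity collections are only safe when the keys really are the same objects.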
If you have lots of conditional statements, you might well have too few classes and too little polymorphism. If you need a "case" statement, put the common cases at the top.
Caches often help, but keep them simple, and make lookups fast. You don't want to give back in lookup time what you saved by caching.
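A minimal sketch of a simple cache along those lines, assuming a Dictionary keyed by small integers (cheap to hash and compare). The "expensive" computation is a stand-in I made up:

```smalltalk
| cache slowSquareSum cachedSquareSum first second |
cache := Dictionary new.

"Hypothetical expensive computation: sum of squares from 1 to n"
slowSquareSum := [:n |
    (1 to: n) inject: 0 into: [:sum :each | sum + (each * each)]].

"at:ifAbsentPut: evaluates the block only when the key is missing"
cachedSquareSum := [:n | cache at: n ifAbsentPut: [slowSquareSum value: n]].

first := cachedSquareSum value: 500.    "computed and stored"
second := cachedSquareSum value: 500.   "answered from the cache"
```

If the keys were expensive to hash or compare, the lookup could easily eat the savings, which is the point of keeping the cache simple.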
For GemStone, most of this applies, but there are a few more things:
- Objects may be on disk, not in memory
- make use of identity because of that
- Objects may be distant (network/disk) - minimize copying/replicating
- GC is harder with shared objects
Now he's giving some examples, with results. I'm not copying that all down :) The idea here is, check the results!
Recommendation: Travis Griggs' StS talk.