The Artima Developer Community

Agile Buzz Forum
AOStA

James Robertson


AOStA Posted: Jul 23, 2003 12:30 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: AOStA
Feed Title: Cincom Smalltalk Blog - Smalltalk with Rants
Feed URL: http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml
Feed Description: James Robertson comments on Cincom Smalltalk, the Smalltalk development community, and IT trends and issues in general.

Via Niall Ross:
AOStA, Eliot Miranda, Cincom

Eliot started with the 'It's nice when things just work' car ad. Smalltalk has optimisations (ifTrue:, etc.). To make it viable for 1980s hardware, implementations cached native code. Thus each send site holds a memo of the last send it performed: an in-line cache of the method that was right for the receiver's class last time, so you need not repeat the lookup for that receiver class. As a side effect, you have also collected type information about your program.
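The in-line cache idea can be sketched as follows. This is an illustrative simulation in Python, not Smalltalk or VM code; all class and method names here are invented. Each send site remembers the receiver class and method from its last send, so repeated sends to the same class skip the full method lookup:

```python
# Monomorphic in-line cache sketch: a send site memoises the last
# (receiver class, method) pair and only does a full lookup on a miss.
class SendSite:
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None   # class seen on the last send
        self.cached_method = None  # method found for that class
        self.lookups = 0           # full lookups performed, to show the cache working

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:      # cache miss: full lookup, refill cache
            self.lookups += 1
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        return self.cached_method(receiver, *args)

class Point:
    def __init__(self, x, y): self.x, self.y = x, y
    def magnitude(self): return (self.x ** 2 + self.y ** 2) ** 0.5

site = SendSite('magnitude')
results = [site.send(Point(3, 4)) for _ in range(1000)]
print(site.lookups)   # 1: only the first send did a full lookup
```

Note the side effect the talk points out: after running, `site.cached_class` is exactly the type information the program has revealed about that send site.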

Traditional static typing uses type information to optimise compiled code. This does not map onto Smalltalk: exploratory programming and concrete types do not mix. Another issue is that primitive types are costly to type-check. Peter Deutsch added go-faster primitives (assume the type is right) but it was not clear how to generate them until Self. Self had no direct slot access and no inlining, so a simple in-line cache did not work, and Self could not infer much information by looking at the bytecodes. Thus Polymorphic Inline Caches were introduced (see Eliot's earlier talk, described at length in my report on ESUG 2000 in Southampton), since Self has more polymorphism than Smalltalk (but Smalltalk has quite enough to make PICs valuable). The first call still does the full lookup. Later calls use the table, which grows to some maximum such as 8 entries; after that you fall off the end and do the lookup as you would for a first call on that receiver. Now you have type info for all of the sites except where the table overflows, which is rare (< 2%). You can use that type information to build optimised methods (this needs dynamic deoptimisation for the debugger and for redefinition). The Self VM that used this shrank from 26,000 LOC to 11,000 and gave much more predictable performance. However, the compiler still showed pauses and had a large cache.

Mid-90s, the Animorphic team solved this in applying it to Smalltalk and showed a prototype in 1996 at OOPSLA. This took 5 years of effort and Eliot just does not have the time to repeat this effort (original was lost due to the sad pause of 'DarkPlace DodgyTalk'). However Animorphic's coder is now available to be looked at and some good relevant books have been written. Hobbes (Smalltalk in Smalltalk) also gives information. Thus it would be nice to do it and to do it in Smalltalk. This gives the rationale for AOStA. Eliot and Gilad Bracha will try to do it in Smalltalk.

AOStA moves the VM's complexity and the housekeeping into Smalltalk. Bytecode -> Static Single Assignment form -> type analysis -> faster bytecode. The result is faster because of inlining, optimised primitives (no type checks), using registers as temps, and unboxing floating-point operations (see e.g. the discussion section in my ESUG 2000 report).
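The bytecode-to-bytecode idea can be sketched in miniature. This is a toy in Python with invented instruction names, not AOStA's actual representation: given PIC data saying a `+` send is always to a SmallInteger, a rewrite pass replaces the generic send with an unchecked inlined primitive:

```python
# Toy bytecode -> faster-bytecode pass, guided by per-site type info.
# type_info maps a bytecode index to the receiver class the PIC recorded.
def optimise(bytecodes, type_info):
    out = []
    for pc, instr in enumerate(bytecodes):
        if instr == ('send', '+') and type_info.get(pc) == 'SmallInteger':
            out.append(('int_add_no_check',))   # inlined primitive, check eliminated
        else:
            out.append(instr)
    return out

method = [('push_temp', 0), ('push_temp', 1), ('send', '+'), ('return_top',)]
faster = optimise(method, {2: 'SmallInteger'})  # PIC says: send at pc 2 is monomorphic
print(faster[2])   # ('int_add_no_check',)
```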

The send-site counter used to be in the method prologue, but a counter in the callee cannot distinguish calls coming from different sites. A simpler choice is therefore to count branches, and it is faster. The optimiser must not have the code cache change while it is counting, so it must freeze and thaw the cache while counting and while applying the result of counting. The performance hit is 15% (10% from loss of static type prediction and 10% from counting, when each is measured individually, so Eliot doesn't understand why it only causes a 15% hit when you do both, though of course he's pleased about it).
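Branch counting as a hotness trigger can be sketched like this. The threshold and all names are illustrative, not from the actual implementation: the interpreter counts taken backward branches (i.e. loop iterations) per location, and a method whose count passes the threshold becomes an optimisation candidate:

```python
# Hotness detection by counting backward branches rather than method entries.
HOT_THRESHOLD = 100   # illustrative value, not from the talk

class Profiler:
    def __init__(self):
        self.branch_counts = {}   # (method, pc) -> times the branch was taken
        self.hot = set()          # methods handed to the optimiser

    def backward_branch(self, method, pc):
        key = (method, pc)
        n = self.branch_counts.get(key, 0) + 1
        self.branch_counts[key] = n
        if n >= HOT_THRESHOLD:
            self.hot.add(method)  # candidate for bytecode-to-bytecode optimisation

profiler = Profiler()
for _ in range(150):              # a loop body running 150 iterations
    profiler.backward_branch('sumLoop', pc=7)
print('sumLoop' in profiler.hot)  # True
```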

Bytecode-to-bytecode compilation is easy since bytecode is well structured. Java dynamic optimisers have a much harder job showing gains because of primitive types; the result is that Smalltalk has much low-hanging fruit to pick. In-lined primitives can speed up at: calls and instvar initialisation by a factor of 3.5. Loops can show factor-of-10 gains.

Eliot showed a loop calling at: in Smalltalk and in bytecode. You must check whether the receiver is immediate or a pointer, and whether the index is a SmallInteger or not; then you must add the fixed field size to the index, etc. The whole takes 50+ instructions on an x86 machine. Eliot then showed the optimisation. The VM presents the type information as a literal array of branch and send data (the bytecode number, then a count for a branch, or class and method for a send). This and the Static Single Assignment form guide the optimisation.
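The guard sequence a generic at: must run can be spelled out explicitly. In the VM these are machine instructions; here each check is a line of Python for illustration, with invented class names, followed by what the optimised version collapses to once the PIC has proven the types:

```python
# The checks hidden inside a generic at: send, made explicit.
class ObjArray:
    def __init__(self, fixed_fields, elements):
        self.fixed_fields = fixed_fields   # named instance variables before the indexed part
        self.elements = elements

def generic_at(receiver, index):
    if not isinstance(receiver, ObjArray):          # receiver immediate / wrong kind?
        raise TypeError('does not understand at:')
    if not isinstance(index, int):                  # index a SmallInteger?
        raise TypeError('index not an integer')
    if not (1 <= index <= len(receiver.elements)):  # bounds check (1-based, as in Smalltalk)
        raise IndexError('index out of bounds')
    return receiver.elements[index - 1]             # only now: the actual fetch

# With type info proving receiver and index, the optimised method keeps
# only the fetch (the bounds check may remain as a trap in practice):
def inlined_at(receiver, index):
    return receiver.elements[index - 1]

arr = ObjArray(fixed_fields=2, elements=[10, 20, 30])
print(generic_at(arr, 2))   # 20
print(inlined_at(arr, 2))   # 20, without the per-send checks
```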

SSA form eliminates variables in favour of always knowing the type of any value. Reasoning over the SSA of his example, Eliot showed how the VM could deduce the possible values of a variable and so eliminate a type check. The correctness of a type assumption at a call can be guarded much as a PIC checks types (e.g. the VM deduces the receiver must be an array and traps if it is not).
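The SSA reasoning can be illustrated in a toy form. The representation below is invented for illustration: because each SSA value is defined exactly once, a class check proven on a value covers every later use of it, so repeated checks on the same value can be dropped:

```python
# Toy redundant-check elimination over an SSA-like instruction list.
def eliminate_checks(ssa):
    proven = set()   # (value, class) facts established so far
    out = []
    for instr in ssa:
        if instr[0] == 'check_class':
            _, value, cls = instr
            if (value, cls) in proven:
                continue                # the SSA value cannot have changed: drop it
            proven.add((value, cls))
        out.append(instr)
    return out

ssa = [
    ('check_class', 'v1', 'Array'),
    ('at', 'v1', 'v2'),
    ('check_class', 'v1', 'Array'),     # redundant: v1 is the same single-assignment value
    ('at', 'v1', 'v3'),
]
optimised = eliminate_checks(ssa)
print(len(optimised))   # 3: the second check is gone
```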

You don't want to unbox floating-point values aggressively lest you thrash between implementations. Instead, the work is done on intermediate representations. The idea is to eliminate the boxing of the intermediate a + b in an overall a + b + c and box only the final answer. The code generator will not generate the intermediate stack value at all (except for the debugger when needed) but will map it directly onto the floating-point registers. Generally, complex heuristics are needed to avoid thrashing between bad (borderline?) guesses about what can be optimised. These can be handled much better at the Smalltalk level.
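The saving can be made concrete with a toy allocation counter. Here a `Box` wrapper stands in for a heap-allocated Float object (all names are illustrative): the naive form of a + b + c allocates a box for the intermediate sum, while the unboxed form keeps intermediates as raw floats and boxes only the final answer:

```python
# Counting box allocations for a + b + c, naive vs. unboxed-intermediate.
class Box:
    allocations = 0
    def __init__(self, value):
        Box.allocations += 1      # stands in for a heap allocation in the VM
        self.value = value

def naive_sum3(a, b, c):          # boxes the intermediate a + b
    return Box(Box(a.value + b.value).value + c.value)

def unboxed_sum3(a, b, c):        # intermediates stay raw; one box for the answer
    return Box(a.value + b.value + c.value)

a, b, c = Box(1.5), Box(2.5), Box(4.0)
Box.allocations = 0
naive_sum3(a, b, c);   naive = Box.allocations
Box.allocations = 0
unboxed_sum3(a, b, c); unboxed = Box.allocations
print(naive, unboxed)   # 2 1
```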

AOStA puts the optimising compiler in Smalltalk, etc. If the optimised guesses prove to be wrong, the code must revert (deoptimise) on the fly to the old PIC form. All this can be done in Smalltalk using contexts; we just give them methods that let them render themselves as deoptimised. (Test that deoptimising is perfect by partially evaluating the optimised form and the deoptimised form; the values on the stack must be the same.)
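The context-based deoptimisation idea can be sketched as follows. The structures are invented for illustration: an optimised frame carries a mapping from its own state back to the interpreter-level contexts it replaced, so that on a failed guard (or for the debugger) it can render itself as the unoptimised stack:

```python
# Deoptimisation sketch: an optimised frame reconstructs interpreter contexts.
class Context:
    def __init__(self, method, pc, stack):
        self.method, self.pc, self.stack = method, pc, stack

class OptimisedFrame:
    def __init__(self, registers, deopt_map):
        self.registers = registers   # live values, as "registers"
        self.deopt_map = deopt_map   # per inlined method: (resume pc, register names for its stack)

    def deoptimise(self):
        # rebuild one interpreter context per method the optimiser had inlined
        return [Context(method, pc, [self.registers[r] for r in regs])
                for method, (pc, regs) in self.deopt_map.items()]

frame = OptimisedFrame(
    registers={'r1': 42, 'r2': 'anArray'},
    deopt_map={'outer': (12, ['r2']), 'inlined': (3, ['r1'])})
contexts = frame.deoptimise()
print([(c.method, c.pc, c.stack) for c in contexts])
# [('outer', 12, ['anArray']), ('inlined', 3, [42])]
```

The consistency test the talk mentions corresponds to checking that the values in these reconstructed stacks match what the unoptimised interpreter would have produced at the same point.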

The optimising is easy to turn off, so that you can do training runs, save and then do e.g. an animation run. You can also instruct the system to optimise infrequently done things which matter to you. You could patch the optimiser dynamically (e.g. download someone's latest from the open repository) and carry on.

Eliot started this in October 2002. No floating-point work has yet been done, and no dynamic deoptimisation (except toy examples) has yet been done. Eliot has got far enough to discover that implementing a compiler in Smalltalk is a joy, and doable in the existing commercial context, which doing it in C++ was not. He has had interest from GemStone, Dan Ingalls and Paolo (GNU), and would be happy to hear from others with compiler smarts.

Read: AOStA

