Summary
One of the trickiest things about Scheme is the distinction between interpreter semantics, which is typically
(but not always) used at the REPL, and compiler semantics,
which is typically (but not always) used in scripts.
The compilation and evaluation strategy of Scheme programs
One of the trickiest aspects of Scheme, coming from Python, is its
distinction between interpreter semantics and compiler semantics.
The problem is that the same program can be executed both with
interpreter semantics (typically when typed at the REPL)
and with compiler semantics (typically when run as a script), but the way the
program behaves is different. Moreover, there
are programs which are valid at the REPL but are rejected by the compiler.
To make things worse, the interpreter semantics is unspecified by the
R6RS report, whereas the compiler semantics is loosely specified,
so that there are at least three different and incompatible semantics
about how programs are compiled and libraries are imported:
the Ikarus/Ypsilon/IronScheme/MoshScheme one, the Larceny one and the
PLT one.
In other words, there is no hope of making programs with the
interpreter semantics portable; moreover, there are also plenty of
programs with compiler semantics which are not portable.
Fortunately the module system works well enough for most simple
cases. The proof is that we introduced the R6RS module system in
episode 5, and for 15 episodes we could go on safely by just using the
basic import/export syntax. However, once nontrivial macros enter
the game, things are no longer easy.
First of all, let me clarify what I mean by interpreter semantics
and compiler semantics. These terms have nothing to do with
being an interpreted or compiled language, since both Scheme interpreters
and Scheme compilers exhibit both semantics.
Compiler semantics means that a program has (at least)
two phases, the run-time phase and the expand-time phase: some
parts of the program are executed at expand time and other
parts are executed at run time. Scheme has
a generic concept of macro expansion time which is valid even
for interpreted implementations, where there is no compilation time.
Interpreter semantics means that a program is fully evaluated at
runtime, with no distinction between phases (for pure interpreters)
or with interleaved expansion and evaluation phases (for
incremental compilers).
For instance Ikarus and Ypsilon work as incremental compilers
at the REPL (I consider this as interpreter semantics, by stretching
the terminology) and as batch compilers for scripts (for Ypsilon
this is true only when the R6RS
compatibility flag is set).
Python works as an incremental compiler at the REPL (each time you enter
a function at the REPL it is compiled to bytecode, and you can
extract the bytecode by looking at the .func_code attribute) and
as a batch compiler for scripts.
Conceptually, in Python everything happens at runtime, including
bytecode compilation. While technically bytecode compilation
is cached, conceptually you may very well think that every module
is recompiled at runtime when you import it - which is actually what
happens if the module has changed in the meantime.
In short, you can consider Python as an interpreter (as it is usually
done) and there is no substantial difference between typing commands
at the REPL and writing a script. There are a few minor differences
actually, but they are not relevant for what I am discussing now.
Things are quite different in Scheme. The interpreter semantics is
not specified by the R6RS standard and is completely
implementation-dependent. It is also compatible with the standard not
to provide interpreter semantics at all, i.e. not to provide a REPL:
for instance PLT Scheme does not provide a REPL for R6RS programs
(it does provide a REPL for non-R6RS programs, which is actually quite
exceptional, since it uses compiler semantics and not interpreter
semantics!).
The compiler semantics, i.e. the expansion process
of Scheme source code, is (loosely) specified by the R6RS
standard and is used in libraries. The semantics used in scripts is not clear
(in the words of Will Clinger, there is no such thing as an R6RS-conforming
Scheme script, because Scheme scripts are described only
by a non-binding document that was never ratified).
The difference between the two semantics is most visible when you have
macros depending on helper functions. When a program is read with
interpreter semantics, everything happens at runtime: it is possible
to define a function and, immediately after, a macro using that
function.
When a program is read with batch compiler semantics, instead, all the
definitions and expressions are read, the macros are expanded and
the program is compiled, before execution.
Implementations have considerable freedom in what they are allowed
to do; for instance Ypsilon scripts use batch compiler semantics when the
--r6rs flag is set, but by default they use incremental
compiler semantics, just like the REPL. On the opposite side of the
spectrum, the PLT REPL (in non-R6RS mode) basically
uses batch compiler semantics.
In any case the behavior of code typed at the REPL is never identical to
the behavior of a script: for instance, at the REPL you can import
modules at any moment, whereas in a script you must import them at the
beginning. There are other, subtler differences, for instance in the
behavior of continuations. The bottom line is that you should not
believe your REPL blindly.
As I said, you see the problem with compiler semantics once you
start using macros which depend on auxiliary functions. More
generally, the same problem arises for any identifier which is used in
the right-hand side of a macro definition and not inside the
templates. For instance, consider this simple macro
which raises a compile-time exception (syntax-violation) if it is
invoked with duplicate arguments. Such a macro could be used as a
helper in macros defining multiple names at the same time, like
the multi-define macro of episode 9.
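Here is a sketch of how such a macro can be written with syntax-case (the details, such as the error message, are illustrative):

```scheme
;; expand to nothing if the arguments are distinct identifiers,
;; raise a compile-time error otherwise
(define-syntax assert-distinct
  (lambda (x)
    (syntax-case x ()
      ((_ arg ...)
       (if (distinct? bound-identifier=? #'(arg ...))
           #'(begin)
           (syntax-violation 'assert-distinct "duplicate name" x))))))

;; (assert-distinct a b) expands to (begin)
;; (assert-distinct a a) raises a syntax-violation at expand time
```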
assert-distinct relies on the builtin function
bound-identifier=?, which returns true when two identifiers
are equal and false otherwise (this is an extremely simplified explanation;
I refer you to the R6RS document for the gory details), and
on the helper function distinct?, defined as follows:
;; check if the elements of a list are distinct according to eq?
(define (distinct? eq? items)
  (if (null? items) #t ; no items
      (let+ ((first . rest) items)
        (cond
         ((null? rest) #t) ; single item
         ((exists (cut eq? first <>) rest) #f) ; duplicate
         (else (distinct? eq? rest)))))) ; look at the sublist
distinct? takes a list of objects and finds out whether they are all
distinct according to some equality operator, or if there are duplicates.
Here are a couple of test cases:
(test "distinct"
      (distinct? eq? '(a b c))
      #t)
(test "not-distinct"
      (distinct? eq? '(a b a))
      #f)
It is natural, when writing new code, to try things at the REPL and to
define first the function and then the macro. The problem is that the
code will work at the REPL; however, in R6RS-conforming implementations,
if you cut and paste from the REPL and convert the code into a script, you
will run into an error!
The explanation is that in compiler semantics macro
definitions and function definitions happen at different times. In
particular, macro definitions are taken into consideration before
function definitions, independently of their relative position in
the source code. Therefore our example fails to compile, since the
assert-distinct macro makes use of the distinct? function,
which is not yet defined at the time the macro is considered,
i.e. at expansion time. Actually, not only are functions unevaluated
at expansion time and unusable inside a macro: in general
the right-hand side of any definition is left unevaluated by the compiler.
This explains why (define x (/ 1 0)) is compiled correctly,
as we discussed in the previous article.
There are nonportable ways to avoid writing the helper
functions in a separate module. For instance, Ypsilon scripts by
default (unless the strict R6RS-compatibility flag is set) use
interpreter semantics and have no phase separation. On the other
end of the spectrum, mzscheme has very strong phase separation,
but it is still possible to define helper functions at expand time
without putting them in a separate module, using the nonportable
define-for-syntax form.
Nevertheless, the only portable way to make a function defined at
runtime available at expand time is to
define the function in a different module and to import it at
expand time.
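Concretely, the helper goes into its own library and the client imports it with the R6RS phase specification (for ... expand); the library name (my-lib helpers) here is made up for illustration:

```scheme
;; helper library
(library (my-lib helpers)
  (export double)
  (import (rnrs))
  (define (double x) (* 2 x)))

;; client script: import the helper at expand time,
;; so that macros can call it
(import (rnrs)
        (for (my-lib helpers) expand))

(define-syntax m
  (lambda (x)
    (double 1) ; called when the macro is expanded
    #'#f))

(m)
```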
Ikarus and Ypsilon use the semantics of an incremental compiler:
each top level block of code is compiled - to native code in Ikarus and
to bytecode in Ypsilon - and executed immediately. Each new definition
augments the namespace of known names at runtime, both for first class
objects and macros. Macros are both defined and expanded at runtime.
It is clear that the semantics of an incremental compiler is
very similar to the semantics of an interpreter; here is an example in
Ikarus, where a macro is defined which depends on a helper function:
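A session along these lines works at the Ikarus REPL (a sketch, not a verbatim transcript):

```scheme
> (define (double x) (* 2 x)) ; evaluated immediately
> (define-syntax m            ; at this point double is already known
    (lambda (x) (double 1) #'#f))
> (m)
#f
```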
However, an incremental compiler is not identical to an interpreter,
since internally it uses phase separation to compile blocks
of code; for instance in Ikarus, if you put the
previous definitions together in a single block, you get an error,
since the function double is known at run-time
but not at expand-time:
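For instance, a script like the following (a sketch) is rejected by Ikarus with an error about double being unbound at expand time (the exact message may vary):

```scheme
(import (rnrs))

(define (double x) (* 2 x)) ; a run-time definition

(define-syntax m
  (lambda (x)
    (double 1) ; but double is needed at expand time!
    #'#f))

(m)
```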
There are still Scheme implementations which are pure interpreters and
do not distinguish expand time from runtime at all; here is an example
in Guile (notice that Guile is not an R6RS implementation):
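Something like the following works in Guile, even as a single block (a sketch):

```scheme
(define (double x) (* 2 x))

(define-macro (m)
  (double 1)) ; the expansion of (m) is the value of (double 1), i.e. 2

(display (m)) ; prints 2
```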
I am using define-macro here, which is the built-in macro mechanism
in Guile: as you see, the function double is immediately available
to the macro, even if it is defined inside the same block as the macro,
which is not the case for any of the existing R6RS implementations.
Notice however that Guile also supports high level macros (via an external library)
with compiler semantics.
The interpreter semantics is the most intuitive and the easiest to
understand. In this semantics everything happens at runtime; the code
may still be compiled before being executed, as in an incremental
compiler, but this is an implementation detail: from the point of view
of the programmer the feeling is the same as using an interpreter -
modulo the tricky point mentioned in the previous paragraph.
The interpreter semantics is also the most powerful semantics of all:
for instance, it is possible to redefine identifiers and to import
modules at runtime, things which are both impossible in compiler
semantics.
If you look at it with honesty, the compiler semantics is basically a
performance hack: by separating compilation time from runtime you can
perform some computations only once (at compilation time) and gain
performance. This is not strange at all: compilers are performance
hacks. It is just more efficient to convert a program into machine
code with a compiler than to interpret it expression by expression.
The other main reason to favor compilers over interpreters, apart
from performance, is compile-time checking. Compilers are able to reject a
class of incorrect programs even before executing them.
Scheme compilers are traditionally not too strong in this respect, because of
dynamic typing and because of the design philosophy of the
language (be permissive, we will solve the errors later). Nevertheless,
with macros you can in principle add all the compile-time checks
you want (we just saw the check for distinct names):
it is even possible to turn Scheme into a typed language, like Typed Scheme.
Another (minor) advantage of the compiler semantics is that it makes
it easier for static tools to work with a program.
For instance, in Python an IDE cannot implement autocompletion of names in a
reliable way without having knowledge of the running program. In Scheme
an IDE can statically determine all the names imported by the program
and thus offer full autocompletion.
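The reader comment which follows discusses a situation of this shape (a hypothetical sketch; foo and macro are placeholder names):

```scheme
(define (foo)
  (let ((x 1))
    (macro) ; may or may not expand to a definition of x
    x))     ; which x is this? it depends on the expansion of (macro)
```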
If expansion happens before evaluation, then (macro) will either
1. expand to a definition of x, and in that case, the reference to x in the last line refers to the introduced definition and can be compiled accordingly. Or,
2. not expand to a definition of x, and in that case, x at the last line refers to the let-bound x and can therefore be compiled accordingly.
In interpreter semantics, the x at the last line may mean one thing in one call to foo and mean something else in another call. Thus, the x cannot be compiled (or even understood) by the implementation until evaluation reaches that point. And after the compiler understands what x means the first time, it cannot reuse that information the second time foo is called because the macro has to be expanded again, which may change the meaning of the program.
In essence, run-time macros shaft the compiler: code cannot be compiled before being executed. It has to be re-expanded and re-analyzed every time it is evaluated.
PLT Scheme does provide a "REPL for R6RS" (whatever that means, given that R6RS does not specify one):
- start DrScheme,
- choose the module language,
- type in #!r6rs,
- hit the "Run" button,
- get an R6RS repl.
> PLT Scheme does provide a "REPL for R6RS" (whatever that means, given that R6RS does not specify one): start DrScheme, choose the module language, type in #!r6rs, hit the "Run" button, get an R6RS repl.

Then I stand corrected (I do not remember where I read that PLT had no REPL for R6RS mode). Question: do I get the R6RS REPL if I use mzscheme from the command line and do not use DrScheme?
> In essence, run-time macros shaft the compiler: code cannot be compiled before being executed. It has to be re-expanded and re-analyzed every time it is evaluated.

You are of course right; I should have expressed the concept better, but I am sure my readers will not be confused.