Summary
In this episode I will explain the meaning of the "code is data" concept. To this aim I will discuss the quoting operation which allows to convert a code fragment into a list of symbols and primitive values - i.e. converts code into data. Then, I will discuss the issue of evaluating data as code.
A distinguishing feature of the Lisp family of languages is the existence
of a quoting operator denoted with a quote ' or with (quote),
the first form being syntactic sugar for the second.
The quoting operator works as follows:
on primitive values such as numbers, literal strings, symbols, etc, it
works as an identity operator:
> '1
1
> '"hello"
"hello"
> ''a
'a
expressions are converted into lists;
for instance '(display"hello") is the list
Hence every Scheme/Lisp program admits a natural representation as
a (nested) list of symbols and primitive values: code is data. On
the other hand, every nested list of symbols and primitive values
corresponding to a syntactically valid Scheme/Lisp programs can be
executed, both at runtime - with eval - or at compilation time
- through macros. The consequences are fascinating: since every program
is a list, it is possible to write programs that, by building lists,
build other programs. Of course, you can do the same in other
languages: for instance in Python you can generate strings
corresponding to valid source code and you can evaluate such strings
with various mechanisms
(eval, exec, __import__, compile, etc). In C/C++
you can generate a string, save it into a file and compile it to
a dynamic library, then you can import it at runtime;
you also have the mechanism of pre-processor macros at your disposal
for working at compile time. The point is that there is no language where
code generation is as convenient as in Scheme/Lisp where it is buil-in,
thanks to s-expressions or, you wish, thanks to parenthesis.
The backquote or (quasiquote) syntax ` introduces a list to be
interpolated (template); it is possible to replace some
variables within the template, by prepending to them the unquoting
operator (unquote) or ,, denotated by a comma. In
our case we are unquoting the user name, ,user. The function
say-hello takes the user name as a string and returns a list
containing the string "hello" together with the username.
There is another operator like unquote, called
unquote-splicing or comma-at and written ,@, which works as follows:
> (let ((ls '(a b c))) `(func ,@ls))
(func a b c)
In practice ,@ls "unpacks" the list ls into its components: without
the splice operator we would get:
> (let ((ls '(a b c))) `(func ,ls))
(func (a b c))
The power of quasiquotation stands in the code/data equivalence:
since Scheme/Lisp code is nothing else than a list, it is easy
to build code by interpolating a list template.
For instance, suppose we want to evaluate a Scheme expression
in a given context, where the contexts is given as a list of
bindings, i.e. a list names/values:
(eval-with-context '((a 1)(b 2) (c 3))
'(* a (+ b c)))
How can we define eval-with-context? The answer is by eval-uating
a template:
Notice that eval requires a second argument that specifies the language
known by the interpreter; in our case we declared that eval understands
all the procedures and macros of the most recent RnRS standard (i.e.
the R6RS standard). The environment specification has the same syntax
of an import, since in practice it is the same concept: it is
possible to specify user-defined modules as the eval environment.
This is especially useful if you have security concerns, since you
can run untrusted code in a stricter and safer subset of R6RS Scheme.
eval is extremely powerful and sometimes it is the only possible
solution, in particular when you want to execute generic code at
runtime, i.e. when you are writing an interpreter. However, often
you only want to execute code known at compilation time: in this case
the job of eval can be done more elegantly by a macro. When that
happens, in practice you are writing a compiler.
Once you realize that code is nothing else than data, it becomes easy
to write programs taking as input source code and generating as output
source code, i.e. it is easy to write a compiler.
For instance, suppose we want to convert the following code fragment
(begin
(define n 3)
(display "begin program\n")
(for i from 1 to n (display i)); for is not defined in R6RS
(display "\nend program\n"))
which is not a valid R6RS program into the following program, which is
valid according to the R6RS standard:
(begin
(define n 3)
(display "begin program\n")
(let loop (( i 1)) ; for loop expanded into a named let
(unless (>= i n) (display i) (loop (add1 i))))
(display "\nend program\n")))
begin is the standard Scheme syntax to group multiple expressions
into a single expression without introducing a new scope (you may
introduce a new scope with let) and preserving the evaluation order
(in most functional languages the evaluation order is unspecified).
More in general, we want to write a script which is able to convert
where the expressions may be of kind for or any other kind not
containing a subexpression of kind for.
Such a script can be thought of as a preprocessor
expanding source code from an high level language with a primitive
for syntax into a low level language without a primitive
for. Preprocessors of this kind are actually very primitive
compilers, and Scheme syntax was basically invented to make the
writing of compilers easy.
In this case you can write a compiler expanding for expressions
into named lets as follows:
(import (rnrs) (only (ikarus) pretty-print))
;; a very low-level approach
(define (convert-for-into-loop begin-list)
(assert (eq? 'begin (car begin-list)))
`(begin
,@(map (lambda (expr)
(if (eq? 'for (car expr)) (apply convert-for (cdr expr)) expr))
(cdr begin-list))))
; i1 is subject to multiple evaluations, but let's ignore the issue
(define (convert-for i from i0 to i1 . actions)
;; from must be 'from and to must be 'to
(assert (and (eq? 'from from) (eq? 'to to)))
`(let loop ((i ,i0))
(unless (>= i ,i1) ,@actions (loop (+ i 1)))))
(pretty-print
(convert-for-into-loop
'(begin
(define n 3)
(display "begin program\n")
(for i from 1 to n (display i))
(display "\nend program\n"))))
Running the script you will see that it replaces the for expression
with a named let indeed. It is not difficult to extend the compiler
to make it able to manage sub-expressions (the trick is to use
recursion) and structures more general than begin: but I leave
that as an useful exercise. In a future episode I will talk of
code-walkers and I will discuss how to convert generic source code.
In general, one can convert s-expression based source code by using
an external compiler, as we did here, or by using the built-in mechanism
provided by Scheme macros. Scheme macros are particularly powerful,
since they feature extremely advanced pattern matching capabilities:
the example I have just given, based on the primitive list operations
car/cdr/map is absolutely primitive in comparison.
The next episode will be entirely devoted to macros. Don't miss it!
In the latest episode I asked you to write an equivalent (for
lists) of Python built-ins enumerate and zip. Here I
give my solutions, so you may check them with yours.
Tcl does not have a syntax based on s-expressions, so if you want to manipulate source code you need a parser. Only for languages of the Lisp family the Abstract Syntax Tree is explicit. This is the important feature that makes code generation easy.
Here's my implementation of enumerate (untested). Added bonus: free range function.
; creates a list of numbers from a to b
; (range 2 5) => (2 3 4)
(define (range a b)
(if (>= a b)
'()
(cons a (range (+ a 1) b))))
(define (enumerate xs)
(zip (range 0 (length xs)) xs))
It seems, that there is the missing unquote operator just before i1 at the last line of the convert-for procedure.
Fixed version:
(define (convert-for i from i0 to i1 . actions) ;; from must be 'from and to must be 'to (assert (and (eq? 'from from) (eq? 'to to))) `(let loop ((i ,i0)) (unless (>= i ,i1) ,@actions (loop (+ i 1)))))
I went a bit further with enumerate and zip. My versions mimic the behavior of the python functions. enumerate returns an object that has a next method that generates 1 pair at a time of an index and a value. In scheme returning a function is common solution. My zip works with lists with different lenghts.
Eh, your are cheating here! I meant solutions using only the concepts explained until now. So, not set! e no call/cc please. I don't want my readers scared to death! ;-)
> The point is that there is no language where code generation > is as convenient as in Scheme/Lisp where it is buil-in, > thanks to*s*-expressions or, you wish, thanks to parenthesis.
I tend to think this is a funny short circuit that presents a problem as a solution. The problem is that people don't like to read and manipulate ASTs a lot. Really. So give them a disgusting program notation where this is the only mode of doing things and hence turn it into a convenience.
Self-modifying programs are a bygone concept in computer science. Modern programming languages preclude this ability, and good assembly language practice also avoids such tricks. It is ironic that a programming language attempting to open a new era in computer programming opens the door to such arcane techniques,... -- Sterling and Shapiro/1994 [The Art of Prolog]