I assume you already know what pure functional language means: a
language is purely functional if variables cannot be re-assigned,
data structures cannot be modified, and side effects are excluded.
Of course, it is impossible to program with a pure functional
language, since input and output are based on side effects and
you cannot have a sensible program without input and output.
However, a practical functional language can still be as pure as
possible if it is able to confine the non-functional aspects to input and
output only, in a controlled way.
The purest functional language
out there is probably Haskell; on a lower level of purity we find
the languages of the ML family (SML, OCAML, F#, ...); on a lower
level there is Scheme. Python and Common Lisp are at the same level,
both below Scheme.
I would not consider Python and Common Lisp as
truly functional languages: they are just imperative languages with
some support for functional programming (I mean constructs such as map,
filter, reduce, list comprehension, generators, et cetera).
However, there is a large gap between an
imperative language with some support for functional programming and a
true functional language.
True functional languages have strong support for recursion
(tail call optimization), for higher order functions and for pattern matching;
moreover, true functional languages are based on immutable data
structures. Scheme is somewhat less functional than SML and Haskell,
since Scheme lists are mutable, currying is not supported by the base
syntax of the language, (it can be implemented via macros, of course)
and generally speaking one uses higher order functions less.
While not pure, Scheme can be quite functional if
you avoid rebinding and you restrict yourself to functional data
structures, and it allows many typical idioms of functional programming
which have no counterpart in imperative programming. We already saw a common
trick in episode #5, i.e. the accumulator trick, which is a
way to avoid mutation in loops by using recursion. In this episode and
the next ones we will show many others.
Historically, the basic data types of Lisp languages (pairs and
lists) have always been mutable. In all Lisp dialects (including
Scheme) it has always been possible to modify the car and the
cdr of a pair freely. The situation changed with the
R6RS report: nowadays the imperative procedures set-car! and
set-cdr! have been removed from the rnrs environment.
Actually, it is still possible to mutate pairs, but only by
extending the rnrs environment, i.e. by importing the
(rnrsmutable-pairs) extension, which is part of the standard library
but not of the core language. This is a clear indication of the fact
that the functional paradigm is somewhat a recommended paradigm in Scheme,
even if it is not enforced (in strongly functional languages, such as SML
and Haskell, lists are immutable and there is no other option).
I should notice that the requirement of importing (rnrsmutable-pairs) is only valid for scripts and libraries: the behavior
of the REPL is unspecified in the R6RS document (actually the R6RS
forbids a REPL but every R6RS implementation provides a REPL with some
different semantics) and implementations are free to import or not to
import mutable-pairs in the REPL. The REPL of Ikarus imports
mutable-pairs by default, so you have at your disposal
set-car! and set-cdr!, but this is an implementation specific
choice; other implementations can behave differently from Ikarus at
the REPL, and in general they do.
Excluding code typed at the REPL, in principle the compiler could use
immutability optimizations for library code not importing the (rnrsmutable-pairs) extension. In practice, a Scheme compiler cannot
perform the optimizations based on the assumption of truly immutable
pairs, because the current standard says that cons must allocate a
new pair and cannot re-use a previously created one
(i.e. (eq?(consxy)(consxy)) is always false for any value of
x and y: even if the two conses have the same value, they
correspond to different objects).
If pairs were really immutable, a compiler could use the same object
for equivalent pairs, i.e. '(x.y) could be the same as (consxy) and a compiler could cache that value: with the current
standard that cannot never happen. The point is well explained in a
recent thread in comp.lang.scheme (Really immutable pairs).
Really immutable pairs (and thus lists) have lots of advantages: I would
welcome them in the standard, but I am not sure if that will happen,
since there is a potential compatibility breaking problem. I can only
hope for the best. Immutability has advantages from the point of view
of efficiency and makes the life of language implementors easier, but
those are not really important point for an application programmer.
The important point for the mere mortals is that programs based on
immutable data structures becomes easier
to understand and to debug. For instance, consider a routine taking
a list in input and suppose that the content of the list is not
what you would expect. If the list is mutable you potentially have to
wonder about your whole code base, since everything could have mutated
the list before reaching the routine you are interested in.
If the list if immutable, you are sure that the bug must be
in the procedure which created the list, and in no
other place.
Moreover, functional structures avoid whole classes of bugs (I am sure
every Pythonista has found some issue with lists being mutable,
especially when used as default arguments) especially in the hairy
situations of multithreaded code. In my Adventures I will never
rely on the ability to mutate pairs, and I will use pairs as
functional data structures.
There is apparently an issue with immutable data structures:
in many imperative programs one needs to
modify the data: but how can you update an immutable object?
The answer is actually pretty simple: you don't. Since you cannot mutate an
immutable object, the only option is to create a brand new object with
a different content from the original one. This mechanism is called
functional update: for instance, it is easy to define two functional
procedures set-car and set-cdr performing the job of
set-car! and set-cdr! but without mutation:
What if you want to (functionally) update the n-th value of a list?
The trick is to use recursion:
(define (list-set n lst value)
(if (zero? n)
(set-car lst value)
(cons (car lst) (list-set (- n 1) (cdr lst) value))))
> (list-set 2 '(a b c d) 'X)
(a b X d)
Notice that list-set is nicer than its imperative counterpart:
(define (list-set! n lst value)
(set-car! (list-tail lst n) value) lst)
> (define ls '(a b c d))
> (list-set! 2 ls 'X)
> ls
(a b X d)
Notice the use of the R6RS procedure (list-taillstn) which
returns the tail of lst starting from the n-th element.
The indexing starts from zero, as usual (for
instance (list-tail'(abcd)2) is the list (cd)).
The important bit to understand how list-set! works
is that list-tail returns the tail, and not a copy
of it: by mutating the tail you are actually mutating the original
list. This is clearly quite risky.
The R6RS standard does not provide primitives for functional update
out of the box (
list-set! is not in the standard since it is of very little
utility: if you want to be able to modify the n-th element of a
sequence, you are much better off by using a vector and not a list).
However, it does provide primitives to remove elements for a list
functionally:
(you can find other list utilities in R6RS standard library).
One would expect functional update to be much slower than imperative
update, because of the need of creating potentially long lists to
modify just a single element. However, this is not necessarily
true. It all depends on how smart your compiler is. A smart compiler
can internally use mutation
(at the machine level), so in principle it could be as fast as
imperative code. The advantage is that mutation is managed by the
compiler, not by the programmer, who can think purely in terms of
immutable data structures.
It is true that compilers for functional languages are still slower
than C compilers in average, but they are not so bad. In most
situations the slowdown is acceptable and in particular situations
compilers for functional languages can be faster than a C compiler.
Moreover, they allow for memory optimizations. For instance, you can
memoize an immutable list, whereas you cannot memoize a mutable
one.
Since I come from a Python background I do not care much about
performance and optimizations, but I care a lot about maintainability
of programs and bug reduction, as well as about conceptual cleanness.
Moreover, this series has also some pedagogical intention, therefore
I will prefer functional solutions over imperative ones here, in order
to show new ways of doing old things.