The typed relational algebra
I introduced some time ago is more mature and I can now give more
examples (this is not all I'm showing today: you can find a link to the source
code below) that illustrate how this algebra addresses the problems I
identified in my previous post:
data model duplication (DB schema and object models)
overall brittleness as the DB schema is modified: tracking the schema changes in the application becomes increasingly hard as the codebase grows
the 1+N query issue
Declaring relations
As strange as it may seem, the following code is valid OCaml... with a twist:
I have extended the grammar using the camlp4 tool, included in the Objective
Caml system. camlp4 adds tremendous power to OCaml by allowing you to adapt the
grammar in a way a bit reminiscent of Lisp's macros.
This snippet defines the relations consumed by the relational operators I will
show later, generates code to create and verify the SQL schema (so the
application can verify that its assumptions about the DB are correct as it
starts), as well as a number of utility functions.
TABLE user users
COLUMN id SERIAL AUTO PRIMARY KEY
COLUMN name VARCHAR(64) UNIQUE
COLUMN age INT NULLABLE INDEXED
COLUMN password VARCHAR(64)
END
TABLE comment comments
COLUMN id SERIAL AUTO PRIMARY KEY
COLUMN title TEXT
COLUMN text TEXT
COLUMN created_at TIMESTAMPZ
COLUMN author SERIAL FOREIGN(users, id)
END
(As you can see, I've made no attempt to automate the pluralization of the row
names...)
Among the values generated by the above code, four are especially important:
those that represent the relations (named "users" and "comments") and those
that stand for the tables ("users_schema" and "comments_schema"). Why
establish this distinction? To put it simply, because we will be manipulating
relations that don't correspond directly to a table (e.g. a subset of the rows
from a table). So relations are consumed (and generated) by the relational
operators, and tables are used when we update or delete rows (insertion is
performed a bit differently, more on that later).
The important thing to keep in mind is that both relations and tables have
information about the columns (and also about the primary keys, for the
latter) encoded in their types, in such a way that we can check statically
that all queries are sound. The good news is that, in this case, "we" means
"the compiler", and "check statically" just means "compile", so this doesn't
require any additional work.
Here's the type of the "users" relation. You need not pay much
attention to this, what matters is that it includes all we need to type-check
the queries:
val users :
([ `User_age of int option | `User_id of int | `User_name of string | `User_password of string ],
[ `User_age | `User_id | `User_name | `User_password]) Relational.Relations.relation
Selection
Here's how you select rows from a relation (incidentally, I'm typing this in
the "ocaml" REPL; the output has been abbreviated for clarity):
# let u_18_to_40 = SELECT [User_age < (Some 40) AND User_age > (Some 18)] users;;
val u_18_to_40 :
([ `User_age of int option
| `User_id of int
| `User_name of string
| `User_password of string ],
[ `User_age | `User_id | `User_name | `User_password ])
Relational.Relations.relation =
[...]
In fact, we can abstract that, and create a function that selects the desired
rows from any relation with the appropriate columns:
# let u_18_to_40 x = SELECT [User_age < (Some 40) AND User_age > (Some 18)] x;;
val u_18_to_40 :
([> `User_age of int option ] as 'a, [> `User_age ] as 'b)
Relational.Relations.relation -> ('a, 'b) Relational.Relations.relation =
<fun>
This is a function and can thus be composed easily:
# let targets x = SELECT [User_name LIKE "%paul%"] (u_18_to_40 x);;
val targets :
([> `User_age of int option | `User_name of string ] as 'a,
[> `User_age | `User_name ] as 'b)
Relational.Relations.relation -> ('a, 'b) Relational.Relations.relation =
<fun>
Here starts the new stuff I hadn't shown before:
# print_endline (Relations.to_sql (targets users));;
SELECT "id", "name", "age", "password" FROM users
WHERE (("age" < 40) AND ("age" > 18)) AND ("name" LIKE '%paul%')
You don't want to manipulate the SQL directly, though (in fact, I might hide
the to_sql function at some point), because what you really need is a way to
access the values from the relation.