Using and Abusing XML and XSL, Joseph Pelrine, MetaProg
Joseph is in charge of SUnit. Keeping it up-to-date on all dialects was what prompted him to get into this area via the Rosetta tool. He has now resuscitated Ginsu (configuration management tool; see Joseph's talk in my ESUG 2001 report) into Rosetta. Rosetta uses XML to rewrite exported code from one dialect to another. Ginsu lets instance variables carry additional information, e.g. type constraints as needed by GemStone.
Joseph has used XP for longer than it has existed (he is an old friend of Kent and Ward) and has been using SCRUM for years. He finds SCRUM gives him all the things he found missing in XP. There is a regular SCRUM meeting at which everyone answers three questions: what have I done, what am I doing, what problems do I have. (I was reminded of the Imperial government in India in the Victorian age. It had a very small proportion of bureaucrats, and each official had only one form to fill in at the end of each month, on which they said what they expected to happen next month as regards tax collection, law and order, etc., and described what had happened in those categories last month, comparing it to what they had predicted on the previous form.) Joseph told the joke about the chicken and the pig who planned to start a restaurant. "Let's call it 'Bacon and Eggs'", says the chicken. "No way", replies the pig. "You'd be involved, I'd be committed". In SCRUM meetings, only pigs (the committed) can speak; chickens can't even squeak (and no making faces).
XML separates data from presentation of that data: it defines and structures data into custom formats. The price is that you must transform data from one format to another and that is the purpose of XSL. XSL is an XML-based language so it can also be manipulated. XSLT is the transformation-describing language. You can also restructure XML in other ways (by using SAX or by doing DOMs in programming languages).
- XSL Formatting Objects describe the formatting semantics.
- XPath (hideous to read) is the syntax for selecting nodes in an XML doc.
This course is about using XSLT and XPath. XML uses tags (e.g. <table>), elements (everything between two tags inclusive), attributes (e.g. <table ... = ...>, space delimited), entities (e.g. &nbsp; for a non-breaking space, &lt; for the less-than sign), CDATA and PCDATA. See his SUnit.ExampleSetTestCase on the slide (slides are on Joseph's website www.metaprog.com). It shows a test suite result and the times each test took to run.
XML can be well-formed or not; many browsers tolerate very bad markup (missing tags, un-nested elements, ...). Use www.validator.org to see if a URL's HTML is well-formed. Joseph has tested back to Netscape 2.0 and verified that browsers will not usually break on well-formed HTML (!!!), but Netscape 4 has a bad reputation and he notes it is an issue to bear in mind.
A DTD or a schema is an abstract definition to which a document can be required to adhere. DTDs are not XML-based and have quirky syntax but are easy to write. Schemas are XML-based but less easy to write. Joseph's rule is to think about how he would approach parsing the document. If he would have to worry about scanning then he uses a schema. If the tokenizing is trivial then he thinks it safe to use a DTD. XML Spy is the tool he uses to generate a first-cut schema which he can customise to his needs. (Read these notes with Joseph's slides as they have reference data for DTD layout, etc., that I have not attempted to reproduce.)
There is as yet no published SUnit or JUnit DTD. Joseph reverse-engineered a DTD from some JUnit stuff and has applied this to SUnit. Thus we can generate SUnit html reports that look just like JUnit ones. JUnit DTD pumps out 50 system properties (what version of this, that and the other you're running on) which he has still to add. Errors have their info plus some CDATA (currently a stack dump). PCDATA means the field must hold character data. His slides summarised the regex-like rules for element DTD definition (one, zero or more), alternatives (this or that), defaults, e.g.
...
Entities can be external (as in URL-ref example above) or defined in the DTD. Often entities are defined in a DTD referenced by others so that when they change they change only in one place. It can be hard to track down a definition.
XML Namespaces deal with element name collisions. XSLT relies on namespaces.
XML Tools: there is a big industry:
- XML Spy is Joseph's favourite (from altova.com). It is not cheap (various levels: $400 for the version he uses). Joseph demoed by writing S# in XML Spy; S# has a DTD so the tool told him what type of thing he had to put next, etc. It looked very pleasant to use; popup menus prompted you with the list of things you could have at each point.
- XPath Visualizer (Dimitry's tool): Joseph demoed by opening on a file and typing in XPath expressions; the tool highlighted what was selected. He strongly recommends it for checking XPath expressions if you ever need to write any. (Limitation: tool only runs on IE.)
There are various XSLT Tools:
- Xalan came from IBM's alphaWorks lab and is now open source on apache.org. The Java version, XalanJ, is included in the standard zip and has its own weird way of writing XSLT docs. The C++ version, XalanC, lags behind XalanJ.
- Saxon and MSXML are other XSLT tools.
- XSLTProc is Joseph's preferred tool. It is written in C, is very fast and is the only one that supports the XSLT 1.1 spec. The 1.1 spec lets you output multiple documents, which is the key reason why Joseph uses it: ObjectStudio, for example, has each class in a separate file, as does S#, so Rosetta must be able to do file -> many files.
Joseph writes tests to verify that users of his utilities have the base methods they need. For example:
    self
        assert: (anSUnitClass selectors includes: #aMethod)
        description: 'SUnit requires the method aMethod on class ', anSUnitClass name, '. Please create it with the following implementation....'.
(This is part of Joseph's Common Base Smalltalk work.)
As an example of a DTD, he then went through Rosetta's concepts.
- Cluster is what he calls a composite CM concept (it is a Cluster in VSE and Rosetta, a Bundle in VW, and is similar to a config map in Envy)
- Package is the leaf CM concept (a package in VW, an Application or SubApplication that has no SubApplications in Envy)
- Subsystem
- Program
- Module specification: timestamped as digits only (collates well in files), exporting dialect, moduleType (one of the above list items)
Rosetta constructs an XML DOM which can handle forward references.
XPath is used to define paths within an XML document: ways to get to some place and ways to find things. It has a library of standard functions. It is a major element of XSLT and is not written in XML. Joseph tried to avoid XPath when he started but when his tasks got complex he had to use it (and found that comp.lang.xml has some good people who will comment on small examples). XPath can't group (e.g. group all my stuff by class) and he had to work around that.
XPath uses pattern expressions to select nodes in an XML document:
- /roottag gets you to the root element tagged <roottag>
- /roottag/tag gets you a nodeset containing all the <tag> elements within that root
- For example, you might use /catalog/cd/price to get all prices of CDs in a catalog. You can also skip the intervening tags and just type //price to get all price elements in the document, whereas /catalog/cd/* will get all nodes within CDs in the catalog, and /catalog/*/price gets all price elements within any child of the root catalog element.
- Predicates are shown in square brackets. They let you select nodes, e.g. /catalog/cd[1] (think cd at: 1), /catalog/cd[last()] (think cd last; there is no first() function, just cd[1]), /catalog/cd[price=10] (think of this as a cd select: [:each | each price = 10])
- using alternate symbol | you can select several paths.
- using @ you select attributes, //cd[@country='UK']
XPath syntax is SQL written by aliens on a bad day.
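The path expressions above can be tried out in a few lines of Python. This is only a sketch: the catalog document is invented for illustration, and Python's standard xml.etree.ElementTree supports just a subset of XPath (no | union, and it compares text as strings rather than numbers), so it is a stand-in for a real XPath engine rather than anything Joseph showed.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document, invented for this illustration.
CATALOG = """
<catalog>
  <cd country="UK"><title>Alpha</title><price>10</price></cd>
  <cd country="US"><title>Beta</title><price>12</price></cd>
  <cd country="UK"><title>Gamma</title><price>10</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element


def titles(cds):
    """Return the title text of each cd element in the list."""
    return [cd.findtext("title") for cd in cds]


# /catalog/cd/price : ElementTree paths are relative to the element they are
# called on, so from the root element this is written "cd/price".
prices = [p.text for p in root.findall("cd/price")]

# //price : every price element anywhere below the root.
all_prices = [p.text for p in root.findall(".//price")]

# /catalog/cd/* : every child element of every cd.
child_tags = [e.tag for e in root.findall("cd/*")]

# /catalog/cd[1] and /catalog/cd[last()] : positions count from 1.
first_cd = titles(root.findall("cd[1]"))
last_cd = titles(root.findall("cd[last()]"))

# /catalog/cd[price=10] : ElementTree compares the text, hence the quotes.
ten = titles(root.findall("cd[price='10']"))

# //cd[@country='UK'] : select on an attribute value.
uk = titles(root.findall(".//cd[@country='UK']"))

print(prices, first_cd, last_cd, ten, uk)
```

Every query returns a list, which matches the XPath habit of answering with a node set even when only one node can match.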
Stepping to a location starts from an axis (a relationship between nodes: am I facing up, down, left or right when I start stepping) and a test (what nodes am I looking for) and much else (see slides). It is best understood by mapping it into Smalltalk:
ancestor = superclass, child = subclass, descendant = allSubclasses, following = sibling, self (is the same word, thank god!! :-)).
- child::cd selects all children of the current node that are cds, while ancestor-or-self::cd gets owning node or start node if either is a cd.
- /descendant::cd[position()=7] gets the seventh CD in the document
- there are various unix-like abbreviations (. for current node, .. for parent node, etc.)
- all requests are sets, whether obviously going to return a single object or not. If the object requested is not there, an empty set is returned
- order of predicates matters: [5][@type = "classic"] means select the item at position 5 within the current position and keep it only if it is classic, whereas [@type = "classic"][5] means select the classic items within the current position and then select the 5th such item
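The effect of predicate order can be modelled with ordinary list filtering. The sketch below is a plain Python model, not an XPath engine; the item list and the helper names at_position and classic are made up for the illustration.

```python
# Hypothetical items in document order: (name, type) pairs.
items = [
    ("a", "classic"), ("b", "pop"), ("c", "classic"), ("d", "classic"),
    ("e", "pop"), ("f", "classic"), ("g", "classic"), ("h", "classic"),
]


def at_position(nodes, n):
    """Model of the predicate [n]: keep the n-th node (1-based), if any."""
    return nodes[n - 1:n]


def classic(nodes):
    """Model of the predicate [@type = "classic"]."""
    return [node for node in nodes if node[1] == "classic"]


# [5][@type = "classic"]: take item 5, then keep it only if it is classic.
position_then_type = classic(at_position(items, 5))

# [@type = "classic"][5]: keep the classic items, then take the 5th of them.
type_then_position = at_position(classic(items), 5)

print(position_then_type)  # item 5 is pop, so nothing survives
print(type_then_position)  # the fifth classic item
```

Each predicate filters the result of the one before it, so the position in the second predicate is counted within whatever the first predicate left behind.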
Node set equality is in fact an anySatisfy: check while not-equal is a negated allSatisfy:, so node sets can be simultaneously equal and not-equal. XPath has many library functions and lacks many others (e.g. no substrings function - you must call substring recursively). Watch out for when indices start from 1 and when they start from zero: cd[0] but [cd=1].
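The node-set comparison rule is easy to model: both = and != ask whether some pair of values satisfies the comparison. The sketch below uses plain Python lists of strings in place of node sets; the function names are invented for the illustration.

```python
def ns_equal(left, right):
    """Model of node-set '=': true if ANY pair of values is equal."""
    return any(a == b for a in left for b in right)


def ns_not_equal(left, right):
    """Model of node-set '!=': true if ANY pair of values differs."""
    return any(a != b for a in left for b in right)


prices = ["10", "12"]  # e.g. the string values of two price nodes
target = ["10"]

both_true = ns_equal(prices, target) and ns_not_equal(prices, target)
negated_equal = not ns_equal(prices, target)  # models not(a = b)

print(both_true)      # "10" matches and "12" differs, so both tests hold
print(negated_equal)  # not the same thing as a != b
print(ns_equal([], target), ns_not_equal([], target))  # empty set: both fail
```

This is why not(a = b) is usually what you want when you mean "no node has this value": a != b only says that some node differs.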
Never put presentation attributes in XML directly; put them in Cascading Style Sheets. Because the browser does not know whether <table> is data or furniture you need XSL to explain to a browser how to display things.
(Many sites have a page with little on it because they're checking your browser's capabilities; then they route you to pages you can handle, or make appropriate remarks to please turn on Javascript or whatever, or just make remarks about how old your browser is :-). Joseph's site has a hide page trick: see the remarks if you need to, see correct first page otherwise).
XSLT uses XPath to find nodes that match a pre-defined template. Nodes that don't match will remain unmodified or, if that path is never visited, ignored. Joseph showed XSL that displays package manifests as HTML from packages held as XML, building up the example stage by stage (see slides). He set the stylesheet, as for any HTML doc, then gradually added more and more to an xsl... select statement, using the following:
- An xsl:template has the rules for what to do with matched elements: what elements should it output, etc.
- xsl:value-of gets the contents of an element (e.g. package's name)
- xsl:for-each is the looping construct (e.g. for-each select="...pre-reqs...")
- filters are like predicates: [@moduleName != 'Kernel'] avoided showing Kernel in the prereqs as it's not interesting (other operators are =, < and >; use &lt; and &gt; for these when embedded within quotes)
- xsl:sort is not supported before MSXML 3.0
- xsl:if for control flow (also not supported before MSXML 3.0)
- xsl:choose and xsl:when (xsl:otherwise for default block) for multiple alternatives
- xsl:apply-templates is how you apply other templates' rules to the selected elements or their children, so you can structure your templates, call them from within each other
- xsl:call-template calls a template by name, like a function with parameters
With all this built up, the web page now presented the package list effectively.
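For readers who think more easily in a programming language than in XSLT, here is a rough Python analogue of the building blocks listed above. It is not XSLT and not Joseph's stylesheet: the manifest layout (packages, package, prereq) is invented, and only the moduleName attribute and the Kernel filter come from the talk.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical package manifest; the element names are assumptions.
MANIFEST = """
<packages>
  <package name="SUnit">
    <prereq moduleName="Kernel"/>
    <prereq moduleName="Collections"/>
  </package>
  <package name="Rosetta">
    <prereq moduleName="Kernel"/>
    <prereq moduleName="SUnit"/>
    <prereq moduleName="Ginsu"/>
  </package>
</packages>
"""


def render_package(package):
    """Plays the role of an xsl:template that matches a package element."""
    name = escape(package.get("name"))  # like xsl:value-of
    prereqs = [
        p.get("moduleName")
        for p in package.findall("prereq")  # like xsl:for-each
        if p.get("moduleName") != "Kernel"  # like [@moduleName != 'Kernel']
    ]
    prereqs.sort()  # like xsl:sort
    lines = [f"<h2>{name}</h2>"]
    if prereqs:  # like xsl:if
        lines.append("<ul>")
        lines.extend(f"<li>{escape(m)}</li>" for m in prereqs)
        lines.append("</ul>")
    return "\n".join(lines)


def render(manifest_xml):
    """Plays the role of the root template plus xsl:apply-templates."""
    root = ET.fromstring(manifest_xml)
    body = "\n".join(render_package(p) for p in root.findall("package"))
    return f"<html><body>\n{body}\n</body></html>"


html_page = render(MANIFEST)
print(html_page)
```

The real thing is declarative: an XSLT processor decides which template fires for which node, whereas this sketch calls render_package explicitly.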
VW's XSL is the implementation of an old spec, so it does not handle the more complex things. (Joseph sent a bug report to Steve last year; he does not know when the fix is planned.) VA wraps Xalan. Dolphin does some XSL things. Any Smalltalk could wrap XSLTProc (it is a C library and fast). So far no Smalltalk does correct full XSLT in Smalltalk.
Joseph's preferred book is Michael Kay's XSLT Programmer's Reference. It has a great many examples you can just cut and paste to reuse.
Variables in XSL are really weird to programmers. They are constants, placeholders, not variables; new value => new variable.
Rosetta will convert between all Smalltalks except S#, where it can translate to but not back again. Rosetta has a cross-dialect semantic model plus transformations between dialects. It offers:
- namespace annotations: watch out for accidentally putting subclasses into the superclass' namespace when that's not what you want (it also has selector namespacing support though that is little needed currently)
- custom transformation blocks: for example it uses an RB rewrite rule to exchange Squeak's backarrow and standard := assignment
- splitting of a single source file into many when a dialect requires it
The output dialect must have a satisfactory package concept (thus Ginsu is provided for Squeak) and a small dialect-specific preload module. The input dialect needs nothing. To choose between package concepts, go to the stylesheet and change the chosen concept. The default configuration aims to produce something that will load in a vanilla image so does parcels for VW7 not packages. Currently, Rosetta has seven dialects fully done, four dialects (GNU, #Smalltalk, SmalltalkMT and S#) mainly done and one dialect (GemStone) still to be done (discussion of this led to the conclusion that GemStone should be very easy to get to the mainly done stage).
Joseph ended with some Rosetta demos. A typical scenario is moving stuff to all dialects and then suddenly realising you forgot some minor thing. His first example was realising you have recategorised some methods: e.g. you reclassified methods in the 'Camp Smalltalk' protocol to several sensibly-named protocols in SUnit. An XSLT template can extract className, selector and category, and a tree diff then shows what you forgot to port and to which dialect.
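The idea behind the first demo can be sketched in Python with a set difference instead of an XSLT template plus tree diff. The export format below (a module element holding method elements) is purely hypothetical; only the three extracted fields, className, selector and category, come from the talk.

```python
import xml.etree.ElementTree as ET


def method_triples(xml_text):
    """Collect (className, selector, category) for every method element."""
    root = ET.fromstring(xml_text)
    return {
        (m.get("className"), m.get("selector"), m.get("category"))
        for m in root.iter("method")
    }


# Hypothetical exports of the same code from two dialects.
SOURCE = """<module>
  <method className="TestCase" selector="run" category="Running"/>
  <method className="TestCase" selector="setUp" category="Running"/>
</module>"""

PORTED = """<module>
  <method className="TestCase" selector="run" category="Running"/>
  <method className="TestCase" selector="setUp" category="Camp Smalltalk"/>
</module>"""

# Triples present in the source dialect but not in the ported one.
missing = sorted(method_triples(SOURCE) - method_triples(PORTED))
print(missing)
```

Here the recategorisation of setUp shows up because its triple no longer matches; running the same comparison against each dialect's export tells you where the change still has to be ported.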
Demo 2: We looked at the XSLT for transfer from Rosetta standard to VW7 and to AOS.
Demo 3: output tests results and display as HTML.
Q. I noted that Rosetta spits out .dialect tags; sometimes these must be changed to import, e.g. .vwxml must be changed to .xml to import into VW7. Joseph explained that the original tags help track what is what when doing a push to all dialects, outputting files to a single directory. It could be useful to have the right tags if doing many exports to a single dialect.
Q. Do it as a succession of individual transformations? Possible but much harder to track cross-references.
Q. Rosetta in Envy? The commercial add-on does various extra things, e.g. converting code not loaded in image.
Joseph closed by thanking Michael Liepert who got him started on XSLT, Alan Knight, Dave Simmons, Dino Rosati, Camp Smalltalk and ESUG.