Summary:
Pragmatic Programmers Andy Hunt and Dave Thomas talk with Bill Venners about the value of storing persistent data in plain text and the ways they feel XML is being misused.
Most recent reply: July 18, 2003 9:52 AM by
|
Dave Thomas says, "XML is useful in appropriate contexts, but it is being grossly abused in most of the ways it is being used today." Read this Artima.com interview with Pragmatic Programmers Andy Hunt and Dave Thomas: http://www.artima.com/intv/plain.html
Here's an excerpt:
Andy Hunt: Virtually any program that's going to operate on text of some sort can operate on plain text as the lowest common denominator. Very often you get into a state where you want to work with some program, but its properties file has gotten corrupted such that the program won't even come up to let you change the property. If that file is in some binary format that needs the program itself to fix it, you're hosed. You've catch-22ed yourself right out of existence. If it's in a plain text format, you can go in with any generic tool -- a text editor, whatever you like to use to deal with plain text -- and fix the problem. So in terms of emergency recovery, or changes in the field, plain text is helpful. It provides another level of insurance.
What do you think of Dave and Andy's comments?
|
|
|
Dave Thomas says: People think, "Once I've got my data in XML that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags that are in there somehow give people all the information they need to be able to deal with the data.
One of the most useful parts of any XML file is the comments that are used to help explain things. There are facilities for making XML more human-readable; people simply have to make use of them.
When I hear someone say "XML is self-describing", I don't take that to mean that they think an XML document all by itself can tell you everything you need to know about it. My interpretation of this is that one doesn't necessarily have to draw upon an outside source to decode the meanings of the various tags. Certainly, some systems will use XML in ways that are only slightly more readable than binary formats, in which case, yes you do need to refer to an outside source to understand the file. And there are ways to handle more complex scenarios where you need a DTD or XSD or some kind of data dictionary to understand what is going on in the XML document. But in many cases, the XML document itself can do all of the work.
A big advantage of using XML is the fact that it's pretty widely known. Using org.apache.commons.digester, I was able to give the users of a simple command-line tool I wrote a means of configuring the tool from a simple XML file (when I say simple, I'm talking about 5 or 6 elements). I provided an extensively commented sample XML file that could be used as a template for later configurations. If I need to extend the features of this file, I can do so easily without fear that my users will have to learn new, unfamiliar syntax for the configuration file (they will only need to learn any new elements or attributes that are added), as long as I remain within the XML standard. And chances are good that I'll be able to find a parser, perhaps one as easy to use as Digester, that can accommodate the changes, so I won't have to roll my own.
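A small, commented configuration file of the kind described above might be read along these lines. This is only a sketch: it uses Python's standard xml.etree.ElementTree rather than Digester, and the element names (tool-config, input-dir, output-dir, verbose) are made up for illustration, not taken from the actual tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample config; the element names are illustrative only.
SAMPLE_CONFIG = """\
<tool-config>
  <!-- Directory the tool reads from -->
  <input-dir>data/in</input-dir>
  <!-- Directory the tool writes to -->
  <output-dir>data/out</output-dir>
  <!-- Set to "true" for chatty logging -->
  <verbose>true</verbose>
</tool-config>
"""


def load_config(xml_text):
    """Return a dict mapping each child element's tag to its text."""
    root = ET.fromstring(xml_text)
    # The default parser drops comments, so only real elements
    # show up when iterating over the root.
    return {child.tag: (child.text or "").strip() for child in root}


config = load_config(SAMPLE_CONFIG)
print(config["input-dir"], config["verbose"])
```

If a later version of the file gains a new element, older files keep working as long as the reader looks it up with a default, for example `config.get("new-option", "default")`.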
In short, I think a large degree of the responsibility for ensuring that XML documents are human-readable is with the author, or the author of the tool that will generate the documents. It's not the XML format itself that has an inherent problem with being human-readable.
|
|
|
Yes... sometimes I think some XML is too complicated to handle, like the OpenOffice file format. Some content package formats, like SCORM (http://www.adlnet.org), store only the table of contents and pointers to the real content in the XML; the package reader then fetches the individual page information by following the pointer. I wonder whether that is a better approach to storing big XML documents?
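The "table of contents plus pointers" idea can be sketched roughly as follows. This is not the real SCORM manifest schema: the manifest and item element names, the title and href attributes, and the file names are all assumptions chosen to keep the example short.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical manifest: it holds titles and pointers, not page content.
MANIFEST = """\
<manifest>
  <item title="Introduction" href="intro.txt"/>
  <item title="Chapter 1" href="ch1.txt"/>
</manifest>
"""


def read_toc(manifest_text):
    """Return (title, href) pairs listed in the manifest."""
    root = ET.fromstring(manifest_text)
    return [(item.get("title"), item.get("href")) for item in root.iter("item")]


def load_page(package_dir, href):
    """Follow one pointer and read only that page from disk."""
    return (Path(package_dir) / href).read_text(encoding="utf-8")


# Build a throwaway package directory so the sketch runs on its own.
package_dir = tempfile.mkdtemp()
Path(package_dir, "intro.txt").write_text("Welcome.", encoding="utf-8")
Path(package_dir, "ch1.txt").write_text("First chapter.", encoding="utf-8")

toc = read_toc(MANIFEST)
first_page = load_page(package_dir, toc[0][1])
print(toc, first_page)
```

The XML stays small because it only indexes the content; each page is read when it is requested.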
|
|
|
It's not always immediately obvious what a new technology really adds. I think TimBL (maybe Fielding?) once said of the web (paraphrasing) HTML isn't so great, HTTP isn't so great, but a universal address space (i.e. URI) turns out to be very useful.
In the case of XML, angle brackets and attributes may not be so great, and sure, people can do some pretty stupid stuff, but the advancement of Unicode as a standard text encoding is, IMHO, a Good Thing.
|
|
|
Just a quick pointer for the programmers out there: if you're interested in using plain text files as a way to generate Docbook XML files (which has numerous advantages, considering the growing variety of tools for it, but IMHO is a pain in the neck to edit "by hand"), check out the docutils plaintext converter (written in Python): http://docutils.sourceforge.net
With this tool, I can write using plaintext files with almost no markup (or very very little, in any case), then easily generate HTML pages or LaTeX documents, and eventually Docbook XML (there is an unfinished-but-working implementation of conversion to Docbook).
|
|
|
That's interesting... while I admit it's a mild pain to add the tags, and I'm sure using docutils makes it easier (I haven't used docutils, though), I was able to add most of the tags to a Docbook document I wrote, using careful Perl scripts and a feature of the BBEdit text editor called "Glossary". The real pain in the neck of using Docbook was getting Jade to work.
|
|
|
> Dave Thomas says, "XML is useful in appropriate contexts, but it is being grossly abused in most of the ways it is being used today."
The problem with XML is that it's the round peg that's being shoved into holes of every shape.
We're better off today because we have XML; at the small end of the spectrum, I'd rather parse someone's tags than deal with writing regexes or yacc grammars each and every time someone has worthwhile data/metadata I want to process. And let's not forget that not every programmer can switch hats to write bug-free regular expressions or yacc grammars. If they can express their data with a small XML vocabulary, I'm not going to be held hostage by their buggy grammar or my buggy regex; we can agree to use an XML parser as a foundation and reliably interchange data with each other.
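To make the regex-versus-parser point concrete, here is a minimal sketch. The book record and its attributes are hypothetical, and the regex is deliberately naive; it stands in for the kind of hand-written pattern that works on one sample and quietly breaks on an equivalent one.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent spellings of the same hypothetical record.
DOC_A = '<book title="Plain Text" year="2003"/>'
DOC_B = "<book year='2003'   title='Plain Text' />"

# A hand-rolled regex that assumes double quotes and a fixed attribute order.
NAIVE = re.compile(r'<book title="([^"]*)" year="([^"]*)"/>')


def parse_with_regex(text):
    """Return (title, year) if the naive pattern matches, else None."""
    match = NAIVE.search(text)
    return match.groups() if match else None


def parse_with_xml(text):
    """Return (title, year) using a real XML parser."""
    elem = ET.fromstring(text)
    return (elem.get("title"), elem.get("year"))


print(parse_with_regex(DOC_A), parse_with_regex(DOC_B))
print(parse_with_xml(DOC_A), parse_with_xml(DOC_B))
```

The regex handles only the first spelling, while the parser returns the same values for both, which is the sense in which a shared XML parser is a sturdier foundation for interchange.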
Of course there are instances where XML is misused, abused or simply a bad choice. Ant build files are one example. Apple's plist format is another. At the same time, there are text-based grammars that are worse -- Makefile syntax (in all its many splendored dialects), *roff or *TeX for example -- so shunning XML in favor of a simpler text-based format is not a panacea.
The crux of the problem we are facing is that programmers do not easily create simple, easily parsed data formats (complex formats and buggy parsers are much easier to create). The issues with abusing XML are the same ones we find with underspecifying a text-based grammar, or creating an overly complex text-based grammar. Swinging back from XML to text/yacc isn't a magic bullet, and won't solve any problems in and of itself.
|
|
|
FWIW, Word 1 was pretty much just plain text.
|
|
|
> FWIW, Word 1 was pretty much just plain text.
That's hilarious. Thanks. Perhaps we should change that to Word 2.
|
|
|
Word 1...ahhh. It wouldn't even run on the Windows 2 (not 2000) my dad had, but it was smoking on my old 8088 with two 5 1/4 floppy drives, no hard disk, and an Amber screen. ...I think my watch is now more powerful!
|
|
|
Loud cheers for the statement that XML is an ill fit for user input. That's exactly what bothered me in the recent efforts at rebuilding AspectJ in open source XML format. (See Cedric Beust's weblog or Rickard Öberg's.) Somehow they seem to think it's the other way round. Now my own opinion is validated by the authorities. My superego sighs with satisfaction. :-)
But what I'd like to ask Dave & Andy: is writing your own grammar using javacc, lex/yacc or whatever the only alternative they think of when they say 'xml sucks'? Because a lot of times I think XML with its metadata validation (XSDs) is better than the alternative, which at most customer sites I visit would be no validation, or validation interspersed with parsing code.
Of course XML is now mainstream, so there's more news value in criticising it, but unless Dave & Andy can come up with some alternative other than using lex/yacc, I think XML is most of the time the least bad alternative.
And Sun's machine-generated XML files seem to have been a first try at declaring information (deployment information, transaction information,...) that is not in the same plane as the code. And looking back, I think we can conclude that it has the drawbacks they mentioned and that metadata attributes as in .Net and JSR 175 & 182 offer a much better design.
groetjes, Joost
|
|
|
> > Dave Thomas says, "XML is useful in appropriate contexts, but it is being grossly abused in most of the ways it is being used today."
> Of course there are instances where XML is misused, abused or simply a bad choice. Ant build files are one example.
My colleague Matt Foemmel is getting sick of reading (and writing) ant files in XML. So he's come up with an alternative: Pynt <http://pynt.sourceforge.net/>. It uses a Python-like scripting language and can easily call ant tasks.
|
|
|
Personally I don't have a problem writing all my ant scripts in XML. At least it stops my colleagues who try to do:
<tag1/>
<tag2/>
</tag1>
No matter how often I explain, the build falling over is a good indicator for them. If using a script suits someone, that's fine, but in a larger project, who's to say that there's going to be a common language among all the developers? At least XML is simple (well, barring my previous example :-)).
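The way a parser "falls over" on mismatched tags can be reproduced with a short sketch. It uses Python's standard xml.etree.ElementTree rather than Ant's own parser, and the wrapping project element is an assumption added so the snippet has a single root.

```python
import xml.etree.ElementTree as ET

# The mismatched snippet from the post, wrapped in an assumed <project> root.
BROKEN = "<project>\n<tag1/>\n<tag2/>\n</tag1>\n</project>"
# What was presumably intended: tag2 nested inside tag1.
FIXED = "<project>\n<tag1>\n<tag2/>\n</tag1>\n</project>"


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
```

The broken version is rejected before anything runs, and the error message carries a line and column, which is the early warning the poster describes.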
|
|
|
> My colleague Matt Foemmel is getting sick of reading (and writing) ant files in XML. So he's come up with an alternative: Pynt <http://pynt.sourceforge.net/>.
I can't get this link to work. V.
|
|
|
After you've clicked on the link, you should remove the final character '>' from the resulting address in your browser. It should read http://pynt.sourceforge.net
Very interesting, this Pynt. Especially if it leverages existing Ant tags and scripts. Hope it will be able to generate some following.
groetjes, Joost
|
|