Summary
I have devised a simple mark-up scheme for my own purposes which is much simpler than XML.
If you have ever looked at the official XML specification then you know that as far as mark-up languages go, it is far more complicated than it needs to be. Writing an XML parser is supposed to be easy, but in practice it is very hard to do correctly. This is a great example of how bad things get when technology is designed by committee.
Anyway, enough bitching. I currently need a simple and efficient markup language to represent textual data with a hierarchical structure, and I came up with the following format I'm calling PicoML:
But on a more serious note, what if I want to include a literal "[" as text? What then? And to be more pedantic, what unit is the price listed in? British pounds? Attributes can be useful in XML, so the price tag can be defined as:
> If you have ever looked at the official XML specification (http://www.w3.org/TR/REC-xml/) then you know that as far as mark-up languages go, it is far more complicated than it needs to be.
You are not alone. I think everybody who has had a good look at XML has said (at least once) "there's got to be a better way". Some, like yourself, have come up with alternatives that do mostly the same thing but with a much easier-to-use syntax.
One of my favourites goes a bit like this (from memory)
document ::= [meta_content] tagged_content
meta_content ::= "(" "*" ":" content ")"
tagged_content ::= open_tag content close_tag
content ::= tagged_content* | text
open_tag ::= "(" id_text ":"
close_tag ::= ")"
id_text ::= alpha id_char*
id_char ::= alpha | digit | "_" | "$" | "@" | "#"
text ::= ^( "\" | "(" | ")" ) | escaped_char
escaped_char ::= "\(" | "\)" | "\\"
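For anyone who wants to poke at the grammar, here is a minimal recursive-descent sketch in Python. It is one reading of the BNF above, not the original implementation. The names `parse_document`, `parse_tagged`, `parse_content`, `skip_ws` and `ParseError` are made up for illustration, and three things are assumptions: text is a run of characters rather than a single character, text and tagged items may interleave, and whitespace around text is trimmed.

```python
import re

ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_$@#]*")  # id_text ::= alpha id_char*
ESCAPABLE = "()\\"                                # escaped_char targets


class ParseError(ValueError):
    pass


def skip_ws(src, pos):
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


def parse_document(src):
    """document ::= [meta_content] tagged_content. Returns (meta, root)."""
    pos = skip_ws(src, 0)
    meta = None
    if src.startswith("(*", pos):
        meta, pos = parse_tagged(src, pos, allow_meta=True)
        pos = skip_ws(src, pos)
    root, pos = parse_tagged(src, pos)
    pos = skip_ws(src, pos)
    if pos != len(src):
        raise ParseError(f"trailing input at offset {pos}")
    return meta, root


def parse_tagged(src, pos, allow_meta=False):
    """Parse "(" id ":" content ")" into a (tag, children) tuple."""
    if not src.startswith("(", pos):
        raise ParseError(f"expected '(' at offset {pos}")
    pos += 1
    if allow_meta and src.startswith("*", pos):
        tag, pos = "*", pos + 1
    else:
        m = ID_RE.match(src, pos)
        if not m:
            raise ParseError(f"expected tag name at offset {pos}")
        tag, pos = m.group(), m.end()
    if not src.startswith(":", pos):
        raise ParseError(f"expected ':' at offset {pos}")
    children, pos = parse_content(src, pos + 1)
    return (tag, children), pos


def parse_content(src, pos):
    """Collect text and nested tags up to the closing ")"."""
    children, buf = [], []

    def flush():
        text = "".join(buf).strip()  # assumption: unquoted text is trimmed
        if text:
            children.append(text)
        buf.clear()

    while pos < len(src):
        ch = src[pos]
        if ch == "\\":
            if pos + 1 >= len(src) or src[pos + 1] not in ESCAPABLE:
                raise ParseError(f"bad escape at offset {pos}")
            buf.append(src[pos + 1])
            pos += 2
        elif ch == "(":
            flush()
            node, pos = parse_tagged(src, pos)
            children.append(node)
        elif ch == ")":
            flush()
            return children, pos + 1
        else:
            buf.append(ch)
            pos += 1
    raise ParseError("unexpected end of input: missing ')'")
```

Each node comes back as a `(tag, children)` tuple, where `children` mixes plain strings and nested nodes. The quoting and matched-parenthesis conveniences are not in the BNF, so this sketch leaves them out.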
So using your example we could get something like...
> > One of my favourites goes a bit like this (from memory)
> [snip]
>
> That markup language is quite cool, do you remember the name?
LOL! I never got to name it. It was one of a few that I dreamed up over the years. But it is the one I like the most. Let's call it YAM (yet another markup) ;-)
The semantics are simple too. The only tricky bit was that it allowed matching parentheses inside a 'content'. You could have ...
(title: The (Many) Loves of Dobbie Gilles)
so you only had to 'escape' unmatched parentheses. Also, text enclosed in matching quotes was not examined for parentheses and would not be trimmed. The quotes could be a single quote ', double quote ", or back-quote `.
(extention: " and ')'" )
where in this case the content of the tagged entry 'extention' would be the eight characters between the matching double-quotes.
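Those two rules (matched parentheses are literal, quoted text is taken verbatim and left untrimmed) can be sketched for a single leaf entry as below. This is only a guess at the behaviour described, written in Python. `read_leaf` and `join_trimmed` are hypothetical names, and the sketch assumes a backslash makes any following character literal and that only the unquoted outer edges get trimmed.

```python
QUOTES = "'\"`"


def read_leaf(entry):
    """Split "(tag: text)" into (tag, text) using the two rules above."""
    if not entry.startswith("(") or ":" not in entry:
        raise ValueError("expected '(tag: text)'")
    colon = entry.index(":")
    tag = entry[1:colon].strip()
    pieces = []   # (text, was_quoted) pairs, in order
    plain = []    # characters of the current unquoted run
    depth = 0     # nesting level of literal, matched parentheses
    pos = colon + 1
    while pos < len(entry):
        ch = entry[pos]
        if ch == "\\" and pos + 1 < len(entry):
            plain.append(entry[pos + 1])    # escaped character, kept literally
            pos += 2
            continue
        if ch in QUOTES:
            end = entry.find(ch, pos + 1)   # closing quote of the same kind
            if end == -1:
                raise ValueError(f"unterminated quote at offset {pos}")
            pieces.append(("".join(plain), False))
            plain = []
            pieces.append((entry[pos + 1:end], True))
            pos = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:                  # the entry's own closing ")"
                pieces.append(("".join(plain), False))
                return tag, join_trimmed(pieces)
            depth -= 1
        plain.append(ch)
        pos += 1
    raise ValueError("missing closing ')'")


def join_trimmed(pieces):
    """Trim only the unquoted outer edges; quoted text is kept verbatim."""
    pieces = [p for p in pieces if p[0] or p[1]]   # drop empty unquoted runs
    if pieces and not pieces[0][1]:
        pieces[0] = (pieces[0][0].lstrip(), False)
    if pieces and not pieces[-1][1]:
        pieces[-1] = (pieces[-1][0].rstrip(), False)
    return "".join(text for text, _ in pieces)
```

One consequence of this reading: a lone apostrophe in ordinary text (as in "don't") opens a quote, so it would have to be escaped.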
I used this markup in a GUI form definition language for Windows programming that I wrote.
Duh, the way tags are nested looks like good ol' SGML, with a generic end tag instead of no tag at all.
> > > One of my favourites goes a bit like this (from memory)
> > [snip]
> >
> > That markup language is quite cool, do you remember the name?
>
> LOL! I never got to name it. It was one of a few that I dreamed up over the years. But it is the one I like the most. Let's call it YAM (yet another markup) ;-)

We could also call it Lisp, though :p
Last week you were suggesting storing program source code in XML, and several people pointed out the complexity of that. This week XML is too complicated, now that you are apparently writing your own parser (why?). Will PicoML be a better fit for storing source code?
> So who else is tired of looking at an emperor with no clothing?
> > Another question: who writes their own XML parsers?
> > There exist XML parsers, for free, that can be used.
>
> And the overwhelming majority are incorrect! And the ones that aren't are really really really slow.
Proof? Examples? Incorrect according to what test cases? Slow according to what metric? I don't want to defend XML here, but you throw these rocks over the wall all the time, claiming that some language or technique is poorly designed, poorly implemented, slow, or unsuitable for your purposes. Could you for once back up these statements with an actual example or proof? Until you have a 100% correct and fast XML parser (or programming language, or macro processor) that all others can be judged against, your claims have no credibility.
I don't know whether you are actually trying to come up with an alternative to XML or not; it seems rather theoretical to me. If not, then I don't think that the question is very interesting. A grammar for a small utility language used only by you and perhaps a few others would have value if it presented something novel, which is not the case here as far as I can tell. If your purpose is to actually compete with XML, just being "simpler" sounds to me very much like the original goals of Java compared to C++, and now look at the complexity of Java 5. Inertia picks up as your language matures. People will begin asking for schemas, namespaces, include mechanisms, etc. Not to mention tools and libraries in all kinds of languages.
Facile criticisms of XML are popular, and worth little. The designers had engineering tradeoffs to make and not a free hand to create the perfect solution.
> One of my favourites goes a bit like this (from memory)
The more I look at your markup the more I like it! It is essentially an s-expression with labels, which significantly enhances plain old s-expressions. If I were to modify the spec for my own purposes (giving you credit of course) I would make the following changes:
First, I would strip the meta_content. I don't see any sufficiently compelling reason to have separate meta tags versus ordinary tags. Next, I would remove the special requirements for id characters. This would increase the speed of the parser and have an interesting side effect: labels can be used as raw data. Next, I would remove the string parsing rules (which, incidentally, you left out of the BNF). Finally, I would leave out the rule that lets matched parentheses go unescaped. It may be convenient for a human, but it adds more complexity to the parser.
My goal is to have a specification that has as few rules as possible and can be parsed with blinding speed.
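To make the stripped-down variant concrete, here is a rough single-pass sketch in Python with an explicit stack instead of recursion. It is an illustration of the changes listed above, not a finished spec. `parse_simplified` and `flush` are hypothetical names, and it assumes the label is everything up to the first ":", that a backslash makes the next character literal, and that text is trimmed. No speed claim is implied.

```python
def flush(node, buf):
    """Move pending text into the node's children, dropping blank runs."""
    text = "".join(buf).strip()
    if text:
        node[1].append(text)
    buf.clear()


def parse_simplified(src):
    """Parse "(label: content)" where only "(", ")" and "\\" are special."""
    root = ("", [])                 # synthetic holder for top-level nodes
    stack = [root]
    buf = []                        # pending text for the innermost open node
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "\\" and i + 1 < n:
            buf.append(src[i + 1])  # escaped character is always literal
            i += 2
            continue
        if ch == "(":
            flush(stack[-1], buf)
            colon = src.find(":", i + 1)
            if colon == -1:
                raise ValueError(f"label without ':' at offset {i}")
            # No id rules: the label is whatever sits before the colon.
            node = (src[i + 1:colon].strip(), [])
            stack[-1][1].append(node)
            stack.append(node)
            i = colon + 1
            continue
        if ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at offset {i}")
            flush(stack[-1], buf)
            stack.pop()
        else:
            buf.append(ch)
        i += 1
    if len(stack) != 1:
        raise ValueError("missing ')'")
    flush(root, buf)
    return root[1]
```

With no id restrictions a label such as `2 + 2` is accepted as-is, which is the "labels as raw data" side effect. Every literal parenthesis in text has to be escaped.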