The Artima Developer Community

Python Buzz Forum
Changing the Structured Blogging plugins' XML output

Phillip Pearson

Phillip Pearson is a Python hacker from New Zealand
Posted: Jan 11, 2006 10:07 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Changing the Structured Blogging plugins' XML output
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

One current issue with the Structured Blogging plugins is that they produce HTML that doesn't validate on the W3C validator, and feeds that trigger warnings on the Feed Validator.

This is because of the method used to embed the structured post's XML source in the HTML output.

How the output looks

The current output looks like this; the XML source for the post itself is the <xml-structured-blog-entry> element and its contents:

<script type="application/x-subnode; charset=utf-8">
  <!-- the following is structured blog data for machine readers. -->
  <subnode alternate-for-id="sbentry_5"
      xmlns:data-view="http://www.w3.org/2003/g/data-view#"
      data-view:interpreter="http://structuredblogging.org/subnode-to-rdf-interpreter.xsl"
      xmlns="http://www.structuredblogging.org/xmlns#subnode">
    <xml-structured-blog-entry xmlns="http://www.structuredblogging.org/xmlns">
    <generator id="wpsb-1" type="x-wpsb-post" version="1"/>
    <event type="event/conference">
      <name>Doc's show</name>
      <image>/~phil/sb_latest/images/syndicate_logo.gif</image>
      <person role="organizer" url="http://doc.weblogs.com">Doc Searls</person>
      <description>This is Doc's show.  He organized it, decided what
        panels to have, and he's paying for dinner.</description>
      <tags>doc</tags>
      <begins>2005-12-13T15:57:00</begins>
      <ends>2005-12-13T15:57:00</ends>
    </event>
    </xml-structured-blog-entry>
  </subnode>
</script>

This embedding technique, called x-subnode, was invented by the guys at PubSub (I think Bob Wyman and Duncan Werner) when they did the first SB plugin, and it's pretty clever. Because browsers don't know about the application/x-subnode script type, they completely ignore the contents, which means you don't need to enclose the whole thing in a comment to stop it from being displayed. On top of that, you can just drop the whole thing into an RSS <description> or Atom <content> element and have the structured data flow out through the feed.
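To make the "browsers ignore it, but an XML parser can still read it" point concrete, here is a minimal Python sketch using only the standard library (html.parser and xml.etree.ElementTree). The class and function names are hypothetical, and the sample page is a trimmed-down version of the output above; a real crawler would need to cope with much messier markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Namespace of the <xml-structured-blog-entry> element in the output above.
SB_NS = "http://www.structuredblogging.org/xmlns"


class SubnodeExtractor(HTMLParser):
    """Collects the raw text of each <script type="application/x-subnode..."> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished script bodies
        self._chunks = None  # text of the script being read, or None if outside one

    def handle_starttag(self, tag, attrs):
        script_type = dict(attrs).get("type") or ""
        if tag == "script" and script_type.startswith("application/x-subnode"):
            self._chunks = []

    def handle_data(self, data):
        # Like a browser, html.parser treats <script> content as raw text, so the
        # embedded XML arrives here as a string rather than as tag events.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.blocks.append("".join(self._chunks))
            self._chunks = None


def extract_subnodes(html):
    """Return the <subnode> root element of every x-subnode block in the page."""
    parser = SubnodeExtractor()
    parser.feed(html)
    parser.close()
    # Each script body is a small XML document: a comment, then <subnode>.
    return [ET.fromstring(block.strip()) for block in parser.blocks]


page = """<div id="sbentry_5"><h3>Doc's show</h3></div>
<script type="application/x-subnode; charset=utf-8">
  <!-- the following is structured blog data for machine readers. -->
  <subnode alternate-for-id="sbentry_5"
      xmlns="http://www.structuredblogging.org/xmlns#subnode">
    <xml-structured-blog-entry xmlns="http://www.structuredblogging.org/xmlns">
      <event type="event/conference"><name>Doc's show</name></event>
    </xml-structured-blog-entry>
  </subnode>
</script>"""

for subnode in extract_subnodes(page):
    print(subnode.get("alternate-for-id"))          # sbentry_5
    print(subnode.findtext(f".//{{{SB_NS}}}name"))  # Doc's show
```

Once the <subnode> element is in hand, anything that works on XML (XPath-style queries here, or an XSLT processor) can be pointed at it.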

Other bits to note:

The alternate-for-id attribute holds the ID of an element earlier in the page that encloses this post's HTML. This would let a Greasemonkey script reformat the post if it wanted to, or let a crawler go back from the structured data to the actual HTML.
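Following alternate-for-id back to the post only requires finding the element with that id. The sketch below does it with Python's html.parser; TextById and post_text are hypothetical names, and the depth counting assumes reasonably well-nested markup (no reliance on implied end tags).

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collects the text inside the element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text found inside the target element.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def post_text(html, target_id):
    """Return the visible text of the element with the given id ("" if absent)."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<div id="sbentry_5">
  <h3>Doc's show</h3>
  <p>This is Doc's show.<br>He organized it.</p>
</div>
<p>Unrelated text</p>"""

print(post_text(page, "sbentry_5"))
# Doc's show This is Doc's show. He organized it.
```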

The xmlns:data-view and data-view:interpreter attributes on the <subnode> element are there to enable GRDDL, which lets RDF tools extract meaning from the XML content. This lets us be "RDF compatible" without having to actually generate the RDF.

So, in summary:

  • It lets you embed XML inside HTML without commenting it out.
  • The XML is still accessible using an XML parser, so XSLT etc. works.
  • GRDDL tools will be able to turn it into RDF.
  • It works inside HTML and also inside RSS/Atom, so a separate embedding method isn't required for feeds.

Problems

Unfortunately, using <script> for all this fires off warnings everywhere we go, and pretty much everyone who looks at the embedded data, whether in a web page or in a feed, has a really bad first impression. So, it's time to do something about that.

Here are my thoughts so far.

Tidying the GRDDL stuff

From reading the GRDDL Team Submission, the GRDDL profile document, and Danny Ayers' explanation of how to make microformats GRDDL-friendly, it seems that the data-view attributes needn't appear in the XML when it's embedded in HTML. If we put a profile for Structured Blogging in the HTML header like this:

<head profile="http://structuredblogging.org/profile">

... and then, in the profile page, refer to the data-view profile and point to the SB XSLT file using profileTransformation, GRDDL-aware tools will run that XSLT file on pages generated by the SB plugin.
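A crawler can tell whether a page has opted in by reading the profile attribute on <head>, which HTML 4 defines as a whitespace-separated list of URIs. This is a minimal sketch assuming the profile URI proposed above; HeadProfileReader and uses_sb_profile are hypothetical names, and the code only detects the profile. It does not fetch the profile document or run any XSLT.

```python
from html.parser import HTMLParser

# Profile URI proposed above for Structured Blogging pages.
SB_PROFILE = "http://structuredblogging.org/profile"


class HeadProfileReader(HTMLParser):
    """Records the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            # HTML 4 allows several profile URIs separated by white space.
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def uses_sb_profile(html):
    """True if the page's <head> lists the Structured Blogging profile."""
    reader = HeadProfileReader()
    reader.feed(html)
    reader.close()
    return SB_PROFILE in reader.profiles


sb_page = ('<html><head profile="http://gmpg.org/xfn/11 '
           'http://structuredblogging.org/profile"><title>Second p0st</title>'
           '</head><body></body></html>')
plain_page = "<html><head><title>No profile</title></head><body></body></html>"

print(uses_sb_profile(sb_page))     # True
print(uses_sb_profile(plain_page))  # False
```

A full GRDDL agent would go further: roughly, fetch the profile document, look for its profileTransformation, and apply the referenced XSLT to the page.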

Getting the XML out of the page

After setting up the GRDDL profile/transform, we could move the XML source to another URL and define a microformat that links to it. This way an RDF crawler would still pick up the data through GRDDL, while crawlers specifically looking for SB posts could look for the links and work from there.

I'm not quite sure how this should look, but here's one possibility: put a class name (e.g. sb_post) on an element surrounding the post, and inside that element, link to the XML source with rel="sb_source". So the HTML for a post might look like:

<div class="sb_post">
  <h3>This is the post title</h3>
  <p>Here is some text</p>
  <p>(<a rel="sb_source" href="/path/to/xml_source">XML</a>)</p>
</div>
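Under that (still hypothetical) microformat, a crawler would collect rel="sb_source" links found inside elements carrying the sb_post class. A minimal Python sketch follows; SBSourceFinder and the base URL http://example.com/blog/ are made up for illustration, class and rel are treated as whitespace-separated token lists as HTML defines them, and like the other sketches it assumes well-nested markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Void elements have no end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SBSourceFinder(HTMLParser):
    """Finds rel="sb_source" links that sit inside a class="sb_post" element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stack = []    # one entry per open element: True if it is an sb_post
        self.sources = []  # absolute URLs of the XML sources found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside_post = any(self.stack)
        rels = (attrs.get("rel") or "").split()
        if tag == "a" and inside_post and "sb_source" in rels and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.sources.append(urljoin(self.base_url, attrs["href"]))
        if tag not in VOID:
            classes = (attrs.get("class") or "").split()
            self.stack.append("sb_post" in classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


page = """<div class="sb_post">
  <h3>This is the post title</h3>
  <p>Here is some text</p>
  <p>(<a rel="sb_source" href="/path/to/xml_source">XML</a>)</p>
</div>
<p><a rel="sb_source" href="/outside/any/post">ignored</a></p>"""

finder = SBSourceFinder("http://example.com/blog/")
finder.feed(page)
finder.close()
print(finder.sources)  # ['http://example.com/path/to/xml_source']
```

Requiring the link to sit inside an sb_post element lets one page hold several posts, each pointing at its own XML source.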

Making the XML more accessible inside feeds

Currently the whole chunk of XML shown above is embedded in the description or content element of syndication feeds, as part of the encoded HTML. It would look a lot nicer if it could be moved out into its own element, perhaps like this:

<item>
  ...
  <description>HTML goes here</description>
  <source xmlns="http://structuredblogging.org/xmlns">
    core XML -- <event> from the first example -- goes here
  </source>
</item>

We could GRDDL-enable this by putting a namespaceTransformation reference in the namespace document (the document served at the namespace URI).
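With the structured data in its own namespaced element, a feed reader no longer has to dig it out of escaped HTML. Here is a minimal sketch with xml.etree.ElementTree, assuming the item layout proposed above and the namespace URI as written there; structured_items is a hypothetical helper and the sample feed is made up.

```python
import xml.etree.ElementTree as ET

# Namespace URI as written in the proposed <source> element above.
SB_NS = "http://structuredblogging.org/xmlns"

feed = """<rss version="2.0">
  <channel>
    <title>Second p0st</title>
    <item>
      <title>Doc's show</title>
      <description>&lt;p&gt;This is Doc's show.&lt;/p&gt;</description>
      <source xmlns="http://structuredblogging.org/xmlns">
        <event type="event/conference">
          <name>Doc's show</name>
          <begins>2005-12-13T15:57:00</begins>
        </event>
      </source>
    </item>
  </channel>
</rss>"""


def structured_items(feed_xml):
    """Yield (html, sb_source) for each <item>; sb_source is None if absent."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        html = item.findtext("description")
        # The namespace keeps this apart from RSS 2.0's own un-namespaced <source>.
        sb_source = item.find(f"{{{SB_NS}}}source")
        yield html, sb_source


for html, sb_source in structured_items(feed):
    if sb_source is None:
        continue
    event = sb_source.find(f"{{{SB_NS}}}event")
    print(html)  # <p>This is Doc's show.</p>
    print(event.get("type"), event.findtext(f"{{{SB_NS}}}name"))
    # event/conference Doc's show
```

This is also roughly the extra work the downside list below refers to: a parser that only reads description or content would simply skip the new element.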

Pros and cons of the changes

Making these changes would:

  • make everything look a lot nicer,
  • and make everything validate,
  • while maintaining RDF compatibility.

The downside is:

  • the XML would no longer be directly available inside the HTML, so a crawler would have to make more HTTP requests,
  • and feed parsers (like the one powering PubSub) would have to be modified to understand the new syntax.

Hmm.
