The Artima Developer Community
Ruby Buzz Forum
The Dark Side of Atom

Christian Neukirchen

Posts: 188
Nickname: chris2
Registered: Mar, 2005

Christian Neukirchen is a student from Biberach, Germany playing and hacking with Ruby.
The Dark Side of Atom Posted: Jun 27, 2005 1:19 PM

This post originated from an RSS feed registered with Ruby Buzz by Christian Neukirchen.
Original Post: The Dark Side of Atom
Feed Title: chris blogs: Ruby stuff
Feed URL: http://chneukirchen.org/blog/category/ruby.atom
Feed Description: a weblog by christian neukirchen - Ruby stuff

Yesterday antifuchs told me about a problem with the Atom feed of Anarchaia, which now and then includes IRC quotes like this:

#ruby-de
12:18 <ionas_> alles was nicht analog ist ist lossy ;p
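12:18 <ionas_> und alles was analog ist geht schnell kaputt ,p

(In English, roughly: "everything that isn't analog is lossy" and "everything that is analog breaks quickly".)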

In raw HTML, it looks like this (the code is taken directly from the generated HTML):

<div class="ircquote">
<span class="channel">#ruby-de</span>
<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
<div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
</div>

In the usual IRC style, I wrap the nickname in < and >, but antifuchs tells me he doesn’t see any nicks when he looks at my blog with Bloglines. Weird, I think, and decide to have a look at it.

Just for fun, I subscribe to my blog in NetNewsWire and I see… no nicknames! Now, how is my Atom feed generated? The snippet looks roughly like this:

<entry>
<title>25</title>
<!-- ... -->
<content mode="xml" xmlns="http://www.w3.org/1999/xhtml">
  <div class="ircquote">
  <span class="channel">#ruby-de</span>
  <div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
  <div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
  </div>
</content>
</entry>

And I start to wonder. My Atom feed is perfectly valid, and I just inserted the raw (and valid) XHTML as-is. This should be OK. To quote the Atom specification (emphasis mine):

3) If the value of “type” is “xhtml”, the content of atom:content MUST be a single XHTML div element [XHTML], and SHOULD be suitable for handling as XHTML. The XHTML div element itself MUST NOT be considered part of the content. Atom Processors that display the content MAY use the markup to aid in displaying it. The escaped versions of characters such as “&” and “>” represent those characters, not markup.
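
The last sentence of that quote can be checked with any conforming XML parser. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the fragment is a cut-down copy of the entry content shown above (one IRC line only), and the variable names are made up for illustration. It only shows what one XML parse produces, not what any particular feed reader does.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Cut-down version of the <content> element from the feed (one line only).
content_xml = (
    '<content mode="xml" xmlns="%s">'
    '<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>'
    '</content>' % XHTML
)

content = ET.fromstring(content_xml)
line = content.find("{%s}div" % XHTML)

# The parser decodes the entities into literal characters inside a text node.
print(line.text)

# The div has no child elements, so no <ionas_> element exists in the tree.
print(len(line))
```

After exactly one parse, the nick is plain text containing the characters < and >, which is what the specification asks for.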

Now, apparently, both Bloglines and NetNewsWire somehow pass the XHTML on to a rendering engine, in these cases my browser and HTMLKit, respectively. And those seem to parse it again, thereby creating the tag <ionas_>. For now, I have worked around that by replacing every & in my Atom feeds with &amp;, so the nick now reads &amp;lt;ionas_&amp;gt; in the feed. Which is more than ugly and really pisses me off.
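
The suspected double parse can be reproduced in a few lines. This is only a sketch of the assumed behaviour, not the actual code of Bloglines or NetNewsWire: `TagCollector`, `render` and `buggy_reader` are hypothetical names, and Python's `html.parser` stands in for the real rendering engine.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the start tags and the text that an HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def render(html_source):
    """Hypothetical stand-in for the HTML engine a reader hands content to."""
    collector = TagCollector()
    collector.feed(html_source)
    collector.close()
    return collector.tags, "".join(collector.text)


def buggy_reader(line_xml):
    """Assumed behaviour: decode the XML once, then treat the result as HTML source."""
    decoded = ET.fromstring(line_xml).text
    return render(decoded)


original = '<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>'
workaround = original.replace("&", "&amp;")  # the double escaping described above

tags_original, text_original = buggy_reader(original)
tags_fixed, text_fixed = buggy_reader(workaround)

print(tags_original, repr(text_original))  # the nick has turned into a tag
print(tags_fixed, repr(text_fixed))        # the nick survives as text
```

Under this model the second parse swallows the nick as an unknown element, while the doubly escaped version survives because the extra layer of escaping is consumed by the extra parse.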

When I see such stuff, I sometimes think RSS really did it better when they just decided to escape the whole thing and strew their entities all over. That would be consistent, at least.
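
For comparison, here is a small sketch of the escape-everything approach, again in Python with only the standard library. It assumes the common RSS convention of carrying HTML as escaped text inside a `description` element; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape

# The HTML source as it appears on the page, nick already escaped once.
html_fragment = (
    '<div class="line">12:18 &lt;ionas_&gt;  '
    'alles was nicht analog ist ist lossy ;p</div>'
)

# Escape the whole fragment once more: tags and entities alike become text.
description_xml = "<description>%s</description>" % escape(html_fragment, quote=False)
print(description_xml)

# One XML parse gives back the original HTML source, with the nick still
# escaped, ready to be handed to an HTML engine for the second parse.
recovered = ET.fromstring(description_xml).text
print(recovered == html_fragment)
```

Here both sides agree that there are exactly two layers of escaping and two parses, which is the consistency the paragraph above is asking for.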

Today’s civilization will surely go down because escaping doesn’t work (and don’t even get me started on encodings, oh my…).

NP: Le Tigre—Phanta

