Ruby Buzz Forum - The Dark Side of Atom

Christian Neukirchen

Posts: 188
Nickname: chris2
Registered: Mar, 2005

Christian Neukirchen is a student from Biberach, Germany playing and hacking with Ruby.

The Dark Side of Atom

Posted: Jun 27, 2005 1:19 PM

This post originated from an RSS feed registered with Ruby Buzz by Christian Neukirchen.
Original Post: The Dark Side of Atom
Feed Title: chris blogs: Ruby stuff
Feed URL: http://chneukirchen.org/blog/category/ruby.atom
Feed Description: a weblog by christian neukirchen - Ruby stuff

Yesterday antifuchs told me about a problem with the Atom feed of Anarchaia, which now and then includes IRC quotes like this:

#ruby-de

12:18 <ionas_> alles was nicht analog ist ist lossy ;p

12:18 <ionas_> und alles was analog ist geht schnell kaputt ,p

The raw HTML, taken directly from the generated page, looks like this:

<div class="ircquote">
<span class="channel">#ruby-de</span>
<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
<div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
</div>

In default IRC style, I quote the nickname with < and >, but antifuchs tells me he doesn’t see any nicks when he looks at my blog with Bloglines. Weird, I think, and decide to have a look at it.

Just for fun, I subscribe to my blog in NetNewsWire and I see… no nicknames! Now, how is my Atom feed generated? The snippet looks roughly like this:

<entry>
<title>25</title>
<!-- ... -->
<content mode="xml" xmlns="http://www.w3.org/1999/xhtml">
  <div class="ircquote">
  <span class="channel">#ruby-de</span>
  <div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
  <div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
  </div>
</content>
</entry>

And I start to wonder. My Atom feed is perfectly valid, and I just inserted the raw (and valid) XHTML as-is. This should be OK. To quote the Atom specification (emphasis mine):

3) If the value of “type” is “xhtml”, the content of atom:content MUST be a single XHTML div element [XHTML], and SHOULD be suitable for handling as XHTML. The XHTML div element itself MUST NOT be considered part of the content. Atom Processors that display the content MAY use the markup to aid in displaying it. The escaped versions of characters such as “&” and “>” represent those characters, not markup.

Now, apparently, both Bloglines and NetNewsWire pass the XHTML on to a rendering engine, my browser and HTMLKit respectively. Those seem to parse it a second time, thereby creating the tag <ionas_>. I fixed that by escaping every & in my Atom feeds as &amp;, so the nick now reads &amp;lt;ionas_&amp;gt; in the feed. Which is more than ugly and really pisses me off.
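To make the two decoding passes concrete, here is a minimal Ruby sketch. The helpers escape_xml, unescape_xml and render_as_html are hypothetical names invented for this illustration, and the tag-stripping regex is only a crude stand-in for whatever Bloglines or NetNewsWire really do; it is not how either reader is implemented.

```ruby
# Hypothetical helpers for illustration; none of these names come from the post.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
UNESCAPES = ESCAPES.invert.freeze

# Escape the three characters that matter inside element content.
def escape_xml(text)
  text.gsub(/[&<>]/, ESCAPES)
end

# Decode one level of escaping, as an XML parser does when it reads the feed.
def unescape_xml(text)
  text.gsub(/&(?:amp|lt|gt);/, UNESCAPES)
end

# Crude stand-in (an assumption) for a reader that hands the decoded text to
# an HTML engine: anything that looks like a tag is dropped, then entities
# are decoded a second time.
def render_as_html(text)
  unescape_xml(text.gsub(/<[^>]*>/, ""))
end

line = "12:18 <ionas_> alles was nicht analog ist ist lossy ;p"

escaped_once  = escape_xml(line)          # what the feed contained at first
escaped_twice = escape_xml(escaped_once)  # the workaround

# The reader's XML parser decodes one level, then the HTML engine parses again.
shown_once  = render_as_html(unescape_xml(escaped_once))
shown_twice = render_as_html(unescape_xml(escaped_twice))

puts escaped_twice
puts shown_once
puts shown_twice
```

With single escaping the nick turns into a pseudo-tag on the second pass and vanishes; with double escaping it survives, at the price of &amp;lt; and &amp;gt; in the feed source.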

When I see such stuff, I sometimes think RSS really did it better when they just decided to escape the whole thing and strew their entities all over. That would be consistent, at least.
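For comparison, the RSS approach can be sketched like this, assuming the common practice of putting escaped HTML into the description element. The escape_xml helper is again a hypothetical name, not something from my feed generator.

```ruby
# Hypothetical helper for illustration, not taken from the post.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Escape the three characters that matter inside element content.
def escape_xml(text)
  text.gsub(/[&<>]/, ESCAPES)
end

# The HTML as the blog generates it, with the nick already escaped once.
html = '<div class="line">12:18 &lt;ionas_&gt; alles was nicht analog ist ist lossy ;p</div>'

# RSS style: the whole fragment becomes character data inside <description>,
# so markup and text get exactly the same treatment.
description = "<description>#{escape_xml(html)}</description>"

puts description
```

Everything, tags included, is escaped exactly once more, so a reader that decodes one level gets the original HTML back and nothing depends on guessing which parts are markup.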

The civilization of today surely will go down because escaping doesn’t work (and don’t even get me started on encodings, oh my…).

NP: Le Tigre—Phanta
