Agile Buzz Forum - RSS Identity

Gordon tackles the syndication ID issue, and mentions in passing the scheme I use on this server:

RSS 2.0, RSS 1.0, and Atom all provide a way to handle post identity: the <guid> element, the rdf:about attribute, and the <atom:id> element, respectively. Unfortunately, not everyone provides this metadata, and some who do get it wrong: for instance, CNN doesn't give you GUIDs, the Cincom weblogs just use big integers (these look like they might be dates, but I'm not sure), and PHP.NET is re-using the rdf:about attribute on different posts. The problems, from last to first: if you identify posts by GUID, re-using a GUID amounts to modifying a post, though that doesn't seem to be the intent in this case. Using big integers is poor practice, because an integer isn't a GUID. Recall that the GU part stands for globally unique: if you use integers as GUIDs, you're just hoping that there won't be a collision, especially if your protocol is to increment a counter with each new post. If you're going to use an integer for a GUID, use a really big one (128 bits or so), and use an algorithm appropriate for the purpose: counters are not appropriate.
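To make the counter objection concrete, here is a minimal sketch in Python (not the Smalltalk this server runs). The function names are hypothetical, and the random version-4 UUID is just one possible reading of "an algorithm appropriate for the purpose", not something Gordon's post prescribes.

```python
import itertools
import uuid


def counter_ids():
    """Per-feed counter: every feed using this scheme starts at 1."""
    return itertools.count(1)


def random_id() -> str:
    """Random 128-bit identifier (UUID version 4) in URN form."""
    return uuid.uuid4().urn


# Two unrelated feeds that both count upward hand out the same IDs.
feed_a = counter_ids()
feed_b = counter_ids()
first_a = next(feed_a)
first_b = next(feed_b)
counters_collide = first_a == first_b

# Random 128-bit IDs need no coordination between feeds.
id_a = random_id()
id_b = random_id()
```

With counters, the first post of every feed gets the same ID; with random 128-bit values, a collision is possible in principle but vanishingly unlikely.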

Well, I can explain what I'm doing here. Ideally, the GUID should probably be a URL, and the scheme I'm using would allow for that. Way back when I started the codebase, that didn't seem so obvious, so I picked something I figured would be unlikely to be duplicated:

guid := Timestamp now asSeconds.

In other words, the GUID for a post on this server is the post's creation time, in seconds. It's unique within the context of a single image server (barring two posts created within the same second), and unlikely to be duplicated somewhere else. In the feed itself, it's also clearly marked as not being a URL.
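A rough Python equivalent of the scheme, as a sketch only. It assumes the "not a URL" marking is RSS 2.0's isPermaLink="false" attribute on <guid>, and it counts seconds from the Unix epoch, which may not be the epoch the Smalltalk Timestamp class uses. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def timestamp_guid(created: datetime) -> str:
    """Creation time truncated to whole seconds, used as the post ID."""
    return str(int(created.timestamp()))


def guid_element(guid: str) -> str:
    """Serialize an RSS 2.0 <guid> that is flagged as not being a URL."""
    element = ET.Element("guid", {"isPermaLink": "false"})
    element.text = guid
    return ET.tostring(element, encoding="unicode")


first = datetime(2005, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# A second post created 0.4 seconds later, within the same clock second.
second = first.replace(microsecond=400_000)

first_guid = timestamp_guid(first)
second_guid = timestamp_guid(second)
same_second_collision = first_guid == second_guid

rendered = guid_element(first_guid)
```

The sketch also shows the one way the scheme can collide on a single server: two posts whose creation times fall within the same second get the same ID.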

Was this a great idea in the long term? Possibly not. The scheme could produce duplicate numbers (at least across different blogs) if I started using more than one image - and, if I ever changed the code to use more than one image for a given blog, I'd have real problems. On the other hand, I'd need to make multiple changes to support the latter, so it's unlikely to happen. As for duplicate numbers across different blogs? Yeah, that's an issue, and if I ever need to cross that bridge, I'll just change the IDs to URLs - the ID is used by the server to identify posts, so it would be a trivial change.
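One way the switch to URLs could look, sketched in Python. The base URL, the blog names, and the url_guid helper are all made up for illustration; none of them come from the server's actual code.

```python
from urllib.parse import quote


def url_guid(base_url: str, blog: str, post_id: int) -> str:
    """Build a URL-shaped ID by scoping the numeric ID to its blog."""
    return f"{base_url.rstrip('/')}/{quote(blog, safe='')}/{post_id}"


# Two blogs on different images that happen to post in the same second.
same_second = 1104580800
guid_alpha = url_guid("https://example.com/blog", "alpha", same_second)
guid_beta = url_guid("https://example.com/blog", "beta", same_second)
ids_differ = guid_alpha != guid_beta
```

Because the blog name becomes part of the identifier, identical timestamps on different blogs no longer collide. One caveat: aggregators that track posts by GUID would probably treat existing posts as new once their IDs change.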

If it's a trivial change, why haven't I done it? Sheer inertia. Things work at present, and I haven't been highly motivated to fix a theoretical problem. At least, not yet.

