The Artima Developer Community
Java Buzz Forum
More database type switching

Bill de hÓra

Posts: 1137
Nickname: dehora
Registered: May, 2003

Bill de hÓra is a technical architect with Propylon
More database type switching Posted: Aug 11, 2005 7:47 AM

This post originated from an RSS feed registered with Java Buzz by Bill de hÓra.
Original Post: More database type switching
Feed Title: Bill de hÓra
Feed URL: http://www.dehora.net/journal/atom.xml
Feed Description: FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8

I got some quick feedback on my question about how to treat disjoint types in an RDBMS, but it seems I left out some detail, and I might have posed the wrong question altogether. To recap, there's an event structure as follows:

    class event:
        def __init__(self, what, where, when):
            self.what = what
            self.where = where
            self.when = when

whose 'what' value can be a string, a URI or an XML document. The thing is that these 3 types are disjoint, and I was wondering what people thought the idiomatic way to deal with this in an RDBMS was.

Bill Seitz mentioned sparse tables, where one of the 3 possible 'what' columns is populated for each row: "Or, maybe I'd make whatType, whatString, whatUri, and whatBlob fields in a single (sparse) table."

Aristotle Pagaltzis described a normalised approach and its potential runtime inefficiency: "The clean, minimally redundant approach would be to use four tables, of which one is the 'event' table, which holds only when/where pairs and a primary key, and of which the other three are 'what' tables whose primary keys are simultaneously foreign keys to the event table. This way each datum can be stored in a properly typed column, without storing boatloads of NULLs as you’d have to if you did this with a single table having one column per type of value.... Unfortunately, this is stupidly costly to query – you need three left joins in every single statement. Worse, you need the primary key from the event table before you can update any of the what tables, so you have to chatter back and forth with the database instead of dumping bulk statements on it."

Adam Vandenberg asked: "Are you going to query against "what", or just process them when they come up in a query?"

So rather than looking for a generic RDBMS setup for disjoint types, we need to think about what will be done with the data.
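To make the two suggestions concrete, here is a minimal sketch of both layouts using Python's built-in sqlite3 module. None of the names come from the comments quoted above: the table and column names (event_sparse, place, happened_at, what_string and so on), the helper functions and the sample rows are all made up for illustration, and 'where' and 'when' are renamed because they are SQL keywords.

```python
import sqlite3

# Layout A (sparse): one table, exactly one of the what_* columns set per row.
# All names here are hypothetical.
SPARSE_SCHEMA = """
CREATE TABLE event_sparse (
    id          INTEGER PRIMARY KEY,
    place       TEXT NOT NULL,
    happened_at TEXT NOT NULL,
    what_type   TEXT NOT NULL CHECK (what_type IN ('string', 'uri', 'xml')),
    what_string TEXT,
    what_uri    TEXT,
    what_xml    TEXT,
    CHECK ((what_string IS NOT NULL) + (what_uri IS NOT NULL)
           + (what_xml IS NOT NULL) = 1)
);
"""

# Layout B (normalised): an event table plus three 'what' tables whose
# primary keys are also foreign keys to event.
NORMALISED_SCHEMA = """
CREATE TABLE event (
    id          INTEGER PRIMARY KEY,
    place       TEXT NOT NULL,
    happened_at TEXT NOT NULL
);
CREATE TABLE what_string (event_id INTEGER PRIMARY KEY REFERENCES event(id),
                          value TEXT NOT NULL);
CREATE TABLE what_uri    (event_id INTEGER PRIMARY KEY REFERENCES event(id),
                          value TEXT NOT NULL);
CREATE TABLE what_xml    (event_id INTEGER PRIMARY KEY REFERENCES event(id),
                          value TEXT NOT NULL);
"""

WHAT_TABLES = {"string": "what_string", "uri": "what_uri", "xml": "what_xml"}


def insert_normalised(conn, what_type, what, where, when):
    """Two statements per event: the event key is needed before the 'what' row."""
    cur = conn.execute(
        "INSERT INTO event (place, happened_at) VALUES (?, ?)", (where, when)
    )
    event_id = cur.lastrowid
    table = WHAT_TABLES[what_type]  # fixed whitelist, never user input
    conn.execute(
        f"INSERT INTO {table} (event_id, value) VALUES (?, ?)", (event_id, what)
    )
    return event_id


def fetch_normalised(conn):
    """Reassemble events; this is the three-left-join query the comment warns about."""
    return conn.execute(
        """
        SELECT e.id, e.place, e.happened_at,
               COALESCE(s.value, u.value, x.value) AS what
        FROM event e
        LEFT JOIN what_string s ON s.event_id = e.id
        LEFT JOIN what_uri    u ON u.event_id = e.id
        LEFT JOIN what_xml    x ON x.event_id = e.id
        ORDER BY e.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SPARSE_SCHEMA + NORMALISED_SCHEMA)

# Sparse layout: a single insert, with the unused what_* columns left NULL.
conn.execute(
    "INSERT INTO event_sparse (place, happened_at, what_type, what_uri) "
    "VALUES (?, ?, 'uri', ?)",
    ("host-b", "2005-08-11T08:00:00", "http://example.org/events/restart"),
)

# Normalised layout: each event costs a round trip for the key, then a second insert.
insert_normalised(conn, "string", "disk full", "host-a", "2005-08-11T07:47:00")
insert_normalised(conn, "uri", "http://example.org/events/restart",
                  "host-b", "2005-08-11T08:00:00")
rows = fetch_normalised(conn)
```

The sparse table trades NULL-heavy rows for a single insert and no joins; the normalised layout keeps each value in its own typed table at the cost of the extra statement and the joins.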
Ok, so of the 3 possible types (text, URIs, XML), two are candidates for querying against interactively.

The 'well-known' XML format is a candidate to be queried on, as it has a standard header set; that would be much more useful to capture as a table than as a blob. That way we can ask questions like: "show me all the events whose XML header's foo element (now a column) is 'bar'".

The URI is a candidate to be queried on, since it is a name for some class of events: "show me all the events with a URI of 'X' since this date".

The text data I wouldn't expect to query against, just render.

That would tend to lead to each type being kept in its own table. In terms of volumes, I'd imagine we'd be seeing around 25,000 of these events each week, where about 80% are 'well-known' XML, and for the sake of argument let's say we could flush the database annually, leaving a running total of about 1,300,000 records (25,000 a week for 52 weeks).

Incidentally, Jimmy Cerra mentioned that in RDF: "I'd use a NODE type and have the object be either a URIRef or a typed literal (definitely not rdf:type though)." And if any of the RDF community are reading, you can multiply the above figures by 10: each one of these events results in approximately 10 RDF statements. I gather that 6-10M triples is the state of the art in RDF storage, but here we'd be talking about 13M statements, which I think would argue for partitioning the data in separate graphs....
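As a rough sketch of where that reasoning leads, the tables below are split by how each type is used rather than by a generic pattern: the XML header's foo element becomes an indexed column, the URI gets its own indexed table, and the text is simply stored. This is only one possible reading, written against Python's sqlite3 module; every table, column and function name is hypothetical, 'foo', 'bar' and 'X' are the placeholders from the examples above, and timestamps are assumed to be ISO 8601 strings so that plain string comparison orders them by date.

```python
import sqlite3

# Hypothetical schema: one table per 'what' type, shaped by the queries expected.
SCHEMA = """
CREATE TABLE event (
    id          INTEGER PRIMARY KEY,
    place       TEXT NOT NULL,
    happened_at TEXT NOT NULL          -- ISO 8601 string (assumption)
);
CREATE TABLE event_xml (
    event_id INTEGER PRIMARY KEY REFERENCES event(id),
    foo      TEXT,                     -- header element promoted to a column
    body     TEXT NOT NULL             -- the full XML document
);
CREATE TABLE event_uri (
    event_id INTEGER PRIMARY KEY REFERENCES event(id),
    uri      TEXT NOT NULL
);
CREATE TABLE event_text (
    event_id INTEGER PRIMARY KEY REFERENCES event(id),
    body     TEXT NOT NULL             -- rendered only, so no index
);
CREATE INDEX event_xml_foo ON event_xml (foo);
CREATE INDEX event_uri_uri ON event_uri (uri);
"""


def add_event(conn, place, happened_at):
    """Insert the where/when part and return the new event id."""
    cur = conn.execute(
        "INSERT INTO event (place, happened_at) VALUES (?, ?)", (place, happened_at)
    )
    return cur.lastrowid


def events_with_foo(conn, value):
    """Ids of events whose XML header's foo element equals the given value."""
    rows = conn.execute(
        "SELECT e.id FROM event e JOIN event_xml x ON x.event_id = e.id "
        "WHERE x.foo = ? ORDER BY e.id",
        (value,),
    )
    return [row[0] for row in rows]


def events_with_uri_since(conn, uri, since):
    """Ids of events with the given URI at or after the given timestamp."""
    rows = conn.execute(
        "SELECT e.id FROM event e JOIN event_uri u ON u.event_id = e.id "
        "WHERE u.uri = ? AND e.happened_at >= ? ORDER BY e.id",
        (uri, since),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data covering the three types.
e1 = add_event(conn, "host-a", "2005-08-01T09:00:00")
conn.execute("INSERT INTO event_xml (event_id, foo, body) VALUES (?, ?, ?)",
             (e1, "bar", "<event><header><foo>bar</foo></header></event>"))
e2 = add_event(conn, "host-a", "2005-07-01T09:00:00")
conn.execute("INSERT INTO event_uri (event_id, uri) VALUES (?, ?)", (e2, "X"))
e3 = add_event(conn, "host-b", "2005-08-10T09:00:00")
conn.execute("INSERT INTO event_uri (event_id, uri) VALUES (?, ?)", (e3, "X"))
e4 = add_event(conn, "host-b", "2005-08-11T07:47:00")
conn.execute("INSERT INTO event_text (event_id, body) VALUES (?, ?)",
             (e4, "just some text to render"))

foo_hits = events_with_foo(conn, "bar")
uri_hits = events_with_uri_since(conn, "X", "2005-08-01")
```

Each interactive query touches the event table and one typed table, so the three-way left join is only needed when rendering events of mixed type together.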



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use