Java Buzz Forum - More database type switching



Bill de hÓra

Posts: 1137
Nickname: dehora
Registered: May, 2003

Bill de hÓra is a technical architect with Propylon


Posted: Aug 11, 2005 7:47 AM

This post originated from an RSS feed registered with Java Buzz by Bill de hÓra.
Original Post: More database type switching
Feed Title: Bill de hÓra
Feed URL: http://www.dehora.net/journal/atom.xml
Feed Description: FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8

I got some quick feedback on my question about how to treat disjoint types in an RDBMS, but it seems I left out some detail, and I might have posed the wrong question altogether. To recap, there's an event structure as follows:

    class event:
        def __init__(self, what, where, when):
            self.what = what
            self.where = where
            self.when = when

whose 'what' value can be a string, a URI or an XML document. The thing is that these 3 types are disjoint, and I was wondering what people thought the idiomatic way to deal with this issue was in an RDBMS.

Bill Seitz mentioned sparse tables, where only one of the 3 possible 'what' columns is populated for each row: "Or, maybe I'd make whatType, whatString, whatUri, and whatBlob fields in a single (sparse) table."

Aristotle Pagaltzis described a normalised approach and its potential runtime inefficiency: "The clean, minimally redundant approach would be to use four tables, of which one is the 'event' table, which holds only when/where pairs and a primary key, and of which the other three are 'what' tables whose primary keys are simultaneously foreign keys to the event table. This way each datum can be stored in a properly typed column, without storing boatloads of NULLs as you’d have to if you did this with a single table having one column per type of value.... Unfortunately, this is stupidly costly to query – you need three left joins in every single statement. Worse, you need the primary key from the event table before you can update any of the what tables, so you have to chatter back and forth with the database instead of dumping bulk statements on it."

Adam Vandenberg asked: "Are you going to query against "what", or just process them when they come up in a query?"

So it seems that instead of thinking about a generic RDBMS setup for disjoint types, we need to think about what needs to be done with the data.
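To make the two quoted layouts concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. None of this comes from the commenters: the table names, the column names (where_ and when_ are spelled that way only to sidestep SQL keywords), the CHECK constraint and the sample values are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Layout 1 (sparse table): one row per event, one populated 'what' column.
# All names are hypothetical. The CHECK ties what_type to the single
# column that is allowed to be non-NULL.
conn.execute("""
    CREATE TABLE event_sparse (
        id          INTEGER PRIMARY KEY,
        where_      TEXT NOT NULL,
        when_       TEXT NOT NULL,
        what_type   TEXT NOT NULL CHECK (what_type IN ('string', 'uri', 'xml')),
        what_string TEXT,
        what_uri    TEXT,
        what_blob   BLOB,
        CHECK ((what_type = 'string') = (what_string IS NOT NULL)
           AND (what_type = 'uri')    = (what_uri    IS NOT NULL)
           AND (what_type = 'xml')    = (what_blob   IS NOT NULL))
    )
""")
conn.execute(
    "INSERT INTO event_sparse (where_, when_, what_type, what_uri) "
    "VALUES (?, ?, ?, ?)",
    ("host-a", "2005-08-11T07:47:00", "uri", "http://example.org/events/X"),
)

# Layout 2 (normalised): an event table plus three 'what' tables whose
# primary keys are also foreign keys to event.
conn.executescript("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY, where_ TEXT NOT NULL, when_ TEXT NOT NULL);
    CREATE TABLE what_string (
        event_id INTEGER PRIMARY KEY REFERENCES event(id), value TEXT NOT NULL);
    CREATE TABLE what_uri (
        event_id INTEGER PRIMARY KEY REFERENCES event(id), value TEXT NOT NULL);
    CREATE TABLE what_xml (
        event_id INTEGER PRIMARY KEY REFERENCES event(id), value TEXT NOT NULL);
""")

# Writing one event takes two statements: the generated key has to come
# back from the first insert before the 'what' row can be written.
cur = conn.execute(
    "INSERT INTO event (where_, when_) VALUES (?, ?)",
    ("host-a", "2005-08-11T07:47:00"),
)
event_id = cur.lastrowid
conn.execute(
    "INSERT INTO what_uri (event_id, value) VALUES (?, ?)",
    (event_id, "http://example.org/events/X"),
)

# Reading events back regardless of type needs the three left joins.
rows = conn.execute("""
    SELECT e.id, e.where_, e.when_,
           COALESCE(s.value, u.value, x.value) AS what,
           CASE WHEN s.event_id IS NOT NULL THEN 'string'
                WHEN u.event_id IS NOT NULL THEN 'uri'
                WHEN x.event_id IS NOT NULL THEN 'xml'
           END AS what_type
    FROM event e
    LEFT JOIN what_string s ON s.event_id = e.id
    LEFT JOIN what_uri    u ON u.event_id = e.id
    LEFT JOIN what_xml    x ON x.event_id = e.id
""").fetchall()
```

One thing the sketch makes visible: in the four-table layout as written here, nothing stops a single event from having rows in two 'what' tables, so disjointness would still need to be enforced by the application or by extra constraints. The sparse table can enforce it with a single CHECK.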
OK, so of the 3 possible types (text, URIs, XML), two are candidates for querying against interactively.

The 'well-known' XML format is a candidate to be queried on, as it has a standard header set; that would be much more useful to capture as a table than as a blob. That way we can ask questions like: "show me all the events whose XML header foo element (now a column) is 'bar'".

The URI is a candidate to be queried on, since it is the name of some class of events: "show me all the events with a URI of 'X' since this date".

The text data I wouldn't expect to query against, just render.

That would tend to lead to each type being kept in its own table. In terms of volumes, I'd imagine we'd be seeing around 25,000 of these events each week, of which about 80% are 'well-known' XML. For the sake of argument, let's say we could flush the database annually, leaving a running total of about 1,300,000 records (25,000 a week over 52 weeks).

Incidentally, Jimmy Cerra mentioned that in RDF: "I'd use a NODE type and have the object be either a URIRef or a typed literal (definitely not rdf:type though)." If any of the RDF community are reading, you can multiply the above figures by 10, since each one of these events results in approximately 10 RDF statements. I gather that 6-10M triples is the state of the art in RDF storage, but here we'd be talking about 13M statements, which I think would argue for partitioning the data in separate graphs....
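As a rough sketch of where that reasoning leads, the following uses Python's sqlite3 module to give each type its own table, promote one XML header element to a column, and run the two example queries. The table names, column names, indexes and sample rows are all invented for illustration (foo stands in for whatever the real header elements are), and the last few lines simply redo the volume arithmetic from the figures quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Well-known XML: header elements become columns, the body is kept as text.
    CREATE TABLE xml_event (
        id     INTEGER PRIMARY KEY,
        where_ TEXT NOT NULL,
        when_  TEXT NOT NULL,   -- ISO 8601, so string order is time order
        foo    TEXT,            -- hypothetical header element
        body   TEXT NOT NULL
    );
    CREATE INDEX xml_event_foo ON xml_event (foo);

    -- URIs: queried by name and date, so index on both.
    CREATE TABLE uri_event (
        id     INTEGER PRIMARY KEY,
        where_ TEXT NOT NULL,
        when_  TEXT NOT NULL,
        uri    TEXT NOT NULL
    );
    CREATE INDEX uri_event_uri_when ON uri_event (uri, when_);

    -- Plain text: only rendered, never queried on, so no extra index.
    CREATE TABLE text_event (
        id     INTEGER PRIMARY KEY,
        where_ TEXT NOT NULL,
        when_  TEXT NOT NULL,
        body   TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO xml_event (where_, when_, foo, body) VALUES (?, ?, ?, ?)",
    [
        ("host-a", "2005-08-01T10:00:00", "bar", "<event><foo>bar</foo></event>"),
        ("host-b", "2005-08-02T10:00:00", "baz", "<event><foo>baz</foo></event>"),
    ],
)
conn.executemany(
    "INSERT INTO uri_event (where_, when_, uri) VALUES (?, ?, ?)",
    [
        ("host-a", "2005-07-01T00:00:00", "http://example.org/events/X"),
        ("host-a", "2005-08-05T00:00:00", "http://example.org/events/X"),
        ("host-c", "2005-08-06T00:00:00", "http://example.org/events/Y"),
    ],
)

# "Show me all the events whose XML header foo element is 'bar'."
foo_is_bar = conn.execute(
    "SELECT id, where_, when_ FROM xml_event WHERE foo = ?", ("bar",)
).fetchall()

# "Show me all the events with a URI of 'X' since this date."
x_since = conn.execute(
    "SELECT id, when_ FROM uri_event WHERE uri = ? AND when_ >= ? ORDER BY when_",
    ("http://example.org/events/X", "2005-08-01"),
).fetchall()

# Volume estimate from the figures in the post.
EVENTS_PER_WEEK = 25_000
events_per_year = EVENTS_PER_WEEK * 52              # annual flush
xml_events_per_year = events_per_year * 80 // 100   # about 80% well-known XML
rdf_statements_per_year = events_per_year * 10      # about 10 statements per event
```

The trade-off in this layout is that a "show me everything in time order" view has to combine three tables (for example with UNION ALL), which is acceptable only if that kind of cross-type query is rare.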
