Agile Buzz Forum
Normalization vs Compression

James Robertson

Posted: Jul 9, 2010 12:08 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Normalization vs Compression
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life

Recently I was working on a relational database, juggling the eternal trade-off between normalization and efficiency. There are many books about database normalization, written by people who love writing books about database normalization. I've read a couple myself, and it all seemed reasonable back in university.

But it's the 21st century, and it dawned on me that normalization is the futile act of trying to compress data by hand. If you break your data records into their related component parts to avoid duplicate data, then you're essentially reinventing compression, badly.
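
To make that concrete, here's a minimal sketch in Python (using the standard library's zlib; the table layout is invented for illustration). The customer details repeated across denormalized order rows are exactly the redundancy a compressor removes for free, and exactly the redundancy a separate customers table would remove by hand:

```python
import json
import zlib

# Denormalized orders: every row repeats the customer's name and address.
orders = [
    {"order_id": i,
     "customer_name": "Ada Lovelace",                 # repeated on every row
     "customer_address": "12 Analytical Engine Way",  # repeated on every row
     "item": f"widget-{i}"}
    for i in range(1000)
]

raw = json.dumps(orders).encode("utf-8")
packed = zlib.compress(raw)

# The repeated customer fields cost almost nothing once compressed.
print(len(raw), len(packed))
```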

So let's say we let compression handle our data storage. With smart hashing, you could store just as much data in a "document" format. But how do you identify when two records are actually the same? I suspect that's the job of hashing and indexing. If two people have the same address (or, more interestingly, a similar one), statistical analysis can discover that fact without a programmer having to define the concept of an address as its own table.
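
As a rough sketch of that idea, not any real database's API: a content hash collapses byte-identical records to a single key, while a crude token-overlap score stands in for the kind of statistical similarity that could spot two spellings of the same address. The function names and the similarity measure here are invented for illustration:

```python
import hashlib

def record_key(address: str) -> str:
    # Trivially normalize, then hash: byte-identical addresses share one key.
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of word tokens, a toy stand-in for statistical similarity.
    tokens = lambda s: set(s.lower().replace(",", " ").split())
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

a = "12 Analytical Engine Way, London"
b = "12 Analytical Engine Way London"

print(record_key(a) == record_key(b))   # False: not byte-identical, so hashing misses it
print(similarity(a, b))                 # 1.0: token overlap suggests the same address
```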

I've also seen a few impressive search engines that work exactly this way: compressing the data and using hashes to look it up rapidly. That, combined with similarity indexing and automatic data normalization through reduction, seems interesting to me. It could find patterns you'd never think to normalize by hand, yet ultimately make your program more efficient based on how you actually use the data.
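
Here's a toy version of that "compress and hash" approach, with made-up class and method names: documents are stored zlib-compressed and addressed by a content hash, so duplicates are stored once and lookup is a single dictionary access.

```python
import hashlib
import json
import zlib

class CompressedStore:
    def __init__(self):
        self._blobs = {}   # content hash -> compressed bytes

    def put(self, doc: dict) -> str:
        raw = json.dumps(doc, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._blobs:          # duplicate documents cost nothing extra
            self._blobs[key] = zlib.compress(raw)
        return key

    def get(self, key: str) -> dict:
        return json.loads(zlib.decompress(self._blobs[key]))

store = CompressedStore()
k = store.put({"name": "Ada Lovelace", "address": "12 Analytical Engine Way"})
print(store.get(k))
```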

You can also use the same technique to apply indexing automatically. It is compression, after all, so finding the kinds of data you're looking for becomes a job for statistical machine learning. Is anyone out there doing this right now? I'd be interested in trying this kind of database instead of a classical relational one.
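
A hypothetical sketch of what automatic indexing might look like, with invented names and thresholds: the store counts which fields queries actually touch and builds a hash index for the hot ones, instead of a schema designer declaring indexes up front.

```python
from collections import Counter, defaultdict

class AutoIndexingStore:
    def __init__(self, index_after=3):
        self.docs = []                      # plain list of documents
        self.query_counts = Counter()       # field -> how often it was queried
        self.indexes = {}                   # field -> {value: [doc positions]}
        self.index_after = index_after      # queries before an index is built

    def add(self, doc: dict):
        pos = len(self.docs)
        self.docs.append(doc)
        for field, idx in self.indexes.items():
            idx.setdefault(doc.get(field), []).append(pos)

    def find(self, field, value):
        self.query_counts[field] += 1
        if field in self.indexes:
            return [self.docs[i] for i in self.indexes[field].get(value, [])]
        if self.query_counts[field] >= self.index_after:
            self._build_index(field)        # this field is hot: index it for next time
        return [d for d in self.docs if d.get(field) == value]

    def _build_index(self, field):
        idx = defaultdict(list)
        for i, d in enumerate(self.docs):
            idx[d.get(field)].append(i)
        self.indexes[field] = dict(idx)
```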

Read: Normalization vs Compression
