This post originated from an RSS feed registered with Java Buzz
by Wolf Paulus.
Original Post: Tiger's Spotlight - Simplicity with Room for Improvement
Feed Title: Wolf's Web Journal
Feed URL: http://wolfpaulus.com/feed/
Feed Description: Journal - dedicated to excellence, and motivated by enthusiasm to trying new things
On Thursday I attended Apple's Tiger Tech Talk and saw some of the changes in the next Mac OS X release firsthand.
One of the features that will definitely attract attention is Spotlight, Tiger's new search technology. Tiger will include an index server capable of parsing files on all mounted drives. Every time a file is created, saved, moved, copied, or deleted, the file system automatically ensures that the file is properly indexed. The index itself, referred to as the dictionary, is a file-system-level database that holds all of the meta-data attributes of the files as well as an index of their contents.
So-called Importers decide what goes into the dictionary. It seems that only one Importer can be registered system-wide for each file type. Moreover, the Importer call is triggered from a low-level file-system routine, so it may no longer be obvious which application triggered the event that caused the (re-)parsing request. An Importer is handed a pointer to the file that needs to be indexed and a dictionary.
As mentioned above, the dictionary contains both meta-data and an index of the contents; the latter, however, will be limited to a few hundred kilobytes per file.
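To make the Importer contract more concrete, here is a minimal sketch in Java. It is a conceptual model only, not Apple's actual API: the `Importer` interface, the attribute names, and the 200 KB cap are all assumptions made up for illustration (the talk only mentioned a limit of a few hundred kilobytes).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of an importer; this is not Apple's Spotlight API. */
public class ImporterSketch {

    /** Assumed cap on indexed content per file; the real limit is not known here. */
    static final int MAX_CONTENT_BYTES = 200 * 1024;

    /** An importer receives the file to index and the dictionary to fill. */
    interface Importer {
        void importFile(Path file, Map<String, Object> dictionary) throws IOException;
    }

    /** Example importer for plain-text files. */
    static class PlainTextImporter implements Importer {
        @Override
        public void importFile(Path file, Map<String, Object> dictionary) throws IOException {
            byte[] bytes = Files.readAllBytes(file);
            int length = Math.min(bytes.length, MAX_CONTENT_BYTES);
            // Meta-data attributes (names invented for this sketch).
            dictionary.put("name", file.getFileName().toString());
            dictionary.put("sizeInBytes", (long) bytes.length);
            // Content for the index, truncated to the assumed cap.
            dictionary.put("textContent", new String(bytes, 0, length, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("note", ".txt");
        Files.write(file, "Spotlight indexes this text".getBytes(StandardCharsets.UTF_8));
        Map<String, Object> dictionary = new HashMap<>();
        new PlainTextImporter().importFile(file, dictionary);
        System.out.println(dictionary);
        Files.delete(file);
    }
}
```

The point of the sketch is the shape of the call: the importer does not own the dictionary, it only fills in the one it is handed, and anything beyond the cap never reaches the index.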
While I don't doubt that Apple will succeed in marketing this as a world-class search tool, from what I know today there are some shortcomings:
* Only one Importer can be registered per file type (.extension) system-wide.
* No daisy-chaining of Importers.
* Only a very limited amount of data can be stored in the dictionary per file.
* No built-in capability to index the contents of compressed files such as jar, zip, or tar archives.
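The first two constraints in the list above can be modelled with a map keyed by file extension. The following is a rough sketch under stated assumptions, not a description of how Tiger implements it: the class name, the `register` and `index` methods, and the rule that a later registration silently displaces an earlier one are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical registry; models the constraints listed above, not Apple's implementation. */
public class ImporterRegistrySketch {

    // One slot per file extension: the map key enforces "only one importer per type".
    private final Map<String, BiConsumer<String, Map<String, Object>>> importers = new HashMap<>();

    /** Registers an importer; returns true if it displaced an earlier one (assumed behavior). */
    public boolean register(String extension, BiConsumer<String, Map<String, Object>> importer) {
        return importers.put(extension, importer) != null;
    }

    /** Runs the single importer registered for the file's extension, if any. */
    public Map<String, Object> index(String fileName) {
        Map<String, Object> dictionary = new HashMap<>();
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot);
        BiConsumer<String, Map<String, Object>> importer = importers.get(extension);
        if (importer != null) {
            importer.accept(fileName, dictionary); // no chaining: exactly one call
        }
        return dictionary;
    }

    public static void main(String[] args) {
        ImporterRegistrySketch registry = new ImporterRegistrySketch();
        registry.register(".txt", (file, dict) -> dict.put("title", "from importer A"));
        registry.register(".txt", (file, dict) -> dict.put("author", "from importer B"));
        // Only importer B's attribute is present; importer A never runs.
        System.out.println(registry.index("notes.txt"));
    }
}
```

With daisy-chaining, both importers would get a chance to add their attributes to the same dictionary. With a single slot per type, two applications that care about the same file type cannot both contribute.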
Looking at this, I first thought that it is a purely text-file-based indexer. However, looking at how Apple uses it to index iCalendar and AddressBook contents, a workaround becomes immediately clear:
A dummy type (like .dmy) is registered with a custom Importer. Every time an application needs to insert, delete, or modify an entry in the dictionary, it just touches a myfile.dmy file, and Mac OS X calls the custom Importer, which can then do whatever is necessary with the dictionary it receives as a parameter.
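The application side of this workaround could look roughly like the sketch below. It only shows the "touch" step, using the standard `java.nio.file` API. The `TouchTrigger` class and the location of `myfile.dmy` are placeholders, and whether a changed modification time alone is enough to make the system call the Importer is an assumption based on the behavior described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

/** Sketch of the "touch a dummy file" trigger; file name and .dmy type are placeholders. */
public class TouchTrigger {

    /** Creates the file if it is missing, otherwise bumps its modification time. */
    public static void touch(Path file) throws IOException {
        if (Files.notExists(file)) {
            Files.createFile(file);
        } else {
            Files.setLastModifiedTime(file, FileTime.fromMillis(System.currentTimeMillis()));
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder location; a real application would keep the file with its own data.
        Path trigger = Paths.get(System.getProperty("java.io.tmpdir"), "myfile.dmy");
        // Call this after every insert, delete, or update of an application record.
        touch(trigger);
        System.out.println("Touched " + trigger + " at " + Files.getLastModifiedTime(trigger));
    }
}
```

The dummy file carries no data of its own. It acts purely as a signal, and the custom Importer is expected to read the application's real data store when it is called.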