The Artima Developer Community
Sponsored Link

Agile Buzz Forum
A Generic Dictionary

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
James Robertson

Posts: 29924
Nickname: jarober61
Registered: Jun, 2003

David Buck, Smalltalker at large
A Generic Dictionary Posted: Jun 21, 2004 5:42 PM
Reply to this message Reply

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: A Generic Dictionary
Feed Title: Travis Griggs - Blog
Feed URL: http://www.cincomsmalltalk.com/rssBlog/travis-rss.xml
Feed Description: This TAG Line is Extra
Latest Agile Buzz Posts
Latest Agile Buzz Posts by James Robertson
Latest Posts From Travis Griggs - Blog

Advertisement
This week, I thought I'd experiment with some new Collection classes

The Hypothesis

Sometimes, I want a dictionary that looks up on some computed attribute of some keys. With the base library, I have two choices. Equality and Identity. Identity seems pretty straightforward. But sometimes just what makes an object "equal" in its role as a key, can be a little less. For example, what if I compute values for given floating point inputs, but I want them bin them to the nearest 1/10th. Or what if I want to lookup customers by their name, and it's ok to assume that the lookup is case insensitive.

We could apply the conversion function to our computed key going in and out of the dictionary, but that gets old.

Also, I'm intrigued that the one size fits all hash function might not always be all that is needed for a given domain. And if we could use a specific hash function we might be able to improve the speed of lookup.

The Implementation

What we need is a dictionary where we can parameterize the functions used to index keys in the dictionary. There are two functions involved, first the hash function, and secondly, the comparison function.

Smalltalk.Core defineClass: #GenericDictionary
	superclass: #{Core.Dictionary}
	indexedType: #objects
	private: false
	instanceVariableNames: 'compareFunction hashFunction '
	classInstanceVariableNames: ''
	imports: ''
	category: 'Collections-Unordered'

Now we've got our dictionary, we'll use blocks for our functions. Here's our simple setter. We just provide one dual arg version. One has to still observe the invariant that all objects which are equal MUST hash the same, so forcing the application program to set the two functions in context of each other is probably a good thing.

compare: a2ArgBlock hash: a1ArgBlock 
	compareFunction := a2ArgBlock.
	hashFunction := a1ArgBlock

We want to initialize new instances to a good default state; the default use of = and hash seems good. Sets and thereby Dictionaries, don't use an initialize method, the use a setTally method at creation time to get initialized. So we override thusly:

setTally
	self compare: [:a :b | a = b] hash: [:obj | obj hash].
	^super setTally

With that in place, it turns there's exactly one method that needs to be tuned to use our functions:

findKeyOrNil: key 
	"overriden to use hash and compare function blocks, rather than hardcoded #hash and #="

	| location length probe pass |
	length := self basicSize.
	pass := 1.
	location := self initialIndexFor: (hashFunction value: key)
				boundedBy: length.
	
	[(probe := self basicAt: location) == nil 
		or: [compareFunction value: probe key value: key]] 
			whileFalse: 
				[(location := location + 1) > length 
					ifTrue: 
						[location := 1.
						pass := pass + 1.
						pass > 2 ifTrue: [^self grow findKeyOrNil: key]]].
	^location

That's all there is to it.

The Results

The case insensitive keys is kind of fun. It seems to work fine:

dict := GenericDictionary new 
			compare: [:a :b | a equivalentTo: b ignoreCase: true]
			hash: [:a | a asLowercase hash].
dict add: 'Travis' -> 'Griggs'.
Transcript
	show: (dict at: 'travis');
	cr.
dict at: 'TRAVIS' put: '(The Kid in Old Yeller)'.
Transcript
	show: (dict at: 'traVIS');
	cr

outputs 'Griggs' and then '(The Kid in Old Yeller)' to my transcript.

I was kind of disappointed in my attempts to find a domain where I could best the default implementation for speed. Adding the extra level of block invocation will make our version a little slower than default for the default case. I used a timing test of the form:

| dict keys | dict := (GenericDictionary new: 1000).
rnd := Random new.
keys := (1 to: 100) collect: [:each | rnd next].
(Time millisecondsToRun: [1000 timesRepeat:  [1 to: keys size do: [:i | dict at: (keys at: i) put: $y]]]) out.

It's not too bad, for this case, it was only about 10-15% slower. I then tried some floating point domains and specialized hash functions. In most cases, the default remained faster. Only after having a good look at the Float hash method was able to derive one where I could come out ahead:

| dict keys | dict := (GenericDictionary new: 1000) compare: [:a :b | a = b] hash: [:a | a truncated].
rnd := Random new.
keys := (1 to: 100) collect: [:each | rnd next * 1000].
(Time millisecondsToRun: [1000 timesRepeat:  [1 to: keys size do: [:i | dict at: (keys at: i) put: $y]]]) out

This particular example assumes a set of floating point keys that are sparsely spread far apart. At this point, we know that simply truncating them for a hash function is enough to get good separation, the default hash function does a bit more than that. So what we lose in block invocation, we gain by much faster hashing, for an overall speed increase of about 2.5x.

I'd have to conclude that only in rare cases is one going to come up with specialized hash/compare functions that are going to be a significant speed improvement. It's still kind of nice to be able to use it as a binning histogram. Or a case insensitive dictionary. I'll have to cogitate to see if there aren't some more interesting problems we could throw this guy at. Can you think of any?

Read: A Generic Dictionary

Topic: Security checks for Syndication Previous Topic   Next Topic Topic: PDF: Want Better Software? Just Ask: Seven Things Project Customers Can Do

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use