The Artima Developer Community
Sponsored Link

Python Buzz Forum
Think, Sync and Wink (part two)

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Sidnei da Silva

Posts: 31
Nickname: dreamcatch
Registered: Aug, 2003

Sidnei da Silva is a dirty little brazilian python hacker.
Think, Sync and Wink (part two) Posted: Oct 15, 2003 12:25 PM
Reply to this message Reply

This post originated from an RSS feed registered with Python Buzz by Sidnei da Silva.
Original Post: Think, Sync and Wink (part two)
Feed Title: dreamcatcher.homeunix.org
Feed URL: http://dreamcatcher.homeunix.org/categories.rdf?category=Python
Feed Description: making your dreams come true
Latest Python Buzz Posts
Latest Python Buzz Posts by Sidnei da Silva
Latest Posts From dreamcatcher.homeunix.org

Advertisement
Think, Sync and Wink (part two)

One of the first things I noticed when looking at IndexedCatalog for the first time was that the fact that it stored the indexes as OOBTrees, where the value was a reference to the object would probably cause a significant slowdown when querying, cause it would potentially wake up lots of objects unnecessarily. This proved to be true when we made the first profile: there were around 2000 calls to __setstate__ on a normal query, which was responsible for around 75% of the total time. There was also a intersection between OOSets (containing object references) involved, which is undoubtly slower than a intersection using IISets.

So, we decided to go ahead with the plan of converting the OO*s to II*s and added a new feature to the plan, after a discussion over chinese food: we would try to delay loading the objects until it was strictly necessary. That would be possible because the objects are normally fetched from a search result, and using only OIDs on the indexes would allow us to return the object, given a OID when the user iterates through the search results.

So, the workflow is more or less like this now:

  • User does a query
  • Catalog delegates query to the indexes
  • Indexes returns a list of OIDs (actually a IISet)
  • Catalog builds a Result object with the intersection of the OIDs received from the Indexes
  • Result, when asked for an item, does a lookup by the OID and returns the actual object.

Needless to say, the improvement was overwhelming. Not only the query was blazingly faster, but the database, after replacing the indexes, was 20% smaller.

I must admit: the BTrees package its one of the most amazing ones I've used during all the time I've been involved with python, and when you deploy it the right way, it can make a world of difference.

Read: Think, Sync and Wink (part two)

Topic: Molecular Formula - History of Chemical Nomenclature Previous Topic   Next Topic Topic: Skip Lists

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use