The Artima Developer Community

Java Buzz Forum
A lucene equivalent for non-text documents

Norman Richards


Norman Richards is co-author of XDoclet in Action
Posted: Jan 28, 2005 1:35 PM

This post originated from an RSS feed registered with Java Buzz by Norman Richards.
Original Post: A lucene equivalent for non-text documents
Feed Title: Orb [norman richards]
Feed URL: http://members.capmac.org/~orb/blog.cgi/tech/java?flav=rss
Feed Description: Monkey number 312,978,199

I've tried to use Lucene on a few projects, but I've never really been able to make it work. Where you have a simple search box and a collection of backing documents, Lucene works well. Unfortunately, that's not a need I've had often. I did once help someone set up Nutch to create a specialized search engine, but aside from that one case, Lucene hasn't been that useful to me.

I'm not suggesting Lucene is bad or poorly implemented. I'm just saying that I've often wanted to go well beyond that. I've frequently faced projects that could have made great use of a fast, local (perhaps even in-memory) index of a data set. A perfect example is a product catalog. I don't want to resort to a slow remote database query to do a catalog search. Lucene provides the fuzzy text search that relational databases lack, but it doesn't handle non-text data like dates and numbers well, and it falls over completely when you want to apply some kind of ordering. With multiple indexes you can hack together a solution that works OK, but it really stretches the limits of what Lucene was intended to do.
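
One concrete reason numbers are awkward in a term-based index of that era: index terms are compared as strings, so plain decimal numbers sort lexicographically rather than numerically, and range queries or ordering over them go wrong unless the numbers are encoded carefully. The usual workaround was a fixed-width, zero-padded encoding. The sketch below does not use Lucene's API; a plain `TreeSet<String>` stands in for a sorted term dictionary, and `LexicalOrderDemo` and its `pad` helper are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.TreeSet;

// Illustration only: why numbers stored as plain strings in a sorted term set
// do not sort numerically, and how fixed-width zero padding restores the order.
class LexicalOrderDemo {

    // Hypothetical helper: left-pad a non-negative value with zeros to a fixed width.
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    // Sort values by their plain decimal string form (lexicographic order).
    static List<String> sortAsRawStrings(List<Long> values) {
        TreeSet<String> terms = new TreeSet<>();
        for (long v : values) terms.add(Long.toString(v));
        return List.copyOf(terms);
    }

    // Sort values by their zero-padded string form (matches numeric order).
    static List<String> sortAsPaddedStrings(List<Long> values, int width) {
        TreeSet<String> terms = new TreeSet<>();
        for (long v : values) terms.add(pad(v, width));
        return List.copyOf(terms);
    }

    public static void main(String[] args) {
        List<Long> prices = List.of(9L, 10L, 250L, 1200L);
        System.out.println("raw:    " + sortAsRawStrings(prices));
        System.out.println("padded: " + sortAsPaddedStrings(prices, 10));
    }
}
```

Running `main` prints `raw:    [10, 1200, 250, 9]` followed by `padded: [0000000009, 0000000010, 0000000250, 0000001200]`. Padding only works for non-negative values that fit the chosen width; negative numbers and dates need their own order-preserving encodings, which is part of the friction described above.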

What I would like is something that works like Lucene but allows you to programmatically specify the types of indexes you want. Then you could issue queries against the local index and load the matching data from your cache.
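
A minimal sketch of that idea, assuming a product catalog whose full records live in an in-memory map acting as the cache. `Product`, `EqualityIndex`, `OrderedIndex`, and `Catalog` are hypothetical names, not an existing library. Each index is declared in code with a key-extraction function: an equality index for exact matches (category) and ordered indexes for numbers and dates (price, date added). Queries return record ids, which are then resolved through the cache.

```java
import java.time.LocalDate;
import java.util.*;
import java.util.function.Function;

// Sketch only: all type and method names here are hypothetical.
record Product(String sku, String category, long priceCents, LocalDate added) {}

// Equality index: exact-match lookups from a key to the set of matching ids.
class EqualityIndex<T, K> {
    private final Function<T, K> keyOf;
    private final Map<K, Set<String>> postings = new HashMap<>();

    EqualityIndex(Function<T, K> keyOf) { this.keyOf = keyOf; }

    void add(String id, T item) {
        postings.computeIfAbsent(keyOf.apply(item), k -> new HashSet<>()).add(id);
    }

    Set<String> lookup(K key) {
        return postings.getOrDefault(key, Set.of());
    }
}

// Ordered index: range queries and sorted iteration over a comparable key.
class OrderedIndex<T, K extends Comparable<? super K>> {
    private final Function<T, K> keyOf;
    private final TreeMap<K, Set<String>> postings = new TreeMap<>();

    OrderedIndex(Function<T, K> keyOf) { this.keyOf = keyOf; }

    void add(String id, T item) {
        postings.computeIfAbsent(keyOf.apply(item), k -> new TreeSet<>()).add(id);
    }

    // Ids whose key lies in [from, to], in ascending key order (assumes from <= to).
    List<String> range(K from, K to) {
        List<String> ids = new ArrayList<>();
        for (Set<String> bucket : postings.subMap(from, true, to, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }
}

class Catalog {
    // Stand-in for the application's cache of full records, keyed by id.
    final Map<String, Product> cache = new HashMap<>();
    final EqualityIndex<Product, String> byCategory = new EqualityIndex<>(Product::category);
    final OrderedIndex<Product, Long> byPrice = new OrderedIndex<>(Product::priceCents);
    final OrderedIndex<Product, LocalDate> byAdded = new OrderedIndex<>(Product::added);

    void add(Product p) {
        cache.put(p.sku(), p);
        byCategory.add(p.sku(), p);
        byPrice.add(p.sku(), p);
        byAdded.add(p.sku(), p);
    }

    // Products in a category within a price range, cheapest first.
    List<Product> inCategoryByPrice(String category, long minCents, long maxCents) {
        Set<String> inCategory = byCategory.lookup(category);
        List<Product> result = new ArrayList<>();
        for (String id : byPrice.range(minCents, maxCents)) {
            if (inCategory.contains(id)) result.add(cache.get(id));
        }
        return result;
    }

    // Products added in [from, to], oldest first.
    List<Product> addedBetween(LocalDate from, LocalDate to) {
        List<Product> result = new ArrayList<>();
        for (String id : byAdded.range(from, to)) result.add(cache.get(id));
        return result;
    }
}
```

The ordered indexes use `TreeMap`, so range restriction and ordering come directly from the key order rather than from string tricks. Swapping the maps for a persistent key/value store would keep the same shape.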

It seems like a great idea, but perhaps that is just due to my LDAP background. What I've described is almost exactly how we implemented search in the LDAP server I worked on. I believe OpenLDAP still works that way (not surprising, since the two share a common ancestry in the original slapd code). We used BerkeleyDB as the datastore for our indexes, as well as for the data itself. It was extremely fast for read-mostly access and scaled well under load.
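
A rough sketch of that general shape, with in-memory maps standing in for BerkeleyDB: one store maps entry ids to entries, and each configured attribute index maps an `attr=value` key to a sorted set of entry ids. An AND filter narrows candidates by intersecting the id sets of its indexed terms, then loads only the surviving entries and tests the full filter against them, so unindexed attributes still work, just without narrowing. This is not slapd or OpenLDAP code; `DirectoryStore`, `searchAnd`, and the exact-match, case-sensitive comparison are hypothetical simplifications.

```java
import java.util.*;

// Sketch only: HashMaps stand in for BerkeleyDB databases; names are hypothetical.
class DirectoryStore {
    // Entry id -> entry attributes (the data itself).
    private final Map<Long, Map<String, String>> id2entry = new HashMap<>();
    // Equality index: "attr=value" -> sorted set of entry ids.
    private final Map<String, TreeSet<Long>> index = new HashMap<>();
    private final Set<String> indexedAttrs;
    private long nextId = 1;

    DirectoryStore(Set<String> indexedAttrs) { this.indexedAttrs = indexedAttrs; }

    long add(Map<String, String> attrs) {
        long id = nextId++;
        id2entry.put(id, attrs);
        for (Map.Entry<String, String> a : attrs.entrySet()) {
            if (indexedAttrs.contains(a.getKey())) {
                index.computeIfAbsent(key(a.getKey(), a.getValue()), k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    private static String key(String attr, String value) {
        return attr + "=" + value;
    }

    // Evaluate (&(attr1=v1)(attr2=v2)...): intersect id sets of indexed terms,
    // then load each candidate entry and verify every term against it.
    List<Map<String, String>> searchAnd(Map<String, String> terms) {
        SortedSet<Long> candidates = null; // null: not yet narrowed by any index
        for (Map.Entry<String, String> t : terms.entrySet()) {
            if (!indexedAttrs.contains(t.getKey())) continue; // checked after loading
            SortedSet<Long> ids = index.getOrDefault(key(t.getKey(), t.getValue()), new TreeSet<>());
            if (candidates == null) candidates = new TreeSet<>(ids);
            else candidates.retainAll(ids);
            if (candidates.isEmpty()) break; // nothing can match; stop early
        }
        if (candidates == null) candidates = new TreeSet<>(id2entry.keySet()); // no indexed term: full scan

        List<Map<String, String>> results = new ArrayList<>();
        for (long id : candidates) {
            Map<String, String> entry = id2entry.get(id);
            if (matches(entry, terms)) results.add(entry);
        }
        return results;
    }

    private static boolean matches(Map<String, String> entry, Map<String, String> terms) {
        for (Map.Entry<String, String> t : terms.entrySet()) {
            if (!t.getValue().equals(entry.get(t.getKey()))) return false;
        }
        return true;
    }
}
```

Keeping ids sorted means results come back in a stable order and the intersection only ever shrinks the candidate set, which is where most of the speed for read-mostly workloads comes from.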

Yet, I haven't seen any projects out there that do this. The success of Lucene shows that people want search. I think the market for indexing is even larger. If I were going to commit to yet another open source project, I think something in this space would be at the top of my list.

To reiterate, I like Lucene. My frustration with Lucene stems solely from trying to apply it to tasks it wasn't suited for. I should also say that I liked Lucene in Action. I actually learned Lucene from an early draft of the book. So, go get it. (Oh, and check out those nice back cover quotes.)
