Java Buzz Forum - A lucene equivalent for non-text documents

Norman Richards

Posts: 396
Nickname: orb
Registered: Jun, 2003

Norman Richards is co-author of XDoclet in Action

A lucene equivalent for non-text documents

Posted: Jan 28, 2005 1:35 PM

This post originated from an RSS feed registered with Java Buzz by Norman Richards.
Original Post: A lucene equivalent for non-text documents
Feed Title: Orb [norman richards]
Feed URL: http://members.capmac.org/~orb/blog.cgi/tech/java?flav=rss
Feed Description: Monkey number 312,978,199

I've tried to use Lucene on a few projects, but I've never really been able to make it work. Where you have a simple search box and a collection of backing documents, Lucene works well. Unfortunately, that's not a need I've had often. I did once help someone set up Nutch to create a specialized search engine, but aside from that one case Lucene hasn't been that useful to me.

I'm not suggesting Lucene is bad or poorly implemented. I'm just saying that I've often wanted to go well beyond the simple search box. I've often been faced with projects that could have made great use of a fast local (potentially in-memory?) index of a data set. A perfect example is a product catalog. I don't want to resort to a slow remote database query to do a catalog search. Lucene can do the fuzzy text search that relational databases can't, but it really doesn't do well when dealing with non-text data like dates and numbers, and it falls over completely when you want to apply some type of ordering. With multiple indexes you can hack together a solution that works OK, but it is really stretching the limits of what Lucene was intended to do.

What I would like is something that works like Lucene but allows you to programmatically specify the types of indexes you want. Then you could issue queries locally for the data and load the data from your cache.
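To make the idea concrete, here is a minimal sketch of what "programmatically specifying an index" might look like. It is not an existing library: TypedIndexSketch, SortedIndex, Product, and namesInPriceRange are all hypothetical names, and a plain HashMap stands in for the local cache. The index is kept sorted on a typed key (a price here), so a range query comes back already ordered, which is the part that is awkward in Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch: a typed, ordered in-memory index over cached records. */
class TypedIndexSketch {

    /** Illustrative catalog record; the fields are assumptions, not from the post. */
    static final class Product {
        final int id;
        final String name;
        final double price;

        Product(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    /** One index over one typed key, kept sorted so range scans return in key order. */
    static final class SortedIndex<K extends Comparable<K>> {
        private final Function<Product, K> keyOf;
        private final NavigableMap<K, List<Integer>> idsByKey = new TreeMap<>();

        SortedIndex(Function<Product, K> keyOf) {
            this.keyOf = keyOf;
        }

        /** Records the product's id under its extracted key. */
        void add(Product p) {
            idsByKey.computeIfAbsent(keyOf.apply(p), k -> new ArrayList<>()).add(p.id);
        }

        /** Ids of records whose key lies in [low, high], in ascending key order. */
        List<Integer> range(K low, K high) {
            List<Integer> ids = new ArrayList<>();
            for (List<Integer> bucket : idsByKey.subMap(low, true, high, true).values()) {
                ids.addAll(bucket);
            }
            return ids;
        }
    }

    /** Indexes the products by price, queries the index, then loads hits from the cache. */
    static List<String> namesInPriceRange(List<Product> products, double low, double high) {
        Map<Integer, Product> cache = new HashMap<>(); // stand-in for the local cache
        SortedIndex<Double> byPrice = new SortedIndex<Double>(p -> p.price);
        for (Product p : products) {
            cache.put(p.id, p);
            byPrice.add(p);
        }
        List<String> names = new ArrayList<>();
        for (int id : byPrice.range(low, high)) {
            names.add(cache.get(id).name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product(1, "widget", 19.99),
            new Product(2, "gadget", 4.50),
            new Product(3, "gizmo", 12.00));
        // Prints [gizmo, widget]: the matches in ascending price order.
        System.out.println(namesInPriceRange(products, 5.0, 20.0));
    }
}
```

The same SortedIndex could be declared over a date or any other Comparable key by passing a different key extractor; the query only ever touches ids, and the full records are fetched from the cache afterwards.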

It seems like a great idea, but perhaps that is just due to my LDAP background. What I've described is almost exactly how we implemented search in the LDAP server I worked on. I believe OpenLDAP still works like that (not surprising, since they share a common ancestry in the original slapd code). We used BerkeleyDB as the datastore for our indexes, as well as for the data itself. It was extremely fast for mostly-read access and scaled well with load.
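For readers without the LDAP background, the general pattern can be sketched roughly as follows. This is an illustration of the idea only, not code from any LDAP server: IdListIndexSketch and its methods are hypothetical names, and in-memory maps stand in for the BerkeleyDB stores. Each indexed attribute maps a value to the set of entry ids that have it, a search intersects those candidate id sets, and only then are the matching entries loaded from the id-to-entry store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of per-attribute indexes that yield candidate entry ids. */
class IdListIndexSketch {
    // id -> entry (attribute name -> value); stands in for the main datastore.
    private final Map<Integer, Map<String, String>> entriesById = new HashMap<>();
    // attribute -> value -> ids of entries holding that value; one map per index.
    private final Map<String, Map<String, SortedSet<Integer>>> indexes = new HashMap<>();

    /** Declares up front which attributes get an equality index. */
    IdListIndexSketch(String... indexedAttributes) {
        for (String attr : indexedAttributes) {
            indexes.put(attr, new HashMap<>());
        }
    }

    /** Stores a new entry and updates every declared index (insert-only sketch). */
    void put(int id, Map<String, String> entry) {
        entriesById.put(id, entry);
        for (Map.Entry<String, Map<String, SortedSet<Integer>>> index : indexes.entrySet()) {
            String value = entry.get(index.getKey());
            if (value != null) {
                index.getValue().computeIfAbsent(value, v -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Equality lookup on one indexed attribute, returning candidate ids. */
    SortedSet<Integer> candidates(String attr, String value) {
        Map<String, SortedSet<Integer>> index = indexes.get(attr);
        if (index == null) {
            throw new IllegalArgumentException("attribute is not indexed: " + attr);
        }
        return index.getOrDefault(value, Collections.emptySortedSet());
    }

    /** AND of two equality filters: intersect the id sets, then load the entries. */
    List<Map<String, String>> searchAnd(String attr1, String value1,
                                        String attr2, String value2) {
        SortedSet<Integer> ids = new TreeSet<>(candidates(attr1, value1));
        ids.retainAll(candidates(attr2, value2));
        List<Map<String, String>> result = new ArrayList<>();
        for (int id : ids) {
            result.add(entriesById.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        IdListIndexSketch store = new IdListIndexSketch("color", "size");
        store.put(1, Map.of("name", "shirt", "color", "red", "size", "L"));
        store.put(2, Map.of("name", "hat", "color", "red", "size", "M"));
        store.put(3, Map.of("name", "scarf", "color", "blue", "size", "L"));
        // Only entry 1 is both red and size L.
        System.out.println(store.searchAnd("color", "red", "size", "L"));
    }
}
```

Because the query works on small id sets until the final step, reads stay cheap, which fits the mostly-read workload described above.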

Yet, I haven't seen any projects out there that do this. The success of Lucene shows that people want search. I think the market for indexing is even larger. If I were going to commit to yet another open source project, I think something in this space would be at the top of my list.

To reiterate, I like Lucene. My frustration with Lucene stems solely from trying to apply it to tasks that it wasn't suited for. I should also say that I liked Lucene in Action. I actually used an early draft of the book to learn Lucene from. So, go get it. (oh, and check out those nice back cover quotes)
