Java Community News - Clustering Lucene


Frank Sommers


Clustering Lucene

Posted: Nov 7, 2006 3:06 PM

Summary
As popular Web applications increasingly rely on full-text search functionality, making a search index highly available has become an important requirement for any search solution. In a recent blog post, Terracotta's Orion Letizi demonstrates a technique to cluster the open-source Apache Lucene search index.

Apache's Lucene search API provides text search functionality for a growing number of popular sites and enterprise products. According to Lucene project documents, parts of Wikipedia rely on Lucene for full-text search, as does the code search Web site Krugle, which uses the Lucene-based open-source search engine Nutch for its code search tool. Lucene has also been ported to Ruby, C#, and C++.

With full-text search playing an increasingly pivotal role for many Web sites and enterprise applications, the availability of the index files used to execute text searches has become an important requirement. In a recent blog post, Clustering Lucene, Terracotta's Orion Letizi describes his attempt to cluster Lucene using his company's JVM clustering tool.

Letizi's post suggests that making Lucene-based search highly available, or scaling it to serve a large number of requests, is largely a matter of clustering Lucene's search index directory:

We used an implementation of the Lucene Directory interface called the RAMDirectory as the index store and made it a clustered object. That's done with a scrap of configuration that tells Terracotta to make our RAMDirectory shared. After that, manipulating the index is business as usual...

Keeping this same RAMDirectory consistent across multiple JVMs without transparent object clustering would be a real hassle. Just to keep the indexes up to date by hand, you'd have to trap changes to them and then somehow send those changes out to the other JVMs and apply them. Keeping the indexes consistent is even harder.
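To make the "by hand" alternative concrete, here is a minimal sketch of what trapping index changes and forwarding them to other nodes might look like. It is not Lucene or Terracotta code: TinyIndex and ReplicatingIndex are hypothetical classes invented for illustration, and the "other JVMs" are simulated as plain objects in a single process. It uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for an in-memory index; NOT Lucene's RAMDirectory. */
class TinyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase());
        // Return a copy so callers cannot modify the index through the result
        return ids == null ? new TreeSet<Integer>() : new TreeSet<Integer>(ids);
    }
}

/** Hypothetical wrapper that traps each change and replays it on replicas. */
class ReplicatingIndex {
    private final TinyIndex local = new TinyIndex();
    private final List<TinyIndex> replicas = new ArrayList<TinyIndex>();

    void attachReplica(TinyIndex replica) {
        replicas.add(replica);
    }

    void add(int docId, String text) {
        local.add(docId, text);
        // Assumption: in a real cluster this loop would be a network send,
        // with ordering, retries, and failure handling still to be solved.
        for (TinyIndex replica : replicas) {
            replica.add(docId, text);
        }
    }

    Set<Integer> search(String term) {
        return local.search(term);
    }
}

class ReplicatedIndexDemo {
    public static void main(String[] args) {
        ReplicatingIndex node1 = new ReplicatingIndex();
        TinyIndex node2 = new TinyIndex();
        node1.attachReplica(node2);
        node1.add(1, "Clustering Lucene with Terracotta");

        // A replica attached late never sees the earlier change
        TinyIndex node3 = new TinyIndex();
        node1.attachReplica(node3);
        node1.add(2, "Lucene search index");

        System.out.println("node2: " + node2.search("lucene"));
        System.out.println("node3: " + node3.search("lucene"));
    }
}
```

Even in this toy version, the replica that joins late ends up with a different view of the index than the one that joined early, which hints at why Letizi calls consistency the harder problem.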

Letizi points out that, with Terracotta's approach, clustering is "transparent" in the sense that making an object's state available on other nodes is a matter of setting configuration options:

[The clustered version] isn't really any different with clustering enabled than it is without clustering. In fact, turning clustering on and off is as simple as invoking Java with or without a couple of Terracotta options.

What is your experience with "transparent" clustering, especially in data-intensive applications, such as text search? How well do you think the transparent clustering approach mentioned in Letizi's blog scales to large Lucene indexes?
