The Artima Developer Community
Java Community News
Clustering Lucene

Frank Sommers

Clustering Lucene Posted: Nov 7, 2006 3:06 PM
Summary
As popular Web applications increasingly rely on full-text search functionality, making a search index highly available has become an important requirement for any search solution. In a recent blog post, Terracotta's Orion Letizi demonstrates a technique to cluster the open-source Apache Lucene search index.

Apache's Lucene search API provides text search functionality for a growing number of popular sites and enterprise products. According to Lucene project documents, parts of Wikipedia rely on Lucene for full-text search, as does the code search Web site Krugle, which uses the Lucene-based open-source search engine Nutch for its code search tool. Lucene has also been ported to Ruby, C#, and C++.

With full-text search playing an increasingly pivotal role for many Web sites and enterprise applications, the availability of the index files used to execute text searches has become an important requirement. In a recent blog post, Clustering Lucene, Terracotta's Orion Letizi describes his attempt to cluster Lucene using his company's JVM clustering tool.

Letizi's post suggests that making Lucene-based search highly available, or scaling it to serve a large number of requests, is largely a matter of clustering Lucene's search index directory:

We used an implementation of the Lucene Directory interface called the RAMDirectory as the index store and made it a clustered object. That's done with a scrap of configuration that tells Terracotta to make our RAMDirectory shared. After that, manipulating the index is business as usual...

Keeping this same RAMDirectory consistent across multiple JVMs without transparent object clustering would be a real hassle. Just to keep the indexes up to date by hand, you'd have to trap changes to them and then somehow send those changes out to the other JVMs and apply them. Keeping the indexes consistent is even harder.
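To make the "by hand" alternative concrete, here is a minimal sketch of what trapping and forwarding index changes might look like. It is a toy, not Lucene's or Terracotta's API: `ToyIndex`, `AddDocument`, and `addReplica` are hypothetical names invented for illustration, and the "other JVMs" are simulated by plain objects in one process. It also ignores ordering, failures, and concurrent writers, which is where the "even harder" consistency problem would show up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ManualReplicationSketch {

    /** Hypothetical change record: the unit that must be shipped to every other node. */
    static final class AddDocument {
        final int docId;
        final String text;

        AddDocument(int docId, String text) {
            this.docId = docId;
            this.text = text;
        }
    }

    /** Toy in-memory inverted index standing in for one JVM's copy of the search index. */
    static class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private final List<ToyIndex> replicas = new ArrayList<>();

        void addReplica(ToyIndex replica) {
            replicas.add(replica);
        }

        /** Local write: apply the change, then trap it and forward it to every replica. */
        void addDocument(int docId, String text) {
            AddDocument change = new AddDocument(docId, text);
            apply(change);
            for (ToyIndex replica : replicas) {
                replica.apply(change); // stands in for a network send to another JVM
            }
        }

        /** Applies a change locally without re-broadcasting it. */
        synchronized void apply(AddDocument change) {
            for (String term : change.text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(change.docId);
                }
            }
        }

        /** Returns the ids of documents containing the term, in ascending order. */
        synchronized Set<Integer> search(String term) {
            return new TreeSet<>(
                postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet()));
        }
    }

    public static void main(String[] args) {
        ToyIndex nodeA = new ToyIndex();
        ToyIndex nodeB = new ToyIndex();
        nodeA.addReplica(nodeB);
        nodeB.addReplica(nodeA);

        nodeA.addDocument(1, "Clustering Lucene");
        nodeB.addDocument(2, "Lucene search index");

        // Both nodes see both documents, because every write was forwarded.
        System.out.println(nodeA.search("lucene"));
        System.out.println(nodeB.search("lucene"));
    }
}
```

Even in this simplified form, every write path has to know about replication. That coupling is what the quoted post says transparent clustering is meant to remove.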

Letizi points out that, with Terracotta's approach, clustering is "transparent" in the sense that making an object's state available on other nodes is a matter of setting configuration options:

[The clustered version] isn't really any different with clustering enabled than it is without clustering. In fact, turning clustering on and off is as simple as invoking Java with or without a couple of Terracotta options.

What is your experience with "transparent" clustering, especially in data-intensive applications, such as text search? How well do you think the transparent clustering approach mentioned in Letizi's blog scales to large Lucene indexes?


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use