Summary
The new release of the open-source Java search engine contains over thirty new features and numerous enhancements. Here's a quick look at some of them.
The 1.9 release is mostly source compatible with previous Lucene releases, and provides many new features and enhancements:
Support for binary stored fields and stored compressed fields
A new DateTools class formats dates in a readable form suitable for indexing. Unlike Lucene's existing DateField, DateTools can handle dates before 1970, and it requires you to specify a date resolution, which makes RangeQuery searches more efficient. In addition, a new RangeFilter is a more generally useful filter than the date-specific DateFilter.
Lucene's QueryParser now works with Analyzers that can return more than one token per position:
A query such as "+fast +car" would be parsed as "+fast +(car automobile)" if the Analyzer returns "car" and "automobile" at the same position whenever it finds "car."
The new NumberTools utility helps index numeric fields.
Two new regular expression queries, RegexQuery and SpanRegexQuery, were added.
The new DisjunctionMaxQuery provides the maximum score across its clauses, which is useful for searching across multiple fields.
The newly added public static IndexReader.main(String[] args) method can be used at the command line to list, and optionally extract, the individual files in an existing compound index file.
The new ParallelReader is an IndexReader that combines separate indexes over different fields into a single virtual index.
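The DateTools and NumberTools items above share one idea: index terms are compared as strings, so dates and numbers have to be written in a form whose string order matches their natural order. The following is a minimal sketch of that idea using only the Java standard library. It is not Lucene's code: the class name SortableTermsSketch and the methods dayTerm and numberTerm are hypothetical, and the zero-padded decimal encoding is an assumption chosen for readability rather than the encoding NumberTools actually uses.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; these names are not part of Lucene.
final class SortableTermsSketch {

    // Day resolution: every instant within the same UTC day maps to one term,
    // so a range over many days touches far fewer distinct terms than a
    // range over millisecond-resolution values would.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("uuuuMMdd").withZone(ZoneOffset.UTC);

    // Formats an instant as an 8-character term; works for dates before 1970.
    static String dayTerm(Instant instant) {
        return DAY.format(instant);
    }

    // Zero-pads a non-negative long to 19 digits (the width of Long.MAX_VALUE)
    // so that string order equals numeric order. Negative values are rejected
    // here to keep the sketch short.
    static String numberTerm(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        terms.add(dayTerm(Instant.parse("1970-01-01T00:00:00Z")));
        terms.add(dayTerm(Instant.parse("1969-07-20T20:17:00Z")));
        Collections.sort(terms); // plain string sort, as a term dictionary would order them
        System.out.println(terms);

        // Unpadded decimal strings sort incorrectly: "9" comes after "10".
        System.out.println("9".compareTo("10") > 0);
        System.out.println(numberTerm(9).compareTo(numberTerm(10)) < 0);
    }
}
```

A coarser resolution trades precision for fewer distinct terms, which is presumably why the release notes tie the mandatory resolution to more efficient range queries.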
Lucene has been around for a while, and is now emerging as a major component of many open-source projects incorporating search. Do you use Lucene in your projects? Do you think general-purpose search engine tools built on Lucene, such as Nutch, can one day challenge current, closed-source search engines?
> Interesting,
>
> Frank what do you think of PyLucene? Maybe it is a moot issue. I'm wondering "which one" I should start with.
I'm not familiar with PyLucene. But we're looking into Java Lucene at Artima.com, and will be looking more intensely in the coming months, as most of the Artima.com search capabilities will be facilitated by Lucene. Stay tuned, as we'll most certainly blog about our experience.
I'd be curious, though, to hear of others' experience in using Lucene as an underlying search tool for a large-scale Web site. How does Lucene scale in practice, for instance?