Summary
The Lucene project has released version 2.0 of its popular open-source search engine tool. Mostly a bugfix release over version 1.9, it removes most deprecated methods and provides many new features over the previous major version.
According to the Lucene project, version 2.0 is mainly a bugfix release, so many of its new features were already present in version 1.9. Here are some of the new features (for a complete list, see the Lucene 2.0 release notes):
Support for binary stored fields and stored compressed fields.
A new DateTools class formats dates in a readable form suitable for indexing. Unlike Lucene's existing DateField, DateTools can handle dates before 1970 and requires a date resolution to be specified, which makes RangeQuerys more efficient. In addition, the new RangeFilter is a more generally useful filter than DateFilter, which filters on date ranges.
Lucene's QueryParser now works with Analyzers that can return more than one token per position: A query such as "+fast +car" would be parsed as "+fast +(car automobile)" if the Analyzer returns "car" and "automobile" at the same position whenever it finds "car."
The new NumberTools utility helps index numeric fields.
Two new regular expression queries, RegexQuery and SpanRegexQuery, were added.
The new DisjunctionMaxQuery provides the maximum score across its clauses, which is useful for searching across multiple fields.
The newly added public static IndexReader.main(String[] args) method can be used from the command line to list and optionally extract the individual files from an existing compound index file.
The new ParallelReader is an IndexReader that combines separate indexes over different fields into a single virtual index.
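To make the DateTools item above more concrete, here is a minimal sketch of the general idea behind resolution-based date keys: a date is reduced to a fixed-width string at a chosen resolution, so that plain string order matches chronological order, including for dates before 1970. It uses only the Java standard library and does not call Lucene. The class name DateKeySketch, the method toDayKey, and the yyyyMMdd format in GMT are assumptions made for illustration, not necessarily what DateTools itself emits.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative stand-in only; this is NOT Lucene's DateTools class.
final class DateKeySketch {

    // Reduces a date to day resolution as a fixed-width yyyyMMdd string in GMT.
    // Assumption: years fall in the range 1000 to 9999, so the width stays constant.
    static String toDayKey(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date dayBeforeEpoch = new Date(-86400000L); // negative millis: before 1970
        Date epoch = new Date(0L);

        String earlier = toDayKey(dayBeforeEpoch);
        String later = toDayKey(epoch);

        System.out.println(earlier + " sorts before " + later + ": "
                + (earlier.compareTo(later) < 0));
    }
}
```

Choosing a coarser resolution means fewer distinct terms end up in the index for a date field, which is the reason the article gives for range queries becoming more efficient. Two timestamps within the same GMT day map to the same key in this sketch.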
No but good god I keep looking for an excuse too. Lucene is the one gem in Java I keep looking to find time to work seriously with. Of course I've run the generic "index the source" demo, but regex querying? Awesome. Sigh!