Eran's blog

Advanced Lucene (ApacheCon 2005)

Ryan pointed me to this fine resource on advanced Lucene topics. Grant Ingersoll of CNLP recently presented at ApacheCon 2005 and discussed several very interesting topics, most of them accompanied by code (!).

Relevance feedback is a technique that augments the user's query with terms taken from the best-matching documents. Grant shows how to do this with Lucene's Term Vectors. Relevance feedback has been on my todo-list for a few weeks now, so it's great to have a good starting point.
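To make the idea concrete, here is a minimal sketch of query expansion, assuming each top-ranked document's term vector is available as a plain term-to-frequency map (standing in for what Lucene stores per document; no Lucene API is used). The class and method names are hypothetical, and scoring candidate terms by summed raw frequency is just one simple choice, not necessarily the approach in Grant's slides.

```java
import java.util.*;
import java.util.stream.*;

public class RelevanceFeedback {
    // Hypothetical helper: appends the k most frequent new terms found in
    // the term vectors of the top-ranked documents to the original query.
    static List<String> expandQuery(List<String> queryTerms,
                                    List<Map<String, Integer>> topDocVectors,
                                    int k) {
        Set<String> original = new HashSet<>(queryTerms);
        Map<String, Integer> totals = new HashMap<>();
        // Sum each term's frequency across all top documents.
        for (Map<String, Integer> vector : topDocVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Keep the k highest-scoring terms not already in the query;
        // ties are broken alphabetically so the result is deterministic.
        List<String> extra = totals.entrySet().stream()
            .filter(e -> !original.contains(e.getKey()))
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
        List<String> expanded = new ArrayList<>(queryTerms);
        expanded.addAll(extra);
        return expanded;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> top = List.of(
            Map.of("lucene", 3, "index", 4, "search", 2),
            Map.of("lucene", 2, "index", 1, "analyzer", 3));
        // prints [lucene, index, analyzer]
        System.out.println(expandQuery(List.of("lucene"), top, 2));
    }
}
```

In a real setup you would typically also drop stopwords and weight the added terms lower than the user's own terms, so the expansion nudges the query rather than taking it over.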

Span queries provide information about where a match took place within a document. The presentation explains how to use them for phrase matching (I wonder whether that could completely replace named entity detection) and how to build on them for question answering. It looks like span queries could also serve as the basis for an efficient result summarizer.
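The core of a span match is positional: find where one term is followed by another within some distance, and report the start and end positions. The sketch below does this over a plain token list. It loosely mirrors what an ordered, sloppy span query (such as Lucene's `SpanNearQuery`) matches, but it is a hypothetical stdlib-only illustration, not Lucene's implementation. Here `slop` is assumed to mean the number of tokens allowed between the two terms.

```java
import java.util.*;

public class SpanNear {
    // Hypothetical helper: returns [start, end) token spans where `first`
    // is followed by `second` with at most `slop` tokens in between.
    static List<int[]> findOrderedSpans(List<String> tokens, String first,
                                        String second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) continue;
            // Second term may sit at positions i+1 .. i+slop+1.
            int limit = Math.min(tokens.size(), i + slop + 2);
            for (int j = i + 1; j < limit; j++) {
                if (tokens.get(j).equals(second)) {
                    spans.add(new int[] {i, j + 1});
                    break; // keep only the nearest match for this start
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "grant", "ingersoll", "presented", "advanced", "lucene", "topics");
        for (int[] s : findOrderedSpans(tokens, "advanced", "topics", 1)) {
            // The span positions are exactly what a summarizer needs to cut a snippet.
            System.out.println(s[0] + "-" + s[1] + ": "
                + String.join(" ", tokens.subList(s[0], s[1])));
        }
    }
}
```

Running it prints `3-6: advanced lucene topics`; with `slop` set to 0 there is no match, since "lucene" sits between the two terms. Expanding each span by a few tokens on either side is one simple way to turn matches into result snippets.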

Grant then goes on to discuss some of the work done at CNLP, using Lucene in various scenarios and domains. Very interesting stuff. I've yet to digest most of it, but I'm sure I will soon. Go check it out!


Filed under: Search
