Advanced Lucene (ApacheCon 2005)

Ryan pointed me to this fine resource on advanced Lucene topics. Grant Ingersoll of CNLP recently presented at ApacheCon 2005, and discussed several very interesting topics, most accompanied by code (!).

Relevance feedback is a technique used to augment the user’s query with terms from the best matching documents. Grant shows how to use Lucene’s Term Vectors to do this. Relevance feedback has been on my todo-list for a few weeks now, so it’s great to have a good starting point.
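
To make the idea concrete, here is a minimal sketch of query expansion in plain Java. It is not Grant's code and it does not call Lucene at all: the `termVector` helper is a hypothetical stand-in for the per-document term frequencies that a stored term vector would give you, and `expandQuery` simply adds the most frequent terms from the top hits. A real implementation would also want stop-word filtering and some kind of term weighting, both of which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RelevanceFeedbackSketch {

    /** Counts term occurrences in one document (assumed stand-in for a stored term vector). */
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    /** Returns the original query terms followed by up to maxNewTerms frequent terms from the top documents. */
    static List<String> expandQuery(List<String> queryTerms, List<String> topDocs, int maxNewTerms) {
        // Sum term frequencies over all top-ranked documents.
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : topDocs) {
            termVector(doc).forEach((term, count) -> totals.merge(term, count, Integer::sum));
        }

        // Terms already in the query are not candidates for expansion.
        Set<String> alreadyInQuery = new HashSet<>();
        for (String term : queryTerms) {
            alreadyInQuery.add(term.toLowerCase());
        }
        totals.keySet().removeAll(alreadyInQuery);

        // Sort by descending frequency, breaking ties alphabetically for a stable result.
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>(totals.entrySet());
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> expanded = new ArrayList<>(queryTerms);
        for (int i = 0; i < candidates.size() && i < maxNewTerms; i++) {
            expanded.add(candidates.get(i).getKey());
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> topDocs = Arrays.asList(
                "Lucene term vectors store term frequencies",
                "term vectors help relevance feedback");
        System.out.println(expandQuery(Arrays.asList("lucene"), topDocs, 2));
    }
}
```

The expanded term list would then be turned back into a query and run again, which is the "feedback" part of the loop.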

Span queries provide information about where a match took place within a document. The presentation explains how to use them for phrase matching (I wonder if that could replace named entity detection completely) and how to further use them for question answering. Looks like you could also use span queries as the basis for an efficient result summarizer.
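
As a rough illustration of why match positions are useful for summaries, here is a toy span matcher in plain Java. It is only loosely modeled on an ordered "near" span query and does not use Lucene's API: the names `spanNear` and `snippet` are made up for this sketch, and the slop handling is my simplification (the number of tokens allowed between the two terms).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanNearSketch {

    /** Returns every token position at which the given term occurs. */
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase(term)) {
                found.add(i);
            }
        }
        return found;
    }

    /**
     * Finds spans {start, end} (end exclusive) where `first` occurs before `second`
     * with at most `slop` other tokens between them.
     */
    static List<int[]> spanNear(String[] tokens, String first, String second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int start : positions(tokens, first)) {
            for (int last : positions(tokens, second)) {
                int gap = last - start - 1;
                if (last > start && gap <= slop) {
                    spans.add(new int[] {start, last + 1});
                }
            }
        }
        return spans;
    }

    /** Builds a snippet from the span plus `context` tokens on either side. */
    static String snippet(String[] tokens, int[] span, int context) {
        int from = Math.max(0, span[0] - context);
        int to = Math.min(tokens.length, span[1] + context);
        return String.join(" ", Arrays.copyOfRange(tokens, from, to));
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps over the lazy dog".split(" ");
        for (int[] span : spanNear(tokens, "quick", "fox", 1)) {
            System.out.println(Arrays.toString(span) + " -> " + snippet(tokens, span, 1));
        }
    }
}
```

Because each match comes back as a position range, cutting a snippet around it is just a matter of slicing the token array, which is what makes spans attractive as the basis for a summarizer.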

Grant then goes on to discuss some of the work done at CNLP, using Lucene in various scenarios and domains. Very interesting stuff. I’ve yet to digest most of it, but I’m sure I will soon. Go check it out!

I’ve been saying this since the whole “what is web 2.0” meme started, and it seems that the world is catching up. About time! But amidst all the declarations of the death of Web 2.0 (film at the usual time), I finally found the one definition of Web 2.0 I can really get behind:

We are in a new era of excitement about the Web and that is, I guess, as close to anything the definition of Web 2.0.

A new era. A revival. A renaissance if you will. Thank you, Mr. MacManus (the blogger formerly known as the father of Web 2.0).

Update: Credit where credit’s due.

