I was asked recently about Google’s Personalized Search. It’s kinda funny that people ask for my opinion based on a post I made on our class’s wiki page, but I guess the Internet really is the great equalizer. Anyway, here’s my edited response:

I’m somewhat conflicted about the possibilities of Google’s personalized search, mostly because I haven’t found any papers that explain exactly how it works. The best paper I’ve found, which I *believe* details algorithms similar to what they currently use, is the one I referenced on the wiki. I’ve never done the math to see how scalable the methods detailed in the paper are, but I suspect that they scale pretty well, both in the speed of serving results and, more importantly, in how detailed the personalization bias can be.

There’s a lot of linear algebra that needs to be done before my assumptions are proven, but considering how fast Google Personalized works right now, I believe they managed to break the problem down into much smaller independent components that are completely separate from the non-personalized PageRank scores, so they don’t need to recalculate the whole thing for every user. Does anybody care to do the math on this and see if that really is feasible? In my mind, they have a separate bias vector for every choice or set of choices, and all they need to do is add the vectors that match a user’s personalization preferences to the big PageRank vector and voilà! Real-time personalized PageRank values.
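To make the "add the vectors" idea a bit more concrete, here is a minimal sketch of the property that would make it feasible: PageRank is linear in its bias (random-jump) vector, so scores computed once per topic can be blended per user without rerunning the whole computation. This is not a claim about what Google actually does. The tiny graph, the topic names, the 70/30 weights and the 0.85 damping factor are all made up for illustration, and the sketch assumes every page has at least one outgoing link.

```python
def pagerank(links, bias, damping=0.85, iterations=100):
    """Power iteration where random jumps land according to `bias`.

    links: dict of page -> list of pages it links to (every page needs
           at least one outlink in this simplified sketch)
    bias:  dict of page -> jump probability, summing to 1
    """
    rank = dict(bias)
    for _ in range(iterations):
        # Each round starts from the jump term, then adds link-following mass.
        new_rank = {page: (1 - damping) * bias[page] for page in links}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}

# Hypothetical topic bias vectors; their PageRank is computed once, offline.
sports_bias = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
tech_bias = {"a": 0.0, "b": 0.5, "c": 0.0, "d": 0.5}
sports_rank = pagerank(links, sports_bias)
tech_rank = pagerank(links, tech_bias)

# A made-up user who is 70% sports, 30% tech.
w_sports, w_tech = 0.7, 0.3

# Cheap per-user step: blend the precomputed topic vectors.
blended = {p: w_sports * sports_rank[p] + w_tech * tech_rank[p] for p in links}

# Expensive reference: rerun PageRank with the user's mixed bias.
mixed_bias = {p: w_sports * sports_bias[p] + w_tech * tech_bias[p] for p in links}
direct = pagerank(links, mixed_bias)

max_gap = max(abs(blended[p] - direct[p]) for p in links)
print(f"largest difference between blended and direct: {max_gap:.2e}")
```

The blended scores match the recomputed ones up to floating-point noise, so the per-user work is just a weighted sum over a handful of stored vectors. Note that the weights here sum to 1, which makes this a blend of topic vectors rather than a literal addition on top of the global PageRank vector.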

If my suspicions are true, it should be possible for Google to offer a much more detailed topic hierarchy to choose from, both in breadth and in depth. This would definitely improve the amount of personalization you see in their results, as the bias will be towards less generic sites. This does not, however, guarantee good results; for good results we need to think about the amount of bias and how far it spreads. I’ve only seen anecdotal evidence that those algorithms actually work well, and of the kind of influence the initial set of biased pages has on the results (in section 6 of Bringing Order to the Web). It is hard to evaluate this very deeply without knowing the algorithms involved, but if we can assume that they’re giving a certain bias to a set of pages that “represent” a topic in the hierarchy, these biased pages would then give a higher PageRank to the pages they link to, and so forth. The important question then becomes: what kind of effect do these pages have on the eventual PageRank? What I would like to see is that pages in the semantic neighborhood of those seed pages become more “important” and so show up higher in the search results. How well does the algorithm really work? Does it really find a good subset of the community of pages related to those seeds? I don’t know, but I hope Google does.
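To get a feel for how far a bias might spread, here is a toy experiment, again only a sketch under made-up assumptions: two symmetric three-page clusters joined by one link in each direction, a single seed page `a1`, and a damping factor of 0.85. It compares plain PageRank (uniform random jumps) with a version where every random jump lands on the seed.

```python
def pagerank(links, bias, damping=0.85, iterations=100):
    """Power iteration where random jumps land according to `bias`.

    Assumes every page in `links` has at least one outgoing link.
    """
    rank = dict(bias)
    for _ in range(iterations):
        new_rank = {page: (1 - damping) * bias[page] for page in links}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: cluster A and cluster B are mirror images of each other.
links = {
    "a1": ["a2"], "a2": ["a3"], "a3": ["a1", "b1"],
    "b1": ["b2"], "b2": ["b3"], "b3": ["b1", "a1"],
}

# Plain PageRank: random jumps are spread evenly over all pages.
uniform_bias = {page: 1 / len(links) for page in links}
uniform = pagerank(links, uniform_bias)

# Topic-biased PageRank: every random jump lands on the seed page a1.
seed_bias = {page: (1.0 if page == "a1" else 0.0) for page in links}
biased = pagerank(links, seed_bias)

# Show how much each page gains or loses relative to plain PageRank.
for page in links:
    change = biased[page] - uniform[page]
    print(f"{page}: uniform={uniform[page]:.3f} biased={biased[page]:.3f} change={change:+.3f}")
```

In this toy graph the pages just downstream of the seed gain rank even though they were never marked as seeds, the boost shrinks with every hop, and the far cluster loses rank. Whether that kind of spread picks out a meaningful community on the real web is exactly the open question.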

Filed under: Search