The A-List and the Case for Randomizing Search Results

An interesting idea: partial randomization of search results as a way of quickly getting high-quality “Z-list” results onto the “A-list” radar.

In-degree, PageRank, number of visits and other measures of Web page
popularity significantly influence the ranking of search results by
modern search engines. The assumption is that popularity is closely
correlated with quality, a more elusive concept that is difficult to
measure directly. Unfortunately, the correlation between popularity and
quality is very weak for newly-created pages that have yet to receive
many visits and/or in-links. Worse, since discovery of new content is
largely done by querying search engines, and because users usually
focus their attention on the top few results, newly-created but
high-quality pages are effectively “shut out,” and it can take a very
long time before they become popular.

We propose a simple and elegant solution to this problem: the
introduction of a controlled amount of randomness into search result
ranking methods. Doing so offers new pages a chance to prove their
worth, although clearly using too much randomness will degrade result
quality and annul any benefits achieved. Hence there is a tradeoff
between exploration to estimate the quality of new pages and
exploitation of pages already known to be of high quality. We study
this tradeoff both analytically and via simulation, in the context of
an economic objective function based on aggregate result quality
amortized over time. We show that a modest amount of randomness leads
to improved search results.
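The exploration/exploitation mixing the abstract describes can be illustrated with a small sketch. This is my own illustration under an assumed mechanism, not the paper's actual algorithm: with probability epsilon, a result slot is filled by a randomly chosen new page (exploration) instead of the next page from the deterministic popularity ranking (exploitation).

```python
import random

def randomized_ranking(ranked_pages, new_pages, k=10, epsilon=0.1, seed=None):
    """Build a top-k result list. Each slot is filled, with probability
    epsilon, by a randomly drawn new page instead of the next page from
    the deterministic ranking. Illustrative sketch only."""
    rng = random.Random(seed)
    ranked = iter(ranked_pages)        # pages ordered by popularity-based score
    pool = list(new_pages)             # newly created pages of unknown quality
    results = []
    while len(results) < k:
        if pool and rng.random() < epsilon:
            # explore: promote a random new page into the results
            results.append(pool.pop(rng.randrange(len(pool))))
        else:
            # exploit: take the next page from the usual ranking
            try:
                results.append(next(ranked))
            except StopIteration:
                break
    return results
```

Setting epsilon to 0 recovers the ordinary ranking, while a large epsilon floods the results with unproven pages; tuning that knob is exactly the tradeoff the abstract studies.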


  1. Not quite the same thing. One problem with the A-list is that its members basically only read other A-listers, so an idea voiced by many people tends to get linked to whichever A-lister repeated it. Randomizing search results doesn’t solve that issue, because it’s more of an “in-bred” problem. It’s similar to self-reinforcing search rank in action, but it isn’t in the search domain.
    Oh, and Google recognized the problem years ago. The fix is what led to the “blog noise” complaints.

  2. Paul,
    I am an avid reader of your blog. My favorite post was the two-hour Da Vinci webcast.
    Randomness will not improve the results; personalization will. A simple improvement would be to weight the ranks by views * time viewed.
    Personalization using multidimensional database tools would do even better. Web logs can easily be used to learn about a user’s personality and interests, going beyond simple word and phrase searches. Rather than returning hundreds of irrelevant links, a more select set of useful links can be returned with the search. Autonomy is one company doing this, among others.
    A friend and I worked on this for personalizing advertisement delivery for video and internet services.
    Keep up the good info.
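Taken literally, the views * time viewed weighting suggested in the comment above might look like the following sketch. The tuple layout and the example numbers are hypothetical, chosen only to show the idea:

```python
def rerank_by_engagement(pages):
    """pages: list of (url, base_score, views, avg_view_seconds).
    Sorts by the base relevance score weighted by views * time viewed,
    highest first. A literal reading of the commenter's suggestion."""
    return sorted(pages, key=lambda p: p[1] * p[2] * p[3], reverse=True)

# Hypothetical example: a less relevant but heavily engaged page wins.
pages = [("a.example", 1.0, 100, 30),   # engagement weight: 3000
         ("b.example", 0.8, 500, 60)]   # engagement weight: 24000
```

Note this scheme shares the cold-start weakness the post is about: a brand-new page has zero views, so its weighted score is zero regardless of quality.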

  3. Personalization is not a truly valid solution. It assumes that a given user will make the same types of queries repeatedly, or will have the same intent when making similar queries. The hard case is not finding familiar material, since the user has often “learned” which keywords to use and in which order to use them. The problem is when the user is trying to find brand-new information and often isn’t even sure which keywords to use.

  4. Guillaume says:

    Another reason personalisation would fail is that we are all very average… Of course, one has to be careful with averages (on average, the world population has one testicle), but it is easy to find large subgroups that behave the same.

  5. I don’t think randomization can solve the search problem for newly created pages. I strongly believe that if a page’s content deserves a spot at the top of the search results, it will surely get there. With a little effort at promotion, a site with quality content can reach the “A” list pretty fast. Google’s ranking algorithm is good enough to help it do that.