Archive for May 9th, 2011


Build a Better Search Engine, and the World Will Something Something

Yesterday’s New York Times ran an article detailing the efforts of yet another up-and-coming search engine determined to dethrone Google. Seems like the search monster wannabes crop up as regularly as referral spam. Why?

Well, duh. They want to do a Google and retire as billionaires at the age of 27. They figure that all the folks at Google did was come up with an incrementally better way to find information on the internet, let it run for a couple of years, launch an IPO, and off to the Lamborghini dealer they went. How hard can it be?

Not so hard to invent a search engine, apparently. Very hard to invent one that’s even incrementally better than Google. For one thing, they have to improve on an engine that delivers pretty decent results on most queries and good-enough results on the rest. And they have to do it without making anything worse.

Here’s a list of things a new search engine will need to do better in order to compete with, let alone dethrone, the Goog.

  1. Relevance. As of now, Google’s ability to match queries to webpage content is almost creepy. They’ve indexed pretty much every word on the internet, and their rocket-science algorithm sifts it all down to meaning as well as or better than most humans. Still, there’s always room to improve. And oddly (or not), the paid AdWords ads are often less relevant than the organic results. Maybe there’s some room to compete there. Either way, any engine that bests Google will have to return results that are at least as relevant, if not more so. Bing, so far you lose.
  2. Currency. Just how up-to-date is the Google index? Well, it’s better than it once was, but it’s still not all that current. Because they apparently value a domain’s age when calculating search position, they return a lot of well-aged results. Try a search for “percentage of email that is spam” and you’ll get a Wikipedia page of uncertain vintage, then 7 of the next 9 pages dated between 2007 and 2009. In internet time, that’s ancient history. Searches on popular current events do much better. This is at least partly because of Google’s love of the blog, which gives them a large pool of content centered on whatever is causing a buzz. A better engine would be better at determining whether a query is looking for up-to-date info, historical data, or something with a pedigree. Wolfram-Alpha may be on the way to getting that smart, but none of the others are close.
  3. Quality. The most recent Panda update was designed to improve the quality of search result pages. It may have helped some by culling (or at least demoting) huge arrays of crappy “almost” content from a few giant content farms. Even so, the ratio of junk to useful content is still pretty high. The reason Google ranks so many pages full of nonsense? Because a high rank is valuable enough that scamming your way to the top of a Google search result page has become a billion-dollar-a-year industry. Blekko’s focus on human-powered social networking to declare a page’s value is an interesting gambit. Will they be able to fend off the hordes of black-hatters who will descend on them the minute they find some success? Doubt it, but maybe.

That’s all there is to it. Come up with a search engine formula that does a superior job on even 2 of the 3 and you have a shot at taking down the champ. Until then, forget it.

New York Times Article

An Engine’s Tall Order: Streamline the Search