A couple of days ago, I posted about a neat Google hack: the search results for “weapons of mass destruction”. In the comments on that item, Franco pointed out that when he recently tried to search for the goddess “Tykhe”, Google asked him if he really meant to search for the word “the”. As Franco sardonically joked: “Yes, I meant to search the entire internet for the word ‘the’ — a word which you refuse to search for.” And it’s true: Whenever you type in a search string with common words like “the” or “and”, Google strips them out. Generally, Google won’t even allow you to include “the” as a search term.
But here’s the weird thing: If you type in only the word “the” as a search, you actually do get results. When I searched for “Tykhe”, Google gave me the same response it gave Franco:
Searched the web for Tykhe — Results 1 - 10 of about 302. Search took 0.05 seconds.
Did you mean: The
So I clicked on the “the” search, and discovered it generates 3,680,000,000 results. The top-ranked search results are, in order:
The Onion
The White House
The Economist
NASA
The Guardian
AllTheWeb.com
The Weather Channel
The New York Times
The Washington Post
The Hunger Site
This is really intriguing. Since “the” is the most common word in the English language, it would, theoretically, be distributed pretty evenly around the Internet. In that case, when Google searches for “the”, it faces a unique situation: it would be very hard for Google’s semantic or keyword-matching tools to figure out which web site used the word most frequently, or in the most significant fashion, so most of that reasoning is rendered useless. And indeed, look again at the number of results: 3,680,000,000. That’s in the same range as the number of pages that Google claims to index: 3,083,324,652. Thus, the search “the” seems to be returning results for nearly every page Google knows about.
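For anyone who wants to check how close those two figures really are, here is a quick back-of-the-envelope comparison. The numbers are the ones quoted above; the variable names are just mine.

```python
# Figures quoted in the post; the variable names are illustrative only.
results_for_the = 3_680_000_000      # results reported for the search "the"
claimed_index_size = 3_083_324_652   # pages Google claims to index

# How many more results were reported than pages claimed in the index.
difference = results_for_the - claimed_index_size

# The result count expressed as a multiple of the claimed index size.
ratio = results_for_the / claimed_index_size

print(f"difference: {difference:,}")
print(f"ratio: {ratio:.2f}")
```

Result counts are described as "about" figures on the results page itself, so the comparison is only as good as that estimate.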
In this situation, the main trick Google has to fall back on is PageRank: its patented system for determining which sites are important by counting the links that point to them, with links from important pages counting for more. This would mean, then, that The Onion and the other nine sites may have more inbound links than most other sites on the Net. They are, in effect, the most popular sites on the Net, since PageRank popularity is clearly the main criterion, if not the only criterion, that Google is using to place them on the Top 10 list, right?
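To make the link-counting idea concrete, here is a minimal sketch of the textbook PageRank iteration on a made-up four-page web. It is not Google's actual code: the page names, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A hypothetical toy web: everything links to "hub", nothing links to "c".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}

ranks = pagerank(toy_web)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

In this toy web the page everyone links to comes out on top, and the page nobody links to comes out last, which is the behavior the Top 10 list for “the” would reflect if link popularity were the only signal in play.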
Well, maybe. Possibly the names of the sites are important, too. Notice that, except for NASA, all the sites have the word “the” in their official web-site title — and thus probably also in their meta tags, and various other semantically important bits of HTML. That may explain why The Hunger Site appears so high.
Pretty weird, eh?
I'm Clive Thompson, the author of Smarter Than You Think: How Technology is Changing Our Minds for the Better (Penguin Press). You can order the book now at Amazon, Barnes and Noble, Powells, Indiebound, or through your local bookstore! I'm also a contributing writer for the New York Times Magazine and a columnist for Wired magazine. Email is here or ping me via the antiquated form of AOL IM (pomeranian99).