Here’s a study with an interesting finding: If you want to get better results on Google, try using a shorter query.
I found this while doing research for a story about automated “question answering” systems. I was reading through the work of James Allan, a computer scientist at the University of Massachusetts, and came across his paper “A Case for Shorter Queries, and Helping Users Create Them” (PDF here). In it, he and his coauthor Giridhar Kumaran conducted an experiment: They took the query “Define Argentine and British international relations” and ran it through a search engine. (They don’t specify which one they used.) Then they ran various similar queries that used fewer words (“sub-queries”), such as “define britain international argentina” or “define britain relate argentina”. Each time, they graded the relevance of the search engine’s results, expressed as their “average precision” on a scale of zero to 1.0.
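For the curious, here is a rough sketch of how an “average precision” score is usually calculated in information retrieval. This is the textbook definition, not necessarily the exact evaluation setup Allan and Kumaran used; the function name and the true/false relevance list are my own illustrative assumptions.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Textbook average precision for a single ranked result list.

    ranked_relevance: list of booleans, in rank order, True where the
        result at that position was judged relevant.
    total_relevant: how many relevant documents exist overall; if omitted,
        we assume (hypothetically) that every relevant one appears in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant results so far / results so far.
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0
```

A result list where every hit is relevant scores 1.0, one with no relevant hits scores zero, and relevant results that show up lower in the ranking pull the score down.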
So which sub-query produced the best results? The shortest one. It was only two words long — britain argentina — but it scored 0.626, quite a lot better than the original, full-sentence query, which scored only 0.424.
Why would short queries work better than longer ones? Possibly because they contain fewer “noise terms”, common words like “define” or “and”, which might muddy the search results. Human language is filled with ambiguity; one of the big challenges for a machine is taking a human question and figuring out what, semantically, it’s actually asking. In that sense, using fewer words would reduce the number of potential ways the machine can misunderstand you.
Except the truly strange thing in that example above is the question was asking about British and Argentinian international relations — yet the best results came from removing the words “international” and “relations”. I’d have expected those to be important words, no? But that’s precisely the point Allan is getting at here:
Sub-queries a human would consider as an incomplete expression of information need sometimes performed better than the original query.
This suggests, of course, that the best way to get results on a search engine is to radically strip your query down even further than you think is useful. Or maybe start with a regular query, and if you don’t like the results, try making it shorter and shorter.
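If you wanted to try the shorter-and-shorter approach systematically, a minimal sketch might look like the following. It only drops words from the query; the sub-queries in the paper also changed word forms (“britain” for “British”, “relate” for “relations”), which this does not attempt. The function name and the `min_terms` parameter are hypothetical.

```python
from itertools import combinations


def sub_queries(query, min_terms=1):
    """Yield every shorter version of a query, longest first.

    Words keep their original order; only deletion is performed, so this is
    a simplification of the sub-queries described in the paper.
    """
    terms = query.lower().split()
    for size in range(len(terms) - 1, min_terms - 1, -1):
        for combo in combinations(terms, size):
            yield " ".join(combo)
```

The number of candidates roughly doubles with every extra word, so for a long query you would want to score only a sample of them rather than all of them.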
Then again, it’s hard to know if this would really work. I’m not privy to what’s going on under the hood of most search engines today. Allan’s paper discusses several ways for question-answering systems to have the computer automatically shorten a query before feeding it into the knowledge database. But his paper is a few years old, so maybe these techniques are already common amongst search engines, and maybe they already rewrite our queries into shorter forms behind the scenes.
What do you guys think? Anecdotally, have you found that super-short queries work better than longer, sentence-like ones?
I'm Clive Thompson, the author of Smarter Than You Think: How Technology is Changing Our Minds for the Better (Penguin Press). You can order the book now at Amazon, Barnes and Noble, Powells, Indiebound, or through your local bookstore! I'm also a contributing writer for the New York Times Magazine and a columnist for Wired magazine. Email is here or ping me via the antiquated form of AOL IM (pomeranian99).