I downloaded the (infamous) AOL query logs a few days back so I could explore caching in search. Here are a few things I learnt about popular queries along the way.
Popular queries
The top ten queries in the 2006 AOL query logs are:
- google
- ebay
- yahoo
- yahoo.com
- mapquest
- google.com
- myspace.com
- myspace
- http://www.yahoo.com
- http://www.google.com
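A list like this comes from a simple frequency count. The sketch below is a minimal version of that count. It assumes the log is tab-separated, with a header row and then the columns AnonID, Query, QueryTime, ItemRank and ClickURL. That is my understanding of the public release, so check your copy. The sketch lowercases and trims each query and counts raw rows, which means a query that led to several clicks may be counted more than than once. Whether the list above was computed exactly this way is an assumption.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def top_queries(lines: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k most frequent queries in an AOL-style tab-separated log.

    Assumed layout: one header row, then AnonID, Query, QueryTime,
    ItemRank, ClickURL on each row.
    """
    # QUOTE_NONE keeps any quote characters inside queries as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    counts: Counter = Counter()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        query = row[1].strip().lower()
        if query:
            counts[query] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy rows for illustration, not real log entries.
    sample = [
        "AnonID\tQuery\tQueryTime\tItemRank\tClickURL",
        "1\tgoogle\t2006-03-01 07:17:12\t\t",
        "2\tebay\t2006-03-01 07:18:00\t1\thttp://www.ebay.com",
        "3\tGoogle \t2006-03-01 07:19:30\t\t",
    ]
    print(top_queries(sample, k=2))  # [('google', 2), ('ebay', 1)]
```

To run it on a real log file, pass an open file object (opened with `newline=""`) instead of the toy list.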
The world’s changed since then: you wouldn’t expect to see a few of those names in the top ten. But what’s probably still true is that the queries are navigational, that is, queries that users pose when they want to go somewhere else on the web. The queries weather and american idol are the only two in the top twenty that aren’t navigational (they’re informational queries).
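Deciding whether a query is navigational usually takes human judgment or click data, but a crude first pass is possible. Here’s a minimal sketch of a rule-based guess. The `KNOWN_SITES` set, the regular expression and its short list of top-level domains are all hypothetical choices for illustration. This is not how the queries above were classified.

```python
import re

# Hypothetical list of bare site names; a real system would learn these
# from click data rather than hard-code them.
KNOWN_SITES = {"google", "yahoo", "ebay", "mapquest", "myspace"}

# URL-looking queries: optional scheme, optional www., then a name ending in
# one of a few common top-level domains (the TLD list is an assumption).
URL_LIKE = re.compile(r"^(https?://)?(www\.)?[\w-]+(\.[\w-]+)*\.(com|net|org|edu|gov)/?$")


def looks_navigational(query: str) -> bool:
    """Crude guess at whether a query is an attempt to reach a specific site."""
    q = query.strip().lower()
    if q in KNOWN_SITES:
        return True
    if q.startswith(("http", "www")):
        return True  # also catches partial URLs such as "http://www.google"
    return URL_LIKE.match(q) is not None


if __name__ == "__main__":
    for q in ["ebay", "yahoo.com", "http://www.google.com", "weather", "american idol"]:
        print(q, looks_navigational(q))
```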
Misspellings
The misspellings of google are startling. Any spelling you can imagine is in the query logs, and they’re frequent. Here are a few examples from the top 1,000 queries:
- googlecom
- google.
- http://www.google
- google.cm
- googl.com
- googl
- goole.com
- goole
- goog
- googel
- google.co
- googles
- goggle.com
- goggle
This is true of every popular query: a quick glance at ebay (the second-most popular query) finds e-bay, e bay, ebay.com, ebay search, ebay.om, eby, and many more.
And don’t get me started on the different spellings of britney (as in spears): brittany, brittney, britny, britney, …
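Edit distance is one way to measure how close these variants are to the intended query. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance. It counts each insertion, deletion, substitution and adjacent transposition as one edit, so googel for google costs one edit. The choice of distance measure is mine, for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "el" typed for "le".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    # Each of these is one edit from "google" except "goog", which is two.
    for variant in ["googel", "goole", "googl", "goggle", "googles", "goog"]:
        print(variant, osa_distance(variant, "google"))
```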
The good news for users is that most of these misspellings or alternate expressions work just fine at google. That’s the miracle of query rewriting in search.
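This post doesn’t describe how a real engine rewrites queries, and production systems are far more elaborate. As a toy stand-in, the sketch below does two things. First, it strips URL decoration: the scheme, www, and a short trailing suffix such as .com. Second, it snaps the result to the closest entry in a hypothetical `CANONICAL` list using Python’s `difflib.get_close_matches`. The 0.75 similarity cutoff is an arbitrary assumption.

```python
import difflib
import re
from typing import Sequence

# Hypothetical canonical vocabulary; a real system would derive it from logs.
CANONICAL = ["google", "ebay", "yahoo", "mapquest", "myspace"]


def normalize(query: str) -> str:
    """Strip URL decoration so that, e.g., 'http://www.google.com' becomes 'google'."""
    q = query.strip().lower()
    q = re.sub(r"^https?://", "", q)     # drop a scheme
    q = re.sub(r"^www[.\s]", "", q)      # drop 'www.' or 'www '
    q = re.sub(r"\.[a-z]{0,3}$", "", q)  # drop .com, .co, .cm, or a bare trailing '.'
    return q


def rewrite(query: str, vocabulary: Sequence[str] = CANONICAL, cutoff: float = 0.75) -> str:
    """Map a query to its closest canonical form, or return it unchanged."""
    matches = difflib.get_close_matches(normalize(query), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else query


if __name__ == "__main__":
    for q in ["googel", "http://www.google", "goggle.com", "ebay.om", "e-bay", "weather"]:
        print(f"{q!r} -> {rewrite(q)!r}")
```

`get_close_matches` ranks candidates by `SequenceMatcher.ratio()`, so a query unlike any canonical entry, such as weather, passes through unchanged.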
Single characters and other typing errors
Single-character queries are surprisingly common. The ten most popular are m (the 51st most popular query), g (89th), y (115th), a, e, h, w, c, s, and b. Here’s my theory on m: users are typing <something>.com (which we know is very popular), and at the end they hit enter just before hitting m, then hit m, and press enter again. Transpositions are pretty common, and m is far and away the most popular letter that ends a query. My theory on g and y is that they’re the first letters of google and yahoo, and the user hit enter way too early. I don’t have a URL theory for a or e; they’re just very common letters. As for h and w, they’re the first letters of http and www.
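Both tallies in this paragraph can be reproduced from a list of queries, such as the Query column of the log. The first is the most common single-character queries. The second is the letter that most often ends a query. Here’s a minimal sketch that counts every occurrence rather than distinct queries. Which of the two the figures above use is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def _clean(queries: Iterable[str]) -> List[str]:
    """Trim and lowercase queries, dropping empty ones."""
    return [q for q in (s.strip().lower() for s in queries) if q]


def single_char_queries(queries: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """The k most frequent queries that are exactly one character long."""
    return Counter(q for q in _clean(queries) if len(q) == 1).most_common(k)


def last_char_counts(queries: Iterable[str]) -> Counter:
    """How often each character appears as the final character of a query."""
    return Counter(q[-1] for q in _clean(queries))


if __name__ == "__main__":
    # Toy input for illustration, not real log data.
    sample = ["m", "yahoo.com", "google.com", "g", "m", "ebay", "myspace.com"]
    print(single_char_queries(sample))              # [('m', 2), ('g', 1)]
    print(last_char_counts(sample).most_common(1))  # [('m', 5)]
```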
There are many other had-a-problem-with-the-interface queries that are popular. Queries such as mhttp, comhttp, .comhttp, and so on are common. What’s happened here is that the user has gone back to the search box, partially erased the previous query, typed something new, and hit enter early.
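A simple heuristic can flag queries like mhttp: the query splits into a non-empty tail of the user’s previous query followed by the start of something new. The sketch assumes you already have the same user’s preceding query. The logs carry an anonymous user ID and a timestamp, so consecutive queries can be paired. The default `new_start` value of http://www. is a hypothetical choice.

```python
def looks_like_partial_erase(query: str, previous: str, new_start: str = "http://www.") -> bool:
    """Heuristic for queries such as 'mhttp' or '.comhttp'.

    True if the query is a non-empty tail of the previous query (left over
    from an incomplete erase) followed by a non-empty prefix of new_start
    (typed before enter was pressed too early).
    """
    q, prev = query.strip().lower(), previous.strip().lower()
    for i in range(1, len(q)):
        leftover, typed = q[:i], q[i:]
        if prev.endswith(leftover) and new_start.startswith(typed):
            return True
    return False


if __name__ == "__main__":
    print(looks_like_partial_erase("mhttp", "yahoo.com"))     # True
    print(looks_like_partial_erase("comhttp", "google.com"))  # True
    print(looks_like_partial_erase("weather", "yahoo.com"))   # False
```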
Of the top 1,000 queries, 91 begin with www. It’s basically a list of the top sites on the web that, halfway through, repeats with the initial period replaced by a space (for example, http://www.google.com is the 10th most popular query, and www google.com is the 123rd). I wonder whether the use of the www prefix has changed in the six years since. My first theory is that users don’t understand the difference between the search box and the browser address bar, and Google Chrome has certainly fixed that problem by making them one and the same. Brilliant, simple innovation. My second theory is that users think they need to put www at the front of navigational queries; you’ll often hear people talk about that in user experience research sessions.
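The www pattern can be checked mechanically. This minimal sketch counts the queries that begin with www. It also pairs each www.<site> query, with or without a leading http://, with its www <site> twin when both appear. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def count_www_prefix(queries: Iterable[str]) -> int:
    """Number of queries that literally begin with 'www'."""
    return sum(1 for q in queries if q.strip().lower().startswith("www"))


def www_pairs(queries: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each www.<site> query (optionally prefixed by http://) with a
    'www <site>' twin, when the twin also appears in the list."""
    cleaned = [q.strip().lower() for q in queries]
    present = set(cleaned)
    pairs = []
    for q in cleaned:
        bare = re.sub(r"^https?://", "", q)  # ignore a leading scheme
        if bare.startswith("www."):
            twin = "www " + bare[len("www."):]
            if twin in present:
                pairs.append((q, twin))
    return pairs


if __name__ == "__main__":
    # Toy input for illustration.
    top = ["http://www.google.com", "ebay", "www.yahoo.com", "www google.com"]
    print(count_www_prefix(top))  # 2
    print(www_pairs(top))         # [('http://www.google.com', 'www google.com')]
```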
Blomquist’s Approximation states: if you give a user a search box, the number one query is “google”.
In addition to the AOL example that you cited, Hugh, while I was working at Microsoft we observed that “google” was the number one query not only on Bing but also in both the Windows and Office help search boxes. Those last two boggled my mind so much that I decided to grossly overgeneralize and self-title a pompous engineering principle based on them.
Can eBay’s query data save the world from Blomquist’s Approximation?