I downloaded the (infamous) AOL query logs a few days back, so I could explore caching in search. Here are a few things I learnt about popular queries along the way.
The top ten queries in the 2006 AOL query logs are
The world’s changed since then: you wouldn’t expect to see a few of those names in the top ten. But what’s probably still true is that the queries are navigational, that is, queries that users pose when they want to go somewhere else on the web. The queries weather and american idol are the only two in the top twenty that aren’t navigational (they’re informational queries).
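If you want to produce a ranking like this yourself, a rough sketch follows. It assumes you’ve already pulled the query strings out of the log files into a list; the function name top_queries and the sample data are made up for illustration and are not taken from the logs.

```python
from collections import Counter


def top_queries(queries, n=10):
    """Return the n most frequent queries as (query, count) pairs.

    Queries are lowercased and stripped of surrounding whitespace so that
    "Google " and "google" are counted together; empty strings are skipped.
    """
    normalised = (q.strip().lower() for q in queries)
    counts = Counter(q for q in normalised if q)
    return counts.most_common(n)


# Made-up sample, not real log data.
sample = ["google", "ebay", "google", "yahoo", "Google ", "weather", "ebay"]

for query, count in top_queries(sample, n=3):
    print(query, count)
```

Counting exact strings like this is what makes the spelling variants discussed next show up as separate entries.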
The misspellings of google are startling. Any spelling you can imagine is in the query logs, and they’re frequent. Here are a few examples from the top 1,000 queries:
This is true of every popular query: a quick glance at ebay (the second-most popular query) finds e-bay, e bay, ebay.com, ebay search, ebay.om, eby, and many more.
And don’t get me started on the different spellings of britney (as in spears): brittany, brittney, britny, britney, …
The good news for users is that most of these misspellings or alternate expressions work just fine at Google. That’s the miracle of query rewriting in search.
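To give a feel for the idea, here’s a toy sketch that maps spelling variants onto a short list of known popular queries using fuzzy string matching from Python’s standard library. It is emphatically not how any real search engine rewrites queries; the KNOWN_QUERIES list, the canonicalize function, and the 0.75 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical list of canonical queries; a real system would learn these.
KNOWN_QUERIES = ["google", "ebay", "yahoo", "britney"]


def canonicalize(query, known=KNOWN_QUERIES, cutoff=0.75):
    """Map a query to its closest known spelling, or return it unchanged."""
    text = query.strip().lower()
    text = re.sub(r"\.com$", "", text)      # drop a trailing ".com"
    text = re.sub(r"[^a-z0-9]", "", text)   # drop spaces, hyphens, dots
    matches = difflib.get_close_matches(text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else query


for variant in ["e-bay", "e bay", "ebay.com", "eby", "brittney", "weather"]:
    print(variant, "->", canonicalize(variant))
```

Stripping punctuation first handles variants such as e-bay and e bay outright, and the fuzzy match picks up the dropped or doubled letters. A query with no close match, such as weather, is passed through untouched.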
Single Characters and Other Typing Errors
Single-character queries are surprisingly common. The ten most popular are m (51st most popular query), g (89th), y (115th), a, e, h, w, c, s, and b. Here’s my theory on m: users are typing <something>.com (which we know is very popular), and at the end they hit enter just before hitting m, and then hit m, and press enter again. Transpositions are pretty common, and m is far and away the most popular letter that ends a query. My theory on g and y is that they’re the first letters of google and yahoo, and the user hit enter way too early. I don’t have a URL theory on a or e; they are simply very common letters. As for h and w, they’re the beginning of http and www.
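Both observations (which single characters are popular, and which letter most often ends a query) are easy to check with a couple of counters. This is a minimal sketch that assumes the queries are already in a list; the function names and the sample data are invented for illustration.

```python
from collections import Counter


def single_char_counts(queries):
    """Count queries that consist of exactly one character."""
    cleaned = (q.strip().lower() for q in queries)
    return Counter(q for q in cleaned if len(q) == 1)


def last_letter_counts(queries):
    """Count the final letter of every multi-character query."""
    cleaned = (q.strip().lower() for q in queries)
    return Counter(q[-1] for q in cleaned if len(q) > 1 and q[-1].isalpha())


# Made-up sample, not real log data.
sample = ["google.com", "m", "ebay.com", "m", "g", "yahoo", "weather 90210"]

print(single_char_counts(sample).most_common())
print(last_letter_counts(sample).most_common(1))
```

Queries that end in a digit or punctuation are left out of the last-letter count, since the theory is about letters.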
There are many other had-a-problem-with-the-interface queries that are popular. Queries such as mhttp, comhttp, .comhttp, and so on are common. What’s happened here is the user has gone back to the search box, partially erased the previous query, typed something new, and hit enter early.
Of the top 1,000 queries, 91 begin with www. It’s basically a list of the top sites on the web that, halfway through, repeats with the initial period replaced by a space (for example, www.google.com is the 10th most popular query and www google.com is the 123rd). I wonder whether the use of a www prefix has changed in six years? My first theory is that users don’t get the difference between the search box and the browser address bar. Google Chrome sure has fixed that problem by making them the one thing: a brilliant, simple innovation. The second theory is that users think they need to put www at the front of queries when they’re navigational; you’ll often hear people talk about that in user experience research sessions.
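Here’s a sketch of how the www count and the dotted-versus-spaced pairs could be pulled out of a ranked query list. It assumes a list of unique queries ordered from most to least popular; the function names and the sample ranking are hypothetical.

```python
def count_www_prefixed(ranked_queries):
    """Count queries that begin with "www"."""
    return sum(1 for q in ranked_queries if q.startswith("www"))


def www_pairs(ranked_queries):
    """Find "www.site" queries whose "www site" twin is also in the list.

    Returns (dotted, dotted_rank, spaced, spaced_rank) tuples, with ranks
    starting at 1 for the most popular query.
    """
    rank = {q: i + 1 for i, q in enumerate(ranked_queries)}
    pairs = []
    for query, position in rank.items():
        if query.startswith("www."):
            spaced = "www " + query[len("www."):]
            if spaced in rank:
                pairs.append((query, position, spaced, rank[spaced]))
    return pairs


# Made-up ranking, not real log data.
ranked = ["google", "www.google.com", "ebay", "www google.com", "www.yahoo.com"]

print(count_www_prefixed(ranked))
print(www_pairs(ranked))
```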