
Knowing Your Customer with Data

Are you really data driven? Here’s what I’ve learnt about making decisions using quantitative data.

A Typical Test versus Control Experiment

Let’s get on the same page about what we’re discussing. Most web companies run test versus control experiments, or A/B tests. The idea is simple:

  1. Divide the customers into populations
  2. Show one population the control (default, “A”) experience
  3. Show one or more populations the test (new, altered, “B”) experience
  4. Collect data from each population
  5. Compute metrics from the data
  6. Understand the relative results between the test and the control
  7. Make decisions: either keep the control, or replace it with a new, better experience from a positive test
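The post doesn’t say how customers are divided into populations in step 1. One common approach, sketched here under that assumption, hashes a stable user ID together with an experiment name, so each customer sees the same experience on every visit and different experiments split users independently. The function name, the SHA-256 choice, and the `test_fraction` parameter are illustrative, not eBay’s system.

```python
import hashlib


def assign_bucket(user_id, experiment, test_fraction=0.5):
    """Deterministically assign a user to 'control' or 'test' for one experiment.

    Hashing experiment name + user ID means the same customer always lands in
    the same population, while different experiments split users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Top 53 bits of the hash, scaled to a uniform-looking point in [0, 1).
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "test" if point < test_fraction else "control"


print(assign_bucket("user-42", "new-layout"))
```

Because assignment is a pure function of its inputs, no table of who-saw-what needs to be stored to keep the experience consistent.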

Explaining how to really know your customer with data at the 2012 eBay Data Conference

It’s critical in Step 5 to compute confidence intervals, that is, statistical measures that tell you how confident you can be that the phenomenon you’re seeing is real. For example, using a one-sided t-test, you might learn that you can be 90% confident that the test experience is better than the control.

Let’s suppose you’ve reorganized the layout of your site, and what you’ve learnt is that customers abandon the pages much less. Through your test, you’re 90% confident that a new experience you’ve tested is better than the default, control experience. On that basis, you might want to launch the new, test experience — but I’d caution you to learn more before you make a decision.
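For the abandonment example, here is a sketch of the confidence calculation. It swaps in a one-sided two-proportion z-test, which is what a t-test approaches for the very large samples typical of web experiments; the function name and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_test_is_better(ctrl_abandons, ctrl_visits, test_abandons, test_visits):
    """Confidence (0..1) that the test's abandonment rate is lower than control's.

    One-sided two-proportion z-test with an unpooled standard error: a
    large-sample stand-in for the one-sided t-test mentioned in the text.
    """
    p_ctrl = ctrl_abandons / ctrl_visits
    p_test = test_abandons / test_visits
    # Standard error of the difference between the two abandonment rates.
    se = sqrt(p_ctrl * (1 - p_ctrl) / ctrl_visits + p_test * (1 - p_test) / test_visits)
    if se == 0:
        raise ValueError("no variance: both rates are 0% or 100%")
    # Positive z means the test population abandons less than control.
    z = (p_ctrl - p_test) / se
    return NormalDist().cdf(z)


# Hypothetical counts: 5.0% abandonment in control vs 4.5% in test.
print(f"{confidence_test_is_better(500, 10_000, 450, 10_000):.2f}")  # prints 0.95
```

A z-score of about 1.28 corresponds to the 90% confidence level used in the text.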

Where does the behavior come from?

I recommend you always dig deep into your data. Learn as much as you can before you decide. I like to see data “cut” (broken into subpopulations) by:

  • Device (mobile vs. tablet vs. desktop; break it down by brand, make, and model [for example, Apple iPad HD])
  • Operating system (Linux vs. Mac OS X vs. Windows; break it out by version)
  • Browser (Chrome vs. IE vs. Firefox vs. Safari; break it out by version)
  • Channel (visits from within your site vs. visits from Google search vs. visits from paid advertising)
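A minimal sketch of one such cut, assuming each logged visit is a dict holding the dimension you want to cut by, an `arm` field, and an `abandoned` flag (all hypothetical field names). It reports each segment’s abandonment rate alongside its visit count, since a large swing on a tiny segment may just be noise.

```python
from collections import defaultdict


def cut(records, dimension):
    """Abandonment rate and visit count per segment and arm for one dimension."""
    # counts[segment][arm] = [abandons, visits]
    counts = defaultdict(lambda: {"control": [0, 0], "test": [0, 0]})
    for r in records:
        cell = counts[r[dimension]][r["arm"]]
        cell[0] += int(r["abandoned"])
        cell[1] += 1

    report = {}
    for segment, arms in sorted(counts.items()):
        # Rate is None when an arm has no visits in this segment.
        report[segment] = {
            arm: (abandons / visits if visits else None, visits)
            for arm, (abandons, visits) in arms.items()
        }
    return report


visits = [
    {"browser": "Chrome", "arm": "control", "abandoned": False},
    {"browser": "Chrome", "arm": "test", "abandoned": False},
    {"browser": "IE7", "arm": "control", "abandoned": False},
    {"browser": "IE7", "arm": "test", "abandoned": True},
]
for segment, arms in cut(visits, "browser").items():
    print(segment, arms)
```

Each segment’s counts can then be fed into a confidence calculation, so you know which per-segment differences are worth acting on.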

When you do this, and add in your confidence intervals, you will almost always learn something. Is the new experience working as expected on the dreaded IE6 and IE7? Any issues on a mobile device? Does it work better when customers are navigating within your site versus landing in the middle of it from a Google search?

Ask yourself: what can I improve before I make a decision? And always ask: knowing this detail, am I still comfortable with my decision? Be very careful about launching new experiences that help most of the population, and hurt some of it — ask whether you can live with the worst case experience.

When you do these cuts, make sure the data makes sense. I’ve learnt over the years that when you see something that you don’t expect, it’s almost always a bug, or an error in the data. Never explain away surprises with complex theories — something is probably broken.

Who or what is affected by the change?

You can think of the previous section as suggesting you cut the data by the funnel, that is, by where the behaviors come from. You should also cut the data by who or what it affects on your site:

  • Which customers are affected? (Old versus new, first time visitors versus returning, regular versus occasional, international versus domestic, near versus far, and so on)
  • What categories are affected? (Fashion versus electronics, browse versus buy, and so on)
  • Which queries are affected? (A search-centric view. Long versus short queries, English versus non-English, navigational versus informational, and so on)
  • Which sessions are affected? (Long research sessions versus short purchase sessions, multi-query sessions versus single-query sessions, multi-click sessions versus single-click sessions, and so on)
  • Which pages are affected?

All the same caveats and suggestions from the previous section apply here.

I also love to compute many different metrics. While you’ll often have a “north star” metric that you’re trying to move — whether it’s relevance of the experience, abandonment of your site, or the dollar value of goods sold — it’s great to have supporting data to inform your decision. When you compute more metrics, you almost always will see contradiction that makes your decisions harder: but it’s always better to know more than to have your head in the sand. It takes smart, sensible debate to make most launch decisions.

The mean average hides the truth

Here’s an oversimplified example. Suppose six customers rate your site on a scale of 1 (horrible) to 10 (amazing). In the control, three of them rate you as 4, 5, and 6. In the test, the other three rate you as 1, 4, and 10. The control and the test both have a mean average rating of 5. (Ignore statistical significance for this simple example.)

On this basis, you might abandon the work on the new experience: it’s no better than the control. But if you dig into the data, you’d see that some customers love the new experience, and some hate it. Imagine if you could fix whatever is causing customers to hate it: if you could get that 1 to be a 5, you’d see a mean average of over 6 for the test. The fastest way to move a mean is to fix the outliers, focusing on what’s broken.

I don’t like mean averages because they hide the interesting nuggets. I like to see percentiles in both tails (for example, the 5th and 95th, or the 10th and 90th): show me the performance of the worst and best 5% or 10% of customer experiences. In our simple example, I’d love to know that the worst customer experience was 1 in the test and 4 in the control, and the best experience was 10 and 6. Knowing this, I’m excited about the potential of the test, but worried that something is very wrong about it for some customers. That guides me where to put my energy.
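A sketch that reports the tails alongside the mean, using the three-rating example above. With only three ratings per arm the minimum and maximum say the most; on real data, the percentile cut points from `statistics.quantiles` become the useful view. The `summarize` helper is hypothetical.

```python
from statistics import mean, quantiles


def summarize(ratings):
    """Mean plus both tails of the distribution.

    quantiles(..., n=20, method="inclusive") returns the 19 cut points at
    5%, 10%, ..., 95%; "inclusive" keeps them within the observed range.
    """
    cuts = quantiles(ratings, n=20, method="inclusive")
    return {
        "mean": mean(ratings),
        "worst": min(ratings),
        "p5": cuts[0],
        "p10": cuts[1],
        "p90": cuts[17],
        "p95": cuts[18],
        "best": max(ratings),
    }


control = [4, 5, 6]
test = [1, 4, 10]
fixed_test = [5, 4, 10]  # the test, if the customer who gave a 1 now gives a 5

for name, ratings in [("control", control), ("test", test), ("fixed test", fixed_test)]:
    s = summarize(ratings)
    print(f"{name}: mean {s['mean']:.2f}, worst {s['worst']}, best {s['best']}")
```

The control and test means match, but the worst and best values immediately show where the test differs.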

Don’t be myopic

It’s common to measure your feature in the product, and ignore the ecosystem. For example, you might be working on an improvement on some part of a page — imagine that you’re working on Facebook’s news feed. You’ve figured out an improvement, run the test, seen much better customer engagement, and you’re excited to launch.

But did you worry about what you’ve done to the sponsored links on the right side of the page? Did you hurt the performance of another part of the product owned by another team? It’s common for features to hurt performance of others, and often cause the overall result to be neutral. This happens between features on one page, and between pages. Make sure you always measure overall page and site performance too.

Tests don’t tell you everything

Tests don’t tell you what you don’t measure. Measure as much as you can.

Even if you do measure as much as you can, there’ll be much happening outside your test that’s important. For example, if you run a test for a week, you don’t learn anything about the long term effects on customer retention. You don’t know anything about how customers will adapt to using the feature. You won’t know whether the effects are seasonal, or what might happen if some of your assumptions change — for example, what if another team changes something else on the page or site in the future?

This can be ok. Just realize the limitations, and be aware that retesting in the future might be a smart choice.

Quantitative testing also won’t tell you anything qualitative about what you’re working on. That’s a whole other theme of testing, and one I do plan to come back to in the future.


Around 1,000 people attended the employee-only eBay Data Conference recently. I had the opportunity to speak to them through my opening keynote address, and this post is based on that presentation. Thanks to Bob Page for inviting me.

The size, scale, and numbers of

I work in the Marketplaces business at eBay. That’s the part of the company that builds most of the worldwide marketplaces under the eBay brand. (The other major parts of eBay Inc are PayPal, GSI Commerce, x.commerce, and StubHub.)

I am lucky to have opportunities to speak publicly about eBay, and about the technology we’re building. It’s an exciting time to give a talk – we are in the middle of rewriting our search engine, we’ve improved search substantially, we’re automating our data centers, we’re retooling our user experience development stack, and much more.

At the beginning of most talks, I get the chance to share a few facts about our scale and size. I thought I’d share some with you:

  • We have over 10 petabytes of data stored in our Hadoop and Teradata clusters. Hadoop is primarily used by engineers who use data to build products, and Teradata is primarily used by our finance team to understand our business
  • We have over 300 million items for sale, and over a billion accessible at any time (including, for example, items that are no longer for sale but that are used by customers for price research)
  • We process around 250 million user queries per day (which become many billions of queries behind the scenes – query rewriting implies many calls to search to provide results for a single user query, and many other parts of our system use search for various reasons)
  • We serve over 2 billion pages to customers every day
  • We have over 100 million active users
  • We sold over US$68 billion in merchandise in 2011
  • We make over 75 billion database calls each day (our database tables are denormalized because doing relational joins at our scale is often too slow – and so we precompute and store the results, leading to many more queries that take much less time each)
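To put some of the daily figures above in per-second terms, here is a quick back-of-the-envelope conversion. These are averages over a full day; since the figures are lower bounds (“over”) and traffic isn’t flat across the day, peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day):
    """Average rate per second for a daily total."""
    return per_day / SECONDS_PER_DAY


daily_totals = {
    "user queries": 250_000_000,
    "pages served": 2_000_000_000,
    "database calls": 75_000_000_000,
}

for name, total in daily_totals.items():
    print(f"{name}: about {per_second(total):,.0f} per second")
```

This prints roughly 2,894 user queries, 23,148 pages, and 868,056 database calls per second.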

They’re some pretty large numbers, ones that make our engineering challenges exciting and rewarding to solve.

Any surprises for you?


Passwords have been in the news lately, particularly in connection with ‘leaks’.

In general, passwords are rarely leaked, because they usually aren’t stored. What is usually leaked are the password hashes.

How Passwords Work

When you provide a new password for an account, here’s what typically happens:

  1. The company hashes your password so that it becomes a string of characters that isn’t the password (more in a moment)
  2. The company stores the hash
  3. The company throws away your original password

When you come back to the site, and provide your password, here’s what happens:

  1. You type your password
  2. The password is hashed (using the same algorithm as before)
  3. The hash is compared to what’s stored by the company
  4. If they’re the same, you’re in. If they’re different, your password is wrong, try again
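The two flows above can be sketched in a few lines. This version deliberately mirrors the steps as written, with no salt yet (salting comes later in this post). The in-memory dictionary is a hypothetical stand-in for the company’s database, and SHA-256 is used as an example of a SHA-2 hash.

```python
import hashlib
import hmac

stored_hashes = {}  # username -> hex digest; stands in for the company's database


def hash_password(password):
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def set_password(username, password):
    # Steps 1-3: hash, store the hash, and keep no copy of the plaintext.
    stored_hashes[username] = hash_password(password)


def check_password(username, password):
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Hash the attempt with the same algorithm and compare. compare_digest
    # takes the same time however many characters match.
    return hmac.compare_digest(stored, hash_password(password))


set_password("alice", "magpie")
print(check_password("alice", "magpie"))  # prints True
```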

One-Way Hashes

The method that’s used to turn a password into a hash should be one-way. That is, it should be theoretically impossible to reverse the hash into the password. That’s actually pretty easy: most hashing algorithms throw away lots of information, and so the hash is lossy (that is, it has less information in it than the original password — it’s just a very-likely-to-be-unique string that represents the password). So it is actually usually impossible to reverse a password hashing algorithm.

There are many different algorithms for one-way hashing that can’t be reversed. The most popular one-way hash is the 128-bit MD5 hash, though this has been shown to be somewhat insecure (I’ll explain this in a minute). More recent approaches such as SHA-2 are similar in their theory but more secure.

Here’s an example. If you put the string password into an MD5 function, you get 5f4dcc3b5aa765d61d8327deb882cf99 as the output.
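You can reproduce that example with Python’s standard `hashlib` module:

```python
import hashlib

# MD5 of the bytes of the string "password", as 32 hex characters (128 bits).
digest = hashlib.md5(b"password").hexdigest()
print(digest)  # prints 5f4dcc3b5aa765d61d8327deb882cf99
```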

How can something that’s lossy be insecure? Since it’s impossible to reverse it, how can it be used by a hacker to find the original password?

The answer is that you can try putting vast numbers of original passwords through the hashing algorithm, and see if you can match the output to what is actually stored. So, for example, you could hash every English word using MD5, and see if you get a string that matches the leaked hashes. If you do, you’re in. And if you find one that matches, chances are you find many more that match too — lots of users have the same password.

That’s the problem with the MD5 hash: computers are sufficiently fast that it’s possible to try very, very large numbers of input strings and compare them to hashes (more later). And so if you’ve got the computing resources, you can effectively “reverse” the hashing.
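Here is a minimal sketch of that attack, assuming the attacker holds a list of leaked, unsalted MD5 hashes and a list of candidate words (both hypothetical here). It hashes each candidate once and then looks up every leaked hash, which is why one common password cracks many accounts at once.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, candidate_words):
    """Return the cracked password for each leaked hash, or None if not found."""
    # Hash every candidate once, then look up each leaked hash in the table.
    table = {md5_hex(word): word for word in candidate_words}
    return [table.get(h) for h in leaked_hashes]


leaked = [md5_hex("magpie"), md5_hex("magpie"), md5_hex("x7#qLp")]
print(dictionary_attack(leaked, ["password", "magpie", "dragon"]))
# prints ['magpie', 'magpie', None]
```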

Salting Passwords

The best way to make it very hard for hackers to figure out the original passwords is to salt them. Salting is a pretty simple idea: instead of hashing the password alone, hash the password with some other string concatenated to it. That “some other string” doesn’t change: it could be the user’s ID in the system, the time they created their account, or something else that’s stored but typically not known to the user.

Here’s an example. Suppose the user supplies the password “magpie”. Instead of hashing this alone, we might append their user ID, say “123456” to the string. So, we’d be hashing “magpie123456” to get our password hash that’s stored in the system.

How does salting help? Well, it makes it hard to reverse the hashing: a hacker won’t be able to match common passwords (say, English words) against the hashes, and so they’ll effectively have to try vastly more input strings. When they crack one, they typically also won’t crack more, since even if the passwords of two users are the same, the salted strings aren’t — “magpie123456” and “magpie123457” produce vastly different hashes.
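Here is the salting scheme from the example above (append the user ID, then hash), sketched with SHA-256. The function name is illustrative.

```python
import hashlib


def salted_hash(password, user_id):
    """Hash the password concatenated with the user's ID (the salt)."""
    salted = password + str(user_id)  # "magpie" + "123456" -> "magpie123456"
    return hashlib.sha256(salted.encode("utf-8")).hexdigest()


a = salted_hash("magpie", 123456)
b = salted_hash("magpie", 123457)
# Same password, different users: the stored hashes are unrelated.
print(a == b)  # prints False
```

A common refinement is a random per-user salt stored alongside the hash, fed to a deliberately slow key-derivation function such as `hashlib.pbkdf2_hmac`, which makes every guess more expensive for an attacker.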

Of course, if a hacker gains access to the complete database and the salting algorithm (and so has all of the input data and the method for creating the salted strings), you’re in lots of trouble. But if they’ve done that, you’ve got worse things to worry about.

Salting is essential. Don’t build a password system without it. And when you do build the password system, consider a hashing algorithm such as SHA-2. Even the author of MD5 doesn’t recommend using it now: computers are fast enough to brute force huge numbers of candidate passwords against MD5 hashes, and MD5 is also vulnerable to collision attacks such as the birthday attack.

I’ll be back next week.

Popular Queries

I downloaded the (infamous) AOL query logs a few days back so I could explore caching in search. Here are a few things I learnt about popular queries along the way.

Popular queries

The top ten queries in the 2006 AOL query logs are

  1. google
  2. ebay
  3. yahoo
  5. mapquest
  8. myspace

The world’s changed since then: you wouldn’t expect to see a few of those names in the top ten. But what’s probably still true is that the queries are navigational, that is, queries that users pose when they want to go somewhere else on the web. The queries weather and american idol are the only two in the top twenty that aren’t navigational (they’re informational queries).


The misspellings of google are startling. Any spelling you can imagine is in the query logs, and they’re frequent. Here are a few examples from the top 1,000 queries:

  • googlecom
  • google.
  • googl
  • goole
  • goog
  • googel
  • googles
  • goggle

This is true of every popular query: a quick glance at ebay (the second-most popular query) finds e-bay, e bay, ebay search, eby, and many more.

And don’t get me started on the different spellings of britney (as in spears): brittany, brittney, britny, britney, …

The good news for users is that most of these misspellings or alternate expressions work just fine at Google. That’s the miracle of query rewriting in search.

Single Characters and Other Typing Errors

Single-character queries are surprisingly common. The ten most popular are m (the 51st most popular query), g (89th), y (115th), a, e, h, w, c, s, and b. Here’s my theory on m: users are typing <something>.com (which we know is very popular), hit enter just before typing the final m, then type m and press enter again. Transpositions are pretty common, and m is far and away the most popular letter that ends a query. My theory on g and y is that they’re the first letters of google and yahoo, and the user hit enter way too early. I don’t have a URL theory on a or e; they are just very common letters. On h and w, they’re the beginnings of http and www.

There are many other had-a-problem-with-the-interface queries that are popular. Queries such as mhttp, comhttp, .comhttp, and so on are common. What’s happened here is the user has gone back to the search box, partially erased the previous query, typed something new, and hit enter early.

Of the top 1,000 queries, 91 begin with www. It’s basically a list of the top sites on the web that, halfway through, repeats with the initial period replaced with a space (example: is the 10th most popular query, www is the 123rd most popular query). I wonder whether the use of a www prefix has changed in the six years since. My first theory is that users don’t get the difference between the search box and the browser address bar, and Google Chrome sure has fixed that problem (by making them one and the same). Brilliant, simple innovation. The second theory is that users think they need to put www at the front of queries when they’re navigational; you’ll often hear people talk about that in user experience research sessions.
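Counts like these are straightforward to reproduce. A minimal sketch, assuming the query strings have already been pulled out of the log into a list; the normalization here (trimming and lowercasing) is an assumption and far from complete.

```python
from collections import Counter


def query_stats(queries, top_n=1000):
    """Top queries plus the single-character and www-prefixed ones among them."""
    # Light normalization only: trim whitespace and lowercase.
    counts = Counter(q.strip().lower() for q in queries)
    top = [q for q, _ in counts.most_common(top_n)]
    return {
        "top": top,
        "single_char": [q for q in top if len(q) == 1],
        "www_prefixed": [q for q in top if q.startswith("www")],
    }


sample = ["google", "Google ", "google", "ebay", "m", "www.yahoo.com", "g"]
stats = query_stats(sample)
print(stats["top"][0])  # prints google
```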

Caching in Search

Did you know that the vast majority of results from search engines are served from a cache of previously-computed results? Probably only around one third of all queries are actually evaluated using the backend search infrastructure.

Caching provides fast response to most user queries, while allowing search companies to spend much less on hardware, or to devote the resources in their search infrastructure to better computation of results.

Why does caching work in search?

In this post on click curves, I explained that almost everything in search follows an inverse power law distribution (a so-called “Zipf curve”). The implication is that a small number of distinct queries account for a large share of all the queries users type; that is, most users are searching for the same things.

AOL memorably released three months of query logs in 2006. They were slammed for doing so and pretty quickly apologized and took down the data. However, it’s a pretty nice data set for our purposes of discussing caching.

The most popular query at AOL in those three months of 2006 was google. Around 0.9% of the queries typed by users were those looking to leave AOL’s search and head over to Google. The second most popular query was ebay at 0.4% of all queries, and the third yahoo at 0.4%. If you sum the frequency of the top ten unique queries, you’ve seen around 3% of all the query volume. Here’s what happens as you inspect more unique queries:

  • If you sum the total frequency of the top 100 queries, you get around 6% of all the user query volume
  • The top 1,000 unique queries are around 11% of the query volume
  • The top 10,000 are around 20% of the volume
  • The top 100,000 are around 34% of the volume
  • The top 1,000,000 are around 58% of the volume
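The percentages above come from ranking unique queries by frequency and summing. Here is a sketch of that calculation, assuming a list of (already normalized) query strings; run over a full log, a function like this is one way to produce numbers like those above. The function name is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cache_coverage(queries, cache_sizes):
    """Fraction of all query volume served if we cache the top-k unique queries."""
    # Frequencies of unique queries, most popular first, then running totals.
    freqs = sorted(Counter(queries).values(), reverse=True)
    cumulative = list(accumulate(freqs))
    total = cumulative[-1] if cumulative else 0
    result = {}
    for k in cache_sizes:
        covered = cumulative[min(k, len(cumulative)) - 1] if k > 0 and cumulative else 0
        result[k] = covered / total if total else 0.0
    return result


toy_log = ["google"] * 6 + ["ebay"] * 3 + ["weather"]
print(cache_coverage(toy_log, [1, 2, 5]))  # prints {1: 0.6, 2: 0.9, 5: 1.0}
```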

Those points are plotted on a log-log graph below.

Query cache effectiveness for the AOL search query logs. The y-axis is percentage of the total volume of queries that’s cached. The x-axis is the number of unique queries in the cache. Bottom line, storing over a million queries in the cache means you can serve over 60% of user queries from the cache.

We’d expect there are diminishing returns in caching more queries and their results. As the queries become less frequent, there’s less benefit in caching their results. There’s no benefit in caching a query that occurs once. By the time you’re caching the millionth query from this set, you’re caching queries that occur only 5 times in 3 months. By the way, there are just over 36 million queries in the log, and about 10 million unique queries when they’re normalized (which I didn’t do a very good job of).

The key point to take away is that if we store only the results for the top 100,000 queries, we can save our search backend from having to evaluate around 34% of all the queries that users pose.

This is a slight exaggeration, since we can’t quite key our search cache on query string alone. Remember that web search users have different language preferences, safe search settings, and so on. All up, the key for our cache probably has around ten parts, but remember that most users likely stick with the defaults, and so the query is the most variable element in the key. I don’t know quite what effect this’d have, but I bet it’s small (say, it reduces the caching effectiveness of the top 100,000 queries from 34% to 32% or so). I expect that the recent Bing and Google pushes into personalization have made caching harder, but I also bet that personalization affects relatively few queries.
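Here is a sketch of a composite cache key, using four hypothetical settings rather than the ten or so a real engine would have. Normalizing the query before building the key lets trivially different strings share one cache entry.

```python
from typing import NamedTuple


class CacheKey(NamedTuple):
    # Hypothetical key parts; a real engine would have around ten.
    query: str
    language: str = "en"
    market: str = "en-US"
    safe_search: str = "moderate"


def make_key(raw_query, **settings):
    """Build a hashable cache key from a normalized query plus user settings."""
    # Lowercase and collapse whitespace so "Red  Dress" and "red dress" match.
    normalized = " ".join(raw_query.lower().split())
    return CacheKey(normalized, **settings)


cache = {make_key("red dress"): ["...results..."]}
print(make_key("  Red  Dress ") in cache)                   # prints True
print(make_key("red dress", safe_search="strict") in cache)  # prints False
```

Because most users keep the defaults, most keys differ only in the query field, which is why the caching estimate above barely moves.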

Storing Cached Data

The key to the cache is the query and around ten other elements including safe search settings, language preference, market, and so on.

What’s stored in the cache are the results of the query with those settings. For web search, the results include the list of matching URLs, the snippets, freshness information (more in a moment), and other elements you need to build the page. You might, for example, store the related searches, or the images, news, or videos that are associated with queries that show more than web search results.

One of the tricks of caching is knowing when to expire what’s in the cache. You don’t want to keep showing results for a query when the results have actually changed; for example, maybe there’s a new snippet, or a URL has changed, or a new result has entered the top ten results, or there’s some breaking news. Here are a few factors I thought of that you could use to expire results in the cache:

  • Historical change rate of the results (record how frequently the results change, and use that data to predict when it’ll change in the future)
  • What data is being displayed (if the results contain, for example, news sites, perhaps you expire the cache entry earlier)
  • Change in query frequency (if users suddenly start typing a query much more frequently, that’s a clue that something has changed)
  • How long the results have been cached (perhaps you have some time limits that ensure everything is refreshed on a cycle)
  • The load on the search engine (if the search engine is under heavy load, don’t expire the cache as aggressively; if it’s under low load, it’s a good time to refresh the cache)

When you expire a result in the cache, you fall back to the search backend to recompute the results for the query, and then you store those results in the cache.
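Here is a minimal sketch of that fall-back loop, modeling only two of the expiry factors listed above: how long an entry has been cached, and the current load on the backend. The class name, the base lifetime, the “up to twice as long under load” rule, and the 0-to-1 load scale are all assumptions for illustration.

```python
import time


class QueryCache:
    """Cache-aside store for search results with age- and load-based expiry."""

    def __init__(self, backend, max_age_s=600.0, clock=time.monotonic):
        self.backend = backend      # function: key -> results (the search backend)
        self.max_age_s = max_age_s  # base lifetime of an entry, in seconds
        self.clock = clock          # injectable so tests can control time
        self.entries = {}           # key -> (results, time stored)

    def _allowed_age(self, load):
        # load is assumed to be in [0, 1]; under heavy load, keep entries up
        # to twice as long to spare the backend.
        return self.max_age_s * (1.0 + load)

    def get(self, key, load=0.0):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self._allowed_age(load):
            return entry[0]                 # fresh enough: serve from cache
        results = self.backend(key)         # missing or expired: recompute
        self.entries[key] = (results, now)  # and store for next time
        return results
```

The other factors (historical change rate, news content, sudden jumps in query frequency) could each shorten or lengthen the allowed age per key in the same place.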

Bottom line, caching is amazingly effective in search. It’s a super hard problem at eBay, given the dynamic nature of the auction and fixed price formats: items sell, bids change, prices change, and so on. We also blend auctions, fixed price items, and products dynamically based on the results — so even the mix of formats is dynamic. We’re excited about making caching work well at eBay, but we’ve so far not hit anywhere near the heights you’d expect from the analysis of AOL’s web search query logs. I’ll explain this more in the future.

You can learn more about the AOL query logs by downloading this paper from Abdur Chowdhury’s website. Here’s the full citation:

G. Pass, A. Chowdhury, C. Torgeson, “A Picture of Search”, The First
International Conference on Scalable Information Systems, Hong Kong, June,

I’ll explain some interesting facts about the AOL query logs in a future post.

The race to build better search: a Reuters article

I spent a couple of hours with Alistair Barr from Reuters discussing search at eBay, and our Project Cassini rewrite of eBay’s search platform. Alistair published the story yesterday, and it’s a good, short read. Thanks, Alistair, for sharing the story with your readers.

Alistair discusses Walmart’s search rewrite (which didn’t take too long by the sounds of it; my recent blog post suggests why), quotes responses from Google’s PR team, and shares insights from my good friend Oren Etzioni, who works at both the University of Washington and the rather awesome shopping decision engine. He mentions Google’s features that obviously do a little light image processing to match color and shape intents in queries such as “red dress” or “v-neck dress” against the content in the images.

Google does do lots of things well, but they’re often not the first to do them — we built that color search into Bing’s image search in 2008 (try this “red dress” query). On a related note, eBay has a rather cool image search feature, which we really should make more prominent in our search experience (mental note: must work on that). Try this “red dress” query, and you’ll see results that use visual image features to find related items.

I’ll be back with part #3 of my Ranking at eBay series soon.

Hiring Engineers

I’ve spent much of the past seven years helping recruit great candidates. I’ve probably interviewed over 1,000 people (wow!). In this post, I thought I’d share some of the experiences and beliefs I’ve built up along the way. Before I start, I should say that Ken Moss and Jim Walsh have shared with me their interviewing philosophies and experiences over the years, and much of what I say below was influenced by them.

Sourcing Candidates

The most successful source of candidates is personal referral. Why? The interview process only estimates whether an engineer is great – we all want the error bars to be small, but there’s only a certain amount of information you can gather in a day of interviews. Having prior knowledge of a candidate is invaluable – if you’ve worked or studied with them, you’re able to decrease the error bars significantly. Moreover, there’s the importance of the human side – if you know someone, and you’ve enjoyed working with them, and the feeling is mutual, you’re already ahead of your competitors and you’ve already given the candidate a reason to come work with you. (Sourcing through a recruiting team is also important, but in my experience the error bars are higher, the success rate is lower, and the engineering team’s knowledge isn’t directly applied to the sourcing problem.)

All up, one of the most important things you can do to help build a great team is make personal referrals. Take an old colleague out to lunch!

Great Interviews

I’m passionate about great interviews. I believe in two things: the candidate must have a great experience, and you must interview for core competencies (and not skills and training). Having a great experience is important: even if a candidate isn’t successful in the interview, they’ll talk to their friends, and spread the word about the interview experience – you want that message to be positive. Hiring for competencies is also important: if you focus only on skills, you may not hire people who can grow and change as your business and technology grows and changes.

What are core competencies? They’re innate traits and abilities, such as integrity, communication skills, and courage and conviction. There’s a company called Lominger that has worked hard on developing a list of the competencies, ways to describe them, and ways to help you assess your proficiency at each one. The only similar list in the public domain that I can find is here.

I believe that all competencies are important, but that there are four that are critical to being a successful engineer at the companies I’ve worked at:

  1. Intellectual horsepower
  2. Problem solving skills
  3. Drive for Results
  4. Action-oriented

Interviews should focus on uncovering the capabilities of the candidate at those four competencies. (Again, I emphasize that other competencies are important – none of us want to work with folks with low integrity, no sense of humor, or no interest in valuing diversity. But I’m much happier approximating the candidate’s skills at those after the interview, rather than making them the focus.)

Intellectual horsepower is basically being smart, and being able to learn and grow when presented with new knowledge. In interviews, I typically measure this by how fast the candidate understands the questions, the types of questions they ask, and how “fast paced” the conversation is. If I learn something from talking to the candidate, where I’m provoked to have a new thought, I’m usually satisfied that the candidate has intellectual horsepower.

We want problem solvers who can solve complex challenges in code. We don’t need software engineers who can only solve an organizational challenge, figure out a clever physics problem, or solve puzzles from NPR’s Car Talk. It’s therefore essential to present computer science problems and ask the candidate to solve them with real code. I’m not a fan of pseudocode, and I tend to ignore the output of interviews that don’t include real, hard problems with coding solutions. Some of my favorite questions for recent college graduates are: reverse a linked list, and write a program to shuffle a deck of cards. (The former either gets a recursive solution or uses a stack; if they get it fast, ask them to try the other solution. The latter is best solved with a single pass through an array, where each element is swapped with a randomly chosen element from the not-yet-shuffled part of the array, which includes the element itself: the Fisher-Yates shuffle.)
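For reference, here is one way to write answers to both questions: a linked-list reversal done recursively and with a stack, and a Fisher-Yates shuffle. The `Node` class and the list-conversion helpers are hypothetical scaffolding for the example, not part of any interview rubric.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def from_list(values):
    """Build a singly linked list from a Python list; returns the head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def reverse_recursive(head):
    """Reverse the rest of the list, then hang the current node off its end."""
    if head is None or head.next is None:
        return head
    new_head = reverse_recursive(head.next)
    head.next.next = head  # the old successor now points back at this node
    head.next = None
    return new_head


def reverse_with_stack(head):
    """Push every node, then pop them off to relink in reverse order."""
    stack = []
    while head is not None:
        stack.append(head)
        head = head.next
    if not stack:
        return None
    new_head = tail = stack.pop()
    while stack:
        tail.next = stack.pop()
        tail = tail.next
    tail.next = None  # the old head is now the last node
    return new_head


def shuffle_deck(deck, rng=random):
    """Fisher-Yates: one pass from the end, swapping each card with a random
    card at or before it (the not-yet-shuffled part of the array)."""
    cards = list(deck)
    for i in range(len(cards) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        cards[i], cards[j] = cards[j], cards[i]
    return cards


print(to_list(reverse_recursive(from_list([1, 2, 3]))))  # prints [3, 2, 1]
```

Swapping with any random position in the whole array, rather than only the not-yet-shuffled part, looks similar but produces a biased shuffle.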

Drive for results means getting things done for a reason, with zeal, and a strong desire to reach a conclusion. People with this competency are “finishers” and will deliver results for the customers and business, and they realize that getting it done is more important than making it perfect. These kinds of people are scrappy, and make the right tradeoffs, and they’re the ones who work the smartest. How do you figure this out? If I’m interviewing college graduates, I ask about their favorite project while they were at college – do they talk about the customer? The impact it had or could have? Was it actually a summer job that shows their passion for results in the real world? Or was it just a technology for technology’s sake? Is it esoteric or applied? Try asking the question, you’ll get an instant feel for what I mean. If it’s someone more experienced, I generally ask about a project or team they’ve enjoyed being on.

Action-oriented means getting started, and being decisive about starting (and often figuring out only what is needed before beginning). These folks are more action and less talk. They’re the ones you will perceive as hard working. This is a hard competency to explicitly understand in an interview – but you can often observe it in their approach to problem solving questions. Do they jump up to the whiteboard, grab a pen, and start solving? Do they make and state assumptions, just so they can get on with it? Or do they endlessly push back and ask questions? Do they criticize you and your question? Do they try and divert the interview somewhere else? Do you have to ask them to get up and use the whiteboard?

When you’re done with an interview, I believe it’s critical to write down the questions you asked and what you learnt. I typically write down what I learnt about the four competencies, and I always begin my writeup with a definitive statement of whether or not I’d hire the candidate. Great decisions are only made after considered thought – and writing it down makes you think, and makes you stand behind a decision. It’s also very useful – others will learn what you ask and how you interpret it, and it’s also a great record for when a candidate applies again at a later date (this happens more frequently than you’d expect). A good interview writeup is several paragraphs in length in my experience.

Great Experiences

We want to win the hiring race. Great candidates will typically interview with multiple companies, and often have multiple offers. When it comes to decision time, they’ll reflect on more than the position you’re offering and the compensation. They’ll think hard about the people they met, the questions they were asked, and how they were treated during the interview process.

Here are some basic tips:

  1. Don’t ask the same question as someone else – understand what’s been asked so far, and show that you know who they’ve already talked to
  2. Read their resume, show interest in them and their experiences. Often, I look for the unique thing and ask about it – “How long have you been playing guitar?” or “How did you enjoy living in London?”
  3. Leave 5 or 10 minutes to answer questions, sell your experience at the company, and give the candidate a chance to use the restroom or get a drink
  4. Show that you’re smart, great at problem solving, action oriented, and driven for results. Be engaged, animated, excited, and passionate about what you’re doing – great folks want to work with great people
  5. If you’re the manager, make sure nothing falls through the cracks. Stay close to recruiting and the candidate, and make sure everything happens in a timely fashion – even when you don’t want to offer a role, make sure the “regret” experience is timely, in person (not an email!), and professional

The bottom line is that it’s like running a retail business. Part of success is having a great customer experience – if you upset someone, trust me, you’ll upset five more through word of mouth. On the flip side, do it well, and you’ll have an enhanced reputation as a great technology company that’s well worth considering as a destination.

Thanks for reading. I hope this helps you hire more great folks!

Ideas and Invention (and the story of Bing’s Image Search)

I was recently told that I am an ideas guy. Probably the best compliment I’ve received. It got me thinking, and I thought I’d share a story.

With two remarkable people, Nick Craswell and Julie Farago, I invented infinite scroll in MSN Search’s image search in 2005. If you use Bing’s image search, you’ll see that there’s only one page of results: as you scroll, more images are loaded, unlike web search, where you have to click on a pagination control to get to page two. Google released infinite scroll in their image search a couple of years ago, and Facebook, Twitter, and others use a similar infinite scroll approach. Image search at Bing was the first to do this.

MSN Search's original image search with infinite scroll

How’d this idea come about? Most good ideas are small, obvious increments based on studying data, not lightning-bolt moments that are abstracted from the current reality. In this case, we began by studying data from image search engines, back when all image search engines had a pagination control and roughly twenty images per page.

Over a pizza lunch, Nick, Julie, and I spent time digging into sessions from users who’d used web and image search. We learnt a couple of things in a few hours. At least, I recall a couple of things – somewhere along the way we invented a thumbnail slider, a cool hover-over feature, and a few other things. But I don’t think that was over pizza.

Back to the story. The first thing we learnt was that users paginate in image search. A lot. In web search, you’ll typically see that for around 75% of queries, users stay on page one of the results; they don’t like pagination. In image search, it’s the opposite: only 43% of queries stay on page one, and it takes until page 8 to reach the 75% threshold.
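To make those pagination numbers concrete, here is a minimal Python sketch of how you might compute them from session logs. It assumes each query has already been reduced to the deepest results page the user reached (1 = the first page); the function name `pagination_profile` and the toy data are hypothetical and are not from our original analysis.

```python
from collections import Counter


def pagination_profile(deepest_pages, threshold=0.75):
    """Summarize how far users paginate.

    deepest_pages: the deepest results page reached for each query (1 = first page).
    Returns (share of queries that stayed on page 1,
             first page by which at least `threshold` of queries had stopped).
    """
    if not deepest_pages:
        raise ValueError("need at least one query")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")

    total = len(deepest_pages)
    counts = Counter(deepest_pages)
    page_one_share = counts[1] / total

    # Walk pages in order, accumulating the fraction of queries that
    # ended on or before each page, until the threshold is crossed.
    cumulative = 0
    for page in sorted(counts):
        cumulative += counts[page]
        if cumulative / total >= threshold:
            return page_one_share, page

    # Unreachable: the cumulative share reaches 1.0 on the last page.
    raise AssertionError("threshold not reached")


if __name__ == "__main__":
    # Hypothetical toy data: deepest page reached for eight queries.
    toy_sessions = [1, 1, 1, 2, 5, 9, 1, 3]
    print(pagination_profile(toy_sessions))
```

With the toy list, half the queries stay on page one and the 75% threshold is reached on page 3, so it prints `(0.5, 3)`. Running the same function over real web and image search logs is how you would see the 75%-on-page-one versus page-8 contrast.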

Second, we learnt that users inspect large numbers of images before they click on a result. Nick remembers finding a session where a user went to page twenty-something before clicking on an image of a chocolate cake. That was a pretty wow moment – you don’t see that patience in web search (though, as it turns out, we do see it at eBay).

If you had been there, having pizza with us, perhaps you would have invented infinite scroll. It’s an obvious step forward when you know that users are suffering through clicking on pagination for the bulk of their queries, and that they want to consume many images before they click. Well, perhaps the simplest invention would have been more than 20 images per page (say, 100 images per page) – but it’s a small, logical leap from there to “infinity”. (While we called it “infinite scroll”, the limit was 1,000 images before you hit the bottom.) It later became publicly known as “smart scroll”.
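As a rough illustration of how infinite scroll replaces pagination, the sketch below models the batches a client might request as the user keeps scrolling, stopping at the 1,000-image cap. The batch size of 20 and the `scroll_batches` name are assumptions for illustration only; this is not how MSN Search actually implemented it.

```python
MAX_RESULTS = 1000  # cap described in the post: scrolling stops after 1,000 images
BATCH_SIZE = 20     # hypothetical: images fetched each time the user nears the bottom


def scroll_batches(total_available, batch_size=BATCH_SIZE, cap=MAX_RESULTS):
    """Yield (start, end) result ranges a client would fetch as the user keeps
    scrolling, stopping at whichever comes first: the end of the results or the cap."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    limit = min(total_available, cap)
    start = 0
    while start < limit:
        end = min(start + batch_size, limit)
        yield start, end
        start = end


if __name__ == "__main__":
    print(list(scroll_batches(50)))
    batches = list(scroll_batches(5000))
    print(len(batches), batches[-1])
```

For example, `list(scroll_batches(50))` yields `[(0, 20), (20, 40), (40, 50)]`, while a query with 5,000 matching images stops after 50 batches, ending at `(980, 1000)`. The user never clicks a pagination control; each batch is fetched as they approach the bottom of what has already loaded.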

To get from inspiration to implementation, we went through many incarnations of the scroll bar, looking for ways to help users understand where they were in the result set (that’s the problem with infinite scroll – infinity is hard to navigate). In the end, the scroll bar looked rather unremarkable – but watch what it does as you scroll down aggressively. It’s intuitive, but it wasn’t obvious from the original idea that this is how the scroll bar should work.

This idea is an incremental one. Like many others, it was created by understanding the customer through data, figuring out what problem customers are trying to solve, and having a simple idea that helps. It’s also about being able to let go of ideas that don’t work or that clutter the experience – my advice is not to hang onto a boat anchor for too long. (We had an idea for a kind of scratchpad where users could save their images. It was later dropped from the product.)

The Live Search scratchpad. No longer in Bing's Image Search

I’m still an ideas guy. That’s what fires me up. By the way, Nick’s still at Microsoft. Julie’s at Google. And I am at eBay.

(Here’s the patent document, and here’s a presentation I made about image search during my time at Microsoft. All of the details in this blog post are taken from the presentation or patent document.)