Monthly Archives: April 2012

Ranking at eBay (Part #2)

In part 1 of Ranking at eBay, I explained what makes the eBay search problem different to other online search problems. I also explained why there’s a certain kinship with Twitter, the only other engine that deals with the same kinds of challenges that eBay does. To sum it up, eBay’s search problem is different because our items aren’t around for very long, the information about the items changes very quickly, and we have over 300 million items, the majority of which are not products like you’d find on major commerce web sites such as Walmart or Amazon.

In this post, I explain how we think about using data in the eBay ranking problem. In the next post, I’ll explain how we combine all of that data to compute our Best Match function, and how it’s all coming together in a world where we are rebuilding search at eBay.

Ranking Factors at eBay

Let’s imagine that you and I work together and run the search science team at eBay. Part of our role is to help make sure that the items and products that are returned when a customer runs a query are ordered correctly. Correctly means that the most relevant item to the customer’s information need is in the first position in our search results, the next most relevant is in the second position, and so on.

What does relevant mean? In eBay’s case, you could abstract it to say that the item is great value from a trusted seller, it matches the intent of the query, and it’s something that buyers want to buy. For example, if the customer queries for a polaroid camera, our best result might be a great, used, vintage Polaroid camera in excellent condition. Of course, it’s subjective: you could argue it should be a new-generation Polaroid camera, or make some other plausible argument. In a general sense, relevance is approximated by computing some measure of statistical similarity. Search engines obviously can’t read a user’s mind, so they compute information to score how similar an item is to a query, and add any other query-independent information that can help. (In a future post, I’ll come back and explain how we understand whether we’ve got it right, and work to understand what the underlying intent is behind a query.)

Let’s agree for now that we want to order results from most- to least-relevant to a query when the user is using our default Best Match sorting feature. So, how do we do that? The key is having information about what we’re ranking, and I’ll argue that the more varied information we have, the better job we can do. Let’s start simply: suppose we have only one data source, the title of the item. Below is an item, and you can see its title at the top: “NICE Older POLAROID 600 Land Camera SUN AUTO FOCUS 660”.

A Polaroid Camera on eBay. Notice the title of the item, "NICE Older POLAROID 600 Land Camera SUN AUTO FOCUS 660"

Let’s think about the factors we can use from the item title to help us order results in a likely relevant way:

  • Does the title contain the query words? The rationale for proposing this factor is pretty simple: if the words are in the title, the item is more relevant than an item that doesn’t contain the words.
  • How frequently are the query words repeated in the title? The rationale is: the more the words are repeated, the more likely that item is to be on the topic of the query, and so the more relevant the item.
  • How rare are each of the query words that match in the title? The rationale is that rarer words across all of the items at eBay are better discriminators between relevant and irrelevant items; in this example, we’d argue that items containing the rarer word polaroid are probably more likely to be relevant than items containing the less rare word camera.
  • How near are the query words to the beginning of the title? The argument is that items with query words near the beginning of the title are likely more relevant than those containing the query words later in the title, with the rationale that the key topic of the item is likely mentioned first or early in the title. Consider two examples to illustrate: “Polaroid land camera 420 1970s issued still in nice shape retro funk” and “PX 100 Silver Shade Impossible Project Film for Polaroid SX-70 Camera”. (The former is a camera; the latter is film for a camera.)
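To make the four title factors concrete, here’s a minimal Python sketch. It is an illustration under stated assumptions, not eBay’s implementation: the function name `title_factors`, the whitespace tokenizer, the log-based rarity score (inverse document frequency), and the document frequency counts are all hypothetical.

```python
import math
from collections import Counter


def title_factors(query, title, doc_freq, num_items):
    """Compute four illustrative title factors for one item.

    doc_freq maps a lowercase word to the number of items whose titles
    contain it, and num_items is the collection size (both hypothetical).
    """
    q_words = query.lower().split()
    t_words = title.lower().split()
    counts = Counter(t_words)

    # Factor 1: fraction of query words that appear in the title.
    present = [w for w in q_words if counts[w] > 0]
    contains = len(present) / len(q_words)

    # Factor 2: how many times query words occur in the title.
    frequency = sum(counts[w] for w in q_words)

    # Factor 3: rarity of the matching words, as summed inverse document frequency.
    rarity = sum(math.log(num_items / doc_freq.get(w, 1)) for w in present)

    # Factor 4: 1-based position of the earliest query word (None if absent).
    positions = [i + 1 for i, w in enumerate(t_words) if w in q_words]
    first_position = min(positions) if positions else None

    return {"contains": contains, "frequency": frequency,
            "rarity": rarity, "first_position": first_position}


# Hypothetical statistics: "polaroid" is much rarer than "camera".
DOC_FREQ = {"polaroid": 1_000, "camera": 50_000}
NUM_ITEMS = 1_000_000

camera_item = title_factors(
    "polaroid camera",
    "Polaroid land camera 420 1970s issued still in nice shape retro funk",
    DOC_FREQ, NUM_ITEMS)
film_item = title_factors(
    "polaroid camera",
    "PX 100 Silver Shade Impossible Project Film for Polaroid SX-70 Camera",
    DOC_FREQ, NUM_ITEMS)
print(camera_item)
print(film_item)
```

Both example titles contain both query words once, so only the position factor separates them: the camera listing matches at position 1, the film listing at position 9.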

Before I move on, let me just say that these are example factors. I am not sharing that we do or don’t use these factors in ranking at eBay. What I’m illustrating is that you and I can successfully, rationally think about factors we might try in Best Match that might help separate relevant items from irrelevant items. And, overall, when we combine these factors in some way, we should be able to produce a complete ordering of eBay’s results from most- to least-relevant to the query.

So far, I’ve given you narrow examples of text factors from the title. There are many other text factors we could use: factors from the longer item description, category information, text that’s automatically painted onto the item by our algorithms at listing time, and more. If we worked through these methodically, we could together write down factors that we thought might intuitively help us rank items better. At the end of the process, I’m guessing we’d have written down tens of factors for the text alone that we have at eBay.

You can see my argument coming together: if you used just one or two of these factors, you might do a good, basic job of ranking items. But if you use more information, you’ll do better. You’ll be able to more effectively discern differences between items, and you’ll do a better job of ranking the items. Net, the more (new, different, and useful) information you have, the better.

What’s key here is that we need different factors, and we need factors that actually do the right thing. There are some simple ways we can test our intuition about a factor before we use it. For example, we could ask a simple question: do users buy more of the items that have this factor than of those that don’t? In practice, there are much more sophisticated things we can do to validate a factor before we decide to actually build it into search (and I’ll leave that discussion to another time).
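That simple validation question can be sketched as comparing purchase rates for impressions of items with and without a candidate factor. This is a hypothetical sketch: the `Impression` record and the `purchase_rates` function are assumptions, and the counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    has_factor: bool  # did the item shown have the candidate factor?
    purchased: bool   # did the user go on to buy it?


def purchase_rates(impressions):
    """Return (rate_with_factor, rate_without_factor), purchases per impression."""
    purchases = {True: 0, False: 0}
    shown = {True: 0, False: 0}
    for imp in impressions:
        shown[imp.has_factor] += 1
        purchases[imp.has_factor] += int(imp.purchased)

    def rate(key):
        return purchases[key] / shown[key] if shown[key] else 0.0

    return rate(True), rate(False)


# Made-up log: 100 impressions of items with the factor, 100 without.
log = ([Impression(True, True)] * 3 + [Impression(True, False)] * 97
       + [Impression(False, True)] * 1 + [Impression(False, False)] * 99)
with_factor, without_factor = purchase_rates(log)
print(f"with factor: {with_factor:.1%}, without: {without_factor:.1%}")
```

A naive comparison like this is easily confounded (for example, items with the factor might also tend to be shown higher on the page), which is one reason more sophisticated validation is needed before a factor goes into search.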

The Factor Buckets

I believe in a five-bucket framework of factors to build our eBay Best Match ranking function:

  1. Text factors (discussed above)
  2. Image factors
  3. Seller factors
  4. Buyer factors
  5. Behavioral factors

Pictures or images are an important part of the items and products at eBay. Images are therefore an interesting possible source of ranking factors. For example, we know that users prefer pictures where the background is a single color, that is, where the object of interest is easily distinguished from the background.
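As an illustration of how such an image factor might be computed, here’s a rough Python sketch that scores how uniform an image’s border is, as a proxy for a single-color background. The heuristic, the `border_uniformity` name, and the `tolerance` parameter are assumptions for illustration, not a description of eBay’s image processing; a real system would decode images with an imaging library rather than take lists of RGB tuples.

```python
from collections import Counter


def border_uniformity(pixels, tolerance=16):
    """Fraction of border pixels within `tolerance` (per RGB channel) of the
    most common border color. Values near 1.0 suggest a plain background.

    pixels is a non-empty list of equal-length rows of (r, g, b) tuples.
    """
    h, w = len(pixels), len(pixels[0])
    border = [pixels[y][x] for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    dominant, _ = Counter(border).most_common(1)[0]

    def close(p):
        return all(abs(a - b) <= tolerance for a, b in zip(p, dominant))

    return sum(close(p) for p in border) / len(border)


WHITE, RED = (255, 255, 255), (200, 0, 0)

# A red object on a plain white background (5x5 pixels).
plain = [[WHITE] * 5 for _ in range(5)]
plain[2][2] = RED

# A busy 3x3 image whose border alternates between white and red.
busy = [[WHITE, RED, WHITE],
        [RED, RED, RED],
        [WHITE, RED, WHITE]]

print(border_uniformity(plain), border_uniformity(busy))
```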

The seller is an important part of the buyer’s decision to purchase. You can likely think of many factors that we could include in search: how long have they been selling? How’s their feedback? Do they ship on time? Are they a trusted seller?

Buyer factors are an interesting bucket. If you think about the buyer, there are many potential factors you might want to explore. Do they always buy fixed price items? What are the categories they buy in? What’s the shoe size they keep asking for in their queries? Do they buy internationally?

Behavioral factors are also an exciting bucket. Here are a few examples we could work on: does this item get clicks from buyers for this query? What’s the watch count on the item? How many bids does the auction have? How many sales have there been of this fixed price item, given how many times it’s been shown to users? If you want to dig deeper into this bucket, Mike Mathieson wrote a super blog post on part of our behavioral factor journey.
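The last question above (sales, given how many times an item has been shown) is essentially a sales-per-impression rate. A raw rate is noisy for new items, so one common option, swapped in here purely as an illustration, is to shrink it toward a prior using a Beta prior (additive smoothing). The function name and the prior values are hypothetical.

```python
def smoothed_sell_through(sales, impressions, prior_rate=0.01, prior_weight=100):
    """Sales per impression, shrunk toward prior_rate.

    This is the posterior mean under a Beta(prior_rate * prior_weight,
    (1 - prior_rate) * prior_weight) prior; both defaults are illustrative.
    """
    if sales < 0 or impressions < sales:
        raise ValueError("expected 0 <= sales <= impressions")
    return (sales + prior_rate * prior_weight) / (impressions + prior_weight)


new_item = smoothed_sell_through(sales=1, impressions=2)             # raw rate 0.5
established = smoothed_sell_through(sales=500, impressions=10_000)   # raw rate 0.05
print(f"new: {new_item:.3f}, established: {established:.3f}")
```

With these made-up numbers, the item with 1 sale from 2 impressions scores about 0.02, while the one with 500 sales from 10,000 impressions scores about 0.05, so a single lucky sale doesn’t outrank a long track record.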

Where we are on the factors journey

We formed our search science team in late 2009, when Mike Mathieson joined our team. We’ve grown the team from Mike alone to tens of folks in the past couple of years, and we’re on a journey to make search awesome at eBay. Indeed, if you want to join the team and have an awesome engineering or applied science background, you can always reach out to me.

Right now, we use several text factors in Best Match, we have released a few seller factors and behavioral factors, and we have begun working on image and buyer factors. All up, we have tens of factors in our Best Match ranking function. You might ask: all of these factors seem like they’d be useful, so why haven’t you done more? There are a few good reasons:

  1. Our current search engine doesn’t make it easy to flexibly combine factors in ranking. (That’s one good reason why we’re rewriting search at eBay.)
  2. It takes engineering time to develop a factor and make it available at query time for the search ranking process. In many cases, factors are extremely complex engineering projects. For example, imagine how hard it is to process images and extract factors when there are 10 million new items per day (and most items have more than one image), and you’re working hard to get additions to the index complete within 90 seconds. Or imagine how challenging it is to make real-time behavioral factors available in a multi-thousand computer search grid within a few seconds. (If you’ve read Part #1 of this series, you’ll appreciate just how real-time search is at eBay.)
  3. Experimentation takes time. Intuition is the easy part; building the factor, combining it with other factors, testing the new ranking function with users, and iterating and improving all take time. I’ll talk more about experimentation and testing in my next post.

In the third and final post in this series, I’ll explain more about how we combine factors and give you some insights into where we are on the search journey at eBay. Thanks for reading.

Fighting fit: Why you need to be in top shape to be a leader

People are surprised I lead a team of over 700 people and find time to stay in shape. For me, one isn’t possible without the other. And my advice to you is to take your physical wellbeing seriously if you want to have impact over the long haul.

I believed for a long time that my impact at work was simply the product of the quantity of time and the quality of how I used it. Quantity just means hours spent. Quality means what I spend those hours doing, that is, how effectively I use my time. I’ve never met a successful person who doesn’t work hard and use their time effectively. And for you that means: work hard and smart, and you’ll have the basic ingredients for success.

But it turns out for me that this basic equation doesn’t work for the long haul. There are two other ingredients for me: physical and mental condition. If my physical condition is great (I am fighting fit), then I’m alert, less stressed, positive, less prone to illness, confident, balanced, and slower to burn out. Being mentally in top shape is critical too, particularly making sure I find meaning in what I’m doing, and getting the balance right between family and work time (a topic for another time). So, these days I’d argue that my impact at work is something like quantity times quality times physical condition times mental condition (with some constants that I don’t yet understand). In this post, I’m going to tell you why you should stay fighting fit too.

This is an entirely non-technical post from a primarily technical person. Take it with a grain of salt, and see your doctor before you take any of my advice.

Top 10 Tips for being Fighting Fit ™

Let me just cut to the chase, and tell you the top ten things that you can use to be fighting fit:

  1. Don’t eat wheat. Better still, don’t eat grains
  2. Avoid high sugar foods. If it has more than 10g of sugar per 100g of product, don’t eat it
  3. Drink lots of water. Aim for at least 96oz or 3 litres per day
  4. Get a decent night’s sleep. It feels to me like 8+ hours is the sweet spot
  5. Have a big breakfast
  6. Have a small dinner
  7. Work the big muscles with resistance training three times per week
  8. Stretch
  9. Do cardiovascular exercise
  10. Ignore the above nine items for just one day each week (and be perfect the other six)

That’s in priority order. The top six are all about nourishment, the next three are about fitness, and the last one is a rule that governs how to apply the others.

To be fighting fit, it’s 70% nutrition and 30% exercise. I’ve worked incredibly hard at exercise in the past and still weighed 15 pounds more than I do today. These days, I’m pretty much at my high school weight and stronger than I’ve ever been, and the difference is nutrition (and perhaps more focus on strength or resistance training).

Nutrition

Do you want to be 15 pounds lighter? Follow rule #1 and you’ll be well on your way. Don’t eat grains because they’re full of carbohydrates, and that causes insulin to spike, and the body to enthusiastically store carbohydrates as body fat. Same with high sugar foods like sodas. Instead, eat more protein, and healthy fats. I’m big on egg whites, nuts, avocado, meats, and so on. Try a salad for lunch, with plenty of chicken, turkey, or tuna.

Fats don’t make you fat. Fats are just an intense source of energy, so you need to avoid eating too much of them. Eat nuts, avocado, egg yolks, and other healthy fats in moderation. Carbohydrates are the real problem.

Drink lots of water to keep yourself hydrated, and your metabolism running efficiently. Everything I read says drinking lots of water is a good idea.

I eat a massive breakfast and try to go easy at dinner (though I struggle to do that effectively). The rationale is that in the morning, I need energy to get through the day. In the evening, I’m going to bed, so there’s no sense in consuming a ton of calories. Try to tilt your plan in that direction.

If that’s all too hard, follow rule #1: don’t eat wheat. You’ll get somewhere, trust me.

Exercise

Exercising is my passion. I hit the gym four or five days a week, run a couple of times per week, do yoga once a week, and add in some exercise at home (like mountain biking, boxing, jump rope, or agility work) on the weekends. It just plain makes me feel great, lowers my stress, and gives me space and time to think about ideas and problems that are important in life and work.

How do I fit all that in? Pretty simply, really: I just make it my number one priority. When I was at Microsoft, my motto was “I’m not canceling the gym for anyone except Bill Gates”. And I stuck to it and still do. My rationale is that the company needs me to be effective for the long haul, and this is what makes me effective. I’m happy to be at work any time I’m not in the gym.

I’ve learnt that to be fighting fit, you need to do more strength training and less cardiovascular exercise. The nice thing about strength training is that you burn some calories while you’re in the gym, and then a lot more afterwards: your body is busy repairing and growing the muscles you’ve worked. Most cardio burns more calories per minute than strength training while you’re doing it, but the burn stops afterwards. Focus on your big muscles: legs and butt, chest, core, and the muscles that help you maintain a reasonable posture (given you likely sit around a lot in front of computers). Working those muscles burns more calories than working the ones you see in the mirror (you can skip the biceps). Get a personal trainer, ask them to put together a strength training routine, and do it 2 or 3 times per week. The results will amaze you.

It turns out that exercising hard requires maintenance. Maintenance for me is stretching, and I use yoga as the key way to do that. Yoga is seriously hard work: it requires core strength, balance, and flexibility. I’m not good at it, but it’s helping me be flexible and loose, and that helps me stay fighting fit.

I like cardio: I love going for a run (something I’ve been doing regularly since 1995), and I also love riding my bike. So, I get out and do some. But strength training is the key: if you don’t have much time, skip the cardio and go do some strength training.

Cheating

I try hard to be good for six days in every seven. I have no trouble doing that with exercise. But with food it’s harder. One day a week, I let loose. I do whatever I want, and that gives me willpower for the rest of the week.

This is really important for you: cheat every day and you will get nowhere. If you want to be fighting fit, be disciplined six days out of seven.

Final Thoughts

Personal training is a great investment. I’d recommend that you get a personal trainer: it makes strength training safe and challenging, and helps you learn how to make yourself fighting fit. Getting some nutritional advice from a nutritionist is a great idea too. Diets are the worst thing in the world; it’s far smarter to eat to a plan and enjoy the results.

So that’s my Fighting Fit plan to make you an effective leader for the long haul. Remember the basics: don’t eat wheat, avoid high sugar foods, get in some strength training 2 or 3 times per week, and cheat once per week. You will be a fighting fit machine in no time (and I look forward to hearing about your results).

Please don’t blindly copy my plan. Please talk to your doctor, fitness professional, or nutritionist. And remember that I am a computer scientist, so you should Read My Disclaimer.

An Afterword of Thanks

My trainer is David Macchi in the eBay gym. Dave’s awesome: he’s taught me hundreds of exercises, and got me working on muscles that help posture and keep me balanced. He’s also good on the nutrition tips, and pushes me that little bit harder than I’d push myself. We’ve also partnered together on programs to help get our technology team at eBay more active, and help charity at the same time. I’m working hard to spread the fighting fit message.

Cheat day, and focusing harder on nutrition, is a strategy I learnt by participating in a “12 week challenge” with the I Choose Awesome guys in Inverloch, Australia. Great guys, and I owe them a bunch of thanks for helping me explore more about being fighting fit. They also taught me some sayings: “Nothing tastes as good as lean feels” and “Pain is just weakness leaving the body”. You might need those sayings.

My trainer when I lived in Redmond, Washington, and worked at Microsoft, was Dirk Huebner. Dirk got me excited about agility drills, Fartlek training, and medicine balls. Another great guy to know.

Ranking at eBay (Part #1)

Search ranking is the science of ordering search results from most- to least-relevant in response to user queries. In the case of eBay, the dominant user need is to find a great deal on something they want to purchase. And eBay search’s goal is to do a great job of finding relevant results in response to those customer needs.

eBay is amazingly dynamic. Around 10% of the 300+ million items for sale end each day (sell or end unsold), and a new 10% is listed. A large fraction of items have updates: they get bids, prices change, sellers revise descriptions, buyers watch, buyers offer, buyers ask questions, and so on. We process tens of millions of change events on items in a typical day, that is, our search engine receives that many signals that something important has changed about an item that should be used in the search ranking process. And all that is happening while we process around 250 million queries on a typical day.
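To put those daily volumes in per-second terms, here’s some back-of-the-envelope arithmetic in Python. It assumes exactly 300 million live items, exactly 10% churn, and 250 million queries spread evenly across the day; real traffic isn’t evenly spread, so peaks will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

live_items = 300_000_000               # "300+ million", rounded down
churn_percent = 10                     # ~10% end, and ~10% are listed, daily
queries_per_day = 250_000_000          # "around 250 million"

ending_per_day = live_items * churn_percent // 100
new_listings_per_second = ending_per_day / SECONDS_PER_DAY
queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"{ending_per_day:,} items end (and about as many are listed) per day")
print(f"~{new_listings_per_second:.0f} new listings per second")
print(f"~{queries_per_second:,.0f} queries per second")
```

That’s roughly 350 new listings (and as many endings) and about 2,900 queries every second, on average, on top of the stream of change events.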

In this post, I explain what makes eBay’s search ranking problem unique and complex. I’m aiming here to give you a sense of why we’ve built a custom search engine, and the types of technical search ranking challenges we’re dealing with as we rebuild search at eBay. Next week, I’ll continue this post and offer a few insights into how we’re working on the problem.

What’s different about eBay

Here are a few significantly different facets of eBay’s search problem space:

  1. Under typical load, it takes around 90 seconds from when an eBay seller lists an item to when it can be found using the search engine. The same is true for any change that affects eBay’s search ranking: for example, if the number of sales of a fixed price multi-quantity item changes, it’s about 90 seconds until that count is updated in our index and can be used in search ranking. Even to an insider, that’s pretty impressive; there’s probably no other search engine that handles inserts, updates, and deletes at the scale and speed that eBay does. (I’ll explain real-time index updates in detail in a future post, but here’s a paper on the topic if you’d like to know more now.)
  2. In web search, there are many stable signals. Most documents persist, and they don’t change very much. The link graph between documents on the web is reasonably stable; for example, my home page will always link to my blog, and my blog posts have links embedded in them that persist and lead to places on the web. All of this means that a web search engine can compute information about documents and their relationships, and use that as a strong signal in ranking. The same isn’t true of an auction item at eBay (auctions are live for between 1 and 7 days), and it’s less true of a fixed price item (many of which are live for only 30 days): the link graph isn’t very valuable, and static pages aren’t common at eBay.
  3. eBay is an ecosystem, and not a search-and-leave search engine. The most important problem that web search engines solve is getting you somewhere else on the web — you run a query, you click on a link and you’re gone. eBay’s different: you run a query, you click on a link, and you’re typically still at eBay and interacting with a product, item, or hub page on eBay. This means that at eBay we know much more than at a web search engine: we know what our users are doing before and after they search, and have a much richer data set to draw from to build search ranking algorithms.
  4. Web search is largely unstructured. It’s mostly about searching blobs of text that form documents, and finding the highest precision matches. eBay certainly has plenty of text in its items and products, but there’s much more structure in the associated information. For example, items are listed in categories, and categories have a hierarchy. We also “paint” information on items as they’re listed in the form of attribute:value pairs; for example, if you list a men’s shirt, we might paint on the item that it is color:green, size:small, and brand:american apparel. We also often know which product an item is: this is more often the case for listings of books, DVDs, popular electronics, and motors. Net, eBay search isn’t just about matching text to blobs of text; it’s about matching text or preferences to structured information.
  5. Anyone can author a web document or create a web site, and it’ll happily be crawled by a search engine, perhaps indexed (depending on what the engine decides to put in its index), and perhaps made available to be found. At eBay, sellers create listings (and sometimes products), and everything is always searchable (usually within 90 seconds under typical conditions). And we know much more about our sellers than a web search engine knows about its page authors.
  6. We also know a lot about our buyers. A good fraction of the customers that search at eBay are logged in, or have cookies in their browser that identify them. Companies like Google and Microsoft also customize their search for their users when they are logged in (arguably, they do a pretty bad job of it — perhaps a post for another time too). The difference between web search and eBay is that we have information about our buyers’ purchase history, preferred categories, preferred buying formats, preferred sellers, what they’re watching, bidding on, and much more
  7. Almost every item and product has an image, and images play a key role in making purchase decisions (particularly for non-commodity products). We present images in our search results
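Point 1 is the one with the most engineering behind it. As a toy illustration only (this is not eBay’s engine, and the class and method names are hypothetical), here is an in-memory inverted index in Python where inserts, updates, and deletes are visible to the very next query. An update is modeled as a delete followed by an insert, which keeps the postings consistent.

```python
from collections import defaultdict


class LiveIndex:
    """Toy inverted index: every change is searchable immediately."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of item ids
        self.docs = {}                     # item id -> set of terms

    @staticmethod
    def _terms(text):
        return set(text.lower().split())

    def upsert(self, item_id, text):
        """Insert a new item, or replace an existing item's text."""
        self.delete(item_id)               # an update is delete + insert
        terms = self._terms(text)
        self.docs[item_id] = terms
        for t in terms:
            self.postings[t].add(item_id)

    def delete(self, item_id):
        """Remove an item (for example, when its listing ends)."""
        for t in self.docs.pop(item_id, set()):
            self.postings[t].discard(item_id)
            if not self.postings[t]:
                del self.postings[t]

    def search(self, query):
        """Return ids of items containing every query word (boolean AND)."""
        terms = self._terms(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


idx = LiveIndex()
idx.upsert(1, "Polaroid 600 land camera")
idx.upsert(2, "Polaroid SX-70 film")
before = idx.search("polaroid")
idx.upsert(2, "Impossible Project film")   # the seller revises the title
after_revise = idx.search("polaroid")
idx.delete(1)                              # the listing ends
after_end = idx.search("polaroid")
print(before, after_revise, after_end)
```

A production engine has to do the same thing across a multi-thousand computer grid, with persistence and ranking signals attached to every posting, which is where the real difficulty lies.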

There are more differences and challenges than these, but my goal here is to give you a taste, not an exhaustive list.

Who has similar problems?

Twitter is probably the closest analog technically to eBay:

  • They make use of changing signals in their ranking and so have to update their search indexes in near real-time too. But it’s not possible to edit a tweet and they don’t yet use clicks in ranking, so that means there’s probably much less updating going on than at eBay
  • Twitter explains that tweet rates go from 2,000 per second to 6,000 to 8,000 when there is a major event. eBay tends to have signals that change very quickly for a single item as it gets very close to ending (perhaps that’s similar to retweet characteristics). In both cases, signals about individual items are important in ranking those items, and those signals change quickly (whether they’re tweets or eBay items)
  • Twitter is largely an ecosystem like eBay (though many tweets contain links to external web sites)
  • Twitter makes everything searchable like eBay, though they typically truncate the result list and return only the top matches (with a link to see all matches). eBay shows you all the matches by default (you can argue whether or not we should)
  • Twitter doesn’t really have structured data in the sense that eBay does
  • Twitter isn’t as media rich as eBay
  • Twitter probably knows much less about their users’ buying and selling behaviors

(Thanks to Twitter engineering manager Krishna Gade for the links.)

Large commerce search engines (Amazon, Best Buy, Walmart, and so on) bear similarities too: they are ecosystems, they have structure, they know about their buyers, they have imagery, and they probably search everything. The significant differences are that they mostly sell products, with very few unique items, and they have vastly fewer sellers. They are also typically dominated by multi-quantity items (for example, a thousand copies of a book). The implication is that there is likely vastly less data to search, almost no index update issues by comparison, much less inventory that ends, much less diversity, and likely far fewer changing signals about the things they sell. That makes the technical search challenge vastly different; on the surface it seems simpler than eBay’s, though there are likely challenges I don’t fully appreciate.

Next week, I’ll continue this post by explaining how we think about ranking at eBay, and explain the framework we use for innovation in search.

Clicks in search

Have you heard of the Pareto principle? It’s the idea that 80% of sales come from 20% of customers, or that the richest 20% of people control 80% of the world’s wealth.

How about George K. Zipf? The author of “Human Behavior and the Principle of Least Effort” and “The Psycho-Biology of Language” is best known for “Zipf’s Law”, the observation that the frequency of a word is inversely proportional to its rank in the frequency table. Oversimplifying a little, the word “the” is about twice as frequent as the word “of”, and then comes “and”, and so on. The same pattern applies to the populations of cities, corporation sizes, and many more natural phenomena.
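In its usual mathematical form (a standard statement of Zipf’s law, with s the exponent and C a constant that depends on the collection), the frequency of the word at rank r is:

```latex
f(r) = \frac{C}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
\frac{f(1)}{f(2)} = 2^{s} \approx 2, \qquad \frac{f(1)}{f(3)} = 3^{s} \approx 3
```

With s close to 1, the top-ranked word is about twice as frequent as the second and about three times as frequent as the third, which is the “the” versus “of” pattern above.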

I’ve spent time understanding how Zipf’s work applies to search engines, and publishing on it. The punchline in search is that the Pareto principle and Zipf’s Law are hard at work: the first item in a list gets about twice as many clicks as the second, and so on. There are inverse power law distributions everywhere.

The eBay Search Results Click Curve

Here’s the eBay search results click curve, averaged over a very large number of queries. The y-axis is the total number of clicks on each result position, and the x-axis is the result position. For example, you can see that the first result in search (the top result, the first item you see when you run a query) gets about eight times as many clicks on average as the fifteenth result. The x-axis is labelled from 1 to 200, which is typically four pages of eBay results since we show 50 results per page by default.

eBay Click Curve. The y-axis is number of clicks per result, and the x-axis is the result position.

As a search guy, I’m not surprised by this curve (more on that topic later). It’s a typical inverse power law distribution (a “Zipf’s Law” distribution). But there are a couple of interesting quirks.

Take a look at the little bump around result position 50 on the x-axis. Why’s that there? What’s happening is that after scrolling for a while through the results, many users scroll to the very bottom of the page. They then inspect the final few results on the page (results 46 to 50), just above the pagination control. Those final few results therefore get a few more clicks than the ones above that the user skipped. Again, this isn’t a surprise to me — you’ll often see little spikes after user scroll points (in web search, you’ll typically see a spike in result 6 or 7 on a 10-result page).

I’ve blown up the first ten positions a little more so that you can see the inverse power law distribution.

Search click curve for the first 10 results in eBay search.

You can see that result 1 gets about 4 times as many clicks as result 10. You can also see that result 2 gets about 5/9ths as many clicks as result 1. This is pretty typical: it’s what you’d expect to see when search is working properly.
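As a worked example, suppose the click curve were a pure power law, c(r) = C r^(-s). The observation that result 1 gets about 4 times as many clicks as result 10 then pins down the exponent, which in turn predicts the ratio between results 2 and 1:

```latex
\frac{c(1)}{c(10)} = 10^{s} \approx 4
\;\Longrightarrow\; s = \log_{10} 4 \approx 0.60,
\qquad
\frac{c(2)}{c(1)} = 2^{-s} \approx 2^{-0.60} \approx 0.66
```

The prediction of about 0.66 is in the same ballpark as the observed 5/9 (about 0.56), a reminder that a single exponent approximates the real curve rather than fitting it exactly.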

Interestingly, even if you randomize the first few results, you’ll still see a click curve that has an inverse power law distribution. Result 1 will almost always get more clicks than result 2, regardless of whether it’s less relevant.
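One way to see why randomized results still produce a decaying click curve is a position-based click model, used here as an assumed simplification: a click at position r requires the user to examine that position (a probability that falls with r) and to find the item shown there appealing. Averaging over every possible ordering of the items, each position sees the same average appeal, so expected clicks simply follow the examination curve. The appeal and examination numbers below are hypothetical.

```python
from itertools import permutations


def expected_clicks(relevance, examine):
    """Expected clicks per position when items are shown in a uniformly random
    order, under an assumed position-based model:
    P(click at position r) = examine[r] * relevance[item shown at r].
    """
    n = len(relevance)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        for pos, item in enumerate(order):
            totals[pos] += examine[pos] * relevance[item]
    return [t / len(orders) for t in totals]


relevance = [0.9, 0.2, 0.6, 0.4]   # hypothetical appeal of each item
examine = [1.0, 0.55, 0.4, 0.3]    # hypothetical chance each position is looked at
curve = expected_clicks(relevance, examine)
print([round(c, 4) for c in curve])
```

Every position gets the same average item appeal (0.525 here), so the curve’s decline comes entirely from the hypothetical examination probabilities.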

Click Curves Are Everywhere

Here are some other examples of inverse power law distributions that you’ll typically see in search:

  • The query curve. The most popular query is much more popular than the second most popular query, and so on. The top 20% of queries account for at least 80% of the searches. That’s why caching works in search: most search engines serve more than 70% of their results from a cache
  • The document access curve. Because queries are skewed in distribution, and so are the clicks per result position, it’s probably not surprising that a few documents (or items or objects) are accessed much more frequently than others. As a rule of thumb, you’ll typically find that 80% of document accesses go to 20% of the documents. Pareto at work.
  • Clicks on related searches. Most search engines show related searches, and there’s a click curve on those that’s an inverse power law distribution
  • Clicks on just about any list: left navigation, pagination controls, ads, and any other list will typically have an inverse power law distribution. That’s why there’s often such a huge price differential between what advertisers will pay in search for the top position versus the second position
  • Words in queries, documents, and ads. Just as Zipf illustrated all those years ago, word frequencies follow an inverse power law distribution. Interestingly, and I explain this in this paper, Zipf’s formal distribution doesn’t hold very well on words drawn from web documents (a thing called Heaps’ law does a better job). But the point remains: a few words account for a large share of the occurrences
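To see why skewed query popularity makes caching effective, here’s a rough Python calculation of the share of query volume covered by the most popular 20% of distinct queries, assuming query frequencies follow Zipf’s law with exponent 1. The number of distinct queries and the exponent are assumptions; real query logs will differ.

```python
def top_share(num_distinct, top_fraction=0.2, s=1.0):
    """Share of total query volume from the most popular `top_fraction` of
    distinct queries, assuming frequency at popularity rank r is proportional
    to 1 / r**s (Zipf's law)."""
    weights = [1.0 / r ** s for r in range(1, num_distinct + 1)]
    k = max(1, int(num_distinct * top_fraction))
    return sum(weights[:k]) / sum(weights)


share = top_share(1_000_000)   # a million distinct queries is an assumption
print(f"top 20% of queries -> {share:.0%} of searches")
```

Under these assumptions, the top 20% of a million distinct queries cover roughly 89% of the volume, consistent with the “at least 80%” rule of thumb. The share depends on the number of distinct queries and the exponent, so treat it as illustrative.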

What does this all mean? To a search guy, it means that when you see a curve that isn’t an inverse power law distribution, you should worry. There’s probably something wrong — an issue with search relevance, a user experience quirk (like the little bump I explained above), or something else. Expect to see curves that decay rapidly, and worry if you don’t.
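That rule of thumb can be turned into a rough automated check: fit a straight line to log(clicks) against log(rank), and flag the curve if it doesn’t decay or the fit is poor. This is a hedged sketch; the function names and the R² threshold of 0.9 are assumptions, and a small user-experience bump like the one around position 50 won’t necessarily trip it.

```python
import math


def power_law_fit(clicks):
    """Least-squares fit of log(clicks) = a - s * log(rank).

    clicks[i] is the (positive) click count at rank i + 1; needs >= 2 points.
    Returns (s, r_squared); s > 0 means clicks decay with rank.
    """
    xs = [math.log(rank) for rank in range(1, len(clicks) + 1)]
    ys = [math.log(c) for c in clicks]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return -slope, r_squared


def looks_healthy(clicks, min_r_squared=0.9):
    """Heuristic: clicks should decay with rank and fit a power law well."""
    s, r_squared = power_law_fit(clicks)
    return s > 0 and r_squared >= min_r_squared


healthy_curve = [1000 / rank ** 0.8 for rank in range(1, 21)]
rising_curve = [10 * rank for rank in range(1, 21)]
print(looks_healthy(healthy_curve), looks_healthy(rising_curve))
```

A curve that rises with rank, or one with a large anomaly, fails the check; a smooth power law passes.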

See you again next Monday for a new post. Thanks!

Snippets: The unsung heroes of web search

Remember when Google launched into our consciousness? Below you’ll see what you probably saw back in 1999: a clean, simple results page with ten blue links.

Google search results, circa 1999

Google was a marked contrast to the other search engines of the era: AltaVista, Excite, Lycos, Looksmart, Inktomi, Yahoo!, Northern Light, and others.

What was different about Google? Most people talked about three things:

  1. PageRank, the use of the web’s link structure in search ranking. The perception was that Google found more relevant results than its competitors. Here’s the original paper.
  2. Its clean page design. It wasn’t a link farm, there was plenty of white space, and there was no advertising (banner or otherwise) for the first couple of years.
  3. Its speed and its collection size. Google felt faster, reported fast search times, claimed a large index (remember when they used to say how many documents they had on the home page?), and showed large result set counts for your queries (result set estimation is a fun game; I should write a blog post about it).

What struck me most was the way Google presented result summaries on the search results page. Google’s result summaries were contextual: they were constructed of fragments of the documents that matched the query, with the query words highlighted. This was a huge step forward from what the others were doing: they were mostly just showing the first hundred or more characters of the document (which often was a mess of HTML, JavaScript, and other rubbish). With Google, you could suddenly tell if the result was relevant to you — and it made Google’s results page look incredibly relevant compared to its competitors.

And that’s what this blog post is about: contextual result summaries in search, snippets.

Snippets

Snippets are the summaries on the search results page. There are typically ten of them per results page. Take a look at the result below from the Google search engine: it’s one of the ten snippets for the query gold base bobblehead.

A Google Snippet

The snippet is designed to help you make a decision: is this relevant to my information need or not? It helps you decide whether or not to invest a click. It’s important: it’s key to our impression of whether or not a search engine works well. The snippets make up the bulk of the results page, they’re displayed below the brand of the search engine, and together they represent what the search engine found in response to your query. If they’re irrelevant, your impression of the search engine is negative, regardless of whether the underlying pages are relevant.

The snippet has three basic components:

  1. The title of the web document, usually what’s in the HTML <title> tag. In this case, “Vintage Baseball Memorabilia Bobble Heads Gold Base”
  2. A representation of the URL of the web resource, which you can click on to visit the document on the web. In this case, “keymancollectibles.com/bobbleheads/goldbasenod.htm”
  3. (Usually) a query-biased summary. One, two, or three fragments extracted from the document that are contextually relevant to the query. In this case, “The Gold base series of Bobbing Heads was the last one issued in the 60’s. They were sold from 1966 through 1971. This series saw the movement of several …”

Some words in the title, URL, and query-biased summary are shown in bold. They’re the query words that the user typed (in this case, the query was “gold base bobblehead”) or variants of the query words, such as plurals, synonyms, or acronym expansions or contractions. (My recent post on query rewriting introduces how variants are discovered and used.) The idea is that highlighting the query terms helps the user identify the likely relevant pieces of the snippet and quickly decide whether the web resource is relevant to their need.
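
Here’s a minimal Python sketch of that highlighting step. It wraps query words, plus any variants supplied for them, in HTML bold tags and escapes everything else. The `variants` mapping is a hypothetical stand-in for the output of a query rewriting service; it isn’t a real API.

```python
import html
import re

def highlight(text, query_terms, variants=None):
    """Return HTML with query words (and any supplied variants) wrapped in <b> tags.

    `variants` is a hypothetical mapping such as {"bobblehead": ["bobbleheads"]},
    keyed by lowercase query word.
    """
    variants = variants or {}
    terms = {t.lower() for t in query_terms}
    for term in list(terms):
        terms.update(v.lower() for v in variants.get(term, []))
    if not terms:
        return html.escape(text)
    # Try longer terms first so multi-word variants win over their parts.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))   # plain text between matches
        pieces.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)

if __name__ == "__main__":
    summary = "The Gold base series of Bobbing Heads was the last one issued in the 60’s."
    print(highlight(summary, ["gold", "base", "bobblehead"]))
```

The word-boundary anchors stop “base” from lighting up inside “baseball”, which is why plurals and other variants have to be supplied explicitly.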

I’m going to come back to discussing query biased summaries, and how the fragments of text are found and ranked. But, before I do, let’s quickly cover what’s been happening in snippets over the past few years.

What’s happening in the world of snippets

All search engines show quicklinks or deeplinks for navigational queries. Navigational queries are those that users pose when they want to visit a specific location on the web, such as Google, Microsoft, or eBay. Take a look at the snippet below from Bing for the query nordstrom. Below the snippet are links that lead to common pages on the nordstrom.com site, including “Men”, “Baby & Kids”, and so on; these are what’s known in the industry as quicklinks or deeplinks. When I was managing the snippets engineering team at Bing, we worked on enhancing snippets for navigational queries to show phone numbers, related sites, and other relevant data. The example for the query nordstrom shows the phone number and a search box for searching only the nordstrom.com site.

Nordstrom navigational query snippet from Bing

At Bing, we also invented snippets customized for particular sites or classes of sites. For example, below is a snippet that’s customized for a forum site. You can see that it highlights the number of posts, the author, the view count, and the publication date separately from the query-biased summary.

Custom snippet for a forum site from Bing

While Bing pioneered it, Google has certainly got in on the game — I’ve included below a few different Google examples for snippets from LinkedIn, Google Play, Google Scholar, and Fox Sports.

Custom Google snippets for LinkedIn, Google Scholar, Google Play, and Fox Sports

What’s all this innovation for? It’s to help users make better decisions about whether or not to click, and occasionally to give them extra links to click on. The Fox Sports snippet shows the top three headlines from the site, and that’s perhaps what the user wants when they query for Fox Sports (they don’t really want to go to the home page, they want to read a top story). The Google Play and Scholar snippets include information that tells us whether the result is authoritative: has it been cited by other papers many times? Has it been voted on many times? The LinkedIn result extracts key information and presents it separately (the city the person lives in and who they work for), and it also includes some data from Google+.

In my opinion, snippets are becoming too heterogeneous; the search engines are going too far. Users find it hard to switch from processing pictures to reading text, and the two are becoming too intermingled for users to quickly scan results pages. The snippets have too many different formats: users have to pause, understand which template they’re looking at, digest the information, and make a decision. The text justification is getting shaky. Life was simpler and faster when snippets were more consistent. I’m sure the major search engines would argue that users are making better decisions (for example, I bet they’re seeing lower mean times from when the user types a query to when they invest their click). But I think the aesthetics are getting lost, and it’s getting to be a little too much for most users.

Query-Biased Summaries

Google didn’t invent query-biased summaries; Tassos Tombros and Mark Sanderson invented them in this 1998 paper.

The basic method for producing a query-biased summary goes like this:

  1. Process the user’s query using the search index, and get a list of web resources that match (typically ten of them, and typically web [HTML] pages)
  2. For each document that matches:
    1. Find the location of the document in the collection
    2. Seek and read the document into memory
    3. Process the document sequentially and:
      1. Extract all fragments that contain one or more query words (or related terms — you get these from the query rewriting service)
      2. Assign a score to each fragment that approximates its likely relevance to the query
      3. Find the best few fragments based on the best few scores (one, two, or three)
    4. Neaten up the best fragments so they make sense (try to make them begin and end nicely on sentence boundaries or other sensible punctuation)
    5. Stitch the fragments together into one string of text, and save this somewhere
  3. Show the query-biased summaries to the user
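
The inner, per-document part of that method (extract, score, pick, neaten, stitch) fits in a few lines of Python. This is a minimal sketch under stated assumptions: the document text is already in memory (locating and reading it is skipped), fragments are sentences found with a naive regular expression, and a fragment’s score is simply the number of distinct query words it contains.

```python
import re

def split_fragments(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text):
    """Lowercased alphanumeric words (apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def query_biased_summary(document, query_terms, max_fragments=3, sep=" ... "):
    terms = {t.lower() for t in query_terms}
    scored = []
    for position, fragment in enumerate(split_fragments(document)):
        matched = terms & set(tokens(fragment))
        if matched:  # extract: keep only fragments containing a query word
            scored.append((len(matched), position, fragment))  # score (assumed: distinct matches)
    # Pick the best few by score; ties go to the earlier fragment.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:max_fragments]
    # Stitch in document order. Splitting on sentences already "neatens" the boundaries.
    best.sort(key=lambda s: s[1])
    return sep.join(fragment for _, _, fragment in best)

if __name__ == "__main__":
    doc = ("The Gold base series of Bobbing Heads was the last one issued in the 60’s. "
           "They were sold from 1966 through 1971.")
    print(query_biased_summary(doc, ["gold", "base", "bobblehead"]))
```

A real generator would also handle markup, fragments longer than the display width, and the synonyms supplied by query rewriting.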

I’ve spoken to engineers from the major search companies, and it’s pretty clear that producing snippets is one of the most computationally challenging problems they face. You only execute one query per user request, but you’ve got to produce ten summaries per results page. And unlike the search process, there’s no inverted index structure to help locate the terms in the documents. There are a few tricks that help:
  • Don’t process the entire document. In general, the fragments that make the best summaries are nearer the start of documents, so there are diminishing returns in processing the entirety of large documents. It’s also possible to reorganize the document by selecting fragments that are very likely to appear in snippets and putting them together at the start of the document.
  • Cache the popular results. Save the query-biased summary after it’s computed, and reuse it when the same query and document pair is requested again.
  • Cache documents in memory. It’s been shown that caching just 1% of the documents in memory saves more than 75% of the disk seeks (which turn out to be a major bottleneck).
  • Compress the documents so they’re fast to retrieve from disk. Better still, compress the query words and compare them directly to the compressed document. This is the subject of a paper I wrote with Dave Hawking, Andrew Turpin, and Yohannes Tsegay a few years ago. We showed that this makes producing snippets around twice as fast as without compression.
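
The idea of comparing compressed query words to a compressed document is easiest to see with a toy word-level code. The Python sketch below is an illustrative assumption, not the scheme from the paper: it replaces each word with an integer (frequent words get small integers, which a real system would pack into fewer bytes), encodes the query words with the same dictionary, and finds matches by comparing integers without ever decoding the document back to text.

```python
import re
from collections import Counter

# Toy dictionary code assumed for illustration; not the paper's compression scheme.

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def build_vocab(documents):
    """Map each distinct word to an integer code; more frequent words get smaller codes."""
    counts = Counter(w for doc in documents for w in tokenize(doc))
    return {word: code for code, (word, _) in enumerate(counts.most_common())}

def encode(text, vocab):
    """Replace each word with its code.

    Words missing from the vocabulary are skipped: that never happens for indexed
    documents, and a query word that isn't in the vocabulary can't match anything.
    """
    return [vocab[w] for w in tokenize(text) if w in vocab]

def match_positions(encoded_doc, encoded_query):
    """Positions of query words in the document, found by comparing integer codes."""
    wanted = set(encoded_query)
    return [i for i, code in enumerate(encoded_doc) if code in wanted]

if __name__ == "__main__":
    docs = ["gold base bobblehead", "gold coin"]
    vocab = build_vocab(docs)
    stored = [encode(d, vocab) for d in docs]   # what would sit on disk, compactly coded
    query = encode("Gold bobblehead", vocab)
    print([match_positions(doc, query) for doc in stored])
```

The saving comes from doing the query-word lookup once per query instead of decoding every candidate document back into strings.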

Most search folks don’t think about the ranking techniques used in the snippet generator. The big ranking teams at major search companies focus on matching queries to documents. But the snippet generator does do interesting ranking: its task is to figure out, from all of the matching fragments in the document, the best fragments to show the user. Tombros and Sanderson discuss this briefly: they scored sentences using the square of the number of query words in the sentence divided by the number of query words in the query. Net, the more query words in the fragment, the better. I’d recommend adding a factor that gives more weight to fragments closer to the beginning of the document. There are also other things to consider: does the fragment make sense (is it a complete sentence)? Which of the query words are most important, and are those in the fragment? Does the set of fragments we’re showing cover the set of query terms? You could even consider how well the set of query-biased summaries gives an overall summary of the topic. I’m sure there are lots of interesting ideas to try here.
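
Here’s a minimal Python sketch of those scoring ideas. The base score follows the Tombros and Sanderson formula as described above, counting each distinct query word once (that counting rule is an assumption). The position decay, 1 / (1 + 0.1 × position), and the greedy rule that prefers fragments covering query words not yet shown are hypothetical choices illustrating the extra factors suggested here, not a published method.

```python
import re

def word_set(text):
    """Lowercased set of alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def ts_score(fragment, query_terms):
    """Tombros & Sanderson-style score: (query words in fragment)^2 / (query words in query).

    Assumption: each distinct query word is counted once.
    """
    present = word_set(fragment) & query_terms
    return len(present) ** 2 / len(query_terms)

def positional_score(fragment, position, query_terms, decay=0.1):
    """Hypothetical position factor: earlier fragments keep more of their score."""
    return ts_score(fragment, query_terms) / (1.0 + decay * position)

def pick_fragments(fragments, query, k=3, decay=0.1):
    """Greedily pick up to k fragments, preferring ones that cover new query words."""
    query_terms = word_set(query)
    if not query_terms:
        return []
    chosen, covered = [], set()
    candidates = list(enumerate(fragments))
    while candidates and len(chosen) < k:
        # Rank by (number of not-yet-covered query words, positional score).
        best = max(
            candidates,
            key=lambda item: (
                len((word_set(item[1]) & query_terms) - covered),
                positional_score(item[1], item[0], query_terms, decay),
            ),
        )
        if positional_score(best[1], best[0], query_terms, decay) == 0:
            break  # no remaining fragment contains any query word
        candidates.remove(best)
        chosen.append(best)
        covered |= word_set(best[1]) & query_terms
    return [fragment for _, fragment in sorted(chosen)]  # back in document order

if __name__ == "__main__":
    fragments = ["Gold coins are popular.", "The gold base series ended in 1971.",
                 "Base plates vary.", "Bobbleheads nod."]
    print(pick_fragments(fragments, "gold base bobbleheads", k=2))
```

Ranking on the pair (new words covered, score) means a fragment that adds an unseen query word always beats one that only repeats words already shown; a weighted sum of the two would be a softer alternative.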

Snippets have always fascinated me. I believe they’re critical to users, unsung, and important to building a great search engine.

Hope you enjoyed this post — if you did, please share it with your friends through your favorite social network. There’ll be a new post next Monday…