Search ranking is the science of ordering search results from most- to least-relevant in response to user queries. In the case of eBay, the dominant user need is to find a great deal on something they want to purchase. And eBay search’s goal is to do a great job of finding relevant results in response to those customer needs.
eBay is amazingly dynamic. Around 10% of the 300+ million items for sale end each day (sell or end unsold), and a new 10% is listed. A large fraction of items have updates: they get bids, prices change, sellers revise descriptions, buyers watch, buyers offer, buyers ask questions, and so on. We process tens of millions of change events on items in a typical day, that is, our search engine receives that many signals that something important has changed about an item that should be used in the search ranking process. And all that is happening while we process around 250 million queries on a typical day.
In this post, I explain what makes eBay’s search ranking problem unique and complex. I’m aiming here to give you a sense of why we’ve built a custom search engine, and the types of technical search ranking challenges we’re dealing with as we rebuild search at eBay. Next week, I’ll continue this post and offer a few insights into how we’re working on the problem.
What’s different about eBay
Here are a few significantly different facets of eBay’s search problem space:
- Under typical load, it takes around 90 seconds from an item being listed by an eBay seller to when it can be found using the search engine. The same is true for any change that affects eBay’s search ranking — for example, if the number of sales of a fixed price multi-quantity item changes, it’s about 90 seconds until that count is updated in our index and can be used in search ranking. Even to an insider, that’s pretty impressive: there’s probably no other search engine that handles inserts, updates, and deletes at the scale and speed that eBay does. (I’ll explain real time index update in detail in a future post, but here’s a paper on the topic if you’d like to know more now.)
- In web search, there are many stable signals. Most documents persist and they don’t change very much. The link graph between documents on the web is reasonably stable; for example, my home page will always link to my blog, and my blog posts have links embedded in them that persist and lead to places on the web. All of this means that a web search engine can compute information about documents and their relationships, and use that as a strong signal in ranking. The same isn’t true of an auction item at eBay (which are live for between 1 and 7 days), and it’s less true of a fixed price item (many of which are live for only 30 days) — the link graph isn’t very valuable and static pages aren’t common at eBay
- eBay is an ecosystem, and not a search-and-leave search engine. The most important problem that web search engines solve is getting you somewhere else on the web — you run a query, you click on a link and you’re gone. eBay’s different: you run a query, you click on a link, and you’re typically still at eBay and interacting with a product, item, or hub page on eBay. This means that at eBay we know much more than at a web search engine: we know what our users are doing before and after they search, and have a much richer data set to draw from to build search ranking algorithms.
- Web search is largely unstructured. It’s mostly about searching blobs of text that form documents, and finding the highest precision matches. eBay certainly has plenty of text in its items and products, but there’s much more structure in the associated information. For example, items are listed in categories, and categories have a hierarchy. We also “paint” information on items as they’re listed in the form of value:attribute pairs; for example, if you list a men’s shirt, we might paint on the item that it is color:green, size:small, and brand:american apparel. We also often know the product that an item is: this is more often the case for listings that are books, DVDs, popular electronics, and motors. Net, eBay search isn’t just about matching text to blobs of text, it’s about matching text or preferences to structured information
- Anyone can author a web document, or create a web site. And it’ll happily be crawled by a search engine, perhaps indexed (depends on what they decide to put in their index), and perhaps available to be found. At eBay, sellers create listings (and sometimes products), and everything is always searchable (usually in 90 seconds under typical conditions). And we know much more about our sellers than a web search engine knows about its page authors
- We also know a lot about our buyers. A good fraction of the customers that search at eBay are logged in, or have cookies in their browser that identify them. Companies like Google and Microsoft also customize their search for their users when they are logged in (arguably, they do a pretty bad job of it — perhaps a post for another time too). The difference between web search and eBay is that we have information about our buyers’ purchase history, preferred categories, preferred buying formats, preferred sellers, what they’re watching, bidding on, and much more
- Almost every item and product has an image, and images play a key role in making purchase decisions (particularly for non-commodity products). We present images in our search results
There are more differences and challenges than these, but my goal here is to give you a taste, not an exhaustive list.
Who has similar problems?
Twitter is probably the closest analog technically to eBay:
- They make use of changing signals in their ranking and so have to update their search indexes in near real-time too. But it’s not possible to edit a tweet and they don’t yet use clicks in ranking, so that means there’s probably much less updating going on than at eBay
- Twitter explains that tweet rates go from 2,000 per second to 6000 to 8000 when there is a major event. eBay tends to have signals that change very quickly for a single item as it gets very close to ending (perhaps that’s similar to retweet characteristics). In both cases, signals about individual items are important in ranking those items, and those signals change quickly (whether they’re tweets or eBay items)
- Twitter is largely an ecosystem like eBay (though many tweets contain links to external web sites)
- Twitter makes everything searchable like eBay, though they typically truncate the result list and return only the top matches (with a link to see all matches). eBay shows you all the matches by default (you can argue whether or not we should)
- Twitter doesn’t really have structured data in the sense that eBay does
- Twitter isn’t as media rich as eBay
- Twitter probably knows much less about their users’ buying and selling behaviors
(Thanks to Twitter engineering manager Krishna Gade for the links.)
Large commerce search engines (Amazon, Bestbuy, Walmart, and so on) bear similarity too: they are ecosystems, they have structure, they know about their buyers, they have imagery, and they probably search everything. The significant differences are they mostly sell products, and very few unique items, and they have vastly fewer sellers. They are also typically dominated by multi-quantity items (for example, a thousand copies of a book). The implication is there is likely vastly less data to search, relatively almost no index update issues, relatively much less inventory that ends, relatively much less diversity, and likely much fewer changing signals about the things they sell. That makes the search technical challenge vastly different; on the surface it seems simpler than eBay, though there are likely challenges I don’t fully appreciate.
Next week, I’ll continue this post by explaining how we think about ranking at eBay, and explain the framework we use for innovation in search.