Search ranking is the science of ordering search results from most- to least-relevant in response to user queries. In the case of eBay, the dominant user need is to find a great deal on something they want to purchase. And eBay search’s goal is to do a great job of finding relevant results in response to those customer needs.
eBay is amazingly dynamic. Around 10% of the 300+ million items for sale end each day (sell or end unsold), and a new 10% is listed. A large fraction of items have updates: they get bids, prices change, sellers revise descriptions, buyers watch, buyers offer, buyers ask questions, and so on. We process tens of millions of change events on items in a typical day; that is, our search engine receives that many signals that something important has changed about an item and should be reflected in the search ranking process. And all that is happening while we process around 250 million queries on a typical day.
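To put those daily totals in per-second terms, here is a rough back-of-the-envelope sketch. The 300 million, 10%, and 250 million figures come from the paragraph above; the change-event count is an assumption (the low end of “tens of millions”), and everything here is a daily average that says nothing about peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the post (treated as round numbers).
items_for_sale = 300_000_000    # "300+ million", so this is a lower bound
daily_turnover = 0.10           # ~10% of items end each day, ~10% are newly listed
queries_per_day = 250_000_000   # "around 250 million"

# ASSUMPTION: the post only says "tens of millions"; 10 million is the low end.
change_events_per_day = 10_000_000


def per_second(daily_count: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


ended_per_second = per_second(items_for_sale * daily_turnover)
queries_per_second = per_second(queries_per_day)
events_per_second = per_second(change_events_per_day)

print(f"items ending per second:  {ended_per_second:,.0f}")
print(f"queries per second:       {queries_per_second:,.0f}")
print(f"change events per second: {events_per_second:,.0f} (assumed lower bound)")
```

On those assumptions, that’s roughly 350 items ending and about 2,900 queries arriving every second, on average.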
In this post, I explain what makes eBay’s search ranking problem unique and complex. I’m aiming here to give you a sense of why we’ve built a custom search engine, and the types of technical search ranking challenges we’re dealing with as we rebuild search at eBay. Next week, I’ll continue this post and offer a few insights into how we’re working on the problem.
What’s different about eBay
Here are a few significantly different facets of eBay’s search problem space:
- Under typical load, it takes around 90 seconds from an item being listed by an eBay seller to when it can be found using the search engine. The same is true for any change that affects eBay’s search ranking — for example, if the number of sales of a fixed price multi-quantity item changes, it’s about 90 seconds until that count is updated in our index and can be used in search ranking. Even to an insider, that’s pretty impressive: there’s probably no other search engine that handles inserts, updates, and deletes at the scale and speed that eBay does. (I’ll explain real time index update in detail in a future post, but here’s a paper on the topic if you’d like to know more now.)
- In web search, there are many stable signals. Most documents persist and they don’t change very much. The link graph between documents on the web is reasonably stable; for example, my home page will always link to my blog, and my blog posts have links embedded in them that persist and lead to places on the web. All of this means that a web search engine can compute information about documents and their relationships, and use that as a strong signal in ranking. The same isn’t true of auction items at eBay (which are live for between 1 and 7 days), and it’s less true of fixed price items (many of which are live for only 30 days): the link graph isn’t very valuable and static pages aren’t common at eBay.
- eBay is an ecosystem, and not a search-and-leave search engine. The most important problem that web search engines solve is getting you somewhere else on the web — you run a query, you click on a link and you’re gone. eBay’s different: you run a query, you click on a link, and you’re typically still at eBay and interacting with a product, item, or hub page on eBay. This means that at eBay we know much more than at a web search engine: we know what our users are doing before and after they search, and have a much richer data set to draw from to build search ranking algorithms.
- Web search is largely unstructured. It’s mostly about searching blobs of text that form documents, and finding the highest precision matches. eBay certainly has plenty of text in its items and products, but there’s much more structure in the associated information. For example, items are listed in categories, and categories have a hierarchy. We also “paint” information on items as they’re listed in the form of attribute:value pairs; for example, if you list a men’s shirt, we might paint on the item that it is color:green, size:small, and brand:american apparel. We also often know the product that an item is: this is more often the case for listings that are books, DVDs, popular electronics, and motors. Net, eBay search isn’t just about matching text to blobs of text, it’s about matching text or preferences to structured information.
- Anyone can author a web document, or create a web site. And it’ll happily be crawled by a search engine, perhaps indexed (depends on what they decide to put in their index), and perhaps available to be found. At eBay, sellers create listings (and sometimes products), and everything is always searchable (usually in 90 seconds under typical conditions). And we know much more about our sellers than a web search engine knows about its page authors.
- We also know a lot about our buyers. A good fraction of the customers that search at eBay are logged in, or have cookies in their browser that identify them. Companies like Google and Microsoft also customize their search for their users when they are logged in (arguably, they do a pretty bad job of it; perhaps a post for another time too). The difference between web search and eBay is that we have information about our buyers’ purchase history, preferred categories, preferred buying formats, preferred sellers, what they’re watching, bidding on, and much more.
- Almost every item and product has an image, and images play a key role in making purchase decisions (particularly for non-commodity products). We present images in our search results.
There are more differences and challenges than these, but my goal here is to give you a taste, not an exhaustive list.
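Two of the points above, items that change while they’re live and matching against structured information, can be made concrete with a toy example. This is a minimal sketch and not how eBay’s search engine is built: the `Item` and `ToyIndex` names, the single-process in-memory design, and the exact-match filtering are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical listing: free text plus structured information."""
    item_id: str
    title: str
    category_path: tuple[str, ...]  # root to leaf, e.g. ("Clothing", "Men", "Shirts")
    attributes: dict[str, str] = field(default_factory=dict)  # painted attribute:value pairs


class ToyIndex:
    """In-memory inverted index over titles, with insert, update, and delete."""

    def __init__(self) -> None:
        self._items: dict[str, Item] = {}
        self._postings: dict[str, set[str]] = {}  # title term -> item ids

    def upsert(self, item: Item) -> None:
        """Insert a new item, or replace an existing one (e.g. a seller revision)."""
        self.delete(item.item_id)  # drop stale postings first
        self._items[item.item_id] = item
        for term in item.title.lower().split():
            self._postings.setdefault(term, set()).add(item.item_id)

    def delete(self, item_id: str) -> None:
        """Remove an item that has sold or ended; a no-op if it isn't indexed."""
        old = self._items.pop(item_id, None)
        if old is None:
            return
        for term in old.title.lower().split():
            ids = self._postings.get(term)
            if ids is not None:
                ids.discard(item_id)
                if not ids:
                    del self._postings[term]

    def search(self, text: str, category: str | None = None, **attrs: str) -> list[str]:
        """Match every text term, then filter on category and attribute:value pairs."""
        terms = text.lower().split()
        if not terms:
            return []
        candidates = set.intersection(*(self._postings.get(t, set()) for t in terms))
        hits = []
        for item_id in candidates:
            item = self._items[item_id]
            if category is not None and category not in item.category_path:
                continue  # the category may match at any level of the hierarchy
            if all(item.attributes.get(k) == v for k, v in attrs.items()):
                hits.append(item_id)
        return sorted(hits)


index = ToyIndex()
shirts = ("Clothing", "Men", "Shirts")
index.upsert(Item("1", "Green cotton shirt", shirts,
                  {"color": "green", "size": "small", "brand": "american apparel"}))
index.upsert(Item("2", "Blue cotton shirt", shirts, {"color": "blue", "size": "small"}))
print(index.search("cotton shirt", category="Men", color="green"))  # ['1']

# The seller revises item 2; the change is searchable as soon as it is applied.
index.upsert(Item("2", "Green linen shirt", shirts, {"color": "green", "size": "small"}))
print(index.search("shirt", color="green"))  # ['1', '2']

# Item 1 sells, so it leaves the index.
index.delete("1")
print(index.search("shirt", color="green"))  # ['2']
```

In a single process like this, an update is visible immediately. The hard part described above is doing the same thing at eBay’s scale, which this sketch doesn’t attempt to model.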
Who has similar problems?
Twitter is probably the closest analog technically to eBay:
- They make use of changing signals in their ranking and so have to update their search indexes in near real-time too. But it’s not possible to edit a tweet and they don’t yet use clicks in ranking, so that means there’s probably much less updating going on than at eBay
- Twitter explains that tweet rates rise from 2,000 per second to between 6,000 and 8,000 per second when there is a major event. eBay tends to have signals that change very quickly for a single item as it gets very close to ending (perhaps that’s similar to retweet characteristics). In both cases, signals about individual items are important in ranking those items, and those signals change quickly (whether they’re tweets or eBay items)
- Twitter is largely an ecosystem like eBay (though many tweets contain links to external web sites)
- Twitter makes everything searchable like eBay, though they typically truncate the result list and return only the top matches (with a link to see all matches). eBay shows you all the matches by default (you can argue whether or not we should)
- Twitter doesn’t really have structured data in the sense that eBay does
- Twitter isn’t as media rich as eBay
- Twitter probably knows much less about their users’ buying and selling behaviors
(Thanks to Twitter engineering manager Krishna Gade for the links.)
Large commerce search engines (Amazon, Best Buy, Walmart, and so on) bear similarity too: they are ecosystems, they have structure, they know about their buyers, they have imagery, and they probably search everything. The significant differences are that they mostly sell products and very few unique items, and they have vastly fewer sellers. They are also typically dominated by multi-quantity items (for example, a thousand copies of a book). The implication is that there is likely vastly less data to search, almost no index update issues by comparison, much less inventory that ends, much less diversity, and likely far fewer changing signals about the things they sell. That makes the search technical challenge vastly different; on the surface it seems simpler than eBay, though there are likely challenges I don’t fully appreciate.
Next week, I’ll continue this post by explaining how we think about ranking at eBay, and explain the framework we use for innovation in search.
Thanks for quoting me 🙂 on the link.
Yes realtime search is very interesting for the reasons you mention.
Small correction – we do use a signal composed of retweets and favorites of a tweet and update the index as soon as they arrive. You’re right that tweets are immutable and that works in our advantage. Also indexing latency of a tweet coming into our system and showing up in results is about 10 seconds.
Also like eBay personalizes results according to the buyer’s behavior, twitter personalizes results based on the interest graph of the user (which users one is following).
Interested to know if you are using some type of cycled live/offline system with index updates, or extreme amounts of sharding in the data to handle this type of real-time updates.
Read the paper, thanks for the link, was a good read! With ebay item finishing times, are you also experiencing a 90 second delay before it disappears or do you have a separate ‘instant’ kill for this? Definitely following the series, keen to know if you index the free text areas on postings or not. Also, do you use a generic style internal metadata structure or a more specific E-bay oriented structure? Many, many questions, but I’ll wait for the next in the series before firing them all 🙂
Thank you for the excellent post and the hyperlink to the paper! Given that the postings are accumulated in main memory as documents are added, 90 seconds is still huge!! Will wait for the future posts! 🙂
Hey Krishna, thanks for the follow up — there’s indeed much similarity between Twitter and eBay’s search problems. Perhaps I can summarize the biggest differences as: you’ve got orders of magnitude more documents, and we’ve got orders of magnitude larger documents?
Ben, thanks for both the comments. In this particular series of posts, I’ll be sticking with the ranking problems and talking much less about the technical / architecture / data structures challenges. Would you like to see a technical follow up post at some point?
Ardent, love your logos. Ninety seconds is actually pretty impressive — remember that we need to distribute the changes out onto a grid of several thousand computers…
@Hugh – Yes, you’re right that eBay indexes larger documents than Twitter, while Twitter has a larger volume of tweets flowing through the system. Also, for anyone who wants to know how the internals of Twitter’s real-time search engine work, we published our work at this year’s ICDE conference. You can read all about it here.
Busch_etal_ICDE2012.pdf
Just like Twitter, I wonder if YouTube also has similar interesting search problems. Most of the content is user generated. Incorrect and small text descriptions, new content uploaded every single second, picking the right moment in video for thumbnail, similar videos, copyright violations, quality of recording…
@Hugh, sure, a technical follow up post at some point would be fantastic. Ranking is still equally interesting, especially the user-specific components.
@nilesh – there are some similarities, particularly some structure, plenty of media, and a strong expectation from users that new videos are searchable as soon as possible (I know a bit about video from my experience leading the team that built Bing’s video search).
But there are many differences: updates to items are likely much less frequent, there’s likely a strong anchor text / web graph signal, and the collection is fairly stable (sure, there are new videos — but that’s typically likely a very, very tiny fraction increase in collection size each day).
This blog post sounds a lot like a peeing contest. Who has the biggest search problem? You seem to be trash talking about web search without knowing much about it.
90 seconds to update an index isn’t all that impressive. Google is able to bring in things in their instant index in less time than that. So is Twitter. In terms of volume, I’m sure that Twitter, Facebook, and Google are seeing larger update rates than eBay.
You picture the web as something fairly static, which is quite far from the reality. Plus, web pages aren’t unstructured as you pretend, there’s a lot of rich structure in them, and good search engines will build a lot of extra structure on top of them. Surprise! pages are analyzed, put in categories, which are part of a category hierarchy, etc.
@Jeff, I don’t think it’s very constructive or insightful to make these kinds of comments and assumptions. Re: the web being full of rich structure. As someone who works with web search day in and out, sadly it isn’t so. There are many good standards to apply rich metadata to web content but it is also easily misunderstood, misused and often abused.
Going back on-topic, Ebay wouldn’t suffer from this to a high degree as much of the data is structured. It was however the reason for my earlier question on the description of the sales being included, as they would be unstructured content.
Hi! Have you published Part 2 of the eBay search engine story? I cannot find it. Thanks very much, great article. Let me know if you can!
Are you publishing Part 2 soon? I cannot find it. Many thanks if you can point me to another link or where to go. Thanks! Great article.
Dave / windchime1 -> https://hughewilliams.com/2012/04/28/ranking-at-ebay-part-2/
Thanks very much for that – much appreciated. I shall give it a good read !