Changing platforms at Scale: Lessons from eBay

At eBay, we’ve been on a journey to modernize our platforms, and rebuild old, crufty systems as modern, flexible ones.

In 2011, Sri Shivananda, Mark Carges, and I decided to modernize our front-end development stack. We were building our web applications in v4, an entirely home-grown framework. It wasn’t intuitive to new engineers who joined eBay: they were familiar with industry standards we didn’t use. We’d also built very tightly coupled balls of spaghetti code over many years. (That’s not a criticism of our engineers – every system gets crufty and unwieldy eventually; software has an effective lifespan and then it needs to be replaced.)

Sri Shivananda, eBay Marketplaces’ VP of Platform and Infrastructure

We set design goals for our new Raptor framework, including that we wanted to do a better job separating presentation from business logic. We also wanted better tools for engineers, faster code build times, better monitoring and alerting when problems occur, the ability to test changes without restarting our web servers, and a framework that was intuitive to engineers who joined from other companies. It was an ambitious project, and one that Sri has led as a successful revolution in the Marketplaces business. We now build software much faster than ever before, and we’ve rewritten major parts of the front-end systems. (And we’ve open sourced part of the framework.)

That’s the context, but what this post is really about is how you execute a change in platforms in a large company with complex software systems.

The “Steel Thread” Model

Mark Carges, eBay’s Chief Technical Officer

Our CTO, Mark Carges, advocates building a “steel thread” use case when we rethink platforms. What he means is that when you build a new platform, build it at the same time as a single use case on top of the platform. That is, build a system with the platform, like a steel thread running end-to-end through everything we do.

A good platform team thinks broadly about all systems that’ll be built on the platform, and designs for the present and the future. The risk is they’ll build the whole thing – including features that no one ultimately needs for use cases that are three years away. Things change fast in this world. Large platform projects can go down very deep holes, and sometimes never come out.

The wisdom of the “steel thread” model is that the platform team still does the thinking, but it’s pushed by an application team to only fully design and build the parts that are immediately needed. The tension forces prioritization, tradeoffs, and a pragmatism in the platform team. Once you’re done with the first use case, you can move on to subsequent ones and build more of the platform.

Rebuilding the Search Results Page

Our first steel thread use case on Raptor was the eBay Marketplaces Search Results Page (the SRP). We picked this use case because it was hard: it’s our second-most trafficked page, and one of our most complex; building the SRP on Raptor would exercise the new platform extensively.

We co-located our new Raptor platform team – which was a small team by design – together with one of our most mission critical teams, the search frontend team. We declared that their success was mutually dependent: we’re not celebrating until the SRP is built on Raptor.

We asked the team to rebuild the SRP together. We asked for an aggressive timeline. We set bold goals. But there was one twist: build the same functionality and look-and-feel as the existing SRP. That is, we asked the team to only change one variable: change the platform. We asked them not to change an important second variable: the functionality of the site.

This turned out to be important. The end result – after much hard work – was a shiny new SRP code base: modular, cleaner, simpler, and built on a modern platform. But it looked and behaved the same as the old one. This allowed us to test one thing: is it equivalent for our customers to the old one?

Testing the new Search Results Page

We ran a few weeks of A/B tests, where we showed different customer populations the old and new search results page. Remember, they’re pretty much the same SRPs from the customers’ perspective. What we were looking for were subtle problems: was the new experience slower for some scenarios than the old one? Did it break in certain browsers on some platforms? Was it as reliable? Could we operate it as effectively? We could compare the populations and spot the differences reasonably easily.

This was a substantial change in our platforms and systems, and the answer wasn’t always perfect. We took the new SRP out of service a few times, fixed bugs, and put it back in. Ultimately, we deemed it a fine replacement in North America, and turned it on for all our customers in North America. The next few months saw us repeat the process across our other major markets (where there are subtle differences between our SRPs).

What’s important is that we didn’t change the look and feel or functionality at first: if we’d done that, we may not have seen several of the small problems we did see as fast as we saw them.

Keeping the old application

Another wise choice was that we didn’t follow that old adage of “out with the old, and in with the new”. We kept the old SRP running in our data centers for a few months, even though it wasn’t in service.

This gave us a fallback plan: when you make major changes, it’s never going to be entirely plain sailing. We knew that the new SRP would have problems, and that we’d want to take it out of service. When we did, we could put the old one back in service while we fixed the problem.

Eventually, we reached the confidence with our new SRP that we didn’t need the old one. And so it was retired, and the hardware put to other uses. That was over a year ago – it has been smooth sailing since.

The curse of dual-development

You might ask why we set bold goals and pushed the teams hard to build the new Raptor platform and the SRP. We like to do that at eBay, but there’s also a pragmatic reason: while there are two SRP code bases, there’s twice the engineering going on.

Imagine that we’ve got a new idea for an improvement to the SRP. While we’re building the new SRP, the team has to add that idea to the new code base. The team also has to get the idea into the old code base – both so we can get it out to our customers, and so that we can carry out that careful testing I described earlier.

To prevent dual development slowing down our project, we declared a moratorium on features in the SRP for a couple of months. This was tough on the broader team – lots of folks want features from the search team, and we delayed all requests. The benefit was we could move much faster in building the new SRP, and getting it out to customers. Of course, a moratorium can’t go on for too long.

And then we changed the page

After we were done with the rollout, the SRP application team could move with speed on modernizing the functionality and look-and-feel of the search results page.

Ultimately, this became an important part of eBay 2.0, a refresh of the site that we launched in 2012. And they’re now set up to move faster whenever they need to: we are testing more new ideas that improve the customer experience than we’ve been able to before, and that’s key to the continued technology-driven revolution at eBay.

See you next week.

My Tesla Model S

Beta testing the Tesla Model S

My Tesla Model S

I bought a Tesla Model S earlier this year. It’s a dream car: comfortable, responsive, spacious, and great looking. It’s a total geek dream gadget, and I feel good about owning an environmentally sensible electric car. It’s 95% of the way to perfect – and it’s fun being part of the ongoing experiment to find the last 5%.

Scheduled Software Updates

Tesla updates the car occasionally – the car has a 3G cell connection. A dialog box on the massive 17” screen says an update is available, you schedule it, and wake up to an improved car. It’s like updating iOS on your iPhone. Indeed, it’s very similar – your car could be quite different after the update, and it’s clear the car is designed to be a flexible software-driven platform. This is mostly where the beta testing feeling comes in.

Scheduled Charging

Scheduled charging. My car is configured to begin charging at 1am when it’s plugged in at home.

The most recent update added scheduled charging. You plug the car into its charge point, and it’ll start charging when you tell it – this allows you to take advantage of lower electricity rates in the early hours of the morning. What’s cool is that it is location-aware: you can set different charge behaviors for different locations, and the car remembers those. So, for example, you could have it charge as soon as it’s plugged in at work, and beginning at 1am at home – and once it’s set, it just works. Pretty neat. (I’m glad this feature arrived – I was beginning to figure out how to install a timer on my 50 Amp 220 volt plug at home.)

Plugged in for charging with the mobile charging cable.

I actually got this new feature about a week ahead of everyone else. How? Well, I scheduled an update and it failed. I woke up to a dialog box that told me to call Tesla Service. The climate control didn’t work, the odometer read 0 miles, and a few other things were a little off – but the car was completely drivable. I called Tesla service, dreading the need to take it to their service center – but it was way simpler than that. The guy on the phone asked me when I’d next have the car parked for a couple of hours. They later logged into my car, remarking that “the packages were all there but didn’t unpack properly” (suggesting a Linux flavor to the car), and “cleaned things up”. When I got back to the car, all was great – everything back to normal, and I’m the first guy on the block with the latest software that includes scheduled charging.

Climate Control Problems

Climate control must be a harder problem than you’d think. It’s entirely automatic by default: you set the temperature, and the Model S looks after maintaining it. However, I get blasted with cold air most of the time – if you jump in the car when it’s warm outside, and ask for 70 degrees inside, it’ll get you there as fast as it can. And once it’s there, it’ll lower the fan speed until (I guess) it gets a couple of degrees warmer, and then it’ll Arctic blast again. It always feels like it’s not quite doing what I want – sometimes 70 degrees feels rather too warm, and other times I’m freezing. There must be subtlety in making this an awesome feature (maybe other car companies took a long time to get this right?): you want the occupants to be comfortable as soon as possible, but you also want them to have a pleasant time getting there. I bet there’s a software update coming.

Spinal Tap humor: the volume control and even the climate control fan settings go all the way to 11

The web browser and nav apps fall short

The giant 17” screen includes a web browser and a navigation application. The browser is about as basic as you’ll get: it doesn’t have autocomplete (with much-needed spelling correction), it doesn’t save form data, and it randomly seems to lose its history and cookies. It’s also got problems with its touch interface: you need to press a little above any link you want to click, and often a few times. The navigation application is ok, but has a few quirks: it’s always oriented so that north is facing up, which isn’t how I like to use navigation, and traffic data seems to update on its own frequency (even if you turn traffic on and off) – which can lead you into a jam. I am not quite sure whether the traffic data is used to determine routes – I suspect not yet; it’s certainly not configurable to tell the navigation app whether you’d prefer a faster, shorter, non-highway, or highway route as in many other nav tools.

If the 17” screen has issues, you can reboot it by holding the two scroll wheels on the steering wheel. You can do this while you’re driving. You can reboot the screen behind the steering wheel separately by holding the two buttons above the scroll wheels. Again, no problem while you’re driving. This suggests there’s several physical or virtual machines in the Model S – at least one for each of the screens, and more behind running what’s needed to drive the car.

Am I unhappy? No. The future has arrived early – a car that’s as much software as hardware, and that can be iterated on and improved without you going near a service center. Is it entirely baked? Not yet. Do I love my Tesla Model S? Best car I’ve owned easily.

See you next week.

By the way, while I own a Tesla, I don’t own any shares in the company nor do I plan to buy any. I wish I did, after their spectacular rise in the past couple of weeks.

eBay Open House in Seattle on Tuesday May 14

eBay’s Ken Moss, VP and General Manager of eBay’s Seattle office

Would you like to learn about some of the projects underway in the eBay Seattle office and have the chance to mingle with our leadership team and engineers?

eBay Seattle invites you to a keynote on Tuesday May 14 2013 by one of our Vice Presidents, Ken Moss. The team will also present an overview of eBay’s architecture and a tour of our endeavors in the big data realm. The event is at the Meydenbauer Center in Bellevue, WA.

Please RSVP at http://ebayseattle.eventbrite.com.

Five Myths about Hash Tables

A hash table is a data structure that is used to search for exact matches to a search key. For searching, it works like this:

  1. Take a search key (example: the word “cat”)
  2. Hash the search key: pass the key to a function that returns an integer value (example: the hash function returns 47 when the input key is “cat”)
  3. Use the integer value to inspect a slot to see if it contains the search key (example: look in an array at element 47)
  4. If the key matches, great, you’ve found what you’re looking for and you can retrieve whatever metadata you’re looking for (example: in array element 47, we’ve found the word “cat”. We retrieve metadata that tells us “cat” is an “animal”)
  5. If the key doesn’t match and the slot contains something besides the key, carry out a secondary search process to make sure the search key really isn’t stored in the hash table; the secondary search process varies depending on the type of hashing algorithm used (example: in array element 47, we found the word “dog”. So, let’s look in slot 48 and see if “cat” is there. Slot 48 is empty, so “cat” isn’t in the hash table. This is called linear probing; it’s one kind of secondary search)
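As a rough sketch, the five steps above might look like this in Python. The names `toy_hash` and `lookup` are made up for illustration, and the toy hash function deliberately sends every key to 47 so that it mirrors the “cat” and “dog” example; a real hash function would spread keys across the table.

```python
def toy_hash(key):
    # Hypothetical hash function: always returns 47, to mirror the example above.
    return 47


def lookup(table, key, hash_fn=toy_hash):
    """Return the metadata stored for key, or None if key is not in the table."""
    size = len(table)
    slot = hash_fn(key) % size          # steps 1-3: hash the key, pick a slot
    for _ in range(size):               # visit each slot at most once
        entry = table[slot]
        if entry is None:               # empty slot: the key cannot be stored
            return None
        stored_key, metadata = entry
        if stored_key == key:           # step 4: the key matches
            return metadata
        slot = (slot + 1) % size        # step 5: linear probing, try the next slot
    return None


table = [None] * 100
table[47] = ("dog", "animal")
print(lookup(table, "cat"))   # slot 47 holds "dog", slot 48 is empty, so not found
table[48] = ("cat", "animal")
print(lookup(table, "cat"))   # now found after one extra probe
```

The loop bound is a small safety choice: even in a completely full table the search gives up after one pass instead of probing forever.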

I had a minor obsession with hash tables for ten years.

The worst case is terrible

Engineers irrationally avoid hash tables because of the worst-case O(n) search time. In practice, that means they’re worried that everything they search for will hash to the same value; for example, imagine hashing every word in the English dictionary, and the hash function always returning 47. Put differently, if the hash function doesn’t do a good job of randomly distributing search keys throughout the hash table, the searching will become scanning instead.

This doesn’t happen in practice. At least, it won’t happen if you make a reasonable attempt to design a hash function using some knowledge of the search keys. Indeed, in the average case, hashing is much closer to O(1). In practice, that means almost always you’ll find what you’re looking for on the first search attempt (and you will do very little secondary searching).

Hash tables become full, and bad things happen

The story goes like this: you’ve created a hash table with one million slots. Let’s say it’s an array. What happens when you try and insert the 1,000,001st item? Bad things: your hash function points you to a slot, it’s full, so you look in another slot, it’s full, and this never ends — you’re stuck in an infinite loop. The function never returns, CPU goes through the roof, your server stops responding, and you take down the whole of _LARGE_INTERNET_COMPANY.

This shouldn’t happen if you’ve spent time on design.

Here’s one solve: there’s a class of hash tables that deal with becoming full. They work like this: when the table becomes x% full, you create a new hash table that is (say) double the size, and move all the data into the new hash table by rehashing all of the elements that are stored in the old one. The downside is you have to rehash all the values, which is an O(n) [aka linear, scanning, time consuming] process.
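Here’s a minimal sketch of that grow-and-rehash idea, assuming linear probing and Python’s built-in `hash()`. The class name `ResizingTable` and the 50% threshold are illustrative choices, not a recommendation from any particular library.

```python
class ResizingTable:
    """Linear-probing hash table that doubles in size before it gets too full."""

    def __init__(self, slots=8, max_load=0.5):
        self.slots = [None] * slots
        self.count = 0
        self.max_load = max_load        # the "x% full" threshold (assumed: 50%)

    def _probe(self, slots, key):
        # Walk from the hashed slot until we find the key or an empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def _grow(self):
        # The O(n) step: every stored entry is rehashed into the bigger array.
        bigger = [None] * (2 * len(self.slots))
        for entry in self.slots:
            if entry is not None:
                bigger[self._probe(bigger, entry[0])] = entry
        self.slots = bigger

    def insert(self, key, value):
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._grow()
        i = self._probe(self.slots, key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key):
        entry = self.slots[self._probe(self.slots, key)]
        return entry[1] if entry is not None else None


table = ResizingTable()
for n in range(1000):
    table.insert("key%d" % n, n)
print(len(table.slots), table.get("key999"))
```

Because the table grows before it passes the threshold, there is always an empty slot for a probe to stop at, so the infinite loop in the story can’t happen.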

Here’s another solve: instead of storing the metadata in the array, make your array elements pointers to a linked list of nodes that contain the metadata. That way, your hash table can never get full. You hash the search key to find a slot, and then traverse the linked list in that slot looking for a match; traversing the linked list is your secondary search process. Here’s a diagram that shows a so-called chained hash table:

A chained hash table, showing “John Smith” and “Sandra Dee” both in slot 152 of the hash table. You can see that they’re chained together in a linked list. Taken from Wikipedia, http://en.wikipedia.org/wiki/Hash_table

Chained hash tables are a very good idea. Problem solved.
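A chained table can be sketched in a few lines. This is only an illustration: `Node` and `ChainedTable` are hypothetical names, and Python’s built-in `hash()` stands in for a purpose-built hash function.

```python
class Node:
    """One entry in a chain: a key, its metadata, and a single next pointer."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    def __init__(self, slots=16):
        self.heads = [None] * slots     # the array holds only list heads

    def _slot(self, key):
        return hash(key) % len(self.heads)

    def insert(self, key, value):
        i = self._slot(key)
        node = self.heads[i]
        while node is not None:         # update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.heads[i] = Node(key, value, self.heads[i])   # else push on the front

    def get(self, key):
        node = self.heads[self._slot(key)]
        while node is not None:         # the secondary search: walk the chain
            if node.key == key:
                return node.value
            node = node.next
        return None


# Far more items than slots: the chains get longer, but nothing ever overflows.
table = ChainedTable(slots=4)
for n in range(100):
    table.insert("name%d" % n, n)
print(table.get("name42"))
```

The tradeoff is that an undersized table degrades gradually into scanning long chains rather than failing, which is why sizing the array sensibly still matters.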

By the way, I recommend you create a hash table that’s about twice the size of the number of elements you expect to put into the table. So, suppose you’re planning on putting one million items in the table, go ahead and create a table with two million slots.

Trees are better

In general, trees aren’t faster for searching for exact matches. They’re slower, and here’s peer-reviewed evidence that compares B-trees, splay trees, and hashing for a variety of string types. So, why use a tree at all? Trees are good for inexact matches — say finding all names that begin with “B” — and that’s a task that hash tables can’t do.
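To make the exact versus inexact distinction concrete, here’s a small sketch. Python has no built-in tree, so a sorted list searched with the standard `bisect` module stands in for an ordered structure such as a B-tree; the names are made-up sample data.

```python
import bisect

names = sorted(["Alice", "Barbara", "Bob", "Brenda", "Carol"])   # sample data
name_set = set(names)          # Python sets are hash tables under the hood


def starts_with(sorted_names, prefix):
    """Return every name beginning with prefix, using two binary searches."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # The largest code point makes an upper bound that sorts after any
    # realistic name sharing the prefix.
    hi = bisect.bisect_left(sorted_names, prefix + chr(0x10FFFF))
    return sorted_names[lo:hi]


print("Bob" in name_set)            # exact match: one hash, one comparison
print(starts_with(names, "B"))      # prefix match: needs the ordering
```

The hash table has no notion of order, so answering the prefix query with `name_set` alone would mean checking every element.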

Hash functions are slow

I’ve just pointed you to research that shows that must not be true. But there is something to watch out for: don’t use the traditional modulo-based hash function you’ll find in your algorithms textbook. For example, here’s Sedgewick’s version; note the “% M” modulo operation that’s performed once per character in the input string. The modulo is used to ensure that the hash value falls within the size of the hash table – for example, if we have 100 slots and the hash function returns 147, the modulo turns that into 47. Instead, do the modulo once, just before you return from the hash function.

Here’s the hash function I used in much of my research on hashing. You can download the code here.

A very fast hash function written in C. This uses bit shifts, and a single modulo at the end of the function. If you find a faster one, would love to hear about it.
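The C code isn’t reproduced in this text, so here’s a Python sketch of the two styles for comparison. It is not the function from the research above: the multiplier 31, the shift amounts, and both function names are assumptions chosen for illustration. Python also won’t show the speed difference, which matters in compiled languages like C; the point is only where the modulo sits.

```python
MASK_32 = 0xFFFFFFFF   # emulate a 32-bit unsigned integer, as C would use


def per_char_modulo_hash(key, table_size, multiplier=31):
    """Textbook style: one modulo operation for every character of the key."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (multiplier * h + byte) % table_size
    return h


def shift_xor_hash(key, table_size, seed=0):
    """Bit shifts and XORs per character, with a single modulo at the end."""
    h = seed & MASK_32
    for byte in key.encode("utf-8"):
        h ^= ((h << 5) + byte + (h >> 2)) & MASK_32
    return h % table_size               # the only modulo in the function


for word in ["cat", "dog", "bobblehead"]:
    print(word, per_char_modulo_hash(word, 100), shift_xor_hash(word, 100))
```

Both functions return a slot number between 0 and `table_size - 1`; they just won’t return the same one for a given key.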

Hash tables use too much memory

Again, not true. The same peer-reviewed evidence shows that a hash table uses about the same memory as an efficiently implemented tree structure. Remember that if you’re creating an array for a chained hash table, you’re just creating an array of pointers — you don’t really start using significant space until you’re inserting nodes into the linked lists that are chained from the hash table (and those nodes typically take less space than nodes in a tree, since they only have one pointer).

One Small Trick

If you want your hash tables to really fly for searching, move any node that matches your search to the front of the chained list that it’s in. Here’s more peer-reviewed evidence that shows this works great.
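Here’s a small sketch of the move-to-front trick. To keep it short, each chain is a Python list rather than a true linked list, and `MoveToFrontTable` is a hypothetical name; with real linked nodes the move is just a couple of pointer updates.

```python
class MoveToFrontTable:
    """Chained hash table that moves each successfully found entry to the
    front of its chain, so frequently searched keys are found sooner."""

    def __init__(self, slots=16):
        self.chains = [[] for _ in range(slots)]

    def _chain(self, key):
        return self.chains[hash(key) % len(self.chains)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (stored_key, _) in enumerate(chain):
            if stored_key == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        chain = self._chain(key)
        for i, (stored_key, value) in enumerate(chain):
            if stored_key == key:
                if i > 0:
                    chain.insert(0, chain.pop(i))   # move the match to the front
                return value
        return None


# One slot forces every key into the same chain, which makes the effect visible.
table = MoveToFrontTable(slots=1)
for word in ["ant", "bee", "cat"]:
    table.insert(word, "animal")
table.get("cat")
print([key for key, _ in table.chains[0]])
```

The idea pays off when searches are skewed toward a few popular keys, since those keys drift to the front and stay there.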

Alright. Viva Hash Tables! See you next time.

Why Facebook shouldn’t have dumped HTML5

We all want to build fast, reliable mobile apps. Facebook couldn’t make its HTML5 mobile app deliver on that goal, and decided to build its own native app. In practice, this means retiring an app that is a browser-like shell that renders web pages (a thin client), and launching a fully-fledged app on the mobile device (a thick client, with apparently a sprinkling of HTML5 inside).

That’s a step backwards, and flies in the face of history. Haven’t we just been through a twenty-year evolution of thicker clients in general being replaced by thinner clients? How many apps did you install on your PC this year compared to ten years ago? How many tabs do you have open in your browser?

Should Facebook have just made the web faster and more reliable? Rather than mostly abandon HTML5, why didn’t they evolve the standard and make the web better? It wouldn’t be the first time that has happened – I’m sure many of you remember web standards evolving in the 1990s, and you can thank those days for the better experiences we all have today. So, an opportunity lost – but I am sure the story is not over, and it sounds like their new app is in fact a “hybrid app” (where there is some HTML5 inside the native app’s framework).

This change also makes experimentation much harder. On the web, most major companies are running test versus control experiments or A/B tests. We put a population of users in an experiment, and compare their behaviors with those who aren’t in the test — for example, at eBay, we try out improvements to search ranking on a small population of customers and compare their behaviors to those who are seeing the regular results. The great thing about the web is you can do this fast — pretty much as fast as you can code and test the changes — and test large numbers of simultaneous variants. The outcome is you make fast progress in improving your product on behalf of your customers.
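One common way to split users into test and control populations on the server side is to hash a user identifier into a bucket. The sketch below is a generic illustration of that idea, not a description of how eBay or anyone else does it; the function name, the experiment name, and the 5% split are all made up.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_percent=5):
    """Deterministically place a user in "test" or "control" for an experiment."""
    # Hashing the experiment name together with the user id means the same
    # user can land in different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100      # a stable bucket from 0 to 99
    return "test" if bucket < treatment_percent else "control"


in_test = sum(
    assign_variant(user_id, "new-search-ranking") == "test"
    for user_id in range(10000)
)
print(in_test, "of 10000 users see the experimental ranking")
```

Because the assignment lives on the server, changing the split or starting a new experiment needs no client release, which is the speed advantage described above.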

Building native apps makes experimentation harder. You could build an “A” and a “B” experience into an iPhone native app, get it through Apple’s approval process, and try out the two experiences on the customers. But the barrier to entry is much higher — you can only run a couple of experiments, and you probably only release once a month at most. You’re not going to evolve your application as fast as your customers want.

There are always tensions and tradeoffs: in this case it is speed and reliability on one side, and the future of the web and experimentation on the other. I would have fought hard to stay in the latter camp.

Five fitness gadgets I love

My passion is fitness, and part of fueling the passion is having the right gadgets to stay motivated, work hard, and enjoy what I do. Here’s my top five (which is subject to change any year).

FitBit

I’m not the first to put the fitbit at the top of a list — mashable did it just two weeks ago.

The original fitbit. A small, clever, wireless pedometer that’ll keep you motivated

For $99 you get yourself a tiny, wireless pedometer. It counts daily steps accurately, measures how many flights of stairs you’ve climbed, and has a nice stopwatch. It also has a clock, a fairly useless calorie burn guesstimator, and a few other features. The stopwatch is useful for timing how long you’ve been asleep — press and hold the button on the front and the stopwatch starts, press and hold the button and it stops. If the stopwatch runs for an extended period, fitbit figures out you were asleep and records it as such.

What’s most cool is the website. When you walk past the basestation that comes with your fitbit, your data is uploaded to fitbit.com. You can then inspect the data online, including step totals for the week, badges you win for hitting milestones, lifetime achievements, average sleep duration, and more. For me, there’s a healthy competition with friends I’ve connected to on fitbit: who did the most steps this week and where am I ranked. You get a weekly email on Tuesdays with a summary of last week’s performance.

The fitbit leaderboard at the fitbit.com website. If you own a fitbit, compete with your friends.

I can’t say I’m achieving my step goals every week, but I love how the fitbit motivates me to move.

TRX

The TRX Suspension Trainer or TRX is a new essential in my fitness arsenal. I throw it in my carry-on luggage when I travel, and toss it in the car when I hit the running track. It’s around $200.

The TRX is simple: two handles attached to each end of a strap, with an anchor point in the middle. You attach the anchor point to a stable, high mounting point, and then use the handles to work out. It’s a cousin of men’s gymnastic rings. You can attach it to a tree, monkey bars, a chin up bar in the gym, or the (slightly expensive) mounting options that the TRX folks sell.

The TRX is cool because it replaces a variety of other workout gear. You can use it to exercise your chest, back, abs, arms, and much more — it’s a fine alternative to dumbbells, barbells, and the variety of machines in your gym. The bonus is it’s also unstable in a good way — you need to work more muscles to carry out many of the exercises, and so even the humble pushup becomes more of an abs and shoulder stabilization exercise. The video that’s embedded below shows you fifty exercises you can do — it illustrates the amazing versatility, even if a few of the exercises aren’t to my liking.

Chin up bar

When I was in high school, my record number of chin ups was (maybe) three. They’re a lifelong nemesis. But me being me, I like a challenge — so what’s better than installing a chin up bar in your garage, and getting after improving? I’ve tried a few, and the stud bar pullup bar is the standout winner at $140. It’s sturdy, reasonably easy to install, and easily mounted far from walls.

The stud bar pull up bar. It attaches sturdily to the studs in your roof, giving you plenty of clearance from walls.

Chin up bars aren’t just for chin ups, and they don’t just work your lats (the muscles under your armpits). With a forward grip, you sure do work your lats, but you also work your core muscles and more. With a reverse grip, your biceps come into play. And there’s lots of great abs exercises you can do by hanging from the bar, and lifting, raising, or rotating your knees. If you want a strong core, it’s a great investment.

Resistance Bands

Resistance bands are rubber bands with (usually) handles at each end. Similarly to the TRX, they’re a versatile way of working muscles in a way that doesn’t require iron. They’re almost as portable as the TRX – easy to throw in a bag when you’re travelling. My favorites are from bodylastics.com. For $36, you can buy their entry-level set – and, honestly, I wouldn’t buy their more expensive ones (unless you’re super strong, or you want to work out with a partner frequently).

A truly random picture of a few resistance band exercises. It shows you that pulling the ends of a long rubber band is a versatile way to exercise your body

The idea is fairly simple: pull the handle, stretch the band, work one or more muscles. For example, you can wrap a band around a pole, and pull the handles on each end toward your hips to work your back muscles. The bodylastics products come with a nice booklet that illustrates tens of exercises, and has a few suggested routines for those interested in different sports and with different levels of experience. YouTube is also full of resistance band workouts.

Agility Ladder

An agility ladder is a set of plastic straps that are held together on either side by a rope or strap to make a ladder-like apparatus. You lay it out on a floor or path, and then run through it in a variety of different ways. Indeed, “run” is a gross generalization: there’s tens of complicated ways to traverse the length of the ladder, many involving complex aerobics-like moves. The benefit is a cardio, brain, and agility workout – you work up a sweat while also teaching your body how to react, accelerate, and move in patterns. They’re incredibly portable: they stow away in a small bag that’s easily thrown into your luggage.

Three guys making their way through an agility ladder. It’s fun to follow someone else — a great way to learn, and challenge yourself to a race

I’ve got a list in my head of around thirty different moves I do with an agility ladder — I do each one up and back, catch my breath, and hit the next. It’s a buzz, and doubly-so if you’ve got headphones, music that you can keep pace with, and you’re in the mood to push yourself.

Honorable Mentions

I’m disappointed I couldn’t squeeze in my iPod, jump rope, medicine ball, Bowflex 1090 dumbbells, or some humble cones. If this post gets more than a few views, I’ll post my top ten someday soon. See you all next week (and apologies for the intermittent posts this month — work is super busy).

Bing vs. Google

The Bing folks launched their new bingiton challenge today. It’s an anonymized (well, almost) taste test of Google versus Bing for queries that you supply. The challenge is to try five queries, and see how often Bing beats Google.

My results from the Bing It On challenge. Google 3, Bing 2.

You can see what happened for me: Google 3, Bing 2. Bing claims this isn’t typical; I’ll let you try it and see if they’re right. They claim Bing beats Google 2:1 in their tests.

Here’s why Google and Bing won their respective queries for me:

  • Gold Base bobblehead. Google won this hands down; it’s all down to the first result. They show a definitive site with a list of the gold base baseball bobbleheads of the 1960s. Bing whiffs with two eBay links in positions one and two (much as I love eBay, that isn’t what I’m looking for)
  • Hugh Williams. Come on, we all try looking for ourselves. Bing wins here: they have a link to my site as the first result, but it’s the presentation that makes it a winner – they include an image, a link to my LinkedIn page, and my email address all in a single result. Google whiffs with a link to the actor’s Wikipedia page, and some much less attractive links to pages about me in their later results
  • Bobby Valentine. Was checking how fresh the indexes are, and it’s a dead heat — they’ve both got the latest news and great results. Google wins for a slightly more attractive presentation of the images throughout the page
  • Starbucks Sunnyvale. Let’s test who’s best at local queries. Again, it’s close to a dead heat — both do a great job presenting information about Starbucks locations in Sunnyvale in the first half of the page. What makes the difference is Google’s presentation of Yelp results that are visual and helped me choose a Starbucks, while Bing presented some fairly useless results in the lower half of the page. Minor victory to Google
  • The Shock of the Lightning Video. Let’s test who gets me to my multimedia best. Easy win here to Bing, their nice presentation of a strip of video results is a slam dunk winner over Google’s one row per video, YouTube-centric presentation

Google wins, but not by a huge margin. What’s not fair is that the Bing It On challenge takes the query-completing autosuggest feature out of play, and also Google’s instant search. Personalization also disappears, though that’s not a bad thing. The pages are also incomplete, so you can’t quite use search in the way you might. But, all up, it’s a reasonable way to compare the two.

What happens when you try it? Is it the Google habit for you, or are you thinking about a switch to Bing?