Author Archives: Hugh E. Williams

About Hugh E. Williams

Search engine guy, Engineer, Executive, Father, and eBay-er.

Changing platforms at Scale: Lessons from eBay

At eBay, we’ve been on a journey to modernize our platforms, and rebuild old, crufty systems as modern, flexible ones.

In 2011, Sri Shivananda, Mark Carges, and I decided to modernize our front-end development stack. We were building our web applications in v4, an entirely home-grown framework. It wasn’t intuitive to new engineers who joined eBay: they were familiar with industry standards we didn’t use. We’d also built very tightly coupled balls of spaghetti code over many years. (That’s not a criticism of our engineers – every system gets crufty and unwieldy eventually; software has an effective lifespan and then it needs to be replaced.)

Sri Shivananda, eBay Marketplaces’ VP of Platform and Infrastructure

We set design goals for our new Raptor framework, including that we wanted to do a better job separating presentation from business logic. We also wanted better tools for engineers, faster code build times, better monitoring and alerting when problems occur, the ability to test changes without restarting our web servers, and a framework that was intuitive to engineers who joined from other companies. It was an ambitious project, and one that Sri has led as a successful revolution in the Marketplaces business. We now build software much faster than ever before, and we’ve rewritten major parts of the front-end systems. (And we’ve open sourced part of the framework.)

That’s the context, but what this post is really about is how you execute a change in platforms in a large company with complex software systems.

The “Steel Thread” Model

Mark Carges, eBay's Chief Technical Officer

Our CTO, Mark Carges, advocates building a “steel thread” use case when we rethink platforms. What he means is that when you build a new platform, build it at the same time as a single use case on top of the platform. That is, build a system with the platform, like a steel thread running end-to-end through everything we do.

A good platform team thinks broadly about all systems that’ll be built on the platform, and designs for the present and the future. The risk is they’ll build the whole thing – including features that no one ultimately needs for use cases that are three years away. Things change fast in this world. Large platform projects can go down very deep holes, and sometimes never come out.

The wisdom of the “steel thread” model is that the platform team still does the thinking, but it’s pushed by an application team to only fully design and build the parts that are immediately needed. The tension forces prioritization, tradeoffs, and a pragmatism in the platform team. Once you’re done with the first use case, you can move on to subsequent ones and build more of the platform.

Rebuilding the Search Results Page

Our first steel thread use case on Raptor was the eBay Marketplaces Search Results Page (the SRP). We picked this use case because it was hard: it’s our second-most trafficked page, and one of our most complex; building the SRP on Raptor would exercise the new platform extensively.

We co-located our new Raptor platform team – which was a small team by design – together with one of our most mission critical teams, the search frontend team. We declared that their success was mutually dependent: we’re not celebrating until the SRP is built on Raptor.

We asked the team to rebuild the SRP together. We asked for an aggressive timeline. We set bold goals. But there was one twist: build the same functionality and look-and-feel as the existing SRP. That is, we asked the team to only change one variable: change the platform. We asked them not to change an important second variable: the functionality of the site.

This turned out to be important. The end result – after much hard work – was a shiny new SRP code base:  modular, cleaner, simpler, and built on a modern platform.  But it looked and behaved the same as the old one. This allowed us to test one thing: is it equivalent for our customers to the old one?

Testing the new Search Results Page

We ran a few weeks of A/B tests, where we showed different customer populations the old and new search results page. Remember, they’re pretty much the same SRPs from the customers’ perspective. What we were looking for were subtle problems: was the new experience slower for some scenarios than the old one? Did it break in certain browsers on some platforms? Was it as reliable? Could we operate it as effectively? We could compare the populations and spot the differences reasonably easily.

This was a substantial change in our platforms and systems, and the answer wasn’t always perfect. We took the new SRP out of service a few times, fixed bugs, and put it back in. Ultimately, we deemed it a fine replacement in North America, and turned it on for all our customers in North America. The next few months saw us repeat the process across our other major markets (where there are subtle differences between our SRPs).

What’s important is that we didn’t change the look and feel or functionality at first: if we’d done that, we may not have seen several of the small problems we did see as fast as we saw them.

Keeping the old application

Another wise choice was we didn’t follow that old adage of “out with the old, and in with the new”. We kept the old SRP around running in our data centers for a few months, even though it wasn’t in service.

This gave us a fallback plan: when you make major changes, it’s never going to be entirely plain sailing. We knew that the new SRP would have problems, and that we’d want to take it out of service. When we did, we could put the old one back in service while we fixed the problem.

Eventually, we reached the confidence with our new SRP that we didn’t need the old one. And so it was retired, and the hardware put to other uses. That was over a year ago – it has been smooth sailing since.

The curse of dual-development

You might ask why we set bold goals and pushed the teams hard to build the new Raptor platform and the SRP. We like to do that at eBay, but there’s also a pragmatic reason: while there are two SRP code bases, there’s twice the engineering going on.

Imagine that we’ve got a new idea for an improvement to the SRP. While we’re building the new SRP, the team has to add that idea to the new code base.  The team also has to get the idea into the old code base too – both so we can get it out to our customers, and so that we can carry out that careful testing I described earlier.

To prevent dual development slowing down our project, we declared a moratorium on features in the SRP for a couple of months. This was tough on the broader team – lots of folks want features from the search team, and we delayed all requests. The benefit was we could move much faster in building the new SRP, and getting it out to customers. Of course, a moratorium can’t go on for too long.

And then we changed the page

After we were done with the rollout, the SRP application team could move with speed on modernizing the functionality and look-and-feel of the search results page.

Ultimately, this became an important part of eBay 2.0, a  refresh of the site that we launched in 2012. And they’re now set up to move faster whenever they need to: we are testing more new ideas that improve the customer experience than we’ve been able to before, and that’s key to the continued technology-driven revolution at eBay.

See you next week.

My Tesla Model S

Beta testing the Tesla Model S

My Tesla Model S

I bought a Tesla Model S earlier this year. It’s a dream car: comfortable, responsive, spacious, and great looking. It’s a total geek dream gadget, and I feel good about owning an environmentally sensible electric car. It’s 95% of the way to perfect – and it’s fun being part of the ongoing experiment to find the last 5%.

Scheduled Software Updates

Tesla updates the car occasionally – the car has a 3G cell connection. A dialog box on the massive 17” screen says an update is available, you schedule it, and wake up to an improved car. It’s like updating iOS on your iPhone. Indeed, it’s very similar – your car could be quite different after the update, and it’s clear the car is designed to be a flexible software-driven platform. This is mostly where the beta testing feeling comes in.

Scheduled Charging

Scheduled charging. My car is configured to begin charging at 1am when it’s plugged in at home.

The most recent update added scheduled charging. You plug the car into its charge point, and it’ll start charging when you tell it – this allows you to take advantage of lower electricity rates in the early hours of the morning. What’s cool is that it is location-aware: you can set different charge behaviors for different locations, and the car remembers those. So, for example, you could have it charge as soon as it’s plugged in at work, and beginning at 1am at home – and once it’s set, it just works. Pretty neat. (I’m glad this feature arrived – I was beginning to figure out how to install a timer on my 50 Amp 220 volt plug at home.)

Plugged in for charging with the mobile charging cable.

I actually got this new feature about a week ahead of everyone else. How? Well, I scheduled an update and it failed. I woke up to a dialog box that told me to call Tesla Service. The climate control didn’t work, the odometer read 0 miles, and a few other things were a little off – but the car was completely drivable. I called Tesla service, dreading the need to take it to their service center – but it was way simpler than that. The guy on the phone asked me when I’d next have the car parked for a couple of hours. They later logged into my car, remarking that “the packages were all there but didn’t unpack properly” (suggesting a Linux flavor to the car), and “cleaned things up”. When I got back to the car, all was great – everything back to normal, and I’m the first guy on the block with the latest software that includes scheduled charging.

Climate Control Problems

Climate control must be a harder problem than you’d think. It’s entirely automatic by default: you set the temperature, and the Model S looks after maintaining it. However, I get blasted with cold air most of the time – if you jump in the car when it’s warm outside, and ask for 70 degrees inside, it’ll get you there as fast as it can. And once it’s there, it’ll lower the fan speed until (I guess) it gets a couple of degrees warmer, and then it’ll Arctic blast again. It always feels like it’s not quite doing what I want – sometimes 70 degrees feels rather too warm, and other times I’m freezing. There must be subtlety in making this an awesome feature (maybe other car companies took a long time to get this right?): you want the occupants to be comfortable as soon as possible, but you also want them to have a pleasant time getting there. I bet there’s a software update coming.

Spinal Tap humor: the volume control and even the climate control fan settings go all the way to 11

The web browser and nav apps fall short

The giant 17” screen includes a web browser and a navigation application. The browser is about as basic as you’ll get: it doesn’t have autocomplete (with much-needed spelling correction), it doesn’t save form data, and it randomly seems to lose its history and cookies. It’s also got problems with its touch interface: you need to press a little above any link you want to click, and often a few times. The navigation application is ok, but has a few quirks: it’s always oriented so that north is facing up, which isn’t how I like to use navigation, and traffic data seems to update on its own frequency (even if you turn traffic on and off) – which can lead you into a jam. I am not quite sure whether the traffic data is used to determine routes, though I suspect not yet. It’s certainly not possible to tell the navigation app whether you’d prefer a faster, shorter, non-highway, or highway route, as you can in many other nav tools.

If the 17” screen has issues, you can reboot it by holding the two scroll wheels on the steering wheel. You can do this while you’re driving. You can reboot the screen behind the steering wheel separately by holding the two buttons above the scroll wheels. Again, no problem while you’re driving. This suggests there are several physical or virtual machines in the Model S – at least one for each of the screens, and more behind running what’s needed to drive the car.

Am I unhappy? No. The future has arrived early – a car that’s as much software as hardware, and that can be iterated on and improved without you going near a service center. Is it entirely baked? Not yet. Do I love my Tesla Model S? Best car I’ve owned easily.

See you next week.

By the way, while I own a Tesla, I don’t own any shares in the company nor do I plan to buy any. I wish I did, after their spectacular rise in the past couple of weeks.

eBay Open House in Seattle on Tuesday May 14

eBay's Ken Moss, VP and General Manager of eBay's Seattle office

Would you like to learn about some of the projects underway in the eBay Seattle office and have the chance to mingle with our leadership team and engineers?

eBay Seattle invites you to a keynote on Tuesday May 14 2013 by one of our Vice Presidents, Ken Moss.  The team will also present an overview of eBay’s architecture and a tour of our endeavors in the big data realm. The event is at the Meydenbauer Center in Bellevue, WA.

Please RSVP at http://ebayseattle.eventbrite.com.

Five Myths about Hash Tables

A hash table is a data structure that is used to search for exact matches to a search key. For searching, it works like this:

  1. Take a search key (example: the word “cat”)
  2. Hash the search key: pass the key to a function that returns an integer value (example: the hash function returns 47 when the input key is “cat”)
  3. Use the integer value to inspect a slot to see if it contains the search key (example: look in an array at element 47)
  4. If the key matches, great, you’ve found what you’re looking for and you can retrieve whatever metadata you’re looking for (example: in array element 47, we’ve found the word “cat”. We retrieve metadata that tells us “cat” is an “animal”)
  5. If the key doesn’t match and the slot contains something besides the key, carry out a secondary search process to make sure the search key really isn’t stored in the hash table; the secondary search process varies depending on the type of hashing algorithm used (example: in array element 47, we found the word “dog”. So, let’s look in slot 48 and see if “cat” is there. Slot 48 is empty, so “cat” isn’t in the hash table. This is called linear probing, and it’s one kind of secondary search)
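The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than production code: the table size, the `toy_hash` function, and the `insert` and `lookup` names are all made up for the example, and the slot numbers it produces won't match the 47 used above.

```python
# A toy open-addressing hash table that resolves collisions by linear probing.
# TABLE_SIZE, toy_hash, insert, and lookup are illustrative names (assumptions).

TABLE_SIZE = 101
table = [None] * TABLE_SIZE  # each slot holds None or a (key, metadata) pair


def toy_hash(key: str) -> int:
    """Step 2: turn the key into an integer slot number."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h % TABLE_SIZE


def insert(key: str, metadata: str) -> None:
    slot = toy_hash(key)
    for _ in range(TABLE_SIZE):
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, metadata)
            return
        slot = (slot + 1) % TABLE_SIZE  # slot taken: try the next one
    raise RuntimeError("table is full")


def lookup(key: str):
    slot = toy_hash(key)                  # steps 1 and 2
    for _ in range(TABLE_SIZE):
        entry = table[slot]               # step 3: inspect the slot
        if entry is None:
            return None                   # empty slot: the key is not stored
        if entry[0] == key:
            return entry[1]               # step 4: match, return the metadata
        slot = (slot + 1) % TABLE_SIZE    # step 5: linear probing
    return None                           # probed every slot without a match


insert("cat", "animal")
insert("dog", "animal")
print(lookup("cat"))    # animal
print(lookup("fish"))   # None
```

The bounded `for` loops are a deliberate choice: even if the table fills up, a search gives up after inspecting every slot once.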

I had a minor obsession with hash tables for ten years.

The worst case is terrible

Engineers irrationally avoid hash tables because of the worst-case O(n) search time. In practice, that means they’re worried that everything they search for will hash to the same value; for example, imagine hashing every word in the English dictionary, and the hash function always returning 47. Put differently, if the hash function doesn’t do a good job of randomly distributing search keys throughout the hash table, the searching will become scanning instead.

This doesn’t happen in practice. At least, it won’t happen if you make a reasonable attempt to design a hash function using some knowledge of the search keys. Indeed, hashing is usually in the average case much closer to O(1). In practice, that means almost always you’ll find what you’re looking for on the first search attempt (and you will do very little secondary searching).
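To make the difference concrete, here is a small experiment, assuming a made-up set of 1,000 keys and 2,048 slots. It counts how many keys pile up in the busiest slot under the always-47 hash function and under a simple multiplicative one. The function names and the multiplier of 31 are illustrative choices only.

```python
# Compare the worst pile-up of keys in one slot under two hash functions.
# constant_hash, spread_hash, and longest_pileup are illustrative names.
from collections import Counter

SLOTS = 2048
keys = [f"key{i}" for i in range(1000)]


def constant_hash(key: str) -> int:
    return 47  # the pathological case: every key lands in slot 47


def spread_hash(key: str) -> int:
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    return h % SLOTS


def longest_pileup(hash_fn) -> int:
    """Largest number of keys that hash to any single slot."""
    counts = Counter(hash_fn(k) for k in keys)
    return max(counts.values())


print(longest_pileup(constant_hash))  # 1000: every search becomes a scan
print(longest_pileup(spread_hash))    # a small number
```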

Hash tables become full, and bad things happen

The story goes like this: you’ve created a hash table with one million slots. Let’s say it’s an array. What happens when you try and insert the 1,000,001st item? Bad things: your hash function points you to a slot, it’s full, so you look in another slot, it’s full, and this never ends — you’re stuck in an infinite loop. The function never returns, CPU goes through the roof, your server stops responding, and you take down the whole of _LARGE_INTERNET_COMPANY.

This shouldn’t happen if you’ve spent time on design.

Here’s one solve: there’s a class of hash tables that deal with becoming full. They work like this: when the table becomes x% full, you create a new hash table that is (say) double the size, and move all the data into the new hash table by rehashing all of the elements that are stored in it. The downside is you have to rehash all the values, which is an O(n) [aka linear, scanning, time consuming] process.
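A rough sketch of that first approach follows. The class name, the starting size of 8, and the 50% threshold are assumptions picked for the example, and Python's built-in `hash` stands in for a hand-written hash function.

```python
# A linear-probing table that doubles in size once it is more than half full.
# GrowingTable, MAX_LOAD, and the starting size are illustrative assumptions.

class GrowingTable:
    MAX_LOAD = 0.5  # the "x% full" trigger

    def __init__(self, size: int = 8):
        self.slots = [None] * size
        self.count = 0

    def _find_slot(self, key: str) -> int:
        """Return the slot holding key, or the empty slot where it belongs."""
        slot = hash(key) % len(self.slots)
        while self.slots[slot] is not None and self.slots[slot][0] != key:
            slot = (slot + 1) % len(self.slots)
        return slot

    def _grow(self) -> None:
        old = [entry for entry in self.slots if entry is not None]
        self.slots = [None] * (2 * len(self.slots))
        for key, value in old:  # the O(n) rehash of every stored element
            self.slots[self._find_slot(key)] = (key, value)

    def put(self, key: str, value) -> None:
        slot = self._find_slot(key)
        if self.slots[slot] is None:
            self.count += 1
        self.slots[slot] = (key, value)
        if self.count > self.MAX_LOAD * len(self.slots):
            self._grow()

    def get(self, key: str):
        entry = self.slots[self._find_slot(key)]
        return entry[1] if entry is not None else None


t = GrowingTable()
for i in range(100):
    t.put(f"item{i}", i)
print(len(t.slots))      # 256
print(t.get("item42"))   # 42
```

Because the table grows before it can fill, the probing loop in `_find_slot` always reaches an empty slot, which is what rules out the infinite loop described above.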

Here’s another solve: instead of storing the metadata in the array, make your array elements pointers to a linked list of nodes that contain the metadata. That way, your hash table can never get full. You hash the key to find a slot, and then traverse the linked list looking for a match; traversing the linked list is your secondary search. Here’s a diagram that shows a so-called chained hash table:

A chained hash table, showing “John Smith” and “Sandra Dee” both in slot 152 of the hash table. You can see that they’re chained together in a linked list. Taken from Wikipedia, http://en.wikipedia.org/wiki/Hash_table

Chained hash tables are a very good idea. Problem solved.
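Here is a minimal sketch of a chained hash table, assuming Python's built-in `hash` as the hash function. The `Node` and `ChainedTable` names and the slot counts are invented for the example.

```python
# A chained hash table: each slot points at a linked list of nodes.
# Node, ChainedTable, and the slot counts are illustrative assumptions.

class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    def __init__(self, num_slots: int = 256):
        self.slots = [None] * num_slots  # an array of "pointers", all empty

    def _slot(self, key) -> int:
        return hash(key) % len(self.slots)

    def put(self, key, value) -> None:
        i = self._slot(key)
        node = self.slots[i]
        while node is not None:            # update in place if already present
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.slots[i] = Node(key, value, self.slots[i])  # push onto the chain

    def get(self, key):
        node = self.slots[self._slot(key)]
        while node is not None:            # the secondary search: walk the chain
            if node.key == key:
                return node.value
            node = node.next
        return None


phone_book = ChainedTable(num_slots=4)     # tiny on purpose, to force chaining
for i in range(20):
    phone_book.put(f"person{i}", f"555-{i:04d}")
print(phone_book.get("person7"))           # 555-0007
```

Twenty entries fit in four slots without trouble. The chains just get longer, which is why the sizing advice still matters for speed.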

By the way, I recommend you create a hash table that’s about twice the size of the number of elements you expect to put into the table. So, suppose you’re planning on putting one million items in the table, go ahead and create a table with two million slots.

Trees are better

In general, trees aren’t faster for searching for exact matches. They’re slower, and here’s peer-reviewed evidence that compares B-trees, splay trees, and hashing for a variety of string types. So, why use a tree at all? Trees are good for inexact matches — say finding all names that begin with “B” — and that’s a task that hash tables can’t do.
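The difference between the two kinds of query can be shown with a short sketch. Here a sorted list searched with `bisect` stands in for an ordered tree (an assumption made to keep the example short), and the names and the `names_starting_with` helper are invented.

```python
# Exact match with a hash table (dict) versus a prefix query on sorted keys.
# The names list and names_starting_with are made up for illustration; a sorted
# list searched with bisect stands in for an ordered tree.
from bisect import bisect_left

names = sorted(["Alice", "Barbara", "Bob", "Brenda", "Carol", "Dave"])
lookup = {name: len(name) for name in names}   # hash table: exact matches only


def names_starting_with(prefix: str) -> list:
    start = bisect_left(names, prefix)          # first key >= prefix
    end = start
    while end < len(names) and names[end].startswith(prefix):
        end += 1
    return names[start:end]


print(lookup.get("Bob"))             # 3
print(names_starting_with("B"))      # ['Barbara', 'Bob', 'Brenda']
```

The prefix query works only because the keys are kept in order. A hash table scatters them on purpose, so the same question would mean scanning every slot.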

Hash functions are slow

I’ve just pointed you to research that shows that must not be true. But there is something to watch out for: don’t use the traditional modulo based hash function you’ll find in your algorithms text book; for example, here’s Sedgewick’s version, note the “% M” modulo operation that’s performed once per character in the input string. The modulo is used to ensure that the hash value falls within the size of the hash table — for example, if we have 100 slots and the hash function returns 147, the modulo turns that into 47. Instead, do the modulo once, just before you return from the hash function.

Here’s the hash function I used in much of my research on hashing. You can download the code here.

A very fast hash function written in C. This uses bit shifts, and a single modulo at the end of the function. If you find a faster one, would love to hear about it.
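The contrast can be sketched as follows. Neither function is the code referred to above: the shift amounts, the seed, and the function names are assumptions chosen to illustrate "modulo per character" versus "a single modulo at the end". The sketch is in Python to show the structure only. The speed difference is a property of compiled code such as C, and Python won't reproduce it.

```python
# Two string hash functions that differ in how often they take a modulo.
# Neither is the author's code; the shift amounts, seed, and function names
# are assumptions for illustration.

MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic, as C would use


def modulo_per_char_hash(key: str, table_size: int) -> int:
    """Textbook style: one modulo for every character of the key."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h


def shift_hash(key: str, table_size: int, seed: int = 1159241) -> int:
    """Shifts and XORs per character, then a single modulo at the end."""
    h = seed
    for ch in key:
        h ^= ((h << 5) + ord(ch) + (h >> 2)) & MASK32
    return (h & 0x7FFFFFFF) % table_size


print(shift_hash("cat", 100) < 100)   # True: the final modulo bounds the result
```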

Hash tables use too much memory

Again, not true. The same peer-reviewed evidence shows that a hash table uses about the same memory as an efficiently implemented tree structure. Remember that if you’re creating an array for a chained hash table, you’re just creating an array of pointers — you don’t really start using significant space until you’re inserting nodes into the linked lists that are chained from the hash table (and those nodes typically take less space than nodes in a tree, since they only have one pointer).

One Small Trick

If you want your hash tables to really fly for searching, move any node that matches your search to the front of the chained list that it’s in. Here’s more peer-reviewed evidence that shows this works great.
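A minimal sketch of the move-to-front trick, again with invented class names and Python's built-in `hash`:

```python
# Move-to-front on a chained hash table: a successful search relinks the
# matching node at the head of its chain. Names here are illustrative.

class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class MoveToFrontTable:
    def __init__(self, num_slots: int = 256):
        self.slots = [None] * num_slots

    def put(self, key, value) -> None:
        # Assumes key is not already stored; pushes a new node onto the chain.
        i = hash(key) % len(self.slots)
        self.slots[i] = Node(key, value, self.slots[i])

    def get(self, key):
        i = hash(key) % len(self.slots)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.key == key:
                if prev is not None:            # not already at the front
                    prev.next = node.next       # unlink from current position
                    node.next = self.slots[i]   # relink ahead of the old head
                    self.slots[i] = node
                return node.value
            prev, node = node, node.next
        return None


table = MoveToFrontTable(num_slots=1)   # one slot, so every key shares a chain
for word in ["the", "cat", "sat"]:
    table.put(word, len(word))
table.get("the")                        # "the" was last in the chain
print(table.slots[0].key)               # the
```

Frequently searched keys drift toward the front of their chains, so the common case needs fewer comparisons.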

Alright. Viva Hash Tables! See you next time.

Why Facebook shouldn’t have dumped HTML5

We all want to build fast, reliable mobile apps. Facebook couldn’t make its HTML5 mobile app deliver on that goal, and decided to build its own native app. In practice, this means retiring an app that is a browser-like shell that renders web pages (a thin client), and launching a fully-fledged app on the mobile device (a thick client, with apparently a sprinkling of HTML5 inside).

That’s a step backwards, and flies in the face of history. Haven’t we just been through a twenty-year evolution of thicker clients in general being replaced by thinner clients? How many apps did you install on your PC this year compared to ten years ago? How many tabs do you have open in your browser?

Should Facebook have just made the web faster and more reliable? Rather than mostly abandon HTML5, why didn’t they evolve the standard and make the web better? It wouldn’t be the first time that has happened: I’m sure many of you remember web standards evolving in the 1990s, and you can thank those days for the better experiences we all have today. So, an opportunity lost. But I am sure the story is not over, and it sounds like their new app is in fact a “hybrid app” (where there is some HTML5 inside the native app’s framework).

This change also makes experimentation much harder. On the web, most major companies are running test versus control experiments or A/B tests. We put a population of users in an experiment, and compare their behaviors with those who aren’t in the test — for example, at eBay, we try out improvements to search ranking on a small population of customers and compare their behaviors to those who are seeing the regular results. The great thing about the web is you can do this fast — pretty much as fast as you can code and test the changes — and test large numbers of simultaneous variants. The outcome is you make fast progress in improving your product on behalf of your customers.
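One common way to split users between test and control on the server side is to hash their IDs. The sketch below is a generic illustration under stated assumptions, not a description of how eBay or anyone else does it: the experiment name, the 10% test share, and the `assign_group` function are all invented.

```python
# Deterministic assignment of users to experiment groups by hashing their IDs.
# The experiment name, the 10% test share, and the function name are
# assumptions for illustration, not a description of any real system.
import hashlib


def assign_group(user_id: str, experiment: str, test_share: float = 0.10) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000   # roughly uniform value in [0, 1)
    return "test" if bucket < test_share else "control"


groups = [assign_group(f"user{i}", "new-ranking") for i in range(10_000)]
print(groups.count("test") / len(groups))   # close to 0.10
```

Because the assignment depends only on the user ID and the experiment name, a user sees the same experience on every visit, and changing the split is a server-side change rather than a new app release.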

Building native apps makes experimentation harder. You could build an “A” and a “B” experience into an iPhone native app, get it through Apple’s approval process, and try out the two experiences on the customers. But the barrier to entry is much higher — you can only run a couple of experiments, and you probably only release once a month at most. You’re not going to evolve your application as fast as your customers want.

There are always tensions and tradeoffs: in this case it is speed and reliability on one side, and the future of the web and experimentation on the other. I would have fought hard to stay in the latter camp.

Five fitness gadgets I love

My passion is fitness, and part of fueling the passion is having the right gadgets to stay motivated, work hard, and enjoy what I do. Here’s my top five (which is subject to change any year).

FitBit

I’m not the first to put the fitbit at the top of a list — mashable did it just two weeks ago.

The original fitbit. A small, clever, wireless pedometer that’ll keep you motivated

For $99 you get yourself a tiny, wireless pedometer. It counts daily steps accurately, measures how many flights of stairs you’ve climbed, and has a nice stopwatch. It also has a clock, a fairly useless calorie burn guesstimator, and a few other features. The stopwatch is useful for timing how long you’ve been asleep — press and hold the button on the front and the stopwatch starts, press and hold the button and it stops. If the stopwatch runs for an extended period, fitbit figures out you were asleep and records it as such.

What’s most cool is the website. When you walk past the basestation that comes with your fitbit, your data is uploaded to fitbit.com. You can then inspect the data online, including step totals for the week, badges you win for hitting milestones, lifetime achievements, average sleep duration, and more. For me, there’s a healthy competition with friends I’ve connected to on fitbit: who did the most steps this week, and where am I ranked? You get a weekly email on Tuesdays with a summary of last week’s performance.

The fitbit leaderboard at the fitbit.com website. If you own a fitbit, compete with your friends.

I can’t say I’m achieving my step goals every week, but I love how the fitbit motivates me to move.

TRX

The TRX Suspension Trainer or TRX is a new essential in my fitness arsenal. I throw it in my carry-on luggage when I travel, and toss it in the car when I hit the running track. It’s around $200.

The TRX is simple: two handles attached to each end of a strap, with an anchor point in the middle. You attach the anchor point to a stable, high mounting point, and then use the handles to work out. It’s a cousin of men’s gymnastic rings. You can attach it to a tree, monkey bars, a chin up bar in the gym, or the (slightly expensive) mounting options that the TRX folks sell.

The TRX is cool because it replaces a variety of other workout gear. You can use it to exercise your chest, back, abs, arms, and much more — it’s a fine alternative to dumbbells, barbells, and the variety of machines in your gym. The bonus is it’s also unstable in a good way — you need to work more muscles to carry out many of the exercises, and so even the humble pushup becomes more of an abs and shoulder stabilization exercise. The video that’s embedded below shows you fifty exercises you can do — it illustrates the amazing versatility, even if a few of the exercises aren’t to my liking.

Chin up bar

When I was in high school, my record number of chin ups was (maybe) three. They’re a lifelong nemesis. But me being me, I like a challenge — so what’s better than installing a chin up bar in your garage, and getting after improving? I’ve tried a few, and the stud bar pullup bar is the standout winner at $140. It’s sturdy, reasonably easy to install, and easily mounted far from walls.

The stud bar pull up bar. It attaches sturdily to the studs in your roof, giving you plenty of clearance from walls.

Chin up bars aren’t just for chin ups, and they don’t just work your lats (the muscles under your armpits). With a forward grip, you sure do work your lats, but you also work your core muscles and more. With a reverse grip, your biceps come into play. And there are lots of great abs exercises you can do by hanging from the bar, and lifting, raising, or rotating your knees. If you want a strong core, it’s a great investment.

Resistance Bands

Resistance bands are rubber bands with (usually) handles at each end. Similarly to the TRX, they’re a versatile way of working muscles in a way that doesn’t require iron. They’re almost as portable as the TRX, and easy to throw in a bag when you’re travelling. My favorites are from bodylastics.com. For $36, you can buy their entry-level set and, honestly, I wouldn’t buy their more expensive ones (unless you’re super strong, or you want to work out with a partner frequently).

A truly random picture of a few resistance band exercises. It shows you that pulling the ends of a long rubber band is a versatile way to exercise your body

The idea is fairly simple: pull the handle, stretch the band, work one or more muscles. For example, you can wrap a band around a pole, and pull the handles on each end toward your hips to work your back muscles. The bodylastics products come with a nice booklet that illustrates tens of exercises, and has a few suggested routines for those interested in different sports and with different levels of experience. YouTube is also full of resistance band workouts.

Agility Ladder

An agility ladder is a set of plastic straps that are held together on either side by a rope or strap to make a ladder-like apparatus. You lay it out on a floor or path, and then run through it in a variety of different ways; indeed, “run” is a gross generalization, there’s tens of complicated ways to traverse the length of the ladder, many involving complex aerobics-like moves. The benefit is a cardio, brain, and agility workout — you work up a sweat while also teaching your body how to react, accelerate, and move in patterns. They’re incredibly portable, they stow away in a small bag that’s easily thrown into your luggage.

Three guys making their way through an agility ladder. It’s fun to follow someone else — a great way to learn, and challenge yourself to a race

I’ve got a list in my head of around thirty different moves I do with an agility ladder — I do each one up and back, catch my breath, and hit the next. It’s a buzz, and doubly-so if you’ve got headphones, music that you can keep pace with, and you’re in the mood to push yourself.

Honorable Mentions

I’m disappointed I couldn’t squeeze in my iPod, jump rope, medicine ball, Bowflex 1090 dumbbells, or some humble cones. If this post gets more than a few views, I’ll post my top ten someday soon. See you all next week (and apologies for the intermittent posts this month — work is super busy).

Bing vs. Google

The Bing folks launched their new bingiton challenge today. It’s an anonymized (well, almost) taste test of Google versus Bing for queries that you supply. The challenge is to try five queries, and see how often Bing beats Google.

My results from the Bing It On challenge. Google 3, Bing 2.

You can see what happened for me: Google 3, Bing 2. Bing claims this isn’t typical; they claim Bing beats Google 2:1 in their tests. I’ll let you try it and see if they’re right.

Here’s why Google and Bing won their respective queries for me:

  • Gold Base bobblehead. Google won this hands down, it’s all down to the first result. They show a definitive site with a list of the gold base baseball bobbleheads of the 1960s. Bing whiffs with two eBay links in positions one and two (much as I love eBay, that isn’t what I’m looking for)
  • Hugh Williams. Come on, we all try looking for ourselves. Bing wins here, they have a link to my site as the first result, but it’s the presentation that makes it a winner — they include an image, a link to my LinkedIn page, and my email address all in a single result. Google whiffs with a link to the actor’s wikipedia page, and some much less attractive links to pages about me in their later results
  • Bobby Valentine. Was checking how fresh the indexes are, and it’s a dead heat — they’ve both got the latest news and great results. Google wins for a slightly more attractive presentation of the images throughout the page
  • Starbucks Sunnyvale. Let’s test who’s best at local queries. Again, it’s close to a dead heat — both do a great job presenting information about Starbucks locations in Sunnyvale in the first half of the page. What makes the difference is Google’s presentation of Yelp results that are visual and helped me choose a Starbucks, while Bing presented some fairly useless results in the lower half of the page. Minor victory to Google
  • The Shock of the Lightning Video. Let’s test who gets me to my multimedia best. Easy win here to Bing, their nice presentation of a strip of video results is a slam dunk winner over Google’s one row per video, YouTube-centric presentation

Google wins, but not by a huge margin. What’s not fair is that the Bing It On challenge takes the query-completing autosuggest feature out of play, and also Google’s instant search. Personalization also disappears, though that’s not a bad thing. The pages are also incomplete, so you can’t quite use search in the way you might. But, all up, it’s a reasonable way to compare the two.

What happens when you try it? Is it the Google habit for you, or are you thinking about a switch to Bing?

Your action plan: using feedback to drive your career

I recently published this blog post on seeking career feedback. Once you’ve sought feedback, it’s time to make choices about what you’re going to do, share the plan with your manager, and gauge your progress as you work on it.

The negatives

If you’ve asked for feedback, listened, and recorded it, you’re ready to start creating an action plan. You now need to decide on the importance of each constructive piece of feedback. Here are some things to consider:

  • How many times did you hear it? The more times, the more important
  • Who did you hear it from? Worry more about your boss’s opinion than your peers’, and more about your peers’ than anyone else’s
  • When did they say it? The second thing is often the most important – people warm up with a gentle message, and often end later with low priority points
  • Is it a perception or a reality? Did you have an off-day, or an off-interaction that was out of character? Or is this a genuine flaw?
  • Do you think it is correct? Was it on your list?

I recommend finding the top five pieces of feedback, and sorting them from most- to least-important by considering the criteria above. You don’t want to work on too many things at once.

The positives

Don’t take positive feedback for granted. You can use the same techniques as you’ve used for the negatives to create your list of top strengths. Make sure you do this too: figure out your top five strengths.

Creating a Plan

You have a fundamental choice in creating a plan. You can decide to lean hard on your strengths and have them propel you further forward, or you can choose to work on the weaknesses so they don’t hold you back. The right thing to do is usually somewhere in between: focus on improving a couple of weaknesses at any one time, and work on using your strengths to their maximum potential.

One thing I’ve observed is that senior people are usually held back by their weaknesses. In part it’s the Peter Principle, and in part it’s the fact that everyone around them is pretty awesome, and flaws stand out. There’s definitely a point I often see where people get confused – they expect to advance in their career because of the awesome competencies they have, and then suddenly they’re stuck because of their weaknesses. It often takes people a while to accept that a few things need to change, especially if they’ve never heard negative feedback before.

I’d recommend taking your top two or three negatives, and your top one or two strengths, and writing them down as the areas you want to focus on. Bonus points if you sort them into priority order. Now, you need an action plan. Next to each point, write key steps you’ll take to address that weakness, or showcase that strength. The more actionable, the better – it’s not that helpful to say “I’ll improve my public speaking”; it’s helpful to say “give three public talks in 2012, and seek actionable feedback immediately after each presentation”. Try to use quantities, dates, names, situations, or other concrete points in creating the plan. Remember that taking action on weaknesses takes you out of your comfort zone – so the steps should feel hard, awkward, and uncertain.

Executing the plan

If you’ve got an actionable plan, I’d recommend reviewing it with your boss. At the very least, you’re going to look good for having taken career development seriously. You’ll probably also get great feedback on whether this is a good plan or not – and, again, your boss’s opinion matters.

Now you can go execute your plan. Good luck. Keep an eye on your progress: I’d recommend giving yourself a green, yellow, or red rating on each point every six weeks or so. Ask the people who gave you the feedback points whether they’re seeing a change – the door is open with them, and you should use it.

Of course, this is a process that never ends. You can complete the plan successfully, and there’ll be another plan ready to be created by starting over. Good luck, I hope you use feedback successfully in building your career.

Six months, fifty thousand visits

Thanks for traveling on my blog journey. It’s been fun to have you along. It’s great to be writing again.

I’ve had around 50,000 views of my 31 posts so far. Here’s a few other factoids from the journey thus far:

  • The most popular post was about eBay’s size and scale, which I published on June 26. It’s had around 5,000 views, and made it into WordPress’s “Freshly Pressed” section. It’s also received the most comments and likes, and contributed to the blog’s busiest day, June 27
  • The least popular post was about my keynote at the PHP UK 2012 conference. That had 176 views
  • This story about Bing’s image search is one of my favorite posts, and my biggest surprise with only around 400 views
  • Referrals to the blog come from a few popular sources: Twitter (3500 referrals), Google (2200), Facebook (2300), LinkedIn (1400), and WordPress (1200)
  • Referrals to the blog don’t come from Bing (58 referrals), Ask (13), or Yahoo! Search (8)
  • I’ve had exactly 25,000 visits from the United States (where I live), 2600 from the UK, and 2400 from Australia (where I’m originally from)
  • When people search on Google (and subsequently land on a blog page), the most popular queries they type are: ebay.com, five variations of my name, query rewrite, and byte versus bit inverted index
  • There’s a few sites out there that occasionally highlight my posts, which I appreciate very much. The awesome highscalability.com has driven 900 views, Jason Haley’s great “interesting finds” blog has driven 58, and Y Combinator’s Hacker News about 80
  • If gossip in the corridors were a measure, this post about fitness and nutrition caused the biggest stir and got people talking the most
  • Ardent Logophile has offered the most comments on the blog, and thoughtful ones at that. Thanks Ardent!
  • A bizarre factoid: someone translated this post of mine on successful teams into Japanese, and added cool pictures (including Star Wars stormtroopers having a meeting)
  • This is far from the most popular site I own. That honor goes to the (no longer maintained) webdatabasebook.com that I built as a companion to my first book

The open question is what I should write about next. More on search engines? Management? Fitness and nutrition? eBay? Something else? Your thoughts?

Have a great week.

Knowing Your Customer with Data

Are you really data driven? Here’s what I’ve learnt about making decisions using quantitative data.

A Typical Test versus Control Experiment

Let’s get on the same page about what we’re discussing. Most web companies run test versus control experiments, or A/B tests. The idea is simple:

  1. Divide the customers into populations
  2. Show one population the control (default, “A”) experience
  3. Show one or more populations the test (new, altered, “B”) experience
  4. Collect data from each population
  5. Compute metrics from the data
  6. Understand the relative results between the test and the control
  7. Make decisions: either keep the control, or replace it with a new, better experience from a positive test
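
Steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how eBay does it: the function name `assign_population`, the experiment name `new-layout`, the customer IDs, and the 50/50 split are all made up for the example. The idea is to hash a customer ID so the same customer always lands in the same population.

```python
import hashlib


def assign_population(customer_id: str, experiment: str, test_fraction: float = 0.5) -> str:
    """Deterministically place a customer in 'control' (A) or 'test' (B).

    Hypothetical helper: hashing the experiment name together with the
    customer ID keeps a customer's assignment stable across visits, and
    makes assignments in one experiment independent of another.
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "control"


if __name__ == "__main__":
    # Made-up customer IDs, purely to show the split is roughly even.
    counts = {"control": 0, "test": 0}
    for i in range(10000):
        counts[assign_population(f"customer-{i}", "new-layout")] += 1
    print(counts)
```

Hashing rather than picking at random on every visit means a returning customer doesn't flip between experiences, which would muddy the data you collect in Step 4.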

Explaining how to really know your customer with data at the 2012 eBay Data Conference

It’s critical in Step 5 to compute confidence intervals, that is, statistical measures that tell you the probability that the phenomenon you’re seeing is real. For example, using a one-sided t-test, you might learn that there’s a 90% probability that the test experience is better than the control.
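
Here’s a rough sketch of that kind of one-sided comparison, using only Python’s standard library. The function name `one_sided_confidence` and the generated data are assumptions for illustration. It computes Welch’s t statistic, then takes a shortcut: it reads the confidence off a normal curve rather than a t distribution, which is only reasonable when both populations are large. A statistics package would do this step properly.

```python
import math
import random
from statistics import NormalDist, mean, variance


def one_sided_confidence(control, test):
    """Return (t_stat, confidence) that the test mean exceeds the control mean.

    Welch's t statistic for two independent samples. ASSUMPTION: the
    confidence comes from a normal approximation to the t distribution,
    which is only reasonable when both populations are large.
    """
    se = math.sqrt(variance(control) / len(control) + variance(test) / len(test))
    if se == 0:
        raise ValueError("no variation in the data; the statistic is undefined")
    t_stat = (mean(test) - mean(control)) / se
    # One-sided: this is 1 - p, where p is the chance of a statistic at least
    # this large if the two experiences were really the same.
    return t_stat, NormalDist().cdf(t_stat)


if __name__ == "__main__":
    rng = random.Random(7)  # made-up metric values, purely for illustration
    control = [rng.gauss(5.0, 2.0) for _ in range(1000)]
    test = [rng.gauss(5.1, 2.0) for _ in range(1000)]
    t_stat, confidence = one_sided_confidence(control, test)
    print(f"t = {t_stat:.2f}, confidence that test beats control = {confidence:.0%}")
```

Identical populations come out at 50%, which is the sanity check to remember: a result near 50% means the test told you nothing.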

Let’s suppose you’ve reorganized the layout of your site, and what you’ve learnt is that customers abandon the pages much less. Through your test, you’re 90% confident that a new experience you’ve tested is better than the default, control experience. On that basis, you might want to launch the new, test experience — but I’d caution you to learn more before you make a decision.

Where does the behavior come from?

I recommend you always dig deep into your data. Learn as much as you can before you decide. I like to see data “cut” (broken into sub populations) by:

  • Device (Mobile vs. tablet vs. desktop. Break it down by brand, make, and model [for example, Apple iPad HD])
  • Operating system (Linux vs. Mac OS X vs. Windows, break it out by versions)
  • Browser (Chrome vs. IE vs. Firefox vs. Safari, break it out by version)
  • Channel (Visits from within your site vs. visits from Google search vs. Visits from paid advertising)

When you do this, and add in your confidence intervals, you will almost always learn something. Is the new experience working as expected on the dreaded IE6 and IE7? Any issues on a mobile device? Does it work better when customers are navigating within your site versus landing in the middle of it from a Google search?
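
As a sketch of what a “cut” looks like in practice, the code below groups visit records by one dimension and reports the abandonment rate for each segment and population, with a 90% interval. Everything about the data shape is an assumption for illustration: the record keys (`population`, `abandoned`, `browser`), the function name `cut_by`, and the sample records are hypothetical. The interval uses a simple normal approximation, which gets unreliable for small segments.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def cut_by(records, dimension, confidence=0.90):
    """Abandonment rate per (segment, population), with an approximate interval.

    ASSUMPTION: each record is a dict with hypothetical keys 'population'
    ('control' or 'test'), 'abandoned' (bool), and one key per dimension.
    Returns {(segment, population): (rate, low, high, visits)}.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    counts = defaultdict(lambda: [0, 0])  # key -> [abandoned, total]
    for record in records:
        key = (record[dimension], record["population"])
        counts[key][0] += int(record["abandoned"])
        counts[key][1] += 1
    result = {}
    for key, (abandoned, total) in counts.items():
        rate = abandoned / total
        margin = z * math.sqrt(rate * (1.0 - rate) / total)
        result[key] = (rate, max(0.0, rate - margin), min(1.0, rate + margin), total)
    return result


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    records = [
        {"population": "control", "browser": "Chrome", "abandoned": False},
        {"population": "control", "browser": "IE", "abandoned": True},
        {"population": "test", "browser": "Chrome", "abandoned": False},
        {"population": "test", "browser": "IE", "abandoned": True},
        {"population": "test", "browser": "IE", "abandoned": True},
    ]
    for (segment, population), (rate, low, high, visits) in sorted(cut_by(records, "browser").items()):
        print(f"{segment:8} {population:8} {rate:.0%} [{low:.0%}, {high:.0%}] n={visits}")
```

The visit count is returned alongside each rate on purpose: a scary-looking number in a segment with a handful of visits deserves a wide interval and a second look, not a launch decision.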

Ask yourself: what can I improve before I make a decision? And always ask: knowing this detail, am I still comfortable with my decision? Be very careful about launching new experiences that help most of the population, and hurt some of it — ask whether you can live with the worst case experience.

When you do these cuts, make sure the data makes sense. I’ve learnt over the years that when you see something that you don’t expect, it’s almost always a bug, or an error in the data. Never explain away surprises with complex theories — something is probably broken.

Who or what is affected by the change?

You can think of the previous section as suggesting you cut the data funnel — where the behaviors come from. You should also cut the data by who or what it affects on your site:

  • Which customers are affected? (Old versus new, first time visitors versus returning, regular versus occasional, international versus domestic, near versus far, and so on)
  • What categories are affected? (Fashion versus electronics, browse versus buy, and so on)
  • Which queries are affected? (A search-centric view. Long versus short queries, English versus non-English, Navigational versus Informational, and so on)
  • Which sessions are affected? (Long research sessions versus short purchase sessions, multi-query sessions versus single-query sessions, multi-click sessions versus single-click sessions, and so on)
  • Which pages are affected?

All the same caveats and suggestions from the previous section apply here.

I also love to compute many different metrics. While you’ll often have a “north star” metric that you’re trying to move (whether it’s relevance of the experience, abandonment of your site, or the dollar value of goods sold), it’s great to have supporting data to inform your decision. When you compute more metrics, you will almost always see contradictions that make your decisions harder, but it’s always better to know more than to have your head in the sand. It takes smart, sensible debate to make most launch decisions.

The mean average hides the truth

Here’s an over-simplified example. Suppose six customers rate your site on a scale of 1 (horrible) to 10 (amazing). The three in the control rate you as 4, 5, and 6. The three in the test rate you as 1, 4, and 10. The control and test both have a mean average rating of 5. (Ignore the statistical significance for the simple example.)

On this basis, you might abandon the work on the new experience — it’s no better than the control. But if you dig in the data, you’d see that some customers love the new experience, and some hate it. Imagine if you can fix whatever is causing customers to hate it — if you could get that 1 to be a 5, you’d see a mean average of over 6 for the test. The fastest way to move a mean is to fix the outliers: focusing on what’s broken.

I don’t like mean averages because they hide the interesting nuggets. I like to see 90th and 95th percentiles: show me the performance of the best and worst 10% or 5% of customer experiences respectively. In our simple example, I’d love to know that the worst customer experience was 1 in the test and 4 in the control, and the best experience was 10 and 6. Knowing this, I’m excited about the potential of the test, but worried that something is very wrong about it for some customers. That guides me where to put my energy.
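
The arithmetic in the simple example is easy to check. The sketch below uses the six ratings from the example; the helper name `summarize` is just for illustration.

```python
from statistics import mean

control = [4, 5, 6]
test = [1, 4, 10]


def summarize(ratings):
    """Report the mean alongside the worst and best experience."""
    return {"mean": mean(ratings), "worst": min(ratings), "best": max(ratings)}


print(summarize(control))  # mean 5, worst 4, best 6
print(summarize(test))     # mean 5, worst 1, best 10

# Fix whatever made one customer rate the test a 1, so they rate it a 5.
fixed_test = [5 if rating == 1 else rating for rating in test]
print(round(mean(fixed_test), 2))  # 6.33
```

With only three ratings per population, the worst and best values stand in for the percentiles. On a realistic amount of data, `statistics.quantiles(ratings, n=20)` returns the cut points at every 5%, from the 5th to the 95th percentile.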

Don’t be myopic

It’s common to measure your feature in the product, and ignore the ecosystem. For example, you might be working on an improvement on some part of a page — imagine that you’re working on Facebook’s news feed. You’ve figured out an improvement, run the test, seen much better customer engagement, and you’re excited to launch.

But did you worry about what you’ve done to the sponsored links on the right side of the page? Did you hurt the performance of another part of the product owned by another team? It’s common for features to hurt performance of others, and often cause the overall result to be neutral. This happens between features on one page, and between pages. Make sure you always measure overall page and site performance too.

Tests don’t tell you everything

Tests don’t tell you what you don’t measure. Measure as much as you can.

Even if you do measure as much as you can, there’ll be much happening outside your test that’s important. For example, if you run a test for a week, you don’t learn anything about the long term effects on customer retention. You don’t know anything about how customers will adapt to using the feature. You won’t know whether the effects are seasonal, or what might happen if some of your assumptions change — for example, what if another team changes something else on the page or site in the future?

This can be ok. Just realize the limitations, and be aware that retesting in the future might be a smart choice.

Quantitative testing also won’t tell you anything qualitative about what you’re working on. That’s a whole other theme of testing, and one I do plan to come back to talk about in the future.

Afterword

Around 1,000 people attended the employee-only eBay Data Conference recently. I had the opportunity to speak to them through my opening keynote address, and this post is based on that presentation. Thanks to Bob Page for inviting me.