We all want to build fast, reliable mobile apps. Facebook couldn’t make its HTML5 mobile app deliver on that goal, and decided to build its own native app. In practice, this means retiring an app that is a browser-like shell that renders web pages (a thin client), and launching a fully-fledged app on the mobile device (a thick client, with apparently a sprinkling of HTML5 inside).
That’s a step backwards, and flies in the face of history. Haven’t we just been through a twenty-year evolution of thicker clients in general being replaced by thinner clients? How many apps did you install on your PC this year compared to ten years ago? How many tabs do you have open in your browser?
Should Facebook have just made the web faster and more reliable? Rather than mostly abandon HTML5, why didn’t they evolve the standard and make the web better? It wouldn’t be the first time that has happened: I’m sure many of you remember web standards evolving in the 1990s, and you can thank those days for the better experiences we all have today. So, an opportunity lost. But I am sure the story is not over, and it sounds like their new app is in fact a “hybrid app” (where there is some HTML5 inside the native app’s framework).
This change also makes experimentation much harder. On the web, most major companies are running test versus control experiments or A/B tests. We put a population of users in an experiment, and compare their behaviors with those who aren’t in the test — for example, at eBay, we try out improvements to search ranking on a small population of customers and compare their behaviors to those who are seeing the regular results. The great thing about the web is you can do this fast — pretty much as fast as you can code and test the changes — and test large numbers of simultaneous variants. The outcome is you make fast progress in improving your product on behalf of your customers.
Building native apps makes experimentation harder. You could build an “A” and a “B” experience into an iPhone native app, get it through Apple’s approval process, and try out the two experiences on the customers. But the barrier to entry is much higher — you can only run a couple of experiments, and you probably only release once a month at most. You’re not going to evolve your application as fast as your customers want.
There are always tensions and tradeoffs: in this case it is speed and reliability on one side, and the future of the web and experimentation on the other. I would have fought hard to stay in the latter camp.
My passion is fitness, and part of fueling the passion is having the right gadgets to stay motivated, work hard, and enjoy what I do. Here’s my top five (which is subject to change any year).
FitBit
I’m not the first to put the fitbit at the top of a list — mashable did it just two weeks ago.
The original fitbit. A small, clever, wireless pedometer that’ll keep you motivated
For $99 you get yourself a tiny, wireless pedometer. It counts daily steps accurately, measures how many flights of stairs you’ve climbed, and has a nice stopwatch. It also has a clock, a fairly useless calorie burn guesstimator, and a few other features. The stopwatch is useful for timing how long you’ve been asleep — press and hold the button on the front and the stopwatch starts, press and hold the button and it stops. If the stopwatch runs for an extended period, fitbit figures out you were asleep and records it as such.
What’s most cool is the website. When you walk past the basestation that comes with your fitbit, your data is uploaded to fitbit.com. You can then inspect the data online, including step totals for the week, badges you win for hitting milestones, lifetime achievements, average sleep duration, and more. For me, there’s a healthy competition with friends I’ve connected to on fitbit: who did the most steps this week, and where am I ranked? You get a weekly email on Tuesdays with a summary of last week’s performance.
The fitbit leaderboard at the fitbit.com website. If you own a fitbit, compete with your friends.
I can’t say I’m achieving my step goals every week, but I love how the fitbit motivates me to move.
TRX
The TRX Suspension Trainer or TRX is a new essential in my fitness arsenal. I throw it in my carry-on luggage when I travel, and toss it in the car when I hit the running track. It’s around $200.
The TRX is simple: two handles attached to each end of a strap, with an anchor point in the middle. You attach the anchor point to a stable, high mounting point, and then use the handles to work out. It’s a cousin of men’s gymnastic rings. You can attach it to a tree, monkey bars, a chin up bar in the gym, or the (slightly expensive) mounting options that the TRX folks sell.
The TRX is cool because it replaces a variety of other workout gear. You can use it to exercise your chest, back, abs, arms, and much more — it’s a fine alternative to dumbbells, barbells, and the variety of machines in your gym. The bonus is it’s also unstable in a good way — you need to work more muscles to carry out many of the exercises, and so even the humble pushup becomes more of an abs and shoulder stabilization exercise. The video that’s embedded below shows you fifty exercises you can do — it illustrates the amazing versatility, even if a few of the exercises aren’t to my liking.
Chin up bar
When I was in high school, my record number of chin ups was (maybe) three. They’re a lifelong nemesis. But me being me, I like a challenge — so what’s better than installing a chin up bar in your garage, and getting after improving? I’ve tried a few, and the stud bar pullup bar is the standout winner at $140. It’s sturdy, reasonably easy to install, and easily mounted far from walls.
The stud bar pull up bar. It attaches sturdily to the studs in your roof, giving you plenty of clearance from walls.
Chin up bars aren’t just for chin ups, and they don’t just work your lats (the muscles under your armpits). With a forward grip, you sure do work your lats, but you also work your core muscles and more. With a reverse grip, your biceps come into play. And there are lots of great abs exercises you can do by hanging from the bar, and lifting, raising, or rotating your knees. If you want a strong core, it’s a great investment.
Resistance Bands
Resistance bands are rubber bands with (usually) handles at each end. Similarly to the TRX, they’re a versatile way of working muscles in a way that doesn’t require iron. They’re almost as portable as the TRX, and easy to throw in a bag when you’re travelling. My favorites are from bodylastics.com. For $36, you can buy their entry-level set and, honestly, I wouldn’t buy their more expensive ones (unless you’re super strong, or you want to work out with a partner frequently).
A truly random picture of a few resistance band exercises. It shows you that pulling the ends of a long rubber band is a versatile way to exercise your body
The idea is fairly simple: pull the handle, stretch the band, work one or more muscles. For example, you can wrap a band around a pole, and pull the handles on each end toward your hips to work your back muscles. The bodylastics products come with a nice booklet that illustrates tens of exercises, and has a few suggested routines for those interested in different sports and with different levels of experience. YouTube is also full of resistance band workouts.
Agility Ladder
An agility ladder is a set of plastic straps that are held together on either side by a rope or strap to make a ladder-like apparatus. You lay it out on a floor or path, and then run through it in a variety of different ways. Indeed, “run” is a gross generalization: there are tens of complicated ways to traverse the length of the ladder, many involving complex aerobics-like moves. The benefit is a cardio, brain, and agility workout: you work up a sweat while also teaching your body how to react, accelerate, and move in patterns. They’re incredibly portable, and stow away in a small bag that’s easily thrown into your luggage.
Three guys making their way through an agility ladder. It’s fun to follow someone else — a great way to learn, and challenge yourself to a race
I’ve got a list in my head of around thirty different moves I do with an agility ladder — I do each one up and back, catch my breath, and hit the next. It’s a buzz, and doubly-so if you’ve got headphones, music that you can keep pace with, and you’re in the mood to push yourself.
Honorable Mentions
I’m disappointed I couldn’t squeeze in my iPod, jump rope, medicine ball, Bowflex 1090 dumbbells, or some humble cones. If this post gets more than a few views, I’ll post my top ten someday soon. See you all next week (and apologies for the intermittent posts this month — work is super busy).
The Bing folks launched their new bingiton challenge today. It’s an anonymized (well, almost) taste test of Google versus Bing for queries that you supply. The challenge is to try five queries, and see how often Bing beats Google.
My results from the Bing It On challenge. Google 3, Bing 2.
You can see what happened for me: Google 3, Bing 2. Bing claims this isn’t typical; I’ll let you try it and see if they’re right. They claim Bing beats Google 2:1 in their tests.
Here’s why Google and Bing won their respective queries for me:
Gold Base bobblehead. Google won this hands down, and it’s all down to the first result. They show a definitive site with a list of the gold base baseball bobbleheads of the 1960s. Bing whiffs with two eBay links in positions one and two (much as I love eBay, that isn’t what I’m looking for)
Hugh Williams. Come on, we all try looking for ourselves. Bing wins here, they have a link to my site as the first result, but it’s the presentation that makes it a winner — they include an image, a link to my LinkedIn page, and my email address all in a single result. Google whiffs with a link to the actor’s wikipedia page, and some much less attractive links to pages about me in their later results
Bobby Valentine. Was checking how fresh the indexes are, and it’s a dead heat — they’ve both got the latest news and great results. Google wins for a slightly more attractive presentation of the images throughout the page
Starbucks Sunnyvale. Let’s test who’s best at local queries. Again, it’s close to a dead heat — both do a great job presenting information about Starbucks locations in Sunnyvale in the first half of the page. What makes the difference is Google’s presentation of Yelp results that are visual and helped me choose a Starbucks, while Bing presented some fairly useless results in the lower half of the page. Minor victory to Google
The Shock of the Lightning Video. Let’s test who gets me to my multimedia best. Easy win here to Bing, their nice presentation of a strip of video results is a slam dunk winner over Google’s one row per video, YouTube-centric presentation
Google wins, but not by a huge margin. What’s not fair is that the Bing It On challenge takes the query-completing autosuggest feature out of play, and also Google’s instant search. Personalization also disappears, though that’s not a bad thing. The pages are also incomplete, so you can’t quite use search in the way you might. But, all up, it’s a reasonable way to compare the two.
What happens when you try it? Is it the Google habit for you, or are you thinking about a switch to Bing?
I recently published this blog post on seeking career feedback. Once you’ve sought feedback, it’s time to make choices about what you’re going to do, share the plan with your manager, and gauge your progress as you work on it.
The negatives
If you’ve asked for feedback, listened, and recorded it, you’re ready to start creating an action plan. You now need to decide on the importance of each constructive piece of feedback. Here are some things to consider:
How many times did you hear it? The more times, the more important
Who did you hear it from? Worry more about your boss’s opinion than your peers’, and more about your peers’ than anyone else’s
When did they say it? The second thing is often the most important – people warm up with a gentle message, and often end later with low priority points
Is it a perception or a reality? Did you have an off-day, or an off-interaction that was out of character? Or is this a genuine flaw?
Do you think it is correct? Was it on your list?
I recommend finding the top five pieces of feedback, and sorting them from most- to least-important by considering the criteria above. You don’t want to work on too many things at once.
The positives
Don’t take positive feedback for granted. You can use the same techniques as you’ve used for the negatives to create your list of top strengths. Make sure you do this too: figure out your top five strengths.
Creating a Plan
You have a fundamental choice in creating a plan. You can decide to lean hard on your strengths and have them propel you further forward, or you can choose to work on the weaknesses so they don’t hold you back. The right thing to do is usually somewhere in between: focus on improving a couple of weaknesses at any one time, and work on using your strengths to their maximum potential.
One thing I’ve observed is that senior people are usually held back by their weaknesses. In part it’s the Peter Principle, and in part it’s the fact that everyone around them is pretty awesome, and flaws stand out. There’s definitely a point I often see where people get confused – they expect to advance in their career because of the awesome competencies they have, and then suddenly they’re stuck because of their weaknesses. It often takes people a while to accept that a few things need to change, especially if they’ve never heard negative feedback before.
I’d recommend taking your top two or three negatives, and your top one or two strengths, and writing them down as the areas you want to focus on. Bonus points if you sort them into priority order. Now, you need an action plan. Next to each point, write key steps you’ll take to address that weakness, or showcase that strength. The more actionable, the better – it’s not that helpful to say “you’ll improve your public speaking”, it’s helpful to say “give three public talks in 2012, and seek actionable feedback immediately after each presentation”. Try and use quantities, dates, names, situations, or other concrete points in creating the plan. Remember that taking action on weaknesses takes you out of your comfort zone – so the steps should feel hard, awkward, and uncertain.
Executing the plan
If you’ve got an actionable plan, I’d recommend reviewing it with your boss. At the very least, you’re going to look good for having taken career development seriously. You’ll probably also get great feedback on whether this is a good plan or not – and, again, your boss’s opinion matters.
Now you can go execute your plan. Good luck. Keep an eye on your progress: I’d recommend giving yourself a green, yellow, or red rating on each point every six weeks or so. Ask the people who gave you the feedback points whether they’re seeing a change – the door is open with them, and you should use it.
Of course, this is a process that never ends. You can complete the plan successfully, and there’ll be another plan ready to be created by starting over. Good luck, I hope you use feedback successfully in building your career.
Thanks for traveling on my blog journey. It’s been fun to have you along. It’s great to be writing again.
I’ve had around 50,000 views of my 31 posts so far. Here’s a few other factoids from the journey thus far:
The most popular post was about eBay’s size and scale, which I published on June 26. It’s had around 5,000 views, and made it into WordPress’s “Freshly Pressed” section. It’s also received the most comments and likes, and contributed to the blog’s busiest day, June 27
Referrals to the blog come from a few popular sources: Twitter (3500 referrals), Google (2200), Facebook (2300), LinkedIn (1400), and WordPress (1200)
Referrals to the blog don’t come from Bing (58 referrals), Ask (13), or Yahoo! Search (8)
I’ve had exactly 25,000 visits from the United States (where I live), 2600 from the UK, and 2400 from Australia (where I’m originally from)
When people search on Google (and subsequently land on a blog page), the most popular queries they type are: ebay.com, five variations of my name, query rewrite, and byte versus bit inverted index
This is far from the most popular site I own. That honor goes to the (no longer maintained) webdatabasebook.com that I built as a companion to my first book
The open question is what I should write about next. More on search engines? Management? Fitness and nutrition? eBay? Something else? Your thoughts?
Are you really data driven? Here’s what I’ve learnt about making decisions using quantitative data.
A Typical Test versus Control Experiment
Let’s get on the same page about what we’re discussing. Most web companies run test versus control experiments, or A/B tests. The idea is simple:
1. Divide the customers into populations
2. Show one population the control (default, “A”) experience
3. Show one or more populations the test (new, altered, “B”) experience
4. Collect data from each population
5. Compute metrics from the data
6. Understand the relative results between the test and the control
7. Make decisions: either keep the control, or replace it with a new, better experience from a positive test
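Here’s a minimal Python sketch of the first step, in case it helps make the idea concrete. It shows one common way to divide customers, not a description of how any particular company does it: hash the customer ID so the same customer always lands in the same population. The function name, the experiment name, and the 10% test fraction are all assumptions for illustration.

```python
import hashlib


def assign_population(customer_id: str, experiment: str, test_fraction: float = 0.1) -> str:
    """Deterministically place a customer in 'test' or 'control'.

    Hypothetical helper: the experiment name is mixed into the hash so that
    different experiments split the same customers differently.
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "control"


# The same customer gets the same answer every time they visit.
print(assign_population("customer-42", "new-layout"))
```

Hashing rather than picking at random on each visit means a customer doesn’t flip between experiences mid-experiment, and you don’t need to store the assignment anywhere.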
Explaining how to really know your customer with data at the 2012 eBay Data Conference
It’s critical in Step 5 to compute confidence intervals, that is, statistical measures that tell you the probability that the phenomenon you’re seeing is real. For example, using a one-sided t-test, you might learn that there’s a 90% probability that the test experience is better than the control.
Let’s suppose you’ve reorganized the layout of your site, and what you’ve learnt is that customers abandon the pages much less. Through your test, you’re 90% confident that a new experience you’ve tested is better than the default, control experience. On that basis, you might want to launch the new, test experience — but I’d caution you to learn more before you make a decision.
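To make the abandonment example concrete, here’s a rough Python sketch. Note that it swaps in a one-sided two-proportion z-test (a normal approximation) rather than a t-test, simply because that needs only the standard library. The function name and the visit counts are made up for illustration. It returns one minus the one-sided p-value, which is the number I’m loosely calling confidence.

```python
from math import sqrt
from statistics import NormalDist


def confidence_test_is_better(control_abandons, control_visits, test_abandons, test_visits):
    """One minus the one-sided p-value that the test abandonment rate is lower.

    Hypothetical helper using a pooled two-proportion z-test; it assumes
    the samples are large enough for the normal approximation to hold.
    """
    p_control = control_abandons / control_visits
    p_test = test_abandons / test_visits
    pooled = (control_abandons + test_abandons) / (control_visits + test_visits)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_visits + 1 / test_visits))
    if std_err == 0:
        return 0.5  # no variation at all, so no evidence either way
    z = (p_control - p_test) / std_err
    return NormalDist().cdf(z)


# Made-up counts: 300 of 1,000 control visits abandon, 270 of 1,000 test visits do.
print(round(confidence_test_is_better(300, 1000, 270, 1000), 3))
```

With these invented numbers the result is a little over 90%, which is the kind of figure the paragraph above is talking about.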
Where does the behavior come from?
I recommend you always dig deep into your data. Learn as much as you can before you decide. I like to see data “cut” (broken into sub populations) by:
Device (Mobile vs. tablet vs. desktop. Break it down by brand, make, and model [for example, Apple iPad HD])
Operating system (Linux vs. Mac OS X vs. Windows, break it out by versions)
Browser (Chrome vs. IE vs. Firefox vs. Safari, break it out by version)
Channel (Visits from within your site vs. visits from Google search vs. Visits from paid advertising)
When you do this, and add in your confidence intervals, you will almost always learn something. Is the new experience working as expected on the dreaded IE6 and IE7? Any issues on a mobile device? Does it work better when customers are navigating within your site versus landing in the middle of it from a Google search?
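A cut is just a group-by. As a small Python sketch, assuming each visit is a record with a population, a flag for whether the customer abandoned, and whatever fields you want to cut by (the field names and sample records here are hypothetical):

```python
from collections import defaultdict


def abandonment_by_cut(visits, cut):
    """Abandonment rate per (segment, population) pair for one cut.

    `visits` is an iterable of dicts with 'population' and 'abandoned'
    keys plus the field named by `cut` (for example 'browser').
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [abandoned, total]
    for visit in visits:
        key = (visit[cut], visit["population"])
        if visit["abandoned"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: abandoned / total for key, (abandoned, total) in counts.items()}


# Hypothetical records, far too few to draw conclusions from.
visits = [
    {"population": "control", "browser": "IE6", "abandoned": True},
    {"population": "control", "browser": "IE6", "abandoned": False},
    {"population": "test", "browser": "IE6", "abandoned": True},
    {"population": "test", "browser": "Chrome", "abandoned": False},
]
for key, rate in sorted(abandonment_by_cut(visits, "browser").items()):
    print(key, rate)
```

Run the same function with a different `cut` (device, operating system, channel) and compare test against control within each segment. The segments get small quickly, so compute a confidence measure for each one before you read anything into the differences.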
Ask yourself: what can I improve before I make a decision? And always ask: knowing this detail, am I still comfortable with my decision? Be very careful about launching new experiences that help most of the population, and hurt some of it — ask whether you can live with the worst case experience.
When you do these cuts, make sure the data makes sense. I’ve learnt over the years that when you see something that you don’t expect, it’s almost always a bug, or an error in the data. Never explain away surprises with complex theories — something is probably broken.
Who or what is affected by the change?
You can think of the previous section as suggesting you cut the data funnel — where the behaviors come from. You should also cut the data by who or what it affects on your site:
Which customers are affected? (Old versus new, first time visitors versus returning, regular versus occasional, international versus domestic, near versus far, and so on)
What categories are affected? (Fashion versus electronics, browse versus buy, and so on)
Which queries are affected? (A search-centric view. Long versus short queries, English versus non-English, Navigational versus Informational, and so on)
Which sessions are affected? (Long research sessions versus short purchase sessions, multi-query sessions versus single-query sessions, multi-click sessions versus single-click sessions, and so on)
Which pages are affected?
All the same caveats and suggestions from the previous section apply here.
I also love to compute many different metrics. While you’ll often have a “north star” metric that you’re trying to move (whether it’s relevance of the experience, abandonment of your site, or the dollar value of goods sold), it’s great to have supporting data to inform your decision. When you compute more metrics, you will almost always see contradictions that make your decisions harder, but it’s always better to know more than to have your head in the sand. It takes smart, sensible debate to make most launch decisions.
The mean average hides the truth
Here’s an over-simplified example. Suppose six customers rate your site on a scale of 1 (horrible) to 10 (amazing). The three in the control rate you as 4, 5, and 6. The three in the test rate you as 1, 4, and 10. The control and test both have a mean average rating of 5. (Ignore the statistical significance for the simple example.)
On this basis, you might abandon the work on the new experience — it’s no better than the control. But if you dig in the data, you’d see that some customers love the new experience, and some hate it. Imagine if you can fix whatever is causing customers to hate it — if you could get that 1 to be a 5, you’d see a mean average of over 6 for the test. The fastest way to move a mean is to fix the outliers: focusing on what’s broken.
I don’t like mean averages because they hide the interesting nuggets. I like to see 90th and 95th percentiles: show me the performance of the best and worst 10% or 5% of customer experiences respectively. In our simple example, I’d love to know that the worst customer experience was 1 in the test and 4 in the control, and the best experience was 10 and 6. Knowing this, I’m excited about the potential of the test, but worried that something is very wrong about it for some customers. That guides me where to put my energy.
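Here’s the same toy example as a few lines of Python, using the ratings from above. The `summarize` helper is just an illustrative name. With only three ratings per population, the worst and best values stand in for the percentiles.

```python
from statistics import mean

control = [4, 5, 6]
test = [1, 4, 10]


def summarize(ratings):
    """Mean plus the extremes the mean hides."""
    return {"mean": mean(ratings), "worst": min(ratings), "best": max(ratings)}


print("control:", summarize(control))
print("test:   ", summarize(test))

# What if we fixed whatever made one customer rate the test a 1?
fixed_test = [5, 4, 10]
print("fixed:  ", summarize(fixed_test))
```

The two means match, but the spread is completely different, and fixing the single worst experience lifts the test mean above 6. On real data with many ratings, something like `statistics.quantiles(ratings, n=20)` gives the cut points at 5% steps, so you can read off the 5th and 95th percentiles directly.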
Don’t be myopic
It’s common to measure your feature in the product, and ignore the ecosystem. For example, you might be working on an improvement on some part of a page — imagine that you’re working on Facebook’s news feed. You’ve figured out an improvement, run the test, seen much better customer engagement, and you’re excited to launch.
But did you worry about what you’ve done to the sponsored links on the right side of the page? Did you hurt the performance of another part of the product owned by another team? It’s common for features to hurt performance of others, and often cause the overall result to be neutral. This happens between features on one page, and between pages. Make sure you always measure overall page and site performance too.
Tests don’t tell you everything
Tests don’t tell you what you don’t measure. Measure as much as you can.
Even if you do measure as much as you can, there’ll be much happening outside your test that’s important. For example, if you run a test for a week, you don’t learn anything about the long term effects on customer retention. You don’t know anything about how customers will adapt to using the feature. You won’t know whether the effects are seasonal, or what might happen if some of your assumptions change — for example, what if another team changes something else on the page or site in the future?
This can be ok. Just realize the limitations, and be aware that retesting in the future might be a smart choice.
Quantitative testing also won’t tell you anything qualitative about what you’re working on. That’s a whole other theme of testing, and one I do plan to come back to talk about in the future.
Afterword
Around 1,000 people attended the employee-only eBay Data Conference recently. I had the opportunity to speak to them through my opening keynote address, and this post is based on that presentation. Thanks to Bob Page for inviting me.
I recently enjoyed a conversation with our 2012 eBay interns. We discussed careers, leadership, business, and engineering. Someone asked me about career path: should I follow the manager or individual contributor path? It’s a great question.
The answer is it depends on what you’re passionate about, and ultimately that’ll be key in determining whether you’re good at it. Here’s my litmus test for the manager career track:
Are you passionate about leading people? If not, don’t become a manager. If yes, you need to develop people management skills: from growing people and helping them succeed, to delivering tough messages and handling challenging personal circumstances. You’ll need to spend much of your time working with people
Is having impact through others rewarding to you? If yes, that means you feel reward when your team hits its goals, the people around you solve problems, and your employees work together as a team. If not, you’re someone who highly values personally contributing ideas, solving problems, or creating output (such as writing code)
There’s no right or wrong answer, and it isn’t black or white. You can be a good manager who still contributes personally, but realize it’s more about others than you. You can be a great individual contributor who’s passionate about helping others succeed; indeed, that’s a prerequisite of a senior individual contributor. But at the core, management is about leading others and being accountable for a team, and succeeding or failing based on their contribution.