Monthly Archives: May 2012

Work, Life, Balanced: 5 tips

When I wrote my most-popular post so far, Fighting fit: Why you need to be in top shape to be a leader, I promised I’d write a future post about work-life balance. So, here are five things that work for me.

1. Do it and then forget it

One of my favorite sayings is many pebbles do a mountain make. One example is it’s hard to be productive, focused, and energized with a thousand small todos in your head. It’s hard enough having a few large things. Empty your head of the small stuff: do small things when you think of them, don’t file them away, and don’t have them hanging over you. This lowers my stress, gives me a warm feeling of having completed something, and makes life better.

A good example is an email. If you’ve read it, it needs a reply, and the reply is going to take a minute or so: just do it now. The cost of reading it, filing a todo in your head, finding it again, and replying is much higher. It’s an added stress, and it’s occupying valuable brain real estate that could be used wisely.

Where’d this idea come from? My favorite management book in the past five years is Getting Things Done by David Allen. From it I learnt this tip: if it takes less than two minutes to do, don’t defer it, do it now.

2. A work-free day

Pick Saturday or Sunday, and do no work at all. Don’t read your email. Don’t touch the computer. Don’t call anyone. Put work aside, do your best not to think about it. It’s not that hard — if something urgent comes up, someone will call you.

You’ll be surprised how good this makes you feel. You’ll have a great day, and you’ll be energized when you return to work on the next one.

3. Be consistent

Early in my career, I’d take it easy for most of a work milestone, and then crank up the all-night, weekend work to get things done towards the end. That worked for a while, but it’s not sustainable over a career.

I recommend consistency. Try and work the same hours, regardless of the deadlines and pressures. Put in a solid day, work hard from the start of a project, and keep on track all the time. If there’s less to do than usual, don’t work less: this is your chance to clean up email, documents, develop your career, or network. (This won’t always work — there are definitely times where you will need to work harder, but work to make those the exceptions.)

You’ll find being consistent burns you out less. It’s the right approach for the long haul.

4. Take a vacation

Your work wants you there for the long haul, and they give you your vacation so that you can relax, recharge, and come back energized. So do it.

Turn the email off — I actually remove the account from my smartphone. Turn on the “out of office” message on your email, and state you’re not reading email because you’re on vacation. Tell your boss your home phone number and your personal email address, and ask her to contact you there in an emergency.

Try and have one vacation per year that’s at least two weeks. It takes a week to wind down, and that second week is bliss. If you fragment it too much, you may not get the relaxation you’ve earnt.

5. Quality is more important than Quantity

Working long hours is a badge of courage. Strangely, using the hours wisely doesn’t have the same status. It should.

I vote for using a sensible number of hours wisely instead of using a large number of hours poorly. Some of the most effective people I know work most days from 9 to 6, or 8 to 5, or 8 to 6. They think about what they want to achieve each day, stay focused on those things, and avoid meetings they don’t need to be in. They tend to also be the folks who are consistent in their approach — you’ll see them working to that timetable every day, most of the year.

Try relentlessly optimizing your day, and give yourself a focus on quality. A good tip to get started is to write down the four things you want to achieve today before you start your day, and promise yourself you’ll do them before you leave.

Hope this helps you improve your work-life balance. Feedback very welcome.

Popular Queries

I downloaded the (infamous) AOL query logs a few days back, so I could explore caching in search. Here’s a few things I learnt about popular queries along the way.

Popular queries

The top ten queries in the 2006 AOL query logs are

  1. google
  2. ebay
  3. yahoo
  4. yahoo.com
  5. mapquest
  6. google.com
  7. myspace.com
  8. myspace
  9. http://www.yahoo.com
  10. http://www.google.com

The world’s changed since then: you wouldn’t expect to see a few of those names in the top ten. But what’s probably still true is that the queries are navigational, that is, queries that users pose when they want to go somewhere else on the web. The queries weather and american idol are the only two in the top twenty that aren’t navigational (they’re informational queries).
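A ranking like the one above comes from a simple frequency count. Here’s a minimal sketch in Python, assuming the queries are already loaded as a list of strings. The `normalize` helper and the tiny sample log are made up for illustration; they are not the AOL data or any particular normalization scheme.

```python
from collections import Counter

def normalize(query: str) -> str:
    """Lowercase and collapse whitespace (a deliberately simple assumption)."""
    return " ".join(query.lower().split())

def top_queries(queries, n=10):
    """Return the n most frequent normalized queries with their counts."""
    counts = Counter(normalize(q) for q in queries)
    return counts.most_common(n)

# Made-up sample log, for illustration only.
sample_log = ["google", "Google ", "ebay", "yahoo", "google", "ebay", "weather"]
print(top_queries(sample_log, n=2))
```

Note that this light normalization would still count google and google.com as different queries, which is why both show up in the list above.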

Misspellings

The misspellings of google are startling. Any spelling you can imagine is in the query logs, and they’re frequent. Here’s a few examples from the top 1,000 queries:

  • googlecom
  • google.
  • http://www.google
  • google.cm
  • googl.com
  • googl
  • goole.com
  • goole
  • goog
  • googel
  • google.co
  • googles
  • goggle.com
  • goggle

This is true of every popular query: a quick glance at ebay (the second-most popular query) finds e-bay, e bay, ebay.com, ebay search, ebay.om, eby, and many more.

And don’t get me started on the different spellings of britney (as in spears): brittany, brittney, britny, britney, …

The good news for users is that most of these misspellings or alternate expressions work just fine at google. That’s the miracle of query rewriting in search.
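To give a feel for the idea, here’s a toy sketch of rewriting a misspelled query to a known popular one using string similarity from Python’s standard `difflib` module. This is only an illustration: the `KNOWN_QUERIES` list and the 0.8 cutoff are my assumptions, and real search engines use far more sophisticated rewriting than this.

```python
import difflib

# Hypothetical list of known popular queries to correct towards.
KNOWN_QUERIES = ["google", "ebay", "yahoo", "mapquest", "myspace"]

def rewrite(query: str, cutoff: float = 0.8) -> str:
    """Map a query to its closest known query, or return it unchanged."""
    q = query.lower().strip()
    matches = difflib.get_close_matches(q, KNOWN_QUERIES, n=1, cutoff=cutoff)
    return matches[0] if matches else q

for misspelling in ["googel", "goole", "goggle", "eby", "weather"]:
    print(misspelling, "->", rewrite(misspelling))
```

A query that isn’t close to anything in the list, such as weather, is passed through untouched.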

Single Characters and Other Typing Errors

Single-character queries are surprisingly common. The ten most popular are m (51st most popular query), g (89th), y (115th), a, e, h, w, c, s, and b. Here’s my theory on m: users are typing <something>.com (which we know is very popular), and at the end they hit enter just before hitting m, and then hit m, and press enter again. Transpositions are pretty common, and m is far-and-away the most popular letter that ends a query. My theory on g and y is they’re the first letters of google and yahoo, and the user hit enter way too early. I don’t have a URL theory on a or e; they are very common letters. On h and w, they’re the beginning of http and www.

There’s many other had-a-problem-with-the-interface queries that are popular. Queries such as mhttp, comhttp, .comhttp, and so on are common. What’s happened here is the user has gone back to the search box, partially erased the previous query, typed something new, and hit enter early.

Of the top 1000 queries, 91 begin with www. It’s basically a list of the top sites on the web, that half way through repeats with the initial period replaced with a space (example: http://www.google.com is the 10th most popular query, www google.com is the 123rd most popular query). I wonder if using a www prefix has changed in 6 years? My first theory on this is users don’t get the difference between the search box and the browser address bar — and Google Chrome sure has fixed that problem (make them the one thing). Brilliant, simple innovation. The second theory is that users think they need to put www at the front of queries when they’re navigational — you’ll often hear people talk about that in user experience research sessions.

Caching in Search

Did you know that the vast majority of results from search engines are served from a cache of previously-computed results? Probably only around one third of all queries are actually evaluated using the backend search infrastructure.

Caching provides fast response to most user queries, while allowing search companies to spend much less on hardware, or to devote the resources in their search infrastructure to better computation of results.

Why does caching work in search?

In this post on click curves, I explained that most everything in search follows an inverse power law distribution (a so-called “Zipf curve”). The implication is that a small number of distinct queries account for a large share of the total query volume, that is, many users are searching for the same things.

AOL memorably released three months of query logs in 2006. They were slammed for doing so and pretty quickly apologized and took down the data. However, it’s a pretty nice data set for our purposes of discussing caching.

The most popular query at AOL in those three months of 2006 was google. Around 0.9% of the queries typed by users were those looking to leave AOL’s search and head over to Google. The second most popular query was ebay at 0.4% of all queries, and the third yahoo at 0.4%. If you sum the frequency of the top ten unique queries, you’ve seen around 3% of all the query volume. Here’s what happens as you inspect more unique queries:

  • If you sum the total frequency of the top 100 queries, you get around 6% of all the user query volume
  • The top 1,000 unique queries are around 11% of the query volume
  • The top 10,000 are around 20% of the volume
  • The top 100,000 are around 34% of the volume
  • The top 1,000,000 are around 58% of the volume

Those points are plotted on a log-log graph below.

Query cache effectiveness for the AOL search query logs. The y-axis is percentage of the total volume of queries that’s cached. The x-axis is the number of unique queries in the cache. Bottom line, storing over a million queries in the cache means you can serve over 60% of user queries from the cache.

We’d expect there’s diminishing returns in caching more queries and their results. As the queries become less frequent, there’s less benefit in caching their results. There’s no benefit in caching a query that occurs once. By the time you’re caching the millionth query from this set, you’re caching queries that occur only 5 times in 3 months. By the way, there are just over 36 million queries in the log, and about 10 million unique queries when they’re normalized (which I didn’t do a very good job of).
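The coverage numbers above are just a cumulative sum over queries sorted by frequency. Here’s a minimal sketch of the calculation, assuming you already have a count for each unique query. The function name and the frequencies are made up; they are not the AOL numbers.

```python
from collections import Counter

def cache_coverage(counts: Counter, cache_size: int) -> float:
    """Fraction of total query volume covered by the cache_size most frequent queries."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    cached = sum(freq for _, freq in counts.most_common(cache_size))
    return cached / total

# Made-up frequencies for illustration; not the AOL numbers.
counts = Counter({"google": 50, "ebay": 25, "yahoo": 15, "weather": 9, "rare query": 1})
for size in (1, 2, 4):
    print(size, round(cache_coverage(counts, size), 2))
```

Even in this toy data you can see the diminishing returns: each extra cache slot buys less coverage than the one before it.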

The key point to take away is that if we store only the results for the top 100,000 queries, we can save our search backend from having to evaluate around 34% of all the queries that users pose.

This is a slight exaggeration, since we can’t quite key our search cache on query string alone. Remember that web search users have different language preferences, safe search settings, and so on. All up, the key for our cache probably has around ten parts, but remember that most users likely stick with the defaults, and so the query is the most variable element in the key. I don’t know quite what effect this’d have, but I bet it’s small (say, it reduces the caching effectiveness of the top 100,000 queries from 34% to 32% or so). I expect that the recent Bing and Google pushes into personalization have made caching harder, but I also bet that personalization affects relatively few queries.

Storing Cached Data

The key to the cache is the query and around ten other elements including safe search settings, language preference, market, and so on.

What’s stored in the cache is the results of the query with those settings. For web search, the results include the list of matching URLs, the snippets, freshness information (more in a moment), and other elements you need to build the page. You might, for example, store the related searches, or the images or news or videos that are associated with queries that show more than web search results.
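Here’s a rough sketch of what a key and an entry might look like, using a plain Python dictionary as the cache. The field names, the default values, and the choice of only four key parts are my assumptions for illustration; a real key would have more parts and a real entry would carry more data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import time

@dataclass(frozen=True)
class CacheKey:
    """Query plus the settings that change results. Field names are assumptions."""
    query: str
    language: str = "en"
    market: str = "US"
    safe_search: str = "moderate"

@dataclass
class CacheEntry:
    urls: List[str]
    snippets: List[str]
    cached_at: float = field(default_factory=time.time)  # freshness information

cache: Dict[CacheKey, CacheEntry] = {}

def lookup(key: CacheKey) -> Optional[CacheEntry]:
    """Return the cached entry for this exact key, or None on a miss."""
    return cache.get(key)

# Store results under the default settings, then look them up two ways.
cache[CacheKey("ipod")] = CacheEntry(["http://example.com/ipod"], ["An example snippet"])
hit = lookup(CacheKey("ipod"))
miss = lookup(CacheKey("ipod", safe_search="off"))
```

Because the key is a frozen dataclass, two users with the same query and the same settings share an entry, while a user who changes any one setting gets a separate entry.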

One of the tricks of caching is knowing when to expire what’s in the cache. You don’t want to keep showing results for a query when the results have actually changed; for example, maybe there’s a new snippet, or a URL has changed, or a new result has entered the top ten results, or there’s some breaking news. Here’s a few factors I thought of that you could use to expire results in the cache:

  • Historical change rate of the results (record how frequently the results change, and use that data to predict when it’ll change in the future)
  • What data is being displayed (if the results contain, for example, news sites, perhaps you expire the cache entry earlier)
  • Change in query frequency (if users suddenly start typing a query much more frequently, that’s a clue that something has changed)
  • How long the results have been cached (perhaps you have some time limits that ensure everything is refreshed on a cycle)
  • The load on the search engine (if the search engine is under heavy load, don’t expire the cache as aggressively; if it’s under low load, it’s a good time to refresh the cache)

When you expire a result in the cache, you fall back to the search backend to recompute the results for the query, and then you store those results in the cache.
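The expire-and-fall-back flow can be sketched in a few lines. This is a simplified illustration that uses only two of the factors above (what’s being displayed, and backend load). The base lifetime, the thresholds, and the `backend` callable are all hypothetical.

```python
import time

BASE_TTL_SECONDS = 3600.0  # hypothetical default lifetime for a cache entry

def ttl_for(contains_news: bool, backend_load: float) -> float:
    """Pick a time-to-live from two of the factors above (thresholds are made up)."""
    ttl = BASE_TTL_SECONDS
    if contains_news:
        ttl /= 4  # news-heavy results go stale sooner
    if backend_load > 0.8:
        ttl *= 2  # under heavy load, expire less aggressively
    return ttl

def get_results(query, cache, backend, contains_news=False, backend_load=0.0, now=None):
    """Serve from the cache while fresh; otherwise recompute and store."""
    now = time.time() if now is None else now
    entry = cache.get(query)
    if entry is not None:
        results, cached_at = entry
        if now - cached_at < ttl_for(contains_news, backend_load):
            return results
    results = backend(query)  # fall back to the search backend
    cache[query] = (results, now)
    return results
```

The `now` parameter is only there so the behaviour can be exercised without waiting for real time to pass.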

Bottom line, caching is amazingly effective in search. It’s a super hard problem at eBay, given the dynamic nature of the auction and fixed price formats: items sell, bids change, prices change, and so on. We also blend auctions, fixed price items, and products dynamically based on the results — so even the mix of formats is dynamic. We’re excited about making caching work well at eBay, but we’ve so far not hit anywhere near the heights you’d expect from the analysis of AOL’s web search query logs. I’ll explain this more in the future.

You can learn more about the AOL query logs by downloading this paper from Abdur Chowdhury’s website. Here’s the full citation:

G. Pass, A. Chowdhury, C. Torgeson, “A Picture of Search”, The First International Conference on Scalable Information Systems, Hong Kong, June 2006.

I’ll explain some interesting facts about the AOL query logs in a future post.

Software Estimation

One of the hardest things to do as a software engineer is estimation. Even the most experienced engineers curse their estimates, and almost everyone has had an experience with underestimating. In this post, I share a few things I’ve learnt and observed along the way.

Basics of Estimation

When you’re beginning to work on software estimation, I’d recommend working on listing sequential tasks needed to complete a feature – there’d be time for design, for each development step, integration into the system or product, unit tests, and for iteration at the end to iron out the bugs. Just write down a list of the things you need to do to go from your design to the finished code that’s ready to ship.

I would never recommend estimating anything as taking longer than three days. Break it down into multiple steps – I’ve found that engineers don’t understand what they’re doing if the step’s duration is three or more days. I’d also recommend never breaking down items into less than half a day — that’s likely false precision and makes the plan unwieldy.

Once you’ve got a schedule, do the work, and then ask how you did. If you’ve finished early, you’ve “sandbagged”: figure out why you overestimate. If you finished late, you’ll have had work-life balance issues to hit the date you promised or you held up the schedule – not something any developer wants. In either case, ask: what didn’t I understand when I estimated? What types of work items do I over- or under-estimate? How could I break down an item into smaller steps so that I could get it right?

Good estimation always includes time to be a developer. Think about the meetings you will attend, how long code reviews take, and that there’d always be a day or two where you’ll be fixing production issues. I would recommend never calling those out separately – personally, I have typically added about 20% to each of my work items to account for the “down time”. To me, it doesn’t make sense to call these out separately – they’re part of the time that it takes from starting to finishing a work item, not something that happens at a fixed point in the development cycle. I am allergic to work items such as “code hardening” and “meetings”.
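The half-day to three-day rule and the 20% allowance are easy to check mechanically. Here’s a small sketch, assuming a plan is a list of (task name, estimate in days) pairs. The task names and numbers are invented for illustration.

```python
MIN_DAYS, MAX_DAYS = 0.5, 3.0  # bounds from the rule of thumb above
OVERHEAD = 0.20                # 20% added for meetings, reviews, production issues

def check_plan(tasks):
    """Return names of tasks whose raw estimate falls outside the 0.5 to 3 day range."""
    return [name for name, days in tasks if not MIN_DAYS <= days <= MAX_DAYS]

def padded_total(tasks):
    """Total schedule in days with the overhead folded into every item."""
    return sum(days * (1 + OVERHEAD) for _, days in tasks)

# Hypothetical task list for a single feature.
plan = [("design", 1.0), ("parser", 2.0), ("integration", 5.0), ("unit tests", 1.5)]
print(check_plan(plan))
print(round(padded_total(plan), 1))
```

In this example the five-day integration item is flagged, which is the cue to break it into smaller steps before trusting the total.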

In case you’re wondering, I was guilty of frequent underestimation earlier in my career; that seems the more common mistake that developers make. I learnt that my gut estimate is always under by a factor of about 2.5. I’ve heard others say 2 or 3, and one guy who used to say e (2.71828183). I figured this out by doing post-mortems – I’d go back, learn from mistakes, and strive to improve.
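If you keep your estimates and actuals from post-mortems, finding your own factor is a one-line division. Here’s a minimal sketch, assuming a history of (gut estimate, actual) pairs in days. The numbers are made up so that they work out to 2.5.

```python
def calibration_factor(history):
    """Ratio of total actual days to total estimated days across finished tasks."""
    estimated = sum(est for est, _ in history)
    actual = sum(act for _, act in history)
    if estimated <= 0:
        raise ValueError("need at least one task with a positive estimate")
    return actual / estimated

# Hypothetical post-mortem data: (gut estimate in days, actual days).
history = [(1.0, 2.5), (2.0, 5.0), (0.5, 1.5), (3.0, 7.25)]
factor = calibration_factor(history)
print(factor)
```

Multiplying the next gut estimate by this factor gives a calibrated starting point, which you’d keep refining as more post-mortem data comes in.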

I’ve noticed that senior engineers tend not to like the linear list approach. They tend to do more tasks in parallel, and estimation becomes more complex. They often jump around the tasks while they figure out how to unblock the next logical task in the list. Net they have lots of things brewing and incomplete. I’d still recommend writing down the tasks in a minimum half day, maximum three day list, and tracking how long you’ve spent on each one — even if you don’t burn through them in a linear way. Estimation up front is still important.

Estimation and Leading Small Teams

When I was a development lead, I used what I’d learnt about estimation to manage my team. I asked each developer to put their list of tasks in a shared spreadsheet. Every Friday, the developers would update the sheet with any new information, such as being ahead or behind schedule, or inserting or deleting work items. I used this to determine whether we’d ship on time, and to rebalance resources where needed to ensure we did. I also used it to understand who was good at estimation, and where we could improve. It turned out to be a good way to keep people open and transparent about what they were working on too — and helped us be more focused on great estimation.

Good luck estimating!

Setting Your Goals: 5 Steps to Creating Your Future

You need a plan to reach a destination that means something to you. Where are you headed this year? What about five years from now? If you could dream, where would you be in ten years?

Here’s a useful tool that I use to think about my goals. Every six months or so, I take a piece of paper and write down my goals. I take a photo with my iPhone, and make it my desktop wallpaper — and then it’s there to remind me every day for the next six months. I’ll explain in this post how I think about creating the goals.

I don’t know where I learnt this approach, or whether I invented it myself, but I was amused to find the same idea on a lululemon bag recently. Not only do they make great yoga and sportswear, they give great advice. You should shop there, if only so you can read the bag.

Advice to live your life by. On a shopping bag from lululemon.

1. Two career goals

Step one is to create two career goals for the next twelve months. What do you want to achieve in the next year?

Everyone who’s working probably has goals in a system somewhere, but these should be more personal. What do you want to learn to do better? What characteristic do you want to develop? How do you want to be perceived by the people around you? Where should your focus be? How do you want to direct your energy?

My recommendation is that your goal should be a few words that mean something to you. A short phrase that triggers a longer thought. Something you can glance at and consume.

2. Two health, fitness, and wellbeing goals

You’re getting to know me through my blog, so you won’t be surprised by this section. I’ve learnt that the number one priority in life is health; without health, you can’t look after your family, yourself, or your career.

So, now it’s time to write down two goals for the next twelve months that are about you and your health, fitness, and wellbeing. Do you want to get to a healthy weight? Eat right? Get exercising? Sleep better? Fix your posture? Take tests to check on family conditions? Or do you want to take something to the next level? How about trying my ten tips for being fighting fit?

3. Two personal goals

The final step for planning this year is to think about your personal life and capture two points. Do you want to travel more with the family? Take up a hobby? Switch off from work on the weekends? Call your friends? Make new friends? Help the community? Give your time to a cause?

4. This year, in 5 years, and in 10 years

Steps one, two, and three give you six points to focus on for the next twelve months. I recommend repeating them to create a five year plan, and again to create a ten year plan. I enjoy this part the most: thinking about the next year is a little tactical, but turning a dream for the future into goals for the future is fun. This is where you get to think about who you want to be, what success is for you in life, and what you want to be doing in your career after you’ve made substantial progress. I’d recommend thinking about your larger financial goals, your wellbeing, what success is for your family, and what your perfect career looks like.

While you’re writing this, think about coherence: is the one year plan leading to the five year plan? Five year leading to the ten? One year leading to the ten? Don’t be afraid to go back and make adjustments. You want your one year plan to take you roughly 20% of the way to your five year goal.

When you’re done, you’ve got 18 points on a page. I can fit this easily on a small sheet. As I said, I keep the points very short and consumable in a glance. Then it’s photo time, and time to make it your desktop background.

5. Refresh

Every six months or so, you should refresh your goals. Read the previous entry, and make an honest assessment of how you’ve gone with your goals. Copy over the phrases you still like or that shouldn’t change, and make a few adjustments where you need. Don’t get too unhappy with yourself the first time around — I started by writing overly ambitious goals, and I’ve learnt that it’s better to write down achievable goals that push me in the right direction. I don’t tend to change the ten year goals, and I rarely tweak the five year goals. I often make changes to the one year plan, but always in the context of asking: is this helping me get from today to my five year goals?

This is an enjoyable exercise for me, and an investment in thinking about myself. I hope you find it useful too. Let me know how it goes.

Ranking at eBay (Part #3)

Over the last two posts on this topic, I’ve explained some of the unique problems of eBay’s search challenge, and how we think about using different factors to build a ranking function. In this post, I’ll tell you more about how we use the factors to rank, how we decide if we’ve improved ranking at eBay, and where we are on the ranking journey.

Hand-tuning a Ranking Function

A ranking function combines different factors to give an overall score that can be used to rank documents from most- to least-relevant to a query. This involves computing each factor using the information that it needs, and then plugging the results into the overall function to combine the factors. Ranking functions are complicated: there’s typically at least three factors in the most simple function, and they’re typically combined by multiplying constants by each of the factors. The output is just a score, which is simply used later to sort the results into rank order (by the way, the scores are typically meaningless across different queries).
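Here’s a minimal sketch of that kind of function: multiply each factor by a constant, add them up, and sort by the score. The factor names, values, and weights below are hypothetical and chosen only to make the example readable; they are not eBay’s factors.

```python
def score(factors, weights):
    """Weighted sum of factor values; the weights are hand-chosen constants."""
    return sum(weights[name] * value for name, value in factors.items())

def rank(items, weights):
    """Sort item ids from highest to lowest score."""
    return sorted(items, key=lambda item_id: score(items[item_id], weights), reverse=True)

# Hypothetical factors and weights, purely for illustration.
weights = {"text_match": 2.0, "seller_quality": 1.0, "price_competitiveness": 0.5}
items = {
    "item_a": {"text_match": 0.9, "seller_quality": 0.2, "price_competitiveness": 0.4},
    "item_b": {"text_match": 0.6, "seller_quality": 0.9, "price_competitiveness": 0.8},
    "item_c": {"text_match": 0.1, "seller_quality": 0.5, "price_competitiveness": 0.9},
}
print(rank(items, weights))
```

The scores themselves (2.2, 2.5, and 1.15 here) only matter relative to each other within the one query.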

If you’ve got two, three, or maybe ten different factors, you can combine them by hand, using a mix of intuition, and experimentation. That’s pretty much what happens in the public domain research. For example, there’s a well-known ranking function Okapi BM25 that brings together three major factors:

  1. Term frequency: How often does a word from the query occur in the document? (the intuition being that a document that contains a query word many times is more relevant than a document that contains it fewer times. For example, if your query is ipod, then a document that mentions ipod ten times is more relevant than one that mentions it once)
  2. Inverse document frequency: How rare is a query word across the whole collection? (the intuition being that a document that contains a rarer word from the query is more relevant than one that contains a more common word. For example, if your query was pink ipod nano, then a document that contains nano is more relevant than a document that contains pink)
  3. Inverse document length: How long is the document? (the intuition being that the longer the document, the more likely it is to contain a query word on the balance of probabilities. Therefore, longer documents need to be slightly penalized or they’ll dominate the results for no good reason)

How are these factors combined in BM25? Pretty much by hand. In the Wikipedia page for Okapi BM25 the community recommends that the term frequency be weighted slightly higher than the inverse document frequency (a multiplication of 1.2 or 2.0). I’ve heard different recommendations from different people, and it’s pretty much a hand-tuning game to try different approaches and see what works. You’ll often find that research papers talk about what constants they used, and how they selected them; for example, in this 2004 paper of mine, we explain the BM25 variant we use and the constants we chose.
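To make the three factors concrete, here’s a sketch of BM25 in the form it’s commonly written, with the constants `k1` and `b` as the hand-tuned knobs. The defaults of 1.2 and 0.75, the smoothed inverse document frequency, and the whitespace tokenization are assumptions on my part; this is not the specific variant from the paper mentioned above, and the function expects at least one document.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        total = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Longer-than-average documents are penalized through b.
            length_norm = 1 - b + b * len(d) / avg_len
            total += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(total)
    return scores

docs = [
    "pink ipod nano case".split(),
    "ipod nano nano nano".split(),
    "pink pink shoes".split(),
    "pink scarf".split(),
]
print(bm25_scores("pink ipod nano".split(), docs))
```

Changing `k1` and `b` and watching how the order of the scores shifts is a small-scale version of the hand-tuning game.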

This all works to a certain point: it’s possible to tune factors, and still have a function you can intuitively understand, as long as you don’t have too many factors.

Training Algorithms to Combine Factors

At eBay, we’ve historically done just what I described to build the Best Match function. We created factors, and combined them by hand using intuition, and then used experimentation to see if what we’ve done is better than what’s currently running on the site. That worked for a time, and was key to making the progress we’ve made as a team.

At some point, combining factors by hand becomes very difficult to do — it becomes easier to learn how to combine the factors using algorithms (using what’s broadly known as machine learning). It’s claimed that AltaVista was the first to use algorithmic approaches to combine ranking factors, and that this is now prevalent in industry. It’s certainly true that everyone in the Valley talks about Yahoo!’s use of gradient boosted decision trees in their now-retired search engine, and that Microsoft announced they used machine-based approaches as early as 2005. Google’s approach isn’t known, though I’d guess there’s more hand tuning than in other search engines. Google has said they use more than 200 signals in ranking (I call these factors in this post).

Let me give you an example of how you’d go about using algorithms to combine factors.

First, you need to decide what you’re aiming to achieve, since you want to learn how to combine the factors so that you can achieve a specific goal. There’s lots of choices of what you might optimize for: for example, we might want to deliver relevant results on a per query basis, we might want to maximize clicks on the results per query, we might want to sell more items by dollar value, we might want to sell more items, or we might want to increase the number of times that a user uses the search engine each month. Of course, there’s many other choices. But this is the important first step: decide what you’re optimizing for.

Second, once you’ve chosen what you want to achieve, you need training data so that your algorithm can learn how to rank. Let’s suppose we’ve decided we want to maximize the number of clicks on results. If we’ve stored (logged or recorded) the interactions of users with our search engine, we have a vast amount of data to extract and use for this task. We go to our data repository and we extract queries and items that were clicked, and queries and items that were not clicked. So, for example, we might extract thousands of sessions where a user ran the query ipod, and the different item identifiers that they did and didn’t click on; it’s important to have both positive and negative training data. We’d do this at a vast scale: we’re likely looking to have hundreds of thousands of data points. (How much data you need depends on how many factors you have, and the algorithm you choose.)

So, now we’ve got examples of what users do and don’t click on a per query basis. Third, it’s time to go and extract the factors that we’re using in ranking. So, we get our hands on all the original data that we need to compute our factors, whether it’s the original items, information about sellers, information about buyers, information from the images, or other behavioral information. Consider an example from earlier: we might want to use term frequency in the item as a factor, so we need to go fetch the original item text, and from that item we’d extract the number of times that each of the query words occurs in the document. We’d do this for every query we’re using in training, and every document that is and isn’t clicked on. For the query ipod, it might have generated a click on this item. We’d inspect this item, count the number of times that ipod occurs, and record the fact that it occurred 44 times. Once we’ve got the factor values for all queries and items, we’re ready to start training our algorithm to combine the factors.
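Steps two and three can be sketched together: take logged (query, item, clicked) records, look up the item text, and compute factor values for each record. The record layout, the item texts, and the choice of two factors (term frequency and item length) are assumptions for illustration only.

```python
from collections import Counter

def term_frequency(query: str, item_text: str) -> int:
    """Total number of times the query words occur in the item text."""
    words = Counter(item_text.lower().split())
    return sum(words[w] for w in query.lower().split())

def build_examples(click_log, item_texts):
    """Turn (query, item_id, clicked) records into (factor values, label) pairs."""
    examples = []
    for query, item_id, clicked in click_log:
        text = item_texts[item_id]
        factors = [term_frequency(query, text), len(text.split())]
        examples.append((factors, 1 if clicked else 0))
    return examples

# Hypothetical log records and item texts.
item_texts = {
    101: "Apple iPod nano 8GB pink ipod with case",
    102: "Pink phone case",
}
click_log = [("ipod", 101, True), ("ipod", 102, False)]
print(build_examples(click_log, item_texts))
```

Each output pair is one training data point: the factor values for a query and item, and whether that item was clicked.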

Fourth, we choose an algorithmic approach to learning how to combine the factors. Typical choices might be a support vector machine, decision tree, neural net, or Bayesian network. And then we train the algorithm using the training data we’ve created, and give it the target or goal we’re optimizing for. The goal is that the algorithm learns how to separate good examples from bad examples using the factors we’ve provided, and can combine the factors in a way that will lead to relevant documents being ranked ahead of irrelevant examples. In the case we’ve described, we’re aiming for the algorithm to be able to put items that are going to be clicked ahead of items that aren’t going to be clicked, and we’re allowing the algorithm to choose which factors will help it do that and to combine them in a way that achieves the goal. Once we’re done training, we’d typically validate that our algorithm works by testing it on some data that we’ve set aside, and then we’re ready to do some serious analysis before testing it on customers.
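Here’s a toy version of the train-then-validate loop. To keep it short and free of libraries, I’ve swapped in logistic regression trained by gradient descent, which is not one of the approaches listed above; the flow is the same whichever learner you pick. The training and held-out data are made up, and the epoch count and learning rate are arbitrary.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, learning_rate=0.1):
    """Learn one weight per factor (plus a bias) with stochastic gradient descent."""
    n_factors = len(examples[0][0])
    weights, bias = [0.0] * n_factors, 0.0
    for _ in range(epochs):
        for factors, label in examples:
            predicted = sigmoid(bias + sum(w * x for w, x in zip(weights, factors)))
            error = predicted - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, factors)]
            bias -= learning_rate * error
    return weights, bias

def predict(model, factors) -> float:
    """Estimated probability of a click for an item with these factor values."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, factors)))

# Made-up (factor values, clicked) pairs; factors are scaled to the 0..1 range.
training = [([0.9, 0.8], 1), ([0.8, 0.3], 1), ([0.7, 0.9], 1),
            ([0.2, 0.7], 0), ([0.1, 0.2], 0), ([0.3, 0.5], 0)]
held_out = [([0.85, 0.5], 1), ([0.15, 0.6], 0)]

model = train(training)
accuracy = sum((predict(model, f) >= 0.5) == bool(y) for f, y in held_out) / len(held_out)
print(accuracy)
```

To rank, you’d sort the items for a query by the predicted click probability. The learned weights play the role of the hand-tuned constants from earlier.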

Fifth, before you launch a new ranking algorithm, you want to know if it’s working sensibly enough for even a small set of customers to see. I’ll explain later how to launch a new approach.

If you’re looking for a simple, graphical way to play around with training using a variety of algorithms, I recommend Orange. It works on Mac OS X.

What about Best Match at eBay?

We launched a machine-learned version of Best Match earlier in 2012. You can learn more about the work we’re doing on machine learning at eBay here.

We now have tens of factors in our ranking function, and it isn’t practical to combine them by hand. And so the 2012 version of Best Match combines its factors by using a machine learned approach. As we add more factors — which we’re always trying to do — we retrain our algorithm, test, iterate, learn, and release new versions. We’re adding more factors because we want to bring more knowledge to the ranking process: the more different, useful data that the ranking algorithm has, the better it will do in separating relevant from irrelevant items.

We don’t talk about what target we’re optimizing for, nor have we explained in detail what factors are used in ranking. We might start sharing the factors soon — in the same way Google does for its ranking function.

Launching a New Ranking Algorithm

Before you launch a new ranking function, you should be sure it’s going to be a likely positive experience for your customers. No function is likely to be entirely better than a previous function — what you’re expecting is that the vast majority of experiences are the same or better, and that only a few scenarios are worse (and, hopefully, not much worse). It’s a little like buying a new car — you usually buy one that’s better than the old one, but there’s usually some compromise you’re making (like, say, not quite the right color, you don’t like the wheels as much, or maybe it doesn’t quite corner as well).

A good place to start in releasing a new function is to use it in the team. We have a side-by-side tool that allows us to see an existing ranking scheme alongside a new approach in a single screen. You run a query, and you see results for both approaches in the same screen. We use this tool to kick the tires of a new approach, and empirically observe whether there’s a benefit for the customers, and what kinds of issues we might see when we release it. I’ve included a simple example from our side-by-side tool, where you can see a comparison of two rankings for the query yarn, with slightly different results. The team saw that in the experiment on the left we were surfacing a great new result (in green), and on the right in the default control we were surfacing a result that wasn’t price competitive (in red).

Side by side results for the query yarn. On the left, an experiment, and on the right is the default experience.

If a new approach passes our bar as a team, we’ll then do some human evaluation on a large scale. I explained this in this blog post, but in essence what we do is ask people to judge whether results are relevant or not to queries, and then compute an overall score that tells us how good our new algorithm is compared to the old one. This also allows us to dig into cases where it’s worse, and make sure it’s not significantly worse. We also look at the basic facts about the new approach: for example, for a large set of queries, how different are the results? (with the rationale that we don’t want to dramatically change the customer experience). If we see some quick fixes we can make, we do so.
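Here's a rough sketch of the two measurements described above: a score built from human relevance judgments, and a check on how different the new results are from the old ones. The metrics (precision in the top k, and Jaccard overlap of the top k), the function names, the queries, and the item ids are all my assumptions for illustration; they aren't the scores we actually compute.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that judges marked relevant."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in relevant) / len(top)

def overlap_at_k(old, new, k):
    """Jaccard overlap of the two top-k lists: 1.0 means the same items."""
    a, b = set(old[:k]), set(new[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Made-up example: per query, the old ranking, the new ranking, and the
# set of items that human judges marked as relevant.
queries = {
    "yarn": {
        "old": ["a", "b", "c", "d"],
        "new": ["a", "e", "b", "c"],
        "relevant": {"a", "b", "e"},
    },
    "wool": {
        "old": ["p", "q", "r", "s"],
        "new": ["q", "p", "r", "s"],
        "relevant": {"p", "q"},
    },
}

K = 4
old_score = sum(precision_at_k(q["old"], q["relevant"], K)
                for q in queries.values()) / len(queries)
new_score = sum(precision_at_k(q["new"], q["relevant"], K)
                for q in queries.values()) / len(queries)
mean_overlap = sum(overlap_at_k(q["old"], q["new"], K)
                   for q in queries.values()) / len(queries)

print(f"old ranker precision@{K}: {old_score:.3f}")
print(f"new ranker precision@{K}: {new_score:.3f}")
print(f"mean top-{K} overlap:      {mean_overlap:.3f}")

# Dig into the cases where the new ranker is worse than the old one.
worse = [name for name, q in queries.items()
         if precision_at_k(q["new"], q["relevant"], K)
         < precision_at_k(q["old"], q["relevant"], K)]
print("queries where the new ranker is worse:", worse)
```

A high overlap with a better score is the comfortable case: the experience hasn't changed dramatically, and what did change is an improvement.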

Once a new algorithm looks good, it’s time to test it on our customers. We typically start very small, trying it out on a tiny fraction of customers, and comparing how those customers use search relative to those who are using the regular algorithms. As we get more confident, we increase the number of customers who are seeing the new approach. And after a few weeks’ testing, if the new approach is superior to the existing approach, we’ll replace the algorithm entirely. We measure many things about search, and we use all the different facts to make decisions. It’s a complex process, and rarely clear cut: there are facts that help, but in the end it’s usually a nuanced judgement to release a new function.
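One common way to start small and ramp up is to hash each customer into a stable bucket, and then raise the cut-off over time. I'm not describing how eBay assigns customers here; this is a minimal sketch, and the function name, the salt, and the customer ids are all hypothetical.

```python
import hashlib

def in_experiment(customer_id, fraction, salt="new-ranker"):
    """Deterministically decide whether a customer sees the new ranker.

    The same customer always lands in the same bucket, so raising
    `fraction` only adds customers and never removes anyone.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return bucket < fraction

# Hypothetical customer ids, and a ramp from 1% to 25% of traffic.
customers = [f"customer-{i}" for i in range(10000)]
for fraction in (0.01, 0.05, 0.25):
    exposed = sum(in_experiment(c, fraction) for c in customers)
    print(f"{fraction:.0%} ramp: {exposed} of {len(customers)} customers")
```

Because the bucket depends only on the salt and the customer id, a customer who saw the new ranking at 1% keeps seeing it at 5% and 25%, which keeps the comparison against the regular algorithm clean.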

Hope you’ve enjoyed this post, the final one in my eBay ranking series. See you again next week, with something new on a new topic!

Managing small teams: the tools you should use

Bobby Kostadinov recently asked me the following question on Twitter:

And I was curious enough to go ask my team at eBay, where we use Yammer to have fun conversations like this one.

The net of it is the team prefers simple approaches over complicated ones:

  • Many folks prefer using sticky notes, a whiteboard, and a camera for capturing them
  • A few folks like a spreadsheet, probably one with a few macros that create burndown charts and statistics
  • Using tools such as Jira, Trello, and GitHub Issues is becoming popular in the team

My personal experience with very small teams has been with sticky notes, whiteboards, and simple spreadsheets. When I began my career in 1990, I used a whiteboard on my first team and a Polaroid camera to capture it. Many years later, I used a simple spreadsheet, and sticky notes late in the project to close down bugs and those final few features. If I had my time again on a small project, I’d probably do exactly the same thing.

The people on my team are wiser than me — they’ve got current experience with small teams — and here’s a compilation of what they said.

The most popular method: a whiteboard, a camera, and sticky (aka Post-it) notes

Ravi started the conversation about a simple approach and says that “for a small team – especially if they are working on a new project – [the] best I have seen so far is a whiteboard with cards or sticky notes. Put tasks as sticky notes or write in a card, move it along the white board from start to finish. Annotate the sticky with blockers, etc. if required”. Ibrahim has worked at two other major Internet companies prior to eBay, and says “Wow. Sticky notes people. It’s the most ‘obvious’ answer to all this. Sticky notes, whiteboard, and a camera (to capture) is all you need”.

Mitch, who’s probably been around as long as me, has the same experience and agrees, except when a team is remote. In that case, it’s time to switch to tools: “hypermail, jira (or similar, integrated system)”. Farah is of a similar mind and agrees “… sticky notes with daily huddles work very well. But being remote I have a strong preference for documenting everything. Taking pictures of the whiteboard after every huddle and putting the images on a wiki might work.”

Use a Tool

Sri’s one of our Vice Presidents, and has a long and distinguished career at eBay from individual contributor all the way up. He has “seen most open source initiatives use Jira. The rich portfolio of plugins make it a great tool for both the teams and for the project owner. Nice thing about Jira is that the companion wiki (confluence, the one we run) can surface it neatly inline as well through simple macros.”

There were many votes for GitHub Issues, including from Utkarsh who says that you should use it if you “are based off github. Simple and just works.”

Another popular emerging tool is Trello, which quite a few folks recommended. Other tools mentioned were Pivotal Tracker, newcomer Asana, and Redmine.

Use a Spreadsheet

Jon D, who’s a veteran too, says that he’d stick with a spreadsheet. “For such a small team, especially if co-located, simple is good. A spreadsheet (Excel) or MS Project (using only basic features) are sufficient”. He was supported by several folks, who pointed out that there are good spreadsheets you can download and use, and that Google Docs is a nice way to collaborate across the team.

Other ideas

A few other ideas were mentioned, including using email (Google’s Gmail has excellent search capabilities), groups on Yammer, a wiki, and (urrggghhh!) SharePoint.

Hope this helps you, Bobby, and a few others out there! Looking forward to starting a conversation on this one…

The race to build better search: a Reuters article

I spent a couple of hours with Alistair Barr from Reuters discussing search at eBay, and our Project Cassini rewrite of eBay’s search platform. Alistair published the story yesterday, and it’s a good, short read. Thanks, Alistair, for sharing the story with your readers.

Alistair discusses Walmart’s search rewrite (which didn’t take too long by the sounds of it — my recent blog post suggests why), quotes responses from Google’s PR team, and shares insights from my good friend Oren Etzioni who works at both the University of Washington and the rather awesome shopping decision engine, decide.com. He mentions Google’s features that obviously do a little light image processing to match color and shape intents in queries such as “red dress” or “v-neck dress” against the content in the images.

Google does do lots of things well, but they’re often not the first to do them — we built that color search into Bing’s image search in 2008 (try this “red dress” query). On a related note, eBay has a rather cool image search feature, which we really should make more prominent in our search experience (mental note: must work on that). Try this “red dress” query, and you’ll see results that use visual image features to find related items.

I’ll be back with part #3 of my Ranking at eBay series soon.

The don’t eat grains mantra: why it makes sense

I listed my top ten tips for getting in fighting fit shape in this blog post. Tip #1 (“Don’t eat grains”) seems to be controversial with some, so I thought I’d explain a little more of why it works for me and makes staying at a healthy body weight easy.

Plan A: Measure and Track your Caloric Intake

Here’s the traditional thing that most folks do: they work out how many calories they should be consuming, with a focus on making sure they consume slightly less than they burn. If you eat much less than you burn, your body panics, starts breaking down muscle instead of fat, and counter-productive things happen. The trick is to consume a little less than you burn, so that your body dips into its fat reserves and healthy weight loss happens. Again, take all this with a grain of salt: I’m a computer scientist, not a nutritionist, and you should do your own research and get your own plan from a professional.

So, let’s try that. There’s a variety of sites where you enter how much you weigh and a bunch of other data, and they estimate how many calories you’re probably burning and suggest what your caloric intake should be given your body weight goal and a healthy weight loss rate. Here’s a tool from dietitian.com that’s linked to by the US Food and Nutrition Center at the USDA. When I enter my data, and say that I want to drop my body fat by 1%, and lose 1 pound per week to do it, it tells me to eat about 1700 calories per day; that is outrageously low and wrong. Ok, so now I’ll try another US government tool, the “SuperTracker” at ChooseMyPlate.gov. It tells me I should eat 2800 calories per day; that seems a little high, but in the ballpark. The guys at Fitness Wave do hydrostatic body testing, where they put you in a tank and figure out your body fat percentage fairly accurately, and they tell me I should eat 2800 calories per day to maintain weight. Another tool I’ve played with is MyNetDiary, and it says I should eat about 2400 calories.

So, I’m confused. And if you get this wrong, you’re either going to starve and bad things will happen, or put on weight. Most of us should find a nutritionist, otherwise it really is a guessing game until you figure it out through experimentation. Of course, you can try eating roughly to an approximate sensible target, and measure what happens over a few weeks. But, anyway, since we’re trying to put together “Plan A” in this blog post, let’s go with 2600 calories per day as a guesstimate for maintaining my current weight.

Ok, so let’s pretend I’ve fallen out of bed, headed downstairs, and I’m grabbing breakfast. Let’s say I eat steel-cut oats with fruit and nuts and 1% milk, and have a Starbucks tall latte on the way to work. Total price tag from MyNetDiary is 753 calories. I hit the gym, and grab a Starbucks apple bran muffin and another coffee at 10am: 510 calories. At 12, I grab a sandwich at work with two pieces of wholegrain bread, some chicken breast, two slices of cheese, tomato, and lettuce: 454 calories. By 3pm, I’m dragging, and grab another coffee and a Clif bar from the food machine: 430 calories. Now it’s dinner time: I decide to have a stir fry, with rice, chicken, and some vegetables: 510 calories. No after dinner snack for me today. Total: 2657 calories.

So, I’m pretty close to the 2600 calories, remembering the target isn’t science. I’m probably not gaining or losing weight, and I ate reasonably well. Do you eat better or worse? If I was eating slices of pizza for lunch, waffles for breakfast, eating candy, drinking soda, or hitting takeout for dinner, we’d be blasting into the 3000+ territory easily, and then we’re packing on the weight.

In practice, what makes or breaks this kind of plan is meticulous tracking, and having an accurate target that accounts for your goal, your output (exercise), and your inputs. It can work if you work hard.

What makes it hard is the high-carb foods. If I threw in a couple of slices of toast with honey for breakfast, you can add 170 calories to the total (and I could easily do that on the weekend, or even eat four slices for lunch). A cup of rice is 220 calories: if I eat 1.5 or 2 cups with my stir fry instead of 1, it’s goodbye to the plan for the day. A bowl of pasta (say 1.5 cups of plain old spaghetti with sauce) is 330 calories.
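Since I'm a computer scientist, here's the arithmetic as a few lines of Python. The calorie figures are the ones from this post; the "day with extras" (toast at breakfast plus a second cup of rice at dinner) is just one combination I've assumed to show how quickly the total drifts.

```python
TARGET = 2600  # my guesstimate for maintaining weight, in calories per day

# The Plan A day described above, in calories.
meals = {
    "breakfast: oats, fruit, nuts, milk, tall latte": 753,
    "10am: apple bran muffin and coffee": 510,
    "lunch: chicken and cheese sandwich": 454,
    "3pm: coffee and Clif bar": 430,
    "dinner: stir fry with one cup of rice": 510,
}
total = sum(meals.values())

# Assumed extras for illustration: easy high-carb additions from the post.
extras = {
    "two slices of toast with honey": 170,
    "a second cup of rice with the stir fry": 220,
}
total_with_extras = total + sum(extras.values())

print(f"planned day:     {total} calories ({total - TARGET:+d} vs target)")
print(f"day with extras: {total_with_extras} calories "
      f"({total_with_extras - TARGET:+d} vs target)")
```

Two small, unremarkable additions are enough to turn a near miss into a day that's well over the target, which is why the tracking has to be meticulous.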

Of course, sometimes you need a blast of energy, and this high carb loading is a good thing. If I was hitting the hills on the mountain bike for three hours, it’s probably a good idea. If I was a marathon runner who’s training hard, I’d need this kind of intake. But most of the time, I’m a sedentary office worker (even though I probably do a good 60 to 90 minutes of meaningful exercise each day). So I don’t need the energy, and I don’t want my insulin spiking, and my body storing it as fat for the apocalypse…

Anyway, I’ve given this a good college try, and it doesn’t work for me. Too much measuring, recording, counting. Too much going over the goal by lunch, and winding up hungry later on to hit the goal. Too complicated.

Plan B: Skip the grains

Here’s the non-traditional thing to do: skip the grains, and go easy on the high-sugar foods (don’t eat anything with more than 10g of sugar per 100g of product).

I’ve got an awesome breakfast that I cook every morning. It’s a kind of souffle pancake filled with fruit and nuts. The ingredients are eggs, fruit, nuts, vanilla, and cinnamon; I’ll make a video of how I make it sometime, and take this blog into YouTube cooking land. But, bottom line, when I eat it, it’s mighty big and I’m full, full of energy, and happy. It’s about 400 calories in total.

After the gym, I have a small handful of nuts, some carrots or snap peas or celery, and some turkey jerky. Again, gets me back to feeling full, and restores my energy.

At lunch, it’s salad with lean meat. I’ll have a massive portion of chicken and spinach, and sometimes throw in plenty of colorful greens. For a mid-afternoon snack, I have another massive portion of chicken and salad. Effectively, I’m eating two lunches per day.

For dinner, I stick with meats, vegetables, and salads, and that means lots of BBQs / cookouts, and plenty of spice in the food we cook. Last night, I had roasted tri-tip, a spinach and sweet potato salad with a lemon dressing, and a delicious shredded lettuce, almond, and carrot salad with a yummy dressing.

All up, I love breakfast, and I love dinner. The rest can be a bit of a chore, but I am certainly always eating and feeling full. And who said that food has to be a hedonistic experience at every meal anyway?

When I put this into a tool, even with plenty of different variations, it’s always either on my target, or slightly under. And even when I throw in a glass of wine, or a sweet treat at the end of the day, I’m never over by more than a couple of hundred calories. It’s pretty hard to miss when you don’t eat grains, and you steer clear of the high sugar stuff.

I also feel great on Plan B — I just plain feel better from having given up grains, and steering clear of high sugar foods. My digestion is better, and when I mess up and eat wheat, I feel sick. That’s converted me — if I feel great without grains, and terrible when I eat them, then I don’t need them in my life.

Am I short on any nutrient and do I get enough fiber? No and yes. I eat an amazing variety of vegetables and fruits, nuts and meats, and spices and condiments. From my tracking, it looks like I’m spot on where I need to be. I don’t need “whole wheat” (lots of carbs with some indigestible fiber stuff attached) to somehow be magically healthier — there’s no problem that needs solving.

The Bottom Line

I’ve tried two basic ways to get to a healthy weight and maintain it: calorie counting and planning, or just avoiding grains and high sugar foods.

Calorie counting is too complicated for me, and prone to big, bad misses that are fueled by messing up when I eat high carb foods. I’ve never been successful following Plan A. How about you?

Avoiding grains is a basic rule, and it doesn’t require meticulous recording and counting to be roughly right, day after day. The bonus of avoiding grains for me is that I also feel better; perhaps I’m slightly gluten intolerant and blissfully lived most of my life not knowing (and now I feel better!). I’ve been successful on Plan B.

If you’re interested in Plan B, perhaps you should have a chat to a nutritionist, and see if a grain-free eating plan is right for you. Ask them if you should try it for 30 days, and then share your data with my readers, and let’s see where the thinking goes.