What’s Big Data anyway? Part Two

Last week, I shared a few ways in which big data adds value. This week, I share a few more.

Predictions

You can predict the future using data. Google gets publicity from predicting flu outbreaks.

Years earlier, I did something thematically similar that illustrates the idea of using big data to predict the future. I was interested in what queries users typed before and after the query stomach ache (and a few synonymous queries). Google and Bing both give you examples of what users type next, including: diarrhea, nausea, constipation, peptic ulcer, and stomach acid symptoms. Why was I interested? I wanted to see if I could figure out which drugs had side effects that included stomach upsets.

Talking about eBay's use of big data at the 2012 PHP UK conference

I collected all the queries that users typed before and after stomach ache (and its synonyms) over a period of about two years. I then threw away all queries that contained only English dictionary words, leaving queries that contained one or more non-dictionary words. What’s left? Drug names, and a ton of other junk (places, people, websites, misspellings, foreign words, and so on). What I found was that users were typing the names of drugs they were taking, learning about them, and then searching for information on stomach problems (and vice versa). I could also see how frequently each drug was associated with a stomach ache.

I looked up some of the drugs on various websites, and learnt about the side effects. Guess what? More than half of the drugs I checked listed stomach ache as a side effect. The rest didn’t, but I suspect those listings probably aren’t right. If you have enough users, you can learn about the future: I know that at least a couple of the drugs have since had their side effects updated to include rare incidences of stomach aches. See: you can predict the future!

The world of big data has many companies built on predicting the future using vast amounts of historical data. One of my favorites is The Climate Corporation (which was recently purchased by Monsanto). They invested their time in doing a better job of predicting the weather than existing weather providers, and commercialized the insights by selling insurance against weather events.

Relative Performance

Every major website is running A/B tests. The idea is pretty simple: show one set of users “experience A” and show another set of users “experience B”. You do this for a while, and then compare various metrics between the populations. You might learn, for example, that customers prefer a blue button over a grey button, or that customers buy more products if you show them better product imagery. I’ve written about this topic previously.

Why’s this related to big data? Well, you have to collect and process an enormous amount of data to derive insights. To find statistically significant differences between the behaviors of populations of users, you typically need tens of thousands of users in each test and a reasonable time period of tracking all of their behaviors. Multiply this by the number of tests you’re running concurrently, add a plan to keep the data forever and a desire to produce many different insights, and you will have petabytes of data on your hands.
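Here is a rough sketch of why the population size matters. It uses a standard two-proportion z-test, which is one common way to compare conversion rates between two groups; I’m not claiming it is what any particular website runs. The function name and the numbers (a 5.0% versus 5.5% purchase rate) are invented for illustration.

```python
import math

def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns the z statistic and the two-sided p-value.
    """
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    # Pooled rate under the assumption that A and B behave the same.
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same invented rates (5.0% vs 5.5%), two different population sizes.
z_small, p_small = two_proportion_z_test(50, 1_000, 55, 1_000)
z_large, p_large = two_proportion_z_test(2_500, 50_000, 2_750, 50_000)

print(f"1,000 users per group:  z={z_small:.2f}, p={p_small:.4f}")
print(f"50,000 users per group: z={z_large:.2f}, p={p_large:.4f}")
```

With 1,000 users in each group the difference is indistinguishable from noise at the usual 5% threshold. With 50,000 in each group the same difference is clearly significant, which is why each test needs so many users.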

Creating Feature Ideas

My third ever blog post was about inventing infinite scroll on the Web. It’s a good example of how you can use data to understand customers, and then create intuitive insights based on that understanding. In that example, we saw that users of image search paginated a ton, and we created a future without pagination — what’s now known as “infinite scroll”. You need lots of data, you need to keep that data, and you need to be able to create insights from that data to have these kinds of feature ideas.

Afterword

I don’t intend this to be a taxonomy of big data themes. There’s much more you can do with data — this is a stream of consciousness of themes I’ve seen in action. In my world, very little happens without big data: you’re using data to understand users and systems, you’re creating new ideas with that data, and you’re iterating on those ideas by measuring them at scale. Even the big leaps — like infinite scroll — aren’t ideas that are created in the absence of data.

See you next time.

2 thoughts on “What’s Big Data anyway? Part Two”

  1. Pingback: What’s Big Data anyway? | Hugh E. Williams

  2. Mike Jeanes

    Hey Hugh, I’m following some of your articles with interest! I thought of your stomach example as less prediction and more of aggregating disaggregated information. So in that way it might be thought of as overcoming a type of what economists would call information asymmetry (drugging around all the pharma websites is tedious).

    By the way, I’ve played around building an alpha web concept, SeekaFinda.com. I’d be interested in your thoughts on the concept, if you have any.

    Might see you in Melbourne some time!
    Cheers
    Mike
