It’s an apostrophe

Even the smartest folks I know can’t get apostrophes right. (Please don’t read all my blog posts and find the mistakes!) Let me see if I can help.

  • “It’s” is equivalent to “it is”. If you write “it’s” in a sentence, check it makes sense if you replace it with “it is”. If yes, good. If no, you probably meant “its”
  • “Its” is a possessive. “The dog looked at its tail”. As in, the tail attached to the dog was stared at by the aforementioned canine

Get those right, and you’re in the top 2% of apostrophe users.

Don’t write “In the 1980’s, rock music was…”. You mean “In the 1980s, …”. As in, the plural: the ten years that constitute the decade that began in 1980. These are also correct: “He collected LPs” or “She installed LEDs instead of incandescent globes”. You’ll find some people argue about these: for example, some folks write “mind your P’s and Q’s”, and argue it’s correct. I personally think it’s wrong: there are many Ps and Qs, and so it should be “mind your Ps and Qs”.

Watch out for possession of singular nouns that end in “s”. “Hugh William’s blogs are annoying” and “Hugh Williams’ blogs are annoying” are both wrong. “Hugh Williams’s blogs are annoying” is right (in more ways than one?).

One trick I use is this: if you say the “s”, add the “s”. Hugh Williams’s blog. Ross’s Dad. The boss’s desk. If you don’t say it, don’t add it. His Achilles’ heel. Those genres’ meaning.

Have a fun week!

Fireside chat at the DataEdge Conference

The video of my recent conversation with Michael Chui from McKinsey as part of the UC Berkeley DataEdge conference is now online. Here it is:

The discussion is around 30 minutes. I tell a few stories, and most of them are mostly true. We talk about my career in data, search, changing jobs, inventing infinite scroll, eBay, Microsoft, Pivotal, and more. Enjoy!

Putting Email on a Diet

A wise friend of mine once said: try something new for 30 days, and then decide if you want to make it permanent.

Here’s my latest experiment: turning off email on my iPhone. Why? I found I was in work meetings, or spending time with the family, and I’d frequently pick up my phone and check my email. The result was I wasn’t participating in what I’d chosen to be part of — I was distracted, disrespectful of the folks I was with, and fostering a culture of rapid-fire responses to what was supposed to be an asynchronous communication medium. So, I turned email off on my iPhone.

What happened? I am enjoying and participating in meetings more. I am paying attention to the people and places I have chosen to be. And I’m not falling behind on email — I do email when I choose to do it, and it’s a more deliberate and effective effort.

Have I strayed? Yes, I have. When I’m truly mobile (traveling and away from my computer), I choose to turn it on and stay on top of my inbox — that’s a time when I want to multitask and make the best use of my time by actually choosing to do email. And then I turn it off again.

My calendar and contacts are still enabled. On the go, I want to know where and when I need to be somewhere, and to be able to consciously check my plans. I also want to be able to contact people with my phone.

Will I stick with it? I think so. Give it a try.

See you next time.

Armchair Guide to Data Compression

Data compression is used to represent information in less space than its original representation. This post explains the basics of lossless compression, and hopefully helps you think about what happens when you next compress data.

Like my post on hash tables, it’s aimed at software folks who’ve forgotten about compression, or folks just interested in how software works; if you’re a compression expert, you’re in the wrong place.

A Simple Compression Example

Suppose you have four colored lights: red, green, blue, and yellow. These lights flash in a repeating pattern: red, red, green, red, red, blue, red, red, yellow, yellow (rrgrrbrryy). You think this is neat, and you want to send a message to a friend and share the light flashing sequence.

You know the binary (base 2) numbering system, and you know that you could represent each of the lights in two digits: 00 (for red, 0 in decimal), 01 (for green, 1 in decimal), 10 (for blue, 2 in decimal), and 11 (for yellow, 3 in decimal). To send the message rrgrrbrryy to your friend, you could therefore send them this message: 00 00 01 00 00 10 00 00 11 11 (of course, you don’t send the spaces, I’ve just included them to make the message easy for you to read).
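
If you like code, here’s that fixed-width scheme as a minimal Python sketch. It’s nothing more than the example above made runnable: the colors and two-bit codes are exactly the ones we just chose.

```python
# A minimal sketch of the fixed-width scheme: every flash costs two
# bits, no matter how often its color occurs.
CODES = {"r": "00", "g": "01", "b": "10", "y": "11"}

def encode_fixed(flashes):
    """Encode a sequence of flashes ('r', 'g', 'b', 'y') at 2 bits each."""
    return "".join(CODES[f] for f in flashes)

def decode_fixed(bits):
    """Decode by reading the bits back two at a time."""
    lookup = {code: color for color, code in CODES.items()}
    return "".join(lookup[bits[i:i + 2]] for i in range(0, len(bits), 2))

message = encode_fixed("rrgrrbrryy")
print(message, len(message))             # 00000100001000001111 20
assert decode_fixed(message) == "rrgrrbrryy"
```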

Data compression. It’s hard to find a decent picture!

You also need to send a key or dictionary, so your friend can decode the message. You only need to send this once, even if the lights flash in a new sequence and you want to share that with your friend. The dictionary is: red green blue yellow. Your friend will be smart enough to know that this implies red is 00, green is 01, and so on.

Sending the message takes 20 bits: two bits per light flash and ten flashes in total (plus the one-off cost of sending the dictionary, which I’ll ignore for now). Let’s call that the original representation.

Alright, let’s do some simple compression. Notice that red flashes six times, yellow twice, and the other colors once each. Sending those red flashes takes twelve bits, the yellow flashes take four, and the green and blue flashes take the remaining four of the total of twenty bits. If we’re smart, we should send the code for red in one bit (instead of two) since it’s sent often, and use more than two bits for green and blue since they’re each sent only once. If we could send red in one bit, it’d cost us six bits to send the six red flashes, saving six bits over the two-bit version.

How about we try something simple? Let’s represent red as 0. Let’s represent yellow as 10, green as 110, and blue as 1110 (this is a simple unary counting scheme). Why’d I use that scheme? Well, it’s a simple idea: sort the color flashes by decreasing frequency (red, yellow, green, blue), and assign increasingly longer codes to the colors in a very simple way: you can count 1s until you see a 0, and then you have a key you can use to look in the dictionary. When we see just a 0, we can look in the dictionary to find that seeing zero 1s means red. When we see 1110, we can look in the dictionary to find that seeing three 1s means blue.

Here’s what our original twenty-bit sequence would now look like: 0 0 110 0 0 1110 0 0 10 10. That’s a total of 17 bits, a saving of 3 bits — we’ve compressed the data! Of course, we need to send our friend the dictionary too: red yellow green blue.
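
Here’s the same sketch updated for the count-the-1s scheme. Decoding really is as simple as counting 1s until a 0 appears:

```python
# A sketch of the simple "count the 1s" scheme: codes 0, 10, 110, and
# 1110, assigned to colors sorted by decreasing frequency.
CODES = {"r": "0", "y": "10", "g": "110", "b": "1110"}

def encode_simple(flashes):
    return "".join(CODES[f] for f in flashes)

def decode_simple(bits):
    """Count 1s until a 0 ends the code, then look up the color."""
    by_ones = {len(code) - 1: color for color, code in CODES.items()}
    colors, ones = [], 0
    for bit in bits:
        if bit == "1":
            ones += 1
        else:                        # a 0 terminates the current code
            colors.append(by_ones[ones])
            ones = 0
    return "".join(colors)

message = encode_simple("rrgrrbrryy")
print(message, len(message))         # 00110001110001010 17
assert decode_simple(message) == "rrgrrbrryy"
```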

It turns out we can do better than this using Huffman coding. We could assign 0 to red, 10 to yellow, 110 to blue, and 111 to green. Our message would then be 0 0 111 0 0 110 0 0 10 10. That’s 16 bits, 1 bit better than our simple scheme (and, again, we don’t need to send the spaces). We’d also need to share the dictionary: red <blank> yellow <blank> blue green, to show that 0 is red, 1 isn’t anything, 10 is yellow, 11 isn’t anything, 110 is blue, and 111 is green. A slightly more complicated dictionary for better message compression.
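
For the curious, here’s a rough sketch of how a Huffman coder builds that dictionary: repeatedly merge the two lowest-frequency subtrees until one tree remains. Tie-breaking varies between implementations, so the exact codes may differ from the ones above, but the total for our message still comes out at 16 bits.

```python
# A sketch of Huffman coding using Python's heapq as a priority queue.
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free code by merging the two lightest subtrees."""
    freq = Counter(text)
    # Heap entries are (weight, tiebreak, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)    # lightest subtree
        w2, _, codes2 = heapq.heappop(heap)    # second lightest
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("rrgrrbrryy")
message = "".join(codes[f] for f in "rrgrrbrryy")
print(codes)            # one valid assignment; yours may differ
print(len(message))     # 16 bits either way
```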

Semi-Static Compression

Our two examples are semi-static compression schemes. The dictionary is static: it doesn’t change. However, it’s built from a single pass over the data to learn the symbol frequencies, so the dictionary depends on the data. For that reason, I’ll call our two schemes semi-static schemes.

Huffman coding (or minimum-redundancy coding) is the most famous example of a semi-static scheme.

Semi-static schemes have at least three interesting properties:

  1. They require two passes over the data: one to build the dictionary, and another to emit the compressed representation
  2. They’re data-dependent, meaning that the dictionary is built based on the symbols and frequencies in the original data. A dictionary that’s derived from one data set isn’t optimal for a different data set (one with different frequencies) — for example, if you figured out the dictionary for Shakespeare’s works, it isn’t going to be optimal for compressing War and Peace
  3. You need to send the dictionary to the recipient, so the message can be decoded; lots of folks forget to include this cost when they share the compression ratios or savings they’re seeing. Don’t do that

Whatever kind of compression scheme you choose, you need to decide what the symbols are. For example, you could choose letters, words, or even phrases from English text.

Static Compression

Morse code is an example of a (fairly lame) static compression scheme. The dictionary is universally known (and doesn’t need to be communicated), but the dictionary isn’t derived from the input data. This means that the compression ratios you’ll see are at best the same as you’ll see from a semi-static compression scheme, and usually worse.

There aren’t many static compression schemes in widespread use. Two examples are the Elias gamma and delta codes, which are used to compress inputs that consist only of positive integers.
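
Here’s a sketch of the Elias gamma code, which shows why no dictionary needs to be sent: the code for any positive integer is fixed and can be computed on the fly.

```python
# A sketch of the Elias gamma code: N zeros, then the (N+1)-bit binary
# representation of the integer itself.
def elias_gamma(n):
    if n < 1:
        raise ValueError("gamma codes handle positive integers only")
    binary = bin(n)[2:]                  # e.g. 9 -> "1001"
    return "0" * (len(binary) - 1) + binary

for n in (1, 2, 9):
    print(n, elias_gamma(n))             # 1 -> 1, 2 -> 010, 9 -> 0001001
```

Notice that 1 costs a single bit while 9 costs seven: a static code like this only pays off when small values dominate the input.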

Adaptive Compression

The drawbacks of semi-static compression schemes are two-fold: you need to process the data twice, and they don’t adapt to local regions in the data where the symbol frequencies vary from the overall frequencies. Imagine, for example, that you’re compressing a very large image: you might find a region of blue sky, where there are only a few shades of blue. If you built a dictionary for only that blue section, you’d get a different (and better) dictionary than the one you’d get for the whole image.

Here’s the idea behind adaptive compression. Build the dictionary as you go: process the input, and see if you can find it in the (initially empty) dictionary. If you can’t, add the input to the dictionary. Now emit the compressed code, and keep on processing the input. In this way, your dictionary adapts as you go, and you only have to process the data once to create the dictionary and compress the data. The most famous example is LZW compression, which is used in many of the compression tools you’re probably using: gzip, pkzip, GIF images, and more.
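
Here’s a sketch of the encoding side of LZW. One detail the summary above glosses over: LZW actually seeds its dictionary with every single symbol, then grows it with each new sequence it encounters.

```python
# A sketch of LZW encoding: emit dictionary indexes, learning a new
# dictionary entry every time a match can't be extended.
def lzw_encode(data):
    dictionary = {chr(i): i for i in range(256)}   # seed: all single bytes
    current, output = "", []
    for ch in data:
        if current + ch in dictionary:
            current += ch                          # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)  # learn a new entry
            current = ch
    if current:
        output.append(dictionary[current])
    return output

# The repeated "rr" is learned early and then emitted as a single code.
print(lzw_encode("rrgrrbrryy"))    # [114, 114, 103, 256, 98, 256, 121, 121]
```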

Adaptive schemes have at least three interesting properties:

  1. One pass over the data creates the dictionary and the compressed representation, an advantage over semi-static schemes
  2. Adaptive schemes compress better the more input they see: since they don’t approximate the global symbol frequencies until lots of input has been processed, they’re usually much less effective than semi-static schemes on small inputs
  3. You can’t randomly access the compressed data, since the dictionary is derived from processing the data sequentially from the start. This is one good reason why folks use static and semi-static schemes for some applications

What about lossy compression?

The taxonomy I’ve presented is for lossless compression schemes — those that are used to compress and decompress data such that you get an exact copy of the original input. Lossy compression schemes don’t guarantee that: they’re schemes where some of the input is thrown away by approximation, and the decompressed representation isn’t guaranteed to be the same as the input. A great example is JPEG image compression: it’s an effective way to store an image, by throwing away (hopefully) unimportant data and approximating it instead. The MP3 music file format and the MP4 video format (usually) do the same thing.

In general, lossy compression schemes are more compact than lossless schemes. That’s why, for example, GIF image files are often much larger than JPEG image files.

Hope you learnt something useful. Tell me if you did or didn’t. See you next time.

Don’t use a Pie Chart

I don’t like pie charts. Why? It’s almost impossible to compare the relative size of the slices. Worse still, it is actually impossible to compare the slices between two pie charts.

Pie charts in action.

Take the example above. Take a look at Pie Chart A. The blue slice looks the same size as the red or green slice. You might draw the conclusion they’re roughly the same. In fact, take a look at the histogram below — the red is 17 units, blue is 18 units, and green is 20 units. The histogram is informative, useful for comparison, and clear for communication. (There’s still a cardinal sin here though: no labels on the axes; I’ll rant about that some other day.)

Compare pie charts B and C above. It sure looks like there’s the same quantity of blue in B and C, and about the same amount of green. The quantities aren’t the same: the histograms below show that green is 19 and 20 respectively, and blue is 20 and 22.

You might argue that pie charts are useful for comparing relative quantities. I’d argue it’s possible but difficult. Take a look at the three yellow slices in charts A, B, and C. They look to be similar percentages of the pies, but it’s hard to tell: they’re oriented slightly differently, making the comparison unclear. What’s the alternative? Use a histogram with the y-axis representing the percentage, as in the sketch below.
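
Here’s a quick matplotlib sketch of that alternative. The values are made up for illustration, not taken from the charts above.

```python
# A bar-style chart with percentages on the y-axis, in place of a pie.
import matplotlib.pyplot as plt

labels = ["red", "blue", "green", "yellow"]
values = [17, 18, 20, 22]                    # hypothetical raw counts

# Convert counts to percentages so relative quantities are comparable.
percentages = [100 * v / sum(values) for v in values]

plt.bar(labels, percentages, color=labels)
plt.ylabel("Percentage of total (%)")        # label the axes!
plt.title("Relative quantities on a percentage scale")
plt.show()
```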

Pie charts get even uglier when there are lots of slices, or some of the slices are ridiculously small. If you’re going to use them — I still don’t think it’s a good idea — then use them when there are only a few values and the smallest slices can be labeled. The pie chart below is ridiculous.

A pie chart that’s nearly impossible to read. I’m not sure what conclusions you could draw from this one.

Why do I care? I’m in the business of communicating information, and I spend much of my time reviewing information that people share with me. Clarity is important — decisions are made on data. See you next time.

Joining Pivotal

I joined Pivotal Software Inc. last week as their SVP of Research and Development. I’m excited to be part of a team that will change the big data landscape. I’m already enjoying getting down to work.

Why’d I choose Pivotal? I’ve seen big data transform businesses, but I’ve been lucky enough to see that from the inside of two Internet giants. I believe we’re on the cusp of seeing that transformation in the next few thousand or more large businesses, and Pivotal has the opportunity to help drive that transformation. One day, we will see health care, farming, manufacturing, education, finance, and communications iterate at the speed of an Internet company — and I believe it’s possible if they’re able to store their data, understand what it’s telling them, and iterate fast on their products. Indeed, I said as much yesterday in an interview with GigaOm.

I’m looking forward to the challenge. As I learn more, expect more blogs on Pivotal, its challenges, and more. See you next time.

Learning Lessons from the Apollo program

I’m fascinated by the Apollo program. It’s a triumph of human endeavour that we sent twenty-four men to the moon, and twelve to the surface between 1969 and 1972. It was a program built with reliable 1950s-designed equipment, and computers far less sophisticated than a cheap modern wristwatch. Achieving Kennedy’s audacious vision was made possible through teamwork, planning, hard work, and ingenuity.

Recently, I decided to learn more about the early Apollo missions to understand what work went into getting us to the moon. It’s an incredible story, and perhaps more interesting than several of the later missions.

An incremental approach

Apollo 7 was the first manned Apollo mission to fly in space. It was a confidence-builder, and the first time NASA had flown three people together in space around the Earth. After eleven days in orbit in 1968, the crew returned having tested the command module and successfully made a live TV appearance.

The Apollo 7 crew during the first live broadcast from space

The critical pieces of Apollo 7 had flown earlier. The unmanned Apollo 4 and 6 missions had tested launching a similar command module and returning it to earth, while Apollo 5 had tested the same Saturn IB rocket that was used to launch Apollo 7. Apollo 7 focused on putting three astronauts in space in a now-trusted setup.

Apollo 8 was a shorter, six-day mission. Three men flew to the moon, orbited it a few times, and returned to earth. They were the first to fly atop the Saturn V rocket — the smaller Saturn IB used in Apollo 7 wasn’t powerful enough to reach the moon. The Saturn V had been tested in Apollo 4 and 6. This was an audacious mission — testing both a manned Saturn V and a visit to the moon. Behind the scenes, it was a tough sell to management — they didn’t like being that aggressive. But it made sense as a mission: the lunar module was behind in development and wasn’t ready to be tested, and there was fear the Russians might get cosmonauts around the moon in 1968.

The far side of the moon as seen from Apollo 8. The Apollo 8 crew was the first to ever see the dark side of the moon

Apollo 9 was a return to an incremental approach. The mission orbited the earth, tested the lunar module (the craft that would later land on the moon) in earth orbit, and included a space walk to test the spacesuits. It also included a rendezvous, which was necessary because the command and lunar modules had separated.

Testing the Apollo 9 lunar module Spider in earth orbit

Apollo 10 was the full dress rehearsal for landing on the moon. It was time to repeat the test of the lunar module, but this time in moon orbit. The lunar module detached from the command module, and the crew descended to within ten miles of the moon’s surface. They then returned to rendezvous with the command module, and journeyed back to earth. In total, the crew spent eight days in space, and the mission was a huge success — so successful that Apollo 11 was the mission that met Kennedy’s goal in July 1969.

The Apollo 10 lunar module Snoopy returns from almost landing on the moon

Interestingly, many people at NASA thought early in the Apollo program that Apollo 12 was likely to be the first mission to land on the moon. Perhaps if Apollo 9 had happened before Apollo 8 (as was originally planned), there might have been two separate missions: one to test the manned Saturn V, and then a manned Saturn V flight to the moon. Certainly, if something substantial had gone wrong between Apollo 7 and 10, Apollo 11 would have been repeating validation of the spacecraft, space suits, and processes.

The tortoise beats the hare

The Soviet Union went all-in with Soyuz 1. It was the first flight of the new Soyuz spacecraft and Soyuz rocket, and was planned to rendezvous with the three-man Soyuz 2. The mission had problems from the start — a solar panel failed to deploy, and this delayed the launch of Soyuz 2. The weather turned bad, and Soyuz 2 didn’t launch. This was fortunate: the parachute on Soyuz 1 didn’t deploy due to a design fault, and its single cosmonaut died on reentry. If Soyuz 2 had launched, its crew wouldn’t have survived either.

Vladimir Komarov, the cosmonaut who died in the ill-fated Soyuz 1

This was 1967, a year before Apollo 7. The Soviets went for broke, testing rockets, capsules, rendezvous, and more in one mission. On paper, it looked like they were ahead. The result was failure — an 18-month delay in the program, and ultimately failure to get to the moon before the Americans. Indeed, Soyuz 4 and 5 in early 1969 eventually completed the mission aims of Soyuz 1 and 2.

The tortoise beat the hare. It was a pretty fast tortoise, but you see the point. The pragmatic approach of trying one complex new component in each mission ultimately made the Apollo program successful. Doing everything at once didn’t work. There’s something in that for all of us. See you next time.