Passwords are in the news lately, and particularly associated with ‘leaks’.
In general, passwords are rarely leaked, because they usually aren’t stored. What is usually leaked are the password hashes.
How Passwords Work
When you provide a new password for an account, here’s what typically happens:
- The company hashes your password so that it becomes a string of characters that isn’t the password (more in a moment)
- The company stores the hash
- The company throws away your original password
When you come back to the site, and provide your password, here’s what happens:
- You type your password
- The password is hashed (using the same algorithm as before)
- The hash is compared to what’s stored by the company
- If they’re the same, you’re in. If they’re different, your password is wrong, try again
The method that’s used to turn a password into a hash should be one-way. That is, it should be theoretically impossible to reverse the hash into the password. That’s actually pretty easy: most hashing algorithms throw away lots of information, and so the hash is lossy (that is, it has less information in it than the original password — it’s just a very-likely-to-be-unique string that represents the password). So it is actually usually impossible to reverse a password hashing algorithm.
There are many different algorithms for one-way hashing that can’t be reversed. The most popular one-way hash is the 128-bit MD5 hash, though this has been shown to be somewhat insecure (I’ll explain this in a minute). More recent approaches such as SHA-2 are similar in their theory but more secure.
Here’s an example. If you put the string password into an MD5 function, you get 5f4dcc3b5aa765d61d8327deb882cf99 as the output.
How can something that’s lossy be insecure? Since it’s impossible to reverse it, how can it be used by a hacker to find the original password?
The answer is that you can try putting vast numbers of original passwords through the hashing algorithm, and see if you can match the output to what is actually stored. So, for example, you could hash every English word using MD5, and see if you get a string that matches the leaked hashes. If you do, you’re in. And if you find one that matches, chances are you find many more that match too — lots of users have the same password.
That’s the problem with the MD5 hash: computers are sufficiently fast that it’s possible to try very, very large numbers of input strings and compare them to hashes (more later). And so if you’ve got the computing resources, you can effectively “reverse” the hashing.
The best way to make it very hard for hackers to figure out the original passwords is to salt them. Salting is a pretty simple idea: instead of hashing the password, hash the password and some other string concatenated to it. That “some other string” is non changing: it could the user’s user ID in the system, the time they created their account, or something else that’s stored but not typically not known to the user.
Here’s an example. Suppose the user supplies the password “magpie”. Instead of hashing this alone, we might append their user ID, say “123456” to the string. So, we’d be hashing “magpie123456” to get our password hash that’s stored in the system.
How does salting help? Well, it makes it hard to reverse the hashing: a hacker won’t be able to match common passwords (say, English words) against the hashes, and so they’ll effectively have to try vastly more input strings. When they crack one, they typically also won’t crack more, since even if the passwords of two users are the same, the salted strings aren’t — “magpie123456” and “magpie123457” produce vastly different hashes.
Of course, if a hacker gains access to the complete database, and the salting algorithm — and so they have all of the input data, and the algorithm for creating the salted strings, you’re in lots of trouble. But if they’ve done that, you’ve got worse things to worry about.
Salting is essential. Don’t build a password system without it. And when you do build the password system, consider a hashing algorithm such as SHA-2. Even the author of MD5 doesn’t recommend using it now — computers have enough resources to brute force crack MD5 using a birthday attack.
I’ll be back next week.