Passwords are in the news lately, and particularly associated with ‘leaks’.
In general, passwords are rarely leaked, because they usually aren’t stored. What is usually leaked are the password hashes.
How Passwords Work
When you provide a new password for an account, here’s what typically happens:
- The company hashes your password so that it becomes a string of characters that isn’t the password (more in a moment)
- The company stores the hash
- The company throws away your original password
When you come back to the site, and provide your password, here’s what happens:
- You type your password
- The password is hashed (using the same algorithm as before)
- The hash is compared to what’s stored by the company
- If they’re the same, you’re in. If they’re different, your password is wrong, try again
The method that’s used to turn a password into a hash should be one-way. That is, it should be theoretically impossible to reverse the hash into the password. That’s actually pretty easy: most hashing algorithms throw away lots of information, and so the hash is lossy (that is, it has less information in it than the original password — it’s just a very-likely-to-be-unique string that represents the password). So it is actually usually impossible to reverse a password hashing algorithm.
There are many different algorithms for one-way hashing that can’t be reversed. The most popular one-way hash is the 128-bit MD5 hash, though this has been shown to be somewhat insecure (I’ll explain this in a minute). More recent approaches such as SHA-2 are similar in their theory but more secure.
Here’s an example. If you put the string password into an MD5 function, you get 5f4dcc3b5aa765d61d8327deb882cf99 as the output.
How can something that’s lossy be insecure? Since it’s impossible to reverse it, how can it be used by a hacker to find the original password?
The answer is that you can try putting vast numbers of original passwords through the hashing algorithm, and see if you can match the output to what is actually stored. So, for example, you could hash every English word using MD5, and see if you get a string that matches the leaked hashes. If you do, you’re in. And if you find one that matches, chances are you find many more that match too — lots of users have the same password.
That’s the problem with the MD5 hash: computers are sufficiently fast that it’s possible to try very, very large numbers of input strings and compare them to hashes (more later). And so if you’ve got the computing resources, you can effectively “reverse” the hashing.
The best way to make it very hard for hackers to figure out the original passwords is to salt them. Salting is a pretty simple idea: instead of hashing the password, hash the password and some other string concatenated to it. That “some other string” is non changing: it could the user’s user ID in the system, the time they created their account, or something else that’s stored but not typically not known to the user.
Here’s an example. Suppose the user supplies the password “magpie”. Instead of hashing this alone, we might append their user ID, say “123456” to the string. So, we’d be hashing “magpie123456” to get our password hash that’s stored in the system.
How does salting help? Well, it makes it hard to reverse the hashing: a hacker won’t be able to match common passwords (say, English words) against the hashes, and so they’ll effectively have to try vastly more input strings. When they crack one, they typically also won’t crack more, since even if the passwords of two users are the same, the salted strings aren’t — “magpie123456” and “magpie123457” produce vastly different hashes.
Of course, if a hacker gains access to the complete database, and the salting algorithm — and so they have all of the input data, and the algorithm for creating the salted strings, you’re in lots of trouble. But if they’ve done that, you’ve got worse things to worry about.
Salting is essential. Don’t build a password system without it. And when you do build the password system, consider a hashing algorithm such as SHA-2. Even the author of MD5 doesn’t recommend using it now — computers have enough resources to brute force crack MD5 using a birthday attack.
I’ll be back next week.
Actually, double like!!!! 🙂
I’d recommend using OTP as ultimate solution. For example, Yubi key will be a nice alternative to traditional token.
Its completely upto the developer/company. Few of them are saving the user passwords directly into their db (No doubt, I’ve seen many). 🙂
Nicely explained. Wish that LinkedIn’s “world class” security team would read this.
My company has been doing this for a few years now. We use a random numbers of varying lengths for the salt and it changes each time the user changes their password. We also encypt the salt on the database. I wish I could say we thought of it first but, the idea has been on the net for some time.
LinkedIn’s “world class” security team is well aware of this and way ahead of the hackers. That’s why very few of the “published” passwords were not still hashed. They were probably much older passwords that had not been rehashed under the current salting system.
technically short salts may not serve as much of a purpose as
a) long and different salts for every account. (still won’t protect your users from someone brute-forcing a single account password)
b) using a password storage crypto system e.g. bcrypt, not a single salted hash. (these would also be salted I imagine)
Keep in mind, salting with different large enough values (not 1 D:) renders rainbow tables useless (can’t just use one table for all users off the net). BUT
What salts won’t do is protect your (individual) users from a dictionary attack on their password. That’s where bcrypt and the like come in.
You get computationally expensive (100s of passwords a second, not millions) repeated password hash calculations, and it’s much harder (computationally expensive) to get (repeatedly guess) a password for even a single user, vs a salted hash.
For a salted hash, if you know the salt value (and it’s just prepended or appended), you can use john the ripper with a pre-harvested custom dictionary (e.g harvesting user info from a social networking site, or their company site), or a variation of dumped/cracked passwords from similar sites and run it through a progressive series of password permutations (based on the most common ones people use).
tho, Bcrypt’s use of computational resources (CPU) for iterative getting of the final password hash seems rather asking for an application level DoS for budget one server sites, where a low number of requests (1000s) per second will cripple your application. (if you have a server farm and dedicated login servers you’re probably peachy)
as a sidenote, the entropy of passphrases is also debatable (ignoring the psychological choice/forgetting and writing down biases and complications). (say 20000P5 (which is pretty generous for some people’s vocab and memory, where some say 3000p4 is more reasonable), vs 50P8 or 50P12 for passwords)
>MD5 doesn’t recommend using it now — computers have enough resources to brute force crack MD5 using a birthday attack.
it’s not recommended because of low computational complexity and wide availability of rainbow tables, not because of collisions.
Collisions are important for digital signatures (and MD5 is not recommended there, because you can get a CA to sign something benign and then brute force mutate something to have the same hash).
That said a collision for a password hash would be bad (same hash = valid password for web most web application purposes) – the amount of computational power that you’d have to expend is not the 2^21 wikipedia suggests at first glance and ‘desktop’ level CPU crackable in seconds.
The thing is, for passwords, you’re generally dealing with a single block MD5 and a limited inputs set (a-zA-Z0-9, some special chars), and the collision vulnerabilities were dealing with arbitrary input bytes and block sizes of 128bytes and more. (well aside from the potential implications from single block collision by http://eprint.iacr.org/2010/643 , but that’s yet unpublished)
>And when you do build the password system, consider a hashing algorithm such as SHA-2.
Salted SHA-2 hashes won’t really save your single user from getting his password bruteforced as per above
Svyr -> we should write a follow up blog post together! Send me an email, and maybe we can draft something for one of the major blog sites. I confess my knowledge is entry level – i took a cryptography class in 1990 (?) and wrote a few pages in one of my books. I saw what was missing in the news / blogosphere was a simple explanation of how passwords should work, and what actually constitutes a leak