Timing Attacks Explained

Jul 25, 2010

Ed. note: This post was originally written as part of the Security Awareness Program at Square. If you find this stuff interesting come join us — we’re working on lots of cool problems, and are growing fast. You can also get in touch with me directly if you’d prefer.

I should also note that it took me a while to get this post up. For “this week”, please read “the week of July 13”. — sq

Update 9/3/2010 Note that this is just a blog post. If you have problems with timing attacks, then you have real problems — and should contact a professional. Don’t assume that because you read this post, you can design a secure cryptosystem.

Timing Attacks Explained

Probably the most interesting bit of security research that happened this week was the dropping of this little bombshell by Taylor Nelson of Root Labs:

Every OpenID implementation I have checked this far has contained timing dependent compares in the HMAC verification, allowing a remote attacker to forge valid tokens.

I want to walk through this issue, talking about what it means, how it happened, and what happened next. But first, some background is probably in order:

OpenID

OpenID is essentially a lightweight single-sign on system for the web. The idea is that you have one “identity” — often just your primary email address — and you use that whenever you sign into other websites. Instead of having to create and remember a password every time you sign up for a new website, those services can simply ask your “identity provider” (eg, gmail) to verify that you are who you say you are. OpenID is the protocol that the websites and identity providers speak in order to carry out this verification.

As you might imagine, under the covers, the OpenID protocol involves passing a bunch of messages around between the identity provider and the web service. Some of these messages might say, in effect, “this user is who he says he is”. If an attacker could forge these messages, then she could create messages saying she is anyone she likes — thus totally destroying the security of the system.

HMAC

To prevent forgery, OpenID uses, appropriately enough, Message Authentication Codes (sometimes called MACs or signatures). In particular, the standard calls for a keyed-hash message authentication code (“HMAC”) — but the details aren’t important for this story. The point is just that, effectively, these messages have signatures — long, seemingly-random strings of bytes — and only the identity provider and the website know how to calculate the signature.

Checking Signatures

The problem that Taylor pointed out is in how these signatures get checked. All of the implementations he checked — in Java, Python, Ruby, etc — checked these signatures in the same, naive way. In pseudo-code:

if (signature(message) == signature_bytes) 
    return true;

In English, that says something like: Compute the correct signature of the message, and compare it (using the “==”operation) to the signature that we received. If (and only if) they match, then the signature is correct.

If you’re thinking that that seems correct, then you’re right — which is why all of the implementors wrote it that way. Unfortunately, in all of the languages in question, it only seems right: there is a subtle bug under the surface.

Comparing Strings

In order to understand the problem, we need to look at the way that languages implement the “==” operation. Say we have two strings:

a = "ABCD"
b = "ABBA"

When we test the truth of the statement a==b, almost every language runs through the same algorithm: First, look at the first character. If they’re the same, move on to the next step. If they’re different, stop now because we know the two strings are different. For the next step, do the same thing with the second character; and repeat until you get to the last character. If you get to the end without finding a difference, then you know the two strings are the same.

In pseudo-code:

for (i = 0; i < a.length; i++) {
	if (a[i] != b[i])
		return false;
}
return true;

(Astute readers will note one important check missing from the pseudo-code. I left it out for clarity; extra credit to whomever spots it first. No bonus points for smart-alecs who point out that this wrong because modern CPUs do this in hardware.)

Applying this algorithm to our two strings, a and b above, we get:

  1. Compare the first characters of each string. They’re the same — both “A” — so we move on to the next step.

  2. Compare the second characters. Again, they’re the same — both “B” — so we move on to the next step.

  3. Compare the third characters. This time, they’re different: the third character of a is “C”, while the third character of b is “B”. Since we know the strings are different, we can stop now.

The Problem

The important point to note here is that the algorithm stops as soon as we notice a difference. This is good, normally, because it means that we don’t waste time continuing to check the strings even after we know they differ. However, in the case of MACs, this optimization is fatal.

Imagine that Alice has a website that uses OpenID, and Bob has an account on the site. Mallory, an attacker, wants to break into Bob’s account. In order to do this, she needs to forge a message that says she is Bob. Since messages are signed, this really means that she has to forge the signature for that message.

Signatures are normally long strings of bytes, but for the purposes of this example, let’s assume they are short strings of upper-case letters from A-Z. Thus, even more concretely, what Mallory has to do is figure out what that ten-letter string is.

The Attack

Here is where the optimization problem comes in. Mallory knows that the first character of the signature is a letter from A to Z. To figure out which letter is correct, she can send each of the following signatures to the server:

AAAAAAAAAA
BAAAAAAAAA
CAAAAAAAAA
DAAAAAAAAA
...
XAAAAAAAAA
YAAAAAAAAA
ZAAAAAAAAA

Now, it’s extremely unlikely that any of these is the correct signature for the message she wants to send. In fact, in 25 out of the 26 cases, the server only has to look at the first letter to know that the signature is wrong. But Mallory knows that, in exactly one of the 26 cases, the first letter will be correct, and the server will have to move on and compare the second letter of her signature. For that one message, the server has to perform more work — and so the server will take slightly longer to return an error message. By timing how long it takes for the server to report an error, Mallory can detect which letter takes longer and thus which is the correct first letter. Armed with this knowledge, she can just repeat the trick and figure out the second letter, and then the third, etc etc.

And that’s the attack. Mallory can use this trick to figure out, for a given message, what the correct signature is. And that means she can totally compromise the security of OpenID — in this case, by convincing Alice’s server that she is Bob.

But…

Many folks, when they understand the attack, respond by saying something like, “Ok, there may be a timing difference — but it’s very very small, and the Internet is very very big. There’s no way you could detect which signature takes longer!”

Intuitively, this seems right. Given all the routers and proxies and whatnot on the Internet, there are a ton of random network-level delays that are much bigger than the delay introduced by one extra character comparison. For each message Mallory sends, this network jitter will outweigh any information about which signature took longer to process.

But of course, Mallory isn’t limited to trying each signature once. She can try as many times as she likes — and by averaging results across a large enough data set, she can remove the effects of this jitter. This is the magic of statistics: any difference can be observed, given enough measurements.

In this case, Crosby et al showed that a few tens of thousands of measurements is all it takes to time things with 15 microsecond accuracy over the Internet. (On a LAN, the same number of measurements can pinpoint timings to ~100 nanoseconds.) The question of just how much precision Mallory needs in order to carry out her attack will depend on the language used to implement the server — but as an example, Dan Boneh and friends showed in 2003 that, due to a similar timing vulnerability in OpenSSL, a few million requests could reveal an entire 1024-bit SSL private key. More concretely, the reason that this issue in OpenID is being discussed now is that the authors at Root Labs are going to be presenting new results at Blackhat in a few weeks — results that, presumably, show how OpenID can be compromised within a reasonable number of requests by using this attack.

The Fix

The problem here is essentially that the timing of the server’s responses reveals information about the correct signature. The solution, then, is to ensure that the server takes the same amount of time to respond no matter which signature is sent. Remember that the implementation of == in most languages stops checking the strings as soon as a difference is found. What we want is a comparison function that checks the whole string before reporting whether or not a difference was found. In pseudo-code:

match = true;
for (i = 0; i < a.length; i++) {
	if (a[i] != b[i])
		match := false;
}
return match;

This is exactly the same as the previous == definition, except that the return statement inside the for loop has been removed. Now, when a difference is detected, that will be noted — but the comparison will still continue through the rest of the string. Only once the entire string has been checked will the result be returned.

This is a good solution, but the astute reader will note that the code that gets executed still depends on the equality of the strings. Two strings that differ only in one character will execute the “match := false” line exactly once, while strings that differ in all 10 places will execute it 10 times, and strings that match exactly will execute it zero times. Even this kind of difference could conceivably be used to reveal information about the true signature; though it’s somewhat more complicated, the idiom for doing this kind of comparison is actually:

match = 0;
for (i = 0; i < a.length; i++) {
	match = match or (a[i] xor b[i])
}
return match == 0;

Here, “or” and “xor” should be taken as binary arithmetic operations, which don’t short-circuit — if that doesn’t make sense to you, don’t worry: this form does essentially the same thing as the last one. It iterates through the strings, noting every bit that differs between the two strings, and then checks at the end that no differences were noted. The difference is that, in this version, the operations that get executed don’t depend on the input in any way. No matter what the strings look like, the same or and xor operations are performed. This version is totally immune to any sort of timing attack.

The Lesson

This kind of timing vulnerability is really common. Remember that every OpenID implementation that the Root Labs guys checked had this flaw, even though they mostly written by separate developers. The same problem was found in some OAuth implementations, in Java 6’s MessageDigest.isEqual method, in Rails’ cookie-based session store, and all kinds of other places.

The lesson to learn here is emphatically not that cryptography is magic, or that developers shouldn’t bother trying to understand it. These issues are tricky, to be sure, and some security “experts” recommend against developers ever touching crypto code because of the possibility of error. But this kind of thinking leads to terrible code, as talented developers just end up neglecting the code in question. (It’s even worse when those same “experts” lash out at folks who try to improve the situation, blaming other people for their own code-rot… but I digress.)

Instead, I think the lesson here is twofold. First, this stuff is subtle — but once you understand what’s going on, it’s not that hard to figure out. I hope that the explanation above made sense (let me know if not!), but even if I butchered the delivery, I promise you that it’s not actually rocket science. (Figuring out that timing attacks are possible in the first place is certainly pretty hard, as is finding reliable ways to exploit them. But defending against them is a lot easier.)

Second, it’s good to note that this particular flaw — comparing secret strings using simple == operations — is really common, so it’s wise to understand it and keep an eye out for it. On the one hand, those who do not comprehend the bugs of the past are condemned to repeat them; on the other, as long as we can learn from our mistakes and the mistakes of others, we’ll be in great shape.

FIN