### Return to Table of Contents

John Nash was a mathematician who earned the 1994 Nobel Prize in Economics for his work in game theory. His life was made into a successful movie, *A Beautiful Mind.*

In 1955, Nash was in correspondence with the United States National Security Agency (NSA),^{1} discussing new methods of encryption that he had devised. In these letters, he also proposes some general principles of cryptography (bold highlighting not in the original):

^{1}The original letters, handwritten by Nash, are available at: http://www.nsa.gov/public_info/press_room/

2012/nash_exhibit.shtml.

*We see immediately that in principle the enemy needs very little information to begin to break down the process. Essentially, as soon as r bits ^{2} of enciphered message have been transmitted the key is about determined. This is no security, for a practical key should not be too long. But this does not consider how easy or difficult it is for the enemy to make the computation determining the key. If this computation, although possible in principle, were sufficiently long at best then the process could still be secure in a practical sense*

*The most direct computation procedure would be for the enemy to try all *2^{r} possible keys, one by one. Obviously this is easily made impractical for the enemy by simply choosing r large enough.

*In many cruder types of enciphering, particularly those which are not autocoding, such as substitution ciphers [letter for letter, letter pair for letter pair, triple for triple..] shorter means for computing the key are feasible, essentially because the key can be determined piece meal, one substitution at a time.*

*So a logical way to classify enciphering processes is by the way in which the computation length for the computation of the key increases with increasing length of the key. This is at best exponential and at worst probably a relatively small power of r, ar^{2} or ar^{3}, as in substitution ciphers.*

*Now my general conjecture is as follows: For almost all sufficiently complex types of enciphering, especially where the instructions given by different portions of the key interact complexly with each other in the determination of their ultimate effects on the enciphering, the mean key computation length increases exponentially with the length of the key, or in other words, with the information content of the key.*

*The significance of this general conjecture, assuming its truth, is easy to see. It means that it is quite feasible to design ciphers that are effectively unbreakable. As ciphers become more sophisticated the game of cipher breaking by skilled teams, etc. should become a thing of the past.*

^{2}Nash is using *r* to denote the length of the key, in bits

Nash’s letters were declassified only in 2012, so they did not directly influence the development of modern cryptography. Still, his letters illustrate the most important idea of “modern” cryptography: that **security can be based on the computational difficulty of an attack rather than on the impossibility of an attack**. In other words, we are willing to accept that breaking a cryptographic scheme may be possible

*in principle*, as long as breaking it would require too much computational effort to be feasible.

We have already discussed one-time-secret encryption and secret sharing. For both of these tasks we were able to achieve a level of security guaranteeing that attacks are *impossible in principle*. But that’s essentially the limit of what can be accomplished with such ideal security. Everything else we will see in this class, and every well-known product of cryptography (public-key encryption, hash functions, etc.) has a “modern”-style level of security, which guarantees that attacks are merely *computationally infeasible*, not *impossible*.

**4.1: Polynomial-Time Computation**

Nash’s letters also spell out the importance of distinguishing between computations that take a polynomial amount of time and those that take an exponential amount of time.^{3} Throughout computer science, polynomial-time is used as a formal definition of “efficient,” and exponential-time (and above) is synonymous with “intractible.”

^{3}Nash was surprisingly ahead of his time here. Polynomial-time wasn’t formally proposed as a natural definition for “efficiency” in computation until Alan Cobham, 10 years after Nash’s letter was written (Alan Cobham, The intrinsic computational difficulty of functions, in Proc. Logic, Methodology, and Philosophy of Science II, 1965). Until Nash’s letters were declassified, the earliest well-known argument hinting at the importance of polynomial-time was in a letter from Kurt Gödel to John von Neumann. But that letter is not nearly as explicit as Nash’s, and was anyway written a year later.

In cryptography, it makes a lot of sense to not worry about guaranteeing security in the presence of attackers with unlimited computational resources. Not even the most powerful nation-states can invest 2^{100} CPU cycles towards a cryptographic attack. So the modern approach to cryptography (more or less) follows Nash’s suggestion, demanding that breaking a scheme requires exponential time.

### Definition 4.1

*A program runs in polynomial time if there exists a constant c* > 0

*such that for all sufficiently long input strings x, the program stops after no more than O*(|

*x*|

^{c}) steps.

Polynomial time is not a perfect match to what we mean by “efficient.” Polynomial time includes algorithms with running time Θ(*n*^{1000}), while excluding those with running time Θ(*n*^{log log log n}). Despite that, it’s extremely useful because of the following *closure property*: repeating a polynomial-time process a polynomial number of times results in a polynomial-time process overall.

### Security Parameter

The definition of polynomial-time is *asymptotic*, since it considers the behavior of a computer program *as the size of the inputs grows to infinity*. But cryptographic algorithms often take multiple different inputs to serve various purposes. To be absolutely clear about our “measuring stick” for polynomial time, we measure the efficiency of cryptographic algorithms (and adversaries!) against something called the **security parameter**, which is the number of bits needed to represent secret keys and/or randomly chosen values used in the scheme. We will typically use *λ* to refer to the security parameter of a scheme.

It’s helpful to think of the security parameter as a tuning knob for the cryptographic system. When we dial up this knob, we increase the size of the keys in the system, the size of all associated things like ciphertexts, and the required computation time of the associated algorithms. Most importantly, the amount of effort required by the honest users grows reasonably (as a polynomial function of the security parameter) while the effort required to violate security increases faster than any (polynomial-time) adversary can keep up.

### Potential Pitfall: Numerical Algorithms

The public-key cryptographic algorithms that we will see are based on problems from abstract algebra and number theory. These schemes require users to perform operations on very large numbers. We must remember that representing the number *N* on a computer requires only ┌log_{2} (*N*+1)┐ bits. This means that ┌log_{2} (*N*+1)┐, rather than *N*, is our security parameter! We will therefore be interested in whether certain operations on the number *N* run in polynomial-time as a function of ┌log_{2} (*N* + 1)┐, rather than in *N*. Keep in mind that the difference between running time *O*(log *N*) and *O*(*N*) is the difference between writing down a number and counting to the number.

For reference, here are some numerical operations that we will be using later in the class, and their known efficiencies:

By “efficient,” we mean polynomial-time. However, all of the problems in the right-hand column *do* have known polynomial-time algorithms on quantum computers.

**4.2: Negligible Probabilities**

In all of the cryptography that we’ll see, an adversary can *always* violate security simply by guessing some value that was chosen at random, like a secret key. However, imagine a system that has 1024-bit secret keys chosen uniformly at random. The probability of correctly guessing the key is 2^{−1024}, which is so low that we can safely ignore it.

We don’t worry about “attacks” that have such a ridiculously small success probability. But we would worry about an attack that succeeds with, say, probability 1/2. Somewhere between 2^{−1024} and 2^{−1} we need to find a sensible place to draw the line. In the same way that polynomial time formalizes “efficient” running times, we will use an *asymptotic* definition of what is a negligibly small probability.

Consider a scheme with keys that are *λ* bits long. Then a blind guess is correct with probability 1/2^{λ}. Now what about an adversary who makes 2 guesses, or *λ* guesses, or *λ*^{42} guesses? Such an adversary would still run in polynomial time, and might succeed in its attack with probability 2/2^{λ} , *λ*/2^{λ} , or *λ*^{42}/2^{λ}.

Our starting point here is 1/2^{λ}, and the nice thing about that probability is that, no matter what polynomial we place on top, we still get a very small probability indeed. This idea gives rise to our formal definition:

### Definition 4.2: Negligible:

*A function* ? *is negligible if, for every polynomial p, we have* lim

_{λ→∞}

*p*(

*λ*)?(

*λ*) = 0.

In other words, a negligible function approaches zero so fast that you can never catch up when multiplying by a polynomial. This is exactly the property we want from a security guarantee that is supposed to hold against all polynomial-time adversaries. If a polynomial-time adversary succeeds with probability ?, then running the same attack *p* times would still be an overall polynomial-time attack (if *p* is a polynomial), and potentially have success probability *p* · ?.

#### Example:

*The function* ?(*λ*) = 1/2^{λ} *is negligible, since for any polynomial p*(*λ*) = *λ ^{c}*, we have:

*since c*log(*λ*) − *λ approaches −∞ in the limit, for any constant c.
*

*Similarly, functions like *1/2^{λ/2}, 1/2^{log2 λ}, and 1/n^{log n} are all negligible.

In this class, when we see a negligible function, it will typically always be one that is easy to recognize as negligible (just as in an undergraduate algorithms course, you won’t really encounter algorithms where it’s hard to tell whether the running time is polynomial).

### Definition 4.3: (*f ≈ g*)

*If* ?,*g* : ℕ → ℝ *are two functions, we write *? ≈ *g* to mean that |?(*λ*) − *g*(*λ*)| *is a negligible function*.

We use the terminology of negligible functions exclusively when discussing probabilities, so the following are common:

^{4}Pr[A] ≈ Pr[B] doesn’t mean that events *A* and *B* almost always happen **together** (when *A* and *B* are defined over a common probability space) — imagine *A* being the event “the coin came up heads” and *B* being the event “the coin came up tails.” These events have the same probability but never happen together. To say that “*A* and *B* almost always happen together,” you’d have to say something like Pr[*A* ⊕ *B*] ≈ 0, where *A* ⊕ *B* denotes the event that *exactly one* of *A* and *B* happens.

Additionally, the ≈ symbol is *transitive*:^{5} if Pr[*X*] ≈ Pr[*Y*] and Pr[*Y*] ≈ Pr[*Z*], then Pr[*X*] ≈ Pr[*Z*] (perhaps with a slightly larger, but still negligible, difference).

^{5}It’s only transitive when applied a polynomial number of times. So you can’t define a whole series of events *X _{i}*, show that Pr[

*X*] ≈ Pr[

_{i}*X*], and conclude that Pr[

_{i+1}*X*] ≈ Pr[

_{1}*X*]. It’s rare that we’ll encounter this subtlety in this course.

_{2}n

**4.3: Indistinguishability**

So far we have been writing formal security definitions in terms of interchangeable libraries, which requires that two libraries have *exactly the same* effect on *every* calling program. Going forward, our security definitions will not be quite as demanding. First, we only consider polynomial-time calling programs; second, we don’t require the libraries to have exactly the same effect on the calling program, only that the effect is negligible.

### Definition 4.4: Indistinguishability

*Let* ℒ_{left} *and* ℒ_{right} *be two libraries with a common interface. We say that* ℒ_{left} *and* ℒ_{right} *are indistinguishable, and write* ℒ_{left} ≋ ℒ_{right}, *if for all polynomial-time programs* ? *that output a single bit*, Pr[? ◊ ℒ_{left} ⇒ 1] ≈ Pr[? ◊ ℒ_{right} ⇒ 1].

*We call the quantity *|Pr[? ◊ ℒ_{left} ⇒ 1] − Pr[? ◊ ℒ_{right} ⇒ 1]| *the advantage or bias of* ?

*in distinguishing*ℒ

_{left}

*from*ℒ

_{right}.

*Two libraries are therefore indistinguishable if all polynomial-time calling programs have negligible advantage in distinguishing them.*

From the properties of the “≈” symbol, we can see that indistinguishability of libraries is also transitive, which allows us to carry out hybrid proofs of security in the same way as before.

And similar to before, we also have a library composition lemma (you are asked to prove this as an exercise):

### Lemma 4.5: Composition:

*If* ℒ_{left} ≋ ℒ_{right} *then* ℒsup>∗ ◊ ℒ_{left} ≋ ℒ^{∗} ◊ ℒ_{right} *for any polynomial-time library* ℒ^{∗}.

### Bad-Event Lemma

A common situation is when two libraries carry out exactly the same steps until some exceptional condition happens. In that case, we can bound an adversary’s distinguishing advantage by the probability of the exceptional condition.

More formally, we can state a lemma of Bellare & Rogaway.^{6} We present it without proof, since it involves a *syntactic *requirement. Proving it formally would require a formal definition of the syntax/semantics of the pseudocode that we use for these libraries.

^{6}Mihir Bellare & Phillip Rogaway: “Code-Based Game-Playing Proofs and the Security of Triple Encryption,” in Eurocrypt 2006. ia.cr/2004/331

### Lemma 4.6: Bad Events

*Let* ℒ_{left} *and* ℒ_{right} *be libraries that each define a variable bad that is initialized to 0. If* ℒ_{left} and ℒ_{right} *have identical code, except for code blocks reachable only when bad = 1* (*e.g., guarded by an “if bad = 1” statement*), *then*

**4.4: Sampling with Replacement & the Birthday Bound**

Below is an example of two libraries which are not *interchangeable* (they do have mathematically different behavior), but are *indistinguishable*. These two libraries happen to be convenient for later topics, as well.

### Lemma 4.7: Replacement Sampling

*Define two libraries as below*

*Then for all calling programs* ? *that make q queries to the SAMP subroutine, the advantage of* ? *in distinguishing the libraries is at most q*(*q* − 1)/2^{λ+1}.

*In particular, when* ? *is polynomial-time* (*in λ*), *then q grows as a polynomial in the security parameter. Hence,* ? *has negligible advantage, and* ℒ^{samp-L} ≋ ℒ^{samp-R}.

ℒ^{samp-L} uniformly samples *λ*-bit strings with replacement (*i.e.*, allowing for duplicates). In ℒ^{samp-R}, we’ve initialized the set *R* outside of any subroutine, by which we mean that *R* is a static (its value is maintained across different calls to SAMP) and *private* (its value is not directly accessible to the calling program). ℒ_{samp-R} uses the set *R* to keep track of which values have been previously sampled, to sample *without* replacement and ensure that there are no duplicate outputs.

The overall idea here is that the only way to distinguish the two libraries *seems to be* to wait for a repeated output of the SAMP algorithm. But when sampling uniformly from a huge set with *2 ^{λ}* items, it is very unlikely (but not impossible!) to ever see a repeated value unless one asks for a huge number of samples. For that reason, the two libraries should be indistinguishable. This is an accurate intuition, but it’s not a formal proof. Among other things, we need to determine just how unlikely repeated outputs are, and argue that the distinguishing advantage is related to the probability of repeated outputs.

### Proof:

Consider the following hybrid libraries:

First, let us prove some simple observations about these libraries:

ℒ_{hyb-L} ≡ ℒ_{samp-L}: Note that ℒ_{hyb-L} simply samples uniformly from {0,1}^{λ}. The extra *R* and bad variables in ℒ_{hyb-L} don’t actually have an effect on its external behavior (they are used only for convenience later in the proof).

ℒ_{hyb-R} ≡ ℒ_{samp-R}: Whereas ℒ_{samp-R} avoids repeats by simply sampling from {0,1}^{λ}\ *R*, this library ℒ_{hyb-R} samples *r* uniformly from {0,1}^{λ} and retries if the result happens to be in *R*. This method is called *rejection sampling*, and it has the same effect^{7} as sampling *r* directly from {0,1}^{λ}\ *R*.

^{7}The two approaches for sampling from {0,1}^{λ}\ *R* may have different running times, but our model considers only the input-output behavior of the library.

We can also see that ℒ_{hyb-L} and ℒ_{hyb-R} differ only in code that is reachable only when bad = 1 (highlighted). So, using Lemma 4.6, we can bound the advantage of the calling program:

It suffices to show that in ? ◊ ℒ_{hyb-L} the variable bad is set 1 only with negligible probability. It turns out to be easier to reason about the complement event, and argue that bad remains 0 with probability negligibly close to 1.

Suppose ? makes *q* calls to the SAMP subroutine. Let *r*_{1},. . . ,*r*_{q} be the responses of these calls. For variable bad to remain 0, it must be the case that for each *i*, we have *r*_{i} ∉ {*r _{1}*,. . . ,

*r*}. Then,

_{i-1}_{hyb-L}]

For the inequality we used the convenient fact that when *x* and *y* are positive, we have (1 − *x*)(1 − *y*) = 1 − (*x* + *y*) + *xy* ⩾ 1 − (*x* + *y*). More generally, when all terms *x _{i}* are positive, ∏

_{i}(1 −

*x*) ⩾ 1 − ∑

_{i}_{i}

*x*.

_{i}Summing up, the advantage of ? in distinguishing ℒ_{samp-L} and ℒ_{samp-R} is:

When ? is a polynomial-time distinguisher, *q* is a polynomial function of *λ*, so ?’s advantage is negligible. This shows that ℒ_{samp-L} ≋ ℒ_{samp-R}.

### Birthday Bound

As part of the previous proof, we showed that the probability of repeating an item while taking *q* uniform samples from {0,1}^{λ} is *O*(q^{2}/2^{λ}). This is an upper bound, but we can come up with a lower bound as well:

### Lemma 4.8: Birthday Bound

*When taking q uniform samples from *{0,1}* ^{λ}, where q* ⩽ √(2

^{λ + 1}),

*the probability of encountering a repeated value is at least*0.632 * (

^{q(q−1)}/

_{2λ+1}).

*In particular, when q*= √(2

^{λ+1}),

*a repeated value is encountered with probability at least 1/2.*

### Proof:

We use the fact that for *x* ∈ [0,1], we have 1 − *x* ⩽ *e*^{−x} ⩽ 1 − (0.632…)*x*, where the constant 0.632 … is 1 −(^{1}/_{e}).

Let *r _{1}*,… ,

*r*be the values uniformly sampled from {0,1}

_{q}^{λ}. As above, we

*fail*to encounter a repeated value if the

*r*’s are all distinct. Applying the bounds from above, we have:

_{i}The second inequality follows from the fact that *q*(*q* − 1)/2^{λ+1} < 1 by our bound on *q*. So the probability of a repeated value among the *r _{i}*’s is

More generally, when sampling uniformly from a set of *N* items, taking √(2*N*) samples is enough to ensure a repeated value with probability -0.63. The bound gets its name from considering *q* = √(2 · 365) ≈ 27 people in a room, and assuming that the distribution of birthdays is uniform across the calendar. With probability at least 0.63, two people will share a birthday. This counterintuitive fact is often referred to as the *birthday paradox*, although it’s not an actual paradox.^{8}

^{8}I think the actual birthday paradox is that the “birthday paradox” is not a paradox.

In the context of cryptography, we often design schemes in which the users take many uniform samples from {0,1}^{λ}. Security often breaks down when the same value is sampled twice, and this happens after about √(2^{λ + 1}) ∼ 2^{λ/2} steps.

**Exercises:**

4.1: Which of the following are negligible functions in *λ*? Justify your answers.

4.2: Show that when ? is negligible, then for every polynomial *p*, the function *p*(*λ*)?(*λ*) not only approaches 0, but it is also negligible itself.

4.3: Prove that the ≈ relation is transitive. Let *f,g,h* : ℕ → ℝ be functions. Using the definition of the ≈ relation, prove that if ? ≈ *g* and *g* ≈ *h* then ? ≈ *h*. You may find it useful to invoke the *triangle inequality*: |*a* − *c*| ≤ |*a* − *b*| + |*b* − *c*|.

4.4: Prove Lemma 4.5.

4.5: A *deterministic* program is one that uses no random choices. Suppose ℒ_{1} and ℒ_{2} are two *deterministic* libraries with a common interface. Show that either ℒ_{1} ≡ ℒ_{2}, or else ℒ_{1} & ℒ_{2} can be distinguished with advantage 1.

4.6: For this problem, consider the following two libraries:

(a). Prove that the two libraries are indistinguishable. More precisely, show that if an adversary makes *q*_{1} number of calls to AVOID and *q*_{2} calls to SAMP, then its distinguishing advantage is at most *q*_{1}*q*_{2}/2* ^{λ}*. For a polynomial-time adversary, both

*q*

_{1}and

*q*

_{2}(and hence their product) are polynomial functions of the security parameter, so the advantage is negligible.

(b). Suppose an adversary makes a total of *n _{i}* calls to AVOID before making its

*i*th call to SAMP, and furthermore the adversary makes

*q*calls to SAMP overall (so that

*n*

_{1}⩽

*n*

_{2}⩽ · · · ⩽

*n*). Show that the two libraries can be distinguished with advantage at least:

_{q}

4.7: Prove the following generalization of Lemmas 4.7 & 4.8

Fix a value *x* ∈ {0,1}* ^{λ}*. Then when taking

*q*uniform samples from {0,1}

*, the probability that there exist two distinct samples whose XOR is*

^{λ}*x*is at least 0.63

*q*(

*q*− 1)/2

^{λ – 1}and at most

*q*(

*q*− 1)/2

^{λ + 1.}