Cryptographic Random Numbers


Please send comments on this draft to Carl Ellison at cme@acm.org.

This document was originally prepared as an appendix for the P1363 standard. The current version of that standard has incorporated some of these ideas in the security considerations section. There are versions of this document available in PostScript and in LaTeX source.


Introduction

Although the term is appropriate and is used in the field, the phrase ``random numbers'' can be misleading. To many people, it suggests random number generator functions in the math libraries which come with one's compiler. Such generator functions are insecure and to be avoided for cryptographic purposes.

What one needs for cryptography is values which cannot be guessed by an adversary any more easily than by trying all possibilities [that is, by ``brute force'']. There are several ways to acquire or generate such values, but none of them is guaranteed. Therefore, selection of a random number source is a matter of art and assumptions, as indicated below and in the RFC on randomness by Eastlake, Crocker and Schiller [RFC1750].

Need for random bits

One needs random bits (or values) for several cryptographic purposes, but the two most common are the generation of cryptographic keys (or passwords) and the blinding of values in certain protocols.

Criterion for a random source

There are several definitions of randomness used by cryptographers, but in general there is only one criterion for a random source: an adversary with full knowledge of your software and hardware, the money to build a matching computer and run tests with it, the ability to plant bugs in your site, and so on, must learn nothing about the bits you are about to use, even knowing all the bits you have used so far.

Random Sources

Random sources can be classified as either true-random or pseudo-random. The latter are algorithms which imitate the former. However, the concept of randomness is as much philosophical as physical or mathematical and is far from resolved.

True-random sources can be considered unconditionally unguessable, even by an adversary with infinite computing resources, while pseudo-random sources are good only against computationally limited adversaries.

True Random Sources

The process to obtain true-random bits typically involves the following steps.

Harvest bits

One first gathers some bits unknown to and unguessable by the adversary. These must come from some I/O device [[The alternative is for them to be generated by a program -- but we have assumed that the adversary knows all our software and can therefore run the same program.]]. Those bits are not necessarily all independent. That is, one might be able to predict some one harvested bit with probability greater than 1/2, given all the others. The adversary might even know entire subsequences of the bits. What is important is that the harvested bits contain information (entropy) which is unavailable to the adversary.
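As an illustration of harvesting, the sketch below (in Python, with a hypothetical helper name) reads a high-resolution timer repeatedly and keeps the least significant bit of each delta. Such timing jitter carries some entropy, but how much is unknown until it is measured in the next step.

```python
import time

def harvest_timing_bits(n_samples=256):
    """Collect the low-order bit of successive high-resolution timer deltas.

    Illustration only: the jitter in these deltas carries some entropy,
    but each harvested bit must NOT be assumed to be a full bit of entropy.
    """
    samples = []
    last = time.perf_counter_ns()
    for _ in range(n_samples):
        now = time.perf_counter_ns()
        samples.append((now - last) & 1)  # keep only the least significant bit
        last = now
    return samples

bits = harvest_timing_bits()
```

The harvested bits are raw material only; they still need the entropy estimation and hashing steps described next.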

Determine entropy

The second step is then to determine how many unguessable bits were thus harvested. That is, one needs to know how many of the harvested bits are independent and unguessable [[With a proper hash function, it is not necessary to know which of the bits are independent.]]. This number of bits is usually referred to as entropy and is defined below in detail.

Reduce to independent bits

As a third step, one can compute a hash of the harvested bits to reduce them to independent, random bits. The hash function for this stage of operation needs to have each output bit functionally dependent on all input bits and functionally independent of all other output bits. Barring formal analysis, we assume that the hash functions which are claimed to be cryptographically strong (MD5 and SHA) have this characteristic.

The output of this third step is a set of independent, unguessable bits. These can be used with confidence wherever random bits are called for, subject of course to the assumptions involved in the three steps above.
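The three steps can be sketched as follows. SHA-256 stands in here for the ``cryptographically strong'' hash functions named above (the text names MD5 and SHA), and the entropy estimate is assumed to come from a separate analysis, as described in the previous step.

```python
import hashlib

def distill(harvested: bytes, entropy_bits_estimate: int) -> bytes:
    """Hash harvested material down to independent, unguessable bits.

    Returns at most entropy_bits_estimate / 8 bytes, and never more
    than the hash can deliver; the estimate must come from a separate
    entropy analysis of the source.
    """
    digest = hashlib.sha256(harvested).digest()
    usable = min(len(digest), entropy_bits_estimate // 8)
    return digest[:usable]

# e.g., raw samples judged to contain about 128 bits of entropy:
seed = distill(b'...raw harvested samples...', 128)
```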

Pseudo-random Sources

In some cases, one needs more random bits than the available sources of entropy can provide. In such cases, one resorts to pseudo-random number (bit) generators (PRNGs). A PRNG is a function which takes a certain amount of true randomness (called the seed of the PRNG) and generates a stream of bits which can be used as if they were true-random, assuming the adversary is computationally limited and that the seed is large enough to thwart brute force attacks by that adversary.

A cryptographically strong PRNG [[as opposed to merely statistically random sources like the C rand() function]] is an algorithm for which it has been proved that an opponent who knows the algorithm and all of its output bits up to a certain point, but not its seed, cannot guess the next output bit with probability higher than ((1/2)+(e)), where (e) usually decreases exponentially with some security parameter (s) (typically the length of the PRNG seed).

As with any computational complexity argument, such proofs are based on assumptions (such as (P != NP)). A number of reasonable, strong PRNGs are discussed in the literature. See the bibliography of this appendix as well as [RFC1750] for some of these references.
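One classic construction from the literature cited here is the Blum-Blum-Shub generator [BlumBlSh86], whose security argument rests on the assumed hardness of factoring. A toy sketch (the tiny primes below are for illustration only and give no real security):

```python
def bbs_bits(seed, p, q, count):
    """Blum-Blum-Shub: x_{i+1} = x_i^2 mod n; output the lsb of each x_i.

    p and q must be primes congruent to 3 mod 4, and in practice must be
    hundreds of digits long; the values used below are toy parameters.
    """
    n = p * q
    x = (seed * seed) % n            # start from a quadratic residue
    out = []
    for _ in range(count):
        x = (x * x) % n
        out.append(x & 1)            # emit the least significant bit
    return out

bits = bbs_bits(seed=123456, p=499, q=547, count=16)
```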

Determination of bits of entropy

Mathematical Definitions

For our purposes, entropy is the information delivered within a stream of bits. It is defined as:
H = - sum_x p_x log_2( p_x )
where (x) is a possible value in a stream of values (e.g., in this example, a contiguous set of bits of some fixed size -- a byte, a word, 100 bytes, ...) and (p_x) is its probability of occurrence (from an infinite population of (x) values, not just a finite sample). Typically, as the values (x) over which entropy is computed increase in size, the entropy increases but not as rapidly as the size.
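The definition can be estimated from a finite sample by replacing (p_x) with observed frequencies, bearing in mind that such a sample estimate can badly overstate the true entropy:

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Estimate H = -sum_x p_x log2(p_x) from observed symbol frequencies.

    This is an estimate from a finite sample, not the true entropy,
    which is defined over the underlying (infinite) distribution.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A fair coin's stream has entropy near 1 bit per symbol:
h = empirical_entropy([0, 1] * 500)
```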

What we care about is entropy bits (unguessable bits) per bit of source. So, let us also define an entropy rate, (J), as

J = H / |x|
where (|x|) is the size of the symbol (x) in bits. We can also define what might be called the absolute entropy, (E), as
E = min_{1 <= |x| < infinity} J
the guaranteed minimum entropy (unguessability) per bit of source, no matter what symbol size the adversary chooses.
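A sketch of (J) over non-overlapping symbols shows why the minimum over all (|x|) matters: a periodic stream looks perfectly random at (|x| = 1), but its entropy rate collapses once the symbol size reaches its period.

```python
import math
from collections import Counter

def entropy_rate(bits, symbol_size):
    """J = H / |x| for non-overlapping symbols of symbol_size bits."""
    symbols = [tuple(bits[i:i + symbol_size])
               for i in range(0, len(bits) - symbol_size + 1, symbol_size)]
    counts = Counter(symbols)
    total = len(symbols)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / symbol_size

# Period-2 stream: J = 1 at |x| = 1, but 0 once |x| reaches the period.
periodic = [0, 1] * 512
rates = [entropy_rate(periodic, s) for s in (1, 2, 4)]
```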

This definition of (E) relies heavily on having an infinite number of infinite length sequences to analyze. For example, any periodic sequence of bits will result in (E=0), since when (|x|) equals the period of the sequence all the values (except for a set of measure 0) are the same and so (H=0). This means that any PRNG output will result in (E=0), since a PRNG is a finite state machine (FSM) and is therefore forced to produce periodic output. That finite period might be very long -- longer than any computer could compute. Therefore, (E) cannot be computed numerically from an actual sequence.

Attempts to Compute (E)

One is forced to compute an approximation of (E) from actual bit strings. Some people use the best available compression algorithm and hope that the compression ratio approximates (E), since a perfect compression algorithm would produce output with (E=1) by definition. Others define statistical tests [Maurer91] to be applied to an output bit string from a hardware generator (such as a noisy resistor), on the assumption that such a generator has too little computational ability to fool the test.
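The compression approach can be sketched with a general-purpose compressor; it yields only a crude upper bound, since a compressor that finds structure disproves randomness, but failure to compress never proves it:

```python
import zlib

def compression_entropy_estimate(data: bytes) -> float:
    """Crude upper bound on entropy per bit via a general compressor.

    If the compressor finds structure, the ratio drops well below 1;
    a ratio near 1 proves nothing about the source's randomness.
    """
    compressed = zlib.compress(data, level=9)
    return min(1.0, len(compressed) / len(data))

# A highly structured string compresses to a tiny fraction of its size:
low = compression_entropy_estimate(b'abab' * 1000)
```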

If it were possible to compute an actual absolute entropy from a sample string, the value would be the same no matter what the representation of the sample string. That is, there are transformations of a string which preserve all the string's information -- e.g., a Fourier transform, Hadamard transform, difference operation

y_i = x_i - x_{i-1}
etc. -- and the number of absolutely unguessable bits in a bit string must be invariant under such transformations. One can therefore evaluate the quality of some approximation of (E) by performing the algorithm over several transformations of the same string and comparing their results.
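For example, a simple counter fools a byte-frequency entropy estimate completely, and the difference transform above exposes it; any honest estimator of absolute entropy would have to agree across both representations:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Per-byte empirical entropy, H = -sum p log2(p), in bits."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

counter = bytes(i % 256 for i in range(4096))
# y_i = x_i - x_{i-1} (mod 256): an information-preserving transform.
diffs = bytes((counter[i] - counter[i - 1]) % 256
              for i in range(1, len(counter)))

h_raw = byte_entropy(counter)   # 8.0: every byte value equally frequent
h_diff = byte_entropy(diffs)    # 0.0: every delta is 1
```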

However one chooses to compute an approximation for (E), one must further reduce (E) to reflect the fraction of these entropy bits which an adversary might have acquired by guessing, measurement, bugging or creating some bias in the generator process. For example:

If one uses a system date and time as a source of bits, one can expect the adversary to know the date and probably the hour and maybe the minute of the value chosen -- leaving only a few low order bits as possibly hidden from the adversary.

If one uses room sounds between 14KHz and 19KHz as the source, an adversary could inject sounds in that frequency range through the room's windows and thereby bias the result.

If one uses a mouse-drawn signature as an entropy source, those elements of the track which actually follow the person's signature are guessable by the adversary, so only the noisy deviations from that track count as entropy.

If one uses disk head air turbulence [DavisIhFe94] as a random source, poorly designed system or application code could degrade the actual entropy signal (e.g., through very coarse time measurements) and add significant predictable noise to the air turbulence entropy (e.g., through interference with disk completion interrupts by a non-random but random-looking system interrupt), masking the real entropy and making it necessary to reduce (E).

Once (E) is computed to the designer's or user's satisfaction, with whatever allowance one prefers for the possibility of bugging or creation of bias, (E) becomes the fraction of bits one can use of the source stream. Specifically, if one is using a hash function to distill independent bits from the source stream and that function produces (K) bits of output from each operation, one needs to feed the function with (K/E) bits of input from the source.
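In concrete terms (with SHA-256 as a stand-in for the hash and a hypothetical entropy estimate):

```python
import math

def bits_needed(K: int, E: float) -> int:
    """Source bits to feed the distilling hash for K output bits,
    given an estimated entropy rate E (entropy bits per source bit)."""
    return math.ceil(K / E)

# SHA-256 yields K = 256 output bits; with an estimated rate of
# E = 0.1, one must feed it at least 2560 source bits (320 bytes):
n = bits_needed(256, 0.1)
```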

Sources of Unguessable Values

Almost any input source is a source of entropy. Some are better than others in unguessability and in being hidden from an adversary, and each has its own rate of entropy. Sources frequently used for entropy can nevertheless have serious flaws: they may be observable, predictable or subject to influence by a determined adversary, especially if one is using a time-shared computer. That makes the determination of (E) for these sources especially difficult. Other sources are almost worthless as entropy but tend to be used anyway because they are convenient.

Expansion of source bits

If one chooses not to use a proven cryptographically strong PRNG for expansion of a true-random seed, there are techniques which are believed good enough for PRNG use. Mistakes in these assumptions can lead to a catastrophic reduction in security and any designer following this path should be careful [[One must be especially careful not to use a seed too small or use too few true-random bits to form the seed.]].

These techniques amount to a one-way function masking some easily predictable (weak) operation. That weak operation could be as simple as counting or as complex as a long-period bit generator [Marsaglia91]. Several function combinations are in common use.
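One common shape for such a combination -- a one-way hash masking a plain counter -- can be sketched as follows. This illustrates the construction only; it is not a vetted generator, and its security rests entirely on the hash and on the entropy of the seed.

```python
import hashlib

def counter_prng(seed: bytes, n_blocks: int) -> bytes:
    """Expand a true-random seed by hashing it together with a counter.

    The counter is the trivially predictable (weak) operation; the
    one-way hash masks it. Sketch of the construction, not a vetted
    CSPRNG.
    """
    out = b''
    for i in range(n_blocks):
        out += hashlib.sha256(seed + i.to_bytes(8, 'big')).digest()
    return out

stream = counter_prng(b'true-random seed goes here', 4)  # 4 * 32 bytes
```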

Assumptions

Unfortunately, at our present level of knowledge, random number sources are a matter of art more than science. We know that we can have adequate random numbers if certain assumptions hold, but we do not know whether those assumptions actually hold. Specifically, we can have true random numbers if the physical assumptions about the entropy source hold, and we can have pseudo-random numbers if the computational assumptions behind the PRNG hold.

Advice

Stepping back from academic reasoning, there are some things to avoid and some things which are definitely good to do.

Things to avoid

Things to do


References

[BethDa90] T. Beth and Zong-duo Dai.
On the complexity of pseudo-random sequences - or: If you can describe a sequence it can't be random.
In J.J. Quisquater and J. Vandewalle, editors, Advances in Cryptology --- Eurocrypt '89, pages 533--543, Berlin, 1990. Springer-Verlag.

[BlumBlSh86] L. Blum, M. Blum, and M. Shub.
A simple unpredictable pseudo-random number generator.
SIAM J. Computing, 15(2):364--383, May 1986.

[BlumBlSh83] Lenore Blum, Manuel Blum, and Michael Shub.
Comparison of two pseudo-random number generators.
In R. L. Rivest, A. Sherman, and D. Chaum, editors, Proc. CRYPTO 82, pages 61--78, New York, 1983. Plenum Press.

[BlumMi84] M. Blum and S. Micali.
How to generate cryptographically strong sequences of pseudo-random bits.
SIAM J. Computing, 13(4):850--863, November 1984.

[Boyar89] Joan Boyar.
Inferring sequences produced by pseudo-random number generators.
Journal of the ACM, 36(1):129--141, January 1989.

[ChorGo85a] B. Chor and O. Goldreich.
Unbiased bits from sources of weak randomness and probabilistic communication complexity.
In Proc. (26)th IEEE Symp. on Foundations of Comp. Science, pages 429--442, Portland, 1985. IEEE.

[ChorGo88] B. Chor and O. Goldreich.
Unbiased bits from sources of weak randomness and probabilistic communication complexity.
SIAM J. Computing, 17(2):230--261, April 1988.

[DavisIhFe94] Don Davis, Ross Ihaka, and Philip Fenstermacher.
Cryptographic randomness from air turbulence in disk drives.
In Yvo G. Desmedt, editor, Proc. CRYPTO 94, pages 114--120. Springer, 1994.
Lecture Notes in Computer Science No. 839.

[FairfieldMoCo85] R.C. Fairfield, R.L. Mortenson, and K.B. Coulthart.
An LSI random number generator (RNG).
In G. R. Blakley and D. C. Chaum, editors, Proc. CRYPTO 84, pages 203--230. Springer, 1985.
Lecture Notes in Computer Science No. 196.

[ImpagliazzoLeLu89] Russell Impagliazzo, Leonid A. Levin, and Michael Luby.
Pseudo-random generation from one-way functions.
In Proc. (21)st ACM Symp. on Theory of Computing, pages 12--24, Seattle, 1989. ACM.

[Kaliski87] Burton S. Kaliski, Jr.
A pseudo-random bit generator based on elliptic logarithms.
In A.M. Odlyzko, editor, Proc. CRYPTO 86, pages 84--103. Springer-Verlag, 1987.
Lecture Notes in Computer Science No. 263.

[Kaliski88] Burton S. Kaliski, Jr.
Elliptic Curves and Cryptography: A Pseudorandom Bit Generator and Other Tools.
PhD thesis, MIT EECS Dept., January 1988.
Published as MIT LCS Technical Report MIT/LCS/TR-411 (Jan. 1988).

[Lagarias90] J. C. Lagarias.
Pseudorandom number generators in cryptography and number theory.
In Proc. of the AMS Symposia in Applied Mathematics: Computational Number Theory and Cryptography, pages 115--143. American Mathematical Society, 1990.

[Levin85] L. A. Levin.
One-way functions and pseudorandom generators.
In Proc. (17)th ACM Symp. on Theory of Computing, pages 363--365, Providence, 1985. ACM.

[Luby92] M. Luby.
Pseudo-random generators from one-way functions.
In J. Feigenbaum, editor, Proc. CRYPTO 91, page 300. Springer, 1992.
Lecture Notes in Computer Science No. 576.

[Marsaglia91] George Marsaglia and Arif Zaman.
A new class of random number generators.
The Annals of Applied Probability, 1(3):462--480, 1991.

[Maurer91] Ueli M. Maurer.
A universal statistical test for random bit generators.
In A.J. Menezes and S. A. Vanstone, editors, Proc. CRYPTO 90, pages 409--420. Springer-Verlag, 1991.
Lecture Notes in Computer Science No. 537.


[MaurerMa90a] Ueli M. Maurer and James L. Massey.
Perfect local randomness in pseudo-random sequences.
In G. Brassard, editor, Proc. CRYPTO 89, pages 100--112. Springer-Verlag, 1990.
Lecture Notes in Computer Science No. 435.

[MicaliSc88] S. Micali and C.P. Schnorr.
Efficient, perfect random number generators.
In S. Goldwasser, editor, Proc. CRYPTO 88, pages 173--199. Springer-Verlag, 1988.
Lecture Notes in Computer Science No. 403.

[ReifTy88] J.H. Reif and J.D. Tygar.
Efficient parallel pseudorandom number generation.
SIAM J. Computing, 17(2):404--411, April 1988.

[RFC1750] D. Eastlake, S. Crocker, and J. Schiller.
RFC1750: Randomness Recommendations for Security.
Internet Activities Board, December 1994.

[SanthaVa86] M. Santha and U. V. Vazirani.
Generating quasi-random sequences from semi-random sources.
Journal of Computer and Systems Sciences, 33:75--87, 1986.

[Shamir81] A. Shamir.
On the generation of cryptographically strong pseudo-random sequences.
In Proc. ICALP, pages 544--550. Springer, 1981.

[Shamir82a] Adi Shamir.
The generation of cryptographically strong pseudo-random sequences.
In Allen Gersho, editor, Advances in Cryptology: A Report on CRYPTO 81, pages 1--1. U.C. Santa Barbara Dept. of Elec. and Computer Eng., 1982.
Tech Report 82-04.

[Vazirani85] U. V. Vazirani.
Towards a strong communication complexity theory, or generating quasi-random sequences from two communicating slightly-random sources.
In Proc. (17)th ACM Symp. on Theory of Computing, pages 366--378, Providence, 1985. ACM.

[VaziraniVa84] U.V. Vazirani and V.V. Vazirani.
Efficient and secure pseudo-random number generation.
In Proc. (25)th IEEE Symp. on Foundations of Comp. Science, pages 458--463, Singer Island, 1984. IEEE.

[VaziraniVa85] U.V. Vazirani and V.V. Vazirani.
Efficient and secure pseudo-random number generation.
In G. R. Blakley and D. C. Chaum, editors, Proc. CRYPTO 84, pages 193--202. Springer, 1985.
Lecture Notes in Computer Science No. 196.


Converted and posted by Carl Ellison --- cme@acm.org