-----BEGIN PGP SIGNED MESSAGE-----

Results of a Survey on PGP Pass Phrase Usage

Arnold G. Reinhold
Cambridge, Massachusetts, USA
May 19, 1995 (revised June 1, 1995)

Pass phrase management is arguably one of the weakest links in the PGP security chain. To gather some facts on actual pass phrase usage, I recently conducted a survey over the Internet. The survey questionnaire was posted to Usenet four times: three times on alt.security.pgp (March 10, March 26 and April 13) and once on sci.crypt (April 24). A total of 46 responses were received, the last on May 7. One respondent declined to answer the pass phrase questions and was excluded, leaving the 45 responses analyzed below.

This is not an ideal survey: the sample size is small and all the respondents are self-selected. But it is the best method for gathering some real data that I could come up with. Thanks to all those who took the trouble to respond.

Results

Q. 1. On which computer platform do you run PGP? (number of responses)

    a. MS-DOS                    24
    b. Windows                   12
    c. Macintosh                  8
    d. Unix (multi-user)          9
    e. Unix (standalone)          6
    f. Other (specify):
         OS/2                     4
         Amiga                    2
         Atari                    1

Eleven responders indicated two platforms and three indicated three platforms. Most Windows users also indicated MS-DOS.

The number of responders who indicated multi-user Unix is a cause for concern. Multi-user Unix is generally considered an undesirable platform for PGP use because of its weak security. Six of the nine responders who indicated multi-user Unix indicated a personal computer platform as well. The wording of my question does not exclude using a multi-user Unix system for encryption only, which is less of a risk, since encrypting a message requires only the recipient's public key, not one's own secret key and pass phrase.

Q. 2. How often do you use PGP? (number of responses)

    a. Rarely                     5
    b. Once a month               9
    c. Once a week                6
    d. Several times a week      14
    e. Daily                      4
    f. Several times a day        7

The highest reported usage was 50 times/day.

Q. 3. How long is your public key (pick the closest value)? (number of responses)

    a.   384 bits                 0
    b.   512 bits                 1
    c.   768 bits                 2
    d.   1024 bits               33
    f.   1280 bits                1
    g.   2048 bits                2
    d&g. 1024/2048 bits           5
    No response                   1

Almost everyone is using long keys. Only three responders were using a public key shorter than 900 bits.

Q. 4. How many characters are in your pass phrase? (characters)

    Minimum                       8
    First quartile               14
    Median                       21
    Third quartile               41
    Maximum                     100

Q. 5. How many distinct words are in your pass phrase? (words)

    Minimum                       0
    First quartile                2
    Median                        4
    Third quartile                7
    Maximum                      15

Q. 6. How many of these words are in an English dictionary or are proper names? (number of responses)

    All                          15
    Some                         15
    None                         15

The median fraction of words found in an English dictionary was 5 out of 8. Of the 15 who reported none, 4 indicated use of non-English dictionary words.

Average Word Length

Dividing the number of characters by the number of words gives an average pass phrase word length for each responder. The distribution of these values was (characters):

    Minimum                     3.3
    First quartile              4.5
    Median                      5.3
    Third quartile              6.9
    Maximum                    86.0

The median average word length for respondents whose pass phrase was composed entirely of English dictionary words was also 5.3 characters. This is shorter than the median length of words in an English dictionary (6 characters, based on a sample of 100 words from [DAV]), suggesting that pass phrase words are not chosen randomly.

Q. 7. Have you written down your pass phrase anywhere? (number of responses)

    Yes                           2
    No                           43

Almost no one admits to writing down their pass phrase. This is in accord with conventional wisdom, but in view of the numerous short pass phrases reported, that wisdom may be bad advice. See my article in Internet Secrets [REI].

Q. 8. Any comments on pass phrases?

Most respondents included some comment; see Appendix A, below. Several respondents included good suggestions for choosing a pass phrase.
A few comments reveal some incorrect beliefs about pass phrases, including:

  o Changing one's pass phrase often is desirable. It isn't, unless you change your key at the same time: an attacker who has obtained an old copy of your secret key can still recover the unchanged key by attacking the old pass phrase.
  o Eliminating spaces in a pass phrase is a good idea. There is no particular advantage.
  o Using foreign-language words helps. Not much, especially if it is a language you might be expected to know.

Medium of Response (number of responses)

    a. Anonymous snail-mail              18
    b. Anonymous, encrypted e-mail        9
    c. Encrypted e-mail                   3
    d. Anonymous e-mail                   3
    e. Open e-mail                       12

The medium responders used to send me their answers is of interest in itself. To eliminate any possibility that the data collected could compromise someone's pass phrase, I had asked in my first posting that responses be sent only via conventional mail with no return address, i.e., "via anonymous snail-mail." I felt that this method provided the highest level of anonymity. Despite my request, I received about as many e-mail as snail-mail replies to the first posting. A number of the e-mail replies were anonymous. Several responders asked for my PGP public key. My subsequent repostings included my public key but still encouraged snail-mail responses by offering to contribute the price of postage on all replies received through the end of April 1995 to the Phil Zimmermann defense fund. The repostings did recommend anonymous, PGP-encrypted responses as a less secure alternative.

Entropy Analysis

A pass phrase consisting of dictionary words is weaker than a pass phrase of the same length made up of random letters. To compare different responses on a single scale, I estimated the entropy of each responder's pass phrase using the following approximation:

    Est_Entropy = 15 * num_of_dictionary_words
                + 5.5 * num_of_characters * (1 - num_of_dictionary_words / num_of_words)

This formula assigns 15 bits of entropy to each English dictionary word (equivalent to a random choice from about 32,000 words) and 5.5 bits per character to the part of the pass phrase not made up of dictionary words, whose length is taken to be the non-dictionary fraction of the words times the total number of characters.
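For readers who want to apply the approximation to their own pass phrase, it can be written as a small Python function. This is only a sketch: the name `est_entropy` is hypothetical, and because the formula divides by the number of words (and the survey did receive a pass phrase with zero distinct words), treating that case as all non-dictionary characters is an assumption made here, not part of the original analysis.

```python
def est_entropy(num_chars: int, num_words: int, num_dict_words: int) -> float:
    """Crude pass phrase entropy estimate in bits, following the survey's formula.

    15 bits per dictionary word, plus 5.5 bits per character for the share of
    the characters attributed to non-dictionary words.
    """
    if num_words == 0:
        # The formula divides by num_words; treating a pass phrase with no
        # distinct words as all non-dictionary characters is an assumption.
        return 5.5 * num_chars
    non_dict_fraction = 1 - num_dict_words / num_words
    return 15 * num_dict_words + 5.5 * num_chars * non_dict_fraction


# The pass phrase annotated "[8 chars, 2 words, 0 English]" in Appendix A:
print(est_entropy(8, 2, 0))  # 44.0
```

Note that a pass phrase made entirely of dictionary words scores 15 bits per word whatever its length; four such words give 60 bits.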
The approximation is very crude, and I believe it tends to overestimate the entropy of most pass phrases, but it does allow some further analysis of the survey data. Here are some results using this formula:

Estimated Pass Phrase Entropy for All Responses (bits)

    Minimum                      30
    First quartile               60
    Median                       75
    Third quartile              157
    Maximum                     473

Most respondents are using a pass phrase with substantially less entropy than the 128 bits of the IDEA session key.

Median Est. Entropy by Frequency of Use (bits)

    a. Rarely                    60
    b. Once a month             110
    c. Once a week               75
    d. Several times a week     104
    e. Daily                     69
    f. Several times a day       90

There appears to be no correlation between frequency of PGP usage and pass phrase strength.

Median Est. Entropy by PGP Key Length

    Key length                 Responses   Median entropy (bits)
    <900 bits (b, c)                3              66
    1024 bits (d)                  33              75
    >1100 bits (f, g, d&g)          8              87

Those who select stronger keys seem to choose stronger pass phrases as well, although the groups on either side of 1024 bits are small.

Median Est. Entropy by Medium of Response (bits)

    a. Anonymous snail-mail              75
    b. Anonymous, encrypted e-mail      105
    c. Encrypted e-mail                 201
    d. Anonymous e-mail                  55
    e. Open e-mail                       65

No strong trend is evident. However, it appears that responders who used PGP in answering the survey (despite my urging them to use snail-mail) have stronger pass phrases:

    Response                   Responses   Median entropy (bits)
    Not encrypted (a+d+e)          33              75
    Encrypted (b+c)                12             105

How big should a pass phrase be?

In his paper "Efficient DES Key Search," Michael J. Wiener [WEI] describes a machine that could find a 56-bit DES key by exhaustive search in an average of 3.5 hours. He estimated that the machine would cost $1 million to build in 1993. This method assumes knowledge of a block of plaintext and its matching ciphertext, the so-called "known plaintext" attack. An attacker who had someone's pass-phrase-protected secret key but lacked the pass phrase itself would have all the information needed for a known plaintext attack on the pass phrase.
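The attacker's job can be sketched in a few lines of Python. The sketch assumes, following the description of PGP's key derivation later in this article, that the IDEA key protecting the secret key is the MD5 hash of the pass phrase. The callback `key_is_correct` is a hypothetical stand-in for trial-decrypting the secret key with IDEA (which Python's standard library does not provide); the word list, the space-separated phrase format and the UTF-8 encoding are also assumptions of the sketch.

```python
import hashlib
import itertools
from typing import Callable, Optional, Sequence


def idea_key_from_pass_phrase(pass_phrase: str) -> bytes:
    """Derive a 16-byte (128-bit) IDEA key as the MD5 hash of the pass phrase.

    Encoding the pass phrase as UTF-8 is an assumption of this sketch.
    """
    return hashlib.md5(pass_phrase.encode("utf-8")).digest()


def dictionary_attack(words: Sequence[str], num_words: int,
                      key_is_correct: Callable[[bytes], bool]) -> Optional[str]:
    """Try every space-separated phrase of num_words words drawn from `words`.

    Up to len(words) ** num_words candidates are tested; each costs one MD5
    hash plus whatever key_is_correct does (a trial IDEA decryption in a
    real attack, which this sketch does not implement).
    """
    for combo in itertools.product(words, repeat=num_words):
        candidate = " ".join(combo)
        if key_is_correct(idea_key_from_pass_phrase(candidate)):
            return candidate
    return None


# Toy usage: the target key is known here only so the check can be simulated.
target_key = idea_key_from_pass_phrase("turtle moves")
found = dictionary_attack(["the", "turtle", "moves", "jello"], 2,
                          lambda key: key == target_key)
print(found)  # turtle moves
```

With a word list of W words and k-word phrases, the loop may run W**k times; at the formula's 15 bits per word, W is about 32,000, which is why each added random word matters so much.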
The attack is somewhat more complex than in the DES case, because generating plausible pass phrases is harder and less certain than enumerating DES keys, and because the algorithm that must be executed to test each pass phrase is more complex, requiring both an MD5 and an IDEA pass. To get a rough estimate of the cost of a pass phrase attack, let's assume that a PGP pass phrase engine able to try 2^56 pass phrases in 3.5 hours could be built for $2 million using 1995 technology. Amortizing the cost over 3 years of 24-hour-a-day operation gives a capital cost of $76/hour. Adding $14/hour for power consumption, operator time and floor space gives a total cost of $90/hour, or $315 (3.5 hours at $90/hour) per set of 2^56 pass phrases tested. Each additional bit of entropy doubles the number of pass phrases to be tested, and so doubles the cost. This estimate implies the following rough relationship between bits of pass phrase entropy and the level of protection afforded, in terms of cost of attack:

    Bits    Cost of Attack (1995)
     56     $315
     60     $5,000
     64     $81,000
     68     $1,290,000
     72     $21,000,000
     76     $330,000,000
     80     $5,300,000,000

To allow for progress in electronics, one bit of entropy should be added every 2 to 3 years. Using my rough estimate, 31% of responders had pass phrases with 60 bits of entropy or less; 20% had less than 56 bits of entropy.

Recommendations

This study, crude as it is, suggests that a significant minority of PGP users are using inadequate pass phrases. Before you say "Well, my pass phrase is long enough," remember that in PGP, as in all public key systems, the security of the messages you send depends on the security of the recipient's secret key, not on your own safeguards. And, in general, you have no way of knowing how careful he or she is. Of course, an attacker who was able to purloin someone's pass-phrase-protected secret key might also be in a position to bug their keyboard or to plant a program that would capture their pass phrase. Still, it is good practice to make every layer of security as effective as practical.
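The cost-of-attack table above follows from a few lines of arithmetic, sketched here in Python. All of the inputs ($2 million, 3 years, $14/hour, 3.5 hours per 2^56 trials) are the assumptions stated in the text, and the function names are illustrative only. Like the text, the sketch rounds the capital cost to whole dollars before adding running costs.

```python
# Inputs are the assumptions stated in the text (1995 dollars).
ENGINE_COST = 2_000_000           # hypothetical pass phrase engine
HOURS_AMORTIZED = 3 * 365 * 24    # 3 years of 24-hour-a-day operation
RUNNING_COST_PER_HOUR = 14        # power, operator time, floor space
HOURS_PER_2_56_TRIALS = 3.5       # time to try 2**56 pass phrases


def capital_cost_per_hour() -> float:
    return ENGINE_COST / HOURS_AMORTIZED  # about $76/hour


def cost_per_2_56_trials() -> float:
    # Round the capital cost to whole dollars ($76), as the text does,
    # then add running costs to get $90/hour.
    hourly = round(capital_cost_per_hour()) + RUNNING_COST_PER_HOUR
    return hourly * HOURS_PER_2_56_TRIALS


def attack_cost(bits: int) -> float:
    """Cost of trying all 2**bits pass phrases; each extra bit doubles it."""
    return cost_per_2_56_trials() * 2 ** (bits - 56)


for bits in range(56, 84, 4):
    print(f"{bits:3d}  ${attack_cost(bits):,.0f}")
```

The loop prints $315, $5,040, $80,640 and so on up to $5,284,823,040 for 80 bits, which the table rounds to the figures shown.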
Judging from their comments, users with apparently weak pass phrases often thought their pass phrases were adequate. I believe there are two paths to strengthening PGP pass phrases: education and improvements to PGP itself.

Education

It should be possible to develop a consensus on minimal standards for pass phrases. My recommendations would include, as a minimum:

  o Five or more randomly chosen dictionary words, or
  o A quotation of at least 10 words selected at random from a randomly chosen library book, or
  o A password of at least 8 random syllables.

In addition, I think it is time to reconsider the "never write down your pass phrase" slogan. If writing down at least a portion of one's pass phrase leads to stronger pass phrase choices, it might be a good practice to recommend.

Improvements to PGP

There are several ways PGP could be improved to reduce the risk of pass phrase compromise:

  1. PGP could warn users when they attempt to enter a pass phrase of 10 characters or fewer.
  2. PGP could include a random pass phrase generator with a couple of options, including random dictionary word selection and random pronounceable syllables. Peter Kwangjun Suk [SUK] has compiled a "list of 10760 `words' that are easy to remember but whose average length is 4.77 characters." It might be a good basis for a pass phrase generator.
  3. (My favorite) PGP should change the way it hashes pass phrases to substantially increase the computation time required.

The Opportunity for Improved Pass Phrase Hashing

The Wiener DES engine, described above, assumed one DES trial every 20 nanoseconds in each of the parallel processing nodes. Increasing the time per trial to 0.2 seconds would make such an engine 10,000,000 times (0.2 s / 20 ns = 10^7) more expensive for the same search rate. Also, a hash that used large amounts of memory and as much of the power of the personal computer's microprocessor as possible would greatly increase the silicon footprint of each node. Each node in Wiener's DES design had 26,000 equivalent gates.
An MD5/IDEA design would require several times as many, say 100,000. By contrast, simulating the essential parts of a 486-class microprocessor and 1 megabyte of memory might require 10,000,000 equivalent gates, complicating the design of a search engine by another factor of 100. Combined, the two effects described above could make a pass phrase attack one billion times (10^7 x 100 = 10^9) harder, with little negative impact on PGP users.

Currently (as I read the source code) PGP uses the MD5 hash of the pass phrase as the IDEA key for encrypting the secret key. Instead, PGP could use the pass phrase as the initial value for a computation-time intensive hash algorithm optimized to use as much of the processing resources in a typical personal computer as possible, including wide word multiplies, branches and lots of RAM. A good starting place for the design of such an algorithm might be the Randomizing by Shuffling method discovered by Carter Bays and S. D. Durham [BAY] and described in Knuth's The Art of Computer Programming [KNU]. The auxiliary table would be large, on the order of 1 megabyte, and filled using a 32-bit linear-congruential pseudo-random number generator. MD5 or SHA passes every so often would add security, but these hash algorithms should not make up a large part of the compute time, since they can be implemented efficiently in special-purpose hardware. In addition, each encrypted secret key should have "salt" stored with it to prevent an attacker from developing a dictionary of IDEA keys that match common pass phrases.

The computation-time intensive hash algorithm should take the number of iterations and the amount of memory used as parameters. When generating secret keys or when changing pass phrases, users could be given a choice of levels of protection. Each level would specify the number of iterations and the amount of memory; standardizing the levels this way would ensure interoperability between different machines.
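The proposal above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a reviewed design: the LCG constants (the widely published 1664525/1013904223 pair), chaining the shuffled output through MD5 every `md5_interval` steps, reseeding the generator from each new digest, and all function and parameter names are choices made for this sketch, not part of the text. The shuffle step itself is Knuth's Algorithm B (the Bays-Durham method), and the salt and the (iterations, table size) pair play the roles described in the text. Python cannot deliver the speed or compact memory use the proposal relies on; a real implementation would be written in C with a packed table of 32-bit words.

```python
import hashlib
import os

LCG_A = 1664525      # assumed multiplier (a widely published 32-bit LCG pair)
LCG_C = 1013904223   # assumed increment
MASK32 = 0xFFFFFFFF


def lcg_next(x: int) -> int:
    """One step of a 32-bit linear-congruential generator: (a*x + c) mod 2**32."""
    return (LCG_A * x + LCG_C) & MASK32


def slow_pass_phrase_hash(pass_phrase: bytes, salt: bytes, iterations: int,
                          table_words: int, md5_interval: int = 4096) -> bytes:
    """Return a 16-byte key derived from a salted pass phrase (illustrative only)."""
    digest = hashlib.md5(salt + pass_phrase).digest()
    x = int.from_bytes(digest[:4], "little")

    # Fill the auxiliary table V[0..k-1] (k = table_words) from the LCG.
    table = []
    for _ in range(table_words):
        x = lcg_next(x)
        table.append(x)
    x = lcg_next(x)
    y = x  # Knuth's Y: the next member of the LCG sequence

    chunk = bytearray()
    for _ in range(iterations):
        # Algorithm B (Bays-Durham shuffle): j = floor(k * Y / m), m = 2**32;
        # output Y = V[j], then refill V[j] with the next LCG value.
        j = (table_words * y) >> 32
        y = table[j]
        x = lcg_next(x)
        table[j] = x
        chunk += y.to_bytes(4, "little")

        if len(chunk) >= 4 * md5_interval:
            # Occasional MD5 pass: chain the shuffled output into the digest and
            # reseed the LCG from it, so later steps depend on the whole pass phrase.
            digest = hashlib.md5(digest + chunk).digest()
            x ^= int.from_bytes(digest[:4], "little")
            chunk.clear()

    return hashlib.md5(digest + bytes(chunk)).digest()


if __name__ == "__main__":
    salt = os.urandom(8)  # would be stored alongside the encrypted secret key
    key = slow_pass_phrase_hash(b"the turtle moves", salt,
                                iterations=200_000, table_words=1 << 18)
    print(salt.hex(), key.hex())
```

Raising `iterations` or `table_words` raises the cost for the legitimate user and the attacker alike; a protection level as described in the text would simply be a standard pair of these two values.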
For example:

    Level A might be the existing PGP scheme, for backwards compatibility.
    Level B might be set for 68000/8086 generation computers.
    Level C might be set for 486/68040 class machines.
    Level D might be set for Pentium/PowerPC class machines.
    Level E and above might each double the number of iterations of the previous level.

As PGP gains wider acceptance, new users will likely be even less careful in pass phrase selection and less willing to use long pass phrases than the "early adopters" who responded to this survey. Adding a factor of 10^9 to the difficulty of recovering pass phrases is roughly equivalent to adding 30 bits of entropy to each pass phrase (2^30 is about 1.07 x 10^9) and would significantly improve the security of PGP for all users.

References

[BAY] C. Bays and S. D. Durham, ACM Trans. Math. Software 2, 1976, pp. 59-64.
[DAV] P. Davies, Ed., "The American Heritage Dictionary of the English Language" (55,000 entries), Dell, 1973.
[KNU] D. E. Knuth, "The Art of Computer Programming," Vol. 2, "Seminumerical Algorithms," Second Edition, Sec. 3.2.2, Algorithm B, pp. 32-33, Addison-Wesley, 1981.
[REI] A. G. Reinhold, "Common Sense and Cryptography," in Internet Secrets, J. R. Levine and C. Baroudi, Eds., IDG Books, 1995, p. 148.
[SUK] P. K. Suk, "Re: A Good Solution to the Passphrase Question," posted to alt.security.pgp, 4 Apr 1995 23:21:06 EDT, suk@usceast.cs.scarolina.edu.
[WEI] M. J. Wiener, "Efficient DES Key Search," Bell-Northern Research, Ottawa, Ont., Canada, 1993, available at http://www.eff.org/pub/EFF/Policy/Crypto/Misc/Technical/des_key_search.ps.gz

Appendix A - Pass Phrase Survey Comments

I do use another language, with numerals included. Longer the better, uses PGP 4-12 times/day. [8 chars, 2 words, 0 English]
Uses PGP 50 times/day. Plans to increase to 21 chars. [12 chars, 1 word, 0 English]
Adds numbers & capitalization to make passphrase more difficult to guess. It's mine and it's SECRET. Event from childhood w/random punctuation & misspellings. [long comment]
Plans to increase usage in the future. Pass phrase easy to get by other means: bug keyboard, video tape; therefore complexity not a deterrent. [100 chars, 15 words, 13 English]
Changes passphrase often, uses multiple languages. [long comment]
Name of my favorite movie with ? substituted for a letter. Plans to change his passphrase. Why isn't your PGP public key in your .plan file? Mine is easily remembered, but a hybrid of different languages, (none English). I probably use PGP less because my pass phrase is such a pain to type in. [50 chars, 14 words, 11 English]
Two keys, one for anonymous. Choose random words from dictionary. [Thailand]
Phrase from literature. Includes 9 non-Latin chars, no blanks, uses PGP 30 times a day. Long random phrases are too hard to remember. Uses PGP >6 times/day. [39 chars, 7 English words]
d-15,g-20 If PGP is so secure, how come the weak link is the passphrase. "The Turtle moves" In list of quotes from book he likes, changing to Spanish. Should enforce minimal length/complexity. [12 chars, 2 words, 0 English]
Obscures simple phrase by substitutions, e.g. "You won four quarts of jello" -> "u14KwartzuvjeLoh". SECDEV passphrase is 20 random alphanumerics selected by pair of dice & 6x6 table. They're easier to remember than strong 8-10 char passwords and stronger. Excerpt from one of the books he owns. Had PGP 2.3a so couldn't read my key. [Japan]
All in German dictionary, better than passwords. Changes passphrase every month. Selects from books. "F**k the NSA (and alike)!" [From France where encryption is illegal!]
I find PGP a difficult program to use. It ain't user friendly, that for sure. [23 chars, 4 English words]

- ----------------------------------------------------------------------------
Copyright (c) 1995, Arnold G. Reinhold, Cambridge, Mass. USA. The author hereby grants rights for free non-commercial electronic distribution of the entire text with attribution and signature attached.
-----BEGIN PGP SIGNATURE-----
Version: 2.6

iQCVAwUBL84Tx2truC2sMYShAQF3tgP/RI3OQNHMu9GmCi7713DeXtzGKPeSYRRF
ti6EBsOdu8R1BdFVrW5/nBWG7HqcM0uNVl4Uy2kCAszb4Tonvsaf0qY0Dbw88EyE
EyKcfIrFZWSFHn+DlblwzxgnDiYe8owYxDuzCy4Y3kIyGlc8pFXjljMBbKLslog5
PrRHbp6OkrY=
=dYyx
-----END PGP SIGNATURE-----