1 Introduction
Online security has been a major concern since the Internet became a necessity for society, from business activities to the everyday life of ordinary people. A fundamental aspect of online security is protecting data from unauthorized access. The most commonly used method for doing this is to require a password as part of the online access process. A password is a secret character string that only the user knows; its hashed code is stored on the server that provides access to the data. When the user requests data access, he enters the password along with other identification information, such as a user name or email. A message digest (hash) of the password is computed and the hash code is transmitted to the server, which matches it against the stored hash to grant or deny access (Fig. 1).
Fig. 1 Password verification: the password is hashed and compared with the stored hash (match: access granted; no match: access denied)
To crack a password, the hacker hashes one candidate string after another and continues this process (in an off-line attack) until a match is found (the password is cracked) or the hacker gives up without success. The idea of cracking a password is shown in Fig. 2, where the hacker tries a sequence of potential passwords p1, p2, …, pk to generate hash codes h1, h2, …, hk and checks whether hk = h, where h is the given hash code to be cracked. For an online attack, however, a “three strikes” type of rule may prevent the hacker from trying more than three times.
The problem the hacker has to figure out is how to select the strings p1, p2, …, pk as potential passwords. There are various approaches for doing so, all based on guessing. From the user's point of view, the critical point for the secrecy of the password is to make it very hard for hackers to guess the password from the hash code. This is the issue of password quality (or strength).
In this paper, we survey various measures of password quality, including a new complexity measure that we propose, and compare these measures by analyzing their relationships. Experiments on cracking a small set of passwords of different quality levels were conducted to confirm the correlation between the difficulty of cracking and password quality. We also applied a clustering algorithm, using these measures, to a set of passwords to group them into four quality clusters (weak, fair, medium, strong). Since password quality is mostly an indicator of how hard a password is to crack, we first briefly discuss approaches to cracking passwords.
Each password p is hashed using a cryptographic hash function f to generate its hash code h, as defined in Eq. (1):

h = f(p) (1)

Since only h (rather than p) is stored in the database on the resource server, the task of cracking a password is to find a method c such that c(h) = p. Note that c is not f−1, which does not exist because f is a one-way function. So, the hacker needs to figure out what method c to use to have a better chance of finding p.
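As a concrete illustration of f, h, and the guess-and-check method c, the following minimal Python sketch uses MD5 (the hash used later for data set D3) and a hypothetical candidate list; the names f and crack are ours, not taken from any cracking tool.

```python
import hashlib
from typing import Iterable, Optional

def f(password: str) -> str:
    """The one-way hash function f; MD5 here, as used for data set D3 later."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def crack(h: str, candidates: Iterable[str]) -> Optional[str]:
    """The guessing method c(h): hash each candidate p_i and compare with h."""
    for p in candidates:
        if f(p) == h:
            return p      # match found: the password is cracked
    return None           # all candidates exhausted: the hacker gives up

# Example: the stolen hash of a weak password and a tiny guess list.
stolen_hash = f("essay1")
print(crack(stolen_hash, ["password", "letmein", "essay", "essay1"]))  # -> essay1
```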
Commonly used methods for cracking passwords include brute-force attack,
dictionary attack, and some variations of these with time-space tradeoff
considerations.
A brute-force attack enumerates all candidate strings over the alphabet 𝛴, so the time t to crack a password of length l is

t ∈ 𝛩(N^l) (2)

where N = |𝛴| and N^l is the size of the string pool from which the possible passwords are drawn. That is, the time complexity of brute-force attack is polynomial in N and exponential in l. In general, the alphabet 𝛴 is one of the basic sets, or a combination of them, given in Table 1.
Most passwords are drawn from lowercase letters (L) only. In this case, for a password of length 6 (the minimum length required by most service sites), there are 26^6 ≈ 308.9 × 10^6 possible strings to try. If we can make 100,000 calls to the crypt function f per second, it will take 3089 s, or 51.5 min, to exhaust all possible strings from L. For a larger alphabet, say D ∪ L, a 6-char password will need over 6 h to crack, a 7-char password over 9 days, and an 8-char password about 11 months.
It is practically infeasible to use the brute-force approach to crack a password of length 7 or longer over a larger 𝛴. This is the basic reason why most service sites require passwords to be at least 8 characters long, with at least an uppercase letter, and a digit or a special character, in addition to lowercase letters. This makes N = 62 (or 94 with special characters) and l ≥ 8.
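The figures above can be reproduced with a few lines of arithmetic; the sketch below assumes the rate of 100,000 hash computations per second stated in the text and simply evaluates N^l / rate for several (N, l) pairs.

```python
RATE = 100_000  # assumed hash computations per second, as in the text

def exhaust_seconds(N: int, l: int, rate: int = RATE) -> float:
    """Worst-case time to try all N**l strings of length l over a charset of size N."""
    return N ** l / rate

for N, l in [(26, 6), (36, 6), (36, 7), (36, 8), (62, 8), (94, 8)]:
    secs = exhaust_seconds(N, l)
    print(f"N={N:2d}, l={l}: {N ** l:.3e} strings, about {secs / 86400:9.1f} days")
# 26^6 takes roughly 51.5 minutes; 36^8 roughly 11 months; 62^8 and 94^8 take far longer.
```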
Since most people use human-memorable passwords that are likely to be words in dictionaries, or variations of such words, a hacker can try each word in a dictionary rather than the random strings used in brute force. With this approach, each word wi ∈ D in a dictionary D is checked to see if f(wi) = h, where h is the given hash code to be cracked. Hence, a password is very easy to crack if it is a word in the dictionary. In practice, all the hash codes f(wi) are pre-computed and stored in a database, rather than computed at run time.
There are two issues with this approach. First, the dictionary needs to contain a
very large number of words to cover most (if not all) passwords that you think
people may use. Second, most users are aware of dictionary attack and avoid using actual dictionary words as passwords; instead, they make small changes to a word so that it is still easy to remember. For example, rather than directly using essay as a password, they may use essay1, e55ay, or 3ssay. Hence, dictionary attacks often apply some mangling rules to the “spelling” of the words, such as replacing l by 1, o by 0, e by 3, s by 5, etc.
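A minimal sketch of a dictionary attack with a few illustrative mangling rules is shown below; the specific rules and the mangle helper are our own simplified examples, not the rule set of any particular tool.

```python
import hashlib

# "Leet" substitutions mentioned in the text: l->1, o->0, e->3, s->5.
SUBS = {"l": "1", "o": "0", "e": "3", "s": "5"}

def mangle(word: str):
    """Yield the word itself plus a few simple mangled variants."""
    yield word
    yield word + "1"                      # append a digit
    yield word.capitalize()               # capitalize the first letter
    for src, dst in SUBS.items():
        if src in word:
            yield word.replace(src, dst)  # character substitution

def dictionary_attack(h: str, dictionary):
    """Hash every word and every mangled variant and compare with the target hash h."""
    for w in dictionary:
        for guess in mangle(w):
            if hashlib.md5(guess.encode()).hexdigest() == h:
                return guess
    return None

target = hashlib.md5(b"e55ay").hexdigest()
print(dictionary_attack(target, ["password", "horse", "essay"]))  # -> e55ay
```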
Let D be the dictionary used and m the number of mangling rules; the time t to crack a password using dictionary attack is

t ∈ 𝛩(|D| + m) (3)

That is, the time complexity is linear in the size of the dictionary and the number of rules, a significant improvement over brute force. However, the method fails to find the password if the dictionary does not contain the password or its variations after applying the rules.
Table 1 Basic alphabet

𝛴                                          N
D = {0..9}                                 10
L = {a..z}                                 26
U = {A..Z}                                 26
S = {∼‘!@#$%^&*()-_=+[{]}\|;:”’,<.>/?}     32
To speed up cracking, a table lookup attack stores precomputed hashes of potential passwords (dictionaries) in a database and cracks a given password hash by searching the database. The search is much faster than the original dictionary attack simply because there is no need to compute the hash of each guess at run time. However, the method requires a huge amount of storage space to hold “all” possible passwords and their hashes.
The rainbow table attack [16] is a variation of the table lookup attack. Instead of precomputing the hash codes of a large number of potential passwords and storing them in a database, the rainbow table approach is a time-memory trade-off that stores far fewer hash codes while still representing a huge number of passwords. The basic idea is to create a password-hash chain of length k that covers k potential passwords and their hash codes.
For each word w ∈ D in a dictionary D, we create a chain c = (p1, h1, …, pk, hk), where p1 = w, hi = f(pi), and pi = r(hi−1); here f is a cryptographic hash function and r is a reduce function that “reduces” a hash code to a potential password. For each chain, only the pair (p1, hk), i.e. the initial password and the ending hash code, is stored in the database. Hence, if k = 10,000, the pair (p1, hk) represents all 10,000 passwords and their hash codes in the chain, a significant saving in storage space. A rainbow table T is a set of chains: T = {ci, i = 1, …, n}, where ci is the chain for word wi in D.
To crack a given hash code h of password p, we check whether h equals the ending hash hk of some chain. If it does, we rebuild that chain from p1 to recover pk, which is the target password p. Otherwise, we apply r and f alternately to h and compare the result with the chain endings, until the password is found or we move to the next chain in the table T. If all chains are exhausted, we fail to find p.
The time complexity of the rainbow table attack is

t ∈ 𝛩(k|D|) (4)

which is linear in the dictionary size but with a constant factor k that may be quite large (say, 10,000 or more). It is a time-memory trade-off: using the same storage space, it covers k times more candidate passwords than a plain dictionary, but is about k times slower than a dictionary attack.
One problem with the rainbow table approach is collision, when two different hash codes are reduced to the same password. Another difficulty is how to make the reduce function r “well behaved.” That is, r should map hash codes into a well-distributed password set (one that resembles user-selected rather than random passwords).
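The chain construction and lookup can be sketched as follows; the particular reduce function (mapping a hash to six lowercase letters) and the short chain length K are arbitrary choices for illustration, not the construction used in [16].

```python
import hashlib

K = 100                               # chain length k (the text uses k = 10,000)
CHARS = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 6

def f(p: str) -> str:
    return hashlib.md5(p.encode()).hexdigest()

def r(h: str) -> str:
    """'Reduce' a hash code to a candidate password (here: 6 lowercase letters)."""
    n = int(h, 16)
    out = []
    for _ in range(PW_LEN):
        n, rem = divmod(n, len(CHARS))
        out.append(CHARS[rem])
    return "".join(out)

def build_chain(p1: str):
    """Walk p1 -> h1 -> p2 -> ... -> hk and keep only the pair (p1, hk)."""
    p = p1
    for _ in range(K):
        h = f(p)
        p = r(h)
    return (p1, h)

def lookup(h: str, table):
    """Apply r and f alternately to h, comparing with chain endings; on a hit,
    rebuild that chain from its start to recover the password hashing to h."""
    cand = h
    for _ in range(K):
        for p1, end in table:
            if cand == end:
                p = p1
                for _ in range(K):        # rebuild the chain from p1
                    hp = f(p)
                    if hp == h:
                        return p
                    p = r(hp)
        cand = f(r(cand))                 # extend h further along a hypothetical chain
    return None

table = [build_chain(w) for w in ["monkey", "dragon", "shadow"]]
p2 = r(f("monkey"))                       # a password covered by the first chain
print(lookup(f(p2), table) == p2)         # True: recovered from its hash alone
```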
It has been argued that users prefer passwords that are easy to remember. Most users are aware of dictionary attack, so human-memorable passwords are mostly not dictionary words, nor are they random (randomly generated passwords are hard to remember). One approach to attacking human-memorable passwords is the “smart dictionary” attack, which uses dictionaries containing passwords that users are likely to generate. Narayanan and Shmatikov introduced a fast dictionary attack method based on the likelihood of the sequence of characters in users' passwords [15]. The method uses a standard Markov model to generate a smart dictionary that is much smaller than the ones used in traditional dictionary attacks. The main observation given in [15] is that “the distribution of letters in easy-to-remember passwords is likely to be similar to the distribution of letters in the users' native language.” Hence, the Markovian dictionary can be created based on the probability of the characters in a sequence.
Let v(x) be the frequency of character x in English text, and v(xi+1|xi) be the frequency of xi+1 given that the previously generated character is xi. In the zero-order Markov model, the probability of a sequence 𝛼 = x1x2…xn is

p(𝛼) = ∏_{x∈𝛼} v(x)

In the first-order Markov model, the probability is

p(x1x2…xn) = v(x1) ∏_{i=1}^{n−1} v(xi+1|xi)

The Markovian dictionary with threshold 𝜃 then keeps only the sequences whose probability is at least 𝜃:

D_{v,𝜃} = {x1x2…xn : v(x1) ∏_{i=1}^{n−1} v(xi+1|xi) ≥ 𝜃}
The great advantage of these models is that they drastically reduce the size of the search space by eliminating from the dictionary the majority of words that are unlikely to be user-selected passwords. It is shown in [15] that with 𝜃 = 1∕7 (i.e. only 14% of sequences are produced while 86% of sequences are ignored), the zero-order dictionary still has a 90% probability of covering the plausible passwords. A dictionary containing 1/11 of the keyspace has 80% coverage, and 1/40 of the keyspace has 50% coverage. Their experiments, using dictionaries that are a small fraction of the search space, successfully recovered 67.6% of the passwords, which is much higher than in many previous works.
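A toy sketch of building a zero-order Markovian dictionary is given below; the character frequencies are estimated from a tiny sample string rather than a real corpus, and the threshold value is arbitrary.

```python
from collections import Counter
from itertools import product
from math import prod

# Zero-order character frequencies v(x), estimated here from a tiny sample text
# (a real attack would use a large corpus of the users' native language).
sample = "password security is all about making guessing expensive"
counts = Counter(c for c in sample if c.isalpha())
total = sum(counts.values())
v = {c: n / total for c, n in counts.items()}

def zero_order_prob(alpha: str) -> float:
    """p(alpha) = product of v(x) over the characters of alpha (0 for unseen characters)."""
    return prod(v.get(x, 0.0) for x in alpha)

def markovian_dictionary(length: int, theta: float):
    """Keep only the sequences of the given length whose probability is at least theta."""
    alphabet = sorted(v)
    return ["".join(seq) for seq in product(alphabet, repeat=length)
            if zero_order_prob("".join(seq)) >= theta]

kept = markovian_dictionary(3, theta=1e-3)
print(f"{len(kept)} of {len(v) ** 3} length-3 sequences kept")
```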
Dictionary attacks often use word-mangling rules, but it can be difficult to choose effective mangling rules. One approach to this problem is to generate guesses in decreasing order of their probability of being user passwords, which increases the likelihood of cracking the target password within a limited number of guesses. The basic idea is to estimate the probability of user passwords from a training set (a set of disclosed real passwords) and to create a probabilistic context-free grammar that is used to estimate the likelihood of the formation of a string [21, 23]. Each production of such a grammar has an associated probability,

𝛼 → 𝛽 (p)

where ∑_i pi = 1 over all productions i that have the same left-hand side 𝛼.
In the case of passwords, the only variables (in addition to the start symbol T) are L, D, and S, representing letters, digits, and special characters. The notations Lk, Dk, and Sk represent k consecutive letters, k consecutive digits, and k consecutive special symbols, respectively. The probability pi of each production rule i is estimated from the training set. The probability of a sentential form (a string derived from T) is the product of the probabilities of the productions used in the derivation. An example of a derivation is

T ⇒ L3D1S1 ⇒ L34S1 ⇒ L34#
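The sketch below estimates only the probabilities of base structures (productions from the start symbol), using a tiny hypothetical training set; the full approach in [21, 23] also learns probabilities for filling in the digit and symbol components.

```python
from collections import Counter

# A tiny hypothetical "training set" of disclosed passwords (real sets are far larger).
training = ["horse743", "essay1", "summer2020", "pass#1", "love#4", "abc123"]

def base_structure(pw: str):
    """Map a password to its base structure, e.g. 'horse743' -> (('L', 5), ('D', 3))."""
    kind = lambda c: "L" if c.isalpha() else "D" if c.isdigit() else "S"
    runs = []
    for c in pw:
        k = kind(c)
        if runs and runs[-1][0] == k:
            runs[-1][1] += 1
        else:
            runs.append([k, 1])
    return tuple((k, n) for k, n in runs)

# Estimate p(T -> structure) for each base structure seen in the training set.
counts = Counter(base_structure(pw) for pw in training)
total = sum(counts.values())
probs = {s: c / total for s, c in counts.items()}

# Probability of the structure L5 D3 (a 5-letter run followed by a 3-digit run).
print(probs.get((("L", 5), ("D", 3)), 0.0))
```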
Analysis of password strength has been an active area of research and practice for a long time. The focus of these works is on metrics of password strength and the evaluation of these metrics. We survey several such metrics for password quality below, including the complexity metric that we propose here.
3.1 Entropy
Entropy is a measure of uncertainty and is a term used by Claude Shannon in his information theory [19]. He applied entropy to the analysis of English text as “the average number of binary digits required per letter of the original language” [20]. The entropy H of a variable X is defined as

H(X) = −∑_x p(X = x) log2 p(X = x)

For a password of length m drawn uniformly at random from a charset of size N, this gives

H = log2 N^m = m × log2 N (5)

It is obvious that longer passwords from larger charsets have higher entropy values.
The NIST Guidelines [3] give an outline for estimating the entropy of a user-selected password based on the length of the password, the charsets used, and the possibility of a dictionary attack. The scheme mostly assigns additional bits of entropy as the password's length increases, and adds certain “bonus” bits for the use of multiple charsets and for passing a dictionary check; the estimated entropy H is the sum of these bit assignments.
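A rough sketch of such an estimate is shown below; the per-character bit assignments follow the commonly cited SP 800-63 scheme, while the flat +6 bonuses are a simplification of the guideline's composition and dictionary rules.

```python
def nist_entropy(password: str,
                 composition_rule: bool = False,
                 dictionary_check: bool = False) -> float:
    """Rough NIST-style entropy estimate for a user-chosen password.

    Bit assignments: 4 bits for the first character, 2 bits each for characters
    2-8, 1.5 bits each for characters 9-20, and 1 bit per character beyond 20.
    The flat +6 bonuses below simplify the guideline's composition/dictionary rules.
    """
    n = len(password)
    bits = 4.0 if n >= 1 else 0.0
    bits += 2.0 * max(0, min(n, 8) - 1)
    bits += 1.5 * max(0, min(n, 20) - 8)
    bits += 1.0 * max(0, n - 20)
    if composition_rule:      # password forced to mix cases/digits/symbols
        bits += 6.0
    if dictionary_check:      # password survived an extensive dictionary check
        bits += 6.0
    return bits

print(nist_entropy("horse743"))                                          # 18.0 bits
print(nist_entropy("Horse#743", composition_rule=True,
                   dictionary_check=True))                               # 31.5 bits
```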
Most users are aware of dictionary attack and avoid using dictionary words as passwords. However, users want passwords that are easy to remember, so they tend to take a word and make small changes to it. With this in mind, dictionary attack methods have adopted various word-mangling rules to match a password with words in dictionaries. So, the strength of a password should reflect not only whether the password is in a dictionary, but also how easy (or hard) it is to correct the “spelling errors” in the password so that it matches some dictionary word. This is commonly measured as a linguistic distance between the password and a dictionary word. The most basic linguistic distance is the Levenshtein distance (or edit distance): the minimum number of editing operations (insert, delete, and replace) needed to transform one word into another. This idea was used in the password quality index (PQI) metric proposed in [13] and refined in [14].
The PQI of a password is a pair (D, L), where D is the Levenshtein distance of the password to dictionary words and L is its effective length:

L = m × log10 N (6)

where m is the length of the password and N is the size of the charset from which its characters are drawn.
The effective length is the length calculated in a “standardized” charset, the digit set D given in Table 1. The idea behind this is that a password (e.g. k38P, of length 4) drawn from multiple charsets (D ∪ L ∪ U) is about as hard to crack (i.e., has about the same number of possible candidates) as another password (e.g. 378902, of length 6) drawn only from the digit set D.
It can be seen that the effective length of a password given in (6) is essentially the same as the entropy value in (5), differing only by a constant factor log2 10. Both consider the password's length as well as the size of the charset.
With the PQI measure, the quality criterion given in [13] states that a password is of good quality if D ≥ 3 and L ≥ 14.
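A sketch of computing the PQI pair (D, L) is given below, assuming D is taken as the Levenshtein distance to the closest word in a tiny hypothetical dictionary and N is determined by the charsets of Table 1.

```python
import math

def effective_length(password: str) -> float:
    """L = m * log10(N), where N is the size of the charset the password draws from."""
    N = 0
    if any(c.isdigit() for c in password):
        N += 10
    if any(c.islower() for c in password):
        N += 26
    if any(c.isupper() for c in password):
        N += 26
    if any(not c.isalnum() for c in password):
        N += 32                                      # special-character set S of Table 1
    return len(password) * math.log10(N)

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and replacements turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # replacement
        prev = cur
    return prev[-1]

def pqi(password: str, dictionary) -> tuple:
    """PQI = (D, L): distance to the closest dictionary word, and effective length."""
    D = min(levenshtein(password.lower(), w) for w in dictionary)
    return D, effective_length(password)

print(pqi("e55ay", ["essay", "horse", "password"]))        # small D: easy to "correct"
print(pqi("k38P!x9#Qw2", ["essay", "horse", "password"]))  # large D and L
```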
A survey of the password strength meters used by popular service vendors was given in [4], which listed the basic requirements at these vendors, partly shown in Table 2.
Some of these vendors also use user information (such as surname) in their classification of password quality. Most of these meters are based on the traditional LUDS requirements (uppercase and lowercase letters, digits, and special characters), except Dropbox's.
Table 2 Password requirements at various vendors [4]

Service     Strength scale                                      Min length  Max length  Charset required
Dropbox     Very weak, Weak, So-so, Good, Great                 6           72          ∅
Drupal      Weak, Fair, Good, Strong                            6           128         ∅
FedEx       Very weak, Weak, Medium, Strong, Very strong        8           35          1+ lower, 1+ upper, 1+ digit
Microsoft   Weak, Medium, Strong, Best                          1           –           ∅
Twitter     Invalid/Too short, Obvious, Not secure enough,      6           > 1000      ∅
            Could be more secure, Okay, Perfect
Yahoo!      Weak, Strong, Very strong                           6           32          ∅
eBay        Invalid, Weak, Medium, Strong                       6           20          any 2 charsets
Google      Weak, Fair, Good, Strong                            8           100         ∅
Skype       Poor, Medium, Good                                  6           20          2 charsets or upper only
Apple       Weak, Moderate, Strong                              8           32          1+ lower, 1+ upper, 1+ digit
PayPal      Weak, Fair, Strong                                  8           20          any 2 charsets^a

^a PayPal counts uppercase and lowercase letters as a single charset
The zxcvbn strength estimator [24] instead draws on several sources of likely password components: common passwords in leaked password sets, common names from census data, and common words in Wikipedia. The zxcvbn algorithm finds patterns (substrings) in a password that match items in these sources, and the patterns may overlap within the password. The patterns include token (logitech), reversed (DrowssaP), sequence (jklm), repeat (ababab), keyboard (qAzxcde3), date (781947), etc. It then assigns an estimated number of guess attempts to each match, and finally searches for the set of non-overlapping adjacent matches that covers the password with the minimum total number of guess attempts.
Table 3 Password multi-checker output (strength score per service) for password$1 [4]
We propose a password complexity metric that considers both the common LUDS requirements and the patterns in the password. As in many other measures, the number of different charsets used in a password still plays an important role in this metric. In addition, we consider other factors that may make a password harder to guess, such as the mix of same-charset substrings, the position of special symbols, and substrings that appear in the dictionary.
Since users are aware that they should compose passwords from different charsets, they are likely to put characters from the same charset together rather than mixing them up. For example, a password is more likely to be horse743 than ho7r4se3. The latter is considered more “complex” than the former and harder to crack. We find the substrings of a password that are drawn from the same charset and count the number of such substrings. Using the same example, the number of substrings in horse743 is 2 (horse and 743), while the number of substrings in ho7r4se3 is 6 (ho, 7, r, 4, se, and 3). However, this number tends to be higher for a longer password than for a shorter one, so we take the ratio of this number to the password's length as a factor in our metric.
Another factor is the position of special symbols. Many users take a common word and add a special symbol at the end (or the beginning). For example, horse# is likely more common than hor#se. We apply a small penalty for this pattern if the password uses only two charsets (including the special-symbol charset).
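The sketch below computes these individual factors (number of charsets, ratio of same-charset substrings to length, and an edge special-symbol flag); how the factors are weighted and combined into a single complexity score is not reproduced here.

```python
def charset_kind(c: str) -> str:
    if c.islower():  return "L"
    if c.isupper():  return "U"
    if c.isdigit():  return "D"
    return "S"

def same_charset_substrings(password: str) -> list:
    """Split the password into maximal runs of characters from the same charset."""
    runs, current = [], password[0]
    for c in password[1:]:
        if charset_kind(c) == charset_kind(current[-1]):
            current += c
        else:
            runs.append(current)
            current = c
    runs.append(current)
    return runs

def complexity_factors(password: str) -> dict:
    """Illustrative factors of the proposed metric (weights are not specified here)."""
    runs = same_charset_substrings(password)
    kinds = {charset_kind(c) for c in password}
    edge_special = charset_kind(password[0]) == "S" or charset_kind(password[-1]) == "S"
    return {
        "num_charsets": len(kinds),                   # LUDS factor
        "mix_ratio": len(runs) / len(password),       # same-charset substrings vs. length
        "edge_special_penalty": edge_special and len(kinds) == 2,
    }

print(same_charset_substrings("horse743"))   # ['horse', '743']                  -> 2 runs
print(same_charset_substrings("ho7r4se3"))   # ['ho', '7', 'r', '4', 'se', '3']  -> 6 runs
print(complexity_factors("horse#"))          # 2 charsets, special at the edge -> penalty
```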
Three data sets were used in our experiments, as explained in Table 4. The data sets D1 and D2 were used primarily for correlation analysis, whereas D3 was used for the password-cracking experiments that illustrate the strength of the passwords.
Two statistics, the Pearson correlation and the maximal information coefficient (MIC), were calculated to show the relationships between the strength measures. MIC was introduced in [18] as a new exploratory data analysis indicator that measures the strength of the relationship between two variables. It is a statistic between 0 and 1 that captures a wide range of relationships and is not limited to specific function types (such as the linear relationships captured by the Pearson correlation).
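Both statistics can be computed as sketched below on hypothetical metric vectors; the Pearson correlation comes directly from NumPy, while MIC requires a third-party implementation such as the minepy package (an assumption on our part, not necessarily the tooling behind Table 5).

```python
import numpy as np

# Hypothetical, randomly generated metric values (not the paper's data sets).
rng = np.random.default_rng(1)
entropy = rng.uniform(10, 45, size=30)
complexity = entropy * 0.8 + rng.normal(0, 5, size=30)   # loosely related, for illustration

pearson = np.corrcoef(entropy, complexity)[0, 1]
print(f"Pearson correlation: {pearson:.3f}")

try:
    from minepy import MINE                # third-party MIC implementation
    m = MINE()
    m.compute_score(entropy, complexity)
    print(f"MIC: {m.mic():.3f}")
except ImportError:
    print("install minepy to compute the maximal information coefficient")
```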
We calculated the Pearson correlation and MIC for four measures (entropy, NIST entropy, Levenshtein distance, and complexity). The pairwise results for data sets D1 and D2 are given in Table 5(a) and (b), respectively. They show that entropy, NIST entropy, and complexity have high MIC scores, indicating that the three measures are closely related and not very independent.
We used two online password cracking sites to uncover the passwords (given their MD5 hashes) in the data set D3. One site is CrackStation, which “uses massive precomputed lookup tables to crack password hashes” [5]. The other is HashKiller [11], which also uses very large hash databases to crack password hashes. The sizes of the lookup table databases of the two sites are listed in Table 6(a), and the results are given in Table 6(b).
All 17 MD5 hashes that HashKiller failed to crack were also missed by CrackStation. HashKiller was more successful than CrackStation simply because it has larger hash databases. The uncracked hashes are those of passwords that were randomly generated (like nXXdHtt6Q, 2y!3e)!%), fwtC9xcO, etc.). The average strength metrics of
the passwords are given in Table 6(c). It clearly shows that the passwords the sites failed to crack have higher strength, on all the measures, than the cracked passwords. It also shows that HashKiller was able to crack passwords of higher strength than CrackStation could.
John the Ripper (JtR) [17] is an open-source password cracking tool. It combines dictionary attack, rainbow table attack, and brute-force attack, with various rules for word composition. The free version of JtR comes with a very small dictionary (password list) containing only 3,546 words. We used JtR to crack some passwords in the data set D3. The time spent on some of these passwords is given in Table 7. The last two passwords in the table took too long to crack, and we aborted JtR before they were recovered.
Figure 3 shows plots of various strength measures vs the time spent by JtR to crack the passwords.
Table 6(c) Average strength metrics of the uncracked and cracked passwords

Measure         CrackStation            HashKiller
                uncracked   cracked     uncracked   cracked
entropy         35.75       23.73       40.02       28.76
NIST entropy    27.84       19.11       31.56       22.66
complexity      38.33       13.53       42.92       24.54
Levenshtein     4.62        1.53        5.94        2.79
Since the default dictionary of JtR is very small and we did not use an additional dictionary, only 6 passwords (or slightly changed versions produced by some of JtR's transformation rules) were found in the dictionary, very quickly. The other passwords were cracked using the brute-force approach and took from 10 min to 1.5 h each. From Fig. 3, we can see that the strength metrics entropy and complexity are positively correlated with the time spent by JtR, and that the Levenshtein distance measure is even more closely correlated with the time needed to crack the passwords.
Table 7 Time spent by JtR cracking sample passwords from D3, listing each password, its MD5 hash, the four strength metrics (entropy, NIST entropy, complexity, Levenshtein distance), and the cracking time
Fig. 3 Time spent by JtR (in seconds) versus the strength measures, including a panel for the complexity metric
The algorithm randomly selected 4 passwords (with ids 29, 35, 38, and 44) as the initial centroids, ran through 24 iterations, and stabilized at 4 clusters with a within-cluster SSE (sum of squared errors) of 7.23.
Projections of the 6-D space onto 2-D for some pairs of the metrics are shown in Fig. 4a–d. Note that the number of points of a cluster in a plot may appear smaller than the number in Table 8. For example, there are 22 instances in cluster 4, but only 10–14 cluster-4 points appear in the plots. This is because, under the 6-D to 2-D projection, multiple points were mapped to the same spot in the 2-D space. From these figures, we can visually see that the passwords were reasonably clustered into strength groups, from weak to strong.
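The description (randomly chosen initial centroids, within-cluster SSE) suggests a k-means style algorithm; the sketch below applies scikit-learn's KMeans with k = 4 to hypothetical 6-D metric vectors, since the exact metric list and data are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 6-D feature vectors (one row per password); the six strength-related
# metrics actually used for clustering are not listed in this extraction.
rng = np.random.default_rng(0)
X = rng.random((50, 6))

# Normalize each metric so no single one dominates the Euclidean distance (an assumption).
X = (X - X.mean(axis=0)) / X.std(axis=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("within-cluster SSE:", kmeans.inertia_)   # analogous to the SSE reported in the text
```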
5 Related Work
A lot of work has been done on assessing the quality of passwords. As mentioned before, a good survey of password meters used by service vendors was given in [4]. Florêncio and Herley studied a set of half a million user passwords (as well as other information about the user accounts) on various web sites to find the characteristics of the passwords and their usage [7]. Kelley et al. analyzed 12,000 passwords and developed methods for calculating the time needed by several password-guessing tools to guess the passwords, so that password-composition policies could be better understood [12]. Dell'Amico et al. conducted an
empirical analysis of password strength [6], in which the authors analyzed the
success rate (number of cracked passwords versus search space) using different
tools including brute force, dictionary attack, dictionary mangling, probabilistic
context-free grammars [23], and Markov chains [15]. They concluded that the probability of successfully guessing a password at each attempt decreases roughly exponentially as the size of the search space grows. An extension of the probabilistic context-free grammar approach was given in [26], which tries to find new classes of patterns (such as keyboard and multi-word patterns) in the training set. Weir et al. tested
the effectiveness of using entropy (NIST) as a measurement of password strength
[22]. Their experiments on password cracking techniques on several large data sets
(largest had 32 million passwords) showed that the NIST entropy does “not provide
a valid metric for measuring the security provided by password creation policies.”
Florêncio et al. [8] argued that traditional advice to users on selecting their passwords does not provide additional security. They pointed out that relatively weak passwords are sufficient to prevent brute-force attacks on an account if a “three strikes” type rule applies; thus, making passwords stronger does little to address the
real threats. Bonneau used statistical guessing metrics to analyze a large set of 70 million passwords [1]. These metrics compare the observed password distribution with idealized (uniform) distributions to gauge how resistant the passwords are to guessing.
6 Conclusion
In this paper we reviewed the basic metrics for assessing password strength,
including entropy, NIST entropy, password quality index, and Lavenshtein
distance, as well as some password quality metrics developed at some popular
service vendors. We proposed a password complexity metric that considers the
composition patterns in a password in addition to the LUDS criteria. The
correlations between these measures were analyzed using the maximum
information coefficient (MIC) and Pearson coefficient. The resulting statistics show
that most of the metrics are closely correlated except Levenshtein distance that may
be a good measure for password quality besides the traditional parameters (length
of password and size of charset).
These password strength metrics were evaluated by experiments that tried to crack the hashes of a small set of passwords, to see whether the difficulty of cracking a password is indeed related to the strength measures. The cracking tools used include techniques such as brute force, transformation rules, dictionary attacks, and massive table look-up. Two types of results were obtained: whether the password was found or not (table look-up), and the length of time spent to discover the password (the other techniques, used in JtR). The results showed that the level of success in cracking the passwords is highly positively related to the strength measures. Our experiments provide convincing evidence that the strength metrics are valid and can be used to assess the quality of user-selected passwords. This was further validated by the clustering results.
The bottom line of our analysis comes back to the simple advice to users: select a long password with various kinds of characters (lowercase and uppercase letters, digits, and symbols), since the length of the password and the size of the charset are the two most critical parameters for password strength in all the metrics we studied.
References
1. Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million
passwords. In: 2012 IEEE Symposium on Security and Privacy, pp. 538–552. IEEE (2012)
2. Bonneau, J., Herley, C., van Oorschot, P.C., Stajano, F.: Passwords and the evolution of
imperfect authentication. Commun. ACM 58(7), 78–87 (2015)
3. Burr, W.E., Dodson, D.F., Newton, E.M., Perlner, R.A., Polk, W.T., Gupta, S., Nabbus, E.A.:
Draft NIST special publication 800–63-2: electronic authentication guideline. US
Department of Commerce, National Institute of Standards and Technology (2013)
4. de Carné de Carnavalet, X., Mannan, M.: From very weak to very strong: analyzing password-strength meters. In: Network and Distributed System Security Symposium (NDSS 2014). Internet Society (2014)
5. Defuse Security: Crackstation: Free password hash cracker. https://crackstation.net/
6. Dell’Amico, M., Michiardi, P., Roudier, Y.: Password strength: an empirical analysis. In: 29th IEEE International Conference on Computer Communications, pp. 983–991 (2010)
7. Florencio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th International Conference on World Wide Web, pp. 657–666. ACM (2007)
8. Florêncio, D., Herley, C., Coskun, B.: Do strong web passwords accomplish anything? In: 2nd USENIX Workshop on Hot Topics in Security (2007)
9. GitHub: 10k most common passwords. https://github.com/danielmiessler/SecLists/blob/master/Passwords/10k_most_common.txt
10. Han, J., Pei, J., Kamber, M.: Data mining: concepts and techniques, 3rd edn. Elsevier (2012)
11. HashC: Hash killer. https://www.hashkiller.co.uk/
12. Kelley, P.G., Komanduri, S., Mazurek, M.L., Shay, R., Vidas, T., Bauer, L., Christin, N.,
Cranor, L.F., Lopez, J.: Guess again (and again and again): measuring password strength by
simulating password-cracking algorithms. In: 2012 IEEE Symposium on Security and
Privacy (SP), pp. 523–537. IEEE (2012)
13. Ma, W., Campbell, J., Tran, D., Kleeman, D.: A conceptual framework for assessing password quality. Int. J. Comput. Sci. Netw. Secur. 7(1), 179–185 (2007)
14. Ma, W., Campbell, J., Tran, D., Kleeman, D.: Password entropy and password quality. In: 4th International Conference on Network and System Security (NSS), pp. 583–587. IEEE (2010)
15. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using time-space tradeoff. In: Proceedings of the 12th ACM Conference on Computer and Communications Security, pp. 364–372. ACM (2005)
16. Oechslin, P.: Making a faster cryptanalytic time-memory trade-off. In: Annual International
Cryptology Conference, pp. 617–630. Springer (2003)
17. Openwall: John the Ripper password cracker. http://www.openwall.com/john/
18. Reshef, D.N., Reshef, Y.A., Finucane, H.K., Grossman, S.R., McVean, G., Turnbaugh, P.J., Lander, E.S., Mitzenmacher, M., Sabeti, P.C.: Detecting novel associations in large data sets. Science 334(6062), 1518–1524 (2011)
19. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423
(1948)
20. Shannon, C.E.: Prediction and entropy of printed English. Bell Labs Tech. J. 30(1), 50–64
(1951)
21. Weir, C.M.: Using probabilistic techniques to aid in password cracking attacks. Ph.D. thesis, Florida State University (2010)
22. Weir, M., Aggarwal, S., Collins, M., Stern, H.: Testing metrics for password creation policies by attacking large sets of revealed passwords. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 162–175. ACM (2010)
23. Weir, M., Aggarwal, S., De Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: 30th IEEE Symposium on Security and Privacy, pp. 391–405. IEEE (2009)
24. Wheeler, D.L.: zxcvbn: low-budget password strength estimation. In: Proceedings of the 25th USENIX Security Symposium, pp. 157–173 (2016)
25. Yan, J., Blackwell, A., Anderson, R., Grant, A.: Password memorability and security: empirical results. IEEE Secur. Priv. 2(5), 25–31 (2004)
26. Yazdi, S.H.: Probabilistic context-free grammar based password cracking: attack, defense and applications. Ph.D. thesis, Florida State University (2015)