Exploring bad passwords
If your password is in the file rockyou.txt then it's a bad password. Password cracking software will find it instantly. (Use long, randomly generated passwords; staying off the list of worst passwords is necessary but not sufficient for security.)
The rockyou.txt file currently contains 14,344,394 bad passwords. I poked around in the file and this post reports some things I found.
To make things more interesting, I made myself a rule that I could only use command line utilities.
Pure numeric passwords
I was curious how many of these passwords consisted only of digits, so I ran the following.
grep -P '^\d+$' rockyou.txt | wc -l
This says 2,346,744 of the passwords only contain digits, about 1 in 6.
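The -P flag is a GNU grep extension, so the command above may not work with other grep implementations. Below is a minimal sketch of the same count using a POSIX extended regular expression. It runs against a small made-up sample file rather than rockyou.txt; the file contents and the variable names sample, total, and numeric are assumptions for illustration only.

```shell
# Hypothetical six-line sample standing in for rockyou.txt
sample=$(mktemp)
printf '%s\n' 123456 password 12345 abc123 000 iloveyou > "$sample"

# grep -c '' counts every line; the second grep counts all-digit lines
total=$(grep -c '' "$sample")
numeric=$(grep -c -E '^[0-9]+$' "$sample")

echo "$numeric of $total passwords are all digits"
```

Pointing both grep commands at rockyou.txt instead of the sample should reproduce the count reported above, assuming the file is the same version.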
Digit distribution
I made a file of the digits appearing in the passwords
grep -o -P '\d' rockyou.txt > digits
and looked at the frequency of digits.
for i in 0 1 2 3 4 5 6 7 8 9
do
    grep -c $i digits
done
This is what I got:
5740291
6734380
5237479
3767584
3391342
3355180
3118364
3100596
3567258
3855490
The digits are distributed more evenly than I would have expected. 1's are more common than other digits, but only about twice as common as the least common digits.
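The same tally can be done in a single pass with sort and uniq instead of ten separate grep runs. This is a sketch of an alternative, not the approach used above. The sample file and the variable names sample and counts are assumptions for illustration.

```shell
# Hypothetical sample standing in for rockyou.txt
sample=$(mktemp)
printf '%s\n' 123456 password1 abc123 2010 > "$sample"

# Put each digit on its own line, then count how often each one occurs.
# The final awk swaps the columns so each line reads "digit count".
counts=$(grep -o '[0-9]' "$sample" | sort | uniq -c | awk '{ print $2, $1 }')

echo "$counts"
```

A digit that never occurs is simply absent from the output rather than listed with a count of zero.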
Longest bad passwords
How long is the longest bad password? The command
wc -L rockyou.txt
shows that one line in the file is 285 characters long. What is this password? The command
grep -P '.{285}' rockyou.txt
shows that it's some HTML code. Nice try, whoever thought of that, but you've been pwned.
A similar search for all-digit passwords shows that the longest numeric passwords are 255 digits long. One of these is a string of 255 zeros.
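The -L option of wc is a GNU extension. Where it is unavailable, awk can find the longest line. Below is a minimal sketch on a made-up sample; the file contents and the names sample, longest, and longest_digits are assumptions. Note that awk's length may count bytes or characters depending on the implementation and locale, so results on a file with non-ASCII content could differ from wc -L.

```shell
# Hypothetical sample standing in for rockyou.txt
sample=$(mktemp)
printf '%s\n' abcdefghij 1234567 hello 00000 > "$sample"

# Length of the longest line in the file
longest=$(awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$sample")

# Longest line consisting only of digits (first one found if there are ties)
longest_digits=$(grep -E '^[0-9]+$' "$sample" |
    awk 'length($0) > max { max = length($0); line = $0 } END { print line }')

echo "longest line: $longest characters"
echo "longest all-digit line: $longest_digits"
```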
Dictionary words
A common bit of advice is to not choose passwords that can be found in a dictionary. That's good advice as far as it goes, but it doesn't go very far.
I used the comm utility to see how many bad passwords are not in the dictionary by running
comm -23 sorted dict | wc -l
and the answer was 14,310,684. Nearly all the bad passwords are not in a dictionary!
(Here sorted is a sorted version of the rockyou.txt file; I believe the file is initially sorted by popularity, worst passwords first. The comm utility complained that my system dictionary isn't sorted, which I found odd, but I sorted it to make comm happy and dict is the sorted file.)
Curiously, the command
comm -13 sorted dict | wc -l
shows there are 70,624 words in the dictionary (specifically, the american-english file on my Linux box) that are not on the bad password list.
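The comm utility expects both inputs to be sorted in the collation order of the current locale. One possible explanation for the complaint about the system dictionary, offered here only as a guess, is that the file was sorted under a different locale than the one comm was run in. The sketch below avoids the question by forcing the C locale for both sort and comm. The two small word lists are made up, and the use of sort -u to drop duplicate lines is a choice made for this sketch.

```shell
# Use one collation order for both sort and comm
export LC_ALL=C

# Hypothetical stand-ins for rockyou.txt and the system dictionary
work=$(mktemp -d)
printf '%s\n' password 123456 monkey dragon qwerty > "$work/passwords"
printf '%s\n' apple dragon monkey zebra > "$work/words"

sort -u "$work/passwords" > "$work/sorted"
sort -u "$work/words" > "$work/dict"

# -23 keeps lines only in the first file, -13 only in the second,
# and -12 keeps lines common to both
only_pw=$(comm -23 "$work/sorted" "$work/dict" | grep -c '')
only_dict=$(comm -13 "$work/sorted" "$work/dict" | grep -c '')
both=$(comm -12 "$work/sorted" "$work/dict" | grep -c '')

echo "passwords not in dictionary: $only_pw"
echo "dictionary words not in passwords: $only_dict"
echo "in both: $both"
```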
Smallest 'good' numeric password
What is the smallest number not in the list of pure numeric passwords? The following command strips leading zeros from purely numeric passwords, sorts the results as numbers, removes duplicates, and stores the results in a file called nums.
grep -P '^\d+$' rockyou.txt | sed 's/^0\+//' | sort -n | uniq > nums
The file nums begins with a blank line, because a password consisting only of zeros is stripped down to an empty string. I removed this line with sed.
sed -i 1d nums
Next I used awk to print instances where the line number does not match the line in the file nums.
awk '{if (NR-$0 < 0) print $0 }' nums | less
The first number this prints is 61. This means that the first line is 1, the second line is 2, and so on, but the 60th line is 61. That means 60 is missing. The file rockyou.txt does not contain 60. You can verify this: the command
grep '^60$' rockyou.txt
returns nothing. 60 is the smallest number not in the bad password file. There are passwords that contain '60' as a substring, but just 60 as a complete password is not in the file.
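The whole search can also be written as one pipeline that prints the first gap directly. This is a sketch under assumptions, not the exact commands used above: it runs on a made-up sample, writes the sed pattern as 0* rather than the GNU-specific 0\+, and drops empty lines with grep -v instead of editing a file in place. The names sample and missing are hypothetical.

```shell
# Hypothetical sample standing in for rockyou.txt
sample=$(mktemp)
printf '%s\n' 1 2 007 3 abc 5 0 4 4 8 > "$sample"

# Keep all-digit lines, strip leading zeros, drop lines that became empty,
# sort numerically, remove duplicates, then report the first line number
# that differs from the value on that line.
missing=$(grep -E '^[0-9]+$' "$sample" |
    sed 's/^0*//' |
    grep -v '^$' |
    sort -n |
    uniq |
    awk '$0 != NR { print NR; found = 1; exit }
         END { if (!found) print NR + 1 }')

echo "smallest missing number: $missing"
```

In the sample the distinct values are 1, 2, 3, 4, 5, 7, and 8, so the sixth line holds 7 and the pipeline reports 6. If there were no gap at all, the END clause would report one more than the largest value.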
Related posts
- Passwords and power laws
- Salting and stretching a password
- Cracking pass codes with De Bruijn sequences
- Cryptography posts