Cracking pass codes with De Bruijn sequences
Suppose you have a keypad that will unlock a door as soon as it sees a specified sequence of four digits. There's no "enter" key to mark the end of a four-digit sequence, so the four digits could come at any time, though they have to be sequential. So, for example, if the pass code is 9235, if you started entering 1139235" the lock would open as soon as you enter the 5.
How long would it take to attack such a lock by brute force? There are 104 possible 4-digit codes, so you could enter
000000010002"99989999
until the lock opens, but there's a more efficient way. It's still brute force, but not quite as brute.
The sequence above could be make more efficient. For example, it tests for codes 0000 and 0001 by entering both full codes. But entering 00001 would test for both. This takes advantage of the fact that the code could start anywhere, not just at every fourth position.
De Bruijn sequencesThe optimal solution to this problem uses a De Bruijn sequence B(k, n), the shortest sequence of symbols from an alphabet of size k that contains every possible subsequence of length n. In the example above, n = 4 and k = 10. A sequence in B(k, n) has length kn. Using a De Bruijn sequence, you could break the lock above by entering at most 10,000 digits rather than 40,000 as in the most naive approach. (That's almost true. We've left out a detail that we'll get to shortly.)
Update: See this post on generating De Bruijn sequences.
Binary exampleBefore we go any further, let's look at a couple examples of De Bruijn sequences. First, let's use k = 2 and n = 3. Then
00010111
is a B(2,3) sequence. You can see that this contains 000, 001, 010, and 011. But what about 100? Here's the detail we left out: In order to contain every possible subsequence, you have to consider the De Bruijn sequence as a cycle. So if we use the last bit and then start over with the first two bits, now we have 100. We also have to wrap around to find 110.
DNA exampleNext, here's an example with k = 4 and n = 3. The following is an example of a De Bruijn sequence for all triples of DNA base pairs.
AAACAAGAATACCACGACTAGCAGGAGTATCATGATTCCCGCCTCGGCGTCTGCTTGGGTGTTT
You could specify any triple by giving its starting location in the sequence: AAA starts in position 1, AAC in position 2, AAG in position 4, etc. Note that TTA starts in position 63 and wraps around. TAA starts in position 64 and also wraps around.
De Bruijn sequences are as short as possible for the problem they solve. For example, the sequence above has 64 characters. Since each of the 64 possible triples corresponds to a starting location, the sequence could not have any fewer than 64 characters.
Brute force with and without an enter keyThe De Bruijn sequence has kn symbols, but to attack a pass code of length n on an alphabet of k symbols we might have to enter kn + n - 1 symbols because of wrapping the sequence back to the beginning. Using De Bruijn sequences cuts the amount of work necessary to perform a brute force attack by about n; it would be exactly n if it weren't for accounting for the need to possibly wrap around and reuse the first few symbols.
In our keypad example, an attack on 4-digit pass codes might need to enter as many as 10,003 digits. If someone added an "enter" key to the keypad and required the user to enter exactly four digits at a time, this would increase the effort of a brute force attack by a factor of approximately 4, from 10,003 keys to 40,000 keys.
But this requires the user to enter five keys rather than four, the last one being the enter key. If the designer increased the pass code length to five digits that could occur at any time, then a brute force attack using De Bruijn sequences would require up to 100,004 keys. Increasing the pass code length by one increases the difficulty of a brute force attack more than requiring a terminator key would.
This is true in general when the alphabet size k is larger than the pass code length n. Suppose you have an alphabet of size k and are using pass codes of length n with no terminator. Requiring a terminator multiplies the difficulty of a brute force attack by n, but requiring one more character in pass codes multiplies the difficulty by k [1].
Increasing the alphabet size also increases security. So instead of using #, for example, as an enter key, a designer could use it as a possible pass code symbol. It could still appear at the end of a pass code, but it could also appear anywhere else, increasing the number of possible pass codes.
Related posts[1] To be precise, a brute force attack on keys of length n using De Bruijn sequences takes kn + n - 1 symbols. Adding a terminator changes the brute force difficulty to n kn. Requiring one more symbol in a pass code instead changes it to kn+1 + n.
So roughly n kn versus k kn. If k > n, multiplying by k makes a bigger increase than multiplying by n.