Simple substitution ciphers over a gargantuan alphabet
Simple substitution ciphers replace one letter with another. Maybe A goes to W, B goes to G, C goes to A, etc.
These ciphers are famously easy to break, so easy that they're common in puzzle books. Here's one I made [1] for this post in case you'd like to try it.
X RF SXIIXKW XK IYZ UXINYZK HT IYZ CXIICZ YHJSZ RI FZGTXZCG, HJQ SZNHKG TRQF BYXNY XS NJI HTT EV IYZ QXGWZ RKG R MJRQIZQ-FXCZ RNQHSS IYZ TXZCGS TQHF HJQ YHFZ LCRNZ, BYZQZ VHJ RQZ. X RF BQXIXKW R EHHU. XK XI X RF SLZRUXKW IH VHJ. EJI X RF RCSH SLZRUXKW IH IYZ BHQCG. IH EHIY X HBZ RK RNNHJKIXKW.
As is common in puzzle books, I kept the spaces and punctuation.
When you learn that simple substitution is breakable, you might reasonably think that the problem is the small alphabet size. What if you replaced pairs of letters with pairs of letters, effectively working over an alphabet of size 26^2 = 676. That's an improvement, but it's still not secure. It could be broken manually in a few hours, depending on the length of the text, and of course could be broken quickly using a computer.
If we want a cipher to be secure against computer-aided cryptanalysis, we're going to need a much bigger alphabet.
The Roman alphabet has 26 letters, which can be expressed in 5 bits. Pairs of Roman letters would require 10 bits. What if we used a 32-bit alphabet, substituting 32-bit sequences with other 32-bit sequences? This is working over an alphabet of over 4 billion symbols. Surely that's secure? Nope.
What if we use blocks of 128 bits? This is working over an alphabet of size
2128 = 340,282,366,920,938,463,463,374,607,431,768,211,456.
Nope. Still not good enough. Because you can see the penguin.

The image above is a famous example of a downfall of simple substitution, albeit over a gargantuan alphabet. The image was created by taking a graphic of the Linux mascot and encrypting the bits using 128-bit encryption. Each block of 128 bits goes to a unique, essentially random replacement. Each block is well encrypted. But there are repetitive blocks in the original that become repetitive blocks in the encrypted version.
The AES (Rijndael) encryption algorithm is a good algorithm, but in the example above it was used poorly. It was used in electronic code book mode (ECB), something that nobody would do in practice.
In practice, you might do something like cipher block chaining where you XOR each block with the encrypted version of the previous block. You could think of this as a clever way of using a simple substitution over an enormous alphabet. You look up the substitution of each block, but then XOR its bits with the previously encrypted block. Now repetitive input does not produce repetitive output. You cannot see the penguin. The penguin image becomes random-looking static.
Related posts[1] I produced the cryptogram using
cat myfile | tr [a-z] [A-Z] | tr [A-Z] ...
where ..." is a permutation of the 26 upper case letters.
The post Simple substitution ciphers over a gargantuan alphabet first appeared on John D. Cook.