Base 64 encoding remainder problem
I've mentioned base 64 encoding a few times here, but I've left out a detail. This post fills in that detail.
Base 64 encoding comes up in many contexts where you want to represent binary data in text form. I've mentioned base 64 encoding in the context of GnuPG's ASCII armor. A more common application is MIME (Multipurpose Internet Mail Extensions) encoding.
Base 64 encoding uses 64 characters (A...Z, a...z, 0...9, +, and /) to represent six bits at a time, since 2^6 = 64.
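A minimal Python sketch of the lookup step: the alphabet is built from the character ranges listed above, and each six-bit group is read as an integer index into it. The helper name `sextet_to_char` is hypothetical, chosen just for illustration.

```python
import string

# The standard base 64 alphabet: A-Z, a-z, 0-9, then + and /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
assert len(ALPHABET) == 64

def sextet_to_char(bits: str) -> str:
    """Map a string of six '0'/'1' characters to its base 64 character."""
    if len(bits) != 6:
        raise ValueError("expected exactly six bits")
    return ALPHABET[int(bits, 2)]

print(sextet_to_char("000000"))  # A
print(sextet_to_char("000101"))  # F
print(sextet_to_char("111111"))  # /
```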
In the previous post I showed how ASCII armor encoded a 91,272-byte JPEG image into a text file and how it could convert the text back into binary. The number of bytes in the file was a multiple of 3, which you could quickly confirm by casting out nines.
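Casting out nines works because a number leaves the same remainder mod 9 as the sum of its digits, and since 3 divides 9, that digit sum also gives the remainder mod 3. A quick sketch of the check for 91,272; `digital_root` is a hypothetical helper name.

```python
def digital_root(n: int) -> int:
    """Repeatedly sum decimal digits until one digit remains (casting out nines)."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

size = 91272
root = digital_root(size)   # 9+1+2+7+2 = 21, then 2+1 = 3
print(root)                 # 3
print(root % 3 == 0)        # True, so 91,272 is a multiple of 3
print(size // 3)            # 30424
```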
If the number of bytes in a file is not a multiple of three, the number of bits is not a multiple of six, and so we have to do something with the remainder.
For an example, let's start with a file containing the bits
000000 000001 000010 000011 000100 000101 000110 000111
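These 48 bits make up six bytes. One way to create such a file for experimenting is to pack the bit string into bytes, as in this minimal Python sketch; writing `data` out with `open(path, "wb")` would give the input file for gpg.

```python
bits = "000000 000001 000010 000011 000100 000101 000110 000111".replace(" ", "")
assert len(bits) == 48  # eight six-bit groups = six bytes

# Interpret the bit string as a big-endian integer and pack it into bytes
data = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(data.hex(" "))  # 00 10 83 10 51 87
```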
If we run gpg --enarmor on this file, we get
ABCDEFGH=/u99
and some extra text for human consumption. The base 64 encoding is ABCDEFGH, the equal sign is a separator, and /u99 is a checksum.
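Python's standard `base64` module produces the same encoding, without the armor's header text and checksum. In the OpenPGP armor format the checksum is a 24-bit CRC of the data; this sketch doesn't attempt to reproduce it.

```python
import base64

data = bytes([0x00, 0x10, 0x83, 0x10, 0x51, 0x87])  # the 48 bits above

encoded = base64.b64encode(data)
print(encoded)  # b'ABCDEFGH'

# Decoding recovers the original bytes
assert base64.b64decode(encoded) == data
```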
If we delete the last 8 bits from our file and run ASCII armor again, we get
ABCDEFE==/IPL
The second equal sign separates the base 64 encoding from the checksum, but the first equal sign is something new.
If we chop eight more bits off the end of the file we get
ABCDEA===izh9
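The same padding shows up with the standard library if we drop one byte, then two, from the six-byte example. In this sketch the `=` characters are part of the base 64 output; the extra `=` in front of gpg's checksum comes from the armor format and doesn't appear here.

```python
import base64

data = bytes([0x00, 0x10, 0x83, 0x10, 0x51, 0x87])

# Drop 0, 8, and 16 bits from the end of the file
for n in (6, 5, 4):
    print(n, base64.b64encode(data[:n]).decode())
# 6 ABCDEFGH
# 5 ABCDEFE=
# 4 ABCDEA==
```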
What's going on here? In each case, the first 30 bits are being encoded as ABCDE. The remaining bits are
000101 000110 000111
When we cut off the last 8 bits we were left with
000101 0001
The bits 000101 are encoded as F, and the last four bits are padded to 000100 and encoded as E. The single trailing equal sign signals that two bits were added as padding.
When we cut off 8 more bits we were left with
00
which was padded to 000000 and encoded as A. Then two equal signs were added to signal that four bits were added as padding.
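The two leftover cases can be sketched directly: pad the leftover bits on the right with zeros up to a multiple of six, look up each six-bit group, and emit one `=` per pair of padding bits. The function name `encode_leftover` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_leftover(bits: str) -> str:
    """Encode trailing bits that don't fill a whole number of six-bit groups."""
    pad = (-len(bits)) % 6            # zero bits needed to reach a multiple of 6
    padded = bits + "0" * pad
    chars = "".join(ALPHABET[int(padded[i:i + 6], 2)]
                    for i in range(0, len(padded), 6))
    return chars + "=" * (pad // 2)   # one '=' per pair of padding bits

print(encode_leftover("000101" "0001"))  # FE=
print(encode_leftover("00"))             # A==
```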
So the rule is to add two or four 0s at the end to make the number of bits a multiple of six, then add an equal sign for each pair of bits added.
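Here is a from-scratch sketch of that rule, checked against Python's `base64.b64encode` on random inputs. It works on a string of bits for clarity rather than speed, and `encode_by_rule` is a hypothetical name, not a library function.

```python
import base64
import os
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_by_rule(data: bytes) -> str:
    """Base 64 encode by padding the bit string with 0s to a multiple of six,
    then appending one '=' for each pair of padding bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    pad = (-len(bits)) % 6          # 0, 2, or 4 for whole bytes
    bits += "0" * pad
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    return "".join(chars) + "=" * (pad // 2)

# Compare against the standard library on random inputs of many lengths
for n in range(50):
    sample = os.urandom(n)
    assert encode_by_rule(sample) == base64.b64encode(sample).decode()

print(encode_by_rule(bytes([0x00, 0x10, 0x83, 0x10])))  # ABCDEA==
```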
Since file sizes are whole numbers of bytes, and a byte is 8 bits, the number of bits in a file is always even. This means the remainder when the number of bits is divided by 6 is 0, 2, or 4. A remainder of 2 calls for four zero bits of padding and a remainder of 4 calls for two, so we never add an odd number of bits.
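The arithmetic can be checked in a few lines. A file of n bytes has 8n bits, and since 8n = 6n + 2n, the remainder mod 6 is the remainder of 2n, which cycles through 2, 4, 0 as n goes up. The helper `padding_bits` is a hypothetical name for this sketch.

```python
def padding_bits(n_bytes: int) -> int:
    """Zero bits needed to bring 8 * n_bytes up to a multiple of 6."""
    r = (8 * n_bytes) % 6    # always 0, 2, or 4
    return (6 - r) % 6

for n in range(1, 7):
    print(n, (8 * n) % 6, padding_bits(n))
# 1 2 4
# 2 4 2
# 3 0 0
# 4 2 4
# 5 4 2
# 6 0 0
```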
The post Base 64 encoding remainder problem first appeared on John D. Cook.