Check sums and error detection
The previous post looked at Crockford's base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0.
Crockford recommends the following check sum procedure, a simple error detection code:
The check symbol encodes the number modulo 37, 37 being the least prime number greater than 32.
That is, we take the remainder when the base 32 number is divided by 37 and append the result to the original encoded number. The remainder could be larger than 31, so we need to expand our alphabet of symbols. Crockford recommends using the symbols *, ~, $, =, and U to represent remainders of 32, 33, 34, 35, and 36.
Crockford says his check sum will "detect wrong-symbol and transposed-symbol errors." We will show that this is the case in the proof below.
Python exampleHere's a little Python code to demonstrate how the checksum works
from base32_crockford import encode, decode s = "H88CMK9BVJ1V" w = "H88CMK9BVJ1W" # wrong last char t = "H88CMK9BVJV1" # transposed last chars def append_checksum(s): return encode(decode(s), checksum=True) print(append_checksum(s)) print(append_checksum(w)) print(append_checksum(t))
This produces the following output.
H88CMK9BVJ1VP H88CMK9BVJ1WQ H88CMK9BVJV1E
The checksum character of the original string is P. When the last character is changed, the checksum changes from P to Q. Similarly, transposing the last two characters changes the checksum from P to E.
The following code illustrates that the check sum can be a non-alphabetic character.
s = "H88CMK9BVJ10" n = decode(s) r = n % 37 print(r) print(encode(n, checksum=True))
This produces
32 H88CMK9BVJ10*
As we said above, a remainder of 32 is represented by *.
ProofIf you change one character in a base 32 number, its remainder by 37 will change as well, and so the check sum changes.
Specifically, if you change the nth digit from the right, counting from 0, by an amount k, then you change the number by a factor of k 2n. Since 0 < k < 32, k is not divisible by 37, nor is 2n. Because 37 is prime, k 2n is not divisible by 37 [1]. The same argument holds if we replace 37 by any larger prime.
Now what about transpositions? If you swap consecutive digits a and b in a number, you also change the remainder by 37 (or any larger prime) and hence the check sum.
Again, let's be specific. Suppose we transpose the nth and n+1st digits from the right, again counting from 0. Denote these digits by a and b respectively. Then swapping these two digits changes the number by an amount
(b 2n+1 + a 2n) - (a 2n+1 + b 2n) = (b - a) 2n
If a a b, then b - a is a number between -31 and 31, but not 0, and so b - a is not divisible by 37. Neither is any power of 2 divisible by 37, so we've changed the remainder by 37, i.e. changed the check sum. And as before, the same argument works for any prime larger than 47.
Related posts[1] A prime p divides a product ab only if it divides a or it divides b. This isn't true for composite numbers. For example, 6 divides 4*9 = 36, but 6 doesn't divide 4 or 9.