This lesson continues the discussion of substitution ciphers, such as vigenère, beaufort, variant-beaufort, and general, showing how they're used, as well as the techniques used to break them.
One approach to reducing the "spikyness" of natural language text is used the Playfair cipher which encrypts more than one letter at once. We now consider the other alternative, using multiple cipher alphabets in turn. This gives the attacker more work, since many alphabets need to be guessed, and because the frequency distribution is more complex, since the same plaintext letter could be replaced by several ciphertext letters, depending on which alphabet is used.
Simple create a set of caesar cipher translation alphabets, then use each in turn, as shown next.
Plaintext THISPROCESSCANALSOBEEXPRESSED Keyword CIPHERCIPHERCIPHERCIPHERCIPHE Plaintext VPXZTIQKTZWTCVPSWFDMTETIGAHLH
In this example have the keyword "CIPHER". Hence have the following translation alphabets:
C -> CDEFGHIJKLMNOPQRSTUVWXYZAB
I -> IJKLMNOPQRSTUVWXYZABCDEFGH
P -> PQRSTUVWXYZABCDEFGHIJKLMNO
H -> HIJKLMNOPQRSTUVWXYZABCDEFG
E -> EFGHIJKLMNOPQRSTUVWXYZABCD
R -> RSTUVWXYZABCDEFGHIJKLMNOPQ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
to map the above plaintext letters.
Hence, map the plaintext message as follows:
'T' uses key 'C' maps to 'V' 'H' uses key 'I' maps to 'P' 'I' ises key 'P' maps to 'X' etc
+--------------------------+ |ABCDEFGHIJKLMNOPQRSTUVWXYZ| +----------------------------------------------------+ |ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ| +----------------------------------------------------+ | | +--------------------------+
The "Saint-Cyr Slide" was popularised and named by Jean Kerckhoffs, who published a famous early text "La Cryptographie Militaire" (Miltary Cryptography) in 1883. He named the slide after the French National Military Academy where the methods were taught. He also noted that any slide can be expanded into a tableau, or bent round into a cipher disk. cf. Kahn Ch 8 pp230-239.
Plaintext Letter
ABCDEFGHIJKLMNOPQRSTUVWXYZ
+-----------------------------+
A | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
B | BCDEFGHIJKLMNOPQRSTUVWXYZA |
C | CDEFGHIJKLMNOPQRSTUVWXYZAB |
D | DEFGHIJKLMNOPQRSTUVWXYZABC |
E | EFGHIJKLMNOPQRSTUVWXYZABCD |
K F | FGHIJKLMNOPQRSTUVWXYZABCDE |
e G | GHIJKLMNOPQRSTUVWXYZABCDEF |
y H | HIJKLMNOPQRSTUVWXYZABCDEFG |
I | IJKLMNOPQRSTUVWXYZABCDEFGH |
L J | JKLMNOPQRSTUVWXYZABCDEFGHI |
e K | KLMNOPQRSTUVWXYZABCDEFGHIJ |
t L | LMNOPQRSTUVWXYZABCDEFGHIJK |
t M | MNOPQRSTUVWXYZABCDEFGHIJKL |
e N | NOPQRSTUVWXYZABCDEFGHIJKLM |
r O | OPQRSTUVWXYZABCDEFGHIJKLMN |
P | PQRSTUVWXYZABCDEFGHIJKLMNO |
Q | QRSTUVWXYZABCDEFGHIJKLMNOP |
R | RSTUVWXYZABCDEFGHIJKLMNOPQ |
S | STUVWXYZABCDEFGHIJKLMNOPQR |
T | TUVWXYZABCDEFGHIJKLMNOPQRS |
U | UVWXYZABCDEFGHIJKLMNOPQRST |
V | VWXYZABCDEFGHIJKLMNOPQRSTU |
W | WXYZABCDEFGHIJKLMNOPQRSTUV |
X | XYZABCDEFGHIJKLMNOPQRSTUVW |
Y | YZABCDEFGHIJKLMNOPQRSTUVWX |
Z | ZABCDEFGHIJKLMNOPQRSTUVWXY |
+-----------------------------+
Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ Cipher: DCBAZYXWVUTSRQPONMLKJIHGFE
Here are using a backwards alphabet. Note that the same function is used for decryption (unlike Vigenère). The effect of the cipher is to reverse the order of the plaintext letters, and then right shift them.
Vigenère and Variant-Beauford Ciphers are mutual inverses. The translation used to encrypt one, decrypts the other. Also always have equivalent pairs of keys, since shifting left i places MUST be the same as shifting right by 26-i places. This also means that when analysing, have no way of distinguishing a Vigenère from a Variant-Beauford cipher, except that possibly with one the key is a valid word, otherwise is random letters.
d possible alphabets, then:
N = H(K)/D = log2 26d/3.2 = d log2 26/3.2 = 1.5d
27.6d
Moving to polyalphabetic ciphers does improve the security, but not dramatically. Each additional alphabet adds the same amount of work to break, but doesn't dramatically improve the security.
key: DECEPTIVEWEAREDISCOVEREDSAV plaintext: WEAREDISCOVEREDSAVEYOURSELF ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
See that the key used is the keyword "DECEPTIVE" prefixed to as much of the message "WEAREDISCOVEREDSAV" as is needed. When deciphering, recover the first 9 letters using the keyword "DECEPTIVE". Then instead of repeating the keyword, start using the recovered letters from the message "WEAREDISC". As recover more letters, have more of key to recover later letters.
(0.1275)2 = 0.01663, about twice as often
as a 'T' encrypted with a key of 'T'
Plaintext: TOBEORNOTTOBE Key: NOWNOWNOWNOWN Ciphertext: GCXRCNACPGCXR
As we saw previously, it was this discovery by Babbage & Kasiski which broke the previously "unbreakable" cipher. Like most advances in cryptology, it came from looking at the problem in a different and "new" way - in this case, realising that repeated runs of plaintext letters, that "happened" to line up with the same keyword letters, would generate the same run of ciphertext letters. This was the chink in the armour needed.
The Index of Coincidence (IC) is a critical statistic used in analysing classical polyalphabetic substitution ciphers. Note that William F Friedman is one of the great cryptographers of the early 20C, and wrote several of the seminal works (classified until recently) on military cryptanalysis, now published by Aegean Park Press - look him up as an author in the ADFA library catalog. The critical idea is to get an estimate of the number of alphabets used, by looking at how "spiky" or not, the distribution of letters in the ciphertext is, since we know that the more alphabets used, the flatter it gets. The development of the IC formula comes from first developing the idea of a "Measure of Roughness", and then equating that formula to the observed frequencies in the ciphertext letters, as shown next.
i='z'
MR = SUM ( pi - 1/26)2
i='a'
pi is the probability that any
randomly chosen letter is the ith letter
pi's summing to 1
(must be some letter in the alphabet)
i='z' SUM pi = 1 i='a'
The full derivation of the MR and IC is given in Sinkov Ch 3 pp63-70, and a briefer version in Seberry 1/e Ch3 pp71-74.
i='z'
MR = SUM pi2 - 0.038
i='a'
or
i='z'
MR + 0.038 = SUM pi2
i='a'
pi2
pa2 is the probability that two arbitrarily selected letters are both 'a'
pb2 is probability both are 'b'
[N] | | = 1/2 N (N-1) [2]
Fi is the frequency of the ithletter in the ciphertext
Fi(Fi-1) ------- 2
N letters in the ciphertext, then
i='z' SUM Fi = N i='a'
i='z' Fi(Fi-1)
IC = SUM -------
i='a' N(N-1)
MR + 0.038 = IC
Now have a way to approximate the MR using the IC, which calculated from the observed ciphertext letter frequencies, and compare to the expected values for any given language. The values here are for English, see Seberry for values for other languages.
d the expected value of the IC is
1 N-d [d-1] N
Expected(IC) = - --- (0.066) + | | --- (0.038)
d N-1 [ d ] N-1
| Period (No of Alphabets) | Expected IC |
|---|---|
| 1 | 0.0660 |
| 2 | 0.0520 |
| 3 | 0.0473 |
| 4 | 0.0450 |
| 5 | 0.0436 |
| 6 | 0.0427 |
| 7 | 0.0420 |
| 8 | 0.0415 |
| 9 | 0.0411 |
| 10 | 0.0408 |
| 11 | 0.0405 |
| 12 | 0.0403 |
| 13 | 0.0402 |
| 14 | 0.0400 |
| 15 | 0.0399 |
| ... | ... |
| Infinite | 0.0380 |
See Seberry 1/e Fig3.4 p74 for example program code to compute these values. Again, this is for English, but by substituting suitable values for number of letters (N) and range of IC values can get equivalent table for other languages. Given this table, you first compute the IC for some sample ciphertext, and then lookup that value in the table to get an estimate of the number of alphabets that would have been used to get such an IC. Note that this is a statistical measure - so the estimate may well be out. But it provides an idea of the range it might be.
this terminal is no more it has ceased to be its expired and gone to meet its maker this is a late terminal its a stiff bereft of life it rests in peace if you hadnt nailed it to the bench it would be pushing up the daisies this is an xterminal