Polyalphabetic substitution ciphers

Vigenere cipher.


In multiple-alphabet ciphers, the substitute letter is taken from different alphabets, depending on the location of the current letter in the plaintext. The rule for selecting the alphabet is the key.

While we are looking for better strength of the cryptosystem compared with the monoalphabetic ones, we are also interested in keeping it as simple as possible. The most commonly known cryptosystem in this class is the Vigenere cipher.

For the explanation of this method we use numbers 1..26 standing for letters, and neglect space and punctuation characters:

a
b
c
d
e
f
g
h
i
j
k
l
m
01
02
03
04
05
06
07
08
09
10
11
12
13
n
o
p
q
r
s
t
u
v
w
x
y
z
14
15
16
17
18
19
20
21
22
23
24
25
26

For given plaintext letter pi, the substitution value ci is calculated, as follows:

Ci = (pi + ki)mod26,

where ki is the value of the letter found in the i-th position of the key sequence.

For example, let the plaintext be the phrase, "A quick brown fox jumps " and the key sequence is "my key sequence". Below the key sequence (shown in red) is repeated an appropriate number of times, and is written underneath the plaintext with their numerical values:

A
Q
U
I
C
K
B
R
O
W
N
F
O
X
J
U
M
P
S
01
17
21
09
03
11
02
18
15
23
14
06
15
24
10
21
13
16
19
m
y
k
e
y
s
e
q
u
e
n
c
e
m
y
k
e
y
s
13
25
11
05
25
19
05
17
21
05
14
03
05
13
25
11
05
25
19

--------------------------------------------------------------------------------

N
P
F
N
B
D
G
I
J
B
B
I
T
K
I
F
R
O
L
14
16
06
14
02
04
07
09
10
02
02
09
20
11
09
06
18
15
12

The blue sequence below the horizontal line is the ciphertext. For instance, the last letter in the word "quick" was substituted by adding the numerical codes of letter "K" (11) and "s" (19) and subtracting 26 from the sum 30, since it is greater then 26. The result is 04, i.e. "D".

For decrypting the ciphertext the following algorithm is used:

pi = (Ci - ki)mod26.

For example, from the last letter "D" in the second word, we get 04-19=-15. By performing mod26, we add 26 to this sum and eventually get the result 11, i.e. "K".

One can think of this system as based on the set of 26 different Caesar ciphers, with shifts of 0 through 25, that apply to the plaintext characters in different locations. The key sequence is used for determining the shift for the individual plaintext letter. For example, for all letters "s" in the key sequence the shift for the plaintext is 19. This view of the Vigenere cipher allows us to classify it as a polyalphabetic substitution cryptosystem. It was exactly in this form the original Vigenere cipher was first published in 1586.

Since the keyspace size for Caesar cipher is 26, with the key containing m characters, we get total number of possible keys for the Vigenere cipher 26^m. For example, if we take m=5, then the keyspace has size exceeding 1.1×E107. This is already large enough to preclude an exhaustive key search by hand (but not by computer).

It is clear that, due to much larger keyspace, this cipher is considerably stronger than the monoalphabetic substitution. Generally, the longer the key, the stronger the cipher, but its usage is getting less convenient. With a too short key, plaintext regularities can be revealed more easily. Ideally, the key sequence should be used only once, and its length should be the same as that of the plaintext. One possible way to reduce some weak points of short keys is concatenating the plaintext to the key. Therefore, if we using the key "good cipher", the key sequence in the above example will be "good cipher A quick brow".

 

Any secret key distribution mechanism is subject to the risk of seizure of the key by the adversary.
Because of its strength and convenience, some polyalphabetic ciphers were used rather extensively by the Union Army during the Civil War. In the 20th century, until the advent of computers, spies used widely available texts from regular books as the source of the key sequences (so-called the Running-Key, or Book cipher). That simplified the secret key distribution problem considerably, because the only information the secret agents and their controllers should exchange was the convention on the correspondence between the page numbers and the dates of the message transmissions. Members of a Soviet spy ring led by German journalist Richard Zorge and caught in Japan in 1942 were using calendars for that purpose. Since some of the arrested suspects kept the same type of calendar in their homes, and were making some notes on them, the Japanese counterintelligence succeeded in breaking the cipher. This is a typical example of the key distribution channel failure.

The weakness of this type of cipher is that it still reveals the digram frequencies, if both the plaintext and the key use the same language known to the cryptanalyst. It is also vulnerable to known plaintext attack and to repeated pattern analysis, especially if the key is being reused.




You can view the source code here