Sunday, June 1, 2008

Al-Kindi, Frequency Analysis and Scrabulous

Ever wonder why you get stuck with too many "I" tiles while playing Scrabulous, but never have enough "H" or "S" tiles?

Ever have the feeling maybe the letters aren't distributed optimally? As it turns out, you're right.

How do we know this? Through frequency analysis, the study of how often certain letters or groups of letters recur within encrypted messages. The technique was first described in the ninth century by the great Arab philosopher Abu Yusuf Ya'qub ibn Is-haq ibn as-Sabbah ibn 'Omran ibn Ismail al-Kindi.

Al-Kindi was the first to note that encrypted messages could be cracked by using "cribs", i.e. by looking for repeated groups of letters or words, such as the Arabic "al", roughly equivalent to the English "the". According to Simon Singh, he even wrote a book on the subject (one of 290 such contributions to science), entitled "A Manuscript on Deciphering Cryptographic Messages".

"One way to solve an encrypted message, if we know its language, is to find a different plaintext of [that] language... and then count the occurrences of each letter... then we look at the ciphertext and classify its symbols. We find the most occurring symbol and change it to the form of the [most occurring] letter of the plaintext sample... and so on, until we account for all symbols of the cryptogram we want to solve."

Yes, dear reader, this was written over a thousand years ago, most probably at the "House of Wisdom" in Baghdad, where Al-Kindi spent most of his life before his death in about 873. Al-Kindi's original manuscript can still be found in the Sulaimaniyyah Archive in Istanbul.
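For the curious, here is a minimal Python sketch of the recipe in the quote: rank the letters of a sample plaintext by frequency, rank the symbols of the ciphertext the same way, and swap each symbol for the plaintext letter of the same rank. The function names `rank_letters`, `kindi_decrypt` and `caesar` are made up for illustration, and the sketch assumes a simple one-for-one letter substitution cipher (the Caesar shift is only there to produce a test message). With short texts expect a partly garbled result that still needs correcting by hand; the longer the sample and the message, the better the guesses.

```python
from collections import Counter
import string

LETTERS = string.ascii_uppercase


def rank_letters(text: str) -> list[str]:
    """Return the letters in `text`, most frequent first (ties broken alphabetically)."""
    counts = Counter(ch for ch in text.upper() if ch in LETTERS)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def kindi_decrypt(ciphertext: str, sample_plaintext: str) -> str:
    """Map each cipher letter to the sample letter with the same frequency rank."""
    mapping = dict(zip(rank_letters(ciphertext), rank_letters(sample_plaintext)))
    # Non-letters pass through unchanged; cipher letters with no counterpart
    # (because the sample has fewer distinct letters) become '?'.
    return "".join(
        mapping.get(ch, "?") if ch in LETTERS else ch
        for ch in ciphertext.upper()
    )


def caesar(text: str, shift: int) -> str:
    """Toy substitution cipher, used here only to produce a test ciphertext."""
    return "".join(
        chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) if ch in LETTERS else ch
        for ch in text.upper()
    )


if __name__ == "__main__":
    # Any long stretch of ordinary English will do as the sample.
    sample = ("It was the best of times, it was the worst of times, "
              "it was the age of wisdom, it was the age of foolishness")
    secret = caesar("Meet me at the house of wisdom at noon", 3)
    print(kindi_decrypt(secret, sample))
```

Ties are broken alphabetically only so the output is repeatable; a human codebreaker would instead try the tied candidates and keep whichever produces real words.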

The use of frequency analysis by British codebreakers at Bletchley Park helped Britain win the Second World War. Turing and others, looking for ways to break the German codes, reasoned that early-morning messages from naval vessels would contain weather reports.

By using the German words for weather ("Wetter") and time as "cribs" (and drawing on other knowledge, such as the fact that in German the letter "E" makes up roughly one letter in six), and by using electromechanical machines called "bombes" to automate the search, they were able to determine the settings used by the Enigma machines, often early in the day. That breakthrough is widely credited with saving millions of lives and changing the course of history.

Anyway, back to Scrabulous and those missing tiles...

The original Scrabble game called for 100 tiles, and for the most part their distribution follows the general distribution of letters in the English language. However, if we compare it against the Beker-Piper frequency table, we quickly find that things are not "as they should be".

Based on the analysis of English letter frequencies by Beker and Piper, authors of "Cipher Systems: The Protection of Communications", there should be 4 more "H" tiles, 4 more "T" tiles, at least 3 more "S" tiles, and 2 fewer "I" tiles, even accounting for the blanks.
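If you want to check the arithmetic yourself, here is a rough Python sketch: scale each letter's share of English text to the 98 lettered tiles (the 2 blanks are left out) and subtract the number of tiles the game actually provides. The probabilities below are the approximate Beker-Piper figures as they are commonly reproduced, and the tile counts are the standard English Scrabble set, assumed to match Scrabulous; both are assumptions worth double-checking, and `tile_gaps` is a made-up name. With these rounded figures the sketch agrees on "H" (+4) and "I" (-2) but finds somewhat smaller gaps for "T" and "S", so the exact numbers depend on which table and rounding you use.

```python
# Approximate probabilities of English letters, as commonly reproduced from
# Beker and Piper's table (assumption: verify against "Cipher Systems").
BEKER_PIPER = {
    "A": 0.082, "B": 0.015, "C": 0.028, "D": 0.043, "E": 0.127, "F": 0.022,
    "G": 0.020, "H": 0.061, "I": 0.070, "J": 0.002, "K": 0.008, "L": 0.040,
    "M": 0.024, "N": 0.067, "O": 0.075, "P": 0.019, "Q": 0.001, "R": 0.060,
    "S": 0.063, "T": 0.091, "U": 0.028, "V": 0.010, "W": 0.023, "X": 0.001,
    "Y": 0.020, "Z": 0.001,
}

# Letter counts of the standard English Scrabble set (assumed to match
# Scrabulous): 98 lettered tiles plus 2 blanks, which are left out here.
SCRABBLE_TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
}


def tile_gaps(freqs: dict[str, float], tiles: dict[str, int]) -> dict[str, int]:
    """Expected minus actual tile count per letter (positive = too few tiles)."""
    pool = sum(tiles.values())  # 98 lettered tiles
    # Rounding each letter separately means the expected counts need not
    # add up to exactly `pool`.
    return {letter: round(p * pool) - tiles[letter] for letter, p in freqs.items()}


if __name__ == "__main__":
    gaps = tile_gaps(BEKER_PIPER, SCRABBLE_TILES)
    # Largest mismatches first; letters that already match are skipped.
    for letter, gap in sorted(gaps.items(), key=lambda kv: (-abs(kv[1]), kv[0])):
        if gap:
            print(f"{letter}: {gap:+d}")
```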

So the next time you're stuck for a chat subject on Scrabulous, you can say "I was reading about this ninth-century Arab philosopher the other day, and as it turns out..."
