Software Tools in Haskell: crypt
xor stdin with a list of keys
This page is part of a series on Software Tools in Haskell.
This post is literate Haskell; you can load the source into GHCi and play along.
As usual, we start with some imports.
-- sth-crypt: xor chars on stdin with a list of keys
module Main where
import System.Environment (getArgs)
import System.Exit (exitSuccess)
import Data.Char (ord, chr, readLitChar)
See also the Backslash module.
This program performs very (very!) simple encryption by xor-ing stdin
with a list of keys, supplied by the user at the command line.
At the bit level, a xor b
is 0 if a
and b
are equal and is 1
otherwise. Two lists of bits are xor
ed entrywise, with the shorter list padded with \(0\)s. We can think of xor
as an operation on natural numbers by converting to and from base 2, and finally we can think of xor
as an operation on characters by converting to and from natural numbers (a.k.a. code points). Then to xor
two strings we xor
characterwise.
We will implement these operations bare-handed.
class XOR t where
xor :: t -> t -> t
xors :: [t] -> [t] -> [t]
xors [] ys = ys
xors xs [] = xs
xors (x:xs) (y:ys) = (xor x y) : xors xs ys
data Bit
= Zero | One
deriving (Eq, Show)
instance XOR Bit where
xor Zero Zero = Zero
xor Zero One = One
xor One Zero = One
xor One One = Zero
instance XOR Int where
xor a b = bitsToInt $ xors (intToBits a) (intToBits b)
where
intToBits :: (Integral n) => n -> [Bit]
intToBits k = case getBits k of
[] -> [Zero]
bs -> bs
where
getBits t
| t <= 0 = []
| otherwise = case even t of
True -> Zero : (getBits $ t`quot`2)
False -> One : (getBits $ (t-1)`quot`2)
bitsToInt :: (Integral n) => [Bit] -> n
bitsToInt = sum . zipWith (*) [2^t | t <- [0..]] . map bitToInt
where
bitToInt Zero = 0
bitToInt One = 1
instance XOR Char where
xor x y = chr $ xor (ord x) (ord y)
When we xor
two strings together, one is called the plaintext and the other is called the key. If the key is shorter than the plaintext we simply repeat it from the beginning as many times as needed. The result is a new string, the ciphertext, which will generally not be recognizable. However, we can recover the plaintext by repeating the xor
operation with the same key.
This method of encrytion has several interesting properties. (I am hesitant to call these unequivocal “pros” or “cons”, since every encryption scheme involves tradeoffs.)
- Extremely simple to implement.
- Is a symmetric cipher; in fact, exactly the same key is used for encryption and decryption.
- If the key is short compared to the text, is vulnerable to statistical attacks. (Beyond the usual brute-force attacks that come with short keys.)
- Can be used to implement a one-time pad, which to my knowledge is the only provably secure encryption scheme. (To do this, the key must be the same length as the plaintext.)
- Can only be used on character encodings of size \(2^n\) (which unicode is), which preferably have a canonical mapping to the integers from \(0\) to \(2^n - 1\) (which unicode does).
- When used on a large alphabet, most significant bits can play an important role. For instance, if we are encrypting a text which consists only of ASCII (the first 128 characters of unicode) and use a key consisting of characters with a bit higher than the 7th set, then these high bits will never be
xor
ed away. I haven’t thought about this in depth, but I suspect this opens up a new class of statistical attacks. - Related to the previous property, practical text does not use the full range of unicode – more likely it is restricted to the characters used in a particular language. This may open a class of attacks.
- When used with ASCII or unicode, generally the ciphertext will include control characters. This can make it inconvenient to use text-oriented tools on encrypted text. We could fix this with a different mapping from characters to numbers, but doing so would probably weaken the encryption even further by reducing the alphabet size.
Here’s the main program.
main :: IO ()
main = do
keys <- fmap (map bsUnEsc) getArgs
charFilter (cryptWith keys)
exitSuccess
cryptWith :: [String] -> String -> String
cryptWith ks str = foldr crypt str ks
where
crypt :: String -> String -> String
crypt "" str = str
crypt key str = zipWith xor str (concat $ repeat key)
We definitely want the user to specify an encryption key from the command line. But generally, the user can specify many (or no!) command line arguments. What should we do if that happens?
- If the user does not specify a key, then the simplest thing to do is return the plaintext unchanged. We could return an error, but it is feasible that some future user would prefer that “no key” be treated the same as “empty key”. Moreover, the recursive definition of
cryptKeys
is much cleaner with this decision. - We could just silently ignore any arguments after the first, but then the presence of extra arguments likely signifies that the user misunderstands how to use the program.
- We could simply concatenate the arguments together as a single larget key.
- We could interpret the command line arguments as a list of keys, and use all of them, one at a time.
Concatenating the arguments to a single key would be fine. But interpreting the arguments as multiple keys, to be used independently, has a nice side effect. It provides a simple (if not maximally secure) way for the user to increase the effective size of the key. As K&P note, if we xor
encrypt a given file twice with two different keys, of lengths \(m\) and \(n\), then this is equivalent to xor
ing once with a single key of length \(\mathrm{lcm}(m,n)\). For instance,
crypt "foo" "quuux"
with two keys of length 3 and 5, is equivalent to running crypt
with a key of length 15. Saying
crypt "foo" "quuux" "mungely"
is just like using a key of length 105.
The keys are also run through backslashUnEscape
by default, meaning that any C or ASCII style escape codes are interpreted. This is the Right Thing because we want the user to have easy access to the widest possible range of keys. It is not necessary to clutter the interface by making this functionality optional with an extra command line argument.
I can think of one important possible improvement to crypt
: it would be nice if we could specify the keys as files as well as arguments. For instance, an invocation like
crypt "foo" "quuux" --keyfiles key1.txt key2.txt
would treat the contents of key1.txt
and key2.txt
as keys, just like foo
and quuux
. We generally do not want to leave encryption keys lying around in files. But this would make it easier to use very large keys, for instance to implement a one-time pad. That’s an idea for another day.
Old stuff
-- apply a map to stdin
charFilter :: (String -> String) -> IO ()
charFilter f = do
xs <- getContents
putStr $ f xs
woo