Software Tools in Haskell: crypt

xor stdin with a list of keys

Posted on 2016-02-25 by nbloomf
Tags: software-tools-in-haskell, literate-haskell

This page is part of a series on Software Tools in Haskell.

This post is literate Haskell; you can load the source into GHCi and play along.

As usual, we start with some imports.

-- sth-crypt: xor chars on stdin with a list of keys
module Main where

import System.Environment (getArgs)
import System.Exit (exitSuccess)
import Data.Char (ord, chr, readLitChar)

See also the Backslash module.

import Lib.Backslash (bsUnEsc)

This program performs very (very!) simple encryption by xor-ing stdin with a list of keys, supplied by the user at the command line.

At the bit level, a xor b is 0 if a and b are equal and is 1 otherwise. Two lists of bits are xored entrywise, with the shorter list padded with \(0\)s. We can think of xor as an operation on natural numbers by converting to and from base 2, and finally we can think of xor as an operation on characters by converting to and from natural numbers (a.k.a. code points). Then to xor two strings we xor characterwise.

We will implement these operations bare-handed.

class XOR t where
  xor :: t -> t -> t

  xors :: [t] -> [t] -> [t]
  xors [] ys = ys
  xors xs [] = xs
  xors (x:xs) (y:ys) = (xor x y) : xors xs ys

data Bit
  = Zero | One
  deriving (Eq, Show)

instance XOR Bit where
  xor Zero Zero = Zero
  xor Zero One  = One
  xor One  Zero = One
  xor One  One  = Zero

instance XOR Int where
  xor a b = bitsToInt $ xors (intToBits a) (intToBits b)
      intToBits :: (Integral n) => n -> [Bit]
      intToBits k = case getBits k of
        [] -> [Zero]
        bs -> bs
          getBits t
            | t <= 0    = []
            | otherwise = case even t of
                True  -> Zero : (getBits $ t`quot`2)
                False -> One  : (getBits $ (t-1)`quot`2)

      bitsToInt :: (Integral n) => [Bit] -> n
      bitsToInt = sum . zipWith (*) [2^t | t <- [0..]] . map bitToInt
          bitToInt Zero = 0
          bitToInt One  = 1

instance XOR Char where
  xor x y = chr $ xor (ord x) (ord y)

When we xor two strings together, one is called the plaintext and the other is called the key. If the key is shorter than the plaintext we simply repeat it from the beginning as many times as needed. The result is a new string, the ciphertext, which will generally not be recognizable. However, we can recover the plaintext by repeating the xor operation with the same key.

This method of encrytion has several interesting properties. (I am hesitant to call these unequivocal “pros” or “cons”, since every encryption scheme involves tradeoffs.)

Here’s the main program.

main :: IO ()
main = do
  keys <- fmap (map bsUnEsc) getArgs
  charFilter (cryptWith keys)

cryptWith :: [String] -> String -> String
cryptWith ks str = foldr crypt str ks
    crypt :: String -> String -> String
    crypt ""  str = str
    crypt key str = zipWith xor str (concat $ repeat key)

We definitely want the user to specify an encryption key from the command line. But generally, the user can specify many (or no!) command line arguments. What should we do if that happens?

Concatenating the arguments to a single key would be fine. But interpreting the arguments as multiple keys, to be used independently, has a nice side effect. It provides a simple (if not maximally secure) way for the user to increase the effective size of the key. As K&P note, if we xor encrypt a given file twice with two different keys, of lengths \(m\) and \(n\), then this is equivalent to xoring once with a single key of length \(\mathrm{lcm}(m,n)\). For instance,

crypt "foo" "quuux"

with two keys of length 3 and 5, is equivalent to running crypt with a key of length 15. Saying

crypt "foo" "quuux" "mungely"

is just like using a key of length 105.

The keys are also run through backslashUnEscape by default, meaning that any C or ASCII style escape codes are interpreted. This is the Right Thing because we want the user to have easy access to the widest possible range of keys. It is not necessary to clutter the interface by making this functionality optional with an extra command line argument.

I can think of one important possible improvement to crypt: it would be nice if we could specify the keys as files as well as arguments. For instance, an invocation like

crypt "foo" "quuux" --keyfiles key1.txt key2.txt

would treat the contents of key1.txt and key2.txt as keys, just like foo and quuux. We generally do not want to leave encryption keys lying around in files. But this would make it easier to use very large keys, for instance to implement a one-time pad. That’s an idea for another day.

Old stuff

-- apply a map to stdin
charFilter :: (String -> String) -> IO ()
charFilter f = do
  xs <- getContents
  putStr $ f xs