Software Tools in Haskell: examine

interactively view a file

Posted on 2016-03-07 by nbloomf
Tags: software-tools-in-haskell, literate-haskell

This page is part of a series on Software Tools in Haskell.

This post is literate Haskell; you can load the source into GHCi and play along.

As usual, we start with some imports.

-- examine: interactively view a file
module Main where

import System.Environment (getArgs, getProgName)
import System.Exit (exitSuccess, exitFailure)
import System.IO (hPutStrLn, stderr)
import Data.List (unfoldr)

This is not one of the examples in Software Tools, but is a program I’ve found myself wishing I had in the past. It’s a simple interactive file viewer. examine takes a file and presents it to the user a little bit at a time, and accepts some simple commands: pressing n advances to the next chunk of the file, v the previous chunk, x exits the program, and w writes the current chunk to stdout. That last command, w, can be used to make examine act like a human-powered filter. To avoid cluttering up stdout, the interaction happens on stderr. (This is a minor abuse of stderr.) I’d prefer to use the arrow keys for navigation, but doing this in a platform-independent way is not so simple.

main :: IO ()
main = do
  args <- getArgs

  -- interpret arguments
  (num, name) <- case args of
    [str]   -> return (10, str)
    [k,str] -> case readDecimalNat k of
      Nothing -> argErr >> exitFailure
      Just m  -> return (m, str)
    otherwise -> argErr >> exitFailure

  lns <- fmap getLines $ readFile name

  zipper <- case chunksOf num lns of
    Nothing -> argErr >> exitFailure
    Just xs -> case makeZipperList xs of
      Nothing -> argErr >> exitFailure
      Just ys -> return ys

      = putOutOfBand ["", "[n]ext pre[v] e[x]it [w]rite?"]

    prompt z = do
      putOutOfBand $ zlFocus z
      input <- getLine
      putOutOfBand [""]
      case input of
        "x" -> return ()

        "n" -> case zlNext z of
          Just w  -> prompt w
          Nothing -> do
            putOutOfBand ["end of file. (press any key)"]
            prompt z

        "v" -> case zlPrev z of
          Just w  -> prompt w
          Nothing -> do
            putOutOfBand ["start of file. (press any key)"]
            prompt z

        "w" -> do
          sequence_ $ map putStrLn $ zlFocus z
          prompt z

        _ -> prompt z

  prompt zipper


argErr :: IO ()
argErr = reportErrorMsgs
  [ "usage:"
  , "  examine FILE     -- interactively view FILE 10 lines at a time"
  , "  examine INT FILE -- interactively view FILE INT lines at a time"

One idea of interest is the way we represent our file internally. The file being examined can be thought of as a list of chunks, but we’d like one of these, the “current chunk”, to be special. Using a straight list to represent the chunks, we could use the list index to keep track of the current chunk. Updating the reference to the current chunk takes constant time (add or subtract 1), but retrieving the current chunk takes linear time. In a large file this could be a problem. Can we do better? In other languages we might use pointers to randomly access parts of a file, but in Haskell this strategy is less straightforward.

Note, though, that our design doesn’t really want random access to the parts of a file. We just want one part to be singled out as the “current” chunk and to have efficient ways to move to the “next” and “previous” chunk. We can represent this as a triple: the current chunk, a list of all the next chunks, and a list of all the previous chunks.

data ZipperList a = ZL a [a] [a]

zlFocus :: ZipperList a -> a
zlFocus (ZL x _ _) = x

makeZipperList :: [a] -> Maybe (ZipperList a)
makeZipperList []     = Nothing
makeZipperList (x:xs) = Just $ ZL x [] xs

zlNext :: ZipperList a -> Maybe (ZipperList a)
zlNext (ZL _ _  [])     = Nothing
zlNext (ZL x ys (z:zs)) = Just $ ZL z (x:ys) zs

zlPrev :: ZipperList a -> Maybe (ZipperList a)
zlPrev (ZL _ []     _)  = Nothing
zlPrev (ZL x (y:ys) zs) = Just $ ZL y ys (x:zs)

I’ve called this a “zipper list” because it is related to the well-studied type theoretic concept of zippers.

Zippers are wild, and one of the reasons I got turned on to Haskell in the first place. So Haskell types are objects in a category, right? And this category (among other properties) has all binary products and coproducts: tuples (a,b) and eithers Either a b. With a slight change in notation we can use ordinary multiplication and addition symbols to represent products and coproducts, so that a type like (Either a b, c) is denoted (a+b)c. A similar convention applies to algebraic data types. A type like

data Foo a b c = A a | BC b c

can be denoted Foo(a,b,c) = a + bc. The data keyword signals that we are naming a type-level synonym, or “function” of sorts, and this function looks an awful lot like a polynomial.

This is where zippers come in: any algebraic type has an associated “zipper” type, which represents the type of “one-hole contexts” over that type. And the zipper type is found by symbolically differentiating the original type function.

Here’s an extremely hand-wavy example. Recursive types can also play this game; so the type of lists, which we might denote

data L a = Nil | Cons a (L a)

can be represented as \(L(a) = 1 + a L(a)\). Blurring the line for a moment between functions and types we can “solve the equation” for \(L(a)\) as \[L(a) = \frac{1}{1-a}.\] What does this even mean, though? Addition and multiplication of types is meaningful, but I’m not sure about subtraction and division. But if we convert this to its power series representation, we have \[L(a) = 1 + a + a^2 + a^3 + \cdots.\] This makes a little more sense. A list is either empty, or contains one element, or contains two elements, or three, and so on. (This singles out the unique “smallest” solution of the defining equation of \(L(a)\), excluding infinite lists. Hand wave!) Implicitly differentiating \(L(a) = 1 + aL(a)\) with respect to \(a\), we get \[L^\prime(a) = \frac{L(a)}{1-a},\] which (using the power series hand wave) is \[L^\prime(a) = L(a)L(a).\] That is, the type of one-hole contexts of a list is a pair of lists, representing the part before and the part after the hole. With one element to fill the hole, this is the type of our ZipperList.

There is some deep magic here which I do not completely understand. For instance, the differentiation of types can be done automatically, and higher derivatives give us contexts with more than one hole. And does the analogy between types and calculus hold for codata (potentially infinite data)?

Old Stuff

-- write list of messages to stderr
reportErrorMsgs :: [String] -> IO ()
reportErrorMsgs errs = do
  name <- getProgName
  sequence_ $ map (hPutStrLn stderr) $ ((name ++ " error"):errs)

chunksOf :: Int -> [a] -> Maybe [[a]]
chunksOf k xs
  | k <= 0    = Nothing
  | otherwise = Just $ chunk xs
    chunk [] = []
    chunk ys = let (as,bs) = splitAt k ys in as : chunk bs

putOutOfBand :: [String] -> IO ()
putOutOfBand xs = do
  sequence_ $ map (hPutStrLn stderr) xs

-- parse a natural number base 10
readDecimalNat :: String -> Maybe Int
readDecimalNat xs = do
  ys <- sequence $ map decToInt $ reverse xs
  return $ sum $ zipWith (*) ys [10^t | t <- [0..]]
    decToInt :: Char -> Maybe Int
    decToInt x = lookup x
      [ ('0',0), ('1',1), ('2',2), ('3',3), ('4',4)
      , ('5',5), ('6',6), ('7',7), ('8',8), ('9',9)

-- split on \n
getLines :: String -> [String]
getLines = unfoldr firstLine
    firstLine :: String -> Maybe (String, String)
    firstLine xs = case break (== '\n') xs of
      ("","")   -> Nothing
      (as,"")   -> Just (as,"")
      (as,b:bs) -> Just (as,bs)