Software Tools in Haskell: copy
copy characters from stdin to stdout
This page is part of a series on Software Tools in Haskell.
This post is literate Haskell; you can load the source into GHCi and play along.
As usual, we start with some imports.
-- sth-copy: copy characters from stdin to stdout
module Main where
import Data.List (break, unfoldr)
import System.Exit (exitSuccess)
import System.Environment (getArgs)
Many simple tools are designed to act as filters: programs which take a stream of data, manipulate it in some way, and send it along. The copy program is the simplest possible example of a filter: the identity filter. This is even simpler than cat, which at least reads and concatenates files.
We can think of copy as just a character filter: it reads characters on stdin and writes them, unmodified, to stdout. But there is another, subtler way to think about how copy should behave. If the data passed in to copy is lined text, then every (logical) line should be terminated by a newline character. This may not be the case, though; specifically, the last line in a text file may not be newline-terminated. This is a simple error, and we’d like copy to correct it. It appears then that copy has two possible “modes”: character mode and line mode. In character mode, copy passes characters from stdin to stdout unchanged, and in line mode it additionally makes sure that the last character is a newline.
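To make the difference between the two modes concrete, here is a minimal pure sketch. The names copyChars and copyLines are hypothetical, and the Prelude functions lines and unlines stand in for whatever line splitting the real program uses, so treat this as a model of the intended behavior rather than the implementation.

```haskell
-- Character mode: pass the input through untouched.
copyChars :: String -> String
copyChars = id

-- Line mode: split into logical lines, then terminate every line
-- with '\n'. Prelude `lines`/`unlines` are stand-ins here.
copyLines :: String -> String
copyLines = unlines . lines

main :: IO ()
main = do
  print (copyChars "one\ntwo")    -- "one\ntwo"
  print (copyLines "one\ntwo")    -- "one\ntwo\n"
  print (copyLines "one\ntwo\n")  -- "one\ntwo\n"
```

Only the input with a missing final newline is changed by line mode; input that is already well-formed passes through as is.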
These two uses of copy are fundamentally different (consider systems where the line separator is something other than \n) but not so different as to warrant two separate programs. Instead, we’ll provide an argument to let the user specify which mode to use. Rather than making the user specify a mode every time, it is better to make one the default and only require an explicit flag for the other. So: which mode should be the default? Which is more likely to be used? In practice, lined text seems to be the most common textual format, so we’ll make line mode the default and enable straight character copying with a --char flag.
The way we get command line arguments in Haskell is with the getArgs function in System.Environment. This function has type IO [String] and returns a list of all the arguments given to our program at the command line (not including the program name).
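As a small illustration of getArgs, the sketch below prints a description of whatever arguments it receives. The helper describeArgs is a hypothetical name introduced only for this example; withArgs, also from System.Environment, runs an action with a substituted argument list, which is handy for experimenting in GHCi.

```haskell
import System.Environment (getArgs, withArgs)

-- Hypothetical helper: describe the argument list a program received.
describeArgs :: [String] -> String
describeArgs []   = "no arguments"
describeArgs args = show (length args) ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
  -- The arguments actually passed on the command line.
  args <- getArgs
  putStrLn (describeArgs args)
  -- The same action, run with a substituted argument list.
  withArgs ["--char"] (getArgs >>= putStrLn . describeArgs)
```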
Depending on the mode, copy does one of two things: read each character from stdin and write it to stdout, or read each line from stdin and write it to stdout. We can think of these actions as specific instances of a more general pattern: read a character from stdin, apply a mapping to it, and write the result to stdout, or read a line from stdin, apply a mapping to it, and write the result to stdout. As we will see, many programs are of this form. So we will write general-purpose character and line filter programs, parameterized on the mapping used to transform the input.
charFilter simply reads everything it can from stdin, applies a function to it, and writes out the result. Note that the standard library function getContents reads from stdin lazily, so despite appearances this function does not read all of stdin at once before getting to work.
-- apply a map to stdin
charFilter :: (String -> String) -> IO ()
charFilter f = do
  xs <- getContents
  putStr $ f xs
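The effect of laziness can be seen without any I/O. In the sketch below an infinite string stands in for an unbounded stdin, and shout is a hypothetical mapping of the kind one might pass to charFilter. Because the result is produced on demand, taking a finite prefix terminates even though the input never ends.

```haskell
import Data.Char (toUpper)

-- Hypothetical mapping: upper-case every character.
shout :: String -> String
shout = map toUpper

main :: IO ()
main =
  -- `cycle "ab"` is infinite; only the six characters that are
  -- demanded ever get produced and transformed.
  putStrLn (take 6 (shout (cycle "ab")))  -- prints ABABAB
```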
lineFilter does the same, only it extracts the (logical) lines from stdin first. (The function getLines extracts the logical lines from a string; we will see this function in detail when we discuss count.)
-- apply a map to each line of stdin.
lineFilter :: (String -> String) -> IO ()
lineFilter f = do
  xs <- fmap getLines getContents
  mapM_ (putStrLn . f) xs
-- split a string of characters at any instances
-- of the newline character ('\n'). the resulting
-- strings do not contain any newlines.
getLines :: String -> [String]
getLines = unfoldr firstLine
  where
    firstLine :: String -> Maybe (String, String)
    firstLine xs = case break (== '\n') xs of
      ("","")   -> Nothing
      (as,"")   -> Just (as,"")
      (as,_:bs) -> Just (as,bs)
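A few worked examples may help show what getLines does with edge cases such as a missing final newline or an empty line. The block below repeats the definition so that it can be run on its own; the sample inputs are made up for illustration.

```haskell
import Data.List (unfoldr)

-- Same definition as above, repeated so this block stands alone.
getLines :: String -> [String]
getLines = unfoldr firstLine
  where
    firstLine :: String -> Maybe (String, String)
    firstLine xs = case break (== '\n') xs of
      ("", "")    -> Nothing         -- nothing left: stop unfolding
      (as, "")    -> Just (as, "")   -- last line, no trailing newline
      (as, _:bs)  -> Just (as, bs)   -- drop the newline, keep going

main :: IO ()
main = do
  print (getLines "one\ntwo\n")  -- ["one","two"]
  print (getLines "one\ntwo")    -- ["one","two"]
  print (getLines "one\n\ntwo")  -- ["one","","two"]
  print (getLines "")            -- []
```

The first two inputs give the same list, which is why writing each line back out with putStrLn repairs a missing final newline.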
By wrapping the basic behavior of filters in higher-order functions like these, we can write programs at a higher level. The copy program then just needs to determine whether to process characters or lines and filter with the identity.
data Mode = Chars | Lines
main :: IO ()
main = do
  args <- getArgs
  -- line mode is the default; --char selects character mode
  mode <- case args of
    ["--char"] -> return Chars
    _          -> return Lines
  case mode of
    Chars -> charFilter id
    Lines -> lineFilter id
  exitSuccess
I have to confess that I don’t see what the practical use of copy is. However, it is valuable to see that our environment for compiling, running, and testing programs is working properly.