# A simple term rewriting tool

This post is literate Haskell; you can load the source into GHCi and play along.

```
module TermRewriting where
import Data.List (nub, unfoldr)
import Data.Maybe (catMaybes)
import Control.Monad (guard)
import Text.ParserCombinators.Parsec
import System.Environment
import System.Exit
import System.IO
```

Today we’ll build a tool that applies term-rewriting rules to expressions in a simplified form of mathematical notation.

My motivation for this comes from the Arithmetic Made Difficult series of posts on this site. There, I’m writing lots of equation-style proofs, and it would be nice to have a tool check that at least some of the steps in those proofs are carried out correctly. I won’t try to build a full-fledged proof checker – building a proof checker that uses a syntax readable by people is a Hard Problem. But a simple term rewriting tool can get us (made up number) 80% of the benefits with 1% of the effort. With this tool, we’ll be able to annotate equational proofs with “justification” information that is machine checked, but also compiles to human readable cross references (hyperlinks).

## Grammar

To set the scope of the problem, we’ll focus on a subset of math notation that can be easily translated to something like lambda calculus. The basic ingredients are

*constants*, or atomic expressions, that represent themselves;*variables*, that represent arbitrary subexpressions; and*application*of one expression to another.

We can represent this little grammar with the following `Expr`

type.

In keeping with \(\LaTeX\) syntax, we will decree that constants consist of one or more letters prefixed by a `\`

, such as

`\plus \times \cons`

We’ll decree that variables consist of one or more letters *not* prefixed by a `\`

, such as

`a b foo munge`

Application is a little trickier. Certainly an expression like `f(a)`

means “the expression `f`

applied to the expression `a`

. The trouble is how to handle functions with more than one argument. We’ll do this by implicitly currying all functions. Remember that any function with signature

`f : (a,b) -> c`

is equivalent to one with signature

`g : a -> b -> c`

where the function arrow is right associative. So we can interpret an expression like `f(a,b)`

implicitly as `f(a)(b)`

. In particular, we only need to support functions of one argument. But we also need to handle explicitly curried functions, like

`f(a)(b,c)`

and we also need to allow arguments to be fenced in either parentheses or braces.

```
fenced :: Parser a -> Parser a
fenced p =
(between (char '(') (char ')') p)
<|> (between (char '{') (char '}') p)
```

We’ll say that an “expression” is a constant or variable followed by zero or more fenced comma-delimited lists of subexpressions.

```
pLatexExpr :: Parser Expr
pLatexExpr = do
f <- pCon <|> pVar
args <- option [] $ many1 $ fenced $ sepBy1 pLatexExpr (char ',')
return $ foldl App f $ concat args
```

Nothing about the `Expr`

type is specific to LaTeX, but (at least for now) this tool will be used specifically to parse and verify equations written in LaTeX.

While we’re at it, some parsing helpers:

```
parseWith :: String -> Parser a -> String -> Either String a
parseWith loc p text = case runParser (p <* eof) () loc text of
Left err -> Left (show err)
Right x -> Right x
parseWithIO :: String -> Parser a -> String -> IO a
parseWithIO loc p text = case parseWith loc p text of
Left msg -> putStrLn (unwords [loc,text,msg]) >> exitFailure
Right a -> return a
```

And here is an example.

```
$> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"
$> putStrLn $ show x
App (App (Con "plus") (Var "a")) (App (App (Con "times") (Var "b")) (Var "c"))
```

We’ll also make a function to pretty print `Expr`

s as LaTeX.

```
latex :: Expr -> String
latex (Con x) = '\\' : x
latex (Var x) = x
latex (App x y) = concat
[ latex x, "(", latex y, ")" ]
```

For example:

```
$> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"
$> putStrLn $ latex x
\plus(a)(\times(b)(c))
```

Note that the round trip generally takes a LaTeX statement to a different, but equivalent, statement. This is the price of simplicity, and doesn’t matter too much.

## Substitutions

A *substitution* is a mapping from variables to expressions, which we’ll represent using as a list of pairs.

Since lists are not actually maps, we need a helper function to decide whether a given list of token-expression pairs is well-defined. That is, a valid substitution shouldn’t have two pairs with the same token but different expressions.

```
wellDefined :: Substitution -> Bool
wellDefined [] = True
wellDefined ((x,a):ps) = and
[ null (filter (\(y,b) -> (x == y) && (a /= b)) ps)
, wellDefined ps
]
```

Now actually making a substitution is straightforward; we march down an `Expr`

, and if we find a variable token replace it with its associated subexpression. We do the simplest thing by replacing *all* instances of the variable.

```
substitute :: Substitution -> Expr -> Expr
substitute _ (Con a) =
Con a
substitute es (Var x) =
case lookup x es of
Just e -> e
Nothing -> Var x
substitute es (App a b) =
App (substitute es a) (substitute es b)
```

Next we can try to construct a substitution taking one expression to another, based at the root. Constants match only themselves, variables match anything, and applications match recursively.

```
matches :: Expr -> Expr -> Maybe Substitution
matches pattern expr = match pattern expr >>= check
where
check ps = if wellDefined ps
then Just (nub ps)
else Nothing
match (Con a) (Con b) =
if a == b then Just [] else Nothing
match (Con _) _ =
Nothing
match (Var x) e =
Just [(x,e)]
match (App e1 e2) (App f1 f2) = do
xs <- match e1 f1
ys <- match e2 f2
return (xs ++ ys)
match (App _ _) _ =
Nothing
```

More generally matches can occur anywhere in an expression, and we need to be able to unambiguously state where the match is. The `Path`

type represents a series of directions for moving from the root of an expression to some interior node.

And `matchesIn`

constructs a list of all substitutions from one expression to another, along with paths to the root of each substitution.

```
matchesIn :: Expr -> Expr -> [(Path, Substitution)]
matchesIn pattern expr = case expr of
Con a -> case matches pattern expr of
Nothing -> []
Just s -> [(H, s)]
Var x -> case matches pattern expr of
Nothing -> []
Just s -> [(H, s)]
App x y -> do
let
mx = map (\(p,z) -> (L p, z)) $ matchesIn pattern x
my = map (\(p,z) -> (R p, z)) $ matchesIn pattern y
case matches pattern expr of
Nothing -> mx ++ my
Just s -> (H,s) : mx ++ my
```

Given a `Path`

, we can (attempt to) transform the subexpression it points to.

```
transform :: (Expr -> Expr) -> Path -> Expr -> Maybe Expr
transform f H expr = Just (f expr)
transform f (L path) (App expr w) = do
q <- transform f path expr
Just (App q w)
transform f (R path) (App w expr) = do
q <- transform f path expr
Just (App w q)
transform _ _ _ = Nothing
```

As a special case, `rewrite`

replaces the subexpression at a given location.

Substituting application expressions, where we want to match the arguments, is a little different.

```
funMatch :: Expr -> Expr -> Maybe Substitution
funMatch (Con a) (Con b) =
if a == b then return [] else Nothing
funMatch (Var x) (Var y) =
if x == y then return [] else Nothing
funMatch (App e (Var x)) (App f g) = do
fmap ((x,g):) $ funMatch e f
funMatch _ _ = Nothing
funSub :: Expr -> Expr -> Expr -> Maybe Expr
funSub _ _ (Con a) = return (Con a)
funSub _ _ (Var x) = return (Var x)
funSub repl func (App f x) = do
y <- funSub repl func x
g <- funSub repl func f
case funMatch func (App g y) of
Nothing -> return $ App g y
Just sub -> if wellDefined sub
then return $ substitute sub repl
else Nothing
```

## Rewrite Rules

A *rewrite rule* is a pair of expressions that we interpret as being “equal” for all possible substitutions.

In our latexy syntax, we’ll say a rewrite rule is two expressions, separated by an equals sign, and separated by spaces.

To apply a rewrite rule to an expression, we find all matches of the left hand side in the expression, substitute these into the right hand side, and then rewrite the expression.

```
applyRule :: Rule -> Expr -> [Expr]
applyRule (lhs,rhs) x =
catMaybes $ map applyMatch $ matchesIn lhs x
where
applyMatch (path,sub) = rewrite (substitute sub rhs) path x
```

To *validate* a rule application – to check that `a = b`

maps `e`

to `f`

– we apply the rule to `e`

and see if `f`

is among the resuls.

```
validate :: Rule -> Expr -> Expr -> Bool
validate rule input output = elem output (applyRule rule input)
```

Most of the rewrite rules and expressions we’ll deal with are *symmetric*; we don’t mind reversing the order of the expressions in the rewrite rule, and the roles of the input and output expressions. For this case we’ll use a separate validation function.

```
validateSymmetric :: Rule -> Expr -> Expr -> Bool
validateSymmetric (lhs,rhs) e f = or
[ validate (lhs,rhs) e f
, validate (rhs,lhs) e f
, validate (lhs,rhs) f e
, validate (rhs,lhs) f e
]
```

## Interface

Remember: this tool is specifically intended for use with the Arithmetic Made Difficult posts, which include a few thousand lines worth of equational proofs. Those posts need two kinds of support.

- We’d like to statically check the equalities in our equational proofs. Some equalities are annotated with a cross reference to the theorem that justifies them. Some
`sed`

magic can turn these into rewrite rules, which we can use to verify that the left hand side of an equality rewrites to the right hand side. Verifying one equality at a time, rather than an entire proof, makes it possible to build up static checks incrementally. - We’d also like to statically check variable substitutions. This is similar to scenario 1, except instead of rewriting a single (complex) subexpression, in this case we only care about replacing all instances of a variable with some expression and validating the result.
- If an equality
*doesn’t*have a cross referenced annotation, it’d be nice if our tool could also give suggestions for valid cross references based on a dictionary of rewrite rules. Bonus points if it also builds an edit script for inserting the annotation (hint, it will).

These requirements suggest that the tool needs three modes, which we’ll call *verify* mode, *substitute* mode, and *suggest* mode, respectively.

We’ll keep this really simple. In verify mode, the tool expects its input on `stdin`

, one rewrite check per line, in the form

`location TAB rulelhs = rulerhs TAB from-expr TAB to-expr`

Where `location`

is some information about where the check comes from (for reporting problems). So we’ll need to split a line on tabs:

```
unTab :: String -> [String]
unTab = unfoldr getCell
where
getCell [] = Nothing
getCell xs =
let
(cell, z) = span (/= '\t') xs
rest = if null z then [] else tail z
in
Just (cell, rest)
```

And we’ll need to process one untabbed line. We’re assuming that the rewrite rules are symmetric; given `x = y`

and two expressions `e`

and `f`

, any substitution taking either `e`

to `f`

or `f`

to `e`

is considered valid. This is ok for the equation chains we want to use this on.

```
processVerify :: [String] -> IO ()
processVerify [loc,r,a,b] = do
rule <- parseWithIO loc (pRule pLatexExpr) r
e <- parseWithIO loc pLatexExpr a
f <- parseWithIO loc pLatexExpr b
if validateSymmetric rule e f
then return ()
else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
processVerify xs = do
putStrLn $ unwords ("unrecognized input format:" : xs)
```

In substitute mode, we’ll accept input in the same format as for verify mode, with an extra check on the form of the rewrite rule.

```
processSubstitute :: [String] -> IO ()
processSubstitute [loc,r,a,b] = do
(lhs,rhs) <- parseWithIO loc (pRule pLatexExpr) r
e <- parseWithIO loc pLatexExpr a
f <- parseWithIO loc pLatexExpr b
case lhs of
Var x -> if (f == substitute [(x,rhs)] e) || (e == substitute [(x,rhs)] f)
then return ()
else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
App _ _ -> if (Just f == funSub rhs lhs e) || (Just e == funSub rhs lhs e)
then return ()
else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
_ -> do
putStrLn $ unwords [loc,"rewrite rule must be of the form 'var = expr':",r]
processSubstitute xs = do
putStrLn $ unwords ("unrecognized input:" : xs)
```

In suggest mode, we’ll assume that a dictionary of named rewrite rules is kept in some external file consisting of lines of the form

`name TAB rulelhs = rulerhs`

```
parseNamedRule :: [String] -> IO (String,(Expr,Expr))
parseNamedRule [name,r] = do
(x,y) <- parseWithIO "-" (pRule pLatexExpr) r
return (name,(x,y))
parseNamedRule xs = do
putStrLn $ unwords ("unrecognized input:":xs)
exitFailure
```

We’ll also allow comment lines in the dictionary; these start with a `#`

character and are ignored.

We also assume backslashes are escaped in the dictionary because reasons.

```
unesc :: String -> String
unesc ('\\':'\\':xs) = '\\' : unesc xs
unesc (c:cs) = c : unesc cs
unesc [] = []
```

It will be more useful if in suggest mode, we ignore parse errors. Suggestion mode marches down the rules in the dictionary, and if it finds one, emits a `sed`

command to insert an annotation. This part is completely ad-hoc.

```
processSuggest :: [(String,(Expr,Expr))] -> [String] -> IO ()
processSuggest ((name,(x,y)):rs) [loc,line,a,b] = do
lhs <- case parseWith loc pLatexExpr a of
Left _ -> return (Con "")
Right w -> return w
rhs <- case parseWith loc pLatexExpr b of
Left _ -> return (Con "")
Right w -> return w
if validateSymmetric (x,y) lhs rhs
then do
putStrLn $ unwords
[ "sed -i ''"
, "'"++line++"s/^ & = & / \\& \\\\href{" ++ name ++ "}~ = \\& /'"
, loc
]
else processSuggest rs [loc,line,a,b]
processSuggest [] [_,_,_,_] = return ()
processSuggest _ xs = do
putStrLn $ unwords xs
```

And `main`

. In a departure from the unix philosophy, in verify mode we’ll report the number of checks made.

```
main :: IO ()
main = do
args <- getArgs
case args of
["--verify"] -> do
checks <- fmap (map unTab . lines) getContents
mapM_ processVerify checks
hPutStrLn stderr $ unwords [show $ length checks, "checks completed"]
["--substitute"] -> do
checks <- fmap (map unTab . lines) getContents
mapM_ processSubstitute checks
hPutStrLn stderr $ unwords [show $ length checks, "checks completed"]
["--suggest", path] -> do
rules <- (fmap (map unTab . filter (not . comment) . lines . unesc) $ readFile path)
>>= mapM parseNamedRule
checks <- fmap (map unTab . lines) getContents
putStrLn "#!/bin/bash"
mapM_ (processSuggest rules) checks
_ -> do
putStrLn $ unlines
[ "usage:"
, " --verify : validate rewrites on stdin"
, " --substitute : validate substitutions on stdin"
, " --suggest PATH : suggest rewrite rules from PATH on stdin"
]
```

# Testing

The parts of this tool are complicated enough that I’ll feel better having some tests.

```
test_main :: IO ()
test_main = do
parseSuccessTests
parseFailureTests
substituteTests
validationTests
```

Parsing tests helpers:

```
testParseSuccess :: Int -> (String, Expr) -> IO ()
testParseSuccess k (str,expr) = do
putStrLn $ "\nsuccess test " ++ show k
putStrLn str
case parseWith "" pLatexExpr str of
Left msg -> putStrLn msg >> exitFailure
Right ex -> if ex == expr
then putStrLn "ok"
else do
putStrLn $ unlines
[ "parse error:"
, str
, "expected: " ++ show expr
, "actual: " ++ show ex
]
exitFailure
testParseFailure :: Int -> String -> IO ()
testParseFailure k str = do
putStrLn $ "\nfailure test " ++ show k
putStrLn str
case parseWith "" pLatexExpr str of
Right ex -> putStrLn (show ex) >> exitFailure
Left _ -> putStrLn "ok"
```

Parsing test cases:

```
parseSuccessTests :: IO ()
parseSuccessTests = sequence_ $ zipWith testParseSuccess [1..]
[ ("f", Var "f")
, ("\\f", Con "f")
, ("f(x)", App (Var "f") (Var "x"))
, ("\\f(x)", App (Con "f") (Var "x"))
, ("f(\\x)", App (Var "f") (Con "x"))
, ("\\f(\\x)", App (Con "f") (Con "x"))
, ("f{x}", App (Var "f") (Var "x"))
, ("\\f{x}", App (Con "f") (Var "x"))
, ("f{\\x}", App (Var "f") (Con "x"))
, ("\\f{\\x}", App (Con "f") (Con "x"))
, ("f(x,y)", App (App (Var "f") (Var "x")) (Var "y"))
, ("\\f(x,y)", App (App (Con "f") (Var "x")) (Var "y"))
, ("f(x)(y)", App (App (Var "f") (Var "x")) (Var "y"))
, ("f{x}(y)", App (App (Var "f") (Var "x")) (Var "y"))
, ("f(x,y)(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
, ("f(x)(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
, ("f{x,y}(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
, ("f{x}(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
, ("f(x)(g(y))", App (App (Var "f") (Var "x")) (App (Var "g") (Var "y")))
]
parseFailureTests :: IO ()
parseFailureTests = sequence_ $ zipWith testParseFailure [1..]
[ "#"
, "f(x"
, "f(x}"
, "\\fx)"
, "\\f(x,)"
, "\\f(,x)"
]
```

Substitution test helper:

```
testSubstitution :: Int -> (Substitution, Expr, Expr) -> IO ()
testSubstitution k (sub,e1,e2) = do
putStrLn $ "\nsubstitution test " ++ show k
putStrLn $ show sub
putStrLn $ latex e1
if e2 == substitute sub e1
then putStrLn "ok"
else do
putStrLn $ unlines
[ "substitution error:"
, "expected: " ++ show e2
, "actual: " ++ show (substitute sub e1)
]
exitFailure
```

Substitution test cases:

```
substituteTests :: IO ()
substituteTests = sequence_ $ zipWith testSubstitution [1..]
[ ([], Con "a", Con "a")
, ([("a", Con "b")], Var "a", Con "b")
, ([("a", Con "b")], Con "a", Con "a")
, ([("a", App (Con "f") (Var "x"))], App (Con "g") (Var "a"), App (Con "g") (App (Con "f") (Var "x")))
, ([("a", Con "x")], App (Var "a") (Var "a"), App (Con "x") (Con "x"))
, ([("x", Var "y")], App (Var "x") (Var "x"), App (Var "y") (Var "y"))
, ([("x", Var "y")], App (Con "x") (Var "x"), App (Con "x") (Var "y"))
]
```

Validation test helper:

```
testValidate :: Int -> (Rule, Expr, Expr, Bool) -> IO ()
testValidate k (rule,e1,e2,result) = do
putStrLn $ "\nvalidation test " ++ show k
putStrLn $ show rule
putStrLn $ latex e1
putStrLn $ latex e2
if result == validate rule e1 e2
then putStrLn "ok"
else do
putStrLn "validation error!"
exitFailure
```

Validation test cases: