A simple term rewriting tool

Posted on 2018-01-23 by nbloomf

This post is literate Haskell; you can load the source into GHCi and play along.

module TermRewriting where

import Data.List (nub, unfoldr)
import Data.Maybe (catMaybes)
import Text.ParserCombinators.Parsec
import System.Environment
import System.Exit
import System.IO

Today we’ll build a tool that applies term-rewriting rules to expressions in a simplified form of mathematical notation.

My motivation for this comes from the Arithmetic Made Difficult series of posts on this site. There, I’m writing lots of equation-style proofs, and it would be nice to have a tool check that at least some of the steps in those proofs are carried out correctly. I won’t try to build a full-fledged proof checker – building a proof checker that uses a syntax readable by people is a Hard Problem. But a simple term rewriting tool can get us (made up number) 80% of the benefits with 1% of the effort. With this tool, we’ll be able to annotate equational proofs with “justification” information that is machine checked, but also compiles to human readable cross references (hyperlinks).

Grammar

To set the scope of the problem, we’ll focus on a subset of math notation that can be easily translated to something like lambda calculus. The basic ingredients are

1. constants, or atomic expressions, that represent themselves;
2. variables, that represent arbitrary subexpressions; and
3. application of one expression to another.

We can represent this little grammar with the following Expr type.

data Expr
= Con Token
| Var Token
| App Expr Expr
deriving (Eq, Show)

type Token = String

In keeping with $$\LaTeX$$ syntax, we will decree that constants consist of one or more letters prefixed by a \, such as

\plus \times \cons
pCon :: Parser Expr
pCon = do
char '\\'
token <- many1 letter
return (Con token)

We’ll decree that variables consist of one or more letters not prefixed by a \, such as

a b foo munge
pVar :: Parser Expr
pVar = do
token <- many1 letter
return (Var token)

Application is a little trickier. Certainly an expression like f(a) means “the expression f applied to the expression a. The trouble is how to handle functions with more than one argument. We’ll do this by implicitly currying all functions. Remember that any function with signature

f : (a,b) -> c

is equivalent to one with signature

g : a -> b -> c

where the function arrow is right associative. So we can interpret an expression like f(a,b) implicitly as f(a)(b). In particular, we only need to support functions of one argument. But we also need to handle explicitly curried functions, like

f(a)(b,c)

and we also need to allow arguments to be fenced in either parentheses or braces.

fenced :: Parser a -> Parser a
fenced p =
(between (char '(') (char ')') p)
<|> (between (char '{') (char '}') p)

We’ll say that an “expression” is a constant or variable followed by zero or more fenced comma-delimited lists of subexpressions.

pLatexExpr :: Parser Expr
pLatexExpr = do
f <- pCon <|> pVar
args <- option [] $many1$ fenced $sepBy1 pLatexExpr (char ',') return$ foldl App f $concat args Nothing about the Expr type is specific to LaTeX, but (at least for now) this tool will be used specifically to parse and verify equations written in LaTeX. While we’re at it, some parsing helpers: parseWith :: String -> Parser a -> String -> Either String a parseWith loc p text = case runParser (p <* eof) () loc text of Left err -> Left (show err) Right x -> Right x parseWithIO :: String -> Parser a -> String -> IO a parseWithIO loc p text = case parseWith loc p text of Left msg -> putStrLn (unwords [loc,text,msg]) >> exitFailure Right a -> return a And here is an example. $> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"
$> putStrLn$ show x
App (App (Con "plus") (Var "a")) (App (App (Con "times") (Var "b")) (Var "c"))

We’ll also make a function to pretty print Exprs as LaTeX.

latex :: Expr -> String
latex (Con x) = '\\' : x
latex (Var x) = x
latex (App x y) = concat
[ latex x, "(", latex y, ")" ]

For example:

$> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"$> putStrLn $latex x \plus(a)(\times(b)(c)) Note that the round trip generally takes a LaTeX statement to a different, but equivalent, statement. This is the price of simplicity, and doesn’t matter too much. Substitutions A substitution is a mapping from variables to expressions, which we’ll represent using as a list of pairs. type Substitution = [(Token, Expr)] Since lists are not actually maps, we need a helper function to decide whether a given list of token-expression pairs is well-defined. That is, a valid substitution shouldn’t have two pairs with the same token but different expressions. wellDefined :: Substitution -> Bool wellDefined [] = True wellDefined ((x,a):ps) = and [ null (filter (\(y,b) -> (x == y) && (a /= b)) ps) , wellDefined ps ] Now actually making a substitution is straightforward; we march down an Expr, and if we find a variable token replace it with its associated subexpression. We do the simplest thing by replacing all instances of the variable. substitute :: Substitution -> Expr -> Expr substitute _ (Con a) = Con a substitute es (Var x) = case lookup x es of Just e -> e Nothing -> Var x substitute es (App a b) = App (substitute es a) (substitute es b) Next we can try to construct a substitution taking one expression to another, based at the root. Constants match only themselves, variables match anything, and applications match recursively. matches :: Expr -> Expr -> Maybe Substitution matches pattern expr = match pattern expr >>= check where check ps = if wellDefined ps then Just (nub ps) else Nothing match (Con a) (Con b) = if a == b then Just [] else Nothing match (Con _) _ = Nothing match (Var x) e = Just [(x,e)] match (App e1 e2) (App f1 f2) = do xs <- match e1 f1 ys <- match e2 f2 return (xs ++ ys) match (App _ _) _ = Nothing More generally matches can occur anywhere in an expression, and we need to be able to unambiguously state where the match is. The Path type represents a series of directions for moving from the root of an expression to some interior node. data Path = H | L Path | R Path deriving (Eq, Show) And matchesIn constructs a list of all substitutions from one expression to another, along with paths to the root of each substitution. matchesIn :: Expr -> Expr -> [(Path, Substitution)] matchesIn pattern expr = case expr of Con a -> case matches pattern expr of Nothing -> [] Just s -> [(H, s)] Var x -> case matches pattern expr of Nothing -> [] Just s -> [(H, s)] App x y -> do let mx = map (\(p,z) -> (L p, z))$ matchesIn pattern x
my = map (\(p,z) -> (R p, z)) $matchesIn pattern y case matches pattern expr of Nothing -> mx ++ my Just s -> (H,s) : mx ++ my Given a Path, we can (attempt to) transform the subexpression it points to. transform :: (Expr -> Expr) -> Path -> Expr -> Maybe Expr transform f H expr = Just (f expr) transform f (L path) (App expr w) = do q <- transform f path expr Just (App q w) transform f (R path) (App w expr) = do q <- transform f path expr Just (App w q) transform _ _ _ = Nothing As a special case, rewrite replaces the subexpression at a given location. rewrite :: Expr -> Path -> Expr -> Maybe Expr rewrite p = transform (const p) Substituting application expressions, where we want to match the arguments, is a little different. funMatch :: Expr -> Expr -> Maybe Substitution funMatch (Con a) (Con b) = if a == b then return [] else Nothing funMatch (Var x) (Var y) = if x == y then return [] else Nothing funMatch (App e (Var x)) (App f g) = do fmap ((x,g):)$ funMatch e f
funMatch _ _ = Nothing

funSub :: Expr -> Expr -> Expr -> Maybe Expr
funSub _ _ (Con a) = return (Con a)
funSub _ _ (Var x) = return (Var x)
funSub repl func (App f x) = do
y <- funSub repl func x
g <- funSub repl func f
case funMatch func (App g y) of
Nothing  -> return $App g y Just sub -> if wellDefined sub then return$ substitute sub repl
else Nothing

Rewrite Rules

A rewrite rule is a pair of expressions that we interpret as being “equal” for all possible substitutions.

type Rule = (Expr, Expr)

In our latexy syntax, we’ll say a rewrite rule is two expressions, separated by an equals sign, and separated by spaces.

pRule :: Parser Expr -> Parser Rule
pRule p = do
a <- p
spaces
char '='
spaces
b <- p
return (a,b)

To apply a rewrite rule to an expression, we find all matches of the left hand side in the expression, substitute these into the right hand side, and then rewrite the expression.

applyRule :: Rule -> Expr -> [Expr]
applyRule (lhs,rhs) x =
catMaybes $map applyMatch$ matchesIn lhs x
where
applyMatch (path,sub) = rewrite (substitute sub rhs) path x

To validate a rule application – to check that a = b maps e to f – we apply the rule to e and see if f is among the resuls.

validate :: Rule -> Expr -> Expr -> Bool
validate rule input output = elem output (applyRule rule input)

Most of the rewrite rules and expressions we’ll deal with are symmetric; we don’t mind reversing the order of the expressions in the rewrite rule, and the roles of the input and output expressions. For this case we’ll use a separate validation function.

validateSymmetric :: Rule -> Expr -> Expr -> Bool
validateSymmetric (lhs,rhs) e f = or
[ validate (lhs,rhs) e f
, validate (rhs,lhs) e f
, validate (lhs,rhs) f e
, validate (rhs,lhs) f e
]

Interface

Remember: this tool is specifically intended for use with the Arithmetic Made Difficult posts, which include a few thousand lines worth of equational proofs. Those posts need two kinds of support.

1. We’d like to statically check the equalities in our equational proofs. Some equalities are annotated with a cross reference to the theorem that justifies them. Some sed magic can turn these into rewrite rules, which we can use to verify that the left hand side of an equality rewrites to the right hand side. Verifying one equality at a time, rather than an entire proof, makes it possible to build up static checks incrementally.
2. We’d also like to statically check variable substitutions. This is similar to scenario 1, except instead of rewriting a single (complex) subexpression, in this case we only care about replacing all instances of a variable with some expression and validating the result.
3. If an equality doesn’t have a cross referenced annotation, it’d be nice if our tool could also give suggestions for valid cross references based on a dictionary of rewrite rules. Bonus points if it also builds an edit script for inserting the annotation (hint, it will).

These requirements suggest that the tool needs three modes, which we’ll call verify mode, substitute mode, and suggest mode, respectively.

We’ll keep this really simple. In verify mode, the tool expects its input on stdin, one rewrite check per line, in the form

location TAB rulelhs = rulerhs TAB from-expr TAB to-expr

Where location is some information about where the check comes from (for reporting problems). So we’ll need to split a line on tabs:

unTab :: String -> [String]
unTab = unfoldr getCell
where
getCell [] = Nothing
getCell xs =
let
(cell, z) = span (/= '\t') xs
rest = if null z then [] else tail z
in
Just (cell, rest)

And we’ll need to process one untabbed line. We’re assuming that the rewrite rules are symmetric; given x = y and two expressions e and f, any substitution taking either e to f or f to e is considered valid. This is ok for the equation chains we want to use this on.

processVerify :: [String] -> IO ()
processVerify [loc,r,a,b] = do
rule <- parseWithIO loc (pRule pLatexExpr) r
e <- parseWithIO loc pLatexExpr a
f <- parseWithIO loc pLatexExpr b
if validateSymmetric rule e f
then return ()
else putStrLn $unwords [loc,"invalid!",r,"::",a,"-->",b] processVerify xs = do putStrLn$ unwords ("unrecognized input format:" : xs)

In substitute mode, we’ll accept input in the same format as for verify mode, with an extra check on the form of the rewrite rule.

processSubstitute :: [String] -> IO ()
processSubstitute [loc,r,a,b] = do
(lhs,rhs) <- parseWithIO loc (pRule pLatexExpr) r
e <- parseWithIO loc pLatexExpr a
f <- parseWithIO loc pLatexExpr b
case lhs of
Var x -> if (f == substitute [(x,rhs)] e) || (e == substitute [(x,rhs)] f)
then return ()
else putStrLn $unwords [loc,"invalid!",r,"::",a,"-->",b] App _ _ -> if (Just f == funSub rhs lhs e) || (Just e == funSub rhs lhs e) then return () else putStrLn$ unwords [loc,"invalid!",r,"::",a,"-->",b]
_ -> do
putStrLn $unwords [loc,"rewrite rule must be of the form 'var = expr':",r] processSubstitute xs = do putStrLn$ unwords ("unrecognized input:" : xs)

In suggest mode, we’ll assume that a dictionary of named rewrite rules is kept in some external file consisting of lines of the form

name TAB rulelhs = rulerhs
parseNamedRule :: [String] -> IO (String,(Expr,Expr))
parseNamedRule [name,r] = do
(x,y) <- parseWithIO "-" (pRule pLatexExpr) r
return (name,(x,y))
parseNamedRule xs = do
putStrLn $unwords ("unrecognized input:":xs) exitFailure We’ll also allow comment lines in the dictionary; these start with a # character and are ignored. comment :: String -> Bool comment ('#':_) = True comment _ = False We also assume backslashes are escaped in the dictionary because reasons. unesc :: String -> String unesc ('\\':'\\':xs) = '\\' : unesc xs unesc (c:cs) = c : unesc cs unesc [] = [] It will be more useful if in suggest mode, we ignore parse errors. Suggestion mode marches down the rules in the dictionary, and if it finds one, emits a sed command to insert an annotation. This part is completely ad-hoc. processSuggest :: [(String,(Expr,Expr))] -> [String] -> IO () processSuggest ((name,(x,y)):rs) [loc,line,a,b] = do lhs <- case parseWith loc pLatexExpr a of Left _ -> return (Con "") Right w -> return w rhs <- case parseWith loc pLatexExpr b of Left _ -> return (Con "") Right w -> return w if validateSymmetric (x,y) lhs rhs then do putStrLn$ unwords
[ "sed -i ''"
, "'"++line++"s/^ & = & / \\&     \\\\href{" ++ name ++ "}~   = \\& /'"
, loc
]
else processSuggest rs [loc,line,a,b]
processSuggest [] [_,_,_,_] = return ()
processSuggest _ xs = do
putStrLn $unwords xs And main. In a departure from the unix philosophy, in verify mode we’ll report the number of checks made. main :: IO () main = do args <- getArgs case args of ["--verify"] -> do checks <- fmap (map unTab . lines) getContents mapM_ processVerify checks hPutStrLn stderr$ unwords [show $length checks, "checks completed"] ["--substitute"] -> do checks <- fmap (map unTab . lines) getContents mapM_ processSubstitute checks hPutStrLn stderr$ unwords [show $length checks, "checks completed"] ["--suggest", path] -> do rules <- (fmap (map unTab . filter (not . comment) . lines . unesc)$ readFile path)
>>= mapM parseNamedRule
checks <- fmap (map unTab . lines) getContents
putStrLn "#!/bin/bash"
mapM_ (processSuggest rules) checks

_ -> do
putStrLn $unlines [ "usage:" , " --verify : validate rewrites on stdin" , " --substitute : validate substitutions on stdin" , " --suggest PATH : suggest rewrite rules from PATH on stdin" ] Testing The parts of this tool are complicated enough that I’ll feel better having some tests. test_main :: IO () test_main = do parseSuccessTests parseFailureTests substituteTests validationTests Parsing tests helpers: testParseSuccess :: Int -> (String, Expr) -> IO () testParseSuccess k (str,expr) = do putStrLn$ "\nsuccess test " ++ show k
putStrLn str
case parseWith "" pLatexExpr str of
Left msg -> putStrLn msg >> exitFailure
Right ex -> if ex == expr
then putStrLn "ok"
else do
putStrLn $unlines [ "parse error:" , str , "expected: " ++ show expr , "actual: " ++ show ex ] exitFailure testParseFailure :: Int -> String -> IO () testParseFailure k str = do putStrLn$ "\nfailure test " ++ show k
putStrLn str
case parseWith "" pLatexExpr str of
Right ex -> putStrLn (show ex) >> exitFailure
Left _   -> putStrLn "ok"

Parsing test cases:

parseSuccessTests :: IO ()
parseSuccessTests = sequence_ $zipWith testParseSuccess [1..] [ ("f", Var "f") , ("\\f", Con "f") , ("f(x)", App (Var "f") (Var "x")) , ("\\f(x)", App (Con "f") (Var "x")) , ("f(\\x)", App (Var "f") (Con "x")) , ("\\f(\\x)", App (Con "f") (Con "x")) , ("f{x}", App (Var "f") (Var "x")) , ("\\f{x}", App (Con "f") (Var "x")) , ("f{\\x}", App (Var "f") (Con "x")) , ("\\f{\\x}", App (Con "f") (Con "x")) , ("f(x,y)", App (App (Var "f") (Var "x")) (Var "y")) , ("\\f(x,y)", App (App (Con "f") (Var "x")) (Var "y")) , ("f(x)(y)", App (App (Var "f") (Var "x")) (Var "y")) , ("f{x}(y)", App (App (Var "f") (Var "x")) (Var "y")) , ("f(x,y)(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z")) , ("f(x)(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z")) , ("f{x,y}(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z")) , ("f{x}(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z")) , ("f(x)(g(y))", App (App (Var "f") (Var "x")) (App (Var "g") (Var "y"))) ] parseFailureTests :: IO () parseFailureTests = sequence_$ zipWith testParseFailure [1..]
[ "#"
, "f(x"
, "f(x}"
, "\\fx)"
, "\\f(x,)"
, "\\f(,x)"
]

Substitution test helper:

testSubstitution :: Int -> (Substitution, Expr, Expr) -> IO ()
testSubstitution k (sub,e1,e2) = do
putStrLn $"\nsubstitution test " ++ show k putStrLn$ show sub
putStrLn $latex e1 if e2 == substitute sub e1 then putStrLn "ok" else do putStrLn$ unlines
[ "substitution error:"
, "expected: " ++ show e2
, "actual:   " ++ show (substitute sub e1)
]
exitFailure

Substitution test cases:

substituteTests :: IO ()
substituteTests = sequence_ $zipWith testSubstitution [1..] [ ([], Con "a", Con "a") , ([("a", Con "b")], Var "a", Con "b") , ([("a", Con "b")], Con "a", Con "a") , ([("a", App (Con "f") (Var "x"))], App (Con "g") (Var "a"), App (Con "g") (App (Con "f") (Var "x"))) , ([("a", Con "x")], App (Var "a") (Var "a"), App (Con "x") (Con "x")) , ([("x", Var "y")], App (Var "x") (Var "x"), App (Var "y") (Var "y")) , ([("x", Var "y")], App (Con "x") (Var "x"), App (Con "x") (Var "y")) ] Validation test helper: testValidate :: Int -> (Rule, Expr, Expr, Bool) -> IO () testValidate k (rule,e1,e2,result) = do putStrLn$ "\nvalidation test " ++ show k
putStrLn $show rule putStrLn$ latex e1
putStrLn $latex e2 if result == validate rule e1 e2 then putStrLn "ok" else do putStrLn "validation error!" exitFailure Validation test cases: validationTests :: IO () validationTests = sequence_$ zipWith testValidate [1..]
[
]