A simple term rewriting tool

Posted on 2018-01-23 by nbloomf
Tags: literate-haskell

This post is literate Haskell; you can load the source into GHCi and play along.


module TermRewriting where

import Data.List (nub, unfoldr)
import Data.Maybe (catMaybes)
import Control.Monad (guard)
import Text.ParserCombinators.Parsec
import System.Environment
import System.Exit
import System.IO

Today we’ll build a tool that applies term-rewriting rules to expressions in a simplified form of mathematical notation.

My motivation for this comes from the Arithmetic Made Difficult series of posts on this site. There, I’m writing lots of equation-style proofs, and it would be nice to have a tool check that at least some of the steps in those proofs are carried out correctly. I won’t try to build a full-fledged proof checker – building a proof checker that uses a syntax readable by people is a Hard Problem. But a simple term rewriting tool can get us (made up number) 80% of the benefits with 1% of the effort. With this tool, we’ll be able to annotate equational proofs with “justification” information that is machine checked, but also compiles to human readable cross references (hyperlinks).

Grammar

To set the scope of the problem, we’ll focus on a subset of math notation that can be easily translated to something like lambda calculus. The basic ingredients are

  1. constants, or atomic expressions, that represent themselves;
  2. variables, that represent arbitrary subexpressions; and
  3. application of one expression to another.

We can represent this little grammar with the following Expr type.

data Expr
  = Con Token
  | Var Token
  | App Expr Expr
  deriving (Eq, Show)

type Token = String

In keeping with \(\LaTeX\) syntax, we will decree that constants consist of one or more letters prefixed by a \, such as

\plus \times \cons
pCon :: Parser Expr
pCon = do
  char '\\'
  token <- many1 letter
  return (Con token)

We’ll decree that variables consist of one or more letters not prefixed by a \, such as

a b foo munge
pVar :: Parser Expr
pVar = do
  token <- many1 letter
  return (Var token)

Application is a little trickier. Certainly an expression like f(a) means “the expression f applied to the expression a. The trouble is how to handle functions with more than one argument. We’ll do this by implicitly currying all functions. Remember that any function with signature

f : (a,b) -> c

is equivalent to one with signature

g : a -> b -> c

where the function arrow is right associative. So we can interpret an expression like f(a,b) implicitly as f(a)(b). In particular, we only need to support functions of one argument. But we also need to handle explicitly curried functions, like

f(a)(b,c)

and we also need to allow arguments to be fenced in either parentheses or braces.

fenced :: Parser a -> Parser a
fenced p =
      (between (char '(') (char ')') p)
  <|> (between (char '{') (char '}') p)

We’ll say that an “expression” is a constant or variable followed by zero or more fenced comma-delimited lists of subexpressions.

pLatexExpr :: Parser Expr
pLatexExpr = do
  f <- pCon <|> pVar
  args <- option [] $ many1 $ fenced $ sepBy1 pLatexExpr (char ',')
  return $ foldl App f $ concat args

Nothing about the Expr type is specific to LaTeX, but (at least for now) this tool will be used specifically to parse and verify equations written in LaTeX.

While we’re at it, some parsing helpers:

parseWith :: String -> Parser a -> String -> Either String a
parseWith loc p text = case runParser (p <* eof) () loc text of
  Left err -> Left (show err)
  Right x -> Right x

parseWithIO :: String -> Parser a -> String -> IO a
parseWithIO loc p text = case parseWith loc p text of
  Left msg -> putStrLn (unwords [loc,text,msg]) >> exitFailure
  Right a  -> return a

And here is an example.

$> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"
$> putStrLn $ show x
App (App (Con "plus") (Var "a")) (App (App (Con "times") (Var "b")) (Var "c"))

We’ll also make a function to pretty print Exprs as LaTeX.

latex :: Expr -> String
latex (Con x) = '\\' : x
latex (Var x) = x
latex (App x y) = concat
  [ latex x, "(", latex y, ")" ]

For example:

$> x <- parseWithIO "" pLatexExpr "\\plus(a,\\times(b,c))"
$> putStrLn $ latex x
\plus(a)(\times(b)(c))

Note that the round trip generally takes a LaTeX statement to a different, but equivalent, statement. This is the price of simplicity, and doesn’t matter too much.

Substitutions

A substitution is a mapping from variables to expressions, which we’ll represent using as a list of pairs.

type Substitution = [(Token, Expr)]

Since lists are not actually maps, we need a helper function to decide whether a given list of token-expression pairs is well-defined. That is, a valid substitution shouldn’t have two pairs with the same token but different expressions.

wellDefined :: Substitution -> Bool
wellDefined [] = True
wellDefined ((x,a):ps) = and
  [ null (filter (\(y,b) -> (x == y) && (a /= b)) ps)
  , wellDefined ps
  ]

Now actually making a substitution is straightforward; we march down an Expr, and if we find a variable token replace it with its associated subexpression. We do the simplest thing by replacing all instances of the variable.

substitute :: Substitution -> Expr -> Expr
substitute _ (Con a) =
  Con a
substitute es (Var x) =
  case lookup x es of
    Just e  -> e
    Nothing -> Var x
substitute es (App a b) =
  App (substitute es a) (substitute es b)

Next we can try to construct a substitution taking one expression to another, based at the root. Constants match only themselves, variables match anything, and applications match recursively.

matches :: Expr -> Expr -> Maybe Substitution
matches pattern expr = match pattern expr >>= check
  where
    check ps = if wellDefined ps
      then Just (nub ps)
      else Nothing

    match (Con a) (Con b) =
      if a == b then Just [] else Nothing
    match (Con _) _ =
      Nothing
    match (Var x) e =
      Just [(x,e)]
    match (App e1 e2) (App f1 f2) = do
      xs <- match e1 f1
      ys <- match e2 f2
      return (xs ++ ys)
    match (App _ _) _ =
      Nothing

More generally matches can occur anywhere in an expression, and we need to be able to unambiguously state where the match is. The Path type represents a series of directions for moving from the root of an expression to some interior node.

data Path
  = H | L Path | R Path
  deriving (Eq, Show)

And matchesIn constructs a list of all substitutions from one expression to another, along with paths to the root of each substitution.

matchesIn :: Expr -> Expr -> [(Path, Substitution)]
matchesIn pattern expr = case expr of
  Con a -> case matches pattern expr of
    Nothing -> []
    Just s  -> [(H, s)]
  Var x -> case matches pattern expr of
    Nothing -> []
    Just s  -> [(H, s)]
  App x y -> do
    let
      mx = map (\(p,z) -> (L p, z)) $ matchesIn pattern x
      my = map (\(p,z) -> (R p, z)) $ matchesIn pattern y
    case matches pattern expr of
      Nothing -> mx ++ my
      Just s  -> (H,s) : mx ++ my

Given a Path, we can (attempt to) transform the subexpression it points to.

transform :: (Expr -> Expr) -> Path -> Expr -> Maybe Expr
transform f H expr = Just (f expr)
transform f (L path) (App expr w) = do
  q <- transform f path expr
  Just (App q w)
transform f (R path) (App w expr) = do
  q <- transform f path expr
  Just (App w q)
transform _ _ _ = Nothing

As a special case, rewrite replaces the subexpression at a given location.

rewrite :: Expr -> Path -> Expr -> Maybe Expr
rewrite p = transform (const p)

Substituting application expressions, where we want to match the arguments, is a little different.

funMatch :: Expr -> Expr -> Maybe Substitution
funMatch (Con a) (Con b) =
  if a == b then return [] else Nothing
funMatch (Var x) (Var y) =
  if x == y then return [] else Nothing
funMatch (App e (Var x)) (App f g) = do
  fmap ((x,g):) $ funMatch e f
funMatch _ _ = Nothing

funSub :: Expr -> Expr -> Expr -> Maybe Expr
funSub _ _ (Con a) = return (Con a)
funSub _ _ (Var x) = return (Var x)
funSub repl func (App f x) = do
  y <- funSub repl func x
  g <- funSub repl func f
  case funMatch func (App g y) of
    Nothing  -> return $ App g y
    Just sub -> if wellDefined sub
      then return $ substitute sub repl
      else Nothing

Rewrite Rules

A rewrite rule is a pair of expressions that we interpret as being “equal” for all possible substitutions.

type Rule = (Expr, Expr)

In our latexy syntax, we’ll say a rewrite rule is two expressions, separated by an equals sign, and separated by spaces.

pRule :: Parser Expr -> Parser Rule
pRule p = do
  a <- p
  spaces
  char '='
  spaces
  b <- p
  return (a,b)

To apply a rewrite rule to an expression, we find all matches of the left hand side in the expression, substitute these into the right hand side, and then rewrite the expression.

applyRule :: Rule -> Expr -> [Expr]
applyRule (lhs,rhs) x =
  catMaybes $ map applyMatch $ matchesIn lhs x
  where
    applyMatch (path,sub) = rewrite (substitute sub rhs) path x

To validate a rule application – to check that a = b maps e to f – we apply the rule to e and see if f is among the resuls.

validate :: Rule -> Expr -> Expr -> Bool
validate rule input output = elem output (applyRule rule input)

Most of the rewrite rules and expressions we’ll deal with are symmetric; we don’t mind reversing the order of the expressions in the rewrite rule, and the roles of the input and output expressions. For this case we’ll use a separate validation function.

validateSymmetric :: Rule -> Expr -> Expr -> Bool
validateSymmetric (lhs,rhs) e f = or
  [ validate (lhs,rhs) e f
  , validate (rhs,lhs) e f
  , validate (lhs,rhs) f e
  , validate (rhs,lhs) f e
  ]

Interface

Remember: this tool is specifically intended for use with the Arithmetic Made Difficult posts, which include a few thousand lines worth of equational proofs. Those posts need two kinds of support.

  1. We’d like to statically check the equalities in our equational proofs. Some equalities are annotated with a cross reference to the theorem that justifies them. Some sed magic can turn these into rewrite rules, which we can use to verify that the left hand side of an equality rewrites to the right hand side. Verifying one equality at a time, rather than an entire proof, makes it possible to build up static checks incrementally.
  2. We’d also like to statically check variable substitutions. This is similar to scenario 1, except instead of rewriting a single (complex) subexpression, in this case we only care about replacing all instances of a variable with some expression and validating the result.
  3. If an equality doesn’t have a cross referenced annotation, it’d be nice if our tool could also give suggestions for valid cross references based on a dictionary of rewrite rules. Bonus points if it also builds an edit script for inserting the annotation (hint, it will).

These requirements suggest that the tool needs three modes, which we’ll call verify mode, substitute mode, and suggest mode, respectively.

We’ll keep this really simple. In verify mode, the tool expects its input on stdin, one rewrite check per line, in the form

location TAB rulelhs = rulerhs TAB from-expr TAB to-expr

Where location is some information about where the check comes from (for reporting problems). So we’ll need to split a line on tabs:

unTab :: String -> [String]
unTab = unfoldr getCell
  where
    getCell [] = Nothing
    getCell xs =
      let
        (cell, z) = span (/= '\t') xs
        rest = if null z then [] else tail z
      in
        Just (cell, rest)

And we’ll need to process one untabbed line. We’re assuming that the rewrite rules are symmetric; given x = y and two expressions e and f, any substitution taking either e to f or f to e is considered valid. This is ok for the equation chains we want to use this on.

processVerify :: [String] -> IO ()
processVerify [loc,r,a,b] = do
  rule <- parseWithIO loc (pRule pLatexExpr) r
  e <- parseWithIO loc pLatexExpr a
  f <- parseWithIO loc pLatexExpr b
  if validateSymmetric rule e f
    then return ()
    else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
processVerify xs = do
  putStrLn $ unwords ("unrecognized input format:" : xs)

In substitute mode, we’ll accept input in the same format as for verify mode, with an extra check on the form of the rewrite rule.

processSubstitute :: [String] -> IO ()
processSubstitute [loc,r,a,b] = do
  (lhs,rhs) <- parseWithIO loc (pRule pLatexExpr) r
  e <- parseWithIO loc pLatexExpr a
  f <- parseWithIO loc pLatexExpr b
  case lhs of
    Var x -> if (f == substitute [(x,rhs)] e) || (e == substitute [(x,rhs)] f)
      then return ()
      else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
    App _ _ -> if (Just f == funSub rhs lhs e) || (Just e == funSub rhs lhs e)
      then return ()
      else putStrLn $ unwords [loc,"invalid!",r,"::",a,"-->",b]
    _ -> do
      putStrLn $ unwords [loc,"rewrite rule must be of the form 'var = expr':",r]
processSubstitute xs = do
  putStrLn $ unwords ("unrecognized input:" : xs)

In suggest mode, we’ll assume that a dictionary of named rewrite rules is kept in some external file consisting of lines of the form

name TAB rulelhs = rulerhs
parseNamedRule :: [String] -> IO (String,(Expr,Expr))
parseNamedRule [name,r] = do
  (x,y) <- parseWithIO "-" (pRule pLatexExpr) r
  return (name,(x,y))
parseNamedRule xs = do
  putStrLn $ unwords ("unrecognized input:":xs)
  exitFailure

We’ll also allow comment lines in the dictionary; these start with a # character and are ignored.

comment :: String -> Bool
comment ('#':_) = True
comment _ = False

We also assume backslashes are escaped in the dictionary because reasons.

unesc :: String -> String
unesc ('\\':'\\':xs) = '\\' : unesc xs
unesc (c:cs) = c : unesc cs
unesc [] = []

It will be more useful if in suggest mode, we ignore parse errors. Suggestion mode marches down the rules in the dictionary, and if it finds one, emits a sed command to insert an annotation. This part is completely ad-hoc.

processSuggest :: [(String,(Expr,Expr))] -> [String] -> IO ()
processSuggest ((name,(x,y)):rs) [loc,line,a,b] = do
  lhs <- case parseWith loc pLatexExpr a of
    Left _ -> return (Con "")
    Right w -> return w
  rhs <- case parseWith loc pLatexExpr b of
    Left _ -> return (Con "")
    Right w -> return w
  if validateSymmetric (x,y) lhs rhs
    then do
      putStrLn $ unwords
        [ "sed -i ''"
        , "'"++line++"s/^ & = & / \\&     \\\\href{" ++ name ++ "}~   = \\& /'"
        , loc
        ]
      else processSuggest rs [loc,line,a,b]
processSuggest [] [_,_,_,_] = return ()
processSuggest _ xs = do
  putStrLn $ unwords xs

And main. In a departure from the unix philosophy, in verify mode we’ll report the number of checks made.

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["--verify"] -> do
      checks <- fmap (map unTab . lines) getContents
      mapM_ processVerify checks
      hPutStrLn stderr $ unwords [show $ length checks, "checks completed"]

    ["--substitute"] -> do
      checks <- fmap (map unTab . lines) getContents
      mapM_ processSubstitute checks
      hPutStrLn stderr $ unwords [show $ length checks, "checks completed"]

    ["--suggest", path] -> do
      rules <- (fmap (map unTab . filter (not . comment) . lines . unesc) $ readFile path)
        >>= mapM parseNamedRule
      checks <- fmap (map unTab . lines) getContents
      putStrLn "#!/bin/bash"
      mapM_ (processSuggest rules) checks

    _ -> do
      putStrLn $ unlines
        [ "usage:"
        , "  --verify       : validate rewrites on stdin"
        , "  --substitute   : validate substitutions on stdin"
        , "  --suggest PATH : suggest rewrite rules from PATH on stdin"
        ]

Testing

The parts of this tool are complicated enough that I’ll feel better having some tests.

test_main :: IO ()
test_main = do
  parseSuccessTests
  parseFailureTests
  substituteTests
  validationTests

Parsing tests helpers:

testParseSuccess :: Int -> (String, Expr) -> IO ()
testParseSuccess k (str,expr) = do
  putStrLn $ "\nsuccess test " ++ show k
  putStrLn str
  case parseWith "" pLatexExpr str of
    Left msg -> putStrLn msg >> exitFailure
    Right ex -> if ex == expr
      then putStrLn "ok"
      else do
        putStrLn $ unlines
          [ "parse error:"
          , str
          , "expected: " ++ show expr
          , "actual:   " ++ show ex
          ]
        exitFailure

testParseFailure :: Int -> String -> IO ()
testParseFailure k str = do
  putStrLn $ "\nfailure test " ++ show k
  putStrLn str
  case parseWith "" pLatexExpr str of
    Right ex -> putStrLn (show ex) >> exitFailure
    Left _   -> putStrLn "ok"

Parsing test cases:

parseSuccessTests :: IO ()
parseSuccessTests = sequence_ $ zipWith testParseSuccess [1..]
  [ ("f", Var "f")
  , ("\\f", Con "f")
  , ("f(x)", App (Var "f") (Var "x"))
  , ("\\f(x)", App (Con "f") (Var "x"))
  , ("f(\\x)", App (Var "f") (Con "x"))
  , ("\\f(\\x)", App (Con "f") (Con "x"))
  , ("f{x}", App (Var "f") (Var "x"))
  , ("\\f{x}", App (Con "f") (Var "x"))
  , ("f{\\x}", App (Var "f") (Con "x"))
  , ("\\f{\\x}", App (Con "f") (Con "x"))
  , ("f(x,y)", App (App (Var "f") (Var "x")) (Var "y"))
  , ("\\f(x,y)", App (App (Con "f") (Var "x")) (Var "y"))
  , ("f(x)(y)", App (App (Var "f") (Var "x")) (Var "y"))
  , ("f{x}(y)", App (App (Var "f") (Var "x")) (Var "y"))
  , ("f(x,y)(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
  , ("f(x)(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
  , ("f{x,y}(z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
  , ("f{x}(y,z)", App (App (App (Var "f") (Var "x")) (Var "y")) (Var "z"))
  , ("f(x)(g(y))", App (App (Var "f") (Var "x")) (App (Var "g") (Var "y")))
  ]

parseFailureTests :: IO ()
parseFailureTests = sequence_ $ zipWith testParseFailure [1..]
  [ "#"
  , "f(x"
  , "f(x}"
  , "\\fx)"
  , "\\f(x,)"
  , "\\f(,x)"
  ]

Substitution test helper:

testSubstitution :: Int -> (Substitution, Expr, Expr) -> IO ()
testSubstitution k (sub,e1,e2) = do
  putStrLn $ "\nsubstitution test " ++ show k
  putStrLn $ show sub
  putStrLn $ latex e1
  if e2 == substitute sub e1
    then putStrLn "ok"
    else do
      putStrLn $ unlines
        [ "substitution error:"
        , "expected: " ++ show e2
        , "actual:   " ++ show (substitute sub e1)
        ]
      exitFailure

Substitution test cases:

substituteTests :: IO ()
substituteTests = sequence_ $ zipWith testSubstitution [1..]
  [ ([], Con "a", Con "a")
  , ([("a", Con "b")], Var "a", Con "b")
  , ([("a", Con "b")], Con "a", Con "a")
  , ([("a", App (Con "f") (Var "x"))], App (Con "g") (Var "a"), App (Con "g") (App (Con "f") (Var "x")))
  , ([("a", Con "x")], App (Var "a") (Var "a"), App (Con "x") (Con "x"))
  , ([("x", Var "y")], App (Var "x") (Var "x"), App (Var "y") (Var "y"))
  , ([("x", Var "y")], App (Con "x") (Var "x"), App (Con "x") (Var "y"))
  ]

Validation test helper:

testValidate :: Int -> (Rule, Expr, Expr, Bool) -> IO ()
testValidate k (rule,e1,e2,result) = do
  putStrLn $ "\nvalidation test " ++ show k
  putStrLn $ show rule
  putStrLn $ latex e1
  putStrLn $ latex e2
  if result == validate rule e1 e2
    then putStrLn "ok"
    else do
      putStrLn "validation error!"
      exitFailure

Validation test cases:

validationTests :: IO ()
validationTests = sequence_ $ zipWith testValidate [1..]
  [
  ]