In Munch we built a little parser combinator library. Now let’s test it.
This module consists of a list of parsers, together with lists of good inputs (with expected outputs) and bad inputs (with expected errors). These examples form an automated test suite for the library.
First the usual compiler noises.
module Main (main) where
import Text.ParserCombinators.Munch
import Control.Applicative
import System.Exit (exitFailure)
Although the Munch library can operate on any type implementing the Token
class, these examples will only test the built-in Char
instance.
runCharParse
:: Parser Char a
-> String
-> Either (ParseError Char) a
runCharParse q str =
runParser q (toStream str)
parse_results
is a helper function taking a parser and a list of good inputs and outputs; for each example it checks the actual output against the expectation. The entire suite fails if any example fails.
parse_results
:: (Eq a, Show a) => Parser Char a -> [(String, a)] -> IO ()
parse_results = mapM_ . parse_result_is
where
parse_result_is
:: (Eq a, Show a) => Parser Char a -> (String, a) -> IO ()
parse_result_is q (str, x) =
case runCharParse q str of
Left err -> do
putStrLn $ '\n' : displayParseError err
exitFailure
Right a -> if a == x
then return ()
else do
putStrLn "=> Value Error!"
putStrLn "=> Expected:"
putStrLn $ show x
putStrLn "=> Actual:"
putStrLn $ show a
exitFailure
Similarly, parse_failures
takes a parser and a list of bad inputs and outputs, failing if actual output doesn’t match.
parse_failures
:: (Eq a, Show a)
=> Parser Char a -> [(String, ParseError Char)] -> IO ()
parse_failures = mapM_ . parse_failure_is
where
parse_failure_is
:: (Eq a, Show a)
=> Parser Char a -> (String, ParseError Char) -> IO ()
parse_failure_is q (str, fail) =
case runCharParse q str of
Right a -> do
putStrLn $ '\n' : "Expected failure:"
putStrLn $ show a
exitFailure
Left err -> if err == fail
then return ()
else do
putStrLn "=> Error Error!"
putStrLn "=> Expected:"
putStrLn $ show fail
putStrLn "=> Actual:"
putStrLn $ show err
exitFailure
Now test_parser
wraps all this up, running good and bad tests for a given parser and reporting the result.
test_parser
:: (Eq a, Show a)
=> String -- ^ Test label
-> Parser Char a
-> [(String, a)] -- ^ Good cases
-> [(String, ParseError Char)] -- ^ Bad cases
-> IO ()
test_parser name p oks fails = do
putStrLn $ "==> " ++ name
parse_results p oks
parse_failures p fails
let k = length oks + length fails
putStrLn $ "OK: " ++ show k ++ " cases"
Let’s start simple with char
. There’s only one way this parser can succeed, and a few ways it can fail.
p_00 = char 'a'
ok_00 =
[ ("a", 'a')
]
fail_00 =
[ ("ab", Simply $ IncompleteParse $ Just (Pos 1 2))
, ("b", Simply $ UnexpectedToken 'b' (Just 'a') (Pos 1 1))
, ("", Simply $ UnexpectedEOF $ Right (Just 'a'))
]
Next is anyToken
.
p_01 = anyToken
ok_01 =
[ ("a", 'a')
, ("\n", '\n')
]
fail_01 =
[ ("", Simply $ UnexpectedEOF $ Left "any token")
, ("ab", Simply $ IncompleteParse $ Just (Pos 1 2))
]
Now to try many
. Note the failure case "a"
here; many decimalDigit
looks for 0 or more decimal digit, and finds none. This is a successful parse. Control then passes "a"
to eof
, which fails when it encounters 'a'
. So the actual failure message is raised by eof
.
p_02 = many decimalDigit <* eof
ok_02 =
[ ("1", "1")
, ("463", "463")
, ("", "")
]
fail_02 =
[ ("a", Simply $ UnexpectedToken 'a' Nothing (Pos 1 1))
]
Here’s a test using <|>
and <?>
. Note the difference in behavior between this parser and the last that arises because we’re using >>
rather than <*
.
p_03 = (smol <|> big) >> eof
where
smol = some lowerLatin <?> "little word"
big = some upperLatin <?> "big word"
ok_03 =
[ ("abc", ())
, ("ABC", ())
]
fail_03 =
[ ("abcD", Simply $ UnexpectedToken 'D' Nothing (Pos 1 4))
, ( ""
, OneOf
[ Because (Note "little word" Nothing)
(Simply (UnexpectedEOF
(Left "lower case latin letter (a-z)")))
, Because (Note "big word" Nothing)
(Simply (UnexpectedEOF
(Left "upper case latin letter (A-Z)")))
]
)
, ( "5"
, OneOf
[ Because (Note "little word" (Just (Pos 1 1)))
(Simply $ UnexpectedSatisfy '5'
"lower case latin letter (a-z)" (Pos 1 1))
, Because (Note "big word" (Just (Pos 1 1)))
(Simply $ UnexpectedSatisfy '5'
"upper case latin letter (A-Z)" (Pos 1 1))
]
)
]
Here’s a parser for a simplified telephone number format. These numbers have three ternary digits with an optional one-digit area code, and are formatted as either (N) N-NN
or N-NN
.
p_04 = (no_area_code <|> with_area_code) >> eof
where
no_area_code =
digit >> char '-' >> digit >> digit
with_area_code =
char '(' >> digit >> char ')' >> char ' ' >> no_area_code
digit =
satisfies (\c -> elem c ['0','1','2']) "ternary digit"
ok_04 =
[ ("1-11", ())
, ("(2) 1-01", ())
]
fail_04 =
[ ( "1-13"
, OneOf
[ Simply $ UnexpectedSatisfy '3' "ternary digit" (Pos 1 4)
, Simply $ UnexpectedToken '1' (Just '(') (Pos 1 1)
]
)
]
Here’s a parser testing positive and negative lookahead.
p_05 = do
wouldSucceed $ some (char 'a' <|> char 'b')
some (char 'a' <|> char 'b')
wouldFail $ char '%'
many (char '_')
return ()
ok_05 =
[ ("a", ())
, ("babbab____", ())
]
fail_05 =
[ ( "c"
, Because (Lookahead (Just (Pos {line = 1, column = 1})))
(OneOf
[ Simply (UnexpectedToken 'c' (Just 'a') (Pos 1 1))
, Simply (UnexpectedToken 'c' (Just 'b') (Pos 1 1))
])
)
, ( "%"
, Because (Lookahead (Just (Pos {line = 1, column = 1})))
(OneOf
[ Simply (UnexpectedToken '%' (Just 'a') (Pos 1 1))
, Simply (UnexpectedToken '%' (Just 'b') (Pos 1 1))
])
)
, ( "aabba%"
, Simply (UnexpectedSuccess (Just (Pos 1 6)))
)
]
Now some basic indentation.
p_06 = do
localRef (Pos 1 5) $ do
ignore spaces
indent (wrtRef Start Column Eq) $ char 'a'
eof
ok_06 =
[ (" a", ())
]
fail_06 =
[ ( "a"
, Simply (UnexpectedIndentation
("start column of successful parse, at l1c1, to " ++
"equal that of the reference position at l1c5")
(Pos 1 1, Pos 1 1))
)
]
And a permutation.
p_07 =
let f x y z = [x,y] ++ z in
permute (f <$$> char 'a' <&?> ('_', char 'b') <&&> some (char 'c'))
ok_07 =
[ ("abc", "abc")
, ("cca", "a_cc")
]
fail_07 =
[ ( "bacd"
, Simply (IncompleteParse (Just (Pos 1 4)))
)
]
Now for something a little more complicated. Here’s a simple rose tree type. We’ll use indentation to denote levels in the tree; child nodes are indented by two extra spaces.
data Tree = B String [Tree]
deriving (Eq, Show)
p_08 = do
(b,(u,_)) <- ignore spaces *> consume (some lowerLatin) <* newline
ts <- many $ indent (wrtPos u Start Column (Add 2)) p_08
return $ B b ts
ok_08 =
[ ( concat
[ "a\n"
]
, B "a" []
)
, ( concat
[ "a\n"
, " b\n"
, " c\n"
]
, B "a" [B "b" [], B "c" []]
)
]
fail_08 =
[ ( concat
[ "a\n"
, " b\n"
]
, Simply (IncompleteParse (Just (Pos 2 1)))
)
]
main :: IO ()
main = do
putStrLn ""
test_parser "char"
p_00 ok_00 fail_00
test_parser "anyChar"
p_01 ok_01 fail_01
test_parser "many decimalDigit <* eof"
p_02 ok_02 fail_02
test_parser "little or big word"
p_03 ok_03 fail_03
test_parser "ternary phone numbers"
p_04 ok_04 fail_04
test_parser "speculation"
p_05 ok_05 fail_05
test_parser "simple indentation"
p_06 ok_06 fail_06
test_parser "simple permutation"
p_07 ok_07 fail_07
test_parser "indented trees"
p_08 ok_08 fail_08