#### calc.lhs:

We want to implement a calculator capable of evaluating expressions like "3*(4+5)"

correctly.

This is a little bit of module magic to hide the built-in function "lex".

> import Prelude hiding (lex)

The first thing we need to do is lexicographic analysis. This turns an input

string into tokens. In our case, the main difference from the input string is

that integers are represented as tokens with associated values, and that spaces

are removed.

We represent the tokens using a datatype with constructors.

> data Token = TINT Int | PLUS | TIMES | LPAR | RPAR deriving (Show,Eq)

Lexicographic analysis is pretty simple.

> lex :: String -> [Token]

We're done for the empty string.

> lex [] = []

For tokens corresponding to single characters, we just recognize them.

> lex ('+':rest) = PLUS : lex rest

> lex ('*':rest) = TIMES : lex rest

> lex ('(':rest) = LPAR : lex rest

> lex (')':rest) = RPAR : lex rest

Once we see a digit, we assume we are looking at an integer and compute its

value using a separate function. Using the "l@" pattern results in a one

character lookahead; that is, the lex_int function sees the entire digit sequence.

> lex l@(c:rest) | (c>='0' && c<='9') = lex_int 0 l

We just skip spaces.

> lex (' ':rest) = lex rest

This computes the value of a sequence of digits, then continues with lexing.

> lex_int value l@(c:rest) | (c>='0' && c<='9') = lex_int (value*10+digit c) rest

> lex_int value (' ':rest) = TINT value : lex rest

> lex_int value rest = TINT value : lex rest

For the above to work, we need to compute the value of a single digit. We

can do this by subtracting the integer code of the character '0' from the

integer of the digit we are looking at.

> digit :: Char -> Int

> digit c = fromEnum c - fromEnum '0'

After lexicographic analysis, we need to parse. Here is a simple grammar

for numerical expressions. This grammar actually has already been arranged

to be easy to parse.

> -- expression -> term + expression | term

> -- term -> factor * term | factor

> -- factor -> number | ( expression )

The output from the parser is an expression tree, containing sums, products,

and values.

> data Expr = SUM Expr Expr | PRODUCT Expr Expr | INT Int deriving (Show,Eq)

Parses operate on sequences of tokens. They return a tree plus the tokens that

haven't been consumed yet.

> type Parser = [Token] -> (Expr,[Token])

First, we parse expressions.

> expression :: Parser

All expressions start with a term on the left, so we parse that.

> expression tokens = expression1 (term tokens)

The function "expression1" might be called more appropriately

"expression_after_parsing_left_term", but that's a little long and experienced

programmers should be able to figure out this simple pattern anyway.

There are two choices now: if we are looking at a PLUS, then we need to parse another

term following the PLUS. If not, we're in the second alternative in the definition

of expression and we just return the tree that we got.

> expression1 (left,PLUS:tokens) = expression2 left (term tokens)

> expression1 (left,tokens) = (left,tokens)

This would be "expression_after_parsing_right_term", but we keep it short

again. Whatever we call it, we now have the left subtree and the right subtree,

so we return the combined tree and the remaining tokens.

> expression2 left (right,tokens) = (SUM left right,tokens)

The "term" function is structured analogously to the "expression" function.

> term :: Parser

> term tokens = term1 (factor tokens)

> term1 (left,TIMES:tokens) = term2 left (factor tokens)

> term1 (left,tokens) = (left,tokens)

> term2 left (right,tokens) = (PRODUCT left right,tokens)

The "factor" function is structured analogously to the "expression" function.

> factor :: Parser

> factor (TINT c:rest) = (INT c,rest)

> factor (LPAR:rest) = factor1 (expression rest)

> factor1 (expr,RPAR:rest) = (expr,rest)

> factor1 _ = error "missing right parenthesis"

Now let's put the lexer and the expression parser into a parse function.

We also add an extra check: after parsing a complete expression, there should

be no tokens left in the input. This also needs a boring helper function that

we don't bother naming separately.

> parse s = parse1 (expression (lex s))

> parse1 (tree,[]) = tree

> parse1 _ = error "extra stuff after end of expression"

After all this parsing, actually evaluating the expression is simple. We

have a nested tree representation, and evaluation is the usual recursive

tree algorithm.

> eval :: Expr -> Int

> eval (INT c) = c

> eval (SUM a b) = (eval a) + (eval b)

> eval (PRODUCT a b) = (eval a) * (eval b)

Finally, let's put together evaluation and parsing to get our calculator.

> calc :: String -> Int

> calc s = eval (parse s)

And here is a simple application.

> main = print (calc "3+(4*500)")