Extras‎ > ‎



We want to implement a calculator capable of evaluating expressions like "3*(4+5)" 

This is a little bit of module magic to hide the built-in function "lex".

> import Prelude hiding (lex)

The first thing we need to do is lexicographic analysis.  This turns an input
string into tokens.  In our case, the main difference from the input string is
that integers are represented as tokens with associated values, and that spaces
are removed. 

We represent the tokens using a datatype with constructors.

> data Token = TINT Int | PLUS | TIMES | LPAR | RPAR deriving (Show,Eq)

Lexicographic analysis is pretty simple.

> lex :: String -> [Token]

We're done for the empty string.

> lex [] = []

For tokens corresponding to single characters, we just recognize them.

> lex ('+':rest) = PLUS : lex rest
> lex ('*':rest) = TIMES : lex rest
> lex ('(':rest) = LPAR : lex rest
> lex (')':rest) = RPAR : lex rest

Once we see a digit, we assume we are looking at an integer and compute its
value using a separate function.  Using the "l@" pattern results in a one
character lookahead; that is, the lex_int function sees the entire digit sequence.

> lex l@(c:rest) | (c>='0' && c<='9') = lex_int 0 l 

We just skip spaces.

> lex (' ':rest) = lex rest

This computes the value of a sequence of digits, then continues with lexing.

> lex_int value l@(c:rest) | (c>='0' && c<='9') = lex_int (value*10+digit c) rest
> lex_int value (' ':rest) = TINT value : lex rest
> lex_int value rest = TINT value : lex rest

For the above to work, we need to compute the value of a single digit.  We
can do this by subtracting the integer code of the character '0' from the
integer of the digit we are looking at.

> digit :: Char -> Int
> digit c = fromEnum c - fromEnum '0'

After lexicographic analysis, we need to parse.  Here is a simple grammar
for numerical expressions.  This grammar actually has already been arranged
to be easy to parse.

> -- expression -> term + expression | term
> -- term -> factor * term | factor
> -- factor -> number | ( expression )

The output from the parser is an expression tree, containing sums, products,
and values.

> data Expr = SUM Expr Expr | PRODUCT Expr Expr | INT Int deriving (Show,Eq)

Parses operate on sequences of tokens.  They return a tree plus the tokens that
haven't been consumed yet.

> type Parser = [Token] -> (Expr,[Token])

First, we parse expressions.

> expression :: Parser

All expressions start with a term on the left, so we parse that.

> expression tokens = expression1 (term tokens)

The function "expression1" might be called more appropriately 
"expression_after_parsing_left_term", but that's a little long and experienced
programmers should be able to figure out this simple pattern anyway.

There are two choices now: if we are looking at a PLUS, then we need to parse another
term following the PLUS.  If not, we're in the second alternative in the definition
of expression and we just return the tree that we got.

> expression1 (left,PLUS:tokens) = expression2 left (term tokens)
> expression1 (left,tokens) = (left,tokens)

This would be "expression_after_parsing_right_term", but we keep it short
again.  Whatever we call it, we now have the left subtree and the right subtree,
so we return the combined tree and the remaining tokens.

> expression2 left (right,tokens) = (SUM left right,tokens)

The "term" function is structured analogously to the "expression" function.

> term :: Parser
> term tokens = term1 (factor tokens)
> term1 (left,TIMES:tokens) = term2 left (factor tokens)
> term1 (left,tokens) = (left,tokens)
> term2 left (right,tokens) = (PRODUCT left right,tokens)

The "factor" function is structured analogously to the "expression" function.

> factor :: Parser
> factor (TINT c:rest) = (INT c,rest)
> factor (LPAR:rest) = factor1 (expression rest)
> factor1 (expr,RPAR:rest) = (expr,rest)
> factor1 _ = error "missing right parenthesis"

Now let's put the lexer and the expression parser into a parse function.
We also add an extra check: after parsing a complete expression, there should
be no tokens left in the input.  This also needs a boring helper function that
we don't bother naming separately.

> parse s = parse1 (expression (lex s))
> parse1 (tree,[]) = tree
> parse1 _ = error "extra stuff after end of expression"

After all this parsing, actually evaluating the expression is simple.  We
have a nested tree representation, and evaluation is the usual recursive
tree algorithm.

> eval :: Expr -> Int
> eval (INT c) = c
> eval (SUM a b) = (eval a) + (eval b)
> eval (PRODUCT a b) = (eval a) * (eval b)

Finally, let's put together evaluation and parsing to get our calculator.

> calc :: String -> Int
> calc s = eval (parse s)

And here is a simple application.

> main = print (calc "3+(4*500)")