EBNF Scanner in Modula-2
As you might have seen already, one of my projects is the m4m compiler. The target is to make a compiler for a language closely resembling Modula-2, for use on micro controllers like the PIC, AVR and Zilog series. As a side effects I have been attracted to reading about Professor Wirth's experiences in this field. So I entered the subjects of Oberon and CoCo (qv). And when ordering ("used") books about those subjects I ran into "used" books about Modula-2. Which I ordered.
As a consequence I now am the proud owner of a german language edition of PIM (Programming in Modula-2) by
Niklaus Wirth. And since Professor Wirth is just about the inventor of EBNF (Extended Backus Naur Formalism, a
programming language to describe programming languages), he listed an EBNF checker in his PIM book.
In this section of fruttenboel, I pay attention to the EBNF scanner which
EBNFScanner: DEFINITION MODULE
The EBNF Scanner is the library module that contains the functions required for scanning the text. We start out with the DEFINITION MODULE:
DEFINITION MODULE EBNFScanner;
TYPE Symbol = (ident, literal, lpar, lbk, lbr, bar, eql, period, rpar, rbk, rbr, other);
CONST IdLength = 24;
VAR sym : Symbol;
id : ARRAY [0..IdLength] OF CHAR;
Ino : INTEGER;
PROCEDURE GetSym;
PROCEDURE MarkError (n : INTEGER);
PROCEDURE SkipLine;
END EBNFScanner.
This library exports just three functions. GetSym will read the next symbol from the inout file, MarkError
reacts on syntax errors by making a mention of it and skipping until the next strong SYNC point. SkipLine will
do the actional skipping until a SYNC marker.
EBNFScanner: IMPLEMENTATION MODULE
Below is the IMPLEMENTATION MODULE for the EBNF Scanner. This is a fine example of structured programming. The program uses a local module with it's associated IMPORT and EXPORT statements.
IMPLEMENTATION MODULE EBNFScanner;
FROM ASCII IMPORT LF;
FROM InOut IMPORT Read, Write, WriteLn, WriteInt, WriteBf, EOF;
VAR ch : CHAR;
MODULE LineHandler;
IMPORT LF, EOF, ch, Ino, Read, Write, WriteLn, WriteInt, WriteBf;
EXPORT GetCh, MarkError, SkipLine;
CONST LineWidth = 100;
VAR cc : INTEGER;
cc1 : INTEGER;
cc2 : INTEGER;
line : ARRAY [0..LineWidth - 1] OF CHAR;
PROCEDURE GetLine;
BEGIN
IF cc2 > 0 THEN
WriteLn;
cc2 := 0
END;
Read (ch);
IF EOF () THEN
line [0] := 177C;
cc1 := 1
ELSE
INC (Ino);
WriteInt (Ino, 5);
Write (' ');
cc1 := 0;
LOOP
Write (ch);
line [cc1] := ch;
INC (cc1);
IF ch = LF THEN EXIT END;
Read (ch)
END
END
END GetLine;
PROCEDURE GetCh;
BEGIN
WHILE cc = cc1 DO
cc := 0;
GetLine
END;
ch := line [cc];
INC (cc)
END GetCh;
PROCEDURE MarkError (n : INTEGER);
BEGIN
IF cc2 = 0 THEN
Write ('*');
cc2 := 3;
REPEAT
Write (' ');
DEC (cc2)
UNTIL cc2 = 0;
END;
WHILE cc2 < cc DO
Write (' ');
INC (cc2)
END;
Write ('^');
WriteInt (n, 1);
INC (cc2, 2)
END MarkError;
PROCEDURE SkipLine;
BEGIN
WHILE ch # LF DO GetCh END;
GetCh
END SkipLine;
BEGIN (* BEGIN of LineHandler *)
cc := 0;
cc1 := 0;
cc2 := 0
END LineHandler;
PROCEDURE GetSym;
VAR i : CARDINAL;
BEGIN
WHILE ch <= ' ' DO GetCh END;
IF ch = '/' THEN
SkipLine;
WHILE ch <= ' ' DO GetCh END
END;
IF (CAP (ch) <= 'Z') AND (CAP (ch) >= 'A') THEN
i := 0;
sym := literal;
REPEAT
IF i < IdLength THEN
id [i] := ch;
INC (i)
END;
IF ch > 'Z' THEN sym := ident END;
GetCh
UNTIL (CAP (ch) < 'A') OR (CAP (ch) > 'Z');
id [i] := ' '
ELSIF ch = "'" THEN
i := 0;
GetCh;
sym := literal;
WHILE ch # "'" DO
IF i < IdLength THEN
id [i] := ch;
INC (i)
END;
GetCh
END;
GetCh;
id [i] := ' '
ELSIF ch = '"' THEN
i := 0;
GetCh;
sym := literal;
WHILE ch # '"' DO
IF i < IdLength THEN
id [i] := ch;
INC (i)
END;
GetCh
END;
GetCh;
id [i] := ' '
ELSIF ch = '=' THEN sym := eql; GetCh
ELSIF ch = '(' THEN sym := lpar; GetCh
ELSIF ch = ')' THEN sym := rpar; GetCh
ELSIF ch = '[' THEN sym := lbk; GetCh
ELSIF ch = ']' THEN sym := rbk; GetCh
ELSIF ch = '{' THEN sym := lbr; GetCh
ELSIF ch = '}' THEN sym := rbr; GetCh
ELSIF ch = '|' THEN sym := bar; GetCh
ELSIF ch = '.' THEN sym := period; GetCh
ELSIF ch = 177C THEN sym := other; GetCh
ELSE
sym := other;
GetCh
END
END GetSym;
BEGIN
Ino := 0;
ch := ' '
END EBNFScanner.
This program will scan the EBNF code and act upon it if it finds errors. As you see, the layout of this
program is very similar to how EBNF is defined.
TableHandler: DEFINITION MODULE
Next, the header file for the Table Handler (which makes a binary tree for storing the symbol table) is published:
DEFINITION MODULE TableHandler;
CONST LineWidth = 80;
WordLength = 24;
TYPE Table;
VAR overflow : INTEGER; (* overflow > 0 means: table is full *)
PROCEDURE InitTable (VAR t : Table);
PROCEDURE Record (t : Table; VAR x : ARRAY OF CHAR; n : INTEGER);
PROCEDURE Tabulate (t : Table);
END TableHandler.
Not much to mention here. The table itself is defined, just like some parameters and three Table related
functions. InitTable will do just what the name implies. Record is to be pronounced in the same way as we
would when recording an audio tape. It checks if a symbol exists and if it doesn't the new symbol is inserted
in the tree. Tabulate will print out the symbol table.
TableHandler: IMPLEMENTATION MODULE
The table handler dynamically allocates space on the heap for building the binary tree. Then it traverses the tree to chec if a symbol is new or already defined. If it is new, it is inserted in the search tree.
IMPLEMENTATION MODULE TableHandler;
FROM InOut IMPORT Write, WriteInt, WriteLn;
FROM Storage IMPORT ALLOCATE;
CONST TableLength = 3000;
TYPE TreePtr = POINTER TO Word;
ListPtr = POINTER TO Item;
Item = RECORD
num : INTEGER;
next : ListPtr
END;
Word = RECORD
key : INTEGER;
first : ListPtr;
left, right : TreePtr
END;
Table = TreePtr;
VAR id : ARRAY [0..WordLength] OF CHAR;
ascinx : INTEGER;
asc : ARRAY [0..TableLength - 1] OF CHAR;
PROCEDURE InitTable (VAR t : Table);
BEGIN
ALLOCATE (t, SIZE (Word));
t^.right := NIL
END InitTable;
PROCEDURE Search (p : TreePtr) : TreePtr;
TYPE Relation = (less, equal, greater);
VAR q : TreePtr;
r : Relation;
i : INTEGER;
PROCEDURE rel (k : INTEGER) : Relation;
VAR i : INTEGER;
R : Relation;
x, y : CHAR;
BEGIN
i := 0;
R := equal;
LOOP
x := id [i];
y := asc [k];
IF CAP (x) # CAP (y) THEN EXIT END;
IF x <= ' ' THEN RETURN R END;
IF x < y THEN
R := less
ELSIF x > y THEN
R := greater
END;
INC (i);
INC (k)
END;
IF CAP (x) > CAP (y) THEN
RETURN greater
ELSE
RETURN less
END
END rel;
BEGIN
q := p^.right;
r := greater;
WHILE q # NIL DO
p := q;
r := rel (p^.key);
IF r = equal THEN
RETURN p
ELSIF r = less THEN
q := p^.left
ELSE
q := p^.right
END
END;
ALLOCATE (q, SIZE (Word));
IF q # NIL THEN
WITH q^ DO
key := ascinx;
first := NIL;
left := NIL;
right := NIL
END;
IF r = less THEN p^.left := q ELSE p^.right := q END;
i := 0;
WHILE id [i] > ' ' DO
IF ascinx = TableLength THEN
asc [ascinx] := ' ';
id [i] := ' ';
overflow := 1
ELSE
asc [ascinx] := id [i];
INC (ascinx);
INC (i)
END
END;
asc [ascinx] := ' ';
INC (ascinx)
END;
RETURN q
END Search;
PROCEDURE Record (t : Table; VAR x : ARRAY OF CHAR; n : INTEGER);
VAR p : TreePtr;
q : ListPtr;
i : INTEGER;
BEGIN
i := 0;
REPEAT
id [i] := x [i];
INC (i)
UNTIL (id [i-1] = ' ') OR (i = WordLength);
p := Search (t);
IF p = NIL THEN
overflow := 2
ELSE
ALLOCATE (q, SIZE (Item));
IF q = NIL THEN
overflow := 3
ELSE
q^.num := n;
q^.next := p^.first;
p^.first := q
END
END
END Record;
PROCEDURE Tabulate (t : Table);
PROCEDURE PrintItem (p : TreePtr);
CONST L = 6;
N = (LineWidth - WordLength) DIV L;
VAR ch : CHAR;
i, k : INTEGER;
q : ListPtr;
BEGIN
i := WordLength + 1;
k := p^.key;
REPEAT
ch := asc [k];
DEC (i);
INC (k);
Write (ch)
UNTIL ch <= ' ';
WHILE i > 0 DO Write (' '); DEC (i) END;
q := p^.first;
i := N;
WHILE q # NIL DO
IF i = 0 THEN
WriteLn;
i := WordLength + 1;
REPEAT Write (' '); DEC (i) UNTIL i = 0;
i := N
END;
WriteInt (q^.num, L);
q := q^.next;
DEC (i)
END;
WriteLn
END PrintItem;
PROCEDURE TraverseTree (p : TreePtr);
BEGIN
IF p # NIL THEN
TraverseTree (p^.left);
PrintItem (p);
TraverseTree (p^.right);
END
END TraverseTree;
BEGIN
WriteLn;
TraverseTree (t^.right)
END Tabulate;
BEGIN
ascinx := 0;
id [WordLength] := ' ';
overflow := 0
END TableHandler.
EBNF: the actual executable.
What follows is the EBNF scanner executable. Here we see a build-up of the program which is almost identical to the way in which EBNF is defined. Look at the processing of the left parenthesis (lpar). Between parenthesis, an expression is required. So the section scanning the lpar, reads the next symbol confirms if this is an expression and when successful, it checks whether the expression is closed by a right parenthesis (rpar). If there is an inconsistency in any place, an error is flagged and the scanning is continued after a predetermined SYNC point.
MODULE EBNF;
FROM ASCII IMPORT EOL;
FROM InOut IMPORT Done, Read, Write, WriteLn, WriteInt, WriteString;
FROM EBNFScanner IMPORT Symbol, sym, id, Ino, GetSym, MarkError, SkipLine;
FROM TableHandler IMPORT WordLength, Table, overflow, InitTable, Record, Tabulate;
VAR T0, T1 : Table;
PROCEDURE skip (n : INTEGER);
BEGIN
MarkError (n);
WHILE (sym < lpar) OR (sym > period) DO GetSym END
END skip;
PROCEDURE Expression;
PROCEDURE Term;
PROCEDURE Factor;
BEGIN
IF sym = ident THEN
Record (T0, id, Ino);
GetSym
ELSIF sym = literal THEN
Record (T1, id, Ino);
GetSym
ELSIF sym = lpar THEN
GetSym;
Expression;
IF sym = rpar THEN GetSym ELSE skip (2) END
ELSIF sym = lbk THEN
GetSym;
Expression;
IF sym = rbk THEN GetSym ELSE skip (3) END
ELSIF sym = lbr THEN
GetSym;
Expression;
IF sym = rbr THEN GetSym ELSE skip (4) END
ELSE
skip (5)
END
END Factor;
BEGIN
Factor;
WHILE sym < bar DO Factor END
END Term;
BEGIN
Term;
WHILE sym = bar DO
GetSym;
Term
END
END Expression;
PROCEDURE Production;
BEGIN
Record (T0, id, - INTEGER (Ino));
GetSym;
IF sym = eql THEN GetSym ELSE skip (7) END;
Expression;
IF sym # period THEN
MarkError (8);
SkipLine
END;
GetSym
END Production;
BEGIN
InitTable (T0);
InitTable (T1);
GetSym;
WHILE (sym = ident) AND (overflow = 0) DO Production END;
IF overflow > 0 THEN
WriteLn;
WriteString ("Table overflow");
WriteInt (overflow, 6);
END;
Write (35C);
Tabulate (T0);
Tabulate (T1);
END EBNF.
Debugging.
I copied this source by typing it in from the PIM book. Now, that source was meant to be run from a Lilith workstation and that was different from Linux based Mocka. So a few small adaptations were required:
Eating the pudding.
The proof of the pudding, is in the eating. So now we eat the pudding... But before we go to table, we need tools to eat the custard jelly. Below is the source of a file called 'ebnf.syntax'
symbol = {" "} (identifier | string | "(" | ")" | "[" | "]" | "{" | "}" /
| "|" | "=" | ".").
identifier = letter { letter | digit }.
string = """{character}""".
If we feed this into the EBNF executable with a command line similar to
EBNF <ebnf.syntax
then the system will produce an output similar to:
bash-2.05$ EBNF <ebnf.syntax
1 symbol = {" "} (identifier | string | "(" | ")" | "[" | "]" | "{" | "}" /
2 | "|" | "=" | ".").
3
4 identifier = letter { letter | digit }.
5
6 string = """{character}""".
digit 4
identifier -4 1
letter 4 4
string -6 1
symbol -1
6 6 1
( 1
) 1
. 2
= 2
[ 1
] 1
{ 1
{character} 6
| 2
} 1
bash-2.05$
Well, that wasn't bad for a pudding Professor Wirth baked for us, almost 20 years ago. To make life easier on
all Modula-2 enthusiasts, I made a tar.gz file that contains the full sources plus the compiled binaries. It
is downloadable in the DOWNLOAD section (look in the navigator frame on the right). If you do not see the
navigator frame, please read the text in the next green rectangle below.
There will be more of this sources and projects from the PIM book. Please check these pages on a regular base to see what the current status is.
Page created June 2005,
Page equipped with GoogleBuster technology