EBNF Scanner in Modula-2

As you might have seen already, one of my projects is the m4m compiler. The target is to make a compiler for a language closely resembling Modula-2, for use on micro controllers like the PIC, AVR and Zilog series. As a side effects I have been attracted to reading about Professor Wirth's experiences in this field. So I entered the subjects of Oberon and CoCo (qv). And when ordering ("used") books about those subjects I ran into "used" books about Modula-2. Which I ordered.

As a consequence I now am the proud owner of a german language edition of PIM (Programming in Modula-2) by Niklaus Wirth. And since Professor Wirth is just about the inventor of EBNF (Extended Backus Naur Formalism, a programming language to describe programming languages), he listed an EBNF checker in his PIM book.
In this section of fruttenboel, I pay attention to the EBNF scanner which

If I will understand how this EBNF scanner works, I will be better prepared to understand a true Modula-2 scanner which will ease my work for m4m.

EBNFScanner: DEFINITION MODULE

The EBNF Scanner is the library module that contains the functions required for scanning the text. We start out with the DEFINITION MODULE:

DEFINITION MODULE EBNFScanner;

TYPE  Symbol = (ident, literal, lpar, lbk, lbr, bar, eql, period, rpar, rbk, rbr, other);

CONST IdLength = 24;

VAR   sym      	    : Symbol;
      id	    : ARRAY [0..IdLength] OF CHAR;
      Ino	    : INTEGER;

PROCEDURE GetSym;

PROCEDURE MarkError (n : INTEGER);

PROCEDURE SkipLine;

END EBNFScanner.
   
This library exports just three functions. GetSym will read the next symbol from the inout file, MarkError reacts on syntax errors by making a mention of it and skipping until the next strong SYNC point. SkipLine will do the actional skipping until a SYNC marker.

EBNFScanner: IMPLEMENTATION MODULE

Below is the IMPLEMENTATION MODULE for the EBNF Scanner. This is a fine example of structured programming. The program uses a local module with it's associated IMPORT and EXPORT statements.

IMPLEMENTATION MODULE EBNFScanner;

FROM  ASCII    	   IMPORT  LF;
FROM  InOut	   IMPORT  Read, Write, WriteLn, WriteInt, WriteBf, EOF;

VAR   ch	   	   : CHAR;

MODULE LineHandler;

  IMPORT   LF, EOF, ch, Ino, Read, Write, WriteLn, WriteInt, WriteBf;
  EXPORT   GetCh, MarkError, SkipLine;
  
  CONST    LineWidth = 100;
  
  VAR	   cc	     : INTEGER;
  	   cc1	     : INTEGER;
	   cc2	     : INTEGER;
	   line	     : ARRAY [0..LineWidth - 1] OF CHAR;
	   
  PROCEDURE GetLine;
  
  BEGIN
    IF  cc2 > 0  THEN
      WriteLn;
      cc2 := 0
    END;
    Read (ch);
    IF  EOF ()  THEN
      line [0] := 177C;
      cc1 := 1
    ELSE
      INC (Ino);
      WriteInt (Ino, 5);
      Write (' ');
      cc1 := 0;
      LOOP
        Write (ch);
	line [cc1] := ch;
	INC (cc1);
	IF  ch = LF  THEN  EXIT  END;
	Read (ch)
      END    
    END 
  END GetLine;
    
    
    PROCEDURE GetCh;
    
    BEGIN
      WHILE  cc = cc1  DO
        cc := 0;
	GetLine
      END;
      ch := line [cc];
      INC (cc)
    END GetCh;
    
    
    PROCEDURE MarkError (n  : INTEGER);
    
    BEGIN
      IF  cc2 = 0  THEN
        Write ('*');
	cc2 := 3;
	REPEAT
	  Write (' ');
	  DEC (cc2)
	UNTIL  cc2 = 0;
      END;
      WHILE  cc2 < cc  DO
        Write (' ');
	INC (cc2)
      END;
      Write ('^');
      WriteInt (n, 1);
      INC (cc2, 2)
    END MarkError;
    
    
    PROCEDURE SkipLine;
    
    BEGIN
      WHILE  ch # LF  DO  GetCh  END;
      GetCh
    END SkipLine;
    
      
  BEGIN		 (* BEGIN of LineHandler	*)
    cc  := 0;
    cc1 := 0;
    cc2 := 0
  END LineHandler;
  
  
PROCEDURE GetSym;
  
VAR	i   	   : CARDINAL;
  
BEGIN
  WHILE  ch <= ' '  DO  GetCh  END;
  IF  ch = '/'  THEN
    SkipLine;
    WHILE  ch <= ' '  DO  GetCh  END
  END;
  IF  (CAP (ch) <= 'Z') AND (CAP (ch) >= 'A')  THEN
    i := 0;
    sym := literal;
    REPEAT
      IF  i < IdLength  THEN
        id [i] := ch;
      	INC (i)
      END;
      IF  ch > 'Z' THEN  sym := ident  END;
      GetCh
    UNTIL  (CAP (ch) < 'A') OR (CAP (ch) > 'Z');
    id [i] := ' '
  ELSIF  ch = "'"  THEN
    i := 0;
    GetCh;
    sym := literal;
    WHILE  ch # "'"  DO
      IF  i < IdLength  THEN
        id [i] := ch;
	INC (i)
      END;
      GetCh
    END;
    GetCh;
    id [i] := ' '
  ELSIF  ch = '"'  THEN
    i := 0;
    GetCh;
    sym := literal;
    WHILE  ch # '"'  DO
      IF  i < IdLength  THEN
        id [i] := ch;
	INC (i)
      END;
      GetCh
    END;
    GetCh;
    id [i] := ' '
  ELSIF  ch = '='  THEN  sym := eql;   GetCh
  ELSIF  ch = '('  THEN  sym := lpar;  GetCh
  ELSIF  ch = ')'  THEN  sym := rpar;  GetCh
  ELSIF  ch = '['  THEN  sym := lbk;   GetCh
  ELSIF  ch = ']'  THEN  sym := rbk;   GetCh
  ELSIF  ch = '{'  THEN  sym := lbr;   GetCh
  ELSIF  ch = '}'  THEN  sym := rbr;   GetCh
  ELSIF  ch = '|'  THEN  sym := bar;   GetCh
  ELSIF  ch = '.'  THEN  sym := period;  GetCh
  ELSIF  ch = 177C THEN  sym := other;  GetCh
  ELSE 
    sym := other;
    GetCh
  END
END GetSym;


BEGIN
  Ino := 0;
  ch := ' '
END EBNFScanner.
   
This program will scan the EBNF code and act upon it if it finds errors. As you see, the layout of this program is very similar to how EBNF is defined.

TableHandler: DEFINITION MODULE

Next, the header file for the Table Handler (which makes a binary tree for storing the symbol table) is published:

DEFINITION MODULE TableHandler;


CONST  LineWidth       = 80;
       WordLength      = 24;


TYPE   Table;


VAR    overflow		: INTEGER;	(* overflow > 0 means: table is full	*)


PROCEDURE InitTable (VAR t : Table);

PROCEDURE Record (t : Table; VAR x : ARRAY OF CHAR; n : INTEGER);

PROCEDURE Tabulate (t : Table);

END TableHandler.
   
Not much to mention here. The table itself is defined, just like some parameters and three Table related functions. InitTable will do just what the name implies. Record is to be pronounced in the same way as we would when recording an audio tape. It checks if a symbol exists and if it doesn't the new symbol is inserted in the tree. Tabulate will print out the symbol table.

TableHandler: IMPLEMENTATION MODULE

The table handler dynamically allocates space on the heap for building the binary tree. Then it traverses the tree to chec if a symbol is new or already defined. If it is new, it is inserted in the search tree.

IMPLEMENTATION MODULE TableHandler;


FROM  InOut    	      IMPORT  Write, WriteInt, WriteLn;
FROM  Storage	      IMPORT  ALLOCATE;


CONST TableLength = 3000;


TYPE  TreePtr = POINTER TO Word;
      ListPtr = POINTER TO Item;
      
      Item    = RECORD
      	      	  num  : INTEGER;
		  next : ListPtr
      	      	END;

      Word    = RECORD
       	      	  key 		: INTEGER;
		  first	 	: ListPtr;
		  left, right	: TreePtr
      	      	END;

      Table = TreePtr;


VAR   id      	      : ARRAY [0..WordLength] OF CHAR;
      ascinx	      : INTEGER;
      asc	      : ARRAY [0..TableLength - 1] OF CHAR;


PROCEDURE InitTable (VAR t : Table);

BEGIN
  ALLOCATE (t, SIZE (Word));
  t^.right := NIL
END InitTable;


PROCEDURE Search (p : TreePtr) : TreePtr;

  TYPE  Relation = (less, equal, greater);

  VAR   q	 : TreePtr;
  	r	 : Relation;
	i	 : INTEGER;
  
  PROCEDURE rel (k : INTEGER) : Relation;
  
  VAR  i    	: INTEGER;
       R	: Relation;
       x, y	: CHAR;
  
  BEGIN
    i := 0;
    R := equal;
    LOOP
      x := id [i];
      y := asc [k];
      IF  CAP (x) # CAP (y)  THEN  EXIT  END;
      IF  x <= ' ' THEN  RETURN R  END;
      IF  x < y  THEN
        R := less  
      ELSIF  x > y THEN
        R := greater
      END;
      INC (i);
      INC (k)
    END;
    IF  CAP (x) > CAP (y)  THEN
      RETURN greater
    ELSE
      RETURN less
    END
  END rel;
  
BEGIN
  q := p^.right;
  r := greater;
  WHILE  q # NIL  DO
    p := q;
    r := rel (p^.key);
    IF  r = equal  THEN  
      RETURN p
    ELSIF r = less THEN
      q := p^.left
    ELSE
      q := p^.right
    END
  END;
  ALLOCATE (q, SIZE (Word));
  IF  q # NIL  THEN
    WITH  q^  DO
      key := ascinx;
      first := NIL;
      left := NIL;
      right := NIL
    END;
    IF  r = less  THEN  p^.left := q  ELSE p^.right := q  END;
    i := 0;
    WHILE  id [i] > ' '  DO
      IF  ascinx = TableLength  THEN
        asc [ascinx] := ' ';
	id [i] := ' ';
	overflow := 1
      ELSE
        asc [ascinx] := id [i];
	INC (ascinx);
	INC (i)
      END
    END;
    asc [ascinx] := ' ';
    INC (ascinx)
  END;
  RETURN q
END Search;


PROCEDURE Record (t : Table; VAR x : ARRAY OF CHAR; n : INTEGER);

VAR  p	  	 : TreePtr;
     q		 : ListPtr;
     i		 : INTEGER;

BEGIN
  i := 0;
  REPEAT
    id [i] := x [i];
    INC (i)
  UNTIL (id [i-1] = ' ') OR (i = WordLength);
  p := Search (t);
  IF  p = NIL  THEN  
    overflow := 2  
  ELSE  
    ALLOCATE (q, SIZE (Item));
    IF  q = NIL  THEN
      overflow := 3
    ELSE
      q^.num := n;
      q^.next := p^.first;
      p^.first := q
    END
  END
END Record;


PROCEDURE Tabulate (t : Table);

  PROCEDURE PrintItem (p : TreePtr);

    CONST  L = 6;
    	   N = (LineWidth - WordLength) DIV L;

    VAR	   ch  		  : CHAR;
    	   i, k		  : INTEGER;
	   q  		  : ListPtr;

  BEGIN
    i := WordLength + 1;
    k := p^.key;
    REPEAT
      ch := asc [k];
      DEC (i);
      INC (k);
      Write (ch)      
    UNTIL  ch <= ' ';
    WHILE  i > 0  DO  Write (' ');  DEC (i)  END;
    q := p^.first;
    i := N;
    WHILE  q # NIL  DO
      IF  i = 0  THEN
        WriteLn;
	i := WordLength + 1;
	REPEAT  Write (' ');  DEC (i)  UNTIL  i = 0;
	i := N
      END;
      WriteInt (q^.num, L);
      q := q^.next;
      DEC (i)
    END;
    WriteLn
  END PrintItem;

  PROCEDURE TraverseTree (p : TreePtr);
  
  BEGIN
    IF  p # NIL  THEN
      TraverseTree (p^.left);
      PrintItem (p);
      TraverseTree (p^.right);
    END
  END TraverseTree;
  
BEGIN
  WriteLn;
  TraverseTree (t^.right)
END Tabulate;


BEGIN
  ascinx := 0;
  id [WordLength] := ' ';
  overflow := 0
END TableHandler.
   

EBNF: the actual executable.

What follows is the EBNF scanner executable. Here we see a build-up of the program which is almost identical to the way in which EBNF is defined. Look at the processing of the left parenthesis (lpar). Between parenthesis, an expression is required. So the section scanning the lpar, reads the next symbol confirms if this is an expression and when successful, it checks whether the expression is closed by a right parenthesis (rpar). If there is an inconsistency in any place, an error is flagged and the scanning is continued after a predetermined SYNC point.

MODULE EBNF;

FROM  ASCII 	    IMPORT  EOL;
FROM  InOut	    IMPORT  Done, Read, Write, WriteLn, WriteInt, WriteString;
FROM  EBNFScanner   IMPORT  Symbol, sym, id, Ino, GetSym, MarkError, SkipLine;
FROM  TableHandler  IMPORT  WordLength, Table, overflow, InitTable, Record, Tabulate;


VAR   T0, T1	    	    : Table;


PROCEDURE skip (n : INTEGER);

BEGIN
  MarkError (n);
  WHILE  (sym < lpar) OR (sym > period)  DO  GetSym  END
END skip;


PROCEDURE Expression;

  PROCEDURE Term;
  
    PROCEDURE Factor;
    
    BEGIN
      IF  sym = ident  THEN
        Record (T0, id, Ino);
	GetSym
      ELSIF  sym = literal  THEN
        Record (T1, id, Ino);
	GetSym
      ELSIF  sym = lpar  THEN
        GetSym;
	Expression;
	IF  sym = rpar  THEN  GetSym  ELSE  skip (2)  END
      ELSIF  sym = lbk  THEN
        GetSym;
	Expression;
	IF  sym = rbk  THEN  GetSym  ELSE  skip (3)  END
      ELSIF  sym = lbr  THEN
        GetSym;
	Expression;
	IF  sym = rbr  THEN  GetSym  ELSE  skip (4)  END
      ELSE
        skip (5)
      END
    END Factor;
    
  BEGIN
    Factor;
    WHILE  sym < bar  DO  Factor  END
  END Term;
    
BEGIN
  Term;
  WHILE  sym = bar  DO
    GetSym;
    Term
  END
END Expression;


PROCEDURE Production;

BEGIN
  Record (T0, id, - INTEGER (Ino));
  GetSym;
  IF  sym = eql  THEN  GetSym  ELSE  skip (7)  END;
  Expression;
  IF  sym # period  THEN
    MarkError (8);
    SkipLine
  END;
  GetSym
END Production;


BEGIN
  InitTable (T0);
  InitTable (T1);
  GetSym;
  WHILE  (sym = ident) AND (overflow = 0)  DO  Production  END;
  IF  overflow > 0  THEN
    WriteLn;
    WriteString ("Table overflow");
    WriteInt (overflow, 6);
  END;
  Write (35C);
  Tabulate (T0);
  Tabulate (T1);
END EBNF.
   

Debugging.

I copied this source by typing it in from the PIM book. Now, that source was meant to be run from a Lilith workstation and that was different from Linux based Mocka. So a few small adaptations were required:

  1. In Mocka, EOL is not a part of the InOut library, so I had to import it from the ASCII library.
  2. In Lilith, the EOF is shown by means of a NIL character (0C), but Linux has another one. So I had to revert to the 'EOF' function in InOut.
  3. In non-Linux Modula-2, the end-of-line token (EOL) is known as such. But in Linux it was better to use the ASCII.LF as such.
  4. Mocka doesn't support OpenInput, OpenOutput and the associated 'Close' functions, so the program will assume in/output redirection to be applied on the command line.
All in all, this took me about two hours to track down. Plus one typo in a source file. But now it works....

Eating the pudding.

The proof of the pudding, is in the eating. So now we eat the pudding... But before we go to table, we need tools to eat the custard jelly. Below is the source of a file called 'ebnf.syntax'

symbol = {" "} (identifier | string | "(" | ")" | "[" | "]" | "{" | "}" /
       	       		     	    | "|" | "=" | ".").

identifier = letter { letter | digit }.

string = """{character}""".
   
If we feed this into the EBNF executable with a command line similar to
      EBNF <ebnf.syntax
   
then the system will produce an output similar to:
bash-2.05$ EBNF <ebnf.syntax

    1 symbol = {" "} (identifier | string | "(" | ")" | "[" | "]" | "{" | "}" /
    2                                     | "|" | "=" | ".").
    3
    4 identifier = letter { letter | digit }.
    5
    6 string = """{character}""".
		     
digit                         4
identifier                   -4     1
letter                        4     4
string                       -6     1
symbol                       -1
		      
                              6     6     1
(   1
)   1
.   2
=   2
[   1
]   1
{   1
{character}   6
|   2
}   1
bash-2.05$
   
Well, that wasn't bad for a pudding Professor Wirth baked for us, almost 20 years ago. To make life easier on all Modula-2 enthusiasts, I made a tar.gz file that contains the full sources plus the compiled binaries. It is downloadable in the DOWNLOAD section (look in the navigator frame on the right). If you do not see the navigator frame, please read the text in the next green rectangle below.

There will be more of this sources and projects from the PIM book. Please check these pages on a regular base to see what the current status is.

Page created June 2005,