Compiler construction

In chapters 3 and 4 of Compiler Construction, Professor Wirth published an EBNF scanner that builds upon the ideas of recursive descent and LL(1) compilers. In order to better understand how things work, I have entered the source and ported it to the Mocka compiler (the original was in Oberon). Below you can read what I had to make of it.

EBNF scanner ported to Mocka.

In the original source, Professor Wirth liberally used functionality from Oberon modules. This, of course was not possible with Mocka. So I had to change things a bit.

      
MODULE EBNF;

IMPORT ASCII, Arguments, TextIO, InOut;

CONST  IdLen   = 32;

       ident   =  0;
       literal =  2;
       lparen  =  3;
       lbrak   =  4;
       lbrace  =  5;
       bar     =  6;
       eql     =  7;
       rparen  =  8;
       rbrak   =  9;
       rbrace  = 10;
       period  = 11;
       other   = 12;
   
Not much changed here. I only had to compensate for different line endings (CR in Oberon, LF in Unix) and I wanted to have the constants defined in such a way that they are easier to read back.
In the next section, I had to add some extra variables to allow the new program to read the command line and to detect and process end of files.
TYPE   Identifier   = ARRAY [0..IdLen] OF CHAR;


VAR    ch	      	    	: CHAR;
       count			: SHORTCARD;
       Line, Column, lastpos, 
       pos, sym	    		: CARDINAL;
       id	    		: Identifier;
       ProgramEnd    		: BOOLEAN;
       infile	    		: TextIO.File;
       args			: Arguments.ArgTable;


PROCEDURE ReadChar (VAR ch : CHAR);

BEGIN
  TextIO.GetChar (infile, ch);
  INC (pos);
  IF  TextIO.EOF (infile) = TRUE  THEN  ProgramEnd := TRUE  END;
  INC (Column);
  IF  ch = ASCII.LF  THEN
    INC (Line);
    Column := 0
  END
END ReadChar;
   
ReadChar was new. It replaces Texts.Reach (R, ch) calls. Originally I wanted to make my oen Texts module but luckily I came to my senses right in time. So I did things the easy way and I rewrote the 'Get Character' routine to standard Modula-2. This version has the 'pos' variable native. And it handles Line/Column number increments.
PROCEDURE error (n : CARDINAL);

BEGIN
  IF  pos > lastpos + 4 THEN
     InOut.WriteString ("Error at line"); InOut.WriteCard (Line, 4);
     InOut.WriteString (", column "); InOut.WriteCard (Column, 3);
     InOut.WriteString (" : sym = "); InOut.WriteCard (sym, 0);
     InOut.WriteLn;
  END
END error;


PROCEDURE GetSym;

VAR  i	  : INTEGER;

BEGIN
   WHILE  (NOT ProgramEnd) AND (ch <= ' ')  DO
      ReadChar (ch)  
   END;
   CASE  ch  OF
     'A'..'Z',
     'a'..'z' 	: sym := ident;		i := 0;
     		  REPEAT
		    id [i] := ch;
		    INC (i);
		    ReadChar (ch)
		  UNTIL  (CAP (ch) < 'A')  OR  (CAP (ch) > 'Z');
		  id [i] := 0C	     	       	    	   		|
     '"'	: ReadChar (ch);
     		  sym := literal;
		  i := 0;
		  WHILE (ch # '"') AND (ch > ' ') DO
		    id [i] := ch;
		    INC (i);
		    ReadChar (ch)
		  END;
		  IF  ch <= ' '  THEN  error (1)  END;
		  id [i] := 0C;
		  ReadChar (ch) 					|
     '='        : sym := eql;     ReadChar (ch) 			|
     '('	: sym := lparen;  ReadChar (ch)				|
     ')'	: sym := rparen;  ReadChar (ch)				|
     '['	: sym := lbrak;	  ReadChar (ch)				|
     ']'	: sym := rbrak;	  ReadChar (ch)				|
     '{'	: sym := lbrace;  ReadChar (ch)				|
     '}'	: sym := rbrace;  ReadChar (ch)				|
     '|'	: sym := bar;	  ReadChar (ch)				|
     '.'	: sym := period;  ReadChar (ch)
   ELSE
     sym := other;	ReadChar (ch)
   END
END GetSym;
   
GetSym is the heart of the matter. It reads the symbols and operands and assigns them numbers. The numbers were defined in the CONST section above. Wirth makes use of constants instead of enumerations. In this case, the enumeration would have worked as well, but in larger systems, when there are more than 32 items to differentiate, enumerated types would not have been possible anyway. So we stick to the methods with the constants, although it has a high degree of Sinclair ZX-Basic-ness. But it works.
PROCEDURE expression;

  PROCEDURE term;
  
    PROCEDURE factor;
    
    BEGIN
      IF  sym = ident  THEN
        InOut.WriteString ("identifier = ");
	InOut.WriteString (id);    		InOut.WriteLn;
(*        record (T0, id, 1);		*)
      	GetSym
      ELSIF  sym = literal  THEN
        InOut.WriteString ("string = ");
	InOut.WriteString (id);    		InOut.WriteLn;
(*        record (T1, id, 0);		*)
	GetSym
      ELSIF  sym = lparen  THEN
        InOut.WriteString ("symbol = (");	InOut.WriteLn;
        GetSym;
	expression;
	IF  sym = rparen  THEN  GetSym  ELSE  error (2)  END
      ELSIF sym = lbrak  THEN
        InOut.WriteString ("symbol = [");	InOut.WriteLn;
        GetSym;
	expression;
	IF  sym = rbrak  THEN  GetSym  ELSE  error (3)  END
      ELSIF sym = lbrace  THEN
        InOut.WriteString ("symbol = {");	InOut.WriteLn;
        GetSym;
	expression;
	IF  sym = rbrace  THEN  GetSym  ELSE  error (4)  END
      ELSE
        error (5)
      END
    END factor;

  BEGIN        		(* term *)
    factor;
    WHILE  sym < bar  DO  factor  END
  END term;
  
BEGIN	   		(* expression *)
  term;
  WHILE  sym = bar  DO  GetSym; term  END
END expression;
   
This big nested set of procedures represents the structure of EBNF. In fact, the nesting of procedures is a copy of the EBNF standard (expressed in EBNF as well). It may look a bit weird, but this way, with recursively called subroutines, the scanner becomes much simpler since no 'nesting depth' need to be taken into account. This is one of the tricks described in Compiler Construction. If you want to have a copy of the book (in PDF format) get one via the download section.
PROCEDURE production;

BEGIN
  GetSym;
  IF  sym = eql  THEN  GetSym  ELSE  error (7)  END;
  expression;
  IF  sym = period  THEN  GetSym  ELSE  error (8)  END 
END production;


PROCEDURE syntax;

BEGIN
  WHILE  sym = ident  DO
    InOut.WriteString ("ident = ");	InOut.WriteString (id);		InOut.WriteLn;
    production  
  END
END syntax;


PROCEDURE compile;

BEGIN
  lastpos := 0;
  ReadChar (ch);
  GetSym;
  syntax;
  InOut.WriteBf
END compile;


PROCEDURE Init;

BEGIN
  ProgramEnd := FALSE;
  pos := 0;
  Line := 1;
  Column := 1;
  Arguments.GetArgs (count, args);

  TextIO.OpenInput (infile, args^ [1]^);
  IF  TextIO.Done () = FALSE  THEN
    InOut.WriteString ("Cannot open file. Aborting.");
    InOut.WriteLn;
    InOut.WriteBf;
    HALT
  END
END Init;


BEGIN
  Init;
  compile;
  TextIO.Close (infile)
END EBNF.
   
That's it. I placed a lot of WriteStrings in the source in order to see what's going on. Also, the program didn't work at first since I made some typos.

Page created February 2006,

Page equipped with GoogleBuster technology