Compiler construction
In chapters 3 and 4 of Compiler Construction, Professor Wirth published an EBNF scanner that builds upon the ideas of recursive descent and LL(1) compilers. In order to better understand how things work, I have entered the source and ported it to the Mocka compiler (the original was in Oberon). Below you can read what I had to make of it.
EBNF scanner ported to Mocka.
In the original source, Professor Wirth liberally used functionality from Oberon modules. This, of course was not possible with Mocka. So I had to change things a bit.
MODULE EBNF;
IMPORT ASCII, Arguments, TextIO, InOut;
CONST IdLen = 32;
ident = 0;
literal = 2;
lparen = 3;
lbrak = 4;
lbrace = 5;
bar = 6;
eql = 7;
rparen = 8;
rbrak = 9;
rbrace = 10;
period = 11;
other = 12;
Not much changed here. I only had to compensate for different line endings (CR in Oberon, LF in Unix) and I
wanted to have the constants defined in such a way that they are easier to read back.
TYPE Identifier = ARRAY [0..IdLen] OF CHAR;
VAR ch : CHAR;
count : SHORTCARD;
Line, Column, lastpos,
pos, sym : CARDINAL;
id : Identifier;
ProgramEnd : BOOLEAN;
infile : TextIO.File;
args : Arguments.ArgTable;
PROCEDURE ReadChar (VAR ch : CHAR);
BEGIN
TextIO.GetChar (infile, ch);
INC (pos);
IF TextIO.EOF (infile) = TRUE THEN ProgramEnd := TRUE END;
INC (Column);
IF ch = ASCII.LF THEN
INC (Line);
Column := 0
END
END ReadChar;
ReadChar was new. It replaces Texts.Reach (R, ch) calls. Originally I wanted to make my oen Texts module but
luckily I came to my senses right in time. So I did things the easy way and I rewrote the 'Get Character'
routine to standard Modula-2. This version has the 'pos' variable native. And it handles Line/Column number
increments.
PROCEDURE error (n : CARDINAL);
BEGIN
IF pos > lastpos + 4 THEN
InOut.WriteString ("Error at line"); InOut.WriteCard (Line, 4);
InOut.WriteString (", column "); InOut.WriteCard (Column, 3);
InOut.WriteString (" : sym = "); InOut.WriteCard (sym, 0);
InOut.WriteLn;
END
END error;
PROCEDURE GetSym;
VAR i : INTEGER;
BEGIN
WHILE (NOT ProgramEnd) AND (ch <= ' ') DO
ReadChar (ch)
END;
CASE ch OF
'A'..'Z',
'a'..'z' : sym := ident; i := 0;
REPEAT
id [i] := ch;
INC (i);
ReadChar (ch)
UNTIL (CAP (ch) < 'A') OR (CAP (ch) > 'Z');
id [i] := 0C |
'"' : ReadChar (ch);
sym := literal;
i := 0;
WHILE (ch # '"') AND (ch > ' ') DO
id [i] := ch;
INC (i);
ReadChar (ch)
END;
IF ch <= ' ' THEN error (1) END;
id [i] := 0C;
ReadChar (ch) |
'=' : sym := eql; ReadChar (ch) |
'(' : sym := lparen; ReadChar (ch) |
')' : sym := rparen; ReadChar (ch) |
'[' : sym := lbrak; ReadChar (ch) |
']' : sym := rbrak; ReadChar (ch) |
'{' : sym := lbrace; ReadChar (ch) |
'}' : sym := rbrace; ReadChar (ch) |
'|' : sym := bar; ReadChar (ch) |
'.' : sym := period; ReadChar (ch)
ELSE
sym := other; ReadChar (ch)
END
END GetSym;
GetSym is the heart of the matter. It reads the symbols and operands and assigns them numbers. The numbers
were defined in the CONST section above. Wirth makes use of constants instead of enumerations. In this case,
the enumeration would have worked as well, but in larger systems, when there are more than 32 items to
differentiate, enumerated types would not have been possible anyway. So we stick to the methods with the
constants, although it has a high degree of Sinclair ZX-Basic-ness. But it works.
PROCEDURE expression;
PROCEDURE term;
PROCEDURE factor;
BEGIN
IF sym = ident THEN
InOut.WriteString ("identifier = ");
InOut.WriteString (id); InOut.WriteLn;
(* record (T0, id, 1); *)
GetSym
ELSIF sym = literal THEN
InOut.WriteString ("string = ");
InOut.WriteString (id); InOut.WriteLn;
(* record (T1, id, 0); *)
GetSym
ELSIF sym = lparen THEN
InOut.WriteString ("symbol = ("); InOut.WriteLn;
GetSym;
expression;
IF sym = rparen THEN GetSym ELSE error (2) END
ELSIF sym = lbrak THEN
InOut.WriteString ("symbol = ["); InOut.WriteLn;
GetSym;
expression;
IF sym = rbrak THEN GetSym ELSE error (3) END
ELSIF sym = lbrace THEN
InOut.WriteString ("symbol = {"); InOut.WriteLn;
GetSym;
expression;
IF sym = rbrace THEN GetSym ELSE error (4) END
ELSE
error (5)
END
END factor;
BEGIN (* term *)
factor;
WHILE sym < bar DO factor END
END term;
BEGIN (* expression *)
term;
WHILE sym = bar DO GetSym; term END
END expression;
This big nested set of procedures represents the structure of EBNF. In fact, the nesting of procedures is a
copy of the EBNF standard (expressed in EBNF as well). It may look a bit weird, but this way, with recursively
called subroutines, the scanner becomes much simpler since no 'nesting depth' need to be taken into account.
This is one of the tricks described in Compiler Construction. If you want to have a copy of the book (in PDF
format) get one via the download section.
PROCEDURE production;
BEGIN
GetSym;
IF sym = eql THEN GetSym ELSE error (7) END;
expression;
IF sym = period THEN GetSym ELSE error (8) END
END production;
PROCEDURE syntax;
BEGIN
WHILE sym = ident DO
InOut.WriteString ("ident = "); InOut.WriteString (id); InOut.WriteLn;
production
END
END syntax;
PROCEDURE compile;
BEGIN
lastpos := 0;
ReadChar (ch);
GetSym;
syntax;
InOut.WriteBf
END compile;
PROCEDURE Init;
BEGIN
ProgramEnd := FALSE;
pos := 0;
Line := 1;
Column := 1;
Arguments.GetArgs (count, args);
TextIO.OpenInput (infile, args^ [1]^);
IF TextIO.Done () = FALSE THEN
InOut.WriteString ("Cannot open file. Aborting.");
InOut.WriteLn;
InOut.WriteBf;
HALT
END
END Init;
BEGIN
Init;
compile;
TextIO.Close (infile)
END EBNF.
That's it. I placed a lot of WriteStrings in the source in order to see what's going on. Also, the program
didn't work at first since I made some typos.
Page created February 2006,
Page equipped with GoogleBuster technology