CGI : Word Hasher

A few months ago, someone on Facebook mentioned a funny property of the human mind: when reading a text, it doesn't matter how the words are spelled. As long as the first and last letters remain where they should be, the other letters may be in any position in the word.
The accompanying text was quite easy to read. But hey, the text could have been handpicked for good readability. So I decided to make a word hasher that jumbles up all letters in a word, apart from the first and last letters. This page is about that program, written in Mocka Modula-2 and compiled to be a CGI executable.

Whash : the algorithm

The hashing of the intermediate letters is quite simple:

  1. One letter words are kept intact
  2. Two letter words are left untouched too
  3. Three letter words also remain as is
For all other words, the letters between the first and the last letter are put in random order. This is repeated until the input stream is empty.
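
The rule for longer words, shuffling everything between the first and the last letter, could look roughly like this in Python. This is only an illustration and not the Modula-2 program itself: the name scramble_word is made up here, and random.shuffle is swapped in for the letter-by-letter random draw that the real program performs.

```python
import random


def scramble_word(word, rng=random):
    """Shuffle the letters between the first and last letter of a word.

    Words of three letters or fewer are returned unchanged.
    scramble_word is a hypothetical helper, not part of whash.
    """
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # stands in for the repeated random draw in whash
    return word[0] + "".join(middle) + word[-1]


# The output differs from run to run, so no fixed result is shown.
print(" ".join(scramble_word(w) for w in "reading jumbled words is easy".split()))
```

The optional rng parameter is only there so that a seeded random.Random instance can be passed in for repeatable runs.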

To prevent a comma, exclamation point or any other punctuation mark from fouling up the result, each word is checked for these tokens, both leading and trailing.
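
A small Python sketch of that punctuation check, assuming at most one mark is removed on each side and using the same characters that IsPunct in the source further down tests for. The name split_punctuation and the three-part return value are choices made for this sketch only.

```python
# Character sets taken from the IsPunct procedure in the whash source.
LEADING = "[({\"'"
TRAILING = "\".,?!]});:"


def split_punctuation(token):
    """Return (lead, core, trail) with at most one mark stripped per side.

    The trailing mark is checked first, then the leading one,
    which is the order IsPunct uses as well.
    """
    lead = trail = ""
    if token and token[-1] in TRAILING:
        token, trail = token[:-1], token[-1]
    if token and token[0] in LEADING:
        lead, token = token[0], token[1:]
    return lead, token, trail


print(split_punctuation("(word)"))
print(split_punctuation("plain"))
```

Only the core part would then be scrambled, with the stripped marks written back around it.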

SkipTo

The SkipTo procedure is used to wade through a CGI text stream. The stream comes in character by character, as if the letters were entered by a person. SkipTo searches with one character of lookahead. It is quite simple: it compares the target string to the incoming characters. As soon as the characters do not match up anymore, the search is restarted.
Here is the source of SkipTo:

PROCEDURE SkipTo (tx : ARRAY OF CHAR);

VAR	ch	: CHAR;
	j, m	: CARDINAL;

BEGIN
  m := Strings.Length (tx);
  LOOP
    j := 0;
    REPEAT  InOut.Read (ch)  UNTIL  ch = tx [j];	(* Hunt for first token 	*)
    IF  NOT InOut.Done ()  THEN  EXIT  END;
    WHILE  (ch = tx [j]) AND (j < m)  DO		(* As long as the tokens match	*)
      INC (j);						(* keep fetching tokens 	*)
      IF  j = m  THEN  EXIT  END;			(* until we found the text	*)
      InOut.Read (ch)					
    END
  END
END SkipTo;
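
For comparison, here is a Python sketch of the same idea: consume a character stream until a target string has gone by. It is not a translation of the procedure above. Instead of restarting on a mismatch, it keeps a sliding window of the last few characters, which is an easy way to also handle targets with repeated letters. The name skip_to and the True/False return value are assumptions of this sketch.

```python
import io
from collections import deque


def skip_to(stream, target):
    """Read from stream until target has just been consumed.

    Returns True when target was found and False when the stream
    ended first. Uses a sliding window of len(target) characters.
    """
    if not target:
        return True
    window = deque(maxlen=len(target))
    while True:
        ch = stream.read(1)
        if ch == "":  # end of input
            return False
        window.append(ch)
        if "".join(window) == target:
            return True


stream = io.StringIO("junk tekst=hello")
print(skip_to(stream, "tekst="))  # True
print(stream.read())              # hello
```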
   

Whash : the source

Below is the source of the Modula-2 program that accomplishes the word hash:

MODULE whash03;

(*	version 2: handle punctuation
	version 3: cgi version			*)

IMPORT InOut, Random, Strings, SysLib;

VAR	src, dest	: Strings.String;
	count,
	k, l, m, rem	: CARDINAL;
	punct		: CHAR;
	Insq, Indq	: BOOLEAN;


PROCEDURE IsPunct (VAR str  : Strings.String) : CHAR;

VAR	ch, rc	: CHAR;
	l	: CARDINAL;

BEGIN
  l := Strings.Length (str);
  ch := str [l-1];		(* rc = Returned Character *)
  rc :=' ';
  IF  (ch = "'") OR (ch = '"') OR (ch = '.') OR (ch = ',') OR (ch = '?') OR
      (ch = '!') OR (ch = ']') OR (ch = '}') OR (ch = ')') OR (ch = ';') OR (ch = ':')  THEN
    Strings.Delete (str, l-1, 1);
    rc := ch
  END;
  ch := str [0];
  IF  (ch = '[') OR (ch = '(') OR (ch = '{') OR (ch = '"') OR (ch = "'")  THEN
    Strings.Delete (str, 0, 1);
    InOut.Write (ch)
  END;
  RETURN rc
END IsPunct;


PROCEDURE Init;

VAR	seed	: INTEGER;

BEGIN
  SysLib.time (seed);
  Random.initSeed (CARDINAL (seed));
  InOut.WriteString ("Content-type: text/html");
  InOut.WriteLn;
  InOut.WriteLn
END Init;



PROCEDURE SkipTo (tx : ARRAY OF CHAR);

VAR	ch	: CHAR;
	j, m	: CARDINAL;

BEGIN
  m := Strings.Length (tx);
  LOOP
    j := 0;
    REPEAT  InOut.Read (ch)  UNTIL  ch = tx [j];
    IF  NOT InOut.Done ()  THEN  EXIT  END;
    WHILE  (ch = tx [j]) AND (j < m)  DO
      INC (j);
      IF  j = m  THEN  EXIT  END;
      InOut.Read (ch)
    END
  END
END SkipTo;


PROCEDURE BuildHtml (nr : CARDINAL);

BEGIN
  IF  nr = 0  THEN
    InOut.WriteString ('<!doctype html><html lang="en">');
    InOut.WriteString ('<head><meta charset = "utf-8"><title>Word Hash: scrambled words</title></head>');
    InOut.WriteString ('<body><p><h1>Jumbled up words</h1><hr><h2>');
    InOut.WriteLn;
    InOut.WriteBf
  ELSIF nr = 1  THEN
    InOut.WriteString ('</h2></p><hr><p>Page created 4 February 2016,');
    InOut.WriteString ('<script>var date = new Date (document.lastModified);');
    InOut.WriteString ('document.write (" last revised on " + date.toDateString() + "<br>");');
    InOut.WriteString ('document.write ("This page located at " + document.URL + "<br>");');
    InOut.WriteString ('</script></p></body></html>');
    InOut.WriteLn;
    InOut.WriteBf
  END
END BuildHtml;


BEGIN
  Init;
  BuildHtml (0);
  SkipTo ('tekst=');
  count := 0;
  LOOP
    InOut.ReadString (src);
    INC (count);
    IF  Strings.StrEq (src, "Done=Klaar")  THEN  EXIT  END;
    IF  NOT InOut.Done () THEN  EXIT  END;
    IF  count > 250  THEN  
      InOut.WriteString ("  **  Too much text; aborting  **  ");
      EXIT
    END;
    punct := IsPunct (src);
    l := Strings.Length (src);
    dest := src;
    IF  l > 3  THEN
      rem := l - 2;
      Strings.Delete (src, l - 1, 1);
      Strings.Delete (src, 0, 1);
      m := 1;
      WHILE  rem > 1  DO
        k := Random.nr (rem);
	dest [m] := src [k];
	Strings.Delete (src, k, 1);
	DEC (rem);
	INC (m)
      END;
      dest [m] := src [0]
    END;
    InOut.WriteString (dest);
    InOut.Write (punct);
    IF  punct # ' '  THEN
      InOut.Write (' ')
    END
  END;
  BuildHtml (1)
END whash03.
   

Whash : the web form

Of course the CGI executable needs data. Therefore we have the whash web form. The most important line in it reads:

<form action="https:verhoeven272.nl/cgi-bin/whash" method="post" enctype="text/plain">
The first part tells where the CGI executable can be found. The second part says we're using method post and the third part says the form data is encoded as text/plain. With this last option the browser sends the form fields unencoded, one name=value pair per line, and Apache feeds that payload to the CGI executable as if it were characters typed by the user. In other words: the input is done via stdin. This is very convenient, since we can now use the usual InOut module.

Here is the complete form:

<!doctype html>
<html lang="en">

 <head>
  <meta charset	= 'utf-8'>
  <meta name = 'author'	content = 'Jan Verhoeven'>
  <meta name = 'generator' content = 'Jan, Jed and some common sense'>
  <title>Word Hash: scrambled words</title>
  <link type="text/css" rel="stylesheet" href="../frutt.css">
 </head>
 
 <body>
  <form action="https:verhoeven272.nl/cgi-bin/whash" method="post" enctype="text/plain">
   <h2>Jumbled words.</h2>
   According to a study at Cambridge University, spelling is not very important for reading words. As long as the first
   and last letters of each word are right, all the letters in between can be in any order. It is described in more
   detail on
   <a href="https://en.wikipedia.org/wiki/Typoglycemia" target="_blank">wikipedia</a>.
   <br><br>
   <textarea name="tekst" rows="10" cols="40"></textarea>
   <br>
    Only the first 1000 characters are processed so it doesn't make sense to try to hash-up Tolstoi's "War and Peace".
   <br>
   <table>
    <tr>
     <td><input type="hidden" name="Done" value="Klaar"></td>
     <td><input type="submit" value="Scramble"></td>
     <td><input type="reset" value="Erase"></td>
    </tr>
   </table>
  </form>

  <p class="summ">
   Page created 4 February 2016,
   <script>
    var date = new Date (document.lastModified);
    document.write ("last revised on " + date.toDateString() + "<br>");
    document.write ("This page located at " + document.URL + "<br>");
   </script>
  </p>
 </body>
</html>
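
To make the text/plain payload a bit more concrete, here is a Python sketch of what the CGI executable gets to see and how the words can be picked out of it. It assumes the browser sends the fields in document order, one name=value pair per line, and that words are separated by whitespace. The function name words_from_body is made up for this sketch; the limit of 250 words and the Done=Klaar marker come from the whash source.

```python
def words_from_body(body, field="tekst", sentinel="Done=Klaar", limit=250):
    """Extract the words of one text/plain form field from a POST body.

    Reading stops at the hidden Done=Klaar field or after `limit` words.
    """
    start = body.find(field + "=")
    if start < 0:
        return []
    words = []
    for token in body[start + len(field) + 1:].split():
        if token == sentinel or len(words) >= limit:
            break
        words.append(token)
    return words


# Example body as a browser would send it for the form above.
body = "tekst=Hello there, world!\r\nDone=Klaar\r\n"
print(words_from_body(body))  # ['Hello', 'there,', 'world!']
```

The hidden Done field works as an end marker here: because it follows the textarea in the form, it is the first token after the user's text.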
   

Page created on 18 Feb 2016 and