A safe way to process and receive E-mail

Until a few years ago, it was perfectly safe to publish your E-mail address in a webpage with the familiar

<a href="mailto:john.doe@googlemail.com">
method. That changed with the invention of the E-mail harvester robots. Just as Google crawls your webpages to build the index for its search engine, so-called medical companies graze the webpages in order to harvest E-mail addresses. And the ones that are prefixed with the familiar 'mailto' are quite easy to find.

Of course there is the JavaScript trick that keeps away the majority of the harvester robots, but it remains a trick and it will only last until the harvesters get smarter. Here is the JavaScript trick to hide E-mail addresses:

   <script language="JavaScript">
    <!-- Begin
     user = "fruttenboel";
     site = "gmail.com";
     document.write ('<a href=\"mailt' + 'o:' + user + '@' + site + '\">');
     document.write (user + '</a>');
    // End -->
   </script>
   
And this is the preferred method among members of the WISclub. Or rather: one of the preferred methods. Nowadays it's not such a big deal if a Gmail account is flooded with spam. The Google spam filters are very good and get better every day.
Still, every spam message we can prevent is a gain. So better options are required. One such option is to make an HTML form that sends its data to a CGI executable.
Many such CGI scripts exist, but the majority are written in Perl, Python or another scripting language. Like BASIC, Perl takes forever to execute since it is an interpreted language, and all of this costs the webserver processing time. So if there is a better solution, it is better for the world, since a lower processor load means lower power consumption and a faster internet. So I introduce the mailCGI executable here, written in a safe, compiled language: Modula-2.

mailCGI version 1: how does it work?

Below is the source code for the first version of mailCGI. It relies on an HTML page (called post.html) that can be loaded from this site as well, through the navigator frame on the right.

This program is written in Modula-2. You can read and understand this program, without having knowledge of or experience with Modula-2. The language is like a poem.

      
MODULE mailCGI;

FROM  InOut		IMPORT  Read, ReadString, WriteBf, WriteLn, WriteString;
FROM  Strings	  	IMPORT  Assign, String, StrEq;
FROM  TextIO		IMPORT  File, OpenOutput, PutString, PutLn, Close, Done, PutChar, PutBf;
FROM  cgi		IMPORT  ServerDataType, CGItype;

IMPORT ASCII, InOut, cgi, NumConv;


VAR   content 	 	       : String;
      dest, from	       : String;
      PostLength 	       : CARDINAL;
      ok		       : BOOLEAN;
      index	      	       : CARDINAL;
      ch	      	       : CHAR;


PROCEDURE ConvertHex;                  (*  Process encoded tokens '%0D' etc   *)

VAR	  str	     : ARRAY [0..1] OF CHAR;
	  num	     : CARDINAL;

BEGIN
   InOut.Read (str [0]);
   InOut.Read (str [1]);
   INC (index, 2);
   NumConv.Str2Num (num, 16, str, ok);
   IF  NOT ok  THEN
      WriteString ("Error in number : ");	WriteString (str);	WriteLn;
      HALT
   END;
   ch := CHR (num)
END ConvertHex;


PROCEDURE GetChar;                     (*  Read one character from stdin  *)

BEGIN
   InOut.Read (ch);
   INC (index);
   IF  ch = '&'  THEN
      ch := ASCII.LF
   ELSIF  ch = '+'  THEN
      ch := ' '
   END;
   IF  ch = '%'  THEN  ConvertHex  END
END GetChar;


PROCEDURE ReadLine (VAR str : String);

VAR   i		   : CARDINAL;

BEGIN
   i := 0;
   LOOP
      GetChar;
      IF  ch = ASCII.LF  THEN  EXIT  END;
      IF  ch # ASCII.CR  THEN
         IF  ch # '='  THEN
            str [i] := ch;
            INC (i)
         ELSE
            str [i] := ' ';   str [i+1] := '=';   str [i+2] := ' ';
            INC (i, 3)
         END
      END;
      IF  index >= PostLength  THEN  EXIT  END;   (*  '>=' : ConvertHex can step index past PostLength  *)
   END;
   IF  i <= HIGH (str)  THEN  str [i] := 0C  END;
END ReadLine;


PROCEDURE ProcessPost;

BEGIN
   index := 0;
   LOOP
      ReadLine (content);
      WriteString (content);
      WriteLn;
      IF  index >= PostLength  THEN  EXIT  END;   (*  also ends the loop for an empty post  *)
   END;
END ProcessPost;


BEGIN
   cgi.InformServer (Text);
   IF  cgi.GetEnvVar (RequestMethod, content) = FALSE  THEN
      WriteString ('Request method not found.');
      WriteLn;
      HALT
   END;
   WriteString ('The REQUEST_METHOD is ');
   WriteString (content);
   WriteLn;
   IF  StrEq (content, 'POST')  THEN
      IF  cgi.GetEnvVar (ContentLength, content) = FALSE  THEN
         WriteString ('No content length specified');
         WriteLn;
         HALT
      ELSE
         WriteString ("Content length = ");
         WriteString (content);
         WriteLn;
         NumConv.Str2Num (PostLength, 10, content, ok);
         IF  NOT ok  THEN
            WriteString ("error in number : ");
            WriteString (content);
            HALT
         END
      END;
      ProcessPost;
   END;
   WriteBf
END mailCGI.
   
As you can see, the source is not too big and it reads quite well, even if you have no experience with Modula-2. Perhaps the only 'strange' construct is the procedure call with a period in the middle of the name, as in 'cgi.InformServer'. This means that the procedure 'InformServer' is declared in the library module 'cgi'. That's all.
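
To make the translation rules in GetChar and ReadLine concrete, here is a small sketch that applies the same rules to a fixed string instead of stdin: '&' starts a new line, '+' becomes a space, '=' is printed as ' = ' and '%xx' becomes the character with hexadecimal code xx. The names DecodeDemo, HexVal and Decode and the sample string are invented for this illustration, and I assume that InOut offers Write next to the procedures used in the listing. Unlike mailCGI, the sketch prints a malformed '%' as-is instead of halting.

```modula2
MODULE DecodeDemo;

(*  Illustration only. DecodeDemo, HexVal, Decode and the sample string
    are made up for this sketch; they are not part of mailCGI.          *)

FROM  InOut  IMPORT  Write, WriteBf, WriteLn, WriteString;


PROCEDURE HexVal (c : CHAR) : CARDINAL;     (*  16 means: not a hex digit  *)

BEGIN
   IF     (c >= '0') & (c <= '9')  THEN  RETURN ORD (c) - ORD ('0')
   ELSIF  (c >= 'A') & (c <= 'F')  THEN  RETURN ORD (c) - ORD ('A') + 10
   ELSIF  (c >= 'a') & (c <= 'f')  THEN  RETURN ORD (c) - ORD ('a') + 10
   ELSE   RETURN 16
   END
END HexVal;


PROCEDURE Decode (s : ARRAY OF CHAR);       (*  Print s in decoded form  *)

VAR   i, hi, lo  : CARDINAL;
      ch         : CHAR;

BEGIN
   i := 0;
   WHILE  (i <= HIGH (s)) & (s [i] # 0C)  DO
      ch := s [i];
      INC (i);
      IF  ch = '&'  THEN
         WriteLn                            (*  next field on a new line  *)
      ELSIF  ch = '+'  THEN
         Write (' ')
      ELSIF  ch = '='  THEN
         WriteString (" = ")
      ELSIF  (ch = '%') & (i + 1 <= HIGH (s))  THEN
         hi := HexVal (s [i]);
         lo := HexVal (s [i+1]);
         IF  (hi < 16) & (lo < 16)  THEN
            Write (CHR (16 * hi + lo));
            INC (i, 2)                      (*  skip the two hex digits  *)
         ELSE
            Write (ch)                      (*  not a valid token: keep the '%'  *)
         END
      ELSE
         Write (ch)
      END
   END;
   WriteLn
END Decode;


BEGIN
   Decode ("name=John+Doe&text=Hi%21");
   WriteBf
END DecodeDemo.
```

For this sample the sketch should print two lines: 'name = John Doe' and 'text = Hi!'.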

Do yourself a favor and run this executable to see what it does. You need to run it from the file 'post.html' which is in the navigator frame on the right.

What mailCGI does so far

So far, mailCGI only lets me get familiar with the data flow. I do so by telling the webserver that a lot of data is coming up and that all of it is plain text. From that moment on, the browser screen becomes a kind of DOS terminal where all messages are printed.

This way, I can fool around with WriteString and similar functions to see what's happening between the CGI executable and the Apache webserver. And how it works out on the screen.
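
For readers who do not have the cgi library at hand: in CGI terms, 'telling the webserver that plain text is coming' means writing a Content-type header followed by an empty line before any other output. The sketch below does this by hand. It is only my assumption that cgi.InformServer (Text) sends something equivalent; the module name HelloCGI is made up.

```modula2
MODULE HelloCGI;

(*  Illustration only: a CGI executable that announces plain text itself.
    Assumption: cgi.InformServer (Text) emits an equivalent header.        *)

FROM  InOut  IMPORT  WriteBf, WriteLn, WriteString;

BEGIN
   WriteString ("Content-type: text/plain");   WriteLn;
   WriteLn;                                 (*  empty line ends the header block  *)
   WriteString ("Hello from a CGI executable.");
   WriteLn;
   WriteBf                                  (*  flush the output, as mailCGI does  *)
END HelloCGI.
```

Everything written after the empty line ends up on the browser screen as plain text.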

Go play with the post.html file and try to crash the program. The strings cannot store more than 256 characters. Hint: you won't succeed.... This is not C. This is Modula-2.
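
ReadLine stores into its string without comparing i to HIGH (str), so an oversized field is presumably stopped by the compiler's runtime index check (an assumption about how the program was built). If you would rather cut off long input quietly, each store could go through a guard like the hypothetical Append below. The names AppendDemo, Append, buf, n and k are invented for this sketch.

```modula2
MODULE AppendDemo;

(*  Illustration only: a guarded store that truncates instead of trapping.  *)

FROM  InOut  IMPORT  WriteBf, WriteLn, WriteString;

VAR   buf   : ARRAY [0..7] OF CHAR;
      n, k  : CARDINAL;


PROCEDURE Append (VAR str : ARRAY OF CHAR; VAR i : CARDINAL; c : CHAR);

BEGIN
   IF  i < HIGH (str)  THEN                 (*  keep the last slot free for 0C  *)
      str [i] := c;
      INC (i)
   END
END Append;


BEGIN
   n := 0;
   FOR  k := 1  TO  20  DO                  (*  try to store 20 characters  *)
      Append (buf, n, 'x')
   END;
   buf [n] := 0C;
   WriteString (buf);                       (*  only 7 of them fit in buf  *)
   WriteLn;
   WriteBf
END AppendDemo.
```

The design choice is between stopping the program on bad input, which is what the index check does, and silently shortening the input, which is what this guard does.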

Compiling and installing CGI executables

This is a tricky part, if you are not running a Unix derivative. It helps to have many virtual terminals open. To smooth out things, I run three X terminals with:

  1. a terminal running the Mocka compiler IDE
  2. a terminal that runs root in /var/www/cgi-bin
  3. a terminal running my editor (jed) to change the html files
The Mocka IDE is easy to work with. It just alternates between an editor run and a compile/link run. Terminal two also does just two things:
      
bash-2.05# cp /home/jan/modula/cgi/mailCGI .
bash-2.05# chown root:root mailCGI
bash-2.05# ls -l
total 228
-rwxr-xr-x    1 root     root        17268 Feb  2  2005 P9add
-rwxr-xr-x    1 root     root        17364 Feb  2  2005 P9mul
-rwxr-xr-x    1 root     root        35063 Aug 27  2004 access
-rwxr-xr-x    1 root     root        40052 Aug 25  2004 cg
-rwxr-xr-x    1 root     root        40268 Aug 25  2004 counterCGI
-rwxr-xr-x    1 root     root        22292 Mar 18 23:37 mailCGI
-rwxr-xr-x    1 root     root          268 May 24  2001 printenv
-rwxr-xr-x    1 root     root          757 May 24  2001 test-cgi
-rwxr-xr-x    1 root     root        38507 Aug 23  2004 testCGI
bash-2.05# cp /home/jan/modula/cgi/mailCGI .
bash-2.05# chown root:root mailCGI
   
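I copy the executable from the Modula-2 directory cgi into the place where Apache expects the CGI scripts. After that, I change the ownership of the executable to root:root, like the other files in that directory. The listing shows that it carries the execute permission bits (rwxr-xr-x), and those are what allow it to be executed.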

Next: filtering the parameters

Soon more. But don't count on it....

Page started in 2005 and