The CGI 404 handler
I've had my share of 404 handlers on my website. In hindsight I must say: the most spartan 404 handler is the best. If you make a 404 page in HTML you end up being silly. Joking about a 404 isn't very nice. Or funny. A human made an error and gets punished for it immediately. By a piece of silicon. Instructed by a smartass on clogs and a tulip between his cheeks.
For a while I tried to please my 404 clients by serving them my sitemap file. But that's like forcing people to read a phonebook when they're just asking for your business card. So the sitemap file didn't stay up for long either.
This topic is about making a smart 404 handler. It will be smart since it will serve a dedicated page and it will inform you about your error. It will be a mix of Modula-2 (for interrogating the webserver and adding 'speed' to the subject) and JavaScript (for the user interface).
Preparations.
It is my intention to have this 404 handler installed on the server of my webhost. If I make a sloppy or
clumsy program, I may well cripple the webserver on which my webhost hosts my site. But I'm not the only user
on that server. So in the worst case, I will bring down the full server, thereby making a lot of websites
inaccessible. Please visit the section 'CGI loop' to get an indication what could happen. So we must be
careful from the very first moment.
If this isn't your first visit to Fruttenboel, you will know that I run Linux wherever possible. So I run a
webserver as standard on all machines in this house. On Beryllium, I run apache-perl. You need to customize
one line in /etc/apache-perl/httpd.conf. This is what it should look like:
# This controls which options the .htaccess files in directories can
# override. Can also be "All", or any combination of "Options", "FileInfo",
# "AuthConfig", and "Limit"
#
AllowOverride All
From now on, you will be able to use '.htaccess' files for instructing the webserver how to handle
ErrorDocuments. The webserver will try to find a '.htaccess' file in the current directory. If there isn't one
there, Apache will travel up the directory tree and use the first one it finds.
In my case, I have a custom '.htaccess' file in /fruttenboel/cgi and it reads:
ErrorDocument 400 /errors/400.html
ErrorDocument 401 /errors/401.html
ErrorDocument 402 /errors/402.html
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /cgi-bin/testCGI
All file references are relative to the document root of Apache (which in my case is '/var/www'). The first four error handlers are more or less silly HTML documents. The 404 handler at this moment is a file we know from history: testCGI (it all started with this one).
An ErrorDocument does not have to be a file. It can also be a literal text. With a line like
ErrorDocument 404 'You made a typo, Asssmart!'
that specific text would end up in a blank webpage! Apache is a very powerful webserver and you just need to read some books about it. Websearches don't reveal everything. This webpage not taken into account of course.
At this moment, I have my test environment set up. I only need a controlled way to trigger a 404. To do so, I added the following link to the navigator frame (on the right):
o <a href="404handler.html" Target="main">404 handler</a> <br>
....
o <a href="farm3.html" Target="main">Create a 404</a> <br>
Make the modifications and give it a try. You will be pleasantly surprised....
404 handler: first tests
Our first test is with the 'testCGI' executable in the right place and with the right privileges. It traps all CGI environment variables and shows them on screen in a formatted and controlled way. The most striking part is the following:

Apart from the well known CGI environment variables (which we discovered in the previous experiments and
projects in this section), there are now four or five more variables:
| Variable name | Purpose or payload |
|---|---|
| REDIRECT_ERROR_NOTES | File does not exist: /var/www/net/fruttenboel/cgi/farm3.html<br>An error message followed by the full, absolute server path of the requested file |
| REDIRECT_REQUEST_METHOD | GET |
| REDIRECT_STATUS | 404<br>This is the status code. In this case, a 404. But this same handler could be extended to handle an arbitrary number of other status codes. Just inspect this variable. |
| REDIRECT_URL | /net/fruttenboel/cgi/farm3.html<br>This is the relative path (from /var/www down) to the file that triggered this status code handler |
| REQUEST_URI | /net/fruttenboel/cgi/farm3.html<br>In this case, URI is URL. |
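To make the table a bit more tangible, here is a minimal sketch in JavaScript of how a handler could collect these variables. It is not part of the Modula-2 project; the function name readRedirectInfo and the field names are made up for illustration, and the sample values are simply the ones from the table above.

```javascript
// Sketch: gather the extra variables an ErrorDocument handler receives.
// `env` is any object holding environment variables (process.env in Node.js).
// readRedirectInfo is a hypothetical helper, not an existing API.
function readRedirectInfo(env) {
  return {
    status: env.REDIRECT_STATUS || null,          // e.g. "404"
    url: env.REDIRECT_URL || null,                // path that triggered the error
    method: env.REDIRECT_REQUEST_METHOD || null,  // e.g. "GET"
    notes: env.REDIRECT_ERROR_NOTES || null,      // the server's error message
    requestUri: env.REQUEST_URI || null
  };
}

// Values taken from the table above.
const sampleEnv = {
  REDIRECT_ERROR_NOTES: "File does not exist: /var/www/net/fruttenboel/cgi/farm3.html",
  REDIRECT_REQUEST_METHOD: "GET",
  REDIRECT_STATUS: "404",
  REDIRECT_URL: "/net/fruttenboel/cgi/farm3.html",
  REQUEST_URI: "/net/fruttenboel/cgi/farm3.html"
};
const info = readRedirectInfo(sampleEnv);
console.log(info.status + " for " + info.url);
```

Missing variables come back as null, so the caller can tell "not set" apart from an empty page name.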
The first thing that comes to mind is: damned! I need to extend the CGI module! It lacks some CGI types which we're going to need for this project.
Changing the CGI module
We need the following CGItypes, which are not in the present CGI module: RedirectStatus, RedirectUrl and RedirectNotes.
Here's the new CGI.DEF file:
DEFINITION MODULE cgi;

FROM Strings IMPORT String;

TYPE  ServerDataType = (Text, Html, Gif, Jpeg, PS, Mpeg);
      CGItype = (ContentLength, GatewayInterface, HttpHost,
                 HttpReferer, QueryString, RemoteAddress,
                 RemoteHost, RemotePort, RequestMethod,
                 ScriptName, ServerAddress, ServerName,
                 ServerPort, ServerProtocol, ServerSignature,
                 DocumentRoot, RedirectStatus, RedirectUrl,
                 RedirectNotes, none);

PROCEDURE InformServer (dataType : ServerDataType);
PROCEDURE CheckType (str : String) : CGItype;
PROCEDURE GetEnvVar (kind : CGItype; VAR res : String) : BOOLEAN;

END cgi.
The implementation module of CGI does not change much. At the end of 'CheckType' I added three more checks:
IMPLEMENTATION MODULE cgi;
...
PROCEDURE CheckType (str : String) : CGItype;
BEGIN
  CAPS (str);       (* Convert entire string to capitals. *)
  IF pos ('CONTENT_LENGTH', str) = 0 THEN
    RETURN ContentLength
  ...
  ELSIF pos ('REDIRECT_STATUS', str) = 0 THEN
    RETURN RedirectStatus
  ELSIF pos ('REDIRECT_URL', str) = 0 THEN
    RETURN RedirectUrl
  ELSIF pos ('REDIRECT_ERROR_NOTES', str) = 0 THEN
    RETURN RedirectNotes
  ELSE
    RETURN none
  END
END CheckType;
That's all! The environment value extractor gets a CGItype as a parameter and returns a string. So we need not
touch that procedure. We have only added three more classes to the CGI object! Yikes. Where's the dettol? I
need to rinse my mouth! Classes and objects....
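For readers who don't speak Modula-2: 'CheckType' uppercases the string and then tests whether it starts with one of the known variable names (a 'pos' result of 0 means "found at the very beginning"). A rough JavaScript equivalent, as a sketch only. The table below lists just the names visible in the excerpt above, and checkType and CGI_TYPES are illustrative names, not part of the real module.

```javascript
// Sketch of the CheckType idea: map an environment string to a type name.
// Only the entries shown in the Modula-2 excerpt are listed here.
const CGI_TYPES = [
  ["CONTENT_LENGTH", "ContentLength"],
  ["REDIRECT_STATUS", "RedirectStatus"],
  ["REDIRECT_URL", "RedirectUrl"],
  ["REDIRECT_ERROR_NOTES", "RedirectNotes"]
];

function checkType(str) {
  const upper = str.toUpperCase();          // same job as CAPS (str)
  for (const [name, type] of CGI_TYPES) {
    if (upper.startsWith(name)) {           // same job as pos (name, str) = 0
      return type;
    }
  }
  return "none";                            // unknown variable
}

console.log(checkType("redirect_url=/net/fruttenboel/cgi/farm3.html"));
```

A lookup table keeps the three additions to three lines, which is about as painless as the ELSIF chain.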
By the way: compile the enhanced modules as usual:
jan@beryllium:~/modula/cgi$ mocka
Mocka 0608m
>> d cgi
>> i cgi
>> c cgi
.. Compiling Definition of cgi
.. Compiling Implementation of cgi I/0004 II/0004
>>
Done! I just love Modula-2 and the Mocka compiler (as adapted by Dr Maurer).
404 handler : a first attempt
On a normal webpage, you would be presented with the final solution. But not here. Fruttenboel is not about solutions to problems. Fruttenboel is about the ROAD that was taken to come to a solution. Including all the detours and dead ends. So I'm going to be fair right now:
So that's in a nutshell how we got here. In the meantime I already made some kind of 404 handler. It works
nicely, but it is based on analysis of the wrong CGI environment variable: the referrer! Which isn't present
in the first place when the 404 generating phrase was typed in the URL bar!
Still, it's a nice program and here it is:
MODULE S404;
(* Attempt to make a smart 404 handler January 2008 *)
IMPORT cgi, InOut, Strings;
TYPE Target = (fam, frutt);
VAR path, Title, content : Strings.String;
Frame : ARRAY [0..2] OF Strings.String;
target : Target;
PROCEDURE CreateHTML;
BEGIN
InOut.WriteString ("<html><head>");
InOut.WriteString ("</head>"); InOut.WriteLn;
InOut.WriteString ("<body><center><h1>");
InOut.WriteString ("404<p>");
InOut.WriteString ("You will be redirected to the most probable main section.<p>");
InOut.WriteString ("You wanted to access the following page:<p>");
InOut.WriteString (content);
InOut.WriteString ("</h1></center></body>");
InOut.WriteString ('<script language="JavaScript">'); InOut.WriteLn;
InOut.WriteString ("<!--"); InOut.WriteLn;
InOut.WriteString ("alert ('Page not found!');");
InOut.WriteString ("parent.location.href = ");
IF target = frutt THEN
InOut.WriteString ("'/fruttenboel/index.html'")
ELSE
InOut.WriteString ("'/net/verhoeven272/index.html'")
END;
InOut.WriteLn;
InOut.WriteString ("// -->"); InOut.WriteLn;
InOut.WriteString ("</script>");
InOut.WriteString ("</html>"); InOut.WriteLn;
InOut.WriteBf
END CreateHTML;
BEGIN
cgi.InformServer (cgi.Html);
IF cgi.GetEnvVar (cgi.HttpReferer, content) = FALSE THEN
content := 'fruttenboel'
END;
IF Strings.pos ('fruttenboel', content) > HIGH (content) THEN
target := fam
ELSE
target := frutt
END;
CreateHTML
END S404.
Just compile the program as usual and copy the executable (as root) to the correct place in the directory
tree. In my case
beryllium:/home/jan/# cp /home/jan/modula/cgi/S404 /usr/lib/cgi-bin/
Change the '.htaccess' file to
ErrorDocument 400 /errors/400.html
ErrorDocument 401 /errors/401.html
ErrorDocument 402 /errors/402.html
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /cgi-bin/S404
This is enough to test the new 404 handler. It produces the following screen:

The page will remain on screen until you press the 'OK' button. Then the second section of the dynamically generated JavaScript starts redirecting:
InOut.WriteString ("parent.location.href = ");
IF target = frutt THEN
  InOut.WriteString ("'/fruttenboel/index.html'")
ELSE
  InOut.WriteString ("'/net/verhoeven272/index.html'")
END;
So, depending on the origin of the error, the erroneous webvisitor will be redirected to the family site or to
the technical site. Not very smart yet, but this is just a start.
The biggest problem at the moment is the way I try to determine the intended target based on a CGI environment variable which isn't always present.
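The decision S404 makes can be written down in a few lines of JavaScript. This is only a sketch of the same logic, with a hypothetical function name (chooseTarget). It also shows the weak spot: when there is no referrer, the fallback silently picks the technical site.

```javascript
// Sketch of the S404 decision: pick a landing page based on the referrer.
// chooseTarget is an illustrative name; the two paths are the ones S404 uses.
function chooseTarget(referer) {
  // S404 falls back to 'fruttenboel' when HTTP_REFERER is not set.
  const source = referer || "fruttenboel";
  return source.includes("fruttenboel")
    ? "/fruttenboel/index.html"        // technical site
    : "/net/verhoeven272/index.html";  // family site
}

console.log(chooseTarget("http://fruttenboel.verhoeven272.nl/cgi/cgicontent.html"));
console.log(chooseTarget(undefined)); // typed into the URL bar: no referrer at all
```

Both calls end up at the same page, so a visitor who mistyped a family-site address in the URL bar would still land on the technical site.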
404 handler : improved version
Below is an improved, not to say: superior, version of the 404 handler. Main differences:
MODULE S4041;
(* Attempt to make a smart 404 handler 15 January 2008 *)
(* It works, but it is based on the wrong CGI environment variables *)
(* The CGI module has been changed. This version is based on the modifications *)
(* This version uses the REDIRECT related variables 15 January 2008 *)
IMPORT cgi, InOut, Strings;
TYPE Target = (fam, frutt);
VAR path, Title, content, status : Strings.String;
Frame : ARRAY [0..2] OF Strings.String;
target : Target;
PROCEDURE CreateHTML;
BEGIN
InOut.WriteString ("<html><head>");
InOut.WriteString ("</head>"); InOut.WriteLn;
InOut.WriteString ("<body><center><h2>");
InOut.WriteString (status);
InOut.WriteString ("<p>You will be redirected to the most probable main section.<p>");
InOut.WriteString ("You wanted to access the following page:<p>");
InOut.WriteString (content);
InOut.WriteString ("</h2></center></body>");
InOut.WriteString ('<script language="JavaScript">'); InOut.WriteLn;
InOut.WriteString ("<!--"); InOut.WriteLn;
InOut.WriteString ("alert ('Page not found : redirecting');");
InOut.WriteString ("parent.location.href = ");
IF target = frutt THEN
InOut.WriteString ("'/fruttenboel/index.html'")
ELSE
InOut.WriteString ("'/net/verhoeven272/index.html'")
END;
InOut.WriteLn;
InOut.WriteString ("// -->"); InOut.WriteLn;
InOut.WriteString ("</script>");
InOut.WriteString ("</html>"); InOut.WriteLn;
InOut.WriteBf
END CreateHTML;
PROCEDURE ShowContent (str : ARRAY OF CHAR);
BEGIN
InOut.WriteString ("<h2>");
InOut.WriteString (str);
InOut.WriteString (" = ");
InOut.WriteString (content);
InOut.WriteString ("</h2>");
InOut.WriteLn
END ShowContent;
BEGIN
cgi.InformServer (cgi.Html);
IF cgi.GetEnvVar (cgi.RedirectUrl, content) = FALSE THEN
content := "RedirectUrl not found";
ShowContent ("ERROR");
HALT
END;
IF cgi.GetEnvVar (cgi.RedirectStatus, status) = FALSE THEN
content := "RedirectStatus not found";
ShowContent ("ERROR");
HALT
END;
IF Strings.pos ('fruttenboel', content) > HIGH (content) THEN
target := fam
ELSE
target := frutt
END;
CreateHTML
END S4041.
Compile it. Let 'root' copy it to the cgi-bin. Then create the error. This new version needs the following '.htaccess' file:
ErrorDocument 400 /errors/400.html
ErrorDocument 401 /errors/401.html
ErrorDocument 402 /errors/402.html
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /cgi-bin/S4041
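For comparison, the flow of S4041 (check both REDIRECT variables, pick a target, write the page) can be sketched in JavaScript as follows. This is an illustration under assumptions, not a port: buildErrorPage and escapeHtml are made-up names, and the escaping step is an addition that S4041 does not have. It is included because the requested path comes straight from the visitor.

```javascript
// Hypothetical helper: make text safe to place inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Sketch of the S4041 flow. `env` holds the environment variables.
function buildErrorPage(env) {
  const url = env.REDIRECT_URL;
  const status = env.REDIRECT_STATUS;
  if (!url) return "<h2>ERROR = RedirectUrl not found</h2>";
  if (!status) return "<h2>ERROR = RedirectStatus not found</h2>";

  const target = url.includes("fruttenboel")
    ? "/fruttenboel/index.html"
    : "/net/verhoeven272/index.html";

  return [
    "<html><head></head>",
    "<body><center><h2>" + escapeHtml(status),
    "<p>You will be redirected to the most probable main section.<p>",
    "You wanted to access the following page:<p>" + escapeHtml(url),
    "</h2></center>",
    "<script>",
    "alert('Page not found : redirecting');",
    "parent.location.href = '" + target + "';",
    "</script>",
    "</body></html>"
  ].join("\n");
}

console.log(buildErrorPage({
  REDIRECT_STATUS: "404",
  REDIRECT_URL: "/net/fruttenboel/cgi/farm3.html"
}));
```

Returning the page as a string, instead of writing it piece by piece, makes the handler easy to try out without a webserver.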
404 handler : extensions
The current version works. Locally. I want to do some more testing before I upload it to the cgi-bin of my webhost, but I can't wait to do so. This is the end of lost visitors due to 404's. From now (then) on visitors are glued to my site. With Loctite.
Things to do:
S4042 and beyond
S4041 was the local version. S4042 was the version intended to be placed in my rented webspace. S4042 was uploaded and the '.htaccess' file was adapted. Still, it did not run. So I sent a mail to my webhost's support team. Their answer: compiled executables are not acceptable error handlers for our servers.
Scripts, written in Perl and such, were acceptable. But I know no Perl or Ruby. So I had the second bright moment of this week and I rewrote the 404 handler in JavaScript only. This is what it looks like. It is called 'H404.html'.
<html>
<head>
<title>Smart 404 handler in JavaScript</title>
</head>
<body>
<center>
<h2>
Error 404
<p>
You will be redirected to the most probable main section.
<p>
You wanted to access the following page:
<p>
<script language="JavaScript">
<!--
document.write (document.URL);
document.write ("<p>");
document.write ("You will be redirected to ");
loca = new String; target = new String;
loca = document.URL; pos = loca.indexOf ("fruttenboel");
if ( pos < 0)
target = "/net/verhoeven272/index.html"
else
target = "/fruttenboel/index.html"
;
document.write (target);
document.write ("<p>");
alert ('Resistance is futile!');
parent.location.href = target;
-->
</script>
</h2>
</center>
</body>
</html>
This is also some kind of script. I uploaded it and changed '.htaccess' once more. This is the result:

Cookies my foot. This would be the third redirector in a row. Apparently this has been disabled by my webhost. So I will have to learn to live with it. Still, at home, on the personal Apache server, it runs very well.
In the meantime I found out that .htaccess files at De Heeg need to be of the form:
ErrorDocument 401 /errors/401.html
ErrorDocument 402 http://www.verhoeven272.nl/errors/402.html
ErrorDocument 403 http://www.verhoeven272.nl/errors/403.html
ErrorDocument 404 http://www.verhoeven272.nl/errors/H404.html
So, all the error documents need a fully qualified URL, except the handler for the 401.
if ( pos < 0)
   target = "/net/verhoeven272/index.html"
else
   target = "/fruttenboel/index.html"
;
This is a relative path and it might just as well be that there should be fully qualified URLs there as well.
So I changed these lines to
if (pos < 0)
   target = "http://www.verhoeven272.nl/index.html";
else
   target = "http://fruttenboel.verhoeven272.nl/index.html";
I changed the .htaccess file, went to this CGI section and clicked 'Create a 404'. I didn't get the
redirection error message anymore. Yet, this is what happened:

This however is very reassuring. It means that the JavaScript method works. The only problem is: this is a redirected 404 file. So the H404 file does not get the data that caused the 404 to happen. In a nutshell:
IF target = frutt THEN
  InOut.WriteString ("'http://fruttenboel.verhoeven272.nl/index.html'")
ELSE
  InOut.WriteString ("'http://www.verhoeven272.nl/index.html'")
END;
Recompiled, uploaded to the cgi-bin directory and changed the .htaccess file. Crossed my fingers and went to
the CGI section and clicked once more on 'Create a 404'. There was no chain reaction. The atmosphere did not
ignite. Instead one silly line appeared on screen:
MODULE S4042;
(* Attempt to make a smart 404 handler 15 January 2008 *)
(* It works, but it is based on the wrong CGI environment variables *)
(* The CGI module has been changed. This version is based on the modifications *)
(* This version uses the REDIRECT related variables 15 January 2008 *)
(* S4042 is the version that runs on my hosted website 15 January 2008 *)
IMPORT cgi, InOut, Strings;
TYPE Target = (fam, frutt);
VAR content, status : Strings.String;
target : Target;
PROCEDURE CreateHTML;
BEGIN
InOut.WriteString ("<html><head>");
InOut.WriteString ("</head>"); InOut.WriteLn;
InOut.WriteString ("<body><center><h2>");
InOut.WriteString (status);
InOut.WriteString ("<p>You will be redirected to the most probable main section.<p>");
InOut.WriteString ("You wanted to access the following page:<p>");
InOut.WriteString (content);
InOut.WriteString ("</h2></center>");
InOut.WriteString ('<script language="JavaScript">'); InOut.WriteLn;
InOut.WriteString ("<!--"); InOut.WriteLn;
InOut.WriteString ("alert ('Page not found : redirecting');");
InOut.WriteString ("parent.location.href = ");
IF target = frutt THEN
InOut.WriteString ("'http://fruttenboel.verhoeven272.nl/index.html'")
ELSE
InOut.WriteString ("'http://www.verhoeven272.nl/index.html'")
END;
InOut.WriteLn;
InOut.WriteString ("// -->"); InOut.WriteLn;
InOut.WriteString ("</script>");
InOut.WriteString ("</body></html>"); InOut.WriteLn;
InOut.WriteBf
END CreateHTML;
PROCEDURE ShowContent (str : ARRAY OF CHAR);
BEGIN
InOut.WriteString ("<h2>");
InOut.WriteString (str);
InOut.WriteString (" = ");
InOut.WriteString (content);
InOut.WriteString ("</h2>");
InOut.WriteLn
END ShowContent;
BEGIN
cgi.InformServer (cgi.Html);
IF cgi.GetEnvVar (cgi.RedirectUrl, content) = FALSE THEN
content := "RedirectUrl not found";
ShowContent ("ERROR");
HALT
END;
IF cgi.GetEnvVar (cgi.RedirectStatus, status) = FALSE THEN
content := "RedirectStatus not found";
ShowContent ("ERROR");
HALT
END;
IF Strings.pos ('fruttenboel', content) > HIGH (content) THEN
target := fam
ELSE
target := frutt
END;
CreateHTML
END S4042.
The required CGI Environment Variable is only present after the first redirection. After the second
redirection, the usual CGI variables are present and the cause of the error has been flushed. Down the toilet.
Gone forever.
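This loss matches how the Apache documentation describes ErrorDocument: a local path is served through an internal redirect, which sets the REDIRECT_ variables, whereas a full URL makes the server send the browser to a new address, after which the handler starts as an ordinary, fresh request. A deliberately simplified model in JavaScript (not Apache's code; the function names are made up for illustration):

```javascript
// Simplified model: what can an ErrorDocument handler learn about the
// request that failed? Function names are illustrative only.

// True when the ErrorDocument target is a full URL such as http://...
function isExternalTarget(target) {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(target);
}

function handlerEnvironment(errorDocumentTarget, requestedPath) {
  if (isExternalTarget(errorDocumentTarget)) {
    // The browser fetches the handler as a brand-new request,
    // so nothing about the original failure travels along.
    return {};
  }
  // Internal redirect: the server passes the original request details on.
  return { REDIRECT_STATUS: "404", REDIRECT_URL: requestedPath };
}

console.log(handlerEnvironment("/cgi-bin/S4041", "/net/fruttenboel/cgi/farm3.html"));
console.log(handlerEnvironment("http://www.verhoeven272.nl/cgi-bin/S4042", "/cgi/farm3.html"));
```

In this model the first call still knows which page was asked for and the second one does not, which is the difference between the handler at home and the handler at the webhost.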
The .htaccess file at THIS moment looked like:
ErrorDocument 401 /errors/401.html
ErrorDocument 402 http://www.verhoeven272.nl/errors/402.html
ErrorDocument 403 http://www.verhoeven272.nl/errors/403.html
ErrorDocument 404 http://www.verhoeven272.nl/cgi-bin/S4042
I think it's time to run 'testCGI' as error handler, to see what we've got.
Tests run on the remote webserver
I changed the .htaccess file to:
ErrorDocument 401 /errors/401.html
ErrorDocument 402 http://www.verhoeven272.nl/errors/402.html
ErrorDocument 403 http://www.verhoeven272.nl/errors/403.html
ErrorDocument 404 http://www.verhoeven272.nl/cgi-bin/testCGI
and created the 404 in the (by now) usual way. I got the familiar table on screen and one line was reassuring:

The CGI referrer was still pointing to http://fruttenboel.verhoeven272.nl/cgi/cgicontent.html and that's the
directory in which the 404 was forced. This gives me one more idea: what will testCGI produce when I force a
404 from the URL bar? I have NO reason not to test it with testCGI. 'testCGI' is reliable, tested and rugged.
It never hangs. Why should I not use it? Perl and Python scripts are much more dangerous. So I am going to
reinstall the .htaccess file we saw above and see what happens. Hang in there just one more second.
Of course it ran error-free. The output contained a lot of information, but not a single clue to what the
user might have typed into the URL bar. One thing of interest was that the HTTP_REFERER was absent. We can
use that. But how? Time for a night's rest....
S4043
So I took the S404 handler and changed it into this:
MODULE S4043;
(* Attempt to make a smart 404 handler January 2008 *)
(* After S4041 and S4042, it became clear that S404 wasn't
that bad after all January 2008 *)
IMPORT cgi, InOut, Strings;
TYPE Target = (fam, frutt);
VAR content : Strings.String;
target : Target;
PROCEDURE CreateHTML;
BEGIN
InOut.WriteString ("<html><head></head>"); InOut.WriteLn;
InOut.WriteString ("<body>S4043<center><h2>");
InOut.WriteString ("404<p>");
InOut.WriteString ("You will be redirected to the most probable main section.<p>");
InOut.WriteString ("You wanted to access the following page:<p>");
InOut.WriteString (content);
InOut.WriteString ("</h2></center></body>");
InOut.WriteString ('<script language="JavaScript">'); InOut.WriteLn;
InOut.WriteString ("<!--"); InOut.WriteLn;
InOut.WriteString ("alert ('Resistance is futile!');");
InOut.WriteString ("parent.location.href = ");
IF target = frutt THEN
InOut.WriteString ("'http://fruttenboel.verhoeven272.nl/index.html'")
ELSE
InOut.WriteString ("'http://www.verhoeven272.nl/index.html'")
END;
InOut.WriteLn;
InOut.WriteString ("// -->"); InOut.WriteLn;
InOut.WriteString ("</script>");
InOut.WriteString ("</html>"); InOut.WriteLn;
InOut.WriteBf
END CreateHTML;
BEGIN
content := "URL typed into URL bar";
cgi.InformServer (cgi.Html);
IF cgi.GetEnvVar (cgi.HttpReferer, content) = FALSE THEN
target := frutt
ELSE
IF Strings.pos ('fruttenboel', content) > HIGH (content) THEN
target := fam
ELSE
target := frutt
END
END;
CreateHTML
END S4043.
Compiled it, uploaded it to the cgi-bin and changed the .htaccess file. First I tried it on my own system. It ran
error-free. Then I forced the 404 on the hosted system. I got an internal server error. So that concludes this
topic for the time being. The .htaccess file has been erased again. I prefer a spartan 404 handler over a
silly one. I'd like to have one of the handlers that run on my private server, running on my leased webspace.
But that is not possible at this time. Perhaps later....
Below is what S4043 looks like when run on my private Apache system. When I click the button I'm redirected to the right URL. But when I upload the same executable to my webhost, things go crazy.

Page created on 15 January 2008 and
Page equipped with FroogleBuster technology