CGI and datafiles.

Simple CGI executables only need some data handed to them via the POST or GET method, and they return all results via the standard output channel (back into the webserver for further processing). But slightly more complex CGIs need some memory, so that they can retrieve or reconstruct a previous state. A page counter is an example. So the bigger CGI programs need to be able to access files on disk. That's the subject of this topic.

Case 1: See if we can access files.

The first program is about being able to open a file and, if the file does not exist, reporting that to the user. We will use the by now familiar method of converting the webbrowser screen to a VT-100 terminal. Here's the source of the first program (called 'rooster', which is a Dutch word):

MODULE rooster;

IMPORT ASCII, InOut, TextIO, cgi;

PROCEDURE Init;

VAR	  File 		: TextIO.File;
	  ch		: CHAR;

BEGIN
   TextIO.OpenInput (File, "data.test");
   IF  NOT TextIO.Done ()  THEN
      InOut.WriteString ("Cannot open file. Aborting.");
      InOut.WriteLn;
      HALT
   ELSE
      (* copy the file to standard output, one character at a time *)
      REPEAT
         TextIO.GetChar (File, ch);
         InOut.Write (ch)
      UNTIL TextIO.EOF (File);
      InOut.WriteBf
   END;
   TextIO.Close (File)
END Init;

BEGIN
   cgi.InformServer (cgi.Text);
   Init;
END rooster.
   
It's a simple program and it compiles (with the 9905m compiler, to which I have converted completely). As you can see: only qualified imports. It takes some more typing, but it gives the program another look. Perhaps the C++ gurus will have fewer problems reading this kind of source...
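
For readers without a Modula-2 compiler at hand, the same idea can be sketched in Python. This is only an illustration, not a translation of the Mocka libraries: the function names 'render_file' and 'cgi_response' are made up for this example, and the header is written with a space after the colon, which is the more common spelling.

```python
import sys


def render_file(path):
    """Return the text of the file, or the error message if it cannot be opened."""
    try:
        with open(path, "r") as handle:
            return handle.read()
    except OSError:
        return "Cannot open file. Aborting.\n"


def cgi_response(path):
    # The header line must be followed by an empty line before the body starts.
    return "Content-type: text/plain\n\n" + render_file(path)


if __name__ == "__main__":
    # 'data.test' is the file name used by rooster.
    sys.stdout.write(cgi_response("data.test"))
```

Just like rooster, the sketch always sends the header first, so the error message also reaches the browser as plain text.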

The output of rooster

Below is a copy of the screen of the Mocka IDE while compiling the module and running it to see what it does. Remember: we are still in a dry run. The CGI executable is running in console mode, without any connection to a webbrowser or webserver. As I said before: I converted to Mocka 9905m, which was modified by Dr Maurer.

jan@beryllium:~/modula/cgi$ mocka
Mocka 9905m
>> i rooster
>> p rooster
.. Compiling Program Module rooster I/0002 II/0002
.. Linking rooster
>> rooster
Content-type:text/plain

Hi there,

This is the content of the file 'data.test'.

See if we get to see it or not....
>> mv data.test Data.test
>> rooster
Content-type:text/plain

Cannot open file. Aborting.
>> q
jan@beryllium:~/modula/cgi$
   
A short file and a short outcome. But it works, on the command line. Now see if we can get it to work via the webserver. And it does! It's cool and cruel to see these silly magic words on the VT-100 screen. It took some effort and research to get things done, but now it works.

The tricks needed to get the thing working:

  1. Make sure you are user 'root'
  2. Find out where your cgi-bin is located
  3. Copy the executable to that directory
  4. Put the datafile there as well
  5. Make sure you have an entry in an HTML file
  6. Cross your fingers and run it!
On this Debian 3.1 system, the cgi-bin was not quite close to my DocumentRoot... I found the location of the /cgi-bin by examining the file /etc/apache2/sites-available/default. Here is part of that file:
        ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
	<Directory "/usr/lib/cgi-bin">
	        AllowOverride None
		Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
		Order allow,deny
		Allow from all
	</Directory>
   
Look for the line containing the words 'ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/' or something similar. In my case, the CGI executables need to be in '/usr/lib/cgi-bin'. The rest was easy. Copy the files, make sure they have the right permissions and start the program as usual.
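
If you would rather not read the configuration by eye, the search can be sketched in a few lines of Python. The function name 'find_script_aliases' is hypothetical and the sample text is just the fragment shown above; the sketch only looks at plain three-word 'ScriptAlias' lines and ignores everything else Apache allows.

```python
def find_script_aliases(config_text):
    """Return {url_prefix: directory} for every simple ScriptAlias line."""
    aliases = {}
    for line in config_text.splitlines():
        words = line.split()
        # Directive names are matched without regard to case.
        if len(words) == 3 and words[0].lower() == "scriptalias":
            aliases[words[1]] = words[2].strip('"')
    return aliases


sample = """
        ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
        <Directory "/usr/lib/cgi-bin">
                AllowOverride None
        </Directory>
"""

print(find_script_aliases(sample))  # {'/cgi-bin/': '/usr/lib/cgi-bin/'}
```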

Now I'm running with Slackware 12.2 (yeah, outdated but still going strong and reliable) and the cgi-bin is in a different place: /var/www/cgi-bin. That's more like it. Where else would you go looking for it? Here's what 'ls' gives:
root@Beryllium:/var/www/cgi-bin# ls -l
total 244
-rwxr-xr-x 1 root root   27732 2013-03-01 20:38 rooster
-rwxr-xr-x 1 jan  users  27756 2009-06-11 00:23 rooster2
-rwxr-xr-x 1 jan  users  23620 2009-06-07 23:32 testCGI
And you can start a file, locally, by entering in the URL bar of your browser:
https://localhost/cgi-bin/testCGI
When using a hyperlink, you need a line, similar to
<a href="/cgi-bin/rooster"      Target = "main">Try rooster</a>
as I did in the navigator (right-click in the navigator frame, choose 'This frame' and 'View frame source' to see this same line) or could have done in just about any HTML file.

Case 2: Creating files.

The second test is trying to write files to disk. I want to open a file, display the contents and then write back something else to the same file. So that, if we read the file next time, we can see that things have changed in between. So I made the following executable:

MODULE rooster2;

IMPORT ASCII, InOut, TextIO, cgi;

PROCEDURE Init;

VAR	  File 		: TextIO.File;
	  ch, Ch	: CHAR;
	  count, n	: CARDINAL;

BEGIN
   TextIO.OpenInput (File, "Counter.data");
   IF  NOT TextIO.Done ()  THEN
      InOut.WriteString ("Cannot open file. Aborting.");
      InOut.WriteLn;
      HALT
   END;
   count := 0;
   n := 0;
   REPEAT
      TextIO.GetChar (File, ch);
      InOut.Write (ch);
      IF  count = 0  THEN  Ch := ch  END;
      INC (count)
   UNTIL TextIO.EOF (File);
   TextIO.Close (File);
   InOut.WriteLn;
   TextIO.OpenOutput (File, "Counter.data");
   IF  Ch = 'Z'  THEN  Ch := 'A'  ELSE  INC (Ch)   END;
   REPEAT
      TextIO.PutChar (File, Ch);
      INC (n)
   UNTIL n > count;
   TextIO.Close (File)
END Init;

BEGIN
   cgi.InformServer (cgi.Text);
   Init;
END rooster2.
   
This executable compiles without errors and it does what it was intended to do:
  1. Open the file for reading
  2. Read the first character from it
  3. Store it in 'Ch'
  4. Read the rest of the characters, counting them
  5. Close the file
  6. Open the same file for writing
  7. Advance the character in 'Ch' to the next letter
  8. Write the new character to the file, one more time than the number of characters read
  9. Close the file
The file initially contains a sequence of the letter 'A' (20 bytes). After the first run, there are 21 'B's and so forth.
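
The steps above can be sketched compactly in Python. This is an illustration under assumptions, not the Mocka program: 'next_letter' and 'rotate_file' are made-up names, the file is assumed to be non-empty, and the whole file is read in one go instead of character by character.

```python
def next_letter(ch):
    """Advance to the next letter, wrapping from 'Z' back to 'A'."""
    return "A" if ch == "Z" else chr(ord(ch) + 1)


def rotate_file(path):
    """Show-and-replace: return the old contents, write the new ones."""
    with open(path, "r") as handle:
        old = handle.read()
    # One more character than was read, all set to the next letter.
    new = next_letter(old[0]) * (len(old) + 1)
    with open(path, "w") as handle:
        handle.write(new)
    return old
```

Because the new length is based on everything that was read, a trailing newline in the data file counts as a character too.
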
jan@beryllium:~/modula/cgi$ mocka
Mocka 9905m
>> i rooster2
>> p rooster2
>>
 .. Compiling Program Module rooster2 I/0002 II/0002
 .. Linking rooster2
>> rooster2
Content-type:text/plain

AAAAAAAAAAAAAAAAAAAA

>> rooster2
Content-type:text/plain

BBBBBBBBBBBBBBBBBBBBBB
>>
   
As you can see, on the command line, this program works. Now see if it runs through Apache.
I will spoil your fun: it didn't. It ran, it showed the file contents, but after each successive run, the file contents did not change. I solved it though: the trick lies in the file permissions. After the command
beryllium:/usr/lib/cgi-bin# chmod 666 Counter.data
   
things were settled. The webserver user, under which 'rooster2' runs, just didn't have permission to write to the file... In the meantime, we have proven that CGI programs can open files for reading and writing in the '/cgi-bin' directory. I also built in a safety precaution for the file 'Counter.data': if the character is 'Z', it wraps back to 'A', but the count keeps on incrementing.
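
To see what 'chmod 666' actually changes, you can inspect the permission bits of a file. Below is a small Python sketch; 'permission_report' is a hypothetical helper name, and it only looks at the classic mode bits, not at ACLs or other extras.

```python
import os
import stat


def permission_report(path):
    """Return (octal mode as text, True if 'other' users may write)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return format(mode, "03o"), bool(mode & stat.S_IWOTH)
```

For a file with mode 644 the second value is False: only the owner may write. After 'chmod 666' it is True, and that includes the webserver user, which is why the counter started to move. It also includes everybody else on the machine, which is why this is not a safe setting.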

Case 3: A simple counter.

Now that we can read and write data to files, it becomes interesting to write some more useful data to a file. So I made the 'log' module as shown below. It reads a number (in ASCII) from a file and processes it:

MODULE log;

IMPORT  cgi, InOut, TextIO, Strings;

VAR	File 	    	   : TextIO.File;
	number		   : CARDINAL;

BEGIN
   cgi.InformServer (cgi.Text);
   TextIO.OpenInput (File, 'CounterFile');
   IF  TextIO.Done () = FALSE  THEN  
      number := 0
   ELSE
      TextIO.GetCard (File, number);
      TextIO.Close (File)
   END;
   INC (number);
   TextIO.OpenOutput (File, 'CounterFile');
   TextIO.PutCard (File, number, 15);
   TextIO.Close (File);
   InOut.WriteString ("Access counter now is :");
   InOut.WriteCard (number, 15);
   InOut.WriteLn;
   InOut.WriteBf
END log.
   
If run in a text console, without a webserver in between, it works as could be expected:
jan@beryllium:~/modula/cgi$ mocka
Mocka 9905m
>> i log
>> p log
>>
 .. Compiling Program Module log I/0002 II/0002
 .. Linking log
>> rm CounterFile
>> log
Content-type:text/plain

Access counter now is :              1
>> log
Content-type:text/plain

Access counter now is :              2
>>
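
The read, increment and write-back cycle of 'log' can also be sketched in Python. 'bump_counter' is a made-up name, and unlike the Modula-2 version this sketch also treats an unreadable number as zero, which is my own assumption.

```python
def bump_counter(path):
    """Read the counter from 'path', add one, write it back and return it."""
    try:
        with open(path, "r") as handle:
            number = int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing, empty or garbled file: start counting from zero.
        number = 0
    number += 1
    with open(path, "w") as handle:
        # Same field width of 15 as the PutCard call in 'log'.
        handle.write(f"{number:15d}")
    return number
```

Note that two requests arriving at the same moment could both read the same number; neither 'log' nor this sketch does any file locking.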
   
This is promising. Combine this with the previous experiences and we have to:
  1. put the executable in /cgi-bin
  2. make sure it is executable
  3. create a file called CounterFile with a '0' inside in /cgi-bin
  4. chmod 666 CounterFile
If you do this, add a line in your HTML dispatcher as follows:
&nbsp;o <a href="/cgi-bin/log"          Target = "main">log</a>
   
and now do clicka-di-click on the word 'log' in your navigator frame. Each time you click, a new VT-100 console is presented in your webbrowser screen. Isn't life wonderful?

I do have to admit one thing: I was unable to let the 'log' CGI executable create the file 'CounterFile' by itself. On the command line it does what I programmed. But when run from the CGI directory, 'log' simply will not create the file in '/cgi-bin'. Most probably this is caused by some permission property I have overlooked so far. For the time being, let's be cheerful that we can make a very simple and fast page counter.
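
One general Unix rule is worth keeping in mind here: creating a new file needs write (and search) permission on the directory, not on the file. The Python sketch below applies the classic owner/group/other check. Everything in it is illustrative: 'may_create_files' is a made-up name, uid and gid 33 stand for a webserver user, mode 755 with owner root is only an assumed typical setting for a cgi-bin directory, and ACLs are ignored.

```python
import stat


def may_create_files(dir_mode, dir_uid, dir_gid, uid, gids):
    """Classic Unix check: may this user add a new entry to the directory?"""
    if uid == 0:
        return True  # root is not held back by the mode bits
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


# Hypothetical webserver user (uid 33, gid 33) and a root-owned directory.
print(may_create_files(0o755, 0, 0, 33, {33}))  # False
print(may_create_files(0o755, 0, 0, 0, {0}))    # True
```

Under these assumptions an existing file with mode 666 can be rewritten by the webserver user, but a missing file cannot be created by it.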

Case 4: the CGI executable outside /cgi-bin.

If I cannot have the CGI executable create a file in '/cgi-bin', the logical solution is to put my CGI program outside the cgi directory and run it there, where the datafiles live. So I parked the 'log' program in a random directory and made an HTML reference to it.

Here is what happens when you try to run that program: it won't run at all, since it is not in the only directory that is allowed to contain executables. Your webbrowser (in this case Firefox) will show a window in which you have two choices, but we are looking for the third option, which seems to be prohibited.


The unsafe workaround.

Based on the previous tests, I managed to come up with a workaround that works on my own computer which has an Apache webserver running by default. Do not try this on the server of a commercial webhost.
The datafile cannot be created due to permission problems. My executable is able to run, but that's all. It needs to be more in control, so I gave it some. As user 'root' I set the SUID bit, which makes the program acquire God status in the Unix world. Below you see what I did. For now, I ask you to test this case only on your own private server.

beryllium:/usr/lib/cgi-bin# chmod 4755 log
beryllium:/usr/lib/cgi-bin# ls -lh
total 28K
-rw-rw-rw-  1 root root  15 2006-06-29 22:47 CounterFile
-rwsr-xr-x  1 root root 28K 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin# rm CounterFile
beryllium:/usr/lib/cgi-bin# ls -lh
total 28K
-rwsr-xr-x  1 root root 28K 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin#
   
At this point, I ran the 'log' program via the navigator frame. It produced the correct result (1) so it worked and now I'm curious whether it also created the file 'CounterFile'. I won't keep you in suspense:
beryllium:/usr/lib/cgi-bin# ls -lh
total 28K
-rw-r--r--  1 root www-data  15 2006-06-30 01:35 CounterFile
-rwsr-xr-x  1 root root     28K 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin#
   
As you can see, raising the permissions of the executable enabled it to create the required file. But the 'ls' command also tells us that the file is not only owned by root. The group it belongs to is 'www-data'. Now, perhaps, this has some potential for alternatives:
beryllium:/usr/lib/cgi-bin# chmod 755 log
beryllium:/usr/lib/cgi-bin# ls -lh
total 28K
-rw-r--r--  1 root www-data  15 2006-06-30 01:35 CounterFile
-rwxr-xr-x  1 root root     28K 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin# rm CounterFile
beryllium:/usr/lib/cgi-bin# chown root:www-data log
beryllium:/usr/lib/cgi-bin# ls -lh
total 28K
-rwxr-xr-x  1 root www-data 28K 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin#
   
At this point, I ran the 'log' program via the navigator frame. It produced a blank screen...
beryllium:/usr/lib/cgi-bin# ls -l
total 28K
-rwxr-xr-x  1 root www-data 27784 2006-06-29 22:21 log
beryllium:/usr/lib/cgi-bin#
   
Not successful. The executable definitely needs the SUID bit. This puts me in a troublesome situation. I now have the key that lets the CGI executable create files. But I doubt whether I can do something similar at the webhost where this website is running. I'm just a user there (UID = 'frutt').

As usual, the easiest way to see if it is possible is to just try it. No guts, no glory. So I started an ncftp session and changed directory to /cgi-bin. The system did not object when I issued a 'chmod 4755 log' command and a following 'ls -lh' showed that the file 'log' really was 'SetUID' to user 'frutt'.
But when I ran the file, through a webbrowser, I got an internal server error. Apparently this is not the way to do this thing.
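
The numeric modes given to 'chmod' and the strings printed by 'ls -l' are two views of the same bits. A short Python sketch shows the connection; 'describe_mode' is a hypothetical helper that treats the mode as belonging to a regular file.

```python
import stat


def describe_mode(mode):
    """Return the ls-style string for a regular file and whether SUID is set."""
    text = stat.filemode(stat.S_IFREG | mode)
    return text, bool(mode & stat.S_ISUID)


print(describe_mode(0o4755))  # ('-rwsr-xr-x', True)
print(describe_mode(0o755))   # ('-rwxr-xr-x', False)
```

The leading 4 in 4755 is the SUID bit, which 'ls' shows as an 's' in place of the owner's 'x'.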

Case 5: Brute force.

If everything else fails, apply brute force. So I did. I made the 'llog' program which is a source for experimenting with file locations. It is very similar to 'log'. Here is the source of 'llog':

MODULE llog;

IMPORT  cgi, InOut, TextIO, Strings;


VAR	File 	    	   : TextIO.File;
	number		   : CARDINAL;


BEGIN
   cgi.InformServer (cgi.Text);
   TextIO.OpenInput (File, '/CounterFile');
   IF  TextIO.Done () = FALSE  THEN  
      number := 0
   ELSE
      TextIO.GetCard (File, number);
      TextIO.Close (File)
   END;
   INC (number);
   TextIO.OpenOutput (File, '/CounterFile');
   TextIO.PutCard (File, number, 15);
   TextIO.Close (File);
   InOut.WriteString ("Access counter now is :");
   InOut.WriteCard (number, 15);
   InOut.WriteLn;
   InOut.WriteBf
END llog.
   
The main difference is that I have now added a '/' (slash) in front of the filename. After compiling, I set the SUID bit of the executable. Then I ran it through Apache. Here's the result:
beryllium:/usr/lib/cgi-bin# cp /home/jan/modula/cgi/bin/llog .
beryllium:/usr/lib/cgi-bin# chmod 4755 llog
   
At this point, I ran the 'llog' program via the navigator frame. It showed that the counter was incremented.
beryllium:/usr/lib/cgi-bin# updatedb
beryllium:/usr/lib/cgi-bin# locate CounterFile
/CounterFile
/home/jan/internet/fruttenboel/cgi/CounterFile
/home/jan/modula/cgi/CounterFile
beryllium:/usr/lib/cgi-bin# ls / -lh
total 21K
drwxr-xr-x    2 root root     2.2K 2006-06-10 14:15 bin
drwxr-xr-x    3 root root      384 2006-06-10 13:52 boot
lrwxrwxrwx    1 root root       11 2006-06-09 20:27 cdrom -> media/cdrom
-rw-r--r--    1 root www-data   15 2006-06-30 21:52 CounterFile
   
It's clear: the new file 'CounterFile' has been created in the root directory, just as I instructed it to do. This is a strong hint that the paths in file specifications are treated as absolute filesystem paths, rather than relative to the DocumentRoot (as I had hoped).
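
The rule behind this can be sketched in Python: a name that starts with a slash is used as it is, and any other name is looked up relative to the working directory of the process. The function name 'resolve' is made up, and '/usr/lib/cgi-bin' as the working directory of the CGI program is an assumption based on the earlier tests, where 'log' found 'CounterFile' next to itself.

```python
import posixpath


def resolve(working_dir, name):
    """Mimic how a file name is resolved for a process on a Unix system."""
    return posixpath.normpath(posixpath.join(working_dir, name))


print(resolve("/usr/lib/cgi-bin", "CounterFile"))   # /usr/lib/cgi-bin/CounterFile
print(resolve("/usr/lib/cgi-bin", "/CounterFile"))  # /CounterFile
```

The DocumentRoot does not appear anywhere in this rule; it only matters to the webserver when it maps URLs to files.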

I changed 'llog' so that the filename was "~/Counterfile", recompiled, put the executable in /cgi-bin and ran the CGI program. Result: a blank screen. Apparently the program didn't run properly and no file was created. Time to go to my friends at Google again, this time with a very cunning search phrase.

Case 6: Gentle force.

On the net I found a page that sheds some light, albeit a dim light: at https://www.xcf.berkeley.edu/help-sessions/cgi/x220.html you can read the following:

Another important note regarding user identities as they relate to CGI programs is the fact that programs run as the user which started the process. So, a CGI program which is started by the web server will run with the identity of the web server, not with the identity of the user who created it. This is especially important for write-access to files. If a CGI program, written by the user "luser", relies on a file "cgidata" which luser has in her HTML-file directory, when it is run by the web server [with user-identity "www"], it will not have access to the "cgidata" file, unless luser made the file world-readable. If the CGI program is to write to the file, things get even worse. luser would not want to make the file world-writeable, since any other user on the machine could write to the file, which is a Bad Thing. But since the CGI program runs as www and not as luser, that is the only way to write to the file. Since most CGI application will want to write to files, there is a problem.

Fortunately, there exists a better solution. Unix has a special permission for programs called "setuid". This means that when the program is run, it runs with the identity of the user who owns the file, not the identity of the user executing the program. Thus, when the web server [user "www"] executes luser's program, it runs with luser's permissions. luser can then make the file user-write-able, and the program would have the ability to write to it, but other users would not. Use of the setuid bit is somewhat dangerous since it allows access to otherwise private files, but with care can be used to great advantage.

This seems to be the key to our solution. By applying a correct SetUID to the CGI executable, it can access files which are in the diskspace which is allocated to that specific user. It's worth some further investigation.
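
The two identities the quoted text talks about are the real user ID (who started the program) and the effective user ID (whose permissions are used for file access). A Python sketch that reports both is given below; 'identity' is a hypothetical name. Keep in mind that Linux ignores the SetUID bit on interpreted scripts, so this sketch only shows which numbers to look at. A compiled executable such as 'llog' is what actually benefits from the bit.

```python
import os


def identity():
    """Report the real and effective user IDs of the current process."""
    real, effective = os.getuid(), os.geteuid()
    return {
        "real": real,
        "effective": effective,
        # The two differ only when a SetUID bit has taken effect.
        "setuid_active": real != effective,
    }


print(identity())
```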

MODULE llog;

IMPORT  cgi, InOut, TextIO, Strings;

VAR	File 	    	   : TextIO.File;
	number		   : CARDINAL;

BEGIN
   cgi.InformServer (cgi.Text);
   TextIO.OpenInput (File, '/home/jan/CounterFile');
   IF  TextIO.Done () = FALSE  THEN  
      number := 0
   ELSE
      TextIO.GetCard (File, number);
      TextIO.Close (File)
   END;
   INC (number);
   TextIO.OpenOutput (File, '/home/jan/CounterFile');
   TextIO.PutCard (File, number, 15);
   TextIO.Close (File);
   InOut.WriteString ("Access counter now is :");
   InOut.WriteCard (number, 15);
   InOut.WriteLn;
   InOut.WriteBf
END llog.
   
I changed the ownership and privileges of the executable in /cgi-bin so that it is owned by me and has the SetUID bit set, and then ran it through the webserver. The output is correct and the counter is incremented. This is the result of the action:
jan@beryllium:~/data$ ls -l ~
-rw-r--r--   1 jan www-data      15 2006-07-01 00:34 CounterFile
   
The file is created 'by me', although with the group of the webserver ('www-data'). Thanks to the SetUID feature I can leave the file permissions at the default (and safer) 644: rw-r--r--. Quite a relief.
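
Where does the default 644 come from? On Unix, the mode a program requests for a new file is filtered through the umask of the process. The Python sketch below shows the arithmetic. The requested mode 666 and the umask 022 are assumptions (both are common defaults); I have not checked what the Mocka library or Apache really use.

```python
def created_mode(requested, umask):
    """Mode bits a new file ends up with: the request minus the umask bits."""
    return requested & ~umask


print(oct(created_mode(0o666, 0o022)))  # 0o644
```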

Case 7: Does 'Tilde' work after SetUID?

Now that we have come this far, the question is: does a program that is owned by me, and whose UID is forced to mine, have the brains to also use my home directory structure for storing files?

To check this, I changed llog as follows (only the changed lines are listed):

   TextIO.OpenInput (File, '~/CounterFile');

   TextIO.OpenOutput (File, '~/CounterFile');
   
After compiling and chmodding, the disappointment follows: the white screen appears. And it remains white. In short: it doesn't work.
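
The reason is that the tilde is a convenience of the shell, not of the operating system. A program that hands '~/CounterFile' straight to an open call asks for a file inside a directory that is literally named '~', relative to its working directory. If you want the tilde, the program has to expand it itself. A minimal Python sketch, with the made-up name 'expand_tilde' and the home directory passed in explicitly:

```python
def expand_tilde(path, home):
    """Replace a leading '~' by the given home directory, as a shell would."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path


print(expand_tilde("~/CounterFile", "/home/jan"))  # /home/jan/CounterFile
print(expand_tilde("CounterFile", "/home/jan"))    # CounterFile
```

The sketch does not handle the '~user' form, and where a CGI program should get the home directory from is a separate question.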

Conclusions:

Experimental results:

  1. On Debian Sarge with Apache2, CGI executables need to be in '/usr/lib/cgi-bin'
  2. CGI executables can read data from files
  3. CGI executables can write to existing files with the 666 permission (insecure)
  4. CGI executables cannot always create files
  5. CGI executables will only run from the /cgi-bin directory
  6. If the executable has the SUID bit set, it can handle files with secure permissions
  7. CGI executables do not recognize the tilde token, not even when they are SetUID 'user'

Things to do: If you take care of all of this, your CGI executables can read, write and create files, provided the executable is SetUID to the user who owns the target directory and the file is addressed by an absolute path such as '/home/jan/CounterFile'. 'SetUID root' is also possible but potentially dangerous.

Page created on 28 June 2006 and