Crashed CGI executable

I've been making a fair number of CGI executables, as you might have noticed from this section. And the thought of making one that gets stuck in an infinite loop always worried me a bit. Not for my own system; I can kill runaway tasks quite easily with the appropriate commands.
But what happens when my CGI program gets stuck in a loop while running on the (unattended) webserver of my webhost? Will it bring down the system? Or slow it to a grinding halt?

So I decided to make an infinite loop CGI program and test it on my own local system (Hydrogen happily volunteered for the job). Below is what happened. Be prepared for a small shock.

Case 1: the silent infinite loop

Below is the source of the endless loop program. It was compiled in the usual way and then copied to the right place (/var/www/cgi-bin), and the owner was changed to root:root. I hope this source is not too complex. I could have put it all on one line, but then it would look too much like C source... :o)

MODULE loop;

BEGIN
   LOOP
   END
END loop.
   
Easy, isn't it? It's just an endless loop, nothing more.

In my test environment I have a frameset dedicated to silly tests. The 'content' frame (the one with the clickable links, like the one on the right) has an entry as follows:

<br>
# <a href="http://hydrogen.fruttenboel/cgi-bin/loop"    Target="main">Loop</a>
   
which is as usual in dynamically produced webpages.

Starting the endless loop CGI

So I collected all my courage and loaded the test environment. I clicked the 'Loop' link in the navigator and nothing happened. Only the mouse cursor changed into a watch (an hourglass on old-fashioned operating systems). So I clicked once more. Same effect.

One difference though: The system was getting very sluggish. The response time went up by a factor of 5 at least. So I went to the shell terminal in Desktop 4 and started a 'top' command:

     PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
     --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- ---------------
     312 nobody    12   0   408  408   336 R    46.5  0.6   1:11 loop
     310 nobody    12   0   408  408   336 R    45.9  0.6   2:03 loop
   

The eagle has landed!

Yes, we did it! We have two runaway programs, eating up all the unused cycles from the central processor. That's why the system slowed down so much.

The outcome:

I watched the top statistics for some time. In the '/etc/apache/httpd.conf' file I had once noticed a timeout value of 300 seconds. As you might have worked out already: that's five minutes. And two had already passed for the process with PID 310. So another three minutes of waiting were required to see IF Apache would do anything and, if so, what! Would it bomb out? Issue a warning to the webmaster?

All of a sudden, I saw the word '<defunct>' appear at the end of the line of PID 310 in the 'top' listing.

     PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
     --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- ---------------
     312 nobody    12   0   408  408   336 R    46.5  0.6   1:38 loop
     310 nobody    12   0   408  408   336 R    45.9  0.6   2:29 loop <defunct>
   
And then the line disappeared. One minute later, the other runaway program was gone as well:
     PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
     --- ----     ---  --  ---- ---- ----- ---  ---- ----  ----- -------------------
     314 jan       10   0  1028 1028   804 R     2.3  1.6   0:08 top
     233 jan        9   0  7660 7300  6340 R     2.1 11.6   0:11 kdeinit
   
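That was a pleasant surprise: Apache kills a silent CGI executable after roughly 150 seconds of inactivity (as read from the TIME column of 'top', which counts CPU time rather than elapsed time). So if the program gets caught in a silent loop, the webserver aborts the runaway executable after about 2.5 minutes.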

I checked the error logs in '/var/log/apache/error_log' but no entry was found for the 'loop' case. In '/var/log/apache/access_log' there were two entries related to the loop executable:

   192.168.56.1 - - [12/May/2006:19:55:13 +0200] "GET /cgi-bin/loop HTTP/1.0"  200 -
   192.168.56.1 - - [12/May/2006:19:56:12 +0200] "GET /cgi-bin/loop HTTP/1.0"  200 -
   
and that was all. Apache recovered from the 'crash' automatically and was not impressed enough to even make a note about it.

I like the Apache webserver!

Case 2: a non-silent CGI executable

The above version of 'loop' is silent. Apache is awaiting output and after 150 seconds it times out. But what will happen when the CGI is outputting data from within an infinite loop? Then the program is not silent and Apache cannot determine if the data put out by the CGI executable makes sense. So I made the following program:

      
MODULE loop;

IMPORT  InOut, cgi;

VAR  i, j : CARDINAL;

BEGIN
   cgi.InformServer (cgi.Text);
   LOOP                             (* Infinite loop *)
      j := 0;
      REPEAT
         INC (j);
         i := 0;
         LOOP                       (* Finite loop   *)
            INC (i);
            IF  i > 9999999  THEN  EXIT  END
         END;
         InOut.Write ('#')
      UNTIL j = 75;
      InOut.WriteLn
   END
END loop.
   
A bit bigger but still small enough. The finite (innermost) loop just counts to 10 million, then exits, and a pound sign is printed. After 75 pound signs, a line feed is issued. I informed the server that Text is coming up, so we can use the browser as a smart terminal.

What happened this time:

I compiled the source, copied it to '/var/www/cgi-bin' and changed the ownership to 'root:root'. Then I started the same webpage as before. A quick 'ps aux | grep loop' yielded the following output:

   nobody     435 91.1  0.6  2096  436 ?        R    00:30   0:15 loop 
   jan        437  0.0  0.7  1328  460 pts/1    S    00:30   0:00 grep loop
   
which is clear: the eagle has yet again landed. Now the question is: how long will it stay put there? First let's see how active 'loop' is at the moment. A 'top' tells:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ---- ----  ---- ---  ---- ----  ----- --------------
  435 nobody    15   0   436  436   360 R    91.7  0.6   0:57 loop
   
which proves our case. Time to make some coffee or open a bottle of beer. After 150 seconds: nothing happens. After 5 minutes: see for yourself.
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  435 nobody    13   0   436  436   360 R    93.2  0.6   3:18 loop
   
Right now the machine is running and I want to see if it bombs out at all within a reasonable timeframe. It's over ten minutes now. Ten minutes in which all the 'surplus' clock cycles are absorbed by the CGI loop. Twelve lines of pound signs have been printed so far.
After having watched a full movie ("Airplane II: the sequel") this is the result:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  435 nobody    17   0   436  436   360 R    93.0  0.6  98:12 loop
   
Time to kill the sucker. After more than an hour and a half it is still eating up lots of cycles. So Apache only kills runaway programs that have stopped producing output...
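The behaviour seen in the first two cases is what you would expect from a timer that restarts every time output arrives. Below is a minimal Python sketch of such an inactivity timer. It is not Apache's actual code: the function name, the short timeouts and the safety cap are assumptions made up for this illustration, and it relies on select() working on pipes, so it is for Unix-like systems only.

```python
import os
import select
import subprocess
import sys
import time

# Two stand-ins for the CGI programs above (hypothetical, not the Modula-2 originals).
SILENT = [sys.executable, "-c", "while True: pass"]
NOISY = [sys.executable, "-c",
         "import time\nwhile True:\n    print('#', flush=True)\n    time.sleep(0.1)"]

def run_with_inactivity_timeout(argv, timeout, max_total):
    """Run argv and kill it when it stays silent for `timeout` seconds.

    max_total is only a safety cap so that this demo always ends.
    Returns (reason, number_of_bytes_received).
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = child.stdout.fileno()
    received = 0
    start = time.monotonic()
    reason = "exited"
    while True:
        if time.monotonic() - start > max_total:
            reason = "total cap reached"
            break
        # Wait for output; the timer starts afresh on every pass.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            reason = "inactivity timeout"
            break
        chunk = os.read(fd, 4096)
        if not chunk:              # end of file: the child closed its output
            break
        received += len(chunk)
    child.kill()
    child.wait()
    child.stdout.close()
    return reason, received

if __name__ == "__main__":
    print(run_with_inactivity_timeout(SILENT, 1.0, 10.0))
    print(run_with_inactivity_timeout(NOISY, 2.0, 3.0))
```

The silent child is stopped by the inactivity timer, while the noisy one keeps resetting it and only ends because of the demo's own cap. That is the same pattern as in Case 1 and Case 2.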

Case 3: the partly silent program

As long as the CGI executable keeps emitting data, Apache will consider it a valid process. But what will happen when there is an initial burst of data followed by a silent infinite loop? My guess and hope is that the timeout will start from the moment the last character was received. The next test investigates exactly this:

MODULE loop2;

IMPORT  InOut, cgi;

VAR  i, j : CARDINAL;

BEGIN
   cgi.InformServer (cgi.Text);
   j := 0;
   REPEAT
      INC (j);
      i := 0;
      REPEAT
         INC (i)
      UNTIL i = 1000000;
      InOut.Write ('#');
      InOut.WriteBf
   UNTIL j = 75;
   InOut.WriteLn;
   InOut.WriteString ("Endless loop mode entered.");
   InOut.WriteLn;
   LOOP
      INC (i)
   END
END loop2.
   
Compile and copy the executable in the usual way. Here's proof that it runs like crazy:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  325 nobody    14   0   436  436   360 R    91.1  0.6   0:23 loop2
   
After 174 seconds this is what happens:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  325 nobody    14   0   436  436   360 R    92.3  0.6   2:54 loop2
   
I.e.: Plain Old Nothing. The program still runs after 150 seconds. This is a bummer. I had hoped that the inactivity timer would start once Apache received the last character. Apparently it doesn't.

Still, while I type this, something happens:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  323 jan       17   0  1028 1028   804 R     2.8  1.6   0:11 top
   
This means that loop2 has been killed by Apache! So when the program is silent from the beginning, it is killed after 150 seconds, and when it falls silent after some activity, it is killed after (I guess) 300 seconds. Time for another test:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  412 nobody    19   0   436  436   360 R    75.1  0.6   0:04 loop2
   
OK, it's running again. Time to wait.... And I was lucky today:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  --- ------   ---  --  ----  --- ----- ---- ---- ----  ----- --------------
  412 nobody    11   0     0    0     0 Z    61.1  0.0   4:35 loop2 <defunct>
   
So after 4 minutes and 35 seconds (275 seconds in total) Apache takes the scalp of the runaway loop. Strange number.... Oh wait: it took 15 seconds to carry out the non-silent part of the program. So the actual time-out was something like 260 seconds. Could this actually be 256 seconds?
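One possible explanation for the odd number, offered as a guess only: the TIME column in 'top' counts CPU time, not elapsed time. A process that gets about 92% of the processor needs more than 4:35 on the wall clock to collect 4:35 of CPU time. The small Python calculation below assumes the CPU share stayed close to the values seen in the earlier loop2 listings (91.1% and 92.3%); the function name is made up for the illustration.

```python
def elapsed_seconds(cpu_seconds, cpu_share):
    """Approximate wall-clock seconds for a process that used cpu_seconds
    of CPU time while holding cpu_share (between 0 and 1) of the processor."""
    return cpu_seconds / cpu_share

cpu_time = 4 * 60 + 35           # TIME column for PID 412: 4:35 = 275 seconds

# %CPU values taken from the earlier loop2 listings; an assumption for this run.
for share in (0.911, 0.923):
    estimate = elapsed_seconds(cpu_time, share)
    print(f"CPU share {share:.3f}: about {estimate:.0f} seconds elapsed")
```

Under that assumption the elapsed time comes out very close to the 300 seconds configured in httpd.conf. It is an estimate from the listings, not a stopwatch measurement.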

Is a runaway CGI stoppable?

I started the loop2 program and 'top' showed its activity. Now there are several methods one might try to stop 'loop2':

Method   Result
------   ------
Back     When you hit the 'Back' button, the previous webpage is reloaded. After a few seconds, 'loop2' shows up in the 'top' listing with a '<defunct>' message behind its name. One second later, the CGI executable is gone.
Escape   Pressing the 'Escape' key makes the cursor change from 'wait' mode into 'ready' mode. In Linux: the watch is removed and the arrow is back. But the task keeps on running. I had to change to a root shell and use 'kill -9 <pid>' to stop the program.
Reload   'Reload' reloads the current webpage and the cursor changes back from 'wait' mode into 'ready' mode, just as with the 'Escape' keypress. But the runaway task keeps running. Again, root had to kill nobody's task.
Stop     Clicking the 'Stop' button makes the cursor change from 'wait' mode into 'ready' mode (as in the previous two cases) and loop2 keeps on running. Again it had to be terminated by root.
Home     Clicking 'Home' brings you to the homepage you specified. The task keeps on running. It requires root access to terminate it.

I think it's pretty clear by now: if you have a runaway CGI program, you can only stop it with the 'Back' button. All other obvious methods make the cursor change into a 'safe' shape, but nothing will be done with the runaway program. Only 'Back' will terminate the current CGI task.
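The manual clean-up from the root shell ('ps aux | grep loop' followed by 'kill -9 <pid>') could be scripted. Here is a rough Python sketch of the first half: picking the right PIDs out of 'ps aux' style output. The function name is hypothetical, the sample text is the listing from Case 2, and the sketch assumes the usual eleven-column 'ps aux' layout.

```python
def find_pids(ps_output, name):
    """Return the PIDs in `ps aux`-style text whose command is exactly `name`."""
    pids = []
    for line in ps_output.splitlines():
        # Eleven columns; the last one (COMMAND) keeps its embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue                      # header line or something malformed
        command = fields[10].split()[0]   # program name without its arguments
        if command.rsplit("/", 1)[-1] == name:
            pids.append(int(fields[1]))
    return pids

sample = """\
nobody     435 91.1  0.6  2096  436 ?        R    00:30   0:15 loop
jan        437  0.0  0.7  1328  460 pts/1    S    00:30   0:00 grep loop
"""

print(find_pids(sample, "loop"))   # the 'grep loop' line is not matched
```

Each PID found this way could then be passed to os.kill(pid, signal.SIGKILL), the Python counterpart of 'kill -9'. As in the tests above, that needs root when the task belongs to 'nobody'.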

Conclusion:

Non-silent CGI programs in an infinite loop will cripple your server!

Something to think about. Based on these tests, it is advisable to keep your CGI programs as silent as possible. If one of them enters a silent infinite loop, it takes at most five minutes of silence before the webserver kills it. But if stray tokens keep being emitted, it will run forever.
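A different safety net, not tried in the tests above and mentioned only as a possibility, is to cap the CPU time a program may use, so that even a noisy loop ends by itself. The Python sketch below assumes a Unix-like system. The function name and the one-second limit are made up for the illustration, and it shows the operating system mechanism only, not anything Apache does.

```python
import resource
import signal
import subprocess
import sys

def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a cap on its CPU time and return its exit status.

    A negative status -N means the child was ended by signal N.
    """
    def limit():
        # Runs in the child just before the program starts.
        # Soft limit reached: the kernel sends SIGXCPU.
        # Hard limit (one second later): the kernel sends SIGKILL.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
    return subprocess.run(argv, preexec_fn=limit).returncode

if __name__ == "__main__":
    # Stand-in for a runaway program: a busy loop that never ends by itself.
    busy = [sys.executable, "-c", "while True: pass"]
    status = run_with_cpu_limit(busy, 1)
    print("ended by SIGXCPU:", status == -signal.SIGXCPU)
```

The limit counts CPU seconds, so a program that mostly sleeps or waits for input would not be stopped by it.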

Page created on 12 May 2006 and