
Programming Manual

HTTPi was designed first and foremost to be a hacker's webserver that was easily understood, easily patched and trivially maintainable -- thus this document.

The Programming Manual describes HTTPi internals to people interested in writing or patching their own applications into HTTPi. Perpetually under construction. Additions and suggestions, user patches, etc., are welcomed to the code base. However, I or an authorised maintainer reserve the right to choose what gets checked in. Remember the philosophy: "more with less" :-)

Send your questions or difficulties to httpi@floodgap.com.

Getting Started with the HTTPi configure Script

If you didn't read this page thoroughly, you would probably just go to where your configure script dropped the Perl executable and merrily edit it in-place. And it would probably work, too. Well, I've got three words for that:

Don't do it!

Why? If you decide to make configuration changes at a later date, or upgrade to a later version of HTTPi, guess what happens to your changes when configure runs again? You got it: completely obliterated. That's why HTTPi has a primitive make system of sorts built into the configure script suite. (Again, please recall there are multiple configure scripts; replace configure with the appropriate one for your setup.) By working in with the configure scripts, you can make changes to your heart's content and carry them across configuration changes and server updates. So don't edit the HTTPi executable directly!

There are four files in the HTTPi distribution that you need to worry about. These files are glommed together at configure-time. (Before HTTPi 1.2, everything was in httpi.in. If you are, for some pathological reason, patching an old version, then remember that all changes must be made there, and that some features might not be available to you. After that, but prior to 1.5, also keep in mind that the current userfunc.in was also part of httpi.in.) The files are:

uservar.in, which holds the user-configurable globals
modules.in, which holds your custom handlers
userfunc.in, which holds user-serviceable and custom functions
httpi.in, which holds everything else

We'll discuss making those changes later. How do you use configure?

If you have a settings transcript from a previous run of configure (usually named something like transcript.0.configure), you can use that to save typing, and configure will get its answers from that. Just do

% perl configure -d filename

where filename is the transcript file, either one automatically made by configure, or one of your own. If you leave out the filename, the default answer to questions will be used. That's it. (If you are using Demonic HTTPi, you will need to kill the old process(es) and start the new one for changes to take effect. Don't worry; when run in this mode configure doesn't change your inetd or xinetd configuration files.)

Otherwise, just run the appropriate configure script as usual and answer the necessary questions.

Naturally, none of the *.in files are runnable Perl; they're teased and twisted by configure into your runtime object, since configure actually contains a miniature pre-processor. Lines like this:

are exactly identical to
(except that they can't be nested) and have exactly the same semantics as they do in the standard C pre-processor. These are used to determine build-time code based on settings you select and on system options (like, for example, whether your system supports setruid() -- AIX doesn't, Darwin doesn't seem to either, NetBSD does, Linux does). There's also a mystical ~insert that has similar semantics to #include, except that you can't yet nest it either. There are also defines in the httpi.in file that you should be able to match up with questions asked during the configure process. Unless you're a white hat, don't mess with them or preprocessor directives.

In short, configure is your make processor. Use it, love it. You could modify the script in place, but configure is, we think, much friendlier, and you'll thank us for it later.

Hint: If you hopelessly destroy your *.in files, the real Unix make can rescue you. Assuming the stock/ directory is still intact, make revert will replace all your *.in files with the standard distribution versions. Sorry, this works in 1.2 and up only.

Making User Variable Changes

Several important globals reside in uservar.in. If you want to add extra headers, change the MIME types reported for a file extension, alter the rules for the restriction matrix or add IP-less virtual hosts, this is the file to change. (Before 1.2, these globals were in httpi.in.)

To make these changes, simply edit uservar.in and make the needed alterations to the globals. Only these intentionally user-configurable globals are kept in this file; all others will be in httpi.in and should be reserved for white hats. Then run configure again, as described above, to update your executable.

The HTTPi globals are listed later in this manual.

Writing In Your Own Custom Handlers

Adding functionality with modules.in

Adding internal services to HTTPi is very simple to do: simply put them into modules.in (before 1.2, into httpi.in). At that point the HTTP request has been decoded, you have all of the needed globals defined, and you're at the point where HTTPi needs to determine what the browser should get back.

For example, if you inserted this snippet of code:

	if ($address eq "/whoami") {
		&htsponse(200, "OK");
		&htcontent(<<"EOF", "text/html");
Hello, $variables!
EOF
		&log; exit;
	}
and run configure again as described above to make a new executable, then your new HTTPi will internally be able to handle URLs like http://bletch/whoami?Cameron, and respond with a cheery, friendly response. (Here's information on the global variables and functions used here.)

A common idiom, and a nice way to test, is simply to have a line like this:

	require "/home/user/testcode.pl";
in modules.in to be able to modify server behaviour on the fly instantly while the server is running merely by changing testcode.pl. Because Perl must compile your code each time, unfortunately, this will degrade server performance to a variable degree. On the other hand, this can be exceptionally flexible, especially for testing code before making a final HTTPi build, or for those specialized applications where server programming needs to be altered in real time without stopping, rebuilding and restarting. Note that if there is a bug in your script, it may temporarily render HTTPi inoperable until it is fixed (but in most cases HTTPi should immediately start working again when it is). Since this is a require, remember to have your dynamic library return a true value.

In earlier versions, at this point in HTTPi's execution, no attempt had been made to find a valid file specification or check the restriction matrix and it was assumed you had a good idea of what your routine likes and doesn't like. As of 0.99, the restriction matrix checks now come first, so you may protect a module with, say, a password or an IP range restriction by putting an entry in the restriction matrix. In fact, this is how STATIOS is implemented by default.

Customizing library functions and adding your own with userfunc.in (1.5 and up)

Starting in 1.5, functions that are explicitly declared "user serviceable parts," along with custom global functions and subroutines, live in userfunc.in. (Before 1.5, these functions had to be maintained in httpi.in and merged between versions.)

In the function list below, functions marked with (userfunc.in) are maintained in this file and you should make changes to them there rather than to httpi.in.

userfunc.in is not merely limited to tweaking built-in functions, of course; you can put additional new library functions into it, which will also be visible to anything that shares the HTTPi namespace -- code in modules.in, for example, or inline Perl blocks or HTTPerl-based executables. These will also be portable across HTTPi upgrades, assuming no violent architectural changes.

Note that some functions in userfunc.in may be called by other internal HTTPi functions that are not in userfunc.in. For this reason, if you change the arguments or calling convention of these routines, you should make sure that the original method is still supported or you will need to change any calls to it from anywhere else in the file structure (and make those changes to future versions if needed). This can be a real pain. The moral of the story is, don't fix what ain't broke; leave alone what you can.

What you should not use userfunc.in for is directly executable code, as it is not guaranteed to execute when you expect it to (if it winds up executing at all). If you need to actually insert code and not merely a function, put it into modules.in as instructed above.

To make changes, simply edit userfunc.in and append your custom functions and/or edit the standard ones provided. Then, run configure again as described above to update your executable.

External handler support (1.4 and up)

Version 1.4 introduces the beginnings of support for external handlers of your own design, which help reduce the need to depend on items "compiled" into HTTPi at configure-time. Only a constant stub appended to modules.in need be used; it can point to a handler that can vary freely. One example is the PHProxy, a handler that allows you to execute PHP scripts directly within HTTPi, and is included in the standard distribution with instructions in tools/phproxy/. Here is how its modules.in dispatch looks.
	if ($address =~ /\.php$/i) {
		$raddress = "/usr/local/bin/phproxy";
		goto IRED;
	}
Assuming the condition is met (in this case, any request for a file ending in .php), the $raddress global is set to the location of the external handler (in this case /usr/local/bin/phproxy). After this, the magic incantation goto IRED; short-circuits the logic used to handle standard files and executables, and goes directly to the section that serves a document. If $raddress references an executable (as expected, although it could also be a server-parsed document with inline Perl, or less frequently useful, a regular document), it will start execution.

Because this short-circuits a lot of logic (for several reasons: first and foremost simple installation and management; second, to allow custom behaviour; and third, for speed), your external handler is expected to be robust and do many of the services HTTPi would do for a regular file or executable access. This includes rejecting unsuitable requests (such as file not found, access denied, wrong format, etc.), handling security constraints, and delivering data back to the client.

The easiest way to give all this functionality to your handler is to write it in Perl, and enable HTTPerl at configure-time. While you could write your handler as a regular HTTPi executable or "quasi-NPH-CGI" and use the subset of CGI environment variables HTTPi provides you to obtain the relevant context and arguments, by enabling HTTPerl instead you will then have access to the entire library of HTTPi functions and globals to correctly handle a request in an expected manner. While a list of official globals and functions is given below, in short, you will still have your original request in $address, the mountpoint of the web server in $path (to allow you to generate a true absolute path to a file), and you will still be able to call the various error subroutines to raise complaints to the client, as well as the New Security Model to appropriately handle UID/GID security. In fact, PHProxy requires HTTPerl be enabled to operate correctly.

Functions in HTTPi

Functions maintained in userfunc.in appear with the tag (userfunc.in) and the version they were first maintained there. All other functions are in httpi.in unless otherwise noted.

Networking functions (roughly ordered by appearance)

sub sock_to_host
This function takes the result of getpeername(STDIN) and turns it into a hostname. Used for the log subroutine and executable support. Normally handled by similarly named functions in Socket.pm and like-minded modules, but HTTPi has its own socket support built-in. Unlike 0.1, the current version is hardcoded to use the STDIN filehandle. A list of (hostname, port, IP) is returned; if hostname lookups are off, IP addresses are returned in both the hostname and IP address sections. In 1.4 and up, sock_to_host attempts to cache the results for future calls using the $cache_ip $cache_hn $cache_port globals. In stunnel under 1.6 and up, this function uses the environment variables provided by stunnel instead.
sub absolver
1.5 and up; only valid if the "absolver" DNS option was selected during configuration. This function takes exactly the same arguments as the Perl internal function gethostbyaddr, but wraps the call in a timeout. If the "absolver" is not enabled, then the call is replaced by one to gethostbyaddr and absolver is not defined.
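
The idea of wrapping a lookup in a timeout, as described for absolver, can be sketched with Perl's alarm and eval. This is only an illustration and not HTTPi's own code: the name with_timeout, the two-second limit and the 127.0.0.1 lookup are all assumptions made for the example.

```perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton);

# Hypothetical helper (not part of HTTPi): run a code reference, but give up
# after $seconds and return undef instead of blocking indefinitely.
sub with_timeout {
	my ($seconds, $code) = @_;
	my $result;
	my $ok = eval {
		local $SIG{ALRM} = sub { die "timeout\n" };
		alarm($seconds);
		$result = $code->();
		alarm(0);
		1;
	};
	alarm(0);                          # never leave an alarm pending
	return $result if $ok;
	die $@ unless $@ eq "timeout\n";   # pass on unrelated errors
	return;
}

# Example: a reverse lookup that gives up after two seconds.
my $name = with_timeout(2, sub { gethostbyaddr(inet_aton("127.0.0.1"), AF_INET) });
print defined $name ? "resolved: $name\n" : "no answer\n";
```

The local on $SIG{ALRM} keeps the temporary handler from leaking out of the wrapper, which is the same concern alarmsignals addresses for HTTPi itself.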

HTTP response functions (roughly ordered by appearance)

sub htsponse
This function takes two parameters, the HTTP response code (e.g. 200, 404, 500, etc.), and a string; it then sends to the client the HTTP response header, the HTTPi custom headers, and the current date. Globals $currentcode and $currentstring are set with the response code and string respectively. This function silently exits if the HTTP version of the client is 0.9.
sub hthead
This function takes two parameters, the header and an optional termination flag. The header is sent to the client, and if the termination flag is true (!= 0), the termination sequence "\r\n" is also sent to indicate the end of headers. The header will automatically have "\r\n" appended to it. This function silently exits if the HTTP version of the client is 0.9.
sub htcontent
This function takes two parameters, the content itself (as a scalar) and the MIME content type. Global $contentlength is set with the length of the content scalar. The content length and content type are sent to the client with the hthead function (the content type having the termination flag set), and the content is then dumped to the client unless the current request method is HEAD.
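
To make the division of labour between these three functions concrete, here is a rough standalone sketch of the byte stream they produce between them. It is not HTTPi's implementation: build_response is a hypothetical name, the HTTP/1.0 status line is an assumption, and the custom headers and date that htsponse also sends are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper (not HTTPi's code): assemble a complete response in the
# order described above for htsponse, hthead and htcontent.
sub build_response {
	my ($code, $string, $content, $type, $method) = @_;
	my $out = "HTTP/1.0 $code $string\r\n";              # status line (version assumed)
	$out .= "Content-Length: " . length($content) . "\r\n";
	$out .= "Content-Type: $type\r\n";
	$out .= "\r\n";                                       # blank line ends the headers
	$out .= $content unless $method eq "HEAD";            # HEAD gets headers only
	return $out;
}

print build_response(200, "OK", "Hello, world!\n", "text/html", "GET");
```

Every header line ends in "\r\n", and one extra "\r\n" marks the end of the headers, which is what the termination flag of hthead controls.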

HTTPi-internal functions (roughly ordered by appearance)

sub log
This function takes no parameters. It writes CERN log entries to the file specified in the global $logfile, and utilises globals $hostname $httpref $date $method $address $variables $httpver $httpua $currentcode and $contentlength, depending on the logging option specified during the configure process.
sub bye
This function is called by default when HTTPi receives a SIGALRM. Currently, it just silently terminates HTTPi.
sub byebye
1.5 and up. This function is called by default when HTTPi receives a SIGTERM. Currently, it just silently terminates HTTPi also. If a child process exists (Demonic HTTPi), it will terminate it first before terminating itself.
sub dead
This function is called by default when HTTPi receives signals that would cause Perl to terminate (through the __DIE__ pseudohandler). It logs a 500 error through htsponse and prints an error message with hterror.
sub hterror (userfunc.in since 1.5)
This function is used to display the default formatted HTTPi error message. It assumes that htsponse has already been called to set the proper HTTP response code. It takes two arguments, a title and an explicatory string, then calls htcontent with an Apache-like formatted dump containing the title and explanation.
sub hterror404, hterror301 (userfunc.in since 1.5)
These are internal error subroutines that simply set error codes and call hterror with their messages.
sub hterror401, hterror302
These are also internal error subroutines that simply set error codes and fall through to hterror with their messages. Due to their less frequent employment, these are maintained internally instead.
sub nsecmodel
1.4 and up. This function manages security and juggles appropriate UID and GIDs for the owner of filehandle S. If the New Security Model is enabled, it enforces its constraints and, if possible, switches UID and GID to the owner of the file referenced by filehandle S. If the New Security Model is not enabled, it will only be invoked for executables, and only restrict to files not owned by UID or GID 0, although it will attempt to change UID and GID if it can. This function exists even if the New Security Model is not enabled. Globals $gid and $uid are set with the GID and UID, respectively, of the referenced document.
sub defaultsignals
1.5 and up. This function abstracts changes to Perl's signaling introduced in Perl 5.8.1. In particular, if POSIX-based signaling was selected at configure time, the old $SIG method is replaced with calls to POSIX::sigaction. This function then asserts standard signal handlers (as mentioned above) using the requested method.
sub alarmsignals
1.5 and up. This function also abstracts signaling, like defaultsignals, but specifically handles the case where SIGALRM is being used as a local timeout (such as in the "absolver", q.v.) rather than for the entire process state.
sub master
Demonic only. This is the abstracted master function called by the socket accept loop at the end of httpi.in. It does not exist in the other versions as this is the main program instead, not simply a function called by the Demonic socket loop.
sub rfctime
1.6 and up. This function takes at least one argument, which is then used to generate an RFC-compliant time string in GMT. If a single argument is passed, it is assumed to be a time integer and is converted to a string in GMT, then processed. If the optional second argument is non-zero, the first argument is assumed to be already a time string in GMT (e.g., the output of scalar gmtime).
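
For readers unfamiliar with the date format involved, the following sketch formats a time integer as an HTTP-style date in GMT. It only illustrates the single-argument case described for rfctime and is not HTTPi's code; the name rfc_date is an assumption.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical stand-in for the idea behind rfctime (not HTTPi's code):
# format an epoch time as an HTTP-style date string in GMT.
sub rfc_date {
	my ($epoch) = @_;
	my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
	return sprintf("%s, %02d %s %04d %02d:%02d:%02d GMT",
		$DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

print rfc_date(time()), "\n";
```

The day and month names are taken from fixed tables rather than from the locale, so the output stays in English whatever the server's locale settings are.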

Important Globals in HTTPi

<S> and <NS> are HTTPi's control filehandles. Don't mess with them unless you know what you're doing!

Globals maintained in uservar.in

A simple scalar containing additional headers to be sent to the client.
%content_types
A hash with file extensions as the keys and their respective MIME-types for values. Additional file extensions should be added here. If any entries conflict with keys in %system_content_types (see below), the entries in %content_types take precedence.
The restriction matrix, specifying security options for HTTPi resources. See the manual for the format of values in this hash.
The HTTP name redirect. See the manual for the format of values in this hash.
Definitions for the virtual filesystem. See the manual for the format of values in this hash.
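
The precedence rule for MIME types can be pictured as an ordinary Perl hash merge. The sketch below uses made-up entries and a separate %merged hash purely for illustration; the exact key format and the way HTTPi performs the merge internally are not shown here and should be checked against uservar.in and httpi.in.

```perl
use strict;
use warnings;

# Illustrative values only; the real tables live in httpi.in and uservar.in.
my %system_content_types = ("html" => "text/html", "txt" => "text/plain");
my %content_types        = ("txt"  => "text/x-notes", "md" => "text/markdown");

# Later entries in a list assignment overwrite earlier ones with the same key,
# so listing the user hash last gives it precedence.
my %merged = (%system_content_types, %content_types);

foreach my $ext (sort keys %merged) {
	print "$ext => $merged{$ext}\n";
}
```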

Globals maintained in httpi.in

$logfile, $path
Respectively, the absolute path of the logfile and the absolute path of the document directory. Set on startup.
$currentcode, $currentstring
Respectively, the numerical HTTP response code (e.g. 200, 404, etc.), and the provided string (e.g. "OK", "File Not Found", etc.) Set by htsponse.
$contentlength
Length of content passed to client. Set by htcontent.
The current date and time, in GMT and RFC format. In 1.6, this is generated by rfctime.
The current local date and time, in CERN log format.
$statiosuptime, $statios*
Any variable starting with $statios is part of the Demonic STATIOS module, and is only defined when that module is enabled except for $statiosuptime, which defines the time() when the server was started and is always defined since many portions of code now reference it. $statioslastsec and $statiosmaxsec were added in 1.6.
$method, $address, $httpver
Respectively, the HTTP method (e.g. GET), requested resource, and HTTP version (0.9, 1.0, 1.1). Set after receipt of a valid request from the client.
$raddress
The real, fully-qualified path to the resource desired, if it can be instantiated. Only set after restriction matrix checks are passed.
$variables
The variables passed via the GET method to the server. Passed to an executable script through command line arguments (except for HTTPerl) and the QUERY_STRING environment variable.
$httpref
Current HTTP referer, as specified by client (- if none given). Set after receipt of a valid Referer header from the client.
$httpua
Current HTTP user agent/browser string, as specified by client (a null string if none given). Set after receipt of a valid User-Agent header from the client.
$httprawu, $httpuser, $httppw
Respectively, the base64-encoded Authorization header string, and the decoded user and temporary space for the clear-text password. The first is set on receipt of a valid Authorization header; the others only set after restriction matrix checks are passed.
$uid, $gid
Set by nsecmodel, after it analyses the currently requested resource, to that resource's owning UID and GID.
$cache_ip, $cache_hn, $cache_port
Set by sock_to_host in 1.4 and up after execution to cache DNS reverse lookups.
Last modify date of the currently selected resource as indicated by $address; set after successfully verifying resource's existence and a successful stat(). Prior to 1.6 this was a simple string in ctime() format, but in 1.6 and up is generated using rfctime to make If-Modified-Since exact prefix comparison simple.
%system_content_types
Holding area for content types defined in httpi.in and thus maintained as part of the basic distribution. It is merged with %content_types and then destroyed.
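
The relationship between $httprawu, $httpuser and $httppw listed above can be illustrated with the core MIME::Base64 module. This is a sketch only: it assumes the raw value holds just the base64 text of a Basic credential, the name split_basic_auth is hypothetical, and HTTPi may well decode the header by other means.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical illustration (not HTTPi's code): split a Basic Authorization
# value into its user and password parts.
sub split_basic_auth {
	my ($raw) = @_;                              # base64 text, as in $httprawu
	my $decoded = decode_base64($raw);           # "user:password"
	my ($user, $pw) = split(/:/, $decoded, 2);   # the password may contain colons
	return ($user, $pw);
}

my ($user, $pw) = split_basic_auth("QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
print "user=$user\n";
```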

Inline Perl with the <perl> Tag

The inline Perl module, introduced in HTTPi 1.0, is simultaneously the most useful and the most abhorrent feature of HTTPi. As much as it affords you power and flexibility, it also adds a modicum of security leaks, idiosyncrasies and stability issues, and that's why it's not enabled by default. The warnings below are just the ones the programmer knows of.

In short, you can execute arbitrary Perl code inside any document with an .sht, .shtm or .shtml extension by placing it within <perl></perl> tags. Whatever your code returns gets displayed to the client. If you have been bold enough to try the inline Perl option, be advised of several important issues:

Here's a brief example of inline Perl that demonstrates grabbing HTTPi internals, calling HTTPi internal functions and some Perl expressions. (Code is provided.)

The page above (sperl.shtml) is very simple and easy, and armed with the list of globals and functions above, you can manipulate them in any way you see fit. Of course, you're not limited to evaluating arbitrary expressions. Try this (code also provided):

Did you like Mission: Impossible? Yes | No

The code responsible for the above page (imf.shtml) introduces two functions that only are added to HTTPi when you enable preparsing. These functions allow you to store up output into a buffer so you can "print" without messing up HTTPi's output stream. This is important because your inline Perl is executing before HTTPi has completely finished emitting headers, and a naked print will corrupt the HTTP headers and probably cause some client confusion. So, instead of printing, call &output with the string, which will emit into a buffer $fbuf (or do this yourself). When you're done, just return &flush();, which returns and flushes the buffer. You can see this in imf.shtml, so look at it if you haven't already.
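
The buffer-then-return pattern described above can be tried outside HTTPi with stand-in functions. In this sketch demo_output, demo_flush, demo_block and $buffer are hypothetical names that merely mimic the described behaviour of &output, &flush and $fbuf; inside a real inline Perl block you would call the HTTPi functions themselves.

```perl
use strict;
use warnings;

my $buffer = "";    # plays the role of HTTPi's $fbuf in this sketch

# Stand-ins for &output and &flush, renamed so they cannot be mistaken for
# (or clash with) the real HTTPi functions.
sub demo_output { $buffer .= join("", @_); return; }
sub demo_flush  { my $text = $buffer; $buffer = ""; return $text; }

# The body of an inline Perl block would follow this shape:
sub demo_block {
	demo_output("<ul>\n");
	demo_output("<li>item $_</li>\n") for 1 .. 3;
	demo_output("</ul>\n");
	return demo_flush();    # the returned text is what the client sees
}

print demo_block();
```

Nothing is written to the output stream until the block returns, which is why this approach does not disturb headers that are still being emitted.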

In 1.4, file manipulation "primitives" have been included allowing you to sling entire files around and build a library of pieces which can be all tied together at service time. The &include function, also only present if preparsing is enabled, accepts a fully-qualified (not relative) path to a file and inserts that file's contents directly into the buffer, just as if you had used &output to display them; &include takes care of all the opening and slurping of the data for you. As always, just return &flush(); to return and flush the buffer. If you just want to insert a single file and nothing else, &finclude() does all this for you. Check out this example.

If &include cannot include the file, it will quietly insert a comment in HTML <!-- --> tags containing the error message.

You can also include a file containing additional inline Perl blocks, and this file will be interpreted as if it were part of the original file. This lets you build an entire library of code components you can assemble together at a whim, with a central point for easy system-wide changes. For purposes of the New Security Model, any files you include are operated with the UID of the original file's owner if the New Security Model is enabled, so be sure you trust the source of files that you include into yours.

Errors get caught by HTTPi's __DIE__ handler. Of course, if this bugs you, you could alter this right in your inline Perl by introducing an anonymous subroutine reference. However, a much better way would be to rebuild a custom HTTPi yourself with a new handler rather than make such a change "locally".

It has been mentioned that there are some things you cannot do in inline Perl. One of them is, as stated, using print (or, for that matter, warn). Note that you can use print to print to a filehandle, but just not to stdout. You also cannot have a literal </perl> or <perl> in your code, since the preparser is not too bright; you'll need to cleverly escape it or break it up, which shouldn't be too hard. (Just like you can use the file-manipulation functions to include and execute another file with inline Perl blocks, it is fully possible to have an inline Perl block create another inline Perl block, which will be executed as if it were there in the first place, too. The possibilities for recursion are entertaining and somewhat disturbing, so this exercise is left to the reader.)
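
One simple way to break up the tag text, shown here as a standalone sketch, is to assemble it by concatenation so the literal sequence never appears in the source the preparser scans. The variable names are arbitrary, and other escaping schemes would work equally well.

```perl
use strict;
use warnings;

# Build the tag text from pieces so that neither tag appears literally
# anywhere in this source.
my $open  = "<" . "perl>";
my $close = "<" . "/perl>";

my $note = "Inline code goes between $open and $close tags.";
print "$note\n";
```

Inside an actual inline Perl block you would return the string (or pass it to &output) rather than print it, for the reasons given above.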

Also, anything that will affect the server process will probably cause unforeseen results. If you decide to use or require arbitrary code in your inline Perl block, make sure it doesn't conflict with HTTPi. While you can't break the master server or bring it down, thanks to process isolation, you will probably get very inexplicable results out of your document if your module starts treading on HTTPi's rather large namespace (and vice versa). In general, loading modules in inline Perl blocks is not recommended: for such an application, use an executable instead.

Cameron Kaiser