User's Manual

It would be greatly amiss if HTTPi didn't work at all like any other webserver. Most of the time, you'll find that HTTPi behaves and operates Pretty Much Like You'd Expect (tm). In fact, once running, it is actually simpler (in the programmer's very biased opinion) to operate and maintain, enable executables, and make documents available to be served; but, like every piece of software, it does have its own idiosyncrasies. If you think every webserver acts like Apache, you'd better read this thoroughly first.

Starting and Stopping HTTPi

Invoking and managing HTTPi sessions is very simple to do, and depending on the way you have HTTPi installed, literally automatic.

If you have installed HTTPi to run under (x)inetd, stunnel or launchd, the superserver will manage HTTPi for you once it is configured. You'll never need to start the server manually or place it in your system's rc scripts, and the superserver will automatically manage all your server instances and/or virtual servers if you have them set up. You will not see any HTTPi processes running when there are no requests pending. To stop the server, you must manually remove the entry from your superserver's configuration.
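For reference, a classic inetd entry looks something like the following /etc/inetd.conf line. This is only a sketch: the service name, the user and the install path (/usr/local/sbin/httpi here) are assumptions, and the precise syntax differs between (x)inetd, stunnel and launchd, so consult the installation instructions for your flavour.

http  stream  tcp  nowait  root  /usr/local/sbin/httpi  httpi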

Demonic HTTPi has some special care and feeding rules that are slightly different from other webservers'. In particular, you need one Demonic HTTPi executable running per IP address it is bound to (so multiple virtual servers require multiple instances of Demonic HTTPi, built as separate executables, one for each IP address). These processes do not autostart, and should be invoked from your system's rc scripts or equivalent (such as launchd itself) to make sure that your server comes up on reboot.

Demonic HTTPi is invoked by simply calling it from the command line. You must be root to bind TCP port numbers lower than 1024.
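For example, assuming the executable was installed as /usr/local/sbin/dhttpi (the path is an assumption; substitute wherever yours lives):

# /usr/local/sbin/dhttpi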

Prior to HTTPi 1.5, Demonic HTTPi tried to run within a single process and open up additional forks only when requests arrived. This worked up until Perl 5.8, which made significant internal changes (especially, but not merely, with signals) that rendered this approach unworkable.

To counter these architectural differences, in Demonic HTTPi 1.5 and up, two processes are present at any one time. The first is the parent process, which is generally idle and serves only to spawn a single child; the child is the process actually listening on the port and doing the work. If you look at the system's process list, you will see two entries for "dhttpi" (which is what Demonic HTTPi calls itself): the first, the parent, displays the current bound IP address, bound TCP port and the time of the last request, allowing you to monitor the server without sending fake requests just to check that it responds. The second, the child, indicates which PID is monitoring it and its status. Fortunately, you don't need to worry about the child process; to stop the server, or a particular configured master instance, simply kill(1) the parent with a TERM signal and the parent will take down the child for you. You will also see additional processes for requests in flight; these automatically die when they are no longer needed.

You can also kill child processes off if they misbehave; they will be respawned as needed. The owner of these processes may be affected by the HTTPi Security Model (see below).
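To illustrate stopping a server as described above: if ps(1) shows the parent dhttpi running as PID 212 (a number chosen purely for illustration), then

% kill -TERM 212

takes down both the parent and its listening child.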

As of 0.99, Demonic HTTPi optionally includes STATIOS, a configure-time option allowing you to monitor your server's activity. STATIOS was considerably expanded in 1.6.

Document Serving

HTTPi follows a simple rule. If the requested file exists and is readable, it loads it and displays it. If the requested file is executable by the current uid, it runs it instead. Everything else gets a 404 error. Therefore, all you need to do to serve documents is make sure that HTTPi can read them. If HTTPi is running as root, no worries; if it is running as a non-privileged user, make sure you
% chmod ugo+r {filename}
so that it can see it, and
% chmod ugo+rx {directory}
on directories. Naturally, all documents must be in the documents directory HTTPi is looking in, to ensure that naughtiness like /../../../etc/passwd doesn't work. Good thing, too.

Unlike just about every other webserver, if you don't have an index.html file in a directory, you'll get an error, not a directory tree. This is to save program code and server size, and also offers a modicum of security so that people can't just riffle your files.

If your Perl supports the alarm() system call, which any port worth its salt does, requests have to be received within a few seconds of startup or the socket is closed. This is a paranoia feature to help cut down on clients maliciously holding sockets open.
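For the curious, the technique looks roughly like this minimal sketch (not HTTPi's actual code; the five-second window and the SOCKET filehandle are illustrative):

	eval {
		local $SIG{ALRM} = sub { die "timeout\n" };
		alarm(5);             # give up if nothing arrives in time
		$request = <SOCKET>;  # read the request
		alarm(0);             # got it; cancel the alarm
	};
	close(SOCKET) if $@;      # the alarm fired; drop the client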

As of 0.99, HTTPi does include a tool to let you make a directory browseable if you like (but subdirectories of that directory won't be unless you take the same steps with them). Copy browsed from the tools/ directory in the distribution to the desired directory. Rename it to index.html and make sure it's executable, e.g.
% chmod ugo+x index.html
Then access the directory like a regular resource (try http://httpi.floodgap.com/old-dists/ for an example).

User Filesystems

As of 0.99, you can allow your users to serve documents from the public_html/ subdirectory of their home directories with a URL à la http://www.floodgap.com/~spectre/. This is a configure-time option set at installation. (This assumes you're using Unix; other OSes may tune out for a bit.) The same rules apply: if it's readable relative to the web server's uid, it's displayed; if it's executable, it's executed; if it's neither, the client gets a 404.
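So, following the earlier chmod examples, a user making their tree visible would do something like:

% chmod ugo+rx ~/public_html
% chmod ugo+r ~/public_html/*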

Attention! Executables are an issue, as you might be running untrusted code under the web server's uid if you're not careful. There is currently no way to selectively allow executables for only some users, or to disallow them entirely. Read the section on executable support carefully before you enable the user filesystem, as well as the HTTPi Security Model below.

The HTTPi Security Model

In 1.4, a new method became available by which HTTPi assigns its (effective and/or real) uid and gid for operations; as of 1.7, this method is the default and is mandatory.

In the old system, the server only changed its uid/gid for executables, if it could do so. No attempt was made to enforce which uids were legal until the most recent versions, although as a side effect no root-owned executables were allowed to execute, and this was later assimilated as a "feature". Documents were served, and server-parsed documents (including inline Perl blocks) were run, with the webserver's uid/gid.

In the new (and current) HTTPi Security Model, the server changes uid/gid for all documents to the owner of the document, if it can do so. This includes server-parsed documents (including those using inline Perl) and all executables, meaning the serving and execution of all content is restricted to the privileges of the user who owns the file being served. In fact, this includes even completely static documents and files. Moreover, a minimum uid can be specified to indicate which uids may serve documents. Note one important consequence:

Even if the web server is unable to mutate its uid/gid to the document owner and must do operations as "itself" (when it's not running as root), it will still enforce the minimum uid constraint, and will still enforce the restrictions on files owned by either uid or gid 0.
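Conceptually, the check works something like this sketch (not HTTPi's actual code; $minuid and $file are illustrative names for the configured minimum uid and the document being served):

	($uid, $gid) = (stat($file))[4, 5];
	die "refusing to serve" if ($uid == 0 || $gid == 0 || $uid < $minuid);
	if ($> == 0) {           # only root can actually switch identities
		$) = "$gid $gid";    # effective gid becomes the file's group
		$> = $uid;           # effective uid becomes the file's owner
	}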

Virtual Hosting with HTTPi

IP-Based Virtual Hosting

This option is not available to inetd installations.

IP-based virtual hosting (one IP address per virtual host) is handled by individual, distinct instances of HTTPi binding each IP address. You must run configure for each build, and specify different absolute paths for each build so that each configure does not overwrite others.

Each IP-based virtual server is an individual, distinct process and is separate from all others. If you are using Demonic HTTPi, each server must also be booted separately; the other flavours of HTTPi will require separate entries in the superserver's configuration. For managing these individual processes, see Starting and Stopping HTTPi.

IP-Less Virtual Hosting Using %nameredir

This option is available to all flavours of HTTPi.

This is only applicable if you have enabled IP-less virtual hosting with configure in your HTTPi build. Warning: Perl knowledge required for this section.

The IP-less virtual hosting feature uses a hash, with your virtual servers as keys mapping to locations in the real file system on the actual server. (The actual server is determined by whatever you entered as the fully-qualified domain name of the server during the configure process, so that setting is very important.) Here's an example, from this server itself:

%nameredir = (
	 "stockholm.floodgap.com" => "http://www.floodgap.com",
	 "httpi.floodgap.com" => "http://www.floodgap.com/httpi",
	);
Based on the above, a request for http://httpi.floodgap.com/serve.html becomes mapped to http://www.floodgap.com/httpi/serve.html. In the same way, a request for http://stockholm.floodgap.com/httpi/ becomes mapped to http://www.floodgap.com/httpi/. Since HTTPi has been told the real name of the server is www.floodgap.com, HTTPi then relies on this hash to handle requests for the other virtual servers this machine runs.

You must specify an entry for every possible name your server should respond to if IP-less virtual hosting is turned on; if the request is not for the real name of the server or for any of the aliases in %nameredir, the client gets a 404. Note that, using the redirection hash above as written, a request for http://floodgap.com/ will fail with a 404: it must be http://www.floodgap.com/.

Also note that, unlike IP-based virtual hosting, this all occurs within one server process. You do not need individual processes per IP-less virtual server.

To edit the %nameredir hash, please see the programming manual about using the configure system to help you build new versions of HTTPi, and then edit the uservar.in (for versions before 1.2, edit httpi.in) file in the distribution according to the manual's instructions.

Executables

Attention! Your CGIs will almost certainly not work directly in HTTPi! Read carefully!

As mentioned, HTTPi supports executable programs after a fashion, but not as CGI, though some interface features are common.

When a requested file is executable by HTTPi's uid, HTTPi sets a few environment variables and transforms itself (via exec() or another method, explained later) into the executable. If HTTPi is running as root, it, and therefore the executable, will assume the egid and euid of the executable's owner. It does not act like CGI: you must explicitly set an HTTP response code, and you do not have all the CGI environment variables at your disposal. Think of it as a very stripped-down NPH-CGI environment where no headers are provided for you. Executables need not have a .cgi extension, and they don't have to be in any particular directory save the document one (i.e. there is no explicit /cgi-bin directory).

Regardless of whether the webserver is itself running as root, no root-owned or gid 0-owned executable may be executed (in 1.4+), and any constraints you specify in the HTTPi Security Model apply as well.

The file noodle, included with the distribution in the tools/ directory, allows you to see what happens. chmod it executable (e.g.

% chmod ugo+x noodle
), pop it in the documents directory, and access it as a regular resource (try it on this server and see for yourself). It will display its uid, gid, euid, egid, arguments and environment. You are provided the REMOTE_HOST, REMOTE_ADDR, REMOTE_PORT, QUERY_STRING, SCRIPT_NAME, SCRIPT_FILENAME, SERVER_PROTOCOL, SERVER_PORT, SERVER_SOFTWARE, REQUEST_METHOD, SERVER_URL, CONTENT_TYPE (for POST requests), CONTENT_LENGTH (for POST requests), HTTP_USER_AGENT and HTTP_REFERER environment variables. In addition, as of 1.4, you are also provided the HTTP_COOKIE variable with any cookies sent by the client, and as of 1.6, HTTP_ACCEPT_*, HTTP_ACCEPT, HTTP_IF_MODIFIED_SINCE and HTTP_X_REQUESTED_WITH; and if you have 1.7+ and you enable PATH_INFO, you also get PATH_INFO where applicable.

Here's the good news: most NPH CGIs will probably need no modification at all, and most other CGIs will simply need you to add an extra header to explicitly set an HTTP response code, i.e. add

HTTP/1.0 200 OK
as the first line of whatever output the CGI spews. In Perl, you might use
print stdout "HTTP/1.0 200 OK\r\n";
to do this, and then call whatever routine prints your content-type.
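Putting it together, a complete minimal executable might look like this sketch (the environment variables used are among those listed above; everything else is illustrative):

	#!/usr/bin/perl
	# remember: no headers are provided for us, not even the status line
	print stdout "HTTP/1.0 200 OK\r\n";
	print stdout "Content-type: text/plain\r\n\r\n";
	print stdout "hello from $ENV{SCRIPT_NAME}, served by $ENV{SERVER_SOFTWARE}\n";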

Because HTTPi has already ceased to exist by the time the executable starts, there is no way it can know whether the executable succeeded or failed. Executables therefore appear in the log file with a status code of 100 instead of 200, 302 or 500, the codes normally associated with executables.

Note that the REMOTE_HOST variable may be affected if you enable DNS anti-spoofing at configure time in 1.5 and up. If your executables are expecting a directly-resolvable string to be present, you should modify the executable or disable anti-spoofing.

Certain filesystems may not have reliable execute bits. For those systems, you can force only certain extensions to be executable (i.e., the file must not only have execute bits, but must end in (exe|com|pl|cgi|cmd|perl|[ckbap]*sh)). This is available in 1.6 as a configure-time option.

HTTPerl vs. exec()

As of 0.7, HTTPi gives you two ways of running your executables. When configuring HTTPi, you can either choose to use the HTTPerl hack, or do a regular exec(). There are significant advantages and disadvantages to both.

The normal method, and the method used in HTTPi 0.4 and earlier, is to exec() into a new process and completely replace HTTPi and Perl with whatever the new executable process will be. This works everywhere and on just about everything, but is needless overhead if the new executable process happens to be Perl or a Perl script because now Perl needs to be re-invoked all over again.

The HTTPerl hack, then, is pretty easy to understand conceptually. HTTPerl works by re-using the current Perl interpreter that is running HTTPi to run the executable. This has one obvious advantage and one obvious disadvantage: you don't need to re-invoke Perl again, but at the same time every one of your executables that HTTPi runs directly must be in Perl. This means that if you have any binary executables, you must make a Perl wrapper for them that will do the exec() at that point. On the other hand, HTTPerl is in general much, much faster than blindly exec()ing into a new process.
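For instance, a hypothetical wrapper for a compiled binary might be nothing more than (the path is illustrative):

	#!/usr/bin/perl
	# hand off to the real (non-Perl) executable at this point
	exec("/usr/local/bin/my-binary-cgi") || die "exec failed: $!";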

There is one other major quirk in HTTPerl that you need to be aware of: HTTPerl basically does the equivalent of a require on the executable. This means that your executable also has access to all the internal globals and functions that HTTPi exposes, a double-edged sword: your executable or its modules might already be using globals and functions with the same names, but at the same time you don't have to provide any HTTP negotiation code yourself. Moreover, as a result of the way it is invoked, your executable runs within the server (it becomes part of the server), so HTTPi will do error handling for you unless you install a __DIE__ pseudosignal handler of your own first.

Warning! One big gotcha is that the server will gripe if your executable doesn't return a true value under HTTPerl. Adding a 1; at the end will suffice, and keeps it compatible with regular CGI-based webservers.

In short, exec() is the default because it will handle all cases with no problems. However, if your Perl executables work with HTTPerl, you will find it a much faster solution. Just test thoroughly first!

(Note that speed gains may be reduced if your filesystem cache is fat and therefore its copy of the Perl executable doesn't have to be flushed out often. This may be the case on systems with lots of memory; in those cases, HTTPerl's quirks may not be worth the slight speed edge. Try it and see.)

Inline Perl and the <perl> Tag

This is only applicable if you have enabled inline Perl when you built HTTPi with configure.

HTTPi 1.0 and up offer the ability to embed Perl in server-parsed HTML files. This feature is very useful and also very dangerous (particularly if you have the user filesystem enabled, as it allows anyone to execute arbitrary Perl with the webserver's uid). Although the HTTPi Security Model improves this feature's security, it still has tremendous potential for problems as well as promise.

As this feature requires some knowledge of the internals of HTTPi, it is given its own section in the Programmer's Manual.

HTTPi Security and the Restriction Matrix

This is only applicable if you have enabled the restriction matrix facility when you built HTTPi with configure.

Since HTTPi lends itself to adding a quick webserver wrapper around applications, a frequent use for HTTPi is as a cheap interface to a network monitor or some data acquisition tool. Equally frequently, this data is sensitive.

HTTPi has an allow/deny authorization scheme called the restriction matrix: a hash in the program that lets you restrict a particular directory or resource to certain network addresses and user agents, or (as of 0.99) to specified user/password pairs. Warning: Perl knowledge required for this section.

The restriction matrix is hard-coded into HTTPi. Here are two sample entries (in fact, the default ones included in HTTPi):

%restrictions =
        ("/nw" => "^10\.##^Mozilla#MSIE",
         "/status" => "####voyeur:daNrZR3TcSwD2");
To edit the %restrictions hash, please see the programming manual about using the configure system to help you build new versions of HTTPi, and then edit the uservar.in (for versions before 1.2, edit httpi.in) file in the distribution according to the manual's instructions.

The first entry indicates that for all resources starting with /nw, only addresses 10.*.*.* are allowed, and of that set, only browsers whose user agent string starts with Mozilla and does not contain MSIE. By now it should be obvious that this is nothing more than four regular expressions concatenated together with #s, in this order: allowed network addresses, denied network addresses, allowed user agents, denied user agents. An empty field places no restriction.

The second entry has no allow/deny rules, but does have an optional fifth field, valid as of 0.99. This entry restricts all resources starting with /status (such as the Demonic STATIOS module) to any IP address and any browser, but only to those logging in as voyeur with password wannapeek, which is stored in the file in crypt() format. Multiple username:password pairs can appear here, separated by commas.

The crapword utility in the tools/ distribution directory is a quick way to encrypt a password. Alternatively, you could copy the username-password pair from /etc/passwd or your appropriate shadow password file (Unix) or use the common htpasswd utility. For compatibility reasons, HTTPi does not use getpwnam() or getpwent() to access the password file.
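If you have Perl handy (and you do, or you wouldn't be running HTTPi), a one-liner does a similar job, assuming your Perl's crypt() uses the traditional two-character-salt DES scheme:

% perl -e 'print crypt("wannapeek", "da"), "\n"'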

Allow/deny rules have this precedence: the client must match the allow rule (an empty allow rule matches everybody), and must not match the deny rule; a matching deny rule disallows the client even if the allow rule was also satisfied.

Once the allow/deny rules have been passed, HTTPi checks the user credentials, if any are specified in the matrix. If the user credentials are correct, the client is allowed. Otherwise, the client is disallowed.

In the first line of the example, there is an allow and deny rule for the user agent. Microsoft Internet Explorer exports a user-agent string of the form Mozilla/V.v (... MSIE ...). It will satisfy the allow rule ^Mozilla and the deny rule MSIE, and thus be disallowed. Mozilla Firefox exports a user-agent string of the form Mozilla/V.v (...). It will satisfy the allow rule ^Mozilla, but not satisfy the deny rule, and thus will be allowed. Lynx (Lynx ...) will fail the allow rule and the deny rule, and will be disallowed.

The IP address scheme works similarly.

There is no user:password pair, so HTTP authentication is not required for the first example. The second example does require it.

To add additional restrictions for additional directories, simply add rows to the %restrictions hash with the resource prefix (directory, etc.) as the key and the restriction string as the value, as shown below -- please read the programmer's guide for important information on building new HTTPi versions. Restrictions are prioritized in descending order by resource prefix length (i.e. "/foo/bar" takes precedence over "/foo/", which takes precedence over "/"). Naturally, if no restriction matrix entry matches a particular resource, it is allowed to all clients and all network addresses.
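For example, a hypothetical row like

        "/nw/secret" => "^10\.1\.###",

added alongside the default "/nw" entry would narrow resources under /nw/secret to 10.1.*.* addresses with no user agent restriction, and it wins over the "/nw" entry because its prefix is longer.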

When a client is disallowed due to a disallowed IP address or client, it is sent the standard HTTP error code 403 along with an explanation. With the first rule above in place, the resource can only be accessed from hosts on the internal network, and then only by Netscape clients.

When a client is disallowed due to an incorrect user name or password, it is sent the standard HTTP error code 401 and an explanation. With the second rule above in place, the resource can only be accessed by users providing username voyeur and password wannapeek.

Virtual Filesystem Support

This option is supported in Demonic HTTPi only.

This is only applicable if you have enabled the virtual file system with configure in your HTTPi build. Warning: Perl knowledge required for this section.

The virtual filesystem allows you to "preload" files, or even embed them in the server, and then have them served completely from memory. Think of it as a fast, stupid, non-dynamic disk cache. This is particularly useful for small images that get frequently referenced, for example, or other kinds of small files. There are a few caveats, discussed below.

The entries that make up the virtual filesystem are in the %virtual_files hash in uservar.in. Each hash key is a fully-qualified file specification, and references a list reference with three elements. The first element is the virtual file's MIME type (yes, it's redundant, but it also gives you the flexibility to override this for other purposes). The second is the type of data to follow: FILE if you wish to reference an actual file and preload that, or DATA if you're actually inlining data. The third is the data itself: either an absolute, fully-specified path to the file to be preloaded, or the data to be inlined.

The default %virtual_files hash in the HTTPi distribution looks like this:

%virtual_files =
        ("/httpi/pix/httpismall.gif" => [ "image/gif", "FILE",
                "/usr/local/htdocs/httpi/pix/httpismall.gif" ] ,
         "/httpi/virtualfile.html" => [ "text/html", "DATA",
                "<html><body>Look, Ma, I'm virtual!</body></html>" ] ,
        );
The first hash entry tells Demonic HTTPi to load /usr/local/htdocs/httpi/pix/httpismall.gif into memory and when a request comes along, instead of looking for it on disk, to serve it directly from its memory store with a MIME type of image/gif.

The second hash entry tells Demonic HTTPi to match the inlined text data with any requests for /httpi/virtualfile.html. When a request for that file comes along, instead of looking for it on disk, the server instead retrieves it from its memory store and serves it with a MIME type of text/html. Most important of all, this file actually does not exist anywhere on this server's hard disk -- it is entirely a figment of HTTPi's imagination, so to speak.

The Last-Modified HTTP header works slightly bizarrely when a virtual file is served. Instead of being that of the original file referenced, since there may not even be an original file to reference, the modification date of virtual files is considered to be the time when the server was invoked, which, if you think about it, is technically true, too. (The $statiosuptime variable is used for this purpose, so even if you don't have STATIOS enabled, this is still defined.) This makes the code simpler rather than having to carry around file attributes as excess memory baggage, and allows the interface to be abstracted to include files that really are virtual and wouldn't have a modification date per se, as well.

If an entry in the hash is the same as a file that really does exist, the virtual filesystem entry always takes precedence.

The HTTPi Security Model has no effect on how virtual filesystem entries are served.

To edit the %virtual_files hash, please see the programming manual about using the configure system to help you build new versions of HTTPi, and then edit the uservar.in file in the distribution according to the manual's instructions.

Throttling Support

As of 1.4, an exceptionally primitive and dumb, but effective, bandwidth throttling option is available, allowing you to limit how fast requests are served to users.

In an attempt to use as portable and uncomplicated a method as possible, the bandwidth throttling method is implemented overly simplistically. At configure time, you specify how big a "gulp" to take and spew over the network, and then how long should elapse between said gulps. For example, specifying 16K (this should be given to the configure script in bytes, so, 16384 bytes) and one second intervals means an effective average limit of 16K/sec on outgoing transfers. Most people will want to use a one-second interval to obtain as direct a bytes/second limit as possible, but you may have valid uses for increasing this interval (see below).

The interval pause is implemented in terms of the sleep function, which may inherit the limitations of your local implementation and operating system, including failing to sleep for the exact time specified.
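The overall loop amounts to something like this sketch (not HTTPi's actual code; $gulp and $interval stand in for your configure-time choices, and FILE and SOCKET are illustrative filehandles):

	while ($bytes = sysread(FILE, $buf, $gulp)) {
		print SOCKET $buf;    # spew one gulp to the client
		sleep($interval);     # then wait before the next one
	}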

Virtual files, executables and server-parsed documents (including inline Perl blocks) are not subject to throttling.

Throttling occurs on a per-process basis, not per server or installation. If you have a very busy site and absolutely must maintain as low a bandwidth impact as possible, one possible way to lower the impact is to turn up the interval. This will spread out data pulses to clients and allow more server instances a greater chance to transmit their data. However, there is currently no way in this version of HTTPi to enforce an aggregate transmission limit over all processes that the master HTTPi instance has spawned.

Questions and Bug Reports

Send all outstanding issues to httpi@floodgap.com.
Cameron Kaiser