If you have installed HTTPi to run under (x)inetd, stunnel or launchd, the superserver will manage HTTPi for you once it is configured. You'll never need to start your server manually or place it in your system's rc scripts, and the superserver will automatically manage all your server instances and/or virtual servers for you if you have them set up. You will not see any HTTPi processes running when there are no requests pending. To stop the server, manually remove the entry from your superserver's configuration.
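For illustration, an inetd entry might look like the following line (the service name, user and install path here are assumptions for this sketch; your configure run will report the actual values to use):

```
# Hypothetical /etc/inetd.conf entry; "www" must map to port 80 in
# /etc/services, and the install path is an assumption.
www stream tcp nowait nobody /usr/local/httpi/httpi httpi
```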
Demonic HTTPi has some special care and feeding rules that are slightly different from other webservers. In particular, you need one Demonic HTTPi executable running per IP address it is bound to (so multiple IP-based virtual servers require multiple instances of Demonic HTTPi, each built as a separate executable for its IP address). These processes do not autostart, and should be invoked in your system's rc scripts or an equivalent such as launchd itself to make sure that your server comes up on reboot. Demonic HTTPi is invoked by simply calling it from the command line. You must be root to bind TCP port numbers lower than 1024.
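For instance, an rc script might start one Demonic HTTPi build per bound address like this (the install paths are assumptions; use wherever each of your configure builds was installed):

```
# Hypothetical rc.local lines: one dhttpi executable per IP address.
/usr/local/httpi-10.0.0.1/dhttpi
/usr/local/httpi-10.0.0.2/dhttpi
```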
Prior to HTTPi 1.5, Demonic HTTPi tried to run within a single process and open additional forks only when requests arrived. This worked up until Perl 5.8, which made significant internal changes (especially, but not merely, to signals) that rendered this approach unworkable.
To accommodate these architectural differences, in Demonic HTTPi 1.5 and up, two processes are seen at any one time. The first is the parent process and is generally idle, serving only to spawn a single child, which is the one actually listening on the port and doing the work. If you look at the system's process list, you will see two entries for "dhttpi" (which is what Demonic HTTPi calls itself): the first, the parent, will display the current bound IP address, bound TCP port and the time of last request, allowing you to monitor the server remotely without sending fake requests to check response. The second, the child, will indicate which PID is monitoring it and its status. Fortunately, you don't need to worry about the child process; to stop the server, or a particular configured master instance, simply kill(1) the parent with a TERM signal and the parent will take down the child for you. You will see additional processes for requests, which will automatically die when they are no longer needed.
You can also kill misbehaving child processes; they will be respawned as needed. The owner of these processes may be affected by the HTTPi Security Model (see below).
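The shutdown procedure can be sketched as follows; since there is no dhttpi running on an arbitrary system, this demonstration stands in a sleep process for the parent (an assumption purely for illustration — on a real system you would take the parent's PID from the process list):

```shell
# Start a stand-in long-running process (simulating the dhttpi parent).
sleep 60 &
PID=$!
# Send TERM, exactly as you would kill(1) the dhttpi parent.
kill -TERM "$PID"
wait "$PID" 2>/dev/null
# A process killed by SIGTERM reports wait status 128 + 15 = 143.
echo "exit status: $?"
```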
As of 0.99, Demonic HTTPi optionally includes STATIOS, a configure-time option allowing you to monitor your server's activity. STATIOS was considerably expanded in 1.6.
If HTTPi is running as root, no worries; if it is running as a non-privileged user, make sure you

% chmod ugo+r {file}

so that it can see it, and

% chmod ugo+rx {directory}

on directories. Naturally, all documents must be in the documents directory HTTPi is looking in, to ensure that naughtyness like /../../../etc/passwd doesn't work. Good thing, too.
Unlike just about every other webserver, if you don't have an index.html file in a directory, you'll get an error, not a directory tree. This saves program code and server size, and also offers a modicum of security so that people can't just riffle through your files.
If your Perl supports the alarm()
system call, which any
port worth its salt does,
requests have to be received within a few seconds of startup or the
socket is closed. This is a paranoia feature to help cut down on clients
maliciously holding sockets open.
As of 0.99, HTTPi does include a tool to let you make a directory browseable if you like (but subdirectories of that directory won't be unless you take the same steps with them). Copy browsed from the tools/ directory in the distribution to the desired directory. Rename it to index.html and make sure it's executable, e.g.

% chmod ugo+x index.html

Then access the directory like a regular resource (try http://httpi.floodgap.com/old-dists/ for an example).
Like many webservers, HTTPi can optionally serve users' pages from a public_html/ subdirectory in their home directories with a URL à la http://www.floodgap.com/~spectre/.
This is a configure-time option done at time of installation. (This is
assuming you're using Unix; other OSes may tune out for a bit.) The same
rules apply: if it's readable relative to the web server's uid, it's
displayed; if it's executable, it's executed; if it's neither, the client
gets a 404.
Attention! Executables are an issue, as you might be running untrusted code under the web server's uid if you're not careful. There is no current way to selectively allow only some users or no users at all to have executables. Read the section on executable support carefully before you enable the user filesystem, as well as the HTTPi Security Model below.
In the old system, the server only changed its uid/gid on executables, if it could do so. No attempt was made to enforce which uids were legal until the most recent versions, although as a side effect, no root-owned executables were allowed to execute, and this was later assimilated as a "feature". Documents were served, and server-parsed documents, including inline Perl blocks, ran, with the webserver's uid/gid.
In the new (and current) HTTPi Security Model, the server changes uid/gid for all documents to the owner of the document, if it can do so. This includes server-parsed documents, including those using inline Perl, and all executables, meaning the serving and execution of all content is restricted to the privileges of the user who owns the file being served. In fact, this includes even completely static documents and files. Moreover, a minimum uid can be specified to indicate which uids may serve documents. This has the following effects:
Uid 0 (i.e., root) and gid 0 are explicitly disabled. Documents owned by either uid 0 or gid 0, as well as any executable file owned by them, are categorically proscribed and cannot be accessed, no matter what.
Even if you link a sensitive location (like /etc or /var/adm or other sensitive locations) somewhere visible to the webserver, as long as the files within it are owned by a proscribed uid, they can't be accessed.
Files owned by pseudo-users such as nobody may have strange interactions.
stat() is used to evaluate the permissions of the file, meaning that symlinks will not disguise a file's true uid/gid. For example, a symlink to /etc/passwd would still evaluate as owned by root, and cannot be served.
Even if the server is not running as root (and thus cannot actually change uid/gid), it will still enforce the minimum uid constraint, and will still enforce the restrictions on files owned by either uid or gid 0.
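The net effect of these rules can be pictured with a few lines of Perl. This is only an illustration of the kind of check involved, not HTTPi's actual code, and $minuid stands in for the configure-time minimum uid:

```perl
# Illustrative sketch only -- not HTTPi's implementation.
my $minuid = 100;                       # assumed configure-time minimum uid
my ($uid, $gid) = (stat($file))[4, 5];  # stat() follows symlinks, so a
                                        # link to /etc/passwd shows uid 0
if ($uid == 0 || $gid == 0 || $uid < $minuid) {
    # categorically proscribed: the client gets an error, no matter what
}
```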
This option is not available to inetd installations.
IP-based virtual hosting (one IP address per virtual host) is handled by individual, distinct instances of HTTPi, one bound to each IP address. You must run configure for each build, and specify different absolute paths for each build so that each configure does not overwrite the others.
Each IP-based virtual server is an individual, distinct process and is separate from all others. If you are using Demonic HTTPi, each server must also be booted separately; the other flavours of HTTPi will require separate entries in the superserver's configuration. For managing these individual processes, see Starting and Stopping HTTPi.
IP-less Virtual Hosting and the %nameredir Hash

This option is available to all flavours of HTTPi.
This is only applicable if you have enabled IP-less virtual hosting with
configure
in your HTTPi build. Warning: Perl knowledge required
for this section.
The IP-less virtual hosting feature uses a hash, with your virtual servers as
the keys mapping to locations in the real file system on the actual server
(the actual server is based on whatever you entered as the fully-qualified
domain name of the server during the configure
process, so this
last is very important). Here's an example, from this server itself:
%nameredir = (
    "stockholm.floodgap.com" => "http://www.floodgap.com",
    "httpi.floodgap.com" => "http://www.floodgap.com/httpi",
);

Based on the above, a request for http://httpi.floodgap.com/serve.html is mapped to http://www.floodgap.com/httpi/serve.html. In the same way, a request for http://stockholm.floodgap.com/httpi/ is mapped to http://www.floodgap.com/httpi/.
Since HTTPi has been told the real name of the server is
www.floodgap.com
, HTTPi then relies on this hash to
handle requests for the other virtual servers this machine runs.
You must specify an entry for every possible name your server
should respond to if IP-less virtual hosting is turned on;
if the request is not for the real name of the server or for any of
the aliases in %nameredir
, the client gets a 404.
Note that, using the redirection hash above as written, a request for
http://floodgap.com/
will fail with a 404: it must be
http://www.floodgap.com/
.
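Conceptually, the lookup reduces to this Perl sketch (variable names here are illustrative, not HTTPi's internals):

```perl
# Illustration: resolve the client's requested host through %nameredir.
my %nameredir = (
    "stockholm.floodgap.com" => "http://www.floodgap.com",
    "httpi.floodgap.com"     => "http://www.floodgap.com/httpi",
);
my $host = "httpi.floodgap.com";        # from the client's request
my $base = ($host eq "www.floodgap.com")
    ? "http://www.floodgap.com"         # the real server name
    : $nameredir{$host};                # an alias; undef means a 404
```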
Also note that, unlike IP-based virtual hosting, this all occurs within one server process. You do not need individual processes per IP-less virtual server.
To edit the %nameredir
hash, please see the programming manual about using the configure
system to help you build new versions of HTTPi, and then edit the
uservar.in
(for versions before 1.2, edit httpi.in
)
file in the distribution according to the manual's instructions.
Executables
Attention! Your CGIs will almost certainly not work directly in HTTPi! Read carefully!
As mentioned, HTTPi supports executable programs after a fashion, but not as CGI, though some interface features are common.
When a file being requested is executable by HTTPi's uid, HTTPi sets a few environment variables and transforms itself (via exec() or another method, explained later) into the executable instead. If HTTPi is running as root, it, and therefore the executable, will assume the egid and euid of the executable's owner. It does not act like CGI: you must explicitly set an HTTP response code and you do not have all the CGI environment variables at your disposal. Think of it as a very stripped-down NPH-CGI environment, where no headers are provided to you.
Executables need not have a .cgi extension, and they don't have to be in any particular directory save the document one (i.e. there is no explicit /cgi-bin directory).
Regardless of whether the webserver is itself running as root, no root-owned or gid 0-owned executable may be executed (in 1.4+), and any constraints you specify in the HTTPi Security Model also apply.
The file noodle, included with the distribution in the tools/ directory, allows you to see what happens. chmod it executable (e.g.

% chmod ugo+x noodle

), pop it in the documents directory, and access it as a regular resource (try it on this server and see for yourself). It will display its uid, gid, euid, egid, arguments and environment. You are provided the REMOTE_HOST, REMOTE_ADDR, REMOTE_PORT, QUERY_STRING, SCRIPT_NAME, SCRIPT_FILENAME, SERVER_PROTOCOL, SERVER_PORT, SERVER_SOFTWARE, REQUEST_METHOD, SERVER_URL, CONTENT_TYPE (for POST requests), CONTENT_LENGTH (for POST requests), HTTP_USER_AGENT and HTTP_REFERER environment variables. In addition, as of 1.4, you are also provided the HTTP_COOKIE header with any cookies sent by the client, and as of 1.6, HTTP_ACCEPT_*, HTTP_ACCEPT, HTTP_IF_MODIFIED_SINCE and HTTP_X_REQUESTED_WITH; and if you have 1.7+ and you enable PATH_INFO, you will also get PATH_INFO where applicable.
Here's the good news: most NPH CGIs will probably need no modification at all, and most other CGIs will simply need you to add an extra header to explicitly set an HTTP response code, i.e. add
HTTP/1.0 200 OK

as the first line of whatever output the CGI spews. In Perl, you might use

print STDOUT "HTTP/1.0 200 OK\r\n";

to do this, and then call whatever routine prints your content-type.
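Putting that together, a complete minimal HTTPi executable might look like this sketch (the output is, of course, up to you):

```perl
#!/usr/bin/perl
# Minimal NPH-style HTTPi executable: no headers are provided for us,
# so we emit the status line and Content-type ourselves.
print "HTTP/1.0 200 OK\r\n";
print "Content-type: text/plain\r\n\r\n";
print "hello from an HTTPi executable\n";
```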
Because HTTPi has already ceased to exist by the time the executable starts, there is no way it can know if the executable succeeded or failed. Executables appear in the log file with a status code of 100 instead of 200, 302 or 500, the normal codes seen associated with executables.
Note that the REMOTE_HOST
variable may be affected if you
enable DNS anti-spoofing at configure time in 1.5 and up. If your
executables are expecting a directly-resolvable string to be present,
you should modify the executable or disable anti-spoofing.
Certain filesystems may not have reliable execute bits. For those systems, you can force only certain extensions to be considered executable (i.e., the file must not only have execute bits, but must also end in (exe|com|pl|cgi|cmd|perl|[ckbap]*sh)). This is available in 1.6 as a configure-time option.
exec() and HTTPerl

HTTPi can hand off to an executable in one of two ways: a true exec(), or the HTTPerl hack, which avoids the exec(). There are significant advantages and disadvantages to both.
The normal method, and the method used in HTTPi 0.4 and earlier, is to
exec()
into a new process and completely replace HTTPi and Perl
with whatever the new executable process will be. This works everywhere
and on just about everything,
but is needless overhead if the new executable process happens to be Perl
or a Perl script because now Perl needs to be re-invoked all over again.
The HTTPerl hack, then, is pretty easy to understand conceptually.
HTTPerl works by re-using the current Perl interpreter that is running HTTPi
to run the executable. This has one obvious advantage and one obvious
disadvantage: you don't need to re-invoke Perl again, but at the same time
every one of your executables that HTTPi runs directly must be in Perl. This
means that if you have any binary executables, you must make a Perl wrapper
for them that will do the exec()
at that point. On the other
hand, HTTPerl is in
general much, much faster than blindly exec()
ing into
a new process.
There is one other major quirk in HTTPerl that you need to be aware of:
HTTPerl basically does the equivalent of a require
on the
executable. This means that your executable also has access to all the
internal globals and functions that HTTPi
exposes, a double-edged sword as your executable or its modules might already
be using globals and functions with the same names, but at the same time you
don't have to provide any HTTP negotiation code yourself. Moreover, as a
result of the way it is invoked, your
executable runs within the server (it becomes part of the server),
so HTTPi will do error handling for you unless you install your own __DIE__ pseudosignal handler first.
Warning! One big gotcha is that the server will gripe if you don't return a true value in HTTPerl. Adding a 1; at the end will suffice, and keeps it compatible with regular CGI-based webservers.
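For example, a Perl executable written to stay HTTPerl-safe might end like this sketch; the trailing 1; satisfies HTTPerl's require-style invocation and is harmless under plain exec() or an ordinary CGI webserver:

```perl
#!/usr/bin/perl
# Safe under both plain exec() and HTTPerl's require-style invocation.
print "HTTP/1.0 200 OK\r\n";
print "Content-type: text/plain\r\n\r\n";
print "it worked\n";
1;   # without a true return value, HTTPerl gripes
```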
In short, exec()
is the default because it will handle all
cases with no problems. However, if your Perl executables work with HTTPerl,
you will find it a much faster solution. Just test thoroughly first!
(Note that speed gains may be reduced if your filesystem cache is fat and
therefore its copy of the Perl executable doesn't have to be flushed
out often. This may be the case on systems with lots of memory; in those cases,
HTTPerl's quirks may not be worth the slight speed edge. Try it and see.)
Inline Perl and the <perl> Tag

This is only applicable if you have enabled inline Perl when you built your executable with configure.
HTTPi 1.0 and up offer the ability to embed Perl in server-parsed HTML files. This feature is very useful and also very dangerous (particularly if you have the user filesystem enabled, as it allows anyone to execute arbitrary Perl with the webserver's uid). Although the HTTPi Security Model improves this feature's security, it still has tremendous power for problem as well as promise.
As this feature requires some knowledge of the internals of HTTPi, it is
given its own section in the Programmer's
Manual.
HTTPi Security and the Restriction Matrix
This is only applicable if you have enabled the restriction matrix
facility when you built your executable with configure
.
Since HTTPi lends itself to adding a quick webserver wrapper around applications, a frequent use for HTTPi is as a cheap interface to a network monitor or some data acquisition tool. Equally frequently, this data is sensitive.
HTTPi has an allow/deny authorization scheme called the restriction matrix, a hash in the program that allows the user to create restrictions for a particular directory or resource to certain network addresses, user agents, browsers, or clients, or (as of 0.99) specified user/password pairs. Warning: Perl knowledge required for this section.
The restriction matrix is hard-coded into HTTPi. Here's two sample entries (in fact, the default ones included in HTTPi):
%restrictions = (
    "/nw" => "^10\.##^Mozilla#MSIE",
    "/status" => "####voyeur:daNrZR3TcSwD2",
);

To edit the %restrictions hash, please see the programming manual about using the configure system to help you build new versions of HTTPi, and then edit the uservar.in (for versions before 1.2, edit httpi.in) file in the distribution according to the manual's instructions.
The first entry indicates that for all resources starting with /nw/, only addresses 10.*.*.* are allowed, and of that set, only browsers that report a user agent string of Mozilla and do not have MSIE. By now it should be obvious that this is nothing more than four regular expressions concatenated together with #s, in this order: allowed addresses, denied addresses, allowed user agents and denied user agents, with an optional fifth #-delimited field for user:password pairs.

The second entry allows /status (so the Demonic STATIOS module) to any IP address and any browser, but only those logging in as voyeur with password wannapeek, which is put in the file in crypt() format. Multiple user name:password pairs can be here, separated by commas.
The crapword utility in the tools/ distribution directory is a quick way to encrypt a password. Alternatively, you could copy the username-password pair from /etc/passwd or your appropriate shadow password file (Unix), or use the common htpasswd utility. For compatibility reasons, HTTPi does not use getpwnam() or getpwent() to access the password file.
Allow/deny rules have this precedence: to be admitted, a client must satisfy the allow rule and must not satisfy the deny rule; a matching deny rule always overrides a matching allow rule.
In the first line of the example, there is an allow and deny rule for the user
agent. Microsoft Internet Explorer exports a user-agent string of the form
Mozilla/V.v (... MSIE ...)
. It will satisfy the allow rule
^Mozilla
and the deny rule MSIE
, and thus
be disallowed. Mozilla Firefox exports a user-agent string of the
form Mozilla/V.v (...)
. It will satisfy the allow rule
^Mozilla
, but not satisfy the deny rule, and thus will be
allowed. Lynx (Lynx ...
) will fail the allow rule and the
deny rule, and will be disallowed.
The IP address scheme works similarly.
There is no user:password pair, so HTTP authentication is not required for the first example. The second example does require it.
To add additional restrictions for additional directories, simply add hash
rows with the resource prefix, directory, etc., as the key and the restriction
string as the value to the %restrictions
hash -- please
read the programmer's guide for important
information on building new HTTPi versions. Restrictions are prioritized
in descending order by
resource prefix length (i.e. "/foo/bar"
takes precedence over
"/foo/"
takes precedence over "/"
). Naturally, if
no restriction matrix entry exists that matches a particular resource, it
is allowed to all clients and all network addresses.
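The descending-length priority amounts to something like this Perl sketch (illustrative only, not HTTPi's code):

```perl
# Choose the longest %restrictions key that is a prefix of the resource.
my %restrictions = (
    "/foo/bar" => "rule1",
    "/foo/"    => "rule2",
    "/"        => "rule3",
);
my $resource = "/foo/bar/baz.html";
my ($key) = sort { length($b) <=> length($a) }
            grep { index($resource, $_) == 0 } keys %restrictions;
# $key is "/foo/bar" here; no match at all would mean no restriction
```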
When a client is disallowed due to a disallowed IP address or client, it is sent the standard HTTP error code 403 along with an explanation. With the first rule above in place, the resource can only be accessed from hosts on the internal network, and then only by Netscape clients.
When a client is disallowed due to an incorrect user name or password, it is
sent the standard HTTP error code 401 and an explanation. With the second
rule above in place, the resource
can only be accessed by users providing username voyeur
and
password wannapeek
.
Virtual Filesystem Support
This option is supported in Demonic HTTPi only.
This is only applicable if you have enabled the virtual file system with
configure
in your HTTPi build. Warning: Perl knowledge required
for this section.
The virtual filesystem allows you to "preload" files, or even embed them in the server, and then have them served completely from memory. Think of it as a fast, stupid, non-dynamic disk cache. This is particularly useful for small images that get frequently referenced, for example, or other kinds of small files. There are a few caveats:
Virtual files are defined in the %virtual_files hash in uservar.in. Each hash key is a fully-qualified file specification, and references a list reference that has three elements. The first element is the virtual file's MIME type (yes, it's redundant, but you also have the flexibility to override this for other purposes). The second is the type of data to follow: FILE if you wish to reference an actual file and preload that, or DATA if you're actually inlining data. The third is the data itself: either an absolute, fully-specified path to the file to be preloaded, or the data to be inlined.
The default %virtual_files
hash in the HTTPi distribution
looks like this:
%virtual_files = (
    "/httpi/pix/httpismall.gif" =>
        [ "image/gif", "FILE",
          "/usr/local/htdocs/httpi/pix/httpismall.gif" ],
    "/httpi/virtualfile.html" =>
        [ "text/html", "DATA",
          "<html><body>Look, Ma, I'm virtual!</body></html>" ],
);

The first hash entry tells Demonic HTTPi to load /usr/local/htdocs/httpi/pix/httpismall.gif into memory and, when a request comes along, instead of looking for it on disk, to serve it directly from its memory store with a MIME type of image/gif.
The second hash entry tells Demonic HTTPi to match the inlined text data
with any requests for /httpi/virtualfile.html
. When a request
for that file comes along, instead of looking for it on disk, the server
instead retrieves it from its memory store and serves it with a MIME type
of text/html
. Most important of all, this file actually
does not exist anywhere on this server's hard disk -- it is entirely a
figment of HTTPi's imagination, so to speak.
The Last-Modified
HTTP header works slightly bizarrely when a
virtual file is served. Instead of being that of the original file referenced,
since there may not even be an original file to reference, the
modification date of virtual files is considered to be the time when the
server was invoked, which, if you think about it, is technically true, too.
(The $statiosuptime
variable is used for this purpose, so
even if you don't have STATIOS enabled, this is still defined.)
This makes the code simpler rather than having to carry around file attributes
as excess memory baggage, and allows the interface to be abstracted to
include files that really are virtual and wouldn't have a
modification date per se, as well.
If an entry in the hash is the same as a file that really does exist, the virtual filesystem entry always takes precedence.
The HTTPi Security Model has no effect on how virtual filesystem entries are served.
To edit the %virtual_files
hash, please see the programming manual about using the configure
system to help you build new versions of HTTPi, and then edit the
uservar.in
file in the distribution according to the manual's instructions.
Throttling Support
As of 1.4, an exceptionally primitive and dumb, but effective, bandwidth
throttling option is available, allowing you to limit how fast requests
are served to users.
In an attempt to use as portable and uncomplicated a method as possible, the bandwidth throttling method is implemented overly simplistically. At configure time, you specify how big a "gulp" to take and spew over the network, and then how long should elapse between said gulps. For example, specifying 16K (this should be given to the configure script in bytes, so, 16384 bytes) and one second intervals means an effective average limit of 16K/sec on outgoing transfers. Most people will want to use a one-second interval to obtain as direct a bytes/second limit as possible, but you may have valid uses for increasing this interval (see below).
The interval pause is implemented in terms of the sleep function, which may inherit the limitations of your local implementation and operating system, including failing to sleep for the exact time specified.
Virtual files, executables and server-parsed documents (including inline Perl blocks) are not subject to throttling.
Throttling occurs on a per-process basis, not per server or installation. If you have a very busy site and absolutely must maintain as low a bandwidth impact as possible, one possible way to lower the impact is to turn up the interval. This will spread out data pulses to clients and allow more server instances a greater chance to transmit their data. However, there is currently no way in this version of HTTPi to enforce an aggregate transmission limit over all processes that the master HTTPi instance has spawned.
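The gulp-and-sleep scheme reduces to something like this Perl sketch (handle and variable names are assumptions, not HTTPi's actual code):

```perl
# Illustrative throttling loop: write one gulp, then pause.
my ($gulp, $interval) = (16384, 1);   # 16384 bytes / 1 sec ~= 16K/sec
while ((my $len = sysread($fh, my $buf, $gulp)) > 0) {
    syswrite($sock, $buf, $len);
    sleep $interval;                  # subject to sleep()'s granularity
}
```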