The Programming Manual describes HTTPi internals for people interested in writing or patching their own applications into HTTPi. It is perpetually under construction. Additions, suggestions, user patches and the like are welcomed into the code base; however, I or an authorised maintainer reserve the right to choose what gets checked in. Remember the philosophy: "more with less" :-)
Send your questions or difficulties to httpi@floodgap.com.
Getting Started with the HTTPi configure Script

If you didn't read this page thoroughly, you would probably just go to where your configure script dropped the Perl executable and merrily edit it in-place. And it would probably work, too. Well, I've got three words for that:

Don't do it!
Why?
If you decide to make configuration changes at a later date, or upgrade to a later version of HTTPi, guess what happens to your changes when configure runs again? You got it: completely obliterated. That's why HTTPi has a primitive make system of sorts built into the configure script suite. (Again, please recall there are multiple configure scripts; replace configure with the appropriate one for your setup.)

By working with the configure scripts, you can make changes to your heart's content and carry them across configuration changes and server updates. So don't edit the HTTPi executable directly!
There are four files in the HTTPi distribution that you need to worry about.
These files are glommed together at configure-time. (Before HTTPi 1.2, everything was in httpi.in. If you are, for some pathological reason, patching an old version, then remember that all changes must be made there, and that some features might not be available to you. After that, but prior to 1.5, also keep in mind that the current userfunc.in was also part of httpi.in.)

The files are:

httpi.in, the master server code. Modules maintained in the standard distribution, like STATIOS, are also here.

modules.in, the repository for user modules. All user modules should be put in this file. You can use this file when you're upgrading to keep the same modules in your next build.

uservar.in, the user variables file. Certain configuration variables, like the restriction matrix, content types and IP-less virtual hosting, are kept here. Again, you can use this file when you're upgrading to keep the same settings in your next build.

userfunc.in, the user functions file. Functions intentionally declared as user-serviceable live in this file, and you can of course add your own global subroutines for your modules or HTTPerl-based executables here too. Once again, this can be brought forward between builds.
Running configure Again

If you have a settings transcript from a previous run of configure (usually named something like transcript.0.configure), you can use that to save typing, and configure will get its answers from that. Just do
% perl configure -d filename
where filename is the transcript file, either one automatically made by configure, or one of your own. If you leave out the filename, the default answer to each question will be used.
That's it. (If you are using Demonic HTTPi, you will need to kill the old process(es) and start the new one for changes to take effect. Don't worry; when run in this mode, configure doesn't change your inetd or xinetd configuration files.)
Otherwise, just run the appropriate configure script as usual and answer the necessary questions.
Naturally, none of the *.in files are runnable Perl; they're teased and twisted by configure into your runtime object, since configure actually contains a miniature pre-processor. Lines like this:

~check SOMETHING
~
~

are exactly identical to

#ifdef SOMETHING
#else
#endif

(except that they can't be nested) and have exactly the same semantics as they do in the standard C pre-processor. These are used to determine build-time code based on settings you select and on system options (like, for example, whether your system supports setruid() -- AIX doesn't, Darwin doesn't seem to either, NetBSD does, Linux does).
There's also a mystical ~insert that has similar semantics to #include, except that you can't yet nest it either.
There are also defines in the httpi.in file that you should be able to match up with questions asked during the configure process. Unless you're a white hat, don't mess with them or the preprocessor directives.
In short, configure is your make processor. Use it, love it. You could modify the script in place, but configure is, we think, much friendlier, and you'll thank us for it later.
Hint: If you hopelessly destroy your *.in files, the
real Unix make can rescue you. Assuming the
stock/ directory is still intact, make revert
will replace all your *.in files with the standard
distribution versions. Sorry, this works in 1.2 and up only.
Customizing User Variables with uservar.in

If you want to add extra headers, change the MIME types reported for a file extension, alter the rules for the restriction matrix or add IP-less virtual hosts, uservar.in is the file to change. (Before 1.2, these globals were in httpi.in.)
To make these changes, simply edit uservar.in and make the needed alterations to the globals. Only these intentionally user-configurable globals are kept in this file; all others will be in httpi.in and should be reserved for white hats. Then run configure again, as described above, to update your executable. The HTTPi globals appear later in this manual.
Writing In Your Own Custom Handlers with modules.in

Custom handler code goes into modules.in (before 1.2, into httpi.in). At the point where that code runs, the HTTP request has been decoded, you have all of the needed globals defined, and you're at the point where HTTPi needs to determine what the browser should get back.

For example, if you inserted this snippet of code:
if ($address eq "/whoami") {
    &htsponse(200, "OK");
    &htcontent(<<"EOF", "text/html");
<html>
<body>
Hello, $variables!
</body>
</html>
EOF
    &log;
    exit;
}

and run configure again as described above to make a new executable, then your new HTTPi will internally be able to handle URLs like http://bletch/whoami?Cameron, and respond with a cheery, friendly response. (Here's information on the global variables and functions used here.)
A common idiom, and a nice way to test, is simply to have a line like this:

require "/home/user/testcode.pl";

in modules.in to be able to modify server behaviour on the fly instantly while the server is running, merely by changing testcode.pl.
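As a sketch, a hypothetical testcode.pl might look like the following. The /ping address and "pong" body are invented for illustration; $address, &htsponse, &htcontent and &log are the HTTPi globals and functions described later in this manual, so this fragment only makes sense when required from within a running HTTPi.

```perl
# /home/user/testcode.pl -- re-read on every request via require
if ($address eq "/ping") {
    &htsponse(200, "OK");
    &htcontent("pong\n", "text/plain");
    &log;
    exit;
}
1;    # require() demands a true return value
```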
Because Perl must compile your code each time, unfortunately,
this will degrade server performance to a variable degree. On the
other hand, this can be exceptionally flexible, especially for testing
code before making a final HTTPi build, or for those specialized
applications where server programming needs to be altered in real time
without stopping, rebuilding and restarting.
Note that if there is a bug in your script, it may temporarily render HTTPi
inoperable until it is fixed (but in most cases
HTTPi should immediately start working again when it is). Since this is a
require
, remember to have your dynamic library return a true
value.
In earlier versions,
at this point in HTTPi's execution, no attempt had been made to find a
valid file specification or check the restriction matrix and it
was assumed you had
a good idea of what your routine likes and doesn't like. As of 0.99, the
restriction matrix checks now come first, so you may protect a module with,
say, a password or an IP range restriction by putting an entry in the
restriction matrix. In fact, this is how STATIOS
is implemented by default.
Customizing Library Functions and Adding Your Own with userfunc.in (1.5 and up)

Starting in 1.5, functions that are explicitly declared "user serviceable parts," along with custom global functions and subroutines, live in userfunc.in. (Before 1.5, these functions had to be maintained in httpi.in and merged between versions.)
In the function list below, functions marked with (userfunc.in) are maintained in this file, and you should make changes to them there rather than to httpi.in.
userfunc.in is not merely limited to tweaking built-in functions, of course; you can put additional new library functions into it, which will also be visible to anything that shares the HTTPi namespace -- code in modules.in, for example, or inline Perl blocks, or HTTPerl-based executables. These will also be portable across HTTPi upgrades, assuming no violent architectural changes.
Note that some functions in userfunc.in
may be called by other internal HTTPi functions that are
not in userfunc.in
. For this reason, if you change the
arguments or calling convention of these routines, you should make sure that
the original method is still supported or you will need to change
any calls to it from anywhere else in the file structure (and make those
changes to future versions if needed). This can be a real pain. The moral of
the story is, don't fix what ain't broke; leave alone what you can.
What you should not use userfunc.in
for is directly
executable code, as it is not guaranteed to execute when you expect it to
(if it winds up executing at all). If you need to actually insert code and
not merely a function, put it into modules.in
as instructed
above.
To make changes, simply edit userfunc.in
and append your
custom functions and/or edit the standard ones provided. Then,
run configure
again as described
above to update your executable.
Dispatching to External Handlers

A handler need not live in modules.in itself; only a small dispatch stub in modules.in need be used, and it can point to a handler that can vary freely. One example is the PHProxy, a handler that allows you to execute PHP scripts directly within HTTPi, and is included in the standard distribution with instructions in tools/phproxy/. Here is how its modules.in dispatch looks.
if ($address =~ /\.php$/i) {
    $raddress = "/usr/local/bin/phproxy";
    goto IRED;
}

Assuming the condition is met (in this case, any request for a file ending in .php), the $raddress global is set to the location of the external handler (in this case /usr/local/bin/phproxy). After this, the magic incantation goto IRED; short-circuits the logic used to handle standard files and executables, and goes directly to the section that serves a document. If $raddress references an executable (as expected, although it could also be a server-parsed document with inline Perl, or, less frequently useful, a regular document), it will start execution.
Because this short-circuits a lot of logic (for several reasons: first and foremost, simple installation and management; second, to allow custom behaviour; and third, speed), your external handler is expected to be robust and to do many of the services HTTPi would do for a regular file or executable access. This includes rejecting unsuitable requests (such as file not found, access denied, wrong format, etc.), handling security constraints, and delivering data back to the client.
The easiest way to give all this functionality to your handler is to
write it in Perl, and enable HTTPerl at configure-time. While you could
write your handler as a regular HTTPi executable or "quasi-NPH-CGI"
and use the subset of CGI environment
variables HTTPi provides you to obtain the relevant context and arguments,
by enabling HTTPerl instead you will then
have access to the entire library of HTTPi functions and globals to
correctly handle a request in an expected manner. While a list of
official globals and functions is given below, in short, you will still
have your original request in $address
, the mountpoint of
the web server in $path
(to allow you to generate a true
absolute path to a file), and you will still be able to call the various
error subroutines to raise complaints to the client, as well as the
New Security Model to appropriately handle UID/GID security. In fact,
PHProxy requires HTTPerl be enabled to operate correctly.
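As a hedged illustration of the HTTPerl approach, a minimal external handler might look like the sketch below. None of this is taken from the distribution: the choice to serve plain-text files and all of the logic are invented for the example. $address, $path, &htsponse, &htcontent and &hterror404 are the HTTPi globals and functions documented below, visible here only because HTTPerl is enabled.

```perl
#!/usr/bin/perl
# Hypothetical HTTPerl external handler serving plain-text files.
my $file = "$path$address";          # generate a true absolute path
unless (-f $file) {                  # reject unsuitable requests ourselves
    &hterror404;
    exit;
}
open(F, "<", $file) or do { &hterror404; exit; };
my $data = do { local $/; <F> };     # slurp the whole file
close(F);
&htsponse(200, "OK");
&htcontent($data, "text/plain");
```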
Functions in HTTPi
Functions maintained in userfunc.in
appear with the
tag (userfunc.in
) and the version they were first maintained
there. All other functions are in httpi.in
unless otherwise
noted.
Networking functions (roughly ordered by appearance)
sub sock_to_host

Takes the result of getpeername(STDIN) and turns it into a hostname. Used for the log subroutine and executable support. Normally handled by similarly named functions in Socket.pm and like-minded modules, but HTTPi has its own socket support built in. Unlike 0.1, the current version is hardcoded to use the STDIN filehandle. A list of (hostname, port, IP) is returned; if hostname lookups are off, IP addresses are returned in both the hostname and IP address sections. In 1.4 and up, sock_to_host attempts to cache the results for future calls using the $cache_ip, $cache_hn and $cache_port globals. When running under stunnel in 1.6 and up, this function uses the environment variables provided by stunnel instead.
sub absolver

Performs the same reverse lookup as gethostbyaddr, but wraps the call in a timeout. If the "absolver" is not enabled, then the call is replaced by one to gethostbyaddr and absolver is not defined.
HTTP response functions (roughly ordered by appearance)
sub htsponse

Sends the HTTP response line to the client; $currentcode and $currentstring are set with the response code and string respectively. This function silently exits if the HTTP version of the client is 0.9.
sub hthead

Sends a header to the client; if the termination flag is set, "\r\n" is also sent to indicate the end of headers. The header will automatically have "\r\n" appended to it. This function silently exits if the HTTP version of the client is 0.9.
sub htcontent

Sends a complete content body to the client; $contentlength is set with the length of the content scalar. The content length and content type are sent to the client with the hthead function (the content type having the termination flag set), and the content is then dumped to the client unless the current request method is HEAD.
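Taken together, a typical successful response from module code can be sketched like this. The X-Example header and the body text are invented for illustration; this is a fragment that assumes the HTTPi namespace, not standalone code.

```perl
&htsponse(200, "OK");               # status line; sets $currentcode/$currentstring
&hthead("X-Example: demo");         # extra header; "\r\n" is appended for us
&htcontent(<<"EOF", "text/html");   # sends Content-Length and Content-Type, then body
<html><body>It worked.</body></html>
EOF
```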
HTTPi-internal functions (roughly ordered by appearance)
sub log

Writes a log entry to $logfile, and utilises globals $hostname $httpref $date $method $address $variables $httpver $httpua $currentcode and $contentlength, depending on the logging option specified during the configure process.
sub bye

Handles SIGALRM. Currently, it just silently terminates HTTPi.
sub byebye

Handles SIGTERM. Currently, it also just silently terminates HTTPi. If a child process exists (Demonic HTTPi), it will terminate it first before terminating itself.
sub dead

Handles fatal errors (the __DIE__ pseudohandler). It logs a 500 error through htsponse and prints an error message with hterror.
sub hterror (userfunc.in since 1.5)

Generates an error document, assuming htsponse has already been called to set the proper HTTP response code. It takes two arguments, a title and an explicatory string, then calls htcontent with an Apache-like formatted dump containing the title and explanation.
sub hterror404, hterror301 (userfunc.in since 1.5)

Convenience functions that call hterror with their messages.
sub hterror401, hterror302

Convenience functions that call hterror with their messages. Due to their less frequent employment, these are maintained internally instead.
sub nsecmodel

Implements the New Security Model: $gid and $uid are set with the GID and UID, respectively, of the referenced document.
sub defaultsignals

Depending on the configuration, the $SIG method is replaced with calls to POSIX::sigaction. This function then asserts standard signal handlers (as mentioned above) using the requested method.
sub alarmsignals

Like defaultsignals, but specifically handles the case where SIGALRM is being used as a local timeout (such as in the "absolver", q.v.) rather than for the entire process state.
sub master

The Demonic HTTPi socket loop in httpi.in. It does not exist in the other versions, as there this is the main program instead, not simply a function called by the Demonic socket loop.
sub rfctime

Converts a time into an RFC-compliant date string (cf. scalar gmtime).
<S> and <NS> are HTTPi's control
filehandles. Don't mess with them unless you know what you're doing!
Globals maintained in uservar.in
$headers

Extra headers to send to clients, as described above.

%content_types

User-defined content-type mappings for file extensions. Where an extension appears both here and in %system_content_types (see below), the entries in %content_types take precedence.
%restrictions

The restriction matrix.

%nameredir

The name-based redirection table.

%virtual_files

The IP-less virtual hosting table.
Globals maintained in httpi.in
$logfile, $path

The log file location and the mountpoint of the web server, respectively.

$currentcode, $currentstring

Set by htsponse.

$contentlength

Set by htcontent.

$rfcdate

The RFC-format date generated by rfctime.

$date
$statiosuptime, $statios*

The $statios* family is part of the Demonic STATIOS module, and is only defined when that module is enabled, except for $statiosuptime, which holds the time() when the server was started and is always defined, since many portions of code now reference it. $statioslastsec and $statiosmaxsec were added in 1.6.
$method, $address, $httpver

The decoded request method, requested address and HTTP version of the current request.

$raddress

The resolved address that will actually be served (see the external handler discussion above).

$variables

The query string; analogous to the QUERY_STRING environment variable.

$httpref

The Referer header from the client.

$httpua

The User-Agent header from the client.
$httprawu, $httpuser, $httppw

Respectively, the base64-encoded Authorization header string, the decoded user, and temporary space for the clear-text password. The first is set on receipt of a valid Authorization header; the others are only set after restriction matrix checks are passed.
$uid, $gid

Set by nsecmodel after analysis of the currently requested resource to the owner and GID of that resource.
$cache_ip, $cache_hn, $cache_port

Set by sock_to_host in 1.4 and up after execution to cache DNS reverse lookups.
$mtime

The modification time of $address; set after successfully verifying the resource's existence and a successful stat(). Prior to 1.6 this was a simple string in ctime() format, but in 1.6 and up it is generated using rfctime to make exact If-Modified-Since prefix comparisons simple.
%system_content_types

The standard content-type mappings, defined in httpi.in and thus maintained as part of the basic distribution. It is merged with %content_types and then destroyed.
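The $httprawu decoding step mentioned above is ordinary base64. Outside HTTPi, the same split can be sketched in a few lines of standalone Perl; the credential string here is the classic RFC example value, not anything taken from HTTPi.

```perl
use MIME::Base64;

# Decode a Basic Authorization payload into a user and clear-text password.
my $httprawu = "QWxhZGRpbjpvcGVuIHNlc2FtZQ==";   # hypothetical header payload
my ($httpuser, $httppw) = split(/:/, decode_base64($httprawu), 2);
print "$httpuser / $httppw\n";                   # prints "Aladdin / open sesame"
```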
The <perl> Tag

In short, you can execute arbitrary Perl code inside any document with an .sht, .shtm or .shtml extension by placing it within <perl></perl> tags. Whatever your code returns gets displayed to the client.
If you have been stupid (er, bold) enough to try the inline Perl option, be advised of several important issues: depending on your security settings, your inline Perl may execute as root, and, particularly when running Demonic HTTPi rather than under (x)inetd, you can seriously tank the current instance. You can also make complete use of all globals, functions and states, and manipulate them to any possible extent.
The page above (sperl.shtml
) is very simple and easy, and
armed with the list of globals and functions above, you can manipulate them
in any way you see fit. Of course,
you're not limited to evaluating arbitrary expressions. Try this (code
also provided):
Did you like Mission: Impossible? Yes | No
The code responsible for the above page (imf.shtml) introduces two functions that are only added to HTTPi when you enable preparsing. These functions allow you to store up output into a buffer so you can "print" without messing up HTTPi's output stream. This is important because your inline Perl is executing before HTTPi has completely finished emitting headers, and a naked print will corrupt the HTTP headers and probably cause some client confusion.
So, instead of printing, call &output with the string, which will emit it into a buffer, $fbuf (or do this yourself). When you're done, just return &flush();, which returns and flushes the buffer. You can see this in imf.shtml, so look at it if you haven't already.
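A minimal inline Perl block using this pattern might be sketched as follows. This is a fragment of an .shtml document, not standalone code, and the text emitted is invented for illustration.

```
<perl>
&output("The time is " . scalar(gmtime) . ".\n");
&output("You asked for $address.\n");
return &flush();
</perl>
```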
In 1.4, file manipulation "primitives" were included, allowing you to sling entire files around and build a library of pieces which can all be tied together at service time. The &include function, also only present if preparsing is enabled, accepts a fully-qualified (not relative) path to a file, and directly inserts it into the buffer, just as if you had used &output to display its contents; but &include takes care of all the opening and slurping of its data for you. As always, just return &flush(); to return and flush the buffer. If you just want to insert a single file and nothing else, &finclude() does all this for you. Check out this example.
If &include
cannot include the file, it will quietly
insert a comment in HTML <!-- -->
tags containing the
error message.
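Using &include, a page can be assembled from pieces like this. This is a sketch only; the part file paths are hypothetical, and the fragment assumes preparsing is enabled.

```
<perl>
&include("/home/user/parts/header.html");   # slurped straight into the buffer
&output("<p>Page body goes here.</p>\n");
&include("/home/user/parts/footer.html");
return &flush();
</perl>
```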
You can also include a file containing additional inline Perl blocks, and this file will be interpreted as if it were part of the original file. This lets you build an entire library of code components you can assemble together at a whim, with a central point for easy system-wide changes. For purposes of the New Security Model, any files you include are operated with the UID of the original file's owner if the New Security Model is enabled, so be sure you trust the source of files that you include into yours.
Errors get caught by HTTPi's __DIE__
handler. Of course, if
this bugs you, you could alter this right in your inline Perl by introducing
an anonymous subroutine reference. However, a much better way would be to
rebuild a custom HTTPi yourself with a new handler rather than make such a
change "locally".
It has been mentioned that there are some things you cannot do in inline Perl. One of them is, as stated, using print (or, for that matter, warn). Note that you can use print to print to a filehandle, just not to stdout. You also cannot have a literal </perl> or <perl> in your code, since the preparser is not too bright; you'll need to cleverly escape it or break it up, which shouldn't be too hard.
(Just like you can use the file-manipulation functions to include and
execute another file with inline Perl blocks,
it is fully possible to have an inline Perl block create another inline
Perl block, which will be executed as if it were there in the first place,
too.
The possibilities for recursion are entertaining and somewhat disturbing,
so this exercise is left to the reader.)
Also, anything that will affect the server process will probably cause unforeseen results. If you decide to use or require arbitrary code in your inline Perl block, make sure it doesn't conflict with HTTPi. While you can't break the master server or bring it down, thanks to process isolation, you will probably get very inexplicable results out of your document if your module starts treading on HTTPi's rather large namespace (and vice versa). In general, loading modules in inline Perl blocks is not recommended: for such an application, use an executable instead.